One variation of the fully general model is the one-codepoint-per-earleme input model, usually called the codepoint-per-earleme input model.
Most input models may be called token-per-earleme input models. That is, they are models in which every token is exactly one earleme long.
In the codepoint-per-earleme model, every codepoint will be treated as being exactly one earleme in length. If a token is more than one codepoint in length, that token will span earlemes. In the codepoint-per-earleme model, tokens may be ambiguous, and they may overlap.
When a codepoint-per-earleme model of input is used, there may be many earlemes at which no tokens start. For example, in a straightforward codepoint-per-earleme implementation of a grammar for a language that allows comments, no tokens will start at any earlemes which correspond to character locations inside a comment.
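The following is a minimal sketch of a driver loop for the codepoint-per-earleme model. It assumes a recognizer that has already been created and started with marpa_r_start_input(). The Scan_Token structure and the tokens_starting_at() function are hypothetical stand-ins for whatever scanner the application uses; only the marpa_r_alternative() and marpa_r_earleme_complete() calls are libmarpa API.

#include "marpa.h"

/* Hypothetical scanner interface: a token reported by the scanner. */
typedef struct {
    Marpa_Symbol_ID symbol;   /* token symbol */
    int value;                /* application-defined token value */
    int length;               /* token length, in codepoints */
} Scan_Token;

/* Hypothetical: write the tokens that start at codepoint |pos| into
   |tokens|, and return how many there are.  Several tokens may start
   at the same codepoint; none start inside, say, a comment. */
int tokens_starting_at(int pos, Scan_Token *tokens, int max_tokens);

int
feed_codepoint_per_earleme(Marpa_Recognizer r, int codepoint_count)
{
    Scan_Token tokens[8];
    for (int pos = 0; pos < codepoint_count; pos++) {
        /* Earleme |pos| corresponds to codepoint |pos|. */
        int n = tokens_starting_at(pos, tokens, 8);
        for (int i = 0; i < n; i++) {
            /* A token more than one codepoint long spans earlemes:
               its length argument is its length in codepoints. */
            int status = marpa_r_alternative(r, tokens[i].symbol,
                                             tokens[i].value,
                                             tokens[i].length);
            if (status != MARPA_ERR_NONE) return -1;
        }
        /* Advance the current earleme by one codepoint.  This call is
           made even at locations where no token starts. */
        if (marpa_r_earleme_complete(r) < 0) return -1;
    }
    return 0;
}

Note that the length argument of marpa_r_alternative() is measured in earlemes, which in this model is the same as the token's length in codepoints.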
Codepoint-per-earleme input models have seen a lot of use,
but mainly for experimental and toy grammars.
Their disadvantage is efficiency: the requirement of one call
of marpa_r_alternative()
for each codepoint can make them
substantially more expensive than input models which allow multiple
codepoints per earleme.