

29.3 The codepoint-per-earleme model

One variation of the fully general model is the one-codepoint-per-earleme input model, usually called the codepoint-per-earleme input model.

Most input models may be called token-per-earleme input models. That is, they are models in which every token is exactly one earleme long.

In the codepoint-per-earleme model, every codepoint is treated as exactly one earleme long. A token that is more than one codepoint long therefore spans earlemes. In this model, tokens may be ambiguous, and they may overlap.
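The span arithmetic above can be sketched as follows. This is an illustrative model only, not the libmarpa API: the input string, the token names, and the (name, start, length) representation are all assumptions made for the example. A token's length in earlemes simply equals its length in codepoints.

```python
# Hedged sketch of the codepoint-per-earleme model: each codepoint
# occupies one earleme, so a token's earleme length equals its length
# in codepoints.  Tokens are (name, start_earleme, length) triples.
# The input and token names are illustrative, not a real grammar.

text = "42+7"

# A scanner might report these alternatives.  Note that "42" spans
# two earlemes, and it overlaps the single-digit reading of "4",
# making the input ambiguous at earleme 0.
alternatives = [
    ("number", 0, 2),   # "42" -- starts at earleme 0, two codepoints long
    ("digit",  0, 1),   # "4"  -- overlaps the "number" reading of "42"
    ("plus",   2, 1),   # "+"
    ("number", 3, 1),   # "7"
]

# Earlemes at which at least one token starts.  No token starts at
# earleme 1, which falls in the middle of the "42" token.
starts = sorted({start for _, start, _ in alternatives})
print(starts)
```

Running this prints `[0, 2, 3]`: earleme 1 is a location at which no token starts, the situation the next paragraph describes.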

When a codepoint-per-earleme model of input is used, there may be many earlemes at which no tokens start. For example, in a straightforward codepoint-per-earleme implementation of a grammar for a language that allows comments, no tokens will start at any earleme which corresponds to a codepoint location inside a comment.

Codepoint-per-earleme input models have seen a lot of use, but mainly for experimental and toy grammars. Their disadvantage is efficiency: the requirement of one call of marpa_r_alternative() for each codepoint can make them substantially more expensive than input models which allow multiple codepoints per earleme.
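A back-of-the-envelope count makes the cost difference concrete. The input size and token count below are illustrative assumptions, not measurements of libmarpa; the point is only that the call count scales with codepoints in one model and with tokens in the other.

```python
# Hedged cost sketch.  Assume a 1,000-codepoint input that tokenizes
# into 150 tokens.  Both figures are assumptions for illustration.
codepoints = 1_000
tokens = 150

# Codepoint-per-earleme: one marpa_r_alternative() call per codepoint.
calls_codepoint_model = codepoints

# Token-per-earleme: one marpa_r_alternative() call per token.
calls_token_model = tokens

# The per-codepoint model makes several times as many calls here.
ratio = calls_codepoint_model / calls_token_model
print(calls_codepoint_model, calls_token_model, ratio)
```

Under these assumptions the per-codepoint model makes roughly 6.7 times as many marpa_r_alternative() calls, which is the overhead the paragraph above refers to.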