Earley items (Libmarpa 11.0.10)

We assume knowledge of Earley’s algorithm. This section is intended as a reminder, and to set out our choices for notation and terminology.

A dotted rule is a duple of rule and position in the rule. The position must be an non-negative integer that is not greater than the length of the rule. For example,

     [ [ A ::= X Y Z ], 2 ]      (DR1)

is a dotted rule whose rule is [ A ::= X Y Z ] and whose position is 2. More often a dotted rule is written with the position marked in the RHS of the rule. Here we use “•” as our marker, so that in (DR1) the “•” would come after the 2nd symbol on the RHS, and (DR1) would be written

     [ A ::= X Y • Z ].

Traditionally the marker is a raised dot, and this is where the term “dotted rule” comes from. The position of the dotted rule is almost always called the dot position.

An Earley item is a triple of dotted rule, origin and current parse location.¹ The Earley item makes a statement about the progress of a parse. The origin is the parse location where recognition of the dotted rule begins. The current location is the parse location which recognition of the dotted rule has reached. Symbols before the dot position of the dotted rule have been recognized, and symbols after the dot position of the dotted rule have yet to be recognized.

     [ [ A ::= X Y • Z ], 7, 42],

the dotted rule [ A ::= X Y • Z ] was recognized as far as the dot position, starting at parse location 7 and ending at parse location 42. More precisely,

The Earley set at location j is the set of Earley items whose current location is j. For example, the Earley item [ [ A ::= X Y • Z ], 7, 42] is in the Earley set at location 42.

We say that a dotted rule is completed iff its dot position is after the last symbol. We say that an Earley item is completed iff its dotted rule is completed. For example, the Earley item

     [ [ A ::= X Y Z • ], 7, 50],

is completed. A completed dotted rule is also called a completion. Similarly, a completed Earley item is also called a completion.

We say that a dotted rule is predicted iff its dot position is before the first symbol. We say that an Earley item is predicted iff its dotted rule is predicted. For example, the Earley item

     [ [ A ::= • X Y Z ], 7, 7],

is predicted. A predicted dotted rule is also called a prediction. Similarly, a predicted Earley item is also called a prediction.

The postdot symbol of a dotted rule is the RHS symbol after the dot position. For example, in the dotted rule,

     [ A ::= X Y • Z ]

the postdot symbol is Z. If the dotted rule is a completion, its postdot symbol is not defined.

The postdot symbol of an Earley item is the postdot symbol of its dotted rule. The postdot symbol of an Earley item is an expected symbol at its current location. For example, the Earley item

     [ [ A ::= X Y • Z ], 7, 42],

Let sym be a symbol, and let j be a parse location. Unless sym is expected at j, that is, unless sym is a postdot symbol of some Earley item in the Earley set at j, sym will be rejected by the marpa_r_alternative() method at current location j. See marpa_r_alternative.

Footnotes

(1)

Those interested in details of notation may notice that we include current location in the Earley item tuple, contrary to the tradition. We develop our notation more formally, and explain our reasons for the deviation from tradition, on pages 5-7 of https://arxiv.org/abs/2303.04093v1.

5.13 Earley items

Footnotes

(1)