Next: Application and diagnostic behavior, Previous: Ambiguity, Up: Terms [Contents][Index]
We assume knowledge of Earley’s algorithm. This section is intended as a reminder, and to set out our choices for notation and terminology.
A dotted rule is a duple of rule and position in the rule. The position must be an non-negative integer that is not greater than the length of the rule. For example,
[ [ A ::= X Y Z ], 2 ] (DR1)
is a dotted rule
whose rule is
[ A ::= X Y Z ]
and whose position is 2.
More often a dotted rule is written
with the position marked in the RHS of the rule.
Here we use “•” as our marker, so that
in (DR1)
the “•” would come after
the 2nd symbol on the RHS,
and (DR1)
would be written
[ A ::= X Y • Z ].
Traditionally the marker is a raised dot, and this is where the term “dotted rule” comes from. The position of the dotted rule is almost always called the dot position.
An Earley item is a triple of dotted rule, origin and current parse location.1 The Earley item makes a statement about the progress of a parse. The origin is the parse location where recognition of the dotted rule begins. The current location is the parse location which recognition of the dotted rule has reached. Symbols before the dot position of the dotted rule have been recognized, and symbols after the dot position of the dotted rule have yet to be recognized.
For example, in the Earley item
[ [ A ::= X Y • Z ], 7, 42],
the dotted rule [ A ::= X Y • Z ]
was recognized
as far as the dot position, starting at parse location 7
and ending at parse location 42.
More precisely,
[ A ::= X Y Z ]
was expected
beginning at parse location 7;
X Y
” has been recognized
starting at parse location 7,
which implies that the symbol X
has been recognized
starting at parse location 7;
X Y
” has been recognized
ending at parse location 42,
which implies that the symbol Y
has been recognized
ending at parse location 42; and
Z
has yet to recognized,
but is expected starting at parse location 42.
The Earley set at location j
is the set of Earley items whose current location is j
.
For example, the Earley item
[ [ A ::= X Y • Z ], 7, 42]
is in the Earley set at location 42.
We say that a dotted rule is completed iff its dot position is after the last symbol. We say that an Earley item is completed iff its dotted rule is completed. For example, the Earley item
[ [ A ::= X Y Z • ], 7, 50],
is completed. A completed dotted rule is also called a completion. Similarly, a completed Earley item is also called a completion.
We say that a dotted rule is predicted iff its dot position is before the first symbol. We say that an Earley item is predicted iff its dotted rule is predicted. For example, the Earley item
[ [ A ::= • X Y Z ], 7, 7],
is predicted. A predicted dotted rule is also called a prediction. Similarly, a predicted Earley item is also called a prediction.
The postdot symbol of a dotted rule is the RHS symbol after the dot position. For example, in the dotted rule,
[ A ::= X Y • Z ]
the postdot symbol is Z
.
If the dotted rule is a completion,
its postdot symbol is not defined.
The postdot symbol of an Earley item is the postdot symbol of its dotted rule. The postdot symbol of an Earley item is an expected symbol at its current location. For example, the Earley item
[ [ A ::= X Y • Z ], 7, 42],
indicates that the symbol Z
is expected at location 42.
Let sym
be a symbol,
and let j
be a parse location.
Unless sym
is expected at j
,
that is, unless sym
is a postdot symbol of some Earley
item in the Earley set at j
,
sym
will be rejected by the
marpa_r_alternative()
method
at current location j
.
See marpa_r_alternative.
Those interested in details of notation may notice that we include current location in the Earley item tuple, contrary to the tradition. We develop our notation more formally, and explain our reasons for the deviation from tradition, on pages 5-7 of https://arxiv.org/abs/2303.04093v1.
Next: Application and diagnostic behavior, Previous: Ambiguity, Up: Terms [Contents][Index]