Libmarpa 11.0.10

1 License
2 Updates
3 About this document
- 3.1 How to read this document
- 3.2 Prerequisites
4 Overview of Libmarpa
5 Terms, definitions and notation
- 5.1 Miscellaneous definitions
- 5.2 Parsing theory preliminaries
- 5.3 Rules
- 5.4 Derivations
- 5.5 Nulling
- 5.6 Useless rules
- 5.7 Recursion and cycles
- 5.8 Trees
- 5.9 An example tree
- 5.10 Traversal
- 5.11 Semantics terms
- 5.12 Ambiguity
- 5.13 Earley items
- 5.14 Application and diagnostic behavior
6 Architecture
- 6.1 Major objects
- 6.2 Time objects
- 6.3 Reference counting
- 6.4 Numbered objects
7 Input
- 7.1 Parse location
- 7.2 The basic models of input
  - 7.2.1 The standard model of input
  - 7.2.2 Ambiguous input
- 7.3 Terminals
8 Exhaustion
9 End of parse
- 9.1 Determining EOP from EOI
- 9.2 Determining EOP from exhaustion
- 9.3 Other ways of determining EOP
10 Semantics
11 Threads
12 Sequence rules
13 Nullability
- 13.1 Nullability in the valuator
- 13.2 Assigning semantics to nulled symbols
- 13.3 Evaluating nulled symbols
- 13.4 Example of nulled symbol
- 13.5 Duplicate nulled nodes
14 Failure
- 14.1 Libmarpa’s approach to failure
- 14.2 User non-conformity to specified behavior
- 14.3 Classifying failure
- 14.4 Memory allocation failure
- 14.5 Undetected failure
- 14.6 Irrecoverable hard failure
- 14.7 Partially recoverable hard failure
- 14.8 Library-recoverable hard failure
- 14.9 Ancestry-recoverable hard failure
- 14.10 Fully recoverable hard failure
- 14.11 Soft failure
- 14.12 Error codes
15 Introduction to the method descriptions
- 15.1 About the overviews
- 15.2 Naming conventions
- 15.3 Return values
- 15.4 How to read the method descriptions
16 Static methods
17 Configuration methods
18 Grammar methods
- 18.1 Overview
- 18.2 Creating a new grammar
- 18.3 Tracking the reference count of the grammar
- 18.4 Symbol methods
- 18.5 Rule methods
- 18.6 Sequence methods
- 18.7 Rank methods
- 18.8 Precomputing the Grammar
19 Recognizer methods
- 19.1 Recognizer overview
- 19.2 Creating a new recognizer
- 19.3 Keeping the reference count of a recognizer
- 19.4 Life cycle mutators
- 19.5 Location accessors
- 19.6 Other parse status methods
20 Progress reports
- 20.1 Progress report traverser
- 20.2 Progress report methods
21 Bocage methods
- 21.1 Overview
- 21.2 Bocage data structure
- 21.3 Creating a new bocage
- 21.4 Reference counting
- 21.5 Accessors
22 Ordering methods
- 22.1 Overview
- 22.2 Freezing the ordering
- 22.3 Creating an ordering
- 22.4 Reference counting
- 22.5 Accessors
- 22.6 Non-default ordering
23 Tree methods
- 23.1 Overview
- 23.2 Creating a new tree iterator
- 23.3 Reference counting
- 23.4 Iterating through the trees
24 Value methods
- 24.1 Overview
- 24.2 How to use the valuator
- 24.3 Advantages of step-driven valuation
- 24.4 Maintaining the stack
  - 24.4.1 Sizing the stack
- 24.5 Creating a new valuator
- 24.6 Reference counting
- 24.7 Stepping through the valuator
- 24.8 Valuator step types
- 24.9 Basic step accessors
- 24.10 Step location accessors
25 Events
- 25.1 Overview
- 25.2 Event codes
- 25.3 Basic event accessors
- 25.4 Completion events
- 25.5 Symbol nulled events
- 25.6 Prediction events
- 25.7 Symbol expected events
- 25.8 Recognizer per-symbol events
- 25.9 Tentative events
- 25.10 Ambiguous parses and events
- 25.11 Nulled subtrees and events
- 25.12 Event coincidence of symbol instances
- 25.13 Event coincidence of symbols
- 25.14 Marker symbols
- 25.15 Per-rule events
  - 25.15.1 Workaround with a dedicated LHS symbol
26 Tracing and diagnosing parses
- 26.1 Listing symbols
- 26.2 Listing rules
- 26.3 Listing Earley sets
- 26.4 Tracing tokens
- 26.5 Parse diagnosis checklist
27 Error methods, macros and codes
- 27.1 Error methods
- 27.2 Error Macros
- 27.3 External error codes
- 27.4 Internal error codes
28 Technical notes
- 28.1 Data types used by Libmarpa
- 28.2 Why so many time objects?
- 28.3 Numbered objects
- 28.4 Trap representations
- 28.5 Out-of-memory handling
29 Advanced input models
- 29.1 The dense variable-length token model
- 29.2 The fully general input model
- 29.3 The codepoint-per-earleme model
- 29.4 Converting earleme to Earley set ID
30 Support
31 Futures
- 31.1 Nulling versus nulled
- 31.2 Document pre-conditions more formally
- 31.3 Simpler events interface
- 31.4 Better defined ambiguity metric
- 31.5 Report item traverser should be a time object
- 31.6 Orthogonal treatment of soft failures
- 31.7 Orthogonal treatment of exhaustion
- 31.8 Furthest earleme values
- 31.9 Additional recoverable failures in marpa_r_alternative()
- 31.10 Untested methods
  - 31.10.1 Zero-width assertion methods
  - 31.10.2 Methods for revising parses
32 Deprecated techniques and methods
- 32.1 LHS terminals
- 32.2 Valued and unvalued symbols
- 32.3 Cycles
33 History of the Marpa algorithm
34 Annotated bibliography
- 34.1 Aho and Ullman 1972
- 34.2 Aycock and Horspool 2002
- 34.3 C89 standard
- 34.4 C99 standard
- 34.5 Dominus 2005
- 34.6 Earley 1970
- 34.7 Grune and Jacobs 1990
- 34.8 Grune and Jacobs 2008
- 34.9 Marpa theory paper
- 34.10 Kegler 2023
- 34.11 Parsing timeline
- 34.12 Leo 1991
- 34.13 Scott 2008
- 34.14 Wikipedia BNF article
35 Acknowledgements
- 35.1 An apology
Index of terms

Libmarpa: The Marpa low-level library

This manual (30 April 2023) is for Libmarpa 11.0.10.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


• License:
• Updates:
• About this document:
• Overview of Libmarpa:
• Terms:
• Architecture:
• Input:
• Exhaustion:
• End of parse:
• Semantics:
• Threads:
• Sequence rules:
• Nullability:
• Failure:
• Introduction to the method descriptions:
• Static methods:
• Configuration methods:
• Grammar methods:
• Recognizer methods:
• Progress reports:
• Bocage methods:
• Ordering methods:
• Tree methods:
• Value methods:
• Events:
• Tracing and diagnosing parses:
• Error methods macros and codes:
• Technical notes:
• Advanced input models:
• Support:
• Futures:
• Deprecated techniques and methods:
• History of the Marpa algorithm:
• Annotated bibliography:
• Acknowledgements:
• Index of terms:
— The Detailed Node Listing — About this document
• How to read this document:
• Prerequisites:
Terms, definitions and notation
• Miscellaneous definitions:
• Parsing theory preliminaries:
• Rules:
• Derivations:
• Nulling:
• Useless rules:
• Recursion and cycles:
• Trees:
• An example tree:
• Traversal:
• Semantics terms:
• Ambiguity:
• Earley items:
• Application and diagnostic behavior:
Architecture
• Major objects:
• Time objects:
• Reference counting:
• Numbered objects:
Input
• Parse location:
• The basic models of input:
• Terminals:
Parse location
• The traditional input model:
• Earlemes and Earley set IDs:
• The latest Earley set:
• The latest earleme:
• The current earleme:
• The furthest earleme:
The basic models of input
• The standard model of input:
• Ambiguous input:
End of parse
• Determining EOP from EOI:
• Determining EOP from exhaustion:
• Other ways of determining EOP:
Nullability
• Nullability in the valuator:
• Assigning semantics to nulled symbols:
• Evaluating nulled symbols:
• Example of nulled symbol:
• Duplicate nulled nodes:
Failure
• Libmarpa's approach to failure:
• User non-conformity to specified behavior:
• Classifying failure:
• Memory allocation failure:
• Undetected failure:
• Irrecoverable hard failure:
• Partially recoverable hard failure:
• Library-recoverable hard failure:
• Ancestry-recoverable hard failure:
• Fully recoverable hard failure:
• Soft failure:
• Error codes:
Introduction to the method descriptions
• About the overviews:
• Naming conventions:
• Return values:
• How to read the method descriptions:
Grammar methods
• Grammar overview:
• Grammar constructor:
• Grammar reference counting:
• Symbol methods:
• Rule methods:
• Sequence methods:
• Rank methods:
• Grammar precomputation:
Recognizer methods
• Recognizer overview:
• Creating a new recognizer:
• Recognizer reference counting:
• Recognizer life cycle mutators:
• Location accessors:
• Other parse status methods:
Progress reports
• Progress report traverser:
• Progress report methods:
Bocage methods
• Bocage overview:
• Bocage data structure:
• Bocage constructor:
• Bocage reference counting:
• Bocage accessor:
Ordering methods
• Ordering overview:
• Freezing the ordering:
• Ordering constructor:
• Ordering reference counting:
• Order accessor:
• Non-default ordering:
Tree methods
• Tree overview:
• Tree constructor:
• Tree reference counting:
• Iterating through the trees:
Value methods
• Value overview:
• How to use the valuator:
• Advantages of step-driven valuation:
• Maintaining the stack:
• Valuator constructor:
• Valuator reference counting:
• Stepping through the valuator:
• Valuator step types:
• Basic step accessors:
• Step location accessors:
Maintaining the stack
• Sizing the stack:
Events
• Events overview:
• Event codes:
• Basic event accessors:
• Completion events:
• Symbol nulled events:
• Prediction events:
• Symbol expected events:
• Recognizer per-symbol events:
• Tentative events:
• Ambiguous parses and events:
• Nulled subtrees and events:
• Event coincidence of symbol instances:
• Event coincidence of symbols:
• Marker symbols:
• Per-rule events:
Event coincidence of symbol instances
• Nulled and completed symbol instances:
• Nulled and predicted symbol instances:
• Nulled and expected symbol instances:
• Completed and predicted symbol instances:
• Completed and expected symbol instances:
• Predicted and expected symbol instances:
Per-rule events
• Workaround with a dedicated LHS symbol:
Tracing and diagnosing parses
• Listing symbols:
• Listing rules:
• Listing Earley sets:
• Tracing tokens:
• Parse diagnosis checklist:
Error methods, macros and codes
• Error methods:
• Error Macros:
• External error codes:
• Internal error codes:
Technical notes
• Data types used by Libmarpa:
• Why so many time objects:
• Design of numbered objects:
• Trap representations:
• Out-of-memory handling:
Advanced input models
• The dense variable-length token model:
• The fully general input model:
• The codepoint-per-earleme model:
• Converting earleme to Earley set ID:
Futures
• Nulling versus nulled:
• Document pre-conditions more formally:
• Simpler events interface:
• Better defined ambiguity metric:
• Report item traverser should be a time object:
• Orthogonal treatment of soft failures:
• Orthogonal treatment of exhaustion:
• Furthest earleme values:
• Additional recoverable failures in marpa_r_alternative():
• Untested methods:
Untested methods
• Zero-width assertion methods:
• Methods for revising parses:
Deprecated techniques and methods
• LHS terminals:
• Valued and unvalued symbols:
• Cycles:
LHS terminals
• Overview of LHS terminals:
• Motivation of LHS terminals:
• LHS terminal methods:
• Precomputation and LHS terminals:
• Nulling terminals:
Valued and unvalued symbols
• What unvalued symbols were:
• Grammar methods dealing with unvalued symbols:
• Registering semantics in the valuator:
Annotated bibliography
• Aho and Ullman 1972:
• Aycock and Horspool 2002:
• C89:
• C99:
• Dominus 2005:
• Earley 1970:
• Grune and Jacobs 1990:
• Grune and Jacobs 2008:
• Marpa theory paper:
• Kegler 2023:
• Parsing timeline:
• Leo 1991:
• Scott 2008:
• Wikipedia BNF article:
Acknowledgements
• An apology:

1 License

This manual (30 April 2023) is for Libmarpa 11.0.10.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

2 Updates

For important information that has changed since the last stable release, there is an “updates” document (https://github.com/jeffreykegler/libmarpa/blob/updated/UPDATES.md). The updates document includes

descriptions of bugs in the latest stable release;
notices which are useful to current users, but which do not justify a full new stable distribution; and
other information that we want to be able to update without issuing a new stable release.

To allow that information to be kept current without issuing a new stable release, we describe how to obtain support in the updates document. See Support.

3 About this document

3.1 How to read this document

This is essentially a reference document, but its early chapters lay out concepts essential to the others. Readers will usually want to read the chapters up and including Introduction to the method descriptions in order. Otherwise, they should follow their interests.

3.2 Prerequisites

This document is very far from self-contained. It assumes the following:

The reader knows the C programming language at least well enough to understand function prototypes and return values.
The reader has read the documents for one of Libmarpa’s upper layers. As of this writing, the only such layer is Marpa::R2 or Marpa::R3, in Perl.
The reader knows some parsing theory. See Parsing theory preliminaries.

4 Overview of Libmarpa

This chapter contains a quick overview of Libmarpa, using standard parsing terminology. It is intended to help a prospective reader of the whole document to know what to expect. Details and careful definitions will be provided in later chapters.

Libmarpa implements the Marpa parsing algorithm. Marpa is named after the legendary 11th century Tibetan translator, Marpa Lotsawa. In creating Marpa, we depended heavily on previous work by Jay Earley, Joop Leo, John Aycock and Nigel Horspool.

Marpa parses any language whose grammar can be written in BNF. That includes recursive grammars, ambiguous grammars, infinitely ambiguous grammars and grammars with useless or empty productions. Marpa does both left- and right-recursion in linear time – in fact if a grammar is in any class currently in practical use, Marpa will parse it in linear time. See Marpa theory paper.

Libmarpa implements the entire Marpa algorithm. This library does the necessary grammar preprocessing, recognizes the input, and produces a “bocage”, which is an optimized parse forest. Libmarpa also supports the ordering, iteration and evaluation of the parse trees in the bocage.

Libmarpa is very low-level. For example, it has no strings. Rules, symbols, and token values are all represented by integers. This, of course, will not suffice for many applications. Users will very often want names for the symbols, non-integer values for tokens, or both. Typically, applications will use arrays to translate Libmarpa’s integer ID’s to strings or other values as required.

Libmarpa also does not implement most of the semantics. Libmarpa does have an evaluator (called a “valuator”), but it does not manipulate the stack directly. Instead, Libmarpa, based on its traversal of the parse tree, passes optimized step by step stack manipulation instructions to the upper layer. These instructions indicate the token or rule involved, and the proper location for the true token value or the result of the rule evaluation. For rule evaluations, the instructions include the stack location of the arguments.

Marpa requires most semantics to be implemented in the application. This allows the application total flexibility. It also puts the application is in a much better position to prevent errors, to catch errors at runtime or, failing all else, to successfully debug the logic.

5 Terms, definitions and notation

5.1 Miscellaneous definitions

Application means an “application” of Libmarpa. In this document, a Libmarpa application is not necessarily an application program. For our purposes, an “application” might be another library that uses Libmarpa.
The components of the application environment are the system components other than Libmarpa and the application itself. Application environments vary, but their components often include an operating system, system libraries, a C compiler and its libraries, and hardware.
A boolean value, or boolean, is an integer that is 0 or 1.
An application environment limit, more often called simply an environment limit, is a limit imposed by the application enviroment. For our purposes, the most important environment limit is usually available memory.
The lexeme iff is an abbreviation for “if and only if”.
max(x,y) is the maximum of x and y, where x and y are two numbers.
Libmarpa method, or just method, means a C function or a function-like macro of the Libmarpa library.
User means a “user” of the Libmarpa library. A user of the library is also a programmer, so that in this document, “user” and “programmer” are essentially synonyms.
We (as well as “us” and ”our”) refers to the authors, but is also intended to suggest that the reader is joining the authors on the millenia-old voyage of discovery into mathematics and language. As of this writing, there is a one primary author.

The following terms are used in a way that is consistent with their use in the C99 standard:

An indeterminate value is either an unspecified value or a trap value.
A trap value, also called a trap representation, is a value that when accessed causes undefined behavior. See Trap representations.
An undefined behavior is a behavior that this document does not specify. One implication is that this behavior might be problematic.
An unspecified behavior is a behavior that, within a range of possibilities, is not further specified by this document. This is usually not problematic.
An unspecified value is a value on which this document imposes no restrictions, except that it cannot be a trap value.

5.2 Parsing theory preliminaries

This document assumes the reader is familiar with parsing theory. The following exposition is not intended as either an introduction or a reference. Instead, it is intended to serve as a guide to the definitions of parsing terms as used in this document.

The definitions given are intended to be applicable within Libmarpa, rather than to reflect general usage. For example, Marpa sometimes uses a standard term with a definition that is different from the standard definition. “Ambiguous grammar” is one example: See Ambiguity. The term “terminal” is another. See Terminals. When a standard term is defined in a non-standard way. this is explicitly pointed out.

Readers who want a textbook or tutorial in parsing theory can look at Mark Jason Dominus’s excellent chapter on parsing in the Perl context. See Dominus 2005. It is available on-line. Wikipedia is also an excellent place to start. See Wikipedia BNF article.

A grammar is a set of rules, associated with a set of symbols, one of which is distinguished as the start symbol. A symbol string, or simply string where the meaning is clear, is an ordered series of symbols. The length of a string is the number of symbols in it. A symbol string is also called a sentential form.

Some of the symbols are terminals. For the the moment, we will say that a terminal is a symbol which may occur in an input to a parse of a grammar. A careful definition appropriate specifically to Marpa is below (see Terminals). In a parse, an input is either accepted or rejected. A potential input string, that is, a sentential form which is made up entirely of terminal symbols, is called a sentence. The set of sentences that a grammar accepts is the language of the grammar.

It is important to note that the term language, as it is used in parsing theory, means something very different from what it means in ordinary use. The meaning of the strings is an essential part of the ordinary idea of what a language is. In parsing terminology, meaning (or semantics as it is called) is a separate issue. For parsing theory a language is exactly a set of strings – that and nothing more.

A recognizer is a program that determines whether its input is in the language of a grammar. A parser is a program which finds the structure of that input.

Parsers often use a lexical analyzer to convert raw input, usually input text, into a token stream, which is a series of tokens. Each token represents one or more symbols of the grammar and has a value. (Libmarpa tokens may be ambiguous. See Ambiguous input.) A token is also sometimes called a lexeme. A lexical analyzer is often called a lexer or a scanner, and lexical analysis is often called lexing or scanning.

The series of symbols represented by the series of tokens becomes the symbol string input seen by the recognizer. The symbol string input is more often called the input sentence, or simply the input.

5.3 Rules

A standard way of describing rules is Backus-Naur Form, or BNF. In one common way of writing BNF, a rule looks like this:

    Expression ::= Term Factor

In the rule above, Expression, Term and Factor are symbols. A rule consists of a left hand side and a right hand side. In a context-free grammar, like those Marpa parses, the left hand side of a rule is always a symbol string of length 1. The right hand side of a rule is a symbol string of zero or more symbols. In the example, Expression is the left hand side, and Term and Factor are right hand side symbols.

Left hand side and right hand side are often abbreviated as RHS and LHS. If the RHS of a rule has no symbols, the rule is called an empty rule or an empty rule.

The length of a rule is the length of its RHS string. This implies that the length of an empty rule is zero.

In a standard grammar, all rules are BNF rules, as just described. Marpa grammars differ from standard grammars in allowing a second kind of rule: a sequence rule. The RHS of a sequence rule is a single symbol, which is repeated zero or more times. Libmarpa allows the application to specify other parameters for sequence rules, including a separator symbol. See Sequence rules.

5.4 Derivations

A step of a derivation, or derivation step, is a change made to a symbol string by applying one of the rules from the grammar. The rule must be one of those with a LHS that occurs in the symbol string. The result of the derivation step is another symbol string, one in which an occurrence of the LHS symbol from the rule is replaced by the RHS of the rule. For example, if A, B, C, D, and X are symbols, and

    X ::= B C

is a rule, then

    A X D -> A B C D

is a derivation step,

with “A X D” as its beginning,
“A B C D” as its end or result, and
X ::= B C as its rule.

A derivation is a sequence of derivation steps. The length of a derivation is its length in steps.

A string X derives a string Y iff there is a derivation of zero or more steps which begins with the string X and ends in the string Y. In the example above (see derivation-example), we say that the symbol string “A X D” derives the symbol string “A B C D” in one step.
We say that a first symbol string directly derives a second symbol string iff there is a derivation of length 1 from the first symbol string to the second symbol string. In the example above (see derivation-example), we say that the symbol string “A X D” directly derives the symbol string “A B C D”.
Every symbol string is said to derive itself in a derivation of length 0. A zero length derivation is a trivial derivation.
A derivation which is not trivial (that is, a derivation which has one or more steps) is a non-trivial derivation.
If a derivation is not trivial or direct, that is, if it has more than one step, then it is an indirect derivation.

Pedantically, a symbol X and a string that consists of only that symbol are two different things. But we often say “the symbol X”, or “X”, as shorthand for “the string of length 1 whose only symbol is X”. For example, if the string containing only the symbol X derives a string Y, we will usually say simply that “X derives Y”.

Wherever symbol or string X derives Y, we may also say X produces Y. Derivations are often described as symbol matches. Wherever symbol or string X derives Y, we may also say that Y matches X or that X matches Y. It is particularly common to say that X matches Y when Y is a sentence.

The parse of an input by a grammar is successful iff, according to the grammar, the start symbol produces the input sentence. The set of all input sentences that a grammar will successfully parse is the language of the grammar.

5.5 Nulling

The zero length symbol string is called the empty string. The empty string can be considered to be a sentence, in which case it is the empty sentence. A string of one or more symbols is non-empty. A derivation which produces the empty string is a null derivation. A derivation from the start symbol which produces the empty string is a null parse.

If a symbol produces the empty string, it is a nullable symbol. If the only sentence produced by a symbol is the empty sentence, it is a nulling symbol. All nulling symbols are nullable symbols.

If a symbol is not nullable, it is non-nullable. If a symbol is not nulling, it is non-nulling.

A rule is nullable iff it is the rule of the first step of a null derivation. A rule is nullable iff its LHS symbol is nullable.

A rule r is nulling iff every derivation whose first step has r as its rule is a null derivation. A rule is nulling iff its LHS symbol is nulling.

If a rule is not nullable, it is non-nullable. If a rule is not nulling, it is non-nulling.

5.6 Useless rules

If any derivation from the start symbol uses a rule, that rule is called reachable or accessible. A rule that is not accessible is called unreachable or inaccessible. A symbol is reachable or accessible iff it appears in a accessible rule. If a symbol is not accessible, it is unreachable or inaccessible.

If any derivation which results in a sentence uses a rule, that rule is said to be productive. A rule that is not productive is called unproductive. A rule is productive iff every symbol on its RHS is productive. An empty rule is productive, vacuously.

A symbol is productive iff it is a terminal or it is the LHS of a productive rule. A symbol that is not productive is called unproductive.

These definitions imply that all nullable rules are productive. They also imply that every nullable symbol is productive, because every nullable symbol must be on the LHS of a nullable rule.

A rule which is inaccessible or unproductive is called a useless rule. A symbol which is inaccessible or unproductive is called a useless symbol. Marpa can handle grammars with useless rules and symbols.

5.7 Recursion and cycles

If any symbol in the grammar non-trivially produces a symbol string containing itself, the grammar is said to be recursive. If any symbol non-trivially produces a symbol string in which it is the leftmost symbol, the grammar is said to be left-recursive. If any symbol non-trivially produces a symbol string in which it is the rightmost symbol, the grammar is said to be right-recursive. Marpa can handle all recursive grammars, including grammars which are left-recursive, grammars which are right-recursive, and grammars which contain both left- and right-recursion.

A cycle is a non-trivial derivation of a string of symbols from itself. If it is not possible for any derivation using a grammar to contain a cycle, then that grammar is said to be cycle-free. Traditionally, a grammar is considered useless if it is not cycle-free.

The traditional deprecation of cycles is well-founded. A cycle is the parsing equivalent of an infinite loop. Once a cycle appears, it can be repeated over and over again. Even a very short input sentence can have an infinite number of parses when the grammar is not cycle-free.

For that reason, a grammar which contains a cycle is also called infinitely ambiguous. Marpa can parse with grammars which are not cycle-free, but use of this feature is deprecated. See Cycles.

5.8 Trees

In this document, unless otherwise stated,

by tree, we mean a labeled ordered tree; and
by tree node, we mean a labeled ordered tree node.

For brevity, in contexts where the meaning is clear, we refer to a tree node simply as a node. A node is a pair:

The first element of a node is a “label tuple”. The label tuple is a triple of symbol ID, start Earley set ID, and end Earley set ID. For more about the Earley set IDs, see Input.
The second element of a node is a sequence of nodes. Our definition of a tree node is therefore recursive.

When looked at from the point of view of its labels, a node is often called an instance. In the following list of definitions and assertions, let

     nd = < < sym, start, end >, children >

be a tree node.

We say that sym is the symbol of nd.
We say that nd is an instance of the symbol with ID sym starting at start and ending at end. We often write this as sym@start-end.
We say that nd is an instance of the symbol with ID sym at location end.
We say that the length of nd is the difference between its start and end, that is end-start.
The length of nd is zero iff start is the same as end. Put another way, the length of nd is zero iff start = end.
We say that the elements of children are the children of nd.
We say that every element of children is a child of nd.
For brevity, we say that the symbol sym is at end. We note that this means we consider the location of a symbol to be where it ends.
nd is a leaf node iff children is the empty list. A leaf node is also called a leaf.
nd is a rule node iff it is not a leaf node.
Every node is either a leaf node or a rule node. No node is both a leaf and a rule node.
We say that nd is a terminal node iff nd is a leaf node and sym is a terminal. A terminal node is also called a token node.
We say that nd is a nulled node iff nd is a leaf node and sym is not a terminal. A nulled node is also called a nulling node.
Every leaf node is either a nulled node or a terminal node. But, because nullable LHS terminals are not allowed, no node is both nulled and terminal.
We say that nd is a BNF node iff nd is not a terminal node and sym is the LHS of a BNF rule.
We say that nd is a sequence node iff nd is not a terminal node and sym is the LHS of a sequence rule.
Every node is a terminal node, a BNF node or a sequence node. But no node is more than one of the these three. This is because sequence rules never share a LHS with a BNF rule, and no BNF node or sequence node is a terminal node.
If nd is a rule node, its LHS is sym.
If nd is a rule node, its RHS is the concatenation, from first to last, of the symbols of the nodes in children.
All nulled nodes are zero-length. No terminal node is zero-length.
We say that nd is an instance of sym starting at start and ending at end. We also say that nd is an instance of sym at end or, simply, that nd is an instance of sym.
Let r be the BNF rule whose LHS is equal to the LHS of nd, and whose RHS is equal to the RHS of nd. If nd is a BNF rule node, there must be such a rule. In that case, we say that nd is an instance of r starting at start and ending at end. We often write this as r@start-end. We also say that nd is an instance of r at end or, simply, that nd is an instance of r.
Let r be the sequence rule whose LHS is equal to the LHS of nd. If nd is a sequence rule node, there must be such a rule. In that case, we say that nd is an instance of r starting at start and ending at end. We often write this as r@start-end. We also say that nd is an instance of r at end or, simply, that nd is an instance of r.
If nd is a nulled instance, we say that sym is nulled at location end or, simply, that the symbol sym is nulled. We can write this as nulled@end-end.

Let nd1 and nd2 be two nodes. If nd2 is a child of nd1, then nd1 is the parent of nd2.

We define ancestor recursively such that nd1 is the ancestor of a node nd2 iff one of the following are true:

nd1 and nd2 are the same node. In this case we say that nd1 is the trivial ancestor of nd2.
nd1 is the parent of an ancestor of nd2. In this case we say that nd1 is a proper ancestor of nd2.

Simlarly, we define descendant recursively such that nd1 is the descendant of a node nd2 iff one of the following are true:

nd1 and nd2 are the same node. In this case we say that nd1 is the trivial descendant of nd2.
nd1 is the parent of an descendant of nd2. In this case we say that nd1 is a proper descendant of nd2.

A tree is its own root node. That implies that, in fact, tree and node are just two different terms for the same thing. We usually speak of trees when we are thinking of the nodes/trees as a collection of nodes, and we speak of nodes when we are more focused on the individual nodes.

We sometimes call a root node, a start node. We do this especially in contexts where a tree is “top-level”, that is, not the child of any other tree under discussion.

If t1 and t2 are trees, and t2 (considered as a node) is a descendant of t1, then we say t2 is a subtree of t1. By this definition, every tree is a subtree of itself.

A parse forest is a set of one or more parse trees.

We use “parse” as a noun in several senses. Depending on context a parse may be

a parse run, that is, an instance of running a parser on an input using a grammar;
a parse tree; or
a parse forest.

When the meaning of “parse” is not clear in context, we will be explicit about which sense is intended.

5.9 An example tree

     < < "S", 0, 7 >, [
         < < "A", 0, 2 >, [
             < < "T", 0, 1 >, [ ] >,
             < < "U", 1, 2 >, [ ] >
         ] >,
         < < "B", 2, 2 >, [ ] >,
         < < "C", 2, 3 >, [
             < < "V", 2, 3 >, [ ] >
         ] >,
         < < "D", 3, 5 >, [
             < < "W", 3, 4 >, [ ] >,
             < < "X", 4, 5 >, [ ] >
         ] >,
         < < "E", 5, 7 >, [
             < < "Y", 5, 6 >, [ ] >,
             < < "Z", 6, 7 >, [ ] >
         ] >
      ] >

In the tree above, we use symbol names (for example, "S") as symbol IDs. Since we have not used the same symbol twice, we will be able to refer to a node by its symbol name. In practical parse trees, symbols usually occur many times.

Node S has 5 children. Node A has 2 children. Node B has no children and is a nulled node. Nodes T, U, V, W, X, Y, and Z are terminal nodes. Nodes S, A, C, D, and E are BNF rule nodes. The tree implies a grammar with at least the following rules:

    S ::= A B C D E
    A ::= T U
    B ::=
    C ::= V
    D ::= W X
    E ::= Y Z

The tree derives the sentence

      T U V W X Y X

and can do so in several ways. One is these is its rightmost derviation:

    S -> A B C D E
      -> A B C D Y Z
      -> A B C W X Y Z
      -> A B V W X Y Z
      -> A V W X Y Z
      -> T U V W X Y Z

This is called the rightmost derviation because, at each derivation step, the symbol that is expanded is the rightmost symbol possible. There is also a leftmost derivation:

    S -> A B C D E
      -> T U B C D E
      -> T U C D E
      -> T U V D E
      -> T U V W X E
      -> T U V W X Y Z

5.10 Traversal

We often want to “visit” all the nodes of a tree in a sequence, or traverse the tree. In the semantics phase, it is often necessary to visit all the children of a node before the node itself. We can do this using a postorder traversal traversal. Here is a recursive procedure for a postorder traversal of the tree nd:

         If nd is not well-defined, return
         Traverse the children of nd, in order
         Visit nd

A postorder traversal is also often called a depth-first traversal, or a bottom-up traversal. A postorder traversal of our example tree (see An example tree) visits the nodes in the following order:

    T U A B V C W X D Y Z E S

5.11 Semantics terms

Rarely is an application interested only in the tree. Traditionally and most often, the tree is an intermediate step in producing the “value” of the parse run. A value of a parse run is a “meaning” of the input string.

Grammars almost always have a semantics associated with them. The purpose of parsing is to use the grammar and its semantics to find the value of the input string. Finding the value of a parse tree is called evaluating a tree. More loosely, we also speak of evaluating the parse run, or evaluating the input string.

We recall that Libmarpa provides instructions for stack manipulation. See Value methods. Libmarpa does not directly do evaluation because Libmarpa is usually used together with a higher-level language, and the semantics is first known, and the evaluation is most easily performed, in the higher-level language.

In the most common method of evaluating a parse tree, every node has a value associated with it. An application might determine the value of a terminal node from

the node’s symbol;
a “token value”, which Libmarpa allows the user to specify for each input symbol at the time the input symbol is recognized;
context information, including the node’s location information; or
any combination of the above.

In determining the value of nulled nodes, the application may also consider the node’s symbol and context information. Libmarpa prunes nulled subtrees back to their root, which usually is what the application expects. The evaluation of nulled nodes is discussed in detail in the chapter on nullability. See Nullability.

To find the value of a rule node, an application may use the rule, context information, and the values of the rule node’s children. Let nd be a rule node, and let childCount be the number of children of nd. Frequently, the semantics of nd is implemented as a “semantic function”. The arguments of the semantic function are typically a scratchpad variable, call it scratch, followed by the values of the children of nd, so that the number of arguments for the function is childCount+1. The return value of the semantic function becomes the value of nd.

Semantic functions often have side effects. For example, a grammar for a language often wants its semantic functions to build and refer to a symbol table. The scratch variable allows for this.

Implementing this traditional method of tree evaluation requires a bottom-up traversal of the parse tree. In the final step of a bottom-up traversal, the start node is evaluated, and the value of the start node becomes the value of the parse run.

5.12 Ambiguity

In our discussion of evaluation above (see Semantics terms), we spoke of evaluating trees. In fact, the result of a successful Libmarpa parse run is a parse forest. We say that a Libmarpa parse run is ambiguous iff the parse run returns a forest containing more than one parse tree. We say that a succcessful Libmarpa parse run is unambiguous iff it contains exactly one parse tree.

Most applications care only about one parse tree. An application is free to decide what to do in case a parse forest is ambiguous. Among the application’s options are picking one tree and evaluating that tree; iterating through the parse trees; or treating the ambiguity as an error.

Libmarpa’s treatment of ambiguity differs somewhat from the traditional one, for two reasons. First, Libmarpa allows ambiguous tokens. Second, Libmarpa prunes nulled subtrees back to their topmost nulled symbol.

An ambiguous token occurs when the same lexeme can be recognized as more than one token symbol. Traditionally, parsers do not allow this, but it is the most natural way of parsing some real-life applications. For example, natural languages often allow words to be used as more than one part of speech. In this document, the English word “parse” is used as an adjective, a verb, and a noun, and each of these uses is common. Ambiguous tokens are a source of ambiguity not present in traditional parsing. See Ambiguous input.

On the other hand, the pruning of nulled subtrees eliminates a source of ambiguity that is present in traditional parsing. Multiple nulled subtrees may share their topmost nulled symbol. When the subtrees are pruned to their shared root, ambiguity is removed. See Nullability.

5.13 Earley items

We assume knowledge of Earley’s algorithm. This section is intended as a reminder, and to set out our choices for notation and terminology.

A dotted rule is a duple of rule and position in the rule. The position must be an non-negative integer that is not greater than the length of the rule. For example,

     [ [ A ::= X Y Z ], 2 ]      (DR1)

is a dotted rule whose rule is [ A ::= X Y Z ] and whose position is 2. More often a dotted rule is written with the position marked in the RHS of the rule. Here we use “•” as our marker, so that in (DR1) the “•” would come after the 2nd symbol on the RHS, and (DR1) would be written

     [ A ::= X Y • Z ].

Traditionally the marker is a raised dot, and this is where the term “dotted rule” comes from. The position of the dotted rule is almost always called the dot position.

An Earley item is a triple of dotted rule, origin and current parse location.¹ The Earley item makes a statement about the progress of a parse. The origin is the parse location where recognition of the dotted rule begins. The current location is the parse location which recognition of the dotted rule has reached. Symbols before the dot position of the dotted rule have been recognized, and symbols after the dot position of the dotted rule have yet to be recognized.

For example, in the Earley item

     [ [ A ::= X Y • Z ], 7, 42],

the dotted rule [ A ::= X Y • Z ] was recognized as far as the dot position, starting at parse location 7 and ending at parse location 42. More precisely,

the rule [ A ::= X Y Z ] was expected beginning at parse location 7;
the string of symbols “X Y” has been recognized starting at parse location 7, which implies that the symbol X has been recognized starting at parse location 7;
the string of symbols “X Y” has been recognized ending at parse location 42, which implies that the symbol Y has been recognized ending at parse location 42; and
the symbol Z has yet to recognized, but is expected starting at parse location 42.

The Earley set at location j is the set of Earley items whose current location is j. For example, the Earley item [ [ A ::= X Y • Z ], 7, 42] is in the Earley set at location 42.

We say that a dotted rule is completed iff its dot position is after the last symbol. We say that an Earley item is completed iff its dotted rule is completed. For example, the Earley item

     [ [ A ::= X Y Z • ], 7, 50],

is completed. A completed dotted rule is also called a completion. Similarly, a completed Earley item is also called a completion.

We say that a dotted rule is predicted iff its dot position is before the first symbol. We say that an Earley item is predicted iff its dotted rule is predicted. For example, the Earley item

     [ [ A ::= • X Y Z ], 7, 7],

is predicted. A predicted dotted rule is also called a prediction. Similarly, a predicted Earley item is also called a prediction.

The postdot symbol of a dotted rule is the RHS symbol after the dot position. For example, in the dotted rule,

     [ A ::= X Y • Z ]

the postdot symbol is Z. If the dotted rule is a completion, its postdot symbol is not defined.

The postdot symbol of an Earley item is the postdot symbol of its dotted rule. The postdot symbol of an Earley item is an expected symbol at its current location. For example, the Earley item

     [ [ A ::= X Y • Z ], 7, 42],

indicates that the symbol Z is expected at location 42.

Let sym be a symbol, and let j be a parse location. Unless sym is expected at j, that is, unless sym is a postdot symbol of some Earley item in the Earley set at j, sym will be rejected by the marpa_r_alternative() method at current location j. See marpa_r_alternative.

5.14 Application and diagnostic behavior

An application behavior is a behavior on which it is intended that the design of applications will be based. In this document, a behavior is an application behavior unless otherwise stated. We sometimes say that “applications may expect” a certain behavior to emphasize that that behavior is an application behavior.

After an irrecoverable failure, the behavior of a Libmarpa application is undefined, so that there are no behaviors that can be relied on for normal application processing, and therefore, there are no application behaviors. In this circumstance, some of the application behaviors become diagnostic behaviors. A diagnostic behavior is a behavior that this document suggests that the programmer may attempt in the face of an irrecoverable failure, for purposes of testing, diagnostics and debugging. Diagnostic behaviors are hoped for, rather than expected, and intended to allow the programmer to deal with irrecoverable failures as smoothly as possible. (See Failure.)

In this document, a behavior is a diagnostic behavior only if that is specifically indicated. Applications should not be designed to rely on diagnostic behaviors. We sometimes say that the application “may attempt” a certain behavior to emphasize that that behavior is a diagnostic behavior.

6 Architecture

6.1 Major objects

The classes of Libmarpa’s object system fall into two types: major and numbered. These are the Libmarpa’s major classes, in sequence.

Configuration: A configuration object is a thread-safe way to hold configuration variables, as well as the return code from failed attempts to create grammar objects.
Grammar: A grammar object contains rules and symbols, with their properties.
Recognizer: A recognizer object reads input.
Bocage: A bocage object is a collection of parse trees, as found by a recognizer. A bocage is a way of representing a parse forest.
Ordering: An ordering object is an ordering of the trees in a bocage.
Tree: A tree object is a bocage iterator.
Value: A value object is a tree iterator. Iteration of a tree using a value object produces “steps”. These “steps” are instructions to the application on how to evaluate the semantics, and how to manipulate the stack.

The major objects have one letter abbreviations, which are used frequently. These are, in the standard sequence,

Configuration: C
Grammar: G
Recognizer: R
Bocage: B
Ordering: O
Tree: T
Value: V

6.2 Time objects

All of Libmarpa’s major classes, except the configuration class, are time classes. An object in a time class is a time object. Except for objects in the grammar class, all time objects are created from a time object of the class before it in the sequence. A recognizer cannot be created without a precomputed grammar; a bocage cannot be created without a recognizer; and so on.

When one time object is used to create a second time object, the first time object is the parent object and the second time object is the child object. For example, when a bocage is created from a recognizer, the recognizer is the parent object, and the bocage is the child object.

Grammars have no parent object. Every other time object has exactly one parent object. Value objects have no child objects. All other time objects can have any number of children, from zero up to a maximum determined by memory availability or some other environment limit.

An object is the ancestor of another object if it is the parent of that object, or if it is the parent of an ancestor of that object. An object is the descendant of another object if it is the child of that object, or if it is the child of an descendant of that object. The following three statements are mutually exclusive:

Object X is of class C.
Object X has an ancestor of class C.
Object X has a descendant of class C.

It follows from the definitions of “parent” and “ancestor” that, for any time object class, an object can have at most one ancestor of that class. On the other hand, if an object has descendants in a class, there can be many of them.

An object is a base of another object, if it is that object, or if it is the ancestor of the object. For each time object class, an object has at most one base object. For example, a recognizer is its own base recognizer, and has exactly one base grammar.

The base grammar of a time object is of special importance. Every time object has a base grammar. A grammar object is its own base grammar. The base grammar of a recognizer is its parent grammar, the one that it was created with. The base grammar of any other time object is the base grammar of its parent object. For example, the base grammar of a bocage is the base grammar of the recognizer that it was created with.

6.3 Reference counting

Every object in a “time” class has its own, distinct, lifetime, which is controlled by the object’s reference count. Reference counting follows the usual practice. Contexts that take a share of the “ownership” of an object increase the reference count by 1. When a context relinquishes its share of the ownership of an object, it decreases the reference count by 1.

Each class of time object has a “ref” and an “unref” method, to be used by those contexts that need to explicitly increment and decrement the reference count. For example, the “ref” method for the grammar class is marpa_g_ref() and the “unref” method for the grammar class is marpa_g_unref().

Time objects do not have explicit destructors. When the reference count of a time object reaches 0, that time object is destroyed.

Much of the necessary reference counting is performed automatically. The context calling the constructor of a time object does not need to explicitly increase the reference count, because Libmarpa time objects are always created with a reference count of 1.

Child objects “own” their parents, and when a child object is successfully created, the reference count of its parent object is automatically incremented to reflect this. When a child object is destroyed, it automatically decrements the reference count of its parent.

In a typical application, a calling context needs only to remember to “unref” each time object that it creates, once it is finished with that time object. All other reference decrements and increments are taken care of automatically. The typical application never needs to explicitly call one of the “ref” methods.

More complex applications may find it convenient to have one or more contexts share ownership of objects created in another context. These more complex situations are the only cases in which the “ref” methods will be needed.

6.4 Numbered objects

In addition to its major, “time” objects, Libmarpa also has numbered objects. Numbered objects do not have lifetimes of their own. Every numbered object belongs to a time object, and is destroyed with it. Rules and symbols are numbered objects. Token values are another class of numbered objects.

7 Input

7.1 Parse location

7.1.1 The traditional input model

In traditional Earley parsers, the concept of location is very simple. Locations are numbered from 0 to n, where n is the length of the input. Every location has an Earley set, and vice versa. Location 0 is the start location. Every location after the start location has exactly one input token associated with it.

Some applications do not fit this traditional input model — natural language processing requires ambiguous tokens, for example. Libmarpa allows a wide variety of alternative input models.

7.1.2 Earlemes and Earley set IDs

Libmarpa’s idea of parse location, is the earleme, but Libmarpa also tracks Earley sets by ordinal number. In Libmarpa’s basic input models, the earleme of an Earley set is always equal to the ordinal of the Earley set, so the distinction between Earley set ordinal and earleme is pedantic. But Libmarpa’s advanced input models allow sparse input and, when sparse input is in use, an Earley set’s ordinal and its earleme may differ. See The fully general input model.

The earlemes are assigned consecutive non-negative integers, starting at 0. The highest valued earleme is called the furthest earleme. See The furthest earleme. The other important earleme values are the latest earleme (see The latest earleme) and the current earleme (see The current earleme). Latest, current and furthest earleme, when they have specified values, obey a lexical order in this sense: The latest earleme is always at or before the current earleme, and the current earleme is always at or before the furthest earleme.

The Earley set ordinal is also known as the ID of the Earley set, or Earley set ID. Earley set IDs are consecutive non-negative integers, starting at 0. The highest valued Earley set ID is that of the latest Earley set. See The latest Earley set

7.1.3 The latest Earley set

The latest Earley set is the Earley set completed most recently. This is initially the Earley set at location 0. The latest Earley set is always the Earley set with the highest ID, and the Earley set with the highest earleme location.

The latest Earley set is set implicitly by the marpa_start_input() (see marpa_r_start_input) and marpa_r_earleme_complete() (see marpa_r_earleme_complete) methods. Applications can access its value via the marpa_r_latest_earley_set() method. See marpa_r_latest_earley_set.

7.1.4 The latest earleme

The latest earleme is the earleme of the latest Earley set. If there is an Earley set at the current earleme, it is the latest Earley set and the latest earleme is equal to the current earleme. There is never an Earley set after the current earleme, and therefore the latest Earley set is never after the current earleme.

The latest earleme is different from the current earleme if and only if there is no Earley set at the current earleme. A different end of parse can be specified, but by default, parsing is of the input in the range from earleme 0 to the latest earleme. See End of parse.

The latest earleme is set implicitly by the marpa_start_input() (see marpa_r_start_input) and marpa_r_earleme_complete() (see marpa_r_earleme_complete) methods. Applications can access the value of the latest earleme by first calling the marpa_r_latest_earley_set() method (see marpa_r_latest_earley_set), then converting its return value with the marpa_r_earleme() method (see marpa_r_earleme method).

7.1.5 The current earleme

The current earleme is the earleme that Libmarpa is currently working on. More specifically, it is the one at which new tokens will start. Since tokens are never zero length, a new token will always end after the current earleme. marpa_r_start_input() initializes the current earleme to 0, and every call to marpa_r_earleme_complete() advances the current earleme by 1.

The current earleme is set implicitly by the marpa_start_input() (see marpa_r_start_input) and marpa_r_earleme_complete() (see marpa_r_earleme_complete) methods. Applications can access the value of the current earleme via the marpa_r_current_earleme() method (see marpa_r_current_earleme).

7.1.6 The furthest earleme

Loosely speaking, the furthest earleme is the furthest earleme reached by the parse. More precisely, it is the highest numbered earleme at which a token ends and is 0 if there are no tokens. The furthest earleme is 0 when a recognizer is created. With every call to marpa_r_alternative(), the end of the token it adds is calculated. A token ends at the earleme location current+length, where current is the current earleme, and length is the length of the newly added token. If old_f is the furthest earleme before a call to marpa_r_alternative(), the furthest earleme after the call is max(old_f, current+length).

The furthest earleme is set implicitly by the marpa_r_new() (see marpa_r_new) and the marpa_r_alternative() (see marpa_r_alternative) methods. Applications can access the value of the furthest earleme via the marpa_r_furthest_earleme() method (see marpa_r_furthest_earleme).

In the basic input models, where every token has length 1, calling marpa_r_earleme_complete() after each marpa_r_alternative() call is sufficient to process all inputs, and the furthest earleme’s value can be ignored. In alternative input models, where tokens have lengths greater than 1, calling marpa_r_earleme_complete() once after the last token is read may not be enough to ensure that all tokens have been processed. To ensure that all tokens have been processed, an application must advance the current earleme by calling marpa_r_earleme_complete(), until the current earleme is equal to the furthest earleme.

7.2 The basic models of input

For the purposes of presentation, we (somewhat arbitrarily) divide Libmarpa’s input models into two groups: basic and advanced. In the basic input models of input, every token is exactly one earleme long. This implies that, in a basic model of input,

every token is the same length,
the ordinal of an Earley set will always be the same as its earleme location, and
the latest earleme and the current earleme are always equal.

In the advanced models of input, tokens may have a length other than 1. Most applications use the basic input models. The details of the advanced models of input are presented in a later chapter. See Advanced input models.

7.2.1 The standard model of input

In the standard model of input, there is exactly one successful marpa_r_alternative() call immediately previous to every marpa_r_earleme_complete() call. A marpa_r_alternative() call is immediately previous to a marpa_r_earleme_complete() call iff that marpa_r_earleme_complete() call is the first marpa_r_earleme_complete() call after the marpa_r_alternative() call.

Recall that, since the standard model is a basic model, the token length in every successful call to marpa_r_alternative() will be one. For an input of length n, there will be exactly n marpa_r_earleme_complete() calls, and all but the last call to marpa_r_earleme_complete() must be successful.

In the standard model, after a successful call to marpa_r_alternative(), if c is the value of the current earleme before the call,

the current earleme will remain unchanged and therefore will be c; and
the furthest earleme will be c+1.

In the standard model, a call to marpa_r_earleme_complete() follows a successful call of marpa_r_alternative(), so that the value of the furthest earleme before the call to marpa_r_earleme_complete() will be c+1, where c is the value of the current earleme. After a successful call to marpa_r_earleme_complete(),

the current earleme will be advanced to c+1; and
the furthest earleme will be c+1, and therefore equal to the current earleme.

Recall that, in the basic models of input, the latest earleme is always equal to the current earleme.

7.2.2 Ambiguous input

We can loosen the standard model to allow more than one successful call to marpa_r_alternative() immediately previous to each call to marpa_r_earleme_complete(). This change will mean that multiple tokens become possible at each earleme — in other words, that the input becomes ambiguous. We continue to require that there be at least one successful call to marpa_r_alternative() before each call to marpa_r_earleme_complete(). And we recall that, since this is a basic input model, all tokens must have a length of 1.

In the ambiguous input model, the behavior of the current, latest and furthest earlemes are exactly as described for the standard model. See The standard model of input.

7.3 Terminals

Traditionally, a terminal symbol is a symbol that may appear in the input. Traditional grammars divide all symbols sharply into terminals and non-terminals: A terminal symbol must always be used as a terminal. A non-terminal symbol can never be used as a terminal.

In Libmarpa, by default, a symbol is a terminal, and therefore may appear in the input iff both of the following are true:

The symbol is non-nulling. It is a logical contradiction for a nulling symbol to appear in the input. For this reason, Marpa does not allow it.
The symbol does not appear on the LHS of any rule.

Marpa’s default behavior follows tradition. A now-deprecated feature of Marpa allowed for LHS terminals. See LHS terminals. Most readers will want to stick to Marpa’s default behavior, and can and should ignore the possibility of LHS terminals. Even when LHS terminals are allowed, terminals can never be zero length.

In Libmarpa, every terminal instance has a token value associated with it. Token values are int’s. Libmarpa does nothing with token values except accept them from the application and return them during parse evaluation.

8 Exhaustion

A parse is exhausted when it cannot accept any further input. A parse is active iff it is not exhausted. For a parse to be exhausted, the furthest earleme and the current earleme must be equal. However, the converse is not always the case: if more tokens can be read at the current earleme, then it is possible for the furthest earleme and the current earleme to be equal in an active parse.

Parse exhaustion always has a location. That is, if a parse is exhausted it is exhausted at some earleme location X. If a parse is exhausted at location X, then

There may be valid parses at X.
The parse was active at all locations earlier than X.
There may be valid parses at locations before X.
There will be no valid parses at locations after X.
No tokens can start at location X.
No tokens can end at a location after X.
No tokens can start at any location after X.
No tokens will be accepted by an exhausted parser. It is an irrecoverable hard failure to call marpa_r_alternative() after a parser has become exhausted.
No Earley sets will be at any location after X.
No earlemes are completed by, and no Earley sets are created by, an exhausted parser. It is an irrecoverable hard failure to call marpa_r_earleme_complete() after a parser has become exhausted.

Users sometimes assume that parse exhaustion means parse failure. But other users sometimes assume that parse exhaustion means parse success. For many grammars, there are strong associations between parse exhaustion and parse success, but the strong association can go either way, Both exhaustion-loving and exhaustion-hating grammars are very common in practical application.

In an exhaustion-hating application, parse exhaustion typically means parse failure. C programs, Perl scripts and most programming languages are exhaustion-hating applications. If a C program is well-formed, it is always possible to read more input. The same is true of a Perl program that does not have a __DATA__ section.

In an exhaustion-loving application parse exhaustion means parse success. A toy example of an exhaustion-loving application is the language consisting of balanced parentheses. When the parentheses come into perfect balance the parse is exhausted, because any further input would unbalance the brackets. And the parse succeeds when the parentheses come into perfect balance. Exhaustion means success. Any language that balances start and end indicators will tend to be exhaustion-loving. HTML and XML, with their start and end tags, can be seen as exhaustion-loving languages.

One common form of exhaustion-loving parsing occurs in lexers that look for longest matches. Exhaustion will indicate that the longest match has been found.

It is possible for a language to be exhaustion-loving at some points and exhaustion-hating at others. We mentioned Perl’s __DATA__ as a complication in a basically exhaustion-hating language.

marpa_r_earleme_complete() and marpa_r_start_input are the only methods that may encounter parse exhaustion. See marpa_r_earleme_complete(), and marpa_r_start_input(). When the marpa_r_start_input or marpa_r_earleme_complete() methods exhaust the parse, they trigger a MARPA_EVENT_EXHAUSTED event. Applications can also query parse exhaustion status directly with the marpa_r_is_exhausted() method. See marpa_r_is_exhausted().

Parse exhaustion can be triggered on method success by either the marpa_r_earleme_complete() or the marpa_r_start_input method. Parse exhaustion can be triggered on method failure by the marpa_r_earleme_complete() method. Parse exhaustion triggered on method success is called parse exhaustion on success or exhaustion on success. Parse exhaustion triggered on method failure is called parse exhaustion on failure or exhaustion on failure.

9 End of parse

Users accustomed to certain applications may believe that the question of when to end parsing is almost too trivial to ask. And, for many applications this is the case. But the general question of determining the end of parse is decidedly non-trivial. For a broad theoretical overview, see "When are we done parsing?", pp. 93-94 in the 2nd edition of Grune and Jacobs, (see Grune and Jacobs 2008). This chapter discusses determination of the end of parse (EOP) from the Libmarpa point of view.

EOP-handling and exhaustion-handling are intertwined. It may be useful to review the section on exhaustion at this point. See Exhaustion.

9.1 Determining EOP from EOI

In many applications the EOP is always the end of input (EOI). That is, parsing ends at a known parse location. The length of the input may be known because it was fixed in advance; there may be an end of file (EOF) indication; there may be an end of string (EOS) indication; or the application may have some other means of knowing that is has hit the EOI.

If the EOI is the EOP, the following is typically the case:

The application will treat exhaustion on failure as a parse error.
The application will treat exhaustion before the EOI, even on success, as a parse error.

How parse exhaustion on success at the EOI is handled will depend on the application. Here are a few of the many possible ways:

Exhaustion might be ignored. Very often this will be what the application wants to do.
Exhaustion might be, if the application is exhaustion-loving, required, so that it is treated as an error if the parse is still active at the EOI.
An application, if it is exhaustion-hating in the extreme, may wish to treat exhaustion, even exhaustion on success at the EOI, as an error.

9.2 Determining EOP from exhaustion

An exhaustion-loving application that does not know the length of its input may want to terminate the parse in case of exhaustion on success, in effect treating exhaustion as an end-of-input indicator. These applications will usually want to treat exhaustion on failure as an error.

An example of such an application would be the language consisting entirely of strings of balanced parentheses. HTML might also be treated as an exhaustion-loving language, with the parse terminating at the /html tag.

9.3 Other ways of determining EOP

Occasionally, an exhaustion-hating application may not know the length of its input in advance. Since these applications will not know from the length of the input, or from exhaustion, that they are at end of input, they will need some other way of determining this. One way they may do this is with a MARPA_EVENT_SYMBOL_COMPLETED event.

An example of such an application that has seen much use is the Marpa::R2 lexer. The Marpa::R2 lexer looks for one of a set of “lexemes”, seeking the longest one. It does this by declaring a MARPA_EVENT_SYMBOL_COMPLETED event for each lexeme, and recording the most recent location at which one of these completion events occurs. The Marpa::R2 lexer stops looking for the longest lexeme on parse exhaustion, and at that point treats the parse location of the most recent MARPA_EVENT_SYMBOL_COMPLETED event as the EOP. If no MARPA_EVENT_SYMBOL_COMPLETED event occurred, the Marpa::R2 lexer failed to find a lexeme, and reports a lexing error.

10 Semantics

Libmarpa handling of semantics is unusual. Most semantics are left up to the application, but Libmarpa guides them. Specifically, the application is expected to maintain the evaluation stack. Libmarpa’s valuator provides instructions on how to handle the stack. Libmarpa’s stack handling instructions are called “steps”. For example, a Libmarpa step might tell the application that the value of a token needs to go into a certain stack position. Or a Libmarpa step might tell the application that a rule is to be evaluated. For rule evaluation, Libmarpa will tell the application where the operands are to be found, and where the result must go.

The detailed discussion of Libmarpa’s handling of semantics is in the reference chapters of this document, under the appropriate methods and classes. The most extensive discussion of the semantics is in the section that deals with the methods of the value time class (Value methods).

11 Threads

Libmarpa is thread-safe, given circumstances as described below. The Libmarpa methods are not reentrant.

Libmarpa is C89-compliant. See C89. It uses no global data, and calls only the routines that are defined in the C89 standard and that can be made thread-safe. In most modern implementations, the default C89 implementation is thread-safe to the extent possible. But the C89 standard does not require thread-safety, and even most modern environments allow the user to turn thread safety off. To be thread-safe, Libmarpa must be compiled and linked in an environment that provides thread-safety.

While Libmarpa can be used safely across multiple threads, a Libmarpa grammar cannot be. Further, a Libmarpa time object can only be used safely in the same thread as its base grammar. This is because all time objects with the same base grammar share data from that base grammar.

To work around this limitation, the same grammar definition can be used to a create a new Libmarpa grammar time object in each thread. If there is sufficient interest, future versions of Libmarpa could allow thread-safe cloning of grammars and other time objects.

12 Sequence rules

Traditionally, grammars only allow BNF rules. Libmarpa allows sequence rules, which express sequences by allowing a single RHS symbol to be repeated.

A sequence rule consists of a LHS and a RHS symbol. Additionally, the application must indicate the minimum number of repetitions. The minimum count must be 0 or 1.

Optionally, a separator symbol may be specified. For example, a comma-separated sequence of numbers

     1,42,7192,711,

may be recognized by specifying the rule Seq ::= num and the separator comma ::= ','. By default, an optional final separator, as shown in the example above, is recognized, but “proper separation” may also be specified. In proper separation separators must, in fact, come between (“separate”) items of the sequence. A final separator is not a separator in the strict sense, and therefore is not recognized when proper separation is in effect. For more on specifying sequence rules, see marpa_g_sequence_new.

Sequence rules are “sugar” — their presence in the Libmarpa interface does not extend its power. Every Libmarpa grammar that can be written using sequence rules can be rewritten as a grammar without sequence rules.

The RHS symbol and the separator, if there is one, must not be nullable. This is because it is not completely clear what an application intends when it asks for a sequence of items, some of which are nullable — the most natural interpretation of this usually results in a highly ambiguous grammar.

Libmarpa allows highly ambiquous grammars and a programmer who wants a grammar with sequences containing nullable items or separators can write that grammar using BNF rules. The use of BNF rules make it clearer that ambiguity is what the programmer intended, and allows the programmer more flexibility.

A sequence rule must have a dedicated LHS — that is, the LHS of a sequence rule must not be the LHS of any other rule. This implies that the LHS of a sequence rule can never be the LHS of a BNF rule.

The requirement that the LHS of a sequence rule be unique is imposed for reasons similar to those for the prohibition against RHS and separator nullables. Often reuse of the LHS of a sequence rule is simply a mistake. Even when deliberate, reuse of the LHS results in a complex grammar, one which often parses in ways that the programmer did not intend.

A programmer who believes they know what they are doing, and really does want alternative sequences starting at the same input location, can specify this behavior indirectly. They can do this by creating two sequence rules with distinct LHS’s:

     Seq1 ::= Item1
     Seq2 ::= Item2

and adding a new “parent” LHS which recognizes the sequences as alternatives.

     SeqChoice ::= Seq1
     SeqChoice ::= Seq2

13 Nullability

In Libmarpa, there is no direct way to mark a symbol nullable or nulling. All Libmarpa’s terminal symbols are non-nullable. By default, Libmarpa’s non-terminal symbols are nullable or nulling depending on the rules in which they appear on the LHS. The default behavior for non-terminals can be changed (see LHS terminals), but this is deprecated.

To make a symbol x nullable, a user must create an nulling rule whose LHS is x. The empty rule is nulling, so that one way a user can ensure x is nullable is by making it the LHS of an empty rule. If every rule with x on the LHS is nulling, x will be not just nullable, but nulling as well.

13.1 Nullability in the valuator

In the valuator, every nulling tree is pruned back to its topmost nulling symbol. This means that there are no nulling rules in the valuator, only nulling symbols. For an example of how this works, see Example of nulled symbol.

While this may sound draconian, the “lost” semantics of the nulled rules and non-topmost nulled symbols are almost never missed. Nulled subtrees cannot contain input, and therefore do not contain token symbols. So no token values are lost when nulled subtrees are pruned, and we are dealing with the semantics of the empty string. See Evaluating nulled symbols.

13.2 Assigning semantics to nulled symbols

Libmarpa leaves the semantics to an upper layer, so that we usually treat semantics as outside the scope of this document. But most upper layers will find that nulled symbols are a corner case for their semantics, and we therefore offer the writers of upper layers some hints.

Typically, upper layers will assign semantics to a LHS symbol based on the rule instance in which the LHS occurs. All nulled symbols are LHS symbols, but the valuator prunes all nulled rules, forcing the application to determine the semantics of a nulled symbol instance based on its symbol. One method of making this determination is the one which is implemented in Marpa::R2. Let g be a grammar; and let x be a symbol that is nulled in a parse that uses g. Call a rule in g with x on its LHS, an “x LHS rule”. Marpa::R2 assigns a semantics to x using the first of following guidelines that applies:

If every x LHS rule in g has the same semantics, Marpa::R2 assigns that shared semantics to x.
If there is an empty x LHS rule in g, Marpa::R2 assigns the semantics of that empty rule to x.
If none of the previous guidelines apply, Marpa::R2 reports an error.

13.3 Evaluating nulled symbols

In theory, the semantics of nulled symbols, like any semantics, can be arbitrarily complex. In practice, we are dealing with the semantics of the empty string, which is literally the “semantics of nothing”. If what we are dealing with truly is primarily a parsing problem, we can usually expect that the semantics of nothing will be simple.

The possible subtrees below a nulled symbol can be seen as a set, and that set is a constant that depends on the grammar. Since the input corresponding to the nulled symbol is also a constant (the empty string), the semantics of a nulled symbol will also be constant, unless one of the following is true:

The semantics is non-deterministic.
The semantics takes into account, not just the input, but its context. For example, the semantics might depend on parse location, in which case the nulled symbol instance of x at location 5 might mean something different from an nulled instance of x at location 50.

Both of these exceptions are unusual. When they do occur, the upper layer can implement the semantics of the nulled symbols with a function or a closure.

13.4 Example of nulled symbol

As already stated, Marpa prunes every null subtree back to its topmost null symbol. Here is an example grammar, with S as the start symbol.

        S ::= L R
        L ::= A B X
        L ::=
        R ::= A B Y
        R ::=
        A ::=
        B ::=
        X ::=
        X ::= "x"
        Y ::=
        Y ::= "y"

If we let the input be ‘x’, we can write the unpruned parse tree in preorder (depth-first), indenting children below their parents, like this:

        0: Visible Rule: S := L R
             1: Visible Rule L := A B X
                 1.1: Nulled Symbol A
                 1.2: Nulled Symbol B
                 1.3: Token, Value is "x"
             2: Nulled Rule, Rule R := A B Y
                 2.1: Nulled Symbol A
                 2.2: Nulled Symbol B
                 2.3: Nulled Symbol Y

In this example, five symbols and a rule are nulled. The nulled rule and three of the nulled symbols are in a nulled subtree: 2, 2.1, 2.2 and 2.3. Marpa prunes every null subtree back to its topmost symbol, which in this case is the LHS of the rule numbered 2. The pruned tree looks like this:

         0: Visible Rule: S := L R
              1: Visible Rule L := A B X
                  1.1: Nulled Symbol A
                  1.2: Nulled Symbol B
                  1.3: Token, Value is "x"
              2: LHS of Nulled Rule, Symbol R

Nulled nodes 1.1, 1.2 and 2 were all kept, because they are topmost in their nulled subtree. All the other nulled nodes were discarded.

13.5 Duplicate nulled nodes

A non-nulled node can never appear twice in the same tree. But duplicate nulled nodes in the same tree are quite possible. The start and end locations of a nulled node are the same, and other nodes can share the same start and end locations, so location information does not necessarily made a nulled node unique. Since a nulled node also always has no children, if two nulled nodes at the same location differ, they must differ in the symbol.

It is quite possible in a grammar to allow sequences of the same nulled symbol. All the nodes in this sequence will be equal to each other. These nodes represent multiple nulled symbol instances. The context (specifically, the position of the nulled node in the tree) is required to distinguish these instances.

Libmarpa could distinguish nulled nodes by including a “instance number” in the node and, from a set-theoretic point of view, this would be more appropriate. But, in the implementation, the context of a node is always available, and allowing for an “instance number” with each node would consume a considerable amount of space.

14 Failure

As a reminder, no language in this chapter (or, for that matter, in this document) should be read as providing, or suggesting the existence of, a warranty. See License.

14.1 Libmarpa’s approach to failure

Libmarpa is a C language library, and inherits the traditional C language approach to avoiding and handling application programming errors. The burden this C tradition puts on the application programmer might strike readers unfamiliar with that tradition as appallingly large.

But in the early 1970’s, when the C language first stabilized, the alternative, and the consensus choice for its target applications, was assembly language. In that context, C was radical in its willingness to incur a price in efficiency in order to protect programmers from themselves. C was considered to take a excessively “hand holding” approach which very much flew in the face of consensus.

The decades have made a large difference in the trade-offs, and the consensus about the degree to which even a low-level language should protect the user has changed. It seems inevitable that C will be replaced as the low-level language of choice, by a language that places fewer burdens on the programmer, and more on the machine. The question seems to be not whether C will be dethroned as the “go to” language for low-level progamming, but when, and by which alternative.

Modern hardware makes many simple checks essentially cost-free, and Libmarpa’s efforts to protect the application programmer go well beyond what would have been considered best practice in the past. But it remains a C language library. The Libmarpa application programmer must be prepared to exercise the high degree of carefulness traditionally required for C language programming. Libmarpa places the burden of avoiding irrecoverable failures, and of handling recoverable failures, largely on the application programmer.

14.2 User non-conformity to specified behavior

This document specifies many behaviors for Libmarpa application programs to follow, such as the nature of the arguments to each method. The C language and the application environment impose many more behaviors, such as proper memory management. When a non-conformity to specified behavior is unintentional and problematic, it is frequently called a “bug”. Even the most carefully programmed Libmarpa application may sometimes contain a “bug”. In addition, some specified behaviors are explicitly stated as characterizing a primary branch of the processing, rather than made mandatory for all successful processing. Non-conformity to non-mandatory behaviors can be efficiently recoverable, and is often intentional.

This chapter describes how non-conformity to specified behavior by a Libmarpa application is handled by Libmarpa. Non-conformity to specified behavior by a Libmarpa application is also called, for the purposes of this document, a Libmarpa application programming failure. In contexts where no ambiguity arises, Libmarpa application programming failure will usually be abbreviated to failure.

Libmarpa application programming success in a context is defined as the absence of unrecovered failure in that context. When no ambiguity arises, Libmarpa application programming success is almost always abbreviated to success. For example, the success of an application means the application ran without any irrecoverable failures, and that it recovered from all the recoverable failures that were detected.

14.3 Classifying failure

A Libmarpa application programming failure, unless stated otherwise, is an irrecoverable failure. Once an irrecoverable failure has occurred, the further behavior of the program is undefined. Nonetheless, we specify, and Libmarpa attempts, diagnostics behaviors (see Application and diagnostic behavior) in an effort to handle irrecoverable failures as smoothly as possible.

A Libmarpa application programming failure is not recoverable, unless this document states otherwise.

A failure is called a hard failure if it has an error code associated with it. A recoverable failure is called a soft failure if it has no associated error code. (For more on error codes, see Error codes.)

All failures fall into one of five types. In order of severity, these are

memory allocation failures,
undetected failures,
irrecoverable hard failures,
partially recoverable hard failures,
fully recoverable hard failures, and
soft failures.

14.4 Memory allocation failure

Failure to allocate memory is the most irrecoverable of irrecoverable errors. On memory allocation failure, as with all irrecoverable failures, Libmarpa’s behavior is undefined. Libmarpa attempts to terminate the current program abnormally by calling abort(), but a component of many modern operating systems is an out-of-memory (OOM) killer, which takes the decision out of the hands of both Libmarpa and the user. See Out-of-memory handling.

Memory allocation failure is the only case in which the decision to terminate the program is made for the user. In all other cases, Libmarpa leaves the decision to terminate the program, whether normally or abnormally, up to the application programmer.

Memory allocation failure does not have an error code. As a pedantic matter, memory allocation failure is neither a hard or a soft failure.

14.5 Undetected failure

An undetected failure is a failure that the Libmarpa library does not detect. Many failures are impossible or impractical for a C library to detect. Two examples of failure that the Libmarpa methods do not detect are writes outside the bounds of allocated memory, and use of memory after it has been freed. C is not strongly typed, and arguments of Libmarpa routines undergo only a few simple tests, tests which are inadequate to detect many of the potential problems.

By undetected failure we emphasize that we mean failures undetected by the Libmarpa methods. In the examples just given, there exist tools that can help the programmer detect memory errors and other tools exist to check the sanity of method arguments.

This document points out some of the potentially undetected problems, when doing so seems more helpful than tedious. But any attempt to list all the undetected problems would be too large and unwieldy to be useful.

Undetected failure is always irrecoverable. An undetected failure is neither a hard or a soft failure.

14.6 Irrecoverable hard failure

An irrecoverable hard failure is an irrecoverable Libmarpa application programming failure that has an error code associated with it. Libmarpa attempts to behave as predictably as possible in the face of a hard failure, but once an irrecoverable failure occurs, the behavior of a Libmarpa application is undefined.

In the event of an irrecoverable failure, there are no application behaviors. The diagnostic behavior for a hard failure is as described for the method that detects the hard failure. At a minimum, this diagnostic behavior will be returning from the method that detects the hard failure with the return value specified for hard failure, and setting the error code as specified for hard failure.

14.7 Partially recoverable hard failure

A partially recoverable hard failure is a recoverable Libmarpa application programming failure

that has an error code associated with it; and
after which some, but not all, of the application behaviors remain available to the programmer.

For every partially recoverable hard failure, this document specifies the application behaviors that remain available after it occurs. The most common kind of partially recoverable hard failure is a library-recoverable hard failure. For an example of partially recoverable hard failure, see Library-recoverable hard failure.

14.8 Library-recoverable hard failure

A library-recoverable hard failure is a type of partially recoverable hard failure. Loosely described, it is a hard failure that allows the programmer to continue to use many of the Libmarpa methods in the library, but that disallows certain methods on some objects.

To state the restrictions of application behaviors more precisely, let the “failure grammar” be the base grammar of the method that detected the library-recoverable hard failure. After a library-recoverable hard failure, the following behaviors are no longer application behaviors:

Libmarpa mutator and constructor method calls where the base grammar is the failure grammar.

Recall that any use of a behavior that is not an application behavior is an irrecoverable failure.

The application behaviors remaining after a library-recoverable hard failure are the following:

All non-Libmarpa interfaces, including calls to libraries in the application and in the application environment.
All Libmarpa static method calls.
All Libmarpa accessor and destructor method calls.
All Libmarpa mutator and constructor method calls whose base grammar is not the failure grammar.

Note that Libmarpa destructors remain available after a library recoverable failure. An application will often want to destroy all Libmarpa objects whose base grammar is the failure grammar, in order to clear memory of problematic objects.

An example of a library-recoverable hard failure is the MARPA_ERR_COUNTED_NULLABLE error in the marpa_g_precompute method. See marpa_g_precompute().

14.9 Ancestry-recoverable hard failure

An ancestry-recoverable hard failure is a type of partially recoverable hard failure. An ancestry-recoverable failure allows a superset of the application behaviors allowed by a library-recoverable hard failure. More precisely, let the “failure object” be the object that detected the ancestry-recoverable hard failure. After an ancestry-recoverable hard failure, the following behaviors are no longer application behaviors:

Libmarpa mutator and constructor method calls where the object is the failure object, or one of its descendants.

Recall that any use of a behavior that is not an application behavior is an irrecoverable failure.

The application behaviors remaining after a ancestry-recoverable hard failure are the following:

All non-Libmarpa interfaces, including calls to libraries in the application and in the application environment.
All Libmarpa static method calls.
All Libmarpa accessor and destructor method calls.
All Libmarpa mutator and constructor method calls for time objects that are not the failure object, or one of its descendants.

Note that all Libmarpa destructors remain available after an ancestry-recoverable failure. An application will often want to destroy the failure object and all of its descendants, in order to clear memory of problematic objects.

As an example, users calling marpa_g_precompute() will often want to treat a MARPA_EVENT_EARLEY_ITEM_THRESHOLD event as if it were an ancestry-recoverable hard failure. See marpa_g_precompute().

Library-recoverable failure is a special case of ancestry-recoverable failure. When the failure object is a grammar, ancestry-recoverable failure is synonymous with library-recoverable failure.

14.10 Fully recoverable hard failure

A fully recoverable hard failure is a recoverable Libmarpa application programming failure

that has an error code associated with it; and
after which all of the application behaviors remain available to the programmer.

One example of a fully recoverable hard failure is the error code MARPA_ERR_UNEXPECTED_TOKEN_ID. The “Ruby Slippers” parsing technique (see Ruby Slippers), which has seen extensive usage, is based on Libmarpa’s ability to recover from a MARPA_ERR_UNEXPECTED_TOKEN_ID error fully and efficiently,

14.11 Soft failure

A soft failure is a recoverable Libmarpa application programming failure that has no error code associated with it. Hard errors are assigned error codes in order to tell them apart. Error codes are not necessary or useful for soft errors, because there is at most one type of soft failure per Libmarpa method.

Soft failures are so called, because they are the least severe kind of failure. The most severe failures are “bugs” — unintended, and a symptom of a problem. Soft failures, on the other hand, are a frequent occurrence in normal, successful, processing. In the phrase “soft failure”, the word “failure” is used in the same sense that its cognate “fail” is used when we say that a loop terminates when it “fails” its loop condition. That ”failure” is the failure of a condition necessary to continue on a main branch of processing, and a signal to proceed on another branch.

It is expected that Libmarpa applications will be designed such that successful execution requires handling soft failures. In fact, a non-trivial Libmarpa application can hardly be designed except on that basis.

14.12 Error codes

As stated, every hard failure has an associated error code. Full descriptions of the error codes that are returned by the external methods are given in their own section (External error codes).

How the error code is accessed depends on the method that detects the hard failure associated with that error code. Methods for time objects always set the error code in the base grammar, from which it may be accessed using the error methods described below. See Error methods. If a method has no base grammar, the description of that method will state how to access the error code for a hard failure detected by that method.

Since the error of a time object is set in the base grammar, it follows that every object with the same base grammar has the same error code. Objects with different base grammars may have different error codes.

While error codes are properties of a base grammar, irrecoverability is application-wide. That is, whenever any irrecoverable failure occurs, the entire application is irrecoverable. Once an application becomes irrecoverable, those Libmarpa objects with error codes for recoverable errors are still subject to the general irrecoverability.

15 Introduction to the method descriptions

The following chapters describe Libmarpa’s methods in detail.

15.1 About the overviews

The method descriptions are grouped into chapters and sections. Each such group of methods descriptions begins, optionally, with an overview. These overviews, again optionally, end with a “cheat sheet”. The “cheat sheets” name the most important Libmarpa methods in that chapter or section, in the order in which they are typically used, and very briefly describe their purpose.

The overviews sometimes speak of an “archetypal” application. The archetypal Libmarpa application implements a complete logic flow, starting with the creation of a grammar, and proceeding all the way to the return of the final result from a value object. In the archetypal Libmarpa application, the grammar, input and semantics are all small but non-trivial.

15.2 Naming conventions

Methods in Libmarpa follow a strict naming convention. All methods have a name beginning with marpa_, if they are part of the external interface. If an external method is not a static method, its name is prefixed with one of marpa_c_, marpa_g_, marpa_r_, marpa_b_, marpa_o_, marpa_t_ or marpa_v_, where the single letter between underscores is one of the Libmarpa major class abbreviations. The letter indicates which class the method belongs to.

Methods that are exported, but that are part of the internal interface, begin with _marpa_. Methods that are part of the internal interface (often called “internal methods”) are subject to change and are intended for use only by Libmarpa’s developers.

Libmarpa reserves the marpa_ and _marpa_ prefixes for itself, with all their capitalization variants. All Libmarpa names visible outside the package will begin with a capitalization variant of one of these two prefixes.

15.3 Return values

Some general conventions for return values are worth mentioning:

For methods that return an integer, a return value of -1 usually indicates soft method failure.
For methods that return an integer, a return value of -2 usually indicates hard method failure.
For methods that return an integer, a return value greater of zero or more usually indicates method success.
If a method returns an pointer value, NULL usually indicates method failure. Any other result usually indicates method success.

The words “success” and “failure” are heavily overloaded in these documents. But in contexts where our meaning is clear we will usually abbreviate “method success” and “method failure” to “success” and “failure”, respectively.

While these general conventions are a memory aid, there are exceptions, and the programmer must look at the return value summary in the description of every method they use.

As one example of an exception, for certain methods, a return value of -2 is ambiguous: -2 can be both a valid return value for method success, and a potential indication of hard method failure. An example of a method where -2 is ambiguous is marpa_g_rule_rank_set(). See marpa_g_rule_rank_set. In cases like marpa_g_rule_rank_set(), the programmer must distinguish the two return statuses based on the error code.

Whenever a method departs from the general return value conventions, the departure will be detailed in the description of that method. Any departure by a method from the general return value conventions will always be indicated in the return value summary for that method.

15.4 How to read the method descriptions

The method descriptions are written on the assumption that the reader has the following in mind while reading them:

Each method description begins with the signature of its “topic method”.
In the method description, the phrase “this method” always refers to the topic method.
Whenever “this method” is the subject of a sentence in the method description, it may be elided, so that, for example,

This method returns 42.

becomes

Returns 42.
In a time object method, the instance, sometimes called the “self argument”, is always the first argument of the method. In the descriptions of time object methods, references to the base objects of the instance are often left implied. For example, rather than write “the rule with ID id in the base grammar”, we will usually write “the rule with ID id”. References to time objects other than base objects of the instance are always explicitly stated, as are references to multiple time objects.
If the return type of a method is not void, the last paragraph of its method description is a “return value summary”. The return value summary starts with the label “Return Value”.
Every method returns in exactly one of three statuses: success, hard failure, or soft failure.
A return status of hard failure indicates that the method detected a hard failure.
A method may have several kinds of hard failure, including several kinds of irrecoverable hard failure and several kinds of recoverable hard failure. On return, these can be distinguished by their error codes.
If a method call hard fails, its error code is that associated with the hard failure. Unless stated otherwise in the return value summary, the error code is set in the base grammar of the method call, and may be accessed with the methods described below. See Error methods.
If a method allows a recoverable hard failure, this is explicitly stated in its return value summary, along with the associated error code. The method description will state the circumstances under which the recoverable hard failure occurs, and what the application must do to recover.
A return status of soft failure indicates that the method detected a soft failure.
Every method has at most one kind of soft failure.
If a method allows a soft failure, this is explicitly stated in its return value summary, and the method description will state the circumstances under which the soft failure occurs, and what the application must do to recover.
If a method call soft fails, the value of the error code is unspecified.
If a method call succeeds, the value of the error code is unspecified.
A return status of success indicates that the method did not detect any failures.
If both a hard failure and a soft failure occur, the return status will be hard failure.
If both a recoverable hard failure and an irrecoverable hard failure occur, the error code will be for an irrecoverable hard failure.
The behaviors specified for success and soft failure are application behaviors.
The behaviors specified for hard failures are diagnostic behaviors if an irrecoverable failure occurred, and application behaviors otherwise.

16 Static methods

Accessor function: Marpa_Error_Code marpa_check_version ( int required_major, int required_minor, int required_micro )

Checks that the Marpa library in use is compatible with the given version. Generally, the application programmer will pass in the constants MARPA_MAJOR_VERSION, MARPA_MINOR_VERSION, and MARPA_MICRO_VERSION as the three arguments, to check that their application was compiled with headers that match the version of Libmarpa that they are using.

If required_major.required_minor.required_micro is an exact match with the version of Libmarpa, the method succeeds. The version of Libmarpa described in this manual is 11.0.10. Otherwise the return status is an irrecoverable hard failure.

Return value: On success, MARPA_ERR_NONE. On hard failure, the error code.

Accessor function: Marpa_Error_Code marpa_version ( int* version)

Writes the version number in version. It is an undetected irrecoverable hard failure if version does not have room for three int’s.

Return value: Always succeeds. The return value is unspecified.

17 Configuration methods

The configuration object is intended for future extensions. Currently, the only function of the Marpa_Config class is to give marpa_g_new() a place to put its error code.

Marpa_Config is Libmarpa’s only “major” class which is not a time class. There is no constructor or destructor, although Marpa_Config objects do need to be initialized before use. Aside from its own accessor, Marpa_Config objects are only used by marpa_g_new() and no reference to their location is kept in any of Libmarpa’s time objects. The intent is that it be convenient to have Marpa_Config objects in memory that might be deallocated soon after marpa_g_new() returns. For example, they could be put on the stack.

Mutator function: int marpa_c_init ( Marpa_Config* config)

Initialize the config information to “safe” default values. An irrecoverable error will result if an uninitialized configuration is used to create a grammar.

Return value: Always succeeds. The return value is unspecified.

Accessor function: Marpa_Error_Code marpa_c_error ( Marpa_Config* config, const char** p_error_string )

Error codes are usually kept in the base grammar, which leaves marpa_g_new() no place to put its error code on failure. Objects of the Marpa_Config class provide such a place. p_error_string is reserved for use by the internals. Applications should set it to NULL.

Return value: The error code in config. Always succeeds, so that marpa_c_error() never requires an error code for itself.

18 Grammar methods

18.1 Overview

An archetypal application has a grammar. To create a grammar, use the marpa_g_new() method. When a grammar is no longer in use, its memory can be freed using the marpa_g_unref() method.

To be precomputed, a grammar must have one or more symbols. To create symbols, use the marpa_g_symbol_new() method.

To be precomputed, a grammar must have one or more rules. To create rules, use the marpa_g_rule_new() and marpa_g_sequence_new() methods.

To be precomputed, a grammar must have exactly one start symbol. To mark a symbol as the start symbol, use the marpa_g_start_symbol_set() method.

Before parsing with a grammar, it must be precomputed. To precompute a grammar, use the marpa_g_precompute() method.

18.2 Creating a new grammar

Constructor function: Marpa_Grammar marpa_g_new ( Marpa_Config* configuration )

Creates a new grammar time object. The returned grammar object is not yet precomputed, and will have no symbols and rules. Its reference count will be 1.

Unless the application calls marpa_c_error(), Libmarpa will not reference the location pointed to by the configuration argument after marpa_g_new() returns. (See marpa_c_error().) The configuration argument may be NULL, but if it is, there will be no way to determine the error code on failure.

Return value: On success, the grammar object. On hard failure, NULL. Also on hard failure, if the configuration argument is not NULL, the error code is set in configuration. The error code may be accessed using marpa_c_error().

Mutator function: int marpa_g_force_valued ( Marpa_Grammar g )

It is recommended that this call be made immediately after the grammar constructor. It turns off a deprecated feature.

The marpa_g_force_valued() method forces all the symbols in a grammar to be “valued”. The parse cares about the value of the symbol iff the symbol is “valued”. In the past, symbols were allowed to be “unvalued” in the hope of gaining efficiencies at evaluation time. Use of unvalued symbols is now deprecated, because current thinking is that the gains do not repay the extra complexity.

Return value: On success, a non-negative integer, whose value is otherwise unspecified. On failure, -2.

18.3 Tracking the reference count of the grammar

Mutator function: Marpa_Grammar marpa_g_ref (Marpa_Grammar g)

Increases the reference count of g by 1. Not needed by most applications.

Return value: On success, g. On hard failure, NULL.

Destructor function: void marpa_g_unref (Marpa_Grammar g): Decreases the reference count by 1, destroying g once the reference count reaches zero.

18.4 Symbol methods

Accessor function: Marpa_Symbol_ID marpa_g_start_symbol (Marpa_Grammar g)

When successful, returns the ID of the start symbol. Soft fails, if there is no start symbol. The start symbol is set by the marpa_g_start_symbol_set() call.

Return value: On success, the ID of the start symbol, which is always a non-negative integer. On soft failure, -1. On hard failure, -2.

Mutator function: Marpa_Symbol_ID marpa_g_start_symbol_set ( Marpa_Grammar g, Marpa_Symbol_ID sym_id)

When successful, sets the start symbol of grammar g to symbol sym_id. Soft fails iff sym_id is well-formed (a non-negative integer), but a symbol with that ID does not exist.

Return value: On success, sym_id, which will always be a non-negative integer. On soft failure, -1. On hard failure, -2.

Accessor function: int marpa_g_highest_symbol_id (Marpa_Grammar g): Return value: On success, the maximum symbol ID of g. On hard failure, -2.

Accessor function: int marpa_g_symbol_is_accessible (Marpa_Grammar g, Marpa_Symbol_ID sym_id)

Soft fails iff sym_id is well-formed (a non-negative integer), but a symbol with that ID does not exist. A common hard failure is calling this method with a grammar that is not precomputed.

Return value: On success, 1 if symbol sym_id is accessible, 0 if not. On soft failure, -1. On hard failure, -2.

Accessor function: int marpa_g_symbol_is_nullable ( Marpa_Grammar g, Marpa_Symbol_ID sym_id)

Soft fails iff sym_id is well-formed (a non-negative integer), but a symbol with that ID does not exist. A common hard failure is calling this method with a grammar that is not precomputed.

Return value: On success, 1 if symbol sym_id is nullable, 0 if not. On soft failure, -1. On hard failure, -2.

Accessor function: int marpa_g_symbol_is_nulling (Marpa_Grammar g, Marpa_Symbol_ID sym_id)

Soft fails iff sym_id is well-formed (a non-negative integer), but a symbol with that ID does not exist. A common hard failure is calling this method with a grammar that is not precomputed.

Return value: On success, 1 if symbol sym_id is nulling, 0 if not. On soft failure, -1. On hard failure, -2.

Accessor function: int marpa_g_symbol_is_productive (Marpa_Grammar g, Marpa_Symbol_ID sym_id)

Soft fails iff sym_id is well-formed (a non-negative integer), but a symbol with that ID does not exist. A common hard failure is calling this method with a grammar that is not precomputed.

Return value: On success, 1 if symbol sym_id is productive, 0 if not. On soft failure, -1. On hard failure, -2.

Accessor function: int marpa_g_symbol_is_start ( Marpa_Grammar g, Marpa_Symbol_ID sym_id)

On success, if sym_id is the start symbol, returns 1. On success, if sym_id is not the start symbol, returns 0. On success, if no start symbol has been set, returns 0.

Soft fails iff sym_id is well-formed (a non-negative integer), but a symbol with that ID does not exist.

Return value: On success, 1 or 0. On soft failure, -1. On hard failure, -2.

Accessor function: int marpa_g_symbol_is_terminal ( Marpa_Grammar g, Marpa_Symbol_ID sym_id)

On success, if sym_id is a terminal symbol, returns 1. On success, if sym_id is not a terminal symbol, returns 0. To be used as an input symbol in the marpa_r_alternative() method, a symbol must be a terminal.

Soft fails iff sym_id is well-formed (a non-negative integer), but a symbol with that ID does not exist.

Return value: On success, 1 or 0. On soft failure, -1. On hard failure, -2.

Mutator function: Marpa_Symbol_ID marpa_g_symbol_new (Marpa_Grammar g)

When successful, creates a new symbol in grammar g, and returns the ID of the new symbol. The symbol ID’s are non-negative integers. Within each grammar, a symbol’s ID is unique to that symbol.

Symbols are numbered consecutively, starting at 0. That is, the first successful call of this method for a grammar returns the symbol ID 0. The second and later successful calls of marpa_g_symbol_new(), return the symbol ID n+1, where n is the symbol ID returned by the previous successful call of marpa_g_symbol_new(). This makes it convenient for applications to store additional information about the symbols in an array.

Return value: On success, the ID of the new symbol, which will be a non-negative integer. On hard failure, -2.

18.5 Rule methods

Accessor function: int marpa_g_highest_rule_id (Marpa_Grammar g): Return value: On success, the maximum rule ID of g. On hard failure, -2.

Accessor function: int marpa_g_rule_is_accessible (Marpa_Grammar g, Marpa_Rule_ID rule_id)

On success, does the following:

If the rule whose ID is rule_id is accessible, returns 1.
If the rule whose ID is rule_id is inaccessible, returns 0.

Soft fails iff rule_id is well-formed (a non-negative integer), but a rule with that ID does not exist. A common hard failure is calling this method with a grammar that is not precomputed.

Return value: On success, 1 or 0. On soft failure, -1. On hard failure, -2.

Accessor function: int marpa_g_rule_is_nullable ( Marpa_Grammar g, Marpa_Rule_ID ruleid)

On success, does the following:

If the rule whose ID is rule_id is nullable, returns 1.
If the rule whose ID is rule_id is non-nullable, returns 0.

Soft fails iff rule_id is well-formed (a non-negative integer), but a rule with that ID does not exist. A common hard failure is calling this method with a grammar that is not precomputed.

Return value: On success, 1 or 0. On soft failure, -1. On hard failure, -2.

Accessor function: int marpa_g_rule_is_nulling (Marpa_Grammar g, Marpa_Rule_ID ruleid)

On success, does the following:

If the rule whose ID is rule_id is nulling, returns 1.
If the rule whose ID is rule_id is non-nulling, returns 0.

Soft fails iff rule_id is well-formed (a non-negative integer), but a rule with that ID does not exist. A common hard failure is calling this method with a grammar that is not precomputed.

Return value: On success, 1 or 0. On soft failure, -1. On hard failure, -2.

Accessor function: int marpa_g_rule_is_loop (Marpa_Grammar g, Marpa_Rule_ID rule_id)

On success, does the following:

If the rule whose ID is rule_id is a loop rule, returns 1.
If the rule whose ID is rule_id is not a loop rule, returns 0.

A rule is a loop rule iff it non-trivially produces the string of length one that consists only of its LHS symbol. The presence of a loop rule indicates that g contains a cycle. Parsing with a grammar that contains a cycle is deprecated. marpa_g_rule_is_loop should only be used for diagnostic purposes, to allow a user to find the rules which cause the cycle and to change the grammar to be cycle-free. See Cycles.

Soft fails iff rule_id is well-formed (a non-negative integer), but a rule with that ID does not exist. A common hard failure is calling this method with a grammar that is not precomputed.

Return value: On success, 1 or 0. On soft failure, -1. On hard failure, -2.

Accessor function: int marpa_g_rule_is_productive (Marpa_Grammar g, Marpa_Rule_ID rule_id)

On success, does the following:

If the rule whose ID is rule_id is productive, returns 1.
If the rule whose ID is rule_id is unproductive, returns 0.

Soft fails iff rule_id is well-formed (a non-negative integer), but a rule with that ID does not exist. A common hard failure is calling this method with a grammar that is not precomputed.

Return value: On success, 1 or 0. On soft failure, -1. On hard failure, -2.

Accessor function: int marpa_g_rule_length ( Marpa_Grammar g, Marpa_Rule_ID rule_id)

On success, returns the length of the rule with ID rule_id. The length of a rule is the length of its RHS.

Soft fails iff rule_id is well-formed (a non-negative integer), but a rule with that ID does not exist.

Return value: On success, the length of the rule with ID rule_id, which is always a non-negative integer. On soft failure, -1. On hard failure, -2.

Accessor function: Marpa_Symbol_ID marpa_g_rule_lhs ( Marpa_Grammar g, Marpa_Rule_ID rule_id)

Soft fails iff rule_id is well-formed (a non-negative integer), but a rule with that ID does not exist.

Return value: On success, returns the ID of the LHS symbol of the rule with ID rule_id in grammar g. A symbol ID is always a non-negative integer. On soft failure, -1. On hard failure, -2.

Mutator function: Marpa_Rule_ID marpa_g_rule_new (Marpa_Grammar g, Marpa_Symbol_ID lhs_id, Marpa_Symbol_ID *rhs_ids, int length)

marpa_g_rule_new() is one of the rule creation methods. On success, the following are true:

This method creates a new BNF rule in grammar g.
This method gives the new rule an ID.
The ID of the new rule is a non-negative integer, which is unique to that rule.
If grammar g had no rules when this method was called, the new rule will have ID 0.
If grammar g had one or more rules when this method was called, let n be the maximum rule ID in grammar g when this method was called. Then the new rule will have rule ID n+1.
The ID of the LHS symbol of the new rule will be lhs_id.
The length of the RHS of the new rule will be length.
For 0 <= i < length, the ID of the RHS symbol with index i will be rhs_ids[i].
The return value of this method is the ID of the new rule.

In addition to BNF rules, Marpa also allows sequence rules, which are created by the marpa_g_sequence_new() method. See marpa_g_sequence_new(). We call marpa_g_rule_new() and marpa_g_sequence_new() rule creation methods. Sequence rules and BNF rules are both rules: They share the same series of rule IDs, and are accessed and manipulated by the same methods, with the only differences being as noted in the descriptions of those methods.

Each grammar’s rule ID’s are a consecutive sequence of non-negative integers, starting at 0. The consecutive numbering of rule ID’s is intended to make it convenient for applications to store additional information about a grammar’s rules in an array.

Possible hard failures, with their error codes, include:

MARPA_ERR_SEQUENCE_LHS_NOT_UNIQUE: The LHS symbol is the same as that of a sequence rule.
MARPA_ERR_DUPLICATE_RULE: The new rule would duplicate another BNF rule. Another BNF rule is considered the duplicate of the new one, if its LHS symbol is the same as symbol lhs_id, if its length is the same as length, and if its RHS symbols match one for one those in the array of symbols rhs_ids.

Return value: On success, the ID of the new rule, which is always a non-negative integer. On hard failure, -2.

Accessor function: Marpa_Symbol_ID marpa_g_rule_rhs ( Marpa_Grammar g, Marpa_Rule_ID rule_id, int ix)

When successful, returns the ID of the symbol at index ix in the RHS of the rule with ID rule_id. The indexing of RHS symbols is zero-based.

Soft fails iff rule_id is well-formed (a non-negative integer), but a rule with that ID does not exist.

Hard fails if ix is not a valid index of the RHS. This happens if ix is less than zero, or if ix is greater than or equal to the length of the rule.

Return value: On success, a symbol ID, which is always non-negative. On soft failure, -1. On hard failure, -2.

18.6 Sequence methods

Accessor function: int marpa_g_rule_is_proper_separation ( Marpa_Grammar g, Marpa_Rule_ID rule_id)

When successful, returns

1 if rule_id is the ID of a sequence rule whose proper separation flag is set,
0 if rule_id is the ID of a sequence rule whose proper separation flag is not set,
0 if rule_id is the ID of a rule that is not a sequence rule.

Does not distinguish sequence rules without proper separation from non-sequence rules. That is, does not distinguish an unset proper separation flag from a proper separation flag whose value is unspecified because rule_id is the ID of a BNF rule. Applications that want to determine whether or not a rule is a sequence rule can use marpa_g_sequence_min() to do this. See marpa_g_sequence_min().

Soft fails iff rule_id is well-formed (a non-negative integer), but a rule with that ID does not exist.

Return value: On success, 1 or 0. On soft failure, -1. On hard failure, -2.

Accessor function: int marpa_g_sequence_min ( Marpa_Grammar g, Marpa_Rule_ID rule_id)

On success, returns the mininum length of a sequence rule. Soft fails iff a rule with ID rule_id exists, but is not a sequence rule. This soft failure can used to test whether or not a rule is a sequence rule.

Hard fails irrecoverably if rule_id is not well-formed (a non-negative integer). Also, hard fails irrecoverably if no rule with ID rule_id exists, even when rule_id is well formed. Note that, in its handling of the non-existence of a rule for its rule argument, this method differs from many of the other grammar methods. Grammar methods that take a rule ID argument more often treat the non-existence of rule for a well-formed rule ID as a soft, recoverable, failure.

Return value: On success, the minimum length of the sequence rule with ID rule_id, which is always non-negative. On soft failure, -1. On hard failure, -2.

Mutator function: Marpa_Rule_ID marpa_g_sequence_new (Marpa_Grammar g, Marpa_Symbol_ID lhs_id, Marpa_Symbol_ID rhs_id, Marpa_Symbol_ID separator_id, int min, int flags )

marpa_g_sequence_new() is one of the rule creation methods. On success, the following are true:

This method creates a new sequence rule in grammar g.
This method gives the new rule an ID.
The ID of the new rule is a non-negative integer, which is unique to that rule.
If grammar g had no rules when this method was called, the new rule will have ID 0.
If grammar g had one or more rules when this method was called, let n be the maximum rule ID in grammar g when this method was called. Then the new rule will have rule ID n+1.
The ID of the LHS symbol of the new rule will be lhs_id.
The ID of the symbol to be repeated on the RHS of the sequence will be rhs_id. It is a irrecoverable hard error if rhs_id is the ID of a nullable symbol.
The sequence must be repeated at least min times. min must be 0 or 1.
If separator_id is a non-negative integer, it is the ID of a separator symbol. It is a irrecoverable hard error if separator_id is the ID of a nullable symbol.
flags is treated as a bit vector. The behavior when any bit other than MARPA_PROPER_SEPARATION is set is undefined.
By default, the sequence rule recognizes a trailing separator. If flags & MARPA_PROPER_SEPARATION is non-zero, separation is “proper”. Proper separation means the the rule does not recognize a trailing separator. Specifying proper separation has no effect if a separator symbol has not been specified.
The return value of this method is the ID of the new rule.

In addition to sequence rules, Marpa also allows BNF rules, which are created by the marpa_g_rule_new() method. See marpa_g_rule_new(). We call marpa_g_rule_new() and marpa_g_sequence_new() rule creation methods. For details on the use of sequence rules, see Sequence rules.

Sequence rules and BNF rules are both rules: They share the same series of rule IDs, and are accessed and manipulated by the same methods, with the only differences being as noted in the descriptions of those methods.

The LHS symbol cannot be the LHS of any other rule, whether a BNF rule or a sequence rule. On an attempt to create an sequence rule with a duplicate LHS, this method hard fails, with an error code of MARPA_ERR_SEQUENCE_LHS_NOT_UNIQUE.

Return value: On success, the ID of the newly added sequence rule, which is always non-negative. On hard failure, -2.

Accessor function: int marpa_g_sequence_separator ( Marpa_Grammar g, Marpa_Rule_ID rule_id)

On success, returns the symbol ID of the separator of the sequence rule with ID rule_id. Soft fails iff there is no separator. Hard fails if rule_id is a negative integer; if rule_id is not the ID of a rule that exists; or if rule_id is not the ID a sequence rule.

Return value: On success, a symbol ID, which is always non-negative. On soft failure, -1. On hard failure, -2.

Accessor function: int marpa_g_symbol_is_counted (Marpa_Grammar g, Marpa_Symbol_ID sym_id)

On success, returns a boolean whose value is 1 iff the symbol with ID sym_id is counted. A symbol is counted iff

it appears on the RHS of a sequence rule, or
it is used as the separator symbol of a sequence rule.

Soft fails iff sym_id is well-formed (a non-negative integer), but a symbol with that ID does not exist.

Return value: On success, a boolean. On soft failure, -1. On hard failure, -2.

18.7 Rank methods

Accessor function: Marpa_Rank marpa_g_default_rank ( Marpa_Grammar g)

On success, returns the default rank of the grammar g. For more about the default rank of a grammar, see marpa_g_default_rank_set().

Return value: On success, returns the default rank of the grammar, which will be an integer, and sets the error code to MARPA_ERR_NONE. On hard failure, returns -2, and sets the error code to an appropriate value, which will never be MARPA_ERR_NONE. Note that when the default rank of the grammar is -2, the error code is the only way to distinguish success from failure. The error code can be determined by using the marpa_g_error() call. See marpa_g_error().

Mutator function: Marpa_Rank marpa_g_default_rank_set ( Marpa_Grammar g, Marpa_Rank rank)

On success, sets the default rank of the grammar g to rank. When a grammar is created, the default rank is 0. When rules and symbols are created, their rank is the default rank of the grammar.

Changing the grammar’s default rank does not affect those rules and symbols already created, only those that will be created. This means that the grammar’s default rank can be used to, in effect, assign ranks to groups of rules and symbols. Applications may find this behavior useful.

Return value: On success, returns rank, which will be an integer, and sets the error code to MARPA_ERR_NONE. On failure, returns -2, and sets the error code to an appropriate value, which will never be MARPA_ERR_NONE. Note that when the rank is -2, the error code is the only way to distinguish success from failure. The error code can be determined by using the marpa_g_error() call. See marpa_g_error().

Accessor function: Marpa_Rank marpa_g_symbol_rank ( Marpa_Grammar g, Marpa_Symbol_ID sym_id)

When successful, returns the rank of the symbol with ID sym_id.

Return value: On success, returns a symbol rank, which will be an integer, and sets the error code to MARPA_ERR_NONE. On hard failure, returns -2, and sets the error code to an appropriate value, which will never be MARPA_ERR_NONE. Note that -2 is a valid symbol rank, so that when -2 is returned, the error code is the only way to distinguish success from failure. The error code can be determined using marpa_g_error(). See marpa_g_error().

Mutator function: Marpa_Rank marpa_g_symbol_rank_set ( Marpa_Grammar g, Marpa_Symbol_ID sym_id, Marpa_Rank rank)

When successful, sets the rank of the symbol with ID sym_id to rank.

Return value: On success, returns rank, which will be an integer, and sets the error code to MARPA_ERR_NONE. On hard failure, returns -2, and sets the error code to an appropriate value, which will never be MARPA_ERR_NONE. Note that rank may be -2, and in this case the error code is the only way to distinguish success from failure. The error code can be determined using marpa_g_error(). See marpa_g_error().

Accessor function: Marpa_Rank marpa_g_rule_rank ( Marpa_Grammar g, Marpa_Rule_ID rule_id)

When successful, returns the rank of the rule with ID rule_id.

Return value: On success, returns a rule rank, which will be an integer, and sets the error code to MARPA_ERR_NONE. On hard failure, returns -2, and sets the error code to an appropriate value, which will never be MARPA_ERR_NONE. Note that -2 is a valid rule rank, so that when -2 is returned, the error code is the only way to distinguish success from failure. The error code can be determined using marpa_g_error(). See marpa_g_error().

Mutator function: Marpa_Rank marpa_g_rule_rank_set ( Marpa_Grammar g, Marpa_Rule_ID rule_id, Marpa_Rank rank)

When successful, sets the rank of the rule with ID rule_id to rank.

Return value: On success, returns rank, which will be an integer, and sets the error code to MARPA_ERR_NONE. On hard failure, returns -2, and sets the error code to an appropriate value, which will never be MARPA_ERR_NONE. Note that -2 is a valid rule rank, so that when -2 is returned, the error code is the only way to distinguish success from failure. The error code can be determined using marpa_g_error(). See marpa_g_error().

Accessor function: int marpa_g_rule_null_high ( Marpa_Grammar g, Marpa_Rule_ID rule_id)

On success, returns a boolean whose value is 1 iff “null ranks high” is set in the rule with ID rule_id. When a rule is created, it has “null ranks high” set.

For more on the “null ranks high” setting, read the description of marpa_g_rule_null_high_set(). See marpa_g_rule_null_high_set().

Soft fails iff rule_id is well-formed (a non-negative integer), but a rule with that ID does not exist.

Return value: On success, returns a boolean. On soft failure, returns -1. On hard failure, returns -2.

Mutator function: int marpa_g_rule_null_high_set ( Marpa_Grammar g, Marpa_Rule_ID rule_id, int flag)

On success,

sets “null ranks high” in the rule with ID rule_id if the value of the boolean flag is 1;
unsets “null ranks high” in the rule with ID rule_id if the value of the boolean flag is 0; and
returns flag.

The “null ranks high” setting affects the ranking of rules with properly nullable symbols on their right hand side. If a rule has properly nullable symbols on its RHS, each instance in which it appears in a parse will have a pattern of nulled and non-nulled symbols. Such a pattern is called a “null variant”.

If the “null ranks high” is set, nulled symbols rank high. If the “null ranks high” is unset (the default), nulled symbols rank low. Ranking of a null variants is done from left-to-right.

Soft fails iff rule_id is well-formed (a non-negative integer), but a rule with that ID does not exist.

Hard fails if the grammar has been precomputed.

Return value: On success, returns a boolean. On soft failure, returns -1. On hard failure, returns -2.

18.8 Precomputing the Grammar

Accessor function: int marpa_g_has_cycle (Marpa_Grammar g)

On success, returns a boolean which is 1 iff g has a cycle. Parsing with a grammar that contains a cycle is deprecated. See Cycles.

For diagnostic purposes, it is useful to determine which rules are in the cycle. marpa_g_rule_is_loop() can be used for for this purpose. See marpa_g_rule_is_loop.

Return value: On success, a boolean. On hard failure, -2.

Accessor function: int marpa_g_is_precomputed (Marpa_Grammar g): Return value: On success, a boolean which is 1 iff grammar g is precomputed. On hard failure, -2.

Mutator function: int marpa_g_precompute (Marpa_Grammar g)

Precomputation involves running a series of grammar checks and “precomputing” some useful information which is kept internally to save repeated calculations. After precomputation, the grammar is “frozen” in many respects, and many grammar mutators that succeed before precomputation will cause hard failures after precomputation. Precomputation is necessary for a recognizer to be generated from a grammar.

On success, and on fully recoverable hard failure, does the following:

Precomputes the grammar g.
Clears any events that were in the event queue at the beginning of this method call.
Returns zero or more events.
Returns a value as described in this method’s return value summary.

The types of event that this method may return are MARPA_EVENT_LOOP_RULES, MARPA_EVENT_COUNTED_NULLABLE, and MARPA_EVENT_NULLING_TERMINAL. All of these events occur only on failure. Events may be queried using the marpa_g_event() method. See marpa_g_event().

The fully recoverable hard failure is MARPA_ERR_GRAMMAR_HAS_CYCLE. While this method precomputes the grammar for fully recoverable hard failures, parsing with a grammar that has a cycle is deprecated. Applications should treat MARPA_ERR_GRAMMAR_HAS_CYCLE as a library-recoverable error. A MARPA_ERR_GRAMMAR_HAS_CYCLE error occurs iff at least one MARPA_EVENT_LOOP_RULES event occurs. For more details on cycles, see Cycles.

The error code MARPA_ERR_COUNTED_NULLABLE is library-recoverable. This failure occurs when a symbol on the RHS of a sequence rule is nullable, which Libmarpa does not allow in a grammar. Error code MARPA_ERR_COUNTED_NULLABLE occurs iff one or more MARPA_EVENT_COUNTED_NULLABLE events occur. There is one MARPA_EVENT_COUNTED_NULLABLE event for every symbol that is a nullable on the right hand side of a sequence rule. An application may use these events to inform the user of the problematic symbols, and this detail may help the user fix the grammar.

The error code MARPA_ERR_NULLING_TERMINAL occurs only if LHS terminals are enabled. The LHS terminals feature is deprecated. See LHS terminals. Error code MARPA_ERR_NULLING_TERMINAL is library-recoverable. One or more MARPA_EVENT_NULLING_TERMINAL events will occur iff this method fails with error code MARPA_ERR_NULLING_TERMINAL. See Nulling terminals.

Among the other error codes that may case this method to fail are the following:

MARPA_ERR_NO_RULES: The grammar has no rules.
MARPA_ERR_NO_START_SYMBOL: No start symbol was specified.
MARPA_ERR_INVALID_START_SYMBOL: A start symbol ID was specified, but it is not the ID of a valid symbol.
MARPA_ERR_START_NOT_LHS: The start symbol is not on the LHS of any rule.
MARPA_ERR_UNPRODUCTIVE_START: The start symbol is not productive.

More details of these can be found under the description of the appropriate code. See External error codes.

Return value: On success, a non-negative integer, whose value is otherwise unspecified. On hard failure, -2. For the error code MARPA_ERR_GRAMMAR_HAS_CYCLE, the hard failure is fully recoverable. For the error codes MARPA_ERR_COUNTED_NULLABLE and MARPA_ERR_NULLING_TERMINAL, the hard failure is library-recoverable.

19 Recognizer methods

19.1 Recognizer overview

An archetypal application uses a recognizer to read input. To create a recognizer, use the marpa_r_new() method. When a recognizer is no longer in use, its memory can be freed using the marpa_r_unref() method.

To make a recognizer ready for input, use the marpa_r_start_input() method.

The recognizer starts with its current earleme at location 0. To read a token at the current earleme, use the marpa_r_alternative() call.

To complete the processing of the current earleme, and move forward to a new one, use the marpa_r_earleme_complete() call.

19.2 Creating a new recognizer

Constructor function: Marpa_Recognizer marpa_r_new ( Marpa_Grammar g )

On success, creates a new recognizer and increments the reference count of g, the base grammar, by one. In the new recognizer, the following will be true:

The reference count will be 1.
The furthest earleme will be 0.
The values of the latest and current earleme will be unspecified.
The declaration status of all events will be inherited from g, so that the event in the recognizer is declared iff the event in g is declared.
The activation status of all events will be inherited from g, so that the event in the recognizer is activated iff the event in g is activated.

Return value: On success, the newly created recognizer, which is never NULL. If g is not precomputed, or on other hard failure, NULL.

19.3 Keeping the reference count of a recognizer

Mutator function: Marpa_Recognizer marpa_r_ref (Marpa_Recognizer r)

Increases the reference count by 1. This method is not needed by most applications.

Return value: On success, the recognizer object, r, which is never NULL. On hard failure, NULL.

Destructor function: void marpa_r_unref (Marpa_Recognizer r): Decreases the reference count by 1, destroying r once the reference count reaches zero. When r is destroyed, the reference count of its base grammar is decreased by one. If this takes the reference count of the base grammar to zero, the base grammar is also destroyed.

19.4 Life cycle mutators

Mutator function: int marpa_r_start_input (Marpa_Recognizer r)

When successful, does the following:

Readies r to accept input.
Completes the first Earley set. The ID of the first Earley set is 0, and it is located at earleme 0.
Leaves the latest, current and furthest earlemes all at 0.
Clears any events that were in the event queue before this method was called.
If this method exhausts the parse, triggers a MARPA_EVENT_EXHAUSTED event. See Exhaustion.
May trigger one or more MARPA_EVENT_SYMBOL_NULLED, MARPA_EVENT_SYMBOL_PREDICTED, or MARPA_EVENT_SYMBOL_EXPECTED events. See Events.

Return value: On success, a non-negative value, whose value is otherwise unspecified. On hard failure, -2.

Mutator function: int marpa_r_alternative (Marpa_Recognizer r, Marpa_Symbol_ID token_id, int value, int length)

The token_id argument must be the symbol ID of a terminal. The value argument is an integer that represents the “value” of the token, and which should not be zero. The length argument is the length of the token, which must be greater than zero.

On success, does the following, where current is the value of the current earleme before the call and furthest is the value of the furthest earleme before the call:

Reads a new token into r. The symbol ID of the token will be token_id. The token will start at current and end at current+length.
Sets the value of the furthest earleme to max(current+length,furthest).
Leaves the values of the latest and current earlemes unchanged.

After recoverable failure, the following are the case:

The tokens read into r are unchanged. Specifically, no new token has been read into r.
The values of the latest, current and furthest earlemes are unchanged.

Libmarpa allows tokens to be ambiguous. Two tokens are ambiguous if they end at the same earleme location. If two tokens are ambiguous, Libmarpa will attempt to produce all the parses that include either of them.

Libmarpa allows tokens to overlap. Let the notation t@s-e indicate that token t starts at earleme s and ends at earleme e. Let t1@s1-e1 and t2@s2-e2 be two tokens such that s1<=s2. We say that t1 and t2 overlap iff e1>s2.

The value argument is not used inside Libmarpa — it is simply stored to be returned by the valuator as a convenience for the application. In applications where the token’s actual value is not an integer, it is expected that the application will use value as a “virtual” value, perhaps finding the actual value by using value to index an array. Some applications may prefer to track token values on their own, perhaps based on the earleme location and token_id, instead of using Libmarpa’s token values.

A value of 0 does not cause a failure, but it is reserved for unvalued symbols, a now-deprecated feature. See Valued and unvalued symbols.

Hard fails irrecoverably with MARPA_ERR_DUPLICATE_TOKEN if the token added would be a duplicate. Two tokens are duplicates iff all of the following are true:

They would have the same start earleme. In other words, if marpa_r_alternative() attempts to read them while at the same current earleme.
They have the same token_id.
They have the same length.

If a token was not accepted because of its token ID, hard fails with the MARPA_ERR_UNEXPECTED_TOKEN_ID. This hard failure is fully recoverable so that, for example, the application may retry this method with different token IDs until it succeeds. These retries are efficient, and are quite useable as a parsing technique — so much so we have given the technique a name: the Ruby Slippers. The Ruby Slippers are used in several applications.

Return value: On success, MARPA_ERR_NONE. On hard failure, an error code other than MARPA_ERR_NONE. The hard failure for MARPA_ERR_UNEXPECTED_TOKEN_ID is fully recoverable.

Mutator function: int marpa_r_earleme_complete (Marpa_Recognizer r)

For the purposes of this method description, we define the following:

current is the value of the current earleme before the call of marpa_r_earleme_complete.
latest is the value of the latest earleme before the call of marpa_r_earleme_complete.
An “expected” terminal is one expected at a current earleme, in the same sense that marpa_r_terminal_is_expected() determines if a terminal is “expected” at the current earleme. See marpa_r_terminals_expected().
An “anticipated” terminal is one that was accepted by the marpa_r_alternative() to end at an earleme after the current earleme. An anticipated terminal will have length greater than one. “Anticipated” terminals only occur if the application is using an advanced model of input. See Advanced input models.

In this method description, we will frequently refer to parse exhaustion. Parse exhaustion is discussed in detail in its own section. See Exhaustion.

On success, does the final processing for the current earleme, including the following:

Advances the current earleme, incrementing its value by 1. That is, sets the current earleme to current+1.
If any token was accepted at current, creates a new Earley set, which will be the latest Earley set. After the call, the latest earleme will be equal to the new current earleme, current+1.
If no token was accepted at current, no Earley set is created. After the call, the value of the latest earleme will be unchanged — that is, it will remain at latest. Success when no tokens were accepted at current can only occur if the application is using an advanced model of input. See Advanced input models.
The value of the furthest earleme is never changed by a call to marpa_r_earleme_complete().
Clears the event queue of any events that occured before this method was called.
May trigger one or more MARPA_EVENT_SYMBOL_COMPLETED, MARPA_EVENT_SYMBOL_NULLED, MARPA_EVENT_SYMBOL_PREDICTED, or MARPA_EVENT_SYMBOL_EXPECTED events. See Events.
If an application-settable threshold on the number of Earley items has been reached or exceeded, triggers a MARPA_EVENT_EARLEY_ITEM_THRESHOLD event. Often, the application will want to treat this event as if it were a ancestry-recoverable failure. See marpa_r_earley_item_warning_threshold_set().
If the parse is exhausted, triggers a MARPA_EVENT_EXHAUSTED event. Exhaustion on success only occurs if no terminals are expected at the current earleme after the call to this method (that is, at current+1) and no terminals are anticipated after current+1.

On hard failure with the code MARPA_ERR_PARSE_EXHAUSTED, does the following:

Leaves the current earleme at current. The current earleme will be the same as the furthest earleme.
The value of the furthest earleme is never changed by a call to marpa_r_earleme_complete().
Leaves the value of the latest earleme at latest. No new Earley set is created.
Sets the parse exhausted, so that no more tokens will be accepted.
Leaves the parse in a state where no terminals are expected or anticipated.
Clears the event queue of any events that occured before the call to this method.
Triggers a MARPA_EVENT_EXHAUSTED event and no others.
Leaves valid any parses that were valid at the current or earlier earlemes. Processing with these can continue, and it for this reason that we consider hard failures with the code MARPA_ERR_PARSE_EXHAUSTED to be fully recoverable.

We note that exhaustion can occur when this method fails and when it succeeds. The distinction is that, on success, the call creates a new Earley set before becoming exhausted while, on failure, it becomes exhausted without creating a new Earley set.

This method is commonly called at the top of a loop. Almost all applications will want to check the return value and take special action in case of a value other than zero. If the value is greater than zero, an event will have occurred and almost all applications should react to MARPA_EVENT_EARLEY_ITEM_THRESHOLD events, as described above, and to unexpected events. If the value is less than zero, it may be due to an irrecoverable error, and only in very unusual circumstances will an application wish to ignore these.

How an application reacts to exhaustion will depend on the kind of parsing it is doing. See Exhaustion.

It is often up to the logic immediately around this method to detect EOP. See End of parse.

Return value: On success, the number of events in the event queue when the method call returns. On hard failure, -2. Hard failure with the code MARPA_ERR_PARSE_EXHAUSTED is fully recoverable.

19.5 Location accessors

Accessor function: Marpa_Earleme marpa_r_current_earleme (Marpa_Recognizer r)

Successful iff input has started. If input has not started, returns soft failure.

Return value: On success, the current earleme, which is always non-negative. On soft failure, -1. Never returns a hard failure.

Accessor function: Marpa_Earleme marpa_r_earleme ( Marpa_Recognizer r, Marpa_Earley_Set_ID set_id)

On success, returns the earleme of the Earley set with ID set_id. The ID of an Earley set ID is also called its ordinal. In the default, token-stream model, Earley set ID and earleme are always equal, but this is not the case in other input models.

Hard fails if there is no Earley set whose ID is set_id. This hard failure is fully recoverable. If set_id was negative, the error code of the hard failure is MARPA_ERR_INVALID_LOCATION. If set_id is greater than the ordinal of the latest Earley set, the error code of the hard failure is MARPA_ERR_NO_EARLEY_SET_AT_LOCATION.

Techniques for performing the inverse operation (conversion of an earleme to an Earley set ID) are described in the section on advanced input models. See Converting earleme to Earley set ID.

Return value: On success, the earleme corresponding to Earley set set_id, which is always non-negative. On hard failure, -2. The hard failures with error codes MARPA_ERR_INVALID_LOCATION and MARPA_ERR_NO_EARLEY_SET_AT_LOCATION are fully recoverable.

Accessor function: int marpa_r_earley_set_value ( Marpa_Recognizer r, Marpa_Earley_Set_ID earley_set)

On success, returns the “integer value” of earley_set. For more about the integer value of an Earley set, see marpa_r_earley_set_values().

Return value: On success, returns the the integer value of earley_set, and sets the error code to MARPA_ERR_NONE. On hard failure, returns -2, and sets the error code to the error code of the hard failure, which will never be MARPA_ERR_NONE. Note that -2 is a valid “integer value” for an Earley set, so that when -2 is returned, the error code is the only way to distinguish success from failure. The error code can be determined using marpa_g_error(). See marpa_g_error().

Mutator function: int marpa_r_earley_set_values ( Marpa_Recognizer r, Marpa_Earley_Set_ID earley_set, int* p_value, void** p_pvalue )

On success, does the following:

If p_value is non-zero, sets the location pointed to by p_value to the integer value of the Earley set with ID earley_set.
If p_pvalue is non-zero, sets the location pointed to by p_pvalue to the pointer value of the Earley set with ID earley_set.

The “value” and “pointer” of an Earley set are an arbitrary integer and an arbitrary pointer. Libmarpa never examines them and the application is free to use them for its own purposes. In an application with a codepoint-per-earleme input model, for example, the integer value of the Earley set can used to store the current codepoint. In a traditional token-per-earleme input model, the integer and pointer values could be used to track the string value of the token – the pointer could point to the start of the string, and the integer could indicate its length. See The codepoint-per-earleme model.

The Earley set integer value defaults to -1, and the pointer value defaults to NULL. The Earley set value and pointer can be set using the marpa_r_latest_earley_set_values_set() method. See marpa_r_latest_earley_set_values_set().

Return value: On success, returns a non-negative integer. On hard failure, returns -2.

Accessor function: unsigned int marpa_r_furthest_earleme (Marpa_Recognizer r): Return value: The furthest earleme. Always succeeds.

Accessor function: Marpa_Earley_Set_ID marpa_r_latest_earley_set (Marpa_Recognizer r)

Returns the Earley set ID of the latest Earley set. The ID of an Earley set ID is also called its ordinal. Applications that want the value of the latest earleme can convert this value using the marpa_r_earleme() method. See marpa_r_earleme().

Return value: The ID of the latest Earley set. Always succeeds.

Mutator function: int marpa_r_latest_earley_set_value_set ( Marpa_Recognizer r, int value)

Sets the “integer value” of the latest Earley set to value. For more about the integer value of an Earley set, see marpa_r_earley_set_values().

Return value: On success, returns the newly set integer value of the latest earley set, and sets the error code to MARPA_ERR_NONE. On hard failure, returns -2, and sets the error code to the error code of the hard failure, which will never be MARPA_ERR_NONE. Note that -2 is a valid “integer value” for an Earley set, so that when -2 is returned, the error code is the only way to distinguish success from failure. The error code can be determined using marpa_g_error(). See marpa_g_error().

Mutator function: int marpa_r_latest_earley_set_values_set ( Marpa_Recognizer r, int value, void* pvalue)

Sets the integer and pointer value of the latest Earley set. For more about the “integer value” and “pointer value” of an Earley set, see marpa_r_earley_set_values().

Return value: On success, returns a non-negative integer. On hard failure, returns -2.

19.6 Other parse status methods

Accessor function: int marpa_r_earley_item_warning_threshold (Marpa_Recognizer r)

Details about the “earley item warning threshold” are in the description of the marpa_r_earley_item_warning_threshold_set() method. See marpa_r_earley_item_warning_threshold_set().

Return value: The Earley item warning threshold. Always succeeds.

Mutator function: int marpa_r_earley_item_warning_threshold_set (Marpa_Recognizer r, int threshold)

On success, sets the Earley item warning threshold. The Earley item warning threshold is a number that is compared with the count of Earley items in each Earley set. When it is matched or exceeded, a MARPA_EVENT_EARLEY_ITEM_THRESHOLD event is created. See MARPA_EVENT_EARLEY_ITEM_THRESHOLD.

If threshold is zero or less, an unlimited number of Earley items will be allowed without warning. This will rarely be what the user wants.

By default, Libmarpa calculates a value based on the grammar. The formula Libmarpa uses is the result of some experience, and most applications will be happy with it.

What should be done when the threshold is exceeded, depends on the application, but exceeding the threshold means that it is very likely that the time and space resources consumed by the parse will prove excessive. This is often a sign of a bug in the grammar. Applications often will want to smoothly shut down the parse, in effect treating the MARPA_EVENT_EARLEY_ITEM_THRESHOLD event as equivalent to library-recoverable hard failure.

Return value: The value that the Earley item warning threshold has after the method call is finished. Always succeeds.

Accessor function: int marpa_r_is_exhausted (Marpa_Recognizer r)

A parser is “exhausted” if it cannot accept any more input. See Exhaustion.

Return value: 1 if the parser is exhausted, 0 otherwise. Always succeeds.

Accessor function: int marpa_r_terminals_expected ( Marpa_Recognizer r, Marpa_Symbol_ID* buffer)

Returns a list of the ID’s of the symbols that are acceptable as tokens at the current earleme. buffer is expected to be large enough to hold the result. This is guaranteed to be the case if the buffer is large enough to hold an array of Marpa_Symbol_ID’s whose length is greater than or equal to the number of symbols in the grammar.

Return value: On success, the number of Marpa_Symbol_ID’s in buffer, which is always non-negative. On hard failure, -2.

Accessor function: int marpa_r_terminal_is_expected ( Marpa_Recognizer r, Marpa_Symbol_ID symbol_id)

On success, does the folloing:

If symbol_id is not the ID of a terminal symbol, returns 0.
If symbol_id is the ID of a terminal symbol, but that symbol is not expected at the current earleme, returns 0.
If symbol_id is the ID of a terminal symbol, but that symbol is expected at the current earleme, returns 1.

Hard fails if the symbol with ID symbol_id does not exist.

Return value: On success, 0 or 1. On hard failure, -2.

20 Progress reports

Libmarpa’s progress reports allow access to the Earley items in an Earley set.

To start a progress report, use the marpa_r_progress_report_start() command. For each recognizer, only one progress report can be in use at any one time.

To step through the Earley items, use the marpa_r_progress_item() method.

20.1 Progress report traverser

A progress report traverser is

a linked list of the Earley items in a Earley set, in which
one node is distinguished as the current node of the traverser.

The linked list has two sentinel nodes, one at the start of the linked list, and one at the end of the linked list. The two sentinel nodes do not correspond to Earley items so that, where n is the size of the Earley set, the linked list of its traverser contains n+2 nodes.

The sentinel node at the start of the linked list is called the start sentinel node or start sentinel. In a traverser for a non-empty Earley set, the start sentinel comes before the first Earley item. The sentinel node at the end of the linked list is called the end sentinel node or end sentinel. In a traverser for a non-empty Earley set, the end sentinel comes after the last Earley item.

20.2 Progress report methods

Mutator function: int marpa_r_progress_report_reset ( Marpa_Recognizer r)

On success, sets the current node of the report traverser to the start sentinel. See Progress report traverser.

This method is not usually needed. Its effect is to leave the traverser in the same state as it is immediately after the marpa_r_progress_report_start() method. Loosely speaking, it allows the traversal to “start over”.

Hard fails if the recognizer is not started, or if no progress report traverser is active.

Return value: On success, a non-negative value. On failure, -2.

Mutator function: int marpa_r_progress_report_start ( Marpa_Recognizer r, Marpa_Earley_Set_ID set_id)

Creates a progress report traverser in recognizer r for the Earley set with ID set_id.

On success, does the following:

If a progress report traverser was active in recognizer r before this method call, destroys it and frees its memory.
Creates a new progress report traverser from the Earley items for the Earley set with ID set_id.
Sets the current node of the traverser to the start sentinel. Intuitively, for a non-empty Earley set, this can be thought of as positioning the traverser before the first Earley item.
Returns n, the number of Earley items. n may be zero.

Hard fails if no Earley set with ID set_id exists. The error code is MARPA_ERR_INVALID_LOCATION if set_id is negative. The error code is MARPA_ERR_NO_EARLEY_SET_AT_LOCATION if set_id is greater than the ID of the latest Earley set.

Return value: On success, the number of Earley items, which will always be non-negative. On hard failure, -2.

Mutator function: int marpa_r_progress_report_finish ( Marpa_Recognizer r )

On success, destroys the progress report traverser for recognizer r, freeing its memory. For details about the report traverser, see marpa_r_progress_report_start().

It is often not necessary to call this method. marpa_r_progress_report_start() destroys any previously existing progress report. And, when a recognizer is destroyed, its progress report is destroyed as a side effect.

Hard fails if no progress report is active.

Return value: On success, a non-negative value. On hard failure, -2.

Mutator function: Marpa_Rule_ID marpa_r_progress_item ( Marpa_Recognizer r, int* position, Marpa_Earley_Set_ID* origin )

This method allows access to the data for the next progress report item of a progress report. See Progress report traverser.

Let oldCurrent be the current node of the progress report traverser in r when this method was called. Let oldNext be the next link of oldCurrent. If oldCurrent is the end sentinel, oldNext is not defined, and this method soft fails.

In the event of success:

Sets the current node of the traverser to oldNext. We will call this new value of the current node, newCurrent.
Writes the “cooked dot position” of the Earley item corresponding to newCurrent to the location pointed to by the position argument.
Writes the origin of the Earley item corresponding to newCurrent to the location pointed to by the origin argument.
Returns the rule ID of the Earley item corresponding to newCurrent.

The “cooked dot position” is

the standard 0-based start-relative dot position, if the dotted rule is not a completion; and
-1, if the dotted rule is a completion. This has the advantage of making it easy for the application to determine if the dotted rule is a completion. An application that prefers the “raw” 0-based start-relative dot position may convert a -1 to the rule length. See marpa_g_rule_length().

In the event of soft failure:

Sets the current node to the end sentinel.
Sets the error code to MARPA_ERR_PROGRESS_REPORT_EXHAUSTED.
Leaves unchanged the locations pointed to by the position and origin arguments.
Returns -1.

In addition to watching for soft failure, the application can use the item count returned by marpa_r_progress_report_start() to determine when the last item has been seen.

Return value: On success, the rule ID of the progress report item, which is always non-negative. On soft failure, -1. If either the position or the origin argument is NULL, or on other hard failure, -2.

21 Bocage methods

21.1 Overview

To create a bocage, use the marpa_b_new() method.

When a bocage is no longer in use, its memory can be freed using the marpa_b_unref() method.

21.2 Bocage data structure

A bocage is a data structure containing the parses found by processing the input according to the grammar. It is related to a parse forest, but is in a form that is more compact and easily traversable. “Bocage” is our term, and we discovered this structure independently, but we were preceded in the discovery by Elizabeth Scott. And, unlike us, Prof. Scott did the all-important work of documenting it and providing the appropriate mathematical apparatus. See Scott 2008.

21.3 Creating a new bocage

Constructor function: Marpa_Bocage marpa_b_new (Marpa_Recognizer r, Marpa_Earley_Set_ID earley_set_ID)

On success, the following is the case:

If earley_set_ID is non-negative, creates a new bocage object, whose EOP is the Earley set with ID earley_set_ID.
If earley_set_ID is -1, creates a new bocage object, whose EOP is the Earley set at the current earleme.
The new bocage object has a reference count of 1.
The reference count of its parent recognizer object, r, is increased by 1.

The application will usually want EOP to be the Earley set at the current earleme, but there are exceptions. See End of parse.

If earley_set_ID is -1 and there is no Earley set at the current earleme; or if earley_set_ID is non-negative and there is no parse ending at Earley set earley_set_ID, marpa_b_new() hard fails with the error code MARPA_ERR_NO_PARSE.

Return value: On success, the new bocage object. On hard failure, NULL.

21.4 Reference counting

Mutator function: Marpa_Bocage marpa_b_ref (Marpa_Bocage b)

On success, increases the reference count by 1. This method is not needed by most applications.

Return value: On success, b. On hard failure, NULL.

Destructor function: void marpa_b_unref (Marpa_Bocage b): Decreases the reference count by 1, destroying b once the reference count reaches zero. When b is destroyed, the reference count of its parent recognizer is decreased by 1.

21.5 Accessors

Accessor function: int marpa_b_ambiguity_metric (Marpa_Bocage b)

On success, returns an ambiguity metric. If the parse is unambiguous, the metric is 1. If the parse is ambiguous, the metric is 2 or greater, and is otherwise unspecified. See Better defined ambiguity metric.

Return value: On success, the ambiguity metric, which is always non-negative. On hard failure, -2.

Accessor function: int marpa_b_is_null (Marpa_Bocage b): Return value On success, a non-negative integer: 1 or greater if the bocage is for a null parse, and 0 if the bocage is not for a null parse. On hard failure, -2.

22 Ordering methods

22.1 Overview

Before iterating through the parse trees in the bocage, the parse trees must be ordered. To create an ordering, use the marpa_o_new() method.

When an ordering is no longer in use, its memory can be freed using the marpa_o_unref() method.

22.2 Freezing the ordering

An ordering is frozen under the following circumstances:

The first tree iterator is created using the ordering. See marpa_t_new().
marpa_o_ambiguity_metric() is successfully called. See marpa_o_ambiguity_metric().
marpa_o_rank() is successfully called. See marpa_o_rank().

A frozen ordering cannot be changed. There is no way to “unfreeze” an ordering.

22.3 Creating an ordering

Constructor function: Marpa_Order marpa_o_new ( Marpa_Bocage b)

On success, does the following:

Creates a new ordering object, with a reference count of 1.
Increases the reference count of its parent bocage object, b, by 1.

Return value: On success, the new ordering object. On hard failure, NULL.

22.4 Reference counting

Mutator function: Marpa_Order marpa_o_ref ( Marpa_Order o)

On success, increases the reference count by 1. Not needed by most applications.

Return value: On success, o. On hard failure, NULL.

Destructor function: void marpa_o_unref ( Marpa_Order o): Decreases the reference count by 1, destroying o once the reference count reaches zero.

22.5 Accessors

Accessor function: int marpa_o_ambiguity_metric (Marpa_Order o)

If “high rank only” is in effect, this ambiguity metric may differ from that returned by marpa_b_ambiguity_metric(). In particular, a “high rank only” ordering may be unambiguous even if its base bocage is ambiguous. But note also, because multiple parses choices may have the same rank, it is quite possible for a “high rank only” ordering to be ambiguous.

If the ordering is not already frozen, it will be frozen on return from marpa_o_ambiguity_metric(). Note that, despite this, we classify marpa_o_ambiguity_metric() as an “accessor”. Our classification of methods into accessors, mutators, etc. is for the purpose of specifying diagnostic behaviors. When diagnostic behaviors proscribe the use of mutators on a ordering, they treat the orderings as if they were frozen, so that, from the point of view of the diagnostic behaviors, the “freezing” of an ordering does not change its state.

Return value: On success, the ambiguity metric, which is non-negative. On hard failure, -2.

Accessor function: int marpa_o_is_null (Marpa_Order o): Return value: On success: A number greater than or equal to 1 if the ordering is for a null parse; otherwise, 0. On hard failure, -2.

22.6 Non-default ordering

Accessor function: int marpa_o_high_rank_only ( Marpa_Order o)

On success, returns, the “high rank only” flag of ordering o. See marpa_o_high_rank_only_set().

Return value: On success, the value of the “high rank only” flag, which is a boolean. On hard failure, -2.

Mutator function: int marpa_o_high_rank_only_set ( Marpa_Order o, int flag)

Sets the “high rank only” flag of ordering o. A flag of 1 indicates that, when ranking, all choices should be discarded except those of the highest rank. A flag of 0 indicates that no choices should be discarded on the basis of their rank.

A value of 1 is the default. The value of the “high rank only” flag has no effect until ranking is turned on using the marpa_o_rank() method.

Hards fails if the ordering is frozen.

Return value: On success, a boolean which is the value of the “high rank only” flag after the call. On hard failure, -2.

Mutator function: int marpa_o_rank ( Marpa_Order o )

By default, the ordering of parse trees is arbitrary. On success, the following happens:

The ordering is ranked according to the ranks of symbols and rules, the “null ranks high” flags of the rules, and the “high rank only” flag of the ordering.
The ordering is frozen. See Freezing the ordering.

Return value: On success, a non-negative value. On hard failure, -2.

23 Tree methods

23.1 Overview

Once the bocage has an ordering, the parses trees can be iterated. Marpa’s parse tree iterators iterate the parse trees contained in a bocage object. In Libmarpa, “parse tree iterators” are usually just called trees.

To create a tree, use the marpa_t_new() method. A newly created tree iterator is positioned before the first parse tree.

When a tree iterator is no longer in use, its memory can be freed using the marpa_t_unref() method.

To position a newly created tree iterator at the first parse tree, use the marpa_t_next() method. Once the tree iterator is positioned at a parse tree, the same marpa_t_next() method is used to position it to the next parse tree.

23.2 Creating a new tree iterator

Constructor function: Marpa_Tree marpa_t_new (Marpa_Order o)

On success, does the following:

Creates a new tree iterator, with a reference count of 1.
Increases the reference count of its parent ordering object, o, by 1.
Positions the new tree iterator before the first parse tree.

Return value: On success, a newly created tree. On hard failure, NULL.

23.3 Reference counting

Mutator function: Marpa_Tree marpa_t_ref (Marpa_Tree t)

On success, increases the reference count by 1. Not needed by most applications.

Return value: On success, t. On hard failure, NULL.

Destructor function: void marpa_t_unref (Marpa_Tree t): Decreases the reference count by 1, destroying t once the reference count reaches zero.

23.4 Iterating through the trees

Mutator function: int marpa_t_next ( Marpa_Tree t)

On success, positions t at the next parse tree in the iteration.

Tree iterators are initialized to the position before the first parse tree, so this method must be called before creating a valuator from a tree.

If a tree iterator is positioned after the last parse, the tree is said to be “exhausted”. A tree iterator for a bocage with no parse trees is considered to be “exhausted” when initialized.

If the tree iterator is exhausted, soft fails, and sets the error code to MARPA_ERR_TREE_EXHAUSTED. See Orthogonal treatment of soft failures.

It the tree iterator is paused, hard fails, and sets the error code to MARPA_ERR_TREE_PAUSED. This hard failure is fully recoverable. See marpa_v_new().

Return value: On success, a non-negative value. On soft failure, -1. On hard failure, -2. The hard failure with error code MARPA_ERR_TREE_PAUSED is fully recoverable.

Accessor function: int marpa_t_parse_count ( Marpa_Tree t)

Returns the count of the number of parse trees traversed so far. The count includes the current iteration of the tree. A value of 0 indicates that the tree iterator is at its initialized position, before the first parse tree.

Return value: The number of parses traversed so far. Always succeeds.

24 Value methods

24.1 Overview

The archetypal application needs a value object (or valuator) to produce the value of the parse tree. To create a valuator, use the marpa_v_new() method.

The application is required to maintain the stack, and the application is also required to implement most of the semantics, including the evaluation of rules. Libmarpa’s valuator provides instructions to the application on how to manipulate the stack. To iterate through this series of instructions, use the marpa_v_step() method.

When successful, marpa_v_step() returns the type of step. Most step types have values associated with them. See Basic step accessors, see How to use the valuator, and see Stepping through the valuator.

When a valuator is no longer in use, its memory can be freed using the marpa_v_unref() method.

24.2 How to use the valuator

Libmarpa’s valuator provides the application with “steps”, which are instructions for stack manipulation. Libmarpa itself does not maintain a stack. This leaves the upper layer in total control of the stack and the values that are placed on it.

As example may make this clearer. Suppose the evaluation is at a place in the parse tree where an addition is being performed. Libmarpa does not know that the operation is an addition. It will tell the application that rule number R is to be applied to the arguments at stack locations N and N+1, and that the result is to placed in stack location N.

In this system the application keeps track of the semantics for all rules, so it looks up rule R and determines that it is an addition. The application can do this by using R as an index into an array of callbacks, or by any other method it chooses. Let’s assume a callback implements the semantics for rule R. Libmarpa has told the application that two arguments are available for this operation, and that they are at locations N and N+1 in the stack. They might be the numbers 42 and 711. So the callback is called with its two arguments, and produces a return value, let’s say, 753. Libmarpa has told the application that the result belongs at location N in the stack, so the application writes 753 to location N.

Since Libmarpa knows nothing about the semantics, the operation for rule R could be string concatenation instead of addition. Or, if it is addition, it could allow for its arguments to be floating point or complex numbers. Since the application maintains the stack, it is up to the application whether the stack contains integers, strings, complex numbers, or polymorphic objects that are capable of being any of these things and more.

24.3 Advantages of step-driven valuation

Step-driven valuation hides Libmarpa’s grammar rewrites from the application, and is quite efficient. Libmarpa knows which rules are sequences. Libmarpa optimizes stack manipulations based on this knowledge. Long sequences are very common in practical grammars. For these, the stack manipulations suggested by Libmarpa’s step-driven valuator will be significantly faster than the traditional stack evaluation algorithm.

Step-driven evaluation has another advantage. To illustrate this, consider what is a very common case: The semantics are implemented in a higher-level language, using callbacks. If Libmarpa did not use step-driven valuation, it would need to provide for this case. But for generality, Libmarpa would have to deal in C callbacks. Therefore, a middle layer would have to create C language wrappers for the callbacks in the higher level language.

The implementation that results is this: The higher level language would need to wrap each callback in C. When calling Libmarpa, it would pass the wrappered callback. Libmarpa would then need to call the C language “wrapper”. Next, the wrapper would call the higher-level language callback. The return value, which would be data native to the higher-level language, would need to be passed to the C language wrapper, which will need to make arrangements for it to be based back to the higher-level language when appropriate.

A setup like this is not terribly efficient. And exception handling across language boundaries would be very tricky.

But neither of these is the worst problem. The worst problem is that callbacks are hard to debug. Wrappered callbacks are even worse. Calls made across language boundaries are harder yet to debug. In the system described above, by the time a return value is finally consumed, a language boundary will have been crossed four times. The ability to debug can make the difference between code that works and code that does not work, and callbacks put far too many hurdles in the way of debugging.

So, while step-driven valuation seems a roundabout approach, it is simpler and more direct than the likely alternatives. And there is something to be said for pushing semantics up to the higher levels — they can be expected to know more about it.

These advantages of step-driven valuation are strictly in the context of a low-level interface. We are under no illusion that direct use of Libmarpa’s valuator will be found satisfactory by most Libmarpa users, even those using the C language. Libmarpa’s valuator is intended to be used via an upper layer, one that does know about semantics.

24.4 Maintaining the stack

This section discusses in detail the requirements for maintaining the stack. In some cases, such as implementation using a Perl array, fulfilling these requirements is trivial. Perl auto-extends its arrays, and initializes the element values, on every read or write. For the C programmer, things are not quite so easy.

In this section, we will assume a C89 standard-conformant C application. This assumption is convenient on two grounds. First, this will be the intended use for many readers. Second, standard-conformant C89 is a “worst case”. Any issue faced by a programmer in another environment is likely to also be one that must be solved by the C programmer.

Libmarpa often optimizes away unnecessary stack writes to stack locations. When it does so, it will not necessarily optimize away all reads to that stack location. This means that a location’s first access, as suggested by the Libmarpa step instructions, may be a read. This possibility requires a special awareness from the C programmer. See Sizing the stack.

24.4.1 Sizing the stack

In our discussion of the stack handler for the valuator, we will treat the stack as a 0-based array. If an implementation applies Libmarpa’s step instructions literally, using a physical stack, it must make sure that all locations in the stack are initialized. The range of locations that must be initialized is from stack location 0 to the “end of stack” location. The result of the parse tree is always stored in stack location 0, so that a stack location 0 is always needed. Therefore, the end of stack location is always a specified value, and greater than or equal to 0. The end of stack location must also be greater than or equal to

marpa_v_result(v) for every MARPA_STEP_TOKEN step,
marpa_v_result(v) for every MARPA_STEP_NULLING_SYMBOL step, and
marpa_v_arg_n(v) for every MARPA_STEP_RULE step.

In practice, an application will often extend the stack as it iterates through the steps, initializing the new stack locations as they are created.

Note that our requirement is not merely that the stack locations exist and be writable, but that they be initialized. This is necessary for C89 conformance. Because of write optimizations in our implementation, the first access to any stack location may be a read. C89 allows trap values, so that a read of an uninitialized location could result in undefined behavior. See Trap representations.

24.5 Creating a new valuator

Constructor function: Marpa_Value marpa_v_new ( Marpa_Tree t )

On success, does the following:

Creates a new valuator. The parent object of the new valuator will be the tree iterator t.
Sets the reference count of the new valuator to 1.
Sets new valuator to active. A valuator is always either active or inactive.
Increases the reference count of t by 1.
“Pauses” the parent tree iterator.

As long as a parent tree iterator is paused marpa_t_next() will not succeed, and therefore the parent tree iterator cannot move on to a new parse tree. Many valuators can share the same parent parse tree. A tree iterator is “unpaused” when all of the valuators of that tree iterator are destroyed.

Return value: On success, the newly created valuator. On hard failure, NULL.

24.6 Reference counting

Mutator function: Marpa_Value marpa_v_ref (Marpa_Value v)

On success, increases the reference count by 1. Not needed by most applications.

Return value: On success, v. On hard failure, NULL.

Destructor function: void marpa_v_unref (Marpa_Value v): Decreases the reference count by 1, destroying v once the reference count reaches zero.

24.7 Stepping through the valuator

Mutator function: Marpa_Step_Type marpa_v_step ( Marpa_Value v)

Steps through the tree in depth-first, left-to-right order. On success, does the following:

Takes the valuator to the next step in its life cycle.
Sets and returns the step type, which is a non-negative integer. The C type for a step type is Marpa_Step_Type.

The step type tells the application how it expected to act on the step. See Valuator step types. Steps are often referred to along with their step type so that, for example, we say “a MARPA_STEP_RULE step” to refer to a step whose step type is MARPA_STEP_RULE.

When the iteration through the steps is finished, the step type is MARPA_STEP_INACTIVE. At this point, we say that the valuator is inactive. Once a valuator becomes inactive, it stays inactive.

Return value: On success, a Marpa_Step_Type, which always be a non-negative integer. On hard failure, -2.

24.8 Valuator step types

Accessor macro: Marpa_Step_Type MARPA_STEP_RULE

MARPA_STEP_RULE is the step type for for a rule node. The application should perform its equivalent of rule execution.

The “child values” for this step will be in the stack locations from marpa_v_arg_0(v) to marpa_v_arg_n(v).
The rule for this step will be marpa_v_rule(v).
The result of this step should be written to stack location marpa_v_result(v). Typically, the result of this step is determined by executing the semantics for its rule on its child values.
The stack location of marpa_v_result(v) is guaranteed to be equal to marpa_v_arg_0(v).

Accessor macro: Marpa_Step_Type MARPA_STEP_TOKEN

MARPA_STEP_TOKEN is the step type for a token node. The application’s equivalent of the evaluation of the semantics of a non-null token should be performed.

The symbol ID of the token will be at stack location marpa_v_token(v).
Libmarpa will already have a“token value” for the token in this step, as was set by the marpa_r_alternative() method. See marpa_r_alternative(). Libmarpa’s “token value” will be in stack location marpa_v_token_value(v).
The result of the applications’s evaluation of the semantics of this token should be placed in stack location marpa_v_result(v). Often, an application will simply copy Libmarpa’s “token value” to stack location marpa_v_result(v).

Accessor macro: Marpa_Step_Type MARPA_STEP_NULLING_SYMBOL

MARPA_STEP_NULLING_SYMBOL is the step type for a nulled node. The application’s equivalent of the evaluation of the semantics of a nulled token should be performed.

The ID of the nulled symbol will be at stack location marpa_v_symbol(v).
The application’s value for this nulled symbol instance should be placed in stack location marpa_v_result(v). Often, an application will assign a fixed value to each nullable symbol, and will simply copy this fixed value to stack location marpa_v_result(v).

The use of the word “nulling” in the step type name MARPA_STEP_NULLING_SYMBOL is problematic. While the node must be zero-length (nulled), the node’s symbol need not be nulling: it may be a proper nullable. The name of the step type would more correctly be “MARPA_STEP_NULLED_SYMBOL”, but it is left as is for backward compatibility. See Nulling versus nulled.

Accessor macro: Marpa_Step_Type MARPA_STEP_INACTIVE: When this is the step type, the valuator has gone through all of its steps and is now inactive. The value of the parse tree will be in stack location 0. Because of optimizations, it is possible for valuator to immediately became inactive — MARPA_STEP_INACTIVE could be both the first and last step. Once a valuator becomes inactive, it stays inactive.

Accessor macro: Marpa_Step_Type MARPA_STEP_INITIAL: The valuator is new and has yet to go through any steps.

Accessor macro: Marpa_Step_Type MARPA_STEP_INTERNAL1
Accessor macro: Marpa_Step_Type MARPA_STEP_INTERNAL2
Accessor macro: Marpa_Step_Type MARPA_STEP_TRACE: These step types are reserved for internal purposes.

24.9 Basic step accessors

This section describes the accessors that are basic to stack manipulation.

Accessor macro: int marpa_v_arg_0 (Marpa_Value v): Return value: For a MARPA_STEP_RULE step, the stack location where the value of first child can be found. For other step types, an unspecified value. Always succeeds.

Accessor macro: int marpa_v_arg_n (Marpa_Value v): Return value: For a MARPA_STEP_RULE step, the stack location where the value of the last child can be found. For other step types, an unspecified value. Always succeeds.

Accessor macro: int marpa_v_result (Marpa_Value v): Return value: For MARPA_STEP_RULE, MARPA_STEP_TOKEN, and MARPA_STEP_NULLING_SYMBOL steps, the stack location where the result of the semantics should be placed. For other step types, an unspecified value. Always succeeds.

Accessor macro: Marpa_Rule_ID marpa_v_rule (Marpa_Value v): Return value: For the MARPA_STEP_RULE step, the ID of the rule. For other step types, an unspecified value. Always succeeds.

Accessor macro: Marpa_Step_Type marpa_v_step_type (Marpa_Value v)

This macro is rarely needed since its return value is the same as the value that marpa_v_step() returns on success.

Return value: The current step type: MARPA_STEP_TOKEN, MARPA_STEP_RULE, etc. Always succeeds.

Accessor macro: Marpa_Symbol_ID marpa_v_symbol (Marpa_Value v): Return value: For the MARPA_STEP_NULLING_SYMBOL step, the ID of the symbol. The value returned is the same as that returned by the marpa_v_token() macro. For other step types, an unspecified value. Always succeeds.

Accessor macro: Marpa_Symbol_ID marpa_v_token (Marpa_Value v): Return value: For the MARPA_STEP_TOKEN step, the ID of the token. The value returned is the same as that returned by the marpa_v_symbol() macro. For other step types, an unspecified value. Always succeeds.

Accessor macro: int marpa_v_token_value (Marpa_Value v): Return value: For the MARPA_STEP_TOKEN step, the “token value” that was assigned to the token by the marpa_r_alternative() method. See marpa_r_alternative(). For other step types, an unspecified value. Always succeeds.

24.10 Step location accessors

This section describes step accessors that are not basic to stack manipulation. They provide Earley set location information about the parse tree.

A step’s location in terms of Earley sets is called its ES location. Every ES location is the ID of an Earley set. ES location is only relevant for steps that correspond to tree nodes.

Every tree node has both a start ES location and an end ES location. The start ES location is the first ES location of that parse node.

The end ES location of a leaf is the ES location where the next leaf symbol in the fringe of the current parse tree would start. Typically, this is the location where a leaf node actually starts but, at the end of a parse, there is not an actual next leaf node.

The start ES location of a MARPA_RULE_STEP step is the start ES location of its first child node in the current parse tree. The end ES location of a MARPA_RULE_STEP step is the end ES location of its last child node in the current parse tree.

Accessor macro: Marpa_Earley_Set_ID marpa_v_es_id (Marpa_Value v): Return value: If the current step type is MARPA_STEP_RULE, MARPA_STEP_TOKEN, or MARPA_STEP_NULLING_SYMBOL, the return value is the end ES location of the parse node. If the current step type is anything else, or if the valuator is inactive, the return value is unspecified.

Accessor macro: Marpa_Earley_Set_ID marpa_v_rule_start_es_id (Marpa_Value v): Return value: If the current step type is MARPA_STEP_RULE, the start ES location of the rule node. If the current step type is anything else, or if the valuator is inactive, the return value is unspecified.

Accessor macro: Marpa_Earley_Set_ID marpa_v_token_start_es_id (Marpa_Value v): Return value: If the current step type is MARPA_STEP_TOKEN or MARPA_STEP_NULLING_SYMBOL, the start ES location of the leaf node. If the current step type is anything else, or if the valuator is inactive, the return value is unspecified.

For every parse node of the current parse tree, the Earley set length (ES length) of the node is the end ES location, less the start ES location. The ES length of a nulled node is always 0.

If v is a valuator whose current step type is MARPA_STEP_NULLING_SYMBOL, it is always the case that

         marpa_v_token_start_es_id(v) == marpa_v_es_id(v)

If v is a valuator whose current step type is MARPA_STEP_RULE or MARPA_STEP_TOKEN, it is always the case that

        marpa_v_token_start_es_id(v) <= marpa_v_es_id(v)

For the following examples,

let Null be a nulling symbol,
let Tok be a non-nullable symbol, and
let the notation Sym@m-n indicate that the symbol Sym has ES start location m and ES end location n.

Ordered from left to right, a possible fringe is

        Null@0-0, Tok@0-1, Null@1-1, Tok@1-2, Null@2-2

Another example is

        Null@0-0, Null@0-0, Tok@0-1, Null@1-1, Null@1-1, Tok@1-2,
        Null@2-2, Null@2-2

In this second example note that when a nulled leaf immediately follows another nulled leaf, both leaves has the same start ES location and the same end ES location. This makes sense, because nulled symbol instances do not advance the current ES location, but it also implies that the ES locations and LHS symbol cannot be used to uniquely identify a parse node.

25 Events

25.1 Overview

This chapter discusses Libmarpa’s events. It contains descriptions of both grammar and recognizer methods.

A method is event-triggering iff it can add event instances to the event queue. The event-triggering methods are marpa_g_precompute(), marpa_r_earleme_complete(), and marpa_r_start_input(). The event-triggering methods always clear all previous events so that, on return from an event-triggering method, the only events in the event queue will be the events triggered by that method.

Every event instance has a type, and a subtype. The type of an event instance is its code, such as MARPA_EVENT_SYMBOL_COMPLETED. Event codes are listed in a later section. See Event codes. Subtypes will be described shortly.

Event types are either implicit or explicit. Intuitively, implicit event types are available to the application without any action by the user; and explicit event types are not available unless the user calls certains methods which are available for that purpose.

Event are also either global or per-symbol. In any event-triggering method, at most one event instance can trigger for each subtype. If an event type is global, it has only one subtype, and only one event instance of that type may trigger in a method call. If an event is per-symbol, it has one subtype for each symbol in grammar. Many instances of a per-symbol event may trigger in a method call, up to the number of symbols in the grammar.

All global event types are implicit. All explicit event types are per-symbol. The reason for event types to be explicit is to avoid the overhead of unused events, and for per-symbol event types there might be many such events.

For an event instance to trigger, its event subtype must first be declared. Subtypes of implicit events are always declared — declaring them requires no action on the part of the user. Subtypes of explicit event types must be declared by the user, using a method call provided for that purpose. Once declared, an event subtype remains declared: there is no way to remove or undo the declaration of an event subtype. Details of event declaration are provided, by event code, in the section on event codes. See Event codes.

A declared event subtype can be activated. An event instance will not trigger, unless its event subtype is activated. Subtypes of implicit event are always activated — activating them requires no action on the part of the user. Subtypes of explicit event types must be activated by the user, using a method call provided for that purpose. Implicitly activated event subtypes cannot be deactivated. Some explicitly activated event subtypes can be deactivated, using a method call available for that purpose. Details of event activation are provided, by event code, in the section on event codes. See Event codes.

When a recognizer is created, the event subtypes of a recognizer inherit their declared and activated status from the events of the base grammar. See marpa_r_new().

A Libmarpa method or macro is event-safe iff it does not change the events queue. All Libmarpa accessors are event-safe.

Regardless of the event-safety of the method calls between event triggering and event access, it is good practice to access event instances as soon as reasonable after the method that triggered them. Note that the event queue is kept in the base grammar, so that multiple recognizers using the same base grammar can overwrite each other’s events.

To find out how many events are in the event queue, use the marpa_g_event_count() method. See marpa_g_event_count().

To access specific events, use the marpa_g_event() (see marpa_g_event()) and marpa_g_event_value() (see marpa_g_event_value()) methods.

25.2 Event codes

Accessor macro: int MARPA_EVENT_NONE

This is an implicit global event. This event code is reserved. No method triggers an event with this code.

Event value: Unspecified. Suggested message: "No event".

Accessor macro: int MARPA_EVENT_COUNTED_NULLABLE

This is an implicit per-symbol event. This event is triggered by the marpa_g_precompute() method (see marpa_g_precompute()), when a nullable symbol was used as either the separator for, or the right hand side of, a sequence.

Event value: The ID of the symbol. Suggested message: "This symbol is a counted nullable".

Accessor macro: int MARPA_EVENT_EARLEY_ITEM_THRESHOLD

This is an implicit global event. This event is triggered by the marpa_r_earleme_complete() method (see marpa_r_earleme_complete()), when an application-settable threshold on the number of Earley items has been reached or exceeded. See marpa_r_earley_item_warning_threshold_set().

Event value: The current Earley item count. Suggested message: "Too many Earley items".

Accessor macro: int MARPA_EVENT_EXHAUSTED

This is an implicit global event. This event is triggered by the marpa_r_earleme_complete() (see marpa_r_earleme_complete) and marpa_r_start_input() (see marpa_r_start_input) methods, when the parse is exhausted. See Exhaustion.

Event value: Unspecified. Suggested message: "Recognizer is exhausted".

Accessor macro: int MARPA_EVENT_LOOP_RULES

This is an implicit global event. This event is triggered by the marpa_g_precompute() method, when one or more rules are loop rules. (See marpa_g_precompute().

A rule is a loop rule iff it is part of a cycle. See Cycles. Parsing with a grammar than contains a loop rule is deprecated.

Event value: The count of loop rules. Suggested message: "Grammar contains a infinite loop".

Accessor macro: int MARPA_EVENT_NULLING_TERMINAL

This is an implicit per-symbol event. This event is triggered by the marpa_g_precompute() method. See marpa_g_precompute(). It only occurs if the LHS terminals feature is in use. The LHS terminals feature is deprecated. See LHS terminals.

Event value: The ID of the symbol. Suggested message: "This symbol is a nulling terminal".

Accessor macro: int MARPA_EVENT_SYMBOL_COMPLETED

This is an explicit per-symbol event. Intuitively, it triggers when a non-nulled non-terminal symbol is fully recognized.

More precisely, when an Earley set is completed at earleme j, the MARPA_EVENT_SYMBOL_COMPLETED event instance for symbol sym triggers iff

the event subtype of MARPA_EVENT_SYMBOL_COMPLETED for sym is activated; and
the Earley set at earleme j contains a completed Earley item for a rule whose LHS is sym.

The MARPA_EVENT_SYMBOL_COMPLETED is triggered by the marpa_r_earleme_complete() method. See marpa_r_earleme_complete().

We recall that a symbol instance is a symbol, together with start and end parse locations. More than one symbol instance may trigger the same MARPA_EVENT_SYMBOL_COMPLETED event instance. For example, if x@40-42 and x@38-42 are symbol instances, they will trigger only one event instance — the MARPA_EVENT_SYMBOL_COMPLETED event instance at location 42 with symbol x.

To declare a completed symbol event, use the marpa_g_symbol_is_completion_event_set() method. See marpa_g_symbol_is_completion_event_set. To activate or deactivate a completed symbol event, use the marpa_g_completion_symbol_activate method (see marpa_g_completion_symbol_activate), or the marpa_r_completion_symbol_activate method (see marpa_r_completion_symbol_activate).

Event value: The ID of the completed symbol. Suggested message: "Completed symbol".

Accessor macro: int MARPA_EVENT_SYMBOL_EXPECTED

This is an explicit per-symbol event. Intuitively, it triggers when a terminal symbol is expected.

More precisely, when an Earley set is completed at earleme j, the MARPA_EVENT_SYMBOL_EXPECTED event instance for symbol sym triggers iff

the event subtype of MARPA_EVENT_SYMBOL_EXPECTED for sym is activated;
sym is a terminal; and
sym is the postdot symbol of an Earley item whose current location is j.

This event is triggered by the marpa_r_earleme_complete() (see marpa_r_earleme_complete) and marpa_r_start_input() (see marpa_r_start_input) methods.

More than one symbol instance may trigger the same MARPA_EVENT_SYMBOL_EXPECTED event instance. For example, if

    [ [ X ::= B • x ], 0, 42 ] and
    [ [ Y ::= C • x ], 7, 42 ]

are Earley items, they will trigger only one event instance — the MARPA_EVENT_SYMBOL_EXPECTED event instance at location 42 with symbol x.

MARPA_EVENT_SYMBOL_EXPECTED events only trigger if their symbol is expected as a terminal. Predicted symbols that are not expected as terminals do not trigger this event.

To declare an expected terminal event, use the marpa_r_expected_symbol_event_set() method. The marpa_r_expected_symbol_event_set() method is also used to activate or deactivate an expected terminal event. See marpa_r_expected_symbol_event_set.

Event value: The ID of the expected symbol. Suggested message: "Expecting symbol".

Accessor macro: int MARPA_EVENT_SYMBOL_NULLED

This is an explicit per-symbol event. Intuitively, this event triggers when a nulled instance of a symbol is recognized.

More precisely, when an Earley set is completed at earleme j, the MARPA_EVENT_SYMBOL_NULLED event instance for symbol sym triggers iff

the event subtype of MARPA_EVENT_SYMBOL_NULLED for sym is activated; and
sym@j-j is a symbol instance.

This event is triggered by the marpa_r_earleme_complete() (see marpa_r_earleme_complete) and marpa_r_start_input() (see marpa_r_start_input) methods.

Duplicate nulled nodes may occur. See Duplicate nulled nodes. Even if there are duplicate nulled nodes for symbol sym at location j, only one MARPA_EVENT_SYMBOL_NULLED event instance for sym will be triggered at location j.

To declare an nulled symbol event, use the recognizer’s marpa_g_symbol_is_nulled_event_set() method. See marpa_g_symbol_is_nulled_event_set. To activate or deactivate a nulled symbol event, use the marpa_g_nulled_symbol_activate method (see marpa_g_nulled_symbol_activate), or the marpa_r_nulled_symbol_activate method (see marpa_r_nulled_symbol_activate).

Event value: The ID of the nulled symbol. Suggested message: "Symbol was nulled".

Accessor macro: int MARPA_EVENT_SYMBOL_PREDICTED

This is an explicit per-symbol event. Intuitively, this event triggers when a symbol is predicted. Unlike the MARPA_EVENT_SYMBOL_EXPECTED event, the MARPA_EVENT_SYMBOL_PREDICTED event triggers for predictions of both non-terminals and terminals.

More precisely, when an Earley set is completed at earleme j, the MARPA_EVENT_SYMBOL_PREDICTED event instance for symbol sym triggers iff

the event subtype of MARPA_EVENT_SYMBOL_PREDICTED for sym is activated; and
the Earley set at earleme j contains an Earley item whose postdot symbol is sym.

This event is triggered by the marpa_r_earleme_complete() (see marpa_r_earleme_complete) and marpa_r_start_input() (see marpa_r_start_input) methods.

In an Earley set, multiple Earley items may have the same postdot symbol, but they will trigger only one event instance. For example, if

    [ [ X ::= B • X ], 0, 42 ] and
    [ [ Y ::= C • X ], 7, 42 ]

are Earley items, they will trigger only one event instance — the MARPA_EVENT_SYMBOL_PREDICTED event instance at location 42 with symbol X.

To declare a predicted symbol event, use the recognizer’s marpa_g_symbol_is_prediction_event_set() method. See marpa_g_symbol_is_prediction_event_set. To activate or deactivate a predicted symbol event, use the marpa_g_prediction_symbol_activate method (see marpa_g_prediction_symbol_activate), or the marpa_r_prediction_symbol_activate method (see marpa_r_prediction_symbol_activate).

Event value: The ID of the predicted symbol. Suggested message: "Symbol was predicted".

25.3 Basic event accessors

Accessor function: Marpa_Event_Type marpa_g_event (Marpa_Grammar g, Marpa_Event* event, int ix)

On success,

the type of the ix’th event is returned, and
the data for the ix’th event is placed in the location pointed to by event.

Event indexes are in sequence. Valid events will be in the range from 0 to n, where n is one less than the event count. The event count can be read using the marpa_g_event_count() method.

Hard fails if there is no ix’th event, or if ix is negative. On failure, the locations pointed to by event are not changed.

Return value: On success, the type of event ix, which is always non-negative. On hard failure, -2.

Accessor function: int marpa_g_event_count ( Marpa_Grammar g ): Return value: On success, the number of events, which is always non-negative. On hard failure, -2.

Accessor macro: int marpa_g_event_value (Marpa_Event* event)

Returns the “value” of the event. The semantics of the value varies according to the type of the event, and is described in the section on event codes. See Event codes.

Return value: The “value” of the event. Always succeeds.

25.4 Completion events

This section contains methods for dealing with MARPA_EVENT_SYMBOL_COMPLETED events. See MARPA_EVENT_SYMBOL_COMPLETED.

To declare a symbol as a completion event symbol use the marpa_g_symbol_is_completion_event_set() method. The event will be activated by default.

To activate or deactivate a completion symbol event use the marpa_r_completion_symbol_activate() method.

Mutator function: int marpa_g_completion_symbol_activate ( Marpa_Grammar g, Marpa_Symbol_ID sym_id, int reactivate )

Allows the user to deactivate and reactivate symbol completion events in the grammar. On success, does the following:

If reactivate is zero, deactivates the event in the grammar.
If reactivate is one, activates the event in the grammar.

The activation status of a completion event in the grammar becomes the initial activation status of a completion event in all of its child recognizers.

This method is seldom needed. When a symbol is declared as a completion event symbol in the grammar, it is activated by default. See marpa_g_symbol_is_completion_event_set(). And a completion event can be deactivated and reactivated in the recognizer using the marpa_r_completion_symbol_activate method. See marpa_r_completion_symbol_activate().

Hard fails if the sym_id is not declared as a completion event symbol in the grammar, or if the grammar has not been precomputed.

Return value: On success, the value of reactivate, which is a boolean. On hard failure, -2.

Mutator function: int marpa_r_completion_symbol_activate ( Marpa_Recognizer r, Marpa_Symbol_ID sym_id, int reactivate )

Allows the user to deactivate and reactivate symbol completion events in the recognizer. On success, does the following:

If reactivate is zero, deactivates the event in the recognizer.
If reactivate is one, activates the event in the recognizer.

Hard fails if sym_id was not declared for completion events in the base grammar.

Return value: On success, the value of reactivate, which is a boolean. On hard failure, -2.

Accessor function: int marpa_g_symbol_is_completion_event ( Marpa_Grammar g, Marpa_Symbol_ID sym_id)

On success, returns a boolean which is 1 iff sym_id is declared as a completion event symbol in g. For more about completion events, see marpa_g_symbol_is_completion_event_set().

Soft fails iff sym_id is well-formed (a non-negative integer), but there is no such symbol.

Hard fails if g is precomputed.

Return value: On success, a boolean . On soft failure, -1. On hard failure, -2.

Mutator function: int marpa_g_symbol_is_completion_event_set ( Marpa_Grammar g, Marpa_Symbol_ID sym_id, int value)

On success, if value is 1,

declares symbol sym_id as a completion event symbol,
activates the completion event for sym_id in g, and
returns 1.

On success, if value is 0,

declares symbol sym_id as a completion event symbol,
deactivates the completion event for sym_id in g, and
returns 0.

Nulled rules and symbols will never cause completion events. Nullable symbols may be declared as completion event symbols, but this will have an effect only when the symbol is not nulled. Nulling symbols may be declared as completion event symbols, but no completion events will ever be triggered for a nulling symbol. Note that this implies that no completion event will ever be triggered at earleme 0, the start of parsing.

Soft fails iff sym_id is well-formed (a non-negative integer), but there is no such symbol.

Hards fails if the grammar is precomputed.

Return value: On success, value, which is a boolean. On soft failure, -1. On hard failure, -2.

25.5 Symbol nulled events

This section contains methods for dealing with MARPA_EVENT_SYMBOL_NULLED events. See MARPA_EVENT_SYMBOL_NULLED.

To declare a symbol as a nulled event symbol use the marpa_g_symbol_is_nulled_event_set() method. The event will be activated by default.

To activate or deactivate a nulled symbol event use the marpa_r_nulled_symbol_activate() method.

Mutator function: int marpa_g_nulled_symbol_activate ( Marpa_Grammar g, Marpa_Symbol_ID sym_id, int reactivate )

Allows the user to deactivate and reactivate symbol nulled events in the grammar. On success, does the following:

If reactivate is zero, deactivates the event in the grammar.
If reactivate is one, activates the event in the grammar.

The activation status of a nulled event in the grammar becomes the initial activation status of a nulled event in all of its child recognizers.

This method is seldom needed. When a symbol is declared as a nulled event symbol in the grammar, it is activated by default. See marpa_g_symbol_is_nulled_event_set(). And a nulled event can be deactivated and reactivated in the recognizer using the marpa_r_nulled_symbol_activate method. See marpa_r_nulled_symbol_activate().

Hard fails if the sym_id is not declared as a nulled event symbol in the grammar, or if the grammar has not been precomputed.

Return value: On success, the value of reactivate, which is a boolean. On hard failure, -2.

Mutator function: int marpa_r_nulled_symbol_activate ( Marpa_Recognizer r, Marpa_Symbol_ID sym_id, int boolean )

Allows the user to deactivate and reactivate symbol nulled events in the recognizer. On success, does the following:

If reactivate is zero, deactivates the event in the recognizer.
If reactivate is one, activates the event in the recognizer.

Hard fails if the subtype sym_id of the MARPA_EVENT_SYMBOL_NULLED event is not declared.

Return value: On success, the value of reactivate, which is a boolean. On hard failure, -2.

Accessor function: int marpa_g_symbol_is_nulled_event ( Marpa_Grammar g, Marpa_Symbol_ID sym_id)

On success, returns a boolean which is 1 iff the subtype sym_id of the MARPA_EVENT_SYMBOL_NULLED event is declared. For more about nulled events, see marpa_g_symbol_is_nulled_event_set.

Soft fails iff sym_id is well-formed (a non-negative integer), but there is no such symbol.

Hard fails if g is precomputed.

Return value: On success, a boolean . On soft failure, -1. On hard failure, -2.

Mutator function: int marpa_g_symbol_is_nulled_event_set ( Marpa_Grammar g, Marpa_Symbol_ID sym_id, int value)

On success, if value is 1,

declares symbol sym_id as a nulled event symbol,
activates the nulled event for sym_id in g, and
returns 1.

On success, if value is 0,

declares symbol sym_id as a nulled event symbol,
deactivates the nulled event for sym_id in g, and
returns 0.

The marpa_g_symbol_is_nulled_event_set() method will declare a symbol as a nulled event symbol, even if the symbol is non-nullable. This is convenient, for example, for automatically generated grammars. Applications that wish to treat it as a failure when a non-nullable symbol is declared as a nulled event symbol, can check for this after precomputation, using the marpa_g_symbol_is_nulled_event() and marpa_g_symbol_is_nullable() methods.

Soft fails iff sym_id is well-formed (a non-negative integer), but there is no such symbol.

Hards fails if the grammar is precomputed.

Return value: On success, value, which is a boolean. On soft failure, -1. On hard failure, -2.

25.6 Prediction events

This section contains methods for dealing with MARPA_EVENT_SYMBOL_PREDICTED events. See MARPA_EVENT_SYMBOL_PREDICTED.

To declare a symbol as a prediction event symbol use the marpa_g_symbol_is_prediction_event_set() method. The event will be activated by default.

To activate or deactivate a prediction symbol event use the marpa_r_prediction_symbol_activate() method.

Mutator function: int marpa_g_prediction_symbol_activate ( Marpa_Grammar g, Marpa_Symbol_ID sym_id, int reactivate )

Allows the user to deactivate and reactivate symbol prediction events in the grammar. On success, does the following:

If reactivate is zero, deactivates the event in the grammar.
If reactivate is one, activates the event in the grammar.

The activation status of a prediction event in the grammar becomes the initial activation status of a prediction event in all of its child recognizers.

This method is seldom needed. When a symbol is declared as a prediction event symbol in the grammar, it is activated by default. See marpa_g_symbol_is_prediction_event_set(). And a prediction event can be deactivated and reactivated in the recognizer using the marpa_r_prediction_symbol_activate method. See marpa_r_prediction_symbol_activate().

Hard fails if the sym_id is not declared as a prediction event symbol in the grammar, or if the grammar has not been precomputed.

Return value: On success, the value of reactivate, which is a boolean. On hard failure, -2.

Mutator function: int marpa_r_prediction_symbol_activate ( Marpa_Recognizer r, Marpa_Symbol_ID sym_id, int boolean )

Allows the user to deactivate and reactivate symbol prediction events in the recognizer. On success, does the following:

If reactivate is zero, deactivates the event in the recognizer.
If reactivate is one, activates the event in the recognizer.

Hard fails if sym_id was not declared for prediction events in the base grammar.

Return value: On success, the value of reactivate, which is a boolean. On hard failure, -2.

Accessor function: int marpa_g_symbol_is_prediction_event ( Marpa_Grammar g, Marpa_Symbol_ID sym_id)

On success, returns a boolean which is 1 iff sym_id is declared as a prediction event symbol. For more about prediction events, see marpa_g_symbol_is_prediction_event_set.

Soft fails iff sym_id is well-formed (a non-negative integer), but there is no such symbol.

Hard fails if g is precomputed.

Return value: On success, a boolean . On soft failure, -1. On hard failure, -2.

Mutator function: int marpa_g_symbol_is_prediction_event_set ( Marpa_Grammar g, Marpa_Symbol_ID sym_id, int value)

On success, if value is 1,

declares symbol sym_id as a prediction event symbol,
activates the prediction event for sym_id in g, and
returns 1.

On success, if value is 0,

declares symbol sym_id as a prediction event symbol,
deactivates the prediction event for sym_id in g, and
returns 0.

Soft fails iff sym_id is well-formed (a non-negative integer), but there is no such symbol.

Hards fails if the grammar is precomputed.

Return value: On success, value, which is a boolean. On soft failure, -1. On hard failure, -2.

25.7 Symbol expected events

This section contains methods for dealing with MARPA_EVENT_SYMBOL_EXPECTED events. See MARPA_EVENT_SYMBOL_EXPECTED.

Mutator function: int marpa_r_expected_symbol_event_set ( Marpa_Recognizer r, Marpa_Symbol_ID symbol_id, int value)

On success, if value is 1, does the following:

Declares the symbol expected event for the symbol with ID symbol_id, if it is not already declared.
Activates the symbol expected event for the symbol with ID symbol_id.
Returns 1.

On success, if value is 0, does the following:

Deactivates the symbol expected event for the symbol with ID symbol_id.
Returns 0.

Expected symbol events only trigger if the symbol with ID symbol_id is expected as terminal. This method will succeed if the symbol with ID symbol_id is not a terminal, but the event subtype that this method declared will never trigger.

The same symbol may be acceptable as both a terminal and a non-terminal if the deprecated LHS terminals feature is in use. If the symbol with ID symbol_id is expected at the current earleme as a non-terminal, but is not acceptable as a terminal, an expected symbol event will not be triggered at the current earleme. See LHS terminals.

Hard fails if value is not a boolean. Hard fails if value is 1, and symbol_id is the ID of a nulling symbol, an inaccessible symbol, or an unproductive symbol. Hard fails if symbol_id is not the ID of a valid symbol.

Return value: On success, value, which will be a boolean. On hard failure, -2.

25.8 Recognizer per-symbol events

The recognizer per-symbol events are those which trigger in the recognizer, and which are per-symbol. All recognizer per-symbol event instances are triggered by the marpa_r_earleme_complete() (see marpa_r_earleme_complete) or marpa_r_start_input() (see marpa_r_start_input) methods. Recognizer per-symbol events have one of the event types MARPA_EVENT_SYMBOL_COMPLETED, MARPA_EVENT_SYMBOL_EXPECTED, MARPA_EVENT_SYMBOL_NULLED, and MARPA_EVENT_SYMBOL_PREDICTED.

25.9 Tentative events

Recognizer per-symbol events are tentative. The symbol instance that triggers them may not actually occur in the parse.

Right of the location of the event, Libmarpa is totally unaware of what the actual input will be — there is no "lookahead". Therefore, if end location of the symbol instance for the event is after the triggering parse location, the symbol instance will reflect a possible input, usually one of many possible inputs. Of these possible outputs, only one will become the actual output, and that only if the parse is allowed to run that far.

25.10 Ambiguous parses and events

In an ambiguous parse, tentative events (see Tentative events) trigger for all of the parse trees. User thinking in terms of only one of the parse trees, perhaps because they are unaware of the ambiguity, will sometimes find this unexpected.

25.11 Nulled subtrees and events

As mentioned above (see Nullability in the valuator), nulled subforests are, for evaluation purposes, pruned back to their topmost symbol. This is not the case for events. Events for every nulled symbol in the subforest will trigger, even if these symbols are pruned in the evaluation phase. More precisely, if

X@j-j is a nulled symbol instance,
X@j-j derives Y@j-j (in which case Y@j-j must also be a nulled symbol instance), and
the event MARPA_EVENT_SYMBOL_NULLED for symbol Y is activated,

then a MARPA_EVENT_SYMBOL_NULLED for symbol Y will trigger at parse location j.

25.12 Event coincidence of symbol instances

We say that two event instances “coincide” if they occur at the same parse location. We say the two event instances are coincident same-symbol-instance event instances iff both event instances trigger at the same parse location and both event instances are for the same symbol instance. Since every event triggered by a single symbol instance will be for the same symbol, coincident events triggered by a single symbol instance can differ only in their event type. This event type will always be one of the recognizer per-symbol events.

25.12.1 Nulled and completed symbol instances

A symbol instance cannot trigger both a MARPA_EVENT_SYMBOL_NULLED and a MARPA_EVENT_SYMBOL_COMPLETED event at the same location. This is because a symbol instance must be zero-length to trigger a MARPA_EVENT_SYMBOL_NULLED event, but must be of non-zero length to trigger a MARPA_EVENT_SYMBOL_COMPLETED event. That is, if the symbol instance is sym@start-end, a MARPA_EVENT_SYMBOL_NULLED event can trigger only if start=end, while the MARPA_EVENT_SYMBOL_COMPLETED event can trigger only if start < end.

25.12.2 Nulled and predicted symbol instances

A symbol instance cannot trigger both a MARPA_EVENT_SYMBOL_NULLED and a MARPA_EVENT_SYMBOL_PREDICTED event at the same location. This is because a symbol instance must be zero-length to trigger a MARPA_EVENT_SYMBOL_NULLED event, but only non-zero length symbols will trigger a MARPA_EVENT_SYMBOL_PREDICTED event. That is, if the symbol instance is sym@start-end, a MARPA_EVENT_SYMBOL_NULLED event can trigger only if start=end, while the MARPA_EVENT_SYMBOL_PREDICTED event can trigger only if start < end.

25.12.3 Nulled and expected symbol instances

A symbol instance cannot trigger both a MARPA_EVENT_SYMBOL_NULLED and a MARPA_EVENT_SYMBOL_EXPECTED event at the same location. This is because a symbol instance must be zero-length to trigger a MARPA_EVENT_SYMBOL_NULLED event, but only terminal symbols will trigger a MARPA_EVENT_SYMBOL_EXPECTED event, and termnal symbols cannot be zero-length. That is, if the symbol instance is sym@start-end, a MARPA_EVENT_SYMBOL_NULLED event will trigger only if start=end, while the MARPA_EVENT_SYMBOL_EXPECTED event will trigger only if start < end.

25.12.4 Completed and predicted symbol instances

A symbol instance cannot trigger both a MARPA_EVENT_SYMBOL_COMPLETED and a MARPA_EVENT_SYMBOL_PREDICTED event at the same location. Call the location, j. Coincident events with the types MARPA_EVENT_SYMBOL_COMPLETED and MARPA_EVENT_SYMBOL_PREDICTED cannot be triggered by a single symbol instance because the symbol instances for MARPA_EVENT_SYMBOL_COMPLETED and MARPA_EVENT_SYMBOL_PREDICTED cannot be zero-length; the symbol instance for the MARPA_EVENT_SYMBOL_COMPLETED must end at j; and the symbol instance for the MARPA_EVENT_SYMBOL_PREDICTED must start at j. That is, if the MARPA_EVENT_SYMBOL_COMPLETED symbol instance is sym@c1-c2 and the MARPA_EVENT_SYMBOL_PREDICTED symbol instance is sym@p1-p2 we must have all of

c1<c2, because the length of MARPA_EVENT_SYMBOL_COMPLETED symbol instance is non-zero;
c2=j, because the MARPA_EVENT_SYMBOL_COMPLETED symbol instance must end at j; and
j=p1, because the MARPA_EVENT_SYMBOL_PREDICTED symbol instance must start at j.

so that c1 < p1 and the two instances sym@c1-c2 and sym@p1-p2 must be distinct.

25.12.5 Completed and expected symbol instances

A symbol instance cannot trigger both a MARPA_EVENT_SYMBOL_COMPLETED and a MARPA_EVENT_SYMBOL_EXPECTED event at the same location. The reasoning is much the same as for the case of MARPA_EVENT_SYMBOL_COMPLETED and MARPA_EVENT_SYMBOL_PREDICTED events. See Completed and predicted symbol instances.

Call the location, j. Coincident events with the types MARPA_EVENT_SYMBOL_COMPLETED and MARPA_EVENT_SYMBOL_PREDICTED cannot be triggered by a single symbol instance because the symbol instances for MARPA_EVENT_SYMBOL_COMPLETED and MARPA_EVENT_SYMBOL_EXPECTED cannot be zero-length; the symbol instance for the MARPA_EVENT_SYMBOL_COMPLETED must end at j; and the symbol instance for the MARPA_EVENT_SYMBOL_EXPECTED must start at j. That is, if the MARPA_EVENT_SYMBOL_COMPLETED symbol instance is sym@c1-c2 and the MARPA_EVENT_SYMBOL_EXPECTED symbol instance is sym@e1-e2 we must have all of

c1<c2, because the length of MARPA_EVENT_SYMBOL_COMPLETED symbol instance is non-zero;
c2=j, because the MARPA_EVENT_SYMBOL_COMPLETED symbol instance must end at j; and
j=e1, because the MARPA_EVENT_SYMBOL_EXPECTED symbol instance must start at j.

so that c1 < e1 and the two instances sym@c1-c2 and sym@e1-e2 must be distinct.

25.12.6 Predicted and expected symbol instances

The one case where a single symbol instance can trigger events of two different types at the same parse location is that of the MARPA_EVENT_SYMBOL_PREDICTED and MARPA_EVENT_SYMBOL_EXPECTED events. That is,

if j is a parse location;
sym is the postdot symbol in an Earley item in the Earley set at j;
sym is a terminal; and
both event subtypes are activated,

then both a MARPA_EVENT_SYMBOL_PREDICTED and a MARPA_EVENT_SYMBOL_EXPECTED event will trigger at location start.

25.13 Event coincidence of symbols

We say the two event instances are coincident same-symbol event instances iff both event instances trigger at the same parse location and both event instances are for the same symbol. Above (see Event coincidence of symbol instances) we looked at event instances which trigger at the same parse location and were for the same symbol instance. In this section we have loosened the requirement, so they events instances can be triggered by multiple symbol instances, as long as all the symbol instances share the same symbol.

Obviously every coincident same-symbol event instance is for the same symbol, and therefore coincident same-symbol event instances can differ only in their event type. This event type will always be one of the recognizer per-symbol events.

Same-symbol event instances can coincide

if one is of type MARPA_EVENT_SYMBOL_PREDICTED and the other is of type MARPA_EVENT_SYMBOL_COMPLETED;
if one of of type MARPA_EVENT_SYMBOL_PREDICTED and the other is of type MARPA_EVENT_SYMBOL_NULLED;
if one of of type MARPA_EVENT_SYMBOL_COMPLETED and the other is of type MARPA_EVENT_SYMBOL_NULLED.

In fact, three same-symbol event instances can coincide, where they are of types MARPA_EVENT_SYMBOL_PREDICTED, MARPA_EVENT_SYMBOL_COMPLETED and MARPA_EVENT_SYMBOL_NULLED. An example of the triggers for such a triple coincidence is the set of symbol instances sym@41-42, sym@42-42, and sym@42-43, where

sym@41-42 can trigger a MARPA_EVENT_SYMBOL_COMPLETED event for symbol sym at parse location 42;
sym@42-42 can trigger a MARPA_EVENT_SYMBOL_NULLED event for symbol sym at parse location 42;
sym@42-43 can trigger a MARPA_EVENT_SYMBOL_PREDICTED event for symbol sym at parse location 42.

Earlier (see Predicted and expected symbol instances), we saw that a single symbol instance can trigger both a MARPA_EVENT_SYMBOL_EXPECTED event instance and a MARPA_EVENT_SYMBOL_PREDICTED event instance at the same parse location. Obviously then, the same symbol can trigger a MARPA_EVENT_SYMBOL_EXPECTED event instance and a MARPA_EVENT_SYMBOL_PREDICTED event instance at the same parse location.

If the deprecated LHS terminals feature is not in use, a MARPA_EVENT_SYMBOL_EXPECTED event instance and a MARPA_EVENT_SYMBOL_COMPLETED event instance cannot be same-symbol coincident. This is because the symbol for the MARPA_EVENT_SYMBOL_COMPLETED event instance must be the LHS of a rule; the symbol for the MARPA_EVENT_SYMBOL_EXPECTED event instance must be a terminal; and terminals cannot be the LHS of a rule. Libmarpa has a feature that allows LHS terminals, but use of this is strongly discouraged. See LHS terminals.

There are no circumstances under which a MARPA_EVENT_SYMBOL_EXPECTED event instance and a MARPA_EVENT_SYMBOL_NULLED event instance can be same-symbol coincident. This is because the symbol for the MARPA_EVENT_SYMBOL_NULLED event instance must be nulled; the symbol for the MARPA_EVENT_SYMBOL_EXPECTED event instance must be a terminal; and terminals can never be nulled. Not only can a MARPA_EVENT_SYMBOL_EXPECTED event instance not coincide with a same-symbol MARPA_EVENT_SYMBOL_NULLED event instance, the two cannot occur in the same parse. In fact, a MARPA_EVENT_SYMBOL_EXPECTED event instance and a same-symbol MARPA_EVENT_SYMBOL_NULLED event instance cannot occur in any parse with the same base grammar.

25.14 Marker symbols

A marker symbol is a nulling symbol introduced for the purpose of “marking” a position in a rule. If a MARPA_EVENT_SYMBOL_NULLED event is activated for the marker symbol, an event instance will be triggered whenever that position in the rule is reached.

For example, consider the rule

     [ A ::= Y Z ].                  (R1)

If we replace (R1) with the following rules

     [ Mk0 ::= ]
     [ Mk1 ::= ]
     [ Mk2 ::= ]
     [ A ::= Mk0 Y Mk1 Z Mk2 ].      (R2)

and activate MARPA_EVENT_SYMBOL_NULLED events for the marker symbols (Mk1, Mk2, and Mk3), then a MARPA_EVENT_SYMBOL_NULLED event instance

for Mk0 will trigger at every location where (R2) is predicted;
for Mk1 will trigger at every location where the symbol Y in rule (R2) has been recognized, and the symbol Z is predicted;
for Mk2 will trigger at every location where (R2) has been fully recognized, so that the entire RHS, including both symbols Y and Z, has been recognized.

We can note that use of the marker Mk0 can be replaced by declaring MARPA_EVENT_SYMBOL_PREDICTED event for symbol A. Similarly, use of the marker Mk2 can be replaced by declaring MARPA_EVENT_SYMBOL_COMPLETED event for symbol A. We discuss this more below. See Per-rule events.

25.15 Per-rule events

The event types are either global or per-symbol — there are no per-rule events. Nonetheless, the user can get the equivalent of per-rule events in two ways, both discussed previously (see Marker symbols). The competely general way is to use marker symbols.

The other way is to declare MARPA_EVENT_SYMBOL_PREDICTED and MARPA_EVENT_SYMBOL_COMPLETED events for the LHS of the rule. This has two limitations: First, it can only trigger events before the first RHS symbol of the rule, or after the rule’s last RHS symbol. Second, the MARPA_EVENT_SYMBOL_PREDICTED and MARPA_EVENT_SYMBOL_COMPLETED events will trigger events for all rules with the same LHS, which may not be what is wanted.

We illustrate the second limitation with an example, which we will call lim2Ex. In lim2Ex, we have the following:

The grammar includes these two rules:

     [ A ::= W X ]              (R3)
     [ A ::= Y Z ]              (R4)

A MARPA_EVENT_SYMBOL_COMPLETED event is activated for symbol A.

Suppose we want an event to trigger at, and only at, the place when rule (R3) is fully recognized. In lim2Ex, the MARPA_EVENT_SYMBOL_COMPLETED event for symbol A triggers at the end of both rules (R3) and (R4). The triggering at the end of rule (R4) is problematic.

25.15.1 Workaround with a dedicated LHS symbol

Most users will fix the second limitation (see lim2Ex) by falling back on marker symbols. For those users who for some reason prefer to avoid this, in this section we present a workaround that is based on introducing a new symbol, one dedicated to acting as the LHS of (R3) of lim2Ex.

We show the fix as another example, one which we will call fixEx. fixEx is identical to lim2Ex with the following exceptions:

Rule (R3) in lim2Ex is replaced with the following rules:
```
     [ A1 ::= W X ]             (R3a)
     [ A ::= A1 ]               (R3b)
```
In these two rules, rule (R3a) is the replacement rule for rule (R3) in lim2Ex, and A1 is the new LHS symbol, which is dedicated to serving as the LHS of rule (R3a).
A MARPA_EVENT_SYMBOL_COMPLETED event is activated for symbol A1, instead of for symbol A.
Rule (R3a) has the same semantics as rule (R3) in lim2Ex, the rule that it replaced.
Rule (R3b) has pass-through semantics.
The application code must be changed to look for a MARPA_EVENT_SYMBOL_COMPLETED event for symbol A1, instead of symbol A.

fixEx will accept the same language as lim2Ex, with the same semantics as lim2Ex. A MARPA_EVENT_SYMBOL_COMPLETED event will trigger at the end of (R3a), this time for symbol A1, but at exactly the same parse location as it did for symbol A in lim2Ex. And a MARPA_EVENT_SYMBOL_COMPLETED event will no longer trigger at the end of (R4), which is the fix we were looking for.

26 Tracing and diagnosing parses

Libmarpa is very low-level, and does not use strings to identify its objects. Libmarpa is expected to come with a upper level which uses strings to name and/or identify the objects used in parsing, in the process deciding issues like naming conventions, ASCII-7 versus UTF-8, etc. Based on those decisions, the upper layer is expected to offer facilities for tracking and displaying symbol names, displaying rules, tracking and displaying tokens, and displaying progress items.

It is possible to trace and diagnose parses without an upper layer – a few programs in Libmarpa’s test suite do this. Instead of using symbol names and expanding rules into their LHS and RHS, the user can keep track of symbol IDs and rule IDs. But most people who have tried this would agree that this is something that the user should go to great pains to avoid.

As a rule, every application of Libmarpa should have an upper layer with diagnostics. If a pre-existing diagnostics layer is not used, it should be one of the first things written — it is very likely to be one of the first things needed.

The rest of this chapter describes facilities that should be offered in the layer handling grammar diagnostics. Applications may want to allow these facilities at various levels of verbosity. Implementation suggestions are made, focusing on what information the facilities should display at the highest verbosity level.

26.1 Listing symbols

The parse diagnostics should include a facility for naming symbols, iand one for listing the symbols in a grammar, with their names. The symbols will have IDs from 0 to the highest symbol ID, which can be found using the marpa_g_highest_symbol_id() method (see marpa_g_highest_symbol_id). At the highest verbosity level, in addition to the symbol name, the symbol listing facility should display

the symbol ID;
whether the symbol is accessible (see marpa_g_symbol_is_accessible);
whether the symbol is nullable (see marpa_g_symbol_is_nullable);
whether the symbol is nulling (see marpa_g_symbol_is_nulling);
whether the symbol is productive (see marpa_g_symbol_is_productive);
whether the symbol is the start symbol (see marpa_g_symbol_is_start);
whether the symbol is a terminal (see marpa_g_symbol_is_terminal);
whether the symbol is a counted symbol (see marpa_g_symbol_is_counted); and
the rank of the symbol (see marpa_g_symbol_rank).

26.2 Listing rules

The parse diagnostics should include a facility for displaying the rules in a grammar. The rules will have IDs from 0 to the highest rule ID, which can be found using the marpa_g_highest_rule_id() method (see marpa_g_highest_rule_id).

It is up to the application whether to have names for rules. In some grammar generators, the LHS of a rule serves as its name, but does not produce unique names for each rule in Libmarpa, since many rules may have the same LHS. Currently the most used upper layer for Libmarpa, Marpa::R2, does not have rule names. An application that does want to name its rules must decide on a rule naming convention.

At the highest verbosity level, the rule listing facility should display

the rule ID;
the LHS of the rule (see marpa_g_rule_lhs);
the RHS of the rule, that is, each RHS symbol (see marpa_g_rule_rhs) for the length of the rule (see marpa_g_rule_length);
whether the rule is accessible (see marpa_g_rule_is_accessible);
whether the rule is nullable (see marpa_g_rule_is_nullable);
whether the rule is nulling (see marpa_g_rule_is_nulling);
whether the rule is productive (see marpa_g_rule_is_productive);
the rank of the rule (see marpa_g_rule_rank);
whether the rule is a sequence rule (see marpa_g_sequence_min);
if the rule is a sequence rule, the minimum count of the sequence (see marpa_g_sequence_min); and
if the rule is a sequence rule, the sequence separator, if any, (see marpa_g_sequence_separator)
if the rule is a sequence rule, whether the rule uses proper separation, (see marpa_g_rule_is_proper_separation);

26.3 Listing Earley sets

The parse diagnostics should include a facility for displaying Earley sets by Earley set location. For each Earley item at a location, the facility should display

the rule;
the rule’s dot position;
the origin of the Earley item.

The “progress reports” methods of Libmarpa are available for this purpose. See Progress reports.

It is up to the application how to display rule and dot position, but the most helpful way is usually to display the LHS and RHS, with a bullet or dot at the appropriate position in the RHS. This is what is done when showing dotted rules in this document, as for example in

    [ Y ::= C • X ].

The current location of each Earley item may be displayed with the Earley item but, since it is shared by the entire set of Earley items, it can also be put in a header, or left implicit.

We recall (see Earlemes and Earley set IDs) that the Earley IDs range from 0 to latest, inclusive, where latest is the latest Earley set. See The latest Earley set].

The obvious way to display the Earley items is one at a time. We call this the “raw” method of displaying an Earley set. But we can also try to compact this listing by grouping together Earley items that share a dotted rule (and which therefore differ only in their origins). We call this the “compact” method of displaying an Earley set. Marpa::R2 uses the compact method.

To use the compact method, we must be able to show an arbitrarily large number of origins compactly. The following is an example line from Marpa::R2’s progress reports:

    F11 x12 @0...38-41 L1c1-L2c40 <plain assignment> -> ’x’ ’=’ expression •

In this example line, the following are true:

The “F11” indicates indicates that the display line is for rule 11, which is a completion. The “F” indicates that the dot is in the “final” position.
“x12” indicates that the example line represents 12 distinct Earley items.
“@0...38-41” states that the Earley items represented have origins ranging from 0 to 38, and that the current location for the Earley items represented in the example line is 41.
“L1c1-L2c40” describes the line and column locations in the input — the Earley items represented in the example line run from line 1, column 1 to line 2, column 40. Line and column tracking is implemented at the Marpa::R2 level.
“<plain assignment> -> ’x’ ’=’ expression •” indicates the dotted rule.

For more details on Marpa::R2’s implementation of progress reports, see https://metacpan.org/dist/Marpa-R2/view/pod/Progress.pod.

The disadvantage of the compact representation of Earley sets is that it is more complicated to read. The disadvantage of the raw representation is that, for ambiguous grammars, the number of Earley items can be very large, and therefore the listing of the Earley set can be unmanageably long.

26.4 Tracing tokens

The parse diagnostics should include a facility for displaying the token processing. At the default verbosity level, the facility should describe the token passed to the marpa_r_alternative (see marpa_r_alternative) method call, and specify the outcome of the marpa_r_alternative method call as “accepted”, “rejected” or “uncorrectable”, where

“accepted” means that the marpa_r_alternative() call succeeded;
“rejected” means that the marpa_r_alternative() call failed, but the failure is fully recoverable; and
“uncorrectable” means that the marpa_r_alternative() call failed, but the failure is not fully recoverable.

The token tracing facility’s reporting of rejected tokens can vary by application. For some applications, token rejection will not be allowed, so that distinguishing between rejected and uncorrectable tokens is pointless and unnecessary. For other applications, such as those that use the Ruby Slippers technique (see Ruby Slippers), token rejection will be a common occurrence in normal processing.

At some verbosity level above the default, after every marpa_r_earleme_complete() method call (see marpa_r_earleme_complete) the token tracing facility should list all the expected tokens. The marpa_r_terminals_expected() (see marpa_r_terminals_expected) method call is available for this purpose.

26.5 Parse diagnosis checklist

A checklist for diagnosing parse problems will depend, clearly, on the facilities offered by the diagnostics layer in the application. What follows is a “bare bones” checklist, based on the diagnostic facilities recommended in this chapter.

Check the location of the failure, and any error message. This is an obvious step and, if you are reading this checklist, you most likely have already taken it.
Use the token tracing facility. Even if the problem is not directly with the tokens, the token trace is a good way to see how far the parse got, and what it was doing. If you are lucky, looking at it may make it apparent where the problem is.
List the rules and the symbols. Perhaps you’ll see an issue. In any case, you will need these listings for the next step on this checklist.
List the latest Earley set. Form a theory of what should be happening in the parse and check it against what the listing of the latest Earley shows you is actually happening.
If the fix is still not apparent, list earlier Earley sets, until you see the fix, or you reach the first one with an error. Figure out a way to change the grammar to fix the first erroneous Earley set, then work forward.

27 Error methods, macros and codes

27.1 Error methods

Accessor function: Marpa_Error_Code marpa_g_error ( Marpa_Grammar g, const char** p_error_string)

Allows the application to read the error code. p_error_string is reserved for use by the internals. Applications should set it to NULL.

Return value: The current error code. Always succeeds.

Mutator function: Marpa_Error_Code marpa_g_error_clear ( Marpa_Grammar g )

Sets the error code to MARPA_ERR_NONE. Not often used, but now and then it can be useful to force the error code to a known state.

Return value: MARPA_ERR_NONE. Always succeeds.

27.2 Error Macros

Accessor macro: int MARPA_ERRCODE_COUNT: The number of error codes. All error codes, whether internal or external, will be integers, non-negative but strictly less than MARPA_ERRCODE_COUNT.

27.3 External error codes

This section lists the external error codes. These are the only error codes that users of the Libmarpa external interface should ever see. Internal error codes are in their own section (Internal error codes).

Accessor macro: int MARPA_ERR_NONE: No error condition. The error code is initialized to this value. Methods that do not result in failure sometimes reset the error code to MARPA_ERR_NONE. Numeric value: 0. Suggested message: "No error".

Accessor macro: int MARPA_ERR_BAD_SEPARATOR: A separator was specified for a sequence rule, but its ID was not that of a valid symbol. Numeric value: 6. Suggested message: "Separator has invalid symbol ID".

Accessor macro: int MARPA_ERR_BEFORE_FIRST_TREE: A tree iterator is positioned before the first tree, and the tree iterator was specified in a context where the tree iterator must be positioned at or after the first tree. A newly created tree is positioned before the first tree. To position a newly created tree iterator to the first tree use the marpa_t_next() method. Numeric value: 91. Suggested message: "Tree iterator is before first tree".

Accessor macro: int MARPA_ERR_COUNTED_NULLABLE

A “counted” symbol was found that is also a nullable symbol. A “counted” symbol is one that appears on the RHS of a sequence rule. If a symbol is nullable, counting its occurrences becomes difficult. Questions of definition and problems of implementation arise. At a minimum, a sequence with counted nullables would be wildly ambigious.

Sequence rules are simply an optimized shorthand for rules that can also be written in ordinary BNF. If the equivalent of a sequence of nullables is really what your application needs, nothing in Libmarpa prevents you from specifying that sequence with ordinary BNF rules.

Numeric value: 8. Suggested message: "Nullable symbol on RHS of a sequence rule".

Accessor macro: int MARPA_ERR_DUPLICATE_RULE

This error indicates an attempt to add a BNF rule that is a duplicate of a BNF rule already in the grammar. Two BNF rules are considered duplicates if

Both rules have the same left hand symbol, and
Both rules have the same right hand symbols in the same order.

Duplication of sequence rules, and duplication between BNF rules and sequence rules, is dealt with by requiring that the LHS of a sequence rule not be the LHS of any other rule.

Numeric value: 11. Suggested message: "Duplicate rule".

Accessor macro: int MARPA_ERR_DUPLICATE_TOKEN: This error indicates an attempt to add a duplicate token. A token is a duplicate if one already read at the same earleme has the same symbol ID and the same length. Numeric value: 12. Suggested message: "Duplicate token".

Accessor macro: int MARPA_ERR_YIM_COUNT

This error code indicates that an implementation-defined limit on the number of Earley items per Earley set was exceeded. This limit is different from the Earley item warning threshold, an optional limit on the number of Earley items in an Earley set, which can be set by the application.

The implementation-defined limit is very large, at least 500,000,000 Earley items. An application is unlikely ever to see this error. Libmarpa’s use of memory would almost certainly exceed the limit imposed by the application environment before this error occurs. Numeric value: 13. Suggested message: "Maximum number of Earley items exceeded".

Accessor macro: int MARPA_ERR_EVENT_IX_NEGATIVE: A negative event index was specified. That is not allowed. Numeric value: 15. Suggested message: "Negative event index".

Accessor macro: int MARPA_ERR_EVENT_IX_OOB: An non-negative event index was specified, but there is no event at that index. Since the events are in sequence, this means it was too large. Numeric value: 16. Suggested message: "No event at that index".

Accessor macro: int MARPA_ERR_GRAMMAR_HAS_CYCLE: The grammar has a cycle. Parsing using a grammar that contains a cycle is deprecated. See Cycles. Numeric value: 17. Suggested message: "Grammar has cycle".

Accessor macro: int MARPA_ERR_HEADERS_DO_NOT_MATCH: This error indicates that Libmarpa was incorrectly built. Libmarpa was compiled with headers that do not match the rest of the code. The solution is to find a correctly built Libmarpa. Numeric value: 98. Suggested message: "Internal error: Libmarpa was built incorrectly"

Accessor macro: int MARPA_ERR_I_AM_NOT_OK: The Libmarpa base grammar is in a “not ok” state. Currently, the only way this can happen is if Libmarpa memory is being overwritten. Numeric value: 29. Suggested message: "Marpa is in a not OK state".

Accessor macro: int MARPA_ERR_INACCESSIBLE_TOKEN

This error code indicates that the token symbol is an inaccessible symbol — one that cannot be reached from the start symbol. Since the inaccessibility of a symbol is a property of the grammar, this error code typically indicates an application error.

Numeric value: 18. Suggested message: "Token symbol is inaccessible".

Accessor macro: int MARPA_ERR_INVALID_BOOLEAN: A function was called that takes a boolean argument, but the value of that argument was not either 0 or 1. Numeric value: 22. Suggested message: "Argument is not boolean".

Accessor macro: int MARPA_ERR_INVALID_LOCATION

The location (Earley set ID) is not valid. It may be invalid for one of two reasons:

It is negative, and it is being used as the argument to a method for which that negative value does not have a special meaning.
It is after the latest Earley set.

For users of input models other than the standard one, the term “location”, as used in association with this error code, means Earley set ID or Earley set ordinal. In the standard input model, this will always be identical with Libmarpa’s other idea of location, the earleme.

Numeric value: 25. Suggested message: "Location is not valid".

Accessor macro: int MARPA_ERR_INVALID_START_SYMBOL: A start symbol was specified, but its symbol ID is not that of a valid symbol. Numeric value: 27. Suggested message: "Specified start symbol is not valid".

Accessor macro: int MARPA_ERR_INVALID_ASSERTION_ID: A method was called with an invalid assertion ID. This is a assertion ID that not only does not exist, but cannot exist. Currently that means its value is less than zero. Numeric value: 96. Suggested message: "Assertion ID is malformed".

Accessor macro: int MARPA_ERR_INVALID_RULE_ID: A method was called with an invalid rule ID. This is a rule ID that not only does not exist, but cannot exist. Currently that means its value is less than zero. Numeric value: 26. Suggested message: "Rule ID is malformed".

Accessor macro: int MARPA_ERR_INVALID_SYMBOL_ID: A method was called with an invalid symbol ID. This is a symbol ID that not only does not exist, but cannot exist. Currently that means its value is less than zero. Numeric value: 28. Suggested message: "Symbol ID is malformed".

Accessor macro: int MARPA_ERR_MAJOR_VERSION_MISMATCH: There was a mismatch in the major version number between the requested version of libmarpa, and the actual one. Numeric value: 30. Suggested message: "Libmarpa major version number is a mismatch".

Accessor macro: int MARPA_ERR_MICRO_VERSION_MISMATCH: There was a mismatch in the micro version number between the requested version of libmarpa, and the actual one. Numeric value: 31. Suggested message: "Libmarpa micro version number is a mismatch".

Accessor macro: int MARPA_ERR_MINOR_VERSION_MISMATCH: There was a mismatch in the minor version number between the requested version of libmarpa, and the actual one. Numeric value: 32. Suggested message: "Libmarpa minor version number is a mismatch".

Accessor macro: int MARPA_ERR_NO_EARLEY_SET_AT_LOCATION: A non-negative Earley set ID (also called an Earley set ordinal) was specified, but there is no corresponding Earley set. Since the Earley set ordinals are in sequence, this means that the specified ID is greater than that of the latest Earley set. Numeric value: 39. Suggested message: "Earley set ID is after latest Earley set".

Accessor macro: int MARPA_ERR_NOT_PRECOMPUTED: The grammar is not precomputed, and attempt was made to do something with it that is not allowed for unprecomputed grammars. For example, a recognizer cannot be created from a grammar until it is precomputed. Numeric value: 34. Suggested message: "This grammar is not precomputed".

Accessor macro: int MARPA_ERR_NO_PARSE: The application attempted to create a bocage from a recognizer with no parse tree. Numeric value: 41. Suggested message: "No parse".

Accessor macro: int MARPA_ERR_NO_RULES: A grammar that has no rules is being used in a way that is not allowed. Usually the problem is that the user is trying to precompute the grammar. Numeric value: 42. Suggested message: "This grammar does not have any rules".

Accessor macro: int MARPA_ERR_NO_START_SYMBOL: The grammar has no start symbol, and an attempt was made to perform an operation that requires one. Usually the problem is that the user is trying to precompute the grammar. Numeric value: 43. Suggested message: "This grammar has no start symbol".

Accessor macro: int MARPA_ERR_NO_SUCH_ASSERTION_ID: A method was called with an assertion ID that is well-formed (a non-negative integer), but the assertion does not exist. Numeric value: 97. Suggested message: "No assertion with this ID exists".

Accessor macro: int MARPA_ERR_NO_SUCH_RULE_ID: A method was called with a rule ID that is well-formed (a non-negative integer), but the rule does not exist. Numeric value: 89. Suggested message: "No rule with this ID exists".

Accessor macro: int MARPA_ERR_NO_SUCH_SYMBOL_ID: A method was called with a symbol ID that is well-formed (a non-negative integer), but the symbol does not exist. Numeric value: 90. Suggested message: "No symbol with this ID exists".

Accessor macro: int MARPA_ERR_NO_TOKEN_EXPECTED_HERE

An attempt was made to read a token using marpa_r_alternative, but no tokens were expected at this earleme location. This can only happen in alternative input models. See marpa_r_alternative().

Numeric value: 44. Suggested message: "No token is expected at this earleme location".

Accessor macro: int MARPA_ERR_NOT_A_SEQUENCE

This error occurs in situations where a rule is required to be a sequence, and indicates that the rule of interest is, in fact, not a sequence.

Numeric value: 99. Suggested message: "Rule is not a sequence".

Accessor macro: int MARPA_ERR_NULLING_TERMINAL: This error occurs only if LHS terminals feature is in use. The LHS terminals feature is deprecated. See LHS terminals. Numeric value: 49. Suggested message: "A symbol is both terminal and nulling".

Accessor macro: int MARPA_ERR_ORDER_FROZEN

The Marpa order object has been frozen. If a Marpa order object is frozen, it cannot be changed.

Multiple tree iterators can share a Marpa order object, but that order object is frozen after the first tree iterator is created from it. Applications can order an bocage in many ways, but they must do so by creating multiple order objects.

Numeric value: 50. Suggested message: "The ordering is frozen".

Accessor macro: int MARPA_ERR_PARSE_EXHAUSTED: The parse is exhausted. Numeric value: 53. Suggested message: "The parse is exhausted".

Accessor macro: int MARPA_ERR_PARSE_TOO_LONG

The parse is too long. The limit on the length of a parse is implementation dependent, but it is very large, at least 500,000,000 earlemes.

In the standard input model, Libmarpa’s use of memory would almost certainly exceed the limit imposed by the application environment before this error could occur. If an application sees this error, it almost certainly using one of the non-standard input models.

In the non-standard input models, this message will occur most often because of an attempt to add a single extremely long token, perhaps as a result of an application error. It is also possible this error condition will occur after the input of a large number of long tokens.

Numeric value: 54. Suggested message: "This input would make the parse too long".

Accessor macro: int MARPA_ERR_POINTER_ARG_NULL: In a method that takes pointers as arguments, one of the pointer arguments is NULL, in a case where that is not allowed. One such method is marpa_r_progress_item(). Numeric value: 56. Suggested message: "An argument is null when it should not be".

Accessor macro: int MARPA_ERR_PRECOMPUTED: An attempt was made to use a precomputed grammar in a way that is not allowed. Often this is an attempt to change the grammar. Nearly every change to a grammar after precomputation invalidates the precomputation, and is therefore not allowed. Numeric value: 57. Suggested message: "This grammar is precomputed".

Accessor macro: int MARPA_ERR_PROGRESS_REPORT_NOT_STARTED: No recognizer progress report is currently active, and an action has been attempted that requires the progress report to be active. One such action would be a marpa_r_progress_item() call. Numeric value: 59. Suggested message: "No progress report has been started".

Accessor macro: int MARPA_ERR_PROGRESS_REPORT_EXHAUSTED: The progress report is “exhausted” — all its items have been iterated through. Numeric value: 58. Suggested message: "The progress report is exhausted".

Accessor macro: int MARPA_ERR_RANK_TOO_LOW: A symbol or rule rank was specified that was less than an implementation-defined minimum. Implementations will always allow ranks in the range between -500,000,000 and 500,000,000. Numeric value: 85. Suggested message: "Rule or symbol rank too low".

Accessor macro: int MARPA_ERR_RANK_TOO_HIGH: A symbol or rule rank was specified that was greater than an implementation-defined maximum. Implementations will always allow ranks in the range between -500,000,000 and 500,000,000. Numeric value: 86. Suggested message: "Rule or symbol rank too high".

Accessor macro: int MARPA_ERR_RECCE_NOT_ACCEPTING_INPUT: The recognizer is not accepting input, and the application has attempted something that is inconsistent with that fact. Numeric value: 60. Suggested message: "The recognizer is not accepting input".

Accessor macro: int MARPA_ERR_RECCE_NOT_STARTED: The recognizer has not been started. and the application has attempted something that is inconsistent with that fact. Numeric value: 61. Suggested message: "The recognizer has not been started".

Accessor macro: int MARPA_ERR_RECCE_STARTED: The recognizer has been started. and the application has attempted something that is inconsistent with that fact. Numeric value: 62. Suggested message: "The recognizer has been started".

Accessor macro: int MARPA_ERR_RHS_IX_NEGATIVE: The index of a RHS symbol was specified, and it was negative. That is not allowed. Numeric value: 63. Suggested message: "RHS index cannot be negative".

Accessor macro: int MARPA_ERR_RHS_IX_OOB: A non-negative index of RHS symbol was specified, but there is no symbol at that index. Since the indexes are in sequence, this means the index was greater than or equal to the rule length. Numeric value: 64. Suggested message: "RHS index must be less than rule length".

Accessor macro: int MARPA_ERR_RHS_TOO_LONG: An attempt was made to add a rule with too many right hand side symbols. The limit on the RHS symbol count is implementation dependent, but it is very large, at least 500,000,000 symbols. Libmarpa’s use of memory would almost certainly exceed the limit imposed by the application environment before this error could occur. Numeric value: 65. Suggested message: "The RHS is too long".

Accessor macro: int MARPA_ERR_SEQUENCE_LHS_NOT_UNIQUE: The LHS of a sequence rule cannot be the LHS of any other rule, whether a sequence rule or a BNF rule. An attempt was made to violate this restriction. Numeric value: 66. Suggested message: "LHS of sequence rule would not be unique".

Accessor macro: int MARPA_ERR_START_NOT_LHS: The start symbol is not on the LHS on any rule. That means it could never match any possible input, not even the null string. This is likely to be a mistake in writing the grammar, which can be fixed by rewriting the grammar. Numeric value: 73. Suggested message: "Start symbol not on LHS of any rule".

Accessor macro: int MARPA_ERR_SYMBOL_IS_NOT_COMPLETION_EVENT: An attempt was made to use a symbol in a way that requires it to be set up for completion events, but the symbol was not set up for completion events. This error can occur when the application attempts to activate a completion event in the recognizer for a symbol that is not set up as a completion event. Numeric value: 92. Suggested message: "Symbol is not set up for completion events".

Accessor macro: int MARPA_ERR_SYMBOL_IS_NOT_NULLED_EVENT: An attempt was made to use a symbol in a way that requires it to be set up for nulled events, but the symbol was not set up for nulled events. This error can occur when the application attempts to activate a nulled symbol event in the recognizer for a symbol that is not set up as a nulled event. Numeric value: 93. Suggested message: "Symbol is not set up for nulled events".

Accessor macro: int MARPA_ERR_SYMBOL_IS_NOT_PREDICTION_EVENT: An attempt was made to use a symbol in a way that requires it to be set up for prediction events, but the symbol was not set up for prediction events. This error can occur when the application attempts to activate a prediction event in the recognizer for a symbol that is not set up as a prediction event. Numeric value: 94. Suggested message: "Symbol is not set up for prediction events".

Accessor macro: int MARPA_ERR_SYMBOL_VALUED_CONFLICT: Unvalued symbols are a deprecated Marpa feature, which may be avoided with the marpa_g_force_valued() method. An unvalued symbol may take on any value, and therefore a symbol that is unvalued at some points cannot safely to be used to contain a value at others. This error indicates that such an unsafe use is being attempted. Numeric value: 74. Suggested message: "Symbol is treated both as valued and unvalued".

Accessor macro: int MARPA_ERR_TERMINAL_IS_LOCKED: An attempt was made to change the terminal status of a symbol to a different value after it was locked. Numeric value: 75. Suggested message: "The terminal status of the symbol is locked".

Accessor macro: int MARPA_ERR_TOKEN_IS_NOT_TERMINAL: A token was specified whose symbol ID is not a terminal. Numeric value: 76. Suggested message: "Token symbol must be a terminal".

Accessor macro: int MARPA_ERR_TOKEN_LENGTH_LE_ZERO: A token length was specified that is less than or equal to zero. Zero-length tokens are not allowed in Libmarpa. Numeric value: 77. Suggested message: "Token length must greater than zero".

Accessor macro: int MARPA_ERR_TOKEN_TOO_LONG: The token length is too long. The limit on the length of a token is implementation dependent, but it is at least 500,000,000 earlemes. Libmarpa’s use of memory would almost certainly exceed the limit imposed by the application environment before this error could occur. Numeric value: 78. Suggested message: "Token is too long".

Accessor macro: int MARPA_ERR_TREE_EXHAUSTED: A Libmarpa parse tree iterator is “exhausted”, that is, it has no more parse trees. Numeric value: 79. Suggested message: "Tree iterator is exhausted".

Accessor macro: int MARPA_ERR_TREE_PAUSED: A Libmarpa tree is “paused” and an operation was attempted that is inconsistent with that fact. Typically, this operation will be a call of the marpa_t_next() method. Numeric value: 80. Suggested message: "Tree iterator is paused".

Accessor macro: int MARPA_ERR_UNEXPECTED_TOKEN_ID: An attempt was made to read a token where a token with that symbol ID is not expected. This message can also occur when an attempt is made to read a token at a location where no token is expected. Numeric value: 81. Suggested message: "Unexpected token".

Accessor macro: int MARPA_ERR_UNPRODUCTIVE_START: The start symbol is unproductive. That means it could never match any possible input, not even the null string. This is likely to be a mistake in writing the grammar, which can be fixed by rewriting the grammar. Numeric value: 82. Suggested message: "Unproductive start symbol".

Accessor macro: int MARPA_ERR_VALUATOR_INACTIVE: The valuator is inactive in a context where that should not be the case. Numeric value: 83. Suggested message: "Valuator inactive".

Accessor macro: int MARPA_ERR_VALUED_IS_LOCKED: Unvalued symbols are a deprecated Marpa feature, which may be avoided with the marpa_g_force_valued() method. See marpa_g_force_valued(). This error code indicates that the valued status of a symbol is locked, and an attempt was made to change it to a status different from the current one. Numeric value: 84. Suggested message: "The valued status of the symbol is locked".

Accessor macro: int MARPA_ERR_SYMBOL_IS_NULLING: An attempt was made to do something with a nulling symbol that is not allowed. For example, the ID of a nulling symbol cannot be an argument to marpa_r_expected_symbol_event_set(), because it is not possible to create an “expected symbol” event for a nulling symbol. Numeric value: 87. Suggested message: "Symbol is nulling".

Accessor macro: int MARPA_ERR_SYMBOL_IS_UNUSED: An attempt was made to do something with an unused symbol that is not allowed. An “unused” symbol is a inaccessible or unproductive symbol. For example, the ID of a unused symbol cannot be an argument to marpa_r_expected_symbol_event_set(), because it is not possible to create an “expected symbol” event for an unused symbol. Numeric value: 88. Suggested message: "Symbol is not used".

27.4 Internal error codes

An internal error code may be one of two things: First, it can be an error code that arises from an internal Libmarpa programming issue (in other words, something happening in the code that was not supposed to be able to happen.) Second, it can be an error code that only occurs when a method from Libmarpa’s internal interface is used. Both kinds of internal error message share one common trait — users of the Libmarpa’s external interface should never see them.

Internal error messages require someone with knowledge of the Libmarpa internals to follow up on them. They usually do not have descriptions or suggested messages.

Accessor macro: int MARPA_ERR_AHFA_IX_NEGATIVE: Numeric value: 1.

Accessor macro: int MARPA_ERR_AHFA_IX_OOB: Numeric value: 2.

Accessor macro: int MARPA_ERR_ANDID_NEGATIVE: Numeric value: 3.

Accessor macro: int MARPA_ERR_ANDID_NOT_IN_OR: Numeric value: 4.

Accessor macro: int MARPA_ERR_ANDIX_NEGATIVE: Numeric value: 5.

Accessor macro: int MARPA_ERR_BOCAGE_ITERATION_EXHAUSTED: Numeric value: 7.

Accessor macro: int MARPA_ERR_DEVELOPMENT

“Development” errors were used heavily during Libmarpa’s development, when it was not yet clear how precisely to classify every error condition. Unless they are using a developer’s version, users of the external interface should never see development errors.

Development errors have an error string associated with them. The error string is a short 7-bit ASCII error string that describes the error. Numeric value: 9. Suggested message: "Development error, see string".

Accessor macro: int MARPA_ERR_DUPLICATE_AND_NODE: Numeric value: 10.

Accessor macro: int MARPA_ERR_YIM_ID_INVALID: Numeric value: 14.

Accessor macro: int MARPA_ERR_INTERNAL: A “catchall” internal error. Numeric value: 19.

Accessor macro: int MARPA_ERR_INVALID_AHFA_ID: The AHFA ID was invalid. There are no AHFAs any more, so this message should not occur. Numeric value: 20.

Accessor macro: int MARPA_ERR_INVALID_AIMID: The AHM ID was invalid. The term “AIMID” is a legacy of earlier implementations and must be kept for backward compatibility. Numeric value: 21.

Accessor macro: int MARPA_ERR_INVALID_IRLID: Numeric value: 23.

Accessor macro: int MARPA_ERR_INVALID_NSYID: Numeric value: 24.

Accessor macro: int MARPA_ERR_NOOKID_NEGATIVE: Numeric value: 33.

Accessor macro: int MARPA_ERR_NOT_TRACING_COMPLETION_LINKS: Numeric value: 35.

Accessor macro: int MARPA_ERR_NOT_TRACING_LEO_LINKS: Numeric value: 36.

Accessor macro: int MARPA_ERR_NOT_TRACING_TOKEN_LINKS: Numeric value: 37.

Accessor macro: int MARPA_ERR_NO_AND_NODES: Numeric value: 38.

Accessor macro: int MARPA_ERR_NO_OR_NODES: Numeric value: 40.

Accessor macro: int MARPA_ERR_NO_TRACE_YS: Numeric value: 46.

Accessor macro: int MARPA_ERR_NO_TRACE_PIM: Numeric value: 47.

Accessor macro: int MARPA_ERR_NO_TRACE_YIM: Numeric value: 45.

Accessor macro: int MARPA_ERR_NO_TRACE_SRCL: Numeric value: 48.

Accessor macro: int MARPA_ERR_ORID_NEGATIVE: Numeric value: 51.

Accessor macro: int MARPA_ERR_OR_ALREADY_ORDERED: Numeric value: 52.

Accessor macro: int MARPA_ERR_PIM_IS_NOT_LIM: Numeric value: 55.

Accessor macro: int MARPA_ERR_SOURCE_TYPE_IS_NONE: Numeric value: 70.

Accessor macro: int MARPA_ERR_SOURCE_TYPE_IS_TOKEN: Numeric value: 71.

Accessor macro: int MARPA_ERR_SOURCE_TYPE_IS_COMPLETION: Numeric value: 68.

Accessor macro: int MARPA_ERR_SOURCE_TYPE_IS_LEO: Numeric value: 69.

Accessor macro: int MARPA_ERR_SOURCE_TYPE_IS_AMBIGUOUS: Numeric value: 67.

Accessor macro: int MARPA_ERR_SOURCE_TYPE_IS_UNKNOWN: Numeric value: 72.

Accessor macro: int MARPA_ERR_RECCE_IS_INCONSISTENT: Numeric value: 95. Suggested message: "The recognizer is inconsistent".

28 Technical notes

This section contains technical notes that are not necessary for the main presentation, but which may be helpful or interesting.

28.1 Data types used by Libmarpa

Libmarpa does not use any floating point data or strings. All data are either integers or pointers.

28.2 Why so many time objects?

Marpa is an aggressively multi-pass algorithm. Marpa achieves its efficiency, not in spite of making multiple passes over the data, but because of it. Marpa regularly substitutes two fast O(n) passes for a single O(n log n) pass. Marpa’s proliferation of time objects is in keeping with its multi-pass approach.

Bocage objects come at no cost, even for unambiguous parses, because the same pass that creates the bocage also deals with other issues that are of major significance for unambiguous parses. It is the post-processing of the bocage pass that enables Marpa to do both left- and right-recursion in linear time.

Of the various objects, the best case for elimination is of the ordering object. In many cases, the ordering is trivial. Either the parse is unambiguous, or the application does not care about the order in which parse trees are returned. But while it would be easy to add an option to bypass creation of an ordering object, there is little to be gained from it. When the ordering is trivial, its overhead is very small — essentially a handful of subroutine calls. Many orderings accomplish nothing, but these cost next to nothing.

Tree objects come at minimal cost to unambiguous grammars, because the same pass that allows iteration through multiple parse trees does the tree traversal. This eliminates much of the work that otherwise would need to be done in the valuation time object. In the current implementation, the valuation time object needs only to step through a sequence already determined by the tree iterator.

28.3 Numbered objects

As the name suggests, the choice was made to implement numbered objects as integers, and not as pointers. In standard-conformant C, integers can be safely checked for validity, while pointers cannot.

There are efficiency tradeoffs between pointers and integers but they are complicated, and they go both ways. Pointers can be faster, but integers can be used as indexes into more than one data structure. Which is actually faster depends on the design. Integers allow for a more flexible design, so that once the choice is settled on, careful programming can make them a win, possibly a very big one.

The approach taken in Libmarpa was to settle, from the outset, on integers as the implementation for numbered objects, and to optimize on that basis. The author concedes that it is possible that others redoing Libmarpa from scratch might find that pointers are faster. But the author is confident that they will also discover, on modern architectures, that the lack of safe validity checking is far too high a price to pay for the difference in speed.

28.4 Trap representations

In order to be C89 conformant, an application must initialize all locations that might be read. This is because C89 allows trap representations.

A trap representation is a byte pattern in memory that is not a valid value of some object type. When read, the trap representation causes undefined behavior according to the C89 standard. Because of this undefined behavior the application that allowed the read is non-conformant to the C89 standard. Trap representations are carefully defined and discussed in the C99 standard. See C99

In real life, trap representations can occur when floating point values are used: Some byte patterns that can occur in memory are not valid floating point values, and can cause undefined behavior when read.

Pointers raise the same issue although, since pointers can be safely read as an integer, some insist that an invalid pointer is not, strictly speaking, a trap representation. But there is no portable c89-conformant way of testing the integer form of a pointer for validity, so that the only way to guarantee C89 conformance is to initialize the pointer, either to a valid pointer, or to a known and therefore testable value, such as NULL.

All this implies that, in order to claim c89-conformance, an application must initialize all locations that might be read to non-trap values. For a stack implementation, this means that, as a practical matter, all locations on the stack must be initialized.

28.5 Out-of-memory handling

Modern operating systems usually have an out-of-memory (OOM) killer as a component. When an OOM killer is in use, if an application or library is unable to allocate memory, it usually will not be given an opportunity to handle this error. See “When malloc() Never Returns NULL — Reliability as an Illusion”, https://arxiv.org/pdf/2208.08484.pdf.

29 Advanced input models

In an earlier chapter, we introduced Libmarpa’s concept of input, and described its basic input models. See Input. In this chapter we describe Libmarpa’s advanced models of input. These advanced input models have attracted considerable interest. However, they have seen little actual use so far, and for that reason we delayed their consideration until now.

A Libmarpa input model is advanced if it allows tokens of length other than 1. The advanced input models are also called variable-length token models because they allow the token length to vary from the “normal” length of 1.

29.1 The dense variable-length token model

In the dense variable-length model of input, one or more successful calls of marpa_r_alternative() must be immediately previous to every call to marpa_r_earleme_complete(). Note that, for a variable-length input model to be “dense” according to this definition, at least one successful call of marpa_r_alternative() must be immediately previous to each call to marpa_r_earleme_complete(). Recall that, in this document, we say that a marpa_r_alternative() call is “immediately previous” to a marpa_r_earleme_complete() call iff that marpa_r_earleme_complete() call is the first marpa_r_earleme_complete() call after the marpa_r_alternative() call.

In the dense model of input, after a successful call of marpa_r_alternative(), the earleme variables are as follows:

The furthest earleme will be max(old_f, old_c+length),
- where old_f is the furthest earleme before the call to marpa_r_alternative(),
- old_c is the value of the current earleme before the call to marpa_r_alternative(), and
- length is the length of the token read.
marpa_r_alternative() never changes the latest or current earleme.

In the dense variable-length model of input, the following are also true:

The effect of the marpa_r_earleme_complete() mutator on the earleme variables is the same as for the basic models of input. See The standard model of input.
the latest earleme is always the same as the current earleme. In fact, the latest earleme and the current earleme are always the same, except in the fully general model of input.

29.2 The fully general input model

In the sparse variable-length model of input, zero or more successful calls of marpa_r_alternative() must be immediately previous to every call to marpa_r_earleme_complete(). The sparse model is the dense variable-length model, with its only restriction lifted — the sparse variable-length input model allows calls to marpa_r_earleme_complete() that are not immediately preceded by successful calls to marpa_r_alternative().

Since it is unrestricted, the sparse input model is Libmarpa’s fully general input model. Because of this, it may be useful for us to state the effect of mutators on the earleme variables in detail, even at the expense of some repetition.

In the sparse input model, empty earlemes are now possible. An empty earleme is an earleme with no tokens and no Earley set. An empty earleme occurs iff marpa_r_earleme_complete() is called when there is no immediately previous successful call to marpa_r_alternative(). The sparse model takes its name from the fact that there may be earlemes with no Earley set. In the sparse model, Earley sets are “sparsely” distributed among the earlemes.

The latest earleme is the most recent non-empty earleme. In other words, the latest earleme is the most recent earleme with an Earley set at it. In the dense input models, the current earleme and the latest earleme were always the same. Because the sparse input models allow empty earlemes, it becomes possible for the latest earleme and the current earleme to differ. The latest earleme must always be the earleme of an Earley set. The current earleme is not necessarily the earleme of an Earley set.

In the sparse model of input, the effect on the earleme variables of a successful call of the marpa_r_alternative() mutator is the same as for the dense model of input:

The furthest earleme will be max(old_f, old_c+length),
- where old_f is the furthest earleme before the call to marpa_r_alternative(),
- old_c is the value of the current earleme before the call to marpa_r_alternative(), and
- length is the length of the token read.
marpa_r_alternative() never changes the latest or current earleme.

In the sparse model, when the earleme is not empty, the effect of a call to marpa_r_earleme_complete() on the earleme variables is the same as in the dense and the basic models of input. Specifically, the following will be true:

The current earleme will be advanced to old_c+1, where old_c is the current earleme before the call.
The latest earleme will be old_c+1, and therefore will be equal to the current earleme.
The value of the furthest earleme is never changed by a call to marpa_r_earleme_complete().

For the sparse input model, in the case of an empty earleme, the effect of the marpa_r_earleme_complete() mutator on the earleme variables is the following:

The current earleme will be advanced to old_c+1, where old_c is the current earleme before the call.
The latest earleme will remain at old_l, where the latest earleme before the call is old_l. This implies that the latest earleme will be less than the current earleme.
The furthest earleme is never changed by a call to marpa_r_earleme_complete().

29.3 The codepoint-per-earleme model

One variation of the fully general model is the one-codepoint-per-earleme input model, usually called the codepoint-per-earleme input model.

Most input models may be called token-per-earleme input models. That is, they are models in which every token is exactly one earleme long.

In the codepoint-per-earleme model, every codepoint will be treated as being exactly one earleme in length. If a token is more than one codepoint in length, that token will span earlemes. In the codepoint-per-earleme model, tokens may be ambiguous, and they may overlap.

When a codepoint-per-earleme model of input is used, there may be many earlemes at which no tokens start. For example, in a straightforward codepoint-per-earleme implementation of a grammar for a language that allows comments, no tokens will start at any earlemes which correspond to character locations inside a comment.

Codepoint-per-earleme input model have seen a lot of use, but mainly for experimental and toy grammars. Their disadvantage is efficiency — the requirement of one call of marpa_r_alternative() for each codepoint can make them substantially more expensive than input models which allow multiple codepoints per earleme.

29.4 Converting earleme to Earley set ID

In the dense input models, the earleme of every Earley set is equal to the ID of that Earley set. But, because of the sparse input models, this is not true in the general case.

For other applications that want an earleme-to-ID mapping, the most general method for converting earlemes to Earley set IDs is to create a ID-to-earleme array using the marpa_r_earleme() method (see marpa_r_earleme), and invert it. The resulting earleme-to-ID array will be sparse. More precisely, where the function earleme maps IDs to earlemes, the value at index ix of the array should be

id, if earleme(id) = ix for some id; and
undefined, otherwise.

The Libmarpa interface does not provide a method for earleme-to-ID conversion. The input model is not known at the Libmarpa level, so that an earleme-to-ID method would have to be fully general. A fully general earleme-to-ID method would be costly, an cost which usually would be unnecessary. For these reasons, it’s up to the application to create its own earleme-to-conversion method, if one is needed.

Here are the alternatives:

Usually an earleme-to-ID conversion method is not needed.
If an earleme-to-ID conversion method is needed, and the sparse input model is not in use, the Earley set ID and the earleme are always equal, and the earleme-to-ID conversion is trivial.
If an earleme-to-ID conversion method is needed, and a sparse input model is in use, the application may be able to implement an efficient earleme-to-ID conversion targeted at that specific input model.
If an earleme-to-ID conversion method is needed, a sparse input model is in use, and no efficient earleme-to-ID conversion targeted at that specific input model seems to be available, the general earleme-to-ID conversion technique described just above must be used.

30 Support

The “updates” (https://github.com/jeffreykegler/libmarpa/blob/updated/UPDATES.md) document contains instructions for reporting bugs, getting answers to questions, and other support.

31 Futures

This chapter is not about the current interface. Instead, it discusses changes or additions that might be made to this document or to the external interface in the future.

31.1 Nulling versus nulled

Currently we call a zero-length instance (aka tree node) either a nulling instance or a nulled instance. The use of “nulling” is for historic reasons and arguably is confusing. The symbol of a nulling instance is not necessarily a nulling symbol — it might be a nullable symbol. Usage of the term “nulled” is less confusing. At this time, we continue to allow zero-length instances to be called nulling instances because that terminology is embedded in a lot of code and documents.

31.2 Document pre-conditions more formally

A more formal approach to documenting preconditions of the methods is possible, and may be helpful enough to repay any cost in verbosity or complexity. Dave Abrahams recommended I look at https://www.boost.org/sgi/stl/ for one approach.

31.3 Simpler events interface

The events interface is unnecessarily complex. A separate grammar method for the activation and deactivation of each event type is unnecessary. See Completion events. See Symbol nulled events. See Prediction events.

31.4 Better defined ambiguity metric

With experience, we are now in a position to define an ambiguity metric that can be cheaply calculated, and that might be of real use. Preliminary notes are in the CWeb code.

31.5 Report item traverser should be a time object

Right now, a report item traverser is a kind of “subobject” of a recognizer. It should be made into a full-fledged time object. This will allow multiple report item traversers to be in use at once, allowing more aggressive use of this facility.

31.6 Orthogonal treatment of soft failures

The treatment of soft failure evolved along with this interface, leaving traces of that evolution in the interface. For example, soft failures should not set the error code, but soft failure in marpa_r_progress_item() sets the error code to MARPA_ERR_PROGRESS_REPORT_EXHAUSTED. See marpa_r_progress_item(). Similar, soft failure marpa_t_next() sets the error code to MARPA_ERR_TREE_EXHAUSTED. These non-orthogonalities should be fixed someday.

31.7 Orthogonal treatment of exhaustion

The treatment of parse exhaustion is very awkward. marpa_r_start_input() returns success on exhaustion, while marpa_r_earleme_complete() either returns success or a hard failure, depending on circumstances. See marpa_r_earleme_complete(), and marpa_r_start_input().

Ideally the treatment should be simpler, more intuitive and more orthogonal. Better, perhaps, would be to always treat parse exhaustion as a soft failure.

31.8 Furthest earleme values

marpa_r_furthest_earleme returns unsigned int, which is non-orthogonal with marpa_r_current_earleme. This leaves no room for an failure return value, which we deal with by not checking for failures. The only important potential failure is calling marpa_r_furthest_earleme when the furthest earleme is an indeterminate value. We eliminate this potential cause of failure by regarding furthest earleme as having been initialized when the recognizer was created, which is another non-orthogonality with marpa_r_current_earleme.

All this might be fine, if something were gained, but in fact nothing ever is gained. The furthest earleme, unless the parse fails, always becomes the current earleme, and no use cases for extremely long variable-length tokens are envisioned, so that current and furthest earleme should never be far apart. Additionally, the additional values for the furthest earleme only come into play if the length of parse causes the application to exceed the memory limit imposed by modern application environments. Summarizing, marpa_r_furthest_earleme, should return an int, like marpa_r_current_earleme, and the non-orthogonalities should be eliminated.

31.9 Additional recoverable failures in marpa_r_alternative()

Among the hard failures that marpa_r_alternative() returns are the error codes MARPA_ERR_DUPLICATE_TOKEN, MARPA_ERR_NO_TOKEN_EXPECTED_HERE and MARPA_ERR_INACCESSIBLE_TOKEN. These are currently irrecoverable. They may in fact be fully recoverable, but are not documented as such because this has not been tested.

At this writing, we know of no applications that attempt to recover from these errors. It is possible that these error codes may also be useable for the techniques similar to the Ruby Slippers, as of this writing, we know of no proposals to use them in this way.

31.10 Untested methods

The methods of this section are not in the external interface, because they have not been adequately tested. Their fate is uncertain. Users should regard these methods as unsupported.

31.10.1 Zero-width assertion methods

Function: Marpa_Assertion_ID marpa_g_zwa_new ( Marpa_Grammar g, int default_value)

Function: int marpa_g_zwa_place ( Marpa_Grammar g, Marpa_Assertion_ID zwaid, Marpa_Rule_ID xrl_id, int rhs_ix)

Function: int marpa_r_zwa_default ( Marpa_Recognizer r, Marpa_Assertion_ID zwaid): On success, returns previous default value of the assertion.

Function: int marpa_r_zwa_default_set ( Marpa_Recognizer r, Marpa_Assertion_ID zwaid, int default_value): Changes default value to default_value. On success, returns previous default value of the assertion.

Function: Marpa_Assertion_ID marpa_g_highest_zwa_id ( Marpa_Grammar g )

31.10.2 Methods for revising parses

Function: Marpa_Earleme marpa_r_clean ( Marpa_Recognizer r): Allows an application to “change its mind” about a parse, rejecting rules previously recognized or predicted, and terminals previously scanned. This method is untested and its fate is uncertain.

32 Deprecated techniques and methods

32.1 LHS terminals

32.1.1 Overview of LHS terminals

The user creates LHS terminals with the marpa_g_symbol_is_terminal_set() method. See marpa_g_symbol_is_terminal_set(). If the marpa_g_symbol_is_terminal_set() method is never called for a grammar, then LHS terminals are not in use for any time object with that grammar as its base grammar.

32.1.2 Motivation of LHS terminals

Recall that a terminal symbol is a symbol that may appear in the input. Traditionally, all LHS symbols, as well as the start symbol, must be non-terminals. By default, this is Marpa’s behavior.

In a departure from tradition, Marpa had a feature that allowed the user to eliminate the distinction between terminals and non-terminals. This feature is now deprecated.

When LHS terminals are in use, a terminal can appear on the LHS of one or more rules, and can be be the start symbol. Note however, that terminals can never be zero length.

The basis of the LHS terminals feature was that, while sharp division between terminals and non-terminals was a useful simplification for proving theorems, it was not essential in practice. In the UNIX “toolkit” tradition, the practice has been to include even awkward, dangerous tools with no known use, in the toolkit. The philosophy was that empowering the user who discovers new techniques is more important than playing nanny to the toolkit’s users.

LHS symbols could be used to bypass, or “short circuit”, the rules on whose LHS they occur. Short circuiting rules, it was thought, might prove helpful in debugging, or have other applications.

But, a decade after the release of Libmarpa, no uses for LHS symbols have emerged. And they do introduce many new corner cases into the code and complicate the API documentation.

32.1.3 LHS terminal methods

The terminal status of a symbol is a boolean, which is true iff the symbol is a terminal. The terminal status of a symbol is locked iff the terminal status of that symbol cannot be changed.

Mutator function: int marpa_g_symbol_is_terminal_set ( Marpa_Grammar g, Marpa_Symbol_ID sym_id, int value)

On success, does the following:

Sets the terminal status of the symbol in grammar g with symbol ID sym_id to value. To be used as an input symbol in the marpa_r_alternative() method, a symbol must be a terminal.
Locks the terminal status of symbol sym_id.
Returns value.

Hard fails with error code MARPA_ERR_TERMINAL_IS_LOCKED if the symbol with sym_id is locked, and the terminal status of the symbol with sym_id is not equal to value. Also hard fails if value is not a boolean or if g is precomputed.

Return value: On success, value, which will be 1 or 0. On soft failure, -1. On hard failure, -2.

32.1.4 Precomputation and LHS terminals

On success, marpa_g_precompute() sets and locks the terminal status of every symbol. More precisely,

let the symbol be x;
let the terminal status of x when marpa_g_precompute() was called be v_before; and
let the terminal status of x when marpa_g_precompute() returns success be v_after.

The effect of the successful call of marpa_g_precompute() will be as follows:

If terminal status of x is locked when marpa_g_precompute() was called, then v_after = v_before.
If terminal status of x is not locked, and x appears on the LHS of some rule then v_after is false.
If terminal status of x is not locked, and x does not appear on the LHS of any rule then v_after is true.

The terminal status of all symbols is locked after a successful call to marpa_g_precompute(). See marpa_g_precompute().

32.1.5 Nulling terminals

When LHS terminals are not in use, nulling terminals cannot occur, and applications need not take them in account. This is because, in order to be nullable, a symbol must appear on the LHS of a nullable rule. Without LHS terminals, therefore, no terminals can ever be either nullable or nulling.

Things become more complicated if LHS terminals are allowed. In that case nulling terminals can be created, and Libmarpa must take measures to prevent a recognizer from being created for a grammar with nulling terminals. Libmarpa will not allow a recognizer to be created from a grammar with nulling terminals because they are a logical contradiction. A terminal is (by definition) a symbol which can appear in the input, and a nulling symbol, by definition, cannot appear in the input.

Libmarpa’s marpa_g_precompute method fails with the error code MARPA_ERR_NULLING_TERMINAL if it detects nulling terminals during precomputation. The error code MARPA_ERR_NULLING_TERMINAL is library-recoverable. See marpa_g_precompute().

Libmarpa’s marpa_g_precompute method also triggers one MARPA_EVENT_NULLING_TERMINAL event for every nulling terminal in the grammar. This implies that one or more MARPA_EVENT_NULLING_TERMINAL events occur iff marpa_g_precompute fails with error code MARPA_ERR_NULLING_TERMINAL.

32.2 Valued and unvalued symbols

32.2.1 What unvalued symbols were

Libmarpa symbols can have values, which is the traditional way of doing semantics. Libmarpa also allows symbols to be unvalued. An unvalued symbol is one whose value is unpredictable from instance to instance. If a symbol is unvalued, we sometimes say that it has “whatever” semantics.

Situations where the semantics can tolerate unvalued symbols are surprisingly frequent. For example, the top-level of many languages is a series of major units, all of whose semantics are typically accomplished via side effects. The compiler is typically indifferent to the actual value produced by these major units, and tracking them is a waste of time. Similarly, the value of the separators in a list is typically ignored.

Rules are unvalued if and only if their LHS symbols are unvalued. When rules and symbols are unvalued, Libmarpa optimizes their evaluation.

It is in principle unsafe to check the value of a symbol if it can be unvalued. For this reason, once a symbol has been treated as valued, Libmarpa marks it as valued. Similarly, once a symbol has been treated as unvalued, Libmarpa marks it as unvalued. Once marked, a symbol’s valued status is locked and cannot be changed later.

The valued status of terminals is marked the first time they are read.

Unvalued symbols may be used in combination with another deprecated feature, LHS terminals. See LHS terminals. The valued status of LHS symbols must be explicitly marked by the application when initializing the valuator — this is Libmarpa’s equivalent of registering a callback.

The valued status of a LHS terminal will be locked in the recognizer if it is used as a terminal, and the symbol’s use as a rule LHS in the valuator must be consistent with the recognizer’s valued marking. LHS terminals are disabled by default.

Marpa reports an error when a symbol’s use conflicts with its locked valued status. Doing so usually saves the Libmarpa user some tricky debugging further down the road.

32.2.2 Grammar methods dealing with unvalued symbols

Function: int marpa_g_symbol_is_valued_set ( Marpa_Grammar g, Marpa_Symbol_ID symbol_id, int value)

Function: int marpa_g_symbol_is_valued ( Marpa_Grammar g, Marpa_Symbol_ID symbol_id)

These methods, respectively, set and query the “valued status” of a symbol. Once set to a value with the marpa_g_symbol_is_valued_set() method, the valued status of a symbol is “locked” at that value. It cannot thereafter be changed. Subsequent calls to marpa_g_symbol_is_valued_set() for the same sym_id will fail, leaving sym_id’s valued status unchanged, unless value is the same as the locked-in value.

Return value: On success, 1 if the symbol symbol_id is valued after the call, 0 if not. If the valued status is locked and value is different from the current status, -2. If value is not 0 or 1; or on other hard failure, -2.

32.2.3 Registering semantics in the valuator

By default, Libmarpa’s valuator objects assume that non-terminal symbols have no semantics. The archetypal application will need to register symbols that contain semantics. The primary method for doing this is marpa_v_symbol_is_valued(). Applications will typically register semantics by rule, and these applications will find the marpa_v_rule_is_valued() method more convenient.

Function: int marpa_v_symbol_is_valued_set ( Marpa_Value v, Marpa_Symbol_ID sym_id, int status )

Function: int marpa_v_symbol_is_valued ( Marpa_Value v, Marpa_Symbol_ID sym_id )

These methods, respectively, set and query the valued status of symbol sym_id. marpa_v_symbol_is_valued_set() will set the valued status to the value of its status argument. A valued status of 1 indicates that the symbol is valued. A valued status of 0 indicates that the symbol is unvalued. If the valued status is locked, an attempt to change to a status different from the current one will fail (error code MARPA_ERR_VALUED_IS_LOCKED).

Return value: On success, the valued status after the call. If value is not either 0 or 1, or on other hard failure, -2.

Function: int marpa_v_rule_is_valued_set ( Marpa_Value v, Marpa_Rule_ID rule_id, int status )

Function: int marpa_v_rule_is_valued ( Marpa_Value v, Marpa_Rule_ID rule_id )

These methods, respectively, set and query the valued status for the LHS symbol of rule rule_id. marpa_v_rule_is_valued_set() sets the valued status to the value of its status argument.

A valued status of 1 indicates that the symbol is valued. A valued status of 0 indicates that the symbol is unvalued. If the valued status is locked, an attempt to change it to a status different from the current one will fail with error code MARPA_ERR_VALUED_IS_LOCKED.

Rules have no valued status of their own. The valued status of a rule is always that of its LHS symbol. These methods are conveniences — they save the application the trouble of looking up the rule’s LHS.

Return value: On success, the valued status of the rule rule_id’s LHS symbol after the call. If value is not either 0 or 1, or on other hard failure, -2.

Function: int marpa_v_valued_force ( Marpa_Value v)

This methods locks the valued status of all symbols to 1, so that all symbols will always be valued. Hard fails if this is not possible, for example because one of the grammar’s symbols already is locked at a valued status of 0.

Return value: On success, a non-negative integer. On hard failure, returns -2, and sets the error code to an appropriate value, which will never be MARPA_ERR_NONE.

32.3 Cycles

A cycle is a case in a grammar where a string non-trvially derives itself. Marpa will parse grammars with cycles, but this is deprecated. Applications should treat cycles as mistakes on the part of the writer of the grammar, and not attempt to parse with grammars that contain cycles.

If Marpa is used to parse an input that causes a cycle, it will limit the cycle to a single loop, so that only a finite number of parse trees is returned. Again, parsing with cycles is deprecated.

For more on cycles, see

marpa_g_has_cycle();
marpa_g_rule_is_loop();
the discussion on the MARPA_ERR_GRAMMAR_HAS_CYCLE and MARPA_EVENT_LOOP_RULES events in marpa_g_precompute();
MARPA_EVENT_LOOP_RULES; and
MARPA_ERR_GRAMMAR_HAS_CYCLE.

33 History of the Marpa algorithm

This chapter is a quick summary of the most important events in Marpa’s development. My “timeline” of the major events in parsing theory has a much broader scope, and also includes more detail about Marpa’s development. See Parsing timeline.

1970: Jay Earley invents the algorithm that now bears his name See Earley 1970.
1991: Joop Leo describes a way to modify Earley’s algorithm so that it runs in O(n) time for all LR-regular grammars. See Leo 1991. LR-regular is a vast class of grammars, including all the LR(k) grammars, all grammars parseable with recursive descent, and regular expressions. LR-regular can safely be thought of as including all grammars in practical use today, and then some.
2002: Aycock and Horspool describe a way to do LR(0) precomputation for Earley’s algorithm. See Aycock and Horspool 2002. Their method makes Earley’s faster in most practical situations, but not all. In particular, right-recursion remains quadratic in the Aycock and Horspool algorithm. Worst case is no better than Earley’s. Leo is unaware of Aycock and Horspool’s work and Aycock and Horspool seem unaware of Leo.
2010: Marpa combines the Leo and Aycock-Horspool algorithms in the process making significant changes to both of them. See Marpa theory paper. The result preserves the best features of both and tackles the many remaining implementation issues.

34 Annotated bibliography

34.1 Aho and Ullman 1972

The Theory of Parsing, Translation and Compiling, Volume I: Parsing by Alfred Aho and Jeffrey Ullman (Prentice-Hall: Englewood Cliffs, New Jersey, 1972). I think this was the standard source for Earley’s algorithm for decades. It certainly was my standard source. The account of Earley’s algorithm is on pages 320-330.

34.2 Aycock and Horspool 2002

Marpa is based on ideas from John Aycock and R. Nigel Horspool’s “Practical Earley Parsing”, The Computer Journal, Vol. 45, No. 6, 2002, pp. 620-630. The idea of doing LR(0) precomputation for Earley’s general parsing algorithm (see Earley 1970), and Marpa’s approach to handling nullable symbols and rules, both came from this article.

The Aycock and Horspool paper summarizes Earley’s very nicely and is available on the web: http://www.cs.uvic.ca/~nigelh/Publications/PracticalEarleyParsing.pdf. Unlike Earley’s 1970 paper (see Earley 1970), Aycock and Horspool 2002 is not easy reading. I have been following this particular topic on and off for years and nonetheless found this paper very heavy going.

34.3 C89 standard

Libmarpa is written to conform to the C89 standard. This can be found online at https://web.archive.org/web/20200909074736/https://www.pdf-archive.com/2014/10/02/ansi-iso-9899-1990-1/ansi-iso-9899-1990-1.pdf.

34.4 C99 standard

The C99 standard clarifies the C89 standard in many respects. It can be found online at https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf.

34.5 Dominus 2005

Although my approach to parsing is not influenced by Mark Jason Dominus’s Higher Order Perl, Mark’s treatment of parsing is an excellent introduction to parsing, especially in a Perl context. His focus on just about every other technique except general BNF parsing is pretty much standard, and will help a beginner understand how unconventional Marpa’s approach is.

Both Mark’s Perl and his English are examples of good writing, and the book is dense with insights. Mark’s discussion on memoization in Chapter 3 is the best I’ve seen. I wish I’d bought his book earlier in my coding.

Mark’s book is available on-line. You can download chapter-by-chapter or the whole thing at once, and you can take your pick of his original sources or PDF, at http://hop.perl.plover.com/book/. A PDF of the parsing chapter is at http://hop.perl.plover.com/book/pdf/08Parsing.pdf.

34.6 Earley 1970

Of Jay Earley’s papers on his general parsing algorithm, the most readily available is “An efficient context-free parsing algorithm”, Communications of the Association for Computing Machinery, 13:2:94-102, 1970.

Ordinarily, I’d not bother pointing out 35-year old nits in a brilliant and historically important article. But more than a few people treat this article as not just the first word in Earley parsing, but the last as well. Many implementations of Earley’s algorithm come, directly and unaltered, from his paper. These implementers and their users need to be aware of two issues.

First, the recognition engine itself, as described, has a serious bug. There’s an easy fix, but one that greatly slows down an algorithm whose main problem, in its original form, was speed. This issue is well laid out by Aycock and Horspool in their article. See Aycock and Horspool 2002.

Second, according to Tomita there is a mistake in the parse tree representation. See page 153 of Grune and Jacobs 1990, page 210 of Grune and Jacobs 2008, and the bibliography entry for Earley 1970 in Grune and Jacobs 2008. In the printed edition of the 2008 bibliography, the entry is on page 578, and on the web (ftp://ftp.cs.vu.nl/pub/dick/PTAPG_2nd_Edition/CompleteList.pdf), it’s on pp. 583-584. My methods for producing parse results from Earley sets do not come from Earley 1970, so I am taking Tomita’s word on this one.

34.7 Grune and Jacobs 1990

Parsing Techniques: A Practical Guide, by Dick Grune and Ceriel Jacobs, (Ellis Horwood Limited: Chichester, West Sussex, England, 1990). This book is available on the Web: http://dickgrune.com/Books/PTAPG_1st_Edition/

34.8 Grune and Jacobs 2008

Parsing Techniques: A Practical Guide, by Dick Grune and Ceriel Jacobs, 2nd Edition. (Springer: New York NY, 2008). This is the most authoritative and comprehensive introduction to parsing I know of. In theory it requires no mathematics, only a programming background, but even so it is moderately difficult reading.

This is the second edition of Grune and Jacobs’s survey of parsing methods. See Grune and Jacobs 1990. The bibliography for this book is available in enlarged form on the web: ftp://ftp.cs.vu.nl/pub/dick/PTAPG_2nd_Edition/CompleteList.pdf.

34.9 Marpa theory paper

My writeup of the theory behind Marpa, with proofs of correctness and of my complexity claims, was first made public in 2013. It was updated in 2023, and can be found on arxiv.org (https://arxiv.org/abs/1910.08129).

34.10 Kegler 2023

This theory-oriented paper expands on the Marpa theory paper (see Marpa theory paper). It gives a detailed treatment of Marpa’s use of the Aycock-Horspool algorithm, and Marpa’s handling of nullables It was originally announced in 2023, and can be found on arxiv.org (https://arxiv.org/abs/2303.04093).

34.11 Parsing timeline

Far more popular than my Marpa theory paper is my Parsing: a timeline. This is a detailed history of parsing theory, and is available online: https://jeffreykegler.github.io/personal/timeline_v3.

34.12 Leo 1991

Marpa’s handling of right-recursion uses the ideas in Joop M.I.M. Leo’s “A General Context-Free Parsing Algorithm Running in Linear Time on Every LR(k) Grammar Without Using Lookahead”, Theoretical Computer Science, Vol. 82, No. 1, 1991, pp 165-176. This is a difficult paper. It is available online at http://www.sciencedirect.com/science/article/pii/030439759190180A, click the PDF icon at the top left.

34.13 Scott 2008

One of our most important data structures is what we call a “bocage”. Prof. Scott’s work preceded ours, and her SPPF structure is our bocage in all essential respects, so much so that her excellent writeup serves perfectly as documentation for the bocage: Scott, Elizabeth. “SPPF-style parsing from Earley recognisers.” Electronic Notes in Theoretical Computer Science 203.2 (2008): 53-67, https://dinhe.net/~aredridel/.notmine/PDFs/Parsing/SCOTT%2C%20Elizabeth%20-%20SPPF-Style%20Parsing%20From%20Earley%20Recognizers.pdf.

34.14 Wikipedia BNF article

Wikipedia’s article on Backus-Naur form is http://en.wikipedia.org/wiki/Backus-Naur_form. It’s a great place to start if you don’t know the basics of grammars and parsing. As Wikipedia points out, BNF might better be called Panini-Backus Form. The grammarian Panini gave a precise description of Sanskrit more than 23 centuries earlier in India using a similar notation.

35 Acknowledgements

Beggar that I am, I am even poor in thanks; but I thank you; and sure, dear friends, my thanks are too dear a halfpenny.

— Hamlet, Act 2, scene 2

This section is to thank those who have assisted us, not just with Libmarpa, but with the Marpa project as a whole. Many people have helped us in many ways, and it is risky to single out one of them. But Ron Savage was a very aggressive early adopter of Marpa, went on to maintain a Marpa web site and FAQ, and provided generous financial support.

Two other very early supporters were Peter Stuifzand and Ruslan Zakirov. Peter Stuifzand invented the “Stuifzand interface”, the prototype on which Marpa’s SLIF interface is based. Ruslan Zakirov started, and agreed to moderate, the “Marpa parser” mailing list. A list of the other contributions to Marpa that Ron, Peter, and Ruslan Z. made over the years would be very long.

Larry Wall provided wise and experienced advice which saved us much trouble. Larry’s openness to new ideas has been a major encouragement, and his insight into the relationship between “natural language” and computer language has been a major influence. Randal Schwartz, Allison Randal and Patrick Michaud were also generous with their very valuable time.

At perlmonks.org, answers from chromatic, Corion, dragonchild, jdporter, samtregar and Juerd were helpful. In writing an early, “pure Perl” version of Marpa, I benefited from studying the work of Francois Desarmenien (Parse::Yapp), Damian Conway (Parse::RecDescent) and Graham Barr (Scalar::Util). Adam Kennedy patiently instructed me in module writing, both on the finer points and on issues about which I really should have known better.

Jean-Damien Durand’s assistance included several ambitious Marpa applications, as well as the Windows port of Marpa. Deyan Ginev provided advice on LaTeX and on deeper matters which proved essential. Lenz Moritz quietly and effectively maintained our IRC channel. Andrew Rodland stood in for Jeffrey Kegler as the face of Marpa when it was needed, and his TAP parser, which used one Marpa grammar that fed another, was a major inspiration for the SLIF. Ruslan Shvedov provided many hours of assistance, including contributing linguistic insights into details of code testing, creating a test suite for Libmarpa, making major contributions to the Perl test suite, helping with my implementation of ASF’s, and assisting me in writing documentation. Luc St-Louis moderated the Marpa IRC channel, saving Jeffrey time, and the channel’s other users from having to put up with Jeffrey who, as a moderator, could be too heavy-handed. An anonymous member of the Hoon community was generous financially.

Additional help came from Dave Abrahams, Mohammad S Anwar, Lukas Atkinson, Peter Blackson, Domingo Alvarez Duarte, Anton Dyudin, B. Fraser, Zaki Mughal, Omar Roth, Arsen Shnurkov, Aria Stewart, Flaviu Tomas, and David Whitten.

Finally, my thanks to all those who participated in the discussions on perlmonk.org, on the “Marpa parser” mailing list, and on the #marpa IRC channel. It is hard to describe how important informed and constructive feedback is to a lone laborer on a complex and large project like Marpa, and I greatly appreciate all those who participated.

35.1 An apology

We feared that these acknowledgements would become an exhibition of our negligence and ingratitude, and at least to some degree this has turned out to be the case. There were plans for things to be better. We took care to thank those who helped us in Marpa’s IRC channel, thinking that we could use the channel’s backlog as a source for these notices.

The European Union (EU) had other ideas. In 2018, the EU passed the General Data Protection Regulation (GDPR), which suddenly brought our plans to nought. Under the GDPR, logs like that for Marpa’s IRC channel could not possibly be made conformant in a way that kept their administrators safe from legal action. In theory this applied to similar, highly-profitable databases maintained by large corporations, as much as it did to the volunteers generously maintaining the backlog for our IRC channel. In practice large corporations have been able to largely neutralize the GDPR, deftly parrying most enforcement, with their worst possible outcome being a cost-of-doing-business fine. For the typical volunteer, however, even winning a legal action is ruinous, given the costs.

Whether or not some were aware of GDPR’s highly discriminatory effects when it was being drafted, the realization of the GDPR’s practical implications for volunteer-driven open source effort came to our community very abruptly, and after it was already in effect. Our backlogging was done in EU jurisdiction, and no measure to save our IRC backlogs could be taken without exposing innocent and generous volunteers to almost certainly disastrous legal action. My notes for these acknkowledgements needed to be cobbled together from other sources.

Index of terms

This index is of terms that are used in a special sense in this document. Not every use of these terms is indexed — only those uses that are in some way defining.

Jump to:	@ A B C D E F G H I L M N O P R S T U V W

	Index Entry	Section

@
	@-notation for nulled instance:	Trees
	@-notation for rule instance:	Trees
	@-notation for sequence instance:	Trees
	@-notation for symbol instance:	Trees

A
	accessible rule:	Useless rules
	accessible symbol:	Useless rules
	activated event subtype:	Events overview
	active (valuator):	Valuator constructor
	active parse:	Exhaustion
	advanced input model:	Advanced input models
	advanced models of input:	The basic models of input
	ambiguous:	Ambiguity
	ancestor:	Trees
	ancestor (object):	Time objects
	ancestor, proper:	Trees
	ancestor, trivial:	Trees
	ancestry-recoverable hard failure:	Ancestry-recoverable hard failure
	application:	Miscellaneous definitions
	application behavior:	Application and diagnostic behavior
	application environment:	Miscellaneous definitions
	application environment limit:	Miscellaneous definitions
	applications, exhaustion-hating:	Exhaustion
	applications, exhaustion-loving:	Exhaustion
	archetypal Libmarpa application:	About the overviews
	at (a location, wrt a symbol):	Trees

B
	base (object):	Time objects
	base grammar (of a time object):	Time objects
	basic models of input:	The basic models of input
	behavior, application:	Application and diagnostic behavior
	behavior, diagnostic:	Application and diagnostic behavior
	BNF:	Rules
	BNF node:	Trees
	boolean:	Miscellaneous definitions
	boolean value:	Miscellaneous definitions
	bottom-up traversal:	Traversal

C
	child (of a node):	Trees
	child object (of a time object):	Time objects
	children (of a node):	Trees
	codepoint-per-earleme input model:	The codepoint-per-earleme model
	completed dotted rule:	Earley items
	completed Earley item:	Earley items
	completion (dotted rule):	Earley items
	completion (Earley item):	Earley items
	context-free grammar:	Rules
	counted symbol:	Sequence methods
	current location (in an Earley item):	Earley items
	current node (of a progress report traverser):	Progress report traverser
	cycle:	Recursion and cycles
	cycle-free:	Recursion and cycles

D
	declared event subtype:	Events overview
	dense variable-length input model:	The dense variable-length token model
	depth-first traversal:	Traversal
	derivation:	Derivations
	derivation step:	Derivations
	derivation, leftmost:	An example tree
	derivation, rightmost:	An example tree
	derives:	Derivations
	derives, directly:	Derivations
	descendant:	Trees
	descendant (object):	Time objects
	descendant, proper:	Trees
	descendant, trivial:	Trees
	diagnostic behavior:	Application and diagnostic behavior
	directly derives:	Derivations
	dot position:	Earley items
	dotted rule:	Earley items

E
	earleme:	Earlemes and Earley set IDs
	earleme, current:	The current earleme
	earleme, empty:	The fully general input model
	earleme, furthest:	The furthest earleme
	earleme, latest:	The latest earleme
	Earley item:	Earley items
	Earley item warning threshold:	Other parse status methods
	Earley set ID:	Earlemes and Earley set IDs
	Earley set, latest:	The latest Earley set
	empty earleme:	The fully general input model
	empty rule:	Rules
	empty rule:	Rules
	empty sentence:	Nulling
	empty string:	Nulling
	end of file:	Determining EOP from EOI
	end of input:	Determining EOP from EOI
	end of parse:	End of parse
	end of string:	Determining EOP from EOI
	end sentinel:	Progress report traverser
	end sentinel node:	Progress report traverser
	environment limit:	Miscellaneous definitions
	environment, application:	Miscellaneous definitions
	EOF:	Determining EOP from EOI
	EOI:	Determining EOP from EOI
	EOP:	End of parse
	EOS:	Determining EOP from EOI
	evaluate (a parse run):	Semantics terms
	evaluate (a tree):	Semantics terms
	evaluate (an input string):	Semantics terms
	event queue:	Events overview
	event, global:	Events overview
	event, per-symbol:	Events overview
	event-safe (method):	Events overview
	event-triggering (method):	Events overview
	events, recognizer per-symbol:	Recognizer per-symbol events
	exhausted parse:	Exhaustion
	exhaustion on failure:	Exhaustion
	exhaustion on success:	Exhaustion
	exhaustion-hating applications:	Exhaustion
	exhaustion-loving applications:	Exhaustion
	expected symbol:	Earley items
	explicit event type:	Events overview

F
	failure:	User non-conformity to specified behavior
	failure, ancestry-recoverable hard:	Ancestry-recoverable hard failure
	failure, fully recoverable hard:	Fully recoverable hard failure
	failure, hard:	Classifying failure
	failure, irrecoverable hard:	Irrecoverable hard failure
	failure, Libmarpa application programming:	User non-conformity to specified behavior
	failure, library-recoverable hard:	Library-recoverable hard failure
	failure, memory allocation:	Memory allocation failure
	failure, partially recoverable hard:	Partially recoverable hard failure
	failure, soft:	Classifying failure
	failure, soft:	Soft failure
	failure, undetected:	Undetected failure
	forest:	Trees
	frozen ordering:	Freezing the ordering
	fully recoverable hard failure:	Fully recoverable hard failure

G
	global event:	Events overview
	global event:	Events overview
	grammar:	Parsing theory preliminaries

H
	hard failure:	Classifying failure
	hard failure, ancestry-recoverable:	Ancestry-recoverable hard failure
	hard failure, fully recoverable:	Fully recoverable hard failure
	hard failure, irrecoverable:	Irrecoverable hard failure
	hard failure, library-recoverable:	Library-recoverable hard failure
	hard failure, partially recoverable:	Partially recoverable hard failure

I
	ID (of an Earley set):	Earlemes and Earley set IDs
	iff:	Miscellaneous definitions
	immediately previous (to a `marpa_r_earleme_complete()` call):	The standard model of input
	implicit event type:	Events overview
	in use (LHS terminals):	Overview of LHS terminals
	inaccessible rule:	Useless rules
	inaccessible symbol:	Useless rules
	inactive (valuator):	Stepping through the valuator
	indeterminate value:	Miscellaneous definitions
	indirect:	Derivations
	infinitely ambiguous:	Recursion and cycles
	input:	Parsing theory preliminaries
	input:	Parsing theory preliminaries
	input model, advanced:	Advanced input models
	input model, dense variable-length:	The dense variable-length token model
	input model, sparse variable-length:	The fully general input model
	input model, variable-length token:	Advanced input models
	input sentence:	Parsing theory preliminaries
	input text:	Parsing theory preliminaries
	input, advanced models of:	The basic models of input
	input, basic models of:	The basic models of input
	instance:	Trees
	instance (of a symbol):	Trees
	instance (of a symbol):	Trees
	irrecoverable hard failure:	Irrecoverable hard failure
	iterator, parse tree:	Tree overview

L
	labeled ordered tree:	Trees
	labeled ordered tree node:	Trees
	language:	Parsing theory preliminaries
	language:	Derivations
	latest Earley set:	The latest Earley set
	leaf (node):	Trees
	leaf node:	Trees
	left hand side:	Rules
	left-recursive:	Recursion and cycles
	leftmost derivation:	An example tree
	length:	Parsing theory preliminaries
	length:	Derivations
	length (of a node):	Trees
	length, rule:	Rules
	lexeme:	Parsing theory preliminaries
	lexer:	Parsing theory preliminaries
	lexical analysis:	Parsing theory preliminaries
	lexical analyzer:	Parsing theory preliminaries
	lexing:	Parsing theory preliminaries
	LHS:	Rules
	LHS (of a rule node):	Trees
	LHS terminals in use:	Overview of LHS terminals
	Libmarpa application programming failure:	User non-conformity to specified behavior
	Libmarpa application programming success:	User non-conformity to specified behavior
	Libmarpa application, archetypal:	About the overviews
	library-recoverable hard failure:	Library-recoverable hard failure
	limit, application environment:	Miscellaneous definitions
	limit, environment:	Miscellaneous definitions
	locked terminal status:	LHS terminal methods
	locked value status (of a symbol):	What unvalued symbols were

M
	matches:	Derivations
	matches:	Derivations
	max(x,y):	Miscellaneous definitions
	memory allocation failure:	Memory allocation failure
	method:	Miscellaneous definitions
	models of input, advanced:	The basic models of input
	models of input, basic:	The basic models of input

N
	node (of a tree):	Trees
	node length:	Trees
	node, current (of a progress report traverser):	Progress report traverser
	node, labeled ordered tree:	Trees
	node, root:	Trees
	node, start:	Trees
	non-empty:	Nulling
	non-nullable:	Nulling
	non-nullable:	Nulling
	non-nulling:	Nulling
	non-nulling:	Nulling
	non-trivial:	Derivations
	null derivation:	Nulling
	null parse:	Nulling
	nullable rule:	Nulling
	nullable symbol:	Nulling
	nulled (of a symbol):	Trees
	nulled node:	Trees
	nulled symbol instance:	Trees
	nulling node:	Trees
	nulling rule:	Nulling
	nulling symbol:	Nulling

O
	one-codepoint-per-earleme input model:	The codepoint-per-earleme model
	ordering, frozen:	Freezing the ordering
	origin (in an Earley item):	Earley items
	our:	Miscellaneous definitions

P
	parent (wrt a node):	Trees
	parent object (of a time object):	Time objects
	parse (aka parse forest):	Trees
	parse (aka parse run):	Trees
	parse (aka parse tree):	Trees
	parse (forest):	Trees
	parse exhaustion on failure:	Exhaustion
	parse exhaustion on success:	Exhaustion
	parse forest:	Trees
	parse tree:	Tree overview
	parse tree iterator:	Tree overview
	parse, active:	Exhaustion
	parse, exhausted:	Exhaustion
	parser:	Parsing theory preliminaries
	partially recoverable hard failure:	Partially recoverable hard failure
	pause (a parent tree iterator):	Valuator constructor
	per-symbol event:	Events overview
	per-symbol event:	Events overview
	postdot symbol (of a dotted rule):	Earley items
	postdot symbol (of an Earley item):	Earley items
	postorder traversal:	Traversal
	predicted dotted rule:	Earley items
	predicted Earley item:	Earley items
	prediction (dotted rule):	Earley items
	prediction (Earley item):	Earley items
	previous (to a `marpa_r_earleme_complete()` call), immediately:	The standard model of input
	produces:	Derivations
	productive rule:	Useless rules
	productive symbol:	Useless rules
	proper ancestor:	Trees
	proper descendant:	Trees

R
	raw input:	Parsing theory preliminaries
	reachable rule:	Useless rules
	reachable symbol:	Useless rules
	recognizer:	Parsing theory preliminaries
	recognizer per-symbol events:	Recognizer per-symbol events
	recursive:	Recursion and cycles
	RHS:	Rules
	RHS (of a rule node):	Trees
	right hand side:	Rules
	right-recursive:	Recursion and cycles
	rightmost derivation:	An example tree
	root node:	Trees
	Ruby Slippers:	Recognizer life cycle mutators
	rule:	Parsing theory preliminaries
	rule creation methods:	Rule methods
	rule creation methods:	Sequence methods
	rule length:	Rules
	rule node:	Trees
	rule, accessible:	Useless rules
	rule, inaccessible:	Useless rules
	rule, nullable:	Nulling
	rule, nulling:	Nulling
	rule, productive:	Useless rules
	rule, reachable:	Useless rules
	rule, unproductive:	Useless rules
	rule, unreachable:	Useless rules
	rule, useless:	Useless rules

S
	scanner:	Parsing theory preliminaries
	scanning:	Parsing theory preliminaries
	semantics:	Parsing theory preliminaries
	semantics:	Semantics terms
	sentence:	Parsing theory preliminaries
	sentential form:	Parsing theory preliminaries
	sequence node:	Trees
	sequence rule:	Rules
	soft failure:	Classifying failure
	soft failure:	Soft failure
	sparse variable-length input model:	The fully general input model
	start node:	Trees
	start sentinel:	Progress report traverser
	start sentinel node:	Progress report traverser
	step:	Derivations
	step (of a valuator):	Stepping through the valuator
	step type:	Stepping through the valuator
	step type, valuator:	Stepping through the valuator
	string:	Parsing theory preliminaries
	subtree:	Trees
	success:	User non-conformity to specified behavior
	success, Libmarpa application programming:	User non-conformity to specified behavior
	successful:	Derivations
	symbol:	Parsing theory preliminaries
	symbol instance, nulled:	Trees
	symbol string:	Parsing theory preliminaries
	symbol string input:	Parsing theory preliminaries
	symbol string input:	Parsing theory preliminaries
	symbol, accessible:	Useless rules
	symbol, counted:	Sequence methods
	symbol, inaccessible:	Useless rules
	symbol, productive:	Useless rules
	symbol, reachable:	Useless rules
	symbol, unproductive:	Useless rules
	symbol, unreachable:	Useless rules
	symbol, unvalued:	What unvalued symbols were
	symbol, useless:	Useless rules

T
	terminal node:	Trees
	terminal status (of a symbol):	LHS terminal methods
	token node:	Trees
	token stream:	Parsing theory preliminaries
	token-per-earleme input models:	The codepoint-per-earleme model
	tokens:	Parsing theory preliminaries
	trap representations:	Trap representations
	trap value:	Miscellaneous definitions
	traverse:	Traversal
	tree:	Trees
	tree:	Tree overview
	tree node:	Trees
	tree node, labeled ordered:	Trees
	tree, labeled ordered:	Trees
	trivial ancestor:	Trees
	trivial derivation:	Derivations
	trivial descendant:	Trees

U
	unambiguous:	Ambiguity
	undefined behavior:	Miscellaneous definitions
	undetected failure:	Undetected failure
	unproductive rule:	Useless rules
	unproductive symbol:	Useless rules
	unreachable:	Useless rules
	unreachable rule:	Useless rules
	unreachable symbol:	Useless rules
	unspecified behavior:	Miscellaneous definitions
	unspecified value:	Miscellaneous definitions
	unvalued symbol:	What unvalued symbols were
	us:	Miscellaneous definitions
	useless rule:	Useless rules
	useless symbol:	Useless rules
	user:	Miscellaneous definitions

V
	valuator:	Value overview
	valuator step:	Stepping through the valuator
	valuator step type:	Stepping through the valuator
	value:	Parsing theory preliminaries
	value:	Semantics terms
	value status, locked (of a symbol):	What unvalued symbols were
	value, boolean:	Miscellaneous definitions
	variable-length input model, dense:	The dense variable-length token model
	variable-length input model, sparse:	The fully general input model
	variable-length token input model:	Advanced input models

W
	we:	Miscellaneous definitions

Jump to:	@ A B C D E F G H I L M N O P R S T U V W

Footnotes

(1)

Those interested in details of notation may notice that we include current location in the Earley item tuple, contrary to the tradition. We develop our notation more formally, and explain our reasons for the deviation from tradition, on pages 5-7 of https://arxiv.org/abs/2303.04093v1.

Libmarpa 11.0.10

Table of Contents

Libmarpa: The Marpa low-level library

1 License

2 Updates

3 About this document

3.1 How to read this document

3.2 Prerequisites

4 Overview of Libmarpa

5 Terms, definitions and notation

5.1 Miscellaneous definitions

5.2 Parsing theory preliminaries

5.3 Rules

5.4 Derivations

5.5 Nulling

5.6 Useless rules

5.7 Recursion and cycles

5.8 Trees

5.9 An example tree

5.10 Traversal

5.11 Semantics terms

5.12 Ambiguity

5.13 Earley items

5.14 Application and diagnostic behavior

6 Architecture

6.1 Major objects

6.2 Time objects

6.3 Reference counting

6.4 Numbered objects

7 Input

7.1 Parse location

7.1.1 The traditional input model

7.1.2 Earlemes and Earley set IDs

7.1.3 The latest Earley set

7.1.4 The latest earleme

7.1.5 The current earleme

7.1.6 The furthest earleme

7.2 The basic models of input

7.2.1 The standard model of input

7.2.2 Ambiguous input

7.3 Terminals

8 Exhaustion

9 End of parse

9.1 Determining EOP from EOI

9.2 Determining EOP from exhaustion

9.3 Other ways of determining EOP

10 Semantics

11 Threads

12 Sequence rules

13 Nullability

13.1 Nullability in the valuator

13.2 Assigning semantics to nulled symbols

13.3 Evaluating nulled symbols

13.4 Example of nulled symbol

13.5 Duplicate nulled nodes

14 Failure

14.1 Libmarpa’s approach to failure

14.2 User non-conformity to specified behavior

14.3 Classifying failure

14.4 Memory allocation failure

14.5 Undetected failure

14.6 Irrecoverable hard failure

14.7 Partially recoverable hard failure

14.8 Library-recoverable hard failure

14.9 Ancestry-recoverable hard failure

14.10 Fully recoverable hard failure

14.11 Soft failure

14.12 Error codes

15 Introduction to the method descriptions

15.1 About the overviews

15.2 Naming conventions

15.3 Return values

15.4 How to read the method descriptions

16 Static methods

17 Configuration methods

18 Grammar methods

18.1 Overview

18.2 Creating a new grammar

18.3 Tracking the reference count of the grammar

18.4 Symbol methods