Libmarpa 8.6.0

Table of Contents

Libmarpa: The Marpa low-level library

This manual (14 July 2017) is for Libmarpa 8.6.0.

Copyright © 2015 Jeffrey Kegler.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


1 About this document


1.1 How to read this document

This is essentially a reference document, but its early chapters lay out concepts essential to the others. Readers will usually want to read the chapters up and including Introduction to the external interface in order. Otherwise, they should follow their interests.


1.2 Prerequisites

This document is very far from self-contained. It assumes the following:


1.3 Parsing theory

This document assumes some acquaintance with parsing theory. The reader’s level of knowledge is probably adequate if he can answer the following questions, either immediately or after a little reflection.


2 About Libmarpa

Libmarpa implements the Marpa parsing algorithm. Marpa is named after the legendary 11th century Tibetan translator, Marpa Lotsawa. In creating Marpa, I depended heavily on previous work by Jay Earley, Joop Leo, John Aycock and Nigel Horspool.

Libmarpa implements the entire Marpa algorithm. This library does the necessary grammar preprocessing, recognizes the input, and produces parse trees. It also supports the ordering, iteration and evaluation of the parse trees.

Libmarpa is very low-level. For example, it has no strings. Rules, symbols, and token values are all represented by integers. This, of course, will not suffice for many applications. Users will very often want names for the symbols, non-integer values for tokens, or both. Typically, applications will use arrays to translate Libmarpa’s integer ID’s to strings or other values as required.

Libmarpa also does not implement most of the semantics. Libmarpa does have an evaluator (called a “valuator”), but it does not manipulate the stack directly. Instead, Libmarpa, based on its traversal of the parse tree, passes optimized step by step stack manipulation instructions to the upper layer. These instructions indicate the token or rule involved, and the proper location for the true token value or the result of the rule evaluation. For rule evaluations, the instructions include the stack location of the arguments.

Marpa requires most semantics to be implemented in the application. This allows the application total flexibility. It also puts the application is in a much better position to prevent errors, to catch errors at runtime or, failing all else, to successfully debug the logic.


3 Architecture


3.1 Major objects

The classes of Libmarpa’s object system fall into two types: major and numbered. These are the Libmarpa’s major classes, in sequence.

The major objects have one letter abbreviations, which are used frequently. These are, in the standard sequence,


3.2 Time objects

All of Libmarpa’s major classes, except the configuration class, are “time” classes. Except for objects in the grammar class, all time objects are created from another time object. Each time object is created from a time object of the class before it in the sequence. A recognizer cannot be created without a precomputed grammar; a bocage cannot be created without a recognizer; and so on.

When one time object is used to create a second time object, the first time object is the parent object and the second time object is the child object. For example, when a bocage is created from a recognizer, the recognizer is the parent object, and the bocage is the child object.

Grammars have no parent object. Every other time object has exactly one parent object. Value objects have no child objects. All other time objects can have any number of children, from zero up to a number determined by memory or some other machine-determined limit.

Every time object has a base grammar. A grammar object is its own base grammar. The base grammar of a recognizer is the grammar that it was created with. The base grammar of any other time object is the base grammar of its parent object. For example, the base grammar of a bocage is the base grammar of the recognizer that it was created with.


3.3 Reference counting

Every object in a “time” class has its own, distinct, lifetime, which is controlled by the object’s reference count. Reference counting follows the usual practice. Contexts which take a share of the “ownership” of an object increase the reference count by 1. When a context relinquishes its share of the ownership of an object, it decreases the reference count by 1.

Each class of time object has a “ref” and an “unref” method, to be used by those contexts which need to explicitly increment and decrement the reference count. For example, the “ref” method for the grammar class is marpa_g_ref() and the “unref” method for the grammar class is marpa_g_unref().

Time objects do not have explicit destructors. When the reference count of a time object reaches 0, that time object is destroyed.

Much of the necessary reference counting is performed automatically. The context calling the constructor of a time object does not need to explicitly increase the reference count, because Libmarpa time objects are always created with a reference count of 1.

Child objects “own” their parents, and when a child object is successfully created, the reference count of its parent object is automatically incremented to reflect this. When a child object is destroyed, it automatically decrements the reference count of its parent.

In a typical application, a calling context needs only to remember to “unref” each time object that it creates, once it is finished with that time object. All other reference decrements and increments are taken care of automatically. The typical application never needs to explicitly call one of the “ref” methods.

More complex applications may find it convenient to have one or more contexts share ownership of objects created in another context. These more complex situations are the only cases in which the “ref” methods will be needed.


3.4 Numbered objects

In addition to its major, “time” objects, Libmarpa also has numbered objects. Numbered objects do not have lifetimes of their own. Every numbered object belongs to a time object, and is destroyed with it. Rules and symbols are numbered objects. Tokens values are another class of numbered objects.


4 Input


4.1 Earlemes


4.1.1 The traditional model

In traditional Earley parsers, the concept of location is very simple. Locations are numbered from 0 to n, where n is the length of the input. Every location has an Earley set, and vice versa. Location 0 is the start location. Every location after the start location has exactly one input token associated with it.

Some applications do not fit this traditional input model — natural language processing requires ambiguous tokens, for example. Libmarpa allows a wide variety of alternative input models.

This document assumes that the reader knows the concepts behind Libmarpa’s alternative input models, either from the documentation of a higher level interface, such as Marpa::XS, Marpa::R2, or Marpa::R3, or from Marpa’s theory document.

As a reminder, in Libmarpa a location is called a earleme. The number of an Earley set is the ID of the Earley set, or its ordinal. In the traditional model, the ordinal of an Earley set and its earleme are always exactly the same, but in Libmarpa they will be different.


4.1.2 The earleme variables

The important earleme variables are the current earleme, the furthest earleme and the latest earleme. The current earleme is the earleme that Libmarpa is currently working on. More specifically, it is the one at which new tokens will start. Since tokens are never zero length, a new token will always end after the current earleme. The current earleme is initially earleme 0. Every call to marpa_r_earleme_complete() advances the current earleme by 1.

The furthest earleme is the highest numbered (and therefore “furthest”) earleme at which a token ends. The furthest earleme is initially earleme 0. With every call to marpa_r_alternative(), the end of the token it adds is calculated. A token ends at the earleme location current+length, where current is the current earleme, and length is the length of the newly added token. After a call to marpa_r_alternative(), the furthest earleme is its value before the call, or current+length, whichever is greater.

The latest earleme is the earleme of the latest Earley set. The latest Earley set is the last Earley set completed. This is always the highest numbered Earley set. If there is an Earley set at the current earleme, it is the latest Earley set and the latest earleme is equal to the current earleme. There is never an Earley set after the current earleme.

After every call to the marpa_r_earleme_complete() method that adds a token, the value of the latest earleme is same as the value of the current earleme. After every call to the marpa_r_earleme_complete() method that does not add a token, the value of the latest earleme is unchanged from its value before the call.


4.1.3 The significances of the earleme variables

The current earleme tracks the advance of the recognizer through the input. Input tokens always start at the current earleme. An application can advance past the current earleme, by calling marpa_r_earleme_complete(), which increments the current earleme by 1. After initialization, marpa_r_earleme_complete() is the only way to manipulate the value of the current earleme.

The furthest earleme tracks how “far out” tokens can be found. In the standard input model, calling marpa_r_earleme_complete() after each marpa_r_alternative() call is sufficient to process all inputs, and the furthest earleme’s value can be typically be ignored. In alternative input models, if tokens have lengths greater than 1, calling marpa_r_earleme_complete() once after the last token is read may not be enough to ensure that all tokens have been processed. To ensure that all tokens have been processed, an application must advance the current earleme by calling marpa_r_earleme_complete(), until the current earleme is equal to the furthest earleme.

The lastest earleme is the earleme of the last Earley set. The latest earleme is different from the current earleme if and only if there is no Earley set at the current earleme. A different end of parsing can be specified, but by default, parsing is of the input in the range from earleme 0 to the latest earleme.


4.1.4 The initial earleme settings

All input models have the same initial values. Initially the current, latest and furthest earleme are always earleme 0.

Understanding the settings of current, latest and furthest earleme is crucial to working with advanced input models, and for this reason the next sections will go through the possibilities carefully. The presentation will start with the most traditional and restrictive models. It will proceed to less restrictive models.


4.1.5 The standard model of input

In the standard model of input, calls to marpa_r_alternative() and marpa_r_earleme_complete() are made in pairs. There is, first, exactly one call to marpa_r_alternative(), and it is for a token with length 1. Following it must be a call to marpa_r_earleme_complete(). For an input of length n, there will be exactly n such paired calls.

Suppose, in the standard model that, for a call to marpa_r_alternative(), the following is true:

Because this is the standard model, this means that we also know that

In that case, after the call to marpa_r_alternative(), the following will be true:

Suppose, in the standard model that, for a call to marpa_r_earleme_complete(), the following is true:

Because this is the standard model, this means that we also know that

In that case, after the call to marpa_r_earleme_complete(), the current, latest and furthest earlemes will be the same. Specifically, the following will be true:


4.1.6 Ambiguous input

As a first loosening of the standard model, we no longer require calls to marpa_r_alternative() to be paired with calls to marpa_r_earleme_complete(). Instead, we allow a series of one or more calls to marpa_r_alternative() before each call to marpa_r_earleme_complete(). We still require that there be at least one call to marpa_r_alternative() before each call to marpa_r_earleme_complete(), and we still require that all tokens have a length of 1.

In the ambiguous input model, the behavior of the current, latest and furthest earlemes are exactly as described for the standard model, with one minor exception: Where the current earleme is c, and a call to marpa_r_alternative() is not the first of a series, the value of the furthest earleme before the call will be c+1.


4.1.7 Variable length tokens

Our next loosening of the restrictions is to allow variable length tokens. That is, instead of requiring that all tokens be of length 1, we allow tokens to be of length 1 or longer. This does change the behavior of the earleme variables.

Suppose, in the variable length token model that, for a call to marpa_r_alternative(), the following is true:

In that case, after the call to marpa_r_alternative(), the following will be true:

Suppose, in the variable length token model that, for a call to marpa_r_earleme_complete(), the following is true:

In that case, after the call to marpa_r_earleme_complete(), the following will be true:


4.1.8 The generalized model

To fully generalize the input model, we now need to remove only one restriction. We now allow empty earlemes — earlemes with no tokens and no Earley set. For the generalized input model, the effect on the earleme variables of a call to marpa_r_alternative() is exactly the same as it is for the variable length input model. The effect on the earleme variables of a call to to marpa_r_earleme_complete() depends on whether or not that call creates an empty earleme. A call to marpa_r_earleme_complete() creates an empty earleme if and only if it falls into one of these two cases:

Suppose, in the generalized input model that, for a call to marpa_r_earleme_complete() that creates an empty earleme, the following is true:

In that case, after the call to marpa_r_earleme_complete(), the following will be true:

Suppose, in the generalized input model that, for a call to marpa_r_earleme_complete() that does not create an empty earleme, the following is true:

In that case, after the call to marpa_r_earleme_complete(), the following will be true:


4.1.9 General rules for the earleme variables

At this point, the most generalized input model has been introduced. Here are some useful facts about the earleme variables that will always be the case, no matter which of the input models is in use.

If the parser is not exhausted, the following will always be true, no matter which of the input models is in use.

In an exhausted parser, the following will always be true, no matter which of the input models is in use.


4.2 Terminals

A terminal symbol is a symbol which may appear in the input. Traditionally, all LHS symbols, as well as the start symbol, must be non-terminals. This is Marpa’s behavior, by default.

Marpa allows the user to eliminate the distinction between terminals and non-terminals. In this, it differs from traditional parsers. Libmarpa can arrange for a terminal to appear on the LHS of one or more rules, or for a terminal to be the start symbol. However, since terminals can never be zero length, it is a logical contradiction for a nulling symbol to also be a terminal and Marpa does not allow it.

Token values are int’s. Libmarpa does nothing with token values except accept them from the application and return them during parse evaluation.


5 Semantics

Libmarpa handling of semantics is unusual. Most semantics are left up to the application, but Libmarpa guides them. Specifically, the application is expected to maintain the evaluation stack. Libmarpa’s valuator provides instructions on how to handle the stack. Libmarpa’s stack handling instructions are called “steps”. For example, a Libmarpa step might tell the application that the value of a token needs to go into a certain stack position. Or a Libmarpa step might tell the application that a rule is to be evaluated. For rule evalution, Libmarpa will tell the application where the operands are to be found, and where the result must go.

The detailed discussion of Libmarpa’s handling of semantics is in the reference chapters of this document, under the appropriate methods and classes. The most extensive discussion of the semantics is in the section that deals with the methods of the value time class. See Value methods.


6 Threads

Libmarpa is thread-safe, given circumstances as described below. The Libmarpa methods are not reentrant.

Libmarpa is C89-compliant. It uses no global data, and calls only the routines that are defined in the C89 standard and that can be made thread-safe. In most modern implementations, the default C89 implementation is thread-safe to the extent possible. But the C89 standard does not require thread-safety, and even most modern environments allow the user to turn thread safety off. To be thread-safe, Libmarpa must be compiled and linked in an environment that provides thread-safety.

While Libmarpa can be used safely across multiple threads, a Libmarpa grammar cannot be. Further, a Libmarpa time object can only be used safely in the same thread as its base grammar. This is because all time objects with the same base grammar share data from that base grammar.

To work around this limitation, the same grammar definition can be used to a create a new Libmarpa grammar time object in each thread. If there is sufficient interest, future versions of Libmarpa could allow thread-safe cloning of grammars and other time objects.


7 Fatal Errors

Libmarpa avoids fatal errors, leaving that decision up to the application, with one exception. Currently, if malloc fails to allocate memory, Libmarpa terminates the program with a fatal error.

While this is in keeping with current practice, future versions of Libmarpa are likely to both allow an alternative memory allocator to be specified, and to allow the user to specifier a handler to be called when an out-of-memory condition occurs.


8 Introduction to the external interface

The following chapters describe Libmarpa’s external interface in detail.


8.1 About the overviews

The reference sections for each major class begin with an overview. These sections name the most important Libmarpa methods in the order in which they are typically used, and very briefly describe their purpose. The overviews are intended as “cheat sheets”.

The overview sections often speak of an “archetypal” application. The archetypal Libmarpa application implements a complete logic flow, starting with the creation of a grammar, and proceeding all the way to the return of the final result from a value object. In the archetypal Libmarpa application, the grammar, input and semantics are all small but non-trivial.


8.2 Return values

Return values are discussed method by method, but some general practices are worth mentioning. For methods that return an integer, Libmarpa usually reserves -1 for special purposes, such as indicating loop termination in an iterator. In Libmarpa, methods usually indicate failure by returning -2. Any result greater than -2 usually indicates success.

If a method returns an pointer value, NULL usually indicates failure. Any other result usually indicates success.

On failure, a Libmarpa method always sets the error code. On success, a Libmarpa method may take one of three actions:

Which of the three actions the method takes on success should be regarded as unspecified, unless stated in the description of the method in this document. If a method sets an informational error code on success, that will always be stated in its description, and the description will specify, for each informational error code, the specific error code value, the circumstances under which it is set, and what return values will accompany it.

Informational error codes are set because some applications may find them convenient. Typically, informational error codes are redundant information, which may be ignored.

A method can have many reasons for failing, and many of the reasons for failure are common to large number of methods. Full descriptions of the error codes that are returned by the external methods are given in their own section. See External error codes.


8.3 Naming conventions

Methods in Libmarpa follow a strict naming convention. All methods have a name beginning with marpa_, if they are part of the external interface. If an external method is not a static method, its name is prefixed with one of marpa_c_, marpa_g_, marpa_r_, marpa_b_, marpa_o_, marpa_t_ or marpa_v_, where the single letter between underscores is one of the Libmarpa major class abbreviations. The letter indicates which class the method belongs to.

Methods that are exported, but that are part of the internal interface, begin with _marpa_. Methods that are part of the internal interface (often called “internal methods”) are subject to change and are intended for use only by Libmarpa’s developers.

Libmarpa reserves the marpa_ and _marpa_ prefixes for itself, with all their capitalization variants. All Libmarpa names visible outside the package will begin with a capitalization variant of one of these two prefixes.


9 Static methods

Function: Marpa_Error_Code marpa_check_version (int required_major, int required_minor, int required_micro )

Checks that the Marpa library in use is compatible with the given version. Generally you will pass in the constants MARPA_MAJOR_VERSION, MARPA_MINOR_VERSION, MARPA_MICRO_VERSION as the three arguments to this function; that produces a check that the library in use is compatible with the version of Libmarpa the application or module was compiled against. Compatibility is defined by two things: first the version of the running library is newer than the version required_major.required_minor.required_micro. Second the running library must be binary compatible with the version required_major.required_minor.required_micro (same major version.)

Return value: MARPA_ERR_NONE if the Marpa library is compatible with the requested version. If the library is not compatible, one of the following is returned, indicating the nature of the mismatch:

Function: Marpa_Error_Code marpa_version (int* version)

Returns the version number in version, which must have room for three int’s.

Return value: A non-negative number on success, -2 on failure.


10 Configuration methods

The configuration object is intended for future extensions. These may allow the application to override Libmarpa’s memory allocation and fatal error handling without resorting to global variables, and therefore in a thread-safe way. Currently, the only function of the Marpa_Config class is to give marpa_g_new() a place to put its error code.

Marpa_Config is Libmarpa’s only “major” class which is not a time class. There is no constructor or destructor, although Marpa_Config objects do need to be initialized before use. Aside from its own accessor, Marpa_Config objects are only used by marpa_g_new and no reference to their location is not kept in any of Libmarpa’s time objects. The intent is to that it be convenient to have them in memory that might be deallocated soon after marpa_g_new returns. For example, they could be put on the stack.

Function: int marpa_c_init ( Marpa_Config* config)

Initialize the config information to “safe” default values. Unspecified behavior will result if an initialized configuration is used to create a grammar.

Return value: A non-negative value. Always succeeds.

Function: Marpa_Error_Code marpa_c_error ( Marpa_Config* config, const char** p_error_string )

Error codes are usually kept in the base grammar, which leaves marpa_g_new() no place to put its error code on failure. Objects of the Marpa_Config class provide such a place. p_error_string is reserved for use by the internals. Applications should set it to NULL.

Return value: The error code in config. Always succeeds.


11 Grammar methods


11.1 Overview

An archetypal application has a grammar. To create a grammar, use the marpa_g_new() method. When a grammar is no longer in use, its memory can be freed using the marpa_g_unref() method.

To be precomputed, a grammar must have one or more symbols. To create symbols, use the marpa_g_symbol_new() method.

To be precomputed, a grammar must have one or more rules. To create rules, use the marpa_g_rule_new() and marpa_g_sequence_new() methods.

For non-trivial parsing, one or more of the symbols must be terminals. To mark a symbol as a terminal, use the marpa_g_symbol_is_terminal_set() method.

To be precomputed, a grammar must have exactly one start symbol. To mark a symbol as the start symbol, use the marpa_g_start_symbol_set() method.

Before parsing with a grammar, it must be precomputed. To precompute a grammar, use the marpa_g_precompute() method.


11.2 Creating a new grammar

Function: Marpa_Grammar marpa_g_new ( Marpa_Config* configuration )

Creates a new grammar time object. The returned grammar object is not yet precomputed, and will have no symbols and rules. Its reference count will be 1.

Unless the application calls marpa_c_error, Libmarpa will not reference the location pointed to by the configuration argument after marpa_g_new returns. The configuration argument may be NULL, but if it is, there will be no way to determine the error code on failure.

Return value: On success, the grammar object. On failure, NULL, and the error code is set in configuration.

Function: int marpa_g_force_valued ( Marpa_Grammar g )

It is recommended that this call be made immediately after the grammar constructor. It turns off a deprecated feature.

The marpa_g_force_valued forces all the symbols in a grammar to be “valued”. The opposite of a valued symbol is one about whose value you do not care. This distinction has been made in the past in hope of gaining efficiencies at evaluation time. Current thinking is that the gains do not repay the extra complexity.

Return value: On success, a non-negative integer. On failure, a negative integer.


11.3 Tracking the reference count of the grammar

Function: Marpa_Grammar marpa_g_ref (Marpa_Grammar g)

Increases the reference count by 1. Not needed by most applications.

Return value: On success, the grammar object it was called with; NULL on failure.

Function: void marpa_g_unref (Marpa_Grammar g)

Decreases the reference count by 1, destroying g once the reference count reaches zero.


11.4 Symbols

Function: Marpa_Symbol_ID marpa_g_start_symbol (Marpa_Grammar g)

Returns current value of the start symbol of grammar g. The value is that specified in the marpa_g_start_symbol_set() call, if there has been one.

Return value: On failure, -2; -1 if there is no start symbol yet; otherwise the ID of the new start symbol.

Function: Marpa_Symbol_ID marpa_g_start_symbol_set ( Marpa_Grammar g, Marpa_Symbol_ID sym_id)

Sets the start symbol of grammar g to symbol id.

Return value: On success, the ID of the new start symbol. If sym_id is well-formed, but there is no such symbol, -1. On failure, -2.

Function: int marpa_g_highest_symbol_id (Marpa_Grammar g)

Return value: On success, the numerically largest symbol ID of g. On failure, -2.

Function: int marpa_g_symbol_is_accessible (Marpa_Grammar g, Marpa_Symbol_ID sym_id)

A symbol is accessible if it can be reached from the start symbol.

Return value: On success, 1 if symbol sym_id is accessible, 0 if not. If sym_id is well-formed, but there is no such symbol, -1. If the grammar is not precomputed, or on other failure, -2.

Function: int marpa_g_symbol_is_nullable ( Marpa_Grammar g, Marpa_Symbol_ID sym_id)

A symbol is nullable if it sometimes produces the empty string. A nulling symbol is always a nullable symbol, but not all nullable symbols are nulling symbols.

Return value: On success, 1 if symbol sym_id is nullable, 0 if not. If sym_id is well-formed, but there is no such symbol, -1. If the grammar is not precomputed, or on other failure, -2.

Function: int marpa_g_symbol_is_nulling (Marpa_Grammar g, Marpa_Symbol_ID sym_id)

A symbol is nulling if it always produces the empty string.

Return value: On success, 1 if symbol sym_id is nulling, 0 if not. If sym_id is well-formed, but there is no such symbol, -1. If the grammar is not precomputed, or on other failure, -2.

Function: int marpa_g_symbol_is_productive (Marpa_Grammar g, Marpa_Symbol_ID sym_id)

A symbol is productive if it can produce a string of terminals. All nullable symbols are considered productive.

Return value: On success, 1 if symbol sym_id is productive, 0 if not. If sym_id is well-formed, but there is no such symbol, -1. If the grammar is not precomputed, or on other failure, -2.

Function: int marpa_g_symbol_is_start ( Marpa_Grammar g, Marpa_Symbol_ID sym_id)

This return value of this call indicates whether sym_id is the start symbol.

Success value: Returns 1 if sym_id is the start symbol. Returns 0 if sym_id is not the start symbol, either because the start symbol is different from sym_id, or because the start symbol has not been set yet.

Failure values: If sym_id is well-formed, but there is no such symbol, -1. On other failure, -2.

Function: int marpa_g_symbol_is_terminal_set ( Marpa_Grammar g, Marpa_Symbol_ID sym_id, int value)
Function: int marpa_g_symbol_is_terminal ( Marpa_Grammar g, Marpa_Symbol_ID sym_id)

These methods, respectively, set and query the “terminal status” of a symbol. To be used as an input symbol in the marpa_r_alternative() method, a symbol must be a terminal. This function flags symbol sym_id as a terminal if value is 1, or flags it as a non-terminal if value is 0.

Once set to a value with the marpa_g_symbol_is_terminal_set() method, the terminal status of a symbol is “locked” at that value. A subsequent call to marpa_g_symbol_is_terminal_set() that attempts to change the terminal status of sym_id to a value different from its current one will fail. The error code will be MARPA_ERR_TERMINAL_IS_LOCKED.

By default, a symbol is a terminal if and only if it does not appear on the LHS of any rule. An attempt to flag a nulling symbol as a terminal will cause a failure, but this is not necessarily detected before precomputation.

Success returns: When successful, these methods return 1 if symbol sym_id is a terminal symbol after the call, 0 otherwise.

Failure returns: If sym_id is well-formed, but there is no such symbol, -1. On all other failures, -2. Other failures include when value is not 0 or 1; when the terminal status is locked and value is different from its current value; and, in the case of marpa_g_symbol_is_terminal_set(), when the grammar g is precomputed.

Function: Marpa_Symbol_ID marpa_g_symbol_new (Marpa_Grammar g)

Creates a new symbol. The symbol ID will be a non-negative integer.

Return value: On success, the ID of a new symbol; On failure, -2.


11.5 Rules

Function: int marpa_g_highest_rule_id (Marpa_Grammar g)

Return value: On success, the numerically largest rule ID of g. On failure, -2.

Function: int marpa_g_rule_is_accessible (Marpa_Grammar g, Marpa_Rule_ID rule_id)

A rule is accessible if it can be reached from the start symbol. A rule is accessible if and only if its LHS symbol is accessible. The start rule is always an accessible rule.

Return value: On success, 1 if rule rule_id is accessible, 0 if not. If rule_id is well-formed, but there is no such rule, -1. If the grammar is not precomputed, or on other failure, -2.

Function: int marpa_g_rule_is_nullable ( Marpa_Grammar g, Marpa_Rule_ID ruleid)

A rule is nullable if it sometimes produces the empty string. A nulling rule is always a nullable rule, but not all nullable rules are nulling rules.

Return value: On success, 1 if rule ruleid is nullable, 0 if not. If rule_id is well-formed, but there is no such rule, -1. If the grammar is not precomputed, or on other failure, -2.

Function: int marpa_g_rule_is_nulling (Marpa_Grammar g, Marpa_Rule_ID ruleid)

A rule is nulling if it always produces the empty string.

Return value: On success, 1 if rule ruleid is nulling, 0 if not. If rule_id is well-formed, but there is no such rule, -1. If the grammar is not precomputed, or on other failure, -2.

Function: int marpa_g_rule_is_loop (Marpa_Grammar g, Marpa_Rule_ID rule_id)

A rule is a loop rule if it non-trivially produces the string of length one which consists only of its LHS symbol. Such a derivation takes the parse back to where it started, hence the term “loop”. “Non-trivially” means the zero-step derivation does not count — the derivation must have at least one step.

The presence of a loop rule makes a grammar infinitely ambiguous, and applications will typically want to treat them as fatal errors. But nothing forces an application to do this, and Marpa will successfully parse and evaluate grammars with loop rules.

Return value: On success, 1 if rule rule_id is a loop rule, 0 if not. If rule_id is well-formed, but there is no such rule, -1. If the grammar is not precomputed, or on other failure, -2.

Function: int marpa_g_rule_is_productive (Marpa_Grammar g, Marpa_Rule_ID rule_id)

A rule is productive if it can produce a string of terminals. A rule is productive if and only if all the symbols on its RHS are productive. The empty string counts as a string of terminals, so that a nullable rule is always a productive rule. For that same reason, an empty rule is considered productive.

Return value: On success, 1 if rule rule_id is productive, 0 if not. If rule_id is well-formed, but there is no such rule, -1. If the grammar is not precomputed, or on other failure, -2.

Function: int marpa_g_rule_length ( Marpa_Grammar g, Marpa_Rule_ID rule_id)

The length of a rule is the number of symbols on its RHS.

Return value: On success, the length of rule rule_id. On failure, -2.

Function: Marpa_Symbol_ID marpa_g_rule_lhs ( Marpa_Grammar g, Marpa_Rule_ID rule_id)

Return value: On success, the LHS symbol of rule rule_id. If rule_id is well-formed, but there is no such rule, -1. On other failure, -2.

Function: Marpa_Rule_ID marpa_g_rule_new (Marpa_Grammar g, Marpa_Symbol_ID lhs_id, Marpa_Symbol_ID *rhs_ids, int length)

Creates a new external BNF rule in grammar g. The ID of the new rule will be a non-negative integer, which will be unique to that rule. In addition to BNF rules, Marpa also allows sequence rules, which are created by another method: marpa_g_sequence_new(). Sequence rules and BNF rules are numbered in the same series, so that a BNF rule will never have the same rule ID as a sequence rule, and vice versa.

The LHS symbol is lhs_id, and there are length symbols on the RHS. The RHS symbols are in an array pointed to by rhs_ids.

Possible failures, with their error codes, include:

Return value: On success, the ID of new external rule. On failure, -2.

Function: Marpa_Symbol_ID marpa_g_rule_rhs ( Marpa_Grammar g, Marpa_Rule_ID rule_id, int ix)

Returns the ID of the symbol in position ix in the RHS of rule rule_id. The RHS position, ix, is zero-based.

Return value: On success, the ID of the symbol in position ix on the rules RHS. If rule_id is well-formed, but there is no such rule, -1. If ix is greater than or equal to the length of the rule, or on other failure, -2.


11.6 Sequences

Function: int marpa_g_rule_is_proper_separation ( Marpa_Grammar g, Marpa_Rule_ID rule_id)

Success returns: marpa_g_rule_is_proper_separation() succeeds if and only if rule_id is valid. If rule rule_id is a sequence rule where the proper separation flag is set, returns 1. On other success, including the case where rule rule_id is not a sequence rule, returns 0.

The marpa_g_rule_is_proper_separation() does not distinguish rules without proper separation from rules for which separation is not defined, because they are not sequence rules. Applications for which this distinction is important should use the marpa_g_sequence_min() method.

Failure returns: If rule_id is well-formed, but there is no such rule, returns -1. On other failure, -2.

Function: int marpa_g_sequence_min ( Marpa_Grammar g, Marpa_Rule_ID rule_id)

Returns the mininum length of a sequence rule. This accessor can be also be used to test whether or not a rule is a sequence rule. -1 is returned if and only if the rule is valid but not a sequence rule.

Return value: If rule rule_id is a sequence rule, its minimum length. If rule rule_id is valid, but is not the rule ID of sequence rule, -1. If rule_id is well-formed, but there is no such rule, or on other failure, -2.

Function: Marpa_Rule_ID marpa_g_sequence_new (Marpa_Grammar g, Marpa_Symbol_ID lhs_id, Marpa_Symbol_ID rhs_id, Marpa_Symbol_ID separator_id, int min, int flags )

Adds a new sequence rule to grammar g. The ID of the new sequence rule will be a non-negative integer, which is unique to that rule. Sequence rules and BNF rules are numbered in the same series, so that a BNF rule will never have the same rule ID as a sequence rule, and vice versa.

The sequence is lhs_id, and the item to be repeated in the sequence is rhs_id. The sequence must be repeated at least min times, where min is 0 or 1. If separator_id is non-negative, it is a separator symbol.

If flags & MARPA_PROPER_SEPARATION is non-zero, separation is “proper”, that is, a trailing separator is not allowed. The term proper is based on the idea that properly-speaking, separators should actually separate items.

Some higher-level Marpa interfaces offer the ability to discard separators in the semantics, and in fact will do this by default. At the Libmarpa level, sequences always “keep separators”. It is up to the programmer to arrange to discard separators, if that is what is desired.

The sequence RHS, or item, is restricted to a single symbol, and that symbol cannot be nullable. If separator_id is a symbol, it also cannot be a nullable symbol. Nullables on the RHS of sequences are restricted because they lead to highly ambiguous grammars. Grammars of this kind are allowed by Libmarpa, but they must be expressed using BNF rules, not sequence rules. This is for two reasons: First, sequence optimizations would not work in the presence of nullables. Second, since it is not completely clear what an application intends when it asks for a sequence of identical items, some of which are nullable, the user’s intent can be more clearly expressed directly in BNF.

The LHS symbol cannot be the LHS of any other rule, whether a BNF rule or a sequence rule. On an attempt to create an sequence rule with a duplicate LHS, marpa_g_sequence_new() fails, setting the error code to MARPA_ERR_SEQUENCE_LHS_NOT_UNIQUE.

Sequence rules do not add to the classes of grammar parsed by Libmarpa — a sequence can always be written as BNF rules. When a rule is created with the marpa_g_sequence_new() method, Libmarpa will understand that it is a sequence, and will optimize accordingly. The speedup is often considerable.

Return value: On success, the ID of the external rule. On failure, -2.

Function: int marpa_g_sequence_separator ( Marpa_Grammar g, Marpa_Rule_ID rule_id)

Return value: If rule rule_id is a sequence rule, its separator. If rule rule_id is a sequence rule, but there is no separator, -1. If rule_id is not a sequence rule, does not exist or is not well-formed; or on other failure, -2.

Function: int marpa_g_symbol_is_counted (Marpa_Grammar g, Marpa_Symbol_ID sym_id)

A symbol is counted if it appears on the RHS of a sequence rule, or if it is used as the separator symbol of a sequence rule.

Success return: Returns 1 if symbol sym_id is counted, 0 if not.

Failure return: If sym_id is well-formed, but there is no such symbol, -1. If sym_id is not well-formed, and on other failure, -2.


11.7 Ranks

Function: Marpa_Rank marpa_g_rule_rank_set ( Marpa_Grammar g, Marpa_Rule_ID rule_id, Marpa_Rank rank)
Function: Marpa_Rank marpa_g_rule_rank ( Marpa_Grammar g, Marpa_Rule_ID rule_id)

These methods, respectively, set and query the rank of a rule rule_id. When rule_id is created, its rank initialized to the default rank of the grammar.

Initially, the default rank of the grammar is 0. Methods to reset the grammar’s default rank are currently in the experimental stage.

Return value: On success, returns the rank after the call, and sets the error code to MARPA_ERR_NONE. On failure, returns -2, and sets the error code to an appropriate value, which will never be MARPA_ERR_NONE. Note that when the rank is -2, the error code is the only way to distinguish success from failure. The error code can be determined by using the marpa_g_error() call.

Function: int marpa_g_rule_null_high_set ( Marpa_Grammar g, Marpa_Rule_ID rule_id, int flag)
Function: int marpa_g_rule_null_high ( Marpa_Grammar g, Marpa_Rule_ID rule_id)

These methods, respectively, set and query the “null ranks high” setting of the rule rule_id. The “null ranks high” setting is either 0 or 1. When rule_id is created, its “null ranks high” setting is initialized to 0.

The “null ranks high” setting affects the ranking of rules with properly nullable symbols on their right hand side. If a rule has properly nullable symbols on its RHS, each instance in which it appears in a parse will have a pattern of nulled and non-nulled symbols. Such a pattern is called a “null variant”.

If the “null ranks high” setting is 0 (the default), nulled symbols rank low. If the “null ranks high” setting is 1, nulled symbols rank high. Ranking of a null variants is done from left-to-right.

The marpa_g_rule_null_high_set() method will return failure after the grammar has been precomputed. If there is no other cause of failure, the marpa_g_rule_null_high() method succeeds on both precomputed and unprecomputed grammars.

Success return value: The value of the “null ranks high” flag after the call.

Failure return value: If rule_id is well-formed, but there is no such rule, -1. On all other failures, -2.

11.8 Events

Function: int marpa_g_completion_symbol_activate ( Marpa_Grammar g, Marpa_Symbol_ID sym_id, int reactivate )

Allows the user to deactivate and reactivate symbol completion events in the grammar. When a recognizer is created, the activation status of each of its events is initialized to the activation status of that event in the base grammar. If reactivate is zero, the event is deactivated in the grammar. If reactivate is one, the event is activated in the grammar.

Symbol completion events are active by default if the symbol was set up for completion events in the grammar. If a symbol was not set up for completion events in the grammar, symbol completion events are inactive by default and any attempt to change that is a fatal error.

The activation status of a completion event in the grammar can only be changed if the symbol is marked as a completion event symbol in the grammar, and before the grammar is precomputed. However, if a symbol is marked as a completion event symbol in the recognizer, the completion event can be deactivated and reactivated in the recognizer.

Success cases: On success, the method returns the value of reactivate. The method succeeds trivially if the symbol is already set as indicated by reactivate.

Failure cases: If the active status of the completion event for sym_id cannot be set as indicated by reactivate, the method fails. On failure, -2 is returned.

Function: int marpa_g_nulled_symbol_activate ( Marpa_Grammar g, Marpa_Symbol_ID sym_id, int reactivate )

Allows the user to deactivate and reactivate symbol nulled events in the grammar. When a recognizer is created, the activation status of each of its events is initialized to the activation status of that event in the base grammar. If reactivate is zero, the event is deactivated in the grammar. If reactivate is one, the event is activated in the grammar.

Symbol nulled events are active by default if the symbol was set up for nulled events in the grammar. If a symbol was not set up for nulled events in the grammar, symbol nulled events are inactive by default and any attempt to change that is a fatal error.

The activation status of a nulled event in the grammar can only be changed if the symbol is marked as a nulled event symbol in the grammar, and before the grammar is precomputed. However, if a symbol is marked as a nulled event symbol in the recognizer, the nulled event can be deactivated and reactivated in the recognizer.

Success cases: On success, the method returns the value of reactivate. The method succeeds trivially if the symbol is already set as indicated by reactivate.

Failure cases: If the active status of the nulled event for sym_id cannot be set as indicated by reactivate, the method fails. On failure, -2 is returned.

Function: int marpa_g_prediction_symbol_activate ( Marpa_Grammar g, Marpa_Symbol_ID sym_id, int reactivate )

Allows the user to deactivate and reactivate symbol prediction events in the grammar. When a recognizer is created, the activation status of each of its events is initialized to the activation status of that event in the base grammar. If reactivate is zero, the event is deactivated in the grammar. If reactivate is one, the event is activated in the grammar.

Symbol prediction events are active by default if the symbol was set up for prediction events in the grammar. If a symbol was not set up for prediction events in the grammar, symbol prediction events are inactive by default and any attempt to change that is a fatal error.

The activation status of a prediction event in the grammar can only be changed if the symbol is marked as a prediction event symbol in the grammar, and before the grammar is precomputed. However, if a symbol is marked as a prediction event symbol in the recognizer, the prediction event can be deactivated and reactivated in the recognizer.

Success cases: On success, the method returns the value of reactivate. The method succeeds trivially if the symbol is already set as indicated by reactivate.

Failure cases: If the active status of the prediction event for sym_id cannot be set as indicated by reactivate, the method fails. On failure, -2 is returned.

Function: int marpa_g_symbol_is_completion_event ( Marpa_Grammar g, Marpa_Symbol_ID sym_id)
Function: int marpa_g_symbol_is_completion_event_set ( Marpa_Grammar g, Marpa_Symbol_ID sym_id, int value)

Libmarpa can be set up to generate an MARPA_EVENT_SYMBOL_COMPLETED event whenever the symbol is completed. A symbol is said to be completed when a non-nulling rule with that symbol on its LHS is completed.

For completion events to occur, the symbol must be marked as a completion event symbol. The marpa_g_symbol_is_completion_event_set() function marks symbol sym_id as a completion event symbol if value is 1, and unmarks it it as a completion event symbol if value is 0. The marpa_g_symbol_is_completion_event() method returns the current value of the completion event marking for symbol sym_id.

Marking a completion event sets its activation status to on. Unmarking a completion event sets its activation status to off. The completion event marking cannot be changed once the grammar is precomputed.

If a completion event is marked, its activation status can be changed using the marpa_g_completion_symbol_activate() method. Note that, if a symbol is marked as a completion event symbol in the recognizer, its completion event can be deactivated and reactivated in the recognizer.

Nulled rules and symbols will never cause completion events. Nullable symbols may be marked as completion event symbols, but this will have an effect only when the symbol is not nulled. Nulling symbols may be marked as completion event symbols, but no completion events will ever be generated for a nulling symbol. Note that this implies at no completion event will ever be generated at earleme 0, the start of parsing.

Success: On success, 1 if symbol sym_id is a completion event symbol after the call, 0 otherwise.

Failures: If sym_id is well-formed, but there is no such symbol, -1. If the grammar g is precomputed; or on other failure, -2.

Function: int marpa_g_symbol_is_nulled_event ( Marpa_Grammar g, Marpa_Symbol_ID sym_id)
Function: int marpa_g_symbol_is_nulled_event_set ( Marpa_Grammar g, Marpa_Symbol_ID sym_id, int value)

Libmarpa can set up to generate an MARPA_EVENT_SYMBOL_NULLED event whenever the symbol is nulled. A symbol is said to be nulled when a zero length instance of that symbol is recognized.

For nulled events to occur, the symbol must be marked as a nulled event symbol. The marpa_g_symbol_is_nulled_event_set() function marks symbol sym_id as a nulled event symbol if value is 1, and unmarks it it as a nulled event symbol if value is 0. The marpa_g_symbol_is_nulled_event() method returns the current value of the nulled event marking for symbol sym_id.

Marking a nulled event sets its activation status to on. Unmarking a nulled event sets its activation status to off. The nulled event marking cannot be changed once the grammar is precomputed.

If a nulled event is marked, its activation status can be changed using the marpa_g_nulled_symbol_activate() method. Note that, if a symbol is marked as a nulled event symbol in the recognizer, its nulled event can be deactivated and reactivated in the recognizer.

As a reminder, a symbol instance is a symbol at a specific location in the input, and with a specific length. Also, whenever a nulled symbol instance is recognized at a location, it is acceptable at that location, and vice versa.

When a symbol instance is recognized at a location, it will generate a nulled event or a prediction event, but never both. A symbol instance of zero length, when recognized at a location, generates a nulled event at that location, and does not generate a completion event. A symbol instance of non-zero length, when acceptable at a location, generates a completion event at that location, and does not generate a nulled event.

When a symbol instance is acceptable at a location, it will generate a nulled event or a prediction event, but never both. A symbol instance of zero length, when acceptable at a location, generates a nulled event at that location, and does not generate a prediction event. A symbol instance of non-zero length, when acceptable at a location, generates a prediction event at that location, and does not generate a nulled event.

While it is not possible for a symbol instance to generate both a nulled event and a completion event at a location, it is quite possible that a symbol might generate both kinds of event at that location. This is because multiple instances of the same symbol may be recognized at a given location, and these instances will have different lengths. If one instance is recognized at a given location as zero length and a second, non-zero-length, instance is recognized at the same location, the first will generate only nulled events, while the second will generate only completion events. For similar reasons, while a symbol instance will never generate both a null event and a prediction event at a location, multiple instances of the same symbol may do so.

Zero length derivations can be ambiguous. When a zero length symbol is recognized, all of its zero-length derivations are also considered to be recognized.

The marpa_g_symbol_is_nulled_event_set() method will mark a symbol as a nulled event symbol, even if the symbol is non-nullable. This is convenient, for example, for automatically generated grammars. Applications which wish to treat it as a failure if there is an attempt to mark a non-nullable symbol as a nulled event symbol, can check for this case using the marpa_g_symbol_is_nullable() method.

Success: On success, 1 if symbol sym_id is a nulled event symbol after the call, 0 otherwise.

Failures: If sym_id is well-formed, but there is no such symbol, -1. If the grammar g is precomputed; or on other failure, -2.

Function: int marpa_g_symbol_is_prediction_event ( Marpa_Grammar g, Marpa_Symbol_ID sym_id)
Function: int marpa_g_symbol_is_prediction_event_set ( Marpa_Grammar g, Marpa_Symbol_ID sym_id, int value)

Libmarpa can be set up to generate a MARPA_EVENT_SYMBOL_PREDICTED event when a non-nulled symbol is predicted. A non-nulled symbol is said to be predicted when a instance of it is acceptable at the current earleme according to the grammar. Nulled symbols do not generate predictions.

For predicted events to occur, the symbol must be marked as a predicted event symbol. The marpa_g_symbol_is_predicted_event_set() function marks symbol sym_id as a predicted event symbol if value is 1, and unmarks it it as a predicted event symbol if value is 0. The marpa_g_symbol_is_predicted_event() method returns the current value of the predicted event marking for symbol sym_id.

Marking a prediction event sets its activation status to on. Unmarking a prediction event sets its activation status to off. The prediction event marking cannot be changed once the grammar is precomputed.

If a prediction event is marked, its activation status can be changed using the marpa_g_prediction_symbol_activate() method. Note that, if a symbol is marked as a prediction event symbol in the recognizer, its prediction event can be deactivated and reactivated in the recognizer.

Success: On success, 1 if symbol sym_id is a predicted event symbol after the call, 0 otherwise.

Failures: If sym_id is well-formed, but there is no such symbol, -1. If the grammar g is precomputed; or on other failure, -2.


11.9 Precomputing the Grammar

Function: int marpa_g_precompute (Marpa_Grammar g)

Precomputation is necessary for a recognizer to be generated from a grammar. On success, marpa_g_precompute returns a non-negative number to indicate that it precomputed the grammar without issues. On failure, marpa_g_precompute returns -2.

Precomputation may return one or more events, which may be queried using the marpa_g_event() method. At this point events only occur when failure is reported, and events always report issues. But application writers should expect future versions to have events which are reported on success, as well as events which do not represent issues.

A MARPA_EVENT_LOOP_RULES event occurs when there are infinite loop rules (cycles) in the grammar. The presence of one or more of these will cause failure to be reported, but will not prevent the grammar from being precomputed.

Each MARPA_EVENT_COUNTED_NULLABLE event is a symbol which is a nullable on the right hand side of a sequence rule — a “counted” symbol. The presence of one or more of these will cause failure to be reported, and will prevent the grammar from being precomputed. So that the programmer can fix several at once, these failures are delayed until events are created for all of the counted nullables.

Each MARPA_EVENT_NULLING_TERMINAL event is a nulling symbol which is also flagged as a terminal. Since terminals cannot be of zero length, this is a logical impossibility. The presence of one or more of these will cause failure to be reported, and will prevent the grammar from being precomputed. So that the programmer can fix several at once, the failure is delayed until events are created for all of the counted nullables.

Precomputation involves freezing and then thoroughly checking the grammar. Among the reasons for precomputation to fail are the following:

More details of these can be found under the description of the appropriate code. See External error codes.

marpa_g_precompute() is unusual in that it is possible to treat one of its failures as “advisory”, and to proceed with parsing. If marpa_g_precompute() fails with an error code of MARPA_ERR_GRAMMAR_HAS_CYCLE, parsing can proceed, just as it typically would for success. The grammar will have been precomputed, as calling the marpa_g_is_precomputed() method will confirm.

Most applications, however, will want to simply treat failure with MARPA_ERR_GRAMMAR_HAS_CYCLE, as simply another failure, and fix the cycles before parsing. Cycles make a grammar infinitely ambiguous, and are considered useless in current practice. Cycles make processing the grammar less efficient, sometimes considerably so. Detection of cycles is returned as failure because that is by far the convenient thing to do for the vast majority of applications.

Return value: On success, a non-negative number. On failure, -2.

Function: int marpa_g_is_precomputed (Marpa_Grammar g)

Return value: On success, 1 if grammar g is already precomputed, 0 otherwise. On failure, -2.

Function: int marpa_g_has_cycle (Marpa_Grammar g)

This function allows the application to determine if grammar g has a cycle. As mentioned, most applications will want to treat these as fatal errors. To determine which rules are in the cycle, marpa_g_rule_is_loop() can be used.

Return value: On success, 1 if the grammar has a cycle, 0 otherwise. On failure, -2.


12 Recognizer methods


12.1 Overview

An archetypal application uses a recognizer to read input. To create a recognizer, use the marpa_r_new() method. When a recognizer is no longer in use, its memory can be freed using the marpa_r_unref() method.

To make a recognizer ready for input, use the marpa_r_start_input() method.

The recognizer starts with its current earleme at location 0. To read a token at the current earleme, use the marpa_r_alternative() call.

To complete the processing of the current earleme, and move forward to a new one, use the marpa_r_earleme_complete() call.


12.2 Creating a new recognizer

Function: Marpa_Recognizer marpa_r_new ( Marpa_Grammar g )

Creates a new recognizer. The reference count of the recognizer will be 1. The reference count of g, the base grammar, will be incremented by one.

Return value: On success, the newly created recognizer. If g is not precomputed, or on other failure, NULL.


12.3 Keeping the reference count of a recognizer

Function: Marpa_Recognizer marpa_r_ref (Marpa_Recognizer r)

Increases the reference count by 1. Not needed by most applications.

Return value: On success, the recognizer object, r. On failure, NULL.

Function: void marpa_r_unref (Marpa_Recognizer r)

Decreases the reference count by 1, destroying r once the reference count reaches zero. When r is destroyed, the reference count of its base grammar is decreased by one. If this takes the reference count of the base grammar to zero, it too is destroyed.


12.4 Life cycle mutators

Function: int marpa_r_start_input (Marpa_Recognizer r)

Makes r ready to accept input. The first Earley set, the one at earleme 0, will be completed during this call.

Because the call to marpa_r_start_input() completes an Earley set, it may generate events. For details about the events that may be generated during Earley set completion, see the description of the marpa_r_earleme_complete() method.

Return value: On success, a non-negative value. On failure, -2.

Function: int marpa_r_alternative (Marpa_Recognizer r, Marpa_Symbol_ID token_id, int value, int length)

Reads a token into r. The token will start at the current earleme. Libmarpa allows tokens to be ambiguous, to be of variable length and to overlap. token_id is the symbol ID of the token, which must be a terminal. length is the length of the token.

value is an integer that represents the value of the token. In applications where the token’s actual value is not an integer, it is expected that the application will use this value as a “virtual” value, perhaps finding the actual value by using value to index an array. value is not used inside Libmarpa — it is simply stored to be returned by the valuator as a convenience for the application. Some applications may prefer to track token values on their own, perhaps based on the earleme location and token_id, instead of using Libmarpa’s token values.

A value of 0 is reserved for a now-deprecated feature. Do not use it. For more details on that feature, see the section Valued and unvalued symbols.

When marpa_r_alternative() is successful, the value of the furthest earleme is set to the greater of its value before the call, and current+length, where current is the value of the current earleme. The values of the current and latest earlemes are unchanged by calls to marpa_r_alternative().

Several error codes leave the recognizer in a fully recoverable state, allowing the application to retry the marpa_r_alternative() method. Retry is efficient, and quite useable as a parsing technique. The error code of primary interest from this point of view is MARPA_ERR_UNEXPECTED_TOKEN_ID, which indicates that the token was not accepted because of its token ID. Retry after this condition is used in several applications, and is called “the Ruby Slippers technique”.

The error codes MARPA_ERR_DUPLICATE_TOKEN, MARPA_ERR_NO_TOKEN_EXPECTED_HERE and MARPA_ERR_INACCESSIBLE_TOKEN also leave the recognizer in a fully recoverable state, and may also be useable for the Ruby Slippers or similar techniques. At this writing, the author knows of no applications which attempt to recover from these errors.

Return value: On success, MARPA_ERR_NONE. On failure, some other error code.

Function: int marpa_r_earleme_complete (Marpa_Recognizer r)

This method does the final processing for the current earleme. It then advances the current earleme by one. Note that marpa_r_earleme_complete() may be called even when no tokens have been read at the current earleme — in the character-per-earleme input model, for example, tokens can span many characters and, if the input is unambiguous over that span, there will be no other tokens that start inside it.

As mentioned, marpa_r_earleme_complete() always advances the current earleme, incrementing its value by 1. This means that value of the current earleme after the call will be the one plus the value of the earleme processed by the call to marpa_r_earleme_complete(). If any token was accepted at the earleme being processed, marpa_r_earleme_complete() creates a new Earley set which will be the latest Earley set and, after the call, the latest earleme will be equal to the new current earleme. If no token was accepted at the earleme being processed, no Earley set is created, and the value of the latest earleme remains unchanged. The value of the furthest earleme is never changed by a call to marpa_r_earleme_complete().

During this method, one or more events may occur. On success, this function returns the number of events generated, but it is important to note that events may be created whether earleme completion fails or succeeds. When this method fails, the application must call marpa_g_event() if it wants to determine if any events occurred. Since the reason for failure to complete an earleme is often detailed in the events, applications that fail will often be at least as interested in the events as those that succeed.

The MARPA_EVENT_EARLEY_ITEM_THRESHOLD event indicates that an application-settable threshold on the number of Earley items has been reached or exceeded. What this means depends on the application, but when the default threshold is exceeded, it means that it is very likely that the time and space resources consumed by the parse will prove excessive.

A parse is “exhausted” when it can accept no more input. This can happen both on success and on failure. Note that the failure due to parse exhaustion only means failure at the current earleme. There may be successful parses at earlier earlemes.

If a parse is exhausted, but successful, an event with the event code MARPA_EVENT_EXHAUSTED occurs. Because the parse is exhausted, no input will be accepted at later earlemes. It is quite common for a parse to become exhausted when it succeeds. Many practical grammars are designed so that a successful parse cannot be extended.

An exhausted parse may cause a failure, in which case marpa_r_earleme_complete() returns an error whose error code is MARPA_ERR_PARSE_EXHAUSTED. For a parse to fail at an earleme due to exhaustion, it must be the case that no alternatives were accepted at that earleme. In fact, in the standard input model, a failure due to parse exhaustion occurs if and only if no alternatives were accepted at the current earleme.

The circumstances under which failure due to parse exhaustion occurs are slightly more complicated when variable length tokens are in use. Informally, a parse will never fail due to exhaustion as long as it is possible that a token ending at some future earleme will continue the parse. More precisely, a call to marpa_r_earleme_complete() fails due to parse exhaustion if and only if, first, no alternatives were added at the current earleme and, second, that call left the current earleme equal to the furthest earleme.

Return value: On success, the number of events generated. On failure, -2.


12.5 Location accessors

Function: Marpa_Earleme marpa_r_current_earleme (Marpa_Recognizer r)

Return value: If input has started, the current earleme. If input has not started, -1. Always succeeds.

Function: Marpa_Earleme marpa_r_earleme ( Marpa_Recognizer r, Marpa_Earley_Set_ID set_id)

In the default, token-stream model, Earley set ID and earleme are always equal, but this is not the case in other input models. (The ID of an Earley set ID is also called its ordinal.) If there is no Earley set whose ID is set_id, marpa_r_earleme() fails. If set_id was negative, the error code is set to MARPA_ERR_INVALID_LOCATION. If set_id is greater than the ordinal of the latest Earley set, the error code is set to MARPA_ERR_NO_EARLEY_SET_AT_LOCATION.

At this writing, there is no method for the inverse operation (conversion of an earleme to an Earley set ID). One consideration in writing such a method is that not all earlemes correspond to Earley sets. Applications that want to map earlemes to Earley sets will have no trouble if they are using the standard input model — the Earley set ID is always exactly equal to the earleme in that model. For other applications that want an earleme-to-ID mapping, the most general method is create an ID-to-earleme array using the marpa_r_earleme() method and invert it.

Return value: On success, the earleme corresponding to Earley set set_id. On failure, -2.

Function: int marpa_r_earley_set_value ( Marpa_Recognizer r, Marpa_Earley_Set_ID earley_set)

Returns the integer value of earley_set. For more details, see the description of marpa_r_earley_set_values().

Return value: On success, the value of earley_set. On failure, -2.

Function: int marpa_r_earley_set_values ( Marpa_Recognizer r, Marpa_Earley_Set_ID earley_set, int* p_value, void** p_pvalue )

If p_value is non-zero, sets the location pointed to by p_value to the integer value of the Earley set. Similarly, if p_pvalue is non-zero, sets the location pointed to by p_pvalue to the pointer value of the Earley set.

The “value” and “pointer” of an Earley set are an arbitrary integer and an arbitrary pointer that the application can use for its own purposes. In character-per-earleme input models, for example, the integer can be the codepoint of the current character. In a traditional token-per-earleme input model, they could be used to indicate the string value of the token – the pointer could point to the start of the string, and the integer could indicate its length.

The Earley set value and pointer can be set using the marpa_r_latest_earley_set_values_set() method. The Earley set integer value defaults to -1, and the pointer value defaults to NULL.

Return value: On success, returns a non-negative integer. On failure, returns -2.

Function: unsigned int marpa_r_furthest_earleme (Marpa_Recognizer r)

Return value: On success, the furthest earleme. Always succeeds.

Function: Marpa_Earley_Set_ID marpa_r_latest_earley_set (Marpa_Recognizer r)

This method returns the Earley set ID (ordinal) of the latest Earley set. Applications that want the value of the latest earleme can convert this value using the marpa_r_earleme() method.

Return value: On success, the ID of the latest Earley set. Always succeeds.

Function: int marpa_r_latest_earley_set_value_set ( Marpa_Recognizer r, int value)

Sets the integer value of the latest Earley set. For more details, see the description of marpa_r_latest_earley_set_values_set().

Return value: On success, the new value of earley_set. On failure, -2.

Function: int marpa_r_latest_earley_set_values_set ( Marpa_Recognizer r, int value, void* pvalue)

Sets the integer and pointer value of the latest Earley set. For more about the “integer value” and “pointer value” of an Earley set, see the description of the marpa_r_earley_set_values() method.

Return value: On success, returns a non-negative integer. On failure, returns -2.


12.6 Other parse status methods

Function: int marpa_r_completion_symbol_activate ( Marpa_Recognizer r, Marpa_Symbol_ID sym_id, int reactivate )

Allows the user to deactivate and reactivate symbol completion events in the recognizer. If reactivate is zero, the event is deactivated. If reactivate is one, the event is activated.

Symbol completion events are active by default if the symbol was set up for completion events in the grammar. If a symbol was not set up for completion events in the grammar, symbol completion events are inactive by default and any attempt to change that is a fatal error.

Success cases: On success, the method returns the value of reactivate. The method succeeds trivially if the symbol is already set as indicated by reactivate.

Failure cases: If the active status of the completion event for sym_id cannot be set as indicated by reactivate, the method fails. On failure, -2 is returned.

Function: int marpa_r_earley_item_warning_threshold_set (Marpa_Recognizer r, int threshold)
Function: int marpa_r_earley_item_warning_threshold (Marpa_Recognizer r)

These methods, respectively, set and query the Earley item warning threshold. The Earley item warning threshold is a number that is compared with the count of Earley items in each Earley set. When it is matched or exceeded, a MARPA_EVENT_EARLEY_ITEM_THRESHOLD event is created.

If threshold is zero or less, an unlimited number of Earley items will be allowed without warning. This will rarely be what the user wants.

By default, Libmarpa calculates a value based on the grammar. The formula Libmarpa uses is the result of some experience, and most applications will be happy with it.

Return value: The value that the Earley item warning threshold has after the method call is finished. Always succeeds.

Function: int marpa_r_expected_symbol_event_set ( Marpa_Recognizer r, Marpa_Symbol_ID symbol_id, int value)

Sets the “expected symbol event bit” for symbol_id to value. A recognizer event is created whenever symbol symbol_id is expected at the current earleme. if and only if the expected symbol event bit for symbol_id is 1. The “expected symbol event bit” must be 1 or 0.

In this context, “expected” means “expected as a terminal”. Even if a symbol is predicted at the current earleme, if it is not acceptable as a terminal, it does not trigger an “expected symbol event”.

By default, the “expected symbol event bit” is 0. It is an error to attempt to set the “expected symbol event bit” to 1 for a nulling symbol, an inaccessible symbol, or an unproductive symbol.

Return value: The value of the event bit after the method call is finished. -2 if symbol_id is not the ID of a valid symbol; if it is the ID of an nulling, inaccessible for unproductive symbol; or on other failure.

Function: int marpa_r_is_exhausted (Marpa_Recognizer r)

A parser is “exhausted” if it cannot accept any more input. Both successful and failed parses can be exhausted. In many grammars, the parse is always exhausted as soon as it succeeds. Good parses may also exist at earlemes prior to the current one.

Return value: 1 if the parser is exhausted, 0 otherwise. Always succeeds.

Function: int marpa_r_nulled_symbol_activate ( Marpa_Recognizer r, Marpa_Symbol_ID sym_id, int boolean )

Allows the user to deactivate and reactivate symbol nulled events in the recognizer. If boolean is zero, the event is deactivated. If boolean is one, the event is activated.

Symbol nulled events are active by default if the symbol was set up for nulled events in the grammar. If a symbol was not set up for nulled events in the grammar, symbol nulled events are inactive by default and any attempt to change that is a fatal error.

Success cases: On success, the method returns the value of boolean. The method succeeds trivially if the symbol is already set as indicated by boolean.

Failure cases: If the active status of the nulled event for sym_id cannot be set as indicated by boolean, the method fails. On failure, -2 is returned.

Function: int marpa_r_prediction_symbol_activate ( Marpa_Recognizer r, Marpa_Symbol_ID sym_id, int boolean )

Allows the user to deactivate and reactivate symbol prediction events in the recognizer. If boolean is zero, the event is deactivated. If boolean is one, the event is activated.

Symbol prediction events are active by default if the symbol was set up for prediction events in the grammar. If a symbol was not set up for prediction events in the grammar, symbol prediction events are inactive by default and any attempt to change that is a fatal error.

Success cases: On success, the method returns the value of boolean. The method succeeds trivially if the symbol is already set as indicated by boolean.

Failure cases: If the active status of the prediction event for sym_id cannot be set as indicated by boolean, the method fails. On failure, -2 is returned.

Function: int marpa_r_terminals_expected ( Marpa_Recognizer r, Marpa_Symbol_ID* buffer)

Returns a list of the ID’s of the symbols that are acceptable as tokens at the current earleme. buffer is expected to be large enough to hold the result. This is guaranteed to be the case if the buffer is large enough to hold a number of Marpa_Symbol_ID’s that is greater than or equal to the number of symbols in the grammar.

Return value: On success, the number of Marpa_Symbol_ID’s in buffer. On failure, -2.

Function: int marpa_r_terminal_is_expected ( Marpa_Recognizer r, Marpa_Symbol_ID symbol_id)

Return values on success: If symbol_id is the ID of a valid terminal symbol that is expected at the current earleme, a number greater than zero. If symbol_id is the ID of a valid terminal symbol that is not expected at the current earleme, or if symbol_id is the ID of a valid symbol that is not a terminal, zero.

Failure cases: Returns -2 on failure. It is a failure if symbol_id is not the ID of a valid symbol.


13 Progress reports

An important advantage of the Marpa algorithm is the ability to easily get full information about the state of the parse.

To start a progress report, use the marpa_r_progress_report_start() command. Only one progress report can be in use at any one time.

To get the information in a progress report, it is necessary to step through the progress report items. To get the data for the current progress report item, and advance to the next one, use the marpa_r_progress_item() method.

To destroy a progress report, freeing the memory it uses, call the marpa_r_progress_report_finish() method.

Function: int marpa_r_progress_report_reset ( Marpa_Recognizer r)

Resets the progress report. Assumes a report of the progress has already been initialized at some Earley set for recognizer r, with marpa_r_progress_report_start(). The reset progress report will be positioned before its first item.

Return value: On success, a non-negative value. On failure, -2.

Function: int marpa_r_progress_report_start ( Marpa_Recognizer r, Marpa_Earley_Set_ID set_id)

Initializes a report of the progress at Earley set set_id for recognizer r. If a progress report already exists, it is destroyed and its memory is freed. Initially, the progress report is positioned before its first item.

If no Earley set with ID set_id exists, marpa_r_progress_report_start() fails. The error code is MARPA_ERR_INVALID_LOCATION if set_id is negative. The error code is MARPA_ERR_NO_EARLEY_SET_AT_LOCATION if set_id is greater than the ID of the latest Earley set.

Return value: On success, the number of report items available. If the recognizer has not been started; if set_id does not exist; or on other failure, -2.

Function: int marpa_r_progress_report_finish ( Marpa_Recognizer r )

Destroys the report of the progress at Earley set set_id for recognizer r, freeing the memory and other resources. It is often not necessary to call this method. Any previously existing progress report is destroyed automatically whenever a new progress report is started, and when the recognizer is destroyed.

Return value: -2 if no progress report has been started, or on other failure. On success, a non-negative value.

Function: Marpa_Rule_ID marpa_r_progress_item ( Marpa_Recognizer r, int* position, Marpa_Earley_Set_ID* origin )

This method allows access to the data for the next item of a progress report. If there are no more progress report items, it returns -1 as a termination indicator and sets the error code to MARPA_ERR_PROGRESS_REPORT_EXHAUSTED. Either the termination indicator, or the item count returned by marpa_r_progress_report_start(), can be used to determine when the last item has been seen.

On success, the dot position is returned in the location pointed to by the position argument, and the origin is returned in the location pointed to by the origin argument. On failure, the locations pointed to by the position and origin arguments are unchanged.

Return value: On success, the rule ID of the next progress report item. If there are no more progress report items, -1. If either the position or the origin argument is NULL, or on other failure, -2.


14 Bocage methods


14.1 Overview

A bocage is structure containing the full set of parses found by processing the input according to the grammar. The bocage structure is new with Libmarpa, but is very similar in purpose to the more familar parse forests.

To create a bocage, use the marpa_b_new() method.

When a bocage is no longer in use, its memory can be freed using the marpa_b_unref() method.


14.2 Creating a new bocage

Function: Marpa_Bocage marpa_b_new (Marpa_Recognizer r, Marpa_Earley_Set_ID earley_set_ID)

Creates a new bocage object, with a reference count of 1. The reference count of its parent recognizer object, r, is increased by 1. If earley_set_ID is -1, the Earley set at the current earleme is used, if there is one.

If earley_set_ID is -1 and there is no Earley set at the current earleme; or if earley_set_ID is -1 and there is no parse ending at Earley set earley_set_ID, marpa_b_new() fails and the error code is set to MARPA_ERR_NO_PARSE.

Success return value: On success, the new bocage object. On failure, NULL.


14.3 Reference counting

Function: Marpa_Bocage marpa_b_ref (Marpa_Bocage b)

Increases the reference count by 1. Not needed by most applications.

Return value: On success, b. On failure, NULL.

Function: void marpa_b_unref (Marpa_Bocage b)

Decreases the reference count by 1, destroying b once the reference count reaches zero. When b is destroyed, the reference count of its parent recognizer is decreased by 1. If this takes the reference count of the parent recognizer to zero, it too is destroyed. If the parent recognizer is destroyed, the reference count of its base grammar is decreased by 1. If this takes the reference count of the base grammar to zero, it too is destroyed.


14.4 Accessors

Function: int marpa_b_ambiguity_metric (Marpa_Bocage b)

Returns an ambiguity metric. The metric is 1 is the parse is unambiguous. If the metric is 2 or greater, the parse is ambiguous. It was originally intended to have values greater than 2 be an cheaply computed estimate of the degree of ambiguity, but a satisfactory scheme for this has yet to be implemented.

Return value on success: 1 if the bocage is not for an ambiguous parse; 2 or greater if the bocage is for an ambiguous parse.

Failures: On failure, -2.

Function: int marpa_b_is_null (Marpa_Bocage b)

Return value on success: A number greater than or equal to 1 if the bocage is for a null parse; otherwise, 0.

Failures: On failure, -2.


15 Ordering methods


15.1 Overview

Before iterating the parses in the bocage, they must be ordered. To create an ordering, use the marpa_o_new() method. When an ordering is no longer in use, its memory can be freed using the marpa_o_unref() method.

An ordering is frozen once the first tree iterator is created using it. A frozen ordering cannot be changed.

As of this writing, the only methods to order parses are internal and undocumented. This is expected to change.


15.2 Creating an ordering

Function: Marpa_Order marpa_o_new ( Marpa_Bocage b)

Creates a new ordering object, with a reference count of 1. The reference count of its parent bocage object, b, is increased by 1.

Return value: On success, the new ordering object. On failure, NULL.


15.3 Reference counting

Function: Marpa_Order marpa_o_ref ( Marpa_Order o)

Increases the reference count by 1. Not needed by most applications.

Return value: On success, o. On failure, NULL.

Function: void marpa_o_unref ( Marpa_Order o)

Decreases the reference count by 1, destroying o once the reference count reaches zero. Beginning with o’s parent bocage, Libmarpa then proceeds up the chain of parent objects. Every time a child is destroyed, the reference count of its parent is decreased by 1. Every time the reference count of an object is decreased by 1, if that reference count is now zero, that object is destroyed. Libmarpa follows this chain of decrements and destructions as required, all the way back to the base grammar, if necessary.


15.4 Accessors

Function: int marpa_o_ambiguity_metric (Marpa_Order o)

Returns an ambiguity metric. The metric is 1 is the parse is unambiguous. If the metric is 2 or greater, the parse is ambiguous. It was originally intended to have values greater than 2 be an cheaply computed estimate of the degree of ambiguity, but a satisfactory scheme for this has yet to be implemented.

If the ordering is not already frozen, it will be frozen on return from marpa_o_ambiguity_metric(). marpa_o_ambiguity_metric() is considered an “accessor”, because it is assumed that the ordering is frozen when marpa_o_ambiguity_metric() is called.

Return value on success: 1 if the ordering is not for an ambiguous parse; 2 or greater if the ordering is for an ambiguous parse.

Failures: On failure, -2.

Function: int marpa_o_is_null (Marpa_Order o)

Return value on success: A number greater than or equal to 1 if the ordering is for a null parse; otherwise, 0.

Failures: On failure, -2.


15.5 Non-default ordering

Function: int marpa_o_high_rank_only_set ( Marpa_Order o, int flag)
Function: int marpa_o_high_rank_only ( Marpa_Order o)

These methods, respectively, set and query the “high rank only” flag of ordering o. A flag of 1 indicates that, when ranking, all choices should be discarded except those of the highest rank. A flag of 0 indicates that no choices should be discarded on the basis of their rank.

A value of 1 is the default. The value of the “high rank only” flag has no effect unless ranking has been turned on using the marpa_o_rank() method.

Return value: On success, the value of the “high rank only” flag after the call. On failure, -2.

Function: int marpa_o_rank ( Marpa_Order o )

By default, the ordering of parse trees is arbitrary. This method causes the ordering to be ranked according to the ranks of symbols and rules, the “null ranks high” flags of the rules, and the “high rank only” flag of the ordering. Once this method returns, the ordering is frozen.

Return value: On success, a non-negative value. On failure, -2.


16 Tree methods


16.1 Overview

Once the bocage has an ordering, the parses trees can be iterated. Marpa’s parse tree iterators iterate the parse trees contained in a bocage object. In Libmarpa, “parse tree iterators” are usually just called trees.

To create a tree, use the marpa_t_new() method. A newly created tree iterator is positioned before the first parse tree. When a tree iterator is no longer in use, its memory can be freed using the marpa_t_unref() method.

To position a newly created tree iterator at the first parse tree, use the marpa_t_next() method. Once the tree iterator is positioned at a parse tree, the same marpa_t_next() method is used to position it to the next parse tree.


16.2 Creating a new tree iterator

Function: Marpa_Tree marpa_t_new (Marpa_Order o)

Creates a new tree iterator, with a reference count of 1. The reference count of its parent ordering object, o, is increased by 1.

When initialized, a tree iterator is positioned before the first parse tree. To position the tree iterator to the first parse, the application must call marpa_t_next().

Return value: On success, a newly created tree. On failure, NULL.


16.3 Reference counting

Function: Marpa_Tree marpa_t_ref (Marpa_Tree t)

Increases the reference count by 1. Not needed by most applications.

Return value: On success, t. On failure, NULL.

Function: void marpa_t_unref (Marpa_Tree t)

Decreases the reference count by 1, destroying t once the reference count reaches zero. Beginning with t’s parent ordering, Libmarpa then proceeds up the chain of parent objects. Every time a child is destroyed, the reference count of its parent is decreased by 1. Every time the reference count of an object is decreased by 1, if that reference count is now zero, that object is destroyed. Libmarpa follows this chain of decrements and destructions as required, all the way back to the base grammar, if necessary.


16.4 Iterating through the trees

Function: int marpa_t_next ( Marpa_Tree t)

Positions t at the next parse tree in the iteration. Tree iterators are initialized to the position before the first parse tree, so this method must be called before creating a valuator from a tree.

If a tree iterator is positioned after the last parse, the tree is said to be “exhausted”. A tree iterator for a bocage with no parse trees is considered to be “exhausted” when initialized. If the tree iterator is exhausted, marpa_t_next() returns -1 as a termination indicator, and sets the error code to MARPA_ERR_TREE_EXHAUSTED.

Return value: On success, a non-negative value. If the tree iterator is exhausted, -1. On failure, -2.

Function: int marpa_t_parse_count ( Marpa_Tree t)

The parse counter counts the number of parse trees traversed so far. The count includes the current iteration of the tree, so that a value of 0 indicates that the tree iterator is at its initialized position, before the first parse tree.

Return value: The number of parses traversed so far. Always succeeds.


17 Value methods


17.1 Overview

The archetypal application needs a value object (or valuator) to produce the value of the parse. To create a valuator, use the marpa_v_new() method. When a valuator is no longer in use, its memory can be freed using the marpa_v_unref() method.

The application is required to maintain the stack, and the application is also required to implement most of the semantics, including the evaluation of rules. Libmarpa’s valuator provides instructions to the application on how to manipulate the stack. To iterate through this series of instructions, use the marpa_v_step() method.

marpa_v_step() returns the type of step. Most step types have values associated with them. To access these values use the methods described in the section Basic step accessors. How to perform the steps is described in the sections How to use the valuator and Stepping through the valuator.


17.2 How to use the valuator

Libmarpa’s valuator provides the application with “steps”, which are instructions for stack manipulation. Libmarpa itself does not maintain a stack. This leaves the upper layer in total control of the stack and the values which are placed on it.

As example may make this clearer. Suppose the evalution is at a place in the parse tree where an addition is being performed. Libmarpa does not know that the operation is an addition. It will tell the application that rule number R is to be applied to the arguments at stack locations N and N+1, and that the result is to placed in stack location N.

In this system the application keeps track of the semantics for all rules, so it looks up rule R and determines that it is an addition. The application can do this by using R as an index into an array of callbacks, or by any other method it chooses. Let’s assume a callback implements the semantics for rule R. Libmarpa has told the application that two arguments are available for this operation, and that they are at locations N and N+1 in the stack. They might be the numbers 42 and 711. So the callback is called with its two arguments, and produces a return value, let’s say, 753. Libmarpa has told the application that the result belongs at location N in the stack, so the application writes 753 to location N.

Since Libmarpa knows nothing about the semantics, the operation for rule R could be string concatenation instead of addition. Or, if it is addition, it could allow for its arguments to be floating point or complex numbers. Since the application maintains the stack, it is up to the application whether the stack contains integers, strings, complex numbers, or polymorphic objects which are capable of being any of these things and more.


17.3 Advantages of step-driven valuation

Step-driven valuation hides Libmarpa’s grammar rewrites from the application, and is quite efficient. Libmarpa knows which rules are sequences. Libmarpa optimizes stack manipulations based on this knowledge. Long sequences are very common in practical grammars. For these, the stack manipulations suggested by Libmarpa’s step-driven valuator will be significantly faster than the traditional stack evaluation algorithm.

Step-driven evalution has another advantage. To illustrate this, consider what is a very common case: The semantics are implemented in a higher-level language, using callbacks. If Libmarpa did not use step-driven valuation, it would need to provide for this case. But for generality, Libmarpa would have to deal in C callbacks. Therefore, a middle layer would have to create C language wrappers for the callbacks in the higher level language.

The implementation that results is this: The higher level language would need to wrap each callback in C. When calling Libmarpa, it would pass the wrappered callback. Libmarpa would then need to call the C language “wrappered” callback. Next, the wrapper would call the higher-level language callback. The return value, which would be data native to the higher-level language, would need to be passed to the C language wrapper, which will need to make arrangements for it to be based back to the higher-level language when appropriate.

A setup like this is not terribly efficient. And exception handling across language boundaries would be very tricky. But neither of these is the worst problem.

Callbacks are hard to debug. Wrappered callbacks are even worse. Calls made across language boundaries are harder yet to debug. In the system described above, by the time a return value is finally consumed, a language boundary will have been crossed four times.

How do programmers deal with difficulties like this? Usually, by doing the absolute minimum possible in the callbacks. A horrific debugging enviroment can become a manageable one if there is next to no code to be debugged. And this can be accomplished by doing as much as possible in pre- and post-processing.

In essence, callbacks force applications to do most of the programming via side effects. One need not be a functional programming purist to find this a very undesirable style of design to force on an application. But the ability to debug can make the difference between code that does work and code that does not. Unfairly or not, code is rarely considered well-designed when it does not work.

So, while step-driven valuation seems a roundabout approach, it is simpler and more direct than the likely alternatives. And there is something to be said for pushing semantics up to the higher levels — they can be expected to know more about it.

These advantages of step-driven valuation are strictly in the context of a low-level interface. The author is under no illusion that direct use of Libmarpa’s valuator will be found satisfactory by most application programmers, even those using the C language. The author certainly avoids using step-driven valuation directly. Libmarpa’s valuator is intended to be used via an upper layer, one which does know about semantics.


17.4 Maintaining the stack

This section discusses in detail the requirements for maintaining the stack. In some cases, such as implementation using a Perl array, fulfilling these requirements is trivial. Perl auto-extends its arrays, and initializes the element values, on every read or write. For the C programmer, things are not quite so easy.

In this section, we will assume a C90 or C99 standard-conformant C application. This assumption is convenient on two grounds. First, this will be the intended use for many readers. Second, standard-conformant C is a “worst case”. Any issue faced by a programmer of another environment is likely to also be one that must be solved by the C programmer.

Libmarpa often optimizes away unnecessary stack writes to stack locations. When it does so, it will not necessarily optimize away all reads to that stack location. This means that a location’s first access, as suggested by the Libmarpa step instructions, may be a read. This possibility requires a special awareness from the C programmer, as discussed in the sections Sizing the stack and Initializing locations in the stack.

In the discussions in this document, stack locations are non-negative integers. The bottom of the stack is location 0. In moving from the bottom of the stack to the top, the numbers increase. Stack location Y is said to be “greater” than stack location X if stack location Y is closer to the top of stack than location X, and therefore stack locations are considered greater or lesser if the integers that represent them are greater or lesser. Another way to state that a stack location Y is greater (lesser) than stack location X is to say that a stack location Y is later (earlier) than stack location X.


17.4.1 Sizing the stack

If an implementation applies Libmarpa’s step instructions literally, using a physical stack, it must make sure the stack is large enough. Specifically, the application must do the following

Three aspects of these requirements deserve special mention. First, note that the requirement for a MARPA_STEP_RULE is that the application size the stack to include the arguments to be read. Because stack writes may be optimized away, an application, when reading, cannot assume that the stack was sized appropriately by a prior write. The first access to a new stack location may be a read.

Second, note that there is no explicit requirement that the application size the stack to include the location for the result of the MARPA_STEP_RULE step. An application is allowed to assume that result will go into one of the locations that were read.

Third, special note should be made of the requirement that location 0 exist. By convention, the parse result resides in location 0 of the stack. Because of potential optimizations, an application cannot assume that it will receive a Libmarpa step instruction that either reads from or writes to location 0.


17.4.2 Initializing locations in the stack

Write optimizations also creates issues for implementations which require data to be initialized before reading. Every fully standard-conforming C application is such an implementation. Both C90 and C99 allow “trap values”, and therefore conforming applications must be prepared for an uninitialized location to contain one of those. Reading a trap value may cause an abend. (It is safe, in standard-conforming C, to write to a location containing a trap value.)

The requirement that locations be initialized before reading occurs in other implementations. Any implementation that has a “universe” of “safe” values, may require special precautions. The required precautions may amount to a need to initialize “uninitialized” values. A practical example might be an implementation that expects all locations to contain a pointer which it can safely indirect from. In such implementations, just as in standard-conformant C, every stack location needs to be initialized before being read.

Due to write optimizations, an application cannot rely on Libmarpa’s step instructions to initialize every stack location before its first read. One way to safely deal with the initialization of stack locations, is to do all of the following:

Applications which try to optimize out some of these initializations need to be aware that an application can never assume that activity in the stack is safely “beyond” an uninitialized location. Libmarpa steps often revisit earlier sections of the stack, and these revisits may include reads of previously unvisited stack locations.


17.5 Creating a new valuator

Function: Marpa_Value marpa_v_new ( Marpa_Tree t )

Creates a new valuator. The parent object of the new valuator will be the tree iterator t, and the reference count of the new valuator will be 1. The reference count of t is increased by 1.

The parent tree iterator is “paused”, so that the tree iterator cannot move on to a new parse tree until the valuator is destroyed. Many valuators of the same parse tree can exist at once. A tree iterator is “unpaused” when all of the valuators of that tree iterator are destroyed.

Return value: On success, the newly created valuator. On failure, NULL.


17.6 Reference counting

Function: Marpa_Value marpa_v_ref (Marpa_Value v)

Increases the reference count by 1. Not needed by most applications.

Return value: On success, v. On failure, NULL.

Function: void marpa_v_unref ( Marpa_Value v)

Decreases the reference count by 1, destroying v once the reference count reaches zero. Beginning with v’s parent tree, Libmarpa then proceeds up the chain of parent objects. Every time a child is destroyed, the reference count of its parent is decreased by 1. Every time the reference count of an object is decreased by 1, if that reference count is now zero, that object is destroyed. Libmarpa follows this chain of decrements and destructions as required, all the way back to the base grammar, if necessary.


17.7 Stepping through the valuator

Function: Marpa_Step_Type marpa_v_step ( Marpa_Value v)

This method “steps through” the valuator. The return value is a Marpa_Step_Type, an integer which indicates the type of step. How the application is expected to act on each step is described below. See Valuator steps by type. When the iteration through the steps is finished, marpa_v_step returns MARPA_STEP_INACTIVE.

Return value: On success, the type of the step to be performed. This will always be a non-negative number. On failure, -2.


17.8 Valuator steps by type

Macro: Marpa_Step_Type MARPA_STEP_RULE

The semantics of a rule should be performed. The application can find the value of the rule’s children in the stack locations from marpa_v_arg_0(v) to marpa_v_arg_n(v). The semantics for the rule whose ID is marpa_v_rule(v) should be executed on these child values, and the result placed in marpa_v_result(v). In the case of a MARPA_STEP_RULE step, the stack location of marpa_v_result(v) is guaranteed to be equal to marpa_v_arg_0(v).

Macro: Marpa_Step_Type MARPA_STEP_TOKEN

The semantics of a non-null token should be performed. The application’s value for the token whose ID is marpa_v_token(v) should be placed in stack location marpa_v_result(v). Its value according to Libmarpa will be in marpa_v_token_value(v).

Macro: Marpa_Step_Type MARPA_STEP_NULLING_SYMBOL

The semantics for a nulling symbol should be performed. The ID of the symbol is marpa_v_symbol(v) and its value should be placed in stack location marpa_v_result(v).

Macro: Marpa_Step_Type MARPA_STEP_INACTIVE

The valuator has gone through all of its steps and is now inactive. The value of the parse will be in stack location 0. Because of optimizations, it is possible for valuator to immediately became inactive — MARPA_STEP_INACTIVE could be both the first and last step.

Macro: Marpa_Step_Type MARPA_STEP_INITIAL

The valuator is new and has yet to go through any steps.

Macro: Marpa_Step_Type MARPA_STEP_INTERNAL1
Macro: Marpa_Step_Type MARPA_STEP_INTERNAL2
Macro: Marpa_Step_Type MARPA_STEP_TRACE

These step types are reserved for internal purposes.


17.9 Basic step accessors

The basic step accessors are so called because their information is basic to the stack manipulation. The basic step accessors are implemented as macros. They always succeed.

Macro: int marpa_v_arg_0 (Marpa_Value v)

For a MARPA_STEP_RULE step, returns the stack location where the value of first child can be found.

Macro: int marpa_v_arg_n (Marpa_Value v)

For a MARPA_STEP_RULE step, returns the stack location where the value of the last child can be found.

Macro: int marpa_v_result (Marpa_Value v)

For MARPA_STEP_RULE, MARPA_STEP_TOKEN, and MARPA_STEP_NULLING_SYMBOL steps, returns the stack location where the result of the semantics should be placed.

Macro: Marpa_Rule_ID marpa_v_rule (Marpa_Value v)

For the MARPA_STEP_RULE step, returns the ID of the rule.

Macro: Marpa_Step_Type marpa_v_step_type (Marpa_Value v)

Returns the current step type: MARPA_STEP_TOKEN, MARPA_STEP_RULE, etc. Usually not needed since this is also the return value of marpa_v_step().

Macro: Marpa_Symbol_ID marpa_v_symbol (Marpa_Value v)

For the MARPA_STEP_NULLING_SYMBOL step, returns the ID of the symbol. The value returned is the same as that returned by the marpa_v_token() macro.

Macro: Marpa_Symbol_ID marpa_v_token (Marpa_Value v)

For the MARPA_STEP_TOKEN step, returns the ID of the token. The value returned is the same as that returned by the marpa_v_symbol() macro.

Macro: int marpa_v_token_value (Marpa_Value v)

For the MARPA_STEP_TOKEN step, returns the integer which is (or which represents) the value of the token.


17.10 Other step accessors

This section contains the step accessors that are not basic to stack manipulation, but which provide other useful information about the parse. These step accessors are implemented as macros.

All of these accessors always succeed, but if called when they are irrelevant they return an unspecified value. In this context, an “unspecified value” is a value that is either -1 or the ID of a valid Earley set, but which is otherwise unpredictable.

Macro: Marpa_Earley_Set_ID marpa_v_es_id (Marpa_Value v)

Return value: If the current step type is MARPA_STEP_RULE, the Earley Set ordinal where the rule ends. If the current step type is MARPA_STEP_TOKEN or MARPA_STEP_NULLING_SYMBOL, the Earley Set ordinal where the symbol ends. If the current step type is anything else, an unspecified value.

Macro: Marpa_Earley_Set_ID marpa_v_rule_start_es_id (Marpa_Value v)

Return value: If the current step type is MARPA_STEP_RULE, the Earley Set ordinal where the rule begins. If the current step type is anything else, an unspecified value.

Macro: Marpa_Earley_Set_ID marpa_v_token_start_es_id (Marpa_Value v)

Return value: If the current step type is MARPA_STEP_TOKEN or MARPA_STEP_NULLING_SYMBOL, the Earley Set ordinal where the token begins. If the current step type is anything else, an unspecified value.


18 Events


18.1 Overview

Events are generated by the marpa_g_precompute(), marpa_r_earleme_complete(), and marpa_r_start_input() methods. The methods are called event-active. Event-active methods always clear all previous events, so that after an event-active method the only events available will be those generated by that method.

Events are volatile, and it is expected that events will be queried immediately after the method that generated them. Note especially that multiple recognizers using the same base grammar overwrite each other’s events.

To find out how many events were generated by the last event-active method, use the marpa_g_event_count method.

To query a specific event, use the marpa_g_event and marpa_g_event_value methods.


18.2 Methods

Function: Marpa_Event_Type marpa_g_event (Marpa_Grammar g, Marpa_Event* event, int ix)

On success, the type of the ix’th event is returned and the data for the ix’th event is placed in the location pointed to by event.

Event indexes are in sequence. Valid events will be in the range from 0 to n, where n is one less than the event count. The event count can be queried using the marpa_g_event_count() method.

Return value: On success, the type of event ix. If there is no ix’th event, if ix is negative, or on other failure, -2. On failure, the locations pointed to by event are not changed.

Function: int marpa_g_event_count ( Marpa_Grammar g )

Return value: On success, the number of events. On failure, -2.

Macro: int marpa_g_event_value (Marpa_Event* event)

This macro provides access to the “value” of the event. The semantics of the value varies according to the type of the event, and is described in the section on event codes. See Event codes.


18.3 Event codes

Macro: int MARPA_EVENT_NONE

Applications should never see this event. Event value: Undefined. Suggested message: "No event".

Macro: int MARPA_EVENT_COUNTED_NULLABLE

A nullable symbol is either the separator for, or the right hand side of, a sequence. Event value: The ID of the symbol. Suggested message: "This symbol is a counted nullable".

Macro: int MARPA_EVENT_EARLEY_ITEM_THRESHOLD

This event indicates the current Earley item count exceeded a user-settable threshold. Event value: The current Earley item count. Suggested message: "Too many Earley items".

Macro: int MARPA_EVENT_EXHAUSTED

The parse is exhausted. Event value: Undefined. Suggested message: "Recognizer is exhausted".

Macro: int MARPA_EVENT_LOOP_RULES

One or more rules are loop rules — rules that are part of a cycle. Cycles are pathological cases of recursion, in which the same symbol string derives itself a potentially infinite number of times. Nonetheless, Marpa parses in the presence of these, and it is up to the application to treat these as fatal errors, something they almost always will wish to do. Event value: The count of loop rules. Suggested message: "Grammar contains a infinite loop".

Macro: int MARPA_EVENT_NULLING_TERMINAL

A nulling symbol is also a terminal. Event value: The ID of the symbol. Suggested message: "This symbol is a nulling terminal".

Macro: int MARPA_EVENT_SYMBOL_COMPLETED

The recognizer can be set to generate an event a symbol is completed using its marpa_g_symbol_is_completion_event_set() method. (A symbol is "completed" if and only if any rule with that symbol as its LHS is completed.) This event code indicates that one of those events occurred. Event value: The ID of the completed symbol. Suggested message: "Completed symbol".

Macro: int MARPA_EVENT_SYMBOL_EXPECTED

The recognizer can be set to generate an event when a symbol is expected as a terminal, using its marpa_r_expected_symbol_event_set() method. Note that this event only triggers if the symbol is expected as a terminal. Predicted symbols which are not expected as terminals do not trigger this event. This event code indicates that one of those events occurred. Event value: The ID of the expected symbol. Suggested message: "Expecting symbol".

Macro: int MARPA_EVENT_SYMBOL_NULLED

The recognizer can be set to generate an event when a symbol is nulled – that is, recognized as a zero-length symbol. To set an nulled symbol event, use the recognizer’s marpa_r_nulled_symbol_event_set() method. This event code indicates that a nulled symbol event occurred. Event value: The ID of the nulled symbol. Suggested message: "Symbol was nulled".

Macro: int MARPA_EVENT_SYMBOL_PREDICTED

The recognizer can be set to generate an event when a symbol is predicted. To set an predicted symbol event, use the recognizer’s marpa_g_symbol_is_prediction_event_set() method. Unlike the MARPA_EVENT_SYMBOL_EXPECTED event, the MARPA_EVENT_SYMBOL_PREDICTED event triggers for predictions of both non-terminals and terminals. This event code indicates that a predicted symbol event occurred. Event value: The ID of the predicted symbol. Suggested message: "Symbol was predicted".


19 Error methods, macros and codes


19.1 Error methods

Function: Marpa_Error_Code marpa_g_error ( Marpa_Grammar g, const char** p_error_string)

When a method fails, this method allows the application to read the error code. p_error_string is reserved for use by the internals. Applications should set it to NULL.

Return value: The last error code from a Libmarpa method. Always succeeds.

Function: Marpa_Error_Code marpa_g_error_clear ( Marpa_Grammar g )

Sets the error code to MARPA_ERR_NONE. Not often used, but now and then it can be useful to force the error code to a known state.

Return value: MARPA_ERR_NONE. Always succeeds.


19.2 Error Macros

Macro: int MARPA_ERRCODE_COUNT

The number of error codes. All error codes, whether internal or external, will be integers, non-negative but strictly less than MARPA_ERRCODE_COUNT.


19.3 External error codes

This section lists the external error codes. These are the only error codes that users of the Libmarpa external interface should ever see. Internal error codes are in their own section. See Internal error codes.

Macro: int MARPA_ERR_NONE

No error condition. The error code is initialized to this value. Methods which do not result in failure sometimes reset the error code to MARPA_ERR_NONE. Numeric value: 0. Suggested message: "No error".

Macro: int MARPA_ERR_BAD_SEPARATOR

A separator was specified for a sequence rule, but its ID was not that of a valid symbol. Numeric value: 6. Suggested message: "Separator has invalid symbol ID".

Macro: int MARPA_ERR_BEFORE_FIRST_TREE

A tree iterator is positioned before the first tree, and it was specified in a context where that is not allowed. A newly created tree is positioned before the first tree. To position a newly created tree iterator to the first tree use the marpa_t_next() method. Numeric value: 91. Suggested message: "Tree iterator is before first tree".

Macro: int MARPA_ERR_COUNTED_NULLABLE

A “counted” symbol was found that is also a nullable symbol. A “counted” symbol is one that appears on the RHS of a sequence rule. If a symbol is nullable, counting its occurrences becomes difficult. Questions of definition and problems of implementation arise. At a minimum, a sequence with counted nullables would be wildly ambigious.

Sequence rules are simply an optimized shorthand for rules that can also be written in ordinary BNF. If the equivalent of a sequence of nullables is really what your application needs, nothing in Libmarpa prevents you from specifying that sequence with ordinary BNF rules.

Numeric value: 8. Suggested message: "Nullable symbol on RHS of a sequence rule".

Macro: int MARPA_ERR_DUPLICATE_RULE

This error indicates an attempt to add a BNF rule which is a duplicate of a BNF rule already in the grammar. Two BNF rules are considered duplicates if

Duplication of sequence rules, and duplication between BNF rules and sequence rules, is dealt with by requiring that the LHS of a sequence rule not be the LHS of any other rule.

Numeric value: 11. Suggested message: "Duplicate rule".

Macro: int MARPA_ERR_DUPLICATE_TOKEN

This error indicates an attempt to add a duplicate token. A token is a duplicate if one already read at the same earleme has the same symbol ID and the same length. Numeric value: 12. Suggested message: "Duplicate token".

Macro: int MARPA_ERR_YIM_COUNT

This error code indicates that an implementation-defined limit on the number of Earley items per Earley set was exceedeed. This limit is different from the Earley item warning threshold, an optional limit on the number of Earley items in an Earley set, which can be set by the application.

The implementation defined-limit is very large, at least 500,000,000 earlemes. An application is unlikely ever to see this error. Libmarpa’s use of memory would almost certainly exceed the implementation’s limits before it occurred. Numeric value: 13. Suggested message: "Maximum number of Earley items exceeded".

Macro: int MARPA_ERR_EVENT_IX_NEGATIVE

A negative event index was specified. That is not allowed. Numeric value: 15. Suggested message: "Negative event index".

Macro: int MARPA_ERR_EVENT_IX_OOB

An non-negative event index was specified, but there is no event at that index. Since the events are in sequence, this means it was too large. Numeric value: 16. Suggested message: "No event at that index".

Macro: int MARPA_ERR_GRAMMAR_HAS_CYCLE

The grammar has a cycle — one or more loop rules. This is a recoverable error, although most applications will want to treat it as fatal. For more see the description of marpa_g_precompute. Numeric value: 17. Suggested message: "Grammar has cycle".

Macro: int MARPA_ERR_HEADERS_DO_NOT_MATCH

This is an internal error, and indicates that Libmarpa was wrongly built. Libmarpa was compiled with headers which do not match the rest of the code. The solution is to find a correctly built Libmarpa. Numeric value: 98. Suggested message: "Internal error: Libmarpa was built incorrectly"

Macro: int MARPA_ERR_I_AM_NOT_OK

The Libmarpa base grammar is in a "not ok" state. Currently, the only way this can happen is if Libmarpa memory is being overwritten. Numeric value: 29. Suggested message: "Marpa is in a not OK state".

Macro: int MARPA_ERR_INACCESSIBLE_TOKEN

This error code indicates that the token symbol is an inaccessible symbol — one which cannot be reached from the start symbol.

Since the inaccessibility of a symbol is a property of the grammar, this error code typically indicates an application error. Nevertheless, a retry at this location, using another token ID, may succeed. At this writing, the author knows of no uses of this technique.

Numeric value: 18. Suggested message: "Token symbol is inaccessible".

Macro: int MARPA_ERR_INVALID_BOOLEAN

A function was called that takes a boolean argument, but the value of that argument was not either 0 or 1. Numeric value: 22. Suggested message: "Argument is not boolean".

Macro: int MARPA_ERR_INVALID_LOCATION

The location (Earley set ID) is not valid. It may be invalid for one of two reasons:

For users of input models other than the standard one, the term “location”, as used in association with this error code, means Earley set ID or Earley set ordinal. In the standard input model, this will always be identical with Libmarpa’s other idea of location, the earleme.

Numeric value: 25. Suggested message: "Location is not valid".

Macro: int MARPA_ERR_INVALID_START_SYMBOL

A start symbol was specified, but its symbol ID is not that of a valid symbol. Numeric value: 27. Suggested message: "Specified start symbol is not valid".

Macro: int MARPA_ERR_INVALID_ASSERTION_ID

A method was called with an invalid assertion ID. This is a assertion ID which not only does not exist, but cannot exist. Currently that means its value is less than zero. Numeric value: 96. Suggested message: "Assertion ID is malformed".

Macro: int MARPA_ERR_INVALID_RULE_ID

A method was called with an invalid rule ID. This is a rule ID which not only does not exist, but cannot exist. Currently that means its value is less than zero. Numeric value: 26. Suggested message: "Rule ID is malformed".

Macro: int MARPA_ERR_INVALID_SYMBOL_ID

A method was called with an invalid symbol ID. This is a symbol ID which not only does not exist, but cannot exist. Currently that means its value is less than zero. Numeric value: 28. Suggested message: "Symbol ID is malformed".

Macro: int MARPA_ERR_MAJOR_VERSION_MISMATCH

There was a mismatch in the major version number between the requested version of libmarpa, and the actual one. Numeric value: 30. Suggested message: "Libmarpa major version number is a mismatch".

Macro: int MARPA_ERR_MICRO_VERSION_MISMATCH

There was a mismatch in the micro version number between the requested version of libmarpa, and the actual one. Numeric value: 31. Suggested message: "Libmarpa micro version number is a mismatch".

Macro: int MARPA_ERR_MINOR_VERSION_MISMATCH

There was a mismatch in the minor version number between the requested version of libmarpa, and the actual one. Numeric value: 32. Suggested message: "Libmarpa minor version number is a mismatch".

Macro: int MARPA_ERR_NO_EARLEY_SET_AT_LOCATION

A non-negative Earley set ID (also called an Earley set ordinal) was specified, but there is no corresponding Earley set. Since the Earley set ordinals are in sequence, this means that the specified ID is greater than that of the latest Earley set. Numeric value: 39. Suggested message: "Earley set ID is after latest Earley set".

Macro: int MARPA_ERR_NOT_PRECOMPUTED

The grammar is not precomputed, and attempt was made to do something with it that is not allowed for unprecomputed grammars. For example, a recognizer cannot be created from a grammar until it is precomputed. Numeric value: 34. Suggested message: "This grammar is not precomputed".

Macro: int MARPA_ERR_NO_PARSE

The application attempted to create a bocage from a recognizer without a parse. Applications will often want to treat this as a soft error. Numeric value: 41. Suggested message: "No parse".

Macro: int MARPA_ERR_NO_RULES

A grammar which has no rules is being used in a way that is not allowed. Usually the problem is that the user is trying to precompute the grammar. Numeric value: 42. Suggested message: "This grammar does not have any rules".

Macro: int MARPA_ERR_NO_START_SYMBOL

The grammar has no start symbol, and an attempt was made to perform an operation which requires one. Usually the problem is that the user is trying to precompute the grammar. Numeric value: 43. Suggested message: "This grammar has no start symbol".

Macro: int MARPA_ERR_NO_SUCH_ASSERTION_ID

A method was called with an assertion ID which is well-formed, but the assertion does not exist. Numeric value: 97. Suggested message: "No assertion with this ID exists".

Macro: int MARPA_ERR_NO_SUCH_RULE_ID

A method was called with a rule ID which is well-formed, but the rule does not exist. Numeric value: 89. Suggested message: "No rule with this ID exists".

Macro: int MARPA_ERR_NO_SUCH_SYMBOL_ID

A method was called with a symbol ID which is well-formed, but the symbol does not exist. Numeric value: 90. Suggested message: "No symbol with this ID exists".

Macro: int MARPA_ERR_NO_TOKEN_EXPECTED_HERE

This error code indicates that no tokens at all were expected at this earleme location. This can only happen in alternative input models.

Typically, this indicates an application programming error. Retrying input at this location will always fail. But if the application is able to leave this earleme empty, a retry at a later location, using this or another token, may succeed. At this writing, the author knows of no uses of this technique.

Numeric value: 44. Suggested message: "No token is expected at this earleme location".

Macro: int MARPA_ERR_NOT_A_SEQUENCE

This error occurs in situations where a rule is required to be a sequence, and indicates that the rule of interest is, in fact, not a sequence.

Numeric value: 99. Suggested message: "Rule is not a sequence".

Macro: int MARPA_ERR_NULLING_TERMINAL

Marpa does not allow a symbol to be both nulling and a terminal. Numeric value: 49. Suggested message: "A symbol is both terminal and nulling".

Macro: int MARPA_ERR_ORDER_FROZEN

The Marpa order object has been frozen. If a Marpa order object is frozen, it cannot be changed.

Multiple tree iterators can share a Marpa order object, but that order object is frozen after the first tree iterator is created from it. Applications can order an bocage in many ways, but they must do so by creating multiple order objects.

Numeric value: 50. Suggested message: "The ordering is frozen".

Macro: int MARPA_ERR_PARSE_EXHAUSTED

The parse is exhausted. Numeric value: 53. Suggested message: "The parse is exhausted".

Macro: int MARPA_ERR_PARSE_TOO_LONG

The parse is too long. The limit on the length of a parse is implementation dependent, but it is very large, at least 500,000,000 earlemes.

This error code is unlikely in the standard input model. Almost certainly memory would be exceeded before it could occur. If an application sees this error, it almost certainly using one of the non-standard input models.

Most often this messsage will occur because of a request to add a single extremely long token, perhaps as a result of an application error. But it is also possible this error condition will occur after the input of a large number of long tokens.

Numeric value: 54. Suggested message: "This input would make the parse too long".

Macro: int MARPA_ERR_POINTER_ARG_NULL

In a method which takes pointers as arguments, one of the pointer arguments is NULL, in a case where that is not allowed. One such method is marpa_r_progress_item(). Numeric value: 56. Suggested message: "An argument is null when it should not be".

Macro: int MARPA_ERR_PRECOMPUTED

An attempt was made to use a precomputed grammar in a way that is not allowed. Often this is an attempt to change the grammar. Nearly every change to a grammar after precomputation invalidates the precomputation, and is therefore not allowed. Numeric value: 57. Suggested message: "This grammar is precomputed".

Macro: int MARPA_ERR_PROGRESS_REPORT_NOT_STARTED

No recognizer progress report is currently active, and an action has been attempted which is inconsistent with that. One such action would be a marpa_r_progress_item() call. Numeric value: 59. Suggested message: "No progress report has been started".

Macro: int MARPA_ERR_PROGRESS_REPORT_EXHAUSTED

The progress report is “exhausted” — all its items have been iterated through. Numeric value: 58. Suggested message: "The progress report is exhausted".

Macro: int MARPA_ERR_RANK_TOO_LOW

A symbol or rule rank was specified which was less than an implementation-defined minimum. Implementations will always allow at least those ranks in the range between -134,217,727 and 134,217,727. Numeric value: 85. Suggested message: "Rule or symbol rank too low".

Macro: int MARPA_ERR_RANK_TOO_HIGH

A symbol or rule rank was specified which was greater than an implementation-defined maximum. Implementations will always allow at least those ranks in the range between -134,217,727 and 134,217,727. Numeric value: 86. Suggested message: "Rule or symbol rank too high".

Macro: int MARPA_ERR_RECCE_IS_INCONSISTENT

The recognizer is “inconsistent”, usually because the user has rejected one or more rules or terminals, and has not yet called the marpa_r_consistent() method. Numeric value: 95. Suggested message: "The recognizer is inconsistent.

Macro: int MARPA_ERR_RECCE_NOT_ACCEPTING_INPUT

The recognizer is not accepting input, and the application has attempted something that is inconsistent with that fact. Numeric value: 60. Suggested message: "The recognizer is not accepting input".

Macro: int MARPA_ERR_RECCE_NOT_STARTED

The recognizer has not been started. and the application has attempted something that is inconsistent with that fact. Numeric value: 61. Suggested message: "The recognizer has not been started".

Macro: int MARPA_ERR_RECCE_STARTED

The recognizer has been started. and the application has attempted something that is inconsistent with that fact. Numeric value: 62. Suggested message: "The recognizer has been started".

Macro: int MARPA_ERR_RHS_IX_NEGATIVE

The index of a RHS symbol was specified, and it was negative. That is not allowed. Numeric value: 63. Suggested message: "RHS index cannot be negative".

Macro: int MARPA_ERR_RHS_IX_OOB

A non-negative index of RHS symbol was specified, but there is no symbol at that index. Since the indexes are in sequence, this means the index was greater than or equal to the rule length. Numeric value: 64. Suggested message: "RHS index must be less than rule length".

Macro: int MARPA_ERR_RHS_TOO_LONG

An attempt was made to add a rule with too many right hand side symbols. The limit on the RHS symbol count is implementation dependent, but it is very large, at least 500,000,000 symbols. This is far beyond what is required in any current practical grammar. An application with rules of this length is almost certain to run into memory and other limits. Numeric value: 65. Suggested message: "The RHS is too long".

Macro: int MARPA_ERR_SEQUENCE_LHS_NOT_UNIQUE

The LHS of a sequence rule cannot be the LHS of any other rule, whether a sequence rule or a BNF rule. An attempt was made to violate this restriction. Numeric value: 66. Suggested message: "LHS of sequence rule would not be unique".

Macro: int MARPA_ERR_START_NOT_LHS

The start symbol is not on the LHS on any rule. That means it could never match any possible input, not even the null string. Presumably, an error in writing the grammar. Numeric value: 73. Suggested message: "Start symbol not on LHS of any rule".

Macro: int MARPA_ERR_SYMBOL_IS_NOT_COMPLETION_EVENT

An attempt was made to use a symbol in a way that requires it to be set up for completion events, but the symbol was not set set up for completion events. The archetypal case is an attempt to activate completion events for the symbol in the recognizer. The archetypal case is an attempt to activate a completion event in the recognizer for a symbol that is not set up as a completion event. Numeric value: 92. Suggested message: "Symbol is not set up for completion events".

Macro: int MARPA_ERR_SYMBOL_IS_NOT_NULLED_EVENT

An attempt was made to use a symbol in a way that requires it to be set up for nulled events, but the symbol was not set set up for nulled events. The archetypal case is an attempt to activate a nulled events in the recognizer for a symbol that is not set up as a nulled event. Numeric value: 93. Suggested message: "Symbol is not set up for nulled events".

Macro: int MARPA_ERR_SYMBOL_IS_NOT_PREDICTION_EVENT

An attempt was made to use a symbol in a way that requires it to be set up for predictino events, but the symbol was not set set up for predictino events. The archetypal case is an attempt to activate a prediction event in the recognizer for a symbol that is not set up as a prediction event. Numeric value: 94. Suggested message: "Symbol is not set up for prediction events".

Macro: int MARPA_ERR_SYMBOL_VALUED_CONFLICT

Unvalued symbols are a deprecated Marpa feature, which may be avoided with the marpa_g_force_valued method. An unvalued symbol may take on any value, and therefore a symbol which is unvalued at some points cannot safely to be used to contain a value at others. This error indicates that such an unsafe use is being attempted. Numeric value: 74. Suggested message: "Symbol is treated both as valued and unvalued".

Macro: int MARPA_ERR_TERMINAL_IS_LOCKED

An attempt was made to change the terminal status of a symbol to a different value after it was locked. Numeric value: 75. Suggested message: "The terminal status of the symbol is locked".

Macro: int MARPA_ERR_TOKEN_IS_NOT_TERMINAL

A token was specified whose symbol ID is not a terminal. Numeric value: 76. Suggested message: "Token symbol must be a terminal".

Macro: int MARPA_ERR_TOKEN_LENGTH_LE_ZERO

A token length was specified which is less than or equal to zero. Zero-length tokens are not allowed in Libmarpa. Numeric value: 77. Suggested message: "Token length must greater than zero".

Macro: int MARPA_ERR_TOKEN_TOO_LONG

The token length is too long. The limit on the length of a token is implementation dependent, but it is at least 500,000,000 earlemes. An application using a token that long is almost certain to run into some other limit. Numeric value: 78. Suggested message: "Token is too long".

Macro: int MARPA_ERR_TREE_EXHAUSTED

A Libmarpa parse tree iterator is “exhausted”, that is, it has no more parses. Numeric value: 79. Suggested message: "Tree iterator is exhausted".

Macro: int MARPA_ERR_TREE_PAUSED

A Libmarpa tree is “paused” and an operation was attempted which is inconsistent with that fact. Typically, this operation will be a call of the marpa_t_next() method. Numeric value: 80. Suggested message: "Tree iterator is paused".

Macro: int MARPA_ERR_UNEXPECTED_TOKEN_ID

An attempt was made to read a token where a token with that symbol ID is not expected. This message can also occur when an attempt is made to read a token at a location where no token is expected. Numeric value: 81. Suggested message: "Unexpected token".

Macro: int MARPA_ERR_UNPRODUCTIVE_START

The start symbol is unproductive. That means it could never match any possible input, not even the null string. Presumably, an error in writing the grammar. Numeric value: 82. Suggested message: "Unproductive start symbol".

Macro: int MARPA_ERR_VALUATOR_INACTIVE

The valuator is inactive in a context where that should not be the case. Numeric value: 83. Suggested message: "Valuator inactive".

Macro: int MARPA_ERR_VALUED_IS_LOCKED

Unvalued symbols are a deprecated Marpa feature, which may be avoided with the marpa_g_force_valued method. This error code indicates that the valued status of a symbol is locked, and an attempt was made to change it to a status different from the current one. Numeric value: 84. Suggested message: "The valued status of the symbol is locked".

Macro: int MARPA_ERR_SYMBOL_IS_NULLING

An attempt was made to do something with a nulling symbol that is not allowed. For example, the ID of a nulling symbol cannot be an argument to marpa_r_expected_symbol_event_set — because it is not possible to create an “expected symbol” event for a nulling symbol. Numeric value: 87. Suggested message: "Symbol is nulling".

Macro: int MARPA_ERR_SYMBOL_IS_UNUSED

An attempt was made to do something with an unused symbol that is not allowed. An “unused” symbol is a inaccessible or unproductive symbol. For example, the ID of a unused symbol cannot be an argument to marpa_r_expected_symbol_event_set — because it is not possible to create an “expected symbol” event for an unused symbol. Numeric value: 88. Suggested message: "Symbol is not used".


19.4 Internal error codes

An internal error code may be one of two things: First, it can be an error code which arises from an internal Libmarpa programming issue (in other words, something happening in the code that was not supposed to be able to happen.) Second, it can be an error code which only occurs when a method from Libmarpa’s internal interface is used. Both kinds of internal error message share one common trait — users of the Libmarpa’s external interface should never see them.

Internal error messages require someone with knowledge of the Libmarpa internals to follow up on them. They usually do not have descriptions or suggested messages.

Macro: int MARPA_ERR_AHFA_IX_NEGATIVE

Numeric value: 1.

Macro: int MARPA_ERR_AHFA_IX_OOB

Numeric value: 2.

Macro: int MARPA_ERR_ANDID_NEGATIVE

Numeric value: 3.

Macro: int MARPA_ERR_ANDID_NOT_IN_OR

Numeric value: 4.

Macro: int MARPA_ERR_ANDIX_NEGATIVE

Numeric value: 5.

Macro: int MARPA_ERR_BOCAGE_ITERATION_EXHAUSTED

Numeric value: 7.

Macro: int MARPA_ERR_DEVELOPMENT

"Development" errors were used heavily during Libmarpa’s development, when it was not yet clear how precisely to classify every error condition. Unless they are using a developer’s version, users of the external interface should never see development errors.

Development errors have an error string associated with them. The error string is a short 7-bit ASCII error string which describes the error. Numeric value: 9. Suggested message: "Development error, see string".

Macro: int MARPA_ERR_DUPLICATE_AND_NODE

Numeric value: 10.

Macro: int MARPA_ERR_YIM_ID_INVALID

Numeric value: 14.

Macro: int MARPA_ERR_INTERNAL

A “catchall” internal error. Numeric value: 19.

Macro: int MARPA_ERR_INVALID_AHFA_ID

The AHFA ID was invalid. There are no AHFAs any more, so this message should not occur. Numeric value: 20.

Macro: int MARPA_ERR_INVALID_AIMID

The AHM ID was invalid. The term “AIMID” is a legacy of earlier implementations and must be kept for backward compatibility. Numeric value: 21.

Macro: int MARPA_ERR_INVALID_IRLID

Numeric value: 23.

Macro: int MARPA_ERR_INVALID_NSYID

Numeric value: 24.

Macro: int MARPA_ERR_NOOKID_NEGATIVE

Numeric value: 33.

Macro: int MARPA_ERR_NOT_TRACING_COMPLETION_LINKS

Numeric value: 35.

Macro: int MARPA_ERR_NOT_TRACING_LEO_LINKS

Numeric value: 36.

Macro: int MARPA_ERR_NOT_TRACING_TOKEN_LINKS

Numeric value: 37.

Macro: int MARPA_ERR_NO_AND_NODES

Numeric value: 38.

Macro: int MARPA_ERR_NO_OR_NODES

Numeric value: 40.

Macro: int MARPA_ERR_NO_TRACE_YS

Numeric value: 46.

Macro: int MARPA_ERR_NO_TRACE_PIM

Numeric value: 47.

Macro: int MARPA_ERR_NO_TRACE_YIM

Numeric value: 45.

Macro: int MARPA_ERR_NO_TRACE_SRCL

Numeric value: 48.

Macro: int MARPA_ERR_ORID_NEGATIVE

Numeric value: 51.

Macro: int MARPA_ERR_OR_ALREADY_ORDERED

Numeric value: 52.

Macro: int MARPA_ERR_PIM_IS_NOT_LIM

Numeric value: 55.

Macro: int MARPA_ERR_SOURCE_TYPE_IS_NONE

Numeric value: 70.

Macro: int MARPA_ERR_SOURCE_TYPE_IS_TOKEN

Numeric value: 71.

Macro: int MARPA_ERR_SOURCE_TYPE_IS_COMPLETION

Numeric value: 68.

Macro: int MARPA_ERR_SOURCE_TYPE_IS_LEO

Numeric value: 69.

Macro: int MARPA_ERR_SOURCE_TYPE_IS_AMBIGUOUS

Numeric value: 67.

Macro: int MARPA_ERR_SOURCE_TYPE_IS_UNKNOWN

Numeric value: 72.


20 Design notes

Design apologetics are throughout this document, dispersed among sections where it is hoped that they motivate the description or discussion. The notes in this section did not find a home elsewhere.


20.1 Why so many time objects?

Marpa is an aggressively multi-pass algorithm. Marpa achieves its efficiency, not in spite of making multiple passes over the data, but because of it. Marpa regularly substitutes two fast O(n) passes for a single O(n log n) pass. Marpa’s proliferation of time objects is in keeping with its multi-pass approach.

Bocage objects come at no cost, even for unambiguous parses, because the same pass which creates the bocage also deals with other issues which are of major significance for unambiguous parses. It is the post-processing of the bocage pass that enables Marpa to do both left- and right-recursion in linear time.

Of the various objects, the best case for elimination is of the ordering object. In many cases, the ordering is trivial. Either the parse is unambiguous, or the application does not care about the order in which parses are returned. But while it would be easy to add an option to bypass creation of an ordering object, there is little to be gained from it. When the ordering is trivial, its overhead is very small — essentially a handful of subroutine calls. Many orderings accomplish nothing, but these cost next to nothing.

Tree objects come at minimal cost to unambiguous grammars, because the same pass that allows iteration through multiple parse trees does the tree traversal. This eliminates much of the work that otherwise would need to be done in the valuation time object. In the current implement, the valuation time object needs only to step through a sequence already determined in the tree iterator.


20.2 Numbered objects

As the name suggests, the choice was made to implement numbered objects as integers, and not as pointers. In standard-conformant C, integers can be safely checked for validity, while pointers cannot.

There are efficiency tradeoffs between pointers and integers but they are complicated, and they go both ways. Pointers can be faster, but integers can be used as indexes into more than one data structure. Which is actually faster depends on the design. Integers allow for a more flexible design, so that once the choice is settled on, careful programming can make them a win, possibly a very big one.

The approach taken in Libmarpa was to settle, from the outset, on integers as the implementation for numbered objects, and to optimize on that basis. The author concedes that it is possible that others redoing Libmarpa from scratch might find that pointers are faster. But the author is confident that they will also discover, on modern architectures, that the lack of safe validity checking is far too high a price to pay for the difference in speed.


20.3 LHS terminals

Marpa’s idea in losing the sharp division between terminals and non-terminals is that the distinction, while helpful for proving theorems, is not essential in practice. LHS symbols in the input might be useful for “short circuiting” the rules in which they occur. This may prove helpful in debugging, or have other applications.

However, it also can be useful, for checking input validity as well as for efficiency, to follow tradition and distinguish non-terminals from terminals. For this reason, the traditional behavior is the default in Libmarpa.


21 Work in Progress


21.1 Untested methods

The methods of this section are not in the external interface, because they have not been adequately tested. Their fate is uncertain. Users should regard these methods as unsupported.


21.1.1 Ranking methods

Function: Marpa_Rank marpa_g_default_rank_set ( Marpa_Grammar g, Marpa_Rank rank)
Function: Marpa_Rank marpa_g_default_rank ( Marpa_Grammar g)

These methods, respectively, set and query the default rank of the grammar. When a grammar is created, the default rank is 0. When rules and symbols are created, their rank is the default rank of the grammar.

Changing the grammar’s default rank does not affect those rules and symbols already created, only those that will be created. This means that the grammar’s default rank can be used to, in effect, assign ranks to groups of rules and symbols. Applications may find this behavior useful.

Return value: On success, returns the rank after the call, and sets the error code to MARPA_ERR_NONE. On failure, returns -2, and sets the error code to an appropriate value, which will never be MARPA_ERR_NONE. Note that when the rank is -2, the error code is the only way to distinguish success from failure. The error code can be determined by using the marpa_g_error() call.

Function: Marpa_Rank marpa_g_symbol_rank_set ( Marpa_Grammar g, Marpa_Symbol_ID sym_id, Marpa_Rank rank)
Function: Marpa_Rank marpa_g_symbol_rank ( Marpa_Grammar g, Marpa_Symbol_ID sym_id)

These methods, respectively, set and query the rank of a symbol sym_id. When sym_id is created, its rank initialized to the default rank of the grammar.

Return value: On success, returns the rank after the call, and sets the error code to MARPA_ERR_NONE. On failure, returns -2, and sets the error code to an appropriate value, which will never be MARPA_ERR_NONE. Note that when the rank is -2, the error code is the only way to distinguish success from failure. The error code can be determined by using the marpa_g_error() call.


21.1.2 Zero-width assertion methods

Function: Marpa_Assertion_ID marpa_g_zwa_new ( Marpa_Grammar g, int default_value)
Function: int marpa_g_zwa_place ( Marpa_Grammar g, Marpa_Assertion_ID zwaid, Marpa_Rule_ID xrl_id, int rhs_ix)
Function: int marpa_r_zwa_default ( Marpa_Recognizer r, Marpa_Assertion_ID zwaid)

On success, returns previous default value of the assertion.

Function: int marpa_r_zwa_default_set ( Marpa_Recognizer r, Marpa_Assertion_ID zwaid, int default_value)

Changes default value to default_value. On success, returns previous default value of the assertion.

Function: Marpa_Assertion_ID marpa_g_highest_zwa_id ( Marpa_Grammar g )

21.1.3 Methods for revising parses

Marpa allows an application to “change its mind” about a parse, rejected rule previously recognized or predicted, and terminals previously scanned. The methods in this section provide that capability.

Function: Marpa_Earleme marpa_r_clean ( Marpa_Recognizer r)

22 Deprecated techniques and methods


22.1 Valued and unvalued symbols


22.1.1 What unvalued symbols were

Libmarpa symbols can have values, which is the traditional way of doing semantics. Libmarpa also allows symbols to be unvalued. An unvalued symbol is one whose value is unpredictable from instance to instance. If a symbol is unvalued, we sometimes say that it has “whatever” semantics.

Situations where the semantics can tolerate unvalued symbols are surprisingly frequent. For example, the top-level of many languages is a series of major units, all of whose semantics are typically accomplished via side effects. The compiler is typically indifferent to the actual value produced by these major units, and tracking them is a waste of time. Similarly, the value of the separators in a list is typically ignored.

Rules are unvalued if and only if their LHS symbols are unvalued. When rules and symbols are unvalued, Libmarpa optimizes their evaluation.

It is in principle unsafe to check the value of a symbol if it can be unvalued. For this reason, once a symbol has been treated as valued, Libmarpa marks it as valued. Similarly, once a symbol has been treated as unvalued, Libmarpa marks it as unvalued. Once marked, a symbol’s valued status is locked and cannot be changed later.

The valued status of terminals is marked the first time they are read. The valued status of LHS symbols must be explicitly marked by the application when initializing the valuator — this is Libmarpa’s equivalent of registering a callback.

LHS terminals are disabled by default. If allowed, the user should be aware that the valued status of a LHS terminal will be locked in the recognizer if it is used as a terminal, and the symbol’s use as a rule LHS in the valuator must be consistent with the recognizer’s marking.

Marpa reports an error when a symbol’s use conflicts with its locked valued status. Doing so usually saves the programmer some tricky debugging further down the road.


22.1.2 Grammar methods dealing with unvalued symbols

Function: int marpa_g_symbol_is_valued_set ( Marpa_Grammar g, Marpa_Symbol_ID symbol_id, int value)
Function: int marpa_g_symbol_is_valued ( Marpa_Grammar g, Marpa_Symbol_ID symbol_id)

These methods, respectively, set and query the “valued status” of a symbol. Once set to a value with the marpa_g_symbol_is_valued_set() method, the valued status of a symbol is “locked” at that value. It cannot thereafter be changed. Subsequent calls to marpa_g_symbol_is_valued_set() for the same sym_id will fail, leaving sym_id’s valued status unchanged, unless value is the same as the locked-in value.

Return value: On success, 1 if the symbol symbol_id is valued after the call, 0 if not. If the valued status is locked and value is different from the current status, -2. If value is not 0 or 1; or on other failure, -2.


22.1.3 Registering semantics in the valuator

By default, Libmarpa’s valuator objects assume that non-terminal symbols have no semantics. The archetypal application will need to register symbols that contain semantics. The primary method for doing this is marpa_v_symbol_is_valued(). Applications will typically register semantics by rule, and these applications will find the marpa_v_rule_is_valued() method more convenient.

Function: int marpa_v_symbol_is_valued_set ( Marpa_Value v, Marpa_Symbol_ID sym_id, int status )
Function: int marpa_v_symbol_is_valued ( Marpa_Value v, Marpa_Symbol_ID sym_id )

These methods, respectively, set and query the valued status of symbol sym_id. marpa_v_symbol_is_valued_set() will set the valued status to the value of its status argument. A valued status of 1 indicates that the symbol is valued. A valued status of 0 indicates that the symbol is unvalued. If the valued status is locked, an attempt to change to a status different from the current one will fail (error code MARPA_ERR_VALUED_IS_LOCKED).

Return value: On success, the valued status after the call. If value is not either 0 or 1, or on other failure, -2.

Function: int marpa_v_rule_is_valued_set ( Marpa_Value v, Marpa_Rule_ID rule_id, int status )
Function: int marpa_v_rule_is_valued ( Marpa_Value v, Marpa_Rule_ID rule_id )

These methods, respectively, set and query the valued status for the LHS symbol of rule rule_id. marpa_v_rule_is_valued_set() sets the valued status to the value of its status argument.

A valued status of 1 indicates that the symbol is valued. A valued status of 0 indicates that the symbol is unvalued. If the valued status is locked, an attempt to change to a status different from the current one will fail (error code MARPA_ERR_VALUED_IS_LOCKED).

Rules have no valued status of their own. The valued status of a rule is always that of its LHS symbol. These methods are conveniences — they save the application the trouble of looking up the rule’s LHS.

Return value: On success, the valued status of the rule rule_id’s LHS symbol after the call. If value is not either 0 or 1, or on other failure, -2.

Function: int marpa_v_valued_force ( Marpa_Value v)

This methods locks the valued status of all symbols to 1, indicated that the symbol is valued. If this is not possible, for example because one of the grammar’s symbols already is locked at a valued status of 0, failure is returned.

Return value: On success, a non-negative number. On failure, returns -2, and sets the error code to an appropriate value, which will never be MARPA_ERR_NONE.