Trees (Libmarpa 11.0.10)

5.8 Trees

In this document, unless otherwise stated,

by tree, we mean a labeled ordered tree; and
by tree node, we mean a labeled ordered tree node.

For brevity, in contexts where the meaning is clear, we refer to a tree node simply as a node. A node is a pair:

The first element of a node is a “label tuple”. The label tuple is a triple of symbol ID, start Earley set ID, and end Earley set ID. For more about the Earley set IDs, see Input.
The second element of a node is a sequence of nodes. Our definition of a tree node is therefore recursive.

When looked at from the point of view of its labels, a node is often called an instance. In the following list of definitions and assertions, let

     nd = < < sym, start, end >, children >

be a tree node.

We say that sym is the symbol of nd.
We say that nd is an instance of the symbol with ID sym starting at start and ending at end. We often write this as sym@start-end.
We say that nd is an instance of the symbol with ID sym at location end.
We say that the length of nd is the difference between its start and end, that is end-start.
The length of nd is zero iff start is the same as end. Put another way, the length of nd is zero iff start = end.
We say that the elements of children are the children of nd.
We say that every element of children is a child of nd.
For brevity, we say that the symbol sym is at end. We note that this means we consider the location of a symbol to be where it ends.
nd is a leaf node iff children is the empty list. A leaf node is also called a leaf.
nd is a rule node iff it is not a leaf node.
Every node is either a leaf node or a rule node. No node is both a leaf and a rule node.
We say that nd is a terminal node iff nd is a leaf node and sym is a terminal. A terminal node is also called a token node.
We say that nd is a nulled node iff nd is a leaf node and sym is not a terminal. A nulled node is also called a nulling node.
Every leaf node is either a nulled node or a terminal node. But, because nullable LHS terminals are not allowed, no node is both nulled and terminal.
We say that nd is a BNF node iff nd is not a terminal node and sym is the LHS of a BNF rule.
We say that nd is a sequence node iff nd is not a terminal node and sym is the LHS of a sequence rule.
Every node is a terminal node, a BNF node or a sequence node. But no node is more than one of the these three. This is because sequence rules never share a LHS with a BNF rule, and no BNF node or sequence node is a terminal node.
If nd is a rule node, its LHS is sym.
If nd is a rule node, its RHS is the concatenation, from first to last, of the symbols of the nodes in children.
All nulled nodes are zero-length. No terminal node is zero-length.
We say that nd is an instance of sym starting at start and ending at end. We also say that nd is an instance of sym at end or, simply, that nd is an instance of sym.
Let r be the BNF rule whose LHS is equal to the LHS of nd, and whose RHS is equal to the RHS of nd. If nd is a BNF rule node, there must be such a rule. In that case, we say that nd is an instance of r starting at start and ending at end. We often write this as r@start-end. We also say that nd is an instance of r at end or, simply, that nd is an instance of r.
Let r be the sequence rule whose LHS is equal to the LHS of nd. If nd is a sequence rule node, there must be such a rule. In that case, we say that nd is an instance of r starting at start and ending at end. We often write this as r@start-end. We also say that nd is an instance of r at end or, simply, that nd is an instance of r.
If nd is a nulled instance, we say that sym is nulled at location end or, simply, that the symbol sym is nulled. We can write this as nulled@end-end.

Let nd1 and nd2 be two nodes. If nd2 is a child of nd1, then nd1 is the parent of nd2.

We define ancestor recursively such that nd1 is the ancestor of a node nd2 iff one of the following are true:

nd1 and nd2 are the same node. In this case we say that nd1 is the trivial ancestor of nd2.
nd1 is the parent of an ancestor of nd2. In this case we say that nd1 is a proper ancestor of nd2.

Simlarly, we define descendant recursively such that nd1 is the descendant of a node nd2 iff one of the following are true:

nd1 and nd2 are the same node. In this case we say that nd1 is the trivial descendant of nd2.
nd1 is the parent of an descendant of nd2. In this case we say that nd1 is a proper descendant of nd2.

A tree is its own root node. That implies that, in fact, tree and node are just two different terms for the same thing. We usually speak of trees when we are thinking of the nodes/trees as a collection of nodes, and we speak of nodes when we are more focused on the individual nodes.

We sometimes call a root node, a start node. We do this especially in contexts where a tree is “top-level”, that is, not the child of any other tree under discussion.

If t1 and t2 are trees, and t2 (considered as a node) is a descendant of t1, then we say t2 is a subtree of t1. By this definition, every tree is a subtree of itself.

A parse forest is a set of one or more parse trees.

We use “parse” as a noun in several senses. Depending on context a parse may be

a parse run, that is, an instance of running a parser on an input using a grammar;
a parse tree; or
a parse forest.

When the meaning of “parse” is not clear in context, we will be explicit about which sense is intended.