In India, Pannini creates an exact and complete description of the Sanskrit language, including pronunciation. Sanskrit could be recreated using nothing but Pannini's grammar. Pannini's grammar is probably the first formal system of any kind, predating Euclid. Even today, nothing like it exists for any other natural language of comparable size or corpus. Pannini is the object of serious study today. But in the 1940's and 1950's Pannini is almost unknown in the West. Pannini will have no direct effect on the other events in this timeline.

Andrey Markov introduces his

In 1913, Markov revisits his chains, applying them to the sequence of vowels and consonants in Pushkin's Eugene Onegin[3]. Again, Markov's interest is not in parsing. Nonetheless, this is an application of what later will be regarded as a parsing technique to language, and apparently for the first time in the West.

Emil Post defines and studies a formal rewriting system[4] using productions. With this, the process of rediscovering Pannini in the West begins.

Alan Turing discovers the stack as part of his design of the ACE machine. This is important in parsing because recursive parsing requires stacks. The importance of Turing's discovery is not noticed at the time and stacks will be rediscovered many times over the next two decades[5].

Claude Shannon publishes the foundation paper of information theory[6]. In this paper, Shannon models English using Andrey Markov's chains[7]. The approach is similar to Markov's but the intent seems to be different -- Shannon's is a serious attempt at a contribution to parsing human languages.

From 1949 to 1951 at the ETH Zurich, Heinz Rutishauser worked on the design of what we would now call a compiler[8]. Rutishauser's arithmetic expression parser did not honor precedence but it does allow nested parentheses. It is perhaps the first algorithm which can really be considered a parsing method. Rutishauser's compiler was never implemented.

In the form of arithmetic expressions, operator expressions are the target of the first efforts at automatic parsing. We will not see this issue go away. Very informally[9], we can say that an operator expression is an expression built up from operands and operators. It is expected that the operand might be another operator expression, so operator expressions raise the issue of recursion.

The archetypal examples of operator expressions are arithmetic expressions:

2+(3*4) 13^2^-5*9/11 1729-42*8675309

In Western mathematics arithmetic expressions were read according to traditional ideas of associativity and precedence:

`^`is exponentiation. It right associates and has tightest[10] precedence.-
Multiplication (
`*`) and division (`/`) left associate. They have a precedence equal to each other and less tight than that of exponentiation. - Addition ('+') and subtraction ('-') left associate. They have a precedence equal to each other and less tight than that of multiplication and division.
- Parentheses, when present, override the traditional associativity and precedence.

Certainly in the first attempts at parsing in compilers, arithmetic expressions are central. The first compiled languages are organized line-by-line and, except for arithmetic expressions, lines are interpreted using basic string manipulations. It is only when those string manipulations turn to dealing with arithmetic expressions that they become sophisticated enough to be called parsing techniques. And only then does a parsing theory to describe those techniques become helpful.

Rutishauser's language, and in fact all languages before LISP, are structured line-by-line. No language before ALGOL is truly block-structured.

The line-by-line languages are parsed using string manipulations. A parsing theory is not helpful for describing these ad hoc string manipulations, so they don't give rise to one. The only logic in these early compilers that really deserves to be called a parsing method is that which tackles arithmetic expressions.

During 1950, Corrado Boehm, also at the ETH Zurich develops his own compiler. Rutishauser and Boehm are working at the same institution at the same time, but Boehm is unaware of Rutishauser's work until his own is complete. Boehm's is also the first self-compiling compiler -- it is written in its own language.

Like Rutishauser, Boehm's language is line-by-line and
parsed ad hoc, except for expressions. Boehm's expression parser
*does*
honor precedence, making it perhaps the first operator precedence
parser[11].
In Norvell's taxonomy[12],
Boehm's algorithm inaugurates
the
classic approach

to operator parsing.

Boehm's compiler also allows parentheses, but the two cannot be mixed -- an expression can either be parsed using precedence or have parentheses, but not both. And like Rutishauser's, Boehm's compiler is never implemented[13].

Grace Hopper writes a linker-loader,
and calls it a

Hopper used the term
to compose out of materials from other
documents

[15].
Specifically, in 1952,
subroutines were new,
and automated programming
(what we would come to call
compiling

)
often was viewed as providing a interface
for calling a collection of
carefully chosen assembler subroutines[16].
Hopper's new
program took this subroutine calling
one step further -- instead of calling the subroutines
it expanded them (or in Hopper's terms
compiled

them) into
a single program.
Since Hopper the term

As we have seen, programs that today we
*would*
call compilers
were envisioned before Hopper,
but nobody called these programs compilers -- compiling in our modern sense
was called
automatic coding

,
codification automatique

or
Rechenplanfertigung

[18].

Kleene discovers regular languages[19]. Kleene does not use regular expression notation, but his regular languages are the idea behind it.

Knuth
calls Glennie's the first
real

[20]
compiler, by which he means that
it was actually implemented and used by someone to translate
algebraic statements into machine language.
Glennie's AUTOCODE
was very close to the machine -- just above machine language.
It did not allow operator expressions.

AUTOCODE was hard-to-use,
and apparently saw little use by anybody but
Glennie himself.
Because
Glennie worked for the British atomic weapons projects his papers
were routinely classified,
so even the indirect influence of AUTOCODE was
slow to spread.
Nonetheless, many other

At IBM, a team under John Backus begins working on the language which will be called FORTRAN.

The term

Noam Chomsky is awarded a Ph.D. in linguistics and accepts a teaching post at MIT. MIT does not have a linguistics department and Chomsky is free to teach his own approach. Chomksy's linguistics course is highly original and very mathematical.

At Purdue, a team including Alan Perlis and Joseph Smith begins work on the IT compiler[23].

Perlis and Smith, now at the Carnegie Institute of Technology, finish the IT compiler[24]. Don Knuth calls this

the first reallyusefulcompiler. IT and IT's derivatives were used successfully and frequently in hundreds of computer installations until [their target] the [IBM] 650 became obsolete. [... P]revious systems were important steps along the way, but none of them had the combination of powerful language and adequate implementation and documentation needed to make a significant impact in the use of machines.[25]

The IT language had arithmetic expressions, of a sort -- parentheses are honored, but otherwise evaluation is always right-to-left -- and there is no operator precedence. The IT compiler's way of doing arithmetic expressions proves very unpopular: Donald Knuth reports that

The lack of operator priority (often called precedence or hierarchy) in the IT language was the most frequent single cause of errors by the users of that compiler.[26]

In the 1956 document describing the IT compiler[27],
the IT team is careful to define the term.
Their definition makes clear that
they are using the word

Chomsky publishes the paper which is usually considered the foundation of Western formal language theory[28]. Chomsky demolishes the idea that natural language grammar can be modeled using only Markov chains. Instead, the paper advocates a natural language approach that uses three layers:

- Chomsky uses
Markov's chains
as his
**bottom layer**. This becomes the modern compiler's**lexical phase**. - Chomsky's
**middle layer**uses context-free grammars and context-sensitive grammars. These are his own discoveries[29]. This middle layer becomes the**syntactic phase**of modern compilers. - Chomsky's
**top layer**, again his own discovery, maps ortransforms the output of the middle layer. Chomsky's top layer is the inspiration for the AST transformation phase of modern compilers.

Chomsky is a turning point, so much so that he establishes or settles the
meaning of many of the terms we are using.
A

In contrast to a parser,
a
yes

or
no

--
yes

if the string is in the language described by the recognizer,
no

otherwise.
Note that,
if we intend to do semantics,
a recognizer alone is not sufficient.

In the strict sense, a recognizer cannot driver a compiler -- a compiler needs a parser. But recognizers can be far easier to write, and are often hacked up and pressed into service as parsers. For example, if your semantics is simple and non-recursive, you might be able to drive your semantics phase with the output of a regex engine, using captures.

Noam Chomsky publishes
Syntactic Structures[30], one of the most important books of
all time.
The orthodoxy in 1957 is structural linguistics which
argues, with Sherlock Holmes, that
it is a capital mistake
to theorize in advance of the facts

.
Structuralists start
with the utterances in a language, and build upward.

But Chomsky claims that without a theory there are no facts: there is only noise. The Chomskyan approach is to start with a grammar, and use the corpus of the language to check its accuracy. Chomsky's approach will soon come to dominate linguistics.

From here on, parsing theory divides into Chomskyan and non-Chomskyan. Chomskyan parsing theory becomes and remains the mainstream but it is far from unchallenged.

Above we defined a

As of 1957, we are calling the description
of a Chomskyan middle layer,
a

Backus's team makes the first FORTRAN compiler available to IBM customers. FORTRAN is the first high-level language that will find widespread implementation. As of this writing, it is the oldest language that survives in practical use.

FORTRAN I is a line-by-line language. Its parsing is non-Chomskyan. But it includes one important discovery. FORTRAN allows expressions and, learning from the dissatisfaction with the IT compiler, FORTRAN honors associativity and precedence.

The designers of FORTRAN used a strange trick for parsing operator expresssions -- they hacked the expressions by adding parentheses around each operator. The number of parentheses varied, depending on the operator. Surprisingly, this works. In fact, once the theoretical understanding of operator precedence comes about, the FORTRAN I implementation is recognized as a hackish and inefficient way of implementing the classic operator precedence algorithm.

John McCarthy's LISP appears. LISP goes beyond the line-by-line syntax -- it is recursively structured. But the LISP interpreter does not find the recursive structure: the programmer must explicitly indicate the structure herself, using parentheses. Similarly, LISP does not have operator expressions in the usual sense -- associativity and precedence must be specified with parentheses.

Backus discovers a new notation to describe the IAL language[31]. According to Backus's recollection, his paper[32] has only one reader: Peter Naur[33]. The IAL language will soon be renamed ALGOL.

Samuelson and Bauer[34]
describe the use of stacks to implement
operator precedence.
Samuelson and Bauer do not use the word

Their algorithm, and its presentation, are thoroughly non-Chomskyan. Samuelson and Bauer do not provide a context-free grammar for their grammar, just an operator precedence table.

Since Boehm, many people have been refining operator precedence. With Samuelson and Bauer, what Norvell[35] calls "the classic algorithm" takes a well-documented form. The Samuelson and Bauer paper was very influential.

The ALGOL 60 report[36] specifies, for the first time, a block structured language. ALGOL 60 is recursively structured but the structure is implicit -- newlines are not semantically significant, and parentheses indicate syntax only in a few specific cases. The ALGOL compiler will have to find the structure.

With the ALGOL 60 report, a quest begins which continues to this day: to find a parser that solves the Parsing Problem. Ideally such a parser is

- efficient,
- practical,
- declarative, and
- general.

It is a case of 1960's optimism at its best. As the ALGOL committee is well aware, a parsing algorithm capable of handling ALGOL 60 does not yet exist. But the risk they are taking has immediate results -- 1961 will see a number of discoveries that are still important today. On the other hand, the parser they seek remains elusive for decades.

In the ALGOL 60 report[37], Peter Naur improves the Backus notation and uses it to describe the language. This bring Backus' notation to wide attention. The improved notation will become known as Backus-Naur Form (BNF).

For our purposes, a parser is

A parser is

A general parser is a parser that will parse any grammar that can be written in BNF[38].

The first description of a consciously non-Chomskyan compiler
seems to predate the first description of a Chomskyan parser.
It is A.E. Gleenie's 1960 description of his compiler-compiler[39].
Glennie's

Glennie uses BNF,
but he does
*not*
use it in the way
Chomsky, Backus and Naur intended.
Glennie is using BNF to describe a
**procedure**.
In true BNF, the order of the rules does not matter --
in Glennie's pseudo-BNF order matters very much.
This means that, for most practical grammars,
Glennie's pseudo-BNF and BNF proper do
**not**
describe the same language.

Glennie is well aware of what he his doing, and
is not attempting to deceive anyone --
he points out that the distinction
between his procedural pseudo-BNF and declarative BNF,
and clearly warns his reader that the difference is
important

[40].

In January, Ned Irons publishes a paper describing his ALGOL
60 parser.
It is the first paper to fully describe any parser.
The Irons algorithm is pure Chomskyan
and top-down with a bottom-up

The Irons parser is declarative and general. And since it is general, operator expressions are within the power of the Irons parser[42].

A top-down parser deduces the consituents of a rule from the rule.
That is, it looks at the rule first,
and then deduces what is on its RHS.
Thus, it works from the start rule, and works
top down

until it arrives at the input tokens.

It is important to note that no useful parser can be purely top-down --
if a parser worked purely pure top-down, it would never look at its input.
So every top-parser we will consider has
*some*
kind of bottom-up element.
That bottom-up element may be very simple -- for example, one character lookahead.
The
Irons 1961
parser,
like most modern top-down parsers,
has a sophisticated bottom-up element.

A bottom-up parser deduces a rule from its constituents.
That is, it looks at either the input or at the LHS symbols
of previously deduced rules,
and from that deduces a rule.
Thus, it works from the input tokens, and works
bottom up

until it reaches the start rule.
A parser can be be purely bottom-up,
but for efficiency it is usually best to eliminate potential
parses based on the grammar,
and that requires some kind of top-down logic.

Irons 1961
also introduces synthesized attributes: the parse creates
a tree, which is evaluated bottom-up. Each node is evaluated using
attributes
synthesized

from its child nodes[43].

Peter Lucas publishes his description of a top-down parser[44]. Either Irons paper or this one can be considered to be the first description of recursive descent[45]. Except to say that he deals properly with them, Lucas does not say how he parses operator expressions. But it is easy to believe Lucas' claim -- by this time the techniques for parsing operator expressions are well understood[46].

In November 1961, Dijkstra publishes the
shunting yard

algorithm[47].
In Norvell's useful classification[48]
of operator expression parsers,
all
earlier parsers have been what he calls
the classic
algorithm

.

Dijkstra's approach is new. In the classic approach, the number of levels of precedence is hard-coded into the algorithm. Dijkstra's algorithm can handle any number of levels of precedence without a change in its code, and without any effect on its running speed.

The results of 1961 transformed the Operator Issue.
Before ALGOL,
parsing operator expressions essentially
*was*
parsing.
After ALGOL, almost all languages will be block-structured
and ad hoc string manipulatons are no longer adequate --
the language as a whole requires a serious parsing technique.
Parsing operator expressions becomes a side show,
or so it seems.

Why not use the algorithms that parse operator expressions for the whole language? Samuelson and Bauer 1959 had suggested exactly that. But, alas, operator expression parsing is not adequate for languages as a whole[49].

Also by 1961, we have BNF. This gives us a useful notation for describing grammars. It allows us to introduce our Basic Operator Grammar (BASIC-OP):

S ::= E E ::= E + T E ::= T T ::= T * F T ::= F F ::= number

BASIC-OP has two operators, three levels of precedence and left associativity. This was enough to challenge the primitive parsing techniques in use before 1961. Surprisingly, this simple grammar will continue to bedevil mainstream parsing theory for the next half a century.

Recursive descent, it turns out, cannot parse BASIC-OP because it is left recursive. And that is not the end of it. Making addition and multiplication right-associate is unnatural and, as the authors of the IT compiler found out, causes users to revolt. But suppose we try to use this Right-recursive Operator Grammar (RIGHT-OP) anyway:

S ::= E E ::= T + E E ::= T T ::= F * T T ::= F F ::= number

Recursive descent, without help, cannot parse RIGHT-OP. As of 1961, parsing theory has not developed well enough to state why in a precise terms. Suffice it to say for now that RIGHT-OP requires too much lookahead.

But recursive descent does have a huge advantage, one which, despite its severe limitations, will save it from obsolescence time and again. Hand-written recursive descent is essentially calling subroutines. Adding custom modification to recursive descent is very straight-forward.

In addition,
while pure recursive descent cannot
*parse*
operator expressions,
it can
*recognize*
them.
This means pure recursive descent may not be able to create
the parse subtree for an operator expression itself,
but it can recognize the expression and hand control
over to a specialized operator expression parser.
This seems to be what Lucas' 1961 algorithm did,
and it is certainly what many other implementations did afterwards.
Adding the operator expression subparser makes the implementation
only quasi-Chomskyan,
but this was a price the profession has
been willing to pay.

Alternatively, a recursive descent implementation can parse operator expressions as lists, and add associativity in post-processing. This pushes some of the more important parsing out of the syntactic phase into the semantics but, once again, it seemed that Chomskyan purity had to be thrown overboard if the ship was to stay afloat.

Bottom line: as of 1961 the Operator Issue takes a new form. Because of the Operator Issue, recursive descent is not sufficient for practical grammars -- it must always be part of a hybrid.

In this context, Dijkstra's new 1961 algorithm is a welcome alternative: as an operator expression subparser, it can parse operator expressions faster and in less space. But Dijkstra's algorithm has no more parsing power than the classic operator precedence algorithm -- it does nothing to change the basic tradeoffs.

Schorre publishes a paper on the Meta II
compiler
writing language

, summarizing the papers of the 1963
conference. Schorre cites both Backus and Chomsky as sources for
Meta II's notation.
Schorre notes that his parser is
entirely different

from that of
Irons 1961
--
in fact, it is non-Chomskyan.
Meta II is a template, rather than something that
his readers can use, but in principle it can be turned into a fully
automated compiler-compiler[50].

Don Knuth discovers LR parsing[51].
The LR algorithm is deterministic,
Chomskyan and bottom-up. Knuth is primarily interested in the
mathematics, and the parsing algorithm he gives is not practical. He
leaves development of practical LR parsing algorithms as an
open question

for
future research

[52].

Among
the goals
for the ideal
ALGOL parser was that it be
efficient

.
As of 1965 this notion becomes more precise,
thanks to a notation that Knuth borrows from
calculus.
This
big O

notation characterizes algorithms
using functions, but it treats functions that
differ by a constant multiple as identical[53].
Ignoring the constant

means that conclusions
drawn from
big O

results outlast steady improvements
in technology,
but capture non-linear jumps.

These big O functions take as their input some aspect of the algorithm's
input.
In the case of parsing, by convention,
the big O function is usually a function
of the length of the input[54].
Of the many possible
big O

functions,
only a few will be of interest in this timeline.

- A function which grows steadily as the input grows
is called
linear . Inbig O

notation,linear is written O(n). - A function which grows as the length of the input times
its logarithm is almost linear:
quasi-linear . Inbig O

notation,quasi-linear can be written O(n*log n). - A function which grows as the square of the length
is called
quadratic

-- O(n**2). - A function which grows as the cube of the length
is called
cubic

-- O(n**3). - In extreme cases,
a function can grow as a constant taken to the power
of the length, or
exponentially -- O(c**n).

By this time,
efficient

for a parser means
linear

or
quasi-linear

.
Parsers which operate in quadratic, cubic, or exponential
time are
considered impractical.
But parsers aimed at practitioners will often push the
envelope --
any parser which uses backtracking is potentially exponential
and is designed in the (often disappointed) hope
that the backtracking will not
get out of hand
for the grammar of interest to the user.

Knuth 1965 is a significant milestone in the history of operator expresssion parsing. Knuth specifically addresses the parsing of ALGOL and he zeroes in on its arithmetic expressions as the crucial issue[55]. Knuth suggests a "typical" grammar[56] which is short, but encapsulates the parsing challenges presented by ALGOL's arithmetic expressions:

S ::= E E ::= - T E ::= T E ::= E - T T ::= P T ::= T * P P ::= a P ::= ( E )

KNUTH-OP is left-recursive, allows parentheses, has three levels of precedence, and implements both unary and binary minus. Recursive descent cannot parse KNUTH-OP, but it is LR(1). The means that it is well within Knuth's new class of grammars and, by implication, probably within a practical subclass of the LR grammars.

In his 1965, Knuth makes a reasonable assumption, but one which turns out to be wrong:

Another important area of research is to develop algorithms that accept LR(k) grammars, or special classes of them, and to mechanically produce efficient parsing programs.[57]

Implied in this is
the belief that
better algorithms for parsing LR(k)
grammars
will either be parsers aimed specifically at LR(k)
or at subsets of LR(k).
That is, Knuth excludes the idea that algorithms which
target
**supersets**
of LR(k) might be faster than those
that take on LR(k) directly.
In short, he assumes, as is usually the case,
that as you gain in parsing power,
you lose in terms of speed.
That is a good rule of thumb,
and a reasonable expectation,
but it is not always true.

A factor in this misconception may be another assumption: the most efficient way to parse a deterministic language is with a deterministic parser. Certainly all deterministic parsers are linear. And certainly all non-deterministic parsers can run in worse than quasi-linear time, at which point they quickly become impractical. But neither of these truths directly answers the question of practical interest:

Is there a non-deterministic parser which is linear for the LR(k) grammars, or a superset of them?

This question will be answered by Joop Leo in 1991.
The answer, surprisingly, will be
yes

.

When Knuth discovered the LR grammars, he announced them to the world with a full-blown mathematical description. The top-down grammars, which arose historically, have lacked such a description. In 1968, Lewis and Stearns fill that gap by defining the LL(k) grammars[58].

When LL is added to the vocabulary of parsing, the meaning of
translatable from left to right

[59].
But LL means
scan from the left, using left reductions

and, in response, the meaning of LR shifts to become
scan from the left, using
right reductions

[60].

If there is a number after the parentheses in this notation for
parsing algorithms, it usually indicates the number of tokens of
lookahead.
As an example, LL(1) means
scan from the
left, using left reductions with one character of lookahead

.
LL(1) will be important in what follows.

With Knuth 1965 and Lewis and Stearns 1968, we can now restate the problem with recursive descent and operator expressions in precise terms: Recursive descent in its pure form, is LL(1). Arithmetic operator grammars are not LL(1) -- not even close. In fact, of the grammars BASIC-OP, RIGHT-OP, and KNUTH-OP, none is LL(k) for any k.

Compromises have to be made
if we are parsing with recursive descent.
As one result of these compromises,
truly
declarative versions of LL(1) are not used.
A pure declarative
LL(1) parser generator
*could*
be written, but it would not be able to parse arithmetic expressions
properly.

As mentioned, a common compromise is to have recursive descent parse arithmetic expressions as lists, and add associativity in post-processing. We are now able to look at this in more detail. An extended BNF grammar to recognize BASIC-OP as a list is as follows:

S ::= E E ::= T { TI }* TI ::= '+' T T ::= F { FI } * FI ::= 'x' F F ::= number

In the above
`{Z}*`
means
zero or more occurences of Z

.
Expanded into pure BNF,
and avoiding empty right hand sides,
our operator "list" grammar becomes
LIST-OP:

S ::= E E ::= T TL E ::= T TL ::= TI TL ::= TI TL TI ::= '+' T T ::= F FL T ::= F FL ::= FI FL ::= FI FL FI ::= 'x' F F ::= number

LIST-OP is LL(1), and therefore can be parsed directly by recursive descent. Note that in LIST-OP, although associativity must be provided by a post-processor, the grammar enforces precedence.

Jay Earley discovers the algorithm named after him[61]. Like the Irons algorithm, Earley's algorithm is Chomskyan, declarative and fully general. Unlike the Irons algorithm, it does not backtrack. Earley's algorithm is both top-down and bottom-up at once -- it uses dynamic programming and keeps track of the parse in tables. Earley's approach makes a lot of sense and looks very promising indeed, but it has three serious issues:

- First, there is a bug in the handling of zero-length rules.
- Second, it is quadratic for right recursions.
- Third, the bookkeeping required to set up the tables is, by the standards of 1968 hardware, daunting.

Irons' synthesized attributes had always been inadequate for many tasks. Until now, they had been supplemented by side effects or state variables. In 1968, Knuth publishes a paper on a concept he had been working for the previous few years: inherited attributes[62].

Recall that a node in a parse gets its synthesized attributes from its children. Inherited attributes are attibutes that a node gets from its parents. Of course, if both inherited and synthesized attributes are used, there are potential circularities. But inherited attributes are powerful and, with care, the circularities can be avoided.

An attribute grammar is a grammar whose nodes may have both inherited and synthesized attributes.

Since Knuth 1965 many have taken up his challenge to find a practically parseable subset of the LR(k) languages. In 1969, Frank DeRemer describes a new variant of Knuth's LR parsing[63]. DeRemer's LALR algorithm requires only a stack and a state table of quite manageable size. LALR looks practical, and can parse KNUTH-OP. LALR will go on to become the most widely used of the LR(k) subsets.

Ken Thompson writes the
`ed`
editor as one of the
first components of UNIX[64].
Before 1969,
regular expressions were an
esoteric mathematical formalism. Through the
`ed`
editor
and its descendants, regular expressions become an everyday
part of the working programmers toolkit.

Alfred Aho and Jeffrey Ullman publish the first volume[65] of their two volume textbook summarizing the theory of parsing. This book is still important. It is also distressingly up-to-date -- progress in parsing theory slowed dramatically after 1972. Aho and Ullman's version of Earley's algorithm includes a straightforward fix to the zero-length rule bug in Earley's original[66]. Unfortunately, this fix involves adding even more bookkeeping to Earley's.

Under the names TDPL and GTDPL, Aho and Ullman investigate
the non-Chomksyan parsers in the Schorre lineage[67]. They note that
it can be quite difficult to determine what language is
defined by a TDPL parser

[68].
At or around this time, rumor has it that the main line of
development for GTDPL parsers is classified secret by the US
government[69].
Public interest in GTDPL fades.

As we have noted pure LL(1) cannot parse operator expressions, and so operator expression parsers are often called into service as subparsers. What about making the entire grammar an operator grammar? This idea had already occurred to Samuelson and Bauer in 1959. In 1973, Vaughan Pratt revives it[70].

There are problems with switching to operator grammars. Operator grammars are non-Chomskyan -- the BNF no longer accurately describes the grammar. Instead the BNF becomes part of a combined notation, and the actual grammar parsed depends also on precedence, associativity, and semantics. And operator grammars have a very restricted form -- most practical languages are not operator grammars.

But many practical grammars are almost operator grammars. And the Chomskyan approach has always had its dissenters. Vaughn Pratt is one of these, and discovers a new approach to operator expression parsing: Pratt parsing is the third and last approach in Theodore Norvell's taxonomy of approaches to operator expression parsing[71]. Some have adopted Pratt parsing as the overall solution to their parsing problems[72].

As of 2018, the Pratt approach is not popular as an overall strategy. Under the name precedence climbing, it is most often used as a subparser within a recursive descent strategy. All operator expression subparsers break the Chomskyan paradigm so the non-Chomskyan nature of Pratt's parser is not a problem in this context.

Bell Labs converts its C compiler from hand-written recursive descent to DeRemer's LALR algorithm[73].

The first
dragon book

[74]
comes out. This soon-to-become
classic textbook is nicknamed after the drawing on the front cover,
in which a knight takes on a dragon. Emblazoned on the knight's lance
are the letters
LALR

. From here on out, to speak lightly
of LALR will be to besmirch the escutcheon of parsing theory.

Bell Laboratories releases Version 7 UNIX[75]. V7 includes what is, by far, the most comprehensive, useable and easily available compiler writing toolkit yet developed.

Part of the V7 toolkit is Yet Another Compiler Compiler
(`yacc`)[76].
`yacc`
is LALR-powered. Despite its name,
`yacc`
is the first
compiler-compiler in the modern sense.
For a few useful languages, the
process of going from Chomskyan specification to executable
is now fully automated.
Most practical languages, including the C language and
`yacc`'s own input language, still require manual hackery[77].

Recall the criteria outlined for a solution of the Parsing Problem: that the parser be efficient, practical, declarative and general. LALR is linear and runs fast on 1979 hardware. As of 1979 LALR seems practical. And LALR is declarative. True, LALR is very far from general, but BASIC-OP, RIGHT-OP KNUTH-OP and LIST-OP are all LALR, and LALR can parse most operator expressions directly.

Not all of the criteria of the original quest have been fully met. Nonetheless, after two decades of research, it seems as if the Parsing Problem and the Operator Issue can both be declared solved.

Larry Wall introduces Perl 1[78].
Perl embraces complexity like no
previous language.
Larry uses
`yacc`
and LALR very aggressively --
to my knowledge more aggressively than anyone before or since.

Wadler starts to introduce monads to the functional programming community. As one example he converts his previous efforts at function parsing techniques to monads[79]. The parsing technique is pure recursive descent. The examples shown for monadic parsing are very simple, and do not include operator expressions. In his earlier non-monadic work, Wadler uses a very restricted form of operator expression: His grammar avoided left recursion by using parentheses, and operator precedence was not implemented [80].

Joop Leo discovers a way of speeding up right recursions in Earley's algorithm[81]. Leo's algorithm is linear for just about every unambiguous grammar of practical interest, and many ambiguous ones as well. In 1991 hardware is six orders of magnitude faster than 1968 hardware, so that the issue of bookkeeping overhead had receded in importance. This is a major discovery. When it comes to speed, the game has changed in favor of the Earley algorithm.

Recall that
Knuth 1965, in suggesting lines of future research,
had excluded the possibility that a parser of a superset of the LR(k) languages
could be practical and linear.
Knuth did not conclude this carelessly --
he proved a strong relationship between the

But it was not a proof --
Earley's non-deterministic algorithm,
while not linear in general,
**is**
linear for a superset of Knuth's LR(k) grammars.
Is it practical?
At this point, Earley parsing is almost forgotten,
and few take notice of Earley's discovery.
Two decades will pass before anyone
attempts a practical implementation of
Leo 1991.

Earley's is forgotten. So everyone in LALR-land is content, right? Wrong. In fact, far from it.

As of 1979, LALR seemed practical.
But by the 1990s users of LALR are making
unpleasant discoveries.
While LALR automatically generates their
parsers, debugging them is so hard they could just as easily write the
parser by hand.
Once debugged, their LALR parsers are fast for correct
inputs.
But almost all they tell the users about incorrect inputs
is that they are incorrect.
In Larry's words, LALR is
fast
but stupid

[82].

On the written record, the discontent with LALR and
`yacc`
is hard to find.
What programmer wants to announce to colleagues and
potential employers that he cannot figure
how to make the standard state-of-the-art tool work?
But, by 1991,
a movement away from LALR has already begun.
Parsing practice falls back on
recursive descent
and the Operator Issue
is back in full force.

Combinator parsing
was introduced in two 1992 papers[83].
Of more interest
to us is the one by Hutton, which focuses on combinator parsing[84].
As Hutton explains[85]
a combinator is an attribute grammar
with one inherited attribute (the input string)
and two synthesized attributes (the value of the node
and the remaining part of the input string).
Combinators can be
combined

to compose other parsers.
This allows you to have an algebra of parsers.
It's an extremely attractive idea.

But, exciting as the new mathematics is, the underlying parsing theory contains nothing new -- it is the decades-old recursive descent, unchanged. Hutton's example language is extremely basic -- his operator expressions require left association, but not precedence. Left association is implemented by post-processing.

Hutton's main presentation does not use monads.
In his final pages, however,
Hutton points out
combinator parsers give rise to a monad

and shows
how his presentation could be rewritten in a form
closely related to monads[86].

In the adaptation of monads into the world of programming, Walder 1995 is a turning point. One of Wadler's examples is a parser based on monads and combinator parsing. Combinator parsing becomes the standard example of monad programming.

In its monadic form,
the already attractive technique of combinator parsing is extremely elegant
and certainly seems as if it
*should*
solve all parsing problems.
But, once again, it is still recursive descent,
completely unchanged.
Wadler handles left recursion by parsing into
a list, and re-processing that list
into left recursive form.
Wadler's example grammar only has one operator,
so the issue of precedence does not arise.

Instead of one paradigm to solve them all, Wadler ends up with a two layered approach, with a hackish bottom layer and an awkward interface. This undercuts the simplicity and elegance of the combinator approach.

Wadler 1995
used parsing as an example to motivate
its presentation of monads.
Graham Hutton and Erik Meijer[87]
take the opposite perspective -- they
write a paper on combinator parsing that could
also be viewed as a first introduction to the use
of monads in programming.

[88]

The most complicated grammar[89] for operator expressions in Hutton and Meijer 1996 has two operators (exponentiation and addition) with two levels of precedence and two different associativities: exponentiation associates right and addition associates left.

The solution in Hutton and Meijer 1996 will be familiar by now: Precedence is provided by the combinator parser which parses operators and operands of the same precedence into lists. Associativity is provided by re-processing the lists.

As we have seen, the literature introducing monads and combinator parsing added nothing to what was already known about parsing algorithms. We have focused on it for three reasons:

- Functional programming has had a profound influence on the way the profession sees parsing. That influence will probably grow.
- The literature shows the relative popularity among one group of highly skilled programmers of LALR and recursive descent. LALR is not mentioned once in any of the articles surveyed.
- The functional programming literature shows that the Operator Issue, once thought solved, is as live an issue in the 1990's as it was at the end of 1961.

Larry Wall decides on a radical reimplementation of Perl -- Perl 6. Larry does not even consider using LALR again.

John Aycock and R. Nigel Horspool publish their attempt at a fast, practical Earley's parser[90]. Missing from it is Joop Leo's improvement -- they seem not to be aware of it. Their own speedup is limited in what it achieves and the complications it introduces can be counter-productive at evaluation time. But buried in their paper is a solution to the zero-length rule bug. And this time the solution requires no additional bookkeeping.

Implementers by now are avoiding
`yacc`, but the demand for
declarative parsers remains.
In 2004,
Bryan Ford[91]
fills this gap by
repackaging the nearly-forgotten GTDPL.
Ford's new algorithm,
PEG,
is declarative, always linear,
and has an attractive new syntax.

But PEG is, in fact, pseudo-declarative -- it uses the BNF notation, but it does not parse the BNF grammar described by the notation: PEG achieves unambiguity by finding only a subset of the parses of its BNF grammar. And, as with its predecessor GTDPL, in practice it is usually impossible to determine what the subset is. This means that the best a programmer usually can do is to create a test suite and fiddle with the PEG description until it passes. Problems not covered by the test suite will be encountered at runtime.

PEG takes the old joke that
the code is
the documentation

one step further --
with PEG it is often the case that
nothing documents the grammar,
not even the code.
Under this circumstance, creating reliable software is impossible.
As it is usually used,
PEG is the nitroglycerin of LL(1) parsing --
slightly more powerful, but too dangerous to be worth it.

PEG, in safe use, would essentially be LL(1)[92]. Those adventurous souls who parse operator expressions under PEG typically seem to parse the operator expressions as lists, and re-process them in the semantics.

The old Bison-based C and Objective-C parser has been replaced by a new, faster hand-written recursive-descent parser.[93]

With this single line
in the middle of a change list hundreds of lines long,
GNU announces a major event in parsing history.
(Bison, an LALR parser, is the direct descendant of Steve Johnson's
`yacc`.)

For three decades, the industry's flagship C compilers have used LALR as their parser -- proof of the claim that LALR and serious parsing are equivalent. Now, GNU replaces LALR with the technology that LALR replaced a quarter century earlier: recursive descent.

Jeffrey Kegler (the author of this timeline) releases the first practical implementation of Joop Leo's algorithm[94]. The new parser, named Marpa, is general, declarative and linear for all the grammars discussed in Knuth 1965, that is, it is linear for the LR(k) grammars for all finite k[95]. This means it is also linear for most of the grammar classes we have discussed in this timeline.

- LR(1) grammars.
- LL(k) grammars for all finite k, including LL(1) grammars.
- Operator expression grammars[96].

This is a vast class of grammars, and it has the important feature that a programmer can readily determine if their grammar is linear under Marpa. Marpa will parse a grammar in linear time, if

- It is unambiguous[97].
- It has no unmarked middle recursions[98].
- It has no ambiguous right recursions[99].

The ability to ensure that a grammar can be parsed in linear time is highly useful for second-order languages -- a programmer can easily ensure that all his automatically generated grammars will be practical. Marpa's own DSL will make use of second-order programming.

Leo 1991 did not address the zero-length rule problem. Kegler solves it by adopting the solution of Aycock and Horspool 2002. Finally, in an effort to combine the best of declarative and hand-written parsing, Kegler reorders Earley's parse engine so that it allows procedural logic. As a side effect, this gives Marpa excellent error-handling properties.

While the purely precedence-driven parsing of Samuelson and Bauer 1960 and Pratt 1973 never caught on, the use of precedence in parsing remains popular. In fact, precedence tables often occur in documentation and standards, which is a clear sign that they too can play a role in human-readable but precise descriptions of grammars.

In 2012, Kegler releases a version of Marpa which allows
the user to mix Chomskyan and precedence parsing at will[100].
Marpa's
precedenced statements

describe expressions
in terms of precedence and association.
If
precedenced statements

are added to a grammar that Marpa parses in linear
time, then the grammar remains linear as long as the precedenced statements
are (after precedence is taken in consideration) unambiguous[101].

Here is an example of a Marpa
precedenced statement

:
[102]

Expression ::= Number | '(' Expression ')' assoc => group || Expression '**' Expression assoc => right || Expression '*' Expression | Expression '/' Expression || Expression '+' Expression | Expression '-' Expression

Previously operator expression parsers had worked by
comparing adjacent operators.
This limited them to binary expressions
(the ternary expressions in most languages are
implemented using post-parsing hackery).
Marpa's
precedenced expressions

are not operator-driven --
operators are used only to avoid ambiguity.
Precedenced statements

will directly implement any number of ternary,
quaternary or n-ary operators.
Ternary and larger expressions are useful in
financial calculations,
and are common in calculus.

Marpa allows operator expressions to be parsed as BNF, eliminating the need for them. On the other hand, Marpa's ability to handle second-order languages allows it to take statements which describe expressions in terms of precedence and association, and rewrite them in a natural way into Chomskyan form. In this way the grammar writer has all the benefits of Chomskyan parsing, but is also allowed to describe his grammar in terms of operators when that is convenient. Once again, the Operator Issue seems to be solved, but this time perhaps for real.

Again recall the four criteria from the original quest for a solution to the Parsing Problem: that a parser be efficient, practical, declarative and general. Does Marpa[103] fulfill them?

At first glance, no.
While Marpa is general, it is not linear or quasi-linear for the
general case,
and therefore cannot be considered efficient enough.
But to be general, a parser has to parse
**every**
context-free grammar, including
some wildly ambiguous ones, for which simply printing the
list of possible parses takes exponential time.
With the experience of the decades,
linearity for a fully general BNF parser seems to
be a superfluous requirement for a practical parser.

The LR(k) grammars include every other grammar class we have discussed -- LL(k) for all k, LALR, operator grammars, etc. If we change our criteria as follows:

- linear for all LR(k) grammars, and for all grammars parseable in linear time by any other parser in practical use;
- practical; and
- declarative.

then Marpa qualifies.

My teachers came from the generation which created the ALGOL vision and started the quest for a solution to the Parsing Problem. Alan Perlis was one of them. Would he have accepted my claim that I have found the solution to the problem he posed? I do not know, but I am certain, as anyone else who knew him can attest, that he would given me a plain-spoken answer. In that spirit, I submit this timeline to the candid reader. In any case, I hope that my reader has found this journey through the history of parsing informative.

An attempt is made to list the original publication, which is not necessarily the one consulted.

**Aho and Ullman 1972**:
Aho, Alfred V., and Jeffrey D. Ullman.

**Aho and Ullman 1973**:
Aho, Alfred V., and Jeffrey D. Ullman.

**Aho and Ullman 1977**:
Aho, Alfred V., and Jeffrey D. Ullman.
green dragon book

to distinguish
it from its second edition,
which had a red dragon on the cover.

**ALGOL 1960**:
Backus, John W.,
F. L. Bauer, J. Green, C. Katz, J. McCarthy, A. J. Perlis, H. Rutishauser,
K. Samelson, B. Vauquois, J. H. Wegstein, A. van Wijngaarden,
and M. Woodger.

**Aycock and Horspool 2002**:
Aycock, John, and R. Nigel Horspool.

**Backus 1959**:
Backus, John W.

**Backus 1980**:
Backus, John.

**Backus 2006**:
Booch, Grady.

**Carpenter and Doran 1977**:
Carpenter, Brian E., and Robert W. Doran.

**Chipps et al 1956**:
Chipps, J., et al.

**Chomsky 1956**:
Chomsky, Noam.

**Chomsky 1957**:
Chomsky, Noam.
Syntactic Structures.
Mouton & Co., 1957.

**Darwin 1984**:
Darwin, Ian F., and Geoffrey Collyer.

**Denning 1983**:
Denning, Peter.
Untitled introduction to
Irons 1961.

**DeRemer 1969**:
DeRemer, Franklin Lewis.

**Dijkstra 1961**:
Dijkstra, E. W.

**Earley 1968**:
Earley, Jay.

**Ford 2002**:
Ford, Bryan.

**Ford 2004**:
Ford, Bryan.

**Frost 1992**:
Frost, Richard A.

**FSF 2006**:
"GCC 4.1 Release Series: Changes, New Features, and Fixes."
Free Software Foundation,
http://gcc.gnu.org/gcc-4.1/changes.html.
Accessed 25 April 2018.
The release date of GCC 4.1 was February 28, 2006.

**Glennie 1960**:
Glennie, A. E.

**Grune and Jacobs 2008**:
Grune, D. and Jacobs, C. J. H.
Parsing Techniques: A Practical
Guide, 2nd edition.
Springer, 2008.
When it comes to determining who first discovered an idea,
the literature often disagrees.
In such cases, I have often deferred
to the judgement of Grune and Jacobs.

**Hayes 2013**:
Hayes, Brian.

**Hilgers and Langville 2006**:
Hilgers, Philipp, and Amy N. Langville.

**Hutton 1992**:
Hutton, Graham.

**Hutton and Meijer 1996**:
Hutton, Graham and Erik Meijer.

**Irons 1961**:
Irons, Edgar T.

**Johnson 1975**:
Johnson, Stephen C.

**Kegler 2010**:
Kegler, Jeffrey.

**Kegler 2012a**:
Kegler, Jeffrey.

**Kegler 2012b**:
Kegler, Jeffrey.

**Kleene 1951**:
Kleene, Stephen Cole.

**Knuth 1962**:
Knuth, Donald E.

**Knuth 1965**:
Knuth, Donald E.

**Knuth 1968**:
Knuth, Donald E.

**Knuth 1971**:
Knuth, Donald E.

**Knuth 1990**:
Knuth, Donald E.

**Knuth and Pardo 1976**:
Knuth, Donald E., and Luis Trabb Pardo.

**Leo 1991**:
Leo, Joop M. I. M.

**Lewis and Stearns 1968**:

**Lucas 1961**:
Lucas, Peter.

**Marpa::R2**:

Home website:
http://savage.net.au/Marpa.html.
Accessed 25 April 2018.

Kegler's Marpa website:
https://jeffreykegler.github.io/Marpa-web-site/.
Accessed 25 April 2018.

Github:
https://github.com/jeffreykegler/Marpa--R2.
Accessed 25 April 2018.

See also
Kegler 2012b
and
Marpa::R2 MetaCPAN.

**Marpa::R2 MetaCPAN**:
https://metacpan.org/pod/Marpa::R2.
Accessed 30 April 2018.

**Markov 1906**:
Markov, Andrey Andreyevich.

**Markov 1913**:
Markov, Andrey Andreyevich.

**Mascrenhas et al 2014**:
Mascarenhas, Fabio, Sèrgio Medeiros, and Roberto Ierusalimschy.

**McIlroy 1987**:
McIlroy, M. Douglas.
"A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971-1986,"
AT&T Bell Laboratories Computing Science Technical Report #139, 1987.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.692.4849&rep=rep1&type=pdf.
Accessed 25 April 2018.

**McIlroy and Kernighan 1979**:
McIlroy, M. D. and B. W. Kernighan.

**Moser 1954**:
Moser, Nora B.

**Norvell 1999**:
Norvell, Theodore.

**Padua 2000**:
Padua, David.

**Perlis et al 1958**:
Perlis, Alan J., J. W. Smith, and H. R. Van Zoeren.

**Post 1943**:
Post, Emil L.

**Pratt 1973**:
Pratt, Vaughan R.

**Rosencrantz and Stearns 1970**:
Rosenkrantz, Daniel J., and Richard Edwin Stearns.

**Samuelson and Bauer 1959**:
Samelson, Klaus, and Friedrich L. Bauer.
"Sequentielle formelübersetzung."
it-Information Technology 1.1-4 (1959): 176-182.

**Samuelson and Bauer 1960**:
Samelson, Klaus, and Friedrich L. Bauer.

**Schorre 1964**:
Schorre, D. V.

**Shannon 1948**:
Shannon, Claude E.

**Simms 2014**:
Simms, Damon.

**Snyder 1975**:
Snyder, Alan.

**Wadler 1985**:
Wadler, Philip.

**Wadler 1990**:
Wadler, Philip.

**Wadler 1995**:
Wadler, Philip.

**Wall 2014**:
Wall, Larry.

**Wiki Big_O**:

**Wiki Perl**:

1. Markov 1906. ↩

2. Indirectly, Markov's purpose may have been to refute Nekrasov's proof of free will. Nekrasov pointed out that social statistics of all kinds follow the law of large numbers, which (up until then) had assumed events were independent. Since social events were human choices, Nekrasov saw this as a proof of free will. Markov's demonstration that the law of large numbers works just as well for dependent events undercut Nekrasov's proof. (Hayes 2013, pp. 92-93). ↩

3. Markov 1913. See Hayes 2013 and Hilgers and Langville 2006, pp. 155-157. ↩

6. Shannon 1948. ↩

7. Shannon 1948, pp. 4-6. See also Hilgers and Langville 2006, pp. 157-159. ↩

8. Knuth and Pardo 1976, pp. 29-35, 40. ↩

9. The formal apparatus for describing operator expressions is not fully formed as of 1949. ↩

10.
This timeline refers to precedence levels as
tight

or
loose

.
The traditional terminology is confusing: tighter
precedences are
higher

but traditionally the
precendence levels are also numbered and the higher
the number, the lower the precedence.
↩

11.
Many of the
firsts

for parsing algorithms in this
timeline are debatable,
and the history of operator precedence is especially murky.
Here I follow
Samuelson and Bauer 1960
in giving the priority to Boehm.
↩

12. Norvell 1999. ↩

13. Knuth and Pardo 1976, pp 35-42. ↩

14. Knuth and Pardo 1976, p. 50. ↩

15. The quoted definition is from Moser 1954, p. 15, as cited in Knuth and Pardo 1976, p. 51. ↩

16. In 1952, an interface that guided calls to subroutines was much more helpful than current programmers might imagine: "Existing programs for similar problems were unreadable and hence could not be adapted to new uses." (Backus 1980, p. 126) ↩

17. I hope nobody will read this terminological clarification as in any sense detracting from Hopper's achievement. Whatever it is called, Hopper's program was a major advance, both in terms of insight and execution, and her energetic followup did much to move forward the events in this timeline. Hopper has a big reputation and it is fully deserved. ↩

18. Knuth and Pardo 1976, p. 50. ↩

19. Kleene 1951. ↩

20. Knuth and Pardo 1976, p 42. ↩

21. Knuth and Pardo 1976, pp. 42-49. ↩

22. Backus 1980, pp. 133-134. ↩

23. Knuth and Pardo 1976, pp. 83-86. ↩

24. Perlis et al 1958. Knuth and Pardo 1976, p. 83, state that the IT compiler "was put into use in October, 1956." ↩

25. Knuth and Pardo 1976, p. 84. ↩

26. Knuth 1962. The 1956 date for IT is from Padua 2000. ↩

27. Chipps et al 1956. ↩

28. Chomsky 1956. ↩

29. Chomsky seems to have been unaware of Post's work -- he does not cite it. ↩

30. Chomsky 1957. ↩

31. Backus's notation is influenced by his study of Post -- he seems not to have read Chomsky until later. See Backus 2006, p. 25 and Backus 1980, p. 133. ↩

32. Backus 1959. ↩

33. Backus 1980, pp. 132-133. ↩

34. Samuelson and Bauer 1960. ↩

35. Norvell 1999. ↩

36. ALGOL 1960. ↩

37. ALGOL 1960. ↩

38.
As a pedantic point, a general parser need only parse

39. Glennie 1960. ↩

40. Glennie 1960. ↩

41.
Irons 1961.
Among those who state that
Irons 1961
parser is what
is now called

42. Although it seems likely that parsing operator expressions would require backtracking, and therefore could be inefficient. ↩

43. Irons is credited with the discovery of synthesized attributes by Knuth (Knuth 1990). Synthesized attributes are a concept in semantics, which this timeline usually does not cover. But synthesized attributes will be important for us. ↩

44. Lucas 1961. ↩

45.
Grune and Jacobs 2008
call Lucas the discoverer of recursive descent.
In fact, both
Irons 1961
and
Lucas 1961
are recursive descent with a major bottom-up element.
Perhaps Grune and Jacobs based their decision on the Lucas' description of his algorithm,
which talks about his parser's bottom-up element only briefly,
while describing the top-down element in detail.
Also, the Lucas algorithm more resembles modern implementations of recursive descent
much more closely than
Irons 1961.

In conversation, Irons described his 1961 parser as a kind of recursive descent.
Peter Denning
(Denning 1983)
gives priority to Irons and,
as a grateful former student of Irons,
that is of course my preference on
a personal basis.
↩

46. Lucas 1961 cites Samuelson and Bauer 1960. ↩

47. Dijkstra 1961. ↩

48. Norvell 1999. ↩

49. But see the entry for 1973 on Pratt parsing (Pratt 1973) where the idea of parsing entire languages as operator grammars is revisited. ↩

50. Schorre 1964, p. D1.3-1. ↩

51. Knuth 1965.. ↩

52. Knuth 1965, p. 637. ↩

53. This timeline is not a mathematics tutorial, and I have ignored important complications to avoid digression. For interested readers, the details of "big O" notation can be worth learning: Wiki Big_O. ↩

54.
Because constants are ignored, all reasonable measures of
the length are equivalent for
big O

notation.
Some papers on parsing also consider the size of the grammar,
but usually size of the grammar is regarded as a fixed constant.
In this timeline all
big O

results will be
in terms of the input length.
↩

55. Knuth also discusses some ambiguities, and some intermixing of semantics with syntax, in the ALGOL report. But Knuth (quite appropriately) notes that BNF was new when it was written, and treats these other issues as problems with the ALGOL specification. (See Knuth 1965, pp. 622-625.) ↩

56. Knuth 1965, p. 621. ↩

57. Knuth 1965, p. 637. ↩

58. Lewis and Stearns 1968. They are credited in Rosencrantz and Stearns 1970 and Aho and Ullman 1972, p. 368. ↩

59. Knuth 1965, p. 610. See also on p. 611 "corresponds with the intuitive notion of translation from left to right looking k characters ahead". ↩

60.
Knuth 1971, p. 102.
LL and LR have mirror images: RL means
scan from the right,
using left reductions

and RR acquires its current meaning
of
scan from the right, using right reductions

.
Practical use of these
mirror images is rare, but it may have occurred
in one of the algorithms in our timeline
--
operator expression parsing
in the IT compiler seems to have been RL(2) with backtracking.
↩

61. Earley 1968. ↩

62. Knuth 1968. ↩

63. DeRemer 1969 ↩

64. Darwin 1984, Section 3.1, "An old editor made new". ↩

65. Aho and Ullman 1972. ↩

66. Aho and Ullman 1972, p 321. ↩

67. Aho and Ullman 1972, pp. 456-485. ↩

68. Aho and Ullman 1972, p. 466. ↩

69. Simms 2014. ↩

70. Pratt 1973. ↩

71. Norvell 1999. ↩

72. Pratt 1973, p. 51. ↩

73. Synder 1975. See, in particular, pp. 67-68. ↩

74. Aho and Ullman 1977. ↩

75. McIlroy and Kernighan 1979. ↩

76. Johnson 1975. ↩

77. See McIlroy 1987, p. 11, for a listing of yacc-powered languages in V7 Unix. ↩

78.
Wiki Perl,

79. Walder 1990 ↩

80. Wadler 1985. ↩

83. Some sources consider Wadler 1985 an early presentation of the ideas of combinator parsing. In assigning priority here, I follow Grune and Jacobs 2008 (p. 564). ↩

84. The paper which is devoted to parsing is Hutton 1992. The other paper, which centers on combinators as a programming paradigm, is Frost 1992. Frost only mentions parsing in one paragraph, and that focuses on implementation issues. Some of his grammars include operator expressions, but these avoid left recursion, and implement precedence and associativity, using parentheses. ↩

85. Hutton 1992, p. 5. ↩

86. Hutton 1992, pp. 19-20. ↩

87. Hutton and Meijer 1996. ↩

88. Hutton and Meijer 1996, p. 3. ↩

89. Hutton and Meijer 1996, p. 18. A simpler arithmetic expression grammar on p. 15 raises no issues not present in the more complicated example. ↩

90. Aycock and Horspool 2002. ↩

91. Ford 2004. See also Ford 2002. ↩

92. Mascrenhas et al 2014. My account of PEG is negative because almost all real-life PEG practice is unsafe, and almost all of the literature offers no cautions against unsafe use of PEG. Ford's PEG is efficient and does in fact have real, if niche, uses. Programmers interested in the safe use of PEG should consult Mascrenhas et al 2014. ↩

94. Kegler 2010. ↩

95. In fact, Leo 1991 proves his algorithm, and therefore Marpa, is linear for the LR-regular grammars -- the LR grammars with infinite lookahead, so long as the lookahead is a regular expresion. It is not known (in fact not decidable, see Leo 1991, p. 175.), just how large a class of grammars Marpa is linear for. But it is known that there are both ambiguous and unambiguous grammars for which Marpa is not linear. In the general case, Marpa obeys the bounds of Earley's algorithm -- it is worst-case O(n**2) for unambiguous grammars; and worst-case O(n**3) for ambiguous grammars. On the other hand, Marpa follows Leo 1991 in being linear for many ambiguous grammars, including those of most practical interest. ↩

96.
In the literature operator expression grammars are more often
call simply
operator grammars

.
↩

97. In the general case, ambiguity is undecidable, but in practice it is usually straight-forward for a programmer to determine that the grammar he wants is unambiguous. Note the while the general case is undecidable, Marpa will tell the programmer if a parse (a grammar with a particular input) is ambiguous and, since it parses ambiguous grammars, produces an error message showing where the ambiguity is. ↩

98.
A
marker

, in this sense, is something in the input
that shows where the middle of a middle recursion is,
without having to go to the end and count back.
Whether or not a grammar is unmarked is, in general,
an undecidable problem.
But in practice, there is a easy rule:
if a person can
eyeball

the middle of a long
middle recursion, and does not need to count from both
ends to find it,
the grammar is
marked

.
↩

99.
This is a very pedantic requirement,
which practical programmers can in fact ignore.
Any programmer who has found and eliminated
ambiguities in his grammar will usually,
in the process,
also have found and
eliminated ambiguous right recursions,
since it is far easier to eliminate them than
to determine is they create a overall ambiguity.
In fact,
it is an open question whether there
*are*
any unambiguous
grammars with ambiguous right recursions --
neither Joop Leo or I know of any.
↩

100. Kegler 2012a. ↩

101. Internally, Marpa rewrites precedenced statements into BNF. Marpa's rewrite does not use middle recursions and does not introduce any new ambiguities so that, following Marpa's linearity rules, the result must be linear. ↩

102.
In this, precedence goes from tightest to loosest.
Single bars (`|`) separate RHS's of equal precedence.
Double bars (`||`) separate RHS's of different precedence.
Associativity defaults to left,
and exceptions are indicated by the value of the
`assoc`
adverb.
Note that these
precedenced statements

,
like all of Marpa's DSL,
are parsed by Marpa itself.
The statement shown has the semantic adverbs removed for clarity.
The original is part of the
Marpa::R2
test suite
and can also be found on
Marpa::R2 MetaCPAN,