In India, Pannini creates an exact and complete description of the Sanskrit language, including pronunciation. Sanskrit could be recreated using nothing but Pannini's grammar. Pannini's grammar is probably the first formal system of any kind, predating Euclid. Even today, nothing like it exists for any other natural language of comparable size or corpus. Pannini is the object of serious study today. But in the 1940's and 1950's Pannini is almost unknown in the West. Pannini will have no direct effect on the other events in this timeline.

Andrey Markov introduces his

In 1913, Markov revisits his chains, applying them to the sequence of vowels and consonants in Pushkin's Eugene Onegin[3]. Again, Markov's interest is not in parsing. Nonetheless, this is an application to language of what later will be regarded as a parsing technique, and apparently for the first time in the West.

In 1929 Leonard Bloomfield, as part of his effort to create a linguistics that would be taken seriously as a science, published his "Postulates".[4] Known as structural linguistics, Bloomfield's approach will be very successful, dominating American lingustics for over two decades.

Bloomfield's "Postulates" defines a "language" as

[t]he totality of utterances that can be made in a speech community[5]

Note that there is no reference in this definition to the usual view -- that the utterances of a language "mean" something.

This omission is not accidental.[6] Bloomfield excludes meaning from his definition of language because he wants linguistics to be taken seriously as science. Behaviorist thought is very influential at this time and behavorists believe that, while human behaviors can be observed and verified and therefore made the subject of science, mental states cannot be verified. Claiming to know what someone means by a word is claiming to read his mind. And "mind-reading" is not science.

Emil Post defines and studies a formal rewriting system[7] using productions. With this, the process of rediscovering Pannini in the West begins.

Alan Turing discovers the stack as part of his design of the ACE machine. This is important in parsing because recursive parsing requires stacks. The importance of Turing's discovery is not noticed at the time and stacks will be rediscovered many times over the next two decades[8].

Claude Shannon publishes the foundation paper of information theory[9]. In this paper, Shannon models English using Andrey Markov's chains[10]. The approach is similar to Markov's but the intent seems to be different -- Shannon's is a serious attempt at a contribution to parsing human languages.

From 1949 to 1951 at the ETH Zurich, Heinz Rutishauser works on the design of what we would now call a compiler[11]. Rutishauser's arithmetic expression parser does not honor precedence but it does allow nested parentheses. It is perhaps the first algorithm which can really be considered a parsing method. Rutishauser's compiler is never implemented.

In the form of arithmetic expressions, operator expressions are the target of the first efforts at automatic parsing. We will not see this issue go away. Very informally[12], we can say that an operator expression is an expression built up from operands and operators. It is expected that the operand might be another operator expression, so operator expressions raise the issue of recursion.

The archetypal examples of operator expressions are arithmetic expressions:

2+(3*4) 13^2^-5*9/11 1729-42*8675309

In Western mathematics arithmetic expressions were read according to traditional ideas of associativity and precedence:

`^`is exponentiation. It right associates and has tightest[13] precedence.-
Multiplication (
`*`) and division (`/`) left associate. They have a precedence equal to each other and less tight than that of exponentiation. - Addition ('+') and subtraction ('-') left associate. They have a precedence equal to each other and less tight than that of multiplication and division.
- Parentheses, when present, override the traditional associativity and precedence.

In the first attempts at parsing in compilers, arithmetic expressions are central. The first compiled languages are organized line-by-line and, except for arithmetic expressions, lines are interpreted using basic string manipulations. It is only when those string manipulations turn to dealing with arithmetic expressions that they become sophisticated enough to be called parsing techniques. And only then does a parsing theory to describe those techniques become helpful.

Rutishauser's language, and in fact all languages before LISP, are structured line-by-line. No language before ALGOL is truly block-structured.

The line-by-line languages are parsed using string manipulations. A parsing theory is not helpful for describing these ad hoc string manipulations, so they don't give rise to one. The only logic in these early compilers that really deserves to be called a parsing method is that which tackles arithmetic expressions.

During 1950, Corrado Boehm, also at the ETH Zurich develops his own compiler. Rutishauser and Boehm are working at the same institution at the same time, but Boehm is unaware of Rutishauser's work until his own is complete. Boehm's is also the first self-compiling compiler -- it is written in its own language.

Like Rutishauser, Boehm's language is line-by-line and
parsed ad hoc, except for expressions. Boehm's expression parser
*does*
honor precedence, making it perhaps the first operator precedence
parser[14].
In Norvell's taxonomy[15],
Boehm's algorithm inaugurates
the
classic approach

to operator parsing.

Boehm's compiler also allows parentheses, but the two cannot be mixed -- an expression can either be parsed using precedence or have parentheses, but not both. Also like Rutishauser's, Boehm's compiler is never implemented[16].

Grace Hopper writes a linker-loader,
and calls it a

Hopper used the term
to compose out of materials from other
documents

[18].
Specifically, in 1952,
subroutines were new,
and automated programming
(what we would come to call
compiling

)
often was viewed as providing a interface
for calling a collection of
carefully chosen assembler subroutines[19].
Hopper's new
program took this subroutine calling
one step further -- instead of calling the subroutines
it expanded them (or in Hopper's terms
compiled

them) into
a single program.
Since Hopper the term

As we have seen, programs that today we
*would*
call compilers
were envisioned before Hopper,
but nobody called these programs compilers -- compiling in our modern sense
was called
automatic coding

,
codification automatique

or
Rechenplanfertigung

[21].

Kleene discovers regular languages[22]. Kleene does not use regular expression notation, but his regular languages are the idea behind it.

Glennie discovers what Knuth will later
call the first
real

[23]
compiler.
(By this Knuth will mean that
AUTOCODE was actually implemented and used by someone to translate
algebraic statements into machine language.)
Glennie's AUTOCODE
is very close to the machine -- just above machine language.
It does not allow operator expressions.

AUTOCODE is hard-to-use,
and apparently sees little use by anybody but
Glennie himself.
Because
Glennie works for the British atomic weapons projects his papers
are routinely classified,
so even the indirect influence of AUTOCODE is
slow to spread.
Nonetheless, many other

At IBM, a team under John Backus begins working on the language which will be called FORTRAN.

The term

Noam Chomsky earns his PhD at the University of Pennsylvania. His teacher, Zelig Harris, is a prominent Bloomfieldian, and Chomsky's early work is thought to be in the Bloomfield school.[26] That same year Chomsky accepts a teaching post at MIT. MIT does not have a linguistics department, and Chomsky is free to teach his own (highly original and very mathematical) approach to linguistics.

At Purdue, a team including Alan Perlis and Joseph Smith begins work on the IT compiler[27].

Perlis and Smith, now at the Carnegie Institute of Technology, finish the IT compiler[28]. Don Knuth calls this

the first reallyusefulcompiler. IT and IT's derivatives were used successfully and frequently in hundreds of computer installations until [their target] the [IBM] 650 became obsolete. [... P]revious systems were important steps along the way, but none of them had the combination of powerful language and adequate implementation and documentation needed to make a significant impact in the use of machines.[29]

The IT language had arithmetic expressions, of a sort -- parentheses are honored, but otherwise evaluation is always right-to-left -- and there is no operator precedence. The IT compiler's way of doing arithmetic expressions proves very unpopular: Donald Knuth reports that

The lack of operator priority (often called precedence or hierarchy) in the IT language was the most frequent single cause of errors by the users of that compiler.[30]

In the 1956 document describing the IT compiler[31],
the IT team is careful to define the term.
Their definition makes clear that
they are using the word

Chomsky publishes his "Three models" paper, which is usually considered the foundation of Western formal language theory[32]. Chomsky demolishes the idea that natural language grammar can be modeled using only Markov chains. Instead, the paper advocates a natural language approach that uses three layers:

- Chomsky uses
Markov's chains
as his
**bottom layer**. This becomes the modern compiler's**lexical phase**. - Chomsky's
**middle layer**uses context-free grammars and context-sensitive grammars. These are his own discoveries[33]. This middle layer becomes the**syntactic phase**of modern compilers. - Chomsky's
**top layer**, again his own discovery, maps ortransforms the output of the middle layer. Chomsky's top layer is the inspiration for the AST transformation phase of modern compilers.

In his "Three models" paper, Chomsky says that

By a language then, we shall mean a set (finite or infinite) of sentences, each of finite length, all constructed from a finite alphabet of symbols.[34]

This is exactly Bloomfield's definition, restated using set theory.

Nonetheless signs of departure from the behaviorist orthodoxy are apparent in "Three Models" -- Chomsky is quite willing to talk about what sentences mean, when it serves his purposes. For a utterance with multiple meanings, Chomsky's new model produces multiple syntactic derivations. Each of these syntactic derivations "looks" like the natural representation of one of the meanings. Chomsky points out that the insight into semantics that his new model provides is a very desirable property to have.[35]

Chomsky is a turning point, so much so that he establishes or settles the
meaning of many of the terms we are using.
A

In contrast to a parser,
a
yes

or
no

--
yes

if the string is in the language described by the recognizer,
no

otherwise.
Note that,
if we intend to do semantics,
a recognizer alone is not sufficient.

In the strict sense, a recognizer cannot drive a compiler -- a compiler needs a parser. But recognizers can be far easier to write, and are often hacked up and pressed into service as parsers. For example, if your semantics is simple and non-recursive, you might be able to drive your semantics phase with the output of a regex engine, using captures.

Noam Chomsky publishes
Syntactic Structures[36], one of the most important books of
all time.
The orthodoxy in 1957 is structural linguistics which
argues, with Sherlock Holmes, that
it is a capital mistake
to theorize in advance of the facts

.
Structuralists start
with the utterances in a language, and build upward.

But Chomsky claims that without a theory there are no facts: there is only noise. The Chomskyan approach is to start with a grammar, and use the corpus of the language to check its accuracy. Chomsky's approach will soon come to dominate linguistics.

From here on, parsing theory divides into Chomskyan and non-Chomskyan. Chomskyan parsing theory becomes and remains the mainstream. But it is far from unchallenged.

Above we defined a

As of 1957, we are calling the description
of a Chomskyan middle layer,
a

Backus's team makes the first FORTRAN compiler available to IBM customers. FORTRAN is the first high-level language that will find widespread implementation. As of this writing, it is the oldest language that survives in practical use.

FORTRAN I is a line-by-line language. Its parsing is non-Chomskyan. But it includes one important discovery. FORTRAN allows expressions and, learning from the dissatisfaction with the IT compiler, FORTRAN honors associativity and precedence.

The designers of FORTRAN use a strange trick for parsing operator expresssions -- they hack the expressions by adding parentheses around each operator. The number of parentheses varied, depending on the operator. Surprisingly, this works. In fact, once the theoretical understanding of operator precedence comes about, the FORTRAN I implementation is recognized as a hackish and inefficient way of implementing the classic operator precedence algorithm.

John McCarthy's LISP appears. LISP goes beyond the line-by-line syntax -- it is recursively structured. But the LISP interpreter does not find the recursive structure: the programmer must explicitly indicate the structure herself, using parentheses. Similarly, LISP does not have operator expressions in the usual sense -- associativity and precedence must be specified with parentheses.

Backus discovers a new notation to describe the IAL language[37]. According to Backus's recollection, his paper[38] has only one reader: Peter Naur[39]. The IAL language will soon be renamed ALGOL.

Samuelson and Bauer[40]
describe the use of stacks to implement
operator precedence.
Samuelson and Bauer do not use the word

Their algorithm, and its presentation, are thoroughly non-Chomskyan. Samuelson and Bauer do not provide a context-free grammar for their grammar, just an operator precedence table.

Since Boehm, many people have been refining operator precedence. With Samuelson and Bauer, what Norvell[41] calls "the classic algorithm" takes a well-documented form. The Samuelson and Bauer paper will be very influential.

In 1959, Chomsky reviews a book by B.F. Skinner on linguistics.[42] Skinner is the most prominent behaviorist of the time. This review galvanizes the opposition to behaviorism, and establishes Chomsky as behavorism's most prominent and effective critic. Chomsky makes clear his stand on the role of mental states in language study. Behaviorist claims that they can eliminate "traditional formulations in terms of reference and meaning", Chomsky says, are "simply not true."[43]

But, by this time, Chomsky's original definition of language, as a "set of strings", is established in the computer literature. Chomsky's clarifications goes unnoticed. In fact, as the mathematics of parsing theory develops, the Bloomfieldian definition will become even more firmly entrenched.

The ALGOL 60 report[44] specifies, for the first time, a block structured language. ALGOL 60 is recursively structured but the structure is implicit -- newlines are not semantically significant, and parentheses indicate syntax only in a few specific cases. The ALGOL compiler will have to find the structure.

With the ALGOL 60 report, a search begins which continues to this day: to find a parser that solves the Parsing Problem. Ideally such a parser is[45]

- efficient,
- general,
- declarative, and
- practical[46].

It is a case of 1960's optimism at its best. As the ALGOL committee is well aware, a parsing algorithm capable of handling ALGOL 60 does not yet exist. But the risk they are taking has immediate results -- 1961 will see a number of discoveries that are still important today. On the other hand, the parser they seek remains elusive for decades.

In the ALGOL 60 report[47], Peter Naur improves the Backus notation and uses it to describe the language. This brings Backus' notation to wide attention. The improved notation will become known as Backus-Naur Form (BNF).

For our purposes, a parser is

A parser is

A general parser is a parser that will parse any grammar that can be written in BNF[48].

The first description of a consciously non-Chomskyan compiler
seems to predate the first description of a Chomskyan parser.
It is A.E. Glennie's 1960 description of his compiler-compiler[49].
Glennie's

Glennie uses BNF,
but he does
*not*
use it in the way
Chomsky, Backus and Naur intended.
Glennie is using BNF to describe a
**procedure**.
In true BNF, the order of the rules does not matter --
in Glennie's pseudo-BNF order matters very much.
This means that, for most practical grammars,
a set of rules in BNF form describe one
language when interpreted as intended
by Backus and Naur,
and another when interpreted as
intended by Glennie.

Glennie is aware of what he is doing, and
is not attempting to deceive anyone --
he points out that the distinction
between his procedural pseudo-BNF and declarative BNF,
and clearly warns his reader that the difference is
important

[50].

In January, Ned Irons publishes a paper describing his ALGOL
60 parser.
It is the first paper to fully describe any parser.
The Irons algorithm is pure Chomskyan,
declarative, general,
and top-down with a bottom-up

A top-down parser deduces the constituents of a rule from the rule.
That is, it looks at the rule first,
and then deduces what is on its RHS.
Thus, it works from the start rule, and works
top down

until it arrives at the input tokens.

It is important to note that no useful parser can be purely top-down --
if a parser worked purely top-down, it would never look at its input.
So every top-parser we will consider has
*some*
kind of bottom-up element.
That bottom-up element may be very simple -- for example, one character lookahead.
The
Irons 1961
parser,
like most modern top-down parsers,
has a sophisticated bottom-up element.

A bottom-up parser deduces a rule from its constituents.
That is, it looks at either the input or at the LHS symbols
of previously deduced rules,
and from that deduces a rule.
Thus, it works from the input tokens, and works
bottom up

until it reaches the start rule.

But just as no really useful parser is purely top-down,
no really useful parser is purely bottom-up.
It it true that
the
**implementation**
of a useful parser might be 100% bottom-up.
But a useful
parser must do more than return all possible partitions of the input --
it must
implement some criteria for preferring some parse trees over
others.
Implicitly, such criteria come "from the top".

In fact, bottom-up parsers will often be pressed into service for Chomskyan parsing, and the Chomskyan approach is inherently top down. Even when it comes to implementation, bottom-up parsers will often use explicit top-down logic. For efficiency it is usually necessary to eliminate unwanted parses trees while parsing, and culling unwanted parse trees is a job for top-down logic.

Irons 1961
also introduces synthesized attributes: the parse creates
a tree, which is evaluated bottom-up. Each node is evaluated using
attributes
synthesized

from its child nodes[53].

Peter Lucas publishes his description of a top-down parser[54]. Either Irons paper or this one can be considered to be the first description of recursive descent[55]. Except to say that he deals properly with them, Lucas does not say how he parses operator expressions. But it is easy to believe Lucas' claim -- by this time the techniques for parsing operator expressions are well understood[56].

In November 1961, Dijkstra publishes the
shunting yard

algorithm[57].
In Norvell's useful classification[58]
of operator expression parsers,
all
earlier parsers have been what he calls
the classic
algorithm

.

Dijkstra's approach is new. In the classic approach, the number of levels of precedence is hard-coded into the algorithm. Dijkstra's algorithm can handle any number of levels of precedence without a change in its code, and without any effect on its running speed.

Mathematical study of stacks as models of computing begins with Anthony Oettinger's 1961 paper.[59] Oettinger 1961 is full of evidence that stacks are still very new. For a start, he does not call them stacks -- he calls them "pushdown stores". And Oettinger does not assume his highly sophisticated audience knows what the "push" and "pop" operations are -- he defines them based on another set of operations -- one that eventually will form the basis of the APL language.

Oettinger's definitions all follow the behavorist model -- they are sets of strings.[60] Oettinger's mathematical model of stacks will come to be called a deterministic pushdown automata (DPDA). DPDA's will become the subject of a substantial literature.

Oettinger's own field is Russian translation. He hopes that DPDA's will be an adequate basis for the study of both computer and natural language translation. DPDA's soon prove totally inadequate for natural language translation. But for dealing with computing languages, DPDA's will have a much longer life. As of 1961, all algorithms with acceptable speed are using stacks with various modifications.

Oettinger challenges the research community to develop a systematic method for generating stack-based parsers.[61] This search quickly becomes identified with the search for a theoretical basis for practical parsing.

The results of 1961 transformed the Operator Issue.
Before ALGOL,
parsing operator expressions essentially
*was*
parsing.
After ALGOL, almost all languages will be block-structured
and ad hoc string manipulatons are no longer adequate --
the language as a whole requires a serious parsing technique.
Parsing operator expressions becomes a side show,
or so it seems.

Why not use the algorithms that parse operator expressions for the whole language? Samuelson and Bauer 1959 had suggested exactly that. But, alas, operator expression parsing is not adequate for languages as a whole[62].

Also by 1961, we have BNF. This gives us a useful notation for describing grammars. It allows us to introduce our Basic Operator Grammar (BASIC-OP):

S ::= E E ::= E + T E ::= T T ::= T * F T ::= F F ::= number

BASIC-OP has two operators, three levels of precedence and left associativity. This was enough to challenge the primitive parsing techniques in use before 1961. Surprisingly, this simple grammar will continue to bedevil mainstream parsing theory for the next half a century.

Recursive descent, it turns out, cannot parse BASIC-OP because it is left recursive. And that is not the end of it. Making addition and multiplication right-associate is unnatural and, as the authors of the IT compiler found out, causes users to revolt. But suppose we try to use this Right-recursive Operator Grammar (RIGHT-OP) anyway:

S ::= E E ::= T + E E ::= T T ::= F * T T ::= F F ::= number

Recursive descent, without help, cannot parse RIGHT-OP. As of 1961, parsing theory has not developed well enough to state why in a precise terms. Suffice it to say for now that RIGHT-OP requires too much lookahead.

But recursive descent does have a huge advantage, one which, despite its severe limitations, will save it from obsolescence time and again. Hand-written recursive descent is essentially calling subroutines. Adding custom modification to recursive descent is very straight-forward.

In addition,
while pure recursive descent cannot
*parse*
operator expressions,
it can
*recognize*
them.
This means pure recursive descent may not be able to create
the parse subtree for an operator expression itself,
but it can recognize the expression and hand control
over to a specialized operator expression parser.
This seems to be what Lucas' 1961 algorithm did,
and it is certainly what many other implementations did afterwards.
Adding the operator expression subparser makes the implementation
only quasi-Chomskyan,
but this was a price the profession has
been willing to pay.

Alternatively, a recursive descent implementation can parse operator expressions as lists, and add associativity in post-processing. This pushes some of the more important parsing out of the syntactic phase into the semantics but, once again, it seemed that Chomskyan purity had to be thrown overboard if the ship was to stay afloat.

Bottom line: as of 1961 the Operator Issue takes a new form. Because of the Operator Issue, recursive descent is not sufficient for practical grammars -- it must always be part of a hybrid.

In this context, Dijkstra's new 1961 algorithm is a welcome alternative: as an operator expression subparser, it can parse operator expressions faster and in less space. But Dijkstra's algorithm has no more parsing power than the classic operator precedence algorithm -- it does nothing to change the basic tradeoffs.

Schorre publishes a paper on the Meta II
compiler
writing language

, summarizing the papers of the 1963
conference. Schorre cites both Backus and Chomsky as sources for
Meta II's notation.
Schorre notes that his parser is
entirely different

from that of
Irons 1961
--
in fact, it is non-Chomskyan.
Meta II is a template, rather than something that
his readers can use, but in principle it can be turned into a fully
automated compiler-compiler[63].

Among
the goals
for the ideal
ALGOL parser was that it be
efficient

.
As of 1965 this notion becomes more precise,
thanks to a notation that Knuth borrows from
calculus.
This
big O

notation characterizes algorithms
using functions, but it treats functions that
differ by a constant multiple as identical[64].
Ignoring the constant

means that conclusions
drawn from
big O

results outlast steady improvements
in technology,
but capture non-linear jumps.

These big O functions take as their input some aspect of the algorithm's
input.
In the case of parsing, by convention,
the big O function is usually a function
of the length of the input[65].
Of the many possible
big O

functions,
only a few will be of interest in this timeline.

- A function which grows steadily as the input grows
is called
linear . Inbig O

notation,linear is written O(n). - A function which grows as the length of the input times
its logarithm is almost linear:
quasi-linear . Inbig O

notation,quasi-linear can be written O(n*log n). - A function which grows as the square of the length
is called
quadratic

-- O(n**2). - A function which grows as the cube of the length
is called
cubic

-- O(n**3). - In extreme cases,
a function can grow as a constant taken to the power
of the length, or
exponentially -- O(c**n).

By this time,
efficient

for a parser means
linear

or
quasi-linear

.
Parsers which operate in quadratic, cubic, or exponential
time are
considered impractical.
But parsers aimed at practitioners will often push the
envelope --
any parser which uses backtracking is potentially exponential
and is designed in the (often disappointed) hope
that the backtracking will not
get out of hand
for the grammar of interest to the user.

Donald Knuth answers[66] the challenge expressed a few years earlier by Oettinger. Oettinger had hoped for a theory of stack-based parsing to replace "ad hoc invention".[67] Knuth responds with a theory that encompasses all the "tricks"[68] used for efficient parsing up to that time. In an exhilarating and exhausting 39-page paper, Knuth shows that stack-based parsing is equivalent to a new class of grammars. Knuth calls this new class, LR(k). Knuth also provides a parsing algorithm for the LR(k) grammars.

Knuth's new LR parsing algorithm is deterministic, Chomskyan and bottom-up. It might be expected to be "the one to rule them all". Unfortunately, while linear, it is not practical.

LR(k) is actually a set of grammar classes.
There is a grammar class for every
`k`,
where
`k`
is the amount of lookahead used.
LR(0) requires no lookahead,
but it is not practical because it is too weak
to parse most grammars of interest.
LR(1) is not practical
because of the size of the tables it requires --
well beyond what can be done with 1965
hardware.[69]

And, as the
`k`
in
LR(k)
grows, things get rapidly worse.
The size of the tables grows exponentially,
while the value of the additional lookahead rapidly diminishes.
It is not likely that
LR(2)
parsing will ever see much actual use,
never mind
LR(k) for any `k`
greater than 2.

Knuth 1965 uses the Bloomfieldian definition of language, following the rest of the parsing literature at this time. In other words, Knuth defines a language a "set of strings", without regard to their meaning. The Bloomfieldian definition, while severely restrictive, has yet to cause any recognizable harm.

To keep these things straight,
I will borrow two terms from linguistics:
"intension" and "extension".
For this discussion,
I will speak of the "set of strings" that make
up a language as its
**extension**.
If you are a Bloomfieldian,
a language
**is**
its extension.

The vast majority of people has always thought that,
to be considered a language,
a set of strings has to have a semantics --
that is, the strings must mean something.
I will call the semantics of a language
its
**intension**.[70]
A language is extensional
(or Bloomfieldian) if it consists only of an
extension.
Otherwise, the language is intensional.
For most people, the term "language" means
an intensional language.

As of 1965, you can argue that using the Bloomfieldian definition
has been helpful.
It
**is**
easier to do math with an extensional language.
And the results for its extension sometimes do apply
to an intensional language.
For example, Chomsky, to demonstrate the
superiority of his model over Markov chains,
showed that the extension of his model of English
contains strings which clearly are English,
but which are not predicted by Markov chains.
Obviously, your model of an intensional language is wrong
if its extension is wrong,
and if you can discover that without getting into
the semantics, all the better.

Knuth needs to establish an equivalence between his LR grammars, and the mathematical literature on stack-based parsing (DPDA's). The trouble is, the DPDA literature is entirely non-Chomskyan -- all of its results are in terms of sets of strings. Knuth is forced to "compare apples to oranges"

How do you show that string-sets are the equivalent of grammars? What Knuth does is treat them both as string-sets -- extensions. Knuth compares the language extensions of the LR grammars to the sets of strings recognized by the DPDA's.

The result certainly seems encouraging.
It turns out that the language extension of
deterministic stack machines
is
**exactly**
that of the
LR
grammars.

If you take language extensions as the proxy for grammars, things fall into place very neatly: the LR-parsers are the deterministic subset of the context-free parsers. And "deterministic" seemed like a very good approximation of practical. All practical parsers in 1965 were deterministic parsers.

Viewed this way, LR-parsing looked like the equivalent of practical parsing. It was a "direct hit", or as close to an exact equivalent of practical parsing as theory was going to get. Based on the algorithms of Knuth 1965, that meant that the theoretical equivalent of "practical parsing" was somewhere between LR(0) and LR(1).

Not all the signs were promising, however. In fact, one of them is ominous. LR grammars form a hierarchy -- for every k≥0, there is an LR grammar which is LR(k+1), but which is not LR(k).

But if you just look at sets of strings, this hierarchy pancakes. Every LR(k) language extension is also an LR(1) language extension, as long as k≥1.

It gets worse. So far LR(0) has withstood the general collapse of the extensional hierarchy. But in most practical applications, you can add an end-of-input marker to a grammar.[71], If you do this, every LR(k) language extension is also an LR(0) language extension. In terms of strings-sets, there is no LR hierarchy:

In short, as a proxy for LR grammars, LR language extensions look like they might be completely worthless.[72]

While the algorithm of Knuth 1965 did not solve the Parsing Problem, it convinced most that stack-based, and therefore LR, parsing was the framework for the solution.[73] The reasons seem overwhelming:

- In 1965, every practical parser is stack-driven.
- An algorithm that combines state transitions and stack operations is already a challenge to existing machines. In 1965, any more complicated algorithm is likely to be unuseable in practice.[74]
- For LR(k) grammars, practical parsing is bracketed somewhere between LR(0) and LR(1).
- In terms of what the parsing literature calls "languages" (extensional languages), LR(k) parsing is the exact equivalent of deterministic stack-based parsing.
- Determinism is always linear and, at least in terms of speed, linear is almost always practical.
- In 1965, every practical parser is deterministic.
- In general, power is a trade-off for speed.

Extensional languages are used in this reasoning, while the actual interest is in intensional languages. And for extensions the LR hierarchy collapses. But these things that are not considered troubling. After all, there is no exact theoretical equivalent of "practical" -- you always have to settle for a rough equivalent.

In fact, there are aesthetic reasons to think that this theoretical equivalent for practical parsing is not all that rough. Recall that deterministic stack-based parsing is "exactly" LR-parsing. It is also the case that non-deterministic stack-based parsing is "exactly" context-free parsing. This symmetry is very elegant, and suggests that the theoreticians have uncovered the basic laws behind parsing.

Of course, "exactly" here means "exactly in terms of extensions". But, again, in 1965 this is not extremely troubling. Extensions are what is used. Nobody understood the limits of extensions as a definition of language better than Chomsky, but Chomsky himself had used extensions to produce some of his most impressive results.

After 1965, the consensus excludes the idea that algorithms which target supersets of LR(k) might be faster than those that take on LR(k) directly.[75]. But, what if the LR language hierarchy collapse is a real problem? If we remove the evidence based on language extensions from the list above, all we have left are couple of over-generalizations. So the question remains:

Is there a non-deterministic parser that is linear for the LR(k) grammars, or even a superset of them?

This question will be answered
by Joop Leo in 1991.
The answer, surprisingly, will be
yes

.

Knuth 1965 is a significant milestone in the history of operator expresssion parsing. Knuth specifically addresses the parsing of ALGOL and he zeroes in on its arithmetic expressions as the crucial issue[76]. Knuth suggests a "typical" grammar[77] which is short, but encapsulates the parsing challenges presented by ALGOL's arithmetic expressions:

S ::= E E ::= - T E ::= T E ::= E - T T ::= P T ::= T * P P ::= a P ::= ( E )

KNUTH-OP is left-recursive, allows parentheses, has three levels of precedence, and implements both unary and binary minus. Recursive descent cannot parse KNUTH-OP, but KNUTH-OP is LR(1). The means that it is well within Knuth's new class of grammars and, by implication, probably within a practical subclass of the LR grammars.

When Knuth discovered the LR grammars, he announced them to the world with a full-blown mathematical description. The top-down grammars, which arose historically, have lacked such a description. In 1968, Lewis and Stearns fill that gap by defining the LL(k) grammars[78].

When LL is added to the vocabulary of parsing, the meaning of
translatable from left to right

[79].
But LL means
scan from the left, using left reductions

and, in response, the meaning of LR shifts to become
scan from the left, using
right reductions

[80].

If there is a number in parentheses in this notation for
parsing algorithms, it usually indicates the number of tokens of
lookahead.
As an example, LL(1) means
scan from the
left, using left reductions with one character of lookahead

.
LL(1) will be important in what follows.

With Knuth 1965 and Lewis and Stearns 1968, we can now restate the problem with recursive descent and operator expressions in precise terms: Recursive descent in its pure form, is LL(1). Arithmetic operator grammars are not LL(1) -- not even close. In fact, of the grammars BASIC-OP, RIGHT-OP, and KNUTH-OP, none is LL(k) for any k.

A common work-around is to have recursive descent parse arithmetic expressions as lists, and add associativity in post-processing. We are now able to look at this in more detail. An extended BNF grammar to recognize BASIC-OP as a list is as follows:

S ::= E E ::= T { TI }* TI ::= '+' T T ::= F { FI } * FI ::= 'x' F F ::= number

In the above
`{Z}*`
means
zero or more occurences of Z

.
Expanded into pure BNF,
and avoiding empty right hand sides,
our operator "list" grammar becomes
LIST-OP:

S ::= E E ::= T TL E ::= T TL ::= TI TL ::= TI TL TI ::= '+' T T ::= F FL T ::= F FL ::= FI FL ::= FI FL FI ::= 'x' F F ::= number

LIST-OP is LL(1), and therefore can be parsed directly by recursive descent. Note that in LIST-OP, although associativity must be provided by a post-processor, the grammar enforces precedence.

Jay Earley discovers the algorithm named after him[81]. Like the Irons algorithm, Earley's algorithm is Chomskyan, declarative and fully general. Unlike the Irons algorithm, it does not backtrack. Earley's algorithm is both top-down and bottom-up at once -- it uses dynamic programming and keeps track of the parse in tables. Earley's approach makes a lot of sense and looks very promising indeed, but it has three serious issues:

- First, there is a bug in the handling of zero-length rules.
- Second, it is quadratic for right recursions.
- Third, the bookkeeping required to set up the tables is, by the standards of 1968 hardware, daunting.

Irons' synthesized attributes had always been inadequate for many tasks. Until now, they had been supplemented by side effects or state variables. In 1968, Knuth publishes a paper on a concept he had been working for the previous few years: inherited attributes[82].

Recall that a node in a parse gets its synthesized attributes from its children. Inherited attributes are attibutes that a node gets from its parents. Of course, if both inherited and synthesized attributes are used, there are potential circularities. But inherited attributes are powerful and, with care, the circularities can be avoided.

An attribute grammar is a grammar whose nodes may have both inherited and synthesized attributes.

Since Knuth 1965, many have taken up his challenge to find a practically parseable subset of the LR(k) languages. In 1969, Frank DeRemer describes a new variant of Knuth's LR parsing[83]. DeRemer's LALR algorithm requires only a stack and a state table of quite manageable size. LALR looks practical, and can parse KNUTH-OP. LALR will go on to become the most widely used of the LR(k) subsets.

Ken Thompson writes the
`ed`
editor as one of the
first components of UNIX[84].
Before 1969,
regular expressions were an
esoteric mathematical formalism. Through the
`ed`
editor
and its descendants, regular expressions become an everyday
part of the working programmers toolkit.

Alfred Aho and Jeffrey Ullman publish the first volume[85] of their two volume textbook summarizing the theory of parsing. This book is still important. It is also distressingly up-to-date -- progress in parsing theory slowed dramatically after 1972. Aho and Ullman's version of Earley's algorithm includes a straightforward fix to the zero-length rule bug in Earley's original[86]. Unfortunately, this fix involves adding even more bookkeeping to Earley's.

Under the names TDPL and GTDPL, Aho and Ullman investigate
the non-Chomksyan parsers in the Schorre lineage[87]. They note that
it can be quite difficult to determine what language is
defined by a TDPL parser

[88].
At or around this time, rumor has it that the main line of
development for GTDPL parsers is classified secret by the US
government[89].
Public interest in GTDPL fades.

An article by Čulik and Cohen extends the idea behind LR to grammars with infinite lookahead.[90] LR-regular (LRR) grammars are LR with lookahead that is infinite but restricted. The restriction is that to tell the strings apart, you must use a finite set of regular expressions.

LRR seems to be a natural class of grammars, coming close to capturing the idea of "eyeball grammars" -- grammars that humans can spot by "eyeball". And, despite computer languages being designed around the idea that practical parsing falls short of LR(1), never mind LRR, constructs requiring infinite lookahead will come up in real languages.[91]

As we have noted pure LL(1) cannot parse operator expressions, and so operator expression parsers are often called into service as subparsers. What about making the entire grammar an operator grammar? This idea had already occurred to Samuelson and Bauer in 1959. In 1973, Vaughan Pratt revives it[92].

There are problems with switching to operator grammars. Operator grammars are non-Chomskyan -- the BNF no longer accurately describes the grammar. Instead the BNF becomes part of a combined notation, and the actual grammar parsed depends also on precedence, associativity, and semantics. And operator grammars have a very restricted form -- most practical languages are not operator grammars.

But many practical grammars are almost operator grammars. And the Chomskyan approach has always had its dissenters. Vaughn Pratt is one of these, and discovers a new approach to operator expression parsing: Pratt parsing is the third and last approach in Theodore Norvell's taxonomy of approaches to operator expression parsing[93]. Some have adopted Pratt parsing as the overall solution to their parsing problems[94].

As of 2018, the Pratt approach is not popular as an overall strategy. Under the name precedence climbing, it is most often used as a subparser within a recursive descent strategy. All operator expression subparsers break the Chomskyan paradigm so the non-Chomskyan nature of Pratt's parser is not a problem in this context.

Bell Labs converts its C compiler from hand-written recursive descent to DeRemer's LALR algorithm[95].

The first
dragon book

[96]
comes out. This soon-to-become
classic textbook is nicknamed after the drawing on the front cover,
in which a knight takes on a dragon. Emblazoned on the knight's lance
are the letters
LALR

. From here on out, to speak lightly
of LALR will be to besmirch the escutcheon of parsing theory.

Bell Laboratories releases Version 7 UNIX[97]. V7 includes what is, by far, the most comprehensive, useable and easily available compiler writing toolkit yet developed.

Part of the V7 toolkit is Yet Another Compiler Compiler
(`yacc`)[98].
`yacc`
is LALR-powered. Despite its name,
`yacc`
is the first
compiler-compiler in the modern sense.
For a few useful languages, the
process of going from Chomskyan specification to executable
is now fully automated.
Most practical languages, including the C language and
`yacc`'s own input language, still require manual hackery[99].

Recall the criteria outlined for a solution of the Parsing Problem: that the parser be efficient, general, declarative and practical. LALR is linear and runs fast on 1979 hardware. As of 1979 LALR seems practical. And LALR is declarative. True, LALR is very far from general, but BASIC-OP, RIGHT-OP KNUTH-OP and LIST-OP are all LALR, and LALR can parse most operator expressions directly.

The criteria set in the original statement of the Parsing Problem have not been fully met. Nonetheless, after two decades of research, it seems as if the Parsing Problem and the Operator Issue can both be declared solved.

Larry Wall introduces Perl 1[100].
Perl embraces complexity like no
previous language.
Larry uses
`yacc`
and LALR very aggressively --
to my knowledge more aggressively than anyone before or since.

Wadler starts to introduce monads to the functional programming community. As one example he converts his previous efforts at functional parsing techniques to monads[101]. The parsing technique is pure recursive descent. The examples shown for monadic parsing are very simple, and do not include operator expressions. In his earlier non-monadic work, Wadler uses a very restricted form of operator expression: His grammar avoided left recursion by using parentheses, and operator precedence was not implemented [102].

Joop Leo discovers a way of speeding up right recursions in Earley's algorithm[103]. Leo's algorithm is linear for LR-regular, a superset of LR(k). That means it is linear for just about every unambiguous grammar of practical interest, and many ambiguous ones as well. In 1991 hardware is six orders of magnitude faster than 1968 hardware, so that the issue of bookkeeping overhead had receded in importance. When it comes to speed, the game has changed in favor of the Earley algorithm.

Recall that after Knuth 1965, the research consensus had excluded the possibility that a parser of a superset of the LR(k) languages could be practical and linear. The argument from Knuth had been highly suggestive. It was not actually a proof, but by 1991 many effectively take it as one.

Does Leo's algorithm refute the consensus?
While not linear in general,
Leo's non-deterministic algorithm
**is** linear for a superset of Knuth's LR(k) grammars.
But is Leo's algorithm practical?

But in 1991, most researchers see the Parsing Problem as "solved" -- a closed issue. Earley parsing is almost forgotten, and Leo's discovery is ignored. Two decades will pass before anyone attempts a practical implementation of Leo 1991.

If Earley's is almost forgotten, then everyone in LALR-land is content, right? Wrong. In fact, far from it.

As of 1979, LALR seemed practical.
But by the 1990s users of LALR are making
unpleasant discoveries.
While LALR automatically generates their
parsers, debugging them is so hard they could just as easily write the
parser by hand.
Once debugged, their LALR parsers are fast for correct
inputs.
But almost all they tell the users about incorrect inputs
is that they are incorrect.
In Larry's words, LALR is
fast
but stupid

[104].

On the written record, the discontent with LALR and
`yacc`
is hard to find.
What programmer wants to announce to colleagues and
potential employers that he cannot figure
how to make the standard state-of-the-art tool work?
But, by 1991,
a movement away from LALR has already begun.
Parsing practice falls back on
recursive descent
and the Operator Issue
is back in full force.

Combinator parsing
was introduced in two 1992 papers[105].
Of more interest
to us is the one by Hutton, which focuses on combinator parsing[106].
As Hutton explains[107]
a combinator is an attribute grammar
with one inherited attribute (the input string)
and two synthesized attributes (the value of the node
and the remaining part of the input string).
Combinators can be
combined

to compose other parsers.
This allows you to have an algebra of parsers.
It is an extremely attractive idea.

But, exciting as the new mathematics is, the underlying parsing theory contains nothing new -- it is the decades-old recursive descent, unchanged. Hutton's example language is extremely basic -- his operator expressions require left association, but not precedence. Left association is implemented by post-processing.

Hutton's main presentation does not use monads.
In his final pages, however,
Hutton points out
combinator parsers give rise to a monad

and shows
how his presentation could be rewritten in a form
closely related to monads[108].

In the adaptation of monads into the world of programming, Wadler 1995 is a turning point. One of Wadler's examples is a parser based on monads and combinator parsing. Combinator parsing becomes the standard example of monad programming.

In its monadic form,
the already attractive technique of combinator parsing is extremely elegant
and certainly seems as if it
*should*
solve all parsing problems.
But, once again, it is still recursive descent,
completely unchanged.
Wadler handles left recursion by parsing into
a list, and re-processing that list
into left recursive form.
Wadler's example grammar only has one operator,
so the issue of precedence does not arise.

Instead of one paradigm to solve them all, Wadler ends up with a two layered approach, with a hackish bottom layer and an awkward interface. This undercuts the simplicity and elegance of the combinator approach.

Wadler 1995
used parsing as an example to motivate
its presentation of monads.
Graham Hutton and Erik Meijer[109]
take the opposite perspective -- they
write a paper on combinator parsing that could
also be viewed as a first introduction to the use
of monads in programming.

[110]

The most complicated grammar[111] for operator expressions in Hutton and Meijer 1996 has two operators (exponentiation and addition) with two levels of precedence and two different associativities: exponentiation associates right and addition associates left.

The solution in Hutton and Meijer 1996 will be familiar by now: Precedence is provided by the combinator parser which parses operators and operands of the same precedence into lists. Associativity is provided by re-processing the lists.

As we have seen, the literature introducing monads and combinator parsing added nothing to what was already known about parsing algorithms. We have focused on it for three reasons:

- Functional programming has had a profound influence on the way the profession sees parsing. That influence will probably grow.
- The literature shows the relative popularity among one group of highly skilled programmers of LALR and recursive descent. LALR is not mentioned once in any of the articles surveyed.
- The functional programming literature shows that the Operator Issue, once thought solved, is as live an issue in the 1990's as it was at the end of 1961.

Larry Wall decides on a radical reimplementation of Perl -- Perl 6. Larry does not even consider using LALR again.

John Aycock and R. Nigel Horspool publish their attempt at a fast, practical Earley's parser[112]. Missing from it is Joop Leo's improvement -- they seem not to be aware of it. Their own speedup is limited in what it achieves and the complications it introduces can be counter-productive at evaluation time. But buried in their paper is a solution to the zero-length rule bug. And this time the solution requires no additional bookkeeping.

Implementers by now are avoiding
`yacc`, but the demand for
declarative parsers remains.
In 2004,
Bryan Ford[113]
fills this gap by
repackaging the nearly-forgotten GTDPL.
Ford's new algorithm,
PEG,
is declarative, always linear,
and has an attractive new syntax.

But PEG is, in fact, pseudo-declarative -- it uses the BNF notation, but it does not parse the BNF grammar described by the notation: PEG achieves unambiguity by finding only a subset of the parses of its BNF grammar. And, as with its predecessor GTDPL, in practice it is usually impossible to determine what the subset is. This means that the best a programmer usually can do is to create a test suite and fiddle with the PEG description until it passes. Problems not covered by the test suite will be encountered at runtime.

PEG takes the old joke that
the code is
the documentation

one step further --
with PEG it is often the case that
nothing documents the grammar,
not even the code.
Under this circumstance, creating reliable software is impossible.
As it is usually used,
PEG is the nitroglycerin of LL(1) parsing --
slightly more powerful, but too dangerous to be worth it.

PEG, in safe use, would essentially be LL(1)[114]. Those adventurous souls who parse operator expressions under PEG typically seem to parse the operator expressions as lists, and re-process them in the semantics.

The old Bison-based C and Objective-C parser has been replaced by a new, faster hand-written recursive-descent parser.[115]

With this single line
in the middle of a change list hundreds of lines long,
GNU announces a major event in parsing history.
(Bison, an LALR parser, is the direct descendant of Steve Johnson's
`yacc`.)

For three decades, the industry's flagship C compilers have used LALR as their parser -- proof of the claim that LALR and serious parsing are equivalent. Now, GNU replaces LALR with the technology that LALR replaced a quarter century earlier: recursive descent.

Jeffrey Kegler (the author of this timeline) releases the first practical implementation of Joop Leo's algorithm[116]. The new parser, named Marpa, is general, declarative and linear for all the grammars discussed in Knuth 1965, that is, it is linear for the LR(k) grammars for all finite k[117]. This means it is also linear for most of the grammar classes we have discussed in this timeline.

- LR(1) grammars.
- LL(k) grammars for all finite k, including LL(1) grammars.
- Operator expression grammars[118].

This is a vast class of grammars, and it has the important feature that a programmer can readily determine if their grammar is linear under Marpa. Marpa will parse a grammar in linear time, if

- It is unambiguous[119].
- It has no unmarked middle recursions[120].
- It has no ambiguous right recursions[121].

The ability to ensure that a grammar can be parsed in linear time is highly useful for second-order languages -- a programmer can easily ensure that all his automatically generated grammars will be practical. Marpa's own DSL will make use of second-order programming.

Leo 1991 did not address the zero-length rule problem. Kegler solves it by adopting the solution of Aycock and Horspool 2002. Finally, in an effort to combine the best of declarative and hand-written parsing, Kegler reorders Earley's parse engine so that it allows procedural logic. As a side effect, this gives Marpa excellent error-handling properties.

While the purely precedence-driven parsing of Samuelson and Bauer 1960 and Pratt 1973 never caught on, the use of precedence in parsing remains popular. In fact, precedence tables often occur in documentation and standards, which is a clear sign that they too can play a role in human-readable but precise descriptions of grammars.

In 2012, Kegler releases a version of Marpa which allows
the user to mix Chomskyan and precedence parsing at will[122].
Marpa's
precedenced statements

describe expressions
in terms of precedence and association.
If
precedenced statements

are added to a grammar that Marpa parses in linear
time, then the grammar remains linear as long as the precedenced statements
are (after precedence is taken in consideration) unambiguous[123].

Here is an example of a Marpa
precedenced statement

:
[124]

Expression ::= Number | '(' Expression ')' assoc => group || Expression '**' Expression assoc => right || Expression '*' Expression | Expression '/' Expression || Expression '+' Expression | Expression '-' Expression

Previously operator expression parsers had worked by
comparing adjacent operators.
This limited them to binary expressions
(the ternary expressions in most languages are
implemented using post-parsing hackery).
Marpa's
precedenced expressions

are not operator-driven --
operators are used only to avoid ambiguity.
Precedenced statements

will directly implement any number of ternary,
quaternary or n-ary operators.
Ternary and larger expressions are useful in
financial calculations,
and are common in calculus.

Marpa allows operator expressions to be parsed as BNF, eliminating the need for them. On the other hand, Marpa's ability to handle second-order languages allows it to take statements which describe expressions in terms of precedence and association, and rewrite them in a natural way into Chomskyan form. In this way the grammar writer has all the benefits of Chomskyan parsing, but is also allowed to describe his grammar in terms of operators when that is convenient. Once again, the Operator Issue seems to be solved, but this time perhaps for real.

Again recall the four criteria from the original statement of the Parsing Problem: that a parser be efficient, general, declarative and practical. Does Marpa[125] fulfill them?

At first glance, no.
While Marpa is general, it is not linear or quasi-linear for the
general case,
and therefore cannot be considered efficient enough.
But to be general, a parser has to parse
**every**
context-free grammar, including
some wildly ambiguous ones, for which simply printing the
list of possible parses takes exponential time.
With the experience of the decades,
linearity for a fully general BNF parser seems to
be a superfluous requirement for a practical parser.

The LR(k) grammars include every other grammar class we have discussed -- LL(k) for all k, LALR, operator grammars, etc. If we change our criteria as follows:

- linear for all LR(k) grammars, and for all grammars parseable in linear time by any other parser in practical use;
- declarative; and
- practical.

then Marpa qualifies.

My teachers came from the generation which created the ALGOL vision and started the search for a solution to the Parsing Problem. Alan Perlis was one of them. Would he have accepted my claim that I have found the solution to the problem he posed? I do not know, but I am certain, as anyone else who knew him can attest, that he would given me a plain-spoken answer. In that spirit, I submit this timeline to the candid reader. In any case, I hope that my reader has found this journey through the history of parsing informative.

An attempt is made to list the original publication, which is not necessarily the one consulted.

**Aho and Ullman 1972**:
Aho, Alfred V., and Jeffrey D. Ullman.

**Aho and Ullman 1973**:
Aho, Alfred V., and Jeffrey D. Ullman.

**Aho and Ullman 1977**:
Aho, Alfred V., and Jeffrey D. Ullman.
green dragon book

to distinguish
it from its second edition,
which had a red dragon on the cover.

**ALGOL 1960**:
Backus, John W.,
F. L. Bauer, J. Green, C. Katz, J. McCarthy, A. J. Perlis, H. Rutishauser,
K. Samelson, B. Vauquois, J. H. Wegstein, A. van Wijngaarden,
and M. Woodger.

**Aycock and Horspool 2002**:
Aycock, John, and R. Nigel Horspool.

**Backus 1959**:
Backus, John W.

**Backus 1980**:
Backus, John.

**Bloomfield 1926**:
Bloomfield, Leonard.

**Backus 2006**:
Booch, Grady.

**Carpenter and Doran 1977**:
Carpenter, Brian E., and Robert W. Doran.

**Chipps et al 1956**:
Chipps, J., et al.

**Chomsky 1956**:
Chomsky, Noam.

**Chomsky 1957**:
Chomsky, Noam.
Syntactic Structures.
Mouton & Co., 1957.

**Chomsky 1959**:
Chomsky, Noam.

**Chomsky 1978**:
Chomsky, Noam.

**Chomsky 2002**:
Chomsky, Noam.

**Čulik and Cohen 1973**:
Čulik II, Karel and Cohen, Rina.

**Darwin 1984**:
Darwin, Ian F., and Geoffrey Collyer.

**Denning 1983**:
Denning, Peter.
Untitled introduction to
Irons 1961.

**DeRemer 1969**:
DeRemer, Franklin Lewis.

**Dijkstra 1961**:
Dijkstra, E. W.

**Earley 1968**:
Earley, Jay.

**Ford 2002**:
Ford, Bryan.

**Ford 2004**:
Ford, Bryan.

**Frost 1992**:
Frost, Richard A.

**FSF 2006**:
"GCC 4.1 Release Series: Changes, New Features, and Fixes."
Free Software Foundation,
http://gcc.gnu.org/gcc-4.1/changes.html.
Accessed 25 April 2018.
The release date of GCC 4.1 was February 28, 2006.

**Glennie 1960**:
Glennie, A. E.

**Grune and Jacobs 2008**:
Grune, D. and Jacobs, C. J. H.
Parsing Techniques: A Practical
Guide, 2nd edition.
Springer, 2008.
When it comes to determining who first discovered an idea,
the literature often disagrees.
In such cases, I have often deferred
to the judgement of Grune and Jacobs.

**Harris 1993**:
Harris, Randy Allen.
The Linguistics Wars.
Oxford University Press, 1993

**Hayes 2013**:
Hayes, Brian.

**Hilgers and Langville 2006**:
Hilgers, Philipp, and Amy N. Langville.

**Hutton 1992**:
Hutton, Graham.

**Hutton and Meijer 1996**:
Hutton, Graham and Erik Meijer.

**Irons 1961**:
Irons, Edgar T.

**Johnson 1975**:
Johnson, Stephen C.

**Kegler 2010**:
Kegler, Jeffrey.

**Kegler 2012a**:
Kegler, Jeffrey.

**Kegler 2012b**:
Kegler, Jeffrey.

**Kegler Strings 2018**:
Kegler, Jeffrey.

**Kegler Solved 2018**:
Kegler, Jeffrey.

**Kegler Undershoot 2018**:
Kegler, Jeffrey.

**Kegler Haskell 2018**:
Kegler, Jeffrey.

**Kleene 1951**:
Kleene, Stephen Cole.

**Knuth 1962**:
Knuth, Donald E.

**Knuth 1965**:
Knuth, Donald E.

**Knuth 1968**:
Knuth, Donald E.

**Knuth 1971**:
Knuth, Donald E.

**Knuth 1990**:
Knuth, Donald E.

**Knuth and Pardo 1976**:
Knuth, Donald E., and Luis Trabb Pardo.

**Leo 1991**:
Leo, Joop M. I. M.

**Lewis and Stearns 1968**:

**Lucas 1961**:
Lucas, Peter.

**Marpa::R2**:

Home website:
http://savage.net.au/Marpa.html.
Accessed 25 April 2018.

Kegler's Marpa website:
https://jeffreykegler.github.io/Marpa-web-site/.
Accessed 25 April 2018.

Github:
https://github.com/jeffreykegler/Marpa--R2.
Accessed 25 April 2018.

See also
Kegler 2012b
and
Marpa::R2 MetaCPAN.

**Marpa::R2 MetaCPAN**:
https://metacpan.org/pod/Marpa::R2.
Accessed 30 April 2018.

**Markov 1906**:
Markov, Andrey Andreyevich.

**Markov 1913**:
Markov, Andrey Andreyevich.

**Mascrenhas et al 2014**:
Mascarenhas, Fabio, Sèrgio Medeiros, and Roberto Ierusalimschy.

**McIlroy 1987**:
McIlroy, M. Douglas.
"A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971-1986,"
AT&T Bell Laboratories Computing Science Technical Report #139, 1987.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.692.4849&rep=rep1&type=pdf.
Accessed 25 April 2018.

**McIlroy and Kernighan 1979**:
McIlroy, M. D. and B. W. Kernighan.

**Moser 1954**:
Moser, Nora B.

**Norvell 1999**:
Norvell, Theodore.

**Oettinger 1961**:
Oettinger, Anthony.

**Padua 2000**:
Padua, David.

**Perlis et al 1958**:
Perlis, Alan J., J. W. Smith, and H. R. Van Zoeren.

**Post 1943**:
Post, Emil L.

**Pratt 1973**:
Pratt, Vaughan R.

**Rosencrantz and Stearns 1970**:
Rosenkrantz, Daniel J., and Richard Edwin Stearns.

**Samuelson and Bauer 1959**:
Samelson, Klaus, and Friedrich L. Bauer.
"Sequentielle formelübersetzung."
it-Information Technology 1.1-4 (1959): 176-182.

**Samuelson and Bauer 1960**:
Samelson, Klaus, and Friedrich L. Bauer.

**Schorre 1964**:
Schorre, D. V.

**Shannon 1948**:
Shannon, Claude E.

**Simms 2014**:
Simms, Damon.

**Snyder 1975**:
Snyder, Alan.

**Wadler 1985**:
Wadler, Philip.

**Wadler 1990**:
Wadler, Philip.

**Wadler 1995**:
Wadler, Philip.

**Wall 2014**:
Wall, Larry.

**Wiki Big_O**:

**Wiki Perl**:

**1**.
Markov 1906.
↩

**2**.
Indirectly, Markov's purpose may have been
to refute Nekrasov's proof of free will.
Nekrasov pointed out that social statistics of all kinds
follow the law of large numbers,
which (up until then)
had assumed events were independent.
Since social events were human choices,
Nekrasov saw this as a proof of free will.
Markov's demonstration that the law of large
numbers works just as well for dependent events
undercut Nekrasov's proof.
(Hayes 2013, pp. 92-93).
↩

**3**.
Markov 1913.
See
Hayes 2013
and
Hilgers and Langville 2006, pp. 155-157.
↩

**4**.
Bloomfield 1926.
↩

**5**.
Bloomfield 1926,
definition 4 on p. 154.
↩

**6**.
"The statement of meanings is therefore the weak point in
language-study, and will remain so until human knowledge
advances very far beyond its present state. In practice, we define the
meaning of a linguistic form, wherever we can, in terms of some
other science.",
Bloomfield 1926, p. 140.
↩

**9**.
Shannon 1948.
↩

**10**.
Shannon 1948, pp. 4-6.
See also
Hilgers and Langville 2006,
pp. 157-159.
↩

**11**.
Knuth and Pardo 1976,
pp. 29-35, 40.
↩

**12**.
No formal apparatus for describing operator expressions is fully worked
out as of 1949.
↩

**13**.
This timeline refers to precedence levels as
tight

or
loose

.
The traditional terminology is confusing: tighter
precedences are
higher

but traditionally the
precendence levels are also numbered and the higher
the number, the lower the precedence.
↩

**14**.
Some of the
firsts

for parsing algorithms in this
timeline are debatable,
and the history of operator precedence is especially murky.
Here I follow
Samuelson and Bauer 1960
in giving the priority to Boehm.
↩

**15**.
Norvell 1999.
↩

**16**.
Knuth and Pardo 1976,
pp 35-42.
↩

**17**.
Knuth and Pardo 1976, p. 50.
↩

**18**.
The quoted definition is from
Moser 1954, p. 15,
as cited in
Knuth and Pardo 1976,
p. 51.
↩

**19**.
In 1952, an interface that guided calls to subroutines
was much more helpful than current programmers might
imagine:
"Existing programs for similar problems were unreadable
and hence could not be adapted to new uses."
(Backus 1980, p. 126)
↩

**20**.
I hope nobody will read this terminological clarification as in any sense
detracting from Hopper's achievement.
Whatever it is called, Hopper's program was a major advance,
both in terms of insight and execution, and her energetic followup
did much to move forward the events in this timeline.
Hopper has a big reputation and it is fully deserved.
↩

**21**.
Knuth and Pardo 1976,
p. 50.
↩

**22**.
Kleene 1951.
↩

**23**.
Knuth and Pardo 1976,
p 42.
↩

**24**.
Knuth and Pardo 1976,
pp. 42-49.
↩

**25**.
Backus 1980, pp. 133-134.
↩

**26**.
Harris 1993,
pp 31-34, p. 37.
↩

**27**.
Knuth and Pardo 1976,
pp. 83-86.
↩

**28**.
Perlis et al 1958.
Knuth and Pardo 1976, p. 83,
state that the IT compiler "was put into use
in October, 1956."
↩

**29**.
Knuth and Pardo 1976, p. 84.
↩

**30**.
Knuth 1962.
The 1956 date for IT is from
Padua 2000.
↩

**31**.
Chipps et al 1956.
↩

**32**.
Chomsky 1956.
↩

**33**.
Chomsky seems
to have been unaware of Post's work -- he does not cite it.
↩

**34**.
Chomsky 1956, p. 114.
In case there is any doubt Chomsky's "strings"
are the same as Bloomfield's utterances,
Chomsky also calls his strings,
"utterances".
For example,
Chomsky 2002, p. 15:
"Any grammar of a language will project the finite and somewhat accidental
corpus of observed utterances to a set (presumably infinite)
of grammatical utterances."
↩

**35**.
Chomsky 1956, p. 118, p. 123.
↩

**36**.
Chomsky 1957.
↩

**37**.
Backus's notation is influenced by his study of Post --
he seems not to have read Chomsky until later.
See
Backus 2006, p. 25
and
Backus 1980, p. 133.
↩

**38**.
Backus 1959.
↩

**39**.
Backus 1980, pp. 132-133.
↩

**40**.
Samuelson and Bauer 1960.
↩

**41**.
Norvell 1999.
↩

**42**.
Chomsky 1959.
↩

**43**.
Chomsky 1959, Section VIII.
In later years,
Chomsky will make it clear that he never intended
to avoid semantics:

[...] it would be absurd to develop a general syntactic theory without assigning an absolutely crucial role to semantic considerations, since obviously the necessity to support semantic interpretation is one of the primary requirements that the structures generated by the syntactic component of a grammar must meet. (Chomsky 1978, p. 20, in footnote 7 starting on p. 19).↩

**44**.
ALGOL 1960.
↩

**45**.
The linguistics literature discuss goals, philosophy and motivations,
sometimes to the detriment of actual research
(Harris 1993).
The Parsing Theory literature, on the other hand, avoids
non-technical discussions,
so that a "manifesto" listing the goals of Parsing Theory
is hard to find.
But the first two sentences of the pivotal
Knuth 1965
may serve as one:

"There has been much recent interest in languages
whose grammar is sufficiently simple
that an
**efficient**
left-to-right parsing
algorithm can be
**mechanically produced**
from the grammar.
In this paper, we define LR(k) grammars,
which are perhaps the
**most general**
ones of this type, [...]"
(p. 607, boldface added.)

Here Knuth explcitly refers to goals of efficiency,
declarativeness,
and generality,
announcing a trade-off of the first two against
the second.
Note Knuth also refers to "left-to-right"
as a goal,
but this he makes clear (pp. 607-608) that
is because he sees it as a requirement
for efficiency.
↩

**46**.

**47**.
ALGOL 1960.
↩

**48**.
As a pedantic point, a general parser need only parse

**49**.
Glennie 1960.
↩

**50**.
Glennie 1960.
↩

**51**.
Irons 1961.
Among those who state that
Irons 1961
parser is what
is now called

**52**.
Although it seems likely that parsing operator expressions would require
backtracking,
and therefore could be inefficient.
↩

**53**.
Irons is credited with the discovery of synthesized attributes
by Knuth (Knuth 1990).
Synthesized attributes are a concept in semantics,
which this timeline usually does not cover.
But synthesized attributes
will be important for us.
↩

**54**.
Lucas 1961.
↩

**55**.
Grune and Jacobs 2008
call Lucas the discoverer of recursive descent.
In fact, both
Irons 1961
and
Lucas 1961
are recursive descent with a major bottom-up element.
Perhaps Grune and Jacobs based their decision on the Lucas' description of his algorithm,
which talks about his parser's bottom-up element only briefly,
while describing the top-down element in detail.
Also, the Lucas algorithm resembles modern implementations of recursive descent
much more closely than does
Irons 1961.

In conversation, Irons described his 1961 parser as a kind of recursive descent.
Peter Denning
(Denning 1983)
gives priority to Irons and,
as a grateful former student of Irons,
that is of course my preference on
a personal basis.
↩

**56**.
Lucas 1961
cites
Samuelson and Bauer 1960.
↩

**57**.
Dijkstra 1961.
↩

**58**.
Norvell 1999.
↩

**59**.
Oettinger 1961
American Mathematical Society, 1961.
↩

**60**.
Oettinger 1961, p. 106.
↩

**61**.
"The development of a theory of pushdown algorithms should
hopefully lead to systematic techniques for generating
algorithms satisfying given requirements to replace
the ad hoc invention of each new algorithm."
(Oettinger 1961, p. 127.)
↩

**62**.
But see the entry for 1973 on Pratt parsing
(Pratt 1973)
where the idea of parsing
entire languages as operator grammars is revisited.
↩

**63**.
Schorre 1964, p. D1.3-1.
↩

**64**.
This timeline is not a mathematics tutorial,
and I have ignored important complications to
avoid digression.
For interested readers,
the details of "big O"
notation can be worth learning:
Wiki Big_O.
↩

**65**.
Because constants are ignored, all reasonable measures of
the length are equivalent for
big O

notation.
Some papers on parsing also consider the size of the grammar,
but usually size of the grammar is regarded as a fixed constant.
In this timeline all
big O

results will be
in terms of the input length.
↩

**66**.
Knuth 1965..
↩

**67**.
(Oettinger 1961, p. 127.)
↩

**68**.
Knuth 1965, p. 607, in the abstract.
↩

**69**.
Given the capacity of computer memories in 1965,
LR(1) was clearly impractical.
With the huge computer memories of 2018,
that could be reconsidered,
but even today LR(1) is rare in practical use.
LR(1) shares
the poor error-handling that
the LR(k)-based parsers became known for.
And, since
LR(1) is still somewhat limited in its power,
it just does not seem to be worth it.
↩

**70**.
In this document
I will usually speak of a language intension as if it was
a BNF grammar.
Admittedly,
this is a very simplified view of semantics.
Treating semantics as reducible to the tagged trees produced
by parsing a BNF grammar
not only assumes that the language is pure context-free,
it is not even adequate for most computer
applications.
Most applications require at least some kind
of post-processing.

But this is not a formal mathematical presentation.
and for our purposes,
we do not need to actually represent a semantics,
just the interface to it.
And the grammars we consider are almost always
context-free.
↩

**71**.
The exception are applications which receive their input "on-line";
can not determine the size of their input in advance;
and must return a result in a fixed amount of time.
For these few applications,
adding an end marker to their input
is not possible.
↩

**72**.
None of these problematic signs escape Knuth.
He discovers and proves them on pp. 630-636.
But Knuth seems to consider the LR hierarchy collapse a
mathematical curiousity,
and one with no implications for practical parsing.
↩

**73**.
I deal with the implications of
Knuth 1965
for parsing theory
at greater length in these blog posts:
Kegler Strings 2018,
Kegler Solved 2018,
and
Kegler Undershoot 2018.
↩

**74**.
In 1964, advocates of stack-based parsing would not
have been seen themselves as settling for a "safe"
or "traditional" solution.
As
recently as 1961, readers of technical journals were
not expected to know what a stack was.
↩

**75**.
Knuth 1965, p. 637.
Knuth's own language is cautious,
so it is not 100% clear
that he believes that the
future of practical parsing theory lies
in the pursuit of LR(k) subsets.
His program for further research
(Knuth 1961, pp. 637-639)
also suggests investigation of parsers for superclasses
of LR(k).
Indeed,
Knuth shows elsewhere (p. 638)
that he is well aware that some grammars
beyond LR(k) can be parsed in linear time.
Knuth also is very much aware (p. 638) that
it is an open question whether all context-free grammars
can be parsed in linear time.

Knuth even describes a new superclass of his own:
LR(k,t), which is LR(k) with more aggressive lookahead.
But Knuth introduces LR(k,t) in dismissive terms:
"Finally, we might mention another generalization of LR(k)"
(Knuth 1965, p. 638); and
"One might choose to call this left-to-right translation,
although we had to back up a finite amount."
(p. 639).

It is reasonable to suppose
that Knuth is even more negative about
the more general approaches that
he does not bother to mention.
Knuth's skepticism of more general Chomskyan approaches
is also suggested by his own plans for his (not yet released) Chapter
12 of the
Art of Computer Programming,
in which he planned to use pre-Chomskyan bottom-up methods
(Knuth 1990, p. 3).

In
Knuth 1965
(pp. 607-608),
Knuth had emphasized that
proceeding strictly left-to-right
is necessary for efficiency reasons.
So probably subsequent researchers were correct in
reading into Knuth a prediction that
research into beyond-LR(k)
parsing would be not be fruitful.
And, regardless of what Knuth himself believed,
the consensus of the parsing theorists is not
in doubt:
interest in beyond-LR parsing will almost disappear
after 1965.
Their attention will be focused almost
exclusively on Knuth's suggestions for research within the stack-based
model (p. 637).
These included grammar rewrites;
streamlining of the LR(k) tables;
and research into LR(k) subclasses.
↩

**76**.
Knuth also discusses some ambiguities,
and some intermixing of semantics with syntax,
in the
ALGOL report.
But Knuth (quite appropriately)
notes that BNF was new when it was written,
and treats these other issues
as problems with the ALGOL specification.
(See
Knuth 1965, pp. 622-625.)
↩

**77**.
Knuth 1965, p. 621.
↩

**78**.
Lewis and Stearns 1968.
They are credited in
Rosencrantz and Stearns 1970
and
Aho and Ullman 1972, p. 368.
↩

**79**.
Knuth 1965, p. 610.
See also on p. 611
"corresponds with the intuitive notion of translation
from left to right looking k characters ahead".
↩

**80**.
Knuth 1971, p. 102.
LL and LR have mirror images: RL means
scan from the right,
using left reductions

and RR acquires its current meaning
of
scan from the right, using right reductions

.
Practical use of these
mirror images is rare, but it may have occurred
in one of the algorithms in our timeline
--
operator expression parsing
in the IT compiler seems to have been RL(2) with backtracking.
↩

**81**.
Earley 1968.
↩

**82**.
Knuth 1968.
↩

**83**.
DeRemer 1969
↩

**84**.
Darwin 1984,
Section 3.1, "An old editor made new".
↩

**85**.
Aho and Ullman 1972.
↩

**86**.
Aho and Ullman 1972, p 321.
↩

**87**.
Aho and Ullman 1972, pp. 456-485.
↩

**88**.
Aho and Ullman 1972, p. 466.
↩

**89**.
Simms 2014.
↩

**90**.
Čulik and Cohen 1973.
Čulik and Cohen give an algorithm for parsing LR-regular,
but the obstacles to implementation are daunting.
(See
Grune and Jacobs 2008,
pp. 327-333 and Problem 9.28 on p. 341.)
Their algorithm did not come into practical use,
and as of 1991 LR-regular grammars could be
handled more easily by
Joop Leo's algorithm.
↩

**91**.
Haskell list comprehension, for example, requires infinite lookahead --
see
Kegler Haskell 2018.
(Haskell's native parser deals with this in post-processing.)
↩

**92**.
Pratt 1973.
↩

**93**.
Norvell 1999.
↩

**94**.
Pratt 1973, p. 51.
↩

**95**.
Synder 1975.
See, in particular, pp. 67-68.
↩

**96**.
Aho and Ullman 1977.
↩

**97**.
McIlroy and Kernighan 1979.
↩

**98**.
Johnson 1975.
↩

**99**.
See
McIlroy 1987, p. 11,
for a listing of yacc-powered
languages in V7 Unix.
↩

**100**.
Wiki Perl,

**101**.
Wadler 1990
↩

**102**.
Wadler 1985.
↩

**105**.
Some sources consider
Wadler 1985
an
early presentation of the ideas of combinator parsing.
In assigning priority here,
I follow
Grune and Jacobs 2008
(p. 564).
↩

**106**.
The paper which is devoted to parsing is
Hutton 1992.
The other paper, which centers on combinators as a programming
paradigm, is
Frost 1992.
Frost only mentions parsing in one paragraph, and
that focuses on implementation issues.
Some of his grammars include operator expressions,
but these avoid left recursion,
and implement precedence and associativity,
using parentheses.
↩

**107**.
Hutton 1992, p. 5.
↩

**108**.
Hutton 1992, pp. 19-20.
↩

**109**.
Hutton and Meijer 1996.
↩

**110**.
Hutton and Meijer 1996,
p. 3.
↩

**111**.
Hutton and Meijer 1996, p. 18.
A simpler arithmetic expression grammar on p. 15 raises no issues
not present in the more complicated example.
↩

**112**.
Aycock and Horspool 2002.
↩

**113**.
Ford 2004.
See also
Ford 2002.
↩

**114**.
Mascrenhas et al 2014.
My account of PEG is negative because almost all
real-life PEG practice is unsafe,
and almost all of the literature offers no cautions
against unsafe use of PEG.
Ford's PEG is efficient
and does in fact have real, if niche, uses.
Programmers interested in the safe use of PEG should
consult
Mascrenhas et al 2014.
↩

**116**.
Kegler 2010.
↩

**117**.
In fact,
Leo 1991
proves his algorithm,
and therefore Marpa,
is linear for the LR-regular
grammars -- the LR grammars with infinite lookahead, so long as the lookahead
is a regular expresion.
It is not known (in fact not decidable, see
Leo 1991, p. 175.),
just how large a class of grammars
Marpa is linear for.

But it is known that there are both ambiguous and unambiguous grammars
for which Marpa is not linear.
In the general case, Marpa obeys the bounds of Earley's algorithm --
it is worst-case O(n**2) for unambiguous grammars;
and worst-case O(n**3) for ambiguous grammars.
On the other hand, Marpa follows
Leo 1991
in being linear for many ambiguous grammars,
including those of most practical interest.
↩

**118**.
In the literature operator expression grammars are more often
call simply
operator grammars

.
↩

**119**.
In the general case, ambiguity is undecidable, but
in practice it is usually straight-forward for a programmer
to determine that the grammar he wants is unambiguous.
Note that while the general case is undecidable,
Marpa will tell the programmer if a parse
(a grammar with a particular input)
is ambiguous and, since it parses ambiguous
grammars, produces an error message showing where the ambiguity is.
↩

**120**.
A
marker

, in this sense, is something in the input
that shows where the middle of a middle recursion is,
without having to go to the end and count back.
Whether or not a grammar is unmarked is, in general,
an undecidable problem.
But in practice, there is a easy rule:
if a person can
eyeball

the middle of a long
middle recursion, and does not need to count from both
ends to find it,
the grammar is
marked

.
↩

**121**.
This is a very pedantic requirement,
which practical programmers can in fact ignore.
Any programmer who has found and eliminated
ambiguities in his grammar will usually,
in the process,
also have found and
eliminated ambiguous right recursions,
since it is far easier to eliminate them than
to determine if they create a overall ambiguity.
In fact,
it is an open question whether there
*are*
any unambiguous
grammars with ambiguous right recursions --
neither Joop Leo or I know of any.
↩

**122**.
Kegler 2012a.
↩

**123**.
Internally, Marpa rewrites precedenced statements into BNF.
Marpa's rewrite does not use middle recursions
and does not introduce any new ambiguities
so that, following Marpa's
linearity
rules, the result must be linear.
↩

**124**.
In this, precedence goes from tightest to loosest.
Single bars (`|`) separate RHS's of equal precedence.
Double bars (`||`) separate RHS's of different precedence.
Associativity defaults to left,
and exceptions are indicated by the value of the
`assoc`
adverb.
Note that these
precedenced statements

,
like all of Marpa's DSL,
are parsed by Marpa itself.
The statement shown has the semantic adverbs removed for clarity.
The original is part of the
Marpa::R2
test suite.
It can also be found in the "Synopsis"
of the "Scanless::DSL" document on
Marpa::R2 MetaCPAN:
https://metacpan.org/pod/release/JKEGL/Marpa-R2-4.000000/pod/Scanless/DSL.pod#Synopsis,
permalink accessed 4 September 2018.
↩