Parsing: a timeline Version 3.0 10 May 2018 Jeffrey Kegler

4th century BCE: Panini's description of Sanskrit

In India, Panini creates an exact and complete description of the Sanskrit language, including pronunciation. Sanskrit could be recreated using nothing but Panini's grammar. Panini's grammar is probably the first formal system of any kind, predating Euclid. Even today, nothing like it exists for any other natural language of comparable size or corpus. Panini is the object of serious study today. But in the 1940s and 1950s Panini is almost unknown in the West. Panini will have no direct effect on the other events in this timeline.

1906: Markov's chains

Andrey Markov introduces his chains -- a set of states with transitions between them.[1] One offshoot of Markov's work will be what we now know as regular expressions. Markov uses his chains, not for parsing, but for addressing a problem in probability -- does the law of large numbers require that events be independent?[2].

1913: Markov and Eugene Onegin

In 1913, Markov revisits his chains, applying them to the sequence of vowels and consonants in Pushkin's Eugene Onegin[3]. Again, Markov's interest is not in parsing. Nonetheless, this is an application of what later will be regarded as a parsing technique to language, and apparently for the first time in the West.

1943: Post's rewriting system

Emil Post defines and studies a formal rewriting system[4] using productions. With this, the process of rediscovering Panini in the West begins.

1945: Turing discovers stacks

Alan Turing discovers the stack as part of his design of the ACE machine. This is important in parsing because recursive parsing requires stacks. The importance of Turing's discovery is not noticed at the time and stacks will be rediscovered many times over the next two decades[5].

1948: Shannon repurposes Markov's chains

Claude Shannon publishes the foundation paper of information theory[6]. In this paper, Shannon models English using Andrey Markov's chains[7]. The approach is similar to Markov's but the intent seems to be different -- Shannon's is a serious attempt at a contribution to parsing human languages.

1949: Rutishauser's compiler

From 1949 to 1951 at the ETH Zurich, Heinz Rutishauser worked on the design of what we would now call a compiler[8]. Rutishauser's arithmetic expression parser did not honor precedence, but it did allow nested parentheses. It is perhaps the first algorithm which can really be considered a parsing method. Rutishauser's compiler was never implemented.

The Operator Issue as of 1949

In the form of arithmetic expressions, operator expressions are the target of the first efforts at automatic parsing. We will not see this issue go away. Very informally[9], we can say that an operator expression is an expression built up from operands and operators. An operand might itself be another operator expression, so operator expressions raise the issue of recursion.

The archetypal examples of operator expressions are arithmetic expressions:


In Western mathematics arithmetic expressions were read according to traditional ideas of associativity and precedence:

Certainly in the first attempts at parsing in compilers, arithmetic expressions are central. The first compiled languages are organized line-by-line and, except for arithmetic expressions, lines are interpreted using basic string manipulations. It is only when those string manipulations turn to dealing with arithmetic expressions that they become sophisticated enough to be called parsing techniques. And only then does a parsing theory to describe those techniques become helpful.

Rutishauser's language, like all languages before LISP, is structured line-by-line. No language before ALGOL is truly block-structured.

The line-by-line languages are parsed using string manipulations. A parsing theory is not helpful for describing these ad hoc string manipulations, so they don't give rise to one. The only logic in these early compilers that really deserves to be called a parsing method is that which tackles arithmetic expressions.

1950: Boehm's compiler

During 1950, Corrado Boehm, also at the ETH Zurich, develops his own compiler. Rutishauser and Boehm are working at the same institution at the same time, but Boehm is unaware of Rutishauser's work until his own is complete. Boehm's is also the first self-compiling compiler -- it is written in its own language.

Like Rutishauser's, Boehm's language is line-by-line and parsed ad hoc, except for expressions. Boehm's expression parser does honor precedence, making it perhaps the first operator precedence parser[11]. In Norvell's taxonomy[12], Boehm's algorithm inaugurates the classic approach to operator parsing.

Boehm's compiler also allows parentheses, but the two cannot be mixed -- an expression can either be parsed using precedence or have parentheses, but not both. And like Rutishauser's, Boehm's compiler is never implemented[13].

1952: Grace Hopper uses the term compiler

Grace Hopper writes a linker-loader, and calls it a compiler[14]. Hopper seems to be the first person to use this term for a computer program.

"Compiler" as of 1952

Hopper used the term compiler in a meaning it had at the time: to compose out of materials from other documents[15]. Specifically, in 1952, subroutines were new, and automated programming (what we would come to call compiling) often was viewed as providing an interface for calling a collection of carefully chosen assembler subroutines[16]. Hopper's new program took this subroutine calling one step further -- instead of calling the subroutines it expanded them (or in Hopper's terms compiled them) into a single program. Since Hopper, the term compiler has acquired a more specialized and somewhat different meaning in the computer field. Today we would not call Hopper's program a compiler[17].

As we have seen, programs that today we would call compilers were envisioned before Hopper, but nobody called these programs compilers -- compiling in our modern sense was called automatic coding, codification automatique or Rechenplanfertigung[18].

1951: Kleene's regular languages

Kleene discovers regular languages[19]. Kleene does not use regular expression notation, but his regular languages are the idea behind it.

1952: Glennie's AUTOCODE

Knuth calls Glennie's the first real[20] compiler, by which he means that it was actually implemented and used by someone to translate algebraic statements into machine language. Glennie's AUTOCODE was very close to the machine -- just above machine language. It did not allow operator expressions.

AUTOCODE was hard to use, and apparently saw little use by anybody but Glennie himself. Because Glennie worked for the British atomic weapons projects, his papers were routinely classified, so even the indirect influence of AUTOCODE was slow to spread. Nonetheless, many other compilers afterward were named AUTOCODE, which probably indicates awareness of Glennie's effort[21].

1954: The FORTRAN project begins

At IBM, a team under John Backus begins working on the language which will be called FORTRAN.

"Compiler" as of 1954

The term compiler is still being used in Hopper's looser sense, instead of its modern, specialized one. In particular, there is no implication that the output of a compiler is ready for execution by a computer. One 1954 compiler, for example, produces output containing relative addresses, which need to be translated by hand before a machine can execute the program[22].

1955: Noam Chomsky starts teaching at MIT

Noam Chomsky is awarded a Ph.D. in linguistics and accepts a teaching post at MIT. MIT does not have a linguistics department and Chomsky is free to teach his own approach. Chomsky's linguistics course is highly original and very mathematical.

1955: Work begins on the IT compiler

At Purdue, a team including Alan Perlis and Joseph Smith begins work on the IT compiler[23].

1956: The IT compiler is released

Perlis and Smith, now at the Carnegie Institute of Technology, finish the IT compiler[24]. Don Knuth calls this

the first really useful compiler. IT and IT's derivatives were used successfully and frequently in hundreds of computer installations until [their target] the [IBM] 650 became obsolete. [... P]revious systems were important steps along the way, but none of them had the combination of powerful language and adequate implementation and documentation needed to make a significant impact in the use of machines.[25]

The IT language has arithmetic expressions, of a sort -- parentheses are honored, but otherwise evaluation is always right-to-left, and there is no operator precedence. The IT compiler's way of doing arithmetic expressions proves very unpopular: Donald Knuth reports that

The lack of operator priority (often called precedence or hierarchy) in the IT language was the most frequent single cause of errors by the users of that compiler.[26]
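The users' frustration is easy to reproduce with a small sketch (ours, not the IT compiler's actual code) of strict right-to-left evaluation:

```python
# Hypothetical sketch: evaluate a flat expression strictly right-to-left,
# as the IT language did when no parentheses were given.
def eval_right_to_left(tokens):
    """tokens is an alternating list of operands and operators, e.g. [8, '-', 4, '-', 2]."""
    ops = {'+': lambda a, b: a + b,
           '-': lambda a, b: a - b,
           '*': lambda a, b: a * b}
    result = tokens[-1]                      # start from the rightmost operand
    for i in range(len(tokens) - 2, -1, -2):  # fold leftward, operator by operator
        result = ops[tokens[i]](tokens[i - 1], result)
    return result

# Conventional left-to-right reading: (8 - 4) - 2 = 2.
# IT-style right-to-left reading:      8 - (4 - 2) = 6.
print(eval_right_to_left([8, '-', 4, '-', 2]))  # → 6
```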

"Compiler" as of 1956

In the 1956 document describing the IT compiler[27], the IT team is careful to define the term. Their definition makes clear that they are using the word compiler in something like its modern sense, perhaps for the first time. From this time on, when used as a technical term within computing, compiler will usually mean what we currently understand it to mean.

1956: The Chomsky hierarchy

Chomsky publishes the paper which is usually considered the foundation of Western formal language theory[28]. Chomsky demolishes the idea that natural language grammar can be modeled using only Markov chains. Instead, the paper advocates a natural language approach that uses three layers:

Term: "Parser"

Chomsky is a turning point, so much so that he establishes or settles the meaning of many of the terms we are using. A parser, for our purposes, is something or someone that transforms a string of symbols into a structure, according to a description of the mapping from strings to structures. For our purposes, the structures can usually be considered to be parse trees.

Term: "Recognizer"

In contrast to a parser, a recognizer is something or someone that takes a string and answers yes or no -- yes if the string is in the language described by the recognizer, no otherwise. Note that, if we intend to do semantics, a recognizer alone is not sufficient.

In the strict sense, a recognizer cannot drive a compiler -- a compiler needs a parser. But recognizers can be far easier to write, and are often hacked up and pressed into service as parsers. For example, if your semantics is simple and non-recursive, you might be able to drive your semantics phase with the output of a regex engine, using captures.
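As an illustration (the toy assignment language and all names here are invented for the example), a regex recognizer's captures can drive the semantics directly:

```python
import re

# A recognizer-as-parser sketch: for a simple, non-recursive language
# (here, "key = value" assignment lines), a regex engine's captures
# stand in for a parse tree.
ASSIGNMENT = re.compile(r'^\s*(\w+)\s*=\s*(\d+)\s*$')

def interpret(line, environment):
    match = ASSIGNMENT.match(line)
    if match is None:
        return False                   # recognizer says "no": not in the language
    name, value = match.groups()       # captures drive the semantics
    environment[name] = int(value)
    return True

env = {}
interpret("answer = 42", env)
print(env)  # → {'answer': 42}
```

The moment the language becomes recursive -- nested expressions, say -- this approach collapses, which is exactly the limitation the text describes.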

1957: Chomsky publishes Syntactic Structures

Noam Chomsky publishes Syntactic Structures[30], one of the most important books of all time. The orthodoxy in 1957 is structural linguistics which argues, with Sherlock Holmes, that it is a capital mistake to theorize in advance of the facts. Structuralists start with the utterances in a language, and build upward.

But Chomsky claims that without a theory there are no facts: there is only noise. The Chomskyan approach is to start with a grammar, and use the corpus of the language to check its accuracy. Chomsky's approach will soon come to dominate linguistics.

Term: "Chomskyan parsing"

From here on, parsing theory divides into Chomskyan and non-Chomskyan. Chomskyan parsing theory becomes and remains the mainstream, but it is far from unchallenged.

Above we defined a parser (recognizer) as something or someone that parses (recognizes) a string according to a description. In Chomskyan parsing (recognizing), the description is that of Chomsky's middle layer. If the description is anything else, the parser (recognizer) is non-Chomskyan.

As of 1957, we call the description of a Chomskyan middle layer a context-free grammar -- BNF notation for context-free grammars has not yet been discovered. Once it is, we will refer to context-free grammars as BNF grammars.

1957: FORTRAN released

Backus's team makes the first FORTRAN compiler available to IBM customers. FORTRAN is the first high-level language that will find widespread implementation. As of this writing, it is the oldest language that survives in practical use.

FORTRAN I is a line-by-line language. Its parsing is non-Chomskyan. But it includes one important discovery. FORTRAN allows expressions and, learning from the dissatisfaction with the IT compiler, FORTRAN honors associativity and precedence.

The designers of FORTRAN used a strange trick for parsing operator expressions -- they hacked the expressions by adding parentheses around each operator. The number of parentheses varied, depending on the operator. Surprisingly, this works. In fact, once the theoretical understanding of operator precedence comes about, the FORTRAN I implementation is recognized as a hackish and inefficient way of implementing the classic operator precedence algorithm.
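A simplified sketch of the trick (ours; the real FORTRAN I compiler worked at the statement level and handled operands more carefully): each operator is replaced by a parenthesis-laden equivalent, with more parentheses for looser-binding operators, and the fully parenthesized result encodes precedence.

```python
# Each binary operator is swapped for a bracket-heavy version, then the
# whole expression is wrapped; the resulting nesting enforces precedence.
def fortran_parenthesize(expr):
    out = expr.replace('+', '))+((').replace('-', '))-((')
    out = out.replace('*', ')*(').replace('/', ')/(')
    return '((' + out + '))'

print(fortran_parenthesize('a+b*c'))  # → ((a))+((b)*(c))
```

In the output, the multiplication ends up nested one level deeper than the addition, so a parser that only understands parentheses gets precedence for free.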

1958: LISP released

John McCarthy's LISP appears. LISP goes beyond the line-by-line syntax -- it is recursively structured. But the LISP interpreter does not find the recursive structure: the programmer must explicitly indicate the structure herself, using parentheses. Similarly, LISP does not have operator expressions in the usual sense -- associativity and precedence must be specified with parentheses.

1959: Backus's notation

Backus discovers a new notation to describe the IAL language[31]. According to Backus's recollection, his paper[32] has only one reader: Peter Naur[33]. The IAL language will soon be renamed ALGOL.

1959: Operator precedence and stacks

Samuelson and Bauer[34] describe the use of stacks to implement operator precedence. Samuelson and Bauer do not use the word stack -- what we call a stack, they call a cellar. And they explain the idea in great detail. Apparently, 14 years after Turing's discovery of them, stacks are still unfamiliar, even to the readers of technical journals.

Their algorithm, and its presentation, are thoroughly non-Chomskyan. Samuelson and Bauer do not provide a context-free grammar for their language, just an operator precedence table.

The Operator Issue as of 1959

Since Boehm, many people have been refining operator precedence. With Samuelson and Bauer, what Norvell[35] calls "the classic algorithm" takes a well-documented form. The Samuelson and Bauer paper was very influential.

1960: The ALGOL report

The ALGOL 60 report[36] specifies, for the first time, a block structured language. ALGOL 60 is recursively structured but the structure is implicit -- newlines are not semantically significant, and parentheses indicate syntax only in a few specific cases. The ALGOL compiler will have to find the structure.

The Parsing Problem

With the ALGOL 60 report, a quest begins which continues to this day: to find a parser that solves the Parsing Problem. Ideally such a parser is

It is a case of 1960's optimism at its best. As the ALGOL committee is well aware, a parsing algorithm capable of handling ALGOL 60 does not yet exist. But the risk they are taking has immediate results -- 1961 will see a number of discoveries that are still important today. On the other hand, the parser they seek remains elusive for decades.

1960: BNF

In the ALGOL 60 report[37], Peter Naur improves the Backus notation and uses it to describe the language. This brings Backus's notation to wide attention. The improved notation will become known as Backus-Naur Form (BNF).

Term: "declarative"

For our purposes, a parser is declarative if it will parse directly and automatically from grammars written in BNF. Declarative parsers are often called syntax-driven parsers.

Term: "procedural"

A parser is procedural if it requires procedural logic as part of its syntax phase.

Term: "general"

A general parser is a parser that will parse any grammar that can be written in BNF[38].

1960: Glennie's compiler-compiler

The first description of a consciously non-Chomskyan compiler seems to predate the first description of a Chomskyan parser. It is A.E. Glennie's 1960 description of his compiler-compiler[39]. Glennie's universal compiler apparently is usable, but it is more of a methodology than an implementation -- the compilers must be written by hand.

Glennie uses BNF, but he does not use it in the way Chomsky, Backus and Naur intended. Glennie is using BNF to describe a procedure. In true BNF, the order of the rules does not matter -- in Glennie's pseudo-BNF order matters very much. This means that, for most practical grammars, Glennie's pseudo-BNF and BNF proper do not describe the same language.

Glennie is well aware of what he is doing, and is not attempting to deceive anyone -- he points out the distinction between his procedural pseudo-BNF and declarative BNF, and clearly warns his reader that the difference is important[40]. Is Glennie's non-Chomskyan-ism on theoretical grounds, or pragmatic?

1961: The first parsing paper

In January, Ned Irons publishes a paper describing his ALGOL 60 parser. It is the first paper to fully describe any parser. The Irons algorithm is pure Chomskyan and top-down with a bottom-up left corner element -- it is what now would be called a left corner parser[41].

The Irons parser is declarative and general. And since it is general, operator expressions are within the power of the Irons parser[42].

Term: "Top-down"

A top-down parser deduces the constituents of a rule from the rule. That is, it looks at the rule first, and then deduces what is on its RHS. Thus, it starts at the start rule, and works top down until it arrives at the input tokens.

It is important to note that no useful parser can be purely top-down -- if a parser worked purely top-down, it would never look at its input. So every top-down parser we will consider has some kind of bottom-up element. That bottom-up element may be very simple -- for example, one character of lookahead. The Irons 1961 parser, like most modern top-down parsers, has a sophisticated bottom-up element.

Term: "Bottom-up"

A bottom-up parser deduces a rule from its constituents. That is, it looks at either the input or at the LHS symbols of previously deduced rules, and from that deduces a rule. Thus, it works from the input tokens, and works bottom up until it reaches the start rule. A parser can be purely bottom-up, but for efficiency it is usually best to eliminate potential parses based on the grammar, and that requires some kind of top-down logic.

Term: "Synthesized attribute"

Irons 1961 also introduces synthesized attributes: the parse creates a tree, which is evaluated bottom-up. Each node is evaluated using attributes synthesized from its child nodes[43].
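The idea can be shown with a toy evaluator (the tree encoding is ours, not Irons' notation): each node's value is computed from values already synthesized at its children.

```python
# Synthesized attributes: a node's attribute (here, its numeric value)
# is computed bottom-up from the attributes of its children.
def synthesize(node):
    if isinstance(node, int):                    # leaf: a number token
        return node
    op, left, right = node                       # interior node: (op, child, child)
    a, b = synthesize(left), synthesize(right)   # children are evaluated first
    return a + b if op == '+' else a * b

# Parse tree for 2 + 3 * 4, with precedence already resolved by the parser:
tree = ('+', 2, ('*', 3, 4))
print(synthesize(tree))  # → 14
```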

1961: Lucas discovers recursive descent

Peter Lucas publishes his description of a top-down parser[44]. Either the Irons paper or this one can be considered the first description of recursive descent[45]. Except to say that he deals properly with them, Lucas does not say how he parses operator expressions. But it is easy to believe Lucas' claim -- by this time the techniques for parsing operator expressions are well understood[46].

1961: Dijkstra's shunting yard algorithm

In November 1961, Dijkstra publishes the shunting yard algorithm[47]. In Norvell's useful classification[48] of operator expression parsers, all earlier parsers have been what he calls the classic algorithm.

Dijkstra's approach is new. In the classic approach, the number of levels of precedence is hard-coded into the algorithm. Dijkstra's algorithm can handle any number of levels of precedence without a change in its code, and without any effect on its running speed.
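A minimal sketch of the difference (ours; parentheses are omitted for brevity, though Dijkstra's full algorithm handles them): the precedence levels live in a table, so adding a level changes data, not code.

```python
# Dijkstra's shunting yard idea: operators wait on a stack (Samuelson
# and Bauer's "cellar") until an operator of lower precedence arrives.
PRECEDENCE = {'+': 1, '-': 1, '*': 2, '/': 2}

def shunting_yard(tokens):
    """Convert an infix token list to postfix (reverse Polish) order."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop waiting operators that bind at least as tightly.
            while stack and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]:
                output.append(stack.pop())
            stack.append(tok)
        else:
            output.append(tok)         # operand goes straight to the output
    while stack:                       # drain the remaining operators
        output.append(stack.pop())
    return output

print(shunting_yard(['2', '+', '3', '*', '4']))  # → ['2', '3', '4', '*', '+']
```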

The Operator Issue as of 1961

The results of 1961 transform the Operator Issue. Before ALGOL, parsing operator expressions essentially was parsing. After ALGOL, almost all languages will be block-structured, and ad hoc string manipulations are no longer adequate -- the language as a whole requires a serious parsing technique. Parsing operator expressions becomes a side show, or so it seems.

Why not use the algorithms that parse operator expressions for the whole language? Samuelson and Bauer 1959 had suggested exactly that. But, alas, operator expression parsing is not adequate for languages as a whole[49].

Also by 1961, we have BNF. This gives us a useful notation for describing grammars. It allows us to introduce our Basic Operator Grammar (BASIC-OP):

      S ::= E
      E ::= E + T
      E ::= T
      T ::= T * F
      T ::= F
      F ::= number

BASIC-OP has two operators, three levels of precedence and left associativity. This was enough to challenge the primitive parsing techniques in use before 1961. Surprisingly, this simple grammar will continue to bedevil mainstream parsing theory for the next half a century.
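The core of the difficulty can be previewed with a short sketch (ours, purely illustrative): a recursive descent procedure transcribed naively from the rule E ::= E + T calls itself before consuming a single token, so it recurses forever.

```python
# A naive recursive descent procedure for BASIC-OP's left-recursive
# rule "E ::= E + T". The first symbol of the rule sends the parser
# straight back into the same procedure, with no input consumed.
def naive_E(tokens, pos=0):
    naive_E(tokens, pos)          # left recursion: infinite regress
    # ... the '+', the T, and the E ::= T alternative are never reached

def demonstrate():
    try:
        naive_E(['number', '+', 'number'])
    except RecursionError:        # trap the runaway recursion instead of hanging
        return "infinite regress"

print(demonstrate())  # → infinite regress
```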

Recursive descent, it turns out, cannot parse BASIC-OP because it is left recursive. And that is not the end of it. Making addition and multiplication right-associative is unnatural and, as the authors of the IT compiler found out, causes users to revolt. But suppose we try to use this Right-recursive Operator Grammar (RIGHT-OP) anyway:

      S ::= E
      E ::= T + E
      E ::= T
      T ::= F * T
      T ::= F
      F ::= number

Recursive descent, without help, cannot parse RIGHT-OP. As of 1961, parsing theory has not developed well enough to state why in precise terms. Suffice it to say for now that RIGHT-OP requires too much lookahead.

But recursive descent does have a huge advantage, one which, despite its severe limitations, will save it from obsolescence time and again. Hand-written recursive descent is essentially calling subroutines. Adding custom modifications to recursive descent is very straightforward.

In addition, while pure recursive descent cannot parse operator expressions, it can recognize them. This means pure recursive descent may not be able to create the parse subtree for an operator expression itself, but it can recognize the expression and hand control over to a specialized operator expression parser. This seems to be what Lucas' 1961 algorithm did, and it is certainly what many other implementations did afterwards. Adding the operator expression subparser makes the implementation only quasi-Chomskyan, but this was a price the profession has been willing to pay.

Alternatively, a recursive descent implementation can parse operator expressions as lists, and add associativity in post-processing. This pushes some of the more important parsing out of the syntactic phase into the semantics but, once again, it seemed that Chomskyan purity had to be thrown overboard if the ship was to stay afloat.

Bottom line: as of 1961 the Operator Issue takes a new form. Because of the Operator Issue, recursive descent is not sufficient for practical grammars -- it must always be part of a hybrid.

In this context, Dijkstra's new 1961 algorithm is a welcome alternative: as an operator expression subparser, it can parse operator expressions faster and in less space. But Dijkstra's algorithm has no more parsing power than the classic operator precedence algorithm -- it does nothing to change the basic tradeoffs.

1964: The Meta II compiler

Schorre publishes a paper on the Meta II compiler writing language, summarizing the papers of the 1963 conference. Schorre cites both Backus and Chomsky as sources for Meta II's notation. Schorre notes that his parser is entirely different from that of Irons 1961 -- in fact, it is non-Chomskyan. Meta II is a template, rather than something that his readers can use, but in principle it can be turned into a fully automated compiler-compiler[50].

1965: Knuth discovers LR

Don Knuth discovers LR parsing[51]. The LR algorithm is deterministic, Chomskyan and bottom-up. Knuth is primarily interested in the mathematics, and the parsing algorithm he gives is not practical. He leaves development of practical LR parsing algorithms as an open question for future research[52].

Term: "linear"

Among the goals for the ideal ALGOL parser was that it be efficient. As of 1965 this notion becomes more precise, thanks to a notation that Knuth borrows from calculus. This big O notation characterizes algorithms using functions, but it treats functions that differ by a constant multiple as identical[53]. Ignoring the constant means that conclusions drawn from big O results outlast steady improvements in technology, but capture non-linear jumps.

These big O functions take as their input some aspect of the algorithm's input. In the case of parsing, by convention, the big O function is usually a function of the length of the input[54]. Of the many possible big O functions, only a few will be of interest in this timeline.

By this time, efficient for a parser means linear or quasi-linear. Parsers which operate in quadratic, cubic, or exponential time are considered impractical. But parsers aimed at practitioners will often push the envelope -- any parser which uses backtracking is potentially exponential and is designed in the (often disappointed) hope that the backtracking will not get out of hand for the grammar of interest to the user.

The Operator Issue as of 1965

Knuth 1965 is a significant milestone in the history of operator expression parsing. Knuth specifically addresses the parsing of ALGOL and he zeroes in on its arithmetic expressions as the crucial issue[55]. Knuth suggests a "typical" grammar[56] (here called KNUTH-OP) which is short, but encapsulates the parsing challenges presented by ALGOL's arithmetic expressions:

      S ::= E
      E ::= - T
      E ::= T
      E ::= E - T
      T ::= P
      T ::= T * P
      P ::= a
      P ::= ( E )

KNUTH-OP is left-recursive, allows parentheses, has three levels of precedence, and implements both unary and binary minus. Recursive descent cannot parse KNUTH-OP, but it is LR(1). This means that it is well within Knuth's new class of grammars and, by implication, probably within a practical subclass of the LR grammars.

An influential misconception

In his 1965 paper, Knuth makes a reasonable assumption, but one which turns out to be wrong:

Another important area of research is to develop algorithms that accept LR(k) grammars, or special classes of them, and to mechanically produce efficient parsing programs.[57]

Implied in this is the belief that better algorithms for parsing LR(k) grammars will either be parsers aimed specifically at LR(k) or at subsets of LR(k). That is, Knuth excludes the idea that algorithms which target supersets of LR(k) might be faster than those that take on LR(k) directly. In short, he assumes, as is usually the case, that as you gain in parsing power, you lose in terms of speed. That is a good rule of thumb, and a reasonable expectation, but it is not always true.

A factor in this misconception may be another assumption: the most efficient way to parse a deterministic language is with a deterministic parser. Certainly all deterministic parsers are linear. And certainly all non-deterministic parsers can run in worse than quasi-linear time, at which point they quickly become impractical. But neither of these truths directly answers the question of practical interest:

Is there a non-deterministic parser which is linear for the LR(k) grammars, or a superset of them?

This question will be answered by Joop Leo in 1991. The answer, surprisingly, will be yes.

1968: Lewis and Stearns discover LL

When Knuth discovered the LR grammars, he announced them to the world with a full-blown mathematical description. The top-down grammars, which arose historically, have lacked such a description. In 1968, Lewis and Stearns fill that gap by defining the LL(k) grammars[58].

Terms: "LL" and "LR"

When LL is added to the vocabulary of parsing, the meaning of LR shifts slightly. In 1965 Knuth meant LR to mean translatable from left to right[59]. But LL means scan from the left, using left reductions and, in response, the meaning of LR shifts to become scan from the left, using right reductions[60].

The number in the parentheses in this notation for parsing algorithms usually indicates the number of tokens of lookahead. As an example, LL(1) means scan from the left, using left reductions, with one token of lookahead. LL(1) will be important in what follows.

The Operator Issue as of 1968

With Knuth 1965 and Lewis and Stearns 1968, we can now restate the problem with recursive descent and operator expressions in precise terms: Recursive descent in its pure form, is LL(1). Arithmetic operator grammars are not LL(1) -- not even close. In fact, of the grammars BASIC-OP, RIGHT-OP, and KNUTH-OP, none is LL(k) for any k.

Compromises have to be made if we are parsing with recursive descent. As one result of these compromises, truly declarative versions of LL(1) are not used. A pure declarative LL(1) parser generator could be written, but it would not be able to parse arithmetic expressions properly.

As mentioned, a common compromise is to have recursive descent parse arithmetic expressions as lists, and add associativity in post-processing. We are now able to look at this in more detail. An extended BNF grammar to recognize BASIC-OP as a list is as follows:

      S  ::= E
      E  ::= T { TI }*
      TI ::= '+' T
      T  ::= F { FI }*
      FI ::= 'x' F
      F  ::= number

In the above, {Z}* means zero or more occurrences of Z. Expanded into pure BNF, and avoiding empty right hand sides, our operator "list" grammar becomes LIST-OP:

      S  ::= E
      E  ::= T TL
      E  ::= T
      TL ::= TI
      TL ::= TI TL
      TI ::= '+' T
      T  ::= F FL
      T  ::= F
      FL ::= FI
      FL ::= FI FL
      FI ::= 'x' F
      F  ::= number

LIST-OP is LL(1), and therefore can be parsed directly by recursive descent. Note that in LIST-OP, although associativity must be provided by a post-processor, the grammar enforces precedence.
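A recursive descent parser for LIST-OP can be sketched as follows (function names and the flat-list representation are ours). One token of lookahead decides every choice, and associativity is left for a later pass.

```python
# LL(1) recursive descent for LIST-OP: each procedure corresponds to a
# nonterminal, and operator expressions come back as flat lists.
def parse_F(tokens, pos):             # F ::= number
    return int(tokens[pos]), pos + 1

def parse_T(tokens, pos):             # T ::= F { 'x' F }*
    factor, pos = parse_F(tokens, pos)
    factors = [factor]
    while pos < len(tokens) and tokens[pos] == 'x':   # one token of lookahead
        factor, pos = parse_F(tokens, pos + 1)
        factors.append(factor)
    return (factors[0] if len(factors) == 1 else ['x'] + factors), pos

def parse_E(tokens, pos):             # E ::= T { '+' T }*
    term, pos = parse_T(tokens, pos)
    terms = [term]
    while pos < len(tokens) and tokens[pos] == '+':
        term, pos = parse_T(tokens, pos + 1)
        terms.append(term)
    return (terms[0] if len(terms) == 1 else ['+'] + terms), pos

def parse_S(tokens):                  # S ::= E
    tree, pos = parse_E(tokens, 0)
    return tree

print(parse_S(['1', '+', '2', 'x', '3', '+', '4']))
# → ['+', 1, ['x', 2, 3], 4]
```

Note how the grammar enforces precedence on its own -- the 'x' list nests inside the '+' list. Only associativity is deferred to the post-processor, which would rewrite each flat list into left-associative binary nodes.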

1968: Earley's algorithm

Jay Earley discovers the algorithm named after him[61]. Like the Irons algorithm, Earley's algorithm is Chomskyan, declarative and fully general. Unlike the Irons algorithm, it does not backtrack. Earley's algorithm is both top-down and bottom-up at once -- it uses dynamic programming and keeps track of the parse in tables. Earley's approach makes a lot of sense and looks very promising indeed, but it has three serious issues:

1968: Attribute grammars

Irons' synthesized attributes had always been inadequate for many tasks. Until now, they had been supplemented by side effects or state variables. In 1968, Knuth publishes a paper on a concept he had been working on for the previous few years: inherited attributes[62].

Term: "Inherited attributes"

Recall that a node in a parse gets its synthesized attributes from its children. Inherited attributes are attributes that a node gets from its parents. Of course, if both inherited and synthesized attributes are used, there are potential circularities. But inherited attributes are powerful and, with care, the circularities can be avoided.

Term: "Attribute grammar"

An attribute grammar is a grammar whose nodes may have both inherited and synthesized attributes.

1969: LALR

Since Knuth 1965 many have taken up his challenge to find a practically parseable subset of the LR(k) languages. In 1969, Frank DeRemer describes a new variant of Knuth's LR parsing[63]. DeRemer's LALR algorithm requires only a stack and a state table of quite manageable size. LALR looks practical, and can parse KNUTH-OP. LALR will go on to become the most widely used of the LR(k) subsets.

1969: the ed editor

Ken Thompson writes the ed editor as one of the first components of UNIX[64]. Before 1969, regular expressions were an esoteric mathematical formalism. Through the ed editor and its descendants, regular expressions become an everyday part of the working programmer's toolkit.

1972: Aho and Ullman is published

Alfred Aho and Jeffrey Ullman publish the first volume[65] of their two volume textbook summarizing the theory of parsing. This book is still important. It is also distressingly up-to-date -- progress in parsing theory slowed dramatically after 1972. Aho and Ullman's version of Earley's algorithm includes a straightforward fix to the zero-length rule bug in Earley's original[66]. Unfortunately, this fix involves adding even more bookkeeping to Earley's.

Under the names TDPL and GTDPL, Aho and Ullman investigate the non-Chomskyan parsers in the Schorre lineage[67]. They note that it can be quite difficult to determine what language is defined by a TDPL parser[68]. At or around this time, rumor has it that the main line of development for GTDPL parsers is classified secret by the US government[69]. Public interest in GTDPL fades.

1973: Pratt parsing

As we have noted, pure LL(1) cannot parse operator expressions, and so operator expression parsers are often called into service as subparsers. What about making the entire grammar an operator grammar? This idea had already occurred to Samuelson and Bauer in 1959. In 1973, Vaughan Pratt revives it[70].

There are problems with switching to operator grammars. Operator grammars are non-Chomskyan -- the BNF no longer accurately describes the grammar. Instead the BNF becomes part of a combined notation, and the actual grammar parsed depends also on precedence, associativity, and semantics. And operator grammars have a very restricted form -- most practical languages are not operator grammars.

But many practical grammars are almost operator grammars. And the Chomskyan approach has always had its dissenters. Vaughan Pratt is one of these, and discovers a new approach to operator expression parsing: Pratt parsing is the third and last approach in Theodore Norvell's taxonomy of approaches to operator expression parsing[71]. Some have adopted Pratt parsing as the overall solution to their parsing problems[72].

As of 2018, the Pratt approach is not popular as an overall strategy. Under the name precedence climbing, it is most often used as a subparser within a recursive descent strategy. All operator expression subparsers break the Chomskyan paradigm, so the non-Chomskyan nature of Pratt's parser is not a problem in this context.
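For illustration, here is a minimal precedence-climbing subparser in Python. The token format, precedence table, and output shape are my own, not Pratt's original presentation.

```python
# A minimal precedence-climbing operator-expression subparser, the
# style often embedded in a recursive descent parser.  Tokens, the
# precedence table, and the output format are illustrative.

PREC = {"+": (1, "left"), "-": (1, "left"),
        "*": (2, "left"), "/": (2, "left"),
        "**": (3, "right")}

def parse_expr(tokens, min_prec=1):
    """Consume tokens from the front; return a nested (op, lhs, rhs) tree."""
    lhs = tokens.pop(0)  # assume each operand is a single token
    while tokens and tokens[0] in PREC:
        prec, assoc = PREC[tokens[0]]
        if prec < min_prec:
            break  # this operator belongs to an enclosing call
        op = tokens.pop(0)
        # For left association, the right operand must bind strictly
        # tighter; for right association, equally tight is enough.
        rhs = parse_expr(tokens, prec + 1 if assoc == "left" else prec)
        lhs = (op, lhs, rhs)
    return lhs

print(parse_expr(["1", "-", "2", "-", "3"]))
# left associative: ('-', ('-', '1', '2'), '3')
print(parse_expr(["2", "**", "3", "**", "2"]))
# right associative: ('**', '2', ('**', '3', '2'))
```

The single `min_prec` threshold is what makes the technique inherently binary, as the 2012 discussion of n-ary operators later in this timeline points out.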

1975: The C compiler is converted to LALR

Bell Labs converts its C compiler from hand-written recursive descent to DeRemer's LALR algorithm[73].

1977: The first dragon book is published

The first dragon book[74] comes out. This soon-to-become classic textbook is nicknamed after the drawing on the front cover, in which a knight takes on a dragon. Emblazoned on the knight's lance are the letters LALR. From here on out, to speak lightly of LALR will be to besmirch the escutcheon of parsing theory.

1979: Yacc is released

Bell Laboratories releases Version 7 UNIX[75]. V7 includes what is, by far, the most comprehensive, usable and easily available compiler writing toolkit yet developed.

Part of the V7 toolkit is Yet Another Compiler Compiler (yacc)[76]. yacc is LALR-powered. Despite its name, yacc is the first compiler-compiler in the modern sense. For a few useful languages, the process of going from Chomskyan specification to executable is now fully automated. Most practical languages, including the C language and yacc's own input language, still require manual hackery[77].

The Operator Issue as of 1979

Recall the criteria outlined for a solution of the Parsing Problem: that the parser be efficient, practical, declarative and general. LALR is linear and runs fast on 1979 hardware. As of 1979, LALR seems practical. And LALR is declarative. True, LALR is very far from general, but BASIC-OP, RIGHT-OP, KNUTH-OP and LIST-OP are all LALR, and LALR can parse most operator expressions directly.

Not all of the criteria of the original quest have been fully met. Nonetheless, after two decades of research, it seems as if the Parsing Problem and the Operator Issue can both be declared solved.

1987: Perl 1 is released

Larry Wall introduces Perl 1[78]. Perl embraces complexity like no previous language. Larry uses yacc and LALR very aggressively -- to my knowledge more aggressively than anyone before or since.

1990: Monads

Philip Wadler starts to introduce monads to the functional programming community. As one example, he converts his earlier functional parsing techniques to monads[79]. The parsing technique is pure recursive descent. The examples shown for monadic parsing are very simple, and do not include operator expressions. In his earlier non-monadic work, Wadler uses a very restricted form of operator expression: his grammar avoided left recursion by using parentheses, and operator precedence was not implemented[80].

1991: Leo's speed-up of Earley's algorithm

Joop Leo discovers a way of speeding up right recursions in Earley's algorithm[81]. Leo's algorithm is linear for just about every unambiguous grammar of practical interest, and for many ambiguous ones as well. By 1991, hardware is six orders of magnitude faster than 1968 hardware, so the issue of bookkeeping overhead has receded in importance. This is a major discovery. When it comes to speed, the game has changed in favor of the Earley algorithm.

Knuth's influential misconception revisited

Recall that Knuth 1965, in suggesting lines of future research, had excluded the possibility that a parser of a superset of the LR(k) languages could be practical and linear. Knuth did not conclude this carelessly -- he proved a strong relationship between the deterministic languages of previous theory, and the language defined by his new LR(k) model. Knuth's argument was highly suggestive, so much so that most took it as a proof.

But it was not a proof -- Earley's non-deterministic algorithm, while not linear in general, is linear for a superset of Knuth's LR(k) grammars. Is it practical? At this point, Earley parsing is almost forgotten, and few take notice of Earley's discovery. Two decades will pass before anyone attempts a practical implementation of Leo 1991.

The Operator Issue as of 1991

Earley's is forgotten. So everyone in LALR-land is content, right? Wrong. In fact, far from it.

As of 1979, LALR seemed practical. But by the 1990s users of LALR are making unpleasant discoveries. While LALR automatically generates their parsers, debugging them is so hard they could just as easily write the parser by hand. Once debugged, their LALR parsers are fast for correct inputs. But almost all that they tell users about incorrect inputs is that the inputs are incorrect. In Larry's words, LALR is fast but stupid[82].

On the written record, the discontent with LALR and yacc is hard to find. What programmer wants to announce to colleagues and potential employers that he cannot figure out how to make the standard state-of-the-art tool work? But, by 1991, a movement away from LALR has already begun. Parsing practice falls back on recursive descent and the Operator Issue is back in full force.

1992: Combinator parsing

Combinator parsing is introduced in two 1992 papers[83]. Of more interest to us is the one by Hutton[84]. As Hutton explains[85], a combinator is an attribute grammar with one inherited attribute (the input string) and two synthesized attributes (the value of the node and the remaining part of the input string). Combinators can be combined to compose other parsers, which allows an algebra of parsers. It's an extremely attractive idea.

But, exciting as the new mathematics is, the underlying parsing theory contains nothing new -- it is the decades-old recursive descent, unchanged. Hutton's example language is extremely basic -- his operator expressions require left association, but not precedence. Left association is implemented by post-processing.
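The idea can be sketched in Python rather than a lazy functional language; the combinator names below are generic, not Hutton's. A parser is a function from an input string to a list of (value, remaining-input) pairs, and, as in Hutton's example language, left association is recovered by post-processing a parsed list rather than by left recursion.

```python
from functools import reduce

# Parser combinators sketched in Python: a parser maps an input string
# to a list of (value, remaining input) pairs.  Names are generic.

def symbol(c):
    return lambda s: [(c, s[1:])] if s[:1] == c else []

def digit(s):
    return [(int(s[0]), s[1:])] if s[:1].isdigit() else []

def seq(p, q):  # sequence two parsers, pairing their values
    return lambda s: [((v1, v2), s2) for v1, s1 in p(s) for v2, s2 in q(s1)]

def fmap(f, p):  # transform a parser's value
    return lambda s: [(f(v), s1) for v, s1 in p(s)]

def many(p):  # zero or more repetitions, all possible splits
    def run(s):
        out = [([], s)]
        for v, s1 in p(s):
            out += [([v] + vs, s2) for vs, s2 in run(s1)]
        return out
    return run

# "digit ('-' digit)*": left recursion is avoided, and the flat list
# is post-processed into a left-associated tree.
tail = fmap(lambda pair: pair[1], seq(symbol("-"), digit))
expr = fmap(lambda pair: reduce(lambda a, b: ("-", a, b), pair[1], pair[0]),
            seq(digit, many(tail)))

full = [v for v, rest in expr("1-2-3") if rest == ""]
print(full)  # [('-', ('-', 1, 2), 3)]
```

Note that the grammar itself never left-recurses; the left-leaning tree exists only because of the fold applied afterwards, which is exactly the two-layer arrangement criticized below.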

Hutton's main presentation does not use monads. In his final pages, however, Hutton points out combinator parsers give rise to a monad and shows how his presentation could be rewritten in a form closely related to monads[86].

1995: Wadler's monads

In the adaptation of monads into the world of programming, Wadler 1995 is a turning point. One of Wadler's examples is a parser based on monads and combinator parsing. Combinator parsing becomes the standard example of monad programming.

In its monadic form, the already attractive technique of combinator parsing is extremely elegant and certainly seems as if it should solve all parsing problems. But, once again, it is still recursive descent, completely unchanged. Wadler handles left recursion by parsing into a list, and re-processing that list into left recursive form. Wadler's example grammar only has one operator, so the issue of precedence does not arise.

Instead of one paradigm to solve them all, Wadler ends up with a two layered approach, with a hackish bottom layer and an awkward interface. This undercuts the simplicity and elegance of the combinator approach.

1996: Hutton and Meijer on combinator parsing

Wadler 1995 used parsing as an example to motivate its presentation of monads. Graham Hutton and Erik Meijer[87] take the opposite perspective -- they write a paper on combinator parsing that could also be viewed as a first introduction to the use of monads in programming.[88]

The most complicated grammar[89] for operator expressions in Hutton and Meijer 1996 has two operators (exponentiation and addition) with two levels of precedence and two different associativities: exponentiation associates right and addition associates left.

The solution in Hutton and Meijer 1996 will be familiar by now: Precedence is provided by the combinator parser which parses operators and operands of the same precedence into lists. Associativity is provided by re-processing the lists.
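The re-processing step can be sketched in a few lines of Python (function names invented): once a precedence layer has parsed its operands into a list, associativity is just a choice of fold direction.

```python
from functools import reduce

# Once the operands of one precedence level are parsed into a flat
# list, associativity is a choice of fold direction.  Names invented.

def fold_left(op, operands):
    # (((a op b) op c) ...) -- left association
    return reduce(lambda a, b: (op, a, b), operands[1:], operands[0])

def fold_right(op, operands):
    # (a op (b op (c ...))) -- right association
    if len(operands) == 1:
        return operands[0]
    return (op, operands[0], fold_right(op, operands[1:]))

print(fold_left("+", [1, 2, 3]))    # ('+', ('+', 1, 2), 3)
print(fold_right("**", [2, 3, 2]))  # ('**', 2, ('**', 3, 2))
```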

The Operator Issue as of 1996

As we have seen, the literature introducing monads and combinator parsing added nothing to what was already known about parsing algorithms. We have focused on it for three reasons:

2000: The Perl 6 effort begins

Larry Wall decides on a radical reimplementation of Perl -- Perl 6. Larry does not even consider using LALR again.

2002: Aycock and Horspool solve Earley's zero-length rule problem

John Aycock and R. Nigel Horspool publish their attempt at a fast, practical Earley's parser[90]. Missing from it is Joop Leo's improvement -- they seem not to be aware of it. Their own speedup is limited in what it achieves and the complications it introduces can be counter-productive at evaluation time. But buried in their paper is a solution to the zero-length rule bug. And this time the solution requires no additional bookkeeping.

2004: PEG

Implementers by now are avoiding yacc, but the demand for declarative parsers remains. In 2004, Bryan Ford[91] fills this gap by repackaging the nearly-forgotten GTDPL. Ford's new algorithm, PEG, is declarative, always linear, and has an attractive new syntax.

But PEG is, in fact, pseudo-declarative -- it uses the BNF notation, but it does not parse the BNF grammar described by the notation: PEG achieves unambiguity by finding only a subset of the parses of its BNF grammar. And, as with its predecessor GTDPL, in practice it is usually impossible to determine what the subset is. This means that the best a programmer usually can do is to create a test suite and fiddle with the PEG description until it passes. Problems not covered by the test suite will be encountered at runtime.

PEG takes the old joke that the code is the documentation one step further -- with PEG it is often the case that nothing documents the grammar, not even the code. Under this circumstance, creating reliable software is impossible. As it is usually used, PEG is the nitroglycerin of LL(1) parsing -- slightly more powerful, but too dangerous to be worth it.

PEG, in safe use, would essentially be LL(1)[92]. Those adventurous souls who parse operator expressions under PEG typically seem to parse the operator expressions as lists, and re-process them in the semantics.
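The subset behavior can be made concrete with a toy PEG-style interpreter in Python (the combinator names are invented). Read as BNF, A ::= 'a' | 'ab' matches both strings; under PEG's ordered choice, the second alternative is unreachable whenever the first succeeds, so the language actually parsed is a proper subset of what the notation suggests.

```python
# A toy PEG-style interpreter: each parser returns the number of
# characters it matched, or None on failure.  Ordered choice commits
# to the first alternative that succeeds.  Names are invented.

def lit(t):
    return lambda s: len(t) if s.startswith(t) else None

def choice(*parsers):  # PEG ordered choice: first success wins
    def run(s):
        for p in parsers:
            n = p(s)
            if n is not None:
                return n  # no reconsideration of later alternatives
        return None
    return run

def seq(*parsers):  # match the parsers one after another
    def run(s):
        total = 0
        for p in parsers:
            n = p(s[total:])
            if n is None:
                return None
            total += n
        return total
    return run

# BNF reading: A ::= 'a' | 'ab', so "A 'c'" should match "abc".
# PEG reading: 'a' always wins, so the parse of "abc" fails.
A = choice(lit("a"), lit("ab"))
print(seq(A, lit("c"))("ac"))   # 2
print(seq(A, lit("c"))("abc"))  # None
```

In a toy this small the discrepancy is easy to spot; in a grammar of realistic size, determining which parses have been silently discarded is the hard problem described above.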

2006: GNU C reverts to recursive descent

The old Bison-based C and Objective-C parser has been replaced by a new, faster hand-written recursive-descent parser.[93]

With this single line in the middle of a change list hundreds of lines long, GNU announces a major event in parsing history. (Bison, an LALR parser, is the direct descendant of Steve Johnson's yacc.)

For three decades, the industry's flagship C compilers have used LALR as their parser -- proof of the claim that LALR and serious parsing are equivalent. Now, GNU replaces LALR with the technology that LALR replaced a quarter century earlier: recursive descent.

2010: Leo 1991 is implemented

Jeffrey Kegler (the author of this timeline) releases the first practical implementation of Joop Leo's algorithm[94]. The new parser, named Marpa, is general, declarative and linear for all the grammars discussed in Knuth 1965, that is, it is linear for the LR(k) grammars for all finite k[95]. This means it is also linear for most of the grammar classes we have discussed in this timeline.

This is a vast class of grammars, and it has the important feature that a programmer can readily determine if their grammar is linear under Marpa. Marpa will parse a grammar in linear time, if

The ability to ensure that a grammar can be parsed in linear time is highly useful for second-order languages -- a programmer can easily ensure that all his automatically generated grammars will be practical. Marpa's own DSL will make use of second-order programming.

Leo 1991 did not address the zero-length rule problem. Kegler solves it by adopting the solution of Aycock and Horspool 2002. Finally, in an effort to combine the best of declarative and hand-written parsing, Kegler reorders Earley's parse engine so that it allows procedural logic. As a side effect, this gives Marpa excellent error-handling properties.

2012: General precedence parsing

While the purely precedence-driven parsing of Samuelson and Bauer 1960 and Pratt 1973 never caught on, the use of precedence in parsing remains popular. In fact, precedence tables often occur in documentation and standards, which is a clear sign that they too can play a role in human-readable but precise descriptions of grammars.

In 2012, Kegler releases a version of Marpa which allows the user to mix Chomskyan and precedence parsing at will[100]. Marpa's precedenced statements describe expressions in terms of precedence and association. If precedenced statements are added to a grammar that Marpa parses in linear time, then the grammar remains linear as long as the precedenced statements are (after precedence is taken into consideration) unambiguous[101].

Here is an example of a Marpa precedenced statement[102]:

              Expression ::=
                 Number
              |  '(' Expression ')' assoc => group
              || Expression '**' Expression assoc => right
              || Expression '*' Expression
              |  Expression '/' Expression
              || Expression '+' Expression
              |  Expression '-' Expression

Previously operator expression parsers had worked by comparing adjacent operators. This limited them to binary expressions (the ternary expressions in most languages are implemented using post-parsing hackery). Marpa's precedenced expressions are not operator-driven -- operators are used only to avoid ambiguity. Precedenced statements will directly implement any number of ternary, quaternary or n-ary operators. Ternary and larger expressions are useful in financial calculations, and are common in calculus.

The Operator Issue as of 2012

Marpa allows operator expressions to be parsed as BNF, eliminating the need for specialized operator expression parsers. On the other hand, Marpa's ability to handle second-order languages allows it to take statements which describe expressions in terms of precedence and association, and rewrite them in a natural way into Chomskyan form. In this way the grammar writer has all the benefits of Chomskyan parsing, but is also allowed to describe his grammar in terms of operators when that is convenient. Once again, the Operator Issue seems to be solved, but this time perhaps for real.
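The rewrite into Chomskyan form can be sketched mechanically; the input table format and generated rule names below are hypothetical, not Marpa's internals. Each precedence level becomes one stratum of BNF rules, and associativity decides which side of the operator refers to the tighter stratum.

```python
# A sketch of rewriting precedence levels into stratified BNF.
# The input format and generated names are hypothetical, not Marpa's.

def precedence_to_bnf(levels, base="Atom"):
    """levels: loosest-binding level first; each level is a list of
    (operator, associativity) pairs."""
    names = ["Prec%d" % i for i in range(len(levels))] + [base]
    rules = []
    for i, level in enumerate(levels):
        this, tighter = names[i], names[i + 1]
        for op, assoc in level:
            if assoc == "left":   # recurse on the left side
                rules.append("%s ::= %s '%s' %s" % (this, this, op, tighter))
            else:                 # right association: recurse on the right
                rules.append("%s ::= %s '%s' %s" % (this, tighter, op, this))
        rules.append("%s ::= %s" % (this, tighter))  # fall through
    return rules

for rule in precedence_to_bnf([[("+", "left"), ("-", "left")],
                               [("*", "left"), ("/", "left")],
                               [("**", "right")]]):
    print(rule)
```

Running this prints a stratified grammar in which, for example, the tightest stratum reads Prec2 ::= Atom '**' Prec2, capturing right association directly in BNF.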

The Parsing Problem as of 2012

Again recall the four criteria from the original quest for a solution to the Parsing Problem: that a parser be efficient, practical, declarative and general. Does Marpa[103] fulfill them?

At first glance, no. While Marpa is general, it is not linear or quasi-linear for the general case, and therefore cannot be considered efficient enough. But to be general, a parser has to parse every context-free grammar, including some wildly ambiguous ones, for which simply printing the list of possible parses takes exponential time. With the experience of the decades, linearity for a fully general BNF parser seems to be a superfluous requirement for a practical parser.

The LR(k) grammars include every other grammar class we have discussed -- LL(k) for all k, LALR, operator grammars, etc. If we relax the criteria so that, instead of full generality with linearity, we require the parser to be efficient, practical and declarative, and to be general and linear for a class of grammars that includes the LR(k) grammars, then Marpa qualifies.

My teachers came from the generation which created the ALGOL vision and started the quest for a solution to the Parsing Problem. Alan Perlis was one of them. Would he have accepted my claim that I have found the solution to the problem he posed? I do not know, but I am certain, as anyone else who knew him can attest, that he would have given me a plain-spoken answer. In that spirit, I submit this timeline to the candid reader. In any case, I hope that my reader has found this journey through the history of parsing informative.


Bibliography

An attempt is made to list the original publication, which is not necessarily the one consulted.

Aho and Ullman 1972: Aho, Alfred V., and Jeffrey D. Ullman. The theory of parsing, translation, and compiling. Vol. 1. Prentice-Hall, 1972.

Aho and Ullman 1973: Aho, Alfred V., and Jeffrey D. Ullman. The theory of parsing, translation, and compiling. Vol. 2. Prentice-Hall, 1973.

Aho and Ullman 1977: Aho, Alfred V., and Jeffrey D. Ullman. Principles of Compiler Design. Addison-Wesley, 1977. The dragon on the cover is green, and this classic first edition is sometimes called the green dragon book to distinguish it from its second edition, which had a red dragon on the cover.

ALGOL 1960: Backus, John W., F. L. Bauer, J. Green, C. Katz, J. McCarthy, A. J. Perlis, H. Rutishauser, K. Samelson, B. Vauquois, J. H. Wegstein, A. van Wijngaarden, and M. Woodger. Revised report on the algorithmic language Algol 60. Edited by Peter Naur. Springer, 1969. Accessed 23 April 2018. The report of the ALGOL 60 committee was very important, and I have departed from format to list all of its authors.

Aycock and Horspool 2002: Aycock, John, and R. Nigel Horspool. Practical Earley Parsing. The Computer Journal, vol. 45, issue 6, 2002, pp. 620-630. Accessed 23 April 2018.

Backus 1959: Backus, John W. The syntax and semantics of the proposed international algebraic language of the Zurich ACM-GAMM conference. Proceedings of the International Conference on Information Processing, 1959. Accessed 24 April 2018.

Backus 1980: Backus, John. Programming in America in the 1950s - Some Personal Impressions. A History of Computing in the Twentieth Century, 1980, pp. 125-135. Accessed 24 April 2018.

Backus 2006: Booch, Grady. Oral History of John Backus. Computer History Museum, 5 September 2006. Accessed 24 April 2018.

Carpenter and Doran 1977: Carpenter, Brian E., and Robert W. Doran. The other Turing machine. The Computer Journal, vol. 20, issue 3, 1 January 1977, pp. 269-279. Accessed online 24 April 2018.

Chipps et al 1956: Chipps, J., et al. A mathematical language compiler. Proceedings of the 1956 11th ACM national meeting, ACM, 1956.

Chomsky 1956: Chomsky, Noam. Three models for the description of language. IRE Transactions on information theory, vol. 2, issue 3, September 1956, pp. 113-124.

Chomsky 1957: Chomsky, Noam. Syntactic Structures. Mouton & Co., 1957.

Darwin 1984: Darwin, Ian F., and Geoffrey Collyer. A History of UNIX before Berkeley: UNIX Evolution, 1975-1984. /sys/doc/, 1984, Accessed 24 April 2018.

Denning 1983: Denning, Peter. Untitled introduction to Irons 1961. Communications of the ACM, 25th Anniversary Issue, vol. 26, no. 1, 1983, p. 14.

DeRemer 1969: DeRemer, Franklin Lewis. Practical translators for LR(k) languages. September 1969. PhD dissertation, Massachusetts Institute of Technology, Accessed 24 April 2018.

Dijkstra 1961: Dijkstra, E. W. Algol 60 translation: An algol 60 translator for the x1 and making a translator for algol 60. Stichting Mathematisch Centrum, Rekenafdeling, MR 34/61, 1961.

Earley 1968: Earley, Jay. An Efficient Context-free Parsing Algorithm. August 1968. PhD dissertation, Carnegie-Mellon University, Accessed 24 April 2018.

Ford 2002: Ford, Bryan. Packrat Parsing: a Practical Linear-Time Algorithm with Backtracking. September 2002. PhD dissertation, Massachusetts Institute of Technology. Accessed 24 April 2018.

Ford 2004: Ford, Bryan. Parsing expression grammars: a recognition-based syntactic foundation. ACM SIGPLAN Notices, vol. 39, no. 1, January 2004. Accessed 24 April 2018.

Frost 1992: Frost, Richard A. Constructing Programs as Executable Attribute Grammars. The Computer Journal, vol. 35, issue 4, 1 August 1992, pp. 376-389.

FSF 2006: "GCC 4.1 Release Series: Changes, New Features, and Fixes." Free Software Foundation, Accessed 25 April 2018. The release date of GCC 4.1 was February 28, 2006.

Glennie 1960: Glennie, A. E. On the syntax machine and the construction of a universal compiler, TR-2. Carnegie Institute of Technology, Computation Center, 1960. Accessed 24 April 2018.

Grune and Jacobs 2008: Grune, D. and Jacobs, C. J. H. Parsing Techniques: A Practical Guide, 2nd edition. Springer, 2008. When it comes to determining who first discovered an idea, the literature often disagrees. In such cases, I have often deferred to the judgement of Grune and Jacobs.

Hayes 2013: Hayes, Brian. First links in the Markov chain. American Scientist, vol. 101, no. 2, 2013, p. 252. Accessed online as of 24 April 2018.

Hilgers and Langville 2006: Hilgers, Philipp, and Amy N. Langville. The five greatest applications of Markov Chains. Proceedings of the Markov Anniversary Meeting, Boston Press, 2006. Accessed 24 April 2018.

Hutton 1992: Hutton, Graham. Higher-order functions for parsing. Journal of functional programming, vol. 2, no. 3, July 1992, pp. 323-343. Accessed 24 April 2018.

Hutton and Meijer 1996: Hutton, Graham and Erik Meijer. Monadic parser combinators, Technical Report NOTTCS-TR-96-4. Department of Computer Science, University of Nottingham, 1996. Accessed 24 April 2018.

Irons 1961: Irons, Edgar T. A syntax directed compiler for ALGOL 60. Communications of the ACM, vol. 4, no. 1, January 1961, pp. 51-55.

Johnson 1975: Johnson, Stephen C. Yacc: Yet another compiler-compiler, Technical Report. Bell Laboratories, Murray Hill, NJ, 1975. Accessed 24 April 2018.

Kegler 2010: Kegler, Jeffrey. Marpa is now O(n) for Right Recursions. Ocean of Awareness, June 5, 2010. Accessed 24 April 2018. This is the initial announcement, and its examples and links are obsolete. The latest stable version of Marpa is Marpa::R2.

Kegler 2012a: Kegler, Jeffrey. Precedence parsing made simpler. Ocean of Awareness, August 22, 2012. Accessed 24 April 2018. This is the initial announcement, and its examples and links are obsolete. The latest stable version of Marpa is Marpa::R2.

Kegler 2012b: Kegler, Jeffrey. Marpa, A Practical General Parser: The Recognizer. Accessed 24 April 2018. The link is to the 19 June 2013 revision of the 2012 original.

Kleene 1951: Kleene, Stephen Cole. Representation of events in nerve nets and finite automata, RM-704. Rand Corporation, 15 December 1951. Accessed 24 April 2018.

Knuth 1962: Knuth, Donald E. A history of writing compilers. Computers and Automation, vol. 11, no. 12, 1962, pp. 8-18.

Knuth 1965: Knuth, Donald E. On the translation of languages from left to right. Information and Control, vol. 8, issue 6, December 1965, pp. 607-639. Accessed 24 April 2018.

Knuth 1968: Knuth, Donald E. Semantics of context-free languages. Mathematical Systems Theory, vol. 2, no. 2, 1968, pp. 127-145. Accessed 24 April 2018.

Knuth 1971: Knuth, Donald E. Top-down syntax analysis. Acta Informatica vol. 1, issue 2, 1971, pp. 79-110. Accessed 24 April 2018.

Knuth 1990: Knuth, Donald E. The genesis of attribute grammars. Attribute Grammars and Their Applications, Springer, September 1990, pp. 1-12. Accessed 24 April 2018.

Knuth and Pardo 1976: Knuth, Donald E., and Luis Trabb Pardo. The Early Development of Programming Languages, STAN-CS-76-562. Computer Science Department, Stanford University, August 1976. Accessed 24 April 2018.

Leo 1991: Leo, Joop M. I. M. A general context-free parsing algorithm running in linear time on every LR (k) grammar without using lookahead. Theoretical computer science vol. 82, issue 1, 22 May 1991, pp. 165-176. Accessed 24 April 2018.

Lewis and Stearns 1968: Lewis II, Philip M., and Richard Edwin Stearns. Syntax-directed transduction. Journal of the ACM, vol. 15, issue 3, 1968, pp. 465-488. There is a 1966 paper with the same authors and title, but Aho and Ullman 1972, p. 268, says that the 1968 paper contains the first definition of LL(k).

Lucas 1961: Lucas, Peter. Die Strukturanalyse von Formelübersetzern / Analysis of the structure of formula translators. Elektronische Rechenanlagen, vol. 3, no. 4, 1961, pp. 159-167.

Marpa:
Home website: Accessed 25 April 2018.
Kegler's Marpa website: Accessed 25 April 2018.
Github: Accessed 25 April 2018.
See also Kegler 2012b and Marpa::R2 MetaCPAN.

Marpa::R2 MetaCPAN: Accessed 30 April 2018.

Markov 1906: Markov, Andrey Andreyevich. Rasprostranenie zakona bol'shih chisel na velichiny, zavisyaschie drug ot druga. Izvestiya Fiziko-matematicheskogo obschestva pri Kazanskom universitete, 2-ya seriya, tom 15, 1906, pp. 135-156. In English, the title translates to "Extension of the law of large numbers to quantities, depending on each other".

Markov 1913: Markov, Andrey Andreyevich. Primer statisticheskogo issledovaniya nad tekstom 'Evgeniya Onegina', illyustriruyuschij svyaz' ispytanij v cep'. Izvestiya Akademii Nauk, Sankt-Peterburg, VI seriya, tom 7, 9(3), 1913, pp. 153-162. An English translation is An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains, Science in Context, vol. 19, no. 4, December 2006, pp. 591-600. Accessed 25 April 2018.

Mascarenhas et al 2014: Mascarenhas, Fabio, Sérgio Medeiros, and Roberto Ierusalimschy. On the relation between context-free grammars and parsing expression grammars. Science of Computer Programming, volume 89, part C, 1 September 2014, pp. 235-250. Accessed 25 April 2018.

McIlroy 1987: McIlroy, M. Douglas. "A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971-1986," AT&T Bell Laboratories Computing Science Technical Report #139, 1987. Accessed 25 April 2018.

McIlroy and Kernighan 1979: McIlroy, M. D. and B. W. Kernighan. Unix Time-Sharing System: Unix Programmer's Manual. Seventh Edition, Bell Telephone Laboratories, January 1979. Accessed 25 April 2018.

Moser 1954: Moser, Nora B. Compiler method of automatic programming. Proceedings of the Symposium on Automatic Programming for Digital Computers, Office of Naval Research, 1954, pp. 15-21. This conference, held 13-14 May and organized by Grace Hopper, was the first symposium devoted to software.

Norvell 1999: Norvell, Theodore. Parsing Expressions by Recursive Descent. 1999, Accessed 25 April 2018.

Padua 2000: Padua, David. The Fortran I compiler. Computing in Science & Engineering, volume 2, issue 1, Jan-Feb 2000, pp. 70-75. Accessed 25 April 2018.

Perlis et al 1958: Perlis, Alan J., J. W. Smith, and H. R. Van Zoeren. Internal Translator (IT): A compiler for the 650. Computation Center, Carnegie Institute of Technology. Publication date is given as March 1957 in Knuth and Pardo 1976, p. 105. Accessed 25 April 2018.

Post 1943: Post, Emil L. Formal Reductions of the General Combinatorial Decision Problem. American Journal of Mathematics, vol. 65, no. 2, April 1943, pp. 197-215.

Pratt 1973: Pratt, Vaughan R. Top down operator precedence. Proceedings of the 1st annual ACM SIGACT-SIGPLAN symposium on Principles of programming languages, ACM, 1973.

Rosenkrantz and Stearns 1970: Rosenkrantz, Daniel J., and Richard Edwin Stearns. Properties of deterministic top-down grammars. Information and Control, volume 17, no. 3, October 1970, pp. 226-256. Accessed 23 April 2018.

Samuelson and Bauer 1959: Samelson, Klaus, and Friedrich L. Bauer. Sequentielle Formelübersetzung. it - Information Technology, vol. 1, issues 1-4, 1959, pp. 176-182.

Samuelson and Bauer 1960: Samelson, Klaus, and Friedrich L. Bauer. Sequential formula translation. Communications of the ACM, vol. 3, no. 2, February 1960, pp. 76-83. A translation of Samuelson and Bauer 1959.

Schorre 1964: Schorre, D. V. META II: a syntax-oriented compiler writing language. Proceedings of the 1964 19th ACM national conference, ACM, 1964. Accessed 23 April 2018.

Shannon 1948: Shannon, Claude E. A Mathematical Theory of Communication. Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, July, October, 1948. Accessed 23 April 2018.

Simms 2014: Simms, Damon. Further discussion related to request for arbitration. Talk:Metacompiler, Archive 2, Wikipedia, 9 October 2014, Accessed 25 April 2018.

Snyder 1975: Snyder, Alan. A portable compiler for the language C. No. MAC-TR-149. Massachusetts Institute of Technology, Project MAC, 1975.

Wadler 1985: Wadler, Philip. How to replace failure by a list of successes: a method for exception handling, backtracking, and pattern matching in lazy functional languages. Conference on Functional Programming Languages and Computer Architecture, Springer, 1985. Accessed 23 April 2018.

Wadler 1990: Wadler, Philip. Comprehending monads. Proceedings of the 1990 ACM conference on LISP and functional programming, ACM, 1990. Accessed 23 April 2018.

Wadler 1995: Wadler, Philip. Monads for functional programming. International School on Advanced Functional Programming, Springer, 1995. Accessed 23 April 2018.

Wall 2014: Wall, Larry. IRC log for #perl6, 30 August 2014, Accessed 25 April 2018.

Wiki Big_O: Big O Notation, Wikipedia, 29 April 2018, Accessed 29 April 2018.

Wiki Perl: Perl, Wikipedia, 21 April 2018, Accessed 26 April 2018.


Footnotes

1. Markov 1906.

2. Indirectly, Markov's purpose may have been to refute Nekrasov's proof of free will. Nekrasov pointed out that social statistics of all kinds follow the law of large numbers, which (up until then) had assumed events were independent. Since social events were human choices, Nekrasov saw this as a proof of free will. Markov's demonstration that the law of large numbers works just as well for dependent events undercut Nekrasov's proof. (Hayes 2013, pp. 92-93).

3. Markov 1913. See Hayes 2013 and Hilgers and Langville 2006, pp. 155-157.

4. Post 1943.

5. Carpenter and Doran 1977

6. Shannon 1948.

7. Shannon 1948, pp. 4-6. See also Hilgers and Langville 2006, pp. 157-159.

8. Knuth and Pardo 1976, pp. 29-35, 40.

9. The formal apparatus for describing operator expressions is not fully formed as of 1949.

10. This timeline refers to precedence levels as tight or loose. The traditional terminology is confusing: tighter precedences are higher, but traditionally the precedence levels are also numbered, and the higher the number, the lower the precedence.

11. Many of the firsts for parsing algorithms in this timeline are debatable, and the history of operator precedence is especially murky. Here I follow Samuelson and Bauer 1960 in giving the priority to Boehm.

12. Norvell 1999.

13. Knuth and Pardo 1976, pp. 35-42.

14. Knuth and Pardo 1976, p. 50.

15. The quoted definition is from Moser 1954, p. 15, as cited in Knuth and Pardo 1976, p. 51.

16. In 1952, an interface that guided calls to subroutines was much more helpful than current programmers might imagine: "Existing programs for similar problems were unreadable and hence could not be adapted to new uses." (Backus 1980, p. 126)

17. I hope nobody will read this terminological clarification as in any sense detracting from Hopper's achievement. Whatever it is called, Hopper's program was a major advance, both in terms of insight and execution, and her energetic followup did much to move forward the events in this timeline. Hopper has a big reputation and it is fully deserved.

18. Knuth and Pardo 1976, p. 50.

19. Kleene 1951.

20. Knuth and Pardo 1976, p. 42.

21. Knuth and Pardo 1976, pp. 42-49.

22. Backus 1980, pp. 133-134.

23. Knuth and Pardo 1976, pp. 83-86.

24. Perlis et al 1958. Knuth and Pardo 1976, p. 83, state that the IT compiler "was put into use in October, 1956."

25. Knuth and Pardo 1976, p. 84.

26. Knuth 1962. The 1956 date for IT is from Padua 2000.

27. Chipps et al 1956.

28. Chomsky 1956.

29. Chomsky seems to have been unaware of Post's work -- he does not cite it.

30. Chomsky 1957.

31. Backus's notation is influenced by his study of Post -- he seems not to have read Chomsky until later. See Backus 2006, p. 25 and Backus 1980, p. 133.

32. Backus 1959.

33. Backus 1980, pp. 132-133.

34. Samuelson and Bauer 1960.

35. Norvell 1999.

36. ALGOL 1960.

37. ALGOL 1960.

38. As a pedantic point, a general parser need only parse proper grammars. A BNF grammar is proper if it has no useless rules and no infinite loops. Neither of these has any practical use, so the restriction to proper grammars is unproblematic.

39. Glennie 1960.

40. Glennie 1960.

41. Irons 1961. Knuth is among those who state that the Irons 1961 parser is what is now called left-corner (Knuth 1971, p. 109).

42. Although it seems likely that parsing operator expressions would require backtracking, and therefore could be inefficient.

43. Irons is credited with the discovery of synthesized attributes by Knuth (Knuth 1990). Synthesized attributes are a concept in semantics, which this timeline usually does not cover. But synthesized attributes will be important for us.

44. Lucas 1961.

45. Grune and Jacobs 2008 call Lucas the discoverer of recursive descent. In fact, both Irons 1961 and Lucas 1961 are recursive descent with a major bottom-up element. Perhaps Grune and Jacobs based their decision on Lucas's description of his algorithm, which mentions his parser's bottom-up element only briefly, while describing the top-down element in detail. Also, the Lucas algorithm resembles modern implementations of recursive descent much more closely than Irons 1961 does.

In conversation, Irons described his 1961 parser as a kind of recursive descent. Peter Denning (Denning 1983) gives priority to Irons and, as a grateful former student of Irons, I of course prefer that attribution on a personal basis.

46. Lucas 1961 cites Samuelson and Bauer 1960.

47. Dijkstra 1961.

48. Norvell 1999.

49. But see the entry for 1973 on Pratt parsing (Pratt 1973) where the idea of parsing entire languages as operator grammars is revisited.

50. Schorre 1964, p. D1.3-1.

51. Knuth 1965.

52. Knuth 1965, p. 637.

53. This timeline is not a mathematics tutorial, and I have ignored important complications to avoid digression. For interested readers, the details of "big O" notation can be worth learning: Wiki Big_O.

54. Because constants are ignored, all reasonable measures of the input's length are equivalent for big O notation. Some papers on parsing also consider the size of the grammar, but usually the size of the grammar is regarded as a fixed constant. In this timeline, all big O results will be in terms of the input length.

55. Knuth also discusses some ambiguities, and some intermixing of semantics with syntax, in the ALGOL report. But Knuth (quite appropriately) notes that BNF was new when it was written, and treats these other issues as problems with the ALGOL specification. (See Knuth 1965, pp. 622-625.)

56. Knuth 1965, p. 621.

57. Knuth 1965, p. 637.

58. Lewis and Stearns 1968. They are credited in Rosenkrantz and Stearns 1970 and Aho and Ullman 1972, p. 368.

59. Knuth 1965, p. 610. See also on p. 611 "corresponds with the intuitive notion of translation from left to right looking k characters ahead".

60. Knuth 1971, p. 102. LL and LR have mirror images: RL means scan from the right, using left reductions and RR acquires its current meaning of scan from the right, using right reductions. Practical use of these mirror images is rare, but it may have occurred in one of the algorithms in our timeline -- operator expression parsing in the IT compiler seems to have been RL(2) with backtracking.

61. Earley 1968.

62. Knuth 1968.

63. DeRemer 1969.

64. Darwin 1984, Section 3.1, "An old editor made new".

65. Aho and Ullman 1972.

66. Aho and Ullman 1972, p. 321.

67. Aho and Ullman 1972, pp. 456-485.

68. Aho and Ullman 1972, p. 466.

69. Simms 2014.

70. Pratt 1973.

71. Norvell 1999.

72. Pratt 1973, p. 51.

73. Snyder 1975. See, in particular, pp. 67-68.

74. Aho and Ullman 1977.

75. McIlroy and Kernighan 1979.

76. Johnson 1975.

77. See McIlroy 1987, p. 11, for a listing of yacc-powered languages in V7 Unix.

78. Wiki Perl, Early versions section.

79. Wadler 1990.

80. Wadler 1985.

81. Leo 1991.

82. Wall 2014.

83. Some sources consider Wadler 1985 an early presentation of the ideas of combinator parsing. In assigning priority here, I follow Grune and Jacobs 2008 (p. 564).

84. The paper which is devoted to parsing is Hutton 1992. The other paper, which centers on combinators as a programming paradigm, is Frost 1992. Frost mentions parsing in only one paragraph, and that one focuses on implementation issues. Some of his grammars include operator expressions, but these avoid left recursion, and implement precedence and associativity using parentheses.

85. Hutton 1992, p. 5.

86. Hutton 1992, pp. 19-20.

87. Hutton and Meijer 1996.

88. Hutton and Meijer 1996, p. 3.

89. Hutton and Meijer 1996, p. 18. A simpler arithmetic expression grammar on p. 15 raises no issues not present in the more complicated example.

90. Aycock and Horspool 2002.

91. Ford 2004. See also Ford 2002.

92. Mascarenhas et al 2014. My account of PEG is negative because almost all real-life PEG practice is unsafe, and almost all of the literature offers no cautions against unsafe use of PEG. Ford's PEG is efficient and does in fact have real, if niche, uses. Programmers interested in the safe use of PEG should consult Mascarenhas et al 2014.

93. FSF 2006.

94. Kegler 2010.

95. In fact, Leo 1991 proves that his algorithm, and therefore Marpa, is linear for the LR-regular grammars -- the LR grammars with infinite lookahead, so long as the lookahead is a regular expression. It is not known (in fact, it is not decidable; see Leo 1991, p. 175) just how large a class of grammars Marpa is linear for. But it is known that there are both ambiguous and unambiguous grammars for which Marpa is not linear. In the general case, Marpa obeys the bounds of Earley's algorithm -- it is worst-case O(n**2) for unambiguous grammars, and worst-case O(n**3) for ambiguous grammars. On the other hand, Marpa follows Leo 1991 in being linear for many ambiguous grammars, including those of most practical interest.

96. In the literature, operator expression grammars are more often called simply operator grammars.

97. In the general case, ambiguity is undecidable, but in practice it is usually straightforward for a programmer to determine that the grammar he wants is unambiguous. Note that, while the general case is undecidable, Marpa will tell the programmer if a parse (a grammar with a particular input) is ambiguous and, since it parses ambiguous grammars, produces an error message showing where the ambiguity is.

98. A marker, in this sense, is something in the input that shows where the middle of a middle recursion is, without having to go to the end and count back. Whether or not a grammar is unmarked is, in general, an undecidable problem. But in practice, there is an easy rule: if a person can eyeball the middle of a long middle recursion, and does not need to count from both ends to find it, the grammar is marked.
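The rule can be made concrete with a small sketch (the grammars and recognizers below are my own illustrations, not drawn from any of the cited works). In the marked grammar S ::= 'a' S 'a' | 'm', the single 'm' shows where the middle is; in the unmarked grammar S ::= 'a' S 'a' | 'a', the middle can only be found by counting from both ends:

```python
# Hypothetical illustration: recognizers for a marked and an unmarked
# middle recursion.
#   Marked:    S ::= 'a' S 'a' | 'm'     (sentences like 'aamaa')
#   Unmarked:  S ::= 'a' S 'a' | 'a'     (sentences like 'aaaaa')

def is_marked_sentence(s):
    # The single 'm' marks the middle: scan for it, then check that
    # it is flanked by equal runs of 'a'.
    if s.count('m') != 1:
        return False
    left, right = s.split('m')
    return left == right and set(left) <= {'a'}

def is_unmarked_sentence(s):
    # No marker: nothing in the input points at the middle 'a'.
    # Only counting the whole string reveals where it is.
    return len(s) % 2 == 1 and set(s) == {'a'}

# In 'aamaa' a person can eyeball the middle at a glance; in 'aaaaa'
# the middle 'a' is found only by counting from both ends.
```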

99. This is a very pedantic requirement, which practical programmers can in fact ignore. Any programmer who has found and eliminated ambiguities in his grammar will usually, in the process, also have found and eliminated ambiguous right recursions, since it is far easier to eliminate them than to determine whether they create an overall ambiguity. In fact, it is an open question whether there are any unambiguous grammars with ambiguous right recursions -- neither Joop Leo nor I know of any.

100. Kegler 2012a.

101. Internally, Marpa rewrites precedenced statements into BNF. Marpa's rewrite does not use middle recursions and does not introduce any new ambiguities so that, following Marpa's linearity rules, the result must be linear.
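The flavor of such a rewrite can be sketched with the textbook layering of precedence levels into plain BNF. To be clear, this is the standard textbook transformation, offered only as an illustration; it is not Marpa's actual internal rewrite:

```python
# Sketch of the textbook rewrite of a precedenced grammar into plain
# BNF, one nonterminal per precedence level (loosest first):
#
#   Expr   ::= Expr '+' Term | Term       # loosest level: addition
#   Term   ::= Term '*' Factor | Factor   # tighter level: multiplication
#   Factor ::= NUMBER
#
# The layered result contains no middle recursion.  A left-associative
# evaluator following the layered grammar:

import re

def evaluate(text):
    tokens = re.findall(r'\d+|[+*]', text)
    pos = 0

    def factor():
        nonlocal pos
        value = int(tokens[pos])   # Factor ::= NUMBER
        pos += 1
        return value

    def term():
        nonlocal pos
        value = factor()           # Term ::= Term '*' Factor | Factor
        while pos < len(tokens) and tokens[pos] == '*':
            pos += 1
            value *= factor()
        return value

    def expr():
        nonlocal pos
        value = term()             # Expr ::= Expr '+' Term | Term
        while pos < len(tokens) and tokens[pos] == '+':
            pos += 1
            value += term()
        return value

    return expr()

# evaluate('2+3*4') honors the precedence: 14, not 20.
```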

102. In this, precedence goes from tightest to loosest. Single bars (|) separate RHS's of equal precedence. Double bars (||) separate RHS's of different precedence. Associativity defaults to left, and exceptions are indicated by the value of the assoc adverb. Note that these precedenced statements, like all of Marpa's DSL, are parsed by Marpa itself. The statement shown has the semantic adverbs removed for clarity. The original is part of the Marpa::R2 test suite and can also be found on Marpa::R2 MetaCPAN, Synopsis section, Scanless::DSL page.

103. Marpa::R2.