Parsing: a timeline Version 3.1 Revision 9, 6 Jul 2023 Jeffrey Kegler

4th century BCE: Panini's description of Sanskrit

In India, Panini creates an exact description of the Sanskrit language, including pronunciation. Sanskrit can be completely reconstructed using nothing but Panini's grammar. Panini's grammar, which predates Euclid, is probably the first formal system of any kind. In the future, nothing like it will exist for any other natural language of significant size or corpus.

By the 1940's and 1950's, Panini will be well known and much studied in the Western linguistics community, but will be almost unknown to non-linguists outside of India. Panini's effect on the other events in this timeline will be indirect.

1854: Ada discovers computer "language"

In her translator's note on an article on Babbage's computer, Ada Lovelace becomes the first person to clearly see that programming a computer is a distinct discipline from building the computer itself.[1] By 2023, this hardware/software distinction will be so well known as to seem obvious.

Ada is also the first to think of software in linguistic terms -- she originates the idea that, in writing software, we communicate with the computer, and that the means that we use to do that is a "language".[2]

A new, a vast, and a powerful language is developed for the future use of analysis, in which to wield its truths so that these may become of more speedy and accurate practical application for the purposes of mankind than the means hitherto in our possession have rendered possible. Thus not only the mental and the material, but the theoretical and the practical in the mathematical world, are brought into more intimate and effective connexion with each other.[3]

Here "analysis" means the branch of mathematics that studies limits and continuous functions, and which includes calculus.

1906: Markov's chains

Andrey Markov introduces his chains -- a set of states with transitions between them.[4] One offshoot of Markov's work will be what will come to be known as regular expressions. Markov uses his chains, not for parsing, but to deal with a problem in probability -- does the law of large numbers require that events be independent? Indirectly, Markov is addressing the question of the existence of free will.[5]

1913: Markov and Eugene Onegin

In 1913, Markov revisits his chains, applying them to the sequence of vowels and consonants in Pushkin's Eugene Onegin.[6] Again, Markov's interest is not in parsing. Nonetheless, this is an application to language of what later will be regarded as a parsing technique, and apparently for the first time in the West.

1929: Bloomfield's "Postulates"

Leonard Bloomfield, as part of his effort to create a linguistics that would be taken seriously as a science, publishes his "Postulates".[7] Known as structural linguistics, Bloomfield's approach will dominate American linguistics for over two decades.

"Language" as of 1929

Bloomfield's "Postulates" defines a "language" as

[t]he totality of utterances that can be made in a speech community[8]

Note that there is no reference in this definition to the usual view -- that the utterances of a language "mean" something.

This omission is not accidental.[9] Bloomfield excludes meaning from his definition of language because he wants linguistics to be taken seriously as science. Behaviorist thought is very influential at this time and behavorists believe that, while human behaviors can be observed and verified and therefore made the subject of science, mental states cannot be verified. Claiming to know what someone means by a word is claiming to read his mind. And "mind-reading" is not science.

1943: Post's rewriting system

Emil Post defines and studies a formal rewriting system[10] using productions. With this, the process of rediscovering Panini in the West begins.

1945: Turing discovers stacks

Alan Turing discovers the stack as part of his design of the ACE machine. This will be important in parsing because recursive parsing requires stacks. The importance of Turing's discovery is not noticed in 1945 and stacks will be rediscovered many times over the next two decades[11].

1948: Shannon repurposes Markov's chains

Claude Shannon publishes the foundation paper of information theory[12]. In this paper, Shannon models English using Andrey Markov's chains[13]. The approach is similar to Markov's but the intent is different -- Shannon's is a serious attempt at a contribution to parsing human languages.

1949: Rutishauser's compiler

From 1949 to 1951 at the ETH Zurich, Heinz Rutishauser works on the design of what we will come to call a compiler[14]. Rutishauser's arithmetic expression parser does not honor precedence but it does allow nested parentheses. It is perhaps the first algorithm which can really be considered a parsing method. Rutishauser's compiler is never implemented.

The Operator Issue as of 1949

In the form of arithmetic expressions, operator expressions are the target of the first efforts at automatic parsing. We will not see this issue go away. Very informally[15], we can say that an operator expression is an expression built up from operands and operators. It is expected that the operand might be another operator expression, so operator expressions raise the issue of recursion.

The archetypal examples of operator expressions are arithmetic expressions:


         2+(3*4)
         13^2^-5*9/11
         1729-42*8675309

In Western mathematics arithmetic expressions have been read according to traditional ideas of associativity and precedence:

^ is exponentiation. It right associates and has tightest[16] precedence.
Multiplication (*) and division (/) left associate. They have a precedence equal to each other and less tight than that of exponentiation.
Addition ('+') and subtraction ('-') left associate. They have a precedence equal to each other and less tight than that of multiplication and division.
Parentheses, when present, override the traditional associativity and precedence.

Rutishauser's language is structured line-by-line, as will be all languages until LISP. No language until ALGOL will be truly block-structured.

The line-by-line languages are parsed using string manipulations. A parsing theory is not helpful for describing these ad hoc string manipulations, so they don't give rise to one. The only logic in these early compilers that really deserves to be called a parsing method is that which tackles arithmetic expressions.

1950: Boehm's compiler

During 1950, Corrado Boehm, also at the ETH Zurich, develops his own compiler. Rutishauser and Boehm are working at the same institution at the same time, but Boehm is unaware of Rutishauser's work until his own is complete. Boehm's is also the first self-compiling compiler -- it is written in its own language.

Like Rutishauser, Boehm's language is line-by-line and parsed ad hoc, except for expressions. Boehm's expression parser does honor precedence, making it perhaps the first operator precedence parser[17]. In Norvell's taxonomy[18], Boehm's algorithm inaugurates the classic approach to operator parsing.

Boehm's compiler also allows parentheses, but the two cannot be mixed -- an expression can either be parsed using precedence or have parentheses, but not both. Also like Rutishauser's, Boehm's compiler is never implemented[19].

1952: Grace Hopper uses the term compiler

Grace Hopper writes a linker-loader, and calls it a compiler[20]. Hopper seems to be the first person to use the term compiler to describe a computer program.

"Compiler" as of 1952

Hopper uses the term compiler in a meaning very close to one of its traditional senses: to compose out of materials from other documents[21]. Specifically, in 1952, subroutines were new, and automated programming (what we will come to call compiling) often is viewed as providing a interface for calling a collection of carefully chosen assembler subroutines[22]. Hopper's new program takes this subroutine calling one step further -- instead of calling the subroutines it expands them (or in Hopper's terms compiles them) into a single program.

After Hopper the term compiler will acquire a different meaning, one specific to the computer field. By 1956, programs like Hopper's will no longer be called compilers[23].

1951: Kleene's regular languages

Kleene discovers regular languages[24]. Kleene does not use regular expression notation, but his regular languages are the idea behind it.

1952: Glennie's AUTOCODE

Glennie discovers what Knuth will later call the first real[25] compiler. (By this Knuth will mean that AUTOCODE was actually implemented and used by someone to translate algebraic statements into machine language.) Glennie's AUTOCODE is very close to the machine -- just above machine language. It does not allow operator expressions.

AUTOCODE is hard-to-use, and apparently sees little use by anybody but Glennie himself. Because Glennie works for the British atomic weapons projects his papers are routinely classified, so even the indirect influence of AUTOCODE is slow to spread. Nonetheless, many other compilers afterward will be named AUTOCODE, probably because the authors are aware of Glennie's effort[26].

1954: The FORTRAN project begins

At IBM, a team under John Backus begins working on the language which will be called FORTRAN.

"Compiler" as of 1954

In June 1954, an ACM committee including John Backus and Grace Hopper issues a "First Glossary of Programming Terminology", which defines the term "compiler" as

a executive routine which, before the desired computation is started, translates a program expressed in pseudo-code into machine code (or into another pseudo-code for further translation by an interpreter)."[27]

The "pseudo-code" of the Committee is what in 2023 will be called a "programming language".[28] The definition goes on to detail the various tasks a "compiler" might perform. These include what in 2023 will be called "linking" and "loading", so that the ACM committee's redefinition of compiler can be seen as a broadening of Hopper's use of the term.[29] It is not clear if the ACM committee's new definition captures a current usage of the term "compiler", or if it is a prescriptive redefinition, intended to change usage.

As the computer field develops, the textbook definition will expand beyond the ACM Committee's implementation-focused one, so that a "compiler" is any translator from one language (the "source language") to another (the "target language"). And linkers and loaders will often not be consider a part of the compiler in the more proper sense. But in its essentials, the most common usage of the term "compiler" in 2023 will be as it is defined by the ACM committee.

Note that, even though they were not called "compilers", compilers in the ACM committee sense of the term had been envisioned even before Hopper first used the term. But nobody had called these programs compilers -- they were called automatic coding, codification automatique or Rechenplanfertigung[30]

1955: Noam Chomsky graduates

Noam Chomsky earns his PhD at the University of Pennsylvania. His teacher, Zelig Harris, is a prominent Bloomfieldian, and Chomsky's early work is thought to be in the Bloomfield school.[31] That same year Chomsky accepts a teaching post at MIT. MIT does not have a linguistics department, and Chomsky is free to teach his own (highly original and very mathematical) approach to linguistics.

1955: Work begins on the IT compiler

At Purdue, a team including Alan Perlis and Joseph Smith begins work on the IT compiler[32].

1956: The IT compiler is released

Perlis and Smith, now at the Carnegie Institute of Technology, finish the IT compiler.[33] Don Knuth calls this

the first really useful compiler. IT and IT's derivatives were used successfully and frequently in hundreds of computer installations until [their target] the [IBM] 650 became obsolete. [... P]revious systems were important steps along the way, but none of them had the combination of powerful language and adequate implementation and documentation needed to make a significant impact in the use of machines.[34]

The IT language had arithmetic expressions, of a sort -- parentheses are honored, but otherwise evaluation is always right-to-left -- and there is no operator precedence. The IT compiler's way of doing arithmetic expressions proves very unpopular: Donald Knuth reports that

[t]he lack of operator priority (often called precedence or hierarchy) will be the most frequent single cause of errors by the users of the IT compiler.[35]

"Compiler" as of 1956

The ACM committee's definition is by now the standard one.[36]

1956: The Chomsky hierarchy

Chomsky publishes his "Three models" paper, which is usually considered the foundation of Western formal language theory[37]. Chomsky demolishes the idea that natural language grammar can be modeled using only Markov chains. Instead, the paper advocates a natural language approach that uses three layers:

Chomsky uses Markov's chains as his bottom layer. This will come to be called a compiler's lexical phase.
Chomsky's middle layer uses context-free grammars and context-sensitive grammars. These are his own discoveries[38]. This middle layer will come to be called a compiler's syntactic phase.
Chomsky's top layer, again his own discovery, maps or transforms the output of the middle layer. In years to come, Chomsky's top layer will be the inspiration for the AST transformation phase of compilers.

Term: "Language" as of 1956

In his "Three models" paper, Chomsky says that

By a language then, we shall mean a set (finite or infinite) of sentences, each of finite length, all constructed from a finite alphabet of symbols.[39]

This is exactly Bloomfield's definition, restated using set theory.

Nonetheless signs of departure from the behaviorist orthodoxy are apparent in "Three Models" -- Chomsky is quite willing to talk about what sentences mean, when it serves his purposes. For a utterance with multiple meanings, Chomsky's new model produces multiple syntactic derivations. Each of these syntactic derivations "looks" like the natural representation of one of the meanings. Chomsky points out that the insight into semantics that his new model provides is a very desirable property to have.[40]

Term: "Parser"

Chomsky is a turning point, so much so that he establishes or settles the meaning of many of the terms we are using. A parser, for our purposes, is something or someone that transforms a string of symbols into a structure, according to a description of the mapping from strings to structures. For our purposes, the structures can usually be considered to be parse trees.

Term: "Recognizer"

In contrast to a parser, a recognizer is a something or someone that takes a string and answers yes or no -- yes if the string is in the language described by the recognizer, no otherwise. Note that, if we intend to do semantics, a recognizer alone is not sufficient.

In the strict sense, a recognizer cannot drive a compiler -- a compiler needs a parser. But recognizers can be far easier to write, and are often hacked up and pressed into service as parsers. For example, if your semantics is simple and non-recursive, you might be able to drive your semantics phase with the output of a regex engine, using captures.

1957: Chomsky publishes Syntactic Structures

Noam Chomsky publishes Syntactic Structures[41], one of the most important books of all time. The orthodoxy in 1957 is structural linguistics which argues, with Sherlock Holmes, that it is a capital mistake to theorize in advance of the facts. Structuralists start with the utterances in a language, and build upward.

But Chomsky claims that without a theory there are no facts: there is only noise. The Chomskyan approach is to start with a grammar, and use the corpus of the language to check its accuracy. Chomsky's approach will soon come to dominate linguistics.

Term: "Chomskyan parsing"

From here on, parsing theory divides into Chomskyan and non-Chomskyan. Chomskyan parsing theory becomes and remains the mainstream. But it will be far from unchallenged.

Above we defined a parser (recognizer) as something or someone that parses (recognizes) a string according to a description. In Chomskyan parsing (recognizing), the description is that of Chomsky's middle layer. If the description is anything else, the parser (recognizer) is non-Chomskyan.

As of 1957, we are calling the description of a Chomskyan middle layer, a context-free grammar -- BNF notation for context-free grammars has not yet been discovered. Once it is, we will refer to context-free grammars as BNF grammars.

1957: FORTRAN released

Backus's team makes the first FORTRAN compiler available to IBM customers. FORTRAN is the first high-level language that will find widespread implementation. As of 2023, it is the oldest language that survives in practical use.

FORTRAN I is a line-by-line language. Its parsing is non-Chomskyan. But it includes one important discovery. FORTRAN allows expressions and, learning from the dissatisfaction with the IT compiler, FORTRAN honors associativity and precedence.

The designers of FORTRAN use a strange trick for parsing operator expressions -- they hack the expressions by adding parentheses around each operator. The number of parentheses varied, depending on the operator. Surprisingly, this works. In fact, once the theoretical understanding of operator precedence comes about, the FORTRAN I implementation will be recognized as a hackish and inefficient way of implementing the classic operator precedence algorithm.

1958: LISP released

John McCarthy's LISP appears. LISP goes beyond the line-by-line syntax -- it is recursively structured. But the LISP interpreter does not find the recursive structure: the programmer must explicitly indicate the structure herself, using parentheses. Similarly, LISP does not have operator expressions in the usual sense -- associativity and precedence must be specified with parentheses.

1959: Backus's notation

Backus discovers a new notation to describe the IAL language[42]. According to Backus's recollection, his paper[43] has only one reader: Peter Naur[44]. The IAL language will soon be renamed ALGOL.

1959: Operator precedence and stacks

Samelson and Bauer[45] describe the use of stacks to implement operator precedence. Samelson and Bauer do not use the word stack -- what we call a stack, they call a cellar. They explain the idea in great detail -- apparently, 14 years after Turing's discovery of them, stacks are still unfamiliar, even to the readers of technical journals.

Their algorithm, and its presentation, are thoroughly non-Chomskyan. Samelson and Bauer do not provide a context-free grammar for their grammar, just an operator precedence table.

The Operator Issue as of 1959

Since Boehm, many people have been refining operator precedence. With Samelson and Bauer, what Norvell[46] calls "the classic algorithm" takes a well-documented form. The Samelson and Bauer paper will be very influential.

"Language" as of 1959

In 1959, Chomsky reviews a book by B.F. Skinner on linguistics.[47] Skinner is the most prominent behaviorist of the time. This review galvanizes the opposition to behaviorism, and establishes Chomsky as behavorism's most prominent and effective critic. The behaviorists had claimed that they could eliminate "traditional formulations in terms of reference and meaning". These claims, Chomsky says, are "simply not true."[48]

With this review, Chomsky makes clear his stand on the role of mental states in language study. Unfortunately, by this time, Chomsky's original definition of language as a "set of strings", is established in the computer literature. Chomsky's clarifications go unnoticed. In fact, as the mathematics of parsing theory develops, the Bloomfieldian definition will become even more firmly entrenched.

April 1960: Oettinger discovers pushdown automata

Mathematical study of stacks as models of computing begins with a paper submitted by Anthony Oettinger to a symposium in 1960.[49] Oettinger's paper is full of evidence that stacks are still very new. For a start, he does not call them stacks -- he calls them "pushdown stores". And Oettinger does not assume his highly sophisticated audience knows what the "push" and "pop" operations are -- he defines them based on another set of operations -- one that eventually will form the basis of the APL language.

Oettinger's definitions all follow the behavorist model -- they are sets of strings.[50] Oettinger's mathematical model of stacks will come to be called a deterministic pushdown automata (DPDA). DPDA's will become the subject of a substantial literature.

Oettinger's own field is Russian translation. He hopes that DPDA's will be an adequate basis for the study of both computer and natural language translation. DPDA's soon prove totally inadequate for natural language translation. But for dealing with computing languages, DPDA's will have a much longer life.

Oettinger challenges the research community to develop a systematic method for generating stack-based parsers.[51] This search quickly becomes identified with the search for a theoretical basis for practical parsing.

Term: "State-driven"

A "state-driven" parser is one that tracks the parse using a data of a small constant size. Regular expressions are the classic state-driven parsers, and their state usually can be tracked as an integer. Regular expressions have limited power, but a parser just does not get any faster or more compact.

Term: "Stack-driven"

A "stack-driven" algorithm in one whose data is not of constant size, but which only needs to access a constant-size portion, its most recent data, at any one time. This "most recent" data is more often called the "top" of the stack. By 1961, stacks are coming to be understood, and the hardware is becoming capable of handling them. As of 1961, all of the more powerful algorithms with acceptable speed are based on stacks in some way.

May 1960: The ALGOL report

The ALGOL 60 report[52] specifies, for the first time, a block structured language. ALGOL 60 is recursively structured but the structure is implicit -- newlines are not semantically significant, and parentheses indicate syntax only in a few specific cases. The ALGOL compiler will have to find the structure.

The Parsing Problem

With the ALGOL 60 report, a search begins which continues to this day: to find a parser that solves the Parsing Problem. Ideally such a parser is[53]

efficient,
general,
declarative, and
practical[54].

It is a case of 1960's optimism at its best. As the ALGOL committee is well aware, a parsing algorithm capable of handling ALGOL 60 does not yet exist. But the risk they are taking has immediate results -- 1961 will see a number of discoveries that are still important today. On the other hand, the parser they seek will remain elusive for decades.

May 1960: BNF

In the ALGOL 60 report[55], Peter Naur improves the Backus notation and uses it to describe the language. This brings Backus' notation to wide attention. The improved notation will become known as Backus-Naur Form (BNF).

Term: "declarative"

For our purposes, a parser is declarative, if it will parse directly and automatically from grammars written in BNF. Declarative parsers are often called syntax-driven parsers.

Term: "procedural"

A non-declarative parser is more often called a "procedural" parser. A parser is procedural, if it requires procedural logic as part of its syntax phase.

Term: "general"

A general parser is a parser that will parse any grammar that can be written in BNF[56].

July 1960: Glennie's compiler-compiler

The first description of a consciously non-Chomskyan compiler seems to predate the first description of a Chomskyan parser. It is A.E. Glennie's 1960 description of his compiler-compiler[57]. Glennie's universal compiler apparently is useable, but it is more of a methodology than an implementation -- the compilers must be written by hand.

Glennie uses BNF, but he does not use it in the way Chomsky, Backus and Naur intended. Glennie is using BNF to describe a procedure. In true BNF, the order of the rules does not matter -- in Glennie's pseudo-BNF order matters very much. This means that, for most practical grammars, a set of rules in the form of BNF describe one language when interpreted as intended by Backus and Naur, and another when interpreted as intended by Glennie.

Glennie is aware of what he is doing, and is not attempting to deceive anyone -- he points out that the distinction between his procedural pseudo-BNF and declarative BNF, and warns his reader that the difference is important[58]. Is Glennie's non-Chomskyan-ism on theoretical grounds, or pragmatic?

January 1961: The first parsing paper

Ned Irons publishes a paper describing his ALGOL 60 parser. It is the first paper to fully describe any parser. The Irons algorithm is pure Chomskyan, declarative, general, and top-down with a bottom-up left corner element -- it is what will come to be called a left corner parser[59]. Since it is general, operator expressions are within the power of the Irons parser[60].

Term: "Top-down"

A top-down parser deduces the constituents of a rule from the rule. That is, it looks at the rule first, and then deduces what is on its RHS. Thus, it works from the start rule, and works top down until it arrives at the input tokens.

It is important to note that no useful parser can be purely top-down -- if a parser worked purely top-down, it would never look at its input. So every top-parser we will consider has some kind of bottom-up element. That bottom-up element may be very simple -- for example, one character lookahead. The Irons 1961 parser, like most later top-down parsers, has a sophisticated bottom-up element.

Term: "Bottom-up"

A bottom-up parser deduces a rule from its constituents. That is, it looks at either the input or at the LHS symbols of previously deduced rules, and from that deduces a rule. Thus, it works from the input tokens, and works bottom up until it reaches the start rule.

But just as no really useful parser is purely top-down, no really useful parser is purely bottom-up. It it true that the implementation of a useful parser might be 100% bottom-up. But a useful parser must do more than return all possible partitions of the input -- it must implement some criteria for preferring some parse trees over others. Implicitly, these criteria come "from the top".

In fact, bottom-up parsers will often be pressed into service for Chomskyan parsing, and the Chomskyan approach is inherently top down. Even when it comes to implementation, bottom-up parsers will often use explicit top-down logic. For efficiency it is usually necessary to eliminate unwanted parses trees while parsing, and culling unwanted parse trees is a job for top-down logic.

Term: "Synthesized attribute"

Irons 1961 also introduces synthesized attributes: the parse creates a tree, which is evaluated bottom-up. Each node is evaluated using attributes synthesized from its child nodes[61].

September 1961: Sakai discovers table parsing

Sakai publishes[62] a description of a translator, illustrating it with two translations of brief texts, one Japanese to English, and the other English to Japanese. Sakai's translation scheme is hopelessly underpowered, an example of the linguistic naivety prevalent in the field in 1961. But the parser in Sakai's translator is an important discovery, and will remain in use.

Sakai's is the first description of a table-driven parser. His algorithm will be rediscovered several times, until as late as 1969. It will more commonly be called the CYK algorithm after the names of some of its rediscoverers.[63]

Sakai's algorithm is bottom-up -- it works by pairing adjacent symbols and selecting pairs according to a table. The table entries can be probabilities, and this is highly useful in some circumstances. If the probabilities other than 0 and 1 are used, Sakai's algorithm is non-Chomskyan. If the probabilities are restricted to 0 and 1, they can be treated as booleans instead, and in this special case Sakai's algorithm is Chomskyan.[64]

Sakai's algorithm is impractically slow for large inputs[65]. But Sakai's algorithm will remain useful in very special circumstances. These are applications where no method is capable of parsing long inputs in reasonable time, and where the grammar is conveniently described in terms of the frequency of adjacent components. This will make Sakai's a good fit with some statistically-based approaches to Natural Language Processing.

Term: "Table-driven"

A parsing algorithm is "table-driven" if it tracks the parse using random access to data which varies in size with the parse. The memory and speed demands of table-driven parsing are almost always seen as too great for hardware as of 1961.

November 1961: Dijkstra's shunting yard algorithm

In November 1961, Dijkstra publishes the shunting yard algorithm[66]. In Norvell's useful classification[67] of operator expression parsers, all earlier parsers have been what he calls the classic algorithm.

Dijkstra's approach is new. In the classic approach, the number of levels of precedence is hard-coded into the algorithm. Dijkstra's algorithm can handle any number of levels of precedence without a change in its code, and without any effect on its running speed.

December 1961: Lucas discovers recursive descent

Peter Lucas publishes his description of a top-down parser[68]. Either Irons paper or this one can be considered to be the first description of recursive descent[69]. Except to claim that he deals properly with them, Lucas does not say how he parses operator expressions. But it is easy to believe Lucas' claim -- by this time the techniques for parsing operator expressions are well understood[70].

The Operator Issue as of 1961

The results of 1961 transform the Operator Issue. Up until ALGOL, parsing has essentially been parsing operator expressions. After ALGOL, almost all languages will be block-structured and ad hoc string manipulations will no longer be adequate -- the language as a whole will require a serious parsing technique. Parsing operator expressions will be a side show. Or so it seems.

Why not use the algorithms that parse operator expressions for the whole language? Samelson and Bauer 1959 had suggested exactly that. But, alas, operator expression parsing is not adequate for languages as a whole[71].

Also, as of 1961, we have BNF. This gives us a useful notation for describing grammars. BNF allows us to introduce our Basic Operator Grammar (BASIC-OP):


      S ::= E
      E ::= E + T
      E ::= T
      T ::= T * F
      T ::= F
      F ::= number

BASIC-OP has two operators, three levels of precedence and left associativity. This is enough to challenge the primitive parsing techniques in use before 1961. Surprisingly, this simple grammar will continue to bedevil mainstream parsing theory for the next half a century.

Recursive descent, it turns out, cannot parse BASIC-OP because BASIC-OP is left recursive. And that is not the end of it. Making addition and multiplication right-associate is unnatural and, as the authors of the IT compiler had found out, causes users to revolt. But suppose we try to use this Right-recursive Operator Grammar (RIGHT-OP) anyway:


      S ::= E
      E ::= T + E
      E ::= T
      T ::= F * T
      T ::= F
      F ::= number

Recursive descent, without help, cannot parse RIGHT-OP. As of 1961, parsing theory has not developed well enough to state why in precise terms. Suffice it to say for now that RIGHT-OP requires too much lookahead.

Recursive descent does have a huge advantage, one which, despite its severe limitations, will save it from obsolescence time and again. Hand-written recursive descent is essentially calling subroutines. Adding custom modification to recursive descent is very straight-forward.

In addition, while pure recursive descent cannot parse operator expressions, it can recognize them. Pure recursive descent may not be able to create the parse subtree for an operator expression itself. But it can recognize the expression, and hand control over to a specialized operator expression parser. This seems to be what Lucas' 1961 algorithm did, and it is certainly what many other implementations did afterwards. Adding the operator expression subparser makes the implementation only quasi-Chomskyan, but this is a price the profession will be willing to pay.

Alternatively, a recursive descent implementation can parse operator expressions as lists, and add associativity in post-processing. This pushes some of the more important parsing out of the syntactic phase into the semantics but, once again, it seems that Chomskyan purity will have to be thrown overboard if the ship is to stay afloat.

Bottom line: as of 1961 the Operator Issue takes a new form. Because of the Operator Issue, recursive descent is not sufficient for practical grammars -- it must always be part of a hybrid.

In this context, Dijkstra's new 1961 algorithm is a welcome alternative: as an operator expression subparser, it can parse operator expressions faster and in less space. But Dijkstra's algorithm has no more parsing power than the classic operator precedence algorithm -- it does nothing to change the basic tradeoffs.

1964: The Meta II compiler

Schorre publishes a paper on the Meta II compiler writing language, summarizing papers from a 1963 conference. Schorre cites both Backus and Chomsky as sources for Meta II's notation. Schorre notes that his parser is entirely different from that of Irons 1961 -- in fact, Schorre's parser is non-Chomskyan. Meta II is a template, rather than something that his readers can use, but in principle it can be turned into a fully automated compiler-compiler[72].

Term: "linear"

Among the goals for the ideal ALGOL parser is that it be efficient. As of 1965 this notion becomes more precise, thanks to a notation that Knuth borrows from calculus. This big O notation characterizes algorithms using functions, but it treats functions that differ by a constant multiple as identical[73]. Ignoring the constant means that conclusions drawn from big O results stay relevant in the face of steady improvements in technology.

These big O functions take as their input some aspect of the algorithm's input. In the case of parsing, by convention, the big O function is usually a function of the length of the input[74]. Of the many possible big O functions, only a few will be of interest in this timeline.

A function which grows steadily as the input grows is called linear. In big O notation, linear is written O(n).
A function which grows as the length of the input times its logarithm is almost linear: quasi-linear. In big O notation, quasi-linear can be written O(n*log n).
A function which grows as the square of the length is called quadratic -- O(n**2).
A function which grows as the cube of the length is called cubic -- O(n**3).
In extreme cases, a function can grow as a constant taken to the power of the length, or exponentially -- O(c**n).

By this time, efficient for a parser means linear or quasi-linear. Parsers which operate in quadratic, cubic, or exponential time are considered impractical. But parsers aimed at practitioners will often push the envelope -- any parser which uses backtracking is potentially exponential and is designed in the (often disappointed) hope that the backtracking will not get out of hand for the grammar of interest to the user.

1965: Knuth discovers LR

Donald Knuth answers[75] the challenge expressed a few years earlier by Oettinger. Oettinger had hoped for a theory of stack-based parsing to replace "ad hoc invention".[76] Knuth responds with a theory that encompasses all the "tricks"[77] used for efficient parsing up to that time. In an exhilarating and exhausting 39-page paper, Knuth shows that stack-based parsing is equivalent to a new class of grammars. Knuth calls this new class, LR(k). Knuth also provides a parsing algorithm for the LR(k) grammars.

Knuth's new LR parsing algorithm is deterministic, Chomskyan and bottom-up. It might be expected to be "the one to rule them all". Unfortunately, while linear, it is not practical.

LR(k) is actually a set of grammar classes. There is a grammar class for every k, where k is the amount of lookahead used. LR(0) requires no lookahead, but it is not practical because it is too weak to parse most grammars of interest. LR(1) is not practical because of the size of the tables it requires -- well beyond what can be done with 1965 hardware.[78]

As the k in LR(k) grows, LR(k) becomes more impractical rapidly. The size of the tables grows exponentially, while the value of the additional lookahead rapidly diminishes. It is not likely that LR(2) parsing will ever see much actual use, never mind LR(k) for any k greater than 2.

"Language" as of 1965

Knuth 1965 uses the Bloomfieldian definition of language, following the rest of the parsing literature at this time. In other words, Knuth defines a language a "set of strings", without regard to their meaning. The Bloomfieldian definition, while severely restrictive, has yet to cause any recognizable harm.

To keep things straight, I will borrow two terms from linguistics: "intension" and "extension". For this discussion, I will speak of the "set of strings" that make up a language as its extension. If you are a Bloomfieldian, a language is its extension.

The vast majority of people have always thought that, to be considered a language, a set of strings has to have a semantics -- that is, the strings must mean something. I will call the semantics of a language its intension.[79] A language is extensional (or Bloomfieldian) if it consists only of an extension. A language is intensional if it consists of both an intension and an extension. (All languages have extensions.[80]) For most people, the term "language" means an intensional language.

As of 1965, you can argue that using the Bloomfieldian definition has been helpful. It is easier to do math with an extensional language. And the results in terms of its extension often do apply to an intensional language. For example, Chomsky, to demonstrate the superiority of his model over Markov chains, showed that the extension of his model of English contains strings which clearly are English, but which are not predicted by Markov chains. Obviously, your model of an intensional language is wrong if its extension is wrong, and if you can discover that without delving into semantics, all the better.

Knuth wants to show a relationship between his LR grammars, and the mathematical literature on stack-based parsing (DPDA's). The trouble is, the DPDA literature is entirely non-Chomskyan -- all of its results are in terms of sets of strings. Knuth is forced to "compare apples to oranges"

How do you show that string-sets are the equivalent of grammars? What Knuth does is treat them both as string-sets -- extensions. Knuth compares the language extensions of the LR grammars to the sets of strings recognized by the DPDA's.

The result certainly seems encouraging. It turns out that the language extension of deterministic stack machines is exactly that of the LR grammars.

If you take language extensions as a proxy for grammars, things fall into place very neatly: the LR-parsers are the deterministic subset of the context-free parsers. And "deterministic" seems like a very good approximation of practical. All practical parsers in 1965 are deterministic parsers.[81]

Viewed this way, LR-parsing looks like the theoretical equivalent of practical parsing -- at least it is as close to an exact equivalent of practical parsing as theory is likely to get. Based on the algorithms of Knuth 1965, that means that the theoretical equivalent of "practical parsing" is somewhere between LR(0) and LR(1).

Not all the signs are promising, however. In fact, one of them is ominous. LR grammars form a hierarchy -- for every k≥0, there is an LR grammar which is LR(k+1), but which is not LR(k). But if you just look at sets of strings, this hierarchy pancakes. Every LR(k) language extension is also an LR(1) language extension, as long as k≥1.

It gets worse. In most practical applications, you can add an end-of-input marker to a grammar.[82], If you do this, every LR(k) language extension is also an LR(0) language extension. In terms of strings-sets, there is no LR hierarchy:

LR(0) = LR(1) = LR(2) = ... = LR(42) = ...

In short, as a proxy for LR grammars, LR language extensions look like they might be completely worthless.[83]

The Parsing Problem as of 1965

While the algorithm of Knuth 1965 does not solve the Parsing Problem, it convinces most that stack-based, and therefore LR, parsing is the framework for the solution.[84] To be sure, there is no proof, but the reasoning is persuasive:

In 1965, every practical parser is stack-driven.[85]
An algorithm that combines state transitions and stack operations is already a challenge to existing machines. In 1965, any more complicated algorithm is likely to be unuseable in practice.[86]
For LR(k) grammars, practical parsing is bracketed somewhere between LR(0) and LR(1).
In terms of what the parsing literature calls "languages" (extensional languages), LR(k) parsing is the exact equivalent of deterministic stack-based parsing.
Determinism is always linear and, at least in terms of speed, linear is almost always practical.
In 1965, every practical parser is deterministic.
In general, power is a trade-off for speed.

There are also aesthetic reasons to think that this theoretical equivalent for practical parsing is not all that rough. Recall that deterministic stack-based parsing is "exactly" LR-parsing. It is also the case that non-deterministic stack-based parsing is "exactly" context-free parsing. This symmetry is very elegant, and suggests that the theoreticians have uncovered the basic laws behind parsing.

Of course, "exactly" here means "exactly in terms of extensions". Extensions are used in this reasoning, while the actual interest is in intensions. And for extensions the LR hierarchy collapses. But in 1965 these things are not considered troubling.

After all, there is no exact theoretical equivalent of "practical" -- you always have to settle for a more or less rough equivalent. Reasoning in terms of extensions had not bitten anyone yet. And, while nobody had been more authoritative about the limits of extensions as a definition of language than Chomsky (he had literally written the book), Chomsky himself had used extensions to produce some of his most impressive results.

After 1965, the consensus excludes the idea that algorithms which target supersets of LR(k) might be faster than those that take on LR(k) directly.[87]. But, what if the LR language hierarchy collapse is a real problem? If we remove the evidence based on language extensions from the list above, all we have left are couple of over-generalizations. So the question remains:

Is there a non-deterministic parser that is linear for the LR(k) grammars, or even a superset of them?

This question will be answered by Joop Leo in 1991. The answer, surprisingly, will be yes.

The Operator Issue as of 1965

Knuth 1965 is a significant milestone in the history of operator expression parsing. Knuth specifically addresses the parsing of ALGOL and he zeroes in on its arithmetic expressions as the crucial issue[88]. Knuth suggests a "typical" grammar[89] which is short, but encapsulates the parsing challenges presented by ALGOL's arithmetic expressions:


      S ::= E
      E ::= - T
      E ::= T
      E ::= E - T
      T ::= P
      T ::= T * P
      P ::= a
      P ::= ( E )

KNUTH-OP is left-recursive, allows parentheses, has three levels of precedence, and implements both unary and binary minus. Recursive descent cannot parse KNUTH-OP, but KNUTH-OP is LR(1). The means that it is well within Knuth's new class of grammars and, by implication, probably within a practical subclass of the LR grammars.

1968: Lewis and Stearns discover LL

When Knuth discovered the LR grammars, he announced them to the world with a full-blown mathematical description. The top-down grammars, which arose historically, have lacked such a description. In 1968, Lewis and Stearns fill that gap by defining the LL(k) grammars[90].

Terms: "LL" and "LR"

When LL is added to the vocabulary of parsing, the meaning of LR shifts slightly. In 1965 Knuth defined LR to mean translatable from left to right[91]. But LL means scan from the left, using left reductions and, in response, the meaning of LR shifts to become scan from the left, using right reductions[92].

If there is a number in parentheses in this notation for parsing algorithms, it usually indicates the number of tokens of lookahead. For example, LL(1) means scan from the left, using left reductions with one character of lookahead. LL(1) will be important.

The Operator Issue as of 1968

With Knuth 1965 and Lewis and Stearns 1968, we can now restate the problem with recursive descent and operator expressions in precise terms: Recursive descent in its pure form, is LL(1). Arithmetic operator grammars are not LL(1) -- not even close. In fact, of the grammars BASIC-OP, RIGHT-OP, and KNUTH-OP, none is LL(k) for any k.

A common work-around is to have recursive descent parse arithmetic expressions as lists, and add associativity in post-processing. We are now able to look at this in more detail. An extended BNF grammar to recognize BASIC-OP as a list is as follows:


      S  ::= E
      E  ::= T { TI }*
      TI ::= '+' T
      T  ::= F { FI } *
      FI ::= 'x' F
      F  ::= number

In the above {Z}* means zero or more occurrences of Z. Expanded into pure BNF, and avoiding empty right hand sides, our operator "list" grammar becomes LIST-OP:


      S  ::= E
      E  ::= T TL
      E  ::= T
      TL ::= TI
      TL ::= TI TL
      TI ::= '+' T
      T  ::= F FL
      T  ::= F
      FL ::= FI
      FL ::= FI FL
      FI ::= 'x' F
      F  ::= number

LIST-OP is LL(1), and therefore can be parsed directly by recursive descent. Note that in LIST-OP, although associativity must be provided by a post-processor, the grammar enforces precedence.

1968: Earley's algorithm

Jay Earley discovers the algorithm named after him[93]. Like the Irons algorithm, Earley's algorithm is Chomskyan, declarative and fully general. Unlike the Irons algorithm, it does not backtrack. Earley's algorithm is both top-down and bottom-up at once -- it uses dynamic programming and is table-driven. Earley's approach makes a lot of sense and looks very promising indeed, but it has three serious issues:

First, there is a bug in the handling of zero-length rules.
Second, it is quadratic for right recursions.
Third, table-driven parsing is, by the standards of 1968 hardware, daunting.

1968: Attribute grammars

Irons' synthesized attributes had always been inadequate for many tasks. Until 1968, they had been supplemented by side effects or state variables. In 1968, Knuth publishes a paper on a concept he had been working for the previous few years: inherited attributes[94].

Term: "Inherited attributes"

Recall that a node in a parse gets its synthesized attributes from its children. Inherited attributes are attributes that a node gets from its parents. Of course, if both inherited and synthesized attributes are used, there are potential circularities. But inherited attributes are powerful and, with care, the circularities can be avoided.

Term: "Attribute grammar"

An attribute grammar is a grammar whose nodes may have both inherited and synthesized attributes.

1969: LALR

Since Knuth 1965, many have taken up his challenge to find a practically parseable subset of the LR(k) languages. In 1969, Frank DeRemer describes a new variant of Knuth's LR parsing[95]. DeRemer's LALR algorithm requires only a stack and a state table of quite manageable size. LALR looks practical, and can parse KNUTH-OP. LALR will go on to become the most widely used of the LR(k) subsets.

1969: the `ed` editor

Ken Thompson writes the ed editor as one of the first components of UNIX[96]. Before 1969, regular expressions were an esoteric mathematical formalism. Through the ed editor and its descendants, regular expressions become an everyday part of the working programmer's toolkit.

1972: Aho and Ullman is published

Alfred Aho and Jeffrey Ullman publish the first volume[97] of their two volume textbook summarizing the theory of parsing. As of 2023, Aho and Ullman 1972 will still be important. It will also be distressingly up-to-date -- progress in parsing theory will slow dramatically after 1972.

Aho and Ullman's version of Earley's algorithm includes a straightforward fix to the zero-length rule bug in Earley's original[98]. Unfortunately, this fix involves adding even more bookkeeping to Earley's.

Under the names TDPL and GTDPL, Aho and Ullman investigate the non-Chomskyan parsers in the Schorre lineage[99]. They note that it can be quite difficult to determine what language is defined by a TDPL parser[100]. At or around this time, rumor has it that the main line of development for GTDPL parsers is classified secret by the US government[101]. Whatever the reason, public interest in GTDPL fades.

1973: LRR

An article by Čulik and Cohen extends the idea behind LR to grammars with infinite lookahead.[102] LR-regular (LRR) grammars are LR with lookahead that is infinite but restricted. The restriction is that to tell the strings apart, you must use a finite set of regular expressions.

LRR seems to be a natural class of grammars, coming close to capturing the idea of "eyeball grammars" -- grammars that humans can spot by "eyeball". And, despite computer languages being designed around the idea that practical parsing falls short of LR(1), never mind LRR, constructs requiring infinite lookahead will come up in real languages.[103]

1973: Pratt parsing

As we have noted, pure LL(1) cannot parse operator expressions, and so operator expression parsers are often called into service as subparsers. What about making the entire grammar an operator grammar? This idea had already occurred to Samelson and Bauer in 1959. In 1973, Vaughan Pratt revives it[104].

There are problems with switching to operator grammars. Operator grammars are non-Chomskyan -- the BNF no longer accurately describes the grammar. Instead the BNF becomes part of a combined notation, and the actual grammar parsed depends also on precedence, associativity, and semantics. And operator grammars have a very restricted form -- most practical languages are not operator grammars.

But many practical grammars are almost operator grammars. And the Chomskyan approach has always had its dissenters. Vaughn Pratt is one of these, and discovers a new approach to operator expression parsing: Pratt parsing is the third and last approach in Theodore Norvell's taxonomy of approaches to operator expression parsing[105]. Some will adopt Pratt parsing as the overall solution to their parsing problems[106].

As of 2023, the Pratt approach will not be popular as an overall strategy. Under the name precedence climbing, it will most often be used as a subparser within a recursive descent strategy. All operator expression subparsers break the Chomskyan paradigm so the non-Chomskyan nature of Pratt's parser is not a problem in this context.

1975: The C compiler is converted to LALR

Bell Labs converts its C compiler from hand-written recursive descent to DeRemer's LALR algorithm[107].

1977: The first dragon book is published

The first dragon book[108] comes out. This soon-to-become classic textbook is nicknamed after the drawing on the front cover, in which a knight takes on a dragon. Emblazoned on the knight's lance are the letters LALR. From here on out, to speak lightly of LALR will be to besmirch the escutcheon of parsing theory.

1979: Yacc is released

Bell Laboratories releases Version 7 UNIX[109]. V7 includes what is, by far, the most comprehensive, useable and easily available compiler writing toolkit yet developed.

Part of the V7 toolkit is Yet Another Compiler Compiler (yacc)[110]. yacc is LALR-powered. Despite its name, yacc is the first compiler-compiler in the 2023 sense. For a few useful languages, the process of going from Chomskyan specification to executable is now fully automated. Most practical languages, including the C language and yacc's own input language, still require manual hackery[111].

The Parsing Problem as of 1979

Recall the criteria outlined for a solution of the Parsing Problem: that the parser be efficient, general, declarative and practical. LALR is linear and runs fast on 1979 hardware. LALR seems practical. And LALR is declarative. True, LALR is very far from general, but BASIC-OP, RIGHT-OP, KNUTH-OP, and LIST-OP are all LALR, and LALR can parse most operator expressions directly.

The criteria set in the original statement of the Parsing Problem have not been fully met. Nonetheless, even if there is a certain amount of disappointment, it seems as if the Parsing Problem and the Operator Issue can both be declared solved. Two decades of research seem to have paid off.

1987: Perl 1 is released

Larry Wall introduces Perl 1[112]. Perl embraces complexity like no previous language. Larry uses yacc and LALR very aggressively -- to my knowledge more aggressively than anyone before or since.

1990: Monads

Wadler starts to introduce monads to the functional programming community. As one example he converts his previous efforts at functional parsing techniques to monads[113]. The parsing technique is pure recursive descent. The examples shown for monadic parsing are very simple, and do not include operator expressions. In his earlier non-monadic work, Wadler had used a very restricted form of operator expression: His grammar had avoided left recursion by using parentheses, and operator precedence had not been implemented [114].

1991: Leo's speed-up of Earley's algorithm

Joop Leo discovers a way of speeding up right recursions in Earley's algorithm[115]. Leo's algorithm is linear for LRR. That means it is linear for a superset of LR(k) and therefore just about every unambiguous grammar of practical interest, as well as many ambiguous ones. As of 1991, hardware is six orders of magnitude faster than 1968 hardware, so that the issue of bookkeeping overhead has receded in importance. When it comes to speed, the game has changed in favor of the Earley algorithm.

The Parsing Problem as of 1991

Recall that after Knuth 1965, the research consensus had excluded the possibility that a parser of a superset of the LR(k) languages could be practical and linear. The argument from Knuth had been highly suggestive. It had not actually been a proof, but by 1991 many effectively take it as one.

Does Leo's algorithm refute the consensus? While not linear in general, Leo's non-deterministic algorithm is linear for a superset of Knuth's LR(k) grammars. But is Leo's algorithm practical?

In 1991, almost nobody asks any of those questions. Most researchers see the Parsing Problem as "solved" -- a closed issue. Earley parsing is almost forgotten, and Leo's discovery is ignored. Two decades will pass before anyone attempts a practical implementation of Leo 1991.

If Earley's is almost forgotten, then everyone in LALR-land is content, right? Wrong. In fact, far from it.

As of 1979, LALR seemed practical. But by the 1990s users of LALR are making unpleasant discoveries. While LALR automatically generates their parsers, debugging them is so hard they could just as easily write the parser by hand. Once debugged, their LALR parsers are fast for correct inputs. But almost all they tell the users about incorrect inputs is that they are incorrect. In Larry Wall's words, LALR is fast but stupid[116].

The Operator Issue as of 1991

On the written record, the discontent with LALR and yacc is hard to find. What programmer wants to announce to colleagues and potential employers that he cannot figure how to make the standard state-of-the-art tool work? But a movement away from LALR has already begun. Parsing practice falls back on recursive descent. The Operator Issue is back in full force.

1992: Combinator parsing

Combinator parsing is introduced in two papers published this year.[117] Of more interest to us is Hutton 1992, which focuses on combinator parsing[118]. As Hutton explains[119] a combinator is an attribute grammar with one inherited attribute (the input string) and two synthesized attributes (the value of the node and the remaining part of the input string). Combinators can be combined to compose other parsers. This allows you to have an algebra of parsers. It is an extremely attractive idea.

Exciting as the new mathematics is, the underlying parsing theory contains nothing new -- it is the decades-old recursive descent, unchanged. Hutton's example language is extremely basic -- his operator expressions require left association, but not precedence. Left association is implemented by post-processing.

Hutton's main presentation does not use monads. In his final pages, however, Hutton points out combinator parsers give rise to a monad and shows how his presentation could be rewritten in a form closely related to monads[120].

1995: Wadler's monads

In the adaptation of monads into the world of programming, Wadler 1995 is a turning point. One of Wadler's examples is a parser based on monads and combinator parsing. Combinator parsing becomes the standard example of monad programming.

In its monadic form, the already attractive technique of combinator parsing is extremely elegant and certainly seems as if it should solve all parsing problems. But, once again, it is still recursive descent, completely unchanged. Wadler handles left recursion by parsing into a list, and re-processing that list into left recursive form. Wadler's example grammar only has one operator, so the issue of precedence does not arise.

Instead of one paradigm to solve them all, Wadler ends up with a two layered approach, with a hackish bottom layer and an awkward interface. This undercuts the simplicity and elegance of the combinator approach.

1996: Hutton and Meijer on combinator parsing

Wadler 1995 used parsing as an example to motivate its presentation of monads. Graham Hutton and Erik Meijer[121] take the opposite perspective -- they write a paper on combinator parsing that could also be viewed as a first introduction to the use of monads in programming.[122]

The most complicated grammar[123] for operator expressions in Hutton and Meijer 1996 has two operators (exponentiation and addition) with two levels of precedence and two different associativities: exponentiation associates right and addition associates left.

The approach to parsing taken in Hutton and Meijer 1996 will be familiar by now: Precedence is provided by the combinator parser which parses operators and operands of the same precedence into lists. Associativity is provided by re-processing the lists.

The Operator Issue as of 1996

As we have seen, the literature introducing monads and combinator parsing added nothing to what is already known about parsing algorithms. We have focused on it for three reasons:

Functional programming will have had a profound influence on the way the profession sees parsing.
The literature shows the relative popularity in the 1990's of LALR and recursive descent, at least among one group of highly skilled programmers. LALR is not mentioned once in any of the articles we have surveyed.
The functional programming literature also shows that the Operator Issue, once thought solved, is as live an issue in 1996 as it had been at the end of 1961.

2000: The Perl 6 effort begins

Larry Wall decides on a radical reimplementation of Perl -- Perl 6. Larry does not even consider using LALR again.

2002: Aycock and Horspool solve Earley's zero-length rule problem

John Aycock and R. Nigel Horspool publish their attempt at a fast, practical Earley's parser[124]. Missing from it is Joop Leo's improvement -- they seem not to be aware of it. Their own speedup is limited in what it achieves and the complications it introduces can be counter-productive at evaluation time. But buried in their paper is a solution to the zero-length rule bug. And this time the solution requires no additional bookkeeping.

2004: PEG

Implementers by now are avoiding yacc, but the demand for declarative parsers remains. In 2004, Bryan Ford[125] fills this gap by repackaging the nearly-forgotten GTDPL. Ford's new algorithm, PEG, is declarative, always linear, and has an attractive new syntax.

But PEG, like Glennie's 1960 syntax formalism, is pseudo-declarative. PEG uses BNF notation, but it does not parse the BNF grammar described by the notation: PEG achieves unambiguity by finding only a subset of the parses of its BNF grammar. And, like its predecessor GTDPL, in practice it is usually impossible to determine what the subset is. The best a programmer usually can do is to create a test suite and fiddle with the PEG description until it passes. Problems not covered by the test suite will be encountered at runtime.

PEG takes the old joke that the code is the documentation one step further -- with PEG it is often the case that nothing documents the grammar, not even the code. Under this circumstance, creating reliable software is impossible. As it is usually used, PEG is the nitroglycerin of LL parsing -- slightly more powerful than LL(1), but too dangerous to be worth it.

PEG, in safe use, would essentially be LL(1)[126]. Those adventurous souls who parse operator expressions under PEG typically seem to parse the operator expressions as lists, and re-process them in the semantics.

2006: GNU C reverts to recursive descent

The old Bison-based C and Objective-C parser has been replaced by a new, faster hand-written recursive-descent parser.[127]

With this single line in the middle of a change list hundreds of lines long, GNU announces a major event in parsing history. (Bison, an LALR parser, is the direct descendant of Steve Johnson's yacc.)

For three decades, the industry's flagship C compilers have used LALR as their parser -- proof of the claim that LALR and serious parsing are equivalent. Now, GNU replaces LALR with the technology that LALR replaced a quarter century earlier: recursive descent.

2010: Leo 1991 is implemented

Jeffrey Kegler (the author of this timeline) releases the first practical implementation of Joop Leo's algorithm[128]. Named Marpa, the new parser is general, table-driven, declarative, and linear for LRR, which in turn is a superset of the LR grammars discussed in Knuth 1965.[129]. This means it is also linear for most of the grammar classes we have discussed in this timeline:

LL(k) grammars for all finite k, including LL(1) grammars.
LR(k) grammars for all finite k.
LALR grammars.
Operator grammars.

This is a vast class of grammars, and Marpa has the important feature that it allows a programmer to readily determine if their grammar is linear under Marpa. Marpa will parse a grammar in linear time, if

It is unambiguous[130].
It has no unmarked middle recursions[131].
It has no ambiguous right recursions[132].

The ability to ensure that a grammar can be parsed in linear time is highly useful for second-order languages -- a programmer can easily ensure that all her automatically generated grammars will be practical. Marpa's own DSL will make use of second-order programming.

Leo 1991 did not address the zero-length rule problem. Kegler solves it by adopting the solution of Aycock and Horspool 2002. Finally, in an effort to combine the best of declarative and hand-written parsing, Kegler reorders Earley's parse engine so that it allows procedural logic. As a side effect, this gives Marpa excellent error-handling properties.

2012: General precedence parsing

While the purely precedence-driven parsing of Samelson and Bauer 1960 and Pratt 1973 never caught on, the use of precedence in parsing remains popular. In fact, precedence tables often occur in documentation and standards, which is a clear sign that they too can play a role in human-readable but precise descriptions of grammars.

In 2012, Kegler releases a version of Marpa which allows the user to mix Chomskyan and precedence parsing at will[133]. Marpa's precedenced statements describe expressions in terms of precedence and association. If precedenced statements are added to a grammar that Marpa parses in linear time, then the grammar remains linear as long as the precedenced statements are (after precedence is taken in consideration) unambiguous[134].

Here is an example of a Marpa precedenced statement: [135]


              Expression ::=
                 Number
              |  '(' Expression ')' assoc => group
              || Expression '**' Expression assoc => right
              || Expression '*' Expression
              |  Expression '/' Expression
              || Expression '+' Expression
              |  Expression '-' Expression

Previously operator expression parsers had worked by comparing adjacent operators. This limited them to binary expressions (the ternary expressions in most languages are implemented using post-parsing hackery). Marpa's precedenced expressions are not operator-driven -- operators are used only to avoid ambiguity. Precedenced statements will directly implement any number of ternary, quaternary or n-ary operators. Ternary and larger expressions are useful in financial calculations, and are common in calculus.

The Operator Issue as of 2012

Marpa allows operator expressions to be parsed as BNF, eliminating the need for them. On the other hand, Marpa's ability to handle second-order languages allows it to take statements which describe expressions in terms of precedence and association, and rewrite them in a natural way into Chomskyan form. In this way the grammar writer has all the benefits of Chomskyan parsing, but is also allowed to describe his grammar in terms of operators when that is convenient. Once again, the Operator Issue seems to be solved, but this time perhaps for real.

The Parsing Problem as of 2012

Again recall the four criteria from the original statement of the Parsing Problem: that a parser be efficient, general, declarative and practical. Does Marpa[136] fulfill them?

At first glance, no. While Marpa is general, it is not linear or quasi-linear for the general case, and in that sense, Marpa might be considered not efficient enough. But to be general, a parser has to parse every context-free grammar, including some wildly ambiguous ones, for which simply printing the list of possible parses takes exponential time. With the experience of the decades, linearity for a fully general BNF parser seems to be an unnecessary requirement for a practical parser.

The LRR grammars, which Marpa does parse in linear time, include every other grammar class we have discussed -- LL(k) for all k, LR(k) for all k, LALR, operator grammars, etc. If we change our criteria as follows:

efficient for all LRR grammars, and for all grammars parseable in linear time by any other parser in practical use;
general;
declarative; and
practical.

then Marpa qualifies.

My teachers came from the generation which created the ALGOL vision and started the search for a solution to the Parsing Problem. Alan Perlis was one of them. Would he have accepted my claim that I have found the solution to the problem he posed? I do not know. But I am certain, as anyone else who knew him can attest, that he would given me a direct and plain-spoken answer. In that spirit, I submit this timeline to the candid reader. In any case, I hope that my reader has found this journey through the history of parsing informative.

Bibliography

An attempt is made to list the original publication, which is not always the one consulted. Departures from format are made to include information of historical interest, for instance the full author list of the ALGOL 1960 report. For similar reasons, the date in the tag is the date for purposes of historical priority, and is not always the publication date.

ACM 1954 Committee on Nomenclature of the Association for Computing Machinery. First Glossary of Programming Terminology. June 1954. http://archive.computerhistory.org/resources/text/Knuth_Don_X4100/PDF_index/k-8-pdf/k-8-u2741-2-ACM-Glossary.pdf Accessed 10 October 2018.

Aho and Ullman 1972: Aho, Alfred V., and Jeffrey D. Ullman. The theory of parsing, translation, and compiling. Vol. 1. Prentice-Hall, 1972.

Aho and Ullman 1973: Aho, Alfred V., and Jeffrey D. Ullman. The theory of parsing, translation, and compiling. Vol. 2. Prentice-Hall, 1973.

Aho and Ullman 1977: Aho, Alfred V., and Jeffrey D. Ullman. Principles of Compiler Design. Addison-Wesley, 1977. The dragon on the cover is green, and this classic first edition is sometimes called the green dragon book to distinguish it from its second edition, which had a red dragon on the cover.

ALGOL 1960: Backus, John W., F. L. Bauer, J. Green, C. Katz, J. McCarthy, A. J. Perlis, H. Rutishauser, K. Samelson, B. Vauquois, J. H. Wegstein, A. van Wijngaarden, and M. Woodger. Revised report on the algorithmic language Algol 60. Edited by Peter Naur. Springer, 1969. Accessed 23 April 2018. Originally published in Communications of the ACM Volume 3, Issue 5, May 1960, Pages 299-314.

Aycock and Horspool 2002: Aycock, John, and R. Nigel Horspool. Practical earley parsing. The Computer Journal, vol. 45, issue 6, 2002, pp. 620-630. Accessed 23 April 2018.

Backus 1959: Backus, John W. The syntax and semantics of the proposed international algebraic language of the Zurich ACM-GAMM conference. Proceedings of the International Conference on Information Processing, 1959. Accessible online here and here as of 24 April 2018.

Backus 1980: Backus, John. Programming in america in the 1950s - Some Personal Impressions. A History of Computing in the twentieth century, 1980, pp. 125-135. Accessed 11 October 2018.

Bloomfield 1926: Bloomfield, Leonard. A set of Postulates for the Science of Language. Language, Vol. 2, No. 3 (Sep., 1926), pp. 153-164.

Bloomfield 1933: Bloomfield, Leonard. Language. Holt, Rinehart and Winston, 1933.

Backus 2006: Booch, Grady. Oral History of John Backus. Computer History Museum, 5 September 2006. http://archive.computerhistory.org/resources/text/Oral_History/Backus_John/Backus_John_1.oral_history.2006.102657970.pdf Accessed 11 October 2018.

Carpenter and Doran 1977: Carpenter, Brian E., and Robert W. Doran. The other Turing machine. The Computer Journal, vol. 20, issue 3, 1 January 1977, pp. 269-279. Accessed online 24 April 2018.

Chipps et al 1956: Chipps, J.; Koschmann, M.; Orgel, S.; Perlis, A.; and Smith, J. A mathematical language compiler. Proceedings of the 1956 11th ACM national meeting, ACM, 1956.

Chomsky 1956: Chomsky, Noam. Three models for the description of language. IRE Transactions on information theory, vol. 2, issue 3, September 1956, pp. 113-124.

Chomsky 1957: Chomsky, Noam. Syntactic Structures. Mouton & Co., 1957.

Chomsky 1959: Chomsky, Noam. A Review of B. F. Skinner's Verbal Behavior. Language, Volume 35, No. 1, 1959, 26-58. https://chomsky.info/1967____/. Accessed 1 September 2018.

Chomsky 1978: Chomsky, Noam. Topics in the Theory of Generative Grammar. De Gruyter, 1978.

Chomsky 2002: Chomsky, Noam. Syntactic Structures, 2nd ed., Mouton de Gruyter, 2002.

Čulik and Cohen 1973: Čulik II, Karel and Cohen, Rina. LR-regular grammars -- an extension of LR(k) grammars Journal of Computer and System Sciences. Volume 7, Issue 1, February 1973, Pages 66-96. http://www.sciencedirect.com/science/article/pii/S0022000073800509. Accessed 29 August 2018.

Darwin 1984: Darwin, Ian F., and Geoffrey Collyer. A History of UNIX before Berkeley: UNIX Evolution, 1975-1984. /sys/doc/, 1984, http://doc.cat-v.org/unix/unix-before-berkeley/. Accessed 24 April 2018.

Denning 1983: Denning, Peter. Untitled introduction to Irons 1961. Communications of the ACM, 25th Anniversary Issue, vol. 26, no. 1, 1983, p. 14.

DeRemer 1969: DeRemer, Franklin Lewis. Practical translators for LR(k) languages. September 1969. PhD dissertation, Massachusetts Institute of Technology, https://dspace.mit.edu/bitstream/handle/1721.1/13628/24228988-MIT.pdf?sequence=2. Accessed 24 April 2018.

Dijkstra 1961: Dijkstra, E. W. Algol 60 translation: An algol 60 translator for the x1 and making a translator for algol 60. Stichting Mathematisch Centrum, Rekenafdeling, MR 34/61, 1961.

Earley 1968: Earley, Jay. An Efficient Context-free Parsing Algorithm. August 1968. PhD dissertation, Carnegie-Mellon University, http://reports-archive.adm.cs.cmu.edu/anon/anon/usr0/ftp/scan/CMU-CS-68-earley.pdf. Accessed 24 April 2018.

Ford 2002: Ford, Bryan. Packet parsing: a Practical Linear-Time Algorithm with Backtracking. September 2002. PhD dissertation, Massachusetts Institute of Technology, https://dspace.mit.edu/bitstream/handle/1721.1/87310/51972156-MIT.pdf;sequence=2. Accessed 24 April 2018.

Ford 2004: Ford, Bryan. Parsing expression grammars: a recognition-based syntactic foundation. ACM SIGPLAN Notices, vol. 39, no. 1, January 2004. https://pdos.csail.mit.edu/papers/parsing:popl04.pdf. Accessed 24 April 2018.

Frost 1992: Frost, Richard A. Constructing Programs as Executable Attribute Grammars. The Computer Journal, vol. 35, issue 4, 1 August 1992, pp. 376-389. https://academic.oup.com/comjnl/article/35/4/376/348233.

FSF 2006: "GCC 4.1 Release Series: Changes, New Features, and Fixes." Free Software Foundation, http://gcc.gnu.org/gcc-4.1/changes.html. Accessed 25 April 2018. The release date of GCC 4.1 was February 28, 2006.

Glennie 1960: Glennie, A. E. On the syntax machine and the construction of a universal compiler, TR-2. Carnegie Institute of Technology, Computation Center, 1960. http://www.chilton-computing.org.uk/acl/literature/reports/p024.htm. The paper is dated 10 July 1960. Accessed 24 April 2018.

Grune and Jacobs 2008: Grune, D. and Jacobs, C. J. H. Parsing Techniques: A Practical Guide, 2nd edition. Springer, 2008. When it comes to determining who first discovered an idea, the literature often disagrees. In such cases, I have often deferred to the judgement of Grune and Jacobs.

Harris 1993: Harris, Randy Allen. The Linguistics Wars. Oxford University Press, 1993

Hayes 2013: Hayes, Brian. First links in the Markov chain. American Scientist, vol. 101, no .2, 2013, p. 252. Accessed online as of 24 April 2018.

Hayes 1962: Hayes, David G. Automatic language-data processing. Pp. 394-423 in Computer Applications in the Behavioral Sciences, H. Borko, editor, Prentice Hall, 1962. Describes two algorithms, the first of which, attributed to J. Cocke, is a CYK parser.

Hilgers and Langville 2006: Hilgers, Philipp, and Amy N. Langville. The five greatest applications of Markov Chains. Proceedings of the Markov Anniversary Meeting, Boston Press, 2006. http://langvillea.people.cofc.edu/MCapps7.pdf. Accessed 24 April 2018. Slide presentation: https://www.csc2.ncsu.edu/conferences/nsmc/MAM2006/langville.pdf. Accessed 24 April 2018.

Hutton 1992: Hutton, Graham. Higher-order functions for parsing. Journal of functional programming, vol. 2, no. 3, July 1992, pp. 323-343. http://eprints.nottingham.ac.uk/221/1/parsing.pdf. Accessed 24 April 2018.

Hutton and Meijer 1996: Hutton, Graham and Erik Meijer. Monadic parser combinators, Technical Report NOTTCS-TR-96-4. Department of Computer Science, University of Nottingham, 1996. http://eprints.nottingham.ac.uk/237/1/monparsing.pdf. Accessed 24 April 2018.

Irons 1961: Irons, Edgar T. A syntax directed compiler for ALGOL 60. Communications of the ACM, vol. 4, no. 1, January 1961, pp. 51-55.

Johnson 1975: Johnson, Stephen C. Yacc: Yet another compiler-compiler, Technical Report. Bell Laboratories, Murray Hill, NJ, 1975. https://www.isi.edu/~pedro/Teaching/CSCI565-Fall15/Materials/Yacc.pdf. Accessed 24 April 2018.

Kasami and Torii 1969: Kasami, T. and Torii, K. A syntax-analysis procedure for unambiguous context-free grammars. Journal of the ACM, vol. 16, issue 3, July 1969, pp. 423-431.

Kegler 2010: Kegler, Jeffrey. Marpa is now O(n) for Right Recursions. Ocean of Awareness, June 5, 2010. http://jeffreykegler.github.io/Ocean-of-Awareness-blog/individual/2010/06/marpa-is-now-on-for-right-recursions.html. Accessed 24 April 2018. This is the initial announcement, and its examples and links are obsolete. The latest stable version of Marpa is Marpa::R2.

Kegler 2012a: Kegler, Jeffrey. Precedence parsing made simpler. Ocean of Awareness, August 22, 2012. http://jeffreykegler.github.io/Ocean-of-Awareness-blog/individual/2012/08/precedence-parsing-made-simpler.html. Accessed 24 April 2018. This is the initial announcement, and its examples and links are obsolete. The latest stable version of Marpa is Marpa::R2.

Kegler 2012b: Kegler, Jeffrey. Marpa, A Practical General Parser: The Recognizer. https://arxiv.org/abs/1910.08129. Accessed 23 October 2019.

Kegler Strings 2018: Kegler, Jeffrey. Is a language just a set of strings? Ocean of Awareness, May 28, 2018. http://jeffreykegler.github.io/Ocean-of-Awareness-blog/individual/2018/05/chomsky_1956.html. Accessed 2 September 2018.

Kegler Solved 2018: Kegler, Jeffrey. Why is parsing considered solved? Ocean of Awareness, 4 June, 2018. http://jeffreykegler.github.io/Ocean-of-Awareness-blog/individual/2018/05/knuth_1965.html. Accessed 2 September 2018.

Kegler Undershoot 2018: Kegler, Jeffrey. Undershoot: Parsing theory in 1965. Ocean of Awareness, August 8, 2018. http://jeffreykegler.github.io/Ocean-of-Awareness-blog/individual/2018/07/knuth_1965_2.html. Accessed 2 September 2018.

Kegler Haskell 2018: Kegler, Jeffrey. A Haskell challenge. Ocean of Awareness, August 28, 2018. http://jeffreykegler.github.io/Ocean-of-Awareness-blog/individual/2018/08/rntz.html. Accessed 28 August 2018.

Kleene 1951: Kleene, Stephen Cole. Representation of events in nerve nets and finite automata., RM-704. Rand Corporation, 15 December 1951. https://www.rand.org/content/dam/rand/pubs/research_memoranda/2008/RM704.pdf. Accessed 24 April 2018.

Knuth 1962: Knuth, Donald E. A history of writing compilers. Computers and Automation, vol. 11, no. 12, 1962, pp. 8-18.

Knuth 1965: Knuth, Donald E. On the translation of languages from left to right. Information and Control, vol. 8, issue 6, December 1965, pp. 607-639. https://ac.els-cdn.com/S0019995865904262/1-s2.0-S0019995865904262-main.pdf?_tid=dcf0f8a0-d312-475e-a559-be7714206374&acdnat=1524066529_64987973992d3a5fffc1b0908fe20b1d Accessed 24 April 2018.

Knuth 1968: Knuth, Donald E. Semantics of context-free languages. Mathematical Systems Theory, vol. 2, no. 2, 1968, pp. 127-145. https://www.csee.umbc.edu/courses/331/fall16/01/resources/papers/Knuth67AG.pdf. Accessed 24 April 2018.

Knuth 1971: Knuth, Donald E. Top-down syntax analysis. Acta Informatica vol. 1, issue 2, 1971, pp. 79-110. http://www.dcc.ufrj.br/~fabiom/comp20122/knuth_topdown.pdf. Accessed 24 April 2018.

Knuth 1990: Knuth, Donald E. The genesis of attribute grammars. Attribute Grammars and Their Applications, Springer, September 1990, pp. 1-12. http://www.cs.miami.edu/home/odelia/teaching/csc419_spring17/syllabus/Knuth_AttributeHistory.pdf. Accessed 24 April 2018.

Knuth and Pardo 1976: Knuth, Donald E., and Luis Trabb Pardo. The Early Development of Programming Languages, STAN-CS-76-562. Computer Science Department, Stanford University, August 1976. http://bitsavers.trailing-edge.com/pdf/stanford/cs_techReports/STAN-CS-76-562_EarlyDevelPgmgLang_Aug76.pdf. Accessed 24 April 2018.

Leo 1991: Leo, Joop M. I. M. A general context-free parsing algorithm running in linear time on every LR (k) grammar without using lookahead. Theoretical computer science vol. 82, issue 1, 22 May 1991, pp. 165-176. https://www.sciencedirect.com/science/article/pii/030439759190180A Accessed 24 April 2018.

Lewis and Stearns 1968: There is a 1966 paper with the same authors and title. but Aho and Ullman 1972, p. 268 says that the 1968 paper is the first definition of LL(k). Lewis II, Philip M., and Richard Edwin Stearns. Syntax-directed transduction. Journal of the ACM, vol. 15, issue 3, 1968, pp. 465-488.

Lovelace 1843: Ada Augusta, Countess of Lovelace. "Notes" on her translation of Sketch of The Analytical Engine Invented by Charles Babbage by Menabrea, L. F. Scientific Memoirs Volume 3, September 1843. https://www.fourmilab.ch/babbage/sketch.html. Accessed 11 October 2018.

Lucas 1961: Lucas, Peter. Die Strukturanalyse von Formelübersetzern/analysis of the structure of formula translators. Electronische Rechenlagen, vol. 3, 11.4, December 1961, pp. 159-167.

Marpa::R2:
Home website: http://savage.net.au/Marpa.html. Accessed 25 April 2018.
Kegler's Marpa website: https://jeffreykegler.github.io/Marpa-web-site/. Accessed 25 April 2018.
Github: https://github.com/jeffreykegler/Marpa--R2. Accessed 25 April 2018.
See also Kegler 2012b and Marpa::R2 MetaCPAN.

Marpa::R2 MetaCPAN: https://metacpan.org/pod/Marpa::R2. Accessed 30 April 2018.

Markov 1906: Markov, Andrey Andreyevich. Rasprostranenie zakona bol'shih chisel na velichiny, zavisyaschie drug ot druga. Izvestiya Fiziko-matematicheskogo obschestva pri Kazanskom universitete, 2-ya seriya, tom 15, 1906, pp. 135-156. In English, the title translates to "Extension of the law of large numbers to quantities, depending on each other".

Markov 1913: Markov, Andrey Andreyevich. Primer statisticheskogo issledovaniya nad tekstom 'Evgeniya Onegina', illyustriruyuschij svyaz' ispytanij v cep'. Izvestiya Akademii Nauk, Sankt-Peterburg, VI seriya, tom 7, 9(3), 1913, pp. 153-162. An English translation is An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains, Science in Context, vol. 19, no. 4, December 2006, pp. 591-600. http://www.alpha60.de/research/markov/DavidLink_AnExampleOfStatistical_MarkovTrans_2007.pdf. Accessed 25 April 2018.

Mascarenhas et al 2014: Mascarenhas, Fabio, Sèrgio Medeiros, and Roberto Ierusalimschy. On the relation between context-free grammars and parsing expression grammars. Science of Computer Programming, volume 89, part C, 1 September 2014, pp. 235-250. https://arxiv.org/abs/1304.3177. Accessed 25 April 2018.

McIlroy 1987: McIlroy, M. Douglas. "A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971-1986," AT&T Bell Laboratories Computing Science Technical Report #139, 1987. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.692.4849&rep=rep1&type=pdf. Accessed 25 April 2018.

McIlroy and Kernighan 1979: McIlroy, M. D. and B. W. Kernighan. Unix Time-Sharing System: Unix Programmer's Manual. Seventh Edition, Bell Telephone Laboratories, January 1979. https://s3.amazonaws.com/plan9-bell-labs/7thEdMan/index.html. Accessed 25 April 2018.

Moser 1954: Moser, Nora B. Compiler method of automatic programming. Proceedings of the Symposium on Automatic Programming for Digital Computers, Office of Naval Research, 1954, pp. 15-21. This conference, held 13-14 May and organized by Grace Hopper, was the first symposium devoted to software.

Norvell 1999: Norvell, Theodore. Parsing Expressions by Recursive Descent. 1999, https://www.engr.mun.ca/~theo/Misc/exp_parsing.htm. Accessed 25 April 2018.

Oettinger 1960: Oettinger, Anthony. Automatic Syntactic Analysis and the Pushdown Store. Proceedings of Symposia in Applied Mathematics, Volume 12, American Mathematical Society, 1961. From the Proceedings of the Twelfth Symposium in Applied Mathematics held in New York City, April 14-15, 1960.

Padua 2000: Padua, David. The Fortran I compiler. Computing in Science & Engineering, volume 2, issue 1, Jan-Feb 2000, pp. 70-75. https://web.stanford.edu/class/archive/cs/cs339/cs339.2002/fortran.pdf. Accessed 25 April 2018.

Perlis et al 1958: Perlis, Alan J., J. W. Smith, and H. R. Van Zoeren. Internal Translator (IT): A compiler for the 650. Computation Center, Carnegie Institute of Technology. Publication date is given as March 1957 in Knuth and Pardo 1976, p. 105. https://ia801904.us.archive.org/11/items/bitsavers_ibm650Carntor_16304233/CarnegieInternalTranslator.pdf. Accessed 25 April 2018.

Post 1943: Post, Emil L. Formal Reductions of the General Combinatorial Decision Problem. American Journal of Mathematics, vol. 65, no .2, April 1943, pp. 197-215.

Pratt 1973: Pratt, Vaughan R. Top down operator precedence. Proceedings of the 1st annual ACM SIGACT-SIGPLAN symposium on Principles of programming languages, ACM, 1973.

Rosenkrantz and Stearns 1970: Rosenkrantz, Daniel J., and Richard Edwin Stearns. Properties of deterministic top-down grammars. Information and Control, volume 17, no. 3, October 1970, pp. 226-256. https://s3.amazonaws.com/academia.edu.documents/43093758/Properties_of_Deterministic_Top-Down_Gra20160226-4294-1dzfmgn.pdf?AWSAccessKeyId=AKIAIWOWYYGZ2Y53UL3A&Expires=1524516488&Signature=EQ9rs7iGKORIoxa%2F1jGEzOI5pOQ%3D&response-content-disposition=inline%3B%20filename%3DProperties_of_deterministic_top_down_gra.pdf. Accessed 23 April 2018.

Sakai 1961: Sakai, Itiroo. Syntax in Universal Translation International Conference on Machine Translation of Languages and Applied Language Analysis, National Physical Laboratory, Teddington, UK, 5-8 September 1961. http://www.mt-archive.info/50/NPL-1961-Sakai.pdf. Accessed 6 October 2018.

Samelson and Bauer 1959: Samelson, Klaus, and Friedrich L. Bauer. "Sequentielle formelübersetzung." it-Information Technology 1.1-4 (1959): 176-182.

Samelson and Bauer 1960: Samelson, Klaus, and Friedrich L. Bauer. Sequential formula translation. Communications of the ACM, vol. 3, no. 2, February 1960, pp. 76-83. A translation of Samelson and Bauer 1959.

Schorre 1964: Schorre, D. V. Meta ii a syntax-oriented compiler writing language. Proceedings of the 1964 19th ACM national conference, ACM, 1964. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.695.4636&rep=rep1&type=pdf. Accessed 23 April 2018.

Shannon 1948: Shannon, Claude E. A Mathematical Theory of Communication. Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, July, October, 1948. http://cs-www.cs.yale.edu/homes/yry/readings/general/shannon1948.pdf. Accessed 23 April 2018.

Simms 2014: Simms, Damon. Further discussion related to request for arbitration. Talk:Metacompiler, Archive 2, Wikipedia, 9 October 2014, https://en.wikipedia.org/wiki/Talk:Metacompiler/Archive_2#Further_discussion_related_to_request_for_arbitration. Accessed 25 April 2018.

Snyder 1975: Snyder, Alan. A portable compiler for the language C. No. MAC-TR-149. Massachusetts Institute of Technology, Project MAC, 1975.

Wadler 1985: Wadler, Philip. How to replace failure by a list of successes a method for exception handling, backtracking, and pattern matching in lazy functional languages. Conference on Functional Programming Languages and Computer Architecture, Springer, 1985. https://rkrishnan.org/files/wadler-1985.pdf. Accessed 23 April 2018.

Wadler 1990: Wadler, Philip. Comprehending monads. Proceedings of the 1990 ACM conference on LISP and functional programming, ACM, 1990. http://www.diku.dk/hjemmesider/ansatte/henglein/papers/wadler1992.pdf. Accessed 23 April 2018.

Wadler 1995: Wadler, Philip. Monads for functional programming. International School on Advanced Functional Programming, Springer, 1995. http://roman-dushkin.narod.ru/files/fp__philip_wadler_001.pdf. Accessed 23 April 2018.

Wall 2014: Wall, Larry. IRC log for #perl6, 30 August 2014, https://logs.liz.nl/perl6/2014-08-30.html#16:10 and https://web.archive.org/web/20211015215851/https://logs.liz.nl/perl6/2014-08-30.html#16:09-0004. Accessed 16 October 2021.

Wiki Big O: Big O Notation, Wikipedia, 29 April 2018, https://en.wikipedia.org/w/index.php?title=Big_O_notation&oldid=838856464. Accessed 29 April 2018.

Wiki Perl: Perl, Wikipedia, 21 April 2018, https://en.wikipedia.org/w/index.php?title=Perl&oldid=837585549. Accessed 26 April 2018.

Younger 1967: Younger, Daniel M. Recognition and Parsing of Context-Free Languages in Time n³ Information and Control 10, pp 189-208, 1967. https://core.ac.uk/download/pdf/82305262.pdf. Accessed 6 October 2018.

Footnotes

1. The controversy over whether Ada actually wrote the first computer programs is not important for our purposes. For most new machines, the first "programs" are written by the electrical engineers for their "smoke tests", and it is not unlikely that Babbage had written programs for his Calculating Engine before Ada. All of which misses the point of Ada's contribution, which is to be the first to realize what software is. It is the difference between being the first person to build and fly a working airplane, and being the first person to lay out the foundations of aerodynamics. ↩

2. The most important ideas come to permeate the culture, to the detriment of the reputation of those who discover them. As of 2018, small children understand that, in order to play a computer game, they need not just the game platform (hardware), but also the game itself (software). Similarly, the idea that the medium for creating software is a "programming language" seems trivially obvious. Ada's discoveries are so deeply embedded in 2023 culture, that it is hard to realize that there was once a time in which they were, in fact, discoveries.

It is hard to appreciate in 2023 how shocking Ada's ideas were in 1843. We experience no crisis of faith when we use our smartphone to calculate a square root. But for someone in 1843, doing square roots meant thinking complex thoughts, and it seemed obvious that any entity that could think complex thoughts must also have a soul. Your cellphone would have been, literally, a "Frankenstein" experience for most people in 1843, and they would wonder whether you could remove its battery without committing murder. ↩

3. Lovelace 1854, "Note A". ↩

4. Markov 1906. ↩

5. Markov's purpose seems to have been to refute Nekrasov's proof of free will. Nekrasov reasoned as follows:

Events that follow the law of large numbers must be independent.
Social statistics follow the law of large numbers.
Social statistics are made up of individual human choices.
Therefore, individual human choices are independent events.
Since individual human choices are events independent of outside influence, human beings have free will.

Markov demonstrated that the law of large numbers also works for dependent events, undercutting Nekrasov's proof (Hayes 2013, pp. 92-93). ↩

6. Markov 1913. See Hayes 2013 and Hilgers and Langville 2006, pp. 155-157. ↩

7. Bloomfield 1926. ↩

8. Bloomfield 1926, definition 4 on p. 154. ↩

9. "The statement of meanings is therefore the weak point in language-study, and will remain so until human knowledge advances very far beyond its present state. In practice, we define the meaning of a linguistic form, wherever we can, in terms of some other science.", Bloomfield 1926, p. 140. ↩

10. Post 1943. ↩

11. Carpenter and Doran 1977 ↩

12. Shannon 1948. ↩

13. Shannon 1948, pp. 4-6. See also Hilgers and Langville 2006, pp. 157-159. ↩

14. Knuth and Pardo 1976, pp. 29-35, 40. ↩

15. No formal apparatus for describing operator expressions is fully worked out as of 1949. ↩

16. This timeline refers to precedence levels as tight or loose. The traditional terminology is confusing: tighter precedences are higher but traditionally the precedence levels are also numbered and the higher the number, the lower the precedence. ↩

17. Some of the firsts for parsing algorithms in this timeline are debatable, and the history of operator precedence is especially murky. Here I follow Samelson and Bauer 1960 in giving the priority to Boehm. ↩

18. Norvell 1999. ↩

19. Knuth and Pardo 1976, pp 35-42. ↩

20. Knuth and Pardo 1976, p. 50. ↩

21. The quoted definition is from Moser 1954, p. 15, as cited in Knuth and Pardo 1976, p. 51. ↩

22. In 1952, an interface that guided calls to subroutines was much more helpful than current programmers might imagine: "Existing programs for similar problems were unreadable and hence could not be adapted to new uses." (Backus 1980, p. 126) ↩

23. I hope nobody will read this terminological clarification as in any sense detracting from Hopper's achievement. Whatever it is called, Hopper's program was a major advance, both in terms of insight and execution, and her energetic followup did much to move forward the events in this timeline. Hopper has a big reputation and it is fully deserved. ↩

24. Kleene 1951. ↩

25. Knuth and Pardo 1976, p 42. ↩

26. Knuth and Pardo 1976, pp. 42-49. ↩

27. ACM 1954, p. 17. Emphasis in original. ↩

28. ACM 1954, p. 5. ↩

29. Interestingly, John Backus in Backus 1980, pp. 133-134, discusses the evolution of the meaning of the term "compiler", but does not mention serving on the ACM committee, or that committee's definition of "compiler". ↩

30. Knuth and Pardo 1976, p. 50. ↩

31. Harris 1993, pp 31-34, p. 37. ↩

32. Knuth and Pardo 1976, pp. 83-86. ↩

33. Perlis et al 1958. Knuth and Pardo 1976, p. 83, state that the IT compiler "was put into use in October, 1956." ↩

34. Knuth and Pardo 1976, p. 84. ↩

35. Knuth 1962. The 1956 date for IT is from Padua 2000. ↩

36. "[A] comparison of ONR 1954 and ONR 1956 proceedings makes it clear that the work 'compiler' had by now [1956] acquired its new [ACM committee] meaning": Knuth and Pardo 1976, p. 83. Here ONR is the Office of Naval Research. See also Backus 1980, pp. 133-134. ↩

37. Chomsky 1956. ↩

38. Chomsky seems to have been unaware of Post's work -- he does not cite it. ↩

39. Chomsky 1956, p. 114. In case there is any doubt Chomsky's "strings" are the same as Bloomfield's utterances, Chomsky also calls his strings, "utterances". For example, Chomsky 2002, p. 15: "Any grammar of a language will project the finite and somewhat accidental corpus of observed utterances to a set (presumably infinite) of grammatical utterances." ↩

40. Chomsky 1956, p. 118, p. 123. ↩

41. Chomsky 1957. ↩

42. Backus's notation is influenced by his study of Post -- he seems not to have read Chomsky until later. See Backus 2006, p. 25 and Backus 1980, p. 133. ↩

43. Backus 1959. ↩

44. Backus 1980, pp. 132-133. ↩

45. Samelson and Bauer 1960. ↩

46. Norvell 1999. ↩

47. Chomsky 1959. ↩

48. Chomsky 1959, Section VIII. In later years, Chomsky will make it clear that he never intended to avoid semantics:

[...] it would be absurd to develop a general syntactic theory without assigning an absolutely crucial role to semantic considerations, since obviously the necessity to support semantic interpretation is one of the primary requirements that the structures generated by the syntactic component of a grammar must meet. (Chomsky 1978, p. 20, in footnote 7 starting on p. 19).

↩

49. Oettinger 1960 American Mathematical Society, 1961. ↩

50. Oettinger 1960, p. 106. ↩

51. "The development of a theory of pushdown algorithms should hopefully lead to systematic techniques for generating algorithms satisfying given requirements to replace the ad hoc invention of each new algorithm." (Oettinger 1960, p. 127.) ↩

52. ALGOL 1960. ↩

53. The linguistics literature discuss goals, philosophy and motivations, sometimes to the detriment of actual research (Harris 1993). The Parsing Theory literature, on the other hand, avoids non-technical discussions, so that a "manifesto" listing the goals of Parsing Theory is hard to find. But the first two sentences of the pivotal Knuth 1965 may serve as one:

"There has been much recent interest in languages whose grammar is sufficiently simple that an efficient left-to-right parsing algorithm can be mechanically produced from the grammar. In this paper, we define LR(k) grammars, which are perhaps the most general ones of this type, [...]" (p. 607, boldface added.)

Here Knuth explicitly refers to goals of efficiency, declarativeness, and generality, announcing a trade-off of the first two against the second. Note Knuth also refers to "left-to-right" as a goal, but this he makes clear (pp. 607-608) that is because he sees it as a requirement for efficiency. ↩

54. Practical here is a catch-all term for those properties which Parsing Theory de-emphasized. As such, the literature rarely explicitly refers to practicality as a goal. And, in some Parsing Theory papers, it is not a goal, at least not directly. But I think it can be taken as given that, when the discussion was about parsers intended for actual use, practicality was a goal. ↩

55. ALGOL 1960. ↩

56. As a pedantic point, a general parser need only parse proper grammars. A BNF grammar is proper if it has no useless rules and no infinite loops. Neither of these have any practical use, so that the restriction to proper grammars is unproblematic. ↩

57. Glennie 1960. ↩

58. Glennie 1960. ↩

59. Irons 1961. Among those who state that Irons 1961 parser is what is now called left-corner is Knuth (Knuth 1971, p. 109). ↩

60. Although it seems likely that parsing operator expressions would require backtracking, and therefore could be inefficient. ↩

61. Irons is credited with the discovery of synthesized attributes by Knuth (Knuth 1990). ↩

62. Sakai 1961 ↩

63. The credited rediscoveries are Hayes 1962 (attributed to Cocke); Younger 1967; and Kasami and Torii 1969. ↩

64. Sakai 1961 does not give context-free grammars for its examples, so it is perhaps best called non-Chomskyan. On the other hand, Sakai's tables for adjacent pairs are restricted to booleans. ↩

65. Sakai's algorithm runs in what will come to be called cubic time. ↩

66. Dijkstra 1961. ↩

67. Norvell 1999. ↩

68. Lucas 1961. ↩

69. Grune and Jacobs 2008 call Lucas the discoverer of recursive descent. In fact, both Irons 1961 and Lucas 1961 are recursive descent with a major bottom-up element. Perhaps Grune and Jacobs based their decision on the Lucas' description of his algorithm, which talks about his parser's bottom-up element only briefly, while describing the top-down element in detail. Also, the Lucas algorithm resembles 2023 implementations of recursive descent much more closely than does Irons 1961.

In conversation, Irons described his 1961 parser as a kind of recursive descent. Peter Denning (Denning 1983) gives priority to Irons and, as a grateful former student of Irons, that is of course my preference on a personal basis. ↩

70. Lucas 1961 cites Samelson and Bauer 1960. ↩

71. But see the entry for 1973 on Pratt parsing (Pratt 1973) where the idea of parsing entire languages as operator grammars is revisited. ↩

72. Schorre 1964, p. D1.3-1. ↩

73. This timeline is not a mathematics tutorial, and I have ignored important complications to avoid digression. For interested readers, the details of "big O" notation are worth learning: Wiki Big O. ↩

74. Because constants are ignored, all reasonable measures of the length are equivalent for big O notation. Some papers on parsing also consider the size of the grammar, but usually size of the grammar is regarded as a fixed constant. In this timeline all big O results will be in terms of the input length. ↩

75. Knuth 1965.. ↩

76. Oettinger 1960, p. 127. ↩

77. Knuth 1965, p. 607, in the abstract. ↩

78. Given the capacity of computer memories in 1965, LR(1) was clearly impractical. With the huge computer memories of 2023, that could be reconsidered, but even today LR(1) is rare in practical use. LR(1) shares the poor error-handling that the LR(k)-based parsers became known for. And, since LR(1) is still very limited in its power compared to LRR, it just does not seem to be worth it. ↩

79. In this document I will usually speak of a language intension as if it was a BNF grammar. Admittedly, this is a very simplified view of semantics. Treating semantics as reducible to the tagged trees produced by parsing a BNF grammar not only assumes that the language is pure context-free, it is not even adequate for most computer applications of context-free grammars. Most applications require at least some kind of post-processing.

But this is not a formal mathematical presentation. and for our purposes, we do not need to actually represent a semantics, just the interface to it. And the grammars we consider are almost always context-free. ↩

80. The empty set counts as an extension. ↩

81. As of 1965, the Irons parser has fallen out of favor and Sakai parsers are still being forgotten and rediscovered. ↩

82. The exception are applications which receive their input "on-line"; which can not determine the size of their input in advance; and which must return a result in a fixed amount of time. For this minority of applications, adding an end marker to their input is not possible. ↩

83. None of these problematic signs escaped Knuth. He discovers and proves them on pp. 630-636. But Knuth seemed to consider the LR hierarchy collapse a mathematical curiosity, and one with no implications for practical parsing. ↩

84. I deal with the implications of Knuth 1965 for parsing theory at greater length in these blog posts: Kegler Strings 2018, Kegler Solved 2018, and Kegler Undershoot 2018. ↩

85. Table-driven Sakai parsers have already been described in the parsing literature, but they continued to be rediscovered until 1969. That journal editors kept accepting descriptions of Sakai parsing as new research suggests that table parsers were seeing very little or no actual usage. ↩

86. In 1964, advocates of stack-based parsing would not have been seen themselves as settling for a "safe" or "traditional" solution. As recently as 1960, readers of technical journals were not expected to know what a stack was. ↩

87. Knuth 1965, p. 637. Knuth's own language is cautious, so it is not 100% clear that he believes that the future of practical parsing theory lies in the pursuit of LR(k) subsets. His program for further research (Knuth 1961, pp. 637-639) also suggests investigation of parsers for superclasses of LR(k). Indeed, Knuth shows (p. 638) that he is well aware that some grammars beyond LR(k) can be parsed in linear time. Knuth also is very much aware (p. 638) that it is an open question whether all context-free grammars can be parsed in linear time.

Knuth even describes a new superclass of his own: LR(k,t), which is LR(k) with more aggressive lookahead. But Knuth introduces LR(k,t) in dismissive terms: "Finally, we might mention another generalization of LR(k)" (Knuth 1965, p. 638); and "One might choose to call this left-to-right translation, although we had to back up a finite amount." (p. 639).

It is reasonable to suppose that Knuth is even more negative about the more general approaches that he does not bother to mention. Knuth's skepticism of more general Chomskyan approaches is also suggested by his own plans for his (not yet released) Chapter 12 of the Art of Computer Programming, in which he planned to use pre-Chomskyan bottom-up methods (Knuth 1990, p. 3).

In Knuth 1965 (pp. 607-608), Knuth had emphasized that proceeding strictly left-to-right is necessary for efficiency reasons. So subsequent researchers were probably correct in reading into Knuth a prediction that research into beyond-LR(k) parsing would be not be fruitful. Regardless of what Knuth himself believed, the consensus of the parsing theorists is not in doubt: interest in beyond-LR parsing will almost disappear after 1965. The theorist's attention will be focused almost exclusively on Knuth's suggestions for research within the stack-based model (p. 637). These included grammar rewrites; streamlining of the LR(k) tables; and research into LR(k) subclasses. ↩

88. Knuth also discusses some ambiguities, and some intermixing of semantics with syntax, in the ALGOL report. But Knuth (quite appropriately) notes that BNF was new when it was written, and treats these other issues as problems with the ALGOL specification. (See Knuth 1965, pp. 622-625.) ↩

89. Knuth 1965, p. 621. ↩

90. Lewis and Stearns 1968. They are credited in Rosenkrantz and Stearns 1970 and Aho and Ullman 1972, p. 368. ↩

91. Knuth 1965, p. 610. See also on p. 611 "corresponds with the intuitive notion of translation from left to right looking k characters ahead". ↩

92. Knuth 1971, p. 102. LL and LR have mirror images: RL means scan from the right, using left reductions and RR means scan from the right, using right reductions. Practical use of these mirror images is rare, but it may have occurred in one of the algorithms in our timeline -- operator expression parsing in the IT compiler seems to have been RL(2) with backtracking. ↩

93. Earley 1968. ↩

94. Knuth 1968. ↩

95. DeRemer 1969 ↩

96. Darwin 1984, Section 3.1, "An old editor made new". ↩

97. Aho and Ullman 1972. ↩

98. Aho and Ullman 1972, p 321. ↩

99. Aho and Ullman 1972, pp. 456-485. ↩

100. Aho and Ullman 1972, p. 466. ↩

101. Simms 2014. ↩

102. Čulik and Cohen 1973. Čulik and Cohen give an algorithm for parsing LRR, but the obstacles to implementation are daunting. (See Grune and Jacobs 2008, pp. 327-333 and Problem 9.28 on p. 341.) Their algorithm did not come into practical use, and as of 1991 LRR grammars could be handled more easily by Joop Leo's algorithm. ↩

103. Haskell list comprehension, for example, requires infinite lookahead -- see Kegler Haskell 2018. (Haskell's native parser deals with this in post-processing.) ↩

104. Pratt 1973. ↩

105. Norvell 1999. ↩

106. Pratt 1973, p. 51. ↩

107. Synder 1975. See, in particular, pp. 67-68. ↩

108. Aho and Ullman 1977. ↩

109. McIlroy and Kernighan 1979. ↩

110. Johnson 1975. ↩

111. See McIlroy 1987, p. 11, for a listing of yacc-powered languages in V7 Unix. ↩

112. Wiki Perl, Early versions section, https://en.wikipedia.org/w/index.php?title=Perl&oldid=837585549#Early_versions. ↩

113. Wadler 1990 ↩

114. Wadler 1985. ↩

115. Leo 1991. ↩

116. Wall 2014. ↩

117. Some sources consider Wadler 1985 an early presentation of the ideas of combinator parsing. In assigning priority here, I follow Grune and Jacobs 2008 (p. 564). ↩

118. The paper which is devoted to parsing is Hutton 1992. The other paper, which centers on combinators as a programming paradigm, is Frost 1992. Frost only mentions parsing in one paragraph, and that focuses on implementation issues. Some of Frost's grammars include operator expressions, but these avoid left recursion, and use parentheses to implement precedence and associativity. ↩

119. Hutton 1992, p. 5. ↩

120. Hutton 1992, pp. 19-20. ↩

121. Hutton and Meijer 1996. ↩

122. Hutton and Meijer 1996, p. 3. ↩

123. Hutton and Meijer 1996, p. 18. A simpler arithmetic expression grammar on p. 15 raises no issues not present in the more complicated example. ↩

124. Aycock and Horspool 2002. ↩

125. Ford 2004. See also Ford 2002. ↩

126. Mascarenhas et al 2014. My account of PEG is negative because almost all real-life PEG practice is unsafe, and almost all of the literature offers no cautions against unsafe use of PEG. Ford's PEG is efficient and does have specialized uses. Programmers interested in the safe use of PEG should consult Mascarenhas et al 2014. ↩

127. FSF 2006. ↩

128. Kegler 2010. ↩

129. In fact, Leo 1991, and therefore Marpa, is linear for a superset of the LRR grammars. It is not known (in fact it is not decidable, see Leo 1991, p. 175.) just how large the class of grammars that Marpa parses in linear time is.

It is known that there are both ambiguous and unambiguous grammars for which Leo 1991 is not linear. In the general case, Marpa and Leo 1991 obey the bounds of Earley's algorithm -- they are worst-case O(n**2) for unambiguous grammars; and worst-case O(n**3) for ambiguous grammars. On the other hand, Marpa follows Leo 1991 in being linear for many ambiguous grammars, including those of most practical interest. ↩

130. In the general case, ambiguity is undecidable, but in practice it is usually straight-forward for a programmer to determine that the grammar he wants is unambiguous. Note that while the general case is undecidable, Marpa will tell the programmer if a parse (a grammar with a particular input) is ambiguous. In fact, since Marpa can parse ambiguous grammars, the error message that Marpa produces shows where the ambiguity is. ↩

131. A marker, in this sense, is something in the input that shows where the middle of a middle recursion is, without having to go to the end and count back. Whether or not a grammar is unmarked is, in general, an undecidable problem. But in practice, there is a easy rule: if a person can eyeball the middle of a long middle recursion, and does not need to count from both ends to find it, the grammar is marked. ↩

132. This is a very pedantic requirement, which practical programmers can in fact ignore. Any programmer who has found and eliminated ambiguities in his grammar will usually, in the process, also have found and eliminated ambiguous right recursions, since it is easier to eliminate them than to determine if they create a overall ambiguity. In fact, it is an open question whether there are any unambiguous grammars with ambiguous right recursions -- neither Joop Leo or I know of any. ↩

133. Kegler 2012a. ↩

134. Internally, Marpa rewrites precedenced statements into BNF. Marpa's rewrite does not use middle recursions and does not introduce any new ambiguities so that, following Marpa's linearity rules, the result must be linear. ↩

135. In this, precedence goes from tightest to loosest. Single bars (|) separate RHS's of equal precedence. Double bars (||) separate RHS's of different precedence. Associativity defaults to left, and exceptions are indicated by the value of the assoc adverb. Note that these precedenced statements, like all of Marpa's DSL, are parsed by Marpa itself. The statement shown has the semantic adverbs removed for clarity. The original is part of the Marpa::R2 test suite. It can also be found in the "Synopsis" of the "Scanless::DSL" document on Marpa::R2 MetaCPAN: https://metacpan.org/pod/release/JKEGL/Marpa-R2-4.000000/pod/Scanless/DSL.pod#Synopsis, permalink accessed 4 September 2018. ↩

136. Marpa::R2. ↩