Mon, 28 May 2018
Is a language just a set of strings?
But to my mind, though I am native here
And to the manner born, it is a custom
More honor’d in the breach than the observance.
Chomsky's "Three Models" paper
Important papers produce important mistakes.
A paper can contain a great many errors,
and they will have no effect if the paper is ignored.
On the other hand,
even the good methods of
a great paper can go badly wrong
when its methods
outlive the reasons for using them.
Chomsky's "Three Models" paper
is about as influential
as a paper can get.
Just 12 pages,
it's the paper in which the most-cited scholar of our
time first outlined his ideas.
Even at the time,
linguists recognized its transformative effect on their field.
Bringing new rigor into what had been seen as a "soft"
science, it turned lots of heads outside linguistics.
It belongs on anyone's list of the most important scientific papers ever.
Given its significance,
it is almost incidental that
"Three models" is also the foundation paper of computer Parsing Theory,
the subject of these blog posts.
Chomsky does not consider himself a computer scientist
and, after founding our field,
has paid little attention to it.
But in fact,
the Chomskyan model has been even more dominant
in computer parsing than in Chomsky's own
field of linguistics.
"Three Models" places Chomksy among the great mathematicians
of all time.
True, the elegance and rigor of Chomsky's proofs
better befit a slumming linguist
than they would a professional mathematician.
But at its heart,
mathematics is not a technical field,
or even about problem-solving --
at its most fundamental,
it is about framing problems so that they
can be solved.
And Chomsky's skill at framing problems is astonishing.
A brilliant simplification
Chomsky had a new approach to linguistics,
and wanted to prove that his approach to language
did things that
the previous approach,
based on finite-state models,
could not do.
("Finite-state" models, also known as Markov chains,
are the predecessors of the regular expressions
in widespread use today.)
Chomsky sets out to do this with an extremely minimal definition of what
a language is.
By a language then, we shall mean a set (finite or infinite) of
sentences, each of finite length, all constructed from a finite
alphabet of symbols. If A is an alphabet, we shall say that
anything formed by concatenating the symbols of A is a string in
A. By a grammar of the language L we mean a device of some sort that
produces all of the strings that are sentences of L and only these.
Yes, you read that right --
Chomsky uses a definition of language which has nothing to
do with language actually meaning anything.
A language, for the purposes of the math in "Three Models",
is nothing but a list of strings.
Similarly, a grammar is just something that enumerates
the strings of a language.
The grammar does not have to provide any clue as to what
the strings might mean.
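Chomsky's definition is easy to make concrete. As a sketch (the code and the toy language here are mine, not the paper's), a language is literally a set of strings, and a "grammar" is any device that produces exactly that set:

```python
# Under the minimal definition, a language is just a set of strings,
# and a grammar is any device that produces exactly those strings.
# Toy example: the language of one or more 'a's followed by a period.

def grammar(max_length):
    """Enumerate every sentence of the toy language up to a bound."""
    return {"a" * n + "." for n in range(1, max_length)}

# The set says nothing about what any sentence *means* -- and under
# this definition, it does not have to.
print(sorted(grammar(4), key=len))  # ['a.', 'aa.', 'aaa.']
```

Nothing in the set, or in the device that enumerates it, offers any clue to meaning.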
For example, Chomsky would require of a French grammar that one of the
strings that it lists be
(42) Ceci n'est pas une phrase vraie.
But for the purposes of his demonstration,
Chomsky does not require of his "grammar" that it
give us any guidance as to what sentence (42) might mean.
Chomsky shows that there are English sentences that his "grammar" would list,
and which a finite-state "grammar" would not list.
Clearly if the finite-state grammar cannot even produce a sentence
as one of a list,
it is not adequate as a model of that language,
at least as far as that sentence goes.
Chomsky shows that there is
a large, significant class of sentences that
his "grammars" can list,
but which the finite-state grammars cannot list.
Chomsky presents this as
very strong evidence that
his grammars will make better models
than finite-state grammars can.
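The textbook illustration of such a class (my example, not a quotation from "Three Models") is the language of balanced nesting, a^n b^n: n a's followed by exactly n b's. A context-free grammar lists it easily, but no finite-state device can, because deciding membership requires unbounded counting:

```python
# The language { a^n b^n : n >= 1 } is the classic example of a language
# that a context-free grammar can list but no finite-state machine can:
# a machine with finitely many states cannot remember an arbitrary
# count of a's.  A recognizer with an unbounded counter has no trouble:

def in_language(s):
    """True iff s is a^n b^n for some n >= 1."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n

print(in_language("aaabbb"))   # True
print(in_language("aaabb"))    # False: the counts don't match
```

The pumping lemma for regular languages makes the impossibility precise; nested structures in natural language (center embedding) play the same role in Chomsky's argument.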
In addition to simplifying the math, Chomsky has two other good
reasons to avoid dealing with meaning.
A second reason is that
semantics is a treacherously dangerous field of study.
If you can make your point
without dragging in semantics,
you are crazy to do otherwise.
Sentence (42), above,
is just one example of the pitfalls
that await those who tackle semantic issues.
It plays on a famous Magritte painting,
and translates as "This is not a true sentence".
A third reason is that most linguists of Chomsky's time
followed Leonard Bloomfield in excluding meaning from their study of language.
Bloomfield defined language as follows:
The totality of utterances that can be made in a speech
community is the language
of that speech-community.
Bloomfield says "totality" instead of "set"
and "utterances" instead of "strings",
but for our purposes in this post the idea is the same --
the definition is without regard to the meaning
of the members of the set.
Bloomfield's omission of semantics is not accidental.
Bloomfield wanted to establish linguistics as a
science, and for Bloomfield
claiming to know the meaning of
a sentence was dangerously close to
claiming to be able to read minds.
You cannot base your work on mind-reading and expect people to
believe that you are doing science.
Bloomfield therefore suggested avoiding,
totally if possible,
any discussion of semantics.
Most readers of Chomsky's paper in 1956 were Bloomfieldians --
Chomsky had studied under a Bloomfieldian,
and originally was seen as one.
By excluding semantics from his own model of language,
Chomsky was making his paper maximally acceptable to
his Bloomfieldian readers.
Semantics sneaks back in
But you did not have to read Chomsky's mind,
or predict the future,
to see that Chomsky
was a lot more interested in semantics than
the Bloomfieldians were.
Already in "Three Models",
he is suggesting that his model is superior to
the finite-state models
because his model,
when an utterance is ambiguous,
produces multiple derivations to reflect that.
Even better, these multiple derivations "look" like natural representations
of the difference between meanings.
These insights,
which dropped effortlessly out of Chomsky's grammars,
were well beyond what the finite-state models were providing.
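As a sketch of the idea (my code and example, not Chomsky's), consider the deliberately ambiguous grammar E ::= E op E | number. The string 2 + 3 * 4 gets two derivations, and each parse tree "looks like" one of the two possible readings:

```python
# An ambiguous grammar assigns more than one derivation to the same
# string.  For the grammar  E ::= E op E | number , the string
# 2 + 3 * 4 has two parse trees, one per reading.

def parses(tokens):
    """Enumerate every parse tree of tokens under the ambiguous grammar."""
    if len(tokens) == 1:
        return [tokens[0]]                 # E ::= number
    trees = []
    for i in range(1, len(tokens), 2):     # try each operator as the root
        for left in parses(tokens[:i]):
            for right in parses(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees

for tree in parses(["2", "+", "3", "*", "4"]):
    print(tree)
# ('+', '2', ('*', '3', '4'))  -- reads as 2 + (3 * 4)
# ('*', ('+', '2', '3'), '4')  -- reads as (2 + 3) * 4
```

A finite-state model, which merely accepts or rejects strings, has no natural way to exhibit the two readings.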
Chomsky's argument that his grammars were better
because they could list more sentences
might have carried the day on its own.
With the demonstration that his grammars could not only
list sentences,
but also provide insight into the structure and semantics
of those sentences,
Chomsky's case was compelling.
Young linguists wanted theoretical tools with this
kind of power and those few
older linguists not convinced struggled to find
reasons why the young linguists could not
have what they wanted.
In later years,
Chomsky made it quite clear what his position was:
[...] it would be absurd to develop
a general syntactic theory
without assigning an absolutely
crucial role to semantic considerations,
since obviously the necessity to support
semantic interpretation is one of the primary
requirements that the structures
generated by the syntactic component of a grammar
must meet.
Compare this to Bloomfield:
The statement of meanings is therefore the weak point in
language-study, and will remain so until human knowledge
advances very far beyond its present state. In practice, we define the
meaning of a linguistic form, wherever we can, in terms of some
other science.
It is easy to see why linguists found Chomsky's
expansion of their horizons irresistible.
Given the immense prestige of "Three models",
it is unsurprising that it was closely studied by
the pioneers of parsing theory.
Unfortunately, what they picked up was not
Chomsky's emphasis on the overriding importance
of semantics,
but the narrow definition of language
that Chomsky had adopted from Bloomfield
for tactical purposes.
In the classic Aho and Ullman 1972 textbook, we have
A language over an alphabet Σ
is a set of strings over an alphabet Σ.
This definition encompasses almost everyone's notion of a language.
If this "encompasses" my notion of a language,
it does so only in the sense that an avalanche encompasses
From 1988, thirty years after Chomsky,
here is another authoritative textbook of Parsing Theory:
A set V is an alphabet (or a vocabulary) if it is finite and
nonempty. The elements of an alphabet V are called the symbols (or
letters or characters) of V.
A language L over V is any subset of the free monoid V*. The elements
of a language L are called sentences of L.
The language is now that of abstract algebra,
but the idea is the same -- pure Bloomfield.
Interesting, you might be saying, that
some textbook definitions are not everything they could be,
but is there any effect on the daily practice of
programming?
The languages human beings use with each other
are varied, flexible and endlessly retargetable.
The parsers we use to communicate with computers
are restrictive, repetitive in form,
difficult to reprogram,
and prohibitively hard to retarget.
Is this because humans have a preternatural language ability?
Or is there something wrong with the way we
go about talking to computers?
How the Theory of Parsing literature defines the term
"language" may seem
of only pedantic interest.
But I will argue that it is a mistake which has everything
to do with the limits of modern computer languages.
What is the problem with defining a language as a set of strings?
Here is one example of how the textbook definition
affects daily practice.
Call one grammar SENSE:
SENSE ::= E
E ::= E + T
E ::= T
T ::= T * P
T ::= P
P ::= number
And call another grammar STRING:
STRING ::= E
E ::= P OP E
E ::= P
OP ::= '*'
OP ::= '+'
P ::= number
If you define a language as a set of strings,
SENSE and STRING recognize the same language.
But it's a very different story if you
take the intended meaning to be
that of traditional arithmetic expressions,
and compare the meanings of the two grammars.
SENSE recognizes the associativity and precedence of the two operators --
the parse tree it produces could be used directly to evaluate an arithmetic
expression and the answer would always be correct.
The parse tree that STRING produces, if evaluated directly,
will very often produce a wrong answer -- it does not capture
the structure of an arithmetic expression.
In order to produce correct results,
the output of STRING could be put through a second phase,
but that is the point --
STRING left crucial parts of the job of parsing undone,
and either some other logic does the job STRING did not do,
or a wrong answer results.
It is much easier to write a parser for STRING
than it is for SENSE.
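The difference in meaning is easy to see by evaluating the two kinds of parse tree directly. A sketch in Python (my code, not from any of the texts cited here), using 2 * 3 + 4 as the input:

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(tree):
    """Evaluate a parse tree: either a number, or (op, left, right)."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left), evaluate(right))
    return tree

# SENSE's tree for  2 * 3 + 4  respects precedence:  (2 * 3) + 4
sense_tree = ("+", ("*", 2, 3), 4)

# STRING's tree is blindly right-associated:         2 * (3 + 4)
string_tree = ("*", 2, ("+", 3, 4))

print(evaluate(sense_tree))    # 10 -- correct
print(evaluate(string_tree))   # 14 -- wrong
```

SENSE's tree applies the multiplication first because its structure says to; STRING's right-recursive tree ignores precedence, so evaluating it directly gives the wrong answer.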
Encouraged by a theory that minimizes the
importance of structure and meaning,
many implementations attempt to make do with STRING.
But that is not the worst of it.
The idea that a language is a set of strings
has guided research,
and steered it away from the most promising lines.
How, I hope to explain in the future.
The background material for this post is in my
Parsing: a timeline 3.0,
and this post may be considered a supplement to "Timeline".
To learn about Marpa,
my Earley/Leo-based parsing project,
there is the
semi-official web site, maintained by Ron Savage.
The official, but more limited, Marpa website
is my personal one.
Comments on this post can be made in
Marpa's Google group,
or on our IRC channel: #marpa at freenode.net.
posted at: 19:59