Mon, 14 May 2018
Parsers and Useful Power
What makes a parser useful?
In this post I will look at
one aspect of parser usefulness,
in light of an episode in
the history of parsing.
The first paper
fully describing a parser was Irons 1961.
The Irons parser was what is called "general",
meaning that it can parse all of
the "context-free grammars".
That makes it
far more powerful than most parsers
in practical use today.
But the Irons algorithm was not always fast in the general case.
Irons 1961 used backtracking
to achieve its power,
so it would go exponential for many useful grammars.
Among the grammars Irons 1961 could not parse quickly
were those containing the all-important arithmetic expressions.
Irons 1961 gave way to recursive descent.
Recursive descent (RD), in its pure form,
could not parse arithmetic expressions at all,
but it could be customized with procedural code.
That is, it could call specialized parsers which were
reliably fast for specific sections of the input.
The Irons parser was declarative,
and not easy to customize.
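To make the idea of procedural customization concrete, here is a minimal sketch, entirely my own (all names are mine, and precedence climbing is just one of several specialized techniques that fit the description): a recursive descent parser for a toy assignment statement which hands arithmetic expressions off to a fast, specialized expression subparser.

```python
# Sketch: recursive descent with a specialized subparser for expressions.
# Hypothetical example -- not from the post; names are my own.
import re

def tokenize(src):
    # Numbers, identifiers, and single-character operators/punctuation.
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()=;]", src)

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    # Plain recursive descent for a toy "ident = expr ;" statement ...
    def statement(self):
        name = self.next()
        assert self.next() == "="
        value = self.expression(0)  # ... which calls the specialized subparser
        assert self.next() == ";"
        return ("assign", name, value)

    # Precedence climbing: reliably fast, and it handles the arithmetic
    # that pure recursive descent (with a left-recursive grammar) cannot.
    def expression(self, min_prec):
        left = self.primary()
        while self.peek() in PRECEDENCE and PRECEDENCE[self.peek()] >= min_prec:
            op = self.next()
            right = self.expression(PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left

    def primary(self):
        tok = self.next()
        if tok == "(":
            inner = self.expression(0)
            assert self.next() == ")"
            return inner
        return tok

print(Parser(tokenize("x = 1 + 2 * 3;")).statement())
# -> ('assign', 'x', ('+', '1', ('*', '2', '3')))
```

The statement-level parser stays declarative-looking recursive descent, while the expression subparser runs in linear time and respects operator precedence.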
Raw power versus useful power
The contest between Irons parsing and recursive descent took place
before the theory for analyzing algorithms was fully formed.
In retrospect, we can say that,
except for specialized uses,
an acceptable parser
must be linear or quasi-linear.
The "useful power" of a parser is the class
of grammars that it will parse in quasi-linear time.
Useful power turns out to be more important
than raw power.
Recursive descent won out over the
Irons algorithm because,
while the Irons algorithm had vastly more raw power,
RD had slightly more "useful power".
It is nice to have raw power as well -- it means an algorithm can take on some specialized tasks.
And raw power provides a kind of "soft failure" debugging mode for grammars with,
for example, unintended ambiguities.
But, in the eyes of the programming community, the more important measure of a parser
is its useful power -- the class of grammars that it will parse at quasi-linear speed.
Stating the obvious?
That useful power is more important than raw power may seem obvious.
But in fact, it remains a live issue.
In practice raw power and useful power are often confused.
The parsing literature is not always as helpful as it could be:
it can be hard to determine what the
useful power of an algorithm is.
And the Irons experiment with raw power is often repeated,
in hopes of a different result.
The new algorithm is a hybrid of two others:
an algorithm with a lot of raw power,
but which can go quadratic or worse;
and a fast algorithm which lacks power.
When the power of the fast algorithm fails,
the hybrid algorithm switches over
to the algorithm with raw power.
It is a sort of
cross-breeding of algorithms.
The hope is that the hybrid algorithm has the best
features of each of its parents.
This works a lot better in botany than it does in parsing.
Once you have a successful cross in a plant,
you can breed from the successful hybrid
and expect good things to happen.
The individual crosses can have an extremely high failure rate,
and cross-breeding can still succeed.
But it's different when you cross algorithms:
Even after you've succeeded with one parse,
the next parse from your hybrid is a fresh new toss of the dice.
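The switch-over scheme described above can be sketched in a few lines. This is a hypothetical illustration (all names and the toy balanced-paren "grammars" are mine): try the fast-but-limited parser first, and fall back to the powerful one when it fails.

```python
# Sketch of a hybrid parser: fast algorithm first, raw power as fallback.
# Hypothetical example; real hybrids are far more elaborate.

class ParseError(Exception):
    pass

def fast_parse(text):
    # Stand-in for a linear-time parser of limited power:
    # it accepts only balanced parens with no nesting.
    depth = 0
    for ch in text:
        if ch == "(":
            if depth == 1:
                raise ParseError("nesting is beyond this parser's power")
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ParseError("unbalanced")
            depth -= 1
    if depth:
        raise ParseError("unbalanced")
    return ("fast", text)

def general_parse(text):
    # Stand-in for a general algorithm: handles any nesting depth,
    # but (in the real case) may go quadratic or worse.
    depth = 0
    for ch in text:
        depth += ch == "("
        depth -= ch == ")"
        if depth < 0:
            raise ParseError("unbalanced")
    if depth:
        raise ParseError("unbalanced")
    return ("general", text)

def hybrid_parse(text):
    try:
        return fast_parse(text)
    except ParseError:
        # Correctness is preserved, but the speed guarantee is lost --
        # which is exactly the "toss of the dice" problem.
        return general_parse(text)

print(hybrid_parse("(a)(b)"))  # handled by the fast parser
print(hybrid_parse("((a))"))   # falls back to the general parser
```

Note that each input decides independently which parent algorithm runs, so a grammar whose inputs routinely trigger the fallback gets none of the fast parent's speed guarantee.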
References, comments, etc.
To learn about
my own parsing project,
there is the
semi-official web site, maintained by Ron Savage.
The official, but more limited, Marpa website
is my personal one.
Comments on this post can be made in
Marpa's Google group,
or on our IRC channel: #marpa at freenode.net.