Sun, 20 Dec 2015
Top-down parsing is guessing
Top-down parsing is guessing. Literally.
Bottom-up parsing is looking.
The way you'll often hear that phrased is that top-down parsing is
looking, starting at the top,
and bottom-up parsing is looking, starting at the bottom.
But that is misleading, because the input is at the bottom --
at the top there is nothing to look at.
A usable top-down parser
have a bottom-up component,
even if that component is just lookahead.
A more generous, but still accurate, way to describe the top-down
component of parsers is "prediction".
And prediction is, indeed, a very useful component of a parser,
when used in combination with other techniques.
Of course, if a parser does nothing but predict,
it can predict only one input.
Top-down parsing must always be combined with a bottom-up
This bottom-up component may be as modest as lookahead, but it
be there or else top-down parsing is really not parsing at all.
So why is top-down parsing used so much?
Top-down parsing may be unusable in its pure form,
but from one point of view that is irrelevant.
Top-down parsing's biggest advantage is that it is highly flexible --
there's no reason to stick to its "pure" form.
A top-down parser can be written as a series of subroutine calls --
a technique called recursive descent.
allows you to hook in custom-written bottom-up logic at every
top-down choice point,
and it is a technique which is
completely understandable to programmers with little or no training
in parsing theory.
When dealing with recursive descent parsers,
it is more useful to be a seasoned, far-thinking programmer
than it is to be a mathematician.
This makes recursive descent very appealing to
seasoned, far-thinking programmers,
and they are the audience that counts.
You can even use the flexibility of top-down to switch
away from top-down parsing.
For example, you could claim that a top-down parser could do anything my
could do, because a recursive descent parser can call
a Marpa parser.
A less dramatic switchoff, and one that still leaves the parser with a good claim to be
basically top-down, is very common.
Arithmetic expressions are essential for a computer language.
But they are also
among the many things
top-down parsing cannot handle, even with ordinary lookahead.
Even so, most computer languages these days are parsed top-down --
by recursive descent.
These recursive descent parsers deal with expressions
by temporarily handing control over to an bottom-up operator
Neither of these parsers is
extremely smart about the hand-over and hand-back
-- it is up to the programmer to make sure the two play together nicely.
But used with caution, this approach works.
Top-down parsing and language-oriented programming
But what about taking top-down methods into the future of language-oriented programming,
extensible languages, and grammars which write grammars?
Here we are forced to confront the reality -- that the effectiveness of
top-down parsing comes entirely from the foreign elements that are added to it.
Starting from a basis of top-down parsing is literally starting
As I have shown in more detail
top-down techniques simply do not have enough horsepower to deal with grammar-driven programming.
Perl 6 grammars are top-down -- PEG with lots of extensions.
include backtracking, backtracking control,
a new style of tie-breaking and lots of opportunity
for the programmer to intervene and customize everything.
But behind it all is a top-down parse engine.
One aspect of Perl 6 grammars might be seen as breaking
out of the top-down trap.
That trick of switching over to a
bottom-up operator precedence parser for expressions,
which I mentioned above,
is built into Perl 6 and semi-automated.
(I say semi-automated because making sure the two parsers "play nice"
with each other is not automated -- that's still up to the programmer.)
As far as I know, this semi-automation of expression handling
is new with Perl 6 grammars, and it
may prove handy for duplicating what is done
in recursive descent parsers.
But it adds no new technique to those already in use.
- mulitple types of expression, which can be told apart
based on their context,
- n-ary expressions for arbitrary
- the autogeneration of multiple rules, each allowing
a different precedence scheme,
for expressions of arbitrary arity and associativity,
all of which are available and in current use in
are impossible for the technology behind Perl 6 grammars.
I am a fan of the Perl 6 effort.
Obviously, I have doubts about one specific set of hopes for Perl 6 grammars.
But these hopes have not been central to the Perl 6 effort,
and I will be an eager student of the Perl 6 team's work over the coming months.
To learn more about Marpa,
official web site maintained by Ron Savage.
I also have
a Marpa web site.
Comments on this post can be made in
Marpa's Google group,
or on our IRC channel: #marpa at freenode.net.
posted at: 18:21 |
direct link to this entry