Sun, 27 Nov 2011
Marpa and the Ruby Slippers
a previous post,
I listed the four ideas
that are essential to
This post delves into
one of them: Ruby Slippers parsing.
In Ruby Slippers parsing, the parser imagines
that the language it is parsing
to parse than it actually is.
The part of the application that handles input
manipulates the input
to make the parser's
"wishes" come true.
As an example,
take liberal HTML.
"Liberal HTML" is HTML as it is found "in the wild",
with missing and spurious tags.
I've written a
Marpa-powered liberal HTML parser
which uses the Ruby Slippers as its primary
The grammar behind
assumes a fantasy world,
one where no element ever occurs out of place,
and where all HTML elements have both start and end tags.
With Marpa as the parse engine, it is easy for
the lexer to make wishes come
All the lexer needs to do is wait until the parser is not happy
with the input.
When the parser sees the actual input as a problem,
the lexer asks the parser what it would like to see instead.
Marpa always knows exactly what it is looking for,
so that it is no problem for the lexer
to invent an input that makes the parser happy.
This technique sounds a bit magical, which
is why I named it "the Ruby Slippers".
But the idea is simple enough,
and the need for it great enough,
that it has occurred to others over the
now 50 year history of parsing techniques.
Perl lexer invents input
to simplify the Perl language into something
that its LALR-based parse engine can handle.
Commenters on a previous post
have mentioned other instances.
where the Ruby Slippers were used
can be found going back to the 60's.
But previous use of the Ruby Slippers was difficult,
and had to be rudimentary and infrequent.
The parsers in standard use did not provide
Recursive descent takes a worm's eye point of view of the
parse -- it knows only where it is trying to burrow at that
When it came to feedback,
LALR-based parsers were particularly bad --
they were based on
a very abstract state machine.
The Perl lexer,
for its use of the Ruby Slippers,
didn't even try looking at the LALR state --
instead the Perl lexer
duplicated the LALR parser's work
for the section of the parse that was of interest.
If the Perl lexer tried to make frequent use of the Ruby Slippers
it would raise a question:
Why bother with the LALR parser at all?
To really empower the Ruby Slippers, a parser needs to
do two things.
First, the parser must know what it wants.
Second, the parser must know this in time to help guide the input.
Earley's algorithm met the first requirement from the beginning.
Earley's algorithm works by creating "Earley sets".
The original Earley sets were lists of rule applications,
one list for every token in the input.
A rule application was in the Earley set if and only if it
was a real
The second requirement was not met in the original Earley's
For the second requirement to be met,
all work on an Earley set at a parse position
must be finished before
the tokens that start at that position
Pre-Marpa, Earley parse engines
intermixed work on a location's Earley set,
with reading the input for that location.
That meant that the information to guide Ruby Slippers parsing
became available until just AFTER it was needed.
In creating Marpa,
to combine other researcher's improvements to Earley's into a single
Merging their parse engines into a new one
forced me to write my own, new, parse engine.
In the course of this,
I saw that Earley's Algorithm could be rearranged
at every location in the parse,
each list of possible rule applications was finished before
starting at that location was read.
To make this happen,
Marpa divides the single loop of previous Earley parse engines
into two loops,
and reverses their order.
In other words,
Marpa's parse engine turns the original Earley parse engine
inside-out and upside-down.
The result is provably equivalent to the original,
and just as fast.
At the end
of the Wizard of Oz movie,
Dorothy tells Glenda she wants to get back to Kansas.
Glenda points down to Dorothy's Ruby Slippers and explains
adding that Dorothy could have gone to Kansas anytime she liked.
The Ruby Slippers have always been implicit in Earley's Algorithm.
We've just had to realize they were there.
Possiblity is defined the way you'd hope it would
be: a rule application is "possible" at any point,
if and only if,
given the input up to that point,
there is some input from that point on,
that produces a parse
which includes that rule application.
"anytime she liked":
As a child I wondered if Glenda was not being a bit tactless.
posted at: 15:48 |
direct link to this entry