Tue, 10 Mar 2015
Linear? Yeah right.
I have claimed that my new parser,
is linear for vast classes of grammars,
going well beyond what the traditional parsers can do.
But skepticism is justified.
When it comes to parsing algorithms,
there have been a lot of time complexity claims
that are hand-wavy, misleading or just
This post describes how someone,
is exercising the appropriate degree of skepticism,
might conclude that believing Marpa's claims is a reasonable
and prudent thing to do.
Marpa's linearity claims seem to be,
in comparison with the other parsers in practical use
Marpa claims linearity,
not just for every class of grammar for which
yacc/bison, PEG and recursive descent currently claim linearity,
but for considerably more.
(The mathematical details of these claims
are in a
section at the end.)
It seems too good to be true.
Why should I believe you?
The most important thing to realize,
in assessing the believability of Marpa's time complexity claims,
is that they are not new.
They were already proved in a long-accepted paper in the refereed literature.
They are the time complexity claims proved by Joop Leo for his algorithm
in 1991, over two decades ago.
Marpa is derived from Leo's algorithm, and its time complexity claims
are those proved for Leo's algorithm.
Above I said that Marpa's time complexity claims "seem" bold.
On any objective assessment, they are in fact a bit of a yawn.
surprising only because a lot of people
are unaware of Leo's results.
That is, they are surprising in the same sense that someone who
had avoided hearing about radio waves would be surprised
to learn that he can
with someone on the other side of the world.
So, if there's so little to prove, why does the
In Marpa, I made many implementation decisions about,
and some changes to, the Leo/Earley algorithm.
None of my changes produced better time complexity results --
my only claim is that I did not change the Leo/Earley
algorithm in a way that slowed it down.
To convince myself of this claim, I reworked
the original proofs of
Leo and Earley,
changing them to reflect my changes,
and demonstrated that the results
that Leo had obtained still held.
Proofs of this kind,
which introduce no new mathematical techniques,
but simply take a previous result and march from
here to there by well-know means,
are called "tedious".
In journals, where there's a need to conserve space,
they are usually omitted,
as is the case with Marpa's time complexity proofs,
the results are intuitively quite plausible.
Getting from plausible to near-certain
So let's say you are not going to work through every line
of Marpa's admittedly tedious proofs.
We've seen that the results are intuitively plausible,
as long as you don't reject the previous literature.
But can we do better than merely "plausible"?
As an aside, many people misunderstand the phrase
"mathematically proven", especially as it applies to branches
of math like parsing theory.
The fact is that proofs in papers often contain errors.
Usually these are minor,
and don't affect the result.
On the other hand, Jay Earley's paper,
while one of the best Computer Science papers ever published,
also contained a very
And this error slipped past his Ph.D. committee and
Mathematical arguments and proofs do not allow us to achieve
They can only improve the degree of certainty.
There's a second way to dramatically increase
your degree of conviction
in Marpa's linearity claims, and it is quite simple.
Create examples of problematic grammars,
run them and time them.
This is not as satisfying as a mathematical proof,
because no set of test grammars can be exhaustive.
But if you can't find a counter-example
to Marpa's linearity claims among the grammars of
most interest to you,
that should help lift
your level of certainty to "certain for
all practical purposes".
Much of this increase in certainty can be
achieved without bothering to run your own tests.
Marpa is in wide use at this point.
If Marpa was going quadratic on grammars
for which it claimed to be linear,
and these were grammars of practical interest,
that would very likely have been noticed by now.
I'm still not convinced
Let's suppose all this has not brought you to
the level of certainty you need to use Marpa.
That means the reasonable thing is to continue to
struggle to work with the restrictions of the
traditional algorithms, right?
No, absolutely not.
OK, so you don't believe that Marpa preserves
the advances in power and speed made by Leo.
Does that mean that parsers have to stay underpowered?
No, it simply means that there should be a
more direct implementation of Leo's 1991,
But if you are looking for an implementation of Leo's
I think you may end up coming back to Marpa as the most
Marpa's additional features
include the ability to use custom,
as you can with recursive descent.
And Marpa has worked out a lot of the
implementation details for you.
Comments on this post can be made in
Marpa's Google group,
or on our IRC channel: #marpa at freenode.net.
To learn more about Marpa,
official web site maintained by Ron Savage.
I also have
a Marpa web site.
Appendix: Some technical details
I talked about algorithms, classes of grammars and their
I didn't give details because most folks aren't interested.
For those who are, they are in this section.
yacc is linear for a grammar class called LALR,
which is a subset of another grammar class
If you are willing to hassle with GLR,
bison claims linearity for all of LR(1).
Recursive descent is a technique, not an algorithm,
but it is top-down with look-ahead,
and therefore can be seen as some form of LL(k),
where k depends on how it is implemented.
In practice, I suspect k is never much bigger than 3,
and usually pretty close to 1.
PEG can be made linear for everything it
parses but there is a catch --
only in limited cases do you know
what language your PEG grammar actually parses.
In current practice, that means your PEG grammar
must be LL(1).
Some of the PEG literature looks at techniques for
extending this as far as LL-regular, but there are no
implementations, and it remains to be seen if the
algorithms described are practical.
contains a proof,
based on a proof of the same claim by
that Marpa is linear for LR-regular grammars.
The LR-regular grammars
include the LR(k) grammars for every k.
So Marpa is linear for LR(1), LR(2), LR(8675309), etc.
LR-regular also includes LL-regular.
So every class of grammar under discussion
in the PEG literature is
already parsed in linear time by Marpa.
it is also safe to conclude that,
if a grammar can be parsed by
anything reasonably described as recursive descent,
it can be parsed in linear time by Marpa.
posted at: 22:01 |
direct link to this entry