Thu, 12 Apr 2012
Marpa v. Parse::RecDescent: some numbers
a recent blog post,
Flavio Poletti described an unusual language,
one which he'd written a parser for,
using the very traditional method of
Specifically he'd used Perl's Parse::RecDescent
Several people asked me how
would do on this language.
This post contains
a comparison, including a benchmark.
The reader who wants an exact description should
Flavio's post and code
I call it a Dyck-Hollerith language because it
(strings preceded by a count),
with balanced parentheses
(what is called
a Dyck language
Here's Flavio's example:
When I did
a Marpa versus Perl regexes comparison,
I was very careful to choose an application which showed Marpa in
a good light.
Here I did not pick the application,
and it is almost as if it was designed
to show recursive descent in a good light.
Recursive descent tries to parse by treating each rule
as a subroutine call.
This is a very natural way for a programmer
to look at a grammar.
But, alas, it only works in certain cases,
and most of those are already handled by regular expressions.
However, Dyck languages, including balanced parentheses,
are one of the boundary cases:
languages beyond the abilities of regular expressions,
but within the capabilities of recursive descent.
Hollerith constants highlight the other strength of
Since recursive descent's basic idea is so intuitive, it is easy
to add custom hacks to it.
And Hollerith constants are beyond the capabilities of even
general BNF parsing.
Earley's, Marpa, CYK, GLR, you name it,
none of them can handle Hollerith constants without
some kind of custom hack.
You might say that Hollerith constants force a parser
to wrestle in the mud.
And when it comes to wrestling in the mud,
recursive descent is hard to beat.
For Parse::RecDescent, the code used
in the benchmark is Flavio's,
modifications such as the
one that allows inputs of varying lengths.
The Marpa::XS parser
used in this benchmark is
on github as a gist.
Since Marpa::XS has its parse engine in optimized C,
while Parse::RecDescent is pure Perl,
one would hope that Marpa::XS would put better numbers
on the board,
and it did.
Marpa::XS has some startup overhead, but settles in
at roughly 10 times faster than the Parse::RecDescent
To test inputs of varying lengths, I took the example above and created
arrays of it in the Dyck-Hollerith language.
The reported length is the length of the top-level array.
The example is 42 characters long, so a reported length of 1000 means
the input string is 6+1000*42+1=42007 characters long.
(The 6 is the length of the "A1000(" prefix of the enclosing
array and the 1 is for
its closing parenthesis.)
Benchmarks are like traffic accidents.
They have a draw and an excitement,
one that the wiser and
nobler sort know is improper and to be resisted.
But you'll often catch even them slowing down to look.
Other factors than speed may determine the choice of a parser,
particularly if large inputs do not occur.
As noted, many programmers prefer the recursive descent approach
of turning grammar rules into subroutines calls:
for them, it is the way things SHOULD work.
And given recursive descent's hacker friendliness, it is the
way things often DO work.
But I think those accustomed to recursive descent may want to
look at Marpa.
Marpa has all the same context that recursive descent
has, and allows the same tricks and more.
In this benchmark,
Marpa did NOT haul out the Ruby Slippers --
it limited itself to the context that recursive descent has.
Marpa is faster in this benchmark because it gives the
application the best of both worlds -- custom hacks,
*AND* a parse engine written in optimized C.
Recursive descent is good at custom hacks,
and for this grammar its underlying algorithm has
the same big-O complexity.
But by its nature, Parse::RecDescent is pure Perl,
so Marpa::XS clocks a steady factor-of-ten advantage.
And, in fact, of the time Marpa::XS does use, only about 20%
is used by its parse engine.
Overhead and lexing in Marpa::XS are in pure Perl, and they take
80% of the time.
posted at: 00:08 |
direct link to this entry