Ocean of Awareness

Jeffrey Kegler's blog about Marpa, his new parsing algorithm, and other topics of interest

Jeffrey's personal website

Google+

Marpa resources

The Marpa website

The Ocean of Awareness blog: home page, chronological index, and annotated index.

Mon, 17 Jun 2013


Marpa v. Parse::RecDescent: a rematch

The application

In a recent post, I looked at an unusual language which serializes arrays and strings, using a mixture of counts and parentheses. Here is an example:

A2(A2(S3(Hey)S13(Hello, World!))S5(Ciao!))

The language is of special interest for comparison against recursive descent because, while simple, it requires procedural parsing -- a purely declarative BNF approach will not work. So it's a chance to find out if Marpa can play the game that is recursive descent's specialty.

The previous post focused on how to use Marpa to mix procedural and declarative parsing together smoothly, from a coding point of view. It only hinted at another aspect: speed. Over the last year, Marpa has greatly improved its speed for this kind of application. The latest release of Marpa::R2 now clocks in almost 100 times faster than Parse::RecDescent for long inputs.

The benchmark

LengthSeconds
Marpa::R2 Marpa::XS Parse::RecDescent
1000 1.569 2.938 13.616
2000 2.746 7.067 62.083
3000 3.935 13.953 132.549
10000 12.270 121.654 1373.171

Parse::RecDescent is pure Perl, while Marpa is based on a parse engine in a library written in hand-optimized C. You'd expect Marpa to win this race and it did.

And it is nice to see that the changes from Marpa::XS to Marpa::R2 have paid off. Included in the table are the Marpa numbers from my 2012 benchmark of Marpa::XS. Marpa::R2 has a new interface and an internal lexer, and now beats Marpa::XS by a factor of up to 10.

While the benchmarked language is ideally suited to show recursive descent to advantage, the input lengths were picked to emphasize Marpa's strengths. Marpa optimizes by doing a lot of precomputation, and is written with long inputs in mind. Though these days, a 500K source, longer than the longest tested, would not exactly set a new industry record.

To learn more

There are fuller descriptions of the language in Flavio's post and code, and my recent post on how to write a parser for this language. I talk more about the benchmark's methodology in my post on the 2012 benchmark.

Marpa::R2 is available on CPAN. A list of my Marpa tutorials can be found here. There is a new tutorial by Peter Stuifzand. The Ocean of Awareness blog focuses on Marpa, and it has an annotated guide. Marpa also has a web page. For questions, support and discussion, there is a Google Group: marpa-parser@googlegroups.com. Comments on this post can be made there.


posted at: 09:47 | direct link to this entry

§         §         §