Ocean of Awareness

Jeffrey Kegler's blog about Marpa, his new parsing algorithm, and other topics of interest

Jeffrey's personal website


Marpa resources

The Marpa website

The Ocean of Awareness blog: home page, chronological index, and annotated index.

Thu, 31 Dec 2009

Marpa: Practical General BNF Parsing

With a regular expression engine, there are expectations. You feed a regular expression to the RE engine, it parses with it. That simple.

A general BNF parser is one which fulfills the same expectation for BNF. Write your language in BNF, you got a parser. But it hasn't been that simple.

The guys who write the textbooks have pushed general BNF parsing for years. Improvements in these algorithms have pushed the speeds down to linear or close to it for the kinds of language in practical use.

But the general parsing algorithms have languished on the textbook pages. And I did find it wasn't quite as easy as the academics suggested. There were some obstacles that they didn't forsee. But bottom line, they were right. General BNF parsing is practical.

Marpa is a practical (if at this point alpha) general BNF parser generator. I have used Marpa to write a practical parser for a non-trivial task -- HTML parsing. That HTML parser is Marpa::UrHTML, and I now use Marpa::UrHTML in tasks I do routinely.

Over the next weeks, I will do phased releases of Marpa. The HTML parser, Marpa::UrHTML, is documented and ready to use in the first release. Over the next few weeks, I will document Marpa, the parse engine itself.

Right now the Marpa parse engine is "pure Perl", and speeds are on the order of PPI. This is acceptable for many applications, but Marpa can do better. Marpa has as its parse engine a mathematical algorithm that lends itself to conversion to C. Further down the road, I'll write a Marpa C library, and an XS wrapper.

posted at: 15:29 | direct link to this entry

§         §         §