Mon, 12 Sep 2011

Announcing Marpa::XS 0.010000

Some time ago I released Marpa::XS 0.010000. The core Marpa algorithm had already been converted to C, speeding it up considerably. Marpa::XS 0.010000 cleans up a lot of code left over from development, further speeding things up.

What is Marpa?

Marpa is an advance over recursive descent and yacc. I hope the Marpa algorithm will become the standard parser for problems too big for regular expressions.

Marpa parses, in linear time, all classes of grammar that are currently in practical use. (See Note 1).
The importance of parse-time debugging is often underestimated. Marpa's parse-time error detection breaks new ground -- Marpa is fully aware of exactly where in the parse it is at all times, and of exactly what input it expects and why. This means parse-time error detection, once a desperate last resort, now can be used as a parsing technique in itself.
Marpa is a general BNF parser -- that means if you feed it anything written in BNF, it "just parses" it. This includes grammars which are left-recursive, right-recursive and ambiguous -- even infinitely ambiguous.
Marpa never goes exponential -- worst case, even for highly ambiguous grammars, is O(n**3), which is considered optimal (See Note 2).

What is Next with Marpa?

I had planned to skip this announcement and wait to announce Marpa::XS 0.012000. I expected the release of Marpa::XS 0.012000 to be just days away. And I did produce its release candidate almost immediately. But in the meantime cpantesters was hit with a prolonged outage, which continues as of this writing. I've always waited for a few days of cpantesters results on the release candidate before cutting an official release and I hope cpantesters will be back online soon.

Thanks to git, delay in getting one release out is no major obstacle to work on its successor, and work toward Marpa::XS 0.014000 is well underway. At this point (assuming the cpantesters results will be positive) very little remains to be done before Marpa::XS can go beta.

Marpa::XS 0.014000 will be about bug fixes. I will do one or two additional applications, just to see if any problems surface. Marpa's current test suite is already quite extensive -- besides the usual specific and regression tests, it aggressively exercises a full parser for liberal HTML, and it runs a couple of large tests with a prototype Perl parser. So, while it may be unwise to make this kind of prediction in public, I frankly do not expect a new application or two to find much in the way of bugs.

What I had not done for prior releases was look proactively for leaks and other memory issues. I am now most of the way through doing that for Marpa::XS 0.014000. So far, I have found and fixed one slow leak. No memory overruns have turned up. I will complete the memory debugging and leak detection before taking Marpa::XS beta.

Limitations and Comparisons

Currently, the major limitation of Marpa::XS is that it is alpha. Development is well advanced, but the interface remains subject to change. For a comparison of Marpa to other parsers, which is careful to point out situations where older parsing algorithms may still be superior, see the "Limitations and Comparisons" section in my announcement of Marpa::XS 0.008000.

Notes

Note 1: To be specific, Marpa parses any LR-regular grammar in linear time -- O(n). LR-regular is a vast class of grammars that includes LALR, LR(k) for all k, and LL(k) for all k. This means that Marpa parses, in linear time, every grammar parsed by yacc, recursive descent and regular expressions.

Note 2: The phrase "considered optimal" elides some irrelevant bits of theory. It would be a bit of a surprise if it is possible to do general BNF parsing in O(n), but nobody has proved that it can't be done. The Valiant algorithm parses general BNF and is O(n**2.376...) or better. But Valiant's algorithm is only faster for huge problems, and for those it needs a machine with many terabytes of main memory to deliver on its speed claims. So it won't be competing with Marpa any time soon.

posted at: 14:45 | direct link to this entry

§ § §