Ocean of Awareness

Jeffrey Kegler's blog about Marpa, his new parsing algorithm, and other topics of interest

Wed, 21 Dec 2011


Marpa::XS is now 1.000000

Marpa::XS is now 1.000000. Marpa::XS is the current lead implementation of Marpa, an algorithm that I hope will become standard for those parsing problems which are too complex for regular expressions. Apparently quite a number of people have put the beta to use. Feedback has been positive -- often extremely so.

What is Marpa?

Marpa is a general BNF parser -- it parses anything you can write in BNF, no exceptions. Left-recursion, right-recursion, ambiguity and even infinite ambiguity, you name it, Marpa parses it. If the grammar is of a class in practical use, Marpa parses it in linear time -- O(n).

Marpa's parse-time error detection is a breakthrough. When previous parsers failed, they often offered very little clue as to why. Marpa knows exactly what input it expects and why. Marpa is always fully aware of exactly where it is in the parse, in terms of the rules of the grammar, and it can share that information with the application. So good is Marpa at error detection, once considered a desperate last resort, that error detection can be used as a parsing technique in itself.
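To make the two claims above concrete, here is a minimal sketch of a textbook Earley recognizer, the algorithm that Marpa builds on. It is not Marpa's code and does not use the Marpa::XS API: the names `recognize`, `Item` and `GRAMMAR` are invented for illustration, the sketch assumes a grammar with no empty rules, and it has none of the optimizations behind Marpa's linear-time results. It shows a left-recursive, ambiguous grammar being accepted without special treatment, and it shows how such a parser can report, at the point of failure, exactly which terminals it was expecting.

```python
from collections import namedtuple

# An Earley item: the rule "lhs -> rhs", a dot position inside rhs, and the
# input position (origin) at which recognition of this rule began.
Item = namedtuple("Item", "lhs rhs dot origin")


def recognize(grammar, start, tokens):
    """Return (accepted, error_position, expected_terminals).

    grammar maps each nonterminal to a list of right-hand sides (tuples).
    Any symbol that is not a key of grammar is treated as a terminal.
    Assumption: no right-hand side is empty.
    """
    sets = [[] for _ in range(len(tokens) + 1)]

    def add(i, item):
        if item not in sets[i]:
            sets[i].append(item)

    for rhs in grammar[start]:
        add(0, Item(start, rhs, 0, 0))

    expected = set()
    for i in range(len(tokens) + 1):
        expected = set()  # terminals that would be acceptable at position i
        j = 0
        while j < len(sets[i]):  # the set may grow while it is processed
            item = sets[i][j]
            j += 1
            if item.dot == len(item.rhs):
                # Completion: advance every item that was waiting for item.lhs.
                for parent in list(sets[item.origin]):
                    if (parent.dot < len(parent.rhs)
                            and parent.rhs[parent.dot] == item.lhs):
                        add(i, parent._replace(dot=parent.dot + 1))
                continue
            symbol = item.rhs[item.dot]
            if symbol in grammar:
                # Prediction: start every rule for the nonterminal after the dot.
                for rhs in grammar[symbol]:
                    add(i, Item(symbol, rhs, 0, i))
            else:
                # Scanning: record the terminal, and advance if it matches.
                expected.add(symbol)
                if i < len(tokens) and tokens[i] == symbol:
                    add(i + 1, item._replace(dot=item.dot + 1))
        if i < len(tokens) and not sets[i + 1]:
            return False, i, sorted(expected)  # tokens[i] was not acceptable

    if any(it.lhs == start and it.origin == 0 and it.dot == len(it.rhs)
           for it in sets[-1]):
        return True, None, []
    return False, len(tokens), sorted(expected)  # input ended too early


# A left-recursive, ambiguous expression grammar (illustrative only).
GRAMMAR = {"E": [("E", "+", "E"), ("E", "*", "E"), ("n",)]}

print(recognize(GRAMMAR, "E", "n + n * n".split()))  # (True, None, [])
print(recognize(GRAMMAR, "E", "n + * n".split()))    # (False, 2, ['n'])
print(recognize(GRAMMAR, "E", "n n".split()))        # (False, 1, ['*', '+'])
```

The list of expected terminals falls out of the Earley sets for free: every item with a terminal after its dot names a token the parse could continue with. That is the kind of information the paragraph above describes Marpa sharing with the application.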

While Marpa is intended to compete with production parsers, it does have special advantages for developers and experimenters. Marpa is highly tolerant of difficult grammars -- it parses all of them, and in times which are considered optimal.

New with this release

For Marpa::XS 1.000000, only the version number and the README file were changed from the previous, beta, release.

What is next with Marpa?

Marpa::XS is aimed at users who want a stable platform for applications. To ensure the stability of Marpa::XS, active development of Marpa is moving into a new fork: Marpa::R2. This will isolate Marpa::XS users from the accidental changes and bugs that can be the side effect of active development.

Initially, changes to Marpa::XS will be restricted to bug fixes and those justified from a maintainability standpoint. The feature set will be kept stable. (As it stands, Marpa::XS is much more fully featured than competing parsers.) If I enhance the features of Marpa::XS, the new features will be back-ported from Marpa's active development forks, and I will preserve backward compatibility.

Limitations

Marpa::XS is, as the name suggests, XS only -- installation requires access to a C compiler, and to many of the GNU utilities and libraries as well. Marpa::XS has been tested on a wide variety of POSIX systems. In theory Marpa::XS is NOT restricted to POSIX systems -- all the tools it uses have Windows versions, for example. However, Marpa::XS has not, to my knowledge, been installed on a non-POSIX system.

Notes

  1. "in linear time": To be specific, Marpa parses any LR-regular grammar in linear time -- O(n). LR-regular is a vast class of grammars that includes LALR, LR(k) for all k, and LL(k) for all k. This means that Marpa parses, in linear time, every class of grammar parsed by yacc, recursive descent and regular expressions.
  2. "considered optimal": The phrase "considered optimal" elides some irrelevant bits of theory. I would be mildly surprised if it turns out that there is an O(n) algorithm for general BNF parsing, but nobody has proved that such a thing cannot exist. And there is an algorithm which, in theory, beats Marpa's O(n**3) worst case. The Valiant algorithm parses general BNF and is O(n**2.373...) or better. But Valiant's algorithm is only faster for huge problems, and for those it needs a machine with many terabytes of main memory to deliver on its speed claims. So it won't be competing with Marpa any time soon.
  3. "GNU utilities and libraries": These dependences can be an inconvenience, I admit, but the alternative is installing my attempt to portably re-create all the things the GNU people have developed. I think that it is clear that the GNU software is the easier and more reliable alternative.

    If you browse the package, you may see that it uses TeX as well. TeX is ONLY needed if you are working on libmarpa, the highly mathematical, low-level core library that provides the parse engine. To do this, you'd need to have studied a lot of the mathematics of parsing -- and you'd understand why I feel forced to do the documentation in TeX. All the non-mathematical parts are either in Perl, or in C code which can be read and changed on systems which do not have TeX installed.


posted at: 00:34
