Ocean of Awareness

Jeffrey Kegler's blog about Marpa, his new parsing algorithm, and other topics of interest

Jeffrey's personal website


Marpa resources

The Marpa website

The Ocean of Awareness blog: home page, chronological index, and annotated index.

Mon, 22 Apr 2013

Marpa's SLIF now allows procedural parsing

Marpa's SLIF (scanless interface) allows an application to parse directly from any BNF grammar. Marpa parses vast classes of grammars in linear time, including all those classes currently in practical use. With its latest release, Marpa::R2's SLIF also allows an application to intermix its own custom lexing and parsing logic with Marpa's, and to switch back and forth between them. This means, among other things, that Marpa's SLIF can now do procedural parsing.

What is procedural parsing? Procedural parsing is parsing using ad hoc code in a procedural language. The opposite of procedural parsing is declarative parsing -- parsing driven by some kind of formal description of the grammar. Procedural parsing may be described as what you do when you've given up on your parsing algorithm. Dissatisfaction with parsing theory has left modern programmers accustomed to procedural parsing. And in fact some problems are best tackled with procedural parsing.

An example

One such problem is parsing Perl-style here-documents. Peter Stuifzand has tackled this using the just-released version of Marpa::R2. For those unfamiliar, Perl allows documents to be incorporated into its source files in line-oriented fashion as "here-documents". Here-documents can be used in expressions. The syntax to do this is very handy, if a little strange. For example,

say <<ENDA, <<ENDB, <<ENDC; say <<ENDD;

starts with a single line declaring four here-documents spread out over two say statements. The expressions of the form


are here-document expressions. << is the heredoc operator. The string which follows it (in this example, ENDA, ENDB, etc.) is the heredoc terminator string -- the string that will signal end of body of the here-document. The body of the here-documents follow, in order, over the next eight lines. More details of here-document syntax, with examples, can be found in the Perl documentation.

All of this poses quite a challenge to a parser-lexer combination, which is one reason I chose it as an example -- to illustrate that the Marpa's SLIF support for procedural parsing can handle genuinely difficult cases. There are a few ways Marpa could approach this. The one Peter Stuifzand chose was to to read the here-document's body as the value of the terminator in each <<ENDX expression.

The strategy works this way: Marpa allows the application to mark certain lexemes as "pause" lexemes. Whenever a "pause" lexeme is encountered, Marpa's internal scanning stops, and control is handed over to the application. In this case, the application is set up to pause after every newline, and before the terminator in every here-document expression.

While reading the line containing the four here-document expressions, Marpa's SLIF pauses and resumes five times -- once for each here-document expression, then once for the final newline. Details can be found in compact form in the heavily commented code in this Github gist.

Marpa as a better procedural parser

So far I've talked in terms of Marpa "allowing" procedural parsing. In fact, there can be much more to it. Marpa can make procedural parsing easier and more accurate.

Marpa knows, at every point, which rules it is recognizing, and how far it is into them. Marpa also knows which new rules the grammar expects, and which terminals. The procedural parsing logic can consult this information to guide its decisions. Marpa can provide your procedural parsing logic with radar, as well as the option to use a very smart autopilot.

For more about Marpa

Marpa's latest version is Marpa::R2, which is available on CPAN. Marpa's SLIF is a new interface, which represents a major increase in Marpa's "whipitupitude". The SLIF has tutorials here and here. Marpa has a web page, and of course it is the focus of my "Ocean of Awareness" blog.

Comments on this post can be sent to the Marpa's Google Group: marpa-parser@googlegroups.com

posted at: 03:03 | direct link to this entry

§         §         §