Ocean of Awareness

Jeffrey Kegler's blog about Marpa, his new parsing algorithm, and other topics of interest

Thu, 05 Jan 2012

What! No Lexer?

To those who have noted that Marpa::XS does not come with a lexer, I'd respond that, in a very real sense, it does -- Perl. Perl 5 is a powerful lexical analyzer.

If you're trying to figure out how to write your first Marpa parser, I'd recommend a close look at Wolfgang Kinkeldei's recent posting about his Marpa-powered CSS parser. Wolfgang lays his parser out in a very elegant fashion, and I find his code makes an excellent template.

Especially nice-looking is Wolfgang's lexer. Wolfgang follows one of the two main strategies for lexical analysis in Perl: he consumes the input using substitution (s/ ... / ... /) commands.
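For readers who have not seen the substitution strategy before, here is a minimal sketch of it. It is not Wolfgang's code: the toy token table, the token names and the lex_by_substitution function are all made up for illustration, and how the resulting tokens get handed to a Marpa recognizer is deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical token table for a toy arithmetic language.  The names and
# patterns are illustrative only; they are not taken from the CSS parser.
my @token_types = (
    [ NUMBER => qr{\d+} ],
    [ NAME   => qr{[A-Za-z_]\w*} ],
    [ OP     => qr{[-+*/=()]} ],
);

# Consume the input from the front: each successful s/\A...// removes
# the matched text, so whatever remains is always the unlexed input.
sub lex_by_substitution {
    my ($input) = @_;
    my @tokens;
    TOKEN: while ( length $input ) {

        # Whitespace is discarded, not turned into a token.
        next TOKEN if $input =~ s/\A\s+//;

        for my $type (@token_types) {
            my ( $name, $pattern ) = @{$type};
            if ( $input =~ s/\A($pattern)// ) {
                push @tokens, [ $name, $1 ];
                next TOKEN;
            }
        }
        die 'Cannot lex input starting at: ', substr( $input, 0, 20 ), "\n";
    }
    return \@tokens;
}

my $tokens = lex_by_substitution('x = 42 + y1');
print join( ' ', map { sprintf '%s(%s)', @{$_} } @{$tokens} ), "\n";
```

The attraction of this style is that there is no bookkeeping: the input string itself is the lexer's state. The cost is that the string is modified as you go, so work on a copy if you need the original text later, for example to report positions in error messages.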

The other strategy is to use the Perl regex search position to track the progress of the lexical analysis. In the search-position strategy, your cases consist of a lot of match commands using the \G anchor and the gc modifier: m/\G ... /gc. An excellent tutorial on this kind of lexing, albeit in a non-Marpa context, can be found in Mark Jason Dominus's book, Higher Order Perl. Mark's coverage of lexing is in Chapter 8, "Parsing", on pages 359-375. Mark's book can be read on-line. I highly recommend Mark's book and own a paper copy.
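A minimal sketch of the search-position strategy follows. It is not Mark's code: the toy token set and the lex_by_position function are assumptions made up for illustration. Each case is an m/\G ... /gc match, so the input is never modified and pos() always says how far the lexer has gotten.

```perl
use strict;
use warnings;

# Track progress with Perl's regex search position.  \G anchors each
# match at the current pos(), /g advances pos() on success, and /c
# keeps pos() where it was when a match fails.
sub lex_by_position {
    my ($input) = @_;
    my @tokens;
    pos($input) = 0;
    TOKEN: while ( pos($input) < length $input ) {

        # Whitespace is skipped, not turned into a token.
        next TOKEN if $input =~ m/\G\s+/gc;

        # Hypothetical token names for a toy arithmetic language.
        if ( $input =~ m/\G(\d+)/gc ) {
            push @tokens, [ NUMBER => $1 ];
            next TOKEN;
        }
        if ( $input =~ m/\G([A-Za-z_]\w*)/gc ) {
            push @tokens, [ NAME => $1 ];
            next TOKEN;
        }
        if ( $input =~ m{\G([-+*/=()])}gc ) {
            push @tokens, [ OP => $1 ];
            next TOKEN;
        }
        die sprintf "Cannot lex input at position %d\n", pos($input);
    }
    return \@tokens;
}

my $tokens = lex_by_position('x = 42 + y1');
print join( ' ', map { sprintf '%s(%s)', @{$_} } @{$tokens} ), "\n";
```

Because the input is left intact, pos() gives you exact locations for error messages for free, which is the main reason to prefer this style when the input is large or when diagnostics matter.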

Actually, regular expressions are well within Marpa's capabilities, and lexical analysis could be done in Marpa. But a look at Mark's and Wolfgang's code should convince you that lexical analysis is easy to do in Perl.

posted at: 18:31

§         §         §