Ocean of Awareness

Jeffrey Kegler's blog about Marpa, his new parsing algorithm, and other topics of interest

Sun, 13 Jan 2013

A language for writing languages

Marpa::R2's Scanless interface is not yet two weeks old, but already there are completed applications. Significantly, two of them are for work.

A JSON Parser

The non-work-related application is a JSON parser. Given what it does, it easily could have been work-related. (It's been available for a few days as a gist, so it may well be in production use somewhere.) It was written by Peter Stuifzand, runs 185 lines and took him less than 30 minutes to write. Peter reports that it was a matter of typing in the grammar, and adding a few Perl functions to provide the semantics.

There are, of course, other JSON parsers out there, many of which run faster. These, however, took weeks to write. If you are, for example, thinking of extending JSON, and development time is a major consideration, the Marpa-based solution will be attractive.

Printer escape codes

Peter also did a Marpa-based language for work -- a solution to the problem of printer escape codes. For those unfamiliar, a printer's special features can often be invoked by "escape sequences" -- byte sequences which control things like cursor motion, color, character sets, graphics, and so on. It's nice to invoke them with a set of well-named functions.

Escape sequences are usually repetitive, and when complex, are usually not complex in an interesting way. They can be programmed with regex or eval hacks. But this time Peter chose to write a mini-language that specifies escape sequences, and to use Marpa to compile the mini-language into Perl code. He was done in an hour.
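To make the idea of "a mini-language compiled into code" concrete, here is a minimal sketch. It is not Peter's code: his mini-language is not shown in this post, it was parsed with Marpa, and it emitted Perl. Everything below is an assumption made for illustration: the spec syntax, the names SPEC, compile_spec, bold_on, bold_off and move_to, the use of ANSI-terminal-style sequences rather than those of any particular printer, and Python as the output language. The parsing is done with two regular expressions, so this sits at the "regex hack" end of the spectrum. A real grammar would give better error reporting and room to grow.

```python
import re

# Hypothetical spec: one well-named function per line.
# ESC stands for the escape byte; quoted text is emitted literally;
# bare words must be parameters of the function being defined.
SPEC = '''
# name(params) = parts
bold_on()          = ESC "[1m"
bold_off()         = ESC "[0m"
move_to(row, col)  = ESC "[" row ";" col "H"
'''

LINE = re.compile(r'^(\w+)\(([^)]*)\)\s*=\s*(.+)$')
TOKEN = re.compile(r'"([^"]*)"|(\w+)')


def compile_spec(spec):
    """Translate each spec line into the source text of one Python function."""
    functions = []
    for raw in spec.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and whole-line comments
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"cannot parse spec line: {raw!r}")
        name, param_text, body = m.groups()
        params = [p.strip() for p in param_text.split(',') if p.strip()]
        pieces = []
        for literal, word in TOKEN.findall(body):
            if word == 'ESC':
                pieces.append(repr('\x1b'))
            elif word:
                if word not in params:
                    raise ValueError(f"unknown name {word!r} in {name}")
                pieces.append(f'str({word})')
            else:
                pieces.append(repr(literal))
        if not pieces:
            raise ValueError(f"empty body for {name}")
        functions.append(
            f"def {name}({', '.join(params)}):\n"
            f"    return {' + '.join(pieces)}\n"
        )
    return '\n'.join(functions)


source = compile_spec(SPEC)   # generated Python source, as text
namespace = {}
exec(source, namespace)       # load the generated functions
```

After this runs, namespace["move_to"](3, 10) builds the cursor-positioning sequence for row 3, column 10, and printing source shows the generated functions. Adding a new escape sequence is then a one-line change to the spec rather than another hand-written function.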

A log file query language

Meanwhile, an interesting and adventurous language effort was underway on the other side of the Atlantic, where Paul Bennett, faced with analyzing lots of nginx log files, decided a powerful custom log query language was the best way to address his issue. Paul needed to design and specify his language from scratch. Paul was also facing a learning curve, but he read the gist for a Scanless interface example, and apparently was able to teach himself quickly from there. (He doesn't say, but it might have been the one for this post.)

Like Peter's escape sequence language, Paul's log query language program compiles to Perl. Its writing and debugging were spread out over three days. Paul reports that his language is on the job already, but that it needs some clean-up before going onto CPAN.

The snippets Paul shows are enticing. The language seems to include strings, integers and timestamps as supported types; regexes; a full set of comparison and boolean operators; and helpful new "any", "between" and "one" operators. Pretty good for three days. A lot of nasty problems snuggled away in log files may find their hiding places are not nearly as safe as they have been able to expect.
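For readers wondering what a query in such a language might turn into once compiled, here is a small sketch of the target side only. It is a guess, not Paul's design: this post does not give the semantics of his operators, so the readings below ("between" as an inclusive range, "any" as at least one condition holding, "one" as exactly one holding) are assumptions. So are the function names, the record layout and the sample data. It is also written in Python, where Paul's compiler emits Perl.

```python
import operator
import re
from datetime import datetime

COMPARISONS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
               "<=": operator.le, ">": operator.gt, ">=": operator.ge}

# Each helper returns a predicate: a function from a log record to True/False.

def compare(field, op, value):
    test = COMPARISONS[op]
    return lambda rec: test(rec[field], value)

def matches(field, pattern):
    rx = re.compile(pattern)
    return lambda rec: rx.search(rec[field]) is not None

def between(field, low, high):
    # Assumed semantics: inclusive at both ends.
    return lambda rec: low <= rec[field] <= high

def all_of(*preds):
    return lambda rec: all(p(rec) for p in preds)

def any_of(*preds):
    # Assumed semantics of "any": at least one condition holds.
    return lambda rec: any(p(rec) for p in preds)

def one_of(*preds):
    # Assumed semantics of "one": exactly one condition holds.
    return lambda rec: sum(bool(p(rec)) for p in preds) == 1

# Made-up records standing in for parsed nginx log lines.
records = [
    {"time": datetime(2013, 1, 12, 23, 59), "status": 200, "path": "/index.html"},
    {"time": datetime(2013, 1, 13, 9, 30), "status": 404, "path": "/missing.png"},
    {"time": datetime(2013, 1, 13, 17, 58), "status": 500, "path": "/api/report"},
]

# "Requests on 13 Jan that either failed server-side or asked for a PNG."
query = all_of(
    between("time", datetime(2013, 1, 13, 0, 0), datetime(2013, 1, 13, 23, 59)),
    any_of(compare("status", ">=", 500), matches("path", r"\.png$")),
)
hits = [rec["path"] for rec in records if query(rec)]

# "Exactly one of: an error status, or a path under /api/."
exactly_one = one_of(compare("status", ">=", 400), matches("path", r"^/api/"))
```

Because every operator is just a function that returns a predicate, a compiler for the query language only has to map each node of the parse tree to one of these calls. That is the kind of job a grammar with a few semantic functions attached handles well.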

Where to start

If you're interested in learning more about Marpa's Scanless interface, there is a tutorial. Additionally, the announcement of the Scanless interface contained a mini-tutorial.


Comments on this post can be sent to the Marpa Google Group: marpa-parser@googlegroups.com

posted at: 17:58

§         §         §