Ocean of Awareness

Jeffrey Kegler's blog about Marpa, his new parsing algorithm, and other topics of interest


Tue, 25 Feb 2014


Significant newlines? Or semicolons?

Should statements have explicit terminators, like the semicolon of Perl and the C language? Or should they avoid the clutter and separate statements by giving whitespace syntactic significance, with a real effect on the semantics, as is done in Python and JavaScript?

Actually we don't have to go either way. As an example, let's look at a BNF-ish DSL that defines a small calculator. At first glance, it looks as if this language has taken the significant-whitespace route -- there certainly are no explicit statement terminators.

:default ::= action => ::first
:start ::= Expression
Expression ::= Term
Term ::=
      Factor
    | Term '+' Term action => do_add
Factor ::=
      Number
    | Factor '*' Factor action => do_multiply
Number ~ digits
digits ~ [\d]+
:discard ~ whitespace
whitespace ~ [\s]+

The rule is that there isn't one

If we don't happen to like the layout of the above DSL and rearrange it in various ways, we'll find that everything we try works. If we become curious about exactly what the rules for newlines are and look at the documentation, we won't find any. That's because there aren't any.

We can see this by thoroughly messing up the line structure:

:default ::= action => ::first :start ::= Expression Expression ::= Term
Term ::= Factor | Term '+' Term action => do_add Factor ::= Number |
Factor '*' Factor action => do_multiply Number ~ digits digits ~
[\d]+ :discard ~ whitespace whitespace ~ [\s]+

The script will continue to run just fine.

How does it work?

Pose the question this way: Can a human reader tell where the statements end? If the reader is not used to reading BNF, he might have trouble with this particular example but, for a language that he knows, the answer is simple: Yes, of course he can. So the real question is, why do we expect the parser to be so stupid that it cannot?

The only trick is that this is done without trickery. Marpa's DSL is written in itself, and Marpa's self-grammar describes exactly what a statement is and what it is not. The Marpa parser is powerful enough to simply take this self-describing DSL and act on it, finding where statements begin and end, much as a human reader is able to.
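To make the idea concrete, here is a minimal Python sketch. It is not Marpa's algorithm, and it does not use Marpa's self-grammar. It relies on one simplifying observation about the calculator DSL above: a statement begins at any token that is immediately followed by `::=` or `~`. The token pattern and the names `TOKEN`, `split_statements`, `TIDY` and `FLAT` are assumptions made for this illustration. The sketch only shows that, for this example, the statement boundaries can be recovered from the structure alone, whatever the line breaks are.

```python
import re

# Token pattern for this sketch only. It covers just the constructs
# that appear in the calculator DSL; it is not Marpa's lexer.
TOKEN = re.compile(r"""
    ::=                 # BNF rule operator
  | =>                  # adverb operator
  | ~                   # lexical rule operator
  | \|                  # alternative separator
  | '[^']*'             # single-quoted literal
  | \[[^\]]*\][+*]?     # character class with optional quantifier
  | :{0,2}\w+           # symbol names, plus :start, ::first and the like
""", re.VERBOSE)


def split_statements(source):
    """Split DSL text into statements, ignoring all whitespace.

    Assumption of this sketch: a token followed by '::=' or '~'
    is the left-hand side of a new statement.
    """
    tokens = TOKEN.findall(source)
    statements, current = [], []
    for i, tok in enumerate(tokens):
        starts_rule = i + 1 < len(tokens) and tokens[i + 1] in ("::=", "~")
        if starts_rule and current:
            statements.append(current)
            current = []
        current.append(tok)
    if current:
        statements.append(current)
    return [" ".join(s) for s in statements]


# The DSL as originally laid out.
TIDY = r"""
:default ::= action => ::first
:start ::= Expression
Expression ::= Term
Term ::=
      Factor
    | Term '+' Term action => do_add
Factor ::=
      Number
    | Factor '*' Factor action => do_multiply
Number ~ digits
digits ~ [\d]+
:discard ~ whitespace
whitespace ~ [\s]+
"""

# The same DSL with the line structure thoroughly messed up.
FLAT = r"""
:default ::= action => ::first :start ::= Expression Expression ::= Term
Term ::= Factor | Term '+' Term action => do_add Factor ::= Number |
Factor '*' Factor action => do_multiply Number ~ digits digits ~
[\d]+ :discard ~ whitespace whitespace ~ [\s]+
"""

# Both layouts yield the same list of statements.
assert split_statements(TIDY) == split_statements(FLAT)

for statement in split_statements(FLAT):
    print(statement)
```

The one-token lookahead is enough here only because this DSL is so regular. Marpa does not depend on a shortcut like this: as described above, it works from a full grammar of what a statement is.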

To learn more

This example was produced with the Marpa parser. Marpa::R2 is available on CPAN. The code for this example is based on that in the synopsis for its top-level document, but it is isolated conveniently in a GitHub gist.

A list of my Marpa tutorials can be found here. There are new tutorials by Peter Stuifzand and amon. The Ocean of Awareness blog focuses on Marpa, and it has an annotated guide. Marpa has a web page that I maintain and Ron Savage maintains another. For questions, support and discussion, there is the "marpa parser" Google Group. Comments on this post can be made there.


posted at: 18:30
