Ocean of Awareness

Jeffrey Kegler's blog about Marpa, his new parsing algorithm, and other topics of interest

Jeffrey's personal website


Marpa resources

The Marpa website

The Ocean of Awareness blog: home page, chronological index, and annotated index.

Tue, 26 Jun 2012

The useful, the playful, the easy, the hard and the beautiful

The useful

I am now using a useful new tool on CPAN, Peter Stuifzand's MarpaX::Simple::Rules. Marpa grammars are often best expressed like this:

    pair ::= a a
    pair ::= pair a
    pair ::= a pair
    pair ::= pair pair
Peter's module allow you to do this, and all the examples in this blog post were run using it.

For those new to this blog Marpa is something new in parsing -- it parses anything you can write in BNF and, if your grammar is in one of the classes currently in practical use, parses it in linear time. Marpa's parse engine is written in optimized C, so that Marpa's speed is competitive with parsers of far less power. Marpa's stable version is Marpa::XS.

The playful

The grammar allows every possible pairing of 2 adjacent symbols in a grammar. Which takes this post on a brief excursion into the playful. Combinatoric issues are often easily expressed as grammars, and Marpa makes it easy to count parses. Clearly there is only one way in which 2 symbols can be paired. And it's not too hard to see that 3 symbols can be paired in 2 ways. The counting rapidly gets difficult and error-prone, but I can convince myself with pencil and paper that 4 symbols can be paired in 5 ways. At this point I let Marpa take over. The code is online, and this is the result:

    1 2 5 14 42 132 429 1430 4862
For further analysis, I do what any red-blooded mathematician would do: I type the series into a Google seach bar and hit return. These numbers are the Catalan series, one which pops up all over the place. Because my issue is set up as a pairing problem, I didn't obtain the first two numbers, but the standard Catalan series is this:
    0 1 1 2 5 14 42 132 429 1430 4862 16796 58786 208012 742900 2674440
Confirmation that the Catalan series is the right one can be found in Concrete Mathmematics, Graham, Knuth, and Patashnik, pp. 343-344, or on Wikipedia.

The useful revisited

Playing with these series can be fun, but for my Marpa work they are more than useful, they are necessary. Marpa claims to accurately iterate all the parses of an ambiguous parse. But to claim this, I need to test it.

I can't do serious testing based on parse counts by hand. Because even if you can find the right answer with pencil and paper, that's not enough -- you need to demonstrate conclusively that the right answer IS the right answer. Matching Marpa's answer is not enough -- it might simply be a case of me and Marpa both counting wrong the same way.

For serious testing, I need to match the parse counts with a verified series. I can verify a series in two ways. I prefer the easy way -- finding a case, like that of the Catalan numbers just described, where it's already well-known that my parse counts and the series should match. But I also do it the hard way when I have to. In what follows I'll show an example of each.

The easy

This next grammar allows every one of 10 symbols to be either 'a' or nulled.
    S ::= A A A A A A A A A A
    A ::= a
    A ::= E
    E ::= Null
This is equivalent to the very well-known (and mercifully well-understood) problem of picking k items out of n. Here are the parse counts:
    1 10 45 120 210 252 210 120 45 10 1
These are the binomial coefficients for n=10. (Binomals are one of the many places this series pops up.)

A nice feature of Peter's MarpaX::Simple::Rules is that Peter's rules can be mixed with Marpa's "native" rules. This will be useful here, for generalizing from n=10 to 0<=n<=12. The code is online as a Github gist. And the result, as the math-savvy among my readers will be expecting, is Pascal's triangle:

1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
1 9 36 84 126 126 84 36 9 1
1 10 45 120 210 252 210 120 45 10 1
1 11 55 165 330 462 462 330 165 55 11 1
1 12 66 220 495 792 924 792 495 220 66 12 1

The hard

The previous two examples were good, but also quite "pure" mathematically -- there's symmetry all over the place. Real grammars tend to be a bit more "pear-shaped", and I wanted an example which, while combinatorially hard, reflected this.

Operators tend to be the most complex part of practical language grammars, so I focused on them. Their ambiguity is considerably tamed by using precedence, so I chose to ignore precedence. Few languages have a richer, and more potentially ambiguous, set of operators than Perl, so I chose the Perl operators. And the fastest way to generate operator ambiguity in Perl is with a series of minus signs. Things like:

Let me emphasize that this is not real Perl syntax -- it's a wildly ambiguous variant of Perl invented to make a good test case.

In Perl, minus signs can be part of 4 different operators: prefix decrement (--$a); postfix decrement ($a--); unary negation (-$a) and subtraction ($a-$b). Here's our grammar:

    E ::= E Minus E
    E ::= E Minus Minus
    E ::= Minus Minus E
    E ::= Minus E
    E ::= Variable
The code is online, and these, for minus sign counts from 1 to 12, are our parse counts:
    1 1 3 4 8 12 21 33 55 88 144 232

This series is not one of the well-known ones, but it is known -- it is A052952 , a sort of rag-time Fibonacci variant. It has no name, and I like to call it the Wall series, as a tribute to Larry, and because it's based on his Perl operators. With this one I could not simply look it up by name and confirm that A05952 is the series generated by the parse counts. But I was able to come up with my own proof.

The beautiful

The Pythagoreans believed that the world was created out of beauty and order; that mathematics was the basis of all things; and that each of these beliefs implied the other. Their beliefs gained special credibility when they showed that much of the power of music comes down to ratio.

Avowed Pythagoreans, who were probably never thick on the ground, are hard to find these days. But in another sense, Pythagoreanism dominates our world. That special reverence and belief in the power of science that characterizes us in the West -- it comes from them. We claim many philosophies and religions, but in fact it seems that we cannot stop ourselves from thinking like Pythagoreans, and these other creeds are more our aspiration than our conviction.

So does the secret of a happy and virtuous life lie in contemplating the harmonies of mathematics? That is an idea that seems at once too hard and too easy to be true. But it is easy to appreciate these series for their beauty, and hard not to be struck by the way that, when you avoid order, order emerges.

posted at: 19:02 | direct link to this entry

§         §         §