Ocean of Awareness

Jeffrey Kegler's blog about Marpa, his new parsing algorithm, and other topics of interest

Jeffrey's personal website

Google+

Marpa resources

The Marpa website

The Ocean of Awareness blog: home page, chronological index, and annotated index.

Wed, 09 Mar 2016


What parser do birds use?

"Here we provide, to our knowledge, the first unambiguous experimental evidence for compositional syntax in a non-human vocal system." -- "Experimental evidence for compositional syntax in bird calls", Toshitaka N. Suzuki, David Wheatcroft & Michael Griesser Nature Communications 7, Article number: 10986

In this post I look at a subset of the language of the Japanese great tit, also known as Parus major. The above cited article presents evidence that bird brains can parse this language. What about standard modern computer parsing methods? Here is the subset -- probably a tiny one -- of the language actually used by Parus major.

      S ::= ABC
      S ::= D
      S ::= ABC D
      S ::= D ABC
    

Classifying the Parus major grammar

Grammophone is a very handy new tool for classifying grammars. Its own parser is somewhat limited, so that it requires a period to mark the end of a rule. The above grammar is in Marpa's SLIF format, which is smart enough to use the "::=" operator to spot the beginning and end of rules, just as the human eye does. Here's the same grammar converted into a form acceptable to Grammophone:

      S -> ABC .
      S -> D .
      S -> ABC D .
      S -> D ABC .
    

Grammophone tells us that the Parus major grammar is not LL(1), but that it is LALR(1).

What does this mean?

LL(1) is the class of grammar parseable by top-down methods: it's the best class for characterizing most parsers in current use, including recursive descent, PEG, and Perl 6 grammars. All of these parsers fall short of dealing with the Parus major language.

LALR(1) is probably most well-known from its implementations in bison and yacc. While able to handle this subset of Parus's language, LALR(1) has its own, very strict, limits. Whether LALR(1) could handle the full complexity of Parus language is a serious question. But it's a question that in practice would probably not arise. LALR(1) has horrible error handling properties.

When the input is correct and within its limits, an LALR-driven parser is fast and works well. But if the input is not perfectly correct, LALR parsers produce no useful analysis of what went wrong. If Parus hears "d abc d", a parser like Marpa, on the other hand, can produce something like this:

# * String before error: abc d\s
# * The error was at line 1, column 7, and at character 0x0064 'd', ...
# * here: d
    

Parus uses its language in predatory contexts, and one can assume that a Parus with a preference for parsers whose error handling is on an LALR(1) level will not be keeping its alleles in the gene pool for very long.

References, comments, etc.

Those readers content with sub-Parus parsing methods may stop reading here. Those with greater parsing ambitions, however, may wish to learn more about Marpa. A Marpa test script for parsing the Parus subset is in a Github gist. Marpa has a semi-official web site, maintained by Ron Savage. The official, but more limited, Marpa website is my personal one. Comments on this post can be made in Marpa's Google group, or on our IRC channel: #marpa at freenode.net.


posted at: 22:13 | direct link to this entry

§         §         §