Wed, 09 Mar 2016
What parser do birds use?
"Here we provide, to our knowledge, the first unambiguous experimental evidence for compositional syntax in a non-human vocal system."
"Experimental evidence for compositional syntax in bird calls",
Toshitaka N. Suzuki, David Wheatcroft & Michael Griesser
7, Article number: 10986
In this post I look at a subset of the language
Japanese great tit,
also known as Parus major.
The above cited article presents evidence
that bird brains can parse this language.
What about standard modern computer parsing methods?
Here is the subset -- probably a tiny one --
of the language actually used by Parus major.
S ::= ABC
S ::= D
S ::= ABC D
S ::= D ABC
Classifying the Parus major grammar
Grammophone is a very handy
for classifying grammars.
Its own parser is somewhat limited, so that it requires a period
to mark the end of a rule.
The above grammar is in Marpa's SLIF format, which is smart enough to
use the "::="
operator to spot the beginning and end of rules,
just as the human eye does.
Here's the same grammar converted into a form acceptable to Grammophone:
S -> ABC .
S -> D .
S -> ABC D .
S -> D ABC .
Grammophone tells us that the Parus major grammar is not LL(1),
but that it is LALR(1).
What does this mean?
LL(1) is the class of grammar parseable by top-down methods:
it's the best class for characterizing most parsers in current use,
including recursive descent, PEG,
and Perl 6 grammars.
All of these parsers fall short of dealing with the Parus major language.
LALR(1) is probably most well-known from its implementations in
bison and yacc.
While able to handle this subset of Parus's language,
LALR(1) has its own, very strict, limits.
Whether LALR(1) could handle the full
complexity of Parus
language is a serious question.
But it's a question that in practice would probably not arise.
LALR(1) has horrible error handling properties.
When the input is correct and within its limits,
an LALR-driven parser is fast and works well.
But if the input is not perfectly correct,
LALR parsers produce
no useful analysis of what went wrong.
If Parus hears "d abc d",
a parser like Marpa, on the other hand, can produce something like
# * String before error: abc d\s
# * The error was at line 1, column 7, and at character 0x0064 'd', ...
# * here: d
Parus uses its language in predatory contexts,
and one can assume that a Parus with a preference for parsers whose
error handling is on an LALR(1) level
will not be keeping its alleles in the gene pool for very
References, comments, etc.
Those readers content with sub-Parus parsing methods may stop reading here.
Those with greater parsing ambitions, however, may
wish to learn more about Marpa.
A Marpa test script for parsing the Parus subset is in
a Github gist.
Marpa has a
semi-official web site, maintained by Ron Savage.
The official, but more limited, Marpa website
is my personal one.
Comments on this post can be made in
Marpa's Google group,
or on our IRC channel: #marpa at freenode.net.
posted at: 19:13 |
direct link to this entry