Sun, 04 Nov 2012
A grammar that exemplifies, describes and parses itself
I've written a grammar in Marpa's new BNF interface,
to parse Marpa's new BNF interface.
In the 70's, when I learned parsing theory,
this was a very fashionable thing to do, perhaps because
yacc had done it,
in Appendix B of
the original 1975 paper.
By 1979, Hoftstadter's book Godel-Escher-Bach (GEB) was out,
and the next year it took the Pulitzer for
Self-description, recursion, self-reference, self-embedding,
these things were all the rage.
that is at once both self-example and self-description
still holds a certain magic for me.
Regular expressions cannot describe themselves.
Recursive descent parsers are hand-written
in another general-purpose language,
so there can be no concise self-description.
Ironically, yacc actually cannot parse its own description language.
("Ironically" is the word used in the paper.)
Like almost all useful grammars, yacc's description language
goes beyond the capabilities of yacc's LALR parser,
and a lexer hack is needed to make the code in Appendix B work.
Marpa is a general BNF parser and requires no special hacks
to parse the following efficiently:
rules ::= rule+ action => do_rules
rule ::= empty_rule | priority_rule | quantified_rule
priority_rule ::= lhs op_declare priorities
action => do_priority_rule
empty_rule ::= lhs op_declare adverb_list
action => do_empty_rule
quantified_rule ::= lhs op_declare name quantifier adverb_list
action => do_quantified_rule
priorities ::= alternatives+
separator => op_tighter proper => 1
action => do_discard_separators
alternatives ::= alternative+
separator => op_eq_pri proper => 1
action => do_discard_separators
alternative ::= rhs adverb_list action => do_alternative
adverb_list ::= adverb_item* action => do_adverb_list
| left_association | right_association | group_association
| separator_specification | proper_specification
action ::= kw_action op_arrow name action => do_action
left_association ::= kw_assoc op_arrow kw_left
action => do_left_association
right_association ::= kw_assoc op_arrow kw_right
action => do_right_association
group_association ::= kw_assoc op_arrow kw_group
action => do_group_association
separator_specification ::= kw_separator op_arrow name
action => do_separator_specification
proper_specification ::= kw_proper op_arrow boolean
action => do_proper_specification
lhs ::= name action => do_lhs
rhs ::= names
quantifier ::= op_star | op_plus
names ::= name+ action => do_array
name ::= bare_name | reserved_word | quoted_name
name ::= bracketed_name action => do_bracketed_name
reserved_word ::= kw_action | kw_assoc | kw_separator | kw_proper
| kw_left | kw_right | kw_group
The conventions are standard or transparent.
The "::=" symbol separates the left and right hand sides of rules.
The "|" symbol separates alternative right hand sides.
The "*" and
"+" are quantifiers, similar to those in regular expressions,
and indicate, respectively, zero or more repetitions and one or more repetitions
of the preceding symbol.
Adverbs take the form "keyword => value",
and indicate semantics or the style of sequence separation.
Full documentation can be found
Self-parsing compiler compilers ruled the earth
in the age of bellbottoms.
Self-parsing has lasted better, but not by much.
When some years I wrote a self-describing language as an interface to
Marpa, it seemed to confuse people.
They wondered what Marpa did --
parsing your own description did not seem to be
about doing anything.
These days my examples feature a lot of calculators.
("Ironically", Hofstadter seems to have had the same problem with
GEB -- he felt that
people did not understand what his book was saying --
even those who liked it.)
But ideas from Larry Wall and Peter Stuifzand
have re-ignited my interest in self-parsing.
And this time the self-parsing parser was written
with a specific purpose.
I plan to enhance this language.
I have found that the convenience of this interface
more than compensates for the circular
The BNF source in this post is
for its own parser,
and I plan to use it
to produce improved versions
Comments on this post can be sent to the Marpa Google Group:
posted at: 16:00 |
direct link to this entry