Ocean of Awareness

Jeffrey Kegler's blog about Marpa, his new parsing algorithm, and other topics of interest

Jeffrey's personal website


Marpa resources

The Marpa website

The Ocean of Awareness blog: home page, chronological index, and annotated index.

Sun, 04 Nov 2012

A grammar that exemplifies, describes and parses itself

I've written a grammar in Marpa's new BNF interface, to parse Marpa's new BNF interface. In the 70's, when I learned parsing theory, this was a very fashionable thing to do, perhaps because yacc had done it, in Appendix B of the original 1975 paper. By 1979, Hoftstadter's book Godel-Escher-Bach (GEB) was out, and the next year it took the Pulitzer for General Nonfiction. Self-description, recursion, self-reference, self-embedding, you (preferably autologically) name it, these things were all the rage.

Reading code that is at once both self-example and self-description still holds a certain magic for me. Regular expressions cannot describe themselves. Recursive descent parsers are hand-written in another general-purpose language, so there can be no concise self-description. Ironically, yacc actually cannot parse its own description language. ("Ironically" is the word used in the paper.) Like almost all useful grammars, yacc's description language goes beyond the capabilities of yacc's LALR parser, and a lexer hack is needed to make the code in Appendix B work.

Marpa is a general BNF parser and requires no special hacks to parse the following efficiently:

rules ::= rule+ action => do_rules
rule ::= empty_rule | priority_rule | quantified_rule
priority_rule ::= lhs op_declare priorities
  action => do_priority_rule
empty_rule ::= lhs op_declare adverb_list
  action => do_empty_rule
quantified_rule ::= lhs op_declare name quantifier adverb_list
    action => do_quantified_rule
priorities ::= alternatives+
    separator => op_tighter proper => 1
    action => do_discard_separators
alternatives ::= alternative+
    separator => op_eq_pri proper => 1
    action => do_discard_separators
alternative ::= rhs adverb_list action => do_alternative
adverb_list ::= adverb_item* action => do_adverb_list
adverb_item ::=
    | left_association | right_association | group_association
    | separator_specification | proper_specification

action ::= kw_action op_arrow name action => do_action
left_association ::= kw_assoc op_arrow kw_left
  action => do_left_association
right_association ::= kw_assoc op_arrow kw_right
  action => do_right_association
group_association ::= kw_assoc op_arrow kw_group
  action => do_group_association
separator_specification ::= kw_separator op_arrow name
  action => do_separator_specification
proper_specification ::= kw_proper op_arrow boolean
action => do_proper_specification

lhs ::= name action => do_lhs
rhs ::= names
quantifier ::= op_star | op_plus
names ::= name+ action => do_array
name ::= bare_name | reserved_word | quoted_name
name ::= bracketed_name action => do_bracketed_name

reserved_word ::= kw_action | kw_assoc | kw_separator | kw_proper
  | kw_left | kw_right | kw_group

The conventions are standard or transparent. The "::=" symbol separates the left and right hand sides of rules. The "|" symbol separates alternative right hand sides. The "*" and "+" are quantifiers, similar to those in regular expressions, and indicate, respectively, zero or more repetitions and one or more repetitions of the preceding symbol. Adverbs take the form "keyword => value", and indicate semantics or the style of sequence separation. Full documentation can be found here.

Self-parsing compiler compilers ruled the earth in the age of bellbottoms. Self-parsing has lasted better, but not by much. When some years I wrote a self-describing language as an interface to Marpa, it seemed to confuse people. They wondered what Marpa did -- parsing your own description did not seem to be about doing anything. These days my examples feature a lot of calculators. ("Ironically", Hofstadter seems to have had the same problem with GEB -- he felt that people did not understand what his book was saying -- even those who liked it.)

But ideas from Larry Wall and Peter Stuifzand have re-ignited my interest in self-parsing. And this time the self-parsing parser was written with a specific purpose. I plan to enhance this language. I have found that the convenience of this interface more than compensates for the circular dependency issues. The BNF source in this post is the source for its own parser, and I plan to use it to produce improved versions of itself.


Comments on this post can be sent to the Marpa Google Group: marpa-parser@googlegroups.com

posted at: 16:00 | direct link to this entry

§         §         §