Sun, 13 Dec 2015
Every year the Perl 6 community creates an "Advent" series of posts.
I always follow these, but
caught my attention this year.
It presents a vision of a future where programming is language-driven.
A vision that I share.
The post went on to encourage its readers to follow up on this vision,
and suggested an approach.
But I do not think the particular approach suggested would be fruitful.
In this post I'll explain why.
The focus of the Advent post was language-driven programming,
and that is the aspect that excites me most.
But the points that I wish to make are more easily understood if
I root them in a narrower, but more familiar issue
-- grammar reuse.
Most programmers will be very familiar with grammar reuse from regular expressions.
In the regular expression ("RE") world,
programming by cutting and pasting
is very practical and often practiced.
For this post I will consider grammar reusability to be the ability
to join two grammars and create a third.
This is also sometimes called grammar composition.
For this purpose, I will widen the term "grammar" to include
RE's and PEG parser specifications.
Ideally, when you compose two grammars,
what you get is
- a language you can reasonably predict, and
- if each of the two original grammars can be parsed in reasonable time,
a language that can be parsed in reasonable time.
Not all language representations are reusable.
RE's are, and BNF is.
PEG looks like a combination of BNF and RE's,
but PEG, in fact, is its own very special form of parser specification.
And PEG parser specifications are
one of the least reusable language representations ever invented.
Reuse and regular expressions
RE's are as well-behaved under
reuse as a language representation can get.
The combination of two RE's is always another RE,
and you can reasonably determine what language the combined RE recognizes by
Further, every RE is parseable in linear time.
The one downside, often mentioned by critics, is that RE's
do not scale in terms of readability.
Here, however, the problem is not really one of reusability.
The problem is that RE's are quite limited in their capabilities,
and programmers often exploit the excellent behavior of RE's under reuse
to push them into applications for which RE's just do not have the power.
Reuse and PEG
When programmers first look at PEG syntax, they often think they've encountered
paradise. They see both BNF and RE's, and imagine they'll have the
best of each.
But the convenient behavior of
RE's depends on their unambiguity.
You simply cannot write
an unambiguous RE -- it's impossible.
More powerful and more flexible, BNF allows you to describe many more grammars --
including ambiguous ones.
How does PEG resolve this? With a Gordian knot approach.
Whenever it encounters an ambiguity, it throws all but one of the choices away.
The author of the PEG specification gets some control over what is thrown away --
he specifies an order of preference for the choices.
But degree of control is less than it seems,
and in practice PEG is
the nitroglycerin of parsing -- marvelous when it works,
but tricky and dangerous.
Consider these 3 PEG specifications:
A = "a"A"a"/"aa"
All three clearly accept only strings which are repetitions
of the letter "a".
But which strings?
For the answers,
suggestions for dealing with PEG if you are committed to it,
my previous post on PEG.
When getting an RE or a BNF grammar to work,
you can go back to the grammar and ask
yourself "Does my grammar look like my intended language?".
With PEG, this is not really possible.
With practice, you might get used to figuring out single line PEG specs
like the first two above.
(If you can get the last one, you're amazing.)
But tracing these through multiple rule layers required by useful grammars is,
in practice, not really possible.
In real life,
PEG specifications are written by hacking them until the test suite works.
And, once you get a PEG specification to pass the test suite for a practical-sized grammar,
you are very happy to leave it alone.
Trying to compose two PEG specifications is rolling the dice with the odds against you.
Reuse and the native Perl 6 parser
The native Perl 6 parser is an extended PEG parser.
The extensions are very interesting from the PEG point of view.
The PEG "tie breaking" has been changed,
and backtracking can be used.
These features mean the Perl 6 parser can be extended to languages
well beyond what
ordinary PEG parsers can handle.
But, if you use the extra features, reuse will be even trickier than
if you stuck with vanilla PEG.
Reuse and general BNF parsing
As mentioned, general BNF is reusable, and so general BNF parsers like
are as reusable as regular expressions, with two caveats.
First, if the two grammars are not doing their own lexing, their lexers will have
to be compatible.
with regular expressions you had the advantage that
regular expression parses in linear time, so that speed was guaranteed to be acceptable.
Marpa users reuse grammars and pieces of grammars all the time.
The result is always the language specified by the merged BNF,
and I've never heard anyone complain that performance deterioriated.
But, while it may not happen often,
it is possible to combine two Marpa grammars that run in linear time
and end up with one that does not.
You can guarantee your merged Marpa grammar will stay linear if you follow 2 rules:
- keep the grammar unambiguous;
- don't use an unmarked middle recursion.
Unmarked middle recursions are not things you're likely to need a lot: they
are those palindromes where you have to count to find the middle:
grammars like "A ::= a | a A a".
If you use a middle recursion at all, it is almost certainly going to
be marked, like "A ::= b | a A a",
which generates strings like "aabaa".
With Marpa, as with RE's, reuse is easy and practical.
And, as I hope to show in a future post, unlike RE's,
Marpa opens the road to language-driven programming.
I'm a fan of the Perl 6 effort.
be a supporter, after the many favors they've done for me
and the Marpa community over the years.
The considerations of this post
will disappoint some of the
hopes for applications of the native Perl 6 parser.
But these applications have not been central to the Perl 6 effort,
of which I will be an eager student over the coming months.
To learn more about Marpa,
official web site maintained by Ron Savage.
I also have
a Marpa web site.
Comments on this post can be made in
Marpa's Google group,
or on our IRC channel: #marpa at freenode.net.
posted at: 13:13 |
direct link to this entry