<?xml version="1.0"?>
<!-- name="generator" content="blosxom/2.0" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
  <channel>
    <title>Ocean of Awareness</title>
    <link>http://jeffreykegler.github.com/Ocean-of-Awareness-blog</link>
    <description>Ocean of Awareness.</description>
    <language>en</language>

  <item>
    <title>Is Earley parsing fast enough?</title>
    <link>http://jeffreykegler.github.com/Ocean-of-Awareness-blog/individual/2013/04/fast_enough.html</link>
    <description>  &lt;blockquote&gt;
      &lt;!--
      marpa_r2_html_fmt --no-added-tag-comment --no-ws-ok-after-start-tag
      --&gt;
      &quot;First we ask, what impact will our algorithm have on the parsing
      done in production compilers for existing programming languages?
      The answer is, practically none.&quot; -- Jay Earley's Ph.D thesis, p. 122.
    &lt;/blockquote&gt;
    &lt;p&gt;In the above quote, the inventor of the Earley parsing
      algorithm poses a question.
      Is his algorithm fast enough for a production compiler?  His answer is a
      stark &quot;no&quot;.
    &lt;/p&gt;
    &lt;p&gt;
      This is the verdict on Earley's that you often
      hear repeated today, 45 years later.
      Earley's, it is said, has too high a &quot;constant factor&quot;.
      Verdicts tend to be repeated more often than examined.
      This particular verdict originates with the inventor himself.
      So perhaps it is not astonishing
      that many treat the dismissal
      of Earley's on grounds of speed to be as valid today as it
      was in 1968.
    &lt;/p&gt;
    &lt;p&gt;But in the past 45 years,
      computer technology has changed beyond recognition
      and researchers
      have made several significant improvements to Earley's.
      It is time to reopen this case.
    &lt;/p&gt;&lt;h3&gt;What is a &quot;constant factor&quot;?&lt;/h3&gt;
    &lt;p&gt;The term &quot;constant factor&quot; here has a special meaning,
      one worth looking at carefully.
      Programmers talk about time efficiency in two ways:
      time complexity and speed.
    &lt;/p&gt;
    &lt;p&gt;
      Speed is simple:
      It's how fast the algorithm is against the clock.
      To make comparison easy,
      the clock can be an abstraction.
      The clock ticks could be, for example, weighted instructions
      on some convenient and mutually-agreed architecture.
    &lt;/p&gt;
    &lt;p&gt;
      By the time Earley was writing, programmers had discovered that simply comparing
      speeds,
      even on well-chosen abstract clocks, was not enough.
      Computers were improving very quickly.
      A speed result
      that was clearly significant when the comparison was made
      could quickly become unimportant.
      Researchers needed to
      talk about time efficiency in a way that made what they said as true
      decades later as on the day they said it.
      To do this, researchers created the idea of time complexity.
    &lt;/p&gt;
    &lt;p&gt;Time complexity is measured using several notations, but the most
      common is
      &lt;a href=&quot;http://en.wikipedia.org/wiki/Big_O_notation&quot;&gt;big-O
        notation&lt;/a&gt;.
      Here's the idea:
      Assume we are comparing two algorithms, Algorithm A and Algorithm B.
      Assume that algorithm A uses 42 weighted instructions for each input symbol.
      Assume that algorithm B uses 1792 weighted instructions for each input symbol.
      Where the count of input symbols is N,
      A's speed is 42*N, and B's is 1792*N.
      But the time complexity of both is the same: O(N).
      The big-O notation throws away the two &quot;constant factors&quot;, 42 and 1792.
      Both are said to be &quot;linear in N&quot;.
      (Or more often, just &quot;linear&quot;.)
    &lt;/p&gt;
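    &lt;p&gt;As a rough illustration, here is a minimal Python sketch of the comparison.
      The function names are invented for this example,
      and the per-symbol costs are the hypothetical ones used above.
      The ratio between the two costs is the same at every input size,
      and that fixed ratio is what big-O notation throws away.
    &lt;/p&gt;
    &lt;blockquote&gt;&lt;pre&gt;
```python
# Hypothetical per-symbol costs, taken from the example in the text.
COST_A = 42      # weighted instructions per input symbol, Algorithm A
COST_B = 1792    # weighted instructions per input symbol, Algorithm B


def cost(per_symbol, n):
    """Total weighted instructions for an input of n symbols."""
    return per_symbol * n


def ratio(n):
    """How many times more instructions B uses than A on n symbols."""
    return cost(COST_B, n) / cost(COST_A, n)


# Both costs grow linearly with n; the ratio between them never moves.
for n in (10, 1000, 1000000):
    print(n, cost(COST_A, n), cost(COST_B, n), round(ratio(n), 2))
```
&lt;/pre&gt;&lt;/blockquote&gt;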
    &lt;p&gt;It often happens that algorithms we need to compare for time efficiency
      have different speeds,
      but the same time complexity.
      In practice,
      this usually means we can treat them as having essentially
      the same time efficiency.
      But not always.
      It sometimes happens that this difference is relevant.
      When this happens, the rap against the slower algorithm is that it
      has a &quot;high constant factor&quot;.
    &lt;/p&gt;
    &lt;h3&gt;OK, about that high constant factor&lt;/h3&gt;
    &lt;p&gt;What is the &quot;constant factor&quot; between Earley and the current favorite
      parsing algorithm, as a number?
      (My interest is practical, not historical,
      so I will be talking about Earley's
      as modernized by Aycock, Horspool, Leo and myself.
      But much of what I say applies to Earley's algorithm in general.)
    &lt;/p&gt;
    &lt;p&gt;What the current favorite parsing algorithm is
      can be an interesting question.
      When Earley wrote, it was hand-written recursive descent.
      The next year (1969) LALR parsing was invented,
      and the year after (1970) a tool that used it was introduced -- yacc.
      At points over the next decades,
      yacc chased both Earley's
      and recursive descent almost completely out of the textbooks.
      &lt;a href=&quot;http://jeffreykegler.github.io/Ocean-of-Awareness-blog/individual/2010/09/perl-and-parsing-6-rewind.html&quot;&gt;
        But as I have detailed elsewhere&lt;/a&gt;,
          yacc had serious problems.
          In 2006 things went full circle -- the industry's standard C
          compiler, GCC, replaced LALR with recursive descent.
        &lt;/p&gt;
    &lt;p&gt;So back to 1970.
    That year, Jay Earley wrote up his algorithm for
    &quot;Communications of the ACM&quot;,
      and put a rough number on his &quot;constant factor&quot;.
      He said that his algorithm was an &quot;order of magnitude&quot; slower
      than the current favorites -- a factor of 10.
      Earley suggested ways to lower this 10-handicap,
      and modern implementations have followed up on them
      and found others.
      But for this post,
      let's concede the factor of ten and throw
      in another.
      Let's say Earley's is 100 times slower than the current favorite,
      whatever that happens to be.
    &lt;/p&gt;
    &lt;h3&gt;Moore's Law and beyond&lt;/h3&gt;
    &lt;p&gt;Let's look at the handicap of 100
      in the light of Moore's Law.
      Since 1968, computers have gotten a billion times faster -- nine orders
      of magnitude. Nine factors of ten.
      This means that today Earley's runs
      seven factors of ten faster than
      the current favorite algorithm did in
      1968.
      Earley's is 10 million times as fast as the algorithm that was
      then considered practical.
    &lt;/p&gt;
    &lt;p&gt;
      Of course, our standard of &quot;fast enough to be practical&quot; also evolves.
      But it evolves a lot more slowly.
      Let's exaggerate
      and say that &quot;practical&quot; meant &quot;takes an hour&quot; in 1968,
      but that today we would demand that the same program take only a second.
      Do the arithmetic and you find that Earley's is now
      more than 2,000 times faster than it needs to be to be practical.
    &lt;/p&gt;
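    &lt;p&gt;The arithmetic can be spelled out in a few lines of Python.
      This is only a sketch:
      every figure in it is one of the round numbers assumed in this post,
      not a measurement, and the variable names are invented for the example.
    &lt;/p&gt;
    &lt;blockquote&gt;&lt;pre&gt;
```python
# Figures below are the post's own round numbers, not measurements.
SPEEDUP_SINCE_1968 = 10 ** 9      # "a billion times faster"
EARLEY_HANDICAP = 10 ** 2         # the conceded constant factor of 100
SECONDS_PER_HOUR = 60 * 60        # "an hour" then versus "a second" now

# Earley's today versus the favorite algorithm on 1968 hardware.
relative_speed = SPEEDUP_SINCE_1968 // EARLEY_HANDICAP

# The bar for "practical" has risen by a factor of 3600 (hour to second).
margin = relative_speed / SECONDS_PER_HOUR

print(relative_speed)     # 10000000
print(round(margin))      # 2778
```
&lt;/pre&gt;&lt;/blockquote&gt;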
    &lt;p&gt;Bringing in Moore's Law is just the beginning.
      The handicap Jay Earley gave his algorithm
      is based on a straight comparison of CPU speeds.
      But parsing, in practical cases, involves I/O.
      And the &quot;current favorite&quot; needs to do as much I/O as Earley's.
      I/O overheads, and the accompanying context switches,
      swamp considerations of CPU speed,
      and that is more true today
      than it was in 1968.
      When an application is I/O bound, CPU is in effect free.
      Parsing may not be I/O bound in this sense, but neither
      is it one of those applications where the comparison can be made
      in raw CPU terms.
    &lt;/p&gt;
    &lt;p&gt;Finally, pipelining has changed
      the nature of the CPU overhead itself radically.
      In 1968, the time to run a series of CPU
      instructions varied linearly with the number of instructions.
      Today, that is no longer true,
      and the change favors strategies like Earley's,
      which require a higher instruction count,
      but achieve efficiency in other ways.
    &lt;/p&gt;
    &lt;h3&gt;Achievable speed&lt;/h3&gt;
    &lt;p&gt;
      So far, I've spoken in terms of theoretical speeds, not achievable ones.
      That is, I've assumed that both Earley's
      and the current favorite are producing their best speed, unimpeded by
      implementation considerations.
    &lt;/p&gt;
    &lt;p&gt;
      Earley, writing in 1968 and thinking of hand-written recursive descent,
      assumed that production compilers
      could be, and in practice usually would be,
      written by
      programmers with plenty of time to do
      careful and well-thought-out hand-optimization.
      After forty-five years of real-life experience,
      we know better.
    &lt;/p&gt;
    &lt;p&gt;
      In those widely used practical compilers and interpreters
      that rely on lots of procedural logic --
      and these days that is almost all of them --
      it is usually all the maintainers can do to keep the procedural logic correct.
      In all but a few cases, optimization is opportunistic,
      not systematic.
      Programmers have been exposed to
      the realities of parsing with
      large amounts of complex procedural logic,
      and hand-written recursive descent has acquired a
      reputation for being slow.
    &lt;/p&gt;
    &lt;p&gt;
      In theory,
      LALR based compilers are less dependent on procedural
      parsing and therefore easier to keep optimal.
      In practice they are as bad or worse.
      LALR parsers usually still need a considerable amount of procedural logic,
      but procedural logic is harder to write for LALR than it
      is for recursive descent.
    &lt;/p&gt;
    &lt;p&gt;Modern Earley parsing
      has a much easier time actually delivering
      its theoretical best speed in practice.
      Earley's is powerful enough,
      and in its modern version well-enough aware of the state of the parse,
      that procedural logic can be kept to a minimum or eliminated.
      Most of the parsing is done by the mathematics at its core.
    &lt;/p&gt;
    &lt;p&gt;
      The math at Earley's core can be heavily optimized,
      and any optimization benefits all applications.
      Optimization of special-purpose procedural logic benefits
      only the application that uses that logic.
    &lt;/p&gt;
    &lt;h3&gt;Other considerations&lt;/h3&gt;
    &lt;p&gt;But you might say,
    &lt;/p&gt;&lt;blockquote&gt;
      &quot;A lot of interesting points, Jeffrey, but all things being
      equal, a factor of 10,
      or even what's left from a factor of ten once I/O,
      pipelining and implementation inefficiencies have all nibbled away at it,
      is still worth having.
      It may in a lot of instances not even be measurable, but why not grab
      it for the sake of the cases where it is?&quot;
    &lt;/blockquote&gt;&lt;p&gt;
      Which is a good point.
      The &quot;implementation inefficiencies&quot; can be nasty enough that Earley's is in
      fact faster in raw terms,
      but let's assume
      that some cost in speed is still being paid for the use of Earley's.
      Why incur that cost?
    &lt;/p&gt;&lt;h4&gt;Error diagnosis&lt;/h4&gt;&lt;p&gt;
      The parsing algorithms currently favored,
      in their quest for efficiency,
      do not maintain full
      information about the state of the parse.
      This is fine when the source is 100% correct,
      but in practice an important function of a parser is to find and
      diagnose errors.
      When the parse fails, the current favorites
      often have little idea of why.
      An Earley parser knows the full state of the parse.
      This added knowledge can save a lot of
      programmer time.
    &lt;/p&gt;&lt;h4&gt;Readability&lt;/h4&gt;
    &lt;p&gt;
      The more that a parser does from the grammar,
      and the less procedural logic it uses,
      the more readable the code will be.
      This has a determining effect on maintenance costs
      and the software's ability to evolve over time.
    &lt;/p&gt;&lt;h4&gt;Accuracy&lt;/h4&gt;
    &lt;p&gt;Procedural logic can produce inaccuracy -- inability
      to describe or control the actual language being parsed.
      Some parsers, particularly LALR and PEG,
      have a second major source of inaccuracy -- they use
      a precedence scheme for conflict resolution.
      In specific cases, this can work, but
      precedence-driven conflict resolution
      produces a language without
      a &quot;clean&quot; theoretical description.
    &lt;/p&gt;
    &lt;p&gt;
      The obvious problem with not knowing what language you
      are parsing is failure to parse correct source code.
      But another, more subtle, problem can be worse over the
      life cycle of a language ...
    &lt;/p&gt;
    &lt;h4&gt;False positives&lt;/h4&gt;
    &lt;p&gt;False positives are cases
      where the input is in error,
      and should be reported as such, but instead
      the result is what you wanted.
      This may sound like unexpected good news,
      but when a false positive does surface,
      it is quite possible that it cannot be fixed
      without breaking code that, while incorrect, does work.
      Over the life of a language, false positives are deadly.
      False positives produce buggy and poorly understood code
      which must be preserved and maintained forever.
    &lt;/p&gt;
    &lt;h4&gt;Power&lt;/h4&gt;
    &lt;p&gt;
      The modern Earley implementation can parse vast classes
      of grammar in linear time.
      These classes include all those currently in practical use.
    &lt;/p&gt;&lt;h4&gt;Flexibility&lt;/h4&gt;
    &lt;p&gt;Modern Earley implementations
      parse all context-free grammars in times that are, in practice,
      considered optimal.
      With other parsers,
      the class of grammars parsed is highly restricted,
      and there is usually a real danger that a new change
      will violate those restrictions.
      As mentioned,
      the favorite alternatives to Earley's
      make it hard to know exactly what language you are,
      in fact, parsing.
      A change can break one of these parsers
      without there being any indication.
      By comparison,
      syntax changes and extensions to Earley's grammars
      are carefree.
    &lt;/p&gt;
    &lt;h3&gt;For more about Marpa&lt;/h3&gt;
    &lt;p&gt;
      Above I've spoken of &quot;modern Earley parsing&quot;,
      by which I've meant Earley parsing as amended and improved
      by the efforts of Aycock, Horspool, Leo and myself.
      At the moment, the only implementation that contains
      all of these modernizations is Marpa.
    &lt;/p&gt;
    &lt;p&gt;
      Marpa's latest version is
      &lt;a href=&quot;https://metacpan.org/module/Marpa::R2&quot;&gt;Marpa::R2,
        which is available on CPAN&lt;/a&gt;.
      Marpa's
      &lt;a href=&quot;https://metacpan.org/module/JKEGL/Marpa-R2-2.052000/pod/Scanless/DSL.pod&quot;&gt;SLIF
        is
        a new interface&lt;/a&gt;,
      which represents a major increase
      in Marpa's &quot;whipitupitude&quot;.
      The SLIF has tutorials
      &lt;a href=&quot;http://jeffreykegler.github.com/Ocean-of-Awareness-blog/individual/2013/01/dsl_simpler2.html&quot;&gt;here
      &lt;/a&gt;
      and
      &lt;a href=&quot;http://jeffreykegler.github.com/Ocean-of-Awareness-blog/individual/2013/01/announce_scanless.html&quot;&gt;
        here&lt;/a&gt;.
      Marpa has
      &lt;a href=&quot;http://jeffreykegler.github.com/Marpa-web-site/&quot;&gt;a web page&lt;/a&gt;,
      and of course it is the focus of
      &lt;a href=&quot;http://jeffreykegler.github.com/Ocean-of-Awareness-blog/&quot;&gt;
        my &quot;Ocean of Awareness&quot; blog&lt;/a&gt;.
    &lt;/p&gt;
    &lt;p&gt;
      Comments on this post
      can be sent to Marpa's Google Group:
      &lt;code&gt;marpa-parser@googlegroups.com&lt;/code&gt;
    &lt;/p&gt;</description>
  </item>
  <item>
    <title>Marpa's SLIF now allows procedural parsing</title>
    <link>http://jeffreykegler.github.com/Ocean-of-Awareness-blog/individual/2013/04/procedural.html</link>
    <description>  &lt;p&gt;
      &lt;!--
      marpa_r2_html_fmt --no-added-tag-comment --no-ws-ok-after-start-tag
      --&gt;
    &lt;/p&gt;
    &lt;p&gt;
      Marpa's SLIF (scanless interface)
      allows an application to parse directly from any BNF grammar.
      Marpa parses vast classes of grammars in linear time,
      including all those classes currently in practical use.
      With
      &lt;a href=&quot;https://metacpan.org/release/Marpa-R2&quot;&gt;
        its latest release&lt;/a&gt;,
      Marpa::R2's SLIF
      also allows an application to intermix
      its own custom lexing and parsing logic
      with Marpa's,
      and to switch back and forth between them.
      This means,
      among other things,
      that Marpa's SLIF can now
      do procedural parsing.
    &lt;/p&gt;
    &lt;p&gt;
      What is procedural parsing?
      Procedural parsing is parsing using
      ad hoc code in a procedural language.
      The opposite of procedural parsing is declarative parsing
      -- parsing driven by some kind of formal description
      of the grammar.
      Procedural parsing
      may be described as what you do when you've given up
      on your parsing algorithm.
      Dissatisfaction with parsing theory
      has left modern programmers accustomed to procedural parsing.
      And in fact some problems are best tackled with procedural parsing.
    &lt;/p&gt;
    &lt;h3&gt;An example&lt;/h3&gt;
    &lt;p&gt;
      One such problem is parsing Perl-style here-documents.
      Peter Stuifzand has tackled this using
      &lt;a href=&quot;https://metacpan.org/release/JKEGL/Marpa-R2-2.052000&quot;&gt;
        the
        just-released version of Marpa::R2&lt;/a&gt;.
      For those unfamiliar, Perl allows documents to be incorporated
      into its source files in line-oriented fashion as &quot;here-documents&quot;.
      Here-documents can be used in expressions.
      The syntax to do this is very handy, if a little strange.
      For example,
    &lt;/p&gt;
    &lt;blockquote&gt;&lt;pre&gt;
say &amp;lt;&amp;lt;ENDA, &amp;lt;&amp;lt;ENDB, &amp;lt;&amp;lt;ENDC; say &amp;lt;&amp;lt;ENDD;
a
ENDA
b
ENDB
c
ENDC
d
ENDD&lt;/pre&gt;&lt;/blockquote&gt;
    &lt;p&gt;
      starts with a single line declaring four here-documents spread out
      over two
      &lt;tt&gt;say&lt;/tt&gt;
      statements.
      The expressions of the form
    &lt;/p&gt;&lt;blockquote&gt;&lt;pre&gt;&amp;lt;&amp;lt;ENDX&lt;/pre&gt;&lt;/blockquote&gt;&lt;p&gt;
      are here-document expressions.
      &lt;tt&gt;&amp;lt;&amp;lt;&lt;/tt&gt;
      is the heredoc operator.
      The string which follows it (in this example,
      &lt;tt&gt;ENDA&lt;/tt&gt;,
      &lt;tt&gt;ENDB&lt;/tt&gt;, etc.) is the heredoc terminator string --
      the string that will signal the end
      of the body of the here-document.
      The bodies of the here-documents follow, in order, over the next eight lines.
      More details of here-document syntax, with examples, can be found
      in
      &lt;a href=&quot;http://perldoc.perl.org/perlop.html#Quote-Like-Operators&quot;&gt;the
        Perl documentation&lt;/a&gt;.
    &lt;/p&gt;
    &lt;p&gt;All of this poses quite a challenge to a parser-lexer combination,
      which is one reason I chose it as an example --
      to illustrate that Marpa's SLIF support for procedural parsing can
      handle genuinely difficult cases.
      There are a few ways Marpa could approach this.
      The one
      Peter Stuifzand chose was
      to read the
      here-document's body as the value of the terminator in
      each
      &lt;tt&gt;&amp;lt;&amp;lt;ENDX&lt;/tt&gt;
      expression.
    &lt;/p&gt;
    &lt;p&gt;
      The strategy works this way:
      Marpa allows the application to mark certain lexemes as &quot;pause&quot; lexemes.
      Whenever a &quot;pause&quot; lexeme is encountered, Marpa's internal scanning stops,
      and control is handed over to the application.
      In this case, the application is set up to pause after every newline,
      and before the terminator in every here-document expression.
    &lt;/p&gt;
    &lt;p&gt;
      While reading the line containing the four here-document expressions,
      Marpa's SLIF pauses and resumes five times -- once for each here-document expression,
      then once for the final newline.
      Details can be found in compact form in the heavily commented code
      in
      &lt;a href=&quot;https://gist.github.com/jeffreykegler/5431739&quot;&gt;this
        GitHub gist&lt;/a&gt;.
    &lt;/p&gt;
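    &lt;p&gt;The order of events can be sketched without Marpa at all.
      The Python below is a hand-written illustration only:
      it is not Marpa's API and it is not the code in the gist.
      The names
      &lt;tt&gt;read_heredocs&lt;/tt&gt;,
      &lt;tt&gt;HEREDOC_OP&lt;/tt&gt;
      and
      &lt;tt&gt;HEREDOC_EXPR&lt;/tt&gt;
      are invented for this example,
      and it assumes bare-word terminators
      and a single declaring line.
      For each here-document expression on that line,
      it stops and reads that document's body from the lines that follow, in order.
      The operator is built with
      &lt;tt&gt;chr()&lt;/tt&gt;
      only to keep the listing free of markup-sensitive characters.
    &lt;/p&gt;
    &lt;blockquote&gt;&lt;pre&gt;
```python
import re

# The heredoc operator, two less-than signs, built via chr() so this
# listing stays free of markup-sensitive characters.
HEREDOC_OP = chr(60) * 2
HEREDOC_EXPR = re.compile(re.escape(HEREDOC_OP) + r"(\w+)")


def read_heredocs(source):
    """Return (terminator, body) pairs for the first line of source.

    Hypothetical helper: bodies are taken from the following lines,
    in the order their terminators appear on the declaring line.
    """
    lines = source.split("\n")
    declaring_line, rest = lines[0], lines[1:]
    position = 0          # index into rest of the next unread body line
    results = []
    for match in HEREDOC_EXPR.finditer(declaring_line):
        terminator = match.group(1)
        # "Pause" here: hand-written logic takes over and reads the body.
        if terminator not in rest[position:]:
            raise ValueError("unterminated here-document: " + terminator)
        end = rest.index(terminator, position)
        body = rest[position:end]
        position = end + 1    # step past the terminator line itself
        results.append((terminator, "\n".join(body)))
    return results
```
&lt;/pre&gt;&lt;/blockquote&gt;
    &lt;p&gt;Applied to the nine-line example above,
      this returns the terminators
      &lt;tt&gt;ENDA&lt;/tt&gt;
      through
      &lt;tt&gt;ENDD&lt;/tt&gt;
      paired with the bodies
      &lt;tt&gt;a&lt;/tt&gt;
      through
      &lt;tt&gt;d&lt;/tt&gt;,
      in the order the expressions appear on the declaring line.
    &lt;/p&gt;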
    &lt;h3&gt;Marpa as a better procedural parser&lt;/h3&gt;
    &lt;p&gt;So far I've talked in terms of Marpa &quot;allowing&quot; procedural parsing.
      In fact, there can be much more to it.
      Marpa can make procedural parsing easier and more accurate.
    &lt;/p&gt;
    &lt;p&gt;Marpa knows, at every point, which rules it is recognizing, and how far it
      is into them.
      Marpa also knows which new rules the grammar expects, and which terminals.
      The procedural parsing logic can consult this information to guide its decisions.
      Marpa can provide your procedural parsing logic with radar,
      as well as the option to use a very smart autopilot.
    &lt;/p&gt;
    &lt;h3&gt;For more about Marpa&lt;/h3&gt;
    &lt;p&gt;
      Marpa's latest version is
      &lt;a href=&quot;https://metacpan.org/module/Marpa::R2&quot;&gt;Marpa::R2,
        which is available on CPAN&lt;/a&gt;.
      Marpa's
      &lt;a href=&quot;https://metacpan.org/module/JKEGL/Marpa-R2-2.052000/pod/Scanless/DSL.pod&quot;&gt;SLIF
        is
        a new interface&lt;/a&gt;,
      which represents a major increase
      in Marpa's &quot;whipitupitude&quot;.
      The SLIF has tutorials
      &lt;a href=&quot;http://jeffreykegler.github.com/Ocean-of-Awareness-blog/individual/2013/01/dsl_simpler2.html&quot;&gt;here
      &lt;/a&gt;
      and
      &lt;a href=&quot;http://jeffreykegler.github.com/Ocean-of-Awareness-blog/individual/2013/01/announce_scanless.html&quot;&gt;
        here&lt;/a&gt;.
      Marpa has
      &lt;a href=&quot;http://jeffreykegler.github.com/Marpa-web-site/&quot;&gt;a web page&lt;/a&gt;,
      and of course it is the focus of
      &lt;a href=&quot;http://jeffreykegler.github.com/Ocean-of-Awareness-blog/&quot;&gt;
        my &quot;Ocean of Awareness&quot; blog&lt;/a&gt;.
    &lt;/p&gt;
    &lt;p&gt;
      Comments on this post
      can be sent to Marpa's Google Group:
      &lt;code&gt;marpa-parser@googlegroups.com&lt;/code&gt;
    &lt;/p&gt;</description>
  </item>
  <item>
    <title>What if languages were free?</title>
    <link>http://jeffreykegler.github.com/Ocean-of-Awareness-blog/individual/2013/03/what_if_free.html</link>
    <description>  &lt;p&gt;
      &lt;!--
      marpa_r2_html_fmt --no-added-tag-comment --no-ws-ok-after-start-tag
      --&gt;
    &lt;/p&gt;
    &lt;p&gt;In 1980, George Copeland wrote
      &lt;a href=&quot;http://dl.acm.org/citation.cfm?id=802685&quot;&gt;
    an article&lt;/a&gt;
    titled &quot;What if Mass Storage were Free?&quot;.
      Costs of mass storage were showing signs
      that they might fall dramatically.
      Copeland, as a thought exercise, took this trend to its extreme.
      Among other things, he predicted that deletion would become
      unnecessary, and in fact, undesirable.
    &lt;/p&gt;
    &lt;p&gt;Copeland's
      thought experiment has proved prophetic.
      For many purposes, mass storage is treated as if it were free.
      For example, you probably retrieved this blog post from a server
      provided to me at no charge, in the hope
      that I might write and upload something interesting.
    &lt;/p&gt;
    &lt;p&gt;
      Until now languages were high-cost efforts.
      Worse, language projects ran a high risk of disappointment,
      up to and including total failure.
      I believe those days are coming to an end.
    &lt;/p&gt;
    &lt;h3&gt;Small languages, shaped to the problem domain&lt;/h3&gt;
    &lt;p&gt;What if whenever you needed a new language, poof, it was there?
      You would be encouraged to tackle each problem domain with
      a new language dedicated to dealing with that domain.
      Since each language is no larger than its problem domain,
      learning a language would be essentially the same as learning
      the problem domain.
      The incremental effort required to learn the language
      itself would head toward zero.
    &lt;/p&gt;
    &lt;h3&gt;No more language bloat&lt;/h3&gt;
    &lt;p&gt;Language bloat would end.
      Currently, the risk and cost of developing languages
      make it imperative to extend the ones we have.
      Free languages mean fewer reasons to add features
      to existing languages.
    &lt;/p&gt;
    &lt;h3&gt;No more search for THE perfect language&lt;/h3&gt;
    &lt;p&gt;
      No language is perfect for all tasks.
      But because the high cost of languages favors
      large, general-purpose languages,
      we are compelled to try for perfection anyway.
      Ironically, we are often making the language worse,
      and we know it.
    &lt;/p&gt;
    &lt;h3&gt;A world full of perfect languages&lt;/h3&gt;
    &lt;p&gt;An older sense of the word perfect is
      &quot;having all the properties or qualities requisite to its nature and kind&quot;.
      The C language might be called perfect in this sense.
      C lacks a lot of features that are highly desirable in most contexts.
      But for programming that is portable
      and close to the hardware,
      the C language is perfect or close to it.
      If languages were free, this is the kind of perfection
      that we would seek --
      languages precisely fitted to their domain,
      so that adding to them cannot make them better.
    &lt;/p&gt;
    &lt;h3&gt;Moving toward free&lt;/h3&gt;
    &lt;p&gt;
      My own effort to contribute to 
      a fall in the cost of languages is the Marpa parser.
      Marpa produces a reasonable parser
      for every language you can write in BNF.
      If the BNF is for a grammar in any of the classes currently in practical
      use, the parser Marpa produces will have linear speed.
      In one case, using Marpa,
      &lt;a href=&quot;https://gist.github.com/4447349&quot;&gt;a targeted language&lt;/a&gt;
      was written
      in less than an hour.
      &lt;a href=&quot;http://blogs.perl.org/users/jeffrey_kegler/2013/01/a-language-for-writing-languages.html&quot;&gt;
        More typically&lt;/a&gt;, Marpa reduces the time needed to create new languages to hours.
    &lt;/p&gt;
    &lt;p&gt;As one example of going from &quot;impossible&quot; to &quot;easy&quot;,
      I have written a drop-in solution to an example in the
      &lt;a href=&quot;http://en.wikipedia.org/wiki/Design_Patterns&quot;&gt;Gang
        of Four book&lt;/a&gt;.
      The Gang of Four described a language
      and its interpretation,
      but they did not include a parser.
      Creating a parser
      to fit their example would have been
      impossibly hard when the Gang of Four wrote.
      Using Marpa, it is easy.
      The parser can be found in
      &lt;a href=&quot;http://jeffreykegler.github.com/Ocean-of-Awareness-blog/individual/2013/03/bnf_to_ast.html&quot;&gt;this
        earlier blog post&lt;/a&gt;.
    &lt;/p&gt;
    &lt;p&gt;
      Marpa's latest version is
      &lt;a href=&quot;https://metacpan.org/module/Marpa::R2&quot;&gt;Marpa::R2,
        which is available on CPAN&lt;/a&gt;.
      Recently, it has gained immensely in &quot;whipitupitude&quot; with
      &lt;a href=&quot;https://metacpan.org/module/JKEGL/Marpa-R2-2.048000/pod/Scanless/DSL.pod&quot;&gt;
        a new interface&lt;/a&gt;,
      which has tutorials
      &lt;a href=&quot;http://jeffreykegler.github.com/Ocean-of-Awareness-blog/individual/2013/01/dsl_simpler2.html&quot;&gt;here
      &lt;/a&gt;
      and
      &lt;a href=&quot;http://jeffreykegler.github.com/Ocean-of-Awareness-blog/individual/2013/01/announce_scanless.html&quot;&gt;
        here&lt;/a&gt;.
      Marpa has
      &lt;a href=&quot;http://jeffreykegler.github.com/Marpa-web-site/&quot;&gt;a web page&lt;/a&gt;,
      and of course it is the focus of
      &lt;a href=&quot;http://jeffreykegler.github.com/Ocean-of-Awareness-blog/&quot;&gt;
        my &quot;Ocean of Awareness&quot; blog&lt;/a&gt;.
    &lt;/p&gt;
    &lt;p&gt;
      Comments on this post
      can be sent to Marpa's Google Group:
      &lt;code&gt;marpa-parser@googlegroups.com&lt;/code&gt;
    &lt;/p&gt;</description>
  </item>
  <item>
    <title>The Interpreter Design Pattern</title>
    <link>http://jeffreykegler.github.com/Ocean-of-Awareness-blog/individual/2013/03/interpreter.html</link>
    <description>  &lt;p&gt;
      &lt;!--
      marpa_r2_html_fmt --no-added-tag-comment --no-ws-ok-after-start-tag
      --&gt;
    &lt;/p&gt;
    &lt;p&gt;The influential
      &lt;a href=&quot;http://en.wikipedia.org/wiki/Design_Patterns&quot;&gt;
        &lt;em&gt;Design Patterns&lt;/em&gt;
        book&lt;/a&gt;
      lays out 23 patterns for programming.
      One of them, the Interpreter Pattern, is rarely used.
      Steve Yegge puts it a bit more strikingly -- he says
      that the book contains
      &lt;a href=&quot;https://sites.google.com/site/steveyegge2/ten-great-books&quot;&gt;22
        patterns and a practical joke&lt;/a&gt;.
    &lt;/p&gt;
    &lt;p&gt;That sounds (and in fact is) negative, but
      &lt;a href=&quot;http://steve-yegge.blogspot.com/2007/12/codes-worst-enemy.html&quot;&gt;
        elsewhere&lt;/a&gt;
      Yegge says that
      &quot;[t]ragically, the only [Go4] pattern that can help code get smaller
      (Interpreter) is utterly ignored by programmers&quot;.
      (The
      &lt;i&gt;Design Patterns&lt;/i&gt;
      book has four authors,
      and is often called the Gang of Four book, or Go4.)
    &lt;/p&gt;
    &lt;p&gt;
      In fact, under various names and definitions, the
      Interpreter Pattern and its close relatives and/or identical twins
      are widely cited,
      much argued and highly praised&lt;a href=&quot;#NOTE1&quot;&gt;[1]&lt;/a&gt;.
          As they should be.
          Languages are the most powerful and flexible design pattern of all.
          A language can include all, and only, the concepts relevant
          to your domain.
          A language can allow you to relate them in all, and only, the appropriate ways.
          A language can identify errors with pinpoint precision,
          hide implementation details,
          allow invisible &quot;drop-in&quot; enhancements, etc., etc., etc.
        &lt;/p&gt;
    &lt;p&gt;
      In fact, languages are so powerful and flexible
      that their use is pretty much universal.
      The choice is not whether or not to use a language to solve
      the problem,
      but whether to use
      a general-purpose language,
      or a domain-specific language.
      Put another way,
      if you decide not to use a language targeted
      to your domain,
      it almost always means that you
      are choosing to use another language that is not specifically
      fitted to your domain.
    &lt;/p&gt;
    &lt;p&gt;
      Why, then, is the Interpreter Pattern so little used?
      Why does Yegge call it a practical joke?
    &lt;/p&gt;
    &lt;h3&gt;There's a problem&lt;/h3&gt;
    &lt;p&gt;The problem with the Interpreter Pattern is that you must
      turn your language into an AST --
      that is,
      you must parse it somehow.
      Simplifying the language can help here.
      But if the point is to be simple at the expense of power
      and flexibility,
      you might as well
      stick with the other 22 design patterns.
    &lt;/p&gt;
    &lt;p&gt;
      On the other hand,
      creating a parser for anything but the simplest languages
      has been a time-consuming effort,
      and one of a kind known for disappointing results.
      In fact,
      language development efforts run
      a real risk of total failure.
    &lt;/p&gt;
    &lt;p&gt;How did the Go4 deal with this?
      They defined the problem away.
      They stated that the parsing issue was separate from the
      Interpreter Pattern, which was limited to what you did with the AST
      once you'd somehow come up with one.
    &lt;/p&gt;
    &lt;p&gt;
      But AST's don't (so to speak) grow on trees.
      You have to get one from somewhere.
      In their example, the Go4 simply built an AST in their code,
      node by node.
      In doing this, they bypassed the BNF and the problem of parsing.
      But they also bypassed their language and the whole point
      of the Interpreter Pattern.
    &lt;/p&gt;
    &lt;p&gt;
      Which is why Yegge characterized the chapter as a practical joke.
      And why other programming techniques and patterns are almost
      always preferred to the Interpreter Pattern.
    &lt;/p&gt;
    &lt;h3&gt;Finding that one missing piece&lt;/h3&gt;
    &lt;p&gt;So that's how the Go4 left things.
      A potentially great programming technique,
      made almost useless because
      of a missing piece.
      There was no easy, general, and practical way to generate AST's.
    &lt;/p&gt;
    &lt;p&gt;
      Few expected that to change.
      I was more optimistic than most.
      In 2007 I embarked on a full-time project:
      to create a parser based on Earley's algorithm.
      I was sure that it would fulfill two of the criteria --
      it would be easy to use, and it would be general.
      As for practical -- well, a lot of parsing problems
      are small, and a lot of applications don't require a lot
      of speed, and for these I expected the result to be good enough.
    &lt;/p&gt;
    &lt;p&gt;What I didn't realize was that
      all of the problems preventing
      Earley's from seeing real, practical use
      had already been solved in the academic literature.
      I was not alone in not having put the picture together.
      The people who had solved the problems
      had focused on two disjoint sets of issues,
      and were unaware of each other's
      work.
      In 1991, in the Netherlands,
      the mathematician Joop Leo had
      arrived at an astounding result --
      he showed how to make Earley's run in linear time for LR-regular grammars.
      LR-regular is a vast class of grammars.
      It easily includes, as a proper subset, every class of grammar now
      in practical use -- regular expressions, PEG, recursive descent,
      the LALR on which yacc and bison are based, you name it.
      (For those into the math,
      LR-regular includes LR(k)
      for all &lt;i&gt;k&lt;/i&gt;,
      and therefore LL(k),
      also for all &lt;i&gt;k&lt;/i&gt;.)
      &lt;/p&gt;
      &lt;p&gt;
      Leo's mathematical approach did not address some nagging practical issues,
      foremost among them the handling of nullable rules and symbols.
      But ten years later in Canada,
      Aycock and Horspool focused on exactly these issues,
      and solved them.
      Aycock-Horspool
      seem to have been unaware of Leo's earlier result.
      The time complexity of the Aycock-Horspool
      algorithm was essentially that of
      Earley's original algorithm.
    &lt;/p&gt;
    &lt;p&gt;
      Because of Leo's work,
      for any grammar in any class currently in practical use,
      an Earley parser could be fast.
      If only it could be combined with the approach
      of Aycock and Horspool, I realized,
      Leo's speeds could be available in an everyday programming tool.
    &lt;/p&gt;
    &lt;p&gt;
      In changing the Earley parse engine,
      Aycock-Horspool and Leo had branched off in different directions.
      It was not obvious that their approaches could be combined, much less how.
      And in fact, the combination of the two is not a simple algorithm.
      But it is fast,
      and the new Marpa parse engine makes full information
      about the state of the parse (rules recognized, symbols expected, etc.)
      available as it proceeds.
      This is very convenient for, among other things, error reporting.
    &lt;/p&gt;
    &lt;h3&gt;Eureka and all that&lt;/h3&gt;
    &lt;p&gt;The result is an algorithm which parses anything
      you can write in BNF and
      does it in times considered optimal in practice.
      Unlike recursive descent, you don't have to write out the parser --
      Marpa generates a parser for you, from the BNF.
      It's the easy, &quot;drop-in&quot; solution that the Go4 needed and did not have.
      A reworking of the Go4 example, with the missing parser added,
      is in
      &lt;a href=&quot;http://jeffreykegler.github.com/Ocean-of-Awareness-blog/individual/2013/03/bnf_to_ast.html&quot;&gt;a
        previous blog post&lt;/a&gt;, and the code for the reworking is in
      &lt;a href=&quot;https://gist.github.com/jeffreykegler/5121769&quot;&gt;
        a Github gist&lt;/a&gt;.
    &lt;/p&gt;
    &lt;h3&gt;More about Marpa&lt;/h3&gt;
    &lt;p&gt;
      Marpa's latest version is
      &lt;a href=&quot;https://metacpan.org/module/Marpa::R2&quot;&gt;Marpa::R2,
        which is available on CPAN&lt;/a&gt;.
      Recently, it has gained immensely in &quot;whipitupitude&quot; with
      &lt;a href=&quot;https://metacpan.org/module/JKEGL/Marpa-R2-2.048000/pod/Scanless/DSL.pod&quot;&gt;
        a new interface&lt;/a&gt;,
      which has tutorials
      &lt;a href=&quot;http://jeffreykegler.github.com/Ocean-of-Awareness-blog/individual/2013/01/dsl_simpler2.html&quot;&gt;here
      &lt;/a&gt;
      and
      &lt;a href=&quot;http://jeffreykegler.github.com/Ocean-of-Awareness-blog/individual/2013/01/announce_scanless.html&quot;&gt;
        here&lt;/a&gt;.
      Marpa has
      &lt;a href=&quot;http://jeffreykegler.github.com/Marpa-web-site/&quot;&gt;a web page&lt;/a&gt;,
      and of course it is the focus of
      &lt;a href=&quot;http://jeffreykegler.github.com/Ocean-of-Awareness-blog/&quot;&gt;
        my &quot;Ocean of Awareness&quot; blog&lt;/a&gt;.
    &lt;/p&gt;
    &lt;p&gt;
      Comments on this post
      can be sent to the Marpa Google Group:
      &lt;code&gt;marpa-parser@googlegroups.com&lt;/code&gt;
    &lt;/p&gt;
    &lt;h3&gt;Notes&lt;/h3&gt;
    &lt;p&gt;&lt;a name=&quot;NOTE1&quot;&gt;Note 1&lt;/a&gt;:
      For example,
      &lt;a href=&quot;http://en.wikipedia.org/wiki/Domain-specific_language&quot;&gt;the Wikipedia article on DSL's&lt;/a&gt;;
      &lt;a href=&quot;http://www.faqs.org/docs/artu/minilanguageschapter.html&quot;&gt;Eric Raymond discussing mini-languages&lt;/a&gt;;
      &lt;a href=&quot;http://www.dmst.aueb.gr/dds/pubs/jrnl/2000-JSS-DSLPatterns/html/dslpat.html&quot;&gt;
        &quot;Notable Design Patterns for Domain-Specific Languages&quot;&lt;/a&gt;, Diomidis Spinellis; and
      &lt;a href=&quot;http://www.c2.com/cgi/wiki?DomainSpecificLanguage&quot;&gt;the c2.com wiki&lt;/a&gt;.
    &lt;/p&gt;</description>
  </item>
  <item>
    <title>BNF to AST</title>
    <link>http://jeffreykegler.github.com/Ocean-of-Awareness-blog/individual/2013/03/bnf_to_ast.html</link>
    <description>  &lt;p&gt;
      &lt;!--
      marpa_r2_html_fmt --no-added-tag-comment --no-ws-ok-after-start-tag
      --&gt;
      The latest version of
      &lt;a href=&quot;https://metacpan.org/module/Marpa::R2&quot;&gt;
      Marpa&lt;/a&gt; takes parsing &quot;whipitupitude&quot; one step further.
      You can now go straight from
      a BNF description of your language,
      and an input string,
      to an abstract syntax tree (AST).
    &lt;/p&gt;
    &lt;p&gt;To illustrate, I'll use an example from the
      Gang of Four's (Go4's) chapter
      on the Interpreter pattern.
      (It's pages 243-255 of the
      &lt;a href=&quot;http://en.wikipedia.org/wiki/Design_Patterns&quot;&gt;
      &lt;em&gt;Design Patterns&lt;/em&gt; book&lt;/a&gt;.)
      The Go4 knew of no easy general way to go from BNF to AST,
      so they dealt with that part of the interpreter problem
      by punting --
      they did not even try to parse the input string.
      Instead they set aside the BNF they'd just presented and
      constructed an AST directly in their code.
    &lt;/p&gt;
    &lt;p&gt;The reason the Go4 didn't know of an easy,
    generally-applicable way
      to parse their example was that
      there was none.
      Now there is.
      In this post, Marpa will take us
      quickly and easily
      from BNF to AST.
      (Full code for this post can
      be found in
      &lt;a href=&quot;https://gist.github.com/jeffreykegler/5121769&quot;&gt;a Github gist&lt;/a&gt;.)
    &lt;/p&gt;
    &lt;p&gt;
      The Go4's example was a simple boolean expression language,
      whose primary input was
    &lt;/p&gt;
    &lt;blockquote&gt;
      &lt;pre&gt;
true and x or y and not x
&lt;/pre&gt;
    &lt;/blockquote&gt;
    &lt;p&gt;Here, in full, is the BNF for a slight elaboration of the
      Go4 example.
      It is written in the DSL for Marpa's Scanless interface (SLIF DSL),
      and includes specifications for building the AST.
    &lt;/p&gt;&lt;blockquote&gt;
      &lt;pre&gt;
:default ::= action =&amp;gt; ::array

:start ::= &amp;lt;boolean expression&amp;gt;
&amp;lt;boolean expression&amp;gt; ::=
       &amp;lt;variable&amp;gt; bless =&amp;gt; variable
     | '1' bless =&amp;gt; constant
     | '0' bless =&amp;gt; constant
     | ('(') &amp;lt;boolean expression&amp;gt; (')') action =&amp;gt; ::first bless =&amp;gt; ::undef
    || ('not') &amp;lt;boolean expression&amp;gt; bless =&amp;gt; not
    || &amp;lt;boolean expression&amp;gt; ('and') &amp;lt;boolean expression&amp;gt; bless =&amp;gt; and
    || &amp;lt;boolean expression&amp;gt; ('or') &amp;lt;boolean expression&amp;gt; bless =&amp;gt; or

&amp;lt;variable&amp;gt; ~ [[:alpha:]] &amp;lt;zero or more word characters&amp;gt;
&amp;lt;zero or more word characters&amp;gt; ~ [\w]*

:discard ~ whitespace
whitespace ~ [\s]+
&lt;/pre&gt;
    &lt;/blockquote&gt;
    &lt;p&gt;This syntax should be fairly transparent.
      In previous posts I've given
      &lt;a href=&quot;http://jeffreykegler.github.com/Ocean-of-Awareness-blog/individual/2013/01/dsl_simpler2.html&quot;&gt;
        a tutorial&lt;/a&gt;,
      and
      &lt;a href=&quot;http://jeffreykegler.github.com/Ocean-of-Awareness-blog/individual/2013/01/announce_scanless.html&quot;&gt;a
        mini-tutorial&lt;/a&gt;.
      And of course, the interface is
      &lt;a href=&quot;https://metacpan.org/module/JKEGL/Marpa-R2-2.048000/pod/Scanless/DSL.pod&quot;&gt;
        documented&lt;/a&gt;.
    &lt;/p&gt;
    &lt;p&gt;For those skimming, here are a few quick comments on less-obvious features.
      To guide Marpa in building the AST,
      the BNF statements have
      &lt;tt&gt;action&lt;/tt&gt;
      and
      &lt;tt&gt;bless&lt;/tt&gt;
      adverbs.
      The
      &lt;tt&gt;bless&lt;/tt&gt;
      adverbs indicate a Perl class into which the node should be
      blessed.
      This is convenient for using an object-oriented approach with the AST.
      The
      &lt;tt&gt;action&lt;/tt&gt;
      adverb tells Marpa how to build the nodes.
      &quot;&lt;tt&gt;action =&amp;gt; ::array&lt;/tt&gt;&quot; means the result of the rule should
      be an array containing its child nodes.
      &quot;&lt;tt&gt;action =&amp;gt; ::first&lt;/tt&gt;&quot; means the result of the rule should just be
      its first child.
      Many of the child symbols,
      especially literal strings of a structural nature,
      are in parentheses.
      This makes them invisible to
      the semantics.
    &lt;/p&gt;
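    &lt;p&gt;To make the object-oriented point concrete,
      here is a minimal sketch in plain Perl, with no Marpa involved.
      It builds two nodes by hand in the shape the
      &lt;tt&gt;bless&lt;/tt&gt;
      adverbs produce,
      assuming the class-name prefix
      &lt;tt&gt;Boolean_Expression&lt;/tt&gt;
      that is set later in this post.
      The accessor methods
      &lt;tt&gt;name&lt;/tt&gt;
      and
      &lt;tt&gt;operand&lt;/tt&gt;
      are hypothetical names chosen for illustration;
      they are not part of Marpa.
    &lt;/p&gt;&lt;blockquote&gt;
      &lt;pre&gt;
use strict;
use warnings;

package Boolean_Expression::variable;

# Hypothetical accessor: a variable node is an array
# whose only child is the variable's name.
sub name { my ($self) = @_; return $self-&gt;[0]; }

package Boolean_Expression::not;

# Hypothetical accessor: a 'not' node is an array
# whose only child is the negated expression.
sub operand { my ($self) = @_; return $self-&gt;[0]; }

package main;

# Hand-built stand-ins for nodes that Marpa would create.
my $x     = bless ['x'], 'Boolean_Expression::variable';
my $not_x = bless [$x],  'Boolean_Expression::not';

print ref $not_x, &quot;\n&quot;;               # prints Boolean_Expression::not
print $not_x-&gt;operand-&gt;name, &quot;\n&quot;;    # prints x
&lt;/pre&gt;
    &lt;/blockquote&gt;
    &lt;p&gt;Because each node carries its class,
      behavior can be attached per node type
      simply by defining methods in the matching package.
    &lt;/p&gt;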
    &lt;p&gt;A
      &lt;tt&gt;:default&lt;/tt&gt;
      pseudo-rule specifies the defaults -- in this case the
      &quot;&lt;tt&gt;action =&amp;gt; ::array&lt;/tt&gt;&quot; adverb setting.
      The
      &lt;tt&gt;:start&lt;/tt&gt;
      pseudo-rule specifies the start symbol.
      The &lt;tt&gt;:discard&lt;/tt&gt; pseudo-rule
      indicates that whitespace is to be discarded.
    &lt;/p&gt;
    &lt;p&gt;The Go4 did not deal with precedence.
      In their example, the input string is fully parenthesized,
      even though its priorities are the standard ones.
      I've eliminated the parentheses, because
      the standard precedence is implemented in the SLIF grammar.
      The double vertical bar (&quot;&lt;tt&gt;||&lt;/tt&gt;&quot;) is a &quot;loosen&quot; operator --
      an alternative after a &quot;loosen&quot; operator will be
      at a looser precedence than the one before.
      Alternatives separated by a single bar are at the same precedence.
    &lt;/p&gt;&lt;h3&gt;Creating the AST&lt;/h3&gt;&lt;p&gt;
      Creating the AST is simple.
      First, we use Marpa to turn the above DSL for boolean expressions
      into a parser.
      (We'd saved the SLIF DSL source in the string
      &lt;tt&gt;$rules&lt;/tt&gt;.)
    &lt;/p&gt;&lt;blockquote&gt;
      &lt;pre&gt;
my $grammar = Marpa::R2::Scanless::G-&gt;new(
    {   bless_package =&gt; 'Boolean_Expression',
        source        =&gt; \$rules,
    }   
);  
&lt;/pre&gt;
    &lt;/blockquote&gt;
    &lt;p&gt;Next we define a closure that uses
      &lt;tt&gt;$grammar&lt;/tt&gt;
      to turn
      BNF into AST's.
    &lt;/p&gt;&lt;blockquote&gt;
      &lt;pre&gt;
sub bnf_to_ast {
    my ($bnf) = @_;
    my $recce = Marpa::R2::Scanless::R-&gt;new( { grammar =&gt; $grammar } );
    $recce-&gt;read( \$bnf );
    my $value_ref = $recce-&gt;value();
    if ( not defined $value_ref ) {
        die &quot;No parse for $bnf&quot;;
    }
    return ${$value_ref};
} ## end sub bnf_to_ast
&lt;/pre&gt;
    &lt;/blockquote&gt;&lt;p&gt;
Where &lt;tt&gt;$bnf&lt;/tt&gt; is our input string,
we run it as follows:
    &lt;/p&gt;&lt;blockquote&gt;
      &lt;pre&gt;
my $ast1 = bnf_to_ast($bnf);
&lt;/pre&gt;
    &lt;/blockquote&gt;
    &lt;h3&gt;The AST&lt;/h3&gt;
    &lt;p&gt;If we use Data::Dumper to examine the AST,
    &lt;/p&gt;&lt;blockquote&gt;
      &lt;pre&gt;
say Data::Dumper::Dumper($ast1) if $verbose_flag;
&lt;/pre&gt;
    &lt;/blockquote&gt;&lt;p&gt;
      we see this:
    &lt;/p&gt;&lt;blockquote&gt;
      &lt;pre&gt;
$VAR1 = bless( [
                 bless( [
                          bless( [
                                   'true'
                                 ], 'Boolean_Expression::variable' ),
                          bless( [
                                   'x'
                                 ], 'Boolean_Expression::variable' )
                        ], 'Boolean_Expression::and' ),
                 bless( [
                          bless( [
                                   'y'
                                 ], 'Boolean_Expression::variable' ),
                          bless( [
                                   bless( [
                                            'x'
                                          ], 'Boolean_Expression::variable' )
                                 ], 'Boolean_Expression::not' )
                        ], 'Boolean_Expression::and' )
               ], 'Boolean_Expression::or' );
&lt;/pre&gt;
    &lt;/blockquote&gt;
    &lt;h3&gt;Processing the AST&lt;/h3&gt;
    &lt;p&gt;In their example,
    the Go4 processed their AST in several ways:
    straight evaluation, copying,
      and substitution of the occurrences of a variable in one boolean expression
      by another boolean expression.
      It is obvious that the AST above is the computational
      equivalent of the Go4's AST,
      but for the sake of completeness I carry out the same operations
      &lt;a href=&quot;https://gist.github.com/jeffreykegler/5121769&quot;&gt;in the Github gist&lt;/a&gt;.
    &lt;/p&gt;
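    &lt;p&gt;As an illustration of the first of these operations,
      here is a minimal sketch of straight evaluation.
      It is not the code from the gist.
      To run without Marpa, it rebuilds the AST shown above by hand,
      and the names
      &lt;tt&gt;node&lt;/tt&gt;,
      &lt;tt&gt;evaluate&lt;/tt&gt;
      and
      &lt;tt&gt;%value_of&lt;/tt&gt;
      are assumptions made for this example.
      Note that in this grammar the constants are
      &lt;tt&gt;'1'&lt;/tt&gt;
      and
      &lt;tt&gt;'0'&lt;/tt&gt;,
      so
      &lt;tt&gt;true&lt;/tt&gt;
      is parsed as an ordinary variable and needs a value like any other.
    &lt;/p&gt;&lt;blockquote&gt;
      &lt;pre&gt;
use strict;
use warnings;

# Build a node by hand, in the shape of the dump above.
sub node {
    my ( $type, @children ) = @_;
    return bless [@children], 'Boolean_Expression::' . $type;
}

# The AST for: true and x or y and not x
my $ast = node(
    'or',
    node( 'and', node( 'variable', 'true' ), node( 'variable', 'x' ) ),
    node(
        'and',
        node( 'variable', 'y' ),
        node( 'not', node( 'variable', 'x' ) )
    )
);

# Hypothetical evaluator: walk the tree recursively, choosing the
# operation from the class each node was blessed into.
# Returns 1 or 0.
sub evaluate {
    my ( $node, $value_of ) = @_;
    my ($type) = ref($node) =~ m/ :: (\w+) \z/xms;
    my @children = @{$node};
    if ( $type eq 'variable' ) {
        my $name = $children[0];
        die 'No value for variable ' . $name
            if not exists $value_of-&gt;{$name};
        return $value_of-&gt;{$name} ? 1 : 0;
    }
    return $children[0] ? 1 : 0 if $type eq 'constant';

    # Every other node type has expressions as children.
    my @values = map { evaluate( $_, $value_of ) } @children;
    return $values[0] ? 0 : 1 if $type eq 'not';
    if ( $type eq 'and' ) { return $values[0] ? $values[1] : 0; }
    if ( $type eq 'or' )  { return $values[0] ? 1 : $values[1]; }
    die 'Unknown node type: ' . $type;
}

my %value_of = ( true =&gt; 1, x =&gt; 0, y =&gt; 1 );
print evaluate( $ast, \%value_of ), &quot;\n&quot;;    # prints 1
&lt;/pre&gt;
    &lt;/blockquote&gt;
    &lt;p&gt;This sketch dispatches on the class name inside a single function.
      Putting an
      &lt;tt&gt;evaluate&lt;/tt&gt;
      method in each node class would work equally well.
    &lt;/p&gt;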
    &lt;p&gt;
      AST creation via Marpa's SLIF is self-hosting --
      the SLIF DSL is parsed into an AST,
      and a parser created by interpreting the AST.
      The Marpa SLIF DSL source file in this post,
      that describes boolean expressions,
      was itself turned into an AST on its way to becoming a parser
      that turns boolean expressions into AST's.
    &lt;/p&gt;&lt;h3&gt;Comments&lt;/h3&gt;
    &lt;p&gt;
      Comments on this post
      can be sent to the Marpa Google Group:
      &lt;code&gt;marpa-parser@googlegroups.com&lt;/code&gt;
    &lt;/p&gt;</description>
  </item>
  </channel>
</rss>
