Posts about Marpa: an annotated list

Tutorials

Practical general parsing makes writing a DSL much simpler than it has been in the past. The posts in this series of tutorials are among my most popular. Although you can join the series starting at a topic of special interest, the easiest way to read (or at least skim) them is in the order listed.

If you're looking for tutorials, you'll want to start, and probably to end, with those based on the SLIF interface. The best of mine for the beginner are above, in this section. The section on programming language design also contains some posts whose examples work as SLIF-based tutorials. More tutorials can be found in the section on combinator parsing.

While SLIF-based tutorials should be strongly preferred, my pre-SLIF tutorials, described below, may still have some value. They contain many useful examples, and most of the code in them carries over to SLIF line-for-line.

About Marpa in general

Parsing timeline and history

In the process of writing Marpa, I read and thought a lot about the various approaches to parsing, and how they developed. When I came to share my thoughts, I discovered that quite a few people were also interested -- the posts in this section are my most popular.

Marpa and combinator parsing

Marpa is a better way of doing combinator parsing, as this set of tutorials sets out to show.

Marpa versus Perl 6 grammars and PEG

In these blog posts, I discuss Perl 6 grammars, and compare them with Marpa. Perl 6 grammars are a PEG variant. I believe that Perl 6 grammars and PEG have a place when used as a "better" regular expression engine. But it can be shown that users (and language designers) whose hopes go beyond that are doomed to disappointment. In particular, if your interest is language-oriented programming, PEG and Perl 6 grammars lack the horsepower and the programmability to advance the state of the art.

I'm a fan of the Perl 6 effort. And I certainly should be their supporter, after the many favors they've done for me and the Marpa community over the years. The considerations in these posts will disappoint the hopes for LOP applications of the native Perl 6 parser. But Perl 6 is much more than that. It would be a mistake to read these posts as part of a dismissal of the Perl 6 effort as a whole.
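As a rough illustration of the kind of limit at issue, here is a minimal sketch of PEG's ordered choice. It is a toy written for this list, not code from the posts, and it models neither Perl 6 grammars nor Marpa; the function names `peg_match` and `general_match` are hypothetical. It assumes the standard PEG rule that once an alternative of a choice succeeds, the later alternatives are never tried, even if the rest of the rule then fails.

```python
# Toy illustration only: hand-rolled matchers, not Perl 6 grammars and not Marpa.

ALTERNATIVES = ("a", "ab")


def peg_match(text):
    """PEG reading of  S <- ('a' / 'ab') 'c'  with ordered, committed choice."""
    for alt in ALTERNATIVES:
        if text.startswith(alt):
            # The first alternative that matches wins. If the remainder
            # is not 'c', the rule fails; the choice is never revisited.
            return text[len(alt):] == "c"
    return False


def general_match(text):
    """BNF reading of  S ::= ('a' | 'ab') 'c'  where every alternative counts."""
    return any(text == alt + "c" for alt in ALTERNATIVES)


if __name__ == "__main__":
    for sample in ("ac", "abc"):
        print(sample, peg_match(sample), general_match(sample))
```

Both readings accept "ac", but only the general reading accepts "abc": the PEG version commits to 'a' and never considers 'ab'. Reordering the alternatives fixes this small case, but finding such an ordering is the grammar writer's job.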

Language-oriented programming

If you're interested in language-oriented programming with Marpa, the blog posts "Fast, handy languages", "Grammar reuse" and "What are the reasonable computer languages" will also be of interest.

Designing languages

Marpa allows things to be done which could not be done before. Many of these things are significant for language design -- they allow languages to take radically new and aggressive approaches to their syntax.

How to parse HTML

This series describes a new strategy for parsing liberal HTML, using Marpa and the Ruby Slippers techniques it makes possible. It is the most fully worked-out example of Marpa use that I've done to date.

The Marpa-based HTML reformatter, html_fmt, is the Marpa application that I use the most. In fact, as I am typing this HTML, I am using it repeatedly as a filter, much in the same way that you'd use GNU indent for C code or perltidy for Perl.

When I left off with the first series, I mentioned that an HTML parser really should be configurable -- the user/application should be able to decide which variant of HTML they want to parse. The result is a parser driven by a configuration file that allows the user to reshape HTML. And, by the way, the configurable version is almost twice as fast.
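For readers who have not met the Ruby Slippers idea, the following is a minimal sketch of it under stated assumptions. It is not Marpa's API and not the code of html_fmt; the class `StrictNestingParser` and the function `ruby_slippers_parse` are hypothetical names invented for this illustration, and the "grammar" is reduced to strictly nested tags. The assumption is that the parser can say which token it would accept next, so that when a real token is rejected, a virtual token can be supplied and the real one retried.

```python
# Toy sketch of the Ruby Slippers idea. NOT Marpa's API: all names are hypothetical.

class StrictNestingParser:
    """Accepts only perfectly nested tags, e.g. p b /b /p."""

    def __init__(self):
        self.open_tags = []   # stack of currently open elements
        self.accepted = []    # tokens accepted so far, real and virtual

    def expected_end_tag(self):
        """Return the only end tag acceptable right now, or None."""
        return "/" + self.open_tags[-1] if self.open_tags else None

    def read(self, token):
        """Try to consume one token; return False if it is rejected."""
        if token.startswith("/"):
            if token != self.expected_end_tag():
                return False
            self.open_tags.pop()
        else:
            self.open_tags.append(token)
        self.accepted.append(token)
        return True


def ruby_slippers_parse(tokens):
    """Feed liberal input to the strict parser, inventing tokens on demand."""
    parser = StrictNestingParser()
    for token in tokens:
        # On rejection, hand the parser the token it wants, then retry.
        while not parser.read(token):
            wanted = parser.expected_end_tag()
            if wanted is None:
                raise ValueError(f"cannot repair input at {token!r}")
            parser.read(wanted)
    # At end of input, close whatever is still open.
    while parser.open_tags:
        parser.read(parser.expected_end_tag())
    return parser.accepted


if __name__ == "__main__":
    # The missing </b> is supplied as a virtual token.
    print(ruby_slippers_parse(["p", "b", "/p"]))
```

The point of the design is that the grammar stays strict and simple, while the liberality lives in the small loop that asks the parser what it expects. Real HTML needs far more than this stack, which is where a general parser comes in.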

Marpa v. other parsers

These posts compare Marpa directly to other parsers, with benchmarks.

The Bovicide Series

Yacc and its derivatives are considered the standard in parser generators. Yacc is part of a great series of traditions it was my privilege to watch being formed, and I have the greatest respect for its inventors. But to my mind yacc is one tradition more honored in the breach than in the observance. In this series I describe what will be required for a parsing algorithm to displace yacc's LALR, and why I think the Marpa algorithm has what it takes and more. The first three requirements are described in my first post, Killing Yacc: 1, 2 & 3.

Error reporting has long been overlooked, and it is something at which yacc was astonishingly bad. I looked at the issue of development-time error reporting in Why the Bovicidal Rage?

And in Bovicide 5: Parse-time Error Reporting, I looked at the other half of error reporting.

Both hand-written recursive descent and Marpa meet, to various degrees, all five of the previous requirements. The last requirement, stated in Bovicide 6: The Final Requirement, was the tie-breaker. Hand-written recursive descent has many strengths, but it will never be available as a library.

The "Perl and parsing" series

My "Perl and parsing" series used Perl to illustrate parsing theory and parsing theory to investigate Perl. I am not one of those who believe that mathematics takes place in a vacuum -- it follows trends and even what are more accurately called fashions. Therefore this series has a lot of discussion of how current parsing theory came to be.

Natural Language Processing

Theory

Behind Marpa is a lot of mathematics, much of it old, but some of it new. The posts below are for those curious about the theory side of Marpa. (The definitive source is my paper on the theory behind Marpa, but that assumes the reader is comfortable with the mathematics of parsing theory.)

Visions

A warning to the reader: The posts in this section are my most misunderstood, and therefore, so far, can be called my least successful. These are posts in which I outline future directions, as plainly and simply as I know how to, uncluttered by details. The problem seems to be that, without technical details and code examples, many readers take my words as pleasant speculations, and not as I intend them, as descriptions of things that can be done now, today.

Older tutorials

These tutorials were written before the SLIF interface was created. Today, you'd want to use the SLIF in most of these cases. But most of their contents are still relevant, and most of the actual code would simply be carried over.

These posts, while not full-fledged tutorials, have a "how-to" focus.