seancribbs edited this page Sep 13, 2010 · 11 revisions

About

Neotoma is a packrat parser-generator for Erlang, based on Parsing Expression Grammars (PEGs). It consists of a parsing-combinator library with memoization routines, a parser for PEGs, and a utility to generate parsers from PEGs. It is inspired by Treetop, a Ruby library with similar aims, and Parsec, the parser-combinator library for Haskell.

Getting started

  1. Clone the repository:
    $ git clone git://github.com/seancribbs/neotoma.git
  2. Build the library:
    $ cd neotoma
    $ make
  3. Start the Erlang shell and generate your parser:
    $ erl -pa ebin
    1> peg_gen:file("mygrammar.peg").
    ok

Writing a Grammar

Neotoma’s PEG grammars are based on the grammars from Bryan Ford’s thesis, with some influences from Treetop. The basic format is thus:

 nonterminal <- parsing_expression;

Here, parsing_expression is any combination of nonterminals, terminals and sub-expressions, as described below (e, e1 and e2 stand for parsing expressions):

Expression                          Syntax            Notes
Non-terminal symbol                 some_nonterminal  All nonterminals on the RHS must have a corresponding rule/reduction.
String                              "Hello, world"    Single- or double-quoted, quotes escaped with \\
Character class                     [a-zA-Z0-9]       Just as in PCRE
Any single character                .
Sequence                            e1 e2
Ordered choice                      e1 / e2
Grouping                            (e)
Zero-width positive lookahead       &e
Zero-width negative lookahead       !e
Optional (zero-or-more) repetition  e*
Mandatory (one-or-more) repetition  e+
Optional expression                 e?
Label                               name:e            Helps extract sub-expressions from the AST

Currently, every rule/reduction must end with a semicolon (;).
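
As an illustration, the constructs above can be combined into a small arithmetic grammar. This is only a sketch: the rule names (additive, multitive, primary, decimal) and the labels (left, right) are made up for the example and are not part of Neotoma itself.

```text
additive  <- left:multitive "+" right:additive / multitive;
multitive <- left:primary "*" right:multitive / primary;
primary   <- "(" additive ")" / decimal;
decimal   <- [0-9]+;
```

Each rule ends with a semicolon, and each ordered choice tries the longer, labelled alternative first and falls back to the simpler one.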

Working with the AST

Without specifying any transformations, Neotoma will return a nested list of the results of its parse — essentially an S-expression. In this form, the AST is not very useful; one needs to transform and annotate the tree into a useful data structure. Neotoma provides hooks into the parsing process in the form of the transform/3 function. Once you have generated your parser, you can edit this function in the generated file. The prototype is thus:

transform('nonterminal', Node, Index)
  • nonterminal is the nonterminal that was successfully parsed.
  • Node is a list of the results from sub-expressions, which may be raw terminals or the transformations of other nonterminals.
  • Index is a tuple representing the position of the parser at the start of this expression, in the form {{line, L}, {column, C}}, where L and C are both integers.

While editing this within the generated parser is easy, Neotoma does not currently allow Erlang transformation code inline with the grammar; therefore, I recommend that you put your transformations in a separate module. Doing so will allow you to develop your grammar and transformations independently, without the parser-generator overwriting your transformations. You can do this by specifying the transform_module option to peg_gen:file/2. The module will be generated for you if it does not exist already. An example:

    1> peg_gen:file("mygrammar.peg", [{transform_module, myast}]).
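
A separate transform module might look something like the sketch below. It rests on several assumptions that are not stated on this page: that the grammar contains two hypothetical rules, `decimal <- [0-9]+;` and `sum <- left:decimal "+" right:decimal;`; that matched terminals arrive as (possibly nested) lists of characters; and that labelled sub-expressions arrive as {Label, Value} tuples inside Node. Check the Node values your own parser actually produces before relying on any of this.

```erlang
-module(myast).
-export([transform/3]).

%% Assumption: the hypothetical rule `decimal <- [0-9]+;` yields a
%% (possibly nested) list of matched characters, so flattening it
%% gives the digit string.
transform(decimal, Node, _Index) ->
    list_to_integer(lists:flatten(Node));

%% Assumption: the hypothetical rule
%% `sum <- left:decimal "+" right:decimal;` yields labelled results
%% as {Label, Value} tuples, which proplists can look up by label.
transform(sum, Node, Index) ->
    Left = proplists:get_value(left, Node),
    Right = proplists:get_value(right, Node),
    {add, Index, Left, Right};

%% Every other nonterminal is passed through unchanged.
transform(_Nonterminal, Node, _Index) ->
    Node.
```

The final catch-all clause keeps the default behaviour (the raw nested list) for rules you have not yet written a transformation for, so the grammar and the transformations can grow independently.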

Future features

  • Transformation code and supplemental code inline with the grammar.
  • Support for parsing in binary form/UTF.
  • Support for LFE and Reia.