Home

About

Neotoma is a packrat parser generator for Erlang that builds parsers from Parsing Expression Grammars (PEGs). It consists of a parser-combinator library with memoization routines, a parser for PEGs, and a utility to generate parsers from PEGs. It is inspired by Treetop, a Ruby library with similar aims, and Parsec, the parser-combinator library for Haskell.

Getting started

Clone the repository:

$ git clone git://github.com/seancribbs/neotoma.git

Build the library:
```
$ cd neotoma

$ make
```

Start the Erlang shell and generate your parser:

$ erl -pa ebin

1> peg_gen:file("mygrammar.peg").

ok

Writing a Grammar

Neotoma’s PEG grammars are based on the grammars from Brian Ford’s thesis with some influences from Treetop. The basic format is thus:

 nonterminal <- parsing_expression;

Here, parsing_expression is any combination of nonterminals, terminals, and sub-expressions, as described below (e, e1, and e2 stand for parsing expressions):

| Expression | Syntax | Notes |
|---|---|---|
| Non-terminal symbol | `some_nonterminal` | All nonterminals on the RHS must have a corresponding rule/reduction. |
| String | `"Hello, world"` | single- or double-quoted, quotes escaped with `\\` |
| Character class | `[a-zA-Z0-9]` | just as in PCRE |
| Any single character | `.` | |
| Sequence | `e1 e2` | |
| Ordered choice | `e1 / e2` | |
| Grouping | `(e)` | |
| Zero-width positive lookahead | `&e` | |
| Zero-width negative lookahead | `!e` | |
| Optional (zero-or-more) repetition | `e*` | |
| Mandatory (one-or-more) repetition | `e+` | |
| Optional expression | `e?` | |
| Label | `name:e` | Helps extract sub-expressions from the AST |

Currently, every rule (reduction) must end with a semicolon (`;`).
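
To show how these pieces combine, here is a small sketch of a grammar for comma-separated `key=value` pairs. All rule names (`pairs`, `pair`, `value`, and so on) are hypothetical and chosen only for illustration; the grammar uses only the constructs listed above and has not been checked against a particular Neotoma release.

```
pairs <- pair (sep pair)*;
pair <- key:identifier "=" val:value;
value <- number / identifier;
identifier <- [a-zA-Z_] [a-zA-Z0-9_]*;
number <- "-"? [0-9]+;
sep <- space? "," space?;
space <- " "+;
```

Because the choice in `value` is ordered, `number` is tried first and `identifier` is attempted only if `number` fails. The labels `key` and `val` mark the two sub-expressions of `pair` that a transformation would most likely want to pull out.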

Working with the AST

Without specifying any transformations, Neotoma will return a nested list of the results of its parse — essentially an S-expression. In this form, the AST is not very useful; one needs to transform and annotate the tree into a useful data structure. Neotoma provides hooks into the parsing process in the form of the transform/3 function. Once you have generated your parser, you can edit this function in the generated file. The prototype is thus:

transform('nonterminal', Node, Index)

nonterminal is the nonterminal that was successfully parsed.
Node is a list of the results from sub-expressions, which may be raw terminals or the transformations of other nonterminals.
Index is a tuple representing the position of the parser at the start of this expression, in the form {{line, L},{column,C}} where L and C are both integers.

While editing this within the generated parser is easy, Neotoma does not currently allow Erlang transformation code inline with the grammar; therefore, I recommend that you put your transformations in a separate module. Doing so will allow you to develop your grammar and transformations independently, without the parser-generator overwriting your transformations. You can do this by specifying the transform_module option to peg_gen:file/2. The module will be generated for you if it does not exist already. An example:

1> peg_gen:file("mygrammar.peg", [{transform_module, myast}]).
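
As a rough illustration of what such a module might contain, the sketch below evaluates a hypothetical arithmetic grammar to an integer. The grammar and its rule names appear in the leading comment and are assumptions made for this example. It also assumes that a match of `[0-9]+` reaches `transform/3` as a (possibly nested) list of characters, and that a sequence arrives as a list with one element per sub-expression. Check the untransformed output of your own parser before relying on these shapes.

```erlang
%% myast.erl: a sketch of a transform module for a hypothetical
%% arithmetic grammar (rule names are illustrative, not part of Neotoma):
%%
%%   additive  <- multitive "+" additive / multitive;
%%   multitive <- primary "*" multitive / primary;
%%   primary   <- "(" additive ")" / decimal;
%%   decimal   <- [0-9]+;
-module(myast).
-export([transform/3]).

%% Assumption: the digits arrive as a (possibly nested) list of
%% characters, so flattening yields a plain string such as "42".
transform(decimal, Node, _Index) ->
    list_to_integer(lists:flatten(Node));
%% Second alternative of each rule: the child was already reduced
%% to an integer, so pass it through.
transform(_Rule, Node, _Index) when is_integer(Node) ->
    Node;
%% "(" additive ")": keep only the value between the parentheses.
transform(primary, [_Open, Value, _Close], _Index) ->
    Value;
transform(multitive, [Left, _Star, Right], _Index) ->
    Left * Right;
transform(additive, [Left, _Plus, Right], _Index) ->
    Left + Right;
%% Anything else is returned unchanged.
transform(_Rule, Node, _Index) ->
    Node.
```

The final catch-all clause matters: any nonterminal without a dedicated clause keeps its raw parse result, so you can add transformations one rule at a time while the grammar is still changing.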

Future features

Transformation code and supplemental code inline with the grammar.
Support for parsing in binary form/UTF.
Support for LFE and Reia.
