In Douglas Crockford's article Top Down Operator Precedence {1} he implements a parser first presented by Vaughan Pratt {5}. I've been working on writing out Crockford's Parser with the hopes of extending it for fun.
I began typing out Crockfords parser implementation (using the published version from Beautiful Code) over the Holiday Break from University between 2016-2017. I needed a simple coding project to practice learning VIM before my Programming 2 course for the Spring of 2017, where I knew we would be primarily using the instructors Linux server via SSH. So typing out Crockford's code was a great way to practive VIM, and also learn a little bit about parsing in the process. I did my best to understand the code, but it remained ellusive even though I could grok the simplicity of its design.
I've decided to return to the project and add extensive documentation as a way to learn what the parser does. My plan is to flesh out this README file with explanations from Pratt's original paper, as well as other notes from other people's implementations (See References at the bottom). I also plan to add a lot of comments to the code itself, at the risk of looking like a n00b. Basically I'm writing this for a new JavaScript coder to be able to write their own (toy) parser, incase they want to better understand how computers work.
Crockfords original parser was designed for parsing Simplified JavaScript, or "just the good stuff" {1}. In his own words, it includes:
-
Functions as first class objects. Functions in Simplified JavaScript are lambdas with lexical scoping.
-
Dynamic objects with prototypal inheritance. Objects are class-free. We can add a new member to any object by ordinary assignment. An object can inherit members from another object.
-
Object literals and array literals. This is a very convenient notation for creating new objects and arrays. JavaScript literals were the inspiration for the JSON data interchange format{1}.
Top Down Operator Precedence (TDOP) is a type of parser that is based on a Recursive descent model, but it differs by treating the tokens as something simmilar to objects. Recursive descent associates 'semantic actions' with grammer rules, while TDOP has the actions associated with the tokens {2}.
This feature allows for TDOP to be implemented very well by a dynamic functional-object-oriented language like JavaScript. Generic objects are first created by a factory function, and methods can then be dynamically assigned to allow for the token to handle its own parsing logic.
Top Down Operator Precedence is Vaughan Pratt's generalized form of what is otherwise known as 'precedence climbing', used in recursive descent parsers {3}. If any of that sentence made sense to you, then congratulations! If not, then I'll try to explain.
Recursive descent parsers are very effective means for parsing statements, where tokens are neatly arranged in rigid pattterns. Unfortunately, they don't do so well with parsing expressions {4}. This is where Pratt's parser does its magic: it's very good at dealing with expressions because each token object is assigned a precedence, or binding power in Pratt's terminology {3}.
The binding power is a value that determines how tightly an operator binds to its operands. When operator tokens are parsed, their
binding powers are inspected by an expression()
function which recursively gobbles up tokens into expression trees that branch
according to the binding powers.
TODO Add more explanations. For now, check out Eli Bendersky's blog post about the subject.
As this project is in a very early stage, it has limited useability.
It currently runs on NodeJS, but I hope to extend it for the browser too.
The printTree
module will take a JS file as an argument, and output the tree
generated by parser.js
to the command-line.
A few Simplified JavaScript test files are included; for instance, test03.js
looks like:
var beans = function cool (x, y, z) {
return x + y * z;
};
You can see how the parser works by using printTree
on the test03.js
file (or any Simplified JavaScript) via
$ node printTree test03.js
The output should look like so:
Scanning..
Parsing..
Printing Tree for test03.js:
{ value: '=',
arity: 'binary',
first:
{ value: 'beans',
arity: 'name',
reserved: false,
nud: [Function: itself],
led: null,
std: null,
lbp: 0,
scope:
{ def:
{ var: { value: 'var', arity: 'name', reserved: true },
beans: [Circular] },
parent: undefined } },
second:
{ value: 'function',
arity: 'function',
name: 'cool',
first:
[ { value: 'x',
arity: 'name',
reserved: false,
nud: [Function: itself],
led: null,
std: null,
lbp: 0,
scope:
{ def:
{ cool:
{ value: 'cool',
arity: 'name',
reserved: false,
nud: [Function: itself],
led: null,
std: null,
lbp: 0,
scope: [Circular] },
x: [Circular],
y:
{ value: 'y',
arity: 'name',
reserved: false,
nud: [Function: itself],
led: null,
std: null,
lbp: 0,
scope: [Circular] },
z:
{ value: 'z',
arity: 'name',
reserved: false,
nud: [Function: itself],
led: null,
std: null,
lbp: 0,
scope: [Circular] },
return:
{ value: 'return',
arity: 'statement',
reserved: true,
first:
{ value: '+',
arity: 'binary',
first: { value: 'x', arity: 'name' },
second:
{ value: '*',
arity: 'binary',
first: { value: 'y', arity: 'name' },
second: { value: 'z', arity: 'name' } } } } },
parent:
{ def:
{ var: { value: 'var', arity: 'name', reserved: true },
beans:
{ value: 'beans',
arity: 'name',
reserved: false,
nud: [Function: itself],
led: null,
std: null,
lbp: 0,
scope: [Circular] } },
parent: undefined } } },
{ value: 'y',
arity: 'name',
reserved: false,
nud: [Function: itself],
led: null,
std: null,
lbp: 0,
scope:
{ def:
{ cool:
{ value: 'cool',
arity: 'name',
reserved: false,
nud: [Function: itself],
led: null,
std: null,
lbp: 0,
scope: [Circular] },
x:
{ value: 'x',
arity: 'name',
reserved: false,
nud: [Function: itself],
led: null,
std: null,
lbp: 0,
scope: [Circular] },
y: [Circular],
z:
{ value: 'z',
arity: 'name',
reserved: false,
nud: [Function: itself],
led: null,
std: null,
lbp: 0,
scope: [Circular] },
return:
{ value: 'return',
arity: 'statement',
reserved: true,
first:
{ value: '+',
arity: 'binary',
first: { value: 'x', arity: 'name' },
second:
{ value: '*',
arity: 'binary',
first: { value: 'y', arity: 'name' },
second: { value: 'z', arity: 'name' } } } } },
parent:
{ def:
{ var: { value: 'var', arity: 'name', reserved: true },
beans:
{ value: 'beans',
arity: 'name',
reserved: false,
nud: [Function: itself],
led: null,
std: null,
lbp: 0,
scope: [Circular] } },
parent: undefined } } },
{ value: 'z',
arity: 'name',
reserved: false,
nud: [Function: itself],
led: null,
std: null,
lbp: 0,
scope:
{ def:
{ cool:
{ value: 'cool',
arity: 'name',
reserved: false,
nud: [Function: itself],
led: null,
std: null,
lbp: 0,
scope: [Circular] },
x:
{ value: 'x',
arity: 'name',
reserved: false,
nud: [Function: itself],
led: null,
std: null,
lbp: 0,
scope: [Circular] },
y:
{ value: 'y',
arity: 'name',
reserved: false,
nud: [Function: itself],
led: null,
std: null,
lbp: 0,
scope: [Circular] },
z: [Circular],
return:
{ value: 'return',
arity: 'statement',
reserved: true,
first:
{ value: '+',
arity: 'binary',
first: { value: 'x', arity: 'name' },
second:
{ value: '*',
arity: 'binary',
first: { value: 'y', arity: 'name' },
second: { value: 'z', arity: 'name' } } } } },
parent:
{ def:
{ var: { value: 'var', arity: 'name', reserved: true },
beans:
{ value: 'beans',
arity: 'name',
reserved: false,
nud: [Function: itself],
led: null,
std: null,
lbp: 0,
scope: [Circular] } },
parent: undefined } } } ],
second:
{ value: 'return',
arity: 'statement',
reserved: true,
first:
{ value: '+',
arity: 'binary',
first: { value: 'x', arity: 'name' },
second:
{ value: '*',
arity: 'binary',
first: { value: 'y', arity: 'name' },
second: { value: 'z', arity: 'name' } } } } } }
As this project is a work-in progress, there is always room for improvement. One area for improvement might be to follow Andy Chu's lead {6} and re-design the implementation to accomplish some of the following:
- Avoid Globals
- Use 'pure' functions opperating on the AST as data
- Increase the modularity by allowing a 'spec' definition of
nud
andled
functions be defined apart from the parsing mechanics
The last part is the most immediately useful, because a spec
file could be used to define a DSL in a modular way. Also,
language revisions could be separated out into versions based on spec
files.
[1] http://javascript.crockford.com/tdop/tdop.html
Douglas Crockford's original TDOP parser.
[2] http://eli.thegreenplace.net/2010/01/02/top-down-operator-precedence-parsing/
This post by Eli Bendersky is a solid explanation of how TDOP works.
[3] http://www.oilshell.org/blog/2016/11/01.html
Andy Chu. Pratt parsing equivalent to precedence climbing.
[4] http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/
Another awesome blog-post about Pratt parsers from Bob Nystrom.
Carl Smith kindly made an HTML version of Pratt's original paper!
[6] https://github.com/andychu/pratt-parsing-demo
Andy Chu [see ref 3] also implemented his own Pratt parser in Python in a clean and modular way.