Oil Parser Generator Project
Back to Tasks Under NLNet Grant
This is an introduction to an important subproject of https://www.oilshell.org/
(Note that we also need help on the Python-to-C++ translator. This work is separate from that. It involves parsing, but otherwise isn't strictly related.)
Oil is developed "middle out" -- it has an "executable spec" in Python, which is then semi-automatically translated to C++.
Much of the code works in C++, but the expression parser does not. It is a special case of the translation.
In Python, the oil_lang/grammar_gen.py tool reads the grammar oil_lang/grammar.pgen2. It produces parse tables in Python's "marshal" format. At runtime, the pgen2/ library reads it.
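As a rough illustration of that pipeline, the sketch below round-trips a tiny parse table through Python's marshal module. The table layout (the "labels" and "dfas" keys) and the names toy_tables, dump_tables, and load_tables are assumptions made for illustration; they are not the actual structure that grammar_gen.py writes.

```python
import marshal

# Hypothetical parse tables: one DFA per nonterminal, each state a list of
# (label, next_state) arcs. The real layout in grammar.marshal may differ.
toy_tables = {
    "labels": ["EMPTY", "NUMBER", "PLUS"],
    "dfas": {
        "expr": [
            [(1, 1)],          # state 0: NUMBER -> state 1
            [(2, 0), (0, 1)],  # state 1: PLUS -> state 0, or accept
        ],
    },
}


def dump_tables(tables):
    """Serialize the tables, as a generator step might."""
    return marshal.dumps(tables)


def load_tables(data):
    """Deserialize the tables, as a parser runtime might at startup."""
    return marshal.loads(data)


blob = dump_tables(toy_tables)
restored = load_tables(blob)
print(restored == toy_tables)
```

Because marshal only handles built-in types such as dicts, lists, tuples, strings, and integers, tables stored this way are tied to a Python runtime, which is the motivation for emitting C data structures instead.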
So instead of outputting these Python data structures, we want to output C data structures, just as CPython itself did before Python 3.9, when it switched to a PEG parser.
And we want the pgen-native parser runtime to be linked into the Oil executable. It should then interpret those parse tables.
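To make "interpret those parse tables" a little more concrete, here is a deliberately simplified sketch of walking a single DFA over a sequence of terminal labels. It assumes the pgen convention that an arc (0, i) in state i marks that state as accepting. The accepts function, the EXPR_DFA table, and the label numbers are hypothetical. A real runtime would also keep a stack of DFAs for nonterminals and build a parse tree, both of which are omitted here.

```python
def accepts(states, token_labels):
    """Walk one DFA over a sequence of terminal labels.

    states[i] is a list of (label, next_state) arcs. Label 0 is reserved
    for EMPTY and is assumed never to appear in token_labels.
    """
    state = 0
    for label in token_labels:
        for arc_label, next_state in states[state]:
            if arc_label == label:
                state = next_state
                break
        else:
            return False  # no arc in this state matches the token
    # An arc (0, state) pointing back at the state marks it as accepting.
    return (0, state) in states[state]


# Hypothetical labels: 1 = NUMBER, 2 = PLUS
EXPR_DFA = [
    [(1, 1)],          # state 0: expect NUMBER
    [(2, 0), (0, 1)],  # state 1: PLUS loops back to state 0; accepting
]

print(accepts(EXPR_DFA, [1, 2, 1]))  # NUMBER PLUS NUMBER
print(accepts(EXPR_DFA, [1, 2]))     # NUMBER PLUS (incomplete)
```

The point of the sketch is that the runtime is generic: only the tables change when the grammar changes.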
- Issue 594: Generate Parse Tables for pgen-native, and integrate it into oil-native. This is part of the translation to C++. Right now we only have a slow parser in Python for the Oil expression language.
How to Parse Shell Like a Programming Language explains our parsing approach. This already works in Python:
$ bin/oil --ast-format text -n -c 'echo "hello $name"'
(command.Simple
words: [
(compound_word parts:[(Token id:Id.Lit_Chars span_id:0 val:echo)])
...
And it's already translated to C++:
$ _bin/cxx-dbg/osh_eval -n -c 'echo "hello $name"'
(command.Simple
words: [
(compound_word parts:[(Token id:Id.Lit_Chars span_id:0 val:echo)])
...
This part does not use pgen2, because it's just shell. The Oil language has a var keyword, and that parse uses pgen2:
~/git/oilshell/oil$ bin/oil --ast-format text -n -c 'var x = 1 + 2 * 3'
(command.VarDecl
keyword: (Token id:Id.KW_Var span_id:0 val:var)
lhs: [(name_type name:(Token id:Id.Expr_Name span_id:2 val:x))]
rhs:
(expr.Binary
op: (Token id:Id.Arith_Plus span_id:8 val:_)
left: (expr.Const c:(Token id:Id.Expr_DecInt span_id:6 val:1))
right:
(expr.Binary
However, it crashes in C++:
$ _bin/cxx-dbg/osh_eval --ast-format text -n -c 'var x = 1 + 2 * 3'
osh_eval: cpp/pgen2_parse.cc:8: void parse::Parser::setup(int): Assertion `0' failed.
Aborted (core dumped)
So this is what we want to work. For comparison, this is what the C parse tables generated by Python's own pgen look like:
~/git/oilshell/oil/Python-2.7.13$ head -n 15 Python/graminit.c
/* Generated by Parser/pgen */
#include "pgenheaders.h"
#include "grammar.h"
PyAPI_DATA(grammar) _PyParser_Grammar;
static arc arcs_0_0[3] = {
{2, 1},
{3, 1},
{4, 2},
};
static arc arcs_0_1[1] = {
{0, 1},
};
static arc arcs_0_2[1] = {
{2, 1},
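The excerpt above suggests what a C backend for the grammar generator could print. The sketch below renders Python-side DFA states as C arc arrays in the same style, reading each pair as (label, next state) and naming each array arcs_<dfa>_<state>. The emit_arcs function and the FIRST_DFA input format are assumptions for illustration; they are not part of grammar_gen.py.

```python
def emit_arcs(dfa_index, states):
    """Render one DFA's states as C `arc` arrays, graminit.c style."""
    lines = []
    for state_index, arcs in enumerate(states):
        lines.append("static arc arcs_%d_%d[%d] = {" %
                     (dfa_index, state_index, len(arcs)))
        for label, next_state in arcs:
            lines.append("    {%d, %d}," % (label, next_state))
        lines.append("};")
    return "\n".join(lines)


# The first two states from the excerpt, as (label, next_state) pairs.
FIRST_DFA = [
    [(2, 1), (3, 1), (4, 2)],
    [(0, 1)],
]

print(emit_arcs(0, FIRST_DFA))
```

A complete generator would also need to emit the state, DFA, and label tables that refer to these arrays; this sketch covers only the arcs.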
These tests run against Python, but I can make them run against C++ (the .asan variant):
$ oil_lang/run.sh soil-run
- oil_lang/grammar_gen.py
- oil_lang/grammar.pgen2
- _devbuild/gen/grammar.marshal and _devbuild/gen/grammar_nt.py (non-terminals)
- oil_lang/expr_parse.py -- a wrapper for the generated parser
- The pgen2/ directory -- parse.py and more
- pgen-native/ dir -- this is just a copy of Python, imported by a contributor