docs: GLR docs
igordejanovic committed Oct 15, 2023
1 parent ad17c94 commit 7723c2a
Showing 9 changed files with 89 additions and 23 deletions.
1 change: 1 addition & 0 deletions docs/src/SUMMARY.md
@@ -6,6 +6,7 @@
- [Configuration](configuration.md)
- [Components](components.md)
- [Lexers](lexers.md)
- [Parsers](parsers.md)
- [Builders](builders.md)
- [CLI](cli.md)
- [Handling errors](handling_errors/handling_errors.md)
2 changes: 1 addition & 1 deletion docs/src/configuration.md
@@ -21,7 +21,7 @@ would be:
```

```admonish note
Don't forget to add `rustmo-compiler` to the `build-dependencies` section of the
Don't forget to add `rustemo-compiler` to the `build-dependencies` section of the
`Cargo.toml` file.
```

12 changes: 3 additions & 9 deletions docs/src/grammar_language.md
@@ -370,19 +370,13 @@ is equivalent to:
So using of `*` creates both `A0` and `A1` rules. Action attached to `A0`
returns a list of matched `a` and empty list if no match is found. Please note
the [usage of `nops`](./disambiguation.md#nops-and-nopse). In case if
`prefer_shift` strategy is used using `nops` will perform both `REDUCE` and
`SHIFT` during GLR parsing in case what follows zero or more might be another
the [usage of `nops`](./disambiguation.md#nops-and-nopse). If the
`prefer_shift` strategy is used, using `nops` will perform both `REDUCE` and
`SHIFT` during GLR parsing when what follows the zero-or-more match might be another
element in the sequence. This is most of the time what you need.
```


```admonish warning
Previous statements will be valid when GLR parsing is implemented.
`{nops}` needs to be implemented.
```


### Repetition modifiers

Repetitions (`+`, `*`, `?`) may optionally be followed by a modifier in square
2 changes: 1 addition & 1 deletion docs/src/handling_errors/handling_errors.md
@@ -273,7 +273,7 @@ provide `message`, `file` and `location` inside the file.
```admonish todo
1. Lexical ambiguities - when there can be recognized multiple tokens at the
current position.
2. Syntactic ambiguities - aplicable only to GLR - when multiple
2. Syntactic ambiguities - applicable only to GLR - when multiple
interpretation/trees of the input can be constructed.
These are not errors per se so should be moved to some other chapter.
4 changes: 0 additions & 4 deletions docs/src/introduction.md
@@ -3,10 +3,6 @@
Rustemo is a LR/GLR parser generator for Rust (a.k.a.
[compiler-compiler](https://en.wikipedia.org/wiki/Compiler-compiler)).

```admonish note
Only LR is implemented at the moment. See the roadmap in the [README](https://github.com/igordejanovic/rustemo/#roadmap-tentative).
```

Basically, this kind of tools, given a formal grammar of the language, produce a
program that can transform unstructured text (a sequence of characters, or more
generally a sequence of tokens) to a structured (tree-like or graph-like) form
67 changes: 67 additions & 0 deletions docs/src/parsers.md
@@ -1 +1,68 @@
# Parsers

Parsers take tokens from the lexer as input and recognize syntactic elements. They then call a builder to produce the final output.

There are two flavours of parsers supported by Rustemo:

- Deterministic LR
- Non-deterministic GLR or, more precisely, Right-Nulled GLR

```admonish tip
GLR parsing is more complex as it must handle all possibilities, so there is
some overhead and LR parsing is generally faster. Thus, use GLR only if you know
that you need it, or early in the development process when you want to deal with
SHIFT/REDUCE conflicts later.
Another benefit of LR parsing is that it is deterministic and unambiguous. If
the input can be parsed, there is only one possible way to do it with LR.
```

The API for both flavours is similar. You create an instance of the generated
parser type and call either `parse` or `parse_file`. The first method accepts
the input directly, while the second accepts the path to the file that needs to
be parsed.

For example, in the calculator tutorial, we create a new parser instance and
call `parse` to parse the input supplied by the user on stdin:

```rust
{{#include ./tutorials/calculator/calculator1/src/main.rs:main}}
```

The parser type `CalculatorParser` is generated by Rustemo from the grammar
`calculator.rustemo`.

The parsing process returns a `Result` value. If parsing succeeds, the `Ok`
variant contains the result of parsing; otherwise, the `Err` variant contains
the error value.
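
A minimal sketch of consuming such a `Result` follows. The `ToyParser` type, its
`String` error type, and the `report` helper are hypothetical stand-ins written
for illustration; they are not the parser generated by Rustemo, and only the
`parse`/`Result` shape mirrors the description above.

```rust
/// Hypothetical stand-in for a generated parser; not Rustemo's API.
struct ToyParser;

impl ToyParser {
    fn new() -> Self {
        ToyParser
    }

    /// Parses input such as "1 + 2 + 3" and returns the sum,
    /// or an error message naming the first invalid operand.
    fn parse(&self, input: &str) -> Result<i64, String> {
        let mut sum = 0;
        for part in input.split('+') {
            let operand = part.trim();
            let value: i64 = operand
                .parse()
                .map_err(|_| format!("invalid number: {operand:?}"))?;
            sum += value;
        }
        Ok(sum)
    }
}

/// Turns the parse result into a message, handling both variants.
fn report(input: &str) -> String {
    match ToyParser::new().parse(input) {
        Ok(value) => format!("result: {value}"),
        Err(error) => format!("parse error: {error}"),
    }
}
```

Calling `report("1 + 2 + 3")` takes the `Ok` branch, while `report("1 + x")`
takes the `Err` branch. In real code the `?` operator can propagate the error
to the caller instead of matching on it.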

If deterministic parsing is used, the result will be the final output
constructed by the [configured builder](./builders.md).

For GLR, the result will be a `Forest`, which contains all the possible
trees/solutions for the given input. To get the final output, you have to choose
a tree and call the builder over it.

To generate a GLR parser, either set the algorithm using the settings API (e.g. from a `build.rs` script):

```rust
rustemo_compiler::Settings::new().parser_algo(ParserAlgo::GLR).process_dir()
```

or call the `rcomp` CLI with `--parser-algo glr` over your grammar file.

For an example of calling the GLR parser, see this test:

```rust
{{#include ../../tests/src/glr/forest/mod.rs:forest}}
```

The most useful API calls for `Forest` are `get_tree` and `get_first_tree`.
There is also `solutions`, which gives you the number of trees in the forest.
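
To see why a forest can hold more than one tree, here is a brute-force sketch
that enumerates every parse tree for the ambiguous toy grammar
`E: E '+' E | E '*' E | num`. The `Tree` type and the `all_trees` and `eval`
functions are hypothetical and not part of Rustemo. A GLR parser does not
enumerate trees this way; the sketch only illustrates what "all possible trees"
means for an ambiguous input.

```rust
/// A parse tree for the ambiguous grammar `E: E '+' E | E '*' E | num`.
#[derive(Debug, Clone)]
enum Tree {
    Num(i64),
    Op(Box<Tree>, char, Box<Tree>),
}

/// Returns every tree that derives `tokens` (a toy stand-in for a forest).
fn all_trees(tokens: &[&str]) -> Vec<Tree> {
    if tokens.len() == 1 {
        return match tokens[0].parse::<i64>() {
            Ok(n) => vec![Tree::Num(n)],
            Err(_) => vec![],
        };
    }
    let mut trees = Vec::new();
    // Operators sit at odd positions; each of them may be the root.
    for i in (1..tokens.len()).step_by(2) {
        let op = match tokens[i] {
            "+" => '+',
            "*" => '*',
            _ => continue,
        };
        for left in all_trees(&tokens[..i]) {
            for right in all_trees(&tokens[i + 1..]) {
                trees.push(Tree::Op(Box::new(left.clone()), op, Box::new(right)));
            }
        }
    }
    trees
}

/// Evaluates one chosen tree, loosely analogous to running a builder over it.
fn eval(tree: &Tree) -> i64 {
    match tree {
        Tree::Num(n) => *n,
        Tree::Op(l, '+', r) => eval(l) + eval(r),
        Tree::Op(l, _, r) => eval(l) * eval(r),
    }
}
```

For the tokens of `1 + 4 * 9`, `all_trees` returns two trees, `1 + (4 * 9)` and
`(1 + 4) * 9`, which `eval` reduces to 37 and 45. Choosing a tree by index is
what decides the final output.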

A tree can accept a builder using the `build` method. For an example of calling
the default builder over a forest tree, see this test:

```rust
{{#include ../../tests/src/glr/build/mod.rs:build}}
```
20 changes: 12 additions & 8 deletions docs/src/parsing/parsing.md
@@ -51,8 +51,9 @@ by a table used during parsing to decide the action and transition to perform.
Given a sequence of tokens, if machine starts in a start state and end-up in an
accepting state, we say that the sequence of tokens (a sentence) belongs to the
language recognized by the DFSA. A set of languages which can be recognized by
DFSA are called deterministic [context-free languages - CFG](). NFSA can
recognize a full set of CFG. GLR parsing is based on NFSA.
DFSA are called deterministic [context-free languages -
CFG](https://en.wikipedia.org/wiki/Context-free_language). NFSA can recognize a
full set of CFG. GLR parsing is based on NFSA.
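
The acceptance condition described above can be sketched with a tiny
table-driven automaton. The `Token` type, the two-state table in `step`, and the
`accepts` function are hypothetical and chosen only for illustration. The sketch
shows the state-transition part alone; an LR parser additionally maintains a
stack next to the current state.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Token {
    Num,
    Op,
}

/// Transition table of a two-state automaton accepting `Num (Op Num)*`.
/// State 0 is the start state; state 1 is the only accepting state.
fn step(state: u8, token: Token) -> Option<u8> {
    match (state, token) {
        (0, Token::Num) => Some(1),
        (1, Token::Op) => Some(0),
        _ => None, // no transition defined: the sentence is rejected
    }
}

/// Returns true if the token sequence drives the automaton from the
/// start state to the accepting state.
fn accepts(tokens: &[Token]) -> bool {
    let mut state = 0;
    for &token in tokens {
        match step(state, token) {
            Some(next) => state = next,
            None => return false,
        }
    }
    state == 1
}
```

With this table, `[Num, Op, Num]` is accepted, while `[Num, Op]` ends in a
non-accepting state and `[Num, Num]` has no valid transition.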

Depending on the algorithm used to produce the FSA table we have different LR
variants (SLR, LALR etc.). They only differ by the table they use, the parsing
@@ -276,15 +277,18 @@ Parser -> Builder: Get build product

# GLR parsing
GLR is a generalized version of LR which accepts a full set of CFG. If multiple
actions can be executed the parser will split and investigate each possibility.
If some of the path prove wrong it would be discarded (we call these local
splits - local ambiguities) but if multiple paths lead to the successful parse
then all interpretations are valid and instead of the parse tree we get the
parse forest. In that case we say that our language is ambiguous.
parse actions can be executed at the current parser state, the parser will split
and investigate each possibility. If some of the paths prove wrong, they are
discarded (we call these splits local ambiguities), but if multiple paths lead
to a successful parse, then all interpretations are valid and instead of a
parse tree we get a parse forest. In that case we say that our language is
ambiguous.

Due to the fact that automata handled by GLR can be non-deterministic we say
that GLR is a form of non-deterministic parsing. See more in the [section on
resolving LR
conflicts](../handling_errors/handling_errors.md#resolving-lr-conflicts).

GLR will be implemented in the future versions of Rustemo.
Rustemo uses a particular variant of GLR called [Right-Nulled GLR](https://www.doi.org/10.1145/1146809.1146810),
which is a correct and more efficient version of the [original GLR](https://dl.acm.org/doi/abs/10.5555/1623611.1623625).

2 changes: 2 additions & 0 deletions tests/src/glr/build/mod.rs
@@ -9,6 +9,7 @@ rustemo_mod!(calc_actions, "/src/glr/build");
use self::calc::CalcParser;
use rustemo::parser::Parser;

// ANCHOR: build
#[test]
fn glr_tree_build_default() {
let forest = CalcParser::new().parse("1 + 4 * 9").unwrap();
@@ -27,6 +28,7 @@ fn glr_tree_build_default() {
format!("{:#?}", forest.get_tree(1).unwrap().build(&mut builder))
);
}
// ANCHOR_END: build

#[test]
fn glr_tree_build_generic() {
2 changes: 2 additions & 0 deletions tests/src/glr/forest/mod.rs
@@ -73,6 +73,7 @@ fn glr_calc_parse_ambiguities() {
);
}

// ANCHOR: forest
#[test]
fn glr_extract_tree_from_forest() {
let forest = CalcParser::new().parse("1 + 4 * 9 + 3 * 2 + 7").unwrap();
@@ -98,3 +99,4 @@ fn glr_extract_tree_from_forest() {
format!("{:#?}", tree.children()[0].children())
);
}
// ANCHOR_END: forest
