docs: GLR docs

igordejanovic · Oct 15, 2023 · 7723c2a
1 parent ad17c94
commit 7723c2a

Showing 9 changed files with 89 additions and 23 deletions.
diff --git a/docs/src/SUMMARY.md b/docs/src/SUMMARY.md
@@ -6,6 +6,7 @@
 - [Configuration](configuration.md)
 - [Components](components.md)
   - [Lexers](lexers.md)
+  - [Parsers](parsers.md)
   - [Builders](builders.md)
 - [CLI](cli.md)
 - [Handling errors](handling_errors/handling_errors.md)

diff --git a/docs/src/configuration.md b/docs/src/configuration.md
@@ -21,7 +21,7 @@ would be:
 ```
 
 ```admonish note
-Don't forget to add `rustmo-compiler` to the `build-dependencies` section of the
+Don't forget to add `rustemo-compiler` to the `build-dependencies` section of the
 `Cargo.toml` file.
 ```
 

diff --git a/docs/src/grammar_language.md b/docs/src/grammar_language.md
@@ -370,19 +370,13 @@ is equivalent to:
 
 So using of `*` creates both `A0` and `A1` rules. Action attached to `A0`
 returns a list of matched `a` and empty list if no match is found. Please note
-the [usage of `nops`](./disambiguation.md#nops-and-nopse). In case if
-`prefer_shift` strategy is used using `nops` will perform both `REDUCE` and
-`SHIFT` during GLR parsing in case what follows zero or more might be another
+the [usage of `nops`](./disambiguation.md#nops-and-nopse). In case
+`prefer_shift` strategy is used, using `nops` will perform both `REDUCE` and
+`SHIFT` during GLR parsing if what follows zero or more might be another
 element in the sequence. This is most of the time what you need.
 ```
 
 
-```admonish warning
-Previous statements will be valid when GLR parsing is implemented.
-`{nops}` needs to be implemented.
-```
-
-
 ### Repetition modifiers
 
 Repetitions (`+`, `*`, `?`) may optionally be followed by a modifier in square

diff --git a/docs/src/handling_errors/handling_errors.md b/docs/src/handling_errors/handling_errors.md
@@ -273,7 +273,7 @@ provide `message`, `file` and `location` inside the file.
 ```admonish todo
 1. Lexical ambiguities - when there can be recognized multiple tokens at the
    current position.
-2. Syntactic ambiguities - aplicable only to GLR - when multiple
+2. Syntactic ambiguities - applicable only to GLR - when multiple
    interpretation/trees of the input can be constructed.
    
 These are not errors per se so should be moved to some other chapter.

diff --git a/docs/src/introduction.md b/docs/src/introduction.md
@@ -3,10 +3,6 @@
 Rustemo is a LR/GLR parser generator for Rust (a.k.a.
 [compiler-compiler](https://en.wikipedia.org/wiki/Compiler-compiler)).
 
-```admonish note
-Only LR is implemented at the moment. See the roadmap in the [README](https://github.com/igordejanovic/rustemo/#roadmap-tentative).
-```
-
 Basically, this kind of tools, given a formal grammar of the language, produce a
 program that can transform unstructured text (a sequence of characters, or more
 generally a sequence of tokens) to a structured (tree-like or graph-like) form

diff --git a/docs/src/parsers.md b/docs/src/parsers.md
@@ -1 +1,68 @@
 # Parsers
+
+Parsers take tokens from the lexer as input and recognize syntactic elements. They then call a builder to produce the final output.
+
+There are two flavours of parsers supported by Rustemo:
+
+- Deterministic LR
+- Non-deterministic GLR, or more precisely, Right-Nulled GLR
+
+```admonish tip
+GLR parsing is more complex because it must explore all possibilities, so it
+carries some overhead and LR parsing is generally faster. Thus, use GLR only if
+you know that you need it, or early in development when you want to deal with
+SHIFT/REDUCE conflicts later.
+
+Another benefit of LR parsing is that it is deterministic and unambiguous. If
+the input can be parsed, there is only one possible way to do it with LR.
+```
+
+The API for both flavours is similar. You create an instance of the generated
+parser type and call either `parse` or `parse_file` where the first method
+accepts the input directly while the second method accepts the path to the file
+that needs to be parsed.
+
+For example, in the calculator tutorial, we create a new parser instance and
+call `parse` to parse the input supplied by the user on stdin:
+
+```rust
+{{#include ./tutorials/calculator/calculator1/src/main.rs:main}}
+```
+
+The parser type `CalculatorParser` is generated by Rustemo from the grammar
+`calculator.rustemo`.
+
+The result of the parsing process is a `Result` value which contains either the
+result of parsing, if successful, in the `Ok` variant, or the error value in the
+`Err` variant.
+
+If deterministic parsing is used, the result will be the final output constructed
+by the [configured builder](./builders.md).
+
+For GLR, the result will be a `Forest` which contains all the possible
+trees/solutions for the given input. For the final output, you have to choose a
+tree and call the builder over it.
+
+To generate a GLR parser, either set the algorithm using the settings API (e.g. from the `build.rs` script):
+
+```rust
+rustemo_compiler::Settings::new().parser_algo(ParserAlgo::GLR).process_dir()
+```
+
+or call `rcomp` CLI with `--parser-algo glr` over your grammar file.
+
+For an example of calling the GLR parser, see this test:
+
+```rust
+{{#include ../../tests/src/glr/forest/mod.rs:forest}}
+```
+
+The most useful API calls for `Forest` are `get_tree` and `get_first_tree`.
+There is also `solutions` which gives you the number of trees in the forest.
+
+A tree can accept a builder using the `build` method. For an example of calling
+the default builder over the forest tree see this test:
+
+```rust
+{{#include ../../tests/src/glr/build/mod.rs:build}}
+```
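
A minimal sketch of consuming the `Result` that the new `parsers.md` text describes. The `parse` function below is a hypothetical stand-in that only reads a single integer; it is not the Rustemo-generated API, and `report` is likewise an illustrative helper. Only the `Ok`/`Err` handling pattern is meant to carry over to a generated parser.

```rust
// Hypothetical stand-in for a generated parser's `parse` method.
// NOT the Rustemo API: it just parses one integer so the example runs.
fn parse(input: &str) -> Result<i64, String> {
    input
        .trim()
        .parse::<i64>()
        .map_err(|e| format!("cannot parse {input:?}: {e}"))
}

// Turn the parse outcome into a user-facing message.
fn report(input: &str) -> String {
    match parse(input) {
        // `Ok` carries the successful result of parsing.
        Ok(value) => format!("Result = {value}"),
        // `Err` carries the error value.
        Err(err) => format!("Failed: {err}"),
    }
}

fn main() {
    println!("{}", report("42"));
    println!("{}", report("4x"));
}
```

With a real generated parser, the `Ok` payload would be the builder output for LR or a forest for GLR, as the added text states.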
diff --git a/docs/src/parsing/parsing.md b/docs/src/parsing/parsing.md
@@ -51,8 +51,9 @@ by a table used during parsing to decide the action and transition to perform.
 Given a sequence of tokens, if machine starts in a start state and end-up in an
 accepting state, we say that the sequence of tokens (a sentence) belongs to the
 language recognized by the DFSA. A set of languages which can be recognized by
-DFSA are called deterministic [context-free languages - CFG](). NFSA can
-recognize a full set of CFG. GLR parsing is based on NFSA.
+DFSA are called deterministic [context-free languages -
+CFG](https://en.wikipedia.org/wiki/Context-free_language). NFSA can recognize a
+full set of CFG. GLR parsing is based on NFSA.
 
 Depending on the algorithm used to produce the FSA table we have different LR
 variants (SLR, LALR etc.). They only differ by the table they use, the parsing
@@ -276,15 +277,18 @@ Parser -> Builder: Get build product
 
 # GLR parsing
 GLR is a generalized version of LR which accepts a full set of CFG. If multiple
-actions can be executed the parser will split and investigate each possibility.
-If some of the path prove wrong it would be discarded (we call these local
-splits - local ambiguities) but if multiple paths lead to the successful parse
-then all interpretations are valid and instead of the parse tree we get the
-parse forest. In that case we say that our language is ambiguous.
+parse actions can be executed at the current parser state, the parser will split
+and investigate each possibility. If some of the paths prove wrong, they are
+discarded (we call these splits local ambiguities), but if multiple paths lead
+to a successful parse, then all interpretations are valid and instead of a
+parse tree we get a parse forest. In that case we say that our language is
+ambiguous.
 
 Due to the fact that automata handled by GLR can be non-deterministic we say
 that GLR is a form of non-deterministic parsing. See more in the [section on
 resolving LR
 conflicts](../handling_errors/handling_errors.md#resolving-lr-conflicts).
 
-GLR will be implemented in the future versions of Rustemo.
+Rustemo uses a particular GLR algorithm called [Right-Nulled GLR](https://www.doi.org/10.1145/1146809.1146810)
+which is a correct and more efficient version of the [original GLR](https://dl.acm.org/doi/abs/10.5555/1623611.1623625).
+
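
To make the idea of a parse forest concrete, here is a small self-contained sketch that enumerates every tree for a flat expression. It assumes a fully ambiguous grammar `E: E '+' E | E '*' E | num` with no priorities or associativities, which may differ from the grammar used in the tests. It does not use Rustemo or the GLR algorithm itself; `all_trees` is a hypothetical helper that brute-forces the same set of interpretations.

```rust
// Enumerate every way to group a flat `num (op num)*` token sequence.
// Each returned string is one tree of the (assumed) ambiguous grammar
//   E: E '+' E | E '*' E | num
// This is a brute-force illustration, not the GLR algorithm.
fn all_trees(tokens: &[&str]) -> Vec<String> {
    // A single token is a leaf.
    if tokens.len() == 1 {
        return vec![tokens[0].to_string()];
    }
    let mut trees = Vec::new();
    // Operators sit at odd indices; try each one as the root of the tree.
    for op in (1..tokens.len()).step_by(2) {
        for left in all_trees(&tokens[..op]) {
            for right in all_trees(&tokens[op + 1..]) {
                trees.push(format!("({} {} {})", left, tokens[op], right));
            }
        }
    }
    trees
}

fn main() {
    let tokens = ["1", "+", "4", "*", "9"];
    for tree in all_trees(&tokens) {
        println!("{tree}");
    }
}
```

For `1 + 4 * 9` this yields two trees, `(1 + (4 * 9))` and `((1 + 4) * 9)`. The count grows quickly with the number of operators, which is why a forest is stored as a shared structure and individual trees are extracted on demand.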
diff --git a/tests/src/glr/build/mod.rs b/tests/src/glr/build/mod.rs
@@ -9,6 +9,7 @@ rustemo_mod!(calc_actions, "/src/glr/build");
 use self::calc::CalcParser;
 use rustemo::parser::Parser;
 
+// ANCHOR: build
 #[test]
 fn glr_tree_build_default() {
     let forest = CalcParser::new().parse("1 + 4 * 9").unwrap();
@@ -27,6 +28,7 @@ fn glr_tree_build_default() {
         format!("{:#?}", forest.get_tree(1).unwrap().build(&mut builder))
     );
 }
+// ANCHOR_END: build
 
 #[test]
 fn glr_tree_build_generic() {

diff --git a/tests/src/glr/forest/mod.rs b/tests/src/glr/forest/mod.rs
@@ -73,6 +73,7 @@ fn glr_calc_parse_ambiguities() {
     );
 }
 
+// ANCHOR: forest
 #[test]
 fn glr_extract_tree_from_forest() {
     let forest = CalcParser::new().parse("1 + 4 * 9 + 3 * 2 + 7").unwrap();
@@ -98,3 +99,4 @@ fn glr_extract_tree_from_forest() {
         format!("{:#?}", tree.children()[0].children())
     );
 }
+// ANCHOR_END: forest