diff --git a/docs/src/SUMMARY.md b/docs/src/SUMMARY.md index 1700f890..2dee3cd2 100644 --- a/docs/src/SUMMARY.md +++ b/docs/src/SUMMARY.md @@ -6,6 +6,7 @@ - [Configuration](configuration.md) - [Components](components.md) - [Lexers](lexers.md) + - [Parsers](parsers.md) - [Builders](builders.md) - [CLI](cli.md) - [Handling errors](handling_errors/handling_errors.md) diff --git a/docs/src/configuration.md b/docs/src/configuration.md index 6f512017..4f93bb78 100644 --- a/docs/src/configuration.md +++ b/docs/src/configuration.md @@ -21,7 +21,7 @@ would be: ``` ```admonish note -Don't forget to add `rustmo-compiler` to the `build-dependencies` section of the +Don't forget to add `rustemo-compiler` to the `build-dependencies` section of the `Cargo.toml` file. ``` diff --git a/docs/src/grammar_language.md b/docs/src/grammar_language.md index 9acf5838..0226acfc 100644 --- a/docs/src/grammar_language.md +++ b/docs/src/grammar_language.md @@ -370,19 +370,13 @@ is equivalent to: So using of `*` creates both `A0` and `A1` rules. Action attached to `A0` returns a list of matched `a` and empty list if no match is found. Please note -the [usage of `nops`](./disambiguation.md#nops-and-nopse). In case if -`prefer_shift` strategy is used using `nops` will perform both `REDUCE` and -`SHIFT` during GLR parsing in case what follows zero or more might be another +the [usage of `nops`](./disambiguation.md#nops-and-nopse). If the +`prefer_shift` strategy is used, using `nops` will perform both `REDUCE` and +`SHIFT` during GLR parsing if what follows zero or more might be another element in the sequence. This is most of the time what you need. ``` -```admonish warning -Previous statements will be valid when GLR parsing is implemented. -`{nops}` needs to be implemented.
``` - - ### Repetition modifiers Repetitions (`+`, `*`, `?`) may optionally be followed by a modifier in square diff --git a/docs/src/handling_errors/handling_errors.md b/docs/src/handling_errors/handling_errors.md index 23b835cc..98a33be5 100644 --- a/docs/src/handling_errors/handling_errors.md +++ b/docs/src/handling_errors/handling_errors.md @@ -273,7 +273,7 @@ provide `message`, `file` and `location` inside the file. ```admonish todo 1. Lexical ambiguities - when there can be recognized multiple tokens at the current position. -2. Syntactic ambiguities - aplicable only to GLR - when multiple +2. Syntactic ambiguities - applicable only to GLR - when multiple interpretation/trees of the input can be constructed. These are not errors per se so should be moved to some other chapter. diff --git a/docs/src/introduction.md b/docs/src/introduction.md index e05574de..2c5419c9 100644 --- a/docs/src/introduction.md +++ b/docs/src/introduction.md @@ -3,10 +3,6 @@ Rustemo is a LR/GLR parser generator for Rust (a.k.a. [compiler-compiler](https://en.wikipedia.org/wiki/Compiler-compiler)). -```admonish note -Only LR is implemented at the moment. See the roadmap in the [README](https://github.com/igordejanovic/rustemo/#roadmap-tentative). -``` - Basically, this kind of tools, given a formal grammar of the language, produce a program that can transform unstructured text (a sequence of characters, or more generally a sequence of tokens) to a structured (tree-like or graph-like) form diff --git a/docs/src/parsers.md b/docs/src/parsers.md index 7ced9df2..4db8c6d9 100644 --- a/docs/src/parsers.md +++ b/docs/src/parsers.md @@ -1 +1,68 @@ # Parsers + +Parsers use tokens from the lexer as input and recognize syntactic elements. Then, they call a builder to produce the final output.
+ +There are two flavours of parsers supported by Rustemo: + +- Deterministic LR +- Non-deterministic GLR, or more precisely Right-Nulled GLR + +```admonish tip +GLR parsing is more complex as it must handle all possibilities, so there is some +overhead and LR parsing is generally faster. Thus, use GLR only if you know that +you need it or in the early development process when you want to deal with +SHIFT/REDUCE conflicts later. + +Another benefit of LR parsing is that it is deterministic and non-ambiguous. If +the input can be parsed, there is only one possible way to do it with LR. +``` + +The API for both flavours is similar. You create an instance of the generated +parser type and call either `parse` or `parse_file`, where the first method +accepts the input directly while the second method accepts the path to the file +that needs to be parsed. + +For example, in the calculator tutorial, we create a new parser instance and +call `parse` to parse the input supplied by the user on stdin: + +```rust +{{#include ./tutorials/calculator/calculator1/src/main.rs:main}} +``` + +The parser type `CalculatorParser` is generated by Rustemo from the grammar +`calculator.rustemo`. + +The result of the parsing process is a `Result` value which contains either the +result of parsing if successful, in the `Ok` variant, or the error value in the +`Err` variant. + +If deterministic parsing is used, the result will be the final output constructed +by the [configured builder](./builders.md). + +For GLR, the result will be a `Forest`, which contains all the possible +trees/solutions for the given input. To get the final output, you have to choose a +tree and call the builder over it. + +To generate a GLR parser, either set the algorithm using the settings API (e.g. from a `build.rs` script): + +```rust +rustemo_compiler::Settings::new().parser_algo(ParserAlgo::GLR).process_dir() +``` + +or call the `rcomp` CLI with `--parser-algo glr` over your grammar file.
+ +For an example of calling a GLR parser, see this test: + +```rust +{{#include ../../tests/src/glr/forest/mod.rs:forest}} +``` + +The most useful API calls for `Forest` are `get_tree` and `get_first_tree`. +There is also `solutions`, which gives you the number of trees in the forest. + +A tree can accept a builder using the `build` method. For an example of calling +the default builder over a forest tree, see this test: + +```rust +{{#include ../../tests/src/glr/build/mod.rs:build}} +``` diff --git a/docs/src/parsing/parsing.md b/docs/src/parsing/parsing.md index bced1687..04c0dabc 100644 --- a/docs/src/parsing/parsing.md +++ b/docs/src/parsing/parsing.md @@ -51,8 +51,9 @@ by a table used during parsing to decide the action and transition to perform. Given a sequence of tokens, if machine starts in a start state and end-up in an accepting state, we say that the sequence of tokens (a sentence) belongs to the language recognized by the DFSA. A set of languages which can be recognized by -DFSA are called deterministic [context-free languages - CFG](). NFSA can -recognize a full set of CFG. GLR parsing is based on NFSA. +DFSA are called deterministic [context-free languages - +CFG](https://en.wikipedia.org/wiki/Context-free_language). NFSA can recognize a +full set of CFG. GLR parsing is based on NFSA. Depending on the algorithm used to produce the FSA table we have different LR variants (SLR, LALR etc.). They only differ by the table they use, the parsing @@ -276,15 +277,18 @@ Parser -> Builder: Get build product # GLR parsing GLR is a generalized version of LR which accepts a full set of CFG. If multiple -actions can be executed the parser will split and investigate each possibility. -If some of the path prove wrong it would be discarded (we call these local -splits - local ambiguities) but if multiple paths lead to the successful parse -then all interpretations are valid and instead of the parse tree we get the -parse forest.
In that case we say that our language is ambiguous. +parse actions can be executed at the current parser state, the parser will split +and investigate each possibility. If some of the paths prove wrong, they are +discarded (we call these splits local ambiguities), but if multiple paths lead +to a successful parse, then all interpretations are valid and instead of a +parse tree we get a parse forest. In that case we say that our language is +ambiguous. Due to the fact that automata handled by GLR can be non-deterministic we say that GLR is a form of non-deterministic parsing. See more in the [section on resolving LR conflicts](../handling_errors/handling_errors.md#resolving-lr-conflicts). -GLR will be implemented in the future versions of Rustemo. +Rustemo uses a particular variant of GLR called [Right-Nulled GLR](https://www.doi.org/10.1145/1146809.1146810), +which is a correct and more efficient version of the [original GLR](https://dl.acm.org/doi/abs/10.5555/1623611.1623625).
+ diff --git a/tests/src/glr/build/mod.rs b/tests/src/glr/build/mod.rs index 22079d0b..c8e729af 100644 --- a/tests/src/glr/build/mod.rs +++ b/tests/src/glr/build/mod.rs @@ -9,6 +9,7 @@ rustemo_mod!(calc_actions, "/src/glr/build"); use self::calc::CalcParser; use rustemo::parser::Parser; +// ANCHOR: build #[test] fn glr_tree_build_default() { let forest = CalcParser::new().parse("1 + 4 * 9").unwrap(); @@ -27,6 +28,7 @@ fn glr_tree_build_default() { format!("{:#?}", forest.get_tree(1).unwrap().build(&mut builder)) ); } +// ANCHOR_END: build #[test] fn glr_tree_build_generic() { diff --git a/tests/src/glr/forest/mod.rs b/tests/src/glr/forest/mod.rs index 1a822e39..512da64d 100644 --- a/tests/src/glr/forest/mod.rs +++ b/tests/src/glr/forest/mod.rs @@ -73,6 +73,7 @@ fn glr_calc_parse_ambiguities() { ); } +// ANCHOR: forest #[test] fn glr_extract_tree_from_forest() { let forest = CalcParser::new().parse("1 + 4 * 9 + 3 * 2 + 7").unwrap(); @@ -98,3 +99,4 @@ fn glr_extract_tree_from_forest() { format!("{:#?}", tree.children()[0].children()) ); } +// ANCHOR_END: forest
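As a rough illustration of why the number reported by `solutions` can grow quickly, here is a self-contained sketch (not part of Rustemo's API) that counts the trees an ambiguous expression grammar admits. It assumes a hypothetical grammar `E: E '+' E | E '*' E | Num;` with no priorities or associativity declared; the calculator grammar used by the tests above may differ. Under that assumption, an input with `n` binary operators has Catalan(n) parse trees, so `1 + 4 * 9` has 2 trees and `1 + 4 * 9 + 3 * 2 + 7` has 42.

```rust
/// Number of distinct parse trees for an infix expression with `ops` binary
/// operators, assuming a fully ambiguous (hypothetical) grammar such as
/// `E: E '+' E | E '*' E | Num;` with no priorities or associativity.
/// This is the Catalan number C(ops), computed with the recurrence
/// C(0) = 1, C(n) = sum over i in 0..n of C(i) * C(n - 1 - i).
fn tree_count(ops: usize) -> u64 {
    let mut c = vec![0u64; ops + 1];
    c[0] = 1;
    for n in 1..=ops {
        // Pick the operator at the root of the tree: `i` operators end up
        // in the left subtree and `n - 1 - i` in the right subtree.
        c[n] = (0..n).map(|i| c[i] * c[n - 1 - i]).sum::<u64>();
    }
    c[ops]
}

fn main() {
    for input in ["1 + 4 * 9", "1 + 4 * 9 + 3 * 2 + 7"] {
        // Count binary operators in a whitespace-separated expression.
        let ops = input
            .split_whitespace()
            .filter(|t| *t == "+" || *t == "*")
            .count();
        println!("{input:?}: {} trees", tree_count(ops));
    }
}
```

Declaring priorities and associativity for `+` and `*` in the grammar would reduce the forest to a single tree for these inputs, which is why LR parsing with resolved conflicts needs no tree selection step.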