HexMerlin committed Jan 19, 2025
1 parent 4f55c23 commit 43fa022
Showing 2 changed files with 45 additions and 73 deletions.
116 changes: 44 additions & 72 deletions docs/ALANG.html
@@ -102,7 +102,7 @@ <h2 id="alang-grammar-specification">Alang Grammar Specification</h2>
<table>
<thead>
<tr>
<th>Grammar Rule</th>
<th>Rule</th>
<th>Expansion</th>
</tr>
</thead>
@@ -113,7 +113,7 @@ <h2 id="alang-grammar-specification">Alang Grammar Specification</h2>
</tr>
<tr>
<td>🔹Union</td>
<td>Difference (<code>|</code> Difference)*</td>
<td>Difference (<code>\|</code> Difference)*</td>
</tr>
<tr>
<td>🔹Difference</td>
@@ -145,7 +145,7 @@ <h2 id="alang-grammar-specification">Alang Grammar Specification</h2>
</tr>
<tr>
<td>🔹Complement</td>
<td>PrimaryRegex <code>~</code></td>
<td>PrimaryRegex <code>~</code></td>
</tr>
<tr>
<td>PrimaryRegex</td>
@@ -169,20 +169,14 @@ <h2 id="alang-grammar-specification">Alang Grammar Specification</h2>
</tr>
</tbody>
</table>
<p>🔹 Denotes an actual node type in the resulting AST (abstract syntax tree) output by the parser.</p>
<p>Note to developers: All types marked with a 🔹 have corresponding classes with the exact same names in the namespace <strong>Automata.Core.Alang</strong>.</p>
<p>For an input to be valid, the root rule AlangRegex must cover the entire input, with no residue.</p>
<h3 id="operators">Operators</h3>
<ul>
<li>Operators with higher precedence levels bind more tightly than those with lower levels.</li>
<li>Operators of the same precedence level are left-associative (left-to-right).</li>
<li>All <em>unary</em> operators are <em>postfix operators</em> and all <em>binary</em> operators are <em>infix</em> operators.</li>
</ul>
<p>🔹 Denotes a node type that can be included in the resulting parse tree.</p>
<p>The root rule AlangRegex must cover the entire input, with no residue.</p>
<h3 id="operators-ordered-by-precedence-lowest-to-highest">Operators Ordered by Precedence (Lowest-to-Highest)</h3>
<table>
<thead>
<tr>
<th>Precedence</th>
<th>Operation/Unit</th>
<th>Operation</th>
<th>Operator Character</th>
<th>Position &amp; Arity</th>
</tr>
@@ -262,77 +256,55 @@ <h3 id="operators">Operators</h3>
</tr>
</tbody>
</table>
<h3 id="whitespace">Whitespace</h3>
<h3 id="operation-definitions">Operation Definitions</h3>
<pre><code>Union: L₁ ∪ L₂ = { w | w ∈ L₁ or w ∈ L₂ }

Difference: L₁ - L₂ = { w | w ∈ L₁ and w ∉ L₂ }

Intersection: L₁ ∩ L₂ = { w | w ∈ L₁ and w ∈ L₂ }

Concatenation: L₁ ⋅ L₂ = { w | w = uv, u ∈ L₁, v ∈ L₂ }

Option: L? = L ∪ { ε }

Kleene Star: L* = ⋃ₙ₌₀^∞ Lⁿ, where L⁰ = { ε }, Lⁿ = L ⋅ Lⁿ⁻¹ for n ≥ 1

Kleene Plus: L⁺ = ⋃ₙ₌₁^∞ Lⁿ, where Lⁿ = L ⋅ Lⁿ⁻¹ for n ≥ 1

Complement: ᒾL = Σ* \ L
</code></pre>
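<p>The definitions above can be tried out on small finite languages. The following is a minimal Python sketch, not part of the Automata library: a word is modelled as a tuple of symbols, a language as a set of such tuples, and the helper names (<code>concat</code>, <code>option</code>, <code>power</code>, <code>star_up_to</code>, <code>plus_up_to</code>, <code>complement_up_to</code>) are hypothetical. Because L*, L⁺ and Σ* are infinite, the sketch only enumerates them up to an assumed maximum number of repetitions or word length.</p>

```python
from itertools import product

def concat(l1, l2):
    """Concatenation: every word of l1 followed by every word of l2."""
    return {u + v for u, v in product(l1, l2)}

def option(l):
    """Option: L? = L united with the language containing only the empty word."""
    return l | {()}

def power(l, n):
    """L^n, with L^0 = { empty word } and L^n = L . L^(n-1)."""
    result = {()}
    for _ in range(n):
        result = concat(l, result)
    return result

def star_up_to(l, max_n):
    """Finite approximation of L*: the union of L^0 .. L^max_n."""
    words = set()
    for n in range(max_n + 1):
        words |= power(l, n)
    return words

def plus_up_to(l, max_n):
    """Finite approximation of L+: the union of L^1 .. L^max_n."""
    words = set()
    for n in range(1, max_n + 1):
        words |= power(l, n)
    return words

def complement_up_to(l, alphabet, max_len):
    """Complement restricted to words of at most max_len symbols."""
    universe = set()
    for n in range(max_len + 1):
        universe |= set(product(sorted(alphabet), repeat=n))
    return universe - l

A = {("a",)}
B = {("a",), ("b",)}

print(A | B)                # union
print(B - A)                # difference
print(A.intersection(B))    # intersection
print(concat(A, B))         # concatenation
print(star_up_to(A, 2))     # bounded Kleene star
```

<p>Union, difference and intersection map directly onto Python's built-in set operations, so only the remaining operations need helper functions.</p>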
<h3 id="operators">Operators</h3>
<ul>
<li>Any amount of whitespace is allowed anywhere in the grammar, except within Symbols.</li>
<li>Whitespace is never required, except to separate <em>directly</em> adjacent Symbols or operators.
The parser treats all reserved tokens as delimiters, so the following are correctly delimited: <code>hello+world</code> or <code>hello(world)</code>.</li>
<li>Whitespace denotes any whitespace character (e.g. space, tab, newline).</li>
<li>The formal whitespace definition is equivalent to .NET's <code>char.IsWhiteSpace(char c)</code>.</li>
<li>Operators with higher precedence levels bind more tightly than those with lower levels.</li>
<li>Operators of the same precedence level are left-associative (left-to-right).</li>
</ul>
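<p>Left-associativity and precedence both change the resulting language, which a few finite sets can illustrate. The sketch below uses plain Python sets as stand-ins for languages of single-symbol words. The sample sets are hypothetical, and the parentheses are written out explicitly because Python's own operator precedence differs from Alang's.</p>

```python
# Languages of single-symbol words, modelled as plain sets (illustrative values).
X = {"a", "b", "c"}
Y = {"b", "c"}
Z = {"c"}

# Alang reads "X - Y - Z" left to right, i.e. as (X - Y) - Z.
left_assoc = (X - Y) - Z
# Grouping from the right gives a different language.
right_assoc = X - (Y - Z)

A = {"a"}
# Difference binds more tightly than union, so "A - A | A" means (A - A) | A.
tight_difference = (A - A) | A
# Forcing the union first yields the empty language instead.
loose_difference = A - (A | A)

print(left_assoc, right_assoc)
print(tight_difference, loose_difference)
```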
<h3 id="whitespaces">Whitespaces</h3>
<ul>
<li>Whitespace denotes any whitespace character (e.g. space, tab, newline).</li>
<li>Whitespace is allowed anywhere in the grammar, except within Symbols.</li>
<li>Whitespace is never required, except to separate directly adjacent Symbols or operators.</li>
</ul>
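<p>The whitespace rules amount to treating both whitespace and reserved characters as delimiters between Symbols. The following Python sketch shows the idea; it is not the library's tokenizer. The <code>RESERVED</code> set is an assumption derived from the operator table (including the wildcard dot), and Python's <code>str.isspace</code> is used as an approximation of the whitespace definition.</p>

```python
# Assumed reserved characters: the operators, parentheses and the wildcard dot.
RESERVED = set("|-&?*+~().")

def tokenize(text):
    """Split text into Symbols and single-character reserved tokens."""
    tokens = []
    symbol = []
    for ch in text:
        if ch.isspace() or ch in RESERVED:
            # Whitespace and reserved characters both end the current Symbol.
            if symbol:
                tokens.append("".join(symbol))
                symbol = []
            # Reserved characters are tokens themselves; whitespace is dropped.
            if ch in RESERVED:
                tokens.append(ch)
        else:
            symbol.append(ch)
    if symbol:
        tokens.append("".join(symbol))
    return tokens

print(tokenize("hello+world"))
print(tokenize("hello(world)"))
print(tokenize("hello world"))
```

<p>Only the last input needs whitespace: without it, <code>helloworld</code> would be read as a single Symbol.</p>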
<h3 id="symbols">Symbols</h3>
<p><strong>Symbols</strong> have a specific meaning - as formally defined by automata theory:</p>
<p><strong>Symbols</strong> have a specific meaning and are defined as:</p>
<ul>
<li>User-defined string literals that constitute the <em>atoms</em> of Alang expressions.</li>
<li>They are equivalent to <strong>symbols</strong> in finite-state automata.</li>
<li>Directly equivalent to <strong>alphabet symbols</strong> in the context of finite-state automata.</li>
<li>Can contain any characters except reserved operator characters or whitespace.</li>
<li>They can never be empty.</li>
<li>Symbols are <em>strings</em> and are not to be confused with characters.</li>
<li>Can never be empty.</li>
<li>They are not to be confused with characters.</li>
</ul>
<h3 id="wildcard">Wildcard</h3>
<p>A Wildcard is a special token denoted by a <code>.</code> (dot).</p>
<p>It represents any symbol in the alphabet.</p>
<p>For example:</p>
<p><code>. - hello</code> represents the language of all symbols except 'hello'.</p>
<p><code>(. - hello).*</code> represents the language of all non-empty sequences, except those starting with 'hello'.</p>
<h3 id="the-empty-language--and-the-language-containing-only-epsilon-ε">The Empty Language ∅ and The Language containing only epsilon {ε}</h3>
<ul>
<li><p>The Empty Language ∅ is the language that does not contain anything.</p>
<ul>
<li>It is written in Alang using empty parentheses <code>()</code>.</li>
<li>Its corresponding grammar rule is <code>EmptyLang</code> and the parse tree type is <code>EmptyLang</code>.</li>
<li>Its automata equivalence is an automaton that does not accept anything (not even the empty string).</li>
<li>In most scenarios, <code>()</code> is not required when writing Alang expressions.
However, many operations can result in the empty language. For example, <code>a - (a | b)</code> is equivalent to <code>()</code>.</li>
</ul>
</li>
<li><p>The language containing only the empty string {ε}</p>
<ul>
<li>It is written in Alang as <code>()?</code>, since the Option operator <code>?</code> unites the operand with {ε}: <strong>L? = L ∪ { ε }</strong></li>
<li>Its automata equivalence is an automaton that only accepts ε.</li>
</ul>
</li>
<li><p>Note that <code>()</code> ≠ <code>{ε}</code>. For instance:</p>
<ul>
<li>Concatenating any language <code>L</code> with <code>()</code> =&gt; <code>()</code>.</li>
<li>Concatenating any language <code>L</code> with <code>{ε}</code> =&gt; <code>L</code>.</li>
</ul>
</li>
</ul>
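<p>The difference between ∅ and {ε} is easy to check on finite sets. In this minimal Python sketch a word is a tuple of symbols, so ε is the empty tuple. The names <code>EMPTY_LANG</code>, <code>EPSILON_LANG</code> and <code>concat</code> are illustrative and not taken from the library.</p>

```python
def concat(l1, l2):
    """Concatenation: every word of l1 followed by every word of l2."""
    return {u + v for u in l1 for v in l2}

EMPTY_LANG = set()       # the empty language: contains no words at all
EPSILON_LANG = {()}      # contains exactly one word, the empty word

L = {("a",), ("a", "b")}

# Concatenating with the empty language leaves nothing to pair with.
print(concat(L, EMPTY_LANG))
# Concatenating with {epsilon} leaves every word unchanged.
print(concat(L, EPSILON_LANG))
# Option applied to the empty language gives {epsilon}, mirroring "()?".
print(EMPTY_LANG | EPSILON_LANG)
```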
<h3 id="alang-expression-examples">Alang expression examples</h3>
<p><code>(a? (b | c) )+</code> : All sequences from the set {a, b, c} where any 'a' must be followed by 'b' or 'c'.</p>
<p><code>a+~ b</code> : Complement of 'a+' (all sequences that are not one or more 'a's), followed by a 'b'.</p>
<p><code>(x1 | x2 | x3)* - (x1 x2 x3)+</code> : All sequences of symbols from {x1, x2, x3}, except repetitions of &quot;x1 x2 x3&quot;.</p>
<p><code>()</code> : The empty language that does not accept anything. For example, it is the result of <code>hello - hello</code> and of <code>hello &amp; world</code>.</p>
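<p>For readers more familiar with conventional regular expressions, the first example, <code>(a? (b | c) )+</code>, can be mirrored with Python's <code>re</code> module when every symbol is a single character. This is only an analogy under that assumption: Alang symbols are strings, and operators such as complement and difference have no direct counterpart in <code>re</code>.</p>

```python
import re

# Character-level analogue of the Alang expression (a? (b | c) )+,
# assuming the symbols a, b and c are single characters.
PATTERN = re.compile(r"(a?[bc])+")

def accepts(word):
    """Return True if the whole word matches the pattern."""
    return PATTERN.fullmatch(word) is not None

print(accepts("abc"))   # an 'a' followed by 'b', then a lone 'c'
print(accepts("ba"))    # trailing 'a' is not followed by 'b' or 'c'
print(accepts(""))      # the plus operator requires at least one repetition
```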
<h3 id="operation-definitions">Operation Definitions</h3>
<pre><code>Union: L₁ ∪ L₂ = { w | w ∈ L₁ or w ∈ L₂ }
Difference: L₁ - L₂ = { w | w ∈ L₁ and w ∉ L₂ }
Intersection: L₁ ∩ L₂ = { w | w ∈ L₁ and w ∈ L₂ }
Concatenation: L₁ ⋅ L₂ = { w | w = uv, u ∈ L₁, v ∈ L₂ }
Option: L? = L ∪ { ε }
Kleene Star: L* = ⋃ₙ₌₀^∞ Lⁿ, where L⁰ = { ε }, Lⁿ = L ⋅ Lⁿ⁻¹ for n ≥ 1
Kleene Plus: L⁺ = ⋃ₙ₌₁^∞ Lⁿ, where Lⁿ = L ⋅ Lⁿ⁻¹ for n ≥ 1
Complement: ᒾL = Σ* \ L
</code></pre>
<h2 id="c-api">C# API</h2>
<p>The Alang parser and FSA compiler are provided by the namespace <strong>Automata.Core.Alang</strong>.</p>
<p>Key class: <strong>AlangRegex</strong></p>
<p>Example usage:</p>
<pre><code class="lang-csharp"> AlangRegex regex = AlangRegex.Parse(&quot;(a? (b | c) )+&quot;); // Create an Alang regex

Mfa fsa = regex.Compile(); // Compile the regex to a minimal finite-state automaton
</code></pre>
<p>For more information, see the <a href="index.html">Automata documentation</a></p>
<p><code>(. - hello).*</code> represents the language of all non-empty sequences, except those starting with 'hello'.</p>
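<p>The two wildcard examples can be checked against a small alphabet. The Python sketch below is illustrative only: <code>ALPHABET</code> is a hypothetical sample alphabet, words are tuples of symbols, and the second function reads <code>(. - hello).*</code> as the concatenation of <code>(. - hello)</code> with <code>.*</code>, as the grammar's precedence rules imply.</p>

```python
ALPHABET = {"hello", "world", "x1"}   # hypothetical sample alphabet

def any_symbol_except(alphabet, excluded):
    """The language of '. - excluded': every single symbol but one."""
    return {s for s in alphabet if s != excluded}

def in_not_hello_then_anything(word, alphabet=ALPHABET):
    """Membership test for (. - hello).* over the given alphabet."""
    if len(word) == 0:
        # (. - hello) must consume exactly one symbol, so the empty word is out.
        return False
    first, rest = word[0], word[1:]
    return (first in any_symbol_except(alphabet, "hello")
            and all(s in alphabet for s in rest))

print(any_symbol_except(ALPHABET, "hello"))
print(in_not_hello_then_anything(("world", "hello")))
print(in_not_hello_then_anything(("hello", "world")))
```

<p>A word may still contain 'hello' after its first symbol. Excluding 'hello' everywhere would instead require an expression such as <code>(. - hello)*</code>.</p>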
<h3 id="empty-language-">Empty Language ∅</h3>
<p>The Empty Language (∅) is the language that does not contain anything.
It is written in Alang as an empty pair of parentheses <code>()</code>.
Its automata equivalence is an automaton that does not accept anything (not even the empty string).
In most scenarios, <code>()</code> is not required when writing Alang expressions.
However, many operations can result in the empty language. For example, <code>a - (a | b)</code> is equivalent to <code>()</code>.</p>

</article>
</div>
2 changes: 1 addition & 1 deletion docs/index.json
@@ -2,7 +2,7 @@
"ALANG.html": {
"href": "ALANG.html",
"title": "Alang (Automata Language) | Automata Docs",
"keywords": "Alang (Automata Language) Alang is a formal language for defining finite-state automata using human-readable regular expressions. It supports many operations, such as union, intersection, complement and set difference, enabling expressions like \"(a? (b | c)* - (b b))+\". Alang's syntax is defined by the Alang Grammar which is an LL(1) context-free grammar. The Alang parser is optimized for fast parsing of very large inputs. The parser validates syntactic correctness and generates detailed error messages for invalid inputs. Alang Grammar Specification Grammar Rule Expansion AlangRegex (root) Union \uD83D\uDD39Union Difference (| Difference)* \uD83D\uDD39Difference Intersection (- Intersection)* \uD83D\uDD39Intersection Concatenation (& Concatenation)* \uD83D\uDD39Concatenation UnaryRegex+ UnaryRegex PrimaryRegex (Option ┃ KleeneStar ┃ KleenePlus ┃ Complement)* \uD83D\uDD39Option PrimaryRegex ? \uD83D\uDD39KleeneStar PrimaryRegex * \uD83D\uDD39KleenePlus PrimaryRegex + \uD83D\uDD39Complement PrimaryRegex ~ PrimaryRegex ( AlangRegex ) ┃ Symbol ┃ Wildcard ┃ EmptyLang \uD83D\uDD39Symbol SymbolChar+ \uD83D\uDD39Wildcard . \uD83D\uDD39EmptyLang () SymbolChar any character except operator characters and whitespace \uD83D\uDD39 Denotes an actual node type in the resulting AST (abstract syntax tree) outputed by the parser. Note to developers: All types marked with a \uD83D\uDD39 have corresponding classes with the exact same names in the namespace Automata.Core.Alang. For an input to be valid, the root rule AlangRegex must cover the entire input, with no residue. Operators Operators with higher precedence levels bind more tightly than those with lower levels. Operators of the same precedence level are left-associative (left-to-right). All unary operators are postfix operators and all binary operators are infix operators. 
Precedence Operation/Unit Operator Character Position & Arity 1 Union L₁ | L₂ Infix Binary 2 Difference L₁ - L₂ Infix Binary 3 Intersection L₁ & L₂ Infix Binary 4 Concatenation L₁ L₂ Infix Implicit 5 Option L ? Postfix Unary 5 Kleene Star L* Postfix Unary 5 Kleene Plus L+ Postfix Unary 5 Complement L~ Postfix Unary 6 Group ( L ) Enclosing Unary 7 EmptyLang () Empty parentheses 7 Wildcard . Terminal 7 Symbol string literal Terminal Whitespace Multiple Whitespace is allowed anywhere in the grammar, except within Symbols. Whitespace is never required anywhere - except for separating directly adjacent Symbols or operators. Thus, the parser resolves all reserved tokens as delimiters: The following are correcly delimited: hello+world or hello(world). Whitespace denotes any whitespace character (i.e. space, tab, newline, etc.). The formal whitespace definition is equivalent to .NET's char.IsWhiteSpace(char c). Symbols Symbols have a specific meaning - as formally defined by automata theory: User-defined string literals that constitute the atoms of Alang expressions. It is equivalent to symbols in finite-state automata. Can contain any characters except reserved operator characters or whitespace. They can never be empty. Symbols are strings and are not to be confused with characters, Wildcard A Wildcard is a special token denoted by a . (dot). It represents any symbol in the alphabet. For example: . - hello represents the language of all symbols except 'hello'. (. - hello).* represents the language of all sequences, except those containing 'hello'. The Empty Language ∅ and The Language containing only epsilon {ε} The Empty Language ∅ is the language that does not cotain anything. It is written in Alang using empty parentheses (). Its corresponding grammar rule is EmptyLang and the parse tree type is EmptyLang. Its automata equivalence is an automaton that does not accept anything (not even the empty string). 
In most scenarios, () is not required when writing a Alang expressions. However, many operations can result in the empty language. For example a - (a | b) is equivalent to (). The language containing only the empty string {ε} It is written in Alang as ()?, since the Option operator ? unites the operand with {ε}: L? = L ∪ { ε } Its automata equivalence is an automaton that only accepts ε. Note that () ≠ {ε}. For instance: Concatenating any language L with () => (). Concatenating any language L with {ε} => L. Alang expression examples (a? (b | c) )+ : All sequences from the set {a, b, c} where any 'a' must be followed by 'b' or 'c'. a+~ b : Complement of 'a+' - all sequences that are not 1 or more 'a's, followed by a 'b' (x1 | x2 | x3)* - (x1 x2 x3)+ : All sequences constaining {x1, x2, x3}, except repetitions of \"x1 x2 x3\". () : The empty language that does not accept anything. For example, it is the result from hello - hello and from hello & world. Operation Definitions Union: L₁ ∪ L₂ = { w | w ∈ L₁ or w ∈ L₂ } Difference: L₁ - L₂ = { w | w ∈ L₁ and w ∉ L₂ } Intersection: L₁ ∩ L₂ = { w | w ∈ L₁ and w ∈ L₂ } Concatenation: L₁ ⋅ L₂ = { w | w = uv, u ∈ L₁, v ∈ L₂ } Option: L? = L ∪ { ε } Kleene Star: L* = ⋃ₙ₌₀^∞ Lⁿ, where L⁰ = { ε }, Lⁿ = L ⋅ Lⁿ⁻¹ for n ≥ 1 Kleene Plus: L⁺ = ⋃ₙ₌₁^∞ Lⁿ, where Lⁿ = L ⋅ Lⁿ⁻¹ for n ≥ 1 Complement: ᒾL = Σ* \\ L C# API The Alang parser and FSA compiler is provided by the namespace Automata.Core.Alang. Key class: AlangRegex Example usage: AlangRegex regex = AlangRegex.Parse(\"(a? (b | c) )+\"); // Create an Alang regex Mfa fsa = regex.Compile(); // Compile the regex to a minimal finite-state automaton For more information, see the Automata documentation"
"keywords": "Alang (Automata Language) Alang is a formal language for defining finite-state automata using human-readable regular expressions. It supports many operations, such as union, intersection, complement and set difference, enabling expressions like \"(a? (b | c)* - (b b))+\". Alang's syntax is defined by the Alang Grammar which is an LL(1) context-free grammar. The Alang parser is optimized for fast parsing of very large inputs. The parser validates syntactic correctness and generates detailed error messages for invalid inputs. Alang Grammar Specification Rule Expansion AlangRegex (root) Union \uD83D\uDD39Union Difference (\\| Difference)* \uD83D\uDD39Difference Intersection (- Intersection)* \uD83D\uDD39Intersection Concatenation (& Concatenation)* \uD83D\uDD39Concatenation UnaryRegex+ UnaryRegex PrimaryRegex (Option ┃ KleeneStar ┃ KleenePlus ┃ Complement)* \uD83D\uDD39Option PrimaryRegex ? \uD83D\uDD39KleeneStar PrimaryRegex * \uD83D\uDD39KleenePlus PrimaryRegex + \uD83D\uDD39Complement Primary ~ PrimaryRegex ( AlangRegex ) ┃ Symbol ┃ Wildcard ┃ EmptyLang \uD83D\uDD39Symbol SymbolChar+ \uD83D\uDD39Wildcard . \uD83D\uDD39EmptyLang () SymbolChar any character except operator characters and whitespace \uD83D\uDD39 Denotes a node-type that can be included in the resulting parse tree. The root rule AlangRegex must cover the entire input, with no residue. Operators Ordered by Precedence (Lowest-to-Highest) Precedence Operation Operator Character Position & Arity 1 Union L₁ | L₂ Infix Binary 2 Difference L₁ - L₂ Infix Binary 3 Intersection L₁ & L₂ Infix Binary 4 Concatenation L₁ L₂ Infix Implicit 5 Option L ? Postfix Unary 5 Kleene Star L* Postfix Unary 5 Kleene Plus L+ Postfix Unary 5 Complement L~ Postfix Unary 6 Group ( L ) Enclosing Unary 7 EmptyLang () Empty parentheses 7 Wildcard . 
Terminal 7 Symbol string literal Terminal Operation Definitions Union: L₁ ∪ L₂ = { w | w ∈ L₁ or w ∈ L₂ } Difference: L₁ - L₂ = { w | w ∈ L₁ and w ∉ L₂ } Intersection: L₁ ∩ L₂ = { w | w ∈ L₁ and w ∈ L₂ } Concatenation: L₁ ⋅ L₂ = { w | w = uv, u ∈ L₁, v ∈ L₂ } Option: L? = L ∪ { ε } Kleene Star: L* = ⋃ₙ₌₀^∞ Lⁿ, where L⁰ = { ε }, Lⁿ = L ⋅ Lⁿ⁻¹ for n ≥ 1 Kleene Plus: L⁺ = ⋃ₙ₌₁^∞ Lⁿ, where Lⁿ = L ⋅ Lⁿ⁻¹ for n ≥ 1 Complement: ᒾL = Σ* \\ L Operators Operators with higher precedence levels bind more tightly than those with lower levels. Operators of the same precedence level are left-associative (left-to-right). Whitespaces Whitespace denotes any whitespace character (i.e. space, tab, newline, etc.) Whitespace is allowed anywhere in the grammar, except within Symbols. Whitespace it is never required unless to separate directly adjacent Symbols or operators. Symbols Symbols have a specific meaning and are defined as: User-defined string literals that constitute the atoms of Alang expressions. Directly equivalent to alphabet symbols in the context of finite-state automata. Can contain any characters except reserved operator characters or whitespace. Can never be empty. They are not to be confused with characters. Wildcard A Wildcard is a special token denoted by a . (dot). It represents any symbol in the alphabet. For example: . - hello represents the language of all symbols except 'hello'. (. - hello).* represents the language of all sequences, except those starting with 'hello'. Empty Language ∅ The Empty Language (∅) is the language that does not cotain anything. It written in Alang with an empty pair of parentheses (). Its automata equivalence is an automaton that does not accept anything (not even the empty string). In most scenarios, () is not required when writing a Alang expressions. However, many operations can result in the empty language. For example a - (a | b) is equivalent to ()."
},
"Automata.Core.Alang.AlangCursor.html": {
"href": "Automata.Core.Alang.AlangCursor.html",