Add boilerplate for regex AST nodes. #3

Louis-He · 2024-10-22T02:44:47Z

Description

Add basic data structures for the lexer and parser.

Validation performed

Added basic unit tests

LinZhihao-723

Two major comments:

I suggest removing the token and parser from this PR, for the following reasons. We're supposed to implement an LALR parser that performs proper parsing with configurable rules, which means the current naive parser/token implementation will be deprecated anyway. Having them public in the library is a bit odd, since they are not meant to be part of the API we expose to users. If we want to keep them, we can move them into the integration tests and use them only for testing the AST. Let's focus on the AST nodes in this PR first.
For AST variants, there are some problems:

I think there should be unit tests/integration tests for both the enum and the variant types. Currently, the code is written, but no code path actually uses it. As I raised in the inline comments, Debug formatting and partial equality (PartialEq) are not exercised in the unit tests.
The current implementation doesn't specify how instances are created or how errors during creation are handled. Each variant type should define a factory function, new, for constructing a node instance. Ideally, the new functions should return Result<AstNodeXXX, OurLibraryError>, so it's probably better to also include a basic implementation of our library's error system (in this PR or in a separate PR).
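A minimal sketch of the factory-function pattern described above. It assumes a hypothetical library error type, RegexError (standing in for OurLibraryError), and a simplified AstNodeLiteral that wraps a single char. The rule that rejects the NUL character is a placeholder for whatever validation the real nodes need; the PR does not define it.

```rust
use std::fmt;

/// Hypothetical library-wide error type (placeholder for OurLibraryError).
#[derive(Debug, PartialEq)]
pub enum RegexError {
    /// The literal node was given a character it refuses to represent.
    InvalidLiteral(char),
}

impl fmt::Display for RegexError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RegexError::InvalidLiteral(c) => write!(f, "invalid literal: {:?}", c),
        }
    }
}

impl std::error::Error for RegexError {}

/// Simplified literal node; the real struct lives in ast_node_literal.rs.
#[derive(Debug, PartialEq)]
pub struct AstNodeLiteral {
    value: char,
}

impl AstNodeLiteral {
    /// Validates the input before constructing the node.
    /// Placeholder rule for illustration: the NUL character is rejected.
    pub fn new(value: char) -> Result<Self, RegexError> {
        if value == '\0' {
            return Err(RegexError::InvalidLiteral(value));
        }
        Ok(Self { value })
    }

    pub fn value(&self) -> char {
        self.value
    }
}

fn main() {
    let node = AstNodeLiteral::new('a').expect("'a' should be accepted");
    println!("{:?} -> {}", node, node.value());

    match AstNodeLiteral::new('\0') {
        Ok(_) => println!("unexpectedly accepted NUL"),
        Err(err) => println!("error: {}", err),
    }
}
```

Keeping the field private and exposing only new means every node is validated at construction, which is what makes the Result return type meaningful.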

LinZhihao-723 · 2024-10-22T03:15:33Z

src/parser/ast_node/ast_node.rs

+    Literal(AstNodeLiteral),   // Single character literal
+    Concat(AstNodeConcat),     // Concatenation of two expressions
+    Union(AstNodeUnion),       // Union of two expressions
+    Star(AstNodeStar),         // Kleene Star (zero or more)
+    Plus(AstNodePlus),         // One or more
+    Optional(AstNodeOptional), // Zero or one (optional)
+    Group(AstNodeGroup),       // Capturing group


Suggested change

-    Literal(AstNodeLiteral),   // Single character literal
-    Concat(AstNodeConcat),     // Concatenation of two expressions
-    Union(AstNodeUnion),       // Union of two expressions
-    Star(AstNodeStar),         // Kleene Star (zero or more)
-    Plus(AstNodePlus),         // One or more
-    Optional(AstNodeOptional), // Zero or one (optional)
-    Group(AstNodeGroup),       // Capturing group
+    Literal(AstNodeLiteral),
+    Concat(AstNodeConcat),
+    Union(AstNodeUnion),
+    Star(AstNodeStar),
+    Plus(AstNodePlus),
+    Optional(AstNodeOptional),
+    Group(AstNodeGroup),

Ideally, these comments should be doc comments on the variants themselves. I'd prefer to remove them to keep the enum definition clean.
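A sketch of the enum with the inline comments turned into /// doc comments, so the descriptions appear in rustdoc output. The payload structs here are simplified placeholders (assumed to hold a char or boxed child nodes); the PR's real definitions live in separate files.

```rust
// Placeholder payload types, assumed for illustration only.
#[derive(Debug, PartialEq)]
pub struct AstNodeLiteral { pub value: char }
#[derive(Debug, PartialEq)]
pub struct AstNodeConcat { pub left: Box<AstNode>, pub right: Box<AstNode> }
#[derive(Debug, PartialEq)]
pub struct AstNodeUnion { pub left: Box<AstNode>, pub right: Box<AstNode> }
#[derive(Debug, PartialEq)]
pub struct AstNodeStar { pub child: Box<AstNode> }
#[derive(Debug, PartialEq)]
pub struct AstNodePlus { pub child: Box<AstNode> }
#[derive(Debug, PartialEq)]
pub struct AstNodeOptional { pub child: Box<AstNode> }
#[derive(Debug, PartialEq)]
pub struct AstNodeGroup { pub child: Box<AstNode> }

#[derive(Debug, PartialEq)]
pub enum AstNode {
    /// Single character literal.
    Literal(AstNodeLiteral),
    /// Concatenation of two expressions.
    Concat(AstNodeConcat),
    /// Union of two expressions.
    Union(AstNodeUnion),
    /// Kleene star (zero or more).
    Star(AstNodeStar),
    /// One or more.
    Plus(AstNodePlus),
    /// Zero or one (optional).
    Optional(AstNodeOptional),
    /// Capturing group.
    Group(AstNodeGroup),
}

fn main() {
    let node = AstNode::Star(AstNodeStar {
        child: Box::new(AstNode::Literal(AstNodeLiteral { value: 'a' })),
    });
    println!("{:?}", node);
}
```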

LinZhihao-723 · 2024-10-22T03:21:32Z

src/parser/ast_node/ast_node.rs

+            (AstNode::Literal(l1), AstNode::Literal(l2)) => l1 == l2,
+            (AstNode::Concat(c1), AstNode::Concat(c2)) => c1 == c2,
+            (AstNode::Union(u1), AstNode::Union(u2)) => u1 == u2,
+            (AstNode::Star(s1), AstNode::Star(s2)) => s1 == s2,
+            (AstNode::Plus(p1), AstNode::Plus(p2)) => p1 == p2,
+            (AstNode::Optional(o1), AstNode::Optional(o2)) => o1 == o2,
+            (AstNode::Group(g1), AstNode::Group(g2)) => g1 == g2,


Suggested change

-            (AstNode::Literal(l1), AstNode::Literal(l2)) => l1 == l2,
-            (AstNode::Concat(c1), AstNode::Concat(c2)) => c1 == c2,
-            (AstNode::Union(u1), AstNode::Union(u2)) => u1 == u2,
-            (AstNode::Star(s1), AstNode::Star(s2)) => s1 == s2,
-            (AstNode::Plus(p1), AstNode::Plus(p2)) => p1 == p2,
-            (AstNode::Optional(o1), AstNode::Optional(o2)) => o1 == o2,
-            (AstNode::Group(g1), AstNode::Group(g2)) => g1 == g2,
+            (AstNode::Literal(lhs), AstNode::Literal(rhs)) => lhs == rhs,
+            (AstNode::Concat(lhs), AstNode::Concat(rhs)) => lhs == rhs,
+            (AstNode::Union(lhs), AstNode::Union(rhs)) => lhs == rhs,
+            (AstNode::Star(lhs), AstNode::Star(rhs)) => lhs == rhs,
+            (AstNode::Plus(lhs), AstNode::Plus(rhs)) => lhs == rhs,
+            (AstNode::Optional(lhs), AstNode::Optional(rhs)) => lhs == rhs,
+            (AstNode::Group(lhs), AstNode::Group(rhs)) => lhs == rhs,

Let's use lhs and rhs for clarity.
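A trimmed sketch of the PartialEq impl with lhs/rhs bindings, shown for only three of the seven variants; the payload structs are simplified placeholders. The catch-all arm returns false whenever the two sides are different variants.

```rust
// Simplified placeholder payloads, assumed for illustration.
#[derive(Debug, PartialEq)]
pub struct AstNodeLiteral { pub value: char }

#[derive(Debug, PartialEq)]
pub struct AstNodeConcat { pub left: Box<AstNode>, pub right: Box<AstNode> }

#[derive(Debug, PartialEq)]
pub struct AstNodeStar { pub child: Box<AstNode> }

#[derive(Debug)]
pub enum AstNode {
    Literal(AstNodeLiteral),
    Concat(AstNodeConcat),
    Star(AstNodeStar),
}

impl PartialEq for AstNode {
    fn eq(&self, other: &Self) -> bool {
        match (self, other) {
            // Same variant: compare the payloads.
            (AstNode::Literal(lhs), AstNode::Literal(rhs)) => lhs == rhs,
            (AstNode::Concat(lhs), AstNode::Concat(rhs)) => lhs == rhs,
            (AstNode::Star(lhs), AstNode::Star(rhs)) => lhs == rhs,
            // Different variants are never equal.
            _ => false,
        }
    }
}

fn main() {
    let lit = |c| Box::new(AstNode::Literal(AstNodeLiteral { value: c }));
    let a = AstNode::Star(AstNodeStar { child: lit('a') });
    let b = AstNode::Star(AstNodeStar { child: lit('a') });
    let c = AstNode::Concat(AstNodeConcat { left: lit('a'), right: lit('b') });
    assert!(a == b);
    assert!(a != c);
}
```

Since every payload here implements PartialEq, #[derive(PartialEq)] on the enum would produce the same behavior; a hand-written impl is only needed if some variant should compare differently.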

LinZhihao-723 · 2024-10-22T03:29:24Z

src/parser/ast_node/ast_node.rs

+impl std::fmt::Debug for AstNode {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        match self {
+            AstNode::Literal(l) => write!(f, "Literal({:?})", l),
+            AstNode::Concat(c) => write!(f, "Concat({:?})", c),
+            AstNode::Union(u) => write!(f, "Union({:?})", u),
+            AstNode::Star(s) => write!(f, "Star({:?})", s),
+            AstNode::Plus(p) => write!(f, "Plus({:?})", p),
+            AstNode::Optional(o) => write!(f, "Optional({:?})", o),
+            AstNode::Group(g) => write!(f, "Group({:?})", g),
+        }
+    }
+}


Two questions:

How does this recursively handle printing child nodes?

This is a general comment that applies to all match expressions. Is there a strong reason for using l, c (which looks more like Go's naming convention)? This convention is super unclear IMO... I'd prefer to follow whatever syn suggests; in our case it would be:

Suggested change

 impl std::fmt::Debug for AstNode {
     fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
         match self {
-            AstNode::Literal(l) => write!(f, "Literal({:?})", l),
-            AstNode::Concat(c) => write!(f, "Concat({:?})", c),
-            AstNode::Union(u) => write!(f, "Union({:?})", u),
-            AstNode::Star(s) => write!(f, "Star({:?})", s),
-            AstNode::Plus(p) => write!(f, "Plus({:?})", p),
-            AstNode::Optional(o) => write!(f, "Optional({:?})", o),
-            AstNode::Group(g) => write!(f, "Group({:?})", g),
+            AstNode::Literal(ast_node) => write!(f, "Literal({:?})", ast_node),
+            AstNode::Concat(ast_node) => write!(f, "Concat({:?})", ast_node),
+            AstNode::Union(ast_node) => write!(f, "Union({:?})", ast_node),
+            AstNode::Star(ast_node) => write!(f, "Star({:?})", ast_node),
+            AstNode::Plus(ast_node) => write!(f, "Plus({:?})", ast_node),
+            AstNode::Optional(ast_node) => write!(f, "Optional({:?})", ast_node),
+            AstNode::Group(ast_node) => write!(f, "Group({:?})", ast_node),
         }
     }
 }
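On the recursion question: a sketch of how recursive printing falls out of trait dispatch, assuming (hypothetically) that compound payloads such as AstNodeConcat hold their children as Box<AstNode> and derive Debug. The {:?} in each arm calls the payload's Debug impl, which formats each Box<AstNode> child by calling back into AstNode's impl. Only three variants are shown, using the suggested ast_node binding name.

```rust
use std::fmt;

// Simplified placeholder payloads, assumed for illustration.
#[derive(Debug)]
pub struct AstNodeLiteral { pub value: char }

#[derive(Debug)]
pub struct AstNodeConcat { pub left: Box<AstNode>, pub right: Box<AstNode> }

#[derive(Debug)]
pub struct AstNodeStar { pub child: Box<AstNode> }

pub enum AstNode {
    Literal(AstNodeLiteral),
    Concat(AstNodeConcat),
    Star(AstNodeStar),
}

impl fmt::Debug for AstNode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `{:?}` on the payload invokes the payload's derived Debug impl;
        // its Box<AstNode> fields delegate back to this impl, so the whole
        // tree is printed recursively.
        match self {
            AstNode::Literal(ast_node) => write!(f, "Literal({:?})", ast_node),
            AstNode::Concat(ast_node) => write!(f, "Concat({:?})", ast_node),
            AstNode::Star(ast_node) => write!(f, "Star({:?})", ast_node),
        }
    }
}

fn main() {
    // Tree for the regex (ab)* without the group node.
    let tree = AstNode::Star(AstNodeStar {
        child: Box::new(AstNode::Concat(AstNodeConcat {
            left: Box::new(AstNode::Literal(AstNodeLiteral { value: 'a' })),
            right: Box::new(AstNode::Literal(AstNodeLiteral { value: 'b' })),
        })),
    });
    println!("{:?}", tree);
}
```

One caveat with write!: the inner {:?} does not inherit the alternate flag, so {:#?} loses pretty-printing below the top level. Writing each arm as f.debug_tuple("Literal").field(ast_node).finish() would preserve it.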

LinZhihao-723 · 2024-10-22T03:31:05Z

src/parser/ast_node/mod.rs

+mod ast_node_concat;
+mod ast_node_group;
+mod ast_node_literal;
+mod ast_node_optional;
+mod ast_node_plus;
+mod ast_node_star;
+mod ast_node_union;


We might need these modules to be public if we implement the NFA in a separate module.

We can make them public when we actually need to.
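If a future NFA module does need these types, one option is to keep the per-node submodules private and re-export only the node types. A single-file sketch, with the module layout assumed from src/parser/ast_node/mod.rs and a simplified AstNodeLiteral:

```rust
mod parser {
    pub mod ast_node {
        // The submodule itself stays private...
        mod ast_node_literal {
            #[derive(Debug, PartialEq)]
            pub struct AstNodeLiteral {
                pub value: char,
            }
        }

        // ...while the node type is re-exported at the ast_node level.
        pub use ast_node_literal::AstNodeLiteral;
    }
}

// Code elsewhere in the crate (e.g. a hypothetical nfa module) imports
// the type without depending on the file layout.
use parser::ast_node::AstNodeLiteral;

fn main() {
    let node = AstNodeLiteral { value: 'a' };
    println!("{:?}", node);
}
```

This keeps the file-per-node layout an internal detail, so it can change later without affecting callers.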

LinZhihao-723 · 2024-10-25T03:32:02Z

src/parser/token.rs

Do we need this file in this PR?

LinZhihao-723

For the PR title, how about:
Add boilerplate for regex AST nodes.
This will be the commit message in the git log after we squash-merge.

Louis-He added 5 commits October 15, 2024 20:33

feat: complete basic lexer and parser for regex.

1d7e5a1

Added basic unit tests

fix all formatting issue

29e2769

add basic data structures for lexar and parser

7083658

refactor each node type to individual files

09ed315

update camal naming

d872840

LinZhihao-723 requested changes Oct 22, 2024


implement new for all nodes and add basic tests

082b6e4

LinZhihao-723 reviewed Oct 25, 2024


remove token

3d1e18c

LinZhihao-723 approved these changes Oct 26, 2024


LinZhihao-723 changed the title from "Basic data structure" to "Add boilerplate for regex AST nodes." Oct 26, 2024

LinZhihao-723 merged commit e5b49f2 into Toplogic-Inc:main Oct 26, 2024
2 checks passed
