Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add boilerplate for regex AST nodes. #3

Merged
merged 7 commits into from
Oct 26, 2024

Conversation

Louis-He
Copy link
Contributor

Description

Add basic data structure for lexar and parser

Validation performed

Copy link
Contributor

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two major comments:

  1. I will suggest to remove token and parser from the current PR for the following reasons. We're suppose to implement a LALR parser to execute proper parsing with configurable rules. That means we will deprecate the current naive parser/token implementation always; having them public in the library is a little weird as they are not supposed to be the API we expose to users. If we want to keep them, we can move them into the integration test for testing ast only. Let's try to focus on ast node in this PR first
  2. For AST variants, there are some problems:
  • I think there should be some unit tests/integration tests for both the enum and the variant types. Currently, we have the code written, but no code path actual uses them. Like the concern I raised in the inline comments, debug serialization and partial match are not exercised in the unit tests.
  • The current implementation doesn't provide specifications of how the instances are created, and how to handle the error on creation. There should be factory functions new defined for each variant type to provide ways of constructing a node instance. Ideally, the new functions should return Result<AstNodeXXX, OurLibraryError>, so it's probably better to also include a basic implementation of the error system of our library (in this PR or in a separate PR).

Comment on lines 11 to 17
Literal(AstNodeLiteral), // Single character literal
Concat(AstNodeConcat), // Concatenation of two expressions
Union(AstNodeUnion), // Union of two expressions
Star(AstNodeStar), // Kleene Star (zero or more)
Plus(AstNodePlus), // One or more
Optional(AstNodeOptional), // Zero or one (optional)
Group(AstNodeGroup), // Capturing group
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Literal(AstNodeLiteral), // Single character literal
Concat(AstNodeConcat), // Concatenation of two expressions
Union(AstNodeUnion), // Union of two expressions
Star(AstNodeStar), // Kleene Star (zero or more)
Plus(AstNodePlus), // One or more
Optional(AstNodeOptional), // Zero or one (optional)
Group(AstNodeGroup), // Capturing group
Literal(AstNodeLiteral),
Concat(AstNodeConcat),
Union(AstNodeUnion),
Star(AstNodeStar),
Plus(AstNodePlus),
Optional(AstNodeOptional),
Group(AstNodeGroup),

Ideally these comments should be the doc string of the actual variant. I'd prefer to remove them to make the enum definition cleaner

Comment on lines 23 to 29
(AstNode::Literal(l1), AstNode::Literal(l2)) => l1 == l2,
(AstNode::Concat(c1), AstNode::Concat(c2)) => c1 == c2,
(AstNode::Union(u1), AstNode::Union(u2)) => u1 == u2,
(AstNode::Star(s1), AstNode::Star(s2)) => s1 == s2,
(AstNode::Plus(p1), AstNode::Plus(p2)) => p1 == p2,
(AstNode::Optional(o1), AstNode::Optional(o2)) => o1 == o2,
(AstNode::Group(g1), AstNode::Group(g2)) => g1 == g2,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(AstNode::Literal(l1), AstNode::Literal(l2)) => l1 == l2,
(AstNode::Concat(c1), AstNode::Concat(c2)) => c1 == c2,
(AstNode::Union(u1), AstNode::Union(u2)) => u1 == u2,
(AstNode::Star(s1), AstNode::Star(s2)) => s1 == s2,
(AstNode::Plus(p1), AstNode::Plus(p2)) => p1 == p2,
(AstNode::Optional(o1), AstNode::Optional(o2)) => o1 == o2,
(AstNode::Group(g1), AstNode::Group(g2)) => g1 == g2,
(AstNode::Literal(lhs), AstNode::Literal(rhs)) => lhs == rhs,
(AstNode::Concat(lhs), AstNode::Concat(rhs)) => lhs == rhs,
(AstNode::Union(lhs), AstNode::Union(rhs)) => lhs == rhs,
(AstNode::Star(lhs), AstNode::Star(rhs)) => lhs == rhs,
(AstNode::Plus(lhs), AstNode::Plus(rhs)) => lhs == rhs,
(AstNode::Optional(lhs), AstNode::Optional(rhs)) => lhs == rhs,
(AstNode::Group(lhs), AstNode::Group(rhs)) => lhs == rhs,

Let's use lhs and rhs for clarifications

Comment on lines 35 to 47
impl std::fmt::Debug for AstNode {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
AstNode::Literal(l) => write!(f, "Literal({:?})", l),
AstNode::Concat(c) => write!(f, "Concat({:?})", c),
AstNode::Union(u) => write!(f, "Union({:?})", u),
AstNode::Star(s) => write!(f, "Star({:?})", s),
AstNode::Plus(p) => write!(f, "Plus({:?})", p),
AstNode::Optional(o) => write!(f, "Optional({:?})", o),
AstNode::Group(g) => write!(f, "Group({:?})", g),
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions:

  1. How does this recursively handle printing children nodes?
  2. This is a general comment that should apply to all match. Any strong reasons of using l, c (most like Go's naming convention)? This convention is super unclear IMO... I'd prefer to use whatever syn suggested, in our case it's like:
Suggested change
impl std::fmt::Debug for AstNode {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
AstNode::Literal(l) => write!(f, "Literal({:?})", l),
AstNode::Concat(c) => write!(f, "Concat({:?})", c),
AstNode::Union(u) => write!(f, "Union({:?})", u),
AstNode::Star(s) => write!(f, "Star({:?})", s),
AstNode::Plus(p) => write!(f, "Plus({:?})", p),
AstNode::Optional(o) => write!(f, "Optional({:?})", o),
AstNode::Group(g) => write!(f, "Group({:?})", g),
}
}
}
impl std::fmt::Debug for AstNode {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
AstNode::Literal(ast_node) => write!(f, "Literal({:?})", ast_node),
AstNode::Concat(ast_node) => write!(f, "Concat({:?})", ast_node),
AstNode::Union(ast_node) => write!(f, "Union({:?})", ast_node),
AstNode::Star(ast_node) => write!(f, "Star({:?})", ast_node),
AstNode::Plus(ast_node) => write!(f, "Plus({:?})", ast_node),
AstNode::Optional(ast_node) => write!(f, "Optional({:?})", ast_node),
AstNode::Group(ast_node) => write!(f, "Group({:?})", ast_node),
}
}
}

Comment on lines +2 to +8
mod ast_node_concat;
mod ast_node_group;
mod ast_node_literal;
mod ast_node_optional;
mod ast_node_plus;
mod ast_node_star;
mod ast_node_union;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need them to be public if we implement nfa in a separate mod

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can do this when we need to do this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this file in this PR?

Copy link
Contributor

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For PR title, how about:
Add boilerplate for regex AST nodes.
This will be the commit message in the git log after we do a squash merge

@LinZhihao-723 LinZhihao-723 changed the title Basic data structure Add boilerplate for regex AST nodes. Oct 26, 2024
@LinZhihao-723 LinZhihao-723 merged commit e5b49f2 into Toplogic-Inc:main Oct 26, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants