-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add boilerplate for regex AST nodes. #3
Add boilerplate for regex AST nodes. #3
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two major comments:
- I will suggest to remove token and parser from the current PR for the following reasons. We're suppose to implement a LALR parser to execute proper parsing with configurable rules. That means we will deprecate the current naive parser/token implementation always; having them public in the library is a little weird as they are not supposed to be the API we expose to users. If we want to keep them, we can move them into the integration test for testing ast only. Let's try to focus on ast node in this PR first
- For AST variants, there are some problems:
- I think there should be some unit tests/integration tests for both the enum and the variant types. Currently, we have the code written, but no code path actual uses them. Like the concern I raised in the inline comments, debug serialization and partial match are not exercised in the unit tests.
- The current implementation doesn't provide specifications of how the instances are created, and how to handle the error on creation. There should be factory functions
new
defined for each variant type to provide ways of constructing a node instance. Ideally, thenew
functions should returnResult<AstNodeXXX, OurLibraryError>
, so it's probably better to also include a basic implementation of the error system of our library (in this PR or in a separate PR).
src/parser/ast_node/ast_node.rs
Outdated
Literal(AstNodeLiteral), // Single character literal | ||
Concat(AstNodeConcat), // Concatenation of two expressions | ||
Union(AstNodeUnion), // Union of two expressions | ||
Star(AstNodeStar), // Kleene Star (zero or more) | ||
Plus(AstNodePlus), // One or more | ||
Optional(AstNodeOptional), // Zero or one (optional) | ||
Group(AstNodeGroup), // Capturing group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Literal(AstNodeLiteral), // Single character literal | |
Concat(AstNodeConcat), // Concatenation of two expressions | |
Union(AstNodeUnion), // Union of two expressions | |
Star(AstNodeStar), // Kleene Star (zero or more) | |
Plus(AstNodePlus), // One or more | |
Optional(AstNodeOptional), // Zero or one (optional) | |
Group(AstNodeGroup), // Capturing group | |
Literal(AstNodeLiteral), | |
Concat(AstNodeConcat), | |
Union(AstNodeUnion), | |
Star(AstNodeStar), | |
Plus(AstNodePlus), | |
Optional(AstNodeOptional), | |
Group(AstNodeGroup), |
Ideally these comments should be the doc string of the actual variant. I'd prefer to remove them to make the enum definition cleaner
src/parser/ast_node/ast_node.rs
Outdated
(AstNode::Literal(l1), AstNode::Literal(l2)) => l1 == l2, | ||
(AstNode::Concat(c1), AstNode::Concat(c2)) => c1 == c2, | ||
(AstNode::Union(u1), AstNode::Union(u2)) => u1 == u2, | ||
(AstNode::Star(s1), AstNode::Star(s2)) => s1 == s2, | ||
(AstNode::Plus(p1), AstNode::Plus(p2)) => p1 == p2, | ||
(AstNode::Optional(o1), AstNode::Optional(o2)) => o1 == o2, | ||
(AstNode::Group(g1), AstNode::Group(g2)) => g1 == g2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(AstNode::Literal(l1), AstNode::Literal(l2)) => l1 == l2, | |
(AstNode::Concat(c1), AstNode::Concat(c2)) => c1 == c2, | |
(AstNode::Union(u1), AstNode::Union(u2)) => u1 == u2, | |
(AstNode::Star(s1), AstNode::Star(s2)) => s1 == s2, | |
(AstNode::Plus(p1), AstNode::Plus(p2)) => p1 == p2, | |
(AstNode::Optional(o1), AstNode::Optional(o2)) => o1 == o2, | |
(AstNode::Group(g1), AstNode::Group(g2)) => g1 == g2, | |
(AstNode::Literal(lhs), AstNode::Literal(rhs)) => lhs == rhs, | |
(AstNode::Concat(lhs), AstNode::Concat(rhs)) => lhs == rhs, | |
(AstNode::Union(lhs), AstNode::Union(rhs)) => lhs == rhs, | |
(AstNode::Star(lhs), AstNode::Star(rhs)) => lhs == rhs, | |
(AstNode::Plus(lhs), AstNode::Plus(rhs)) => lhs == rhs, | |
(AstNode::Optional(lhs), AstNode::Optional(rhs)) => lhs == rhs, | |
(AstNode::Group(lhs), AstNode::Group(rhs)) => lhs == rhs, |
Let's use lhs
and rhs
for clarifications
src/parser/ast_node/ast_node.rs
Outdated
impl std::fmt::Debug for AstNode { | ||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { | ||
match self { | ||
AstNode::Literal(l) => write!(f, "Literal({:?})", l), | ||
AstNode::Concat(c) => write!(f, "Concat({:?})", c), | ||
AstNode::Union(u) => write!(f, "Union({:?})", u), | ||
AstNode::Star(s) => write!(f, "Star({:?})", s), | ||
AstNode::Plus(p) => write!(f, "Plus({:?})", p), | ||
AstNode::Optional(o) => write!(f, "Optional({:?})", o), | ||
AstNode::Group(g) => write!(f, "Group({:?})", g), | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two questions:
- How does this recursively handle printing children nodes?
- This is a general comment that should apply to all
match
. Any strong reasons of usingl
,c
(most like Go's naming convention)? This convention is super unclear IMO... I'd prefer to use whatever syn suggested, in our case it's like:
impl std::fmt::Debug for AstNode { | |
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { | |
match self { | |
AstNode::Literal(l) => write!(f, "Literal({:?})", l), | |
AstNode::Concat(c) => write!(f, "Concat({:?})", c), | |
AstNode::Union(u) => write!(f, "Union({:?})", u), | |
AstNode::Star(s) => write!(f, "Star({:?})", s), | |
AstNode::Plus(p) => write!(f, "Plus({:?})", p), | |
AstNode::Optional(o) => write!(f, "Optional({:?})", o), | |
AstNode::Group(g) => write!(f, "Group({:?})", g), | |
} | |
} | |
} | |
impl std::fmt::Debug for AstNode { | |
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { | |
match self { | |
AstNode::Literal(ast_node) => write!(f, "Literal({:?})", ast_node), | |
AstNode::Concat(ast_node) => write!(f, "Concat({:?})", ast_node), | |
AstNode::Union(ast_node) => write!(f, "Union({:?})", ast_node), | |
AstNode::Star(ast_node) => write!(f, "Star({:?})", ast_node), | |
AstNode::Plus(ast_node) => write!(f, "Plus({:?})", ast_node), | |
AstNode::Optional(ast_node) => write!(f, "Optional({:?})", ast_node), | |
AstNode::Group(ast_node) => write!(f, "Group({:?})", ast_node), | |
} | |
} | |
} |
mod ast_node_concat; | ||
mod ast_node_group; | ||
mod ast_node_literal; | ||
mod ast_node_optional; | ||
mod ast_node_plus; | ||
mod ast_node_star; | ||
mod ast_node_union; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might need them to be public if we implement nfa in a separate mod
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can do this when we need to do this
src/parser/token.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this file in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For PR title, how about:
Add boilerplate for regex AST nodes.
This will be the commit message in the git log after we do a squash merge
Description
Add basic data structure for lexar and parser
Validation performed