-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Unify all types of NFA transitions into NfaSpontaneousTransition
.
#76
refactor: Unify all types of NFA transitions into NfaSpontaneousTransition
.
#76
Conversation
Warning Rate limit exceeded@SharafMohamed has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 27 minutes and 10 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (3)
WalkthroughThis pull request updates the build configuration and core finite automata components in the regex processing system. The changes remove outdated tagged transition handling and introduce a new mechanism based on spontaneous transitions and tag operations. Updates include renaming methods, modifying constructor parameters, and enhancing serialization to use optional values. In addition, new files ( Changes
Sequence Diagram(s)sequenceDiagram
participant RA as RegexAST
participant Nfa as Nfa
participant NS as NfaState
participant TS as TagOperation
participant NST as NfaSpontaneousTransition
RA->>Nfa: add_to_nfa_with_negative_captures(...)
Nfa->>NS: new_state_from_negative_captures(captures, dest_state)
NS->>NS: add_spontaneous_transition(TagOperationType, tag_ids, dest_state)
Nfa->>NST: create NfaSpontaneousTransition(tag_ops, dest_state)
NST->>Nfa: serialize() returns optional string
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (7)
src/log_surgeon/Lexer.tpp (4)
362-365
: Passsymbol_id_t
by value inadd_rule
methodSince
symbol_id_t
is likely a small type (e.g., an integer), passing it by value is more efficient than passing by const reference.Apply this diff to update the parameter:
- void Lexer<TypedNfaState, TypedDfaState>::add_rule( - symbol_id_t const& var_id, + void Lexer<TypedNfaState, TypedDfaState>::add_rule( + symbol_id_t var_id, std::unique_ptr<finite_automata::RegexAST<TypedNfaState>> rule)
369-374
: Passsymbol_id_t
by value inget_rule
methodFor consistency and efficiency, consider passing
symbol_id_t
by value in theget_rule
method.Apply this diff to update the parameter:
- auto Lexer<TypedNfaState, TypedDfaState>::get_rule(symbol_id_t const var_id) + auto Lexer<TypedNfaState, TypedDfaState>::get_rule(symbol_id_t var_id)
381-395
: Improve exception message for duplicate capture namesIncluding the duplicate capture name in the exception message enhances debugging by providing specific information about the error.
Apply this diff to enhance the exception message:
- throw std::invalid_argument("`m_rules` contains capture names that are not unique."); + throw std::invalid_argument("Duplicate capture name detected: " + capture_name);
404-404
: Offer assistance for DFA capture handlingThe TODO comment notes that the DFA currently ignores captures, which might lead to incorrect lexing of patterns with capture groups.
Would you like assistance in updating the DFA implementation to properly handle captures? I can help develop a solution and open a new GitHub issue to track this task.
src/log_surgeon/Lexer.hpp (1)
131-169
: Add documentation for the new getter methods.While the methods are well-structured, they would benefit from documentation explaining:
- The purpose of each method
- The meaning of nullopt returns
- Any preconditions or postconditions
tests/test-nfa.cpp (1)
47-47
: Consider using std::move for rules.The removal of std::move could lead to unnecessary copying. Consider restoring it:
- ByteNfa const nfa{rules}; + ByteNfa const nfa{std::move(rules)};tests/test-lexer.cpp (1)
296-350
: Comprehensive test coverage with room for expansion.The test cases effectively cover both basic lexer functionality and capture groups. However, there's a TODO comment about adding tests for register-related functionality.
Would you like me to help implement the register-related tests once the determinization is implemented?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
CMakeLists.txt
(1 hunks)src/log_surgeon/Lexer.hpp
(5 hunks)src/log_surgeon/Lexer.tpp
(3 hunks)src/log_surgeon/LexicalRule.hpp
(1 hunks)src/log_surgeon/SchemaParser.cpp
(4 hunks)src/log_surgeon/finite_automata/Capture.hpp
(2 hunks)src/log_surgeon/finite_automata/Dfa.hpp
(3 hunks)src/log_surgeon/finite_automata/Nfa.hpp
(3 hunks)src/log_surgeon/finite_automata/NfaState.hpp
(3 hunks)src/log_surgeon/finite_automata/PrefixTree.hpp
(1 hunks)src/log_surgeon/finite_automata/RegexAST.hpp
(21 hunks)src/log_surgeon/finite_automata/RegisterHandler.hpp
(2 hunks)src/log_surgeon/finite_automata/TaggedTransition.hpp
(3 hunks)tests/CMakeLists.txt
(2 hunks)tests/test-capture.cpp
(1 hunks)tests/test-lexer.cpp
(5 hunks)tests/test-nfa.cpp
(4 hunks)tests/test-tag.cpp
(0 hunks)
💤 Files with no reviewable changes (1)
- tests/test-tag.cpp
✅ Files skipped from review due to trivial changes (1)
- src/log_surgeon/finite_automata/PrefixTree.hpp
🧰 Additional context used
📓 Path-based instructions (13)
src/log_surgeon/finite_automata/RegisterHandler.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/LexicalRule.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
tests/test-nfa.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/SchemaParser.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
tests/test-lexer.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/Capture.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/Dfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/NfaState.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/Lexer.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/Nfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/RegexAST.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
tests/test-capture.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/TaggedTransition.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (7)
src/log_surgeon/finite_automata/RegisterHandler.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#56
File: src/log_surgeon/finite_automata/RegisterHandler.hpp:0-0
Timestamp: 2024-11-27T22:25:35.608Z
Learning: In the `RegisterHandler` class in `src/log_surgeon/finite_automata/RegisterHandler.hpp`, the methods `add_register` and `append_position` rely on `emplace_back` and `m_prefix_tree.insert` to handle exceptions correctly and maintain consistent state without requiring additional exception handling.
src/log_surgeon/SchemaParser.cpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
src/log_surgeon/finite_automata/Capture.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
src/log_surgeon/finite_automata/NfaState.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
src/log_surgeon/Lexer.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#56
File: src/log_surgeon/finite_automata/RegisterHandler.hpp:0-0
Timestamp: 2024-11-27T22:25:35.608Z
Learning: In the `RegisterHandler` class in `src/log_surgeon/finite_automata/RegisterHandler.hpp`, the methods `add_register` and `append_position` rely on `emplace_back` and `m_prefix_tree.insert` to handle exceptions correctly and maintain consistent state without requiring additional exception handling.
src/log_surgeon/finite_automata/RegexAST.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/TaggedTransition.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build (ubuntu-latest, Debug)
🔇 Additional comments (27)
src/log_surgeon/finite_automata/Nfa.hpp (1)
43-43
: Good practice: Passrules
by const referenceModifying the constructor to accept
rules
as aconst&
parameter avoids unnecessary copies and improves performance.src/log_surgeon/finite_automata/RegexAST.hpp (4)
86-100
: Efficient management of subtree positive capturesThe addition of methods to get, set, and add
m_subtree_positive_captures
enhances the clarity and maintainability of capture tracking within the AST nodes.
650-660
: Validation of non-null parameters inRegexASTCapture
constructorEnsuring that
group_regex_ast
andcapture
are not null by throwingstd::invalid_argument
improves robustness and aligns with previously learned best practices regarding non-null requirements form_capture
inRegexASTCapture
.This approach reflects the learning that
m_tag
(nowm_capture
) must always be non-null, as noted in past experiences.
788-790
: Correct accumulation of positive captures inRegexASTOr
The constructor correctly accumulates subtree positive captures from both left and right operands, ensuring accurate tracking of captures in alternation expressions.
817-819
: Accurate merging of captures inRegexASTCat
Combining subtree positive captures from both operands in concatenation expressions maintains the integrity of capture groups throughout the AST.
src/log_surgeon/finite_automata/Capture.hpp (2)
1-2
: LGTM! Class and header guards renamed consistently.The renaming from
Tag
toCapture
has been consistently applied across the class name and header guards, which improves code clarity by better reflecting its purpose.Also applies to: 9-9, 20-20
11-11
: LGTM! Constructor implementation is correct.The constructor correctly moves the name parameter into the member variable, which is an efficient approach for string handling.
tests/test-capture.cpp (1)
7-34
: LGTM! Comprehensive test coverage with well-organized test cases.The test suite effectively covers:
- Basic functionality
- Edge cases (empty names)
- Special character handling
- Copy and move semantics
src/log_surgeon/LexicalRule.hpp (1)
26-28
: LGTM! Well-implemented getter with appropriate const correctness.The method correctly:
- Uses
[[nodiscard]]
to prevent accidental value discarding- Returns const pointers to prevent modification of captures
- Delegates to the appropriate regex method
src/log_surgeon/finite_automata/RegisterHandler.hpp (2)
5-5
: LGTM! Appropriate include for fixed-width integer types.The addition of
<cstdint>
is necessary for usinguint32_t
in the type alias.
21-21
: LGTM! Type alias improves code clarity and maintainability.Using
register_id_t
instead of rawuint32_t
enhances code readability and makes future type changes easier.src/log_surgeon/finite_automata/TaggedTransition.hpp (3)
14-15
: LGTM! Type alias improves code clarity.The introduction of
tag_id_t
asstd::uint32_t
enhances code readability and ensures consistent tag ID representation across the codebase.
Line range hint
23-46
: LGTM! Improved type safety with value types.The refactoring from Tag pointers to tag_id_t values:
- Reduces memory management complexity
- Improves type safety
- Aligns with the PR objective of simplifying transition handling
Line range hint
55-78
: LGTM! Consistent refactoring approach.The changes to NegativeTaggedTransition mirror those in PositiveTaggedTransition, maintaining consistency in the transition handling approach.
src/log_surgeon/finite_automata/Dfa.hpp (2)
Line range hint
5-45
: LGTM! Enhanced register handling capabilities.The addition of RegisterHandler and supporting includes improves the DFA's register management capabilities.
82-82
: LGTM! Explicit pointer initialization improves safety.Initializing the state pointer to nullptr prevents potential undefined behavior.
src/log_surgeon/Lexer.hpp (3)
27-29
: LGTM! Well-defined type system.The introduction of clear type aliases enhances code readability and maintains consistency with the codebase's type system.
59-61
: LGTM! Clear error documentation.The updated documentation clearly specifies the exception condition for duplicate capture names.
197-199
: LGTM! Efficient ID mapping implementation.The use of unordered_map provides efficient lookups and aligns with the transition to an ID-based system.
tests/test-nfa.cpp (1)
Line range hint
50-138
: LGTM! Test expectations updated correctly.The test expectations have been properly updated to reflect the new tag ID system while maintaining comprehensive coverage of different transition types.
src/log_surgeon/finite_automata/NfaState.hpp (3)
35-40
: LGTM! Constructor changes improve efficiency.The switch from
Tag const*
totag_id_t
for constructor parameters is a good change that:
- Reduces pointer indirection
- Aligns with the PR's goal of unifying transitions
- Simplifies memory management
54-56
: LGTM! Method signature change maintains consistency.The update to use
tag_id_t
inadd_positive_tagged_start_transition
maintains consistency with the constructor changes.
Line range hint
205-251
: Consider documenting the epsilon transition behaviour.The TODO comment "currently treat tagged transitions as epsilon transitions" suggests this might be temporary. Consider:
- Documenting this behaviour in the class documentation
- Evaluating if this should remain the long-term approach
tests/test-lexer.cpp (1)
58-72
: Well-structured helper functions with clear documentation.The new helper functions
initialize_lexer
andtest_scanning_input
improve test maintainability and readability.src/log_surgeon/SchemaParser.cpp (1)
12-12
: LGTM! Consistent transition from Tag to Capture.The changes properly implement the transition from Tag to Capture, maintaining consistency across the codebase.
Also applies to: 170-170
tests/CMakeLists.txt (2)
5-7
: LGTM! Source files properly updated.The addition of Capture.hpp and related headers maintains proper build dependencies.
27-34
: LGTM! Test sources well-organized.The addition of test-capture.cpp and the improved formatting enhance maintainability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🔭 Outside diff range comments (1)
src/log_surgeon/finite_automata/RegexAST.hpp (1)
Line range hint
89-850
: Format the code using clang-format.Multiple code formatting violations were detected by the linter. Please run clang-format on the file to ensure it adheres to the project's style guide.
♻️ Duplicate comments (1)
src/log_surgeon/finite_automata/Nfa.hpp (1)
20-28
:⚠️ Potential issueEnsure thread safety of
UniqueIdGenerator
The
current_id
counter is not thread-safe. If instances ofNfa
are accessed by multiple threads, consider usingstd::atomic<uint32_t>
forcurrent_id
to prevent data races.private: - uint32_t current_id; + std::atomic<uint32_t> current_id;
🧹 Nitpick comments (6)
src/log_surgeon/finite_automata/SpontaneousTransition.hpp (4)
15-19
: Add documentation for TransitionOperation enum valuesEach enum value's purpose should be documented to improve code maintainability.
Add documentation like this:
enum class TransitionOperation { + /// No operation to perform on tags None, + /// Set the specified tags as active SetTags, + /// Set the specified tags as inactive NegateTags };
62-64
: Make member variables constSince these members are never modified after construction, they should be marked as const.
Apply this diff:
- TransitionOperation m_transition_op; - std::vector<tag_id_t> m_tag_ids; - TypedNfaState const* m_dest_state; + const TransitionOperation m_transition_op; + const std::vector<tag_id_t> m_tag_ids; + TypedNfaState const* const m_dest_state;
52-59
: Enhance serialize() implementationThe current implementation has two potential improvements:
- Include the transition operation in the output for better debugging and visualization
- Handle empty tag_ids vector case specially
Consider this implementation:
[[nodiscard]] auto serialize(std::unordered_map<TypedNfaState const*, uint32_t> const& state_ids ) const -> std::optional<std::string> { auto const state_id_it = state_ids.find(m_dest_state); if (state_id_it == state_ids.end()) { return std::nullopt; } - return fmt::format("{}[{}]", state_id_it->second, fmt::join(m_tag_ids, ",")); + auto const op_str = m_transition_op == TransitionOperation::None ? "" + : m_transition_op == TransitionOperation::SetTags ? "+" + : "-"; + auto const tags_str = m_tag_ids.empty() ? "" + : fmt::format("[{}]", fmt::join(m_tag_ids, ",")); + return fmt::format("{}{}{}", state_id_it->second, op_str, tags_str); }
26-26
: Add comparison operators for container operationsConsider adding operator== and operator<=> to enable using this class in ordered containers and algorithms.
Add these operators:
auto operator<=>(const SpontaneousTransition&) const = default; bool operator==(const SpontaneousTransition&) const = default;src/log_surgeon/finite_automata/NfaState.hpp (1)
48-49
: Add parameter validationThe method should validate that
dest_state
is not null before adding it to the transitions.auto add_spontaneous_transition(NfaState* dest_state) -> void { + if (nullptr == dest_state) { + return; + } m_spontaneous_transitions.emplace_back(dest_state); }src/log_surgeon/finite_automata/Nfa.hpp (1)
110-111
: Consider implementing scoped capture groupsThe TODO comment suggests that unique naming across capture groups limits use cases. Consider implementing a scoping mechanism to allow reuse of capture names within different scopes.
Would you like me to propose a design for implementing scoped capture groups?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
CMakeLists.txt
(1 hunks)src/log_surgeon/LexicalRule.hpp
(2 hunks)src/log_surgeon/finite_automata/Nfa.hpp
(5 hunks)src/log_surgeon/finite_automata/NfaState.hpp
(6 hunks)src/log_surgeon/finite_automata/RegexAST.hpp
(19 hunks)src/log_surgeon/finite_automata/SpontaneousTransition.hpp
(1 hunks)src/log_surgeon/finite_automata/TaggedTransition.hpp
(0 hunks)tests/CMakeLists.txt
(2 hunks)tests/test-nfa.cpp
(5 hunks)
💤 Files with no reviewable changes (1)
- src/log_surgeon/finite_automata/TaggedTransition.hpp
🚧 Files skipped from review as they are similar to previous changes (4)
- src/log_surgeon/LexicalRule.hpp
- tests/test-nfa.cpp
- CMakeLists.txt
- tests/CMakeLists.txt
🧰 Additional context used
📓 Path-based instructions (4)
src/log_surgeon/finite_automata/SpontaneousTransition.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/Nfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/NfaState.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/RegexAST.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (3)
src/log_surgeon/finite_automata/SpontaneousTransition.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
src/log_surgeon/finite_automata/NfaState.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/RegexAST.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🪛 GitHub Actions: lint
src/log_surgeon/finite_automata/Nfa.hpp
[error] 224-224: Code formatting violation detected. Line needs to be formatted using clang-format.
src/log_surgeon/finite_automata/NfaState.hpp
[error] 198-204: Code formatting violations detected. Multiple lines need to be formatted using clang-format.
src/log_surgeon/finite_automata/RegexAST.hpp
[error] 89-850: Multiple code formatting violations detected throughout the file. File needs to be formatted using clang-format.
🔇 Additional comments (9)
src/log_surgeon/finite_automata/RegexAST.hpp (7)
23-23
: LGTM! Documentation accurately reflects the tag-to-capture transition.The include directive and class documentation have been properly updated to reflect the architectural change from tags to captures.
Also applies to: 33-37
85-99
: LGTM! Member functions properly handle capture-based operations.The implementation correctly manages capture-related operations, and the condition on line 116 follows the coding guideline of using
false == <expression>
.Also applies to: 103-104, 112-120
132-148
: LGTM! Serialization methods effectively handle capture formatting.The implementation makes good use of modern C++ features like ranges and format for serializing captures.
Line range hint
632-658
: LGTM! RegexASTCapture implementation maintains non-null capture invariant.The implementation correctly enforces the non-null requirement for captures, which aligns with the previously established invariant from PR #48.
Also applies to: 659-662, 670-671
784-788
: LGTM! RegexASTOr correctly handles captures in alternation.The implementation properly manages captures and their negations in alternation patterns.
Also applies to: 794-795
815-817
: LGTM! RegexASTCat properly manages captures in concatenation.The implementation correctly handles capture propagation in concatenated patterns.
Also applies to: 825-825, 827-827
850-851
: LGTM! RegexASTMultiplication effectively manages captures in repetition patterns.The implementation properly handles captures in various repetition scenarios (zero, finite, and infinite).
Also applies to: 861-885
src/log_surgeon/finite_automata/NfaState.hpp (2)
171-175
: Address TODO comment regarding tag modificationsThe TODO comment indicates that only transitions without tag modifications are handled. This limitation should be addressed to ensure proper functionality.
Would you like me to help implement the handling of transitions that modify tags?
198-204
:⚠️ Potential issueFix code formatting
The code formatting violates the project's style guidelines. Apply proper formatting to these lines.
- auto const optional_serialized_transition - = spontaneous_transition.serialize(state_ids); - if(false == optional_serialized_transition.has_value()) { + auto const optional_serialized_transition = spontaneous_transition.serialize(state_ids); + if (false == optional_serialized_transition.has_value()) {Likely invalid or redundant comment.
🧰 Tools
🪛 GitHub Actions: lint
[error] 198-204: Code formatting violations detected. Multiple lines need to be formatted using clang-format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/log_surgeon/finite_automata/Nfa.hpp
(5 hunks)src/log_surgeon/finite_automata/NfaState.hpp
(6 hunks)src/log_surgeon/finite_automata/RegexAST.hpp
(19 hunks)src/log_surgeon/finite_automata/SpontaneousTransition.hpp
(1 hunks)tests/test-nfa.cpp
(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/test-nfa.cpp
- src/log_surgeon/finite_automata/SpontaneousTransition.hpp
🧰 Additional context used
📓 Path-based instructions (3)
src/log_surgeon/finite_automata/Nfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/NfaState.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/RegexAST.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (2)
src/log_surgeon/finite_automata/NfaState.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/RegexAST.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
🔇 Additional comments (12)
src/log_surgeon/finite_automata/RegexAST.hpp (4)
23-23
: LGTM! Documentation and includes updated to reflect the transition from tags to captures.The changes accurately reflect the architectural shift from tag-based to capture-based implementation.
Also applies to: 33-37
85-102
: LGTM! Base class changes maintain functionality while transitioning to captures.The changes consistently replace tag-based operations with capture-based ones, maintaining the same functionality and following coding guidelines.
Also applies to: 110-118, 130-146, 151-153
630-632
: LGTM! RegexASTCapture changes maintain the non-null invariant.The changes correctly transition from tags to captures while maintaining the critical invariant that the capture pointer must never be null, as enforced by the constructor's null checks.
Also applies to: 641-658, 666-667, 707-707, 716-716
780-783
: LGTM! Derived classes updated consistently.The changes systematically replace tag operations with capture operations across all derived classes. The updated documentation in
RegexASTCapture::add_to_nfa
provides an excellent visualization of the NFA structure.Also applies to: 789-790, 810-811, 819-819, 821-821, 854-878, 906-926, 932-932, 936-936, 942-949
src/log_surgeon/finite_automata/NfaState.hpp (4)
16-21
: LGTM! Clean header organization and type aliases.The addition of
SpontaneousTransition.hpp
and the type aliases forByteNfaState
andUtf8NfaState
improve code readability.
35-40
: LGTM! Clear constructor implementation.The constructor properly initializes the spontaneous transitions with the provided parameters.
48-57
: LGTM! Well-designed method overloads.The two overloads of
add_spontaneous_transition
provide good flexibility for adding transitions with or without operations and tags.
196-202
: LGTM! Robust error handling.The serialization implementation properly handles failures using
std::optional
.src/log_surgeon/finite_automata/Nfa.hpp (4)
126-136
: LGTM! Efficient tag management.The method efficiently handles tag creation and caching using the unique ID generator.
146-161
: LGTM! Clear negative capture handling.The method properly manages negative transitions and tag creation for each capture.
166-179
: LGTM! Well-structured capture state creation.The method properly creates and links start and end states with appropriate tag operations.
20-28
: 🛠️ Refactor suggestionConsider making UniqueIdGenerator thread-safe.
The current implementation uses a non-atomic increment operation which could lead to race conditions if used in a multi-threaded context.
class UniqueIdGenerator { public: UniqueIdGenerator() : current_id{0} {} - [[nodiscard]] auto generate_id() -> uint32_t { return current_id++; } + [[nodiscard]] auto generate_id() -> uint32_t { + return std::atomic_fetch_add(¤t_id, 1u); + } private: - uint32_t current_id; + std::atomic<uint32_t> current_id; };Likely invalid or redundant comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/log_surgeon/finite_automata/Nfa.hpp (1)
166-180
: Consider more efficient vector initialization.The implementation is correct, but the vector initialization for end_tag could be more efficient.
- std::vector{end_tag}, + std::vector<tag_id_t>{end_tag},
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/log_surgeon/finite_automata/Nfa.hpp
(5 hunks)src/log_surgeon/finite_automata/NfaState.hpp
(6 hunks)src/log_surgeon/finite_automata/SpontaneousTransition.hpp
(1 hunks)tests/test-nfa.cpp
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/log_surgeon/finite_automata/SpontaneousTransition.hpp
- tests/test-nfa.cpp
🧰 Additional context used
📓 Path-based instructions (2)
src/log_surgeon/finite_automata/Nfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/NfaState.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (1)
src/log_surgeon/finite_automata/NfaState.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🔇 Additional comments (8)
src/log_surgeon/finite_automata/NfaState.hpp (4)
35-40
: LGTM! Efficient constructor implementation.The constructor correctly uses std::move for tag_ids parameter and initializes m_spontaneous_transitions with the provided parameters.
48-57
: LGTM! Well-designed transition methods.Both overloads of add_spontaneous_transition are well-implemented, with proper parameter handling and efficient use of std::move.
171-174
: Consider transition operations in epsilon closure.The epsilon_closure method currently adds all spontaneous transitions to the closure set without checking their TransitionOperation. This might lead to incorrect closure calculation if some transitions modify tags.
Consider filtering transitions based on their operation type or documenting the intended behaviour.
193-207
: LGTM! Robust serialization implementation.The serialize method properly handles serialization failures and maintains a clear format for the output.
src/log_surgeon/finite_automata/Nfa.hpp (4)
126-136
: LGTM! Clean implementation of tag management.The method efficiently handles both existing and new tags, with good use of structured bindings and clear control flow.
146-162
: LGTM! Well-structured negative capture handling.The method properly collects both start and end tags for each capture and efficiently creates the new state with the NegateTags operation.
Line range hint
216-233
: LGTM! Robust serialization with proper error handling.The method correctly handles serialization failures and maintains a clear format for the output.
20-28
:⚠️ Potential issueMake UniqueIdGenerator thread-safe.
The current implementation uses a non-atomic increment operation which isn't thread-safe. Consider using std::atomic<uint32_t> for current_id to prevent data races in multi-threaded contexts.
- uint32_t current_id; + std::atomic<uint32_t> current_id;Likely invalid or redundant comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
src/log_surgeon/UniqueIdGenerator.hpp (1)
5-13
: Add class documentation.Add documentation to describe the purpose, thread safety guarantees, and overflow behaviour of the class.
namespace log_surgeon { +/** + * Generates unique sequential identifiers starting from 0. + * Note: This class is not thread-safe. If thread safety is required, + * consider using std::atomic<uint32_t> for current_id. + */ class UniqueIdGenerator {CMakeLists.txt (1)
Line range hint
1-1
: Update CMake minimum version to address deprecation warning.The pipeline shows a deprecation warning. Consider updating to CMake 3.10 or later.
-cmake_minimum_required(VERSION 3.5.1) +cmake_minimum_required(VERSION 3.10)src/log_surgeon/finite_automata/Nfa.hpp (3)
21-28
: Enhance class documentation.The documentation should clarify:
- The relationship between captures and tags
- Thread safety guarantees
- Performance characteristics
/** * Represents a NFA(non-deterministic finite automata) for recognizing a language based on the set * of rules used during initialization. Currently use as an intermediate model for generating the * DFA. * * Currently we assume all capture groups have unique names. + * + * Thread safety: This class is not thread-safe due to shared state modifications. + * Performance: O(1) for ID generation, O(n) for capture-to-tag mapping where n is + * the number of captures. * @tparam TypedNfaState */
117-127
: Consider caching tag pairs to improve performance.The
get_or_create_capture_tags
method performs a map lookup for every call. For frequently accessed captures, consider caching the results.auto Nfa<TypedNfaState>::get_or_create_capture_tags(Capture const* capture ) -> std::pair<tag_id_t, tag_id_t> { + static thread_local std::unordered_map<Capture const*, std::pair<tag_id_t, tag_id_t>> cache; + if (auto it = cache.find(capture); it != cache.end()) { + return it->second; + } auto const existing_tags{m_capture_to_tag_ids.find(capture)}; if (m_capture_to_tag_ids.end() == existing_tags) { auto start_tag{m_unique_id_generator.generate_id()}; auto end_tag{m_unique_id_generator.generate_id()}; auto new_tags{std::make_pair(start_tag, end_tag)}; m_capture_to_tag_ids.emplace(capture, new_tags); + cache.emplace(capture, new_tags); return new_tags; } + cache.emplace(capture, existing_tags->second); return existing_tags->second; }
101-102
: Consider addressing the TODO comment about capture group naming.The TODO comment suggests a limitation in the current design. Consider creating an issue to track this enhancement.
Would you like me to create an issue to track the enhancement of supporting non-unique capture group names?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
CMakeLists.txt
(2 hunks)src/log_surgeon/UniqueIdGenerator.hpp
(1 hunks)src/log_surgeon/finite_automata/Nfa.hpp
(5 hunks)tests/CMakeLists.txt
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/CMakeLists.txt
🧰 Additional context used
📓 Path-based instructions (2)
src/log_surgeon/UniqueIdGenerator.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/Nfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (1)
src/log_surgeon/finite_automata/Nfa.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🪛 GitHub Actions: Build
CMakeLists.txt
[warning] 1-1: CMake Deprecation Warning: Compatibility with CMake < 3.10 will be removed from a future version of CMake. Update the VERSION argument value.
🔇 Additional comments (3)
src/log_surgeon/UniqueIdGenerator.hpp (1)
9-9
: Consider potential overflow of current_id.The
generate_id()
method could overflow uint32_t without any checks. Consider adding overflow protection.- [[nodiscard]] auto generate_id() -> uint32_t { return current_id++; } + [[nodiscard]] auto generate_id() -> uint32_t { + if (current_id == std::numeric_limits<uint32_t>::max()) { + throw std::runtime_error("UniqueIdGenerator: ID space exhausted"); + } + return current_id++; + }✅ Verification successful
Overflow protection is warranted for correctness
The ID generator is used for NFA tag generation, where each capture requires two IDs. While practical overflow is unlikely (supporting over 2 billion captures), adding the suggested overflow check would prevent undefined behaviour and improve robustness.
🏁 Scripts executed
The following scripts were executed for the analysis:
Error: No shell scripts found
Script:
#!/bin/bash # Find and show the complete UniqueIdGenerator implementation ast-grep --pattern 'class UniqueIdGenerator { $$$ }' # Find all usages of generate_id rg "generate_id" -A 2 # Look for test files related to UniqueIdGenerator fd "UniqueIdGenerator.*test" --type fLength of output: 715
Script:
#!/bin/bash # Show the complete UniqueIdGenerator.hpp file cat src/log_surgeon/UniqueIdGenerator.hpp # Find more context about tag usage in NFA rg -B 2 -A 5 "m_unique_id_generator"Length of output: 1870
CMakeLists.txt (1)
71-84
: LGTM! Source files list is properly updated.The SOURCE_FILES list correctly includes all the new files required for the transition from tag-based to capture-based implementation.
src/log_surgeon/finite_automata/Nfa.hpp (1)
217-221
: Fix code formatting to match project style.The conditional check formatting doesn't match the project style.
- if (false == optional_serialized_state.has_value()) { + if (false == optional_serialized_state.has_value()) { return std::nullopt; }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (4)
src/log_surgeon/finite_automata/TagOperation.hpp (1)
39-48
: Consider improving error handling in serialize method.The default case in the switch statement returns "?" without any error logging or handling. Consider throwing an exception or adding error logging for unexpected operation types.
[[nodiscard]] auto serialize() const -> std::string { switch (m_type) { case TagOperationType::Set: return fmt::format("{}{}", m_tag_id, "p"); case TagOperationType::Negate: return fmt::format("{}{}", m_tag_id, "n"); default: - return fmt::format("{}{}", m_tag_id, "?"); + throw std::runtime_error( + fmt::format("Unexpected tag operation type for tag ID: {}", m_tag_id)); } }src/log_surgeon/finite_automata/SpontaneousTransition.hpp (2)
18-21
: Consider enhancing class documentation.The documentation could provide more details about:
- The purpose and significance of tag operations
- When to use different types of transitions
- Examples of typical usage
31-33
: Consider returning tag operations by const reference.Returning the vector by value might be inefficient. Consider returning by const reference to avoid unnecessary copying.
- [[nodiscard]] auto get_tag_ops() const -> std::vector<TagOperation> { return m_tag_ops; } + [[nodiscard]] auto get_tag_ops() const -> std::vector<TagOperation> const& { return m_tag_ops; }src/log_surgeon/finite_automata/Nfa.hpp (1)
102-104
: Address TODO: Support non-unique capture group names.The current implementation requires unique naming across capture groups, which limits use cases. Would you like me to help design a solution that supports non-unique capture group names?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
.github/workflows/build.yaml
(3 hunks)CMakeLists.txt
(2 hunks)lint-tasks.yml
(2 hunks)src/log_surgeon/UniqueIdGenerator.hpp
(1 hunks)src/log_surgeon/finite_automata/Nfa.hpp
(5 hunks)src/log_surgeon/finite_automata/NfaState.hpp
(6 hunks)src/log_surgeon/finite_automata/SpontaneousTransition.hpp
(1 hunks)src/log_surgeon/finite_automata/TagOperation.hpp
(1 hunks)tests/CMakeLists.txt
(2 hunks)tests/test-capture.cpp
(1 hunks)tests/test-nfa.cpp
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- tests/test-capture.cpp
- tests/test-nfa.cpp
- src/log_surgeon/UniqueIdGenerator.hpp
- tests/CMakeLists.txt
- CMakeLists.txt
🧰 Additional context used
📓 Path-based instructions (4)
src/log_surgeon/finite_automata/TagOperation.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/SpontaneousTransition.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/NfaState.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/Nfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (2)
src/log_surgeon/finite_automata/NfaState.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
src/log_surgeon/finite_automata/Nfa.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🪛 GitHub Actions: lint
src/log_surgeon/finite_automata/Nfa.hpp
[error] 149-153: code should be clang-formatted
[error] 165-169: code should be clang-formatted
🔇 Additional comments (17)
lint-tasks.yml (2)
29-32
: Enhanced Test File Inclusion for C++ LintingIncluding the
"tests/**/*.cpp"
,"tests/**/*.h"
,"tests/**/*.hpp"
, and"tests/**/*.tpp"
patterns in thecpp-check
sources now ensures that C++ test files are subject to linting. This change helps maintain a consistent code quality standard across both production and test files.
72-72
: Excluding Build Directories in the Formatting TaskAdding the exclusion condition
! -path "*/*build*/*"
in thecpp
task is a useful improvement that prevents formatting from touching build artefacts. This should help avoid unnecessary processing and potential issues with generated files..github/workflows/build.yaml (3)
31-31
: Updated OS Matrix Specification
The OS matrix now specifiesubuntu-22.04
instead ofubuntu-latest
. This should improve reproducibility and consistency in the build environment for Ubuntu.
41-43
: Adjusted Installation Condition for Ubuntu
The condition in the "Install Catch2 on Ubuntu" step has been updated to check for"matrix.os == 'ubuntu-22.04'"
, aligning it with the new OS matrix. This change ensures that the correct installation script is executed on the specified Ubuntu version.
55-58
: Addition of Test Log Output on Failure
A new step has been added to print the test log when prior steps fail. This is a valuable improvement for debugging failing tests by automatically exposing the contents of./build/Testing/Temporary/LastTest.log
. Please verify that this log file path is consistent across all environments and that it reliably captures diagnostic information.src/log_surgeon/finite_automata/TagOperation.hpp (3)
11-16
: LGTM! Well-defined types with clear intent.The type alias and enum class are appropriately defined with proper scoping and sizing.
20-30
: LGTM! Well-implemented constructor and comparison operators.The constructor and comparison operators are correctly implemented using member initialization lists and std::tie for member-wise comparison.
32-34
: LGTM! Well-designed getter methods.The getter methods are properly marked with nodiscard and use correct trailing return type syntax.
src/log_surgeon/finite_automata/SpontaneousTransition.hpp (2)
25-29
: LGTM! Well-implemented constructors.The constructors are properly implemented with move semantics for tag operations.
48-62
: LGTM! Well-implemented serialization with modern C++ features.The serialize method effectively uses:
- Optional for handling missing states
- Ranges for transforming tag operations
- Format library for string formatting
src/log_surgeon/finite_automata/NfaState.hpp (5)
35-41
: LGTM! Constructor is well-designed.The constructor efficiently delegates to
add_spontaneous_transition
and uses move semantics for better performance.
49-51
: LGTM! Function is concise and focused.The function follows the single responsibility principle by handling only the addition of simple spontaneous transitions.
53-64
: LGTM! Function is optimized for performance.The function uses performance optimizations:
- Pre-reserves vector capacity
- Uses
emplace_back
for construction
184-215
: LGTM! Improved error handling in serialization.The function now properly handles serialization failures through
std::optional
and uses type-safe string formatting withfmt::format
.
177-179
: Verify epsilon closure behavior with transition operations.The function treats all spontaneous transitions as epsilon transitions, regardless of their transition operations. Please confirm if this aligns with the intended behavior after unifying all transitions into spontaneous transitions.
✅ Verification successful
Epsilon Closure Behaviour Aligns with PR Intentions
The current implementation of the epsilon closure function treats all spontaneous transitions as epsilon transitions without checking transition operations. Since no tests explicitly verify transition operations for epsilon closures and the code change reflects the update to unify all transitions into spontaneous ones, the behavior appears intentional.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for any tests that verify epsilon closure behavior rg -A 5 "test.*epsilon.*closure" tests/Length of output: 41
Script:
#!/bin/bash set -e echo "Searching for 'epsilon_closure' in tests/ folder:" rg -A 5 "epsilon_closure" tests/ echo "-------------------------------" echo "Searching for 'epsilon' keyword in tests/ folder (case-insensitive):" rg -i "epsilon" tests/Length of output: 156
src/log_surgeon/finite_automata/Nfa.hpp (2)
110-115
: LGTM! Improved constructor efficiency.The constructor now takes a const reference to avoid unnecessary copying of the rules vector.
138-155
: LGTM! Efficient handling of negative captures.The function efficiently creates both start and end tags for each capture and uses move semantics for better performance.
🧰 Tools
🪛 GitHub Actions: lint
[error] 149-153: code should be clang-formatted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (9)
tests/test-lexer.cpp (2)
141-166
: Consider adding register values verification.The TODO comment indicates missing verification of register values. This should be addressed after implementing the DFA simulation.
Would you like me to help create an issue to track the implementation of register values verification after the DFA simulation is complete?
326-364
: Consider adding register-related checks.The TODO comment indicates missing checks for
get_reg_id_from_tag_id
andget_reg_ids_from_capture_id
. These should be added after implementing TDFA's determinization.Would you like me to help create an issue to track the implementation of these additional checks after the TDFA determinization is complete?
src/log_surgeon/finite_automata/TaggedTransition.hpp (1)
48-48
: Update documentation for consistency.For consistency with the rest of the codebase, consider updating the documentation to mention "tag IDs" instead of "tags":
- * Represents an NFA transition indicating that multiple tags have been unmatched. + * Represents an NFA transition indicating that multiple tag IDs have been unmatched.src/log_surgeon/finite_automata/NfaState.hpp (1)
184-214
: Consider adding error details for serialization failures.The method currently returns
std::nullopt
without any context about why serialization failed. Consider enhancing error reporting by:
- Using a custom error type or
- Adding logging to capture failure reasons
src/log_surgeon/Lexer.hpp (2)
130-189
: Enhance const correctness in map lookups.Consider using
std::as_const()
when accessing the maps to ensure const correctness:- if (m_rule_id_to_capture_ids.contains(rule_id)) { - return m_rule_id_to_capture_ids.at(rule_id); + if (std::as_const(m_rule_id_to_capture_ids).contains(rule_id)) { + return std::as_const(m_rule_id_to_capture_ids).at(rule_id);
217-219
: Add documentation for member variables.Consider adding documentation comments for the new member variables to explain their purpose and relationships:
+ /// Maps rule IDs to their associated capture IDs std::unordered_map<rule_id_t, std::vector<capture_id_t>> m_rule_id_to_capture_ids; + /// Maps capture IDs to their corresponding start and end tag IDs std::unordered_map<capture_id_t, std::pair<tag_id_t, tag_id_t>> m_capture_id_to_tag_id_pair; + /// Maps tag IDs to their associated register IDs std::unordered_map<tag_id_t, reg_id_t> m_tag_to_reg_id;src/log_surgeon/finite_automata/Nfa.hpp (2)
143-148
: Optimize vector allocation.Consider reserving space for the tags vector upfront to avoid reallocations:
std::vector<tag_id_t> tags; + tags.reserve(captures.size() * 2); // Each capture needs 2 tags for (auto const capture : captures) {
106-108
: Address TODO about capture group naming limitations.The current design limits use cases by enforcing unique naming across capture groups. Would you like me to help design a solution that allows for scoped capture names?
src/log_surgeon/Lexer.tpp (1)
386-387
: Improve the error message for duplicate capture names.The current error message could be more descriptive by including the duplicate capture name.
- throw std::invalid_argument("`m_rules` contains capture names that are not unique." + throw std::invalid_argument(fmt::format( + "`m_rules` contains duplicate capture name: '{}'", capture_name)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
CMakeLists.txt
(2 hunks)src/log_surgeon/Aliases.hpp
(1 hunks)src/log_surgeon/Lexer.hpp
(5 hunks)src/log_surgeon/Lexer.tpp
(3 hunks)src/log_surgeon/LexicalRule.hpp
(2 hunks)src/log_surgeon/UniqueIdGenerator.hpp
(1 hunks)src/log_surgeon/finite_automata/Dfa.hpp
(1 hunks)src/log_surgeon/finite_automata/Nfa.hpp
(5 hunks)src/log_surgeon/finite_automata/NfaState.hpp
(6 hunks)src/log_surgeon/finite_automata/RegexAST.hpp
(20 hunks)src/log_surgeon/finite_automata/TaggedTransition.hpp
(3 hunks)tests/CMakeLists.txt
(2 hunks)tests/test-lexer.cpp
(6 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- src/log_surgeon/finite_automata/Dfa.hpp
- src/log_surgeon/UniqueIdGenerator.hpp
- src/log_surgeon/LexicalRule.hpp
- tests/CMakeLists.txt
- CMakeLists.txt
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: - Prefer `false ==
**/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/Aliases.hpp
tests/test-lexer.cpp
src/log_surgeon/finite_automata/TaggedTransition.hpp
src/log_surgeon/Lexer.hpp
src/log_surgeon/finite_automata/NfaState.hpp
src/log_surgeon/finite_automata/Nfa.hpp
src/log_surgeon/finite_automata/RegexAST.hpp
**/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/Aliases.hpp
tests/test-lexer.cpp
src/log_surgeon/finite_automata/TaggedTransition.hpp
src/log_surgeon/Lexer.hpp
src/log_surgeon/finite_automata/NfaState.hpp
src/log_surgeon/finite_automata/Nfa.hpp
src/log_surgeon/finite_automata/RegexAST.hpp
🧠 Learnings (4)
src/log_surgeon/finite_automata/TaggedTransition.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/NfaState.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
src/log_surgeon/finite_automata/Nfa.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/RegexAST.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: build (ubuntu-22.04, Release)
- GitHub Check: build (ubuntu-22.04, Debug)
- GitHub Check: build (macos-latest, Release)
- GitHub Check: build (macos-latest, Debug)
🔇 Additional comments (18)
tests/test-lexer.cpp (3)
2-23
: LGTM! Well-organized includes and using declarations.The new includes and using declarations are properly organized and necessary for the added functionality.
102-139
: LGTM! Well-structured initialization function.The function properly initializes the lexer with delimiters and schema variables, following coding guidelines and best practices.
310-324
: LGTM! Comprehensive basic lexer test case.The test case properly validates both successful and unsuccessful token matching scenarios.
src/log_surgeon/finite_automata/TaggedTransition.hpp (3)
8-9
: LGTM! Header includes updated appropriately.The added headers support the transition from Tag pointers to tag IDs, and unnecessary headers have been removed.
Also applies to: 13-13
17-17
: LGTM! Successfully transitioned from Tag pointers to tag IDs.The changes effectively decouple the class from the Tag type while maintaining functionality.
Also applies to: 23-24, 37-37, 43-43
54-55
: LGTM! Successfully transitioned to using tag IDs.The implementation changes effectively maintain functionality while using the new tag ID type.
Also applies to: 68-68, 74-74
src/log_surgeon/Aliases.hpp (1)
1-14
: LGTM! Well-structured type aliases.The file is well-organized with proper include guards, namespace usage, and clear type aliases that enhance code readability and maintainability.
src/log_surgeon/finite_automata/NfaState.hpp (3)
16-17
: LGTM! Clean integration of spontaneous transitions.The includes and member variable changes align well with the PR's objective to unify various types of transitions into spontaneous transitions.
Also applies to: 111-111
35-41
: LGTM! Well-designed transition management.The constructor and methods provide a clean and type-safe interface for managing spontaneous transitions.
Also applies to: 49-51, 53-64
177-179
: LGTM! Proper boolean comparison pattern.The method correctly follows the project's coding guideline of using
false == <expression>
pattern.src/log_surgeon/finite_automata/Nfa.hpp (1)
110-110
: Ensure thread safety of UniqueIdGenerator.The UniqueIdGenerator is used to generate unique IDs in
get_or_create_capture_tag_pair
. If this code will be used in a multi-threaded context, consider making UniqueIdGenerator thread-safe.Also applies to: 124-128
src/log_surgeon/Lexer.tpp (3)
361-367
: LGTM! Type safety improvement.The change from
uint32_t
torule_id_t
enhances type safety and makes the code more maintainable.
369-378
: LGTM! Consistent type usage.The parameter type change maintains consistency with the
add_rule
method and improves type safety.
406-406
: Verify the impact of DFA ignoring captures.The TODO comment indicates that DFA ignores captures, which could lead to incorrect behavior.
✅ Verification successful
🏁 Script executed:
#!/bin/bash # Description: Search for capture-related code in DFA implementation. # Test: Search for capture-related code in DFA files rg -A 5 "capture" --glob "src/log_surgeon/finite_automata/Dfa*"Length of output: 63
I'll run a broader search to locate any implementation details or references to the "capture:user=" pattern in the repository. This will help us determine if any DFA code is handling captures or if they are indeed being ignored.
🏁 Script executed:
#!/bin/bash # Search for the "capture:user=" pattern in the entire repository to locate potential capture handling code rg "capture:user=" .Length of output: 149
DFA Captures Verification Outcome
We have confirmed that the DFA implementation does not handle capture groups. Our search for any capture-handling code returned only the original TODO comment (in src/log_surgeon/Lexer.tpp, line 406), and no additional logic was found in the DFA files. This indicates that the DFA currently ignores captures, which could cause issues if named capture groups are expected to be processed.
- src/log_surgeon/Lexer.tpp (line 406) contains the related TODO comment.
- No evidence of capture group processing was found in any DFA-related files.
src/log_surgeon/finite_automata/RegexAST.hpp (4)
33-38
: LGTM! Consistent terminology update.The renaming from tags to captures is consistent throughout the class and aligns with the PR objective.
Also applies to: 86-106
632-663
: LGTM! Robust parameter validation.The constructor correctly validates both parameters, maintaining the non-null requirement previously established for tags.
909-947
: LGTM! Well-documented NFA structure.The changes maintain the same NFA structure while transitioning to capture-based terminology. The documentation clearly explains the structure and transitions.
950-960
: LGTM! Consistent serialization format.The serialization logic maintains the same format while transitioning to capture-based terminology.
fb2de27
to
ad30714
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
src/log_surgeon/finite_automata/RegexAST.hpp (2)
116-116
: Fix typo in comment."spontaenous" should be "spontaneous".
- // root --(regex)--> state_with_spontaenous_transition --(negate tags)--> end_state + // root --(regex)--> state_with_spontaneous_transition --(negate tags)--> end_state
118-119
: Fix typo in variable name.For consistency with the comment fix, rename the variable.
- auto* state_with_spontaenous_transition{ + auto* state_with_spontaneous_transition{src/log_surgeon/finite_automata/SpontaneousTransition.hpp (2)
18-21
: Update class documentation to reflect multiple tag operations.The current documentation suggests a single tag operation, but the class actually handles multiple tag operations through a vector. Update the documentation to accurately reflect this.
- * Represents an NFA transition indicating a tag and an operation to perform on the tag. + * Represents an NFA transition with a collection of tag operations to be performed during the transition.
31-33
: Return vectors by const reference for better performance.Returning vectors by value causes unnecessary copying. Consider returning by const reference instead.
- [[nodiscard]] auto get_tag_ops() const -> std::vector<TagOperation> { return m_tag_ops; } + [[nodiscard]] auto get_tag_ops() const -> std::vector<TagOperation> const& { return m_tag_ops; }src/log_surgeon/finite_automata/NfaState.hpp (2)
52-63
: Add parameter validation for tag_ids.Consider validating that tag_ids is not empty when op_type is not None.
auto add_spontaneous_transition( TagOperationType const op_type, std::vector<tag_id_t> const& tag_ids, NfaState const* dest_state ) -> void { + if (op_type != TagOperationType::None && tag_ids.empty()) { + throw std::invalid_argument("tag_ids cannot be empty when op_type is not None"); + } std::vector<TagOperation> tag_ops; tag_ops.reserve(tag_ids.size());
165-181
: Add capacity hint for closure_set.Consider reserving space in closure_set to avoid reallocations during traversal.
auto NfaState<state_type>::epsilon_closure() -> std::set<NfaState const*> { std::set<NfaState const*> closure_set; + closure_set.reserve(16); // Initial capacity hint std::stack<NfaState const*> stack;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
CMakeLists.txt
(1 hunks)src/log_surgeon/finite_automata/Nfa.hpp
(8 hunks)src/log_surgeon/finite_automata/NfaState.hpp
(6 hunks)src/log_surgeon/finite_automata/RegexAST.hpp
(3 hunks)src/log_surgeon/finite_automata/SpontaneousTransition.hpp
(1 hunks)src/log_surgeon/finite_automata/TagOperation.hpp
(1 hunks)tests/CMakeLists.txt
(1 hunks)tests/test-nfa.cpp
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- CMakeLists.txt
- tests/CMakeLists.txt
- src/log_surgeon/finite_automata/TagOperation.hpp
- src/log_surgeon/finite_automata/Nfa.hpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: - Prefer `false ==
**/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
tests/test-nfa.cpp
src/log_surgeon/finite_automata/SpontaneousTransition.hpp
src/log_surgeon/finite_automata/RegexAST.hpp
src/log_surgeon/finite_automata/NfaState.hpp
**/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
tests/test-nfa.cpp
src/log_surgeon/finite_automata/SpontaneousTransition.hpp
src/log_surgeon/finite_automata/RegexAST.hpp
src/log_surgeon/finite_automata/NfaState.hpp
🧠 Learnings (2)
src/log_surgeon/finite_automata/RegexAST.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/NfaState.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: build (ubuntu-22.04, Release)
- GitHub Check: build (ubuntu-22.04, Debug)
- GitHub Check: build (macos-latest, Debug)
🔇 Additional comments (7)
src/log_surgeon/finite_automata/RegexAST.hpp (2)
863-864
: LGTM! Transition type updated correctly.The change from epsilon transition to spontaneous transition aligns with the PR objective of unifying different types of transitions.
909-941
: LGTM! Capture group NFA construction updated correctly.The changes to the capture group NFA construction align with the PR objective:
- Comment block accurately describes the new structure
- Implementation correctly uses the new transition types
- Method name updated to reflect the new design
src/log_surgeon/finite_automata/SpontaneousTransition.hpp (2)
35-41
: LGTM!The method documentation clearly describes the return values and parameters.
48-65
: LGTM!The implementation is clean and efficient, making good use of modern C++ features like ranges and fmt::format.
tests/test-nfa.cpp (2)
24-38
: LGTM!The test case setup is clear and well-structured, properly initializing the schema and AST.
79-95
: LGTM!The test properly handles the optional return value and verifies the serialized output line by line.
src/log_surgeon/finite_automata/NfaState.hpp (1)
183-214
: LGTM!The serialization implementation properly handles optional return values and formats the output clearly.
@coderabbitai Re-review this PR. |
@SharafMohamed I'll review the changes in detail. ✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/log_surgeon/finite_automata/Nfa.hpp (1)
63-63
: Fix typo in documentation.The word "spotaneous" should be "spontaneous".
- * - A state with a spotaneous transition to `dest_state` that sets a tag to track the capture's + * - A state with a spontaneous transition to `dest_state` that sets a tag to track the capture's
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/log_surgeon/finite_automata/Nfa.hpp
(7 hunks)src/log_surgeon/finite_automata/NfaState.hpp
(6 hunks)src/log_surgeon/finite_automata/RegexAST.hpp
(3 hunks)src/log_surgeon/finite_automata/SpontaneousTransition.hpp
(1 hunks)src/log_surgeon/finite_automata/TagOperation.hpp
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/log_surgeon/finite_automata/TagOperation.hpp
- src/log_surgeon/finite_automata/SpontaneousTransition.hpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: - Prefer `false ==
**/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/Nfa.hpp
src/log_surgeon/finite_automata/NfaState.hpp
src/log_surgeon/finite_automata/RegexAST.hpp
**/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/Nfa.hpp
src/log_surgeon/finite_automata/NfaState.hpp
src/log_surgeon/finite_automata/RegexAST.hpp
🧠 Learnings (3)
src/log_surgeon/finite_automata/Nfa.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/NfaState.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/RegexAST.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🔇 Additional comments (12)
src/log_surgeon/finite_automata/RegexAST.hpp (5)
116-116
: LGTM! Documentation update aligns with transition unification.The comment update accurately reflects the new terminology for tag operations.
117-121
: LGTM! Implementation follows coding guidelines and aligns with transition unification.The changes maintain the same logic while:
- Following the coding guideline of using
false == <expression>
.- Using updated method names that reflect the unified transition approach.
863-864
: LGTM! Transition type updated to use unified terminology.The change from epsilon transition to spontaneous transition aligns with the PR's objective of unifying different transition types.
915-936
: LGTM! Documentation updates maintain clarity while reflecting new terminology.The ASCII art diagram and comments have been updated to accurately reflect the new unified transition approach while maintaining clear visual representation of the NFA structure.
941-941
: LGTM! Method name change improves clarity.The new method name
new_start_and_end_states_from_positive_capture
better describes its purpose and aligns with the unified transition approach.src/log_surgeon/finite_automata/NfaState.hpp (5)
42-48
: LGTM!The constructor is well-designed, using member initializer list and delegating to a helper method. The use of
std::move
for the vector parameter is efficient.
56-58
: LGTM!The method is concise and follows the single responsibility principle.
60-71
: LGTM!The method is well-implemented with efficient vector capacity reservation and follows the single responsibility principle.
174-189
: LGTM!The method follows the standard epsilon closure algorithm and maintains consistent coding patterns.
191-222
: LGTM!The method implements proper error handling using
std::optional
and follows consistent formatting patterns.src/log_surgeon/finite_automata/Nfa.hpp (2)
141-156
: LGTM!The method efficiently collects tags from captures and uses move semantics for optimal performance.
207-224
: LGTM!The method implements proper error handling using
std::optional
and follows consistent formatting patterns.
…hat result from the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (9)
src/log_surgeon/finite_automata/TagOperation.hpp (2)
19-31
: Consider adding noexcept specifier to comparison operators.The comparison operators are simple and cannot throw exceptions.
- auto operator<(TagOperation const& rhs) const -> bool { + auto operator<(TagOperation const& rhs) const noexcept -> bool { - auto operator==(TagOperation const& rhs) const -> bool { + auto operator==(TagOperation const& rhs) const noexcept -> bool {
40-49
: Consider handling all enum values explicitly in serialize().The default case could hide potential future enum values. Consider handling all enum values explicitly or adding a [[fallthrough]] attribute.
switch (m_type) { case TagOperationType::Set: return fmt::format("{}{}", m_tag_id, "p"); case TagOperationType::Negate: return fmt::format("{}{}", m_tag_id, "n"); - default: + case TagOperationType::None: return fmt::format("{}{}", m_tag_id, "?"); }src/log_surgeon/finite_automata/SpontaneousTransition.hpp (2)
26-30
: Consider marking constructors as noexcept.The constructors perform simple member initialization and move operations that cannot throw.
- explicit SpontaneousTransition(TypedNfaState const* dest_state) : m_dest_state{dest_state} {} + explicit SpontaneousTransition(TypedNfaState const* dest_state) noexcept : m_dest_state{dest_state} {} - SpontaneousTransition(std::vector<TagOperation> tag_ops, TypedNfaState const* dest_state) + SpontaneousTransition(std::vector<TagOperation> tag_ops, TypedNfaState const* dest_state) noexcept
32-34
: Consider marking getters as noexcept.The getter methods cannot throw exceptions.
- [[nodiscard]] auto get_tag_ops() const -> std::vector<TagOperation> const& { return m_tag_ops; } + [[nodiscard]] auto get_tag_ops() const noexcept -> std::vector<TagOperation> const& { return m_tag_ops; } - [[nodiscard]] auto get_dest_state() const -> TypedNfaState const* { return m_dest_state; } + [[nodiscard]] auto get_dest_state() const noexcept -> TypedNfaState const* { return m_dest_state; }tests/test-nfa.cpp (1)
42-62
: Consider using raw string literals for improved readability.The expected serialized NFA string contains many special characters. Using a raw string literal would improve readability.
- string expected_serialized_nfa{ - "0:byte_transitions={A-->1,Z-->2},spontaneous_transition={}\n" + string expected_serialized_nfa{R"( + 0:byte_transitions={A-->1,Z-->2},spontaneous_transition={}src/log_surgeon/finite_automata/NfaState.hpp (2)
42-67
: Add documentation for the constructor and add_spontaneous_transition method.While the implementation is correct, these public methods would benefit from documentation explaining:
- Parameters and their purpose
- Any preconditions or invariants
- The relationship between TagOperationType and tag_ids
187-219
: Consider reserving space for serialized transitions.The vectors
byte_transitions
andserialized_spontaneous_transitions
could benefit from pre-allocation to avoid reallocations during push_back operations.std::vector<std::string> byte_transitions; + byte_transitions.reserve(cSizeOfByte); for (uint32_t idx{0}; idx < cSizeOfByte; ++idx) { std::vector<std::string> serialized_spontaneous_transitions; + serialized_spontaneous_transitions.reserve(m_spontaneous_transitions.size()); for (auto const& spontaneous_transition : m_spontaneous_transitions) {src/log_surgeon/finite_automata/RegexAST.hpp (2)
858-891
: Consider extracting multiplication cases into separate methods.The method handles multiple cases (min=0, infinite, max>min) which makes it complex. Consider extracting these into separate private methods for better readability and maintainability.
For example:
private: void handle_zero_min_case(Nfa<TypedNfaState>* nfa, TypedNfaState* end_state) const; void handle_infinite_case(Nfa<TypedNfaState>* nfa, TypedNfaState* end_state) const; void handle_finite_case(Nfa<TypedNfaState>* nfa, TypedNfaState* end_state) const;
908-949
: Update documentation to reflect spontaneous transitions.The documentation comments (lines 911-913) still refer to "tagged transitions" instead of "spontaneous transitions". Update these to match the new implementation.
- // The NFA constructed for a capture group follows the structure below, with tagged transitions + // The NFA constructed for a capture group follows the structure below, with spontaneous transitions
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
src/log_surgeon/Lexer.hpp
(1 hunks)src/log_surgeon/finite_automata/Dfa.hpp
(1 hunks)src/log_surgeon/finite_automata/Nfa.hpp
(7 hunks)src/log_surgeon/finite_automata/NfaState.hpp
(6 hunks)src/log_surgeon/finite_automata/RegexAST.hpp
(4 hunks)src/log_surgeon/finite_automata/SpontaneousTransition.hpp
(1 hunks)src/log_surgeon/finite_automata/TagOperation.hpp
(1 hunks)tests/test-nfa.cpp
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/log_surgeon/finite_automata/Nfa.hpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: - Prefer `false ==
**/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/Lexer.hpp
src/log_surgeon/finite_automata/Dfa.hpp
tests/test-nfa.cpp
src/log_surgeon/finite_automata/TagOperation.hpp
src/log_surgeon/finite_automata/NfaState.hpp
src/log_surgeon/finite_automata/SpontaneousTransition.hpp
src/log_surgeon/finite_automata/RegexAST.hpp
**/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/Lexer.hpp
src/log_surgeon/finite_automata/Dfa.hpp
tests/test-nfa.cpp
src/log_surgeon/finite_automata/TagOperation.hpp
src/log_surgeon/finite_automata/NfaState.hpp
src/log_surgeon/finite_automata/SpontaneousTransition.hpp
src/log_surgeon/finite_automata/RegexAST.hpp
🧠 Learnings (3)
src/log_surgeon/Lexer.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
src/log_surgeon/finite_automata/NfaState.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/RegexAST.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🔇 Additional comments (9)
src/log_surgeon/Lexer.hpp (1)
16-16
: LGTM! Making NfaState dependency explicit.The addition of this include directive makes the dependency on NfaState explicit, which is a good practice since the class directly uses TypedNfaState in its template parameters and member functions. This change improves code maintainability by clearly showing the required dependencies.
src/log_surgeon/finite_automata/TagOperation.hpp (1)
13-17
: LGTM! Well-structured enum class.The enum class is appropriately scoped and uses uint8_t as the underlying type, which is memory efficient.
src/log_surgeon/finite_automata/SpontaneousTransition.hpp (1)
53-53
: LGTM! Follows coding guidelines.The condition
false == state_ids.contains(m_dest_state)
follows the coding guideline of preferringfalse == <expression>
over!<expression>
.tests/test-nfa.cpp (1)
64-68
: LGTM! Proper handling of optional value.The code correctly checks for the presence of the serialized output before using it.
src/log_surgeon/finite_automata/Dfa.hpp (1)
8-8
: LGTM! Appropriate header inclusion.The addition of header is necessary for managing unmarked sets in the NFA to DFA conversion.
src/log_surgeon/finite_automata/NfaState.hpp (3)
4-25
: LGTM! Header changes align with transition unification.The addition of
SpontaneousTransition.hpp
andTagOperation.hpp
headers, along with the removal ofTaggedTransition.hpp
, properly supports the refactoring of transitions.
94-110
: LGTM! Well-designed getter methods.The getters are properly marked with [[nodiscard]] and follow consistent return value semantics.
170-185
:❓ Verification inconclusive
Verify handling of tag operations in epsilon_closure.
The method currently adds all states reachable via spontaneous transitions to the closure set without checking their
TransitionOperation
. This could lead to incorrect state transitions if some operations should be handled differently.
🏁 Script executed:
#!/bin/bash # Search for other uses of TransitionOperation in epsilon transitions ast-grep --pattern 'TransitionOperation::$_'Length of output: 46
Attention: Confirm Tag Operation Handling in epsilon_closure
The current implementation of
epsilon_closure
collects all states reachable via spontaneous transitions without verifying their associatedTransitionOperation
. This could lead to unintended state transitions if specific operations require distinct handling (e.g., tagged operations). Although the initial grep forTransitionOperation::$_
produced no results, this is inconclusive. Please manually verify whether the omission of any checks for tag operations was intentional or if additional logic is needed to correctly process state transitions.src/log_surgeon/finite_automata/RegexAST.hpp (1)
114-126
: LGTM! Clean transition to spontaneous transitions.The refactoring from tagged transitions to spontaneous transitions is well-implemented, maintaining the same functionality while simplifying the transition model.
switch (m_type) { | ||
case TagOperationType::Set: | ||
return fmt::format("{}{}", m_tag_id, "p"); | ||
case TagOperationType::Negate: | ||
return fmt::format("{}{}", m_tag_id, "n"); | ||
case TagOperationType::None: | ||
return "none"; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're missing a default case handler (my compiler sends a compile-time warning)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed None
case and added default
case. Not sure what to print in this case though since its impossible to reach. Don't want to return optional or throw an error just to handle this impossible case, so returned type as ?
. Not sure if you have a better idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
src/log_surgeon/finite_automata/NfaSpontaneousTransition.hpp (2)
26-30
: Consider adding noexcept specifier to constructors.The constructors are simple and don't throw exceptions, so they should be marked as noexcept.
- explicit NfaSpontaneousTransition(TypedNfaState const* dest_state) : m_dest_state{dest_state} {} + explicit NfaSpontaneousTransition(TypedNfaState const* dest_state) noexcept : m_dest_state{dest_state} {} - NfaSpontaneousTransition(std::vector<TagOperation> tag_ops, TypedNfaState const* dest_state) + NfaSpontaneousTransition(std::vector<TagOperation> tag_ops, TypedNfaState const* dest_state) noexcept : m_tag_ops{std::move(tag_ops)}, m_dest_state{dest_state} {}
53-67
: Consider using std::ranges::views::transform for better readability.The transformation of tag operations could be more concise using ranges.
- auto transformed_operations{ - m_tag_ops | std::ranges::views::transform([](TagOperation const& tag_op) { - return tag_op.serialize(); - }) - }; + auto transformed_operations = m_tag_ops | std::ranges::views::transform(&TagOperation::serialize);src/log_surgeon/finite_automata/NfaState.hpp (1)
56-67
: Consider reserving vector capacity before emplace_back.The method reserves capacity for tag_ops but not for m_spontaneous_transitions.
auto add_spontaneous_transition( TagOperationType const op_type, std::vector<tag_id_t> const& tag_ids, NfaState const* dest_state ) -> void { + if (m_spontaneous_transitions.empty()) { + m_spontaneous_transitions.reserve(4); // Reserve space for typical use + } std::vector<TagOperation> tag_ops; tag_ops.reserve(tag_ids.size()); for (auto const tag_id : tag_ids) { tag_ops.emplace_back(tag_id, op_type); } m_spontaneous_transitions.emplace_back(std::move(tag_ops), dest_state); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
CMakeLists.txt
(1 hunks)src/log_surgeon/finite_automata/Nfa.hpp
(7 hunks)src/log_surgeon/finite_automata/NfaSpontaneousTransition.hpp
(1 hunks)src/log_surgeon/finite_automata/NfaState.hpp
(6 hunks)src/log_surgeon/finite_automata/RegexAST.hpp
(4 hunks)src/log_surgeon/finite_automata/TagOperation.hpp
(1 hunks)tests/CMakeLists.txt
(1 hunks)tests/test-nfa.cpp
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- CMakeLists.txt
- tests/test-nfa.cpp
- tests/CMakeLists.txt
- src/log_surgeon/finite_automata/TagOperation.hpp
- src/log_surgeon/finite_automata/Nfa.hpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: - Prefer `false ==
**/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/NfaSpontaneousTransition.hpp
src/log_surgeon/finite_automata/NfaState.hpp
src/log_surgeon/finite_automata/RegexAST.hpp
**/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/NfaSpontaneousTransition.hpp
src/log_surgeon/finite_automata/NfaState.hpp
src/log_surgeon/finite_automata/RegexAST.hpp
🧠 Learnings (2)
src/log_surgeon/finite_automata/NfaState.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
src/log_surgeon/finite_automata/RegexAST.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: build (ubuntu-22.04, Release)
- GitHub Check: build (ubuntu-22.04, Debug)
- GitHub Check: build (macos-latest, Release)
🔇 Additional comments (3)
src/log_surgeon/finite_automata/NfaSpontaneousTransition.hpp (1)
17-47
: LGTM! Well-documented class design.The class is well-structured with clear documentation explaining its purpose and template parameter. The use of
[[nodiscard]]
for getters is a good practice.src/log_surgeon/finite_automata/NfaState.hpp (1)
171-186
: LGTM! Efficient epsilon closure implementation.The epsilon closure implementation using a stack and set is efficient and correctly handles cycles.
src/log_surgeon/finite_automata/RegexAST.hpp (1)
911-913
: LGTM! Clear documentation of NFA structure.The updated documentation clearly explains the NFA structure for capture groups using spontaneous transitions.
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/log_surgeon/finite_automata/NfaSpontaneousTransition.hpp (1)
26-28
: Consider null destination state guard.
It might be safer to validate thatdest_state
is non-null in the constructor to avoid unexpected runtime errors later.NfaSpontaneousTransition(std::vector<TagOperation> tag_ops, TypedNfaState const* dest_state) : m_tag_ops{std::move(tag_ops)}, - m_dest_state{dest_state} {} + m_dest_state{dest_state} { + if (nullptr == dest_state) { + throw std::invalid_argument("Destination state cannot be null"); + } +}src/log_surgeon/finite_automata/TagOperation.hpp (1)
14-16
: Explicitly handle additional enumerators in the future.
Consider omitting thedefault:
branch so the compiler warns you when newTagOperationType
entries are added. This encourages exhaustive handling of all enumerators.switch (m_type) { case TagOperationType::Set: return fmt::format("{}{}", m_tag_id, "p"); case TagOperationType::Negate: return fmt::format("{}{}", m_tag_id, "n"); - default: - return fmt::format("{}{}", m_tag_id, "?"); + // If new enumerators are added, the compiler will warn if they aren’t handled. }src/log_surgeon/finite_automata/NfaState.hpp (1)
42-48
: Validate null pointer in constructor.
Accepting a raw pointer without checking for null could lead to undefined behaviour if accidentally passed a null. Consider adding a null check:NfaState( TagOperationType const op_type, std::vector<tag_id_t> const& tag_ids, NfaState const* dest_state ) { + if (nullptr == dest_state) { + throw std::invalid_argument("dest_state cannot be null"); + } add_spontaneous_transition(op_type, tag_ids, dest_state); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/log_surgeon/finite_automata/Nfa.hpp
(7 hunks)src/log_surgeon/finite_automata/NfaSpontaneousTransition.hpp
(1 hunks)src/log_surgeon/finite_automata/NfaState.hpp
(6 hunks)src/log_surgeon/finite_automata/RegexAST.hpp
(4 hunks)src/log_surgeon/finite_automata/TagOperation.hpp
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/log_surgeon/finite_automata/Nfa.hpp
- src/log_surgeon/finite_automata/RegexAST.hpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: - Prefer `false ==
**/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/TagOperation.hpp
src/log_surgeon/finite_automata/NfaSpontaneousTransition.hpp
src/log_surgeon/finite_automata/NfaState.hpp
**/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/TagOperation.hpp
src/log_surgeon/finite_automata/NfaSpontaneousTransition.hpp
src/log_surgeon/finite_automata/NfaState.hpp
🧠 Learnings (1)
src/log_surgeon/finite_automata/NfaState.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: build (ubuntu-22.04, Release)
- GitHub Check: build (ubuntu-22.04, Debug)
- GitHub Check: build (macos-latest, Release)
- GitHub Check: build (macos-latest, Debug)
🔇 Additional comments (4)
src/log_surgeon/finite_automata/NfaSpontaneousTransition.hpp (1)
51-53
: Good adherence to coding guidelines regarding expression checks.
Usingif (false == state_ids.contains(m_dest_state))
follows the project guideline to avoid the!
operator, which aids readability.src/log_surgeon/finite_automata/NfaState.hpp (3)
185-188
: Appropriate handling of spontaneous transitions in epsilon closure.
The loop properly checks for previously visited states, preventing infinite recursion, ensuring correctness.
211-214
: Maintain consistency with the project’s condition style.
Usingif (false == optional_serialized_transition.has_value())
is consistent with the guidelines. This check helps gracefully handle missing states.
218-223
: Clarity in final output format.
The final formatting merges acceptance info, byte transitions, and spontaneous transitions comprehensively, aiding debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the PR title, how about:
refactor: Unify all types of NFA transitions into `NfaSpontaneousTransition`.
NfaSpontaneousTransition
.
References
Description
Validation performed
Summary by CodeRabbit
New Features
Refactor
Tests