-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Extract RegexDFAState
class, RegexDFAStatePair
class, and RegexDFAStateType
enum into their own files.
#57
refactor: Extract RegexDFAState
class, RegexDFAStatePair
class, and RegexDFAStateType
enum into their own files.
#57
Conversation
…ransitions) return nullopt if state_ids is malformed.
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
…ake it clear to the reader that both failures are handled the same way and return nullopt. For more complicated return cases it would warrant the reader looking at the doc for the individual functions, but here I think we can make their life easier.
Co-authored-by: Lin Zhihao <[email protected]>
…-surgeon into tagged-nfa-new
…nor are parts of the rules stored, instead the rules are only read and used to build the NFA.
Co-authored-by: Lin Zhihao <[email protected]>
…call succeeds in NFA's serialize. Co-authored-by: Lin Zhihao <[email protected]>
…on classes when they are initialized in their constructor.
…d transitions instead of emplace back.
…han one leaving an NFA state.
…egativeTaggedTransition classes into their own files.
…just an id. This object is created and owned by the capture AST, and other AST and NFA states point to these tags.
…that was accidentally removed.
WalkthroughThe changes in this pull request involve significant updates to the CMake build configuration and modifications to the DFA-related classes in the Changes
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (3)
src/log_surgeon/finite_automata/RegexDFAState.hpp (1)
35-35
: Pass small integral typeuint8_t
by value instead of by const referenceThe parameter
byte
inadd_byte_transition
is auint8_t
, which is more efficiently passed by value rather than by const reference. Passing small integral types by value avoids unnecessary indirection.Apply this diff to pass
byte
by value:-auto add_byte_transition(uint8_t const& byte, RegexDFAState<stateType>* dest_state) -> void { +auto add_byte_transition(uint8_t byte, RegexDFAState<stateType>* dest_state) -> void {src/log_surgeon/finite_automata/RegexDFA.hpp (1)
65-65
: Use idiomatic condition in the while loopConsider using
!unvisited_pairs.empty()
instead offalse == unvisited_pairs.empty()
for better readability and to align with common coding practices.Apply this diff:
- while (false == unvisited_pairs.empty()) { + while (!unvisited_pairs.empty()) {src/log_surgeon/finite_automata/RegexDFAStatePair.hpp (1)
70-80
: Offer assistance to implement UTF-8 transitionsThe method
get_reachable_pairs
currently handles only single-byte transitions, as indicated by the TODO comment. Supporting UTF-8 (multi-byte) transitions is essential for full Unicode compatibility.I can help implement support for UTF-8 transitions in this method. Would you like me to provide a solution or open a GitHub issue to track this enhancement?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
CMakeLists.txt
(1 hunks)src/log_surgeon/finite_automata/RegexDFA.hpp
(2 hunks)src/log_surgeon/finite_automata/RegexDFA.tpp
(0 hunks)src/log_surgeon/finite_automata/RegexDFAState.hpp
(1 hunks)src/log_surgeon/finite_automata/RegexDFAStatePair.hpp
(1 hunks)src/log_surgeon/finite_automata/RegexDFAStateType.hpp
(1 hunks)
💤 Files with no reviewable changes (1)
- src/log_surgeon/finite_automata/RegexDFA.tpp
✅ Files skipped from review due to trivial changes (1)
- src/log_surgeon/finite_automata/RegexDFAStateType.hpp
🧰 Additional context used
📓 Learnings (3)
CMakeLists.txt (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
src/log_surgeon/finite_automata/RegexDFA.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
src/log_surgeon/finite_automata/RegexDFAState.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
🔇 Additional comments (1)
CMakeLists.txt (1)
100-102
: Addition of new source files is correct
The inclusion of the new source files RegexDFAState.hpp
, RegexDFAStatePair.hpp
, and RegexDFAStateType.hpp
into the build system is appropriate. This ensures that the newly created classes are compiled and linked properly.
RegexDFAState
class, RegexDFAStatePair
class, and RegexDFAStateType
enum into their own files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
src/log_surgeon/finite_automata/RegexDFA.hpp (2)
12-14
: Consider addressing the TODO comment in this PRThe TODO suggests renaming
RegexDFA
toDFA
. Since this PR is already refactoring the DFA-related code, it might be a good opportunity to implement this naming change as well.Would you like me to help with renaming the class and updating all its references?
60-60
: Track the UTF-8 handling requirementThe TODO comment indicates missing functionality for handling UTF-8 multi-byte transitions.
Would you like me to create a GitHub issue to track the UTF-8 handling requirement?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
src/log_surgeon/finite_automata/RegexDFA.hpp
(1 hunks)src/log_surgeon/finite_automata/RegexDFAState.hpp
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/log_surgeon/finite_automata/RegexDFAState.hpp
🧰 Additional context used
📓 Path-based instructions (1)
src/log_surgeon/finite_automata/RegexDFA.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (1)
src/log_surgeon/finite_automata/RegexDFA.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🔇 Additional comments (3)
src/log_surgeon/finite_automata/RegexDFA.hpp (3)
27-34
: LGTM! Parameter type changed as suggested
The method signature has been updated to use a raw pointer instead of a unique_ptr, which is more appropriate for this use case where ownership transfer isn't needed.
40-52
: LGTM! Clean implementation of state creation
The implementation properly manages memory using smart pointers and correctly handles the transfer of matching variable IDs from NFA to DFA states.
54-72
: LGTM! Well-structured intersection algorithm
The implementation:
- Correctly manages state pair traversal
- Properly collects matching variable IDs
- Follows coding guidelines with
false == unvisited_pairs.empty()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
src/log_surgeon/finite_automata/RegexDFAState.hpp (1)
55-55
: Optimise memory usage by conditionally includingm_bytes_transition
The member
m_bytes_transition
is always included inRegexDFAState
, even whenstateType
isUTF8
, which can lead to unnecessary memory usage. Consider conditionally includingm_bytes_transition
only whenstateType
isByte
, similar to howm_tree_transitions
is conditionally included.src/log_surgeon/finite_automata/RegexDFAStatePair.hpp (1)
71-71
: Implement UTF-8 handling inget_reachable_pairs
There is a TODO comment indicating that UTF-8 (multi-byte transitions) are not yet handled in the
get_reachable_pairs
method. Implementing this functionality is important for full UTF-8 support. Would you like assistance in adding this implementation?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
examples/intersect-test.cpp
(1 hunks)src/log_surgeon/Lexer.hpp
(1 hunks)src/log_surgeon/finite_automata/RegexDFAState.hpp
(1 hunks)src/log_surgeon/finite_automata/RegexDFAStatePair.hpp
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/log_surgeon/Lexer.hpp
🧰 Additional context used
📓 Path-based instructions (3)
examples/intersect-test.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/RegexDFAStatePair.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/RegexDFAState.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (1)
src/log_surgeon/finite_automata/RegexDFAState.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🔇 Additional comments (2)
src/log_surgeon/finite_automata/RegexDFAState.hpp (1)
63-65
:
Add bounds check on character
in Byte state
Accessing m_bytes_transition[character]
without validating character
may lead to out-of-bounds access if character
is greater than or equal to cSizeOfByte
. Consider adding an assertion to ensure character
is within bounds to prevent potential errors.
Apply this diff to add the assertion:
if constexpr (RegexDFAStateType::Byte == stateType) {
+ assert(character < cSizeOfByte);
return m_bytes_transition[character];
}
examples/intersect-test.cpp (1)
45-45
: Function call to get_intersect
updated correctly
The get_intersect
function call has been correctly updated to pass a raw pointer using dfa2.get()
, aligning with the updated function signature. This ensures proper functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The refactoring looks good to me. Let's make an agreement on the macro naming and then we can merge
@@ -0,0 +1,80 @@ | |||
#ifndef LOG_SURGEON_FINITE_AUTOMATA_REGEX_DFA_STATE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to miss this in previous refactor PRs: I think we should name macros to exactly match the file name, so this header should be LOG_SURGEON_FINITE_AUTOMATA_REGEXDFASTATE
instead. We can create an issue to keep track of this and fix them all together later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kk sounds good, I'll create the issue. I was previously separating it on capitalization, e.g. log_surgeon/finite_automate/DfaState
would use #ifndef LOG_SURGEON_FINITE_AUTOMATA_DFA_STATE
as the correct snake_case
naming for the separate words (as we're combining snake_case
folder names and camal_case
file names).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR title looks good to me.
The macro naming issue is tracked here: #65
git diff
will provide better comparisons for files with changed names than using the Github UI:Description
Separate DFA Functionality into different files.
RegexDFAState
class moved into its own file.RegexDFAStatePair
class moved into its own file.RegexDFAStateType
enum moved into its own file.Validation performed
Previously existing tests still succeed.
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Documentation