-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Move epsilon_closure
and nfa_to_dfa
from Lexer
class to Dfa
class.
#62
base: main
Are you sure you want to change the base?
Conversation
…pp files; Also remove RegexDFA.tpp file.
…ector instead of set.
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
… one in case merging caused issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (15)
src/log_surgeon/finite_automata/DfaStatePair.hpp (5)
11-21
: Consider enhancing the documentation.While the current documentation is clear, consider adding:
- Requirements/constraints for the template parameter
TypedDfaState
- Thread safety guarantees for the class
22-28
: Consider adding safety checks and documentation.The constructor could be improved by:
- Adding nullptr validation for the state pointers
- Documenting pointer ownership semantics
- Adding noexcept specification if appropriate
template <typename TypedDfaState> class DfaStatePair { public: + /** + * @brief Constructs a pair of DFA states + * @param state1 Non-null pointer to first DFA state (not owned) + * @param state2 Non-null pointer to second DFA state (not owned) + * @throws std::invalid_argument if either pointer is null + */ DfaStatePair(TypedDfaState const* state1, TypedDfaState const* state2) : m_state1(state1), - m_state2(state2) {}; + m_state2(state2) { + if (state1 == nullptr || state2 == nullptr) { + throw std::invalid_argument("DFA state pointers cannot be null"); + } + }
42-64
: Consider enhancing method documentation and specifications.The member functions could benefit from:
- More detailed documentation for
get_reachable_pairs
:
- Complexity guarantees
- Exception safety guarantees
noexcept
specifications where appropriate foris_accepting
andget_matching_variable_ids
71-71
: Address the TODO comment regarding UTF-8 support.The comment indicates missing UTF-8 support, which could lead to incorrect behaviour with multi-byte characters.
Would you like me to help create a GitHub issue to track the UTF-8 implementation task?
72-82
: Consider performance optimizations for state pair generation.The implementation could benefit from:
- Documentation of time and space complexity
- Pre-reserving space in the sets based on the expected number of transitions
auto DfaStatePair<TypedDfaState>::get_reachable_pairs( std::set<DfaStatePair>& visited_pairs, std::set<DfaStatePair>& unvisited_pairs ) const -> void { + // Reserve space for potential new pairs (maximum cSizeOfByte new pairs) + unvisited_pairs.reserve(unvisited_pairs.size() + cSizeOfByte); // TODO: Handle UTF-8 (multi-byte transitions) as well for (uint32_t i = 0; i < cSizeOfByte; i++) {src/log_surgeon/finite_automata/Nfa.hpp (2)
Line range hint
171-171
: Preferfalse == <expression>
over!<expression>
for clarityAccording to the coding guidelines, it's preferred to use
false == <expression>
instead of!<expression>
. Please update the following lines:
- Line 171:
while (!unused_item_sets.empty()) {
should bewhile (false == unused_item_sets.empty()) {
- Line 220:
while (!q.empty()) {
should bewhile (false == q.empty()) {
- Line 383:
while (!queue.empty()) {
should bewhile (false == queue.empty()) {
Apply these diffs:
- while (!unused_item_sets.empty()) { + while (false == unused_item_sets.empty()) {- while (!q.empty()) { + while (false == q.empty()) {- while (!queue.empty()) { + while (false == queue.empty()) {Also applies to: 220-220, 383-383
Line range hint
241-241
: Usefalse == <expression>
instead of!<expression>
At line 241,
if (!item.has_dot_at_end()) {
should be updated toif (false == item.has_dot_at_end()) {
to adhere to the coding guidelines.Apply this diff:
- if (!item.has_dot_at_end()) { + if (false == item.has_dot_at_end()) {src/log_surgeon/Lalr1Parser.tpp (2)
110-110
: Preferfalse == <expression>
over!<expression>
for consistencyPlease update the following lines to match the coding guidelines:
- Line 110:
while (!q.empty()) {
→while (false == q.empty()) {
- Line 273:
while (changed) {
→while (false == changed) {
- Line 335:
while (changed) {
→while (false == changed) {
- Line 383:
while (!queue.empty()) {
→while (false == queue.empty()) {
Apply these diffs:
- while (!q.empty()) { + while (false == q.empty()) {- while (changed) { + while (false == changed) {- while (!queue.empty()) { + while (false == queue.empty()) {Also applies to: 273-273, 335-335, 383-383
571-571
: Usefalse == <expression>
instead of!<expression>
for assertionsAt line 571, please update the assertion to follow the coding guideline:
- Change
assert(!m_parse_stack_matches.empty());
toassert(false == m_parse_stack_matches.empty());
Apply this diff:
- assert(!m_parse_stack_matches.empty()); + assert(false == m_parse_stack_matches.empty());src/log_surgeon/LexicalRule.hpp (1)
28-30
: Address TODO comment regarding constant pointer return type.The TODO comment indicates a need to make the returned pointer constant. This would prevent modification of the underlying RegexAST through the getter.
Would you like me to propose a solution for making the returned pointer constant?
src/log_surgeon/Lexer.hpp (1)
14-16
: Consider grouping related includes together.The finite automata related includes could be grouped together for better organization:
-#include <log_surgeon/finite_automata/Dfa.hpp> -#include <log_surgeon/finite_automata/DfaState.hpp> -#include <log_surgeon/finite_automata/Nfa.hpp> +// Finite automata related includes +#include <log_surgeon/finite_automata/Dfa.hpp> +#include <log_surgeon/finite_automata/DfaState.hpp> +#include <log_surgeon/finite_automata/Nfa.hpp>src/log_surgeon/finite_automata/Dfa.hpp (3)
45-60
: Consider using std::unordered_map for better performanceThe
dfa_states
map is used for state lookup operations. Since the order of states isn't important here, usingstd::unordered_map
could provide better average-case performance.- std::map<StateSet, DfaStateType*> dfa_states; + std::unordered_map<StateSet, DfaStateType*> dfa_states;
131-131
: Use!unvisited_pairs.empty()
for consistencyThe condition uses
false == unvisited_pairs.empty()
which is less common than the!
operator.- while (false == unvisited_pairs.empty()) { + while (!unvisited_pairs.empty()) {
73-82
: TODO comments need attentionThere are multiple TODO comments about UTF-8 support. This could impact the functionality for non-ASCII characters.
Would you like me to help create GitHub issues to track the UTF-8 implementation tasks?
src/log_surgeon/finite_automata/NfaState.hpp (1)
174-207
: Consider optimizing epsilon closure computationThe current implementation uses a set for visited states and a stack for traversal. Consider using a vector with a boolean flag for visited states if the state IDs are dense, as this could provide better cache locality.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (27)
CMakeLists.txt
(2 hunks)examples/intersect-test.cpp
(4 hunks)src/log_surgeon/BufferParser.hpp
(2 hunks)src/log_surgeon/Lalr1Parser.cpp
(2 hunks)src/log_surgeon/Lalr1Parser.hpp
(8 hunks)src/log_surgeon/Lalr1Parser.tpp
(28 hunks)src/log_surgeon/Lexer.hpp
(5 hunks)src/log_surgeon/Lexer.tpp
(18 hunks)src/log_surgeon/LexicalRule.hpp
(2 hunks)src/log_surgeon/LogParser.cpp
(6 hunks)src/log_surgeon/LogParser.hpp
(4 hunks)src/log_surgeon/Parser.hpp
(1 hunks)src/log_surgeon/Parser.tpp
(2 hunks)src/log_surgeon/ReaderParser.hpp
(2 hunks)src/log_surgeon/SchemaParser.cpp
(3 hunks)src/log_surgeon/SchemaParser.hpp
(4 hunks)src/log_surgeon/finite_automata/Dfa.hpp
(1 hunks)src/log_surgeon/finite_automata/DfaState.hpp
(1 hunks)src/log_surgeon/finite_automata/DfaStatePair.hpp
(1 hunks)src/log_surgeon/finite_automata/DfaStateType.hpp
(1 hunks)src/log_surgeon/finite_automata/Nfa.hpp
(6 hunks)src/log_surgeon/finite_automata/NfaState.hpp
(6 hunks)src/log_surgeon/finite_automata/RegexAST.hpp
(29 hunks)src/log_surgeon/finite_automata/TaggedTransition.hpp
(5 hunks)tests/CMakeLists.txt
(1 hunks)tests/test-NFA.cpp
(1 hunks)tests/test-lexer.cpp
(2 hunks)
✅ Files skipped from review due to trivial changes (2)
- src/log_surgeon/BufferParser.hpp
- src/log_surgeon/ReaderParser.hpp
🚧 Files skipped from review as they are similar to previous changes (12)
- src/log_surgeon/finite_automata/DfaStateType.hpp
- src/log_surgeon/LogParser.hpp
- src/log_surgeon/Parser.hpp
- src/log_surgeon/Parser.tpp
- tests/CMakeLists.txt
- src/log_surgeon/SchemaParser.hpp
- src/log_surgeon/LogParser.cpp
- src/log_surgeon/finite_automata/TaggedTransition.hpp
- tests/test-NFA.cpp
- tests/test-lexer.cpp
- src/log_surgeon/finite_automata/RegexAST.hpp
- src/log_surgeon/Lexer.tpp
🧰 Additional context used
📓 Path-based instructions (11)
src/log_surgeon/Lalr1Parser.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/Dfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/DfaStatePair.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/DfaState.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/LexicalRule.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/Lalr1Parser.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
examples/intersect-test.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/Lexer.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/Nfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/NfaState.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/SchemaParser.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (1)
src/log_surgeon/finite_automata/NfaState.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🔇 Additional comments (16)
src/log_surgeon/finite_automata/DfaStatePair.hpp (2)
1-9
: LGTM! Header guards and includes are well-organized.
The header guards follow the project's naming convention, and the includes are minimal and appropriate for the required functionality.
29-40
: LGTM! Well-implemented comparison operator.
The implementation is efficient and correctly documented, with proper const correctness and clear return type specification.
src/log_surgeon/finite_automata/DfaState.hpp (1)
62-62
:
Add bounds checking in next
method to prevent out-of-bounds access
In the next
method, when state_type
is Byte
, accessing m_bytes_transition[character]
without verifying that character
is within bounds can cause out-of-bounds access if character
is greater than or equal to cSizeOfByte
. Please add a bounds check to ensure safe access.
Apply this diff to add bounds checking:
if constexpr (DfaStateType::Byte == state_type) {
+ if (character >= cSizeOfByte) {
+ return nullptr;
+ }
return m_bytes_transition[character];
} else {
src/log_surgeon/Lalr1Parser.cpp (1)
11-13
: LGTM
The initialization and member variable adjustments are correctly implemented.
src/log_surgeon/LexicalRule.hpp (1)
9-9
: LGTM! Type renaming changes improve clarity.
The renaming from generic types (NFAStateType) to more descriptive types (TypedNfaState) enhances code readability and better reflects the purpose of these types.
Also applies to: 15-15, 24-24, 28-28, 35-35, 38-39
examples/intersect-test.cpp (2)
43-44
: Improved DFA construction using constructor.
The change from using ByteLexer::nfa_to_dfa(nfa)
to Dfa dfa2(nfa)
is a good improvement as it:
- Simplifies the code by using constructor instead of static method
- Makes the relationship between NFA and DFA more explicit
81-82
: Verify test coverage after refactoring.
The DFA construction changes are significant. Although the test cases remain unchanged, we should verify that they still provide adequate coverage.
src/log_surgeon/Lexer.hpp (1)
162-163
: LGTM! Type aliases follow new naming convention.
The type aliases in the lexers namespace have been updated to use the new type names, maintaining consistency across the codebase.
CMakeLists.txt (1)
99-105
: LGTM! Source files properly organized
The new finite automata source files are correctly added and organized within the existing structure.
src/log_surgeon/finite_automata/NfaState.hpp (1)
37-38
: Constructor parameter type matches previous feedback
The constructor now uses std::vector<Tag const*>
instead of std::set
as per the learning from PR #48.
src/log_surgeon/Lalr1Parser.hpp (4)
123-123
: LGTM! Clean and concise implementation.
The removal of the this->
qualifier improves readability while maintaining the same functionality.
161-161
: LGTM! Clean and concise implementation.
The removal of the this->
qualifier improves readability while maintaining the same functionality.
168-168
: LGTM! Clean and concise implementation.
The single-line implementation improves readability while maintaining the same functionality.
203-204
: LGTM! Improved type naming and proper base class member access.
The changes enhance code clarity through:
- More descriptive template parameter names
- Proper base class member access using the
using
declaration
Also applies to: 206-206, 215-215, 225-225, 277-277, 397-398, 410-410
src/log_surgeon/SchemaParser.cpp (2)
419-419
: LGTM! Improved C++ style consistency.
The change to trailing return type syntax aligns with modern C++ conventions.
484-484
: LGTM! Improved C++ style consistency.
The change to trailing return type syntax aligns with modern C++ conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
examples/intersect-test.cpp (1)
24-24
: Consider adding const qualifier to the DFA parameterSince
get_intersect_for_query
only calls const member functionget_intersect
, consider making the parameter const:- Dfa<ByteDfaState>& dfa1, + Dfa<ByteDfaState> const& dfa1,
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
examples/intersect-test.cpp
(4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
examples/intersect-test.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
🔇 Additional comments (3)
examples/intersect-test.cpp (3)
8-11
: LGTM! Type declarations are clear and consistent
The new type names and namespace organization improve code clarity and maintainability.
Also applies to: 20-20
43-44
: LGTM! Efficient construction with move semantics
The construction of NFA and DFA objects properly utilizes move semantics, and the new approach of constructing DFA directly from NFA is more elegant.
81-82
: LGTM! Verify test coverage remains unchanged
The refactored construction is consistent with the changes above. Let's verify that all test cases still exercise the same functionality.
✅ Verification successful
All test cases are preserved and type refactoring is complete
The verification confirms:
- All test cases using
get_intersect_for_query
are intact, maintaining the same test coverage - No legacy type names remain in the codebase, confirming successful refactoring
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify test coverage and functionality
# Check if all test cases are preserved
rg -A 1 "get_intersect_for_query.*dfa" examples/intersect-test.cpp
# Verify no old type names remain
rg -l "RegexDFA|RegexNFA|RegexDFAByteState|RegexNFAByteState" || echo "No old type names found"
Length of output: 716
References
Description
epsilon_closure
fromLexer
class toNfaState
class.nfa_to_dfa
inLexer
class toDfa
class constructor.Validation performed
Previously existing tests succeed.
Summary by CodeRabbit
Release Notes
New Features
Dfa
andNfa
classes for enhanced finite automata functionality.Improvements
Bug Fixes
LogEvent
andLogParser
classes.Documentation