refactor: Extract `RegexDFAState` class, `RegexDFAStatePair` class, and `RegexDFAStateType` enum into their own files. #57
Merged: LinZhihao-723 merged 372 commits into y-scope:main from SharafMohamed:individual-dfa-files on Dec 11, 2024
Commits
e8db277 Have internal serialize() functions for RegexNFA (states and tagged t… (SharafMohamed)
337cead Reserve space during BFS; Run linter. (SharafMohamed)
4a30fdc Add braced initialization to nfa. (SharafMohamed)
0203038 Update docstring for positive tag serialization. (SharafMohamed)
633acc4 Update docstring for negative tag serialization. (SharafMohamed)
4db7b82 Use return statement for full docstring of get_bfs_traversal_order. (SharafMohamed)
01f8b14 Update NFA serialize() docstring. (SharafMohamed)
d047624 Add long form of BFS for first use. (SharafMohamed)
f9c4f46 Use const for state_id_it. (SharafMohamed)
bd77c78 Update docstring for NFA state serialize. (SharafMohamed)
f2d8049 Combine the two failure cases in NFA state serailize's docstring to m… (SharafMohamed)
4cb560f Use const for state_id_it. (SharafMohamed)
95b7497 For NFA state serialize flip order of failure checks to reduce indent… (SharafMohamed)
e187445 Merge branch 'tagged-nfa-new' of https://github.com/SharafMohamed/log… (SharafMohamed)
8b85511 Use const& for passing rules into the NFA as rules are never stored, … (SharafMohamed)
0756794 Use braced initialization for NFA. (SharafMohamed)
6ab439a Remove warning for not check std::optional when we know the function … (SharafMohamed)
9244812 Remove redundant initialzation of member variables in tagged transiti… (SharafMohamed)
0d151a4 Use member initialization lists for constructing NFA state from tagge… (SharafMohamed)
ac63713 Switch to using optional prefix for optional return types. (SharafMohamed)
b57b93f Make negative tagged transition singular as you can never have more t… (SharafMohamed)
c3fb16d Add missing param for new_state_with_negative_tagged_transitions. (SharafMohamed)
8a41367 Move RegexNFAStateType, RegexNFAState, and PositiveTaggedTransition/N… (SharafMohamed)
d1a57e4 Add tag class. (SharafMohamed)
bc78f59 Make tag an object with name, start, and end information, instead of … (SharafMohamed)
ac7260f Run linter. (SharafMohamed)
40a8206 Merge branch 'main' into singular-negative-transition (SharafMohamed)
c2eea21 Change t to curr_state and u to dest_state. (SharafMohamed)
629fce9 Change curr_state to current_state; Remove extraneous *; Add newline … (SharafMohamed)
aed62b2 Add TODO for utf8 case in BFS. (SharafMohamed)
34522a7 Use auto and fix order of const wrt to *. (SharafMohamed)
332af35 Initialize m_dest_state to nullptr. (SharafMohamed)
748e794 Change negative_tagged_transition to negative_tagged_transition_string. (SharafMohamed)
38dc22b Change negative tag transitions to singular. (SharafMohamed)
5a30ed8 Switch transitions to singular where applicable. (SharafMohamed)
c8bf9e6 Merge changes with previous PR manually. Still missing changes to pre… (SharafMohamed)
90edf77 Auto linter. (SharafMohamed)
fd765f7 Merge branch 'singular-negative-transition' into individual-files (SharafMohamed)
f7d3415 Merge branch 'individual-files' into meaningful-tags (SharafMohamed)
b5f7cdf Modify expected output where ordering of negative tags is ambiguous. … (SharafMohamed)
d90b731 Add a description for how to use the tag. (SharafMohamed)
3f1f8ff Add start and end positive transitions. (SharafMohamed)
2bd5d2c Add functionality to tags to use it for tracking capture positions; R… (SharafMohamed)
2d0157e Reduce indentation of epsilon closure by using continue. (SharafMohamed)
1cabafd Use optional for negative transitions in RegexNFAState. (SharafMohamed)
dc2c637 Add missing headers; Remove unused headers. (SharafMohamed)
7c5cfc0 Assign optional_negative_tagged_transition to a reference. (SharafMohamed)
4e8d290 Assign optional_negative_tagged_transition to a reference again. (SharafMohamed)
fde9037 Add <stack> to Lexer.tpp. (SharafMohamed)
e63637e Fix comment grammar. (SharafMohamed)
08e7d5e Update with previous PR. (SharafMohamed)
f7b5666 Merge branch 'individual-files' into meaningful-tags (SharafMohamed)
93aebd5 Merge branch 'meaningful-tags' into fixed-tagged-nfa (SharafMohamed)
b8c8f77 Store negative tags in a vector instead of set so that the order is d… (SharafMohamed)
ef95061 Sync with previous PR. (SharafMohamed)
b55e96c Merge branch 'individual-files' into meaningful-tags (SharafMohamed)
304f612 Merge branch 'meaningful-tags' into fixed-tagged-nfa (SharafMohamed)
7cc8c52 Add start tags to NFA. (SharafMohamed)
b1a9300 Update unit-test to handle start transitions. (SharafMohamed)
9da470d Merge branch 'main' into individual-files (SharafMohamed)
b451651 Move RegexNFAXState typedef into RegexNFAState.hpp (SharafMohamed)
f71348b Switch void to auto -> void. (SharafMohamed)
21e80b9 Merge branch 'individual-files' of https://github.com/SharafMohamed/l… (SharafMohamed)
4576d7d Move short functions into the class definition; Move RegexNFAXState t… (SharafMohamed)
6e24969 Merge branch 'individual-files' into meaningful-tags (SharafMohamed)
ff91bcc Merge branch 'main' into meaningful-tags (SharafMohamed)
e786ec6 Merge branch 'meaningful-tags' into fixed-tagged-nfa (SharafMohamed)
5abe906 Auto format. (SharafMohamed)
bb0bd2e Remove unused lambda; Auto format. (SharafMohamed)
a36bb90 Add test case for Tag class. (SharafMohamed)
59cc6cd Add nullptr checks. (SharafMohamed)
8097a69 Merge branch 'meaningful-tags' into fixed-tagged-nfa (SharafMohamed)
9fc41c0 Change Tag class functionality to reflect how registers will be used. (SharafMohamed)
6e5c968 Add register class. (SharafMohamed)
e185fe2 Seperate classes from RegexDFA.hpp and RegexDFA.tpp into their own .h… (SharafMohamed)
d060bc6 Temp fix for unit-test until future PR where Tag ptrs are stored in v… (SharafMohamed)
f041a37 Swap from set to vector to tag pointers to ensure determinism. (SharafMohamed)
f72e120 Better test coverage for tag class. (SharafMohamed)
d5ac1ad Use constant iterators for elements that should not change. (SharafMohamed)
30f03ed Use braced intiailization in test-tag.cpp. (SharafMohamed)
d386fc0 Use const& for insertion function that can't use move semantics. (SharafMohamed)
4024c3e Have get_name() return string_view; Update headers. (SharafMohamed)
22c3b82 Remove const from member variable. (SharafMohamed)
ed55534 Remove const from member variable. (SharafMohamed)
534afce Run linter. (SharafMohamed)
61fdb5d Add move semantic test cases. (SharafMohamed)
78e5fe8 Add PositiveTaggedTransition docstring and make m_tag throw if ever n… (SharafMohamed)
630d882 Delete unused operators. (SharafMohamed)
543f8af Move null check into intiailizer list for NegativeTaggedTransition co… (SharafMohamed)
ec342fc Remove position vectors from Tag, as they arent used in the AST. (SharafMohamed)
af86281 RegexASTCapture enforces non-null arguments; Add docstring to RegexAS… (SharafMohamed)
738becd Capitalize exceptions. (SharafMohamed)
789263e Use () to fix linting issue. (SharafMohamed)
1f15ca7 Keep default copy assignment. (SharafMohamed)
7688c24 Move @throw to constructor docstrings. (SharafMohamed)
27618b2 Merge branch 'meaningful-tags' into fixed-tagged-nfa (SharafMohamed)
867d27c Merge branch 'fixed-tagged-nfa' into register (SharafMohamed)
aff5bca Merge branch 'register' into individual-dfa-files (SharafMohamed)
486190a Do string_viee comparisomn in lexer test. (SharafMohamed)
ac75909 Use string_view compares in tag tests. (SharafMohamed)
090f18c Update headers in TaggedTransition.hpp. (SharafMohamed)
c7cfc10 Seperate copy and move constructor unit-tests. (SharafMohamed)
91b8b51 Use NOTE for class requirements. (SharafMohamed)
fcb1a76 Use NOTE for class requirements. (SharafMohamed)
9b09e19 Use NOTE for class requirements. (SharafMohamed)
2f712e6 Merge branch 'meaningful-tags' into fixed-tagged-nfa (SharafMohamed)
75aecc4 Update install-catch2.sh to compile catch2 with c++17. (SharafMohamed)
9302b94 Merge branch 'main' into fixed-tagged-nfa (SharafMohamed)
97caabb Merge branch 'catch2-install-fix' into fixed-tagged-nfa (SharafMohamed)
507a7d3 Loop over end_transitions correctly. (SharafMohamed)
34c227b Add TagPositions class. (SharafMohamed)
27c8560 Remove new class, going to add it later. (SharafMohamed)
86caa9b Add const back in. (SharafMohamed)
338638e Add more const back in. (SharafMohamed)
a742601 Add more const back in. (SharafMohamed)
d358713 Linter. (SharafMohamed)
43870ea Add more const back in. (SharafMohamed)
b827a6c Merge branch 'fixed-tagged-nfa' into register (SharafMohamed)
f941607 Use `auto`. (SharafMohamed)
aad9eb3 Fix spacing. (SharafMohamed)
a801bf8 Add diagram for capture group NFA. (SharafMohamed)
08b7548 Add const for consitency with constructor. (SharafMohamed)
449133e Update positive end transition to be optional instead of a vector. (SharafMohamed)
7b837bf Rename new_state function correctly. (SharafMohamed)
f0eb56b Update capture group AST state creation. (SharafMohamed)
a945915 Encapsulate new state for capture group. (SharafMohamed)
c757ded Fix compiler error. (SharafMohamed)
2eb7477 Use singular for end transition getter function. (SharafMohamed)
08060ed Void to auto -> void. (SharafMohamed)
0c2c1d1 Update new_capture_group_start_states to new_capture_group_states to … (SharafMohamed)
b0b951a Linter. (SharafMohamed)
3c2a2ab Update docstring for . (SharafMohamed)
98c5b95 Rename to new_start_and_end_states_with_positively_tagged_transitions. (SharafMohamed)
f59cf41 Rename to capture_X_state. (SharafMohamed)
85a2d69 Update docstring. (SharafMohamed)
4c602d4 Updated diagram to match vars used in code. (SharafMohamed)
2b01433 Rename vars to serialized_X. (SharafMohamed)
e37b29a Run Linter. (SharafMohamed)
c5beca3 Fix typo. (SharafMohamed)
fe4a7b3 Update diagram for capture group NFA. (SharafMohamed)
8993088 Merge branch 'fixed-tagged-nfa' into register (SharafMohamed)
aaf720a Merge branch 'main' into register (SharafMohamed)
0017512 Add register unit-tests, add PrefixTree with unit-tests. (SharafMohamed)
336f2ae Finished with initial register implementation. (SharafMohamed)
3449df2 Linter. (SharafMohamed)
ef62df1 Linter. (SharafMohamed)
a085650 Docstring fixes. (SharafMohamed)
2be06c0 Add boundry test case. (SharafMohamed)
9ec01dd Improve test cases for setting positions in prefix tree. (SharafMohamed)
019e675 Improve test cases for setting invalid positions in prefix tree. (SharafMohamed)
83a411a Remove confusing description; Remove unused include. (SharafMohamed)
c88fbb5 Add edge case test to register unit-tests. (SharafMohamed)
7c91ddc Update docstring for PrefixTreeNode. (SharafMohamed)
4c50769 Add comments to test-case; Add new test case for setting root value. (SharafMohamed)
98200b4 Update docstring to make it clear that any negative value of m_positi… (SharafMohamed)
afaf01a Fix header gaurd. (SharafMohamed)
8dea476 Fix typo. (SharafMohamed)
dbb1e16 Remove newline in docstring. (SharafMohamed)
e054825 Improve throw consistency. (SharafMohamed)
792ce96 Update prefix tree insertion test cases. (SharafMohamed)
cab6e81 Fix test case. (SharafMohamed)
ffda5e6 Fix @throws doscstring for consistency; Improve insert() docstring. (SharafMohamed)
ff11672 Improve register handler test coverage. (SharafMohamed)
536b50b Fix == ordering in test-cases; Fix vector initialization to remove re… (SharafMohamed)
77c20f7 Add const for consistency. (SharafMohamed)
f43759c Add _HPP to header guards; Remove unused include. (SharafMohamed)
01e8881 Fix typo. (SharafMohamed)
fbb3d36 Remove blank line. (SharafMohamed)
e1f2b18 Rename to m_prefix_tree; Remove unused include. (SharafMohamed)
a51b49d Add param descriptions to docstrings. (SharafMohamed)
002577e Improve out of range check to be consistent. (SharafMohamed)
52a155c Update set docstring. (SharafMohamed)
a6beafc Punctuate docstrings. (SharafMohamed)
ec1f757 Update PregixTreeNode docstring. (SharafMohamed)
f35741f Improve docstring for PrefixTree. (SharafMohamed)
e8e5e55 Change to use auto -> void; Punctuate out_of_range throws. (SharafMohamed)
f1ece30 Update Register docstring. (SharafMohamed)
08997ae Update PrefixTree docstring. (SharafMohamed)
0910c62 Grammar fix. (SharafMohamed)
ede680e Grammar fix. (SharafMohamed)
c7b047c Use auto where possible. (SharafMohamed)
6fa8fcb Use uniform initialization. (SharafMohamed)
18b9160 Add missing header. (SharafMohamed)
3f08fa3 Linter. (SharafMohamed)
e281f04 Fix spacing. (SharafMohamed)
a03734e Make Node a member of PrefixTree. (SharafMohamed)
9123c7a Rename index to prefix_tree_node_id. (SharafMohamed)
fe35fe0 Make it clear indicies in add_register are refering to prefix_tree no… (SharafMohamed)
de58e08 Linter. (SharafMohamed)
1426179 rename to reg_id. (SharafMohamed)
3301f14 Rename to reg_id. (SharafMohamed)
c9b1369 Use at(). (SharafMohamed)
e2aee66 Remove Register class and use uint32_t instead; Rename vers to xxx_re… (SharafMohamed)
36c1810 Rename to reg_id. (SharafMohamed)
48df8b0 Remove unused header. (SharafMohamed)
a8605fc Change pred index to be optional and nullopt for root. (SharafMohamed)
15cb1b6 Add and use node_id_t. (SharafMohamed)
6b787d0 Add position_t. (SharafMohamed)
cd8f4e3 Change to id_t. (SharafMohamed)
72da50c Add is_root(). (SharafMohamed)
3fc7ea7 Add missing header. (SharafMohamed)
6443d66 Update PrefixTree docstring. (SharafMohamed)
63aec4d Removing node docstring as its redundant. (SharafMohamed)
295f3ee Combine private section in PrefixTree. (SharafMohamed)
1186666 Add missing header; Remove copy paste error. (SharafMohamed)
06ee38e Rename to node_id and parent_node_id. (SharafMohamed)
e103011 Update get_reversed_positions' docstring. (SharafMohamed)
31b0346 Update get_reversed positions' docstring to clarify exlcusivity of th… (SharafMohamed)
4005e41 Grammar fix. (SharafMohamed)
e38940c Add maybe_unusued. (SharafMohamed)
d71368d Update src/log_surgeon/finite_automata/RegisterHandler.hpp (SharafMohamed)
dd4b6e1 Update test case names to document code names better. (SharafMohamed)
7322852 Implicitily use auto in vectors. (SharafMohamed)
dba1a18 Explicitily use position_t for vectors. (SharafMohamed)
ee6efab Update tests/test-register-handler.cpp (SharafMohamed)
9ba980c Switch to size_t. (SharafMohamed)
27b324c Clang-tidy: Remove magic numbers + Fix headers. (SharafMohamed)
f651a24 Reduce complexity for clang-tidy. (SharafMohamed)
fc6f426 Add negative pos test case in test-register-handler.cpp. (SharafMohamed)
c8fb570 Alternate b/w positive and negative positions in test-prefix-tree neg… (SharafMohamed)
1f66918 Add cRootId and size() to PrefixTree. (SharafMohamed)
a388c80 Update note. (SharafMohamed)
340eaf7 Update docstring. (SharafMohamed)
22cf931 Fix typo. (SharafMohamed)
e75c888 Merge branch 'register' into individual-dfa-files (SharafMohamed)
c61f2d9 Update header for size_t. (SharafMohamed)
417bde8 Update src/log_surgeon/finite_automata/PrefixTree.hpp (SharafMohamed)
738876d Update src/log_surgeon/finite_automata/PrefixTree.hpp (SharafMohamed)
93c03a0 Update src/log_surgeon/finite_automata/RegisterHandler.hpp (SharafMohamed)
6481e5f Update tests/test-prefix-tree.cpp (SharafMohamed)
6a9a4a4 Clean up register initialization helper; Fix typo. (SharafMohamed)
052d86f Update get_parent_id to clarify its unsafe and suppress warning. (SharafMohamed)
ed70bd5 Move constants in test-register-handler.hpp to minimize scope. (SharafMohamed)
fab801f Merge branch 'register' into individual-dfa-files (SharafMohamed)
1671e39 Move constants into scope for test-prefix-tree.cpp. (SharafMohamed)
748dfc5 Rename to handler_init and return handler. (SharafMohamed)
8abf35a Add docstring for get_parent_id_unsafe(). (SharafMohamed)
1e5fdcc Linter. (SharafMohamed)
71d926d Merge branch 'register' into individual-dfa-files (SharafMohamed)
66ed13b Merge branch 'main' into individual-dfa-files (SharafMohamed)
a12a360 Fix comment length. (SharafMohamed)
244d122 Initialize byte transitions. (SharafMohamed)
176391b Use const* in place of unique_ptr reference; Update docstrings. (SharafMohamed)
012f61f Update intersect test to compile. (SharafMohamed)
96a6363 Update next() docstring. (SharafMohamed)
421c3de Update headers. (SharafMohamed)
1b945a1 Update Lexer headers. (SharafMohamed)
78c4125 Add header for conditional_t. (SharafMohamed)
33623fa Linter. (SharafMohamed)
0decaf5 Change ! to false ==. (SharafMohamed)
@@ -1,149 +1,75 @@
#ifndef LOG_SURGEON_FINITE_AUTOMATA_REGEX_DFA_HPP
#define LOG_SURGEON_FINITE_AUTOMATA_REGEX_DFA_HPP

#include <algorithm>
#include <cstdint>
#include <memory>
#include <set>
#include <utility>
#include <vector>

#include <log_surgeon/Constants.hpp>
#include <log_surgeon/finite_automata/RegexNFA.hpp>
#include <log_surgeon/finite_automata/UnicodeIntervalTree.hpp>
#include <log_surgeon/finite_automata/RegexDFAStatePair.hpp>

namespace log_surgeon::finite_automata {
enum class RegexDFAStateType {
    Byte,
    UTF8
};

template <RegexDFAStateType stateType>
class RegexDFAState {
public:
    using Tree = UnicodeIntervalTree<RegexDFAState<stateType>*>;

    auto add_matching_variable_id(uint32_t const variable_id) -> void {
        m_matching_variable_ids.push_back(variable_id);
    }

    [[nodiscard]] auto get_matching_variable_ids() const -> std::vector<uint32_t> const& {
        return m_matching_variable_ids;
    }

    [[nodiscard]] auto is_accepting() const -> bool { return !m_matching_variable_ids.empty(); }

    auto add_byte_transition(uint8_t const& byte, RegexDFAState<stateType>* dest_state) -> void {
        m_bytes_transition[byte] = dest_state;
    }

    /**
     * Returns the next state the DFA transitions to on input character (byte or
     * utf8)
     * @param character
     * @return RegexDFAState<stateType>*
     */
    [[nodiscard]] auto next(uint32_t character) const -> RegexDFAState<stateType>*;

private:
    std::vector<uint32_t> m_matching_variable_ids;
    RegexDFAState<stateType>* m_bytes_transition[cSizeOfByte];
    // NOTE: We don't need m_tree_transitions for the `stateType ==
    // RegexDFAStateType::Byte` case, so we use an empty class (`std::tuple<>`)
    // in that case.
    std::conditional_t<stateType == RegexDFAStateType::UTF8, Tree, std::tuple<>> m_tree_transitions;
};

/**
 * Class for a pair of DFA states, where each state in the pair belongs to a different DFA.
 * This class is used to facilitate the construction of an intersection DFA from two separate DFAs.
 * Each instance represents a state in the intersection DFA and follows these rules:
 *
 * - A pair is considered accepting if both states are accepting in their respective DFAs.
 * - A pair is considered reachable if both its states are reachable in their respective DFAs
 * from this pair's states.
 *
 * NOTE: Only the first state in the pair contains the variable types matched by the pair.
 */
template <typename DFAState>
class RegexDFAStatePair {
public:
    RegexDFAStatePair(DFAState const* state1, DFAState const* state2)
            : m_state1(state1),
              m_state2(state2) {};

    /**
     * Used for ordering in a set by considering the states' addresses
     * @param rhs
     * @return Whether m_state1 in lhs has a lower address than in rhs, or if they're equal,
     * whether m_state2 in lhs has a lower address than in rhs
     */
    auto operator<(RegexDFAStatePair const& rhs) const -> bool {
        if (m_state1 == rhs.m_state1) {
            return m_state2 < rhs.m_state2;
        }
        return m_state1 < rhs.m_state1;
    }

    /**
     * Generates all pairs reachable from the current pair via any string and store any reachable
     * pair not previously visited in unvisited_pairs
     * @param visited_pairs Previously visited pairs
     * @param unvisited_pairs Set to add unvisited reachable pairs
     */
    auto get_reachable_pairs(
            std::set<RegexDFAStatePair<DFAState>>& visited_pairs,
            std::set<RegexDFAStatePair<DFAState>>& unvisited_pairs
    ) const -> void;

    [[nodiscard]] auto is_accepting() const -> bool {
        return m_state1->is_accepting() && m_state2->is_accepting();
    }

    [[nodiscard]] auto get_matching_variable_ids() const -> std::vector<uint32_t> const& {
        return m_state1->get_matching_variable_ids();
    }

private:
    DFAState const* m_state1;
    DFAState const* m_state2;
};

using RegexDFAByteState = RegexDFAState<RegexDFAStateType::Byte>;
using RegexDFAUTF8State = RegexDFAState<RegexDFAStateType::UTF8>;

// TODO: rename `RegexDFA` to `DFA`
template <typename DFAStateType>
class RegexDFA {
public:
    /**
     * Creates a new DFA state based on a set of NFA states and adds it to
     * m_states
     * @param nfa_state_set
     * @return DFAStateType*
     * Creates a new DFA state based on a set of NFA states and adds it to `m_states`.
     * @param nfa_state_set The set of NFA states represented by this DFA state.
     * @return A pointer to the new DFA state.
     */
    template <typename NFAStateType>
    auto new_state(std::set<NFAStateType*> const& nfa_state_set) -> DFAStateType*;

    auto get_root() const -> DFAStateType const* { return m_states.at(0).get(); }

    /**
     * Compares this dfa with dfa_in to determine the set of schema types in
     * this dfa that are reachable by any type in dfa_in. A type is considered
     * reachable if there is at least one string for which: (1) this dfa returns
     * a set of types containing the type, and (2) dfa_in returns any non-empty
     * set of types.
     * @param dfa_in
     * @return The set of schema types reachable by dfa_in
     * Compares this dfa with `dfa_in` to determine the set of schema types in this dfa that are
     * reachable by any type in `dfa_in`. A type is considered reachable if there is at least one
     * string for which: (1) this dfa returns a set of types containing the type, and (2) `dfa_in`
     * returns any non-empty set of types.
     * @param dfa_in The dfa with which to take the intersect.
     * @return The set of schema types reachable by `dfa_in`.
     */
    [[nodiscard]] auto get_intersect(std::unique_ptr<RegexDFA> const& dfa_in
    ) const -> std::set<uint32_t>;
    [[nodiscard]] auto get_intersect(RegexDFA const* dfa_in) const -> std::set<uint32_t>;

private:
    std::vector<std::unique_ptr<DFAStateType>> m_states;
};
} // namespace log_surgeon::finite_automata

#include "RegexDFA.tpp"
template <typename DFAStateType>
template <typename NFAStateType>
auto RegexDFA<DFAStateType>::new_state(std::set<NFAStateType*> const& nfa_state_set
) -> DFAStateType* {
    m_states.emplace_back(std::make_unique<DFAStateType>());
    auto* dfa_state = m_states.back().get();
    for (auto const* nfa_state : nfa_state_set) {
        if (nfa_state->is_accepting()) {
            dfa_state->add_matching_variable_id(nfa_state->get_matching_variable_id());
        }
    }
    return dfa_state;
}

template <typename DFAStateType>
auto RegexDFA<DFAStateType>::get_intersect(RegexDFA const* dfa_in) const -> std::set<uint32_t> {
    std::set<uint32_t> schema_types;
    std::set<RegexDFAStatePair<DFAStateType>> unvisited_pairs;
    std::set<RegexDFAStatePair<DFAStateType>> visited_pairs;
    unvisited_pairs.emplace(this->get_root(), dfa_in->get_root());
    // TODO: Handle UTF-8 (multi-byte transitions) as well
    while (false == unvisited_pairs.empty()) {
        auto current_pair_it = unvisited_pairs.begin();
        if (current_pair_it->is_accepting()) {
            auto const& matching_variable_ids = current_pair_it->get_matching_variable_ids();
            schema_types.insert(matching_variable_ids.cbegin(), matching_variable_ids.cend());
        }
        visited_pairs.insert(*current_pair_it);
        current_pair_it->get_reachable_pairs(visited_pairs, unvisited_pairs);
        unvisited_pairs.erase(current_pair_it);
    }
    return schema_types;
}
} // namespace log_surgeon::finite_automata

#endif // LOG_SURGEON_FINITE_AUTOMATA_REGEX_DFA_HPP
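
The `get_intersect` definition above explores pairs of states, one from each DFA, with a worklist until no unvisited pair remains. The following is a minimal, self-contained sketch of that idea, not the log-surgeon implementation: `ToyState`, `ToyPair`, and `intersect_ids` are hypothetical names, and the byte-by-byte expansion is an assumption about what `get_reachable_pairs` does, since its definition is not part of this diff. The sketch also orders pointers with `std::less`, which guarantees a total order, rather than the raw `<` used in the diff.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <set>
#include <vector>

// Hypothetical stand-in for a byte DFA state; not the log-surgeon class.
struct ToyState {
    std::vector<uint32_t> ids;                // variable ids matched in this state
    std::array<ToyState const*, 256> next{};  // all transitions start as nullptr

    [[nodiscard]] auto is_accepting() const -> bool { return false == ids.empty(); }
};

// One state of the product automaton, ordered by address so it can be stored in a std::set.
struct ToyPair {
    ToyState const* first;
    ToyState const* second;

    auto operator<(ToyPair const& rhs) const -> bool {
        std::less<ToyState const*> const less{};
        if (first == rhs.first) {
            return less(second, rhs.second);
        }
        return less(first, rhs.first);
    }
};

// Collects the ids of `root_a`'s automaton that are matched by some string which the
// automaton rooted at `root_b` also accepts.
auto intersect_ids(ToyState const* root_a, ToyState const* root_b) -> std::set<uint32_t> {
    assert(nullptr != root_a && nullptr != root_b);
    std::set<uint32_t> ids;
    std::set<ToyPair> visited;
    std::set<ToyPair> unvisited;
    unvisited.insert(ToyPair{root_a, root_b});
    while (false == unvisited.empty()) {
        auto const current = *unvisited.begin();
        unvisited.erase(unvisited.begin());
        visited.insert(current);
        // A pair is accepting only if both of its states are accepting.
        if (current.first->is_accepting() && current.second->is_accepting()) {
            ids.insert(current.first->ids.cbegin(), current.first->ids.cend());
        }
        // A pair is reachable on a byte only if both DFAs have a transition on that byte.
        for (std::size_t byte = 0; byte < 256; ++byte) {
            auto const* next_a = current.first->next[byte];
            auto const* next_b = current.second->next[byte];
            if (nullptr == next_a || nullptr == next_b) {
                continue;
            }
            ToyPair const reachable{next_a, next_b};
            if (0 == visited.count(reachable)) {
                unvisited.insert(reachable);
            }
        }
    }
    return ids;
}
```

Because a pair is added to `visited` before it is expanded, cycles in either automaton do not cause the loop to revisit a pair, so the walk terminates after at most one pass per distinct pair.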
This file was deleted.
@@ -0,0 +1,80 @@
#ifndef LOG_SURGEON_FINITE_AUTOMATA_REGEX_DFA_STATE
#define LOG_SURGEON_FINITE_AUTOMATA_REGEX_DFA_STATE

#include <cassert>
#include <cstdint>
#include <memory>
#include <tuple>
#include <type_traits>
#include <vector>

#include <log_surgeon/Constants.hpp>
#include <log_surgeon/finite_automata/RegexDFAStateType.hpp>
#include <log_surgeon/finite_automata/UnicodeIntervalTree.hpp>

namespace log_surgeon::finite_automata {
template <RegexDFAStateType state_type>
class RegexDFAState;

using RegexDFAByteState = RegexDFAState<RegexDFAStateType::Byte>;
using RegexDFAUTF8State = RegexDFAState<RegexDFAStateType::UTF8>;

template <RegexDFAStateType stateType>
class RegexDFAState {
public:
    using Tree = UnicodeIntervalTree<RegexDFAState<stateType>*>;

    RegexDFAState() {
        std::fill(std::begin(m_bytes_transition), std::end(m_bytes_transition), nullptr);
    }

    auto add_matching_variable_id(uint32_t const variable_id) -> void {
        m_matching_variable_ids.push_back(variable_id);
    }

    [[nodiscard]] auto get_matching_variable_ids() const -> std::vector<uint32_t> const& {
        return m_matching_variable_ids;
    }

    [[nodiscard]] auto is_accepting() const -> bool {
        return false == m_matching_variable_ids.empty();
    }

    auto add_byte_transition(uint8_t const& byte, RegexDFAState<stateType>* dest_state) -> void {
        m_bytes_transition[byte] = dest_state;
    }

    /**
     * @param character The character (byte or utf8) to transition on.
     * @return A pointer to the DFA state reached after transitioning on `character`.
     */
    [[nodiscard]] auto next(uint32_t character) const -> RegexDFAState<stateType>*;

private:
    std::vector<uint32_t> m_matching_variable_ids;
    RegexDFAState<stateType>* m_bytes_transition[cSizeOfByte];
    // NOTE: We don't need m_tree_transitions for the `stateType == RegexDFAStateType::Byte` case,
    // so we use an empty class (`std::tuple<>`) in that case.
    std::conditional_t<stateType == RegexDFAStateType::UTF8, Tree, std::tuple<>> m_tree_transitions;
};

template <RegexDFAStateType stateType>
auto RegexDFAState<stateType>::next(uint32_t character) const -> RegexDFAState<stateType>* {
    if constexpr (RegexDFAStateType::Byte == stateType) {
        return m_bytes_transition[character];
    } else {
        if (character < cSizeOfByte) {
            return m_bytes_transition[character];
        }
        std::unique_ptr<std::vector<typename Tree::Data>> result
                = m_tree_transitions.find(Interval(character, character));
        assert(result->size() <= 1);
        if (false == result->empty()) {
            return result->front().m_value;
        }
        return nullptr;
    }
}
} // namespace log_surgeon::finite_automata

#endif // LOG_SURGEON_FINITE_AUTOMATA_REGEX_DFA_STATE
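
The header above combines two compile-time techniques: `std::conditional_t` swaps the interval tree for an empty `std::tuple<>` in byte states, and `if constexpr` selects the lookup path in `next()`. A minimal sketch of the same pattern follows, under stated assumptions: `ToyKind` and `ToyNode` are hypothetical names, and a `std::map` stands in for `UnicodeIntervalTree`, whose interface is not shown in this diff. The sketch includes `<algorithm>` and `<iterator>` for `std::fill`, `std::begin`, and `std::end`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <tuple>
#include <type_traits>

enum class ToyKind { Byte, Wide };

// Hypothetical state type; only the Wide specialization carries a real lookup table for
// characters outside the byte range.
template <ToyKind kind>
class ToyNode {
public:
    ToyNode() { std::fill(std::begin(m_byte_next), std::end(m_byte_next), nullptr); }

    auto add_byte(uint8_t byte, ToyNode* dest) -> void {
        assert(nullptr != dest);
        m_byte_next[byte] = dest;
    }

    auto add_wide(uint32_t code_point, ToyNode* dest) -> void {
        // Checked only when this member is instantiated, i.e. when it is actually called.
        static_assert(ToyKind::Wide == kind, "wide transitions need ToyKind::Wide");
        assert(nullptr != dest);
        m_wide_next[code_point] = dest;
    }

    [[nodiscard]] auto next(uint32_t character) const -> ToyNode* {
        if (character < cNumBytes) {
            return m_byte_next[character];
        }
        if constexpr (ToyKind::Wide == kind) {
            auto const it = m_wide_next.find(character);
            return m_wide_next.end() == it ? nullptr : it->second;
        } else {
            // Byte nodes have no table for larger characters.
            return nullptr;
        }
    }

private:
    static constexpr uint32_t cNumBytes = 256;

    ToyNode* m_byte_next[cNumBytes];
    // An empty tuple replaces the map when the node only handles bytes.
    std::conditional_t<ToyKind::Wide == kind, std::map<uint32_t, ToyNode*>, std::tuple<>>
            m_wide_next;
};
```

One deliberate difference from the header: the sketch range-checks `character` before indexing the byte table for both kinds, whereas the `Byte` branch of `RegexDFAState::next` indexes `m_bytes_transition` directly and so appears to rely on callers passing a value below `cSizeOfByte`.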
Sorry to miss this in previous refactor PRs: I think we should name macros to exactly match the file name, so this header should be `LOG_SURGEON_FINITE_AUTOMATA_REGEXDFASTATE` instead. We can create an issue to keep track of this and fix them all together later.

kk sounds good, I'll create the issue. I was previously separating it on capitalization, e.g. `log_surgeon/finite_automata/DfaState` would use `#ifndef LOG_SURGEON_FINITE_AUTOMATA_DFA_STATE` as the correct `snake_case` naming for the separate words (as we're combining `snake_case` folder names and `CamelCase` file names).

Issue created.
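
The two guard-naming conventions discussed in this thread can be compared with a small helper. This is only an illustrative sketch: `make_guard` is a hypothetical function, not part of log-surgeon, and the word-splitting rule (break before a capital that follows a lowercase letter or digit, or that ends a run of capitals) is an assumption chosen to reproduce the examples in the comments.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Builds an include-guard macro name from a header path given without its extension.
// split_words == false: the file name is upper-cased as a single token, matching the file
// name exactly (the reviewer's suggestion).
// split_words == true: an underscore is inserted at each CamelCase word boundary (the
// convention the author describes having used).
auto make_guard(std::string_view path, bool split_words) -> std::string {
    assert(false == path.empty());
    std::string guard;
    for (std::size_t i = 0; i < path.size(); ++i) {
        auto const curr = static_cast<unsigned char>(path[i]);
        if ('/' == curr) {
            guard += '_';
            continue;
        }
        if (split_words && i > 0 && 0 != std::isupper(curr)) {
            auto const prev = static_cast<unsigned char>(path[i - 1]);
            bool const next_is_lower
                    = i + 1 < path.size()
                      && 0 != std::islower(static_cast<unsigned char>(path[i + 1]));
            bool const after_lower_or_digit = 0 != std::islower(prev) || 0 != std::isdigit(prev);
            // E.g. the 'S' in "DFAState": it follows a capital and starts a new word.
            bool const ends_acronym = 0 != std::isupper(prev) && next_is_lower;
            if (after_lower_or_digit || ends_acronym) {
                guard += '_';
            }
        }
        guard += static_cast<char>(std::toupper(curr));
    }
    return guard;
}
```

For `log_surgeon/finite_automata/RegexDFAState`, the exact-match convention yields `LOG_SURGEON_FINITE_AUTOMATA_REGEXDFASTATE`, while splitting on capitalization yields `LOG_SURGEON_FINITE_AUTOMATA_REGEX_DFA_STATE`, which is the guard used in the header in this PR.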