Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Remove redundant this when referring to member variables. #60

Open
wants to merge 419 commits into
base: main
Choose a base branch
from

Conversation

SharafMohamed
Copy link
Contributor

@SharafMohamed SharafMohamed commented Dec 5, 2024

References

  • Depends on PR#59.
  • To review in parallel with PR#59, diff against PR#59 locally. In the repo run:
git fetch upstream pull/59/head:pr-59
git fetch upstream pull/60/head:pr-60
git diff pr-59 pr-60

Description

Old standard in this repository was using this->var to refer to member variables in classes. We switched the standard to use m_var. This PR updates previous code to meet this standard.

  • using Parser<TypedNfaState, TypedDfaState>::m_lexer; added to the top of Lalr1Parser variables in the Private section. This allows this->m_lexer of the Parser parent class to be used as m_lexer in Lalr1Parser.

Validation performed

Previously existing tests still succeed.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced the PrefixTree class for managing prefix tree structures in register data.
    • Added the RegisterHandler class for managing registers using a prefix tree.
    • Implemented the Dfa and Nfa classes for deterministic and non-deterministic finite automata.
  • Improvements

    • Updated naming conventions across the codebase for clarity and consistency, including renaming RegexNFA to Nfa and RegexDFA to Dfa.
    • Enhanced error handling in various classes to ensure robustness.
  • Testing

    • Added unit tests for PrefixTree and RegisterHandler to validate functionality and error handling.
    • Expanded test coverage for finite automata implementations.

These changes enhance the application's structure, improve readability, and ensure that new features are thoroughly tested.

SharafMohamed and others added 30 commits November 1, 2024 06:40
…eterministic. This is fine as tags are unique across disjoint subtrees.
…ypedefs to the top of the file to fix compilation error.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (9)
src/log_surgeon/Lalr1Parser.tpp (5)

94-96: Simplify the add_token_chain function by refactoring

In lines 94-96 of the add_token_chain function, constructing the rule_chain manually for each character can be error-prone and hard to maintain. Consider refactoring the code to build the chain using a loop, which will make it more scalable and readable.

Here's an example of how you might refactor the code:

std::unique_ptr<finite_automata::RegexAST<TypedNfaState>> rule_chain =
    std::make_unique<finite_automata::RegexASTLiteral<TypedNfaState>>(chain[0]);

for (uint32_t i = 1; i < chain.size(); i++) {
    std::unique_ptr<finite_automata::RegexASTLiteral<TypedNfaState>> next_char_rule =
        std::make_unique<finite_automata::RegexASTLiteral<TypedNfaState>>(chain[i]);
    rule_chain = std::make_unique<finite_automata::RegexASTCat<TypedNfaState>>(
        std::move(rule_chain), std::move(next_char_rule)
    );
}

196-196: Use false == instead of ! according to coding guidelines

In line 196, you have if (!item_set_ptr->m_closure.insert(*item).second). As per the coding guidelines, prefer false == <expression> rather than !<expression>.

Apply this diff:

-    if (!item_set_ptr->m_closure.insert(*item).second) {
+    if (false == item_set_ptr->m_closure.insert(*item).second) {

Line range hint 223-223: Replace this->m_lexer with m_lexer to follow the new standard

In line 223, you use this->m_lexer.m_symbol_id. Replace this->m_lexer with m_lexer to align with the member variable naming convention.

Apply this diff:

-    if (this->m_lexer.m_symbol_id.find(head) == this->m_lexer.m_symbol_id.end()) {
+    if (m_lexer.m_symbol_id.find(head) == m_lexer.m_symbol_id.end()) {

633-633: Prefer false == accept over !accept

In line 633, you have if (!accept). According to the coding guidelines, prefer using false == accept instead.

Apply this diff:

-    if (!accept) {
+    if (false == accept) {

698-698: Use false == is_accepting instead of !is_accepting

In line 698, within the lambda function, replace if (!is_accepting) with if (false == is_accepting) as per the coding guidelines.

Apply this diff:

-            if (!is_accepting) {
+            if (false == is_accepting) {
src/log_surgeon/Lalr1Parser.cpp (1)

11-14: Update member variable initialization to match new naming conventions

In lines 11-14, in the NonTerminal constructor, you can remove unnecessary qualifiers and adhere to the new m_ member variable notation.

Apply this diff:

-NonTerminal::NonTerminal(Production* p)
-        : m_children_start(m_next_children_start),
-          m_production(p),
-          m_ast(nullptr) {
+NonTerminal::NonTerminal(Production* p)
+        : m_children_start(m_next_children_start),
+          m_production(p),
+          m_ast(nullptr) {
     m_next_children_start += p->m_body.size();
 }
src/log_surgeon/finite_automata/Dfa.hpp (1)

54-71: Provide clarification on UTF-8 handling in get_intersect method

In lines 54-71, the get_intersect method mentions handling UTF-8 transitions but currently only supports byte transitions. Consider implementing UTF-8 handling or clarifying the comment.

Apply this diff to update the TODO comment:

-    // TODO: Handle UTF-8 (multi-byte transitions) as well
+    // Note: Currently, this method handles only byte transitions. UTF-8 support is not implemented yet.
src/log_surgeon/finite_automata/Nfa.hpp (2)

151-151: Consider using a more descriptive lambda parameter name.

The lambda parameter name dest_state could be more descriptive, such as target_state or next_state, to better convey its role in the BFS traversal.

-            = [&state_queue, &visited_states](TypedNfaState const* dest_state) {
+            = [&state_queue, &visited_states](TypedNfaState const* target_state) {
-                  if (visited_states.insert(dest_state).second) {
-                      state_queue.push(dest_state);
+                  if (visited_states.insert(target_state).second) {
+                      state_queue.push(target_state);
                   }
              };

Line range hint 165-167: TODO comment needs attention.

The TODO comment about handling the UTF8 case should be addressed to ensure proper support for UTF-8 characters.

Would you like me to help implement the UTF-8 case handling or create a GitHub issue to track this task?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a346104 and ae64f64.

📒 Files selected for processing (27)
  • CMakeLists.txt (2 hunks)
  • examples/intersect-test.cpp (4 hunks)
  • src/log_surgeon/BufferParser.hpp (2 hunks)
  • src/log_surgeon/Lalr1Parser.cpp (2 hunks)
  • src/log_surgeon/Lalr1Parser.hpp (7 hunks)
  • src/log_surgeon/Lalr1Parser.tpp (20 hunks)
  • src/log_surgeon/Lexer.hpp (6 hunks)
  • src/log_surgeon/Lexer.tpp (18 hunks)
  • src/log_surgeon/LexicalRule.hpp (2 hunks)
  • src/log_surgeon/LogParser.cpp (5 hunks)
  • src/log_surgeon/LogParser.hpp (4 hunks)
  • src/log_surgeon/Parser.hpp (1 hunks)
  • src/log_surgeon/Parser.tpp (3 hunks)
  • src/log_surgeon/ReaderParser.hpp (2 hunks)
  • src/log_surgeon/SchemaParser.cpp (1 hunks)
  • src/log_surgeon/SchemaParser.hpp (4 hunks)
  • src/log_surgeon/finite_automata/Dfa.hpp (1 hunks)
  • src/log_surgeon/finite_automata/DfaState.hpp (1 hunks)
  • src/log_surgeon/finite_automata/DfaStatePair.hpp (1 hunks)
  • src/log_surgeon/finite_automata/DfaStateType.hpp (1 hunks)
  • src/log_surgeon/finite_automata/Nfa.hpp (6 hunks)
  • src/log_surgeon/finite_automata/NfaState.hpp (6 hunks)
  • src/log_surgeon/finite_automata/RegexAST.hpp (29 hunks)
  • src/log_surgeon/finite_automata/TaggedTransition.hpp (5 hunks)
  • tests/CMakeLists.txt (1 hunks)
  • tests/test-NFA.cpp (1 hunks)
  • tests/test-lexer.cpp (2 hunks)
✅ Files skipped from review due to trivial changes (2)
  • src/log_surgeon/ReaderParser.hpp
  • src/log_surgeon/BufferParser.hpp
🚧 Files skipped from review as they are similar to previous changes (15)
  • src/log_surgeon/finite_automata/DfaStateType.hpp
  • src/log_surgeon/LogParser.hpp
  • src/log_surgeon/Parser.tpp
  • src/log_surgeon/finite_automata/DfaStatePair.hpp
  • tests/CMakeLists.txt
  • examples/intersect-test.cpp
  • src/log_surgeon/Parser.hpp
  • CMakeLists.txt
  • src/log_surgeon/LexicalRule.hpp
  • src/log_surgeon/finite_automata/TaggedTransition.hpp
  • tests/test-NFA.cpp
  • src/log_surgeon/LogParser.cpp
  • tests/test-lexer.cpp
  • src/log_surgeon/Lexer.tpp
  • src/log_surgeon/finite_automata/RegexAST.hpp
🧰 Additional context used
📓 Path-based instructions (9)
src/log_surgeon/Lalr1Parser.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/log_surgeon/finite_automata/Dfa.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/log_surgeon/finite_automata/DfaState.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/log_surgeon/SchemaParser.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/log_surgeon/Lalr1Parser.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/log_surgeon/Lexer.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/log_surgeon/finite_automata/NfaState.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/log_surgeon/SchemaParser.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/log_surgeon/finite_automata/Nfa.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (2)
src/log_surgeon/finite_automata/NfaState.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/Nfa.hpp (4)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:99-105
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/RegexNFA.hpp`, it's acceptable to have constructors without the `explicit` specifier. Do not suggest adding `explicit` to constructors in this file.
🔇 Additional comments (16)
src/log_surgeon/Lalr1Parser.tpp (3)

73-73: 🛠️ Refactor suggestion

Replace this->m_lexer with m_lexer to adhere to the new member variable convention

In line 73, you use this->m_lexer.m_symbol_id[name]. As per the new standard of using the m_ prefix for member variables, please replace this->m_lexer with m_lexer.

Apply this diff:

-    m_terminals.insert(this->m_lexer.m_symbol_id[name]);
+    m_terminals.insert(m_lexer.m_symbol_id[name]);

Likely invalid or redundant comment.


667-668: 🛠️ Refactor suggestion

Change this->m_lexer to m_lexer when calling scan

In line 668, replace this->m_lexer.scan(...) with m_lexer.scan(...) to follow the updated convention.

Apply this diff:

-    if (ErrorCode error = this->m_lexer.scan(m_input_buffer, token);
+    if (ErrorCode error = m_lexer.scan(m_input_buffer, token);

Likely invalid or redundant comment.


654-655: 🛠️ Refactor suggestion

Update this->m_lexer to m_lexer in the reset method

In line 655, you have this->m_lexer.reset();. Please change it to m_lexer.reset(); to adhere to the new member variable notation.

Apply this diff:

-    this->m_lexer.reset();
+    m_lexer.reset();

Likely invalid or redundant comment.

src/log_surgeon/finite_automata/DfaState.hpp (1)

62-63: ⚠️ Potential issue

Add bounds checking in the next method to prevent out-of-bounds access

When accessing m_bytes_transition[character] in line 63, it's important to ensure that character is less than cSizeOfByte to prevent out-of-bounds access.

Refer to the previous review comment addressing this issue.

src/log_surgeon/SchemaParser.hpp (1)

49-49: LGTM! Type updates are consistent.

The changes align with the broader refactoring effort to standardize type names across the codebase. The updates to the constructor parameter, member variable type, and class inheritance maintain consistency with the new naming convention.

Also applies to: 58-58, 71-72

src/log_surgeon/Lexer.hpp (2)

14-16: LGTM! Include paths updated correctly.

The include paths have been updated to reflect the new file names, maintaining consistency with the refactoring changes.


23-23: LGTM! Type updates are comprehensive and consistent.

The changes systematically update all references from the old naming convention to the new one:

  • Template parameters from NFAStateType/DFAStateType to TypedNfaState/TypedDfaState
  • Method signatures and return types
  • Member variable types
  • Type aliases in the lexers namespace

Also applies to: 32-36, 49-52, 59-59, 129-129, 159-159, 162-162, 164-164, 168-169

src/log_surgeon/finite_automata/Nfa.hpp (2)

1-2: LGTM! File-level updates are correct.

The header guards and include paths have been properly updated to reflect the new file name.

Also applies to: 16-16


20-21: LGTM! Class definition updates are consistent.

The class has been renamed from RegexNFA to Nfa, and the template parameter has been updated to TypedNfaState, maintaining consistency with the new naming convention.

Also applies to: 23-25

src/log_surgeon/finite_automata/NfaState.hpp (3)

130-130: Well-structured constant comparison!

The condition NfaStateType::Utf8 == state_type follows the coding guideline of placing the constant on the left side of the comparison, which prevents accidental assignments.


Line range hint 191-191: Consistent coding style in boolean comparisons.

The expression false == optional_serialized_positive_start_transition.has_value() follows the coding guideline of using explicit comparisons with boolean literals on the left.


110-114: Well-structured member variable declarations.

The member variables consistently use the m_ prefix and are logically grouped by their purpose (transitions, states, etc.).

src/log_surgeon/Lalr1Parser.hpp (2)

161-161: Clean and concise method implementations.

The methods have been simplified by removing redundant this-> qualifiers while maintaining clarity:

  • has_dot_at_end() uses a clear comparison
  • next_symbol() directly accesses member variables

Also applies to: 168-168


203-204: Well-structured template parameters.

The template parameters have been renamed from NFAStateType and DFAStateType to more descriptive TypedNfaState and TypedDfaState, improving code readability.

src/log_surgeon/SchemaParser.cpp (2)

19-37: Consistent type alias updates.

The type aliases have been updated to use ByteNfaState consistently throughout the file, improving code clarity and maintainability. The changes align well with the new naming conventions established in the NFA implementation.


14-14: Updated include statement.

The include statement has been updated to use the new filename Lalr1Parser.hpp, maintaining consistency with the renamed file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant