Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add support for log-level filtering of structured IR streams. #35

Open
wants to merge 50 commits into
base: main
Choose a base branch
from

Conversation

davemarco
Copy link
Contributor

@davemarco davemarco commented Nov 11, 2024

Description

Adds log level filtering for structured reader.

Design is similiar to junhao prototype. We retrieve log level and timestamp node id from the new node ir unit as the stream is being deserialized. Then we use the cached node id to retrieve values from the logs.

Difference from prototype is that parsing is done in deserialization stage.

I also added support for both string and int log levels.

Validation performed

I tested on one log file with a single int log level..."test-irv2"

@junhaoliao, if we do need string log level, is there a test file for it?

Summary by CodeRabbit

  • New Features

    • Introduced new log event types and enhanced filtering capabilities.
    • Added a new StructuredIrUnitHandler for improved log event processing.
    • Updated ReaderOptions structure to include logLevelKey.
  • Bug Fixes

    • Enhanced error handling for version validation in stream creation.
  • Refactor

    • Simplified filter_log_events and decode_range methods for better maintainability.
    • Removed the IrUnitHandler class to streamline log event handling.
    • Cleaned up unused code and header files.
  • Chores

    • Updated project version to 0.4.0 and added new source files in the build configuration.

Copy link

coderabbitai bot commented Nov 11, 2024

Walkthrough

The changes in this pull request involve multiple updates across several files, primarily focusing on the StreamReader, StructuredIrStreamReader, and log event handling classes. Key modifications include the introduction of new enumerators and type aliases, updates to the LogLevel enumeration, and enhancements in error handling and filtering mechanisms. The CMakeLists.txt file has also been updated to reflect a new project version and the inclusion of additional source files. Overall, these changes aim to improve the functionality and maintainability of the logging framework.

Changes

File Change Summary
src/clp_ffi_js/constants.hpp - Added enumerator LENGTH to LogLevel enum.
- Updated cLogLevelNames to use LogLevel::LENGTH for size.
src/clp_ffi_js/ir/LogEventWithFilterData.hpp - Added type alias StructuredLogEvent as clp::ffi::KeyValuePairLogEvent.
- Updated constraints to allow StructuredLogEvent in LogEventWithFilterData.
src/clp_ffi_js/ir/StreamReader.hpp - Added type alias FilteredLogEventsMap as std::optional<std::vector<size_t>>.
- Updated create method signature to include reader_options parameter.
- Added methods generic_filter_log_events and generic_decode_range.
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp - Changed deserialized_log_events type to std::shared_ptr<StructuredLogEvents>.
- Updated methods to process the new log event type.
src/clp_ffi_js/ir/StructuredIrStreamReader.hpp - Removed IrUnitHandler class.
- Updated StructuredIrDeserializer to use StructuredIrUnitHandler.
- Changed constructor and member variable types to StructuredLogEvents.
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp - Added parse_log_level function for log level conversion.
- Implemented methods for handling log events and schema tree nodes.
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp - Introduced StructuredIrUnitHandler class with methods for log event processing.
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp - Refactored filter_log_events and decode_range methods to use StreamReader methods.
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp - Removed FilteredLogEventsMap type alias.
- Added UnstructuredLogEvents type alias.
CMakeLists.txt - Updated project version from 0.3.0 to 0.4.0.
- Added new source file StructuredIrUnitHandler.cpp.
src/clp_ffi_js/ir/StreamReader.cpp - Updated ReaderOptions type registration to include logLevelKey.
- Enhanced error handling in create method.

Possibly related PRs

Suggested reviewers

  • junhaoliao
  • kirkrodrigues

Tip

CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command @coderabbitai generate docstrings to have CodeRabbit automatically generate docstrings for your pull request. This feature will be included in our Pro Plan when released.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Experiment)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@davemarco davemarco marked this pull request as draft November 11, 2024 04:02
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (16)
src/clp_ffi_js/constants.hpp (1)

22-22: Consider adding range validation helpers.

To make the valid range more useful, consider adding constexpr helper functions to validate log levels.

Here's a suggested addition:

constexpr bool IsValidLogLevel(LogLevel level) {
    return level >= cValidLogLevelsBeginIdx && level <= cValidLogLevelsEndIdx;
}
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)

Line range hint 24-28: Consider updating class documentation

Since this class now handles filtering capabilities through LogEventWithFilterData, it would be beneficial to update the class documentation to reflect these changes and document the filtering behaviour.

src/clp_ffi_js/ir/StreamReader.hpp (1)

26-31: Consider adding validation methods for the mapping.

Since this type is central to the filtering mechanism, consider adding utility methods to:

  • Validate index bounds
  • Handle mapping updates when log events are modified
  • Provide iterator access for efficient traversal

Example interface additions:

class StreamReader {
    // ... existing code ...

    /**
     * Validates that all indices in the mapping are within bounds
     * @return true if all indices are valid
     */
    [[nodiscard]] virtual bool validate_mapping() const = 0;

    /**
     * Updates the mapping after log event modifications
     * @param old_index The original index
     * @param new_index The new index
     * @return true if update was successful
     */
    virtual bool update_mapping(size_t old_index, size_t new_index) = 0;
};
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1)

130-135: Consider moving the log_event parameter.

The implementation is correct, but there's a potential performance optimization opportunity.

Consider this minor optimization:

-        auto log_event_with_filter_data{LogEventWithFilterData<UnstructuredLogEvent>(
-                log_event,
-                log_level,
-                log_event.get_timestamp()
-        )};
+        auto log_event_with_filter_data{LogEventWithFilterData<UnstructuredLogEvent>(
+                std::move(log_event),
+                log_level,
+                log_event.get_timestamp()
+        )};
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (2)

97-103: Correct the @return description in get_log_level documentation

The @return comment for get_log_level incorrectly mentions "Timestamp" instead of "Log level".

Apply this diff to fix the documentation:

 /**
  * @param id_value_pairs
- * @return Timestamp from `StructuredLogEvent`
+ * @return Log level from `StructuredLogEvent`
  */

120-123: Address the TODO regarding gsl::not_null usage

The TODO suggests replacing std::shared_ptr with gsl::not_null once gsl is added to the project to reflect that the pointer should not be null.

Would you like assistance in implementing this change, or should we create a GitHub issue to track this enhancement?

src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (4)

144-144: Implement parsing for timestamp values of string type

The TODO comment indicates that support for parsing timestamp values of string type is pending. Implementing this will enhance the flexibility of timestamp handling.

Would you like assistance in developing the parsing logic for timestamp strings?


110-112: Handle unrecognized log level names after parsing

After parsing the log level string using parse_log_level, if the result is LogLevel::NONE, it implies an unrecognized log level. Consider logging a warning or handling this case to alert about unexpected log level values.


54-55: Log a message when log level parsing fails

When parse_log_level does not find a matching log level name, it currently returns LogLevel::NONE silently. Consider logging an informative message to aid in debugging issues related to unrecognized log level strings.


38-44: Be cautious with character case conversion and locale

The use of std::toupper in converting the log level string to uppercase may not handle non-ASCII characters correctly. If internationalization is a concern, consider using locale-aware functions or specifying the locale explicitly to ensure proper case conversion.

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (1)

Line range hint 135-138: Inconsistent handling of log level filtering in decode_range

Despite adding filtering capabilities elsewhere, decode_range still logs an error stating that log level filtering is not supported. Consider updating decode_range to support filtering or modifying the error message to reflect current functionality.

Apply this diff to enable filtering in decode_range:

-    if (use_filter) {
-        SPDLOG_ERROR(cLogLevelFilteringNotSupportedErrorMsg);
-        return DecodedResultsTsType{emscripten::val::null()};
-    }
+    if (use_filter) {
+        // Apply filtering logic here
+        // For example, adjust the range based on filtered indices
+        // ... (implementation details)
+    }
src/clp_ffi_js/ir/LogEventWithFilterData.hpp (5)

46-47: Re-evaluate the need to delete copy constructor and copy assignment operator

Deleting the copy constructor and copy assignment operator restricts the ability to copy LogEventWithFilterData instances. If copying of this class is acceptable and safe, consider allowing the default copy operations. If there are specific reasons to prevent copying (e.g., to avoid accidental copying of heavy resources), ensure that this aligns with the intended use of the class.


50-51: Explicitly defaulted move operations may be redundant

The move constructor and move assignment operator are explicitly defaulted on lines 50-51. Since these would be implicitly generated by the compiler when copy operations are deleted, you may omit these explicit defaults unless there is a specific need to declare them.


53-54: Destructor can be omitted

The destructor is explicitly defaulted on lines 53-54. If there is no custom destructor logic required, you can omit this declaration and allow the compiler to generate the destructor implicitly.


14-17: Consider centralizing type aliases for log events

The type aliases UnstructuredLogEvent and StructuredLogEvent are defined here. If these aliases are used across multiple files, consider moving them to a common header file to improve maintainability and reduce duplication.


1-2: Header guard naming convention

The header guard CLP_FFI_JS_IR_LOGEVENTWITHFILTERDATA_HPP lacks underscores between words, which may reduce readability. Consider renaming it to CLP_FFI_JS_IR_LOG_EVENT_WITH_FILTER_DATA_HPP to align with common naming conventions and enhance clarity.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 450e8f8 and be0e5ba.

📒 Files selected for processing (10)
  • src/clp_ffi_js/constants.hpp (1 hunks)
  • src/clp_ffi_js/ir/LogEventWithFilterData.hpp (1 hunks)
  • src/clp_ffi_js/ir/LogEventWithLevel.hpp (0 hunks)
  • src/clp_ffi_js/ir/StreamReader.hpp (1 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (5 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (2 hunks)
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (1 hunks)
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (1 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (5 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (2 hunks)
💤 Files with no reviewable changes (1)
  • src/clp_ffi_js/ir/LogEventWithLevel.hpp
🧰 Additional context used
📓 Learnings (3)
src/clp_ffi_js/ir/LogEventWithFilterData.hpp (1)
Learnt from: davemarco
PR: y-scope/clp-ffi-js#32
File: src/clp_ffi_js/ir/LogEventWithFilterData.hpp:0-0
Timestamp: 2024-11-10T02:14:19.411Z
Learning: In the context of `LogEventWithFilterData`, it is acceptable for `log_level` to be `LogLevel::NONE`; additional validation against `LogLevel::NONE` is not necessary.
src/clp_ffi_js/ir/StreamReader.hpp (1)
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1)
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.
🔇 Additional comments (24)
src/clp_ffi_js/constants.hpp (2)

22-22: LGTM! The constant addition is well-placed and properly typed.

The new constant cValidLogLevelsEndIdx pairs nicely with the existing cValidLogLevelsBeginIdx to define the valid range of log levels.


22-22: Verify the constant usage across the codebase.

Since this PR is implementing filtering for structured reader, let's verify how this constant is being used.

✅ Verification successful

Let me gather more information about the usage context.


The constant cValidLogLevelsEndIdx is properly used for log level validation

The constant is used correctly in StructuredIrUnitHandler.cpp to validate log levels during deserialization, ensuring they fall within the valid range (TRACE to FATAL). The implementation properly converts and validates the log level before using it in the LogEventWithFilterData structure.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check usage of the new constant and related filtering logic

# Search for references to the new constant
rg -l "cValidLogLevelsEndIdx"

# Look for potential filter implementations using log levels
ast-grep --pattern 'class $_ {
  $$$
  LogLevel $$$
  $$$
}'

Length of output: 167


Script:

#!/bin/bash
# Check the implementation details in StructuredIrUnitHandler.cpp
rg -A 5 "cValidLogLevelsEndIdx" src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp

# Look for other log level related constants and their usage
rg "LogLevel::" src/clp_ffi_js/constants.hpp

# Check for any filter implementations
rg -A 10 "filter" src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp

Length of output: 1179

src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (2)

22-22: LGTM! Good type alias addition

The new type alias improves code readability and maintains consistency with the codebase naming conventions.


83-83: Verify the impact of type change on existing methods

The type change from LogEventWithLevel to LogEventWithFilterData for m_encoded_log_events might affect existing methods that interact with this member. Please ensure all usages are updated accordingly.

✅ Verification successful

Let me analyze the results and generate additional verification scripts to check for any potential issues.


Based on the verification results, I can now provide a final response:

Type change to LogEventWithFilterData is properly handled

The change from LogEventWithLevel to LogEventWithFilterData for m_encoded_log_events has been properly implemented across all usages:

  • All accesses to m_encoded_log_events are consistent with the new type
  • No references to the old LogEventWithLevel type remain in the codebase
  • The member is used correctly in all methods including:
    • get_num_events_buffered()
    • filter_log_events()
    • deserialize_stream()
    • decode_range()
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for methods accessing m_encoded_log_events
rg -A 3 "m_encoded_log_events" --type cpp

# Search for any remaining references to LogEventWithLevel
rg "LogEventWithLevel" --type cpp

Length of output: 3337


Script:

#!/bin/bash
# Search for the LogEventWithFilterData type definition and usage
ast-grep --pattern 'template<typename T> class LogEventWithFilterData { $$$ }'

# Search for any references to the old type pattern
ast-grep --pattern 'LogEventWithLevel<$_>'

# Look for method signatures that might be affected
rg "UnstructuredIrStreamReader::" --type cpp -A 2

Length of output: 2720

src/clp_ffi_js/ir/StreamReader.hpp (2)

26-30: Well-designed type alias with clear documentation.

The FilteredLogEventsMap type alias is well-chosen:

  • Using std::optional appropriately handles cases where no mapping exists
  • Using std::vector<size_t> provides efficient index-based mapping
  • Documentation clearly explains the purpose

26-31: Verify the impact on memory usage.

The mapping could potentially consume significant memory for large log files. Consider implementing a streaming approach or pagination mechanism for better memory efficiency.

✅ Verification successful

Let me analyze the code further to verify the memory usage concerns.


Memory usage concerns are already addressed with proper optimizations

The codebase already implements memory optimizations:

  • Pre-allocation with reserve() is used to avoid reallocation overhead, with a reasonable default size of 500,000 events
  • The StreamReader design implements streaming through deserialize_stream() and decode_range() methods
  • The FilteredLogEventsMap is an std::optional, meaning it's only allocated when filtering is actually used
  • Memory is properly released after stream exhaustion via reset(nullptr) on the stream reader context
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for large log file handling patterns
rg -A 5 "pagination|chunk|batch|stream" --type cpp

Length of output: 31871


Script:

#!/bin/bash
# Search for memory management patterns and vector reservations
rg -A 3 "reserve|capacity|memory" --type cpp

# Search for pagination or batch processing implementations
ast-grep --pattern 'class $_ {
  $$$
  decode_range($_, $_, $_) {
    $$$
  }
  $$$
}'

Length of output: 3627

src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (3)

26-26: LGTM! Include changes are appropriate.

The addition of LogEventWithFilterData header and reorganization of includes align well with the new implementation.

Also applies to: 29-29


75-75: LGTM! Good simplification of filtering logic.

The delegation to filter_deserialized_events improves code organization and maintainability. As per previous learnings from PR #26, TypeScript typing ensures safe log level filter inputs.


Line range hint 169-193: LGTM! Proper adaptation to the new data structure.

The changes correctly handle the LogEventWithFilterData structure while maintaining robust error handling and proper timestamp formatting.

src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (6)

14-14: Include of LogEventWithFilterData.hpp is appropriate.

The inclusion of LogEventWithFilterData.hpp is necessary for handling log events with filtering capabilities.


17-17: Include of StructuredIrUnitHandler.hpp is appropriate.

Including StructuredIrUnitHandler.hpp ensures access to the new unit handler for structured IR units.


21-21: Type alias StructuredIrDeserializer is correctly defined.

Defining StructuredIrDeserializer simplifies references to the deserializer specialized with StructuredIrUnitHandler.


81-82: Constructor parameter updated to handle filtered log events.

The constructor now accepts a std::shared_ptr to a vector of LogEventWithFilterData<StructuredLogEvent>, aligning with the new filtering mechanism.


86-87: Member variable m_deserialized_log_events updated appropriately.

Updating m_deserialized_log_events to store log events with filter data matches the changes in the constructor and supports the enhanced processing of log events.


89-89: Addition of m_filtered_log_event_map supports new filtering features.

Introducing m_filtered_log_event_map provides the necessary mapping for filtered log events, which is essential for the filtering functionality.

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (9)

23-27: Includes updated to support new filtering functionality

The necessary headers have been appropriately included to support the new filtering capabilities.


35-36: Added constants for reader options keys

The constants cReaderOptionsLogLevelKey and cReaderOptionsTimestampKey are correctly defined for accessing reader options.


44-46: Initialized deserialized_log_events with updated data structure

The shared pointer to the vector of LogEventWithFilterData<StructuredLogEvent> is properly initialized.


81-85: Properly handling optional m_filtered_log_event_map

The method correctly checks if m_filtered_log_event_map has a value before attempting to access it, ensuring safe operation.


89-93: Simplified call to filter_deserialized_events

The filter_log_events method now directly calls filter_deserialized_events with the appropriate parameters, enhancing code clarity.


147-148: Accessing log events using updated data structure

The code correctly retrieves the log event from log_event_with_filter_data using the new data structure.


150-150: Serializing structured log event to JSON

The serialize_to_json method is appropriately called on the structured log event.


167-168: Passing timestamp and log level to JavaScript context

The timestamp and log level are correctly obtained and passed to the JavaScript context via EM_ASM.


178-179: Constructor updated with new log event structure

The constructor now accepts deserialized_log_events as a shared pointer to a vector of LogEventWithFilterData<StructuredLogEvent>, aligning with the updated data structures.

Comment on lines 9 to 17
#include <clp/ir/LogEvent.hpp>
#include <clp/ir/LogEventDeserializer.hpp>
#include <clp/ir/types.hpp>
#include <clp/TimestampPattern.hpp>
#include <emscripten/val.h>

#include <clp_ffi_js/ir/LogEventWithLevel.hpp>
#include <clp_ffi_js/ir/LogEventWithFilterData.hpp>
#include <clp_ffi_js/ir/StreamReader.hpp>
#include <clp_ffi_js/ir/StreamReaderDataContext.hpp>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Restore the <optional> header inclusion

The <optional> header appears to be still required as the class member m_filtered_log_event_map is of type FilteredLogEventsMap which likely uses std::optional. Its removal might cause compilation errors.

+#include <optional>
 #include <clp/ir/LogEvent.hpp>
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#include <clp/ir/LogEvent.hpp>
#include <clp/ir/LogEventDeserializer.hpp>
#include <clp/ir/types.hpp>
#include <clp/TimestampPattern.hpp>
#include <emscripten/val.h>
#include <clp_ffi_js/ir/LogEventWithLevel.hpp>
#include <clp_ffi_js/ir/LogEventWithFilterData.hpp>
#include <clp_ffi_js/ir/StreamReader.hpp>
#include <clp_ffi_js/ir/StreamReaderDataContext.hpp>
#include <optional>
#include <clp/ir/LogEvent.hpp>
#include <clp/ir/LogEventDeserializer.hpp>
#include <clp/ir/types.hpp>
#include <clp/TimestampPattern.hpp>
#include <emscripten/val.h>
#include <clp_ffi_js/ir/LogEventWithFilterData.hpp>
#include <clp_ffi_js/ir/StreamReader.hpp>
#include <clp_ffi_js/ir/StreamReaderDataContext.hpp>

};
} // namespace clp_ffi_js::ir

#endif // CLP_FFI_JS_IR_STRUCTUREDIRSTREAMREADER_HPP
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix the inconsistent header guard macro

The header guard macro at the end of the file does not match the one at the beginning, which may lead to compilation issues.

Apply this diff to correct the header guard:

-#endif  // CLP_FFI_JS_IR_STRUCTUREDIRSTREAMREADER_HPP
+#endif  // CLP_FFI_JS_IR_STRUCTUREDIRUNITHANDLER_HPP
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#endif // CLP_FFI_JS_IR_STRUCTUREDIRSTREAMREADER_HPP
#endif // CLP_FFI_JS_IR_STRUCTUREDIRUNITHANDLER_HPP

Comment on lines 115 to 117
if (log_level_node_value <= (clp::enum_to_underlying_type(cValidLogLevelsEndIdx))) {
log_level = static_cast<LogLevel>(log_level_node_value);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Validate lower bound when checking integer log level values

In the get_log_level method, when validating the integer value of the log level, the code only checks if log_level_node_value <= (clp::enum_to_underlying_type(cValidLogLevelsEndIdx)). Please add a lower bound check to ensure the value falls within the valid range.

Apply this diff to add the lower bound check:

-    if (log_level_node_value <= (clp::enum_to_underlying_type(cValidLogLevelsEndIdx))) {
+    if (log_level_node_value >= clp::enum_to_underlying_type(cValidLogLevelsBeginIdx) &&
+        log_level_node_value <= clp::enum_to_underlying_type(cValidLogLevelsEndIdx)) {
         log_level = static_cast<LogLevel>(log_level_node_value);
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (log_level_node_value <= (clp::enum_to_underlying_type(cValidLogLevelsEndIdx))) {
log_level = static_cast<LogLevel>(log_level_node_value);
}
if (log_level_node_value >= clp::enum_to_underlying_type(cValidLogLevelsBeginIdx) &&
log_level_node_value <= clp::enum_to_underlying_type(cValidLogLevelsEndIdx)) {
log_level = static_cast<LogLevel>(log_level_node_value);
}

Comment on lines +51 to 52
reader_options[cReaderOptionsLogLevelKey.data()].as<std::string>(),
reader_options[cReaderOptionsTimestampKey.data()].as<std::string>()
Copy link

@coderabbitai coderabbitai bot Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Ensure presence of required keys in reader_options

The code assumes that reader_options contains the keys logLevelKey and timestampKey. To prevent potential exceptions, consider adding error handling to verify that these keys are present.

Apply this diff to add key existence checks:

+if (!reader_options.hasOwnProperty(cReaderOptionsLogLevelKey.data()) || 
+    !reader_options.hasOwnProperty(cReaderOptionsTimestampKey.data())) {
+    throw ClpFfiJsException{
+        clp::ErrorCode::ErrorCode_InvalidArgument,
+        __FILENAME__,
+        __LINE__,
+        "Required reader options keys are missing"
+    };
+}

Committable suggestion skipped: line range outside the PR's diff.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create a issue for JavaScript support (usage without the TS definition file) and keep a list of TODOs there.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, I've created the issue as requested: Issue #36 to track JavaScript support without the TypeScript definition file and to keep a list of TODOs.

🔗 Analysis chain

@junhaoliao, I've created a new GitHub issue to track JavaScript support without the TypeScript definition file and to keep a list of TODOs.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Create a new GitHub issue for JavaScript support without the TypeScript definition file
gh issue create \
  --title "JavaScript support without the TypeScript definition file" \
  --body "We need to support JavaScript usage without relying on the TypeScript definition file.

This issue is created to keep a list of TODOs for implementing this feature.

Referenced from [PR #35](https://github.com/y-scope/clp-ffi-js/pull/35) and [this comment](https://github.com/y-scope/clp-ffi-js/pull/35#discussion_r1835929602) by @junhaoliao." \
  --assignee junhaoliao

Length of output: 507


✏️ Learnings added
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#35
File: src/clp_ffi_js/ir/StructuredIrStreamReader.cpp:50-51
Timestamp: 2024-11-14T15:31:45.747Z
Learning: Ensure that `clp-ffi-js` supports JavaScript usage without relying on the TypeScript definition file, and keep track of related TODOs in a GitHub issue.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment on lines 19 to 22
// TODO: Extend valid log event types when filtering support is added for structured logs.
template <typename T>
concept ValidLogEventTypes
= std::same_as<T, UnstructuredLogEvent> || std::same_as<T, StructuredLogEvent>;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Update or remove outdated TODO comment regarding valid log event types

The TODO comment on line 19 suggests extending valid log event types when filtering support is added for structured logs. However, StructuredLogEvent is already included in ValidLogEventTypes on line 22. Please consider updating or removing the TODO comment to reflect the current state of the code.

@davemarco davemarco changed the title WIP: Filter for structured reader. Add log level filtering for structured IR stream reader and bump version to 0.4.0 Nov 11, 2024
@davemarco davemarco changed the title Add log level filtering for structured IR stream reader and bump version to 0.4.0 Add log level filtering for structured IR stream reader and bump version to 0.4.0. Nov 11, 2024
@davemarco
Copy link
Contributor Author

davemarco commented Dec 12, 2024

@junhaoliao @kirkrodrigues You guys can look at new implementation of generic_decode_range. @junhaoliao you can also now close the prototype PR. I think we have now implemented everything in prototype.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
src/clp_ffi_js/ir/StreamReader.hpp (2)

119-154: Consider using unordered_set for better performance.

The current implementation uses std::ranges::find within a loop, which has O(n*m) complexity where n is the number of log events and m is the size of the filter set. Consider using std::unordered_set for O(n) complexity:

- auto filter_levels = emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter);
+ auto filter_levels_vec = emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter);
+ std::unordered_set<std::underlying_type_t<LogLevel>> filter_levels(filter_levels_vec.begin(), filter_levels_vec.end());

  for (size_t log_event_idx = 0; log_event_idx < log_events.size(); ++log_event_idx) {
      auto const& log_event = log_events[log_event_idx];
-     if (std::ranges::find(
-             filter_levels,
-             clp::enum_to_underlying_type(log_event.get_log_level())
-         )
-         != filter_levels.end())
+     if (filter_levels.contains(clp::enum_to_underlying_type(log_event.get_log_level())))
      {
          filtered_log_event_map->emplace_back(log_event_idx);
      }
  }

156-219: Maintain consistency in boolean comparisons.

The method follows the coding guideline of using false == expression in line 179 but not in line 189. For consistency:

- if (length < end_idx || begin_idx > end_idx) {
+ if (true == (length < end_idx || begin_idx > end_idx)) {

Otherwise, the implementation is well-structured and handles all edge cases appropriately.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b684ab and 9a5a502.

📒 Files selected for processing (1)
  • src/clp_ffi_js/ir/StreamReader.hpp (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (1)
src/clp_ffi_js/ir/StreamReader.hpp (2)
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/StreamReader.cpp:35-45
Timestamp: 2024-11-10T16:46:58.543Z
Learning: An IRv1 version validation utility function is not planned to be added to CLP-core soon.
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.
🔇 Additional comments (2)
src/clp_ffi_js/ir/StreamReader.hpp (2)

4-4: LGTM! New includes and type definitions are appropriate.

The new includes support the added functionality, and the FilteredLogEventsMap type alias is well-documented and serves a clear purpose for mapping filtered to unfiltered log event indices.

Also applies to: 8-8, 14-14, 35-39


53-54: Documentation is missing for the reader_options parameter.

The docstring should include a description of the reader_options parameter.

src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)

118-153: Documentation needs improvement.

The parameter documentation has a duplicate entry for log_events and could be more descriptive.

Apply these documentation improvements:

     /**
      * Templated implementation of `filter_log_events`.
      *
      * @tparam LogEvent Type of the log event being filtered
-     * @param log_level_filter
-     * @param log_events Derived class's log events.
-     * @param log_events
-     * @param[out] filtered_log_event_map Returns the filtered log events.
+     * @param[out] filtered_log_event_map Collection to store indices of filtered log events
+     * @param log_level_filter Array of log levels to filter by
+     * @param log_events Collection of log events to filter
      */
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a5a502 and feec5ef.

📒 Files selected for processing (3)
  • src/clp_ffi_js/ir/StreamReader.hpp (4 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (4 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (3 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_js/ir/StreamReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (3)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1)
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2)
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:85-91
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In the `IrStreamReader::deserialize_stream` method, we handle missing End-of-Stream packets gracefully by logging an error and breaking the loop when encountering `std::errc::result_out_of_range`, rather than throwing an exception.
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#35
File: src/clp_ffi_js/ir/StructuredIrStreamReader.cpp:50-51
Timestamp: 2024-11-14T15:31:45.926Z
Learning: Ensure that `clp-ffi-js` supports JavaScript usage without relying on the TypeScript definition file, and keep track of related TODOs in a GitHub issue.
src/clp_ffi_js/ir/StreamReader.hpp (2)
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/StreamReader.cpp:35-45
Timestamp: 2024-11-10T16:46:58.543Z
Learning: An IRv1 version validation utility function is not planned to be added to CLP-core soon.
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.
🔇 Additional comments (6)
src/clp_ffi_js/ir/StreamReader.hpp (2)

4-18: LGTM! Well-documented type alias with necessary includes.

The FilteredLogEventsMap type alias is clearly documented and the new includes support the added functionality.

Also applies to: 34-38


155-223: LGTM! Well-implemented generic decoding with proper type constraints.

The method effectively handles all edge cases and uses type constraints to ensure type safety at compile time.

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2)

71-75: LGTM! Consistent implementation using generic methods.

The methods effectively utilize the shared implementations from StreamReader, improving code maintainability.

Also applies to: 79-83


123-147: LGTM! Clean implementation of decode_range.

The method effectively uses the generic implementation with a well-structured lambda for string conversion.

src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (2)

72-76: LGTM! Consistent use of generic implementation.

The method effectively utilizes the shared implementation from StreamReader.


Line range hint 139-162: LGTM! Well-structured decode_range implementation.

The method effectively uses the generic implementation with proper error handling in the string conversion lambda.

Comment on lines +96 to 102
while (false == deserializer.is_stream_completed()) {
auto result{deserializer.deserialize_next_ir_unit(reader)};
if (false == result.has_error()) {
continue;
}
auto const error{result.error()};
if (std::errc::operation_not_permitted == error) {
break;
}
if (std::errc::result_out_of_range == error) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve error handling for incomplete IR streams.

Based on the learnings from LinZhihao-723, we should only check for std::errc::result_out_of_range as the incomplete stream error has been fixed in ffi core.

Apply this change:

-    while (false == deserializer.is_stream_completed()) {
-        auto result{deserializer.deserialize_next_ir_unit(reader)};
-        if (false == result.has_error()) {
-            continue;
-        }
-        auto const error{result.error()};
-        if (std::errc::result_out_of_range == error) {
-            SPDLOG_ERROR("File contains an incomplete IR stream");
-            break;
-        }
+    while (true) {
+        auto result{deserializer.deserialize_next_ir_unit(reader)};
+        if (result.has_error()) {
+            if (std::errc::result_out_of_range == result.error()) {
+                SPDLOG_ERROR("File contains an incomplete IR stream");
+                break;
+            }
+            throw ClpFfiJsException{
+                clp::ErrorCode::ErrorCode_Corrupt,
+                __FILENAME__,
+                __LINE__,
+                std::format(
+                    "Failed to deserialize IR unit: {}:{}",
+                    result.error().category().name(),
+                    result.error().message()
+                )
+            };
+        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
while (false == deserializer.is_stream_completed()) {
auto result{deserializer.deserialize_next_ir_unit(reader)};
if (false == result.has_error()) {
continue;
}
auto const error{result.error()};
if (std::errc::operation_not_permitted == error) {
break;
}
if (std::errc::result_out_of_range == error) {
while (true) {
auto result{deserializer.deserialize_next_ir_unit(reader)};
if (result.has_error()) {
if (std::errc::result_out_of_range == result.error()) {
SPDLOG_ERROR("File contains an incomplete IR stream");
break;
}
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Corrupt,
__FILENAME__,
__LINE__,
std::format(
"Failed to deserialize IR unit: {}:{}",
result.error().category().name(),
result.error().message()
)
};
}

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (5)
src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (1)

82-88: Enhance documentation for new filtering functionality

Consider adding documentation for:

  • The thread safety guarantees of the class
  • The behaviour of filtering operations
  • The relationship between m_deserialized_log_events and m_filtered_log_event_map
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (4)

33-33: Remove duplicate function declaration.

The function declaration is redundant as the implementation follows immediately after the documentation.


35-55: Consider optimizing the log level parsing.

The current implementation creates a temporary string for case conversion. Consider using a case-insensitive comparison function to avoid the temporary allocation.

-    std::string log_level_name_upper_case{str};
-    std::ranges::transform(
-            log_level_name_upper_case.begin(),
-            log_level_name_upper_case.end(),
-            log_level_name_upper_case.begin(),
-            [](unsigned char c) { return std::toupper(c); }
-    );
-
-    auto const* it = std::ranges::find(
-            cLogLevelNames.begin() + clp::enum_to_underlying_type(cValidLogLevelsBeginIdx),
-            cLogLevelNames.end(),
-            log_level_name_upper_case
-    );
+    auto const* it = std::ranges::find_if(
+            cLogLevelNames.begin() + clp::enum_to_underlying_type(cValidLogLevelsBeginIdx),
+            cLogLevelNames.end(),
+            [&str](std::string_view const& name) {
+                return std::equal(str.begin(), str.end(), name.begin(), name.end(),
+                    [](char a, char b) { return std::toupper(a) == b; });
+            }
+    );

69-75: Document the method's behaviour.

Add documentation to clarify that the method currently logs a warning and returns success without handling the UTC offset change.

+/**
+ * Handles UTC offset change packets.
+ * @note Currently, this method only logs a warning and does not process the offset change.
+ * @param utc_offset_old The previous UTC offset
+ * @param utc_offset_new The new UTC offset
+ * @return Always returns IRErrorCode_Success
+ */
 auto StructuredIrUnitHandler::handle_utc_offset_change(

92-94: Document the method's purpose.

Add documentation to explain why this method exists and always returns success.

+/**
+ * Handles the end of stream event.
+ * @return Always returns IRErrorCode_Success as no special handling is required
+ */
 auto StructuredIrUnitHandler::handle_end_of_stream() -> clp::ffi::ir_stream::IRErrorCode {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between feec5ef and 61e1e83.

📒 Files selected for processing (5)
  • src/clp_ffi_js/ir/StreamReader.hpp (4 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (2 hunks)
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (1 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (3 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
  • src/clp_ffi_js/ir/StreamReader.hpp
🧰 Additional context used
📓 Path-based instructions (3)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (1)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1)
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.
🔇 Additional comments (8)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (2)

72-72: LGTM! The refactoring improves code reusability.

The method has been simplified to use the common generic_filter_log_events implementation, which reduces code duplication and improves maintainability.


Line range hint 135-158: LGTM! The refactoring enhances code organization and error handling.

The changes improve the code in several ways:

  1. The log_event_to_string lambda encapsulates message decoding logic cleanly
  2. Error handling for decoding failures is robust
  3. The use of generic_decode_range reduces code duplication

The implementation follows coding guidelines and maintains good error handling practices.

src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (3)

14-22: LGTM! Type definitions enhance code clarity

The new includes and type aliases effectively support the log level filtering functionality while maintaining code readability.


82-82: Verify callers of the constructor are updated

The constructor parameter type has changed from std::vector<KeyValuePairLogEvent> to StructuredLogEvents. While the constructor is private, ensure the factory method create and any other internal callers are updated accordingly.

✅ Verification successful

Let me gather more information about the factory method and its implementation.


Constructor parameter change has been properly handled

The factory method create and its implementation have been correctly updated to work with the new StructuredLogEvents type. The constructor's parameter change is properly encapsulated, and all callers are using the public create factory method which handles the type conversion internally. No issues were found.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining references to KeyValuePairLogEvent
rg "KeyValuePairLogEvent" -A 5

# Search for create method implementation
ast-grep --pattern 'StructuredIrStreamReader::create($$$)'

Length of output: 2490


Script:

#!/bin/bash
# Search for StructuredIrStreamReader class implementation and usage
rg "StructuredIrStreamReader" -A 5

# Search for any remaining vector<KeyValuePairLogEvent> usage
rg "vector.*KeyValuePairLogEvent" -A 5

# Search specifically for create method in StructuredIrStreamReader files
fd "StructuredIrStreamReader" --exec cat {}

Length of output: 18018


86-88: Consider thread safety implications

The shared pointer m_deserialized_log_events and m_filtered_log_event_map could be accessed concurrently through public methods. Please ensure proper synchronization is in place if concurrent access is possible.

src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (3)

58-67: 🛠️ Refactor suggestion

Add input validation for log event.

The method should validate the input log event before processing. Consider checking if the node ID-value pairs are empty.

 auto StructuredIrUnitHandler::handle_log_event(StructuredLogEvent&& log_event
 ) -> clp::ffi::ir_stream::IRErrorCode {
     auto const& id_value_pairs{log_event.get_node_id_value_pairs()};
+    if (id_value_pairs.empty()) {
+        SPDLOG_ERROR("Empty node ID-value pairs in log event");
+        return clp::ffi::ir_stream::IRErrorCode::IRErrorCode_InvalidData;
+    }
+
     auto const timestamp = get_timestamp(id_value_pairs);
     auto const log_level = get_log_level(id_value_pairs);

Likely invalid or redundant comment.


77-90: ⚠️ Potential issue

Add overflow protection for node ID increment.

The current node ID increment could potentially overflow. Consider adding bounds checking.

 auto StructuredIrUnitHandler::handle_schema_tree_node_insertion(
         clp::ffi::SchemaTree::NodeLocator schema_tree_node_locator
 ) -> clp::ffi::ir_stream::IRErrorCode {
+    if (m_current_node_id == std::numeric_limits<std::size_t>::max()) {
+        SPDLOG_ERROR("Node ID overflow detected");
+        return clp::ffi::ir_stream::IRErrorCode::IRErrorCode_InvalidData;
+    }
     ++m_current_node_id;

Likely invalid or redundant comment.


96-130: ⚠️ Potential issue

Improve exception safety when accessing node ID-value pairs.

Use find() instead of at() to avoid potential exceptions when accessing the map.

-    auto const& log_level_optional_value{id_value_pairs.at(m_log_level_node_id.value())};
-    if (false == log_level_optional_value.has_value()) {
+    auto it = id_value_pairs.find(m_log_level_node_id.value());
+    if (it == id_value_pairs.end() || false == it->second.has_value()) {
         return log_level;
     }
 
-    auto const log_level_value = log_level_optional_value.value();
+    auto const log_level_value = it->second.value();

Likely invalid or redundant comment.

Copy link
Contributor Author

@davemarco davemarco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp Outdated Show resolved Hide resolved
@@ -17,6 +18,7 @@ enum class LogLevel : std::uint8_t {
WARN,
ERROR,
FATAL,
LENGTH, // This isn't a valid log level.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao please review my and kirk arguments, and let us know ur thoughts

src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp Outdated Show resolved Hide resolved
Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more minor comments.

src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp Outdated Show resolved Hide resolved
if (false == m_log_level_node_id.has_value()) {
return log_level;
}
auto const& log_level_optional_value{id_value_pairs.at(m_log_level_node_id.value())};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although what you wrote is grammatically correct, we have an internal guideline that requires the name is optional_xxx. Unless you're purposefully violating that guideline?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment for timestamp_optional_value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (2)
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (2)

35-55: Add input validation for empty strings

The function should validate that the input string is not empty before processing.

 auto parse_log_level(std::string_view str) -> LogLevel {
+    if (str.empty()) {
+        return LogLevel::NONE;
+    }
+
     // Convert the string to uppercase.
     std::string log_level_name_upper_case{str};

92-94: Add documentation for the empty implementation

Document why this method returns success without any processing. If it's intentionally empty, add a comment explaining why.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1da4c7d and f08e07f.

📒 Files selected for processing (1)
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🔇 Additional comments (2)
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (2)

77-90: ⚠️ Potential issue

Add overflow protection for node ID increment

The current implementation could potentially overflow m_current_node_id.

 ) -> clp::ffi::ir_stream::IRErrorCode {
+    if (m_current_node_id == std::numeric_limits<std::size_t>::max()) {
+        SPDLOG_ERROR("Node ID overflow detected");
+        return clp::ffi::ir_stream::IRErrorCode::IRErrorCode_InvalidData;
+    }
     ++m_current_node_id;

Likely invalid or redundant comment.


58-67: 🛠️ Refactor suggestion

Add validation for empty id_value_pairs

The method should validate that id_value_pairs is not empty before processing.

 auto StructuredIrUnitHandler::handle_log_event(StructuredLogEvent&& log_event
 ) -> clp::ffi::ir_stream::IRErrorCode {
     auto const& id_value_pairs{log_event.get_node_id_value_pairs()};
+    if (id_value_pairs.empty()) {
+        SPDLOG_ERROR("Empty node ID-value pairs in log event");
+        return clp::ffi::ir_stream::IRErrorCode::IRErrorCode_InvalidData;
+    }
+
     auto const timestamp = get_timestamp(id_value_pairs);

Likely invalid or redundant comment.

Comment on lines +69 to +75
auto StructuredIrUnitHandler::handle_utc_offset_change(
[[maybe_unused]] clp::UtcOffset utc_offset_old,
[[maybe_unused]] clp::UtcOffset utc_offset_new
) -> clp::ffi::ir_stream::IRErrorCode {
SPDLOG_WARN("UTC offset change packets aren't handled currently.");
return clp::ffi::ir_stream::IRErrorCode::IRErrorCode_Success;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Return a more appropriate error code

Since UTC offset changes aren't handled, consider returning IRErrorCode_Unimplemented instead of IRErrorCode_Success to accurately reflect the state.

 ) -> clp::ffi::ir_stream::IRErrorCode {
     SPDLOG_WARN("UTC offset change packets aren't handled currently.");
-    return clp::ffi::ir_stream::IRErrorCode::IRErrorCode_Success;
+    return clp::ffi::ir_stream::IRErrorCode::IRErrorCode_Unimplemented;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto StructuredIrUnitHandler::handle_utc_offset_change(
[[maybe_unused]] clp::UtcOffset utc_offset_old,
[[maybe_unused]] clp::UtcOffset utc_offset_new
) -> clp::ffi::ir_stream::IRErrorCode {
SPDLOG_WARN("UTC offset change packets aren't handled currently.");
return clp::ffi::ir_stream::IRErrorCode::IRErrorCode_Success;
}
auto StructuredIrUnitHandler::handle_utc_offset_change(
[[maybe_unused]] clp::UtcOffset utc_offset_old,
[[maybe_unused]] clp::UtcOffset utc_offset_new
) -> clp::ffi::ir_stream::IRErrorCode {
SPDLOG_WARN("UTC offset change packets aren't handled currently.");
return clp::ffi::ir_stream::IRErrorCode::IRErrorCode_Unimplemented;
}

Comment on lines 131 to 159
auto StructuredIrUnitHandler::get_timestamp(
StructuredLogEvent::NodeIdValuePairs const& id_value_pairs
) const -> clp::ir::epoch_time_ms_t {
clp::ir::epoch_time_ms_t timestamp{0};

if (false == m_timestamp_node_id.has_value()) {
return timestamp;
}
auto const& timestamp_optional_value{id_value_pairs.at(m_timestamp_node_id.value())};
if (false == timestamp_optional_value.has_value()) {
return timestamp;
}
auto const timestamp_value = timestamp_optional_value.value();

if (timestamp_value.is<clp::ffi::value_int_t>()) {
timestamp = static_cast<clp::ir::epoch_time_ms_t>(
timestamp_value.get_immutable_view<clp::ffi::value_int_t>()
);
} else {
// TODO: Add support for parsing string-type timestamp values.
auto log_event_idx = m_deserialized_log_events->size();
SPDLOG_ERROR(
"Authoritative timestamp's value is not an int for log event index {}",
log_event_idx
);
}

return timestamp;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add bounds checking and improve exception safety

The method should validate the timestamp value and use find() instead of at().

-    auto const& timestamp_optional_value{id_value_pairs.at(m_timestamp_node_id.value())};
+    auto it = id_value_pairs.find(m_timestamp_node_id.value());
+    if (it == id_value_pairs.end()) {
+        return timestamp;
+    }
+    auto const& timestamp_optional_value = it->second;

     if (timestamp_value.is<clp::ffi::value_int_t>()) {
+        auto const raw_value = timestamp_value.get_immutable_view<clp::ffi::value_int_t>();
+        if (raw_value > std::numeric_limits<clp::ir::epoch_time_ms_t>::max()) {
+            SPDLOG_ERROR("Timestamp value overflow detected");
+            return timestamp;
+        }
         timestamp = static_cast<clp::ir::epoch_time_ms_t>(
-                timestamp_value.get_immutable_view<clp::ffi::value_int_t>()
+                raw_value
         );

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 96 to 129
auto StructuredIrUnitHandler::get_log_level(
StructuredLogEvent::NodeIdValuePairs const& id_value_pairs
) const -> LogLevel {
LogLevel log_level{LogLevel::NONE};

if (false == m_log_level_node_id.has_value()) {
return log_level;
}
auto const& log_level_optional_value{id_value_pairs.at(m_log_level_node_id.value())};
if (false == log_level_optional_value.has_value()) {
return log_level;
}
auto const log_level_value = log_level_optional_value.value();

if (log_level_value.is<std::string>()) {
auto const& log_level_str = log_level_value.get_immutable_view<std::string>();
log_level = parse_log_level(log_level_str);
} else if (log_level_value.is<clp::ffi::value_int_t>()) {
auto const& log_level_int = log_level_value.get_immutable_view<clp::ffi::value_int_t>();
if (log_level_int >= clp::enum_to_underlying_type(cValidLogLevelsBeginIdx)
&& log_level_int < clp::enum_to_underlying_type(LogLevel::LENGTH))
{
log_level = static_cast<LogLevel>(log_level_int);
}
} else {
auto log_event_idx = m_deserialized_log_events->size();
SPDLOG_ERROR(
"Authoritative log level's value is not an int or string for log event index {}",
log_event_idx
);
}

return log_level;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Improve exception safety and bounds checking

The method should use find() instead of at() to avoid potential exceptions, and validate integer bounds more strictly.

-    auto const& log_level_optional_value{id_value_pairs.at(m_log_level_node_id.value())};
+    auto it = id_value_pairs.find(m_log_level_node_id.value());
+    if (it == id_value_pairs.end()) {
+        return log_level;
+    }
+    auto const& log_level_optional_value = it->second;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto StructuredIrUnitHandler::get_log_level(
StructuredLogEvent::NodeIdValuePairs const& id_value_pairs
) const -> LogLevel {
LogLevel log_level{LogLevel::NONE};
if (false == m_log_level_node_id.has_value()) {
return log_level;
}
auto const& log_level_optional_value{id_value_pairs.at(m_log_level_node_id.value())};
if (false == log_level_optional_value.has_value()) {
return log_level;
}
auto const log_level_value = log_level_optional_value.value();
if (log_level_value.is<std::string>()) {
auto const& log_level_str = log_level_value.get_immutable_view<std::string>();
log_level = parse_log_level(log_level_str);
} else if (log_level_value.is<clp::ffi::value_int_t>()) {
auto const& log_level_int = log_level_value.get_immutable_view<clp::ffi::value_int_t>();
if (log_level_int >= clp::enum_to_underlying_type(cValidLogLevelsBeginIdx)
&& log_level_int < clp::enum_to_underlying_type(LogLevel::LENGTH))
{
log_level = static_cast<LogLevel>(log_level_int);
}
} else {
auto log_event_idx = m_deserialized_log_events->size();
SPDLOG_ERROR(
"Authoritative log level's value is not an int or string for log event index {}",
log_event_idx
);
}
return log_level;
}
auto StructuredIrUnitHandler::get_log_level(
StructuredLogEvent::NodeIdValuePairs const& id_value_pairs
) const -> LogLevel {
LogLevel log_level{LogLevel::NONE};
if (false == m_log_level_node_id.has_value()) {
return log_level;
}
auto it = id_value_pairs.find(m_log_level_node_id.value());
if (it == id_value_pairs.end()) {
return log_level;
}
auto const& log_level_optional_value = it->second;
if (false == log_level_optional_value.has_value()) {
return log_level;
}
auto const log_level_value = log_level_optional_value.value();
if (log_level_value.is<std::string>()) {
auto const& log_level_str = log_level_value.get_immutable_view<std::string>();
log_level = parse_log_level(log_level_str);
} else if (log_level_value.is<clp::ffi::value_int_t>()) {
auto const& log_level_int = log_level_value.get_immutable_view<clp::ffi::value_int_t>();
if (log_level_int >= clp::enum_to_underlying_type(cValidLogLevelsBeginIdx)
&& log_level_int < clp::enum_to_underlying_type(LogLevel::LENGTH))
{
log_level = static_cast<LogLevel>(log_level_int);
}
} else {
auto log_event_idx = m_deserialized_log_events->size();
SPDLOG_ERROR(
"Authoritative log level's value is not an int or string for log event index {}",
log_event_idx
);
}
return log_level;
}

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (1)

35-55: Consider optimizing the log level parsing implementation.

The current implementation makes an unnecessary string copy for case conversion. Consider using a case-insensitive comparison instead.

-    std::string log_level_name_upper_case{str};
-    std::ranges::transform(
-            log_level_name_upper_case.begin(),
-            log_level_name_upper_case.end(),
-            log_level_name_upper_case.begin(),
-            [](unsigned char c) { return std::toupper(c); }
-    );
-
-    auto const* it = std::ranges::find(
-            cLogLevelNames.begin() + clp::enum_to_underlying_type(cValidLogLevelsBeginIdx),
-            cLogLevelNames.end(),
-            log_level_name_upper_case
-    );
+    auto const* it = std::ranges::find_if(
+            cLogLevelNames.begin() + clp::enum_to_underlying_type(cValidLogLevelsBeginIdx),
+            cLogLevelNames.end(),
+            [&str](std::string_view const& name) {
+                return std::equal(str.begin(), str.end(), name.begin(), name.end(),
+                    [](char a, char b) { return std::toupper(a) == b; });
+            }
+    );
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f08e07f and aee6422.

📒 Files selected for processing (1)
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🔇 Additional comments (3)
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (3)

69-75: ⚠️ Potential issue

Return appropriate error code for unhandled UTC offset changes.

The method should return IRErrorCode_Unimplemented to accurately reflect that UTC offset changes aren't handled.

 auto StructuredIrUnitHandler::handle_utc_offset_change(
         [[maybe_unused]] clp::UtcOffset utc_offset_old,
         [[maybe_unused]] clp::UtcOffset utc_offset_new
 ) -> clp::ffi::ir_stream::IRErrorCode {
     SPDLOG_WARN("UTC offset change packets aren't handled currently.");
-    return clp::ffi::ir_stream::IRErrorCode::IRErrorCode_Success;
+    return clp::ffi::ir_stream::IRErrorCode::IRErrorCode_Unimplemented;
 }

Likely invalid or redundant comment.


131-159: ⚠️ Potential issue

Add bounds checking and improve exception safety.

The method should validate the timestamp value and use find() instead of at().

     if (false == m_timestamp_node_id.has_value()) {
         return timestamp;
     }
-    auto const& optional_timestamp_value{id_value_pairs.at(m_timestamp_node_id.value())};
+    auto it = id_value_pairs.find(m_timestamp_node_id.value());
+    if (it == id_value_pairs.end()) {
+        return timestamp;
+    }
+    auto const& optional_timestamp_value = it->second;

     if (timestamp_value.is<clp::ffi::value_int_t>()) {
+        auto const raw_value = timestamp_value.get_immutable_view<clp::ffi::value_int_t>();
+        if (raw_value > std::numeric_limits<clp::ir::epoch_time_ms_t>::max()) {
+            SPDLOG_ERROR("Timestamp value overflow detected");
+            return timestamp;
+        }
         timestamp = static_cast<clp::ir::epoch_time_ms_t>(
-                timestamp_value.get_immutable_view<clp::ffi::value_int_t>()
+                raw_value
         );

Likely invalid or redundant comment.


58-67: 🛠️ Refactor suggestion

Add input validation for log event processing.

The method should validate the input log event's node ID-value pairs before processing.

 auto StructuredIrUnitHandler::handle_log_event(StructuredLogEvent&& log_event
 ) -> clp::ffi::ir_stream::IRErrorCode {
+    if (log_event.get_node_id_value_pairs().empty()) {
+        SPDLOG_ERROR("Empty node ID-value pairs in log event");
+        return clp::ffi::ir_stream::IRErrorCode::IRErrorCode_InvalidData;
+    }
+
     auto const& id_value_pairs{log_event.get_node_id_value_pairs()};
     auto const timestamp = get_timestamp(id_value_pairs);
     auto const log_level = get_log_level(id_value_pairs);

Likely invalid or redundant comment.

Comment on lines +96 to +129
auto StructuredIrUnitHandler::get_log_level(
StructuredLogEvent::NodeIdValuePairs const& id_value_pairs
) const -> LogLevel {
LogLevel log_level{LogLevel::NONE};

if (false == m_log_level_node_id.has_value()) {
return log_level;
}
auto const& optional_log_level_value{id_value_pairs.at(m_log_level_node_id.value())};
if (false == optional_log_level_value.has_value()) {
return log_level;
}
auto const log_level_value = optional_log_level_value.value();

if (log_level_value.is<std::string>()) {
auto const& log_level_str = log_level_value.get_immutable_view<std::string>();
log_level = parse_log_level(log_level_str);
} else if (log_level_value.is<clp::ffi::value_int_t>()) {
auto const& log_level_int = log_level_value.get_immutable_view<clp::ffi::value_int_t>();
if (log_level_int >= clp::enum_to_underlying_type(cValidLogLevelsBeginIdx)
&& log_level_int < clp::enum_to_underlying_type(LogLevel::LENGTH))
{
log_level = static_cast<LogLevel>(log_level_int);
}
} else {
auto log_event_idx = m_deserialized_log_events->size();
SPDLOG_ERROR(
"Authoritative log level's value is not an int or string for log event index {}",
log_event_idx
);
}

return log_level;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Improve exception safety and bounds checking.

The method should use find() instead of at() to avoid potential exceptions, and validate integer bounds more strictly.

     if (false == m_log_level_node_id.has_value()) {
         return log_level;
     }
-    auto const& optional_log_level_value{id_value_pairs.at(m_log_level_node_id.value())};
+    auto it = id_value_pairs.find(m_log_level_node_id.value());
+    if (it == id_value_pairs.end()) {
+        return log_level;
+    }
+    auto const& optional_log_level_value = it->second;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto StructuredIrUnitHandler::get_log_level(
StructuredLogEvent::NodeIdValuePairs const& id_value_pairs
) const -> LogLevel {
LogLevel log_level{LogLevel::NONE};
if (false == m_log_level_node_id.has_value()) {
return log_level;
}
auto const& optional_log_level_value{id_value_pairs.at(m_log_level_node_id.value())};
if (false == optional_log_level_value.has_value()) {
return log_level;
}
auto const log_level_value = optional_log_level_value.value();
if (log_level_value.is<std::string>()) {
auto const& log_level_str = log_level_value.get_immutable_view<std::string>();
log_level = parse_log_level(log_level_str);
} else if (log_level_value.is<clp::ffi::value_int_t>()) {
auto const& log_level_int = log_level_value.get_immutable_view<clp::ffi::value_int_t>();
if (log_level_int >= clp::enum_to_underlying_type(cValidLogLevelsBeginIdx)
&& log_level_int < clp::enum_to_underlying_type(LogLevel::LENGTH))
{
log_level = static_cast<LogLevel>(log_level_int);
}
} else {
auto log_event_idx = m_deserialized_log_events->size();
SPDLOG_ERROR(
"Authoritative log level's value is not an int or string for log event index {}",
log_event_idx
);
}
return log_level;
}
auto StructuredIrUnitHandler::get_log_level(
StructuredLogEvent::NodeIdValuePairs const& id_value_pairs
) const -> LogLevel {
LogLevel log_level{LogLevel::NONE};
if (false == m_log_level_node_id.has_value()) {
return log_level;
}
auto it = id_value_pairs.find(m_log_level_node_id.value());
if (it == id_value_pairs.end()) {
return log_level;
}
auto const& optional_log_level_value = it->second;
if (false == optional_log_level_value.has_value()) {
return log_level;
}
auto const log_level_value = optional_log_level_value.value();
if (log_level_value.is<std::string>()) {
auto const& log_level_str = log_level_value.get_immutable_view<std::string>();
log_level = parse_log_level(log_level_str);
} else if (log_level_value.is<clp::ffi::value_int_t>()) {
auto const& log_level_int = log_level_value.get_immutable_view<clp::ffi::value_int_t>();
if (log_level_int >= clp::enum_to_underlying_type(cValidLogLevelsBeginIdx)
&& log_level_int < clp::enum_to_underlying_type(LogLevel::LENGTH))
{
log_level = static_cast<LogLevel>(log_level_int);
}
} else {
auto log_event_idx = m_deserialized_log_events->size();
SPDLOG_ERROR(
"Authoritative log level's value is not an int or string for log event index {}",
log_event_idx
);
}
return log_level;
}

Comment on lines +77 to +90
auto StructuredIrUnitHandler::handle_schema_tree_node_insertion(
clp::ffi::SchemaTree::NodeLocator schema_tree_node_locator
) -> clp::ffi::ir_stream::IRErrorCode {
++m_current_node_id;

auto const& key_name{schema_tree_node_locator.get_key_name()};
if (key_name == m_log_level_key) {
m_log_level_node_id.emplace(m_current_node_id);
} else if (key_name == m_timestamp_key) {
m_timestamp_node_id.emplace(m_current_node_id);
}

return clp::ffi::ir_stream::IRErrorCode::IRErrorCode_Success;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add bounds checking for node ID increment.

The current node ID increment could potentially overflow. Consider adding bounds checking.

 auto StructuredIrUnitHandler::handle_schema_tree_node_insertion(
         clp::ffi::SchemaTree::NodeLocator schema_tree_node_locator
 ) -> clp::ffi::ir_stream::IRErrorCode {
+    if (m_current_node_id == std::numeric_limits<std::size_t>::max()) {
+        SPDLOG_ERROR("Node ID overflow detected");
+        return clp::ffi::ir_stream::IRErrorCode::IRErrorCode_InvalidData;
+    }
     ++m_current_node_id;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto StructuredIrUnitHandler::handle_schema_tree_node_insertion(
clp::ffi::SchemaTree::NodeLocator schema_tree_node_locator
) -> clp::ffi::ir_stream::IRErrorCode {
++m_current_node_id;
auto const& key_name{schema_tree_node_locator.get_key_name()};
if (key_name == m_log_level_key) {
m_log_level_node_id.emplace(m_current_node_id);
} else if (key_name == m_timestamp_key) {
m_timestamp_node_id.emplace(m_current_node_id);
}
return clp::ffi::ir_stream::IRErrorCode::IRErrorCode_Success;
}
auto StructuredIrUnitHandler::handle_schema_tree_node_insertion(
clp::ffi::SchemaTree::NodeLocator schema_tree_node_locator
) -> clp::ffi::ir_stream::IRErrorCode {
if (m_current_node_id == std::numeric_limits<std::size_t>::max()) {
SPDLOG_ERROR("Node ID overflow detected");
return clp::ffi::ir_stream::IRErrorCode::IRErrorCode_InvalidData;
}
++m_current_node_id;
auto const& key_name{schema_tree_node_locator.get_key_name()};
if (key_name == m_log_level_key) {
m_log_level_node_id.emplace(m_current_node_id);
} else if (key_name == m_timestamp_key) {
m_timestamp_node_id.emplace(m_current_node_id);
}
return clp::ffi::ir_stream::IRErrorCode::IRErrorCode_Success;
}

kirkrodrigues
kirkrodrigues previously approved these changes Dec 13, 2024
Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

feat: Add support for log-level filtering of structured IR streams.

bool use_filter
) -> DecodedResultsTsType {
if (use_filter && false == filtered_log_event_map.has_value()) {
return DecodedResultsTsType{emscripten::val::null()};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't in the log viewer code as well -

Can we add a log print before returning so that we know the input argument check fails here rather than at the range check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here i dont think so. This should be a normal case when a filter is applied, and the file does not have any logs with that filter level.

length = log_events.size();
}
if (length < end_idx || begin_idx > end_idx) {
return DecodedResultsTsType{emscripten::val::null()};
Copy link
Collaborator

@junhaoliao junhaoliao Dec 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see https://github.com/y-scope/clp-ffi-js/pull/35/files#r1884668582

it might be a good idea to add a log print here before returning

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an error log here

junhaoliao
junhaoliao previously approved these changes Dec 14, 2024
Copy link
Collaborator

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested with a clp-ffi-py generated IRv2 file with various log levels. all good except

  1. logs can be printed before returning null in auto StreamReader::generic_decode_range: https://github.com/y-scope/clp-ffi-js/pull/35/files#r1884668582
  2. a pending discussion at https://github.com/y-scope/clp-ffi-js/pull/35/files#r1881142029

@davemarco davemarco dismissed stale reviews from junhaoliao and kirkrodrigues via 0581310 December 14, 2024 01:53
@davemarco davemarco requested a review from junhaoliao December 14, 2024 01:55
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
src/clp_ffi_js/ir/StreamReader.hpp (2)

38-39: Add documentation for the LogEvents type alias

Consider adding a documentation comment explaining the purpose and usage of this type alias, similar to the documentation provided for FilteredLogEventsMap.

Add this documentation above the type alias:

+/**
+ * Collection of log events with their associated filter data (timestamp and log level).
+ */
template <typename LogEvent>
using LogEvents = std::vector<LogEventWithFilterData<LogEvent>>;

246-256: Consider extracting the filtering logic into a named lambda for better readability

The filtering logic could be made more readable by extracting it into a named lambda.

-    for (size_t log_event_idx = 0; log_event_idx < log_events.size(); ++log_event_idx) {
-        auto const& log_event = log_events[log_event_idx];
-        if (std::ranges::find(
-                    filter_levels,
-                    clp::enum_to_underlying_type(log_event.get_log_level())
-            )
-            != filter_levels.end())
-        {
-            filtered_log_event_map->emplace_back(log_event_idx);
-        }
-    }
+    auto const matches_filter_level = [&filter_levels](auto const& log_event) {
+        return std::ranges::find(
+                   filter_levels,
+                   clp::enum_to_underlying_type(log_event.get_log_level())
+               ) != filter_levels.end();
+    };
+
+    for (size_t log_event_idx = 0; log_event_idx < log_events.size(); ++log_event_idx) {
+        if (matches_filter_level(log_events[log_event_idx])) {
+            filtered_log_event_map->emplace_back(log_event_idx);
+        }
+    }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aee6422 and 0581310.

📒 Files selected for processing (1)
  • src/clp_ffi_js/ir/StreamReader.hpp (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (1)
src/clp_ffi_js/ir/StreamReader.hpp (2)
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/StreamReader.cpp:35-45
Timestamp: 2024-11-10T16:46:58.543Z
Learning: An IRv1 version validation utility function is not planned to be added to CLP-core soon.
🔇 Additional comments (2)
src/clp_ffi_js/ir/StreamReader.hpp (2)

59-60: Add documentation for the reader_options parameter

The reader_options parameter is missing from the docstring.


174-229: Well-implemented error handling and type safety

The implementation includes:

  • Proper concept requirements for type safety
  • Clear error handling with appropriate logging
  • Efficient range validation

@davemarco davemarco changed the title feat: Add log level filtering for structured IR stream reader. feat: Add support for log-level filtering of structured IR streams. Dec 14, 2024
Copy link
Collaborator

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments about documentation for the exception behaviour

For the PR title, how about

feat(StructuredIrStreamReader): Add support for log-level filtering.

return log_level;
}

auto StructuredIrUnitHandler::get_timestamp(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be deduplicated with StructuredIrUnitHandler::get_log_level, but for the sake of simplicity let's leave this for now.

auto const parsed{unstructured_log_event.get_message().decode_and_unparse()};
auto log_event_to_string = [this](UnstructuredLogEvent const& log_event) -> std::string {
std::string message;
auto const parsed{log_event.get_message().decode_and_unparse()};
if (false == parsed.has_value()) {
throw ClpFfiJsException{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the structured one also throws the same error when a message cannot be decoded, let's document this @throw in the docstring at

[[nodiscard]] virtual auto decode_range(size_t begin_idx, size_t end_idx, bool use_filter) const
-> DecodedResultsTsType = 0;

* @param log_events
* @param use_filter
* @param log_event_to_string
* @return See `decode_range`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the lack of context at #35 (comment)

Suggested change
* @return See `decode_range`.
* @return See `decode_range`.
* @throw Propagates `ToStringFunc`'s exception.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants