Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add log-level filtering methods and associated refactoring: #12

Merged
merged 50 commits into from
Oct 10, 2024

Conversation

davemarco
Copy link
Contributor

@davemarco davemarco commented Oct 2, 2024

Description

Adds support for log level filtering.

Validation performed

Performed some basic testing by calling functions in a test log-viewer branch. More testing will be done once integrated into log viewer.

Summary by CodeRabbit

  • New Features

    • Introduced a new LogViewerEvent class for enhanced log event handling, including log level information.
    • Added filtering capabilities in the StreamReader class to filter log events based on user-selected log levels.
    • Created a StreamReaderDataContext class for managing stream reading and deserialization resources.
  • Bug Fixes

    • Improved robustness of log event deserialization in the StreamReader class.
  • Documentation

    • Updated method signatures and added new methods for better usability and clarity.

Summary by CodeRabbit

  • New Features

    • Introduced a new LogLevel enumeration for defining various log levels.
    • Implemented filtering capabilities in the StreamReader class to filter log events based on user-selected log levels.
    • Added methods in the StreamReader class for retrieving filtered log events and deserializing log data.
  • Bug Fixes

    • Improved robustness of log event deserialization in the StreamReader class.
  • Documentation

    • Updated method signatures and added new methods for better usability and clarity.

Copy link

coderabbitai bot commented Oct 2, 2024

Walkthrough

The changes involve modifications to the src/clp_ffi_js/constants.hpp, src/clp_ffi_js/ir/StreamReader.cpp, and src/clp_ffi_js/ir/StreamReader.hpp files. The constants.hpp file introduces an enumeration for log levels and updates a constant type. The StreamReader class in both StreamReader.cpp and StreamReader.hpp is enhanced with new methods for filtering log events and managing deserialization, while existing methods are updated to accommodate these new functionalities.

Changes

File Change Summary
src/clp_ffi_js/constants.hpp - Header <cstddef> removed.
- Enum added: enum class LogLevel with levels NONE, TRACE, DEBUG, INFO, WARN, ERROR, FATAL.
- Constant updated: cValidLogLevelsBeginIdx changed from size_t to LogLevel.
src/clp_ffi_js/ir/StreamReader.cpp - Method added: void filter_log_events(emscripten::val const& log_level_filter)
- Method added: auto get_filtered_log_event_map() const
- Method removed: auto deserialize_range(size_t begin_idx, size_t end_idx)
- Method added: auto deserialize_stream() -> size_t
- Method signature updated: auto decode_range(size_t begin_idx, size_t end_idx) to include bool use_filter.
- Logic updated to handle both filtered and unfiltered log events.
src/clp_ffi_js/ir/StreamReader.hpp - Method added: [[nodiscard]] auto get_filtered_log_event_map() const
- Method added: void filter_log_events(emscripten::val const& log_level_filter)
- Method signature updated: [[nodiscard]] auto deserialize_range(size_t begin_idx, size_t end_idx) to [[nodiscard]] auto deserialize_stream() -> size_t.
- Method signature updated: [[nodiscard]] auto decode_range(size_t begin_idx, size_t end_idx) to include bool use_filter.
- Variable modified: m_encoded_log_events to store LogEventWithLevel instances, m_filtered_log_event_map added as std::optional<std::vector<size_t>>.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant StreamReader
    participant LogEventWithLevel

    User->>StreamReader: filter_log_events(log_level_filter)
    StreamReader->>StreamReader: Populate m_filtered_log_event_map
    StreamReader->>LogEventWithLevel: Create LogEventWithLevel instances
    StreamReader->>User: get_filtered_log_event_map()
Loading

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@davemarco davemarco marked this pull request as draft October 2, 2024 04:25
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (4)
src/clp_ffi_js/ir/StreamReaderDataContext.hpp (1)

Line range hint 24-31: Constructor design is good, but check deserializer type

The constructor is well-designed, using rvalue references and std::move for efficient resource transfer. However, there's a potential issue:

The deserializer parameter type is clp::ir::LogEventDeserializer<clp::ir::four_byte_encoded_variable_t>, but the class template uses encoded_variable_t. This mismatch might cause compilation errors or unexpected behaviour.

Consider changing the deserializer parameter type to match the class template:

-    clp::ir::LogEventDeserializer<clp::ir::four_byte_encoded_variable_t> deserializer
+    clp::ir::LogEventDeserializer<encoded_variable_t> deserializer
src/clp_ffi_js/ir/StreamReader.hpp (2)

73-78: Ensure consistent parameter naming in comments

The parameter names in the comments (beginIdx, endIdx, useFilter) should match the actual parameter names in the function signature (begin_idx, end_idx, use_filter). Consistent naming enhances readability and reduces confusion.


88-94: Maintain consistent parameter naming conventions

The parameter logLevelFilter in filter_log_events() uses camelCase, while other parameters use snake_case (e.g., begin_idx, end_idx, use_filter). Consider renaming it to log_level_filter for consistency.

src/clp_ffi_js/ir/StreamReader.cpp (1)

104-104: Reconsider resetting m_is_filtered in deserialize_range

Currently, m_is_filtered is set to false at the beginning of deserialize_range, which resets the filtered state whenever deserialization occurs. If the intent is to preserve the filter state across deserializations, consider removing this reset or providing an option to retain the filter.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between d19be4a and 856ce4a.

📒 Files selected for processing (4)
  • src/clp_ffi_js/ir/LogViewerEvent.hpp (1 hunks)
  • src/clp_ffi_js/ir/StreamReader.cpp (7 hunks)
  • src/clp_ffi_js/ir/StreamReader.hpp (4 hunks)
  • src/clp_ffi_js/ir/StreamReaderDataContext.hpp (1 hunks)
🔇 Additional comments (6)
src/clp_ffi_js/ir/StreamReaderDataContext.hpp (3)

Line range hint 13-21: LGTM: Well-structured class declaration

The StreamReaderDataContext class is well-designed:

  • The template allows flexibility for different encoded variable types.
  • The class encapsulates related resources, adhering to the single responsibility principle.
  • The namespace is consistent with the file structure.

Also applies to: 50-54


Line range hint 33-41: LGTM: Appropriate special member function declarations

The special member functions are well-defined:

  • Deleting copy operations prevents accidental copying of potentially large resources.
  • Defaulting move operations allows for efficient transfer of ownership.
  • Defaulting the destructor is appropriate as there are no custom cleanup operations needed.

This approach ensures proper resource management and follows the Rule of Five.


Line range hint 46-49: LGTM: Well-designed getter method

The get_deserializer method is well-implemented:

  • It returns a reference, avoiding unnecessary copying.
  • The [[nodiscard]] attribute is correctly used.
  • The method provides controlled access to the encapsulated deserializer.
src/clp_ffi_js/ir/StreamReader.hpp (3)

15-15: Include of 'LogViewerEvent.hpp' is appropriate

The inclusion of LogViewerEvent.hpp is necessary for the use of LogViewerEvent in the class.


20-20: Declaration of 'FilteredLogEventMapType' is correct

No issues found with the declaration of FilteredLogEventMapType.


87-87: Verify all calls to 'decode_range()' include 'use_filter' parameter

The method signature of decode_range() has been updated to include the use_filter parameter. Please ensure all existing calls to this method are updated to include this new parameter to prevent compilation errors or unexpected behaviour.

Run the following script to identify all usages of decode_range():

✅ Verification successful

All calls to decode_range() include the use_filter parameter.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all calls to 'decode_range()' and check for the 'use_filter' parameter.
# Expected: All calls should include three arguments, including 'use_filter'.

rg --type cpp --type c++ --type c -A 2 -B 2 'decode_range\('

Length of output: 90


Script:

#!/bin/bash
# Description: Find all calls to 'decode_range()' and check for the 'use_filter' parameter.
# Expected: All calls should include three arguments, including 'use_filter'.

rg --glob '*.cpp' --glob '*.cxx' --glob '*.cc' --glob '*.c++' --glob '*.c' 'decode_range\('

Length of output: 243

src/clp_ffi_js/ir/LogViewerEvent.hpp Outdated Show resolved Hide resolved
Comment on lines 54 to 57
/**
* @return The number of events buffered.
*/
[[nodiscard]] auto get_filtered_log_event_map() const -> FilteredLogEventMapType;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Correct the documentation for 'get_filtered_log_event_map()'

The comment for get_filtered_log_event_map() incorrectly states @return The number of events buffered. Please update the documentation to accurately reflect that this function returns the filtered log event map.

Comment on lines 103 to 107
m_stream_reader_data_context;
clp::TimestampPattern m_ts_pattern;
std::vector<LogViewerEvent<clp::ir::four_byte_encoded_variable_t>> m_encoded_log_events;
std::vector<size_t> m_filtered_log_event_map;
bool m_is_filtered;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Initialize 'm_is_filtered' to prevent undefined behaviour

The member variable m_is_filtered is declared but not initialized. To avoid potential undefined behaviour, initialize m_is_filtered to false in its declaration or within the constructor.

Comment on lines 228 to 233
std::vector<int> filter_levels;
unsigned int length = logLevelFilter["length"].as<unsigned int>();
filter_levels.reserve(length);
for (unsigned int i = 0; i < length; ++i) {
filter_levels.push_back(logLevelFilter[i].as<int>());
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve performance of log level filtering by using std::unordered_set

Using std::vector<int> for filter_levels and std::find to check for log levels results in O(n) lookup time for each log event. This can be inefficient for large datasets. Consider using std::unordered_set<int> to reduce the lookup time to O(1).

Apply this diff to modify filter_levels to an unordered set and update the lookup:

-std::vector<int> filter_levels;
+std::unordered_set<int> filter_levels;
 unsigned int length = logLevelFilter["length"].as<unsigned int>();
 filter_levels.reserve(length);
 for (unsigned int i = 0; i < length; ++i) {
-    filter_levels.push_back(logLevelFilter[i].as<int>());
+    filter_levels.insert(logLevelFilter[i].as<int>());
 }

-// Filter log events based on the provided log levels
 for (size_t index = 0; index < m_encoded_log_events.size(); ++index) {
     const auto& logEvent = m_encoded_log_events[index];
-    if (std::find(filter_levels.begin(), filter_levels.end(), logEvent.get_log_level()) != filter_levels.end()) {
+    if (filter_levels.count(logEvent.get_log_level()) > 0) {
         m_filtered_log_event_map.push_back(index);
     }
 }

Also applies to: 236-241

src/clp_ffi_js/ir/StreamReader.cpp Outdated Show resolved Hide resolved
Comment on lines 129 to 142
constexpr size_t cLogLevelPositionInMessages{1};
size_t log_level{cLogLevelNone};
// NOLINTNEXTLINE(readability-qualified-auto)
auto const log_level_name_it{std::find_if(
cLogLevelNames.begin() + cValidLogLevelsBeginIdx,
cLogLevelNames.end(),
[&](std::string_view level) {
return logtype.substr(cLogLevelPositionInMessages).starts_with(level);
}
)};
if (log_level_name_it != cLogLevelNames.end()) {
log_level = std::distance(cLogLevelNames.begin(), log_level_name_it);
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Ensure logtype length before calling substr to prevent out-of-range access

The code assumes that logtype has more than cLogLevelPositionInMessages characters when calling substr(cLogLevelPositionInMessages). If logtype is shorter, this could result in undefined behaviour or an out-of-range access. Consider adding a check to ensure that logtype.length() > cLogLevelPositionInMessages before performing substr.

Apply this diff to add the length check:

+if (logtype.length() > cLogLevelPositionInMessages) {
    auto const log_level_name_it{std::find_if(
        cLogLevelNames.begin() + cValidLogLevelsBeginIdx,
        cLogLevelNames.end(),
        [&](std::string_view level) {
            return logtype.substr(cLogLevelPositionInMessages).starts_with(level);
        }
    )};
    if (log_level_name_it != cLogLevelNames.end()) {
        log_level = std::distance(cLogLevelNames.begin(), log_level_name_it);
    }
+} else {
+    log_level = cLogLevelNone;
+}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
constexpr size_t cLogLevelPositionInMessages{1};
size_t log_level{cLogLevelNone};
// NOLINTNEXTLINE(readability-qualified-auto)
auto const log_level_name_it{std::find_if(
cLogLevelNames.begin() + cValidLogLevelsBeginIdx,
cLogLevelNames.end(),
[&](std::string_view level) {
return logtype.substr(cLogLevelPositionInMessages).starts_with(level);
}
)};
if (log_level_name_it != cLogLevelNames.end()) {
log_level = std::distance(cLogLevelNames.begin(), log_level_name_it);
}
constexpr size_t cLogLevelPositionInMessages{1};
size_t log_level{cLogLevelNone};
// NOLINTNEXTLINE(readability-qualified-auto)
if (logtype.length() > cLogLevelPositionInMessages) {
auto const log_level_name_it{std::find_if(
cLogLevelNames.begin() + cValidLogLevelsBeginIdx,
cLogLevelNames.end(),
[&](std::string_view level) {
return logtype.substr(cLogLevelPositionInMessages).starts_with(level);
}
)};
if (log_level_name_it != cLogLevelNames.end()) {
log_level = std::distance(cLogLevelNames.begin(), log_level_name_it);
}
} else {
log_level = cLogLevelNone;
}

return;
}

// Convert JavaScript array to C++ vector
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we tried vecFromJSArray?

Copy link
Contributor Author

@davemarco davemarco Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but I think it works. Thanks!!! This was a bit rushed so did not read any emscripten docs...

Comment on lines 250 to 259
emscripten::val results = emscripten::val::array();
for (size_t index : m_filtered_log_event_map) {
EM_ASM_(
{
Emval.toValue($0).push($1);
},
results.as_handle(),
index
);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see if this works.

Suggested change
emscripten::val results = emscripten::val::array();
for (size_t index : m_filtered_log_event_map) {
EM_ASM_(
{
Emval.toValue($0).push($1);
},
results.as_handle(),
index
);
}
emscripten::val::array(m_filtered_log_event_map);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!!! I think it works

namespace {
EMSCRIPTEN_BINDINGS(ClpIrStreamReader) {
emscripten::register_type<clp_ffi_js::ir::DataArrayTsType>("Uint8Array");
emscripten::register_type<clp_ffi_js::ir::DecodedResultsTsType>(
"Array<[string, number, number, number]>"
);
emscripten::register_type<clp_ffi_js::ir::FilteredLogEventMapType>(
"Array<number>"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use the "array-simple" convention when typing arrays with TypeScript: https://typescript-eslint.io/rules/array-type#array-simple

Suggested change
"Array<number>"
"number[]"

Comment on lines 245 to 247
m_encoded_log_events(),
m_filtered_log_event_map(),
m_is_filtered(false) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We do not need to init m_encoded_log_events and m_filtered_log_event_map given their default constructors will be invoked.
  2. Let's use the brace syntax for inits.
Suggested change
m_encoded_log_events(),
m_filtered_log_event_map(),
m_is_filtered(false) {}
m_is_filtered{false} {}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@davemarco davemarco marked this pull request as ready for review October 2, 2024 19:00
@davemarco
Copy link
Contributor Author

Completed my personal review. Thanks junhao for help with emscripten! Is it ready for review

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
src/clp_ffi_js/ir/StreamReader.hpp (1)

56-66: LGTM: New filtering methods added with appropriate documentation

The additions of get_filtered_log_event_map() and filter_log_events() methods are well-documented and align with the PR objective of adding log level filtering support. The documentation issue mentioned in a previous review has been addressed.

Consider adding a brief description of the return type for get_filtered_log_event_map() to enhance clarity.

src/clp_ffi_js/ir/StreamReader.cpp (1)

131-131: Style Suggestion: Place break; on a new line after the error log.

For better readability, it's recommended to place break; on a new line after the SPDLOG_ERROR statement.

Apply this diff to adjust the code:

- SPDLOG_ERROR("Failed to extract log type for log level parsing."); break;
+ SPDLOG_ERROR("Failed to extract log type for log level parsing.");
+ break;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 856ce4a and d2e3d1d.

📒 Files selected for processing (3)
  • src/clp_ffi_js/ir/LogViewerEvent.hpp (1 hunks)
  • src/clp_ffi_js/ir/StreamReader.cpp (7 hunks)
  • src/clp_ffi_js/ir/StreamReader.hpp (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/clp_ffi_js/ir/LogViewerEvent.hpp
🔇 Additional comments (15)
src/clp_ffi_js/ir/StreamReader.hpp (6)

6-6: LGTM: Appropriate include added

The addition of the header is consistent with the introduction of the m_filtered_log_event_map member variable. This change supports the new filtering functionality.


16-16: LGTM: Necessary include added for LogViewerEvent

The inclusion of the LogViewerEvent header file is appropriate, given the change in the type of m_encoded_log_events. This modification supports the updated event handling in the class.


21-21: LGTM: FilteredLogEventMapType declaration added

The declaration of FilteredLogEventMapType using EMSCRIPTEN_DECLARE_VAL_TYPE is appropriate. This addition supports the new filtering functionality and the get_filtered_log_event_map() method.


68-74: LGTM: New build() method added with clear documentation

The addition of the build() method, which replaces the previous deserialize_range() method, is well-documented and aligns with the changes described in the PR summary. The method's purpose and return value are clearly explained.


77-91: LGTM: decode_range() method updated to support filtering

The decode_range() method has been appropriately updated to include the use_filter parameter. The documentation clearly explains the new parameter and its effect on the method's behaviour. These changes align well with the new filtering functionality.


100-103: LGTM: Member variables updated to support new functionality

The changes to the member variables are appropriate:

  1. The type of m_encoded_log_events has been updated to LogViewerEvent, which is consistent with the new include statement and the changes described in the PR summary.
  2. The addition of m_filtered_log_event_map supports the new filtering functionality.

Note: The previous review comment about initializing m_is_filtered is no longer applicable as this variable is not present in the current version.

src/clp_ffi_js/ir/StreamReader.cpp (9)

8-8: Include <optional> header is appropriate.

The inclusion of the <optional> header is necessary for the use of std::optional in the code.


29-29: Include LogViewerEvent.hpp header is necessary.

The inclusion of LogViewerEvent.hpp is required for the LogViewerEvent class used in the code.


103-109: Implementation of get_filtered_log_event_map is correct.

The function correctly handles the case when no filter is applied by returning null and returns the filtered log event map appropriately.


111-154: build function is implemented correctly.

The function appropriately builds and stores the encoded log events, including parsing log levels and handling potential errors.


177-206: decode_range function handles filtering correctly.

The function correctly decodes log events within the specified range, taking into account whether filtering is applied.


223-223: Verify the correctness of log_event_idx + 1.

In the EM_ASM block, log_event_idx + 1 is used. Please ensure that this offset is intentional and aligns with how indices are expected in the JavaScript environment.


230-245: filter_log_events function is implemented correctly.

The function appropriately filters the log events based on the provided log levels and updates the filtered event map.


263-266: Type registration for FilteredLogEventMapType is appropriate.

The emscripten type is correctly registered as "number[]", adhering to TypeScript array conventions.


276-279: EMSCRIPTEN bindings updated for new methods.

The functions getFilteredLogEventMap, build, decodeRange, and filterLogEvents are correctly bound for JavaScript interoperability.

@davemarco davemarco requested a review from junhaoliao October 3, 2024 02:37
Copy link
Member

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly lgtm. The interfaces here are aligned with the ones in the revised JsonlDecoder in the new-log-viewer.

* @tparam encoded_variable_t The type of encoded variables in the event
*/
template <typename encoded_variable_t>
class LogViewerEvent : public clp::ir::LogEvent<encoded_variable_t> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to name this as LogEventExt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(As in we might want to be less specific about the Log Viewer which is a final product. ) If done correctly, in the future we can release clp-ffi-js as a Node.js library and the extended attributes like the log level are still useful, right?

Copy link
Member

@kirkrodrigues kirkrodrigues Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call it LogEventWithLevel? I prefer to pick a strangely specific name because eventually, once we move to the next IR format, LogEvent (which will be a set of kv-pairs) should have a field which is its level. So by picking an odd name now, I think it will hint to the reader that this code has something odd about it and they will look more into the docstring (where we can explain the situation).

src/clp_ffi_js/ir/StreamReader.cpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.cpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.cpp Outdated Show resolved Hide resolved
Comment on lines 239 to 244
for (size_t index = 0; index < m_encoded_log_events.size(); ++index) {
const auto& logEvent = m_encoded_log_events[index];
if (std::find(filter_levels.begin(), filter_levels.end(), logEvent.get_log_level()) != filter_levels.end()) {
m_filtered_log_event_map->push_back(index);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (size_t index = 0; index < m_encoded_log_events.size(); ++index) {
const auto& logEvent = m_encoded_log_events[index];
if (std::find(filter_levels.begin(), filter_levels.end(), logEvent.get_log_level()) != filter_levels.end()) {
m_filtered_log_event_map->push_back(index);
}
}
for (const auto& [logEventIdx, logEvent] : std::views::enumerate(m_encoded_log_events)) {
if (std::ranges::find(filter_levels, logEvent.get_log_level()) != filter_levels.end()) {
m_filtered_log_event_map->push_back(logEventIdx);
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverting back to previous. I dont think most people are using c++23 yet, so not sure we want to use std::views::enumerate

src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (5)
src/clp_ffi_js/ir/StreamReader.hpp (2)

61-70: Consider enhancing the comment for filter_log_events().

The new methods for filtered log events look good. However, the comment for filter_log_events() could be more descriptive. Consider updating it to provide more context about what the method does and how it affects the internal state of the StreamReader.

Suggestion:

/**
 * Generates a filtered collection of log events based on the provided log levels.
 * This method updates the internal filtered log event map.
 *
 * @param log_level_filter Array of selected log levels to filter by
 */

81-96: Consider improving comment formatting for better readability.

The updated decode_range() method and its documentation look good. The new use_filter parameter is well-explained. To improve readability, consider using a bullet list for the return value description:

/**
 * Decodes log events in the range `[beginIdx, endIdx)` of the filtered or unfiltered
 * (depending on the value of `useFilter`) log events collection.
 *
 * @param begin_idx
 * @param end_idx
 * @param use_filter Whether to decode from the filtered or unfiltered log events collection.
 * @return An array where each element is a decoded log event represented by an array of:
 * - The log event's message
 * - The log event's timestamp as milliseconds since the Unix epoch
 * - The log event's log level as an integer that indexes into `cLogLevelNames`
 * - The log event's number (1-indexed) in the stream
 * @return null if any log event in the range doesn't exist (e.g. the range exceeds the number
 * of log events in the collection).
 */

This format makes it easier to scan and understand the return value structure.

src/clp_ffi_js/ir/StreamReader.cpp (3)

Line range hint 126-184: LGTM with suggestions: Well-implemented deserialize_stream method

The deserialize_stream method is well-structured and handles error cases appropriately. However, consider the following improvements:

  1. Extract the log level parsing logic (lines 155-172) into a separate method for better readability and maintainability.
  2. Replace magic numbers like cLogLevelPositionInMessages with named constants or enum values.

Consider refactoring the log level extraction logic:

LogLevel parse_log_level(std::string_view logtype) {
    constexpr size_t cLogLevelPositionInMessages = 1;
    if (logtype.length() <= cLogLevelPositionInMessages) {
        return LogLevel::NONE;
    }
    
    auto log_level_substr = logtype.substr(cLogLevelPositionInMessages);
    auto it = std::find_if(
        cLogLevelNames.begin() + static_cast<size_t>(cValidLogLevelsBeginIdx),
        cLogLevelNames.end(),
        [&](std::string_view level) { return log_level_substr.starts_with(level); }
    );
    
    return (it != cLogLevelNames.end())
        ? static_cast<LogLevel>(std::distance(cLogLevelNames.begin(), it))
        : LogLevel::NONE;
}

Then use it in deserialize_stream:

auto log_level = parse_log_level(logtype);

Line range hint 186-231: LGTM with a minor fix: Updated decode_range method

The updated decode_range method now handles both filtered and unfiltered log event decoding, which is a good improvement. However, there's a minor issue with the index range check.

The condition 0 > begin_idx on line 198 is unnecessary since begin_idx is of type size_t (unsigned). Consider updating the condition to:

-    if (length < end_idx || 0 > begin_idx || begin_idx > end_idx) {
+    if (begin_idx > end_idx || end_idx > length) {

This change ensures that:

  1. The begin index is less than or equal to the end index.
  2. The end index doesn't exceed the available length.
  3. We don't have an unnecessary check for a negative begin index.

Line range hint 1-274: Overall assessment: Well-implemented changes with minor suggestions

The changes to StreamReader.cpp successfully implement log level filtering and improve stream deserialization. The new methods and updates to existing ones are generally well-structured and consistent. A few minor improvements have been suggested:

  1. Verify C++20 compatibility for std::ranges::find usage.
  2. Refactor log level parsing logic in deserialize_stream.
  3. Fix the index range check in decode_range.

These changes enhance the functionality of the StreamReader class while maintaining good code quality. After addressing the suggested improvements, the code will be more robust and maintainable.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between c04d887 and b3fde47.

📒 Files selected for processing (3)
  • src/clp_ffi_js/constants.hpp (1 hunks)
  • src/clp_ffi_js/ir/StreamReader.cpp (7 hunks)
  • src/clp_ffi_js/ir/StreamReader.hpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/clp_ffi_js/constants.hpp
🧰 Additional context used
🔇 Additional comments (8)
src/clp_ffi_js/ir/StreamReader.hpp (3)

20-32: LGTM: New type declarations for filtered log events.

The addition of FilteredLogEventMapTsType and the FilteredLogEventsMap alias are well-documented and consistent with the new filtering functionality. Good job on providing clear comments for the new type.


72-78: LGTM: Improved method name and documentation.

The renaming of deserialize_range() to deserialize_stream() and the updated comment provide a clear description of the method's purpose and behaviour. This change improves the overall clarity of the API.


104-104: LGTM: Updated log event type to include level information.

The change from LogEvent to LogEventWithLevel for m_encoded_log_events is consistent with the new filtering functionality. This update allows for storing log level information directly with each event, which is crucial for efficient filtering.

src/clp_ffi_js/ir/StreamReader.cpp (5)

8-8: LGTM: Addition of <optional> header

The inclusion of the <optional> header is appropriate for the new functionality, likely used for the m_filtered_log_event_map member.


28-28: LGTM: Inclusion of LogEventWithLevel.hpp

The addition of the LogEventWithLevel.hpp header is consistent with the new log level filtering functionality being implemented.


102-108: LGTM: Well-implemented get_filtered_log_event_map method

The new get_filtered_log_event_map method is well-implemented. It correctly handles the case when no filtering has been applied and efficiently converts the C++ vector to a JavaScript array using emscripten::val::array.


Line range hint 254-270: LGTM: Updated EMSCRIPTEN bindings

The EMSCRIPTEN bindings have been correctly updated to reflect the changes in the StreamReader class. The new methods are properly bound, and the removed deserializeRange binding has been appropriately deleted. The type registration for FilteredLogEventMapTsType correctly uses a union type to represent the possibility of a null value.


110-124: LGTM with a note: Well-implemented filter_log_events method

The filter_log_events method is well-implemented, correctly handling null input and efficiently converting the JavaScript array to a C++ vector. However, the use of std::ranges::find might require C++20 support.

Please verify the project's C++ standard. If it's not C++20, consider using std::find instead:

If C++20 is not used, apply this change:

-        if (std::ranges::find(filter_levels, log_event.get_log_level()) != filter_levels.end()) {
+        if (std::find(filter_levels.begin(), filter_levels.end(), log_event.get_log_level()) != filter_levels.end()) {

std::unique_ptr<StreamReaderDataContext<clp::ir::four_byte_encoded_variable_t>>
m_stream_reader_data_context;
FilteredLogEventsMap m_filtered_log_event_map;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Consider initializing m_filtered_log_event_map.

The addition of m_filtered_log_event_map is consistent with the new filtering functionality. However, to prevent potential undefined behaviour, consider initializing this member variable:

FilteredLogEventsMap m_filtered_log_event_map{};

This ensures that the optional starts in a well-defined state (containing no value) when a StreamReader object is created.

kirkrodrigues
kirkrodrigues previously approved these changes Oct 10, 2024
Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved pending testing with log viewer and will want to discuss PR title.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (4)
src/clp_ffi_js/ir/StreamReader.cpp (4)

102-108: LGTM: New method for retrieving filtered log event map

The implementation of get_filtered_log_event_map() looks good. It correctly handles the case when no filtering has been applied by returning null.

Consider using std::move when returning the array to potentially avoid unnecessary copying:

return FilteredLogEventMapTsType{emscripten::val::array(std::move(m_filtered_log_event_map.value()))};

This change might improve performance, especially for large filtered maps.


110-124: LGTM with suggestions: New method for filtering log events

The implementation of filter_log_events() looks good overall. It correctly handles null input and implements the filtering logic as expected.

Consider the following optimizations:

  1. Use std::unordered_set instead of std::vector for filter_levels to improve lookup performance from O(n) to O(1):
auto filter_levels = std::unordered_set<std::uint8_t>(
    emscripten::vecFromJSArray<std::uint8_t>(log_level_filter).begin(),
    emscripten::vecFromJSArray<std::uint8_t>(log_level_filter).end()
);
  1. Replace std::ranges::find with filter_levels.contains():
if (filter_levels.contains(static_cast<std::uint8_t>(log_event.get_log_level()))) {

These changes could significantly improve performance for large datasets.

Also, ensure that your project is set to use C++20, as std::ranges::find requires it.


Line range hint 126-184: LGTM with suggestions: New method for deserializing the stream

The implementation of deserialize_stream() looks good overall. It correctly handles various error cases and implements the deserialization logic as expected.

Consider the following improvements:

  1. Extract the log level parsing logic into a separate method for better readability and maintainability:
LogLevel parse_log_level(std::string_view logtype) {
    // ... (move the existing log level parsing logic here)
}
  1. Define constants for magic numbers:
constexpr size_t cLogLevelPositionInMessages = 1;
constexpr size_t cDefaultReservedMessageLength = 512;
  1. Consider using std::optional<LogLevel> for the log_level variable to make it clear that it might not have a valid value.

These changes would improve the code's readability and maintainability.


265-270: LGTM with a suggestion: Added new function bindings

The addition of new function bindings for getFilteredLogEventMap, filterLogEvents, and deserializeStream is correct and consistent with the new methods added to the StreamReader class.

Consider reordering the bindings to match the order of the methods in the class declaration. This can improve readability and make it easier to maintain consistency between the class declaration and the bindings.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between b3fde47 and 7564280.

📒 Files selected for processing (1)
  • src/clp_ffi_js/ir/StreamReader.cpp (7 hunks)
🧰 Additional context used
🔇 Additional comments (4)
src/clp_ffi_js/ir/StreamReader.cpp (4)

Line range hint 8-28: LGTM: New includes added to support enhanced functionality

The addition of these new includes is appropriate for the implementation of log level filtering and the restructuring of the StreamReader class. The use of suggests good practices for handling nullable values.


206-214: LGTM: Updated loop logic in decode_range

The changes to the loop in decode_range() look good. The code correctly handles both filtered and unfiltered cases based on the use_filter parameter. The logic is clear and easy to understand, making it maintainable for future developers.


Line range hint 221-231: LGTM: Updated message handling and output in decode_range

The changes to the message handling and EM_ASM call in decode_range() look good. The addition of log_event.get_log_level() to the output is consistent with the new filtering functionality and provides valuable information to the JavaScript side.


254-255: LGTM: Added registration for FilteredLogEventMapTsType

The addition of the registration for FilteredLogEventMapTsType is correct and necessary for the Emscripten bindings. The type is properly defined as "number[] | null", which accurately represents the possible return values of the get_filtered_log_event_map() method.

src/clp_ffi_js/ir/StreamReader.cpp Show resolved Hide resolved
@davemarco
Copy link
Contributor Author

I had to fix a small thing (you can see change in commit 7564280 ), but then tested again and all looks good. I suggest updating the version and publishing to npm? I dont know if i need permissions or something to do this?

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (2)
src/clp_ffi_js/ir/StreamReader.cpp (2)

210-215: Avoid Unnecessary Initialization of log_event_idx

You can eliminate the redundant initialization of log_event_idx by declaring it within each branch:

size_t log_event_idx;
if (use_filter) {
    log_event_idx = m_filtered_log_event_map->at(i);
} else {
    log_event_idx = i;
}

This approach enhances code clarity by defining variables closer to their point of use.


184-184: Simplify Resetting the Unique Pointer

Calling reset(nullptr) on std::unique_ptr is redundant. You can simplify it by using:

m_stream_reader_data_context.reset();

This change makes the code cleaner and more idiomatic.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between b3fde47 and 7d01cf2.

📒 Files selected for processing (1)
  • src/clp_ffi_js/ir/StreamReader.cpp (7 hunks)
🧰 Additional context used
🔇 Additional comments (3)
src/clp_ffi_js/ir/StreamReader.cpp (3)

117-117: Use LogLevel for Type Safety and Clarity

Consider using LogLevel instead of std::uint8_t when extracting filter levels to enhance type safety and code clarity:

auto filter_levels{emscripten::vecFromJSArray<LogLevel>(log_level_filter)};

This change ensures consistency in the usage of LogLevel throughout your code.


120-122: Ensure Compatibility with Project's C++ Standard

The use of std::ranges::find requires C++20 support. If your project is not compiled with C++20, consider replacing it with std::find:

if (std::find(filter_levels.begin(), filter_levels.end(), log_level.get_log_level())
    != filter_levels.end())
{
    m_filtered_log_event_map->emplace_back(log_event_idx);
}

This ensures compatibility with earlier C++ standards.


232-234: Verify Data Types Passed to JavaScript

Ensure that the data types passed in the EM_ASM block match the expected JavaScript types. Specifically, confirm that log_event.get_log_level() returns a value compatible with JavaScript number types and that incrementing log_event_idx + 1 aligns with your indexing scheme.

Comment on lines +200 to 202
if (length < end_idx || begin_idx > end_idx) {
return DecodedResultsTsType{emscripten::val::null()};
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Correct the Index Range Check in decode_range

The condition if (length < end_idx || begin_idx > end_idx) may lead to unexpected behaviour. To ensure proper bounds checking, consider updating the condition:

if (begin_idx >= end_idx || end_idx > length) {
    return DecodedResultsTsType{emscripten::val::null()};
}

This change verifies that begin_idx is less than end_idx and that end_idx does not exceed the total length, preventing potential out-of-bounds errors.

Comment on lines +128 to +134
auto StreamReader::deserialize_stream() -> size_t {
if (nullptr == m_stream_reader_data_context) {
return m_encoded_log_events.size();
}

constexpr size_t cDefaultNumReservedLogEvents{500'000};
m_encoded_log_events.reserve(cDefaultNumReservedLogEvents);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Check for Potential Memory Allocation Issues

Reserving space for 500,000 log events with m_encoded_log_events.reserve(cDefaultNumReservedLogEvents); could lead to high memory usage. Consider assessing typical usage scenarios to determine if this number is appropriate or if a smaller initial capacity would suffice.

Comment on lines +157 to +174
auto const& logtype = message.get_logtype();
constexpr size_t cLogLevelPositionInMessages{1};
LogLevel log_level{LogLevel::NONE};
if (logtype.length() > cLogLevelPositionInMessages) {
// NOLINTNEXTLINE(readability-qualified-auto)
auto const log_level_name_it{std::find_if(
cLogLevelNames.begin() + static_cast<size_t>(cValidLogLevelsBeginIdx),
cLogLevelNames.end(),
[&](std::string_view level) {
return logtype.substr(cLogLevelPositionInMessages).starts_with(level);
}
)};
if (log_level_name_it != cLogLevelNames.end()) {
log_level = static_cast<LogLevel>(
std::distance(cLogLevelNames.begin(), log_level_name_it)
);
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Handle Empty or Short logtype Correctly

In the log level extraction logic, ensure that logtype with length equal to cLogLevelPositionInMessages is handled appropriately. Currently, if logtype is exactly one character long, logtype.substr(cLogLevelPositionInMessages) results in an empty string, which may not be intended. Consider adjusting the condition:

if (logtype.length() >= cLogLevelPositionInMessages) {
    // Existing code...
}

This change ensures that even log types of minimal valid length are processed correctly.

@davemarco davemarco changed the title Add support for log level filtering. Move log level parsing from decode method to deserialization method to support log level filtering in log viewer. Oct 10, 2024
@davemarco davemarco changed the title Move log level parsing from decode method to deserialization method to support log level filtering in log viewer. Move log level parsing from decode method to deserialization method; Add new filtering methods. Oct 10, 2024
Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title and commit body, how about:

Add log-level filtering methods and associated refactoring:
- Move log-level parsing from the decode method to the deserialization method.
- Remove unused range parameters from the deserialize method.

src/clp_ffi_js/ir/StreamReader.cpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.cpp Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
src/clp_ffi_js/ir/StreamReader.cpp (4)

102-108: LGTM: New get_filtered_log_event_map method

The method correctly handles both filtered and unfiltered cases, returning null when no filtering has been applied. The use of emscripten::val::array is appropriate for JavaScript interoperability.

Consider using std::move when returning m_filtered_log_event_map to avoid unnecessary copying:

-    return FilteredLogEventMapTsType{emscripten::val::array(m_filtered_log_event_map.value())};
+    return FilteredLogEventMapTsType{emscripten::val::array(std::move(m_filtered_log_event_map.value()))};

This change could potentially improve performance, especially for large filtered maps.


110-130: LGTM: New filter_log_events method with a suggestion

The method effectively implements log level filtering. The use of std::ranges::find is a nice modern C++ feature. However, consider using std::unordered_set instead of std::vector for filter_levels to improve lookup performance:

-    auto filter_levels{emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter
-    )};
+    auto filter_levels{std::unordered_set<std::underlying_type_t<LogLevel>>(
+        emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter).begin(),
+        emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter).end()
+    )};

-        if (std::ranges::find(
-                    filter_levels,
-                    clp::enum_to_underlying_type(log_event.get_log_level())
-            )
-            != filter_levels.end())
+        if (filter_levels.count(clp::enum_to_underlying_type(log_event.get_log_level())) > 0)

This change would reduce the time complexity of the level lookup from O(n) to O(1), where n is the number of filter levels.


Line range hint 132-190: LGTM: New deserialize_stream method with a suggestion

The method effectively deserializes the stream and extracts log levels. Error handling is comprehensive. Consider using a lookup table for log level names to improve performance:

static const std::unordered_map<std::string_view, LogLevel> logLevelMap = {
    {"TRACE", LogLevel::TRACE},
    {"DEBUG", LogLevel::DEBUG},
    // ... other levels ...
};

// In the log level extraction logic:
auto it = logLevelMap.find(logtype.substr(cLogLevelPositionInMessages));
if (it != logLevelMap.end()) {
    log_level = it->second;
}

This change could potentially improve the performance of log level extraction, especially for large numbers of log events.


192-220: LGTM: Updated decode_range method with a suggestion

The changes to decode_range effectively handle both filtered and unfiltered cases. The index range check is correct. Consider using a ternary operator to simplify the log event index selection:

-        size_t log_event_idx{0};
-        if (use_filter) {
-            log_event_idx = m_filtered_log_event_map->at(i);
-        } else {
-            log_event_idx = i;
-        }
+        size_t log_event_idx = use_filter ? m_filtered_log_event_map->at(i) : i;

This change would make the code more concise without sacrificing readability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 7d01cf2 and bd29b8f.

📒 Files selected for processing (1)
  • src/clp_ffi_js/ir/StreamReader.cpp (7 hunks)
🧰 Additional context used
🔇 Additional comments (3)
src/clp_ffi_js/ir/StreamReader.cpp (3)

8-8: LGTM: Addition of <optional> header

The inclusion of the <optional> header is appropriate, likely to support the new filtering functionality in the StreamReader class.


28-28: LGTM: Inclusion of LogEventWithLevel.hpp

The addition of this header is consistent with the new functionality for handling log levels in the StreamReader class.


Line range hint 260-276: LGTM: Updated EMSCRIPTEN bindings

The new bindings correctly expose the added methods to JavaScript. The type registration for FilteredLogEventMapTsType as "number[] | null" is appropriate, allowing for the case when no filtering has been applied.

@kirkrodrigues
Copy link
Member

I suggest updating the version and publishing to npm? I dont know if i need permissions or something to do this?

Can you submit a PR after this one to bump the version to 0.2.0?

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
src/clp_ffi_js/ir/StreamReader.cpp (3)

112-132: LGTM with a suggestion: Well-implemented filtering method with room for optimization.

The filter_log_events method correctly implements the filtering logic. It properly handles the case of clearing the filter when the input is null. The use of emscripten::vecFromJSArray for converting the JavaScript array to a C++ vector is appropriate.

However, there's room for performance improvement:

Consider using std::unordered_set instead of std::vector for filter_levels. This would change the time complexity of the level lookup from O(n) to O(1), where n is the number of filter levels. Here's a suggested implementation:

void StreamReader::filter_log_events(emscripten::val const& log_level_filter) {
    if (log_level_filter.isNull()) {
        m_filtered_log_event_map.reset();
        return;
    }

    m_filtered_log_event_map.emplace();
    auto filter_levels_vec = emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter);
    std::unordered_set<std::underlying_type_t<LogLevel>> filter_levels(filter_levels_vec.begin(), filter_levels_vec.end());

    for (size_t log_event_idx = 0; log_event_idx < m_encoded_log_events.size(); ++log_event_idx) {
        auto const& log_event = m_encoded_log_events[log_event_idx];
        if (filter_levels.count(clp::enum_to_underlying_type(log_event.get_log_level())) > 0) {
            m_filtered_log_event_map->emplace_back(log_event_idx);
        }
    }
}

This change would be particularly beneficial for larger datasets or when there are many filter levels.


Line range hint 134-191: LGTM with a suggestion: Well-implemented deserialization method.

The deserialize_stream method correctly implements the deserialization logic for log events. The extraction of log levels from the logtype is done carefully, considering potential edge cases.

However, there's a point to consider:

The method reserves space for 500,000 log events initially with m_encoded_log_events.reserve(cDefaultNumReservedLogEvents). While this can improve performance by reducing reallocations, it might be excessive for some use cases and could lead to unnecessary memory usage.

Consider making this value configurable or adjustable based on expected input size. Alternatively, you could implement a more dynamic allocation strategy, starting with a smaller reservation and growing as needed.

For example:

constexpr size_t cInitialReservedLogEvents{10'000};
constexpr size_t cGrowthFactor{2};
size_t current_capacity = cInitialReservedLogEvents;
m_encoded_log_events.reserve(current_capacity);

// In the deserialization loop:
if (m_encoded_log_events.size() == current_capacity) {
    current_capacity *= cGrowthFactor;
    m_encoded_log_events.reserve(current_capacity);
}

This approach would start with a smaller memory footprint and grow as needed, potentially being more memory-efficient for smaller log files while still providing good performance for larger ones.


194-222: LGTM with a suggestion: Good implementation of filtered decoding.

The changes to the decode_range method correctly implement support for both filtered and unfiltered log events. The addition of the use_filter parameter and the logic to handle both cases are well done.

However, there's a small improvement that could be made:

The index range check could be more explicit. Consider updating the condition:

if (begin_idx >= end_idx || end_idx > length) {
    return DecodedResultsTsType{emscripten::val::null()};
}

This change makes it clearer that:

  1. The begin index should be less than the end index.
  2. The end index should not exceed the available length.

This slight modification improves readability and makes the bounds checking more explicit.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between bd29b8f and a244f5e.

📒 Files selected for processing (1)
  • src/clp_ffi_js/ir/StreamReader.cpp (6 hunks)
🧰 Additional context used
🔇 Additional comments (5)
src/clp_ffi_js/ir/StreamReader.cpp (5)

8-8: LGTM: New includes are appropriate for the added functionality.

The addition of <optional>, <type_traits>, and <clp/type_utils.hpp> headers is consistent with the new features implemented in this file. These includes support the use of std::optional, std::underlying_type_t, and clp::enum_to_underlying_type respectively.

Also applies to: 12-12, 23-23


104-110: LGTM: Well-implemented getter for filtered log event map.

The get_filtered_log_event_map method correctly handles both cases: when a filtered map exists and when it doesn't. The use of emscripten::val::array for returning the result to JavaScript is appropriate. The method provides a clean interface for accessing the filtered log events.


Line range hint 215-241: LGTM: Excellent implementation of filtered decoding loop.

The changes to the decoding loop in decode_range are well-implemented. The method now correctly handles both filtered and unfiltered cases, using the appropriate index based on whether filtering is enabled. The inclusion of the log level in the output array is consistent with the new filtering functionality and provides valuable information to the caller.

The logic for selecting the correct log event index is clear and concise:

size_t log_event_idx{0};
if (use_filter) {
    log_event_idx = m_filtered_log_event_map->at(i);
} else {
    log_event_idx = i;
}

This implementation ensures that the correct log events are decoded, whether filtering is used or not.


Line range hint 262-278: LGTM: Appropriate updates to EMSCRIPTEN bindings.

The changes to the EMSCRIPTEN bindings correctly expose the new functionality to JavaScript:

  1. The registration of FilteredLogEventMapTsType as "number[] | null" is appropriate, accurately representing the possible return types of the get_filtered_log_event_map method.

  2. The addition of new functions to the bindings (getFilteredLogEventMap, filterLogEvents, and deserializeStream) correctly exposes the new methods of the StreamReader class to JavaScript.

These updates ensure that the new filtering and deserialization capabilities are accessible from the JavaScript side, maintaining consistency between the C++ implementation and its JavaScript interface.


Line range hint 1-281: Overall assessment: Well-implemented filtering functionality with minor suggestions for improvement.

The changes to StreamReader.cpp successfully implement log level filtering functionality and update existing methods to support it. The new methods get_filtered_log_event_map, filter_log_events, and deserialize_stream are well-designed and correctly implemented. The modifications to decode_range appropriately handle both filtered and unfiltered cases.

Key strengths:

  1. Correct implementation of filtering logic.
  2. Proper handling of edge cases (e.g., null filter input).
  3. Consistent updates to EMSCRIPTEN bindings.

Suggestions for improvement:

  1. Consider using std::unordered_set in filter_log_events for better performance.
  2. Evaluate the initial reservation size in deserialize_stream for potential memory optimization.
  3. Make the index range check in decode_range more explicit.

These changes significantly enhance the functionality of the StreamReader class while maintaining good code quality. The suggested improvements are minor and aimed at optimizing performance and clarity.

@davemarco
Copy link
Contributor Author

Sure I can do that. I just made your changes and tested again. All looks good

@davemarco davemarco changed the title Move log level parsing from decode method to deserialization method; Add new filtering methods. Add log-level filtering methods and associated refactoring: Oct 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants