Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split StreamReader into an interface and implementation to prepare for adding another IR stream reader. #26

Merged
merged 42 commits into from
Nov 7, 2024

Conversation

davemarco
Copy link
Contributor

@davemarco davemarco commented Oct 31, 2024

Description

Extracts a base class for IRv1 and IRv2 stream readers to inherit from. Since IRv2 is not yet supported by clp-ffi-js, the repo now includes a base class and a derived class for IRv1.

Code was originally written by Junhao. He made these changes

  • Rename ClpIrStreamReader to ClpIRStreamReader in bindings.
  • Add get_version to decoding methods.
  • Extract a StreamReader base class from IRStreamReader.
  • Implement factory method StreamReader::create to perform version checking before creating derived classes.

I merged main afterwards, and made small changes like exporting base class methods instead of derived class.

Validation performed

Build and test functionality in log viewer (opening a few files and filtering). FYI since the exported class name changed, we will need to update log viewer when this is re-released.

Summary by CodeRabbit

  • New Features

    • Introduced the IrStreamReader class for reading and processing IR streams.
    • Added methods for filtering log events, deserializing streams, and decoding log event ranges.
    • New function to retrieve version information from a reader interface.
  • Bug Fixes

    • Enhanced error handling during stream deserialization and logging.
  • Refactor

    • Significant restructuring of the StreamReader class to improve memory management and extensibility.
  • Documentation

    • Updated documentation to reflect changes in class names and functionalities.

Summary by CodeRabbit

  • New Features

    • Introduced the UnstructuredIrStreamReader class for handling unstructured IR streams, including methods for filtering log events and deserializing data.
    • Enhanced error handling and logging capabilities in the stream processing.
  • Refactor

    • Major restructuring of the StreamReader class, focusing on smart pointer management and interface simplification.
    • Removed several outdated methods to streamline functionality.
  • Chores

    • Updated build configuration to include new source files and reflect changes in project structure.
    • Updated clang-format configuration to recognize additional library headers.

Copy link

coderabbitai bot commented Oct 31, 2024

Walkthrough

The pull request introduces significant changes to the StreamReader class and related components in the clp_ffi_js namespace. Key modifications include updating the create method to return a std::unique_ptr<StreamReader>, enhancing memory management. Multiple methods have been removed or refactored, and new methods have been added to improve error handling and encoding validation. Additionally, a new class, UnstructuredIrStreamReader, has been introduced to handle unstructured IR streams. The changes reflect a substantial restructuring aimed at improving the class's functionality and interface.

Changes

File Path Change Summary
src/clp_ffi_js/ir/StreamReader.cpp Updated create method return type to std::unique_ptr<StreamReader>. Removed methods: get_num_events_buffered, get_filtered_log_event_map, filter_log_events, deserialize_stream, decode_range. Added new methods for version checking and encoding validation.
src/clp_ffi_js/ir/StreamReader.hpp Changed create method return type to std::unique_ptr<StreamReader>. Made several methods virtual/pure virtual. Modified constructor to be protected. Added new type alias and constant. Introduced two new static methods.
src/clp_ffi_js/ir/decoding_methods.cpp Removed implementation of rewind_reader_and_validate_encoding_type.
src/clp_ffi_js/ir/decoding_methods.hpp Removed header file decoding_methods.hpp which declared rewind_reader_and_validate_encoding_type.
CMakeLists.txt Updated CLP_FFI_JS_SRC_MAIN to replace decoding_methods.cpp with UnstructuredIrStreamReader.cpp.
src/submodules/clp Updated submodule reference from 86299ca2907565e09cb10c2ddd3661ad1ceb6cb0 to e1f3f2abe3473324b19d66e22c182ec3ac0d408f.
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp Added new file implementing UnstructuredIrStreamReader class with methods for reading, filtering, and deserializing unstructured IR streams.
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp Added new header file for UnstructuredIrStreamReader class, including methods for handling log events and decoding streams.

Possibly related PRs

Suggested reviewers

  • junhaoliao
  • kirkrodrigues

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@davemarco davemarco marked this pull request as draft October 31, 2024 20:17
@davemarco davemarco changed the title WIP(Replaces #22)- Rename StreamReader -> IrStreamReader; Extract a StreamReader base class from IrStreamReader. WIP(Replaces #23)- Rename StreamReader -> IrStreamReader; Extract a StreamReader base class from IrStreamReader. Oct 31, 2024
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (5)
src/clp_ffi_js/ir/decoding_methods.hpp (1)

9-9: Add documentation for the get_version function.

As this is part of the public API, please add documentation explaining:

  • The function's purpose
  • Expected return value format
  • Any exceptions that might be thrown
  • Any preconditions or side effects
src/clp_ffi_js/ir/StreamReaderDataContext.hpp (1)

Line range hint 18-18: Class name needs to be updated to match the refactoring

The class name should be IrStreamReaderDataContext to maintain consistency with the renaming changes mentioned in the PR objectives and summary. This aligns with the pattern of renaming StreamReader to IrStreamReader.

Apply this diff to update the class name:

-class StreamReaderDataContext {
+class IrStreamReaderDataContext {
src/clp_ffi_js/ir/decoding_methods.cpp (2)

3-8: Consider reorganizing include directives

The includes should be grouped into three distinct blocks separated by blank lines:

  1. System headers (, , etc.)
  2. Third-party headers (nlohmann/json)
  3. Project headers (clp/*)
+// System headers
 #include <cstdint>
-#include <json/single_include/nlohmann/json.hpp>
 #include <string>
 #include <string_view>
 #include <vector>

+// Third-party headers
+#include <json/single_include/nlohmann/json.hpp>
+#include <spdlog/spdlog.h>
+
+// Project headers
 #include <clp/ErrorCode.hpp>

Also applies to: 11-11, 14-14


24-41: Enhance error reporting for preamble deserialization

The error handling for preamble deserialization could be more informative.

     if (clp::ffi::ir_stream::IRErrorCode::IRErrorCode_Success != deserialize_preamble_result) {
         SPDLOG_CRITICAL(
-                "Failed to deserialize preamble for version reading: {}",
+                "Failed to deserialize IR stream preamble: error_code={}, metadata_size={}",
-                deserialize_preamble_result
+                deserialize_preamble_result,
+                metadata_bytes.size()
         );
         throw ClpFfiJsException{
                 clp::ErrorCode::ErrorCode_Failure,
                 __FILENAME__,
                 __LINE__,
-                "Failed to deserialize preamble for version reading."
+                "Failed to deserialize IR stream preamble. See logs for details."
         };
     }
src/clp_ffi_js/ir/IrStreamReader.cpp (1)

193-201: Ensure safe handling of message content in EM_ASM

When passing message.c_str() to EM_ASM, there's a potential risk if the message contains unexpected characters or is not properly null-terminated. To enhance security and stability, consider validating or sanitizing the message before passing it to JavaScript.

You might sanitize the message or ensure it meets expected format constraints before usage.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9e82372 and ad306f2.

📒 Files selected for processing (7)
  • src/clp_ffi_js/ir/IrStreamReader.cpp (1 hunks)
  • src/clp_ffi_js/ir/IrStreamReader.hpp (1 hunks)
  • src/clp_ffi_js/ir/StreamReader.cpp (2 hunks)
  • src/clp_ffi_js/ir/StreamReader.hpp (3 hunks)
  • src/clp_ffi_js/ir/StreamReaderDataContext.hpp (1 hunks)
  • src/clp_ffi_js/ir/decoding_methods.cpp (1 hunks)
  • src/clp_ffi_js/ir/decoding_methods.hpp (1 hunks)
🔇 Additional comments (18)
src/clp_ffi_js/ir/decoding_methods.hpp (3)

1-7: LGTM! Header organization is clean and follows best practices.

The header guards and includes are well-organized and contain only the necessary dependencies.


8-8: LGTM! Namespace declaration is appropriate.

The nested namespace follows modern C++ conventions and matches the project structure.


Line range hint 10-17: Consider separating the rewind and validation responsibilities.

The function name suggests it performs two distinct operations: rewinding and validation. Consider splitting these into separate functions to improve modularity and reusability:

  1. rewind_reader()
  2. validate_encoding_type()

This would better follow the Single Responsibility Principle and make the code more maintainable.

Additionally, since this is a work in progress, we should verify the exception handling across the codebase:

src/clp_ffi_js/ir/StreamReaderDataContext.hpp (2)

14-14: Documentation update looks good, eh!

The documentation has been properly updated to reference IrStreamReader instead of StreamReader.


Line range hint 1-57: Verify the impact of class renaming

The class implementation looks solid with proper memory management and RAII principles. However, we should verify that all references to this class are updated consistently.

Let's check for any references to this class that need updating:

✅ Verification successful

Let me check the content of IrStreamReader files to ensure the class name is used consistently and there are no mismatches.


No issues found with StreamReaderDataContext class references

After reviewing the implementation files, all references to StreamReaderDataContext are consistent and properly used throughout the codebase. The class is:

  • Correctly included in IrStreamReader.hpp
  • Properly used as a template class with four_byte_encoded_variable_t
  • Consistently used in both the header and implementation files
  • Appropriately managed with smart pointers and move semantics
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all references to StreamReaderDataContext that need updating

# Search for direct references to StreamReaderDataContext
echo "Checking for direct references to StreamReaderDataContext:"
rg -l "StreamReaderDataContext"

# Search for template usage patterns
echo -e "\nChecking for template usage patterns:"
rg "StreamReaderDataContext<"

Length of output: 1083


Script:

#!/bin/bash
# Check the content of IrStreamReader files
echo "Content of IrStreamReader.hpp:"
cat src/clp_ffi_js/ir/IrStreamReader.hpp

echo -e "\nContent of IrStreamReader.cpp:"
cat src/clp_ffi_js/ir/IrStreamReader.cpp

Length of output: 14360

src/clp_ffi_js/ir/decoding_methods.cpp (2)

Line range hint 57-82: LGTM! Existing implementation is robust

The function properly handles reader positioning and encoding type validation with appropriate error handling.


20-55: 🛠️ Refactor suggestion

Add defensive JSON validation and improve error handling

The implementation looks solid but could benefit from some improvements:

Consider these improvements:

  1. Add JSON structure validation:
     nlohmann::json const metadata = nlohmann::json::parse(metadata_view);
+    if (!metadata.is_object()) {
+        throw ClpFfiJsException{
+            clp::ErrorCode::ErrorCode_MetadataCorrupted,
+            __FILENAME__,
+            __LINE__,
+            "Metadata is not a valid JSON object."
+        };
+    }
+    if (!metadata.contains(clp::ffi::ir_stream::cProtocol::Metadata::VersionKey)) {
+        throw ClpFfiJsException{
+            clp::ErrorCode::ErrorCode_MetadataCorrupted,
+            __FILENAME__,
+            __LINE__,
+            "Version information missing from metadata."
+        };
+    }
  1. Add more detailed logging:
-    SPDLOG_INFO("The version is {}", version);
+    SPDLOG_INFO("Successfully extracted IR stream version: {}", version);
src/clp_ffi_js/ir/StreamReader.cpp (2)

6-6: Ensure C++20 support for <format> and std::format.

The inclusion of <format> and the usage of std::format in line 51 rely on C++20 features. Please verify that the project's build configuration and compilers are set to support C++20 to prevent potential compilation issues.

Also applies to: 51-51


64-68: Confirm class binding consistency in EMSCRIPTEN bindings.

In the EMSCRIPTEN_BINDINGS section, the class is registered as ClpStreamReader while binding clp_ffi_js::ir::StreamReader. Since there's an introduction of IrStreamReader, please ensure that the class binding accurately reflects the intended class hierarchy and that the JavaScript interface is correctly updated.

src/clp_ffi_js/ir/StreamReader.hpp (4)

30-31: Use std::unique_ptr for proper memory management in create method

Returning a std::unique_ptr<StreamReader> from the static create method enhances memory safety by ensuring exclusive ownership and automatic resource cleanup.


34-34: Declare virtual destructor in base class

Adding a virtual destructor to the StreamReader base class ensures that destructors of derived classes are called correctly, preventing potential resource leaks.


49-49: Convert member functions to pure virtual

Declaring methods as pure virtual functions (= 0) enforces that derived classes implement these interfaces, promoting polymorphism and ensuring consistent behaviour across implementations.

Also applies to: 54-54, 61-61, 69-69, 86-87


89-90: Make constructor protected to prevent direct instantiation

Moving the constructor to the protected section prevents direct instantiation of the StreamReader class, indicating its role as an abstract base class intended for inheritance.

src/clp_ffi_js/ir/IrStreamReader.hpp (5)

1-2: Header guard is correctly defined

The header guard CLP_FFI_JS_IR_IR_STREAM_READER_HPP is properly set up to prevent multiple inclusions.


27-27: Verify the necessity of 'friend StreamReader;' declaration

The friend StreamReader; declaration may not be necessary unless StreamReader requires access to the private members of IrStreamReader. Consider removing it if it's not needed.


45-56: Properly managing copy and move semantics

Disabling the copy constructor and copy assignment operator, while correctly handling the move constructor and deleting the move assignment operator, ensures proper resource management.


103-105: Efficient use of move semantics in constructor

Utilizing move semantics in the constructor to initialize stream_reader_data_context optimizes resource allocation.


113-118: Appropriate management of member variables

The use of smart pointers and standard containers for member variables ensures proper memory management and exception safety.

src/clp_ffi_js/ir/StreamReader.cpp Outdated Show resolved Hide resolved
*
* @param log_level_filter Array of selected log levels
*/
void filter_log_events(emscripten::val const& log_level_filter) override;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Ensure appropriate passing of 'emscripten::val'

Consider passing emscripten::val by value rather than by const reference in filter_log_events. Since emscripten::val is a lightweight handle to a JavaScript value, passing it by value might be more suitable and aligns with common practices.

Comment on lines 248 to 251
.constructor(
&clp_ffi_js::ir::IrStreamReader::create,
emscripten::return_value_policy::take_ownership()
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Use return_value_policy::move for efficiency

In the Emscripten bindings, the constructor uses return_value_policy::take_ownership, which is typically used for pointers. Since IrStreamReader::create returns by value and supports move semantics, it's more appropriate to use return_value_policy::move to avoid unnecessary copies.

Apply this diff to adjust the return value policy:

             .constructor(
                     &clp_ffi_js::ir::IrStreamReader::create,
-                    emscripten::return_value_policy::take_ownership()
+                    emscripten::return_value_policy::move
             )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
.constructor(
&clp_ffi_js::ir::IrStreamReader::create,
emscripten::return_value_policy::take_ownership()
)
.constructor(
&clp_ffi_js::ir::IrStreamReader::create,
emscripten::return_value_policy::move
)

Comment on lines 183 to 189

auto const parsed{log_event.get_message().decode_and_unparse()};
if (false == parsed.has_value()) {
SPDLOG_ERROR("Failed to decode message.");
break;
}
message = parsed.value();
Copy link

@coderabbitai coderabbitai bot Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Handle message decoding failures explicitly

In the decode_range method, when message decoding fails, the code logs an error and breaks the loop. This may result in partial results without informing the caller of the failure. Consider throwing an exception to notify the caller about the decoding issue.

Apply this diff to throw an exception upon decoding failure:

     auto const parsed{log_event.get_message().decode_and_unparse()};
     if (false == parsed.has_value()) {
         SPDLOG_ERROR("Failed to decode message.");
-        break;
+        throw ClpFfiJsException{
+            clp::ErrorCode::ErrorCode_Corrupt,
+            __FILENAME__,
+            __LINE__,
+            "Failed to decode message"
+        };
     }
     message = parsed.value();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto const parsed{log_event.get_message().decode_and_unparse()};
if (false == parsed.has_value()) {
SPDLOG_ERROR("Failed to decode message.");
break;
}
message = parsed.value();
auto const parsed{log_event.get_message().decode_and_unparse()};
if (false == parsed.has_value()) {
SPDLOG_ERROR("Failed to decode message.");
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Corrupt,
__FILENAME__,
__LINE__,
"Failed to decode message"
};
}
message = parsed.value();

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add this. Messages decoding is not expected to fail unless the file has been corrupted / the deserializer in CLP core does not handle edge cases.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, understood. Throwing an exception upon message decoding failure is appropriate here since message decoding failures are not expected unless the file has been corrupted or the deserializer in CLP core does not handle edge cases. I'll note this for future reference.


✏️ Learnings added
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:162-168
Timestamp: 2024-11-04T19:55:26.917Z
Learning: In the `IrStreamReader::decode_range` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, message decoding failures are not expected unless the file is corrupted or the deserializer in CLP core does not handle edge cases, so it's appropriate to throw an exception upon message decoding failure.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

Comment on lines 79 to 80
auto filter_levels{emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter
)};
Copy link

@coderabbitai coderabbitai bot Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add error handling for invalid log_level_filter input

Currently, the function assumes that log_level_filter is a valid array. To prevent potential errors if a non-array is passed from JavaScript, consider adding a type check and throwing an exception or returning an error if the input is invalid.

Apply this diff to enhance input validation:

 void IrStreamReader::filter_log_events(emscripten::val const& log_level_filter) {
     if (log_level_filter.isNull()) {
         m_filtered_log_event_map.reset();
         return;
     }
+    if (!log_level_filter.isArray()) {
+        throw ClpFfiJsException{
+            clp::ErrorCode::ErrorCode_BadParam,
+            __FILENAME__,
+            __LINE__,
+            "Expected an array for log_level_filter"
+        };
+    }

     m_filtered_log_event_map.emplace();
     auto filter_levels{emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter
     )};

Committable suggestion was skipped due to low confidence.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for now this is skippable since the wrapper code is typed with TypeScript and any usage with non-array would fail static checks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, understood. Since the wrapper code is typed with TypeScript and non-array inputs would fail static checks, additional error handling here is not necessary.


✏️ Learnings added
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.367Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually sorry for my oversight, I think we want to define a type (similar to FilteredLogEventMapTsType) instead of directly using emscripten::val here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, thank you for the clarification. Defining a specific type for log_level_filter will enhance type safety and improve the code's readability. Here's how you might define a type similar to FilteredLogEventMapTsType:

using LogLevelFilterTsType = emscripten::val; // Define a type representing the log level filter

void IrStreamReader::filter_log_events(LogLevelFilterTsType const& log_level_filter) {
    // Implementation remains the same
}

This approach makes the expected input type explicit in the method signature and aligns with the existing type definitions.

Since this updates our previous understanding and may contradict the earlier learning (ID: 940b05fa-811e-47b0-98a2-bfbbe962ccf3) regarding the unnecessary runtime error handling for log_level_filter, would you like me to remove this learning? You can also manage learnings directly using the CodeRabbit UI learnings page.


🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

@davemarco davemarco changed the title WIP(Replaces #23)- Rename StreamReader -> IrStreamReader; Extract a StreamReader base class from IrStreamReader. Rename StreamReader -> IrStreamReader; Extract a StreamReader base class from IrStreamReader. Nov 1, 2024
@davemarco davemarco marked this pull request as ready for review November 1, 2024 17:55
@davemarco davemarco requested a review from junhaoliao November 1, 2024 17:55
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (7)
src/clp_ffi_js/ir/decoding_methods.cpp (1)

3-8: Consider reorganizing includes for better readability

Consider grouping includes into standard library, third-party libraries, and project-specific headers, separated by newlines:

-#include <cstdint>
-#include <json/single_include/nlohmann/json.hpp>
-#include <string>
-#include <string_view>
-#include <vector>
-#include <clp/ErrorCode.hpp>
-#include <clp/ffi/ir_stream/decoding_methods.hpp>
-#include <clp/ffi/ir_stream/protocol_constants.hpp>
-#include <clp/ReaderInterface.hpp>
-#include <clp/TraceableException.hpp>
-#include <clp/type_utils.hpp>
-#include <spdlog/spdlog.h>

+// Standard library
+#include <cstdint>
+#include <string>
+#include <string_view>
+#include <vector>
+
+// Third-party libraries
+#include <json/single_include/nlohmann/json.hpp>
+#include <spdlog/spdlog.h>
+
+// Project headers
+#include <clp/ErrorCode.hpp>
+#include <clp/ReaderInterface.hpp>
+#include <clp/TraceableException.hpp>
+#include <clp/ffi/ir_stream/decoding_methods.hpp>
+#include <clp/ffi/ir_stream/protocol_constants.hpp>
+#include <clp/type_utils.hpp>

Also applies to: 11-11, 14-14

src/clp_ffi_js/ir/StreamReader.hpp (3)

Line range hint 41-42: Document the implementation dependency more formally

The comment about LogEventDeserializer creates tight coupling. Consider documenting this relationship more formally in the class documentation.

-    // Delete move assignment operator since it's also disabled in `clp::ir::LogEventDeserializer`.
+    /**
+     * Move assignment is disabled due to the class's relationship with LogEventDeserializer.
+     * @note This class follows LogEventDeserializer's contract which disallows move assignment.
+     */

59-59: Enhance documentation for error handling behaviour

These pure virtual methods should document their error handling behaviour:

  • What exceptions might be thrown?
  • What are the preconditions and postconditions?
-    virtual void filter_log_events(emscripten::val const& log_level_filter) = 0;
+    /**
+     * Generates a filtered collection from all log events.
+     * @param log_level_filter Array of selected log levels
+     * @throws std::invalid_argument If the filter is malformed
+     * @post Filtered collection is updated to reflect the new filter
+     */
+    virtual void filter_log_events(emscripten::val const& log_level_filter) = 0;

-    [[nodiscard]] virtual auto deserialize_stream() -> size_t = 0;
+    /**
+     * Deserializes all log events in the stream.
+     * @return The number of successfully deserialized ("valid") log events
+     * @throws std::runtime_error If the stream is corrupted or invalid
+     * @post The stream is deallocated after processing
+     */
+    [[nodiscard]] virtual auto deserialize_stream() -> size_t = 0;

Also applies to: 67-67


28-29: Consider version handling strategy

Given that the PR objectives mention adding version checking in the factory method, consider:

  1. Documenting the version compatibility requirements in the class documentation
  2. Adding version-related methods to the interface
  3. Implementing version validation in the factory method

This will help ensure proper version handling across different stream formats.

Also applies to: 84-85

src/clp_ffi_js/ir/IrStreamReader.hpp (3)

1-18: Consider adding comprehensive file-level documentation.

While the includes are well-organized, it would be beneficial to add file-level documentation explaining:

  • The purpose and scope of this header file
  • The relationship between IrStreamReader and StreamReader
  • Any threading considerations or requirements

74-90: Enhance decode_range documentation for error handling.

The documentation should clarify:

  • What specific error conditions might cause a null return
  • Whether any exceptions might be thrown
  • The behaviour when begin_idx > end_idx

99-110: Add documentation for private members.

Consider adding:

  • Documentation for create_data_context explaining ownership transfer of parameters
  • Documentation for each member variable explaining their purpose and lifecycle
  • Any invariants maintained by these members
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ad306f2 and dba3910.

📒 Files selected for processing (7)
  • CMakeLists.txt (1 hunks)
  • src/clp_ffi_js/ir/IrStreamReader.cpp (1 hunks)
  • src/clp_ffi_js/ir/IrStreamReader.hpp (1 hunks)
  • src/clp_ffi_js/ir/StreamReader.cpp (2 hunks)
  • src/clp_ffi_js/ir/StreamReader.hpp (3 hunks)
  • src/clp_ffi_js/ir/decoding_methods.cpp (2 hunks)
  • src/clp_ffi_js/ir/decoding_methods.hpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/clp_ffi_js/ir/decoding_methods.hpp
🔇 Additional comments (11)
src/clp_ffi_js/ir/decoding_methods.cpp (2)

Line range hint 20-43: LGTM! Well-structured error handling

The function properly validates the encoding type with appropriate error handling and clear error messages.


45-80: Verify version handling across the codebase

Let's verify how the version information is used elsewhere in the codebase to ensure compatibility.

✅ Verification successful

Let me gather more specific information about how the version is used in these files.


Version handling implementation is properly constrained

The version handling is well-defined and properly implemented:

  • The get_version function is used only in StreamReader.cpp
  • Version validation explicitly checks for specific versions: "v0.0.1", "v0.0.0", "0.0.1", "0.0.0"
  • Appropriate error handling is in place for unsupported versions
  • No other parts of the codebase make assumptions about version format
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for version-related code patterns
echo "Searching for version usage patterns..."
rg -l "VersionKey|get_version" --type cpp

echo "Searching for version format validation..."
ast-grep --pattern 'if ($version $_) { $$$ }'

Length of output: 388


Script:

#!/bin/bash
# Check version-related code in the identified files
echo "=== Content of decoding_methods.hpp ==="
rg -A 5 "get_version|VersionKey" src/clp_ffi_js/ir/decoding_methods.hpp

echo -e "\n=== Content of StreamReader.cpp ==="
rg -A 5 "get_version|VersionKey" src/clp_ffi_js/ir/StreamReader.cpp

# Look for version comparisons or validations
echo -e "\n=== Version comparisons or checks ==="
rg "version|Version" --type cpp -C 3

Length of output: 5891

src/clp_ffi_js/ir/StreamReader.hpp (3)

28-29: Excellent improvement to memory management!

The shift to std::unique_ptr return type enforces RAII principles and makes ownership semantics explicit. The [[nodiscard]] attribute helps prevent accidental resource leaks.


32-32: Virtual destructor is correctly defaulted

This enables proper cleanup of derived classes through base class pointers.


87-88: Protected constructor correctly enforces base class design

The protected defaulted constructor properly prevents direct instantiation while enabling derived class construction, which aligns well with the factory method pattern.

src/clp_ffi_js/ir/StreamReader.cpp (3)

5-6: LGTM: New includes support error handling improvements

The addition of ErrorCode.hpp and format headers properly supports the enhanced error handling mechanism.


Line range hint 21-53: LGTM: Improved memory management and error handling

The changes demonstrate good practices:

  • Use of std::unique_ptr for automatic resource management
  • Proper move semantics for transferring ownership
  • Comprehensive error handling with detailed messages

Line range hint 58-77: LGTM: Emscripten bindings properly updated

The bindings correctly reflect the class restructuring and include appropriate memory management policies.

src/clp_ffi_js/ir/IrStreamReader.hpp (2)

32-47: Well-structured class declaration with proper memory management.

The class follows modern C++ best practices:

  • Appropriate use of friend declaration for the factory pattern
  • Proper handling of special member functions
  • Clear documentation of design decisions

64-64: Previous review comment about emscripten::val parameter is still applicable.

CMakeLists.txt (1)

117-117: The addition of IrStreamReader.cpp looks good, eh!

The source file addition aligns with the PR objectives for extracting the StreamReader base class.

Let's verify the presence of corresponding header files:

✅ Verification successful

Beauty! The header files are all present and properly organized, eh!

Found all the expected header files in the src/clp_ffi_js/ir/ directory:

  • IrStreamReader.hpp
  • StreamReader.hpp
  • StreamReaderDataContext.hpp
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the presence of header files for IrStreamReader and StreamReader

# Test: Check if the header files exist
fd -e hpp -e h "IrStreamReader|StreamReader" src/clp_ffi_js/ir/

Length of output: 183

Comment on lines 45 to 80
auto get_version(clp::ReaderInterface& reader) -> std::string {
// The encoding type bytes must be consumed before the metadata can be read.
rewind_reader_and_validate_encoding_type(reader);

// Deserialize metadata bytes from preamble.
clp::ffi::ir_stream::encoded_tag_t metadata_type{};
std::vector<int8_t> metadata_bytes;
auto const deserialize_preamble_result{
clp::ffi::ir_stream::deserialize_preamble(reader, metadata_type, metadata_bytes)
};
if (clp::ffi::ir_stream::IRErrorCode::IRErrorCode_Success != deserialize_preamble_result) {
SPDLOG_CRITICAL(
"Failed to deserialize preamble for version reading: {}",
deserialize_preamble_result
);
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Failure,
__FILENAME__,
__LINE__,
"Failed to deserialize preamble for version reading."
};
}

// Deserialize metadata bytes which is encoded in JSON.
std::string_view const metadata_view{
clp::size_checked_pointer_cast<char const>(metadata_bytes.data()),
metadata_bytes.size()
};
nlohmann::json const metadata = nlohmann::json::parse(metadata_view);

// Retrieve version from metadata.
auto const& version{metadata.at(clp::ffi::ir_stream::cProtocol::Metadata::VersionKey)};
SPDLOG_INFO("The version is {}", version);

return version;
}
Copy link

@coderabbitai coderabbitai bot Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider adding more robust error handling and validation

While the basic error handling is present, consider these improvements:

  1. Add validation for the JSON structure and version format
  2. Consider using string_view for better memory efficiency
  3. Provide more specific error messages in exceptions

Here's a suggested improvement:

-auto get_version(clp::ReaderInterface& reader) -> std::string {
+auto get_version(clp::ReaderInterface& reader) -> std::string_view {
     // The encoding type bytes must be consumed before the metadata can be read.
     rewind_reader_and_validate_encoding_type(reader);

     // Deserialize metadata bytes from preamble.
     clp::ffi::ir_stream::encoded_tag_t metadata_type{};
     std::vector<int8_t> metadata_bytes;
     auto const deserialize_preamble_result{
             clp::ffi::ir_stream::deserialize_preamble(reader, metadata_type, metadata_bytes)
     };
     if (clp::ffi::ir_stream::IRErrorCode::IRErrorCode_Success != deserialize_preamble_result) {
         SPDLOG_CRITICAL(
-                "Failed to deserialize preamble for version reading: {}",
+                "Failed to deserialize preamble: error code {}",
                 deserialize_preamble_result
         );
         throw ClpFfiJsException{
                 clp::ErrorCode::ErrorCode_Failure,
                 __FILENAME__,
                 __LINE__,
-                "Failed to deserialize preamble for version reading."
+                fmt::format("Failed to deserialize preamble: error code {}", deserialize_preamble_result)
         };
     }

     // Deserialize metadata bytes which is encoded in JSON.
     std::string_view const metadata_view{
             clp::size_checked_pointer_cast<char const>(metadata_bytes.data()),
             metadata_bytes.size()
     };
-    nlohmann::json const metadata = nlohmann::json::parse(metadata_view);
+    nlohmann::json metadata;
+    try {
+        metadata = nlohmann::json::parse(metadata_view);
+    } catch (const nlohmann::json::parse_error& e) {
+        SPDLOG_CRITICAL("Failed to parse metadata JSON: {}", e.what());
+        throw ClpFfiJsException{
+            clp::ErrorCode::ErrorCode_MetadataCorrupted,
+            __FILENAME__,
+            __LINE__,
+            fmt::format("Failed to parse metadata JSON: {}", e.what())
+        };
+    }

     // Retrieve version from metadata.
-    auto const& version{metadata.at(clp::ffi::ir_stream::cProtocol::Metadata::VersionKey)};
+    try {
+        auto const& version{metadata.at(clp::ffi::ir_stream::cProtocol::Metadata::VersionKey)};
+        if (!version.is_string()) {
+            throw ClpFfiJsException{
+                clp::ErrorCode::ErrorCode_MetadataCorrupted,
+                __FILENAME__,
+                __LINE__,
+                "Version in metadata is not a string"
+            };
+        }
+    } catch (const nlohmann::json::out_of_range& e) {
+        throw ClpFfiJsException{
+            clp::ErrorCode::ErrorCode_MetadataCorrupted,
+            __FILENAME__,
+            __LINE__,
+            fmt::format("Version key not found in metadata: {}", e.what())
+        };
+    }
     SPDLOG_INFO("The version is {}", version);

     return version;
 }

Committable suggestion skipped: line range outside the PR's diff.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the rabbit on all those 3 items thought the error messages in the log and throwable should be consistent (don't apply the suggested code directly).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, understood. I'll ensure that error messages in logs and exceptions are consistent in future suggestions.


✏️ Learnings added
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/decoding_methods.cpp:45-80
Timestamp: 2024-11-04T21:02:17.823Z
Learning: Ensure that error messages in logs and exceptions are consistent.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Copy link
Collaborator

@junhaoliao junhaoliao Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction: it is a bad idea to return a string_view in this method since metadata goes out of scope when this function returns. Let's address only 1 & 2.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, thank you for pointing that out. You're correct; returning a std::string_view in this case is unsafe since metadata goes out of scope when the function returns. Returning a std::string is the appropriate choice to ensure the version string remains valid.


✏️ Learnings added
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/decoding_methods.cpp:45-80
Timestamp: 2024-11-04T21:06:24.293Z
Learning: In `src/clp_ffi_js/ir/decoding_methods.cpp`, within the `get_version(clp::ReaderInterface&)` function, avoid returning `std::string_view` when referencing local data that goes out of scope. Instead, return `std::string` to ensure data validity.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some of coderabbit suggestion incliude printing things that are not printable, but otherwise did comment

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines 35 to 45
auto const version{get_version(*zstd_decompressor)};
if (version == "v0.0.1" || version == "v0.0.0" || version == "0.0.1" || version == "0.0.0") {
auto stream_reader_data_context{IrStreamReader::create_data_context(
std::move(zstd_decompressor),
std::move(data_buffer)
)};
m_encoded_log_events.emplace_back(std::move(log_viewer_event));
}
m_stream_reader_data_context.reset(nullptr);
return m_encoded_log_events.size();
}

auto StreamReader::decode_range(size_t begin_idx, size_t end_idx, bool use_filter) const
-> DecodedResultsTsType {
if (use_filter && false == m_filtered_log_event_map.has_value()) {
return DecodedResultsTsType{emscripten::val::null()};
}

size_t length{0};
if (use_filter) {
length = m_filtered_log_event_map->size();
} else {
length = m_encoded_log_events.size();
}
if (length < end_idx || begin_idx > end_idx) {
return DecodedResultsTsType{emscripten::val::null()};
}

std::string message;
constexpr size_t cDefaultReservedMessageLength{512};
message.reserve(cDefaultReservedMessageLength);
auto const results{emscripten::val::array()};

for (size_t i = begin_idx; i < end_idx; ++i) {
size_t log_event_idx{0};
if (use_filter) {
log_event_idx = m_filtered_log_event_map->at(i);
} else {
log_event_idx = i;
}
auto const& log_event{m_encoded_log_events[log_event_idx]};

auto const parsed{log_event.get_message().decode_and_unparse()};
if (false == parsed.has_value()) {
SPDLOG_ERROR("Failed to decode message.");
break;
}
message = parsed.value();

m_ts_pattern.insert_formatted_timestamp(log_event.get_timestamp(), message);

EM_ASM(
{ Emval.toValue($0).push([UTF8ToString($1), $2, $3, $4]); },
results.as_handle(),
message.c_str(),
log_event.get_timestamp(),
log_event.get_log_level(),
log_event_idx + 1
return std::unique_ptr<IrStreamReader>(
new IrStreamReader(std::move(stream_reader_data_context))
);
}
Copy link

@coderabbitai coderabbitai bot Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider using version constants and semantic version parsing

The hard-coded version strings ("v0.0.1", "v0.0.0", "0.0.1", "0.0.0") could be difficult to maintain. Consider:

  1. Defining these as constants
  2. Implementing semantic version parsing to handle versions more robustly
  3. Using a version comparison utility to support future versions

Example improvement:

// In a header file:
namespace clp_ffi_js::ir {
    constexpr std::array<std::string_view, 4> SUPPORTED_VERSIONS = {
        "v0.0.1", "v0.0.0", "0.0.1", "0.0.0"
    };
}

// In the implementation:
if (std::ranges::find(SUPPORTED_VERSIONS, version) != SUPPORTED_VERSIONS.end()) {
    // ... existing creation logic
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given an IRv1 version validate util function is not to be added to CLP-core soon, I agree with this approach.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, understood. I'll note that an IRv1 version validation utility function is not planned to be added to CLP-core in the near future.


✏️ Learnings added
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/StreamReader.cpp:35-45
Timestamp: 2024-11-04T20:55:01.575Z
Learning: An IRv1 version validation utility function is not planned to be added to CLP-core soon.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment on lines 51 to 71
void IrStreamReader::filter_log_events(emscripten::val const& log_level_filter) {
if (log_level_filter.isNull()) {
m_filtered_log_event_map.reset();
return;
}

m_filtered_log_event_map.emplace();
auto filter_levels{emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter
)};
for (size_t log_event_idx = 0; log_event_idx < m_encoded_log_events.size(); ++log_event_idx) {
auto const& log_event = m_encoded_log_events[log_event_idx];
if (std::ranges::find(
filter_levels,
clp::enum_to_underlying_type(log_event.get_log_level())
)
!= filter_levels.end())
{
m_filtered_log_event_map->emplace_back(log_event_idx);
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Consider adding a type check for log_level_filter

Currently, the function assumes that log_level_filter is a valid array. To prevent potential runtime errors if a non-array is passed from JavaScript, consider adding a type check and throwing an exception if the input is invalid.

Comment on lines 154 to 181
for (size_t i = begin_idx; i < end_idx; ++i) {
size_t log_event_idx{0};
if (use_filter) {
log_event_idx = m_filtered_log_event_map->at(i);
} else {
log_event_idx = i;
}
auto const& log_event{m_encoded_log_events[log_event_idx]};

auto const parsed{log_event.get_message().decode_and_unparse()};
if (false == parsed.has_value()) {
SPDLOG_ERROR("Failed to decode message.");
break;
}
message = parsed.value();

m_ts_pattern.insert_formatted_timestamp(log_event.get_timestamp(), message);

EM_ASM(
{ Emval.toValue($0).push([UTF8ToString($1), $2, $3, $4]); },
results.as_handle(),
message.c_str(),
log_event.get_timestamp(),
log_event.get_log_level(),
log_event_idx + 1
);
}

Copy link

@coderabbitai coderabbitai bot Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Optimize the loop in decode_range

Instead of manually constructing the results array using EM_ASM, consider using a more direct approach with Emscripten bindings to improve readability and maintainability.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be explored in the future.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, understood.


🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

Comment on lines 163 to 168
auto const parsed{log_event.get_message().decode_and_unparse()};
if (false == parsed.has_value()) {
SPDLOG_ERROR("Failed to decode message.");
break;
}
message = parsed.value();
Copy link

@coderabbitai coderabbitai bot Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Handle message decoding failures more explicitly

In the decode_range method, when message decoding fails, the code logs an error and breaks the loop. This might result in partial results without informing the caller of the failure. Consider throwing an exception to notify the caller about the decoding issue.

You can apply this diff to throw an exception upon decoding failure:

 auto const parsed{log_event.get_message().decode_and_unparse()};
 if (false == parsed.has_value()) {
     SPDLOG_ERROR("Failed to decode message.");
-    break;
+    throw ClpFfiJsException{
+        clp::ErrorCode::ErrorCode_Corrupt,
+        __FILENAME__,
+        __LINE__,
+        "Failed to decode message"
+    };
 }
 message = parsed.value();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto const parsed{log_event.get_message().decode_and_unparse()};
if (false == parsed.has_value()) {
SPDLOG_ERROR("Failed to decode message.");
break;
}
message = parsed.value();
auto const parsed{log_event.get_message().decode_and_unparse()};
if (false == parsed.has_value()) {
SPDLOG_ERROR("Failed to decode message.");
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Corrupt,
__FILENAME__,
__LINE__,
"Failed to decode message"
};
}
message = parsed.value();

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to be the same as the above comment.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, thank you for pointing that out. It seems the comment was duplicated. Sorry for any confusion.


🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

Comment on lines 85 to 91
if (std::errc::no_message_available == error) {
break;
}
if (std::errc::result_out_of_range == error) {
SPDLOG_ERROR("File contains an incomplete IR stream");
break;
}
Copy link

@coderabbitai coderabbitai bot Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Uniform error handling upon deserialization failure

When encountering std::errc::result_out_of_range, the code logs an error and breaks the loop. To maintain consistency and ensure the caller is properly informed of all deserialization failures, consider throwing an exception here as well.

Here's a suggested change:

 if (std::errc::result_out_of_range == error) {
     SPDLOG_ERROR("File contains an incomplete IR stream");
-    break;
+    throw ClpFfiJsException{
+        clp::ErrorCode::ErrorCode_Corrupt,
+        __FILENAME__,
+        __LINE__,
+        "File contains an incomplete IR stream"
+    };
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (std::errc::no_message_available == error) {
break;
}
if (std::errc::result_out_of_range == error) {
SPDLOG_ERROR("File contains an incomplete IR stream");
break;
}
if (std::errc::no_message_available == error) {
break;
}
if (std::errc::result_out_of_range == error) {
SPDLOG_ERROR("File contains an incomplete IR stream");
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Corrupt,
__FILENAME__,
__LINE__,
"File contains an incomplete IR stream"
};
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not throw an exception here because we intend to handle missing End-of-Stream packet gracefully.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, understood. Thank you for the clarification.


✏️ Learnings added
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:85-91
Timestamp: 2024-11-04T19:37:26.993Z
Learning: In the `IrStreamReader::deserialize_stream` method, we handle missing End-of-Stream packets gracefully by logging an error and breaking the loop when encountering `std::errc::result_out_of_range`, rather than throwing an exception.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides renaming the class from StreamReader to IrStreamReader, did we make any other changes to this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed create() since we have effectively moved this to base class. I removed encripten bindings and put them all in base class.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
src/clp_ffi_js/ir/StreamReader.cpp (1)

5-18: Reorganize includes following standard convention.

The includes should be organized in the following groups, separated by blank lines:

  1. C++ standard library headers
  2. Third-party library headers
  3. Project headers

Apply this organization:

 #include <cstddef>
 #include <cstdint>
 #include <format>
 #include <memory>
 #include <utility>
 
 #include <clp/Array.hpp>
-#include <clp/ErrorCode.hpp>
 #include <clp/streaming_compression/zstd/Decompressor.hpp>
 #include <clp/TraceableException.hpp>
+#include <clp/ErrorCode.hpp>
 #include <emscripten/bind.h>
 #include <spdlog/spdlog.h>
 
 #include <clp_ffi_js/ClpFfiJsException.hpp>
 #include <clp_ffi_js/ir/decoding_methods.hpp>
 #include <clp_ffi_js/ir/IrStreamReader.hpp>
src/clp_ffi_js/ir/IrStreamReader.cpp (2)

102-119: Consider extracting log level parsing logic.

The log level extraction logic is complex and could be moved to a separate method for better readability and maintainability.

+auto parse_log_level(std::string_view logtype) -> LogLevel {
+    constexpr size_t cLogLevelPositionInMessages{1};
+    if (logtype.length() <= cLogLevelPositionInMessages) {
+        return LogLevel::NONE;
+    }
+    
+    auto const log_level_name_it{std::find_if(
+        cLogLevelNames.begin() + static_cast<size_t>(cValidLogLevelsBeginIdx),
+        cLogLevelNames.end(),
+        [&](std::string_view level) {
+            return logtype.substr(cLogLevelPositionInMessages).starts_with(level);
+        }
+    )};
+    
+    return (log_level_name_it != cLogLevelNames.end())
+        ? static_cast<LogLevel>(std::distance(cLogLevelNames.begin(), log_level_name_it))
+        : LogLevel::NONE;
+}

-        constexpr size_t cLogLevelPositionInMessages{1};
-        LogLevel log_level{LogLevel::NONE};
-        if (logtype.length() > cLogLevelPositionInMessages) {
-            auto const log_level_name_it{std::find_if(
-                    cLogLevelNames.begin() + static_cast<size_t>(cValidLogLevelsBeginIdx),
-                    cLogLevelNames.end(),
-                    [&](std::string_view level) {
-                        return logtype.substr(cLogLevelPositionInMessages).starts_with(level);
-                    }
-            )};
-            if (log_level_name_it != cLogLevelNames.end()) {
-                log_level = static_cast<LogLevel>(
-                        std::distance(cLogLevelNames.begin(), log_level_name_it)
-                );
-            }
-        }
+        auto const log_level{parse_log_level(logtype)};

177-185: Add documentation for the EM_ASM block.

The EM_ASM block performs critical JavaScript array manipulation, but its purpose and structure could be better documented.

+        // EM_ASM block pushes a tuple [message, timestamp, log_level, index] to the JavaScript array
+        // Using direct EM_ASM for performance instead of higher-level emscripten::val operations
         EM_ASM(
                 { Emval.toValue($0).push([UTF8ToString($1), $2, $3, $4]); },
                 results.as_handle(),
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between dba3910 and 9ddd974.

📒 Files selected for processing (3)
  • src/clp_ffi_js/ir/IrStreamReader.cpp (1 hunks)
  • src/clp_ffi_js/ir/IrStreamReader.hpp (1 hunks)
  • src/clp_ffi_js/ir/StreamReader.cpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/clp_ffi_js/ir/IrStreamReader.hpp
🧰 Additional context used
📓 Learnings (2)
src/clp_ffi_js/ir/IrStreamReader.cpp (3)
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:85-91
Timestamp: 2024-11-04T19:37:27.204Z
Learning: In the `IrStreamReader::deserialize_stream` method, we handle missing End-of-Stream packets gracefully by logging an error and breaking the loop when encountering `std::errc::result_out_of_range`, rather than throwing an exception.
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:162-168
Timestamp: 2024-11-04T19:55:27.398Z
Learning: In the `IrStreamReader::decode_range` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, message decoding failures are not expected unless the file is corrupted or the deserializer in CLP core does not handle edge cases, so it's appropriate to throw an exception upon message decoding failure.
src/clp_ffi_js/ir/StreamReader.cpp (1)
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/StreamReader.cpp:35-45
Timestamp: 2024-11-04T20:55:01.745Z
Learning: An IRv1 version validation utility function is not planned to be added to CLP-core soon.
🔇 Additional comments (7)
src/clp_ffi_js/ir/StreamReader.cpp (3)

35-45: Implement version checking using constants.

Based on the previous review discussion and the learning that "An IRv1 version validation utility function is not planned to be added to CLP-core soon", we should at least make the current version checking more maintainable.


46-52: Well-implemented error handling with proper logging and detailed exception.

The error handling follows best practices by:

  • Using SPDLOG_CRITICAL for logging
  • Throwing a custom exception with detailed information
  • Using std::format for clear error messages

Line range hint 58-77: Well-structured Emscripten bindings with proper memory management.

The bindings implementation:

  • Correctly registers TypeScript types for interoperability
  • Uses appropriate memory management with take_ownership
  • Provides a clean interface for JavaScript/TypeScript consumers
src/clp_ffi_js/ir/IrStreamReader.cpp (4)

1-38: LGTM! Well-organized includes and namespace setup.

The includes are properly organized into logical groups (standard library, external libraries, and project headers), and the using declarations are appropriately scoped within the namespace.


39-71: LGTM! Efficient event filtering implementation.

The event management methods are well-implemented with:

  • Proper handling of optional filtered map
  • Efficient use of std::ranges for filtering
  • Type-safe handling of log levels through TypeScript typing (as per previous learnings)

73-98: LGTM! Robust error handling in stream deserialization.

The error handling follows the established pattern:

  • Gracefully handles missing End-of-Stream packets
  • Properly throws exceptions for corruption errors
  • Clear error messages with detailed context

190-223: LGTM! Robust constructor and data context creation.

The implementation demonstrates good practices:

  • Proper use of move semantics
  • RAII principles in constructor
  • Thorough error handling with detailed messages

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
src/clp_ffi_js/ir/IrStreamReader.cpp (3)

78-79: Consider making the magic number configurable.

The default number of reserved log events (500,000) could be made configurable through a constant or configuration parameter for better flexibility.

-    constexpr size_t cDefaultNumReservedLogEvents{500'000};
+    static constexpr size_t cDefaultNumReservedLogEvents{500'000};  // TODO: Make configurable

103-119: Consider extracting log level parsing logic.

The log level parsing logic is complex enough to warrant its own method for better readability and maintainability.

+    auto parse_log_level(std::string_view logtype) -> LogLevel {
+        constexpr size_t cLogLevelPositionInMessages{1};
+        if (logtype.length() <= cLogLevelPositionInMessages) {
+            return LogLevel::NONE;
+        }
+        
+        auto const log_level_name_it{std::find_if(
+            cLogLevelNames.begin() + static_cast<size_t>(cValidLogLevelsBeginIdx),
+            cLogLevelNames.end(),
+            [&](std::string_view level) {
+                return logtype.substr(cLogLevelPositionInMessages).starts_with(level);
+            }
+        )};
+        
+        if (log_level_name_it == cLogLevelNames.end()) {
+            return LogLevel::NONE;
+        }
+        
+        return static_cast<LogLevel>(
+            std::distance(cLogLevelNames.begin(), log_level_name_it)
+        );
+    }

177-185: Consider using Embind instead of EM_ASM for better type safety.

The current implementation uses EM_ASM for array manipulation. Consider using Embind's val::array() methods for better type safety and maintainability.

-        EM_ASM(
-                { Emval.toValue($0).push([UTF8ToString($1), $2, $3, $4]); },
-                results.as_handle(),
-                message.c_str(),
-                log_event.get_timestamp(),
-                log_event.get_log_level(),
-                log_event_idx + 1
-        );
+        results.call<void>("push", emscripten::val::array(
+            std::vector<emscripten::val>{
+                emscripten::val(message),
+                emscripten::val(log_event.get_timestamp()),
+                emscripten::val(log_event.get_log_level()),
+                emscripten::val(log_event_idx + 1)
+            }));
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9ddd974 and a5cb529.

📒 Files selected for processing (2)
  • src/clp_ffi_js/ir/IrStreamReader.cpp (1 hunks)
  • src/submodules/clp (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/submodules/clp
🧰 Additional context used
📓 Learnings (1)
src/clp_ffi_js/ir/IrStreamReader.cpp (3)
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:85-91
Timestamp: 2024-11-04T19:37:27.204Z
Learning: In the `IrStreamReader::deserialize_stream` method, we handle missing End-of-Stream packets gracefully by logging an error and breaking the loop when encountering `std::errc::result_out_of_range`, rather than throwing an exception.
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:162-168
Timestamp: 2024-11-04T19:55:27.398Z
Learning: In the `IrStreamReader::decode_range` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, message decoding failures are not expected unless the file is corrupted or the deserializer in CLP core does not handle edge cases, so it's appropriate to throw an exception upon message decoding failure.
🔇 Additional comments (4)
src/clp_ffi_js/ir/IrStreamReader.cpp (4)

1-38: LGTM! Well-organized includes and namespace setup.

The includes are logically grouped and the namespace usage is appropriate.


39-41: LGTM! Simple and effective accessor method.


190-197: LGTM! Constructor properly initializes member variables.

The constructor correctly moves the data context and initializes the timestamp pattern.


199-223: LGTM! Robust error handling in data context creation.

The method properly validates the decompressor and handles errors appropriately.

@@ -3,249 +3,122 @@
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <format>
#include <json/single_include/nlohmann/json.hpp>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add json to src/clp-ffi-js/.clang-format.

src/clp_ffi_js/ir/StreamReader.cpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.cpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.cpp Outdated Show resolved Hide resolved
Comment on lines 156 to 175
// Required to validate encoding type prior to getting version.
rewind_reader_and_validate_encoding_type(*zstd_decompressor);
auto const version{get_version(*zstd_decompressor)};

// Required that reader offset matches position after validation in order to decode log events.
rewind_reader_and_validate_encoding_type(*zstd_decompressor);
if (std::ranges::find(cUnstructuredIrVersions, version) != cUnstructuredIrVersions.end()) {
return std::make_unique<UnstructuredIrStreamReader>(UnstructuredIrStreamReader::create(
std::move(zstd_decompressor),
std::move(data_buffer)
));
}
SPDLOG_INFO("did i get here 3");

throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Unsupported,
__FILENAME__,
__LINE__,
std::format("Unable to create reader for IR stream with version {}.", version)
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Required to validate encoding type prior to getting version.
rewind_reader_and_validate_encoding_type(*zstd_decompressor);
auto const version{get_version(*zstd_decompressor)};
// Required that reader offset matches position after validation in order to decode log events.
rewind_reader_and_validate_encoding_type(*zstd_decompressor);
if (std::ranges::find(cUnstructuredIrVersions, version) != cUnstructuredIrVersions.end()) {
return std::make_unique<UnstructuredIrStreamReader>(UnstructuredIrStreamReader::create(
std::move(zstd_decompressor),
std::move(data_buffer)
));
}
SPDLOG_INFO("did i get here 3");
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Unsupported,
__FILENAME__,
__LINE__,
std::format("Unable to create reader for IR stream with version {}.", version)
};
rewind_reader_and_validate_encoding_type(*zstd_decompressor);
// Validate the stream's version
auto pos = zstd_decompressor->get_pos();
auto const version{get_version(*zstd_decompressor)};
if (std::ranges::find(cUnstructuredIrVersions, version) == cUnstructuredIrVersions.end()) {
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Unsupported,
__FILENAME__,
__LINE__,
std::format("Unable to create reader for IR stream with version {}.", version)
};
}
try {
zstd_decompressor->seek_from_begin(pos);
} catch (ZstdDecompressor::OperationFailed& e) {
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Failure,
__FILENAME__,
__LINE__,
std::format("Unable to rewind zstd decompressor: {}", e.what())
};
}
return std::make_unique<UnstructuredIrStreamReader>(
UnstructuredIrStreamReader::create(std::move(zstd_decompressor), std::move(data_buffer))
);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does a few things:

  • Refactors the code structure a bit in terms of comments and flow
    • Note that we handle the error path in the if block rather than the previous code which handled the success path in the if block
  • Updated the flow to avoid re-validating the encoding type twice:
    • Rewind + validate encoding type
    • Save position
    • Validate version
    • Restore position
    • Create and return the reader

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better for you to copy and commit this manually rather than applying the change from GH, just so that you can double-check what I did.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is did you test it... I tried something similiar but it did not work. I will test it for you and let u know

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, hence why I wanted you to double-check, lol.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay it seems to work now. I dont know why mine was giving an error, but wtv i guess.

src/clp_ffi_js/ir/StreamReader.hpp Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.hpp Show resolved Hide resolved
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp Outdated Show resolved Hide resolved
*/
[[nodiscard]] static auto create(
std::unique_ptr<ZstdDecompressor>&& zstd_decompressor,
clp::Array<char>&& data_array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's pass this as a value for the same reason I discussed with Junhao in a previous PR. Essentially, passing it as a value and moving it in the caller offers maximum flexibility. We can't do the same for zstd_decompressor because the class doesn't support it and likely won't any time soon.

void filter_log_events(LogLevelFilterTsType const& log_level_filter) override;

/**
* Deserializes all log events in the stream. After the stream has been exhausted, it will be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, I see you've deleted this docstring. I'm fine keeping it this way, but if @junhaoliao wants to keep the part about deallocating the stream after exhaustion, then we need to add the docstring back in UnstructuredIrStreamReader.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
src/clp_ffi_js/ir/StreamReader.cpp (1)

168-168: Remove debug log message.

The debug message "did i get here 3" was left in accidentally and should be removed.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 28d4250 and 1b31f08.

📒 Files selected for processing (3)
  • src/clp_ffi_js/ir/StreamReader.cpp (2 hunks)
  • src/clp_ffi_js/ir/StreamReader.hpp (3 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/clp_ffi_js/ir/StreamReader.hpp
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
🧰 Additional context used
📓 Learnings (1)
src/clp_ffi_js/ir/StreamReader.cpp (3)
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/StreamReader.cpp:35-45
Timestamp: 2024-11-04T20:55:01.745Z
Learning: An IRv1 version validation utility function is not planned to be added to CLP-core soon.
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:162-168
Timestamp: 2024-11-04T19:55:27.398Z
Learning: In the `IrStreamReader::decode_range` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, message decoding failures are not expected unless the file is corrupted or the deserializer in CLP core does not handle edge cases, so it's appropriate to throw an exception upon message decoding failure.
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:85-91
Timestamp: 2024-11-04T19:37:27.204Z
Learning: In the `IrStreamReader::deserialize_stream` method, we handle missing End-of-Stream packets gracefully by logging an error and breaking the loop when encountering `std::errc::result_out_of_range`, rather than throwing an exception.
🪛 cppcheck
src/clp_ffi_js/ir/StreamReader.cpp

[error] 111-111: syntax error

(syntaxError)

🔇 Additional comments (1)
src/clp_ffi_js/ir/StreamReader.cpp (1)

7-7: Add json to .clang-format configuration.

As noted in previous reviews, the json include needs to be added to src/clp_ffi-js/.clang-format to ensure consistent formatting.

Comment on lines 156 to 161
// Required to validate encoding type prior to getting version.
rewind_reader_and_validate_encoding_type(*zstd_decompressor);
auto const version{get_version(*zstd_decompressor)};

// Required that reader offset matches position after validation in order to decode log events.
rewind_reader_and_validate_encoding_type(*zstd_decompressor);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Optimize stream validation flow.

The encoding type is validated twice unnecessarily. As suggested in previous reviews, we can optimize this by:

  1. Rewind + validate encoding type once
  2. Save position
  3. Validate version
  4. Restore position
  5. Create and return the reader

Apply this restructuring:

-    // Required to validate encoding type prior to getting version.
-    rewind_reader_and_validate_encoding_type(*zstd_decompressor);
+    rewind_reader_and_validate_encoding_type(*zstd_decompressor);
+    
+    // Save position after encoding validation
+    auto pos = zstd_decompressor->get_pos();
+    
+    // Get and validate version
     auto const version{get_version(*zstd_decompressor)};
-
-    // Required that reader offset matches position after validation in order to decode log events.
-    rewind_reader_and_validate_encoding_type(*zstd_decompressor);
+    
+    // Restore position for decoding
+    try {
+        zstd_decompressor->seek_from_begin(pos);
+    } catch (ZstdDecompressor::OperationFailed& e) {
+        throw ClpFfiJsException{
+                clp::ErrorCode::ErrorCode_Failure,
+                __FILENAME__,
+                __LINE__,
+                std::format("Unable to rewind zstd decompressor: {}", e.what())
+        };
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Required to validate encoding type prior to getting version.
rewind_reader_and_validate_encoding_type(*zstd_decompressor);
auto const version{get_version(*zstd_decompressor)};
// Required that reader offset matches position after validation in order to decode log events.
rewind_reader_and_validate_encoding_type(*zstd_decompressor);
rewind_reader_and_validate_encoding_type(*zstd_decompressor);
// Save position after encoding validation
auto pos = zstd_decompressor->get_pos();
// Get and validate version
auto const version{get_version(*zstd_decompressor)};
// Restore position for decoding
try {
zstd_decompressor->seek_from_begin(pos);
} catch (ZstdDecompressor::OperationFailed& e) {
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Failure,
__FILENAME__,
__LINE__,
std::format("Unable to rewind zstd decompressor: {}", e.what())
};
}

Comment on lines +71 to +108
auto get_version(clp::ReaderInterface& reader) -> std::string {
// Deserialize metadata bytes from preamble.
clp::ffi::ir_stream::encoded_tag_t metadata_type{};
std::vector<int8_t> metadata_bytes;
auto const err{clp::ffi::ir_stream::deserialize_preamble(reader, metadata_type, metadata_bytes)
};
if (result.has_error()) {
auto const error_code{result.error()};
SPDLOG_CRITICAL(
"Failed to create deserializer: {}:{}",
error_code.category().name(),
error_code.message()
);
if (IRErrorCode::IRErrorCode_Success != err) {
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Failure,
__FILENAME__,
__LINE__,
"Failed to create deserializer"
std::format(
"Failed to deserialize preamble: IR error code {}",
clp::enum_to_underlying_type(err)
)
};
}

return {std::move(data_buffer), std::move(zstd_decompressor), std::move(result.value())};
std::string version;
try {
// Deserialize metadata bytes as JSON.
std::string_view const metadata_view{
clp::size_checked_pointer_cast<char const>(metadata_bytes.data()),
metadata_bytes.size()
};
nlohmann::json const metadata = nlohmann::json::parse(metadata_view);
version = metadata.at(clp::ffi::ir_stream::cProtocol::Metadata::VersionKey);
} catch (nlohmann::json::exception const& e) {
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_MetadataCorrupted,
__FILENAME__,
__LINE__,
std::format("Failed to parse stream's metadata: {}", e.what())
};
}

SPDLOG_INFO("IR version is {}", version);
return version;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add metadata type validation.

The function deserializes the preamble but doesn't validate if the metadata_type is correct before attempting to parse it as JSON.

Add validation like this:

     if (IRErrorCode::IRErrorCode_Success != err) {
         throw ClpFfiJsException{
                 clp::ErrorCode::ErrorCode_Failure,
                 __FILENAME__,
                 __LINE__,
                 std::format(
                         "Failed to deserialize preamble: IR error code {}",
                         clp::enum_to_underlying_type(err)
                 )
         };
     }
+    if (metadata_type != clp::ffi::ir_stream::cProtocol::Tags::Metadata) {
+        throw ClpFfiJsException{
+                clp::ErrorCode::ErrorCode_MetadataCorrupted,
+                __FILENAME__,
+                __LINE__,
+                std::format("Invalid metadata type: {}", metadata_type)
+        };
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto get_version(clp::ReaderInterface& reader) -> std::string {
// Deserialize metadata bytes from preamble.
clp::ffi::ir_stream::encoded_tag_t metadata_type{};
std::vector<int8_t> metadata_bytes;
auto const err{clp::ffi::ir_stream::deserialize_preamble(reader, metadata_type, metadata_bytes)
};
if (result.has_error()) {
auto const error_code{result.error()};
SPDLOG_CRITICAL(
"Failed to create deserializer: {}:{}",
error_code.category().name(),
error_code.message()
);
if (IRErrorCode::IRErrorCode_Success != err) {
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Failure,
__FILENAME__,
__LINE__,
"Failed to create deserializer"
std::format(
"Failed to deserialize preamble: IR error code {}",
clp::enum_to_underlying_type(err)
)
};
}
return {std::move(data_buffer), std::move(zstd_decompressor), std::move(result.value())};
std::string version;
try {
// Deserialize metadata bytes as JSON.
std::string_view const metadata_view{
clp::size_checked_pointer_cast<char const>(metadata_bytes.data()),
metadata_bytes.size()
};
nlohmann::json const metadata = nlohmann::json::parse(metadata_view);
version = metadata.at(clp::ffi::ir_stream::cProtocol::Metadata::VersionKey);
} catch (nlohmann::json::exception const& e) {
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_MetadataCorrupted,
__FILENAME__,
__LINE__,
std::format("Failed to parse stream's metadata: {}", e.what())
};
}
SPDLOG_INFO("IR version is {}", version);
return version;
auto get_version(clp::ReaderInterface& reader) -> std::string {
// Deserialize metadata bytes from preamble.
clp::ffi::ir_stream::encoded_tag_t metadata_type{};
std::vector<int8_t> metadata_bytes;
auto const err{clp::ffi::ir_stream::deserialize_preamble(reader, metadata_type, metadata_bytes)
};
if (IRErrorCode::IRErrorCode_Success != err) {
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Failure,
__FILENAME__,
__LINE__,
std::format(
"Failed to deserialize preamble: IR error code {}",
clp::enum_to_underlying_type(err)
)
};
}
if (metadata_type != clp::ffi::ir_stream::cProtocol::Tags::Metadata) {
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_MetadataCorrupted,
__FILENAME__,
__LINE__,
std::format("Invalid metadata type: {}", metadata_type)
};
}
std::string version;
try {
// Deserialize metadata bytes as JSON.
std::string_view const metadata_view{
clp::size_checked_pointer_cast<char const>(metadata_bytes.data()),
metadata_bytes.size()
};
nlohmann::json const metadata = nlohmann::json::parse(metadata_view);
version = metadata.at(clp::ffi::ir_stream::cProtocol::Metadata::VersionKey);
} catch (nlohmann::json::exception const& e) {
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_MetadataCorrupted,
__FILENAME__,
__LINE__,
std::format("Failed to parse stream's metadata: {}", e.what())
};
}
SPDLOG_INFO("IR version is {}", version);
return version;

src/clp_ffi_js/.clang-format Outdated Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.cpp Show resolved Hide resolved
src/clp_ffi_js/ir/StreamReader.cpp Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
src/clp_ffi_js/ir/StreamReader.cpp (2)

6-26: Reorganize header includes into logical groups.

Based on previous discussions, the headers should be organized into distinct groups:

  1. C++ standard library headers
  2. Third-party library headers
  3. Project-specific headers (clp)
  4. Local headers

Apply this reorganization:

-#include <format>
-#include <memory>
-#include <string>
-#include <string_view>
-#include <utility>
-#include <vector>
-
-#include <clp/Array.hpp>
-#include <clp/ErrorCode.hpp>
-#include <clp/ffi/ir_stream/decoding_methods.hpp>
-#include <clp/ffi/ir_stream/protocol_constants.hpp>
-#include <clp/ReaderInterface.hpp>
-#include <clp/streaming_compression/zstd/Decompressor.hpp>
-#include <clp/TraceableException.hpp>
-#include <clp/type_utils.hpp>
-#include <emscripten/bind.h>
-#include <json/single_include/nlohmann/json.hpp>
-#include <spdlog/spdlog.h>
-
-#include <clp_ffi_js/ClpFfiJsException.hpp>
-#include <clp_ffi_js/ir/UnstructuredIrStreamReader.hpp>

+// C++ standard library
+#include <format>
+#include <memory>
+#include <string>
+#include <string_view>
+#include <utility>
+#include <vector>
+
+// Third-party libraries
+#include <emscripten/bind.h>
+#include <json/single_include/nlohmann/json.hpp>
+#include <spdlog/spdlog.h>
+
+// CLP libraries
+#include <clp/Array.hpp>
+#include <clp/ErrorCode.hpp>
+#include <clp/ffi/ir_stream/decoding_methods.hpp>
+#include <clp/ffi/ir_stream/protocol_constants.hpp>
+#include <clp/ReaderInterface.hpp>
+#include <clp/streaming_compression/zstd/Decompressor.hpp>
+#include <clp/TraceableException.hpp>
+#include <clp/type_utils.hpp>
+
+// Local headers
+#include <clp_ffi_js/ClpFfiJsException.hpp>
+#include <clp_ffi_js/ir/UnstructuredIrStreamReader.hpp>

116-121: Add newlines before comments for better readability.

Following the established coding style, add newlines before comments.

Apply this formatting:

 EMSCRIPTEN_BINDINGS(ClpStreamReader) {
+
     // JS types used as inputs
     emscripten::register_type<clp_ffi_js::ir::DataArrayTsType>("Uint8Array");
     emscripten::register_type<clp_ffi_js::ir::LogLevelFilterTsType>("number[] | null");
+
     // JS types used as outputs
🧰 Tools
🪛 cppcheck

[error] 116-116: syntax error

(syntaxError)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1b31f08 and bfea301.

📒 Files selected for processing (5)
  • CMakeLists.txt (1 hunks)
  • src/clp_ffi_js/.clang-format (1 hunks)
  • src/clp_ffi_js/ir/StreamReader.cpp (2 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/clp_ffi_js/.clang-format
🚧 Files skipped from review as they are similar to previous changes (3)
  • CMakeLists.txt
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
🧰 Additional context used
📓 Learnings (1)
src/clp_ffi_js/ir/StreamReader.cpp (2)
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/StreamReader.cpp:35-45
Timestamp: 2024-11-04T20:55:01.745Z
Learning: An IRv1 version validation utility function is not planned to be added to CLP-core soon.
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:162-168
Timestamp: 2024-11-04T19:55:27.398Z
Learning: In the `IrStreamReader::decode_range` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, message decoding failures are not expected unless the file is corrupted or the deserializer in CLP core does not handle edge cases, so it's appropriate to throw an exception upon message decoding failure.
🪛 cppcheck
src/clp_ffi_js/ir/StreamReader.cpp

[error] 116-116: syntax error

(syntaxError)

🔇 Additional comments (3)
src/clp_ffi_js/ir/StreamReader.cpp (3)

32-48: LGTM! Well-documented function declarations.

The function declarations are clear and include comprehensive documentation with proper exception specifications.


49-72: LGTM! Robust encoding type validation.

The function properly handles:

  • Reader position reset
  • Encoding type validation
  • Appropriate error handling with descriptive messages

146-187: LGTM! Well-structured stream reader creation.

The implementation follows best practices with:

  • Proper memory management using smart pointers
  • Optimized stream validation flow
  • Comprehensive error handling
  • Clear logging

Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

Split `StreamReader` into an interface and implementation to prepare for adding another IR stream reader.

@davemarco davemarco changed the title Rename StreamReader -> IrStreamReader; Extract a StreamReader base class from IrStreamReader. Split StreamReader into an interface and implementation to prepare for adding another IR stream reader. Nov 7, 2024
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)

63-70: Enhance method documentation.

The documentation for deserialize_stream() only references the base class method without explaining its specific behaviour in this context. Consider adding details about:

  • What exactly is being deserialized
  • What happens to the stream after exhaustion
  • Any specific error conditions
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between bfea301 and 119de6c.

📒 Files selected for processing (2)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1 hunks)
  • src/submodules/clp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/submodules/clp
🔇 Additional comments (5)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (5)

21-25: LGTM! Well-documented type alias.

The FilteredLogEventsMap type alias is clearly documented and effectively describes its purpose as a mapping between filtered and unfiltered log event indices.


27-31: LGTM! Clear and concise class documentation.

The class documentation effectively communicates its purpose and responsibilities.


33-43: LGTM! Proper RAII implementation.

The class correctly implements RAII with well-defined destructor and move/copy semantics. The rationale for deleting the move assignment operator is clearly documented.


76-87: LGTM! Well-structured private implementation.

The private section demonstrates good design with:

  • Proper encapsulation using private constructor
  • Clear member variable organization
  • Appropriate use of smart pointers for ownership

31-87: Verify error handling across FFI boundary.

Since this class handles compressed data and operates across the FFI boundary, let's verify the error handling implementation:

✅ Verification successful

Let me gather more information about memory safety patterns, particularly around unique pointers and RAII usage.


Error handling and memory safety are properly implemented

The verification shows robust error handling and memory safety mechanisms:

  • Proper RAII with smart pointers (unique_ptr) for managing resources
  • Consistent error handling using ClpFfiJsException with detailed error messages
  • Clear ownership semantics with move operations and deleted copy operations
  • Comprehensive error checks for:
    • Corrupt metadata
    • Deserialization failures
    • Encoding type validation
    • Stream version compatibility
    • Decompression errors

The implementation follows best practices for C++ FFI boundary safety through exception handling and RAII patterns.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for error handling patterns in the implementation
echo "Searching for error handling in implementation..."
rg -A 5 "throw|error|exception" --type cpp

# Look for memory safety patterns
echo "Checking memory safety patterns..."
ast-grep --pattern 'try { $$$ } catch { $$$ }'

Length of output: 10412


Script:

#!/bin/bash
# Check for unique_ptr usage and ownership transfer
echo "Checking smart pointer patterns..."
rg "unique_ptr|shared_ptr|make_unique" --type cpp -A 3

# Look for move semantics and RAII patterns
echo "Checking RAII and move semantics..."
ast-grep --pattern 'std::move($_)'

Length of output: 3823

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants