Split `StreamReader` into an interface and implementation to prepare for adding another IR stream reader. #26

davemarco · 2024-10-31T20:17:19Z

Description

Extracts a base class for IRv1 and IRv2 stream readers to inherit from. Since IRv2 is not yet supported by clp-ffi-js, the repo now includes a base class and a derived class for IRv1.

Code was originally written by Junhao. He made these changes

Rename ClpIrStreamReader to ClpIRStreamReader in bindings.
Add get_version to decoding methods.
Extract a StreamReader base class from IRStreamReader.
Implement factory method StreamReader::create to perform version checking before creating derived classes.

I merged main afterwards, and made small changes like exporting base class methods instead of derived class.

Validation performed

Build and test functionality in log viewer (opening a few files and filtering). FYI since the exported class name changed, we will need to update log viewer when this is re-released.

Summary by CodeRabbit

New Features
- Introduced the IrStreamReader class for reading and processing IR streams.
- Added methods for filtering log events, deserializing streams, and decoding log event ranges.
- New function to retrieve version information from a reader interface.
Bug Fixes
- Enhanced error handling during stream deserialization and logging.
Refactor
- Significant restructuring of the StreamReader class to improve memory management and extensibility.
Documentation
- Updated documentation to reflect changes in class names and functionalities.

Summary by CodeRabbit

New Features
- Introduced the UnstructuredIrStreamReader class for handling unstructured IR streams, including methods for filtering log events and deserializing data.
- Enhanced error handling and logging capabilities in the stream processing.
Refactor
- Major restructuring of the StreamReader class, focusing on smart pointer management and interface simplification.
- Removed several outdated methods to streamline functionality.
Chores
- Updated build configuration to include new source files and reflect changes in project structure.
- Updated clang-format configuration to recognize additional library headers.

…ader.cpp.

coderabbitai · 2024-10-31T20:17:27Z

Walkthrough

The pull request introduces significant changes to the StreamReader class and related components in the clp_ffi_js namespace. Key modifications include updating the create method to return a std::unique_ptr<StreamReader>, enhancing memory management. Multiple methods have been removed or refactored, and new methods have been added to improve error handling and encoding validation. Additionally, a new class, UnstructuredIrStreamReader, has been introduced to handle unstructured IR streams. The changes reflect a substantial restructuring aimed at improving the class's functionality and interface.

Changes

File Path	Change Summary
`src/clp_ffi_js/ir/StreamReader.cpp`	Updated `create` method return type to `std::unique_ptr<StreamReader>`. Removed methods: `get_num_events_buffered`, `get_filtered_log_event_map`, `filter_log_events`, `deserialize_stream`, `decode_range`. Added new methods for version checking and encoding validation.
`src/clp_ffi_js/ir/StreamReader.hpp`	Changed `create` method return type to `std::unique_ptr<StreamReader>`. Made several methods virtual/pure virtual. Modified constructor to be protected. Added new type alias and constant. Introduced two new static methods.
`src/clp_ffi_js/ir/decoding_methods.cpp`	Removed implementation of `rewind_reader_and_validate_encoding_type`.
`src/clp_ffi_js/ir/decoding_methods.hpp`	Removed header file `decoding_methods.hpp` which declared `rewind_reader_and_validate_encoding_type`.
`CMakeLists.txt`	Updated `CLP_FFI_JS_SRC_MAIN` to replace `decoding_methods.cpp` with `UnstructuredIrStreamReader.cpp`.
`src/submodules/clp`	Updated submodule reference from `86299ca2907565e09cb10c2ddd3661ad1ceb6cb0` to `e1f3f2abe3473324b19d66e22c182ec3ac0d408f`.
`src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp`	Added new file implementing `UnstructuredIrStreamReader` class with methods for reading, filtering, and deserializing unstructured IR streams.
`src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp`	Added new header file for `UnstructuredIrStreamReader` class, including methods for handling log events and decoding streams.

Possibly related PRs

Add log-level filtering methods and associated refactoring: #12: This PR enhances the StreamReader class by adding new methods for filtering log events and managing deserialization, which directly relates to the significant modifications made in the main PR regarding the StreamReader class.
Refactor StreamReader to modularize decoding logic. #22: This PR refactors the StreamReader class to modularize decoding logic, which aligns with the main PR's focus on restructuring the StreamReader class and improving its functionality.

Suggested reviewers

junhaoliao
kirkrodrigues

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (5)

src/clp_ffi_js/ir/decoding_methods.hpp (1)

9-9: Add documentation for the get_version function.

As this is part of the public API, please add documentation explaining:

The function's purpose

Expected return value format

Any exceptions that might be thrown

Any preconditions or side effects
src/clp_ffi_js/ir/StreamReaderDataContext.hpp (1)
Line range hint 18-18: Class name needs to be updated to match the refactoring

The class name should be IrStreamReaderDataContext to maintain consistency with the renaming changes mentioned in the PR objectives and summary. This aligns with the pattern of renaming StreamReader to IrStreamReader.

Apply this diff to update the class name:
-class StreamReaderDataContext {
+class IrStreamReaderDataContext {
src/clp_ffi_js/ir/decoding_methods.cpp (2)
3-8: Consider reorganizing include directives

The includes should be grouped into three distinct blocks separated by blank lines:

System headers (, , etc.)

Third-party headers (nlohmann/json)

Project headers (clp/*)
+// System headers
 #include <cstdint>
-#include <json/single_include/nlohmann/json.hpp>
 #include <string>
 #include <string_view>
 #include <vector>

+// Third-party headers
+#include <json/single_include/nlohmann/json.hpp>
+#include <spdlog/spdlog.h>
+
+// Project headers
 #include <clp/ErrorCode.hpp>
Also applies to: 11-11, 14-14

24-41: Enhance error reporting for preamble deserialization

The error handling for preamble deserialization could be more informative.
     if (clp::ffi::ir_stream::IRErrorCode::IRErrorCode_Success != deserialize_preamble_result) {
         SPDLOG_CRITICAL(
-                "Failed to deserialize preamble for version reading: {}",
+                "Failed to deserialize IR stream preamble: error_code={}, metadata_size={}",
-                deserialize_preamble_result
+                deserialize_preamble_result,
+                metadata_bytes.size()
         );
         throw ClpFfiJsException{
                 clp::ErrorCode::ErrorCode_Failure,
                 __FILENAME__,
                 __LINE__,
-                "Failed to deserialize preamble for version reading."
+                "Failed to deserialize IR stream preamble. See logs for details."
         };
     }
src/clp_ffi_js/ir/IrStreamReader.cpp (1)

193-201: Ensure safe handling of message content in EM_ASM

When passing message.c_str() to EM_ASM, there's a potential risk if the message contains unexpected characters or is not properly null-terminated. To enhance security and stability, consider validating or sanitizing the message before passing it to JavaScript.

You might sanitize the message or ensure it meets expected format constraints before usage.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9e82372 and ad306f2.

📒 Files selected for processing (7)

src/clp_ffi_js/ir/IrStreamReader.cpp (1 hunks)
src/clp_ffi_js/ir/IrStreamReader.hpp (1 hunks)
src/clp_ffi_js/ir/StreamReader.cpp (2 hunks)
src/clp_ffi_js/ir/StreamReader.hpp (3 hunks)
src/clp_ffi_js/ir/StreamReaderDataContext.hpp (1 hunks)
src/clp_ffi_js/ir/decoding_methods.cpp (1 hunks)
src/clp_ffi_js/ir/decoding_methods.hpp (1 hunks)

🔇 Additional comments (18)

src/clp_ffi_js/ir/decoding_methods.hpp (3)

1-7: LGTM! Header organization is clean and follows best practices.

The header guards and includes are well-organized and contain only the necessary dependencies.

8-8: LGTM! Namespace declaration is appropriate.

The nested namespace follows modern C++ conventions and matches the project structure.

Line range hint 10-17: Consider separating the rewind and validation responsibilities.

The function name suggests it performs two distinct operations: rewinding and validation. Consider splitting these into separate functions to improve modularity and reusability:

rewind_reader()
validate_encoding_type()

This would better follow the Single Responsibility Principle and make the code more maintainable.

Additionally, since this is a work in progress, we should verify the exception handling across the codebase:

src/clp_ffi_js/ir/StreamReaderDataContext.hpp (2)

14-14: Documentation update looks good, eh!

The documentation has been properly updated to reference IrStreamReader instead of StreamReader.

Line range hint 1-57: Verify the impact of class renaming

The class implementation looks solid with proper memory management and RAII principles. However, we should verify that all references to this class are updated consistently.

Let's check for any references to this class that need updating:

✅ Verification successful

Let me check the content of IrStreamReader files to ensure the class name is used consistently and there are no mismatches.

No issues found with StreamReaderDataContext class references

After reviewing the implementation files, all references to StreamReaderDataContext are consistent and properly used throughout the codebase. The class is:

Correctly included in IrStreamReader.hpp
Properly used as a template class with four_byte_encoded_variable_t
Consistently used in both the header and implementation files
Appropriately managed with smart pointers and move semantics

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all references to StreamReaderDataContext that need updating

# Search for direct references to StreamReaderDataContext
echo "Checking for direct references to StreamReaderDataContext:"
rg -l "StreamReaderDataContext"

# Search for template usage patterns
echo -e "\nChecking for template usage patterns:"
rg "StreamReaderDataContext<"

Length of output: 1083

Script:

#!/bin/bash
# Check the content of IrStreamReader files
echo "Content of IrStreamReader.hpp:"
cat src/clp_ffi_js/ir/IrStreamReader.hpp

echo -e "\nContent of IrStreamReader.cpp:"
cat src/clp_ffi_js/ir/IrStreamReader.cpp

Length of output: 14360

src/clp_ffi_js/ir/decoding_methods.cpp (2)

Line range hint 57-82: LGTM! Existing implementation is robust

The function properly handles reader positioning and encoding type validation with appropriate error handling.

20-55: 🛠️ Refactor suggestion

Add defensive JSON validation and improve error handling

The implementation looks solid but could benefit from some improvements:

Consider these improvements:

Add JSON structure validation:

     nlohmann::json const metadata = nlohmann::json::parse(metadata_view);
+    if (!metadata.is_object()) {
+        throw ClpFfiJsException{
+            clp::ErrorCode::ErrorCode_MetadataCorrupted,
+            __FILENAME__,
+            __LINE__,
+            "Metadata is not a valid JSON object."
+        };
+    }
+    if (!metadata.contains(clp::ffi::ir_stream::cProtocol::Metadata::VersionKey)) {
+        throw ClpFfiJsException{
+            clp::ErrorCode::ErrorCode_MetadataCorrupted,
+            __FILENAME__,
+            __LINE__,
+            "Version information missing from metadata."
+        };
+    }

Add more detailed logging:

-    SPDLOG_INFO("The version is {}", version);
+    SPDLOG_INFO("Successfully extracted IR stream version: {}", version);

src/clp_ffi_js/ir/StreamReader.cpp (2)

6-6: Ensure C++20 support for <format> and std::format.

The inclusion of <format> and the usage of std::format in line 51 rely on C++20 features. Please verify that the project's build configuration and compilers are set to support C++20 to prevent potential compilation issues.

Also applies to: 51-51

64-68: Confirm class binding consistency in EMSCRIPTEN bindings.

In the EMSCRIPTEN_BINDINGS section, the class is registered as ClpStreamReader while binding clp_ffi_js::ir::StreamReader. Since there's an introduction of IrStreamReader, please ensure that the class binding accurately reflects the intended class hierarchy and that the JavaScript interface is correctly updated.

src/clp_ffi_js/ir/StreamReader.hpp (4)

30-31: Use std::unique_ptr for proper memory management in create method

Returning a std::unique_ptr<StreamReader> from the static create method enhances memory safety by ensuring exclusive ownership and automatic resource cleanup.

34-34: Declare virtual destructor in base class

Adding a virtual destructor to the StreamReader base class ensures that destructors of derived classes are called correctly, preventing potential resource leaks.

49-49: Convert member functions to pure virtual

Declaring methods as pure virtual functions (= 0) enforces that derived classes implement these interfaces, promoting polymorphism and ensuring consistent behaviour across implementations.

Also applies to: 54-54, 61-61, 69-69, 86-87

89-90: Make constructor protected to prevent direct instantiation

Moving the constructor to the protected section prevents direct instantiation of the StreamReader class, indicating its role as an abstract base class intended for inheritance.

src/clp_ffi_js/ir/IrStreamReader.hpp (5)

1-2: Header guard is correctly defined

The header guard CLP_FFI_JS_IR_IR_STREAM_READER_HPP is properly set up to prevent multiple inclusions.

27-27: Verify the necessity of 'friend StreamReader;' declaration

The friend StreamReader; declaration may not be necessary unless StreamReader requires access to the private members of IrStreamReader. Consider removing it if it's not needed.

45-56: Properly managing copy and move semantics

Disabling the copy constructor and copy assignment operator, while correctly handling the move constructor and deleting the move assignment operator, ensures proper resource management.

103-105: Efficient use of move semantics in constructor

Utilizing move semantics in the constructor to initialize stream_reader_data_context optimizes resource allocation.

113-118: Appropriate management of member variables

The use of smart pointers and standard containers for member variables ensures proper memory management and exception safety.

src/clp_ffi_js/ir/StreamReader.cpp

coderabbitai · 2024-10-31T20:24:12Z

src/clp_ffi_js/ir/IrStreamReader.hpp

+     *
+     * @param log_level_filter Array of selected log levels
+     */
+    void filter_log_events(emscripten::val const& log_level_filter) override;


🛠️ Refactor suggestion

Ensure appropriate passing of 'emscripten::val'

Consider passing emscripten::val by value rather than by const reference in filter_log_events. Since emscripten::val is a lightweight handle to a JavaScript value, passing it by value might be more suitable and aligns with common practices.

coderabbitai · 2024-10-31T20:24:12Z

src/clp_ffi_js/ir/IrStreamReader.cpp

+            .constructor(
+                    &clp_ffi_js::ir::IrStreamReader::create,
+                    emscripten::return_value_policy::take_ownership()
+            )


🛠️ Refactor suggestion

Use return_value_policy::move for efficiency

In the Emscripten bindings, the constructor uses return_value_policy::take_ownership, which is typically used for pointers. Since IrStreamReader::create returns by value and supports move semantics, it's more appropriate to use return_value_policy::move to avoid unnecessary copies.

Apply this diff to adjust the return value policy:

.constructor( &clp_ffi_js::ir::IrStreamReader::create, - emscripten::return_value_policy::take_ownership() + emscripten::return_value_policy::move )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

.constructor(

&clp_ffi_js::ir::IrStreamReader::create,

emscripten::return_value_policy::take_ownership()

)

.constructor(

&clp_ffi_js::ir::IrStreamReader::create,

emscripten::return_value_policy::move

)

coderabbitai · 2024-10-31T20:24:12Z

src/clp_ffi_js/ir/IrStreamReader.cpp

+
+        auto const parsed{log_event.get_message().decode_and_unparse()};
+        if (false == parsed.has_value()) {
+            SPDLOG_ERROR("Failed to decode message.");
+            break;
+        }
+        message = parsed.value();


⚠️ Potential issue

Handle message decoding failures explicitly

In the decode_range method, when message decoding fails, the code logs an error and breaks the loop. This may result in partial results without informing the caller of the failure. Consider throwing an exception to notify the caller about the decoding issue.

Apply this diff to throw an exception upon decoding failure:

auto const parsed{log_event.get_message().decode_and_unparse()}; if (false == parsed.has_value()) { SPDLOG_ERROR("Failed to decode message."); - break; + throw ClpFfiJsException{ + clp::ErrorCode::ErrorCode_Corrupt, + __FILENAME__, + __LINE__, + "Failed to decode message" + }; } message = parsed.value();

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

auto const parsed{log_event.get_message().decode_and_unparse()};

if (false == parsed.has_value()) {

SPDLOG_ERROR("Failed to decode message.");

break;

}

message = parsed.value();

auto const parsed{log_event.get_message().decode_and_unparse()};

if (false == parsed.has_value()) {

SPDLOG_ERROR("Failed to decode message.");

throw ClpFfiJsException{

clp::ErrorCode::ErrorCode_Corrupt,

__FILENAME__,

__LINE__,

"Failed to decode message"

};

}

message = parsed.value();

Let's add this. Messages decoding is not expected to fail unless the file has been corrupted / the deserializer in CLP core does not handle edge cases.

@junhaoliao, understood. Throwing an exception upon message decoding failure is appropriate here since message decoding failures are not expected unless the file has been corrupted or the deserializer in CLP core does not handle edge cases. I'll note this for future reference.

✏️ Learnings added

Learnt from: junhaoliao PR: y-scope/clp-ffi-js#26 File: src/clp_ffi_js/ir/IrStreamReader.cpp:162-168 Timestamp: 2024-11-04T19:55:26.917Z Learning: In the `IrStreamReader::decode_range` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, message decoding failures are not expected unless the file is corrupted or the deserializer in CLP core does not handle edge cases, so it's appropriate to throw an exception upon message decoding failure.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

🧠 Learnings used

Learnt from: junhaoliao PR: y-scope/clp-ffi-js#26 File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59 Timestamp: 2024-11-04T19:35:32.602Z Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

coderabbitai · 2024-10-31T20:24:13Z

src/clp_ffi_js/ir/IrStreamReader.cpp

+    auto filter_levels{emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter
+    )};


⚠️ Potential issue

Add error handling for invalid log_level_filter input

Currently, the function assumes that log_level_filter is a valid array. To prevent potential errors if a non-array is passed from JavaScript, consider adding a type check and throwing an exception or returning an error if the input is invalid.

Apply this diff to enhance input validation:

void IrStreamReader::filter_log_events(emscripten::val const& log_level_filter) { if (log_level_filter.isNull()) { m_filtered_log_event_map.reset(); return; } + if (!log_level_filter.isArray()) { + throw ClpFfiJsException{ + clp::ErrorCode::ErrorCode_BadParam, + __FILENAME__, + __LINE__, + "Expected an array for log_level_filter" + }; + } m_filtered_log_event_map.emplace(); auto filter_levels{emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter )};

Committable suggestion was skipped due to low confidence.

I think for now this is skippable since the wrapper code is typed with TypeScript and any usage with non-array would fail static checks.

@junhaoliao, understood. Since the wrapper code is typed with TypeScript and non-array inputs would fail static checks, additional error handling here is not necessary.

✏️ Learnings added

Learnt from: junhaoliao PR: y-scope/clp-ffi-js#26 File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59 Timestamp: 2024-11-04T19:35:32.367Z Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Actually sorry for my oversight, I think we want to define a type (similar to FilteredLogEventMapTsType) instead of directly using emscripten::val here.

@junhaoliao, thank you for the clarification. Defining a specific type for log_level_filter will enhance type safety and improve the code's readability. Here's how you might define a type similar to FilteredLogEventMapTsType:

using LogLevelFilterTsType = emscripten::val; // Define a type representing the log level filter void IrStreamReader::filter_log_events(LogLevelFilterTsType const& log_level_filter) { // Implementation remains the same }

This approach makes the expected input type explicit in the method signature and aligns with the existing type definitions.

Since this updates our previous understanding and may contradict the earlier learning (ID: 940b05fa-811e-47b0-98a2-bfbbe962ccf3) regarding the unnecessary runtime error handling for log_level_filter, would you like me to remove this learning? You can also manage learnings directly using the CodeRabbit UI learnings page.

🧠 Learnings used

Learnt from: junhaoliao PR: y-scope/clp-ffi-js#26 File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59 Timestamp: 2024-11-04T19:35:32.602Z Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

coderabbitai

Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (7)

src/clp_ffi_js/ir/decoding_methods.cpp (1)
3-8: Consider reorganizing includes for better readability

Consider grouping includes into standard library, third-party libraries, and project-specific headers, separated by newlines:
-#include <cstdint>
-#include <json/single_include/nlohmann/json.hpp>
-#include <string>
-#include <string_view>
-#include <vector>
-#include <clp/ErrorCode.hpp>
-#include <clp/ffi/ir_stream/decoding_methods.hpp>
-#include <clp/ffi/ir_stream/protocol_constants.hpp>
-#include <clp/ReaderInterface.hpp>
-#include <clp/TraceableException.hpp>
-#include <clp/type_utils.hpp>
-#include <spdlog/spdlog.h>

+// Standard library
+#include <cstdint>
+#include <string>
+#include <string_view>
+#include <vector>
+
+// Third-party libraries
+#include <json/single_include/nlohmann/json.hpp>
+#include <spdlog/spdlog.h>
+
+// Project headers
+#include <clp/ErrorCode.hpp>
+#include <clp/ReaderInterface.hpp>
+#include <clp/TraceableException.hpp>
+#include <clp/ffi/ir_stream/decoding_methods.hpp>
+#include <clp/ffi/ir_stream/protocol_constants.hpp>
+#include <clp/type_utils.hpp>
Also applies to: 11-11, 14-14
src/clp_ffi_js/ir/StreamReader.hpp (3)
Line range hint 41-42: Document the implementation dependency more formally

The comment about LogEventDeserializer creates tight coupling. Consider documenting this relationship more formally in the class documentation.
-    // Delete move assignment operator since it's also disabled in `clp::ir::LogEventDeserializer`.
+    /**
+     * Move assignment is disabled due to the class's relationship with LogEventDeserializer.
+     * @note This class follows LogEventDeserializer's contract which disallows move assignment.
+     */
59-59: Enhance documentation for error handling behaviour

These pure virtual methods should document their error handling behaviour:

What exceptions might be thrown?

What are the preconditions and postconditions?
-    virtual void filter_log_events(emscripten::val const& log_level_filter) = 0;
+    /**
+     * Generates a filtered collection from all log events.
+     * @param log_level_filter Array of selected log levels
+     * @throws std::invalid_argument If the filter is malformed
+     * @post Filtered collection is updated to reflect the new filter
+     */
+    virtual void filter_log_events(emscripten::val const& log_level_filter) = 0;

-    [[nodiscard]] virtual auto deserialize_stream() -> size_t = 0;
+    /**
+     * Deserializes all log events in the stream.
+     * @return The number of successfully deserialized ("valid") log events
+     * @throws std::runtime_error If the stream is corrupted or invalid
+     * @post The stream is deallocated after processing
+     */
+    [[nodiscard]] virtual auto deserialize_stream() -> size_t = 0;
Also applies to: 67-67

28-29: Consider version handling strategy

Given that the PR objectives mention adding version checking in the factory method, consider:

Documenting the version compatibility requirements in the class documentation

Adding version-related methods to the interface

Implementing version validation in the factory method

This will help ensure proper version handling across different stream formats.

Also applies to: 84-85
src/clp_ffi_js/ir/IrStreamReader.hpp (3)

1-18: Consider adding comprehensive file-level documentation.

While the includes are well-organized, it would be beneficial to add file-level documentation explaining:

The purpose and scope of this header file

The relationship between IrStreamReader and StreamReader

Any threading considerations or requirements

74-90: Enhance decode_range documentation for error handling.

The documentation should clarify:

What specific error conditions might cause a null return

Whether any exceptions might be thrown

The behaviour when begin_idx > end_idx

99-110: Add documentation for private members.

Consider adding:

Documentation for create_data_context explaining ownership transfer of parameters

Documentation for each member variable explaining their purpose and lifecycle

Any invariants maintained by these members

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ad306f2 and dba3910.

📒 Files selected for processing (7)

CMakeLists.txt (1 hunks)
src/clp_ffi_js/ir/IrStreamReader.cpp (1 hunks)
src/clp_ffi_js/ir/IrStreamReader.hpp (1 hunks)
src/clp_ffi_js/ir/StreamReader.cpp (2 hunks)
src/clp_ffi_js/ir/StreamReader.hpp (3 hunks)
src/clp_ffi_js/ir/decoding_methods.cpp (2 hunks)
src/clp_ffi_js/ir/decoding_methods.hpp (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/clp_ffi_js/ir/decoding_methods.hpp

🔇 Additional comments (11)

src/clp_ffi_js/ir/decoding_methods.cpp (2)

Line range hint 20-43: LGTM! Well-structured error handling

The function properly validates the encoding type with appropriate error handling and clear error messages.

45-80: Verify version handling across the codebase

Let's verify how the version information is used elsewhere in the codebase to ensure compatibility.

✅ Verification successful

Let me gather more specific information about how the version is used in these files.

Version handling implementation is properly constrained

The version handling is well-defined and properly implemented:

The get_version function is used only in StreamReader.cpp
Version validation explicitly checks for specific versions: "v0.0.1", "v0.0.0", "0.0.1", "0.0.0"
Appropriate error handling is in place for unsupported versions
No other parts of the codebase make assumptions about version format

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for version-related code patterns
echo "Searching for version usage patterns..."
rg -l "VersionKey|get_version" --type cpp

echo "Searching for version format validation..."
ast-grep --pattern 'if ($version $_) { $$$ }'

Length of output: 388

Script:

#!/bin/bash
# Check version-related code in the identified files
echo "=== Content of decoding_methods.hpp ==="
rg -A 5 "get_version|VersionKey" src/clp_ffi_js/ir/decoding_methods.hpp

echo -e "\n=== Content of StreamReader.cpp ==="
rg -A 5 "get_version|VersionKey" src/clp_ffi_js/ir/StreamReader.cpp

# Look for version comparisons or validations
echo -e "\n=== Version comparisons or checks ==="
rg "version|Version" --type cpp -C 3

Length of output: 5891

src/clp_ffi_js/ir/StreamReader.hpp (3)

28-29: Excellent improvement to memory management!

The shift to std::unique_ptr return type enforces RAII principles and makes ownership semantics explicit. The [[nodiscard]] attribute helps prevent accidental resource leaks.

32-32: Virtual destructor is correctly defaulted

This enables proper cleanup of derived classes through base class pointers.

87-88: Protected constructor correctly enforces base class design

The protected defaulted constructor properly prevents direct instantiation while enabling derived class construction, which aligns well with the factory method pattern.

src/clp_ffi_js/ir/StreamReader.cpp (3)

5-6: LGTM: New includes support error handling improvements

The addition of ErrorCode.hpp and format headers properly supports the enhanced error handling mechanism.

Line range hint 21-53: LGTM: Improved memory management and error handling

The changes demonstrate good practices:

Use of std::unique_ptr for automatic resource management
Proper move semantics for transferring ownership
Comprehensive error handling with detailed messages

Line range hint 58-77: LGTM: Emscripten bindings properly updated

The bindings correctly reflect the class restructuring and include appropriate memory management policies.

src/clp_ffi_js/ir/IrStreamReader.hpp (2)

32-47: Well-structured class declaration with proper memory management.

The class follows modern C++ best practices:

Appropriate use of friend declaration for the factory pattern
Proper handling of special member functions
Clear documentation of design decisions

64-64: Previous review comment about emscripten::val parameter is still applicable.

CMakeLists.txt (1)

117-117: The addition of IrStreamReader.cpp looks good, eh!

The source file addition aligns with the PR objectives for extracting the StreamReader base class.

Let's verify the presence of corresponding header files:

✅ Verification successful

Beauty! The header files are all present and properly organized, eh!

Found all the expected header files in the src/clp_ffi_js/ir/ directory:

IrStreamReader.hpp
StreamReader.hpp
StreamReaderDataContext.hpp

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the presence of header files for IrStreamReader and StreamReader

# Test: Check if the header files exist
fd -e hpp -e h "IrStreamReader|StreamReader" src/clp_ffi_js/ir/

Length of output: 183

coderabbitai · 2024-11-01T18:01:56Z

src/clp_ffi_js/ir/decoding_methods.cpp

+auto get_version(clp::ReaderInterface& reader) -> std::string {
+    // The encoding type bytes must be consumed before the metadata can be read.
+    rewind_reader_and_validate_encoding_type(reader);
+
+    // Deserialize metadata bytes from preamble.
+    clp::ffi::ir_stream::encoded_tag_t metadata_type{};
+    std::vector<int8_t> metadata_bytes;
+    auto const deserialize_preamble_result{
+            clp::ffi::ir_stream::deserialize_preamble(reader, metadata_type, metadata_bytes)
+    };
+    if (clp::ffi::ir_stream::IRErrorCode::IRErrorCode_Success != deserialize_preamble_result) {
+        SPDLOG_CRITICAL(
+                "Failed to deserialize preamble for version reading: {}",
+                deserialize_preamble_result
+        );
+        throw ClpFfiJsException{
+                clp::ErrorCode::ErrorCode_Failure,
+                __FILENAME__,
+                __LINE__,
+                "Failed to deserialize preamble for version reading."
+        };
+    }
+
+    // Deserialize metadata bytes which is encoded in JSON.
+    std::string_view const metadata_view{
+            clp::size_checked_pointer_cast<char const>(metadata_bytes.data()),
+            metadata_bytes.size()
+    };
+    nlohmann::json const metadata = nlohmann::json::parse(metadata_view);
+
+    // Retrieve version from metadata.
+    auto const& version{metadata.at(clp::ffi::ir_stream::cProtocol::Metadata::VersionKey)};
+    SPDLOG_INFO("The version is {}", version);
+
+    return version;
+}


🛠️ Refactor suggestion

Consider adding more robust error handling and validation

While the basic error handling is present, consider these improvements:

Add validation for the JSON structure and version format

Consider using string_view for better memory efficiency

Provide more specific error messages in exceptions

Here's a suggested improvement:

-auto get_version(clp::ReaderInterface& reader) -> std::string { +auto get_version(clp::ReaderInterface& reader) -> std::string_view { // The encoding type bytes must be consumed before the metadata can be read. rewind_reader_and_validate_encoding_type(reader); // Deserialize metadata bytes from preamble. clp::ffi::ir_stream::encoded_tag_t metadata_type{}; std::vector<int8_t> metadata_bytes; auto const deserialize_preamble_result{ clp::ffi::ir_stream::deserialize_preamble(reader, metadata_type, metadata_bytes) }; if (clp::ffi::ir_stream::IRErrorCode::IRErrorCode_Success != deserialize_preamble_result) { SPDLOG_CRITICAL( - "Failed to deserialize preamble for version reading: {}", + "Failed to deserialize preamble: error code {}", deserialize_preamble_result ); throw ClpFfiJsException{ clp::ErrorCode::ErrorCode_Failure, __FILENAME__, __LINE__, - "Failed to deserialize preamble for version reading." + fmt::format("Failed to deserialize preamble: error code {}", deserialize_preamble_result) }; } // Deserialize metadata bytes which is encoded in JSON. std::string_view const metadata_view{ clp::size_checked_pointer_cast<char const>(metadata_bytes.data()), metadata_bytes.size() }; - nlohmann::json const metadata = nlohmann::json::parse(metadata_view); + nlohmann::json metadata; + try { + metadata = nlohmann::json::parse(metadata_view); + } catch (const nlohmann::json::parse_error& e) { + SPDLOG_CRITICAL("Failed to parse metadata JSON: {}", e.what()); + throw ClpFfiJsException{ + clp::ErrorCode::ErrorCode_MetadataCorrupted, + __FILENAME__, + __LINE__, + fmt::format("Failed to parse metadata JSON: {}", e.what()) + }; + } // Retrieve version from metadata. - auto const& version{metadata.at(clp::ffi::ir_stream::cProtocol::Metadata::VersionKey)}; + try { + auto const& version{metadata.at(clp::ffi::ir_stream::cProtocol::Metadata::VersionKey)}; + if (!version.is_string()) { + throw ClpFfiJsException{ + clp::ErrorCode::ErrorCode_MetadataCorrupted, + __FILENAME__, + __LINE__, + "Version in metadata is not a string" + }; + } + } catch (const nlohmann::json::out_of_range& e) { + throw ClpFfiJsException{ + clp::ErrorCode::ErrorCode_MetadataCorrupted, + __FILENAME__, + __LINE__, + fmt::format("Version key not found in metadata: {}", e.what()) + }; + } SPDLOG_INFO("The version is {}", version); return version; }

Committable suggestion skipped: line range outside the PR's diff.

I agree with the rabbit on all those 3 items thought the error messages in the log and throwable should be consistent (don't apply the suggested code directly).

@junhaoliao, understood. I'll ensure that error messages in logs and exceptions are consistent in future suggestions.

✏️ Learnings added

Learnt from: junhaoliao PR: y-scope/clp-ffi-js#26 File: src/clp_ffi_js/ir/decoding_methods.cpp:45-80 Timestamp: 2024-11-04T21:02:17.823Z Learning: Ensure that error messages in logs and exceptions are consistent.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Correction: it is a bad idea to return a string_view in this method since metadata goes out of scope when this function returns. Let's address only 1 & 2.

@junhaoliao, thank you for pointing that out. You're correct; returning a std::string_view in this case is unsafe since metadata goes out of scope when the function returns. Returning a std::string is the appropriate choice to ensure the version string remains valid.

✏️ Learnings added

Learnt from: junhaoliao PR: y-scope/clp-ffi-js#26 File: src/clp_ffi_js/ir/decoding_methods.cpp:45-80 Timestamp: 2024-11-04T21:06:24.293Z Learning: In `src/clp_ffi_js/ir/decoding_methods.cpp`, within the `get_version(clp::ReaderInterface&)` function, avoid returning `std::string_view` when referencing local data that goes out of scope. Instead, return `std::string` to ensure data validity.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

some of coderabbit suggestion incliude printing things that are not printable, but otherwise did comment

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

coderabbitai · 2024-11-01T18:01:56Z

src/clp_ffi_js/ir/StreamReader.cpp

+    auto const version{get_version(*zstd_decompressor)};
+    if (version == "v0.0.1" || version == "v0.0.0" || version == "0.0.1" || version == "0.0.0") {
+        auto stream_reader_data_context{IrStreamReader::create_data_context(
+                std::move(zstd_decompressor),
+                std::move(data_buffer)
        )};
-        m_encoded_log_events.emplace_back(std::move(log_viewer_event));
-    }
-    m_stream_reader_data_context.reset(nullptr);
-    return m_encoded_log_events.size();
-}
-
-auto StreamReader::decode_range(size_t begin_idx, size_t end_idx, bool use_filter) const
-        -> DecodedResultsTsType {
-    if (use_filter && false == m_filtered_log_event_map.has_value()) {
-        return DecodedResultsTsType{emscripten::val::null()};
-    }
-
-    size_t length{0};
-    if (use_filter) {
-        length = m_filtered_log_event_map->size();
-    } else {
-        length = m_encoded_log_events.size();
-    }
-    if (length < end_idx || begin_idx > end_idx) {
-        return DecodedResultsTsType{emscripten::val::null()};
-    }
-
-    std::string message;
-    constexpr size_t cDefaultReservedMessageLength{512};
-    message.reserve(cDefaultReservedMessageLength);
-    auto const results{emscripten::val::array()};
-
-    for (size_t i = begin_idx; i < end_idx; ++i) {
-        size_t log_event_idx{0};
-        if (use_filter) {
-            log_event_idx = m_filtered_log_event_map->at(i);
-        } else {
-            log_event_idx = i;
-        }
-        auto const& log_event{m_encoded_log_events[log_event_idx]};

-        auto const parsed{log_event.get_message().decode_and_unparse()};
-        if (false == parsed.has_value()) {
-            SPDLOG_ERROR("Failed to decode message.");
-            break;
-        }
-        message = parsed.value();
-
-        m_ts_pattern.insert_formatted_timestamp(log_event.get_timestamp(), message);
-
-        EM_ASM(
-                { Emval.toValue($0).push([UTF8ToString($1), $2, $3, $4]); },
-                results.as_handle(),
-                message.c_str(),
-                log_event.get_timestamp(),
-                log_event.get_log_level(),
-                log_event_idx + 1
+        return std::unique_ptr<IrStreamReader>(
+                new IrStreamReader(std::move(stream_reader_data_context))
        );
    }


🛠️ Refactor suggestion

Consider using version constants and semantic version parsing

The hard-coded version strings ("v0.0.1", "v0.0.0", "0.0.1", "0.0.0") could be difficult to maintain. Consider:

Defining these as constants

Implementing semantic version parsing to handle versions more robustly

Using a version comparison utility to support future versions

Example improvement:

// In a header file: namespace clp_ffi_js::ir { constexpr std::array<std::string_view, 4> SUPPORTED_VERSIONS = { "v0.0.1", "v0.0.0", "0.0.1", "0.0.0" }; } // In the implementation: if (std::ranges::find(SUPPORTED_VERSIONS, version) != SUPPORTED_VERSIONS.end()) { // ... existing creation logic }

Given an IRv1 version validate util function is not to be added to CLP-core soon, I agree with this approach.

@junhaoliao, understood. I'll note that an IRv1 version validation utility function is not planned to be added to CLP-core in the near future.

✏️ Learnings added

Learnt from: junhaoliao PR: y-scope/clp-ffi-js#26 File: src/clp_ffi_js/ir/StreamReader.cpp:35-45 Timestamp: 2024-11-04T20:55:01.575Z Learning: An IRv1 version validation utility function is not planned to be added to CLP-core soon.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

coderabbitai · 2024-11-01T18:01:56Z

src/clp_ffi_js/ir/IrStreamReader.cpp

+void IrStreamReader::filter_log_events(emscripten::val const& log_level_filter) {
+    if (log_level_filter.isNull()) {
+        m_filtered_log_event_map.reset();
+        return;
+    }
+
+    m_filtered_log_event_map.emplace();
+    auto filter_levels{emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter
+    )};
+    for (size_t log_event_idx = 0; log_event_idx < m_encoded_log_events.size(); ++log_event_idx) {
+        auto const& log_event = m_encoded_log_events[log_event_idx];
+        if (std::ranges::find(
+                    filter_levels,
+                    clp::enum_to_underlying_type(log_event.get_log_level())
+            )
+            != filter_levels.end())
+        {
+            m_filtered_log_event_map->emplace_back(log_event_idx);
+        }
+    }
+}


⚠️ Potential issue

Consider adding a type check for log_level_filter

Currently, the function assumes that log_level_filter is a valid array. To prevent potential runtime errors if a non-array is passed from JavaScript, consider adding a type check and throwing an exception if the input is invalid.

coderabbitai · 2024-11-01T18:01:56Z

src/clp_ffi_js/ir/IrStreamReader.cpp

+    for (size_t i = begin_idx; i < end_idx; ++i) {
+        size_t log_event_idx{0};
+        if (use_filter) {
+            log_event_idx = m_filtered_log_event_map->at(i);
+        } else {
+            log_event_idx = i;
+        }
+        auto const& log_event{m_encoded_log_events[log_event_idx]};
+
+        auto const parsed{log_event.get_message().decode_and_unparse()};
+        if (false == parsed.has_value()) {
+            SPDLOG_ERROR("Failed to decode message.");
+            break;
+        }
+        message = parsed.value();
+
+        m_ts_pattern.insert_formatted_timestamp(log_event.get_timestamp(), message);
+
+        EM_ASM(
+                { Emval.toValue($0).push([UTF8ToString($1), $2, $3, $4]); },
+                results.as_handle(),
+                message.c_str(),
+                log_event.get_timestamp(),
+                log_event.get_log_level(),
+                log_event_idx + 1
+        );
+    }
+


🛠️ Refactor suggestion

Optimize the loop in decode_range

Instead of manually constructing the results array using EM_ASM, consider using a more direct approach with Emscripten bindings to improve readability and maintainability.

To be explored in the future.

@junhaoliao, understood.

🧠 Learnings used

Learnt from: junhaoliao PR: y-scope/clp-ffi-js#26 File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59 Timestamp: 2024-11-04T19:35:32.602Z Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

coderabbitai · 2024-11-01T18:01:56Z

src/clp_ffi_js/ir/IrStreamReader.cpp

+        auto const parsed{log_event.get_message().decode_and_unparse()};
+        if (false == parsed.has_value()) {
+            SPDLOG_ERROR("Failed to decode message.");
+            break;
+        }
+        message = parsed.value();


⚠️ Potential issue

Handle message decoding failures more explicitly

In the decode_range method, when message decoding fails, the code logs an error and breaks the loop. This might result in partial results without informing the caller of the failure. Consider throwing an exception to notify the caller about the decoding issue.

You can apply this diff to throw an exception upon decoding failure:

auto const parsed{log_event.get_message().decode_and_unparse()}; if (false == parsed.has_value()) { SPDLOG_ERROR("Failed to decode message."); - break; + throw ClpFfiJsException{ + clp::ErrorCode::ErrorCode_Corrupt, + __FILENAME__, + __LINE__, + "Failed to decode message" + }; } message = parsed.value();

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

auto const parsed{log_event.get_message().decode_and_unparse()};

if (false == parsed.has_value()) {

SPDLOG_ERROR("Failed to decode message.");

break;

}

message = parsed.value();

auto const parsed{log_event.get_message().decode_and_unparse()};

if (false == parsed.has_value()) {

SPDLOG_ERROR("Failed to decode message.");

throw ClpFfiJsException{

clp::ErrorCode::ErrorCode_Corrupt,

__FILENAME__,

__LINE__,

"Failed to decode message"

};

}

message = parsed.value();

this seems to be the same as the above comment.

@junhaoliao, thank you for pointing that out. It seems the comment was duplicated. Sorry for any confusion.

🧠 Learnings used

Learnt from: junhaoliao PR: y-scope/clp-ffi-js#26 File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59 Timestamp: 2024-11-04T19:35:32.602Z Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

coderabbitai · 2024-11-01T18:01:56Z

src/clp_ffi_js/ir/IrStreamReader.cpp

+            if (std::errc::no_message_available == error) {
+                break;
+            }
+            if (std::errc::result_out_of_range == error) {
+                SPDLOG_ERROR("File contains an incomplete IR stream");
+                break;
+            }


⚠️ Potential issue

Uniform error handling upon deserialization failure

When encountering std::errc::result_out_of_range, the code logs an error and breaks the loop. To maintain consistency and ensure the caller is properly informed of all deserialization failures, consider throwing an exception here as well.

Here's a suggested change:

if (std::errc::result_out_of_range == error) { SPDLOG_ERROR("File contains an incomplete IR stream"); - break; + throw ClpFfiJsException{ + clp::ErrorCode::ErrorCode_Corrupt, + __FILENAME__, + __LINE__, + "File contains an incomplete IR stream" + }; }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if (std::errc::no_message_available == error) {

break;

}

if (std::errc::result_out_of_range == error) {

SPDLOG_ERROR("File contains an incomplete IR stream");

break;

}

if (std::errc::no_message_available == error) {

break;

}

if (std::errc::result_out_of_range == error) {

SPDLOG_ERROR("File contains an incomplete IR stream");

throw ClpFfiJsException{

clp::ErrorCode::ErrorCode_Corrupt,

__FILENAME__,

__LINE__,

"File contains an incomplete IR stream"

};

}

I think we should not throw an exception here because we intend to handle missing End-of-Stream packet gracefully.

@junhaoliao, understood. Thank you for the clarification.

✏️ Learnings added

Learnt from: junhaoliao PR: y-scope/clp-ffi-js#26 File: src/clp_ffi_js/ir/IrStreamReader.cpp:85-91 Timestamp: 2024-11-04T19:37:26.993Z Learning: In the `IrStreamReader::deserialize_stream` method, we handle missing End-of-Stream packets gracefully by logging an error and breaking the loop when encountering `std::errc::result_out_of_range`, rather than throwing an exception.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

🧠 Learnings used

Learnt from: junhaoliao PR: y-scope/clp-ffi-js#26 File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59 Timestamp: 2024-11-04T19:35:32.602Z Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

src/clp_ffi_js/ir/StreamReader.hpp

junhaoliao · 2024-11-04T20:09:44Z

src/clp_ffi_js/ir/IrStreamReader.cpp

Besides renaming the class from StreamReader to IrStreamReader, did we make any other changes to this file?

I removed create() since we have effectively moved this to base class. I removed encripten bindings and put them all in base class.

src/clp_ffi_js/ir/StreamReader.cpp

src/clp_ffi_js/ir/StreamReaderDataContext.hpp

src/clp_ffi_js/ir/StreamReader.cpp

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)

src/clp_ffi_js/ir/StreamReader.cpp (1)

5-18: Reorganize includes following standard convention.

The includes should be organized in the following groups, separated by blank lines:

C++ standard library headers

Third-party library headers

Project headers

Apply this organization:
 #include <cstddef>
 #include <cstdint>
 #include <format>
 #include <memory>
 #include <utility>
 
 #include <clp/Array.hpp>
-#include <clp/ErrorCode.hpp>
 #include <clp/streaming_compression/zstd/Decompressor.hpp>
 #include <clp/TraceableException.hpp>
+#include <clp/ErrorCode.hpp>
 #include <emscripten/bind.h>
 #include <spdlog/spdlog.h>
 
 #include <clp_ffi_js/ClpFfiJsException.hpp>
 #include <clp_ffi_js/ir/decoding_methods.hpp>
 #include <clp_ffi_js/ir/IrStreamReader.hpp>

src/clp_ffi_js/ir/IrStreamReader.cpp (2)

102-119: Consider extracting log level parsing logic.

The log level extraction logic is complex and could be moved to a separate method for better readability and maintainability.

+auto parse_log_level(std::string_view logtype) -> LogLevel {
+    constexpr size_t cLogLevelPositionInMessages{1};
+    if (logtype.length() <= cLogLevelPositionInMessages) {
+        return LogLevel::NONE;
+    }
+    
+    auto const log_level_name_it{std::find_if(
+        cLogLevelNames.begin() + static_cast<size_t>(cValidLogLevelsBeginIdx),
+        cLogLevelNames.end(),
+        [&](std::string_view level) {
+            return logtype.substr(cLogLevelPositionInMessages).starts_with(level);
+        }
+    )};
+    
+    return (log_level_name_it != cLogLevelNames.end())
+        ? static_cast<LogLevel>(std::distance(cLogLevelNames.begin(), log_level_name_it))
+        : LogLevel::NONE;
+}

-        constexpr size_t cLogLevelPositionInMessages{1};
-        LogLevel log_level{LogLevel::NONE};
-        if (logtype.length() > cLogLevelPositionInMessages) {
-            auto const log_level_name_it{std::find_if(
-                    cLogLevelNames.begin() + static_cast<size_t>(cValidLogLevelsBeginIdx),
-                    cLogLevelNames.end(),
-                    [&](std::string_view level) {
-                        return logtype.substr(cLogLevelPositionInMessages).starts_with(level);
-                    }
-            )};
-            if (log_level_name_it != cLogLevelNames.end()) {
-                log_level = static_cast<LogLevel>(
-                        std::distance(cLogLevelNames.begin(), log_level_name_it)
-                );
-            }
-        }
+        auto const log_level{parse_log_level(logtype)};

177-185: Add documentation for the EM_ASM block.

The EM_ASM block performs critical JavaScript array manipulation, but its purpose and structure could be better documented.

+        // EM_ASM block pushes a tuple [message, timestamp, log_level, index] to the JavaScript array
+        // Using direct EM_ASM for performance instead of higher-level emscripten::val operations
         EM_ASM(
                 { Emval.toValue($0).push([UTF8ToString($1), $2, $3, $4]); },
                 results.as_handle(),

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between dba3910 and 9ddd974.

📒 Files selected for processing (3)

src/clp_ffi_js/ir/IrStreamReader.cpp (1 hunks)
src/clp_ffi_js/ir/IrStreamReader.hpp (1 hunks)
src/clp_ffi_js/ir/StreamReader.cpp (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/clp_ffi_js/ir/IrStreamReader.hpp

🧰 Additional context used

📓 Learnings (2)

src/clp_ffi_js/ir/IrStreamReader.cpp (3)

Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:85-91
Timestamp: 2024-11-04T19:37:27.204Z
Learning: In the `IrStreamReader::deserialize_stream` method, we handle missing End-of-Stream packets gracefully by logging an error and breaking the loop when encountering `std::errc::result_out_of_range`, rather than throwing an exception.

Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:162-168
Timestamp: 2024-11-04T19:55:27.398Z
Learning: In the `IrStreamReader::decode_range` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, message decoding failures are not expected unless the file is corrupted or the deserializer in CLP core does not handle edge cases, so it's appropriate to throw an exception upon message decoding failure.

src/clp_ffi_js/ir/StreamReader.cpp (1)

Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/StreamReader.cpp:35-45
Timestamp: 2024-11-04T20:55:01.745Z
Learning: An IRv1 version validation utility function is not planned to be added to CLP-core soon.

🔇 Additional comments (7)

src/clp_ffi_js/ir/StreamReader.cpp (3)

35-45: Implement version checking using constants.

Based on the previous review discussion and the learning that "An IRv1 version validation utility function is not planned to be added to CLP-core soon", we should at least make the current version checking more maintainable.

46-52: Well-implemented error handling with proper logging and detailed exception.

The error handling follows best practices by:

Using SPDLOG_CRITICAL for logging
Throwing a custom exception with detailed information
Using std::format for clear error messages

Line range hint 58-77: Well-structured Emscripten bindings with proper memory management.

The bindings implementation:

Correctly registers TypeScript types for interoperability
Uses appropriate memory management with take_ownership
Provides a clean interface for JavaScript/TypeScript consumers

src/clp_ffi_js/ir/IrStreamReader.cpp (4)

1-38: LGTM! Well-organized includes and namespace setup.

The includes are properly organized into logical groups (standard library, external libraries, and project headers), and the using declarations are appropriately scoped within the namespace.

39-71: LGTM! Efficient event filtering implementation.

The event management methods are well-implemented with:

Proper handling of optional filtered map
Efficient use of std::ranges for filtering
Type-safe handling of log levels through TypeScript typing (as per previous learnings)

73-98: LGTM! Robust error handling in stream deserialization.

The error handling follows the established pattern:

Gracefully handles missing End-of-Stream packets
Properly throws exceptions for corruption errors
Clear error messages with detailed context

190-223: LGTM! Robust constructor and data context creation.

The implementation demonstrates good practices:

Proper use of move semantics
RAII principles in constructor
Thorough error handling with detailed messages

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)

src/clp_ffi_js/ir/IrStreamReader.cpp (3)

78-79: Consider making the magic number configurable.

The default number of reserved log events (500,000) could be made configurable through a constant or configuration parameter for better flexibility.

-    constexpr size_t cDefaultNumReservedLogEvents{500'000};
+    static constexpr size_t cDefaultNumReservedLogEvents{500'000};  // TODO: Make configurable

103-119: Consider extracting log level parsing logic.

The log level parsing logic is complex enough to warrant its own method for better readability and maintainability.

+    auto parse_log_level(std::string_view logtype) -> LogLevel {
+        constexpr size_t cLogLevelPositionInMessages{1};
+        if (logtype.length() <= cLogLevelPositionInMessages) {
+            return LogLevel::NONE;
+        }
+        
+        auto const log_level_name_it{std::find_if(
+            cLogLevelNames.begin() + static_cast<size_t>(cValidLogLevelsBeginIdx),
+            cLogLevelNames.end(),
+            [&](std::string_view level) {
+                return logtype.substr(cLogLevelPositionInMessages).starts_with(level);
+            }
+        )};
+        
+        if (log_level_name_it == cLogLevelNames.end()) {
+            return LogLevel::NONE;
+        }
+        
+        return static_cast<LogLevel>(
+            std::distance(cLogLevelNames.begin(), log_level_name_it)
+        );
+    }

177-185: Consider using Embind instead of EM_ASM for better type safety.

The current implementation uses EM_ASM for array manipulation. Consider using Embind's val::array() methods for better type safety and maintainability.

-        EM_ASM(
-                { Emval.toValue($0).push([UTF8ToString($1), $2, $3, $4]); },
-                results.as_handle(),
-                message.c_str(),
-                log_event.get_timestamp(),
-                log_event.get_log_level(),
-                log_event_idx + 1
-        );
+        results.call<void>("push", emscripten::val::array(
+            std::vector<emscripten::val>{
+                emscripten::val(message),
+                emscripten::val(log_event.get_timestamp()),
+                emscripten::val(log_event.get_log_level()),
+                emscripten::val(log_event_idx + 1)
+            }));

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9ddd974 and a5cb529.

📒 Files selected for processing (2)

src/clp_ffi_js/ir/IrStreamReader.cpp (1 hunks)
src/submodules/clp (1 hunks)

✅ Files skipped from review due to trivial changes (1)

src/submodules/clp

🧰 Additional context used

📓 Learnings (1)

src/clp_ffi_js/ir/IrStreamReader.cpp (3)

Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:85-91
Timestamp: 2024-11-04T19:37:27.204Z
Learning: In the `IrStreamReader::deserialize_stream` method, we handle missing End-of-Stream packets gracefully by logging an error and breaking the loop when encountering `std::errc::result_out_of_range`, rather than throwing an exception.

Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:162-168
Timestamp: 2024-11-04T19:55:27.398Z
Learning: In the `IrStreamReader::decode_range` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, message decoding failures are not expected unless the file is corrupted or the deserializer in CLP core does not handle edge cases, so it's appropriate to throw an exception upon message decoding failure.

🔇 Additional comments (4)

src/clp_ffi_js/ir/IrStreamReader.cpp (4)

1-38: LGTM! Well-organized includes and namespace setup.

The includes are logically grouped and the namespace usage is appropriate.

39-41: LGTM! Simple and effective accessor method.

190-197: LGTM! Constructor properly initializes member variables.

The constructor correctly moves the data context and initializes the timestamp pattern.

199-223: LGTM! Robust error handling in data context creation.

The method properly validates the decompressor and handles errors appropriately.

kirkrodrigues · 2024-11-06T18:45:08Z

src/clp_ffi_js/ir/StreamReader.cpp

@@ -3,249 +3,122 @@
 #include <algorithm>
 #include <cstddef>
 #include <cstdint>
-#include <iterator>
+#include <format>
+#include <json/single_include/nlohmann/json.hpp>


We need to add json to src/clp-ffi-js/.clang-format.

src/clp_ffi_js/ir/StreamReader.cpp

kirkrodrigues · 2024-11-06T18:56:28Z

src/clp_ffi_js/ir/StreamReader.cpp

+    // Required to validate encoding type prior to getting version.
+    rewind_reader_and_validate_encoding_type(*zstd_decompressor);
+    auto const version{get_version(*zstd_decompressor)};
+
+    // Required that reader offset matches position after validation in order to decode log events.
+    rewind_reader_and_validate_encoding_type(*zstd_decompressor);
+    if (std::ranges::find(cUnstructuredIrVersions, version) != cUnstructuredIrVersions.end()) {
+        return std::make_unique<UnstructuredIrStreamReader>(UnstructuredIrStreamReader::create(
+                std::move(zstd_decompressor),
+                std::move(data_buffer)
+        ));
+    }
+    SPDLOG_INFO("did i get here 3");
+
+    throw ClpFfiJsException{
+            clp::ErrorCode::ErrorCode_Unsupported,
+            __FILENAME__,
+            __LINE__,
+            std::format("Unable to create reader for IR stream with version {}.", version)
+    };


Suggested change

// Required to validate encoding type prior to getting version.

rewind_reader_and_validate_encoding_type(*zstd_decompressor);

auto const version{get_version(*zstd_decompressor)};

// Required that reader offset matches position after validation in order to decode log events.

rewind_reader_and_validate_encoding_type(*zstd_decompressor);

if (std::ranges::find(cUnstructuredIrVersions, version) != cUnstructuredIrVersions.end()) {

return std::make_unique<UnstructuredIrStreamReader>(UnstructuredIrStreamReader::create(

std::move(zstd_decompressor),

std::move(data_buffer)

));

}

SPDLOG_INFO("did i get here 3");

throw ClpFfiJsException{

clp::ErrorCode::ErrorCode_Unsupported,

__FILENAME__,

__LINE__,

std::format("Unable to create reader for IR stream with version {}.", version)

};

rewind_reader_and_validate_encoding_type(*zstd_decompressor);

// Validate the stream's version

auto pos = zstd_decompressor->get_pos();

auto const version{get_version(*zstd_decompressor)};

if (std::ranges::find(cUnstructuredIrVersions, version) == cUnstructuredIrVersions.end()) {

throw ClpFfiJsException{

clp::ErrorCode::ErrorCode_Unsupported,

__FILENAME__,

__LINE__,

std::format("Unable to create reader for IR stream with version {}.", version)

};

}

try {

zstd_decompressor->seek_from_begin(pos);

} catch (ZstdDecompressor::OperationFailed& e) {

throw ClpFfiJsException{

clp::ErrorCode::ErrorCode_Failure,

__FILENAME__,

__LINE__,

std::format("Unable to rewind zstd decompressor: {}", e.what())

};

}

return std::make_unique<UnstructuredIrStreamReader>(

UnstructuredIrStreamReader::create(std::move(zstd_decompressor), std::move(data_buffer))

);

This does a few things:

Refactors the code structure a bit in terms of comments and flow

Note that we handle the error path in the if block rather than the previous code which handled the success path in the if block

Updated the flow to avoid re-validating the encoding type twice:

Rewind + validate encoding type

Save position

Validate version

Restore position

Create and return the reader

It might be better for you to copy and commit this manually rather than applying the change from GH, just so that you can double-check what I did.

The question is did you test it... I tried something similiar but it did not work. I will test it for you and let u know

Nope, hence why I wanted you to double-check, lol.

Okay it seems to work now. I dont know why mine was giving an error, but wtv i guess.

src/clp_ffi_js/ir/StreamReader.hpp

src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp

kirkrodrigues · 2024-11-06T19:04:15Z

src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp

+     */
+    [[nodiscard]] static auto create(
+            std::unique_ptr<ZstdDecompressor>&& zstd_decompressor,
+            clp::Array<char>&& data_array


Let's pass this as a value for the same reason I discussed with Junhao in a previous PR. Essentially, passing it as a value and moving it in the caller offers maximum flexibility. We can't do the same for zstd_decompressor because the class doesn't support it and likely won't any time soon.

kirkrodrigues · 2024-11-06T19:07:23Z

src/clp_ffi_js/ir/IrStreamReader.hpp

+    void filter_log_events(LogLevelFilterTsType const& log_level_filter) override;
+
+    /**
+     * Deserializes all log events in the stream. After the stream has been exhausted, it will be


Currently, I see you've deleted this docstring. I'm fine keeping it this way, but if @junhaoliao wants to keep the part about deallocating the stream after exhaustion, then we need to add the docstring back in UnstructuredIrStreamReader.

Co-authored-by: kirkrodrigues <[email protected]>

coderabbitai

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)

src/clp_ffi_js/ir/StreamReader.cpp (1)

168-168: Remove debug log message.

The debug message "did i get here 3" was left in accidentally and should be removed.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 28d4250 and 1b31f08.

📒 Files selected for processing (3)

src/clp_ffi_js/ir/StreamReader.cpp (2 hunks)
src/clp_ffi_js/ir/StreamReader.hpp (3 hunks)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

src/clp_ffi_js/ir/StreamReader.hpp
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp

🧰 Additional context used

📓 Learnings (1)

src/clp_ffi_js/ir/StreamReader.cpp (3)

Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/StreamReader.cpp:35-45
Timestamp: 2024-11-04T20:55:01.745Z
Learning: An IRv1 version validation utility function is not planned to be added to CLP-core soon.

Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:162-168
Timestamp: 2024-11-04T19:55:27.398Z
Learning: In the `IrStreamReader::decode_range` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, message decoding failures are not expected unless the file is corrupted or the deserializer in CLP core does not handle edge cases, so it's appropriate to throw an exception upon message decoding failure.

Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:85-91
Timestamp: 2024-11-04T19:37:27.204Z
Learning: In the `IrStreamReader::deserialize_stream` method, we handle missing End-of-Stream packets gracefully by logging an error and breaking the loop when encountering `std::errc::result_out_of_range`, rather than throwing an exception.

🪛 cppcheck

src/clp_ffi_js/ir/StreamReader.cpp

[error] 111-111: syntax error

(syntaxError)

🔇 Additional comments (1)

src/clp_ffi_js/ir/StreamReader.cpp (1)

7-7: Add json to .clang-format configuration.

As noted in previous reviews, the json include needs to be added to src/clp_ffi-js/.clang-format to ensure consistent formatting.

coderabbitai · 2024-11-06T19:35:37Z

src/clp_ffi_js/ir/StreamReader.cpp

+    // Required to validate encoding type prior to getting version.
+    rewind_reader_and_validate_encoding_type(*zstd_decompressor);
+    auto const version{get_version(*zstd_decompressor)};
+
+    // Required that reader offset matches position after validation in order to decode log events.
+    rewind_reader_and_validate_encoding_type(*zstd_decompressor);


🛠️ Refactor suggestion

Optimize stream validation flow.

The encoding type is validated twice unnecessarily. As suggested in previous reviews, we can optimize this by:

Rewind + validate encoding type once

Save position

Validate version

Restore position

Create and return the reader

Apply this restructuring:

- // Required to validate encoding type prior to getting version. - rewind_reader_and_validate_encoding_type(*zstd_decompressor); + rewind_reader_and_validate_encoding_type(*zstd_decompressor); + + // Save position after encoding validation + auto pos = zstd_decompressor->get_pos(); + + // Get and validate version auto const version{get_version(*zstd_decompressor)}; - - // Required that reader offset matches position after validation in order to decode log events. - rewind_reader_and_validate_encoding_type(*zstd_decompressor); + + // Restore position for decoding + try { + zstd_decompressor->seek_from_begin(pos); + } catch (ZstdDecompressor::OperationFailed& e) { + throw ClpFfiJsException{ + clp::ErrorCode::ErrorCode_Failure, + __FILENAME__, + __LINE__, + std::format("Unable to rewind zstd decompressor: {}", e.what()) + }; + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// Required to validate encoding type prior to getting version.

rewind_reader_and_validate_encoding_type(*zstd_decompressor);

auto const version{get_version(*zstd_decompressor)};

// Required that reader offset matches position after validation in order to decode log events.

rewind_reader_and_validate_encoding_type(*zstd_decompressor);

rewind_reader_and_validate_encoding_type(*zstd_decompressor);

// Save position after encoding validation

auto pos = zstd_decompressor->get_pos();

// Get and validate version

auto const version{get_version(*zstd_decompressor)};

// Restore position for decoding

try {

zstd_decompressor->seek_from_begin(pos);

} catch (ZstdDecompressor::OperationFailed& e) {

throw ClpFfiJsException{

clp::ErrorCode::ErrorCode_Failure,

__FILENAME__,

__LINE__,

std::format("Unable to rewind zstd decompressor: {}", e.what())

};

}

coderabbitai · 2024-11-06T19:35:37Z

src/clp_ffi_js/ir/StreamReader.cpp

+auto get_version(clp::ReaderInterface& reader) -> std::string {
+    // Deserialize metadata bytes from preamble.
+    clp::ffi::ir_stream::encoded_tag_t metadata_type{};
+    std::vector<int8_t> metadata_bytes;
+    auto const err{clp::ffi::ir_stream::deserialize_preamble(reader, metadata_type, metadata_bytes)
    };
-    if (result.has_error()) {
-        auto const error_code{result.error()};
-        SPDLOG_CRITICAL(
-                "Failed to create deserializer: {}:{}",
-                error_code.category().name(),
-                error_code.message()
-        );
+    if (IRErrorCode::IRErrorCode_Success != err) {
        throw ClpFfiJsException{
                clp::ErrorCode::ErrorCode_Failure,
                __FILENAME__,
                __LINE__,
-                "Failed to create deserializer"
+                std::format(
+                        "Failed to deserialize preamble: IR error code {}",
+                        clp::enum_to_underlying_type(err)
+                )
        };
    }

-    return {std::move(data_buffer), std::move(zstd_decompressor), std::move(result.value())};
+    std::string version;
+    try {
+        // Deserialize metadata bytes as JSON.
+        std::string_view const metadata_view{
+                clp::size_checked_pointer_cast<char const>(metadata_bytes.data()),
+                metadata_bytes.size()
+        };
+        nlohmann::json const metadata = nlohmann::json::parse(metadata_view);
+        version = metadata.at(clp::ffi::ir_stream::cProtocol::Metadata::VersionKey);
+    } catch (nlohmann::json::exception const& e) {
+        throw ClpFfiJsException{
+                clp::ErrorCode::ErrorCode_MetadataCorrupted,
+                __FILENAME__,
+                __LINE__,
+                std::format("Failed to parse stream's metadata: {}", e.what())
+        };
+    }
+
+    SPDLOG_INFO("IR version is {}", version);
+    return version;


⚠️ Potential issue

Add metadata type validation.

The function deserializes the preamble but doesn't validate if the metadata_type is correct before attempting to parse it as JSON.

Add validation like this:

if (IRErrorCode::IRErrorCode_Success != err) { throw ClpFfiJsException{ clp::ErrorCode::ErrorCode_Failure, __FILENAME__, __LINE__, std::format( "Failed to deserialize preamble: IR error code {}", clp::enum_to_underlying_type(err) ) }; } + if (metadata_type != clp::ffi::ir_stream::cProtocol::Tags::Metadata) { + throw ClpFfiJsException{ + clp::ErrorCode::ErrorCode_MetadataCorrupted, + __FILENAME__, + __LINE__, + std::format("Invalid metadata type: {}", metadata_type) + }; + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

auto get_version(clp::ReaderInterface& reader) -> std::string {

// Deserialize metadata bytes from preamble.

clp::ffi::ir_stream::encoded_tag_t metadata_type{};

std::vector<int8_t> metadata_bytes;

auto const err{clp::ffi::ir_stream::deserialize_preamble(reader, metadata_type, metadata_bytes)

};

if (result.has_error()) {

auto const error_code{result.error()};

SPDLOG_CRITICAL(

"Failed to create deserializer: {}:{}",

error_code.category().name(),

error_code.message()

);

if (IRErrorCode::IRErrorCode_Success != err) {

throw ClpFfiJsException{

clp::ErrorCode::ErrorCode_Failure,

__FILENAME__,

__LINE__,

"Failed to create deserializer"

std::format(

"Failed to deserialize preamble: IR error code {}",

clp::enum_to_underlying_type(err)

)

};

}

return {std::move(data_buffer), std::move(zstd_decompressor), std::move(result.value())};

std::string version;

try {

// Deserialize metadata bytes as JSON.

std::string_view const metadata_view{

clp::size_checked_pointer_cast<char const>(metadata_bytes.data()),

metadata_bytes.size()

};

nlohmann::json const metadata = nlohmann::json::parse(metadata_view);

version = metadata.at(clp::ffi::ir_stream::cProtocol::Metadata::VersionKey);

} catch (nlohmann::json::exception const& e) {

throw ClpFfiJsException{

clp::ErrorCode::ErrorCode_MetadataCorrupted,

__FILENAME__,

__LINE__,

std::format("Failed to parse stream's metadata: {}", e.what())

};

}

SPDLOG_INFO("IR version is {}", version);

return version;

auto get_version(clp::ReaderInterface& reader) -> std::string {

// Deserialize metadata bytes from preamble.

clp::ffi::ir_stream::encoded_tag_t metadata_type{};

std::vector<int8_t> metadata_bytes;

auto const err{clp::ffi::ir_stream::deserialize_preamble(reader, metadata_type, metadata_bytes)

};

if (IRErrorCode::IRErrorCode_Success != err) {

throw ClpFfiJsException{

clp::ErrorCode::ErrorCode_Failure,

__FILENAME__,

__LINE__,

std::format(

"Failed to deserialize preamble: IR error code {}",

clp::enum_to_underlying_type(err)

)

};

}

if (metadata_type != clp::ffi::ir_stream::cProtocol::Tags::Metadata) {

throw ClpFfiJsException{

clp::ErrorCode::ErrorCode_MetadataCorrupted,

__FILENAME__,

__LINE__,

std::format("Invalid metadata type: {}", metadata_type)

};

}

std::string version;

try {

// Deserialize metadata bytes as JSON.

std::string_view const metadata_view{

clp::size_checked_pointer_cast<char const>(metadata_bytes.data()),

metadata_bytes.size()

};

nlohmann::json const metadata = nlohmann::json::parse(metadata_view);

version = metadata.at(clp::ffi::ir_stream::cProtocol::Metadata::VersionKey);

} catch (nlohmann::json::exception const& e) {

throw ClpFfiJsException{

clp::ErrorCode::ErrorCode_MetadataCorrupted,

__FILENAME__,

__LINE__,

std::format("Failed to parse stream's metadata: {}", e.what())

};

}

SPDLOG_INFO("IR version is {}", version);

return version;

Co-authored-by: kirkrodrigues <[email protected]>

src/clp_ffi_js/.clang-format

src/clp_ffi_js/ir/StreamReader.cpp

Co-authored-by: kirkrodrigues <[email protected]>

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)

src/clp_ffi_js/ir/StreamReader.cpp (2)

6-26: Reorganize header includes into logical groups.

Based on previous discussions, the headers should be organized into distinct groups:

C++ standard library headers
Third-party library headers
Project-specific headers (clp)
Local headers

Apply this reorganization:

-#include <format>
-#include <memory>
-#include <string>
-#include <string_view>
-#include <utility>
-#include <vector>
-
-#include <clp/Array.hpp>
-#include <clp/ErrorCode.hpp>
-#include <clp/ffi/ir_stream/decoding_methods.hpp>
-#include <clp/ffi/ir_stream/protocol_constants.hpp>
-#include <clp/ReaderInterface.hpp>
-#include <clp/streaming_compression/zstd/Decompressor.hpp>
-#include <clp/TraceableException.hpp>
-#include <clp/type_utils.hpp>
-#include <emscripten/bind.h>
-#include <json/single_include/nlohmann/json.hpp>
-#include <spdlog/spdlog.h>
-
-#include <clp_ffi_js/ClpFfiJsException.hpp>
-#include <clp_ffi_js/ir/UnstructuredIrStreamReader.hpp>

+// C++ standard library
+#include <format>
+#include <memory>
+#include <string>
+#include <string_view>
+#include <utility>
+#include <vector>
+
+// Third-party libraries
+#include <emscripten/bind.h>
+#include <json/single_include/nlohmann/json.hpp>
+#include <spdlog/spdlog.h>
+
+// CLP libraries
+#include <clp/Array.hpp>
+#include <clp/ErrorCode.hpp>
+#include <clp/ffi/ir_stream/decoding_methods.hpp>
+#include <clp/ffi/ir_stream/protocol_constants.hpp>
+#include <clp/ReaderInterface.hpp>
+#include <clp/streaming_compression/zstd/Decompressor.hpp>
+#include <clp/TraceableException.hpp>
+#include <clp/type_utils.hpp>
+
+// Local headers
+#include <clp_ffi_js/ClpFfiJsException.hpp>
+#include <clp_ffi_js/ir/UnstructuredIrStreamReader.hpp>

116-121: Add newlines before comments for better readability.

Following the established coding style, add newlines before comments.

Apply this formatting:

 EMSCRIPTEN_BINDINGS(ClpStreamReader) {
+
     // JS types used as inputs
     emscripten::register_type<clp_ffi_js::ir::DataArrayTsType>("Uint8Array");
     emscripten::register_type<clp_ffi_js::ir::LogLevelFilterTsType>("number[] | null");
+
     // JS types used as outputs

🧰 Tools

🪛 cppcheck

[error] 116-116: syntax error

(syntaxError)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1b31f08 and bfea301.

📒 Files selected for processing (5)

CMakeLists.txt (1 hunks)
src/clp_ffi_js/.clang-format (1 hunks)
src/clp_ffi_js/ir/StreamReader.cpp (2 hunks)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1 hunks)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1 hunks)

✅ Files skipped from review due to trivial changes (1)

src/clp_ffi_js/.clang-format

🚧 Files skipped from review as they are similar to previous changes (3)

CMakeLists.txt
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp

🧰 Additional context used

📓 Learnings (1)

src/clp_ffi_js/ir/StreamReader.cpp (2)

Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/StreamReader.cpp:35-45
Timestamp: 2024-11-04T20:55:01.745Z
Learning: An IRv1 version validation utility function is not planned to be added to CLP-core soon.

Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:162-168
Timestamp: 2024-11-04T19:55:27.398Z
Learning: In the `IrStreamReader::decode_range` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, message decoding failures are not expected unless the file is corrupted or the deserializer in CLP core does not handle edge cases, so it's appropriate to throw an exception upon message decoding failure.

🪛 cppcheck

src/clp_ffi_js/ir/StreamReader.cpp

[error] 116-116: syntax error

(syntaxError)

🔇 Additional comments (3)

src/clp_ffi_js/ir/StreamReader.cpp (3)

32-48: LGTM! Well-documented function declarations.

The function declarations are clear and include comprehensive documentation with proper exception specifications.

49-72: LGTM! Robust encoding type validation.

The function properly handles:

Reader position reset
Encoding type validation
Appropriate error handling with descriptive messages

146-187: LGTM! Well-structured stream reader creation.

The implementation follows best practices with:

Proper memory management using smart pointers
Optimized stream validation flow
Comprehensive error handling
Clear logging

kirkrodrigues

For the PR title, how about:

Split `StreamReader` into an interface and implementation to prepare for adding another IR stream reader.

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)

src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)

63-70: Enhance method documentation.

The documentation for deserialize_stream() only references the base class method without explaining its specific behaviour in this context. Consider adding details about:

What exactly is being deserialized

What happens to the stream after exhaustion

Any specific error conditions

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between bfea301 and 119de6c.

📒 Files selected for processing (2)

src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1 hunks)
src/submodules/clp (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/submodules/clp

🔇 Additional comments (5)

src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (5)

21-25: LGTM! Well-documented type alias.

The FilteredLogEventsMap type alias is clearly documented and effectively describes its purpose as a mapping between filtered and unfiltered log event indices.

27-31: LGTM! Clear and concise class documentation.

The class documentation effectively communicates its purpose and responsibilities.

33-43: LGTM! Proper RAII implementation.

The class correctly implements RAII with well-defined destructor and move/copy semantics. The rationale for deleting the move assignment operator is clearly documented.

76-87: LGTM! Well-structured private implementation.

The private section demonstrates good design with:

Proper encapsulation using private constructor
Clear member variable organization
Appropriate use of smart pointers for ownership

31-87: Verify error handling across FFI boundary.

Since this class handles compressed data and operates across the FFI boundary, let's verify the error handling implementation:

✅ Verification successful

Let me gather more information about memory safety patterns, particularly around unique pointers and RAII usage.

Error handling and memory safety are properly implemented

The verification shows robust error handling and memory safety mechanisms:

Proper RAII with smart pointers (unique_ptr) for managing resources
Consistent error handling using ClpFfiJsException with detailed error messages
Clear ownership semantics with move operations and deleted copy operations
Comprehensive error checks for:
- Corrupt metadata
- Deserialization failures
- Encoding type validation
- Stream version compatibility
- Decompression errors

The implementation follows best practices for C++ FFI boundary safety through exception handling and RAII patterns.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for error handling patterns in the implementation
echo "Searching for error handling in implementation..."
rg -A 5 "throw|error|exception" --type cpp

# Look for memory safety patterns
echo "Checking memory safety patterns..."
ast-grep --pattern 'try { $$$ } catch { $$$ }'

Length of output: 10412

Script:

#!/bin/bash
# Check for unique_ptr usage and ownership transfer
echo "Checking smart pointer patterns..."
rg "unique_ptr|shared_ptr|make_unique" --type cpp -A 3

# Look for move semantics and RAII patterns
echo "Checking RAII and move semantics..."
ast-grep --pattern 'std::move($_)'

Length of output: 3823

old

junhaoliao and others added 9 commits October 14, 2024 20:52

Refactor StreamReader to modularize decoding logic.

3dae806

Add a comment section for methods in StreamReader.hpp.

ab32045

Add get_version to decoding methods.

8dc3dff

Move #include <emscripten/bind.h> from StreamReader.hpp to StreamRe…

eba1d30

…ader.cpp.

Rename StreamReader -> IRStreamReader.

5398d1d

Rename ClpIrStreamReader to ClpIRStreamReader in bindings.

cfd55ca

Extract a StreamReader base class from IRStreamReader.

d1ed2f0

Rename IRStreamReader -> IrStreamReader.

bf5e4c9

merging main into junhao PR

ad306f2

davemarco marked this pull request as draft October 31, 2024 20:17

davemarco changed the title ~~WIP(Replaces #22)- Rename StreamReader -> IrStreamReader; Extract a StreamReader base class from IrStreamReader.~~ WIP(Replaces #23)- Rename StreamReader -> IrStreamReader; Extract a StreamReader base class from IrStreamReader. Oct 31, 2024

coderabbitai bot reviewed Oct 31, 2024

View reviewed changes

davemarco added 3 commits November 1, 2024 15:08

some restructure

c474bb4

fix cmake

f4a6896

more changes after test

dba3910

davemarco changed the title ~~WIP(Replaces #23)- Rename StreamReader -> IrStreamReader; Extract a StreamReader base class from IrStreamReader.~~ Rename StreamReader -> IrStreamReader; Extract a StreamReader base class from IrStreamReader. Nov 1, 2024

davemarco marked this pull request as ready for review November 1, 2024 17:55

davemarco requested a review from junhaoliao November 1, 2024 17:55

coderabbitai bot reviewed Nov 1, 2024

View reviewed changes

junhaoliao reviewed Nov 4, 2024

View reviewed changes

src/clp_ffi_js/ir/StreamReader.hpp Show resolved Hide resolved

junhaoliao reviewed Nov 4, 2024

View reviewed changes

src/clp_ffi_js/ir/StreamReader.cpp Outdated Show resolved Hide resolved

junhaoliao reviewed Nov 4, 2024

View reviewed changes

src/clp_ffi_js/ir/StreamReaderDataContext.hpp Outdated Show resolved Hide resolved

junhaoliao reviewed Nov 4, 2024

View reviewed changes

src/clp_ffi_js/ir/StreamReader.cpp Outdated Show resolved Hide resolved

junhaoliao reviewed Nov 4, 2024

View reviewed changes

src/clp_ffi_js/ir/StreamReader.cpp Outdated Show resolved Hide resolved

junhao changes

9ddd974

coderabbitai bot reviewed Nov 4, 2024

View reviewed changes

fix lint

a5cb529

coderabbitai bot reviewed Nov 4, 2024

View reviewed changes

kirkrodrigues reviewed Nov 6, 2024

View reviewed changes

davemarco and others added 4 commits November 6, 2024 14:25

Update src/clp_ffi_js/ir/StreamReader.cpp

c5db9a6

Co-authored-by: kirkrodrigues <[email protected]>

Update src/clp_ffi_js/ir/StreamReader.hpp

4a83bce

Co-authored-by: kirkrodrigues <[email protected]>

Update src/clp_ffi_js/ir/StreamReader.hpp

de7a2f8

Co-authored-by: kirkrodrigues <[email protected]>

Apply suggestions from code review

1b31f08

Co-authored-by: kirkrodrigues <[email protected]>

coderabbitai bot reviewed Nov 6, 2024

View reviewed changes

davemarco mentioned this pull request Nov 6, 2024

fix(taskfiles): Use empty list as default for optional task arguments passed via a ref (fixes 14). y-scope/yscope-dev-utils#15

Merged

davemarco and others added 3 commits November 6, 2024 15:09

Update src/clp_ffi_js/ir/StreamReader.cpp

226ab3a

Co-authored-by: kirkrodrigues <[email protected]>

change to pass by value

fa4debd

kirk changes

4108398

davemarco requested a review from kirkrodrigues November 6, 2024 20:14

kirkrodrigues requested changes Nov 6, 2024

View reviewed changes

src/clp_ffi_js/.clang-format Outdated Show resolved Hide resolved

src/clp_ffi_js/ir/StreamReader.cpp Show resolved Hide resolved

src/clp_ffi_js/ir/StreamReader.cpp Show resolved Hide resolved

davemarco and others added 3 commits November 6, 2024 23:39

change submodule

c7a8fbf

Merge branch 'main' into extractBaseClass

d5cad52

Apply suggestions from code review

bfea301

Co-authored-by: kirkrodrigues <[email protected]>

coderabbitai bot reviewed Nov 6, 2024

View reviewed changes

revert submodule change

72bdf6e

davemarco requested a review from kirkrodrigues November 7, 2024 00:03

use see

119de6c

kirkrodrigues approved these changes Nov 7, 2024

View reviewed changes

davemarco changed the title ~~Rename StreamReader -> IrStreamReader; Extract a StreamReader base class from IrStreamReader.~~ Split StreamReader into an interface and implementation to prepare for adding another IR stream reader. Nov 7, 2024

coderabbitai bot reviewed Nov 7, 2024

View reviewed changes

davemarco merged commit 8e3b660 into y-scope:main Nov 7, 2024
3 checks passed

This was referenced Nov 7, 2024

Modify StreamReaderDataContext to accept the deserializer as a template parameter to prepare for adding another IR stream reader. #28

Merged

Add support for deserializing and decoding v0.1.0 IR streams, but without log-level parsing and filtering. #30

Merged

junhaoliao mentioned this pull request Nov 9, 2024

WIP (depends on #22) - Rename StreamReader -> IrStreamReader; Extract a StreamReader base class from IrStreamReader. #23

Closed

		auto filter_levels{emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter
		)};

Split StreamReader into an interface and implementation to prepare for adding another IR stream reader. #26

Split StreamReader into an interface and implementation to prepare for adding another IR stream reader. #26

Conversation

davemarco commented Oct 31, 2024 • edited by coderabbitai bot Loading

Description

Validation performed

Summary by CodeRabbit

Summary by CodeRabbit

coderabbitai bot commented Oct 31, 2024 • edited Loading

Walkthrough

Changes

Possibly related PRs

Suggested reviewers

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Oct 31, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 31, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 31, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Nov 4, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 31, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Nov 4, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Nov 5, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Nov 1, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Nov 4, 2024

Choose a reason for hiding this comment

junhaoliao Nov 4, 2024 • edited Loading

Choose a reason for hiding this comment

coderabbitai bot Nov 4, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Nov 4, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 1, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Nov 4, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 1, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 1, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Nov 4, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 1, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Nov 4, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 1, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Nov 4, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Split `StreamReader` into an interface and implementation to prepare for adding another IR stream reader. #26

Split `StreamReader` into an interface and implementation to prepare for adding another IR stream reader. #26

davemarco commented Oct 31, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 31, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

coderabbitai bot Oct 31, 2024 •

edited

Loading

coderabbitai bot Oct 31, 2024 •

edited

Loading

coderabbitai bot Nov 1, 2024 •

edited

Loading

junhaoliao Nov 4, 2024 •

edited

Loading

coderabbitai bot Nov 1, 2024 •

edited

Loading

coderabbitai bot Nov 1, 2024 •

edited

Loading

coderabbitai bot Nov 1, 2024 •

edited

Loading

coderabbitai bot Nov 1, 2024 •

edited

Loading