Skip to content

Conversation

junhaoliao
Copy link
Member

@junhaoliao junhaoliao commented May 26, 2025

Description

Support retrieving CLP IR metadata in Structured and Unstructured IR stream readers

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Test files: test-files.zip

  1. Created below test driver code test.mjs
    import ModuleInit from "./cmake-build-debug/ClpFfiJs-node.js"
    import fs from "node:fs"
    
    const main = async () => {
        const file = fs.readFileSync("./test-irv1.clp.zst")
        // const file = fs.readFileSync("./test-irv2.clp.zst")
        // const file = fs.readFileSync("./test.clp.zst")
    
        console.time("perf")
        const Module = await ModuleInit()
        console.log(Module.MERGED_KV_PAIRS_AUTO_GENERATED_KEY);
        console.log(Module.MERGED_KV_PAIRS_USER_GENERATED_KEY);
        try {
            const decoder = new Module.ClpStreamReader(
                new Uint8Array(file),
                {timestampKey: {isAutoGenerated: true, parts: ["timestamp", "unix_millisecs"]},
                    logLevelKey: {parts: ["level", "name"], isAutoGenerated: true}})
            console.log(decoder.getMetadata())
            console.log("type:", decoder.getIrStreamType() === Module.IrStreamType.STRUCTURED,
                decoder.getIrStreamType() === Module.IrStreamType.UNSTRUCTURED)
            const numEvents = decoder.deserializeStream()
            console.log(numEvents)
            const results = decoder.decodeRange(0, numEvents, false)
            console.log(results)
        } catch (e) {
            console.error("Exception caught:", e.stack)
        }
        console.timeEnd("perf")
    }
    
    void main()
    
  2. Ran for test-irv1.clp.zst and observed output:
    ...
    [2025-05-25 22:41:24.230] [info] [StreamReader.cpp:56] IR version is v0.0.0
    [
      {
        REFERENCE_TIMESTAMP: '1427088366258',
        TIMESTAMP_PATTERN: 'yyyy-MM-dd HH:mm:ss,SSS',
        TZ_ID: 'America/Toronto',
        VERSION: 'v0.0.0'
      }
    ]
    ...
    
  3. Ran for test-irv2.clp.zst and observed output:
    ...
    [2025-05-25 22:41:59.112] [info] [StreamReader.cpp:56] IR version is 0.1.0
    {
      VARIABLES_SCHEMA_ID: 'com.yscope.clp.VariablesSchemaV2',
      VARIABLE_ENCODING_METHODS_ID: 'com.yscope.clp.VariableEncodingMethodsV1',
      VERSION: '0.1.0'
    }
    ...
    
  4. No decoding error was observed in (2.) and (3.).

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features

    • Added the ability to extract and access metadata from IR streams, now available through a new getter method in both structured and unstructured stream readers.
  • Refactor

    • Improved and modularized the handling of metadata deserialization and version retrieval for IR streams.
    • Enhanced integration with JavaScript by expanding type support and bindings.
    • Simplified decompressor position management during stream reader creation.
  • Chores

    • Introduced utility for safe JSON serialization, replacing invalid UTF-8 sequences automatically.
    • Updated build configuration to include new source files supporting decoding and JSON utilities.

@junhaoliao junhaoliao requested a review from a team as a code owner May 26, 2025 02:38
Copy link

coderabbitai bot commented May 26, 2025

Warning

Rate limit exceeded

@junhaoliao has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 21 minutes and 3 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 30b7750 and e96db08.

📒 Files selected for processing (5)
  • CMakeLists.txt (1 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (3 hunks)
  • src/clp_ffi_js/ir/decoding_methods.cpp (1 hunks)
  • src/clp_ffi_js/utils.cpp (1 hunks)
  • src/clp_ffi_js/utils.hpp (1 hunks)

"""

Walkthrough

The changes remove redundant rewind and encoding validation in the IR stream reader creation, centralize metadata deserialization into a dedicated function, and add metadata storage and access support in both structured and unstructured IR stream readers. Emscripten bindings are extended to expose metadata to JavaScript. Utility functions for JSON serialization with invalid UTF-8 replacement are introduced.

Changes

File(s) Change Summary
src/clp_ffi_js/ir/StreamReader.cpp, src/clp_ffi_js/ir/StreamReader.hpp Removed rewind and encoding validation function; refactored version retrieval to use deserialize_metadata; added get_metadata() method and JS binding.
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp, src/clp_ffi_js/ir/StructuredIrStreamReader.hpp Added member to store metadata JSON; implemented get_metadata() returning JS-compatible metadata; removed local JSON helper function.
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp, src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp Extended constructor to accept metadata JSON; stored metadata and added get_metadata() method exposing it as JS object.
src/clp_ffi_js/ir/decoding_methods.cpp, src/clp_ffi_js/ir/decoding_methods.hpp Added functions to rewind and validate encoding, deserialize metadata JSON from stream, and convert metadata to JS object.
src/clp_ffi_js/json_utils.hpp, src/clp_ffi_js/json_utils.cpp Introduced utility function to serialize JSON with invalid UTF-8 replacement instead of throwing.
CMakeLists.txt Added decoding_methods.cpp and json_utils.cpp to the build sources.

Sequence Diagram(s)

sequenceDiagram
    participant JS as JavaScript
    participant Emscripten
    participant StreamReader
    participant ReaderInterface

    JS->>Emscripten: Create StreamReader from data array
    Emscripten->>StreamReader: create(data_array, reader_options)
    StreamReader->>ReaderInterface: deserialize_metadata()
    ReaderInterface-->>StreamReader: metadata JSON
    StreamReader->>StreamReader: rewind decompressor to start
    StreamReader->>StreamReader: validate version and create appropriate IR stream reader
    StreamReader-->>Emscripten: StreamReader instance with metadata
    Emscripten-->>JS: StreamReader object (with get_metadata method)
    JS->>StreamReader: get_metadata()
    StreamReader-->>JS: MetadataTsType (JS object)
Loading

Possibly related issues

Possibly related PRs

Suggested reviewers

  • davemarco
    """
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

);

// JS types used as outputs
emscripten::register_type<clp_ffi_js::ir::MetadataTsType>("Record<string, any>");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line was added

&clp_ffi_js::ir::StreamReader::create,
emscripten::return_value_policy::take_ownership()
)
.function("getMetadata", &clp_ffi_js::ir::StreamReader::get_metadata)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line was added

auto get_version(clp::ReaderInterface& reader) -> std::string {
std::string version;
try {
auto const metadata_json = clp_ffi_js::ir::deserialize_metadata(reader);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metadata deserialization logic was extracted as deserialize_metadata.

constexpr std::string_view cMergedKvPairsAutoGeneratedKey{"auto-generated"};
constexpr std::string_view cMergedKvPairsUserGeneratedKey{"user-generated"};

/**
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to json_utils.hpp

@junhaoliao junhaoliao requested a review from LinZhihao-723 May 26, 2025 03:23
Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed offline:

  • These utils for rewinding readers are needed because the our deserializer interface doesn't support deserializing IRv1 yet. Hopefully we can get rid of this complicated procedure soon once we have a more complete implementation of the deserializer (this would also avoid deserializing metadata manually. Technically the deserializer API should handle it automatically)
  • We should use result-style error handling instead of relying on exceptions, at least for the factory functions. It's meaningless to use a factory function if it just throw on errors, since a constructor can be designed in a same way. The result-style error handling allows us to forward the error code from the core-ffi.

* @param json
* @return Serialized JSON.
*/
[[nodiscard]] inline auto dump_json_with_replace(nlohmann::json const& json) -> std::string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why we use inline instead of putting the implementation into a source file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe it was because we intended to just wrap the json_obj.dump() call with specific config arguments. i understand that the inline keyword is just a suggestion to the compiler, and function calling overhead may not always be reduced

given those, shall we split the implementation into a source file?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's put it into a source file.

* @param json
* @return Serialized JSON.
*/
[[nodiscard]] inline auto dump_json_with_replace(nlohmann::json const& json) -> std::string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to use json_obj instead of json since we usually use json to refer to the format itself, or a namespace.

);

// Variables
nlohmann::json m_metadata_json;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

explicit UnstructuredIrStreamReader(
StreamReaderDataContext<UnstructuredIrDeserializer>&& stream_reader_data_context
StreamReaderDataContext<UnstructuredIrDeserializer>&& stream_reader_data_context,
nlohmann::json&& metadata_json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
nlohmann::json&& metadata_json
nlohmann::json metadata_json

Do we need an rvalue reference?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have to. i added the && as i considered this as a private constructor for internal use so no flexibility is required

(and also that we have added && on stream_reader_data_context; would it be better to be consistent?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to not adding && since it's not necessary. We should send a PR to fix stream_reader_data_context as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai create an issue for removing && from private constructors

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

@junhaoliao I've created an issue to track the removal of unnecessary rvalue references from private constructors. The issue includes the context from your discussion with LinZhihao-723 and references this PR and comment thread.


🏁 Script executed:

#!/bin/bash
# Create GitHub issue for removing unnecessary rvalue references from private constructors
gh issue create \
  --title "Remove unnecessary rvalue references from private constructors" \
  --body "## Description

Remove unnecessary rvalue reference parameters (\`&&\`) from private constructors as they don't provide meaningful benefits for internal-only APIs.

## Context

During PR review, it was noted that private constructors don't need the flexibility of rvalue references since they are only used internally. This applies to parameters like \`stream_reader_data_context\` and \`metadata_json\` in private constructors.

## Examples

- \`UnstructuredIrStreamReader\` private constructor
- \`StructuredIrStreamReader\` private constructor (if applicable)
- Other similar cases in the codebase

## References

- Discussion in PR #80: https://github.com/y-scope/clp-ffi-js/pull/80#discussion_r2107956653
- Related comment: https://github.com/y-scope/clp-ffi-js/pull/80#discussion_r2107956653

## Acceptance Criteria

- [ ] Identify all private constructors using rvalue references unnecessarily
- [ ] Remove \`&&\` from private constructor parameters where not needed
- [ ] Ensure all call sites still compile and work correctly
- [ ] Update any related documentation if necessary

Requested by: @junhaoliao" \
  --assignee "junhaoliao"

Length of output: 1277


@junhaoliao Perfect! I've successfully created issue #90 for removing unnecessary rvalue references from private constructors: #90

The issue captures the context from your discussion with LinZhihao-723, includes references to this PR and comment thread, and provides clear acceptance criteria for the cleanup work. It's been assigned to you as requested.

    /\_/\  
   ( o.o ) 
    > ^ <   *thoughtful*

🧬 Code Graph Analysis Results

Relevant snippets from src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (lines 176-185):

UnstructuredIrStreamReader::UnstructuredIrStreamReader(
        StreamReaderDataContext<UnstructuredIrDeserializer>&& stream_reader_data_context,
        nlohmann::json&& metadata
)
        : m_metadata(std::move(metadata)),
          m_stream_reader_data_context{
                  std::make_unique<StreamReaderDataContext<UnstructuredIrDeserializer>>(
                          std::move(stream_reader_data_context)
                  )
          } {}
  • This snippet shows the definition of the private constructor of UnstructuredIrStreamReader which takes rvalue references (&&) for its parameters. It moves the parameters into member variables. This is directly related to the user comment about removing && from private constructors.

Relevant snippets from src/clp_ffi_js/ir/StreamReaderDataContext.hpp (lines 42-44):

[[nodiscard]] auto get_deserializer() -> Deserializer& { return m_deserializer; }

[[nodiscard]] auto get_reader() -> clp::ReaderInterface& { return *m_reader; }
  • These methods provide access to the deserializer and reader inside StreamReaderDataContext, which is the type used in the constructor of UnstructuredIrStreamReader. Understanding this context may be useful for understanding the constructor parameters.

Relevant snippets from src/clp_ffi_js/ir/StreamReader.hpp (lines 69-70, 107, 133-134, 155):

create(DataArrayTsType const& data_array, ReaderOptions const& reader_options)
            -> std::unique_ptr<StreamReader>

filter_log_events(LogLevelFilterTsType const& log_level_filter)

decode_range(size_t begin_idx, size_t end_idx, bool use_filter) const
            -> DecodedResultsTsType

find_nearest_log_event_by_timestamp(clp::ir::epoch_time_ms_t target_ts) -> NullableLogEventIdx
  • These declarations show the interface of StreamReader, which UnstructuredIrStreamReader inherits from and overrides. They provide context on the class design and method signatures related to stream reading and filtering.

Relevant snippets from src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (lines 38-42):

create(
            std::unique_ptr<ZstdDecompressor>&& zstd_decompressor,
            clp::Array<char> data_array,
            ReaderOptions const& reader_options
    ) -> StructuredIrStreamReader
  • This snippet shows a create method with a similar signature to the create method in UnstructuredIrStreamReader. It uses rvalue references (&&) for the decompressor parameter, which is relevant for comparison regarding the use of && in constructors or factory methods.

These snippets provide relevant context on the use of rvalue references (&&) in constructors and factory methods, especially the private constructor of UnstructuredIrStreamReader and related classes.


🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#30
File: src/clp_ffi_js/ir/StructuredIrStreamReader.hpp:187-190
Timestamp: 2024-11-07T08:35:20.376Z
Learning: In the `StructuredIrStreamReader` constructor of `src/clp_ffi_js/ir/StructuredIrStreamReader.hpp`, the `stream_reader_data_context` parameter is already wrapped in a `std::make_unique`, ensuring it matches the member variable `m_stream_reader_data_context` of type `std::unique_ptr<StreamReaderDataContext<StructuredIrDeserializer>>`.

// Constructor
explicit UnstructuredIrStreamReader(
StreamReaderDataContext<UnstructuredIrDeserializer>&& stream_reader_data_context
StreamReaderDataContext<UnstructuredIrDeserializer>&& stream_reader_data_context,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of this PR's scope: I don't think we need && for stream_reader_data_context either

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(sorry i didn't see this earlier)
right, if the same principle applies to metadata_json. let's continue the discussion at #80 (comment)

@junhaoliao
Copy link
Member Author

@LinZhihao-723

We should use result-style error handling instead of relying on exceptions, at least for the factory functions. It's meaningless to use a factory function if it just throw on errors, since a constructor can be designed in a same way. The result-style error handling allows us to forward the error code from the core-ffi.

As discussed offline, i agree the Rust-like Result-style error is better. i can't quickly figure out how such results can be mapped to Emscripten's Embind system though. Shall we create an issue to address this as a separate effort? I believe this can be better done once we sync the latest clp code into clp-ffi-js (and could be done even better after you are done with the proposed error handling refactoring in clp code)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0220828 and 59a2ebb.

📒 Files selected for processing (1)
  • src/clp_ffi_js/json_utils.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • src/clp_ffi_js/json_utils.hpp
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: lint (ubuntu-latest)
🔇 Additional comments (3)
src/clp_ffi_js/json_utils.hpp (3)

20-21: Include guard closing matches definition.
The #endif comment correctly references CLP_FFI_JS_JSON_UTILS_HPP. No changes needed.


4-7:

✅ Verification successful

Verify consistency of the nlohmann JSON include path.
Ensure that <json/single_include/nlohmann/json.hpp> matches the project’s include conventions. Run this to check other instances:


🏁 Script executed:

rg '#include\s*<.*json.hpp>' -n

Length of output: 688


Include path consistency verified
The nlohmann JSON header is uniformly included as <json/single_include/nlohmann/json.hpp> across all occurrences (e.g., json_utils.hpp, UnstructuredIrStreamReader.*, StructuredIrStreamReader.*, StreamReader.*). This matches the project’s existing include conventions—no changes required.


17-19: 🧹 Nitpick (assertive)

Add noexcept specifier for stronger exception guarantees.
Since error_handler_t::replace prevents UTF-8 errors from throwing, consider marking the wrapper as noexcept:

-[[nodiscard]] inline auto dump_json_with_replace(nlohmann::json const& json_obj) -> std::string {
+[[nodiscard]] inline auto dump_json_with_replace(nlohmann::json const& json_obj) noexcept -> std::string {

Likely an incorrect or invalid review comment.

Comment on lines 1 to 3
#ifndef CLP_FFI_JS_JSON_UTILS_HPP
#define CLP_FFI_JS_JSON_UTILS_HPP

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider using #pragma once for include guard simplification.
To reduce boilerplate and avoid macro name collisions, you can replace the traditional guards with:

-#ifndef CLP_FFI_JS_JSON_UTILS_HPP
-#define CLP_FFI_JS_JSON_UTILS_HPP
+#pragma once

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/clp_ffi_js/json_utils.hpp at the beginning of the file (lines 1 to 3),
replace the traditional include guard macros (#ifndef, #define, #endif) with a
single #pragma once directive to simplify the include guard and prevent macro
name collisions.

Comment on lines 9 to 16
/**
* @see nlohmann::basic_json::dump
* Serializes a JSON value into a string with invalid UTF-8 sequences replaced rather than throwing
* an exception.
* @param json_obj
* @return The JSON object serialized as a string.
*/
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Enhance Doxygen comment with parameter and brief description.
The @param tag lacks a description and there’s no brief summary line before @see. Consider:

- * @see nlohmann::basic_json::dump
+ * Serializes a JSON value into a string, replacing invalid UTF-8 sequences.
+ *
+ * @see nlohmann::basic_json::dump
 @param json_obj The JSON object to serialize.
 @return The serialized JSON string.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/**
* @see nlohmann::basic_json::dump
* Serializes a JSON value into a string with invalid UTF-8 sequences replaced rather than throwing
* an exception.
* @param json_obj
* @return The JSON object serialized as a string.
*/
/**
* Serializes a JSON value into a string, replacing invalid UTF-8 sequences.
*
* @see nlohmann::basic_json::dump
* @param json_obj The JSON object to serialize.
* @return The serialized JSON string.
*/
🤖 Prompt for AI Agents
In src/clp_ffi_js/json_utils.hpp around lines 9 to 16, the Doxygen comment is
missing a brief summary line and a description for the @param tag. Add a concise
brief description at the start of the comment summarizing the function's
purpose, and provide a clear explanation for the @param json_obj describing what
it represents or expects.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/clp_ffi_js/ir/StreamReader.hpp (1)

52-76: Well-documented metadata handling functions.

The function declarations are clear and properly documented. Note that a past reviewer suggested creating a separate utility file for these non-member helper methods to avoid cluttering the header.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 59a2ebb and 779f940.

📒 Files selected for processing (4)
  • src/clp_ffi_js/ir/StreamReader.cpp (6 hunks)
  • src/clp_ffi_js/ir/StreamReader.hpp (4 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (3 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/StreamReader.hpp
  • src/clp_ffi_js/ir/StreamReader.cpp
🧠 Learnings (3)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1)
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#30
File: src/clp_ffi_js/ir/StructuredIrStreamReader.hpp:187-190
Timestamp: 2024-11-07T08:35:20.376Z
Learning: In the `StructuredIrStreamReader` constructor of `src/clp_ffi_js/ir/StructuredIrStreamReader.hpp`, the `stream_reader_data_context` parameter is already wrapped in a `std::make_unique`, ensuring it matches the member variable `m_stream_reader_data_context` of type `std::unique_ptr<StreamReaderDataContext<StructuredIrDeserializer>>`.
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (1)
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#30
File: src/clp_ffi_js/ir/StructuredIrStreamReader.hpp:187-190
Timestamp: 2024-11-07T08:35:20.376Z
Learning: In the `StructuredIrStreamReader` constructor of `src/clp_ffi_js/ir/StructuredIrStreamReader.hpp`, the `stream_reader_data_context` parameter is already wrapped in a `std::make_unique`, ensuring it matches the member variable `m_stream_reader_data_context` of type `std::unique_ptr<StreamReaderDataContext<StructuredIrDeserializer>>`.
src/clp_ffi_js/ir/StreamReader.cpp (3)
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#30
File: src/clp_ffi_js/ir/StreamReader.cpp:179-179
Timestamp: 2024-11-07T08:38:39.882Z
Learning: The function `clp::ffi::ir_stream::validate_protocol_version` used in `src/clp_ffi_js/ir/StreamReader.cpp` is imported from the CLP upstream library and may not be present in the local codebase.
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/decoding_methods.cpp:45-80
Timestamp: 2024-11-04T21:06:24.453Z
Learning: In `src/clp_ffi_js/ir/decoding_methods.cpp`, within the `get_version(clp::ReaderInterface&)` function, avoid returning `std::string_view` when referencing local data that goes out of scope. Instead, return `std::string` to ensure data validity.
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#30
File: src/clp_ffi_js/ir/StructuredIrStreamReader.hpp:187-190
Timestamp: 2024-11-07T08:35:20.376Z
Learning: In the `StructuredIrStreamReader` constructor of `src/clp_ffi_js/ir/StructuredIrStreamReader.hpp`, the `stream_reader_data_context` parameter is already wrapped in a `std::make_unique`, ensuring it matches the member variable `m_stream_reader_data_context` of type `std::unique_ptr<StreamReaderDataContext<StructuredIrDeserializer>>`.
🧬 Code Graph Analysis (2)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (4)
src/clp_ffi_js/ir/StreamReader.cpp (6)
  • rewind_reader_and_validate_encoding_type (105-130)
  • rewind_reader_and_validate_encoding_type (105-105)
  • deserialize_metadata (132-165)
  • deserialize_metadata (132-132)
  • convert_metadata_to_js_object (167-173)
  • convert_metadata_to_js_object (167-167)
src/clp_ffi_js/ir/StreamReader.hpp (3)
  • rewind_reader_and_validate_encoding_type (58-58)
  • deserialize_metadata (67-67)
  • convert_metadata_to_js_object (75-76)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)
  • zstd_decompressor (47-48)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2)
  • get_metadata (120-122)
  • get_metadata (120-120)
src/clp_ffi_js/ir/StreamReader.hpp (3)
src/clp_ffi_js/json_utils.hpp (1)
  • nodiscard (17-19)
src/clp_ffi_js/ir/StreamReaderDataContext.hpp (2)
  • nodiscard (42-42)
  • nodiscard (44-44)
src/clp_ffi_js/ClpFfiJsException.hpp (1)
  • nodiscard (24-24)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: lint (ubuntu-latest)
🔇 Additional comments (14)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (3)

28-28: Good refactoring to centralize JSON utilities.

Moving dump_json_with_replace to a shared header improves code reusability.


120-122: Correct implementation of the metadata getter.

The method properly implements the virtual function from the base class and correctly converts the stored metadata to a JavaScript object.


220-221: Metadata extraction looks correct.

The metadata is properly extracted from the deserializer during construction. Consider adding validation to ensure the metadata extraction succeeded.

src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (3)

37-42: Correct metadata extraction with proper state preservation.

The implementation correctly saves and restores the reader position to avoid side effects on subsequent deserialization operations.


62-62: Efficient parameter passing with move semantics.

Good use of std::move to avoid unnecessary copies of the metadata JSON object.


65-67: Consistent metadata getter implementation.

The implementation correctly matches the pattern used in StructuredIrStreamReader.

src/clp_ffi_js/ir/StreamReader.hpp (2)

15-15: Necessary includes and type declaration for metadata support.

The additions properly support the new metadata functionality and follow established patterns.

Also applies to: 20-20, 35-35


111-114: Clean interface for metadata retrieval.

Adding the pure virtual method to the base class ensures consistent metadata access across all stream reader implementations.

src/clp_ffi_js/ir/StreamReader.cpp (6)

27-27: LGTM: Required include for JSON utilities.

The new include is needed for the dump_json_with_replace function used in convert_metadata_to_js_object.


42-58: Well-structured refactoring with proper error handling.

The rewritten get_version function properly delegates to deserialize_metadata and maintains consistent error handling patterns. The extraction of metadata deserialization logic improves code modularity.


60-101: Comprehensive Emscripten bindings with good organization.

The bindings have been properly moved to the anonymous namespace and significantly expanded to support the new metadata functionality. The separation between input and output types is clear, and all necessary methods are bound for JavaScript interoperability.


156-156: Simplified return statement improves readability.

The direct return of the parsed JSON eliminates unnecessary intermediate variables while maintaining the same functionality.


167-173: Well-implemented JavaScript interop helper.

The convert_metadata_to_js_object function properly converts JSON metadata to JavaScript-compatible objects using Emscripten's JSON parsing capabilities and the robust dump_json_with_replace utility.


194-194:

✅ Verification successful

Verify decompressor position management is correct.

The simplified approach with a single rewind after get_version should work correctly since get_version now calls deserialize_metadata which handles the metadata reading internally. However, please ensure this doesn't affect the subsequent version validation or stream reader creation.


🏁 Script executed:

#!/bin/bash
# Description: Verify that get_version and deserialize_metadata handle reader positioning correctly
# Expected: Both functions should rewind the reader to the beginning before processing

echo "Searching for reader positioning calls in get_version and deserialize_metadata..."
ast-grep --pattern 'auto get_version($reader) -> std::string {
  $$$
}'

echo "Searching for seek operations in deserialize_metadata..."
ast-grep --pattern 'auto deserialize_metadata($reader) -> nlohmann::json {
  $$$
}'

echo "Checking if deserialize_preamble affects reader position..."
rg -A 10 -B 5 "deserialize_preamble" --type cpp

Length of output: 1857


#!/bin/bash

Locate the get_version implementation to inspect reader positioning

rg -n "auto get_version" -A 10 -B 5 src/clp_ffi_js/ir/StreamReader.cpp


#!/bin/bash

Locate get_version implementation to inspect reader positioning

echo "=== get_version implementation ==="
rg -n "get_version" -C 5 src/clp_ffi_js/ir/StreamReader.cpp

Locate deserialize_metadata implementation to inspect reader positioning

echo "=== deserialize_metadata implementation ==="
rg -n "deserialize_metadata" -C 5 src/clp_ffi_js/ir/StreamReader.cpp

Find any seek/rewind calls in StreamReader.cpp

echo "=== seek operations in StreamReader.cpp ==="
rg -n "seek_from_begin" -C 3 src/clp_ffi_js/ir/StreamReader.cpp

Locate deserialize_preamble definition to see if it rewinds the reader

echo "=== deserialize_preamble definition ==="
rg -n "deserialize_preamble" -C 5 .


🏁 Script executed:

#!/bin/bash
# Search for get_version implementation to inspect reader positioning
echo "=== get_version implementation ==="
rg -n "auto get_version" -C5 src/clp_ffi_js/ir/StreamReader.cpp

# Inspect deserialize_metadata for seek operations
echo "=== deserialize_metadata implementation ==="
rg -n "auto deserialize_metadata" -C5 src/clp_ffi_js/ir/StreamReader.cpp

# Find all seek_from_begin calls in StreamReader.cpp
echo "=== seek_from_begin calls in StreamReader.cpp ==="
rg -n "seek_from_begin" -C3 src/clp_ffi_js/ir/StreamReader.cpp

Length of output: 2148


StreamReader.cpp: Decompressor position management verified

A single zstd_decompressor->seek_from_begin(0) at line 194 correctly resets the stream after metadata is read in get_version. Neither get_version (lines 42–47) nor deserialize_metadata perform any implicit seeks, so version validation and the subsequent stream‐reader creation operate on a fresh reader position.

No changes required.

@junhaoliao junhaoliao requested a review from LinZhihao-723 June 5, 2025 09:02
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 779f940 and ee5526f.

📒 Files selected for processing (9)
  • CMakeLists.txt (1 hunks)
  • src/clp_ffi_js/ir/StreamReader.cpp (5 hunks)
  • src/clp_ffi_js/ir/StreamReader.hpp (3 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (3 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (4 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (4 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (4 hunks)
  • src/clp_ffi_js/ir/decoding_methods.cpp (1 hunks)
  • src/clp_ffi_js/ir/decoding_methods.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • src/clp_ffi_js/ir/StructuredIrStreamReader.hpp
  • src/clp_ffi_js/ir/StreamReader.hpp
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/StreamReader.cpp
  • src/clp_ffi_js/ir/decoding_methods.hpp
  • src/clp_ffi_js/ir/decoding_methods.cpp
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
🧠 Learnings (5)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2)
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#30
File: src/clp_ffi_js/ir/StructuredIrStreamReader.hpp:187-190
Timestamp: 2024-11-07T08:35:20.376Z
Learning: In the `StructuredIrStreamReader` constructor of `src/clp_ffi_js/ir/StructuredIrStreamReader.hpp`, the `stream_reader_data_context` parameter is already wrapped in a `std::make_unique`, ensuring it matches the member variable `m_stream_reader_data_context` of type `std::unique_ptr<StreamReaderDataContext<StructuredIrDeserializer>>`.
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1)
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#30
File: src/clp_ffi_js/ir/StructuredIrStreamReader.hpp:187-190
Timestamp: 2024-11-07T08:35:20.376Z
Learning: In the `StructuredIrStreamReader` constructor of `src/clp_ffi_js/ir/StructuredIrStreamReader.hpp`, the `stream_reader_data_context` parameter is already wrapped in a `std::make_unique`, ensuring it matches the member variable `m_stream_reader_data_context` of type `std::unique_ptr<StreamReaderDataContext<StructuredIrDeserializer>>`.
src/clp_ffi_js/ir/StreamReader.cpp (3)
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:162-168
Timestamp: 2024-11-04T19:55:27.398Z
Learning: In the `IrStreamReader::decode_range` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, message decoding failures are not expected unless the file is corrupted or the deserializer in CLP core does not handle edge cases, so it's appropriate to throw an exception upon message decoding failure.
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/decoding_methods.cpp:45-80
Timestamp: 2024-11-04T21:06:24.453Z
Learning: In `src/clp_ffi_js/ir/decoding_methods.cpp`, within the `get_version(clp::ReaderInterface&)` function, avoid returning `std::string_view` when referencing local data that goes out of scope. Instead, return `std::string` to ensure data validity.
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#30
File: src/clp_ffi_js/ir/StructuredIrStreamReader.hpp:187-190
Timestamp: 2024-11-07T08:35:20.376Z
Learning: In the `StructuredIrStreamReader` constructor of `src/clp_ffi_js/ir/StructuredIrStreamReader.hpp`, the `stream_reader_data_context` parameter is already wrapped in a `std::make_unique`, ensuring it matches the member variable `m_stream_reader_data_context` of type `std::unique_ptr<StreamReaderDataContext<StructuredIrDeserializer>>`.
src/clp_ffi_js/ir/decoding_methods.cpp (1)
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:162-168
Timestamp: 2024-11-04T19:55:27.398Z
Learning: In the `IrStreamReader::decode_range` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, message decoding failures are not expected unless the file is corrupted or the deserializer in CLP core does not handle edge cases, so it's appropriate to throw an exception upon message decoding failure.
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#30
File: src/clp_ffi_js/ir/StructuredIrStreamReader.hpp:187-190
Timestamp: 2024-11-07T08:35:20.376Z
Learning: In the `StructuredIrStreamReader` constructor of `src/clp_ffi_js/ir/StructuredIrStreamReader.hpp`, the `stream_reader_data_context` parameter is already wrapped in a `std::make_unique`, ensuring it matches the member variable `m_stream_reader_data_context` of type `std::unique_ptr<StreamReaderDataContext<StructuredIrDeserializer>>`.
🧬 Code Graph Analysis (5)
src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (5)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)
  • nodiscard (52-54)
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (1)
  • nodiscard (66-66)
src/clp_ffi_js/json_utils.hpp (1)
  • nodiscard (17-19)
src/clp_ffi_js/ir/StreamReaderDataContext.hpp (2)
  • nodiscard (42-42)
  • nodiscard (44-44)
src/clp_ffi_js/ir/LogEventWithFilterData.hpp (3)
  • nodiscard (50-50)
  • nodiscard (52-52)
  • nodiscard (54-54)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (3)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (2)
  • get_metadata (66-68)
  • get_metadata (66-66)
src/clp_ffi_js/ir/decoding_methods.cpp (2)
  • convert_metadata_to_js_object (90-96)
  • convert_metadata_to_js_object (90-90)
src/clp_ffi_js/ir/decoding_methods.hpp (1)
  • convert_metadata_to_js_object (33-33)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (4)
src/clp_ffi_js/ir/decoding_methods.cpp (6)
  • rewind_reader_and_validate_encoding_type (28-53)
  • rewind_reader_and_validate_encoding_type (28-28)
  • deserialize_metadata (55-88)
  • deserialize_metadata (55-55)
  • convert_metadata_to_js_object (90-96)
  • convert_metadata_to_js_object (90-90)
src/clp_ffi_js/ir/decoding_methods.hpp (3)
  • rewind_reader_and_validate_encoding_type (16-16)
  • deserialize_metadata (25-25)
  • convert_metadata_to_js_object (33-33)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)
  • zstd_decompressor (47-48)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2)
  • get_metadata (121-123)
  • get_metadata (121-121)
src/clp_ffi_js/ir/decoding_methods.cpp (1)
src/clp_ffi_js/json_utils.hpp (1)
  • dump_json_with_replace (17-17)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (3)
src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (1)
  • nodiscard (58-60)
src/clp_ffi_js/json_utils.hpp (1)
  • nodiscard (17-19)
src/clp_ffi_js/ir/StreamReaderDataContext.hpp (2)
  • nodiscard (42-42)
  • nodiscard (44-44)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: lint (ubuntu-latest)
🔇 Additional comments (24)
CMakeLists.txt (1)

114-114: LGTM!

The addition of the new source file to the build configuration is correct and necessary for the metadata extraction functionality.

src/clp_ffi_js/ir/StreamReader.hpp (3)

15-15: LGTM!

The new includes for ReaderInterface and nlohmann/json are necessary for the metadata functionality being added.

Also applies to: 20-20


35-35: LGTM!

The declaration of MetadataTsType for JavaScript bindings is correctly placed with other output type declarations.


85-88: LGTM!

The pure virtual get_metadata() method properly establishes the interface contract for derived classes to provide metadata access.

src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (3)

13-13: LGTM!

The JSON include is necessary for the new m_metadata member.


56-56: LGTM!

The get_metadata() override declaration correctly implements the interface from the base class.


91-91: LGTM!

The m_metadata member follows the naming convention suggested in past reviews and uses the appropriate nlohmann::json type.

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (3)

24-24: LGTM!

The new includes for decoding_methods.hpp and json_utils.hpp are correctly added to support the metadata functionality.

Also applies to: 29-29


121-123: LGTM!

The get_metadata() implementation correctly uses convert_metadata_to_js_object to convert the stored JSON metadata to a JavaScript-compatible object, following the same pattern as UnstructuredIrStreamReader.


221-221: LGTM!

The constructor properly initialises m_metadata by extracting metadata from the deserializer, ensuring the metadata is available when get_metadata() is called.

src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1)

38-43: LGTM! Well-structured metadata extraction.

The approach of saving and restoring the stream position ensures the existing deserialization logic remains unaffected while extracting metadata.

src/clp_ffi_js/ir/decoding_methods.hpp (1)

1-36: Well-designed header with clear function declarations.

The header properly encapsulates decoding functionality with appropriate documentation and return type annotations.

src/clp_ffi_js/ir/decoding_methods.cpp (3)

28-53: Robust implementation with proper error handling.

The function correctly validates the encoding type and follows the coding guidelines for boolean comparisons.


55-88: Well-implemented metadata deserialization with comprehensive error handling.

The function properly handles both IR deserialization errors and JSON parsing exceptions with informative error messages.


90-96: Clean conversion to JavaScript object.

The function appropriately uses dump_json_with_replace for safe serialization and Emscripten's API for JavaScript interoperability.

src/clp_ffi_js/ir/StreamReader.cpp (7)

25-25: LGTM: Include statement properly added.

The inclusion of decoding_methods.hpp aligns with the refactoring to extract metadata deserialization logic.


44-45: LGTM: Metadata deserialization properly refactored.

The extraction of metadata deserialization logic into deserialize_metadata improves code modularity and reusability. The function returns a JSON object that can be safely accessed for version extraction.


51-51: Improved error message specificity.

The updated error message "Failed to get stream's version" is more specific and accurately reflects the operation that failed compared to the previous generic metadata parsing message.


69-69: LGTM: MetadataTsType registration enables metadata access in JavaScript.

The registration of MetadataTsType as "Record<string, any>" properly exposes metadata functionality to JavaScript consumers.


83-83: LGTM: getMetadata method binding supports the new functionality.

Adding the getMetadata method to the Emscripten class binding correctly exposes the metadata retrieval capability to JavaScript, fulfilling the PR objectives.


123-123: LGTM: Simplified stream positioning.

Making the seek position parameter explicit with seek_from_begin(0) improves code clarity and maintains the expected behaviour of rewinding to the beginning before creating the specific stream reader type.


119-119:

✅ Verification successful

Verify that rewind_reader_and_validate_encoding_type function is properly accessible.

According to the AI summary, the rewind_reader_and_validate_encoding_type function was moved to the decoding_methods module, but this call appears unchanged. Please ensure the function is properly imported or the call is updated to use the new location.


🏁 Script executed:

#!/bin/bash
# Description: Verify the rewind_reader_and_validate_encoding_type function is properly defined and accessible
# Expected: Function should be declared in decoding_methods.hpp or still available in current scope

# Check if the function is declared in the decoding_methods header
echo "Checking for function declaration in decoding_methods.hpp:"
fd -t f -n "decoding_methods.hpp" -x cat {} \; | rg -A 3 "rewind_reader_and_validate_encoding_type"

# Check for any other declarations of this function in the codebase
echo "Checking for other declarations in the codebase:"
rg -A 3 "rewind_reader_and_validate_encoding_type.*\(" --type cpp --type hpp

# Check if there are any compilation errors related to this function
echo "Checking for potential namespace or include issues:"
rg "rewind_reader_and_validate_encoding_type" src/clp_ffi_js/ir/

Length of output: 1315


#!/bin/bash
echo "Includes in StreamReader.cpp:"
rg "#include" src/clp_ffi_js/ir/StreamReader.cpp -n

🏁 Script executed:

#!/bin/bash
echo "Checking includes in StreamReader.cpp for decoding_methods.hpp:"
rg "#include.*decoding_methods.hpp" -n src/clp_ffi_js/ir/StreamReader.cpp

echo "Snippet around call site in StreamReader.cpp:"
rg -n -B2 -A2 "rewind_reader_and_validate_encoding_type" src/clp_ffi_js/ir/StreamReader.cpp

Length of output: 742


Function accessibility verified – no changes needed
The call to rewind_reader_and_validate_encoding_type at line 119 of StreamReader.cpp is valid: the function is declared in src/clp_ffi_js/ir/decoding_methods.hpp and that header is included.

src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (2)

50-50: Expose Metadata Retrieval
The get_metadata() override cleanly exposes stored metadata to JavaScript consumers.


85-85: Metadata Member Ownership
Holding nlohmann::json m_metadata by value ensures safe ownership and avoids dangling references.

#include <clp/ir/LogEventDeserializer.hpp>
#include <clp/ir/types.hpp>
#include <emscripten/val.h>
#include <json/single_include/nlohmann/json.hpp>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Include Path Verification
Ensure the JSON include path (json/single_include/nlohmann/json.hpp) aligns with the repository’s conventions and matches other modules (e.g. nlohmann/json.hpp).

🤖 Prompt for AI Agents
In src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp at line 11, verify that the
JSON include path matches the repository's standard convention. If other modules
use `nlohmann/json.hpp`, update the include directive to `#include
<nlohmann/json.hpp>` to maintain consistency across the codebase.

@LinZhihao-723
Copy link
Member

@LinZhihao-723

We should use result-style error handling instead of relying on exceptions, at least for the factory functions. It's meaningless to use a factory function if it just throw on errors, since a constructor can be designed in a same way. The result-style error handling allows us to forward the error code from the core-ffi.

As discussed offline, i agree the Rust-like Result-style error is better. i can't quickly figure out how such results can be mapped to Emscripten's Embind system though. Shall we create an issue to address this as a separate effort? I believe this can be better done once we sync the latest clp code into clp-ffi-js (and could be done even better after you are done with the proposed error handling refactoring in clp code)

Sure. Let's create an issue and defer these changes to future PRs.

@junhaoliao
Copy link
Member Author

@LinZhihao-723

We should use result-style error handling instead of relying on exceptions, at least for the factory functions. It's meaningless to use a factory function if it just throw on errors, since a constructor can be designed in a same way. The result-style error handling allows us to forward the error code from the core-ffi.

As discussed offline, i agree the Rust-like Result-style error is better. i can't quickly figure out how such results can be mapped to Emscripten's Embind system though. Shall we create an issue to address this as a separate effort? I believe this can be better done once we sync the latest clp code into clp-ffi-js (and could be done even better after you are done with the proposed error handling refactoring in clp code)

Sure. Let's create an issue and defer these changes to future PRs.

@coderabbitai please help create an issue

Copy link

coderabbitai bot commented Jun 7, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

@junhaoliao I've created a GitHub issue to track the result-style error handling enhancement as discussed. The issue captures the context from your discussion with @LinZhihao-723 and includes the appropriate timing considerations you mentioned.

  /\_/\  
 ( o.o ) 
  > ^ <

curious rabbit taking notes for future improvements


🏁 Script executed:

#!/bin/bash
# Create a GitHub issue for adopting result-style error handling
gh issue create \
  --title "Adopt result-style error handling for factory functions instead of exceptions" \
  --body "## Background

During PR review #80, @LinZhihao-723 suggested adopting result-style error handling (similar to Rust's Result type) instead of relying on exceptions, particularly for factory functions. The current approach of using factory functions that throw exceptions is not meaningful since constructors can behave similarly.

## Benefits

Result-style error handling would allow:
- Forwarding error codes from the core FFI layer more effectively
- More explicit error handling patterns
- Better integration with the underlying CLP core error handling

## Challenges

The main challenge identified is mapping such result types to Emscripten's Embind system for JavaScript bindings.

## Timing

This enhancement should be implemented after:
1. Syncing the latest CLP code into clp-ffi-js
2. Completion of the proposed error handling refactoring in the CLP codebase

## References

- Original discussion: https://github.com/y-scope/clp-ffi-js/pull/80#issuecomment-2938939675
- PR: https://github.com/y-scope/clp-ffi-js/pull/80

Requested by: @junhaoliao" \
  --assignee junhaoliao

Length of output: 1246


@junhaoliao Perfect! I've successfully created the GitHub issue for adopting result-style error handling. You can find it here:

Issue #89: Adopt result-style error handling for factory functions instead of exceptions

The issue captures all the context from your discussion with @LinZhihao-723, including the benefits, challenges with Emscripten's Embind system, and the suggested timing for implementation. It's been assigned to you and includes backlinks to this PR and the relevant comment.

@junhaoliao junhaoliao requested review from LinZhihao-723 and removed request for LinZhihao-723 June 7, 2025 09:26
Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last few comments. Otherwise lgtm

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just utils.{cpp, hpp}?

/**
* @param zstd_decompressor A decompressor for an IR stream, where the read head of the stream
* is just after the stream's encoding type.
* @param zstd_decompressor A decompressor for an IR stream.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is out of this PR's scope:
Similarly, I think we don't need rvalue reference for zstd_decompressor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai please also track this in #90

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

Let me check the current state of issue #90 and add this suggestion to track it properly.

@junhaoliao I've added the suggestion about removing the unnecessary rvalue reference for zstd_decompressor to issue #90. The suggestion from @LinZhihao-723 is now tracked along with the other code improvements for future implementation.


🏁 Script executed:

#!/bin/bash
# Check the details of issue #90
gh issue view 90 --json title,body,comments

Length of output: 1251


🏁 Script executed:

#!/bin/bash
# Add a comment to issue #90 to track the rvalue reference suggestion
gh issue comment 90 --body "## Additional suggestion from PR #80

**Context**: In PR #80, @LinZhihao-723 noted that rvalue reference for \`zstd_decompressor\` parameter is not needed in \`StructuredIrStreamReader::create\`.

**Suggestion**: Remove unnecessary rvalue reference for \`zstd_decompressor\` parameter.

**Reference**: https://github.com/y-scope/clp-ffi-js/pull/80#discussion_r2134832676"

Length of output: 467


@junhaoliao I've successfully added the suggestion about removing the unnecessary rvalue reference for zstd_decompressor to issue #90. The suggestion from @LinZhihao-723 is now tracked along with the other rvalue reference improvements for future implementation.

The comment was added here: #90 (comment)


🧬 Code Graph Analysis Results

Relevant snippets for src/clp_ffi_js/ir/StructuredIrStreamReader.hpp


From src/clp_ffi_js/ir/StreamReaderDataContext.hpp (lines 42-44)

[[nodiscard]] auto get_deserializer() -> Deserializer& { return m_deserializer; }

[[nodiscard]] auto get_reader() -> clp::ReaderInterface& { return *m_reader; }

These methods provide access to the deserializer and the underlying reader, which are likely used internally by StructuredIrStreamReader to manage stream reading and deserialization.


From src/clp_ffi_js/ir/LogEventWithFilterData.hpp (lines 50-54)

[[nodiscard]] auto get_log_event() const -> LogEvent const& { return m_log_event; }

[[nodiscard]] auto get_log_level() const -> LogLevel { return m_log_level; }

[[nodiscard]] auto get_timestamp() const -> clp::ir::epoch_time_ms_t { return m_timestamp; }

These accessors provide details about individual log events, including the event data, log level, and timestamp, which are relevant for filtering and decoding log events in StructuredIrStreamReader.


From src/clp_ffi_js/ir/StreamReader.hpp (lines 69-70, 107, 133-134, 155)

create(DataArrayTsType const& data_array, ReaderOptions const& reader_options)
            -> std::unique_ptr<StreamReader>

filter_log_events(LogLevelFilterTsType const& log_level_filter)

decode_range(size_t begin_idx, size_t end_idx, bool use_filter) const
            -> DecodedResultsTsType

find_nearest_log_event_by_timestamp(clp::ir::epoch_time_ms_t target_ts) -> NullableLogEventIdx

These declarations correspond to virtual or interface methods that StructuredIrStreamReader overrides, defining the contract for creating readers, filtering log events, decoding ranges of events, and finding events by timestamp.


From src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (lines 217-227)

StructuredIrStreamReader::StructuredIrStreamReader(
        StreamReaderDataContext<StructuredIrDeserializer>&& stream_reader_data_context,
        std::shared_ptr<StructuredLogEvents> deserialized_log_events
)
        : m_metadata(stream_reader_data_context.get_deserializer().get_metadata()),
          m_deserialized_log_events{std::move(deserialized_log_events)},
          m_stream_reader_data_context{
                  std::make_unique<StreamReaderDataContext<StructuredIrDeserializer>>(
                          std::move(stream_reader_data_context)
                  )
          } {}

This constructor implementation shows how the StructuredIrStreamReader initializes its metadata, deserialized log events, and stream reader data context from the provided parameters.


From src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (line 66)

[[nodiscard]] auto is_auto_generated() const -> bool { return m_is_auto_generated; }

This method indicates whether a unit handler instance is auto-generated, potentially relevant for handling or filtering structured IR units during deserialization.


From src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (lines 47-48, 52-54, 60, 71-75)

create(std::unique_ptr<ZstdDecompressor>&& zstd_decompressor, clp::Array<char> data_array)
            -> UnstructuredIrStreamReader

[[nodiscard]] auto get_ir_stream_type() const -> StreamType override {
        return StreamType::Unstructured;
    }

filter_log_events(LogLevelFilterTsType const& log_level_filter) override

decode_range(size_t begin_idx, size_t end_idx, bool use_filter) const
            -> DecodedResultsTsType override

find_nearest_log_event_by_timestamp(clp::ir::epoch_time_ms_t target_ts)
            -> NullableLogEventIdx override

These methods from the unstructured IR stream reader provide a useful comparison and context for the similarly named methods in StructuredIrStreamReader, showing the interface and typical implementation patterns for stream readers.


These snippets collectively provide context on the interfaces, internal data handling, and related classes that are relevant to understanding and working with StructuredIrStreamReader.hpp.

@junhaoliao junhaoliao requested a review from LinZhihao-723 June 9, 2025 13:17
Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

feat: Add support for retrieving CLP IR stream-level metadata from IR stream readers (resolves #77).

@junhaoliao junhaoliao changed the title feat: Support retrieving CLP IR metadata in Structured and Unstructured IR stream readers (resolves #77). feat: Add support for retrieving CLP IR stream-level metadata from IR stream readers (resolves #77). Jun 9, 2025
@junhaoliao junhaoliao merged commit 7961a46 into y-scope:main Jun 9, 2025
4 checks passed
@junhaoliao junhaoliao deleted the get-metadata branch June 14, 2025 23:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants