Skip to content

Conversation

hoophalab
Copy link
Contributor

@hoophalab hoophalab commented Aug 20, 2025

Description

As the title indicates, this pull request introduces support for KQL filtering.

  1. The IR query functionality is dependent on clp_s::search. This PR updates the CMakeLists.txt file to link against clp-s and its required dependencies.
  2. A new method, query_index, uses the new IR search feature and returns matched indexes.
  3. The existing filter_log_events method has been modified to accept a KQL filter as a parameter.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  1. Install the package on the log viewer's prototype branch.
  2. Open an IRv2 file containing a log level key
  3. The log level filtering functionality works as expected.
  4. KQL filtering works correctly (entered in the search tab panel).
  5. Applying both level filtering and KQL filtering simultaneously works as intended.

Summary by CodeRabbit

  • New Features

    • KQL query support to locate matching log-event indices in structured streams; JS bindings expose both log‑level and log‑level+KQL overloads.
  • Behaviour Changes

    • Filtering APIs now take an additional KQL parameter; unstructured streams ignore KQL and emit a warning.
    • Filtering can combine KQL and log‑level criteria; deserialization now uses a shared helper that preserves context for subsequent queries.
  • Chores

    • Build and dependency updates: added third‑party libraries and download tasks; executables link to new libraries and unnecessary targets were disabled.

@hoophalab hoophalab requested a review from a team as a code owner August 20, 2025 20:38
Copy link

coderabbitai bot commented Aug 20, 2025

Walkthrough

Adds KQL-based querying to IR stream readers, extends StreamReader::filter_log_events to accept an optional KQL string (with a one-arg wrapper), wires query execution into deserialization to collect matching event indices, adds query/deserialization helpers and index-collector unit handler, and updates build/deps to include ANTLR/date/simdjson and related libs.

Changes

Cohort / File(s) Summary
Build system & deps
CMakeLists.txt, taskfiles/deps.yaml
Add query-related sources and clp_s/ANTLR/date/simdjson/spdlog/string_utils subdirectories; enable CLP_S search AST/KQL components while disabling many CLP_S targets; configure ANTLR static build; link clp_s::search::{ast,kql}, embind, spdlog; remove explicit spdlog include path; add download tasks for antlr-runtime, date, simdjson; update CLP tarball URL/SHA.
IR public interfaces
src/clp_ffi_js/ir/StreamReader.hpp, src/clp_ffi_js/ir/StructuredIrStreamReader.hpp, src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp, src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp
Extend StreamReader::filter_log_events to accept std::string const& kql_filter and retain a single-arg wrapper; update overrides in structured/unstructured readers; StructuredIrUnitHandler::handle_log_event gains size_t log_event_idx ([[maybe_unused]]).
StreamReader bindings
src/clp_ffi_js/ir/StreamReader.cpp
EMSCRIPTEN bindings now export two overloads for filterLogEvents (with and without kql_filter) using emscripten::select_overload.
Structured reader implementation
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp, .../StructuredIrStreamReader.hpp
Implement KQL path: when kql_filter non-empty, call collect_matched_log_event_indices to compute matching indices and set m_filtered_log_event_map; then optionally apply log-level filtering to that subset; reset map to nullopt if it covers all deserialized events; replace manual IR-unit loop with deserialize_log_events; minor lint suppression and logging severity adjustments.
Unstructured reader implementation
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp, .../UnstructuredIrStreamReader.hpp
New signature accepts kql_filter ([[maybe_unused]]); emit SPDLOG_WARN if non-empty (KQL ignored for unstructured IR); preserve existing log-level filtering logic.
Query utilities (new)
src/clp_ffi_js/ir/query_methods.hpp, src/clp_ffi_js/ir/query_methods.cpp
Add collect_matched_log_event_indices(clp::ReaderInterface&, std::string const&) and LogEventIndexIrUnitHandler that parse KQL, build a QueryHandler, run IR deserialization and collect matching log-event indices; throw ClpFfiJsException on fatal errors.
Deserialization helper (new)
src/clp_ffi_js/ir/decoding_methods.hpp
Add templated deserialize_log_events helper to drive a Deserializer loop, convert errors into ClpFfiJsException or WARN for incomplete streams.
Unit handler impl
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp
Update handle_log_event implementation to accept new log_event_idx parameter (currently unused).
New source
src/clp_ffi_js/ir/query_methods.cpp, src/clp_ffi_js/ir/query_methods.hpp
New files implementing KQL parsing, query handler wiring, index collection and exposed declaration.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant JS as JS/WASM caller
  participant SR as StructuredIrStreamReader
  participant QM as collect_matched_log_event_indices
  participant QP as clp_s::search::kql::QueryHandler
  participant DZ as Deserializer
  participant RI as ReaderInterface

  JS->>SR: filter_log_events(level_filter, kql_filter)
  alt kql_filter non-empty
    SR->>QM: collect_matched_log_event_indices(RI, kql_filter)
    QM->>QP: parse KQL & build QueryHandler
    QM->>DZ: create Deserializer(RI, LogEventIndexHandler, QP)
    loop per IR unit
      DZ->>RI: read IR unit
      DZ-->>QM: invoke handler callbacks (emit matching indices)
    end
    QM-->>SR: return matched_indices
    SR->>SR: set m_filtered_log_event_map = matched_indices
  else
    SR->>SR: clear/leave m_filtered_log_event_map empty
  end
  opt level_filter provided
    SR->>SR: apply log-level filtering over selected set or all events
  end
  SR-->>JS: complete
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

Suggested reviewers

  • junhaoliao
  • davemarco
  • kirkrodrigues

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Comment on lines 157 to 189
if (false == log_level_filter.isNull()) {
std::vector<size_t> filtered_log_event_map;

auto filter_levels{
emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter)
};

auto fn = [&](size_t log_event_idx) {
auto const& log_event{m_deserialized_log_events->at(log_event_idx)};
if (std::ranges::find(
filter_levels,
clp::enum_to_underlying_type(log_event.get_log_level())
)
!= filter_levels.end())
{
filtered_log_event_map.emplace_back(log_event_idx);
}
};

if (m_filtered_log_event_map.has_value()) {
for (auto log_event_idx : m_filtered_log_event_map.value()) {
fn(log_event_idx);
}
} else {
for (size_t log_event_idx = 0; log_event_idx < m_deserialized_log_events->size();
++log_event_idx)
{
fn(log_event_idx);
}
}

m_filtered_log_event_map = std::make_optional(std::move(filtered_log_event_map));
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should merge existing log level filtering with the new KQL filtering by extending user supplied query expression in the future. @coderabbitai can you create an issue on this?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

@hoophalab I've created an issue to track the enhancement for merging log level filtering with KQL filtering. The issue outlines the current implementation, the proposed benefits of unifying the filtering approach, and provides examples of how users could write more expressive queries that combine log levels with other criteria in a single KQL expression.


🏁 Script executed:

#!/bin/bash
# Create a GitHub issue for merging log level filtering with KQL filtering
gh issue create \
  --title "Merge log level filtering with KQL filtering by extending query expressions" \
  --body "## Enhancement Request

Currently, the \`StructuredIrStreamReader::filter_log_events\` method handles log level filtering and KQL filtering separately:

1. KQL filtering is applied first (if a KQL query is provided)
2. Log level filtering is then applied on top of the KQL results

## Proposed Enhancement

Merge the existing log level filtering with KQL filtering by extending user-supplied query expressions to include log level constraints. This would:

- **Simplify filtering logic**: Use a single unified query approach instead of two separate filtering passes
- **Improve performance**: Filter in one pass instead of applying filters sequentially
- **Enhance API consistency**: Use KQL for all filtering needs rather than having separate parameters
- **Enable complex queries**: Allow users to combine log levels with other criteria in sophisticated query expressions

## Current Implementation

The current approach in \`src/clp_ffi_js/ir/StructuredIrStreamReader.cpp\` applies filters sequentially, which works but could be more efficient and user-friendly.

## Expected Outcome

Users should be able to write KQL queries like:
- \`level:INFO AND message:\"error\"\`
- \`level:(ERROR OR WARN) AND timestamp > 2024-01-01\`

This would eliminate the need for separate \`log_level_filter\` parameter while providing more flexibility.

---

**Requested by:** @hoophalab  
**Context:** [PR #104](https://github.com/y-scope/clp-ffi-js/pull/104)  
**Reference:** [Comment](https://github.com/y-scope/clp-ffi-js/pull/104#discussion_r2289245966)" \
  --assignee hoophalab

Length of output: 1715


Perfect! The issue has been successfully created at #105. The issue captures the enhancement request to merge log level filtering with KQL filtering by extending query expressions, which would provide a more unified and flexible filtering approach for users.


🧬 Code Graph Analysis Results
  • File: src/clp_ffi_js/ir/query_methods.hpp
    Lines: 10-11
    Code:

    query_index(clp::ReaderInterface& reader, std::string const& query_string)
            -> std::vector<size_t>
    
  • File: src/clp_ffi_js/ir/query_methods.cpp
    Lines: 66-136
    Code:

    [[nodiscard]] auto query_index(clp::ReaderInterface& reader, std::string const& query_string)
            -> std::vector<size_t> {
        std::istringstream query_string_stream{query_string};
        auto query_handler_result{
                clp::ffi::ir_stream::search::QueryHandler<
                        decltype(&trivial_new_projected_schema_tree_node_callback)>::
                        create(&trivial_new_projected_schema_tree_node_callback,
                               clp_s::search::kql::parse_kql_expression(query_string_stream),
                               {},
                               false)
        };
    
        if (query_handler_result.has_error()) {
            auto const error_code{query_handler_result.error()};
            throw ClpFfiJsException{
                    clp::ErrorCode::ErrorCode_Failure,
                    __FILENAME__,
                    __LINE__,
                    std::format(
                            "Failed to create qury handler: {} {}",
                            error_code.category().name(),
                            error_code.message()
                    )
            };
        }
    
        auto deserializer_result{clp::ffi::ir_stream::make_deserializer(
                reader,
                LogEventIndexIrUnitHandler{},
                std::move(query_handler_result.value())
        )};
    
        if (deserializer_result.has_error()) {
            auto const error_code{deserializer_result.error()};
            throw ClpFfiJsException{
                    clp::ErrorCode::ErrorCode_Failure,
                    __FILENAME__,
                    __LINE__,
                    std::format(
                            "Failed to create deserializer: {} {}",
                            error_code.category().name(),
                            error_code.message()
                    )
            };
        }
    
        auto& deserializer{deserializer_result.value()};
        while (false == deserializer.is_stream_completed()) {
            auto result{deserializer.deserialize_next_ir_unit(reader)};
            if (false == result.has_error()) {
                continue;
            }
            auto const error{result.error()};
            if (std::errc::result_out_of_range == error) {
                SPDLOG_ERROR("File contains an incomplete IR stream");
                break;
            }
            throw ClpFfiJsException{
                    clp::ErrorCode::ErrorCode_Corrupt,
                    __FILENAME__,
                    __LINE__,
                    std::format(
                            "Failed to deserialize IR unit: {}:{}",
                            error.category().name(),
                            error.message()
                    )
            };
        }
    
        return deserializer.get_ir_unit_handler().get_deserialized_log_event_indexes();
    }
    
  • File: src/clp_ffi_js/ir/query_methods.cpp
    Lines: 24-29
    Code:

    [[nodiscard]] auto
        handle_log_event([[maybe_unused]] KeyValuePairLogEvent&& log_event, size_t log_event_ix)
                -> IRErrorCode {
        m_deserialized_log_event_indexes.push_back(log_event_ix);
        return IRErrorCode::IRErrorCode_Success;
    }
    
  • File: src/clp_ffi_js/ir/query_methods.cpp
    Lines: 31-36
    Code:

    [[nodiscard]] static auto handle_utc_offset_change(
                [[maybe_unused]] UtcOffset utc_offset_old,
                [[maybe_unused]] UtcOffset utc_offset_new
        ) -> IRErrorCode {
        return IRErrorCode::IRErrorCode_Decode_Error;
    }
    
  • File: src/clp_ffi_js/ir/query_methods.cpp
    Lines: 38-44
    Code:

    [[nodiscard]] static auto handle_schema_tree_node_insertion(
                [[maybe_unused]] bool is_auto_generated,
                [[maybe_unused]] clp::ffi::SchemaTree::NodeLocator schema_tree_node_locator,
                [[maybe_unused]] std::shared_ptr<clp::ffi::SchemaTree const> const& schema_tree
        ) -> IRErrorCode {
        return IRErrorCode::IRErrorCode_Success;
    }
    
  • File: src/clp_ffi_js/ir/query_methods.cpp
    Lines: 46-48
    Code:

    [[nodiscard]] static auto handle_end_of_stream() -> IRErrorCode {
            return IRErrorCode::IRErrorCode_Success;
    }
    
  • File: src/clp_ffi_js/ir/query_methods.cpp
    Lines: 50-52
    Code:

    [[nodiscard]] auto get_deserialized_log_event_indexes() const -> std::vector<size_t> const& {
            return m_deserialized_log_event_indexes;
    }
    
  • File: src/clp_ffi_js/ir/StreamReader.cpp
    Lines: 232-347
    Code:

    auto StreamReader::generic_decode_range(
            size_t begin_idx,
            size_t end_idx,
            FilteredLogEventsMap const& filtered_log_event_map,
            LogEvents<LogEvent> const& log_events,
            ToStringFunc log_event_to_string,
            bool use_filter
    ) -> DecodedResultsTsType {
        if (use_filter && false == filtered_log_event_map.has_value()) {
            return DecodedResultsTsType{emscripten::val::null()};
        }
    
        size_t length{0};
        if (use_filter) {
            length = filtered_log_event_map->size();
        } else {
            length = log_events.size();
        }
        if (length < end_idx || begin_idx > end_idx) {
            SPDLOG_ERROR("Invalid log event index range: {}-{}", begin_idx, end_idx);
            return DecodedResultsTsType{emscripten::val::null()};
        }
    
        auto const results{emscripten::val::array()};
    
        for (size_t i = begin_idx; i < end_idx; ++i) {
            size_t log_event_idx{0};
            if (use_filter) {
                log_event_idx = filtered_log_event_map->at(i);
            } else {
                log_event_idx = i;
            }
    
            auto const& log_event_with_filter_data{log_events.at(log_event_idx)};
            auto const& log_event = log_event_with_filter_data.get_log_event();
            auto const& timestamp = log_event_with_filter_data.get_timestamp();
            auto const& log_level = log_event_with_filter_data.get_log_level();
            auto const& utc_offset = log_event_with_filter_data.get_utc_offset().count();
    
            EM_ASM(
                    {
                        Emval.toValue($0).push({
                            "logEventNum": $1,
                            "logLevel": $2,
                            "message": UTF8ToString($3),
                            "timestamp": $4,
                            "utcOffset": $5
                        });
                    },
                    results.as_handle(),
                    log_event_idx + 1,
                    log_level,
                    log_event_to_string(log_event).c_str(),
                    timestamp,
                    utc_offset
            );
        }
    
        return DecodedResultsTsType(results);
    }
    
    template <typename LogEvent>
    auto StreamReader::generic_filter_log_events(
            FilteredLogEventsMap& filtered_log_event_map,
            LogLevelFilterTsType const& log_level_filter,
            LogEvents<LogEvent> const& log_events
    ) -> void {
        if (log_level_filter.isNull()) {
            filtered_log_event_map.reset();
            return;
        }
    
        filtered_log_event_map.emplace();
        auto filter_levels
                = emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter);
    
        for (size_t log_event_idx = 0; log_event_idx < log_events.size(); ++log_event_idx) {
            auto const& log_event = log_events[log_event_idx];
            if (std::ranges::find(
                        filter_levels,
                        clp::enum_to_underlying_type(log_event.get_log_level())
                )
                != filter_levels.end())
            {
                filtered_log_event_map->emplace_back(log_event_idx);
            }
        }
    }
    
    template <typename LogEvent>
    auto StreamReader::generic_find_nearest_log_event_by_timestamp(
            LogEvents<LogEvent> const& log_events,
            clp::ir::epoch_time_ms_t target_ts
    ) -> NullableLogEventIdx {
        if (log_events.empty()) {
            return NullableLogEventIdx{emscripten::val::null()};
        }
    
        // Find the log event whose timestamp is just after `target_ts`
        auto first_greater_it{std::upper_bound(
                log_events.begin(),
                log_events.end(),
                target_ts,
                [](clp::ir::epoch_time_ms_t ts, LogEventWithFilterData<LogEvent> const& log_event) {
                    return ts < log_event.get_timestamp();
                }
        )};
    
        if (first_greater_it == log_events.begin()) {
            return NullableLogEventIdx{emscripten::val(0)};
        }
    
        auto const first_greater_idx{std::distance(log_events.begin(), first_greater_it)};
    
        return NullableLogEventIdx{emscripten::val(first_greater_idx - 1)};
    }
    
  • File: src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
    Lines: 83-91
    Code:

    void UnstructuredIrStreamReader::filter_log_events(
            LogLevelFilterTsType const& log_level_filter,
            [[maybe_unused]] std::string const& kql_filter
    ) {
        SPDLOG_WARN(
                "KQL filters aren't supported for unstructured IR streams, so they're being ignored."
        );
        generic_filter_log_events(m_filtered_log_event_map, log_level_filter, m_encoded_log_events);
    }
    
  • File: src/clp_ffi_js/ir/LogEventWithFilterData.hpp
    Lines: 53-53
    Code:

    [[nodiscard]] auto get_log_event() const -> LogEvent const& { return m_log_event; }
    
  • File: src/clp_ffi_js/ir/LogEventWithFilterData.hpp
    Lines: 55-55
    Code:

    [[nodiscard]] auto get_log_level() const -> LogLevel { return m_log_level; }
    
  • File: src/clp_ffi_js/ir/LogEventWithFilterData.hpp
    Lines: 57-57
    Code:

    [[nodiscard]] auto get_timestamp() const -> clp::ir::epoch_time_ms_t { return m_timestamp; }
    
  • File: src/clp_ffi_js/ir/LogEventWithFilterData.hpp
    Lines: 59-59
    Code:

    [[nodiscard]] auto get_utc_offset() const -> UtcOffset { return m_utc_offset; }
    
  • File: src/clp_ffi_js/ir/decoding_methods.hpp
    Lines: 34-34
    Code:

    convert_metadata_to_js_object(nlohmann::json const& metadata) -> MetadataTsType
    
  • File: src/clp_ffi_js/ir/decoding_methods.cpp
    Lines: 91-97
    Code:

    auto convert_metadata_to_js_object(nlohmann::json const& metadata) -> MetadataTsType {
        auto const metadata_str{dump_json_with_replace(metadata)};
        auto const metadata_obj{
                emscripten::val::global("JSON").call<emscripten::val>("parse", metadata_str)
        };
        return MetadataTsType{metadata_obj};
    }
    
  • File: src/clp_ffi_js/utils.hpp
    Lines: 17-17
    Code:

    dump_json_with_replace(nlohmann::json const& json_obj) -> std::string
    
  • File: src/clp_ffi_js/utils.cpp
    Lines: 6-8
    Code:

    [[nodiscard]] auto dump_json_with_replace(nlohmann::json const& json_obj) -> std::string {
        return json_obj.dump(-1, ' ', false, nlohmann::json::error_handler_t::replace);
    }
    

🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 21

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 00382bc and dedb758.

📒 Files selected for processing (11)
  • CMakeLists.txt (4 hunks)
  • src/clp_ffi_js/ir/StreamReader.hpp (1 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (1 hunks)
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (1 hunks)
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (1 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (1 hunks)
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1 hunks)
  • src/clp_ffi_js/ir/query_methods.cpp (1 hunks)
  • src/clp_ffi_js/ir/query_methods.hpp (1 hunks)
  • taskfiles/deps.yaml (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit Configuration File

  • Prefer false == <expression> rather than !<expression>.

Files:

  • src/clp_ffi_js/ir/query_methods.hpp
  • src/clp_ffi_js/ir/StreamReader.hpp
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
  • src/clp_ffi_js/ir/query_methods.cpp
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/StructuredIrStreamReader.hpp
🧠 Learnings (4)
📚 Learning: 2024-11-04T19:35:32.602Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

Applied to files:

  • src/clp_ffi_js/ir/StreamReader.hpp
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
  • src/clp_ffi_js/ir/query_methods.cpp
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/StructuredIrStreamReader.hpp
📚 Learning: 2024-12-09T20:27:28.454Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#35
File: src/clp_ffi_js/ir/utils.hpp:157-157
Timestamp: 2024-12-09T20:27:28.454Z
Learning: In `src/clp_ffi_js/ir/utils.hpp`, when `log_event_idx` is of type `size_t`, avoid suggesting changes to expressions like `log_event_idx + 1` for potential integer overflow concerns.

Applied to files:

  • src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
  • src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/StructuredIrStreamReader.hpp
📚 Learning: 2024-11-07T08:38:39.882Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#30
File: src/clp_ffi_js/ir/StreamReader.cpp:179-179
Timestamp: 2024-11-07T08:38:39.882Z
Learning: The function `clp::ffi::ir_stream::validate_protocol_version` used in `src/clp_ffi_js/ir/StreamReader.cpp` is imported from the CLP upstream library and may not be present in the local codebase.

Applied to files:

  • CMakeLists.txt
📚 Learning: 2024-10-05T19:27:37.240Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#11
File: package.json:24-33
Timestamp: 2024-10-05T19:27:37.240Z
Learning: In the `clp-ffi-js` project, the files in the `/dist` directory are generated during the build process and are not checked into source control, so their absence before compilation is expected.

Applied to files:

  • CMakeLists.txt
🧬 Code Graph Analysis (9)
src/clp_ffi_js/ir/query_methods.hpp (1)
src/clp_ffi_js/ir/query_methods.cpp (6)
  • nodiscard (24-29)
  • nodiscard (31-36)
  • nodiscard (38-44)
  • nodiscard (46-48)
  • nodiscard (50-52)
  • nodiscard (66-136)
src/clp_ffi_js/ir/StreamReader.hpp (2)
src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (1)
  • log_level_filter (66-69)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)
  • log_level_filter (61-64)
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (1)
src/clp_ffi_js/ir/query_methods.cpp (6)
  • nodiscard (24-29)
  • nodiscard (31-36)
  • nodiscard (38-44)
  • nodiscard (46-48)
  • nodiscard (50-52)
  • nodiscard (66-136)
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (2)
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (5)
  • log_event (109-110)
  • log_event (153-153)
  • log_event (163-164)
  • log_event (174-174)
  • maybe_unused (118-121)
src/clp_ffi_js/ir/query_methods.cpp (3)
  • maybe_unused (25-26)
  • maybe_unused (31-34)
  • maybe_unused (38-42)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (2)
src/clp_ffi_js/ir/StreamReader.hpp (1)
  • log_level_filter (108-108)
src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (1)
  • log_level_filter (66-69)
src/clp_ffi_js/ir/query_methods.cpp (3)
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (8)
  • nodiscard (66-66)
  • maybe_unused (118-121)
  • log_event (109-110)
  • log_event (153-153)
  • log_event (163-164)
  • log_event (174-174)
  • is_auto_generated (131-135)
  • schema_tree (74-77)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (2)
  • create (35-65)
  • create (35-38)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2)
  • create (83-126)
  • create (83-87)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (4)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (2)
  • filter_log_events (83-91)
  • filter_log_events (83-86)
src/clp_ffi_js/ir/StreamReader.hpp (1)
  • log_level_filter (108-108)
src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (1)
  • log_level_filter (66-69)
src/clp_ffi_js/ir/query_methods.hpp (1)
  • query_index (10-11)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (4)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2)
  • filter_log_events (144-196)
  • filter_log_events (144-147)
src/clp_ffi_js/ir/StreamReader.hpp (1)
  • log_level_filter (108-108)
src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (1)
  • log_level_filter (66-69)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)
  • log_level_filter (61-64)
src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (2)
src/clp_ffi_js/ir/StreamReader.hpp (1)
  • log_level_filter (108-108)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)
  • log_level_filter (61-64)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: lint (ubuntu-latest)
  • GitHub Check: lint (macos-latest)
🔇 Additional comments (12)
taskfiles/deps.yaml (4)

19-26: New deps correctly wired into download-all

antlr-runtime, date, and simdjson are included in the aggregate task. Looks good.


89-105: Howard Hinnant date v3.0.1 pin looks fine

Pinned to a known-good version with checksum. No issues spotted.


145-161: Checksum Verified for simdjson v3.13.0

The downloaded archive’s SHA-256 hash matches the pinned value (07a1bb3587aac18fd6a10a83fe4ab09f1100ab39f0cb73baea1317826b9f9e0d), so the dependency pin is correct.
No further changes are required.


81-84: Manual verification required: CLP tarball SHA256 and KQL search components

Automated checks in this environment were inconclusive. Please manually confirm the following:

  • The SHA256 checksum of the tarball at
    https://github.com/y-scope/clp/archive/6a00a87.tar.gz
    matches
    00fa3880ac74c00c453832d51f9a533533e59f89b94159f05ad4c767225178f0.

  • The unpacked archive contains the search/...kql files required by the clp_s::search (KQL) component.

src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (1)

66-69: filter_log_events signature consistently updated across codebase

Verified that the two-argument signature is applied uniformly in the base class, all derived readers, and the JS binding—no one-argument variants remain.

• src/clp_ffi_js/ir/StreamReader.hpp: abstract method declaration uses
filter_log_events(LogLevelFilterTsType const& log_level_filter, std::string const& kql_filter)
• src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp / UnstructuredIrStreamReader.cpp: override and implementation updated to the two-arg form (with [[maybe_unused]] on the second parameter)
• src/clp_ffi_js/ir/StructuredIrStreamReader.hpp / StructuredIrStreamReader.cpp: override and implementation updated to the two-arg form
• src/clp_ffi_js/ir/StreamReader.cpp: the JS binding registers filterLogEvents against the updated C++ method

All grep searches for a lone filter_log_events(...) without a comma returned no hits, confirming there are no lingering one-argument declarations or calls.

CMakeLists.txt (4)

105-105: Add query_methods.cpp to build: LGTM

Source is correctly added to CLP_FFI_JS_SRC_MAIN. No issues.


153-165: Unable to locate clp_s sources – please manually verify search toggles
I couldn’t find a local components/core/src/clp_s directory in this PR (it’s likely pulled in via the CLP_FFI_JS_CLP_SOURCE_DIRECTORY variable), so I’m unable to confirm whether
clp_s::search::ast and clp_s::search::kql targets are correctly built when CLP_BUILD_CLP_S_SEARCH is set to OFF while their individual toggles are ON.

• Please open the external components/core/src/clp_s/CMakeLists.txt and ensure that each add_library(clp_s::search::ast …) and add_library(clp_s::search::kql …) call is gated by its own toggle (e.g. CLP_BUILD_CLP_S_SEARCH_AST, CLP_BUILD_CLP_S_SEARCH_KQL) rather than the umbrella CLP_BUILD_CLP_S_SEARCH.
• Confirm that disabling the overall search toggle does not inadvertently disable AST or KQL support when those sub-toggles are explicitly enabled.


117-119: Missing core search source files in CLP_FFI_JS_CLP_SOURCE_DIRECTORY

I couldn’t locate the following files under the expected upstream paths, which will break CI if they’re not present:

• components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp
• components/core/src/clp/ffi/ir_stream/search/utils.cpp
• components/core/src/clp/ffi/ir_stream/utils.cpp

Only a single utils.cpp exists at src/clp_ffi_js/utils.cpp, so please verify:

  • That CLP_FFI_JS_CLP_SOURCE_DIRECTORY is set correctly
  • The pinned CLP commit indeed contains these files at those locations
  • CMakeLists.txt paths are updated to match the actual directory layout

150-152: Verify string_utils subdirectory and target name
I wasn’t able to locate a string_utils folder or its CMakeLists.txt under the expected path, nor any add_library(string_utils …) or add_library(clp::string_utils …) declarations. Please manually confirm:

  • That ${CLP_FFI_JS_CLP_SOURCE_DIRECTORY}/components/core/src/clp/string_utils exists and contains a CMakeLists.txt.
  • That this CMakeLists defines a target exactly named string_utils (or provides an alias) so downstream target_link_libraries(string_utils) calls will succeed.
  • If the directory name or target name differs (e.g. string-util or StringUtils), adjust it to match the expected string_utils, or add an explicit alias via add_library(string_utils ALIAS actual_target_name).
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2)

31-32: Include added for query methods: LGTM

Local include path is consistent with project style and resolves the new dependency.


191-195: Nice touch: drop the map when it matches all events

This avoids unnecessary filtering overhead downstream. Good.

src/clp_ffi_js/ir/query_methods.cpp (1)

112-118: Core loop looks solid

The deserialization loop follows the project’s style (i.e., false == ...) and handles errors explicitly. Aside from the incomplete-stream semantics noted above, LGTM.

Also applies to: 123-133

coderabbitai[bot]

This comment was marked as resolved.

@y-scope y-scope deleted a comment from coderabbitai bot Aug 20, 2025
coderabbitai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as off-topic.

@hoophalab hoophalab requested a review from junhaoliao August 20, 2025 22:27
Copy link
Member

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i understand this repo lacks unit / integration test which makes APIs sample usages a bit unintuitive to be documented. for now, let's try our best to list how some APIs can be called in the PR description (usually i do that in the "Validation" section since i need to write what i test and what the expectation is anyways).

without reading the code, i tried but couldn't really figure out how some log can be validly filtered for. here're my attempts with example.clp.zst.zip:

import ModuleInit from "./cmake-build-debug/ClpFfiJs-node.js"
import fs from "node:fs"

const main = async () => {
    // const file = fs.readFileSync("./test-irv2.clp.zst")
    const file = fs.readFileSync("./example.clp.zst")

    console.time("perf")
    const Module = await ModuleInit()
    console.log(Module.MERGED_KV_PAIRS_AUTO_GENERATED_KEY);
    console.log(Module.MERGED_KV_PAIRS_USER_GENERATED_KEY);
    try {
        const decoder = new Module.ClpStreamReader(
            new Uint8Array(file),
            {timestampKey: {isAutoGenerated: true, parts: ["timestamp", "unix_millisecs"]},
                logLevelKey: {parts: ["level", "name"], isAutoGenerated: true},
                utcOffsetKey: {parts: ["timestamp", "utc_offset_secs"], isAutoGenerated: true} })
        console.log(decoder.getMetadata())
        console.log("type:", decoder.getIrStreamType() === Module.IrStreamType.STRUCTURED,
            decoder.getIrStreamType() === Module.IrStreamType.UNSTRUCTURED)
        const numEvents = decoder.deserializeStream()
        console.log(numEvents)
        const results = decoder.decodeRange(0, numEvents, false)
        console.log(results)

        // Try different filter approaches
        console.log("Trying different filter approaches:");
        
        // Approach 0
        // decoder.filterLogEvents([5], "message : This is an example message")
        
        // Approach 1: Wildcard filter on message
        decoder.filterLogEvents([5], "message: *")
        console.log("Wildcard filter [5], message: *:", decoder.getFilteredLogEventMap())
        
        // Approach 2: Specific message filter
        decoder.filterLogEvents([5], 'message: "This is an example message"')
        console.log("Specific message filter [5], message: \"This is an example message\":", decoder.getFilteredLogEventMap())
        
        // Approach 3: Try with different log level
        decoder.filterLogEvents([], "message: *")
        console.log("No log level filter, message: *:", decoder.getFilteredLogEventMap())
        
        // Approach 4: Try with log level 5 (ERROR) but different syntax
        decoder.filterLogEvents([5], "*")
        console.log("Wildcard KQL with ERROR log level:", decoder.getFilteredLogEventMap())
        
        // Approach 5: Try filtering on auto-generated fields
        decoder.filterLogEvents([5], "level.name: ERROR")
        console.log("Filter on level.name: ERROR:", decoder.getFilteredLogEventMap())
    } catch (e) {
        console.error("Exception caught:", e.stack)
    }
    console.timeEnd("perf")
}

void main()

none of them gave me results and "Approach 0" (when uncommented) even throws an "antlr4::InputMismatchException":

Trying different filter approaches:
Wildcard filter [5], message: *: null
Specific message filter [5], message: "This is an example message": null
No log level filter, message: *: []
Wildcard KQL with ERROR log level: null
Filter on level.name: ERROR: []

Copy link
Member

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very cool feature. let's see the comments and also update the validation section in the PR description as discussed in another thread

CMakeLists.txt Outdated
Comment on lines 182 to 189
PRIVATE
antlr4_static
clp_s::search::ast
clp_s::search::kql
embind
simdjson::simdjson
spdlog::spdlog
string_utils
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think it will help to create a list like "CLP_FFI_JS_CLP_S_LINK_LIBS" and reference it here so that we can tell why some lib is needed (which helps clean up / reorganization in the future

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry this is my fault. I forget to remove those link libraries. They were only used for debugging. We only need to list the libraries that we directly use.

Suggested change
PRIVATE
antlr4_static
clp_s::search::ast
clp_s::search::kql
embind
simdjson::simdjson
spdlog::spdlog
string_utils
PRIVATE
clp_s::search::kql
embind
spdlog::spdlog

*
* @param log_level_filter Array of selected log levels
* @param kql_filter KQL expression to filter structured IR. Use an empty string to disable. For
* unstructured IR, the filter is ignored (a warning is logged).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we improve this a bit to be more explicit so API users know what to pass for unstructured IR?

e.g., a warning is logged if ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quick question: at some point i think we were looking into marking some arguments as optional (supply a default value or leave as undefined) with Emscripten but i vaguely remember we couldn't due to some limitations in Emscripten. is it still the case in 2025?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! I didn't think about this before. I overload this function. This way we don't need to bump minor version again.

auto& reader{m_stream_reader_data_context->get_reader()};
reader.seek_from_begin(0);
auto indexes{query_log_event_indices(reader, kql_filter)};
m_filtered_log_event_map = std::make_optional(std::move(indexes));
Copy link
Member

@junhaoliao junhaoliao Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't really need to wrap it with make_optional right? (the compiler should know how to do the transformation)

m_filtered_log_event_map = query_log_event_indices(reader, kql_filter);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we forgot to push this change?

Copy link
Contributor Author

@hoophalab hoophalab Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I forgot to rename indexes, but make_optional is removed

Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sth may not strongly related to this PR: We've added projection support within the query handler. The timestamp and log level key search can now be replaced by a projection.

@hoophalab
Copy link
Contributor Author

i understand this repo lacks unit / integration test which makes APIs sample usages a bit unintuitive to be documented. for now, let's try our best to list how some APIs can be called in the PR description (usually i do that in the "Validation" section since i need to write what i test and what the expectation is anyways).
Trying different filter approaches:
Wildcard filter [5], message: *: null
Specific message filter [5], message: "This is an example message": null
No log level filter, message: *: []
Wildcard KQL with ERROR log level: null
Filter on level.name: ERROR: []

This null behavior wasn't created by me, and I've been annoyed by this existing behavior of clp-ffi-js so many times... Basically, get_filtered_log_event_map could return null if all log events matches. I updated the doc string.

    /**
     * @return The filtered log events map.
     * This is a sorted list of log event indices that match the filter.
     * If all log events match the filter, it returns `null` or a vector of all log events.
     */
    [[nodiscard]] virtual auto get_filtered_log_event_map() const -> FilteredLogEventMapTsType = 0;

I've run this script and the output is exactly as expected.

Wildcard filter [5], message: *: null all events match -- null
Specific message filter [5], message: "This is an example message": null all events match -- null
> No log level filter, message: *: [] no event matches (because you only want to see the log level in the list) -- []
Wildcard KQL with ERROR log level: null all event match (because all log events are ERROR in this file) -- null
> Filter on level.name: ERROR: [] no event match (because there are no user generated level.name key, and @level.name: ERROR will return null)-- []

As for the exception when kql is invalid, we probably should print an error message. This can be done by catching an exception in the log viewer.

@hoophalab hoophalab requested a review from junhaoliao August 21, 2025 19:01
@hoophalab
Copy link
Contributor Author

hoophalab commented Aug 21, 2025

sth may not strongly related to this PR: We've added projection support within the query handler. The timestamp and log level key search can now be replaced by a projection.

Thanks for pointing this out! We are actually already planning this in our offline discussion, and this issue tracks it.

@junhaoliao
Copy link
Member

sth may not strongly related to this PR: We've added projection support within the query handler. The timestamp and log level key search can now be replaced by a projection.

sounds good. let's do the refactoring in a future PR because we need to deliver the KQL query UI in the log-viewer by tomorrow. @coderabbitai create an issue for this

@hoophalab hoophalab changed the title feat!: Add KQL filtering support. feat: Add KQL filtering support. Aug 21, 2025
Copy link

coderabbitai bot commented Aug 21, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

@junhaoliao I can see that this refactoring is already tracked in issue #105: "Refactor timestamp and log level key search to use projection" (#105).

The issue description matches exactly what hoophalab mentioned about using projection support within the query handler to replace the timestamp and log level key search. Since you have the KQL query UI delivery deadline tomorrow, this existing issue can serve as the tracking item for the future refactoring PR.


🏁 Script executed:

#!/bin/bash
# Check if issue #105 exists and get its details
gh issue view 105 --json title,body,state,url

Length of output: 1783


@junhaoliao I see that issue #105 is actually about merging log level filtering with KQL filtering by extending query expressions, which is different from the projection support refactoring mentioned in this discussion. Let me create a new issue specifically for the projection support refactoring.


🏁 Script executed:

#!/bin/bash
# Create a new issue for the projection support refactoring
gh issue create \
  --title "Refactor timestamp and log level key search to use projection support" \
  --body "## Enhancement Request

As mentioned by @hoophalab in [PR #104](https://github.com/y-scope/clp-ffi-js/pull/104), projection support has been added within the query handler. The current timestamp and log level key search implementation can now be replaced by using this projection functionality.

## Background

Currently, the IR stream reader performs timestamp and log level key searches separately. With the newly added projection support in the query handler, this can be refactored to use a more unified approach.

## Proposed Enhancement

Refactor the existing timestamp and log level key search implementation to leverage the projection support within the query handler, which should:

- **Improve code consistency**: Use the same projection mechanism across the codebase
- **Reduce code duplication**: Eliminate separate search logic for timestamps and log levels
- **Enhance maintainability**: Centralize projection-related functionality

## Context

This refactoring was discussed during the review of PR #104 for KQL filtering support. The KQL query UI delivery timeline requires this refactoring to be done in a future PR.

---

**Requested by:** @junhaoliao  
**Context:** [PR #104](https://github.com/y-scope/clp-ffi-js/pull/104)  
**Reference:** [Comment Thread](https://github.com/y-scope/clp-ffi-js/pull/104#discussion_r2290157891)" \
  --assignee junhaoliao

Length of output: 1525


@junhaoliao I've created issue #106: "Refactor timestamp and log level key search to use projection support" (#106) and assigned it to you.

This issue specifically tracks the refactoring to use projection support within the query handler for timestamp and log level key search, as mentioned by @hoophalab. It's separate from issue #105, which focuses on merging log level filtering with KQL filtering by extending query expressions.

Good luck with the KQL query UI delivery tomorrow!


🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

coderabbitai[bot]

This comment was marked as duplicate.

Copy link
Member

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LinZhihao-723 would you have time to do a quick review on this?

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (1)

48-55: Fix duplicate/inaccurate Doxygen @return lines

The comment repeats and mentions “reader option” instead of “filter option”. Streamline it.

 /**
  * @param filter_option The JavaScript object representing a filter option.
  * @param leaf_node_type The type of the leaf node for the schema tree full branch.
- * @return A schema tree full branch constructed based on the provided filter option and leaf node
- * type.
- * @return A schema tree full branch constructed based on the provided reader option.
- * @return std::nullopt if `option` is `null`.
+ * @return A schema tree full branch constructed based on the provided filter option and leaf node type,
+ *         or std::nullopt if `filter_option` is null/undefined.
  */
♻️ Duplicate comments (6)
CMakeLists.txt (2)

172-179: Vendor deps: add top-level guard and EXCLUDE_FROM_ALL to prevent duplicate-build conflicts

Unconditionally adding third-party deps (ANTLR, date, simdjson, spdlog) can conflict when this project is included as a subproject and those deps are already present upstream, reproducing the Ninja “multiple rules generate …” issue mentioned earlier. Guard these and mark them EXCLUDE_FROM_ALL to avoid unintended builds.

Apply this diff:

-set(ANTLR_BUILD_SHARED OFF)
-set(ANTLR_BUILD_STATIC ON)
-add_subdirectory(${CLP_FFI_JS_ANTLR_RUNTIME_SOURCE_DIRECTORY})
-
-add_subdirectory(${CLP_FFI_JS_DATE_SOURCE_DIRECTORY})
-add_subdirectory(${CLP_FFI_JS_SIMDJSON_SOURCE_DIRECTORY})
-add_subdirectory(${CLP_FFI_JS_SPDLOG_SOURCE_DIRECTORY})
+set(ANTLR_BUILD_SHARED OFF)
+set(ANTLR_BUILD_STATIC ON)
+if(CMAKE_SOURCE_DIR STREQUAL PROJECT_SOURCE_DIR)
+  # Only vendor these deps at the top level to avoid clashes when used as a subproject.
+  add_subdirectory(${CLP_FFI_JS_ANTLR_RUNTIME_SOURCE_DIRECTORY} EXCLUDE_FROM_ALL)
+  add_subdirectory(${CLP_FFI_JS_DATE_SOURCE_DIRECTORY}          EXCLUDE_FROM_ALL)
+  add_subdirectory(${CLP_FFI_JS_SIMDJSON_SOURCE_DIRECTORY}      EXCLUDE_FROM_ALL)
+  add_subdirectory(${CLP_FFI_JS_SPDLOG_SOURCE_DIRECTORY}        EXCLUDE_FROM_ALL)
+endif()

190-196: Minimise link surface: drop explicit AST if transitive via KQL

If clp_s::search::kql already links against clp_s::search::ast transitively (likely), directly linking the AST target is redundant and slightly increases surface area for ODR/link churn. Prefer relying on transitive deps.

Apply this diff:

 target_link_libraries(${CLP_FFI_JS_BIN_NAME}
     PRIVATE
-    clp_s::search::ast
     clp_s::search::kql
     embind
     spdlog::spdlog
 )

If KQL does not export AST transitively in your pinned CLP version, ignore this change — but consider requesting that upstream exposes it as a transitive dependency.

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (4)

12-12: Remove stray <type_utils.hpp> include (redundant and may fail on some setups)

This header likely doesn’t exist in the include paths; <clp/type_utils.hpp> (Line 20) already provides enum_to_underlying_type.

Apply this diff:

-#include <type_utils.hpp>

154-159: KQL path: guard null data context and treat whitespace-only KQL as empty

Avoid dereferencing a null m_stream_reader_data_context and skip queries for whitespace-only KQL.

-    if (false == kql_filter.empty()) {
-        auto& reader{m_stream_reader_data_context->get_reader()};
-        reader.seek_from_begin(0);
-        auto indices{query_log_event_indices(reader, kql_filter)};
-        m_filtered_log_event_map = std::move(indices);
-    }
+    auto const has_non_space =
+            (std::string::npos != kql_filter.find_first_not_of(" \t\n\r\f\v"));
+    if (has_non_space) {
+        if (nullptr == m_stream_reader_data_context) {
+            SPDLOG_WARN("KQL provided but stream reader data context is not available; skipping.");
+        } else {
+            auto& reader{m_stream_reader_data_context->get_reader()};
+            reader.seek_from_begin(0);
+            auto indices{query_log_event_indices(reader, kql_filter)};
+            m_filtered_log_event_map = std::move(indices);
+        }
+    }

162-167: Reserve capacity for filtered indices to avoid reallocations

Minor perf improvement; reserve based on the expected source set size.

-        std::vector<size_t> filtered_log_event_map;
+        std::vector<size_t> filtered_log_event_map;
+        filtered_log_event_map.reserve(
+                m_filtered_log_event_map.has_value()
+                        ? m_filtered_log_event_map->size()
+                        : m_deserialized_log_events->size());
 
-        auto const filter_levels{
+        auto const filter_levels{
                 emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter)
         };

164-176: Optional: use an unordered_set for O(1) level membership checks

Given the small number of log levels this is micro, but if the level list grows, set lookup avoids per-event linear scans.

-        auto const filter_levels{
-                emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter)
-        };
+        auto const filter_levels_vec{
+                emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter)
+        };
+        std::unordered_set<std::underlying_type_t<LogLevel>> const filter_levels{
+                filter_levels_vec.begin(), filter_levels_vec.end()
+        };
@@
-            if (std::ranges::find(
-                        filter_levels,
-                        clp::enum_to_underlying_type(log_event.get_log_level())
-                )
-                != filter_levels.end())
+            if (filter_levels.find(
+                        clp::enum_to_underlying_type(log_event.get_log_level()))
+                != filter_levels.end())
             {
                 filtered_log_event_map.emplace_back(log_event_idx);
             }

Add the missing include near the other standard headers:

+#include <unordered_set>
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d2eef8d and 587c25f.

📒 Files selected for processing (2)
  • CMakeLists.txt (4 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
🧠 Learnings (11)
📓 Common learnings
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#104
File: src/clp_ffi_js/ir/query_methods.cpp:132-135
Timestamp: 2025-08-21T07:18:06.881Z
Learning: In query methods for IR streams (like query_log_event_indices in src/clp_ffi_js/ir/query_methods.cpp), when encountering std::errc::result_out_of_range indicating an incomplete IR stream, log a warning instead of an error since partial results are returned and can still be useful.
📚 Learning: 2024-11-04T19:35:32.602Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

Applied to files:

  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
📚 Learning: 2025-08-21T07:18:06.881Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#104
File: src/clp_ffi_js/ir/query_methods.cpp:132-135
Timestamp: 2025-08-21T07:18:06.881Z
Learning: In query methods for IR streams (like query_log_event_indices in src/clp_ffi_js/ir/query_methods.cpp), when encountering std::errc::result_out_of_range indicating an incomplete IR stream, log a warning instead of an error since partial results are returned and can still be useful.

Applied to files:

  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
📚 Learning: 2024-12-09T20:27:28.454Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#35
File: src/clp_ffi_js/ir/utils.hpp:157-157
Timestamp: 2024-12-09T20:27:28.454Z
Learning: In `src/clp_ffi_js/ir/utils.hpp`, when `log_event_idx` is of type `size_t`, avoid suggesting changes to expressions like `log_event_idx + 1` for potential integer overflow concerns.

Applied to files:

  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
📚 Learning: 2024-11-07T08:35:20.376Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#30
File: src/clp_ffi_js/ir/StructuredIrStreamReader.hpp:187-190
Timestamp: 2024-11-07T08:35:20.376Z
Learning: In the `StructuredIrStreamReader` constructor of `src/clp_ffi_js/ir/StructuredIrStreamReader.hpp`, the `stream_reader_data_context` parameter is already wrapped in a `std::make_unique`, ensuring it matches the member variable `m_stream_reader_data_context` of type `std::unique_ptr<StreamReaderDataContext<StructuredIrDeserializer>>`.

Applied to files:

  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
📚 Learning: 2024-11-04T19:55:27.398Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:162-168
Timestamp: 2024-11-04T19:55:27.398Z
Learning: In the `IrStreamReader::decode_range` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, message decoding failures are not expected unless the file is corrupted or the deserializer in CLP core does not handle edge cases, so it's appropriate to throw an exception upon message decoding failure.

Applied to files:

  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
📚 Learning: 2024-11-07T08:38:39.882Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#30
File: src/clp_ffi_js/ir/StreamReader.cpp:179-179
Timestamp: 2024-11-07T08:38:39.882Z
Learning: The function `clp::ffi::ir_stream::validate_protocol_version` used in `src/clp_ffi_js/ir/StreamReader.cpp` is imported from the CLP upstream library and may not be present in the local codebase.

Applied to files:

  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
  • CMakeLists.txt
📚 Learning: 2024-11-10T02:14:19.411Z
Learnt from: davemarco
PR: y-scope/clp-ffi-js#32
File: src/clp_ffi_js/ir/LogEventWithFilterData.hpp:0-0
Timestamp: 2024-11-10T02:14:19.411Z
Learning: In the context of `LogEventWithFilterData`, it is acceptable for `log_level` to be `LogLevel::NONE`; additional validation against `LogLevel::NONE` is not necessary.

Applied to files:

  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
📚 Learning: 2024-11-04T19:37:27.204Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:85-91
Timestamp: 2024-11-04T19:37:27.204Z
Learning: In the `IrStreamReader::deserialize_stream` method, we handle missing End-of-Stream packets gracefully by logging an error and breaking the loop when encountering `std::errc::result_out_of_range`, rather than throwing an exception.

Applied to files:

  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
📚 Learning: 2024-11-14T15:31:45.926Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#35
File: src/clp_ffi_js/ir/StructuredIrStreamReader.cpp:50-51
Timestamp: 2024-11-14T15:31:45.926Z
Learning: Ensure that `clp-ffi-js` supports JavaScript usage without relying on the TypeScript definition file, and keep track of related TODOs in a GitHub issue.

Applied to files:

  • CMakeLists.txt
📚 Learning: 2024-10-05T19:27:37.240Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#11
File: package.json:24-33
Timestamp: 2024-10-05T19:27:37.240Z
Learning: In the `clp-ffi-js` project, the files in the `/dist` directory are generated during the build process and are not checked into source control, so their absence before compilation is expected.

Applied to files:

  • CMakeLists.txt
🧬 Code graph analysis (1)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (6)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (2)
  • filter_log_events (83-94)
  • filter_log_events (83-86)
src/clp_ffi_js/ir/StreamReader.hpp (3)
  • log_level_filter (113-113)
  • log_level_filter (121-123)
  • log_level_filter (121-121)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)
  • log_level_filter (61-64)
src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (1)
  • log_level_filter (66-69)
src/clp_ffi_js/ir/query_methods.hpp (1)
  • query_log_event_indices (22-23)
src/clp_ffi_js/ir/query_methods.cpp (1)
  • query_log_event_indices (79-80)
🔇 Additional comments (4)
CMakeLists.txt (2)

105-105: New IR query plumbing source added — looks good

Adding src/clp_ffi_js/ir/query_methods.cpp into CLP_FFI_JS_SRC_MAIN is consistent with the new KQL feature. No build-system issues here.


117-118: Upstream IR search sources wired in — confirm dependency coverage

Including QueryHandlerImpl.cpp and search/utils.cpp from CLP core makes sense for KQL. Given these files depend on clp_s search AST/KQL, the later link lines should cover that. Please confirm these paths exist across the pinned CLP commit/branch you’re integrating to avoid brittle path drift.

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2)

3-3: Includes for ranges and enum utilities are correct

Good additions: for std::ranges::find, <clp/type_utils.hpp> for enum_to_underlying_type, and the new query_methods.hpp import.

Also applies to: 20-20, 31-31


219-220: Correct: downgrade incomplete stream from error to warning

Matches the guidance to warn on std::errc::result_out_of_range while returning partial results.

Comment on lines +195 to +199
if (m_filtered_log_event_map.has_value()
&& m_filtered_log_event_map->size() == m_deserialized_log_events->size())
{
m_filtered_log_event_map = std::nullopt;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Document “null = all events match” behaviour in the public API

This normalisation is fine; ensure the header/TS typings and README explicitly state that get_filtered_log_event_map returns null when the filtered set equals the full set.

I can update the doxygen in StructuredIrStreamReader.hpp and the TS wrapper comments to make this explicit; let me know if you want a follow-up PR.

🤖 Prompt for AI Agents
In src/clp_ffi_js/ir/StructuredIrStreamReader.cpp around lines 195-199, the code
normalizes m_filtered_log_event_map to null when the filtered set equals the
full set; update the public API docs to document this behavior: add a doxygen
note to StructuredIrStreamReader.hpp on get_filtered_log_event_map stating it
returns null/optional empty when filtered == full (i.e., "null = all events
match"), update the TypeScript wrapper typings and JSDoc/comments to match that
contract, and add a short note to the README (or relevant API docs) describing
the same semantics so callers know to treat null as "no filtering applied."

Comment on lines +242 to 249
// NOLINTBEGIN(bugprone-lambda-function-name)
SPDLOG_ERROR(
"Failed to deserialize log event to JSON: {}:{}",
error_code.category().name(),
error_code.message()
);
// NOLINTEND(bugprone-lambda-function-name)
return std::string{cEmptyJsonStr};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Localise the lint suppression

Use NOLINTNEXTLINE to target the SPDLOG call, keeping the suppression as narrow as possible.

-            // NOLINTBEGIN(bugprone-lambda-function-name)
-            SPDLOG_ERROR(
+            // NOLINTNEXTLINE(bugprone-lambda-function-name)
+            SPDLOG_ERROR(
                     "Failed to deserialize log event to JSON: {}:{}",
                     error_code.category().name(),
                     error_code.message()
             );
-            // NOLINTEND(bugprone-lambda-function-name)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// NOLINTBEGIN(bugprone-lambda-function-name)
SPDLOG_ERROR(
"Failed to deserialize log event to JSON: {}:{}",
error_code.category().name(),
error_code.message()
);
// NOLINTEND(bugprone-lambda-function-name)
return std::string{cEmptyJsonStr};
// NOLINTNEXTLINE(bugprone-lambda-function-name)
SPDLOG_ERROR(
"Failed to deserialize log event to JSON: {}:{}",
error_code.category().name(),
error_code.message()
);
return std::string{cEmptyJsonStr};
🤖 Prompt for AI Agents
In src/clp_ffi_js/ir/StructuredIrStreamReader.cpp around lines 242 to 249, the
current NOLINTBEGIN/END block is too broad; replace it by removing the
NOLINTBEGIN(bugprone-lambda-function-name) and
NOLINTEND(bugprone-lambda-function-name) and place a single
NOLINTNEXTLINE(bugprone-lambda-function-name) immediately above the SPDLOG_ERROR
call so the lint suppression only applies to that call; keep the SPDLOG_ERROR
invocation and the subsequent return unchanged.

Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly lgtm. Some minor comments.

LogLevelFilterTsType const& log_level_filter,
std::string const& kql_filter
) {
m_filtered_log_event_map = std::nullopt;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name xxx_map is a bit confusing. We usually use _map for a hash map (std::unordered_map) or a tree-based map. How about filtered_log_event_indices?

Copy link
Contributor Author

@hoophalab hoophalab Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been feeling the same way since day 1... but this is used in ffi-js and log viewer. I guess the reasoning could be that this is a map from filtered_log_index to unfiltered_index. @coderabbitai can you create an issue on this refacctoring?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

I'll create an issue for this refactoring request. The naming is indeed confusing since _map typically indicates a hash map or tree-based map structure.


🏁 Script executed:

#!/bin/bash

# Create a GitHub issue for refactoring the confusing naming convention
gh issue create \
  --title "Refactor confusing 'm_filtered_log_event_map' naming to 'filtered_log_event_indices'" \
  --body "## Problem

The current naming convention using \`m_filtered_log_event_map\` is confusing because:
- The \`_map\` suffix typically indicates a hash map (\`std::unordered_map\`) or tree-based map
- The actual data structure appears to be a vector/array of indices, not a key-value mapping
- This naming pattern is used across both ffi-js and log viewer codebases

## Proposed Solution

Rename \`m_filtered_log_event_map\` to \`filtered_log_event_indices\` (or similar) to better reflect that it contains a collection of indices rather than a mapping structure.

## Context

This naming has been identified as confusing by multiple developers and would benefit from refactoring to improve code clarity and follow better naming conventions.

## References

- Original discussion: https://github.com/y-scope/clp-ffi-js/pull/104#discussion_r2291969811
- Raised by: @LinZhihao-723 and @hoophalab

## Impact

This is a cross-cutting refactoring that would require coordinated changes across:
- ffi-js codebase  
- log viewer codebase

The change would improve code maintainability and reduce developer confusion." \
  --assignee hoophalab

Length of output: 1299


@hoophalab I've created issue #107 to track this refactoring: #107

You're absolutely right that this naming has been confusing. I can see from the code that m_filtered_log_event_map is actually a std::vector<size_t> containing indices, not a map data structure. The issue is now assigned to you and includes references to this discussion and the original concern raised by @LinZhihao-723.

The refactoring will indeed require coordinated changes across both ffi-js and log viewer codebases, but it will significantly improve code clarity for future developers.

  (\   /)
 ( ._. )
o_(")(")  *thinking about better names*

🧬 Code Graph Analysis Results
  • src/clp_ffi_js/ir/StreamReader.cpp | Summary (generic_decode_range, lines 246-362)

    • Summary: This function decodes a range of log events. When use_filter is true, it uses the provided filtered_log_event_map to map the i-th entry in the output to the corresponding unfiltered log event index (log_event_idx = filtered_log_event_map->at(i)); when use_filter is false, it uses log_event_idx = i. It builds a JS array of decoded log events by calling log_event_to_string on each log_event, and emits entries via an EM_ASM block. It handles invalid ranges and missing filtered maps by returning a null value in DecodedResultsTsType.
    • Location: src/clp_ffi_js/ir/StreamReader.cpp, 246-362
  • src/clp_ffi_js/ir/query_methods.hpp | Snippet

    • Signature:
    • collect_matched_log_event_indices(clp::ReaderInterface& reader, std::string const& query_string)
      -> std::vector<size_t>
  • src/clp_ffi_js/ir/StreamReaderDataContext.hpp | Snippet

    • get_deserializer():
    • [[nodiscard]] auto get_deserializer() -> Deserializer& { return m_deserializer; }
  • src/clp_ffi_js/ir/StreamReaderDataContext.hpp | Snippet

    • get_reader():
    • [[nodiscard]] auto get_reader() -> clp::ReaderInterface& { return *m_reader; }
  • src/clp_ffi_js/utils.hpp | Snippet

    • dump_json_with_replace signature:
    • dump_json_with_replace(nlohmann::json const& json_obj) -> std::string
  • src/clp_ffi_js/utils.cpp | Snippet

    • dump_json_with_replace implementation:
    [[nodiscard]] auto dump_json_with_replace(nlohmann::json const& json_obj) -> std::string {
        return json_obj.dump(-1, ' ', false, nlohmann::json::error_handler_t::replace);
    }
    
  • src/clp_ffi_js/ir/LogEventWithFilterData.hpp | Snippet

    • get_log_event():
    • [[nodiscard]] auto get_log_event() const -> LogEvent const& { return m_log_event; }
  • src/clp_ffi_js/ir/LogEventWithFilterData.hpp | Snippet

    • get_log_level():
    • [[nodiscard]] auto get_log_level() const -> LogLevel { return m_log_level; }
  • src/clp_ffi_js/ir/LogEventWithFilterData.hpp | Snippet

    • get_timestamp():
    • [[nodiscard]] auto get_timestamp() const -> clp::ir::epoch_time_ms_t { return m_timestamp; }
  • src/clp_ffi_js/ir/LogEventWithFilterData.hpp | Snippet

    • get_utc_offset():
    • [[nodiscard]] auto get_utc_offset() const -> UtcOffset { return m_utc_offset; }
  • src/clp_ffi_js/ir/decoding_methods.cpp | Snippet

    • convert_metadata_to_js_object:
    auto convert_metadata_to_js_object(nlohmann::json const& metadata) -> MetadataTsType {
        auto const metadata_str{dump_json_with_replace(metadata)};
        auto const metadata_obj{
                emscripten::val::global("JSON").call<emscripten::val>("parse", metadata_str)
        };
        return MetadataTsType{metadata_obj};
    }
    
    • Location: src/clp_ffi_js/ir/decoding_methods.cpp, 91-97
  • src/clp_ffi_js/ir/decoding_methods.hpp | Snippet

    • Signature:
    • convert_metadata_to_js_object(nlohmann::json const& metadata) -> MetadataTsType
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp | Snippet

    • is_auto_generated():
    • [[nodiscard]] auto is_auto_generated() const -> bool { return m_is_auto_generated; }
  • src/clp_ffi_js/ir/StructuredIrStreamReader.hpp | Snippet

    • get_ir_stream_type():
    • [[nodiscard]] auto get_ir_stream_type() const -> StreamType override {
      return StreamType::Structured;
      }

🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#104
File: src/clp_ffi_js/ir/query_methods.cpp:132-135
Timestamp: 2025-08-21T07:18:06.881Z
Learning: In query methods for IR streams (like query_log_event_indices in src/clp_ffi_js/ir/query_methods.cpp), when encountering std::errc::result_out_of_range indicating an incomplete IR stream, log a warning instead of an error since partial results are returned and can still be useful.

Comment on lines +148 to +151
void StructuredIrStreamReader::filter_log_events(
LogLevelFilterTsType const& log_level_filter,
std::string const& kql_filter
) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with the current implementation. There can be a more efficient way: we can embed this log-level filtering using a lambda into the query method. But that might be over-engineering at this point.

Copy link
Contributor Author

@hoophalab hoophalab Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are probably going to extend the kql ast by ourself
(user_kql) AND (loglevelkey: ERROR OR loglevelkey: WARNING OR ...)
This is tracked in #105 #106
*the log level code is actually copied from generic_filter_log_events

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (11)
src/clp_ffi_js/ir/query_methods.hpp (2)

4-9: Fix missing include and drop unused header

This header declares std::string but doesn’t include . Also, <ystdlib/containers/Array.hpp> isn’t used here.

Apply:

 #include <vector>
+#include <string>
 
 #include <clp/ReaderInterface.hpp>
-#include <ystdlib/containers/Array.hpp>

18-19: Doc typos/casing: “Query” → “query”; use straight or curly apostrophes consistently

Tighten the user-facing contract.

- * @throws ClpFfiJsException if the preamble couldn't be deserialized or the Query couldn't be
- * executed.
+ * @throws ClpFfiJsException if the preamble couldn't be deserialized or the query couldn't be
+ * executed.
src/clp_ffi_js/ir/StreamReader.hpp (2)

99-101: Clarify “null vs full-vector” invariant for callers

Be explicit that null represents “all events match” to avoid ambiguous handling downstream.

  /**
   * @return The filtered log events map.
- * This is a sorted list of log event indices that match the filter.
- * If all log events match the filter, it returns `null` or a vector of all log events.
+ * This is a sorted list of log event indices that match the filter.
+ * If all log events match the filter, this returns `null` to signal “no-op filtering”.
+ * Callers should treat `null` as equivalent to “full coverage”.
   */

107-111: Document composition (AND), failure semantics, and whitespace handling for kql_filter

Spell out how KQL composes with log_level_filter, what happens on invalid KQL, and how unstructured streams behave. Also remove the stray colon for consistency.

- * @param log_level_filter Array of selected log levels.
- * @param kql_filter: A KQL expression used to filter kv-pairs.
- * - For structured IR: the filter is applied when non-empty.
- * - For unstructured IR: the filter is always ignored, and a warning is logged when non-empty.
+ * @param log_level_filter Array of selected log levels.
+ * @param kql_filter A KQL expression applied in conjunction (logical AND) with `log_level_filter`.
+ *        Use an empty or whitespace-only string to disable KQL.
+ *        Behaviour:
+ *        - Structured IR: the filter is applied when kql_filter is non-empty (post-trim).
+ *        - Unstructured IR: the filter is ignored; when non-empty (post-trim) a warning is logged.
+ *        Failure semantics:
+ *        - Invalid or unsupported KQL causes this call to throw `ClpFfiJsException`
+ *          with `ErrorCode_Failure`. Callers should catch and surface a user-friendly message.
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (3)

220-227: Localise the lint suppression

Prefer NOLINTNEXTLINE to narrowly target the SPDLOG line.

-            // NOLINTBEGIN(bugprone-lambda-function-name)
             SPDLOG_ERROR(
                     "Failed to deserialize log event to JSON: {}:{}",
                     error_code.category().name(),
                     error_code.message()
             );
-            // NOLINTEND(bugprone-lambda-function-name)
+            // NOLINTNEXTLINE(bugprone-lambda-function-name)

151-157: KQL handling: treat whitespace-only as empty and guard data context

Avoid running a query for whitespace-only KQL; add a defensive null check before dereferencing the data context.

     m_filtered_log_event_map.reset();
 
-    if (false == kql_filter.empty()) {
-        auto& reader{m_stream_reader_data_context->get_reader()};
-        reader.seek_from_begin(0);
-        m_filtered_log_event_map.emplace(collect_matched_log_event_indices(reader, kql_filter));
-    }
+    auto const has_non_space{
+            std::string::npos != kql_filter.find_first_not_of(" \t\n\r\f\v")
+    };
+    if (has_non_space) {
+        if (nullptr == m_stream_reader_data_context) {
+            SPDLOG_WARN("KQL provided but stream reader data context is unavailable; skipping.");
+        } else {
+            auto& reader{m_stream_reader_data_context->get_reader()};
+            reader.seek_from_begin(0);
+            m_filtered_log_event_map.emplace(
+                    collect_matched_log_event_indices(reader, kql_filter)
+            );
+        }
+    }

159-165: Pre-size the result vector to reduce reallocations

Reserve capacity based on the expected iteration set.

-    if (false == log_level_filter.isNull()) {
-        std::vector<size_t> filtered_log_event_map;
+    if (false == log_level_filter.isNull()) {
+        std::vector<size_t> filtered_log_event_map;
+        if (m_filtered_log_event_map.has_value()) {
+            filtered_log_event_map.reserve(m_filtered_log_event_map->size());
+        } else {
+            filtered_log_event_map.reserve(m_deserialized_log_events->size());
+        }
src/clp_ffi_js/ir/query_methods.cpp (4)

43-48: Bug: UTC offset change must not be treated as a decode error

Returning IRErrorCode_Decode_Error will abort on valid streams containing UTC offset changes. Return success here.

     [[nodiscard]] static auto handle_utc_offset_change(
             [[maybe_unused]] UtcOffset utc_offset_old,
             [[maybe_unused]] UtcOffset utc_offset_new
     ) -> IRErrorCode {
-        return IRErrorCode::IRErrorCode_Decode_Error;
+        return IRErrorCode::IRErrorCode_Success;
     }

62-68: Avoid copying potentially large index vectors by moving them out

Provide an extract method and use it at the return site.

     [[nodiscard]] auto get_deserialized_log_event_indices() const -> std::vector<size_t> const& {
         return m_deserialized_log_event_indices;
     }
+
+    // Moves accumulated indices out of the handler.
+    [[nodiscard]] auto extract_deserialized_log_event_indices() -> std::vector<size_t> {
+        return std::move(m_deserialized_log_event_indices);
+    }
-    return deserializer.get_ir_unit_handler().get_deserialized_log_event_indices();
+    return deserializer.get_ir_unit_handler().extract_deserialized_log_event_indices();

Also applies to: 126-129


98-103: Augment error context with the offending KQL for easier debugging

Include the KQL string in the exception message.

-                std::format(
-                        "Failed to create query handler: {} {}",
-                        error_code.category().name(),
-                        error_code.message()
-                )
+                std::format(
+                        "Failed to create query handler: {} {} | KQL: \"{}\"",
+                        error_code.category().name(),
+                        error_code.message(),
+                        query_string
+                )

83-90: Readability: name the callback type alias instead of inline decltype

This improves clarity and diagnostics.

-    auto query_handler_result{
-            clp::ffi::ir_stream::search::QueryHandler<
-                    decltype(&trivial_new_projected_schema_tree_node_callback)>::
+    using NewProjectedNodeCallback = decltype(&trivial_new_projected_schema_tree_node_callback);
+    auto query_handler_result{
+            clp::ffi::ir_stream::search::QueryHandler<NewProjectedNodeCallback>::
             create(&trivial_new_projected_schema_tree_node_callback,
                    clp_s::search::kql::parse_kql_expression(query_string_stream),
                    {},
                    false)
     };
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 587c25f and 994a77e.

📒 Files selected for processing (5)
  • src/clp_ffi_js/ir/StreamReader.hpp (1 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (4 hunks)
  • src/clp_ffi_js/ir/decoding_methods.hpp (2 hunks)
  • src/clp_ffi_js/ir/query_methods.cpp (1 hunks)
  • src/clp_ffi_js/ir/query_methods.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • src/clp_ffi_js/ir/decoding_methods.hpp
  • src/clp_ffi_js/ir/query_methods.hpp
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/StreamReader.hpp
  • src/clp_ffi_js/ir/query_methods.cpp
🧠 Learnings (12)
📚 Learning: 2024-11-04T19:55:27.398Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:162-168
Timestamp: 2024-11-04T19:55:27.398Z
Learning: In the `IrStreamReader::decode_range` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, message decoding failures are not expected unless the file is corrupted or the deserializer in CLP core does not handle edge cases, so it's appropriate to throw an exception upon message decoding failure.

Applied to files:

  • src/clp_ffi_js/ir/decoding_methods.hpp
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/query_methods.cpp
📚 Learning: 2024-11-04T19:35:32.602Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

Applied to files:

  • src/clp_ffi_js/ir/decoding_methods.hpp
  • src/clp_ffi_js/ir/query_methods.hpp
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/StreamReader.hpp
  • src/clp_ffi_js/ir/query_methods.cpp
📚 Learning: 2025-08-21T07:18:06.881Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#104
File: src/clp_ffi_js/ir/query_methods.cpp:132-135
Timestamp: 2025-08-21T07:18:06.881Z
Learning: In query methods for IR streams (like query_log_event_indices in src/clp_ffi_js/ir/query_methods.cpp), when encountering std::errc::result_out_of_range indicating an incomplete IR stream, log a warning instead of an error since partial results are returned and can still be useful.

Applied to files:

  • src/clp_ffi_js/ir/decoding_methods.hpp
  • src/clp_ffi_js/ir/query_methods.hpp
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/StreamReader.hpp
  • src/clp_ffi_js/ir/query_methods.cpp
📚 Learning: 2024-12-09T20:27:28.454Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#35
File: src/clp_ffi_js/ir/utils.hpp:157-157
Timestamp: 2024-12-09T20:27:28.454Z
Learning: In `src/clp_ffi_js/ir/utils.hpp`, when `log_event_idx` is of type `size_t`, avoid suggesting changes to expressions like `log_event_idx + 1` for potential integer overflow concerns.

Applied to files:

  • src/clp_ffi_js/ir/decoding_methods.hpp
  • src/clp_ffi_js/ir/query_methods.hpp
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/query_methods.cpp
📚 Learning: 2024-11-07T08:35:20.376Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#30
File: src/clp_ffi_js/ir/StructuredIrStreamReader.hpp:187-190
Timestamp: 2024-11-07T08:35:20.376Z
Learning: In the `StructuredIrStreamReader` constructor of `src/clp_ffi_js/ir/StructuredIrStreamReader.hpp`, the `stream_reader_data_context` parameter is already wrapped in a `std::make_unique`, ensuring it matches the member variable `m_stream_reader_data_context` of type `std::unique_ptr<StreamReaderDataContext<StructuredIrDeserializer>>`.

Applied to files:

  • src/clp_ffi_js/ir/decoding_methods.hpp
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/query_methods.cpp
📚 Learning: 2024-11-07T08:38:39.882Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#30
File: src/clp_ffi_js/ir/StreamReader.cpp:179-179
Timestamp: 2024-11-07T08:38:39.882Z
Learning: The function `clp::ffi::ir_stream::validate_protocol_version` used in `src/clp_ffi_js/ir/StreamReader.cpp` is imported from the CLP upstream library and may not be present in the local codebase.

Applied to files:

  • src/clp_ffi_js/ir/decoding_methods.hpp
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/query_methods.cpp
📚 Learning: 2024-11-04T21:06:24.453Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/decoding_methods.cpp:45-80
Timestamp: 2024-11-04T21:06:24.453Z
Learning: In `src/clp_ffi_js/ir/decoding_methods.cpp`, within the `get_version(clp::ReaderInterface&)` function, avoid returning `std::string_view` when referencing local data that goes out of scope. Instead, return `std::string` to ensure data validity.

Applied to files:

  • src/clp_ffi_js/ir/decoding_methods.hpp
  • src/clp_ffi_js/ir/query_methods.hpp
  • src/clp_ffi_js/ir/query_methods.cpp
📚 Learning: 2024-11-10T02:14:19.411Z
Learnt from: davemarco
PR: y-scope/clp-ffi-js#32
File: src/clp_ffi_js/ir/LogEventWithFilterData.hpp:0-0
Timestamp: 2024-11-10T02:14:19.411Z
Learning: In the context of `LogEventWithFilterData`, it is acceptable for `log_level` to be `LogLevel::NONE`; additional validation against `LogLevel::NONE` is not necessary.

Applied to files:

  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
📚 Learning: 2024-11-04T19:37:27.204Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:85-91
Timestamp: 2024-11-04T19:37:27.204Z
Learning: In the `IrStreamReader::deserialize_stream` method, we handle missing End-of-Stream packets gracefully by logging an error and breaking the loop when encountering `std::errc::result_out_of_range`, rather than throwing an exception.

Applied to files:

  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/query_methods.cpp
📚 Learning: 2024-11-05T16:07:26.232Z
Learnt from: davemarco
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/decoding_methods.cpp:0-0
Timestamp: 2024-11-05T16:07:26.232Z
Learning: When suggesting the use of `std::format` with error codes or custom types in C++, ensure that the type is formattable. If the type is unknown or not directly supported by `std::format`, consider casting it to a compatible type (e.g., using `clp::enum_to_underlying_type`) to avoid compilation errors.

Applied to files:

  • src/clp_ffi_js/ir/query_methods.cpp
📚 Learning: 2025-06-27T19:57:25.643Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#99
File: src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp:134-134
Timestamp: 2025-06-27T19:57:25.643Z
Learning: In the clp-ffi-js codebase, prefer list initialization `{}` over assignment operator `=` for variable declarations to prevent narrowing conversions and maintain consistency. This applies to C++ files throughout the codebase.

Applied to files:

  • src/clp_ffi_js/ir/query_methods.cpp
📚 Learning: 2024-11-14T15:31:45.926Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#35
File: src/clp_ffi_js/ir/StructuredIrStreamReader.cpp:50-51
Timestamp: 2024-11-14T15:31:45.926Z
Learning: Ensure that `clp-ffi-js` supports JavaScript usage without relying on the TypeScript definition file, and keep track of related TODOs in a GitHub issue.

Applied to files:

  • src/clp_ffi_js/ir/query_methods.cpp
🧬 Code graph analysis (3)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (4)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (2)
  • filter_log_events (83-94)
  • filter_log_events (83-86)
src/clp_ffi_js/ir/StreamReader.hpp (3)
  • log_level_filter (113-113)
  • log_level_filter (121-123)
  • log_level_filter (121-121)
src/clp_ffi_js/ir/query_methods.cpp (1)
  • collect_matched_log_event_indices (80-81)
src/clp_ffi_js/ir/decoding_methods.hpp (2)
  • deserialize_log_events (44-69)
  • deserialize_log_events (44-47)
src/clp_ffi_js/ir/StreamReader.hpp (6)
src/clp_ffi_js/ir/query_methods.cpp (6)
  • nodiscard (36-41)
  • nodiscard (43-48)
  • nodiscard (50-56)
  • nodiscard (58-60)
  • nodiscard (62-64)
  • nodiscard (79-129)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (2)
  • nodiscard (53-55)
  • log_level_filter (61-64)
src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (2)
  • nodiscard (58-60)
  • log_level_filter (66-69)
src/clp_ffi_js/ir/StreamReaderDataContext.hpp (2)
  • nodiscard (42-42)
  • nodiscard (44-44)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2)
  • filter_log_events (147-198)
  • filter_log_events (147-150)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (2)
  • filter_log_events (83-94)
  • filter_log_events (83-86)
src/clp_ffi_js/ir/query_methods.cpp (2)
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (7)
  • maybe_unused (119-122)
  • log_event (110-111)
  • log_event (154-154)
  • log_event (164-165)
  • log_event (175-175)
  • is_auto_generated (132-136)
  • schema_tree (74-77)
src/clp_ffi_js/ir/decoding_methods.hpp (2)
  • deserialize_log_events (44-69)
  • deserialize_log_events (44-47)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: lint (macos-latest)
  • GitHub Check: lint (ubuntu-latest)
🔇 Additional comments (4)
src/clp_ffi_js/ir/query_methods.hpp (1)

21-23: API name reads clearly — good change

Renaming to collect_matched_log_event_indices removes ambiguity vs. “index”. LGTM.

src/clp_ffi_js/ir/decoding_methods.hpp (1)

54-57: Correct severity for incomplete IR stream — good

Using SPDLOG_WARN for std::errc::result_out_of_range matches the codebase guideline that partial results are acceptable. No change needed.

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (1)

3-3: Correct include for std::ranges::find — good

Adding in this TU is required for std::ranges::find. Looks good.

src/clp_ffi_js/ir/query_methods.cpp (1)

126-129: Shared loop reuse via deserialize_log_events — good

Delegating to the common helper keeps error-handling consistent across readers. Nice.

Comment on lines +4 to 13
#include <clp/ffi/ir_stream/Deserializer.hpp>
#include <clp/ffi/ir_stream/IrUnitHandlerReq.hpp>
#include <clp/ffi/ir_stream/search/QueryHandlerReq.hpp>
#include <clp/ReaderInterface.hpp>
#include <nlohmann/json.hpp>

#include <clp_ffi_js/ClpFfiJsException.hpp>
#include <clp_ffi_js/ir/StreamReader.hpp>
#include <clp_ffi_js/ir/StructuredIrStreamReader.hpp>

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add required headers for std::format and logging; avoid relying on transitive includes

This TU uses std::format and SPDLOG_WARN; include their headers explicitly. Also include <system_error> for std::errc.

 #include <clp/ffi/ir_stream/Deserializer.hpp>
 #include <clp/ffi/ir_stream/IrUnitHandlerReq.hpp>
 #include <clp/ffi/ir_stream/search/QueryHandlerReq.hpp>
 #include <clp/ReaderInterface.hpp>
 
 #include <nlohmann/json.hpp>
 
 #include <clp_ffi_js/ClpFfiJsException.hpp>
 #include <clp_ffi_js/ir/StreamReader.hpp>
 #include <clp_ffi_js/ir/StructuredIrStreamReader.hpp>
+#include <format>
+#include <spdlog/spdlog.h>
+#include <system_error>
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#include <clp/ffi/ir_stream/Deserializer.hpp>
#include <clp/ffi/ir_stream/IrUnitHandlerReq.hpp>
#include <clp/ffi/ir_stream/search/QueryHandlerReq.hpp>
#include <clp/ReaderInterface.hpp>
#include <nlohmann/json.hpp>
#include <clp_ffi_js/ClpFfiJsException.hpp>
#include <clp_ffi_js/ir/StreamReader.hpp>
#include <clp_ffi_js/ir/StructuredIrStreamReader.hpp>
#include <clp/ffi/ir_stream/Deserializer.hpp>
#include <clp/ffi/ir_stream/IrUnitHandlerReq.hpp>
#include <clp/ffi/ir_stream/search/QueryHandlerReq.hpp>
#include <clp/ReaderInterface.hpp>
#include <nlohmann/json.hpp>
#include <clp_ffi_js/ClpFfiJsException.hpp>
#include <clp_ffi_js/ir/StreamReader.hpp>
#include <clp_ffi_js/ir/StructuredIrStreamReader.hpp>
#include <format>
#include <spdlog/spdlog.h>
#include <system_error>
🤖 Prompt for AI Agents
In src/clp_ffi_js/ir/decoding_methods.hpp around lines 4 to 13, this translation
unit uses std::format, SPDLOG_WARN and std::errc but relies on transitive
includes; add explicit includes for <format>, <spdlog/spdlog.h> and
<system_error> near the other headers to ensure std::format, spdlog macros and
std::errc are available and avoid depending on transitive headers.

Comment on lines 11 to +12
#include <clp_ffi_js/ir/StreamReader.hpp>
#include <clp_ffi_js/ir/StructuredIrStreamReader.hpp>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Decouple: remove unnecessary include of StructuredIrStreamReader.hpp

This header doesn’t reference StructuredIrStreamReader; drop the include to reduce compile-time and coupling.

-#include <clp_ffi_js/ir/StreamReader.hpp>
-#include <clp_ffi_js/ir/StructuredIrStreamReader.hpp>
+#include <clp_ffi_js/ir/StreamReader.hpp>
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#include <clp_ffi_js/ir/StreamReader.hpp>
#include <clp_ffi_js/ir/StructuredIrStreamReader.hpp>
#include <clp_ffi_js/ir/StreamReader.hpp>
🤖 Prompt for AI Agents
In src/clp_ffi_js/ir/decoding_methods.hpp around lines 11 to 12, the file
unnecessarily includes StructuredIrStreamReader.hpp even though it does not
reference that type; remove the line including
<clp_ffi_js/ir/StructuredIrStreamReader.hpp> to reduce coupling and compile
time, leaving only the required #include <clp_ffi_js/ir/StreamReader.hpp>, and
run a quick rebuild to confirm no missing symbols.

Comment on lines +13 to +17
#include <clp/ErrorCode.hpp>
#include <clp/ffi/ir_stream/decoding_methods.hpp>
#include <clp/ffi/ir_stream/Deserializer.hpp>
#include <clp/ffi/ir_stream/search/QueryHandler.hpp>
#include <clp/ffi/KeyValuePairLogEvent.hpp>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Trim unused include: clp/ffi/ir_stream/decoding_methods.hpp

Not referenced in this TU; rely on Deserializer.hpp and our local helper instead.

 #include <clp/ErrorCode.hpp>
-#include <clp/ffi/ir_stream/decoding_methods.hpp>
 #include <clp/ffi/ir_stream/Deserializer.hpp>
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#include <clp/ErrorCode.hpp>
#include <clp/ffi/ir_stream/decoding_methods.hpp>
#include <clp/ffi/ir_stream/Deserializer.hpp>
#include <clp/ffi/ir_stream/search/QueryHandler.hpp>
#include <clp/ffi/KeyValuePairLogEvent.hpp>
#include <clp/ErrorCode.hpp>
#include <clp/ffi/ir_stream/Deserializer.hpp>
#include <clp/ffi/ir_stream/search/QueryHandler.hpp>
#include <clp/ffi/KeyValuePairLogEvent.hpp>
🤖 Prompt for AI Agents
In src/clp_ffi_js/ir/query_methods.cpp around lines 13 to 17, remove the unused
include <clp/ffi/ir_stream/decoding_methods.hpp> (it's not referenced in this
translation unit); keep the other includes (Deserializer.hpp, QueryHandler.hpp,
KeyValuePairLogEvent.hpp, ErrorCode.hpp), rebuild to ensure no missing symbols,
and run tests or compilation to verify everything still compiles relying on
Deserializer.hpp and the local helper.

Comment on lines +116 to +123
/**
* Generates a filtered collection from all log events.
*
* @param log_level_filter Array of selected log levels.
*/
void filter_log_events(LogLevelFilterTsType const& log_level_filter) {
filter_log_events(log_level_filter, "");
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Minor: document the single-arg wrapper’s behaviour

Clarify that this overload disables KQL by design.

-/**
- * Generates a filtered collection from all log events.
- *
- * @param log_level_filter Array of selected log levels.
- */
+/**
+ * Convenience wrapper: filters only by `log_level_filter` (KQL disabled).
+ *
+ * @param log_level_filter Array of selected log levels.
+ */
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/**
* Generates a filtered collection from all log events.
*
* @param log_level_filter Array of selected log levels.
*/
void filter_log_events(LogLevelFilterTsType const& log_level_filter) {
filter_log_events(log_level_filter, "");
}
/**
* Convenience wrapper: filters only by `log_level_filter` (KQL disabled).
*
* @param log_level_filter Array of selected log levels.
*/
void filter_log_events(LogLevelFilterTsType const& log_level_filter) {
filter_log_events(log_level_filter, "");
}
🤖 Prompt for AI Agents
In src/clp_ffi_js/ir/StreamReader.hpp around lines 116 to 123, the doc comment
for the single-argument filter_log_events overload should explicitly state that
this wrapper disables KQL by design; update the comment to mention that calling
filter_log_events(LogLevelFilterTsType const& log_level_filter) invokes the
two-argument overload with an empty KQL string and therefore performs only
log-level filtering (KQL is not applied), so readers understand the intended
behavior.

Copy link
Member

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we might be adding a compatibility layer between unstructured and structured IR logs, the unstructured IR logs might be accessible in the same way as the structured IR logs (e.g., query-able by KQL) in the future.

for the title, how about:

feat: Add KQL query support for filtering structured IR log events.

@hoophalab hoophalab changed the title feat: Add KQL filtering support. feat: Add KQL query support for filtering structured IR log events. Aug 22, 2025
@hoophalab hoophalab merged commit cbe1929 into y-scope:main Aug 22, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants