-
Notifications
You must be signed in to change notification settings - Fork 9
feat: Add KQL query support for filtering structured IR log events. #104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds KQL-based querying to IR stream readers, extends StreamReader::filter_log_events to accept an optional KQL string (with a one-arg wrapper), wires query execution into deserialization to collect matching event indices, adds query/deserialization helpers and index-collector unit handler, and updates build/deps to include ANTLR/date/simdjson and related libs. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant JS as JS/WASM caller
participant SR as StructuredIrStreamReader
participant QM as collect_matched_log_event_indices
participant QP as clp_s::search::kql::QueryHandler
participant DZ as Deserializer
participant RI as ReaderInterface
JS->>SR: filter_log_events(level_filter, kql_filter)
alt kql_filter non-empty
SR->>QM: collect_matched_log_event_indices(RI, kql_filter)
QM->>QP: parse KQL & build QueryHandler
QM->>DZ: create Deserializer(RI, LogEventIndexHandler, QP)
loop per IR unit
DZ->>RI: read IR unit
DZ-->>QM: invoke handler callbacks (emit matching indices)
end
QM-->>SR: return matched_indices
SR->>SR: set m_filtered_log_event_map = matched_indices
else
SR->>SR: clear/leave m_filtered_log_event_map empty
end
opt level_filter provided
SR->>SR: apply log-level filtering over selected set or all events
end
SR-->>JS: complete
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
if (false == log_level_filter.isNull()) { | ||
std::vector<size_t> filtered_log_event_map; | ||
|
||
auto filter_levels{ | ||
emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter) | ||
}; | ||
|
||
auto fn = [&](size_t log_event_idx) { | ||
auto const& log_event{m_deserialized_log_events->at(log_event_idx)}; | ||
if (std::ranges::find( | ||
filter_levels, | ||
clp::enum_to_underlying_type(log_event.get_log_level()) | ||
) | ||
!= filter_levels.end()) | ||
{ | ||
filtered_log_event_map.emplace_back(log_event_idx); | ||
} | ||
}; | ||
|
||
if (m_filtered_log_event_map.has_value()) { | ||
for (auto log_event_idx : m_filtered_log_event_map.value()) { | ||
fn(log_event_idx); | ||
} | ||
} else { | ||
for (size_t log_event_idx = 0; log_event_idx < m_deserialized_log_events->size(); | ||
++log_event_idx) | ||
{ | ||
fn(log_event_idx); | ||
} | ||
} | ||
|
||
m_filtered_log_event_map = std::make_optional(std::move(filtered_log_event_map)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should merge existing log level filtering with the new KQL filtering by extending user supplied query expression in the future. @coderabbitai can you create an issue on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
@hoophalab I've created an issue to track the enhancement for merging log level filtering with KQL filtering. The issue outlines the current implementation, the proposed benefits of unifying the filtering approach, and provides examples of how users could write more expressive queries that combine log levels with other criteria in a single KQL expression.
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue for merging log level filtering with KQL filtering
gh issue create \
--title "Merge log level filtering with KQL filtering by extending query expressions" \
--body "## Enhancement Request
Currently, the \`StructuredIrStreamReader::filter_log_events\` method handles log level filtering and KQL filtering separately:
1. KQL filtering is applied first (if a KQL query is provided)
2. Log level filtering is then applied on top of the KQL results
## Proposed Enhancement
Merge the existing log level filtering with KQL filtering by extending user-supplied query expressions to include log level constraints. This would:
- **Simplify filtering logic**: Use a single unified query approach instead of two separate filtering passes
- **Improve performance**: Filter in one pass instead of applying filters sequentially
- **Enhance API consistency**: Use KQL for all filtering needs rather than having separate parameters
- **Enable complex queries**: Allow users to combine log levels with other criteria in sophisticated query expressions
## Current Implementation
The current approach in \`src/clp_ffi_js/ir/StructuredIrStreamReader.cpp\` applies filters sequentially, which works but could be more efficient and user-friendly.
## Expected Outcome
Users should be able to write KQL queries like:
- \`level:INFO AND message:\"error\"\`
- \`level:(ERROR OR WARN) AND timestamp > 2024-01-01\`
This would eliminate the need for separate \`log_level_filter\` parameter while providing more flexibility.
---
**Requested by:** @hoophalab
**Context:** [PR #104](https://github.com/y-scope/clp-ffi-js/pull/104)
**Reference:** [Comment](https://github.com/y-scope/clp-ffi-js/pull/104#discussion_r2289245966)" \
--assignee hoophalab
Length of output: 1715
Perfect! The issue has been successfully created at #105. The issue captures the enhancement request to merge log level filtering with KQL filtering by extending query expressions, which would provide a more unified and flexible filtering approach for users.
🧬 Code Graph Analysis Results
-
File: src/clp_ffi_js/ir/query_methods.hpp
Lines: 10-11
Code:query_index(clp::ReaderInterface& reader, std::string const& query_string) -> std::vector<size_t>
-
File: src/clp_ffi_js/ir/query_methods.cpp
Lines: 66-136
Code:[[nodiscard]] auto query_index(clp::ReaderInterface& reader, std::string const& query_string) -> std::vector<size_t> { std::istringstream query_string_stream{query_string}; auto query_handler_result{ clp::ffi::ir_stream::search::QueryHandler< decltype(&trivial_new_projected_schema_tree_node_callback)>:: create(&trivial_new_projected_schema_tree_node_callback, clp_s::search::kql::parse_kql_expression(query_string_stream), {}, false) }; if (query_handler_result.has_error()) { auto const error_code{query_handler_result.error()}; throw ClpFfiJsException{ clp::ErrorCode::ErrorCode_Failure, __FILENAME__, __LINE__, std::format( "Failed to create qury handler: {} {}", error_code.category().name(), error_code.message() ) }; } auto deserializer_result{clp::ffi::ir_stream::make_deserializer( reader, LogEventIndexIrUnitHandler{}, std::move(query_handler_result.value()) )}; if (deserializer_result.has_error()) { auto const error_code{deserializer_result.error()}; throw ClpFfiJsException{ clp::ErrorCode::ErrorCode_Failure, __FILENAME__, __LINE__, std::format( "Failed to create deserializer: {} {}", error_code.category().name(), error_code.message() ) }; } auto& deserializer{deserializer_result.value()}; while (false == deserializer.is_stream_completed()) { auto result{deserializer.deserialize_next_ir_unit(reader)}; if (false == result.has_error()) { continue; } auto const error{result.error()}; if (std::errc::result_out_of_range == error) { SPDLOG_ERROR("File contains an incomplete IR stream"); break; } throw ClpFfiJsException{ clp::ErrorCode::ErrorCode_Corrupt, __FILENAME__, __LINE__, std::format( "Failed to deserialize IR unit: {}:{}", error.category().name(), error.message() ) }; } return deserializer.get_ir_unit_handler().get_deserialized_log_event_indexes(); }
-
File: src/clp_ffi_js/ir/query_methods.cpp
Lines: 24-29
Code:[[nodiscard]] auto handle_log_event([[maybe_unused]] KeyValuePairLogEvent&& log_event, size_t log_event_ix) -> IRErrorCode { m_deserialized_log_event_indexes.push_back(log_event_ix); return IRErrorCode::IRErrorCode_Success; }
-
File: src/clp_ffi_js/ir/query_methods.cpp
Lines: 31-36
Code:[[nodiscard]] static auto handle_utc_offset_change( [[maybe_unused]] UtcOffset utc_offset_old, [[maybe_unused]] UtcOffset utc_offset_new ) -> IRErrorCode { return IRErrorCode::IRErrorCode_Decode_Error; }
-
File: src/clp_ffi_js/ir/query_methods.cpp
Lines: 38-44
Code:[[nodiscard]] static auto handle_schema_tree_node_insertion( [[maybe_unused]] bool is_auto_generated, [[maybe_unused]] clp::ffi::SchemaTree::NodeLocator schema_tree_node_locator, [[maybe_unused]] std::shared_ptr<clp::ffi::SchemaTree const> const& schema_tree ) -> IRErrorCode { return IRErrorCode::IRErrorCode_Success; }
-
File: src/clp_ffi_js/ir/query_methods.cpp
Lines: 46-48
Code:[[nodiscard]] static auto handle_end_of_stream() -> IRErrorCode { return IRErrorCode::IRErrorCode_Success; }
-
File: src/clp_ffi_js/ir/query_methods.cpp
Lines: 50-52
Code:[[nodiscard]] auto get_deserialized_log_event_indexes() const -> std::vector<size_t> const& { return m_deserialized_log_event_indexes; }
-
File: src/clp_ffi_js/ir/StreamReader.cpp
Lines: 232-347
Code:auto StreamReader::generic_decode_range( size_t begin_idx, size_t end_idx, FilteredLogEventsMap const& filtered_log_event_map, LogEvents<LogEvent> const& log_events, ToStringFunc log_event_to_string, bool use_filter ) -> DecodedResultsTsType { if (use_filter && false == filtered_log_event_map.has_value()) { return DecodedResultsTsType{emscripten::val::null()}; } size_t length{0}; if (use_filter) { length = filtered_log_event_map->size(); } else { length = log_events.size(); } if (length < end_idx || begin_idx > end_idx) { SPDLOG_ERROR("Invalid log event index range: {}-{}", begin_idx, end_idx); return DecodedResultsTsType{emscripten::val::null()}; } auto const results{emscripten::val::array()}; for (size_t i = begin_idx; i < end_idx; ++i) { size_t log_event_idx{0}; if (use_filter) { log_event_idx = filtered_log_event_map->at(i); } else { log_event_idx = i; } auto const& log_event_with_filter_data{log_events.at(log_event_idx)}; auto const& log_event = log_event_with_filter_data.get_log_event(); auto const& timestamp = log_event_with_filter_data.get_timestamp(); auto const& log_level = log_event_with_filter_data.get_log_level(); auto const& utc_offset = log_event_with_filter_data.get_utc_offset().count(); EM_ASM( { Emval.toValue($0).push({ "logEventNum": $1, "logLevel": $2, "message": UTF8ToString($3), "timestamp": $4, "utcOffset": $5 }); }, results.as_handle(), log_event_idx + 1, log_level, log_event_to_string(log_event).c_str(), timestamp, utc_offset ); } return DecodedResultsTsType(results); } template <typename LogEvent> auto StreamReader::generic_filter_log_events( FilteredLogEventsMap& filtered_log_event_map, LogLevelFilterTsType const& log_level_filter, LogEvents<LogEvent> const& log_events ) -> void { if (log_level_filter.isNull()) { filtered_log_event_map.reset(); return; } filtered_log_event_map.emplace(); auto filter_levels = emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter); for (size_t log_event_idx = 0; log_event_idx < log_events.size(); ++log_event_idx) { auto const& log_event = log_events[log_event_idx]; if (std::ranges::find( filter_levels, clp::enum_to_underlying_type(log_event.get_log_level()) ) != filter_levels.end()) { filtered_log_event_map->emplace_back(log_event_idx); } } } template <typename LogEvent> auto StreamReader::generic_find_nearest_log_event_by_timestamp( LogEvents<LogEvent> const& log_events, clp::ir::epoch_time_ms_t target_ts ) -> NullableLogEventIdx { if (log_events.empty()) { return NullableLogEventIdx{emscripten::val::null()}; } // Find the log event whose timestamp is just after `target_ts` auto first_greater_it{std::upper_bound( log_events.begin(), log_events.end(), target_ts, [](clp::ir::epoch_time_ms_t ts, LogEventWithFilterData<LogEvent> const& log_event) { return ts < log_event.get_timestamp(); } )}; if (first_greater_it == log_events.begin()) { return NullableLogEventIdx{emscripten::val(0)}; } auto const first_greater_idx{std::distance(log_events.begin(), first_greater_it)}; return NullableLogEventIdx{emscripten::val(first_greater_idx - 1)}; }
-
File: src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
Lines: 83-91
Code:void UnstructuredIrStreamReader::filter_log_events( LogLevelFilterTsType const& log_level_filter, [[maybe_unused]] std::string const& kql_filter ) { SPDLOG_WARN( "KQL filters aren't supported for unstructured IR streams, so they're being ignored." ); generic_filter_log_events(m_filtered_log_event_map, log_level_filter, m_encoded_log_events); }
-
File: src/clp_ffi_js/ir/LogEventWithFilterData.hpp
Lines: 53-53
Code:[[nodiscard]] auto get_log_event() const -> LogEvent const& { return m_log_event; }
-
File: src/clp_ffi_js/ir/LogEventWithFilterData.hpp
Lines: 55-55
Code:[[nodiscard]] auto get_log_level() const -> LogLevel { return m_log_level; }
-
File: src/clp_ffi_js/ir/LogEventWithFilterData.hpp
Lines: 57-57
Code:[[nodiscard]] auto get_timestamp() const -> clp::ir::epoch_time_ms_t { return m_timestamp; }
-
File: src/clp_ffi_js/ir/LogEventWithFilterData.hpp
Lines: 59-59
Code:[[nodiscard]] auto get_utc_offset() const -> UtcOffset { return m_utc_offset; }
-
File: src/clp_ffi_js/ir/decoding_methods.hpp
Lines: 34-34
Code:convert_metadata_to_js_object(nlohmann::json const& metadata) -> MetadataTsType
-
File: src/clp_ffi_js/ir/decoding_methods.cpp
Lines: 91-97
Code:auto convert_metadata_to_js_object(nlohmann::json const& metadata) -> MetadataTsType { auto const metadata_str{dump_json_with_replace(metadata)}; auto const metadata_obj{ emscripten::val::global("JSON").call<emscripten::val>("parse", metadata_str) }; return MetadataTsType{metadata_obj}; }
-
File: src/clp_ffi_js/utils.hpp
Lines: 17-17
Code:dump_json_with_replace(nlohmann::json const& json_obj) -> std::string
-
File: src/clp_ffi_js/utils.cpp
Lines: 6-8
Code:[[nodiscard]] auto dump_json_with_replace(nlohmann::json const& json_obj) -> std::string { return json_obj.dump(-1, ' ', false, nlohmann::json::error_handler_t::replace); }
🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 21
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (11)
CMakeLists.txt
(4 hunks)src/clp_ffi_js/ir/StreamReader.hpp
(1 hunks)src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
(2 hunks)src/clp_ffi_js/ir/StructuredIrStreamReader.hpp
(1 hunks)src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp
(1 hunks)src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp
(1 hunks)src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
(1 hunks)src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
(1 hunks)src/clp_ffi_js/ir/query_methods.cpp
(1 hunks)src/clp_ffi_js/ir/query_methods.hpp
(1 hunks)taskfiles/deps.yaml
(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit Configuration File
- Prefer
false == <expression>
rather than!<expression>
.
Files:
src/clp_ffi_js/ir/query_methods.hpp
src/clp_ffi_js/ir/StreamReader.hpp
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
src/clp_ffi_js/ir/query_methods.cpp
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
src/clp_ffi_js/ir/StructuredIrStreamReader.hpp
🧠 Learnings (4)
📚 Learning: 2024-11-04T19:35:32.602Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.
Applied to files:
src/clp_ffi_js/ir/StreamReader.hpp
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
src/clp_ffi_js/ir/query_methods.cpp
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
src/clp_ffi_js/ir/StructuredIrStreamReader.hpp
📚 Learning: 2024-12-09T20:27:28.454Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#35
File: src/clp_ffi_js/ir/utils.hpp:157-157
Timestamp: 2024-12-09T20:27:28.454Z
Learning: In `src/clp_ffi_js/ir/utils.hpp`, when `log_event_idx` is of type `size_t`, avoid suggesting changes to expressions like `log_event_idx + 1` for potential integer overflow concerns.
Applied to files:
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
src/clp_ffi_js/ir/StructuredIrStreamReader.hpp
📚 Learning: 2024-11-07T08:38:39.882Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#30
File: src/clp_ffi_js/ir/StreamReader.cpp:179-179
Timestamp: 2024-11-07T08:38:39.882Z
Learning: The function `clp::ffi::ir_stream::validate_protocol_version` used in `src/clp_ffi_js/ir/StreamReader.cpp` is imported from the CLP upstream library and may not be present in the local codebase.
Applied to files:
CMakeLists.txt
📚 Learning: 2024-10-05T19:27:37.240Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#11
File: package.json:24-33
Timestamp: 2024-10-05T19:27:37.240Z
Learning: In the `clp-ffi-js` project, the files in the `/dist` directory are generated during the build process and are not checked into source control, so their absence before compilation is expected.
Applied to files:
CMakeLists.txt
🧬 Code Graph Analysis (9)
src/clp_ffi_js/ir/query_methods.hpp (1)
src/clp_ffi_js/ir/query_methods.cpp (6)
nodiscard
(24-29)nodiscard
(31-36)nodiscard
(38-44)nodiscard
(46-48)nodiscard
(50-52)nodiscard
(66-136)
src/clp_ffi_js/ir/StreamReader.hpp (2)
src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (1)
log_level_filter
(66-69)src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)
log_level_filter
(61-64)
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (1)
src/clp_ffi_js/ir/query_methods.cpp (6)
nodiscard
(24-29)nodiscard
(31-36)nodiscard
(38-44)nodiscard
(46-48)nodiscard
(50-52)nodiscard
(66-136)
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (2)
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (5)
log_event
(109-110)log_event
(153-153)log_event
(163-164)log_event
(174-174)maybe_unused
(118-121)src/clp_ffi_js/ir/query_methods.cpp (3)
maybe_unused
(25-26)maybe_unused
(31-34)maybe_unused
(38-42)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (2)
src/clp_ffi_js/ir/StreamReader.hpp (1)
log_level_filter
(108-108)src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (1)
log_level_filter
(66-69)
src/clp_ffi_js/ir/query_methods.cpp (3)
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (8)
nodiscard
(66-66)maybe_unused
(118-121)log_event
(109-110)log_event
(153-153)log_event
(163-164)log_event
(174-174)is_auto_generated
(131-135)schema_tree
(74-77)src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (2)
create
(35-65)create
(35-38)src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2)
create
(83-126)create
(83-87)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (4)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (2)
filter_log_events
(83-91)filter_log_events
(83-86)src/clp_ffi_js/ir/StreamReader.hpp (1)
log_level_filter
(108-108)src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (1)
log_level_filter
(66-69)src/clp_ffi_js/ir/query_methods.hpp (1)
query_index
(10-11)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (4)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2)
filter_log_events
(144-196)filter_log_events
(144-147)src/clp_ffi_js/ir/StreamReader.hpp (1)
log_level_filter
(108-108)src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (1)
log_level_filter
(66-69)src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)
log_level_filter
(61-64)
src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (2)
src/clp_ffi_js/ir/StreamReader.hpp (1)
log_level_filter
(108-108)src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)
log_level_filter
(61-64)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: lint (ubuntu-latest)
- GitHub Check: lint (macos-latest)
🔇 Additional comments (12)
taskfiles/deps.yaml (4)
19-26
: New deps correctly wired into download-allantlr-runtime, date, and simdjson are included in the aggregate task. Looks good.
89-105
: Howard Hinnant date v3.0.1 pin looks finePinned to a known-good version with checksum. No issues spotted.
145-161
: Checksum Verified for simdjson v3.13.0The downloaded archive’s SHA-256 hash matches the pinned value (
07a1bb3587aac18fd6a10a83fe4ab09f1100ab39f0cb73baea1317826b9f9e0d
), so the dependency pin is correct.
No further changes are required.
81-84
: Manual verification required: CLP tarball SHA256 and KQL search componentsAutomated checks in this environment were inconclusive. Please manually confirm the following:
The SHA256 checksum of the tarball at
https://github.com/y-scope/clp/archive/6a00a87.tar.gz
matches
00fa3880ac74c00c453832d51f9a533533e59f89b94159f05ad4c767225178f0
.The unpacked archive contains the
search/...kql
files required by theclp_s::search
(KQL) component.src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (1)
66-69
: filter_log_events signature consistently updated across codebaseVerified that the two-argument signature is applied uniformly in the base class, all derived readers, and the JS binding—no one-argument variants remain.
• src/clp_ffi_js/ir/StreamReader.hpp: abstract method declaration uses
filter_log_events(LogLevelFilterTsType const& log_level_filter, std::string const& kql_filter)
• src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp / UnstructuredIrStreamReader.cpp: override and implementation updated to the two-arg form (with[[maybe_unused]]
on the second parameter)
• src/clp_ffi_js/ir/StructuredIrStreamReader.hpp / StructuredIrStreamReader.cpp: override and implementation updated to the two-arg form
• src/clp_ffi_js/ir/StreamReader.cpp: the JS binding registersfilterLogEvents
against the updated C++ methodAll grep searches for a lone
filter_log_events(...)
without a comma returned no hits, confirming there are no lingering one-argument declarations or calls.CMakeLists.txt (4)
105-105
: Add query_methods.cpp to build: LGTMSource is correctly added to CLP_FFI_JS_SRC_MAIN. No issues.
153-165
: Unable to locate clp_s sources – please manually verify search toggles
I couldn’t find a localcomponents/core/src/clp_s
directory in this PR (it’s likely pulled in via theCLP_FFI_JS_CLP_SOURCE_DIRECTORY
variable), so I’m unable to confirm whether
clp_s::search::ast
andclp_s::search::kql
targets are correctly built whenCLP_BUILD_CLP_S_SEARCH
is set to OFF while their individual toggles are ON.• Please open the external
components/core/src/clp_s/CMakeLists.txt
and ensure that eachadd_library(clp_s::search::ast …)
andadd_library(clp_s::search::kql …)
call is gated by its own toggle (e.g.CLP_BUILD_CLP_S_SEARCH_AST
,CLP_BUILD_CLP_S_SEARCH_KQL
) rather than the umbrellaCLP_BUILD_CLP_S_SEARCH
.
• Confirm that disabling the overall search toggle does not inadvertently disable AST or KQL support when those sub-toggles are explicitly enabled.
117-119
: Missing core search source files in CLP_FFI_JS_CLP_SOURCE_DIRECTORYI couldn’t locate the following files under the expected upstream paths, which will break CI if they’re not present:
• components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp
• components/core/src/clp/ffi/ir_stream/search/utils.cpp
• components/core/src/clp/ffi/ir_stream/utils.cppOnly a single utils.cpp exists at src/clp_ffi_js/utils.cpp, so please verify:
- That CLP_FFI_JS_CLP_SOURCE_DIRECTORY is set correctly
- The pinned CLP commit indeed contains these files at those locations
- CMakeLists.txt paths are updated to match the actual directory layout
150-152
: Verify string_utils subdirectory and target name
I wasn’t able to locate astring_utils
folder or its CMakeLists.txt under the expected path, nor anyadd_library(string_utils …)
oradd_library(clp::string_utils …)
declarations. Please manually confirm:
- That
${CLP_FFI_JS_CLP_SOURCE_DIRECTORY}/components/core/src/clp/string_utils
exists and contains aCMakeLists.txt
.- That this CMakeLists defines a target exactly named
string_utils
(or provides an alias) so downstreamtarget_link_libraries(string_utils)
calls will succeed.- If the directory name or target name differs (e.g.
string-util
orStringUtils
), adjust it to match the expectedstring_utils
, or add an explicit alias viaadd_library(string_utils ALIAS actual_target_name)
.src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2)
31-32
: Include added for query methods: LGTMLocal include path is consistent with project style and resolves the new dependency.
191-195
: Nice touch: drop the map when it matches all eventsThis avoids unnecessary filtering overhead downstream. Good.
src/clp_ffi_js/ir/query_methods.cpp (1)
112-118
: Core loop looks solidThe deserialization loop follows the project’s style (i.e.,
false == ...
) and handles errors explicitly. Aside from the incomplete-stream semantics noted above, LGTM.Also applies to: 123-133
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i understand this repo lacks unit / integration test which makes APIs sample usages a bit unintuitive to be documented. for now, let's try our best to list how some APIs can be called in the PR description (usually i do that in the "Validation" section since i need to write what i test and what the expectation is anyways).
without reading the code, i tried but couldn't really figure out how some log can be validly filtered for. here're my attempts with example.clp.zst.zip:
import ModuleInit from "./cmake-build-debug/ClpFfiJs-node.js"
import fs from "node:fs"
const main = async () => {
// const file = fs.readFileSync("./test-irv2.clp.zst")
const file = fs.readFileSync("./example.clp.zst")
console.time("perf")
const Module = await ModuleInit()
console.log(Module.MERGED_KV_PAIRS_AUTO_GENERATED_KEY);
console.log(Module.MERGED_KV_PAIRS_USER_GENERATED_KEY);
try {
const decoder = new Module.ClpStreamReader(
new Uint8Array(file),
{timestampKey: {isAutoGenerated: true, parts: ["timestamp", "unix_millisecs"]},
logLevelKey: {parts: ["level", "name"], isAutoGenerated: true},
utcOffsetKey: {parts: ["timestamp", "utc_offset_secs"], isAutoGenerated: true} })
console.log(decoder.getMetadata())
console.log("type:", decoder.getIrStreamType() === Module.IrStreamType.STRUCTURED,
decoder.getIrStreamType() === Module.IrStreamType.UNSTRUCTURED)
const numEvents = decoder.deserializeStream()
console.log(numEvents)
const results = decoder.decodeRange(0, numEvents, false)
console.log(results)
// Try different filter approaches
console.log("Trying different filter approaches:");
// Approach 0
// decoder.filterLogEvents([5], "message : This is an example message")
// Approach 1: Wildcard filter on message
decoder.filterLogEvents([5], "message: *")
console.log("Wildcard filter [5], message: *:", decoder.getFilteredLogEventMap())
// Approach 2: Specific message filter
decoder.filterLogEvents([5], 'message: "This is an example message"')
console.log("Specific message filter [5], message: \"This is an example message\":", decoder.getFilteredLogEventMap())
// Approach 3: Try with different log level
decoder.filterLogEvents([], "message: *")
console.log("No log level filter, message: *:", decoder.getFilteredLogEventMap())
// Approach 4: Try with log level 5 (ERROR) but different syntax
decoder.filterLogEvents([5], "*")
console.log("Wildcard KQL with ERROR log level:", decoder.getFilteredLogEventMap())
// Approach 5: Try filtering on auto-generated fields
decoder.filterLogEvents([5], "level.name: ERROR")
console.log("Filter on level.name: ERROR:", decoder.getFilteredLogEventMap())
} catch (e) {
console.error("Exception caught:", e.stack)
}
console.timeEnd("perf")
}
void main()
none of them gave me results and "Approach 0" (when uncommented) even throws an "antlr4::InputMismatchException":
Trying different filter approaches:
Wildcard filter [5], message: *: null
Specific message filter [5], message: "This is an example message": null
No log level filter, message: *: []
Wildcard KQL with ERROR log level: null
Filter on level.name: ERROR: []
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very cool feature. let's see the comments and also update the validation section in the PR description as discussed in another thread
CMakeLists.txt
Outdated
PRIVATE | ||
antlr4_static | ||
clp_s::search::ast | ||
clp_s::search::kql | ||
embind | ||
simdjson::simdjson | ||
spdlog::spdlog | ||
string_utils |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think it will help to create a list like "CLP_FFI_JS_CLP_S_LINK_LIBS" and reference it here so that we can tell why some lib is needed (which helps clean up / reorganization in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry this is my fault. I forget to remove those link libraries. They were only used for debugging. We only need to list the libraries that we directly use.
PRIVATE | |
antlr4_static | |
clp_s::search::ast | |
clp_s::search::kql | |
embind | |
simdjson::simdjson | |
spdlog::spdlog | |
string_utils | |
PRIVATE | |
clp_s::search::kql | |
embind | |
spdlog::spdlog |
src/clp_ffi_js/ir/StreamReader.hpp
Outdated
* | ||
* @param log_level_filter Array of selected log levels | ||
* @param kql_filter KQL expression to filter structured IR. Use an empty string to disable. For | ||
* unstructured IR, the filter is ignored (a warning is logged). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we improve this a bit to be more explicit so API users know what to pass for unstructured IR?
e.g., a warning is logged if ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quick question: at some point i think we were looking into marking some arguments as optional (supply a default value or leave as undefined) with Emscripten but i vaguely remember we couldn't due to some limitations in Emscripten. is it still the case in 2025?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! I didn't think about this before. I overload this function. This way we don't need to bump minor version again.
auto& reader{m_stream_reader_data_context->get_reader()}; | ||
reader.seek_from_begin(0); | ||
auto indexes{query_log_event_indices(reader, kql_filter)}; | ||
m_filtered_log_event_map = std::make_optional(std::move(indexes)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't really need to wrap it with make_optional
right? (the compiler should know how to do the transformation)
m_filtered_log_event_map = query_log_event_indices(reader, kql_filter);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we forgot to push this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I forgot to rename indexes, but make_optional is removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sth may not strongly related to this PR: We've added projection support within the query handler. The timestamp and log level key search can now be replaced by a projection.
This null behavior wasn't created by me, and I've been annoyed by this existing behavior of clp-ffi-js so many times... Basically,
I've run this script and the output is exactly as expected.
As for the exception when kql is invalid, we probably should print an error message. This can be done by catching an exception in the log viewer. |
Thanks for pointing this out! We are actually already planning this in our offline discussion, and this issue tracks it. |
sounds good. let's do the refactoring in a future PR because we need to deliver the KQL query UI in the log-viewer by tomorrow. @coderabbitai create an issue for this |
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain@junhaoliao I can see that this refactoring is already tracked in issue #105: "Refactor timestamp and log level key search to use projection" (#105). The issue description matches exactly what hoophalab mentioned about using projection support within the query handler to replace the timestamp and log level key search. Since you have the KQL query UI delivery deadline tomorrow, this existing issue can serve as the tracking item for the future refactoring PR. 🏁 Script executed:
Length of output: 1783 @junhaoliao I see that issue #105 is actually about merging log level filtering with KQL filtering by extending query expressions, which is different from the projection support refactoring mentioned in this discussion. Let me create a new issue specifically for the projection support refactoring. 🏁 Script executed:
Length of output: 1525 @junhaoliao I've created issue #106: "Refactor timestamp and log level key search to use projection support" (#106) and assigned it to you. This issue specifically tracks the refactoring to use projection support within the query handler for timestamp and log level key search, as mentioned by @hoophalab. It's separate from issue #105, which focuses on merging log level filtering with KQL filtering by extending query expressions. Good luck with the KQL query UI delivery tomorrow! 🧠 Learnings used
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LinZhihao-723 would you have time to do a quick review on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (1)
48-55
: Fix duplicate/inaccurate Doxygen @return linesThe comment repeats and mentions “reader option” instead of “filter option”. Streamline it.
/** * @param filter_option The JavaScript object representing a filter option. * @param leaf_node_type The type of the leaf node for the schema tree full branch. - * @return A schema tree full branch constructed based on the provided filter option and leaf node - * type. - * @return A schema tree full branch constructed based on the provided reader option. - * @return std::nullopt if `option` is `null`. + * @return A schema tree full branch constructed based on the provided filter option and leaf node type, + * or std::nullopt if `filter_option` is null/undefined. */
♻️ Duplicate comments (6)
CMakeLists.txt (2)
172-179
: Vendor deps: add top-level guard and EXCLUDE_FROM_ALL to prevent duplicate-build conflictsUnconditionally adding third-party deps (ANTLR, date, simdjson, spdlog) can conflict when this project is included as a subproject and those deps are already present upstream, reproducing the Ninja “multiple rules generate …” issue mentioned earlier. Guard these and mark them EXCLUDE_FROM_ALL to avoid unintended builds.
Apply this diff:
-set(ANTLR_BUILD_SHARED OFF) -set(ANTLR_BUILD_STATIC ON) -add_subdirectory(${CLP_FFI_JS_ANTLR_RUNTIME_SOURCE_DIRECTORY}) - -add_subdirectory(${CLP_FFI_JS_DATE_SOURCE_DIRECTORY}) -add_subdirectory(${CLP_FFI_JS_SIMDJSON_SOURCE_DIRECTORY}) -add_subdirectory(${CLP_FFI_JS_SPDLOG_SOURCE_DIRECTORY}) +set(ANTLR_BUILD_SHARED OFF) +set(ANTLR_BUILD_STATIC ON) +if(CMAKE_SOURCE_DIR STREQUAL PROJECT_SOURCE_DIR) + # Only vendor these deps at the top level to avoid clashes when used as a subproject. + add_subdirectory(${CLP_FFI_JS_ANTLR_RUNTIME_SOURCE_DIRECTORY} EXCLUDE_FROM_ALL) + add_subdirectory(${CLP_FFI_JS_DATE_SOURCE_DIRECTORY} EXCLUDE_FROM_ALL) + add_subdirectory(${CLP_FFI_JS_SIMDJSON_SOURCE_DIRECTORY} EXCLUDE_FROM_ALL) + add_subdirectory(${CLP_FFI_JS_SPDLOG_SOURCE_DIRECTORY} EXCLUDE_FROM_ALL) +endif()
190-196
: Minimise link surface: drop explicit AST if transitive via KQLIf clp_s::search::kql already links against clp_s::search::ast transitively (likely), directly linking the AST target is redundant and slightly increases surface area for ODR/link churn. Prefer relying on transitive deps.
Apply this diff:
target_link_libraries(${CLP_FFI_JS_BIN_NAME} PRIVATE - clp_s::search::ast clp_s::search::kql embind spdlog::spdlog )
If KQL does not export AST transitively in your pinned CLP version, ignore this change — but consider requesting that upstream exposes it as a transitive dependency.
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (4)
12-12
: Remove stray <type_utils.hpp> include (redundant and may fail on some setups)This header likely doesn’t exist in the include paths; <clp/type_utils.hpp> (Line 20) already provides enum_to_underlying_type.
Apply this diff:
-#include <type_utils.hpp>
154-159
: KQL path: guard null data context and treat whitespace-only KQL as emptyAvoid dereferencing a null m_stream_reader_data_context and skip queries for whitespace-only KQL.
- if (false == kql_filter.empty()) { - auto& reader{m_stream_reader_data_context->get_reader()}; - reader.seek_from_begin(0); - auto indices{query_log_event_indices(reader, kql_filter)}; - m_filtered_log_event_map = std::move(indices); - } + auto const has_non_space = + (std::string::npos != kql_filter.find_first_not_of(" \t\n\r\f\v")); + if (has_non_space) { + if (nullptr == m_stream_reader_data_context) { + SPDLOG_WARN("KQL provided but stream reader data context is not available; skipping."); + } else { + auto& reader{m_stream_reader_data_context->get_reader()}; + reader.seek_from_begin(0); + auto indices{query_log_event_indices(reader, kql_filter)}; + m_filtered_log_event_map = std::move(indices); + } + }
162-167
: Reserve capacity for filtered indices to avoid reallocationsMinor perf improvement; reserve based on the expected source set size.
- std::vector<size_t> filtered_log_event_map; + std::vector<size_t> filtered_log_event_map; + filtered_log_event_map.reserve( + m_filtered_log_event_map.has_value() + ? m_filtered_log_event_map->size() + : m_deserialized_log_events->size()); - auto const filter_levels{ + auto const filter_levels{ emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter) };
164-176
: Optional: use an unordered_set for O(1) level membership checksGiven the small number of log levels this is micro, but if the level list grows, set lookup avoids per-event linear scans.
- auto const filter_levels{ - emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter) - }; + auto const filter_levels_vec{ + emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter) + }; + std::unordered_set<std::underlying_type_t<LogLevel>> const filter_levels{ + filter_levels_vec.begin(), filter_levels_vec.end() + }; @@ - if (std::ranges::find( - filter_levels, - clp::enum_to_underlying_type(log_event.get_log_level()) - ) - != filter_levels.end()) + if (filter_levels.find( + clp::enum_to_underlying_type(log_event.get_log_level())) + != filter_levels.end()) { filtered_log_event_map.emplace_back(log_event_idx); }Add the missing include near the other standard headers:
+#include <unordered_set>
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
CMakeLists.txt
(4 hunks)src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
(4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>
rather than!<expression>
.
Files:
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
🧠 Learnings (11)
📓 Common learnings
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#104
File: src/clp_ffi_js/ir/query_methods.cpp:132-135
Timestamp: 2025-08-21T07:18:06.881Z
Learning: In query methods for IR streams (like query_log_event_indices in src/clp_ffi_js/ir/query_methods.cpp), when encountering std::errc::result_out_of_range indicating an incomplete IR stream, log a warning instead of an error since partial results are returned and can still be useful.
📚 Learning: 2024-11-04T19:35:32.602Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.
Applied to files:
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
📚 Learning: 2025-08-21T07:18:06.881Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#104
File: src/clp_ffi_js/ir/query_methods.cpp:132-135
Timestamp: 2025-08-21T07:18:06.881Z
Learning: In query methods for IR streams (like query_log_event_indices in src/clp_ffi_js/ir/query_methods.cpp), when encountering std::errc::result_out_of_range indicating an incomplete IR stream, log a warning instead of an error since partial results are returned and can still be useful.
Applied to files:
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
📚 Learning: 2024-12-09T20:27:28.454Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#35
File: src/clp_ffi_js/ir/utils.hpp:157-157
Timestamp: 2024-12-09T20:27:28.454Z
Learning: In `src/clp_ffi_js/ir/utils.hpp`, when `log_event_idx` is of type `size_t`, avoid suggesting changes to expressions like `log_event_idx + 1` for potential integer overflow concerns.
Applied to files:
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
📚 Learning: 2024-11-07T08:35:20.376Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#30
File: src/clp_ffi_js/ir/StructuredIrStreamReader.hpp:187-190
Timestamp: 2024-11-07T08:35:20.376Z
Learning: In the `StructuredIrStreamReader` constructor of `src/clp_ffi_js/ir/StructuredIrStreamReader.hpp`, the `stream_reader_data_context` parameter is already wrapped in a `std::make_unique`, ensuring it matches the member variable `m_stream_reader_data_context` of type `std::unique_ptr<StreamReaderDataContext<StructuredIrDeserializer>>`.
Applied to files:
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
📚 Learning: 2024-11-04T19:55:27.398Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:162-168
Timestamp: 2024-11-04T19:55:27.398Z
Learning: In the `IrStreamReader::decode_range` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, message decoding failures are not expected unless the file is corrupted or the deserializer in CLP core does not handle edge cases, so it's appropriate to throw an exception upon message decoding failure.
Applied to files:
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
📚 Learning: 2024-11-07T08:38:39.882Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#30
File: src/clp_ffi_js/ir/StreamReader.cpp:179-179
Timestamp: 2024-11-07T08:38:39.882Z
Learning: The function `clp::ffi::ir_stream::validate_protocol_version` used in `src/clp_ffi_js/ir/StreamReader.cpp` is imported from the CLP upstream library and may not be present in the local codebase.
Applied to files:
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
CMakeLists.txt
📚 Learning: 2024-11-10T02:14:19.411Z
Learnt from: davemarco
PR: y-scope/clp-ffi-js#32
File: src/clp_ffi_js/ir/LogEventWithFilterData.hpp:0-0
Timestamp: 2024-11-10T02:14:19.411Z
Learning: In the context of `LogEventWithFilterData`, it is acceptable for `log_level` to be `LogLevel::NONE`; additional validation against `LogLevel::NONE` is not necessary.
Applied to files:
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
📚 Learning: 2024-11-04T19:37:27.204Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:85-91
Timestamp: 2024-11-04T19:37:27.204Z
Learning: In the `IrStreamReader::deserialize_stream` method, we handle missing End-of-Stream packets gracefully by logging an error and breaking the loop when encountering `std::errc::result_out_of_range`, rather than throwing an exception.
Applied to files:
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
📚 Learning: 2024-11-14T15:31:45.926Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#35
File: src/clp_ffi_js/ir/StructuredIrStreamReader.cpp:50-51
Timestamp: 2024-11-14T15:31:45.926Z
Learning: Ensure that `clp-ffi-js` supports JavaScript usage without relying on the TypeScript definition file, and keep track of related TODOs in a GitHub issue.
Applied to files:
CMakeLists.txt
📚 Learning: 2024-10-05T19:27:37.240Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#11
File: package.json:24-33
Timestamp: 2024-10-05T19:27:37.240Z
Learning: In the `clp-ffi-js` project, the files in the `/dist` directory are generated during the build process and are not checked into source control, so their absence before compilation is expected.
Applied to files:
CMakeLists.txt
🧬 Code graph analysis (1)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (6)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (2)
filter_log_events
(83-94)filter_log_events
(83-86)src/clp_ffi_js/ir/StreamReader.hpp (3)
log_level_filter
(113-113)log_level_filter
(121-123)log_level_filter
(121-121)src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (1)
log_level_filter
(61-64)src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (1)
log_level_filter
(66-69)src/clp_ffi_js/ir/query_methods.hpp (1)
query_log_event_indices
(22-23)src/clp_ffi_js/ir/query_methods.cpp (1)
query_log_event_indices
(79-80)
🔇 Additional comments (4)
CMakeLists.txt (2)
105-105
: New IR query plumbing source added — looks goodAdding src/clp_ffi_js/ir/query_methods.cpp into CLP_FFI_JS_SRC_MAIN is consistent with the new KQL feature. No build-system issues here.
117-118
: Upstream IR search sources wired in — confirm dependency coverageIncluding QueryHandlerImpl.cpp and search/utils.cpp from CLP core makes sense for KQL. Given these files depend on clp_s search AST/KQL, the later link lines should cover that. Please confirm these paths exist across the pinned CLP commit/branch you’re integrating to avoid brittle path drift.
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2)
3-3
: Includes for ranges and enum utilities are correctGood additions: for std::ranges::find, <clp/type_utils.hpp> for enum_to_underlying_type, and the new query_methods.hpp import.
Also applies to: 20-20, 31-31
219-220
: Correct: downgrade incomplete stream from error to warningMatches the guidance to warn on std::errc::result_out_of_range while returning partial results.
if (m_filtered_log_event_map.has_value() | ||
&& m_filtered_log_event_map->size() == m_deserialized_log_events->size()) | ||
{ | ||
m_filtered_log_event_map = std::nullopt; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Document “null = all events match” behaviour in the public API
This normalisation is fine; ensure the header/TS typings and README explicitly state that get_filtered_log_event_map returns null when the filtered set equals the full set.
I can update the doxygen in StructuredIrStreamReader.hpp and the TS wrapper comments to make this explicit; let me know if you want a follow-up PR.
🤖 Prompt for AI Agents
In src/clp_ffi_js/ir/StructuredIrStreamReader.cpp around lines 195-199, the code
normalizes m_filtered_log_event_map to null when the filtered set equals the
full set; update the public API docs to document this behavior: add a doxygen
note to StructuredIrStreamReader.hpp on get_filtered_log_event_map stating it
returns null/optional empty when filtered == full (i.e., "null = all events
match"), update the TypeScript wrapper typings and JSDoc/comments to match that
contract, and add a short note to the README (or relevant API docs) describing
the same semantics so callers know to treat null as "no filtering applied."
// NOLINTBEGIN(bugprone-lambda-function-name) | ||
SPDLOG_ERROR( | ||
"Failed to deserialize log event to JSON: {}:{}", | ||
error_code.category().name(), | ||
error_code.message() | ||
); | ||
// NOLINTEND(bugprone-lambda-function-name) | ||
return std::string{cEmptyJsonStr}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Localise the lint suppression
Use NOLINTNEXTLINE to target the SPDLOG call, keeping the suppression as narrow as possible.
- // NOLINTBEGIN(bugprone-lambda-function-name)
- SPDLOG_ERROR(
+ // NOLINTNEXTLINE(bugprone-lambda-function-name)
+ SPDLOG_ERROR(
"Failed to deserialize log event to JSON: {}:{}",
error_code.category().name(),
error_code.message()
);
- // NOLINTEND(bugprone-lambda-function-name)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// NOLINTBEGIN(bugprone-lambda-function-name) | |
SPDLOG_ERROR( | |
"Failed to deserialize log event to JSON: {}:{}", | |
error_code.category().name(), | |
error_code.message() | |
); | |
// NOLINTEND(bugprone-lambda-function-name) | |
return std::string{cEmptyJsonStr}; | |
// NOLINTNEXTLINE(bugprone-lambda-function-name) | |
SPDLOG_ERROR( | |
"Failed to deserialize log event to JSON: {}:{}", | |
error_code.category().name(), | |
error_code.message() | |
); | |
return std::string{cEmptyJsonStr}; |
🤖 Prompt for AI Agents
In src/clp_ffi_js/ir/StructuredIrStreamReader.cpp around lines 242 to 249, the
current NOLINTBEGIN/END block is too broad; replace it by removing the
NOLINTBEGIN(bugprone-lambda-function-name) and
NOLINTEND(bugprone-lambda-function-name) and place a single
NOLINTNEXTLINE(bugprone-lambda-function-name) immediately above the SPDLOG_ERROR
call so the lint suppression only applies to that call; keep the SPDLOG_ERROR
invocation and the subsequent return unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly lgtm. Some minor comments.
LogLevelFilterTsType const& log_level_filter, | ||
std::string const& kql_filter | ||
) { | ||
m_filtered_log_event_map = std::nullopt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name xxx_map
is a bit confusing. We usually use _map
for a hash map (std::unordered_map
) or a tree-based map. How about filtered_log_event_indices
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been feeling the same way since day 1... but this is used in ffi-js and log viewer. I guess the reasoning could be that this is a map from filtered_log_index to unfiltered_index. @coderabbitai can you create an issue on this refacctoring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
I'll create an issue for this refactoring request. The naming is indeed confusing since _map
typically indicates a hash map or tree-based map structure.
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue for refactoring the confusing naming convention
gh issue create \
--title "Refactor confusing 'm_filtered_log_event_map' naming to 'filtered_log_event_indices'" \
--body "## Problem
The current naming convention using \`m_filtered_log_event_map\` is confusing because:
- The \`_map\` suffix typically indicates a hash map (\`std::unordered_map\`) or tree-based map
- The actual data structure appears to be a vector/array of indices, not a key-value mapping
- This naming pattern is used across both ffi-js and log viewer codebases
## Proposed Solution
Rename \`m_filtered_log_event_map\` to \`filtered_log_event_indices\` (or similar) to better reflect that it contains a collection of indices rather than a mapping structure.
## Context
This naming has been identified as confusing by multiple developers and would benefit from refactoring to improve code clarity and follow better naming conventions.
## References
- Original discussion: https://github.com/y-scope/clp-ffi-js/pull/104#discussion_r2291969811
- Raised by: @LinZhihao-723 and @hoophalab
## Impact
This is a cross-cutting refactoring that would require coordinated changes across:
- ffi-js codebase
- log viewer codebase
The change would improve code maintainability and reduce developer confusion." \
--assignee hoophalab
Length of output: 1299
@hoophalab I've created issue #107 to track this refactoring: #107
You're absolutely right that this naming has been confusing. I can see from the code that m_filtered_log_event_map
is actually a std::vector<size_t>
containing indices, not a map data structure. The issue is now assigned to you and includes references to this discussion and the original concern raised by @LinZhihao-723.
The refactoring will indeed require coordinated changes across both ffi-js and log viewer codebases, but it will significantly improve code clarity for future developers.
(\ /)
( ._. )
o_(")(") *thinking about better names*
🧬 Code Graph Analysis Results
-
src/clp_ffi_js/ir/StreamReader.cpp | Summary (generic_decode_range, lines 246-362)
- Summary: This function decodes a range of log events. When use_filter is true, it uses the provided filtered_log_event_map to map the i-th entry in the output to the corresponding unfiltered log event index (log_event_idx = filtered_log_event_map->at(i)); when use_filter is false, it uses log_event_idx = i. It builds a JS array of decoded log events by calling log_event_to_string on each log_event, and emits entries via an EM_ASM block. It handles invalid ranges and missing filtered maps by returning a null value in DecodedResultsTsType.
- Location: src/clp_ffi_js/ir/StreamReader.cpp, 246-362
-
src/clp_ffi_js/ir/query_methods.hpp | Snippet
- Signature:
- collect_matched_log_event_indices(clp::ReaderInterface& reader, std::string const& query_string)
-> std::vector<size_t>
-
src/clp_ffi_js/ir/StreamReaderDataContext.hpp | Snippet
- get_deserializer():
- [[nodiscard]] auto get_deserializer() -> Deserializer& { return m_deserializer; }
-
src/clp_ffi_js/ir/StreamReaderDataContext.hpp | Snippet
- get_reader():
- [[nodiscard]] auto get_reader() -> clp::ReaderInterface& { return *m_reader; }
-
src/clp_ffi_js/utils.hpp | Snippet
- dump_json_with_replace signature:
- dump_json_with_replace(nlohmann::json const& json_obj) -> std::string
-
src/clp_ffi_js/utils.cpp | Snippet
- dump_json_with_replace implementation:
[[nodiscard]] auto dump_json_with_replace(nlohmann::json const& json_obj) -> std::string { return json_obj.dump(-1, ' ', false, nlohmann::json::error_handler_t::replace); }
-
src/clp_ffi_js/ir/LogEventWithFilterData.hpp | Snippet
- get_log_event():
- [[nodiscard]] auto get_log_event() const -> LogEvent const& { return m_log_event; }
-
src/clp_ffi_js/ir/LogEventWithFilterData.hpp | Snippet
- get_log_level():
- [[nodiscard]] auto get_log_level() const -> LogLevel { return m_log_level; }
-
src/clp_ffi_js/ir/LogEventWithFilterData.hpp | Snippet
- get_timestamp():
- [[nodiscard]] auto get_timestamp() const -> clp::ir::epoch_time_ms_t { return m_timestamp; }
-
src/clp_ffi_js/ir/LogEventWithFilterData.hpp | Snippet
- get_utc_offset():
- [[nodiscard]] auto get_utc_offset() const -> UtcOffset { return m_utc_offset; }
-
src/clp_ffi_js/ir/decoding_methods.cpp | Snippet
- convert_metadata_to_js_object:
auto convert_metadata_to_js_object(nlohmann::json const& metadata) -> MetadataTsType { auto const metadata_str{dump_json_with_replace(metadata)}; auto const metadata_obj{ emscripten::val::global("JSON").call<emscripten::val>("parse", metadata_str) }; return MetadataTsType{metadata_obj}; }
- Location: src/clp_ffi_js/ir/decoding_methods.cpp, 91-97
-
src/clp_ffi_js/ir/decoding_methods.hpp | Snippet
- Signature:
- convert_metadata_to_js_object(nlohmann::json const& metadata) -> MetadataTsType
-
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp | Snippet
- is_auto_generated():
- [[nodiscard]] auto is_auto_generated() const -> bool { return m_is_auto_generated; }
-
src/clp_ffi_js/ir/StructuredIrStreamReader.hpp | Snippet
- get_ir_stream_type():
- [[nodiscard]] auto get_ir_stream_type() const -> StreamType override {
return StreamType::Structured;
}
🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#104
File: src/clp_ffi_js/ir/query_methods.cpp:132-135
Timestamp: 2025-08-21T07:18:06.881Z
Learning: In query methods for IR streams (like query_log_event_indices in src/clp_ffi_js/ir/query_methods.cpp), when encountering std::errc::result_out_of_range indicating an incomplete IR stream, log a warning instead of an error since partial results are returned and can still be useful.
void StructuredIrStreamReader::filter_log_events( | ||
LogLevelFilterTsType const& log_level_filter, | ||
std::string const& kql_filter | ||
) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with the current implementation. There can be a more efficient way: we can embed this log-level filtering using a lambda into the query method. But that might be over-engineering at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
♻️ Duplicate comments (11)
src/clp_ffi_js/ir/query_methods.hpp (2)
4-9
: Fix missing include and drop unused headerThis header declares std::string but doesn’t include . Also, <ystdlib/containers/Array.hpp> isn’t used here.
Apply:
#include <vector> +#include <string> #include <clp/ReaderInterface.hpp> -#include <ystdlib/containers/Array.hpp>
18-19
: Doc typos/casing: “Query” → “query”; use straight or curly apostrophes consistentlyTighten the user-facing contract.
- * @throws ClpFfiJsException if the preamble couldn't be deserialized or the Query couldn't be - * executed. + * @throws ClpFfiJsException if the preamble couldn't be deserialized or the query couldn't be + * executed.src/clp_ffi_js/ir/StreamReader.hpp (2)
99-101
: Clarify “null vs full-vector” invariant for callersBe explicit that null represents “all events match” to avoid ambiguous handling downstream.
/** * @return The filtered log events map. - * This is a sorted list of log event indices that match the filter. - * If all log events match the filter, it returns `null` or a vector of all log events. + * This is a sorted list of log event indices that match the filter. + * If all log events match the filter, this returns `null` to signal “no-op filtering”. + * Callers should treat `null` as equivalent to “full coverage”. */
107-111
: Document composition (AND), failure semantics, and whitespace handling for kql_filterSpell out how KQL composes with log_level_filter, what happens on invalid KQL, and how unstructured streams behave. Also remove the stray colon for consistency.
- * @param log_level_filter Array of selected log levels. - * @param kql_filter: A KQL expression used to filter kv-pairs. - * - For structured IR: the filter is applied when non-empty. - * - For unstructured IR: the filter is always ignored, and a warning is logged when non-empty. + * @param log_level_filter Array of selected log levels. + * @param kql_filter A KQL expression applied in conjunction (logical AND) with `log_level_filter`. + * Use an empty or whitespace-only string to disable KQL. + * Behaviour: + * - Structured IR: the filter is applied when kql_filter is non-empty (post-trim). + * - Unstructured IR: the filter is ignored; when non-empty (post-trim) a warning is logged. + * Failure semantics: + * - Invalid or unsupported KQL causes this call to throw `ClpFfiJsException` + * with `ErrorCode_Failure`. Callers should catch and surface a user-friendly message.src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (3)
220-227
: Localise the lint suppressionPrefer NOLINTNEXTLINE to narrowly target the SPDLOG line.
- // NOLINTBEGIN(bugprone-lambda-function-name) SPDLOG_ERROR( "Failed to deserialize log event to JSON: {}:{}", error_code.category().name(), error_code.message() ); - // NOLINTEND(bugprone-lambda-function-name) + // NOLINTNEXTLINE(bugprone-lambda-function-name)
151-157
: KQL handling: treat whitespace-only as empty and guard data contextAvoid running a query for whitespace-only KQL; add a defensive null check before dereferencing the data context.
m_filtered_log_event_map.reset(); - if (false == kql_filter.empty()) { - auto& reader{m_stream_reader_data_context->get_reader()}; - reader.seek_from_begin(0); - m_filtered_log_event_map.emplace(collect_matched_log_event_indices(reader, kql_filter)); - } + auto const has_non_space{ + std::string::npos != kql_filter.find_first_not_of(" \t\n\r\f\v") + }; + if (has_non_space) { + if (nullptr == m_stream_reader_data_context) { + SPDLOG_WARN("KQL provided but stream reader data context is unavailable; skipping."); + } else { + auto& reader{m_stream_reader_data_context->get_reader()}; + reader.seek_from_begin(0); + m_filtered_log_event_map.emplace( + collect_matched_log_event_indices(reader, kql_filter) + ); + } + }
159-165
: Pre-size the result vector to reduce reallocationsReserve capacity based on the expected iteration set.
- if (false == log_level_filter.isNull()) { - std::vector<size_t> filtered_log_event_map; + if (false == log_level_filter.isNull()) { + std::vector<size_t> filtered_log_event_map; + if (m_filtered_log_event_map.has_value()) { + filtered_log_event_map.reserve(m_filtered_log_event_map->size()); + } else { + filtered_log_event_map.reserve(m_deserialized_log_events->size()); + }src/clp_ffi_js/ir/query_methods.cpp (4)
43-48
: Bug: UTC offset change must not be treated as a decode errorReturning IRErrorCode_Decode_Error will abort on valid streams containing UTC offset changes. Return success here.
[[nodiscard]] static auto handle_utc_offset_change( [[maybe_unused]] UtcOffset utc_offset_old, [[maybe_unused]] UtcOffset utc_offset_new ) -> IRErrorCode { - return IRErrorCode::IRErrorCode_Decode_Error; + return IRErrorCode::IRErrorCode_Success; }
62-68
: Avoid copying potentially large index vectors by moving them outProvide an extract method and use it at the return site.
[[nodiscard]] auto get_deserialized_log_event_indices() const -> std::vector<size_t> const& { return m_deserialized_log_event_indices; } + + // Moves accumulated indices out of the handler. + [[nodiscard]] auto extract_deserialized_log_event_indices() -> std::vector<size_t> { + return std::move(m_deserialized_log_event_indices); + }- return deserializer.get_ir_unit_handler().get_deserialized_log_event_indices(); + return deserializer.get_ir_unit_handler().extract_deserialized_log_event_indices();Also applies to: 126-129
98-103
: Augment error context with the offending KQL for easier debuggingInclude the KQL string in the exception message.
- std::format( - "Failed to create query handler: {} {}", - error_code.category().name(), - error_code.message() - ) + std::format( + "Failed to create query handler: {} {} | KQL: \"{}\"", + error_code.category().name(), + error_code.message(), + query_string + )
83-90
: Readability: name the callback type alias instead of inline decltypeThis improves clarity and diagnostics.
- auto query_handler_result{ - clp::ffi::ir_stream::search::QueryHandler< - decltype(&trivial_new_projected_schema_tree_node_callback)>:: + using NewProjectedNodeCallback = decltype(&trivial_new_projected_schema_tree_node_callback); + auto query_handler_result{ + clp::ffi::ir_stream::search::QueryHandler<NewProjectedNodeCallback>:: create(&trivial_new_projected_schema_tree_node_callback, clp_s::search::kql::parse_kql_expression(query_string_stream), {}, false) };
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
src/clp_ffi_js/ir/StreamReader.hpp
(1 hunks)src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
(4 hunks)src/clp_ffi_js/ir/decoding_methods.hpp
(2 hunks)src/clp_ffi_js/ir/query_methods.cpp
(1 hunks)src/clp_ffi_js/ir/query_methods.hpp
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>
rather than!<expression>
.
Files:
src/clp_ffi_js/ir/decoding_methods.hpp
src/clp_ffi_js/ir/query_methods.hpp
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
src/clp_ffi_js/ir/StreamReader.hpp
src/clp_ffi_js/ir/query_methods.cpp
🧠 Learnings (12)
📚 Learning: 2024-11-04T19:55:27.398Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:162-168
Timestamp: 2024-11-04T19:55:27.398Z
Learning: In the `IrStreamReader::decode_range` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, message decoding failures are not expected unless the file is corrupted or the deserializer in CLP core does not handle edge cases, so it's appropriate to throw an exception upon message decoding failure.
Applied to files:
src/clp_ffi_js/ir/decoding_methods.hpp
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
src/clp_ffi_js/ir/query_methods.cpp
📚 Learning: 2024-11-04T19:35:32.602Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.
Applied to files:
src/clp_ffi_js/ir/decoding_methods.hpp
src/clp_ffi_js/ir/query_methods.hpp
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
src/clp_ffi_js/ir/StreamReader.hpp
src/clp_ffi_js/ir/query_methods.cpp
📚 Learning: 2025-08-21T07:18:06.881Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#104
File: src/clp_ffi_js/ir/query_methods.cpp:132-135
Timestamp: 2025-08-21T07:18:06.881Z
Learning: In query methods for IR streams (like query_log_event_indices in src/clp_ffi_js/ir/query_methods.cpp), when encountering std::errc::result_out_of_range indicating an incomplete IR stream, log a warning instead of an error since partial results are returned and can still be useful.
Applied to files:
src/clp_ffi_js/ir/decoding_methods.hpp
src/clp_ffi_js/ir/query_methods.hpp
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
src/clp_ffi_js/ir/StreamReader.hpp
src/clp_ffi_js/ir/query_methods.cpp
📚 Learning: 2024-12-09T20:27:28.454Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#35
File: src/clp_ffi_js/ir/utils.hpp:157-157
Timestamp: 2024-12-09T20:27:28.454Z
Learning: In `src/clp_ffi_js/ir/utils.hpp`, when `log_event_idx` is of type `size_t`, avoid suggesting changes to expressions like `log_event_idx + 1` for potential integer overflow concerns.
Applied to files:
src/clp_ffi_js/ir/decoding_methods.hpp
src/clp_ffi_js/ir/query_methods.hpp
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
src/clp_ffi_js/ir/query_methods.cpp
📚 Learning: 2024-11-07T08:35:20.376Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#30
File: src/clp_ffi_js/ir/StructuredIrStreamReader.hpp:187-190
Timestamp: 2024-11-07T08:35:20.376Z
Learning: In the `StructuredIrStreamReader` constructor of `src/clp_ffi_js/ir/StructuredIrStreamReader.hpp`, the `stream_reader_data_context` parameter is already wrapped in a `std::make_unique`, ensuring it matches the member variable `m_stream_reader_data_context` of type `std::unique_ptr<StreamReaderDataContext<StructuredIrDeserializer>>`.
Applied to files:
src/clp_ffi_js/ir/decoding_methods.hpp
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
src/clp_ffi_js/ir/query_methods.cpp
📚 Learning: 2024-11-07T08:38:39.882Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#30
File: src/clp_ffi_js/ir/StreamReader.cpp:179-179
Timestamp: 2024-11-07T08:38:39.882Z
Learning: The function `clp::ffi::ir_stream::validate_protocol_version` used in `src/clp_ffi_js/ir/StreamReader.cpp` is imported from the CLP upstream library and may not be present in the local codebase.
Applied to files:
src/clp_ffi_js/ir/decoding_methods.hpp
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
src/clp_ffi_js/ir/query_methods.cpp
📚 Learning: 2024-11-04T21:06:24.453Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/decoding_methods.cpp:45-80
Timestamp: 2024-11-04T21:06:24.453Z
Learning: In `src/clp_ffi_js/ir/decoding_methods.cpp`, within the `get_version(clp::ReaderInterface&)` function, avoid returning `std::string_view` when referencing local data that goes out of scope. Instead, return `std::string` to ensure data validity.
Applied to files:
src/clp_ffi_js/ir/decoding_methods.hpp
src/clp_ffi_js/ir/query_methods.hpp
src/clp_ffi_js/ir/query_methods.cpp
📚 Learning: 2024-11-10T02:14:19.411Z
Learnt from: davemarco
PR: y-scope/clp-ffi-js#32
File: src/clp_ffi_js/ir/LogEventWithFilterData.hpp:0-0
Timestamp: 2024-11-10T02:14:19.411Z
Learning: In the context of `LogEventWithFilterData`, it is acceptable for `log_level` to be `LogLevel::NONE`; additional validation against `LogLevel::NONE` is not necessary.
Applied to files:
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
📚 Learning: 2024-11-04T19:37:27.204Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:85-91
Timestamp: 2024-11-04T19:37:27.204Z
Learning: In the `IrStreamReader::deserialize_stream` method, we handle missing End-of-Stream packets gracefully by logging an error and breaking the loop when encountering `std::errc::result_out_of_range`, rather than throwing an exception.
Applied to files:
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
src/clp_ffi_js/ir/query_methods.cpp
📚 Learning: 2024-11-05T16:07:26.232Z
Learnt from: davemarco
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/decoding_methods.cpp:0-0
Timestamp: 2024-11-05T16:07:26.232Z
Learning: When suggesting the use of `std::format` with error codes or custom types in C++, ensure that the type is formattable. If the type is unknown or not directly supported by `std::format`, consider casting it to a compatible type (e.g., using `clp::enum_to_underlying_type`) to avoid compilation errors.
Applied to files:
src/clp_ffi_js/ir/query_methods.cpp
📚 Learning: 2025-06-27T19:57:25.643Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#99
File: src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp:134-134
Timestamp: 2025-06-27T19:57:25.643Z
Learning: In the clp-ffi-js codebase, prefer list initialization `{}` over assignment operator `=` for variable declarations to prevent narrowing conversions and maintain consistency. This applies to C++ files throughout the codebase.
Applied to files:
src/clp_ffi_js/ir/query_methods.cpp
📚 Learning: 2024-11-14T15:31:45.926Z
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#35
File: src/clp_ffi_js/ir/StructuredIrStreamReader.cpp:50-51
Timestamp: 2024-11-14T15:31:45.926Z
Learning: Ensure that `clp-ffi-js` supports JavaScript usage without relying on the TypeScript definition file, and keep track of related TODOs in a GitHub issue.
Applied to files:
src/clp_ffi_js/ir/query_methods.cpp
🧬 Code graph analysis (3)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (4)
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (2)
filter_log_events
(83-94)filter_log_events
(83-86)src/clp_ffi_js/ir/StreamReader.hpp (3)
log_level_filter
(113-113)log_level_filter
(121-123)log_level_filter
(121-121)src/clp_ffi_js/ir/query_methods.cpp (1)
collect_matched_log_event_indices
(80-81)src/clp_ffi_js/ir/decoding_methods.hpp (2)
deserialize_log_events
(44-69)deserialize_log_events
(44-47)
src/clp_ffi_js/ir/StreamReader.hpp (6)
src/clp_ffi_js/ir/query_methods.cpp (6)
nodiscard
(36-41)nodiscard
(43-48)nodiscard
(50-56)nodiscard
(58-60)nodiscard
(62-64)nodiscard
(79-129)src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp (2)
nodiscard
(53-55)log_level_filter
(61-64)src/clp_ffi_js/ir/StructuredIrStreamReader.hpp (2)
nodiscard
(58-60)log_level_filter
(66-69)src/clp_ffi_js/ir/StreamReaderDataContext.hpp (2)
nodiscard
(42-42)nodiscard
(44-44)src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (2)
filter_log_events
(147-198)filter_log_events
(147-150)src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp (2)
filter_log_events
(83-94)filter_log_events
(83-86)
src/clp_ffi_js/ir/query_methods.cpp (2)
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (7)
maybe_unused
(119-122)log_event
(110-111)log_event
(154-154)log_event
(164-165)log_event
(175-175)is_auto_generated
(132-136)schema_tree
(74-77)src/clp_ffi_js/ir/decoding_methods.hpp (2)
deserialize_log_events
(44-69)deserialize_log_events
(44-47)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: lint (macos-latest)
- GitHub Check: lint (ubuntu-latest)
🔇 Additional comments (4)
src/clp_ffi_js/ir/query_methods.hpp (1)
21-23
: API name reads clearly — good changeRenaming to collect_matched_log_event_indices removes ambiguity vs. “index”. LGTM.
src/clp_ffi_js/ir/decoding_methods.hpp (1)
54-57
: Correct severity for incomplete IR stream — goodUsing SPDLOG_WARN for std::errc::result_out_of_range matches the codebase guideline that partial results are acceptable. No change needed.
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (1)
3-3
: Correct include for std::ranges::find — goodAdding in this TU is required for std::ranges::find. Looks good.
src/clp_ffi_js/ir/query_methods.cpp (1)
126-129
: Shared loop reuse via deserialize_log_events — goodDelegating to the common helper keeps error-handling consistent across readers. Nice.
#include <clp/ffi/ir_stream/Deserializer.hpp> | ||
#include <clp/ffi/ir_stream/IrUnitHandlerReq.hpp> | ||
#include <clp/ffi/ir_stream/search/QueryHandlerReq.hpp> | ||
#include <clp/ReaderInterface.hpp> | ||
#include <nlohmann/json.hpp> | ||
|
||
#include <clp_ffi_js/ClpFfiJsException.hpp> | ||
#include <clp_ffi_js/ir/StreamReader.hpp> | ||
#include <clp_ffi_js/ir/StructuredIrStreamReader.hpp> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add required headers for std::format and logging; avoid relying on transitive includes
This TU uses std::format and SPDLOG_WARN; include their headers explicitly. Also include <system_error> for std::errc.
#include <clp/ffi/ir_stream/Deserializer.hpp>
#include <clp/ffi/ir_stream/IrUnitHandlerReq.hpp>
#include <clp/ffi/ir_stream/search/QueryHandlerReq.hpp>
#include <clp/ReaderInterface.hpp>
#include <nlohmann/json.hpp>
#include <clp_ffi_js/ClpFfiJsException.hpp>
#include <clp_ffi_js/ir/StreamReader.hpp>
#include <clp_ffi_js/ir/StructuredIrStreamReader.hpp>
+#include <format>
+#include <spdlog/spdlog.h>
+#include <system_error>
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
#include <clp/ffi/ir_stream/Deserializer.hpp> | |
#include <clp/ffi/ir_stream/IrUnitHandlerReq.hpp> | |
#include <clp/ffi/ir_stream/search/QueryHandlerReq.hpp> | |
#include <clp/ReaderInterface.hpp> | |
#include <nlohmann/json.hpp> | |
#include <clp_ffi_js/ClpFfiJsException.hpp> | |
#include <clp_ffi_js/ir/StreamReader.hpp> | |
#include <clp_ffi_js/ir/StructuredIrStreamReader.hpp> | |
#include <clp/ffi/ir_stream/Deserializer.hpp> | |
#include <clp/ffi/ir_stream/IrUnitHandlerReq.hpp> | |
#include <clp/ffi/ir_stream/search/QueryHandlerReq.hpp> | |
#include <clp/ReaderInterface.hpp> | |
#include <nlohmann/json.hpp> | |
#include <clp_ffi_js/ClpFfiJsException.hpp> | |
#include <clp_ffi_js/ir/StreamReader.hpp> | |
#include <clp_ffi_js/ir/StructuredIrStreamReader.hpp> | |
#include <format> | |
#include <spdlog/spdlog.h> | |
#include <system_error> |
🤖 Prompt for AI Agents
In src/clp_ffi_js/ir/decoding_methods.hpp around lines 4 to 13, this translation
unit uses std::format, SPDLOG_WARN and std::errc but relies on transitive
includes; add explicit includes for <format>, <spdlog/spdlog.h> and
<system_error> near the other headers to ensure std::format, spdlog macros and
std::errc are available and avoid depending on transitive headers.
#include <clp_ffi_js/ir/StreamReader.hpp> | ||
#include <clp_ffi_js/ir/StructuredIrStreamReader.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Decouple: remove unnecessary include of StructuredIrStreamReader.hpp
This header doesn’t reference StructuredIrStreamReader; drop the include to reduce compile-time and coupling.
-#include <clp_ffi_js/ir/StreamReader.hpp>
-#include <clp_ffi_js/ir/StructuredIrStreamReader.hpp>
+#include <clp_ffi_js/ir/StreamReader.hpp>
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
#include <clp_ffi_js/ir/StreamReader.hpp> | |
#include <clp_ffi_js/ir/StructuredIrStreamReader.hpp> | |
#include <clp_ffi_js/ir/StreamReader.hpp> |
🤖 Prompt for AI Agents
In src/clp_ffi_js/ir/decoding_methods.hpp around lines 11 to 12, the file
unnecessarily includes StructuredIrStreamReader.hpp even though it does not
reference that type; remove the line including
<clp_ffi_js/ir/StructuredIrStreamReader.hpp> to reduce coupling and compile
time, leaving only the required #include <clp_ffi_js/ir/StreamReader.hpp>, and
run a quick rebuild to confirm no missing symbols.
#include <clp/ErrorCode.hpp> | ||
#include <clp/ffi/ir_stream/decoding_methods.hpp> | ||
#include <clp/ffi/ir_stream/Deserializer.hpp> | ||
#include <clp/ffi/ir_stream/search/QueryHandler.hpp> | ||
#include <clp/ffi/KeyValuePairLogEvent.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Trim unused include: clp/ffi/ir_stream/decoding_methods.hpp
Not referenced in this TU; rely on Deserializer.hpp and our local helper instead.
#include <clp/ErrorCode.hpp>
-#include <clp/ffi/ir_stream/decoding_methods.hpp>
#include <clp/ffi/ir_stream/Deserializer.hpp>
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
#include <clp/ErrorCode.hpp> | |
#include <clp/ffi/ir_stream/decoding_methods.hpp> | |
#include <clp/ffi/ir_stream/Deserializer.hpp> | |
#include <clp/ffi/ir_stream/search/QueryHandler.hpp> | |
#include <clp/ffi/KeyValuePairLogEvent.hpp> | |
#include <clp/ErrorCode.hpp> | |
#include <clp/ffi/ir_stream/Deserializer.hpp> | |
#include <clp/ffi/ir_stream/search/QueryHandler.hpp> | |
#include <clp/ffi/KeyValuePairLogEvent.hpp> |
🤖 Prompt for AI Agents
In src/clp_ffi_js/ir/query_methods.cpp around lines 13 to 17, remove the unused
include <clp/ffi/ir_stream/decoding_methods.hpp> (it's not referenced in this
translation unit); keep the other includes (Deserializer.hpp, QueryHandler.hpp,
KeyValuePairLogEvent.hpp, ErrorCode.hpp), rebuild to ensure no missing symbols,
and run tests or compilation to verify everything still compiles relying on
Deserializer.hpp and the local helper.
/** | ||
* Generates a filtered collection from all log events. | ||
* | ||
* @param log_level_filter Array of selected log levels. | ||
*/ | ||
void filter_log_events(LogLevelFilterTsType const& log_level_filter) { | ||
filter_log_events(log_level_filter, ""); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Minor: document the single-arg wrapper’s behaviour
Clarify that this overload disables KQL by design.
-/**
- * Generates a filtered collection from all log events.
- *
- * @param log_level_filter Array of selected log levels.
- */
+/**
+ * Convenience wrapper: filters only by `log_level_filter` (KQL disabled).
+ *
+ * @param log_level_filter Array of selected log levels.
+ */
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
/** | |
* Generates a filtered collection from all log events. | |
* | |
* @param log_level_filter Array of selected log levels. | |
*/ | |
void filter_log_events(LogLevelFilterTsType const& log_level_filter) { | |
filter_log_events(log_level_filter, ""); | |
} | |
/** | |
* Convenience wrapper: filters only by `log_level_filter` (KQL disabled). | |
* | |
* @param log_level_filter Array of selected log levels. | |
*/ | |
void filter_log_events(LogLevelFilterTsType const& log_level_filter) { | |
filter_log_events(log_level_filter, ""); | |
} |
🤖 Prompt for AI Agents
In src/clp_ffi_js/ir/StreamReader.hpp around lines 116 to 123, the doc comment
for the single-argument filter_log_events overload should explicitly state that
this wrapper disables KQL by design; update the comment to mention that calling
filter_log_events(LogLevelFilterTsType const& log_level_filter) invokes the
two-argument overload with an empty KQL string and therefore performs only
log-level filtering (KQL is not applied), so readers understand the intended
behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we might be adding a compatibility layer between unstructured and structured IR logs, the unstructured IR logs might be accessible in the same way as the structured IR logs (e.g., query-able by KQL) in the future.
for the title, how about:
feat: Add KQL query support for filtering structured IR log events.
Description
As the title indicates, this pull request introduces support for KQL filtering.
clp_s::search
. This PR updates theCMakeLists.txt
file to link againstclp-s
and its required dependencies.query_index
, uses the new IR search feature and returns matched indexes.filter_log_events
method has been modified to accept a KQL filter as a parameter.Checklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Behaviour Changes
Chores