Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add support for log-level filtering of structured IR streams. #35

Open
wants to merge 50 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 18 commits
Commits
Show all changes
50 commits
Select commit Hold shift + click to select a range
1a3da51
add new class
davemarco Nov 9, 2024
29a87d2
fix-lint
davemarco Nov 9, 2024
e5a8e4c
fix lint
davemarco Nov 9, 2024
921c322
small change
davemarco Nov 9, 2024
865e41d
small change
davemarco Nov 9, 2024
0234365
fix lint
davemarco Nov 9, 2024
0c86cf4
Update src/clp_ffi_js/ir/LogEventWithFilterData.hpp
davemarco Nov 9, 2024
84037cf
fix lint
davemarco Nov 10, 2024
ef1ff5d
fix lint
davemarco Nov 10, 2024
cdcad94
fix lint
davemarco Nov 10, 2024
a1e9142
fix lint
davemarco Nov 10, 2024
2738dd5
first draft
davemarco Nov 10, 2024
d4afd35
add missing filtering
davemarco Nov 10, 2024
3f32f91
fix lint
davemarco Nov 10, 2024
01ef415
add support for int
davemarco Nov 10, 2024
de5aaf5
add new functions
davemarco Nov 11, 2024
67f43f0
cleanup
davemarco Nov 11, 2024
be0e5ba
lint
davemarco Nov 11, 2024
8e8c648
add files
davemarco Nov 11, 2024
f2907f4
changes
davemarco Nov 11, 2024
a1b776f
fix filter
davemarco Nov 11, 2024
a199924
Merge branch 'main' into sFilter
davemarco Nov 11, 2024
d00d0a2
change version
davemarco Nov 11, 2024
e9ed45d
fix lint
davemarco Nov 11, 2024
72029f1
small change
davemarco Nov 11, 2024
c757571
junhao review
davemarco Nov 15, 2024
4975f79
fix lint
davemarco Nov 25, 2024
7346715
fix lint
davemarco Nov 25, 2024
04a8d1f
junhao changes
davemarco Dec 2, 2024
2a202b8
add missing code in structured reader
davemarco Dec 5, 2024
5332e9b
add template for decode
davemarco Dec 6, 2024
da6350f
lint
davemarco Dec 6, 2024
ee2eb42
fix lint
davemarco Dec 6, 2024
90b977f
reword some comments
davemarco Dec 6, 2024
8cc73d0
junhao review
davemarco Dec 9, 2024
4d905e8
remove double declaration
davemarco Dec 9, 2024
5854e36
fix lint
davemarco Dec 10, 2024
f3f6e76
move reserve up
davemarco Dec 10, 2024
d4467ea
Apply suggestions from code review
davemarco Dec 12, 2024
6660763
add reader option param
davemarco Dec 12, 2024
5b684ab
refactor using lambda
davemarco Dec 12, 2024
9a5a502
small nits
davemarco Dec 12, 2024
cce7980
fix lint
davemarco Dec 12, 2024
feec5ef
Apply suggestions from code review
davemarco Dec 13, 2024
61e1e83
kirk review
davemarco Dec 13, 2024
907c327
fix lint
davemarco Dec 13, 2024
1da4c7d
fix lint
davemarco Dec 13, 2024
f08e07f
Apply suggestions from code review
davemarco Dec 13, 2024
aee6422
kirk review
davemarco Dec 13, 2024
0581310
added rerror for invalid range
davemarco Dec 14, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions src/clp_ffi_js/constants.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ enum class LogLevel : std::uint8_t {
FATAL,
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a Length as the last item and use it to replace cValidLogLevelsEndIdx?

Suggested change
};
Length
};

That way we don't always have to update cValidLogLevelsEndIdx if we ever add more log levels.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking at https://github.com/y-scope/clp/blob/f1876cd63a56dd79a6243f6c8b6b6a7fd48348cc/components/core/src/clp/GlobalMySQLMetadataDB.cpp#L22-L23 when i suggested the enum name "length" though I don't see the convention being documented anywhere. If that's a convention, i feel we can omit the comment.

@kirkrodrigues could you confirm the convention?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a general C/C++ convention to call it length; that said, it's probably not a bad idea to leave a comment because maybe the reader would confuse "SIZE" for a log level.

I also realized that the case of these enum values is wrong. They should be None, Trace, etc. Can we open an issue and fix in another PR?

Copy link
Collaborator

@junhaoliao junhaoliao Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai let's open an issue to correct cases of enum property names as PascalCase

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, I've successfully created the issue to correct the enum property names to PascalCase: Issue #39.

🔗 Analysis chain

@junhaoliao, I've opened an issue to correct the cases of enum property names to PascalCase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
gh issue create \
  --title "Correct enum property names to PascalCase" \
  --body "As discussed in [PR #35](https://github.com/y-scope/clp-ffi-js/pull/35) and [this comment](https://github.com/y-scope/clp-ffi-js/pull/35#discussion_r1842284173), the enum property names in \`src/clp_ffi_js/constants.hpp\` should be updated to PascalCase to adhere to coding conventions.

Requested by @junhaoliao." \
  --assignee junhaoliao

Length of output: 460

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed to Length.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

constexpr LogLevel cValidLogLevelsBeginIdx{LogLevel::TRACE};
constexpr LogLevel cValidLogLevelsEndIdx{LogLevel::FATAL};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indices bounds are non-inclusive, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure i changed


/**
* Strings corresponding to `LogLevel`.
Expand Down
70 changes: 70 additions & 0 deletions src/clp_ffi_js/ir/LogEventWithFilterData.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
#ifndef CLP_FFI_JS_IR_LOGEVENTWITHFILTERDATA_HPP
#define CLP_FFI_JS_IR_LOGEVENTWITHFILTERDATA_HPP

#include <concepts>
#include <utility>

#include <clp/ffi/KeyValuePairLogEvent.hpp>
#include <clp/ir/LogEvent.hpp>
#include <clp/ir/types.hpp>

#include <clp_ffi_js/constants.hpp>

namespace clp_ffi_js::ir {
using clp::ir::four_byte_encoded_variable_t;
using UnstructuredLogEvent = clp::ir::LogEvent<four_byte_encoded_variable_t>;
using StructuredLogEvent = clp::ffi::KeyValuePairLogEvent;

// Concept defining valid log event types.
// TODO: Extend valid log event types when filtering support is added for structured logs.
template <typename T>
concept ValidLogEventTypes
= std::same_as<T, UnstructuredLogEvent> || std::same_as<T, StructuredLogEvent>;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Update or remove outdated TODO comment regarding valid log event types

The TODO comment on line 19 suggests extending valid log event types when filtering support is added for structured logs. However, StructuredLogEvent is already included in ValidLogEventTypes on line 22. Please consider updating or removing the TODO comment to reflect the current state of the code.


/**
* A templated class that extends a log event type with processed versions of some of its fields,
* specifically the fields that are used for filtering in the `StreamReader` classes and their
* callers.
*
* @tparam LogEvent The type of the log event.
*/
template <typename LogEvent>
requires ValidLogEventTypes<LogEvent>
class LogEventWithFilterData {
public:
// Constructor
LogEventWithFilterData(
LogEvent log_event,
LogLevel log_level,
clp::ir::epoch_time_ms_t timestamp
)
: m_log_event{std::move(log_event)},
m_log_level{log_level},
m_timestamp{timestamp} {}

// Disable copy constructor and assignment operator
LogEventWithFilterData(LogEventWithFilterData const&) = delete;
auto operator=(LogEventWithFilterData const&) -> LogEventWithFilterData& = delete;

// Default move constructor and assignment operator
LogEventWithFilterData(LogEventWithFilterData&&) = default;
auto operator=(LogEventWithFilterData&&) -> LogEventWithFilterData& = default;

// Destructor
~LogEventWithFilterData() = default;

[[nodiscard]] auto get_log_event() const -> LogEvent const& { return m_log_event; }

[[nodiscard]] auto get_log_level() const -> LogLevel { return m_log_level; }

[[nodiscard]] auto get_timestamp() const -> clp::ir::epoch_time_ms_t { return m_timestamp; }

private:
LogEvent m_log_event;
LogLevel m_log_level;
clp::ir::epoch_time_ms_t m_timestamp;
};

} // namespace clp_ffi_js::ir

#endif // CLP_FFI_JS_IR_LOGEVENTWITHFILTERDATA_HPP
41 changes: 0 additions & 41 deletions src/clp_ffi_js/ir/LogEventWithLevel.hpp

This file was deleted.

6 changes: 6 additions & 0 deletions src/clp_ffi_js/ir/StreamReader.hpp
kirkrodrigues marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,12 @@ enum class StreamType : uint8_t {
Unstructured,
};

/**
* Mapping between an index in the filtered log events collection to an index in the unfiltered
* log events collection.
*/
using FilteredLogEventsMap = std::optional<std::vector<size_t>>;

/**
* Class to deserialize and decode Zstandard-compressed CLP IR streams as well as format decoded
* log events.
Expand Down
60 changes: 26 additions & 34 deletions src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -13,39 +13,42 @@
#include <clp/ErrorCode.hpp>
#include <clp/ffi/ir_stream/Deserializer.hpp>
#include <clp/ffi/KeyValuePairLogEvent.hpp>
#include <clp/ffi/Value.hpp>
#include <clp/ir/types.hpp>
#include <clp/TraceableException.hpp>
#include <emscripten/em_asm.h>
#include <emscripten/val.h>
#include <spdlog/spdlog.h>

#include <clp_ffi_js/ClpFfiJsException.hpp>
#include <clp_ffi_js/constants.hpp>
#include <clp_ffi_js/ir/LogEventWithFilterData.hpp>
#include <clp_ffi_js/ir/StreamReader.hpp>
#include <clp_ffi_js/ir/StreamReaderDataContext.hpp>
#include <clp_ffi_js/ir/StructuredIrUnitHandler.hpp>
#include <clp_ffi_js/ir/utils.hpp>

namespace clp_ffi_js::ir {
namespace {
constexpr std::string_view cEmptyJsonStr{"{}"};
constexpr std::string_view cLogLevelFilteringNotSupportedErrorMsg{
"Log level filtering is not yet supported in this reader."
};
constexpr std::string_view cReaderOptionsLogLevelKey{"logLevelKey"};
constexpr std::string_view cReaderOptionsTimestampKey{"timestampKey"};
} // namespace

using clp::ir::four_byte_encoded_variable_t;

auto StructuredIrStreamReader::create(
std::unique_ptr<ZstdDecompressor>&& zstd_decompressor,
clp::Array<char> data_array,
ReaderOptions const& reader_options
) -> StructuredIrStreamReader {
auto deserialized_log_events{std::make_shared<std::vector<clp::ffi::KeyValuePairLogEvent>>()};
auto deserialized_log_events{
std::make_shared<std::vector<LogEventWithFilterData<StructuredLogEvent>>>()
};
auto result{StructuredIrDeserializer::create(
*zstd_decompressor,
IrUnitHandler{
StructuredIrUnitHandler{
deserialized_log_events,
reader_options[cReaderOptionsLogLevelKey.data()].as<std::string>(),
reader_options[cReaderOptionsTimestampKey.data()].as<std::string>()
Comment on lines +41 to 42
Copy link

@coderabbitai coderabbitai bot Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Ensure presence of required keys in reader_options

The code assumes that reader_options contains the keys logLevelKey and timestampKey. To prevent potential exceptions, consider adding error handling to verify that these keys are present.

Apply this diff to add key existence checks:

+if (!reader_options.hasOwnProperty(cReaderOptionsLogLevelKey.data()) || 
+    !reader_options.hasOwnProperty(cReaderOptionsTimestampKey.data())) {
+    throw ClpFfiJsException{
+        clp::ErrorCode::ErrorCode_InvalidArgument,
+        __FILENAME__,
+        __LINE__,
+        "Required reader options keys are missing"
+    };
+}

Committable suggestion skipped: line range outside the PR's diff.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create a issue for JavaScript support (usage without the TS definition file) and keep a list of TODOs there.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, I've created the issue as requested: Issue #36 to track JavaScript support without the TypeScript definition file and to keep a list of TODOs.

🔗 Analysis chain

@junhaoliao, I've created a new GitHub issue to track JavaScript support without the TypeScript definition file and to keep a list of TODOs.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Create a new GitHub issue for JavaScript support without the TypeScript definition file
gh issue create \
  --title "JavaScript support without the TypeScript definition file" \
  --body "We need to support JavaScript usage without relying on the TypeScript definition file.

This issue is created to keep a list of TODOs for implementing this feature.

Referenced from [PR #35](https://github.com/y-scope/clp-ffi-js/pull/35) and [this comment](https://github.com/y-scope/clp-ffi-js/pull/35#discussion_r1835929602) by @junhaoliao." \
  --assignee junhaoliao

Length of output: 507


✏️ Learnings added
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#35
File: src/clp_ffi_js/ir/StructuredIrStreamReader.cpp:50-51
Timestamp: 2024-11-14T15:31:45.747Z
Learning: Ensure that `clp-ffi-js` supports JavaScript usage without relying on the TypeScript definition file, and keep track of related TODOs in a GitHub issue.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

}
)};
Expand Down Expand Up @@ -75,15 +78,19 @@ auto StructuredIrStreamReader::get_num_events_buffered() const -> size_t {
}

auto StructuredIrStreamReader::get_filtered_log_event_map() const -> FilteredLogEventMapTsType {
SPDLOG_ERROR(cLogLevelFilteringNotSupportedErrorMsg);
return FilteredLogEventMapTsType{emscripten::val::null()};
if (false == m_filtered_log_event_map.has_value()) {
return FilteredLogEventMapTsType{emscripten::val::null()};
}

return FilteredLogEventMapTsType{emscripten::val::array(m_filtered_log_event_map.value())};
}

void StructuredIrStreamReader::filter_log_events(LogLevelFilterTsType const& log_level_filter) {
if (log_level_filter.isNull()) {
return;
}
SPDLOG_ERROR(cLogLevelFilteringNotSupportedErrorMsg);
filter_deserialized_events(
log_level_filter,
m_filtered_log_event_map,
*m_deserialized_log_events
);
}

auto StructuredIrStreamReader::deserialize_stream() -> size_t {
Expand Down Expand Up @@ -119,9 +126,6 @@ auto StructuredIrStreamReader::deserialize_stream() -> size_t {
)
};
}
m_timestamp_node_id = m_stream_reader_data_context->get_deserializer()
.get_ir_unit_handler()
.get_timestamp_node_id();
m_stream_reader_data_context.reset(nullptr);
return m_deserialized_log_events->size();
}
Expand All @@ -140,9 +144,10 @@ auto StructuredIrStreamReader::decode_range(size_t begin_idx, size_t end_idx, bo
auto const results{emscripten::val::array()};

for (size_t log_event_idx = begin_idx; log_event_idx < end_idx; ++log_event_idx) {
auto const& log_event{m_deserialized_log_events->at(log_event_idx)};
auto const& log_event_with_filter_data{m_deserialized_log_events->at(log_event_idx)};
auto const& structured_log_event = log_event_with_filter_data.get_log_event();

auto const json_result{log_event.serialize_to_json()};
auto const json_result{structured_log_event.serialize_to_json()};
std::string json_str{cEmptyJsonStr};
if (false == json_result.has_value()) {
auto error_code{json_result.error()};
Expand All @@ -155,26 +160,12 @@ auto StructuredIrStreamReader::decode_range(size_t begin_idx, size_t end_idx, bo
json_str = json_result.value().dump();
}

auto const& id_value_pairs{log_event.get_node_id_value_pairs()};
clp::ffi::value_int_t timestamp{0};
if (m_timestamp_node_id.has_value()) {
auto const& timestamp_pair{id_value_pairs.at(m_timestamp_node_id.value())};
if (timestamp_pair.has_value()) {
if (timestamp_pair->is<clp::ffi::value_int_t>()) {
timestamp = timestamp_pair.value().get_immutable_view<clp::ffi::value_int_t>();
} else {
// TODO: Add support for parsing timestamp values of string type.
SPDLOG_ERROR("Unable to parse timestamp for log_event_idx={}", log_event_idx);
}
}
}

EM_ASM(
{ Emval.toValue($0).push([UTF8ToString($1), $2, $3, $4]); },
results.as_handle(),
json_str.c_str(),
timestamp,
LogLevel::NONE,
log_event_with_filter_data.get_timestamp(),
log_event_with_filter_data.get_log_level(),
log_event_idx + 1
);
}
Expand All @@ -184,7 +175,8 @@ auto StructuredIrStreamReader::decode_range(size_t begin_idx, size_t end_idx, bo

StructuredIrStreamReader::StructuredIrStreamReader(
StreamReaderDataContext<StructuredIrDeserializer>&& stream_reader_data_context,
std::shared_ptr<std::vector<clp::ffi::KeyValuePairLogEvent>> deserialized_log_events
std::shared_ptr<std::vector<LogEventWithFilterData<StructuredLogEvent>>>
deserialized_log_events
)
: m_deserialized_log_events{std::move(deserialized_log_events)},
m_stream_reader_data_context{
Expand Down
Loading
Loading