Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add support for log-level filtering of structured IR streams. #35

Open
wants to merge 50 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
50 commits
Select commit Hold shift + click to select a range
1a3da51
add new class
davemarco Nov 9, 2024
29a87d2
fix-lint
davemarco Nov 9, 2024
e5a8e4c
fix lint
davemarco Nov 9, 2024
921c322
small change
davemarco Nov 9, 2024
865e41d
small change
davemarco Nov 9, 2024
0234365
fix lint
davemarco Nov 9, 2024
0c86cf4
Update src/clp_ffi_js/ir/LogEventWithFilterData.hpp
davemarco Nov 9, 2024
84037cf
fix lint
davemarco Nov 10, 2024
ef1ff5d
fix lint
davemarco Nov 10, 2024
cdcad94
fix lint
davemarco Nov 10, 2024
a1e9142
fix lint
davemarco Nov 10, 2024
2738dd5
first draft
davemarco Nov 10, 2024
d4afd35
add missing filtering
davemarco Nov 10, 2024
3f32f91
fix lint
davemarco Nov 10, 2024
01ef415
add support for int
davemarco Nov 10, 2024
de5aaf5
add new functions
davemarco Nov 11, 2024
67f43f0
cleanup
davemarco Nov 11, 2024
be0e5ba
lint
davemarco Nov 11, 2024
8e8c648
add files
davemarco Nov 11, 2024
f2907f4
changes
davemarco Nov 11, 2024
a1b776f
fix filter
davemarco Nov 11, 2024
a199924
Merge branch 'main' into sFilter
davemarco Nov 11, 2024
d00d0a2
change version
davemarco Nov 11, 2024
e9ed45d
fix lint
davemarco Nov 11, 2024
72029f1
small change
davemarco Nov 11, 2024
c757571
junhao review
davemarco Nov 15, 2024
4975f79
fix lint
davemarco Nov 25, 2024
7346715
fix lint
davemarco Nov 25, 2024
04a8d1f
junhao changes
davemarco Dec 2, 2024
2a202b8
add missing code in structured reader
davemarco Dec 5, 2024
5332e9b
add template for decode
davemarco Dec 6, 2024
da6350f
lint
davemarco Dec 6, 2024
ee2eb42
fix lint
davemarco Dec 6, 2024
90b977f
reword some comments
davemarco Dec 6, 2024
8cc73d0
junhao review
davemarco Dec 9, 2024
4d905e8
remove double declaration
davemarco Dec 9, 2024
5854e36
fix lint
davemarco Dec 10, 2024
f3f6e76
move reserve up
davemarco Dec 10, 2024
d4467ea
Apply suggestions from code review
davemarco Dec 12, 2024
6660763
add reader option param
davemarco Dec 12, 2024
5b684ab
refactor using lambda
davemarco Dec 12, 2024
9a5a502
small nits
davemarco Dec 12, 2024
cce7980
fix lint
davemarco Dec 12, 2024
feec5ef
Apply suggestions from code review
davemarco Dec 13, 2024
61e1e83
kirk review
davemarco Dec 13, 2024
907c327
fix lint
davemarco Dec 13, 2024
1da4c7d
fix lint
davemarco Dec 13, 2024
f08e07f
Apply suggestions from code review
davemarco Dec 13, 2024
aee6422
kirk review
davemarco Dec 13, 2024
0581310
added rerror for invalid range
davemarco Dec 14, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,7 @@ endif()
set(CLP_FFI_JS_SRC_MAIN
src/clp_ffi_js/ir/StreamReader.cpp
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
)

Expand Down
21 changes: 12 additions & 9 deletions src/clp_ffi_js/constants.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
#include <array>
#include <cstdint>
#include <string_view>
#include <type_utils.hpp>

namespace clp_ffi_js {
/**
Expand All @@ -17,6 +18,7 @@ enum class LogLevel : std::uint8_t {
WARN,
ERROR,
FATAL,
LENGTH, // This isn't a valid log level.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove NONE and use LENGTH in its place? That way we can drop one of the enum values and we can think about LENGTH as indicating that no valid level was found (so the search extended past the last valid value).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a reasonable idea, but have some qualms

  1. If we add multiple level schemas in the future, say one with more levels (like a new CRITICAL), then LENGTH enum may or may not represent a valid level across level schemas. Having a unique NONE as always zero across level schemas may simplify some code.
  2. There will be some effort to modify log viewer to support this change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're saying. On the other hand, having two invalid values is kind of messy already. For example, the code to check if a log level is within the valid enum values depends on cValidLogLevelsBeginIdx and LogLevel::LENGTH.

My perspective on the current log level parsing is that it's a heuristic until we migrate to IRv2 entirelyl; at that point, we shouldn't have to parse log levels any more. So I feel like simplifying the current code is a better approach until we switch to IRv2 or we have a user-need for more complicated levels.

That said, this isn't that big of a deal, so I guess if @junhaoliao doesn't agree, I'll leave it be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao please review my and kirk arguments, and let us know ur thoughts

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the current log level parsing is that it's a heuristic until we migrate to IRv2 entirely

@kirkrodrigues Would CLP FFI also support parsing log level for IRv1 stream in the new Deserializer interface (the one that is now only for IRv2)? If not, we still need to maintain this enum for IRv1 till we completely drop support for IRv1, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we need to wait until we completely drop support for IRv1.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @davemarco 's (1.) argument about the challenges of adding new log levels. However, I also recognize that the NONE value is used solely for log parsing (in read paths, not write paths). Since NONE is never serialized into non-code spaces and read back into the code, as long as our implementation consistently refers to it, there should be no risk of errors, which aligns with @kirkrodrigues's argument, assuming I understand it correctly.

With that said, I still prefer that we keep the NONE as the very first item in the enum.
The underlying type of the LogLevel is uint8_t which has value 0 when unutilized and placed in bss / initialized without explicit value (={}). Such value is usually used to hint the default; for example, in order to hint feature XXX is by default enabled, we can add Boolean flags like is_xxx_disabled. With NONE removed, when we declare a enum variable without proper initialization, it may contain a valid value, and most importantly the default value is no longer hinted to code readers. (Though most of the time we shouldn't really read before initialization lol

Copy link
Collaborator

@junhaoliao junhaoliao Dec 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a way I feel the inline comments seem a bit bloated. Instead, as a new convention, can we add a prefix to hint the enum member is for describing properties of the enum? e.g.,

enum class FilesTableCol: std::uint8_t {
   NAME = 0,
   PATH,
   SIZE,
   FILES_TABLE_COL_LENGTH // then this should not be confused with `SIZE`
}

I was going to propose _LENGTH but clang-tidy claims that's reserved even now we've been using the enum class syntax.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Sure, let's keep NONE.
  • Sure, I guess we could define a new convention. Keep in mind the naming of enum values should be PascalCase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao i'm not sure clang tidy will currently let you have underscore in enum. For now I added the part about changing name of "LENGTH" to existing issue #39 about pascal case.

};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a Length as the last item and use it to replace cValidLogLevelsEndIdx?

Suggested change
};
Length
};

That way we don't always have to update cValidLogLevelsEndIdx if we ever add more log levels.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking at https://github.com/y-scope/clp/blob/f1876cd63a56dd79a6243f6c8b6b6a7fd48348cc/components/core/src/clp/GlobalMySQLMetadataDB.cpp#L22-L23 when i suggested the enum name "length" though I don't see the convention being documented anywhere. If that's a convention, i feel we can omit the comment.

@kirkrodrigues could you confirm the convention?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a general C/C++ convention to call it length; that said, it's probably not a bad idea to leave a comment because maybe the reader would confuse "SIZE" for a log level.

I also realized that the case of these enum values is wrong. They should be None, Trace, etc. Can we open an issue and fix in another PR?

Copy link
Collaborator

@junhaoliao junhaoliao Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai let's open an issue to correct cases of enum property names as PascalCase

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, I've successfully created the issue to correct the enum property names to PascalCase: Issue #39.

🔗 Analysis chain

@junhaoliao, I've opened an issue to correct the cases of enum property names to PascalCase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
gh issue create \
  --title "Correct enum property names to PascalCase" \
  --body "As discussed in [PR #35](https://github.com/y-scope/clp-ffi-js/pull/35) and [this comment](https://github.com/y-scope/clp-ffi-js/pull/35#discussion_r1842284173), the enum property names in \`src/clp_ffi_js/constants.hpp\` should be updated to PascalCase to adhere to coding conventions.

Requested by @junhaoliao." \
  --assignee junhaoliao

Length of output: 460

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed to Length.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

constexpr LogLevel cValidLogLevelsBeginIdx{LogLevel::TRACE};

Expand All @@ -25,15 +27,16 @@ constexpr LogLevel cValidLogLevelsBeginIdx{LogLevel::TRACE};
*
* NOTE: These must be kept in sync manually.
*/
constexpr std::array<std::string_view, 7> cLogLevelNames{
"NONE", // This isn't a valid log level.
"TRACE",
"DEBUG",
"INFO",
"WARN",
"ERROR",
"FATAL",
};
constexpr std::array<std::string_view, clp::enum_to_underlying_type(LogLevel::LENGTH)>
cLogLevelNames{
"NONE", // This isn't a valid log level.
"TRACE",
"DEBUG",
"INFO",
"WARN",
"ERROR",
"FATAL",
};
} // namespace clp_ffi_js

#endif // CLP_FFI_JS_CONSTANTS_HPP
4 changes: 3 additions & 1 deletion src/clp_ffi_js/ir/LogEventWithFilterData.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
#include <concepts>
#include <utility>

#include <clp/ffi/KeyValuePairLogEvent.hpp>
#include <clp/ir/LogEvent.hpp>
#include <clp/ir/types.hpp>

Expand All @@ -12,6 +13,7 @@
namespace clp_ffi_js::ir {
using clp::ir::four_byte_encoded_variable_t;
using UnstructuredLogEvent = clp::ir::LogEvent<four_byte_encoded_variable_t>;
using StructuredLogEvent = clp::ffi::KeyValuePairLogEvent;

/**
* A templated class that extends a log event type with processed versions of some of its fields,
Expand All @@ -21,7 +23,7 @@ using UnstructuredLogEvent = clp::ir::LogEvent<four_byte_encoded_variable_t>;
* @tparam LogEvent The type of the log event.
*/
template <typename LogEvent>
requires std::same_as<LogEvent, UnstructuredLogEvent>
requires std::same_as<LogEvent, UnstructuredLogEvent> || std::same_as<LogEvent, StructuredLogEvent>
class LogEventWithFilterData {
public:
// Constructor
Expand Down
4 changes: 3 additions & 1 deletion src/clp_ffi_js/ir/StreamReader.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -117,7 +117,9 @@ EMSCRIPTEN_BINDINGS(ClpStreamReader) {
// JS types used as inputs
emscripten::register_type<clp_ffi_js::ir::DataArrayTsType>("Uint8Array");
emscripten::register_type<clp_ffi_js::ir::LogLevelFilterTsType>("number[] | null");
emscripten::register_type<clp_ffi_js::ir::ReaderOptions>("{timestampKey: string} | null");
emscripten::register_type<clp_ffi_js::ir::ReaderOptions>(
"{logLevelKey: string, timestampKey: string} | null"
);
Comment on lines +120 to +122
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add validation for ReaderOptions keys

The ReaderOptions type now includes logLevelKey, but there's no visible validation to ensure the provided keys are non-empty or valid.

Consider adding validation in the create method. Here's a suggested implementation:

if (reader_options != nullptr) {
    auto validate_key = [](const std::string& key, const std::string& key_name) {
        if (key.empty()) {
            throw ClpFfiJsException{
                clp::ErrorCode::ErrorCode_InvalidArgument,
                __FILENAME__,
                __LINE__,
                std::format("{} cannot be empty", key_name)
            };
        }
    };
    
    validate_key(reader_options.logLevelKey, "logLevelKey");
    validate_key(reader_options.timestampKey, "timestampKey");
}


// JS types used as outputs
emscripten::enum_<clp_ffi_js::ir::StreamType>("IrStreamType")
Expand Down
152 changes: 152 additions & 0 deletions src/clp_ffi_js/ir/StreamReader.hpp
kirkrodrigues marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
@@ -1,12 +1,24 @@
#ifndef CLP_FFI_JS_IR_STREAMREADER_HPP
#define CLP_FFI_JS_IR_STREAMREADER_HPP

#include <algorithm>
#include <concepts>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <type_traits>
#include <vector>

#include <clp/streaming_compression/zstd/Decompressor.hpp>
#include <clp/type_utils.hpp>
#include <emscripten/em_asm.h>
#include <emscripten/val.h>
#include <spdlog/spdlog.h>

#include <clp_ffi_js/constants.hpp>
#include <clp_ffi_js/ir/LogEventWithFilterData.hpp>

namespace clp_ffi_js::ir {
// JS types used as inputs
Expand All @@ -23,6 +35,15 @@ enum class StreamType : uint8_t {
Unstructured,
};

template <typename LogEvent>
using LogEvents = std::vector<LogEventWithFilterData<LogEvent>>;

/**
* Mapping between an index in the filtered log events collection to an index in the unfiltered
* log events collection.
*/
using FilteredLogEventsMap = std::optional<std::vector<size_t>>;

/**
* Class to deserialize and decode Zstandard-compressed CLP IR streams as well as format decoded
* log events.
Expand All @@ -35,6 +56,7 @@ class StreamReader {
* Creates a `StreamReader` to read from the given array.
*
* @param data_array An array containing a Zstandard-compressed IR stream.
* @param reader_options
* @return The created instance.
* @throw ClpFfiJsException if any error occurs.
*/
Expand Down Expand Up @@ -102,7 +124,137 @@ class StreamReader {

protected:
explicit StreamReader() = default;

/**
* Templated implementation of `decode_range` that uses `log_event_to_string` to convert
* `log_event` to a string for the returned result.
*
* @tparam LogEvent
* @tparam ToStringFunc Function to convert a log event into a string.
* @param begin_idx
davemarco marked this conversation as resolved.
Show resolved Hide resolved
* @param end_idx
* @param filtered_log_event_map
* @param log_events
* @param use_filter
* @param log_event_to_string
* @return See `decode_range`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the lack of context at #35 (comment)

Suggested change
* @return See `decode_range`.
* @return See `decode_range`.
* @throw Propagates `ToStringFunc`'s exception.

*/
template <typename LogEvent, typename ToStringFunc>
requires requires(ToStringFunc func, LogEvent const& log_event) {
{
func(log_event)
} -> std::convertible_to<std::string>;
}
static auto generic_decode_range(
size_t begin_idx,
size_t end_idx,
FilteredLogEventsMap const& filtered_log_event_map,
LogEvents<LogEvent> const& log_events,
ToStringFunc log_event_to_string,
bool use_filter
) -> DecodedResultsTsType;

/**
* Templated implementation of `filter_log_events`.
*
* @tparam LogEvent
* @param log_level_filter
* @param log_events Derived class's log events.
* @param log_events
* @param[out] filtered_log_event_map Returns the filtered log events.
*/
template <typename LogEvent>
static auto generic_filter_log_events(
FilteredLogEventsMap& filtered_log_event_map,
LogLevelFilterTsType const& log_level_filter,
LogEvents<LogEvent> const& log_events
) -> void;
};

template <typename LogEvent, typename ToStringFunc>
requires requires(ToStringFunc func, LogEvent const& log_event) {
{
func(log_event)
} -> std::convertible_to<std::string>;
}
auto StreamReader::generic_decode_range(
size_t begin_idx,
size_t end_idx,
FilteredLogEventsMap const& filtered_log_event_map,
LogEvents<LogEvent> const& log_events,
ToStringFunc log_event_to_string,
bool use_filter
) -> DecodedResultsTsType {
if (use_filter && false == filtered_log_event_map.has_value()) {
return DecodedResultsTsType{emscripten::val::null()};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't in the log viewer code as well -

Can we add a log print before returning so that we know the input argument check fails here rather than at the range check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here i dont think so. This should be a normal case when a filter is applied, and the file does not have any logs with that filter level.

}

size_t length{0};
if (use_filter) {
length = filtered_log_event_map->size();
} else {
length = log_events.size();
}
if (length < end_idx || begin_idx > end_idx) {
SPDLOG_ERROR("Invalid log event index range: {}-{}", begin_idx, end_idx);
return DecodedResultsTsType{emscripten::val::null()};
Copy link
Collaborator

@junhaoliao junhaoliao Dec 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see https://github.com/y-scope/clp-ffi-js/pull/35/files#r1884668582

it might be a good idea to add a log print here before returning

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an error log here

}

auto const results{emscripten::val::array()};

for (size_t i = begin_idx; i < end_idx; ++i) {
size_t log_event_idx{0};
if (use_filter) {
log_event_idx = filtered_log_event_map->at(i);
} else {
log_event_idx = i;
}

auto const& log_event_with_filter_data{log_events.at(log_event_idx)};
auto const& log_event = log_event_with_filter_data.get_log_event();
auto const& timestamp = log_event_with_filter_data.get_timestamp();
auto const& log_level = log_event_with_filter_data.get_log_level();

EM_ASM(
{ Emval.toValue($0).push([UTF8ToString($1), $2, $3, $4]); },
results.as_handle(),
log_event_to_string(log_event).c_str(),
timestamp,
log_level,
log_event_idx + 1
);
}

return DecodedResultsTsType(results);
}

template <typename LogEvent>
auto StreamReader::generic_filter_log_events(
FilteredLogEventsMap& filtered_log_event_map,
LogLevelFilterTsType const& log_level_filter,
LogEvents<LogEvent> const& log_events
) -> void {
if (log_level_filter.isNull()) {
filtered_log_event_map.reset();
return;
}

filtered_log_event_map.emplace();
auto filter_levels
= emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter);

for (size_t log_event_idx = 0; log_event_idx < log_events.size(); ++log_event_idx) {
auto const& log_event = log_events[log_event_idx];
if (std::ranges::find(
filter_levels,
clp::enum_to_underlying_type(log_event.get_log_level())
)
!= filter_levels.end())
{
filtered_log_event_map->emplace_back(log_event_idx);
}
}
}
} // namespace clp_ffi_js::ir

#endif // CLP_FFI_JS_IR_STREAMREADER_HPP
Loading
Loading