-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add support for log-level filtering of structured IR streams. #35
base: main
Are you sure you want to change the base?
Changes from 44 commits
1a3da51
29a87d2
e5a8e4c
921c322
865e41d
0234365
0c86cf4
84037cf
ef1ff5d
cdcad94
a1e9142
2738dd5
d4afd35
3f32f91
01ef415
de5aaf5
67f43f0
be0e5ba
8e8c648
f2907f4
a1b776f
a199924
d00d0a2
e9ed45d
72029f1
c757571
4975f79
7346715
04a8d1f
2a202b8
5332e9b
da6350f
ee2eb42
90b977f
8cc73d0
4d905e8
5854e36
f3f6e76
d4467ea
6660763
5b684ab
9a5a502
cce7980
feec5ef
61e1e83
907c327
1da4c7d
f08e07f
aee6422
0581310
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -4,6 +4,7 @@ | |||||||
#include <array> | ||||||||
#include <cstdint> | ||||||||
#include <string_view> | ||||||||
#include <type_utils.hpp> | ||||||||
|
||||||||
namespace clp_ffi_js { | ||||||||
/** | ||||||||
|
@@ -17,6 +18,7 @@ enum class LogLevel : std::uint8_t { | |||||||
WARN, | ||||||||
ERROR, | ||||||||
FATAL, | ||||||||
LENGTH, // This isn't a valid log level. | ||||||||
}; | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we add a
Suggested change
That way we don't always have to update There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was looking at https://github.com/y-scope/clp/blob/f1876cd63a56dd79a6243f6c8b6b6a7fd48348cc/components/core/src/clp/GlobalMySQLMetadataDB.cpp#L22-L23 when i suggested the enum name "length" though I don't see the convention being documented anywhere. If that's a convention, i feel we can omit the comment. @kirkrodrigues could you confirm the convention? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it's a general C/C++ convention to call it length; that said, it's probably not a bad idea to leave a comment because maybe the reader would confuse "SIZE" for a log level. I also realized that the case of these enum values is wrong. They should be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @coderabbitai let's open an issue to correct cases of enum property names as PascalCase There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @junhaoliao, I've successfully created the issue to correct the enum property names to PascalCase: Issue #39. 🔗 Analysis chain@junhaoliao, I've opened an issue to correct the cases of enum property names to PascalCase. 🏁 Scripts executedThe following scripts were executed for the analysis: Script:
Length of output: 460 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I changed to Length. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||||||||
constexpr LogLevel cValidLogLevelsBeginIdx{LogLevel::TRACE}; | ||||||||
|
||||||||
|
@@ -25,15 +27,16 @@ constexpr LogLevel cValidLogLevelsBeginIdx{LogLevel::TRACE}; | |||||||
* | ||||||||
* NOTE: These must be kept in sync manually. | ||||||||
*/ | ||||||||
constexpr std::array<std::string_view, 7> cLogLevelNames{ | ||||||||
"NONE", // This isn't a valid log level. | ||||||||
"TRACE", | ||||||||
"DEBUG", | ||||||||
"INFO", | ||||||||
"WARN", | ||||||||
"ERROR", | ||||||||
"FATAL", | ||||||||
}; | ||||||||
constexpr std::array<std::string_view, clp::enum_to_underlying_type(LogLevel::LENGTH)> | ||||||||
cLogLevelNames{ | ||||||||
"NONE", // This isn't a valid log level. | ||||||||
"TRACE", | ||||||||
"DEBUG", | ||||||||
"INFO", | ||||||||
"WARN", | ||||||||
"ERROR", | ||||||||
"FATAL", | ||||||||
}; | ||||||||
} // namespace clp_ffi_js | ||||||||
|
||||||||
#endif // CLP_FFI_JS_CONSTANTS_HPP |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -117,7 +117,9 @@ EMSCRIPTEN_BINDINGS(ClpStreamReader) { | |
// JS types used as inputs | ||
emscripten::register_type<clp_ffi_js::ir::DataArrayTsType>("Uint8Array"); | ||
emscripten::register_type<clp_ffi_js::ir::LogLevelFilterTsType>("number[] | null"); | ||
emscripten::register_type<clp_ffi_js::ir::ReaderOptions>("{timestampKey: string} | null"); | ||
emscripten::register_type<clp_ffi_js::ir::ReaderOptions>( | ||
"{logLevelKey: string, timestampKey: string} | null" | ||
); | ||
Comment on lines
+120
to
+122
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion Add validation for ReaderOptions keys The ReaderOptions type now includes Consider adding validation in the if (reader_options != nullptr) {
auto validate_key = [](const std::string& key, const std::string& key_name) {
if (key.empty()) {
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_InvalidArgument,
__FILENAME__,
__LINE__,
std::format("{} cannot be empty", key_name)
};
}
};
validate_key(reader_options.logLevelKey, "logLevelKey");
validate_key(reader_options.timestampKey, "timestampKey");
} |
||
|
||
// JS types used as outputs | ||
emscripten::enum_<clp_ffi_js::ir::StreamType>("IrStreamType") | ||
|
kirkrodrigues marked this conversation as resolved.
Show resolved
Hide resolved
|
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
@@ -1,13 +1,21 @@ | ||||||||
#ifndef CLP_FFI_JS_IR_STREAMREADER_HPP | ||||||||
#define CLP_FFI_JS_IR_STREAMREADER_HPP | ||||||||
|
||||||||
#include <algorithm> | ||||||||
#include <cstddef> | ||||||||
#include <cstdint> | ||||||||
#include <memory> | ||||||||
#include <optional> | ||||||||
#include <type_traits> | ||||||||
#include <vector> | ||||||||
|
||||||||
#include <clp/streaming_compression/zstd/Decompressor.hpp> | ||||||||
#include <clp/type_utils.hpp> | ||||||||
#include <emscripten/em_asm.h> | ||||||||
#include <emscripten/val.h> | ||||||||
|
||||||||
#include <clp_ffi_js/constants.hpp> | ||||||||
|
||||||||
namespace clp_ffi_js::ir { | ||||||||
// JS types used as inputs | ||||||||
EMSCRIPTEN_DECLARE_VAL_TYPE(DataArrayTsType); | ||||||||
|
@@ -23,6 +31,12 @@ enum class StreamType : uint8_t { | |||||||
Unstructured, | ||||||||
}; | ||||||||
|
||||||||
/** | ||||||||
* Mapping between an index in the filtered log events collection to an index in the unfiltered | ||||||||
* log events collection. | ||||||||
*/ | ||||||||
using FilteredLogEventsMap = std::optional<std::vector<size_t>>; | ||||||||
|
||||||||
/** | ||||||||
* Class to deserialize and decode Zstandard-compressed CLP IR streams as well as format decoded | ||||||||
* log events. | ||||||||
|
@@ -35,6 +49,7 @@ class StreamReader { | |||||||
* Creates a `StreamReader` to read from the given array. | ||||||||
* | ||||||||
* @param data_array An array containing a Zstandard-compressed IR stream. | ||||||||
* @param reader_options | ||||||||
* @return The created instance. | ||||||||
* @throw ClpFfiJsException if any error occurs. | ||||||||
*/ | ||||||||
|
@@ -100,6 +115,113 @@ class StreamReader { | |||||||
[[nodiscard]] virtual auto decode_range(size_t begin_idx, size_t end_idx, bool use_filter) const | ||||||||
-> DecodedResultsTsType = 0; | ||||||||
|
||||||||
/** | ||||||||
kirkrodrigues marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||
* Templated implementation of `filter_log_events`. | ||||||||
* | ||||||||
* @tparam LogEvent | ||||||||
* @param log_level_filter | ||||||||
davemarco marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||
* @param log_events Derived class's log events. | ||||||||
* @param log_events | ||||||||
* @param[out] filtered_log_event_map Returns the filtered log events. | ||||||||
*/ | ||||||||
template <typename LogEvent> | ||||||||
static auto generic_filter_log_events( | ||||||||
FilteredLogEventsMap& filtered_log_event_map, | ||||||||
LogLevelFilterTsType const& log_level_filter, | ||||||||
std::vector<LogEventWithFilterData<LogEvent>> const& log_events | ||||||||
) -> void { | ||||||||
if (log_level_filter.isNull()) { | ||||||||
filtered_log_event_map.reset(); | ||||||||
return; | ||||||||
} | ||||||||
|
||||||||
filtered_log_event_map.emplace(); | ||||||||
auto filter_levels | ||||||||
= emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter); | ||||||||
|
||||||||
for (size_t log_event_idx = 0; log_event_idx < log_events.size(); ++log_event_idx) { | ||||||||
auto const& log_event = log_events[log_event_idx]; | ||||||||
if (std::ranges::find( | ||||||||
filter_levels, | ||||||||
clp::enum_to_underlying_type(log_event.get_log_level()) | ||||||||
) | ||||||||
!= filter_levels.end()) | ||||||||
{ | ||||||||
filtered_log_event_map->emplace_back(log_event_idx); | ||||||||
} | ||||||||
} | ||||||||
} | ||||||||
|
||||||||
/** | ||||||||
* Templated implementation of `decode_range` that uses `log_event_to_string` to convert | ||||||||
* `log_event` to a string for the returned result. | ||||||||
* | ||||||||
* @tparam LogEvent | ||||||||
* @tparam ToStringFunc Function to convert a log event into a string. | ||||||||
* @param begin_idx | ||||||||
davemarco marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||
* @param end_idx | ||||||||
* @param filtered_log_event_map | ||||||||
* @param log_events | ||||||||
* @param use_filter | ||||||||
* @param log_event_to_string | ||||||||
* @return See `decode_range`. | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry for the lack of context at #35 (comment)
Suggested change
|
||||||||
*/ | ||||||||
template <typename LogEvent, typename ToStringFunc> | ||||||||
requires requires(ToStringFunc func, LogEvent const& log_event) { | ||||||||
{ | ||||||||
func(log_event) | ||||||||
} -> std::convertible_to<std::string>; | ||||||||
} | ||||||||
static auto generic_decode_range( | ||||||||
size_t begin_idx, | ||||||||
size_t end_idx, | ||||||||
FilteredLogEventsMap const& filtered_log_event_map, | ||||||||
std::vector<LogEventWithFilterData<LogEvent>> const& log_events, | ||||||||
ToStringFunc log_event_to_string, | ||||||||
bool use_filter | ||||||||
) -> DecodedResultsTsType { | ||||||||
if (use_filter && false == filtered_log_event_map.has_value()) { | ||||||||
return DecodedResultsTsType{emscripten::val::null()}; | ||||||||
} | ||||||||
|
||||||||
size_t length{0}; | ||||||||
if (use_filter) { | ||||||||
length = filtered_log_event_map->size(); | ||||||||
} else { | ||||||||
length = log_events.size(); | ||||||||
} | ||||||||
if (length < end_idx || begin_idx > end_idx) { | ||||||||
return DecodedResultsTsType{emscripten::val::null()}; | ||||||||
} | ||||||||
|
||||||||
auto const results{emscripten::val::array()}; | ||||||||
|
||||||||
for (size_t i = begin_idx; i < end_idx; ++i) { | ||||||||
size_t log_event_idx{0}; | ||||||||
if (use_filter) { | ||||||||
log_event_idx = filtered_log_event_map->at(i); | ||||||||
} else { | ||||||||
log_event_idx = i; | ||||||||
} | ||||||||
|
||||||||
auto const& log_event_with_filter_data{log_events.at(log_event_idx)}; | ||||||||
auto const& log_event = log_event_with_filter_data.get_log_event(); | ||||||||
auto const& timestamp = log_event_with_filter_data.get_timestamp(); | ||||||||
auto const& log_level = log_event_with_filter_data.get_log_level(); | ||||||||
|
||||||||
EM_ASM( | ||||||||
{ Emval.toValue($0).push([UTF8ToString($1), $2, $3, $4]); }, | ||||||||
results.as_handle(), | ||||||||
log_event_to_string(log_event).c_str(), | ||||||||
timestamp, | ||||||||
log_level, | ||||||||
log_event_idx + 1 | ||||||||
); | ||||||||
} | ||||||||
|
||||||||
return DecodedResultsTsType(results); | ||||||||
} | ||||||||
|
||||||||
protected: | ||||||||
explicit StreamReader() = default; | ||||||||
}; | ||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove
NONE
and useLENGTH
in its place? That way we can drop one of the enum values and we can think aboutLENGTH
as indicating that no valid level was found (so the search extended past the last valid value).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is a reasonable idea, but have some qualms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you're saying. On the other hand, having two invalid values is kind of messy already. For example, the code to check if a log level is within the valid enum values depends on
cValidLogLevelsBeginIdx
andLogLevel::LENGTH
.My perspective on the current log level parsing is that it's a heuristic until we migrate to IRv2 entirelyl; at that point, we shouldn't have to parse log levels any more. So I feel like simplifying the current code is a better approach until we switch to IRv2 or we have a user-need for more complicated levels.
That said, this isn't that big of a deal, so I guess if @junhaoliao doesn't agree, I'll leave it be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@junhaoliao please review my and kirk arguments, and let us know ur thoughts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kirkrodrigues Would CLP FFI also support parsing log level for IRv1 stream in the new
Deserializer
interface (the one that is now only for IRv2)? If not, we still need to maintain this enum for IRv1 till we completely drop support for IRv1, right?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we need to wait until we completely drop support for IRv1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @davemarco 's (1.) argument about the challenges of adding new log levels. However, I also recognize that the
NONE
value is used solely for log parsing (in read paths, not write paths). SinceNONE
is never serialized into non-code spaces and read back into the code, as long as our implementation consistently refers to it, there should be no risk of errors, which aligns with @kirkrodrigues's argument, assuming I understand it correctly.With that said, I still prefer that we keep the
NONE
as the very first item in the enum.The underlying type of the
LogLevel
isuint8_t
which has value0
when unutilized and placed in bss / initialized without explicit value (={}
). Such value is usually used to hint the default; for example, in order to hint feature XXX is by default enabled, we can add Boolean flags likeis_xxx_disabled
. WithNONE
removed, when we declare a enum variable without proper initialization, it may contain a valid value, and most importantly the default value is no longer hinted to code readers. (Though most of the time we shouldn't really read before initialization lolThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a way I feel the inline comments seem a bit bloated. Instead, as a new convention, can we add a prefix to hint the enum member is for describing properties of the enum? e.g.,
I was going to propose
_LENGTH
but clang-tidy claims that's reserved even now we've been using theenum class
syntax.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NONE
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@junhaoliao i'm not sure clang tidy will currently let you have underscore in enum. For now I added the part about changing name of "LENGTH" to existing issue #39 about pascal case.