-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prototype: Add initial code for IRv2 decoder. #10
base: main
Are you sure you want to change the base?
Conversation
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
src/clp_ffi_js/ir/StreamReader.cpp
Outdated
bool is_four_bytes_encoding{true}; | ||
if (auto const err{ | ||
clp::ffi::ir_stream::get_encoding_type(*zstd_decompressor, is_four_bytes_encoding) | ||
}; | ||
clp::ffi::ir_stream::IRErrorCode::IRErrorCode_Success != err) | ||
{ | ||
SPDLOG_CRITICAL("Failed to decode encoding type, err={}", err); | ||
throw ClpFfiJsException{ | ||
clp::ErrorCode::ErrorCode_MetadataCorrupted, | ||
__FILENAME__, | ||
__LINE__, | ||
"Failed to decode encoding type." | ||
}; | ||
} | ||
if (false == is_four_bytes_encoding) { | ||
throw ClpFfiJsException{ | ||
clp::ErrorCode::ErrorCode_Unsupported, | ||
__FILENAME__, | ||
__LINE__, | ||
"IR stream uses unsupported encoding." | ||
}; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to remove this byte-encoding check as it seems to consume bytes from the reader and cause below clp::ffi::ir_stream::Deserializer::create(*zstd_decompressor)
to fail with a protocol error. Do we still need this byte-encoding check?
I haven't taken a careful look into the clp-core code. Just out of curiosity, are we sticking with 4-byte or 8-byte encoding in IRv2 or do we support both? I recall there were some discussions to deprecate one in favour of the other but I don't remember we had a conclusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For v2 deserializer, you don't need this check. In terms of encoding types, both are supported by the format. However, the logging library will probably only implement four-byte encoding interface. The deserializer will automatically handle both encodings
src/clp_ffi_js/ir/StreamReader.cpp
Outdated
while (true) { | ||
auto result{m_stream_reader_data_context->get_deserializer().deserialize_log_event()}; | ||
auto result{m_stream_reader_data_context->get_deserializer().deserialize_to_next_log_event(reader)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: pull out the deserializer to reduce line length.
src/clp_ffi_js/ir/StreamReader.cpp
Outdated
|
||
auto const parsed{log_event.get_message().decode_and_unparse()}; | ||
if (false == parsed.has_value()) { | ||
auto const json{log_event.serialize_to_json()}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we retrieve the log event as a JSON since I am concerned about the cost of inserting nodes into an Embind object when we traverse the log event. Let me know if there're any concerns that I missed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some benchmarking might be worthy before we decide on that though.
@LinZhihao-723 / @kirkrodrigues
|
|
We should plan for one stream reader (a.k.a. decoder / deserializer) class by the time we start using this in production. Before now and then, we can do whatever is simplest.
Do we need to? The way I see it, the two readers are part of the same IR library, so it feels easier to keep them in one binary.
I guess it's still smaller than the IR streams we end up decoding?
Right, this is another reason to go with one binary.
The production release plan is that the KV-pair IR format will support deserializing IR v1 streams and expose them with the same API as KV-pair IR streams. So hopefully we can move away from all the different template specializations.
In my mind, the production release will use a different version number to differentiate the two.
As Zhihao said above.
|
@kirkrodrigues Thanks for completing the responses. I'll stick with one binary and redesign the interfaces with one stream reader pictured. I also shortly discussed offline with Zhihao yesterday, and we have these items opened for discussions (@LinZhihao-723 please correct me if I'm wrong):
|
We could, but at the same time, magic numbers should be used sparingly imo. If one day, CLP IR streams are more pervasive, we don't want the implementers of file type checking tools to have dozens of different entries for the same conceptual format, lol.
This seems like the easiest option until we implement backwards compatible deserialization.
This is probably more work than the previous option.
Not yet. Hopefully within this month. |
…r to inherit from it.
… stream_reader_data_context) back to private.
src/clp_ffi_js/ir/StreamReader.cpp
Outdated
log_num | ||
); | ||
++log_num; | ||
return std::make_unique<KVPairIRStreamReader>(KVPairIRStreamReader::create(std::move(data_array))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bad practice since we have to recreate the data_buffer in KVPairIRStreamReader's factory function. Shall we make KVPairIRStreamReader's constructor public / friend so that we can
- seek(0) in the zstd reader
- create the deserializer from the zstd reader and then the StreamReaderDataContext
- call KVPairIRStreamReader's constructor here.
so long we don't expose the constructor via Embind there should be minor concerns of misuses of the constructor (at the moment we are not exposing StreamReaderDataContext anyways). What do you think?
# Conflicts: # src/clp_ffi_js/ir/StreamReader.cpp # src/clp_ffi_js/ir/StreamReader.hpp
…e interfaces of the newly proposed StreamReader.
The PR is not ready for review yet as I'm still cleaning up the log level filtering code for IRv2. |
This prototype is for reference only and not meant to be merged. |
The changes in this PR are only to facilitate discussions. The original IR decoder was modified directly for the ease of prototyping and to show the differences between the two decoders. (Maybe we would end up with only one decoder depending on our discussion results.)
Description
Validation performed