Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add support for log-level filtering of structured IR streams. #35

Merged
merged 55 commits into from
Dec 18, 2024
Merged
Show file tree
Hide file tree
Changes from 38 commits
Commits
Show all changes
55 commits
Select commit Hold shift + click to select a range
1a3da51
add new class
davemarco Nov 9, 2024
29a87d2
fix-lint
davemarco Nov 9, 2024
e5a8e4c
fix lint
davemarco Nov 9, 2024
921c322
small change
davemarco Nov 9, 2024
865e41d
small change
davemarco Nov 9, 2024
0234365
fix lint
davemarco Nov 9, 2024
0c86cf4
Update src/clp_ffi_js/ir/LogEventWithFilterData.hpp
davemarco Nov 9, 2024
84037cf
fix lint
davemarco Nov 10, 2024
ef1ff5d
fix lint
davemarco Nov 10, 2024
cdcad94
fix lint
davemarco Nov 10, 2024
a1e9142
fix lint
davemarco Nov 10, 2024
2738dd5
first draft
davemarco Nov 10, 2024
d4afd35
add missing filtering
davemarco Nov 10, 2024
3f32f91
fix lint
davemarco Nov 10, 2024
01ef415
add support for int
davemarco Nov 10, 2024
de5aaf5
add new functions
davemarco Nov 11, 2024
67f43f0
cleanup
davemarco Nov 11, 2024
be0e5ba
lint
davemarco Nov 11, 2024
8e8c648
add files
davemarco Nov 11, 2024
f2907f4
changes
davemarco Nov 11, 2024
a1b776f
fix filter
davemarco Nov 11, 2024
a199924
Merge branch 'main' into sFilter
davemarco Nov 11, 2024
d00d0a2
change version
davemarco Nov 11, 2024
e9ed45d
fix lint
davemarco Nov 11, 2024
72029f1
small change
davemarco Nov 11, 2024
c757571
junhao review
davemarco Nov 15, 2024
4975f79
fix lint
davemarco Nov 25, 2024
7346715
fix lint
davemarco Nov 25, 2024
04a8d1f
junhao changes
davemarco Dec 2, 2024
2a202b8
add missing code in structured reader
davemarco Dec 5, 2024
5332e9b
add template for decode
davemarco Dec 6, 2024
da6350f
lint
davemarco Dec 6, 2024
ee2eb42
fix lint
davemarco Dec 6, 2024
90b977f
reword some comments
davemarco Dec 6, 2024
8cc73d0
junhao review
davemarco Dec 9, 2024
4d905e8
remove double declaration
davemarco Dec 9, 2024
5854e36
fix lint
davemarco Dec 10, 2024
f3f6e76
move reserve up
davemarco Dec 10, 2024
d4467ea
Apply suggestions from code review
davemarco Dec 12, 2024
6660763
add reader option param
davemarco Dec 12, 2024
5b684ab
refactor using lambda
davemarco Dec 12, 2024
9a5a502
small nits
davemarco Dec 12, 2024
cce7980
fix lint
davemarco Dec 12, 2024
feec5ef
Apply suggestions from code review
davemarco Dec 13, 2024
61e1e83
kirk review
davemarco Dec 13, 2024
907c327
fix lint
davemarco Dec 13, 2024
1da4c7d
fix lint
davemarco Dec 13, 2024
f08e07f
Apply suggestions from code review
davemarco Dec 13, 2024
aee6422
kirk review
davemarco Dec 13, 2024
0581310
added rerror for invalid range
davemarco Dec 14, 2024
2aabf7b
Update src/clp_ffi_js/ir/StreamReader.hpp
davemarco Dec 18, 2024
2d7e02e
add throws doc strings
davemarco Dec 18, 2024
7d0a2c8
fix
davemarco Dec 18, 2024
88e6d3e
junhao review
davemarco Dec 18, 2024
7df5e26
add mising word
davemarco Dec 18, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,7 @@ endif()
set(CLP_FFI_JS_SRC_MAIN
src/clp_ffi_js/ir/StreamReader.cpp
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp
src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
)

Expand Down
21 changes: 12 additions & 9 deletions src/clp_ffi_js/constants.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
#include <array>
#include <cstdint>
#include <string_view>
#include <type_utils.hpp>

namespace clp_ffi_js {
/**
Expand All @@ -17,6 +18,7 @@ enum class LogLevel : std::uint8_t {
WARN,
ERROR,
FATAL,
LENGTH, // This isn't a valid log level.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove NONE and use LENGTH in its place? That way we can drop one of the enum values and we can think about LENGTH as indicating that no valid level was found (so the search extended past the last valid value).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a reasonable idea, but have some qualms

  1. If we add multiple level schemas in the future, say one with more levels (like a new CRITICAL), then LENGTH enum may or may not represent a valid level across level schemas. Having a unique NONE as always zero across level schemas may simplify some code.
  2. There will be some effort to modify log viewer to support this change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're saying. On the other hand, having two invalid values is kind of messy already. For example, the code to check if a log level is within the valid enum values depends on cValidLogLevelsBeginIdx and LogLevel::LENGTH.

My perspective on the current log level parsing is that it's a heuristic until we migrate to IRv2 entirelyl; at that point, we shouldn't have to parse log levels any more. So I feel like simplifying the current code is a better approach until we switch to IRv2 or we have a user-need for more complicated levels.

That said, this isn't that big of a deal, so I guess if @junhaoliao doesn't agree, I'll leave it be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao please review my and kirk arguments, and let us know ur thoughts

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the current log level parsing is that it's a heuristic until we migrate to IRv2 entirely

@kirkrodrigues Would CLP FFI also support parsing log level for IRv1 stream in the new Deserializer interface (the one that is now only for IRv2)? If not, we still need to maintain this enum for IRv1 till we completely drop support for IRv1, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we need to wait until we completely drop support for IRv1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @davemarco 's (1.) argument about the challenges of adding new log levels. However, I also recognize that the NONE value is used solely for log parsing (in read paths, not write paths). Since NONE is never serialized into non-code spaces and read back into the code, as long as our implementation consistently refers to it, there should be no risk of errors, which aligns with @kirkrodrigues's argument, assuming I understand it correctly.

With that said, I still prefer that we keep the NONE as the very first item in the enum.
The underlying type of the LogLevel is uint8_t which has value 0 when unutilized and placed in bss / initialized without explicit value (={}). Such value is usually used to hint the default; for example, in order to hint feature XXX is by default enabled, we can add Boolean flags like is_xxx_disabled. With NONE removed, when we declare a enum variable without proper initialization, it may contain a valid value, and most importantly the default value is no longer hinted to code readers. (Though most of the time we shouldn't really read before initialization lol

Copy link
Member

@junhaoliao junhaoliao Dec 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a way I feel the inline comments seem a bit bloated. Instead, as a new convention, can we add a prefix to hint the enum member is for describing properties of the enum? e.g.,

enum class FilesTableCol: std::uint8_t {
   NAME = 0,
   PATH,
   SIZE,
   FILES_TABLE_COL_LENGTH // then this should not be confused with `SIZE`
}

I was going to propose _LENGTH but clang-tidy claims that's reserved even now we've been using the enum class syntax.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Sure, let's keep NONE.
  • Sure, I guess we could define a new convention. Keep in mind the naming of enum values should be PascalCase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao i'm not sure clang tidy will currently let you have underscore in enum. For now I added the part about changing name of "LENGTH" to existing issue #39 about pascal case.

};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a Length as the last item and use it to replace cValidLogLevelsEndIdx?

Suggested change
};
Length
};

That way we don't always have to update cValidLogLevelsEndIdx if we ever add more log levels.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking at https://github.com/y-scope/clp/blob/f1876cd63a56dd79a6243f6c8b6b6a7fd48348cc/components/core/src/clp/GlobalMySQLMetadataDB.cpp#L22-L23 when i suggested the enum name "length" though I don't see the convention being documented anywhere. If that's a convention, i feel we can omit the comment.

@kirkrodrigues could you confirm the convention?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a general C/C++ convention to call it length; that said, it's probably not a bad idea to leave a comment because maybe the reader would confuse "SIZE" for a log level.

I also realized that the case of these enum values is wrong. They should be None, Trace, etc. Can we open an issue and fix in another PR?

Copy link
Member

@junhaoliao junhaoliao Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai let's open an issue to correct cases of enum property names as PascalCase

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, I've successfully created the issue to correct the enum property names to PascalCase: Issue #39.

🔗 Analysis chain

@junhaoliao, I've opened an issue to correct the cases of enum property names to PascalCase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
gh issue create \
  --title "Correct enum property names to PascalCase" \
  --body "As discussed in [PR #35](https://github.com/y-scope/clp-ffi-js/pull/35) and [this comment](https://github.com/y-scope/clp-ffi-js/pull/35#discussion_r1842284173), the enum property names in \`src/clp_ffi_js/constants.hpp\` should be updated to PascalCase to adhere to coding conventions.

Requested by @junhaoliao." \
  --assignee junhaoliao

Length of output: 460

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed to Length.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

constexpr LogLevel cValidLogLevelsBeginIdx{LogLevel::TRACE};

Expand All @@ -25,15 +27,16 @@ constexpr LogLevel cValidLogLevelsBeginIdx{LogLevel::TRACE};
*
* NOTE: These must be kept in sync manually.
*/
constexpr std::array<std::string_view, 7> cLogLevelNames{
"NONE", // This isn't a valid log level.
"TRACE",
"DEBUG",
"INFO",
"WARN",
"ERROR",
"FATAL",
};
constexpr std::array<std::string_view, clp::enum_to_underlying_type(LogLevel::LENGTH)>
cLogLevelNames{
"NONE", // This isn't a valid log level.
"TRACE",
"DEBUG",
"INFO",
"WARN",
"ERROR",
"FATAL",
};
} // namespace clp_ffi_js

#endif // CLP_FFI_JS_CONSTANTS_HPP
5 changes: 4 additions & 1 deletion src/clp_ffi_js/ir/LogEventWithFilterData.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
#include <concepts>
#include <utility>

#include <clp/ffi/KeyValuePairLogEvent.hpp>
#include <clp/ir/LogEvent.hpp>
#include <clp/ir/types.hpp>

Expand All @@ -12,6 +13,7 @@
namespace clp_ffi_js::ir {
using clp::ir::four_byte_encoded_variable_t;
using UnstructuredLogEvent = clp::ir::LogEvent<four_byte_encoded_variable_t>;
using StructuredLogEvent = clp::ffi::KeyValuePairLogEvent;

/**
* A templated class that extends a log event type with processed versions of some of its fields,
Expand All @@ -21,7 +23,8 @@ using UnstructuredLogEvent = clp::ir::LogEvent<four_byte_encoded_variable_t>;
* @tparam LogEvent The type of the log event.
*/
template <typename LogEvent>
requires std::same_as<LogEvent, UnstructuredLogEvent>
requires std::same_as<LogEvent, UnstructuredLogEvent> || std::same_as<LogEvent, StructuredLogEvent>

class LogEventWithFilterData {
public:
// Constructor
Expand Down
4 changes: 3 additions & 1 deletion src/clp_ffi_js/ir/StreamReader.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -117,7 +117,9 @@ EMSCRIPTEN_BINDINGS(ClpStreamReader) {
// JS types used as inputs
emscripten::register_type<clp_ffi_js::ir::DataArrayTsType>("Uint8Array");
emscripten::register_type<clp_ffi_js::ir::LogLevelFilterTsType>("number[] | null");
emscripten::register_type<clp_ffi_js::ir::ReaderOptions>("{timestampKey: string} | null");
emscripten::register_type<clp_ffi_js::ir::ReaderOptions>(
"{logLevelKey: string, timestampKey: string} | null"
);
Comment on lines +120 to +122
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add validation for ReaderOptions keys

The ReaderOptions type now includes logLevelKey, but there's no visible validation to ensure the provided keys are non-empty or valid.

Consider adding validation in the create method. Here's a suggested implementation:

if (reader_options != nullptr) {
    auto validate_key = [](const std::string& key, const std::string& key_name) {
        if (key.empty()) {
            throw ClpFfiJsException{
                clp::ErrorCode::ErrorCode_InvalidArgument,
                __FILENAME__,
                __LINE__,
                std::format("{} cannot be empty", key_name)
            };
        }
    };
    
    validate_key(reader_options.logLevelKey, "logLevelKey");
    validate_key(reader_options.timestampKey, "timestampKey");
}


// JS types used as outputs
emscripten::enum_<clp_ffi_js::ir::StreamType>("IrStreamType")
Expand Down
8 changes: 8 additions & 0 deletions src/clp_ffi_js/ir/StreamReader.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <vector>

#include <clp/streaming_compression/zstd/Decompressor.hpp>
#include <emscripten/val.h>
Expand All @@ -23,6 +25,12 @@ enum class StreamType : uint8_t {
Unstructured,
};

/**
* Mapping between an index in the filtered log events collection to an index in the unfiltered
* log events collection.
*/
using FilteredLogEventsMap = std::optional<std::vector<size_t>>;

/**
* Class to deserialize and decode Zstandard-compressed CLP IR streams as well as format decoded
* log events.
Expand Down
113 changes: 31 additions & 82 deletions src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7,45 +7,37 @@
#include <string_view>
#include <system_error>
#include <utility>
#include <vector>

#include <clp/Array.hpp>
#include <clp/ErrorCode.hpp>
#include <clp/ffi/ir_stream/Deserializer.hpp>
#include <clp/ffi/KeyValuePairLogEvent.hpp>
#include <clp/ffi/Value.hpp>
#include <clp/ir/types.hpp>
#include <clp/TraceableException.hpp>
#include <emscripten/em_asm.h>
#include <emscripten/val.h>
#include <spdlog/spdlog.h>

#include <clp_ffi_js/ClpFfiJsException.hpp>
#include <clp_ffi_js/constants.hpp>
#include <clp_ffi_js/ir/StreamReader.hpp>
#include <clp_ffi_js/ir/StreamReaderDataContext.hpp>
#include <clp_ffi_js/ir/StructuredIrUnitHandler.hpp>
#include <clp_ffi_js/ir/utils.hpp>

namespace clp_ffi_js::ir {
namespace {
constexpr std::string_view cEmptyJsonStr{"{}"};
constexpr std::string_view cLogLevelFilteringNotSupportedErrorMsg{
"Log level filtering is not yet supported in this reader."
};
constexpr std::string_view cReaderOptionsLogLevelKey{"logLevelKey"};
constexpr std::string_view cReaderOptionsTimestampKey{"timestampKey"};
} // namespace

using clp::ir::four_byte_encoded_variable_t;

auto StructuredIrStreamReader::create(
std::unique_ptr<ZstdDecompressor>&& zstd_decompressor,
clp::Array<char> data_array,
ReaderOptions const& reader_options
) -> StructuredIrStreamReader {
auto deserialized_log_events{std::make_shared<std::vector<clp::ffi::KeyValuePairLogEvent>>()};
auto deserialized_log_events{std::make_shared<StructuredLogEvents>()};
auto result{StructuredIrDeserializer::create(
*zstd_decompressor,
IrUnitHandler{
StructuredIrUnitHandler{
deserialized_log_events,
reader_options[cReaderOptionsLogLevelKey.data()].as<std::string>(),
reader_options[cReaderOptionsTimestampKey.data()].as<std::string>()
Comment on lines +41 to 42
Copy link

@coderabbitai coderabbitai bot Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Ensure presence of required keys in reader_options

The code assumes that reader_options contains the keys logLevelKey and timestampKey. To prevent potential exceptions, consider adding error handling to verify that these keys are present.

Apply this diff to add key existence checks:

+if (!reader_options.hasOwnProperty(cReaderOptionsLogLevelKey.data()) || 
+    !reader_options.hasOwnProperty(cReaderOptionsTimestampKey.data())) {
+    throw ClpFfiJsException{
+        clp::ErrorCode::ErrorCode_InvalidArgument,
+        __FILENAME__,
+        __LINE__,
+        "Required reader options keys are missing"
+    };
+}

Committable suggestion skipped: line range outside the PR's diff.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create a issue for JavaScript support (usage without the TS definition file) and keep a list of TODOs there.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, I've created the issue as requested: Issue #36 to track JavaScript support without the TypeScript definition file and to keep a list of TODOs.

🔗 Analysis chain

@junhaoliao, I've created a new GitHub issue to track JavaScript support without the TypeScript definition file and to keep a list of TODOs.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Create a new GitHub issue for JavaScript support without the TypeScript definition file
gh issue create \
  --title "JavaScript support without the TypeScript definition file" \
  --body "We need to support JavaScript usage without relying on the TypeScript definition file.

This issue is created to keep a list of TODOs for implementing this feature.

Referenced from [PR #35](https://github.com/y-scope/clp-ffi-js/pull/35) and [this comment](https://github.com/y-scope/clp-ffi-js/pull/35#discussion_r1835929602) by @junhaoliao." \
  --assignee junhaoliao

Length of output: 507


✏️ Learnings added
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#35
File: src/clp_ffi_js/ir/StructuredIrStreamReader.cpp:50-51
Timestamp: 2024-11-14T15:31:45.747Z
Learning: Ensure that `clp-ffi-js` supports JavaScript usage without relying on the TypeScript definition file, and keep track of related TODOs in a GitHub issue.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

}
)};
Expand Down Expand Up @@ -75,15 +67,19 @@ auto StructuredIrStreamReader::get_num_events_buffered() const -> size_t {
}

auto StructuredIrStreamReader::get_filtered_log_event_map() const -> FilteredLogEventMapTsType {
SPDLOG_ERROR(cLogLevelFilteringNotSupportedErrorMsg);
return FilteredLogEventMapTsType{emscripten::val::null()};
if (false == m_filtered_log_event_map.has_value()) {
return FilteredLogEventMapTsType{emscripten::val::null()};
}

return FilteredLogEventMapTsType{emscripten::val::array(m_filtered_log_event_map.value())};
}

void StructuredIrStreamReader::filter_log_events(LogLevelFilterTsType const& log_level_filter) {
if (log_level_filter.isNull()) {
return;
}
SPDLOG_ERROR(cLogLevelFilteringNotSupportedErrorMsg);
generic_filter_log_events(
m_filtered_log_event_map,
log_level_filter,
*m_deserialized_log_events
);
}

auto StructuredIrStreamReader::deserialize_stream() -> size_t {
Expand All @@ -94,16 +90,14 @@ auto StructuredIrStreamReader::deserialize_stream() -> size_t {
constexpr size_t cDefaultNumReservedLogEvents{500'000};
m_deserialized_log_events->reserve(cDefaultNumReservedLogEvents);
auto& reader{m_stream_reader_data_context->get_reader()};
while (true) {
auto result{m_stream_reader_data_context->get_deserializer().deserialize_next_ir_unit(reader
)};
auto& deserializer = m_stream_reader_data_context->get_deserializer();

while (false == deserializer.is_stream_completed()) {
auto result{deserializer.deserialize_next_ir_unit(reader)};
if (false == result.has_error()) {
continue;
}
auto const error{result.error()};
if (std::errc::operation_not_permitted == error) {
break;
}
if (std::errc::result_out_of_range == error) {
Comment on lines +96 to 102
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LinZhihao-723 could you help confirm if this is the correct / preferred call sequence to handle End of Stream?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my end I tested with a file without the EOS symbol which seems to be gracefully handled as I saw no crash and only this error message was printed:

[2024-12-09 14:26:35.225] [error] [StructuredIrStreamReader.cpp:102] File contains an incomplete IR stream

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, u can refer to this part of code in clp-ffi-py to see some similar usage: https://github.com/y-scope/clp-ffi-py/blob/main/src/clp_ffi_py/ir/native/PyDeserializer.cpp#L224

Comment on lines +96 to 102
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve error handling for incomplete IR streams.

Based on the learnings from LinZhihao-723, we should only check for std::errc::result_out_of_range as the incomplete stream error has been fixed in ffi core.

Apply this change:

-    while (false == deserializer.is_stream_completed()) {
-        auto result{deserializer.deserialize_next_ir_unit(reader)};
-        if (false == result.has_error()) {
-            continue;
-        }
-        auto const error{result.error()};
-        if (std::errc::result_out_of_range == error) {
-            SPDLOG_ERROR("File contains an incomplete IR stream");
-            break;
-        }
+    while (true) {
+        auto result{deserializer.deserialize_next_ir_unit(reader)};
+        if (result.has_error()) {
+            if (std::errc::result_out_of_range == result.error()) {
+                SPDLOG_ERROR("File contains an incomplete IR stream");
+                break;
+            }
+            throw ClpFfiJsException{
+                clp::ErrorCode::ErrorCode_Corrupt,
+                __FILENAME__,
+                __LINE__,
+                std::format(
+                    "Failed to deserialize IR unit: {}:{}",
+                    result.error().category().name(),
+                    result.error().message()
+                )
+            };
+        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
while (false == deserializer.is_stream_completed()) {
auto result{deserializer.deserialize_next_ir_unit(reader)};
if (false == result.has_error()) {
continue;
}
auto const error{result.error()};
if (std::errc::operation_not_permitted == error) {
break;
}
if (std::errc::result_out_of_range == error) {
while (true) {
auto result{deserializer.deserialize_next_ir_unit(reader)};
if (result.has_error()) {
if (std::errc::result_out_of_range == result.error()) {
SPDLOG_ERROR("File contains an incomplete IR stream");
break;
}
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Corrupt,
__FILENAME__,
__LINE__,
std::format(
"Failed to deserialize IR unit: {}:{}",
result.error().category().name(),
result.error().message()
)
};
}

SPDLOG_ERROR("File contains an incomplete IR stream");
break;
Expand All @@ -119,72 +113,27 @@ auto StructuredIrStreamReader::deserialize_stream() -> size_t {
)
};
}
m_timestamp_node_id = m_stream_reader_data_context->get_deserializer()
.get_ir_unit_handler()
.get_timestamp_node_id();
m_stream_reader_data_context.reset(nullptr);
return m_deserialized_log_events->size();
}

auto StructuredIrStreamReader::decode_range(size_t begin_idx, size_t end_idx, bool use_filter) const
-> DecodedResultsTsType {
if (use_filter) {
SPDLOG_ERROR(cLogLevelFilteringNotSupportedErrorMsg);
return DecodedResultsTsType{emscripten::val::null()};
}

if (m_deserialized_log_events->size() < end_idx || begin_idx > end_idx) {
return DecodedResultsTsType{emscripten::val::null()};
}

auto const results{emscripten::val::array()};

for (size_t log_event_idx = begin_idx; log_event_idx < end_idx; ++log_event_idx) {
auto const& log_event{m_deserialized_log_events->at(log_event_idx)};

auto const json_result{log_event.serialize_to_json()};
std::string json_str{cEmptyJsonStr};
if (false == json_result.has_value()) {
auto error_code{json_result.error()};
SPDLOG_ERROR(
"Failed to deserialize log event to JSON: {}:{}",
error_code.category().name(),
error_code.message()
);
} else {
json_str = json_result.value().dump();
}

auto const& id_value_pairs{log_event.get_node_id_value_pairs()};
clp::ffi::value_int_t timestamp{0};
if (m_timestamp_node_id.has_value()) {
auto const& timestamp_pair{id_value_pairs.at(m_timestamp_node_id.value())};
if (timestamp_pair.has_value()) {
if (timestamp_pair->is<clp::ffi::value_int_t>()) {
timestamp = timestamp_pair.value().get_immutable_view<clp::ffi::value_int_t>();
} else {
// TODO: Add support for parsing timestamp values of string type.
SPDLOG_ERROR("Unable to parse timestamp for log_event_idx={}", log_event_idx);
}
}
}

EM_ASM(
{ Emval.toValue($0).push([UTF8ToString($1), $2, $3, $4]); },
results.as_handle(),
json_str.c_str(),
timestamp,
LogLevel::NONE,
log_event_idx + 1
);
}

return DecodedResultsTsType(results);
return generic_decode_range(
begin_idx,
end_idx,
m_filtered_log_event_map,
*m_deserialized_log_events,
use_filter,
// `ts_pattern` argument is unused in `generic_decode_range<StructuredLogEvents>`.
// Default timestamp pattern is used to conform to `generic_decode_range<T>` interface.
clp::TimestampPattern()
);
}

StructuredIrStreamReader::StructuredIrStreamReader(
StreamReaderDataContext<StructuredIrDeserializer>&& stream_reader_data_context,
std::shared_ptr<std::vector<clp::ffi::KeyValuePairLogEvent>> deserialized_log_events
std::shared_ptr<StructuredLogEvents> deserialized_log_events
)
: m_deserialized_log_events{std::move(deserialized_log_events)},
m_stream_reader_data_context{
Expand Down
Loading
Loading