Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split StreamReader into an interface and implementation to prepare for adding another IR stream reader. #26

Merged
merged 42 commits into from
Nov 7, 2024
Merged
Show file tree
Hide file tree
Changes from 25 commits
Commits
Show all changes
42 commits
Select commit Hold shift + click to select a range
3dae806
Refactor StreamReader to modularize decoding logic.
junhaoliao Oct 15, 2024
ab32045
Add a comment section for methods in StreamReader.hpp.
junhaoliao Oct 15, 2024
8dc3dff
Add `get_version` to decoding methods.
junhaoliao Oct 15, 2024
eba1d30
Move `#include <emscripten/bind.h>` from StreamReader.hpp to StreamRe…
junhaoliao Oct 15, 2024
5398d1d
Rename StreamReader -> IRStreamReader.
junhaoliao Oct 15, 2024
cfd55ca
Rename ClpIrStreamReader to ClpIRStreamReader in bindings.
junhaoliao Oct 15, 2024
d1ed2f0
Extract a StreamReader base class from IRStreamReader.
junhaoliao Oct 15, 2024
bf5e4c9
Rename IRStreamReader -> IrStreamReader.
junhaoliao Oct 15, 2024
ad306f2
merging main into junhao PR
davemarco Oct 31, 2024
c474bb4
some restructure
davemarco Nov 1, 2024
f4a6896
fix cmake
davemarco Nov 1, 2024
dba3910
more changes after test
davemarco Nov 1, 2024
9ddd974
junhao changes
davemarco Nov 4, 2024
a5cb529
fix lint
davemarco Nov 4, 2024
cd50c9a
version methodology change + better error handling
davemarco Nov 4, 2024
d2cf122
lint fix
davemarco Nov 4, 2024
7fe3a58
Update src/clp_ffi_js/ir/decoding_methods.hpp
davemarco Nov 5, 2024
9b67be9
fix print statement
davemarco Nov 5, 2024
fb2b290
fix lint
davemarco Nov 5, 2024
464cf31
forgot one comment, adding type for log level filter
davemarco Nov 5, 2024
6e8d025
Update src/clp_ffi_js/ir/StreamReader.hpp
davemarco Nov 5, 2024
19e41b9
Update src/clp_ffi_js/ir/StreamReader.cpp
davemarco Nov 5, 2024
4988949
Update src/clp_ffi_js/ir/decoding_methods.hpp
davemarco Nov 5, 2024
c79a83f
kirk review comments
davemarco Nov 5, 2024
ef0b86b
fix lint
davemarco Nov 5, 2024
d9e9e97
Apply suggestions from code review
davemarco Nov 6, 2024
efaeedd
kirk review
davemarco Nov 6, 2024
3764298
rename
davemarco Nov 6, 2024
c81cd4b
cleanup
davemarco Nov 6, 2024
28d4250
fix seek issue
davemarco Nov 6, 2024
c5db9a6
Update src/clp_ffi_js/ir/StreamReader.cpp
davemarco Nov 6, 2024
4a83bce
Update src/clp_ffi_js/ir/StreamReader.hpp
davemarco Nov 6, 2024
de7a2f8
Update src/clp_ffi_js/ir/StreamReader.hpp
davemarco Nov 6, 2024
1b31f08
Apply suggestions from code review
davemarco Nov 6, 2024
226ab3a
Update src/clp_ffi_js/ir/StreamReader.cpp
davemarco Nov 6, 2024
fa4debd
change to pass by value
davemarco Nov 6, 2024
4108398
kirk changes
davemarco Nov 6, 2024
c7a8fbf
change submodule
davemarco Nov 6, 2024
d5cad52
Merge branch 'main' into extractBaseClass
davemarco Nov 6, 2024
bfea301
Apply suggestions from code review
davemarco Nov 6, 2024
72bdf6e
revert submodule change
davemarco Nov 6, 2024
119de6c
use see
davemarco Nov 7, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -114,6 +114,7 @@ target_include_directories(${CLP_FFI_JS_BIN_NAME} PRIVATE src/)

set(CLP_FFI_JS_SRC_MAIN
src/clp_ffi_js/ir/decoding_methods.cpp
src/clp_ffi_js/ir/IrStreamReader.cpp
src/clp_ffi_js/ir/StreamReader.cpp
)

Expand Down
224 changes: 224 additions & 0 deletions src/clp_ffi_js/ir/IrStreamReader.cpp
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides renaming the class from StreamReader to IrStreamReader, did we make any other changes to this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed create() since we have effectively moved this to base class. I removed encripten bindings and put them all in base class.

Original file line number Diff line number Diff line change
@@ -0,0 +1,224 @@
#include "IrStreamReader.hpp"

#include <algorithm>
#include <cstddef>
#include <iterator>
#include <memory>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>
#include <type_traits>
#include <utility>
#include <vector>

#include <clp/Array.hpp>
#include <clp/ErrorCode.hpp>
#include <clp/ir/LogEventDeserializer.hpp>
#include <clp/ir/types.hpp>
#include <clp/streaming_compression/zstd/Decompressor.hpp>
#include <clp/TraceableException.hpp>
#include <clp/type_utils.hpp>
#include <emscripten/bind.h>
#include <emscripten/em_asm.h>
#include <emscripten/val.h>
#include <spdlog/spdlog.h>

#include <clp_ffi_js/ClpFfiJsException.hpp>
#include <clp_ffi_js/constants.hpp>
#include <clp_ffi_js/ir/decoding_methods.hpp>
#include <clp_ffi_js/ir/LogEventWithLevel.hpp>
#include <clp_ffi_js/ir/StreamReader.hpp>
#include <clp_ffi_js/ir/StreamReaderDataContext.hpp>

namespace clp_ffi_js::ir {

using namespace std::literals::string_literals;
using clp::ir::four_byte_encoded_variable_t;

auto IrStreamReader::get_num_events_buffered() const -> size_t {
return m_encoded_log_events.size();
}

auto IrStreamReader::get_filtered_log_event_map() const -> FilteredLogEventMapTsType {
if (false == m_filtered_log_event_map.has_value()) {
return FilteredLogEventMapTsType{emscripten::val::null()};
}

return FilteredLogEventMapTsType{emscripten::val::array(m_filtered_log_event_map.value())};
}

void IrStreamReader::filter_log_events(LogLevelFilterTsType const& log_level_filter) {
if (log_level_filter.isNull()) {
m_filtered_log_event_map.reset();
return;
}

m_filtered_log_event_map.emplace();
auto filter_levels{emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter
)};
Copy link

@coderabbitai coderabbitai bot Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add error handling for invalid log_level_filter input

Currently, the function assumes that log_level_filter is a valid array. To prevent potential errors if a non-array is passed from JavaScript, consider adding a type check and throwing an exception or returning an error if the input is invalid.

Apply this diff to enhance input validation:

 void IrStreamReader::filter_log_events(emscripten::val const& log_level_filter) {
     if (log_level_filter.isNull()) {
         m_filtered_log_event_map.reset();
         return;
     }
+    if (!log_level_filter.isArray()) {
+        throw ClpFfiJsException{
+            clp::ErrorCode::ErrorCode_BadParam,
+            __FILENAME__,
+            __LINE__,
+            "Expected an array for log_level_filter"
+        };
+    }

     m_filtered_log_event_map.emplace();
     auto filter_levels{emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter
     )};

Committable suggestion was skipped due to low confidence.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for now this is skippable since the wrapper code is typed with TypeScript and any usage with non-array would fail static checks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, understood. Since the wrapper code is typed with TypeScript and non-array inputs would fail static checks, additional error handling here is not necessary.


✏️ Learnings added
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.367Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually sorry for my oversight, I think we want to define a type (similar to FilteredLogEventMapTsType) instead of directly using emscripten::val here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, thank you for the clarification. Defining a specific type for log_level_filter will enhance type safety and improve the code's readability. Here's how you might define a type similar to FilteredLogEventMapTsType:

using LogLevelFilterTsType = emscripten::val; // Define a type representing the log level filter

void IrStreamReader::filter_log_events(LogLevelFilterTsType const& log_level_filter) {
    // Implementation remains the same
}

This approach makes the expected input type explicit in the method signature and aligns with the existing type definitions.

Since this updates our previous understanding and may contradict the earlier learning (ID: 940b05fa-811e-47b0-98a2-bfbbe962ccf3) regarding the unnecessary runtime error handling for log_level_filter, would you like me to remove this learning? You can also manage learnings directly using the CodeRabbit UI learnings page.


🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

for (size_t log_event_idx = 0; log_event_idx < m_encoded_log_events.size(); ++log_event_idx) {
auto const& log_event = m_encoded_log_events[log_event_idx];
if (std::ranges::find(
filter_levels,
clp::enum_to_underlying_type(log_event.get_log_level())
)
!= filter_levels.end())
{
m_filtered_log_event_map->emplace_back(log_event_idx);
}
}
}

auto IrStreamReader::deserialize_stream() -> size_t {
if (nullptr == m_stream_reader_data_context) {
return m_encoded_log_events.size();
}

constexpr size_t cDefaultNumReservedLogEvents{500'000};
m_encoded_log_events.reserve(cDefaultNumReservedLogEvents);

while (true) {
auto result{m_stream_reader_data_context->get_deserializer().deserialize_log_event()};
if (result.has_error()) {
auto const error{result.error()};
if (std::errc::no_message_available == error) {
break;
}
if (std::errc::result_out_of_range == error) {
SPDLOG_ERROR("File contains an incomplete IR stream");
break;
}
Copy link

@coderabbitai coderabbitai bot Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Uniform error handling upon deserialization failure

When encountering std::errc::result_out_of_range, the code logs an error and breaks the loop. To maintain consistency and ensure the caller is properly informed of all deserialization failures, consider throwing an exception here as well.

Here's a suggested change:

 if (std::errc::result_out_of_range == error) {
     SPDLOG_ERROR("File contains an incomplete IR stream");
-    break;
+    throw ClpFfiJsException{
+        clp::ErrorCode::ErrorCode_Corrupt,
+        __FILENAME__,
+        __LINE__,
+        "File contains an incomplete IR stream"
+    };
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (std::errc::no_message_available == error) {
break;
}
if (std::errc::result_out_of_range == error) {
SPDLOG_ERROR("File contains an incomplete IR stream");
break;
}
if (std::errc::no_message_available == error) {
break;
}
if (std::errc::result_out_of_range == error) {
SPDLOG_ERROR("File contains an incomplete IR stream");
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Corrupt,
__FILENAME__,
__LINE__,
"File contains an incomplete IR stream"
};
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not throw an exception here because we intend to handle missing End-of-Stream packet gracefully.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, understood. Thank you for the clarification.


✏️ Learnings added
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:85-91
Timestamp: 2024-11-04T19:37:26.993Z
Learning: In the `IrStreamReader::deserialize_stream` method, we handle missing End-of-Stream packets gracefully by logging an error and breaking the loop when encountering `std::errc::result_out_of_range`, rather than throwing an exception.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Corrupt,
__FILENAME__,
__LINE__,
"Failed to deserialize: "s + error.category().name() + ":" + error.message()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use std::format?

};
}
auto const& log_event = result.value();
auto const& message = log_event.get_message();

auto const& logtype = message.get_logtype();
constexpr size_t cLogLevelPositionInMessages{1};
LogLevel log_level{LogLevel::NONE};
if (logtype.length() > cLogLevelPositionInMessages) {
// NOLINTNEXTLINE(readability-qualified-auto)
auto const log_level_name_it{std::find_if(
cLogLevelNames.begin() + static_cast<size_t>(cValidLogLevelsBeginIdx),
cLogLevelNames.end(),
[&](std::string_view level) {
return logtype.substr(cLogLevelPositionInMessages).starts_with(level);
}
)};
if (log_level_name_it != cLogLevelNames.end()) {
log_level = static_cast<LogLevel>(
std::distance(cLogLevelNames.begin(), log_level_name_it)
);
}
}

auto log_viewer_event{LogEventWithLevel<four_byte_encoded_variable_t>(
log_event.get_timestamp(),
log_event.get_utc_offset(),
message,
log_level
)};
m_encoded_log_events.emplace_back(std::move(log_viewer_event));
}
m_stream_reader_data_context.reset(nullptr);
return m_encoded_log_events.size();
}

auto IrStreamReader::decode_range(size_t begin_idx, size_t end_idx, bool use_filter) const
-> DecodedResultsTsType {
if (use_filter && false == m_filtered_log_event_map.has_value()) {
return DecodedResultsTsType{emscripten::val::null()};
}

size_t length{0};
if (use_filter) {
length = m_filtered_log_event_map->size();
} else {
length = m_encoded_log_events.size();
}
if (length < end_idx || begin_idx > end_idx) {
return DecodedResultsTsType{emscripten::val::null()};
}

std::string message;
constexpr size_t cDefaultReservedMessageLength{512};
message.reserve(cDefaultReservedMessageLength);
auto const results{emscripten::val::array()};

for (size_t i = begin_idx; i < end_idx; ++i) {
size_t log_event_idx{0};
if (use_filter) {
log_event_idx = m_filtered_log_event_map->at(i);
} else {
log_event_idx = i;
}
auto const& log_event{m_encoded_log_events[log_event_idx]};

auto const parsed{log_event.get_message().decode_and_unparse()};
if (false == parsed.has_value()) {
SPDLOG_ERROR("Failed to decode message.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Necessary since we have the exception?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Necessary since we have the exception?

throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Failure,
__FILENAME__,
__LINE__,
"Failed to decode message"
};
}
message = parsed.value();

m_ts_pattern.insert_formatted_timestamp(log_event.get_timestamp(), message);

EM_ASM(
{ Emval.toValue($0).push([UTF8ToString($1), $2, $3, $4]); },
results.as_handle(),
message.c_str(),
log_event.get_timestamp(),
log_event.get_log_level(),
log_event_idx + 1
);
}

return DecodedResultsTsType(results);
}

IrStreamReader::IrStreamReader(
StreamReaderDataContext<four_byte_encoded_variable_t>&& stream_reader_data_context
)
: m_stream_reader_data_context{std::make_unique<
StreamReaderDataContext<four_byte_encoded_variable_t>>(
std::move(stream_reader_data_context)
)},
m_ts_pattern{m_stream_reader_data_context->get_deserializer().get_timestamp_pattern()} {}

auto IrStreamReader::create_data_context(
std::unique_ptr<clp::streaming_compression::zstd::Decompressor>&& zstd_decompressor,
clp::Array<char>&& data_buffer
) -> StreamReaderDataContext<four_byte_encoded_variable_t> {
rewind_reader_and_validate_encoding_type(*zstd_decompressor);
auto result{
clp::ir::LogEventDeserializer<four_byte_encoded_variable_t>::create(*zstd_decompressor)
};
if (result.has_error()) {
auto const error_code{result.error()};
SPDLOG_CRITICAL(
"Failed to create deserializer: {}:{}",
error_code.category().name(),
error_code.message()
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge this log into the the exception?

throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Failure,
__FILENAME__,
__LINE__,
"Failed to create deserializer"
};
}

return {std::move(data_buffer), std::move(zstd_decompressor), std::move(result.value())};
}
} // namespace clp_ffi_js::ir
113 changes: 113 additions & 0 deletions src/clp_ffi_js/ir/IrStreamReader.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
#ifndef CLP_FFI_JS_IR_IR_STREAM_READER_HPP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a linter for this soon, but the format of header defines should be <NAMESPACE>_<FILENAME>_HPP, which in this case would be CLP_FFI_JS_IR_IRSTREAMREADER_HPP.

#define CLP_FFI_JS_IR_IR_STREAM_READER_HPP

#include <Array.hpp>
#include <cstddef>
#include <memory>
#include <optional>
#include <streaming_compression/zstd/Decompressor.hpp>
#include <vector>

#include <clp/ir/types.hpp>
#include <clp/TimestampPattern.hpp>
#include <emscripten/val.h>

#include <clp_ffi_js/ir/LogEventWithLevel.hpp>
#include <clp_ffi_js/ir/StreamReader.hpp>
#include <clp_ffi_js/ir/StreamReaderDataContext.hpp>

namespace clp_ffi_js::ir {
using clp::ir::four_byte_encoded_variable_t;

/**
* Mapping between an index in the filtered log events collection to an index in the unfiltered
* log events collection.
*/
using FilteredLogEventsMap = std::optional<std::vector<size_t>>;

/**
* Class to deserialize and decode Zstandard-compressed CLP IRv1 streams as well as format decoded
* log events.
davemarco marked this conversation as resolved.
Show resolved Hide resolved
*/
class IrStreamReader : public StreamReader {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about calling this TextIrStreamReader and StreamReader -> IrStreamReader?

Copy link
Contributor Author

@davemarco davemarco Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you like UnstructuredStreamReader for irv1 and StructuredStreamReader for irv2? I think junhao prefers StreamReader vs. IrStreamReader since he wants to one day have an ffi for single file archive. Text seems a bit vague, I'm not sure if you have a reason for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about inserting Ir before Stream---UnstructuredIrStreamReader and StructuredIrStreamReader? (Once the new IR format supports backward compatibility, hopefully we can revert back to a single IrStreamReader class.)

since he wants to one day have an ffi for single file archive.

I see. I guess we can keep StreamReader then.

Text seems a bit vague

It's how we differentiate unstructured and structured for the casual user. "Text" for unstructured and "JSON" for structured.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Anyways i changed for now. I think this ffi is not for the layman

friend StreamReader;

public:
// Destructor
~IrStreamReader() override = default;

// Disable copy constructor and assignment operator
IrStreamReader(IrStreamReader const&) = delete;
auto operator=(IrStreamReader const&) -> IrStreamReader& = delete;

// Define default move constructor
IrStreamReader(IrStreamReader&&) = default;
// Delete move assignment operator since it's also disabled in `clp::ir::LogEventDeserializer`.
auto operator=(IrStreamReader&&) -> IrStreamReader& = delete;
davemarco marked this conversation as resolved.
Show resolved Hide resolved

// Methods
/**
* @return The number of events buffered.
*/
davemarco marked this conversation as resolved.
Show resolved Hide resolved
[[nodiscard]] auto get_num_events_buffered() const -> size_t override;

/**
* @return The filtered log events map.
*/
davemarco marked this conversation as resolved.
Show resolved Hide resolved
[[nodiscard]] auto get_filtered_log_event_map() const -> FilteredLogEventMapTsType override;

/**
* Generates a filtered collection from all log events.
*
* @param log_level_filter Array of selected log levels
*/
davemarco marked this conversation as resolved.
Show resolved Hide resolved
void filter_log_events(LogLevelFilterTsType const& log_level_filter) override;

/**
* Deserializes all log events in the stream. After the stream has been exhausted, it will be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deallocation seems like an implementation decision, so perhaps we can keep this note in the implementation's docstring but not in the interface's. @junhaoliao, what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I removed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, I see you've deleted this docstring. I'm fine keeping it this way, but if @junhaoliao wants to keep the part about deallocating the stream after exhaustion, then we need to add the docstring back in UnstructuredIrStreamReader.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(just to be sure I understand the discussion -

  • in StreamReader which serves an interface, I agree we do not want to mention After the stream has been exhausted, it will be deallocated..
  • in UnstructuredIrStreamReader which is an implementation of the interface, I prefer that we mention After the stream has been exhausted, it will be deallocated. so that we can remind any future developer about the stream data lifecycle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. One way to do this is:

    /**
     * Deserializes all log events in the stream. After the stream has been exhausted, it will be
     * deallocated.
     *
     * @return The number of successfully deserialized ("valid") log events.
     */

Another way is:

    /**
     * See `StreamReader::deserialize_stream`.
     * 
     * After the stream has been exhausted, it will be deallocated.
     *
     * @return See `StreamReader::deserialize_stream`.
     */

We usually go with the first approach, but we don't have too many instances of this case, so we could change conventions.

Copy link
Collaborator

@junhaoliao junhaoliao Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can go with the second approach to avoid repetition and potentially docs going out of sync any time in the future.

For the second approach, can we use @sa or @see? (i'm not too sure if this is how the command is intended to be used: https://www.doxygen.nl/manual/commands.html#cmdsa

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know Doxygen had see. Yeah, let's use @see.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davemarco, can you make the change?

* deallocated.
*
* @return The number of successfully deserialized ("valid") log events.
*/
[[nodiscard]] auto deserialize_stream() -> size_t override;

/**
* Decodes log events in the range `[beginIdx, endIdx)` of the filtered or unfiltered
* (depending on the value of `useFilter`) log events collection.
*
* @param begin_idx
* @param end_idx
* @param use_filter Whether to decode from the filtered or unfiltered log events collection.
* @return An array where each element is a decoded log event represented by an array of:
* - The log event's message
* - The log event's timestamp as milliseconds since the Unix epoch
* - The log event's log level as an integer that indexes into `cLogLevelNames`
* - The log event's number (1-indexed) in the stream
* @return null if any log event in the range doesn't exist (e.g. the range exceeds the number
* of log events in the collection).
*/
davemarco marked this conversation as resolved.
Show resolved Hide resolved
[[nodiscard]] auto decode_range(size_t begin_idx, size_t end_idx, bool use_filter) const
-> DecodedResultsTsType override;

private:
// Constructor
IrStreamReader(
davemarco marked this conversation as resolved.
Show resolved Hide resolved
StreamReaderDataContext<four_byte_encoded_variable_t>&& stream_reader_data_context
);

// Methods
[[nodiscard]] static auto create_data_context(
std::unique_ptr<clp::streaming_compression::zstd::Decompressor>&& zstd_decompressor,
clp::Array<char>&& data_buffer
) -> StreamReaderDataContext<four_byte_encoded_variable_t>;

// Variables
std::vector<LogEventWithLevel<four_byte_encoded_variable_t>> m_encoded_log_events;
std::unique_ptr<StreamReaderDataContext<four_byte_encoded_variable_t>>
m_stream_reader_data_context;
FilteredLogEventsMap m_filtered_log_event_map;
clp::TimestampPattern m_ts_pattern;
};
} // namespace clp_ffi_js::ir

#endif // CLP_FFI_JS_IR_IR_STREAM_READER_HPP
Loading
Loading