Add record_multiplexer microbenchmarks #24155

Open
wants to merge 6 commits into
base: dev

Contributor

@ballard26 ballard26 commented Nov 18, 2024

This PR adds microbenchmarks for record_multiplexer, along with the following pieces needed to support them:

  • A cmake-build-compatible random Protobuf message generator.
  • Protobuf support for record_generator.
  • A serde parquet writer that writes to a null ostream.
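As a rough illustration of the "null ostream" idea in the last item, the sketch below shows a sink that accepts every write and throws the bytes away, so a benchmark measures encoding cost rather than I/O. It uses `std::ostream` as a stand-in; the stream type the PR actually targets is not shown in this conversation, and the names `null_streambuf` and `bytes_discarded` are hypothetical.

```cpp
#include <cstddef>
#include <ostream>
#include <streambuf>

// Hypothetical sink: a stream buffer that discards everything written to it
// and only keeps a running count of the bytes it has dropped.
class null_streambuf : public std::streambuf {
public:
    std::size_t bytes_discarded() const { return _count; }

protected:
    // Single-character path: invoked because no put area is ever set up.
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            ++_count;
        }
        // Any value other than eof() signals success to the caller.
        return traits_type::not_eof(ch);
    }

    // Bulk path: report that all n bytes were "written" without storing them.
    std::streamsize xsputn(const char*, std::streamsize n) override {
        _count += static_cast<std::size_t>(n);
        return n;
    }

private:
    std::size_t _count = 0;
};
```

To use it, construct a `std::ostream` over a `null_streambuf` instance and hand that stream to the writer under test. The byte counter is optional, but it gives the benchmark a cheap way to confirm that the writer produced output at all.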

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • none

@ballard26 ballard26 changed the title [WIP] Add microbenchmarks to datalake [WIP] Add record_multiplexer microbenchmarks Nov 18, 2024
@ballard26 ballard26 force-pushed the iceberg-microbench-1 branch 2 times, most recently from 3712bfa to da6f4e6 Compare November 22, 2024 05:44
@github-actions github-actions bot added area/build area/wasm WASM Data Transforms labels Nov 22, 2024
@ballard26 ballard26 changed the title [WIP] Add record_multiplexer microbenchmarks Add record_multiplexer microbenchmarks Nov 22, 2024
@ballard26 ballard26 marked this pull request as ready for review November 22, 2024 05:51
@ballard26 ballard26 force-pushed the iceberg-microbench-1 branch 2 times, most recently from a77d575 to 5f7e229 Compare November 22, 2024 05:53
co_return std::nullopt;
}

iobuf encode_protobuf_message_index(const std::vector<int32_t>& message_index) {
Contributor Author

@ballard26 ballard26 Nov 22, 2024

Is there an existing serializer for this message index format anywhere in our codebase? There is a deserializer, get_proto_offsets, in src/v/datalake/schema_registry.h. Happy to move this serializer to a more general location if there is any use for it outside of the record generator.

Contributor

The closest thing is:

iobuf encode_pb_offsets(const std::vector<int32_t>& offsets) {
    auto cnt_bytes = vint::to_bytes(offsets.size());
    iobuf buf;
    buf.append(cnt_bytes.data(), cnt_bytes.size());
    for (auto o : offsets) {
        auto bytes = vint::to_bytes(o);
        buf.append(bytes.data(), bytes.size());
    }
    return buf;
}

I don't have strong feelings about code placement; I think leaving it in the record generator seems reasonable.
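For readers without the codebase at hand, the format discussed in this thread can be sketched in standalone C++: an element count followed by each index, every value written as a variable-length integer. The sketch assumes `vint::to_bytes` zigzag-encodes its input before emitting a base-128 varint, and it swaps `iobuf` for `std::vector<uint8_t>`. The names `encode_message_index` and `decode_message_index` are hypothetical and are not the functions in the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Zigzag maps signed values to unsigned ones so small magnitudes stay small.
inline uint64_t zigzag_encode(int64_t v) {
    return (static_cast<uint64_t>(v) << 1) ^ static_cast<uint64_t>(v >> 63);
}

inline int64_t zigzag_decode(uint64_t v) {
    return static_cast<int64_t>(v >> 1) ^ -static_cast<int64_t>(v & 1);
}

// Base-128 varint: 7 payload bits per byte, high bit set when more bytes follow.
inline void append_varint(std::vector<uint8_t>& out, uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<uint8_t>((v & 0x7f) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<uint8_t>(v));
}

// Assumed layout: varint(count), then varint(index) for each entry.
std::vector<uint8_t> encode_message_index(const std::vector<int32_t>& index) {
    std::vector<uint8_t> out;
    append_varint(out, zigzag_encode(static_cast<int64_t>(index.size())));
    for (int32_t i : index) {
        append_varint(out, zigzag_encode(i));
    }
    return out;
}

// Inverse of the encoder; returns nullopt on truncated or malformed input.
std::optional<std::vector<int32_t>>
decode_message_index(const std::vector<uint8_t>& in) {
    std::size_t pos = 0;
    auto read_one = [&]() -> std::optional<int64_t> {
        uint64_t v = 0;
        for (int shift = 0; shift < 64 && pos < in.size(); shift += 7) {
            uint8_t b = in[pos++];
            v |= static_cast<uint64_t>(b & 0x7f) << shift;
            if ((b & 0x80) == 0) {
                return zigzag_decode(v);
            }
        }
        return std::nullopt;
    };

    auto count = read_one();
    if (!count || *count < 0) {
        return std::nullopt;
    }
    std::vector<int32_t> index;
    for (int64_t i = 0; i < *count; ++i) {
        auto v = read_one();
        if (!v) {
            return std::nullopt;
        }
        index.push_back(static_cast<int32_t>(*v));
    }
    return index;
}
```

Pairing the encoder with a decoder makes a round-trip check cheap, which is a simple way to confirm that a generator-side serializer agrees with the existing deserializer before relying on it in a benchmark.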

@ballard26 ballard26 force-pushed the iceberg-microbench-1 branch 4 times, most recently from 1a07259 to d3cc7ab Compare November 22, 2024 06:12
@vbotbuildovich
Collaborator

4 participants