Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pageserver: handle decompression outside vectored read_blobs #8942

Merged

Conversation

yliang412
Copy link
Contributor

@yliang412 yliang412 commented Sep 6, 2024

Part of #8130.

Problem

Currently, decompression is performed within the read_blobs implementation and the decompressed blob will be appended to the end of the BytesMut buffer. We will lose this flexibility of extending the buffer when we switch to using our own dio-aligned buffer (WIP in #8730). To facilitate the adoption of aligned buffer, we need to refactor the code to perform decompression outside read_blobs.

Summary of changes

  • VectoredBlobReader::read_blobs will return VectoredBlob without performing decompression and appending decompressed blob. It becomes the caller's responsibility to decompress the buffer.
  • Added a new BufView type that functions as Cow<Bytes, &[u8]>.
  • Perform decompression within VectoredBlob::read so that people don't have to explicitly thinking about compression when using the reader interface.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@yliang412 yliang412 force-pushed the yuchen/handle-decompression-outside-vectored-read-blobs branch from 44793b7 to 5f5653a Compare September 6, 2024 13:05
@yliang412 yliang412 added the c/storage/pageserver Component: storage: pageserver label Sep 6, 2024
@yliang412 yliang412 self-assigned this Sep 6, 2024
@yliang412 yliang412 marked this pull request as ready for review September 6, 2024 13:28
@yliang412 yliang412 requested a review from a team as a code owner September 6, 2024 13:28
Copy link

github-actions bot commented Sep 6, 2024

4986 tests run: 4822 passed, 0 failed, 164 skipped (full report)


Flaky tests (6)

Postgres 17

Postgres 16

  • test_ondemand_wal_download_in_replication_slot_funcs: release-x86-64

Postgres 15

  • test_ondemand_wal_download_in_replication_slot_funcs: release-x86-64

Code coverage* (full report)

  • functions: 32.1% (7475 of 23255 functions)
  • lines: 50.0% (60227 of 120574 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
67bd614 at 2024-09-24T16:56:24.080Z :recycle:

Copy link
Contributor

@problame problame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding @koivunej's #8942 (comment): I think if decompress took a Cow<Bytes> instead of the Option<Bytes> it would make the consuming code paths more concise.

pageserver/src/tenant/vectored_blob_io.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/vectored_blob_io.rs Outdated Show resolved Hide resolved
Copy link
Member

@arpad-m arpad-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose appending to the buffer in read_blobs because I didn't want to make compression someone has to explicitly think about when using the reader interface.

Because if it is something someone has to explicitly think about, mistakes can happen, and people can forget about it, resulting in the code working on compressed data. And it is more cumbersome to work with the API.

Therefore, I agree with Christian's suggestion to add a function to VectoredBlob to allow reading of the blobs, and returning something like Cow<Bytes, &[u8]> (or a custom enum if that doesn't work).

Ideally one would make the members of VectoredBlob private so that one can't mistakenly forget about calling the decompress function on it.

Signed-off-by: Yuchen Liang <[email protected]>
Copy link

@orca-security-us orca-security-us bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Orca Security Scan Summary

Status Check Issues by priority
Passed Passed Infrastructure as Code high 0   medium 0   low 0   info 0 View in Orca
Passed Passed Secrets high 0   medium 0   low 0   info 0 View in Orca
Passed Passed Vulnerabilities high 0   medium 0   low 0   info 0 View in Orca

Copy link
Member

@arpad-m arpad-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better now, thanks for adjusting it. It's well done work.

pageserver/src/tenant/vectored_blob_io.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/vectored_blob_io.rs Outdated Show resolved Hide resolved
Signed-off-by: Yuchen Liang <[email protected]>
@yliang412 yliang412 enabled auto-merge (squash) September 24, 2024 14:54
@yliang412 yliang412 merged commit 4f67b02 into main Sep 24, 2024
75 checks passed
@yliang412 yliang412 deleted the yuchen/handle-decompression-outside-vectored-read-blobs branch September 24, 2024 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage/pageserver Component: storage: pageserver
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants