Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable file system buffer reuse for compaction prefetches #13187

Draft
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

archang19
Copy link
Contributor

@archang19 archang19 commented Dec 5, 2024

Summary

In #13177, I discussed an unsigned integer overflow issue that affects compaction reads inside FilePrefetchBuffer when we attempt to enable the file system buffer reuse optimization. In that PR, I disabled the optimization whenever for_compaction was true to eliminate the source of the bug.

This PR safely re-enables the optimization when for_compaction is true. We need to properly set the overlap buffer through PrefetchInternal rather than simply calling Prefetch. Prefetch assumes num_buffers_ is 1 (i.e. async IO is disabled), so historically it did not have any overlap buffer logic. What ends up happening (with the old bug) is that, when we try to reuse the file system provided buffer, inside the Prefetch method, we read the remaining missing data. However, since we do not do any RefitTail method when use_fs_buffer is true, normally we would rely on copying the partial relevant data into an overlap buffer. That overlap buffer logic was missing, so the final main buffer ends up storing data from an offset that is greater than the requested offset, and we effectively end up "throwing away" part of the requested data.

This PR also unifies the prefetching logic for compaction and non-compaction reads:

  • The same readahead size is used. Previously, we read only std::max(n, readahead_size_) bytes for compaction reads, rather than n + readahead_size_ bytes
  • The stats for PREFETCH_HITS and PREFETCH_BYTES_USEFUL are tracked for both. Previously, they were only tracked for non-compaction reads.

These two small changes should help reduce some of the cognitive load required to understand the codebase. The test suite also became easier to maintain. We could not come up with good reasons why the logic for the readahead size and stats should be different for compaction reads.

Test Plan

I removed the temporary test case from #13200 and incorporated the same test cases into my updated parameterized test case, which tests the valid combinations between use_async_prefetch and for_compaction.

I went further and added a randomized test case that will simply try to hit assertion failures and catch any missing areas in the logic.

I also added a test case for compaction reads without the file system buffer reuse optimization. I am thinking that it may be valuable to make a future PR that unifies a lot of these prefetch tests and parametrizes as much of them as possible. This way we can avoid writing duplicate tests and just look over different parameters for async IO, direct IO, file system buffer reuse, and for_compaction.

@facebook-github-bot
Copy link
Contributor

@archang19 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@archang19 archang19 force-pushed the assert-preconditions-prefetch branch from cb16e90 to 32d2d17 Compare December 6, 2024 23:28
@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@archang19 archang19 requested a review from anand1976 December 6, 2024 23:50
@archang19 archang19 force-pushed the assert-preconditions-prefetch branch from 3fd9280 to 9a0a2c4 Compare December 6, 2024 23:55
@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@archang19 archang19 force-pushed the assert-preconditions-prefetch branch from f87e358 to 88ebd8b Compare December 9, 2024 23:30
@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@archang19 archang19 changed the title Add precondition assertions for file prefetch buffer reads Enable file system buffer reuse for compaction prefetches Dec 9, 2024
@archang19 archang19 force-pushed the assert-preconditions-prefetch branch from 88ebd8b to ab57916 Compare December 9, 2024 23:44
@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@archang19 archang19 force-pushed the assert-preconditions-prefetch branch from 315bfc0 to 5e4483f Compare December 12, 2024 18:12
@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

5 similar comments
@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@hx235
Copy link
Contributor

hx235 commented Dec 13, 2024

Just to clarify:

Prefetch assumes num_buffers_ is 1 (i.e. async IO is disabled), so historically it did not have any overlap buffer logic. ... However, since we do not do any RefitTail method when use_fs_buffer is true, normally we would rely on copying the partial relevant data into an overlap buffer. That overlap buffer logic was missing, so the final main buffer ends up storing data from an offset that is greater than the requested offset, and we effectively end up "throwing away" part of the requested data.

Why do we need overlap buffer logic for reuse file system buffer + prefetch + non-assync IO? Is it because there can actually be overlapping data OR is it just because the overlap_buf_ is reused while the data being copied are non-overlapped? I'm trying to get a high-level understanding on why we ended up having "so the final main buffer ends up storing data from an offset that is greater than the requested offset..." in Prefetch()

// readahize_size_.
uint64_t trimmed_readahead_size = 0;
if (n < readahead_size_) {
trimmed_readahead_size = readahead_size_ - n;
Copy link
Contributor

@hx235 hx235 Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where in the original code shows that compaction is prefetching only readahead_size_ - n but not readahead_size_?

I remembered it was very confusing to me what was std::max(n, readahead_size_) in Prefetch() intended for - why not n + readahead_size_

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where in the original code shows that compaction is prefetching only readahead_size_ - n but not readahead_size_?

https://github.com/facebook/rocksdb/blob/main/file/file_prefetch_buffer.cc#L823

      if (for_compaction) {
        s = Prefetch(opts, reader, offset, std::max(n, readahead_size_));

I think you have it correct. The std::max(n, readahead_size_) that you mentioned is the reason that the prefetch is only readahead_size_ - n. If n < readahead_size_, we read the original n bytes plus readahead_size_ - n of prefetched data

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anand1976 I am in favor of unifying the logic so that we treat compaction readaheads the same as non-compaction readaheads (e.g. fetch the requested amount + readahead_size_ in either case). Do you have any objections / what are your thoughts? I think this would simplify our logic and don't see downsides

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future readers, offline we decided to unify the logic and use the same readahead amount for compaction reads

@archang19 archang19 force-pushed the assert-preconditions-prefetch branch from 2d72233 to ac5fd5c Compare December 19, 2024 16:50
@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@archang19
Copy link
Contributor Author

Why do we need overlap buffer logic for reuse file system buffer + prefetch + non-assync IO? Is it because there can actually be overlapping data OR is it just because the overlap_buf_ is reused while the data being copied are non-overlapped? I'm trying to get a high-level understanding on why we ended up having "so the final main buffer ends up storing data from an offset that is greater than the requested offset..." in Prefetch()

@hx235 With async IO, there are 2 buffers, so the overlap buffer is for when the requested data spans the 2 buffers (i.e. buffer 1 has the first half of the data and buffer 2 has the second half).

Without async IO, and with the file system buffer optimization, we use the overlap buffer when the main (and only buffer) only has a "partial hit." Say the buffer contains offsets 100-200 and we request 150-300. Now, without the file system buffer optimization, we would normally "refit tail," move bytes 150-200 from the end of the buffer to the start of the buffer, and then request bytes 200-300 (+ any additional prefetching). With the file system buffer optimization, we don't do that because we are pointing to a buffer allocated by the file system. So instead we would copy bytes 150-200 to the overlap buffer, request bytes 200-300 (+ any additional prefetching), and then copy back bytes 200-300 to the overlap buffer. Ultimately, the overlap buffer would contain exactly what the user asked for (bytes 150-300) and no more. We also avoid refetching the same data (i.e. making a request for bytes 150-300)

@archang19 archang19 force-pushed the assert-preconditions-prefetch branch from ac5fd5c to 8a34e06 Compare December 27, 2024 22:01
@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@archang19 archang19 force-pushed the assert-preconditions-prefetch branch from 8a34e06 to 8a386c6 Compare December 27, 2024 22:13
@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

// We disallow async IO for compaction reads since they are performed in
// the background anyways and are less latency sensitive compareed to
// user-initiated reads
(void)for_compaction;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the change to unify the logic, for_compaction is not used outside of this assert so we need this (void) for_compaction to prevent the compiler from complaining

@archang19 archang19 force-pushed the assert-preconditions-prefetch branch from dc8bb5d to 9a991ee Compare December 27, 2024 23:43
@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@archang19 archang19 force-pushed the assert-preconditions-prefetch branch from 9a991ee to 34999ca Compare December 27, 2024 23:45
@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants