Enable file system buffer reuse for compaction prefetches #13187
Conversation
@archang19 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@archang19 has updated the pull request. You must reimport the pull request before landing. |
Just to clarify:
Why do we need the overlap buffer logic for file system buffer reuse + prefetch + non-async IO? Is it because there can actually be overlapping data, or is it just because the
file/file_prefetch_buffer.cc (outdated)

    // readahead_size_.
    uint64_t trimmed_readahead_size = 0;
    if (n < readahead_size_) {
      trimmed_readahead_size = readahead_size_ - n;
Where in the original code does it show that compaction prefetches only readahead_size_ - n but not readahead_size_?
I remember it was very confusing to me what the std::max(n, readahead_size_) in Prefetch() was intended for - why not n + readahead_size_?
Where in the original code does it show that compaction prefetches only readahead_size_ - n but not readahead_size_?

https://github.com/facebook/rocksdb/blob/main/file/file_prefetch_buffer.cc#L823

    if (for_compaction) {
      s = Prefetch(opts, reader, offset, std::max(n, readahead_size_));

I think you have it correct. The std::max(n, readahead_size_) that you mentioned is the reason the prefetch is only readahead_size_ - n. If n < readahead_size_, we read the original n bytes plus readahead_size_ - n of prefetched data.
@anand1976 I am in favor of unifying the logic so that we treat compaction readaheads the same as non-compaction readaheads (e.g. fetch the requested amount + readahead_size_ in either case). Do you have any objections, and what are your thoughts? I think this would simplify our logic, and I don't see downsides.
For future readers: offline we decided to unify the logic and use the same readahead amount for compaction reads.
@hx235 With async IO, there are 2 buffers, so the overlap buffer is for when the requested data spans the 2 buffers (i.e. buffer 1 has the first half of the data and buffer 2 has the second half). Without async IO, and with the file system buffer optimization, we use the overlap buffer when the main (and only) buffer has a "partial hit."

Say the buffer contains offsets 100-200 and we request 150-300. Without the file system buffer optimization, we would normally "refit tail": move bytes 150-200 from the end of the buffer to the start of the buffer, and then request bytes 200-300 (+ any additional prefetching). With the file system buffer optimization, we don't do that because we are pointing to a buffer allocated by the file system. So instead we copy bytes 150-200 to the overlap buffer, request bytes 200-300 (+ any additional prefetching), and then copy bytes 200-300 into the overlap buffer as well. Ultimately, the overlap buffer contains exactly what the user asked for (bytes 150-300) and no more. We also avoid refetching the same data (i.e. making a request for bytes 150-300).
    // We disallow async IO for compaction reads since they are performed in
    // the background anyway and are less latency sensitive compared to
    // user-initiated reads
    (void)for_compaction;
With the change to unify the logic, for_compaction is not used outside of this assert, so we need this (void)for_compaction to prevent the compiler from complaining.
Summary
In #13177, I discussed an unsigned integer overflow issue that affects compaction reads inside FilePrefetchBuffer when we attempt to enable the file system buffer reuse optimization. In that PR, I disabled the optimization whenever for_compaction was true to eliminate the source of the bug.

This PR safely re-enables the optimization when for_compaction is true. We need to properly set the overlap buffer through PrefetchInternal rather than simply calling Prefetch. Prefetch assumes num_buffers_ is 1 (i.e. async IO is disabled), so historically it did not have any overlap buffer logic. What ends up happening (with the old bug) is that, when we try to reuse the file system provided buffer inside the Prefetch method, we read the remaining missing data. However, since we do not do any RefitTail when use_fs_buffer is true, we would normally rely on copying the partially relevant data into an overlap buffer. That overlap buffer logic was missing, so the final main buffer ends up storing data from an offset greater than the requested offset, and we effectively end up "throwing away" part of the requested data.

This PR also unifies the prefetching logic for compaction and non-compaction reads:
- We now prefetch n + readahead_size_ bytes for compaction reads, rather than std::max(n, readahead_size_) bytes.
- PREFETCH_HITS and PREFETCH_BYTES_USEFUL are tracked for both. Previously, they were only tracked for non-compaction reads.

These two small changes should help reduce some of the cognitive load required to understand the codebase. The test suite also became easier to maintain. We could not come up with good reasons why the logic for the readahead size and stats should differ for compaction reads.
Test Plan
I removed the temporary test case from #13200 and incorporated the same test cases into my updated parameterized test case, which tests the valid combinations of use_async_prefetch and for_compaction.

I went further and added a randomized test case that simply tries to hit assertion failures and catch any gaps in the logic.

I also added a test case for compaction reads without the file system buffer reuse optimization. I am thinking that it may be valuable to make a future PR that unifies a lot of these prefetch tests and parameterizes as much of them as possible. This way we can avoid writing duplicate tests and just loop over different parameters for async IO, direct IO, file system buffer reuse, and for_compaction.