Commit
Unify compaction prefetching logic (#13187)
Summary:
In #13177, I discussed an unsigned integer overflow issue that affects compaction reads inside `FilePrefetchBuffer` when we attempt to enable the file system buffer reuse optimization. In that PR, I disabled the optimization whenever `for_compaction` was `true` to eliminate the source of the bug.

**This PR safely re-enables the optimization when `for_compaction` is `true`.** We need to properly set the overlap buffer through `PrefetchInternal` rather than simply calling `Prefetch`. `Prefetch` assumes `num_buffers_` is 1 (i.e. async IO is disabled), so historically it did not have any overlap buffer logic.

With the old bug, when we try to reuse the file system provided buffer, the `Prefetch` method reads only the remaining missing data. Since we do not call `RefitTail` when `use_fs_buffer` is true, we would normally rely on copying the partial relevant data into an overlap buffer. That overlap buffer logic was missing, so the final main buffer ends up storing data from an offset that is greater than the requested offset, and we effectively end up "throwing away" part of the requested data.

**This PR also unifies the prefetching logic for compaction and non-compaction reads:**
- The same readahead size is used. Previously, we read only `std::max(n, readahead_size_)` bytes for compaction reads, rather than `n + readahead_size_` bytes.
- The stats for `PREFETCH_HITS` and `PREFETCH_BYTES_USEFUL` are tracked for both. Previously, they were only tracked for non-compaction reads.

These two small changes should help reduce some of the cognitive load required to understand the codebase. The test suite also became easier to maintain. We could not come up with good reasons why the logic for the readahead size and stats should be different for compaction reads.
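To make the readahead size change concrete, here is a minimal sketch of the two formulas quoted above. The function names `OldCompactionReadLen` and `UnifiedReadLen` are hypothetical and exist only for illustration; they are not RocksDB APIs, and the real logic lives inside `FilePrefetchBuffer`. The byte counts in the `static_assert`s are made-up example values.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not a RocksDB function): the read length previously
// used for compaction reads, per the description above.
constexpr std::size_t OldCompactionReadLen(std::size_t n,
                                           std::size_t readahead_size) {
  return std::max(n, readahead_size);
}

// Hypothetical helper (not a RocksDB function): the read length used for
// both compaction and non-compaction reads after unification.
constexpr std::size_t UnifiedReadLen(std::size_t n,
                                     std::size_t readahead_size) {
  return n + readahead_size;
}

// Example values only: a 4 KiB request with a 16 KiB readahead size.
static_assert(OldCompactionReadLen(4096, 16384) == 16384,
              "old compaction path reads only the larger of the two");
static_assert(UnifiedReadLen(4096, 16384) == 20480,
              "unified path reads the request plus the full readahead");

// When the request is larger than the readahead size, the old formula
// leaves no readahead beyond the requested bytes.
static_assert(OldCompactionReadLen(32768, 16384) == 32768,
              "old compaction path: no bytes beyond the request");
static_assert(UnifiedReadLen(32768, 16384) == 49152,
              "unified path still prefetches 16 KiB past the request");
```

Under these assumed values, the old compaction formula never reads more than the larger of the two quantities, while the unified formula always reads `readahead_size_` bytes past the request.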
Pull Request resolved: #13187

Test Plan:
I removed the temporary test case from #13200 and incorporated the same test cases into my updated parameterized test case, which tests the valid combinations between `use_async_prefetch` and `for_compaction`.

I went further and added a randomized test case that will simply try to hit `assert`ion failures and catch any missing areas in the logic.

I also added a test case for compaction reads _without_ the file system buffer reuse optimization. I am thinking that it may be valuable to make a future PR that unifies a lot of these prefetch tests and parametrizes as many of them as possible. This way we can avoid writing duplicate tests and just loop over different parameters for async IO, direct IO, file system buffer reuse, and `for_compaction`.

Reviewed By: anand1976

Differential Revision: D66903373

Pulled By: archang19

fbshipit-source-id: 351b56abea2f0ec146b83e3d8065ccc69d40405d