Unify compaction prefetching logic (#13187)
Summary:
In #13177, I discussed an unsigned integer overflow issue that affects compaction reads inside `FilePrefetchBuffer` when we attempt to enable the file system buffer reuse optimization. In that PR, I disabled the optimization whenever `for_compaction` was `true` to eliminate the source of the bug.
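For context, the relevant failure mode is unsigned wraparound: the buffer math in `TryReadFromCacheUntracked` computes `offset - buf->offset_` as a `uint64_t`, which silently wraps to a huge value if the buffer's start offset ever exceeds the requested offset. A minimal illustration with made-up numbers (not the actual RocksDB code):

```cpp
#include <cstdint>
#include <iostream>

int main() {
  // Hypothetical values: the buffer happens to start *after* the requested
  // offset, which is exactly the situation the overlap buffer is meant to fix.
  uint64_t requested_offset = 4096;
  uint64_t buffer_offset = 8192;

  // Intended meaning: "how far into the buffer does the request start?"
  // Since both operands are unsigned, the result wraps around instead of
  // going negative.
  uint64_t offset_in_buffer = requested_offset - buffer_offset;
  std::cout << offset_in_buffer << std::endl;  // prints 18446744073709547520
  return 0;
}
```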

**This PR safely re-enables the optimization when `for_compaction` is `true`.** We need to properly set the overlap buffer through `PrefetchInternal` rather than simply calling `Prefetch`. `Prefetch` assumes `num_buffers_` is 1 (i.e. async IO is disabled), so historically it did not have any overlap buffer logic. With the old bug, when we tried to reuse the file system provided buffer, `Prefetch` read the remaining missing data. However, since we do not call `RefitTail` when `use_fs_buffer` is true, we would normally rely on copying the partially relevant data into an overlap buffer. That overlap buffer logic was missing, so the final main buffer ended up storing data starting at an offset greater than the requested offset, and we effectively "threw away" part of the requested data.
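To make the missing step concrete, the overlap buffer is supposed to stitch together the part of the request that is already buffered with the freshly read remainder, so the returned slice starts exactly at the requested offset. A simplified, hypothetical sketch of that copy (the struct and function names are made up and are not the actual `FilePrefetchBuffer` code):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the real buffer bookkeeping.
struct SimpleBuffer {
  uint64_t offset = 0;     // file offset of data[0]
  std::vector<char> data;  // bytes currently held
};

// Copy the portion of `src` that overlaps the request [offset, offset + n)
// into `overlap`. Calling this once for the previously cached data and once
// for the newly read data (in file-offset order) leaves `overlap` holding the
// request contiguously, starting at `offset`.
void CopyOverlap(const SimpleBuffer& src, uint64_t offset, size_t n,
                 SimpleBuffer* overlap) {
  uint64_t copy_begin = std::max(offset, src.offset);
  uint64_t copy_end = std::min(offset + n, src.offset + src.data.size());
  if (copy_begin >= copy_end) {
    return;  // this buffer holds nothing relevant to the request
  }
  if (overlap->data.empty()) {
    overlap->offset = copy_begin;  // assumes pieces arrive in offset order
  }
  overlap->data.insert(overlap->data.end(),
                       src.data.begin() + (copy_begin - src.offset),
                       src.data.begin() + (copy_end - src.offset));
}
```

Routing the compaction path through `PrefetchInternal` is what gives it this overlap-buffer bookkeeping, which plain `Prefetch` never had.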

**This PR also unifies the prefetching logic for compaction and non-compaction reads:**
- The same readahead size is used. Previously, we read only `std::max(n, readahead_size_)` bytes for compaction reads, rather than `n + readahead_size_` bytes (see the sketch after this list).
- The stats for `PREFETCH_HITS` and `PREFETCH_BYTES_USEFUL` are tracked for both. Previously, they were only tracked for non-compaction reads.
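A small numeric sketch of the sizing change, using made-up values for `n` and `readahead_size_`:

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>

int main() {
  // Hypothetical numbers, only to show how the two formulas differ.
  size_t n = 8 * 1024;                // bytes the caller asked for
  size_t readahead_size = 64 * 1024;  // configured readahead

  size_t old_compaction_size = std::max(n, readahead_size);  // 64 KiB
  size_t unified_size = n + readahead_size;                  // 72 KiB

  std::cout << old_compaction_size << " vs " << unified_size << std::endl;
  return 0;
}
```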

These two small changes should help reduce some of the cognitive load required to understand the codebase. The test suite also became easier to maintain. We could not come up with good reasons why the logic for the readahead size and stats should be different for compaction reads.

Pull Request resolved: #13187

Test Plan:
I removed the temporary test case from #13200 and incorporated the same test cases into my updated parameterized test case, which covers the valid combinations of `use_async_prefetch` and `for_compaction`.
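For illustration, here is a hedged GoogleTest-style sketch of how such a parameterization can be structured; the fixture name and test body are hypothetical and not the actual RocksDB test code. The invalid combination (async prefetch together with `for_compaction`) is skipped, matching the assertion added in `TryReadFromCacheUntracked`:

```cpp
#include <tuple>

#include <gtest/gtest.h>

// Hypothetical fixture; parameters are (use_async_prefetch, for_compaction).
class PrefetchCombinationTest
    : public ::testing::TestWithParam<std::tuple<bool, bool>> {};

TEST_P(PrefetchCombinationTest, ReturnsExactlyTheRequestedBytes) {
  const bool use_async_prefetch = std::get<0>(GetParam());
  const bool for_compaction = std::get<1>(GetParam());
  if (use_async_prefetch && for_compaction) {
    // Async IO is disallowed for compaction reads, so this combination is
    // not a valid configuration to exercise.
    GTEST_SKIP();
  }
  // ... set up a prefetch buffer, issue reads, and verify the returned slice
  // starts at the requested offset and has the requested length ...
}

INSTANTIATE_TEST_SUITE_P(PrefetchCombinations, PrefetchCombinationTest,
                         ::testing::Combine(::testing::Bool(),
                                            ::testing::Bool()));
```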

I went further and added a randomized test case that simply tries to trigger `assert`ion failures and catch any gaps in the logic.

I also added a test case for compaction reads _without_ the file system buffer reuse optimization. I think it may be valuable to make a future PR that unifies a lot of these prefetch tests and parameterizes as many of them as possible. That way we can avoid writing duplicate tests and just iterate over the different parameters for async IO, direct IO, file system buffer reuse, and `for_compaction`.

Reviewed By: anand1976

Differential Revision: D66903373

Pulled By: archang19

fbshipit-source-id: 351b56abea2f0ec146b83e3d8065ccc69d40405d
archang19 authored and facebook-github-bot committed Jan 8, 2025
1 parent d4bd67f commit b5e4162
Showing 2 changed files with 188 additions and 86 deletions.
39 changes: 21 additions & 18 deletions file/file_prefetch_buffer.cc
@@ -771,6 +771,12 @@ bool FilePrefetchBuffer::TryReadFromCache(const IOOptions& opts,
bool FilePrefetchBuffer::TryReadFromCacheUntracked(
const IOOptions& opts, RandomAccessFileReader* reader, uint64_t offset,
size_t n, Slice* result, Status* status, bool for_compaction) {
// We disallow async IO for compaction reads since they are performed in
// the background anyways and are less latency sensitive compared to
// user-initiated reads
(void)for_compaction;
assert(!for_compaction || num_buffers_ == 1);

if (track_min_offset_ && offset < min_offset_read_) {
min_offset_read_ = static_cast<size_t>(offset);
}
@@ -819,27 +825,22 @@ bool FilePrefetchBuffer::TryReadFromCacheUntracked(
assert(reader != nullptr);
assert(max_readahead_size_ >= readahead_size_);

if (for_compaction) {
s = Prefetch(opts, reader, offset, std::max(n, readahead_size_));
} else {
if (implicit_auto_readahead_) {
if (!IsEligibleForPrefetch(offset, n)) {
// Ignore status as Prefetch is not called.
s.PermitUncheckedError();
return false;
}
if (implicit_auto_readahead_) {
if (!IsEligibleForPrefetch(offset, n)) {
// Ignore status as Prefetch is not called.
s.PermitUncheckedError();
return false;
}

// Prefetch n + readahead_size_/2 synchronously as remaining
// readahead_size_/2 will be prefetched asynchronously if num_buffers_
// > 1.
s = PrefetchInternal(
opts, reader, offset, n,
(num_buffers_ > 1 ? readahead_size_ / 2 : readahead_size_),
copy_to_overlap_buffer);
explicit_prefetch_submitted_ = false;
}

// Prefetch n + readahead_size_/2 synchronously as remaining
// readahead_size_/2 will be prefetched asynchronously if num_buffers_
// > 1.
s = PrefetchInternal(
opts, reader, offset, n,
(num_buffers_ > 1 ? readahead_size_ / 2 : readahead_size_),
copy_to_overlap_buffer);
explicit_prefetch_submitted_ = false;
if (!s.ok()) {
if (status) {
*status = s;
@@ -854,6 +855,7 @@ bool FilePrefetchBuffer::TryReadFromCacheUntracked(
return false;
}
} else if (!for_compaction) {
// These stats are meant to track prefetch effectiveness for user reads only
UpdateStats(/*found_in_buffer=*/true, n);
}

@@ -864,6 +866,7 @@ bool FilePrefetchBuffer::TryReadFromCacheUntracked(
buf = overlap_buf_;
}
assert(buf->offset_ <= offset);
assert(buf->IsDataBlockInBuffer(offset, n));
uint64_t offset_in_buffer = offset - buf->offset_;
*result = Slice(buf->buffer_.BufferStart() + offset_in_buffer, n);
if (prefetched) {
