Skip to content

Commit

Permalink
Add comment clarifying proper use of fs_scratch (facebook#13189)
Browse files Browse the repository at this point in the history
Summary:
facebook#13182 seems to have resolved the `heap-use-after-free` / `heap-buffer-overflow` issues, but not for the reasons we had in mind.

I believe I have figured out the root cause after doing more thinking / reading into the warm storage code.

**`fs_scratch` cannot be assumed to point to the start of the data buffer. It must be treated as a pointer to any arbitrary object / data structure. As such, we must rely only on result.data().**

I think that part of the reason for the bug was that the comment for `fs_scratch` was

> fs_scratch is a data buffer allocated and provided by underlying FileSystem

which is _extremely misleading_.

To avoid confusion in the future, I have updated the comments related to `FsReadRequest` with some of my learnings and included `WARNING`s in all caps to hopefully steer future engineers aware from the same issue.

In another PR, I will update some of our mock file system test classes that support `FSSupportedOps::kFSBuffer`. The test class implementation also contributed to my confusion, since `fs_scratch` did point to the start of the valid data in those implementations. This cannot and should not be assumed to be true in general, and we should try to guard against potential future bugs by updating those mock implementations.

Pull Request resolved: facebook#13189

Test Plan: These are just comments.

Reviewed By: anand1976, hx235

Differential Revision: D66849436

Pulled By: archang19

fbshipit-source-id: c264007647af9cc2a4dfd58dbe7287af86fa2261
  • Loading branch information
archang19 authored and facebook-github-bot committed Dec 6, 2024
1 parent 1095810 commit 4ac7fcb
Show file tree
Hide file tree
Showing 2 changed files with 36 additions and 40 deletions.
67 changes: 30 additions & 37 deletions include/rocksdb/file_system.h
Original file line number Diff line number Diff line change
Expand Up @@ -821,54 +821,47 @@ struct FSReadRequest {
// and will be passed to underlying FileSystem.
char* scratch;

// Output parameter set by MultiRead() to point to the data buffer, and
// the number of valid bytes
// Output parameter set by MultiRead() to point to the start of the data
// buffer.
//
// In case of asynchronous reads, this output parameter is set by Async Read
// APIs to point to the data buffer, and
// the number of valid bytes.
// Slice result should point to scratch i.e the data should
// always be read into scratch.
// When FSReadRequest::scratch is provided, this should point to
// FSReadRequest::scratch. When FSSupportedOps::kFSBuffer is enabled and
// FSReadRequest::scratch is nullptr, this points to the start of the
// data buffer allocated by the FileSystem.
//
// WARNING: Even with the FSSupportedOps::kFSBuffer optimization, you must
// still use result.data() to get the start of the actual data that was read.
// Do NOT treat FSReadRequest::fs_scratch as a char* to the start of a valid
// data buffer.
Slice result;

// Output parameter set by underlying FileSystem that represents status of
// read request.
IOStatus status;

// fs_scratch is a data buffer allocated and provided by underlying FileSystem
// to RocksDB during reads, when FS wants to provide its own buffer with data
// instead of using RocksDB provided FSReadRequest::scratch.
//
// FileSystem needs to provide a buffer and custom delete function. The
// lifecycle of fs_scratch until data is used by RocksDB. The buffer
// should be released by RocksDB using custom delete function provided in
// unique_ptr fs_scratch.
// fs_scratch is a unique pointer to an arbitrary object allocated by the
// underlying FileSystem.
//
// Optimization benefits:
// This is helpful in cases where underlying FileSystem has to do additional
// copy of data to RocksDB provided buffer which can consume CPU cycles. It
// can be optimized by avoiding copying to RocksDB buffer and directly using
// FS provided buffer.
// Instead of having the FileSystem spend CPU cycles copying data into the
// FSReadRequest::scratch buffer provided by RocksDB, RocksDB can directly use
// a buffer allocated by the FileSystem.
//
// How to enable:
// In order to enable this option, FS needs to override SupportedOps() API and
// set FSSupportedOps::kFSBuffer in SupportedOps() as:
// {
// supported_ops |= (1 << FSSupportedOps::kFSBuffer);
// }
// This optimization is enabled for MultiReads (sync and async) with non
// direct io, when these conditions hold:
// 1. The FileSystem has overriden the SupportedOps() API and set
// FSSupportedOps::kFSBuffer.
// 2. FSReadRequest::scratch is set to nullptr.
//
// Work in progress:
// Right now it's only enabled for MultiReads (sync and async
// both) with non direct io.
// If RocksDB provide its own buffer (scratch) during reads, that's a
// signal for FS to use RocksDB buffer.
// If FSSupportedOps::kFSBuffer is enabled and scratch == nullptr,
// then FS have to provide its own buffer in fs_scratch.
// RocksDB will:
// 1. Reuse the buffer allocated by the FileSystem.
// 2. Take ownership of the object managed by fs_scratch.
// 3. Handle invoking the custom deleter function from the FSAllocationPtr.
//
// NOTE:
// - FSReadRequest::result should point to fs_scratch.
// - This is needed only if FSSupportedOps::kFSBuffer support is provided by
// underlying FS.
// WARNING: Do NOT assume that fs_scratch points to the start of the actual
// char* data returned by the read. As the type signature suggests, fs_scratch
// is a pointer to any arbitrary data type. Use result.data() to get a valid
// start to the real data. See https://github.com/facebook/rocksdb/pull/13189
// for more context.
FSAllocationPtr fs_scratch;
};

Expand Down
9 changes: 6 additions & 3 deletions util/aligned_buffer.h
Original file line number Diff line number Diff line change
Expand Up @@ -113,15 +113,18 @@ class AlignedBuffer {
alignment_ = alignment;
}

// Points the buffer to new_buf (taking ownership) without allocating extra
// memory or performing any data copies. This method is called when we want to
// reuse the buffer provided by the file system
// Points the buffer to the result without allocating extra
// memory or performing any data copies. Takes ownership of the
// FSAllocationPtr. This method is called when we want to reuse the buffer
// provided by the file system
void SetBuffer(Slice& result, FSAllocationPtr new_buf) {
alignment_ = 1;
capacity_ = result.size();
cursize_ = result.size();
buf_ = std::move(new_buf);
assert(buf_.get() != nullptr);
// Note: bufstart_ must point to result.data() and not new_buf, which can
// point to any arbitrary object
bufstart_ = const_cast<char*>(result.data());
}

Expand Down

0 comments on commit 4ac7fcb

Please sign in to comment.