[C++] F2 KV store #922
Open · kkanellis wants to merge 158 commits into microsoft:main from kkanellis:cc-lmhc-v2
* [C++] Add force option to record user Delete request
  * If force is set to true, a tombstone will be appended to the log, irrespective of whether the hash index contains the record itself.
* [C++] Support for defining a Guid for a session externally
* [C++] Replace checkpoint inline callback definition with predefined types
* [C++] RMW can be configured to not create a record if one does not exist inside the log
* [C++] Implement method for conditionally copying to log tail
* [C++] Use minimum number of mutable pages if value is 0
* [C++] Initial implementation of FASTER hot-cold design
  * Currently supports reads, upserts, deletes and RMWs.
* [C++] Fix compilation error
* [C++] Initial tests for hot-cold design
* [C++] Lookup-based hybrid log compaction (microsoft#487)
* [C++] Log scan can now return record address, along with record
* [C++] Add implementation of Address + operator
* [C++] Add method for finding if a record exists in the hybrid log
  * Note that if a tombstone record exists, it will return true.
* [C++] Initial implementation of a better log compaction algorithm
  * It leverages the hash index to identify live records and copy them to the tail of the log.
  * Ensures that if a user performs a concurrent upsert, compaction won't overwrite their operation.
  * Avoids expensive scan of the entire log -- only the relevant log section is read.
* [C++] Remove unnecessary template typenames
* [C++] Fix several issues in compaction code
* [C++] Several bugfixes in log scan iterator
  * Now correctly switches to reading from the next page if the record didn't entirely fit in the previous one.
  * Fix bug where record address was wrong.
  * Fix bug where an on-disk page wasn't read due to >0 offset in passed address.
* [C++] Minor bugfixes in lookup-based compaction
* [C++] Add tests for lookup-based compaction algorithm
* [C++] Fix bug in Address + operator
* [C++] Add a medium-sized value type for tests
* [C++] Compaction context/entry now stores record address
* [C++] Bugfixes in compaction code
* [C++] Update log compaction tests
  * Add tests where other threads perform concurrent insertions & deletions.
  * Test actual log truncation correctness (using `ShiftBeginAddress` method).
* [C++] Fix test compilation error
* [C++] Refactor log compaction code
* [C++] Minor changes
* [C++] Better status handling in RecordExists method
* [C++] Log compaction with multiple threads
* [C++] Unoptimized concurrent page-granularity compaction
* [C++] Fix bug in tests
* [C++] Concurrent compaction w/ non-blocking waiting for threads
* [C++] Introduce page- and record-granularity log iterators
  * The page-granularity iterator is used with the new lookup-based compaction method, while the (older) record-granularity one is used by the (old) compaction algorithm.
  * The page log iterator can still be optimized further (i.e., avoid locking, prefetching, etc.).
* [C++] Avoid key/value copying on compaction contexts
* [C++] Improvements on the log compaction method
  * Plus a bugfix on session start/stop when using multiple threads.
* [C++] Make obsolete write key calls on Read/Exists contexts
* [C++] Add variable-length key tests for log compaction
* [C++] Add delete ops to varlen keys tests
* [C++] Concurrent lock-free log iterator with prefetching
* [C++] Add variable-length value tests for lookup compaction
* [C++] Bugfixes in tests

Co-authored-by: Badrish Chandramouli <[email protected]>
Co-authored-by: Kirk Olynyk <[email protected]>

* Better design of FASTER's copy-to-tail method
* Implement hot-cold & cold-cold compaction
* Include RMW in compact lookup tests
* Bugfixes in core FASTER
* Bugfixes & preliminary work for retrying RMW ops in hot-cold
* Minor changes in compact lookup tests
* Rework hot-cold implementation to support retries
* Bugfixes in FASTER RMW & log compaction
* Update hot-cold design tests
* Minor change in RMW
* Minor cleanup & better handling of completed pending requests
* Proper handling of deleted records in hot-cold
  * The `Read` method returns `NOT_FOUND` either if no record was found, or if a tombstone record was found. While there is no point separating the two cases in the single-log case, in the hot-cold design it is important to know which is the case. The most useful use-case is the hot-cold `Read` method: if a tombstone was found in the hot log, there is no need to search the cold log. In other words, `Read` goes through the cold log only if no record (normal or tombstone) was found in the hot log. Thus, FASTER's `Read` method can now be configured to return a different status (i.e., `ABORTED`) instead of `NOT_FOUND` if it finds a tombstone. We support this using an additional optional flag, `abort_if_tombstone`, in the `Read` function prototype. By default this is set to `false`; only the hot-cold design sets this flag, when a `Read` is issued on the hot log.
* Update hot-cold tests
* Bugfix to guarantee progress in both stores

Co-authored-by: Badrish Chandramouli <[email protected]>
Co-authored-by: Kirk Olynyk <[email protected]>
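The tombstone-aware read path described above can be sketched as follows. This is a hypothetical model, not FASTER's implementation: the stores are plain maps, but the status names (`NOT_FOUND`, `ABORTED`) and the `abort_if_tombstone` flag mirror the PR's description.

```cpp
#include <cstdint>
#include <unordered_map>

enum class Status { OK, NOT_FOUND, ABORTED };

// Toy stand-in for a single FASTER store.
struct Store {
  std::unordered_map<uint64_t, int64_t> live;
  std::unordered_map<uint64_t, bool> tombstone;

  // abort_if_tombstone: return ABORTED instead of NOT_FOUND when a
  // tombstone is found, so the caller can distinguish "never existed"
  // from "deleted".
  Status Read(uint64_t key, int64_t& out, bool abort_if_tombstone) const {
    if (tombstone.count(key)) {
      return abort_if_tombstone ? Status::ABORTED : Status::NOT_FOUND;
    }
    auto it = live.find(key);
    if (it == live.end()) return Status::NOT_FOUND;
    out = it->second;
    return Status::OK;
  }
};

// Hot-cold Read: consult the cold store only if the hot store saw
// neither a live record nor a tombstone.
Status HotColdRead(const Store& hot, const Store& cold,
                   uint64_t key, int64_t& out) {
  Status s = hot.Read(key, out, /*abort_if_tombstone=*/true);
  if (s == Status::ABORTED) return Status::NOT_FOUND;  // deleted: stop here
  if (s == Status::NOT_FOUND) return cold.Read(key, out, false);
  return s;
}
```

Note how a hot-log tombstone short-circuits the lookup even when a stale copy of the key still exists in the cold log.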
During log compaction (w/ lookup), live records are copied to the tail of the log. Once all live records have been copied, the part of the log that was just compacted is truncated. However, there is a slim chance that during the log truncation a pending Read operation will return NOT_FOUND, even though a record for this key exists. Specifically, this can happen if a live record is being copied to the tail of the log, but the Read operation has already checked the log tail and has issued one (or more) I/O requests to read disk-resident records. In this case, if we truncate the log before this Read operation reaches the live record, the Read will return NOT_FOUND.

To handle this undesired behavior, we keep track of the (global) number of truncations performed due to log compaction. Each Read operation keeps a local copy of this number in its context. If the Read operation has reached the end (begin address) of the log and has not found a live record, we check whether a log truncation occurred due to a log compaction. If so, this Read op retries, in order to check the newly introduced log part. This last part is supported using the `min_start_address` argument that can be defined in the Read context; in this case, the retried Read operation does not go through the entire log.
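The retry rule above can be distilled into a few lines. This is an illustrative sketch under assumed names (`truncation_count`, `ShouldRetryRead` are not FASTER's identifiers): a global truncation counter is snapshotted into each Read context, and a Read that reaches the log's begin address empty-handed retries only if a truncation happened since it started.

```cpp
#include <atomic>
#include <cstdint>

// Global count of compaction-driven log truncations (hypothetical name).
std::atomic<uint64_t> truncation_count{0};

struct ReadContext {
  // Snapshot taken when the Read begins.
  uint64_t start_truncation_count = truncation_count.load();
  // On retry, scan only the newly written tail part instead of the
  // whole log (mirrors the PR's `min_start_address` argument).
  uint64_t min_start_address = 0;
};

// A Read that found nothing retries iff the log was truncated meanwhile:
// its live record may have been copied to a tail region it already passed.
bool ShouldRetryRead(const ReadContext& ctx) {
  return truncation_count.load() != ctx.start_truncation_count;
}
```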
This fixes some spurious error messages, including the following: `Assertion `idx < size_' failed.'
Fixes a bug caused by an improper calculation of how many bytes to read from disk.
Employing a local reference for `thread_ctx()`, instead of calling it over and over again, results in up to 20% better throughput for in-memory workloads; F2 performance is now within 2-3% of the original stable FASTER C++ codebase (i.e., pre-F2 code refactoring).
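The optimization pattern is simple to illustrate. The code below is a generic sketch (not FASTER's actual `thread_ctx()`), assuming a per-thread context lookup with non-trivial cost: the accessor is hoisted into a local reference so the hot loop pays the lookup once instead of per operation.

```cpp
#include <cstdint>

struct ThreadContext { uint64_t serial_num = 0; };

// Stand-in for a per-thread context lookup (hypothetical signature).
ThreadContext& thread_ctx() {
  thread_local ThreadContext ctx;
  return ctx;
}

void ProcessBatch(int n) {
  ThreadContext& ctx = thread_ctx();  // fetch once, reuse below
  for (int i = 0; i < n; ++i) {
    ++ctx.serial_num;                 // no repeated thread_ctx() calls
  }
}
```

Since the reference stays valid for the thread's lifetime, hoisting it is behavior-preserving while removing the repeated function-call (and TLS-access) overhead from the critical path.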
Currently, benchmarking is time-based: i.e., a workload is run for a fixed period of time (X seconds). We now also support ops-based benchmarking: i.e., running a workload until Y requests have completed.
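The two stop conditions can be sketched side by side. This is an illustrative harness with assumed names, not the benchmark driver from the PR:

```cpp
#include <chrono>
#include <cstdint>
#include <functional>

// Ops-based: run until `target_ops` requests have completed.
uint64_t RunOpsBased(uint64_t target_ops, const std::function<void()>& op) {
  uint64_t completed = 0;
  while (completed < target_ops) { op(); ++completed; }
  return completed;
}

// Time-based: run until the time budget is exhausted.
uint64_t RunTimeBased(std::chrono::milliseconds budget,
                      const std::function<void()>& op) {
  auto deadline = std::chrono::steady_clock::now() + budget;
  uint64_t completed = 0;
  while (std::chrono::steady_clock::now() < deadline) { op(); ++completed; }
  return completed;
}
```

An ops-based run produces a deterministic amount of work, which makes runs comparable across configurations; a time-based run instead fixes the duration and lets throughput vary.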
For write-intensive workloads, it is possible that the maximum hlog budget is reached even during compaction. For example, this can occur when the rate of incoming requests to the hot log is higher than the rate at which records are compacted to the cold log. To fix this, we now allow user threads to participate in the compaction process, but only once we reach the (hard) hlog size limit. Note that background compaction threads perform only compaction work anyway. Once the compaction completes, user threads resume serving user requests, as before.
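The admission rule above can be stated as a small decision function. All names here are illustrative (not FASTER's API); it only encodes the policy: below the hard limit user threads serve requests, at or above it they help compact until compaction finishes.

```cpp
#include <cstdint>

enum class UserThreadAction { ServeRequests, HelpCompaction };

// Hypothetical policy function: decide what a user thread should do
// given the current hlog size and whether compaction has completed.
UserThreadAction Admit(uint64_t hlog_size, uint64_t hard_limit,
                       bool compaction_done) {
  // Ingestion outpaced hot-to-cold compaction: throttle user work and
  // put user threads on compaction until the pressure is relieved.
  if (hlog_size >= hard_limit && !compaction_done) {
    return UserThreadAction::HelpCompaction;
  }
  return UserThreadAction::ServeRequests;
}
```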
This PR introduces F2, an evolution of the FASTER key-value store. More info can be found here.