Splitting Large Records For Redis #14
This PR was created to solve issue #12. Most of the changes are inspired by what was done for Kyoto Tycoon to support splitting large records.

As before, the assumption is that no two processes will attempt to write the same split key at the same time. Since the maximum record size for Redis (512 MB) is much larger than for Kyoto Tycoon (10 MB), we should expect splitting to happen less often for Redis. Using Redis Lists and Sets enabled a more straightforward implementation of splitting in sonLib. A rough sketch of the write path follows.
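As an illustration of the write path (not the PR's actual code), a large value can be chunked onto a Redis List with RPUSH and its key registered in the `_SPLIT_RECORDS` Set. The helper name `insertSplitRecord` and the 1 MiB chunk size are assumptions made here for the sketch, which uses the hiredis C client:

```c
#include <string.h>
#include <hiredis/hiredis.h>

/* Hypothetical chunk size; the threshold used by the PR may differ.
 * Redis caps a single string value at 512 MB, so splitting is only
 * needed for records approaching that limit. */
#define CHUNK_SIZE (1 << 20) /* 1 MiB */

/* Store a large value as a Redis List of chunks (RPUSH) and register
 * the key in the _SPLIT_RECORDS Set so readers know it is split. */
static int insertSplitRecord(redisContext *ctx, const char *key,
                             const char *value, size_t size) {
    for (size_t offset = 0; offset < size; offset += CHUNK_SIZE) {
        size_t chunkLen =
            size - offset < CHUNK_SIZE ? size - offset : CHUNK_SIZE;
        redisReply *reply = redisCommand(ctx, "RPUSH %s %b",
                                         key, value + offset, chunkLen);
        if (reply == NULL) return -1; /* connection error */
        freeReplyObject(reply);
    }
    /* Record that this key now holds a split (Listed) value. */
    redisReply *reply = redisCommand(ctx, "SADD _SPLIT_RECORDS %s", key);
    if (reply == NULL) return -1;
    freeReplyObject(reply);
    return 0;
}

int main(void) {
    redisContext *ctx = redisConnect("127.0.0.1", 6379);
    if (ctx == NULL || ctx->err) return 1;
    const char *value = "imagine a value larger than CHUNK_SIZE here";
    insertSplitRecord(ctx, "someKey", value, strlen(value));
    redisFree(ctx);
    return 0;
}
```

Since RPUSH appends the chunks in order, reading a split record back amounts to an `LRANGE key 0 -1` followed by concatenating the returned chunks.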
The main notes regarding this implementation are as follows:

- A unique Set, `_SPLIT_RECORDS`, stores the keys whose values are split (stored as Lists). By checking membership in this Set, which takes constant time, we can tell whether a key is split or not. When a split record is removed, its key must also be removed from this Set.
- To keep the benefits of pipelining multiple commands, which is mainly used in the three bulk functions (SET, GET, and REMOVE), there are two implementations of the split check. `recordIsSplitDB()` checks by asking the database directly, while `recordIsSplitCache()` instead searches `db->splitRecords`, an stSet copy of all split keys. The second does not interfere with pipelining other commands, so it can be used in the bulk functions; a sketch of both checks follows this list.
- `db->splitRecords` can be filled using `fillSplitRecordsCache()`, which should therefore be called at the beginning of each of the three bulk functions.
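Below is a minimal sketch of how the two split checks, the cache fill, and a pipelined bulk GET might look, assuming the hiredis client and sonLib's `stSet` API (`stSet_construct3` / `stSet_insert` / `stSet_search`). The `bulkGet` helper and its chunk handling are illustrative assumptions, not the PR's actual code:

```c
#include <stdbool.h>
#include <string.h>
#include <hiredis/hiredis.h>
#include "sonLib.h" /* stSet, stHash_stringKey, stHash_stringEqualKey */

/* Direct check: ask Redis whether the key is in _SPLIT_RECORDS.
 * SISMEMBER is O(1), but it needs an immediate round trip, so calling
 * it would interrupt a pipeline of buffered commands. */
static bool recordIsSplitDB(redisContext *ctx, const char *key) {
    redisReply *reply = redisCommand(ctx, "SISMEMBER _SPLIT_RECORDS %s", key);
    bool isSplit = reply != NULL && reply->type == REDIS_REPLY_INTEGER
                   && reply->integer == 1;
    if (reply != NULL) freeReplyObject(reply);
    return isSplit;
}

/* Cached check: consult a local stSet copy of the split keys instead;
 * no round trip, so it is safe inside pipelined bulk operations. */
static bool recordIsSplitCache(stSet *splitRecords, const char *key) {
    return stSet_search(splitRecords, (void *) key) != NULL;
}

/* Fill the local cache with a single SMEMBERS call; called at the
 * beginning of each bulk function, before any commands are buffered. */
static stSet *fillSplitRecordsCache(redisContext *ctx) {
    stSet *splitRecords = stSet_construct3(stHash_stringKey,
                                           stHash_stringEqualKey, free);
    redisReply *reply = redisCommand(ctx, "SMEMBERS _SPLIT_RECORDS");
    if (reply != NULL && reply->type == REDIS_REPLY_ARRAY) {
        for (size_t i = 0; i < reply->elements; i++) {
            stSet_insert(splitRecords, strdup(reply->element[i]->str));
        }
    }
    if (reply != NULL) freeReplyObject(reply);
    return splitRecords;
}

/* Illustrative pipelined bulk GET: all commands are buffered with
 * redisAppendCommand() and the replies drained afterwards; split keys
 * are fetched with LRANGE (chunk reassembly elided for brevity). */
static void bulkGet(redisContext *ctx, const char **keys, size_t n) {
    stSet *splitRecords = fillSplitRecordsCache(ctx);
    for (size_t i = 0; i < n; i++) {
        if (recordIsSplitCache(splitRecords, keys[i])) {
            redisAppendCommand(ctx, "LRANGE %s 0 -1", keys[i]);
        } else {
            redisAppendCommand(ctx, "GET %s", keys[i]);
        }
    }
    for (size_t i = 0; i < n; i++) {
        redisReply *reply;
        if (redisGetReply(ctx, (void **) &reply) == REDIS_OK && reply != NULL) {
            /* ... use reply; for split keys, concatenate the chunks ... */
            freeReplyObject(reply);
        }
    }
    stSet_destruct(splitRecords);
}
```

`recordIsSplitDB()` remains fine for one-off operations, but inside a bulk function only the cached check is safe, since a blocking `redisCommand()` would force a round trip in the middle of the buffered pipeline.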