-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
db: add BTree index for kv snapshots #2367
base: master
Are you sure you want to change the base?
Conversation
static std::unique_ptr<EliasFanoList32> from_encoded_data(std::span<uint8_t> encoded_data) { | ||
ensure(encoded_data.size() >= kCountLength + kULength, "EliasFanoList32::from_encoded_data data too short"); | ||
const uint64_t count = endian::load_big_u64(encoded_data.data()); | ||
const uint64_t u = endian::load_big_u64(encoded_data.subspan(kCountLength).data()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What 'u' stands for? Is there a better name for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious too.
This is either 3rd party code or copied from erigon.
I think maybe need to open the Elias Fano paper to understand it.
I suggest to keep it the code as is, and do renaming in follow ups if desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LTGM. See comments.
|
||
#include <type_traits> | ||
|
||
namespace silkworm::snapshots { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to sandbox it in a subnamespace?
I'm afraid that since it is included in decompressor.hpp, it might virally be included in a lot of places and infect the namespace with these operators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see your point here because including decompressor.hpp
will add these operands only for CompressionKind
type, which is what you want in order to be able to specify compound values for it in the user code.
On the other hand, any other type is not affected until you explicitly instantiate the bitmask operators for such type.
if ((decoder_->compression_ & compression) != CompressionKind::kNone) { | ||
next(current_word_); | ||
} else { | ||
next_uncompressed(current_word_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
ByteView value() const noexcept { return value_; } | ||
DataIndex data_index() const noexcept { return data_index_; } | ||
|
||
bool next(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend to refactor Cursor to follow the Iterator pattern. Could be done in a follow-up PR.
std::optional<MemoryMappedRegion> index_region = {}, | ||
uint64_t btree_fanout = kDefaultFanout); | ||
|
||
//! Return the Elias-Fano encoding of the sequence of key offsets or nullptr if not present |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's are the conditions when nullptr is possible? Is it a common use case?
//! - if found item.key has \p seek_key as prefix, return found=false and item.key | ||
//! - if key is greater than all keys, return found=false and empty key | ||
//! \endverbatim | ||
SeekResult seek(ByteView seek_key, DataIterator& data_it); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to put DataIterator in SeekResult here and same in get()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it is not because it's not a pure output parameter: it's an input parameter that gets modified by the function call (i.e. the iterator gets moved).
|
||
// Gracefully handle the case of empty index file before memory mapping to avoid error | ||
if (std::filesystem::file_size(file_path_) == 0) { | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should unify the construction behavior with Index and RecSplit. For RecSplit it throws in constructor for this case, but Index wrapper doesn't throw in constructor, because RecSplitIndex is a unique_ptr there. Index::reopen_index() tries to create a RecSplit object and throws.
I suggest to refactor this (perhaps in a follow-up) to throw, and add the BTreeIndex feature to Index (or do it via some other approach).
I'm not a fan of partial construction approach, because it makes other methods brittle (requires to do checks in all other methods to avoid crashing on btree_ or data_offsets_ be null).
This PR introduces support for reading B+Tree index files (
.bt
) associated to Erigon3 key-value state snapshots (.kv
).Related changes:
bitmask_operators
fromRecSplit
and move into common placeseg::Decompressor
to support files w/ key-value compression combinationsopen_btree_index
subcommand insnapshots
command-line utility