-
Notifications
You must be signed in to change notification settings - Fork 665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(resharding) - Resharding mapping state update #12232
Conversation
d404027
to
198e73e
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #12232 +/- ##
==========================================
+ Coverage 71.62% 71.64% +0.01%
==========================================
Files 837 837
Lines 167105 167189 +84
Branches 167105 167189 +84
==========================================
+ Hits 119696 119783 +87
- Misses 42180 42184 +4
+ Partials 5229 5222 -7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good but I'm concerned about the storage clone - see comments
core/store/src/adapter/trie_store.rs
Outdated
&self, | ||
shard_uid: ShardUId, | ||
) -> Result<ShardUId, StorageError> { | ||
let store = TrieStoreAdapter::new(Store::new(self.store_update.storage.clone())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like a bad idea to clone the storage everytime. Is there any way to do it by reference? What's the typical way to access data from store in here? Can we do it the same way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can just create a helper function here for accessing stuff from raw store and just use that instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the typical way to access data from store in here
From what I see elsewhere in the code we do not access data from within StoreUpdate
.
accessing stuff from raw store
Do you mean using storage: Arc<dyn Database>
member of StoreUpdate
directly?
StoreUpdate
does not contain Store
.
I would like to add Store
to TrieStoreUpdateAdapter
.
But it is problematic because of this:
impl<'a> TrieStoreUpdateAdapter<'a> {
pub fn new(store_update: &'a mut StoreUpdate) -> Self {
Self { store_update: StoreUpdateHolder::Reference(store_update) }
}
The easiest way (changing 4 lines of code) would be to replace:
pub struct StoreUpdate {
transaction: DBTransaction,
storage: Arc<dyn Database>,
}
with
pub struct StoreUpdate {
transaction: DBTransaction,
store: Store,
}
where Store
is just a wrapper for storage
:
pub struct Store {
storage: Arc<dyn Database>,
}
@wacban @shreyan-gupta Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I applied the above change in 09a5ca7 so you can see it.
// The data is now visible at both `parent_shard` and `child_shard`. | ||
assert_eq!(*store.get(child_shard, &dummy_hash).unwrap(), [0]); | ||
assert_eq!(*store.get(parent_shard, &dummy_hash).unwrap(), [0]); | ||
// Remove the data using `parent_shard` UId. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a case where you remove it by using child_shard
UId?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should allow decreasing refcounts using parent shard at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once resharded we should not use parent shard.
This test shows what the current behavior is.
Do you mean we should detect such situation and panic, and reflect the panic in this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The garbage collection will keep using the parent shard id for a few epochs after resharding. Then it will start using the child shard id. Both should work.
core/store/src/adapter/trie_store.rs
Outdated
&self, | ||
shard_uid: ShardUId, | ||
) -> Result<ShardUId, StorageError> { | ||
let store = TrieStoreAdapter::new(Store::new(self.store_update.storage.clone())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can just create a helper function here for accessing stuff from raw store and just use that instead?
// The data is now visible at both `parent_shard` and `child_shard`. | ||
assert_eq!(*store.get(child_shard, &dummy_hash).unwrap(), [0]); | ||
assert_eq!(*store.get(parent_shard, &dummy_hash).unwrap(), [0]); | ||
// Remove the data using `parent_shard` UId. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should allow decreasing refcounts using parent shard at all?
core/store/src/adapter/trie_store.rs
Outdated
pub fn decrement_refcount_by( | ||
&mut self, | ||
shard_uid: ShardUId, | ||
hash: &CryptoHash, | ||
decrement: NonZero<u32>, | ||
) { | ||
let mapped_shard_uid = | ||
self.read_shard_uid_mapping_from_db(shard_uid).expect("decrement_refcount_by failed"); | ||
read_shard_uid_mapping_from_db(&self.store_update().store, shard_uid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you keep it as a method on self?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wacban Does ac1751f do what you mean?
If we go this way, then I think there is one more thing to do:
impl<'a> TrieStoreUpdateAdapter<'a> {
fn get_key_from_shard_uid_and_hash(&self, shard_uid: ShardUId, hash: &CryptoHash) -> [u8; 40] {
self.store_update.store.trie_store().get_key_from_shard_uid_and_hash(shard_uid, hash)
}
Here, calling trie_store()
indirectly clones store. I would like to create StoreHolder
that would allow to pass store by reference, analogously how StoreUpdateHolder
works. @shreyan-gupta wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that's getting more complicated instead of less. Feel free to keep as is to keep it simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoided cloning in 432a96d
@@ -459,7 +459,7 @@ impl Store { | |||
/// Keeps track of current changes to the database and can commit all of them to the database. | |||
pub struct StoreUpdate { | |||
transaction: DBTransaction, | |||
storage: Arc<dyn Database>, | |||
store: Store, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no clue what sort of abstraction layers are we breaking here. I guess if it compiles and all tests pass it should be fine. 🙏 In compiler we trust. 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, good stuff!
core/store/src/adapter/trie_store.rs
Outdated
/// Replaces shard_uid prefix with a mapped value according to mapping strategy in Resharding V3. | ||
/// For this, it does extra read from `DBCol::StateShardUIdMapping`. | ||
pub fn get(&self, shard_uid: ShardUId, hash: &CryptoHash) -> Result<Arc<[u8]>, StorageError> { | ||
let mapped_shard_uid = self.read_shard_uid_mapping_from_db(shard_uid)?; | ||
let mapped_shard_uid = get_mapped_shard_uid(&self.store, shard_uid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it makes me wonder if we can just move this to inside of get_key_from_shard_uid_and_hash
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I like it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went step further in ac1751f.
Now get_key_from_shard_uid_and_hash
does not need to be exposed at all.
Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good!
sync_empty_state failed on nayduck. Weird 🤔 looking into it. |
432a96d
to
8c86921
Compare
Weird, it passes locally (run 10 times). |
The previous PR introduced mapping for read operations.
This PR extends that functionality to write operations and adds some testing for State mapping.
Following the Zulip discussion, we decided to implement a panic inside the
TrieStoreUpdateAdapter
methods. Other strategies considered were:TrieStoreUpdateAdapter::commit()
: This would require iterating through allDBOp
s, parsing each operation, extracting theshard_uid
from the database key, mapping it, and re-encoding. This approach would makeTrieStoreUpdateAdapter
dependent on the internal workings ofDBTransaction
. Also,StoreUpdate::merge()
makes me feel uneasy.