[reshardingV3] State ShardUIdMapping #12084
base: master
Conversation
```diff
-let (start, end) = subtree_to_load.to_iter_range(self.shard_uid);
+let (start, end) = subtree_to_load.to_iter_range(self.shard_uid_db_prefix.0);

 // Load all the keys in this range from the FlatState column.
```
Not sure about that. Maybe it would read the entire parent shard while we only want the child shard.
And it reads FlatState, so we might want to synchronize on changes here.
Yeah, agreed, this seems fragile.
Taking a step back, what is our plan for loading memtrie post-resharding? Perhaps we can rely on that and panic here if shard_uid != mapped_shard_uid.
There is
and it might not be possible to scan only the child shard.
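The guard suggested above could look roughly like this. This is a sketch only: `ShardUId` here is a minimal stand-in for nearcore's type, and `check_no_mapping` is a hypothetical helper, not an existing function in the codebase.

```rust
// Minimal stand-in for nearcore's ShardUId; the real type lives in core/primitives.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct ShardUId {
    version: u32,
    shard_id: u32,
}

// Panic if memtrie loading is attempted for a shard whose data still lives
// under a different (mapped/parent) shard uid.
fn check_no_mapping(shard_uid: ShardUId, mapped_shard_uid: ShardUId) {
    assert_eq!(
        shard_uid, mapped_shard_uid,
        "loading memtrie for a mapped shard is not supported yet"
    );
}

fn main() {
    let uid = ShardUId { version: 3, shard_id: 1 };
    // Identity mapping: no panic.
    check_no_mapping(uid, uid);
    println!("ok");
}
```

The point of the panic is to make the unsupported case loud until the post-resharding memtrie loading plan is settled.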
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##           master   #12084      +/-   ##
==========================================
+ Coverage   71.51%   71.54%   +0.02%
==========================================
  Files         818      819       +1
  Lines      164494   164637     +143
  Branches   164494   164637     +143
==========================================
+ Hits       117644   117782     +138
+ Misses      41713    41712       -1
- Partials     5137     5143       +6
==========================================
```

Flags with carried forward coverage won't be shown.
Looks good, I left a few comments!
core/store/src/trie/shard_tries.rs (Outdated)
```rust
let shard_uid_db_prefix = match self.0.shard_uid_to_db_prefix.get(&shard_uid) {
    Some(mapped_shard_uid) => *mapped_shard_uid,
    // TODO(reshardingV3) Think about how None should be handled here.
    None => shard_uid.into(),
};
```
This is at a bit higher level than I anticipated. I thought the mapping would happen closer to the db itself, perhaps in Store or Database. I'm not saying this is wrong, but I'm curious about your thoughts on how the two approaches compare.
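To picture the lower-level alternative mentioned here, one could imagine a thin wrapper around the key-value store that rewrites the shard prefix of State keys before every read. Everything below is an illustrative sketch: `KeyValueStore`, `MappedStore`, and the 8-byte-prefix assumption are hypothetical simplifications, not nearcore's actual Store/Database API.

```rust
use std::collections::HashMap;

// Hypothetical key-value store trait; nearcore's real Store/Database differ.
trait KeyValueStore {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
}

struct InMemoryStore(HashMap<Vec<u8>, Vec<u8>>);

impl KeyValueStore for InMemoryStore {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.0.get(key).cloned()
    }
}

// Wrapper that maps an 8-byte shard uid prefix to the parent's prefix
// before hitting the underlying store, so callers keep using child uids.
struct MappedStore<S> {
    inner: S,
    prefix_map: HashMap<[u8; 8], [u8; 8]>,
}

impl<S: KeyValueStore> MappedStore<S> {
    fn get_state(&self, key: &[u8]) -> Option<Vec<u8>> {
        let mut prefix = [0u8; 8];
        prefix.copy_from_slice(&key[..8]);
        // Fall back to the identity mapping when no parent is recorded.
        let mapped = self.prefix_map.get(&prefix).copied().unwrap_or(prefix);
        let mut mapped_key = mapped.to_vec();
        mapped_key.extend_from_slice(&key[8..]);
        self.inner.get(&mapped_key)
    }
}

fn main() {
    let mut data = HashMap::new();
    // A trie node stored under the parent shard's prefix [0; 8].
    data.insert([[0u8; 8].to_vec(), b"node".to_vec()].concat(), b"value".to_vec());
    let mut prefix_map = HashMap::new();
    prefix_map.insert([1u8; 8], [0u8; 8]); // child shard maps to parent
    let store = MappedStore { inner: InMemoryStore(data), prefix_map };
    // Reading with the child prefix transparently hits the parent's key.
    let key = [[1u8; 8].to_vec(), b"node".to_vec()].concat();
    assert_eq!(store.get_state(&key), Some(b"value".to_vec()));
    println!("mapped read ok");
}
```

The trade-off versus mapping in `ShardTries` is visibility: a db-level wrapper applies the mapping uniformly to every reader, while the higher-level approach keeps the mapping explicit at each call site.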
core/store/src/trie/trie_storage.rs (Outdated)
```rust
// TODO(reshardingV3) Think about how to handle it the best way
_ => shard_uid,
```
+1, best to think about what the invariants are and handle it accordingly.
If the invariant is that the mapping should always be populated for all shards, then returning an error seems quite reasonable.
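Under that invariant, the fallback arm would become an error instead of a silent identity mapping. A minimal sketch, with `u32` standing in for the real `ShardUId` and a hypothetical `StorageError` variant:

```rust
use std::collections::HashMap;

// Hypothetical error; nearcore's actual StorageError variants differ.
#[derive(Debug, PartialEq)]
enum StorageError {
    ShardUIdMappingMissing(u32),
}

// If every shard must have a mapping entry, a miss is a bug, not a fallback.
fn resolve_db_shard_uid(
    mapping: &HashMap<u32, u32>,
    shard_uid: u32,
) -> Result<u32, StorageError> {
    mapping
        .get(&shard_uid)
        .copied()
        .ok_or(StorageError::ShardUIdMappingMissing(shard_uid))
}

fn main() {
    let mut mapping = HashMap::new();
    mapping.insert(3, 1); // child shard 3 reads from parent shard 1's keyspace
    assert_eq!(resolve_db_shard_uid(&mapping, 3), Ok(1));
    // Unpopulated shard: surfaced as an error rather than shard_uid.into().
    assert!(resolve_db_shard_uid(&mapping, 5).is_err());
    println!("ok");
}
```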
Force-pushed 7d1841c to 85574b2
```diff
@@ -101,10 +102,14 @@ impl NightshadeRuntime {
     let trie_viewer = TrieViewer::new(trie_viewer_state_size_limit, max_gas_burnt_view);
     let flat_storage_manager = FlatStorageManager::new(store.clone());
     let shard_uids: Vec<_> = genesis_config.shard_layout.shard_uids().collect();
+    // TODO(reshardingV3) Recursively calculate resharding parents for `shard_uids`.
```
That would require iterating through previous epochs, which is a bit cumbersome, see
nearcore/chain/client/src/sync/epoch.rs
Lines 232 to 234 in 3d0fd26
```rust
// and (2) it is not easy to walk backwards from the last epoch; there's no
// "give me the previous epoch" query. So instead, we use block header's
// `next_epoch_id` to establish an epoch chain.
```
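The workaround that comment describes (building a backwards-walkable chain out of forward `next_epoch_id` links) can be sketched as follows; epoch ids are simplified to `String` and the function name is hypothetical, not nearcore code:

```rust
use std::collections::HashMap;

// There is no "previous epoch" query, only next_epoch_id recorded in block
// headers. Inverting the forward map yields a backwards-walkable chain.
fn build_prev_epoch_index(
    next_epoch_of: &HashMap<String, String>,
) -> HashMap<String, String> {
    next_epoch_of
        .iter()
        .map(|(epoch, next)| (next.clone(), epoch.clone()))
        .collect()
}

fn main() {
    // Forward links gathered from headers: e1 -> e2 -> e3.
    let next = HashMap::from([
        ("e1".to_string(), "e2".to_string()),
        ("e2".to_string(), "e3".to_string()),
    ]);
    let prev = build_prev_epoch_index(&next);
    // Now we can walk backwards from the latest epoch.
    assert_eq!(prev.get("e3"), Some(&"e2".to_string()));
    assert_eq!(prev.get("e2"), Some(&"e1".to_string()));
    println!("chain ok");
}
```

This is why recursively computing resharding parents at startup is cumbersome: the backwards index has to be reconstructed from forward links first.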
Force-pushed 85574b2 to 8a4ffea
Tracking issue: #12050
I would like to collect early feedback on the first steps of implementing the mapping strategy for State in ReshardingV3.
Went through all references in code to `DbCol::State`, excluding tests for now.

Update: in-memory mapping
Digging more into the code, it turns out we construct `TrieStorage` very often (e.g. every time we apply a chunk):

nearcore/core/store/src/trie/shard_tries.rs, Line 125 in 3d0fd26

It can be initialized with a shared reference to `TrieCache`, which is kept in `ShardTries` and protected by a mutex:

nearcore/core/store/src/trie/shard_tries.rs, Line 235 in 3d0fd26

I used a similar approach by creating a `StateReader` that is kept in `ShardTries` and used to create each new instance of `TrieStorage`. `StateReader` needs to know the resharding tree so that it knows the parent shard uid when it cannot find a node in the database. It keeps the resharding tree as a hashmap. Alternatively, it could store an ancestors list for each shard, but that would use O(S^2) memory, which is also fine for the near future.
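The hashmap-based resharding tree described above can be sketched like this. `u32` stands in for the real `ShardUId`, and `read_with_fallback` with its `probe` callback is a hypothetical simplification of the actual node lookup:

```rust
use std::collections::HashMap;

// Sketch of StateReader: the resharding tree is a child -> parent hashmap.
// When a node is not found under a shard's own uid, walk up to ancestors.
struct StateReader {
    parent: HashMap<u32, u32>,
}

impl StateReader {
    // Walk ancestors until `probe` finds the node or the tree root is reached.
    fn read_with_fallback<F>(&self, mut shard_uid: u32, probe: F) -> Option<u32>
    where
        F: Fn(u32) -> bool,
    {
        loop {
            if probe(shard_uid) {
                return Some(shard_uid);
            }
            // No parent recorded: we hit the root without finding the node.
            shard_uid = *self.parent.get(&shard_uid)?;
        }
    }
}

fn main() {
    // Shard 4 was split from 2, which was split from 1.
    let parent = HashMap::from([(4, 2), (2, 1)]);
    let reader = StateReader { parent };
    // Only shard 1 actually has the node: a lookup from 4 resolves to 1.
    assert_eq!(reader.read_with_fallback(4, |uid| uid == 1), Some(1));
    // Node exists nowhere in the ancestor chain: None.
    assert_eq!(reader.read_with_fallback(4, |_| false), None);
    println!("fallback ok");
}
```

Storing only direct parents keeps memory at O(S) while still letting each lookup recover the full ancestor chain by iteration, which is the trade-off against the O(S^2) ancestors-list alternative.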
The problem is that the resharding history might not be easily accessible, see #12084 (comment).
What to do?
One idea is to add a db column that stores the resharding parent shard_uid and update it on each resharding event. It could be initially empty, since we do not need resharding history from before reshardingV3 lands.
There are many places in code where we only have access to `store` (not the epoch manager) and we need the resharding history, so I think this idea is the way to go. Also, it would simplify the code, as `StateReader` could be initially empty and would not need to know shard_uids and the resharding history upfront.

Next steps: