[reshardingV3] State ShardUIdMapping - initial implementation #12084

Merged
merged 13 commits into from
Oct 11, 2024


staffik
Contributor

@staffik staffik commented Sep 12, 2024

Tracking issue: #12050

Summary

For now these changes should be almost a no-op, because nothing is written to DBCol::ShardUIdMapping yet.
The only difference is one additional read from the DBCol::ShardUIdMapping column every time the State column is accessed.
The main logic is in Store::get_impl_state().
These changes implement the mapping for reads; writes will be handled in the next PR.

Changes:

  • Added DBCol::ShardUIdMapping that is initially empty and will be populated on future resharding events.
  • Slight refactor: only allow Store to create StoreUpdate.
  • Store::get_impl_state() - special get() implementation for the State column.
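
The read-side mapping described above can be sketched with an in-memory model. This is only an illustration, not the code from this PR: `ToyStore`, `state_key` and `get_state` are hypothetical names, the `ShardUId` here is a simplified stand-in, and the 8-byte little-endian prefix layout is an assumption. The sketch also assumes that a shard with no mapping entry resolves to itself, which is what would make an empty mapping column behave as a no-op.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a shard identifier (assumption: two u32 fields).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ShardUId {
    pub version: u32,
    pub shard_id: u32,
}

impl ShardUId {
    /// Assumed encoding: version, then shard_id, both little-endian.
    pub fn to_bytes(self) -> [u8; 8] {
        let mut out = [0u8; 8];
        out[..4].copy_from_slice(&self.version.to_le_bytes());
        out[4..].copy_from_slice(&self.shard_id.to_le_bytes());
        out
    }

    /// Inverse of `to_bytes`.
    pub fn from_bytes(bytes: [u8; 8]) -> Self {
        let version = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
        let shard_id = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
        ShardUId { version, shard_id }
    }
}

/// Hypothetical in-memory model of the State and ShardUIdMapping columns.
#[derive(Default)]
pub struct ToyStore {
    pub state: HashMap<Vec<u8>, Vec<u8>>,
    pub shard_uid_mapping: HashMap<ShardUId, ShardUId>,
}

impl ToyStore {
    /// Builds a State key as the shard prefix followed by the node hash.
    pub fn state_key(shard_uid: ShardUId, hash: &[u8; 32]) -> Vec<u8> {
        let mut key = Vec::with_capacity(40);
        key.extend_from_slice(&shard_uid.to_bytes());
        key.extend_from_slice(hash);
        key
    }

    /// Read path: resolve the mapping first, then read under the mapped prefix.
    /// An absent mapping entry is treated as "maps to itself" (assumption).
    pub fn get_state(&self, shard_uid: ShardUId, hash: &[u8; 32]) -> Option<&Vec<u8>> {
        let mapped = self.shard_uid_mapping.get(&shard_uid).copied().unwrap_or(shard_uid);
        self.state.get(&Self::state_key(mapped, hash))
    }
}
```

In this model, data written under a parent prefix becomes readable through a child `ShardUId` as soon as a child-to-parent entry is inserted into `shard_uid_mapping`, and every read costs one extra lookup, which matches the "additional read" noted in the summary.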

Next steps (see tracking issue #12050):

  • Use mapping for writes to db.
  • Handle copy_state_from_store in cold_storage.rs.
  • Integration.
  • State cleanup (e.g. GC the parent state once it is no longer referenced by any child).
  • Tests.

@staffik staffik added the A-resharding Area: State resharding label Sep 12, 2024
@staffik staffik changed the title from "[reshardingV3]" to "[reshardingV3] State ShardUIdMapping" Sep 12, 2024
@staffik
Contributor Author

staffik commented Sep 12, 2024

There is

scan_db_column(
    col: &str,
    lower_bound: Option<&[u8]>,
    upper_bound: Option<&[u8]>,
    store: Store)

and with the mapping in place it might not be possible to scan only a child shard.
It is only a debug tool, so we will probably have to live with that.
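
A toy illustration of the scanning concern, under stated assumptions: the column is modelled as a sorted `BTreeMap`, `shard_prefix` and `scan` are hypothetical helpers, the 8-byte little-endian prefix is an assumed layout, and the half-open `[lower, upper)` range is a choice made for this sketch rather than the documented behaviour of `scan_db_column`.

```rust
use std::collections::BTreeMap;

/// Hypothetical 8-byte key prefix: version then shard_id, little-endian (assumed).
pub fn shard_prefix(version: u32, shard_id: u32) -> [u8; 8] {
    let mut out = [0u8; 8];
    out[..4].copy_from_slice(&version.to_le_bytes());
    out[4..].copy_from_slice(&shard_id.to_le_bytes());
    out
}

/// Toy range scan over a sorted column, returning keys in `[lower, upper)`.
pub fn scan<'a>(
    column: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    lower_bound: Option<&[u8]>,
    upper_bound: Option<&[u8]>,
) -> Vec<&'a [u8]> {
    column
        .iter()
        // Keep keys at or above the lower bound, if one is given.
        .filter(|(k, _)| lower_bound.map_or(true, |lo| k.as_slice() >= lo))
        // Keep keys strictly below the upper bound, if one is given.
        .filter(|(k, _)| upper_bound.map_or(true, |hi| k.as_slice() < hi))
        .map(|(k, _)| k.as_slice())
        .collect()
}
```

If a child shard's nodes are physically stored under the parent's prefix, a scan bounded by the child's prefix finds nothing, while a scan bounded by the parent's prefix returns the nodes of every shard mapped to that parent. Byte-range bounds alone cannot separate the two.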


codecov bot commented Sep 12, 2024

Codecov Report

Attention: Patch coverage is 89.65517% with 6 lines in your changes missing coverage. Please review.

Project coverage is 71.70%. Comparing base (99ecfa4) to head (fee1c6a).
Report is 2 commits behind head on master.

Files with missing lines             | Patch % | Lines
core/store/src/lib.rs                | 85.71%  | 0 Missing and 4 partials ⚠️
core/store/src/adapter/trie_store.rs | 83.33%  | 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #12084   +/-   ##
=======================================
  Coverage   71.69%   71.70%           
=======================================
  Files         825      825           
  Lines      165834   165850   +16     
  Branches   165834   165850   +16     
=======================================
+ Hits       118902   118921   +19     
+ Misses      41751    41745    -6     
- Partials     5181     5184    +3     
Flag                   | Coverage Δ
backward-compatibility | 0.17% <0.00%> (-0.01%) ⬇️
db-migration           | 0.17% <0.00%> (-0.01%) ⬇️
genesis-check          | 1.26% <0.00%> (-0.01%) ⬇️
integration-tests      | 38.87% <29.31%> (+0.03%) ⬆️
linux                  | 71.39% <89.65%> (+0.01%) ⬆️
linux-nightly          | 71.28% <89.65%> (+0.01%) ⬆️
macos                  | 54.19% <89.65%> (-0.06%) ⬇️
pytests                | 1.57% <0.00%> (-0.01%) ⬇️
sanity-checks          | 1.38% <0.00%> (-0.01%) ⬇️
unittests              | 65.37% <89.65%> (-0.01%) ⬇️
upgradability          | 0.21% <0.00%> (-0.01%) ⬇️


Contributor

@wacban wacban left a comment

Looks good, I left a few comments!

core/store/src/cold_storage.rs (outdated, resolved)
core/store/src/flat/store_helper.rs (outdated, resolved)
core/store/src/trie/mem/parallel_loader.rs (outdated, resolved)
core/store/src/trie/shard_tries.rs (outdated, resolved)
core/store/src/trie/shard_tries.rs (outdated, resolved)
core/store/src/trie/shard_tries.rs (outdated, resolved)
core/store/src/trie/trie_storage.rs (outdated, resolved)
core/store/src/trie/trie_storage.rs (outdated, resolved)
core/store/src/trie/trie_storage.rs (outdated, resolved)
@staffik staffik force-pushed the reshardingv3-state branch 2 times, most recently from 8a4ffea to be91b49 Compare September 24, 2024 13:06
@staffik staffik changed the title from "[reshardingV3] State ShardUIdMapping" to "[reshardingV3] State ShardUIdMapping - initial implementation" Sep 24, 2024
@staffik staffik marked this pull request as ready for review September 24, 2024 13:23
@staffik staffik requested a review from a team as a code owner September 24, 2024 13:23
@staffik staffik requested a review from wacban September 24, 2024 13:23
@@ -337,6 +397,7 @@ impl Store {
     }

     pub fn iter_prefix<'a>(&'a self, col: DBCol, key_prefix: &'a [u8]) -> DBIterator<'a> {
+        assert!(col != DBCol::State, "can't iter prefix of State column");
Contributor Author

Luckily, that is not currently used for State column

@@ -355,6 +418,7 @@ impl Store {
         col: DBCol,
         key_prefix: &'a [u8],
     ) -> impl Iterator<Item = io::Result<(Box<[u8]>, T)>> + 'a {
+        assert!(col != DBCol::State, "can't iter prefix ser of State column");
Contributor Author

Luckily, that is not currently used for State column

@@ -595,6 +658,7 @@ impl StoreUpdate {
     /// Deletes the given key range from the database including `from`
     /// and excluding `to` keys.
     pub fn delete_range(&mut self, column: DBCol, from: &[u8], to: &[u8]) {
+        assert!(column != DBCol::State, "can't range delete State column");
Contributor Author

Luckily, that is not currently used for State column

core/store/src/lib.rs (outdated, resolved)
core/store/src/lib.rs (outdated, resolved)
core/store/src/trie/shard_tries.rs (outdated, resolved)
core/store/src/shard_uid_mapping.rs (outdated, resolved)
core/store/src/shard_uid_mapping.rs (outdated, resolved)
core/store/src/trie/mem/parallel_loader.rs (resolved)
core/store/src/lib.rs (outdated, resolved)
core/store/src/shard_uid_mapping.rs (outdated, resolved)
@staffik
Contributor Author

staffik commented Sep 25, 2024

@wacban @shreyan-gupta I will go offline in a moment. Feel free to merge it or take it over; otherwise I can continue moving it forward in 10 days.

@staffik staffik force-pushed the reshardingv3-state branch from eaf0e57 to 8fcc38a Compare October 11, 2024 09:13
@staffik staffik force-pushed the reshardingv3-state branch from 8fcc38a to c8da0e9 Compare October 11, 2024 09:23
@staffik staffik force-pushed the reshardingv3-state branch from c8da0e9 to 69ea15d Compare October 11, 2024 09:26
fn get_impl_state(&self, key: &[u8]) -> io::Result<Option<DBSlice<'_>>> {
    let shard_uid = retrieve_shard_uid_from_db_key(key)?;
    let mapped_shard_uid = self
        .get_ser::<ShardUId>(DBCol::ShardUIdMapping, &shard_uid.to_bytes())?
Contributor

Could we do the mapping in core/store/src/adapter/trie_store.rs ?

Contributor Author

Having TrieStoreAdapter is now so nice :)

Contributor

@wacban wacban left a comment

looks good overall

Can you add a test where you:

  • write and read some data
  • set the mapping
  • write and read some data using the old shard uid
  • write and read some data using the new shard uid (and check that it was mapped)

@@ -186,6 +186,7 @@ fn copy_state_from_store(

     let Some(trie_changes) = trie_changes else { continue };
     for op in trie_changes.insertions() {
+        // TODO(reshardingV3) Handle shard_uid not mapped there
Contributor

This TODO will be tricky; we will need to be careful when adding and removing a mapping. It's good that you spotted it.

Unrelated to this PR:
I wonder if we can keep the original shard uids when moving data to cold storage. This would keep the cold storage sharded (as it is today).

core/store/src/columns.rs (outdated, resolved)
core/store/src/columns.rs (outdated, resolved)
@@ -444,7 +450,8 @@ impl DBCol {
             | DBCol::StateChangesForSplitStates
             | DBCol::StateHeaders
             | DBCol::TransactionResultForBlock
-            | DBCol::Transactions => true,
+            | DBCol::Transactions
+            | DBCol::ShardUIdMapping => true,
Contributor

Can you explain how the mapping will work on split storage nodes? I can't say whether this is good or not without the full picture. For an early MVP it doesn't matter too much, so feel free to just leave a TODO here and proceed.

Contributor Author

@staffik staffik Oct 11, 2024

My understanding was that we want to copy ShardUIdMapping to cold storage because cold_store.get(state_key) would not work otherwise. I marked it as a TODO to understand it more deeply after the early MVP.

core/store/src/lib.rs (outdated, resolved)
core/store/src/lib.rs (outdated, resolved)
core/store/src/lib.rs (outdated, resolved)
Comment on lines +48 to +49
let mapped_shard_uid = self.read_shard_uid_mapping_from_db(shard_uid)?;
let key = get_key_from_shard_uid_and_hash(mapped_shard_uid, hash);
Contributor

nice

core/store/src/lib.rs (resolved)
@wacban
Contributor

wacban commented Oct 11, 2024

Ah, I just saw your Zulip post that the write path will be done separately, so ignore my comment about testing it. Just have a look at my comments and fix the CI, and we should be good to go.

@staffik staffik requested a review from wacban October 11, 2024 13:19
Contributor

@wacban wacban left a comment

LGTM, thanks!

@staffik staffik added this pull request to the merge queue Oct 11, 2024
Merged via the queue into master with commit 189222e Oct 11, 2024
29 of 30 checks passed
@staffik staffik deleted the reshardingv3-state branch October 11, 2024 14:28
github-merge-queue bot pushed a commit that referenced this pull request Oct 18, 2024
The [previous PR](#12084)
introduced mapping for read operations.

This PR extends that functionality to write operations and adds some
testing for State mapping.

Following the [Zulip
discussion](https://near.zulipchat.com/#narrow/stream/407288-core.2Fresharding/topic/State.20mapping/near/476959235),
we decided to implement a panic inside the `TrieStoreUpdateAdapter`
methods. Other strategies considered were:
1. Propagating the error instead of panicking: This was rejected because
the error would need to be propagated through multiple layers that
currently don't expect errors. Additionally, an error here would
indicate a misconfiguration in the database, justifying the use of
panic.
2. Performing the mapping later in `TrieStoreUpdateAdapter::commit()`:
This would require iterating through all `DBOp`s, parsing each
operation, extracting the `shard_uid` from the database key, mapping it,
and re-encoding. This approach would make `TrieStoreUpdateAdapter`
dependent on the internal workings of `DBTransaction`. Also,
`StoreUpdate::merge()` makes me feel uneasy.
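
The chosen strategy (map eagerly in each update method and panic if the mapping cannot be read) can be sketched as follows. This is a simplified model, not the follow-up PR's code: `ToyTrieUpdate`, `MappingSource`, `MappingReadError` and the key layout are hypothetical, and a `HashMap` stands in for the mapping column.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a shard identifier (assumption: two u32 fields).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ShardUId {
    pub version: u32,
    pub shard_id: u32,
}

/// Hypothetical error for a failed mapping lookup.
#[derive(Debug)]
pub struct MappingReadError(pub String);

/// Hypothetical source of shard uid mappings; a real store would read a DB column.
pub trait MappingSource {
    fn lookup(&self, shard_uid: ShardUId) -> Result<Option<ShardUId>, MappingReadError>;
}

impl MappingSource for HashMap<ShardUId, ShardUId> {
    fn lookup(&self, shard_uid: ShardUId) -> Result<Option<ShardUId>, MappingReadError> {
        Ok(self.get(&shard_uid).copied())
    }
}

/// Toy update object that resolves the shard prefix when each op is recorded,
/// so nothing has to be re-parsed or re-encoded at commit time.
pub struct ToyTrieUpdate<'a, M: MappingSource> {
    mapping: &'a M,
    pub ops: Vec<(Vec<u8>, Vec<u8>)>,
}

impl<'a, M: MappingSource> ToyTrieUpdate<'a, M> {
    pub fn new(mapping: &'a M) -> Self {
        Self { mapping, ops: Vec::new() }
    }

    /// Records a write under the mapped prefix. A failed lookup panics instead
    /// of returning an error, so callers keep their infallible signatures.
    pub fn set(&mut self, shard_uid: ShardUId, hash: &[u8; 32], value: &[u8]) {
        let mapped = match self.mapping.lookup(shard_uid) {
            // An absent entry is treated as "maps to itself" (assumption).
            Ok(found) => found.unwrap_or(shard_uid),
            Err(err) => panic!("failed to read shard uid mapping: {err:?}"),
        };
        let mut key = Vec::with_capacity(40);
        key.extend_from_slice(&mapped.version.to_le_bytes());
        key.extend_from_slice(&mapped.shard_id.to_le_bytes());
        key.extend_from_slice(hash);
        self.ops.push((key, value.to_vec()));
    }
}
```

Because every recorded op already carries the mapped prefix, a commit step in this model would not need to inspect individual operations, which is the coupling that option 2 above would have introduced.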

---------

Co-authored-by: Waclaw Banasik <[email protected]>