Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(resharding) - Introducing ShardLayoutV2 and nighshade layout v4 #12066

Merged
merged 19 commits into from
Sep 30, 2024

Conversation

wacban
Copy link
Contributor

@wacban wacban commented Sep 9, 2024

Adding a new shard layout structure. It replaces the boundary accounts with a mapping from shard id to the account range of that shard. This is necessary in order to implement the account id to shard id mapping. Previously it relied on shard ids being contiguous and ordered by the account ranges, neither of which will be the case in this shard layout.

@wacban wacban requested a review from a team as a code owner September 9, 2024 17:26
Copy link

codecov bot commented Sep 9, 2024

Codecov Report

Attention: Patch coverage is 93.75000% with 12 lines in your changes missing coverage. Please review.

Project coverage is 71.60%. Comparing base (e5aa208) to head (797ce1f).
Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
core/primitives/src/shard_layout.rs 93.68% 11 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12066      +/-   ##
==========================================
+ Coverage   71.59%   71.60%   +0.01%     
==========================================
  Files         823      824       +1     
  Lines      165137   165348     +211     
  Branches   165137   165348     +211     
==========================================
+ Hits       118223   118400     +177     
- Misses      41789    41819      +30     
- Partials     5125     5129       +4     
Flag Coverage Δ
backward-compatibility 0.17% <0.00%> (-0.01%) ⬇️
db-migration 0.17% <0.00%> (-0.01%) ⬇️
genesis-check 1.26% <0.00%> (-0.01%) ⬇️
integration-tests 38.71% <9.37%> (-0.05%) ⬇️
linux 71.38% <93.75%> (+0.01%) ⬆️
linux-nightly 71.18% <93.75%> (+0.02%) ⬆️
macos 54.09% <93.22%> (+0.03%) ⬆️
pytests 1.52% <0.00%> (-0.01%) ⬇️
sanity-checks 1.32% <0.00%> (-0.01%) ⬇️
unittests 65.33% <93.22%> (+0.03%) ⬆️
upgradability 0.21% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines 258 to 259
/// A mapping from the shard id to the account range of this shard.
shards_account_range: ShardsAccountRange,
Copy link
Member

@Longarithm Longarithm Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not to leave boundary_accounts: Vec<AccountId> and just add new field shard_ids: Vec<ShardId>? Then we would have to validate only that boundary_accounts.len() + 1 == shard_ids.len() and that boundary_accounts is in ascending order. After that, we can construct immediately valid ShardsAccountRange for local usage. This would simplify validation in general.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this can be a fine and more simplistic strategy, but I don't feel too strongly about it vs the current implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think either would work but I feel like shard id => range mapping is easier for humans to parse. I'm also considering changing it to range => shard id as that would make the account_id_to_shard_id query faster.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with human readability argument. But I think readability should be handled by some tool over the shard layout format, like we already do for the latest snapshot of runtime config. I prefer the base primitive to be as concise as possible. For me it's like getting rpc response in json or text - both make sense, but json must be the base format.

If we go with ranges, start and end are valid accounts so we need some placeholders which are definitely invalid ($start?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right, I'm fine with that, will do tomorrow.

If we go with ranges, start and end are valid accounts so we need some placeholders which are definitely invalid ($start?)

I wouldn't want to use magic values but perhaps I can use Unbounded for those.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done and it actually simplified a lot of things, thanks!

The account ranges are no longer needed so I didn't need to use the std ranges.

let prev = values[i - 1];
let curr = values[i];
if prev.end != curr.start {
return Err(err("account ranges should be contiguous"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can probably also include the offending account value in the error message?

/// A mapping from the shard id to the account range of this shard.
shards_account_range: ShardsAccountRange,

/// A mapping from the parent shard to child shards. Maps shards from the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it makes sense to have this struct contain all historic mappings instead of just the latest split?

That would be quite useful for store layer where we may recursively need to find parent shard_ids

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe but let's keep it simple for now and we can enhance it later as needed.

shards_parent_map: Option<ShardsParentMapV2>,

/// Version of the shard layout, this is useful for uniquely identify the shard layout
version: ShardVersion,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope with LayoutV2, we would finally increment version by 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few options here:

  1. bump the version and do a migration (since state and flat state are keyed by shard uid)
  2. bump the version and add an override on storage layer access, along the lines of if version >= 3 { version = 3 }
  3. do not bump the version

The third one doesn't require any extra changes and for that reason it's my favourite. We can just stick to version 3 until we need to make a real change and then we can figure it out. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no bump ➕

Comment on lines 258 to 259
/// A mapping from the shard id to the account range of this shard.
shards_account_range: ShardsAccountRange,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this can be a fine and more simplistic strategy, but I don't feel too strongly about it vs the current implementation.

fn test_shard_layout_v4() {
// Test ShardsAccountRange validation

// Test account id to shard id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentionally left empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I forgot to add this test, thanks!

core/primitives/src/shard_layout.rs Show resolved Hide resolved
@staffik
Copy link
Contributor

staffik commented Sep 12, 2024

We might want to do sth with ShardUId::next_shard_prefix().
Also, there is remove_range_by_shard_uid():

fn remove_range_by_shard_uid(store_update: &mut StoreUpdate, shard_uid: ShardUId, col: DBCol) {
    let key_from = shard_uid.to_bytes();
    let key_to = ShardUId::next_shard_prefix(&key_from);
    store_update.delete_range(col, &key_from, &key_to);
}

|reason: &str| ShardLayoutError::InvalidShardsAccountRange { reason: reason.to_string() };

let values = shards_account_range.values().sorted().collect_vec();
println!("{:?}", values);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this meant to be a log message?

Self { start: AccountBoundary::Middle(start.parse().unwrap()), end: AccountBoundary::End }
}

pub fn new_mid(start: &str, end: &str) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is &str preferred to &AccountId?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically it should be AccountId but it's just so much more conveninent to
pass &str because then I just need to do .parse().unwrap() in one place. I'm
happy to do it proper way and add some convenience functions just for the tests,
please let me now if you think it's worth it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to think it's worth it, just because all these methods are public, but I don't have a strong preference.

@wacban
Copy link
Contributor Author

wacban commented Sep 25, 2024

Reviewers, can you have another look please? The most notable change is that the mapping is now from account range to shard id (used to be the other way around). It will allows us to optimize the account id to shard mapping in the future. Also addressed the comments and did some cleanups.

Copy link
Contributor

@Trisfald Trisfald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🦀 🚀

core/primitives/src/shard_layout.rs Outdated Show resolved Hide resolved
// In this shard layout the accounts are divided into ranges, each range is
// mapped to a shard. The shards are contiguous and start from 0.
fn account_id_to_shard_id(&self, account_id: &AccountId) -> ShardId {
// Note: As we scale up the number of shards we can consider
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: note is outdated, in a certain sense
I think no one will add more shards to V1 layout

core/primitives/src/shard_layout.rs Outdated Show resolved Hide resolved
/// This layout is provisional, the actual shard layout should be determined
/// based on the fresh data before the resharding.
pub fn get_simple_nightshade_layout_v4() -> ShardLayout {
// the boundary accounts in lexicographical order
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is it worth to add todo!() or similar compiler help?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add todo where?

I'm hoping we can get v4 into nightly soon enough and start testing it there. I wouldn't want compiler panicking there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to prevent using the provisional layout by mistake, but anyway we are going to check this again before releasing

};

let mut shards_parent_map = ShardsParentMapV2::new();
for (&parent_shard_id, shard_ids) in shards_split_map.iter() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if at some point we would need to implement a check such as: at most one shard is being split in a given ShardLayout. Maybe another day :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we can perform a few splits at the same time :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe another day :)

Copy link
Member

@Longarithm Longarithm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern is about where readability should be handled.

/// TODO(resharding) Determine the shard layout for v4.
/// This layout is provisional, the actual shard layout should be determined
/// based on the fresh data before the resharding.
pub fn get_simple_nightshade_layout_v4() -> ShardLayout {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marketing nit: what about dropping simple and saying get_nightshade_layout_v4 since now?
The ultimate version may be called get_dynamic_nightshade_layout

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically speaking this particular layout will not be dynamic so I wouldn't want to call it that quite yet. I called it simple for consistency and greppability. I'm not sure the codebase is the best place for marketing but if you like it more I'm happy to rename it to get_nightshade_layout_v4.

Comment on lines 179 to 191
fn contains(&self, account_id: &AccountId) -> bool {
match (&self.start, &self.end) {
(AccountBoundary::Start, AccountBoundary::End) => true,
(AccountBoundary::Start, AccountBoundary::Middle(end)) => account_id < end,
(AccountBoundary::Middle(start), AccountBoundary::End) => account_id >= start,
(AccountBoundary::Middle(start), AccountBoundary::Middle(end)) => {
account_id >= start && account_id < end
}
(AccountBoundary::End, _) => panic!("invalid account range"),
(_, AccountBoundary::Start) => panic!("invalid account range"),
}
}
}
Copy link
Member

@Longarithm Longarithm Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative:

I remembered there are primitives std::ops::RangeBounds and std::ops::Bound. We can represent shard ranges using them. RangeBounds::contains looks exactly like this function, with exception that there is Bound::Unbounded which by definition means both start and end, depending on which side of range it represents. So panic is not needed there.

There are also RangeFull, RangeTo, RangeFrom.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice, thanks! I should have guessed there is some ready structure for this. I'll check it out tomorrow!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to use primitives in std if it's simple enough for our use case

Comment on lines 258 to 259
/// A mapping from the shard id to the account range of this shard.
shards_account_range: ShardsAccountRange,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with human readability argument. But I think readability should be handled by some tool over the shard layout format, like we already do for the latest snapshot of runtime config. I prefer the base primitive to be as concise as possible. For me it's like getting rpc response in json or text - both make sense, but json must be the base format.

If we go with ranges, start and end are valid accounts so we need some placeholders which are definitely invalid ($start?)

shards_split_map: Option<ShardsSplitMapV2>,
/// A mapping from the child shard to the parent shard. Maps shards in this
/// shard layout to their parent shards.
shards_parent_map: Option<ShardsParentMapV2>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are the shards_parent_map and shards_split_map optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for the genesis case. It's used in tests mostly.

@@ -135,6 +213,51 @@ impl ShardLayout {
})
}

/// Return a V2 Shardlayout
pub fn v2(
boundary_accounts: Vec<AccountId>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also add debug assert that boundary_accounts is strictly sorted.

@wacban wacban added this pull request to the merge queue Sep 30, 2024
Merged via the queue into master with commit eb7ceec Sep 30, 2024
30 checks passed
@wacban wacban deleted the waclaw-resharding branch September 30, 2024 12:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants