
feat(state-sync): sync to the current epoch instead of the previous #12102

Merged
75 commits merged into near:master on Oct 26, 2024

Conversation

marcelo-gonzalez
Contributor

When a node processes a block that’s the first block of epoch T, and it realizes that it will need to track shards that it doesn’t currently track in epoch T+1, it syncs state to the end of epoch T-1 and then applies chunks until it’s caught up. We want to change this so that it syncs state to epoch T instead, so that the integration of state sync/catchup and resharding will be simpler.

In this PR, this is done by keeping most of the state sync logic unchanged, but changing the “sync_hash” that’s used to identify what point in the chain we want to sync to. Before, “sync_hash” was set to the first block of an epoch, and the existing state sync logic would have us sync the state as of two chunks before this hash. So here we change the sync hash to be the hash of the first block for which at least two new chunks have been seen for each shard in its epoch. This allows us to sync state to epoch T with minimal modifications, because the old logic is still valid.
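Concretely, the selection rule reads roughly like the sketch below (illustrative only, not the actual nearcore code; BlockChunks and find_sync_hash are made-up names, and the exact handling of the epoch's first block is glossed over):

use std::collections::HashMap;

// One block's view of the chain: which shards got a new (non-missing) chunk in this block.
struct BlockChunks {
    hash: [u8; 32],
    new_chunk_shards: Vec<u64>,
}

// Walk the epoch's blocks in order and return the hash of the first block by which every
// shard has accumulated at least two new chunks, or None if that hasn't happened yet.
fn find_sync_hash(epoch_blocks: &[BlockChunks], shard_ids: &[u64]) -> Option<[u8; 32]> {
    let mut num_new_chunks: HashMap<u64, u32> =
        shard_ids.iter().map(|shard_id| (*shard_id, 0)).collect();
    for block in epoch_blocks {
        for shard_id in &block.new_chunk_shards {
            if let Some(count) = num_new_chunks.get_mut(shard_id) {
                *count += 1;
            }
        }
        if num_new_chunks.values().all(|count| *count >= 2) {
            return Some(block.hash);
        }
    }
    None
}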

Note that this PR does not implement support for this new way of syncing for nodes that have fallen several epochs behind the chain, as opposed to nodes that need to catch up for an upcoming epoch. This can be done in a future PR.

@marcelo-gonzalez
Contributor Author

Btw, for reviewers, this PR is not quite ready to be submitted because it is only minimally tested on localnet, where I just checked that it syncs properly. You can try it with this:

diff --git a/core/primitives/src/epoch_manager.rs b/core/primitives/src/epoch_manager.rs
index 281be4399..b73b9348a 100644
--- a/core/primitives/src/epoch_manager.rs
+++ b/core/primitives/src/epoch_manager.rs
@@ -166,6 +166,8 @@ impl AllEpochConfig {
     pub fn generate_epoch_config(&self, protocol_version: ProtocolVersion) -> EpochConfig {
         let mut config = self.genesis_epoch_config.clone();
 
+        config.validator_selection_config.shuffle_shard_assignment_for_chunk_producers = true;
+
         Self::config_mocknet(&mut config, &self.chain_id);
 
         if !self.use_production_config {

Then if you run the transactions.py pytest, it should be able to finish after nodes sync state in the new way (you might have to comment out some asserts in that test that fail sometimes; the failures look unrelated, but I'll check).

So before submitting, I need to try this with more meaningful state and traffic/receipts, probably on forknet. It would also be good to add some integration tests, and to fix whichever integration tests or pytests this PR might have broken. The FIXME comment in this PR also needs to be addressed before I can submit. But in any case, it should mostly be ready for review.

One thing to decide in this review is whether the gating on the current protocol version that I put in looks okay. It feels kind of ugly to me, but it might be the easiest way to go.

/// is the first block of the epoch, these two meanings are the same. But if the sync_hash is moved forward
/// in order to sync the current epoch's state instead of last epoch's, this field being false no longer implies
/// that we want to apply this block during catchup, so some care is needed to ensure we start catchup at the right
/// point in Client::run_catchup()
pub(crate) is_caught_up: bool,
Contributor

Is it possible to split this field into multiple fields (or an enum) to differentiate these meanings? It feels like the field being false can indicate either that we want to apply the chunks or that we don't, depending on other state such as sync_hash.
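For what it's worth, a hedged sketch of the enum idea (names are illustrative, not the PR's actual code):

// Replace the boolean with an enum so the two meanings of "not caught up" stay distinct.
enum CatchupStatus {
    // Nothing to do: apply the block normally.
    CaughtUp,
    // First block of a new epoch with shards we need to state sync; apply during catchup.
    NeedsCatchup,
    // New-style sync: the sync_hash isn't known yet, so catchup can't start yet.
    WaitingForSyncHash,
}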

Contributor Author

I think actually in both cases if this field is false, we don't want to apply the chunks for shards we don't currently track, and this logic should be the same:

fn get_should_apply_chunk(

I think we probably could split it, but it's a little bit tricky. Let me think about it... For now, in this PR it's kept as-is so as not to touch too many things and possibly break something. The tricky part is that right now we add the first block of the epoch to the BlocksToCatchup column based on this field, which is then read to see whether we'll need to catch up the next block after this one as well:

Ok((self.prev_block_is_caught_up(&prev_prev_hash, &prev_hash)?, None))

I guess where that is called we could maybe just call get_state_sync_info() again, and also check whether catchup is already done, but it requires some care.

@marcelo-gonzalez
Contributor Author

I actually just removed the test_mock_node_basic() test, since I think it's kind of outdated anyway. Hopefully nobody objects to that... I have some plans to delete the hacky part of the mock-node code that generates home dirs, in favor of just providing the home dirs from mainnet or testnet up front anyway.

Contributor

@wacban left a comment

I didn't manage to get through everything but I left some comments to keep you busy for now ;) Please have a look and let's reconvene.

It may be easier to review and merge this if you can split it into smaller PRs. Not sure if it makes sense, but something to consider.

I still need to catch up on the reading to understand why we need to have two new chunks in every shard. It would be way easier with just one, so I want to make sure. I'll have another look at your proposal doc and come back.

Once this is done can you run nayduck and make sure that all the tests are still passing?

I think it would be best to add this to nightly first and stabilize it only after DSS is released.

chain/chain/src/chain.rs (resolved, outdated)
Comment on lines 787 to 790
let protocol_version =
self.epoch_manager.get_epoch_protocol_version(block.header().epoch_id())?;
let sync_hash =
if protocol_version < 72 { *block.header().hash() } else { CryptoHash::default() };
Contributor

You should still use a proper protocol feature here. There is no harm in being more conservative and treating some features as protocol features.

@saketh-are How are you planning to roll out decentralized state sync? Will it also be gated by a protocol feature? I think it would be nice even though it's not strictly necessary. Even with fallback to cloud state sync, it would make for a cleaner transition in the period where nodes are upgrading from the old binary to the new one.
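As a rough illustration of gating on a named feature instead of a hard-coded version number (everything below is made up: the feature name, the version constant, and the helpers; nearcore's real ProtocolFeature machinery may differ):

// A named feature is easier to grep for and to enable early in nightly builds.
#[derive(Clone, Copy)]
enum ProtocolFeature {
    CurrentEpochStateSync, // hypothetical name for this PR's change
}

impl ProtocolFeature {
    fn protocol_version(self) -> u32 {
        match self {
            // Placeholder: the real number is assigned when the feature is stabilized.
            ProtocolFeature::CurrentEpochStateSync => 72,
        }
    }

    fn enabled(self, protocol_version: u32) -> bool {
        protocol_version >= self.protocol_version()
    }
}

// With the feature enabled, the sync hash is unknown at the start of the epoch (None);
// otherwise it is the first block of the epoch, as before.
fn initial_sync_hash(protocol_version: u32, epoch_first_block_hash: [u8; 32]) -> Option<[u8; 32]> {
    if ProtocolFeature::CurrentEpochStateSync.enabled(protocol_version) {
        None
    } else {
        Some(epoch_first_block_hash)
    }
}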

Contributor Author

Made it a nightly feature.

Comment on lines 787 to 790
let protocol_version =
self.epoch_manager.get_epoch_protocol_version(block.header().epoch_id())?;
let sync_hash =
if protocol_version < 72 { *block.header().hash() } else { CryptoHash::default() };
Contributor

What is CryptoHash::default? Is it meant to be a magic value to indicate that state sync should be delayed until there is a chunk in every shard? We should not rely on magic values; instead you can use an enum to fully capture the meaning, e.g.

enum StateSyncHandle { // or id, anchor, reference, ..., whatever name feels right
  None,
  Waiting,
  Hash(CryptoHash), // Or perhaps even OldStateSyncHash and NewStateSyncHash to further differentiate between the old implementation and the new one. The names should be improved :) 
}

Contributor Author

I agree that magic values are ugly in general, but here it's done to avoid the need for a database migration.

Before this change, this field was always set to the hash of the first block of the epoch (which is the sync_hash we use for state sync), and that hash is also the key we look the value up with. But this sync_hash field in the StateSyncInfo struct is actually only ever checked once, where we just check whether it's equal to the key that was used to look up the struct.

So this sync_hash field is really just redundant data and isn't used meaningfully, which is why I took the opportunity to change the way we set it: it can be done with logic that's consistent before and after the binary upgrade. We just set it to zero after the protocol change. If we read it and that field is set, then that's the sync hash we want to use; otherwise, we're still looking for the sync hash.

So all this is just to avoid a DB migration. I felt that if we could do so, it's not so bad to accept slightly ugly code with a magic value of zeros, because a DB migration is a nonzero amount of development/UX churn. But wdyt? I could of course just do this the right way and add a DB migration if we don't think it's such a big deal.
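To make the zero-value convention concrete, here is a minimal sketch of the read path (CryptoHash stood in by a plain 32-byte array; the real type is near_primitives' CryptoHash and the real field lives in StateSyncInfo):

type CryptoHash = [u8; 32];

// A zeroed sync_hash stored in StateSyncInfo means "the sync hash hasn't been decided yet";
// anything else is the hash we should sync to.
fn stored_sync_hash(info_sync_hash: CryptoHash) -> Option<CryptoHash> {
    if info_sync_hash == [0u8; 32] {
        None // written by the new code path: still waiting to find the sync hash
    } else {
        Some(info_sync_hash) // written by the old code path (or already resolved): use it
    }
}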

Contributor

Thanks for sharing context, I didn't realize it would necessitate a db migration.

Still, with state sync being in a pretty bad place as it is, I wouldn't want to add more tech debt to it. I don't think a DB migration is a big deal, even if it takes a bit of extra effort. I would expect the state sync info to be minimal in size, so the migration should be really quick too. Can you try it and see how bad it is?

Comment on lines 1806 to 1809
// 2) Start creating snapshot if needed.
if let Err(err) = self.process_snapshot() {
if let Err(err) = self.process_snapshot(block.header()) {
tracing::error!(target: "state_snapshot", ?err, "Failed to make a state snapshot");
}
Contributor

Unrelated to this PR: I'm curious why process_snapshot happens in start_process_block. It's not important, but if you happen to know, please tell :)

Contributor Author

This is a very good question; I'm not sure… I think it should be possible to just decouple this, since it's not needed for any block/chunk processing. My guess is it was just easy to express as a callback that gets called every time we update HEAD, in a way you could imagine the indexer being rewritten to use (instead of polling every so often for a new block).

Contributor

I'm fine with this check happening on every block; it's a common pattern. It just seems strange that it happens before the block is applied. If the block turns out to be invalid, the node may take an "invalid" snapshot, for some definition of invalid.

Contributor Author

Wait, yeah, that is a good point... and I feel like not only should we wait until it's applied, but maybe we should wait until it's final too.

Contributor

I know we discussed it but I don't remember the final conclusion - can you remind me please?

Contributor Author

Oh, I think it's just a little bit tricky in that we have to make sure flat storage is still available when we make the snapshot, and that the flat storage head hasn't moved past the point where we want to take the snapshot. I'm not 100% sure what the guarantees are on that, but keeping this here at least preserves the old behavior, and in another PR we can move it if that makes sense. Worth noting, though, that in one of the recent commits addressing your comment about not passing the block header to Chain::should_make_or_delete_snapshot(), we now take a snapshot for the current-epoch sync hash one block later than before, and it still works.

@@ -2407,6 +2414,84 @@ impl Chain {
)
}

fn get_epoch_start_sync_hash_impl(
Contributor

Without further context this method seems very inefficient. In the worst case it would need to iterate over all the blocks in an epoch to get to the first one. Looking at what we have in the DB, the EpochStart column and get_epoch_start_from_epoch_id may already have what you need.

Contributor Author

Yeah, I agree, although here I'm actually just moving the existing code. For now I just left it as-is without trying to optimize it any further, but maybe in another PR we should do something like that.

Contributor

Cool, it's fine if you want to do it in a follow-up; can you just add a TODO in there?

@@ -249,13 +249,6 @@ impl StoreValidator {
DBCol::StateDlInfos => {
let block_hash = CryptoHash::try_from(key_ref)?;
let state_sync_info = StateSyncInfo::try_from_slice(value_ref)?;
// StateSyncInfo is valid
self.check(
Contributor

Why did you remove this check?

Contributor Author

Also see the comment above about avoiding the DB migration

Comment on lines 114 to 119
pub struct CatchupState {
pub state_sync: StateSync,
pub state_downloads: HashMap<u64, ShardSyncDownload>,
pub catchup: BlocksCatchUpState,
}

Contributor

add comments please

Contributor Author

done

Comment on lines 156 to 158
/// A mapping from the first block of an epoch that we know needs to be state synced for some shards
/// to a tracker that will find an appropriate sync_hash for state sync
catchup_tracker: HashMap<CryptoHash, NewChunkTracker>,
Contributor

mini nit: Intuitively it would make sense to track state sync per epoch (epoch_id). Would it make sense in practice?

Contributor Author

yup, agreed

Contributor

Perhaps we could even change the sync hash to a state sync id that is just the epoch id. I'm not sure if it makes sense, how much work it would take, or whether it's worth it, though.

chain/client/src/client.rs (resolved)
chain/client/src/client.rs (resolved, outdated)
@marcelo-gonzalez
Contributor Author

It may be easier to review and merge this if you can split it into smaller PRs. Not sure if it makes sense, but something to consider.

Actually, I tried to make the individual commits inside this PR somewhat self-contained on their own, so hopefully those are easier to look at?

Contributor

@wacban left a comment

LGTM, thanks!

I think we should fix the duplication of code for finding the sync hash but it can be done separately.

let mut num_new_chunks: HashMap<_, _> =
shard_ids.iter().map(|shard_id| (*shard_id, 0)).collect();

loop {
Contributor

What is the status of this? impossible / will do later (please add todo) / will do in this PR / other ?

loop {
let next_hash = match self.chain_store().get_next_block_hash(header.hash()) {
Ok(h) => h,
Err(Error::DBNotFoundErr(_)) => return Ok(None),
Contributor

Using errors as part of business logic is an anti-pattern. It's outside of your PR, since I think it's get_next_block_hash that should return a Result<Option<..>>, but flagging it just in case there is a different API you can use.
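For illustration, one way to confine that error-matching to a single helper so callers get a Result<Option<..>> (made-up types standing in for the nearcore ones):

type CryptoHash = [u8; 32];

enum Error {
    DBNotFoundErr(String),
    Other(String),
}

trait NextBlockLookup {
    fn get_next_block_hash(&self, hash: &CryptoHash) -> Result<CryptoHash, Error>;

    // Callers can now treat "not found" as ordinary data instead of matching on an error variant.
    fn get_next_block_hash_opt(&self, hash: &CryptoHash) -> Result<Option<CryptoHash>, Error> {
        match self.get_next_block_hash(hash) {
            Ok(h) => Ok(Some(h)),
            Err(Error::DBNotFoundErr(_)) => Ok(None),
            Err(e) => Err(e),
        }
    }
}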

Comment on lines +3942 to +3943
match snapshot_config.state_snapshot_type {
// For every epoch, we snapshot if the next block is the state sync "sync_hash" block
Contributor

We could just snapshot one block later in both cases, but I wanted to change as little as possible

If that would make the code cleaner then I'm for it. If not, feel free to leave as is.

@@ -139,9 +151,12 @@ pub struct Client {
/// Approvals for which we do not have the block yet
pub pending_approvals:
lru::LruCache<ApprovalInner, HashMap<AccountId, (Approval, ApprovalType)>>,
/// A mapping from an epoch that we know needs to be state synced for some shards
/// to a tracker that will find an appropriate sync_hash for state sync to that epoch
catchup_tracker: HashMap<EpochId, NewChunkTracker>,
Contributor

nit: new_chunk_trackers or similar
Out of curiosity, is there ever more than one epoch in that map? (same question for catchup_state_syncs)

Contributor Author

You know, that is a good question... It seems this mapping from epoch ID to catchup status has been around since state sync was first implemented, and looking at the code it doesn't seem to me like it should ever have more than one... When we add one, it's because we see a block in a new epoch T, and we know we're going to be a chunk producer for T+1 in a new shard. The epoch ID for T+1 is the hash of the last block in epoch T-1, so if there's a fork at the beginning of the epoch, it will still have the same epoch ID. So that should mean that for any given epoch height, we only have one of these.

And then if we ask if it's possible to have epoch T and T+1 in there at the same time, we can look at the fact that we remove the epoch info for epoch T in finish_catchup_blocks() when we call remove_state_sync_info(). If that hasn't happened by the time we get to the first block of epoch T+1, we will not add another catchup state keyed by the epoch ID of epoch T+1, because we'll call that block an orphan until we remove the last block of epoch T from the BlocksToCatchup column in finish_catchup_blocks().

So idk how it is even possible to have two, but maybe I'm missing something and that logic was put there for a reason?

But now that I think about it, what happens if we apply the first block of a new epoch and then save StateSyncInfo for it (which on the DB is keyed by the hash of that first block), and then that block doesn't end up on the main chain because there's a fork off the last block of epoch T-1? Is there some reason that won't happen for the first block of an epoch? I'm not sure, but if there is, then there's an implicit assumption here that should at least have gotten a comment explaining it, and if not, there's a potential bug here. I think in general maybe we should only be working with final blocks when doing all this state sync stuff. If it is a bug, then for now it's a pretty unlikely one I guess, but it's worth investigating more. Shouldn't be too hard to try causing a fork at the first block of an epoch in a testloop test and seeing what happens.

Contributor Author

Ehh, it's possible that last part is not a bug actually, since there's care put into only removing the particular sync hash from the BlocksToCatchup here, but it's a bit confusing and worth sanity checking.

This test fails because the mock epoch manager does not implement get_block_info(), which is needed in the new implementation of get_sync_hash(). Instead of trying to fix it, just move the test to test loop.

Otherwise this might return an error when looking up the prev header in get_current_epoch_sync_hash(), and we don't want to sync state from before the genesis anyway.

…ync_hash_validity()

The behavior has changed to return false for the genesis block, but getting the sync hash corresponding to the genesis block is not something we ever want to do, since we're not going to be syncing that state, and there's an assertion that checks we don't use the genesis hash as a sync hash in ClientActorInner::find_sync_hash() anyway.
@marcelo-gonzalez
Contributor Author

fyi @wacban: added a couple more commits to fix tests. Note that in 9991c89 I'm returning early from get_sync_hash() for the genesis block, and in 99f98a5 I'm just not checking what happens with the genesis block in tests, because I don't think we ever want to be using the genesis block here in any case; ClientActorInner::find_sync_hash() even has an assertion against it. Let me know if you have objections though :)


codecov bot commented Oct 24, 2024

Codecov Report

Attention: Patch coverage is 77.20930% with 98 lines in your changes missing coverage. Please review.

Project coverage is 71.24%. Comparing base (10463b2) to head (286693b).
Report is 1 commit behind head on master.

Files with missing lines Patch % Lines
chain/client/src/client.rs 80.00% 14 Missing and 12 partials ⚠️
chain/chain/src/chain.rs 81.30% 8 Missing and 12 partials ⚠️
core/store/src/migrations.rs 0.00% 19 Missing ⚠️
tools/state-viewer/src/state_parts.rs 0.00% 12 Missing ⚠️
core/store/src/lib.rs 0.00% 8 Missing ⚠️
chain/client/src/client_actor.rs 53.33% 5 Missing and 2 partials ⚠️
chain/client/src/test_utils/client.rs 0.00% 3 Missing ⚠️
chain/chain/src/store_validator/validate.rs 0.00% 2 Missing ⚠️
nearcore/src/migrations.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12102      +/-   ##
==========================================
- Coverage   71.54%   71.24%   -0.31%     
==========================================
  Files         838      838              
  Lines      168748   168911     +163     
  Branches   168748   168911     +163     
==========================================
- Hits       120725   120334     -391     
- Misses      42767    43338     +571     
+ Partials     5256     5239      -17     
Flag Coverage Δ
backward-compatibility 0.16% <0.00%> (-0.01%) ⬇️
db-migration 0.16% <0.00%> (-0.01%) ⬇️
genesis-check 1.23% <0.00%> (-0.01%) ⬇️
integration-tests 39.05% <77.20%> (+0.14%) ⬆️
linux 70.68% <48.37%> (-0.39%) ⬇️
linux-nightly 70.83% <77.20%> (-0.29%) ⬇️
macos 50.34% <8.06%> (-3.81%) ⬇️
pytests 1.54% <0.00%> (-0.01%) ⬇️
sanity-checks 1.35% <0.00%> (-0.01%) ⬇️
unittests 64.17% <10.48%> (-1.14%) ⬇️
upgradability 0.21% <0.00%> (?)

Flags with carried forward coverage won't be shown.


@marcelo-gonzalez marcelo-gonzalez added this pull request to the merge queue Oct 24, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 24, 2024
@Longarithm Longarithm added this pull request to the merge queue Oct 24, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 24, 2024
@marcelo-gonzalez marcelo-gonzalez added this pull request to the merge queue Oct 26, 2024
Merged via the queue into near:master with commit a871b9d Oct 26, 2024
26 of 29 checks passed
@marcelo-gonzalez marcelo-gonzalez deleted the state-sync-epoch branch October 26, 2024 01:57
github-merge-queue bot pushed a commit that referenced this pull request Nov 4, 2024
#12337)

In #12102 we added a new feature
that would move the sync hash to the first block in the epoch for which
at least two new chunks have been produced for each shard after the
first block. Finding this hash was implemented by iterating over block
headers until we find the right one. This is fine for a first version
but needed to be fixed, because finding the sync hash is done in many
places where we would rather not iterate over blocks like that,
especially if there are many skipped chunks that make the iteration
longer than expected.

Here we fix this by introducing two new columns (`StateSyncHashes` and
`StateSyncNewChunks`) that keep track of the sync hashes and of the number of
new chunks seen so far in the epoch for each block. These are updated on each
header update with the same basic logic as the old implementation, and we set
the sync hash in `StateSyncHashes` when it's found, so that
`Chain::get_sync_hash()` just becomes a single lookup by epoch ID in that
column.
github-merge-queue bot pushed a commit that referenced this pull request Nov 7, 2024
`ShardInfo` stores a `ShardId` and a chunk header in the `StateSyncInfo`
stored on disk when we need to catch up. But the chunk header field is
not read anywhere, and wasn't even before
#12102. Setting this chunk hash is
the first thing that fails when trying to enable state sync after a
resharding, because we try indexing into a block's chunks with the new
epoch's shard indices. So since it's totally unused, just remove it.

This makes a breaking change to the `StateDlInfos` column, but only
relative to another commit on master since the last release.