
feat(sync): use routed state part request in sync actor #12111

Open · wants to merge 4 commits into master
Conversation

saketh-are (Collaborator)
@saketh-are saketh-are commented Sep 19, 2024

This PR goes after #12110:

  • Now that peer selection happens inside the network crate, we can simplify the sync actor a great deal. It is no longer concerned with which peers the state parts are requested from.

  • We change the behavior of nodes which have external storage configured. For each state part, they will first attempt to obtain it from other nodes in the network. After a fixed number of attempts, they will fall back to downloading from the external storage.

These changes bring us towards the ultimate goal of deprecating cloud storage of state parts entirely.
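The fallback behavior described above can be sketched as a small decision function. This is a hedged illustration only; the constant name `PEER_ATTEMPTS_BEFORE_EXTERNAL` and the `PartSource` type are hypothetical stand-ins, not nearcore's actual identifiers:

```rust
// Hypothetical sketch of the per-part fallback: try peers a fixed number
// of times, then fall back to external storage if it is configured.
const PEER_ATTEMPTS_BEFORE_EXTERNAL: u32 = 3; // assumed value, not from the PR

#[derive(Debug, PartialEq)]
enum PartSource {
    Network,
    ExternalStorage,
}

/// Decide where to request a state part from, given how many
/// network attempts for this part have already failed.
fn select_part_source(failed_network_attempts: u32, external_configured: bool) -> PartSource {
    if external_configured && failed_network_attempts >= PEER_ATTEMPTS_BEFORE_EXTERNAL {
        PartSource::ExternalStorage
    } else {
        PartSource::Network
    }
}

fn main() {
    // Fresh parts always go to the network first.
    assert_eq!(select_part_source(0, true), PartSource::Network);
    // After enough failures, configured nodes fall back to external storage.
    assert_eq!(select_part_source(3, true), PartSource::ExternalStorage);
    // Without external storage configured, we keep retrying peers.
    assert_eq!(select_part_source(10, false), PartSource::Network);
}
```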

@saketh-are saketh-are marked this pull request as ready for review September 19, 2024 00:59
@saketh-are saketh-are requested a review from a team as a code owner September 19, 2024 00:59
StateSyncInner::External { .. } => peers,
};
Ok(res)
// TODO: where is the part validated though?
Contributor

Yeah, this is all kind of convoluted... It's validated below in update_download_on_state_response_message() when we call Chain::set_state_part(). So I think we should send this StatePartReceived message only after that call returns Ok; otherwise we're saying we can clear the state in the SnapshotHostsCache for this part before we validate it. We should keep all that state around until we're sure the part is legit, and maybe even remove the peer from the set we request parts from if it sent us a bad part.
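The ordering argued for in this comment can be sketched as follows. Everything here is a stand-in (the `validate` check, `PartOutcome`), not nearcore's actual API; the point is only that state is cleared and peers are trusted only after validation succeeds:

```rust
// Hypothetical sketch: clear cached request state / send StatePartReceived
// only after the part validates, and flag the sending peer otherwise.

#[derive(Debug, PartialEq)]
enum PartOutcome {
    Accepted,        // validated; now safe to clear SnapshotHostsCache state
    RejectedBadPeer, // bad part; keep state and deprioritize this peer
}

// Stand-in for the validation performed by Chain::set_state_part().
fn validate(part: &[u8], expected_len: usize) -> bool {
    part.len() == expected_len
}

fn handle_part(part: &[u8], expected_len: usize) -> PartOutcome {
    if validate(part, expected_len) {
        // Only at this point would we emit StatePartReceived and drop
        // the per-part tracking state.
        PartOutcome::Accepted
    } else {
        // Keep the request state around so the part can be re-requested,
        // and avoid asking the same peer again.
        PartOutcome::RejectedBadPeer
    }
}

fn main() {
    assert_eq!(handle_part(&[0u8; 4], 4), PartOutcome::Accepted);
    assert_eq!(handle_part(&[0u8; 3], 4), PartOutcome::RejectedBadPeer);
}
```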

} else {
let peer_id = possible_targets.choose(&mut thread_rng()).cloned().unwrap();
tracing::debug!(target: "sync", ?peer_id, shard_id, ?sync_hash, ?possible_targets, "request_shard_header");
assert!(header_download.run_me.load(Ordering::SeqCst));
Contributor

Super nit: I know you're just moving code that was already here, but if you want you could replace this assert and the store below it with a single assert!(header_download.run_me.swap(false, Ordering::SeqCst));

let epoch_info = chain.epoch_manager.get_epoch_info(epoch_id).unwrap();
let epoch_height = epoch_info.epoch_height();

let shard_state_header = chain.get_state_header(shard_id, sync_hash).unwrap();
Contributor

On each iteration of this loop we read this fairly large value from storage, even though it's the same for every part. If you want, you could move this call to the top of the function, and do the same with sync_block_header above. And since the shard_state_header is only needed in this branch to get the state_root and state_num_parts, but not when we don't have external storage configured, you could just initialize an Option<(state_root, state_num_parts)> at the top of this function set to None, and set it here.

I guess it's not a huge deal though, so feel free to forget about it.
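The lazy-Option pattern suggested above can be sketched with a stand-in for the storage read. The closure here plays the role of chain.get_state_header(); the counter just demonstrates that storage is hit once, not once per part:

```rust
// Hypothetical sketch: cache a loop-invariant storage read in an
// Option<(state_root, num_parts)> instead of re-reading it per part.
fn count_storage_reads(total_parts: u32) -> u32 {
    let mut storage_reads = 0;
    let mut cached: Option<(u64, u64)> = None;
    for _part_id in 0..total_parts {
        // get_or_insert_with runs the closure only on the first iteration.
        let (_state_root, _num_parts) = *cached.get_or_insert_with(|| {
            storage_reads += 1; // stand-in for chain.get_state_header()
            (0xabc, 16)
        });
    }
    storage_reads
}

fn main() {
    // The header is read from storage exactly once, however many parts exist.
    assert_eq!(count_storage_reads(100), 1);
    // If the branch is never taken, storage is never touched at all.
    assert_eq!(count_storage_reads(0), 0);
}
```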


// The request sent to the network adapter needs to include the sync_prev_prev_hash
// so that a peer hosting the correct snapshot can be selected.
if let Ok(header) = chain.get_block_header(&sync_hash) {
Contributor

Again, like in the other PR, it's probably best to log the errors here, since something is very wrong if getting these headers doesn't work.
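A minimal sketch of that suggestion, logging the failure rather than silently skipping the request. The `get_block_header` here is a self-contained stand-in returning Result, not nearcore's actual signature:

```rust
// Hypothetical sketch: surface header-lookup errors instead of swallowing
// them with a bare `if let Ok(..)`.
fn get_block_header(present: bool) -> Result<&'static str, &'static str> {
    if present {
        Ok("header")
    } else {
        Err("header not found")
    }
}

fn try_request_part(present: bool) -> bool {
    match get_block_header(present) {
        Ok(_header) => true, // proceed with the part request
        Err(err) => {
            // Something is very wrong if the sync block header is missing,
            // so make it visible in the logs rather than failing silently.
            eprintln!("state sync: failed to get block header: {err}");
            false
        }
    }
}

fn main() {
    assert!(try_request_part(true));
    assert!(!try_request_part(false));
}
```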

{
let StateSyncExternal { chain_id, semaphore, external } =
self.external.as_ref().unwrap();
if semaphore.available_permits() > 0 {
Contributor

could also just break if it's 0
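The early-break suggested here can be illustrated with a stand-in permit counter in place of the real semaphore: once no download permits remain, there's no point scanning the remaining parts this tick.

```rust
// Hypothetical sketch: stop iterating over parts as soon as download
// capacity (permits) runs out, instead of checking per part.
fn schedule_parts(mut available_permits: u32, total_parts: u32) -> Vec<u32> {
    let mut scheduled = Vec::new();
    for part_id in 0..total_parts {
        if available_permits == 0 {
            break; // no capacity left; stop scanning this tick
        }
        available_permits -= 1;
        scheduled.push(part_id);
    }
    scheduled
}

fn main() {
    // Only as many parts as there are permits get scheduled.
    assert_eq!(schedule_parts(2, 10), vec![0, 1]);
    // With zero permits the loop exits immediately.
    assert_eq!(schedule_parts(0, 10), Vec::<u32>::new());
}
```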

if let Ok(header) = chain.get_block_header(&sync_hash) {
if let Ok(prev_header) = chain.get_block_header(&header.prev_hash()) {
let sync_prev_prev_hash = prev_header.prev_hash();
request_part_from_peers(
Contributor

So actually I think there are a couple of problems that will come up. One is that we aren't getting SyncSnapshotHosts messages with updated sync_hash and shard ID values after the first one. Also, the shard IDs sent in this message are just all the shards, even if the node is only tracking one. On a localnet test it stays on the first sync hash even after we've applied the new sync_hash, when we should have gotten updated SyncSnapshotHosts with the next epoch's sync hash. So after a successful state sync the first time, this request ends up failing on the next epoch because the call to snapshot_hosts.select_host_for_part() in the peer manager returns None.

Also, this is not a correctness issue I guess, but parts_to_fetch() returns the parts whose associated run_me field is true, and this value gets set to true again after a timeout (default 2 seconds). On localnet this timeout is probably fine, but in practice on mainnet 2 seconds seems a little small to me. This also highlights that the issue above is a real problem: in this localnet test we hit the timeout because the shard IDs in the SyncSnapshotHosts message list all the shards, even though the node is only tracking one, and we end up making state sync requests to nodes that can't actually serve the parts. So in this case the timeout fires for a legitimate reason, but in practice the 2 second default applied by all nodes might add unnecessary load on the network as a whole.
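One way to bound that retry load would be a capped exponential backoff instead of a fixed 2 second timeout. This is a hypothetical sketch of the idea, not what nearcore currently does; the base and cap values are assumptions:

```rust
// Hypothetical sketch: grow the per-part retry timeout on repeated
// failures, capped, so syncing nodes don't hammer hosts every 2 seconds.
use std::time::Duration;

fn retry_timeout(attempt: u32) -> Duration {
    const BASE_SECS: u64 = 2; // the current default, per the comment above
    const CAP_SECS: u64 = 60; // assumed cap for illustration
    // Shift is clamped to keep the arithmetic well-defined for large attempts.
    Duration::from_secs((BASE_SECS << attempt.min(10)).min(CAP_SECS))
}

fn main() {
    assert_eq!(retry_timeout(0), Duration::from_secs(2)); // first retry as today
    assert_eq!(retry_timeout(1), Duration::from_secs(4));
    assert_eq!(retry_timeout(2), Duration::from_secs(8));
    assert_eq!(retry_timeout(8), Duration::from_secs(60)); // capped
}
```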

you can try with this diff:

diff --git a/core/primitives/src/epoch_manager.rs b/core/primitives/src/epoch_manager.rs
index 619809334..0d2cdd5c1 100644
--- a/core/primitives/src/epoch_manager.rs
+++ b/core/primitives/src/epoch_manager.rs
@@ -166,6 +166,8 @@ impl AllEpochConfig {
     pub fn generate_epoch_config(&self, protocol_version: ProtocolVersion) -> EpochConfig {
         let mut config = self.genesis_epoch_config.clone();
 
+        config.validator_selection_config.shuffle_shard_assignment_for_chunk_producers = true;
+
         Self::config_mocknet(&mut config, &self.chain_id);
 
         if !self.use_production_config {
diff --git a/pytest/lib/state_sync_lib.py b/pytest/lib/state_sync_lib.py
index 2eefb7b81..39fe8d06d 100644
--- a/pytest/lib/state_sync_lib.py
+++ b/pytest/lib/state_sync_lib.py
@@ -37,15 +37,6 @@ def get_state_sync_configs_pair(tracked_shards=[0]):
             "nanos": 500000000
         },
         "state_sync": {
-            "sync": {
-                "ExternalStorage": {
-                    "location": {
-                        "Filesystem": {
-                            "root_dir": state_parts_dir
-                        }
-                    }
-                }
-            }
         },
         "state_sync_enabled": True,
     }
diff --git a/pytest/tests/sanity/transactions.py b/pytest/tests/sanity/transactions.py
index d0de6e781..8207122d7 100755
--- a/pytest/tests/sanity/transactions.py
+++ b/pytest/tests/sanity/transactions.py
@@ -106,16 +106,16 @@ for height, hash in utils.poll_blocks(nodes[4], timeout=TIMEOUT):
         last_balances = [x for x in ctx.expected_balances]
         ctx.send_moar_txs(hash, 10, use_routing=True)
         sent_height = height
-    else:
-        assert height <= sent_height + 10, ('Balances before: {before}\n'
-                                            'Expected balances: {expected}\n'
-                                            'Current balances: {current}\n'
-                                            'Sent at height: {sent_at}\n'
-                                            'Current height: {height}').format(
-                                                before=last_balances,
-                                                expected=ctx.expected_balances,
-                                                current=ctx.get_balances(),
-                                                sent_at=sent_height,
-                                                height=height)
+    # else:
+    #     assert height <= sent_height + 10, ('Balances before: {before}\n'
+    #                                         'Expected balances: {expected}\n'
+    #                                         'Current balances: {current}\n'
+    #                                         'Sent at height: {sent_at}\n'
+    #                                         'Current height: {height}').format(
+    #                                             before=last_balances,
+    #                                             expected=ctx.expected_balances,
+    #                                             current=ctx.get_balances(),
+    #                                             sent_at=sent_height,
+    #                                             height=height)
     if height >= 100:
         break

This will make the chunk producer assignment change on every epoch, so the nodes need to state sync for the shards they'll track in the future. Then if you add some logging in this function, and in the peer manager where we call snapshot_hosts.select_host_for_part(), you should be able to see these two problems. The first one should cause the chain to stall after a couple of epochs at height 30-something (you might have to try a couple of times for it to happen); the second one is visible only in the logs.
