
feat(sync): use routed state part request in sync actor #12111

Open · wants to merge 4 commits into master
Conversation

saketh-are (Collaborator)
@saketh-are saketh-are commented Sep 19, 2024

This PR goes after #12110:

  • Now that peer selection happens inside the network crate, we can simplify the sync actor a great deal. It is no longer concerned with which peers the state parts are requested from.

  • We change the behavior of nodes which have external storage configured. For each state part, they will first attempt to obtain it from other nodes in the network. After a fixed number of attempts, they will fall back to downloading from the external storage.

These changes bring us towards the ultimate goal of deprecating cloud storage of state parts entirely.
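The fallback behavior described above can be sketched as a small decision function. This is a hedged illustration only; the constant name `PEER_ATTEMPTS_BEFORE_EXTERNAL` and the `PartSource` type are hypothetical stand-ins, not nearcore's actual identifiers:

```rust
// Hypothetical sketch of the per-part fallback: try peers a fixed number
// of times, then fall back to external storage if it is configured.
const PEER_ATTEMPTS_BEFORE_EXTERNAL: u32 = 3; // assumed value, not from the PR

#[derive(Debug, PartialEq)]
enum PartSource {
    Network,
    ExternalStorage,
}

/// Decide where to request a state part from, given how many
/// network attempts for this part have already failed.
fn select_part_source(failed_network_attempts: u32, external_configured: bool) -> PartSource {
    if external_configured && failed_network_attempts >= PEER_ATTEMPTS_BEFORE_EXTERNAL {
        PartSource::ExternalStorage
    } else {
        PartSource::Network
    }
}

fn main() {
    // Fresh parts always go to the network first.
    assert_eq!(select_part_source(0, true), PartSource::Network);
    // After enough failures, configured nodes fall back to external storage.
    assert_eq!(select_part_source(3, true), PartSource::ExternalStorage);
    // Without external storage configured, we keep retrying peers.
    assert_eq!(select_part_source(10, false), PartSource::Network);
}
```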

@saketh-are saketh-are marked this pull request as ready for review September 19, 2024 00:59
@saketh-are saketh-are requested a review from a team as a code owner September 19, 2024 00:59
StateSyncInner::External { .. } => peers,
};
Ok(res)
// TODO: where is the part validated though?
Contributor

Yeah, this is all kind of convoluted... It's validated below in update_download_on_state_response_message() when we call Chain::set_state_part(). So I think we should send this StatePartReceived message only after that call returns Ok; otherwise we're saying we can clear the state in the SnapshotHostsCache for this part before we validate it. We should keep all that state around until we're sure the part is legit, and maybe even remove the peer from the set we request parts from if it sent us a bad part.
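The ordering argued for in this comment can be sketched as follows. Everything here is a stand-in (the `validate` check, `PartOutcome`), not nearcore's actual API; the point is only that state is cleared and peers are trusted only after validation succeeds:

```rust
// Hypothetical sketch: clear cached request state / send StatePartReceived
// only after the part validates, and flag the sending peer otherwise.

#[derive(Debug, PartialEq)]
enum PartOutcome {
    Accepted,        // validated; now safe to clear SnapshotHostsCache state
    RejectedBadPeer, // bad part; keep state and deprioritize this peer
}

// Stand-in for the validation performed by Chain::set_state_part().
fn validate(part: &[u8], expected_len: usize) -> bool {
    part.len() == expected_len
}

fn handle_part(part: &[u8], expected_len: usize) -> PartOutcome {
    if validate(part, expected_len) {
        // Only at this point would we emit StatePartReceived and drop
        // the per-part tracking state.
        PartOutcome::Accepted
    } else {
        // Keep the request state around so the part can be re-requested,
        // and avoid asking the same peer again.
        PartOutcome::RejectedBadPeer
    }
}

fn main() {
    assert_eq!(handle_part(&[0u8; 4], 4), PartOutcome::Accepted);
    assert_eq!(handle_part(&[0u8; 3], 4), PartOutcome::RejectedBadPeer);
}
```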

} else {
let peer_id = possible_targets.choose(&mut thread_rng()).cloned().unwrap();
tracing::debug!(target: "sync", ?peer_id, shard_id, ?sync_hash, ?possible_targets, "request_shard_header");
assert!(header_download.run_me.load(Ordering::SeqCst));
Contributor

Super nit: I know you're just moving code that was already here, but if you want you could replace this assert and the store below it with a single assert!(header_download.run_me.swap(false, Ordering::SeqCst));

let epoch_info = chain.epoch_manager.get_epoch_info(epoch_id).unwrap();
let epoch_height = epoch_info.epoch_height();

let shard_state_header = chain.get_state_header(shard_id, sync_hash).unwrap();
Contributor

On each iteration of this loop we read this fairly large value from storage, even though it's the same for every part. If you want, you could move this call to the top of the function, and do the same with sync_block_header above. And since the shard_state_header is only needed in this branch to get the state_root and state_num_parts, but not when we don't have external storage configured, you could just initialize an Option<(state_root, state_num_parts)> at the top of this function set to None, and set it here.

I guess it's not a huge deal though, so feel free to forget about it.
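The lazy-Option pattern suggested above can be sketched with a stand-in for the storage read. The closure here plays the role of chain.get_state_header(); the counter just demonstrates that storage is hit once, not once per part:

```rust
// Hypothetical sketch: cache a loop-invariant storage read in an
// Option<(state_root, num_parts)> instead of re-reading it per part.
fn count_storage_reads(total_parts: u32) -> u32 {
    let mut storage_reads = 0;
    let mut cached: Option<(u64, u64)> = None;
    for _part_id in 0..total_parts {
        // get_or_insert_with runs the closure only on the first iteration.
        let (_state_root, _num_parts) = *cached.get_or_insert_with(|| {
            storage_reads += 1; // stand-in for chain.get_state_header()
            (0xabc, 16)
        });
    }
    storage_reads
}

fn main() {
    // The header is read from storage exactly once, however many parts exist.
    assert_eq!(count_storage_reads(100), 1);
    // If the branch is never taken, storage is never touched at all.
    assert_eq!(count_storage_reads(0), 0);
}
```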


// The request sent to the network adapter needs to include the sync_prev_prev_hash
// so that a peer hosting the correct snapshot can be selected.
if let Ok(header) = chain.get_block_header(&sync_hash) {
Contributor

Again, like in the other PR, it's probably best to log the errors here, since something is very wrong if getting these headers doesn't work.
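A minimal sketch of that suggestion, logging the failure rather than silently skipping the request. The `get_block_header` here is a self-contained stand-in returning Result, not nearcore's actual signature:

```rust
// Hypothetical sketch: surface header-lookup errors instead of swallowing
// them with a bare `if let Ok(..)`.
fn get_block_header(present: bool) -> Result<&'static str, &'static str> {
    if present {
        Ok("header")
    } else {
        Err("header not found")
    }
}

fn try_request_part(present: bool) -> bool {
    match get_block_header(present) {
        Ok(_header) => true, // proceed with the part request
        Err(err) => {
            // Something is very wrong if the sync block header is missing,
            // so make it visible in the logs rather than failing silently.
            eprintln!("state sync: failed to get block header: {err}");
            false
        }
    }
}

fn main() {
    assert!(try_request_part(true));
    assert!(!try_request_part(false));
}
```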

{
let StateSyncExternal { chain_id, semaphore, external } =
self.external.as_ref().unwrap();
if semaphore.available_permits() > 0 {
Contributor

could also just break if it's 0
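The early-break suggested here can be illustrated with a stand-in permit counter in place of the real semaphore: once no download permits remain, there's no point scanning the remaining parts this tick.

```rust
// Hypothetical sketch: stop iterating over parts as soon as download
// capacity (permits) runs out, instead of checking per part.
fn schedule_parts(mut available_permits: u32, total_parts: u32) -> Vec<u32> {
    let mut scheduled = Vec::new();
    for part_id in 0..total_parts {
        if available_permits == 0 {
            break; // no capacity left; stop scanning this tick
        }
        available_permits -= 1;
        scheduled.push(part_id);
    }
    scheduled
}

fn main() {
    // Only as many parts as there are permits get scheduled.
    assert_eq!(schedule_parts(2, 10), vec![0, 1]);
    // With zero permits the loop exits immediately.
    assert_eq!(schedule_parts(0, 10), Vec::<u32>::new());
}
```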

if let Ok(header) = chain.get_block_header(&sync_hash) {
if let Ok(prev_header) = chain.get_block_header(&header.prev_hash()) {
let sync_prev_prev_hash = prev_header.prev_hash();
request_part_from_peers(
Contributor

So actually I think there are a couple of problems that will come up. One is that we aren't getting SyncSnapshotHosts messages with updated sync_hash and shard ID values after the first one. Also, the shard IDs sent in this message are just all the shards, even if the node is only tracking one. On a localnet test it stays on the first sync hash even after we've applied the new sync_hash, when we should have gotten updated SyncSnapshotHosts with the next epoch's sync hash. So after a successful state sync the first time, this request ends up failing on the next epoch because the call to snapshot_hosts.select_host_for_part() in the peer manager returns None.

Also, this is not a correctness issue I guess, but parts_to_fetch() returns the parts whose associated run_me field is true, and this value gets set to true again after a timeout (default 2 seconds). On localnet this timeout is probably fine, but in practice on mainnet 2 seconds seems a little small to me. This also highlights that the issue above is a real problem: in this localnet test we hit the timeout because the shard IDs in the SyncSnapshotHosts message list all the shards, even though the node is only tracking one, and we end up making state sync requests to nodes that can't actually serve the parts. So in this case the timeout fires for a legitimate reason, but in practice the 2 second default applied by all nodes might add unnecessary load on the network as a whole.
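One way to bound that retry load would be a capped exponential backoff instead of a fixed 2 second timeout. This is a hypothetical sketch of the idea, not what nearcore currently does; the base and cap values are assumptions:

```rust
// Hypothetical sketch: grow the per-part retry timeout on repeated
// failures, capped, so syncing nodes don't hammer hosts every 2 seconds.
use std::time::Duration;

fn retry_timeout(attempt: u32) -> Duration {
    const BASE_SECS: u64 = 2; // the current default, per the comment above
    const CAP_SECS: u64 = 60; // assumed cap for illustration
    // Shift is clamped to keep the arithmetic well-defined for large attempts.
    Duration::from_secs((BASE_SECS << attempt.min(10)).min(CAP_SECS))
}

fn main() {
    assert_eq!(retry_timeout(0), Duration::from_secs(2)); // first retry as today
    assert_eq!(retry_timeout(1), Duration::from_secs(4));
    assert_eq!(retry_timeout(2), Duration::from_secs(8));
    assert_eq!(retry_timeout(8), Duration::from_secs(60)); // capped
}
```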

you can try with this diff:

diff --git a/core/primitives/src/epoch_manager.rs b/core/primitives/src/epoch_manager.rs
index 619809334..0d2cdd5c1 100644
--- a/core/primitives/src/epoch_manager.rs
+++ b/core/primitives/src/epoch_manager.rs
@@ -166,6 +166,8 @@ impl AllEpochConfig {
     pub fn generate_epoch_config(&self, protocol_version: ProtocolVersion) -> EpochConfig {
         let mut config = self.genesis_epoch_config.clone();
 
+        config.validator_selection_config.shuffle_shard_assignment_for_chunk_producers = true;
+
         Self::config_mocknet(&mut config, &self.chain_id);
 
         if !self.use_production_config {
diff --git a/pytest/lib/state_sync_lib.py b/pytest/lib/state_sync_lib.py
index 2eefb7b81..39fe8d06d 100644
--- a/pytest/lib/state_sync_lib.py
+++ b/pytest/lib/state_sync_lib.py
@@ -37,15 +37,6 @@ def get_state_sync_configs_pair(tracked_shards=[0]):
             "nanos": 500000000
         },
         "state_sync": {
-            "sync": {
-                "ExternalStorage": {
-                    "location": {
-                        "Filesystem": {
-                            "root_dir": state_parts_dir
-                        }
-                    }
-                }
-            }
         },
         "state_sync_enabled": True,
     }
diff --git a/pytest/tests/sanity/transactions.py b/pytest/tests/sanity/transactions.py
index d0de6e781..8207122d7 100755
--- a/pytest/tests/sanity/transactions.py
+++ b/pytest/tests/sanity/transactions.py
@@ -106,16 +106,16 @@ for height, hash in utils.poll_blocks(nodes[4], timeout=TIMEOUT):
         last_balances = [x for x in ctx.expected_balances]
         ctx.send_moar_txs(hash, 10, use_routing=True)
         sent_height = height
-    else:
-        assert height <= sent_height + 10, ('Balances before: {before}\n'
-                                            'Expected balances: {expected}\n'
-                                            'Current balances: {current}\n'
-                                            'Sent at height: {sent_at}\n'
-                                            'Current height: {height}').format(
-                                                before=last_balances,
-                                                expected=ctx.expected_balances,
-                                                current=ctx.get_balances(),
-                                                sent_at=sent_height,
-                                                height=height)
+    # else:
+    #     assert height <= sent_height + 10, ('Balances before: {before}\n'
+    #                                         'Expected balances: {expected}\n'
+    #                                         'Current balances: {current}\n'
+    #                                         'Sent at height: {sent_at}\n'
+    #                                         'Current height: {height}').format(
+    #                                             before=last_balances,
+    #                                             expected=ctx.expected_balances,
+    #                                             current=ctx.get_balances(),
+    #                                             sent_at=sent_height,
+    #                                             height=height)
     if height >= 100:
         break

This will make the chunk producer assignment change on every epoch, so the nodes need to state sync for the shards they'll track in the future. Then if you add some logging in this function, and in the peer manager where we call snapshot_hosts.select_host_for_part(), you should be able to see these two problems. The first one should cause the chain to stall after a couple of epochs at height 30-something (you might have to try a couple of times for it to happen); the second one is visible only in the logs.
