Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(state-sync): shard swap in single shard tracking #12108

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

VanBarbascu
Copy link
Contributor

Pytest to check decentralised state sync of nodes tracking one shard.
Keep shard shuffling off until the implementation is done.

This test sets a dumper node only for the state sync headers.
Sets 4 validator nodes, each tracking 1 shard.
Sets an RPC node to handle the random traffic that changes the state.
Only allow validator nodes to share parts by enabling the state snapshot.

Check if the network can got for 6 epochs while shard shuffling is on.
Validator nodes are expected to download parts from each other.

Check http://127.0.0.1:3040/debug/pages/epoch_info for the validator assignment rotation.

Copy link

codecov bot commented Sep 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.54%. Comparing base (c0c172f) to head (e2ba7cb).

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12108      +/-   ##
==========================================
- Coverage   71.54%   71.54%   -0.01%     
==========================================
  Files         819      819              
  Lines      164720   164720              
  Branches   164720   164720              
==========================================
- Hits       117843   117841       -2     
- Misses      41733    41738       +5     
+ Partials     5144     5141       -3     
Flag Coverage Δ
backward-compatibility 0.17% <ø> (ø)
db-migration 0.17% <ø> (ø)
genesis-check 1.26% <ø> (ø)
integration-tests 38.65% <ø> (-0.01%) ⬇️
linux 71.34% <ø> (+<0.01%) ⬆️
linux-nightly 71.11% <ø> (-0.01%) ⬇️
macos 53.52% <ø> (+0.91%) ⬆️
pytests 1.52% <ø> (ø)
sanity-checks 1.33% <ø> (ø)
unittests 65.26% <ø> (-0.01%) ⬇️
upgradability 0.21% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


def _prepare_cluster(self, with_rpc=False, shuffle_shard_assignment=False):
(node_config_dump,
node_config_sync) = state_sync_lib.get_state_sync_configs_pair(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also enables state dumping to external storage (local filesystem), so we may need a different function to generate state sync configs but without enabling centralized state dump. I guess this is what you want to test right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need the headers to be dumped until we find a way to share them between nodes that track a subset of shards. Same for this question here.

tracked_shards=None)

# State snapshot is disabled for dumper. We only want to dump the headers.
node_config_dump["store.state_snapshot_enabled"] = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not get it, will the headers still be dumped to and retrieved from centralized storage?

I would expect that in decentralized state sync there is no specific dumper role?

Copy link
Collaborator

@saketh-are saketh-are Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the 2.3 release we will decentralize just the state part distribution, which represents the vast majority of the upload/download traffic for state sync. Subsequently we will address the state headers. So for now we still need the dumper node putting the headers into the central storage.

Copy link
Contributor

@tayfunelmas tayfunelmas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just added some comments to understand the config setup.

Copy link
Collaborator

@saketh-are saketh-are left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @VanBarbascu for getting this set up.

Currently the test is passing even if I break decentralized state sync requests because after some attempts the node falls back to the central storage and gets the part successfully from there. If I disable the download from central storage as well (so that all of state sync is broken) then the test does fail as expected.

If there's an easy way to make the dumper upload only the headers and not the parts, that would be nice in the short term. In the long run once we fully decentralize we could just get rid of the dumper.

@VanBarbascu
Copy link
Contributor Author

If there's an easy way to make the dumper upload only the headers and not the parts, that would be nice in the short term. In the long run once we fully decentralize we could just get rid of the dumper.

This is what I was aiming for when I set the dumper to skip snapshot:

# State snapshot is disabled for dumper. We only want to dump the headers.
node_config_dump["store.state_snapshot_enabled"] = False

Could there be a conflict between store.state_snapshot_enabled and store.state_snapshot configs?
When we set store.state_snapshot_enabled to True, store.state_snapshot is set to EveryEpoch. When it is False, store.state_snapshot stays the same.

The default value for store.state_snapshot changed in the meantime from OnlyForResharding to EveryEpoch so that is why you are experiencing this behaviour.

@VanBarbascu
Copy link
Contributor Author

Done! Now it should not dump the parts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants