Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(tests): fix state sync test errors #12150

Merged
merged 5 commits into from
Oct 4, 2024

Conversation

marcelo-gonzalez
Copy link
Contributor

These tests create node home dirs in tempfiles, but these are dropped too soon, and the directories are removed while the nodes are still running. This is visible in the logs as snapshot creation failure messages. After #12147, these no longer cause test failures, but the underlying failures to create snapshots are still there.

Fix it by keeping them around in an Arc in the enclosing scopes

Note that every now and then these tests still fail with many messages showing Received an invalid block during state sync followed by a test timeout, but it seems to be for an unrelated reason.

These tests create node home dirs in tempfiles, but these are dropped too soon, and the
directories are removed while the nodes are still running. This is visible in the logs as snapshot
creation failure messages. After near#12147, these no longer
cause test failures, but the underlying failures to create snapshots are still there.

Fix it by keeping them around in an Arc in the enclosing scopes

Note that every now and then these tests still fail with many messages showing
`Received an invalid block during state sync` followed by a test timeout, but
it seems to be for an unrelated reason.
@marcelo-gonzalez marcelo-gonzalez requested a review from a team as a code owner September 26, 2024 03:21
Copy link

codecov bot commented Sep 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.61%. Comparing base (2b882c8) to head (190d9fd).
Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #12150   +/-   ##
=======================================
  Coverage   71.60%   71.61%           
=======================================
  Files         824      824           
  Lines      165507   165507           
  Branches   165507   165507           
=======================================
+ Hits       118511   118527   +16     
+ Misses      41873    41858   -15     
+ Partials     5123     5122    -1     
Flag Coverage Δ
backward-compatibility 0.17% <ø> (ø)
db-migration 0.17% <ø> (ø)
genesis-check 1.25% <ø> (ø)
integration-tests 38.72% <ø> (+<0.01%) ⬆️
linux 71.41% <ø> (+0.01%) ⬆️
linux-nightly 71.18% <ø> (+<0.01%) ⬆️
macos 54.10% <ø> (-0.09%) ⬇️
pytests 1.57% <ø> (ø)
sanity-checks 1.38% <ø> (ø)
unittests 65.34% <ø> (-0.01%) ⬇️
upgradability 0.21% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@marcelo-gonzalez
Copy link
Contributor Author

hmm I know it's prob frowned upon but maybe we should disable this clippy redundant_clone failure on these tests. We need to clone the Arc so that we still have one around after one of them gets moved to run_actix(async move { .. }), otherwise the tempfile gets dropped too soon. So the warning message that says "this value is dropped without further use" is not quite right because yes the original value is indeed dropped without further use, but I care to keep it around exactly because of when it will be dropped.

I guess I could also just add redundant drop() calls on the original Arcs at the bottom, but feels weird

@nagisa have you ever seen this kind of clippy problem? Or am I doing something silly?

@nagisa
Copy link
Collaborator

nagisa commented Sep 26, 2024

drop to me seems like a very natural way to indicate that this data has to live up to that exact point. I have used this for the same purposes quite frequently even without a strict need (e.g. for mutex guards.)

For the most part it is also a great way to tell Rust that you want to move the guard/resource into a future, thus tying the lifetime of the two together, when there are no other "proper" uses of the variable.

Last option for tempfile specifically is to use reopen and close methods rather than Arc.

Copy link
Collaborator

@saketh-are saketh-are left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this.

The Received an invalid block during state sync messages are normal while a node is performing state sync; they occur because the node is too far behind to validate any blocks at the head of the chain. I think we should consider something along the lines of silently ignoring blocks past a certain height while the node is in state sync.

@@ -45,8 +45,13 @@ fn sync_state_nodes() {
let mut near1 = load_test_config("test1", port1, genesis.clone());
near1.network_config.peer_store.boot_nodes = convert_boot_nodes(vec![]);
near1.client_config.min_num_peers = 0;

let _dir1 = Arc::new(tempfile::Builder::new().prefix("sync_nodes_1").tempdir().unwrap());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to leave a comment here explaining the observed issues with directories being cleaned up too soon.

@marcelo-gonzalez marcelo-gonzalez added this pull request to the merge queue Oct 4, 2024
Merged via the queue into near:master with commit 0aa1343 Oct 4, 2024
30 checks passed
@marcelo-gonzalez marcelo-gonzalez deleted the state-sync-tests branch October 4, 2024 05:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants