Skip to content

Fix: flaky net::tests::convergence tests #6046

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 1, 2025

Conversation

obycode
Copy link
Contributor

@obycode obycode commented Apr 28, 2025

Description

If my read of these tests is correct, we are just looking for convergence, and these tests should not be testing the pruning of neighbors, so it is best to set these soft max values to be the same as the total number of peers to avoid any non-deterministic pruning. The pruning should be tested elsewhere. See later comments and fixes.

Applicable issues

@obycode obycode requested a review from a team as a code owner April 28, 2025 20:29
@obycode obycode changed the base branch from master to develop April 28, 2025 20:30
@obycode obycode requested review from kantai and jferrant April 28, 2025 20:30
@obycode
Copy link
Contributor Author

obycode commented Apr 28, 2025

I made this change based on seeing peers being dropped in the logs with the reason, "the peer’s org dominates our peer table.”

@obycode
Copy link
Contributor Author

obycode commented Apr 29, 2025

Hmm... based on that CI failure, it seems that this is not the problem here.

@obycode obycode marked this pull request as draft April 29, 2025 01:29
@orsissimo
Copy link
Contributor

In local and in my CI, those tests failed (with assertion left == right failed)
net::tests::convergence::test_walk_star_allowed_15 (Failed 16 times out of 140)
net::tests::convergence::test_walk_star_15_plain (Failed 19 times out of 140)

While net::tests::convergence::test_walk_ring_15_org_biased never did - but I further checked the CI runs and saw failures during some PR merges (4 failed out of 25 checked)

One thing I couldn't do was running the tests as a group and not one by one in CI (I could do it in local tho).

I catched those failures:

For net::tests::convergence::test_walk_star_15_plain

thread '<unnamed>' panicked at stackslib/src/net/tests/convergence.rs:1099:17:
assertion `left == right` failed
  left: [QualifiedContractIdentifier { issuer: StandardPrincipalData(S100000000000000000000J197CZ), name: ContractName("db-0") }, QualifiedContractIdentifier { issuer: StandardPrincipalData(S1G2081040G2081040G2081040G208105NK8PE5), name: ContractName("db-1") }]
 right: []

Failures are often preceded by:

INFO [stackslib/src/net/stackerdb/mod.rs:509] [ThreadId(3)] No stacker DB config for S12081040G2081040G2081040G2081040HYKVZCC.db-4
INFO [stackslib/src/net/p2p.rs:2013] [ThreadId(3)] Neighbor accepted!, public key: None, address: 127.0.0.1

While for net::tests::convergence::test_walk_star_allowed_15

thread '<unnamed>' panicked at stackslib/src/net/tests/convergence.rs:1099:17:
assertion `left == right` failed
  left: [QualifiedContractIdentifier { issuer: StandardPrincipalData(S100000000000000000000J197CZ), name: ContractName("db-0") }, QualifiedContractIdentifier { issuer: StandardPrincipalData(S1G2081040G2081040G2081040G208105NK8PE5), name: ContractName("db-1") }, ...]
 right: []

Failures are often preceded by:

INFO [stackslib/src/net/stackerdb/mod.rs:509] [ThreadId(3)] No stacker DB config for S130C1G60R30C1G60R30C1G60R30C1G60V982B5Y.db-6
INFO [stackslib/src/net/stackerdb/mod.rs:509] [ThreadId(3)] No stacker DB config for S1G2081040G2081040G2081040G208105NK8PE5.db-1

obycode added 2 commits April 30, 2025 10:29
This was one cause of flaky `net::tests::convergence` tests.
This was another cause of flakiness in `net::tests::convergence` tests.
@obycode obycode force-pushed the fix/flaky-net-tests branch from 3c4e1e9 to 80f5cca Compare April 30, 2025 14:31
@obycode obycode marked this pull request as ready for review April 30, 2025 14:32
@obycode
Copy link
Contributor Author

obycode commented Apr 30, 2025

Updated with what I think is the correct fix (2 fixes actually).

  • Peer 0 was not getting any stacker DBs, but it was expected to because 0 % 2 == 0
  • The wait for convergence did not also wait for stacker DB convergence

@jferrant
Copy link
Contributor

Noice. LGTM

Copy link
Contributor

@kantai kantai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@obycode obycode requested a review from wileyj May 1, 2025 17:02
@obycode obycode enabled auto-merge May 1, 2025 17:02
@wileyj wileyj linked an issue May 1, 2025 that may be closed by this pull request
@obycode obycode added this pull request to the merge queue May 1, 2025
Merged via the queue into stacks-network:develop with commit d4507ee May 1, 2025
205 of 206 checks passed
@obycode obycode deleted the fix/flaky-net-tests branch May 1, 2025 17:47
Copy link

codecov bot commented May 1, 2025

Codecov Report

Attention: Patch coverage is 90.90909% with 3 lines in your changes missing coverage. Please review.

Project coverage is 83.45%. Comparing base (b1545ee) to head (8ab3024).
Report is 46 commits behind head on develop.

Files with missing lines Patch % Lines
stackslib/src/net/tests/convergence.rs 90.90% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #6046      +/-   ##
===========================================
+ Coverage    77.95%   83.45%   +5.50%     
===========================================
  Files          538      538              
  Lines       387064   387015      -49     
  Branches       323      323              
===========================================
+ Hits        301730   322985   +21255     
+ Misses       85326    64022   -21304     
  Partials         8        8              
Files with missing lines Coverage Δ
stackslib/src/net/tests/convergence.rs 93.77% <90.90%> (-0.13%) ⬇️

... and 219 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d16bd3a...8ab3024. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants