Skip to content
This repository has been archived by the owner on Oct 31, 2024. It is now read-only.

fix(p2p): rework ticks of bootstrap query interval #483

Merged
merged 1 commit into from
Mar 21, 2024

Conversation

Freyskeyd
Copy link
Member

@Freyskeyd Freyskeyd commented Mar 21, 2024

Description

This PR reworks how we manage the ticks of the bootstrap queries.

After investigating an issue on devnet I saw that two queries were executed at the same second.
While the expected behavior is to have a delay of 60sec (by default).

Why ?

09:33:29: DEBUG -> Started kademlia bootstrap query with query_id: QueryId(0)
                    ^ (1) This is the first query is created when the node starts
09:33:29: INFO  -> Dialing bootpeer 16Uiu2HAmTqPREKjx7dYpcEsjByzy8yZMgjipzAcStMFQBAvZZL73 on connection: 1
                    ^ We dial the other peer for this query
09:33:29: WARN  -> Unable to connect to bootpeer 16Uiu2HAmTqPREKjx7dYpcEsjByzy8yZMgjipzAcStMFQBAvZZL73
                    ^ The other bootnode is not available
09:33:29: WARN  -> Bootstrap query finished but unable to connect to bootnode during initialization, switching to unhealthy and fast bootstrap mode
                                    ^ The query finished but the bootnode was unreachable, switching to fast mode (5sec interval query)
09:33:29: INFO  -> Dialing bootpeer 16Uiu2HAmTqPREKjx7dYpcEsjByzy8yZMgjipzAcStMFQBAvZZL73 on connection: 2
                    ^ (2) New query created !
09:33:29: WARN  -> Unable to connect to bootpeer 16Uiu2HAmTqPREKjx7dYpcEsjByzy8yZMgjipzAcStMFQBAvZZL73
09:33:29: WARN  -> Bootstrap query finished but unable to connect to bootnode, retrying 2 more times

As we can see on the logs, the first query (1) fails, and we switch to fast mode for bootstrap as expected.
But the problem is that after changing the mode (basically, creating a new interval with shorter period), the next query (2) is directly executed.

This is due to the fact that the first tick of interval completes immediately and then the discovery poll function fire a new query.

Fix

To fix that I updated the change_interval to await and ignore the first tick and switch the MissedTickBehavior to delay as we don't need any catchup on missing tick.

PR Checklist:

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added or updated tests that comprehensively prove my change is effective or that my feature works

@Freyskeyd Freyskeyd requested a review from a team as a code owner March 21, 2024 16:00
Copy link
Contributor

@dvdplm dvdplm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

retrying 2 more times

One question on this: why just 3 retries? Ideally we'd keep retrying (perhaps backing off over time).

crates/topos-p2p/src/behaviour/discovery.rs Show resolved Hide resolved
crates/topos-p2p/src/behaviour/discovery.rs Show resolved Hide resolved
@Freyskeyd Freyskeyd force-pushed the fix/bootstrap-query-interval-tick branch from 369cfb8 to 5330083 Compare March 21, 2024 18:26
Copy link

codecov bot commented Mar 21, 2024

Codecov Report

Attention: Patch coverage is 89.74359% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 72.16%. Comparing base (b2a1af0) to head (5330083).

Files Patch % Lines
...es/topos-p2p/src/runtime/handle_event/discovery.rs 85.18% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #483      +/-   ##
==========================================
+ Coverage   72.09%   72.16%   +0.07%     
==========================================
  Files         227      218       -9     
  Lines       12831    12734      -97     
==========================================
- Hits         9251     9190      -61     
+ Misses       3580     3544      -36     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hadjiszs hadjiszs merged commit 5b6ddb8 into main Mar 21, 2024
21 checks passed
@hadjiszs hadjiszs deleted the fix/bootstrap-query-interval-tick branch March 21, 2024 18:46
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants