feat(l1): snap sync overhaul #1763

Merged
merged 208 commits into main from sync-overhaul on Jan 29, 2025

Conversation

fmoletta
Contributor

@fmoletta fmoletta commented Jan 21, 2025

Motivation
This PR introduces the following upgrades for snap-sync:

  • Use DB-persisted checkpoints so that sync progress survives restarts & cycles (see the sketch at the end of this section)
  • Stop ForkChoices & NewPayloads from being applied while syncing
  • Improved handling of stale pivot during sub-processes
  • Improved handling of pending requests when aborting due to stale pivot
  • Fetching of large storage tries (that don't fit in a single range request)
  • Safer (but a bit slower) healing that can be restarted
  • Faster storage fetching (multiple parallel fetches)

It also simplifies snap sync by removing the following logic:

  • No longer downloads bodies and receipts for blocks before the pivot during snap sync (WARNING: this goes against the spec but shouldn't be a problem for the time being)
  • Removes the restart from the latest block when `latest - 64` becomes stale (by that point it is more effective to wait for the next fork choice update)
  • Periodically shows state sync progress
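
A rough sketch of the checkpoint idea follows. The `SnapCheckpoint` fields and `Store` methods here are hypothetical stand-ins, not the actual ethrex API; the point is simply that an aborted cycle can resume from the persisted progress instead of starting over:

```rust
// Hypothetical sketch: these types and method names are illustrative,
// not the actual ethrex storage API.
use std::collections::HashMap;

type H256 = [u8; 32];

#[derive(Clone)]
struct SnapCheckpoint {
    /// Hash of the last block header downloaded before the sync was interrupted.
    last_downloaded_block: H256,
    /// State root the interrupted cycle was syncing towards.
    pivot_state_root: H256,
    /// Last account hash fetched during the state-range phase.
    last_fetched_key: H256,
}

/// Stand-in for the node's persistent store (a real implementation would
/// write these entries to the database rather than keep them in memory).
#[derive(Default)]
struct Store {
    checkpoints: HashMap<&'static str, SnapCheckpoint>,
}

impl Store {
    fn set_snap_checkpoint(&mut self, cp: SnapCheckpoint) {
        self.checkpoints.insert("snap_sync", cp);
    }
    fn snap_checkpoint(&self) -> Option<SnapCheckpoint> {
        self.checkpoints.get("snap_sync").cloned()
    }
    /// Called once snap sync completes successfully.
    fn clear_snap_checkpoint(&mut self) {
        self.checkpoints.remove("snap_sync");
    }
}

/// Resume the account-range fetch from the persisted key, or start from
/// the beginning of the keyspace on a fresh sync.
fn next_starting_key(store: &Store) -> H256 {
    store
        .snap_checkpoint()
        .map(|cp| cp.last_fetched_key)
        .unwrap_or([0u8; 32])
}
```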

Description

  • Stores the last downloaded block's hash in the DB during snap sync to serve as a checkpoint if the sync is aborted halfway (a common case when syncing from genesis). This checkpoint is cleared upon successful snap sync.

  • No longer fetches receipts or block bodies for blocks before the pivot block during snap sync

  • Adds a sync_status method which returns an enum with the current sync status (either Inactive, Active, or Pending) and uses it in the ForkChoiceUpdate & NewPayload engine RPC endpoints so that their logic isn't applied during an active or pending sync (a rough sketch follows this list).

  • Fetcher processes now identify stale pivots and remain passive until they receive the end signal

  • Fetcher processes now return their current queue when they exit so that it can be persisted into the next cycle

  • Stores the latest state root during state sync and healing as a checkpoint

  • Stores the last fetched key during state sync as a checkpoint

  • Healing no longer stores the nodes received via p2p; instead it inserts the leaf values and rebuilds the trie, avoiding trie corruption between restarts.

  • The current progress percentage and estimated time to finish are periodically reported during state sync (one possible way to compute these from the checkpoint is sketched after this list)

  • Disables the following Paris & Cancun engine hive tests that previously yielded false positives due to new payloads being accepted on top of a syncing chain:

    • Invalid NewPayload (family)
    • Re-Org Back to Canonical Chain From Syncing Chain
    • Unknown HeadBlockHash
    • In-Order Consecutive Payload Execution (Flaky)
    • Valid NewPayload->ForkchoiceUpdated on Syncing Client
    • Invalid Missing Ancestor ReOrg
      • Payload Build after New Invalid Payload (only Cancun)
  • It also disables the following tests, which fail with the flag Syncing=true for the same reason:

    • Bad Hash on NewPayload
    • ParentHash equals BlockHash on NewPayload (only for Paris)
    • Invalid PayloadAttributes (family)
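
The sync-status guard described in this list could look roughly like the sketch below. The Inactive/Active/Pending variants come from this PR's description; the surrounding types, method names, and handler shape are assumptions, not the actual implementation:

```rust
// Illustrative only: the enum variants come from the PR description,
// the surrounding types and method names are hypothetical.
#[derive(Clone, Copy, PartialEq, Eq)]
enum SyncStatus {
    /// No snap sync in progress.
    Inactive,
    /// A snap-sync cycle is currently running.
    Active,
    /// A sync has been requested but has not started yet.
    Pending,
}

struct SyncManager {
    status: SyncStatus,
}

impl SyncManager {
    fn sync_status(&self) -> SyncStatus {
        self.status
    }
}

/// Sketch of how an Engine API handler could short-circuit while syncing.
fn handle_new_payload(sync: &SyncManager /* , payload: NewPayload */) -> &'static str {
    match sync.sync_status() {
        // Don't apply payloads on top of a chain we are still syncing;
        // just report SYNCING back to the consensus client.
        SyncStatus::Active | SyncStatus::Pending => "SYNCING",
        SyncStatus::Inactive => {
            // ...validate and execute the payload as usual...
            "VALID"
        }
    }
}
```

Likewise, one possible way to derive the reported progress and ETA from the persisted last-fetched-key checkpoint (an assumption about the approach, not necessarily what this PR does) is to read that key as a fraction of the 2^256 account keyspace:

```rust
use std::time::{Duration, Instant};

/// Approximate completion of the account-range phase by interpreting the
/// last fetched hashed key as a position inside the 2^256 keyspace.
/// Using only the first 8 bytes keeps the arithmetic simple and is more
/// than precise enough for a progress log line.
fn state_sync_progress(last_fetched_key: [u8; 32]) -> f64 {
    let prefix = u64::from_be_bytes(last_fetched_key[..8].try_into().unwrap());
    prefix as f64 / u64::MAX as f64
}

/// Naive ETA: assume the remaining keyspace is fetched at the same average
/// rate as the portion completed so far.
fn estimate_remaining(start: Instant, progress: f64) -> Option<Duration> {
    if progress <= 0.0 {
        return None;
    }
    let elapsed = start.elapsed().as_secs_f64();
    Some(Duration::from_secs_f64(elapsed * (1.0 - progress) / progress))
}
```
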

Misc:

Closes None

@fmoletta fmoletta marked this pull request as ready for review January 22, 2025 21:53
@fmoletta fmoletta requested a review from a team as a code owner January 22, 2025 21:53
@fmoletta fmoletta added this pull request to the merge queue Jan 29, 2025
Merged via the queue into main with commit ea3ae65 Jan 29, 2025
20 checks passed
@fmoletta fmoletta deleted the sync-overhaul branch January 29, 2025 14:01
github-merge-queue bot pushed a commit that referenced this pull request Feb 3, 2025
**Motivation**
Reverts the "slow but safe" approach to healing taken by #1763 and
instead adds the intermediate nodes received from peers to the trie.

**Description**
* Write nodes fetched during healing to the state instead of writing only leaf values (see the sketch after this list)
* Track paths left in queue when a healing cycle ends due to pivot
staleness
* Condense snap checkpoint clearing into a single method
* (Misc) Move some noisy p2p info tracing to debug
* (Fix) When executing full blocks after the pivot block during snap sync, execute each block before setting it as canonical
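
A minimal sketch of the node-writing idea in the first bullet; the types and the hashing helper are hypothetical, not the client's real trie API:

```rust
// Hypothetical sketch of writing healed trie nodes straight to the state:
// nodes received from peers are keyed by their hash, so they can be
// committed as-is without re-deriving them from leaf values.
use std::collections::HashMap;

type NodeHash = [u8; 32];

#[derive(Default)]
struct TrieState {
    nodes: HashMap<NodeHash, Vec<u8>>,
}

impl TrieState {
    /// Commit a batch of RLP-encoded nodes received from a peer response.
    /// `hash_node` stands in for the client's actual node hashing routine.
    fn commit_healed_nodes(
        &mut self,
        encoded_nodes: Vec<Vec<u8>>,
        hash_node: impl Fn(&[u8]) -> NodeHash,
    ) {
        for encoded in encoded_nodes {
            let hash = hash_node(&encoded);
            self.nodes.insert(hash, encoded);
        }
    }
}
```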

**Comments**
A not-so-pretty solution (the startup flag) was implemented in the storage_healer to avoid blocking storage healing at the start. This can and will be improved, but doing so in this PR is not worth it, since follow-up PRs will already touch this logic when parallelizing storage node fetching (similar to storage ranges).

**Testing**
This proved to work fine on the `Mekong` testnet when introducing forced sleeps and delays to trigger healing. It is still too slow for the Holesky testnet; this should improve once we add more parallelization to state sync & healing.
