sync2: implement multi-peer synchronization #6358

ivan4th · 2024-09-30T13:23:31Z

Motivation

syncv2 must keep the network in sync by periodically synchronizing against multiple peers, including when a fresh or stale node starts up.
When a lot of data needs to be transferred during sync, it is better to spread the load across multiple peers to avoid costly ax/1-like requests.

Description

#6404 needs to be merged before this one.

This adds multi-peer synchronization support.
When the local set differs too much from the remote sets, making pairwise sync degrade to transferring the whole set, "torrent-style" "split sync" is attempted which splits the set into subranges and syncs each sub-range against a separate peer. Otherwise, the full sync is done, syncing the whole set against each of the synchronization peers.
Full sync is also done after each split sync run.
The local set can be considered synchronized after the specified number of full syncs has happened.
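
A minimal sketch of the split-versus-full decision described above. The `probeResult` type, the `chooseMode` helper, and both thresholds are assumptions made for illustration; the PR's actual criteria may differ.

```go
package main

import "fmt"

// probeResult is a hypothetical summary of probing one peer.
type probeResult struct {
	Count int     // number of items the peer reports
	Sim   float64 // estimated similarity to the local set, 0..1
}

// syncMode names the strategy chosen for one synchronization round.
type syncMode string

const (
	fullSync  syncMode = "full"
	splitSync syncMode = "split"
)

// chooseMode picks split sync when at least minSplitPeers peers are available
// and any of them looks too different from the local set (Sim below minSim).
// Both thresholds are assumptions made for this sketch.
func chooseMode(probes []probeResult, minSim float64, minSplitPeers int) syncMode {
	if len(probes) < minSplitPeers {
		return fullSync // too few peers to split the key space across
	}
	for _, p := range probes {
		if p.Sim < minSim {
			return splitSync
		}
	}
	return fullSync
}

func main() {
	near := []probeResult{{Count: 100, Sim: 0.99}, {Count: 101, Sim: 0.98}}
	far := []probeResult{{Count: 100000, Sim: 0.10}, {Count: 100200, Sim: 0.12}}
	fmt.Println(chooseMode(near, 0.9, 2)) // full
	fmt.Println(chooseMode(far, 0.9, 2))  // split
}
```

In this sketch a split round would be followed by a full round, matching the statement that full sync is also done after each split sync run.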

The approach is loosely based on the paper "SREP: Out-Of-Band Sync of Transaction Pools for Large-Scale Blockchains" by Novak Boškov, Sevval Simsek, Ari Trachtenberg, and David Starobinski.

codecov · 2024-09-30T13:50:51Z

Codecov Report

Attention: Patch coverage is 83.90663% with 131 lines in your changes missing coverage. Please review.

Project coverage is 79.8%. Comparing base (d32ffaa) to head (cf46587).
Report is 3 commits behind head on develop.

Files with missing lines	Patch %	Lines
sync2/multipeer/split_sync.go	70.7%	29 Missing and 9 partials ⚠️
sync2/multipeer/multipeer.go	89.0%	22 Missing and 6 partials ⚠️
sync2/p2p.go	67.0%	21 Missing and 5 partials ⚠️
sync2/multipeer/setsyncbase.go	78.7%	15 Missing and 9 partials ⚠️
sync2/multipeer/sync_queue.go	86.8%	5 Missing and 3 partials ⚠️
sync2/rangesync/rangesync.go	88.7%	4 Missing and 3 partials ⚠️

Additional details and impacted files

@@            Coverage Diff            @@
##           develop   #6358     +/-   ##
=========================================
+ Coverage     79.7%   79.8%   +0.1%     
=========================================
  Files          328     335      +7     
  Lines        42977   43656    +679     
=========================================
+ Hits         34256   34860    +604     
- Misses        6782    6831     +49     
- Partials      1939    1965     +26


Once the recent-item sync is done (if it is needed at all), the range set reconciliation algorithm no longer depends on newly received items being added to the set, so we can save memory by not adding the received items during reconciliation. During a real sync, the received items are sent to their respective handlers, and after the corresponding data are fetched and validated, they are added to the database. There is no need to add them to the cloned OrderedSets that are used to sync against particular peers.

ivan4th · 2024-10-23T10:30:28Z

Given that no review comments have been added yet, I've rebased the PR on top of #6404, squashing commits one more time.

This adds multi-peer synchronization support. When the local set differs too much from the remote sets, "torrent-style" "split sync" is attempted which splits the set into subranges and syncs each sub-range against a separate peer. Otherwise, the full sync is done, syncing the whole set against each of the synchronization peers. Full sync is also done after each split sync run. The local set can be considered synchronized after the specified number of full syncs has happened. The approach is loosely based on [SREP: Out-Of-Band Sync of Transaction Pools for Large-Scale Blockchains](https://people.bu.edu/staro/2023-ICBC-Novak.pdf) paper by Novak Boškov, Sevval Simsek, Ari Trachtenberg, and David Starobinski.

acud · 2024-10-23T14:46:10Z

sync2/multipeer/delim.go

+	"github.com/spacemeshos/go-spacemesh/sync2/rangesync"
+)
+
+func getDelimiters(numPeers, keyLen, maxDepth int) (h []rangesync.KeyBytes) {


not sure I understand the context of the word delimiters here. It isn't really clear what this function does

Added a comment describing the purpose of this function
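
For context, a sketch of what range delimiters can look like: numPeers-1 keys that cut the key space into numPeers roughly equal ranges, one per peer. The helper name `splitDelimiters` is hypothetical, the sketch ignores the `maxDepth` parameter, and it assumes keys of at least 8 bytes; it is not the PR's `getDelimiters`.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// splitDelimiters returns numPeers-1 keys that divide the space of
// keyLen-byte keys into numPeers roughly equal ranges. Only the first
// 8 bytes of each key vary; the rest stay zero.
func splitDelimiters(numPeers, keyLen int) [][]byte {
	if numPeers < 2 || keyLen < 8 {
		return nil // nothing to split, or key too short for this sketch
	}
	// Width of one range, measured on the first 8 bytes of the key.
	step := ^uint64(0)/uint64(numPeers) + 1
	out := make([][]byte, numPeers-1)
	for i := range out {
		k := make([]byte, keyLen)
		binary.BigEndian.PutUint64(k, step*uint64(i+1))
		out[i] = k
	}
	return out
}

func main() {
	// With 4 peers the delimiters start with bytes 40, 80 and c0.
	for _, k := range splitDelimiters(4, 32) {
		fmt.Printf("%02x\n", k[0])
	}
}
```

Peer i would then sync the range between delimiter i-1 and delimiter i, with the first and last ranges wrapping around the ends of the key space.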

sync2/multipeer/dumbset.go

acud · 2024-10-23T14:53:51Z

sync2/multipeer/interface.go

+// It extends rangesync.OrderedSet with methods which are needed for multi-peer
+// reconciliation.
+type OrderedSet interface {
+	rangesync.OrderedSet


do we really need to have these multiple interfaces that share the same name in a different package? why can't we just have one OrderedSet interface? also the inheritance syntax is confusing in this context. if this is strictly needed and we can't do without it, consider adding to the interface naming something that suggests that it has to do with syncing (iiuc from the comment).

I should probably do it indeed, this kind of splitting for this interface was motivated by the need to split the whole syncv2 thing into multiple PRs, but at this point it's already not needed that much

Merged multipeer.OrderedSet interface into rangesync.OrderedSet

acud · 2024-10-23T14:58:27Z

sync2/multipeer/interface.go

+	Has(rangesync.KeyBytes) (bool, error)
+	// Release releases the resources associated with the set.
+	// Calling Release on a set that is already released is a no-op.
+	Release() error


consider using io.Closer here

On one hand this seems to make sense, but OrderedSet is not really an I/O primitive, so I'm somewhat in doubt here.
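
One possible middle ground, sketched under assumptions: keep `Release` on the set and adapt it to `io.Closer` only where a caller needs one. The `releaser` interface, `closerAdapter`, and `fakeSet` are hypothetical names used for illustration.

```go
package main

import (
	"fmt"
	"io"
)

// releaser is the part of the set interface this sketch cares about.
type releaser interface {
	Release() error
}

// closerAdapter exposes a releaser as an io.Closer.
type closerAdapter struct{ r releaser }

func (c closerAdapter) Close() error { return c.r.Release() }

// Compile-time check that the adapter satisfies io.Closer.
var _ io.Closer = closerAdapter{}

// fakeSet stands in for a real set; Release is a no-op after the first call.
type fakeSet struct {
	released bool
	freed    int // counts how many times resources were actually freed
}

func (s *fakeSet) Release() error {
	if s.released {
		return nil
	}
	s.released = true
	s.freed++
	return nil
}

func main() {
	s := &fakeSet{}
	c := closerAdapter{r: s}
	_ = c.Close()
	_ = c.Close() // second call does nothing
	fmt.Println(s.freed) // 1
}
```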

acud · 2024-10-23T14:59:04Z

sync2/multipeer/interface.go

+}
+
+// Syncer is a synchronization interface for a single peer.
+type Syncer interface {


nit: PeerSyncer?

Makes sense, will rename

acud · 2024-10-23T16:59:52Z

sync2/multipeer/sync_queue.go

+	return sr
+}
+
+func newSyncQueue(numPeers, keyLen, maxDepth int) syncQueue {


I wonder why not to return a pointer here? also all the methods have pointer semantics.

syncQueue is just a slice, not a struct, so the pointer is only needed for the methods that modify it, as there's not much copying involved
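
A sketch of the slice-type pattern referred to here, assuming the queue is a heap ordered by last sync time (an assumption, as are the `item` and `queue` names). Read-only methods use value receivers because copying a slice header is cheap; only `Push` and `Pop` need a pointer because they change the slice length.

```go
package main

import (
	"container/heap"
	"fmt"
	"time"
)

// item is a hypothetical queue entry: a named range and its last sync time.
type item struct {
	name     string
	lastSync time.Time
}

// queue is a plain slice type that implements heap.Interface.
type queue []*item

func (q queue) Len() int           { return len(q) }
func (q queue) Less(i, j int) bool { return q[i].lastSync.Before(q[j].lastSync) }
func (q queue) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }

// Push and Pop modify the slice header, so they need pointer receivers.
func (q *queue) Push(x any) { *q = append(*q, x.(*item)) }

func (q *queue) Pop() any {
	old := *q
	n := len(old)
	it := old[n-1]
	old[n-1] = nil // drop the reference so the item can be collected
	*q = old[:n-1]
	return it
}

// drain pops every item, least recently synced first.
func drain(q *queue) []string {
	var names []string
	for q.Len() > 0 {
		names = append(names, heap.Pop(q).(*item).name)
	}
	return names
}

// example builds a small queue and returns the pop order.
func example() []string {
	now := time.Now()
	q := queue{}
	heap.Push(&q, &item{name: "b", lastSync: now.Add(-time.Minute)})
	heap.Push(&q, &item{name: "c", lastSync: now})
	heap.Push(&q, &item{name: "a", lastSync: now.Add(-time.Hour)})
	return drain(&q)
}

func main() {
	fmt.Println(example()) // [a b c]
}
```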

acud · 2024-10-23T17:04:28Z

sync2/multipeer/synclist.go

+	for sl.syncs.Len() != 0 {
+		el := sl.syncs.Back()
+		if t.After(el.Value.(time.Time)) {
+			sl.syncs.Remove(el)


can this not cause a memory leak? i.e. if a series of doubly linked list items get cut off from the rest, it means they might just continue living in memory because they reference each other (won't show up in the gc mark-and-sweep runs).

Go's doubly linked list implementation unlinks the list element properly: https://github.com/golang/go/blob/go1.23.2/src/container/list/list.go#L108-L115
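
A self-contained sketch of the pruning loop under discussion, using only `container/list`. The `syncLog` type and its method names are hypothetical. `List.Remove` clears the removed element's links, and Go's tracing collector reclaims unreachable objects even when they reference each other, so the removed elements do not stay alive.

```go
package main

import (
	"container/list"
	"fmt"
	"time"
)

// syncLog keeps sync timestamps, newest at the front.
// The zero value of list.List is an empty list ready to use.
type syncLog struct {
	syncs list.List
}

// note records a sync that happened at time t.
func (l *syncLog) note(t time.Time) { l.syncs.PushFront(t) }

// prune removes entries older than cutoff, starting from the back,
// where the oldest entries are.
func (l *syncLog) prune(cutoff time.Time) {
	for l.syncs.Len() != 0 {
		el := l.syncs.Back()
		if !cutoff.After(el.Value.(time.Time)) {
			break // everything closer to the front is newer
		}
		l.syncs.Remove(el)
	}
}

// example records three syncs and prunes those older than 90 seconds.
func example() int {
	now := time.Now()
	var l syncLog
	l.note(now.Add(-3 * time.Minute))
	l.note(now.Add(-2 * time.Minute))
	l.note(now.Add(-1 * time.Minute))
	l.prune(now.Add(-90 * time.Second))
	return l.syncs.Len()
}

func main() {
	fmt.Println(example()) // 1
}
```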

acud · 2024-10-23T17:06:49Z

sync2/p2p.go

+	"github.com/spacemeshos/go-spacemesh/sync2/rangesync"
+)
+
+type Dispatcher = rangesync.Dispatcher


why is the type alias needed?

My idea was for sync2 package to serve as a facade that hides all the implementation details of sync itself beneath it

That's fine, I'm still not sure I understand the type-aliasing narrative though. Not sure how the two are related. Unless you're decorating the original type with more functionality I don't see why this should be necessary. It just adds more indirection to an already quite large package. If you need to leak stuff out of the package, maybe better to do it through interfaces instead of type aliasing (that's just my opinion though).

Ok it was a remnant from an older iteration where Dispatcher did not use the constructor, etc.
Given that rangesync.OrderedSet etc. is needed anyway, I removed this type alias.

acud · 2024-10-23T17:08:31Z

sync2/p2p.go

+	s.reconciler = multipeer.NewMultiPeerReconciler(
+		s.syncBase, peers, keyLen, maxDepth,
+		multipeer.WithLogger(logger),
+		multipeer.WithSyncPeerCount(cfg.SyncPeerCount),


seems like a good candidate for a Config type. many function calls that can be avoided

Switched to config type (also did that for rangesync package)
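
A sketch of the config-type approach, not the PR's actual definition. `SyncPeerCount` appears in the reviewed hunk; `MinSplitSyncPeers`, the default values, `Validate`, and `newReconciler` are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Config gathers the tunables that were previously passed as options.
type Config struct {
	SyncPeerCount     int // number of peers to sync against
	MinSplitSyncPeers int // hypothetical: minimum peers needed for split sync
}

// DefaultConfig returns placeholder defaults chosen for this sketch.
func DefaultConfig() Config {
	return Config{SyncPeerCount: 20, MinSplitSyncPeers: 2}
}

// Validate rejects settings that cannot work.
func (c Config) Validate() error {
	if c.SyncPeerCount < 1 {
		return errors.New("SyncPeerCount must be positive")
	}
	if c.MinSplitSyncPeers < 2 {
		return errors.New("MinSplitSyncPeers must be at least 2")
	}
	return nil
}

// reconciler is a stand-in for the component being configured.
type reconciler struct{ cfg Config }

func newReconciler(cfg Config) (*reconciler, error) {
	if err := cfg.Validate(); err != nil {
		return nil, err
	}
	return &reconciler{cfg: cfg}, nil
}

func main() {
	cfg := DefaultConfig()
	cfg.SyncPeerCount = 5 // override one field, keep the other defaults
	r, err := newReconciler(cfg)
	fmt.Println(r.cfg.SyncPeerCount, err) // 5 <nil>
}
```

Compared with one option function per setting, a single struct makes the full set of tunables visible in one place and can be embedded in the node's configuration.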

acud · 2024-10-23T17:10:22Z

sync2/p2p.go

+		return
+	}
+	s.running.Store(true)
+	s.start.Do(func() {


not sure why Once is needed? can it be that it will be called from multiple places?

The idea was for the Start method to be idempotent

I don't quite understand the reasoning - Start can be called any amount of times - but only the first time it actually does something? Start - Stop - Start causes running to be true but the component is actually in the stopped state?

The idea is that there's no harm in invoking Start multiple times on a P2PHashSync, but after you Stop() it, you throw it away (it is non-restartable)
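
A sketch of the lifecycle described here: `Start` may be called any number of times, and once `Stop` has run the object stays stopped. The `worker` type is hypothetical and simpler than `P2PHashSync`; in particular it has no `running` flag.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// worker is idempotent to start and cannot be restarted after Stop.
type worker struct {
	once   sync.Once
	cancel context.CancelFunc
	done   chan struct{}
	runs   int // how many times the background loop was launched
}

func newWorker() *worker { return &worker{done: make(chan struct{})} }

// Start launches the background loop on the first call only.
func (w *worker) Start() {
	w.once.Do(func() {
		ctx, cancel := context.WithCancel(context.Background())
		w.cancel = cancel
		w.runs++
		go func() {
			defer close(w.done)
			<-ctx.Done() // stand-in for the real sync loop
		}()
	})
}

// Stop cancels the loop and waits for it to finish.
func (w *worker) Stop() {
	// Consume the Once so that a Start after Stop does nothing,
	// even if Start was never called before.
	w.once.Do(func() { close(w.done) })
	if w.cancel != nil {
		w.cancel()
	}
	<-w.done
}

func main() {
	w := newWorker()
	w.Start()
	w.Start() // no-op
	w.Stop()
	w.Start()           // still a no-op: the worker is not restartable
	fmt.Println(w.runs) // 1
}
```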

fasmat · 2024-10-30T23:14:17Z

sync2/p2p.go

+		s.eg.Go(func() error {
+			defer s.running.Store(false)
+			var ctx context.Context
+			ctx, s.cancel = context.WithCancel(context.Background())
+			return s.reconciler.Run(ctx)
+		})


Start just serves as a wrapper around the Run method here. Would it make sense, instead of having the Start and Stop methods and the s.running field, to just have a Run method that passes the context along to s.reconciler? This would also get rid of the need for s.cancel.

P2PHashSync has active sync enable/disable flag as its config option. That's part of P2PHashSync logic.
When cfg.EnableActiveSync is false, Start / Stop are noops. In this case, P2PHashSync only serves requests received from a p2p Server via a Dispatcher.

fasmat · 2024-10-30T23:21:36Z

sync2/multipeer/multipeer_test.go

+		var ctx context.Context
+		for i := 0; i < numSyncs; i++ {
+			pl := mt.expectProbe(6, rangesync.ProbeResult{
+				FP:    "foo",
+				Count: 100,
+				Sim:   0.99, // high enough for full sync
+			})
+			mt.expectFullSync(pl, 6, 0)
+			mt.syncBase.EXPECT().Wait()
+			if i == 0 {
+				//nolint:fatcontext
+				ctx = mt.start()
+			} else {
+				// first full sync happens immediately
+				mt.clock.Advance(time.Minute)
+			}
+			mt.clock.BlockUntilContext(ctx, 1)
+			mt.satisfy()
+		}


If the first loop iteration behaves differently, would it make sense to indicate that more clearly? This should also get rid of the linter warning.

Suggested change

-		var ctx context.Context
-		for i := 0; i < numSyncs; i++ {
-			pl := mt.expectProbe(6, rangesync.ProbeResult{
-				FP:    "foo",
-				Count: 100,
-				Sim:   0.99, // high enough for full sync
-			})
-			mt.expectFullSync(pl, 6, 0)
-			mt.syncBase.EXPECT().Wait()
-			if i == 0 {
-				//nolint:fatcontext
-				ctx = mt.start()
-			} else {
-				// first full sync happens immediately
-				mt.clock.Advance(time.Minute)
-			}
-			mt.clock.BlockUntilContext(ctx, 1)
-			mt.satisfy()
-		}
+		pl := mt.expectProbe(6, rangesync.ProbeResult{
+			FP:    "foo",
+			Count: 100,
+			Sim:   0.99, // high enough for full sync
+		})
+		mt.expectFullSync(pl, 6, 0)
+		mt.syncBase.EXPECT().Wait()
+		mt.clock.Advance(time.Minute) // first sync happens immediately
+		mt.clock.BlockUntilContext(context.Background(), 1)
+		mt.satisfy()
+		for i := 1; i < numSyncs; i++ {
+			pl := mt.expectProbe(6, rangesync.ProbeResult{
+				FP:    "foo",
+				Count: 100,
+				Sim:   0.99, // high enough for full sync
+			})
+			mt.expectFullSync(pl, 6, 0)
+			mt.syncBase.EXPECT().Wait()
+			ctx := mt.start()
+			mt.clock.BlockUntilContext(ctx, 1)
+			mt.satisfy()
+		}

It's the other way around, we need to do mt.start() in the initial iteration and advance the clocks in the following ones. But otherwise that's probably the right idea, so I updated the code, except that I wrapped expectations in the nested func expect to highlight the fact that the expectations are the same right after startup and when it's time to do more syncs

fasmat · 2024-10-30T23:30:01Z

sync2/p2p.go

+	peer, found := server.ContextPeerID(ctx)
+	if !found {
+		panic("BUG: no peer ID found in the handler")
+	}


Would it make sense to pass the peer explicitly instead of putting it into the context and then panicking when it isn't there? Afaik this is the only place in our codebase at the moment where we put values in the context at all. A missing value should not lead to a panic or this feels like a misuse of context.WithValue to me 🤔

That entailed quite a few changes in unrelated code as most of the existing p2p.Server use cases don't care about the peer ID, but I still updated the p2p.Server and got rid of that context key
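
A sketch of the handler shape after this change, under assumptions: `PeerID`, `Handler`, `echo`, and `serve` are hypothetical stand-ins, not the real p2p.Server API. With the peer as an explicit argument there is no context lookup that can fail, so no panic path is needed.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// PeerID is a hypothetical stand-in for the real p2p peer ID type.
type PeerID string

// Handler receives the peer as an explicit argument rather than
// looking it up in the context.
type Handler func(ctx context.Context, peer PeerID, req []byte) ([]byte, error)

// echo is a toy handler that tags the request with the peer ID.
func echo(_ context.Context, peer PeerID, req []byte) ([]byte, error) {
	if peer == "" {
		return nil, errors.New("empty peer ID")
	}
	return []byte(string(peer) + ":" + string(req)), nil
}

// serve plays the role of the server: it knows which peer sent the
// request and hands that to the handler directly.
func serve(h Handler, peer PeerID, req []byte) ([]byte, error) {
	return h(context.Background(), peer, req)
}

func main() {
	out, err := serve(echo, "peerA", []byte("hi"))
	fmt.Println(string(out), err) // peerA:hi <nil>
}
```

Handlers that do not care about the peer can simply ignore the argument, which is the cost paid in the unrelated call sites mentioned above.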

ivan4th requested review from dshulyak, fasmat, poszu and acud as code owners September 30, 2024 13:23

ivan4th force-pushed the sync2/multipeer branch from f472048 to 71c9f56 Compare September 30, 2024 13:29

ivan4th force-pushed the syncv2/pairwise branch from 2e8e59c to c22a7c4 Compare October 9, 2024 05:13

ivan4th requested a review from jellonek as a code owner October 9, 2024 17:31

spacemesh-bors bot changed the base branch from syncv2/pairwise to develop October 17, 2024 05:40

ivan4th force-pushed the sync2/multipeer branch 3 times, most recently from b8ea626 to 82b5a72 Compare October 21, 2024 03:54

ivan4th force-pushed the sync2/multipeer branch from 6f4b0ca to 500ab74 Compare October 23, 2024 10:29

ivan4th changed the base branch from develop to sync2/rangesync-recent October 23, 2024 10:29

ivan4th mentioned this pull request Oct 23, 2024

sync2: add sqlstore #6405

Open

ivan4th force-pushed the sync2/multipeer branch from 4c57c22 to ef30f47 Compare October 23, 2024 12:19

acud requested changes Oct 23, 2024


ivan4th added 2 commits October 27, 2024 02:43

sync2: add description of multipeer reconciliation

af95c7f

sync2: doc update

b062eaa

spacemesh-bors bot changed the base branch from sync2/rangesync-recent to develop October 28, 2024 12:15

ivan4th added 3 commits October 28, 2024 21:52

Merge branch 'develop' into sync2/multipeer

3bfaa3f

Merge branch 'develop' into sync2/multipeer

ef484c5

sync2: address comments

0cf1678

fasmat reviewed Oct 30, 2024


ivan4th added 2 commits October 31, 2024 03:15

sync2: multipeer: add error decoration

287d76d

sync2: remove Dispatcher type alias

51166fa

fasmat reviewed Oct 30, 2024


ivan4th added 2 commits October 31, 2024 06:23

sync2: multipeer: refactor tests

c58d690

p2p: server: pass PeerID as an explicit argument to the handler

cf46587
