Implement opportunistic mirror replicator #2510

masih · 2024-02-12T17:21:06Z

Implement a file store that opportunistically replicates data.

gammazero

Originally, the mirror was designed to talk to a separate read and write Filestore to facilitate replication. We just needed a way to configure that. The idea was that once the indexer caught up, the read and the write store would become the same Filestore.

This replicator sets up another mirror as an alternate source of data if the data is not already present in the destination. The only case where data to be indexed would be available in the destination is if a new indexer is re-indexing.
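The destination-first fallback described here can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the `Store` interface, `memStore`, `ErrNotFound`, and the simplified `Get`/`Put` signatures are all assumptions made for the example and do not match the actual filestore API.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound and Store are hypothetical stand-ins for illustration only;
// they are not the filestore package's real API.
var ErrNotFound = errors.New("not found")

type Store interface {
	Get(path string) ([]byte, error)
	Put(path string, data []byte) error
}

// memStore is an in-memory Store used only for this sketch.
type memStore map[string][]byte

func (m memStore) Get(path string) ([]byte, error) {
	data, ok := m[path]
	if !ok {
		return nil, ErrNotFound
	}
	return data, nil
}

func (m memStore) Put(path string, data []byte) error {
	m[path] = data
	return nil
}

// replicator reads from dst first and copies from src on a miss.
type replicator struct {
	src, dst Store
}

func (r replicator) Get(path string) ([]byte, error) {
	data, err := r.dst.Get(path)
	if err == nil {
		return data, nil
	}
	if !errors.Is(err, ErrNotFound) {
		return nil, err
	}
	// Miss at destination: fetch from source and replicate it first.
	data, err = r.src.Get(path)
	if err != nil {
		return nil, err
	}
	if err := r.dst.Put(path, data); err != nil {
		return nil, err
	}
	// Return the replica that now lives at the destination.
	return r.dst.Get(path)
}

func main() {
	src := memStore{"ad/1": []byte("hello")}
	dst := memStore{}
	r := replicator{src: src, dst: dst}

	data, err := r.Get("ad/1")
	fmt.Println(string(data), err)

	_, replicated := dst["ad/1"]
	fmt.Println("replicated:", replicated)
}
```

The replication is opportunistic in the sense that nothing is copied until a read misses at the destination.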

gammazero · 2024-02-12T17:33:21Z

filestore/config.go

@@ -13,6 +13,8 @@ type Config struct {
 	Local LocalConfig
 	// S3 configures storing files in S3.
 	S3 S3Config
+	// Replicator configures opportunistic mirror replication from a source onto a destination.
+	Replicator *ReplicatorConfig


It seems that the Replicator should be at a higher level in the config hierarchy, as a peer of filestore.Config, and that instead of a ReplicatorConfig type, the Source and Destination should each be of type filestore.Config.

I had that implemented originally, and settled on a dedicated config to avoid the opportunity to define a recursive replica source/destination.
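To illustrate the recursion concern: if Source and Destination were each a full Config, and Config itself carried a Replicator field, a replicator could nest another replicator. A dedicated type restricted to concrete stores rules that out at the type level. The sketch below is an assumption about how that might look; `LeafConfig`, the fields of `ReplicatorConfig`, and the `BasePath`/`BucketName` fields are hypothetical placeholders, and only the outer `Config` shape comes from the diff.

```go
package main

import "fmt"

// LocalConfig and S3Config reuse the names from the diff; the fields
// shown here are placeholders for illustration.
type LocalConfig struct{ BasePath string }
type S3Config struct{ BucketName string }

// Config follows the shape shown in the diff.
type Config struct {
	Local      LocalConfig
	S3         S3Config
	Replicator *ReplicatorConfig
}

// LeafConfig is a hypothetical type that can only describe a concrete
// store. It has no Replicator field, so nesting is impossible.
type LeafConfig struct {
	Local LocalConfig
	S3    S3Config
}

// ReplicatorConfig is a guess at the dedicated config type.
type ReplicatorConfig struct {
	Source      LeafConfig
	Destination LeafConfig
}

func main() {
	cfg := Config{
		Replicator: &ReplicatorConfig{
			Source:      LeafConfig{S3: S3Config{BucketName: "source-mirror"}},
			Destination: LeafConfig{Local: LocalConfig{BasePath: "/var/mirror"}},
		},
	}
	fmt.Println(cfg.Replicator.Source.S3.BucketName)
	fmt.Println(cfg.Replicator.Destination.Local.BasePath)
}
```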

gammazero · 2024-02-12T17:45:09Z

filestore/replicator.go

+							e <- dee
+						}
+						return
+					} else {


Unnecessary else following return. Same below.

gammazero · 2024-02-12T17:46:16Z

filestore/replicator.go

+						}
+						return
+					} else {
+						break DestinationList


Break not needed, since that is where loop iteration continues naturally.

The intent of the break is to stop iterating and fall back on listing from source. No?

gammazero · 2024-02-12T17:48:48Z

filestore/replicator.go

+		//
+		// If Destination returns an error first, then fallback on listing from Source.
+
+		{


Unnecessary brace?

Only there to separate the scope of working with destination vs source.

gammazero · 2024-02-12T17:49:22Z

filestore/replicator.go

+					if ok {
+						c <- dcc
+						listedAtLeastOneFile = true
+					} else {
+						if listedAtLeastOneFile {
+							return
+						} else {
+							break DestinationList
+						}
+					}


Suggested change

-					if ok {
-						c <- dcc
-						listedAtLeastOneFile = true
-					} else {
-						if listedAtLeastOneFile {
-							return
-						} else {
-							break DestinationList
-						}
-					}
+					if ok {
+						c <- dcc
+						listedAtLeastOneFile = true
+					} else if listedAtLeastOneFile {
+						return
+					}

I think the suggested change would cause listing from destination to continue, which is not what I intend. Unless I have missed something?
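The difference can be seen in a small standalone sketch. It assumes the surrounding code is a `for`/`select` loop labeled `DestinationList`, as the diff fragments suggest; the `list` function, the `done` channel, and the `chanOf` helper are made up for the example. In Go, a bare `break` inside `select` leaves only the `select`, and a receive from a closed channel never blocks, so dropping the labeled break would make the loop keep receiving from the closed destination channel instead of moving on to the source.

```go
package main

import "fmt"

// list forwards everything from dst. If dst closes without yielding
// anything, it falls back to src. All names here are hypothetical.
func list(done <-chan struct{}, dst, src <-chan string) []string {
	var out []string
	listedAtLeastOne := false

DestinationList:
	for {
		select {
		case <-done:
			return out
		case f, ok := <-dst:
			if ok {
				out = append(out, f)
				listedAtLeastOne = true
			} else if listedAtLeastOne {
				return out
			} else {
				// Without this labeled break the for loop would iterate
				// again and spin on the closed dst channel.
				break DestinationList
			}
		}
	}

	// Fallback: destination had nothing, so list from source instead.
	for f := range src {
		out = append(out, f)
	}
	return out
}

// chanOf returns a closed, pre-filled channel for the example.
func chanOf(items ...string) <-chan string {
	c := make(chan string, len(items))
	for _, it := range items {
		c <- it
	}
	close(c)
	return c
}

func main() {
	// A nil done channel never fires, so only dst and src matter here.
	fmt.Println(list(nil, chanOf("a", "b"), chanOf("x"))) // destination has files
	fmt.Println(list(nil, chanOf(), chanOf("x")))         // destination empty: falls back
}
```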

gammazero · 2024-02-12T17:58:19Z

filestore/replicator.go

+// Get attempts to get from Replicator.Destination first, and if the path does not exist falls back onto Replicator.Source.
+// If the path is found at source, it is replicated onto destination first and then the replica is returned.
+func (r *Replicator) Get(ctx context.Context, path string) (*File, io.ReadCloser, error) {
+	dFile, dRc, dErr := r.Destination.Get(ctx, path)


The filestore is only used during ingestion of new index data, as a faster alternative to reading from the publisher. The only case where data would be available in the destination is if a new indexer is re-indexing. In which case, wouldn't it be better to support multiple sources (that are not the destination) instead of trying to read from the destination as one of the sources?

masih · 2024-02-13T09:42:52Z

Further to 1:1 discussion, ingest engine calls Put (and Head by extension) on every processed ad and its entries regardless of where the original ad came from.

This means for the purposes of populating the bare metal mirror we simply need to separate the config for read and write.

Future work will revisit the file store and ingest engine for a more optimal approach where the context of where the ad came from is taken into consideration and therefore we can avoid unnecessary calls to Put.

We should also consider encapsulating much of the complexity here into a simple sync API, such that the user is given an option to configure a list of mirrors with some priority, and the sync-er simply gets the ads back, as that is the only thing the ad sync mechanism is concerned with.
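One possible shape for such an API, purely as a sketch: the `Mirror` interface, `Syncer`, `memMirror`, and `GetAd` are hypothetical names invented here, and slice order stands in for priority. Nothing below reflects an API that exists in the codebase.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound and Mirror are hypothetical; the sync API discussed in
// the thread had not been designed at this point.
var ErrNotFound = errors.New("ad not found")

type Mirror interface {
	GetAd(id string) ([]byte, error)
}

// memMirror is an in-memory Mirror used only for this sketch.
type memMirror map[string][]byte

func (m memMirror) GetAd(id string) ([]byte, error) {
	data, ok := m[id]
	if !ok {
		return nil, ErrNotFound
	}
	return data, nil
}

// Syncer tries mirrors in slice order, which stands in for priority.
type Syncer struct {
	Mirrors []Mirror
}

// GetAd returns the ad from the first mirror that has it. If no mirror
// has it, the error from the last mirror tried is returned.
func (s Syncer) GetAd(id string) ([]byte, error) {
	lastErr := ErrNotFound
	for _, m := range s.Mirrors {
		data, err := m.GetAd(id)
		if err == nil {
			return data, nil
		}
		lastErr = err
	}
	return nil, lastErr
}

func main() {
	s := Syncer{Mirrors: []Mirror{
		memMirror{"ad1": []byte("from-primary")},
		memMirror{"ad1": []byte("from-secondary"), "ad2": []byte("only-secondary")},
	}}

	ad1, _ := s.GetAd("ad1")
	ad2, _ := s.GetAd("ad2")
	fmt.Println(string(ad1), string(ad2))
}
```

The caller only sees ads coming back; which mirror served them is an internal detail of the sync-er.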

gammazero · 2024-05-07T03:20:06Z

Future work will revisit the file store and ingest engine for a more optimal approach where the context of where the ad came from is taken into consideration and therefore we can avoid unnecessary calls to Put.

This has now been implemented.

sync API, such that the user is given an option to configure a list of mirrors with some priority, and the sync-er simply gets the ads back, as that is the only thing the ad sync mechanism is concerned with.

Requested by issue #2613

Closing, since there is nothing more to do in this PR.

Implement opportunistic mirror replicator

73f0802

Implement a file store that opportunistically replicates data.

masih requested review from willscott and gammazero February 12, 2024 17:23

willscott approved these changes Feb 12, 2024

gammazero reviewed Feb 12, 2024

masih marked this pull request as draft February 12, 2024 18:28

gammazero closed this May 7, 2024
