-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement opportunistic mirror replicator #2510
Conversation
Implement a file store that opportunistically replicates data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally, the mirror was designed to talk a separate read and write Filestore
to facilitate replication. We just needed a way to configure that. The idea was that once the indexer caught up, then the read and the write store would become the same Filestore
.
This replicator sets up another mirror as a alternat source if data if the data is not already present in the destination. The only case where data to be indexed would be available in the destination is if a new indexer is re-indexing.
@@ -13,6 +13,8 @@ type Config struct { | |||
Local LocalConfig | |||
// S3 configures storing files in S3. | |||
S3 S3Config | |||
// Replicator configures opportunistic mirror replication from a source onto a destination. | |||
Replicator *ReplicatorConfig |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the Replicator
should be at a higher level in the config hierarchy, as a peer of filestore.Config
, and that instead of a ReplicaConfig
type, the Source and Destination should each be of type filestore.Config
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had that implemented originally, and settled on a dedicated config to avoid the opportunity to define recursive replica source/description.
e <- dee | ||
} | ||
return | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary else
following return. Same below.
} | ||
return | ||
} else { | ||
break DestinationList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Break not needed, since that is where loop iteration continues naturally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intend of the break is to not continue the iteration, and fall back on listing from source. No?
// | ||
// If Destination returns an error first, then fallback on listing from Source. | ||
|
||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary brace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only there to separate the scope of working with destination vs source.
if ok { | ||
c <- dcc | ||
listedAtLeastOneFile = true | ||
} else { | ||
if listedAtLeastOneFile { | ||
return | ||
} else { | ||
break DestinationList | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if ok { | |
c <- dcc | |
listedAtLeastOneFile = true | |
} else { | |
if listedAtLeastOneFile { | |
return | |
} else { | |
break DestinationList | |
} | |
} | |
if ok { | |
c <- dcc | |
listedAtLeastOneFile = true | |
} else if listedAtLeastOneFile { | |
return | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the suggested commit results in listing from destination to continue, which is not what I intend to do. Unless I have missed something?
// Get Attempts to get from Replicator.Destination first, and if path does nto exist falls back onto Replicator.Source. | ||
// If the path is found at source, it is replicated onto destination first and then the replica is returned. | ||
func (r *Replicator) Get(ctx context.Context, path string) (*File, io.ReadCloser, error) { | ||
dFile, dRc, dErr := r.Destination.Get(ctx, path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The filestore is only used during ingestion of new index data, as a faster alternative to reading from the publisher. The only case where data would be available in the destination is if a new indexer is re-indexing. In which case, wouldn't it be better to support multiple sources (that are not the destination) instead of trying to read from the destination as one of the sources?
Further to 1:1 discussion, ingest engine calls Put (and Head by extension) on every processed ad and its entries regardless of where the original ad came from. This means for the purposes of populating the bare metal mirror we simply need to separate the config for read and write. Future work will revisit the file store and ingest engine for a more optimal approach where the context of where the ad came from is taken into consideration and therefore we can avoid unnecessary calls to Put. We should also consider encapsulating much of the complexity here into a simple sync API, such that the user is given an option to configure a list of mirrors with some priority ad the sync-er simply gets the ads back as that is the only thing that the ad sync mechanism is concerned with. |
This has now been implemented.
Requested by issue #2613 Closing, since there is nothing more to do in this PR. |
Implement a file store that opportunistically replicates data.