-
Notifications
You must be signed in to change notification settings - Fork 663
New Discovery Reconcile Algorithm
In this page, I try to explain what I have in mind as a replacement of the current csync algorithm. This is removing the last csync brick.
In order to understand the new one, i'm explaining how this currently works:
(The discovery phase was called update in csync speak)
-
csync_update()
local: goes recursively 'stat'ing each file on the FS. Build up thelocal.tree
. Compare with the DB to know the initial intruction. -
csync_update()
remote: goes recursively 'PROPFIND'ing each file on the server. Build up theremote.tree
Compare with the DB to know the initial intruction. (Optimisation: if the etag has not change, simply copy the value from the DB incsync_statedb_get_below_path
) -
csync_reconcile
: Go overlocal.tree
andremote.tree
to find the difference and update the instructions. -
SyncEngine::treewalkFile
: Post-reconcile: mergelocal.tree
andremote.tree
together in the_syncItemMap
- The sync Engine proceed by converting the
_syncItemMap
into a vector. Check permissions, then start creating propagator jobs - Propagation start and do the change to the file system, the server, and update the DB as it goes.
Some things we notice: is that this SyncEngine::treewalkFile
function is doing this akward convertion with between the csync and the Qt world. This is quite complex function.
Merge discovery and reconcile in one function.
For every directory:
- Query the server if the etag has not changed
- Query the FS for file in that directory
- Compare the sets of file vs. DB
- NEW files: check if they are renamed then propagate the file (or the rename)
- CHANGED files: propagate it now.
- DELETED files: schedule a propagation at the end of the sync (maybe it is renamed, so we need to keep it)
- NOTHING: don't do anything
- Recurse if it is a directory (after its MkcolJob or MoveJob finished).
fn process_path(string directory_name, bool etag_has_changed) {
// If the etag was changed, do a PROPFIND to get the list of files.
// Otherwise get them from the database.
set server_entries = etag_has_changed ? do_propfind(directory_name)
: get_from_db(directory_name);
// Iterate over local files in this directory
for(auto fileEntry : QDirIterator(directory_name)) {
auto instruction = compare_with_db(fileEntry);
if (instruction == NEW) {
if (etag_has_changed && server_entries.contains(fileEntry.name)) {
process_sync(fileEntry, server_entries[fileEntry.name]);
} else if (check_rename_inode(fileEntry)) { // check the db for file with same inode
process_rename(fileEntry);
} else {
process_sync(fileEntry, nullptr)
}
} else if (instruction == CHANGED) {
process_sync(fileEntry, server_entries[fileEntry.name]);
} else if (instruction == NONE) {
if (etag_has_changed && !server_entries.contains(fileEntry.name)) {
process_delete_later(fileEntry.name); // Will delete at the end of the sync (in case it was renamed)
server_entries.remove(fileEntry.name)
continue;
} else if (etag_has_changed && compare_with_db(server_entries[fileEntry.name]) != NONE) {
process_sync(fileEntry, )
}
}
server_entries.remove(fileEntry.name)
if (fileEntry.is_directory()) {
//recurse
process_path(fileEntry.name, etag_has_changed && compare_with_db(server_entries[fileEntry.name]) != NONE)
}
}
// Now the file that are not there localy:
for (auto entry : server_entries) {
if (etag_has_changed && compare_with_db(entry) == NEW && check_rename_fileid(entry)) {
process_rename(entry);
} else if (etag_has_changed && compare_with_db(entry) != NONE) {
process_sync(nullptr, entry)
if (fileEntry.is_directory()) {
// recurse
process_path(entry.name, true);
}
} else {
process_delete_later(entry.name);
}
}
}
// Check if we should download or upload the file, handle conflicts, ....
fn process_sync(auto localEntry, auto remoteEntry) {
if (!check_permission_and_other_tweak(localEntry, remoteEntry))
return;
propagate(localEntry, remoteEntry);
}
// Do the delete after the sync.
// Note: we still need to recurse as well within the deleted directory to make sure
// there is no files to restore. Or ignored files that blocks the delete
fn process_delete_later(string fileName);
The idea is to do all step at once. and proceed directory by directory, recursively.
Later, we can only go in the directory that were changed according to the file system watcher. We don't need to go at all into directory we know have not changed.
TODO: Figure how to do UPDATE_METADATA when childs are propagated to the remote
TODO: Find out how to parallelize this to not have the one-remote-PROPFIND-at-a-time issue like with the current codebase.
☁️