I have observed that the mergerfs.dup command takes a significant amount of time to execute on a dataset that has already been duplicated. Currently, I have the following setup:
/mnt/disk1:/mnt/disk2 /mnt/pool
Within the /mnt/pool directory, there is a folder called /mnt/pool/data containing approximately 273GB of data with 147,148 files. My objective is to maintain duplicate copies of this folder on both drives. To achieve this, I am using the command /usr/local/bin/mergerfs.dup -d newest -c 2 -e /mnt/pool/data.
The execution of this command takes approximately 45 minutes even when no actual copying is required, because the script performs an rsync overwrite for every file.
To optimize the performance, I propose modifying the *_dupfun functions to also return whether an overwrite is necessary:
```python
import os  # already imported at the top of mergerfs.dup

def newest_dupfun(default_basepath,relpath,basepaths):
    sts = dict([(f,os.lstat(os.path.join(f,relpath))) for f in basepaths])
    # mtimes differ: pick the newest copy and request an overwrite
    mtime = sts[basepaths[0]].st_mtime
    if not all([st.st_mtime == mtime for st in sts.values()]):
        return sorted(sts,key=lambda x: sts.get(x).st_mtime,reverse=True)[0], True
    # mtimes match but ctimes differ: pick a canonical source by ctime,
    # but no overwrite is needed since rsync would not copy anything
    ctime = sts[basepaths[0]].st_ctime
    if not all([st.st_ctime == ctime for st in sts.values()]):
        return sorted(sts,key=lambda x: sts.get(x).st_ctime,reverse=True)[0], False
    # all copies identical: keep the default source, no overwrite
    return default_basepath, False
```
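Modifying the call of the *_dupfun functions then just means unpacking the extra return value. A minimal sketch, assuming the caller currently assigns the result to a single srcpath variable (the exact names in mergerfs.dup may differ):

```python
# before: srcpath = dupfun(default_basepath,relpath,basepaths)
srcpath, overwrite = dupfun(default_basepath,relpath,basepaths)
```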
Then, a simple check can be added to determine whether an overwrite is necessary before executing the rsync command:
```python
for tgtpath in existing:
    if prune and i >= count:
        break
    copies.append(tgtpath)
    # only run rsync when the dupfun flagged an actual difference
    if overwrite:
        args = build_copy_file(srcpath,tgtpath,relpath)
        print('# overwrite')
        print_args(args)
        if execute:
            execute_cmd(args)
    i += 1
```
These changes have significantly improved the performance, reducing the script execution time to just 1 minute. Furthermore, the output log now only displays actual changes made to the file system. The rsync overwrites never actually did anything as the files were already duplicated.
This change should improve the performance of all the *_dupfun functions except for mergerfs_dupfun, which would trigger an overwrite every time because no other check is possible.
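For completeness, a hypothetical sketch of what mergerfs_dupfun would look like under this scheme; since the source branch is chosen by mergerfs' own policy rather than by comparing file metadata, there is nothing to check across branches:

```python
def mergerfs_dupfun(default_basepath,relpath,basepaths):
    # Hypothetical: no cross-branch metadata check is possible here,
    # so an overwrite must be assumed every time.
    # default_basepath stands in for whatever branch the real function selects.
    return default_basepath, True
```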
Please let me know if you can spot any issues. If you'd like, I can create a merge request with these changes for you to review.