Medusa allows MD5 checks to be optionally enabled during backups and backup verifications (the backup, backup-cluster and verify commands) via the command-line option --enable-md5-checks or the configuration parameter enable_md5_checks.
While this works in general, local storage has the following problems:
1. The methods that perform the check ignore MD5 values.
2. The MD5 value is calculated on the fly every time the files in a backup are listed.
3. No MD5 information is stored as part of the backup.
Differences between cloud and local storage
Metadata object information (like size, datetime) for the various cloud providers also includes the MD5 value of the file, which is calculated automatically on upload and stored as part of the metadata.
Local filesystem metadata (stat()) does not come with an MD5 field, so the value has to be associated with the file in some other way. Options are:
A) if the underlying filesystem allows it, use an extended attribute
B) otherwise, store the MD5 value in dedicated files kept on the backup side
Option (A) would work on genuinely local filesystems (i.e. a locally mounted disk) and on NFS mounts, but only if NFS is version 4 or newer. Unfortunately, NFS v3 still seems to be widely used and does not support extended attributes, which would force us, at least in that case, either to fall back to option (B) or to impose a minimum NFS version. Other network mount options (SMB) may or may not support extended attributes, depending on the version or mount type.
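As a rough illustration of option (A), here is a minimal sketch using Python's os.setxattr and os.getxattr, which are only available on Linux. The attribute name user.medusa.md5 and the function names are hypothetical and not part of Medusa. Both helpers report failure instead of raising, so a caller could fall back to option (B) on mounts without extended attribute support.

```python
import hashlib
import os
from typing import Optional

# Hypothetical attribute name, chosen for illustration only.
XATTR_NAME = "user.medusa.md5"


def compute_md5(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash the file in chunks so large files are not loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_md5_xattr(path: str) -> bool:
    """Store the MD5 as an extended attribute; return False if unsupported."""
    if not hasattr(os, "setxattr"):  # platform without xattr support in os
        return False
    try:
        os.setxattr(path, XATTR_NAME, compute_md5(path).encode("ascii"))
    except OSError:  # e.g. the filesystem or mount does not support xattrs
        return False
    return True


def load_md5_xattr(path: str) -> Optional[str]:
    """Return the stored MD5, or None if it is missing or unsupported."""
    if not hasattr(os, "getxattr"):
        return None
    try:
        return os.getxattr(path, XATTR_NAME).decode("ascii")
    except OSError:
        return None
```

A None result from load_md5_xattr covers both "attribute never written" (older backups) and "mount cannot hold attributes", so the caller does not need to distinguish the two before falling back.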
Option (B) would work regardless, but it needs to be implemented carefully to avoid generating too many extra files. It also has to cater for cases where older backups were taken with a version of Medusa that did not have the feature, or where a cloud backup is moved to local storage.
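One possible shape for option (B), sketched under assumptions: a single JSON manifest per directory keeps the number of extra files low, and a lookup falls back to hashing the file when no manifest entry exists (older backups, or backups copied over from cloud storage). The manifest file name and all function names are hypothetical and not taken from Medusa.

```python
import hashlib
import json
import os
from typing import Dict

# Hypothetical manifest file name, one per backup directory.
MANIFEST_NAME = ".md5-manifest.json"


def compute_md5(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash the file in chunks so large files are not loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(directory: str) -> Dict[str, str]:
    """Hash every regular file in the directory and store the results once."""
    digests = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and entry.name != MANIFEST_NAME:
                digests[entry.name] = compute_md5(entry.path)
    with open(os.path.join(directory, MANIFEST_NAME), "w") as handle:
        json.dump(digests, handle, indent=2, sort_keys=True)
    return digests


def load_manifest(directory: str) -> Dict[str, str]:
    """Return the stored digests, or an empty dict if there is no manifest."""
    try:
        with open(os.path.join(directory, MANIFEST_NAME)) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}


def lookup_md5(path: str) -> str:
    """Prefer the stored digest; fall back to hashing for legacy backups."""
    directory, name = os.path.split(path)
    known = load_manifest(directory)
    if name in known:
        return known[name]
    return compute_md5(path)
```

This sketch rereads the manifest on every lookup for simplicity; a real implementation would presumably load it once per directory.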
A performance issue with the current code
Point 2 has the potential to introduce delays every time a list of the files in a backup needs to be retrieved, which happens not just during a backup but also during conceptually simple operations such as list-backups. This delay can become substantial if the backups are large, especially if network mounts are used.
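To illustrate how the listing cost could be decoupled from hashing, here is a minimal sketch in which listing only calls stat() and the MD5 is computed lazily, the first time it is actually requested. The class and function names are hypothetical and do not reflect Medusa's current storage code.

```python
import hashlib
import os
from functools import cached_property
from typing import List


class LocalBlob:
    """A listed file whose MD5 is only computed on first access."""

    def __init__(self, path: str, size: int, mtime: float) -> None:
        self.path = path
        self.size = size
        self.mtime = mtime

    @cached_property
    def md5(self) -> str:
        # Runs at most once per object; the result is cached on the instance.
        digest = hashlib.md5()
        with open(self.path, "rb") as handle:
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        return digest.hexdigest()


def list_blobs(root: str) -> List[LocalBlob]:
    """Walk the tree using stat() only; no file contents are read here."""
    blobs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            blobs.append(LocalBlob(path, info.st_size, info.st_mtime))
    return blobs
```

With this shape, an operation such as list-backups would never touch the md5 attribute and so would never read file contents, while a verification run would pay the hashing cost only for the files it checks.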
I propose to tackle this in two steps:
Issue is synchronized with this Jira Story by Unito
Issue Number: MED-112