Backup tracking is failing to detect failed backups due to how the BackupStatus code does its check.
In the current version, the code gets the node backup from the storage bucket and registers the backup in the BackupMan if it doesn't already exist. This allows the backup metadata to be recreated in memory after a restart of the gRPC server.
It then computes the status solely based on the data gathered from storage, without checking if there is a pending future or not.
If there's no pending future (the backup crashed or Medusa was restarted), the failure is never surfaced and the operator keeps polling indefinitely.
What needs to be changed:
    BackupMan.register_backup(request.backupName, is_async=False, overwrite_existing=False)
    status = BackupMan.STATUS_UNKNOWN
    if backup.started:
        status = BackupMan.STATUS_IN_PROGRESS
    if backup.finished:
        status = BackupMan.STATUS_SUCCESS
This code needs to check whether a future exists in the BackupMan and return a status accordingly:
    BackupMan.register_backup(request.backupName, is_async=False, overwrite_existing=False)
    status = BackupMan.STATUS_UNKNOWN
    try:
        future = BackupMan.get_backup_future(request.backupName)
    except RuntimeError as e:
        # No future exists, if the backup isn't marked as finished in the backend then it failed
        if not backup.finished:
            status = BackupMan.STATUS_FAILED
    if status == BackupMan.STATUS_UNKNOWN:
        # Not marked as failed: either a future exists or the backup is finished,
        # so compute the status based on storage information
        if backup.started:
            status = BackupMan.STATUS_IN_PROGRESS
        if backup.finished:
            status = BackupMan.STATUS_SUCCESS
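To make the difference concrete, here is a minimal, self-contained sketch that compares the two status computations for a backup that started but never finished. `FakeBackupMan`, `status_from_storage_only` and `status_with_future_check` are hypothetical stand-ins written for this illustration and are not Medusa's actual classes. The only behaviour assumed from the snippets above is that `get_backup_future` raises `RuntimeError` when no future is registered.

```python
from types import SimpleNamespace


class FakeBackupMan:
    """Hypothetical stand-in for BackupMan; only mirrors names used in this issue."""

    STATUS_UNKNOWN = "UNKNOWN"
    STATUS_IN_PROGRESS = "IN_PROGRESS"
    STATUS_SUCCESS = "SUCCESS"
    STATUS_FAILED = "FAILED"

    def __init__(self):
        self._futures = {}

    def set_backup_future(self, name, future):
        self._futures[name] = future

    def get_backup_future(self, name):
        # Assumption: a missing future raises RuntimeError, as in the snippet above.
        if name not in self._futures:
            raise RuntimeError(f"no future registered for backup {name}")
        return self._futures[name]


def status_from_storage_only(man, backup):
    """Current behaviour: status derived from storage data alone."""
    status = man.STATUS_UNKNOWN
    if backup.started:
        status = man.STATUS_IN_PROGRESS
    if backup.finished:
        status = man.STATUS_SUCCESS
    return status


def status_with_future_check(man, name, backup):
    """Proposed behaviour: no future plus an unfinished backup means failure."""
    try:
        man.get_backup_future(name)
    except RuntimeError:
        if not backup.finished:
            return man.STATUS_FAILED
    return status_from_storage_only(man, backup)


if __name__ == "__main__":
    man = FakeBackupMan()
    # A backup that started, then the process crashed or restarted: no future left.
    crashed = SimpleNamespace(started=True, finished=False)
    print(status_from_storage_only(man, crashed))
    print(status_with_future_check(man, "backup1", crashed))
```

With the storage-only check the crashed backup is reported as in progress, which is why the operator never stops polling. With the future check it is reported as failed, unless a future is still registered for it.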