Backup tracking is broken in the grpc server #797

adejanovski · 2024-08-14T09:25:01Z

Backup tracking is failing to detect failed backups due to how the BackupStatus code does its check.

In the current version, the code will get the node backup from the storage bucket, and it will potentially register the backup in the BackupMan in case it doesn't exist. This allows to recreate the backups metadata in memory after a restart of the GRPC server.
It then computes the status solely based on the data gathered from storage, without checking if there is a pending future or not.

If there's no pending future (backup crashed or medusa was restarted), then we're not resurfacing the failure and the operator will keep on polling indefinitely.

What needs to be changed:

                BackupMan.register_backup(request.backupName, is_async=False, overwrite_existing=False)
                status = BackupMan.STATUS_UNKNOWN
                if backup.started:
                    status = BackupMan.STATUS_IN_PROGRESS
                if backup.finished:
                    status = BackupMan.STATUS_SUCCESS

this code needs to include a check for the existence of a future in the BackupMan and return accordingly:

                BackupMan.register_backup(request.backupName, is_async=False, overwrite_existing=False)
                status = BackupMan.STATUS_UNKNOWN
                try:
                   future = BackupMan.get_backup_future(request.backupName)
                except RuntimeError as e:
                  # No future exists, if the backup isn't marked as finished in the backend then it failed
                  if not backup.finished:
                    status = BackupMan.STATUS_FAILED
                if status == BackupMan.STATUS_UNKNOWN:
                  # We don't have a pending future and need to compute status based on storage information
                  if backup.started:
                      status = BackupMan.STATUS_IN_PROGRESS
                  if backup.finished:
                      status = BackupMan.STATUS_SUCCESS

The text was updated successfully, but these errors were encountered:

adejanovski assigned rzvoncek and adejanovski and unassigned rzvoncek Aug 14, 2024

adejanovski added ready Issues in the state 'ready' bug Something isn't working in-progress Issues in the state 'in-progress' and removed ready Issues in the state 'ready' labels Aug 14, 2024

adejanovski mentioned this issue Aug 14, 2024

Fix backup man future registration and BackupStatus computation of the node backup state #799

Merged

adejanovski added ready-for-review Issues in the state 'ready-for-review' and removed in-progress Issues in the state 'in-progress' labels Aug 14, 2024

rzvoncek closed this as completed in #799 Aug 15, 2024

adejanovski added done Issues in the state 'done' and removed ready-for-review Issues in the state 'ready-for-review' labels Aug 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Backup tracking is broken in the grpc server #797

Backup tracking is broken in the grpc server #797

adejanovski commented Aug 14, 2024 •

edited

Loading

Backup tracking is broken in the grpc server #797

Backup tracking is broken in the grpc server #797

Comments

adejanovski commented Aug 14, 2024 • edited Loading

adejanovski commented Aug 14, 2024 •

edited

Loading