Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backup tracking is broken in the grpc server #797

Closed
adejanovski opened this issue Aug 14, 2024 · 0 comments · Fixed by #799
Closed

Backup tracking is broken in the grpc server #797

adejanovski opened this issue Aug 14, 2024 · 0 comments · Fixed by #799
Assignees
Labels
bug Something isn't working done Issues in the state 'done'

Comments

@adejanovski
Copy link
Contributor

adejanovski commented Aug 14, 2024

Project board link

Backup tracking is failing to detect failed backups due to how the BackupStatus code does its check.

In the current version, the code will get the node backup from the storage bucket, and it will potentially register the backup in the BackupMan in case it doesn't exist. This allows to recreate the backups metadata in memory after a restart of the GRPC server.
It then computes the status solely based on the data gathered from storage, without checking if there is a pending future or not.

If there's no pending future (backup crashed or medusa was restarted), then we're not resurfacing the failure and the operator will keep on polling indefinitely.

What needs to be changed:

                BackupMan.register_backup(request.backupName, is_async=False, overwrite_existing=False)
                status = BackupMan.STATUS_UNKNOWN
                if backup.started:
                    status = BackupMan.STATUS_IN_PROGRESS
                if backup.finished:
                    status = BackupMan.STATUS_SUCCESS

this code needs to include a check for the existence of a future in the BackupMan and return accordingly:

                BackupMan.register_backup(request.backupName, is_async=False, overwrite_existing=False)
                status = BackupMan.STATUS_UNKNOWN
                try:
                   future = BackupMan.get_backup_future(request.backupName)
                except RuntimeError as e:
                  # No future exists, if the backup isn't marked as finished in the backend then it failed
                  if not backup.finished:
                    status = BackupMan.STATUS_FAILED
                if status == BackupMan.STATUS_UNKNOWN:
                  # We don't have a pending future and need to compute status based on storage information
                  if backup.started:
                      status = BackupMan.STATUS_IN_PROGRESS
                  if backup.finished:
                      status = BackupMan.STATUS_SUCCESS
@adejanovski adejanovski assigned rzvoncek and adejanovski and unassigned rzvoncek Aug 14, 2024
@adejanovski adejanovski added ready Issues in the state 'ready' bug Something isn't working in-progress Issues in the state 'in-progress' and removed ready Issues in the state 'ready' labels Aug 14, 2024
@adejanovski adejanovski added ready-for-review Issues in the state 'ready-for-review' and removed in-progress Issues in the state 'in-progress' labels Aug 14, 2024
@adejanovski adejanovski added done Issues in the state 'done' and removed ready-for-review Issues in the state 'ready-for-review' labels Aug 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working done Issues in the state 'done'
Projects
None yet
2 participants