velero_backup_last_status indicates failed backup, while in reality the backup was successful #6809
Comments
|
This issue is happening to us also. The metric |
We are also running into this. It is very annoying, as we now get a lot of false alerts on our backups. |
@vvanouytsel Because the |
I also noticed that the generated backup's creation time is odd. The backup:
|
It's not the usual cron time that the schedule defines:
|
@yanggangtony, unfortunately I do not have access to those logs anymore. The weird thing is that the
|
The default value of The better way to handle |
@allenxu404 you might actually be on to something. |
In the normal flow, the backup schedule creates a backup CR, the backup CR is watched and begins to run, and its status is reported. That code is right. How about we just change the default value from 0 to 1? |
@allenxu404 |
But another issue may occur: the metrics |
|
After briefly examining the code, it appears that the velero_backup_last_status metric is updated correctly in all the relevant sections. Therefore, the issue seems to arise when the code exits without updating the metric. @yanggangtony Yes, I think it's better to change the default value to 1, indicating success. It doesn't hurt for the metric to indicate success when no backup has been initiated. |
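To make the default-value discussion concrete, here is a toy model of the suspected behaviour. It is not Velero's code: the `GaugeRegistry` class and its methods are hypothetical, and it assumes that the gauge for each schedule is re-initialised to a default when the process starts and only changes once a backup is recorded in that process.

```python
# Toy model only; GaugeRegistry is a hypothetical stand-in, not Velero's metrics code.
class GaugeRegistry:
    """In-memory gauges that are lost whenever the process restarts."""

    def __init__(self, default_status):
        self.default_status = default_status
        self.values = {}

    def register_schedule(self, schedule):
        # Assumption: on startup each schedule's gauge is set to the default.
        self.values[schedule] = self.default_status

    def record_backup(self, schedule, succeeded):
        self.values[schedule] = 1 if succeeded else 0

    def last_status(self, schedule):
        return self.values[schedule]


def status_after_restart(default_status):
    # Before the restart: the nightly backup succeeds, so the gauge is 1.
    before = GaugeRegistry(default_status)
    before.register_schedule("velero-daily")
    before.record_backup("velero-daily", succeeded=True)

    # The pod moves to a new node: a fresh process with no gauge history,
    # and no backup has run yet in the new process.
    after = GaugeRegistry(default_status)
    after.register_schedule("velero-daily")
    return before.last_status("velero-daily"), after.last_status("velero-daily")


print(status_after_restart(default_status=0))  # (1, 0)
print(status_after_restart(default_status=1))  # (1, 1)
```

Under these assumptions, a default of 0 reads as a failed backup right after a restart, while a default of 1 keeps the gauge quiet until a backup actually fails.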
@allenxu404 |
What steps did you take and what happened:
We have a Schedule that takes a backup every night, named velero-daily. When we deployed a new AMI to our cluster, all our nodes were replaced with the new AMI version, effectively meaning that the velero pod moved to a new node. After that, we noticed that the velero_backup_last_status{schedule="velero-daily"} metric indicated that the last backup failed, while in reality the backup was successful.

What did you expect to happen:
I expected the velero_backup_last_status metric to output a 1 instead of a 0, because the backup was successful.

When querying the velero_backup_last_status metric, we get a 0 indicating that the backup failed, thus triggering our alert.

However, note that velero_backup_last_successful_timestamp shows that the backup was actually successful, so both metrics contradict each other.

The CLI shows that the backup has completed with no errors/warnings.

The Backup object also shows that the backup was taken successfully.

Environment:
- Velero version (use velero version):
- Velero features (use velero client config get features):
- Kubernetes version (use kubectl version):
- EKS
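Since velero_backup_last_status and velero_backup_last_successful_timestamp can contradict each other as described above, one possible stopgap on the alerting side is to cross-check the two values before firing. The sketch below is only an illustration: the function name, its parameters, and the 26-hour threshold are assumptions chosen for a nightly schedule, not part of Velero or of any alerting tool.

```python
import time


def backup_looks_failed(last_status, last_success_ts, now=None,
                        max_age_seconds=26 * 3600):
    """Return True only if both metrics agree that something is wrong.

    last_status:     value of velero_backup_last_status (1 = success, 0 = failure)
    last_success_ts: value of velero_backup_last_successful_timestamp (Unix seconds)
    max_age_seconds: hypothetical tolerance; 26 hours leaves slack for a nightly run
    """
    if now is None:
        now = time.time()
    if last_status == 1:
        return False
    # A status of 0 is only trusted if the last success is also stale.
    return (now - last_success_ts) > max_age_seconds


# Status says "failed", but a backup succeeded one hour ago: do not alert.
print(backup_looks_failed(0, last_success_ts=1_000_000.0, now=1_000_000.0 + 3600))
# Status says "failed" and the last success is 30 hours old: alert.
print(backup_looks_failed(0, last_success_ts=1_000_000.0, now=1_000_000.0 + 30 * 3600))
```

This only masks the false alert; it does not address why the status gauge reads 0.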