Remove schedule-related metrics on schedule delete #6715

Merged
merged 1 commit into from
Sep 25, 2023

Contributor

@nilesh-akhade nilesh-akhade commented Aug 28, 2023

Thank you for contributing to Velero!

Summary of the change

Currently, when a schedule is deleted, its metrics continue to be exported, so the metrics data no longer matches the schedules that actually exist. With this PR, deleting a schedule also removes its metrics, so they are no longer exported. This keeps the metrics data accurate and consistent.

Example

Suppose the schedule myschedule exists and has one successful backup. Querying the /metrics endpoint on the Velero server then returns:

velero_backup_success_total{schedule="myschedule"} 1
# .. other metrics created for this schedule

After we delete the schedule myschedule, the line above is removed (no longer exported).
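
The behavior in this example can be illustrated with a minimal, standard-library-only sketch. The scheduleMetrics type and its recordBackupSuccess and Export methods are hypothetical stand-ins invented for illustration, not Velero's actual metrics code; only the RemoveSchedule name and the velero_backup_success_total metric come from this PR.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// scheduleMetrics is a hypothetical, simplified stand-in for a metrics
// registry. It tracks one counter per schedule label and is NOT Velero's
// actual implementation.
type scheduleMetrics struct {
	mu            sync.Mutex
	backupSuccess map[string]int // schedule name -> success count
}

func newScheduleMetrics() *scheduleMetrics {
	return &scheduleMetrics{backupSuccess: make(map[string]int)}
}

// recordBackupSuccess increments the success counter for a schedule.
func (m *scheduleMetrics) recordBackupSuccess(schedule string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.backupSuccess[schedule]++
}

// RemoveSchedule drops every series labelled with the given schedule.
// Removing a schedule that has no series is a no-op.
func (m *scheduleMetrics) RemoveSchedule(schedule string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.backupSuccess, schedule)
}

// Export renders the remaining series in a Prometheus-like text form,
// sorted by schedule name so the output is deterministic.
func (m *scheduleMetrics) Export() []string {
	m.mu.Lock()
	defer m.mu.Unlock()
	names := make([]string, 0, len(m.backupSuccess))
	for name := range m.backupSuccess {
		names = append(names, name)
	}
	sort.Strings(names)
	lines := make([]string, 0, len(names))
	for _, name := range names {
		lines = append(lines, fmt.Sprintf(
			"velero_backup_success_total{schedule=%q} %d",
			name, m.backupSuccess[name]))
	}
	return lines
}

func main() {
	m := newScheduleMetrics()
	m.recordBackupSuccess("myschedule")
	fmt.Println(m.Export()) // one series for myschedule

	m.RemoveSchedule("myschedule")
	fmt.Println(m.Export()) // the series is no longer exported
}
```

The point of the sketch is that removal is keyed on the schedule label, so series belonging to other schedules are left untouched.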

Does your change fix a particular issue?

Fixes #1333

Please indicate you've done the following:

  • Accepted the DCO. Commits without the DCO will delay acceptance.
  • Created a changelog file or added /kind changelog-not-required as a comment on this pull request.
  • Updated the corresponding documentation in site/content/docs/main.

@nilesh-akhade nilesh-akhade marked this pull request as ready for review August 29, 2023 13:25
Comment on lines 95 to +97
if apierrors.IsNotFound(err) {
	log.WithError(err).Error("schedule not found")
	c.metrics.RemoveSchedule(req.Name)
Member

@kaovilai kaovilai Sep 6, 2023

Do we want to consider the case where deletionTimestamp is not nil (the user has initiated a deletion request) but the object is still found, i.e. not yet gone from the cluster?

Contributor Author

The scenario you described is when the metadata.deletionTimestamp is not nil, indicating that a user has initiated a deletion request for the schedule, but the schedule is not yet deleted.
Do you mean we should move the call that removes schedule-related metrics into an if-block, as shown below?

if schedule.ObjectMeta.DeletionTimestamp != nil {
  log.Debug("Got a deletion request for the schedule")
  c.metrics.RemoveSchedule(req.Name)
}

Member

That would remove the metrics even if the schedule is stuck or remains in the cluster (due to finalizers, etc.). I am just asking whether that would be preferred or not.

Contributor Author

@nilesh-akhade nilesh-akhade Sep 7, 2023

If the deletion of the schedule gets stuck, the metrics will still be exported. This makes sense because the schedule is not yet deleted. Additionally, the removal of metrics is quick and should never fail. Hence, I believe that removal in the reconciliation cycle after deletion should be preferred.

Member

+1
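
The outcome of this thread (remove metrics in the reconciliation that runs after the object is gone, not when deletion is merely requested) can be sketched as follows. This is an illustrative, standard-library-only sketch: errNotFound, schedule, fakeCluster, removalLog, newDeletingSchedule and reconcile are hypothetical names standing in for the Kubernetes client, the schedule object and the controller, and are not Velero's code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotFound stands in for a Kubernetes NotFound API error (hypothetical).
var errNotFound = errors.New("schedule not found")

// schedule is a hypothetical, trimmed-down schedule object.
type schedule struct {
	Name              string
	DeletionTimestamp *time.Time // non-nil once deletion has been requested
}

// newDeletingSchedule returns a schedule whose deletion was requested but
// which still exists, e.g. because a finalizer is blocking removal.
func newDeletingSchedule(name string) *schedule {
	now := time.Now()
	return &schedule{Name: name, DeletionTimestamp: &now}
}

// fakeCluster maps schedule names to objects still present in the cluster.
type fakeCluster map[string]*schedule

func (c fakeCluster) get(name string) (*schedule, error) {
	s, ok := c[name]
	if !ok {
		return nil, errNotFound
	}
	return s, nil
}

// removalLog records which schedules had their metrics removed.
type removalLog struct{ removed []string }

func (r *removalLog) RemoveSchedule(name string) {
	r.removed = append(r.removed, name)
}

// reconcile removes metrics only when the schedule is truly gone.
// A schedule that is merely marked for deletion keeps its metrics.
func reconcile(c fakeCluster, m *removalLog, name string) error {
	s, err := c.get(name)
	if errors.Is(err, errNotFound) {
		m.RemoveSchedule(name)
		return nil
	}
	if err != nil {
		return err
	}
	if s.DeletionTimestamp != nil {
		// Deletion requested but not finished: keep exporting metrics.
		return nil
	}
	// Normal scheduling work would go here.
	return nil
}

func main() {
	cluster := fakeCluster{"stuck": newDeletingSchedule("stuck")}
	m := &removalLog{}

	_ = reconcile(cluster, m, "stuck") // still in the cluster: metrics kept
	fmt.Println(len(m.removed))

	_ = reconcile(cluster, m, "gone") // not found: metrics removed
	fmt.Println(m.removed)
}
```

Under these assumptions, a schedule stuck in deletion keeps its metrics until a later reconciliation observes that it no longer exists.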

codecov bot commented Sep 21, 2023

Codecov Report

Merging #6715 (c7c4413) into main (f234dd6) will increase coverage by 0.07%.
Report is 168 commits behind head on main.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #6715      +/-   ##
==========================================
+ Coverage   60.28%   60.36%   +0.07%     
==========================================
  Files         238      242       +4     
  Lines       25256    25983     +727     
==========================================
+ Hits        15226    15684     +458     
- Misses       8976     9196     +220     
- Partials     1054     1103      +49     
| Files Changed | Coverage Δ |
|---|---|
| pkg/controller/schedule_controller.go | 72.72% <100.00%> (+0.17%) ⬆️ |

... and 41 files with indirect coverage changes

Collaborator

@shubham-pampattiwar shubham-pampattiwar left a comment

Thank you @nilesh-akhade

@shubham-pampattiwar shubham-pampattiwar merged commit c3ec7b7 into vmware-tanzu:main Sep 25, 2023
22 checks passed

Successfully merging this pull request may close these issues.

velero_backup_success_total metric keeps being reported for deleted schedules
4 participants