Remove schedule-related metrics on schedule delete #6715

Merged: 1 commit, Sep 25, 2023

changelogs/unreleased/6715-nilesh-akhade (1 addition, 0 deletions)
@@ -0,0 +1 @@
Remove schedule-related metrics on schedule delete

pkg/controller/schedule_controller.go (1 addition, 0 deletions)
@@ -94,6 +94,7 @@ func (c *scheduleReconciler) Reconcile(ctx context.Context, req ctrl.Request) (c
	if err := c.Get(ctx, req.NamespacedName, schedule); err != nil {
		if apierrors.IsNotFound(err) {
			log.WithError(err).Error("schedule not found")
			c.metrics.RemoveSchedule(req.Name)

Review comment on lines 95 to +97

kaovilai (Member), Sep 6, 2023

Do we want to consider the case where deletionTimestamp is set (the user has requested deletion) but the object is not yet gone from the cluster, so Get does not return NotFound?

nilesh-akhade (Contributor Author)

The scenario you describe is when metadata.deletionTimestamp is set, indicating that a user has requested deletion of the schedule, but the schedule has not been deleted yet.
Do you mean we should move the call that removes the schedule-related metrics into an if-block like the following?

if schedule.ObjectMeta.DeletionTimestamp != nil {
	log.Debug("Got a deletion request for the schedule")
	c.metrics.RemoveSchedule(req.Name)
}

Member

That would remove the metrics even if the schedule is stuck or remains in the cluster, for example because of finalizers. I'm just asking whether that behavior would be preferred.

nilesh-akhade (Contributor Author), Sep 7, 2023

If deletion of the schedule gets stuck, its metrics will still be exported. That makes sense, because the schedule has not actually been deleted yet. In addition, removing metrics is quick and should never fail. So I believe removing them in the reconcile cycle that runs after the schedule is gone (when Get returns NotFound) is preferable.

Member

+1

			return ctrl.Result{}, nil
		}
		return ctrl.Result{}, errors.Wrapf(err, "error getting schedule %s", req.String())

pkg/metrics/metrics.go (82 additions, 0 deletions)
@@ -505,6 +505,88 @@ func (m *ServerMetrics) InitSchedule(scheduleName string) {
	}
}

// RemoveSchedule removes metrics associated with a specified schedule.
func (m *ServerMetrics) RemoveSchedule(scheduleName string) {
	if g, ok := m.metrics[backupTarballSizeBytesGauge].(*prometheus.GaugeVec); ok {
		g.DeleteLabelValues(scheduleName)
	}
	if c, ok := m.metrics[backupAttemptTotal].(*prometheus.CounterVec); ok {
		c.DeleteLabelValues(scheduleName)
	}
	if c, ok := m.metrics[backupSuccessTotal].(*prometheus.CounterVec); ok {
		c.DeleteLabelValues(scheduleName)
	}
	if c, ok := m.metrics[backupPartialFailureTotal].(*prometheus.CounterVec); ok {
		c.DeleteLabelValues(scheduleName)
	}
	if c, ok := m.metrics[backupFailureTotal].(*prometheus.CounterVec); ok {
		c.DeleteLabelValues(scheduleName)
	}
	if c, ok := m.metrics[backupValidationFailureTotal].(*prometheus.CounterVec); ok {
		c.DeleteLabelValues(scheduleName)
	}
	if h, ok := m.metrics[backupDurationSeconds].(*prometheus.HistogramVec); ok {
		h.DeleteLabelValues(scheduleName)
	}
	if c, ok := m.metrics[backupDeletionAttemptTotal].(*prometheus.CounterVec); ok {
		c.DeleteLabelValues(scheduleName)
	}
	if c, ok := m.metrics[backupDeletionSuccessTotal].(*prometheus.CounterVec); ok {
		c.DeleteLabelValues(scheduleName)
	}
	if c, ok := m.metrics[backupDeletionFailureTotal].(*prometheus.CounterVec); ok {
		c.DeleteLabelValues(scheduleName)
	}
	if g, ok := m.metrics[backupLastSuccessfulTimestamp].(*prometheus.GaugeVec); ok {
		g.DeleteLabelValues(scheduleName)
	}
	if c, ok := m.metrics[backupItemsTotalGauge].(*prometheus.GaugeVec); ok {
		c.DeleteLabelValues(scheduleName)
	}
	if c, ok := m.metrics[backupItemsErrorsGauge].(*prometheus.GaugeVec); ok {
		c.DeleteLabelValues(scheduleName)
	}
	if c, ok := m.metrics[backupWarningTotal].(*prometheus.CounterVec); ok {
		c.DeleteLabelValues(scheduleName)
	}
	if c, ok := m.metrics[backupLastStatus].(*prometheus.GaugeVec); ok {
		c.DeleteLabelValues(scheduleName)
	}
	if c, ok := m.metrics[restoreAttemptTotal].(*prometheus.CounterVec); ok {
		c.DeleteLabelValues(scheduleName)
	}
	if c, ok := m.metrics[restorePartialFailureTotal].(*prometheus.CounterVec); ok {
		c.DeleteLabelValues(scheduleName)
	}
	if c, ok := m.metrics[restoreFailedTotal].(*prometheus.CounterVec); ok {
		c.DeleteLabelValues(scheduleName)
	}
	if c, ok := m.metrics[restoreSuccessTotal].(*prometheus.CounterVec); ok {
		c.DeleteLabelValues(scheduleName)
	}
	if c, ok := m.metrics[restoreValidationFailedTotal].(*prometheus.CounterVec); ok {
		c.DeleteLabelValues(scheduleName)
	}
	if c, ok := m.metrics[volumeSnapshotSuccessTotal].(*prometheus.CounterVec); ok {
		c.DeleteLabelValues(scheduleName)
	}
	if c, ok := m.metrics[volumeSnapshotAttemptTotal].(*prometheus.CounterVec); ok {
		c.DeleteLabelValues(scheduleName)
	}
	if c, ok := m.metrics[volumeSnapshotFailureTotal].(*prometheus.CounterVec); ok {
		c.DeleteLabelValues(scheduleName)
	}
	if c, ok := m.metrics[csiSnapshotAttemptTotal].(*prometheus.CounterVec); ok {
		c.DeleteLabelValues(scheduleName, "")
	}
	if c, ok := m.metrics[csiSnapshotSuccessTotal].(*prometheus.CounterVec); ok {
		c.DeleteLabelValues(scheduleName, "")
	}
	if c, ok := m.metrics[csiSnapshotFailureTotal].(*prometheus.CounterVec); ok {
		c.DeleteLabelValues(scheduleName, "")
	}
}
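RemoveSchedule repeats the same type-assert-and-delete block for every metric. The following sketch is an illustration under stated assumptions, not Velero code. It expresses the same idea table-driven over a small interface and models DeleteLabelValues with a hypothetical in-memory `fakeVec` so it runs with the standard library only. It also shows a consequence of exact-match deletion: for the two-label CSI metrics, `DeleteLabelValues(scheduleName, "")` removes only the series whose second label value is empty. The second label is assumed here to be a backup name, and "daily-20230925" is a made-up value.

```go
package main

import (
	"fmt"
	"strings"
)

// labelDeleter is the one method RemoveSchedule relies on. In client_golang,
// CounterVec, GaugeVec and HistogramVec all provide DeleteLabelValues(lvs ...string) bool;
// the hypothetical fakeVec below stands in so this runs with the stdlib only.
type labelDeleter interface {
	DeleteLabelValues(lvs ...string) bool
}

// fakeVec stores one value per exact label-value tuple.
type fakeVec struct{ series map[string]float64 }

func newFakeVec() *fakeVec { return &fakeVec{series: map[string]float64{}} }

func key(lvs []string) string { return strings.Join(lvs, "\x00") }

func (v *fakeVec) Set(val float64, lvs ...string) { v.series[key(lvs)] = val }

// DeleteLabelValues removes only the series whose label values match exactly,
// and reports whether such a series existed.
func (v *fakeVec) DeleteLabelValues(lvs ...string) bool {
	k := key(lvs)
	if _, ok := v.series[k]; !ok {
		return false
	}
	delete(v.series, k)
	return true
}

// removeSchedule is a table-driven variant of the per-metric if-blocks.
// extra holds trailing label values per metric (e.g. "" for the CSI metrics,
// mirroring the merged code).
func removeSchedule(metrics map[string]labelDeleter, extra map[string][]string, scheduleName string) {
	for name, m := range metrics {
		lvs := append([]string{scheduleName}, extra[name]...)
		m.DeleteLabelValues(lvs...)
	}
}

func main() {
	// Single label: schedule.
	attempts := newFakeVec()
	// Two labels: schedule, plus an assumed backup-name label.
	csiAttempts := newFakeVec()

	attempts.Set(5, "daily")
	csiAttempts.Set(0, "daily", "")
	csiAttempts.Set(2, "daily", "daily-20230925")

	removeSchedule(
		map[string]labelDeleter{"backup_attempt": attempts, "csi_snapshot_attempt": csiAttempts},
		map[string][]string{"csi_snapshot_attempt": {""}},
		"daily",
	)
	fmt.Println(len(attempts.series), len(csiAttempts.series))
}
```

Running it prints `0 1`: the single-label series and the empty-backup-name CSI series are gone, but the CSI series with a non-empty second label survives. Recent client_golang releases also expose `DeletePartialMatch` on metric vectors, which deletes every series matching a subset of labels; whether that fits here depends on the client_golang version Velero uses.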

// InitMetricsForNode initializes counter metrics for a node.
func (m *ServerMetrics) InitMetricsForNode(node string) {
	if c, ok := m.metrics[podVolumeBackupEnqueueTotal].(*prometheus.CounterVec); ok {