disable snapshots, remove scheduledbackup for snapshots #1057

Closed
wants to merge 1 commit into from

nhudson
Collaborator

@nhudson nhudson commented Nov 24, 2024

We currently have 36,000+ VolumeSnapshots on the us-east-1 cluster in prod. This causes the snapshot-controller to crash, because it cannot list that many objects from the Kubernetes API before the request times out.

I1124 19:28:25.790912       1 main.go:157] Version: v8.0.1
I1124 19:28:25.791757       1 main.go:206] Start NewCSISnapshotController with kubeconfig [] resyncPeriod [15m0s]
E1124 19:29:25.797250       1 main.go:93] Failed to list v1 volumesnapshots with error=the server was unable to return a response in the time allotted, but may still be processing the request (get volumesnapshots.snapshot.storage.k8s.io)
E1124 19:30:25.898711       1 main.go:93] Failed to list v1 volumesnapshots with error=the server was unable to return a response in the time allotted, but may still be processing the request (get volumesnapshots.snapshot.storage.k8s.io)
I1124 19:31:26.049852       1 request.go:1110] Stream error http2.StreamError{StreamID:0x5, Code:0x2, Cause:(*errors.errorString)(0x2afa130)} when reading response body, may be caused by closed connection.
E1124 19:31:26.049969       1 main.go:93] Failed to list v1 volumesnapshots with error=stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 5; INTERNAL_ERROR; received from peer
E1124 19:32:26.276700       1 main.go:93] Failed to list v1 volumesnapshots with error=the server was unable to return a response in the time allotted, but may still be processing the request (get volumesnapshots.snapshot.storage.k8s.io)
E1124 19:33:25.798645       1 main.go:93] Failed to list v1 volumesnapshots with error=Get "https://172.20.0.1:443/apis/snapshot.storage.k8s.io/v1/volumesnapshots": context deadline exceeded
E1124 19:33:25.798682       1 main.go:231] Exiting due to failure to ensure CRDs exist during startup: context deadline exceeded
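
For context on the timeouts above: the Kubernetes list API supports chunked reads through the `limit` and `continue` query parameters, so a client does not have to fetch every object in one response. The following is only a minimal sketch of that pattern against a fake in-memory API. `list_in_pages`, `fake_fetch_page`, and the integer-offset token are hypothetical names and simplifications for illustration, not code from this PR or from the snapshot-controller.

```python
def list_in_pages(fetch_page, limit=500):
    """Yield every item by repeatedly asking for at most `limit` items.

    `fetch_page(limit, token)` is a hypothetical callable that returns
    (items, next_token); next_token is None when there are no more pages.
    """
    token = None
    while True:
        items, token = fetch_page(limit, token)
        yield from items
        if not token:
            break


# Fake in-memory "API" standing in for a real list endpoint (assumption).
OBJECTS = [f"snapshot-{i}" for i in range(1200)]


def fake_fetch_page(limit, token):
    # The token here is just a stringified offset; real continue tokens are opaque.
    start = int(token) if token else 0
    end = start + limit
    next_token = str(end) if end < len(OBJECTS) else None
    return OBJECTS[start:end], next_token


names = list(list_in_pages(fake_fetch_page, limit=500))
print(len(names))  # 1200
```

Each request stays small regardless of how many objects exist in total, which is the property a single unbounded list call lacks.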

Disable volume snapshots again, and make sure we remove the snapshot ScheduledBackup along with the VolumeSnapshots.

The operator will remove the VolumeSnapshots, keeping 1 day by default. For example, if a namespace has 40 VolumeSnapshots, the operator removes all but 1, then removes that last one once it is more than 24 hours old.
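
The retention behavior described above can be illustrated with a short sketch. It assumes the rule is simply "delete anything older than the retention window", and that snapshots were taken daily. `expired_snapshots` and the sample data are hypothetical, not the operator's actual code.

```python
from datetime import datetime, timedelta, timezone


def expired_snapshots(snapshots, now, retention=timedelta(days=1)):
    """Return names of snapshots whose age exceeds the retention window.

    `snapshots` is a list of (name, creation_time) tuples.
    """
    return [name for name, created in snapshots if now - created > retention]


now = datetime(2024, 11, 24, 22, 0, tzinfo=timezone.utc)

# 40 daily snapshots (assumed cadence): the newest is 12 hours old.
snapshots = [(f"snap-{i}", now - timedelta(hours=12 + 24 * i)) for i in range(40)]

# First pass: everything older than 24 hours is removed, leaving only the newest.
first_pass = expired_snapshots(snapshots, now)
remaining = [s for s in snapshots if s[0] not in first_pass]

# Second pass 24 hours later: with no new snapshots, the last one has aged out too.
second_pass = expired_snapshots(remaining, now + timedelta(hours=24))

print(len(first_pass), [name for name, _ in remaining], second_pass)
# 39 ['snap-0'] ['snap-0']
```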

This adds 2 new functions, remove_scheduled_backup and remove_snap_scheduled_backup. Together they ensure that the ScheduledBackup of type volumeSnapshot is removed from the instance's namespace.
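
As a rough illustration of the selection step only, the sketch below picks out the ScheduledBackups that would be removed. It assumes each ScheduledBackup manifest records its type under `spec.method`; the helper `snapshot_scheduled_backups` and the sample manifests are hypothetical and are not the functions added in this PR.

```python
def snapshot_scheduled_backups(scheduled_backups, namespace):
    """Return names of ScheduledBackups in `namespace` that use volume snapshots.

    Assumes (for illustration) that the backup type lives at spec.method.
    """
    return [
        sb["metadata"]["name"]
        for sb in scheduled_backups
        if sb["metadata"].get("namespace") == namespace
        and sb.get("spec", {}).get("method") == "volumeSnapshot"
    ]


# Hypothetical manifests, reduced to the fields the sketch reads.
manifests = [
    {"metadata": {"name": "inst-a", "namespace": "inst-a"},
     "spec": {"method": "barmanObjectStore"}},
    {"metadata": {"name": "inst-a-snap", "namespace": "inst-a"},
     "spec": {"method": "volumeSnapshot"}},
    {"metadata": {"name": "inst-b-snap", "namespace": "inst-b"},
     "spec": {"method": "volumeSnapshot"}},
]

to_remove = snapshot_scheduled_backups(manifests, "inst-a")
print(to_remove)  # ['inst-a-snap']
```

Only the snapshot-type ScheduledBackup in the instance's own namespace is selected; the other backup in that namespace is left alone.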

@nhudson nhudson self-assigned this Nov 24, 2024
@nhudson nhudson marked this pull request as ready for review November 24, 2024 22:46
@nhudson
Collaborator Author

nhudson commented Nov 24, 2024

The issue with merging this now is that deleting the snapshots in prod us-east-1 will not work, because the controller will not start. I'm not sure what to do about that just yet.

@nhudson nhudson marked this pull request as draft November 25, 2024 21:49
@nhudson
Collaborator Author

nhudson commented Nov 25, 2024

Converting to draft for now.

@nhudson nhudson marked this pull request as ready for review November 26, 2024 01:16
@nhudson
Collaborator Author

nhudson commented Nov 26, 2024

This is not needed; I forgot that I had added a config option to enable/disable snapshots. This can be closed.

@nhudson nhudson closed this Nov 26, 2024