
VolumeSnapshotContents Synced across clusters are rePUT because of incorrect source #7978

Open · anshulahuja98 opened this issue Jul 4, 2024 · 6 comments · Fixed by #7983

@anshulahuja98 (Collaborator)

What steps did you take and what happened:

Velero syncs VolumeSnapshotContents across clusters that share the same BSL as part of the Backup Sync flow.

Previously, Velero synced them with spec.source.snapshotHandle set; now the source is populated with volumeHandle instead.
This causes the CSI sidecar to trigger CreateSnapshot again: https://github.com/kubernetes-csi/external-snapshotter/blame/master/pkg/sidecar-controller/snapshot_controller.go#L88

This in turn causes throttling and high load on the CSI driver, leading to other consequences.
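
For context, here is a minimal Go sketch using the external-snapshotter client types (the handle values are hypothetical) of the two mutually exclusive source fields and how the sidecar interprets them:

```go
package main

import (
	"fmt"

	snapshotv1 "github.com/kubernetes-csi/external-snapshotter/client/v6/apis/volumesnapshot/v1"
)

func main() {
	snapshotHandle := "snap-0123456789abcdef" // CSI handle of an already-taken snapshot
	volumeHandle := "vol-0123456789abcdef"    // CSI handle of a live volume

	// Pre-provisioned (static) content: references an existing snapshot,
	// so the sidecar does not call CreateSnapshot. This is what Velero
	// used to sync.
	static := snapshotv1.VolumeSnapshotContentSource{SnapshotHandle: &snapshotHandle}

	// Dynamically provisioned content: the sidecar interprets this as a
	// request to take a new snapshot of the volume and calls CreateSnapshot.
	// This is what the sync now sets by mistake.
	dynamic := snapshotv1.VolumeSnapshotContentSource{VolumeHandle: &volumeHandle}

	fmt.Println(static.SnapshotHandle != nil, dynamic.VolumeHandle != nil)
}
```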

What did you expect to happen:

The following information will help us better understand what's going on:

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue. For more options, refer to velero debug --help.

If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

Anything else you would like to add:

Environment:

  • Velero version (use velero version):
  • Velero features (use velero client config get features):
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@anshulahuja98 (Collaborator, Author)

Initial fix is done in #7924

@Lyndon-Li (Contributor)

@anshulahuja98 @blackpiglet For 1.14.1, I think the initial fix #7924 is enough, right?
@anshulahuja98 Could you also cherry-pick #7924 to the 1.13 branch?

@anshulahuja98 (Collaborator, Author)

@Lyndon-Li we also need to merge #7983.
Can you please help with the review?

@anshulahuja98 (Collaborator, Author)

After that, I can help with cherry-picking both commits.

@blackpiglet (Contributor)

Adding more comments about this fix so users can understand how it works and its limitations.

The fix resets the VolumeSnapshotContent.Spec.Source from VolumeHandle to SnapshotHandle during the backup sync process. This makes the VolumeSnapshotContent statically provisioned, preventing the CSI snapshot controller from re-creating the snapshot.
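
A minimal sketch of that reset (illustrative only, not Velero's actual implementation; it assumes the synced object's Status.SnapshotHandle carries the handle of the snapshot that was already taken):

```go
package backupsync // hypothetical package name

import (
	snapshotv1 "github.com/kubernetes-csi/external-snapshotter/client/v6/apis/volumesnapshot/v1"
)

// resetVSCSource converts a dynamically provisioned VolumeSnapshotContent
// into a pre-provisioned (static) one before it is created in the target
// cluster, so the CSI sidecar will not call CreateSnapshot for it.
func resetVSCSource(vsc *snapshotv1.VolumeSnapshotContent) {
	if vsc.Status != nil && vsc.Status.SnapshotHandle != nil {
		vsc.Spec.Source = snapshotv1.VolumeSnapshotContentSource{
			// Reference the existing CSI snapshot instead of the volume.
			SnapshotHandle: vsc.Status.SnapshotHandle,
		}
	}
}
```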

However, this modification is not persisted in the backup metadata in the backup's object storage, so the VolumeSnapshotContent in the backup's volumesnapshotcontents.json.gz differs from the synced VolumeSnapshotContent created in the k8s cluster.

That should be fixed in the v1.15.0 Velero release by removing the VolumeSnapshotContent from the backup metadata.

@Lyndon-Li (Contributor)

Reopening for cherry-picks.
