Error VolumeSnapshot vaicloud-dev/cephfs-pvc-snapshot does not have a velero.io/csi-volumesnapshot-handle annotation #8444
This is not expected.

@erichevers

Hi @blackpiglet ,
Thanks for collecting the debug bundle. There were three VolumeSnapshots included in the backup, and they were not created by the backup. snapshot.storage.k8s.io/v1/VolumeSnapshot:
- vaicloud-dev/cephfs-pvc-snapshot
- vaicloud-dev/velero-vaicloud-mq-volume-rsjnj
- vaicloud-dev/velero-vaicloud-postgresql-volume-bgfhz

Velero also ran the VolumeSnapshot BackupItemAction against them. The only reason the restore failed to restore the VolumeSnapshots is that the backup-included VolumeSnapshots didn't have the `velero.io/csi-volumesnapshot-handle` annotation.
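The gist of that check can be sketched in a few lines. This is a hedged illustration (plain dicts standing in for VolumeSnapshot objects, not Velero's actual source): a snapshot can only be re-created on restore from its `velero.io/csi-volumesnapshot-handle` annotation, so snapshots that lack it are skipped with the error in the title.

```python
# Illustrative sketch, not Velero's real code: split backed-up VolumeSnapshots
# by whether they carry the handle annotation set at backup time.
HANDLE_ANNOTATION = "velero.io/csi-volumesnapshot-handle"

def restorable_snapshots(snapshots):
    """Return (restorable, skipped) lists based on the handle annotation."""
    restorable, skipped = [], []
    for snap in snapshots:
        annotations = snap.get("metadata", {}).get("annotations") or {}
        (restorable if HANDLE_ANNOTATION in annotations else skipped).append(snap)
    return restorable, skipped

# Names mirror the snapshots in this thread.
snapshots = [
    {"metadata": {"name": "cephfs-pvc-snapshot", "annotations": {}}},
    {"metadata": {"name": "velero-vaicloud-mq-volume-rsjnj",
                  "annotations": {HANDLE_ANNOTATION: "snap-handle-123"}}},
]
ok, skipped = restorable_snapshots(snapshots)
print([s["metadata"]["name"] for s in skipped])  # → ['cephfs-pvc-snapshot']
```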
Hi @blackpiglet , On the Prod01 cluster I've checked the VolumeSnapshots and indeed there are three:

```
kubectl get volumesnapshots
NAME                                      READYTOUSE   SOURCEPVC                    SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                SNAPSHOTCONTENT                                    CREATIONTIME   AGE
cephfs-pvc-snapshot                       false        cephfs-pvc                                                         csi-cephfsplugin-snapclass                                                                      41d
velero-vaicloud-mq-volume-rsjnj           true         vaicloud-mq-volume                                   2Gi           csi-rbdplugin-snapclass      snapcontent-131fbc1b-92f6-4980-87e9-997d4aef74c3   3d7h           3d7h
velero-vaicloud-postgresql-volume-bgfhz   true         vaicloud-postgresql-volume                           30Gi          csi-rbdplugin-snapclass      snapcontent-89dd62b0-7dae-4297-83cb-dc4d1b97db86   3d7h           3d7h
```

I don't know where the cephfs snapshot is coming from, but the describe shows that it is in a failed state:

```
kubectl describe volumesnapshot cephfs-pvc-snapshot
Status:
  Error:
    Message:   Failed to create snapshot content with error snapshot controller failed to update cephfs-pvc-snapshot on API server: cannot get claim from snapshot
    Time:      2024-10-15T13:35:37Z
  Ready To Use:  false
```

I also looked at the other two snapshots and they came from another backup. The datasnapshots got created and removed, as they should be.
This time there was no message about the VolumeSnapshot as in the original case. However, the restore job stays in: below is the debug logfile of the restore. Regards
From the log, I think the restore worked as expected. The data mover restore may take longer than the CSI snapshot restore, because the data mover restore needs to create a temporary pod and PVC to host the restored data.
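The data mover path restores through the node agent, which (in recent Velero versions) tracks its progress in DataDownload custom resources. A minimal sketch of waiting for that longer-running phase, with the API call mocked out; `get_phases`, the phase names, and the polling shape are illustrative assumptions, not Velero's exact API:

```python
import time

# Hypothetical watcher sketch: poll DataDownload phases until all are terminal.
# get_phases stands in for a real Kubernetes API call; here it is injected.
TERMINAL = {"Completed", "Failed", "Cancelled"}

def wait_for_datadownloads(get_phases, poll_seconds=0, max_polls=10):
    """Poll until every DataDownload phase is terminal; return the final phases."""
    for _ in range(max_polls):
        phases = get_phases()
        if all(p in TERMINAL for p in phases.values()):
            return phases
        time.sleep(poll_seconds)
    raise TimeoutError("data mover restore still in progress")

# Simulated progression: one download finishes a poll later than the other.
states = iter([
    {"dd-1": "InProgress", "dd-2": "Completed"},
    {"dd-1": "Completed", "dd-2": "Completed"},
])
result = wait_for_datadownloads(lambda: next(states))
print(result)  # → {'dd-1': 'Completed', 'dd-2': 'Completed'}
```

In practice the equivalent would be watching `kubectl -n velero get datadownloads` until the restore's downloads complete, rather than assuming the restore has hung.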
Some clarification about the scenario of this issue:
Although this is a rainy-day case, we may also consider whether Velero should handle it instead of reporting an error.
Hi @blackpiglet ,
And the pods are still in Pending. Regards
@erichevers
@blackpiglet ,

```
kubectl describe pvc vaicloud-postgresql-volume -n vaicloud-dev
```

gives:

```
Name:          vaicloud-postgresql-volume
Namespace:     vaicloud-dev
StorageClass:  rook-ceph-block
Status:        Pending
Volume:
Labels:        velero.io/backup-name=vaicloud-dev-backup22112024-4
               velero.io/restore-name=restore-test
               velero.io/volume-snapshot-name=velero-vaicloud-postgresql-volume-jkmhl
Annotations:   backup.velero.io/must-include-additional-items: true
               velero.io/csi-volumesnapshot-class: csi-rbdplugin-snapclass
               volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
               volume.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       vaicloud-db-7846b4c4cd-25k8w
Events:
  Type    Reason                Age                     From                         Message
  ----    ------                ----                    ----                         -------
  Normal  ExternalProvisioning  3m53s (x2923 over 12h)  persistentvolume-controller  Waiting for a volume to be created either by the external provisioner 'rook-ceph.rbd.csi.ceph.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
  Normal  Provisioning          27s (x205 over 12h)     rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-54b4855f96-b95cx_0ee214d7-199b-4fcc-8748-dfa6b513df21  External provisioner is provisioning volume for claim "vaicloud-dev/vaicloud-postgresql-volume"
```

Regards
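The repeat counts in those events (x2923 over 12h) are themselves a useful signal. A hedged sketch, assuming events reduced to (reason, repeat-count) pairs, of flagging a PVC whose external provisioning has stalled rather than merely being slow:

```python
# Illustrative heuristic: a PVC whose events are still only "waiting" reasons
# after many retries has likely stalled; a ProvisioningSucceeded event clears it.
def provisioning_stalled(events, repeat_threshold=100):
    """events: list of (reason, repeat_count) tuples from `kubectl describe pvc`."""
    waiting = {"ExternalProvisioning", "Provisioning"}
    succeeded = any(reason == "ProvisioningSucceeded" for reason, _ in events)
    retries = max((n for r, n in events if r in waiting), default=0)
    return not succeeded and retries >= repeat_threshold

# Counts mirror the describe output above vs. a healthy bound PVC.
pending = [("ExternalProvisioning", 2923), ("Provisioning", 205)]
bound = [("ExternalProvisioning", 1), ("Provisioning", 1), ("ProvisioningSucceeded", 1)]
print(provisioning_stalled(pending), provisioning_stalled(bound))  # → True False
```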
To me, the error should be related to Rook Ceph not creating a volume for the PVC in time.
Created a new issue #8460 to address this comment.
Hi @blackpiglet , I did a quick test. I deleted the pending PVC vaicloud-mq-volume and applied:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vaicloud-mq-volume
  namespace: vaicloud-dev
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: rook-ceph-block
  resources:
    requests:
      storage: 2Gi
```

The new PVC got Bound and the pod was immediately Running.

Prod01:

```
kubectl describe pvc vaicloud-mq-volume -n vaicloud-dev
Name:          vaicloud-mq-volume
Namespace:     vaicloud-dev
StorageClass:  rook-ceph-block
Status:        Bound
Volume:        pvc-b6de9f16-8a9d-41ae-ae2e-a5fc715377c0
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               velero.io/csi-volumesnapshot-class: csi-rbdplugin-snapclass
               volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
               volume.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      2Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       vaicloud-mq-54d4469b99-qfrzm
Events:        <none>
```
```
kubectl config use-context dr01
Switched to context "dr01".
kubectl describe pvc vaicloud-mq-volume -n vaicloud-dev
Name:          vaicloud-mq-volume
Namespace:     vaicloud-dev
StorageClass:  rook-ceph-block
Status:        Bound
Volume:        pvc-1db2cbaa-347b-4ab1-8f0b-9a803c28a393
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
               volume.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      2Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       vaicloud-mq-54d4469b99-qfrzm
Events:
  Type    Reason                 Age    From                         Message
  ----    ------                 ----   ----                         -------
  Normal  ExternalProvisioning   4m35s  persistentvolume-controller  Waiting for a volume to be created either by the external provisioner 'rook-ceph.rbd.csi.ceph.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
  Normal  Provisioning           4m35s  rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-54b4855f96-b95cx_0ee214d7-199b-4fcc-8748-dfa6b513df21  External provisioner is provisioning volume for claim "vaicloud-dev/vaicloud-mq-volume"
  Normal  ProvisioningSucceeded  4m35s  rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-54b4855f96-b95cx_0ee214d7-199b-4fcc-8748-dfa6b513df21  Successfully provisioned volume pvc-1db2cbaa-347b-4ab1-8f0b-9a803c28a393
```

So to me it seems that Rook-Ceph is working correctly. I did a new restore test and now I saw the following in the events: Could this be an issue? Regards
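One way to narrow down why the restored PVC stays Pending while the hand-made one binds is to diff their metadata. A small sketch (annotation values taken from the describe outputs in this thread; the helper itself is illustrative) isolating annotations that only the restore added:

```python
# Illustrative diff: annotations present on the restored (Pending) PVC but not
# on the manually created (Bound) one, i.e. metadata added by the restore.
def annotation_diff(restored, manual):
    """Return annotations present only on the restored PVC."""
    return {k: v for k, v in restored.items() if k not in manual}

restored = {
    "backup.velero.io/must-include-additional-items": "true",
    "velero.io/csi-volumesnapshot-class": "csi-rbdplugin-snapclass",
    "volume.kubernetes.io/storage-provisioner": "rook-ceph.rbd.csi.ceph.com",
}
manual = {
    "pv.kubernetes.io/bind-completed": "yes",
    "volume.kubernetes.io/storage-provisioner": "rook-ceph.rbd.csi.ceph.com",
}
print(sorted(annotation_diff(restored, manual)))
# → ['backup.velero.io/must-include-additional-items', 'velero.io/csi-volumesnapshot-class']
```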
It's better to collect the timed-out restore's debug bundle to investigate what exactly happened.
What steps did you take and what happened:
I did a restore from a backup made from a PVC on Rook-Ceph on cluster prod01 to cluster dr01 with:

```
velero restore create restore-test --include-namespaces vaicloud-dev --from-backup vaicloud-dev-backup22112024-2
```

The backup was made using --snapshot-move-data to S3-compatible storage.
What did you expect to happen:
I expected the restore to succeed, but I got the following error:
I have set the requested annotation on rbd and cephfs, on both clusters. Also, the volumes that need to be restored are using the rook-ceph-block StorageClass, not cephfs as the failure message indicates, so I'm wondering why this restore fails with a reference to cephfs.
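For context on why a cephfs class shows up at all: Velero's CSI support picks a VolumeSnapshotClass by matching its `driver` to the PVC's StorageClass provisioner, among classes marked with the `velero.io/csi-volumesnapshot-class: "true"` label. A hedged sketch of that matching (the selection logic here is an assumption based on the documented behavior, not Velero's source; class names mirror this thread):

```python
# Illustrative selection: choose the snapshot class whose driver matches the
# PVC's provisioner, among classes opted in via the velero.io label.
SELECTOR = "velero.io/csi-volumesnapshot-class"

def pick_snapshot_class(provisioner, snapshot_classes):
    """Return the name of the matching VolumeSnapshotClass, or None."""
    for sc in snapshot_classes:
        if sc["driver"] == provisioner and sc.get("labels", {}).get(SELECTOR) == "true":
            return sc["name"]
    return None

classes = [
    {"name": "csi-cephfsplugin-snapclass", "driver": "rook-ceph.cephfs.csi.ceph.com",
     "labels": {SELECTOR: "true"}},
    {"name": "csi-rbdplugin-snapclass", "driver": "rook-ceph.rbd.csi.ceph.com",
     "labels": {SELECTOR: "true"}},
]
print(pick_snapshot_class("rook-ceph.rbd.csi.ceph.com", classes))
# → csi-rbdplugin-snapclass
```

Under this matching, an rbd-provisioned volume should never select the cephfs class; the cephfs reference in the error comes from the stray backed-up VolumeSnapshot, not from class selection.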
The following information will help us better understand what's going on:
If you are using velero v1.7.0+:
Please use

```
velero debug --backup <backupname> --restore <restorename>
```

to generate the support bundle and attach it to this issue; for more options, refer to velero debug --help.
bundle-2024-11-22-16-16-21.tar.gz
If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)
kubectl logs deployment/velero -n velero
velero backup describe <backupname>
or kubectl get backup/<backupname> -n velero -o yaml
velero backup logs <backupname>
velero restore describe <restorename>
or kubectl get restore/<restorename> -n velero -o yaml
velero restore logs <restorename>
Anything else you would like to add:
Environment:
- Velero version (use velero version): Version 1.15.0 on both clusters
- Velero features (use velero client config get features): features: EnableCSI
- Kubernetes version (use kubectl version): 1.30.0 on the prod01 cluster (backup) and 1.31.1 on dr01 (restore)
- OS (e.g. from /etc/os-release):

Vote on this issue!
This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.