I've been playing around with the operator a bit. I rely heavily on snapshots for my backups and noticed that they no longer get cleaned up. I'm using 2 replicas for the disks.
Some more information:
- The snapshots get successfully deleted from one host (kerrigan03).
- The other host still contains the snapshot.
- It is possible to delete the snapshot directly with `zfs destroy`.
- Deleting the snapshot manually and then restarting the satellite pod resolves the issue.
- The snapshot was in use by a PVC.
- The PVC and the VolumeSnapshot were deleted at more or less the same time; this feels like a race condition.
root@kerrigan02:/var/log/linstor-satellite# cat ErrorReport-66C45CF9-C43D0-000001.log
ERROR REPORT 66C45CF9-C43D0-000001
============================================================
Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.28.0
Build ID: 959382f7b4fb9436fefdd21dfa262e90318edaed
Build time: 2024-07-11T10:21:06+00:00
Error time: 2024-08-20 09:18:57
Node: kerrigan02
Thread: DeviceManager
============================================================
Reported error:
===============
Category: LinStorException
Class name: StorageException
Class canonical name: com.linbit.linstor.storage.StorageException
Generated at: Method 'checkExitCode', Source file 'ExtCmdUtils.java', Line #69
Error message: Failed to delete zfs snapshot
Error context:
An error occurred while processing snapshot 'snapshot-765e2075-d352-402b-a085-c8fd2f501a20' of resource 'pvc-488b72a9-7753-41ac-b03e-df7c311e4b12'
ErrorContext:
Details: Command 'zfs destroy zfspv-pool/linstor/pvc-488b72a9-7753-41ac-b03e-df7c311e4b12_00000@snapshot-765e2075-d352-402b-a085-c8fd2f501a20' returned with exitcode 1.
Standard out:
Error message:
cannot destroy 'zfspv-pool/linstor/pvc-488b72a9-7753-41ac-b03e-df7c311e4b12_00000@snapshot-765e2075-d352-402b-a085-c8fd2f501a20': snapshot has dependent clones
use '-R' to destroy the following datasets:
zfspv-pool/linstor/pvc-158db1b3-a10a-4424-af3e-6149c63adc72_00000
Call backtrace:
Method Native Class:Line number
checkExitCode N com.linbit.extproc.ExtCmdUtils:69
genericExecutor N com.linbit.linstor.storage.utils.Commands:103
genericExecutor N com.linbit.linstor.storage.utils.Commands:63
delete N com.linbit.linstor.layer.storage.zfs.utils.ZfsCommands:104
deleteSnapshotImpl N com.linbit.linstor.layer.storage.zfs.ZfsProvider:469
deleteSnapshotImpl N com.linbit.linstor.layer.storage.zfs.ZfsProvider:70
deleteSnapshot N com.linbit.linstor.layer.storage.AbsStorageProvider:810
processSnapshotVolumes N com.linbit.linstor.layer.storage.AbsStorageProvider:392
processSnapshot N com.linbit.linstor.layer.storage.StorageLayer:333
processGeneric N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:949
processGeneric N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:967
processSnapshot N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:919
processSnapshots N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:610
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:211
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:331
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1204
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:778
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:672
run N java.lang.Thread:840
END OF ERROR REPORT.
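For context, the "snapshot has dependent clones" failure is inherent to ZFS: a snapshot cannot be destroyed while a clone created from it still exists. A minimal reproduction outside of LINSTOR (pool and dataset names here are made up for illustration):

```shell
# Assumes an existing ZFS pool named "tank"; all dataset names are examples.
zfs create tank/vol
zfs snapshot tank/vol@snap
zfs clone tank/vol@snap tank/clone

# This fails with "snapshot has dependent clones":
zfs destroy tank/vol@snap || true

# Destroy the clone first, then the snapshot can be removed:
zfs destroy tank/clone
zfs destroy tank/vol@snap
```

(`zfs promote tank/clone` would also detach the clone from its origin snapshot without deleting it.)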
I'll gladly provide more information if necessary.
I suspect there is a race condition between the cloned PVC and the snapshot. ZFS does not allow deleting a snapshot that is still used by a cloned volume. Because the CSI driver receives separate requests for deleting the volume and the snapshot, it may try to delete the snapshot first, which then fails. After the volume has been deleted, I guess LINSTOR should retry deleting the snapshot.
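A rough sketch of the retry behaviour I have in mind (this is not LINSTOR code; all names are hypothetical): when snapshot deletion fails because a clone still depends on it, keep the snapshot queued and retry once a dependent volume has been deleted.

```python
class DependentCloneError(Exception):
    """Raised when a snapshot still has dependent clones (cf. 'zfs destroy')."""


class SnapshotGC:
    """Hypothetical sketch: defer snapshot deletion instead of failing permanently."""

    def __init__(self, destroy):
        self.destroy = destroy   # callable that may raise DependentCloneError
        self.pending = set()     # snapshots whose deletion was deferred

    def delete_snapshot(self, snap):
        try:
            self.destroy(snap)
        except DependentCloneError:
            # A clone still depends on this snapshot: defer, don't give up.
            self.pending.add(snap)

    def on_volume_deleted(self):
        # A (possibly cloned) volume is gone; retry every deferred snapshot.
        for snap in list(self.pending):
            try:
                self.destroy(snap)
                self.pending.discard(snap)
            except DependentCloneError:
                pass  # still blocked by another clone; keep it queued
```

With this ordering, the "delete snapshot, then delete clone" race from above would self-heal once the clone's deletion triggers `on_volume_deleted()`.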
WanzenBug transferred this issue from piraeusdatastore/piraeus-operator on Aug 26, 2024
EDIT:
I can reproduce the issue with https://kubestr.io/:
Let it fail, then check the volume snapshots on the cluster.
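Roughly how I check for leftovers after a failed run (the `kubestr csicheck` flags are from memory, so double-check against `kubestr --help`; the ZFS dataset path is the one from the error report above):

```shell
# Run kubestr's CSI snapshot/restore check against the LINSTOR storage class.
kubestr csicheck -s <storage-class> -v <volume-snapshot-class>

# Afterwards, look for VolumeSnapshots that were left behind:
kubectl get volumesnapshot --all-namespaces

# And on each node, check for ZFS snapshots that should already be gone:
zfs list -t snapshot -r zfspv-pool/linstor
```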