Deleting a PV before a PVC results in an orphaned volume and bdev #1269

Closed
kylebrooks-8451 opened this issue Dec 11, 2022 · 17 comments


kylebrooks-8451 commented Dec 11, 2022

Edit:

I tried to delete a PV before deleting the PVC, and the kubectl delete pv command hung. I stopped it with Ctrl-C, deleted the PVC first, and then the PV. This resulted in an orphaned volume and bdev. I then tried to use io-engine-client to destroy the bdev manually, but it failed:

docker run openebs/mayastor-io-engine-client:release-2.0 -b talos-df7-egr bdev list
UUID                                 NUM_BLOCKS BLK_SIZE CLAIMED_BY NAME                                 SHARE_URI
8c1abb67-b96c-47f5-be95-4c6e96bbc4da 167772160  512      lvol       /dev/sdb                             bdev:////dev/sdb
f517fe5e-3f81-4484-b597-634483fe8d2d 2097152    512      Orphaned   f517fe5e-3f81-4484-b597-634483fe8d2d bdev:///f517fe5e-3f81-4484-b597-634483fe8d2d
docker run openebs/mayastor-io-engine-client:release-2.0 -b talos-df7-egr bdev destroy f517fe5e-3f81-4484-b597-634483fe8d2d
Error: GrpcStatus { source: Status { code: InvalidArgument, message: "Error parsing URI ''", metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Sun, 11 Dec 2022 15:03:42 GMT"} }, source: None }, backtrace: Backtrace(()) }

Expected behavior
The block device is destroyed. For some reason this block device was not destroyed by Kubernetes even though the StorageClass had reclaimPolicy: Delete.

OS info:

  • Distro: Docker images
  • MayaStor revision or container image: 1.0.4

kylebrooks-8451 commented Dec 11, 2022

I think the issue is that the uri is empty for this bdev:

{
  "aliases": "talos-df7-egr/f517fe5e-3f81-4484-b597-634483fe8d2d",
  "blk_size": 512,
  "claimed": false,
  "claimed_by": "Orphaned",
  "name": "f517fe5e-3f81-4484-b597-634483fe8d2d",
  "num_blocks": 2097152,
  "product_name": "Logical Volume",
  "share_uri": "bdev:///f517fe5e-3f81-4484-b597-634483fe8d2d",
  "uri": "",
  "uuid": "f517fe5e-3f81-4484-b597-634483fe8d2d"
}

@tiagolobocastro (Contributor)

Using gRPC to talk to the io-engine should be a last resort.
If you run kubectl-mayastor get volumes, does it return any volumes?

@kylebrooks-8451 (Author)

Yes, it still shows the volumes:

kubectl mayastor get volumes
 ID                                    REPLICAS  TARGET-NODE  ACCESSIBILITY  STATUS  SIZE
 327b4e55-589c-42e2-baf9-d1bdadf63366  1         <none>       <none>         Online  1073741824

@tiagolobocastro (Contributor)

Great, and are the PVC and PV gone?

@kylebrooks-8451 (Author)

Yes, they are both gone. If it helps, here is the YAML for the volume:

---
spec:
  labels:
    local: "true"
  num_replicas: 1
  size: 1073741824
  status: Created
  uuid: 327b4e55-589c-42e2-baf9-d1bdadf63366
  topology:
    node_topology:
      explicit:
        allowed_nodes:
          - talos-proxmox-0
          - talos-proxmox-1
          - talos-tu8-r34
          - talos-b2m-7a0
          - talos-df7-egr
          - talos-j92-0u1
          - talos-srn-grj
          - talos-ixc-r5k
        preferred_nodes:
          - talos-df7-egr
          - talos-ixc-r5k
          - talos-j92-0u1
          - talos-proxmox-0
          - talos-proxmox-1
          - talos-srn-grj
          - talos-tu8-r34
          - talos-b2m-7a0
    pool_topology:
      labelled:
        exclusion: {}
        inclusion:
          openebs.io/created-by: msp-operator
  policy:
    self_heal: true
state:
  size: 1073741824
  status: Online
  uuid: 327b4e55-589c-42e2-baf9-d1bdadf63366
  replica_topology:
    f517fe5e-3f81-4484-b597-634483fe8d2d:
      node: talos-df7-egr
      pool: talos-df7-egr
      state: Online

@tiagolobocastro (Contributor)

How strange; with the reclaim policy set to Delete it should have deleted it, right @abhilashshetty04?
Could we get logs from the csi-controller?

@kylebrooks-8451 (Author)

Here are the logs. One thing to note: I did try to delete the PV before the PVC and had to Ctrl-C the kubectl delete pv command and delete the PVC first. I'm wondering if I created the race condition indicated in csi-attacher-log.txt.

csi-attacher-log.txt
csi-provisioner-log.txt
csi-controller-log.txt

I1210 18:40:58.639219       1 csi_handler.go:276] Detaching "csi-9128a61d8587a1d12b1e6a9091d79ecf6164e5e28f9a90271ee5f46cb8c51eb1"
I1210 18:40:58.720361       1 csi_handler.go:583] Detached "csi-9128a61d8587a1d12b1e6a9091d79ecf6164e5e28f9a90271ee5f46cb8c51eb1"
I1210 18:40:58.749118       1 csi_handler.go:276] Detaching "csi-9128a61d8587a1d12b1e6a9091d79ecf6164e5e28f9a90271ee5f46cb8c51eb1"
I1210 18:40:58.755756       1 csi_handler.go:583] Detached "csi-9128a61d8587a1d12b1e6a9091d79ecf6164e5e28f9a90271ee5f46cb8c51eb1"
I1210 18:40:58.769362       1 csi_handler.go:283] Failed to save detach error to "csi-9128a61d8587a1d12b1e6a9091d79ecf6164e5e28f9a90271ee5f46cb8c51eb1": volumeattachments.storage.k8s.io "csi-9128a61d8587a1d12b1e6a9091d79ecf6164e5e28f9a90271ee5f46cb8c51eb1" not found
I1210 18:40:58.769780       1 csi_handler.go:228] Error processing "csi-9128a61d8587a1d12b1e6a9091d79ecf6164e5e28f9a90271ee5f46cb8c51eb1": failed to detach: could not mark as detached: volumeattachments.storage.k8s.io "csi-9128a61d8587a1d12b1e6a9091d79ecf6164e5e28f9a90271ee5f46cb8c51eb1" not found
I1210 18:41:17.331177       1 csi_handler.go:708] Removed finalizer from PV "pvc-327b4e55-589c-42e2-baf9-d1bdadf63366"

@tiagolobocastro (Contributor)

Oh right, if you try to delete the PV manually, I don't think we support that as things stand, @abhilashshetty04?

@tiagolobocastro (Contributor)

To recover from this you'd have to delete the mayastor volume using the REST API, something like this:
curl -X 'DELETE' 'http://node:30011/v0/volumes/327b4e55-589c-42e2-baf9-d1bdadf63366' -H 'accept: */*'
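
If the volume UUID isn't known up front, the same REST API can list volumes first. A minimal sketch, assuming the API is reachable on the same NodePort as above (the placeholder UUID is hypothetical):

# list volumes to find the UUID of the orphan
curl 'http://node:30011/v0/volumes' -H 'accept: application/json'
# then delete it by UUID
curl -X 'DELETE' 'http://node:30011/v0/volumes/<volume-uuid>' -H 'accept: */*'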

@kylebrooks-8451 (Author)

@tiagolobocastro That command deleted the volume and bdev, thank you!

@kylebrooks-8451 changed the title from "io-engine-client can't destroy orphaned bdevs" to "Deleting a PV before a PVC results in an orphaned volume and bdev" on Dec 12, 2022
@abhilashshetty04 (Contributor)

> How strange; with the reclaim policy set to Delete it should have deleted it, right @abhilashshetty04? Could we get logs from the csi-controller?

Yes, it should have been deleted by the CSI driver.


kvzn commented Apr 11, 2023

How I deleted the volume:
I tried several of the mayastor pods; none of them have curl or apt, so I had to find another way:

user@mbp2023 ~ % kubectl get svc -n mayastor
NAME                             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)               AGE
mayastor-agent-core              ClusterIP   10.128.39.22     <none>        50051/TCP,50052/TCP   6h13m
mayastor-api-rest                ClusterIP   10.128.209.239   <none>        8080/TCP,8081/TCP     6h13m
mayastor-etcd                    ClusterIP   10.128.176.26    <none>        2379/TCP,2380/TCP     6h13m
mayastor-etcd-headless           ClusterIP   None             <none>        2379/TCP,2380/TCP     6h13m
mayastor-loki                    ClusterIP   10.128.186.128   <none>        3100/TCP              6h13m
mayastor-loki-headless           ClusterIP   None             <none>        3100/TCP              6h13m
mayastor-metrics-exporter-pool   ClusterIP   10.128.39.212    <none>        9502/TCP              6h13m

Noticing the mayastor-api-rest line, I tried this:

user@mbp2023 ~ % kubectl port-forward deployment/mayastor-api-rest -n mayastor 8081:8081
Forwarding from 127.0.0.1:8081 -> 8081
Forwarding from [::1]:8081 -> 8081
Handling connection for 8081
Handling connection for 8081

And this:

user@mbp2023 ~ % curl -X 'DELETE' 'http://127.0.0.1:8081/v0/volumes/327b4e55-589c-42e2-baf9-d1bdadf63366' -H 'accept: */*'
{"details":"Volume '327b4e55-589c-42e2-baf9-d1bdadf63366' not found","message":"SvcError :: VolumeNotFound","kind":"NotFound"}
user@mbp2023 ~ % curl -X 'DELETE' 'http://127.0.0.1:8081/v0/volumes/0a59e089-5db5-4221-9b53-ecdd854a99ec' -H 'accept: */*'
user@mbp2023 ~ %

Bang! It’s done!
Thank you @tiagolobocastro

@nullzone

I had exactly the same issue.
Could the garbage-collection job be extended to clean up these orphaned volumes, or better yet, not delete the entries when a PV deletion is requested until the deletion actually happens? When a PV is deleted while an existing PVC is still bound, the deletion gets stuck (or should) until the PVC is also deleted.
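
For reference, it is the kubernetes.io/pv-protection finalizer that keeps kubectl delete pv hanging while a PVC is still bound; a quick way to see it (PV name taken from earlier in this thread, purely illustrative):

# a bound PV carries the pv-protection finalizer, so its deletion is deferred
kubectl get pv pvc-327b4e55-589c-42e2-baf9-d1bdadf63366 -o jsonpath='{.metadata.finalizers}'
# ["kubernetes.io/pv-protection"]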

Anyhow, thank you for all of the above. It helped me detect the orphaned volumes and gave me a way to clean up and free the blocked resources.

@tiagolobocastro (Contributor)

I guess we need to try and repro this first.
So let me double-check what the flow is (a repro sketch follows the list):

  1. create pvc ---> bound
  2. delete pv ---> gets stuck because pvc is still bound
  3. delete pvc ---> pv is deleted but mayastor volume remains!??
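
A minimal sketch of that repro, assuming a PVC named test-pvc bound to a mayastor StorageClass (all names are hypothetical):

# 1. PVC exists and is bound; note the PV name
kubectl get pvc test-pvc -o jsonpath='{.spec.volumeName}'
# 2. delete the PV first; this hangs on the pv-protection finalizer, so give up after a timeout
kubectl delete pv pvc-<uuid> --timeout=10s
# 3. delete the PVC; the PV is now removed, but the mayastor volume should remain
kubectl delete pvc test-pvc
kubectl mayastor get volumes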

@nullzone

Right.

@tiagolobocastro (Contributor)

Automatic GC is being added in openebs/mayastor-control-plane#724.
As a workaround on the current release, please restart the csi-controller pod.
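
A minimal sketch of that workaround, assuming the csi-controller runs as a Deployment named mayastor-csi-controller in the mayastor namespace (names may differ per install):

# restarting the csi-controller lets it re-reconcile and clean up the orphaned volume
kubectl -n mayastor rollout restart deployment/mayastor-csi-controller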
