
Uncertain handling for attach #400

Open
msau42 opened this issue Dec 28, 2022 · 17 comments
Labels
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@msau42
Collaborator

msau42 commented Dec 28, 2022

It looks like we check for finalError on ControllerPublishVolume:

return nil, isFinalError(err), err

But we ignore it later on:

// We're not interested in `detached` return value, the controller will

And we rely on the driver's implementation of ControllerUnpublishVolume to properly detect whether an attach operation is still in progress.

This differs from how we handle uncertain states for other volume operations, such as provision and mount. Should we consider adding uncertain handling for attach?
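For context, here is a minimal Go sketch of the final-vs-temporary error classification. It mirrors the isFinalError pattern used in the CSI sidecars, but the exact set of codes below follows csi-provisioner and is illustrative rather than a quote of the attacher's source:

```go
package main

import (
	"fmt"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// isFinalError reports whether a failed CSI RPC is definitely done, or
// whether the operation may still be running in the storage backend.
func isFinalError(err error) bool {
	st, ok := status.FromError(err)
	if !ok {
		// Not a gRPC status: the call failed before reaching the
		// driver, so nothing can still be in flight.
		return true
	}
	switch st.Code() {
	case codes.Canceled, // the client canceled the request
		codes.DeadlineExceeded,  // timeout: the backend may still finish the attach
		codes.Unavailable,       // connection broken or server shutting down
		codes.ResourceExhausted, // server temporarily out of resources
		codes.Aborted:           // CSI: an operation is already pending for the volume
		return false // uncertain: the operation may still complete later
	}
	// Every other code means the driver definitively rejected the call.
	return true
}

func main() {
	fmt.Println(isFinalError(status.Error(codes.DeadlineExceeded, "timed out")))  // false: uncertain
	fmt.Println(isFinalError(status.Error(codes.InvalidArgument, "bad request"))) // true: final
}
```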

@msau42
Collaborator Author

msau42 commented Dec 28, 2022

cc @jsafrane @gnufied

@jsafrane
Contributor

jsafrane commented Jan 5, 2023

IIRC, the external-attacher always calls ControllerUnpublish when a VolumeAttachment gets a DeletionTimestamp, regardless of whether the previous attach succeeded, or whether it failed with a final or a temporary error. In this sense, an attachment is always "uncertain" until fully attached.

It could be possible to optimize this by marking attachments as "not attached" after a final ControllerPublish error and skipping ControllerUnpublish in that case. I am not sure it's worth the effort.
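To make that behavior concrete, here is a minimal sketch (not the actual external-attacher source; csiAttach and csiDetach are hypothetical stand-ins for the ControllerPublishVolume/ControllerUnpublishVolume wrappers) of a reconcile step that always detaches once a DeletionTimestamp is present:

```go
package main

import (
	"fmt"

	storagev1 "k8s.io/api/storage/v1"
)

// Hypothetical stand-ins for the ControllerPublishVolume /
// ControllerUnpublishVolume RPC wrappers.
func csiAttach(va *storagev1.VolumeAttachment) error { fmt.Println("attach", va.Name); return nil }
func csiDetach(va *storagev1.VolumeAttachment) error { fmt.Println("detach", va.Name); return nil }

// syncVA sketches the reconcile decision described above: once the
// VolumeAttachment is being deleted, detach is attempted unconditionally,
// without consulting whether the earlier attach succeeded, failed with a
// final error, or merely timed out.
func syncVA(va *storagev1.VolumeAttachment) error {
	if va.DeletionTimestamp != nil {
		return csiDetach(va)
	}
	return csiAttach(va)
}

func main() {
	va := &storagev1.VolumeAttachment{}
	va.Name = "csi-123"
	_ = syncVA(va) // attaches; set va.DeletionTimestamp to exercise the detach path
}
```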

@bswartz

bswartz commented Jan 5, 2023

I agree it sounds like it should not cause problems, but one could imagine that some plugin depends on the current Kubernetes behavior to avoid leaking resources.

My reading is that the spec's requirements for the CO are ambiguous in this area. There is a lot of "the CO may choose to" language that suggests the CO has little, if any, obligation here.

@msau42
Collaborator Author

msau42 commented Jan 6, 2023

The main challenge is that the current behavior relies on the plugin to potentially keep state about pending attach requests, which may be difficult. You can have a sequence like:

  1. Attach disk started
  2. Attach disk timed out on client side, but is still queued in the backend
  3. Pod is deleted and we trigger detach
  4. Detach returns success because disk is not attached to the node yet.
  5. Queued attach operation proceeds and attaches the disk to the old node.

Some ways that a plugin could address this are:

  • During step 4, check for pending attach operations
  • During step 4, issue a detach operation regardless and wait for the operation to complete

However, keeping track of pending operations may be difficult. For example, in GCP, operation metadata only provides the instance id, not the disk id. So the CSI driver would potentially have to: 1) serialize operations per instance, which would hurt performance; 2) cache operations in memory and only serialize on restart; or 3) keep operation state in some CRD.
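As a minimal sketch of option 2, assuming an in-memory approach (all names here are hypothetical, not the GCP driver's actual code), a cache could record at most one pending operation per volume so that a later detach notices a still-queued attach:

```go
package attachcache

import (
	"fmt"
	"sync"
)

// OpKind distinguishes which operation is currently in flight for a volume.
type OpKind string

const (
	OpAttach OpKind = "attach"
	OpDetach OpKind = "detach"
)

// Cache tracks at most one pending operation per volume ID. It is lost on
// restart, which is why a real driver might also serialize on shutdown or
// keep the state in a CRD, as noted above.
type Cache struct {
	mu      sync.Mutex
	pending map[string]OpKind // volume ID -> in-flight operation
}

func New() *Cache {
	return &Cache{pending: make(map[string]OpKind)}
}

// Start records an operation; it fails if another one is still pending for
// the same volume, which is how step 4 in the sequence above would notice
// the queued attach.
func (c *Cache) Start(volumeID string, kind OpKind) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if prev, ok := c.pending[volumeID]; ok {
		return fmt.Errorf("operation %q still pending for volume %s", prev, volumeID)
	}
	c.pending[volumeID] = kind
	return nil
}

// Finish clears the record once the backend reports the operation done.
func (c *Cache) Finish(volumeID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.pending, volumeID)
}
```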

@msau42
Collaborator Author

msau42 commented Jan 18, 2023

For reference, here is a prototype of adding operation caching in the GCP driver: kubernetes-sigs/gcp-compute-persistent-disk-csi-driver#923

@xing-yang
Contributor

Discussed at the triage meeting. We'll work on a design similar to how the provisioner works; the attach/detach operations will be synchronized with an operations cache.
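As a hypothetical illustration of that design, a driver's ControllerUnpublishVolume could consult an operations cache like the one sketched in the earlier comment and return a retryable error while an attach is still pending. The attachcache import path and the driver type are assumptions, not part of any real driver:

```go
package driver

import (
	"context"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"

	"example.com/attachcache" // hypothetical path to the cache sketched above
)

type driver struct {
	ops *attachcache.Cache
}

func (d *driver) ControllerUnpublishVolume(ctx context.Context, req *csi.ControllerUnpublishVolumeRequest) (*csi.ControllerUnpublishVolumeResponse, error) {
	// If an attach (or another detach) is still pending for this volume,
	// return Aborted so the CO retries instead of assuming "detached".
	if err := d.ops.Start(req.VolumeId, attachcache.OpDetach); err != nil {
		return nil, status.Error(codes.Aborted, err.Error())
	}
	defer d.ops.Finish(req.VolumeId)

	// ... issue the backend detach here and wait for it to complete ...
	return &csi.ControllerUnpublishVolumeResponse{}, nil
}
```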

@xing-yang
Contributor

/priority important-soon

@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jan 18, 2023
@xing-yang
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jan 18, 2023
@Sneha-at
Contributor

I will work on this issue; I'm unable to assign it to myself.

@k8s-triage-robot

This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged.
Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Deprioritize it with /priority important-longterm or /priority backlog
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels May 1, 2023
@msau42
Collaborator Author

msau42 commented Sep 12, 2023

/priority important-longterm
/triage accepted

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Sep 12, 2023
@k8s-triage-robot

This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged.
Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Deprioritize it with /priority important-longterm or /priority backlog
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot removed the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jan 19, 2024
@msau42
Collaborator Author

msau42 commented Mar 9, 2024

/priority important-longterm
/triage accepted

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Mar 9, 2024
@k8s-triage-robot

This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged.
Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Deprioritize it with /priority important-longterm or /priority backlog
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot removed the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jun 7, 2024
@msau42
Collaborator Author

msau42 commented Aug 2, 2024

/remove-priority important-soon

@k8s-ci-robot k8s-ci-robot removed the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Aug 2, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 31, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 1, 2024
@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Dec 1, 2024