Move recover from volume expansion feature to beta #4849
The KEP still lists this drawback, added 3 years ago, which still says "We plan to revisit and address this in next release": enhancements/keps/sig-storage/1790-recover-resize-failure/README.md, lines 497 to 499 in 90d712a
When do we plan to address it?
I do not mind leaving that issue unresolved. I do not think any of the folks using this feature have asked for this to be fixed. @msau42 what do you think?
90d712a to 1fd9585
/lgtm
/lgtm
approving the PRR /approve
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: deads2k, gnufied, jsafrane
Regarding being able to resize back to 0, I think it would be nice to have. Some storage systems have very large expansion increments of 100s of GBs or even TBs, and a user who doesn't realize that may want to find alternative ways to manage their space instead of expanding the size. But since we require that the size has to be size+1, they will have to go with the painful alternative of recreating the PVC/PV, fighting finalizers, and recreating Pods (i.e. downtime) to get rid of the error. That being said, supporting this does add significant complexity to the design, so if we don't want to support this, then I would at least like to see:
@msau42 we have always documented how to recover from any expansion failure including resetting the PVC size completely - https://kubernetes.io/docs/concepts/storage/persistent-volumes/#recovering-from-failure-when-expanding-volumes
I can make a strawman design, but the devil is in the details and in making sure the proposed flow - https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1790-recover-resize-failure/expansion_flow.pdf has no corner cases. I believe we currently have a proven design that works, has far fewer chances of race conditions, and helps with valid usability problems, including better observability in case of errors (because errors don't flip-flop anymore). We won't know about all the corner cases until we do a full design and start implementing it. Volume expansion is complicated because of state reconciliation between kubelet and the control plane, and it has all kinds of edge cases. I am not opposed to letting users reset the PVC size via a new feature, maybe called
Thanks, I think the only thing missing from those steps is having to deal with PVC finalizers if a Pod is still referencing it. It would be nice if users could fix this without having to restart all their Pods. We can follow up with further documentation enhancements.
/hold cancel
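For readers following the thread, the size constraint discussed above (a reduced request must still be larger than the volume's current size, the "size+1" rule) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Kubernetes validation code: the function name, its parameters, and the use of plain byte counts are all hypothetical.

```python
GIB = 1024 ** 3  # bytes per GiB, used only to keep the example readable


def validate_reduced_request(current_request: int, actual_capacity: int, new_request: int) -> None:
    """Hypothetical check for lowering a PVC's requested size after a failed expansion.

    All sizes are in bytes. Assumption: a reduction is accepted only if the new
    request is still strictly greater than the capacity the volume actually has,
    so resetting all the way back to the original size is rejected.
    """
    if new_request >= current_request:
        # Not a reduction; ordinary expansion rules would apply instead.
        return
    if new_request <= actual_capacity:
        raise ValueError(
            "reduced request must be greater than the current volume capacity"
        )


# A 10 GiB volume whose expansion to 1024 GiB failed:
validate_reduced_request(1024 * GIB, 10 * GIB, 20 * GIB)  # accepted, still above 10 GiB
try:
    validate_reduced_request(1024 * GIB, 10 * GIB, 10 * GIB)
except ValueError as err:
    print(f"rejected: {err}")
```

Under this assumption, a user on a backend with very large expansion increments cannot return to the exact original size, which is the usability concern raised above.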
xref #1790
We are planning to move this feature to beta. The KEP already has the PRR review filled in, but please review it closely in case I missed something.
/assign @jsafrane @deads2k