Enhancement Description
- One-line enhancement description (can be used as a release note):
This issue tracks a list of KEP review conversations that need resolving before we GA the feature.
- Kubernetes Enhancement Proposal: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources
- Primary contact (assignee): @tallclair @Jeffwan @vinaykul
- Responsible SIGs: sig-node, sig-autoscaling
- Enhancement target (which target corresponds to which milestone):
- Alpha release target (1.27)
- Beta release target (past 1.33?)
- Stable release target (past )
- Alpha (v1.27 to v1.29)
- KEP (k/enhancements) update PR(s):
- Code (k/k) update PR(s):
- v1.25 CRI: CRI changes to support in-place pod resize kubernetes#111645
- v1.27
- In-place Pod Vertical Scaling feature kubernetes#102884
- Restructure resize policy naming and set default resize policy values kubernetes#116119
- Fix nil pointer access panic in kubelet from uninitialized pod allocation checkpoint manager in standalone kubelet scenario kubernetes#116271
- Initialize pod resource allocation checkpoint manager to noop kubernetes#116351
- Rename ContainerStatus.ResourcesAllocated to ContainerStatus.AllocatedResources kubernetes#116450
- Fix null pointer access in doPodResizeAction for kubeletonly mode kubernetes#116504
- Add missing unit test for resource resize policy defaulting kubernetes#116684
- Fix pod object update that may cause data race kubernetes#116702
- Call function that validates resize policy for in-place pod resize feature kubernetes#116857
- v1.29
- Perf optimization: GetPodQOS() returns persisted value of PodStatus.QOSClass, if set. kubernetes#119665
- Fail validation if container restart policy is 'Never' and resource resize restart policy isn't 'NotRequired' kubernetes#118768
- Fix: do not assign an empty value to the resource (CPU or memory) if it's not defined in the container kubernetes#117615
- Adding Windows support for InPlace Pod Vertical Scaling kubernetes#112599
- fix inplace VPA stuck in InProgress when custom resources are specified kubernetes#120145
- Docs (k/website) update PR(s):
- Beta (v1.33)
- KEP (k/enhancements) update PR(s):
- Code (k/k) update PR(s):
- [FG:InPlacePodVerticalScaling] Equate CPU limits below the minimum effective limit (10m) kubernetes#128771
- [FG:InPlacePodVerticalScaling] Enable resizing containers without limits kubernetes#128718
- [FG:InPlacePodVerticalScaling] Emit events for Deferred and Infeasible statuses kubernetes#128713
- Revert "[FG:InPlacePodVerticalScaling] kubelet: Propagate error in doPodResizeAction() to the caller" kubernetes#128694
- [FG:InPlacePodVerticalScaling] Fix AllocatedResources feature gate annotation kubernetes#128687
- [FG:InPlacePodVerticalScaling] Disallow removing requests & limits for Burstable pods. kubernetes#128683
- Refactor: Move IsRestartableInitContainer to common utility package kubernetes#128676
- [FG:InPlacePodVerticalScaling] Handle edge cases around CPU MinShares kubernetes#128680
- [FG:InPlacePodVerticalScaling] Drop InPlacePodVerticalScaling support in windows kubernetes#128623
- [FG:InPlacePodVerticalScaling] fix InPlacePodVerticalScaling e2e tests kubernetes#128598
- [FG:InPlacePodVerticalScaling] Don't checkpoint ResizeStatus kubernetes#128551
- [FG:InPlacePodVerticalScaling] PLEG watch conditions: rapid polling for expected changes kubernetes#128518
- [FG:InPlacePodVerticalScaling] Implement AllocatedResources status changes for Beta kubernetes#128377
- [FG:InPlacePodVerticalScaling] Remove restrictions on subresource flag in kubectl commands kubernetes#128296
- [FG:InPlacePodVerticalScaling] Gate Disallow in-place resize for guaranteed pods on nodes with a static topology policy kubernetes#128287
- [FG:InPlacePodVerticalScaling] Rework handling of allocated resources kubernetes#128269
- [FG:InPlacePodVerticalScaling] Introduce /resize subresource to request pod resource resizing kubernetes#128266
- Updated version skew strategy for InPlacePodVerticalScaling kubernetes#128186
- [FG:InPlacePodVerticalScaling] Refactor in-place pod resize e2e tests kubernetes#128143
- [FG:InPlacePodVerticalScaling] Fix order of resizing pod cgroups in doPodResizeAction() kubernetes#125708
- [InPlacePodVerticalScaling] fix restore checkpoint bug: failed to verify pod status checkpoint checksum because of different behaviors of func Quantity.Marshal and Quantity.Unmarshal kubernetes#126620
- [FG:InPlacePodVerticalScaling] kubelet: Propagate error in doPodResizeAction() to the caller kubernetes#127300
- [FG:InPlacePodVerticalScaling] Fixed the apiserver panic issue that occurred when adding a container during pod updates in the InPlacePodVerticalScaling scenario. kubernetes#127291
- [FG:InPlacePodVerticalScaling] bug(quota): handle resources changed on resource quota filter kubernetes#127275
- [FG:InPlacePodVerticalScaling] Handle systemd cgroup driver by using libcontainer for updating pod cgroup values kubernetes#124216
- [FG:InPlacePodVerticalScaling] Fix backoff problem when quickly reverting resize patch kubernetes#125757
- [FG:InPlacePodVerticalScaling] Add extended resources to ContainerStatuses[i].Resources kubernetes#124227
- [FG:InPlacePodVerticalScaling] Add UpdatePodSandboxResources CRI method kubernetes#128123
- [FG:InPlacePodVerticalScaling] Implement resize for sidecar containers kubernetes#128367
- [FG:InPlacePodVerticalScaling] assert specific error occurred in tests kubernetes#128685
- [FG:InPlacePodVerticalScaling] reduce container resources in tests kubernetes#128719
- [FG:InPlacePodVerticalScaling] Remove ResizePolicy defaulting kubernetes#128920
- [FG:InPlacePodVerticalScaling] Never attempt a resize of windows pods and always use allocated resources for unsupported resize pods kubernetes#129216
- [FG:InPlacePodVerticalScaling] Improve allocated resources checkpointing kubernetes#129477
- testing: Fix pod delete timeout failures after InPlacePodVerticalScaling Graduate to Beta commit kubernetes#129717
- [FG:InPlacePodVerticalScaling] Forbid memory limit decrease kubernetes#130183
- [FG:InPlacePodVerticalScaling] Move pod resource allocation management out of the status manager kubernetes#130254
- [FG:InPlacePodVerticalScaling] Fix use CamelCase for memory manager policy in InPlacePodVerticalScalingExclusiveCPUs kubernetes#130559
- [FG:InPlacePodVerticalScaling] Drop 'Proposed' resize status kubernetes#130574
- [FG:InPlacePodVerticalScaling] Track actuated resources to trigger resizes kubernetes#130599
- [FG:InPlacePodVerticalScaling] Move pod resize status to pod conditions kubernetes#130733
- [FG:InPlacePodVerticalScaling] Add back AllocatedResources and use it for scheduling kubernetes#130880
- [FG:InPlacePodVerticalScaling] surface pod resize actuation errors in pod resize conditions kubernetes#130902
- [FG:InPlacePodVerticalScaling] Graduate to Beta kubernetes#130905
- Invoke UpdateContainerResources or trigger container restarts when memory requests are resized kubernetes#130917
- disable in-place pod vertical scaling for swap enabled pods kubernetes#130831
- Docs (k/website) update PR(s):
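The validation rule from kubernetes#118768 (a container whose restart policy is 'Never' may only use the 'NotRequired' resize restart policy) can be illustrated with a small sketch. This is not the actual k8s.io/kubernetes validation code: ResourceResizeRestartPolicy and ContainerResizePolicy below are simplified local stand-ins for the core/v1 fields, and validateResizePolicy is a hypothetical helper. The sketch also shows the "list of named sub-objects" shape chosen for ResizePolicy, where each entry names the resource it applies to.

```go
package main

import "fmt"

// Simplified stand-ins for the core/v1 resize policy types; these are
// assumptions for illustration, not the real k8s.io/api definitions.
type ResourceResizeRestartPolicy string

const (
	NotRequired      ResourceResizeRestartPolicy = "NotRequired"
	RestartContainer ResourceResizeRestartPolicy = "RestartContainer"
)

// ContainerResizePolicy is one named entry in a container's resize policy list.
type ContainerResizePolicy struct {
	ResourceName  string
	RestartPolicy ResourceResizeRestartPolicy
}

// validateResizePolicy (hypothetical) rejects any resize restart policy other
// than NotRequired when the restart policy that applies to the container is
// "Never", since such a container cannot be restarted to apply a resize.
func validateResizePolicy(restartPolicy string, policies []ContainerResizePolicy) []error {
	var errs []error
	if restartPolicy != "Never" {
		return errs
	}
	for i, p := range policies {
		if p.RestartPolicy != NotRequired {
			errs = append(errs, fmt.Errorf(
				"resizePolicy[%d] (%s): must be %q when restartPolicy is Never, got %q",
				i, p.ResourceName, NotRequired, p.RestartPolicy))
		}
	}
	return errs
}

func main() {
	policies := []ContainerResizePolicy{
		{ResourceName: "cpu", RestartPolicy: NotRequired},
		{ResourceName: "memory", RestartPolicy: RestartContainer},
	}
	for _, err := range validateResizePolicy("Never", policies) {
		fmt.Println(err)
	}
}
```

Only the memory entry is reported here. Keeping one named entry per resource, rather than a map, leaves room to add fields to each entry later, which matches the extensibility argument for the list form.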
Please keep this description up to date. This will help the Enhancement Team efficiently track the evolution of the enhancement.
- Identify the CRI changes needed for the UpdateContainerResources API, and define its response message:
- Extend the UpdateContainerResources API to return information such as ‘not supported’, ‘not enough memory’, ‘successful’, ‘pending page evictions’, etc.
- Define the expected runtime behavior when UpdateContainerResources is invoked.
- Define the timeout duration of the CRI call.
- Resolution: Separate KEP for CRI changes.
- Discussed draft CRI changes with SIG-Node on Oct 22, and we agreed to do this as an incremental change outside the scope of this KEP, in a new mini-KEP. It does not block implementation of this KEP.
- Define the behavior when multiple containers are being resized and UpdateContainerResources fails for one or more of them.
- One possible solution: do not update Status.Resources.Limits if the UpdateContainerResources call fails, and keep retrying until it succeeds.
- Check with API reviewers whether we can use maps instead of a list of named sub-objects for ResizePolicy.
- After discussion with @liggitt, we are going to use a list of named sub-objects for extensibility.
- Can we find a more intuitive name for ResizePolicy?
- Can we use ResourceVersion to figure out the ordering of Pod resize requests?
- Do we need to add back the ‘RestartPod’ resize policy? Is there a strong use case for it?
- Resolution: No.
- Discussed with SIG-Node on Oct 15; we are not adding the RestartPod policy for simplicity and will revisit if we encounter problems.
Alpha Feature Code Issues:
These are items and issues discovered during code review that need further discussion and must be addressed before Beta.
- Can we determine GetPodQOS differently once it has been determined at pod creation? See In-place Pod Vertical Scaling feature kubernetes#102884 (comment)
- How do we deal with a pod that specifies 1m/1m CPU requests/limits? See In-place Pod Vertical Scaling feature kubernetes#102884 (comment)
- Add internal representation of ContainerStatus.Resources in kubeContainer. Convert it to ContainerStatus.Resources in kubelet_pods generate functions. See In-place Pod Vertical Scaling feature kubernetes#102884 (comment) and In-place Pod Vertical Scaling feature kubernetes#102884 (comment) and In-place Pod Vertical Scaling feature kubernetes#102884 (comment)
- Can we get rid of the resize mutex? Is there a better way to handle resize retries? See In-place Pod Vertical Scaling feature kubernetes#102884 (comment)
- Can we recover from resize checkpoint store failures? See In-place Pod Vertical Scaling feature kubernetes#102884 (comment)
- CRI clarification for ContainerStatus.Resources and how to handle runtimes that don't support it. See In-place Pod Vertical Scaling feature kubernetes#102884 (comment)
- Add real values to the dockershim test for ContainerStatus.Resources. See In-place Pod Vertical Scaling feature kubernetes#102884 (comment)
- Resolution: Not required due to dockershim deprecation.
- Change PodStatus.Resources from v1.ResourceRequirements to *v1.ResourceRequirements.
- Resolution: Fixed.
- Address all places in the code that have 'TODO(vinaykul)'.
- The current implementation does not work with the node topology manager enabled. This limitation is not captured in the KEP. Add it to the release documentation for alpha; we will address it in beta. See In-place Pod Vertical Scaling feature kubernetes#102884 (comment)
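The 1m/1m CPU question, and the later MinShares and 10m minimum-limit PRs, come down to how milliCPU values map to cgroup settings. The sketch below is a simplified re-implementation for illustration, not the kubelet source; it assumes the commonly used conversion constants (1024 shares per CPU, a minimum of 2 shares, a 100ms CFS period and a 1ms minimum quota).

```go
package main

import "fmt"

// Assumed conversion constants (cgroup v1 style cpu.shares and CFS quota).
const (
	sharesPerCPU   = 1024
	milliCPUToCPU  = 1000
	minShares      = 2
	maxShares      = 262144
	quotaPeriod    = 100000 // CFS period in microseconds (100ms)
	minQuotaPeriod = 1000   // smallest CFS quota in microseconds (1ms)
)

// milliCPUToShares converts a CPU request to cpu.shares, clamped to
// [minShares, maxShares].
func milliCPUToShares(milliCPU int64) int64 {
	if milliCPU == 0 {
		return minShares
	}
	shares := milliCPU * sharesPerCPU / milliCPUToCPU
	if shares < minShares {
		return minShares
	}
	if shares > maxShares {
		return maxShares
	}
	return shares
}

// milliCPUToQuota converts a CPU limit to a CFS quota for the given period,
// raising anything below minQuotaPeriod to that minimum. In this sketch a
// zero limit returns 0, meaning no quota is applied.
func milliCPUToQuota(milliCPU, period int64) int64 {
	if milliCPU == 0 {
		return 0
	}
	quota := milliCPU * period / milliCPUToCPU
	if quota < minQuotaPeriod {
		quota = minQuotaPeriod
	}
	return quota
}

func main() {
	for _, m := range []int64{1, 2, 5, 10, 20} {
		fmt.Printf("%2dm -> shares=%d quota=%dus\n",
			m, milliCPUToShares(m), milliCPUToQuota(m, quotaPeriod))
	}
}
```

Under these assumptions, 1m and 2m requests both become 2 shares, and every limit at or below 10m becomes the same 1ms quota. One plausible reading is that a resize between such values changes the spec without changing the cgroup, so the observed resources can never match the requested ones unless the values are treated as equal.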
Metadata
Labels
- Categorizes issue or PR as related to adding, removing, or otherwise changing an API
- Denotes that an issue has been opted in to a release
- Categorizes an issue or PR as relevant to SIG Autoscaling.
- Categorizes an issue or PR as relevant to SIG Node.
- Categorizes an issue or PR as relevant to SIG Scheduling.
- Denotes an issue tracking an enhancement targeted for Beta status
Projects
Status
In Progress
Status
Done