Skip to content

In-Place Update of Pod Resources #1287

Open
@vinaykul

Description

@vinaykul

Enhancement Description

Please to keep this description up to date. This will help the Enhancement Team track efficiently the evolution of the enhancement

  1. Identify CRI changes needed for UpdateContainerResources API, define response message for UpdateContainerResources

    • Extend UpdateContainerResources API to return info such as ‘not supported’, ‘not enough memory’, ‘successful’, ‘pending page evictions’ etc.
    • Define expected behavior for runtime when UpdateContainerResources is invoked. Define timeout duration of the CRI call.
      • Resolution: Separate KEP for CRI changes.
        • Discussed draft CRI changes with SIG-Node on Oct 22, and we agreed to do this as an incremental change outside the scope of this KEP, in a new mini-KEP. It does not block implementation of this KEP.
  2. Define behavior when multiple containers are being resized, and UpdateContainerResources fails for one or more containers.

    • One Possible solution:
      • Do not update Status.Resources.Limits if UpdateContainerResources API fails, and keep retrying until it succeeds.
  3. Check with API reviewers if we can keep maps instead list of named sub-objects for ResizePolicy.

    • After discussion with @liggitt , we are going to use list of named subobjects for extensibility.
  4. Can we find a more intuitive name for ResizePolicy?

  5. Can we use ResourceVersion to figure out the ordering of Pod resize requests?

  6. Do we need to add back the ‘RestartPod’ resize policy? Is there a strong use-case for it?

    • Resolution: No.
      • Discussed with SIG-Node on Oct 15th, not adding RestartPod policy for simplicity, will revisit if we encounter problems.

Alpha Feature Code Issues:
These are Items and issues discovered during code review that need further discussion and need to be addressed before Beta.

  1. Can we figure out GetPodQOS differently once it is determined on pod create? See In-place Pod Vertical Scaling feature kubernetes#102884 (comment)
  2. How do we deal with a pod that requests 1m/1m cpu requests/limits. See In-place Pod Vertical Scaling feature kubernetes#102884 (comment)
  3. Add internal representation of ContainerStatus.Resources in kubeContainer. Convert it to ContainerStatus.Resources in kubelet_pods generate functions. See In-place Pod Vertical Scaling feature kubernetes#102884 (comment) and In-place Pod Vertical Scaling feature kubernetes#102884 (comment) and In-place Pod Vertical Scaling feature kubernetes#102884 (comment)
  4. Can we get rid of resize mutex? Is there a better way to handle resize retries? See In-place Pod Vertical Scaling feature kubernetes#102884 (comment)
  5. Can we recover from resize checkpoint store failures? See In-place Pod Vertical Scaling feature kubernetes#102884 (comment)
  6. CRI clarification for ContainerStatus.Resources and how to handle runtimes that don't support it. See In-place Pod Vertical Scaling feature kubernetes#102884 (comment)
  7. Add real values to dockershim test for ContainerStatus.Resources In-place Pod Vertical Scaling feature kubernetes#102884 (comment)
    • Resolution: Not required due to dockershim deprecation.
  8. Change PodStatus.Resources from v1.ResourceRequirements to *v1.ResourceRequirements
    • Resolution: Fixed
  9. Address all places in the code that has 'TODO(vinaykul)'
  10. Current implementation does not work with node toploogy manager enabled. This limitation is not capturedi in the KEP. Add this to the release documentation for alpha, we will address this in beta. See In-place Pod Vertical Scaling feature kubernetes#102884 (comment)

Metadata

Metadata

Labels

kind/api-changeCategorizes issue or PR as related to adding, removing, or otherwise changing an APIlead-opted-inDenotes that an issue has been opted in to a releasesig/autoscalingCategorizes an issue or PR as relevant to SIG Autoscaling.sig/nodeCategorizes an issue or PR as relevant to SIG Node.sig/schedulingCategorizes an issue or PR as relevant to SIG Scheduling.stage/betaDenotes an issue tracking an enhancement targeted for Beta status

Type

No type

Projects

Status

In Progress

Status

Done

Relationships

None yet

Development

No branches or pull requests

Issue actions