A new parameter for the management of the limits that the VPA assigns to the new Pod #7790

Closed
FabrizioCafolla opened this issue Jan 30, 2025 · 10 comments
Labels
area/vertical-pod-autoscaler kind/feature Categorizes issue or PR as related to a new feature.

Comments

@FabrizioCafolla

FabrizioCafolla commented Jan 30, 2025

Which component are you using?:

/area vertical-pod-autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

I am having problems with resources.limits, as described in issue #6996:

  • I expect the pod to be restarted with the new requests and limits
  • I expect the requested resources to fall within the range minAllowed < CPU < maxAllowed and minAllowed < Memory < maxAllowed
  • I expect the new resource limits to always be CPU <= maxAllowed and Memory <= maxAllowed

I also expected the maxAllowed option to cap resources.limits, but it seems to apply only to resources.requests. This can let the pod use far more resources than I want. See the issue for more details.

Describe the solution you'd like.:

I want to be able to handle new limits on the Pod.

The controlledValues: RequestsAndLimits option makes the VPA set limits beyond maxAllowed. While this is the expected behavior, it is too risky and uncontrollable: the limits the VPA sets can let the pod use far more resources than expected and saturate the nodes.

Describe any alternative solutions you've considered.:

Can we think of a new option for the ContainerResourcePolicy configuration, such as adding maxLimitAllowed? This would allow setting a ceiling on the limits the VPA sets for the pod.

Additional context.:

Hi @voelzmo. I am testing the VPA and I am experiencing the same problem. I don't know if I am missing something, but the pod limits are not correct (or rather, not what I expect).

System

  • macOS (Darwin kernel version 24.2.0)
  • kind version 0.22.0

VPA PoC

apiVersion: apps/v1
kind: Deployment
metadata:
  name: high-cpu-utilization-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cpu-utilization-app
  template:
    metadata:
      labels:
        app: cpu-utilization-app
    spec:
      containers:
        - name: cpu-utilization-container
          image: ubuntu
          command:
            [
              "/bin/sh",
              "-c",
              "apt-get update && apt-get install -y stress-ng && while true; do stress-ng --cpu 1; done",
            ]
          resources:
            limits:
              cpu: "20m"
              memory: "50Mi"
            requests:
              cpu: "1m"
              memory: "10Mi"
---
apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
  name: stress-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: high-cpu-utilization-deployment
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 1m
          memory: 10Mi
        maxAllowed:
          cpu: 200m
          memory: 100Mi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits

Desired status

  • I expect the high-cpu-utilization-deployment pod to be restarted with the new requests and limits
  • I expect the requested resources to fall within the range 1m < CPU < 200m and 10Mi < Memory < 100Mi
  • I expect the new resource limits to always be CPU <= 200m and Memory <= 100Mi

I also expected the maxAllowed option to cap resources.limits, but it seems to apply only to resources.requests. This can let the pod use far more resources than I want, as in the example below where the CPU limit jumped to 4. I used the controlledValues: RequestsOnly configuration to prevent the VPA from touching the limits.

What happens

I created the Deployment and VPA described above locally in my kind cluster.

➜  test-vpa git:(main) ✗ kubectl get pods                                                        
NAME                                              READY   STATUS    RESTARTS   AGE
high-cpu-utilization-deployment-fc98c7d68-5vwts   1/1     Running   0          36s
high-cpu-utilization-deployment-fc98c7d68-q2ztv   1/1     Running   0          36s

I retrieve the resources of the created pods (which match those specified in the Deployment)

kubectl get pods high-cpu-utilization-deployment-fc98c7d68-5vwts -o yaml

...
    name: cpu-utilization-container
    resources:
      limits:
        cpu: 20m
        memory: 50Mi
      requests:
        cpu: 1m
        memory: 10Mi
...

The VPA evicts a pod and a replacement is created with new resources

➜  test-vpa git:(main) ✗ kubectl get pods                                                        
NAME                                              READY   STATUS        RESTARTS   AGE
high-cpu-utilization-deployment-fc98c7d68-5vwts   1/1     Terminating   0          56s
high-cpu-utilization-deployment-fc98c7d68-8tk45   1/1     Running       0          6s
high-cpu-utilization-deployment-fc98c7d68-q2ztv   1/1     Running       0          56s

I retrieve the new resources of the pod, and the limits are not what I expected.

kubectl get pods high-cpu-utilization-deployment-fc98c7d68-8tk45  -o yaml

...
    resources:
      limits:
        cpu: "4" <--------- Why is there this new limit?
        memory: 500Mi <---- Why is there this new limit?
      requests:
        cpu: 200m
        memory: 100Mi
...
test-vpa git:(main) ✗ kubectl get vpa stress-vpa -o yaml

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"autoscaling.k8s.io/v1","kind":"VerticalPodAutoscaler","metadata":{"annotations":{},"name":"stress-vpa","namespace":"default"},"spec":{"resourcePolicy":{"containerPolicies":[{"containerName":"*","controlledResources":["cpu","memory"],"controlledValues":"RequestsAndLimits","maxAllowed":{"cpu":"200m","memory":"100Mi"},"minAllowed":{"cpu":"20m","memory":"50Mi"}}]},"targetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"high-cpu-utilization-deployment"},"updatePolicy":{"updateMode":"Auto"}}}
  creationTimestamp: "2025-01-30T11:49:39Z"
  generation: 1
  name: stress-vpa
  namespace: default
  resourceVersion: "33688"
  uid: 6d5c5758-84b8-4507-bbcd-02f035d8694c
spec:
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      controlledResources:
      - cpu
      - memory
      controlledValues: RequestsAndLimits
      maxAllowed:
        cpu: 200m
        memory: 100Mi
      minAllowed:
        cpu: 20m
        memory: 50Mi
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: high-cpu-utilization-deployment
  updatePolicy:
    updateMode: Auto
status:
  conditions:
  - lastTransitionTime: "2025-01-30T11:50:29Z"
    status: "True"
    type: RecommendationProvided
  recommendation:
    containerRecommendations:
    - containerName: cpu-utilization-container
      lowerBound:
        cpu: 32m
        memory: 100Mi
      target:
        cpu: 200m
        memory: 100Mi
      uncappedTarget:
        cpu: 1168m
        memory: 262144k
      upperBound:
        cpu: 200m
        memory: 100Mi

Questions

  1. How do I manage the new limits?
  2. Can we think of a new option for the ContainerResourcePolicy configuration, such as adding maxLimitAllowed? This would allow setting a ceiling limit on the pod's resources.

Originally posted by @FabrizioCafolla in #6996

@FabrizioCafolla FabrizioCafolla added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 30, 2025
@FabrizioCafolla FabrizioCafolla marked this as a duplicate of #7792 Jan 30, 2025
@FabrizioCafolla FabrizioCafolla marked this as a duplicate of #7791 Jan 30, 2025
@voelzmo
Contributor

voelzmo commented Jan 31, 2025

Hey @FabrizioCafolla, I'm not sure I understand your question correctly. In your VPA, you're using controlledValues: RequestsAndLimits. I tried to explain in #6996 (comment) (which I think you followed closely, as your original comment was posted there), why limits are adjusted by VPA in this scenario.

I also see you tried switching to controlledValues: RequestsOnly

I used the controlledValues: RequestsOnly configuration to prevent the VPA from touching the limits.

Did this not work as expected? If this doesn't do what you want, could you elaborate a bit on what you're expecting to happen?

The configuration which you're proposing (a setting like maxLimitAllowed) technically exists already: specify the limit yourself and configure VPA with controlledValues: RequestsOnly. I'm not sure what added value maxLimitAllowed inside a VPA would bring.
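
For reference, a minimal sketch of that combination, assuming the manifests from the reproduction above: keep the limits you want declared on the container, and only change the controlledValues line in the VPA.

resourcePolicy:
  containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: 1m
        memory: 10Mi
      maxAllowed:
        cpu: 200m
        memory: 100Mi
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsOnly   # VPA rewrites requests only; resources.limits stay as declared in the Deployment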

@FabrizioCafolla
Author

@voelzmo The current logic is now clear to me (but I couldn't understand it very well from the documentation).

Current state:

The VPA in RequestsAndLimits mode updates the requests and limits assigned to the Pod. Unfortunately, the limits are not manageable, and the VPA assigns values that can cause serious problems.

Desired state:

I want VPA in RequestsAndLimits mode to manage the limits for me, but at the same time, I want to set a maximum limit for CPU/Memory that VPA will assign.

Example:

As reported in the example #6996 (comment), I wanted the VPA to manage the resources and limits autonomously, scaling both.
But the VPA set the CPU limit to 4, which is an unreasonable number of CPUs that could cause serious problems in production, as this pod would be allowed to use 4 full CPUs.

I want the VPA to continue to manage resources and limits, but also allow me to set a maxLimitAllowed so that the above does not happen.

For example, I would set maxLimitAllowed to 1 CPU and 1024Mi of memory and be sure that the Pod's limits would never scale beyond that.
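
A hypothetical sketch of how this could look in the containerPolicies (maxLimitAllowed does not exist in the current VPA API; the field name and the values below are only what is being proposed here):

resourcePolicy:
  containerPolicies:
    - containerName: "*"
      controlledValues: RequestsAndLimits
      maxAllowed:            # existing field: caps the recommended requests
        cpu: 200m
        memory: 100Mi
      maxLimitAllowed:       # proposed (hypothetical) field: would cap the limits the VPA derives
        cpu: "1"
        memory: 1024Mi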

Using the RequestsOnly mode would be a workaround, but it would not cover 100% of cases: it assumes that every Pod has limits set, which is not always true in production contexts.

Again, I want the VPA to manage limits but allow me to set the maximum limit.


I hope I have expressed myself clearly; I know this is something very specific, and maybe I am missing something.
Thanks for the support

@adrianmoisey
Member

Again, I want the VPA to manage limits but allow me to set the maximum limit.

Given that the VPA sets requests based on history, what should limits be based on? How should they be calculated?

@FabrizioCafolla
Author

@adrianmoisey I am not saying that VPA needs to change the way it calculates limits. I am saying that I don't want the limits that are set (even if VPA thinks they are correct) to be greater than a certain threshold that I specify.

Case 1

  • I have limits set on the Pod -> change the mode from RequestsAndLimits to RequestsOnly and it works.

Case 2

  • I have limits set on the Pod, but I want the VPA to increase them for me when it thinks it is better, without exceeding a certain threshold -> How do I handle this case?

Case 3

  • I have no limits set on the Pod, but I want the VPA to manage them without exceeding a certain threshold -> How do I handle this case?

This is why I suggest a property that lets you put a cap on the limit that the VPA will set, as a precaution and to cover cases 2 and 3. Or perhaps you already have a solution that I am missing.

@adrianmoisey
Member

Right, so I imagine this is what would happen. At the moment the VPA maintains the input Pod's request-to-limit ratio.
So, using RAM as an example, take a Pod with 1Gi requests and 2Gi limits.

Without this new feature:

  1. If the VPA recommends 1Gi RAM, the resulting pod will be 1Gi requests, 2Gi limits
  2. If the VPA recommends 2Gi RAM, the resulting pod will be 2Gi requests, 4Gi limits

With this new feature turned on, limiting memory limits to 5Gi:

  1. If the VPA recommends 1Gi RAM, the resulting pod will be 1Gi requests, 2Gi limits
  2. If the VPA recommends 2Gi RAM, the resulting pod will be 2Gi requests, 4Gi limits
  3. If the VPA recommends 3Gi RAM, the resulting pod will be 3Gi requests, 5Gi limits
  4. If the VPA recommends 4Gi RAM, the resulting pod will be 4Gi requests, 5Gi limits
  5. If the VPA recommends 5Gi RAM, the resulting pod will be 5Gi requests, 5Gi limits

Is that how you envision this to work?

(As an aside, number 5 above is an edge case we need to think about too, since it may change the QoS class of a Pod: https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/#guaranteed. I'm unsure if we would want to do that)
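
For reference, a back-of-the-envelope check (not VPA output) of that ratio-preserving behavior against the reproduction reported earlier in this issue; it matches the observed limits exactly:

# request-to-limit ratios taken from the original Deployment:
#   cpu:    20m  / 1m   = 20
#   memory: 50Mi / 10Mi = 5
# capped recommendation (target): cpu 200m, memory 100Mi
# new limits = target * ratio:
#   cpu:    200m  * 20 = 4000m = 4 CPUs
#   memory: 100Mi * 5  = 500Mi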

@voelzmo
Contributor

voelzmo commented Jan 31, 2025

Hey @FabrizioCafolla thanks for the quick response!
What I'm most interested in is case 2, because I don't understand it very well.
If you have an upper boundary for the limit of your workload (and you do seem to have this, as you want to put it into maxLimitAllowed), why don't you just put that same value into limits? There is no negative impact of doing this that I'm aware of.

Case 3 is really something that the VPA doesn't do. The VPA cannot know the limits that you would want for your workload. Similar to what I mentioned above for case 2: if you know the upper boundary for your limit, then just set it in the workload directly.
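
Concretely, a minimal sketch of that suggestion for the container from the reproduction above (the 1 CPU / 1024Mi ceiling is the value mentioned earlier in this thread, chosen by the workload owner, not computed by the VPA):

resources:
  requests:        # with controlledValues: RequestsOnly, the VPA keeps adjusting these
    cpu: 1m
    memory: 10Mi
  limits:          # fixed ceiling set by the workload owner; the VPA does not touch it in RequestsOnly mode
    cpu: "1"
    memory: 1024Mi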

@FabrizioCafolla
Author

FabrizioCafolla commented Feb 3, 2025

@adrianmoisey Yes, you get the point. This would be the behavior that I described in the question and that I think would be nice to have in VPA.

@voelzmo I see your point. Imagine a production cluster with hundreds of applications deployed by different development teams: asking them to add limits to their configurations would mean researching the correct limits for each pod (plus the time to roll out updates across the code bases would be longer). On the other hand, if I use the VPA, it will tell me the correct limits based on metrics. However, this only works if the option I described in the problem exists, since I do not want the limits to exceed a "maximum limit" (a value known a priori).

I realize this is a borderline case, but as mentioned above, it would be nice to have this feature ✌️

Anyway, it doesn't make sense to me that, with a maximum request of 200m, the VPA sets the limit to 4 CPUs. I find it wrong and very dangerous (and not being able to control it is, in my opinion, a limitation).

@voelzmo
Contributor

voelzmo commented Feb 4, 2025

@FabrizioCafolla

Imagine a production cluster with hundreds of applications deployed by different development teams: asking them to add limits to their configurations would mean researching the correct limits for each pod (plus the time to roll out updates across the code bases would be longer). On the other hand, if I use the VPA, it will tell me the correct limits based on metrics

What you are describing is not possible. CPU and memory limits are properties that do not depend on resource utilization. They describe the maximum amount of resources your workload would ever be able to consume, no matter how much free capacity there is on a node. No machine can set those for you.
The only way to set limits is for a human who knows the workload, and knows what they want to achieve, to set them.

For example:

  • memory limits are dangerous because they can cause OOMKills. Still, you might want to set a memory limit to a value for which you are sure that, if your workload exceeds it, this must be due to a memory leak and you need to fix or revert. No one but an expert for this workload can make this decision.
  • cpu limits are less dangerous, because cpu is compressible, but they're also less useful, because they don't necessarily help you detect if your workload introduces a performance problem. As I outlined in VPA Not honoring maxAllowed Memory Limit #6996 (comment), people mostly use cpu limits to get better predictability for their workload runs (in particular for batch workloads), but sacrifice performance. Also here, a machine cannot really infer a good cpu limit for you – this needs to be set either by platform operators or by workload owners, based on some thinking about the tradeoff between predictability and performance.

Last but not least, you're mentioning that you want to introduce a property that is an upper bound for the limits. My take here is: if someone in your company can find a value for this upper boundary, you don't need an algorithm which increases the actual limits step by step towards this upper boundary. What would this be useful for? Limits are not used by the scheduler to determine the free capacity on a node (as I elaborated in #6996 (comment)), so I don't see a reason why you would want your limits to "start small". The person or program which would set this upper boundary can just set the actual limit on the workload; it is the same thing.

Does this make sense?

@voelzmo
Contributor

voelzmo commented Feb 19, 2025

/close

please re-open if you feel I didn't cover the situation correctly and we still need to change something in the API.

@k8s-ci-robot
Contributor

@voelzmo: Closing this issue.

In response to this:

/close

please re-open if you feel I didn't cover the situation correctly and we still need to change something in the API.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
