
[Help] Unable to upgrade a managed nodegroup #7976

Open
ybykov-a9s opened this issue Sep 27, 2024 · 9 comments
Labels
kind/help Request for help

Comments

@ybykov-a9s

ybykov-a9s commented Sep 27, 2024

Hello!

I can’t upgrade a managed nodegroup using eksctl

The following document was used for the procedure:
https://docs.aws.amazon.com/eks/latest/userguide/update-managed-node-group.html#mng-update

Steps to reproduce:

Create a cluster using the following manifest:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: yby-test
  region: eu-central-1
  version: "1.28"
managedNodeGroups:
  - name: mng-medium
    instanceType: t3a.medium
    desiredCapacity: 2
    minSize: 1
    maxSize: 2
    volumeSize: 10
    iam:
      withAddonPolicies:
        ebs: true
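
Saved as cluster.yaml, the create step was the standard config-file flow, something like:

eksctl create cluster -f cluster.yaml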

It gets created successfully.

Next, I upgrade the control plane's Kubernetes version using the following command:

eksctl upgrade cluster --name yby-test --region eu-central-1 --approve

Everything works fine:

2024-09-27 15:05:31 [ℹ]  will upgrade cluster "yby-test" control plane from current version "1.28" to "1.29"
2024-09-27 15:14:54 [✔]  cluster "yby-test" control plane has been upgraded to version "1.29"
2024-09-27 15:14:54 [ℹ]  you will need to follow the upgrade procedure for all of nodegroups and add-ons
2024-09-27 15:14:55 [ℹ]  re-building cluster stack "eksctl-yby-test-cluster"
2024-09-27 15:14:55 [✔]  all resources in cluster stack "eksctl-yby-test-cluster" are up-to-date
2024-09-27 15:14:55 [ℹ]  checking security group configuration for all nodegroups
2024-09-27 15:14:55 [ℹ]  all nodegroups have up-to-date cloudformation templates

Then I try to upgrade the nodegroup to the target version using:

eksctl upgrade nodegroup --cluster yby-test --region eu-central-1 --name mng-medium --kubernetes-version=1.29

Here is the log:

2024-09-27 15:17:45 [ℹ]  will upgrade nodes to release version: 1.29.8-20240917
2024-09-27 15:17:45 [ℹ]  upgrading nodegroup version
2024-09-27 15:17:45 [ℹ]  updating nodegroup stack
2024-09-27 15:17:46 [ℹ]  waiting for CloudFormation changeset "eksctl-update-nodegroup-1727443065" for stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:18:16 [ℹ]  waiting for CloudFormation changeset "eksctl-update-nodegroup-1727443065" for stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:18:16 [ℹ]  waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:18:46 [ℹ]  waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:19:28 [ℹ]  waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:20:28 [ℹ]  waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:21:36 [ℹ]  waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:23:25 [ℹ]  waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:24:58 [ℹ]  waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:25:34 [ℹ]  waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:26:44 [ℹ]  waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:28:41 [ℹ]  waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:30:12 [ℹ]  waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:31:48 [ℹ]  waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:33:39 [ℹ]  waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
Error: error updating nodegroup stack: waiter state transitioned to Failure

If I check the CloudFormation console, I see the following event:

ManagedNodeGroup
Resource handler returned message: "Requested release version 1.29.8-20240917 is not valid for kubernetes version 1.28. (Service: Eks, Status Code: 400, Request ID: 00c5f96d-c686-42a6-98e8-06abde8621d6)" (RequestToken: 38c80d12-e9e8-12b7-ab49-6c7cf2a65b6c, HandlerErrorCode: InvalidRequest)

If I try to upgrade the node group using the AWS web console, everything works fine, but without any changes in the CloudFormation logs. Therefore I suppose the console doesn't go through CloudFormation.
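
For comparison, the console presumably calls the EKS UpdateNodegroupVersion API directly; the AWS CLI equivalent would be something like:

aws eks update-nodegroup-version \
  --cluster-name yby-test \
  --nodegroup-name mng-medium \
  --kubernetes-version 1.29 \
  --region eu-central-1

That path bypasses the eksctl-managed CloudFormation stack entirely, which would explain why nothing shows up in the CFN events.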

eksctl version
0.190.0-dev+3fccc8ed8.2024-09-04T12:58:57Z

What help do you need?

Please point out where I misunderstood the documentation, or confirm whether it's a bug.
Maybe there are other actions that have to be done.

Tell me if I should provide more information or tests.

Thanks in advance.

--
Eugene Bykov

ybykov-a9s added the kind/help (Request for help) label Sep 27, 2024
Contributor

Hello ybykov-a9s 👋 Thank you for opening an issue in the eksctl project. The team will review the issue and aim to respond within 1-5 business days. Meanwhile, please read about the Contribution and Code of Conduct guidelines here. You can find out more information about eksctl on our website.

@TreeKat71

TreeKat71 commented Oct 4, 2024

I am also facing this issue. When I checked CloudFormation, it showed: Resource handler returned message: "Volume of size 10GB is smaller than snapshot 'snap-0145xxxxxx10a66e4', expect size>= 20GB"

But I can do that (less than 20 GB) in another account of mine. They are both in the same region. The only difference I can think of is the Kubernetes cluster version: I can create a node with a 10 GB volume on 1.29, but can't on 1.30.

The eksctl version I am using is 0.184.

@roman5595

Any update on this? I'm facing the same issue: Resource handler returned message: "Requested release version 1.31.0-20241024 is not valid for kubernetes version 1.30. (Service: Eks, Status Code: 400, Request ID: 15e2fb73-4134-4763-94d4-6b1ffc6d04b3)" (RequestToken: 1565436d-5bbc-7be1-7081-7a0631cf5842, HandlerErrorCode: InvalidRequest)

After successfully upgrading the control plane to 1.31, I cannot upgrade the managed node group to 1.31.

@TreeKat71

TreeKat71 commented Nov 28, 2024

Regarding "Requested release version 1.31.0-20241024 is not valid for kubernetes version 1.30": are you sure your control plane is already updated?

Separately, I found the reason why I could not upgrade my managed nodegroup. Quoting my earlier comment:

I am also facing this issue. When I checked CloudFormation, it showed: Resource handler returned message: "Volume of size 10GB is smaller than snapshot 'snap-0145xxxxxx10a66e4', expect size>= 20GB"

But I can do that (less than 20 GB) in another account of mine. They are both in the same region. The only difference I can think of is the Kubernetes cluster version: I can create a node with a 10 GB volume on 1.29, but can't on 1.30.

The eksctl version I am using is 0.184.

I was using two different AMIs and OSes: AmazonLinux2 is able to reduce the disk size to 8 GB, but AmazonLinux2023 cannot.
That is what I currently know... I am not sure if it is documented.
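
If it helps, here is a sketch of the nodegroup settings that should avoid the snapshot-size error on AL2023 (the nodegroup name is just an example; the fields are from the eksctl ClusterConfig schema):

managedNodeGroups:
  - name: mng-al2023            # example name
    amiFamily: AmazonLinux2023
    volumeSize: 20              # the AL2023 AMI snapshot expects >= 20 GB; AL2 accepted smaller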

@jim-barber-he

jim-barber-he commented Jan 8, 2025

I also have the exact same problem when trying to upgrade the managed node group.

Resource handler returned message: "Requested release version 1.31.3-20250103 is not valid for kubernetes version 1.30. (Service: Eks, Status Code: 400, Request ID: 4aa1ba6a-840a-40ec-9934-0809b7c92538)" (RequestToken: a328d53f-60e2-b3a1-ba8f-5e6daa9d59ef, HandlerErrorCode: InvalidRequest)

The control plane was already upgraded from version 1.30 to 1.31 successfully.

$ eksctl upgrade cluster --approve --name eks-analytics
2025-01-08 15:25:55 [ℹ]  will upgrade cluster "eks-analytics" control plane from current version "1.30" to "1.31"
2025-01-08 15:35:50 [✔]  cluster "eks-analytics" control plane has been upgraded to version "1.31"
2025-01-08 15:35:50 [ℹ]  you will need to follow the upgrade procedure for all of nodegroups and add-ons
2025-01-08 15:35:51 [ℹ]  re-building cluster stack "eksctl-eks-analytics-cluster"
2025-01-08 15:35:51 [✔]  all resources in cluster stack "eksctl-eks-analytics-cluster" are up-to-date
2025-01-08 15:35:52 [ℹ]  checking security group configuration for all nodegroups
2025-01-08 15:35:52 [ℹ]  all nodegroups have up-to-date cloudformation templates

It shows the new version in the AWS console as well as via the command:

$ eksctl get cluster --name eks-analytics --output json | jq -r '.[].Version'
1.31

I also upgraded all the pod identity associations successfully:

2025-01-08 15:35:55 [ℹ]  
2 parallel tasks: { update pod identity association kube-system/aws-load-balancer-controller, update pod identity association cert-manager/cert-manager 
}
2025-01-08 15:35:56 [ℹ]  updating IAM resources stack "eksctl-eks-analytics-podidentityrole-cert-manager-cert-manager" for pod identity association "cert-manager/cert-manager"
2025-01-08 15:35:56 [ℹ]  updating IAM resources stack "eksctl-eks-analytics-podidentityrole-kube-system-aws-load-balancer-controller" for pod identity association "kube-system/aws-load-balancer-controller"
2025-01-08 15:35:56 [ℹ]  waiting for CloudFormation changeset "eksctl-kube-system-aws-load-balancer-controller-update-1736321756" for stack "eksctl-eks-analytics-podidentityrole-kube-system-aws-load-balancer-controller"
2025-01-08 15:35:56 [ℹ]  nothing to update
2025-01-08 15:35:56 [ℹ]  IAM resources for kube-system/aws-load-balancer-controller (pod identity association ID: kube-system/aws-load-balancer-controller) are already up-to-date
2025-01-08 15:35:56 [ℹ]  waiting for CloudFormation changeset "eksctl-cert-manager-cert-manager-update-1736321756" for stack "eksctl-eks-analytics-podidentityrole-cert-manager-cert-manager"
2025-01-08 15:35:56 [ℹ]  nothing to update
2025-01-08 15:35:56 [ℹ]  IAM resources for cert-manager/cert-manager (pod identity association ID: cert-manager/cert-manager) are already up-to-date
2025-01-08 15:35:56 [ℹ]  all tasks were completed successfully

And the addons:

2025-01-08 15:35:59 [ℹ]  Kubernetes version "1.31" in use by cluster "eks-analytics"
2025-01-08 15:35:59 [ℹ]  updating addon
2025-01-08 15:38:02 [ℹ]  addon "aws-ebs-csi-driver" active
2025-01-08 15:38:02 [ℹ]  updating addon
2025-01-08 15:38:13 [ℹ]  addon "coredns" active
2025-01-08 15:38:13 [ℹ]  updating addon
2025-01-08 15:38:24 [ℹ]  addon "eks-pod-identity-agent" active
2025-01-08 15:38:24 [ℹ]  new version provided v1.31.3-eksbuild.2
2025-01-08 15:38:24 [ℹ]  updating addon
2025-01-08 15:39:07 [ℹ]  addon "kube-proxy" active
2025-01-08 15:39:08 [ℹ]  updating addon
2025-01-08 15:39:18 [ℹ]  addon "vpc-cni" active

At first I just tried the following to upgrade the node group, and it finished without error but left the node group at version 1.30:

$ eksctl upgrade nodegroup --cluster eks-analytics --name eks-analytics-ng-1 --wait
2025-01-08 15:40:00 [ℹ]  setting ForceUpdateEnabled value to false
2025-01-08 15:40:00 [ℹ]  updating nodegroup stack
2025-01-08 15:40:01 [ℹ]  waiting for CloudFormation changeset "eksctl-update-nodegroup-1736322000" for stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:40:31 [ℹ]  waiting for CloudFormation changeset "eksctl-update-nodegroup-1736322000" for stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:40:31 [ℹ]  waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:41:01 [ℹ]  waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:41:02 [ℹ]  nodegroup "eks-analytics-ng-1" is already up-to-date
2025-01-08 15:41:02 [ℹ]  will upgrade nodes to Kubernetes version: 1.30
2025-01-08 15:41:02 [ℹ]  upgrading nodegroup version
2025-01-08 15:41:02 [ℹ]  updating nodegroup stack
2025-01-08 15:41:02 [ℹ]  waiting for CloudFormation changeset "eksctl-update-nodegroup-1736322062" for stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:41:32 [ℹ]  waiting for CloudFormation changeset "eksctl-update-nodegroup-1736322062" for stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:41:33 [ℹ]  waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:42:03 [ℹ]  waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:42:03 [ℹ]  nodegroup successfully upgraded

But I noticed it left things at version 1.30.
So then I tried the following, which resulted in the error above within CloudFormation:

$ eksctl upgrade nodegroup --cluster eks-analytics --kubernetes-version 1.31 --name eks-analytics-ng-1 --wait
2025-01-08 15:57:09 [ℹ]  will upgrade nodes to release version: 1.31.3-20250103
2025-01-08 15:57:09 [ℹ]  upgrading nodegroup version
2025-01-08 15:57:09 [ℹ]  updating nodegroup stack
2025-01-08 15:57:09 [ℹ]  waiting for CloudFormation changeset "eksctl-update-nodegroup-1736323029" for stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:57:39 [ℹ]  waiting for CloudFormation changeset "eksctl-update-nodegroup-1736323029" for stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:57:40 [ℹ]  waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:58:10 [ℹ]  waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:59:03 [ℹ]  waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:59:52 [ℹ]  waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 16:01:02 [ℹ]  waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 16:01:52 [ℹ]  waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 16:03:18 [ℹ]  waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
Error: error updating nodegroup stack: waiter state transitioned to Failure
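
In case it helps with debugging, comparing what EKS itself reports for the nodegroup against what the CloudFormation template contains should show the mismatch; something like:

aws eks describe-nodegroup \
  --cluster-name eks-analytics \
  --nodegroup-name eks-analytics-ng-1 \
  --query 'nodegroup.{version: version, releaseVersion: releaseVersion}'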

@dmzeus

dmzeus commented Jan 31, 2025

Got the same issue when upgrading node groups from 1.29 to 1.30.
The control plane was updated successfully.

$ eksctl version
0.201.0

@bartleboeuf

We had the same problem on some clusters, and we noticed that the CloudFormation template generated for upgrading the nodegroup puts the wrong "Version" number in the "AWS::EKS::Nodegroup" resource. Here we've upgraded from 1.30 to 1.31, but the version is still 1.30 in the CFN template, and we got the message "Requested release version 1.31.4-20250123 is not valid for kubernetes version 1.30.".

 "ManagedNodeGroup": {
      "Type": "AWS::EKS::Nodegroup",
      "Properties": {
        "AmiType": "AL2_x86_64",
        "ClusterName": "testcluster",
        "ForceUpdateEnabled": true,
        "InstanceTypes": [
          "c5a.large",
          "c5.large",
          "c6i.large"
        ],
        "Labels": {
          "alpha.eksctl.io/cluster-name": "testcluster",
          "alpha.eksctl.io/nodegroup-name": "ng-eks-1",
        },
        "LaunchTemplate": {
          "Id": {
            "Ref": "LaunchTemplate"
          }
        },
        "NodeRole": {
          "Fn::GetAtt": [
            "NodeInstanceRole",
            "Arn"
          ]
        },
        "NodegroupName": "ng-eks-1",
        "ReleaseVersion": "1.31.4-20250123",
        "ScalingConfig": {
          "DesiredSize": 2,
          "MaxSize": 4,
          "MinSize": 2
        },
        "Subnets": [
          "subnet-0800de77f4fd29000",
          "subnet-0500ceeab196d00c"
        ],
        "Tags": {
          "alpha.eksctl.io/nodegroup-name": "ng-eks-1",
          "alpha.eksctl.io/nodegroup-type": "managed",
          "k8s.io/cluster-autoscaler/testcluster": "owned",
          "k8s.io/cluster-autoscaler/enabled": "true",
        },
        "Taints": [
          {
            "Effect": "NO_EXECUTE",
            "Key": "node.cilium.io/agent-not-ready",
            "Value": "true"
          }
        ],
        "UpdateConfig": {
          "MaxUnavailable": 2
        },
        "Version": "1.30"
      }
    }, 

As a workaround, we manually changed the CloudFormation template to replace version 1.30 with 1.31 and updated the CFN stack, which made it work. I haven't yet managed to find out where this version comes from. I hope this will help unblock those who are in this situation.
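
If you want to script the workaround, here is a rough sketch with the AWS CLI (the stack name is from this example, so adjust it to yours; update-stack will likely need the IAM capability since the nodegroup stack contains an IAM role):

aws cloudformation get-template \
  --stack-name eksctl-testcluster-nodegroup-ng-eks-1 \
  --query TemplateBody > template.json

# edit template.json: change "Version": "1.30" to "1.31" in the AWS::EKS::Nodegroup resource

aws cloudformation update-stack \
  --stack-name eksctl-testcluster-nodegroup-ng-eks-1 \
  --template-body file://template.json \
  --capabilities CAPABILITY_IAM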

@dmzeus

dmzeus commented Feb 3, 2025

It works properly if you create and upgrade your cluster from a config file:

eksctl create cluster -f <cluster_config>.yaml
eksctl upgrade cluster -f <cluster_config>.yaml

And it doesn't work if you create the cluster from a config file but upgrade using the --name arg :)
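
For contrast, the flag-based variant that reproduces the failure (placeholders instead of real names):

eksctl create cluster -f <cluster_config>.yaml
eksctl upgrade cluster --name <cluster_name> --approve
eksctl upgrade nodegroup --cluster <cluster_name> --name <nodegroup_name> --kubernetes-version <target_version>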

@twarkie

twarkie commented Feb 14, 2025

+1 on running into this problem now.

eksctl version
0.204.0

eksctl upgrade cluster -f eks.yaml --approve

2025-02-14 12:37:25 [!]  NOTE: cluster VPC (subnets, routing & NAT Gateway) configuration changes are not yet implemented
2025-02-14 12:37:26 [ℹ]  will upgrade cluster "eks" control plane from current version "1.31" to "1.32"
2025-02-14 12:45:22 [✔]  cluster "eks" control plane has been upgraded to version "1.32"
2025-02-14 12:45:22 [ℹ]  you will need to follow the upgrade procedure for all of nodegroups and add-ons
2025-02-14 12:45:22 [ℹ]  re-building cluster stack "eksctl-eks-cluster"
2025-02-14 12:45:22 [✔]  all resources in cluster stack "eksctl-eks-cluster" are up-to-date
2025-02-14 12:45:22 [ℹ]  checking security group configuration for all nodegroups
2025-02-14 12:45:22 [ℹ]  all nodegroups have up-to-date cloudformation templates

eksctl upgrade nodegroup --name=ng-1-workers --cluster=eks --kubernetes-version=1.32

Requested release version 1.32.0-20250203 is not valid for kubernetes version 1.31. 

I can also confirm that manually changing the version in the CloudFormation template got me past this issue.
