bug(AL2023): Pre-nodeadm script doesn't run and post-nodeadm prevents nodes from joining #2123

darox · 2025-01-24T09:30:58Z

What happened:

I have to run the following script at boot to configure the interfaces for XDP. It doesn't matter to me whether it runs before or after nodeadm.

With post-nodeadm:

cat /var/lib/cloud/instance/scripts/part-003 
#!/usr/bin/env bash
ip link set dev ens5 mtu 3498
ethtool -L ens5 combined 2

In this case nodes don't join the cluster.

The status of Kubelet:

[root@ip-10-1-0-222 ec2-user]# service kubelet status
Redirecting to /bin/systemctl status kubelet.service
● kubelet.service - Kubernetes Kubelet
     Loaded: loaded (/etc/systemd/system/kubelet.service; disabled; preset: disabled)
     Active: activating (auto-restart) (Result: resources) since Fri 2025-01-24 07:53:17 UTC; 3s ago
       Docs: https://github.com/kubernetes/kubernetes
        CPU: 0

Kubelet service logs:

Jan 24 08:06:51 ip-10-1-0-222.eu-central-1.compute.internal systemd[1]: kubelet.service: Failed with result 'resources'.
Jan 24 08:06:51 ip-10-1-0-222.eu-central-1.compute.internal systemd[1]: Failed to start kubelet.service - Kubernetes Kubelet.
Jan 24 08:06:56 ip-10-1-0-222.eu-central-1.compute.internal systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 442.
Jan 24 08:06:56 ip-10-1-0-222.eu-central-1.compute.internal audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=kubelet comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jan 24 08:06:56 ip-10-1-0-222.eu-central-1.compute.internal audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=kubelet comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jan 24 08:06:56 ip-10-1-0-222.eu-central-1.compute.internal systemd[1]: Stopped kubelet.service - Kubernetes Kubelet.
Jan 24 08:06:56 ip-10-1-0-222.eu-central-1.compute.internal systemd[1]: kubelet.service: Failed to load environment files: No such file or directory
Jan 24 08:06:56 ip-10-1-0-222.eu-central-1.compute.internal systemd[1]: kubelet.service: Failed to run 'start-pre' task: No such file or directory
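The last two log lines show systemd failing to load one of kubelet.service's environment files and then failing to run its 'start-pre' task. Below is a minimal sketch for checking which environment files a unit expects and whether they exist on the node. It assumes systemctl show -p EnvironmentFiles prints entries of the form EnvironmentFiles=PATH (ignore_errors=no). check_unit_env_files is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: list the EnvironmentFile= entries of a systemd unit and report missing ones.
# Assumption: `systemctl show -p EnvironmentFiles UNIT` prints lines such as
#   EnvironmentFiles=/etc/example/env (ignore_errors=no)
# check_unit_env_files is a hypothetical helper, not part of nodeadm or systemd.
check_unit_env_files() {
  local unit="${1:-kubelet.service}" missing=0 line path flag
  while IFS= read -r line; do
    [[ "$line" == EnvironmentFiles=* ]] || continue
    line="${line#EnvironmentFiles=}"
    path="${line%% *}"                    # first token is the path (no spaces assumed)
    flag="${line##*ignore_errors=}"       # "no)" or "yes)"
    flag="${flag%)}"
    if [[ -e "$path" ]]; then
      echo "ok      $path"
    elif [[ "$flag" == "yes" ]]; then
      echo "absent  $path (optional)"     # "-" prefixed entries may be missing
    else
      echo "MISSING $path"
      missing=1
    fi
  done < <(systemctl show -p EnvironmentFiles "$unit")
  return "$missing"
}
```

On the node this would be run as check_unit_env_files kubelet.service. A non-zero exit status means at least one required environment file was absent at the time of the check.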

The user data is as follows:

cat /var/lib/cloud/instance/user-data.txt
Content-Type: multipart/mixed; boundary="MIMEBOUNDARY"
MIME-Version: 1.0

--MIMEBOUNDARY
Content-Transfer-Encoding: 7bit
Content-Type: application/node.eks.aws
Mime-Version: 1.0

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: <redacted>
    apiServerEndpoint: <redacted>
    certificateAuthority: <redacted>
    cidr: 172.20.0.0/16

--MIMEBOUNDARY
Content-Transfer-Encoding: 7bit
Content-Type: text/x-shellscript
Mime-Version: 1.0

#!/usr/bin/env bash

ip link set dev ens5 mtu 3498
ethtool -L ens5 combined 2
--MIMEBOUNDARY--

The script ran, because we can see the changed MTU:

2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 3498 qdisc mq state UP group default qlen 1000

With pre-nodeadm:

sudo cat /var/lib/cloud/instance/scripts/part-003 
#!/usr/bin/env bash

ip link set dev ens5 mtu 3498
ethtool -L ens5 combined 2

In this case the nodes join the cluster, but the interface MTU is unchanged (still 9001):

2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
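Both cases above run the same two commands. Below is a minimal, more defensive sketch of that script, assuming the interface is still named ens5. It waits for the interface to appear, applies the MTU and channel count, then reads both values back, so the script's own output shows whether the settings took effect at that point in boot. The helper name configure_xdp_iface and the 30-second default wait are illustrative and are not part of nodeadm or cloud-init. Reading the channel count assumes the usual "Current hardware settings" section of the ethtool -l output.

```bash
#!/usr/bin/env bash
# Sketch only: defines a helper; call it at the end of the user-data script.
set -uo pipefail

# configure_xdp_iface IFACE MTU QUEUES [WAIT_SECONDS]
# Hypothetical helper: waits for IFACE, sets its MTU and combined channel count,
# then reads both back and returns non-zero if either value did not stick.
configure_xdp_iface() {
  local iface="$1" mtu="$2" queues="$3" wait_s="${4:-30}"
  local i
  # Wait for the interface to exist; it may not be ready early in boot.
  for ((i = 0; i < wait_s; i++)); do
    ip link show dev "$iface" >/dev/null 2>&1 && break
    sleep 1
  done
  if ! ip link show dev "$iface" >/dev/null 2>&1; then
    echo "configure_xdp_iface: $iface not found after ${wait_s}s" >&2
    return 1
  fi

  ip link set dev "$iface" mtu "$mtu" || return 1
  ethtool -L "$iface" combined "$queues" || return 1

  # Read back: the token after "mtu" in `ip -o link show`, and the
  # "Combined:" value under "Current hardware settings" in `ethtool -l`.
  local cur_mtu cur_queues
  cur_mtu="$(ip -o link show dev "$iface" |
    awk '{for (i = 1; i < NF; i++) if ($i == "mtu") {print $(i + 1); exit}}')"
  cur_queues="$(ethtool -l "$iface" |
    awk '/^Current hardware settings/ {cur = 1} cur && /^Combined:/ {print $2; exit}')"
  echo "configure_xdp_iface: $iface mtu=$cur_mtu combined=$cur_queues"
  [[ "$cur_mtu" == "$mtu" && "$cur_queues" == "$queues" ]]
}
```

In a user-data part, the function definition would be followed by a call such as configure_xdp_iface ens5 3498 2. If the read-back succeeds here but the node later shows mtu 9001, that would suggest something else reset the MTU after the script ran.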

What you expected to happen:

I expected both the pre- and post-nodeadm scripts to run successfully and the nodes to join the cluster.

How to reproduce it (as minimally and precisely as possible):

Define pre- and post-nodeadm scripts, then check whether the scripts ran and whether the nodes joined the cluster.

Environment: EKS

AWS Region: eu-central-1
Instance Type(s): m5n.xlarge
Cluster Kubernetes version: 1.30
Node Kubernetes version: 1.30
AMI Version: amazon-eks-node-al2023-x86_64-standard-1.30-v20250116


ndbaker1 · 2025-01-24T16:55:15Z

Hi @darox, sorry, I'm not quite grokking the pre/post script setup. If you're relying on cloud-init to execute user-data scripts, then everything will run before nodeadm has completed. The bootstrap is split into two parts, so the order looks something like nodeadm-config > cloud-init > nodeadm-run.

Have you checked the logs for those services, e.g. via journalctl -u nodeadm-run?
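Following that suggestion, here is a minimal sketch for collecting this boot's logs for each stage in the order described above, plus kubelet, so they can be attached to the issue. It assumes the unit names nodeadm-config.service and nodeadm-run.service. It also assumes that cloud-init runs user-data scripts in its final stage (cloud-final.service). dump_bootstrap_logs is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: print this boot's journal for each bootstrap stage, in the order
# nodeadm-config > cloud-init > nodeadm-run, followed by kubelet.
# Assumption: cloud-init runs user-data scripts in cloud-final.service.
BOOTSTRAP_UNITS=(nodeadm-config.service cloud-final.service nodeadm-run.service kubelet.service)

dump_bootstrap_logs() {
  local unit
  for unit in "${BOOTSTRAP_UNITS[@]}"; do
    echo "===== $unit ====="
    # -b: current boot only; --no-pager: plain output suitable for pasting.
    journalctl -b -u "$unit" --no-pager
  done
}
```

Run it as root, e.g. dump_bootstrap_logs > bootstrap-logs.txt. Reading the sections in this order makes it easier to see whether the interface script in cloud-final ran before or after nodeadm-run started.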

darox added the bug Something isn't working label Jan 24, 2025
