bug(AL2023): Pre-nodeadm script doesn't run and post-nodeadm prevents nodes from joining #2123

Open
darox opened this issue Jan 24, 2025 · 1 comment
Labels
bug Something isn't working

darox commented Jan 24, 2025

What happened:

I have to run the following script at boot to configure the interfaces for XDP. It doesn't matter whether it runs pre- or post-nodeadm.

With post-nodeadm

cat /var/lib/cloud/instance/scripts/part-003 
#!/usr/bin/env bash
ip link set dev ens5 mtu 3498
ethtool -L ens5 combined 2

In this case nodes don't join the cluster.

The status of Kubelet:

[root@ip-10-1-0-222 ec2-user]# service kubelet status
Redirecting to /bin/systemctl status kubelet.service
● kubelet.service - Kubernetes Kubelet
     Loaded: loaded (/etc/systemd/system/kubelet.service; disabled; preset: disabled)
     Active: activating (auto-restart) (Result: resources) since Fri 2025-01-24 07:53:17 UTC; 3s ago
       Docs: https://github.com/kubernetes/kubernetes
        CPU: 0

Kubelet service logs:

Jan 24 08:06:51 ip-10-1-0-222.eu-central-1.compute.internal systemd[1]: kubelet.service: Failed with result 'resources'.
Jan 24 08:06:51 ip-10-1-0-222.eu-central-1.compute.internal systemd[1]: Failed to start kubelet.service - Kubernetes Kubelet.
Jan 24 08:06:56 ip-10-1-0-222.eu-central-1.compute.internal systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 442.
Jan 24 08:06:56 ip-10-1-0-222.eu-central-1.compute.internal audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=kubelet comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jan 24 08:06:56 ip-10-1-0-222.eu-central-1.compute.internal audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=kubelet comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jan 24 08:06:56 ip-10-1-0-222.eu-central-1.compute.internal systemd[1]: Stopped kubelet.service - Kubernetes Kubelet.
Jan 24 08:06:56 ip-10-1-0-222.eu-central-1.compute.internal systemd[1]: kubelet.service: Failed to load environment files: No such file or directory
Jan 24 08:06:56 ip-10-1-0-222.eu-central-1.compute.internal systemd[1]: kubelet.service: Failed to run 'start-pre' task: No such file or directory
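
The "Failed to load environment files" message above suggests that a file referenced by the unit is missing. As a rough sketch for narrowing that down, the helper below reads unit text and prints every mandatory EnvironmentFile= path that does not exist. The function name missing_env_files is hypothetical and not part of nodeadm or the AMI. Entries prefixed with "-" are skipped because systemd treats those as optional.

```shell
# Hypothetical helper: read systemd unit text on stdin and print each
# mandatory EnvironmentFile= path that is missing on disk.
missing_env_files() {
  # Drop optional entries (EnvironmentFile=-/path), keep only the path of the rest.
  sed -n '/^EnvironmentFile=-/d; s/^EnvironmentFile=//p' |
    while IFS= read -r path; do
      # Print the path only when nothing exists at that location.
      [ -e "$path" ] || printf '%s\n' "$path"
    done
}
```

On an affected node this could be run as `systemctl cat kubelet.service | missing_env_files`. Any path it prints is a candidate for the file that kubelet's start-pre step cannot find.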

The user data is as follows:

cat /var/lib/cloud/instance/user-data.txt
Content-Type: multipart/mixed; boundary="MIMEBOUNDARY"
MIME-Version: 1.0

--MIMEBOUNDARY
Content-Transfer-Encoding: 7bit
Content-Type: application/node.eks.aws
Mime-Version: 1.0

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: <redacted>
    apiServerEndpoint: <redacted>
    certificateAuthority: <redacted>
    cidr: 172.20.0.0/16

--MIMEBOUNDARY
Content-Transfer-Encoding: 7bit
Content-Type: text/x-shellscript
Mime-Version: 1.0

#!/usr/bin/env bash

ip link set dev ens5 mtu 3498
ethtool -L ens5 combined 2
--MIMEBOUNDARY--

The script ran, because we can see the changed MTU:

2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 3498 qdisc mq state UP group default qlen 1000

With pre-nodeadm

sudo cat /var/lib/cloud/instance/scripts/part-003 
#!/usr/bin/env bash

ip link set dev ens5 mtu 3498
ethtool -L ens5 combined 2

In this case nodes join the cluster, but the interface MTU is still the same:

2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
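
In both cases, whether the script ran is inferred from the MTU shown by ip. A minimal sketch of making that check explicit follows. The helper names mtu_from_ip_line and check_mtu are hypothetical, and the interface name and expected value are the ones used in this report.

```shell
# Hypothetical helper: print the MTU found in one line of `ip link` output.
mtu_from_ip_line() {
  # Emit the token that follows the literal word "mtu".
  printf '%s\n' "$1" |
    awk '{ for (i = 1; i < NF; i++) if ($i == "mtu") { print $(i + 1); exit } }'
}

# Hypothetical helper: succeed only if the device reports the expected MTU.
check_mtu() {
  dev="$1"
  expected="$2"
  # -o prints the link on a single line, which keeps the parsing simple.
  actual="$(mtu_from_ip_line "$(ip -o link show dev "$dev")")"
  if [ "$actual" = "$expected" ]; then
    echo "ok: $dev mtu=$actual"
  else
    echo "mismatch: $dev mtu=$actual, expected $expected" >&2
    return 1
  fi
}
```

Calling `check_mtu ens5 3498` at the end of the user-data script would leave a line in the cloud-init output that shows whether the change was applied at that point in the boot.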

What you expected to happen:

I expect both pre- and post-nodeadm scripts to run successfully and the nodes to join the cluster.

How to reproduce it (as minimally and precisely as possible):

Define pre- and post-nodeadm scripts, then check whether the scripts ran and whether the nodes joined the cluster.

Environment: EKS

  • AWS Region: eu-central-1
  • Instance Type(s): m5n.xlarge
  • Cluster Kubernetes version: 1.30
  • Node Kubernetes version: 1.30
  • AMI Version: amazon-eks-node-al2023-x86_64-standard-1.30-v20250116
darox added the bug (Something isn't working) label Jan 24, 2025
ndbaker1 (Member) commented

Hi @darox, sorry, I'm not quite grokking the pre/post script setup. If you're relying on cloud-init to execute user data scripts, then everything will run before nodeadm has completed. The bootstrap is split into two parts, so the order looks something like nodeadm-config > cloud-init > nodeadm-run.

Have you checked the logs for those services, e.g. via journalctl -u nodeadm-run?
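
One way to confirm the nodeadm-config > cloud-init > nodeadm-run order on a given node is to compare when each unit started. The sketch below is an illustration only. The helper names start_times and order_by_start are hypothetical. It also assumes that cloud-final.service is the cloud-init stage that runs user-data scripts, and that InactiveExitTimestampMonotonic (the moment a unit left the inactive state) is a reasonable proxy for start time.

```shell
# Hypothetical helper: print "unit start_usec" for each unit given as an argument.
start_times() {
  for unit in "$@"; do
    printf '%s %s\n' "$unit" \
      "$(systemctl show -p InactiveExitTimestampMonotonic --value "$unit")"
  done
}

# Hypothetical helper: read "unit start_usec" pairs on stdin and print the unit
# names in ascending start order. Units that never started report 0 and sort first.
order_by_start() {
  sort -k2,2n | awk '{ print $1 }'
}
```

For example, `start_times nodeadm-config.service cloud-final.service nodeadm-run.service | order_by_start` lists the three units in the order they started, and `journalctl -u` on each unit then shows what it did.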
