
Support Rocky Linux 9 and AlmaLinux 9 hosts #301

Merged · 1 commit · Sep 15, 2023

Conversation

AkihiroSuda
Member

@AkihiroSuda commented Sep 15, 2023

No description provided.

@vsoch

vsoch commented Sep 15, 2023

Do you want me to ping some rocky devs? I think they might be able to provide insight.

@AkihiroSuda
Member Author

Do you want me to ping some rocky devs? I think they might be able to provide insight.

Probably not yet, until VXLAN works for me on Rocky

@vsoch

vsoch commented Sep 15, 2023

Probably not yet, until VXLAN works for me on Rocky

Okay I won't! But if they could be of help here (getting it working) let me know and I can.

@AkihiroSuda
Member Author

AkihiroSuda commented Sep 15, 2023

WIP: this seems to somehow make VXLAN functional

(sysctl values are from https://qiita.com/tom7/items/1bc7f4e568b20c306845)

# Execute inside `nsenter -t $(pgrep dockerd) -n -U` before running `make up`

# VRF
sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv4.tcp_l3mdev_accept=1
sysctl -w net.ipv4.udp_l3mdev_accept=1
sysctl -w net.ipv4.conf.default.rp_filter=0
sysctl -w net.ipv4.conf.all.rp_filter=0


# Inspired by Cumulus
sysctl -w net.ipv4.conf.default.arp_accept=0
sysctl -w net.ipv4.conf.default.arp_announce=2
sysctl -w net.ipv4.conf.default.arp_filter=0
sysctl -w net.ipv4.conf.default.arp_ignore=1
sysctl -w net.ipv4.conf.default.arp_notify=1
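If it helps, the same block can be applied in one shot (a sketch; `pgrep -o` just picks the oldest dockerd process if more than one matches):

nsenter -t "$(pgrep -o dockerd)" -n -U -- sh -eux <<'EOF'
sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv4.tcp_l3mdev_accept=1
sysctl -w net.ipv4.udp_l3mdev_accept=1
sysctl -w net.ipv4.conf.default.rp_filter=0
sysctl -w net.ipv4.conf.all.rp_filter=0
# (append the ARP-related sysctls from the block above here as well)
EOF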

@vsoch

vsoch commented Sep 15, 2023

Woot! So just to clarify - if I run this on the host nodes (not in containers) right before make up, this should work?

I can try this tonight (after you confirm the above!). It would be so great to get this working on Rocky because our networking is good there, but we haven't figured out Ubuntu yet.

@AkihiroSuda
Member Author

It turns out that net.ipv4.conf.default.rp_filter is set to 1 (strict) on Rocky 9.

This has to be 0 (disabled) or 2 (loose) in the rootless dockerd's network namespace.
(Setting this value for the node container isn't enough).

This value may still remain 1 on the host.

Signed-off-by: Akihiro Suda <[email protected]>
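A quick way to check (and, if needed, relax) the value in that namespace, as a sketch following the same nsenter pattern as above (assumes a single rootless dockerd process):

# Show the value as seen from the rootless dockerd's network namespace
nsenter -t "$(pgrep -o dockerd)" -n -U -- sysctl net.ipv4.conf.default.rp_filter
# If it prints 1, switch it to loose mode
nsenter -t "$(pgrep -o dockerd)" -n -U -- sysctl -w net.ipv4.conf.default.rp_filter=2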
@AkihiroSuda
Member Author

Now this is ready for testing.

@vsoch

vsoch commented Sep 15, 2023

Excellent! So should I test this branch as it is now, no changes to my rocky base images, or do we need further changes?

@AkihiroSuda
Member Author

Excellent! So should I test this branch as it is now, no changes to my rocky base images, or do we need further changes?

No further changes are expected to be needed.

@AkihiroSuda merged commit 462ccf0 into rootless-containers:master Sep 15, 2023
3 checks passed
@vsoch

vsoch commented Sep 15, 2023

Awesome! My rocky image is building now and I should be able to bring up a testing cluster after dinner. Will send you an update when I do! 🎉

@AkihiroSuda
Member Author

Confirmed that this works on AlmaLinux 9.2 too, of course

@AkihiroSuda changed the title from "Support Rocky Linux 9 hosts" to "Support Rocky Linux 9 and AlmaLinux 9 hosts" Sep 15, 2023
@vsoch

vsoch commented Nov 8, 2023

hey @AkihiroSuda! Congrats on your award today, you and your contributions are amazing and we so appreciate you!

I was running into some issues (related to this one, but on Ubuntu) and wanted to post what I learned for some future person.
First, I was still getting a dbus error with the make up command:

cat: /sys/fs/cgroup/user.slice/user-501043911.slice/user@501043911.service/cgroup.controllers: No such file or directory
Failed to connect to bus: No such file or directory
[INFO] systemd not detected, dockerd-rootless.sh needs to be started manually:

The fix was to rebuild my base image with apt-get upgrade added to update the kernel (that worked!). Then I was getting an error about net.ipv4.conf.default.rp_filter, specifically that it was still 1, even though the rootless init script did create this file to set it to 2:

$ cat /etc/sysctl.d/99-usernetes.conf 
net.ipv4.conf.default.rp_filter = 2

I had already run ./init-host/init-host.rootless.sh, and here was the full error:

[INFO] Detected container engine type: docker
[WARNING] systemd lingering is not enabled. Run `sudo loginctl enable-linger $(whoami)` to enable it, otherwise Kubernetes will exit on logging out.
[WARNING] Kernel module "ip6_tables" does not seem loaded? (negligible if built-in to the kernel)
[WARNING] Kernel module "ip6table_nat" does not seem loaded? (negligible if built-in to the kernel)
[WARNING] Kernel module "iptable_nat" does not seem loaded? (negligible if built-in to the kernel)
[ERROR] sysctl value "net.ipv4.conf.default.rp_filter" must be 0 (disabled) or 2 (loose) in the container engine's network namespace
make: *** [Makefile:60: check-preflight] Error 1

(Sidenote: no matter how many times I run this, I always see this warning, and I haven't figured out why yet:)

[WARNING] systemd lingering is not enabled. Run `sudo loginctl enable-linger $(whoami)` to enable it, otherwise Kubernetes will exit on logging out.
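(A quick way to check whether lingering actually took effect, as a sketch assuming systemd-logind is in use:)

loginctl show-user "$(whoami)" --property=Linger   # expect "Linger=yes" after enable-linger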

But I determined that it was apparently still set to 1 on my host:

$ grep [01] /proc/sys/net/ipv4/conf/*/rp_filter|egrep "default|all"
/proc/sys/net/ipv4/conf/all/rp_filter:1

So I did:

$ sudo vim /etc/sysctl.conf
vsochat_gmail_com@usernetes-compute-001:/opt/usernetes$ sudo sysctl -p
net.ipv4.conf.default.rp_filter = 2

(changing it to 2) and restarted docker:

systemctl --user restart docker.service
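(For reference, a persistent variant of the same fix, as a sketch assuming the host reads /etc/sysctl.d/; the file name is just an example:)

echo 'net.ipv4.conf.default.rp_filter = 2' | sudo tee /etc/sysctl.d/99-rp-filter-loose.conf
sudo sysctl --system    # reload all sysctl configuration files
systemctl --user restart docker.service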

And then make up worked! But I wonder why that wasn't fixed from the start? Now I have a control plane!

NAMESPACE      NAME                                                READY   STATUS    RESTARTS   AGE
kube-flannel   kube-flannel-ds-7wstg                               1/1     Running   0          23m
kube-system    coredns-5dd5756b68-ccwtd                            1/1     Running   0          23m
kube-system    coredns-5dd5756b68-m7c7v                            1/1     Running   0          23m
kube-system    etcd-u7s-usernetes-compute-001                      1/1     Running   0          23m
kube-system    kube-apiserver-u7s-usernetes-compute-001            1/1     Running   0          23m
kube-system    kube-controller-manager-u7s-usernetes-compute-001   1/1     Running   0          23m
kube-system    kube-proxy-gzxg8                                    1/1     Running   0          23m
kube-system    kube-scheduler-u7s-usernetes-compute-001            1/1     Running   0          23m

For the worker node, my power went out and I didn't get to test it fully, but when I ran the script to bring up the worker it seemed to hang:

./Makefile.d/check-preflight.sh
[INFO] Detected container engine type: docker
[WARNING] systemd lingering is not enabled. Run `sudo loginctl enable-linger $(whoami)` to enable it, otherwise Kubernetes will exit on logging out.
[WARNING] Kernel module "ip6_tables" does not seem loaded? (negligible if built-in to the kernel)
[WARNING] Kernel module "ip6table_nat" does not seem loaded? (negligible if built-in to the kernel)
[WARNING] Kernel module "iptable_nat" does not seem loaded? (negligible if built-in to the kernel)
docker compose up --build -d
[+] Building 0.2s (9/9) FINISHED                                 docker:default
 => [node internal] load build definition from Dockerfile                  0.0s
 => => transferring dockerfile: 809B                                       0.0s
 => [node internal] load .dockerignore                                     0.0s
 => => transferring context: 75B                                           0.0s
 => [node internal] load metadata for docker.io/kindest/node:v1.28.0       0.2s
 => [node 1/4] FROM docker.io/kindest/node:v1.28.0@sha256:b7a4cad12c197af  0.0s
 => [node internal] load build context                                     0.0s
 => => transferring context: 84B                                           0.0s
 => CACHED [node 2/4] RUN arch="$(uname -m | sed -e s/x86_64/amd64/ -e s/  0.0s
 => CACHED [node 3/4] RUN apt-get update && apt-get install -y --no-insta  0.0s
 => CACHED [node 4/4] ADD Dockerfile.d/u7s-entrypoint.sh /                 0.0s
 => [node] exporting to image                                              0.0s
 => => exporting layers                                                    0.0s
 => => writing image sha256:ef1a52ff46bc2c33546f1db882bb04667aecb3e532c5b  0.0s
 => => naming to docker.io/library/usernetes-node                          0.0s
[+] Running 1/0
 ✔ Container usernetes-node-1  Running                                     0.0s 
docker compose exec -e U7S_HOST_IP=10.10.0.3 -e U7S_NODE_NAME=u7s-usernetes-compute-003 -e U7S_NODE_SUBNET=10.100.5.0/24 node sh -euc '$(cat /usernetes/join-command)'
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: missing optional cgroups: hugetlb

I think the above was running make -C /opt/usernetes up kubeadm-join with the copied-over join-command.
But I didn't see the node with kubectl get nodes. What should I try to debug next? I had to bring my cluster down from my phone when my power went off in case it was an all day thing and I was burning cloud monies. 😆

@AkihiroSuda
Member Author

Congrats on your award today, you and your contributions are amazing and we so appreciate you!

Thank you

But I wonder why that wasn't fixed to start?

Because the sysctl value of the dockerd process is propagated to the container.

But I didn't see the node with kubectl get nodes. What should I try to debug next?

Any error from kubeadm-join?
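One way to look, as a sketch (make logs is the repo's own target; the journalctl units inside the node container are assumptions about the kindest/node image):

make logs
docker compose exec node journalctl -u kubelet -u containerd --no-pager | tail -n 50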

I had to bring my cluster down from my phone when my power went off in case it was an all day thing and I was burning cloud monies. 😆

I'd suggest using local VMs for experimentation.

e.g., with https://lima-vm.io/ :

limactl start --network=lima:user-v2 --name=vm0 template://rockylinux-9
limactl start --network=lima:user-v2 --name=vm1 template://rockylinux-9
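Once both VMs are up, limactl shell gives a shell in each, and the shared user-v2 network lets the two VMs reach each other (usage sketch):

limactl shell vm0   # control plane: clone the repo, run make up, etc.
limactl shell vm1   # worker: run the join steps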

@vsoch

vsoch commented Nov 8, 2023

Oh neat - I am not familiar with this tool. I'll try this out after a meeting / later this evening and give you an update!

@vsoch

vsoch commented Nov 9, 2023

Okay, so I created two Rocky VMs - but I don't really know how to get them networked, or even the basics. I do see there are templates:

@vsoch

vsoch commented Nov 9, 2023

Okay, I installed lima and QEMU and created two Rocky VMs - and I don't know enough basics to even get a ping working from one VM to the other. I do see there are templates:

[screenshot: list of available Lima templates]

Namely, some for k8s and k3s - is there any reason there isn't a template for Usernetes? Is it that a template == one VM? It seems like if one person has stepped through this process of using lima (and knows how to do it), it would be logical to provide a template for a control plane and then N workers for someone else to easily deploy.

@vsoch

vsoch commented Nov 9, 2023

Any error from kubeadm-join?

Will bring up a cluster now and look into this! I've been working for months on these terraform (now OpenTofu) templates and it feels daunting to start from scratch with a VM tool I've never used before. I'm hoping I'm close with the tofu configs on GCP to have something working more quickly.

@vsoch

vsoch commented Nov 9, 2023

Okay, here is the error from kubeadm-join:

docker compose exec -e U7S_HOST_IP=10.10.0.5 -e U7S_NODE_NAME=u7s-usernetes-compute-002 -e U7S_NODE_SUBNET=10.100.153.0/24 node sh -euc '$(cat /usernetes/join-command)'
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: missing optional cgroups: hugetlb
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR CRI]: container runtime is not running: output: time="2023-11-09T04:50:45Z" level=fatal msg="validate service connection: validate CRI v1 runtime API for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/containerd/containerd.sock: connect: no such file or directory\""
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
make: *** [Makefile:112: kubeadm-join] Error 1
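(A hedged debugging sketch for the CRI error above; the compose service name node comes from the command shown, while the containerd systemd unit inside the node container is an assumption:)

docker compose exec node systemctl status containerd --no-pager
docker compose exec node ls -l /var/run/containerd/containerd.sock
docker compose exec node journalctl -u containerd --no-pager | tail -n 50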

@vsoch

vsoch commented Nov 9, 2023

If I shell in (or just run again from the outside) it hangs here:

[preflight] Running pre-flight checks
	[WARNING SystemVerification]: missing optional cgroups: hugetlb

@vsoch

vsoch commented Nov 9, 2023

For the control plane (which appears to work), here is what I see in make logs:

:509] "Failed to ensure process in container with oom score" err="failed to apply oom score -999 to PID 1013: write /proc/1013/oom_score_adj: permission denied"
Nov 09 04:56:43 u7s-usernetes-compute-001 kubelet[1013]: E1109 04:56:43.260899    1013 container_manager_linux.go:509] "Failed to ensure process in container with oom score" err="failed to apply oom score -999 to PID 1013: write /proc/1013/oom_score_adj: permission denied"

And the worker node (hanging) I see:

Nov 09 04:50:45 u7s-usernetes-compute-002 containerd[181]: time="2023-11-09T04:50:45.432440353Z" level=warning msg="The image docker.io/kindest/local-path-helper:v20230510-486859a6 is not unpacked."
Nov 09 04:50:45 u7s-usernetes-compute-002 systemd[1]: systemd-update-utmp-runlevel.service: Succeeded.
Nov 09 04:50:45 u7s-usernetes-compute-002 systemd[1]: Finished Update UTMP about System Runlevel Changes.
Nov 09 04:50:45 u7s-usernetes-compute-002 systemd[1]: Startup finished in 199ms.
Nov 09 04:50:45 u7s-usernetes-compute-002 containerd[181]: time="2023-11-09T04:50:45.444757902Z" level=info msg="Start event monitor"
Nov 09 04:50:45 u7s-usernetes-compute-002 containerd[181]: time="2023-11-09T04:50:45.444784147Z" level=info msg="Start snapshots syncer"
Nov 09 04:50:45 u7s-usernetes-compute-002 containerd[181]: time="2023-11-09T04:50:45.444793703Z" level=info msg="Start cni network conf syncer for default"
Nov 09 04:50:45 u7s-usernetes-compute-002 containerd[181]: time="2023-11-09T04:50:45.444799589Z" level=info msg="Start streaming server"

But I don't see the node is registered:

$ kubectl get nodes
NAME                        STATUS   ROLES           AGE   VERSION
u7s-usernetes-compute-001   Ready    control-plane   11m   v1.28.0

This did work once for me, when it was in the middle of development! I wish I knew what changed :/ I could try going back to Rocky since that works now, but I had thought Ubuntu was a more sound option.

@vsoch

vsoch commented Nov 9, 2023

The hanging terminal finally timed out:

	[WARNING SystemVerification]: missing optional cgroups: hugetlb
error execution phase preflight: couldn't validate the identity of the API Server: Get "https://10.10.0.3:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
To see the stack trace of this error execute with --v=5 or higher
root@u7s-usernetes-compute-002:/usernetes# 
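(A quick reachability check from the worker side, as a sketch using the endpoint from the error above:)

curl -k --max-time 5 https://10.10.0.3:6443/version || echo "API server not reachable"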
