VXLAN doesn't seem to work on GCP (while works on AWS and Azure); probably related to MTU #300

Closed
AkihiroSuda opened this issue Sep 6, 2023 · 21 comments
Labels
bug Something isn't working

Comments

@AkihiroSuda
Member

AkihiroSuda commented Sep 6, 2023

VXLAN doesn't seem to work on GCP, while it works on AWS and Azure

$ kubectl taint nodes --all node-role.kubernetes.io/control-plane-
$ ./hack/test-smoke.sh 
[INFO] Waiting for nodes to be ready
node/u7s-suda-tmp-1 condition met
node/u7s-suda-tmp-2 condition met
[INFO] Creating StatefulSet "dnstest" and headless Service "dnstest"
service/dnstest created
statefulset.apps/dnstest created
[INFO] Waiting for 3 replicas to be ready
Waiting for 3 pods to be ready...
Waiting for 2 pods to be ready...
Waiting for 2 pods to be ready...
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
partitioned roll out complete: 3 new pods have been updated...
[INFO] Connecting to dnstest-{0,1,2}.dnstest.default.svc.cluster.local
If you don't see a command prompt, try pressing enter.
wget: bad address 'dnstest-0.dnstest.default.svc.cluster.local'
pod "dnstest-shell" deleted
pod default/dnstest-shell terminated (Error)

Likely to be related to MTU.

  • GCP: 1460
  • AWS: 9001
  • Azure: 1500

Version: Usernetes gen2-v20230906.0, Rootless Docker 24.0.6, on Ubuntu 22.04.
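To put the MTU figures above in perspective, here is a minimal sketch of the arithmetic. It assumes VXLAN over IPv4, which adds 50 bytes of encapsulation per packet; the function name is made up for illustration. It only shows how much room each underlay leaves for inner packets and does not establish that MTU is the cause of the failure.

```shell
#!/bin/sh
# VXLAN over IPv4 adds 50 bytes per packet:
# 14 (inner Ethernet) + 8 (VXLAN) + 8 (UDP) + 20 (outer IPv4).
VXLAN_OVERHEAD=50

# Print the largest inner MTU that fits in the given underlay MTU.
# (Hypothetical helper, not part of Usernetes.)
vxlan_inner_mtu() {
    echo $(( $1 - VXLAN_OVERHEAD ))
}

# Underlay MTUs as listed in the report above.
for entry in GCP:1460 AWS:9001 Azure:1500; do
    cloud=${entry%%:*}
    mtu=${entry##*:}
    echo "$cloud: underlay $mtu, VXLAN inner MTU at most $(vxlan_inner_mtu "$mtu")"
done
```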

@AkihiroSuda added the bug label Sep 6, 2023
@AkihiroSuda pinned this issue Sep 6, 2023
@AkihiroSuda changed the title from "VXLAN doesn't seem to work on GCP (while works on AWS); probably related to MTU" to "VXLAN doesn't seem to work on GCP (while works on AWS and Azure); probably related to MTU" Sep 6, 2023
@aojea

aojea commented Sep 6, 2023

gcloud compute networks create mtu9k --mtu=8896

https://cloud.google.com/vpc/docs/mtu

@vsoch

vsoch commented Sep 6, 2023

okay found the setting in terraform - testing now.

@vsoch

vsoch commented Sep 6, 2023

ack, it's timing out again on:

[preflight] Running pre-flight checks
	[WARNING SystemVerification]: missing optional cgroups: hugetlb

This happens maybe 2/3 times, so something is up!

@vsoch

vsoch commented Sep 7, 2023

yeah, not getting through either of these steps now with this change. :/ I wonder if this is still an issue with Google networking. I think my next step needs to be to create a terraform setup for AWS. I have a lot in my queue with 2 talks, but I'll find time somewhere!

@vsoch

vsoch commented Sep 14, 2023

We are still debugging the Ubuntu setup. What appears to be happening is that we don't have basic networking: even with a configuration that works on Rocky, on Ubuntu I can start a small web server on some port, and curl -k <address> fails with "no route to host". I've started debugging by removing Docker and NFS entirely, and still no go. I'm not great with networking, but I'll keep reading and trying to understand why it's not working. I'm especially puzzled because it was working before (I think before a change here, but I don't remember the details). Will keep you updated for sure!

@AkihiroSuda
Member Author

FYI I'm trying to support Rocky, but VXLAN doesn't seem to work even with local Lima VMs:

@AkihiroSuda
Member Author

net.ipv4.conf.default.rp_filter seems to be set to 1 (strict) on GCP's Ubuntu image, which might be the cause of the issue on GCP.

@vsoch
Copy link

vsoch commented Sep 15, 2023

Oh! I can test this too. Is it possible to change it, and if so, how?

@AkihiroSuda
Member Author

Confirmed that VXLAN is functional on GCP with 462ccf0 🎉

> Is it possible to change it, and if so, how?

cat >/etc/sysctl.d/99-usernetes.conf <<EOF
# For VXLAN, net.ipv4.conf.default.rp_filter must not be 1 (strict) in the daemon's netns.
# It may still remain 1 in the host netns, but there is no robust and simple way to
# configure sysctl for the daemon's netns. So we are configuring it globally here.
net.ipv4.conf.default.rp_filter = 2
EOF
sysctl --system

(You also have to run systemctl --user restart docker.service afterwards.)

@AkihiroSuda unpinned this issue Sep 15, 2023
@vsoch

vsoch commented Sep 15, 2023

I'm not sure it's sticking - I see:

$ make up
./Makefile.d/check-preflight.sh
[WARNING] systemd lingering is not enabled. Run `sudo loginctl enable-linger $(whoami)` to enable it, otherwise Kubernetes will exit on logging out.
[WARNING] Kernel module "ip6_tables" does not seem loaded? (negligible if built-in to the kernel)
[WARNING] Kernel module "ip6table_nat" does not seem loaded? (negligible if built-in to the kernel)
[WARNING] Kernel module "iptable_nat" does not seem loaded? (negligible if built-in to the kernel)
[ERROR] sysctl value "net.ipv4.conf.default.rp_filter" must be 0 (disabled) or 2 (loose) in the daemon's network namespace
make: *** [Makefile:57: check-preflight] Error 1

And in the output of sysctl --system I see it at the end:

* Applying /etc/sysctl.d/99-usernetes.conf ...
net.ipv4.conf.default.rp_filter = 2
* Applying /etc/sysctl.conf ...

But I still get that message. I checked the file that is applied afterwards (/etc/sysctl.conf), but the relevant line is commented out there, so I suspect it has no influence.

$ cat /etc/sysctl.conf |grep ipv4
#net.ipv4.conf.default.rp_filter=1
#net.ipv4.conf.all.rp_filter=1
#net.ipv4.tcp_syncookies=1
#net.ipv4.ip_forward=1
#net.ipv4.conf.all.accept_redirects = 0
# net.ipv4.conf.all.secure_redirects = 1
#net.ipv4.conf.all.send_redirects = 0
#net.ipv4.conf.all.accept_source_route = 0
#net.ipv4.conf.all.log_martians = 1

Am I missing a detail? I ran the commands from the README on my own, ran into this bug, and then ran the init scripts you prepared, with no luck.

@vsoch

vsoch commented Sep 15, 2023

Ah this is interesting!

$ sysctl -n net.ipv4.conf.default.rp_filter
2
$ docker run --rm --net=host busybox sysctl -n net.ipv4.conf.default.rp_filter
1
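The host/daemon comparison above can be scripted. This is a rough sketch, not the project's actual preflight check: the helper names rp_filter_ok and report are made up for illustration, and the accepted values (0 or 2) are taken from the preflight error message quoted earlier in this thread.

```shell
#!/bin/sh
# Return success if an rp_filter value is acceptable for VXLAN:
# 0 (disabled) or 2 (loose), but not 1 (strict).
# (Hypothetical helper, modelled on the preflight error message.)
rp_filter_ok() {
    case "$1" in
        0|2) return 0 ;;
        *)   return 1 ;;
    esac
}

# Print a one-line verdict for a labelled value, e.g. `report host 2`.
report() {
    if rp_filter_ok "$2"; then
        echo "$1: rp_filter=$2 (ok)"
    else
        echo "$1: rp_filter=$2 (must be 0 or 2)"
    fi
}

# Usage, with the same two commands as above:
#   report host   "$(sysctl -n net.ipv4.conf.default.rp_filter)"
#   report daemon "$(docker run --rm --net=host busybox sysctl -n net.ipv4.conf.default.rp_filter)"
```

If the host line is ok but the daemon line is not, the daemon's network namespace has not picked up the new default.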

@vsoch

vsoch commented Sep 15, 2023

Doh, this fixed it, I think I put it in the wrong spot in my script!

systemctl --user restart docker.service

Trying again!

@vsoch

vsoch commented Sep 15, 2023

okay (for the ubuntu setup) it's still hanging here:

 ✔ Container usernetes-node-1  Running                                                                                        0.0s 
docker compose exec -e U7S_HOST_IP=10.10.0.2 -e U7S_NODE_NAME=u7s-usernetes-compute-002 -e U7S_NODE_SUBNET=10.100.153.0/24 node kubeadm join 10.10.0.4:6443 --token t8ub7m.rfjcdt2jdh24miia --discovery-token-ca-cert-hash sha256:8c3067d686064b134b6f0a604623f13e73fa46e6aa3c0ee44bd9b57b8147213c 
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: missing optional cgroups: hugetlb

That net.ipv4.conf.default.rp_filter value is also 2/2 on the worker node (good). I think the networking issue on Ubuntu is still not fixed: with python3 -m http.server 9999 running and all TCP ports open in the firewall, I get this from another instance:

$ curl -k 10.10.0.2:9999
curl: (7) Failed to connect to 10.10.0.2 port 9999 after 0 ms: No route to host

Going to try rocky instead.

@vsoch

vsoch commented Sep 15, 2023

okay will need to figure out how to install rootless docker on rocky - the default script says unsupported distribution. When I download the script and add rocky to the list, I get:

$ ./install-docker.sh 
# Executing docker install script, commit: e5543d473431b782227f8908005543bb4389b8de
+ sudo -E sh -c 'yum install -y -q yum-utils'

Installed:
  yum-utils-4.0.21-19.el8_8.noarch                                              

+ sudo -E sh -c 'yum-config-manager --add-repo https://download.docker.com/linux/rocky/docker-ce.repo'
Adding repo from: https://download.docker.com/linux/rocky/docker-ce.repo
Status code: 404 for https://download.docker.com/linux/rocky/docker-ce.repo (IP: 99.84.160.77)
Error: Configuration of repo failed

@AkihiroSuda
Member Author

> how to install rootless docker on rocky

if ! command -v dockerd-rootless-setuptool.sh >/dev/null 2>&1; then
    if grep -q centos /etc/os-release; then
        # Works with Rocky and Alma too
        dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
        dnf -y install docker-ce
    else
        curl https://get.docker.com | sh
    fi
fi
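The branch above relies on the string "centos" appearing in /etc/os-release. As an illustration of why that catches Rocky, here is a small sketch that runs the same grep against a given file. The function name and the file argument are made up for illustration, and it assumes Rocky's os-release lists centos in ID_LIKE.

```shell
#!/bin/sh
# Print "centos" if the given os-release file mentions centos
# (for example in ID or ID_LIKE), otherwise "other".
# Hypothetical helper; the file argument defaults to /etc/os-release.
docker_repo_family() {
    if grep -q centos "${1:-/etc/os-release}" 2>/dev/null; then
        echo centos
    else
        echo other
    fi
}

# Usage:
#   docker_repo_family                  # inspects /etc/os-release
#   docker_repo_family ./os-release     # inspects a copy of the file
```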

@vsoch
Copy link

vsoch commented Sep 15, 2023

That worked! Next issue is that this is missing (I'm going through the other make steps now).

[init] Using Kubernetes version: v1.28.2
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: missing optional cgroups: hugetlb
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist

It doesn't hang at the hugetlb warning though, which means the networking is working and that's great! I can confirm that too by starting up a little web server and hitting it with curl -k.

@AkihiroSuda
Member Author

> /proc/sys/net/bridge/bridge-nf-call-iptables does not exist

You need to modprobe br_netfilter

cat >/etc/modules-load.d/usernetes.conf <<EOF
br_netfilter
vxlan
EOF
systemctl restart systemd-modules-load.service
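To confirm the modules were actually loaded, something like the following sketch could be used. The helper name is made up for illustration; it reads a file in /proc/modules format, so modules built into the kernel will be reported as missing even though they are available.

```shell
#!/bin/sh
# Print each requested module that is not listed in a /proc/modules-style file.
# Usage: missing_modules FILE MODULE...
# (Hypothetical helper; built-in modules never appear in /proc/modules.)
missing_modules() {
    modfile=$1
    shift
    for mod in "$@"; do
        # The first field of each /proc/modules line is the module name.
        if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modfile"; then
            echo "$mod"
        fi
    done
}

# Usage:
#   missing_modules /proc/modules br_netfilter vxlan
```

No output means every listed module is loaded.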

@vsoch

vsoch commented Sep 15, 2023

This sequence:

sudo modprobe ip_tables
sudo modprobe br_netfilter 
sudo modprobe vxlan 
sudo systemctl restart systemd-modules-load.service 

# Run init host scripts (I'm not sure if we should skip the first or clone in image build and run there?)
sudo ./init-host/init-host.root.sh 
./init-host/init-host.rootless.sh

always ends with a warning that it's disabled:

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

@vsoch

vsoch commented Sep 15, 2023

On the host:

$ sudo sysctl -a | grep iptables
net.bridge.bridge-nf-call-iptables = 1

But I don't see anything in the container:

docker run --rm --net=host busybox sysctl -a | grep iptables

And I did try:

systemctl --user restart docker.service

But the above is still empty.

@AkihiroSuda
Member Author

You may need modprobe bridge too?

@vsoch

vsoch commented Sep 15, 2023

okay tried that - no change.
