
What services are running in HA control-plane nodes #6

Open
cmonty14 opened this issue Mar 1, 2022 · 13 comments

@cmonty14

cmonty14 commented Mar 1, 2022

Hi,
I want to rebuild your solution in my homelab.
However, I don't fully understand the architecture.

First, I would ask you to clarify these terms:

  • control-plane nodes
  • Master kubernetes cluster

For a PXE-bootable server I would need:

  • PXE server
  • DHCP service
  • NFS service
    Are these services deployed on the HA control-plane nodes?

And what storage type is used?
Is it local storage, meaning each control-plane node offers an NFS service?
Or is it a storage cluster, meaning all the storage attached to the control-plane nodes serves this cluster storage?

Regards
Thomas

@kvaps
Member

kvaps commented Mar 1, 2022

Hi, originally I was trying to replicate the GKE on-prem architecture:
[Screenshot: GKE on-prem architecture diagram]

First you need to bring up your Master Kubernetes cluster, or Admin cluster, which consists of three control-plane nodes.
They are the control-plane nodes for this Admin cluster, as they run control-plane services like etcd, kube-apiserver, the scheduler and the controller-manager.

They also run the containerized control planes for the user-defined clusters (child clusters).
I'm not sure about the terminology, but if you find it confusing, please send me a pull request to fix the documentation.
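
To make this concrete (a hedged illustration, not taken from the repo docs; the namespace name is hypothetical), a child cluster's control plane shows up as ordinary pods in the Admin cluster:

# List the containerized control-plane pods of a hypothetical child cluster "cluster1"
kubectl get pods -n cluster1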

For a PXE-bootable server I would need:

  • PXE server
  • DHCP service
  • NFS service
    Are these services deployed on the HA control-plane nodes?

Yes, they are.

And what storage type is used?
Is it local storage, meaning each control-plane node offers an NFS service?

Etcd is the only storage consumer. Etcd provides HA at the application layer, so you don't need any highly available storage for it. I suggest using local-path-provisioner as the simplest solution.
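
For example, a minimal sketch of that setup, assuming the stock upstream manifest from the rancher/local-path-provisioner repo (check the URL and version for your environment):

# Install local-path-provisioner from the upstream manifest
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml

# Optionally make it the default StorageClass so etcd PVCs bind to it automatically
kubectl patch storageclass local-path -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'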

@cmonty14
Author

cmonty14 commented Mar 1, 2022

Hi,
many thanks for your reply.

Things are getting clearer now.

There's just this follow-up question regarding NFS / Netboot Servers.
My understanding is that there must be a netboot server that provides shared storage over NFS.
If this is located on a single node, it would be a SPOF and break HA.
Therefore I concluded that it is provided by the Master Kubernetes cluster (= Admin cluster).

In my homelab this Admin cluster is built from three Raspberry Pi 4 nodes, each with 4 GB RAM and an SSD attached.
How would this work with local-path-provisioner?
Would the shared storage be provided by every single SSD, meaning two of the SSDs hold replicated data?

@kvaps
Member

kvaps commented Mar 1, 2022

NFS is not used, as the whole rootfs image is loaded directly into RAM. This image is served by the LTSP server, which is separate for each user cluster.

In my homelab this Admin cluster is built on three Raspi4 nodes, each with 4GB RAM and a SSD connected.
How would this work with local-path-provisioner?

I haven't checked that yet, but I guess you'd need to rebuild everything for ARM.
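
A rough sketch of such a rebuild with docker buildx (the image tag is illustrative, the Dockerfile path is the one referenced later in this thread, and the base images may or may not have arm64 variants):

# Cross-build the LTSP image for arm64 and push it to your own registry
docker buildx build --platform linux/arm64 \
  -t registry.example.com/kubefarm-ltsp:arm64 --push \
  -f build/ltsp/Dockerfile build/ltsp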

@cmonty14
Author

cmonty14 commented Mar 1, 2022

NFS is not used, as the whole rootfs image is loaded directly into RAM. This image is served by the LTSP server, which is separate for each user cluster.

Understood... no NFS server, but LTSP. Actually, LTSP includes several services, e.g. DNS, TFTP, NFS, etc.
And LTSP is deployed on the Admin cluster, too.

But LTSP requires storage for the images that the clients boot.
Is this storage provided by every single control-plane node, then?
If yes, I would have n-1 replicas of the images for a cluster with n nodes.
If no, this image storage would be a single point of failure.

@kvaps
Member

kvaps commented Mar 1, 2022

The rootfs image is built using a Dockerfile, so the rootfs image for booting is part of the LTSP-server image. Of course you can run it in multiple replicas.
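
For instance (the deployment name below is hypothetical; check what the kubefarm Helm chart actually creates for your cluster with kubectl get deploy):

# Run the LTSP server for a user cluster in two replicas
kubectl scale deployment cluster1-ltsp --replicas=2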

@cmonty14
Author

cmonty14 commented Mar 1, 2022

And what is your (original) software design for storing this rootfs image?
A dedicated LTSP server, or control-plane node storage?

@samek

samek commented Apr 28, 2022

@kvaps have you got a docker compose of the services needed in order to get a node up?
All that would be needed is the join command (for an existing cluster, for example), and booted nodes would join it?
I tried decomposing the whole thing, but HTTP booting with GRUB is not working for me.

@kvaps
Member

kvaps commented Apr 29, 2022

The intent is that you can use standard tools like kubeadm and kubespray to bootstrap the Kubernetes cluster. All the needed components can then be installed in HA mode inside it.
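
For example, a minimal HA bootstrap sketch with kubeadm (the endpoint, token, hash and certificate key are placeholders; point the endpoint at a VIP or load balancer in front of the control-plane nodes):

# On the first control-plane node: create the cluster behind a stable API endpoint
kubeadm init --control-plane-endpoint "k8s-api.example.local:6443" --upload-certs

# On the other control-plane nodes: join using the command printed by kubeadm init
kubeadm join k8s-api.example.local:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <key>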

@samek

samek commented Apr 29, 2022

Would you have any idea why I get the menu, but then when it wants to load vmlinuz from nginx (inside Docker) it fails? The nginx logs only show that it partially downloaded the file, while curl works (and to me it seems it times out), e.g.:

192.168.42.133 - - [29/Apr/2022:13:07:10 +0000] "GET /ltsp/x86_64/vmlinuz HTTP/1.1" 200 13668608 "-" "curl/7.68.0"
192.168.42.151 - - [29/Apr/2022:13:10:27 +0000] "GET /ltsp/x86_64/vmlinuz HTTP/1.1" 200 155170 "-" "GRUB 2.04-1ubuntu44.2"
192.168.42.151 - - [29/Apr/2022:13:12:39 +0000] "GET /ltsp/x86_64/vmlinuz HTTP/1.1" 200 128906 "-" "GRUB 2.04-1ubuntu44.2"
192.168.42.151 - - [29/Apr/2022:13:18:18 +0000] "GET /ltsp/x86_64/vmlinuz HTTP/1.1" 200 117650 "-" "GRUB 2.04-1ubuntu44.2"
192.168.42.151 - - [29/Apr/2022:13:22:12 +0000] "GET /ltsp/x86_64/vmlinuz HTTP/1.1" 200 53866 "-" "GRUB 2.04-1ubuntu44.2"

@kvaps
Member

kvaps commented Apr 29, 2022

Unfortunately I have no idea. Do you use your own DHCP server?

@samek

samek commented Apr 29, 2022

So what I did was run the Docker image twice: once for dnsmasq-tftp + dnsmasq-dhcp (with data from the dhcp-controller) + images, and once for serving via nginx.

I took the configuration files from the Kubernetes deployment (/etc/ltsp/ and /etc/dnsmasq.d).

The Docker image was built from https://github.com/kvaps/kubefarm/blob/master/build/ltsp/Dockerfile.

dnsmasq-dhcp returns the IP:

dnsmasq-tftp: TFTP root is /srv/tftp  single port mode
dnsmasq-dhcp: read /etc/dnsmasq.d/dhcp-hosts/kubefarm-cluster1-cluster1-ltsp-clients
dnsmasq-dhcp: read /etc/dnsmasq.d/dhcp-opts/kubefarm-cluster1-cluster1-ltsp-ip
dnsmasq-dhcp: read /etc/dnsmasq.d/dhcp-opts/kubefarm-cluster1-cluster1-ltsp-options
dnsmasq-dhcp: read /etc/dnsmasq.d/dhcp-opts/kubefarm-cluster1-cluster1-ltsp-tags
dnsmasq-dhcp: DHCPDISCOVER(ens33) 00:0c:29:73:12:15 
dnsmasq-dhcp: DHCPOFFER(ens33) 192.168.42.151 00:0c:29:73:12:15 
dnsmasq-dhcp: DHCPREQUEST(ens33) 192.168.42.151 00:0c:29:73:12:15 
dnsmasq-dhcp: DHCPACK(ens33) 192.168.42.151 00:0c:29:73:12:15 moj2

then dnsmasq-tftp serves files

dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/core.efi to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/core.efi to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/normal.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/extcmd.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/verifiers.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/crypto.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/gettext.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/terminal.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/gzio.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/gcry_crc.mod to 192.168.42.151
dnsmasq-tftp: file /srv/tftp/ltsp/grub/grub.cfg-01-00-0c-29-73-12-15 not found for 192.168.42.151
dnsmasq-tftp: file /srv/tftp/ltsp/grub/grub.cfg-C0A82A97 not found for 192.168.42.151
dnsmasq-tftp: file /srv/tftp/ltsp/grub/grub.cfg-C0A82A9 not found for 192.168.42.151
dnsmasq-tftp: file /srv/tftp/ltsp/grub/grub.cfg-C0A82A not found for 192.168.42.151
dnsmasq-tftp: file /srv/tftp/ltsp/grub/grub.cfg-C0A82 not found for 192.168.42.151
dnsmasq-tftp: file /srv/tftp/ltsp/grub/grub.cfg-C0A8 not found for 192.168.42.151
dnsmasq-tftp: file /srv/tftp/ltsp/grub/grub.cfg-C0A not found for 192.168.42.151
dnsmasq-tftp: file /srv/tftp/ltsp/grub/grub.cfg-C0 not found for 192.168.42.151
dnsmasq-tftp: file /srv/tftp/ltsp/grub/grub.cfg-C not found for 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/command.lst to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/fs.lst to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/crypto.lst to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/terminal.lst to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/grub.cfg to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/test.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/efi_gop.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/video_fb.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/video.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/efi_uga.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/cpuid.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/regexp.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/echo.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/linux.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/relocator.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/mmap.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/linuxefi.mod to 192.168.42.151
dnsmasq-tftp: sent /srv/tftp/ltsp/grub/x86_64-efi/http.mod to 192.168.42.151

The menu opens and then the request comes to nginx, where only a partial vmlinuz is downloaded - but if I curl the URL (which is also printed in grub.cfg) I get the whole file.

192.168.42.133 - - [29/Apr/2022:13:07:10 +0000] "GET /ltsp/x86_64/vmlinuz HTTP/1.1" 200 13668608 "-" "curl/7.68.0"
192.168.42.151 - - [29/Apr/2022:13:10:27 +0000] "GET /ltsp/x86_64/vmlinuz HTTP/1.1" 200 155170 "-" "GRUB 2.04-1ubuntu44.2"

If I just use the /etc/ltsp config and create a default LTSP setup with NFS and dnsmasq, the node boots up and joins the cluster. But then the root is mounted via NFS...
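
For reference, roughly the plain LTSP flow I mean, using the standard LTSP 19+ server commands (the image source is illustrative; adjust for your distribution):

# Build a squashfs image from the server's own root (or point at a chroot/VM image instead of "/")
ltsp image /
# Publish kernel/initrd, the NFS export and the dnsmasq/iPXE boot config
ltsp kernel
ltsp initrd
ltsp nfs
ltsp ipxe
ltsp dnsmasq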

@kvaps
Member

kvaps commented Apr 30, 2022

Ah, got it. Did you run these commands to regenerate the LTSP initrd image and GRUB config?

https://github.com/kvaps/kubefarm/blob/f397481db8369c8e5f4c6b297659aa22da224a58/deploy/helm/kubefarm/templates/ltsp-deployment.yaml#L62

@samek

samek commented May 1, 2022

Yes, I did. I have now tried it on a "real" server, not a VM on my computer, and it works. So I guess all along there was something in my VMware setup causing this.
I would really like to thank you for everything that you've done in this project - it's a great idea for how to provision nodes/clusters.
