diff --git a/README.md b/README.md index 07f0c6e..251b603 100644 --- a/README.md +++ b/README.md @@ -1,17 +1,17 @@ # Metal Cluster -This is our repo for everything involving our bare-metal Kubernetes cluster. At this moment, +This is our repo for everything related to the now-deprecated "flock" Kubernetes cluster. At this moment, it also contains documentation for setting up Kubernetes, JupyterHub, and BinderHub on Google Cloud. The folder [docs](./docs) contains all documentation related to these topics. -The root of this repository contains files relating to the set-up of the bare-metal cluster, +The folder [flock-archive](./flock-archive) contains files relating to the set-up of the flock cluster. ## Table of Contents This is the table of contents of the [documentation](./docs) folder, organized by the following topics. ### Bare-Metal -1. [Bare Metal Cluster Setup](./docs/Bare-Metal/baremetal.md) has what you should read first about - the cluster. The file gives an overview of the cluster set-up, including networking, publishing services, - instructions on adding nodes, and useful resources. +1. [Bare Metal Cluster Setup](./docs/Bare-Metal/baremetal.md) has reading material on nearly every
+aspect of the flock cluster. The file gives an overview of the cluster set-up, including networking, publishing services,
+instructions on adding nodes, and useful resources.
#### Concepts
1. [RAID.md](./docs/Bare-Metal/concepts/RAID.md) describes the purpose and different levels of RAID.
@@ -36,8 +36,6 @@ setting up JupyterHub on a virtual machine. It contains solutions to problems encountered when installing JupyterHub through the [jupyterhub-deploy-teaching](https://github.com/mechmotum/jupyterhub-deploy-teaching) repository. We keep this as a reference for those who might encounter the same problems in the future. - - ### JupyterHub on GCloud This section teaches how to set up and configure JupyterHub on Google Cloud. @@ -57,5 +55,3 @@ on a Kubernetes cluster on Google Cloud. We created a development [cluster of vms](./dev-env) using Vagrant that is nice for testing stuff without having it on the main cluster. This section contains files and instructions implementing the test cluster. - - diff --git a/docs/.JupyterBareMetalWithLVM.md.swo b/docs/.JupyterBareMetalWithLVM.md.swo deleted file mode 100644 index db25e2b..0000000 Binary files a/docs/.JupyterBareMetalWithLVM.md.swo and /dev/null differ diff --git a/docs/.JupyterBareMetalWithLVM.md.swp b/docs/.JupyterBareMetalWithLVM.md.swp deleted file mode 100644 index e12a649..0000000 Binary files a/docs/.JupyterBareMetalWithLVM.md.swp and /dev/null differ diff --git a/docs/Bare-Metal/concepts/networking.md b/docs/Bare-Metal/concepts/networking.md index 9ff4565..35a286a 100644 --- a/docs/Bare-Metal/concepts/networking.md +++ b/docs/Bare-Metal/concepts/networking.md @@ -1,4 +1,7 @@ # Networking + +NOTE: This file is outdated for the [current galaxy cluster](https://github.com/LibreTexts/galaxy-control-repo/tree/production/router-configs). + *Relevant files: `/etc/netplan/`* *Summary: https://netplan.io/examples* We use [netplan](https://netplan.io/) to configure networking on rooster. diff --git a/docs/Bare-Metal/concepts/nginx.md b/docs/Bare-Metal/concepts/nginx.md index 9d209bd..186cfad 100644 --- a/docs/Bare-Metal/concepts/nginx.md +++ b/docs/Bare-Metal/concepts/nginx.md @@ -1,5 +1,7 @@ # NGINX +NOTE: This file is outdated for the [current galaxy cluster](https://github.com/LibreTexts/galaxy-control-repo/tree/production/router-configs).
+
*Relevant files: `/etc/nginx`*
*Summary: http://nginx.org/en/docs/http/load_balancing.html* diff --git a/docs/Bare-Metal/login.md b/docs/Bare-Metal/login.md index 0be81e7..1e79242 100644 --- a/docs/Bare-Metal/login.md +++ b/docs/Bare-Metal/login.md @@ -1,10 +1,7 @@ # Login This JupyterHub serves LibreTexts instructors and their students, as well as UC Davis faculty, staff, and students. -## Request an account -If you are a LibreTexts or UC Davis student, please request an account by sending your -Google OAuth enabled email to . Your email address must have a Google Account -that can be used with Google OAuth, like `@gmail.com` or `@ucdavis.edu`. +If you are a UC Davis student, access to this JupyterHub is already granted. Just log in with your school email. ## Getting started with Jupyter [Jupyter](https://jupyter.org/index.html) is an environment where you can diff --git a/docs/Bare-Metal/troubleshooting/README.md b/docs/Bare-Metal/troubleshooting/README.md index 73da878..131da02 100644 --- a/docs/Bare-Metal/troubleshooting/README.md +++ b/docs/Bare-Metal/troubleshooting/README.md @@ -1,13 +1,15 @@ # Common Practices For Troubleshooting +NOTE: This file and others in this folder may be outdated for the [current galaxy cluster](https://github.com/LibreTexts/galaxy-control-repo/tree/production/kubernetes/). + If the troubleshooting that you are doing for a particular problem is ineffective, it can largely be due to any of the following reasons: 1. You may be looking at symptoms unrelated to the problem. * Fixing this is a matter of becoming better acquainted with the system. The following are some resources that you may want to review: - 1. [Bare-Metal](https://github.com/LibreTexts/metalc/blob/docs/Troubleshooting-Summary/docs/Bare-Metal/baremetal.md) - 2. [BinderHub](https://github.com/LibreTexts/metalc/blob/docs/Troubleshooting-Summary/docs/Binder-on-GCloud/01-BinderHub.md) - 3. [Maintenance](https://github.com/LibreTexts/metalc/blob/docs/Troubleshooting-Summary/docs/maintenance-tasks.md) - 4. [Adding new packages to the Dockerfile](https://github.com/LibreTexts/default-env/tree/master/rich-default) + 1. Documentation in galaxy-control-repo + 2. [BinderHub](/docs/Binder-on-GCloud/01-BinderHub.md) + 3. [Maintenance](/docs/maintenance-tasks.md) + 4. [Adding new packages to the Dockerfile](https://github.com/LibreTexts/default-env/) 2. Not fully understanding how to change the system, such as the inputs, outputs, the environment, etc. * This is similar to the previous example: make sure you have good knowledge of the system you are looking at. Visit any of the above links, and feel free to check out any other documentation. The following may potentially be relevant: @@ -16,7 +18,7 @@ If the troubleshooting that you are doing for a particular problem is ineffectiv 3. [BinderHub FAQ](https://mybinder.readthedocs.io/en/latest/faq.html) 3. Assuming that the problem you are facing is the same as one you have previously dealt with given that the symptoms are the same. * If you are dealing with similar symptoms, definitely look into how you have previously handled the issue, or how solutions have been documented - [here](https://github.com/LibreTexts/metalc/tree/docs/Troubleshooting-Summary/docs/Bare-Metal/troubleshooting). Howevever, if you have tried everything you + [here](/docs/Bare-Metal/troubleshooting/). However, if you have tried everything you have done before, it might be worth trying something different.
A good example of this kind of dilemma is [this issue](https://github.com/LibreTexts/metalc/blob/master/docs/Bare-Metal/troubleshooting/KubeadmCert.md) we had. diff --git a/docs/maintenance-tasks.md b/docs/maintenance-tasks.md index 824b3c0..b2df817 100644 --- a/docs/maintenance-tasks.md +++ b/docs/maintenance-tasks.md @@ -2,48 +2,63 @@ This document lists all tasks that should be done regularly. -## Security Update on Rooster - -* Frequency: weekly -* Command: `sudo unattended-upgrade -d` - -Rooster is the only server that has a public network. It is essential to keep the system up to date. We use [unattended-upgrade](https://github.com/mvo5/unattended-upgrades) to upgrade packages safely. To minimize affecting the cluster, check the following before running the command: - -1. Run `sudo unattended-upgrade -d --dry-run` to make sure that it will upgrade without error. -2. Check `kubectl get pods -n jhub` to see if there are a lot of people using the cluster. Try to upgrade when no one is there. - -Additionally, there is a cron job on rooster that runs `sudo unattended-upgrade -d --dry-run` and sends out weekly emails on Friday. Do `sudo crontab -e` to edit the cron job. If you wish to change the code of the cron job, the shell script is located at `/home/spicy/metalc-configurations/cronjob/weekly-security-update`. The shell script is a python script that uses pipes to run commands. -## Scrub Checks on Hen (ZFS) +## Scrub Checks on Blackhole (ZFS) * Frequency: monthly * Command: `sudo zpool scrub nest` -To execute manually, you must first ssh into hen from rooster with the command `ssh hen`. Scrub checks the file system's integrity, and repairs any issues that it finds. After the scrub is finished, it is good to also run `zpool status` to check if there is anything wrong. - -This command is run on a cronjob, so there should be no need for manual intervention. The cronjob runs at 8:00AM the first day of every month. Rooster will also send out an email at 8:10AM on the same day with the results. If the scrub is fine, the email should be titled `[All clear] Hen monthly ZFS report`. If the title says `[POTENTIAL ZFS ISSUE]` instead, there may be something wrong with the disk, and the email contains the `zpool status` output which you can use to debug disk issues. Details on how the cronjob is setup are in the private configuration repo, under `cronjob/monthly-zfs-report.py`. +To execute manually, you must first ssh into blackhole.
+Scrub checks the file system's integrity, and repairs
+any issues that it finds. After the scrub is finished,
+it is good to also run `zpool status` to check if there
+is anything wrong.
+
+This command is run on a cronjob. The cronjob runs at 8:00AM
+the first day of every month. Gravity also sends out an
+email at 8:10AM on the same day with the results. If the
+scrub is fine, the email is titled
+`[All clear] Hen monthly ZFS report`. If the title
+says `[POTENTIAL ZFS ISSUE]` instead, there may be
+something wrong with the disk, and the email contains
+the `zpool status` output which can be used to debug disk
+issues. Details on how the cronjob was set up are in the
+private configuration repo, under `cronjob/monthly-zfs-report.py`.
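For readers without access to the private configuration repo, here is a minimal sketch of what such a cron setup could look like. The script path, output location, and mail step are assumptions for illustration only; the real entries live in `cronjob/monthly-zfs-report.py`.

```sh
# Hypothetical root crontab on blackhole -- illustrative only, not the actual config.
# Kick off the scrub at 8:00 AM on the 1st of every month.
0 8 1 * * /usr/sbin/zpool scrub nest
# Ten minutes later, dump pool health so a report can be mailed out from gravity
# (the real report script does more than this, e.g. setting the email subject line).
10 8 1 * * /usr/sbin/zpool status nest > /var/tmp/monthly-zfs-report.txt
```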
## Cluster control plane upgrade -The Kubernetes control plane should be upgraded regularly. There is a cronjob sending out a triyearly reminder (Jan, May, Sept 1st of every year) reminding you to do the upgrade. (The cronjob can be found in the private configuration repo, under `cronjob/cluster-upgrade-reminder.sh).
-
-This must be done at least once a year, otherwise the Kubernetes certificates may expire, which will [break the entire cluster](https://github.com/LibreTexts/metalc/blob/master/docs/Bare-Metal/troubleshooting/KubeadmCert.md#more-complex-solution-renewing-kubeadm-certificates). The email sent out should contain the date on which the certificates expire. If you apply routine upgrades, certificates should be renewed automatically and this should not be an issue. If the certificates do expire, follow that guide instead of this one. If you need to renew the certificates without upgrading the cluster, follow [the guide to renew certificates without upgrading](#renew-certificates-without-upgrade) instead, but this should not be an excuse to put off cluster upgrades indefinitely. +The Kubernetes control plane should be upgraded regularly.
+There is a cronjob that sends out a reminder three times a
+year (on Jan 1, May 1, and Sept 1) to do the
+upgrade. (The cronjob can be found in galaxy-control-repo.)
+
+This must be done at least once a year, otherwise the
+Kubernetes certificates may expire, which will
+[break the entire cluster](/docs/Bare-Metal/troubleshooting/KubeadmCert.md#more-complex-solution-renewing-kubeadm-certificates).
+The email sent out contains the date on which the
+certificates expire. If you apply routine upgrades,
+certificates should be renewed automatically and this
+should not be an issue. If the certificates do expire,
+follow that guide instead of this one. If you need to
+renew the certificates without upgrading the cluster,
+follow [the guide to renew certificates without upgrading](#renew-certificates-without-upgrade)
+instead, but this should not be an excuse to put off
+cluster upgrades indefinitely.
### Upgrading the control plane -tl;dr: Follow https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/ and upgrade the control plane, update apt packages on the nodes, then copy `/etc/kubernetes/admin.conf` on chick0 into `/home/spicy/.kube/config` on rooster. +tl;dr: Follow https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/ and upgrade the control planes, update apt packages on the nodes, then copy `/etc/kubernetes/admin.conf` on a nebula into `/home/milky/.kube/config` on gravity. Note: This will take a while and will cause some downtime. Be sure to notify users beforehand.
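As orientation before the detailed steps below, here is a condensed sketch of the command sequence the official guide walks through. The version numbers are placeholders and the exact apt pinning flags may differ between Kubernetes releases, so treat this as a rough sketch rather than a copy-paste recipe.

```sh
# Illustrative sketch of a kubeadm upgrade -- always follow the official guide for the real steps.

# On the first control plane node (version numbers are placeholders):
sudo apt-get update
sudo apt-get install -y --allow-change-held-packages kubeadm=1.21.x-00
sudo kubeadm upgrade plan          # shows which versions you can move to
sudo kubeadm upgrade apply v1.21.x

# On each additional control plane node:
sudo kubeadm upgrade node

# On every node (control plane and workers), after draining it from a machine with kubectl access:
#   kubectl drain <node> --ignore-daemonsets
sudo apt-get install -y --allow-change-held-packages kubelet=1.21.x-00 kubectl=1.21.x-00
sudo systemctl daemon-reload
sudo systemctl restart kubelet
#   kubectl uncordon <node>
```

The cluster-specific steps follow.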
-1. Follow the instructions in the [official cluster upgrade guide](https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/). We only have one control plane node (chick0), so follow the "Upgrade the first control plane node" secion on chick0 and ignore "Upgrade additional control plane nodes". Also follow "Upgrade worker nodes" on all other chicks. This is also a good time to [upgrade all packages on the chicks](https://github.com/LibreTexts/metalc/blob/447a459bacfbc6a29d80229e7df2f2bfb953cd7a/docs/updating-ubuntu-kubelet.md) as well, since the chicks are cordoned during the upgrade. -2. Once you verified the cluster is working using `kubectl get nodes` on rooster, copy over the newer admin certificate/key. - 1. SSH into chick0, then do `sudo cp /etc/kubernetes/admin.conf /home/spicy/.kube/config`. Also `chown spicy:spicy /home/spicy/.kube/config` to make it readable to us. - 2. Go back into rooster and do `scp chick0:.kube/config ~/.kube/config` to copy the file onto rooster. - 3. Verify `kubectl `works on both rooster and chick0 by running any kubectl command (such as `kubectl get nodes`). +1. Follow the instructions in the [official cluster upgrade guide](https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/). We have multiple control plane nodes (nebulas), so follow both the "Upgrade the first control plane node" and "Upgrade additional control plane nodes" sections. Also follow "Upgrade worker nodes" on all other worker nodes. This is also a good time to [upgrade all packages on the stars](https://github.com/LibreTexts/metalc/blob/447a459bacfbc6a29d80229e7df2f2bfb953cd7a/docs/updating-ubuntu-kubelet.md) as well, since the stars are cordoned during the upgrade. +2. Once you have verified the cluster is working using `kubectl get nodes` on gravity/quantum, copy over the newer admin certificate/key. +   1. SSH into a nebula, then do `sudo cp /etc/kubernetes/admin.conf /home/milky/.kube/config`. Also `chown milky:milky /home/milky/.kube/config` to make it readable to us. +   2. Go back into gravity and do `scp nebula{1,5}:.kube/config ~/.kube/config` to copy the file onto gravity. +   3. Verify `kubectl` works on both gravity and the nebulas by running any kubectl command (such as `kubectl get nodes`). ### Renew certificates without upgrade Sometimes you want to renew the certificates without doing a proper upgrade and causing downtime. In that case, do the following: -1. Follow [the official guide](https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#manual-certificate-renewal). tl;dr: Just run `sudo kubeadm alpha certs renew` on chick0. +1. Follow [the official guide](https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#manual-certificate-renewal). tl;dr: Just run `sudo kubeadm alpha certs renew` on all of the nebulas. 2. Follow the same step 2 as [Upgrading the control plane](#upgrading-the-control-plane). diff --git a/docs/updating-ubuntu-kubernetes.md b/docs/updating-ubuntu-kubernetes.md index 574434d..e8635f3 100644 --- a/docs/updating-ubuntu-kubernetes.md +++ b/docs/updating-ubuntu-kubernetes.md @@ -1,10 +1,19 @@ # Updating Ubuntu and Kubernetes -This document lists the procedure for updating Ubuntu and Kubernetes on the chick nodes. +Note: This document is mostly outdated for the [current galaxy cluster](https://github.com/LibreTexts/galaxy-control-repo/#upgrading-the-kubernetes-cluster). + +This document lists the procedure for updating Ubuntu and
+Kubernetes on the chick nodes.
## Checking Software Versions on the Nodes -You can check the versions of kubernetes, Ubuntu and the kernel as well as the status of each node by executing the command `kubectl get nodes -o wide` from rooster. When you do Kubernetes upgrades, make sure that you do not upgrade more than one minor version at a time. For example, if the cluster is at verison 1.19 and the latest available version is 1.21, you should first upgrade everything to 1.20, then 1.21. +You can check the versions of Kubernetes, Ubuntu, and the
+kernel as well as the status of each node by executing the
+command `kubectl get nodes -o wide` from rooster. When you
+do Kubernetes upgrades, make sure that you do not upgrade
+more than one minor version at a time. For example, if the
+cluster is at version 1.19 and the latest available version
+is 1.21, you should first upgrade everything to 1.20, then 1.21.
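If you want a more compact view of the per-node versions than the full `kubectl get nodes -o wide` output, a jsonpath query along these lines can help; this snippet is an illustration and is not taken from the original docs:

```sh
# Print node name, kubelet version, OS image, and kernel version, one node per line.
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kubeletVersion}{"\t"}{.status.nodeInfo.osImage}{"\t"}{.status.nodeInfo.kernelVersion}{"\n"}{end}'
```

Compare the kubelet versions against the version you plan to move to, keeping the one-minor-version-at-a-time rule above in mind.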
## Preparing to Update @@ -16,7 +25,13 @@ You can check the versions of kubernetes, Ubuntu and the kernel as well as the s ## Updating Kubernetes -The official documentation for upgrading Kubernetes is available [here](https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/). You will first have to upgrade kubeadm and then use it to upgrade kubelet and kubectl. The processis pretty straightforward if you follow the official documentation. Also checkout `maintenance-tasks.md` for more information about cluster upgrades and nuances. +The official documentation for upgrading Kubernetes is available
+[here](https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/).
+You will first have to upgrade kubeadm and then use it to
+upgrade kubelet and kubectl. The process is pretty straightforward
+if you follow the official documentation. Also check out
+`maintenance-tasks.md` for more information about cluster
+upgrades and nuances.
## Updating Ubuntu diff --git a/ansible/hosts b/flock-archive/ansible/hosts similarity index 100% rename from ansible/hosts rename to flock-archive/ansible/hosts diff --git a/ansible/misc/htop.yml b/flock-archive/ansible/misc/htop.yml similarity index 100% rename from ansible/misc/htop.yml rename to flock-archive/ansible/misc/htop.yml diff --git a/ansible/misc/hwe.yml b/flock-archive/ansible/misc/hwe.yml similarity index 100% rename from ansible/misc/hwe.yml rename to flock-archive/ansible/misc/hwe.yml diff --git a/ansible/misc/test.yml b/flock-archive/ansible/misc/test.yml similarity index 100% rename from ansible/misc/test.yml rename to flock-archive/ansible/misc/test.yml diff --git a/ansible/misc/test2.yml b/flock-archive/ansible/misc/test2.yml similarity index 100% rename from ansible/misc/test2.yml rename to flock-archive/ansible/misc/test2.yml diff --git a/ansible/playbooks/init.yml b/flock-archive/ansible/playbooks/init.yml similarity index 100% rename from ansible/playbooks/init.yml rename to flock-archive/ansible/playbooks/init.yml diff --git a/ansible/playbooks/kube-deps.yml b/flock-archive/ansible/playbooks/kube-deps.yml similarity index 100% rename from ansible/playbooks/kube-deps.yml rename to flock-archive/ansible/playbooks/kube-deps.yml diff --git a/ansible/playbooks/main.yml b/flock-archive/ansible/playbooks/main.yml similarity index 100% rename from ansible/playbooks/main.yml rename to flock-archive/ansible/playbooks/main.yml diff --git a/ansible/playbooks/master.yml b/flock-archive/ansible/playbooks/master.yml similarity index 100% rename from ansible/playbooks/master.yml rename to flock-archive/ansible/playbooks/master.yml diff --git a/ansible/playbooks/workers.yml b/flock-archive/ansible/playbooks/workers.yml similarity index 100% rename from ansible/playbooks/workers.yml rename to flock-archive/ansible/playbooks/workers.yml diff --git a/docs/Bare-Metal/baremetal.md b/flock-archive/baremetal.md similarity index 100% rename from docs/Bare-Metal/baremetal.md rename to flock-archive/baremetal.md diff --git a/calico.yml b/flock-archive/calico.yml similarity index 100% rename from calico.yml rename to flock-archive/calico.yml diff --git a/chicks.csv b/flock-archive/chicks.csv similarity index 100% rename from chicks.csv rename to flock-archive/chicks.csv diff --git a/dev-env/.gitignore b/flock-archive/dev-env/.gitignore similarity index 100% rename from dev-env/.gitignore rename to flock-archive/dev-env/.gitignore diff --git a/dev-env/README.md b/flock-archive/dev-env/README.md similarity index 100% rename from
dev-env/README.md rename to flock-archive/dev-env/README.md diff --git a/dev-env/Vagrantfile b/flock-archive/dev-env/Vagrantfile similarity index 100% rename from dev-env/Vagrantfile rename to flock-archive/dev-env/Vagrantfile diff --git a/dev-env/ingress/ingress.yml b/flock-archive/dev-env/ingress/ingress.yml similarity index 100% rename from dev-env/ingress/ingress.yml rename to flock-archive/dev-env/ingress/ingress.yml diff --git a/dev-env/kube-flannel.yml b/flock-archive/dev-env/kube-flannel.yml similarity index 100% rename from dev-env/kube-flannel.yml rename to flock-archive/dev-env/kube-flannel.yml diff --git a/dev-env/nfs-client-vals.yml b/flock-archive/dev-env/nfs-client-vals.yml similarity index 100% rename from dev-env/nfs-client-vals.yml rename to flock-archive/dev-env/nfs-client-vals.yml diff --git a/get_macs.py b/flock-archive/get_macs.py similarity index 100% rename from get_macs.py rename to flock-archive/get_macs.py diff --git a/metallb-config.yml b/flock-archive/metallb-config.yml similarity index 100% rename from metallb-config.yml rename to flock-archive/metallb-config.yml diff --git a/nfs-client-vals.yml b/flock-archive/nfs-client-vals.yml similarity index 100% rename from nfs-client-vals.yml rename to flock-archive/nfs-client-vals.yml