WMCore Kubernetes 101
This wiki will contain kubernetes procedures specific to WMCore central services. However, for full and rich details, please refer to the CMSWEB documentation, e.g.: CMSWEB k8s cluster
To execute these steps, you will need to access lxplus8.cern.ch, have the cmsweb kubernetes configuration file and have privileges to access the namespace you are responsible for. In our case, we require access to the following namespaces:
- dmwm: for all of the WMCore central services (ReqMgr2, microservices, etc).
- couchdb: only commissioned in our development clusters (testX). Production and integration still rely on this service running in specific VMs.
The following table lists the CMSWEB environment, the kubernetes cluster name and the services configuration branch name.
Environment name | Cluster name | Configuration branch |
---|---|---|
prod | prod | prod |
testbed | preprod | preprod |
develop | test[1-13] | test |
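When scripting against several clusters, the mapping in the table above can be written as a small lookup. The following is only a sketch: the function name `config_branch_for_cluster` is hypothetical (it is not part of CMSKubernetes), and it assumes that cluster names may carry a `cmsweb-` prefix, as in cmsweb-test1.

```shell
# Hypothetical helper (not part of CMSKubernetes): print the services_config
# branch that matches a given cluster name, following the table above.
config_branch_for_cluster() {
  # Assumption: an optional "cmsweb-" prefix is dropped before matching.
  name="${1#cmsweb-}"
  case "$name" in
    prod)       echo "prod" ;;
    preprod)    echo "preprod" ;;
    test[0-9]*) echo "test" ;;
    *)
      echo "unknown cluster: $1" >&2
      return 1
      ;;
  esac
}
```

For instance, `config_branch_for_cluster cmsweb-test9` prints `test`, which could then be passed to `git checkout` inside a services_config clone.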
Several DEV clusters have been allocated to the WMCore team; however, we still have to organize and allocate those resources within our team. The kubernetes clusters dedicated to WMCore are:
Cluster name | Developer |
---|---|
cmsweb-test1 | Todor |
cmsweb-test5 | Kenyi |
cmsweb-test8 | Erik |
cmsweb-test9 | Alan |
cmsweb-test10 | Valentin |
The current list of services - and docker images - that need to be provided from the WMCore project is:
reqmgr2
workqueue
reqmon
t0_reqmon
reqmgr2ms-transferor
reqmgr2ms-monitor
reqmgr2ms-output
reqmgr2ms-rulecleaner
reqmgr2ms-unmerged
reqmgr2ms-pileup
Note that the following two services are still provided in specific VMs (except in the development environment, which also has them in kubernetes):
- acdcserver (deployed as a CouchApp)
- CouchDB
For CMSWEB services, to which the WMCore services belong, we keep all Dockerfiles in the following repository area: https://github.com/dmwm/CMSKubernetes/tree/master/docker
Each individual directory is named after the service, e.g. reqmgr2 is for the ReqMgr2 WMCore service. These directories contain at least the relevant Dockerfile, but they might contain additional auxiliary files used during the docker build process.
WMCore has two distinct docker areas:
- docker/xxx: package area based on the RPM images;
- docker/pypi/xxx: area based on the PyPI deployment approach.
Since December 2022, we no longer have to build the WMCore docker images manually. A GitHub Actions workflow performs the following for every service provided by WMCore:
- a PyPI package is built and uploaded
- the same workflow builds the docker image with pip install
- docker images are uploaded to the CERN Registry
This process usually takes less than 15 minutes for all the packages and images. The workflow is triggered whenever a new tag is created, except for tags used in the Jenkins tests.
However, in case an image needs to be manually built and uploaded, you first need to access a node (your laptop, a VO box, etc) that is running Docker, and then log in to the CERN Registry with:
docker login registry.cern.ch
If you are not able to log in, you need to fetch your credentials from the CERN Registry. For that, access the CERN Registry, then in the upper right corner select Drop Down Box (username) -> User Profile -> CLI Secret and copy it to the clipboard.
Finally, build one of our WMCore PyPI images from the CMSKubernetes repository:
cd docker
# now build the dmwm-base image
docker build -t registry.cern.ch/cmsweb/dmwm-base:pypi-20230110 pypi/dmwm-base
# finally, upload this new image to the registry
docker push registry.cern.ch/cmsweb/dmwm-base:pypi-20230110
Regarding the services configuration and secret files, stored in the services_config repository, there are different branches for the different service levels:
- prod: contains configuration and secrets for the production cluster.
- preprod: contains configuration and secrets for the testbed cluster.
- test: contains configuration and secrets for all of the dev clusters.
You need to:
- log in to lxplus8.cern.ch
- load an openstack token with
export OS_TOKEN=$(openstack token issue -c id -f value)
- and export the correct cluster configuration to KUBECONFIG, e.g.
export KUBECONFIG=/afs/cern.ch/user/a/amaltaro/private/cmsweb/config.cmsweb-k8s-services-testbed
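Before running any kubectl command, it can be handy to confirm that both variables from the steps above are in place. The function below is a minimal sketch of such a pre-flight check; its name, `check_k8s_env`, is hypothetical, and it assumes that KUBECONFIG points to a single file rather than a colon-separated list.

```shell
# Hypothetical pre-flight check (the function name is an assumption): verify
# that the environment described above is in place before running kubectl.
check_k8s_env() {
  if [ -z "${OS_TOKEN:-}" ]; then
    echo "OS_TOKEN is not set; run: export OS_TOKEN=\$(openstack token issue -c id -f value)" >&2
    return 1
  fi
  # Assumption: KUBECONFIG holds a single path, not a colon-separated list.
  if [ -z "${KUBECONFIG:-}" ] || [ ! -r "$KUBECONFIG" ]; then
    echo "KUBECONFIG is unset or does not point to a readable file" >&2
    return 1
  fi
  echo "using cluster configuration: $KUBECONFIG"
}
```

Running `check_k8s_env && kubectl get pods -n dmwm` would then only query the cluster once the environment looks complete.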
Now you should be able to check which PODs and services are running in the kubernetes cluster that you loaded. To list all the DMWM pods, run:
kubectl get pods -n dmwm
To list all the secrets available in a given kubernetes namespace, execute the following command (replacing <NAMESPACE> with the namespace you want, e.g. dmwm):
kubectl get secrets -n <NAMESPACE>
e.g.
kubectl get secrets -n dmwm
In order to describe one specific kubernetes secret, you can run the following command:
kubectl describe secret <SECRET_NAME> -n <NAMESPACE>
e.g.
kubectl describe secret reqmgr2-secrets -n dmwm
The content of a given secret object can be retrieved with the following command:
kubectl get secret <SECRET_NAME> -n <NAMESPACE> -o go-template='{{range $k,$v := .data}}{{printf "%s: " $k}}{{if not $v}}{{$v}}{{else}}{{$v | base64decode}}{{end}}{{"\n"}}{{end}}'
e.g.
kubectl get secret rucio-secrets -n dmwm -o go-template='{{range $k,$v := .data}}{{printf "%s: " $k}}{{if not $v}}{{$v}}{{else}}{{$v | base64decode}}{{end}}{{"\n"}}{{end}}'
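The go-template above decodes every key at once. The underlying idea is that values stored under .data in a secret are base64-encoded, which the short sketch below illustrates locally. The helper name `decode_secret_value` is hypothetical and the sample value is made up; it is not a real secret.

```shell
# Values under .data in a kubernetes secret are base64-encoded. This
# hypothetical helper decodes one such value passed as an argument.
decode_secret_value() {
  printf '%s' "$1" | base64 --decode
}

# Example with a made-up value (not a real secret):
decode_secret_value 'aGVsbG8='   # prints: hello

# To decode a single key straight from the cluster, the same decoding can be
# applied to a jsonpath query (placeholders to be filled in by the reader):
#   kubectl get secret <SECRET_NAME> -n <NAMESPACE> -o jsonpath='{.data.<KEY>}' | base64 --decode
```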
You can retrieve the service names available under a given namespace with the following command:
kubectl get deploy -n <NAMESPACE>
e.g.
kubectl get deploy -n dmwm
These service names may also be referred to as deployment names.
You can list all the replicas/PODs available in a given namespace with the following command:
# get list of pods
kubectl get pods -n <NAMESPACE>
In case you need to restart a specific POD, for instance to load a new secret object, it can be performed as:
kubectl delete pod -n <NAMESPACE> --wait=false <NAME_OF_THE_POD>
# note that --wait=false makes the command non-blocking
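A specific example for CouchDB is given below. Note that this deletes the resources defined in the manifest, so the service has to be deployed again afterwards: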
kubectl delete -f $HOME/private/cmsweb-deploy/CMSKubernetes/kubernetes/cmsweb/services/couchdb.yaml
If you push configuration or secret changes to a given service, you need to restart all of its replicas/PODs so that the new configuration gets properly loaded. For that, you can execute the following command:
kubectl rollout restart deploy <SERVICE_NAME> -n <NAMESPACE>
e.g.
kubectl rollout restart deploy reqmgr2 -n dmwm
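When several services need the new configuration, the restart can be wrapped in a loop that also waits for each rollout to complete. This is a sketch, not an established procedure: the function name `restart_and_wait` and the `KUBECTL` override variable are hypothetical, while `kubectl rollout status` is the standard subcommand that blocks until the rollout finishes.

```shell
# Hypothetical helper: restart several deployments in one namespace and wait
# for each rollout to finish. KUBECTL can be overridden (e.g. KUBECTL=echo)
# to preview the commands without touching the cluster.
restart_and_wait() {
  ns="$1"
  shift
  for svc in "$@"; do
    "${KUBECTL:-kubectl}" rollout restart deploy "$svc" -n "$ns" || return 1
    # "rollout status" blocks until the new replicas are available
    "${KUBECTL:-kubectl}" rollout status deploy "$svc" -n "$ns" || return 1
  done
}
```

For example, `KUBECTL=echo restart_and_wait dmwm reqmgr2 workqueue` only prints the four kubectl argument lists, assuming workqueue is also a deployment name in your cluster. Dropping the `KUBECTL=echo` prefix runs them for real.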
If you want to access a container/POD, the following command can be executed to open a bash terminal inside the container:
kubectl exec -it <POD_NAME> -n <NAMESPACE> -- bash
e.g.
kubectl exec -it ms-monitor-65469d7cc-qgqt5 -n dmwm -- bash
Logs of a POD can be viewed with the following command:
kubectl logs <POD_NAME> -n <NAMESPACE>
e.g.
kubectl logs ms-monitor-65469d7cc-qgqt5 -n dmwm
which is especially important for replicas that don't manage to start up.
In case you want to permanently stop a given service, such that it does not get automatically restarted, you need to execute the following command:
kubectl delete -f $HOME/private/cmsweb-deploy/CMSKubernetes/kubernetes/cmsweb/services/${SERVICE_NAME}.yaml
If the service deployment didn't go through and your POD keeps crashing and restarting, hence not allowing you to open a bash inside the pod for further debugging, you can run the following command to get the tail of the pod log:
kubectl logs <REPLICA_NAME> -n <NAMESPACE>
e.g.
kubectl logs ms-monitor-65469d7cc-qgqt5 -n dmwm
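For a POD that keeps restarting, the log of the previously crashed container is often more informative than the current one. The sketch below combines the standard `--tail` and `--previous` options of `kubectl logs`; the function name `crash_logs`, the default of 100 lines and the `KUBECTL` override variable are assumptions made for illustration.

```shell
# Hypothetical helper: print the last lines of a pod's log, from the current
# container and from the previously crashed one. KUBECTL can be overridden
# (e.g. KUBECTL=echo) to preview the commands.
crash_logs() {
  pod="$1"
  ns="$2"
  lines="${3:-100}"
  "${KUBECTL:-kubectl}" logs "$pod" -n "$ns" --tail="$lines"
  # --previous selects the previous instance of the container, if one exists
  "${KUBECTL:-kubectl}" logs "$pod" -n "$ns" --tail="$lines" --previous
}
```

A call such as `crash_logs ms-monitor-65469d7cc-qgqt5 dmwm 50` would show the last 50 lines of both logs.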
This is a one-time action. If deploying WMCore services in a kubernetes cluster fails with the following error:
Error from server (NotFound): secrets "dmwm-keys-secrets" not found
you need to contact the CMSWEB/HTTP team, such that they can install this key in the cluster for you.
Complete documentation for this process can be found at deploy secrets.
In short, after setting up your environment, you need to have your configuration changes merged in the services_config repository (for which you need to contact Imran). Once the configuration changes have been merged in the correct branches, you can proceed with this process.
In lxplus8, go to your private directory - one that has restricted AFS ACL - and clone the services_config repository (note that you might need to use a different directory name):
cd $HOME/private/cmsweb-deploy
git clone https://:@gitlab.cern.ch:8443/cmsweb-k8s/services_config.git && cd services_config
Now you need to check out the correct branch and ensure that your recent changes show up in the log (note that you might need to update the branch name below):
git checkout preprod
git log -5 --pretty=oneline --decorate
Then move to your home directory, or anywhere outside of the local services_config repository, and clone CMSKubernetes this time:
cd $HOME
git clone https://github.com/dmwm/CMSKubernetes.git && cd CMSKubernetes/kubernetes/cmsweb
Finally, you can now execute the following script to push your local up-to-date configuration changes to the k8s cluster:
./scripts/deploy-secrets.sh <NAMESPACE> <SERVICE_NAME> <PATH_TO_SERVICES_CONFIG_ROOT_DIRECTORY>
e.g.:
./scripts/deploy-secrets.sh dmwm reqmgr2 $HOME/private/cmsweb-deploy/services_config
./scripts/deploy-secrets.sh dmwm rucio $HOME/private/cmsweb-deploy/services_config  # for rucio-secrets
The script will download an application called sops, which is used to decrypt the secrets and deploy them into k8s. Check the logs to see if there were any errors.
[NOTE: this section is likely not relevant, given that kubernetes clusters are created and the HTTP team performs the initial setup, including namespace and secrets creation]
These steps are specific to CouchDB deployment in Kubernetes, which for the moment is only performed for the dev environment, while testbed and production remain in dedicated VMs and based on RPM deployment.
Note that two different CouchDB images have been created (each bringing in a different stack):
- registry.cern.ch/cmsweb/couchdb:3.2.2-v2: based on our RPM procedure, thus bringing in all the dependencies defined in cmsdist (image size of 3.7GB).
- registry.cern.ch/cmsweb/couchdb:3.2.2-v4: vanilla Debian-based CouchDB image (bringing in Erlang 23.3.4.14 and SpiderMonkey 78).
Assuming you have performed the initial openstack, kubernetes, CMSKubernetes and services_config setup, list the available namespaces in this dev cluster:
kubectl get ns
If the couchdb namespace does not exist yet, create it:
kubectl create ns couchdb
In order to run almost every CMS-based service, a proxy file with robot certificates must be available to the services. For that, the CMSWEB k8s setup relies on the /etc/secrets/proxy file, which should contain a valid robot certificate. This file is created from a k8s secret. Usually, these secrets are created by CMSWEB operators, but if your cluster or namespace lacks it, you can take one of the existing proxy-secrets from a different cluster and namespace (as they all contain the same robot certificate).
Here is an example of how to fetch the content of proxy-secrets from the dmwm namespace located on testbed or another testXXX cluster:
kubectl -n dmwm get secret proxy-secrets -o jsonpath='{.data.proxy}' | base64 --decode > ~/proxy
Please note that the name of the secret is proxy-secrets, while it contains a single file named proxy.
You can verify that with:
kubectl -n dmwm describe secret proxy-secrets
...
proxy: 10986 bytes
Once we have fetched the proxy file from proxy-secrets, we can deploy it to our cluster as follows:
kubectl create secret generic proxy-secrets --from-file=$HOME/proxy --dry-run=client -o yaml | kubectl apply -n couchdb -f -
rm -f ~/proxy
Now we need to clone the configuration repository to deploy further configuration and secrets (NOTE: you must check out the proper branch; test is used for all our dev clusters):
cd $HOME/private
git clone ssh://git@gitlab.cern.ch:7999/cmsweb-k8s/services_config.git && cd services_config && git checkout test
and we can finally deploy the remaining CouchDB configuration and secrets with (note that we do not need to deploy hmackey.ini anymore):
kubectl create secret generic couchdb-secrets --from-file=$HOME/private/services_config/couchdb/couch_creds --from-file=$HOME/private/services_config/couchdb/local.ini --dry-run=client -o yaml | kubectl apply --namespace=couchdb -f -
Alternatively, the CMSWEB group provides a tool to deploy a collection of files to your k8s cluster namespace. The tool is called deploy-secrets.sh and it is located in the CMSKubernetes/kubernetes/cmsweb/scripts area. To deploy your couchdb secrets with this script, use it like this:
cd $HOME/CMSKubernetes/kubernetes/cmsweb/
./scripts/deploy-secrets.sh <namespace> <service> <path to configuration files>
# for example, to deploy the couchdb secret files we run
./scripts/deploy-secrets.sh couchdb couchdb /<path>/services_config
The deploy-secrets.sh script takes care of properly decrypting encrypted files, if they exist in your service area. It reads all files from the service area and creates the proper secret.
Now that all configuration and secrets are in place, we can proceed with the deployment of the CouchDB service itself. Start by cloning the CMSKubernetes repository, and then run the script that deploys the service:
git clone https://github.com/dmwm/CMSKubernetes.git
cd $HOME/CMSKubernetes/kubernetes/cmsweb/
./scripts/deploy-srv.sh couchdb 3.2.2-v4
If deployment was successful, you should be able to find a POD running for CouchDB, e.g.:
$ kubectl get pods -n couchdb
NAME READY STATUS RESTARTS AGE
couchdb-858d54c457-7sk8v 1/1 Running 0 12h
[NOTE: with microservices using MongoDBaaS, this step is no longer required and this service is provided by the HTTP team (together with Panos)]
Until we start using MongoDBaaS for ms-output, ms-output-mongo needs to be deployed in the k8s test clusters.
Since it does not use the standard cmsweb tag, we need to deploy it with the "latest" tag instead.
./scripts/deploy-secrets.sh dmwm ms-output-mongo /<path>/services_config
./scripts/deploy-srv.sh ms-output-mongo latest