# Setup Test Environment for CI/CD
This document outlines the steps to set up a test environment for OPEA CI/CD from scratch. The environment is used to run tests and ensure code quality before PR merges and releases.

## Install Habana Driver (Gaudi Only)
1. Driver and software installation
https://docs.habana.ai/en/latest/Installation_Guide/Driver_Installation.html
2. Firmware upgrade
https://docs.habana.ai/en/latest/Installation_Guide/Firmware_Upgrade.html
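A quick sanity check after the driver and firmware steps (a minimal sketch; `hl-smi` is installed as part of the Habana software stack):
```shell
hl-smi                # should list every Gaudi card with driver/firmware versions
lsmod | grep habana   # confirm the habanalabs kernel modules are loaded
```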

## Install Docker
```shell
sudo apt update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin
sudo systemctl enable docker.service
sudo systemctl daemon-reload
sudo systemctl start docker
```
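To confirm the installation before moving on (a minimal sketch):
```shell
docker --version
sudo docker run --rm hello-world   # pulls and runs a trivial test image
```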
### Troubleshooting Docker Installation
1. Issue: `E: Unable to locate package docker-compose-plugin`
**Solution:** add Docker's official APT repository, then install the plugin.
```shell
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-compose-plugin
```
2. Issue: `permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.45/containers/json": dial unix /var/run/docker.sock: connect: permission denied`
**Solution:** either add your user to the `docker` group (preferred) or relax the socket permissions.
```shell
# option 1: replace xxx with your user name, then log out and back in (or run `newgrp docker`)
sudo usermod -a -G docker xxx
# option 2: world-writable socket (less secure)
sudo chmod 666 /var/run/docker.sock
```
3. Issue: raise the open-file limit (`ulimit -n`) for containerd. [optional]
**Solution:**
```shell
sudo mkdir -p /etc/systemd/system/containerd.service.d
cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/override.conf
[Service]
LimitNOFILE=infinity
EOF
sudo systemctl daemon-reload
sudo systemctl restart containerd.service
```
4. Issue: raise `vm.max_map_count`, the maximum number of memory-mapped areas a process can have. [optional]
**Solution:**
```shell
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
sysctl vm.max_map_count   # check the new value
```

## Install Conda
Conda is used to set up the e2e test environment.
```shell
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
```
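As a sketch of the e2e setup, an isolated environment can be created once Miniconda is installed; the environment name `opea-e2e` and the Python version are assumptions, not fixed by the CI scripts:
```shell
conda create -n opea-e2e python=3.10 -y   # hypothetical environment name
conda activate opea-e2e
python --version
```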

## Install K8S
1. Use kubeadm to set up the k8s cluster.
https://github.com/opea-project/docs/blob/main/guide/installation/k8s_install/k8s_install_kubeadm.md
2. Install the Habana device plugin (Gaudi only).
https://docs.habana.ai/en/latest/Installation_Guide/Additional_Installation/Kubernetes_Installation/Intel_Gaudi_Kubernetes_Device_Plugin.html
### Verify the Installation
```shell
kubectl get nodes -o wide
kubectl get pods -A
kubectl get cs                     # deprecated in recent k8s versions, but still informative
kubectl describe node <node_name>
kubectl describe pod <pod_name>
```
Test for Gaudi:
```shell
cat <<EOF | tee test.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: habanalabs-gaudi-demo
spec:
  template:
    spec:
      hostIPC: true
      restartPolicy: OnFailure
      containers:
        - name: habana-ai-base-container
          image: vault.habana.ai/gaudi-docker/1.21.1/ubuntu24.04/habanalabs/pytorch-installer-2.6.0:latest
          workingDir: /root
          command: ["hl-smi"]
          securityContext:
            capabilities:
              add: ["SYS_NICE"]
          resources:
            limits:
              habana.ai/gaudi: 1
EOF

kubectl apply -f test.yaml
kubectl delete -f test.yaml
```
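Between the apply and the delete, the job output can be checked to confirm the pod actually saw a Gaudi device (a minimal sketch, using the job name from `test.yaml`):
```shell
kubectl get pods -l job-name=habanalabs-gaudi-demo   # wait for Completed
kubectl logs job/habanalabs-gaudi-demo               # should print hl-smi output
```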

## Setup Image Registry for K8S Test
1. Create a local Docker image registry.
```shell
cat << EOF | tee registry.yaml
version: 0.1
log:
  fields:
    service: registry
storage:
  cache:
    blobdescriptor: inmemory
  filesystem:
    rootdirectory: /var/lib/registry
  delete:
    enabled: true
http:
  addr: :5000
  headers:
    X-Content-Type-Options: [nosniff]
health:
  storagedriver:
    enabled: true
    interval: 10s
    threshold: 3
EOF

cd /scratch-1   # place to store the images
mkdir local_image_registry && chmod -R 777 local_image_registry
# adjust the registry.yaml path below to wherever you created it
docker run -d -p 5000:5000 --restart=always --name registry -v /home/sdp/workspace/registry.yaml:/etc/docker/registry/config.yml -v /scratch-1/local_image_registry:/var/lib/registry registry:2
```
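Once the container is running, the Registry v2 API can be used to confirm it is reachable (a minimal sketch; the catalog is empty until images are pushed):
```shell
docker ps --filter name=registry        # the registry container should be Up
curl http://localhost:5000/v2/_catalog  # expect {"repositories":[]}
```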
2. Set up a cron job to clean up the Docker registry (a scheduling sketch follows below), using this script:
https://github.com/opea-project/Validation/blob/main/tools/image-registry/cleanup.sh
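A sketch of the crontab entry; the script path, log path, and weekly 03:00 schedule are assumptions to adjust for your machine:
```shell
# crontab -e, then add a line such as:
0 3 * * 0 /home/sdp/workspace/cleanup.sh >> /var/log/registry-cleanup.log 2>&1
```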
3. Set up connections to the local registry.

For Docker:
```shell
cat /etc/docker/daemon.json
# Gaudi:
{"runtimes": {"habana": {"path": "/usr/bin/habana-container-runtime", "runtimeArgs": []}}, "default-runtime": "habana", "insecure-registries" : [ "100.83.111.232:5000" ]}
# Xeon:
{"insecure-registries": ["100.83.111.232:5000"]}

# restart Docker
sudo systemctl restart docker

# test the connection
docker pull opea/chatqna:latest
docker tag opea/chatqna:latest 100.83.111.232:5000/opea/chatqna:test
docker push 100.83.111.232:5000/opea/chatqna:test
```
For K8S:
```shell
# set up on the client side
cat /etc/containerd/config.toml
...
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
    endpoint = ["https://registry-1.docker.io"]
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."100.83.111.232:5000"]
    endpoint = ["http://100.83.111.232:5000"]
...
# restart containerd
sudo systemctl restart containerd.service

# set up on the server side
cd /etc/containerd
sudo mkdir -p certs.d/100.83.111.232:5000
cd certs.d/100.83.111.232:5000
cat << EOF | sudo tee hosts.toml
server = "http://100.83.111.232:5000"
[host."http://100.83.111.232:5000"]
  capabilities = ["pull", "resolve", "push"]
EOF

# restart containerd
sudo systemctl restart containerd.service

# test the connection
docker pull opea/chatqna:latest
docker tag opea/chatqna:latest 100.83.111.232:5000/opea/chatqna:test
docker push 100.83.111.232:5000/opea/chatqna:test
sudo nerdctl -n k8s.io pull 100.83.111.232:5000/opea/chatqna:test
```
4. Set the environment variable for CI/CD.
```shell
# add to ~/.bashrc
vi .bashrc
export OPEA_IMAGE_REPO=100.83.111.232:5000/
```
5. Build and push images to the new local registry, for example as sketched below.
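A sketch reusing the `opea/chatqna` image and the `OPEA_IMAGE_REPO` variable from above; the image name and tag are only examples:
```shell
source ~/.bashrc   # pick up OPEA_IMAGE_REPO
docker tag opea/chatqna:latest ${OPEA_IMAGE_REPO}opea/chatqna:latest
docker push ${OPEA_IMAGE_REPO}opea/chatqna:latest
```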

## Setup GHA ENV for CI/CD
1. Set up a self-hosted runner for GHA, following the official steps.
2. Set the ENV for GHA.
```shell
vi ~/action_runner/.env
OPEA_IMAGE_REPO=100.83.111.232:5000/
```
3. Start the runner as a service.
```shell
sudo ./svc.sh install   # use svc.sh instead of run.sh
sudo ./svc.sh start
sudo ./svc.sh status
sudo ./svc.sh stop
```
## Setup Action Runner Controller (ARC)
https://docs.github.com/en/actions/tutorials/quickstart-for-actions-runner-controller
For now, ARC is only supported on the Xeon K8S cluster.
1. Install ARC.
Make sure Kubernetes and Helm are installed on your test machine.
```shell
NAMESPACE="opea-arc-systems"
helm install arc \
  --namespace "${NAMESPACE}" \
  --create-namespace \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller

# to remove ARC later:
helm uninstall arc -n $NAMESPACE
kubectl delete namespace $NAMESPACE --grace-period=0 --force
```
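After the install, the controller pod should come up in the ARC namespace (a minimal sketch):
```shell
helm list -n opea-arc-systems
kubectl get pods -n opea-arc-systems   # the gha-runner-scale-set-controller pod should be Running
```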
2. Install a runner scale set.
The runner image used in CI/CD is built from this [dockerfile](https://github.com/opea-project/Validation/blob/main/tools/actions-runner-controller/xeon.dockerfile),
and the configuration for the runner scale set can be found [here](https://github.com/opea-project/Validation/blob/main/tools/actions-runner-controller/xeon.yaml).
```shell
RUNNER_SET_NAME="xeon"
RUNNERS_NAMESPACE="opea-runner-set-c1"
RUNNER_GROUP="opea-runner-set-1"   # before using this name, make sure the group has been created in GHA
GITHUB_CONFIG_URL="https://github.com/opea-project"
GITHUB_PAT="xxx"                   # a GHA personal access token with permission to create runners in the org/repo
helm install "${RUNNER_SET_NAME}" \
  --namespace "${RUNNERS_NAMESPACE}" \
  --create-namespace \
  -f xeon_large.yaml \
  --set githubConfigUrl="${GITHUB_CONFIG_URL}" \
  --set githubConfigSecret.github_token="${GITHUB_PAT}" \
  --set runnerGroup="${RUNNER_GROUP}" \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set
```
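A quick check that the scale set registered (a minimal sketch; listener and runner pod names vary):
```shell
helm list -n opea-runner-set-c1
kubectl get pods -n opea-runner-set-c1
```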
**Notes:**
a. Make sure the nodes in the cluster have enough resources to run the runner pods.
b. Create the dedicated `RUNNER_GROUP` in GHA, which is used to group the runners.
c. Make sure the label used by `nodeSelector` is set, e.g. `kubectl label nodes opea-cicd-spr-0 runner-node=true`, and check it with `kubectl get nodes --show-labels`.
d. Make sure `/data2` exists for the model cache.

3. Clean up the ARC (if needed).
```shell
# clean up the runner set
(
RUNNER_SET_NAME="xeon"
RUNNERS_NAMESPACE="opea-runner-set-c1"
helm uninstall $RUNNER_SET_NAME -n $RUNNERS_NAMESPACE
kubectl delete namespace $RUNNERS_NAMESPACE --grace-period=0 --force
)
# clean up ARC
(
NAMESPACE="opea-arc-systems"
helm uninstall arc -n $NAMESPACE
kubectl delete namespace $NAMESPACE --grace-period=0 --force
)
```