Commit 32eba0b

Add CI/CD Test Environment Setup Step-by-Step BKM (#389)

Signed-off-by: chensuyue <[email protected]>

1 parent 10f2089 commit 32eba0b

File tree

2 files changed: +274 −0 lines changed

deploy/index.rst

Lines changed: 12 additions & 0 deletions
@@ -31,6 +31,18 @@ Installation Guides

    ../guide/installation/*
    ../guide/installation/**/*

+CI/CD ENV Setup
+***************
+
+.. rst-class:: rst-columns
+
+.. toctree::
+   :maxdepth: 1
+   :glob:
+
+   ../guide/cicd/*
+   ../guide/cicd/**/*
+
 Cloud Service Provider
 **********************

guide/cicd/setup_cicd_env.md

Lines changed: 262 additions & 0 deletions
@@ -0,0 +1,262 @@
# Setup Test Environment for CI/CD

This document outlines the steps to set up a test environment for OPEA CI/CD from scratch. The environment is used to run tests and ensure code quality before PR merge and release.

## Install Habana Driver (Gaudi Only)

1. Driver and software installation:
   https://docs.habana.ai/en/latest/Installation_Guide/Driver_Installation.html
2. Firmware upgrade:
   https://docs.habana.ai/en/latest/Installation_Guide/Firmware_Upgrade.html

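After the driver and firmware steps, a quick sanity check such as the one below confirms the Gaudi devices are visible. This is a minimal sketch; it assumes the Habana software stack (which provides `hl-smi`) was installed by the guides above.

```shell
# Confirm the habanalabs kernel module is loaded
lsmod | grep habana
# List the Gaudi devices with their driver and firmware versions
hl-smi
```
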
## Install Docker

```shell
sudo apt update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin
sudo systemctl enable docker.service
sudo systemctl daemon-reload
sudo systemctl start docker
```

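A quick smoke test (a sketch, not part of the original steps) confirms that both the Docker engine and the compose plugin installed correctly before moving on:

```shell
# Engine check: pulls and runs a tiny test container
sudo docker run --rm hello-world
# Compose plugin check: should print the plugin version
sudo docker compose version
```
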
### Troubleshooting Docker Installation

1. Issue: `E: Unable to locate package docker-compose-plugin`

**Solution:**
```shell
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-compose-plugin
```

2. Issue: permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.45/containers/json": dial unix /var/run/docker.sock: connect: permission denied

**Solution:**
```shell
# option 1: add the user (xxx) to the docker group, then log out and back in
sudo usermod -a -G docker xxx
# option 2: open up permissions on the docker socket
sudo chmod 666 /var/run/docker.sock
```

3. Issue: the open file limit (`ulimit -n`) for containerd is too low. [optional]

**Solution:**
```shell
sudo mkdir -p /etc/systemd/system/containerd.service.d
cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/override.conf
[Service]
LimitNOFILE=infinity
EOF
sudo systemctl daemon-reload
sudo systemctl restart containerd.service
```

4. Issue: the maximum number of memory-mapped areas a process can have (`vm.max_map_count`) is too low. [optional]

**Solution:**
```shell
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
sysctl vm.max_map_count  # check the new value
```

## Install Conda

Conda is used to set up the e2e test environments.
```shell
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
```

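As an illustration of how an e2e test environment might then be created (a sketch only; the environment name and Python version are placeholders, not taken from the CI/CD scripts):

```shell
# Create and activate an isolated environment for the e2e tests
conda create -n opea-e2e python=3.11 -y
conda activate opea-e2e
# Install whatever test requirements the target example defines, e.g.:
# pip install -r requirements.txt
```
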
## Install K8S

1. Use kubeadm to set up the K8S cluster.
https://github.com/opea-project/docs/blob/main/guide/installation/k8s_install/k8s_install_kubeadm.md
2. Install the Habana plugins (Gaudi Only).
https://docs.habana.ai/en/latest/Installation_Guide/Additional_Installation/Kubernetes_Installation/Intel_Gaudi_Kubernetes_Device_Plugin.html

### Some Test Commands after Installation
```shell
kubectl get nodes -o wide
kubectl get pods -A
kubectl get cs
kubectl describe node <node_name>
kubectl describe pod <pod_name>
```
Test for Gaudi:
```shell
cat <<EOF | tee test.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: habanalabs-gaudi-demo
spec:
  template:
    spec:
      hostIPC: true
      restartPolicy: OnFailure
      containers:
        - name: habana-ai-base-container
          image: vault.habana.ai/gaudi-docker/1.21.1/ubuntu24.04/habanalabs/pytorch-installer-2.6.0:latest
          workingDir: /root
          command: ["hl-smi"]
          securityContext:
            capabilities:
              add: ["SYS_NICE"]
          resources:
            limits:
              habana.ai/gaudi: 1
EOF

kubectl apply -f test.yaml
kubectl delete -f test.yaml
```

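In addition to the job above, you can confirm that the Gaudi device plugin installed in step 2 has registered `habana.ai/gaudi` resources on the node. A sketch; `<node_name>` is whichever node carries the Gaudi cards:

```shell
# The node's capacity/allocatable sections should list habana.ai/gaudi
kubectl describe node <node_name> | grep -A2 -i "habana.ai/gaudi"
```
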
## Setup Image Registry for K8S Test

1. Create a Docker image registry.
```shell
cat << EOF | tee registry.yaml
version: 0.1
log:
  fields:
    service: registry
storage:
  cache:
    blobdescriptor: inmemory
  filesystem:
    rootdirectory: /var/lib/registry
  delete:
    enabled: true
http:
  addr: :5000
  headers:
    X-Content-Type-Options: [nosniff]
health:
  storagedriver:
    enabled: true
    interval: 10s
    threshold: 3
EOF

cd /scratch-1  # place to store the images
mkdir local_image_registry && chmod -R 777 local_image_registry
docker run -d -p 5000:5000 --restart=always --name registry -v /home/sdp/workspace/registry.yaml:/etc/docker/registry/config.yml -v /scratch-1/local_image_registry:/var/lib/registry registry:2
```
2. Set up a cron job to clean up the Docker registry periodically, for example with this script:
https://github.com/opea-project/Validation/blob/main/tools/image-registry/cleanup.sh
3. Set up the connection to the local registry.

For Docker:
```shell
cat /etc/docker/daemon.json
# gaudi:
{"runtimes": {"habana": {"path": "/usr/bin/habana-container-runtime", "runtimeArgs": []}}, "default-runtime": "habana", "insecure-registries" : [ "100.83.111.232:5000" ]}
# xeon:
{"insecure-registries": ["100.83.111.232:5000"]}

# restart docker
sudo systemctl restart docker

# for test
docker pull opea/chatqna:latest
docker tag opea/chatqna:latest 100.83.111.232:5000/opea/chatqna:test
docker push 100.83.111.232:5000/opea/chatqna:test
```
For K8S:
```shell
# setup on the client side
cat /etc/containerd/config.toml
...
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
    endpoint = ["https://registry-1.docker.io"]
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."100.83.111.232:5000"]
    endpoint = ["http://100.83.111.232:5000"]
...
# restart containerd
sudo systemctl restart containerd.service

# setup on the server side
cd /etc/containerd
sudo mkdir -p certs.d/100.83.111.232:5000
cd certs.d/100.83.111.232:5000
cat << EOF | sudo tee hosts.toml
server = "http://100.83.111.232:5000"

[host."http://100.83.111.232:5000"]
  capabilities = ["pull", "resolve", "push"]
EOF

# restart containerd
sudo systemctl restart containerd.service

# for test
docker pull opea/chatqna:latest
docker tag opea/chatqna:latest 100.83.111.232:5000/opea/chatqna:test
docker push 100.83.111.232:5000/opea/chatqna:test
sudo nerdctl -n k8s.io pull 100.83.111.232:5000/opea/chatqna:test
```
4. Set up the ENV for CI/CD.
```shell
vi .bashrc
export OPEA_IMAGE_REPO=100.83.111.232:5000/
```
5. Build and push the images to the new local registry, as sketched below.

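For step 5, the flow mirrors the test commands above: build (or pull) an image, tag it with the registry prefix, and push. A minimal sketch, reusing the `opea/chatqna` image from the earlier test as an example; the Dockerfile path is only illustrative:

```shell
# Build an image locally (example Dockerfile path)
docker build -t opea/chatqna:latest -f Dockerfile .
# Tag and push it into the local registry used by CI/CD
docker tag opea/chatqna:latest 100.83.111.232:5000/opea/chatqna:latest
docker push 100.83.111.232:5000/opea/chatqna:latest
```
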
## Setup GHA ENV for CI/CD

1. Set up a self-hosted runner for GHA, following the official steps (see the sketch after this list).
2. Set up the ENV for GHA.
```shell
vi ~/action_runner/.env
OPEA_IMAGE_REPO=100.83.111.232:5000/
```
3. Start the runner as a service with `svc.sh`.
```shell
sudo ./svc.sh install  # use svc.sh instead of run.sh
sudo ./svc.sh start
sudo ./svc.sh status
sudo ./svc.sh stop
```

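For step 1, registration typically looks like the following sketch. It assumes the official actions-runner package has been unpacked into `~/action_runner`, and `<REG_TOKEN>` stands for the registration token shown on the GitHub "New self-hosted runner" page (not a value from this guide):

```shell
cd ~/action_runner
# Register the runner against the organization
./config.sh --url https://github.com/opea-project --token <REG_TOKEN>
```
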
## Setup Action Runner Controller (ARC)

https://docs.github.com/en/actions/tutorials/quickstart-for-actions-runner-controller

For now, ARC is only supported on the Xeon K8S cluster.
1. Install the ARC.

Make sure K8S and Helm are installed on your test machine.
```shell
NAMESPACE="opea-arc-systems"
helm install arc \
    --namespace "${NAMESPACE}" \
    --create-namespace \
    oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller

# to uninstall the controller (if needed)
helm uninstall arc -n $NAMESPACE
kubectl delete namespace $NAMESPACE --grace-period=0 --force
```
2. Install a runner scale set.

The runner image used in CI/CD is built from this [dockerfile](https://github.com/opea-project/Validation/blob/main/tools/actions-runner-controller/xeon.dockerfile),
and the config settings for the runner scale set can be found [here](https://github.com/opea-project/Validation/blob/main/tools/actions-runner-controller/xeon.yaml).
```shell
RUNNER_SET_NAME="xeon"
RUNNERS_NAMESPACE="opea-runner-set-c1"
RUNNER_GROUP="opea-runner-set-1"  # before using this name, make sure the group has been created in GHA.
GITHUB_CONFIG_URL="https://github.com/opea-project"
GITHUB_PAT="xxx"  # the personal access token for GHA, which has the permission to create runners in the repo.
helm install "${RUNNER_SET_NAME}" \
    --namespace "${RUNNERS_NAMESPACE}" \
    --create-namespace \
    -f xeon_large.yaml \
    --set githubConfigUrl="${GITHUB_CONFIG_URL}" \
    --set githubConfigSecret.github_token="${GITHUB_PAT}" \
    --set runnerGroup="${RUNNER_GROUP}" \
    oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set
```
**Notes:**
a. Make sure the nodes in the cluster have enough resources to run the runner pods.
b. Create the special `RUNNER_GROUP` in GHA beforehand; it is used to group the runners.
c. Make sure you have set up the label for `nodeSelector`, e.g. `kubectl label nodes opea-cicd-spr-0 runner-node=true`, and use `kubectl get nodes --show-labels` to check the labels.
d. Make sure you have `/data2` available for the model cache.

3. Clean up the ARC (if needed).
```shell
# clean up the runner set
(
RUNNER_SET_NAME="xeon"
RUNNERS_NAMESPACE="opea-runner-set-c1"
helm uninstall $RUNNER_SET_NAME -n $RUNNERS_NAMESPACE
kubectl delete namespace $RUNNERS_NAMESPACE --grace-period=0 --force
)
# clean up the ARC controller
(
NAMESPACE="opea-arc-systems"
helm uninstall arc -n $NAMESPACE
kubectl delete namespace $NAMESPACE --grace-period=0 --force
)
```
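Once the controller and the runner scale set are installed, a quick check like this sketch (using the namespaces from the commands above) confirms the listener and runner pods come up and the runners appear under the runner group in GHA:

```shell
# Controller pods
kubectl get pods -n opea-arc-systems
# Runner scale set listener and ephemeral runner pods
kubectl get pods -n opea-runner-set-c1
```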
