
Commit 3d6d9e1

docs: update apiserver HACluster docs

Signed-off-by: machichima <[email protected]>

1 parent f1462dc

File tree: 4 files changed, +164 −85 lines changed

apiserver/Autoscaling.md

Lines changed: 5 additions & 10 deletions

@@ -145,18 +145,13 @@ trigger a scale-up and launch a new worker pod.

### Validate that RayCluster is deployed correctly

Run the following command to list the running pods. You should see something like below:

```sh
kubectl get pods
# NAME                                READY   STATUS    RESTARTS   AGE
# kuberay-operator-545586d46c-f9grr   1/1     Running   0          49m
# test-cluster-head                   2/2     Running   0          3m1s
```

Note that there is no worker for `test-cluster` as we set its initial replicas to 0. You

apiserver/HACluster.md

Lines changed: 159 additions & 75 deletions

@@ -1,57 +1,101 @@
# Creating HA cluster with API Server

One of the issues with long-running Ray applications (e.g. Ray Serve) is that if the Ray
head node dies, the whole cluster has to be restarted. Fortunately, KubeRay solves this
by introducing the [Fault Tolerance Ray Cluster](https://docs.ray.io/en/master/cluster/kubernetes/user-guides/kuberay-gcs-ft.html).

A similarly highly available RayCluster can also be created with the API server. The
foundation of this approach is making the Global Control Service (GCS) data highly
available. The GCS manages cluster-level metadata and, by default, stores all of it in
memory, so it lacks fault tolerance: a single failure can cause the entire RayCluster to
fail. To enable GCS fault tolerance, we need a highly available Redis, so that when the
GCS restarts it can resume its state by retrieving the previous data from the Redis
instance.

We will provide a detailed example of how to create such a highly available cluster.

## Setup

### Setup Ray Operator and API Server

Refer to the [README](README.md) for setting up the KubeRay operator and API server.

Alternatively, you can build and deploy the operator and API server from the local repo
for development purposes:

```shell
make start-local-apiserver deploy
```
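Before moving on, it is worth confirming that both components are up. A quick check; the
exact pod names depend on the install method, and the `kuberay-apiserver` name here is an
assumption:

```sh
# The operator and API server pods should both be Running.
kubectl get pods -A | grep -E 'kuberay-(operator|apiserver)'
```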
## Example

Before going through the example, remove any running RayClusters to ensure a successful
run-through of the steps below:

```sh
kubectl delete raycluster --all
```
### Create external Redis cluster

Comprehensive documentation on creating a Redis cluster on Kubernetes can be found
[here](https://www.dragonflydb.io/guides/redis-kubernetes). For this example we will use
a rather simple [Redis YAML file][RedisYAML]. Simply download it and run:

```sh
kubectl create ns redis
kubectl apply -f redis.yaml -n redis
```

Note that we create a new `redis` namespace and deploy Redis in it.

Alternatively, if you run in the cloud you can use a managed version of HA Redis, which
does not require you to stand up, run, manage, and monitor your own Redis.

Check that Redis is set up successfully with the following command. You should see
`redis-config` in the list:

```sh
kubectl get configmaps -n redis

# NAME           DATA   AGE
# redis-config   1      19s
```
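Optionally, you can also check that Redis itself answers. A minimal sketch, assuming the
pod created by [RedisYAML] carries the label `app=redis` and that the password is in
`$REDIS_PASSWORD` (both are assumptions; adjust to the YAML you applied):

```sh
# Ping Redis from inside its own pod; a healthy server replies "PONG".
REDIS_POD=$(kubectl get pods -n redis -l app=redis \
  -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n redis "$REDIS_POD" -- redis-cli -a "$REDIS_PASSWORD" ping
```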
### Create Redis password secret

Before creating your cluster, you also need to create a secret in the namespace where you
are planning to create your Ray cluster (remember that a secret is visible only within a
given namespace). To create a secret for using the external Redis, download the [Secret]
YAML and run the following command:

```sh
kubectl apply -f redis_passwrd.yaml
```
40-
## Ray Code for testing
75+
### Install ConfigMap
76+
77+
We will use this [ConfigMap] which contains code for our example. For the real world
78+
cases, it is recommended to pack user code in an image.
4179

42-
For both Ray Jobs and Ray Serve we recommend packaging user code in the image. For a simple testing here
43-
we will create a [config map](test/cluster/code_configmap.yaml), containing simple code, that we will use for
44-
testing. To deploy it run the following:
80+
Please download the config map and deploy it with following command:
4581

4682
```sh
47-
kubectl apply -f <your location>/kuberay/apiserver/test/cluster/code_configmap.yaml
83+
kubectl apply -f code_configmap.yaml
4884
```
4985
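If you want to see what code will be available in the cluster, you can inspect the
ConfigMap after applying it. A small sketch; `ray-example-code` is a placeholder, so use
the `metadata.name` from `code_configmap.yaml`:

```sh
# Placeholder name; use the metadata.name from code_configmap.yaml.
kubectl describe configmap ray-example-code
```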

### Create RayCluster

Use the following commands to create a compute template and a RayCluster:

```sh
curl -X POST 'localhost:31888/apis/v1/namespaces/default/compute_templates' \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "default-template",
    "namespace": "default",
    "cpu": 2,
    "memory": 4
  }'
curl -X POST 'localhost:31888/apis/v1/namespaces/default/clusters' \
  --header 'Content-Type: application/json' \
  --data '{
@@ -134,84 +178,124 @@
}'
```
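You can confirm that the API server accepted both requests by listing what it manages. A
quick sketch against the same endpoints; the responses should include the compute
template and the new cluster:

```sh
# The GET endpoints mirror the POST endpoints used above.
curl 'localhost:31888/apis/v1/namespaces/default/compute_templates'
curl 'localhost:31888/apis/v1/namespaces/default/clusters'
```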

To enable the RayCluster's GCS fault tolerance feature, we added the annotation:

```json
"annotations": {
  "ray.io/ft-enabled": "true"
}
```

To connect to Redis, we also added the following content in `rayStartParams` of the
`headGroupSpec`, which sets the Redis password and the number of CPUs. Setting `num-cpus`
to 0 ensures that no application code runs on the head node.

```json
"redis-password": "$REDIS_PASSWORD",
"num-cpus": "0"
```

Here, `$REDIS_PASSWORD` is defined in the `headGroupSpec`'s environment variables below
(`"source": 1` indicates the value comes from a Kubernetes Secret, in this case the
`redis-password-secret` created earlier):

```json
"environment": {
  "values": {
    "RAY_REDIS_ADDRESS": "redis.redis.svc.cluster.local:6379"
  },
  "valuesFrom": {
    "REDIS_PASSWORD": {
      "source": 1,
      "name": "redis-password-secret",
      "key": "password"
    }
  }
},
```
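Since the cluster is already up, you can verify that these settings reached the head pod.
A quick check (the head pod name is taken from the pod listing further below):

```sh
# Both variables should be set in the head container's environment.
kubectl exec ha-cluster-head -- printenv RAY_REDIS_ADDRESS REDIS_PASSWORD
```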

For the `workerGroupSpecs`, we set the `RAY_gcs_rpc_server_reconnect_timeout_s`
environment variable, which controls the GCS heartbeat timeout (60 seconds by default).
It determines how long after the head node dies the worker nodes are killed. Because
restarting the head node takes some time, we want this value to be large enough that the
worker nodes are not killed during the restart period.

```json
"environment": {
  "values": {
    "RAY_gcs_rpc_server_reconnect_timeout_s": "300"
  }
},
```

### Validate that RayCluster is deployed correctly

Run the following command to list the running pods. You should see one head and one
worker node, like below:

```sh
kubectl get pods
# NAME                               READY   STATUS    RESTARTS   AGE
# ha-cluster-head                    1/1     Running   0          2m36s
# ha-cluster-small-wg-worker-22lbx   1/1     Running   0          2m36s
```
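You can also look at the RayCluster custom resource itself; the name `ha-cluster` is
inferred from the pod names above:

```sh
# The RayCluster resource should exist and eventually report a ready state.
kubectl get raycluster ha-cluster
```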

### Create an Actor

Before we trigger the restoration, we need a way to validate that the GCS restore works
correctly. We will do this by creating a detached actor: if it still exists and functions
after the head node is deleted and restored, we can confirm that the GCS data was
restored correctly.

Run the following command to create a detached actor. Change `ha-cluster-head` to your
head node's name:

```sh
kubectl exec -it ha-cluster-head -- python3 /home/ray/samples/detached_actor.py
```
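Besides the dashboard (next step), you can check the actor from the command line,
assuming the image ships Ray's state CLI (`ray list`, available in recent Ray releases):

```sh
# The detached actor should appear with state ALIVE.
kubectl exec -it ha-cluster-head -- ray list actors
```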

Then, open a new terminal and use port-forwarding to make the Ray dashboard reachable at
`http://localhost:8265`:

```sh
kubectl port-forward pod/ha-cluster-head 8265:8265
```

In the dashboard, you can see 2 nodes (the head and the worker) in the Cluster pane:

![hacluster-dashboard-cluster](img/hacluster-dashboard-cluster.png)

If you go to the Actors pane, you can see the actor we created earlier:

![hacluster-dashboard-actor](img/hacluster-dashboard-actor.png)
### Trigger the GCS restore

To trigger the restoration, we can simply delete the head node pod:

```sh
kubectl delete pods ha-cluster-head
```
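You can watch the operator recreate the head pod in real time:

```sh
# Watch the old head pod terminate and a new one start (Ctrl-C to stop).
kubectl get pods --watch
```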

If you list the pods now, you can see that a new head node has been recreated:

```sh
kubectl get pods
# NAME                               READY   STATUS    RESTARTS   AGE
# ha-cluster-head                    0/1     Running   0          5s
# ha-cluster-small-wg-worker-tpgqs   1/1     Running   0          9m19s
```

Note that only the head node is recreated, while the worker node stays as is.

Port-forward again and access the dashboard through `http://localhost:8265`:

```sh
kubectl port-forward pod/ha-cluster-head 8265:8265
```

You can see one node marked as "DEAD" in the Cluster pane, while the actor in the Actors
pane is still running.
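When you are done, you can tear down everything this example created, using the same
files and names as above:

```sh
# Remove the RayCluster, sample code, Redis secret, and the Redis deployment.
kubectl delete raycluster --all
kubectl delete -f code_configmap.yaml
kubectl delete -f redis_passwrd.yaml
kubectl delete ns redis
```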
[RedisYAML]: test/cluster/redis/redis.yaml
[Secret]: test/cluster/redis/redis_passwrd.yaml
[ConfigMap]: test/cluster/code_configmap.yaml
apiserver/img/hacluster-dashboard-actor.png: 111 KB (binary file, not shown)
apiserver/img/hacluster-dashboard-cluster.png: 198 KB (binary file, not shown)
