# Creating HA cluster with API Server

One of the issues with long-running Ray applications (for example, Ray Serve) is that the Ray head node
is a single point of failure: if the head node dies, the whole cluster has to be restarted. Fortunately,
KubeRay provides an option to create a
[fault tolerant Ray cluster](https://docs.ray.io/en/master/cluster/kubernetes/user-guides/kuberay-gcs-ft.html).

A similar highly available RayCluster can also be created through the API server. The foundation of this
approach is making the Global Control Service (GCS) data highly available. The GCS manages
cluster-level metadata and, by default, stores all of it in memory, so it lacks fault tolerance: a
single failure can bring down the entire RayCluster. To enable GCS fault tolerance, you need a highly
available Redis, so that when the GCS restarts it can recover its state by retrieving the previously
stored data from the Redis instance.

This document walks through a detailed example of creating such a highly available cluster with the
API server.

## Setup

### Setup Ray Operator and API Server

Refer to the [README](README.md) for setting up the KubeRay operator and API server.

Alternatively, you can build and deploy the operator and API server from a local repository for
development purposes:

```shell
make start-local-apiserver deploy
```

## Example

Before going through the example, remove any running RayClusters to ensure that the steps below run
successfully:

```sh
kubectl delete raycluster --all
```

### Create external Redis cluster

Comprehensive documentation on creating a Redis cluster on Kubernetes can be found
[here](https://www.dragonflydb.io/guides/redis-kubernetes). For this example we will use a rather simple
[Redis YAML file][RedisYAML]. Download this YAML file and run:

```sh
kubectl create ns redis
kubectl apply -f redis.yaml -n redis
```

Note that we create a new `redis` namespace and deploy Redis into it.

Alternatively, if you run on the cloud you can use a managed version of HA Redis, which will not require
you to stand up, run, manage and monitor your own version of Redis.

Check that Redis is set up successfully with the following command. You should see
`redis-config` in the list:

```sh
kubectl get configmaps -n redis

# NAME           DATA   AGE
# redis-config   1      19s
```

### Create Redis password secret

Before creating your cluster, you also need to create a secret in the namespace where you are planning
to create your Ray cluster (remember that a secret is visible only within a given namespace). This
example expects a secret named `redis-password-secret` that stores the Redis password under the key
`password`. To create it, download the [Secret] and run the following command:

```sh
kubectl apply -f redis_passwrd.yaml
```

### Install ConfigMap

We will use this [ConfigMap], which contains the code for our example. For real-world use cases, we
recommend packaging user code in an image instead.

Download the config map and deploy it with the following command:

```sh
kubectl apply -f code_configmap.yaml
```

### Create RayCluster

Use the following commands to create a compute template and a RayCluster:

```sh
curl -X POST 'localhost:31888/apis/v1/namespaces/default/compute_templates' \
--header 'Content-Type: application/json' \
--data '{
  "name": "default-template",
  "namespace": "default",
  "cpu": 2,
  "memory": 4
}'
curl -X POST 'localhost:31888/apis/v1/namespaces/default/clusters' \
--header 'Content-Type: application/json' \
--data '{
  ...
}'
```

To enable the RayCluster's GCS fault tolerance feature, we added the following annotation to the
cluster payload:

```json
"annotations" : {
  "ray.io/ft-enabled": "true"
}
```

To connect to Redis, we also added the following to the `rayStartParams` of the `headGroupSpec`, which
sets the Redis password and the number of CPUs. Setting `num-cpus` to 0 ensures that no application
code runs on the head node.

```json
"redis-password": "$REDIS_PASSWORD",
"num-cpus": "0"
```

Here, `$REDIS_PASSWORD` comes from the `headGroupSpec`'s environment variables defined below:

```json
"environment": {
    "values": {
        "RAY_REDIS_ADDRESS": "redis.redis.svc.cluster.local:6379"
    },
    "valuesFrom": {
        "REDIS_PASSWORD": {
            "source": 1,
            "name": "redis-password-secret",
            "key": "password"
        }
    }
},
```

For the `workerGroupSpecs`, we increase the GCS heartbeat timeout (60 seconds by default) by setting
the `RAY_gcs_rpc_server_reconnect_timeout_s` environment variable. It determines how long after the
head node dies the worker nodes are killed. Because restarting the head node can take some time, we
want this value to be large enough that the worker nodes are not killed during the restart:

```json
"environment": {
    "values": {
        "RAY_gcs_rpc_server_reconnect_timeout_s": "300"
    }
},
```

### Validate that RayCluster is deployed correctly

Run the following command to list the running pods. You should see one head and one worker node, as
below:

```sh
kubectl get pods
# NAME                               READY   STATUS    RESTARTS   AGE
# ha-cluster-head                    1/1     Running   0          2m36s
# ha-cluster-small-wg-worker-22lbx   1/1     Running   0          2m36s
```

### Create an Actor

Before we trigger the restoration, we need a way to validate that the GCS restore works correctly. We
will validate this by creating a detached actor: if it still exists and functions after the head node
is deleted and restored, we can confirm that the GCS data was restored correctly.

Run the following command to create a detached actor, replacing `ha-cluster-head` with your head
node's name:

```sh
kubectl exec -it ha-cluster-head -- python3 /home/ray/samples/detached_actor.py
```
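
For reference, a detached actor script of this kind might look roughly like the sketch below. This is
only an illustration: the actual `detached_actor.py` shipped in the [ConfigMap] may differ, and the
actor name `counter_actor`, the `Counter` class, and the namespace are assumptions made here.

```python
import ray

# Connect to the existing Ray cluster on this node. A named namespace is used so that
# other drivers can look the actor up by name later (the namespace itself is an assumption).
ray.init(address="auto", namespace="ha-example")


@ray.remote
class Counter:
    """A trivial actor whose state lets us check that it survived the head node restart."""

    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count


# lifetime="detached" keeps the actor alive after this driver exits, so its metadata lives in
# the GCS (and, with fault tolerance enabled, in Redis).
counter = Counter.options(name="counter_actor", lifetime="detached").remote()

# Call the actor once so the script only exits after the actor is actually up and registered.
print(ray.get(counter.increment.remote()))
```

The important part is `lifetime="detached"`: a regular actor would be destroyed as soon as the driver
script exits, so it could not be used to verify anything after the head node restart.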

Then open a new terminal and use port-forwarding to access the Ray dashboard at
`http://localhost:8265`:

```sh
kubectl port-forward pod/ha-cluster-head 8265:8265
```

In the dashboard, you can see 2 nodes in the Cluster pane: the head and the worker. If you go to the
Actors pane, you can see the actor we created earlier.

### Trigger the GCS restore

To trigger the restoration, simply delete the head node pod:

```sh
kubectl delete pods ha-cluster-head
```

If you list the pods now, you can see that a new head node has been recreated:

```sh
kubectl get pods
# NAME                               READY   STATUS    RESTARTS   AGE
# ha-cluster-head                    0/1     Running   0          5s
# ha-cluster-small-wg-worker-tpgqs   1/1     Running   0          9m19s
```

Note that only the head node is recreated by the operator, while the worker node stays as is.

Port-forward again and access the dashboard at `http://localhost:8265`:

```sh
kubectl port-forward pod/ha-cluster-head 8265:8265
```

In the Cluster pane you can see the old head pod marked as "DEAD", while the actor in the Actors pane
is still running.
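
Besides checking the dashboard, you can also call the detached actor again from the recreated head
pod (for example with `kubectl exec -it ha-cluster-head -- python3 <script>`) to confirm that it is
still functional. A sketch of such a script is shown below; it assumes the actor was created with the
name `counter_actor` in the `ha-example` namespace, as in the earlier sketch.

```python
import ray

# Connect to the cluster and to the namespace the detached actor was created in.
ray.init(address="auto", namespace="ha-example")

# Look up the detached actor by name; this only succeeds if its metadata was restored by the GCS.
counter = ray.get_actor("counter_actor")

# Invoke the actor to confirm it is not just listed in the dashboard but actually serving calls.
print(ray.get(counter.increment.remote()))
```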

[RedisYAML]: test/cluster/redis/redis.yaml
[Secret]: test/cluster/redis/redis_passwrd.yaml
[ConfigMap]: test/cluster/code_configmap.yaml