docs: update cluster configuration doc and add new ray cluster interaction doc

Signed-off-by: Bobbins228 <[email protected]>
Bobbins228 committed Aug 27, 2024
1 parent 2f81c45 commit e5e8b9b
Showing 2 changed files with 73 additions and 3 deletions.
5 changes: 2 additions & 3 deletions docs/cluster-configuration.md
@@ -18,13 +18,12 @@ cluster = Cluster(ClusterConfiguration(
worker_cpu_limits=1, # Default 1
worker_memory_requests=2, # Default 2
worker_memory_limits=2, # Default 2
# image="", # Optional Field
machine_types=["m5.xlarge", "g4dn.xlarge"],
# image="", # Default quay.io/rhoai/ray:2.23.0-py39-cu121
labels={"exampleLabel": "example", "secondLabel": "example"},
))
```
Note: 'quay.io/rhoai/ray:2.23.0-py39-cu121' is the default community image used by the CodeFlare SDK for creating a RayCluster resource. If you have your own Ray image that suits your purposes, specify it in the `image` field to override the default image.

The `labels={"exampleLabel": "example"}` parameter can be used to apply additional labels to the RayCluster resource.

After creating their `cluster`, a user can call `cluster.up()` and `cluster.down()` to respectively create or remove the Ray Cluster.
For detailed instructions on the methods available for interacting with Ray Clusters, see [Ray Cluster Interaction](./ray_cluster_interaction.md).
71 changes: 71 additions & 0 deletions docs/ray_cluster_interaction.md
@@ -0,0 +1,71 @@
# Ray Cluster Interaction

The CodeFlare SDK offers several ways to interact with Ray Clusters, including the methods below.

## get_cluster()
The `get_cluster()` function initialises a `Cluster` object from a pre-existing Ray Cluster/AppWrapper. <br>
Below is an example of its usage:
```
from codeflare_sdk import get_cluster

# Load an existing Ray Cluster into a Cluster object
cluster = get_cluster(cluster_name="raytest", namespace="example", is_appwrapper=False, write_to_file=False)
# -> output: Yaml resources loaded for raytest
cluster.status()
# -> output:
# 🚀 CodeFlare Cluster Status 🚀
# ╭─────────────────────────────────────────────────────────────────╮
# │ Name │
# │ raytest Active ✅ │
# │ │
# │ URI: ray://raytest-head-svc.example.svc:10001 │
# │ │
# │ Dashboard🔗 │
# │ │
# ╰─────────────────────────────────────────────────────────────────╯
# (<CodeFlareClusterStatus.READY: 1>, True)
cluster.down()
cluster.up() # This method will create an exact copy of the retrieved Ray Cluster only if the Ray Cluster has been previously deleted.
```

The `get_cluster()` function accepts the following parameters:
* `cluster_name: str # Required` -> The name of the Ray Cluster.
* `namespace: str # Default: "default"` -> The namespace of the Ray Cluster.
* `is_appwrapper: bool # Default: False` -> If `True`, the function attempts to retrieve an AppWrapper instead of a Ray Cluster.
* `write_to_file: bool # Default: False` -> If `True`, the Ray Cluster/AppWrapper is written to a file, similar to how it is done in `ClusterConfiguration`.

## list_all_queued()
The `list_all_queued()` function returns (and prints by default) a list of all currently queued Ray Clusters in a given namespace.
It accepts the following parameters:
* `namespace: str # Required` -> The namespace you want to retrieve the list from.
* `print_to_console: bool # Default: True` -> Allows the user to print the list to their console.
* `appwrapper: bool # Default: False` -> Allows the user to list queued AppWrappers.



## list_all_clusters()
The `list_all_clusters()` function returns a list of detailed descriptions of the Ray Clusters in a given namespace and, by default, prints it to the console. It accepts the following parameters:
* `namespace: str # Required` -> The namespace you want to retrieve the list from.
* `print_to_console: bool # Default: True` -> A boolean that allows the user to print the list to their console.
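
As a hedged illustration of calling a listing function with `print_to_console=False` and using its return value, the sketch below extracts cluster names from the returned list. The helper `cluster_names` and the stand-in `fake_list_all_clusters` are hypothetical and exist only so the example runs without a Kubernetes cluster. It also assumes each returned item exposes a `name` attribute, which this document does not confirm.

```python
from types import SimpleNamespace
from typing import Callable, List


def cluster_names(list_fn: Callable[..., list], namespace: str) -> List[str]:
    """Call a listing function quietly and return the names of its results."""
    # print_to_console=False suppresses the console output and keeps only the return value.
    clusters = list_fn(namespace, print_to_console=False)
    # Assumption: every returned item has a `name` attribute.
    return [cluster.name for cluster in clusters]


def fake_list_all_clusters(namespace: str, print_to_console: bool = True) -> list:
    """Hypothetical stand-in with the same parameters as `list_all_clusters()`."""
    return [SimpleNamespace(name="raytest"), SimpleNamespace(name="raytest-2")]


names = cluster_names(fake_list_all_clusters, "example")
print(names)
```

With the SDK installed and a cluster available, the real `list_all_clusters` function could be passed in place of the stand-in.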

<br>
NOTE: The following methods require an initialised `Cluster` object; see [Cluster Configuration](./cluster-configuration.md).

## cluster.up()
The `cluster.up()` method creates a Ray Cluster in the given namespace.

## cluster.down()
The `cluster.down()` method deletes the Ray Cluster in the given namespace.

## cluster.status()
The `cluster.status()` method prints the Ray Cluster's status along with a link to the Ray Dashboard.

## cluster.details()
The `cluster.details()` method prints a detailed description of the Ray Cluster's status and worker resources, along with a link to the Ray Dashboard.
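
The methods in this section are often called in sequence. The sketch below shows one possible ordering, with `down()` placed in a `finally` block so the cluster is removed even if the work fails. The helper `run_on_cluster` and the `FakeCluster` class are hypothetical; the fake only records calls so the example runs without a real cluster, and any object providing `up()`, `wait_ready()`, `details()` and `down()` could be passed instead.

```python
def run_on_cluster(cluster, work, timeout=300):
    """Bring a cluster up, run `work(cluster)`, and always tear the cluster down."""
    cluster.up()
    try:
        cluster.wait_ready(timeout=timeout)  # block until the cluster reports ready
        cluster.details()                    # print status and worker resources
        return work(cluster)
    finally:
        cluster.down()                       # runs even if wait_ready() or work() raises


class FakeCluster:
    """Hypothetical stand-in that records which methods were called, in order."""

    def __init__(self):
        self.calls = []

    def up(self):
        self.calls.append("up")

    def wait_ready(self, timeout=None, dashboard_check=True):
        self.calls.append("wait_ready")

    def details(self):
        self.calls.append("details")

    def down(self):
        self.calls.append("down")


fake = FakeCluster()
result = run_on_cluster(fake, lambda c: "done")
print(fake.calls, result)
```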

## cluster.wait_ready()
The `cluster.wait_ready()` method waits for the requested cluster to be ready, up to an optional timeout, checking every 5 seconds. It accepts the following parameters:
* `timeout: Optional[int] # Default: None` -> Allows the user to define a timeout for the `wait_ready()` method.
* `dashboard_check: bool # Default: True` -> If enabled, the `wait_ready()` method also waits until the Ray Dashboard is ready.
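
The poll-with-timeout behaviour described above can be sketched in plain Python. This is not the SDK's implementation; `wait_until_ready`, its `is_ready` callback and the injectable `sleep` argument are hypothetical names chosen for illustration, and raising `TimeoutError` when the timeout is reached is an assumption of this sketch.

```python
import time
from typing import Callable, Optional


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout: Optional[float] = None,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll `is_ready` every `interval` seconds; return the time spent waiting."""
    waited = 0.0
    while not is_ready():
        # With timeout=None the loop keeps polling indefinitely.
        if timeout is not None and waited >= timeout:
            raise TimeoutError(f"not ready after waiting {waited}s")
        sleep(interval)
        waited += interval
    return waited


# Simulated status checks: not ready twice, then ready. The no-op sleep keeps the demo fast.
states = iter([False, False, True])
waited = wait_until_ready(lambda: next(states), timeout=60, sleep=lambda _seconds: None)
print(waited)
```

Passing the sleep function as an argument is a design choice that lets the loop be exercised without real delays.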
