Refactor Ray Cluster/AppWrapper creation #650

Closed
5 changes: 2 additions & 3 deletions docs/cluster-configuration.md
@@ -18,13 +18,12 @@ cluster = Cluster(ClusterConfiguration(
worker_cpu_limits=1, # Default 1
worker_memory_requests=2, # Default 2
worker_memory_limits=2, # Default 2
# image="", # Optional Field
machine_types=["m5.xlarge", "g4dn.xlarge"],
# image="", # Default quay.io/rhoai/ray:2.23.0-py39-cu121
labels={"exampleLabel": "example", "secondLabel": "example"},
))
```
Note: `quay.io/rhoai/ray:2.23.0-py39-cu121` is the default community image used by the CodeFlare SDK for creating a RayCluster resource. If you have your own Ray image that suits your purposes, specify it in the `image` field to override the default image.

The `labels={"exampleLabel": "example"}` parameter can be used to apply additional labels to the RayCluster resource.

After creating their `cluster`, a user can call `cluster.up()` and `cluster.down()` to respectively create or remove the Ray Cluster.
For detailed instructions on the methods available for interacting with Ray Clusters, see [Ray Cluster Interaction](./ray_cluster_interaction.md).
71 changes: 71 additions & 0 deletions docs/ray_cluster_interaction.md
@@ -0,0 +1,71 @@
# Ray Cluster Interaction

The CodeFlare SDK offers multiple ways to interact with Ray Clusters including the below methods.

## get_cluster()
The `get_cluster()` function is used to initialise a `Cluster` object from a pre-existing Ray Cluster/AppWrapper. <br>
Below is an example of its usage:
```
from codeflare_sdk import get_cluster

# Build a Cluster object from the existing "raytest" Ray Cluster.
cluster = get_cluster(cluster_name="raytest", namespace="example", is_appwrapper=False, write_to_file=False)
# -> output: Yaml resources loaded for raytest

# Print the cluster status; the call also returns a (status, ready) tuple.
cluster.status()
# -> output:
# 🚀 CodeFlare Cluster Status 🚀
#
# ╭─────────────────────────────────────────────────────────────────╮
# │ Name │
# │ raytest Active ✅ │
# │ │
# │ URI: ray://raytest-head-svc.example.svc:10001 │
# │ │
# │ Dashboard🔗 │
# │ │
# ╰─────────────────────────────────────────────────────────────────╯
# (<CodeFlareClusterStatus.READY: 1>, True)

cluster.down()

cluster.up() # This function will create an exact copy of the retrieved Ray Cluster only if the Ray Cluster has been previously deleted.
```

These are the parameters the `get_cluster()` function accepts:
* `cluster_name: str # Required` -> The name of the Ray Cluster.
* `namespace: str # Default: "default"` -> The namespace of the Ray Cluster.
* `is_appwrapper: bool # Default: False` -> When set to `True` the function will attempt to retrieve an AppWrapper instead of a Ray Cluster.
* `write_to_file: bool # Default: False` -> When set to `True` the Ray Cluster/AppWrapper will be written to a file similar to how it is done in `ClusterConfiguration`.
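A minimal sketch of how the parameters above might be passed, relying on the documented defaults. The helper name `load_existing_cluster` and its `get_cluster_fn` argument are hypothetical (they are not part of the SDK); the argument only exists so the helper can be tried without a live cluster.

```python
def load_existing_cluster(cluster_name, namespace="default",
                          is_appwrapper=False, get_cluster_fn=None):
    """Return a Cluster object for a Ray Cluster/AppWrapper that already exists.

    get_cluster_fn is a hypothetical injection point, not an SDK parameter.
    """
    if get_cluster_fn is None:
        # Imported lazily so this module still loads where the SDK is absent.
        from codeflare_sdk import get_cluster as get_cluster_fn
    return get_cluster_fn(
        cluster_name=cluster_name,
        namespace=namespace,
        is_appwrapper=is_appwrapper,
        write_to_file=False,  # do not write the retrieved resource to a file
    )
```

With the SDK installed and a cluster running, `load_existing_cluster("raytest", namespace="example")` would be equivalent to the `get_cluster()` call shown in the example above.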

## list_all_queued()
The `list_all_queued()` function returns (and prints by default) a list of all currently queued-up Ray Clusters in a given namespace.
It accepts the following parameters:
* `namespace: str # Required` -> The namespace you want to retrieve the list from.
* `print_to_console: bool # Default: True` -> Allows the user to print the list to their console.
* `appwrapper: bool # Default: False` -> When set to `True` allows the user to list queued AppWrappers.
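As a rough illustration, the returned list can be used without printing anything by passing `print_to_console=False`. This sketch assumes `list_all_queued` is importable from the top-level `codeflare_sdk` package in the same way as `get_cluster`; the helper `count_queued` and its `list_fn` argument are hypothetical.

```python
def count_queued(namespace, appwrapper=False, list_fn=None):
    """Return how many Ray Clusters (or AppWrappers) are queued in a namespace.

    list_fn is a hypothetical injection point, not an SDK parameter.
    """
    if list_fn is None:
        # Assumed import path; adjust if your SDK version exposes it elsewhere.
        from codeflare_sdk import list_all_queued as list_fn
    queued = list_fn(namespace, print_to_console=False, appwrapper=appwrapper)
    return len(queued)
```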



## list_all_clusters()
The `list_all_clusters()` function returns a list of detailed descriptions of the Ray Clusters in a given namespace and, by default, prints them to the console. It accepts the following parameters:
* `namespace: str # Required` -> The namespace you want to retrieve the list from.
* `print_to_console: bool # Default: True` -> A boolean that allows the user to print the list to their console.
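Because `namespace` is required, listing clusters in several namespaces takes one call per namespace. A small sketch under that assumption follows; `clusters_by_namespace` and `list_fn` are hypothetical names, and the top-level import path for `list_all_clusters` is assumed rather than confirmed by this document.

```python
def clusters_by_namespace(namespaces, list_fn=None):
    """Map each namespace to the list returned by list_all_clusters().

    list_fn is a hypothetical injection point, not an SDK parameter.
    """
    if list_fn is None:
        # Assumed import path; adjust if your SDK version exposes it elsewhere.
        from codeflare_sdk import list_all_clusters as list_fn
    # Suppress console output and keep only the returned descriptions.
    return {ns: list_fn(ns, print_to_console=False) for ns in namespaces}
```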

<br>
NOTE: The following methods require an initialised `Cluster` object; see [Cluster Configuration](./cluster-configuration.md).

## cluster.up()
The `cluster.up()` function creates a Ray Cluster in the given namespace.

## cluster.down()
The `cluster.down()` function deletes the Ray Cluster in the given namespace.

## cluster.status()
The `cluster.status()` function prints the Ray Cluster's status along with a link to the Ray Dashboard.
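The `get_cluster()` example output above ends with `(<CodeFlareClusterStatus.READY: 1>, True)`, which suggests that `cluster.status()` also returns a `(status, ready)` tuple. Assuming that shape, a sketch for turning it into a short string could look like this; `summarize_status` is a hypothetical helper.

```python
def summarize_status(cluster):
    """Return a one-line summary built from the (status, ready) tuple."""
    # Assumes cluster.status() returns (status_enum, ready_bool) as in the
    # example output; the call also prints the status panel as a side effect.
    status, ready = cluster.status()
    label = getattr(status, "name", str(status))  # enum member name if available
    return f"{label} (ready={ready})"
```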

## cluster.details()
The `cluster.details()` function prints out a detailed description of the Ray Cluster's status, worker resources and a link to the Ray Dashboard.

## cluster.wait_ready()
The `cluster.wait_ready()` function waits for the requested cluster to be ready, checking every 5 seconds, until an optional timeout is reached. It accepts the following parameters:
* `timeout: Optional[int] # Default: None` -> Allows the user to define a timeout for the `wait_ready()` function.
* `dashboard_check: bool # Default: True` -> If enabled the `wait_ready()` function will wait until the Ray Dashboard is ready too.
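The methods in this section are typically used together: create the cluster, wait for it, do some work, then remove it. The following is only a sketch of that pattern; `run_on_cluster` and the `work` callback are hypothetical, and how `wait_ready()` signals a timeout is not stated here, so the `finally` block simply cleans up on any exception.

```python
def run_on_cluster(cluster, work, timeout=600):
    """Bring a cluster up, wait until it is ready, run work(cluster), then tear down.

    cluster is any object exposing up(), wait_ready() and down(),
    such as a codeflare_sdk Cluster.
    """
    cluster.up()
    try:
        # Block until the cluster (and its dashboard) is ready or the timeout elapses.
        cluster.wait_ready(timeout=timeout, dashboard_check=True)
        return work(cluster)
    finally:
        # Always remove the Ray Cluster, even if waiting or the work itself fails.
        cluster.down()
```

Wrapping the teardown in `finally` is a design choice to avoid leaving a cluster running after a failure; drop it if the cluster should outlive the call.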