proposal: autodiscovery of cluster spec files #61

Open
alisterburt opened this issue Mar 2, 2023 · 6 comments
Labels: enhancement, help wanted

Comments

@alisterburt

I recently picked the thread up from dask/dask-jobqueue#543 and dask/dask-jobqueue#544 and was super happy to find that arbitrary cluster configuration from a yaml spec is working really well with dask-ctl on the first try now, great work!

What do you think about having dask_ctl.create_cluster() autodiscover a yaml spec if one isn't provided? Do you see where this might fit into the existing dask config structure?

Thanks!

@jacobtomlinson
Contributor

arbitrary cluster configuration from a yaml spec is working really well with dask-ctl

Yay!

I could imagine adding a config option here that could be pointed to a YAML spec. That way it could be configured either in the Dask YAML config or as an environment variable.
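A rough sketch of what that lookup could look like, assuming a config key named `ctl.default-spec-file` (the key name is invented here). Dask's real config layer merges YAML files and `DASK_CTL__*` environment variables itself; this standalone version just illustrates the precedence: explicit argument, then environment variable, then YAML config.

```python
import os

def resolve_spec_path(explicit=None, yaml_config=None):
    """Hypothetical resolution order for a dask-ctl spec file path."""
    if explicit is not None:
        return explicit
    # Dask maps DASK_CTL__DEFAULT_SPEC_FILE onto ctl.default-spec-file
    env = os.environ.get("DASK_CTL__DEFAULT_SPEC_FILE")
    if env:
        return env
    # finally, fall back to whatever the YAML config provides
    return (yaml_config or {}).get("default-spec-file")
```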

@jacobtomlinson added the enhancement and help wanted labels on Mar 7, 2023
@jacobtomlinson
Contributor

Thinking about this more: if you create a bare client with distributed and provide no config at all, it instantiates a LocalCluster automagically.

>>> from dask.distributed import Client
>>> client = Client()  # Leave all config as defaults
>>> type(client.cluster)
<class 'distributed.deploy.local.LocalCluster'>

I wonder if we could add a hook there that dask-ctl could plug into and change the behaviour via an entrypoint or something.

So if you have dask-ctl installed and set DASK_CTL__DEFAULT_SPEC_FILE="/path/to/spec.yaml" pointing to a spec that uses dask_jobqueue.SLURMCluster, then creating a bare Client would instantiate the SLURMCluster via dask_ctl.create_cluster() instead of a LocalCluster.

@fjetter what do you think about that idea?

@fjetter

fjetter commented Mar 7, 2023

This reminds me slightly of dask/distributed#6792, which, in part, also discusses how to manage hooks to cluster instances and how to simplify the UX around this.

The path-to-spec bit is surely different, but the hook-to-implementation thing sounds similar. Does it make sense to push on that ticket first?

Generally speaking, I like the suggestion to simplify the user API by "hiding" clusters and offering only clients as a user-facing API, but I don't have a very strong opinion either way.

I wonder if we could add a hook there that dask-ctl could plug into and change the behaviour via an entrypoint or something.

If a hook is all you need, we can add a hook. I'm not too familiar with dask-ctl, but the need to change the "default cluster" has been mentioned frequently, and we should get started on it one way or another.
One thing to keep in mind is how to handle precedence, because I'm sure that once there is a hook, other projects will want to use it as well, even without dask-ctl. Entrypoints might prove difficult because they would rely on installation order, wouldn't they?
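One way around the installation-order problem is to make precedence explicit rather than implicit. A sketch, with all names invented: each hook declares a priority, a config setting can force a specific hook, and ties are broken deterministically by name.

```python
def pick_hook(hooks, preferred=None):
    """Choose a cluster-factory hook deterministically.

    hooks: list of (name, priority, factory) tuples, e.g. collected
    from entrypoints; preferred: an optional config override by name.
    """
    by_name = {name: factory for name, _, factory in hooks}
    if preferred in by_name:
        return by_name[preferred]
    if not hooks:
        return None
    # highest priority wins; ties broken by name so the result never
    # depends on the order in which packages were installed
    _, _, factory = max(hooks, key=lambda h: (h[1], h[0]))
    return factory
```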

@alisterburt
Author

alisterburt commented Mar 7, 2023

My two cents here, with the obvious caveat that I'm not super experienced with dask, so am potentially missing obvious reasons this is bad:

dask-ctl seems to be the general control plane, and there are scenarios where I would want multiple clusters configured, e.g. for GPU/CPU work in an HPC environment. Given that dask-ctl already allows discovery of existing clusters by name with dask_ctl.get_cluster(name), I think a similar API for cluster creation (dask_ctl.create_cluster(name)) seems optimal: it allows easy creation of potentially many clusters rather than just one.

We could then reference many 'named' yaml specs in the dask-ctl config and create them at will with the API proposed above.

edit: extra thought: when creating from a name, we would probably want to check that the named cluster doesn't already exist
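That check could look something like the following sketch, which uses an in-memory registry as a stand-in for dask-ctl's real cluster discovery; the exception and registry shape are invented for illustration.

```python
class ClusterExistsError(ValueError):
    """Raised when creating a named cluster that already exists."""

class ClusterRegistry:
    def __init__(self):
        self._clusters = {}

    def create_cluster(self, name, factory):
        # refuse to create a cluster whose name is already taken
        if name in self._clusters:
            raise ClusterExistsError(f"cluster {name!r} already exists")
        self._clusters[name] = factory()
        return self._clusters[name]
```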

@jacobtomlinson as a stepping stone to implementing dask-ctl discovery in dask-jobqueue I just put together dask/dask-jobqueue#604. If you have a bit of time, a review over there would be appreciated; the implementation goes against what you suggested in some earlier discussion (dask/dask-jobqueue#543) but is, as far as I can tell, required for autodiscovery in dask-ctl.

@jacobtomlinson
Contributor

@alisterburt yeah that makes a lot of sense. I think the PR you raised in dask-jobqueue could basically be moved here, which would allow other dask-foo projects to use it.

We could add a section to the dask-ctl config where you can list some predefined specs (either paths to YAML files or just directly in the config).

# ctl.yaml
ctl:
  cluster-templates:
    pbs:
      version: 1
      module: "dask_jobqueue"
      class: "PBSCluster"
      args: []
      kwargs:
        cores: 36
        memory: 100GB
        queue: regular
    custom-pbs-cluster: "/path/to/custom-pbs-cluster.yaml"

Then from Python:

from dask_ctl import create_cluster

cluster = create_cluster("pbs")
client = cluster.get_client()
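Resolving a name from that proposed `cluster-templates` section could be sketched as follows: inline mappings are used as-is, while string values are treated as paths to external spec files. The function name and section layout are assumptions mirroring the example above.

```python
def resolve_template(templates, name):
    """Return the cluster spec mapping for a named template."""
    spec = templates[name]
    if isinstance(spec, str):
        # string values point at external YAML spec files on disk
        import yaml
        with open(spec) as f:
            spec = yaml.safe_load(f)
    return spec
```

`create_cluster("pbs")` would then only need to resolve the template and hand the resulting spec to the existing spec-based creation path.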

@alisterburt
Author

alisterburt commented Mar 8, 2023

@jacobtomlinson brilliant! I will find some time over the next few days and submit a PR here

I think having the mechanism in dask-jobqueue separately is also useful because:

  1. the config there requires a little less understanding of Python
  2. jobqueue clusters created from config can be autodiscovered by dask-ctl
