Add support for configuring Dask distributed #2040
Comments
Another question: would we like to be able to configure Dask distributed from the command line? Or at least pass in the scheduler address if we already have a Dask cluster running, e.g. started from a Jupyter notebook?
Cheers @bouweandela - sorry I slacked on this - I'll come back with a deeper and more meaningful analysis (yeah, beware 🤣), but before I do that, here are two quick comments:
Thanks a lot for your work @bouweandela! I would also suggest not to put Dask-related settings in the
It could be nice to have the possibility to use something like
Yes, that's a good point. I'm just worried that this can make it more complicated for us, the developers: time to get answers from HPC admins, updates in the software stack, number of machines supported, ... Perhaps we could simply link to specific documentation on Dask usage if HPC centers provide that (here is an example for DKRZ).
This is still an ongoing discussion, so it needs reopening.
Suggestion by @sloosvel:
At the workshop at SMHI, agreement was reached that a new configuration file format would be acceptable. I will make a proposal, but this will not be implemented in time for v2.10.
Since iris 3.6, it is possible to use Dask distributed with iris. This is a great new feature that will allow for better memory handling and distributed computing. See #1714 for an example implementation. However, it does require some extra configuration.
My proposal would be to allow users to specify the arguments to distributed.Client and to the associated cluster, e.g. distributed.LocalCluster or dask_jobqueue.SLURMCluster, in this configuration. This could either be added under a new key in config-user.yml or in a new configuration file in the ~/.esmvaltool directory.

Add to the user configuration file
We could add these new options to config-user.yml under a new key dask, e.g.:

Example config-user.yml settings for running locally using a LocalCluster:
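As a rough sketch (the key names cluster, client, and type below are illustrative, not a fixed schema), the cluster section could hold the keyword arguments for distributed.LocalCluster and the client section those for distributed.Client:

```yaml
# Illustrative sketch only: `cluster` holds keyword arguments for
# distributed.LocalCluster, `client` holds keyword arguments for
# distributed.Client; key names are not a settled schema.
dask:
  cluster:
    type: distributed.LocalCluster
    n_workers: 2
    threads_per_worker: 4
    memory_limit: 8GiB
  client: {}
```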
Example settings for using an externally managed cluster (e.g. one set up from a Jupyter notebook):
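A sketch for this case could contain only a client section with the scheduler address (the address below is a placeholder):

```yaml
# Illustrative sketch: no cluster is started; the client connects to an
# already running scheduler. The address is a placeholder.
dask:
  client:
    address: "tcp://127.0.0.1:8786"
```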
Example settings for running on Levante:
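A sketch based on dask_jobqueue.SLURMCluster; the queue, account, and resource numbers below are placeholders, not verified Levante settings:

```yaml
# Illustrative sketch: `cluster` holds keyword arguments for
# dask_jobqueue.SLURMCluster; queue, account, and resources are placeholders.
dask:
  cluster:
    type: dask_jobqueue.SLURMCluster
    queue: compute
    account: my_slurm_account
    cores: 128
    memory: 256GiB
    walltime: "01:00:00"
    n_workers: 2
  client: {}
```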
New configuration file
Or, we could add the new configuration in a separate file, e.g. called ~/.esmvaltool/dask.yml or ~/.esmvaltool/dask-distributed.yml.

Example dask.yml settings for running locally using a LocalCluster:
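For the separate-file variant, the content could mirror the sketch above; whether a top-level dask key is still needed is an open design choice. For example:

```yaml
# Illustrative sketch of ~/.esmvaltool/dask.yml: the same information as above,
# but in a standalone file, so the top-level `dask` key may be unnecessary.
cluster:
  type: distributed.LocalCluster
  n_workers: 2
  threads_per_worker: 4
  memory_limit: 8GiB
client: {}
```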
Example settings for using an externally managed cluster (e.g. one set up from a Jupyter notebook):
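Again only a client section pointing at an external scheduler (placeholder address):

```yaml
# Illustrative sketch: connect to an externally managed scheduler.
client:
  address: "tcp://127.0.0.1:8786"
```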
Example settings for running on Levante:
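And a corresponding SLURM sketch (placeholder queue, account, and resources):

```yaml
# Illustrative sketch: SLURMCluster keyword arguments in a standalone file.
cluster:
  type: dask_jobqueue.SLURMCluster
  queue: compute
  account: my_slurm_account
  cores: 128
  memory: 256GiB
  walltime: "01:00:00"
  n_workers: 2
client: {}
```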
@ESMValGroup/esmvaltool-coreteam Does anyone have an opinion on what the best approach is here? A new file, or adding to config-user.yml?