This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Use dask cuda worker when GPU resource > 0 #339

Closed

Conversation

ljstrnadiii

TL;DR

If a GPU has been specified in the Dask plugin resources, assume we want to use dask-cuda-worker, i.e. that dask-cuda is present on the specified image.

Type

  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

Currently we build an arg list to submit to the container, injecting values such as threads and memory from the limits specified in the resources object of the Dask plugin. Dask-CUDA gives us many built-in GPU tools, and all that is required is to ensure dask-cuda is installed and then to start our workers with dask-cuda-worker. So, to accommodate this, we simply check for GPU limits in the Dask plugin resources and overwrite the first arg with dask-cuda-worker.
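For illustration, a minimal Go sketch of what this amounts to; the function and resource names here are assumptions rather than the exact flyteplugins code (the real change is quoted in the review thread below).

package dask

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// buildWorkerArgs assembles the worker command from the resource limits and
// swaps the entrypoint to dask-cuda-worker when a GPU limit is set.
func buildWorkerArgs(limits v1.ResourceList) []string {
	args := []string{
		"dask-worker",
		"--nthreads", fmt.Sprintf("%d", limits.Cpu().Value()),
		"--memory-limit", limits.Memory().String(),
	}
	// ResourceList.Name returns a zero Quantity when the key is absent, so
	// IsZero covers both "no GPU requested" and an explicit limit of 0.
	if !limits.Name(v1.ResourceName("nvidia.com/gpu"), resource.DecimalSI).IsZero() {
		args[0] = "dask-cuda-worker"
	}
	return args
}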

@welcome

welcome bot commented Apr 3, 2023

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

  • Most of the repos have a PR template; please fill it out to the best of your knowledge.
  • Sign off your commits (Reference: DCO Guide).

@ljstrnadiii ljstrnadiii force-pushed the dask_add_cuda_support branch from 288f088 to 6a82451 on April 3, 2023 03:45
@ljstrnadiii
Author

@bstadlbauer any thoughts would be greatly appreciated 🙏

@codecov

codecov bot commented Apr 3, 2023

Codecov Report

Merging #339 (d2f99c2) into master (18a594e) will increase coverage by 1.33%.
The diff coverage is 75.00%.

❗ Current head d2f99c2 differs from pull request most recent head 6a82451. Consider uploading reports for the commit 6a82451 to get more accurate results

@@            Coverage Diff             @@
##           master     #339      +/-   ##
==========================================
+ Coverage   62.65%   63.99%   +1.33%     
==========================================
  Files         146      146              
  Lines       12220     9910    -2310     
==========================================
- Hits         7657     6342    -1315     
+ Misses       3981     2986     -995     
  Partials      582      582              
Flag        Coverage Δ
unittests   ?

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
go/tasks/plugins/array/k8s/subtask_exec_context.go 80.80% <66.66%> (+0.97%) ⬆️
go/tasks/plugins/k8s/dask/dask.go 86.72% <100.00%> (+2.80%) ⬆️

... and 128 files with indirect coverage changes


@pingsutw
Member

pingsutw commented Apr 10, 2023

@ljstrnadiii Is it ready for review?

@ljstrnadiii ljstrnadiii marked this pull request as ready for review April 10, 2023 14:51
Member

@bstadlbauer bstadlbauer left a comment


I'm generally in favor, but I am somewhat cautious about making this the default.

How about adding a configuration option to be able to specify the (GPU) worker command? You can see an example of how to do configuration in the Spark plugin.
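For reference, one possible shape such a config option could take; the names here are hypothetical, not the actual flyteplugins config, and the Spark plugin's config section is the pattern being suggested.

package dask

// Config is a hypothetical plugin-level configuration, loosely following how
// other Flyte backend plugins (e.g. Spark) expose settings.
type Config struct {
	// GPUWorkerCommand would replace the default "dask-worker" entrypoint for
	// worker groups that request GPUs, e.g. "dask-cuda-worker".
	GPUWorkerCommand string `json:"gpu-worker-command"`
}

var defaultConfig = Config{
	GPUWorkerCommand: "dask-cuda-worker",
}

// GetConfig stands in for the plugin config registry lookup.
func GetConfig() *Config {
	return &defaultConfig
}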

Comment on lines +190 to +194
// If limits includes gpu, assume dask cuda worker cli startup
// https://docs.rapids.ai/api/dask-cuda/nightly/quickstart.html#dask-cuda-worker
if !limits.Name(flytek8s.ResourceNvidiaGPU, "0").IsZero() {
workerArgs[0] = "dask-cuda-worker"
}
Member

After having taken a closer look, I am not so sure whether this is a good default behavior. A few questions I've got:

  • Is this the default behavior that every user wants? E.g. the docs state:

It only helps with deployment and management of Dask workers in multi-GPU systems.

and a lot of k8s deployments will have single GPU workers.

  • Should this command also be applied on GPU requests, not just limits? 🤔
  • Is this the desired behavior in case the default resources (e.g. the task resources) also have GPUs?

Author

Good point! I'll make changes to add an option to specify a GPU worker command instead.

Author

@ljstrnadiii ljstrnadiii May 1, 2023

OK, after thinking about this for a bit and getting somewhat more familiar with the code base(s), tell me if this makes sense:

  • Go to the flyteidl plugins dask.proto and add a "command" field.
  • Update the flytekit plugins Python code for the models and task (here and here) to include the "command" option.
  • Lastly, update dask.go to fall back to the default command if none is specified (rough sketch below).

Question: this seems like three separate PRs. Is this order correct: update the proto, update the models, then update dask.go? Also, "command" could be "startup_command"; I don't have a strong preference, but any suggestions are welcome.

@bstadlbauer if this sounds good to you I will try to get this in over the next week.
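
As a rough illustration of the last step, a hypothetical sketch of the dask.go side; the helper and field names are placeholders, and the proto field name is still undecided above.

package dask

// defaultWorkerCommand is what the plugin currently starts workers with.
var defaultWorkerCommand = []string{"dask-worker"}

// workerCommand prefers a command supplied via the new proto field and falls
// back to the existing default otherwise.
func workerCommand(specCommand []string) []string {
	if len(specCommand) > 0 {
		return specCommand
	}
	return defaultWorkerCommand
}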

@eapolinario
Contributor

Hi, we are moving all Flyte development to a monorepo. To help with the transition, we're moving open PRs to the monorepo automatically, and your PR was moved to flyteorg/flyte#4129. Note that any conflicts in the resulting PR most likely stem from the change in the import path of the Flyte components.

@eapolinario eapolinario closed this Oct 3, 2023