[RELEASE] dask-cuda v24.12 #1411

raydouglass · 2024-11-21T20:51:07Z

❄️ Code freeze for `branch-24.12` and v24.12 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-24.12 until release (merging of this PR).

What is the purpose of this PR?

* Update documentation
* Allow testing for the new release
* Enable a means to merge branch-24.12 into main for the release

Forward-merge branch-24.10 into branch-24.12

Test durations output was previously expanded to show all tests so that we could debug timeouts. Since that is no longer as important, limit the output to the 50 longest-running tests to shorten the logs. We may soon remove it entirely if the durations remain unneeded.

Authors:
- Peter Andreas Entschev (https://github.com/pentschev)
Approvers:
- James Lamb (https://github.com/jameslamb)
URL: #1393
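For context on what the trimmed report contains: pytest's `--durations=N` option lists only the N slowest entries. A minimal sketch of that selection, using made-up test names and timings (none of them come from this repository's CI):

```python
import heapq


def slowest(durations, n=50):
    """Return the n longest-running (name, seconds) pairs, slowest first."""
    return heapq.nlargest(n, durations.items(), key=lambda item: item[1])


# Hypothetical timings, for illustration only.
timings = {"test_a": 0.4, "test_b": 12.1, "test_c": 3.7}
top_two = slowest(timings, n=2)
```

Anything outside the top N is simply left out of the log, which is where the reduction in log length comes from.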

Contributes to rapidsai/build-planning#106

Proposes specifying the RAPIDS version in `conda install` calls that install CI artifacts, to reduce the risk of CI jobs picking up artifacts from other releases.

Authors:
- James Lamb (https://github.com/jameslamb)
Approvers:
- Ray Douglass (https://github.com/raydouglass)
URL: #1395

@pentschev

This PR closes: #1281

Usage example:
```
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

cluster = LocalCUDACluster(rmm_allocator_external_lib_list=["torch", "cupy"])
client = Client(cluster)
```

Verify that it is working:
```
def get_torch_allocator():
    import torch
    return torch.cuda.get_allocator_backend()

client.run(get_torch_allocator)
```
```
{'tcp://127.0.0.1:37167': 'pluggable',
 'tcp://127.0.0.1:38749': 'pluggable',
 'tcp://127.0.0.1:43109': 'pluggable',
 'tcp://127.0.0.1:44259': 'pluggable',
 'tcp://127.0.0.1:44953': 'pluggable',
 'tcp://127.0.0.1:45087': 'pluggable',
 'tcp://127.0.0.1:45623': 'pluggable',
 'tcp://127.0.0.1:45847': 'pluggable'}
```
Without it, the backend is `native`.

Context: This helps NeMo-Curator have a more stable use of PyTorch + dask-cuda.

CC: @pentschev

Authors:
- Vibhu Jawa (https://github.com/VibhuJawa)
Approvers:
- Peter Andreas Entschev (https://github.com/pentschev)
URL: #1392
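The dictionary returned by `client.run` maps each worker address to its reported backend, so the check can be automated. A minimal sketch, assuming that shape; the helper name `workers_not_using` is hypothetical and the addresses are examples only:

```python
def workers_not_using(backends, expected="pluggable"):
    """Return the addresses of workers whose backend differs from `expected`."""
    return sorted(addr for addr, backend in backends.items() if backend != expected)


# Same shape as the dict returned by client.run(...); values are examples.
reported = {
    "tcp://127.0.0.1:37167": "pluggable",
    "tcp://127.0.0.1:38749": "native",
}
mismatched = workers_not_using(reported)
```

An empty result would indicate that every worker picked up the RMM allocator.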

UCXX CI tests were previously disabled due to instabilities (see #1270 (comment)). They should now be much more resilient, so we should re-enable them in preparation for the permanent migration to UCXX.

Authors:
- Peter Andreas Entschev (https://github.com/pentschev)
Approvers:
- Jake Awe (https://github.com/AyodeAwe)
URL: #1396

Ignore the legacy Dask DataFrame warnings, introduced in dask/dask#11437, which state that the implementation will soon be removed. The warning is only raised when `DASK_DATAFRAME__QUERY_PLANNING=False`.

Authors:
- Peter Andreas Entschev (https://github.com/pentschev)
Approvers:
- Richard (Rick) Zamora (https://github.com/rjzamora)
- James Lamb (https://github.com/jameslamb)
URL: #1397
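One way to ignore a specific warning in Python is a targeted filter from the standard `warnings` module. The sketch below is illustrative only: the message text and the `FutureWarning` category are assumptions, not the exact warning Dask emits, and `emit_legacy_warning` is a hypothetical stand-in for library code.

```python
import warnings

# Assumed message text and category; the real Dask warning may differ.
LEGACY_MESSAGE = "The legacy Dask DataFrame implementation is deprecated"


def emit_legacy_warning():
    """Hypothetical stand-in for library code that raises the warning."""
    warnings.warn(LEGACY_MESSAGE, FutureWarning)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ignore only warnings whose message starts with the assumed text.
    warnings.filterwarnings("ignore", message=LEGACY_MESSAGE, category=FutureWarning)
    emit_legacy_warning()
    warnings.warn("unrelated warning", UserWarning)

remaining = [str(w.message) for w in caught]
```

Matching on both message and category keeps unrelated warnings visible, which matters when a test suite otherwise treats warnings as errors.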

Contributes to rapidsai/build-planning#108

This is a pure Python project, so it doesn't need configuration for CMake or `sccache`. This proposes removing them to simplify the build scripts a bit.

It also proposes updating the `rapids-dependency-file-generator` pre-commit hook to its latest version, something I'm trying to roll out across RAPIDS as part of rapidsai/build-planning#108.

Authors:
- James Lamb (https://github.com/jameslamb)
Approvers:
- Jake Awe (https://github.com/AyodeAwe)
URL: #1400

In cudf and cuml we have observed speedups of ~10% and ~20%, respectively, in pytest suite execution by switching the pytest traceback style to `--tb=native`:
```
currently:   102474 passed, 2117 skipped, 902 xfailed in 892.16s (0:14:52)
--tb=short:  102474 passed, 2117 skipped, 902 xfailed in 898.99s (0:14:58)
--tb=no:     102474 passed, 2117 skipped, 902 xfailed in 815.98s (0:13:35)
--tb=native: 102474 passed, 2117 skipped, 902 xfailed in 820.92s (0:13:40)
```
This PR makes a similar change to the `dask-cuda` repo.

xref: rapidsai/cudf#16851

Authors:
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Peter Andreas Entschev (https://github.com/pentschev)
Approvers:
- Peter Andreas Entschev (https://github.com/pentschev)
URL: #1389
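The relative saving for each traceback option can be worked out from the wall-clock times quoted above. A small sketch of that arithmetic (the helper name `percent_faster` is illustrative):

```python
def percent_faster(baseline_s, candidate_s):
    """Percentage reduction in wall-clock time relative to the baseline."""
    return (baseline_s - candidate_s) / baseline_s * 100.0


baseline = 892.16  # seconds, the "currently" run quoted above
runs = {"--tb=short": 898.99, "--tb=no": 815.98, "--tb=native": 820.92}
savings = {flag: round(percent_faster(baseline, secs), 1) for flag, secs in runs.items()}
```

For this particular sample run, `--tb=native` comes out roughly 8% faster than the baseline, close to the result for `--tb=no`, while `--tb=short` is marginally slower.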

Add support for initial warmup runs in benchmarks, and allow profiling either all iterations or just the last one. This is technically a breaking change: `--profile` now profiles all iterations, and the new `--profile-last` option profiles only the last one, as `--profile` used to.

Authors:
- Peter Andreas Entschev (https://github.com/pentschev)
Approvers:
- Mads R. B. Kristensen (https://github.com/madsbk)
URL: #1402
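The behaviour described above can be sketched as a small benchmark loop. This is not the dask-cuda implementation: the function `run_benchmark`, its parameters, and the choice to leave warmup runs unprofiled are all assumptions made for illustration.

```python
import time


def run_benchmark(fn, runs=3, warmup_runs=1, profile="none"):
    """Run `fn` with warmups; `profile` is "none", "all", or "last".

    Returns (timings, profiled), where `profiled` lists the indices of the
    timed iterations that would have been profiled.
    """
    if profile not in ("none", "all", "last"):
        raise ValueError(f"unknown profile mode: {profile!r}")
    for _ in range(warmup_runs):
        fn()  # warmup results are discarded (assumed: never profiled)
    timings, profiled = [], []
    for i in range(runs):
        if profile == "all" or (profile == "last" and i == runs - 1):
            profiled.append(i)  # a real benchmark would start a profiler here
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings, profiled


calls = []
_, profiled_all = run_benchmark(lambda: calls.append(1), runs=3, warmup_runs=2, profile="all")
_, profiled_last = run_benchmark(lambda: calls.append(1), runs=3, warmup_runs=0, profile="last")
```

In this sketch, `profile="all"` corresponds to the new meaning of `--profile` and `profile="last"` to `--profile-last`.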

Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

* checks on wheel size (compressed),
  - *to be sure they're under PyPI limits*
  - *and to prompt discussion on PRs that significantly increase wheel sizes*
* checks on README formatting
  - *to ensure they'll render properly as the PyPI project homepages*
  - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/*

Authors:
- James Lamb (https://github.com/jameslamb)
Approvers:
- Bradley Dice (https://github.com/bdice)
URL: #1404
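The size check amounts to comparing the compressed wheel file against a threshold. A minimal sketch, assuming a hypothetical helper `wheel_size_ok` and arbitrary byte limits; the tooling and thresholds actually used in CI are not stated here:

```python
import os
import tempfile


def wheel_size_ok(path, max_bytes):
    """Return (is_within_limit, size_in_bytes) for the file at `path`."""
    size = os.path.getsize(path)
    return size <= max_bytes, size


# Stand-in file; a real check would point at the built .whl artifact.
with tempfile.TemporaryDirectory() as tmp:
    fake_wheel = os.path.join(tmp, "example-0.0.0-py3-none-any.whl")
    with open(fake_wheel, "wb") as fh:
        fh.write(b"\0" * 2048)
    ok_small, size = wheel_size_ok(fake_wheel, max_bytes=4096)
    ok_large, _ = wheel_size_ok(fake_wheel, max_bytes=1024)
```

Failing the job when the limit is exceeded is what prompts the discussion on PRs that grow the wheel significantly.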

Temporarily disable UCXX tests in CI due to some non-deterministic failures during the code freeze phase. They will be re-enabled after the 24.12 release.

Authors:
- Peter Andreas Entschev (https://github.com/pentschev)
Approvers:
- Jake Awe (https://github.com/AyodeAwe)
URL: #1406

Handling the str vs. bytes discrepancy should have been covered by the changes in #1118.

Authors:
- Lawrence Mitchell (https://github.com/wence-)
- Peter Andreas Entschev (https://github.com/pentschev)
Approvers:
- AJ Schmidt (https://github.com/ajschmidt8)
- https://github.com/jakirkham
URL: #1130

raydouglass and others added 14 commits September 19, 2024 11:46

DOC v24.12 Updates [skip ci]

95f0a33

Merge pull request #1388 from rapidsai/branch-24.10

fe16796

Forward-merge branch-24.10 into branch-24.12

Merge pull request #1394 from rapidsai/branch-24.10

e139307

Forward-merge branch-24.10 into branch-24.12

raydouglass requested review from a team as code owners November 21, 2024 20:51

raydouglass requested review from msarahan and removed request for a team November 21, 2024 20:51

github-actions bot added the python, conda, and ci labels Nov 21, 2024
