[RELEASE] dask-cuda v25.02 #1438

Merged
merged 24 commits into from
Feb 13, 2025


AyodeAwe
Contributor

❄️ Code freeze for branch-25.02 and v25.02 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-25.02 until release (merging of this PR).

What is the purpose of this PR?

  • Update documentation
  • Allow testing for the new release
  • Enable a means to merge branch-25.02 into main for the release

raydouglass and others added 22 commits November 15, 2024 09:26

Verified

This commit was signed with the committer’s verified signature.
raydouglass Ray Douglass

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Forward-merge branch-24.12 into branch-25.02

Forward-merge branch-24.12 into branch-25.02

By default, CI runs on draft PRs. This leads to many CI runs that may be unnecessary.

With this PR's change to `.github/copy-pr-bot.yaml`, an `/ok to test` comment from a trusted user is required to trigger CI on draft PRs. Non-draft PRs will run CI by default, assuming that all commits are signed by trusted users. Otherwise an `/ok to test` is required (as before) -- see the `copy-pr-bot` docs at https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/ for more information.
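
The trigger rule described above can be sketched as a small decision function. This is an illustrative model only, not how `copy-pr-bot` is implemented; the function and parameter names are hypothetical.

```python
def ci_runs_automatically(is_draft: bool, all_commits_trusted: bool) -> bool:
    """Return True if CI starts without an `/ok to test` comment.

    Hypothetical model of the behavior described in this PR: draft PRs
    always need `/ok to test`; non-draft PRs run CI by default only when
    every commit is signed by a trusted user.
    """
    if is_draft:
        return False
    return all_commits_trusted


def ci_runs(is_draft: bool, all_commits_trusted: bool, ok_to_test: bool) -> bool:
    """Return True if CI runs, given an optional `/ok to test` from a trusted user."""
    return ok_to_test or ci_runs_automatically(is_draft, all_commits_trusted)
```

Under this model, the only case that changes with this PR is a draft PR whose commits are all trusted: it previously ran CI by default and now waits for `/ok to test`.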

Part of rapidsai/build-planning#123.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #1412

Forward-merge branch-24.12 into branch-25.02

Conda builds are failing due to a missing `setuptools` dependency. This change adds it to fix the failure.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - Bradley Dice (https://github.com/bdice)

URL: #1418

When PyNVML fails to identify CPU affinity correctly, launching Dask-CUDA may fail with an error. After extensive discussion in #1381, it seems appropriate to continue when CPU affinity identification fails and instead print a warning with a link to the documentation. New documentation is also added to help with the first steps of troubleshooting.
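
The warn-and-continue behavior described above might look roughly like the following. This is a minimal sketch and not the Dask-CUDA plugin code: `apply_cpu_affinity`, its parameters, and the placeholder documentation link are hypothetical, and `warnings.warn` stands in for however the warning is actually emitted.

```python
import os
import warnings
from typing import Callable, Iterable, Optional

# Placeholder: the real warning links to the troubleshooting documentation.
TROUBLESHOOTING_URL = "<link to troubleshooting documentation>"


def _set_process_affinity(cpus: Iterable[int]) -> None:
    # os.sched_setaffinity is only available on some platforms (e.g. Linux).
    os.sched_setaffinity(0, cpus)


def apply_cpu_affinity(
    get_affinity: Callable[[], Iterable[int]],
    set_affinity: Optional[Callable[[Iterable[int]], None]] = None,
) -> bool:
    """Try to pin the current process to the CPUs returned by `get_affinity`.

    Returns True on success. On any failure, emits a warning and returns
    False so that startup can continue without CPU pinning.
    """
    if set_affinity is None:
        set_affinity = _set_process_affinity
    try:
        set_affinity(list(get_affinity()))
    except Exception as exc:
        warnings.warn(
            f"Setting CPU affinity failed ({exc!r}); continuing without it. "
            f"See {TROUBLESHOOTING_URL} for troubleshooting.",
            RuntimeWarning,
        )
        return False
    return True
```

Passing the affinity lookup and setter as callables is only a convenience for exercising the failure path in this sketch.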

Unfortunately, testing warnings in Distributed plugins seems very hard to do. I couldn't find a way to do it even with `distributed.utils_test.captured_logger`, which runs only after the cluster is created with a `LocalCluster` (or `LocalCUDACluster`). For the `dask cuda worker` CLI, there's no way for us to mock the value passed to `CPUAffinity` to force a warning to be raised, so no tests are added at this time.

Closes #1381.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Benjamin Zaitlen (https://github.com/quasiben)

URL: #1420

Do not skip `pynvml` if it's not importable, given that `pynvml` is a hard dependency.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - https://github.com/jakirkham
  - James Lamb (https://github.com/jameslamb)

URL: #1421

Bump `pynvml` from `11` to `12`. This version of `pynvml` also now depends on `nvidia-ml-py` for core functionality.

Authors:
  - https://github.com/jakirkham
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #1419

Adding this now that wheels are available

- **deps(kvikio): add kvikio to CUDA version matrices**
- **test(wheels): enable wheel tests in CI**

Resolves #1344

Authors:
  - Gil Forsyth (https://github.com/gforsyth)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - James Lamb (https://github.com/jameslamb)

URL: #1416

Removes testing/handling for "legacy" Dask cuDF (i.e. `DASK_DATAFRAME__QUERY_PLANNING=False`).

This PR also adds support for the `"explicit-comms"` config with query-planning enabled (we used to raise an error telling the user to disable query planning).

This should be merged **before** rapidsai/cudf#17558 (otherwise Dask-CUDA CI will break).
This PR is marked as "breaking", because it technically breaks the `"explicit-comms"` config with the "legacy" version of Dask cuDF (which we are about to remove in 25.02 anyway).

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - James Lamb (https://github.com/jameslamb)
  - Mads R. B. Kristensen (https://github.com/madsbk)

URL: #1417

Follow up to #1417

Cleans up some imports (some of which don't work for `dask>2024.12.1`).

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1424

Numba 0.61.0 was just released with a couple of breaking changes. This PR is required to unblock CI.

xref: rapidsai/cudf#17777

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Gil Forsyth (https://github.com/gforsyth)

URL: #1426

Pull in build dependencies from `pyproject.toml` into Conda's `meta.yaml`.

Authors:
  - https://github.com/jakirkham

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Ray Douglass (https://github.com/raydouglass)

URL: #1425

`shellcheck` is a fast, static analysis tool for shell scripts. It's good at flagging up unused variables, unintentional glob expansions, and other potential execution and security headaches that arise from the wonders of `bash` (and other shlangs).

This PR adds a `pre-commit` hook to run `shellcheck` on all of the `sh-lang` files in the `ci/` directory, and the changes requested by `shellcheck` to make the existing files pass the check.
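
What the hook does can be approximated by hand with a short script. This is a sketch under assumptions, not the hook added in this PR: it selects files by the `.sh` extension (the real hook may identify shell files differently), and the helper names are hypothetical. It only calls `shellcheck` if the executable is on `PATH`.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def find_shell_scripts(root: str) -> List[Path]:
    """Return all `*.sh` files under `root`, sorted for stable output."""
    return sorted(Path(root).rglob("*.sh"))


def run_shellcheck(scripts: List[Path]) -> Optional[int]:
    """Run shellcheck on `scripts`.

    Returns 0 if there is nothing to check, None if shellcheck is not
    installed, and shellcheck's exit code otherwise.
    """
    if not scripts:
        return 0
    if shutil.which("shellcheck") is None:
        return None
    result = subprocess.run(["shellcheck", *map(str, scripts)])
    return result.returncode


if __name__ == "__main__":
    print(run_shellcheck(find_shell_scripts("ci")))
```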
                                                                                                              
xref: rapidsai/build-planning#135

Authors:
  - Gil Forsyth (https://github.com/gforsyth)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1427

A new configuration for the UCX comms module was introduced in rapidsai/rapids-dask-dependency#80. It is designed to help with timeouts in larger clusters, and sometimes even in small ones, depending on the architecture. This change documents the new configuration.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Benjamin Zaitlen (https://github.com/quasiben)

URL: #1428

Contributes to rapidsai/build-planning#142

`ucx-proc` is no longer necessary, for the reasons described in that issue. This proposes dropping the dependency on it here.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #1429

This PR uses CUDA 12.8.0 to build and test.

xref: rapidsai/build-planning#139

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #1432

This PR points the shared workflow branches back to the default 25.02 branches.

xref: rapidsai/build-planning#139

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #1436
@AyodeAwe AyodeAwe requested review from a team as code owners January 31, 2025 21:40
@AyodeAwe AyodeAwe requested review from jameslamb and removed request for a team January 31, 2025 21:40
@github-actions github-actions bot added python, conda, ci labels Jan 31, 2025
pentschev and others added 2 commits February 4, 2025 22:12

Wheel tests have recently been failing, probably because the `DASK_CUDA_WAIT_WORKERS_MIN_TIMEOUT=20` configuration is set for conda tests but not for wheel tests. With this change, the tests and benchmarks are moved to new scripts that both conda and wheel tests can call, providing unified configurations with sane values for CI, which should also resolve the CI failures.
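
The idea of a single shared configuration can be sketched as follows. The actual change uses shared CI scripts rather than Python; `CI_TEST_ENV` and `ci_env` are hypothetical names, and the only value taken from the description above is `DASK_CUDA_WAIT_WORKERS_MIN_TIMEOUT=20`.

```python
import os
from typing import Dict, Mapping, Optional

# Single source of truth for CI settings; the value comes from the text above.
CI_TEST_ENV = {"DASK_CUDA_WAIT_WORKERS_MIN_TIMEOUT": "20"}


def ci_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with the CI settings applied."""
    env = dict(os.environ if base is None else base)
    env.update(CI_TEST_ENV)
    return env
```

Both the conda and the wheel test entry points could then pass `env=ci_env()` to `subprocess.run`, so a setting cannot be present in one and missing in the other.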

Additionally, this removes `--durations`, which was previously added to help understand timeouts but has not been of much use since.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Tom Augspurger (https://github.com/TomAugspurger)
  - James Lamb (https://github.com/jameslamb)
  - Richard (Rick) Zamora (https://github.com/rjzamora)

URL: #1440

Verified

This commit was signed with the committer’s verified signature.
AyodeAwe Jake Awe
@AyodeAwe AyodeAwe merged commit b01ac29 into main Feb 13, 2025
5 of 6 checks passed
Labels
ci, conda, python