Reduce CUDA build matrix #1778

@matthewdouglas

We currently build and package our kernels against the following versions of the CUDA Toolkit on all supported platforms:

  • 11.8
  • 12.0, 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.8, 12.9
  • 13.0

This inflates both our wheel size and our build times.

By default we try to load the binary built with the same CUDA Toolkit version as the user's PyTorch build, but this can be overridden with the BNB_CUDA_VERSION environment variable.
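The selection order described above can be sketched as follows. This is a minimal illustration, not the actual bitsandbytes loader: the function name `select_cuda_tag` is hypothetical, and the compact tag format (for example `124` for CUDA 12.4) is an assumption made here so that the override and the PyTorch-reported version can be compared.

```python
import os
from typing import Mapping, Optional


def select_cuda_tag(
    torch_cuda_version: str,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the CUDA version tag of the binary to load (hypothetical helper).

    torch_cuda_version is the toolkit version PyTorch was built with,
    e.g. the string reported by torch.version.cuda, such as "12.4".
    """
    env = os.environ if environ is None else environ
    # An explicit BNB_CUDA_VERSION always wins over the PyTorch-derived default.
    override = env.get("BNB_CUDA_VERSION", "").strip()
    chosen = override if override else torch_cuda_version
    # Assumption: normalize "12.4" and "124" to the same compact tag.
    return chosen.replace(".", "")


if __name__ == "__main__":
    print(select_cuda_tag("12.4", {}))
    print(select_cuda_tag("12.4", {"BNB_CUDA_VERSION": "118"}))
```

Passing the environment as a parameter keeps the sketch easy to test without touching the real process environment.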

Let's align better with the official PyTorch wheels, as these will be the most commonly used. We currently support PyTorch 2.3+ on CUDA.

| PyTorch Version | CUDA Toolkit Versions |
| --- | --- |
| 2.9 | 12.6, 12.8, 13.0 |
| 2.8 | 12.6, 12.8, 12.9 |
| 2.7 | 11.8, 12.6, 12.8 |
| 2.6 | 11.8, 12.4, 12.6 |
| 2.5 | 11.8, 12.1, 12.4 |
| 2.4 | 11.8, 12.1, 12.4 |
| 2.3 | 11.8, 12.1 |
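As a cross-check of the table above, the toolkit versions used by at least one official PyTorch wheel can be computed as a set union and compared against the builds we ship today. The variable names are illustrative only; the data is copied from this issue.

```python
# CUDA Toolkit versions we currently build against (from the list above).
CURRENT_BUILDS = {
    "11.8",
    "12.0", "12.1", "12.2", "12.3", "12.4", "12.5", "12.6", "12.8", "12.9",
    "13.0",
}

# CUDA Toolkit versions of the official PyTorch wheels, per PyTorch release.
PYTORCH_WHEELS = {
    "2.9": {"12.6", "12.8", "13.0"},
    "2.8": {"12.6", "12.8", "12.9"},
    "2.7": {"11.8", "12.6", "12.8"},
    "2.6": {"11.8", "12.4", "12.6"},
    "2.5": {"11.8", "12.1", "12.4"},
    "2.4": {"11.8", "12.1", "12.4"},
    "2.3": {"11.8", "12.1"},
}


def version_key(version: str) -> tuple:
    """Sort "12.10" after "12.9" by comparing numeric components."""
    return tuple(int(part) for part in version.split("."))


# Every toolkit version used by at least one official wheel.
needed = set().union(*PYTORCH_WHEELS.values())
# Builds that no official wheel in the table uses.
unused = CURRENT_BUILDS - needed

if __name__ == "__main__":
    print("needed:", sorted(needed, key=version_key))
    print("unused:", sorted(unused, key=version_key))
```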

We would remove 4 of our 11 builds: CUDA 12.0, 12.2, 12.3, and 12.5. When users happen to run a PyTorch built against one of these versions, we should then fall back to loading the closest remaining bitsandbytes build. The BNB_CUDA_VERSION override would still supersede this.

| PyTorch CUDA | BNB Build |
| --- | --- |
| 12.0 | 12.1 |
| 12.2 | 12.1 |
| 12.3 | 12.1 |
| 12.5 | 12.4 |
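The fallback table could be encoded as a plain lookup, roughly as sketched below. The names `FALLBACK_BUILDS` and `resolve_build` are hypothetical and not part of bitsandbytes; the mapping itself is taken directly from the table.

```python
from typing import Optional

# PyTorch CUDA version -> bitsandbytes build to load instead (from the table).
FALLBACK_BUILDS = {
    "12.0": "12.1",
    "12.2": "12.1",
    "12.3": "12.1",
    "12.5": "12.4",
}


def resolve_build(torch_cuda_version: str, override: Optional[str] = None) -> str:
    """Return the CUDA version of the bitsandbytes build to load.

    An explicit override (standing in for BNB_CUDA_VERSION) supersedes
    the fallback table; versions we still build map to themselves.
    """
    if override:
        return override
    return FALLBACK_BUILDS.get(torch_cuda_version, torch_cuda_version)


if __name__ == "__main__":
    for version in ("12.0", "12.3", "12.5", "12.6"):
        print(version, "->", resolve_build(version))
```

An explicit table is used here rather than a "nearest version" rule because the proposed mapping is not purely numeric: 12.3 maps to 12.1, not 12.4.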

This will reduce our unpacked wheel size by about 35% or ~73.4MB on Linux x86-64.

Labels: Build, CUDA
