Reduce CUDA build matrix #1778

We currently build and package our kernels against the following versions of the CUDA Toolkit on all supported platforms:
* 11.8
* 12.0, 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.8, 12.9
* 13.0

This inflates both our wheel size and our build times.

By default, we try to load the binary built with the same CUDA Toolkit version as the user's PyTorch build, but this can be overridden with the `BNB_CUDA_VERSION` environment variable.
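The precedence described above can be sketched roughly as follows. This is only an illustration: `requested_cuda_version` is a hypothetical helper, not a function from bitsandbytes, and the dotted version strings are an assumption (the actual format accepted by `BNB_CUDA_VERSION` may differ).

```python
import os


def requested_cuda_version(torch_cuda_version, env=None):
    """Return the CUDA version whose binary we would try to load.

    Hypothetical helper for illustration only. `torch_cuda_version` stands
    in for the CUDA version PyTorch was built against.
    """
    env = os.environ if env is None else env
    override = env.get("BNB_CUDA_VERSION")
    # A set, non-empty override takes precedence over PyTorch's own version.
    return override if override else torch_cuda_version
```

Passing the environment as a parameter keeps the sketch easy to exercise without mutating `os.environ`.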

Let's align better with the official PyTorch wheels, as these will be the most commonly used. We currently support PyTorch 2.3+ on CUDA.

| PyTorch Version | CUDA Toolkit Versions |
|--------|--------|
| 2.9 | 12.6, 12.8, 13.0 |
| 2.8 | 12.6, 12.8, 12.9 |
| 2.7 | 11.8, 12.6, 12.8 |
| 2.6 | 11.8, 12.4, 12.6 | 
| 2.5 | 11.8, 12.1, 12.4 |
| 2.4 | 11.8, 12.1, 12.4 |
| 2.3 | 11.8, 12.1 |
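As a quick sanity check, the table can be compared against the current build list to see which builds no official PyTorch wheel uses. The sketch below simply transcribes the two lists from this issue; the variable names are illustrative.

```python
# CUDA Toolkit versions used by official PyTorch wheels (from the table above).
PYTORCH_CUDA = {
    "2.9": ["12.6", "12.8", "13.0"],
    "2.8": ["12.6", "12.8", "12.9"],
    "2.7": ["11.8", "12.6", "12.8"],
    "2.6": ["11.8", "12.4", "12.6"],
    "2.5": ["11.8", "12.1", "12.4"],
    "2.4": ["11.8", "12.1", "12.4"],
    "2.3": ["11.8", "12.1"],
}

# CUDA Toolkit versions we currently build against.
CURRENT_BUILDS = [
    "11.8", "12.0", "12.1", "12.2", "12.3", "12.4",
    "12.5", "12.6", "12.8", "12.9", "13.0",
]


def version_key(version):
    """Sort key that orders '12.10' after '12.9' by comparing integers."""
    return tuple(int(part) for part in version.split("."))


# Union of every version appearing in the PyTorch table.
needed = {v for versions in PYTORCH_CUDA.values() for v in versions}
kept = sorted(needed, key=version_key)
unused = sorted(set(CURRENT_BUILDS) - needed, key=version_key)

print("kept:  ", kept)
print("unused:", unused)
```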

We would remove 4 of our 11 builds: CUDA 12.0, 12.2, 12.3, and 12.5. When users run a PyTorch build compiled against one of these versions, we should fall back to loading the closest available bitsandbytes build. The `BNB_CUDA_VERSION` override would still supersede this.

| PyTorch CUDA | BNB Build |
|----------------|-----------|
| 12.0 | 12.1 |
| 12.2 | 12.1 |
| 12.3 | 12.1 |
| 12.5 | 12.4 |
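One way the fallback table could be expressed is a plain lookup applied only when no exact build exists. This is a minimal sketch under that assumption; `resolve_build`, `REMAINING_BUILDS`, and `FALLBACK_BUILDS` are hypothetical names, not existing bitsandbytes identifiers, and the `BNB_CUDA_VERSION` override is deliberately left out.

```python
# Builds that would remain after the reduction proposed in this issue.
REMAINING_BUILDS = {"11.8", "12.1", "12.4", "12.6", "12.8", "12.9", "13.0"}

# Removed PyTorch CUDA versions mapped to the build we would load instead
# (transcribed from the table above).
FALLBACK_BUILDS = {
    "12.0": "12.1",
    "12.2": "12.1",
    "12.3": "12.1",
    "12.5": "12.4",
}


def resolve_build(torch_cuda_version):
    """Pick the build to load for a given PyTorch CUDA version.

    Hypothetical helper: an exact match wins, then the fallback table is
    consulted, and anything else is reported as unsupported.
    """
    if torch_cuda_version in REMAINING_BUILDS:
        return torch_cuda_version
    if torch_cuda_version in FALLBACK_BUILDS:
        return FALLBACK_BUILDS[torch_cuda_version]
    raise ValueError(f"No build available for CUDA {torch_cuda_version}")
```

An explicit table keeps the mapping easy to audit, at the cost of needing an update whenever the build matrix changes.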

This will reduce our unpacked wheel size on Linux x86-64 by about 35% (~73.4 MB).


