Try to build CUDA wheels #106

Open
wants to merge 189 commits into
base: main
Changes from 75 commits
faa582d
Try to build CUDA wheels
frostedoyster Feb 12, 2024
52aea84
Try again
frostedoyster Feb 12, 2024
72e4fec
Try yet another version
frostedoyster Feb 12, 2024
c52e0d0
Typo
frostedoyster Feb 12, 2024
2dee430
Try with miniconda
frostedoyster Feb 13, 2024
90d1691
Debug
frostedoyster Feb 13, 2024
a614ea2
Debug
frostedoyster Feb 13, 2024
3d8899d
Debug
frostedoyster Feb 13, 2024
6cc261c
Back to cibw
frostedoyster Feb 13, 2024
b679bb2
Try again?
frostedoyster Feb 13, 2024
00a045a
Extract setup into its own file
frostedoyster Feb 13, 2024
f5aa849
Test torch 2.1.0
frostedoyster Feb 13, 2024
bd29242
Debug
frostedoyster Feb 13, 2024
de679c0
Debug
frostedoyster Feb 13, 2024
47544b1
Debug
frostedoyster Feb 13, 2024
194acac
Debug
frostedoyster Feb 13, 2024
5af3030
Debug
frostedoyster Feb 13, 2024
e7f0b28
debug
frostedoyster Feb 13, 2024
3a5bb19
debug
frostedoyster Feb 13, 2024
fb56feb
debug
frostedoyster Feb 13, 2024
b6ffd8b
debug
frostedoyster Feb 13, 2024
6eab11b
debug
frostedoyster Feb 13, 2024
d423094
Run set-up script?
frostedoyster Feb 13, 2024
347910f
Free disk space
frostedoyster Feb 13, 2024
4836d45
Only python 3.11
frostedoyster Feb 13, 2024
76c745e
Try again
frostedoyster Feb 13, 2024
23312f9
Print where it fails
frostedoyster Feb 13, 2024
5540e6b
Find nvcc
frostedoyster Feb 13, 2024
3e2237c
Find the CUDA compiler everywhere
frostedoyster Feb 13, 2024
cf3b46f
Found.
frostedoyster Feb 13, 2024
ed4995f
Debug
frostedoyster Feb 13, 2024
e346ac8
Try with architectures
frostedoyster Feb 13, 2024
de1a880
Obey nvcc's orders
frostedoyster Feb 13, 2024
958e30c
Debug
frostedoyster Feb 13, 2024
31665af
Try again
frostedoyster Feb 13, 2024
a1961df
Debug
frostedoyster Feb 13, 2024
0d43033
Set CUDA compiler explicitly
frostedoyster Feb 13, 2024
f499acc
Try again
frostedoyster Feb 13, 2024
4080c74
Try installing full cuda toolkit
frostedoyster Feb 13, 2024
9220348
Set torch cuda architectures
frostedoyster Feb 13, 2024
e2f12e8
Add one more exclude
frostedoyster Feb 13, 2024
a76a784
Exclude more
frostedoyster Feb 13, 2024
1f6018c
Build multiple Python versions
frostedoyster Feb 14, 2024
2a3a8bb
Build for multiple CUDA versions
frostedoyster Feb 14, 2024
e484c6e
Try "all" architectures
frostedoyster Feb 14, 2024
4eaf83a
Try All instead of all for torch
frostedoyster Feb 14, 2024
2954252
Remove comma bug after cp311
frostedoyster Feb 14, 2024
248094f
Custom build backend to pin torch+cuda version
frostedoyster Feb 14, 2024
d6c896a
Try again
frostedoyster Feb 14, 2024
60b393f
Try with a new dependency?
frostedoyster Feb 14, 2024
3967c79
Try to get backend.py copied inside docker environmet
frostedoyster Feb 14, 2024
6cbbfb1
Try again
frostedoyster Feb 14, 2024
4d9c9a5
Fix bug
frostedoyster Feb 14, 2024
db72817
Try again
frostedoyster Feb 14, 2024
4f1d087
Make it work
frostedoyster Feb 14, 2024
772530a
Try again
frostedoyster Feb 14, 2024
c4b2434
Debug
frostedoyster Feb 14, 2024
de8770b
Try again
frostedoyster Feb 14, 2024
484fefb
Debug
frostedoyster Feb 14, 2024
508351b
Debug
frostedoyster Feb 14, 2024
e514149
???
frostedoyster Feb 14, 2024
61ecc74
Try again
frostedoyster Feb 14, 2024
a9edab7
Try again
frostedoyster Feb 14, 2024
165d583
More bugs
frostedoyster Feb 14, 2024
b12db15
More torch versions
frostedoyster Feb 14, 2024
5c26471
Debug
frostedoyster Feb 14, 2024
912187e
Bug?
frostedoyster Feb 14, 2024
b0c1f20
Do not run on pwsh
frostedoyster Feb 14, 2024
c79908a
Debug
frostedoyster Feb 14, 2024
92bc818
Try again
frostedoyster Feb 14, 2024
3ab255d
??
frostedoyster Feb 14, 2024
1568d77
1.13.0 does not support cuda 11.8?
frostedoyster Feb 14, 2024
f81b3db
Take out torch 1.13
frostedoyster Feb 14, 2024
276ecf6
Try to exclude Python 3.12 from torch 2.0 and 2.1
frostedoyster Mar 26, 2024
a94ec4f
Merge branch 'main' into cuda-wheels
frostedoyster Mar 28, 2024
db46abf
Merge branch 'main' into cuda-wheels
nickjbrowning Nov 1, 2024
59682f6
Merge branch 'main' into cuda-wheels
nickjbrowning Nov 1, 2024
a7a1f1e
update
nickjbrowning Nov 1, 2024
eb9c257
update
nickjbrowning Nov 1, 2024
2fa074e
update
nickjbrowning Nov 1, 2024
efe5e7f
stop cuda wheel actions.
nickjbrowning Nov 1, 2024
0939da0
aarch64 update
nickjbrowning Nov 1, 2024
25a905d
added dockerfiles
nickjbrowning Nov 1, 2024
b38974e
path change
nickjbrowning Nov 1, 2024
9a7bb6a
docker file updates
nickjbrowning Nov 1, 2024
d4083fe
dockerfile update.
nickjbrowning Nov 1, 2024
b7d49fc
x
nickjbrowning Nov 1, 2024
01ccc37
dockerfile updates
nickjbrowning Nov 1, 2024
42cda7f
dockerfile updates
nickjbrowning Nov 1, 2024
26110ca
docker file updates
nickjbrowning Nov 1, 2024
0438400
QEMU
nickjbrowning Nov 1, 2024
0a787dc
update.
nickjbrowning Nov 1, 2024
3aab1b8
update
nickjbrowning Nov 1, 2024
6fa992e
update only use 2.4 for now.
nickjbrowning Nov 1, 2024
364e13f
disable unecessary cibw script.
nickjbrowning Nov 1, 2024
2647254
pip index url.
nickjbrowning Nov 1, 2024
22808b3
debug
nickjbrowning Nov 1, 2024
cb7b3e2
try no isolation
nickjbrowning Nov 1, 2024
21da7f1
added deps
nickjbrowning Nov 1, 2024
bcef36d
x
nickjbrowning Nov 1, 2024
f6c1fe5
test
nickjbrowning Nov 1, 2024
8db9115
retry
nickjbrowning Nov 3, 2024
3f2eafa
swapped to nightly.
nickjbrowning Nov 4, 2024
4ab5734
update to aarch64
nickjbrowning Nov 4, 2024
de1e330
test + linting
nickjbrowning Nov 4, 2024
5615f36
test
nickjbrowning Nov 4, 2024
74fb5e7
x
nickjbrowning Nov 4, 2024
e4f2b6f
test
nickjbrowning Nov 4, 2024
4955a48
test
nickjbrowning Nov 4, 2024
5e2daf5
test
nickjbrowning Nov 4, 2024
5b8856c
test
nickjbrowning Nov 15, 2024
8487603
moved away from nightly.
nickjbrowning Nov 15, 2024
5612be8
backtrack to 2.4.1
nickjbrowning Nov 15, 2024
7d4f43b
adjustments
nickjbrowning Nov 15, 2024
052d434
revert
nickjbrowning Nov 15, 2024
8de8169
small updates
nickjbrowning Nov 15, 2024
bdf55a9
changed how pytorch is installed
nickjbrowning Nov 15, 2024
201db81
removed sphericart wheel build.
nickjbrowning Nov 15, 2024
dd80dec
updated build backend
nickjbrowning Nov 15, 2024
3abd603
added setup tools
nickjbrowning Nov 15, 2024
0136bd7
added some print statments
nickjbrowning Nov 15, 2024
395da79
added torch version print
nickjbrowning Nov 15, 2024
f96fb0d
shuffle build-backend
nickjbrowning Nov 15, 2024
cdfc7bc
test
nickjbrowning Nov 15, 2024
f5c99e8
changed to before_all
nickjbrowning Nov 15, 2024
b18a95c
remove versioning requirement
nickjbrowning Nov 16, 2024
02e7ec6
update build-system
nickjbrowning Nov 16, 2024
f4eac49
removed build, changed PIP_EXTRA_INDEX
nickjbrowning Nov 16, 2024
8a3437e
removed |
nickjbrowning Nov 16, 2024
1f05b8e
typo.
nickjbrowning Nov 16, 2024
6f32cc2
updated backend.py
nickjbrowning Nov 16, 2024
b8b1cd7
removed CIBW_BEFORE_ALL
nickjbrowning Nov 16, 2024
381990e
print debug
nickjbrowning Nov 16, 2024
c6e6678
fixed build-backend
nickjbrowning Nov 16, 2024
cc79ddd
debug print pytorch version from cmakelist.
nickjbrowning Nov 16, 2024
9bbc797
added cibw-python
nickjbrowning Nov 16, 2024
9fccacb
not sure if stream guard is needed.
nickjbrowning Nov 17, 2024
7f1bd93
forgot one.
nickjbrowning Nov 17, 2024
8404ccd
added CUDAToolkit_INCLUDE_DIRS to cmakelist
nickjbrowning Nov 17, 2024
af044b9
added DL libs
nickjbrowning Nov 17, 2024
cc41d19
added TORCH_INCLUDE_DIRS
nickjbrowning Nov 17, 2024
316e818
added cu specifier to SPHERICART_TORCH_BUILD_WITH_TORCH_VERSION
nickjbrowning Nov 17, 2024
29a5a63
try this?
nickjbrowning Nov 17, 2024
b16ee48
rebuild with original torch spec.
nickjbrowning Nov 17, 2024
58c0ed6
x
nickjbrowning Nov 17, 2024
c8bd01b
test
nickjbrowning Nov 17, 2024
bcd611f
fixed build with cpu torch
nickjbrowning Nov 17, 2024
5d103d2
update latest
nickjbrowning Nov 18, 2024
47b7471
removed build messages.
nickjbrowning Nov 18, 2024
f86c6fb
formatting.
nickjbrowning Nov 18, 2024
e3a38ee
removed merge-torch-wheels for now.
nickjbrowning Nov 18, 2024
2ed8d14
added CUDAToolkit_INCLUDE_DIRS
nickjbrowning Nov 18, 2024
18c4c83
added print statement for CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES
nickjbrowning Nov 18, 2024
37f3d1c
add target_compile_definitions
nickjbrowning Nov 18, 2024
6acbbb4
changed to PUBLIC
nickjbrowning Nov 18, 2024
b3c57f3
change C10_CUDA_NO_CMAKE_CONFIGURE_FILE to PRIVATE
nickjbrowning Nov 18, 2024
4a3418c
updates to docker and cibuildwheels calls.
nickjbrowning Dec 2, 2024
8b34def
tmp fix to enable docker to find remove script.
nickjbrowning Dec 2, 2024
4026bc4
lets try aarch64...
nickjbrowning Dec 2, 2024
8fcd1a1
build matrix update for aarch64
nickjbrowning Dec 2, 2024
ac93fa1
x
nickjbrowning Dec 2, 2024
0099796
env update.
nickjbrowning Dec 2, 2024
0edc8b6
updated aarch64 dockerfile.
nickjbrowning Dec 2, 2024
0357482
try again.
nickjbrowning Dec 2, 2024
8a5f2f7
fixes.
nickjbrowning Dec 2, 2024
b28653c
copy removed_unused_python.
nickjbrowning Dec 2, 2024
92aad85
removed prints.
nickjbrowning Dec 2, 2024
892217c
remove aarch64.
nickjbrowning Dec 2, 2024
6af1ed1
try to force the torch build.
nickjbrowning Dec 2, 2024
9b8abb5
test
nickjbrowning Dec 2, 2024
d69de58
test
nickjbrowning Dec 2, 2024
a8b7766
undo
nickjbrowning Dec 2, 2024
1d839d8
try explicitly command line.
nickjbrowning Dec 2, 2024
1886b38
turn off arch_native.
nickjbrowning Dec 2, 2024
313926e
lets try aarch64 again.
nickjbrowning Dec 2, 2024
fb67e95
re-add qemu
nickjbrowning Dec 2, 2024
8c69988
qemu update
nickjbrowning Dec 2, 2024
d282955
git unused.
nickjbrowning Dec 2, 2024
f6ebbbb
disable wheels.
nickjbrowning Dec 2, 2024
eea95e0
jax updates.
nickjbrowning Dec 2, 2024
690bcbb
add artifact upload.
nickjbrowning Dec 2, 2024
53e89d4
bug fix.
nickjbrowning Dec 2, 2024
d93ec56
try jax
nickjbrowning Dec 2, 2024
164acce
aded pybind11
nickjbrowning Dec 2, 2024
16d0ad2
added ALL CUDA architecture.
nickjbrowning Dec 2, 2024
1bc8d22
force a retrigger
nickjbrowning Dec 2, 2024
11750c0
some fixes for jax.
nickjbrowning Dec 2, 2024
570e756
removed double enableCuda
nickjbrowning Dec 2, 2024
9a7cd5f
updated cmake requirements
nickjbrowning Dec 2, 2024
98 changes: 98 additions & 0 deletions .github/workflows/build-cuda-wheels.yml
@@ -0,0 +1,98 @@
# inspired by https://github.com/AutoGPTQ/AutoGPTQ/blob/main/.github/workflows/build_wheels_cuda.yml

name: Build Python wheels with CUDA

on:
  push:
    branches: [main]
    tags: ["*"]
  pull_request:
    # Check all PR
Contributor

Do we really want to build all CUDA wheels on every PR? I'm worried this will increase CI time a lot.


env:
  SPHERICART_NO_LOCAL_DEPS: "1"

jobs:
  build_wheels:
    name: ${{ matrix.os }}, torch ${{ matrix.torch }}, cuda ${{ matrix.cuda }}
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-20.04]
        torch: ["2.0.0", "2.1.0", "2.2.0"]
        cuda: ["11.8", "12.1"]
        exclude:
          # torch 1.13.1 does not support cuda 12.1
          # (this entry matches nothing while 1.13.1 is absent from the matrix above)
          - os: ubuntu-20.04
            torch: "1.13.1"
            cuda: "12.1"

    steps:
      - name: Prepare cuda_no_point variable
        id: prepare_cuda_no_point
        run: |
          export cuda_no_point=$(echo ${{ matrix.cuda }} | sed 's/\.//')
          # write the step output through GITHUB_OUTPUT (::set-output is deprecated)
          echo "cuda_no_point=$cuda_no_point" >> "$GITHUB_OUTPUT"

      - name: free disk space
        run: sudo rm -rf /usr/share/dotnet /usr/local/lib/android || true

      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"

      - name: Install cibuildwheel
        run: python -m pip install cibuildwheel build

      - name: Build sphericart wheels
        run: |
          # ensure we build the wheel from the sdist, not the checkout
          python -m build --sdist . --outdir dist
          python -m cibuildwheel dist/*.tar.gz --output-dir dist
        env:
          CIBW_BUILD_VERBOSITY: 3
          # build wheels for CPython 3.9 to 3.12
          CIBW_BUILD: cp39-* cp310-* cp311-* cp312-*
          # skip musl and 32-bit builds
          CIBW_SKIP: "*-musllinux* *-win32 *-manylinux_i686"
          # use the manylinux2014 image for x86_64 Linux builds
          CIBW_MANYLINUX_X86_64_IMAGE: manylinux2014
          # do not build wheels with -march=native
          CIBW_ENVIRONMENT: SPHERICART_ARCH_NATIVE=OFF

      - name: Build sphericart-torch wheels
        run: |
          # ensure we build the wheel from the sdist, not the checkout
          python -m build --sdist sphericart-torch --outdir sphericart-torch/dist
          python -m cibuildwheel sphericart-torch/dist/*.tar.gz --output-dir sphericart-torch/dist
        env:
          CIBW_BEFORE_ALL: bash /host/home/runner/work/sphericart/sphericart/scripts/cibw-cuda-setup.sh ${{ matrix.cuda }}
Contributor

Shouldn't this also be done for the other sphericart (non-torch) wheels?

Collaborator Author

We don't provide any CUDA functionality for our Python (NumPy) interface.

          CIBW_BUILD_VERBOSITY: 3
          # build cp312 only with torch 2.2.0; an env value is never executed as a
          # shell command, so the selection is done with a workflow expression
          CIBW_BUILD: ${{ matrix.torch == '2.2.0' && 'cp39-* cp310-* cp311-* cp312-*' || 'cp39-* cp310-* cp311-*' }}
          CIBW_SKIP: "*-musllinux* *-win32 *-manylinux_i686"
          CIBW_MANYLINUX_X86_64_IMAGE: manylinux2014
          # set environment variables for sphericart-torch build
          CIBW_ENVIRONMENT: SPHERICART_ARCH_NATIVE=OFF CUDACXX=/usr/local/cuda/bin/nvcc TORCH_CUDA_ARCH_LIST=All CUDAARCHS=all SPHERICART_TORCH_TORCH_VERSION=${{ matrix.torch }} PIP_EXTRA_INDEX_URL=https://download.pytorch.org/whl/cu${{ steps.prepare_cuda_no_point.outputs.cuda_no_point }}
          # do not complain for missing libtorch.so in sphericart-torch wheel
          CIBW_REPAIR_WHEEL_COMMAND_LINUX: |
            auditwheel repair --exclude libtorch.so --exclude libtorch_cpu.so --exclude libtorch_cuda.so --exclude libc10.so --exclude libc10_cuda.so -w {dest_dir} {wheel}

      # - name: Build sphericart-jax wheels
      #   run: |
      #     # ensure we build the wheel from the sdist, not the checkout
      #     python -m build --sdist sphericart-jax --outdir sphericart-jax/dist
      #     python -m cibuildwheel sphericart-jax/dist/*.tar.gz --output-dir sphericart-jax/dist
      #   env:
      #     CIBW_BUILD_VERBOSITY: 3
      #     CIBW_BUILD: cp310-*
      #     CIBW_SKIP: "*-musllinux* *-win32 *-manylinux_i686"
      #     CIBW_MANYLINUX_X86_64_IMAGE: manylinux2014
      #     CIBW_ENVIRONMENT: SPHERICART_ARCH_NATIVE=OFF
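
To make the per-job values in the workflow above easier to check, here is a minimal Python sketch of how each matrix entry could map to the PyTorch wheel index URL and to the cibuildwheel build selectors. The function names `pytorch_index_url` and `build_selectors` are hypothetical and exist only for this illustration; the workflow itself derives these values with `sed` and workflow syntax.

```python
def pytorch_index_url(cuda: str) -> str:
    """Map a CUDA version such as "11.8" to the PyTorch wheel index URL."""
    # same effect as `sed 's/\.//'` in the workflow: drop the first dot only
    cuda_no_point = cuda.replace(".", "", 1)
    return f"https://download.pytorch.org/whl/cu{cuda_no_point}"


def build_selectors(torch: str) -> str:
    """Return the CPython selectors used for a given torch version."""
    selectors = ["cp39-*", "cp310-*", "cp311-*"]
    # the workflow only adds Python 3.12 for torch 2.2.0
    if torch == "2.2.0":
        selectors.append("cp312-*")
    return " ".join(selectors)


# the torch x cuda matrix declared in the workflow
matrix = [(t, c) for t in ("2.0.0", "2.1.0", "2.2.0") for c in ("11.8", "12.1")]

for torch, cuda in matrix:
    print(torch, cuda, build_selectors(torch), pytorch_index_url(cuda))
```

Each printed line corresponds to one CI job, which gives a rough sense of the CI cost raised in the review comment: six jobs, each building three or four Python versions.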
26 changes: 26 additions & 0 deletions scripts/cibw-cuda-setup.sh
@@ -0,0 +1,26 @@
#!/bin/bash

Contributor

Suggested change
set -eu

-e to exit if any command fails, -u to error on undefined bash variables

# Set CUDA version and architecture
CU_VER=${1//./-}
ARCH="x86_64"

# Install CUDA compiler and libraries
yum install -y yum-utils
yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/${ARCH}/cuda-rhel7.repo
yum -y install cuda-toolkit-${CU_VER}.${ARCH} \
    nvidia-driver-latest-dkms

# Clean up YUM caches
yum clean all
rm -rf /var/cache/yum/*

# Configure dynamic linker run-time bindings
echo "/usr/local/cuda/lib64" >> /etc/ld.so.conf.d/999_nvidia_cuda.conf

# Set environment variables
export PATH="/usr/local/cuda/bin:${PATH}"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:${LD_LIBRARY_PATH}"
export CUDA_HOME=/usr/local/cuda
export CUDA_ROOT=/usr/local/cuda
export CUDA_PATH=/usr/local/cuda
export CUDADIR=/usr/local/cuda
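
As a small illustration of the `CU_VER=${1//./-}` substitution in the script above, the following Python sketch computes the package name that ends up being passed to `yum`. The helper name `cuda_toolkit_package` is an assumption made for this example and is not part of the PR.

```python
def cuda_toolkit_package(cuda: str, arch: str = "x86_64") -> str:
    """Build the yum package name the setup script installs for a CUDA version."""
    # same effect as bash ${1//./-}: replace every dot with a dash
    cu_ver = cuda.replace(".", "-")
    return f"cuda-toolkit-{cu_ver}.{arch}"


for version in ("11.8", "12.1"):
    print(version, "->", cuda_toolkit_package(version))
```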
2 changes: 2 additions & 0 deletions sphericart-torch/MANIFEST.in
@@ -7,5 +7,7 @@ recursive-include sphericart *
 recursive-include src *
 recursive-include include *

+recursive-include build-backend *.py
+
include pyproject.toml
include README.md
26 changes: 26 additions & 0 deletions sphericart-torch/build-backend/backend.py
@@ -0,0 +1,26 @@
# this is a custom Python build backend wrapping the setuptools one to set a
# specific torch version as a build dependency, based on an environment
# variable
import os

from setuptools import build_meta

TORCH_VERSION = os.environ.get("SPHERICART_TORCH_TORCH_VERSION")

if TORCH_VERSION is not None:
    TORCH_DEP = f"torch =={TORCH_VERSION}"
else:
    TORCH_DEP = "torch >=1.13"


prepare_metadata_for_build_wheel = build_meta.prepare_metadata_for_build_wheel
build_wheel = build_meta.build_wheel
build_sdist = build_meta.build_sdist


def get_requires_for_build_wheel(config_settings=None):
    defaults = build_meta.get_requires_for_build_wheel(config_settings)
    return defaults + [TORCH_DEP]


get_requires_for_build_sdist = build_meta.get_requires_for_build_sdist
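
The version selection in `backend.py` can be exercised on its own. The sketch below copies that logic into a standalone function so that both branches are easy to see; `torch_requirement` is a hypothetical name used only here, and the real backend reads `os.environ` once at import time instead of taking an argument.

```python
import os


def torch_requirement(environ=os.environ) -> str:
    """Return the torch build requirement, mirroring the logic in backend.py."""
    version = environ.get("SPHERICART_TORCH_TORCH_VERSION")
    if version is not None:
        # exact pin, used when CI builds against one specific torch release
        return f"torch =={version}"
    # fallback when the environment variable is not set
    return "torch >=1.13"


pinned = torch_requirement({"SPHERICART_TORCH_TORCH_VERSION": "2.2.0"})
default = torch_requirement({})
print(pinned)
print(default)
```

In the workflow, `SPHERICART_TORCH_TORCH_VERSION` is passed through `CIBW_ENVIRONMENT`, so the pinned branch is the one taken inside the cibuildwheel container.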
7 changes: 5 additions & 2 deletions sphericart-torch/pyproject.toml
@@ -42,9 +42,12 @@ requires = [
     "setuptools >=44",
     "wheel >=0.36",
     "cmake",
-    "torch >= 1.13",
 ]
-build-backend = "setuptools.build_meta"

+# use a custom build backend to add a dependency on a specific
+# version of torch+cuda
+build-backend = "backend"
+backend-path = ["build-backend"]


[tool.setuptools]
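
For readers unfamiliar with in-tree backends, here is a rough Python imitation of what `backend-path` and `build-backend` mean together: the listed directory is put at the front of `sys.path`, and the named module is imported from it. This is only a sketch under simplifying assumptions. Real build frontends call the hooks in a separate process, and the `backend.py` written below is a stand-in, not the file from this PR.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as project:
    backend_dir = Path(project) / "build-backend"
    backend_dir.mkdir()
    # stand-in for the real backend.py; only the hook name matters here
    (backend_dir / "backend.py").write_text(
        "def get_requires_for_build_wheel(config_settings=None):\n"
        "    return ['torch >=1.13']\n"
    )

    # backend-path = ["build-backend"], resolved relative to the project root
    sys.path.insert(0, str(backend_dir))
    importlib.invalidate_caches()
    try:
        # build-backend = "backend"
        backend = importlib.import_module("backend")
        requires = backend.get_requires_for_build_wheel()
    finally:
        # undo the path and module changes made for the demonstration
        sys.path.remove(str(backend_dir))
        sys.modules.pop("backend", None)

print(requires)
```

This also shows why `MANIFEST.in` has to include `build-backend/*.py`: the wheel is built from the sdist, so the backend module must be present inside it.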