Ship all CUDA kernels in a single .so #2220

syed-ahmed · 2025-05-19T17:56:13Z

We currently have to build kernels for specific architecture flags (e.g. `-gencode=arch=compute_90a,code=sm_90a`, `-gencode=arch=compute_100,code=sm_100`). `torch.utils.cpp_extension.CUDAExtension` has no way to add flags for a specific source file: the `nvcc` flags in `extra_compile_args` apply to every `.cu` file in the extension. PyTorch's CMake build can set flags per source file, but torchao's build system is not a pure CMake build. As a result, the current workaround is to build one .so per CUDA architecture. To resolve this, we should either add per-source flags to `cpp_extension` or migrate to a pure CMake build.
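A minimal sketch of the current one-.so-per-architecture workaround, assuming each `.cu` file targets exactly one architecture. The file names, the `mypkg` package, and the `_C_sm<arch>` module naming are hypothetical; only the `-gencode` flag format and the `{"cxx": [...], "nvcc": [...]}` shape of `extra_compile_args` accepted by `CUDAExtension` come from real tooling.

```python
from collections import defaultdict


def gencode_flag(arch: str) -> str:
    """Build an nvcc -gencode flag for an arch string such as '90a' or '100'."""
    return f"-gencode=arch=compute_{arch},code=sm_{arch}"


def group_sources_by_arch(source_archs: dict[str, str]) -> dict[str, list[str]]:
    """Group source files by the single architecture each one must be built for."""
    groups: dict[str, list[str]] = defaultdict(list)
    for src, arch in source_archs.items():
        groups[arch].append(src)
    return {arch: sorted(srcs) for arch, srcs in groups.items()}


def extension_specs(package: str, source_archs: dict[str, str]) -> list[dict]:
    """Return one extension spec (i.e. one .so) per architecture.

    Because CUDAExtension applies its nvcc flags to every source in the
    extension, sources needing different -gencode flags must be split into
    separate extensions. Module names here are hypothetical.
    """
    specs = []
    for arch, srcs in sorted(group_sources_by_arch(source_archs).items()):
        specs.append({
            "name": f"{package}._C_sm{arch}",
            "sources": srcs,
            "extra_compile_args": {
                "cxx": ["-O3"],
                "nvcc": ["-O3", gencode_flag(arch)],
            },
        })
    return specs


# In setup.py (requires torch), each spec could become its own extension:
#   from torch.utils.cpp_extension import CUDAExtension
#   ext_modules = [CUDAExtension(**spec) for spec in extension_specs("mypkg", archs)]
```

The cost of this split is that the package must build, ship, and load several shared objects instead of one. Per-source flags in `cpp_extension`, or a pure CMake build, would let all of these sources land in a single `.so`.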

CC: @drisspg
