
Make it optional to build CUDA extension for SAM 2; also fallback to all available kernels if Flash Attention fails #155

Merged

Conversation

@ronghanghu (Contributor) commented Aug 6, 2024

In this PR, we make it optional to build the SAM 2 CUDA extension, since we have observed that many users encounter difficulties with the CUDA compilation step.

  1. During installation, we catch build errors and print a warning message. We also allow explicitly turning off the CUDA extension build with SAM2_BUILD_CUDA=0.
  2. At runtime, we catch CUDA kernel errors from the connected components kernel and print a warning that the post-processing step is being skipped.

We also fall back to all available kernels if the Flash Attention kernel fails.
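In rough terms, the fallback works like the sketch below: first restrict scaled dot-product attention to the Flash Attention backend, and if that raises an error, warn and retry with every available kernel enabled so PyTorch can pick whichever backend works. This is an illustrative sketch using torch.backends.cuda.sdp_kernel, not the exact code added in this PR; the function name is a placeholder.

```python
import warnings

import torch
import torch.nn.functional as F


def attention_with_fallback(q, k, v, dropout_p=0.0):
    # First try scaled dot-product attention restricted to the Flash Attention backend.
    try:
        with torch.backends.cuda.sdp_kernel(
            enable_flash=True, enable_math=False, enable_mem_efficient=False
        ):
            return F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
    except Exception as e:
        # If the Flash Attention kernel fails (e.g. unsupported GPU or dtype),
        # warn and retry with all available kernels enabled.
        warnings.warn(
            f"Flash Attention failed ({e}); falling back to all available kernels."
        )
        with torch.backends.cuda.sdp_kernel(
            enable_flash=True, enable_math=True, enable_mem_efficient=True
        ):
            return F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
```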

@ronghanghu marked this pull request as draft on August 6, 2024 05:42
@ronghanghu force-pushed the ronghanghu/cuda_kernel_optional branch 3 times, most recently from 509f0b1 to 268ad1c on August 6, 2024 15:08
@ronghanghu changed the title from "Make it optional to build CUDA extension for SAM 2; also fallback to math kernel if Flash Attention fails" to "Make it optional to build CUDA extension for SAM 2; also fallback to all available kernels if Flash Attention fails" on Aug 6, 2024
@ronghanghu force-pushed the ronghanghu/cuda_kernel_optional branch 3 times, most recently from 8522a19 to 6943cf6 on August 6, 2024 17:05
@bhack commented Aug 6, 2024

Do you think that a Kornia-like pure PyTorch connected components implementation would be too numerically misaligned?

kornia/kornia#1184

@ronghanghu marked this pull request as ready for review on August 6, 2024 17:36
@ronghanghu (Contributor, Author)

Do you think that a Kornia-like pure PyTorch connected components implementation would be too numerically misaligned?

kornia/kornia#1184

@bhack Thanks for the suggestion! We also tried this Kornia implementation before, but it was too slow for video applications (it uses an iteration loop in Python and its algorithm has not been carefully optimized for GPUs), so we added a custom CUDA kernel in connected_components.cu instead, which is much faster.
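For context, the Kornia routine being discussed is an iterative, pure-PyTorch connected components implementation; a minimal usage sketch (the mask shape and iteration count here are illustrative) looks like:

```python
import torch
from kornia.contrib import connected_components

# Binary mask of shape (B, 1, H, W); connected_components expects a
# floating-point tensor with values in {0, 1}.
mask = (torch.rand(1, 1, 256, 256) > 0.5).float()

# Iterative label propagation in pure PyTorch; num_iterations bounds the
# Python loop discussed above, and large masks may need more iterations.
labels = connected_components(mask, num_iterations=100)
print(labels.shape, labels.unique().numel())
```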

…all available kernels if Flash Attention fails

In this PR, we make it optional to build the SAM 2 CUDA extension, since we have observed that many users encounter difficulties with the CUDA compilation step.
1. During installation, we catch build errors and print a warning message. We also allow explicitly turning off the CUDA extension build with `SAM2_BUILD_CUDA=0`.
2. At runtime, we catch CUDA kernel errors from the connected components kernel and print a warning that the post-processing step is being skipped.

We also fall back to all available kernels if the Flash Attention kernel fails.
@ronghanghu force-pushed the ronghanghu/cuda_kernel_optional branch from 6943cf6 to 1757177 on August 6, 2024 17:45
@bhack commented Aug 6, 2024

Yes, I know that it has loops; it is not easy to implement with pure PyTorch ops. Have you benchmarked how the PyTorch compiler behaves with these loops?

@bhack commented Aug 6, 2024

Quite funny that...
pytorch/pytorch#113538 (comment)

@ronghanghu (Contributor, Author)

Yes, I know that it has loops; it is not easy to implement with pure PyTorch ops. Have you benchmarked how the PyTorch compiler behaves with these loops?

@bhack In our internal benchmarking, the custom CUDA kernel is much (~100x) faster than the kornia implementation, even when we try to optimize the latter (e.g., via torch compilation). Another user reported similar observations (prittt/YACCLAB#28 (comment)).
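A rough way to check how torch.compile behaves on that loop (a sketch of the kind of comparison mentioned here, not the internal benchmark) could be:

```python
import time

import torch
from kornia.contrib import connected_components

# Illustrative input; a real benchmark would use actual video segmentation masks.
mask = (torch.rand(1, 1, 1024, 1024, device="cuda") > 0.5).float()


def bench(fn, iters=20):
    # Warm up (this also triggers compilation for the compiled variant), then time.
    for _ in range(3):
        fn(mask, num_iterations=100)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(mask, num_iterations=100)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters


print(f"eager:    {bench(connected_components) * 1e3:.2f} ms/call")
print(f"compiled: {bench(torch.compile(connected_components)) * 1e3:.2f} ms/call")
```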

ronghanghu referenced this pull request Aug 7, 2024
…be loaded (#175)

Previously, we only caught build errors in `BuildExtension` in https://github.com/facebookresearch/segment-anything-2/pull/155. However, in some cases the `CUDAExtension` instance might not load, so in this PR we also catch such errors for `CUDAExtension`.
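Putting the two installation-time safeguards together, the optional-build logic amounts to something like the following setup.py sketch. This is illustrative only: the extension name, source path, and helper class are assumptions, not the repository's exact setup.py.

```python
# setup.py (simplified sketch; names and paths are illustrative)
import os

from setuptools import setup

# SAM2_BUILD_CUDA=0 turns off building the CUDA extension entirely.
BUILD_CUDA = os.getenv("SAM2_BUILD_CUDA", "1") == "1"

ext_modules = []
cmdclass = {}
if BUILD_CUDA:
    try:
        # Constructing the extension can fail up front, e.g. if the CUDA toolkit
        # is missing or torch.utils.cpp_extension cannot be loaded (the case in #175).
        from torch.utils.cpp_extension import BuildExtension, CUDAExtension

        class BuildExtensionIgnoreErrors(BuildExtension):
            # Catch compilation failures during the build step (the case in #155)
            # and continue without the extension.
            def run(self):
                try:
                    super().run()
                except Exception as e:
                    print(f"Skipping the SAM 2 CUDA extension due to a build error: {e}")

        ext_modules = [
            CUDAExtension("sam2._C", sources=["sam2/csrc/connected_components.cu"])
        ]
        cmdclass = {"build_ext": BuildExtensionIgnoreErrors}
    except Exception as e:
        print(f"Skipping the SAM 2 CUDA extension because it could not be set up: {e}")
        ext_modules, cmdclass = [], {}

setup(name="SAM-2", ext_modules=ext_modules, cmdclass=cmdclass)
```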
fbcotter added a commit to wayveai/segment-anything-2 that referenced this pull request Sep 6, 2024
… into facebookresearch-main

* 'main' of github.com:facebookresearch/segment-anything-2: (40 commits)
  open `README.md` with unicode (to support Hugging Face emoji); fix various typos (facebookresearch#218)
  accept kwargs in auto_mask_generator
  Fix HF image predictor
  improving warning message and adding further tips for installation (facebookresearch#204)
  better support for non-CUDA devices (CPU, MPS) (facebookresearch#192)
  Update hieradet.py
  add Colab support to the notebooks; pack config files in `sam2_configs` package during installation (facebookresearch#176)
  also catch errors during installation in case `CUDAExtension` cannot be loaded (facebookresearch#175)
  Add interface for box prompt in SAM 2 video predictor (facebookresearch#174)
  Address comment
  Update hieradet.py
  Update docstrings
  Revert code snippet
  Updated INSTALL.md with CUDA_HOME-related troubleshooting (facebookresearch#140)
  Format using ufmt
  Update INSTALL.md (facebookresearch#156)
  Update README
  Make it optional to build CUDA extension for SAM 2; also fallback to all available kernels if Flash Attention fails (facebookresearch#155)
  Clean up
  Address comment
  ...
xydy666 pushed a commit to xydy666/segment-anything-2 that referenced this pull request Sep 17, 2024
…all available kernels if Flash Attention fails (facebookresearch#155)

In this PR, we make it optional to build the SAM 2 CUDA extension, since we have observed that many users encounter difficulties with the CUDA compilation step.
1. During installation, we catch build errors and print a warning message. We also allow explicitly turning off the CUDA extension build with `SAM2_BUILD_CUDA=0`.
2. At runtime, we catch CUDA kernel errors from the connected components kernel and print a warning that the post-processing step is being skipped.

We also fall back to all available kernels if the Flash Attention kernel fails.