Cmake v3.30.2 cudart link error #154

Birch-san · 2024-08-05T00:44:36Z

As there wasn't a torch 2.4.0 wheel, I tried building NATTEN myself. It didn't go as smoothly as usual.

Most problems were due to cmake giving misleading/incomplete error messages. These are the various errors I hit along the way:
Birch-san/sdxl-play#3 (comment)

Ultimately I think most problems here were just "my gcc and g++ alternatives didn't point anywhere after Ubuntu upgrade", but there is one change I had to make to setup.py to get it to build, and I'm not sure why cmake wasn't able to figure this out automatically, or try it as a guess:

setup.py

  f"-DNATTEN_CUDA_ARCH_LIST={cuda_arch_list_str}",
+ f"-DCUDA_CUDART_LIBRARY=/usr/local/cuda/lib64/libcudart.so",
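
If the path does have to be passed explicitly, it need not be hard-coded. Below is a minimal sketch, not NATTEN's actual setup.py logic: the helper names are hypothetical, and it assumes the toolkit root is exposed through the conventional CUDA_HOME or CUDA_PATH environment variables and uses the usual Linux lib64 layout.

```python
import os
from pathlib import PurePosixPath


def guess_cudart_path(env=None):
    """Guess where libcudart.so lives (hypothetical helper).

    Assumption: the toolkit root comes from CUDA_HOME or CUDA_PATH, falling
    back to /usr/local/cuda, and the library sits under <root>/lib64.
    """
    env = os.environ if env is None else env
    root = env.get("CUDA_HOME") or env.get("CUDA_PATH") or "/usr/local/cuda"
    return PurePosixPath(root) / "lib64" / "libcudart.so"


def cudart_cmake_arg(env=None):
    """Build the -D option shown in the diff above from the guessed path."""
    return f"-DCUDA_CUDART_LIBRARY={guess_cudart_path(env)}"
```

With neither variable set, this produces the same option as the hard-coded line in the diff.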

Perhaps things have changed because the newer cmake removes the FindCUDA module?

CMake Warning (dev) at CMakeLists.txt:11 (find_package):
  Policy CMP0146 is not set: The FindCUDA module is removed.  Run "cmake
  --help-policy CMP0146" for policy details.  Use the cmake_policy command to
  set the policy and suppress this warning.

This warning is for project developers.  Use -Wno-dev to suppress it.

Anyway, passing in the CUDA_CUDART_LIBRARY option persuaded it to try compiling.

Unfortunately it looks like that wasn't what it wanted… linking failed at the end of all of that.

/home/birch/git/sdxl-play/venv-311/lib/python3.11/site-packages/cmake/data/bin/cmake -E cmake_link_script CMakeFiles/natten.dir/link.txt --verbose=1
/usr/bin/c++ -fPIC  -std=c++17 -shared -Wl,-soname,natten/libnatten.cpython-311-x86_64-linux-gnu.so -o natten/libnatten.cpython-311-x86_64-linux-gnu.so … -lcudart /usr/local/cuda/lib64/libcudart.so /usr/local/cuda/lib64/libnvToolsExt.so -lcudadevrt -lcudart_static -lrt -lpthread -ldl
/usr/bin/ld: cannot find -lcudart: No such file or directory
/usr/bin/ld: cannot find -lcudadevrt: No such file or directory
/usr/bin/ld: cannot find -lcudart_static: No such file or directory

seems like a perfectly typical value for CUDA_CUDART_LIBRARY though. and the library certainly exists:

ls /usr/local/cuda/lib64/ | grep cudart
libcudart.so
libcudart.so.12
libcudart.so.12.2.53
libcudart_static.a

any idea what I'm doing wrong? the errors don't seem rational…

Birch-san · 2024-08-05T00:52:33Z

I guess the reason CUDA_CUDART_LIBRARY was ineffective is that -lcudart appears in the libraries list in addition to /usr/local/cuda/lib64/libcudart.so.

probably what I really need to do is tell it to add /usr/local/cuda/lib64 as a link directory, so that it can find -lcudart -lcudadevrt -lcudart_static in that dir.

just need to remember which cmake convention to use for that…
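
The reasoning above follows from how the linker treats the two forms: a full path is used directly, while `-lNAME` is looked up as libNAME.so or libNAME.a in the `-L` directories and the linker's built-in search path. A rough diagnostic sketch of that lookup is below. The function name and DEFAULT_DIRS are hypothetical, the default directory list is only an assumed stand-in for the real built-in path, and the `-L dir` form with a space is not handled.

```python
import os
import shlex

# Assumed stand-ins for the linker's built-in search path; the real list is
# toolchain-specific (`ld --verbose | grep SEARCH_DIR` prints it).
DEFAULT_DIRS = ("/usr/lib", "/usr/lib/x86_64-linux-gnu", "/lib")


def unresolved_libs(link_command, default_dirs=DEFAULT_DIRS, exists=os.path.exists):
    """Return the -l names that no search directory can satisfy.

    Only the attached forms -L<dir> and -l<name> are recognised. Full paths
    such as /usr/local/cuda/lib64/libcudart.so are ignored because the
    linker does not search for them.
    """
    tokens = shlex.split(link_command)
    search_dirs = [t[2:] for t in tokens if t.startswith("-L") and len(t) > 2]
    search_dirs.extend(default_dirs)
    names = [t[2:] for t in tokens if t.startswith("-l") and len(t) > 2]

    missing = []
    for name in names:
        candidates = [
            os.path.join(directory, f"lib{name}{ext}")
            for directory in search_dirs
            for ext in (".so", ".a")
        ]
        if not any(exists(path) for path in candidates):
            missing.append(name)
    return missing
```

Fed the contents of link.txt, this should list cudart, cudadevrt and cudart_static until a -L/usr/local/cuda/lib64 option appears on the line, provided those libraries are actually in that directory.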

alihassanijr · 2024-08-05T01:11:12Z

Apologies for this; I dropped the ball on the 2.4 release; I'll build those wheels tonight.

I've always had bad experiences with FindCUDA, and unfortunately it's difficult to link with libtorch through cmake without including theirs, and that's when everything goes wrong. Every time I've figured out a way around it, it's been a hack, but somehow torch's docker images and NGC images aren't affected. So I don't think there's anything wrong with your environment; it's just FindCUDA being annoying as usual.

Also, if you know which version of the CUDA toolkit your local torch was compiled with, I can just build that binary first and post the link here -- building wheels takes a while now that 2.4 supports 3 different CTK versions and 5 python versions (together that's 15 CUDA wheels and 5 CPU).

Birch-san · 2024-08-05T01:22:09Z

no worries, there's always too much to be done!

I'm pretty much done for the night but I think my last idea might get it building locally.

for some reason CXXFLAGS='-L/usr/local/cuda/lib64' env var didn't work, as in:

CXXFLAGS='-L/usr/local/cuda/lib64' CUDACXX=/usr/local/cuda/bin/nvcc NATTEN_CUDA_ARCH=8.9 NATTEN_VERBOSE=1 NATTEN_IS_BUILDING_DIST=1 NATTEN_WITH_CUDA=1 NATTEN_N_WORKERS=8 python setup.py bdist_wheel -d out/wheels/cu121/torch/240

and by "didn't work" I mean that it didn't introduce any -L/usr/local/cuda/lib64 option into:
build/lib.linux-x86_64-cpython-311/CMakeFiles/natten.dir/link.txt

so I modified csrc/CMakeLists.txt:

  if(${NATTEN_WITH_CUDA})
    target_link_libraries(natten PUBLIC c10 torch torch_cpu torch_python cudart c10_cuda torch_cuda)
+   message("Adding to target 'natten', link directory: ${CUDA_TOOLKIT_ROOT_DIR}/lib64")
+   target_link_directories(natten PUBLIC ${CUDA_TOOLKIT_ROOT_DIR}/lib64)

And this seems to have succeeded in adding a -L/usr/local/cuda/lib64 to natten.dir/link.txt.
will see how it goes.

=====

> if you know which version of CUDA toolkit your local torch was compiled with I can just build that binary first

Thanks! Is this it?

print(torch._C._cuda_getCompiledVersion())
12010

torch.version.cuda
'12.1'

torch.__version__
'2.4.0+cu121'
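
For reference, the three values above agree with each other: CUDA encodes its version as 1000 * major + 10 * minor, so 12010 is 12.1, which matches the cu121 suffix. A small sketch of that decoding (both function names are hypothetical, and the `cu` tag format is inferred from the version string above rather than from any documented API):

```python
def cuda_version_from_int(compiled):
    """Split CUDA's integer encoding (1000 * major + 10 * minor) into (major, minor)."""
    major, rest = divmod(compiled, 1000)
    return major, rest // 10


def wheel_tag(compiled):
    """Format the version as a 'cu<major><minor>' tag, as in '2.4.0+cu121'."""
    major, minor = cuda_version_from_int(compiled)
    return f"cu{major}{minor}"
```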

alihassanijr · 2024-08-05T01:25:17Z

Yeah, the FindCUDA module is a big pain; I've sometimes been successful in working around it but never wrote it down 😅.

> Thanks! Is this it?

Yes perfect! I'll post that wheel here when it builds.

Birch-san · 2024-08-05T01:32:17Z

ah! my local build succeeded. NATTEN now working with torch 2.4.0. in the end, all I needed was that target_link_directories() patch. wonder why.

alihassanijr · 2024-08-05T01:39:39Z

Oh nice; feel free to drop the diff here or even open a PR; I wouldn't rule out NATTEN's cmake config doing something wrong.

I guess if the actual issue was a linking error in the end it makes sense; I originally thought FindCUDA was just blocking everything. Anyway I'll try and redo the cmake config soon; I hacked it together one time last year when we made the switch and haven't looked at it since.

Birch-san linked a pull request Aug 5, 2024 that will close this issue

Add -L/usr/local/cuda/lib64 to ensure cudart lib can be found #155

Open

alihassanijr added the build-system Issues related to build system label Nov 14, 2024
