Add NVIDIA CUDA dependency to README.md #601

yukoba · 2025-02-08T20:24:28Z

The documentation for using CUDA was insufficient and unclear, so I noted in the README.md that the org.bytedeco cuda artifact is required. Apologies if this is not the correct approach.

saudet · 2025-02-08T22:19:47Z

We can add the redist artifact or install CUDA, but we don't need both.

yukoba · 2025-02-08T23:40:26Z

However, in my experiments it does not work unless both are added, though I don't know why. Does it work for you with only one?

Craigacp · 2025-02-09T01:15:57Z

It should work with the CUDA redist, but you'll need the version that matches the TensorFlow release you are using.

yukoba · 2025-02-09T01:23:22Z

TensorFlow 2.16.2 uses CUDA 12.3, but CUDA 12.3 is not supported on Ubuntu 24.04.
https://www.tensorflow.org/install/source#gpu
https://developer.nvidia.com/cuda-12-3-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_network

I installed CUDA 12.3 on Ubuntu 22.04, but it does not work with that alone.

Which OS are you testing on?

saudet · 2025-02-09T10:46:27Z

Please set the "org.bytedeco.javacpp.logger.debug" system property to "true" to get more information on the console.
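For readers who prefer to set the property in code rather than on the command line, here is a minimal sketch. The class name JavaCppDebugLogging is hypothetical, and the sketch assumes the property has to be set before any TensorFlow or JavaCPP class is first used, so that the loader sees it.

```java
public class JavaCppDebugLogging {
    // Property name taken from the comment above.
    static final String DEBUG_PROPERTY = "org.bytedeco.javacpp.logger.debug";

    // Turn on JavaCPP debug logging for this JVM.
    static void enable() {
        System.setProperty(DEBUG_PROPERTY, "true");
    }

    public static void main(String[] args) {
        // Assumption: this runs before any org.tensorflow class is touched.
        enable();

        // ... the code that loads TensorFlow would go here ...

        System.out.println(DEBUG_PROPERTY + "=" + System.getProperty(DEBUG_PROPERTY));
    }
}
```

Passing -Dorg.bytedeco.javacpp.logger.debug=true as a JVM argument should have the same effect without touching the code.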

yukoba · 2025-02-09T13:40:49Z

I created the smallest code that reproduces the issue.
https://github.com/yukoba/TensorFlowJavaBugReport

> Please set the "org.bytedeco.javacpp.logger.debug" system property to "true" to get more information on the console.

Please see https://github.com/yukoba/TensorFlowJavaBugReport/blob/main/README.md

yukoba · 2025-02-09T14:22:41Z

https://github.com/yukoba/TensorFlowJavaBugReport/tree/main?tab=readme-ov-file#result

Debug: Loading library cudart
Debug: Failed to load for cudart@.11.0: java.lang.UnsatisfiedLinkError: no cudart in java.library.path: /usr/java/packages/lib:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib

Regarding the above section,
it seems that @.11.0 is being called from here:
https://github.com/tensorflow/java/blob/v1.0.0/tensorflow-core/tensorflow-core-native/src/main/java/org/tensorflow/internal/c_api/presets/tensorflow.java#L322

However, according to
https://github.com/bytedeco/javacpp-presets/blob/1.5.10/cuda/src/main/java/org/bytedeco/cuda/presets/cudart.java#L45,
JavaCPP uses @.12, and when looking inside
https://repo1.maven.org/maven2/org/bytedeco/cuda/12.3-8.9-1.5.10/cuda-12.3-8.9-1.5.10-linux-x86_64-redist.jar,
it contains /org/bytedeco/cuda/linux-x86_64/libcudart.so.12.

Is this mismatch between @.11 and @.12 the expected behavior?
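One way to see which libcudart versions are actually visible is to scan the directories named in the error message. The following is only a sketch: the class NativeLibFinder and its method are hypothetical names, and it only inspects the directories in java.library.path. It does not reproduce JavaCPP's full lookup, which can also extract libraries from jars.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class NativeLibFinder {
    // Return absolute paths of files whose name starts with the given prefix
    // in any directory of a path-separator-delimited list.
    static List<String> findLibraries(String libraryPath, String prefix) {
        List<String> found = new ArrayList<>();
        if (libraryPath == null || libraryPath.isEmpty()) {
            return found;
        }
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            File[] matches = new File(dir).listFiles((d, name) -> name.startsWith(prefix));
            if (matches == null) {
                continue; // not a directory, or not readable
            }
            for (File f : matches) {
                found.add(f.getAbsolutePath());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String path = System.getProperty("java.library.path");
        List<String> libs = findLibraries(path, "libcudart.so");
        if (libs.isEmpty()) {
            System.out.println("No libcudart.so* found in java.library.path");
        } else {
            libs.forEach(System.out::println);
        }
    }
}
```

The file names it prints (for example a .so.12 suffix) show which major version is present in those directories.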

Craigacp · 2025-02-09T17:59:16Z

What's the output of TF_CPP_MAX_VLOG_LEVEL=3 mvn compile exec:java on your system?

Craigacp · 2025-02-09T18:16:18Z

I have your example working on WSL2 without the JavaCPP CUDA reference in the pom file. I installed CUDA 12.8 (the WSL 2 version, as the standard version made TF Python work but didn't work with TF Java due to library loading weirdness that I didn't run down), and cuDNN 8.9. After symlinking cuDNN into the right location and rerunning sudo ldconfig both TF Python and TF-Java work.

It might just be that you don't have cuDNN installed, and the JavaCPP artifact contains cuDNN, so things work once it is added. You should be able to diagnose that by running with TF_CPP_MAX_VLOG_LEVEL=3 set, as it will complain about the lack of cuDNN if that's the issue. cuDNN isn't packaged in the CUDA toolkit; you need to download it separately because it has a different license from CUDA.
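The verbose log can be long, so a small filter may help when looking for cuDNN messages. This is a sketch under assumptions: the class CudnnLogFilter is a hypothetical helper, and since the exact wording TensorFlow uses is not assumed here, it simply keeps every line that mentions "cudnn" in any letter case.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class CudnnLogFilter {
    // Keep only the lines that contain the keyword, ignoring case.
    static List<String> linesMentioning(List<String> logLines, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        return logLines.stream()
                .filter(line -> line.toLowerCase(Locale.ROOT).contains(needle))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Read the log from standard input so the tool can sit at the end of a pipe.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> all = in.lines().collect(Collectors.toList());
        linesMentioning(all, "cudnn").forEach(System.out::println);
    }
}
```

With Java 11 or later it could be used as TF_CPP_MAX_VLOG_LEVEL=3 mvn compile exec:java 2>&1 | java CudnnLogFilter.java, assuming the file is saved under that name.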

yukoba · 2025-02-09T18:23:08Z

> What's the output of TF_CPP_MAX_VLOG_LEVEL=3 mvn compile exec:java on your system?

TF_CPP_MAX_VLOG_LEVEL_3.txt

> It might just be that you don't have cuDNN installed

I will try installing cuDNN now.

yukoba · 2025-02-09T18:35:59Z

> It might just be that you don't have cuDNN installed

You're right. Thank you! Installing cuDNN made it work.
I ran sudo apt-get install -y nvidia-driver-550 nvidia-cuda-toolkit nvidia-cudnn and removed org.bytedeco/cuda from the dependencies.

yukoba · 2025-02-09T18:44:13Z

I wrote a one-line note to prevent others from making the same mistake as I did. Could you merge the pull request?

Notes on the required softwares for using an NVIDIA GPU

7dd3eba

yukoba force-pushed the patch-1 branch from ade967f to 7dd3eba on February 9, 2025 18:41

Craigacp approved these changes Feb 9, 2025

Craigacp merged commit d54d231 into tensorflow:master Feb 12, 2025
9 of 10 checks passed