Add NVIDIA CUDA dependency to README.md #601

Merged (1 commit) on Feb 12, 2025

yukoba (Contributor) commented Feb 8, 2025

The documentation for using CUDA was insufficient and unclear, so I noted in README.md that the org.bytedeco cuda artifact is required. Apologies if this is not the correct approach.

saudet (Contributor) commented Feb 8, 2025

We can add the redist artifact or install CUDA, but we don't need both.

yukoba (Contributor Author) commented Feb 8, 2025

However, in my experiments it doesn't work unless both are added, though I don't know why. Does it work with only one?

Craigacp (Collaborator) commented Feb 9, 2025

It should work with the CUDA redist, but you'll need the version that matches the TensorFlow release you are using.

yukoba (Contributor Author) commented Feb 9, 2025

TensorFlow 2.16.2 uses CUDA 12.3, but CUDA 12.3 is not supported on Ubuntu 24.04.
https://www.tensorflow.org/install/source#gpu
https://developer.nvidia.com/cuda-12-3-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_network

I installed CUDA 12.3 on Ubuntu 22.04, but it does not work with that alone.

Which OS are you testing on?

saudet (Contributor) commented Feb 9, 2025

Please set the "org.bytedeco.javacpp.logger.debug" system property to "true" to get more information on the console.
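
For readers following along, here is a minimal sketch of one way to set that property from Java code. The property name comes from the comment above; the class name `JavaCppDebugLogging` and the method name are hypothetical. It assumes the property has to be set before the first JavaCPP or TensorFlow class is loaded, which is why it is the first call in `main`.

```java
class JavaCppDebugLogging {
    // Property name quoted in the comment above.
    static final String DEBUG_PROPERTY = "org.bytedeco.javacpp.logger.debug";

    // Assumption: call this before any JavaCPP or TensorFlow class is
    // touched, so the value is already in place when the logger reads it.
    static void enableDebugLogging() {
        System.setProperty(DEBUG_PROPERTY, "true");
    }

    public static void main(String[] args) {
        enableDebugLogging();
        System.out.println(DEBUG_PROPERTY + "=" + System.getProperty(DEBUG_PROPERTY));
        // Application code that loads TensorFlow Java would start here.
    }
}
```

Passing `-Dorg.bytedeco.javacpp.logger.debug=true` on the `java` command line should have the same effect without a code change.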

yukoba (Contributor Author) commented Feb 9, 2025

I created the smallest code that reproduces the issue.
https://github.com/yukoba/TensorFlowJavaBugReport

> Please set the "org.bytedeco.javacpp.logger.debug" system property to "true" to get more information on the console.

Please see https://github.com/yukoba/TensorFlowJavaBugReport/blob/main/README.md

yukoba (Contributor Author) commented Feb 9, 2025

https://github.com/yukoba/TensorFlowJavaBugReport/tree/main?tab=readme-ov-file#result

Debug: Loading library cudart
Debug: Failed to load for cudart@.11.0: java.lang.UnsatisfiedLinkError: no cudart in java.library.path: /usr/java/packages/lib:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib

Regarding the above section,
it seems that @.11.0 is being called from here:
https://github.com/tensorflow/java/blob/v1.0.0/tensorflow-core/tensorflow-core-native/src/main/java/org/tensorflow/internal/c_api/presets/tensorflow.java#L322

However, according to
https://github.com/bytedeco/javacpp-presets/blob/1.5.10/cuda/src/main/java/org/bytedeco/cuda/presets/cudart.java#L45,
JavaCPP uses @.12, and when looking inside
https://repo1.maven.org/maven2/org/bytedeco/cuda/12.3-8.9-1.5.10/cuda-12.3-8.9-1.5.10-linux-x86_64-redist.jar,
it contains /org/bytedeco/cuda/linux-x86_64/libcudart.so.12.

Is this mismatch between @.11 and @.12 the expected behavior?
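
One way to see which versioned `libcudart` files are actually present is to scan the directories listed in `java.library.path`, the same path that appears in the log above. The sketch below is only a diagnostic aid under stated assumptions: the class name `NativeLibraryScan` and its method are hypothetical, and it only lists files on disk. It does not reproduce how JavaCPP or TensorFlow resolve libraries.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class NativeLibraryScan {
    // Returns every file whose name starts with `prefix` in the directories
    // of `searchPath` (entries separated by the platform path separator).
    static List<Path> findLibraries(String searchPath, String prefix) {
        List<Path> found = new ArrayList<>();
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            Path path = Paths.get(dir);
            if (!Files.isDirectory(path)) {
                continue; // missing directories are common in java.library.path
            }
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(path, prefix + "*")) {
                for (Path file : stream) {
                    found.add(file);
                }
            } catch (IOException e) {
                // Unreadable directory: skip it and keep scanning the rest.
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String searchPath = System.getProperty("java.library.path", "");
        // e.g. libcudart.so.11.0 versus libcudart.so.12
        for (Path library : findLibraries(searchPath, "libcudart.so")) {
            System.out.println(library);
        }
    }
}
```

Calling the same method with the prefix `libcudnn.so` lists any cuDNN libraries in those directories.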

Craigacp (Collaborator) commented Feb 9, 2025

What's the output of TF_CPP_MAX_VLOG_LEVEL=3 mvn compile exec:java on your system?
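
If the output is too long for a terminal, the same command can be launched from Java with the environment variable set and the output captured in a file. This is a rough sketch: the class name `VerboseTensorFlowRun` and the log file name `tf_vlog.txt` are hypothetical, and it assumes `mvn` is on the `PATH` and that it is run from the project directory.

```java
import java.io.File;
import java.io.IOException;

class VerboseTensorFlowRun {
    // Prepares `mvn compile exec:java` with TF_CPP_MAX_VLOG_LEVEL=3 and
    // sends stdout and stderr to the given log file.
    static ProcessBuilder buildCommand(File logFile) {
        ProcessBuilder builder = new ProcessBuilder("mvn", "compile", "exec:java");
        builder.environment().put("TF_CPP_MAX_VLOG_LEVEL", "3");
        builder.redirectErrorStream(true); // merge stderr into stdout
        builder.redirectOutput(logFile);
        return builder;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        File logFile = new File("tf_vlog.txt"); // hypothetical file name
        int exitCode = buildCommand(logFile).start().waitFor();
        System.out.println("mvn exited with " + exitCode + "; log written to " + logFile);
    }
}
```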

Craigacp (Collaborator) commented Feb 9, 2025

I have your example working on WSL2 without the JavaCPP CUDA reference in the pom file. I installed CUDA 12.8 and cuDNN 8.9. I used the WSL 2 version of CUDA, because the standard version made TF Python work but didn't work with TF Java due to library loading weirdness that I didn't run down. After symlinking cuDNN into the right location and rerunning sudo ldconfig, both TF Python and TF Java work.

It might just be that you don't have cuDNN installed, and the JavaCPP artifact contains cuDNN, so things work after that. You should be able to diagnose that by running with TF_CPP_MAX_VLOG_LEVEL=3 set, as it'll complain about the lack of cuDNN if that's the issue. cuDNN isn't packaged in the CUDA toolkit; you need to download it separately, as it has a different license from CUDA.

yukoba (Contributor Author) commented Feb 9, 2025

> What's the output of TF_CPP_MAX_VLOG_LEVEL=3 mvn compile exec:java on your system?

TF_CPP_MAX_VLOG_LEVEL_3.txt

> It might just be that you don't have cuDNN installed

I will try installing cuDNN now.

yukoba (Contributor Author) commented Feb 9, 2025

> It might just be that you don't have cuDNN installed

You're right. Thank you! Installing cuDNN made it work.
I ran sudo apt-get install -y nvidia-driver-550 nvidia-cuda-toolkit nvidia-cudnn and stopped adding org.bytedeco/cuda to the dependencies.

yukoba (Contributor Author) commented Feb 9, 2025

I wrote a one-line note to prevent others from making the same mistake as I did. Could you merge the pull request?

@Craigacp Craigacp merged commit d54d231 into tensorflow:master Feb 12, 2025
9 of 10 checks passed
3 participants