Update onnxruntime to 1.20.1 #40
Conversation
Thanks a lot! I can help later with the Windows CI failure. The issue is that the artifact system only understands tar, while the official binaries are zip.
Ah, got it! Thank you.
I updated the binaries. I also changed the macOS URL to universal2. Is that the right thing? I would like this to run out of the box on, say, 4-year-old Macs. I assume the ARM binaries don't give you that, but I am not a Mac user and am interested in your comments.
If the osx platform needs to be tweaked further, can you do it in a PR that updates https://github.com/jw3126/ONNXRunTimeArtifacts |
The artifacts are platform-specific, so old Macs (x86) get the universal tarball and new Macs (aarch64) get their own native tarball. Both will work as they should.
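For illustration, platform-keyed artifact entries of the kind described here look roughly like this in an `Artifacts.toml` (URLs and hashes below are placeholders, not the actual values from this PR):

```toml
# Sketch of platform-specific Artifacts.toml entries (illustrative only).
# Pkg selects the entry whose os/arch match the host platform.
[[onnxruntime]]
os = "macos"
arch = "x86_64"
git-tree-sha1 = "<sha1 of the unpacked tree>"

    [[onnxruntime.download]]
    url = "https://example.com/onnxruntime-osx-universal2-1.20.1.tar.gz"
    sha256 = "<sha256 of the tarball>"

[[onnxruntime]]
os = "macos"
arch = "aarch64"
git-tree-sha1 = "<sha1 of the unpacked tree>"

    [[onnxruntime.download]]
    url = "https://example.com/onnxruntime-osx-arm64-1.20.1.tar.gz"
    sha256 = "<sha256 of the tarball>"
```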
I went through the CUDA test failures and I think they trace back to GPUCompiler.jl. It seems to be a bug which has been resolved in newer versions: https://github.com/JuliaGPU/GPUCompiler.jl/blob/09b4708ba12e0b19e40f85c64e9105cf666c4d62/src/GPUCompiler.jl#L60C2-L63C64 It has this block suggesting it's a known issue.
I think we can ignore it.
Do you have any further thoughts on the PR?
Thanks!
I'll update the script. As for the error, it's not solvable by us, since a dependency fails. I'm happy to experiment with it, but can you give me permission to run the CI? I don't have a GPU, so I can't reproduce it, and it will take forever if I can't iterate quickly.
Saving my notes here: the fixed release would be GPUCompiler 0.26.7.
@jw3126 Why do we have a cuDNN dep in this package? I understand CUDA for the extension, but not cuDNN. I'm fairly confident that the failing version of GPUCompiler is enforced by the cuDNN version. It also seems to be phased out: https://github.com/JuliaAttic/CUDNN.jl suggests using CuArrays.jl, which has been fully absorbed into CUDA.jl. So all in all, I'd suggest removing the cuDNN dep instead of tweaking the versions.
Would love to get rid of that. In the past we had issues with libcudnn (or a similar name) not being found by onnxruntime, and this is how we dealt with it. Nowadays there may or may not be better ways. If you can make Linux (or Windows) GPU support work with a lighter dependency, that would be awesome.
I don't know enough to guarantee that it works, and I have no way to test it. As a first pass, I've bumped cuDNN up to 1.3, which hits patch version 1.3.2 and thus gets the fixed GPUCompiler v0.26.7. You can trigger the CI at your convenience.
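The bump itself is a one-line compat change; assuming cuDNN appears in the compat section of the relevant Project.toml, it would look roughly like:

```toml
# Illustrative compat entry: "1.3" allows any 1.3.x release (semver-style),
# so the resolver can pick 1.3.2, which depends on the fixed GPUCompiler v0.26.7.
[compat]
cuDNN = "1.3"
```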
Thanks, I will check locally later whether it works on Linux.
The current errors are CUDA.jl + Julia 1.12 related. I don't think it's supported yet; see their own CI: https://buildkite.com/julialang/cuda-dot-jl/builds/5531#0193683c-531a-4f7e-ad30-23e4e167be72
Both the CUDA and cuDNN weak dependencies are sort of fake: we really only depend on them in order to get the right artifacts loaded, including libcudnn, so that libonnxruntime can link against them.
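In Project.toml terms, such "fake" weak dependencies would be declared like this (the extension name is hypothetical and the UUIDs are elided; the real ones come from the General registry):

```toml
# Sketch of weak deps used only to pull in the right binary artifacts.
# Replace the "..." placeholders with the packages' registered UUIDs.
[weakdeps]
CUDA = "..."   # loads the CUDA runtime artifacts
cuDNN = "..."  # loads libcudnn so libonnxruntime can link against it

[extensions]
CUDAExt = ["CUDA", "cuDNN"]  # hypothetical extension name
```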
I checked and could not get this branch to work locally. libcudnn still does not ship with CUDA.jl.
Ah, that's a shame. Thanks for trying! Btw, why do we need to pass CI for nightly builds? Shouldn't we check 1.11 instead?
Yeah, sorry, but convnets are pretty common; we can't break them. Julia 1 is always the current release (1.11 right now). Breaking nightly is not a merge blocker.
Ah, yes, of course! I overlooked the Julia 1 CI on my phone. So everything is passing besides nightly? That's good! So will this be merged after all? I took your comment to mean it doesn't work, but CI looks good.
It will not be merged. CI passes because we have no GPU coverage (I don't know a way to run GPU CI for free), but my local testing shows GPU does not work because there is no libcudnn. So if we merged this, we would break GPU support.
Got it! Thanks for explaining. Is there value in updating just the macOS artifact to 1.15 in the current version? It wouldn't help me, but at least it's native, and a lot of the Julia community have ARM Macs. I don't fully understand the failures; are there any next steps (something to do), or do we just sit and wait?
Sure, good suggestion.
I am not exactly keen on this, but making the onnxruntime version a preference is something one can think about.
Presumably we could target the CUDA artifacts directly rather than the higher-level packages, but first we would need to find where the relevant libraries are. I'll have a look at what has changed in the CUDA packaging.
That repository and any information there are irrelevant. The current cuDNN package lives here: https://github.com/JuliaGPU/CUDA.jl/tree/master/lib/cudnn
As far as I can tell nothing has changed in the CUDA packaging recently. I think the only update needed is
which effectively is what happens if you run We get the right For good measure the cuDNN compat in |
@GunnarFarneback thanks! Did you try that? I still get:
with your suggestion.
Yes, with the LocalPreferences update and compat that resolves to a cuDNN 1.3.1 or higher it works locally for me. |
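For context, a LocalPreferences.toml of this kind is usually generated rather than hand-written; with CUDA.jl one can call `CUDA.set_runtime_version!`, which writes something along these lines (keys and version are illustrative):

```toml
# Illustrative LocalPreferences.toml pinning the CUDA runtime used by CUDA.jl.
# Prefer generating it, e.g. CUDA.set_runtime_version!(v"12.4"), over editing by hand.
[CUDA_Runtime_jll]
version = "12.4"
```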
What versions of |
I just saw this error:
I rebooted and recreated the environment. Tests pass, but I get the following warning:
I get that warning too. It happens in https://github.com/jw3126/ONNXRunTime.jl/blob/main/test/test_cuda.jl#L35 when
Oh, that's amazing progress! Thank you both for looking into it. I checked the 1.9 failures; it's because cuDNN 1.4 dropped support for it: https://github.com/JuliaGPU/CUDA.jl/blob/7ff012f21ecaf9364a348289a136deebe299e8d9/lib/cudnn/Project.toml#L17
@GunnarFarneback does this PR look good to you? In particular, are there any other places that need version adjustments?
Could you add 1.10 CI, just so that we are conscious of which is our minimum Julia version? If it does not run on 1.10, that is also fine; just bump to 1.11, even if that means running CI twice.
I did that yesterday night. Check the CI results and the compat in Project.toml. Or do you mean something else?
First of all, thank you for this amazing package!!
I've hit some issues with the outdated binary we were using (I needed IR v10), so I've updated the repo accordingly.
In addition, I've also changed the macOS aarch64 entry to the correct binary; upstream now produces a native one.
TODO list
[x] Update all artifact links to 1.20.1, update SHA1 and SHA256 values
[x] Update src/versions.jl to Cuda 12.0 (as per the announcement of onnxruntime 1.19: "Default GPU packages use CUDA 12.x and Cudnn 9.x (previously CUDA 11.x/CuDNN 8.x) CUDA 11.x/CuDNN 8.x packages are moved to the aiinfra VS feed.")
[x] Update the reference to CUDA 11.8 in the README
[x] Ran tests (locally) -- all passed
[x] Verified that the package loads the model with IR10 and all works
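As a sanity-check sketch for the last two checklist items, loading and running a model with the package's documented entry point might look like this (the model path, input name, and shape below are placeholders):

```julia
# Hypothetical smoke test: load an IR-10 ONNX model and run a CPU inference.
# `load_inference` is ONNXRunTime's documented API; the path and input
# name/shape are placeholders, not values from this PR.
import ONNXRunTime as ORT

model = ORT.load_inference("model_ir10.onnx")             # placeholder path
inputs = Dict("input" => randn(Float32, 1, 3, 224, 224))  # placeholder name/shape
outputs = model(inputs)
```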