AMDGPU downgrades Waterlily #145

Closed
SimonDanisch opened this issue Jul 17, 2024 · 21 comments · Fixed by #146

@SimonDanisch

I wanted to try out WaterLily on an AMD GPU with AMDGPU.jl, but adding AMDGPU to the project downgrades WaterLily significantly:
image

Any help to run this on my 7900xtx would be appreciated :)

@b-fg
Member

b-fg commented Jul 17, 2024

Hi Simon, thanks for reporting this. What ROCm version do you have installed on your system? I was recently able to run WaterLily on an AMD GPU, but because the ROCm version on that system was quite old (5.1.1) I had to pin AMDGPU.jl to version 0.4.15. Maybe something similar is happening here?
Also, we could restrict WaterLily v1.1 to only be compatible with AMDGPU.jl from a certain version upwards.
It would also be helpful to see the full log of which packages changed versions when adding AMDGPU (if there were more than just WaterLily). Maybe there is an incompatibility with another package.

@SimonDanisch
Author

That wasn't the reason: I just updated all my drivers, including ROCm (5.7 + 6.1). I also don't think Pkg can even install a different AMDGPU version based on the available drivers, if I'm not mistaken.

You can actually reproduce it with this minimal setup (Julia 1.10.4):

]activate --temp
]add AMDGPU@0.9.6 WaterLily@1.1.0
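
For scripted runs, the same reproduction can be sketched with the Pkg API instead of the Pkg REPL mode. This is only a sketch: `reproduce` is a hypothetical helper name, and the version pins are the ones quoted above.

```julia
using Pkg

# Version pins taken from the report above.
specs = [
    Pkg.PackageSpec(name="AMDGPU", version="0.9.6"),
    Pkg.PackageSpec(name="WaterLily", version="1.1.0"),
]

# `reproduce` is a hypothetical helper, not part of Pkg or WaterLily.
function reproduce(specs)
    Pkg.activate(; temp=true)  # throwaway environment, like `]activate --temp`
    Pkg.add(specs)             # resolve both pins together; errors if they cannot be satisfied
    Pkg.status()               # print the versions that were actually installed
end
```

Calling `reproduce(specs)` needs network access; the `Pkg.status()` output then shows which versions the resolver settled on.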

@b-fg
Member

b-fg commented Jul 17, 2024

What happens when you do not specify the versions, i.e.

]activate --temp
]add AMDGPU WaterLily

This works fine on my system, even though it installs AMDGPU v0.6.1. So maybe we should update our version compatibilities.

@b-fg
Member

b-fg commented Jul 17, 2024

Can you please try my temporary fix in the fix_compat branch and report back?

]
activate --temp
add WaterLily#fix_compat
add [email protected]

@SimonDanisch
Author

Yes, this seems to work :)

@b-fg
Member

b-fg commented Jul 17, 2024

Good! I will create the PR and (re)activate some of the compatibilities. If that still works, then we can merge it :)

@SimonDanisch
Author

It does not work though :(
I'm getting lots of:

Reason: unsupported call through a literal pointer (call to jl_gc_run_pending_finalizers)
Reason: unsupported call to an unknown function (call to ijl_pop_handler)

I guess some kernel isn't set up correctly, although I would expect CUDA.jl to also choke on something like jl_gc_run_pending_finalizers ....
Do you have an example that you have tested with AMDGPU that runs fine?

@b-fg b-fg linked a pull request Jul 17, 2024 that will close this issue
@b-fg
Member

b-fg commented Jul 17, 2024

You can try the following MWE

using WaterLily
using AMDGPU

function tgv(p, backend; Re=1600, T=Float32)
    L = 2^p; U = 1; κ=π/L; ν = 1/(κ*Re)
    function uλ(i,xyz)
        x,y,z = @. xyz/L*π                # scaled coordinates
        i==1 && return -U*sin(x)*cos(y)*cos(z) # u_x
        i==2 && return  U*cos(x)*sin(y)*cos(z) # u_y
        return 0.                              # u_z
    end
    Simulation((L, L, L), (0, 0, 0), 1/κ; U=U, uλ=uλ, ν=ν, T=T, mem=backend)
end

function main()
    sim = tgv(5, ROCArray)
    sim_step!(sim)
end

main()

I can correctly run this using ROCm/5.1.1, WaterLily.jl/1.1, AMDGPU.jl/0.4.15 on a Radeon Instinct MI50 32GB.

@SimonDanisch
Author

That actually kills the julia session 😓
[email protected], WaterLily v1.1.0 and ROCm/6.1 on a 7900xtx.
The AMDGPU tests seem to be getting much further on these new versions than on any before!

@b-fg
Member

b-fg commented Jul 17, 2024

That's unfortunate. This week I will get access to LUMI, which I presume has a more updated ROCm version. I will look into it and let you know how that goes.

@weymouth weymouth reopened this Jul 22, 2024
@weymouth
Collaborator

The PR doesn't actually fix this issue.

@b-fg
Member

b-fg commented Jul 23, 2024

Hey Simon, I have successfully run WaterLily on LUMI today (AMD MI250x), which has ROCm/5.2.3. There are a couple of fixes for flows with bodies, but this example worked out of the box. This is my project environment right now:

  [21141c5a] AMDGPU v0.9.6
  [0c68f7d7] GPUArrays v10.3.0
  [63c18a36] KernelAbstractions v0.9.22
  [90137ffa] StaticArrays v1.9.7
  [ed894a53] WaterLily v1.2.0

And I get the following AMDGPU.jl info

julia> AMDGPU.versioninfo()
[ Info: AMDGPU versioninfo
┌───────────┬──────────────────┬───────────┬──────────────────────────────────────┐
│ Available │ Name             │ Version   │ Path                                 │
├───────────┼──────────────────┼───────────┼──────────────────────────────────────┤
│     +     │ LLD              │ -         │ /opt/rocm/llvm/bin/ld.lld            │
│     +     │ Device Libraries │ -         │ /opt/rocm/amdgcn/bitcode             │
│     +     │ HIP              │ 5.2.21153 │ /opt/rocm-5.2.3/lib/libamdhip64.so   │
│     +     │ rocBLAS          │ 2.44.0    │ /opt/rocm-5.2.3/lib/librocblas.so    │
│     +     │ rocSOLVER        │ 3.18.0    │ /opt/rocm-5.2.3/lib/librocsolver.so  │
│     +     │ rocALUTION       │ -         │ /opt/rocm-5.2.3/lib/librocalution.so │
│     +     │ rocSPARSE        │ -         │ /opt/rocm-5.2.3/lib/librocsparse.so  │
│     +     │ rocRAND          │ 2.10.5    │ /opt/rocm-5.2.3/lib/librocrand.so    │
│     +     │ rocFFT           │ 1.0.27    │ /opt/rocm-5.2.3/lib/librocfft.so     │
│     +     │ MIOpen           │ 2.17.0    │ /opt/rocm-5.2.3/lib/libMIOpen.so     │
└───────────┴──────────────────┴───────────┴──────────────────────────────────────┘

[ Info: AMDGPU devices
┌────┬──────┬────────────────────────┬───────────┬────────────┐
│ Id │ Name │               GCN arch │ Wavefront │     Memory │
├────┼──────┼────────────────────────┼───────────┼────────────┤
│  1 │      │ gfx90a:sramecc+:xnack- │        64 │ 63.984 GiB │
└────┴──────┴────────────────────────┴───────────┴────────────┘

I cannot try anything with ROCm/6.x though... which could be the problem.

@SimonDanisch
Author

I just found out that AMDGPU can be used on WSL2 :-O
Now I get:

julia> using AMDGPU

julia> function tgv(p, backend; Re=1600, T=Float32)
           L = 2^p; U = 1; κ=π/L; ν = 1/(κ*Re)
           function uλ(i,xyz)
               x,y,z = @. xyz/L*π                # scaled coordinates
               i==1 && return -U*sin(x)*cos(y)*cos(z) # u_x
               i==2 && return  U*cos(x)*sin(y)*cos(z) # u_y
               return 0.                              # u_z
           end
           Simulation((L, L, L), (0, 0, 0), 1/κ; U=U, uλ=uλ, ν=ν, T=T, mem=backend)
       end
tgv (generic function with 1 method)

julia> function main()
           sim = tgv(5, ROCArray)
           sim_step!(sim)
       end
main (generic function with 1 method)

julia> main()
ERROR: Scalar indexing is disallowed.
Invocation of setindex! resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

@b-fg
Member

b-fg commented Jul 23, 2024

OK, that's progress! I think the only remaining problem is that you might be running Julia with a single thread instead of auto. Can you try julia -t auto ..., or export JULIA_NUM_THREADS=auto? Then validate that Threads.nthreads() > 1. This behaviour will eventually be fixed by #133.
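
The thread-count check suggested above can be done from within the session with a few lines. This is a minimal sketch using only `Threads.nthreads()` from Base; the printed messages are illustrative.

```julia
# Report how many threads the current Julia session was started with.
n = Threads.nthreads()

if n > 1
    println("Julia is running with $n threads.")
else
    # A single-threaded session has to be restarted with more threads,
    # e.g. `julia -t auto` or `export JULIA_NUM_THREADS=auto` beforehand.
    println("Julia is running single-threaded; restart with `julia -t auto`.")
end
```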

@SimonDanisch
Author

Yay, that makes the example work on WSL2 ubuntu!

I do notice now that there seems to be a mismatch in the HIP version on Windows:

Windows

julia> AMDGPU.versioninfo()
[ Info: AMDGPU versioninfo
┌───────────┬──────────────────┬───────────┬─────────────────────────────────────────────────────────────────────────────────────────
│ Available │ Name             │ Version   │ Path                                                                                   ⋯
├───────────┼──────────────────┼───────────┼─────────────────────────────────────────────────────────────────────────────────────────
│     +     │ LLD              │ -         │ C:\\Program Files\\AMD\\ROCm\\6.1\\bin\\ld.lld.exe                                     ⋯
│     +     │ Device Libraries │ -         │ C:\\Users\\sdani\\.julia\\artifacts\\5ad5ecb46e3c334821f54c1feecc6c152b7b6a45\\amdgcn/ ⋯
│     +     │ HIP              │ 5.7.32000 │ C:\\WINDOWS\\SYSTEM32\\amdhip64.DLL                                                    ⋯
│     +     │ rocBLAS          │ 4.1.2     │ C:\\Program Files\\AMD\\ROCm\\6.1\\bin\\rocblas.dll                                    ⋯
│     +     │ rocSOLVER        │ 3.25.0    │ C:\\Program Files\\AMD\\ROCm\\6.1\\bin\\rocsolver.dll                                  ⋯
│     +     │ rocALUTION       │ -         │ C:\\Program Files\\AMD\\ROCm\\6.1\\bin\\rocalution.dll                                 ⋯
│     +     │ rocSPARSE        │ -         │ C:\\Program Files\\AMD\\ROCm\\6.1\\bin\\rocsparse.dll                                  ⋯
│     +     │ rocRAND          │ 2.10.5    │ C:\\Program Files\\AMD\\ROCm\\6.1\\bin\\rocrand.dll                                    ⋯
│     +     │ rocFFT           │ 1.0.27    │ C:\\Program Files\\AMD\\ROCm\\6.1\\bin\\rocfft.dll                                     ⋯
│     -     │ MIOpen           │ -         │ -                                                                                      ⋯
└───────────┴──────────────────┴───────────┴─────────────────────────────────────────────────────────────────────────────────────────
                                                                                                                     1 column omitted

[ Info: AMDGPU devices
┌────┬─────────────────────────┬──────────┬───────────┬────────────┐
│ Id │                    Name │ GCN arch │ Wavefront │     Memory │
├────┼─────────────────────────┼──────────┼───────────┼────────────┤
│  1 │  AMD Radeon RX 7900 XTX │  gfx1100 │        32 │ 23.984 GiB │
│  2 │ AMD Radeon(TM) Graphics │  gfx1036 │        32 │ 12.019 GiB │
└────┴─────────────────────────┴──────────┴───────────┴────────────┘

WSL ubuntu

[ Info: AMDGPU versioninfo
┌───────────┬──────────────────┬───────────┬─────────────────────────────────────────────────────────────────────────────────────┐
│ Available │ Name             │ Version   │ Path                                                                                │
├───────────┼──────────────────┼───────────┼─────────────────────────────────────────────────────────────────────────────────────┤
│     +     │ LLD              │ -         │ /opt/rocm/llvm/bin/ld.lld                                                           │
│     +     │ Device Libraries │ -         │ /home/simi/.julia/artifacts/5ad5ecb46e3c334821f54c1feecc6c152b7b6a45/amdgcn/bitcode │
│     +     │ HIP              │ 6.1.40093 │ /opt/rocm-6.1.3/lib/libamdhip64.so                                                  │
│     +     │ rocBLAS          │ 4.1.2     │ /opt/rocm-6.1.3/lib/librocblas.so                                                   │
│     +     │ rocSOLVER        │ 3.25.0    │ /opt/rocm-6.1.3/lib/librocsolver.so                                                 │
│     +     │ rocALUTION       │ -         │ /opt/rocm-6.1.3/lib/librocalution.so                                                │
│     +     │ rocSPARSE        │ -         │ /opt/rocm-6.1.3/lib/librocsparse.so                                                 │
│     +     │ rocRAND          │ 2.10.5    │ /opt/rocm-6.1.3/lib/librocrand.so                                                   │
│     +     │ rocFFT           │ 1.0.27    │ /opt/rocm-6.1.3/lib/librocfft.so                                                    │
│     +     │ MIOpen           │ 3.1.0     │ /opt/rocm-6.1.3/lib/libMIOpen.so                                                    │
└───────────┴──────────────────┴───────────┴─────────────────────────────────────────────────────────────────────────────────────┘

[ Info: AMDGPU devices
┌────┬────────────────────────┬──────────┬───────────┬────────────┐
│ Id │                   Name │ GCN arch │ Wavefront │     Memory │
├────┼────────────────────────┼──────────┼───────────┼────────────┤
│  1 │ AMD Radeon RX 7900 XTX │  gfx1100 │        32 │ 23.938 GiB │
└────┴────────────────────────┴──────────┴───────────┴────────────┘

@b-fg
Member

b-fg commented Jul 24, 2024

Great! And yes, that mismatch might have caused your original error. For that, you could submit an issue on AMDGPU.jl I guess. So, is this issue resolved now? :)

@SimonDanisch
Author

Yes, thank you :)

@SimonDanisch
Author

I just realized that the error for AMDGPU on Windows could come from this:
image

@b-fg
Member

b-fg commented Aug 21, 2024

Ah, good point. Should we then try using unsafe_trunc(Int32, 1f0) in the AMD extension when Windows is detected (using Sys.iswindows)?

@SimonDanisch
Author

It's not just unsafe_trunc, but I can imagine this being faster on all platforms, so I wouldn't bother detecting the platform... unless this easily introduces numerical problems ;)
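
To make the numerical concern concrete, here is a small CPU-only sketch of how the checked `trunc` and the unchecked `unsafe_trunc` differ in Base Julia. It does not reproduce the GPU error (the screenshot is not preserved here); the variable names are illustrative.

```julia
x = 1.7f0

# In range: both conversions truncate toward zero and agree.
checked   = trunc(Int32, x)         # validates the range, can throw InexactError
unchecked = unsafe_trunc(Int32, x)  # no range check, so no error path

# Out of range: 1f10 exceeds typemax(Int32).
huge = 1f10
threw = try
    trunc(Int32, huge)
    false
catch err
    err isa InexactError            # the checked version throws
end

# The unchecked version does not throw, but the value it returns is arbitrary.
garbage = unsafe_trunc(Int32, huge)

println((checked, unchecked, threw, typeof(garbage)))
```

So swapping in `unsafe_trunc` should be numerically safe only as long as the truncated values are known to fit in `Int32`.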

@b-fg
Member

b-fg commented Aug 21, 2024

We can run some benchmarks I guess, but I do not have access to a Windows machine with AMD GPUs. You could try playing with https://github.com/WaterLily-jl/WaterLily-Benchmarks. I can help if necessary :)
