Since julia-1.10 / OpenBLAS32-0.3.22 the number of threads must be explicitly specified #123

Open
fp4code opened this issue Nov 15, 2023 · 6 comments

fp4code commented Nov 15, 2023

Since Julia 1.10, a simple program like the one below is very slow because only one CPU core is used:

# On a 32-core computer, uncomment next line to have 3200% CPU workload if julia version is > 1.9
# ENV["OPENBLAS_NUM_THREADS"]="32"

using Random, MUMPS, MPI, SparseArrays, LinearAlgebra
N = 10000
Nc = 1
Random.seed!(3)
A = sprand(N, N, 0.1) + I
rhs = rand(N, Nc)
x = Matrix{Float64}(undef, N, Nc)
MPI.Init()
mumps = Mumps{Float64}(mumps_unsymmetric, default_icntl, default_cntl64)
associate_matrix!(mumps, A)
t = @elapsed factorize!(mumps)
associate_rhs!(mumps, rhs)
solve!(mumps)
MUMPS.get_sol!(x, mumps)
finalize(mumps)
MPI.Finalize()
println("maximum error = ", maximum(abs.(A*x - rhs)), ", factorise time (s) = ", t)

The problem does not occur when OpenBLAS32_jll is pinned to 0.3.21: Pkg.add(name="OpenBLAS32_jll", version="0.3.21")

One workaround is to set the OPENBLAS_NUM_THREADS environment variable to the desired number of threads: export OPENBLAS_NUM_THREADS=32

This can be done inside Julia: ENV["OPENBLAS_NUM_THREADS"]="32"
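
A minimal sketch of the in-Julia workaround, assuming (as the placement of the commented line in the script suggests) that the variable must be set before the packages are loaded. Using `Sys.CPU_THREADS` instead of a hard-coded 32, and keeping an existing user setting, are choices made for this example only.

```julia
# Set the thread count before `using MUMPS`, so that OpenBLAS32 can see it
# when the library is first loaded (assumption: the value is read at load time).
if !haskey(ENV, "OPENBLAS_NUM_THREADS")
    # Keep any value the user already exported; otherwise use all hardware threads.
    ENV["OPENBLAS_NUM_THREADS"] = string(Sys.CPU_THREADS)
end
println("OPENBLAS_NUM_THREADS = ", ENV["OPENBLAS_NUM_THREADS"])

# Load the packages only after the variable is set:
# using Random, MUMPS, MPI, SparseArrays, LinearAlgebra
```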

@amontoison
Member

@ViralBShah Are you aware of this issue with the latest release of OpenBLAS?

@fp4code
Author

fp4code commented Nov 15, 2023

@ViralBShah @amontoison There was a typo in the title (now corrected): the problematic package is OpenBLAS32_jll, not OpenBLAS_jll.

The same problem occurs with the latest release of OpenBLAS32_jll (0.3.25):

Slow with default configuration:

Pkg.add(url="https://github.com/JuliaBinaryWrappers/OpenBLAS32_jll.jl", rev="main")
Pkg.add(name="OpenBLAS32_jll", version="0.3.24")
Pkg.add(name="OpenBLAS32_jll", version="0.3.23")
Pkg.add(name="OpenBLAS32_jll", version="0.3.22")

Fast with default configuration:

Pkg.add(name="OpenBLAS32_jll", version="0.3.21")

fp4code changed the title from "Since julia-1.10 / OpenBLAS-0.3.22 the number of threads must be explicitly specified" to "Since julia-1.10 / OpenBLAS32-0.3.22 the number of threads must be explicitly specified" on Nov 15, 2023

@guiburon
Contributor

guiburon commented Feb 12, 2024

FYI, in my case I have to set the environment variable OMP_NUM_THREADS to the desired number of threads per process.

Setting OPENBLAS_NUM_THREADS unlocks multithreading in each process, but its value seems to be ignored: all the CPUs of my system are used. With too many threads, this can slow the computation down.
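
A rough sketch of how the per-process thread count could be chosen when several MPI processes share one node. The helper `threads_per_process` and the value of `nprocs` are hypothetical and only illustrate the arithmetic; the two environment variables are the ones named in this thread.

```julia
# Hypothetical helper: split the hardware threads evenly across MPI processes,
# never going below one thread per process.
function threads_per_process(nprocs::Integer; cpu_threads::Integer=Sys.CPU_THREADS)
    nprocs >= 1 || throw(ArgumentError("nprocs must be at least 1"))
    return max(1, cpu_threads ÷ nprocs)
end

nprocs = 4  # assumed number of MPI processes running on this node

# OMP_NUM_THREADS is the variable reported above to control threads per process.
ENV["OMP_NUM_THREADS"] = string(threads_per_process(nprocs))

# OPENBLAS_NUM_THREADS is set as well, since it was reported to unlock
# multithreading even though its value appeared to be ignored.
ENV["OPENBLAS_NUM_THREADS"] = ENV["OMP_NUM_THREADS"]
```

As with the workaround above, both variables would presumably need to be set before MUMPS is loaded.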

@ViralBShah

ViralBShah commented Feb 12, 2024

I suspect this is because with OpenBLAS, we set the number of threads in our __init__ when LinearAlgebra loads (IIRC), but we don't do such a thing with OpenBLAS32_jll, which is a package generated by BinaryBuilder (BB).

I had created OpenBLAS32.jl, which may be a place where we can specify number of threads on loading and such.

@amontoison
Member

@ViralBShah
This is a little off-topic, but you should check in OpenBLAS32.jl that an LP64 BLAS / LAPACK is not already loaded.
I do that for all packages compiled with LBT (libblastrampoline), like Ipopt.jl or HSL.jl.
A user could already be using MKL or AppleAccelerate, and you don't want to modify that setup.
https://github.com/JuliaSmoothOptimizers/HSL.jl/blob/main/src/HSL.jl#L16-L23
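
The kind of check described here can be sketched with the standard library alone. This is an illustration, not a copy of the linked HSL.jl code: `lbt_get_config` is an internal, non-exported function of `LinearAlgebra.BLAS`, and the forwarding call is left as a comment because it needs OpenBLAS32_jll.

```julia
using LinearAlgebra

# Query libblastrampoline for the BLAS/LAPACK libraries it currently forwards to.
config = LinearAlgebra.BLAS.lbt_get_config()

# True when some loaded library already provides the LP64 (32-bit integer) interface.
has_lp64 = any(lib -> lib.interface == :lp64, config.loaded_libs)

if has_lp64
    println("An LP64 BLAS/LAPACK is already loaded; leaving the setup unchanged.")
else
    println("No LP64 backend is loaded; a package could forward one here.")
    # For example (requires OpenBLAS32_jll, so it is not executed in this sketch):
    # LinearAlgebra.BLAS.lbt_forward(OpenBLAS32_jll.libopenblas_path)
end
```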

@amontoison
Member

@fp4code
I think that I found what we need:
https://github.com/JuliaLang/julia/blob/master/stdlib/LinearAlgebra/src/LinearAlgebra.jl#L766

Can you test this in MUMPS.jl with an __init__ function?
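
A sketch of what such an initialization step might look like, assuming it mirrors the logic at the linked LinearAlgebra line: pick a default only when the user has not set any of the thread-count variables that OpenBLAS reads. The function names and the half-of-the-cores default are assumptions for this example, and `BLAS.set_num_threads` is a stand-in that acts on the libraries managed by libblastrampoline, not necessarily on the OpenBLAS32 library that MUMPS uses.

```julia
using LinearAlgebra: BLAS

# Environment variables that OpenBLAS consults for its thread count.
const THREAD_ENV_VARS = ("OPENBLAS_NUM_THREADS", "GOTO_NUM_THREADS", "OMP_NUM_THREADS")

# Hypothetical helper: return `nothing` when the user made an explicit choice,
# otherwise a default thread count (assumed here: half of the hardware threads).
function default_num_threads(env=ENV; cpu_threads::Integer=Sys.CPU_THREADS)
    any(v -> haskey(env, v), THREAD_ENV_VARS) && return nothing
    return max(1, cpu_threads ÷ 2)
end

# Hypothetical function that a package could call from its `__init__`.
function init_blas_threads!()
    n = default_num_threads()
    # Stand-in call: a real fix for OpenBLAS32_jll would have to target that
    # library directly, e.g. through its `openblas_set_num_threads` C function
    # (assumption: not verified against MUMPS.jl).
    n === nothing || BLAS.set_num_threads(n)
    return n
end
```

Passing a plain `Dict` as `env` makes the decision logic easy to check without touching the real environment, for example `default_num_threads(Dict{String,String}(); cpu_threads=32)`.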
