
Benchmarks with Julia v1.5 #8

Open

oschulz opened this issue May 29, 2020 · 4 comments

@oschulz (Collaborator) commented May 29, 2020

Julia v1.5 enables inline allocation of structs with pointers (JuliaLang/julia#34126); this should make UnsafeArrays unnecessary in most cases. New benchmarks, using the test case below:

using Base.Threads, LinearAlgebra
using UnsafeArrays
using BenchmarkTools

function colnorms!(dest::AbstractVector, A::AbstractMatrix)
    @threads for i in axes(A, 2)
        dest[i] = norm(view(A, :, i))
    end
    dest
end

A = rand(50, 10^5);
dest = similar(A, size(A, 2));

colnorms!(dest, A)
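
As a quick, small-scale check of the effect described in JuliaLang/julia#34126, one could also look at the allocation of a single view-based call directly (a minimal sketch, not part of the benchmark; it reuses A and the imports from the snippet above, and exact numbers depend on machine and Julia version):

# Bytes allocated by one view + norm call. On Julia v1.4 each `view(A, :, i)`
# creates a heap-allocated SubArray; on v1.5 the wrapper can be allocated
# inline, so this is expected to drop to zero for this pattern.
check_view_alloc(A) = @allocated norm(view(A, :, 1))

check_view_alloc(A)   # first call includes compilation
check_view_alloc(A)   # expected: > 0 bytes on v1.4, 0 bytes on v1.5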

With Julia v1.4:

julia> nthreads()
64

julia> @benchmark colnorms!(dest, A)
BenchmarkTools.Trial: 
  memory estimate:  4.62 MiB
  allocs estimate:  100323
  --------------
  minimum time:     256.291 μs (0.00% GC)
  median time:      623.428 μs (0.00% GC)
  mean time:        10.020 ms (93.82% GC)
  maximum time:     3.567 s (99.97% GC)
  --------------
  samples:          758
  evals/sample:     1

julia> @benchmark @uviews A colnorms!(dest, A)
BenchmarkTools.Trial: 
  memory estimate:  45.63 KiB
  allocs estimate:  324
  --------------
  minimum time:     227.121 μs (0.00% GC)
  median time:      249.831 μs (0.00% GC)
  mean time:        262.351 μs (1.26% GC)
  maximum time:     4.043 ms (85.49% GC)
  --------------
  samples:          10000
  evals/sample:     1

With Julia v1.5-beta1:

julia> nthreads()
64

julia> @benchmark colnorms!(dest, A)
BenchmarkTools.Trial: 
  memory estimate:  46.61 KiB
  allocs estimate:  321
  --------------
  minimum time:     135.311 μs (0.00% GC)
  median time:      156.681 μs (0.00% GC)
  mean time:        166.511 μs (2.80% GC)
  maximum time:     5.915 ms (89.80% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark @uviews A colnorms!(dest, A)
BenchmarkTools.Trial: 
  memory estimate:  46.66 KiB
  allocs estimate:  322
  --------------
  minimum time:     126.701 μs (0.00% GC)
  median time:      140.041 μs (0.00% GC)
  mean time:        150.547 μs (2.48% GC)
  maximum time:     5.952 ms (90.35% GC)
  --------------
  samples:          10000
  evals/sample:     1

Very little difference in mean runtime with and without @uviews, in contrast to v1.4, where there is a strong difference. Also a very nice gain in speed overall.

Test system: AMD EPYC 7702P 64-core CPU.
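
For context, the @uviews macro from UnsafeArrays is what made the v1.4 numbers tolerable: it temporarily replaces the listed arrays with pointer-based wrappers so that views of them do not heap-allocate. Roughly (a sketch based on the package's documented usage, reusing the definitions above; the one-liner form used in the benchmarks is equivalent):

# Within the block, `A` is replaced by an UnsafeArray wrapper; views of it are
# plain bits-type structs, so `view(A, :, i)` inside colnorms! does not need a
# heap-allocated SubArray on Julia <= v1.4. The unsafe views must not be
# allowed to escape the block.
@uviews A begin
    colnorms!(dest, A)
end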

@mbauman (Member) commented May 29, 2020

I was a little disappointed to still see that 7% difference there, but I was unable to reproduce it on my 6-core Skylake laptop:

julia> @benchmark colnorms!($dest, $A)
BenchmarkTools.Trial:
  memory estimate:  4.16 KiB
  allocs estimate:  31
  --------------
  minimum time:     1.138 ms (0.00% GC)
  median time:      1.191 ms (0.00% GC)
  mean time:        1.213 ms (0.00% GC)
  maximum time:     1.754 ms (0.00% GC)
  --------------
  samples:          4119
  evals/sample:     1

julia> @benchmark @uviews $A colnorms!($dest, $A)
BenchmarkTools.Trial:
  memory estimate:  4.16 KiB
  allocs estimate:  31
  --------------
  minimum time:     1.136 ms (0.00% GC)
  median time:      1.187 ms (0.00% GC)
  mean time:        1.213 ms (0.00% GC)
  maximum time:     2.265 ms (0.00% GC)
  --------------
  samples:          4121
  evals/sample:     1

julia> Sys.cpu_summary()
Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz:
          speed         user         nice          sys         idle          irq
#1-12  2600 MHz   27733057 s          0 s   14434951 s  900004048 s          0 s

julia> nthreads()
6

Is that 7% in your demo just noise? I'd be very interested to know if it's real.
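
One way to check whether such a difference is more than noise would be BenchmarkTools' judge, which compares two trial estimates against a tolerance (a sketch reusing the session above; the trial variable names here are made up):

using BenchmarkTools

# Keep the two trials instead of just printing them (names are illustrative).
t_plain  = @benchmark colnorms!($dest, $A)
t_uviews = @benchmark @uviews $A colnorms!($dest, $A)

# judge() classifies the relative difference of two estimates as :improvement,
# :regression or :invariant, using BenchmarkTools' default time tolerance.
judge(median(t_uviews), median(t_plain))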

@oschulz (Collaborator, Author) commented May 29, 2020

Is that 7% in your demo just noise? I'd be very interested to know if it's real.

I think so; I'll run a few more in-depth checks with thread pinning, etc.

@oschulz (Collaborator, Author) commented May 29, 2020

@mbauman, I ran it again with $ interpolation in @benchmark. There doesn't seem to be any significant difference between using @uviews or not on Julia v1.5 😄:

numactl -C 0-63 julia

using Base.Threads, LinearAlgebra
using UnsafeArrays
using BenchmarkTools

function colnorms!(dest::AbstractVector, A::AbstractMatrix)
    @threads for i in axes(A, 2)
        dest[i] = norm(view(A, :, i))
    end
    dest
end

colnorms_with_uviews!(dest, A) = @uviews A colnorms!(dest, A)


A = rand(50, 10^5);
dest = similar(A, size(A, 2));

colnorms!(dest, A)
colnorms_with_uviews!(dest, A)


julia> versioninfo()
Julia Version 1.5.0-beta1.0
Commit 6443f6c95a (2020-05-28 17:42 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD EPYC 7702P 64-Core Processor

julia> nthreads()
64

julia> @benchmark colnorms!($dest, $A)
BenchmarkTools.Trial: 
  memory estimate:  46.61 KiB
  allocs estimate:  321
  --------------
  minimum time:     95.621 μs (0.00% GC)
  median time:      110.751 μs (0.00% GC)
  mean time:        121.822 μs (2.68% GC)
  maximum time:     4.075 ms (91.13% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark colnorms_with_uviews!($dest, $A)
BenchmarkTools.Trial: 
  memory estimate:  46.63 KiB
  allocs estimate:  321
  --------------
  minimum time:     89.120 μs (0.00% GC)
  median time:      105.310 μs (0.00% GC)
  mean time:        116.689 μs (2.70% GC)
  maximum time:     4.001 ms (90.93% GC)
  --------------
  samples:          10000
  evals/sample:     1

These numbers seem fairly stable when I run it multiple times. So a very small difference remains, but it can't really be due to memory allocation: on 64 threads, any difference in allocation frequency should show up as a clear performance difference.
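
To back that up, the per-call allocations of the two variants can be compared directly (a minimal sketch reusing the definitions above; numbers will vary by machine):

# Warm up both variants so compilation is excluded from the measurement.
colnorms!(dest, A); colnorms_with_uviews!(dest, A);

# Per-call allocation in bytes; both should report roughly the same ~46 KiB,
# matching the memory estimates of the two trials above.
@allocated colnorms!(dest, A)
@allocated colnorms_with_uviews!(dest, A)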

@oschulz (Collaborator, Author) commented May 29, 2020

And the absolute difference between Julia v1.4 and v1.5 (without UnsafeArrays), on 64 threads:

Julia v1.4:

julia> @benchmark colnorms!($dest, $A)
BenchmarkTools.Trial: 
  memory estimate:  4.62 MiB
  allocs estimate:  100323
  --------------
  minimum time:     257.731 μs (0.00% GC)
  median time:      617.504 μs (0.00% GC)
  mean time:        9.535 ms (93.55% GC)
  maximum time:     3.384 s (99.97% GC)
  --------------
  samples:          758
  evals/sample:     1

Julia v1.5:

julia> @benchmark colnorms!($dest, $A)
BenchmarkTools.Trial: 
  memory estimate:  46.61 KiB
  allocs estimate:  321
  --------------
  minimum time:     95.621 μs (0.00% GC)
  median time:      110.751 μs (0.00% GC)
  mean time:        121.822 μs (2.68% GC)
  maximum time:     4.075 ms (91.13% GC)
  --------------
  samples:          10000
  evals/sample:     1

A mean time of 122 μs vs. 9.5 ms before! My deepest thanks to the compiler team for this. I think JuliaLang/julia#34126 will give heavily multi-threaded applications a big boost.

After all, benchmark mean time is usually the number with the strongest influence on application wall-clock time.
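
To make that concrete (a sketch; the loop simply accumulates wall-clock time over repeated calls, using the definitions above):

# Total wall-clock time for N calls is roughly N * mean(time), so the mean,
# which includes the occasional GC-dominated run, is what an application sees.
N = 1000
@time for _ in 1:N
    colnorms!(dest, A)
end
# ~9.5 ms mean on v1.4 would put this loop on the order of 10 s;
# ~122 μs mean on v1.5 puts it on the order of 0.1 s.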
