CPU benchmarking #361

apaloczy · 2024-06-19T10:50:57Z

apaloczy
Jun 19, 2024
Collaborator

I have been attempting some benchmarking of MultiLayerQG simulations on CPUs (following the GPU example in runtests.jl):

using GeophysicalFlows
using BenchmarkTools

nlayers = 2

dev = CPU()
println("Number of layers: "*string(nlayers))
prob = MultiLayerQG.Problem(nlayers, dev)
# Interpolate the non-constant global with $ so @btime does not also time global-variable lookup
@btime stepforward!($prob)

I'm seeing a general increase in runtime with the number of threads beyond around 4-8, though (see output below). Could this be due to some known bottleneck? I've tried the same with 10 layers and also with SingleLayerQG, with similar results.

Number of layers: 2
  13.258 ms (241 allocations: 16.61 KiB)
[ Info: FourierFlows will use 2 threads
Number of layers: 2
  9.174 ms (23537 allocations: 2.07 MiB)
[ Info: FourierFlows will use 4 threads
Number of layers: 2
  10.010 ms (23537 allocations: 2.07 MiB)
[ Info: FourierFlows will use 8 threads
Number of layers: 2
  11.386 ms (23537 allocations: 2.07 MiB)
[ Info: FourierFlows will use 16 threads
Number of layers: 2
  13.705 ms (23537 allocations: 2.07 MiB)
[ Info: FourierFlows will use 32 threads
Number of layers: 2
  19.538 ms (23537 allocations: 2.07 MiB)
[ Info: FourierFlows will use 64 threads
Number of layers: 2
  27.591 ms (23537 allocations: 2.07 MiB)
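
The post does not say how the thread count was changed between runs. As a minimal sketch of one way to drive such a sweep: Julia fixes its thread count at startup (via `--threads` or `JULIA_NUM_THREADS`), so each count needs a fresh process. The file name `benchmark_cpu.jl` and the helper `sweep_command` are hypothetical; the file is assumed to contain the script above.

```julia
# Hypothetical driver: rerun the benchmark script once per thread count.
# ASSUMPTION: the script above is saved as "benchmark_cpu.jl" (a made-up name).
script = "benchmark_cpu.jl"

# Build one command per thread count, reusing the currently running Julia binary.
sweep_command(nthreads) = `$(Base.julia_cmd()) --threads=$nthreads $script`

for nthreads in (1, 2, 4, 8, 16, 32, 64)
    cmd = sweep_command(nthreads)
    println(cmd)
    # Only launch the child process if the script exists in the working directory.
    isfile(script) && run(cmd)
end
```

If the packages live in a project environment, the child processes would likely also need a `--project` flag or `JULIA_PROJECT` set.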

glwagner · 2024-06-19T16:05:26Z

glwagner
Jun 19, 2024
Maintainer

Threads have overhead, so it's expected that there will be problem sizes where increasing the thread count no longer produces a speedup. But maybe the problem is too small for that?

3 replies

glwagner Jun 19, 2024
Maintainer

So the result you are looking for is to find the optimal number of threads, and to find that the optimal number of threads increases for bigger problems. That would indicate things are working as expected...

Answer selected by apaloczy

apaloczy Jun 19, 2024
Collaborator Author

That makes sense then. I ran a few more tests with 5, 10, and 20 layers at much higher resolution (N = 2048), and the optimal thread count does increase.

For example, a 5-layer, N = 2048 simulation brings the optimal thread count to 32:

Number of layers: 5
  7.210 s (349907 allocations: 101.44 MiB)
[ Info: FourierFlows will use 2 threads
Number of layers: 5
  6.861 s (497775 allocations: 111.86 MiB)
[ Info: FourierFlows will use 4 threads
Number of layers: 5
  5.280 s (403443 allocations: 105.95 MiB)
[ Info: FourierFlows will use 8 threads
Number of layers: 5
  5.720 s (403443 allocations: 105.95 MiB)
[ Info: FourierFlows will use 16 threads
Number of layers: 5
  5.146 s (457347 allocations: 109.33 MiB)
[ Info: FourierFlows will use 32 threads
Number of layers: 5
  4.292 s (403443 allocations: 105.95 MiB)
[ Info: FourierFlows will use 64 threads
Number of layers: 5
  5.130 s (457347 allocations: 109.33 MiB)
[ Info: FourierFlows will use 128 threads
Number of layers: 5
  7.416 s (497775 allocations: 111.86 MiB)
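
To make the scaling easier to read, the timings above can be turned into speedup and parallel efficiency. This is only a sketch over the numbers reported in this post; it assumes that the first run, which has no "will use N threads" line, was single-threaded, which the output itself does not confirm.

```julia
# Wall times (seconds) copied from the 5-layer, N = 2048 output above.
# ASSUMPTION: the first run, printed without an Info line, used 1 thread.
threads = [1, 2, 4, 8, 16, 32, 64, 128]
times   = [7.210, 6.861, 5.280, 5.720, 5.146, 4.292, 5.130, 7.416]

best       = argmin(times)          # index of the fastest run
speedup    = times[1] ./ times      # relative to the assumed 1-thread run
efficiency = speedup ./ threads     # fraction of ideal linear scaling

println("fastest: ", threads[best], " threads, ", times[best], " s")
for (n, s, e) in zip(threads, speedup, efficiency)
    println(rpad(n, 4), "threads: speedup ", round(s; digits = 2),
            ", efficiency ", round(e; digits = 3))
end
```

Under that assumption the best run (32 threads) is roughly 1.7 times faster than the single-threaded one, so the optimum moves to higher thread counts but the scaling is far from linear.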

Would it make sense to have a simple benchmarking test like this one for CPUs? I see that currently this is only done for GPUs (CC'ing @navidcy).

glwagner Jun 19, 2024
Maintainer

Can you just change the device in the benchmark to use it for CPU?
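
A rough sketch of what a device-agnostic version could look like. It is not the existing benchmark from runtests.jl, whose exact form is not shown in this thread; the helper name `time_step` and its defaults are hypothetical.

```julia
using GeophysicalFlows
using BenchmarkTools

# Hypothetical helper (not part of GeophysicalFlows): time one step on a given device.
# `nx` is passed through as the grid-resolution keyword of MultiLayerQG.Problem.
function time_step(dev; nlayers = 2, nx = 128)
    prob = MultiLayerQG.Problem(nlayers, dev; nx = nx)
    stepforward!(prob)                      # warm-up step so compilation is not timed
    return @belapsed stepforward!($prob)    # minimum elapsed time, in seconds
end

t_cpu = time_step(CPU())
println("CPU: ", round(1e3 * t_cpu; digits = 3), " ms per step")

# On a machine with CUDA set up, the same helper would take GPU() instead.
# ASSUMPTION: GPU timings would also need device synchronization to be meaningful.
# t_gpu = time_step(GPU())
```

Keeping the device as an argument means the CPU and GPU cases share one code path, so only the call site changes.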
