-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/limited multithreading #23
base: main
Are you sure you want to change the base?
Feature/limited multithreading #23
Conversation
I just notice that they are a lot of “fake change”, I probably have run the autoformatter at some point. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at the diff, there is a lot of removed whitespace. Is this a formatting issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for some reason these change did not appear in vscode comparison tool. I will fix that.
@@ -285,8 +293,8 @@ function randcompress(A::AbstractMatOrLinOp{T}, rcl::ClusterTree, ccl::ClusterTr | |||
|
|||
# compute initial sampling | |||
k = kest; r = opts.noversampling; | |||
Ωcol = randn(n, k+r) | |||
Ωrow = randn(m, k+r) | |||
Ωcol = randn(T,n, k+r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the inclusion of T here correct? I am not sure how the sampling matrix in randomized SVD needs to be for complex numbers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I will check that.
@@ -1,4 +1,5 @@ | |||
using Test, LinearAlgebra, HssMatrices | |||
BLAS.set_num_threads(2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this do here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using too many threads when combining BLAS operations with Julia multithreading results in slowdowns due to CPU oversubscription. I put 2 here as a compromise, 1 could make more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for contributing this. The changes look good to me.
I see that you have unittests already for both the multithreaded and single threaded case. This is great for making sure things work as intended. Are there any cornercases which we haven't thought of due to the introduction of parallelism?
but otherwise LGTM, jsut minor questions on my end
Great work Baptiste! Do you have any intuition why matvec isn't faster? |
also - I would suggest bumping up the version number and pushing a new package once this is merged |
Hi, Regarding the absence of performance gain for matvec, I suspect that the serial version is simply so fast that the multithreading overhead isn’t worth it. The situation may change for larger problem sizes. Overall, the efficiency of the parallel version is not great, but it still provides a useful speedup. I agree that could deserve a version bump. |
Hi,
This PR introduces some parallelization in HssMatrices. Benchmarks on an M1 Pro show a performance increase when parallelization is enabled, with no degradation compared to main when it is disabled.
There is probably room for performance improvement, but I think that is a good start.
Benchmark Results
The benchmarks were performed using an updated benchmarking script for both the feature branch and the old version. Results are as follows:
New Version
Without Multithreading (multithreaded = false)
Benchmarking full...
12.468 ms (1473 allocations: 70.92 MiB)
Benchmarking getindex...
75.417 μs (1227 allocations: 688.86 KiB)
Benchmarking compression...
126.705 ms (10080 allocations: 437.57 MiB)
Benchmarking randomized compression...
47.087 ms (20965 allocations: 65.62 MiB)
Benchmarking re-compression...
2.398 ms (14970 allocations: 4.80 MiB)
Benchmarking proper...
3.492 ms (7490 allocations: 6.37 MiB)
Benchmarking addition...
206.250 μs (1086 allocations: 4.89 MiB)
Benchmarking matvec...
360.417 μs (1374 allocations: 1.48 MiB)
Benchmarking matrix products...
2.243 ms (4640 allocations: 11.13 MiB)
Benchmarking ulvfactsolve...
8.661 ms (7497 allocations: 22.56 MiB)
Benchmarking hssldivide...
42.087 ms (66937 allocations: 76.29 MiB)
With Multithreading (multithreaded = true)
Threads.nthreads() = 6
Benchmarking full...
12.143 ms (1473 allocations: 70.92 MiB)
Benchmarking getindex...
76.583 μs (1227 allocations: 688.86 KiB)
Benchmarking compression...
121.999 ms (10080 allocations: 437.57 MiB)
Benchmarking randomized compression...
47.028 ms (21475 allocations: 65.68 MiB)
Benchmarking re-compression...
988.250 μs (15195 allocations: 4.82 MiB)
Benchmarking proper...
830.834 μs (7565 allocations: 6.38 MiB)
Benchmarking addition...
103.042 μs (1236 allocations: 4.90 MiB)
Benchmarking matvec...
343.208 μs (1629 allocations: 1.51 MiB)
Benchmarking matrix products...
1.382 ms (4865 allocations: 11.15 MiB)
Benchmarking ulvfactsolve...
8.834 ms (7497 allocations: 22.56 MiB)
Benchmarking hssldivide...
19.697 ms (65897 allocations: 72.22 MiB)
Old version
Benchmarking full...
12.572 ms (1473 allocations: 70.92 MiB)
Benchmarking getindex...
77.625 μs (1227 allocations: 688.86 KiB)
Benchmarking compression...
119.936 ms (10080 allocations: 437.57 MiB)
Benchmarking randomized compression...
45.687 ms (20281 allocations: 65.58 MiB)
Benchmarking re-compression...
2.354 ms (14736 allocations: 4.78 MiB)
Benchmarking proper...
3.482 ms (7396 allocations: 6.37 MiB)
Benchmarking addition...
207.500 μs (1085 allocations: 4.89 MiB)
Benchmarking matvec...
316.625 μs (1032 allocations: 1.46 MiB)
Benchmarking matrix products...
2.270 ms (4453 allocations: 11.12 MiB)
Benchmarking ulvfactsolve...
8.861 ms (7497 allocations: 22.56 MiB)
Benchmarking hssldivide...
41.304 ms (63843 allocations: 76.16 MiB)