Windows TLS Emulation is extremely slow #61
Comments
Not sure if it's only the allocator. Even GEMM, which shouldn't allocate much, is abysmally slow when compiled with MSVC, as of https://github.com/mratsim/weave/tree/7e414ad5fc59920a7e97930e636dd411f323c860
We can get 3x perf on both Fibonacci and GEMM by disabling TLS emulation. GEMM does segfault from time to time, though. With Clang on Windows
MSVC can only reach 1.2 TFlops. Given that the microkernel is pure intrinsics, the generated code should be the same.
I wonder why tlsEmulation is still turned on after all these years... cough
Overhead-bound benchmarks like Fibonacci and Depth-First Search are significantly slower on Windows than Linux and Mac.
Config: i9-9980XE 18 cores, 36 threads, with 4.1GHz all core Turbo
On Fibonacci in particular, the default eager futures take 14s under Windows while they take 370ms under Linux, for a whopping slowdown of over 30x.
Lazy futures allocated via alloca take 800ms while they take 180ms under Linux. This points to a memory allocator issue.
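The slowdown factors implied by the Fibonacci timings above can be checked with a few lines of Python. This is only an arithmetic sketch: the dictionary and function names are illustrative, and the numbers are the ones quoted in this issue.

```python
# Fibonacci benchmark timings quoted in this issue, in milliseconds.
timings_ms = {
    "eager futures": {"windows": 14_000, "linux": 370},
    "lazy futures (alloca)": {"windows": 800, "linux": 180},
}


def slowdown(windows_ms, linux_ms):
    """Return how many times slower the Windows run is than the Linux run."""
    return windows_ms / linux_ms


# Slowdown factor per future type.
ratios = {
    name: slowdown(t["windows"], t["linux"]) for name, t in timings_ms.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x slower on Windows")
```

The eager case comes out at roughly 38x and the lazy case at roughly 4.4x, so the gap shrinks a lot once the futures no longer go through the heap allocator, but it does not disappear.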
Memory-bound benchmarks (transpose) and CPU-bound benchmarks (Black-Scholes) seem to behave somewhat similarly to Linux.
Similar issues:
Low priority, as we probably can't do anything more than what we have now in our memory subsystem. It's doubtful that even using Mimalloc on Windows (just for Weave) would help, as our memory pool is based on the same techniques. Lastly, Fibonacci is an extreme case with a computation load of 1 cycle per task, while Weave targets being efficient at 2000 cycles.
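To illustrate why a 1-cycle task is such an extreme case, here is a toy model of useful work versus fixed per-task overhead. The overhead value is a hypothetical placeholder chosen only for illustration, not a measured Weave number, and the `efficiency` function is not part of Weave.

```python
def efficiency(work_cycles, overhead_cycles):
    """Fraction of cycles spent on useful work when each task pays a fixed overhead."""
    return work_cycles / (work_cycles + overhead_cycles)


# Hypothetical per-task scheduling overhead (assumption, not measured).
ASSUMED_OVERHEAD_CYCLES = 200

# Compare a Fibonacci-like task (1 cycle) with the 2000-cycle target grain size.
for work in (1, 2000):
    share = efficiency(work, ASSUMED_OVERHEAD_CYCLES)
    print(f"{work} cycles of work per task: {share:.1%} useful")
```

Under this assumption, almost all time in the 1-cycle case is runtime overhead, so any extra per-task cost on Windows shows up nearly undiluted in the benchmark, while at 2000 cycles the same cost is mostly amortized.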
TODO: benchmark Cilk and TBB to make sure we are not missing something.