You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I was comparing performance on two MacBooks and was surprised at some of the results.
I used the included benchmark program to compare the following hardware:
2015 MacBook Pro: 2.2 GHz Quad-Core Intel Core i7
2021 MacBook Pro: 8-core Apple M1 Pro
Level
Intel
M1 Pro
Difference
1
70ms
43ms
63%
2
107ms
59ms
81%
3
103ms
60ms
72%
4
108ms
61ms
77%
5
112ms
64ms
75%
6
120ms
74ms
62%
7
157ms
110ms
43%
8
316ms
270ms
17%
9
443ms
429ms
3%
10
2062ms
1340ms
54%
11
4481ms
3142ms
43%
12
9192ms
7309ms
26%
At level 9, M1 Pro is only 3% faster than a 6-year older Intel!
I tried profiling it with xctrace and as best I can tell the performance hit comes from load_u32_unaligned (I can attach the trace output if that would be helpful). I can confirm that UNALIGNED_ACCESS_IS_FAST is set, but beyond that I haven't been able to work out why there's an issue.
Do you have any ideas, or is it really the case that the Intel hardware is simply better at this?
The text was updated successfully, but these errors were encountered:
At level 9 the bottleneck is usually the inner loop in hc_matchfinder_longest_match(). This is one very specific thing, so it is plausible that a CPU that is faster in general is not faster at this specific thing. This inner loop is also not very friendly to CPUs in general, as it doesn't allow for much instruction pipelining. But it's hard to improve.
It doesn't really make sense to talk about just the load_u32_unaligned part of hc_matchfinder_longest_match, since load_u32_unaligned is always inlined and compiles down to a single load. The fact that you are seeing a difference in the time accounted to load_u32_unaligned is probably just a side effect of whatever profiling tool you are using.
ebiggers
changed the title
Comparatively poor performance on Apple Silicon
Comparatively poor compression performance on Apple Silicon
Jan 10, 2023
ebiggers
changed the title
Comparatively poor compression performance on Apple Silicon
Improve hc_matchfinder_longest_match() performance on Apple Silicon
Jan 10, 2023
I was comparing performance on two MacBooks and was surprised at some of the results.
I used the included
benchmark
program to compare the following hardware:2015 MacBook Pro: 2.2 GHz Quad-Core Intel Core i7
2021 MacBook Pro: 8-core Apple M1 Pro
At level 9, M1 Pro is only 3% faster than a 6-year older Intel!
I tried profiling it with xctrace and as best I can tell the performance hit comes from
load_u32_unaligned
(I can attach the trace output if that would be helpful). I can confirm thatUNALIGNED_ACCESS_IS_FAST
is set, but beyond that I haven't been able to work out why there's an issue.Do you have any ideas, or is it really the case that the Intel hardware is simply better at this?
The text was updated successfully, but these errors were encountered: