Target-dependent parameters in the multi-target mode #2505
Comments
There is the There is also the If HWY_HAVE_SCALABLE is 0 (which is true for targets other than HWY_SVE, HWY_SVE2, and HWY_RVV), the largest vector with lane type |
Hi @emilmelnikov, I agree with @johnplatts that checking lanes is convenient: this groups targets into three sets: {AVX3, AVX3_DL, AVX3_ZEN4, AVX3_SPR}, {AVX2, AVX10_2}, and {SSE4, SSSE3, SSE2}, which is better than comparing targets directly. May I suggest an alternate approach, namely autotuning? Just try all the variants at runtime and see which is best :) I'm considering hoisting the autotuning state machine into Highway because it's reusable.
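To make the lane-based grouping concrete, here is a minimal sketch of choosing an unroll factor from the vector width instead of from the target name. `UnrollFactorForLanes` is a hypothetical helper, not part of Highway, and the factors it returns are placeholders, not measured values. In real per-target code the lane count would come from the library (for instance `Lanes(d)`); here it is a plain integer so the sketch compiles on its own.

```cpp
#include <cstddef>

// Hypothetical helper (not a Highway API): maps the number of float lanes in
// a full vector to an unroll factor. The returned factors are placeholders
// and would need to be tuned for the actual kernel and hardware.
constexpr std::size_t UnrollFactorForLanes(std::size_t float_lanes) {
  return float_lanes >= 16 ? 2   // 512-bit vectors (16 float lanes)
       : float_lanes >= 8  ? 4   // 256-bit vectors (8 float lanes)
                           : 8;  // 128-bit vectors (4 float lanes) or smaller
}

// The mapping is usable at compile time, e.g. as a template argument.
static_assert(UnrollFactorForLanes(16) == 2, "512-bit group");
static_assert(UnrollFactorForLanes(8) == 4, "256-bit group");
static_assert(UnrollFactorForLanes(4) == 8, "128-bit group");
```

Because the decision depends only on the lane count, a newly added target with the same vector width falls into the existing group without another `#if` branch.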
@johnplatts @jan-wassenberg Thanks for the input!
I've thought about something like that, but considered it to be too much complexity at the time.
That would be really helpful! Alternatively, if an autotuner is too project-specific, I think people would appreciate a short example of how to roll a custom one, either from scratch or by using various Highway tools.
Understandable, but it's not too heavy: perhaps 100 LOC which we can lift into Highway, and 50-100 on the app side. Adding hwy/autotune.h is on my TODO :)
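Until something like that exists in the library, a from-scratch autotuner can be quite small. The following is only a sketch of the general idea (measure every candidate a few times, keep the one with the lowest cost) and makes no claim about how the planned Highway version works. The names `ElapsedSeconds` and `PickBestCandidate` are hypothetical.

```cpp
#include <chrono>
#include <cstddef>
#include <limits>

// Runs `func` once and returns the elapsed wall-clock time in seconds.
template <typename Func>
double ElapsedSeconds(Func&& func) {
  const auto t0 = std::chrono::steady_clock::now();
  func();
  const auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(t1 - t0).count();
}

// Returns the index in [0, num_candidates) with the lowest cost, where the
// cost of a candidate is the minimum over `reps` calls to `cost(i)`. Taking
// the minimum reduces the influence of noisy measurements.
// Returns 0 if there are no candidates.
template <typename Cost>
std::size_t PickBestCandidate(std::size_t num_candidates, Cost&& cost,
                              int reps = 3) {
  std::size_t best = 0;
  double best_cost = std::numeric_limits<double>::infinity();
  for (std::size_t i = 0; i < num_candidates; ++i) {
    double min_cost = std::numeric_limits<double>::infinity();
    for (int r = 0; r < reps; ++r) {
      const double c = cost(i);
      if (c < min_cost) min_cost = c;
    }
    if (min_cost < best_cost) {
      best_cost = min_cost;
      best = i;
    }
  }
  return best;
}
```

On the application side, `cost(i)` would typically wrap `ElapsedSeconds` around one call of the kernel compiled with the i-th unroll factor, and the winning index would be cached for later calls.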
In my multi-target (`<hwy/foreach_target.h>`) code, I need to define some hardware-dependent parameters (specifically, the unroll factor in order to utilize all available FP execution ports). This theoretically depends on a specific CPU model, but could be reasonably approximated by detecting the highest supported SIMD ISA (at least on x86 it seems to be true).

Currently, it is possible to get the current target in the multi-target mode with `#if HWY_TARGET == HWY_<<<isa>>>`. Is there any better way to do this than a series of ifdefs for each possible target? What is the best approach?

The `HWY_<<<isa>>>` symbols are defined in `detect_targets.h` as 64-bit constants, and the docs say that the lower value is "better", so it's theoretically possible to use comparisons for conditional compilation. Is this a good approach? Are these values considered stable?