Target-dependent parameters in the multi-target mode #2505

Open
emilmelnikov opened this issue Feb 28, 2025 · 4 comments

Comments

@emilmelnikov
Contributor

In my multi-target (<hwy/foreach_target.h>) code, I need to define some hardware-dependent parameters (specifically, the unroll factor needed to utilize all available FP execution ports). In theory this depends on the specific CPU model, but it can be reasonably approximated by detecting the highest supported SIMD ISA (this seems to hold at least on x86).

Currently, it is possible to get the current target in the multi-target mode with #if HWY_TARGET == HWY_<<<isa>>>. Is there any better way to do this than a series of ifdefs for each possible target? What is the best approach?

The HWY_<<<isa>>> symbols are defined in detect_targets.h as 64-bit constants, and the docs say that the lower value is "better", so it's theoretically possible to use comparisons for conditional compilation. Is this a good approach? Are these values considered stable?

@johnplatts
Contributor

There is the HWY_LANES(T) macro which is equal to the maximum possible number of lanes in a vector of type hwy::HWY_NAMESPACE::Vec<hwy::HWY_NAMESPACE::ScalableTag<T>> for the current HWY_TARGET, and HWY_LANES(T) is equal to hwy::HWY_NAMESPACE::MaxLanes(hwy::HWY_NAMESPACE::ScalableTag<T>()).

There is also the HWY_MAX_BYTES constant, which is equal to the size in bytes of the largest SIMD vector supported on the current HWY_TARGET.

If HWY_HAVE_SCALABLE is 0 (which is true for targets other than HWY_SVE, HWY_SVE2, and HWY_RVV), the largest vector with lane type T on the current HWY_TARGET has exactly HWY_LANES(T) lanes.

@jan-wassenberg
Member

Hi @emilmelnikov, I agree with @johnplatts that checking lanes is convenient: this groups targets into three sets: {AVX3, AVX3_DL, AVX3_ZEN4, AVX3_SPR}, {AVX2, AVX10_2}, and {SSE4, SSSE3, SSE2}, which is better than comparing targets directly.
The values are mostly stable, but we have had to reorder/renumber them in the past when something ran out, and targets will be added in the future.
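
As a rough illustration of grouping by vector size rather than by target ID, here is a minimal sketch. `UnrollForVectorBytes` is a hypothetical helper, not part of Highway, and the unroll factors it returns are placeholders rather than measured values.

```cpp
#include <cstddef>

// Hypothetical helper (not a Highway API): maps the vector size in bytes to
// an unroll factor. The factors are placeholders and would need measuring.
constexpr std::size_t UnrollForVectorBytes(std::size_t vector_bytes) {
  return vector_bytes >= 64   ? 4   // 512-bit group, e.g. the AVX3 family
         : vector_bytes >= 32 ? 2   // 256-bit group, e.g. AVX2
                              : 1;  // 128-bit or smaller, e.g. SSE4
}

// Compile-time checks of the three groups.
static_assert(UnrollForVectorBytes(64) == 4, "512-bit vectors");
static_assert(UnrollForVectorBytes(32) == 2, "256-bit vectors");
static_assert(UnrollForVectorBytes(16) == 1, "128-bit vectors");
```

Inside per-target code, the argument would presumably be `HWY_MAX_BYTES` or `HWY_LANES(float) * sizeof(float)`. On scalable targets these are upper bounds, so the result there is only a guess.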

May I suggest an alternate approach, namely autotuning? Just try all the variants at runtime and see which is best :)
We recently added this for matmul in gemma.cpp: https://github.com/google/gemma.cpp/blob/dev/ops/matmul.h#L439.

I'm considering hoisting the autotuning state machine into Highway because it's reusable.
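
To make the "try all the variants and keep the best" idea concrete, here is a minimal sketch of such a state machine. It is an assumption-laden illustration, not the gemma.cpp implementation and not a Highway API; the `AutoTuner` class and its method names are hypothetical.

```cpp
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical autotuner sketch. Each call site asks for a config, runs the
// kernel with it, and reports the observed cost (e.g. elapsed time).
template <typename Config>
class AutoTuner {
 public:
  // Precondition: candidates must be non-empty.
  explicit AutoTuner(std::vector<Config> candidates)
      : candidates_(std::move(candidates)) {}

  // True once every candidate has been measured once.
  bool Done() const { return next_ >= candidates_.size(); }

  // While exploring: the next untested candidate. Afterwards: the best one.
  const Config& Next() const {
    return Done() ? candidates_[best_] : candidates_[next_];
  }

  // Records the cost of the run that used Next(). Ignored once Done().
  void Record(double cost) {
    if (Done()) return;
    if (cost < best_cost_) {
      best_cost_ = cost;
      best_ = next_;
    }
    ++next_;
  }

 private:
  std::vector<Config> candidates_;
  std::size_t next_ = 0;  // index of the candidate currently being tested
  std::size_t best_ = 0;  // index of the lowest-cost candidate so far
  double best_cost_ = std::numeric_limits<double>::infinity();
};
```

In use, the candidates might be unroll factors such as `{1, 2, 4, 8}`, and the cost could come from `std::chrono::steady_clock` readings taken around each kernel call. A single measurement per candidate is noisy, so a real tuner would likely repeat measurements or take a minimum over several runs.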

@emilmelnikov
Contributor Author

@johnplatts @jan-wassenberg Thanks for the input!

> May I suggest an alternate approach, namely autotuning? Just try all the variants at runtime and see which is best :)

I've thought about something like that, but considered it to be too much complexity at the time.

> I'm considering hoisting the autotuning state machine into Highway because it's reusable.

That would be really helpful! Alternatively, if the autotuner is too project-specific, I think people would appreciate a short example of how to roll a custom one, either from scratch or by using various Highway tools.

@jan-wassenberg
Member

Understandable, but it's not too heavy: perhaps 100 LOC which we can lift into Highway, and 50-100 on the app side. Adding hwy/autotune.h is on my TODO :)
