
Minimum Size Types #87

Open
jacob-fw opened this issue Mar 24, 2024 · 1 comment
jacob-fw commented Mar 24, 2024

So this may very well just be a flaw in my way of thinking about how to use std::simd, but I haven't been able to find anything that addresses it.
One of the situations that seems potentially problematic to me is something like matrix operations.
To my mind, the simplest way to do this would be an array where each row is represented as a separate std::simd. While I'm not 100% sure about this, from everything I can see it looks like, under the current standard, the compiler will always go for the largest SIMD register available, even when smaller ones exist. For larger matrices this is fine, but for smaller ones that could still benefit from SIMD, there could be a whole lot of unnecessary padding, which I imagine will only get worse as CPUs continue to expand their SIMD options.
Let's say, for example, I have a 7x7 matrix of floats. Most CPUs with SIMD support have either 4x-float or 8x-float registers, either of which would likely work well here (although obviously we would prefer the 8x-float). However, if the CPU supports AVX-512, it has 16x-float registers, and (I may be wrong about this) it looks like in that situation the compiler will use the 16x-float register instead. That's not the end of the world in this scenario, however:
a) It is flat-out unnecessary given that a better option exists: the data takes longer to transfer, occupies more cache, and costs more time and power, since each SIMD instruction has to process double the data it otherwise would.
b) Again, consider what happens as chip makers continue to expand their SIMD features: what happens when 16x-float grows to 32x, 64x, 128x, and so on?

I think the obvious response is going to be to use simd::fixed_size, but then we lose much of the portability if we always have to specify the SIMD width for each machine.

It looks like deduce may be usable to solve this, but I'm not sure.
If not, I would suggest adding this to the standard, to ensure that the type that ends up being used isn't any bigger than we actually need.

Again, if there is something I missed in how this works, please feel free to correct me.

@mattkretz
Owner

The facilities for fixed_size and deduce have been improved and simplified in the C++26 paper. Specifically, std::simd<T, N> is basically std::experimental::simd<T, std::experimental::simd_abi::deduce_t<T, N>>. And even the latter chooses the SIMD register width as you'd like (at least with GCC): the next power of two, if it exists in hardware. You can test this with a simple sizeof check.

Does this address your concern?

@mattkretz mattkretz self-assigned this Mar 25, 2024
@mattkretz mattkretz added the question Further information is requested label Mar 25, 2024