This may very well just be a flaw in my mental model of how to use std::simd, but I haven't been able to find anything that addresses it.
One situation that seems potentially problematic to me is matrix operations.
To my mind, the simplest approach would be an array where each row is represented as a separate std::simd. I'm not 100% sure about this, but from everything I can see, with the current standard the compiler will always go for the largest SIMD register available, even when smaller ones exist. For large matrices this is fine, but for smaller ones that could still benefit from SIMD there could be a whole lot of unnecessary padding, and I imagine this only gets worse as CPUs continue to expand their SIMD options.
Say, for example, I have a 7x7 matrix of floats. Most CPUs with SIMD support have either 4-float or 8-float registers, either of which would likely work well here (though obviously we would prefer the 8-float one). However, if the CPU supports AVX-512 it has 16-float registers, and (I may be wrong about this) it looks like in that situation the compiler will use the 16-float register instead. That isn't the end of the world in this scenario, however:
a) It is flat out unnecessary given that a better option exists: the data takes longer to transfer, takes up more cache, and costs more time and power, since each SIMD instruction processes double the data it otherwise would.
b) Again, consider what happens as chip makers keep expanding their SIMD features. What happens when 16-float registers grow to 32, 64, 128, and so on?
I think the obvious response is going to be to use simd::fixed_size, but then we lose much of the portability if we always have to specify the size of a simd type for each machine.
It looks like deduce may be usable to solve this issue, but I'm not sure.
If not, then I would suggest this be added to the standard, to ensure the type that ends up being used isn't any bigger than we actually need.
Again, if there is something I missed in how this works, please feel free to correct me.
The facilities for fixed_size and deduce have been improved and simplified in the C++26 paper. Specifically, std::simd<T, N> is basically std::experimental::simd<T, std::experimental::simd_abi::deduce_t<T, N>>. And even the latter will choose the SIMD register width (at least with GCC) as you desire: the next power of 2, if it exists in hardware. You can test this with a simple sizeof check.