This is just a comment that @timmoon10 and others may find useful.
I see the following output when compiling Elemental with Intel 18 beta:
/home/jrhammon/Work/Elemental/git/include/El/blas_like/level1/EntrywiseMap.hpp(93): warning #15552: loop was not vectorized with "simd"
/home/jrhammon/Work/Elemental/git/include/El/blas_like/level1/EntrywiseMap.hpp(93): warning #15552: loop was not vectorized with "simd"
/home/jrhammon/Work/Elemental/git/include/El/blas_like/level1/EntrywiseMap.hpp(93): warning #15552: loop was not vectorized with "simd"
/home/jrhammon/Work/Elemental/git/include/El/blas_like/level1/EntrywiseMap.hpp(93): warning #15552: loop was not vectorized with "simd"
For reference, the relevant code is below, where EL_SIMD expands to _Pragma("omp simd").
template<typename S,typename T>
void EntrywiseMap
( const Matrix<S>& A, Matrix<T>& B, function<T(const S&)> func )
{
    EL_DEBUG_CSE
    const Int m = A.Height();
    const Int n = A.Width();
    B.Resize( m, n );
    const S* ABuf = A.LockedBuffer();
    T* BBuf = B.Buffer();
    const Int ALDim = A.LDim();
    const Int BLDim = B.LDim();
    EL_PARALLEL_FOR
    for( Int j=0; j<n; ++j )
    {
        EL_SIMD
        for( Int i=0; i<m; ++i )
        {
            BBuf[i+j*BLDim] = func(ABuf[i+j*ALDim]);
        }
    }
}
The problem is that vectorizing calls through std::function is hard: the function body is hidden behind type erasure and an indirect call. If one wants these loops to vectorize, one likely has to declare the mapped functions as SIMD functions (see e.g. https://software.intel.com/en-us/node/524514 for details).
Interestingly enough, the Intel compiler will auto-vectorize lambdas, so if you implement and use EntrywiseMap with lambdas instead of std::functions, then you are likely to get SIMD code.
Another way to realize threaded+vectorized code in Elemental would be to use the C++17 Parallel STL, which Intel has implemented in Intel 18 beta (although this is currently somewhat irrelevant due to #215 and similar). std::for_each( pstl::execution::unseq, ...) generates SIMD code for lambdas. Unfortunately, unseq isn't standard (yet) but it's trivial to abstract that away.
It might be better to remove the SIMD functionality here if we assume that the function is expensive and hard to vectorize. In this regime, the cost of divisions and modulus operations for 'omp parallel collapse(2)' will be relatively minor and load balancing may be more important. We have already implemented vectorized code for common memory bound operations.
@timmoon10 Yeah, but I wonder how far we should go down this path. Do we not have a way to give the user a pointer to this data so they can implement an element-wise map in their own code?