Work around compiler vectorization issue
With uint8_t types, the icpx compiler fails to vectorize even when
begin() is called on our range inside the kernel to extract a raw pointer.
To work around this issue, begin() must be called on the host and the
resulting pointer passed to the kernel.

Signed-off-by: Matthew Michel <[email protected]>
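
The workaround described in the message can be illustrated without SYCL. The following is a minimal sketch in plain C++, assuming a contiguous buffer behind a wrapper view; `byte_view`, `add_one_kernel`, and `submit_add_one` are hypothetical names for illustration and are not part of oneDPL. The point is only the ordering: the pointer is extracted by the caller ("host" side) and the loop body never touches the view.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a view that wraps contiguous storage.
struct byte_view
{
    std::vector<std::uint8_t>* storage;

    std::uint8_t* begin() const { return storage->data(); }
    std::size_t size() const { return storage->size(); }
};

// Stand-in for the kernel body: it only ever sees a raw pointer,
// so the loop has no view indirection left for the compiler to see through.
inline void add_one_kernel(std::uint8_t* data, std::size_t n)
{
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i)
        data[i] = static_cast<std::uint8_t>(data[i] + 1);
}

// Stand-in for host-side submission: begin() is resolved here,
// before the kernel runs, and only the pointer is passed on.
inline void submit_add_one(byte_view v)
{
    std::uint8_t* first = v.begin();
    add_one_kernel(first, v.size());
}
```

Whether a given compiler vectorizes either form depends on the compiler and its version; the sketch shows only the structure of the change, not a measured result.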
mmichel11 committed Jan 6, 2025
1 parent 32612a1 commit 1336735
Showing 2 changed files with 151 additions and 165 deletions.
16 changes: 14 additions & 2 deletions include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_for.h
@@ -192,8 +192,20 @@ __parallel_for(oneapi::dpl::__internal::__device_backend_tag, _ExecutionPolicy&&
     {
         if (__count >= __large_submitter::__estimate_best_start_size(__exec, __brick))
         {
-            return __large_submitter{}(std::forward<_ExecutionPolicy>(__exec), __brick, __count,
-                                       std::forward<_Ranges>(__rngs)...);
+            // Passing begin() of each range is needed for the icpx compiler to vectorize. The indirection introduced
+            // by our all / guard views interferes with compiler vectorization. At this point, we have ensured that
+            // input is contiguous and can be operated on directly. The begin() function for these views will return a
+            // pointer which is passed to the kernel.
+            if constexpr (_Fp::__can_vectorize)
+            {
+                return __large_submitter{}(std::forward<_ExecutionPolicy>(__exec), __brick, __count,
+                                           std::forward<_Ranges>(__rngs).begin()...);
+            }
+            else
+            {
+                return __large_submitter{}(std::forward<_ExecutionPolicy>(__exec), __brick, __count,
+                                           std::forward<_Ranges>(__rngs)...);
+            }
         }
     }
     return __small_submitter{}(std::forward<_ExecutionPolicy>(__exec), __brick, __count,
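
The compile-time dispatch in this hunk can be tried in isolation. The sketch below is plain C++17 and assumes nothing from oneDPL: `contiguous_view`, `vectorizable_brick`, `scalar_brick`, `count_pointer_args`, and `dispatch` are hypothetical names chosen for illustration. It mirrors the shape of the change, where a trait on the functor type selects between forwarding `begin()` of every range in the pack and forwarding the ranges unchanged.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical view over contiguous data whose begin() yields a raw pointer.
struct contiguous_view
{
    int* first;
    std::size_t n;

    int* begin() const { return first; }
};

// Hypothetical functors carrying the compile-time trait.
struct vectorizable_brick { static constexpr bool can_vectorize = true; };
struct scalar_brick { static constexpr bool can_vectorize = false; };

// Stand-in for the submitter: reports how many arguments arrived as raw pointers.
template <typename... Args>
std::size_t count_pointer_args(Args&&...)
{
    return (std::size_t{0} + ... + (std::is_pointer_v<std::decay_t<Args>> ? 1u : 0u));
}

// Same shape as the patched branch: expand .begin() over the whole pack
// only when the functor type opts in; otherwise forward the ranges as-is.
template <typename Brick, typename... Ranges>
std::size_t dispatch(Brick, Ranges&&... rngs)
{
    if constexpr (Brick::can_vectorize)
    {
        return count_pointer_args(std::forward<Ranges>(rngs).begin()...);
    }
    else
    {
        return count_pointer_args(std::forward<Ranges>(rngs)...);
    }
}
```

Because the branch is `if constexpr`, the `.begin()` expansion is only instantiated for functor types that opt in, so range types without a pointer-returning `begin()` are unaffected on the other path.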