If `warp_position` isn't initialized, clang compiles away the if condition
and every thread increments the missing counter. I *think* this is a clang
bug because the shfl functions are supposed to have the values marked as
`maybe_undef` (which is literally what `maybe_undef` was created for),
but regardless, initializing it is required here. This makes the GPU
cache tests pass when compiler optimizations are enabled.
Part of #14
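The following is a host-side C++ sketch of why initializing `warp_position` matters. It simulates one 64-lane wavefront by keeping one array slot per lane. The names `kWarpSize`, `simulated_shfl`, and `broadcast_position` are hypothetical, and `took_branch` stands in for the kernel's `if` condition. This is not the HugeCTR code. Because every lane starts at a defined value (0 here, an assumption), the value read from the source lane is well defined whether or not that lane entered the branch. Without the initializer, a lane that skipped the branch would hand an uninitialized value to the shuffle.

```cpp
// Hypothetical host-side simulation of a wavefront-wide broadcast.
// Not the HugeCTR kernel: names and the branch condition are illustrative.
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kWarpSize = 64;  // assumed AMD wavefront width

// Stand-in for a warp shuffle: every lane receives src_lane's value.
// This sketch only models in-range source lanes.
int simulated_shfl(const std::array<int, kWarpSize>& lane_values,
                   std::size_t src_lane) {
  assert(src_lane < kWarpSize);
  return lane_values[src_lane];
}

// took_branch[lane] says whether that lane entered the `if` block;
// position[lane] is the value the lane would assign to warp_position there.
int broadcast_position(const std::array<bool, kWarpSize>& took_branch,
                       const std::array<int, kWarpSize>& position,
                       std::size_t src_lane) {
  // Value-initialize every lane's warp_position to 0 before the branch,
  // so lanes that skip the branch still hold a defined value.
  std::array<int, kWarpSize> warp_position{};
  for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
    if (took_branch[lane]) {
      warp_position[lane] = position[lane];
    }
  }
  // Every lane reads the source lane's value, analogous to
  // __shfl(warp_position, src_lane, 64).
  return simulated_shfl(warp_position, src_lane);
}
```

For example, if lane 0 took the branch with position 7, broadcasting from lane 0 yields 7. If lane 0 skipped the branch, the result is the defined initial value 0 rather than garbage.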
GMNGeoffrey changed the title from "Warp position undef" to "Undefined variable leads to potential clang miscompile despite maybe_undef" on Jan 29, 2025.
In #15 I had to add an initialization to `warp_position` in the GPU cache kernel, `dgl/third_party/HugeCTR/gpu_cache/src/nv_gpu_cache.cu` (lines 530 to 543 in 7c3b60d).
Without that, `warp_position` is uninitialized. The only way it gets defined before it gets shuffled is if it enters the `if` block, and then it gets passed to `warp_tile.shfl` by value, which I think is UB. So the compiler concludes that it has to have entered the `if` block. This nicely explains why either not assigning `warp_position` in the `if` block or not using it afterwards makes the issue go away. Assigning to 0 also makes it go away in my reproducer.

Looking deeper, though, it's a bit odd, because the `__shfl` operations specifically have `__attribute__((maybe_undef))`, which was basically invented for this exact use case (https://reviews.llvm.org/D130224):

https://github.com/ROCm/clr/blob/3c863dad9146be24ccec93816c3cb0752d40d9ca/hipamd/include/hip/amd_detail/amd_warp_functions.h#L130-L136
The groups-level versions don't, though:
https://github.com/ROCm/clr/blob/3c863dad9146be24ccec93816c3cb0752d40d9ca/hipamd/include/hip/amd_detail/amd_hip_cooperative_groups.h#L451-L474
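The sketch below is a host-compilable model of the two layers, meant to show where the attribute sits. `MAYBE_UNDEF` is defined the way a guarded attribute macro is commonly written. This is an assumption, so check the HIP headers for the real definition. `warp_shfl` stands in for the attributed warp-level `__shfl`, and `group_shfl` stands in for the templated groups-level wrapper. Both names are hypothetical, and both model a one-lane "warp", so a shuffle just returns the caller's value. The attribute is on the inner function's parameter only. The wrapper's own `var` is an ordinary by-value parameter with no `maybe_undef`.

```cpp
// Hypothetical sketch of the two shuffle layers; not the ROCm source.
#include <type_traits>

// Guarded attribute macro (assumed shape): the clang attribute where the
// compiler supports it (clang 16 and later), nothing elsewhere.
#if defined(__has_attribute)
#if __has_attribute(maybe_undef)
#define MAYBE_UNDEF __attribute__((maybe_undef))
#endif
#endif
#ifndef MAYBE_UNDEF
#define MAYBE_UNDEF
#endif

// Warp-level stand-in: the parameter is marked maybe_undef, telling clang the
// argument may be undef/poison, so passing one here is not treated as UB.
// One-lane model, so the "shuffle" just returns the caller's value.
template <typename T>
constexpr T warp_shfl(MAYBE_UNDEF T var, int /*src_lane*/) {
  return var;
}

// Groups-level stand-in: templated, restricted by static_assert to integral
// and floating-point types, and delegating to the warp layer. Its own `var`
// parameter carries no maybe_undef attribute.
template <typename T>
constexpr T group_shfl(T var, int src_lane) {
  static_assert(std::is_integral<T>::value || std::is_floating_point<T>::value,
                "group_shfl requires an integral or floating-point type");
  return warp_shfl<T>(var, src_lane);
}
```

In this model, `group_shfl(5, 0)` returns 5. What matters is that an undef value reaching `group_shfl` passes through an unattributed parameter before it ever reaches the attributed one.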
I'm not really clear on whether the attribute would be propagated to the calling function. Its documentation also says that the value argument can only be a 32-bit int or float. That's a bit weird, especially since the function is templated and has a `static_assert` that just checks whether the type is integral or floating point. It also delegates to the warp functions, which support way more types than that. That documentation was just added, so maybe it's wrong? Changing the argument to `int32_t` doesn't fix anything, though.

I tried throwing `MAYBE_UNDEF` (the HIP macro around the clang attribute) on `thread_block_tile_base::shfl` (I confirmed via printf and the debugger that this is the one it calls), but that doesn't fix the issue. Based on the description of `maybe_undef`, it seems like it should.

The RFC for `maybe_undef` literally describes the exact issue I ran into: https://discourse.llvm.org/t/llvm-dev-rfc-d130224-introduce-maybe-undef-attribute-for-function-arguments-which-accepts-undef-values/63980. The commit for this landed over two years ago and is in clang-16: llvm/llvm-project@a35c64c. This seems like a [hip] clang bug, then. However, calling `__shfl` directly, as in `__shfl(warp_position, 0, 64)`, also fixes the issue, so maybe ROCm is holding it wrong, actually.

Here's a minimal reproducer of the issue: