Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undefined variable leads to potential clang miscompile despite maybe_undef #14

Open
GMNGeoffrey opened this issue Jan 25, 2025 · 0 comments

Comments

@GMNGeoffrey
Copy link
Collaborator

GMNGeoffrey commented Jan 25, 2025

In #15 I had to add an initialization to warp_position in the GPU cache kernel.

// First thread of the warp_tile accumulate the missing length to global
// variable TODO(nod-ai/dgl#14): clang miscompiles if this isn't initialized
// even though the shfl instructions are marked with maybe_undef. Figure out
// whether this is indeed a clang bug and remove this initialization.
size_t warp_position = 0;
if (lane_idx == 0) {
warp_position = atomicAdd(d_missing_len, (size_t)warp_missing_counter);
}
warp_position = warp_tile.shfl(warp_position, 0);
if (lane_idx < warp_missing_counter) {
d_missing_keys[warp_position + lane_idx] = missing_key;
d_missing_index[warp_position + lane_idx] = missing_index;
}

Without that warp_position is uninitialized. The only way it gets defined before it gets shuffled is if it enters the if block, and then it gets passed to warp_tile.shfl by value, which I think is UB. So the compiler concludes that it has to have entered the if block. This nicely explains why either not assigning warp_position in the if block or not using it afterwards makes the issue go away. Assigning to 0 also makes it go away in my reproducer.

Looking deeper though, it's a bit odd because the _shfl operations specifically have __attribute__((maybe_undef)), which was basically invented for this exact use case (https://reviews.llvm.org/D130224):

https://github.com/ROCm/clr/blob/3c863dad9146be24ccec93816c3cb0752d40d9ca/hipamd/include/hip/amd_detail/amd_warp_functions.h#L130-L136

The groups-level version don't though:

https://github.com/ROCm/clr/blob/3c863dad9146be24ccec93816c3cb0752d40d9ca/hipamd/include/hip/amd_detail/amd_hip_cooperative_groups.h#L451-L474

I'm not really clear on whether that's something that would be propagated to the calling function. That also says that the value argument can only be a 32-bit int or float, which is a bit weird, especially since it's templated and has a static_assert that just checks if it's integral or float, and delegates to the warp functions that support way more types than that. That documentation was just added. Seems maybe wrong? Changing it to int32_t doesn't fix anything though.

I tried throwing MAYBE_UNDEF (the HIP Macro around the clang attribute) on thread_block_tile_base::shfl (confirmed that this is the one it calls via printf and the debugger), but that doesn't fix the issue. Based on the description of maybe_undef it seems like it should.

The RFC for maybe_undef literally describes the exact issue I ran into: https://discourse.llvm.org/t/llvm-dev-rfc-d130224-introduce-maybe-undef-attribute-for-function-arguments-which-accepts-undef-values/63980. The commit for this landed over two years ago and is in clang-16: llvm/llvm-project@a35c64c. This seems like a [hip] clang bug then. Although calling __shfl directly like __shfl(warp_position, 0, 64) also fixes the issue, so maybe ROCm is holding it wrong actually.

Here's a minimal reproducer of the issue

#include "hip/hip_runtime.h"
#include <hip/hip_cooperative_groups.h>

namespace cg = cooperative_groups;

#define HIP_CHECK(val) \
  { hip_check_((val), __FILE__, __LINE__); }

class HipException : public std::runtime_error {
 public:
  HipException(const std::string& what) : runtime_error(what) {}
};

inline void hip_check_(hipError_t val, const char* file, int line) {
  if (val != hipSuccess) {
    throw HipException(std::string(file) + ":" + std::to_string(line) + ": CUDA error " +
                        std::to_string(val) + ": " + hipGetErrorString(val));
  }
}

template <int warp_size>
__global__ void get_kernel(size_t* d_missing_len) {
  cg::thread_block_tile<warp_size> warp_tile =
      cg::tiled_partition<warp_size>(cg::this_thread_block());
  const size_t lane_idx = warp_tile.thread_rank();
  uint8_t warp_missing_counter = 1;

  size_t warp_position;
  printf(
      "Before branch: lane_idx=%zu, recomputed_lane_idx=%u, d_missing_len=%zu\n",
      lane_idx, warp_tile.thread_rank(), *d_missing_len);
  if (lane_idx == 0) {
    warp_position = atomicAdd(d_missing_len, (size_t)warp_missing_counter);
    printf(
      "Incremented d_missing_len: lane_idx=%zu, recomputed_lane_idx=%u, d_missing_len=%zu, warp_position=%zu\n",
      lane_idx, warp_tile.thread_rank(), *d_missing_len, warp_position);
  }
  warp_position = warp_tile.shfl(warp_position, 0);
}

int main(int argc, char** argv) {
  hipStream_t stream;
  size_t h_missing_len;
  size_t* d_missing_len;

  HIP_CHECK(hipMalloc((void**)&d_missing_len, sizeof(size_t)));
  HIP_CHECK(hipStreamCreate(&stream));

  HIP_CHECK(hipStreamSynchronize(stream));

  constexpr int warp_size = 64;
  get_kernel<warp_size><<<1, 64, 0, stream>>>(d_missing_len);

  HIP_CHECK(hipMemcpyAsync(&h_missing_len, d_missing_len, sizeof(size_t),
                              hipMemcpyDeviceToHost, stream));
  HIP_CHECK(hipStreamSynchronize(stream));
  assert(h_missing_len == 1);

  HIP_CHECK(hipStreamDestroy(stream));
  HIP_CHECK(hipFree(d_missing_len));
  return 0;
}
GMNGeoffrey added a commit that referenced this issue Jan 29, 2025
If this isn't initialized, clang compiles away the if condition and
every thread increments the missing counter. I *think* this is a clang
bug because the shfl functions are supposed to have the values marked as
`maybe_undef` (which is literally what `maybe_undef` was created for),
but regardless initializing it is required here. This makes the gpu
cache tests pass when compiler optimizations are enabled.

Part of #14
@GMNGeoffrey GMNGeoffrey changed the title Warp position undef Undefined variable leads to potential clang miscompile despite maybe_undef Jan 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant