[Issue]: Clang 19.1.0 with libc++ fails to build HIP code #100

Open
AngryLoki opened this issue Oct 26, 2024 · 10 comments

@AngryLoki

AngryLoki commented Oct 26, 2024

Problem Description

Hi,

As seen in https://godbolt.org/z/ehxeqG4sj, clang --stdlib=libc++ works with clang-17 and does not work with newer versions (checked with clang version 19.1.0 and nightly builds). Compilation of any file fails with:

In file included from <built-in>:2:
In file included from /opt/compiler-explorer/libs/rocm/6.1.2/include/hip/hip_runtime.h:62:
In file included from /opt/compiler-explorer/libs/rocm/6.1.2/include/hip/amd_detail/amd_hip_runtime.h:351:
/opt/compiler-explorer/libs/rocm/6.1.2/include/hip/amd_detail/amd_math_functions.h:47:8: error: no template named '__numeric_type'; did you mean '__hip::__numeric_type'?
   47 | struct __numeric_type<_Float16>
      |        ^
/opt/compiler-explorer/clang-assertions-trunk-20241026/lib/clang/20/include/__clang_hip_cmath.h:383:29: note: '__hip::__numeric_type' declared here
  383 | template <class _Tp> struct __numeric_type {
      |                             ^
In file included from <built-in>:2:
In file included from /opt/compiler-explorer/libs/rocm/6.1.2/include/hip/hip_runtime.h:62:
In file included from /opt/compiler-explorer/libs/rocm/6.1.2/include/hip/amd_detail/amd_hip_runtime.h:351:
/opt/compiler-explorer/libs/rocm/6.1.2/include/hip/amd_detail/amd_math_functions.h:47:8: error: class template specialization of '__numeric_type' not in a namespace enclosing '__hip'
   47 | struct __numeric_type<_Float16>
      |        ^
/opt/compiler-explorer/clang-assertions-trunk-20241026/lib/clang/20/include/__clang_hip_cmath.h:383:29: note: explicitly specialized declaration is here
  383 | template <class _Tp> struct __numeric_type {
      |                             ^

A single replacement in

#if _LIBCPP_VERSION && __HIP__
namespace std {
template <>
struct __numeric_type<_Float16>

-namespace std {
+namespace __hip {

fixes compilation (all of PyTorch then builds fine). However, I think it might break something else, as I was not able to find which LLVM commit caused the issue. I'm pretty sure @yxsamliu knows the details though.
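
For context, the second diagnostic follows from the C++ rule that an explicit specialization must be declared in a namespace enclosing the primary template. Below is a minimal sketch of that rule. The names `hip_like`, `std_like`, `numeric_type`, and `half_like` are hypothetical stand-ins for illustration, not the real `__hip::__numeric_type` or `_Float16`.

```cpp
// Hypothetical stand-in for _Float16, so the sketch builds on any host.
struct half_like {};

namespace hip_like {  // hypothetical stand-in for namespace __hip
// Primary template: the only declaration of numeric_type in this sketch.
template <class T>
struct numeric_type {
    static constexpr bool value = false;
};

// OK: the specialization lives in the same namespace as the primary template.
template <>
struct numeric_type<half_like> {
    static constexpr bool value = true;
};
}  // namespace hip_like

namespace std_like {  // hypothetical stand-in for namespace std
// Ill-formed if uncommented: no primary template named numeric_type is
// declared in std_like, so there is nothing here to specialize.
// template <>
// struct numeric_type<half_like> {};
}  // namespace std_like

static_assert(hip_like::numeric_type<half_like>::value,
              "the explicit specialization is selected");
static_assert(!hip_like::numeric_type<int>::value,
              "the primary template is selected");
```

This only illustrates why moving the specialization into the namespace that declares the template compiles. It says nothing about whether that change is the right fix for the HIP headers.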

Could you fix this issue? Thanks!

ROCm Version

ROCm 6.1.1

ROCm Component

clr

@ppanchad-amd

Hi @AngryLoki. An internal ticket has been created to fix your issue. Thanks!

@zichguan-amd
Contributor

Hi @AngryLoki, does this happen when you use the clang provided with ROCm?

@AngryLoki
Author

@zichguan-amd, when checking with clang from the ROCm/llvm-project repository:

  • the issue affects the rocm-staging branch (pointing to rocm-llvm-project/build/lib/clang/20/include/__clang_hip_cmath.h)
  • the issue affects the rocm-6.2.2 tag (pointing to rocm-llvm-project/build/lib/clang/18/include/__clang_hip_cmath.h)
  • the issue does not affect the rocm-6.1.2 tag

@zichguan-amd
Contributor

I cannot repro the issue with ROCm 6.2.2, are you using the same ROCm version as the clang release tag?

@AngryLoki
Author

I rechecked with hip-6.2.2 (and rocm-comgr-6.2.2, rocr-runtime-6.2.2) and rocm/llvm-project from the rocm-6.2.2 branch, and it still reproduces. The steps were:

wget https://raw.githubusercontent.com/ROCm-Developer-Tools/HIP-CPU/master/examples/vadd_hip/vadd_hip.cpp

./bin/clang --stdlib=libc++ -x hip --rocm-path=/usr/lib/ vadd_hip.cpp
Same error

In file included from vadd_hip.cpp:5:
In file included from /usr/include/hip/hip_runtime.h:62:
In file included from /usr/include/hip/amd_detail/amd_hip_runtime.h:351:
/usr/include/hip/amd_detail/amd_math_functions.h:47:8: error: explicit specialization of undeclared template struct '__numeric_type'
   47 | struct __numeric_type<_Float16>
      |        ^             ~~~~~~~~~~
In file included from vadd_hip.cpp:5:
In file included from /usr/include/hip/hip_runtime.h:62:
In file included from /usr/include/hip/amd_detail/amd_hip_runtime.h:389:
/src/rocm-llvm-project/build/lib/clang/18/include/__clang_cuda_complex_builtins.h:194:27: error: use of undeclared identifier 'max'; did you mean
      'fmax'?
  194 |   double __logbw = _LOGBd(_fmaxd(_ABSd(__c), _ABSd(__d)));
      |                           ^
/src/rocm-llvm-project/build/lib/clang/18/include/__clang_cuda_complex_builtins.h:45:16: note: expanded from macro '_fmaxd'
   45 | #define _fmaxd max
      |                ^
/src/rocm-llvm-project/build/lib/clang/18/include/__clang_cuda_math_forward_declares.h:73:19: note: 'fmax' declared here
   73 | __DEVICE__ double fmax(double, double);
      |                   ^
In file included from vadd_hip.cpp:5:
In file included from /usr/include/hip/hip_runtime.h:62:
In file included from /usr/include/hip/amd_detail/amd_hip_runtime.h:389:
/src/rocm-llvm-project/build/lib/clang/18/include/__clang_cuda_complex_builtins.h:227:26: error: use of undeclared identifier 'max'; did you mean
      'fmax'?
  227 |   float __logbw = _LOGBf(_fmaxf(_ABSf(__c), _ABSf(__d)));
      |                          ^
/src/rocm-llvm-project/build/lib/clang/18/include/__clang_cuda_complex_builtins.h:46:16: note: expanded from macro '_fmaxf'
   46 | #define _fmaxf max
      |                ^
/src/rocm-llvm-project/build/lib/clang/18/include/__clang_cuda_math_forward_declares.h:74:18: note: 'fmax' declared here
   74 | __DEVICE__ float fmax(float, float);
      |                  ^
vadd_hip.cpp:39:9: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   39 |         hipMalloc(&A_d, sizeBytes);
      |         ^~~~~~~~~ ~~~~~~~~~~~~~~~
vadd_hip.cpp:40:9: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   40 |         hipMalloc(&B_d, sizeBytes);
      |         ^~~~~~~~~ ~~~~~~~~~~~~~~~
vadd_hip.cpp:41:9: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   41 |         hipMalloc(&C_d, sizeBytes);
      |         ^~~~~~~~~ ~~~~~~~~~~~~~~~
vadd_hip.cpp:50:9: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   50 |         hipMemcpy(A_d, A_h, sizeBytes, hipMemcpyHostToDevice);
      |         ^~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vadd_hip.cpp:51:9: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   51 |         hipMemcpy(B_d, B_h, sizeBytes, hipMemcpyHostToDevice);
      |         ^~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vadd_hip.cpp:59:9: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   59 |         hipMemcpy(C_h, C_d, sizeBytes, hipMemcpyDeviceToHost);
      |         ^~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
6 warnings and 3 errors generated when compiling for gfx906.

Just to repeat: the only flag that matters for reproduction is --stdlib=libc++. By default Clang uses --stdlib=libstdc++, and with libstdc++ everything works fine with any version of Clang. The issue lies within the #if _LIBCPP_VERSION ... block and affects upstream clang-18+ as well as the ROCm fork, which currently (in 6.2.2) is based on clang-18.
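
Because the failure depends on which standard library headers the compiler picks up, it can help to confirm that at compile time. The following is a minimal sketch, assuming the usual `_LIBCPP_VERSION` (libc++) and `__GLIBCXX__` (libstdc++) macros. The function names `stdlib_name`, `stdlib_version`, and `print_stdlib` are hypothetical.

```cpp
#include <cstddef>  // any standard header pulls in the library's config macros
#include <cstdio>

// Returns the name of the standard library whose headers were actually used.
constexpr const char* stdlib_name() {
#if defined(_LIBCPP_VERSION)
    return "libc++";
#elif defined(__GLIBCXX__)
    return "libstdc++";
#else
    return "unknown";
#endif
}

// Returns the library's version macro, or 0 if neither macro is defined.
constexpr long stdlib_version() {
#if defined(_LIBCPP_VERSION)
    return _LIBCPP_VERSION;
#elif defined(__GLIBCXX__)
    return __GLIBCXX__;
#else
    return 0;
#endif
}

// Hypothetical helper: prints the detected library name and version macro.
inline void print_stdlib() {
    std::printf("%s %ld\n", stdlib_name(), stdlib_version());
}
```

Calling `print_stdlib()` from a small host program built with the same `--stdlib` flag shows which library, and which version of it, the headers came from.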

@zichguan-amd
Contributor

Can you include the -v flag for more details? I also tried the Ubuntu 24 ROCm 6.2.0 image and still cannot reproduce the error.
I can successfully compile with clang++ or hipcc on 6.2.0 and 6.2.2, for example: /opt/rocm-6.2.2/lib/llvm/bin/clang++ -v --stdlib=libc++ -x hip vadd_hip.cpp

@AngryLoki
Author

AngryLoki commented Nov 6, 2024

Ah, sorry, my mistake. Even though I used clang from the rocm branch, the compiler pulled the libc++ header /usr/include/c++/v1/__type_traits/promote.h from the libc++-19.1.0 installation. The breaking change was llvm/llvm-project#81379, which conditionally disabled the definition of the __numeric_type structure in favor of a decltype-based implementation:

#if !defined(_LIBCPP_CLANG_VER) || _LIBCPP_CLANG_VER != 1700
    // compiler follows this branch
    class __promote {
        //  decltype-based implementation
        using type = decltype((__test(_Args()) + ...));
#else
    struct __numeric_type {
        // ...
    }

    class __promote {
        // ...

This "simplified implementation of __promote" does not exist in libc++ 18.x, so the issue only affects systems with libc++-19-dev (Debian sid, Ubuntu 24.10, etc.)
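
To illustrate the decltype-based idea, here is a rough sketch that is not the actual libc++ source. `sketch::test` and `sketch::promote` are hypothetical names, the overload set is a simplified assumption, and C++17 is assumed for the fold expression.

```cpp
#include <type_traits>
#include <utility>

namespace sketch {
// Hypothetical overload set: each overload maps an arithmetic argument to the
// floating-point type it should promote to. Declarations only, since they are
// used purely inside decltype (an unevaluated context).
float test(float);
double test(double);
long double test(long double);
double test(int);   // integers promote to double in this sketch
double test(long);

// The promoted type is whatever the built-in operator+ yields when applied
// across all mapped arguments; no per-type trait specialization is involved.
template <class... Args>
struct promote {
    using type = decltype((test(std::declval<Args>()) + ...));
};
}  // namespace sketch

static_assert(std::is_same<sketch::promote<float, float>::type, float>::value,
              "float + float stays float");
static_assert(std::is_same<sketch::promote<float, double>::type, double>::value,
              "float + double becomes double");
static_assert(std::is_same<sketch::promote<int, float>::type, double>::value,
              "int maps to double, so the sum is double");
```

Under this design there is no `__numeric_type`-like trait left to specialize, which is consistent with the HIP header's specialization no longer finding a template in namespace std.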

docker run -it --rm ubuntu:24.10 bash

apt update && apt install libc++-dev wget -y
wget https://repo.radeon.com/amdgpu-install/6.2.2/ubuntu/noble/amdgpu-install_6.2.60202-1_all.deb
apt install ./amdgpu-install_6.2.60202-1_all.deb -y && apt update && apt install amdgpu-dkms rocm -y

echo "#include <hip/hip_runtime.h>" > test.hip
/opt/rocm-6.2.2/bin/hipcc -stdlib=libc++ -c test.hip
# error: class template specialization of '__numeric_type' not in a namespace enclosing '__hip'

@zichguan-amd
Contributor

Good catch! I've escalated the issue and @yxsamliu should be able to take a closer look into it.

@yxsamliu
Contributor

yxsamliu commented Nov 6, 2024

Thanks for your analysis.

I think we can fix the issue by reverting commit 6f439f0.

That commit was made four years ago to work around a libc++ ambiguity issue when using _Float16 with fma. However, it seems we no longer need the workaround, as libc++ has since fixed that issue: https://godbolt.org/z/YhsKr8sjv

@zichguan-amd
Contributor

Hi @AngryLoki, fix has landed in staging: b8ba4cc.
