This repository has been archived by the owner on Jan 26, 2024. It is now read-only.

ROCm 5.3 gfx1030 hang with hipStreamCreate and hipStreamDestroy #52

Open
nolmoonen opened this issue Oct 13, 2022 · 3 comments

Comments

@nolmoonen

The following test hangs with ROCm 5.3 on the gfx1030 architecture (AMD Radeon PRO V620).

#include <hip/hip_runtime.h>

#include <cstdio>

int main()
{
    printf("starting..\n");

    hipStream_t stream;
    hipStreamCreate(&stream);
    hipStreamDestroy(stream);

    hipStream_t stream2;
    hipStreamCreateWithFlags(&stream2, hipStreamNonBlocking);
    hipStreamDestroy(stream2);

    printf("finished!\n");
}

Compiled and run with

hipcc test.cpp
./a.out

Built and executed in the rocm/rocm-terminal Docker image. hipconfig reports

HIP version  : 5.3.22061-e8e78f1a

== hipconfig
HIP_PATH     : /opt/rocm-5.3.0
ROCM_PATH    : /opt/rocm-5.3.0
HIP_COMPILER : clang
HIP_PLATFORM : amd
HIP_RUNTIME  : rocclr
CPP_CONFIG   :  -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-5.3.0/include -I/opt/rocm-5.3.0/llvm/bin/../lib/clang/15.0.0 -I/opt/rocm-5.3.0/hsa/include

== hip-clang
HSA_PATH         : /opt/rocm-5.3.0/hsa
HIP_CLANG_PATH   : /opt/rocm-5.3.0/llvm/bin
AMD clang version 15.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.3.0 22362 3cf23f77f8208174a2ee7c616f4be23674d7b081)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.3.0/llvm/bin
AMD LLVM version 15.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver3

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
 -std=c++11 -isystem "/opt/rocm-5.3.0/llvm/lib/clang/15.0.0/include/.." -isystem /opt/rocm-5.3.0/hsa/include -isystem "/opt/rocm-5.3.0/include" -O3
 -L"/opt/rocm-5.3.0/lib" -O3 -lgcc_s -lgcc -lpthread -lm -lrt

=== Environment Variables
PATH=/home/rocm-user/.vscode-server/bin/129500ee4c8ab7263461ffe327268ba56b9f210d/bin/remote-cli:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/rocm/bin

== Linux Kernel
Hostname     : fb5ed677a12b
Linux fb5ed677a12b 5.4.0-125-generic #141-Ubuntu SMP Wed Aug 10 13:42:03 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

The test completes with ROCm 5.2, and it also completes with ROCm 5.3 on other architectures.

@nolmoonen
Author

The issue is also reproducible with

#include <hip/hip_runtime.h>

#include <cstdio>

int main()
{
    printf("starting..\n");

    hipStream_t stream;
    hipStreamCreate(&stream);
    hipStreamDestroy(stream);

    hipStream_t stream2;
    hipStreamCreate(&stream2);
    hipStreamDestroy(stream2);

    printf("finished!\n");
}

so it is not specific to hipStreamCreateWithFlags.

@nolmoonen
Author

The issue is reproducible on a system with two gfx1030 cards. It is not reproducible when only one card is visible: if I create a rocm/rocm-terminal:5.3 container and pass it only one card, the example works as expected. The issue is also not reproducible on a system with two gfx908 cards.

@Maetveis

I was able to reproduce the hang in a 5.3 Docker container (rocm/rocm-terminal:5.3) before updating the host system to 5.3, but not afterwards.

It looks like an additional requirement for triggering this is a version mismatch: the ROCm 5.2 kernel module on the host combined with the 5.3 runtime (typically in a new Docker container on a host that has not yet been updated).
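To check for such a mismatch from inside a container, the host kernel module version can be read from sysfs and compared by hand with the HIP version that hipconfig reports. This is a sketch under an assumption: the path /sys/module/amdgpu/version is typically present for a DKMS-installed amdgpu module and may be missing with the in-kernel driver. The helper names are hypothetical. The string is a driver version, not a ROCm release number, so it still has to be matched to a ROCm release manually.

```cpp
#include <fstream>
#include <string>

// Reads the first line of a file into `out`.
// Returns false if the file cannot be opened or is empty.
static bool read_first_line(const std::string& path, std::string& out)
{
    std::ifstream in(path);
    if (!in) {
        return false;
    }
    return static_cast<bool>(std::getline(in, out));
}

// Assumption: the amdgpu kernel module exposes its version at this path.
// Returns false if the file is absent (for example with the in-kernel driver).
static bool amdgpu_module_version(std::string& out)
{
    return read_first_line("/sys/module/amdgpu/version", out);
}
```

If `amdgpu_module_version` returns false, the check is inconclusive and does not show that the versions match.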
