Skip to content

[Issue]: A simple vector test program, compiled using hipcc, will encounter a segmentation fault error when run. #259

@zejianzhang1982

Description

@zejianzhang1982

Problem Description

A simple vector test program, compiled using hipcc, will encounter a segmentation fault error when run.
The error occurs within the hipMemcpy function. By using gdb for tracing, it was discovered that it was at #0 0x00007ffff364a5ca in rocr::atomic::Store (order=std::memory_order_release, val=0, ptr=0x7fffee95a000)
at /usr/src/debug/rocm-runtime-6.3.0-2.oe2403sp1.riscv64/runtime/hsa-runtime/core/util/atomic_helpers.h:215
ptr=0x7fffee95a000 cannot be accessed?

my code:

#include <hip/hip_runtime.h>
#include

global void vectorAdd(const float *A, const float *B, float *C) {
int i = blockDim.x * blockIdx.x + threadIdx.x;
C[i] = A[i] + B[i];
}

int main() {
int numElements = 512;
size_t size = numElements * sizeof(float);
float *h_A = (float *)malloc(size);
float *h_B = (float *)malloc(size);
float *h_C = (float *)malloc(size);

for (int i = 0; i < numElements; ++i) {
        h_A[i] = rand() / (float)RAND_MAX;
        h_B[i] = rand() / (float)RAND_MAX;
}

float *d_A, *d_B, *d_C;
hipMalloc((void **)&d_A, size);
hipMalloc((void **)&d_B, size);
hipMalloc((void **)&d_C, size);


hipMemcpy(d_A, h_A, size, hipMemcpyHostToDevice);
hipMemcpy(d_B, h_B, size, hipMemcpyHostToDevice);


int threadsPerBlock = 256;
int blocksPerGrid = (numElements + threadsPerBlock - 1) / threadsPerBlock;

hipLaunchKernelGGL(vectorAdd, dim3(blocksPerGrid), dim3(threadsPerBlock), 0, 0, d_A, d_B, d_C);
hipMemcpy(h_C, d_C, size, hipMemcpyDeviceToHost);
for (int i = 0; i < numElements; ++i) {
        if (fabs(h_A[i] + h_B[i] - h_C[i]) > 1e-5) {
                fprintf(stderr, "Result verification failed at element %d!\n", i);
                exit(EXIT_FAILURE);
        }
}

printf("Test PASSED\n");
hipFree(d_A);
hipFree(d_B);
hipFree(d_C);
free(h_A);
free(h_B);
free(h_C);

return 0;

}

#0 0x00007ffff364a5ca in rocr::atomic::Store (order=std::memory_order_release, val=0, ptr=0x7fffee95a000)
at /usr/src/debug/rocm-runtime-6.3.0-2.oe2403sp1.riscv64/runtime/hsa-runtime/core/util/atomic_helpers.h:215
#1 rocr::AMD::AqlQueue::StoreRelaxed (this=0x55555567a4d0, value=0) at /usr/src/debug/rocm-runtime-6.3.0-2.oe2403sp1.riscv64/runtime/hsa-runtime/core/runtime/amd_aql_queue.cpp:464
#2 0x00007ffff36356e2 in rocr::AMD::BlitKernel::SubmitLinearCopyCommand (this=0x5555557411c0, dst=, src=, size=, dep_signals=..., out_signal=...,
gang_signals=...) at /usr/src/debug/rocm-runtime-6.3.0-2.oe2403sp1.riscv64/runtime/hsa-runtime/core/runtime/amd_blit_kernel.cpp:1161
#3 0x00007ffff363426e in rocr::AMD::BlitKernel::SubmitLinearCopyCommand (this=0x5555557411c0, dst=0x7fffee940000, src=0x7fffee930000, size=32768)
at /usr/src/debug/rocm-runtime-6.3.0-2.oe2403sp1.riscv64/runtime/hsa-runtime/core/runtime/amd_blit_kernel.cpp:1007
#4 0x00007ffff364d76e in rocr::(anonymous namespace)::RegionMemory::Freeze (this=0x555555684f80)
at /usr/src/debug/rocm-runtime-6.3.0-2.oe2403sp1.riscv64/runtime/hsa-runtime/core/runtime/amd_loader_context.cpp:354
#5 0x00007ffff3683278 in rocr::amd::hsa::loader::Segment::Freeze (this=0x555555678040) at /usr/src/debug/rocm-runtime-6.3.0-2.oe2403sp1.riscv64/runtime/hsa-runtime/loader/executable.cpp:705
#6 rocr::amd::hsa::loader::ExecutableImpl::Freeze (this=0x555555cdc3c0, options=)
at /usr/src/debug/rocm-runtime-6.3.0-2.oe2403sp1.riscv64/runtime/hsa-runtime/loader/executable.cpp:1944
#7 0x00007ffff3683e1e in rocr::amd::hsa::loader::AmdHsaCodeLoader::FreezeExecutable (this=0x5555556621a0, executable=0x555555cdc3c0, options=)
at /usr/src/debug/rocm-runtime-6.3.0-2.oe2403sp1.riscv64/runtime/hsa-runtime/loader/executable.cpp:236
#8 0x00007ffff3658054 in rocr::HSA::hsa_executable_freeze (executable=..., options=0x0) at /usr/src/debug/rocm-runtime-6.3.0-2.oe2403sp1.riscv64/runtime/hsa-runtime/core/runtime/hsa.cpp:2330
#9 0x00007ffff6e2379e in ?? () from /usr/lib64/libamdhip64.so.6
#10 0x00007ffff6db93f8 in ?? () from /usr/lib64/libamdhip64.so.6
#11 0x00007ffff6db945e in ?? () from /usr/lib64/libamdhip64.so.6
#12 0x00007ffff6e00ee2 in ?? () from /usr/lib64/libamdhip64.so.6
#13 0x00007ffff6daa772 in ?? () from /usr/lib64/libamdhip64.so.6
#14 0x00007ffff6e0f5b4 in ?? () from /usr/lib64/libamdhip64.so.6
#15 0x00007ffff6e3c2c4 in ?? () from /usr/lib64/libamdhip64.so.6
#16 0x00007ffff6e2d4ea in ?? () from /usr/lib64/libamdhip64.so.6
#17 0x00007ffff6e0fbc2 in ?? () from /usr/lib64/libamdhip64.so.6
#18 0x00007ffff6defca6 in ?? () from /usr/lib64/libamdhip64.so.6
#19 0x00007ffff6d3380c in ?? () from /usr/lib64/libamdhip64.so.6
#20 0x00007ffff6be2466 in ?? () from /usr/lib64/libamdhip64.so.6
#21 0x00007ffff6c8b13c in ?? () from /usr/lib64/libamdhip64.so.6
#22 0x00005555555572aa in main () at vectorAdd-cl.hip:63

This program can run normally on my previous MI210 + Linux 6.15 system, which is also a RISC-V CPU platform.

Operating System

Linux localhost.localdomain 6.12.43-25.08.29.14.riscv64

CPU

SG2044 riscv

GPU

W7900 dual slot

ROCm Version

6.3.4

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

ROCk module is loaded

HSA System Attributes

Runtime Version: 1.1
Runtime Ext Version: 1.6
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES

==========
HSA Agents


Agent 1


Name:
Uuid: CPU-XX
Marketing Name:
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 65536(0x10000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2600
BDFID: 0
Internal Node ID: 0
Compute Unit: 64
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 131595684(0x7d7fda4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 131595684(0x7d7fda4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 131595684(0x7d7fda4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 4
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 131595684(0x7d7fda4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:


Agent 2


Name: gfx1100
Uuid: GPU-d66e739c6390904e
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 6144(0x1800) KB
L3: 98304(0x18000) KB
Chip ID: 29770(0x744a)
ASIC Revision: 0(0x0)
Cacheline Size: 128(0x80)
Max Clock Freq. (MHz): 1760
BDFID: 768
Internal Node ID: 1
Compute Unit: 96
SIMDs per CU: 2
Shader Engines: 6
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 552
SDMA engine uCode:: 24
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 47169536(0x2cfc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 47169536(0x2cfc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1100
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***

Additional Information

Display Engine Configuration

CONFIG_DRM_AMD_DC=y
CONFIG_DRM_AMD_DC_FP=y
CONFIG_DRM_AMD_DC_SI=y
CONFIG_DEBUG_KERNEL_DC=y

CONFIG_DRM_AMD_SECURE_DISPLAY is not set

end of Display Engine Configuration

CONFIG_HSA_AMD=y
CONFIG_DRM_NOUVEAU=m
CONFIG_NOUVEAU_DEBUG=5
CONFIG_NOUVEAU_DEBUG_DEFAULT=3

CONFIG_NOUVEAU_DEBUG_MMU is not set

CONFIG_NOUVEAU_DEBUG_PUSH is not set

CONFIG_DRM_NOUVEAU_BACKLIGHT=y

CONFIG_DRM_NOUVEAU_GSP_DEFAULT is not set

CONFIG_DRM_XE is not set

CONFIG_DRM_VGEM=m
CONFIG_DRM_VKMS=m
CONFIG_DRM_UDL=m
CONFIG_DRM_AST=y
CONFIG_DRM_MGAG200=m
CONFIG_DRM_QXL=m
CONFIG_DRM_VIRTIO_GPU=m
CONFIG_DRM_VIRTIO_GPU_KMS=y
CONFIG_DRM_PANEL=y

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions