-
Notifications
You must be signed in to change notification settings - Fork 79
Description
Problem Description
A simple vector test program, compiled using hipcc, will encounter a segmentation fault error when run.
The error occurs within the hipMemcpy function. By using gdb for tracing, it was discovered that it was at #0 0x00007ffff364a5ca in rocr::atomic::Store (order=std::memory_order_release, val=0, ptr=0x7fffee95a000)
at /usr/src/debug/rocm-runtime-6.3.0-2.oe2403sp1.riscv64/runtime/hsa-runtime/core/util/atomic_helpers.h:215
ptr=0x7fffee95a000 cannot be accessed?
my code:
#include <hip/hip_runtime.h>
#include
global void vectorAdd(const float *A, const float *B, float *C) {
int i = blockDim.x * blockIdx.x + threadIdx.x;
C[i] = A[i] + B[i];
}
int main() {
int numElements = 512;
size_t size = numElements * sizeof(float);
float *h_A = (float *)malloc(size);
float *h_B = (float *)malloc(size);
float *h_C = (float *)malloc(size);
for (int i = 0; i < numElements; ++i) {
h_A[i] = rand() / (float)RAND_MAX;
h_B[i] = rand() / (float)RAND_MAX;
}
float *d_A, *d_B, *d_C;
hipMalloc((void **)&d_A, size);
hipMalloc((void **)&d_B, size);
hipMalloc((void **)&d_C, size);
hipMemcpy(d_A, h_A, size, hipMemcpyHostToDevice);
hipMemcpy(d_B, h_B, size, hipMemcpyHostToDevice);
int threadsPerBlock = 256;
int blocksPerGrid = (numElements + threadsPerBlock - 1) / threadsPerBlock;
hipLaunchKernelGGL(vectorAdd, dim3(blocksPerGrid), dim3(threadsPerBlock), 0, 0, d_A, d_B, d_C);
hipMemcpy(h_C, d_C, size, hipMemcpyDeviceToHost);
for (int i = 0; i < numElements; ++i) {
if (fabs(h_A[i] + h_B[i] - h_C[i]) > 1e-5) {
fprintf(stderr, "Result verification failed at element %d!\n", i);
exit(EXIT_FAILURE);
}
}
printf("Test PASSED\n");
hipFree(d_A);
hipFree(d_B);
hipFree(d_C);
free(h_A);
free(h_B);
free(h_C);
return 0;
}
#0 0x00007ffff364a5ca in rocr::atomic::Store (order=std::memory_order_release, val=0, ptr=0x7fffee95a000)
at /usr/src/debug/rocm-runtime-6.3.0-2.oe2403sp1.riscv64/runtime/hsa-runtime/core/util/atomic_helpers.h:215
#1 rocr::AMD::AqlQueue::StoreRelaxed (this=0x55555567a4d0, value=0) at /usr/src/debug/rocm-runtime-6.3.0-2.oe2403sp1.riscv64/runtime/hsa-runtime/core/runtime/amd_aql_queue.cpp:464
#2 0x00007ffff36356e2 in rocr::AMD::BlitKernel::SubmitLinearCopyCommand (this=0x5555557411c0, dst=, src=, size=, dep_signals=..., out_signal=...,
gang_signals=...) at /usr/src/debug/rocm-runtime-6.3.0-2.oe2403sp1.riscv64/runtime/hsa-runtime/core/runtime/amd_blit_kernel.cpp:1161
#3 0x00007ffff363426e in rocr::AMD::BlitKernel::SubmitLinearCopyCommand (this=0x5555557411c0, dst=0x7fffee940000, src=0x7fffee930000, size=32768)
at /usr/src/debug/rocm-runtime-6.3.0-2.oe2403sp1.riscv64/runtime/hsa-runtime/core/runtime/amd_blit_kernel.cpp:1007
#4 0x00007ffff364d76e in rocr::(anonymous namespace)::RegionMemory::Freeze (this=0x555555684f80)
at /usr/src/debug/rocm-runtime-6.3.0-2.oe2403sp1.riscv64/runtime/hsa-runtime/core/runtime/amd_loader_context.cpp:354
#5 0x00007ffff3683278 in rocr::amd::hsa::loader::Segment::Freeze (this=0x555555678040) at /usr/src/debug/rocm-runtime-6.3.0-2.oe2403sp1.riscv64/runtime/hsa-runtime/loader/executable.cpp:705
#6 rocr::amd::hsa::loader::ExecutableImpl::Freeze (this=0x555555cdc3c0, options=)
at /usr/src/debug/rocm-runtime-6.3.0-2.oe2403sp1.riscv64/runtime/hsa-runtime/loader/executable.cpp:1944
#7 0x00007ffff3683e1e in rocr::amd::hsa::loader::AmdHsaCodeLoader::FreezeExecutable (this=0x5555556621a0, executable=0x555555cdc3c0, options=)
at /usr/src/debug/rocm-runtime-6.3.0-2.oe2403sp1.riscv64/runtime/hsa-runtime/loader/executable.cpp:236
#8 0x00007ffff3658054 in rocr::HSA::hsa_executable_freeze (executable=..., options=0x0) at /usr/src/debug/rocm-runtime-6.3.0-2.oe2403sp1.riscv64/runtime/hsa-runtime/core/runtime/hsa.cpp:2330
#9 0x00007ffff6e2379e in ?? () from /usr/lib64/libamdhip64.so.6
#10 0x00007ffff6db93f8 in ?? () from /usr/lib64/libamdhip64.so.6
#11 0x00007ffff6db945e in ?? () from /usr/lib64/libamdhip64.so.6
#12 0x00007ffff6e00ee2 in ?? () from /usr/lib64/libamdhip64.so.6
#13 0x00007ffff6daa772 in ?? () from /usr/lib64/libamdhip64.so.6
#14 0x00007ffff6e0f5b4 in ?? () from /usr/lib64/libamdhip64.so.6
#15 0x00007ffff6e3c2c4 in ?? () from /usr/lib64/libamdhip64.so.6
#16 0x00007ffff6e2d4ea in ?? () from /usr/lib64/libamdhip64.so.6
#17 0x00007ffff6e0fbc2 in ?? () from /usr/lib64/libamdhip64.so.6
#18 0x00007ffff6defca6 in ?? () from /usr/lib64/libamdhip64.so.6
#19 0x00007ffff6d3380c in ?? () from /usr/lib64/libamdhip64.so.6
#20 0x00007ffff6be2466 in ?? () from /usr/lib64/libamdhip64.so.6
#21 0x00007ffff6c8b13c in ?? () from /usr/lib64/libamdhip64.so.6
#22 0x00005555555572aa in main () at vectorAdd-cl.hip:63
This program can run normally on my previous MI210 + Linux 6.15 system, which is also a RISC-V CPU platform.
Operating System
Linux localhost.localdomain 6.12.43-25.08.29.14.riscv64
CPU
SG2044 riscv
GPU
W7900 dual slot
ROCm Version
6.3.4
ROCm Component
No response
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
ROCk module is loaded
HSA System Attributes
Runtime Version: 1.1
Runtime Ext Version: 1.6
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
Agent 1
Name:
Uuid: CPU-XX
Marketing Name:
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 65536(0x10000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2600
BDFID: 0
Internal Node ID: 0
Compute Unit: 64
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 131595684(0x7d7fda4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 131595684(0x7d7fda4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 131595684(0x7d7fda4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 4
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 131595684(0x7d7fda4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
Agent 2
Name: gfx1100
Uuid: GPU-d66e739c6390904e
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 6144(0x1800) KB
L3: 98304(0x18000) KB
Chip ID: 29770(0x744a)
ASIC Revision: 0(0x0)
Cacheline Size: 128(0x80)
Max Clock Freq. (MHz): 1760
BDFID: 768
Internal Node ID: 1
Compute Unit: 96
SIMDs per CU: 2
Shader Engines: 6
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 552
SDMA engine uCode:: 24
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 47169536(0x2cfc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 47169536(0x2cfc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1100
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
Additional Information
Display Engine Configuration
CONFIG_DRM_AMD_DC=y
CONFIG_DRM_AMD_DC_FP=y
CONFIG_DRM_AMD_DC_SI=y
CONFIG_DEBUG_KERNEL_DC=y
CONFIG_DRM_AMD_SECURE_DISPLAY is not set
end of Display Engine Configuration
CONFIG_HSA_AMD=y
CONFIG_DRM_NOUVEAU=m
CONFIG_NOUVEAU_DEBUG=5
CONFIG_NOUVEAU_DEBUG_DEFAULT=3
CONFIG_NOUVEAU_DEBUG_MMU is not set
CONFIG_NOUVEAU_DEBUG_PUSH is not set
CONFIG_DRM_NOUVEAU_BACKLIGHT=y
CONFIG_DRM_NOUVEAU_GSP_DEFAULT is not set
CONFIG_DRM_XE is not set
CONFIG_DRM_VGEM=m
CONFIG_DRM_VKMS=m
CONFIG_DRM_UDL=m
CONFIG_DRM_AST=y
CONFIG_DRM_MGAG200=m
CONFIG_DRM_QXL=m
CONFIG_DRM_VIRTIO_GPU=m
CONFIG_DRM_VIRTIO_GPU_KMS=y
CONFIG_DRM_PANEL=y