Out of bounds write in kernel20 #4

alexkyllo · 2021-03-06T22:16:26Z

Hello,

I am looking to use this library in a project, so I compiled it and ran the dsscfg test program. It failed with what looks like an out-of-bounds write to a global memory array somewhere in the matupd::kernel20 kernel function.

This is on a GeForce RTX 2060 Super (CC 7.5) with CUDA 11.2.

$ ./bin/dsscfg 
Begin testing DSSCFG on the CPU (double precision)
CPU iteration 0 F: -0.349732
CPU iteration 0 F: 0.0192875
CPU iteration 0 F: -0.496264
CPU iteration 100 F: -0.888434
CPU iteration 200 F: -1.00093
CPU iteration 300 F: -1.01115
CPU iteration 400 F: -1.01126
CPU iteration 500 F: -1.01129
Timing: 9.94557 ms / iteration
Begin testing DSSCFG with CUDA (double precision)
CUDA iteration 0 F: -0.349732
CUDA iteration 0 F: -0.199307
CUDA iteration 0 F: -0.496264
lbfgsb failure: 700, /home/alex/lbfgsb-gpu/culbfgsb/./cauchy.cu, 594

Here's the cuda-memcheck output:

========= Invalid __global__ write of size 8
=========     at 0x000003d0 in void lbfgsbcuda::cuda::matupd::kernel20<int=256, double>(int, int, int, int, int, int, double const *, lbfgsbcuda::cuda::matupd::kernel20<int=256, double>*, double const )
=========     by thread (0,0,0) in block (135,0,0)
=========     Address 0x7f7999e22038 is out of bounds
=========     Device Frame:void lbfgsbcuda::cuda::matupd::kernel20<int=256, double>(int, int, int, int, int, int, double const *, lbfgsbcuda::cuda::matupd::kernel20<int=256, double>*, double const ) (void lbfgsbcuda::cuda::matupd::kernel20<int=256, double>(int, int, int, int, int, int, double const *, lbfgsbcuda::cuda::matupd::kernel20<int=256, double>*, double const ) : 0x3d0)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2b8) [0x222dd8]
=========     Host Frame:./bin/dsscfg [0x2136b]
=========     Host Frame:./bin/dsscfg [0x6de20]
=========     Host Frame:./bin/dsscfg [0xd0c25]
=========     Host Frame:./bin/dsscfg [0x9b098]
=========     Host Frame:./bin/dsscfg [0x9e633]
=========     Host Frame:./bin/dsscfg [0x102a4]
=========     Host Frame:./bin/dsscfg [0x9db6]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf3) [0x270b3]
=========     Host Frame:./bin/dsscfg [0xdf4e]
=========

nepluno · 2021-03-07T06:53:50Z

Hi Alex,

I cannot reproduce the issue with the default settings. Have you changed any options? The only memory write in matupd::kernel20 is to buf_array_p at Line 112 of matupd.cu. According to the cuda-memcheck output, for block (135, 0, 0) the write index is simply 135, while buf_array_p should be allocated with size 8272 (Line 143 of lbfgsbcuda.cpp). So there should be no memory access error as long as buf_array_p is correctly allocated.

Therefore, the only reason I can think of is that the GPU memory is somehow not correctly allocated. You could verify whether buf_array_p has been allocated with the correct size at Line 143 of lbfgsbcuda.cpp. Could it be a driver issue?

alexkyllo · 2021-03-08T17:12:43Z

Thanks for looking into it.
My NVIDIA driver version is 460.39. I am able to run other CUDA code on it.
I didn't change any options; I just tried to run the example. The steps were:

git clone git@github.com:nepluno/lbfgsb-gpu.git
cd lbfgsb-gpu
mkdir build
cd build
cmake ..
make
./bin/dsscfg

The size, m * normalpitch * 2, does appear to equal 8272, as you said.
To clarify, cuda-memcheck reports invalid writes for many threads in this kernel; I just grabbed one. Here is the report for block (0, 0, 0):

========= Invalid __global__ write of size 8
=========     at 0x000003d0 in void lbfgsbcuda::cuda::matupd::kernel20<int=256, double>(int, int, int, int, int, int, double const *, lbfgsbcuda::cuda::matupd::kernel20<int=256, double>*, double const )
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7fae2de21c00 is out of bounds
=========     Device Frame:void lbfgsbcuda::cuda::matupd::kernel20<int=256, double>(int, int, int, int, int, int, double const *, lbfgsbcuda::cuda::matupd::kernel20<int=256, double>*, double const ) (void lbfgsbcuda::cuda::matupd::kernel20<int=256, double>(int, int, int, int, int, int, double const *, lbfgsbcuda::cuda::matupd::kernel20<int=256, double>*, double const ) : 0x3d0)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2b8) [0x222dd8]
=========     Host Frame:./bin/dsscfg [0x2136b]
=========     Host Frame:./bin/dsscfg [0x6de20]
=========     Host Frame:./bin/dsscfg [0xd0cc5]
=========     Host Frame:./bin/dsscfg [0x9b098]
=========     Host Frame:./bin/dsscfg [0x9e683]
=========     Host Frame:./bin/dsscfg [0x102a4]
=========     Host Frame:./bin/dsscfg [0x9db6]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf3) [0x270b3]
=========     Host Frame:./bin/dsscfg [0xdf4e]

It looks like the actual exception is:

thrust::system::detail::bad_alloc'
  what():  std::bad_alloc: cudaErrorIllegalAddress: an illegal memory access was encountered

So I agree that this looks like a memory allocation failure. Any further suggestions for how to track down the root cause?
