Out of bounds write in kernel20 #4

Open
alexkyllo opened this issue Mar 6, 2021 · 2 comments

@alexkyllo

Hello,

I am looking to use this library in a project, so I compiled it and ran the dsscfg test program. It failed with what looks like an out-of-bounds write to a global memory array somewhere in the matupd::kernel20 kernel function.

This is on a GeForce RTX 2060 Super (CC 7.5) with CUDA 11.2.

$ ./bin/dsscfg 
Begin testing DSSCFG on the CPU (double precision)
CPU iteration 0 F: -0.349732
CPU iteration 0 F: 0.0192875
CPU iteration 0 F: -0.496264
CPU iteration 100 F: -0.888434
CPU iteration 200 F: -1.00093
CPU iteration 300 F: -1.01115
CPU iteration 400 F: -1.01126
CPU iteration 500 F: -1.01129
Timing: 9.94557 ms / iteration
Begin testing DSSCFG with CUDA (double precision)
CUDA iteration 0 F: -0.349732
CUDA iteration 0 F: -0.199307
CUDA iteration 0 F: -0.496264
lbfgsb failure: 700, /home/alex/lbfgsb-gpu/culbfgsb/./cauchy.cu, 594

Here's the cuda-memcheck output:

========= Invalid __global__ write of size 8
=========     at 0x000003d0 in void lbfgsbcuda::cuda::matupd::kernel20<int=256, double>(int, int, int, int, int, int, double const *, lbfgsbcuda::cuda::matupd::kernel20<int=256, double>*, double const )
=========     by thread (0,0,0) in block (135,0,0)
=========     Address 0x7f7999e22038 is out of bounds
=========     Device Frame:void lbfgsbcuda::cuda::matupd::kernel20<int=256, double>(int, int, int, int, int, int, double const *, lbfgsbcuda::cuda::matupd::kernel20<int=256, double>*, double const ) (void lbfgsbcuda::cuda::matupd::kernel20<int=256, double>(int, int, int, int, int, int, double const *, lbfgsbcuda::cuda::matupd::kernel20<int=256, double>*, double const ) : 0x3d0)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2b8) [0x222dd8]
=========     Host Frame:./bin/dsscfg [0x2136b]
=========     Host Frame:./bin/dsscfg [0x6de20]
=========     Host Frame:./bin/dsscfg [0xd0c25]
=========     Host Frame:./bin/dsscfg [0x9b098]
=========     Host Frame:./bin/dsscfg [0x9e633]
=========     Host Frame:./bin/dsscfg [0x102a4]
=========     Host Frame:./bin/dsscfg [0x9db6]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf3) [0x270b3]
=========     Host Frame:./bin/dsscfg [0xdf4e]
=========
@nepluno
Owner

nepluno commented Mar 7, 2021

Hi Alex,

I cannot reproduce the issue with the default settings. Have you changed any options? The only memory write in matupd::kernel20 is to buf_array_p, at Line 112 of matupd.cu. According to the cuda-memcheck output, for block (135, 0, 0) the write index is simply 135, while buf_array_p should be allocated with size 8272 (Line 143 of lbfgsbcuda.cpp). So there should be no memory access error as long as buf_array_p is correctly allocated.

Therefore, the only explanation I can think of is that the GPU memory is somehow not being allocated correctly. You could verify that buf_array_p is allocated with the correct size at Line 143 of lbfgsbcuda.cpp. Could it be a driver issue?

@alexkyllo
Author

Thanks for looking into it.
My NVIDIA driver version is 460.39. I am able to run other CUDA code on it.
I didn't change any options; I just tried to run the example. The steps I followed were:

git clone git@github.com:nepluno/lbfgsb-gpu.git
cd lbfgsb-gpu
mkdir build
cd build
cmake ..
make
./bin/dsscfg

The size m * normalpitch * 2 does appear to equal 8272, as you said.
To clarify, cuda-memcheck reports invalid write errors for many threads in this kernel; I just grabbed one. Here is the report for block (0, 0, 0):

========= Invalid __global__ write of size 8
=========     at 0x000003d0 in void lbfgsbcuda::cuda::matupd::kernel20<int=256, double>(int, int, int, int, int, int, double const *, lbfgsbcuda::cuda::matupd::kernel20<int=256, double>*, double const )
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7fae2de21c00 is out of bounds
=========     Device Frame:void lbfgsbcuda::cuda::matupd::kernel20<int=256, double>(int, int, int, int, int, int, double const *, lbfgsbcuda::cuda::matupd::kernel20<int=256, double>*, double const ) (void lbfgsbcuda::cuda::matupd::kernel20<int=256, double>(int, int, int, int, int, int, double const *, lbfgsbcuda::cuda::matupd::kernel20<int=256, double>*, double const ) : 0x3d0)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2b8) [0x222dd8]
=========     Host Frame:./bin/dsscfg [0x2136b]
=========     Host Frame:./bin/dsscfg [0x6de20]
=========     Host Frame:./bin/dsscfg [0xd0cc5]
=========     Host Frame:./bin/dsscfg [0x9b098]
=========     Host Frame:./bin/dsscfg [0x9e683]
=========     Host Frame:./bin/dsscfg [0x102a4]
=========     Host Frame:./bin/dsscfg [0x9db6]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf3) [0x270b3]
=========     Host Frame:./bin/dsscfg [0xdf4e]

It looks like the actual exception is:

thrust::system::detail::bad_alloc'
  what():  std::bad_alloc: cudaErrorIllegalAddress: an illegal memory access was encountered

So I agree that this looks like a memory allocation failure. Do you have any further suggestions for tracking down the root cause?
