pyopencl._cl.MemoryError: clEnqueueNDRangeKernel failed: MEM_OBJECT_ALLOCATION_FAILURE #14

btguilherme opened this issue Jan 30, 2019

@btguilherme

Hi,
I am using the gputools library in my project and I have a problem. When I try to process images with the nlm3 filter and the GPU does not have enough memory for the number of images, I get the error shown in the attached image. However, the GPU continues to process the images. Is there any solution to this problem (killing the process)?
Thanks!
[screenshot: error message]

@maweigert
Owner

Hi,

Thanks for the feedback!

> the GPU continues to process the images.

I don't fully understand that. You mean the GPU memory is still allocated? Sometimes a memory allocation error like this does lead to strange behaviour afterwards - how large was the image, and what does nvidia-smi (given you have an Nvidia card) show before and after the nlm3 call?

@btguilherme
Author

Hello! Thanks for the reply!

> I don't fully understand that. You mean the GPU memory is still allocated?

Exactly, the memory is still allocated and the video card keeps processing (the Windows task manager shows that the images are still being computed even after the memory error).

[screenshot: initial state]

> how large was the image

I am using an AMD Radeon R750 with 4 GB, and I am trying to process 2000x2000-pixel TIFF images (in fact, the error also happens with RAW files).

In the attached images you can see that the algorithm breaks but keeps processing the images. The prompt only becomes available after processing ends (the result of this test was obtained by processing 156 slices of a RAW file, 980x1008 pixels each).

[screenshot: final state]

I know the problem is memory allocation (it happens specifically at line 97 of the nlm3.py file, where the accBuf /= weightBuf division happens). The strange thing is that it neither deallocates the memory nor stops processing.

@maweigert
Owner

That's weird, given that accBuf /= weightBuf is an in-place division and doesn't even allocate extra memory... So the shape of the input image was (156, 980, 1008)?
Sadly, all our Windows machines only have Nvidia cards, so that's going to be a hard one to debug...

@btguilherme
Author

> That's weird, given that accBuf /= weightBuf is an in-place division and doesn't even allocate extra memory

In fact, on the machine I am using there is an extra memory allocation: there is a peak in video memory consumption right after the loops at lines 67-69 of the nlm3 function, which is exactly the accBuf /= weightBuf call. Maybe it's a "problem" with AMD cards (?), or maybe with pyopencl (the version installed here is 2018.2.2+cl12).

> So the shape of the input image was (156, 980, 1008)?

Yes.

For now I'm working around the problem by splitting the set of images into subsets, processing each subset and storing the result in another array (roughly as in the sketch below).
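
A rough sketch of that chunked workaround (illustrative only: the chunk size and sigma are made up, and I'm assuming nlm3 is importable from the top-level package; note that naive chunking along z can leave faint seams at chunk borders, since NLM uses a 3D search window):

import numpy as np
from gputools import nlm3  # assumption: nlm3 is exposed at the package level (otherwise: from gputools.denoise import nlm3)

def nlm3_chunked(data, sigma, chunk=32):
    # denoise a 3D stack in z-chunks so that each call fits into GPU memory
    out = np.empty_like(data)
    for z0 in range(0, data.shape[0], chunk):
        z1 = min(z0 + chunk, data.shape[0])
        out[z0:z1] = nlm3(data[z0:z1], sigma)
    return out

# e.g. for the (156, 980, 1008) stack mentioned above:
# result = nlm3_chunked(stack.astype(np.float32), sigma=10)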

Thank you again for your time! By the way, good work in developing this project.

@maweigert
Owner

maweigert commented Feb 6, 2019

So if you run something like this, do you see peak memory usage greater than 2.1 GB (it doesn't go above that on my machine)?

import numpy as np 
from gputools import OCLArray

x = np.ones((1024,1024,256), np.float32)

x_g = OCLArray.from_array(x)
y_g = OCLArray.from_array(x)
x_g /= y_g

And thanks for the feedback! :)

@btguilherme
Author

I ran the code 10 times in a row and got this result:

[chart: peak memory usage across the 10 runs]

Indeed, there are peaks, as shown in the attached image, at indices 4, 5, 6 and 9.

@maweigert
Owner

So if you change the shape to (1024,1024,400), does it crash?

@btguilherme
Author

Yes
[screenshot of the error]

@maweigert
Owner

Same if you exchange the elementwise divide for an elementwise multiply?

import numpy as np 
from gputools import OCLArray
x = np.ones((1024,1024,400), np.float32)
x_g = OCLArray.from_array(x)
y_g = OCLArray.from_array(x)
x_g *= y_g

@btguilherme
Author

With x = np.ones((1024,1024,400), np.float32) the code doesn't crash. But if I set x = np.ones((1024,1024,**450**), np.float32), then it crashes.

@maweigert
Owner

What happens if you run the following code in the same way? Does it still crash?

import numpy as np 
from gputools import OCLArray, get_device, OCLElementwiseKernel

x = np.empty((1024,1024,400), np.float32)
x_g = OCLArray.from_array(x)
y_g = OCLArray.from_array(x)

k = OCLElementwiseKernel(
    "float *a, float *b",
    "a[i] = a[i]/b[i]",
    "divide_inplace")

k(x_g, y_g)

@btguilherme
Author

It exhibits the same behavior, running smoothly when x = np.ones((1024,1024,400), np.float32) and crashing when x = np.ones((1024,1024,450), np.float32).

@maweigert
Owner

Interesting. So it seems that pyopencl's "/=" operator is not in-place, whereas the "*=" operator is (i.e. it does not allocate additional memory). I would guess that (1024,1024,450) fails because it might be slightly above the available memory of your card - so I would not be too worried about that.
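
A quick back-of-the-envelope check is consistent with that (assuming the non-in-place "/=" allocates one temporary buffer the same size as the operands):

import numpy as np

def gib(shape):
    # size of one float32 array of the given shape, in GiB
    return np.prod(shape) * 4 / 2**30

print(gib((1024, 1024, 400)))  # ~1.56 GiB: two buffers (~3.1 GiB) fit on a 4 GB card,
                               # but a third temporary for "/=" (~4.7 GiB total) does not
print(gib((1024, 1024, 450)))  # ~1.76 GiB: even two buffers (~3.5 GiB) are already
                               # close to the card's usable limit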

So you could use the in-place divide kernel from above as a workaround:

k = OCLElementwiseKernel(
    "float *a, float *b",
    "a[i] = a[i]/b[i]",
    "divide_inplace")

...

# was: accBuf /= weightBuf
k(accBuf, weightBuf)

That should rid you of the original allocation error. Why the kernel keeps on running, however, I still have no idea about ;)

@btguilherme
Author

The proposed modifications really worked! I no longer get the memory error, so the program is now limited only by system RAM. Many thanks for the support @maweigert!

> Why the kernel keeps on running, however, I still have no idea about ;)

In fact, it is very strange behavior.

@xiuliren

xiuliren commented Jun 6, 2023

I am processing an array with shape (1024, 1024, 512) and dtype float32.
I get the same error.
