pyopencl._cl.MemoryError: clEnqueueNDRangeKernel failed: MEM_OBJECT_ALLOCATION_FAILURE #14

btguilherme opened this issue Jan 30, 2019

@btguilherme

Hi,
I am using the gputools library in my project and I have a problem. When I try to process images with the nlm3 filter and the GPU does not have enough memory for the number of images, I get the error shown in the attached image. However, the GPU continues to process the images. Is there any solution to this problem (killing the process)?
Thanks!
[screenshot: error message]

@maweigert
Owner

Hi,

Thanks for the feedback!

> the GPU continues to process the images.

I don't fully understand that. You mean the GPU memory is still allocated? Sometimes a memory allocation error like this does lead to strange behaviour afterwards - how large was the image, and what does nvidia-smi (given you have an Nvidia card) show before and after the nlm3 call?

@btguilherme
Author

Hello! Thanks for the reply!

> I don't fully understand that. You mean the GPU memory is still allocated?

Exactly, the memory is still allocated and the video card keeps processing (the Windows task manager shows that the images are still being computed even after the memory error).

[screenshot: initial state]

> how large was the image

I am using an AMD Radeon R750 with 4 GB, and I am trying to process 2000x2000-pixel TIFF images (in fact, the error also happens with RAW files).

In the attached images you can see that the algorithm breaks but keeps processing the images. The prompt only becomes available after processing ends (the result of this test was obtained by processing 156 slices of a RAW file, 980x1008 pixels each).

[screenshot: final state]

I know the problem is memory allocation (it happens specifically at line 97 of the nlm3.py file, where the accBuf /= weightBuf division happens). The strange thing is that it neither deallocates the memory nor stops processing.

@maweigert
Owner

That's weird, given that accBuf /= weightBuf is an in-place division and doesn't even allocate extra memory... So the shape of the input image was (156, 980, 1008)?
Sadly, all our Windows machines only have Nvidia cards, so that's going to be a hard one to debug...

@btguilherme
Author

> That's weird, given that accBuf /= weightBuf is an in-place division and doesn't even allocate extra memory

In fact, on the machine I am using there is an extra memory allocation: there is a peak in video memory consumption right after the loops at lines 67-69 of the nlm3 function, which is exactly the accBuf /= weightBuf call. Maybe it's a "problem" with AMD cards (?), or maybe with pyopencl (the version installed here is 2018.2.2+cl12).

> So the shape of the input image was (156, 980, 1008)?

Yes.

For now I'm working around the problem by splitting the set of images into subsets, processing each subset and storing the result in another array (roughly as in the sketch below).
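
A rough sketch of that chunked workaround (illustrative only: the chunk size and sigma are made up, and I'm assuming nlm3 is importable from the top-level package; note that naive chunking along z can leave faint seams at chunk borders, since NLM uses a 3D search window):

import numpy as np
from gputools import nlm3  # assumption: nlm3 is exposed at the package level (otherwise: from gputools.denoise import nlm3)

def nlm3_chunked(data, sigma, chunk=32):
    # denoise a 3D stack in z-chunks so that each call fits into GPU memory
    out = np.empty_like(data)
    for z0 in range(0, data.shape[0], chunk):
        z1 = min(z0 + chunk, data.shape[0])
        out[z0:z1] = nlm3(data[z0:z1], sigma)
    return out

# e.g. for the (156, 980, 1008) stack mentioned above:
# result = nlm3_chunked(stack.astype(np.float32), sigma=10)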

Thank you again for your time! By the way, good work in developing this project.

@maweigert
Owner

maweigert commented Feb 6, 2019

So if you run something like this, do you see peak memory usage greater than 2.1 GB (it doesn't go above that on my machine)?

import numpy as np 
from gputools import OCLArray

x = np.ones((1024,1024,256), np.float32)

x_g = OCLArray.from_array(x)
y_g = OCLArray.from_array(x)
x_g /= y_g

And thanks for the feedback! :)

@btguilherme
Author

I ran the code 10 times in a row and got this result:

[chart: peak memory usage across the 10 runs]

Indeed, there are peaks, as shown in the attached image, at indices 4, 5, 6 and 9.

@maweigert
Owner

So if you change the shape to (1024,1024,400), does it crash?

@btguilherme
Author

Yes
[screenshot of the error]

@maweigert
Owner

Same if you exchange the elementwise divide for an elementwise multiply?

import numpy as np 
from gputools import OCLArray
x = np.ones((1024,1024,400), np.float32)
x_g = OCLArray.from_array(x)
y_g = OCLArray.from_array(x)
x_g *= y_g

@btguilherme
Author

With x = np.ones((1024,1024,400), np.float32) the code doesn't crash. But if I set x = np.ones((1024,1024,**450**), np.float32), then it crashes.

@maweigert
Owner

What happens if you run the following code in the same way? Does it still crash?

import numpy as np 
from gputools import OCLArray, get_device, OCLElementwiseKernel

x = np.empty((1024,1024,400), np.float32)
x_g = OCLArray.from_array(x)
y_g = OCLArray.from_array(x)

k = OCLElementwiseKernel(
    "float *a, float *b",
    "a[i] = a[i]/b[i]",
    "divide_inplace")

k(x_g, y_g)

@btguilherme
Author

It exhibits the same behavior, running smoothly when x = np.ones((1024,1024,400), np.float32) and crashing when x = np.ones((1024,1024,450), np.float32).

@maweigert
Owner

Interesting. So it seems that pyopencl's "/=" operator is not in-place, whereas the "*=" operator is (i.e. it does not allocate additional memory). I would guess that (1024,1024,450) fails because it might be slightly above the available memory of your card - so I would not be too worried about that.
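
A quick back-of-the-envelope check is consistent with that (assuming the non-in-place "/=" allocates one temporary buffer the same size as the operands):

import numpy as np

def gib(shape):
    # size of one float32 array of the given shape, in GiB
    return np.prod(shape) * 4 / 2**30

print(gib((1024, 1024, 400)))  # ~1.56 GiB: two buffers (~3.1 GiB) fit on a 4 GB card,
                               # but a third temporary for "/=" (~4.7 GiB total) does not
print(gib((1024, 1024, 450)))  # ~1.76 GiB: even two buffers (~3.5 GiB) are already
                               # close to the card's usable limit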

So you could use the in-place divide kernel from above as a workaround:

k = OCLElementwiseKernel(
    "float *a, float *b",
    "a[i] = a[i]/b[i]",
    "divide_inplace")

...

# was: accBuf /= weightBuf
k(accBuf, weightBuf)

That should rid you of the original allocation error. Why the kernel keeps on running, however, I still have no idea about ;)

@btguilherme
Author

The proposed modifications really worked! I no longer get the memory error, so the program is now limited only by system RAM. Many thanks for the support @maweigert!

> Why the kernel keeps on running, however, I still have no idea about ;)

In fact, it is very strange behavior.

@xiuliren

xiuliren commented Jun 6, 2023

I am processing an array with shape (1024, 1024, 512) and dtype float32.
I get the same error.
