WIP: PythonBackend/cl_run_kernel rewrite #28
Hi!
I started to write a PyOpenCL-based backend and realized that all arrays are created in host memory. Afterwards they are copied to the OpenCL device, used by the kernel, and copied back by cl_run_kernel.
I would like to write functions that just create references to the arrays in OpenCL device memory and pass them around, because transfers to and from GPUs can be slow. What do you think?
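For reference, a minimal sketch of the proposed pattern, assuming pyopencl; the kernel source, names, and sizes here are invented for illustration and this is not the actual cl_run_kernel code. The contrast is between copying data in and out around a single kernel call and keeping one device buffer handle that is reused across launches.

```python
import numpy as np
import pyopencl as cl

# Hypothetical kernel just for illustration.
src = """
__kernel void scale(__global float *u, const float a) {
    int i = get_global_id(0);
    u[i] *= a;
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prg = cl.Program(ctx, src).build()
mf = cl.mem_flags

u_host = np.arange(1024, dtype=np.float32)

# Pattern 1: copy to the device, run one kernel, copy back immediately.
u_buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=u_host)
prg.scale(queue, u_host.shape, None, u_buf, np.float32(2.0))
cl.enqueue_copy(queue, u_host, u_buf)

# Pattern 2: create the buffer once and pass the handle around;
# the data stays on the device between kernel launches.
u_dev = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=u_host)
prg.scale(queue, u_host.shape, None, u_dev, np.float32(2.0))
prg.scale(queue, u_host.shape, None, u_dev, np.float32(0.5))
cl.enqueue_copy(queue, u_host, u_dev)  # copy back only when the host needs the result
queue.finish()
```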
This is the same approach I have been using for Julia (unpublished alpha version, cf. #13 and some references linked therein). I don't know exactly how pyopencl handles the lifetime of buffers on the GPU. Relying strictly on the OpenCL standard, there might be problems if a kernel is called that does not take a specific buffer as an argument; that's why we are using kernels with partly irrelevant arguments. Thus, doing the same in pyopencl should be fine. @philipheinisch could tell you more about that.
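A rough sketch of my reading of the "partly irrelevant arguments" idea, assuming pyopencl; the kernel and buffer names are invented and this is not the actual Julia or Python code. The kernel accepts a buffer it never touches, so that buffer is still attached to every launch it should survive.

```python
import numpy as np
import pyopencl as cl

# `aux` is a deliberately unused ("partly irrelevant") kernel argument:
# passing it keeps the buffer associated with the launch, which is the
# mechanism described in the comment above for keeping device data valid.
src = """
__kernel void advance(__global float *u, __global const float *aux) {
    int i = get_global_id(0);
    u[i] += 1.0f;   /* aux is intentionally not read or written */
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prg = cl.Program(ctx, src).build()
mf = cl.mem_flags

u = np.zeros(256, dtype=np.float32)
aux = np.ones(256, dtype=np.float32)
u_buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=u)
aux_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=aux)

prg.advance(queue, u.shape, None, u_buf, aux_buf)  # aux_buf passed although unused
cl.enqueue_copy(queue, u, u_buf)
queue.finish()
```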
In most cases the data is left on the device because multiple kernels are queued back to back without copies. If you want your data to live exclusively on the device, you need to make sure enough memory is available on the device at all times. Due to the way OpenCL handles memory, this is not trivial, as @ranocha explained. Additional care has to be taken with divergence cleaning to prevent memory problems if the device runs out of memory or the memory manager thinks it might. It might be possible to circumvent some of the host-device copies that are still performed, but their influence on the total runtime is so small that it is not worth risking memory problems that may be hard to debug, especially as only the kernel time matters when benchmarking the performance of the numerical method. To summarize: it is a good idea to avoid as many copy instructions as possible, but purely device-resident buffers are tricky.
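To make the "queued back to back without copies" point concrete, a small sketch under assumptions of my own (the two-stage kernel split and all names are invented): only the final result is copied back to the host, while enough device memory has to be available for every buffer involved, including the temporary one.

```python
import numpy as np
import pyopencl as cl

# Two hypothetical stages of a time step; the intermediate result stays in
# device memory and is never copied back to the host between the launches.
src = """
__kernel void stage1(__global const float *u, __global float *tmp) {
    int i = get_global_id(0);
    tmp[i] = 0.5f * u[i];
}
__kernel void stage2(__global float *u, __global const float *tmp) {
    int i = get_global_id(0);
    u[i] += tmp[i];
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)            # default in-order queue: launches run back to back
prg = cl.Program(ctx, src).build()
mf = cl.mem_flags

u = np.arange(512, dtype=np.float32)
u_buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=u)
tmp_buf = cl.Buffer(ctx, mf.READ_WRITE, size=u.nbytes)  # must fit in device memory as well

for _ in range(10):                     # e.g. ten time steps, no host copies inside the loop
    prg.stage1(queue, u.shape, None, u_buf, tmp_buf)
    prg.stage2(queue, u.shape, None, u_buf, tmp_buf)

cl.enqueue_copy(queue, u, u_buf)        # single copy back at the very end
queue.finish()
```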
Thanks. The documentation of PyOpenCL is inconclusive on this topic. I wrote both versions.