This repo contains PyTorch extensions for 2D convolution operations, written in C++ with OpenGL, OpenMP, and Boost Compute. The motivation for writing faster extensions is that PyTorch's native GPU-accelerated 2D convolution is currently implemented with cuDNN, which does not support integrated GPUs. I have therefore written extensions that run convolution operations on the CPU, on the integrated GPU, and shared across CPU and GPU.
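For readers unfamiliar with the operation being accelerated, here is a minimal sketch of a direct 2D convolution on the CPU with an OpenMP-parallel outer loop. It is an illustration only, not this repo's implementation: the function name `conv2d_valid`, the single-channel row-major layout, stride 1, and the absence of padding are all assumptions. Like PyTorch's `conv2d`, it computes a cross-correlation (the filter is not flipped).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from this repo): direct "valid" 2D cross-correlation
// of a single-channel, row-major input with a single filter, stride 1, no padding.
std::vector<float> conv2d_valid(const std::vector<float>& input,
                                std::size_t in_h, std::size_t in_w,
                                const std::vector<float>& kernel,
                                std::size_t k_h, std::size_t k_w) {
    assert(input.size() == in_h * in_w);
    assert(kernel.size() == k_h * k_w);
    assert(k_h >= 1 && k_w >= 1 && k_h <= in_h && k_w <= in_w);

    const std::size_t out_h = in_h - k_h + 1;
    const std::size_t out_w = in_w - k_w + 1;
    std::vector<float> output(out_h * out_w, 0.0f);

    // Output rows are independent, so they can be divided among threads.
    // If the compiler is not invoked with OpenMP enabled (e.g. -fopenmp),
    // the pragma is ignored and the loop simply runs serially.
    #pragma omp parallel for
    for (std::ptrdiff_t row = 0; row < static_cast<std::ptrdiff_t>(out_h); ++row) {
        const std::size_t oy = static_cast<std::size_t>(row);
        for (std::size_t ox = 0; ox < out_w; ++ox) {
            float acc = 0.0f;
            // Slide the filter window over the input and accumulate products.
            for (std::size_t ky = 0; ky < k_h; ++ky) {
                for (std::size_t kx = 0; kx < k_w; ++kx) {
                    acc += input[(oy + ky) * in_w + (ox + kx)] * kernel[ky * k_w + kx];
                }
            }
            output[oy * out_w + ox] = acc;
        }
    }
    return output;
}
```

Parallelizing over output rows keeps each thread writing to a disjoint slice of `output`, so no synchronization is needed inside the loop. An optimized extension would likely add tiling or vectorization on top of this basic structure.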
Hardware: Intel(R) Core(TM) i7-14700HX (Raptor Lake), Intel(R) UHD Graphics
Test size: 4096x4096 input tensor, 256x356 filter tensor
Native PyTorch runtime: 0.30904 secs
Extension runtime: 0.05667 secs
Final Speedup: ~88%
Native PyTorch runtime: 0.30904 secs
Extension runtime: 0.6036 secs
Final Speedup: None (the extension is slower than native PyTorch)
Native PyTorch runtime: 0.30904 secs
Extension runtime: 0.28835 secs
Final Speedup: ~13%
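A speedup percentage can be defined in more than one way, and this README does not state which definition it uses. The sketch below shows two common ones computed from a baseline and an extension runtime. The function names `speedup_factor` and `runtime_reduction` are hypothetical and chosen for illustration only.

```cpp
#include <cassert>

// Hypothetical helper: how many times faster the extension is than the baseline.
// A value below 1.0 means the extension is slower.
double speedup_factor(double baseline_secs, double extension_secs) {
    assert(baseline_secs > 0.0 && extension_secs > 0.0);
    return baseline_secs / extension_secs;
}

// Hypothetical helper: fraction of the baseline runtime that was eliminated.
// A negative value means the extension takes longer than the baseline.
double runtime_reduction(double baseline_secs, double extension_secs) {
    assert(baseline_secs > 0.0 && extension_secs > 0.0);
    return (baseline_secs - extension_secs) / baseline_secs;
}
```

With the runtimes listed above, `speedup_factor(0.30904, 0.05667)` is roughly 5.45 and `runtime_reduction(0.30904, 0.05667)` is roughly 0.82. The reported percentages may rest on a different definition or on different runs, so these helpers are only one way to read the numbers.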