Merge branch 'release_prep' of github.com:BradleySappington/poppy into release_prep
BradleySappington committed May 11, 2023
2 parents bc7a3c5 + 0ab71e2 commit ffbbf62
Showing 6 changed files with 137 additions and 229 deletions.
78 changes: 78 additions & 0 deletions docs/gpu_acceleration.rst
@@ -0,0 +1,78 @@
GPU Accelerated Optical Calculations
====================================



.. admonition:: Placeholder docs

   This page is a placeholder for more complete documentation to be added later about usage of GPUs for fast optical calculations.



Thanks to team members Kian Milani and Ewan Douglas (University of Arizona), poppy now includes a
high-performance option that uses NVIDIA GPUs to significantly accelerate optical calculations, in some
cases by ~20x to 80x.

This implementation keeps all calculations on the GPU until the end of propagation. This
reduces calculation time because arrays no longer need to be transferred between GPU memory and
host (CPU) memory between steps. It also allows GPU acceleration of the
majority of calculations performed during an optical propagation: creating models of optical
elements and applying them to wavefronts happens on the GPU, as do the propagation calculations
from one plane to another.
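The idea of keeping data on the device until the very end can be illustrated with a minimal sketch. This is not poppy's internal code: the names ``xp``, ``psf_intensity`` and ``to_host`` are hypothetical, and the example falls back to numpy when CuPy or a CUDA device is unavailable.

```python
import numpy as np

try:
    import cupy as xp          # GPU array library with a numpy-like API
    xp.zeros(1)                # raises if no usable CUDA device is present
    on_gpu = True
except Exception:              # CuPy missing or no GPU: fall back to numpy
    xp = np
    on_gpu = False


def psf_intensity(npix=256, oversample=2):
    """Hypothetical helper: circular aperture -> far-field (Fraunhofer) PSF.

    Every intermediate array is created with `xp`, so on a GPU nothing
    leaves device memory inside this function.
    """
    n = npix * oversample
    coords = xp.arange(n) - n / 2
    x, y = xp.meshgrid(coords, coords)
    pupil = (xp.sqrt(x**2 + y**2) <= npix / 2).astype(xp.complex128)
    field = xp.fft.fftshift(xp.fft.fft2(xp.fft.ifftshift(pupil)))
    intensity = xp.abs(field) ** 2
    intensity /= intensity.sum()   # normalize total flux to 1
    return intensity


def to_host(arr):
    """Single device-to-host transfer, done once at the end."""
    return xp.asnumpy(arr) if on_gpu else arr


psf = to_host(psf_intensity())
```

The only transfer happens in ``to_host``; the aperture model, the FFT and the normalization all operate on arrays that stay wherever ``xp`` allocated them.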

An updated implementation using `CuPy <https://docs.cupy.dev/en/stable/overview.html>`_ replaces
the earlier support for CUDA using pyculib and numba.cuda. (That initial implementation has been
removed because the CuPy implementation performs much better.)

Note that because cupy is used as a replacement for numpy at import time, it is tricky to toggle
between GPU and CPU calculations within the same Python session. Doing so is advanced usage: it can
be useful for debugging or benchmarking, but switching calculation backends within a session is
neither fully supported nor recommended.
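To make the "decide once per session" pattern concrete, here is a minimal sketch of import-time backend selection. It is an illustration only, not poppy's mechanism: the function ``load_backend`` and the environment variable ``ARRAY_BACKEND`` are hypothetical names invented for this example.

```python
import os

import numpy as np


def load_backend(environ=os.environ):
    """Return (module, name), chosen once before any arrays are created.

    ARRAY_BACKEND is a hypothetical variable used only in this sketch;
    setting it to "numpy" forces the CPU path.
    """
    if environ.get("ARRAY_BACKEND", "auto").lower() != "numpy":
        try:
            import cupy
            cupy.zeros(1)      # fails early if no usable CUDA device
            return cupy, "cupy"
        except Exception:
            pass               # CuPy missing or unusable: use numpy
    return np, "numpy"


# Bound once at import time; code written against `xp` uses this backend
# for the rest of the session.
xp, backend_name = load_backend()
```

Because ``xp`` is bound when the module is imported, arrays created afterwards all live on the same backend. Rebinding it mid-session would leave existing arrays on the old backend, which is one reason switching within a session is awkward.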


**What about AMD GPUs?**

There is also earlier, partial support for OpenCL on AMD GPUs, using the ``pyopencl`` and ``gpyfft``
packages. This provides much smaller performance gains than the CuPy version, however, since only
FFTs are performed on the GPU, not the other parts of the optical propagation calculations.

**What about Apple Silicon GPUs?**

Poppy does not yet have support for the specialized GPU hardware in Apple Silicon M1/M2 and similar.
For these machines, plain numpy is the best option.

Requirements and Setup
----------------------


Requires NVIDIA GPU hardware.

Requires CuPy > 10.0. Install from https://cupy.dev following the `CuPy installation docs <https://docs.cupy.dev/en/stable/install.html#>`_.

Also requires the cupyx GPU-accelerated version of scipy.
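A quick way to check these requirements before running a calculation might look like the sketch below. ``check_cupy`` is a hypothetical helper, not part of poppy; it assumes the version requirement is read literally as "greater than 10.0" and uses only the major and minor version numbers.

```python
import importlib.util


def check_cupy(min_version=(10, 0)):
    """Hypothetical helper: return (ok, message) about CuPy usability."""
    if importlib.util.find_spec("cupy") is None:
        return False, "CuPy is not installed"
    try:
        import cupy
        import cupyx.scipy  # noqa: F401  GPU counterpart of scipy
        version = tuple(int(p) for p in cupy.__version__.split(".")[:2])
        n_devices = cupy.cuda.runtime.getDeviceCount()
    except Exception as err:   # e.g. missing CUDA driver or runtime
        return False, f"CuPy is installed but not usable: {err}"
    if version <= min_version:
        return False, f"CuPy {cupy.__version__} is too old"
    if n_devices < 1:
        return False, "No CUDA device found"
    return True, f"CuPy {cupy.__version__} with {n_devices} CUDA device(s)"


ok, message = check_cupy()
print(message)
```

On a machine without an NVIDIA GPU this reports the reason rather than raising, so it can be run safely anywhere.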


Performance Comparisons
-----------------------



Timing comparisons illustrate the benefit of this accelerated computing
feature. Below are the times required to calculate a PSF for varying array
sizes using the MKL FFT option versus the CuPy calculations. The optical systems tested had 5
different surfaces/optics.

Performance will naturally vary depending on the compute hardware used. The system used for these
comparisons was one of the University of Arizona's HPC Puma nodes. The node used 32 AMD EPYC 7642 CPUs
and an NVIDIA Tesla V100S GPU.

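+------------------+------------+----------------------+-----------------------+-----------------+
| Propagation Type | Array Size | MKL Method Times [s] | CuPy Method Times [s] | Speed Up Factor |
+==================+============+======================+=======================+=================+
| Fraunhofer       | 1024       | 0.218                | 0.0261                | 8.35            |
+------------------+------------+----------------------+-----------------------+-----------------+
| Fraunhofer       | 2048       | 0.755                | 0.0294                | 25.7            |
+------------------+------------+----------------------+-----------------------+-----------------+
| Fraunhofer       | 4096       | 3.36                 | 0.0423                | 79.4            |
+------------------+------------+----------------------+-----------------------+-----------------+
| Fresnel          | 1024       | 0.714                | 0.0438                | 16.3            |
+------------------+------------+----------------------+-----------------------+-----------------+
| Fresnel          | 2048       | 4.16                 | 0.0845                | 49.2            |
+------------------+------------+----------------------+-----------------------+-----------------+
| Fresnel          | 4096       | 17.5                 | 0.225                 | 77.8            |
+------------------+------------+----------------------+-----------------------+-----------------+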
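The speed-up factor in the last column is simply the MKL time divided by the CuPy time. As a small worked check, the sketch below recomputes it from the timings in the table. The variable names are illustrative only.

```python
# (propagation type, array size, MKL time [s], CuPy time [s]) from the table
timings = [
    ("Fraunhofer", 1024, 0.218, 0.0261),
    ("Fraunhofer", 2048, 0.755, 0.0294),
    ("Fraunhofer", 4096, 3.36, 0.0423),
    ("Fresnel", 1024, 0.714, 0.0438),
    ("Fresnel", 2048, 4.16, 0.0845),
    ("Fresnel", 4096, 17.5, 0.225),
]

# Speed-up factor = CPU (MKL) time / GPU (CuPy) time
speedups = [mkl / gpu for _, _, mkl, gpu in timings]

for (kind, size, _, _), factor in zip(timings, speedups):
    print(f"{kind:<10} {size:>5}  {factor:6.1f}x")
```

The benefit grows with array size in these measurements: the 1024-pixel cases gain roughly 8x to 16x, while the 4096-pixel cases approach 80x.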

1 change: 1 addition & 0 deletions docs/index.rst
@@ -67,6 +67,7 @@ Contents
about.rst
performance.rst
fft_optimization.rst
gpu_acceleration.rst
dev_notes.rst


9 changes: 6 additions & 3 deletions docs/relnotes.rst
@@ -5,10 +5,13 @@ Release Notes

For a list of contributors, see :ref:`about`.

1.1.0
-----
.. _rel1.1.0:

1.1.0
-----

This release introduces support for much faster (20-80x) optical calculations using GPU acceleration via the CuPy library for NVidia GPUs. Credit to Kian Milani (:user:`kian1337`) for this significant and complex improvement.

.. _rel1.1.0::

*2023 May 12*

102 changes: 35 additions & 67 deletions poppy/__init__.py
@@ -56,73 +56,41 @@ class Conf(_config.ConfigNamespace):
# because this is a memory-intensive calculation and you will
# just end up thrashing IO and swapping out a ton, so everything
# becomes super slow.
n_processes = _config.ConfigItem(
4,
"Maximum number of additional "
+ "worker processes to spawn, if multiprocessing is enabled. "
+ "Set to 0 for autoselect. Note, PSF calculations are likely RAM "
+ "limited more than CPU limited for higher N on modern machines.",
)

use_fftw = _config.ConfigItem(
True,
"Use FFTW for FFTs (assuming it "
+ "is available)? Set to False to force numpy.fft always, True to "
+ "try importing and using FFTW via PyFFTW.",
)
autosave_fftw_wisdom = _config.ConfigItem(
True,
"Should POPPY "
+ "automatically save and reload FFTW "
+ '"wisdom" for improved speed?',
)
use_mkl = _config.ConfigItem(
True,
"Use Intel MKL for FFTs (assuming it is available). "
"This has highest priority for CPU-based FFT over other FFT options, if multiple are set True.",
)

use_cuda = _config.ConfigItem(
True, "Use cuda for FFTs on GPU (assuming it " + "is available)?"
)
use_opencl = _config.ConfigItem(
True, "Use OpenCL for FFTs on GPU (assuming it " + "is available)?"
)
use_cupy = _config.ConfigItem(
True, "Use CuPy for FFTs on GPU (assuming it " + "is available)?"
)
use_numexpr = _config.ConfigItem(
True, "Use NumExpr to accelerate array math (assuming it " + "is available)?"
)

double_precision = _config.ConfigItem(
True,
"Floating point values use float64 and complex128 if True, "
+ "otherwise float32 and complex64.",
)

default_image_display_fov = _config.ConfigItem(
5.0,
"Default image "
+ "display field of view, in arcseconds. Adjust this to display "
+ "only a subregion of a larger output array.",
)

default_logging_level = _config.ConfigItem(
"INFO", "Logging " + "verbosity: one of {DEBUG, INFO, WARN, ERROR, or CRITICAL}"
)

enable_speed_tests = _config.ConfigItem(
False,
"Enable additional "
+ "verbose printout of computation times. Useful for benchmarking.",
)
enable_flux_tests = _config.ConfigItem(
False,
"Enable additional "
+ "verbose printout of fluxes and flux conservation during "
+ "calculations. Useful for testing.",
)
n_processes = _config.ConfigItem(4, 'Maximum number of additional ' +
'worker processes to spawn, if multiprocessing is enabled. ' +
'Set to 0 for autoselect. Note, PSF calculations are likely RAM ' +
'limited more than CPU limited for higher N on modern machines.')

use_fftw = _config.ConfigItem(True, 'Use FFTW for FFTs (assuming it ' +
'is available)? Set to False to force numpy.fft always, True to ' +
'try importing and using FFTW via PyFFTW.')
autosave_fftw_wisdom = _config.ConfigItem(True, 'Should POPPY ' +
'automatically save and reload FFTW ' +
'"wisdom" for improved speed?')
use_mkl = _config.ConfigItem(True, "Use Intel MKL for FFTs (assuming it is available). "
"This has highest priority for CPU-based FFT over other FFT options, if multiple are set True.")
use_opencl = _config.ConfigItem(True, 'Use OpenCL for FFTs on GPU (assuming it ' +
'is available)?')
use_cupy = _config.ConfigItem(True, 'Use CuPy for FFTs on GPU (assuming it ' +
'is available)?')
use_numexpr = _config.ConfigItem(True, 'Use NumExpr to accelerate array math (assuming it ' +
'is available)?')

double_precision = _config.ConfigItem(True, 'Floating point values use float64 and complex128 if True, ' +
'otherwise float32 and complex64.')

default_image_display_fov = _config.ConfigItem(5.0, 'Default image ' +
'display field of view, in arcseconds. Adjust this to display ' +
'only a subregion of a larger output array.')

default_logging_level = _config.ConfigItem('INFO', 'Logging ' +
'verbosity: one of {DEBUG, INFO, WARN, ERROR, or CRITICAL}')

enable_speed_tests = _config.ConfigItem(False, 'Enable additional ' +
'verbose printout of computation times. Useful for benchmarking.')
enable_flux_tests = _config.ConfigItem(False, 'Enable additional ' +
'verbose printout of fluxes and flux conservation during ' +
'calculations. Useful for testing.')
cmap_sequential = _config.ConfigItem(
"gist_heat",
"Select a default colormap to represent sequential data (e.g. intensity)",