Merge branch 'release_prep' of github.com:BradleySappington/poppy int…

…o release_prep
spacetelescope · May 11, 2023 · ffbbf62 · ffbbf62
2 parents bc7a3c5 + 0ab71e2
commit ffbbf62
Showing 6 changed files with 137 additions and 229 deletions.
diff --git a/docs/gpu_acceleration.rst b/docs/gpu_acceleration.rst
@@ -0,0 +1,78 @@
+GPU Accelerated Optical Calculations
+====================================
+
+
+
+.. admonition:: Placeholder docs
+
+   This page is a placeholder for more complete documentation to be added later about usage of GPUs for fast optical calculations.
+
+
+Thanks to team members Kian Milani and Ewan Douglas (University of Arizona), poppy now includes a
+high-performance option using NVIDIA GPUs to significantly accelerate optical calculations, in some
+cases by ~20x to 80x.
+
+This implementation seeks to perform all calculations on the GPU until the end of propagation. This
+reduces time for calculations as arrays no longer need to be transferred between GPU memory and
+standard memory when performing different calculations. It also allows GPU acceleration of the
+majority of calculations performed during an optical propagation (i.e. creating models of optical
+elements and applying them to wavefronts happens on the GPU, as well as the propagation calculations
+from one plane to another.)
+
+An updated implementation using `CuPy <https://docs.cupy.dev/en/stable/overview.html>`_ replaces
+the earlier support for CUDA using pyculib and numba.cuda. (That initial implementation has been
+removed since the CuPy implementation performs much better.)
+
+Note: because cupy is used as a replacement for numpy at import time, it is tricky to toggle
+between GPU and CPU calculations during the same Python session. Doing so is advanced usage; it
+can be useful in some cases for debugging or benchmarking, but switching between calculation
+backends within the same session is neither fully supported nor recommended.
+
+
+**What about AMD GPUs?**
+
+There also exists partial/earlier support for OpenCL for AMD GPUs, using the ``pyopencl`` and ``gpyfft``
+packages. This provides much smaller performance gains than the CuPy version, however, since only
+FFTs are performed on-GPU, not other parts of the optical propagation calculations.
+
+**What about Apple Silicon GPUs?**
+
+Poppy does not yet have support for the specialized GPU hardware in Apple Silicon M1/M2 and similar.
+For these machines, plain numpy is the best option.
+
+Requirements and Setup
+----------------------
+
+
+Requires NVIDIA GPU hardware.
+
+Requires CuPy > 10.0. Install from https://cupy.dev following the `CuPy installation docs <https://docs.cupy.dev/en/stable/install.html#>`_.
+
+Also requires cupyx, the GPU-accelerated version of scipy.
+
+
+Performance Comparisons
+-----------------------
+
+
+
+Computation comparisons have been performed to illustrate the benefit of this accelerated computing
+feature. Below are comparisons of the times required for a PSF to be calculated for varying array
+sizes using the MKL FFT option versus the CuPy calculations. The optical systems tested had 5
+different surfaces/optics. 
+
+Performance will naturally vary depending on the compute hardware used. The system used for these
+comparisons was the University of Arizona’s HPC Puma nodes. The node utilized 32 AMD EPYC 7642 CPUs
+and the NVIDIA Tesla V100S GPU.
+
+================  ==========  ====================  =====================  ===============
+Propagation Type  Array Size  MKL Method Times [s]  CuPy Method Times [s]  Speed Up Factor
+================  ==========  ====================  =====================  ===============
+Fraunhofer        1024        0.218                 0.0261                 8.35
+Fraunhofer        2048        0.755                 0.0294                 25.7
+Fraunhofer        4096        3.36                  0.0423                 79.4
+Fresnel           1024        0.714                 0.0438                 16.3
+Fresnel           2048        4.16                  0.0845                 49.2
+Fresnel           4096        17.5                  0.225                  77.8
+================  ==========  ====================  =====================  ===============
+
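Note on the table in `docs/gpu_acceleration.rst`: the "Speed Up Factor" column appears to be the MKL time divided by the CuPy time, rounded to three significant figures. The following is a minimal sketch that recomputes it from the timings listed in the table; the names `TIMINGS` and `speedup` are illustrative only and are not part of poppy.

```python
# Timings in seconds, copied from the table in docs/gpu_acceleration.rst.
# Each entry: (propagation type, array size, MKL time, CuPy time).
TIMINGS = [
    ("Fraunhofer", 1024, 0.218, 0.0261),
    ("Fraunhofer", 2048, 0.755, 0.0294),
    ("Fraunhofer", 4096, 3.36, 0.0423),
    ("Fresnel", 1024, 0.714, 0.0438),
    ("Fresnel", 2048, 4.16, 0.0845),
    ("Fresnel", 4096, 17.5, 0.225),
]


def speedup(cpu_seconds, gpu_seconds):
    """Ratio of CPU (MKL) wall time to GPU (CuPy) wall time."""
    return cpu_seconds / gpu_seconds


# Map (propagation type, array size) -> speed-up factor.
speedups = {
    (kind, size): speedup(mkl, cupy_time)
    for kind, size, mkl, cupy_time in TIMINGS
}

for (kind, size), factor in speedups.items():
    print(f"{kind:<10} {size:>5}  {factor:.3g}x")
```

The ratios grow with array size for both propagation types, which is consistent with the "~20x to 80x" range quoted in the docs for the larger arrays.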
diff --git a/docs/index.rst b/docs/index.rst
@@ -67,6 +67,7 @@ Contents
   about.rst
   performance.rst
   fft_optimization.rst
+  gpu_acceleration.rst
   dev_notes.rst
 
 

diff --git a/docs/relnotes.rst b/docs/relnotes.rst
@@ -5,10 +5,13 @@ Release Notes
 
 For a list of contributors, see :ref:`about`.
 
- 1.1.0
- -----
+.. _rel1.1.0:
+
+1.1.0
+-----
+
+This release introduces support for much faster (20-80x) optical calculations using GPU acceleration via the CuPy library for NVIDIA GPUs. Credit to Kian Milani (:user:`kian1337`) for this significant and complex improvement.
 
- .. _rel1.1.0::
 
  *2023 May 12*
 

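Note on backend selection: `docs/gpu_acceleration.rst` says cupy stands in for numpy at import time and that arrays stay on the GPU until the end of propagation, and the `poppy/__init__.py` hunk keeps a `use_cupy` option. The sketch below illustrates that general pattern under stated assumptions. It is not poppy's actual implementation, and `select_array_module`, `to_host`, and `toy_psf` are hypothetical names introduced only for this example. The only CuPy calls it relies on are `cupy.cuda.runtime.getDeviceCount` and `cupy.asnumpy`, plus the numpy-compatible array API.

```python
import importlib.util

import numpy as np


def select_array_module(use_cupy=True):
    """Return cupy when requested and usable, otherwise fall back to numpy."""
    if use_cupy and importlib.util.find_spec("cupy") is not None:
        try:
            import cupy

            # Only use cupy if at least one CUDA device is visible.
            if cupy.cuda.runtime.getDeviceCount() > 0:
                return cupy
        except Exception:
            # Missing driver, broken install, etc.: stay on the CPU.
            pass
    return np


# Chosen once, at import time, which is why switching mid-session is awkward.
xp = select_array_module(use_cupy=True)


def to_host(array):
    """Copy a result back to host memory as a numpy array."""
    if xp is np:
        return np.asarray(array)
    return xp.asnumpy(array)


def toy_psf(npix=64):
    """Toy far-field calculation that keeps every intermediate on one backend."""
    coords = xp.linspace(-1.0, 1.0, npix)
    xx, yy = xp.meshgrid(coords, coords)
    # Circular aperture of radius 0.5 in these normalized coordinates.
    aperture = (xx**2 + yy**2 <= 0.25).astype(xp.float64)
    field = xp.fft.fftshift(xp.fft.fft2(aperture))
    intensity = xp.abs(field) ** 2
    intensity /= intensity.sum()
    # Single device-to-host transfer, at the very end.
    return to_host(intensity)


psf = toy_psf(64)
print(type(psf).__name__, psf.shape, float(psf.sum()))
```

Because `xp` is bound when the module is imported, every later call uses the same backend; this is the design trade-off behind the docs' advice not to toggle between GPU and CPU within one session.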
diff --git a/poppy/__init__.py b/poppy/__init__.py
@@ -56,73 +56,41 @@ class Conf(_config.ConfigNamespace):
     # because this is a memory-intensive calculation and you will
     # just end up thrashing IO and swapping out a ton, so everything
     # becomes super slow.
-    n_processes = _config.ConfigItem(
-        4,
-        "Maximum number of additional "
-        + "worker processes to spawn, if multiprocessing is enabled. "
-        + "Set to 0 for autoselect. Note, PSF calculations are likely RAM "
-        + "limited more than CPU limited for higher N on modern machines.",
-    )
-
-    use_fftw = _config.ConfigItem(
-        True,
-        "Use FFTW for FFTs (assuming it"
-        + "is available)?  Set to False to force numpy.fft always, True to"
-        + "try importing and using FFTW via PyFFTW.",
-    )
-    autosave_fftw_wisdom = _config.ConfigItem(
-        True,
-        "Should POPPY "
-        + "automatically save and reload FFTW "
-        + '"wisdom" for improved speed?',
-    )
-    use_mkl = _config.ConfigItem(
-        True,
-        "Use Intel MKL for FFTs (assuming it is available). "
-        "This has highest priority for CPU-based FFT over other FFT options, if multiple are set True.",
-    )
-
-    use_cuda = _config.ConfigItem(
-        True, "Use cuda for FFTs on GPU (assuming it" + "is available)?"
-    )
-    use_opencl = _config.ConfigItem(
-        True, "Use OpenCL for FFTs on GPU (assuming it" + "is available)?"
-    )
-    use_cupy = _config.ConfigItem(
-        True, "Use CuPy for FFTs on GPU (assuming it" + "is available)?"
-    )
-    use_numexpr = _config.ConfigItem(
-        True, "Use NumExpr to accelerate array math (assuming it" + "is available)?"
-    )
-
-    double_precision = _config.ConfigItem(
-        True,
-        "Floating point values use float64 and complex128 if True,"
-        + "otherwise float32 and complex64.",
-    )
-
-    default_image_display_fov = _config.ConfigItem(
-        5.0,
-        "Default image"
-        + "display field of view, in arcseconds. Adjust this to display "
-        + "only a subregion of a larger output array.",
-    )
-
-    default_logging_level = _config.ConfigItem(
-        "INFO", "Logging " + "verbosity: one of {DEBUG, INFO, WARN, ERROR, or CRITICAL}"
-    )
-
-    enable_speed_tests = _config.ConfigItem(
-        False,
-        "Enable additional "
-        + "verbose printout of computation times. Useful for benchmarking.",
-    )
-    enable_flux_tests = _config.ConfigItem(
-        False,
-        "Enable additional "
-        + "verbose printout of fluxes and flux conservation during "
-        + "calculations. Useful for testing.",
-    )
+    n_processes = _config.ConfigItem(4, 'Maximum number of additional ' +
+                                     'worker processes to spawn, if multiprocessing is enabled. ' +
+                                     'Set to 0 for autoselect. Note, PSF calculations are likely RAM ' +
+                                     'limited more than CPU limited for higher N on modern machines.')
+
+    use_fftw = _config.ConfigItem(True, 'Use FFTW for FFTs (assuming it ' +
+                                  'is available)?  Set to False to force numpy.fft always, True to ' +
+                                  'try importing and using FFTW via PyFFTW.')
+    autosave_fftw_wisdom = _config.ConfigItem(True, 'Should POPPY ' +
+                                              'automatically save and reload FFTW ' +
+                                              '"wisdom" for improved speed?')
+    use_mkl = _config.ConfigItem(True, "Use Intel MKL for FFTs (assuming it is available). "
+                                       "This has highest priority for CPU-based FFT over other FFT options, if multiple are set True.")
+    use_opencl = _config.ConfigItem(True, 'Use OpenCL for FFTs on GPU (assuming it ' +
+                                    'is available)?')
+    use_cupy = _config.ConfigItem(True, 'Use CuPy for FFTs on GPU (assuming it ' +
+                                  'is available)?')
+    use_numexpr = _config.ConfigItem(True, 'Use NumExpr to accelerate array math (assuming it ' +
+                                     'is available)?')
+
+    double_precision = _config.ConfigItem(True, 'Floating point values use float64 and complex128 if True, ' +
+                                          'otherwise float32 and complex64.')
+
+    default_image_display_fov = _config.ConfigItem(5.0, 'Default image ' +
+                                                   'display field of view, in arcseconds. Adjust this to display ' +
+                                                   'only a subregion of a larger output array.')
+
+    default_logging_level = _config.ConfigItem('INFO', 'Logging ' +
+                                               'verbosity: one of {DEBUG, INFO, WARN, ERROR, or CRITICAL}')
+
+    enable_speed_tests = _config.ConfigItem(False, 'Enable additional ' +
+                                            'verbose printout of computation times. Useful for benchmarking.')
+    enable_flux_tests = _config.ConfigItem(False, 'Enable additional ' +
+                                           'verbose printout of fluxes and flux conservation during ' +
+                                           'calculations. Useful for testing.')
     cmap_sequential = _config.ConfigItem(
         "gist_heat",
         "Select a default colormap to represent sequential data (e.g. intensity)",