- Well-chosen simulation parameters are the first (and often the most important) step in reducing run time
- Use the Anaconda distribution to get free access to the Intel MKL, which
  - will automatically vectorize many NumPy operations
  - can be configured to parallelize some operations across multiple CPU cores
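As a minimal sketch of why this matters: expressing a computation as a single array operation lets NumPy dispatch to its optimized (e.g. MKL-backed) routines, instead of looping in the Python interpreter.

```python
import numpy as np

n = 1_000_000
x = np.random.default_rng(0).random(n)

# Slow: an explicit Python loop over the array
# total = 0.0
# for v in x:
#     total += v * v

# Fast: one vectorized call, executed by the optimized BLAS backend
total = np.dot(x, x)
```

The vectorized version is typically orders of magnitude faster than the loop, even before any multi-core parallelism.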
- Consider using numexpr to evaluate compound array expressions efficiently
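A short sketch of the numexpr approach (the expression here is illustrative): `numexpr.evaluate` compiles the whole expression and evaluates it in one multi-threaded pass over the data, avoiding the large intermediate arrays NumPy would allocate for each sub-expression.

```python
import numpy as np
import numexpr as ne

rng = np.random.default_rng(1)
a = rng.random(100_000)
b = rng.random(100_000)

# Plain NumPy builds temporaries for a**3, 2*a**3, 3*b, etc.;
# numexpr evaluates the whole expression in a single pass.
result = ne.evaluate("2 * a**3 + 3 * b - 1")
```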
- If your system contains sparse matrices (matrices in which most elements are zero), use one of scipy.sparse's sparse matrix representations (e.g. csr_matrix)
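A minimal sketch with a toy matrix: `csr_matrix` stores only the non-zero entries, so both memory use and matrix-vector products scale with the number of non-zeros rather than the full matrix size.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A mostly-zero matrix stored densely wastes memory and compute.
dense = np.array([[0.0, 0.0, 3.0],
                  [4.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0]])

sparse = csr_matrix(dense)    # stores only the 2 non-zero entries
v = np.array([1.0, 2.0, 3.0])
y = sparse @ v                # sparse matrix-vector product
```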
- Use a profiler (cProfile in Python) to identify hot spots in your code
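A hedged sketch of how this looks in practice (the `simulate`/`inner` functions are stand-ins for your own code): profile a run, then sort by cumulative time to see where it is actually spent.

```python
import cProfile
import io
import pstats

def inner(n):
    # A deliberately expensive helper standing in for a hot spot
    return sum(i * i for i in range(n))

def simulate():
    return [inner(10_000) for _ in range(50)]

profiler = cProfile.Profile()
profiler.enable()
simulate()
profiler.disable()

# Print the five functions with the most cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

You can also profile a whole script from the command line with `python -m cProfile -s cumulative your_script.py`.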
- Consider modifying or rewriting those sections of your Python code to work with either
  - Numba: often preferable because it works on pure Python (though it doesn't support all NumPy functions in the fast 'nopython' mode). It also supports automatic parallelization across CPU cores, and even execution on a GPU. Or,
  - Cython: a popular way to write high-performance Python-like code. Achieving native performance requires changing your code so it is no longer runnable on the default Python interpreter.
- As an alternative to writing your numerical code in Python, write it in C instead and use the Intel MKL to vectorize all your linear algebra.
  - Initialize your simulation parameters and handle/process your data in Python; only the simulation itself runs in C. You can call your C code using Python extension modules, ctypes, or cffi.
  - Use OpenMP to decorate your C code with hints that automatically execute loops across multiple CPU cores.
  - For very large numerical simulations (e.g. cluster-scale), use Open MPI to manually implement a distributed simulator, appropriate for running on compute clusters.
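A minimal sketch of the ctypes route, using the system C math library as a stand-in for your own compiled simulation library: load the shared library, declare the C function's signature, then call it like a Python function.

```python
import ctypes
import ctypes.util

# Load the C math library (a stand-in for your compiled simulator,
# e.g. one you built yourself with `gcc -shared -fPIC ...`).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double cos(double)
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

value = libm.cos(0.0)
```

The same pattern applies to your own `.so`/`.dll`: ctypes adds no build step on the Python side, while extension modules and cffi trade more setup for tighter integration.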
Further reading:
  - Interfacing with C from Python
  - Extending Python with C or C++