Scientific computing, or numerical computation, is a rapidly growing field that uses computers to solve numerical computation problems in scientific research and engineering.
Scientific computing is applied in a variety of fields such as image processing, machine learning, and deep learning.
Below is the scientific Python ecosystem.
SciPy provides fundamental algorithms for scientific computing, including mathematical, scientific and engineering routines. Matplotlib generates publication-ready figures and visualizations. The combination of NumPy, SciPy and Matplotlib, together with an advanced interactive environment such as IPython or Jupyter, provides a solid foundation for array programming in Python.
NumPy is the base of the scientific Python ecosystem, as shown above.
Multidimensional arrays are a core concept of NumPy, and are the foundation for various upper-layer tools. A multidimensional array is also called a tensor. Compared with two-dimensional tables or matrices, tensors have more powerful expressiveness. Therefore, popular deep learning frameworks usually adopt data structures that are based on tensors.
An NumPy array consists of a pointer to memory, along with metadata used to interpret the data stored there, notably ‘data type’, ‘shape’ and ‘strides’, as shown below.
The data type describes the nature of elements stored in an array. An array has a single data type, and each element of an array occupies the same number of bytes in memory.
The shape of an array determines the number of elements along each axis, and the number of axes is the dimensionality of the array.
The strides of an array tell us how many bytes we have to skip in memory to move to the next position along a certain axis.
To apply a function to an element of a list or tuple or a NumPy array, we can easily use the for loop in Python.
But Python is an interpreted language and most of the implementation is slow compared to that of C and C++. The main reason for this slow computation comes down to the dynamic nature of Python and the lack of compiler level optimizations which incur memory overheads.
To tackle this bottleneck, NumPy provides vectorization functionality that maps a function over a sequence efficiently.
The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations.
Numpy Vectorization essentially functions like the python map() but with additional functionality – the NumPy broadcasting mechanism. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.
For example,
import numpy as np
arr = np.arange(3)
result = arr + 4
Here arr
has one dimension(axis), which has length 3, on the other hand, 4 is a simple integer, which has 0
dimensions. Since they are of different dimensions, Numpy tries to broadcast (simply stretch) the smaller array
along a specific axis making it suitable for the mathematical operation to take place.
Scientific datasets now routinely exceed the memory capacity of a single machine and may be stored on multiple machines or in the cloud. In addition, the recent need to accelerate deep-learning and artificial intelligence applications has led to the emergence of specialized accelerator hardware, for example, graphics processing units (GPUs). Owing to its in-memory data model, NumPy is currently unable to directly utilize such storage and specialized hardware.
The community’s efforts to fill this gap led to a proliferation of new array implementations.For example,
- Deep-learning framework creates its own arrays. The PyTorch, Tensorflow, arrays all have the capability to run on CPUs and GPUs in a distributed fashion, using lazy evaluation to allow for additional performance optimizations.
- Sparse arrays. The SciPy provides sparse arrays, which typically contain few non-zero values and store only those in memory for efficiency.
- Distributed NumPy arrays. Dask arrays coordinate many Numpy arrays, arranged into chunks within a grid. Dask also labels arrays — referring to dimensions of an array by name rather than by index for clarity.
Such libraries often mimic the NumPy API, because this lowers the barrier to entry for newcomers and provides the wider community with a stable array programming interface.
https://www.nature.com/articles/s41586-020-2649-2
https://examples.dask.org/array.html
https://www.askpython.com/python-modules/numpy/numpy-vectorization
https://www.askpython.com/python-modules/numpy/numpy-broadcasting