Add content on performance benchmarks. #461
Closed
25 commits
7f13511
Add benchmark-doc, codes and graph
khushi-411 fb713be
Modified the graph, table, and code for visualization.
khushi-411 22f6dca
Modified Visulization and documentation.
khushi-411 0684e21
Minor Correction (name of the graph)
khushi-411 83e7d46
Plotted different type of visualization, added links to web page
khushi-411 467daf6
Minor Correction in content/en/benchmark.md
khushi-411 57a3221
Increased the size of the label in the graph.
khushi-411 3450f43
Reformatted number of characters in lines to <80 characters
khushi-411 7939706
Optimized NumPy Code, and Pythran code, Removed unnecessary files.
khushi-411 17b8095
Added pure-python code, renamed numpy file, deleted unnecessary files.
khushi-411 8342aa2
Optimized codes, modified content for the web page.
khushi-411 b19857d
modified codes, now the code outputsthe same results via different im…
khushi-411 258b22b
update codes and add updated plot.
khushi-411 a496e53
Modified codes to maintain uniformity and used Python operators rathe…
khushi-411 9e19e0d
Add Single implementation of compiled methods, made correction in C++…
khushi-411 0c0e6b5
Improved Documentation
khushi-411 c420d50
Changed Graph, made minor typo corrections in the documentation and u…
khushi-411 bf0f84d
Minor correction
khushi-411 b70655c
Minor correction
khushi-411 4f5e88f
Graph Modification
khushi-411 6a9a2d7
Changed the color of the bar in Graph.
khushi-411 33cf330
Reformated statements.
khushi-411 a823681
Change plot
khushi-411 71e4876
Deleted files
khushi-411 d384579
edit content
khushi-411
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,333 @@ | ||
--- | ||
title: NumPy Benchmarks | ||
sidebar: false | ||
--- | ||
|
||
<img src = "/images/content_images/performance_benchmarking.png" alt = "Visualization" title = "Performance Benchmark; Number of Iterations: 50"> | ||
|
||
## Overview | ||
|
||
This page benchmarks NumPy's performance on the well-known N-body problem
<a href="#nbody">[2]</a>. It also compares NumPy with pure Python and C++,
and with compilers such as Numba and Pythran.
|
||
The goal is to measure how efficient the library is in quasi-real-life situations,
and the N-body problem suits that purpose well.
Each benchmark is run over several iterations and on datasets of different sizes to make the results more reliable.
|
||
<!--Towards the end of this post, an attempt will be made to make a conclusion on how NumPy can be efficient in solving problems like N-body problem.--> | ||
|
||
<!-- The post is organized as: --> | ||
|
||
<!-- Can be made like a content section? --> | ||
<!-- 1. Overview: (current section): Discussing the objective of the post. --> | ||
<!-- 2. About N-body Problem: Brief description on N-body problem and why it was chosen. --> | ||
<!-- 3. Pseudo Code of Solving N-Body Problem. --> | ||
<!-- 4. Dataset Description --> | ||
<!-- 5. Compiled Methods --> | ||
<!-- 6. Source Code --> | ||
<!-- 7. Results --> | ||
<!-- 8. Environment Configuration --> | ||
<!-- 9. Conclusion --> | ||
<!-- 10. References --> | ||
|
||
## About N-Body Problem | ||
|
||
<script type="text/x-mathjax-config"> | ||
MathJax.Hub.Config({ | ||
tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]} | ||
}); | ||
</script> | ||
|
||
<script type="text/javascript" async | ||
src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.4/MathJax.js?config=TeX-MML-AM_CHTML"> | ||
</script> | ||
|
||
> In physics, the n-body problem is the problem of predicting the individual motions of a group of celestial objects interacting with each other gravitationally. | ||
|
||
<div style="text-align: right">Source: <a href="https://en.wikipedia.org/wiki/N-body_problem">Wikipedia</a></div> | ||
|
||
As the definition above suggests, the N-body problem involves the kinematics of several interacting bodies,
which requires a range of mathematical computations.
Work on this problem has been motivated by the desire to understand the motions of celestial bodies.
It therefore serves as a solid bridge between real-world applications and the computational world.
|
||
A brief description of computations involved in solving the N-body problem is given below, | ||
along with the pseudo-code in the next section: | ||
|
||
Consider $n$ bodies of masses $m_1, m_2, m_3, ... , m_n$,
moving under the mutual [gravitational force](https://en.wikipedia.org/wiki/Gravity) of attraction
between them in a three-dimensional [inertial frame of reference](https://en.wikipedia.org/wiki/Inertial_frame_of_reference),
such that consecutive positions and velocities of the $i$-th body
are denoted by ($s_{k-1}$, $s_k$) and ($v_{k-1}$, $v_k$) respectively.
According to [Newton's law of gravity](https://en.wikipedia.org/wiki/Newton%27s_law_of_universal_gravitation),
the gravitational force exerted on the $i$-th body of mass $m_i$
by a single body of mass $m_j$ is denoted by $F_{ij}$,
and the acceleration of the $i$-th body is denoted by $a_i$.
Let $r_i$ and $r_j$ be the position vectors of the two bodies, such that:
|
||
\begin{equation} {r_i} = {s_{k}} - {s_{k-1}} \tag{I} \end{equation} | ||
|
||
\begin{equation} {r_j} = {s_{k-1}} - {s_{k}} \tag{II} \end{equation} | ||
|
||
The final aim is to measure the time taken to evaluate the total energy of the particles
in the celestial space at a given time step.
The equations involved in solving the problem are listed below:
|
||
\begin{equation} {s_k} = {s_{k-1}} + {v_{k-1}\times t} + \frac{a\times t^2}{2} \tag{III} \end{equation}
|
||
\begin{equation}{v_k} = {v_{k-1}} + {a\times t} \tag{IV} \end{equation} | ||
|
||
\begin{equation} {F_{ij}} = \frac{{G\times {m_i}\times {m_j}}\times ({r_j}-{r_i})}{{\mid {r_j}-{r_i} \mid}^3} \tag{V} \end{equation}
|
||
\begin{equation} {a_{i}} = \frac{\sum_{j \neq i} F_{ij}}{m_{i}} \tag{VI} \end{equation}
|
||
\begin{equation} \textrm{Self Potential Energy} = \textrm{U} = -\sum_{i<j} \frac{G\times {m_i}\times {m_j}}{\mid {r_j}-{r_i} \mid} \tag{VII} \end{equation}
|
||
\begin{equation} \textrm{Kinetic Energy} = \textrm{K.E} = \frac{\sum m\times v^2}{2} \tag{VIII} \end{equation} | ||
|
||
\begin{equation} \textrm{Total Energy} = \textrm{Kinetic Energy} + \textrm{Self Potential Energy} \tag{IX} \end{equation} | ||
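
As a rough illustration of the energy terms, the sketch below evaluates the kinetic, potential, and total energy for a small system in pure Python. It is not taken from the benchmark sources: the function names are hypothetical, it assumes the standard Newtonian pair potential $-G m_i m_j / r$ with each pair counted once, and it assumes units in which $G = 1$.

```python
import math

G = 1.0  # assumption: gravitational constant in N-body units


def kinetic_energy(masses, velocities):
    """Sum of m * |v|^2 / 2 over all bodies."""
    return sum(
        0.5 * m * sum(c * c for c in v)
        for m, v in zip(masses, velocities)
    )


def potential_energy(masses, positions):
    """Pairwise gravitational potential energy, each pair counted once."""
    total = 0.0
    n = len(masses)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            total -= G * masses[i] * masses[j] / r
    return total


def total_energy(masses, positions, velocities):
    """Kinetic energy plus potential energy."""
    return kinetic_energy(masses, velocities) + potential_energy(masses, positions)


# Two bodies, 2 units apart along the x axis.
masses = [1.0, 2.0]
positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
velocities = [(0.0, 1.0, 0.0), (0.0, -0.5, 0.0)]
energy = total_energy(masses, positions, velocities)
```

For this two-body example the kinetic energy is $0.75$ and the potential energy is $-1$, so the total is $-0.25$.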
|
||
### Pseudo Code of Solving N-body Problem | ||
|
||
```
Set time to 0, time_step to 0.001 and time_end to 10
Set number_of_steps to time_end / time_step
Calculate accelerations (a[i], for given positions r[i])
Calculate total initial energy:
    Calculate kinetic energy
    Calculate potential energy
FOR k from 1 to number_of_steps
    Calculate positions (r[k+1])
    Swap accelerations
    Calculate accelerations
    Calculate velocities (v[k+1])
    Increment time by time_step
    IF k % 100 is 0 THEN
        Calculate total energy
        Print energy
    ENDIF
END FOR
```
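
The pseudo code above could be turned into runnable pure Python roughly as follows. This is only a sketch under stated assumptions, not the benchmarked implementation: it assumes $G = 1$, it omits the periodic energy printout, and it reads the "Swap accelerations" step as a velocity update that averages the old and new accelerations. The names `accelerations` and `simulate` are hypothetical.

```python
def accelerations(masses, positions):
    """Acceleration on each body from all the others (G = 1 assumed)."""
    n = len(masses)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            delta = [positions[j][k] - positions[i][k] for k in range(3)]
            dist_cubed = sum(d * d for d in delta) ** 1.5
            for k in range(3):
                # Equal and opposite contributions for the pair (i, j).
                acc[i][k] += masses[j] * delta[k] / dist_cubed
                acc[j][k] -= masses[i] * delta[k] / dist_cubed
    return acc


def simulate(masses, positions, velocities, time_step=0.001, time_end=0.01):
    """Advance the system in place and return the number of steps taken."""
    number_of_steps = round(time_end / time_step)
    acc = accelerations(masses, positions)
    for _ in range(number_of_steps):
        # Position update: s_k = s_{k-1} + v * t + a * t^2 / 2
        for p, v, a in zip(positions, velocities, acc):
            for k in range(3):
                p[k] += v[k] * time_step + 0.5 * a[k] * time_step ** 2
        # Keep the old accelerations and compute the new ones.
        acc_old, acc = acc, accelerations(masses, positions)
        # Assumption: velocity update averages old and new accelerations.
        for v, a0, a1 in zip(velocities, acc_old, acc):
            for k in range(3):
                v[k] += 0.5 * (a0[k] + a1[k]) * time_step
    return number_of_steps


masses = [1.0, 1.0]
positions = [[-0.5, 0.0, 0.0], [0.5, 0.0, 0.0]]
velocities = [[0.0, -0.5, 0.0], [0.0, 0.5, 0.0]]
steps = simulate(masses, positions, velocities)
```

Because the pairwise forces are applied symmetrically, the total momentum of this two-body example stays at zero, which is a quick sanity check for the update order.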
|
||
## Dataset Description | ||
|
||
* Nine different text files, named `InputX.txt`, where $X$ is the number of particles in the celestial space (for this problem, the numbers of particles are $16, 32, 64, 128, 256, 512, 1000, 2000$ and $16000$).
* The dataset<a href="#data">[3]</a> consists of the mass of each particle and its initial position and velocity along the three axes.
|
||
An example from the dataset<a href="#data">[3]</a> used is given below: | ||
(for a single particle, the values are approximated up to four decimal places for readability) | ||
|
||
``` | ||
# Ordered as: label, mass (grams), position_x, position_y, position_z, velocity_x, velocity_y, velocity_z | ||
-1 0.0625 0.2148 -0.1204 -0.2661 0.7578 0.1576 -0.0715 | ||
``` | ||
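
For illustration, one record in this layout could be parsed as shown below. This is a minimal sketch: the `Particle` type and `parse_particle` function are hypothetical names, and the field order is assumed to be exactly the one given in the comment line of the example above.

```python
from typing import NamedTuple


class Particle(NamedTuple):
    label: int
    mass: float
    position: tuple
    velocity: tuple


def parse_particle(line):
    """Split one whitespace-separated record into its eight fields."""
    fields = line.split()
    values = [float(x) for x in fields[1:8]]
    return Particle(
        label=int(fields[0]),
        mass=values[0],
        position=tuple(values[1:4]),
        velocity=tuple(values[4:7]),
    )


sample = "-1 0.0625 0.2148 -0.1204 -0.2661 0.7578 0.1576 -0.0715"
particle = parse_particle(sample)
```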
|
||
## Compiled Methods | ||
|
||
We considered accelerators like [Numba](http://numba.pydata.org/),
[Pythran](https://pythran.readthedocs.io/), and [Transonic](https://transonic.readthedocs.io/)
for benchmarking.
This decision is inspired by [Ralf Gommers's presentation on SciPy 1.0](https://www.slideshare.net/RalfGommers/scipy-10-and-beyond-a-story-of-community-and-code)
(conference [video](https://www.youtube.com/watch?v=oHmm3mPxg6Y)).
We give brief details on a few of the accelerators below:
|
||
### Numba | ||
|
||
> Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code. | ||
|
||
<div style="text-align: right">Source: <a href="http://numba.pydata.org/">Numba's Website</a></div> | ||
|
||
Numba is a compiler focused on accelerating Python and NumPy code.
It uses the industry-standard LLVM compiler library
to translate Python functions into optimized machine code at runtime.
Its user API is built around decorators such as `@jit`, `@vectorize`, `@guvectorize`, `@stencil`, `@jitclass`, `@cfunc`, and `@overload`.
We use just-in-time (`@jit`) compilation in this work.
Numba also provides a `nopython` mode that generates fully compiled code
without intermediate calls into the Python interpreter.
Its support for NumPy arrays and functions also makes it a good candidate for comparison.
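
To show what a non-vectorized, JIT-compiled kernel can look like, here is a small sketch that uses Numba directly rather than through Transonic. It is an illustration only, not the benchmarked code: the function name is hypothetical, $G = 1$ is assumed, and the `try`/`except` fallback is added here so that the example still runs (uncompiled) where Numba is not installed.

```python
import math

import numpy as np

try:
    from numba import njit  # alias for jit(nopython=True)
except ImportError:
    def njit(func):
        """Fallback (assumption of this sketch): run uncompiled."""
        return func


@njit
def potential_energy(masses, positions):
    """Pairwise potential energy with explicit loops (G = 1 assumed)."""
    total = 0.0
    n = masses.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            dx = positions[i, 0] - positions[j, 0]
            dy = positions[i, 1] - positions[j, 1]
            dz = positions[i, 2] - positions[j, 2]
            total -= masses[i] * masses[j] / math.sqrt(dx * dx + dy * dy + dz * dz)
    return total


masses = np.array([1.0, 2.0])
positions = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
energy = potential_energy(masses, positions)
```

The explicit loops are the kind of code that is slow in the plain interpreter but that a JIT compiler can turn into tight machine code.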
|
||
<!-- NumPy and Numba both use a similar type of compilation for ufuncs in manual looping resulting in the same speed. Another thing that Numba lacks behind is that it does not support all functions of NumPy. There are functions in NumPy which does not hold up some of the optional arguments in nopython mode. It can implement linear algebra calls in the compiled functions but does not return any faster implementation. --> | ||
|
||
### Pythran | ||
|
||
> Pythran is an ahead of time compiler for a subset of the Python language, with a focus on scientific computing. | ||
|
||
<div style="text-align: right">Source: <a href="https://pythran.readthedocs.io/en/latest/#">Pythran's Website</a></div> | ||
|
||
Since Pythran focuses on accelerating Python and NumPy code,
its C++ API follows that of NumPy.
Pythran also supports [expression templates](https://en.wikipedia.org/wiki/Expression_templates) and [SIMD](https://en.wikipedia.org/wiki/SIMD) instructions, which are its main advantages.
It converts annotated Python modules into native Python modules
that are comparatively faster but keep the same interface.
|
||
<!-- NumPy arrays in Cython should be stored in contiguous memory like C-style or Fortran to use Pythran in the backend. Here, the Pythran lacks behind. Another limitation is that the sequence of bytes of words must be the same as the targeted architecture to make Pythran work.--> | ||
|
||
## Source Code | ||
|
||
* The code is inspired by <a href = "https://github.com/paugier/nbabel">Pierre Augier's work on N-Body Problem</a>. | ||
* Visualization Code: <a href = "/benchmarks/python/plot.py">here</a>. | ||
|
||
<html> | ||
<head> | ||
<style> | ||
table, th, td { | ||
border: 1px solid black; | ||
border-collapse: collapse; | ||
} | ||
</style> | ||
</head> | ||
<table> | ||
<tr> | ||
<td><b>Algorithm & Source Code</b></td> | ||
<td><b>Implementation Details</b></td> | ||
</tr> | ||
<tr> | ||
<td><a href = "/benchmarks/python/optimized_numpy.py">NumPy</a></td> | ||
<td>Vectorized Approach, Broadcasting Method, NumPy Arrays</td> | ||
</tr> | ||
<tr> | ||
<td><a href = "/benchmarks/python/pure_python.py">Python</a></td> | ||
<td>Standard Python Approach, Using List</td> | ||
</tr> | ||
<tr> | ||
<td><a href = "/benchmarks/cpp/main.cpp">C++</a></td> | ||
<td>C++ Implementation, GNU C++ Compiler</td> | ||
</tr> | ||
<tr> | ||
<td><a href = "/benchmarks/python/compiled_methods.py">Numba</a></td> | ||
<td>Just-In-Time Compilation, Non-Vectorized Approach, Using Numba at the Backend via Transonic, NumPy Arrays</td>
</tr> | ||
<tr> | ||
<td><a href = "/benchmarks/python/compiled_methods.py">Pythran</a></td> | ||
<td>Just-In-Time Compilation, Non-Vectorized Approach, Pythran at the Backend via Transonic, NumPy Arrays</td> | ||
</tr> | ||
</table> | ||
</html> | ||
|
||
## Results | ||
|
||
Table values represent the normalized time `time / nParticles^2`, in seconds, taken
by each algorithm to run on the given datasets for $50$ iterations.
The raw timing data can be downloaded from <a href = "benchmarks/data/table.csv">here</a>. | ||
|
||
<html> | ||
<head> | ||
<style> | ||
table, th, td { | ||
border: 1px solid black; | ||
border-collapse: collapse; | ||
} | ||
</style> | ||
</head> | ||
<body> | ||
<table> | ||
<tr> | ||
<td>Input(s) $\rightarrow$</td> | ||
<td><b>32</b></td> | ||
<td><b>64</b></td> | ||
<td><b>128</b></td> | ||
<td><b>256</b></td> | ||
</tr> | ||
<tr>
<td><b>NumPy</b></td> | ||
<td>0.434</td> | ||
<td>0.243</td> | ||
<td>0.139</td> | ||
<td>0.0713</td> | ||
</tr> | ||
<tr> | ||
<td><b>Python</b></td> | ||
<td>0.838</td> | ||
<td>0.783</td> | ||
<td>0.82</td> | ||
<td>0.697</td> | ||
</tr> | ||
<tr> | ||
<td><b>C++</b></td> | ||
<td>0.1001</td> | ||
<td>0.089</td> | ||
<td>0.089</td> | ||
<td>0.075</td>
</tr>
<tr>
<td><b>Numba</b></td> | ||
<td>0.1007</td> | ||
<td>0.101</td> | ||
<td>0.106</td> | ||
<td>0.104</td> | ||
</tr> | ||
<tr> | ||
<td><b>Pythran</b></td> | ||
<td>0.02</td> | ||
<td>0.02</td> | ||
<td>0.019</td> | ||
<td>0.0203</td> | ||
</tr> | ||
</table> | ||
</body> | ||
</html> | ||
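
The normalization used for the table can be sketched in a few lines. Dividing by the square of the particle count is a natural choice when every pair of particles interacts, since the work per step then grows roughly quadratically. The function name and the sample timing below are hypothetical and are not taken from the raw data.

```python
def normalized_time(elapsed_seconds, n_particles):
    """Return time / nParticles^2 for one benchmark run."""
    return elapsed_seconds / n_particles ** 2


# Hypothetical example: a run on 64 particles that took 3.2 seconds.
example = normalized_time(3.2, 64)
```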
|
||
## Environment configuration | ||
|
||
* **CPU Model:** Intel(R) Core(TM) i7-10870H CPU @ 2.20GHz | ||
* **RAM GB:** 16 | ||
* **RAM Model:** DDR4 | ||
* **Speed:** 3200 MT/s | ||
* **Operating System:** Manjaro Linux 21.1.1, Pahvo | ||
* **Library Versions:** | ||
* Python: 3.9.6 | ||
* NumPy: 1.20.3 | ||
* Numba: 0.54.0 | ||
* Pythran: 0.9.12.post1 | ||
* Transonic: 0.4.10 | ||
* GCC: 11.1.0 | ||
* **Note:** The benchmarking is performed on $4$ isolated CPU cores for accurate results.
|
||
## Conclusion | ||
|
||
* NumPy is very efficient, especially for larger datasets.
It is $3.2$ times faster than Python for input size $64$,
$5.8$ times faster for input size $128$,
and more than $9.7$ times faster for input size $256$.
NumPy's advantage grows quickly as the number of particles in the dataset increases,
thanks to its vectorized approach. Vectorization keeps the code clean and concise, and it delivers better performance without explicit looping or indexing.
|
||
* NumPy relies on pre-compiled C code, which contributes to its performance.
The table shows that NumPy approaches the speed of C++ as the input size increases.
For input size $64$, NumPy is $2.7$ times slower than C++.
For input size $128$, the gap narrows to $1.56$ times slower,
and for input size $256$ NumPy outperforms C++ by a factor of $1.05$.
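
To make the vectorized approach concrete, the sketch below computes all pairwise accelerations with NumPy broadcasting and no explicit Python loops. It is an illustration rather than the benchmarked `optimized_numpy.py` code: the function name is hypothetical and $G = 1$ is assumed.

```python
import numpy as np


def accelerations(masses, positions):
    """Acceleration on every body from all the others, via broadcasting."""
    # delta[i, j] = r_j - r_i, shape (n, n, 3)
    delta = positions[np.newaxis, :, :] - positions[:, np.newaxis, :]
    dist = np.linalg.norm(delta, axis=-1)
    # An infinite self-distance makes the self-interaction term vanish.
    np.fill_diagonal(dist, np.inf)
    # weights[i, j] = m_j / |r_j - r_i|^3
    weights = masses[np.newaxis, :] / dist ** 3
    # Sum the contributions of all bodies j for each body i.
    return (weights[:, :, np.newaxis] * delta).sum(axis=1)


masses = np.array([1.0, 2.0])
positions = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
acc = accelerations(masses, positions)
```

The intermediate `delta` array has shape `(n, n, 3)`, so this style trades memory for speed, which is worth keeping in mind for the largest datasets.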
|
||
**How can we accelerate NumPy?** | ||
|
||
NumPy keeps improving to give end users better performance, and it performs well in most cases.
To fill the gaps where NumPy falls short, compiled methods
such as Numba and Pythran are used, and they play a significant role. Here,
we used Transonic's JIT compilation as the backend to run Numba and Pythran on NumPy arrays.
Specifically, we want to compare NumPy's vectorized approach with the JIT-compiled non-vectorized approach.
|
||
* Numba is $2.4$ times faster than NumPy for input size $64$
and $1.31$ times faster for input size $128$.
For input size $256$, however, NumPy is $1.45$ times faster than Numba.
* Pythran is $12.15$ times faster than NumPy for input size $64$,
$7.31$ times faster for input size $128$, and $3.51$ times faster for input size $256$.
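
The ratios quoted in this section can be recomputed from the rounded values in the results table, as in the sketch below. The helper name `speedup` is hypothetical, and because the table is rounded, a ratio may differ slightly in the last digit from one computed on the raw timings.

```python
# Normalized times copied from the results table (input sizes 64, 128, 256).
normalized_times = {
    "NumPy": {64: 0.243, 128: 0.139, 256: 0.0713},
    "Python": {64: 0.783, 128: 0.82, 256: 0.697},
    "C++": {64: 0.089, 128: 0.089, 256: 0.075},
    "Numba": {64: 0.101, 128: 0.106, 256: 0.104},
    "Pythran": {64: 0.02, 128: 0.019, 256: 0.0203},
}


def speedup(slower, faster, size):
    """How many times faster `faster` is than `slower` at a given input size."""
    return normalized_times[slower][size] / normalized_times[faster][size]


for size in (64, 128, 256):
    print(
        size,
        f"NumPy vs Python: {speedup('Python', 'NumPy', size):.2f}",
        f"C++ vs NumPy: {speedup('NumPy', 'C++', size):.2f}",
        f"Numba vs NumPy: {speedup('NumPy', 'Numba', size):.2f}",
        f"Pythran vs NumPy: {speedup('NumPy', 'Pythran', size):.2f}",
    )
```

A value below $1$ in a column means the comparison flips, as it does for C++ and Numba against NumPy at input size $256$.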
|
||
We compared the performance of NumPy with two of the most popular languages,
Python and C++, and with popular compiled methods such as Numba and Pythran.
We observed a remarkable change in NumPy's behaviour as the input size increases.
For small inputs, NumPy's speed was closer to that of Python,
but for larger inputs its behavior shifted from "Python-like" to "compiled-like".
Its running time became similar to that of accelerators designed to increase the performance of code.
It achieves good performance for scientific computations
as well as for real-life problems. That's NumPy.
It holds up well across the circumstances tested here.
|
||
## References | ||
|
||
1. [The issue for adding content on performance](https://github.com/numpy/numpy.org/issues/370) | ||
2. <a id="nbody" href="https://en.wikipedia.org/wiki/N-body_problem">Wikipedia's Article on N-Body Problem</a> | ||
3. <a id="data" href="https://github.com/paugier/nbabel/tree/master/data">Dataset used from Pierre Augier's repository</a> |