Implementation of N-Body Problem. #7

Merged
merged 55 commits into from
Oct 1, 2021
55 commits
db627f5
Create optimized-numpy.py
khushi-411 Aug 18, 2021
d29639b
Update optimized-numpy.py
khushi-411 Aug 18, 2021
4220830
Create __init__.py
khushi-411 Aug 20, 2021
79b0dc4
Add files via upload
khushi-411 Aug 20, 2021
b2968e9
Update 10-sq-arr.py
khushi-411 Aug 22, 2021
077a444
Create 2.py
khushi-411 Aug 25, 2021
82c82e4
Create python.py
khushi-411 Aug 25, 2021
d409bad
Delete optimized-numpy.py
khushi-411 Aug 25, 2021
8d6f908
Rename 2.py to optimized-numpy.py
khushi-411 Aug 25, 2021
4b06d3c
Rename python.py to pure-python-particle.py
khushi-411 Aug 25, 2021
c32ddf3
Add files via upload
khushi-411 Aug 25, 2021
40b37f7
Create __init__.py
khushi-411 Aug 25, 2021
155556d
Add files via upload
khushi-411 Aug 25, 2021
f8448b4
Delete __init__.py
khushi-411 Aug 25, 2021
53cbf43
Update README.md
khushi-411 Aug 25, 2021
036eb2f
Add files via upload
khushi-411 Aug 25, 2021
d726a8c
Update optimized-numpy.py
khushi-411 Aug 25, 2021
582b8ea
Create README.md
khushi-411 Aug 25, 2021
5d3b51a
Update optimized-numpy.py
khushi-411 Aug 25, 2021
c1c6cce
Add files via upload
khushi-411 Aug 25, 2021
b60e2f1
Update README.md
khushi-411 Aug 25, 2021
49c446c
Update optimized-numpy-2.py
khushi-411 Aug 26, 2021
1c4bf8a
Update optimized-numpy-2.py
khushi-411 Aug 26, 2021
211105f
Update README.md
khushi-411 Aug 26, 2021
1577007
Update README.md
khushi-411 Aug 26, 2021
42d68bd
Update optimized-numpy-2.py
khushi-411 Aug 26, 2021
75da537
Update README.md
khushi-411 Aug 30, 2021
52061ff
Add pure-pure code
khushi-411 Aug 30, 2021
f5938b8
Modified pure Python code, added cpp code
khushi-411 Aug 30, 2021
97e8cd8
Added modified code of Numba and Pythran, added Numba's error and Mod…
khushi-411 Aug 30, 2021
5892d3c
Modified Pure Python Code, optimized-NumPy code
khushi-411 Aug 30, 2021
cb83679
Updated README
khushi-411 Aug 30, 2021
528f37c
Updated README
khushi-411 Aug 30, 2021
efbc3b4
Added images
khushi-411 Aug 31, 2021
2e26877
Update README, Optimized-numpy-2.py, pure-python.py
khushi-411 Sep 1, 2021
3a55f0d
update optimized-numpy-2.py, pure-python.py
khushi-411 Sep 1, 2021
3d8b046
Update codes, uploaded image result
khushi-411 Sep 2, 2021
9c0fa3e
Add test cases, modified codes
khushi-411 Sep 3, 2021
71f222d
Modified codes
khushi-411 Sep 5, 2021
2c43efb
Obtained same results in each case
khushi-411 Sep 7, 2021
a3943f6
Add acc_pythran2.py code
khushi-411 Sep 9, 2021
d308ced
Add acc_pythran3.py code
khushi-411 Sep 9, 2021
7c99eef
To many adds up
khushi-411 Sep 13, 2021
fe4bd96
Pythran Code
khushi-411 Sep 14, 2021
170fcf0
Add code, blog
khushi-411 Sep 22, 2021
1a5349c
Add blog
khushi-411 Sep 22, 2021
24fac99
Add blog
khushi-411 Sep 22, 2021
0e9649d
Add
khushi-411 Sep 23, 2021
ccd1c2a
Add
khushi-411 Sep 24, 2021
a043bb6
add plot
khushi-411 Sep 28, 2021
b632a68
Add
khushi-411 Sep 28, 2021
0b9e175
Deleted files
khushi-411 Sep 28, 2021
9024693
Deleted files
khushi-411 Sep 28, 2021
a5bcec7
Updated files
khushi-411 Oct 1, 2021
fedad65
Updated blog and README
khushi-411 Oct 1, 2021
86 changes: 86 additions & 0 deletions README.md
@@ -1 +1,87 @@
# NumPy Benchmarks

Benchmarking NumPy in realistic situations.

## Usage

To run the benchmarks, you need to install the following libraries:
```
sudo pacman -S python-numpy
sudo pacman -S python-pythran
pip install transonic
sudo pacman -S python-llvmlite
sudo pacman -S python-setuptools
conda install numba
sudo pacman -S cmake
sudo pacman -S gcc
```

To Run the Files

1. Clone the Repository.
```
git clone https://github.com/khushi-411/numpy-benchmarks
```

2. To execute NumPy and Python Code.
```
taskset -c 6,7,14,15 python python/filename.py data/dataset_filename.txt
```
*Note:* To obtain accurate results, the benchmarks are run on 4 isolated CPU cores (6, 7, 14, 15).

3. To execute Algorithms using Compiled Methods.
```
transonic -b BACKEND python/filename.py
TRANSONIC_BACKEND="BACKEND" taskset -c 6,7,14,15 python python/filename.py data/dataset_filename.txt
```

Here, the `-b` flag sets the BACKEND. The default backend in Transonic is `pythran`; besides it, there are presently 3 other backends: `cython`, `numba` and `python`. Currently, we use `numba` and `pythran` for our implementation.
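The CPU pinning that `taskset -c` performs in the commands above can also be done from inside a script. The following is a minimal sketch, not part of this repository: the helper name `pin_to_cpus` is hypothetical, and it relies on `os.sched_setaffinity`, which is only available on Linux.

```python
import os


def pin_to_cpus(requested):
    """Restrict the current process to the requested CPUs (Linux only).

    Hypothetical helper for illustration. CPUs that are not available to
    the process are ignored; if none of them are available, the affinity
    is left unchanged. Returns the affinity set that is in effect.
    """
    available = os.sched_getaffinity(0)  # CPUs this process may run on
    usable = set(requested) & available
    if usable:
        os.sched_setaffinity(0, usable)
    return os.sched_getaffinity(0)
```

Calling `pin_to_cpus({6, 7, 14, 15})` at the top of a benchmark script would mirror the `taskset` invocation, assuming those cores exist and have been isolated on the machine.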

## Writing Benchmarks

Obtaining stable and reliable benchmark results requires tuning the system and analyzing the results manually.

Some things to consider:
- The first goal is to avoid the outliers caused by noisy applications.
- To set the system state for benchmarking, use the [pyperf system tune command](https://pyperf.readthedocs.io/en/latest/cli.html#system-cmd).
- To reduce system jitter, refer to the [tune the system for benchmarks](https://pyperf.readthedocs.io/en/latest/system.html#system) page.
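To make outliers visible, it helps to repeat each measurement and look at the spread rather than a single number. The sketch below uses only the standard library (`timeit` and `statistics`); the function names `time_function` and `workload` are placeholders, not code from this repository.

```python
import statistics
import timeit


def time_function(func, repeat=5, number=10):
    """Time `func` several times and summarize the per-call timings in seconds."""
    # Each entry of `runs` is the total time taken by `number` calls.
    runs = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [total / number for total in runs]
    return {
        "min": min(per_call),
        "median": statistics.median(per_call),
        "max": max(per_call),
    }


def workload():
    """Placeholder workload standing in for a real benchmark."""
    return sum(i * i for i in range(10_000))


summary = time_function(workload)
print(summary)
```

A large gap between `min` and `max` suggests that something else on the machine interfered with the run, which is what the system tuning above is meant to reduce.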

## File Structure
```
|-- cpp
    |-- main.cpp
|-- python
    |-- compiled_methods.py
    |-- numpy_python.py
    |-- plot.py
    |-- pure_python_particle.py
    |-- pure_python.py
|-- data
    |-- input files in the form of `inputX.txt`, where X = `16`, `32`, `64`, `128`, `256`, `512` and `1k`.
    |-- benchmark_output.csv
|-- images
|-- blog
|-- test
    |-- small tests to check the performance.
```

## Environment configuration

* **CPU Model:** Intel(R) Core(TM) i7-10870H CPU @ 2.20GHz
* **RAM GB:** 16
* **RAM Model:** DDR4
* **Speed:** 3200 MT/s
* **Operating System:** Manjaro Linux 21.1.1, Pahvo
* **Library Versions:**
  * Python: 3.9.6
  * NumPy: 1.20.3
  * Numba: 0.54.0
  * Pythran: 0.9.12.post1
  * Transonic: 0.4.10
  * GCC: 11.1.0
* **Note:** The benchmarking is performed on 4 isolated CPU cores for accurate results.

#### References
- [The issue for adding content on performance](https://github.com/numpy/numpy.org/issues/370)
- [Wikipedia's article on the N-Body Problem](https://en.wikipedia.org/wiki/N-body_problem)
- [Pierre Augier's work on N-Body Problem](https://github.com/paugier/nbabel)
159 changes: 159 additions & 0 deletions blog/numpy_benchmarking_blog.md
@@ -0,0 +1,159 @@
In this blog post, I'll talk about my journey at Quansight.
I want to share everything I was involved in and accomplished,
the issues I faced, and, most importantly, the awesome life hacks I learned during this period.

First of all, I'd like to express my gratitude to the whole team
for allowing me to be a part of it.
My work mainly focused on providing performance benchmarks for NumPy in realistic situations.
The goal was to show the world that NumPy handles quasi-real-life situations efficiently, too.

Minor note: To read the conceptual part of this project, visit [this page](https://deploy-preview-461--numpy-preview.netlify.app/benchmark/).

<p align="center">
<img src = "/images/journey.jpeg" alt = "A word cloud with themes, open-source projects and people mentioned throughout the blog post. Each is stylized using a different font, most of them calligraphical.">
</p>

<!-- TEASER_END -->

## My Experience
My work was broadly divided into the following categories:

- **N-Body Problem**: The [N-Body problem](https://en.wikipedia.org/wiki/N-body_problem) is one of the most famous
and widely accepted problems for benchmarking.
I was given this as the problem statement for the project.
I started with a theoretical understanding of the problem,
taking reference from Pierre Augier's remarkable work on the
[n-body](https://github.com/paugier/nbabel) problem. I'd love to thank him.
It was fun to learn how to connect the scientific part with the programming world.
I implemented the N-Body problem in Python, C, and C++.

- **Compiled Methods**: This was the most exciting part of the project.
I was introduced to various compiled methods like [Cython](https://cython.readthedocs.io/en/latest/),
[Pythran](https://pythran.readthedocs.io/en/latest/), [Numba](http://numba.pydata.org/), and [Transonic](https://transonic.readthedocs.io/en/latest/).
It was the first time I got to know about accelerators.
I went through their documentation and GitHub repositories, and searched for popular blogs about compiled methods.
I loved playing with them. I read the theory, looked into the examples,
and implemented them in my editor. It was a great learning experience for me
to get familiar with compiled methods.
I implemented the JIT compilation mode for Pythran and Numba for benchmarking via Transonic's support.

- **Visualization**: I used the [Matplotlib](https://matplotlib.org/) library to visualize the benchmarking results.
I tried various plots to see which one suited best,
like scatter plots, box plots, line charts, combinations of scatter plots and line charts, etc.
But none of these worked well:
they either lacked clarity or did not convey the results meaningfully.
We finally decided on two bar charts with different vertical scales
to accommodate the vastly different performance of Python vs. the compiled methods.
We also normalized the data to show trends as the number of particles increased.

- **Model Optimization**: Model optimization was one of the most exciting parts for me to work on.
I like playing with code. The main task was to ensure
we obtained similar results from all the implemented algorithms.
I revisited all the code I had written earlier. At this stage,
I was able to find errors in my code and had ideas to improve it.
The final aim was to achieve the same results at each step in minimal time.
The steps I followed to attain it:

- Initially, I played around with the library functions to check which ones gave the best results.

- I then turned my focus to reducing the number of loops.
And I'll say hats off to NumPy's *vectorized approach*.
NumPy ran more than 10% faster than Python.
The compelling thing is how NumPy's behavior changes
from Python-like performance to compiled-like performance.

- The only task left was to verify whether we got the same results in all cases.
Initially, I wanted to make my code as compact as possible.
Hence I focused on using more NumPy functions, but this, in turn,
reduced the readability of the code and made it more complex.
I learned that the structure of the code should be easy for the end user to understand.
The ultimate goal was to prove that NumPy performs well even without using its unique functions.
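To illustrate the loop reduction and the result checks described above, here is a minimal sketch of the pairwise gravitational acceleration step, written once with explicit Python loops and once with NumPy broadcasting. It is not the code from this repository: the function names are hypothetical, the gravitational constant is assumed to be 1, and the random input is made up for the comparison.

```python
import numpy as np


def accelerations_loops(positions, masses):
    """Pairwise gravitational accelerations (G = 1) with explicit Python loops."""
    n = len(masses)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Vector from particle j to particle i.
            delta = [positions[i][k] - positions[j][k] for k in range(3)]
            dist_cubed = sum(d * d for d in delta) ** 1.5
            for k in range(3):
                acc[i][k] -= masses[j] * delta[k] / dist_cubed
                acc[j][k] += masses[i] * delta[k] / dist_cubed
    return acc


def accelerations_numpy(positions, masses):
    """The same computation, vectorized with NumPy broadcasting."""
    # delta[i, j] = positions[i] - positions[j], shape (n, n, 3).
    delta = positions[:, None, :] - positions[None, :, :]
    dist_sq = (delta ** 2).sum(axis=-1)
    # Avoid dividing by zero on the diagonal; delta is zero there anyway.
    np.fill_diagonal(dist_sq, 1.0)
    inv_dist_cubed = dist_sq ** -1.5
    # acc[i] = -sum_j masses[j] * delta[i, j] / |delta[i, j]|**3
    return -(masses[None, :, None] * delta * inv_dist_cubed[:, :, None]).sum(axis=1)


# Made-up input: 16 particles with random positions and masses.
rng = np.random.default_rng(0)
positions = rng.random((16, 3))
masses = rng.random(16) + 0.5

acc_loops = np.array(accelerations_loops(positions.tolist(), masses.tolist()))
acc_numpy = accelerations_numpy(positions, masses)
print(np.allclose(acc_loops, acc_numpy))
```

The vectorized version trades memory for speed, since it builds an intermediate array of shape `(n, n, 3)`, while the loop version only ever holds one pair at a time.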

The following is the output of my work:

<img src = "/images/performance_benchmarking.png" alt = "A visual representation to compare the performance of NumPy with various languages like Python, C++, and accelerators like Numba, and Pythran." title = "Performance Benchmark; Number of Iterations: 50">

<!-- TEASER END -->

## Relevant Links

- **The issue I worked on**: [#370](https://github.com/numpy/numpy.org/issues/370) Add content on performance (e.g., benchmarks, mention accelerators).

- **PRs**: [#461](https://github.com/numpy/numpy.org/pull/461), [#1](https://github.com/numpy/numpyorg-benchmarks/pull/1), [#2](https://github.com/numpy/numpyorg-benchmarks/pull/2)

- **The Repository**: [numpyorg-benchmarks](https://github.com/numpy/numpyorg-benchmarks), [numpy-benchmarks](https://github.com/khushi-411/numpy-benchmarks)

- **Issue**: The most interesting issue I faced was using the vectorized approach in Pythran's implementation. I mentioned that [here](https://github.com/khushi-411/numpy-benchmarks/issues/4).

## Other Technical Work

- **Benchmarking Environment**: I enjoy changing my OS and love to try different environments.
But it was the first time I isolated a certain number of CPU cores for accurate benchmarking results.
I referred to the official documentation of [pyperf](https://pyperf.readthedocs.io/en/latest/)
and visited more than ten blogs to understand the idea.
It was a fun learning experience.

- **Git**: Getting familiar with various git commands was one of the most incredible things;
I became comfortable with them while working at Quansight.

## Advice to the Beginners

- **Getting familiar with the importance of the project**: I believe:
'To find joy in the work, the most important task is to know where it started from.'
Read the previous discussions and understand why your project is important.
I started my work with its origin. I read the issue related to benchmarking,
articles, and other related work.
I also visited the benchmarking pages of other libraries to get the idea.
Among them, the [micro-benchmarks of NumPy](https://pv.github.io/numpy-bench/) using ASV are the best.
They're lovely!

- **Search everything about your project in the first 3-4 days**: At this stage,
you need to get familiar with all the possible aspects of your project.
Look into as much related work as you can and examine
the positive and negative points of the proposed work.
Then it's high time to give structure to your project.
I was pretty sure about my work.
After getting familiar with the problem statement,
I read about various other projects related to benchmarking.
A few of them were [initialconditions.org](http://initialconditions.org/),
[benchmarks game](https://benchmarksgame-team.pages.debian.net/benchmarksgame/), and [Julia's micro-benchmarks](https://julialang.org/benchmarks/);
there were a few more.
I agree that it took me more than three days to complete,
but I learned specific life hacks that I'm pretty sure
I will apply in every project.
Make sure not to dig too deep into the topic at first.
First, know the width, then dive into the depth, and
ensure that you stay focused on the subject.

- **Start working**: Here is where the journey starts.
The best way to express yourself is to present everything that you have completed.
Ask as many questions as you can, but make sure you have spent quality time on the problem first.
I used to update my mentor, Matti Picus, each day about the progress of our work.
I am so glad to have had such a responsive and understanding mentor.

- **Learn to prioritize things & make connections** (make sure to express yourself).
I learned to make connections with people at Quansight.
It was my first professional experience.
I realized that the world is entirely different.
I still remember my first presentation (about a year ago) at my college.
I was not even able to speak up, but I worked on my communication skills.
I am pleased that within a few months, I interacted with such great personalities at Quansight.
And I am pretty sure it will go on and on!

## My Next Step
Quansight has opened lots of great opportunities for me.
I aim to make myself more comfortable in resolving problems and bugs.
Soon, I am looking forward to contributing to other issues in NumPy and other Open Source Projects.
It was one of the best learning experiences for me.

## Acknowledgment
I want to thank [Quansight](https://github.com/Quansight-Labs)
for allowing me to work in such a great environment.
I am grateful to my mentors, [Matti Picus](https://github.com/mattip) and [Ralf Gommers](https://github.com/rgommers)
for all their guidance and support throughout the internship timeline.
I'd also like to thank [Melissa Weber Mendonça](https://github.com/melissawm) for sharing cool ideas about our project.

Special thanks to [Kushashwa Ravi Shrimali](https://github.com/krshrimali) and [Kshitij Kalambarkar](https://github.com/kshitij12345)
for sharing their cool learning tricks and life hacks.

Thanks to y'all! It was great interacting with y'all.