
Add content on performance benchmarks. #461

Closed · khushi-411 wants to merge 25 commits

Conversation

@khushi-411 commented Aug 11, 2021

This PR adds content on performance. Follows issue: #370

Rendered Version: https://deploy-preview-461--numpy-preview.netlify.app/

cc: @mattip

khushi-411 marked this pull request as draft on August 11, 2021
khushi-411 changed the title from "Add benchmark-doc, codes and graph." to "Add content on performance benchmarks." on Aug 11, 2021
@mattip (Member) commented Aug 11, 2021

Please reformat the lines to <80 characters

@mattip (Member) commented Aug 14, 2021

How to best link to this from the “Performant” block on the front page?

@rgommers (Member) commented:

> How to best link to this from the “Performant” block on the front page?

The content is in `content/en/config.yaml`, under `keyfeatures:`. An item `url:` could be added there under `title` and `text`, and that could then be used in `layouts/partials/keyfeatures.html` (or `static/css/keyfeatures.css`, not sure without looking in more detail).
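For illustration, the addition could look roughly like this (a sketch; the `url:` value and its exact handling in the template are assumptions):

```yaml
# content/en/config.yaml (sketch)
keyfeatures:
  - title: Performant
    text: ...           # existing description text stays as-is
    url: /benchmarks/   # hypothetical new field, to be picked up in keyfeatures.html
```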

@mattip (Member) commented Aug 15, 2021

Please reformat the lines in the markdown to have less than 80 characters.

@mattip (Member) commented Sep 9, 2021

Deployment is failing with

```
$ python gen_config.py && hugo
Start building sites …
Total in 114 ms
Error: Error building site: process: readAndProcessContent: \
    open /opt/build/repo/public/config.yaml: no such file
```

@mattip (Member) commented Sep 9, 2021

The results look reasonable now. Did you regenerate the figure to reflect the table?

@seberg (Member) left a comment:

I don't really want to dive into this, but I looked at the code and found it a bit oddly dense for the NumPy version. Is the code taken from somewhere? I am surprised, because the nbabel code seemed a bit clearer and more comparable to me at first sight.

To be clear: I like this start and having N-Body seems good! My code comments are mainly because it seemed the code could be better and needs to be more comparable.

But the document around it needs much more thought, in my opinion. As is, I think the N-body examples probably cannot stand alone (the results are just too bad for NumPy; NumPy is not good at this, although it might be decent for large enough N, in which case that may work).

Even then, we need a story around it to go somewhat beyond "numpy is fast" (or "how fast is numpy").

Pythran/Numba are here for the "fill the gap" part of the story: NumPy is fast enough for most things, and it is easy to fill the gap with C/Pythran/Numba/Cython where needed. It is trivial for Julia, etc. to point out that they are faster than Python/NumPy because that plays to their strengths; let's try to play to our strengths. But even that feels hard if we give examples 50-100 times slower than C, which is really much worse than typical.

benchmarks/python/optimized-numpy.py — 8 resolved review comments (outdated)
@mattip (Member) commented Sep 10, 2021

> As is, I think the N-body examples probably cannot stand alone

This is meant to be the first of many benchmarks. We chose this one because it was well constructed and seemed straightforward to present. It turned out to be not so great for NumPy, although at the end of the day I think there should be a way to avoid the loops and get better results.
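For what it's worth, one way the loops might be avoided is broadcasting over all particle pairs. A minimal sketch, assuming an `(n, 3)` position array and G = 1 (the function name is illustrative, not the PR's code):

```python
import numpy as np

def accelerations(pos, masses):
    """Pairwise gravitational accelerations, fully vectorized.

    pos: (n, 3) positions, masses: (n,). Trades the Python loops
    for O(n**2) memory, so it only pays off for moderate n.
    """
    # diff[i, j] = pos[j] - pos[i], shape (n, n, 3)
    diff = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # (n, n) pairwise distances
    np.fill_diagonal(dist, 1.0)                # avoid 0/0 on the diagonal
    inv_d3 = dist ** -3
    np.fill_diagonal(inv_d3, 0.0)              # no self-interaction
    # a_i = sum_j m_j * (pos_j - pos_i) / |pos_j - pos_i|**3
    return np.einsum('ij,ijk->ik', masses * inv_d3, diff)
```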

khushi-411 marked this pull request as ready for review on September 13, 2021
@khushi-411 (Author) commented Sep 18, 2021

Hi,
I have addressed all the review comments and made the improvements. Thank you for all your valuable feedback. Please let me know if there are any other corrections I should make.

Presently, I have two things in my mind:

  • I observed that NumPy performs very well for larger datasets. Should we remove the outputs for the smaller datasets and only include the larger ones? We have datasets of size 16, 32, 64, 128, 256, 512, 1k, 2k and 16k, and I included the performance for input sizes 32, 64, 128, and 256. Let me know if these values should change, or if that is okay.
  • We are using the non-vectorized approach for compiled methods like Numba and Pythran, because we ran into bugs with the vectorized approach (see khushi-411/numpy-benchmarks#4: unexpected behaviour while using the transonic @jit and @boost decorators); a sketch of this loop style follows below. Is it okay to use a non-vectorized method? Also, I expected the performance of Numba to be nearly equal to that of Pythran, but that's not the case here. What do you think about it?
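For reference, the non-vectorized (explicit-loop) style meant here looks roughly like the following sketch; the name and signature are illustrative, not the actual compiled_methods.py code:

```python
import numpy as np
from numba import njit

@njit
def accelerations(pos, masses, acc):
    # Explicit double loop over particle pairs; Numba (and, with the
    # matching annotation, Pythran) compiles it to machine code, so
    # the loops are cheap, unlike in pure Python.
    n = pos.shape[0]
    for i in range(n):
        ax = 0.0
        ay = 0.0
        az = 0.0
        for j in range(n):
            if i != j:
                dx = pos[j, 0] - pos[i, 0]
                dy = pos[j, 1] - pos[i, 1]
                dz = pos[j, 2] - pos[i, 2]
                d3 = (dx * dx + dy * dy + dz * dz) ** 1.5
                ax += masses[j] * dx / d3
                ay += masses[j] * dy / d3
                az += masses[j] * dz / d3
        acc[i, 0] = ax
        acc[i, 1] = ay
        acc[i, 2] = az
```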

I am looking forward to your inputs. Thanks!

cc: @mattip @rgommers @seberg

@rgommers (Member) commented:

> And I included the performance for input sizes 32, 64, 128, and 256. Let me know if these values should change, or if that is okay.

That should be fine. The runtime is a bit long (up to 180 sec), but that's still doable to reproduce because we don't run these benchmarks a lot. I'd not go any larger than 256 though.

> Is it okay to use a non-vectorized method?

That should be fine. Sometimes it's even more readable.

> Also, I expected the performance of Numba to be nearly equal to that of Pythran, but that's not the case here. What do you think about it?

You now use the code in `compiled_methods.py` for both Pythran and Numba, right? If so, it is slightly surprising that Numba is that much slower, but it can happen. On average they should be about equally fast, but if you're only looking at a single benchmark, the result can depend on a single line of code that's not well optimized.

@khushi-411 (Author) commented:

Hi,
Thanks for confirming.

> You now use the code in `compiled_methods.py` for both Pythran and Numba, right?

Yes, we are now using the same code for both Pythran and Numba.

Numba's behavior here is quite interesting.

@rgommers (Member) commented:

I think we should move the benchmarking code to a new repo; it doesn't quite fit here or in the main numpy repo. Proposed name: numpyorg-benchmarks. @mattip, @seberg, @melissawm, any thoughts on that?

@mattip (Member) commented Sep 28, 2021

I think something is off with the normalization. Originally, the formula `time / (n_particles**2)` was giving an almost constant time for all the implementations. Now it seems it is only constant for NumPy. If that is the case, we should rethink the normalization; something else is overshadowing the simple n**2 model.

Moving the benchmark code elsewhere seems reasonable. Should I open a repo?

@khushi-411 (Author) commented:

Hi,
I agree with you, but at the same time I am confident the results are correct; it's just a matter of how we look at them. Let's compare the two normalizations with a small example.

Let time t = 2 s and n_particles = 2, 3, 4, ...

  • Normalizing with time / n_particles gives 2/2 = 1, 2/3 ≈ 0.67, 2/4 = 0.5, etc.
  • Normalizing with time / (n_particles**2) gives 2/4 = 0.5, 2/9 ≈ 0.22, 2/16 = 0.125, etc. The rate of change differs (see the small snippet below).
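The same arithmetic in a couple of lines (values purely illustrative):

```python
import numpy as np

n_particles = np.array([2, 3, 4])
t = 2.0  # a constant time in seconds, just for illustration

print(t / n_particles)     # approx. [1.0, 0.667, 0.5]   (linear normalization)
print(t / n_particles**2)  # approx. [0.5, 0.222, 0.125] (quadratic normalization)
```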

The following is the output if we use time / (n_particles**2) for normalizing:
[figure: performance_benchmarking — normalized timing plot]

In my opinion, normalizing the results using the formula time / (n_particles**2) is a bit biased. Can we play with growth rates to present the benchmarking results?

What do you all think?

> Moving the benchmark code elsewhere seems reasonable. Should I open a repo?

We discussed transferring my repo https://github.com/khushi-411/numpyorg-benchmarks to the NumPy org, though it needs some edits.
The structure I'm planning:

```
|-- data
|   |-- all `input.txt` files here
|-- images
|   |-- output images
|-- nbody problem
|   |-- cpp
|   |   |-- main.cpp
|   |-- python
|   |   |-- python_numpy.py
|   |   |-- pure_python.py
|   |   |-- compiled_methods.py
|   |   |-- plot.py
|   |-- README.md (describing the problem statement)
|-- README.md
```

Is that okay?

Thanks!

cc: @mattip @rgommers @seberg @melissawm


> ## Results
>
> Table values represent the normalized time taken in seconds by each algorithm to run on the given datasets for $50$ number of iterations. The raw timing data can be downloaded from <a href = "benchmarks/data/table.csv">here</a>.
A reviewer (Member) commented on the excerpt above:

Format to 80 columns, specify the normalization used in the table, and remove redundancy.

Suggested change:

```diff
-Table values represent the normalized time taken in seconds by each algorithm to run on the given datasets for $50$ number of iterations. The raw timing data can be downloaded from <a href = "benchmarks/data/table.csv">here</a>.
+Table values represent the normalized time `time / n_particles` taken in
+seconds by each algorithm to run on the given datasets for $50$ iterations. The
+raw timing data can be downloaded from <a href =
+"benchmarks/data/table.csv">here</a>.
```

@khushi-411 (Author) replied:

Sure, I'll make the edits. Thanks!

@mattip (Member) commented Sep 29, 2021

> normalizing the results using the formula time / (n_particles**2) is a bit biased

Ahh, I was confused: you are currently normalizing by `time / n_particles`. Using `n_particles ** 2` seems to more or less flatten out the results for all the algorithms except NumPy, which gets faster with more particles (it would be nice to reason about this and add a comment on why it happens; hint: as we get more particles, the times move from "python-like" to "compiled-like"). What kind of bias do you see? I can see how to explain `n_particles ** 2` from the formulas, and it seems a sufficient normalization for a first-order approximation.

I think that is a more compelling story. Perhaps the numerical text at the top of each column could present both the raw and the normalized values, and the table could show both values as well.
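One way to make that reasoning explicit, assuming a simple cost model: if a run costs $t(n) \approx a + b\,n + c\,n^2$, where $c$ is the per-pair cost and $a$, $b$ cover fixed overhead and per-particle work, then $t(n)/n^2 \approx c + b/n + a/n^2$, which flattens to $c$ as $n$ grows. For the compiled versions $a$ and $b$ are tiny, so the normalized curve is flat almost immediately; for NumPy the per-call overhead is comparatively large, so the normalized time keeps falling over the measured range before it levels off.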

@mattip (Member) commented Sep 29, 2021

I created numpy/numpyorg-benchmarks. Please issue a PR to move the code there.

@mattip (Member) commented Sep 29, 2021

Sorry, I created mattip/numpyorg-benchmarks instead. It seems I do not have the needed permissions to create a repo in the numpy org, and I didn't notice that GitHub switched me to mattip instead. @rgommers, could you create the repo?

@rgommers (Member) commented:

> Sorry, I created mattip/numpyorg-benchmarks instead. It seems I do not have the needed permissions to create a repo in the numpy org, and I didn't notice that GitHub switched me to mattip instead. @rgommers, could you create the repo?

done!

@khushi-411 (Author) commented:

I'm closing this PR because I have moved all the work to the numpyorg-benchmarks repo. Thanks!

khushi-411 closed this on Oct 5, 2021
khushi-411 deleted the numpy-benchmarking branch on March 4, 2022