
Add content on performance benchmarks. #461

Closed · khushi-411 wants to merge 25 commits

Conversation

@khushi-411 commented Aug 11, 2021

This PR adds content on performance. Follows issue: #370

Rendered Version: https://deploy-preview-461--numpy-preview.netlify.app/

cc: @mattip

khushi-411 marked this pull request as draft on August 11, 2021
khushi-411 changed the title from "Add benchmark-doc, codes and graph." to "Add content on performance benchmarks." on Aug 11, 2021
@mattip (Member) commented Aug 11, 2021

Please reformat the lines to <80 characters

@mattip (Member) commented Aug 14, 2021

How to best link to this from the “Performant” block on the front page?

@rgommers (Member) commented:

> How to best link to this from the “Performant” block on the front page?

The content is in `content/en/config.yaml`, under `keyfeatures:`. An item `url:` could be added there under `title` and `text`, and that could then be used in `layouts/partials/keyfeatures.html` (or `static/css/keyfeatures.css`, not sure without looking in more detail).
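For illustration, the addition could look roughly like this (a sketch; the `url:` value and its exact handling in the template are assumptions):

```yaml
# content/en/config.yaml (sketch)
keyfeatures:
  - title: Performant
    text: ...           # existing description text stays as-is
    url: /benchmarks/   # hypothetical new field, to be picked up in keyfeatures.html
```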

@mattip (Member) commented Aug 15, 2021

Please reformat the lines in the markdown to have less than 80 characters.

@mattip (Member) commented Sep 9, 2021

Deployment is failing with

```
$ python gen_config.py && hugo
Start building sites …
Total in 114 ms
Error: Error building site: process: readAndProcessContent: \
    open /opt/build/repo/public/config.yaml: no such file
```

@mattip (Member) commented Sep 9, 2021

The results look reasonable now. Did you regenerate the figure to reflect the table?

@seberg (Member) left a comment:

I don't really want to dive into this, but I looked at the code and found it a bit oddly dense for the NumPy version. Is the code taken from somewhere? I am surprised, because the nbabel code seemed a bit clearer and more comparable to me at first sight.

To be clear: I like this start and having N-Body seems good! My code comments are mainly because it seemed the code could be better and needs to be more comparable.

But the document around it needs much more thought, in my opinion. As is, I think the N-body examples probably cannot stand alone (the results are just too bad for NumPy; NumPy is not good at this, although it might be decent for large enough N, in which case that may work).

Even then, we need a story around it to go somewhat beyond "numpy is fast" (or "how fast is numpy").

Pythran/Numba are here for the "fill the gap" part of the story: NumPy is fast enough for most things, and it is easy to fill the gap with C/Pythran/Numba/Cython where needed. It is trivial for Julia, etc. to point out that they are faster than Python/NumPy because that plays to their strengths; let's try to play to our strengths. But even that feels hard if we give examples 50-100 times slower than C, which is really much worse than typical.

benchmarks/python/optimized-numpy.py — 8 resolved review comments (outdated)
@mattip (Member) commented Sep 10, 2021

> As is, I think the N-body examples probably cannot stand alone

This is meant to be the first of many benchmarks. We chose this one because it was well constructed and seemed straightforward to present. It turned out to be not so great for NumPy, although at the end of the day I think there should be a way to avoid the loops and get better results.
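For what it's worth, one way the loops might be avoided is broadcasting over all particle pairs. A minimal sketch, assuming an `(n, 3)` position array and G = 1 (the function name is illustrative, not the PR's code):

```python
import numpy as np

def accelerations(pos, masses):
    """Pairwise gravitational accelerations, fully vectorized.

    pos: (n, 3) positions, masses: (n,). Trades the Python loops
    for O(n**2) memory, so it only pays off for moderate n.
    """
    # diff[i, j] = pos[j] - pos[i], shape (n, n, 3)
    diff = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # (n, n) pairwise distances
    np.fill_diagonal(dist, 1.0)                # avoid 0/0 on the diagonal
    inv_d3 = dist ** -3
    np.fill_diagonal(inv_d3, 0.0)              # no self-interaction
    # a_i = sum_j m_j * (pos_j - pos_i) / |pos_j - pos_i|**3
    return np.einsum('ij,ijk->ik', masses * inv_d3, diff)
```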

khushi-411 marked this pull request as ready for review on September 13, 2021
@khushi-411 (Author) commented Sep 18, 2021

Hi,
I have addressed all the review comments and made the improvements. Thank you for all your valuable feedback. Please let me know if there are any other corrections I should make.

Presently, I have two things in my mind:

  • I observed that NumPy performs very well for larger datasets. Should we remove the outputs for the smaller datasets and only include the larger ones? We have datasets of size 16, 32, 64, 128, 256, 512, 1k, 2k and 16k, and I included the performance for input sizes 32, 64, 128, and 256. Let me know if these values should change, or if that is okay.
  • We are using the non-vectorized approach for compiled methods like Numba and Pythran, because we ran into bugs with the vectorized approach (see khushi-411/numpy-benchmarks#4: unexpected behaviour while using the transonic @jit and @boost decorators); a sketch of this loop style follows below. Is it okay to use a non-vectorized method? Also, I expected the performance of Numba to be nearly equal to that of Pythran, but that's not the case here. What do you think about it?
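For reference, the non-vectorized (explicit-loop) style meant here looks roughly like the following sketch; the name and signature are illustrative, not the actual compiled_methods.py code:

```python
import numpy as np
from numba import njit

@njit
def accelerations(pos, masses, acc):
    # Explicit double loop over particle pairs; Numba (and, with the
    # matching annotation, Pythran) compiles it to machine code, so
    # the loops are cheap, unlike in pure Python.
    n = pos.shape[0]
    for i in range(n):
        ax = 0.0
        ay = 0.0
        az = 0.0
        for j in range(n):
            if i != j:
                dx = pos[j, 0] - pos[i, 0]
                dy = pos[j, 1] - pos[i, 1]
                dz = pos[j, 2] - pos[i, 2]
                d3 = (dx * dx + dy * dy + dz * dz) ** 1.5
                ax += masses[j] * dx / d3
                ay += masses[j] * dy / d3
                az += masses[j] * dz / d3
        acc[i, 0] = ax
        acc[i, 1] = ay
        acc[i, 2] = az
```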

I am looking forward to your inputs. Thanks!

cc: @mattip @rgommers @seberg

@rgommers (Member) commented:

> And I included the performance for input sizes 32, 64, 128, and 256. Let me know if these values should change, or if that is okay.

That should be fine. The runtime is a bit long (up to 180 sec), but that's still doable to reproduce because we don't run these benchmarks a lot. I'd not go any larger than 256 though.

> Is it okay to use a non-vectorized method?

That should be fine. Sometimes it's even more readable.

> Also, I expected the performance of Numba to be nearly equal to that of Pythran, but that's not the case here. What do you think about it?

You now use the code in `compiled_methods.py` for both Pythran and Numba, right? If so, it is slightly surprising that Numba is that much slower, but it can happen. On average they should be about equally fast, but if you're only looking at a single benchmark, the result can depend on a single line of code that's not well optimized.

@khushi-411 (Author) commented:

Hi,
Thanks for confirming.

> You now use the code in `compiled_methods.py` for both Pythran and Numba, right?

Yes, we are now using the same code for both Pythran and Numba.

Numba's behavior here is quite interesting.

@rgommers (Member) commented:

I think we should move the benchmarking code to a new repo; it doesn't quite fit here or in the main numpy repo. Proposed name: numpyorg-benchmarks. @mattip, @seberg, @melissawm, any thoughts on that?

@mattip (Member) commented Sep 28, 2021

I think something is off with the normalization. Originally, the formula `time / (n_particles**2)` was giving an almost constant time for all the implementations. Now it seems it is only constant for NumPy. If that is the case, we should rethink the normalization; something else is overshadowing the simple n**2 model.

Moving the benchmark code elsewhere seems reasonable. Should I open a repo?

@khushi-411 (Author) commented:

Hi,
I agree with you, but at the same time I am confident the results are correct; it's just a matter of how we look at them. Let's compare the two normalizations with a small example.

Let time t = 2 s and n_particles = 2, 3, 4, ...

  • Normalizing with time / n_particles gives 2/2 = 1, 2/3 ≈ 0.67, 2/4 = 0.5, etc.
  • Normalizing with time / (n_particles**2) gives 2/4 = 0.5, 2/9 ≈ 0.22, 2/16 = 0.125, etc. The rate of change differs (see the small snippet below).
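The same arithmetic in a couple of lines (values purely illustrative):

```python
import numpy as np

n_particles = np.array([2, 3, 4])
t = 2.0  # a constant time in seconds, just for illustration

print(t / n_particles)     # approx. [1.0, 0.667, 0.5]   (linear normalization)
print(t / n_particles**2)  # approx. [0.5, 0.222, 0.125] (quadratic normalization)
```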

The following is the output if we use time / (n_particles**2) for normalizing:
[figure: performance_benchmarking — normalized timing plot]

In my opinion, normalizing the results using the formula time / (n_particles**2) is a bit biased. Can we play with growth rates to present the benchmarking results?

What do you all think?

> Moving the benchmark code elsewhere seems reasonable. Should I open a repo?

We discussed transferring my repo https://github.com/khushi-411/numpyorg-benchmarks to the NumPy org, though it needs some edits.
The structure I'm planning:

```
|-- data
|   |-- all `input.txt` files here
|-- images
|   |-- output images
|-- nbody problem
|   |-- cpp
|   |   |-- main.cpp
|   |-- python
|   |   |-- python_numpy.py
|   |   |-- pure_python.py
|   |   |-- compiled_methods.py
|   |   |-- plot.py
|   |-- README.md (describing the problem statement)
|-- README.md
```

Is that okay?

Thanks!

cc: @mattip @rgommers @seberg @melissawm


> ## Results
>
> Table values represent the normalized time taken in seconds by each algorithm to run on the given datasets for $50$ number of iterations. The raw timing data can be downloaded from <a href = "benchmarks/data/table.csv">here</a>.
A reviewer (Member) commented on the excerpt above:

Format to 80 columns, specify the normalization used in the table, and remove redundancy.

Suggested change:

```diff
-Table values represent the normalized time taken in seconds by each algorithm to run on the given datasets for $50$ number of iterations. The raw timing data can be downloaded from <a href = "benchmarks/data/table.csv">here</a>.
+Table values represent the normalized time `time / n_particles` taken in
+seconds by each algorithm to run on the given datasets for $50$ iterations. The
+raw timing data can be downloaded from <a href =
+"benchmarks/data/table.csv">here</a>.
```

@khushi-411 (Author) replied:

Sure, I'll make the edits. Thanks!

@mattip (Member) commented Sep 29, 2021

> normalizing the results using the formula time / (n_particles**2) is a bit biased

Ahh, I was confused: you are currently normalizing by `time / n_particles`. Using `n_particles ** 2` seems to more or less flatten out the results for all the algorithms except NumPy, which gets faster with more particles (it would be nice to reason about this and add a comment on why it happens; hint: as we get more particles, the times move from "python-like" to "compiled-like"). What kind of bias do you see? I can see how to explain `n_particles ** 2` from the formulas, and it seems a sufficient normalization for a first-order approximation.

I think that is a more compelling story. Perhaps the numerical text at the top of each column could present both the raw and the normalized values, and the table could show both values as well.
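One way to make that reasoning explicit, assuming a simple cost model: if a run costs $t(n) \approx a + b\,n + c\,n^2$, where $c$ is the per-pair cost and $a$, $b$ cover fixed overhead and per-particle work, then $t(n)/n^2 \approx c + b/n + a/n^2$, which flattens to $c$ as $n$ grows. For the compiled versions $a$ and $b$ are tiny, so the normalized curve is flat almost immediately; for NumPy the per-call overhead is comparatively large, so the normalized time keeps falling over the measured range before it levels off.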

@mattip (Member) commented Sep 29, 2021

I created numpy/numpyorg-benchmarks. Please issue a PR to move the code there.

@mattip (Member) commented Sep 29, 2021

Sorry, I created mattip/numpyorg-benchmarks instead. It seems I do not have the needed permissions to create a repo in the numpy org, and I didn't notice that GitHub switched me to mattip instead. @rgommers, could you create the repo?

@rgommers (Member) commented:

> Sorry, I created mattip/numpyorg-benchmarks instead. It seems I do not have the needed permissions to create a repo in the numpy org, and I didn't notice that GitHub switched me to mattip instead. @rgommers, could you create the repo?

done!

@khushi-411 (Author) commented:

I'm closing this PR because I have moved all the work to the numpyorg-benchmarks repo. Thanks!

khushi-411 closed this on Oct 5, 2021
khushi-411 deleted the numpy-benchmarking branch on March 4, 2022