Add numba and tweak Python benchmarks #2
Merged
Hi, I added numba to the benchmarks, as discussed. I also tweaked the functions so that they return the correct results (same as the Julia version and the original problem).
I'll note that `timeit` is really clunky to use, and I do not have the time right now to learn it properly. For instance, I'm not sure adding "µs" to the print statements is appropriate -- feel free to take that out. More importantly, I did notice that I get somewhat different results when benchmarking in a script vs. benchmarking with `%timeit` in IPython. The differences are much more pronounced if I use larger array sizes.
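For reference, the script-side timings can be done with the stdlib `timeit` module along these lines (a minimal sketch; `weighted_sum` and the `repeat`/`number` settings are hypothetical stand-ins, not the actual benchmark code):

```python
import timeit

# Hypothetical stand-in for the benchmarked function:
# a plain-Python weighted sum over two equal-length sequences.
def weighted_sum(q, w):
    total = 0.0
    for qi, wi in zip(q, w):
        total += qi * wi
    return total

q = [float(i) for i in range(10)]
w = [1.0 / (1.05 ** i) for i in range(10)]

# timeit.repeat returns the *total* time for `number` calls per repeat;
# take the minimum and divide to estimate the per-call time.
times = timeit.repeat(lambda: weighted_sum(q, w), repeat=5, number=10_000)
per_call_us = min(times) / 10_000 * 1e6
print(f"weighted_sum: {per_call_us:.3f} µs per call")
```

Taking the minimum over repeats (rather than the mean) is the usual way to reduce noise from other processes, which may explain part of the script vs. `%timeit` discrepancy.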
Speaking of which, I think benchmarking with such small arrays may be misleading. I may be wrong about the mechanics, but I think each numpy call incurs some fixed overhead, so a tiny array of length 10, which is obviously not representative of real-world use cases, understates numpy's performance, especially in comparison to looping in pure Python. For instance, here are some benchmarks with randomly generated arrays of length 1200 (12 months * 10 years):
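A quick way to see the size effect is to time a pure-Python loop against the numpy dot product at both sizes (a sketch, not the actual benchmark script; the random inputs and iteration counts are assumptions):

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)

def bench(n, number=2_000):
    """Time a pure-Python weighted sum vs. numpy's dot product for size n."""
    q = rng.random(n)
    w = rng.random(n)
    q_list, w_list = q.tolist(), w.tolist()

    loop = min(timeit.repeat(
        lambda: sum(qi * wi for qi, wi in zip(q_list, w_list)),
        repeat=3, number=number))
    vec = min(timeit.repeat(lambda: float(q @ w), repeat=3, number=number))
    return loop, vec

for n in (10, 1200):
    loop, vec = bench(n)
    print(f"n={n:5d}  pure Python: {loop:.4f}s  numpy: {vec:.4f}s")
```

At n=10 numpy's fixed per-call overhead dominates, so the loop can win; at n=1200 the vectorized version should pull clearly ahead.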
Thus, you should give some thought to expanding the size of the `q` and `w` arrays. Not only is numpy then clearly faster than plain Python, but in certain cases numba is a bit faster than plain Julia. I recently ran some benchmarks on a similar (discounting) function, and numba was indeed slightly faster. Of course, numba is quite finicky and, in my experience, only works on small/clean enough problems. And there is of course also this. Nevertheless, if we're benchmarking performance, we should be as thorough and fair as possible. I hope this is helpful. Feel free to reach out here or on Slack if you'd like to discuss.