Add basic benchmarks #1002
Conversation
    ],
    // "uninstall_command": ["return-code=any python -mpip uninstall -y {project}"],
    "build_command": [
        "python -m pip install cython",
this should not be necessary because of the pyproject.toml (and in fact will be ignored)
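(For context, the kind of pyproject.toml declaration being referred to would look roughly like the sketch below. This is an illustrative guess at a typical PEP 518 setup, not necessarily the project's exact file.)

```toml
# Hypothetical pyproject.toml excerpt -- the real dedupe file may differ.
[build-system]
# Declaring Cython here means pip installs it automatically in the isolated
# build environment, which is why a manual "pip install cython" should be redundant.
requires = ["setuptools", "wheel", "cython"]
build-backend = "setuptools.build_meta"
```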
If I comment out the `pip install cython` line, I get this. Can you spot the solution?
$ asv run -v HEAD^!
· Running '/usr/local/bin/git rev-list --first-parent HEAD^! --'
OUTPUT -------->
6a574253f6179f8b51bf131382f8b41032f925ec
· Creating environments
·· Creating virtualenv for virtualenv-py3.9
·· Running '/Users/nickcrews/Documents/projects/dedupe/.venv/bin/python3 -mvirtualenv -p /Users/nickcrews/Documents/projects/dedupe/.venv/bin/python3.9 /Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee'
OUTPUT -------->
created virtual environment CPython3.9.7.final.0-64 in 808ms
creator CPython3Posix(dest=/Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee, clear=False, no_vcs_ignore=False, global=False)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/Users/nickcrews/Library/Application Support/virtualenv)
added seed packages: pip==22.0.4, setuptools==62.1.0, wheel==0.37.1
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
·· Installing requirements for virtualenv-py3.9
·· Running '/Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/bin/python -mpip install -v wheel pip>=8'
OUTPUT -------->
Using pip 22.0.4 from /Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/lib/python3.9/site-packages/pip (python 3.9)
Requirement already satisfied: wheel in ./.asv/env/63b4d1e2919fcd950c89910ea84c38ee/lib/python3.9/site-packages (0.37.1)
Requirement already satisfied: pip>=8 in ./.asv/env/63b4d1e2919fcd950c89910ea84c38ee/lib/python3.9/site-packages (22.0.4)
· Discovering benchmarks
·· Running '/usr/local/bin/git rev-parse master^{commit}'
OUTPUT -------->
5d24e42aaf356ef7060c4d92435dc31337d05ad2
·· Running '/usr/local/bin/git clone --shared --recursive /Users/nickcrews/Documents/projects/dedupe /Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/project'
OUTPUT -------->
Cloning into '/Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/project'...
done.
·· Running '/usr/local/bin/git submodule deinit -f .'
·· Running '/usr/local/bin/git checkout -f 6a574253f6179f8b51bf131382f8b41032f925ec'
OUTPUT -------->
Note: switching to '6a574253f6179f8b51bf131382f8b41032f925ec'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at 6a57425 Add asv to CI
·· Running '/usr/local/bin/git clean -fdx'
·· Running '/usr/local/bin/git submodule update --init --recursive'
·· Uninstalling from virtualenv-py3.9
·· Running '/Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/bin/python -mpip uninstall -y dedupe'
OUTPUT -------->
WARNING: Skipping dedupe as it is not installed.
·· Running '/usr/local/bin/git name-rev --name-only --exclude=remotes/* --no-undefined 6a574253f6179f8b51bf131382f8b41032f925ec'
OUTPUT -------->
benchmarks
·· Building 6a574253 <benchmarks> for virtualenv-py3.9
·· Running '/Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/bin/python -m pip install -e .'
OUTPUT -------->
Obtaining file:///Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/project
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Checking if build backend supports build_editable: started
Checking if build backend supports build_editable: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'done'
Collecting numpy>=1.13
Using cached numpy-1.22.3-cp39-cp39-macosx_10_14_x86_64.whl (17.6 MB)
Collecting zope.index
Using cached zope.index-5.2.0-cp39-cp39-macosx_10_15_x86_64.whl (90 kB)
Collecting highered>=0.2.0
Using cached highered-0.2.1-py2.py3-none-any.whl (3.3 kB)
Collecting Levenshtein-search==1.4.5
Using cached Levenshtein_search-1.4.5-cp39-cp39-macosx_10_9_x86_64.whl
Collecting fastcluster
Using cached fastcluster-1.2.6-cp39-cp39-macosx_10_9_x86_64.whl (40 kB)
Collecting rlr>=2.4.3
Using cached rlr-2.4.6-py2.py3-none-any.whl (6.0 kB)
Collecting simplecosine>=1.2
Using cached simplecosine-1.2-py2.py3-none-any.whl (3.2 kB)
Collecting typing-extensions
Using cached typing_extensions-4.2.0-py3-none-any.whl (24 kB)
Collecting categorical-distance>=1.9
Using cached categorical_distance-1.9-py3-none-any.whl (3.3 kB)
Collecting haversine>=0.4.1
Using cached haversine-2.5.1-py2.py3-none-any.whl (6.1 kB)
Collecting BTrees>=4.1.4
Using cached BTrees-4.10.0-cp39-cp39-macosx_10_15_x86_64.whl (1.0 MB)
Collecting doublemetaphone
Using cached DoubleMetaphone-1.1-cp39-cp39-macosx_10_9_x86_64.whl (22 kB)
Collecting affinegap>=1.3
Using cached affinegap-1.12-cp39-cp39-macosx_10_14_x86_64.whl (14 kB)
Collecting dedupe-variable-datetime
Using cached dedupe_variable_datetime-0.1.5-py3-none-any.whl (4.8 kB)
Collecting dedupe-hcluster
Using cached dedupe_hcluster-0.3.9-cp39-cp39-macosx_10_9_x86_64.whl (167 kB)
Collecting zope.interface>=5.0.0
Using cached zope.interface-5.4.0-cp39-cp39-macosx_10_14_x86_64.whl (208 kB)
Collecting persistent>=4.1.0
Using cached persistent-4.9.0-cp39-cp39-macosx_10_15_x86_64.whl (148 kB)
Collecting pyhacrf-datamade>=0.2.0
Using cached pyhacrf_datamade-0.2.6-cp39-cp39-macosx_10_9_x86_64.whl (193 kB)
Collecting pylbfgs
Using cached PyLBFGS-0.2.0.14-cp39-cp39-macosx_10_9_x86_64.whl (55 kB)
Collecting datetime-distance
Using cached datetime_distance-0.1.3-py3-none-any.whl (4.1 kB)
Collecting future
Using cached future-0.18.2-py3-none-any.whl
Collecting six
Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Requirement already satisfied: setuptools in /Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/lib/python3.9/site-packages (from zope.index->dedupe==2.0.14) (62.1.0)
Collecting cffi
Using cached cffi-1.15.0-cp39-cp39-macosx_10_9_x86_64.whl (178 kB)
Collecting python-dateutil>=2.6.0
Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting pycparser
Using cached pycparser-2.21-py2.py3-none-any.whl (118 kB)
Installing collected packages: Levenshtein-search, doublemetaphone, affinegap, zope.interface, typing-extensions, six, pycparser, numpy, haversine, future, simplecosine, python-dateutil, pylbfgs, fastcluster, dedupe-hcluster, cffi, categorical-distance, rlr, pyhacrf-datamade, persistent, datetime-distance, highered, BTrees, zope.index, dedupe-variable-datetime, dedupe
Running setup.py develop for dedupe
Successfully installed BTrees-4.10.0 Levenshtein-search-1.4.5 affinegap-1.12 categorical-distance-1.9 cffi-1.15.0 datetime-distance-0.1.3 dedupe-2.0.14 dedupe-hcluster-0.3.9 dedupe-variable-datetime-0.1.5 doublemetaphone-1.1 fastcluster-1.2.6 future-0.18.2 haversine-2.5.1 highered-0.2.1 numpy-1.22.3 persistent-4.9.0 pycparser-2.21 pyhacrf-datamade-0.2.6 pylbfgs-0.2.0.14 python-dateutil-2.8.2 rlr-2.4.6 simplecosine-1.2 six-1.16.0 typing-extensions-4.2.0 zope.index-5.2.0 zope.interface-5.4.0
·· Running '/Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/bin/python -m pip wheel --no-deps --no-index -w /Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/asv-build-cache/6a574253f6179f8b51bf131382f8b41032f925ec /Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/project'
OUTPUT -------->
Processing /Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/project
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'error'
error: subprocess-exited-with-error
× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [16 lines of output]
Traceback (most recent call last):
File "/Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
main()
File "/Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 164, in prepare_metadata_for_build_wheel
return hook(metadata_directory, config_settings)
File "/Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/lib/python3.9/site-packages/setuptools/build_meta.py", line 188, in prepare_metadata_for_build_wheel
self.run_setup()
File "/Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/lib/python3.9/site-packages/setuptools/build_meta.py", line 281, in run_setup
super(_BuildMetaLegacyBackend,
File "/Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/lib/python3.9/site-packages/setuptools/build_meta.py", line 174, in run_setup
exec(compile(code, __file__, 'exec'), locals())
File "setup.py", line 9, in <module>
from Cython.Build import cythonize
ModuleNotFoundError: No module named 'Cython'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
·· Error running /Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/bin/python -m pip wheel --no-deps --no-index -w /Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/asv-build-cache/6a574253f6179f8b51bf131382f8b41032f925ec /Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/project (exit status 1)
I wonder if it has a pretty old version of pip that doesn't know how to handle the pyproject.toml? Try upgrading pip?
$ .asv/env/63b4d1e2919fcd950c89910ea84c38ee/bin/python -m pip --version
pip 22.0.4 from /Users/nickcrews/Documents/projects/dedupe/benchmarks/.asv/env/63b4d1e2919fcd950c89910ea84c38ee/lib/python3.9/site-packages/pip (python 3.9)
Very mysterious; we can leave that in there for now.
python tests/canonical.py -vv
cd benchmarks
asv check -E existing
git remote add upstream https://github.com/dedupeio/dedupe.git
what is happening here and on line 50?
In line 52, ASV checks out `upstream/master`, installs dedupe, and runs the benchmarks. Then it checks out `HEAD`, installs dedupe, and runs the benchmarks again. So lines 49 and 50 ensure that the `upstream/master` ref exists. It's probably possible to accomplish the same thing a different way, but this hardcodes it instead of relying on a variable (the "base commit of the PR"). Honestly, I was just copying from the pandas example I linked in a comment above.
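(For readers unfamiliar with ASV, the comparison described above typically boils down to something like the commands below. This is a hedged sketch of the general pattern, not the exact workflow file in this PR.)

```sh
# Make the base ref available locally (the hardcoded steps discussed above),
# then have ASV build, benchmark, and compare both commits.
git remote add upstream https://github.com/dedupeio/dedupe.git
git fetch upstream master
cd benchmarks                         # asv.conf.json lives here
asv continuous upstream/master HEAD   # benchmarks both commits and reports regressions
```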
So, in this configuration we are just comparing a PR to master; we aren't keeping a longer-term record of performance?
Yes. See the longer explanation in the general PR discussion section.
benchmarks/asv.conf.json
    // uninstalling the project. See asv.conf.json documentation.
    //
    "install_command": [
        "python -m pip install -r requirements.txt",
What is the current path? In some parts of the code it seems like it's "." and in others it seems like "./benchmarks".
Confusing, I agree. See here. It is under the repo root for both of these (I am pretty sure).
Thanks @NickCrews! I don't quite understand what's happening yet, because I suppose there are no existing benchmark results to compare against? How are the benchmark results persisted?
At this point the benchmarks are not persisted. To do the comparison, ASV runs the benchmarks against both `upstream/master` and `HEAD`. Indeed, since the benchmarks don't currently exist on `master`, there are no existing results to compare against yet.

Persisting results would be nice, but is harder. We would need to decide where to store them. It appears common to store them in a separate repo. For instance, this person owns the numpy-bench and scipy-bench repos for the results of those two projects, and that is also the repo with the GH Pages branch that hosts the static website. From some comment I found, they said they have an old machine in their basement that they have set up to run nightly. We could presumably use GH Actions, though that's a less consistent machine, I think.

We could also store results in this repo. Downloading the https://github.com/pv/scipy-bench repo took 280 MB compressed, and it is 820 MB uncompressed on my machine. Granted, they have a lot more benchmarks, so each of their result JSON files (one per commit) is much larger (273 kB for them compared to 4 kB for just the 12 benchmarks above), and they have several hundred commits. IDK exactly how that would pan out.

Not sure, what is the use case? Would you want to be able to compare benchmarks from an experiment branch to any arbitrary commit, just to decide whether or not to merge? Want the pretty graph that shows performance over all time?
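(If persistence were added later, the usual ASV flow is roughly the sketch below: accumulate result JSON files, render a static report, and publish it. This is a hedged outline of standard asv subcommands, not something this PR sets up.)

```sh
# Append results for the current commit to the results directory
# (.asv/results by default, which could be committed or pushed to a results repo).
asv run
# Render the static HTML report from all stored results...
asv publish
# ...and either preview it locally or push it to a gh-pages branch.
asv preview
asv gh-pages
```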
Okay, thanks for your work on this. Rebase against master and then I think we'll be good to go.
Let's revisit persisting at another time.
Run `asv quickstart` and fill out a bit of config.
No behavior change; you can still manually test with `python benchmarks/benchmarks/canonical.py -v`.
- Move more common code to common.py
- Migrate to snake_case
- Make DATASETS_DIR less hardcoded
- Ensure settings files are also in datasets; before, they varied depending on CWD
- Reformat with black
- Apply flake8 to benchmarks too
This will make it easier to get scores for benchmarks
This will make it easier to call `load()` as a setup function and not evaluate it in a benchmark. The only part we actually want to benchmark is the `run()`.
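(A hedged illustration of that split, with placeholder data instead of the real `load()`/`run()`; ASV times only the `time_*` methods, not `setup`.)

```python
# Hypothetical ASV benchmark sketch -- names and data are placeholders.
class TimeCanonical:
    def setup(self):
        # Runs before the timed method; ASV excludes it from the measurement.
        # In the real suite this is where load() would build the records.
        self.records = [{"name": f"person {i}"} for i in range(1_000)]

    def time_run(self):
        # Only this body is timed -- the equivalent of the real run() step.
        sorted(self.records, key=lambda r: r["name"])
```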
Now it follows sklearn's convention of `score(X, y)`.
I kinda wanted to keep it functional and agnostic to the framework, but it's more DRY if I just fully commit.
They just clog up the results and make them hard to see.
Not sure this is the best way to do this, but I was running into problems because asv is run inside its own little venv, so you can't just import neighboring modules by name, because you don't know the path that the interpreter starts on. I'm pretty confused by all this though, so I'm mostly just thrashing around until I find something that works.
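(One common workaround for that import problem is to put the benchmarks directory on `sys.path` explicitly at the top of each benchmark module; this is a hedged sketch of the pattern, not necessarily what this PR ended up doing.)

```python
# Make sibling modules (e.g. common.py) importable no matter where
# ASV's interpreter was started from.
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

import common  # noqa: E402  (sibling module in the benchmarks directory)
```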
- Benchmarks don't need to get installed as editable.
- We need the stuff in `requirements.txt` to be able to get *ASV* to run, but we don't need those dev requirements for the `tests` to run.
Really shot myself in the foot with choosing to do the rebase. Note the one fixup commit in this new changeset that addressed your one comment, and a fixup to reflect the master -> main branch rename.
Thanks @NickCrews, this will be interesting!!
closes #967
@fgregg Here is a start of the benchmarking code!
I did a fair bit of thrashing around trying to get things to work and cleaning things up, so there is definitely some churn if you look at each commit individually, though I tried to rebase and clean them up a bit. Notes:
- I'm no expert in Python packaging, so the way the project gets built and installed for benchmarking (the `build_command` and `install_command` in asv.conf.json) could very well be non-optimal; a hedged sketch of what those keys look like is below.
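(For context, a hedged sketch of the kind of override being described. The commands with `{...}` placeholders mirror ASV's documented defaults; the exact contents of benchmarks/asv.conf.json in this PR may differ.)

```json
{
    // Illustrative asv.conf.json fragment (ASV's config dialect allows // comments).
    "install_command": [
        "in-dir={env_dir} python -mpip install {wheel_file}"
    ],
    "uninstall_command": [
        "return-code=any python -mpip uninstall -y {project}"
    ],
    "build_command": [
        "python -m pip install cython",
        "python -mpip wheel --no-deps --no-index -w {build_cache_dir} {build_dir}"
    ]
}
```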