TST: Run ASV on Travis? #15035

Closed
max-sixty opened this issue Jan 2, 2017 · 7 comments
Labels
Performance, Testing

Comments

@max-sixty (Contributor)

Running ASV locally leaves it up to the pull requester, which means it only gets run occasionally (and is a bit of a burden).

Is there a way to run it on Travis without significantly slowing down the builds? I know CircleCI has the ability to skip tests depending on the commit message; is there something similar we could do for Travis, so the benchmarks only run when a #run_asv string (or similar) appears in the commit message?
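
A minimal sketch of what such a gate could look like, assuming Travis exposes the commit message via the TRAVIS_COMMIT_MESSAGE environment variable and that the benchmarks live in pandas' asv_bench directory; the tag string and regression threshold are just placeholders:

```python
# Hypothetical gate (not an existing pandas script): run the ASV suite
# only when the commit message opts in with "#run_asv".
import os
import subprocess
import sys

commit_msg = os.environ.get("TRAVIS_COMMIT_MESSAGE", "")
if "#run_asv" not in commit_msg:
    print("no #run_asv tag in commit message, skipping benchmarks")
    sys.exit(0)

# Compare the PR head against master; -f 1.10 flags changes above 10%.
subprocess.check_call(
    ["asv", "continuous", "-f", "1.10", "upstream/master", "HEAD"],
    cwd="asv_bench",
)
```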

@jorisvandenbossche added the Performance and Testing labels Jan 2, 2017
@jreback (Contributor) commented Jan 2, 2017

I think this would be possible, though the running time might be too long (it has to create two envs and run all the benchmarks). I don't run the full suite very often myself, but yes, this would be nice (and ideally we could have multiple benchmark runs, say versus 0.19.0, 0.18.0, etc.).
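
As a rough illustration of that kind of multi-baseline run (not an agreed-upon setup), asv continuous could be pointed at a few release tags in turn; the tag names and factor below are assumptions:

```python
# Illustrative only: benchmark the current HEAD against a few released
# baselines; the tags must exist in the local clone.
import subprocess

for baseline in ("v0.18.0", "v0.19.0"):
    # asv builds an environment per commit, runs the suite on both, and
    # reports benchmarks that changed by more than the given factor.
    subprocess.check_call(
        ["asv", "continuous", "-f", "1.10", baseline, "HEAD"],
        cwd="asv_bench",
    )
```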

We could easily just set up another repo, like dask did recently, to make this pretty automated. (We might actually want to set up a new org, e.g. pandas-dev-benchmarks, so that the Travis runs don't compete with main pandas, but that is a separate issue.)

Note that the actual running of the scripts is here: a set of automated scripts creates an env and runs the benchmarks (this is the part that would go on Travis).

Also, someone could scour Travis for tools/examples that do this kind of benchmarking.

Anyone want to give it a whirl?

@jreback jreback added this to the Next Major Release milestone Jan 2, 2017
@jorisvandenbossche (Member)

I also think this should be possible, but indeed computing time may be the biggest problem. Do you know how long it takes you to run the full benchmark suite?

I don't think creating a separate repo for this is needed. I thought the main reason for dask to have it as a separate repo was to also include distributed benchmarks (i.e. benchmarks not tied to a single package), dask/dask#1738. The advantage of easily including benchmarks with PRs is something we want to keep, IMO. They also have a PR to set up a cron job: dask/dask-benchmarks#8

If we were to have an external machine to run perf tests, https://github.com/anderspitman/autobencher could also be interesting (it is used by scikit-bio).

@tacaswell (Contributor)

From observation, Travis runtimes can be very flaky, which might greatly reduce the value of ASV results.

@jreback (Contributor) commented Jan 2, 2017

Are there other services (e.g. maybe CircleCI) that are 'meant' for benchmarking, as opposed to making Travis work for us?

@jorisvandenbossche (Member)

> From observation, Travis runtimes can be very flaky, which might greatly reduce the value of ASV results.

The question then is also whether this variability is mainly between runs, or also within one run. Differences between runs are not necessarily a problem for this use case, since the benchmark always compares against master within the same Travis run.
But I can certainly imagine that this can be flaky as well.

For full benchmark results over time, this will indeed be a problem. But for that, another option would be to have a separate machine (spend some money on it, or share infrastructure with other projects, dask/dask-benchmarks#3 (comment)).

@pv (Contributor) commented Jan 6, 2017

Re: continuous benchmarking

In my experience, you get good enough benchmark stability already from the cheapest dedicated server (~100€/year). One caveat, however, is that these can have low-end CPUs, which behave differently from higher-end models on some performance benchmarks (e.g. memory bandwidth issues). It's also fairly straightforward to set up a cron job (e.g. inside a VM or other sandboxing) on your own desktop machine. The results can easily be hosted on GitHub etc., so the machine does not need to be publicly visible.
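
As an illustration only, the nightly driver such a cron job invokes could boil down to something like the following, assuming asv is installed on the machine, the benchmarks live in an asv_bench directory, and the generated HTML is pushed to a gh-pages branch for hosting on GitHub:

```python
# Sketch of a nightly benchmark driver for a dedicated (or sandboxed)
# machine; the working directory and publishing target are assumptions.
import subprocess

commands = [
    ["asv", "run", "NEW"],   # benchmark commits not yet benchmarked
    ["asv", "publish"],      # render the static HTML report
    ["asv", "gh-pages"],     # push the report to a gh-pages branch
]
for cmd in commands:
    subprocess.check_call(cmd, cwd="asv_bench")
```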

Stability is in practice also less important for continuous benchmarking over time, and more important for asv continuous. The reason is that CPU performance stepping and system load contribute low-frequency noise (variation on long time scales). This averages towards zero for continuous benchmarking, where benchmark runs are separated by long time intervals. In contrast, the rapid measurement in asv continuous takes samples over a short time interval and cannot average over the slow noise.

I don't know a good solution for benchmarking PRs, however. The benchmark suites often take too long to run on Travis, and the results are too unreliable.

@TomAugspurger (Contributor)

Closing this since we have a dedicated machine for this.
