TST: Run ASV on Travis? #15035

max-sixty · 2017-01-02T14:58:58Z

Running ASV locally entrusts it to the pull requester, which means it gets run only in occasional circumstances (and is a bit of a burden).

Is there a way to run it on Travis, without significantly slowing down the builds? I know CircleCI has the ability to skip tests depending on the commit message - is there something similar we could do for Travis, so it only runs with an #run_asv string in the commit message (or similar)?

The text was updated successfully, but these errors were encountered:

jreback · 2017-01-02T16:19:22Z

I think this would be possible, though I think the running time might be too long? (create 2 envs & run all tests). but I don't run the full suite very often. but yes this would be nice (and ideally we could have multiple benchmark runs say versus 0.19.0, 0.18.0) etc.

we could easily just setup another repo, like dask did recently. To make this pretty automated. (we might actually want to setup a new org, e.g. pandas-dev-benchmarks), because then the travis runs don't compete with main pandas, but that is a separate issue.

note that the actually running of the scripts is here, a set of automated scripts create an env and run it (this is the part that would go on travis).

Also someone could scour travis for tools / examples that does this kind of benchmarking.

anyone want to give a whirl?

jorisvandenbossche · 2017-01-02T19:43:31Z

I also think this should be possible, but indeed computing time may be the most important problem. Do you know how long it takes for you to run the full benchmark?

I don't think creating a separate repo for this is needed. I thought the main reason for dask to have it as a separate repo was to also include distributed benchmarks (so benchmarks not related to a single package), dask/dask#1738. The advantage of easily including benchmarks with PRs is something we want to keep IMO. They also have a PR for making a cron job: dask/dask-benchmarks#8

If we would have an external machine to run perf tests, https://github.com/anderspitman/autobencher could also be interesting (used by scikit-bio).

tacaswell · 2017-01-02T19:47:20Z

From observation travis runtimes can be very flaky which might greatly reduce the value of ASV results.

jreback · 2017-01-02T19:52:09Z

are there other services (e.g. CircleCI maybe) that are 'meant' for benchmarking? (as opposed to 'making' travis work for us)?

jorisvandenbossche · 2017-01-02T19:59:41Z

From observation travis runtimes can be very flaky which might greatly reduce the value of ASV results.

The question then is also if this difference is mainly between runs, or also during one run. As differences between runs is not necessarily a problem for this use case, as the benchmark always compares to master within the same travis run.
But I can certainly imagine that also this can be flaky.

For full benchmark results over time, this will indeed be a problem. But for this, another option would be to have a separate machine to do this (spend some money on this, or share infrastructure with other projects, dask/dask-benchmarks#3 (comment))

pv · 2017-01-06T16:54:01Z

Re: continuous benchmarking

In my experience, you get good enough benchmark stability already from the cheapest dedicated server (~ 100€/year) --- one caveat however is in that these can have crappy CPUs, which behave differently from more high-end models vs some performance benchmarks (e.g. memory bandwidth issues). It's also fairly straightforward to set up a cron job to run (eg. inside a VM / other sandboxing) on your own desktop machine. The results can be easily hosted on github etc., so the machine does not need to be publicly visible.

The stability is in practice also less important for the continuous benchmarking over time, and more important for asv continuous. The reason is that CPU performance stepping and system loads contributes low-frequency noise (variation on long time scales). This averages towards zero for continuous benchmarking, where benchmark runs are separated by a long time interval --- in contrast, the rapid measurement in asv continuous takes samples over a short time interval, and cannot average over the slow noise.

I don't know a good solution for benchmarking PRs, however. The benchmark suites often take too long to run for Travis, and the results are too unreliable.

TomAugspurger · 2018-02-14T15:56:52Z

Closing this since we have a dedicated machine for this.

jorisvandenbossche added Performance Memory or execution speed performance Testing pandas testing functions or related to the test suite labels Jan 2, 2017

jreback added Difficulty Intermediate labels Jan 2, 2017

jreback added this to the Next Major Release milestone Jan 2, 2017

mroeschke mentioned this issue Jan 9, 2018

PERF: add asv job to travis #19104

Closed

TomAugspurger closed this as completed Feb 14, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TST: Run ASV on Travis? #15035

TST: Run ASV on Travis? #15035

max-sixty commented Jan 2, 2017

jreback commented Jan 2, 2017 •

edited

Loading

jorisvandenbossche commented Jan 2, 2017

tacaswell commented Jan 2, 2017

jreback commented Jan 2, 2017

jorisvandenbossche commented Jan 2, 2017

pv commented Jan 6, 2017 •

edited

Loading

TomAugspurger commented Feb 14, 2018

TST: Run ASV on Travis? #15035

TST: Run ASV on Travis? #15035

Comments

max-sixty commented Jan 2, 2017

jreback commented Jan 2, 2017 • edited Loading

jorisvandenbossche commented Jan 2, 2017

tacaswell commented Jan 2, 2017

jreback commented Jan 2, 2017

jorisvandenbossche commented Jan 2, 2017

pv commented Jan 6, 2017 • edited Loading

TomAugspurger commented Feb 14, 2018

jreback commented Jan 2, 2017 •

edited

Loading

pv commented Jan 6, 2017 •

edited

Loading