Line graph of benchmark results #33

TrevorHansen · 2024-01-02T00:48:56Z

This produces a graph of the benchmark results, e.g.:

For each extractor, it shows the improvement vs. time relative to a reference extractor. In this run, I gave each extractor a 2000-second timeout, after which the result from faster-greedy-dag was returned instead. Since faster-greedy-dag is also the reference, the timeouts just show time being used, with no improvement.

I've got some ideas about how to make it more informative, e.g.:

Including details of timeouts.
Changing the time axis to a log scale.

I'm keen to hear what others would find helpful.


mwillsey · 2024-01-12T00:06:12Z

I love the idea of better graphics, but I'm afraid I don't really understand what's going on with this plot atm.

Bastacyclop · 2024-01-12T08:59:05Z

I can imagine plotting, for every dataset and for all datasets combined, a figure with the extractor on the x-axis and time or cost quantiles on the y-axis.
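A minimal sketch of the data behind such a figure, assuming each result is a (dataset, extractor, value) row where the value is either a runtime or a cost. The function name, the row layout and the `"*"` key for the combined view are hypothetical; this is not code from the repository.

```python
import statistics
from collections import defaultdict

ALL = "*"  # hypothetical key for "all datasets combined"


def quantile_summary(rows, n=4):
    """Return {(dataset, extractor): quantile cut points} for time or cost.

    rows: iterable of (dataset, extractor, value).
    With n=4, statistics.quantiles returns the three quartile cut points.
    """
    groups = defaultdict(list)
    for dataset, extractor, value in rows:
        groups[(dataset, extractor)].append(value)
        groups[(ALL, extractor)].append(value)  # cumulative over every dataset
    return {key: statistics.quantiles(vals, n=n) for key, vals in groups.items()}
```

Each `(dataset, extractor)` entry gives one column of the proposed figure. On Python versions before 3.13, `statistics.quantiles` needs at least two values per group.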

Bastacyclop · 2024-01-12T09:05:18Z

I also like the general idea of Trevor's graph for incremental extractors: because you can decide how much time to spend on extraction, there is a trade-off between the time allowed and the cost reduction actually achieved. Maybe incremental extractors could have a different "tag" or trait to produce these additional plots, where cost can be reported over time.
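One way "cost reported over time" could work, sketched in Python rather than in the project's own code: an incremental extractor records each improved cost together with the elapsed time, and a plot can then ask for the best cost available at any time budget. `CostTrace`, `record` and `best_cost_at` are hypothetical names, and the sketch assumes `record` is called with non-decreasing elapsed times.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CostTrace:
    """Hypothetical progress log of an incremental (anytime) extractor."""
    times: List[float] = field(default_factory=list)  # elapsed seconds
    costs: List[float] = field(default_factory=list)  # best cost at that time

    def record(self, elapsed: float, cost: float) -> None:
        # Keep only strict improvements, so costs are non-increasing over time.
        if self.costs and cost >= self.costs[-1]:
            return
        self.times.append(elapsed)
        self.costs.append(cost)

    def best_cost_at(self, budget: float) -> Optional[float]:
        """Best cost found within `budget` seconds, or None if none yet."""
        i = bisect.bisect_right(self.times, budget)
        return self.costs[i - 1] if i else None
```

Sampling `best_cost_at` over a range of budgets gives the time-vs-cost curve for one egraph.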

TrevorHansen · 2024-01-13T07:02:57Z

Thanks @Bastacyclop & @mwillsey for the input.

I've not compared optimisation solvers before, so I don't know how it's commonly done. From an updated comment in the plot.py file, this is what the graph shows:

This assumes an extractor is run on all the egraph benchmarks at the same time.
So given 500 egraphs, each will receive 1/500th of a second of that first second's
CPU time. Say 10 egraphs finish processing with their extractor using less than
1/500th of a second's CPU time, i.e. they have a runtime of less than 2ms. Then
for the 2nd second of CPU time, each of the remaining egraphs will get 1/490th of a second of CPU time.

Continuing the example, if those 10 egraphs, which were processed in the first 1/500th
of a second, improved on the cost versus the reference implementation by a total of 20, then the
graph will plot an improvement of 20 at 1 second.

At 2 seconds, the improvement will be the sum of the improvements of all the egraphs
whose extraction finished in less than 1/500th + 1/490th of a second of CPU time, i.e. those
with a total runtime of less than 4.04ms.

This will continue until the timeout on the extractor is reached.
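A minimal sketch of the scheme described in that comment, assuming each egraph's result is summarized as (runtime in seconds, improvement over the reference) and that the timeout is a whole number of seconds. The function name and input layout are hypothetical; this is not the code in plot.py.

```python
from typing import List, Tuple


def cumulative_improvement(
    results: List[Tuple[float, float]], timeout_s: int
) -> List[Tuple[int, float]]:
    """Return [(second, cumulative improvement)] for seconds 1..timeout_s.

    Each second of CPU time is split evenly between the egraphs whose
    extraction has not finished yet; an egraph counts as finished once
    its runtime is less than the CPU time it has received so far.
    """
    pending = sorted(results)  # fastest egraphs first
    n = len(pending)
    done = 0          # egraphs finished so far
    cpu_each = 0.0    # CPU time each unfinished egraph has received
    total = 0.0       # sum of improvements of finished egraphs
    curve = []
    for second in range(1, timeout_s + 1):
        active = n - done
        if active:
            cpu_each += 1.0 / active  # e.g. 1/500, then 1/490, ...
        while done < n and pending[done][0] < cpu_each:
            total += pending[done][1]
            done += 1
        curve.append((second, total))
    return curve
```

With the numbers from the comment (500 egraphs, 10 of them finishing in under 2ms with a combined improvement of 20), the first point is `(1, 20.0)`.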

I don't have any claim that this is the best way to compare extractors graphically. Just some way that makes sense to me.

mwillsey · 2024-01-13T16:58:23Z

I think that might be a little too complicated. Perhaps another idea is to just scatter plot time vs cost. Each point is a single extraction of a single egraph, x = time, y = cost, color/shape = extractor. I think your idea of normalizing the cost is a good one and might be needed in this scheme as well. But yes, this seems like a tricky visualization problem.
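A minimal sketch of preparing data for that scatter plot, assuming each result row is (egraph, extractor, time in seconds, DAG cost) and normalizing each cost by the reference extractor's cost on the same egraph. Using faster-greedy-dag as the reference follows Trevor's earlier run; the function name and row layout are hypothetical.

```python
from collections import defaultdict


def scatter_points(rows, reference="faster-greedy-dag"):
    """Return {extractor: [(time_s, cost / reference_cost), ...]}.

    rows: iterable of (egraph, extractor, time_s, cost).
    Egraphs without a (non-zero) reference cost are skipped.
    """
    rows = list(rows)
    ref_cost = {eg: cost for eg, ex, _, cost in rows if ex == reference}
    points = defaultdict(list)
    for eg, ex, t, cost in rows:
        if ref_cost.get(eg):  # skip missing or zero reference costs
            points[ex].append((t, cost / ref_cost[eg]))
    return dict(points)
```

Each extractor's list could then be drawn as one series, e.g. with matplotlib's `Axes.scatter`, using a distinct colour or marker per extractor. A normalized cost below 1.0 means the extractor beat the reference on that egraph.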

TrevorHansen · 2024-01-27T07:15:31Z

I've made this a bit more confusing, but more useful. It now plots cumulative percentage improvement. So 1% means that the arithmetic mean of the improvement in DAG cost across all the benchmarks is 1%.
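The metric described above can be sketched as follows, assuming each benchmark is summarized as (reference DAG cost, extractor DAG cost) and that a benchmark whose extractor has not finished yet contributes 0%, since the reference result is used for it. The function name is hypothetical, and plot.py may compute the average differently.

```python
from typing import List, Optional, Tuple


def mean_percentage_improvement(
    results: List[Tuple[float, Optional[float]]]
) -> float:
    """Arithmetic mean over all benchmarks of the % DAG-cost improvement.

    cost is None when the extractor has not finished (or timed out), which
    is treated as 0% improvement.
    """
    if not results:
        return 0.0
    total = 0.0
    for ref_cost, cost in results:
        if cost is not None:
            total += 100.0 * (ref_cost - cost) / ref_cost
    return total / len(results)  # unfinished benchmarks still count in the mean
```

Evaluating this at each point in time, over the benchmarks finished by then, gives a cumulative curve that rises as more extractions complete.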

I tried log-scale for time, but didn't find it as helpful (see below).

Next, I'll do a separate plot which shows the number of timeouts. I think we need a dashboard with a few graphs to show what's happening.

I tried the scatter plot, but found it too crowded to be helpful.

TrevorHansen added 3 commits January 2, 2024 11:44

line graph of benchmark results

55bad7e

Reduce what's shown for tree extractions. Fix error if bottom-up isn't run.

65d5ac1

Merge remote-tracking branch 'main/main' into om17

c021a9a

TrevorHansen added 2 commits January 14, 2024 06:15

Improve the description of what the graph shows

c0af72b

move to cumulative percentage improvement

04efad2
