Line graph of benchmark results #33
base: main
I love the idea of better graphics, but I'm afraid I don't really understand what's going on with this plot atm.
I can imagine plotting, for every dataset and for all datasets combined, a figure with the extractor on the x-axis and time or cost quantiles on the y-axis.
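A minimal sketch of the per-extractor quantiles such a figure would need. It assumes each result is a hypothetical `(benchmark, extractor, time_seconds, dag_cost)` tuple, which is not necessarily the format plot.py reads. Calling it once per dataset's records, and once on all records, gives the per-dataset and combined figures; the cut points could then be drawn as box or error-bar plots.

```python
import statistics
from collections import defaultdict

# Hypothetical record layout, not necessarily what plot.py reads:
# (benchmark, extractor, time_seconds, dag_cost)
FIELD_INDEX = {"time": 2, "cost": 3}


def quantiles_by_extractor(records, field="time", n=4):
    """Return {extractor: cut points} for the chosen field.

    With n=4 the cut points are the quartiles [Q1, median, Q3].
    statistics.quantiles needs at least two results per extractor
    (on Python versions before 3.13).
    """
    if field not in FIELD_INDEX:
        raise ValueError(f"field must be one of {sorted(FIELD_INDEX)}")
    index = FIELD_INDEX[field]
    values = defaultdict(list)
    for record in records:
        # record[1] is the extractor name; group the chosen field by it.
        values[record[1]].append(record[index])
    # Sort by extractor name so the x-axis order is stable between runs.
    return {
        extractor: statistics.quantiles(vals, n=n)
        for extractor, vals in sorted(values.items())
    }
```

The default "exclusive" quantile method is used here; `statistics.quantiles` also accepts `method="inclusive"` if the extremes should bound the quartiles.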
I also like the general idea of Trevor's graph for incremental extractors: because you can decide how much time to spend on extraction, there is a trade-off between the time allowed and the cost reduction actually achieved. Maybe incremental extractors could have a different "tag" or trait to produce these additional plots, where cost can be reported over time.
Thanks @Bastacyclop & @mwillsey for the input. I've not compared optimisation solvers before, so I don't know how it's commonly done. According to an updated comment in the plot.py file, this graph shows:
I don't claim that this is the best way to compare extractors graphically. It's just one way that makes sense to me.
I think that might be a little too complicated. Perhaps another idea is to just scatter plot time vs. cost. Each point is a single extraction of a single egraph: x = time, y = cost, color/shape = extractor. I think your idea of normalizing the cost is a good one and might be needed in this scheme as well. But yes, this seems like a tricky visualization problem.
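A minimal sketch of the data preparation for that scatter plot, with cost normalized per egraph. It assumes the same hypothetical `(benchmark, extractor, time_seconds, dag_cost)` tuples and, as an illustrative choice, uses faster-greedy-dag as the normalizing reference; neither is taken from plot.py.

```python
from collections import defaultdict

# Hypothetical record layout: (benchmark, extractor, time_seconds, dag_cost).


def scatter_points(records, reference="faster-greedy-dag"):
    """Group (time, normalized_cost) points by extractor.

    Cost is divided by the reference extractor's cost on the same
    benchmark, so 1.0 means "same as the reference" and values below
    1.0 mean a cheaper extraction. Benchmarks without a positive
    reference cost are skipped.
    """
    ref_cost = {b: c for b, e, _, c in records if e == reference}
    points = defaultdict(list)
    for benchmark, extractor, time_s, dag_cost in records:
        ref = ref_cost.get(benchmark)
        if ref:  # skips missing (None) and zero reference costs
            points[extractor].append((time_s, dag_cost / ref))
    return dict(points)
```

Each extractor's list can be unzipped into x and y sequences and passed to one matplotlib `Axes.scatter` call with its own color or marker, which gives the color/shape = extractor encoding.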
I've made this a bit more confusing, but more useful. It now plots cumulative percentage improvement, so 1% means that the arithmetic-average improvement in DAG cost across all the benchmarks is 1%. I tried a log scale for time, but didn't find it as helpful (see below). Next, I'll do a separate plot that shows the number of timeouts. I think we need a dashboard with a few graphs to show what's happening. I tried the scatter plot, but found it too crowded to be helpful.
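One way to compute such a cumulative curve, as a minimal sketch rather than what plot.py actually does. It assumes one extractor's results arrive as a hypothetical dict mapping benchmark name to `(time_seconds, dag_cost)`, and the reference costs as a dict mapping benchmark name to DAG cost. A benchmark contributes its percentage improvement from the moment it finishes and 0% before that, so the last point of the curve is the arithmetic-average improvement over all benchmarks.

```python
def cumulative_improvement(results, reference_cost):
    """Return [(time_s, mean % improvement so far), ...] sorted by time.

    results: {benchmark: (time_s, dag_cost)} for one extractor; every
        benchmark in it must also appear in reference_cost.
    reference_cost: {benchmark: dag_cost} for the reference extractor.
    The mean is taken over all reference benchmarks, so benchmarks that
    have not finished by a given time count as 0% improvement.
    """
    n = len(reference_cost)
    # One event per finished benchmark: (finish time, % improvement).
    events = sorted(
        (time_s, 100.0 * (reference_cost[b] - cost) / reference_cost[b])
        for b, (time_s, cost) in results.items()
    )
    curve, total = [], 0.0
    for time_s, pct in events:
        total += pct
        curve.append((time_s, total / n))
    return curve
```

The result is a step function of time, so matplotlib's `Axes.step` with `where="post"` draws it more faithfully than a straight line plot.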
This produces a graph of the benchmark results, e.g.:
For each extractor, it shows the improvement over a reference extractor as a function of time. In this run, each extractor had a 2000-second timeout, after which the result from faster-greedy-dag was returned. Since faster-greedy-dag is also the reference, a timeout shows up as time being used with no improvement.
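The timeout fallback can be expressed per benchmark as below. This is a sketch: the function name and the convention of passing `None` for a run that did not finish are hypothetical, and charging exactly the 2000-second timeout ignores the time spent computing the fallback faster-greedy-dag result, which is a simplification.

```python
TIMEOUT_S = 2000.0  # timeout used in the run described above


def improvement_with_fallback(time_s, dag_cost, ref_cost, timeout_s=TIMEOUT_S):
    """Return (time_charged_s, % DAG-cost improvement over the reference).

    time_s and dag_cost are None if the extractor did not finish. If it
    did not finish, or ran past the timeout, the reference result is
    used instead, so the improvement is 0% and the full timeout is
    charged as time spent.
    """
    if time_s is None or time_s > timeout_s:
        return timeout_s, 0.0
    return time_s, 100.0 * (ref_cost - dag_cost) / ref_cost
```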
I've got some ideas about how to make it more informative, e.g.:
I'm keen to hear what others would find helpful.