Benchmark Results #70

Open
nmichlo opened this issue Nov 2, 2021 · 2 comments

nmichlo commented Nov 2, 2021

First of all, great repository!

Would it be possible to publish benchmark results as part of the repo, rather than requiring users to run the benchmarks themselves?

mbuzdalov (Owner) commented Nov 3, 2021

I have wanted to publish such results, somehow, probably since the beginning of this repo.

However, there are a few problems that make such a thing really difficult.

  1. They must stay up-to-date with the code in the repository, so the benchmarks would have to be run on each commit in a continuous-integration fashion and then published. For technical reasons, I believe such results should not live in the repository itself, but rather be stored and displayed on a companion website, for instance.
  2. Even the smallest usable benchmark set runs for a few days. The next smallest one requires two weeks. The only way to scale this is to use a lot of (groups of identical) dedicated computers, which I cannot currently afford (multiple cores of a single computer do not work well, and non-dedicated computers do not work either). I have spent considerable time implementing incremental benchmarking (e.g. algorithms that show no changes on a small dataset are not recomputed on larger ones; a rough sketch of the idea follows this list), but what has been developed remains highly immature (e.g. who likes a statistical significance threshold of 10⁻¹⁰?).
  3. Even if the previous issues are resolved, the data is still very hard to display, as it is multidimensional (problem type, number of points, dimension, and number of fronts for those problem types where it matters) even for a single algorithm, and there are many algorithms and many flavours of them. So this is a hell of a lot of numbers, tables and plots. You can see here (some parallel speedups) or here (very few plots of a few good algorithms), for instance, how much of a mess it is.
  4. The existing benchmark sets are just a tiny fraction of what should be there. For instance, everyone and their brother wants timings on e.g. the ZDT, DTLZ, WFG, DTLZ-1 benchmarks, and so on. Currently, such things are not even there!
  5. Even assuming all of that is resolved... one cannot really be very confident that the relations between these numbers (of course not the numbers themselves) will hold on different hardware. Yes, most of these relations are quite stable. However, I am aware of cases where switching some things on and off may increase performance by 3x on one machine and decrease it by 1.2x on another (if you are, for some reason, interested in what such things are, check out the branchless-median branch).
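
A minimal sketch of the incremental idea from item 2, in Java. This is not the harness actually used in this repository; the class, interface and constants are invented for illustration, and a crude relative-difference check stands in for a proper statistical test with a strict threshold:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/**
 * Sketch of incremental benchmarking: re-measure an algorithm on the cheapest
 * problem size first, and if the result is indistinguishable from the stored
 * baseline, skip the (much more expensive) larger sizes and keep the baseline
 * numbers. All names here are hypothetical, not part of this repository.
 */
public final class IncrementalBenchmarkSketch {
    /** Problem sizes, cheapest first. */
    private static final int[] SIZES = {1_000, 10_000, 100_000};
    private static final int REPETITIONS = 11;
    /** Crude stand-in for a statistical significance test with a strict threshold. */
    private static final double RELATIVE_CHANGE_THRESHOLD = 0.05;

    /** Produces a single timed run (in nanoseconds) for a given problem size. */
    public interface TimedRunFactory {
        LongSupplier forSize(int size);
    }

    public static Map<Integer, Double> run(String algorithmName,
                                           Map<Integer, Double> baselineMedians,
                                           TimedRunFactory benchmark) {
        // Start from the baseline; entries are overwritten only for sizes we re-measure.
        Map<Integer, Double> result = new LinkedHashMap<>(baselineMedians);
        for (int size : SIZES) {
            double median = medianNanos(benchmark.forSize(size));
            result.put(size, median);
            Double baseline = baselineMedians.get(size);
            if (baseline != null
                    && Math.abs(median - baseline) / baseline < RELATIVE_CHANGE_THRESHOLD) {
                // No detectable change at this size: do not recompute the larger sizes.
                System.out.printf("%s: no change at n=%d, skipping larger sizes%n",
                        algorithmName, size);
                break;
            }
        }
        return result;
    }

    private static double medianNanos(LongSupplier singleRun) {
        long[] samples = new long[REPETITIONS];
        for (int i = 0; i < REPETITIONS; i++) {
            samples[i] = singleRun.getAsLong();
        }
        Arrays.sort(samples);
        return samples[REPETITIONS / 2];
    }
}
```

The early break is the whole point: the expensive sizes inherit the stored baseline numbers whenever the cheap sizes show no change, which is the part that would save most of the multi-day running time.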

So you may see that it is really difficult. Maybe when I finally get back here, for instance to prepare a journal paper on the multithreaded version of the Jensen-Fortin-myself algorithm, I will also try to sort out some of these issues.

mbuzdalov self-assigned this Nov 3, 2021

nmichlo (Author) commented Nov 3, 2021

That makes sense.

  • Maybe they don't have to be full-blown benchmarks, but partial ones, just to get an idea of the behaviour and runtimes of the different algorithms?
  • You could also version the results and give the date of the run if you do decide to post them without continuous integration.

EDIT: I did not see that some of the results have already been posted under the Releases section.
