Produce reference results to verify correctness #12
Comments
I am continuing to work with this benchmark, and I keep finding bugs in the implementations listed so far. It would be extremely helpful to know what the correct results are. I can contribute to this effort, but before I work anything out in full detail, it would be great to have other people's thoughts...
@stwunsch, @arizzi, @sbinet, @mat-adamec, @pieterdavid: You have all written at least parts of an implementation. Any interest in joining this effort/discussion?
I had a look at some of the other implementations when writing the bamboo version and also noticed some differences. I tried to implement the benchmarks to the best of my understanding, but bugs in both steps are obviously possible, so reference outputs would be useful to check. I don't have a strong preference for a format: ROOT or any format that is easy to write from Python should work.
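As a purely illustrative sketch (the file name, column layout, and binning below are placeholder choices, not anything agreed on here), a reference histogram could be dumped to a plain CSV file from Python with nothing more than numpy:

```python
# Illustrative only: write a histogram as a plain CSV file that any
# implementation (ROOT, coffea, groot, ...) could be compared against.
# The column names, binning and file path are placeholders.
import numpy as np

def write_reference_csv(counts, edges, path):
    """Write bin lower edges, upper edges and counts as CSV rows."""
    rows = np.column_stack([edges[:-1], edges[1:], counts])
    np.savetxt(path, rows, delimiter=",",
               header="low_edge,high_edge,count", comments="")

# Stand-in data; a real reference would of course be filled from the benchmark ntuples.
values = np.random.exponential(30.0, size=10_000)
counts, edges = np.histogram(values, bins=100, range=(0.0, 200.0))
write_reference_csv(counts, edges, "task1_reference.csv")
```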
I think providing an implementation with thoroughly verified results for comparison is certainly a good idea. Systematic validation of the histogram output is difficult because the benchmarks do not dictate the parameters of the plots, as you mentioned. I think this is intentional, since the point of the benchmarks is only to demonstrate how to make a particular selection; the details of histogramming are beyond the scope of the benchmark tasks, I believe. If there were a reference implementation with numerical results available, however, we could at least recommend using the same binning for easier validation.
for histos, one could either say
+1 for this :) @masonproffitt looked into and fixed a few problems in https://github.com/root-project/opendata-benchmarks some time ago; maybe, just to start somewhere, we could add the histograms produced by those implementations to this repo?
FYI, our six implementations (which you can find in the submodules used by this repo) have (matching!) reference results as normal CSV files. (I am not sure whether the results are for the full data set or just samples of it.)
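For what it's worth, comparing two such CSV dumps bin by bin is a few lines of numpy. The file names and column layout below are assumptions for illustration, not necessarily what those submodules actually write:

```python
# Assumed layout: a header row plus "low_edge,high_edge,count" columns,
# one CSV per task; this is an assumption, not the submodules' actual format.
import numpy as np

def histograms_match(path_a, path_b, atol=0.0):
    """Return True if the two CSV histogram dumps agree bin by bin."""
    a = np.loadtxt(path_a, delimiter=",", skiprows=1)
    b = np.loadtxt(path_b, delimiter=",", skiprows=1)
    return a.shape == b.shape and np.allclose(a, b, atol=atol)

# e.g. histograms_match("task1_reference.csv", "task1_mytool.csv")
```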
I think it would be a good idea to produce a reference result for each of the benchmarks, so that implementations can verify their correctness. I have already found several bugs across the implementations, which would probably have been caught if there were a reference result.
One problem that I see is that it is not clear how to compare results. First, the benchmark does not specify what "plot" means, i.e., how to configure the histograms. As far as I have seen, the different implementations largely use the same configuration, but, for example, the Go and the Coffea implementations use different configurations in Task 8. I think it would be a good idea to specify the histograms in the benchmark.
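To make this concrete, a histogram specification could be as simple as a small table of (variable, bins, range) per task. The entries below are placeholders to illustrate the idea, not the values the benchmark should adopt:

```python
import numpy as np

# Placeholder values, purely for illustration of the idea of a per-task spec.
HISTOGRAM_SPEC = {
    "task_1": {"variable": "MET_pt", "nbins": 100, "low": 0.0, "high": 200.0},
    "task_8": {"variable": "mT",     "nbins": 100, "low": 0.0, "high": 200.0},
}

def fill_from_spec(task, values):
    """Fill a fixed-binning histogram according to the per-task specification."""
    spec = HISTOGRAM_SPEC[task]
    return np.histogram(np.asarray(values), bins=spec["nbins"],
                        range=(spec["low"], spec["high"]))
```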
Second, different tools serialize their histograms differently. While Groot and Coffea give the lower edge of each bin, ROOT gives the bin centers. Also, the former two keep the underflow and overflow counts separately, while ROOT stores them as two extra bins. This has an easy solution: pick one convention, say bin centers plus two extra bins, and have the implementations convert as part of the comparison.
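A minimal sketch of that conversion, assuming uniform binning and an input that provides bin lower edges plus separate underflow/overflow counts (the Groot/Coffea side), into the "bin centers plus two extra bins" form suggested above:

```python
# Sketch of the proposed conversion.  Assumptions: uniform binning, and an
# input of bin lower edges plus separate underflow/overflow counts.
# Output: bin centers plus two extra bins, as suggested above.
import numpy as np

def to_reference_form(lower_edges, counts, underflow, overflow):
    lower_edges = np.asarray(lower_edges, dtype=float)
    counts = np.asarray(counts, dtype=float)
    width = lower_edges[1] - lower_edges[0]          # assumes uniform bins
    centers = lower_edges + 0.5 * width
    # Prepend/append the under- and overflow as two extra bins, placed one
    # bin width outside the axis range so the two arrays stay aligned.
    centers = np.concatenate(([centers[0] - width], centers, [centers[-1] + width]))
    counts = np.concatenate(([underflow], counts, [overflow]))
    return centers, counts

# Example: 4 bins from 0 to 4 with 2 underflow and 1 overflow entries.
centers, counts = to_reference_form([0, 1, 2, 3], [5, 7, 6, 4], 2, 1)
```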