Produce reference results to verify correctness #12

Open
ingomueller-net opened this issue Apr 17, 2020 · 7 comments
@ingomueller-net
Contributor

I think it would be a good idea to produce a reference result for each of the benchmarks so that implementations can verify their correctness. I have already found bugs in several implementations that a reference result would probably have exposed.

One problem that I see is that it is not clear how to compare results. First, the benchmark does not specify what "plot" means, i.e., how to configure the histograms. As far as I have seen, the different implementations largely use the same configuration, but, for example, the Go and the Coffea implementations use different configurations in Task 8. I think it would be a good idea to specify the histograms in the benchmark.

Second, different tools serialize their histograms differently. While Groot and Coffea give the lower bounds of each bin, ROOT gives the bin centers. Also, the former two have the underflows and overflows separately while ROOT has two extra bins. This has an easy solution: pick one, say bin centers plus two extra bins, and have the implementations convert as part of the comparison.
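A minimal sketch of the conversion described above, assuming a histogram that is serialized with lower bin edges and separate underflow/overflow counts. The `EdgeHistogram` class and its field names are hypothetical and do not correspond to the API of Groot, Coffea, or ROOT; the target layout is the "bin centers plus two extra bins" option.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class EdgeHistogram:
    """Hypothetical layout: lower bin edges, under/overflow kept separately."""
    lower_edges: List[float]  # lower bound of each regular bin
    upper_edge: float         # upper bound of the last regular bin
    counts: List[float]       # one count per regular bin
    underflow: float
    overflow: float


def to_centers_with_flow_bins(
    hist: EdgeHistogram,
) -> Tuple[List[Optional[float]], List[float]]:
    """Return (centers, counts) with under/overflow as first and last entries.

    The two flow bins have no meaningful center, so None is a placeholder.
    """
    if len(hist.lower_edges) != len(hist.counts):
        raise ValueError("need exactly one count per lower edge")
    edges = list(hist.lower_edges) + [hist.upper_edge]
    centers = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]
    all_centers = [None] + centers + [None]
    all_counts = [hist.underflow] + list(hist.counts) + [hist.overflow]
    return all_centers, all_counts


# Made-up example values, for illustration only.
example = EdgeHistogram(
    lower_edges=[0.0, 10.0, 20.0],
    upper_edge=30.0,
    counts=[1.0, 2.0, 3.0],
    underflow=4.0,
    overflow=5.0,
)
centers, counts = to_centers_with_flow_bins(example)
```

Once every implementation is converted to the same layout, results can be compared entry by entry regardless of the tool that produced them.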

@ingomueller-net
Contributor Author

I am continuing to work with this benchmark, and I keep finding bugs in the implementations listed so far. It would be extremely helpful for me to know what the correct results are. I can contribute to this effort, but before I work out something in full detail, it would be great to have other people's thoughts...

@ingomueller-net
Contributor Author

ingomueller-net commented May 14, 2020

@stwunsch, @arizzi, @sbinet, @mat-adamec, @pieterdavid: You have all written at least parts of an implementation. Any interest in joining this effort/discussion?

@pieterdavid
Contributor

I had a look at some of the other implementations when writing the bamboo version, and also noticed some differences. I tried to implement the benchmarks to my best understanding of them, but bugs in both steps are obviously possible, so reference outputs would be useful to check. I don't have a strong preference for a format; ROOT or any format that is easy to write from Python should work.

@masonproffitt
Member

I think providing an implementation with thoroughly verified results for comparison is certainly a good idea. Systematic validation of the histogram output is difficult because the benchmarks do not dictate the parameters of the plots, as you mentioned. I think this is intentional, since the point of the benchmarks is only to demonstrate how to make a particular selection; the details of histogramming are beyond the scope of the benchmark tasks, I believe. If there were a reference implementation with numerical results available, however, we could at least recommend using the same binning for easier validation.

@sbinet
Contributor

sbinet commented May 23, 2020

For histograms, one could either specify TH{1,2}x or use YODA histograms (saved in their ASCII format).
This would help with text-based comparison (and with funneling the results into a CI pipeline).

@eguiraud
Contributor

eguiraud commented Mar 3, 2022

+1 for this :)

@masonproffitt looked into and fixed a few problems in https://github.com/root-project/opendata-benchmarks some time ago. Maybe, just to start somewhere, we could add the histograms produced by those implementations to this repo?

@ingomueller-net
Contributor Author

FYI, our six implementations (which you can find in the submodules used by this repo) have (matching!) reference results as normal CSV files. (I am not sure whether the results are for the full data set or just samples of it.)
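As a rough illustration of how CSV reference results could be checked automatically, here is a sketch using only the Python standard library. The column names `bin` and `count` are assumptions made for this example; the layout of the actual CSV files is not described in this thread.

```python
import csv
import io
import math
from typing import Dict, List


def read_counts(csv_text: str) -> Dict[str, float]:
    """Parse CSV text with a header row into {bin label: count}.

    Assumed (hypothetical) columns: 'bin' and 'count'.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["bin"]: float(row["count"]) for row in reader}


def diff_histograms(
    reference: Dict[str, float],
    candidate: Dict[str, float],
    rel_tol: float = 1e-9,
) -> List[str]:
    """Return human-readable mismatches; an empty list means agreement."""
    problems = []
    for label in sorted(set(reference) | set(candidate)):
        if label not in candidate:
            problems.append(f"bin {label}: missing in candidate")
        elif label not in reference:
            problems.append(f"bin {label}: not in reference")
        elif not math.isclose(
            reference[label], candidate[label], rel_tol=rel_tol
        ):
            problems.append(
                f"bin {label}: expected {reference[label]}, "
                f"got {candidate[label]}"
            )
    return problems


# Made-up data, for illustration only.
reference_csv = "bin,count\n5.0,1\n15.0,2\n"
candidate_csv = "bin,count\n5.0,1\n15.0,3\n"
mismatches = diff_histograms(
    read_counts(reference_csv), read_counts(candidate_csv)
)
```

A CI job could fail whenever the returned list is non-empty, which fits the text-based comparison suggested earlier in the thread.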
