Produce reference results to verify correctness #12
Comments
I am continuing to work with this benchmark, and I keep finding bugs in the implementations listed so far. It would be extremely helpful to know what the correct results are. I can contribute to this effort, but before I work anything out in full detail, it would be great to have other people's thoughts...
@stwunsch, @arizzi, @sbinet, @mat-adamec, @pieterdavid: You have all written at least parts of an implementation. Any interest in joining this effort/discussion?
I had a look at some of the other implementations when writing the bamboo version and also noticed some differences. I tried to implement the benchmarks to the best of my understanding, but bugs in both steps are obviously possible, so reference outputs would be useful to check. I don't have a strong preference for a format: ROOT or any format that is easy to write from Python should work.
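As a purely illustrative sketch (the file name, column layout, and binning below are placeholder choices, not anything agreed on here), a reference histogram could be dumped to a plain CSV file from Python with nothing more than numpy:

```python
# Illustrative only: write a histogram as a plain CSV file that any
# implementation (ROOT, coffea, groot, ...) could be compared against.
# The column names, binning and file path are placeholders.
import numpy as np

def write_reference_csv(counts, edges, path):
    """Write bin lower edges, upper edges and counts as CSV rows."""
    rows = np.column_stack([edges[:-1], edges[1:], counts])
    np.savetxt(path, rows, delimiter=",",
               header="low_edge,high_edge,count", comments="")

# Stand-in data; a real reference would of course be filled from the benchmark ntuples.
values = np.random.exponential(30.0, size=10_000)
counts, edges = np.histogram(values, bins=100, range=(0.0, 200.0))
write_reference_csv(counts, edges, "task1_reference.csv")
```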
I think providing an implementation with thoroughly verified results for comparison is certainly a good idea. Systematic validation of the histogram output is difficult because the benchmarks do not dictate the parameters of the plots, as you mentioned. I think this is intentional, since the point of the benchmarks is only to demonstrate how to make a particular selection; the details of histogramming are beyond the scope of the benchmark tasks, I believe. If there were a reference implementation with numerical results available, however, we could at least recommend using the same binning for easier validation.
for histos, one could either say
+1 for this :) @masonproffitt looked into and fixed a few problems in https://github.com/root-project/opendata-benchmarks some time ago; maybe, just to start somewhere, we could add the histograms produced by those implementations to this repo?
FYI, our six implementations (which you can find in the submodules used by this repo) have (matching!) reference results as normal CSV files. (I am not sure whether the results are for the full data set or just samples of it.)
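For what it's worth, comparing two such CSV dumps bin by bin is a few lines of numpy. The file names and column layout below are assumptions for illustration, not necessarily what those submodules actually write:

```python
# Assumed layout: a header row plus "low_edge,high_edge,count" columns,
# one CSV per task; this is an assumption, not the submodules' actual format.
import numpy as np

def histograms_match(path_a, path_b, atol=0.0):
    """Return True if the two CSV histogram dumps agree bin by bin."""
    a = np.loadtxt(path_a, delimiter=",", skiprows=1)
    b = np.loadtxt(path_b, delimiter=",", skiprows=1)
    return a.shape == b.shape and np.allclose(a, b, atol=atol)

# e.g. histograms_match("task1_reference.csv", "task1_mytool.csv")
```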
I think it would be a good idea to produce a reference result for each of the benchmarks, so that implementations can verify their correctness. I have already found several bugs across the implementations, which would probably have been caught if there were a reference result.
One problem that I see is that it is not clear how to compare results. First, the benchmark does not specify what "plot" means, i.e., how to configure the histograms. As far as I have seen, the different implementations largely use the same configuration, but, for example, the Go and the Coffea implementations use different configurations in Task 8. I think it would be a good idea to specify the histograms in the benchmark.
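To make this concrete, a histogram specification could be as simple as a small table of (variable, bins, range) per task. The entries below are placeholders to illustrate the idea, not the values the benchmark should adopt:

```python
import numpy as np

# Placeholder values, purely for illustration of the idea of a per-task spec.
HISTOGRAM_SPEC = {
    "task_1": {"variable": "MET_pt", "nbins": 100, "low": 0.0, "high": 200.0},
    "task_8": {"variable": "mT",     "nbins": 100, "low": 0.0, "high": 200.0},
}

def fill_from_spec(task, values):
    """Fill a fixed-binning histogram according to the per-task specification."""
    spec = HISTOGRAM_SPEC[task]
    return np.histogram(np.asarray(values), bins=spec["nbins"],
                        range=(spec["low"], spec["high"]))
```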
Second, different tools serialize their histograms differently. While Groot and Coffea give the lower edge of each bin, ROOT gives the bin centers. Also, the former two keep the underflow and overflow counts separately, while ROOT stores them as two extra bins. This has an easy solution: pick one convention, say bin centers plus two extra bins, and have the implementations convert as part of the comparison.
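A minimal sketch of that conversion, assuming uniform binning and an input that provides bin lower edges plus separate underflow/overflow counts (the Groot/Coffea side), into the "bin centers plus two extra bins" form suggested above:

```python
# Sketch of the proposed conversion.  Assumptions: uniform binning, and an
# input of bin lower edges plus separate underflow/overflow counts.
# Output: bin centers plus two extra bins, as suggested above.
import numpy as np

def to_reference_form(lower_edges, counts, underflow, overflow):
    lower_edges = np.asarray(lower_edges, dtype=float)
    counts = np.asarray(counts, dtype=float)
    width = lower_edges[1] - lower_edges[0]          # assumes uniform bins
    centers = lower_edges + 0.5 * width
    # Prepend/append the under- and overflow as two extra bins, placed one
    # bin width outside the axis range so the two arrays stay aligned.
    centers = np.concatenate(([centers[0] - width], centers, [centers[-1] + width]))
    counts = np.concatenate(([underflow], counts, [overflow]))
    return centers, counts

# Example: 4 bins from 0 to 4 with 2 underflow and 1 overflow entries.
centers, counts = to_reference_form([0, 1, 2, 3], [5, 7, 6, 4], 2, 1)
```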