What to put inside the reproducibility report? #4
This is a thread for defining what kind of information might be useful to include in the reproducibility report (independently of success or failure). @annakrystalli might have some more ideas since she organized reproducibility hackathons.
Feel free to add comments/ideas/suggestions!

Comments
There are a couple of resources that could be useful here. Will try and work on this soon!
Operating system and hardware should ideally be given both for the original computation and for the reproduction. License is nice to know but not essential for reproduction by the original author. We should also recommend that authors make their code available online (if not done already), add a license (if missing), and provide usage instructions. Then submit everything to Software Heritage.
From @annakrystalli's resources, I think we could ask authors who managed to reproduce their results to make a dedicated compendium (a GitHub repository) and to save it at Software Heritage.
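To make that last step concrete, here is a minimal sketch of scripting the archival request, assuming Software Heritage's public "Save Code Now" API; the repository URL is a placeholder:

```python
# Sketch only: ask Software Heritage to archive a repository via its
# public "Save Code Now" endpoint (treat the endpoint shape as an
# assumption). The repository URL below is a placeholder.
import requests

repo_url = "https://github.com/user/reproduction-compendium"  # placeholder

resp = requests.post(
    f"https://archive.softwareheritage.org/api/1/origin/save/git/url/{repo_url}/"
)
resp.raise_for_status()
print(resp.json().get("save_request_status"))  # e.g. "accepted"
```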
Copying over some items from ReScience/template#6:
@rougier writes:
@khinsen writes:
One point from my own experience as a participant in the challenge: If reproduction requires any changes at all to the code or the installation instructions, discuss which knowledge or competence someone else would need to be able to do it. In my case (https://github.com/khinsen/rescience-ten-year-challenge-paper-3), I fixed a software collapse issue by changing a single line of my code, but I could do that quickly only because I had written or contributed to the entire software stack that my code depends on. For anyone else, figuring out that a dependency of a dependency of my code was broken by a change one more layer below would probably be a prohibitive effort.
I have written a first draft of the author guidelines. Comments (and pull requests!) welcome!
Looks good to me. Maybe we need to emphasize the description of the language that has been used, since we may want to compute some (simple) statistics across all the entries.
Good point. Anything else we might want to include in the statistics? It would be interesting in theory to include all dependencies, not just the language. It is unlikely that we will have enough submissions to do a meaningful analysis on anything but the most frequently listed dependencies, but I'd expect a few dependencies (e.g. NumPy) to be as frequent as languages.
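For concreteness, a minimal sketch of the kind of tally this would enable; the submission records below are invented for illustration:

```python
# Sketch: tally languages and dependencies across submission metadata.
# The records are made up, not real challenge submissions.
from collections import Counter

submissions = [
    {"language": "Python", "dependencies": ["NumPy", "SciPy"]},
    {"language": "Fortran", "dependencies": ["IQPACK"]},
    {"language": "Python", "dependencies": ["NumPy"]},
]

languages = Counter(s["language"] for s in submissions)
dependencies = Counter(d for s in submissions for d in s["dependencies"])

print(languages.most_common())     # [('Python', 2), ('Fortran', 1)]
print(dependencies.most_common())  # NumPy as frequent as a whole language
```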
Could we reasonably ask authors to provide a machine-readable list of dependencies for our analysis? Take ReScience/submissions#11 (comment) as an example: I think the author provided a nice and detailed explanation of his choice of technologies for a human reader, but it would be hard to extract [Fortran, IQPACK] as a dependency list from it. Alternatively, we could ask reviewers to compile such a list and have authors verify it. Or we could scan for dependencies at the end, when doing our statistics, which is doable if the number of submissions doesn't explode in the coming months.
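One possible shape for such a machine-readable list, sketched in JSON with hypothetical field names (not an agreed ReScience format):

```python
# Sketch: a dependency declaration that authors could ship alongside their
# article. Field names are hypothetical; JSON keeps it parseable with the
# standard library alone.
import json

metadata = json.loads("""
{
  "language": "Fortran",
  "dependencies": ["IQPACK"]
}
""")

print(metadata["dependencies"])  # -> ['IQPACK']
```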
Would you go so far as to suggest creating a Binder?
No. We won't have any notebooks due to the ten-year rule, so moving towards Binder would require authors to rewrite their code, which is not the goal of the exercise. What we could suggest is packaging a suitable computational environment as a container (a reproducible Dockerfile, for example) or with Nix or Guix. But I wouldn't want to make this a condition; we'd lose too many people.
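As a sketch of the container option, shown here as a script that emits a Dockerfile; the base image tag, package, and command are illustrative placeholders, not recommendations:

```python
# Sketch: write out a pinned Dockerfile describing the computational
# environment. Base image tag, package, and command are placeholders.
dockerfile = """\
FROM debian:10.13
RUN apt-get update \\
 && apt-get install -y --no-install-recommends gfortran \\
 && rm -rf /var/lib/apt/lists/*
COPY . /work
WORKDIR /work
CMD ["make", "reproduce"]
"""

with open("Dockerfile", "w", encoding="utf-8") as fh:
    fh.write(dockerfile)
```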
We can propose alternatives here: