
What to put inside the reproducibility report? #4

Open
rougier opened this issue Nov 4, 2019 · 12 comments


rougier commented Nov 4, 2019

This is a thread for defining what kind of information might usefully appear in the reproducibility report (independently of success or failure). @annakrystalli might have some more ideas, since she has organized reproducibility hackathons:

  • Location of the original source code (online, physical medium, what kind, etc.)
  • Presence of a license for the code
  • Presence of a README
  • Programming language
  • Operating system (if relevant)
  • Specific hardware (if relevant)
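A sketch of how these items could be captured in a machine-readable report header (all field names and values below are hypothetical, just to illustrate the idea):

```yaml
# Hypothetical report metadata; field names are illustrative only.
source_code:
  location: online            # online, CD-ROM, tape, etc.
  archived_at: software-heritage
license: GPL-3.0              # or "none"
readme: true
language: Fortran
operating_system: "SunOS 5.8 (original) / Ubuntu 18.04 (reproduction)"
hardware: none-specific       # only if relevant
```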

Feel free to add comments/ideas/suggestions

@annakrystalli

There are a couple of resources that could be useful here:

Will try and work on this soon!


khinsen commented Nov 4, 2019

Operating system and hardware should ideally be given both for the original computation and for the reproduction.

License is nice to know but not essential for reproduction by the original author.

We should also recommend that authors make their code available online (if not already done), add a license (if missing), and provide usage instructions. Then submit everything to Software Heritage.


rougier commented Nov 4, 2019

From @annakrystalli's resources, I think we could ask authors who managed to reproduce their results to make a dedicated compendium (a GitHub repository) and to save it at Software Heritage
(and we'll test it again in ten years :)). The reviewer template will also be super useful for the review.


khinsen commented Nov 19, 2019

Copying over some items from ReScience/template#6:

@rougier writes:
Among the things that might be interesting:

  • How did you preserve the sources?
  • Did you take care to record the RNG seed (if you used one)?
  • Did you save the command-line options (if any were needed)?
  • Did you need to adapt your sources?
  • Did you need to adapt your libraries?
  • What guided your choice of Fortran over other languages at that time?
  • etc.

@khinsen writes:
I'd like to emphasize the utility of communicating the choices (and the motivations behind them) made at the time of publication, even if they risk being distorted by hindsight. That's something we can only get out of authors doing reproductions of their own work. For example, I realized that I never preserved or published code for reproducibility, but only to make it available for reuse by others. As a consequence, I am always missing the last small steps: command-line arguments, that five-line script that ties computations together, etc.


khinsen commented Dec 3, 2019

One point from my own experience as a participant in the challenge: If reproduction requires any changes at all to the code or the installation instructions, discuss which knowledge or competence someone else would need to be able to do it. In my case (https://github.com/khinsen/rescience-ten-year-challenge-paper-3), I fixed a software collapse issue by changing a single line of my code, but I could do that quickly only because I had written or contributed to the entire software stack that my code depends on. For anyone else, figuring out that a dependency of a dependency of my code was broken by a change one more layer below would probably be a prohibitive effort.


khinsen commented Dec 11, 2019

I have written a first draft of the author guidelines. Comments (and pull requests!) welcome!


rougier commented Dec 12, 2019

Looks good to me. Maybe we need to emphasize the description of the language that was used, since we may want to compute some (simple) statistics across all the entries.


khinsen commented Dec 12, 2019

Good point. Anything else we might want to include in the statistics? In theory it would be interesting to include all dependencies, not just the language. It is unlikely that we will have enough submissions to do a meaningful analysis of anything but the most frequently listed dependencies, but I'd expect a few dependencies (e.g. NumPy) to be as frequent as languages.


khinsen commented Dec 13, 2019

Could we reasonably ask authors to provide a machine-readable list of dependencies for our analysis? Take ReScience/submissions#11 (comment) as an example: I think the author provided a nice and detailed explanation of his choice of technologies for a human reader, but it would be hard to extract [Fortran, IQPACK] as a dependency list from it.

Alternatively, we could ask reviewers to compile such a list and have authors verify it. Or scan for dependencies at the end, when doing our statistics, which is doable if the number of submissions doesn't explode in the coming months.
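If authors did provide such lists, the statistics step would be straightforward. A minimal sketch (the report format and all example data are invented for illustration):

```python
from collections import Counter

def tally_dependencies(reports):
    """Count how often each language or dependency appears across reports.

    `reports` is a list of per-submission dependency lists,
    e.g. [["Fortran", "IQPACK"], ["Python", "NumPy"]].
    """
    counts = Counter()
    for deps in reports:
        counts.update(set(deps))  # count each item once per submission
    return counts

# Invented example data, loosely mimicking submissions like [Fortran, IQPACK].
reports = [["Fortran", "IQPACK"], ["Python", "NumPy"], ["Python", "NumPy", "SciPy"]]
print(dict(tally_dependencies(reports)))
```

Even this trivial tally only works if the lists use consistent names ("NumPy" vs "numpy"), which is another argument for having reviewers verify them.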


nuest commented Dec 13, 2019

Would you go so far as to suggest creating a Binder?


khinsen commented Dec 13, 2019

No. We won't have any notebooks, due to the ten-year rule, so moving to Binder would require authors to rewrite their code, which is not the goal of the exercise.

What we could suggest is packaging a suitable computational environment as a container (a reproducible Dockerfile, for example) or using Nix or Guix. But I wouldn't want to make this a condition; we'd lose too many people.
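Such a Dockerfile could be very short. A sketch, in which the base image, packages, file names, and entry point are all placeholders:

```dockerfile
# Hypothetical recipe: pin an old, known-good environment
# rather than rewrite the legacy code for modern tooling.
FROM debian:8
RUN apt-get update && apt-get install -y gfortran make
COPY . /code
WORKDIR /code
RUN make
CMD ["./run_simulation"]
```

The point is reproducibility of the *environment*: the base image and package versions are pinned, so the build should give the same stack years later (as long as the image and packages remain archived).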


pdebuyl commented Dec 17, 2019

We can propose alternatives here:

  • a notebook, on Binder or not
  • a Makefile
  • a bash script
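For the Makefile option, something as small as this would already capture the "five-line script that ties computations together" mentioned above (all targets and file names here are hypothetical):

```makefile
# Hypothetical top-level Makefile recording how results and figures are built.
all: figure1.pdf

results.dat: simulate.f90
	gfortran -o simulate simulate.f90
	./simulate --seed 42 > results.dat

figure1.pdf: results.dat plot.py
	python plot.py results.dat figure1.pdf
```

It doubles as documentation: the exact compiler invocation, RNG seed, and command-line options are preserved in one place.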


5 participants