[Discussion] reproducibility #170

Open
@minrk

Discussion issue for general topics of reproducibility and what's in and out of scope for repo2docker (and Binder).

We currently have a tension between our scientific goal of reproducibility and the maintenance goal of keeping everything up to date. We have the same issue as everyone who pursues reproducibility: specifying the environment as strictly as necessary (so it's correct), but no stricter (so it stays useful). The conservative approach is to over-specify the environment (e.g. pip freeze / conda env export), which we should make sure to support well and document for the more reproducibility-minded users.

A user who wants to ensure a truly reproducible build must:

  • use a pip freeze or conda env export-produced environment specification
  • pin the Python version (needed for pip; a conda env export already records it)
  • pin the distro/base image
  • probably pin repo2docker itself (easy for manual use cases, not available on Binder)
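
To illustrate the first item, here is a minimal sketch of a check that a pip requirements file is pinned as strictly as pip freeze output would be. The helper name `unpinned_requirements` and its parsing rule are assumptions made for this example, not part of repo2docker or pip. It only accepts plain `name==version` lines, so anything else (ranges, wildcards, environment markers, editable or URL installs) is reported as not strictly pinned.

```python
import re

# Hypothetical helper for illustration; not part of repo2docker or pip.
# Accepts only "name==version" (optionally with extras), with no wildcards.
_PINNED = re.compile(
    r"[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[A-Za-z0-9._+!-]+"
)


def unpinned_requirements(text):
    """Return the requirement lines that are not pinned to one exact version."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # skip blank and comment-only lines
        if not _PINNED.fullmatch(line):
            loose.append(line)
    return loose


example = """\
numpy==1.13.3
pandas>=0.20
requests
matplotlib==2.1.0  # pinned
"""
print(unpinned_requirements(example))  # ['pandas>=0.20', 'requests']
```

An empty result means every listed package is pinned, but it says nothing about the Python version, the base image, or repo2docker itself, which are the remaining items in the list above.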

Right now, the only truly reproducible builds available on Binder are custom Dockerfiles, which I want fewer people to use, not more. But we currently have no answer for reproducibility with any other builder, as there is no way for users to be sufficiently strict about the environment.
