
More thorough software tracking #17

@jl-wynen

Context

Currently, we can use reduction.software to specify a single program and version that was used to produce the data. This suffices when the program is fully self-contained, but that is not always the case. For example, reduction software may be published as a Python package that depends on a number of other packages.

For reproducibility, we need to track (some) dependencies as well as the 'main' package. In the Python example, the most thorough solution would be to store the output of pip freeze or conda list. However, most of the listed packages are not relevant for reproducing data (barring possible bugs in them). What matters most is tracking the pieces of software that provide algorithms which may change in the future and thereby affect the result.

In our concrete case at ESS, we have ESSreflectometry as the highest-level package. It uses algorithms from ESSreduce and ScippNeutron. All three of these packages need to be listed with their versions if we hope to reproduce reduced data in the future.
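As a rough illustration of collecting these versions at reduction time, the sketch below queries the installed version of each package in a hand-picked list using the standard library's importlib.metadata. The package list (assumed to match the PyPI distribution names), the helper name collect_software, and the {'name', 'version'} entry layout are assumptions for illustration, not an existing API.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical selection of packages whose algorithms affect the result.
# Distribution names are assumed to match the PyPI names.
RELEVANT_PACKAGES = ["essreflectometry", "essreduce", "scippneutron"]


def collect_software(packages):
    """Return one {'name', 'version'} entry per package, in the given order.

    Packages that are not installed are reported with version None
    rather than being silently dropped, so gaps stay visible.
    """
    entries = []
    for name in packages:
        try:
            ver = version(name)
        except PackageNotFoundError:
            ver = None
        entries.append({"name": name, "version": ver})
    return entries
```

Calling collect_software(RELEVANT_PACKAGES) in the reduction environment would yield one entry per package, highest-level package first, ready to be written into the header.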

For full provenance tracking, we need more than what can reasonably be encoded in YAML: for example, a full list of packages (as produced by pip freeze) and a description of the concrete workflow that goes beyond a short list of corrections. The latter would likely take the form of a graph. This information can be saved in separate files alongside an .ort file.
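A minimal sketch of such a sidecar file, assuming a Python environment: it builds a pip-freeze-like package list from importlib.metadata and writes it next to the .ort file. The '<stem>.packages.txt' naming and the helper names are hypothetical; a real implementation might instead capture the output of pip freeze or conda list directly.

```python
from importlib.metadata import distributions
from pathlib import Path


def freeze_environment():
    """Return sorted 'name==version' lines for all installed distributions.

    Similar in spirit to `pip freeze`, but built from importlib.metadata,
    so it does not need pip and does not special-case editable/VCS installs.
    """
    lines = set()  # a set removes duplicates found on multiple sys.path entries
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


def sidecar_path(ort_path):
    """Hypothetical convention: data/run.ort -> data/run.packages.txt."""
    ort_path = Path(ort_path)
    return ort_path.with_name(ort_path.stem + ".packages.txt")


def write_sidecar(ort_path):
    """Write the full package list next to the given .ort file."""
    path = sidecar_path(ort_path)
    path.write_text("\n".join(freeze_environment()) + "\n", encoding="utf-8")
    return path
```

Keeping this list out of the .ort header keeps the YAML short while still preserving the complete environment for anyone who needs it.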

Proposed solution

Allow reduction.software to be an array. This way we can track all pieces of software we deem relevant.
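To sketch what the proposed change could mean for code that reads the header, the hypothetical helper below accepts both the current single-mapping form of reduction.software and the proposed array form, and always returns a list. The name/version keys mirror the single-program case described above; the exact field set and the highest-level-first ordering are assumptions, and X.Y.Z stands in for real version numbers.

```python
from typing import Any, Dict, List, Union

SoftwareEntry = Dict[str, Any]


def normalize_software(
    value: Union[SoftwareEntry, List[SoftwareEntry]],
) -> List[SoftwareEntry]:
    """Accept either the current single-mapping form or the proposed list form."""
    if isinstance(value, dict):
        return [value]  # current form: wrap the single program in a list
    if isinstance(value, list) and all(isinstance(v, dict) for v in value):
        return list(value)  # proposed form: already a list of programs
    raise TypeError("reduction.software must be a mapping or a list of mappings")


# Current form: a single program.
single = {"name": "ESSreflectometry", "version": "X.Y.Z"}

# Proposed form: every package deemed relevant, highest-level first.
multiple = [
    {"name": "ESSreflectometry", "version": "X.Y.Z"},
    {"name": "ESSreduce", "version": "X.Y.Z"},
    {"name": "ScippNeutron", "version": "X.Y.Z"},
]
```

Accepting both forms would keep existing files readable, while new files could list every relevant package.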
