A curated list of reproducible research case studies, projects, tutorials, and media
- Case studies
- Ad-hoc reproductions
- Theory papers
- Tool reviews
- Courses
- Development Resources
- User tools
- Books
- Databases
- Data Repositories
- Exemplar Portals
- Runnable Papers
- Journals
- Ontologies
- Minimal Standards
- Organizations
- Awesome Lists
The term "case studies" is used here in a general sense to describe any study of reproducibility. The approaches listed in the table are defined as follows:
- A reproduction is an attempt to arrive at comparable results with identical data using the computational methods described in a paper.
- A refactor restructures existing code to use frameworks and other reproducibility best practices while preserving the original data.
- A replication generates new data and applies existing methods to achieve comparable results.
- A robustness test applies various protocols, workflows, statistical models, or parameters to a given data set to study their effect on results, either as a follow-up to an existing study or as a "bake-off".
- A census is a high-level tabulation conducted by a third party.
- A survey is a questionnaire sent to practitioners.
- A case narrative is an in-depth first-person account.
- An independent discussion uses a second, independent author to interpret the results of a study as a means to improve inferential reproducibility.
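The distinction between reproduction, refactor, replication, and robustness test comes down to whether the data and the methods are held fixed. A minimal sketch of that decision rule follows; the `Attempt` class, its field names, and the `"unclassified"` fallback are illustrative assumptions, not part of any tool listed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """Hypothetical description of a follow-up to a published study."""
    same_data: bool                  # reuses the original data set
    same_methods: bool               # follows the methods described in the paper
    restructures_code: bool = False  # ports existing code into a framework


def classify(attempt: Attempt) -> str:
    """Map an attempt onto the approach names defined above."""
    if attempt.same_data and attempt.same_methods:
        # Same data and methods: a refactor if the code itself is reworked.
        return "refactor" if attempt.restructures_code else "reproduction"
    if attempt.same_methods:
        # New data, existing methods.
        return "replication"
    if attempt.same_data:
        # Same data, varied protocols, models, or parameters.
        return "robustness test"
    # New data and new methods fall outside the definitions above (assumption).
    return "unclassified"


print(classify(Attempt(same_data=True, same_methods=True)))
print(classify(Attempt(same_data=False, same_methods=True)))
print(classify(Attempt(same_data=True, same_methods=False)))
```

Census, survey, case narrative, and independent discussion describe how a study was conducted rather than what was held fixed, so they are left out of this sketch.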
| Study | Field | Approach | Size |
| --- | --- | --- | --- |
|  | Medicine | Census | 80 studies |
|  | Cancer biology | Refactor | 8 studies |
|  | Biostatistics | Census | 56 studies |
|  | Genetics | Reproduction | 18 studies |
|  | Software engineering | Replication | 4 companies |
|  | Signal processing | Census | 134 papers |
|  | Biomedical sciences | Survey | 23 PIs |
|  | Bioinformatics | Census | 100 studies |
|  | Cancer biology | Replication | 53 studies |
|  | Computer science | Census | 613 papers |
|  | Psychology | Replication | 100 studies |
|  | Biomedical sciences | Census | 100 papers |
|  | Epidemiology | Robustness test | 417 variables |
|  | Economics | Reproduction | 67 papers |
|  | Biomedical sciences | Census | 441 papers |
|  | Science | Survey | 1,576 researchers |
|  | NLP | Replication | 3 studies |
|  | Cancer biology | Replication | 9 studies |
|  | Biomedical sciences | Census | 318 journals |
|  | Science | Case narrative | 31 PIs |
|  | Biological sciences | Survey | 704 PIs |
|  | Bioinformatics | Refactor | 1 study |
|  | Economics | Replication | 18 studies |
|  | Machine learning | Census | 30 studies |
|  | Archaeology | Case narrative | 1 survey |
|  | Comparative toxicogenomics | Census | 51,292 claims in 3,363 papers |
|  | Artificial intelligence | Census | 400 papers |
|  | Economics | Census | 203 papers |
|  | Computational science | Reproduction | 204 papers, 180 authors |
|  | Genomics | Case narrative | 1 study |
|  | Social sciences | Replication | 21 papers |
|  | Psychology | Robustness test | One data set, 29 analyst teams |
|  | Medicine and health sciences | Census | 30 papers |
|  | Microbiome immuno oncology | Replication | 1 paper |
|  | Bioinformatics | Refactor and test of robustness | 1 paper |
|  | Biomedical Sciences | Census | 149 papers |
|  | Bioinformatics | Synthetic replication & refactor | 1 paper |
|  | Geosciences | Survey, Reproduction | 146 scientists, 41 papers |
|  | Reinforcement Learning | Reproduction, case narrative | 1 paper |
|  | Computational physics | Census | 306 papers |
|  | Science & Engineering | Survey | 215 participants |
|  | Nephrology | Robustness test | 1 paper |
|  | Social sciences & other | Census | 810 Dataverse studies |
|  | Social sciences & other | Census, Survey | 2109 replication datasets |
|  | GIScience/Geoinformatics | Census, Survey | 32 papers, 22 participants |
|  | Genomics | Robustness test | 8 studies |
|  | Geosciences | Survey | 360 papers |
|  | Deep learning | Robustness test | 1 analysis |
|  | Genomics | Case narrative | 1 analysis |
|  | Pharmacogenomics | Case narrative | 2 analyses |
|  | Biomedical sciences and Psychology | Census | 127 registered reports |
|  | All | Census | 1,159,166 Jupyter notebooks |
|  | Virology | Census | 236 papers |
|  | Anaesthesia | Independent discussion | 1 study |
|  | Psychology | Replication | 1 paper |
|  | Cell pharmacology | Robustness test | 5 labs |
|  | Machine learning | Reproduction | 18 conference papers |
|  | Experimental archaeology | Replication | 1 theory |
|  | Neurology | Census | 202 papers |
|  | Psychology | Replication | 2 experiments |
|  | Ecology and Evolution | Census | 163 papers |
|  | Neuroimaging | Robustness test | 1 data set, 70 teams |
|  | Psychology | Replication | 1 experiment, 21 labs, 2,220 participants |
|  | Psychology | Census | 62 papers |
|  | Oncology | Census | 154 meta-analyses |
|  | Bioinformatics | Robustness test | 1 data set |
|  | Neurobiology | Census | 41 papers |
|  | Genetics | Census | 1799 papers |
|  | Psychology | Reproduction | 33 meta-analyses |
|  | Biomedical science | Census | 792 papers |
|  | Ecology | Census | 346 papers |
|  | Physics | Replication | 2 papers |
|  | Reproductive endocrinology | Census | 222 papers |
|  | Biomedical sciences | Census | 240 papers |
|  | Environmental Modelling | Census | 7500 papers |
|  | Cardiology | Census | 532 papers |
|  | GIS | Census | 75 papers |
|  | Life sciences | Survey | 251 researchers |
|  | Genetics | Robustness test | 1 paper |
|  | Life sciences | Census | 3377 articles |
|  | Computational Biology | Census | 622 papers |
|  | Computational Biology | Robustness test | 6 studies |
|  | Computational Biology | Survey | 214 researchers |
|  | Differential expression | Census | 2109 GEO submissions |
These are one-off unpublished attempts to reproduce individual studies
| Reproduction | Original study |
| --- | --- |
| https://rdoodles.rbind.io/2019/06/reanalyzing-data-from-human-gut-microbiota-from-autism-spectrum-disorder-promote-behavioral-symptoms-in-mice/ and https://notstatschat.rbind.io/2019/06/16/analysing-the-mouse-autism-data/ | Sharon, G. et al. Human Gut Microbiota from Autism Spectrum Disorder Promote Behavioral Symptoms in Mice. Cell 2019, 177 (6), 1600–1618.e17. |
|  | Wei, X.; Nielsen, R. CCR5-∆32 Is Deleterious in the Homozygous State in Humans. Nat. Med. 2019 DOI: 10.1038/s41591-019-0459-6. (retracted) |
|  | Leiby et al "Lack of detection of a human placenta microbiome in samples from preterm and term deliveries" https://doi.org/10.1186/s40168-018-0575-4 |
| Authors/Date | Title | Field | Type |
| --- | --- | --- | --- |
|  | Why most published research findings are false | Science | Statistical reproducibility |
|  | A Quick Guide to Organizing Computational Biology Projects | Bioinformatics | Best practices |
|  | Ten Simple Rules for Reproducible Computational Research | Computational science | Best practices |
|  | The Economics of Reproducibility in Preclinical Research | Preclinical research | Best practices |
|  | The Generalizability Crisis | Psychology | Statistical reproducibility |
|  | Unreproducible Research is Reproducible | Machine Learning | Methodology |
|  | Trustworthy data underpin reproducible research | Physics | Scientific philosophy |
|  | Scientific discovery in a model-centric framework: Reproducibility, innovation, and epistemic diversity | Science | Statistical reproducibility |
|  | A Realistic Guide to Making Data Available Alongside Code to Improve Reproducibility | Science | Best practices |
|  | The importance of transparency and reproducibility in artificial intelligence research | Artificial Intelligence | Critique |
|  | What is replication? | Science | Scientific philosophy |
|  | A Beginner’s Guide to Conducting Reproducible Research | Ecology | Best practices |
|  | Realistic and Robust Reproducible Research for Biostatistics | Biostatistics | Best practices |
|  | A Link is not Enough – Reproducibility of Data | Databases | Best practices |
|  | COVID-19 pandemic reveals the peril of ignoring metadata standards | Virology | Critique |
|  | Principles for data analysis workflows | Data science | Best practices |
|  | Reproducible Research: A Retrospective | Public health | Review |
|  | Streamlining Data-Intensive Biology With Workflow Systems | Biology | Best practices |
|  | Meta Research: Questionable research practices may have little effect on replicability | Science | Statistical reproducibility |
|  | We need to keep a reproducible trace of facts, predictions, and hypotheses from gene to function in the era of big data | Functional genomics | Critique |
|  | A research parasite's perspective on establishing a baseline to avoid errors in secondary analyses | Science | Best practices |
|  | The multiplicity of analysis strategies jeopardizes replicability: lessons learned across disciplines | Science | Critique |
| Authors/Date | Title | Tools |
| --- | --- | --- |
|  | Out-of-the-box Reproducibility: A Survey of Machine Learning Platforms | MLflow, Polyaxon, StudioML, Kubeflow, CometML, Sagemaker, GCPML, AzureML, Floydhub, BEAT, Codalab, Kaggle |
|  | A Survey on Collecting, Managing, and Analyzing Provenance from Scripts | Astro-Wise, CPL, CXXR, Datatrack, ES3, ESSW, IncPy, Lancet, Magni, noWorkflow, Provenance Curios, pypet, RDataTracker, Sacred, SisGExp, SPADE, StarFlow, Sumatra, Variolite, VCR, versuchung, WISE, YesWorkflow |
|  | The Role of Metadata in Reproducible Computational Research | CellML, CIF2, DATS, DICOM, EML, FAANG, GBIF, GO, ISO/TC 276, MIAME, NetCDF, OGC, ThermoML, CRAN, Conda, pip setup.cfg, EDAM, CodeMeta, Biotoolsxsd, DOAP, ontosoft, SWO, OBCS, STATO, SDMX, DDI, MEX, MLSchema, MLFlow, Rmd, CWL, CWLProv, RO-Crate, RO, WICUS, OPM, PROV-O, ReproZip, ProvOne, WES, BagIt, BCO, ERC, BEL, DC, JATS, ONIX, MeSH, LCSH, MP, Open PHACTS, SWAN, SPAR, PWO, PAV, Manubot, ReScience, PandocScholar |
|  | Publishing computational research - a review of infrastructures for reproducible and transparent scholarly communication | Authorea, Binder, CodeOcean, eLife RDS, Galaxy Project, Gigantum, Manuscript, o2r, REANA, ReproZip, Whole Tale |
- MOOCs
  - Coursera Reproducible Research - Roger Peng et al JHU. Very popular course.
  - edX Principles, Statistical and Computational Tools for Reproducible Science - John Quackenbush et al Harvard
  - Reproducible research: methodological principles for transparent science - Beginner level. Note taking, version control, notebooks, reproducible data analysis. Bilingual English/French.
- Online course content
  - Tools for Reproducible Research - Karl Broman UW, includes resources page
  - R for Reproducible Scientific Analysis - Software Carpentry workshop primer using Gapminder data
  - R-DAVIS - Student-developed computer literacy and data course in R
  - AMIA2019 - Pragmatic RR for Analysis, Dissemination and Publication
  - PSU-PSY525 - Transparent, Open, and Reproducible Research Practices in the Social and Behavioral Sciences
  - Monash-RRR - Reproducible Research in R workshop tutorial
  - OSU-OSRR - An open science and reproducible research course targeted at organismal ecologists
- R
  - CRAN Task View - Reproducible Research - packages relevant to RCR in R
  - liftr - persistent reproducible reporting through containerized R Markdown documents
  - repo - provenance framework package
  - orderly - R package that automates writing reproducible analyses
- Python
  - mlf-core - Framework to develop GPU deterministic machine learning models with PyTorch, TensorFlow and XGBoost
- Open With Binder for Chrome or Firefox - open the GitHub repository you are visiting using MyBinder.org
- DVC - DVC tracks machine learning models and data sets
- SciScore - SciScore evaluates methods sections against a variety of rigor criteria, analyzes sentences that contain research resources (antibodies, cell lines, plasmids and software tools), and determines how uniquely identifiable each resource is based on the provided metadata.
- Ripeta - Ripeta quickly scans research manuscripts or articles to identify and record key reproducibility variables, such as data availability, code acknowledgements, and research analysis methods.
- Reproducible Research with R and R Studio 2013
- Implementing Reproducible Research 2014 - Describes projects: Sumatra, Vistrails, CDE, SOLE, JUMBO, CML, knitr. Content available on OSF.
- The Practice of Reproducible Research 2017 - 31 first person case narratives and intro chapters
- Dynamic Documents with R and knitr 2015
- The Turing Way: A Handbook for Reproducible Data Science 2020
- Reproducibility and Replicability in Science
- Reproducibility: Principles, Problems, Practices, and Prospects
- ReplicationWiki - Database for empirical studies with information about methods, data and software used, availability of replication material and whether replications, corrections or retractions are known. Mostly focused on social sciences.
- ReproCrawl
All these repositories assign Digital Object Identifiers (DOIs) to data
- DataCite - 12M+ DOIs registered for 46 allocators. Offers APIs and a metadata schema.
- Data Dryad - curated, metadata-centric, focused on data associated with published articles, $120 submission fee (various waivers available)
- Figshare - 20 GB of free private space, unlimited public space, >2M articles, >5k projects
- OSF - Project-oriented system with access control and integration with popular tools. Unlimited storage for projects, but individual files are limited to 5 gigabytes (GB) each.
- Zenodo - Allows embargoed, restricted access, metadata support. 50GB limit.
Places to find papers with code or portals to host them
- Jupyter Gallery - Gallery of interesting Jupyter notebooks
- Papers With Code - ML papers with code
- NARPS - Code related to Neuroimaging Analysis Replication and Prediction Study
- Codeocean - A gallery of cloud-based containers with reproducible analyses
Experimental papers that have associated notebooks
Blumberg et al 2021. Characterizing RNA stability genome-wide through combined analysis of PRO-seq and RNA-seq data | https://codeocean.com/capsule/7351682 |
- ReScience - Journal dedicated to in silico reproductions and tests of robustness, lives on GitHub.
- eLife - Executable Research Articles (ERA) inline executable blocks
- FAIRsharing - standards, databases, and policies
- BioPortal - 660 biomedical ontologies
- STORMS - Strengthening The Organization and Reporting of Microbiome Studies (STORMS) is a checklist for reporting on human microbiome studies. Preprint
- ResearchObject.org - RO specifications and publications
- BioCompute - BCO specs
- rOpenSci - Tools, conferences, and education
- Open Science Framework - Open source project management
- pyOpenSci - Promotes open and reproducible research through peer-review of scientific Python packages
- Replication Network - Furthering the practice of replication in economics. Econ replication database.
- repliCATS project - Estimating the replicability of research in the social sciences. Paper
- ReproHack - 1-day reproducibility hackathons held worldwide
- CODECHECK - community for checking executability of scientific preprints and papers
- CASCaD - Certification Agency for Scientific Code and Data. Issues reproducibility certificates.
- Awesome Pipeline - So many pipeline frameworks
- Awesome Docker - Everything related to the Docker containerization system
- Awesome R - Section on RR tools
- Awesome Reproducible R - RRR tools
- Awesome Jupyter - Jupyter projects, libraries and resources
- Awesome Bioinformatics Benchmarks - Benchmarks are a related aspect of robustness testing
- Awesome Open Science - Resources, data, tools, and scholarship
- Awesome Public Datasets - A topic-centric list of HQ open datasets
- Awesome Semantic Web - Semantic web and linked data resources.
Contributions welcome! Read the contribution guidelines first. You may find my src/doi2md.py
script useful for quickly generating entries from a DOI.
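The script itself is not reproduced here. As a rough sketch of the kind of formatting such a helper might perform, the function below turns already-fetched metadata into a four-column table row. The function names, the argument list, and the output layout are all assumptions for illustration and may differ from what `src/doi2md.py` actually does; the DOI and names in the usage line are placeholders, not a real reference.

```python
def short_authors(family_names):
    """Collapse a list of family names into a short citation label."""
    if not family_names:
        return "Anonymous"
    if len(family_names) == 1:
        return family_names[0]
    if len(family_names) == 2:
        return f"{family_names[0]} & {family_names[1]}"
    return f"{family_names[0]} et al."


def format_entry(doi, family_names, year, title, field, kind):
    """Build one Markdown table row: linked author/date, title, field, type."""
    link = f"[{short_authors(family_names)} {year}](https://doi.org/{doi})"
    # Escape pipes so a title containing "|" cannot break the table row.
    cells = [cell.replace("|", "\\|") for cell in (link, title, field, kind)]
    return "| " + " | ".join(cells) + " |"


# Placeholder values only; substitute metadata looked up for a real DOI.
print(format_entry("10.1234/example", ["Doe", "Roe", "Poe"], 2020,
                   "An example title", "Science", "Best practices"))
```

Keeping the metadata lookup separate from the formatting step means the row layout can be checked without network access.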
To the extent possible under law, Jeremy Leipzig has waived all copyright and related or neighboring rights to this work.