diff --git a/content/03.results.md b/content/03.results.md
index 31dcd7ac..391f4e6b 100644
--- a/content/03.results.md
+++ b/content/03.results.md
@@ -6,20 +6,15 @@ We previously performed whole genome sequencing (WGS), whole exome sequencing (W
 **Figure {@fig:Fig1}B** summarizes biospecimen numbers by phase of therapy and histology.
 We harnessed, and built upon, the benchmarking efforts of the [Gabriella Miller Kids First Data Resource Center](https://kidsfirstdrc.org/) to develop robust and reproducible data analysis workflows within the [CAVATICA platform](https://www.cavatica.org/) to perform comprehensive somatic analyses (**Figure {@fig:S1}**) and **STAR Methods**) of the PBTA.
 
-A key innovative feature of OpenPBTA is its open contribution framework used for both analyses (e.g., analytical code) and manuscript writing.
+A key innovative feature of OpenPBTA is the contribution framework used for analyses (e.g., analytical code) and manuscript writing.
 We created a public Github analysis repository ([https://github.com/AlexsLemonade/OpenPBTA-analysis](https://github.com/AlexsLemonade/OpenPBTA-analysis)) to hold all analysis code downstream of Kids First workflows and a GitHub manuscript repository ([https://github.com/AlexsLemonade/OpenPBTA-manuscript](https://github.com/AlexsLemonade/OpenPBTA-manuscript)) with Manubot [@doi:10.1371/journal.pcbi.1007128] integration to enable real-time manuscript creation.
-Most analysis modules, as indicated in their documentation, can be run locally or scaled to run on an Amazon EC2 instance, as facilitated by the project's Docker® [@https://dl.acm.org/doi/10.5555/2600239.2600241] container.
-As all analyses and manuscript writing were conducted in public repositories, any researcher in the world could contribute to OpenPBTA.
-
-The process for analysis and manuscript contributions is outlined in **Figure {@fig:Fig1}C**.
+As all analyses and manuscript writing were conducted in public repositories, any researcher in the world could contribute to OpenPBTA by following the process outlined in **Figure {@fig:Fig1}C**.
 First, a potential contributor proposed an analysis by filing an issue in the GitHub analysis repository.
 Next, project organizers or other contributors with expertise provided feedback about the proposed analysis (**Figure {@fig:Fig1}C**).
-The contributor then added their proposed code and results to their copy (fork) of the analysis repository.
-The contributor formally requested to include their analytical code and results in the OpenPBTA analysis repository by filing a GitHub pull request (PR) .
-All PRs underwent peer review by organizers and/or other contributors to ensure scientific accuracy, maintainability, and readability of code and documentation (**Figure {@fig:Fig1}C-D**).
-During review, two or more analysts ran the same code within the OpenPBTA Docker® [@https://dl.acm.org/doi/10.5555/2600239.2600241] container to ensure reproducibility of results.
+The contributor then added their analytical code and results to their own copy (fork) of the analysis repository and formally requested that this work be included in the OpenPBTA analysis repository by filing a GitHub pull request (PR).
+All PRs underwent peer review to ensure scientific accuracy, maintainability, and readability of code and documentation (**Figure {@fig:Fig1}C-D**).
 
-Beyond peer review, we established additional checks to ensure consistent results for all collaborators over time (**Figure {@fig:Fig1}D**).
+Beyond peer review, we established additional checks to ensure that results were reproducible and consistent for all collaborators over time (**Figure {@fig:Fig1}D**).
 We leveraged Docker® [@https://dl.acm.org/doi/10.5555/2600239.2600241] and the Rocker project [@https://doi.org/10.48550/arXiv.1710.03675] to maintain a consistent software development environment, creating a monolithic image with all OpenPBTA dependencies.
To ensure that new code executed in the development environment, we used the continuous integration (CI) service CircleCI® to run analytical code in PRs on a test dataset before formal code review, allowing us to detect code bugs or sensitivity to data release changes.