This repository includes scripts to process the amplicon data and generate the figures used in (Kiledal, Keffer, and Maresca 2020).
The raw sequence data is housed at NCBI SRA under accession PRJNA629592. The scripts include functionality to download this data and run our entire analysis pipeline.
- code
  - This directory contains the code (as notebooks) for each step; detailed descriptions are found in that directory's README file.
- data
  - This directory is where processed data is stored, and comes populated with items like sample metadata and several other reference files.
- results
  - Figures and tables from the paper which can be reproduced by running the notebooks in the code directory.
Running this analysis requires R. The following Python tools are also required:
Several steps are computationally intensive, and running them on an HPC system is strongly recommended.
Many of the figures produced with ggplot inadvertently relied on an "off-label" use of ggsave() and will now produce errors. The fix is simple and documented in the included link. In short, ggplot() + ggsave() no longer works, so those calls to ggsave() should be replaced. Several replacements are available: relying on the default last_plot(), piping the ggplot object into ggsave() (%>% instead of +), or saving the plot to an object and explicitly passing it to ggsave(). Alternatively, an older version of ggplot2 could be used.
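As a minimal sketch of two of these replacements, the example below uses R's built-in mtcars dataset and temporary output files. The dataset, variable names, and file paths are illustrative assumptions and do not come from the notebooks in this repository.

```r
library(ggplot2)

# Hypothetical output paths; temporary files keep the sketch self-contained.
explicit_path <- tempfile(fileext = ".pdf")
implicit_path <- tempfile(fileext = ".pdf")

# Option 1: store the plot in an object and pass it to ggsave() explicitly.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
ggsave(explicit_path, plot = p, width = 6, height = 4)

# Option 2: build the plot, then call ggsave() as a separate statement.
# With no plot argument, ggsave() falls back to last_plot().
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point()
ggsave(implicit_path, width = 6, height = 4)
```

In an existing notebook, the change usually amounts to removing the `+` before ggsave() so that it becomes its own statement, or assigning the plot to a name first. Passing the plot explicitly is the more robust choice when a cell builds more than one plot.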