Merge pull request #47 from paulinetrinh/main

update happi documentation
statdivlab · Jul 28, 2023 · 66519a3 · 66519a3
2 parents b15b10a + d142d7d
commit 66519a3
Show file tree

Hide file tree

Showing 2 changed files with 92 additions and 12 deletions.
diff --git a/README.Rmd b/README.Rmd
@@ -5,9 +5,20 @@ output: github_document
 ```{r setup, include=FALSE}
 knitr::opts_chunk$set(echo = TRUE)
 ```
-<span style="font-family:Arial; font-size:2em;"> a **H**ierarchical **Ap**proach to **P**angenomics **I**nference</span>
+<span style="font-family:Arial; font-size:10em;"> `happi`: a **H**ierarchical **Ap**proach to **P**angenomics **I**nference</span>
 
-`happi` is an `R` package for modeling gene presence. 
+## What is `happi`?
+`happi` is a method for modeling gene presence in pangenomics that leverages information about genome quality to improve inference. `happi` models the association between an experimental condition and gene presence where the **experimental condition** is the **primary predictor** of interest and **gene presence** is the **outcome** while incorporating user-chosen information on genome quality metrics (e.g. mean coverage, contamination, completion, etc...). 
+
+You might be interested in using `happi` to conduct your pangenomics hypothesis testing if you work with fragmented genomes such as metagenome assembled genomes (MAGs). `happi` is currently distributed as an `R` package and can be installed using the instructions below. 
+
+## Where does `happi` fit into my workflow? 
+
+If you're new to shotgun metagenomics we understand that things can feel overwhelming! On top of all the tools and names floating around you're probably wondering where does `happi` fit into the vast suite of bioinformatics tools for metagenomics data and how can you use it in your work? `happi` can be used *after* you have assembled, binned, annotated, and refined your genomes or metagenome-assembled genomes (MAGs) and as such it can be used with any bioinformatics workflow that conducts assembly, binning, annotation, and refinement. 
+
+We highly recommend checking out Mike Lee's [resources](https://astrobiomike.github.io/genomics/) on genomics to orient yourself to the breadth of metagenomics tools for building a bioinformatics pipeline/workflow from raw reads to assembled genomes. We recommend spending time to understand the pros and cons of the tools you're using at each step of your workflow, particularly as software and methods continue to improve. 
+
+If you're looking for a platform that wraps a lot of these tools into its code base to conduct assembly, binning, annotation, and refinement of MAGs we suggest checking out the [anvi'o](https://anvio.org) platform: metagenomics workflow [here](https://merenlab.org/2016/06/22/anvio-tutorial-v2/) and pangenomics workflow [here](https://merenlab.org/2016/11/08/pangenomics-v2/). Future work for our development team is to integrate `happi` into the anvi'o platform for a seamless user experience. In the meantime, please refer to the format of your data in the following data input section for what is required to utilize `happi`. 
 
 ## Installation
 
@@ -18,8 +29,15 @@ devtools::install_github("statdivlab/happi", build_vignettes = TRUE) # install h
 library(happi)
 ```
 
+## Data Input: What data format is needed to run `happi`? 
+
+You will need two pieces of information: 
+
+1. Information on the presence/absence of your genes in your MAGs/genomes; example [here](https://github.com/statdivlab/happi/blob/main/workflows/TM7_genes_presence_table.csv) 
+2. Metadata/covariate information for the MAGs/genomes that you'd like to use for hypothesis testing; example [here](https://github.com/statdivlab/happi/blob/main/workflows/TM7_metadata.csv)
+
 ## Usage 
-The vignettes provide examples of how to use `happi` and all its main
+The vignettes provide detailed information of how to use `happi` and all its main
 functions through the `R` interactive session. You can follow the vignettes by running the following code in
 `R`:
 ```
@@ -47,10 +65,14 @@ snakemake --cores 6
 
 The Snakefile is customizable for your own input data and parameters. Please refer to the sample data files that have been provided in `workflows` for formatting of your input data.
 
+## How do I export data from anvi'o for use in `happi`? 
+
+
+
 ## Citation 
 If you use `happi` please cite our work: 
 
-An open-access preprint is available here. 
+An open-access preprint is available [here](https://www.biorxiv.org/content/10.1101/2022.04.26.489591v1.full). 
 
 ## Issues/Requests 
 

diff --git a/README.md b/README.md
@@ -1,10 +1,54 @@
 
 <!-- README.md is generated from README.Rmd. Please edit that file -->
 
-<span style="font-family:Arial; font-size:2em;"> a **H**ierarchical
-**Ap**proach to **P**angenomics **I**nference</span>
-
-`happi` is an `R` package for modeling gene presence.
+<span style="font-family:Arial; font-size:10em;"> `happi`: a
+**H**ierarchical **Ap**proach to **P**angenomics **I**nference</span>
+
+## What is `happi`?
+
+`happi` is a method for modeling gene presence in pangenomics that
+leverages information about genome quality to improve inference. `happi`
+models the association between an experimental condition and gene
+presence where the **experimental condition** is the **primary
+predictor** of interest and **gene presence** is the **outcome** while
+incorporating user-chosen information on genome quality metrics
+(e.g. mean coverage, contamination, completion, etc…).
+
+You might be interested in using `happi` to conduct your pangenomics
+hypothesis testing if you work with fragmented genomes such as
+metagenome assembled genomes (MAGs). `happi` is currently distributed as
+an `R` package and can be installed using the instructions below.
+
+## Where does `happi` fit into my workflow?
+
+If you’re new to shotgun metagenomics we understand that things can feel
+overwhelming! On top of all the tools and names floating around you’re
+probably wondering where does `happi` fit into the vast suite of
+bioinformatics tools for metagenomics data and how can you use it in
+your work? `happi` can be used *after* you have assembled, binned,
+annotated, and refined your genomes or metagenome-assembled genomes
+(MAGs) and as such it can be used with any bioinformatics workflow that
+conducts assembly, binning, annotation, and refinement.
+
+We highly recommend checking out Mike Lee’s
+[resources](https://astrobiomike.github.io/genomics/) on genomics to
+orient yourself to the breadth of metagenomics tools for building a
+bioinformatics pipeline/workflow from raw reads to assembled genomes. We
+recommend spending time to understand the pros and cons of the tools
+you’re using at each step of your workflow, particularly as software and
+methods continue to improve.
+
+If you’re looking for a platform that wraps a lot of these tools into
+its code base to conduct assembly, binning, annotation, and refinement
+of MAGs we suggest checking out the [anvi’o](https://anvio.org)
+platform: metagenomics workflow
+[here](https://merenlab.org/2016/06/22/anvio-tutorial-v2/) and
+pangenomics workflow
+[here](https://merenlab.org/2016/11/08/pangenomics-v2/). Future work for
+our development team is to integrate `happi` into the anvi’o platform
+for a seamless user experience. In the meantime, please refer to the
+format of your data in the following data input section for what is
+required to utilize `happi`.
 
 ## Installation
 
@@ -13,11 +57,22 @@
     devtools::install_github("statdivlab/happi", build_vignettes = TRUE) # install happi using devtools
     library(happi)
 
+## Data Input: What data format is needed to run `happi`?
+
+You will need two pieces of information:
+
+1.  Information on the presence/absence of your genes in your
+    MAGs/genomes; example
+    [here](https://github.com/statdivlab/happi/blob/main/workflows/TM7_genes_presence_table.csv)
+2.  Metadata/covariate information for the MAGs/genomes that you’d like
+    to use for hypothesis testing; example
+    [here](https://github.com/statdivlab/happi/blob/main/workflows/TM7_metadata.csv)
+
 ## Usage
 
-The vignettes provide examples of how to use `happi` and all its main
-functions through the `R` interactive session. You can follow the
-vignettes by running the following code in `R`:
+The vignettes provide detailed information of how to use `happi` and all
+its main functions through the `R` interactive session. You can follow
+the vignettes by running the following code in `R`:
 
     utils::browseVignettes(package = "happi")
 
@@ -45,11 +100,14 @@ The Snakefile is customizable for your own input data and parameters.
 Please refer to the sample data files that have been provided in
 `workflows` for formatting of your input data.
 
+## How do I export data from anvi’o for use in `happi`?
+
 ## Citation
 
 If you use `happi` please cite our work:
 
-An open-access preprint is available here.
+An open-access preprint is available
+[here](https://www.biorxiv.org/content/10.1101/2022.04.26.489591v1.full).
 
 ## Issues/Requests