From a580a96b602b9b8a6c7e15ee4714ed960f06967c Mon Sep 17 00:00:00 2001
From: Andrea Sboner <asboner@gmail.com>
Date: Tue, 13 Oct 2020 18:46:42 -0400
Subject: [PATCH] Updated readme for the docker release

---
 README.md | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 75 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 6da79b9..ecc9089 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,78 @@
-# ERVmap
+# **ERVmap**
 ERVmap is one part curated database of human proviral ERV loci and one part a stringent algorithm to determine which ERVs are transcribed in their RNA seq data.
 
-# **Citation** 
-Tokuyama M. et. al., ERVmap analysis reveals genome-wide transcription of human endogenous retroviruses. Proc Natl Acad Sci U S A. 2018 Dec 11;115(50):12565-12572. doi: 10.1073/pnas.1814589115.
+## Citation 
+Tokuyama M. et al., ERVmap analysis reveals genome-wide transcription of human endogenous retroviruses. Proc Natl Acad Sci USA. 2018 Dec 11;115(50):12565-12572. [doi: 10.1073/pnas.1814589115](https://doi.org/10.1073/pnas.1814589115).
 
-# **Installing**
+## **How to use it**
+
+### Install
+This version of the tool consists of 2 steps: 1. alignment to the human genome (GRCh38) and 2. quantification of the ERV regions. To download and install the latest version of ERVmap, provided as a Docker image, type:
+```
+docker pull eipm/ervmap:latest
+```
+**NOTE**: for a specific version replace `latest` with the release version. 
+
+### **How to run ERVmap**
+To run ERVmap, you need: 1. an indexed genome reference for STAR; 2. a BED file with the curated ERV regions on the human genome (see `ERVmap.bed`); 3. the input FASTQ data (gzipped). Assuming that your sample is called `SAMPLE` and has 2 FASTQ files (one per read) in the folder `/path/to/input/data`, the reference genome is in `/path/to/genome`, the ERV BED file is in `/path/to/erv/file`, and the results should go to `/path/to/output`, the command is:
+```
+docker run --rm  \
+    -u $(id -u):$(id -g) \
+    -v /path/to/input/data:/data:ro \
+    -v /path/to/genome:/genome:ro \
+    -v /path/to/erv/file:/resources:ro \
+    -v /path/to/output:/results \
+    eipm/ervmap:latest \
+    --read1 /data/SAMPLE_1.fastq.gz \
+    --read2 /data/SAMPLE_2.fastq.gz \
+    --output SAMPLE/SAMPLE. \
+    --mode ALL
+```
+This command will generate the alignment files (BAMs) in the `/path/to/output/SAMPLE/` folder and all files will have the prefix `SAMPLE.`. The generated files will be:
+```
+SAMPLE.Aligned.sortedByCoord.out.bam
+SAMPLE.Aligned.sortedByCoord.out.bam.bai
+SAMPLE.ERVresults.txt
+SAMPLE.Log.final.out
+SAMPLE.Log.out
+SAMPLE.Log.progress.out
+SAMPLE.SJ.out.tab
+```
+(See the [STAR documentation](https://github.com/alexdobin/STAR) for a description of the output files of the STAR aligner.)
+The results of ERV quantification will be in the `SAMPLE.ERVresults.txt` file. This is a tab-delimited file with 7 columns produced by [bedtools](https://bedtools.readthedocs.io/en/latest/): the chromosome, start, end, name, score, and strand of each ERV region, followed by its read count. For example:
+```
+1       896176  898458  5803    500     +       70
+1       1412251 1418852 5804    500     +       36
+1       3801730 3806808 5807    500     +       6
+1       4178468 4187573 5808    500     +       1
+```
+
+### The **`--mode`** option
+This option accepts one of 3 values: `ALL`, `STAR`, or `BED`:
+* `ALL` to run both the STAR aligner and the ERV quantification from start to finish; 
+* `STAR` to only perform the alignment;
+* `BED` to only run the ERV quantification.
+
+
+### Optional parameters (recommended) 
+A few parameters can be passed to the ERVmap image to make the process more efficient.
+* `--cpus 20`: if you have a multi-core system (and you should have one), you can specify the number of CPUs to use (e.g. 20);
+* `--limit-ram 48000000000`: this limits the amount of RAM used, to avoid exhausting the system's resources.
+You can see the full set of parameters by typing: `docker run --rm eipm/ervmap:latest`.
+
+Docker's own options, such as memory limits, should be placed before the image name on the command line, e.g.:
+```    
+    --memory 50G \
+    --memory-swap 100G
+``` 
+
+----
+
+# Published version 
+
+Please note that the instructions hereafter refer to the original published version (see [ERVmap on GitHub](https://github.com/mtokuyama/ERVmap)).
+
+## **Installing**
 
 ### Install dependencies
 ``` 
@@ -30,7 +98,7 @@ normalize_with_file.pl
 normalize_deseq.r
 ```
 
-# **Map data to human genome (hg38)**
+## **Map data to human genome (hg38)**
 
 This step will yield raw counts for cellular genes and ERVmap loci as separate files.
 
@@ -52,7 +120,7 @@ mv ./sample/herv_coverage_GRCh38_genome.txt ./output/erv/${i}.e
 mv ./sample/GRCh38/htseq.cnt ./output/cellular/${i}.c
 ```
 
-# **Clean up data, merge, and normalize**
+## **Clean up data, merge, and normalize**
 
 These steps will yield normalized ERV read counts based on size factors obtained through DESeq2 analysis. 
 Use the output files from above. 
@@ -65,7 +133,7 @@ normalize_deseq.r  ./output/cellular/merged_cellular.txt ./output/cellular/norma
 normalize_with_file.pl ./output/cellular/normalized_factors ./output/erv/merged_erv.txt > ./output/$folder_name.txt
 ```
 
-# **Authors**
+## Authors
 
 * Maria Tokuyama
 * Yong Kong