Updated docs

jfnavarro · Jan 13, 2021 · ca1b3d4
1 parent b3000fc
commit ca1b3d4

Showing 23 changed files with 12,134 additions and 161 deletions.
diff --git a/docs/_sources/changes.rst.txt b/docs/_sources/changes.rst.txt
diff --git a/docs/_sources/contact.rst.txt b/docs/_sources/contact.rst.txt
@@ -0,0 +1,4 @@
+Contact
+-------
+
+Author: Jose Fernandez Navarro <[email protected]>
diff --git a/docs/_sources/example.rst.txt b/docs/_sources/example.rst.txt
@@ -0,0 +1,39 @@
+Examples
+--------
+
+The following is an example of a Bash script to run the ST Pipeline.
+This example is for version 1.6.0 of the ST pipeline.
+
+.. code-block:: bash
+
+	#!/bin/bash
+
+	# FASTQ reads
+	FW=/YOUR_RUN/R1.fastq.gz
+	RV=/YOUR_RUN/R2.fastq.gz
+
+	# References for mapping, annotation and nonRNA-filtering
+	MAP=/mouse/GRCm38_86v2/StarIndex
+	ANN=/mouse/GRCm38_86v2/annotation/annotation.gtf
+	CONT=/mouse/GRCm38_86v2/ncRNA/StarIndex
+
+	# Barcodes settings
+	ID=/stpipeline/ids/YOUR_IDs.txt
+
+	# Output folder and experiment name
+	# Do not use / or \ in the experiment name
+	OUTPUT=/your_experiment_folder
+	EXP=YOUR_EXP_NAME
+
+	# Running the pipeline
+	st_pipeline_run.py \
+	  --output-folder $OUTPUT \
+	  --ids $ID \
+	  --ref-map $MAP \
+	  --ref-annotation $ANN \
+	  --expName $EXP \
+	  --htseq-no-ambiguous \
+	  --verbose \
+	  --log-file $OUTPUT/${EXP}_log.txt \
+	  --contaminant-index $CONT \
+	  $FW $RV
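The example script above assumes that every input path exists. A minimal pre-flight sketch, assuming a hypothetical helper named `check_inputs` (not part of the ST Pipeline), could report missing files before the run starts:

```shell
#!/bin/bash

# Hypothetical helper (not provided by the ST Pipeline):
# prints every path that does not exist and returns non-zero if any is missing.
check_inputs() {
  local missing=0
  local path
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "Missing input: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

With the variables from the example, it might be called as `check_inputs "$FW" "$RV" "$MAP" "$ANN" "$CONT" "$ID" || exit 1` just before `st_pipeline_run.py`.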
diff --git a/docs/_sources/index.rst.txt b/docs/_sources/index.rst.txt
@@ -0,0 +1,26 @@
+.. ST Pipeline documentation.
+
+Welcome to ST Pipeline's documentation!
+=======================================
+
+Contents:
+
+.. toctree::
+   :maxdepth: 2
+
+   intro
+   installation
+   manual
+   example
+   changes
+   license
+   contact
+
+
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
diff --git a/docs/_sources/installation.rst.txt b/docs/_sources/installation.rst.txt
@@ -0,0 +1,57 @@
+Installing the Spatial Transcriptomics pipeline
+-----------------------------------------------
+
+These are the general instructions for installing the st_pipeline from scratch
+in your compute environment. All the commands can be run as a user with no
+elevated permissions.
+
+We recommend downloading and installing Anaconda (https://www.anaconda.com/products/individual).
+
+We then create a virtual environment in which we will run the pipeline.
+Type the following command:
+
+	``conda create -n pipeline python=3.6 anaconda``
+
+The name of the virtual environment that we have just created is specified by
+the -n flag. Here it is called pipeline, but you can give it any name you
+want. To run the pipeline, this virtual environment must be activated. To
+activate the virtual environment, enter the following command:
+
+	``source activate pipeline``
+
+Here pipeline is the name of your virtual environment (replace it if you chose
+a different name). To deactivate the virtual environment, type the
+following command:
+
+	``source deactivate``
+
+You need to obtain the pipeline from GitHub to use it. The following steps
+show how to do this.
+
+Change to your home directory
+
+	``cd``
+
+Clone the repository from GitHub
+
+	``git clone https://github.com/SpatialTranscriptomicsResearch/st_pipeline.git``
+
+Change into the st_pipeline directory
+
+	``cd st_pipeline``
+
+Activate the virtual environment (if not already active)
+
+	``source activate pipeline``
+
+Install the pipeline
+
+	``python setup.py build``
+
+	``python setup.py install``
+
+Alternatively, you can simply install the pipeline from PyPI:
+
+	``pip install stpipeline``
+
+Now the pipeline is installed and ready to be run.
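After the installation steps above, one way to confirm that the entry point is reachable is to look it up on the `PATH`. This is only a sketch: `require_command` is a hypothetical helper introduced here for illustration, and it checks nothing beyond whether the command can be found.

```shell
#!/bin/bash

# Hypothetical helper: succeed if the given command is on PATH, fail otherwise.
require_command() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "Found $1 at $(command -v "$1")"
  else
    echo "$1 not found on PATH; is the virtual environment active?" >&2
    return 1
  fi
}
```

For example, `require_command st_pipeline_run.py` should succeed inside the activated virtual environment once the pipeline is installed.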
diff --git a/docs/_sources/intro.rst.txt b/docs/_sources/intro.rst.txt
@@ -0,0 +1,61 @@
+Introduction
+------------
+
+The ST Pipeline contains the tools and scripts needed to process
+and analyze the raw FASTQ files generated with the Spatial Transcriptomics
+or Visium technologies, producing datasets for downstream analysis.
+The ST Pipeline can also be used to process single-cell RNA-seq data as
+long as a file with barcodes identifying each cell is provided.
+The ST Pipeline can also process RNA-seq datasets generated with
+or without UMIs.
+
+The ST Pipeline has been optimized for speed and robustness, and
+it is easy to use, with many parameters to adjust its settings.
+The ST Pipeline is fully parallel and has constant memory use.
+The ST Pipeline allows you to skip any of the steps and to use the
+genome or the transcriptome as reference.
+
+The following files/parameters are used as input (some are optional, as noted):
+
+- FASTQ files (Read 1 containing the spatial information and the UMI 
+  and read 2 containing the genomic sequence) 
+- A genome index generated with STAR 
+- An annotation file in GTF or GFF format (optional)
+- The file containing the barcodes and array coordinates
+  (look at the folder "ids" and choose the correct one).
+  This file contains 3 columns (BARCODE, X and Y),
+  so if you provide a file with barcodes identifying cells (for example),
+  the ST Pipeline can be used for single-cell data.
+  This file is also optional.
+- A name for the dataset
+
+The ST Pipeline has multiple parameters, mostly related to trimming,
+mapping and annotation, but the default values are generally good enough.
+You can see a full description of the parameters by
+typing ``st_pipeline_run.py --help`` after you have installed the ST Pipeline.
+
+The input FASTQ files can also be given in gzip/bzip format.
+
+In summary, the ST Pipeline performs the following steps:
+
+- Quality trimming (read 1 and read 2):
+	- Remove low quality bases
+	- Sanity check (reads same length, reads order, etc.)
+	- Check quality UMI (if provided)
+	- Remove artifacts (PolyT, PolyA, PolyG, PolyN and PolyC) of user defined length
+	- Check for AT and GC content
+	- Discard reads that fall below a minimum number of bases or that failed any of the checks above
+- Contaminant filter, e.g. rRNA genome (optional)
+- Mapping with STAR (only read 2)
+- Demultiplexing with `Taggd <https://github.com/SpatialTranscriptomicsResearch/taggd>`_ (only read 1)
+- Keep reads (read 2) that contain a valid barcode and are correctly mapped
+- Annotate the reads with htseq-count (optional)
+- Group annotated reads by barcode (spot position) and gene to get a read count
+- In the grouping/counting, only unique molecules (UMIs) are kept.
+
+A more detailed graphical description of the workflow is available in the documents workflow.pdf and workflow_extended.pdf.
+
+The output will be a matrix of counts (genes as columns, spots as rows),
+a BED file containing the transcripts (read name, coordinate, gene, etc.), and a JSON
+file with useful stats.
+The ST pipeline will also output a log file with useful information.
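The introduction describes the barcode file as having 3 columns (BARCODE, X and Y). A quick sanity check before a run could look like the sketch below. It assumes the columns are separated by whitespace (tabs or spaces), which the text does not state, and `check_barcode_file` is a hypothetical helper, not part of the ST Pipeline.

```shell
#!/bin/bash

# Hypothetical helper: succeed only if every non-empty line has exactly 3 fields.
# Assumes whitespace-separated columns (BARCODE, X, Y).
check_barcode_file() {
  awk 'BEGIN { bad = 0 }
       NF > 0 && NF != 3 {
         printf "Line %d has %d columns, expected 3\n", NR, NF
         bad = 1
       }
       END { exit bad }' "$1" >&2
}
```

It could be called as `check_barcode_file "$ID" || exit 1`, where `$ID` is the barcode file passed to `--ids`.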
diff --git a/docs/_sources/license.rst.txt b/docs/_sources/license.rst.txt
@@ -0,0 +1,25 @@
+License
+-------
+
+The MIT License (MIT)
+Copyright (c) 2016 Jose Fernandez Navarro.
+All rights reserved.
+
+* Jose Fernandez Navarro <[email protected]>
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
+the Software, and to permit persons to whom the Software is furnished to do so,
+subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.