Updated docs
jfnavarro committed Jan 13, 2021
1 parent b3000fc commit ca1b3d4
Showing 23 changed files with 12,134 additions and 161 deletions.
578 changes: 578 additions & 0 deletions docs/_sources/changes.rst.txt

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions docs/_sources/contact.rst.txt
@@ -0,0 +1,4 @@
Contact
-------

Author: Jose Fernandez Navarro <[email protected]>
39 changes: 39 additions & 0 deletions docs/_sources/example.rst.txt
@@ -0,0 +1,39 @@
Examples
--------

The following is an example of a Bash script that runs the ST pipeline.
This example is for version 1.6.0 of the ST pipeline.

.. code-block:: bash

    #!/bin/bash
    # FASTQ reads
    FW=/YOUR_RUN/R1.fastq.gz
    RV=/YOUR_RUN/R2.fastq.gz
    # References for mapping, annotation and nonRNA-filtering
    MAP=/mouse/GRCm38_86v2/StarIndex
    ANN=/mouse/GRCm38_86v2/annotation/annotation.gtf
    CONT=/mouse/GRCm38_86v2/ncRNA/StarIndex
    # Barcodes settings
    ID=/stpipeline/ids/YOUR_IDs.txt
    # Output folder and experiment name
    # Do not use / or \ in the experiment name
    OUTPUT=/your_experiment_folder
    EXP=YOUR_EXP_NAME
    # Running the pipeline
    st_pipeline_run.py \
      --output-folder $OUTPUT \
      --ids $ID \
      --ref-map $MAP \
      --ref-annotation $ANN \
      --expName $EXP \
      --htseq-no-ambiguous \
      --verbose \
      --log-file $OUTPUT/${EXP}_log.txt \
      --contaminant-index $CONT \
      $FW $RV
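
Before launching a long run it can help to check the settings first. The following is only a sketch and not part of the ST pipeline: the helper names ``valid_exp_name`` and ``missing_inputs`` are hypothetical. It rejects an experiment name that is empty or contains ``/`` or ``\`` and lists any input path that does not exist.

```bash
#!/bin/bash
# Hypothetical helpers (not shipped with the ST pipeline).

# Succeed only if the experiment name is non-empty and has no / or \ in it.
valid_exp_name() {
    case "$1" in
        ""|*/*|*\\*) return 1 ;;
        *) return 0 ;;
    esac
}

# Print every path given as an argument that does not exist.
missing_inputs() {
    for _path in "$@"; do
        [ -e "$_path" ] || echo "$_path"
    done
}

# Example with a placeholder value; replace it with your own setting.
EXP=YOUR_EXP_NAME
if valid_exp_name "$EXP"; then
    echo "experiment name OK: $EXP"
else
    echo "invalid experiment name: $EXP" >&2
fi
```

In a real script you could call ``missing_inputs "$FW" "$RV" "$ID" "$ANN"`` and stop if it prints anything.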
26 changes: 26 additions & 0 deletions docs/_sources/index.rst.txt
@@ -0,0 +1,26 @@
.. ST Pipeline documentation.

Welcome to ST Pipeline's documentation!
=======================================

Contents:

.. toctree::
   :maxdepth: 2

   intro
   installation
   manual
   example
   changes
   license
   contact



Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
57 changes: 57 additions & 0 deletions docs/_sources/installation.rst.txt
@@ -0,0 +1,57 @@
Installing the Spatial Transcriptomics pipeline
-----------------------------------------------

These are the general instructions for installing the st_pipeline from scratch
in your computing environment. All the commands can be run as a user with no
elevated permissions.

We recommend downloading and installing Anaconda (https://www.anaconda.com/products/individual).

We then create a virtual environment in which we will run the pipeline.
Type the following command:

``conda create -n pipeline python=3.6 anaconda``

The name of the virtual environment that we have just created is specified by
the -n flag. Here it is called pipeline, but you can give it any name you
like. To run the pipeline, this virtual environment must be activated. To
activate the virtual environment, enter the following command:

``source activate pipeline``

Here pipeline is the name of your virtual environment. To deactivate the
virtual environment, type the following command:

``source deactivate``

You need to obtain the pipeline from GitHub to use it. The following steps
show how to do this.

Change to your home directory

``cd``

Clone the repository from GitHub

``git clone https://github.com/SpatialTranscriptomicsResearch/st_pipeline.git``

Change into the st_pipeline directory

``cd st_pipeline``

Activate the virtual environment (if not already active)

``source activate pipeline``

Install the pipeline

``python setup.py build``

``python setup.py install``

Alternatively, you can simply install the pipeline from PyPI using pip:

``pip install stpipeline``

Now the pipeline is installed and ready to be run.
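
A quick way to confirm the installation is to check that the ``st_pipeline_run.py`` script is on your ``PATH``. This is only a sketch; the helper name ``is_on_path`` is hypothetical and not part of the pipeline.

```bash
#!/bin/bash
# Hypothetical helper: succeed when the given command is found on the PATH.
is_on_path() {
    command -v "$1" >/dev/null 2>&1
}

if is_on_path st_pipeline_run.py; then
    echo "st_pipeline_run.py found at: $(command -v st_pipeline_run.py)"
else
    echo "st_pipeline_run.py not found; is the virtual environment active?" >&2
fi
```

If the script is not found, activate the virtual environment and try again.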
61 changes: 61 additions & 0 deletions docs/_sources/intro.rst.txt
@@ -0,0 +1,61 @@
Introduction
------------

The ST Pipeline contains the tools and scripts needed to process
and analyze the raw files generated with Spatial Transcriptomics
or Visium in FASTQ format to generate datasets for downstream analysis.
The ST Pipeline can also be used to process single cell RNA-seq data as
long as a file with barcodes identifying each cell is provided.
The ST Pipeline can also process RNA-seq datasets generated with
or without UMIs.

The ST Pipeline has been optimized for speed and robustness, and
it is easy to use, with many parameters to adjust its settings.
The ST Pipeline is fully parallel and has constant memory use.
The ST Pipeline allows you to skip any of the steps and to use either the
genome or the transcriptome as the reference.

The pipeline takes the following files/parameters as input (optional ones are marked):

- FASTQ files (read 1 containing the spatial information and the UMI,
  and read 2 containing the genomic sequence)
- A genome index generated with STAR
- An annotation file in GTF or GFF format (optional)
- The file containing the barcodes and array coordinates
  (look in the folder "ids" and choose the correct one).
  This file contains 3 columns (BARCODE, X and Y),
  so if you provide a file with barcodes identifying cells (for example),
  the ST Pipeline can be used for single cell data.
  This file is also optional.
- A name for the dataset
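
The three-column layout of the barcodes file can be checked before a run. The sketch below is not part of the ST pipeline. It assumes that the file has no header line, that columns are separated by whitespace, and that X and Y are non-negative numbers; the function name ``check_ids_file`` is hypothetical.

```bash
#!/bin/bash
# Hypothetical check: succeed only if every non-empty line has exactly three
# whitespace-separated fields and the last two are non-negative numbers.
check_ids_file() {
    awk '
        NF == 0 { next }
        NF != 3 || $2 !~ /^[0-9]+(\.[0-9]+)?$/ || $3 !~ /^[0-9]+(\.[0-9]+)?$/ {
            printf "line %d is malformed: %s\n", NR, $0 > "/dev/stderr"
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Toy file with made-up barcodes, used only to demonstrate the check.
ids_demo=$(mktemp)
printf 'ACGTACGTACGTACGTAC\t1\t2\nTTGCATTGCATTGCATTG\t1\t3\n' > "$ids_demo"
check_ids_file "$ids_demo" && echo "ids file looks well-formed"
rm -f "$ids_demo"
```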

The ST Pipeline has multiple parameters, mostly related to trimming,
mapping and annotation, but the default values are generally good enough.
You can see a full description of the parameters by
typing ``st_pipeline_run.py --help`` after you have installed the ST Pipeline.

The input FASTQ files can also be given compressed in gzip/bzip format.
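
As a quick check that read 1 and read 2 files are paired, you can compare their read counts. The sketch below is not part of the ST pipeline. It assumes standard four-line FASTQ records and picks the decompressor from the file extension (``.gz`` or ``.bz2``); the function name ``fastq_read_count`` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print the number of reads in a plain, gzip or bzip2
# compressed FASTQ file, assuming four lines per record.
fastq_read_count() {
    case "$1" in
        *.gz)  gzip -cd "$1" ;;
        *.bz2) bzip2 -cd "$1" ;;
        *)     cat "$1" ;;
    esac | awk 'END { print int(NR / 4) }'
}

# Toy gzip-compressed FASTQ file with two made-up reads.
fq_dir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n' | gzip -c > "$fq_dir/R1.fastq.gz"
echo "reads in toy file: $(fastq_read_count "$fq_dir/R1.fastq.gz")"
rm -rf "$fq_dir"
```

Running the function on both of your FASTQ files and comparing the two numbers is a cheap way to spot a truncated download.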

In outline, the ST Pipeline performs the following steps:

- Quality trimming (read 1 and read 2):

  - Remove low quality bases
  - Sanity check (reads same length, reads order, etc.)
  - Check the quality of the UMI (if provided)
  - Remove artifacts (PolyT, PolyA, PolyG, PolyN and PolyC) of user-defined length
  - Check for AT and GC content
  - Discard reads that fall below a minimum number of bases or that failed any of the checks above

- Contaminant filter, e.g. against an rRNA genome (optional)
- Mapping with STAR (only read 2)
- Demultiplexing with `Taggd <https://github.com/SpatialTranscriptomicsResearch/taggd>`_ (only read 1)
- Keep reads (read 2) that contain a valid barcode and are correctly mapped
- Annotate the reads with htseq-count (optional)
- Group annotated reads by barcode (spot position) and gene to get a read count
- In the grouping/counting, only unique molecules (UMIs) are kept.
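
The last two steps can be illustrated on toy data. This is only a sketch of the idea and not the pipeline's own implementation: it assumes a tab-separated file with one annotated read per line (barcode, gene, UMI) and treats two UMIs as the same molecule only when they match exactly. The function name ``count_unique_umis`` is hypothetical.

```bash
#!/bin/bash
# Hypothetical sketch: count distinct UMIs per (barcode, gene) pair.
# Input columns (tab-separated): barcode, gene, UMI.
count_unique_umis() {
    awk -F '\t' '
        # Count a (barcode, gene) pair once for each UMI not seen before.
        !seen[$1 FS $2 FS $3]++ { count[$1 FS $2]++ }
        END { for (key in count) print key FS count[key] }
    ' "$1" | sort
}

# Toy reads: spotA/Gene1 has three reads but only two distinct UMIs.
reads_demo=$(mktemp)
printf 'spotA\tGene1\tAACC\nspotA\tGene1\tAACC\nspotA\tGene1\tGGTT\nspotB\tGene2\tAACC\n' > "$reads_demo"
count_unique_umis "$reads_demo"
rm -f "$reads_demo"
```

For the toy input, spotA/Gene1 gets a count of 2 because the duplicated AACC read is collapsed, and spotB/Gene2 gets a count of 1.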

A more detailed graphical description of the workflow is available in the documents workflow.pdf and workflow_extended.pdf.

The output is a matrix of counts (genes as columns, spots as rows),
a BED file containing the transcripts (read name, coordinate, gene, etc.), and a JSON
file with useful stats.
The ST Pipeline also writes a log file with useful information.
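
To get a first look at the matrix of counts you can print its shape. The sketch below is not part of the ST pipeline. It assumes that the matrix is a tab-separated text file whose first line holds the gene names after a leading (possibly empty) cell, and whose remaining lines each start with a spot identifier; the function name ``matrix_shape`` and the toy values are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print the number of spots (rows) and genes (columns)
# of a tab-separated matrix with a header line and a leading row-name column.
matrix_shape() {
    awk -F '\t' '
        NR == 1 { genes = NF - 1; next }
        NF > 0  { spots++ }
        END { printf "%d spots x %d genes\n", spots, genes }
    ' "$1"
}

# Toy matrix with three spots and two genes.
matrix_demo=$(mktemp)
printf '\tGene1\tGene2\n1x1\t5\t0\n1x2\t2\t7\n2x1\t0\t1\n' > "$matrix_demo"
matrix_shape "$matrix_demo"
rm -f "$matrix_demo"
```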
25 changes: 25 additions & 0 deletions docs/_sources/license.rst.txt
@@ -0,0 +1,25 @@
License
-------

The MIT License (MIT)
Copyright (c) 2016 Jose Fernandez Navarro.
All rights reserved.

* Jose Fernandez Navarro <[email protected]>

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.