Showing 23 changed files with 12,134 additions and 161 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
Contact | ||
------- | ||
|
||
Author: Jose Fernandez Navarro <[email protected]> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
Examples | ||
-------- | ||
|
||
The following is an example of a Bash script to run the ST Pipeline. | ||
This example is for version 1.6.0 of the ST pipeline. | ||
|
||
.. code-block:: bash | ||
#!/bin/bash | ||
# FASTQ reads | ||
FW=/YOUR_RUN/R1.fastq.gz | ||
RV=/YOUR_RUN/R2.fastq.gz | ||
# References for mapping, annotation and nonRNA-filtering | ||
MAP=/mouse/GRCm38_86v2/StarIndex | ||
ANN=/mouse/GRCm38_86v2/annotation/annotation.gtf | ||
CONT=/mouse/GRCm38_86v2/ncRNA/StarIndex | ||
# Barcodes settings | ||
ID=/stpipeline/ids/YOUR_IDs.txt | ||
# Output folder and experiment name | ||
# Do not use / or \ in the experiment name | ||
OUTPUT=/your_experiment_folder | ||
EXP=YOUR_EXP_NAME | ||
# Running the pipeline | ||
st_pipeline_run.py \ | ||
--output-folder "$OUTPUT" \ | ||
--ids "$ID" \ | ||
--ref-map "$MAP" \ | ||
--ref-annotation "$ANN" \ | ||
--expName "$EXP" \ | ||
--htseq-no-ambiguous \ | ||
--verbose \ | ||
--log-file "$OUTPUT/${EXP}_log.txt" \ | ||
--contaminant-index "$CONT" \ | ||
"$FW" "$RV" |
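The script above warns against using / or \ in the experiment name. A minimal sketch of a check that could run before the pipeline is started is shown below. The helper ``valid_exp_name`` is hypothetical and not part of the ST Pipeline.

```shell
#!/bin/bash
# Hypothetical helper (not part of the ST Pipeline): succeed only if the
# experiment name is non-empty and contains neither "/" nor "\".
valid_exp_name() {
    case "$1" in
        "")       return 1 ;;  # empty name
        */*|*\\*) return 1 ;;  # contains a slash or a backslash
        *)        return 0 ;;
    esac
}

EXP=YOUR_EXP_NAME
if valid_exp_name "$EXP"; then
    echo "Experiment name accepted: $EXP"
else
    echo "Invalid experiment name: $EXP" >&2
fi
```

Running the check first means a bad name is reported before any reads are processed.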
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
.. ST Pipeline documentation. | ||
Welcome to ST Pipeline's documentation! | ||
======================================= | ||
|
||
Contents: | ||
|
||
.. toctree:: | ||
:maxdepth: 2 | ||
|
||
intro | ||
installation | ||
manual | ||
example | ||
changes | ||
license | ||
contact | ||
|
||
|
||
|
||
Indices and tables | ||
================== | ||
|
||
* :ref:`genindex` | ||
* :ref:`modindex` | ||
* :ref:`search` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
Installing the Spatial Transcriptomics pipeline | ||
----------------------------------------------- | ||
|
||
These are the general instructions for installing the st_pipeline from scratch | ||
in your computing environment. All the commands can be run as a user with no | ||
elevated permissions. | ||
|
||
We recommend downloading and installing Anaconda (https://www.anaconda.com/products/individual). | ||
|
||
Next, create a virtual environment in which to run the pipeline. | ||
Type the following command: | ||
|
||
``conda create -n pipeline python=3.6 anaconda`` | ||
|
||
The name of the virtual environment that we have just created is specified by | ||
the -n flag. Here it is called pipeline, but you can give it any name you | ||
like. To run the pipeline, this virtual environment must be active. To | ||
activate the virtual environment, enter the following command: | ||
|
||
``source activate pipeline`` | ||
|
||
Here, pipeline is the name of your virtual environment. To deactivate the | ||
virtual environment, type the | ||
following command: | ||
|
||
``source deactivate`` | ||
|
||
You need to obtain the pipeline from GitHub to use it. The following steps | ||
show how to do this. | ||
|
||
Change to your home directory | ||
|
||
``cd`` | ||
|
||
Clone the repository from GitHub | ||
|
||
``git clone https://github.com/SpatialTranscriptomicsResearch/st_pipeline.git`` | ||
|
||
Change into the st_pipeline directory | ||
|
||
``cd st_pipeline`` | ||
|
||
Activate the virtual environment (if not already active) | ||
|
||
``source activate pipeline`` | ||
|
||
Install the pipeline | ||
|
||
``python setup.py build`` | ||
|
||
``python setup.py install`` | ||
|
||
Alternatively, you can simply install the pipeline from PyPI: | ||
|
||
``pip install stpipeline`` | ||
|
||
Now the pipeline is installed and ready to be run. |
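To confirm that the installation worked, you can check that the ``st_pipeline_run.py`` script is reachable from the active environment. The sketch below is only an illustration. The helper ``have_command`` is hypothetical and relies solely on the shell built-in ``command -v``.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given command is available on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command st_pipeline_run.py; then
    echo "st_pipeline_run.py found at: $(command -v st_pipeline_run.py)"
else
    echo "st_pipeline_run.py not found; is the virtual environment active?"
fi
```

If the script is found, ``st_pipeline_run.py --help`` lists the available parameters.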
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
Introduction | ||
------------ | ||
|
||
The ST Pipeline contains the tools and scripts needed to process | ||
and analyze the raw FASTQ files generated with the Spatial Transcriptomics | ||
or Visium technologies, and to generate datasets for downstream analysis. | ||
The ST Pipeline can also be used to process single cell RNA-seq data as | ||
long as a file with barcodes identifying each cell is provided. | ||
The ST Pipeline can also process RNA-seq datasets generated with | ||
or without UMIs. | ||
|
||
The ST Pipeline has been optimized for speed and robustness, and | ||
it is easy to use, with many parameters to adjust its settings. | ||
The ST Pipeline is fully parallel and has constant memory use. | ||
The ST Pipeline allows you to skip any of its steps and to use either the | ||
genome or the transcriptome as the reference. | ||
|
||
The following files and parameters are required unless marked as optional: | ||
|
||
- FASTQ files (read 1 containing the spatial information and the UMI, | ||
  and read 2 containing the genomic sequence) | ||
- A genome index generated with STAR | ||
- An annotation file in GTF or GFF format (optional) | ||
- The file containing the barcodes and array coordinates | ||
  (look in the folder "ids" and choose the correct one). | ||
  This file contains 3 columns (BARCODE, X and Y), | ||
  so if you provide a file with barcodes identifying cells (for example), | ||
  the ST Pipeline can be used for single cell data. | ||
  This file is also optional. | ||
- A name for the dataset | ||
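Since the barcodes file is expected to have 3 columns (BARCODE, X and Y), a quick sanity check before running the pipeline can catch malformed files. The following is a minimal sketch and not part of the ST Pipeline. It assumes the columns are separated by whitespace, and the helper ``check_ids_file`` and the sample barcodes are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if every non-empty line of the file
# has exactly three whitespace-separated fields (BARCODE, X, Y).
check_ids_file() {
    awk 'NF > 0 && NF != 3 {
             printf "line %d has %d fields\n", NR, NF
             bad = 1
         }
         END { exit (bad ? 1 : 0) }' "$1"
}

# Toy input with made-up barcodes and coordinates.
ids_tmp=$(mktemp)
printf 'ACGTACGT 1 1\nTTGCAAGC 1 2\n' > "$ids_tmp"

if check_ids_file "$ids_tmp"; then
    echo "ids file looks well-formed"
else
    echo "ids file has malformed lines"
fi
rm -f "$ids_tmp"
```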
|
||
The ST Pipeline has multiple parameters, mostly related to trimming, | ||
mapping and annotation, but the default values are generally good enough. | ||
You can see a full description of the parameters by | ||
typing ``st_pipeline_run.py --help`` after you have installed the ST Pipeline. | ||
|
||
The input FASTQ files can be given in gzip/bzip format as well. | ||
|
||
In short, the ST Pipeline performs the following steps: | ||
|
||
- Quality trimming (read 1 and read 2): | ||
  - Remove low quality bases | ||
  - Sanity check (reads same length, reads order, etc.) | ||
  - Check the quality of the UMI (if provided) | ||
  - Remove artifacts (PolyT, PolyA, PolyG, PolyN and PolyC) of user defined length | ||
  - Check for AT and GC content | ||
  - Discard reads that fall below a minimum number of bases or that failed any of the checks above | ||
- Contaminant filter, e.g. against an rRNA genome (optional) | ||
- Mapping with STAR (only read 2) | ||
- Demultiplexing with `Taggd <https://github.com/SpatialTranscriptomicsResearch/taggd>`_ (only read 1) | ||
- Keep reads (read 2) that contain a valid barcode and are correctly mapped | ||
- Annotate the reads with htseq-count (optional) | ||
- Group annotated reads by barcode (spot position) and gene to get a read count | ||
- In the grouping/counting only unique molecules (UMIs) are kept. | ||
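The last two steps (grouping by barcode and gene, keeping only unique molecules) can be illustrated on a toy table. This is a simplified sketch and not the pipeline's own implementation. It assumes a whitespace-separated input with the columns barcode, gene and UMI, it treats two UMIs as the same molecule only when they match exactly, and the helper name and the data are made up.

```shell
#!/bin/bash
# Hypothetical helper: count unique UMIs per (barcode, gene) pair.
# Input columns (whitespace-separated): barcode gene umi
count_unique_molecules() {
    sort -u "$1" |
        awk '{ n[$1 "\t" $2]++ }
             END { for (k in n) print k "\t" n[k] }' |
        sort
}

# Toy input: the first two lines are duplicates of the same molecule.
reads_tmp=$(mktemp)
printf '%s\n' \
    'AAAC GeneA UMI1' \
    'AAAC GeneA UMI1' \
    'AAAC GeneA UMI2' \
    'AAAC GeneB UMI1' \
    'CCCG GeneA UMI3' > "$reads_tmp"

count_unique_molecules "$reads_tmp"
rm -f "$reads_tmp"
```

In this toy input, GeneA at barcode AAAC is counted twice rather than three times, because the duplicated UMI1 line collapses into a single molecule.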
|
||
A more detailed graphical description of the workflow is available in the documents workflow.pdf and workflow_extended.pdf. | ||
|
||
The output will be a matrix of counts (genes as columns, spots as rows), | ||
a BED file containing the transcripts (read name, coordinate, gene, etc.), and a JSON | ||
file with useful stats. | ||
The ST pipeline will also output a log file with useful information. |
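As a quick check on the matrix of counts, you can report its dimensions. The sketch below is an assumption-laden illustration. It assumes the matrix is a tab-separated text file whose first row holds the gene names and whose first column holds the spot identifiers, which the text above does not confirm. The helper ``matrix_shape`` and the toy data are made up.

```shell
#!/bin/bash
# Hypothetical helper: print the number of spots (rows) and genes (columns)
# of a tab-separated matrix with a header row and a leading spot-id column.
matrix_shape() {
    awk -F '\t' 'NR == 1 { genes = NF - 1; next }
                 NF > 0  { spots++ }
                 END     { printf "%d spots x %d genes\n", spots, genes }' "$1"
}

# Toy matrix: 2 spots (rows) and 2 genes (columns).
matrix_tmp=$(mktemp)
printf '\tGeneA\tGeneB\n10x12\t3\t0\n10x13\t1\t4\n' > "$matrix_tmp"

matrix_shape "$matrix_tmp"   # prints "2 spots x 2 genes"
rm -f "$matrix_tmp"
```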
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
License | ||
------- | ||
|
||
The MIT License (MIT) | ||
Copyright (c) 2016 Jose Fernandez Navarro. | ||
All rights reserved. | ||
|
||
* Jose Fernandez Navarro <[email protected]> | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy of | ||
this software and associated documentation files (the "Software"), to deal in | ||
the Software without restriction, including without limitation the rights to | ||
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of | ||
the Software, and to permit persons to whom the Software is furnished to do so, | ||
subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS | ||
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR | ||
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER | ||
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN | ||
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |