diff --git a/README.md b/README.md
index f377b28..1b759f7 100644
--- a/README.md
+++ b/README.md
@@ -36,23 +36,13 @@ To run the pipeline you have create experiment metadata files:
 
 and samplesheet (`samplesheet.csv`). We provide test example [here](assets/samplesheet.csv).
 
-Next, you have to generate genome references to incorporate ERCC spike-ins. References are downloaded from [GENCODE](https://www.gencodegenes.org) database.
-
-```bash
-nextflow run nf-core/marsseq \
-  -profile \
-  --genome \
-  --build_references \
-  --input samplsheet.csv \
-  --outdir
-```
-
 Now, you can run the pipeline using:
 
 ```bash
 nextflow run nf-core/marsseq \
   -profile \
-  --genome \
+  --fasta https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M32/GRCm39.primary_assembly.genome.fa.gz \
+  --gtf https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M32/gencode.vM32.annotation.gtf.gz \
   --input samplesheet.csv \
   --outdir
 ```
diff --git a/docs/images/workflow.png b/docs/images/workflow.png
index 2a70cce..c587481 100644
Binary files a/docs/images/workflow.png and b/docs/images/workflow.png differ
diff --git a/docs/output.md b/docs/output.md
index 57253fc..5baffdf 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -10,7 +10,7 @@ The directories listed below will be created in the results directory after the
 
 The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
 
-- [Download and build references](#download-and-build-references) - Build references needer to run the pipeline
+- [Prepare genome](#prepare-genome) - Build references needed to run the pipeline
 - [Prepare pipeline](#prepare-pipeline)
 - [Label reads](#label-reads)
 - [Align reads](#align-preads)
@@ -26,55 +26,35 @@ The pipeline is executed per `Batch` and therefore the folder structure looks li
 
 ```console
 results/
-|-- multiqc
-|-- pipeline_info
-|-- references
-`--
+├── multiqc
+├── pipeline_info
+├── references
+└── SB26
+    ├── data
+    ├── fastqc
+    ├── output
+    ├── QC
+    ├── SB26.sam
+    └── velocity
 ```
 
-## Download and build references
+## Prepare genome
 
-
-Output files
+The pipeline requires ERCC (spike-ins) to be included in the reference genome. To
+accommodate this, the pipeline requires `fasta` and `gtf` reference files. We recommend
+using files from [GENCODE](https://www.gencodegenes.org). Reference indexes are built
+based on the `--aligner` parameter.
 
 ```console
-.
-└──
-    ├── bowtie2
-    │   ├── .1.bt2
-    │   ├── .2.bt2
-    │   ├── .3.bt2
-    │   ├── .4.bt2
-    │   ├── .rev.1.bt2
-    │   └── .rev.2.bt2
-    ├── .fa
-    ├── .gtf
-    ├── star
-    │   ├── chrLength.txt
-    │   ├── chrNameLength.txt
-    │   ├── chrName.txt
-    │   ├── chrStart.txt
-    │   ├── exonGeTrInfo.tab
-    │   ├── exonInfo.tab
-    │   ├── geneInfo.tab
-    │   ├── Genome
-    │   ├── genomeParameters.txt
-    │   ├── Log.out
-    │   ├── SA
-    │   ├── SAindex
-    │   ├── sjdbInfo.txt
-    │   ├── sjdbList.fromGTF.out.tab
-    │   ├── sjdbList.out.tab
-    │   └── transcriptInfo.tab
-    └── versions.yml
+results/references
+├── bowtie2
+├── gencode.vM32.annotation.gtf
+├── GRCm39.primary_assembly.genome_ercc.fa
+├── GRCm39.primary_assembly.genome.fa
+├── star
+└── versions.yml
 ```
 
-
-
-The pipeline downloads references from GENCODE database. This is required, because
-the MARS-seq is using ERCC spike-ins, which have to be appended. Next it builds
-bowtie2 index. If `--velocity` flag is set, star index is also built.
-
 ## Prepare pipeline
 
@@ -85,7 +65,6 @@ bowtie2 index. If `--velocity` flag is set, star index is also built.
   - `gene_intervals.txt`: Information about gene (chromosome, start, end, strand and symbol)
   - `seq_batches.txt`: Sequencing batches
   - `wells_cells.txt`: Well cells
-  - `*fastq.gz`: Raw reads
@@ -121,11 +100,11 @@ folder.
 Split reads are aligned using `bowtie2`. Next, all the aligned reads are merged into
 one `SAM` file which is used as an input for demultiplexing.
 
-If `--velocity` flag is set, the reads are also aligned using `StarSolo` to estimated
-both spliced and unspliced reads which can be used for RNA velocity estimation.
-This is an additional plugin which we developed. In short MARS-seq2.0 reads are
-converted to `10X v2` format. Additionally, a whitelist is generated for aligned
-to perform demultiplexing.
+If the `--aligner` flag is set to `bowtie2_star` or `star`, the reads are also aligned
+using `StarSolo` to estimate both spliced and unspliced reads which can be used
+for RNA velocity estimation. This is an additional plugin which we developed.
+In short, MARS-seq2.0 reads are converted to `10X v2` format. Additionally, a
+whitelist is generated for aligned to perform demultiplexing.
 
 Output files
@@ -133,13 +112,11 @@ to perform demultiplexing.
 - ``
   - `.sam`: Merged aligned reads into one SAM file with `bowtie2`
   - `velocity/`
-    - `Solo.out/*`: Output from StarSolo (Gene, GeneFull, SJ, Velocyto and Barcode.stats)
-    - `Aligned.sortedByCoord.out.bam`: Aligned reads
-    - `Log.final.out`: STAR alignment report containing the mapping results summary
-    - `Log.out` and `Log.progress.out`: STAR log files containing detailed information about the run. Typically only useful for debugging purposes
-    - `.cutadapt.log`: Log file from running `cutadapt`
     - `_{1,2}.trim.fastq.gz`: Trimmed pair-end converted `10X v2` reads
-    - `SJ.out.tab`: File containing filtered splice junctions detected after mapping the reads
+    - `.cutadapt.log`: Log file from running `cutadapt`
+    - `.Log.final.out`: STAR alignment report containing the mapping results summary
+    - `.Log.out` and `.Log.progress.out`: STAR log files containing detailed information about the run. Typically only useful for debugging purposes
+    - `.Solo.out/*`: Output from StarSolo (Gene, GeneFull, SJ, Velocyto and Barcode.stats)
     - `whitelist.txt`: File containing cell barcodes (combination of pool and cell barcode)
diff --git a/docs/usage.md b/docs/usage.md
index 369498e..d56d951 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -59,7 +59,7 @@ An [example samplesheet](../assets/samplesheet.csv) has been provided with the p
 The typical command for running the pipeline is as follows:
 
 ```bash
-nextflow run nf-core/marsseq --input ./samplesheet.csv --outdir ./results --genome GRCh37 -profile docker
+nextflow run nf-core/marsseq --input ./samplesheet.csv --outdir ./results --fasta genome.fasta --gtf annotation.gtf -profile docker
 ```
 
 This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.
@@ -92,7 +92,8 @@ with:
 ```yaml title="params.yaml"
 input: './samplesheet.csv'
 outdir: './results/'
-genome: 'GRCh37'
+fasta: 'genome.fasta'
+gtf: 'annotation.gtf'
 <...>
 ```