Starting with FASTQ files as input, this pipeline has the following stages:
- Pre-alignment QC with fastp (adapter trimming + QC) and MultiQC report collation
- Alignment with STAR
- Post-alignment QC and analysis with:
  - Infers library strandedness with RSeQC infer_experiment.py
  - Outputs Transcript Integrity Number (TIN) with RSeQC tin.py for evenness of coverage assessment
  - Outputs wig and normalised bigwig files for genome browser visualisations with RSeQC bam2wig and the UCSC wigToBigWig binary
- Quantification with featureCounts from Subread at the exon level, grouped by gene_id into meta-features.
- [Conditional] If there are >=2 sample conditions and >=3 samples, we carry out the R secondary analysis, which performs:
  - Differential gene expression across all pairwise comparisons
  - Overrepresentation analysis for pathway enrichment
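
The conditional stage can be illustrated with a short Python sketch. This is not the pipeline's own code (the pipeline implements this logic in WDL and R); the function names and the sample-to-condition mapping are hypothetical and only show the ">=2 conditions and >=3 samples" gate and how "all pairwise comparisons" enumerates condition pairs.

```python
from itertools import combinations


def should_run_secondary_analysis(sample_conditions):
    """Return True when there are >=2 conditions and >=3 samples.

    `sample_conditions` maps sample name -> condition label
    (a hypothetical structure chosen for this illustration).
    """
    n_samples = len(sample_conditions)
    n_conditions = len(set(sample_conditions.values()))
    return n_conditions >= 2 and n_samples >= 3


def pairwise_comparisons(sample_conditions):
    """List every unordered pair of distinct conditions, sorted by name."""
    conditions = sorted(set(sample_conditions.values()))
    return list(combinations(conditions, 2))


# Hypothetical sample sheet: four samples across three conditions.
samples = {"s1": "control", "s2": "treated", "s3": "treated", "s4": "knockout"}
run_secondary = should_run_secondary_analysis(samples)
comparisons = pairwise_comparisons(samples)
```

With three conditions this yields three comparisons; in general, n conditions give n(n-1)/2 pairs.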
Name | Type | Extension | Description |
---|---|---|---|
fastpQcHtml | pre-alignment qc | .html files | fastp html reports |
multiqcHtml | pre-alignment qc | .html file | multiqc html report |
bamFiles | alignment results | .bam files | STAR sorted-by-coordinate bams |
starLogs | post-alignment qc | log files | STAR log.final.out files |
flagstat | post-alignment qc | log files | flagstat files |
tin_summary | post-alignment qc | .txt files | rseqc tin summary files |
tin_xls | post-alignment qc | .xls files | rseqc tin xls files |
wigs | alignment results | .wig files | wig files |
normalised bigwigs | alignment results | .bigwig files | bigwig files |
countMatrix | quantification results | .txt | featureCounts count matrix |
countsParsed | quantification results | .txt | Output of parse_counts.py which provides gene names |
countsSummary | quantification results | .txt | featureCounts summary file |
downstreamResDir | R secondary analysis results | .zip | R analysis results files |
- dockerBase = 026171442599.dkr.ecr.ap-southeast-1.amazonaws.com/
- dockerPrefix = species + genomeVersion
- E.g. “human” + “grch38”
- dockerUri = dockerBase + dockerPrefix
For other species, replace the Docker image name according to this naming convention.
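
As a minimal sketch of the naming convention above, the following Python helper concatenates the three parts. The `image_suffix` parameter is an assumption inferred from image names such as `humangrch38rnaseq` in the module table; the pipeline itself may assemble the full image name differently.

```python
# Registry base taken from the convention above.
DOCKER_BASE = "026171442599.dkr.ecr.ap-southeast-1.amazonaws.com/"


def docker_uri(species, genome_version, image_suffix=""):
    """Build dockerUri = dockerBase + dockerPrefix (+ optional suffix).

    `image_suffix` (e.g. "rnaseq") is a hypothetical parameter used here
    to illustrate how module-specific image names could be formed.
    """
    docker_prefix = species + genome_version
    return DOCKER_BASE + docker_prefix + image_suffix


human_uri = docker_uri("human", "grch38")
mouse_star_uri = docker_uri("mouse", "grcm39", "rnaseq")
```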
Index | Module | Docker image name | Scripts/apps |
---|---|---|---|
1 | countArrayUniqueItems.wdl | ubuntu | None |
2 | fastp.wdl | qc (apps_refless) | None |
3 | multiQc.wdl | qc (apps_refless) | None |
4 | pairsToR1R2.wdl | None | None |
5 | star.wdl | humangrch38rnaseq (fat Docker) | (ref_files/species): STAR indices, annotation.gtf |
6 | samtools.wdl | samtools (apps_refless) | None |
7 | rseqc.wdl | humangrch38rseqc (apps) | (ref_files/species): annotation.bed, houseKeepingGenes.bed, chrNameLength.txt; (ref_files/apps): wigToBigWig; (apps): processInferExperimentScript.py |
8 | featureCounts.wdl | humangrch38featureCounts (apps) | (ref_files/species): annotation.gtf; (apps) parse_counts.py |
9 | downstreamRNAseq.wdl | downstreamRNAseq (apps_refless) | (apps): main.R |
- Main: RNAseq repository
- Dockerfiles and configurations
- Modules (check out the rnaseq branch)
The `options.json` file is a dummy file required for running the `cromshell submit` subcommand. Currently, the Cromwell server is reachable at the following address: `172.18.149.93:7098`.
cromshell submit ./main.wdl ./tests/tests_biodebian/rnaseq_mouse_grcm39.json ./options.json ./modules.zip
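
If a placeholder `options.json` is needed, it could be generated as in the sketch below. This assumes an empty JSON object is acceptable as the dummy options file; the repository's own `options.json` may contain different content, and the helper name is hypothetical. The demo writes to a temporary directory so it does not overwrite an existing file.

```python
import json
import tempfile
from pathlib import Path


def write_dummy_options(path):
    """Write an empty JSON object to `path` and return it as a Path.

    Assumption: an empty object is enough for a placeholder options file.
    """
    path = Path(path)
    path.write_text(json.dumps({}) + "\n")
    return path


# Demo: create the file in a temporary directory.
demo_dir = Path(tempfile.mkdtemp())
options_path = write_dummy_options(demo_dir / "options.json")
```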