Commit

address suggestions from Christopher
yuukiiwa committed Jan 21, 2022
1 parent 81874b3 commit 3aa9240
Showing 8 changed files with 81 additions and 13 deletions.
9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Major enhancements

* Add DNA variant calling functionality
* Add RNA modification and fusion detection functionalities
* Port pipeline to the updated Nextflow DSL2 syntax adopted on nf-core/modules
* Removed `--publish_dir_mode` as it is no longer required for the new syntax
* Bump minimum Nextflow version from 21.04.0 -> 21.10.3
@@ -22,16 +23,24 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
* Added `--phase_vcf` to output a phased vcf
* Added `--skip_medaka` to skip `medaka_variant`
* Added `--skip_sniffles` to skip `sniffles`
* Added `--skip_modification_analysis` to skip RNA modification detection
* Added `--skip_xpore` to skip `xpore`
* Added `--skip_m6anet` to skip `m6anet`
* Added `--skip_fusion_analysis` to skip RNA fusion detection
* Added `--jaffal_ref_dir` to indicate the reference directory path required by `JAFFAL`

### Software dependencies

| Dependency | Old version | New version |
|-------------------------|-------------|-------------|
| `bioconductor-bambu` | 1.0.2 | 2.0.0 |
| `bioconductor-bsgenome` | 1.58.0 | 1.62.0 |
| `jaffa` | | 2.0 |
| `m6anet` | | 1.0 |
| `medaka` | | 1.4.4 |
| `multiqc` | 1.10.1 | 1.11 |
| `sniffles` | | 1.0.12 |
| `xpore` | | 2.1 |

### Bug fix

9 changes: 9 additions & 0 deletions CITATIONS.md
@@ -23,6 +23,12 @@
* [Guppy](https://nanoporetech.com/nanopore-sequencing-data-analysis)

* [JAFFAL](https://doi.org/10.1186/s13059-021-02588-5)
> Davidson NM, et al., JAFFAL: detecting fusion genes with long-read transcriptome sequencing. Genome Biology (2022)
* [m6anet](https://www.biorxiv.org/content/10.1101/2021.09.20.461055v1)
> Hendra C, et al., Detection of m6A from direct RNA sequencing using a Multiple Instance Learning framework. bioRxiv (2021)
* [Minimap2](https://pubmed.ncbi.nlm.nih.gov/29750242/)
> Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018 Sep 15;34(18):3094-3100. doi: 10.1093/bioinformatics/bty191. PMID: 29750242; PMCID: PMC6137996.
@@ -54,6 +60,9 @@
* [UCSC tools](https://www.ncbi.nlm.nih.gov/pubmed/20639541/)
> Kent WJ, Zweig AS, Barber G, Hinrichs AS, Karolchik D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics. 2010 Sep 1;26(17):2204-7. doi: 10.1093/bioinformatics/btq351. Epub 2010 Jul 17. PubMed PMID: 20639541; PubMed Central PMCID: PMC2922891.
* [xPore](https://doi.org/10.1038/s41587-021-00949-w)
> Pratanwanich PN, et al., Identification of differential RNA modifications from nanopore direct RNA sequencing with xPore. Nat Biotechnol (2021)
## R packages

* [R](https://www.R-project.org/)
2 changes: 2 additions & 0 deletions README.md
@@ -48,6 +48,8 @@ On release, automated continuous integration tests run the pipeline on a [full-s
* bambu performs both transcript reconstruction and quantification.
* When StringTie2 is chosen, each sample can be processed individually and the results combined; [`featureCounts`](http://bioinf.wehi.edu.au/featureCounts/) is then used for both gene and transcript quantification.
* Differential expression analysis ([`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html) and/or [`DEXSeq`](https://bioconductor.org/packages/release/bioc/html/DEXSeq.html))
* RNA modification detection ([`xpore`](https://github.com/GoekeLab/xpore) and/or [`m6anet`](https://github.com/GoekeLab/m6anet))
* RNA fusion detection ([`JAFFAL`](https://github.com/Oshlack/JAFFA))
9. Present QC for raw read and alignment results ([`MultiQC`](https://multiqc.info/docs/))

## Quick Start
50 changes: 41 additions & 9 deletions docs/output.md
@@ -152,6 +152,24 @@ The [bigWig](https://genome.ucsc.edu/goldenpath/help/bigWig.html) format is in a

The creation of these files can be bypassed by setting the parameters `--skip_bigwig`/`--skip_bigbed`.

## Variant calling

<details markdown="1">
<summary>Output files</summary>

* `minimap2/medaka/<SAMPLE>/round_1.vcf`
VCF file with small variants for each sample.
* `minimap2/sniffles/<sample>_sniffles.vcf`
VCF files with unfiltered structural variants.

</details>

*Documentation*:
[Medaka](https://github.com/nanoporetech/medaka), [Sniffles](https://github.com/fritzsedlazeck/Sniffles)

*Description*:
If the protocol is set with `--protocol DNA` and the *Minimap2* aligner was used, the `--call_variants` parameter can be supplied to call small variants and structural variants with Medaka and Sniffles, respectively. Either step can be skipped individually with the `--skip_medaka` or `--skip_sniffles` parameter.
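As a rough post-run check, you could count the calls in each sample's VCFs with standard shell tools. The sketch below is not part of the pipeline: the helper names `count_vcf_records` and `summarise_variant_calls` are hypothetical, and it assumes that `<SAMPLE>` and `<sample>` in the paths above refer to the same sample name and that the paths are relative to your output directory.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the pipeline) for summarising the VCFs written
# by the variant-calling step. The directory layout is assumed from the list above.

# Count variant records in a VCF: every line that is not a "#" header line.
count_vcf_records() {
    # grep -c prints the count but exits 1 when it is 0; "|| true" keeps that quiet.
    grep -vc '^#' "$1" || true
}

# Print "<sample>\t<small variants>\t<structural variants>" for each Medaka sample
# directory under a results directory; "NA" if no Sniffles VCF exists for it.
summarise_variant_calls() {
    local outdir="$1" medaka_vcf sample sv_vcf small sv
    for medaka_vcf in "$outdir"/minimap2/medaka/*/round_1.vcf; do
        [ -e "$medaka_vcf" ] || continue     # the glob matched nothing
        sample=$(basename "$(dirname "$medaka_vcf")")
        small=$(count_vcf_records "$medaka_vcf")
        sv_vcf="$outdir/minimap2/sniffles/${sample}_sniffles.vcf"
        if [ -e "$sv_vcf" ]; then
            sv=$(count_vcf_records "$sv_vcf")
        else
            sv="NA"
        fi
        printf '%s\t%s\t%s\n' "$sample" "$small" "$sv"
    done
}
```

Called as `summarise_variant_calls <OUTDIR>`, it prints one tab-separated line per sample, with `NA` in the last column when no Sniffles VCF is present (for example after `--skip_sniffles`).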

## Transcript Reconstruction and Quantification

<details markdown="1">
@@ -191,8 +209,8 @@ After genomic alignment, novel transcripts can be reconstructed using tools such
<details markdown="1">
<summary>Output files</summary>

-* `<QUANTIFICATION_METHOD>/deseq2/deseq2.results.txt` - a `.txt` file that can contains differential expression results for genes.
-* `<QUANTIFICATION_METHOD>/dexseq/dexseq.results.txt` - a `.txt` file that can contains differential expression results for transcripts.
+* `<QUANTIFICATION_METHOD>/deseq2/deseq2.results.txt` - a `.txt` file that contains differential expression results for genes.
+* `<QUANTIFICATION_METHOD>/dexseq/dexseq.results.txt` - a `.txt` file that contains differential expression results for transcripts.

</details>

@@ -202,23 +220,37 @@ After genomic alignment, novel transcripts can be reconstructed using tools such
*Description*:
If multiple conditions and multiple replicates are available, the pipeline can run differential expression analysis on genes and transcripts with DESeq2 and DEXSeq, respectively. These steps won't be run if you provide the `--skip_quantification` or `--skip_differential_analysis` parameters, or if the samples in the samplesheet do not all share the same fasta and GTF reference files.
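If you want to pull out significant hits, a small `awk` filter is usually enough. This is a sketch that assumes both results files are tab-separated with one header line containing DESeq2's adjusted p-value column name, `padj`; the exact layout the pipeline writes is not shown here, so check the header of your own files first. The function name `filter_by_padj` and the default cutoff of 0.05 are illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the header plus every row whose adjusted p-value
# (column "padj") is below a cutoff. Assumes a tab-separated file with one header
# line. R's write.table() omits the row-name column from the header, so a
# one-field offset between header and data rows is tolerated.
filter_by_padj() {
    local results="$1" cutoff="${2:-0.05}"
    awk -F'\t' -v cutoff="$cutoff" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                name = $i
                gsub(/"/, "", name)          # header names may be quoted
                if (name == "padj") col = i
            }
            if (!col) { print "no padj column in header" > "/dev/stderr"; exit 1 }
            ncol = NF
            print
            next
        }
        {
            v = $(col + NF - ncol)           # shift by one if row names lack a header
            if (v != "" && v != "NA" && v + 0 < cutoff) print
        }' "$results"
}
```

For example, `filter_by_padj <QUANTIFICATION_METHOD>/deseq2/deseq2.results.txt 0.01` would keep the header and every row with `padj < 0.01`, skipping `NA` values. Looking the column up by name rather than by position keeps the filter working if extra columns are present.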

-## Variant calling
+## RNA modification analysis

<details markdown="1">
<summary>Output files</summary>

-* `minimap2/medaka/<SAMPLE>/round_1.vcf`
-VCF file with small variants for each sample.
-* `minimap2/sniffles/<sample>_sniffles.vcf`
-VCF files with unflitered structural variants.
+* `rna_modifications/xpore/diffmod/diffmod_outputs/diffmod.table` - a `.csv` file that contains differentially modified sites.
+* `rna_modifications/m6anet/inference/<sample_name>/data.result.csv.gz` - a gzipped `.csv` file that contains detected m6A sites.

</details>

*Documentation*:
-[Medaka](https://github.com/nanoporetech/medaka), [Sniffles](https://github.com/fritzsedlazeck/Sniffles)
+[xPore](https://xpore.readthedocs.io/en/latest/), [m6anet](https://m6anet.readthedocs.io/en/latest/)

*Description*:
-If the protocol is set to `--protocol DNA` and the *Minimap2* aligner was used, then the `--call_variants` parameter can be invoked to call small variants and structural variants using Medaka and Sniffles, respectively. These steps won't be run if you provide the `--skip_medaka` or `--skip_sniffles` parameters.
+If multiple conditions are available, the pipeline can run differential modification analysis with xPore, and m6anet is used to detect m6A sites in each sample. `--skip_modification_analysis` skips both steps, while `--skip_xpore` and `--skip_m6anet` skip xPore and m6anet, respectively.
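To shortlist candidate m6A sites from the per-sample m6anet output, you can filter the gzipped CSV on the modification probability. This sketch assumes the file has a header row with a `probability_modified` column (assumed to match the m6anet 1.0 output layout; verify against your own file). The function name `m6anet_sites_above` and the 0.9 cutoff are illustrative, not pipeline defaults.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the header plus every site in an m6anet result file
# whose modification probability is at least the cutoff. Assumes a gzipped,
# comma-separated file with a "probability_modified" column (layout assumed).
m6anet_sites_above() {
    local result="$1" cutoff="${2:-0.9}"
    gzip -dc "$result" | awk -F',' -v cutoff="$cutoff" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == "probability_modified") col = i
            if (!col) { print "no probability_modified column" > "/dev/stderr"; exit 1 }
            print
            next
        }
        $col + 0 >= cutoff'
}
```

For example, `m6anet_sites_above rna_modifications/m6anet/inference/<sample_name>/data.result.csv.gz` prints the retained rows unchanged, so the output can be piped into further tools as a CSV.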

## RNA fusion analysis

<details markdown="1">
<summary>Output files</summary>

* `jaffal/jaffa_results.csv` - a `.csv` file that contains detected RNA fusion results.
* `jaffal/jaffa_results.fasta` - a `.fasta` file that contains the sequences of the detected RNA fusions.

</details>

*Documentation*:
[JAFFAL](https://github.com/Oshlack/JAFFA/wiki)

*Description*:
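RNA fusions are detected with JAFFAL, using the reference files in the directory given by `--jaffal_ref_dir`. This step won't be run if you provide the `--skip_fusion_analysis` parameter.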
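To get a quick overview of how many fusions JAFFAL reported at each confidence level, you can tally one column of `jaffa_results.csv`. This is a sketch under assumptions: it expects a header containing a `classification` column (field names may be double-quoted) and no commas inside fields. The helper name `count_jaffal_classes` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally JAFFAL fusion calls per value of the
# "classification" column. Assumes comma-separated fields, optionally
# double-quoted, with no embedded commas.
count_jaffal_classes() {
    local results="$1"
    awk -F',' '
        { gsub(/"/, "") }                    # drop quotes; $0 is re-split into fields
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == "classification") col = i
            if (!col) { print "no classification column" > "/dev/stderr"; exit 1 }
            next
        }
        { n[$col]++ }
        END { for (c in n) printf "%s\t%d\n", c, n[c] }' "$results" | sort
}
```

The output is one `<classification>\t<count>` line per class, sorted by class name.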

## MultiQC

2 changes: 1 addition & 1 deletion docs/usage.md
@@ -13,7 +13,7 @@ You will need to create a file with information about the samples in your experi
| `group` | Group identifier for sample. This will be identical for replicate samples from the same experimental group. |
| `replicate` | Integer representing replicate number. Must start from `1..<number of replicates>`. |
| `barcode` | Barcode identifier attributed to that sample during multiplexing. Must be an integer. |
-| `input_file` | Full path to FastQ file if previously demultiplexed or a BAM file if previously aligned. FastQ File has to be zipped and have the extension ".fastq.gz" or ".fq.gz". BAM file has to have the extension ".bam". |
+| `input_file` | Full path to a FastQ file if previously demultiplexed, a BAM file if previously aligned, or a directory with `fastq` and `fast5` subdirectories for RNA modification detection. FastQ files must be gzipped and have the extension ".fastq.gz" or ".fq.gz". BAM files must have the extension ".bam". |
| `genome` | Genome fasta file for alignment. This can either be blank, a local path, or the appropriate key for a genome available in [iGenomes config file](../conf/igenomes.config). Must have the extension ".fasta", ".fasta.gz", ".fa" or ".fa.gz". |
| `transcriptome` | Transcriptome fasta/gtf file for alignment. This can either be blank or a local path. Must have the extension ".fasta", ".fasta.gz", ".fa", ".fa.gz", ".gtf" or ".gtf.gz". |
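Before launching a run, you may want to check that every `input_file` value matches one of the three accepted forms described in the table. The hypothetical helper below mirrors those rules using only file names and directory structure; it does not check that FastQ or BAM files exist or are valid, and the pipeline's own checks may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the `input_file` rules: a gzipped FastQ
# (.fastq.gz/.fq.gz), a BAM (.bam), or a directory holding `fastq` and `fast5`
# subdirectories. Prints the detected kind, or returns 1 if none matches.
classify_input_file() {
    local path="$1"
    if [ -d "$path" ]; then
        if [ -d "$path/fastq" ] && [ -d "$path/fast5" ]; then
            echo "directory"
            return 0
        fi
        return 1                              # directory without both subdirectories
    fi
    case "$path" in
        *.fastq.gz|*.fq.gz) echo "fastq" ;;
        *.bam)              echo "bam" ;;
        *)                  return 1 ;;       # e.g. an uncompressed .fastq
    esac
}
```

For instance, `classify_input_file /data/sample1.fq.gz` prints `fastq`, while an uncompressed `.fastq` path makes it return a non-zero status.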

10 changes: 7 additions & 3 deletions modules/local/jaffal.nf
@@ -7,7 +7,7 @@ process JAFFAL {
container "docker.io/yuukiiwa/jaffa:2.0"
//container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
// 'https://depot.galaxyproject.org/singularity/jaffa:2.00--hdfd78af_1' :
-// 'quay.io/biocontainers/jaffa:2.00--hdfd78af_1' }"//tried three biocontainers, all of them got command not found for minimap2
+// 'quay.io/biocontainers/jaffa:2.00--hdfd78af_1' }"//tried three biocontainers, all of them got command not found for minimap2

input:
tuple val(meta), path(fastq)
@@ -16,11 +16,15 @@ process JAFFAL {
output:
tuple val(meta), path("*.fasta") ,emit: jaffal_fastq
path "*.csv" ,emit: jaffal_results
-path "*_version.txt" ,emit: version
+path "versions.yml" , emit: versions

script:
"""
bpipe run -p refBase=$jaffal_ref_dir $jaffal_ref_dir/JAFFAL.groovy $fastq
-echo 'jaffa 2.0' > jaffal_version.txt
+cat <<-END_VERSIONS > versions.yml
+"${task.process}":
+jaffa: \$( echo 'jaffa 2.0' )
+END_VERSIONS
"""
}
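In the module, Nextflow substitutes `${task.process}` and turns `\$` into `$` before Bash runs the script, so the heredoc is ordinary shell by the time it executes. The sketch below reproduces that step by hand so the resulting `versions.yml` can be inspected outside Nextflow; the function name and the process name passed to it are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Stand-alone reproduction of the versions.yml heredoc in the JAFFAL script block,
# with ${task.process} replaced by a process name passed in by the caller.
write_jaffal_versions() {
    local process_name="$1" outdir="${2:-.}"
    # Plain << keeps the indentation below literally; the module's <<- would
    # additionally strip leading tab characters from each body line.
    cat <<END_VERSIONS > "$outdir/versions.yml"
"${process_name}":
    jaffa: $( echo 'jaffa 2.0' )
END_VERSIONS
}
```

Running `write_jaffal_versions "NANOSEQ:JAFFAL"` writes a two-line YAML file whose value is `jaffa 2.0`, tool name included, whereas the xPore module strips the tool name with `sed` and records only the version number.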
6 changes: 6 additions & 0 deletions modules/local/m6anet_dataprep.nf
@@ -10,12 +10,18 @@ process M6ANET_DATAPREP {

output:
tuple val(meta), path("$meta.id"), emit: dataprep_outputs
path "versions.yml" , emit: versions

script:
"""
m6anet-dataprep \\
--eventalign $eventalign \\
--out_dir $meta.id \\
--n_processes $task.cpus
cat <<-END_VERSIONS > versions.yml
"${task.process}":
m6anet: \$( echo 'm6anet 1.0' )
END_VERSIONS
"""
}
6 changes: 6 additions & 0 deletions modules/local/xpore_dataprep.nf
@@ -12,6 +12,7 @@ process XPORE_DATAPREP {

output:
tuple val(meta), path("$meta.id"), emit: dataprep_outputs
path "versions.yml" , emit: versions

script:
"""
@@ -20,5 +21,10 @@ process XPORE_DATAPREP {
--out_dir $meta.id \\
--n_processes $task.cpus \\
--genome --gtf_or_gff $gtf --transcript_fasta $genome
cat <<-END_VERSIONS > versions.yml
"${task.process}":
xpore: \$( xpore --version | sed -e 's/xpore version //g' )
END_VERSIONS
"""
}
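The `sed` expression in `XPORE_DATAPREP` yields a bare version number only if `xpore --version` prints a line of the form `xpore version <number>`; that output format is implied by the expression rather than verified here. The sketch feeds sample strings through the same substitution so its behaviour can be checked without xPore installed; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Same substitution as in XPORE_DATAPREP: delete every occurrence of the literal
# prefix "xpore version " from stdin and keep whatever remains.
parse_xpore_version() {
    sed -e 's/xpore version //g'
}

# Example with a stand-in for `xpore --version` output (format assumed):
# echo 'xpore version 2.1' | parse_xpore_version    # -> 2.1
```

If the tool printed something without that prefix, `sed` would pass it through unchanged, so `versions.yml` would record the raw string instead of a bare version.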
