layout	website	subdomain
subsite-galaxy	https://virology.usegalaxy.eu	rna

Welcome to the European Virology Galaxy flavor

{:.no_toc}

{:.rna-intro-right}

The Virology Galaxy workbench is a comprehensive set of analysis tools and consolidated workflows. The workbench is based on the Galaxy framework, which guarantees simple access, easy extension, flexible adaption to personal and security needs, and sophisticated analyses independent of command-line knowledge.

Content

{:.no_toc}

TOC {:toc}

The SARS-CoV-2 project

Are you new to Galaxy, or returning after a long time, and looking for help to get started? Take [a guided tour]({{ page.website }}/tours/core.galaxy_ui){:target="_blank"} through Galaxy's user interface.

All data that you need is available on GitHub or in our dedicated data library. Otherwise, we will start with the preprocessing of raw SARS-CoV-2 reads.

Preprocessing of raw SARS-CoV-2 reads

The raw reads available so far are generated from bronchoalveolar lavage fluid (BALF) and are metagenomic in nature: they contain human reads, reads from potential bacterial co-infections as well as true COVID-19 reads.

What's the point?

Assess quality of reads, remove adapters and remove reads mapping to human genome.

The outline

Illumina and Oxford nanopore reads are pulled from the NCBI SRA (links to SRA accessions are available here). They are then processed separately as described in the workflow section.

Inputs

Only SRA accessions are required for this analysis. The described analysis was performed with all SRA SARS-CoV accessions available as of Feb 20, 2020:

Illumina reads
```
SRR10903401
SRR10903402
SRR10971381
```
Oxford Nanopore reads
```
SRR10948550
SRR10948474
SRR10902284
```

Outputs

This workflow produces three outputs that are used in tow subsequent analyses

`#`	Output	Used in
1	A combined set of adapter-free Illumina reads without human contamination	Assembly
2	A combined set of Oxford Nanopore reads without human contamination	Assembly
3	A collection of adapter-free Illumina reads from which human reads have not been removed	Variation detection
{:.table.table-striped}

The history and the workflow

A Galaxy workspace (history) containing the most current analysis can be imported from here.

The publicly accessible workflow can be downloaded and installed on any Galaxy instance. It contains version information for all tools used in this analysis.

The workflow performs the following steps:

Illumina

Illumina reads are QC'ed and adapter sequences are removed using fastp
Quality metrics are computed and visualized using fastqc and multiqc
Reads are mapped against human genome version hg38 using bwa mem
Reads that do not map to hg38 are filtered out using samtools view
Reads are converted back to fastq format using samtools fastx

Oxford nanopore

Reads are QC'ed using nanoplot
Quality metrics are computed and visualized using fastqc and multiqc
Reads are mapped against human genome version hg38 using minimap2
Reads that do not map to hg38 are filtered out using samtools view
Reads are converted back to fastq format using samtools fastx

{:height="50%" width="50%"}

Assembly of SARS-CoV-2 from pre-processed reads

What's the point?

Use a combination of Illumina and Oxford Nanopore reads to produce SARS-CoV-2 genome assembly.

Outline

We use Illumina and Oxford Nanopore reads that were pre-processed to remove human-derived sequences. We use two assembly tools: spades and unicycler. While spades is a tool fully dedicated to assembly, unicycler is a "wrapper" that combines multiple existing tools. It uses spades as an engine for short read assembly while utilizing mimiasm and racon for assembly of long noisy reads.

In addition to assemblies (actual sequences) the two tools produce assembly graphs that can be used for visualization of assembly with bandage.

Inputs

Filtered Illumina and Oxford Nanopore reads produced during the pre-processing step are used as inputs to the assembly tools.

Outputs

Each tool produces assembly (contigs) and assembly graph representations. The largest contigs generated by unicycler and spades were 29,781 and 29,907 nts, respectively, and had 100% identity over their entire length.

The following figures show visualizations of assembly graphs produced with spades and unicycler. The complexity of the graphs is not surprising given the metagenomic nature of the underlying samples.

Assembly graphs for Unicycler (A) and SPAdes (B)
{:height="50%" width="50%"}
A. Unicycler assembly graph
{:height="50%" width="50%"}
B. SPAdes assembly graph

History and workflow

A Galaxy workspace (history) containing the most current analysis can be imported from here.

The publicly accessible workflow can be downloaded and installed on any Galaxy instance. It contains version information for all tools used in this analysis.

{:height="50%" width="50%"}

Dating the most common recent ancestor (MCRA) of SARS-CoV-2

What's the point?

For this we used simple root-to-tip regression Korber et al. 2000 (more complex and powerful phylodynamics methods could certainly be used, but for this data with very low levels of sequence divergence, simpler and faster methods suffice). Using a set of sequences from all COVID-19 sequences available as of Feb 16, 2020 we obtained an MCRA date of Nov 14, 2019, which is close to other existing estimates Rambaut 2020.

Outline

This analysis consists of two components - a Galaxy workflow and a Jupyter notebook.

The workflow is used to extract full length sequences of SARS-CoV-2, tidy up their names in FASTA files, produce a multiple sequences alignment and compute a maximum likelihood tree.

The Jupyter notebook is used to correlate branch lengths with collection dates in order to estimate MCRA timing.

Inputs

One input is required: a comma-separated file containing accession numbers and collection dates:

Accession,Collection_Date
MT019531,2019-12-30
MT019529,2019-12-23
MT007544,2020-01-25
MN975262,2020-01-11
...

An up-to-date version of this file can be generated directly from the NCBI Virus resource by

searching for SARS-CoV-2 (NCBI taxid: 2697049) sequences
configuring the list of results to display only the Accession and Collection date columns
downloading the Current table view result in CSV format

The collection dates will be taken from the corresponding GenBank record's /collection_date tag.

Outputs

History and workflow

A Galaxy workspace (history) containing the most current analysis can be imported from here.

The publicly accessible workflow can be downloaded and installed on any Galaxy instance. It contains version information for all tools used in this analysis.

{:height="50%" width="50%"}

Analysis of variation within individual COVID-19 samples

What's the point?

To understand the amount of heterogeneity in individual COVID-19 isolates.

Outline

As of writing (2/13/2020) there were just three Illumina datasets from COVID-19 patients:

- sra-study: SRP242226
  bioproject: PRJNA601736
  biosample: SAMN13872787
  sra-sample: SRS6007144
  sra-experiment: SRX7571571
  sra-run: SRR10903401

- sra-study: SRP242226
  bioproject: PRJNA601736
  biosample: SAMN13872786
  sra-sample: SRS6007143
  sra-experiment: SRX7571570
  sra-run: SRR10903402

- sra-study: SRP245409
  bioproject: PRJNA603194
  biosample: SAMN13922059
  sra-sample: SRS6067521
  sra-experiment: SRX7636886
  sra-run: SRR10971381

To understand the extent of sequence variation within these samples we performed the following analysis. First, we used a Galaxy workflow to perform the following steps:

Mapped all reads against COVID-19 reference NC_045512.2 using bwa mem
Filtered reads with mapping quality of at least 20, that were mapped as proper pairs
Performed realignments using lofreq viterbi
Called variants using lofreq call
Annotated variants using snpeff against database created from NC_045512.2 GenBank file
Converted VCFs into tab delimited datasets

Next, we analyzed this tab delimited data in a Jupyter notebook.

Inputs

Workflow

GenBank file for the reference COVID-19 genome.

The GenBank record is used by snpeff to generate a database for variant annotation.
Set of illumina reads (in this case a collection of unfiltered reads from SRR10903401, SRR10903402, and SRR10971381)

Jupyter notebook

The Jupyter notebook requires the GenBank file (#1 from above) and the output of the workflow described below.

Outputs

The workflow produces a table of variants that looks like this:

	Sample	CHROM	POS	REF	ALT	DP	AF	SB	DP4	IMPACT	FUNCLASS	EFFECT	GENE	CODON
0	SRR10903401	NC_045512	1409	C	T	124	0.040323	1	66,53,2,3	MODERATE	MISSENSE	NON_SYNONYMOUS_CODING	orf1ab	Cat/Tat
1	SRR10903401	NC_045512	1821	G	A	95	0.094737	0	49,37,5,4	MODERATE	MISSENSE	NON_SYNONYMOUS_CODING	orf1ab	gGt/gAt
2	SRR10903401	NC_045512	1895	G	A	107	0.037383	0	51,52,2,2	MODERATE	MISSENSE	NON_SYNONYMOUS_CODING	orf1ab	Gta/Ata
3	SRR10903401	NC_045512	2407	G	T	122	0.024590	0	57,62,1,2	MODERATE	MISSENSE	NON_SYNONYMOUS_CODING	orf1ab	aaG/aaT
4	SRR10903401	NC_045512	3379	A	G	121	0.024793	0	56,62,1,2	LOW	SILENT	SYNONYMOUS_CODING	orf1ab	gtA/gtG

Here, most fields names are descriptive. SB = the Phred-scaled probability of strand bias as calculated by lofreq (0 = no strand bias); DP4 = strand-specific depth for reference and alternate allele observations (Forward reference, reverse reference, forward alternate, reverse alternate).

The variants we identified were distributed across the SARS-CoV-2 genome in the following way:

The following table describes variants with frequencies above 10%:

History and workflow

A Galaxy workspace (history) containing the most current analysis can be imported from here.

The publicly accessible workflow can be downloaded and installed on any Galaxy instance. It contains version information for all tools used in this analysis.

{:height="50%" width="50%"}

Alignment of COVID-19 Spike protein with homologs from other coronaviruses

What's the point?

Aligning Spike protein sequences to detect structural variations and impact of polymorphisms.

Outline

We generate a codon alignment for a set of coronaviruses in order to track polymorphisms uncovered by the analysis of variation in individual samples.

Input

Downloaded CDS sequences of coronavirus Spike proteins from NCBI Viral Resource for the following coronaviruses:

FJ588692.1	Bat SARS Coronavirus Rs806/2006
KR559017.1	Bat SARS-like coronavirus BatCoV/BB9904/BGR/2008
KC881007.1	Bat SARS-like coronavirus WIV1
KT357810.1	MERS coronavirus isolate Riyadh_1175/KSA/2014
KT357811.1	MERS coronavirus isolate Riyadh_1337/KSA/2014
KT357812.1	MERS coronavirus isolate Riyadh_1340/KSA/2014
KF811036.1	MERS coronavirus strain Tunisia-Qatar_2013
AB593383.1	Murine hepatitis virus
AF190406.1	Murine hepatitis virus strain TY
AY687355.1	SARS coronavirus A013
AY687356.1	SARS coronavirus A021
AY687361.1	SARS coronavirus B029
AY687365.1	SARS coronavirus C013
AY687368.1	SARS coronavirus C018
AY648300.1	SARS coronavirus HHS-2004
DQ412594.1	SARS coronavirus isolate CUHKtc10NP
DQ412596.1	SARS coronavirus isolate CUHKtc14NP
DQ412609.1	SARS coronavirus isolate CUHKtc32NP
MN996528.1	nCov-2019
MN996527.1	nCov-2019
NC_045512.2	nCov-2019
NC_002306.3	Feline infectious peritonitis virus
NC_028806.1	Swine enteric coronavirus strain Italy/213306/2009
NC_038861.1	Transmissible gastroenteritis virus

Output

We produce two alignments, one at the nucleotide and one at the amino acid level, of Betacoronavirus spike proteins. The alignments can be visualized with the Multiple Sequence Alignment visualization in Galaxy :


{:height="50%" width="50%"}

Or with locally installed softwares, here AliView.

Alignments of Spike proteins
{:height="50%" width="50%"}
A. CDS alignments
{:height="50%" width="50%"}
B. Protein alignment

Workflow

The Galaxy history containing the latest analysis can be found here. The publicly accessible workflow can be downloaded and installed on any Galaxy instance. It contains all information about tool versions and parameters used in this analysis.

{:height="50%" width="50%"}

The Transeq tool converts the CDS sequences into protein sequences, which we then align to each other using MAFFT. The output is fed into tranalign along with the nucleotide sequences. tranalign produces a nucleotide alignment coherent with the protein alignment.

Evolutionary Analysis

What's the point?

Wu et al. showed recombination between COVID-19 and bat coronaviruses located within the S-gene. We want to confirm this observation and provide a publicly accessible workflow for recombination detection.

In previous coronovirus outbreaks (SARS), retrospecive analyses determined that adaptive substitutions might have occured in the S-protein Zhang et al., e.g., related to ACE2 receptor utilization. While data on COVID-19 are currenly limited, we investigated whether or not the lineage leading to them showed any evidence of positive diversifying selection.

Outline

We employ a recombination detection algorithm (GARD) developed by Kosakovsky Pond et al. and implemented in the hyphy package. To select a representative set of S-genes we perform a blast search using the S-gene CDS from NC_045512 as a query against the nr database. We select coding regions corresponding to the S-gene from a number of COVID-19 genomes, original SARS isolates. This set of sequences can be found in this repository

We then generate a codon-based alignment using the workflow shown below and perform the recombination analysis using the gard tool from the hyphy package.

For selection analyses, we apply the Adaptive Branch Site Random Effects method to test whether or each branch of the tree shows evidence of diversifing positive selection along a fraction of sites using the absrel tool from the hyphy package.

Inputs

A set of unaligned CDS sequences for the S-gene.

Additionally, for aBSREL, a phylogenetic tree (for aBSREL).

Outputs

A recombination report:

{:height="50%" width="50%"}

and a map of possible recombination hotspots:

{:height="50%" width="50%"}

A selection analysis summary and tree (COVID-19 isolate is MN988668_1)

{:height="50%" width="50%"}

and a plot of the inferred ω distribution for the MN988668_1 branch.

{:height="50%" width="50%"}

History and workflow

TODO: add aBSREL workflow

A Galaxy workspace (history) containing the most current analysis can be imported from here.

The publicly accessible workflow can be downloaded and installed on any Galaxy instance. It contains version information for all tools used in this analysis.

{:height="50%" width="50%"}

The workflow takes unaligned CDS sequences, translates them with EMBOSS:tanseq, aligns translations using mafft, realigns original CDS input using the mafft alignment as a guide and sends this codon-based alignment to gard.

Comparative analysis of coronovirus sequences

What's the point?

What is the phylogenetic relationship between assembled sequences and other coronaviruses?

Outline

We mapped Unicycler assembly produces at step 1 against nr database at NCBI using blastn and downloaded hit table. This analysis indicated that our assembly is 100% identical to NC_045512.

We then used this hit table to Galaxy workflow that:

downloaded sequences
aligned downloaded sequences against COVID-19 reference NC_045512 using lastz
identified sequences that align with at least 75% of NC_045512
created multiple alignment of sequences from the previous step using mafft
computed a maximum likelihood tree using iqtree

Inputs

The analysis takes two inputs:

Hit table generated by blast
Genbank file for COVID-19 reference genome NC_045512

The hit table has the following format:

COVID-19,MN988668.1,100.000,29781,0,0,1,29781,29838,58,0.0,54996
COVID-19,NC_045512.2,100.000,29781,0,0,1,29781,29839,59,0.0,54996
COVID-19,MN994468.1,99.993,29781,2,0,1,29781,29839,59,0.0,54985
COVID-19,MN985325.1,99.990,29781,3,0,1,29781,29839,59,0.0,54979
COVID-19,MN938384.1,99.990,29781,3,0,1,29781,29807,27,0.0,54979
COVID-19,MN997409.1,99.987,29781,4,0,1,29781,29839,59,0.0,54974
COVID-19,MN975262.1,99.983,29781,5,0,1,29781,29839,59,0.0,54968
....

the workflow:

extracts accession numbers (the second column) for hit table
downloads all corresponding FASTA files
aligned them using lastz
selected all sequences that align over at least 75% of the reference
uses these sequences to create multiple alignment and phylogenetic tree

Outputs

A multiple alignment of sequences that align over at least 75% of the reference
A maximum likelihood tree

History and workflow

Galaxy workspace (history) containing the most current analysis can be imported from here.

The workflow is available at Galaxy public site and can downloaded and installed on any Galaxy instance. It contains version information for all tools used in this analysis.

the workflow performs the following steps:

extracts accession numbers (the second column) for hit table
downloads all corresponding FASTA files
aligned them using lastz
selected all sequences that align over at least 75% of the reference
uses these sequences to create multiple alignment and phylogenetic tree

{:height="50%" width="50%"}

Training

We are passionate about training. So we are working in close collaboration with the Galaxy Training Network (GTN){:target="_blank"} to develop training materials of data analyses based on Galaxy {% cite batut2017community %}. These materials hosted on the GTN GitHub repository are available online at https://training.galaxyproject.org{:target="_blank"}.

Want to learn more about RNA analyses? Take one of our guided tour or check out the following hands-on tutorials. We developed several tutorials and the remaining are from the GTN community (marked with )

Lesson	Slides	Hands-on	Input dataset	Workflows	Galaxy tour	Galaxy History
Introduction to Transcriptomics	{:target="_blank"}
RNA-seq counts to genes		{:target="_blank"}	{:target="_blank"}
RNA-seq genes to pathways		{:target="_blank"}	{:target="_blank"}
RNA-Seq reads to counts		{:target="_blank"}	{:target="_blank"}
Analyse unaligned ncRNAs				[]({{ page.website }}/workflows/run?id=5cd167ed9e159e73){:target="_blank"}
CLIP-Seq data analysis from pre-processing to motif detection		{:target="_blank"}	{:target="_blank"}	[]({{ page.website }}/workflows/run?id=f5be5bcf9b9f171c){:target="_blank"}		[]({{ page.website }}/u/joerg-fallmann/h/eclipworkflow){:target="_blank"}
De novo transcriptome reconstruction with RNA-Seq		{:target="_blank"}	{:target="_blank"}	[]({{ page.website }}/workflows/run?id=f026c4b8341ff94c){:target="_blank"}	[]({{ page.website }}/tours/rnateam.de-novo){:target="_blank"}
Differential abundance testing of small RNAs		{:target="_blank"}	{:target="_blank"}	[]({{ page.website }}/workflows/run?id=7734928ebc0a2654){:target="_blank"} []({{ page.website }}/workflows/run?id=1ffc058273ab357e){:target="_blank"}	[]({{ page.website }}/tours/differential_abundance_testing_sRNAs){:target="_blank"}
Network analysis with Heinz	{:target="_blank"}	{:target="_blank"}	{:target="_blank"}	[]({{ page.website }}/workflows/run?id=12c80c5b5e2305d8){:target="_blank"}	[]({{ page.website }}/tours/rnateam.network-analysis-with-heinz){:target="_blank"}	[]({{ page.website }}/u/videmp/h/rna-workbench-network-analysis-with-heinz){:target="_blank"}
PAR-CLIP analysis			{:target="_blank"}	[]({{ page.website }}/workflows/run?id=a108b575b16e6cb9){:target="_blank"}		[]({{ page.website }}/u/joerg-fallmann/h/parclipworkflow){:target="_blank"}
Reference-based RNA-Seq data analysis		{:target="_blank"}	{:target="_blank"}	[]({{ page.website }}/workflows/run?id=9c7a218993788493){:target="_blank"}	[]({{ page.website }}/tours/ref_based_rna-seq){:target="_blank"}	[]({{ page.website }}/u/andrea.bagnacani/h/reference-based-rna-seq){:target="_blank"}
RNA family model construction				[]({{ page.website }}/workflows/run?id=8f2d958cee428ca1){:target="_blank"}
RNA-seq counts to genes		{:target="_blank"}	{:target="_blank"}	[]({{ page.website }}/workflows/run?id=86f89f49431b1e2e){:target="_blank"}		[]({{ page.website }}/u/videmp/h/rna-workbench-rna-seq-counts-to-genes){:target="_blank"}
RNA-seq genes to pathways		{:target="_blank"}	{:target="_blank"}	[]({{ page.website }}/workflows/run?id=3cb45f0d38e9fd42){:target="_blank"}		[]({{ page.website }}/u/videmp/h/rna-workbench-rna-seq-genes-to-pathways){:target="_blank"}
RNA-Seq reads to counts		{:target="_blank"}	{:target="_blank"}	[]({{ page.website }}/workflows/run?id=e89761c4bb25d89c){:target="_blank"}		[]({{ page.website }}/u/videmp/h/rna-workbench-rna-seq-reads-to-counts-1){:target="_blank"}
Scan for C/D-box sequences with segmentation-fold				[]({{ page.website }}/workflows/run?id=3b717623054d5125){:target="_blank"}
Small Non-coding RNA Clustering using BlockClust		{:target="_blank"}	{:target="_blank"}	[]({{ page.website }}/workflows/run?id=c7026cd5578c8678){:target="_blank"}		[]({{ page.website }}/u/videmp/h/rna-workbench-small-non-coding-rna-clustering-using-blockclust){:target="_blank"}
Visualization of RNA-Seq results with CummeRbund	{:target="_blank"}	{:target="_blank"}	{:target="_blank"}	[]({{ page.website }}/workflows/run?id=17e720bee3b9104f){:target="_blank"}	[]({{ page.website }}/tours/rna-seq-viz-with-cummerbund){:target="_blank"}	[]({{ page.website }}/u/videmp/h/rna-workbench-visualization-of-rna-seq-results-with-cummerbund){:target="_blank"}
Visualization of RNA-Seq results with Volcano Plot		{:target="_blank"}	{:target="_blank"}	[]({{ page.website }}/workflows/run?id=fd156028b09d213a){:target="_blank"}		[]({{ page.website }}/u/videmp/h/rna-workbench-visualization-of-rna-seq-results-with-volcano-plot){:target="_blank"}
Visualization of RNA-Seq results with heatmap2		{:target="_blank"}	{:target="_blank"}	[]({{ page.website }}/workflows/run?id=4dae6d48ba08c037){:target="_blank"}		{:target="_blank"}
ViennaRNA Introduction			{:target="_blank"}	[]({{ page.website }}/workflows/run?id=58fd339165ded462){:target="_blank"}	[]({{ page.website }}/tours/rnateam.viennarna){:target="_blank"}	[]({{ page.website }}/u/joerg-fallmann/h/viennarnaintroduction){:target="_blank"}
{:.table.table-striped}

Available tools

In this section we list all tools that have been integrated in the RNA workbench. The list is likely to grow as soon as further tools and workflows are contributed. To ease readability, we divided them into categories.

RNA structure prediction and analysis

Tool	Description	Reference
{% include tool.html id="antaRNA" %}	Possibility of inverse RNA structure folding and a specification of a GC value constraint	Kleinkauf et al. 2015{:target="_blank"}
{% include tool.html id="CoFold" %}	A thermodynamics-based RNA secondary structure folding algorithm	Proctor and Meyer, 2015{:target="_blank"}
{% include tool.html id="Kinwalker" %}	Algorithm for cotranscriptional folding of RNAs to obtain the min. free energy structure	Geis et al. 2008{:target="_blank"}
{% include tool.html id="MEA" %}	Prediction of maximum expected accuracy RNA secondary structures	Amman et al. 2013{:target="_blank"}
{% include tool.html id="RNAshapes" %}	Structures to a tree-like domain of shapes, retaining adjacency and nesting of structural features	Janssen and Giergerich, 2014{:target="_blank"}
{% include tool.html id="RNAz" %}	Predicts structurally conserved and therm. stable RNA secondary structures in mult. seq. alignments	Washietl et al. 2005{:target="_blank"}
{% include tool.html id="segmentation-fold" %}	An application that predicts RNA 2D-structure with an extended version of the Zuker algorithm	-
ViennaRNA	A tool compilation for prediction and comparison of RNA secondary structures	Lorenz et al. 2011{:target="_blank"}
{: .table.table-striped .tooltable}

RNA alignment

Tool	Description	Reference
{% include tool.html id="Compalignp" %}	An RNA counterpart of the protein specific "Benchmark Alignment Database"	Wilm et al. 2006{:target="_blank"}
{% include tool.html id="LocARNA" %}	A tool for multiple alignment of RNA molecules	Will et al. 2012{:target="_blank"}
{% include tool.html id="MAFFT" %}	A multiple sequence alignment program for unix-like operating systems	Katoh and Standley, 2016{:target="_blank"}
{% include tool.html id="RNAlien" %}	A tool for RNA family model construction	Eggenhofer et al. 2016{:target="_blank"}
{% include tool.html id="CMV" %}	RNA family model visualisation	Eggenhofer et al. 2018{:target="_blank"}
{: .table.table-striped .tooltable}

RNA annotation

Tool	Description	Reference
{% include tool.html id="ARAGORN" %}	A tool to identify tRNA and tmRNA genes	Laslett and Canback, 2004{:target="_blank"}
{% include tool.html id="Fusion Matcher (FuMa)" %}	A tool that reports identical fusion genes based on gene-name annotations	Hoogstrate et al. 2016{:target="_blank"}
{% include tool.html id="GotohScan" %}	A search tool that finds shorter sequences in large database sequences	Hertel et al. 2009{:target="_blank"}
{% include tool.html id="INFERNAL" %}	A tool searching DNA sequence databases for RNA structure and sequence similarities	Nawrocki et al. 2015{:target="_blank"}
{% include tool.html id="RNABOB" %}	A tool for fast pattern searching for RNA secondary structures	-
{% include tool.html id="RNAcode" %}	Predicts protein coding regions in a a set of homologous nucleotide sequences	Washietl et al. 2011{:target="_blank"}
{% include tool.html id="tRNAscan" %}	Searches for tRNA genes in genomic sequences	Lowe and Eddy, 1997{:target="_blank"}
{% include tool.html id="RCAS" %}	A generic reporting tool for the functional analysis of transcriptome-wide regions of interest detected by high-throughput experiments	Uyar et al.{:target="_blank"}
{: .table.table-striped .tooltable}

RNA-protein interaction

Tool	Description	Reference
{% include tool.html id="AREsite2" %}	A database for AU-/GU-/U-rich elements in human and model organisms	Fallmann et al. 2016{:target="_blank"}
{% include tool.html id="DoRiNA" %}	A database of RNA interactions in post-transcriptional regulation	Blin et al. 2014{:target="_blank"}
{% include tool.html id="PARalyzer" %}	An algorithm to generate a map of interacting RNA-binding proteins and their targets	Corcoran et al. 2011{:target="_blank"}
{% include tool.html id="Piranha" %}	A peak-caller for CLIP- and RIP-seq data	-
{: .table.table-striped .tooltable}

RNA-RNA interaction

Tool	Description	Reference
{% include tool.html id="ChiRA-collapse" label="C" %}{% include tool.html id="ChiRA-map" label="h" %}{% include tool.html id="ChiRA-merge" label="i" %}{% include tool.html id="ChiRA-quantify" label="R" %}{% include tool.html id="ChiRA-extract" label="A" %}	A set of tools to analyze RNA-RNA interactome experimental data such as CLASH, CLEAR-CLIP, PARIS, LIGR-Seq etc	-
{: .table.table-striped .tooltable}

RNA target prediction

Tool	Description	Reference
{% include tool.html id="TargetFinder" %}	A tool to predict small RNA binding sites on target transcripts from a sequence database	-
{: .table.table-striped .tooltable}

RNA Seq and HTS analysis

Preprocessing

Tool	Description	Reference
{% include tool.html id="FastQC" %}	A quality control tool for high throughput sequence data	-
{% include tool.html id="TrimGalore" label="Trim Galore!" %}	Automatic quality and adapter trimming as well as quality control	-
{: .table.table-striped .tooltable}

RNA-Seq

Tool	Description	Reference
{% include tool.html id="BlockClust" %}	Small non-coding RNA clustering from deep sequencing read profiles	Videm et al. 2014{:target="_blank"}
{% include tool.html id="FlaiMapper" %}	A tool for computational annotation of small ncRNA-derived fragments using RNA-seq data	Hoogstrate et al. 2015{:target="_blank"}
{% include tool.html id="MiRDeep2" %}	Discovers microRNA genes by analyzing sequenced RNAs	Friedländer et al. 2008{:target="_blank"}
{% include tool.html id="NASTIseq" %}	A method that incorporates the inherent variable efficiency of generating perfectly strand-specific libraries	Li et al. 2013{:target="_blank"}
{% include tool.html id="PIPmiR" %}	An algorithm to identify novel plant miRNA genes from a combination of deep sequencing data and genomic features	Breakfield et al. 2011{:target="_blank"}
{% include tool.html id="SortMeRNA" %}	A tool for filtering, mapping and OTU-picking NGS reads in metatranscriptomic and -genomic data	Kopylova et al. 2011{:target="_blank"}
{: .table.table-striped .tooltable}

Read Mapping

Tool	Description	Reference
{% include tool.html id="HISAT2" %}	Hierarchical indexing for spliced alignment of transcripts	Pertea et al. 2016{:target="_blank"}
{% include tool.html id="RNA STAR" %}	Rapid spliced aligner for RNA-seq data	Dobin et al. 2013{:target="_blank"}
{% include tool.html id="STAR-fusion" %}	Fast fusion gene finder	Haas et al. 2017{:target="_blank"}
{% include tool.html id="Bowtie2" %}	Fast and sensitive read alignment	Langmead et al. 2012{:target="_blank"}
{% include tool.html id="BWA" %}	Software package for mapping low-divergent sequences against a large reference genome	Li and Durbin 2009{:target="_blank"}, Li and Durbin 2010{:target="_blank"}
{: .table.table-striped .tooltable}

Transcript Assembly

Tool	Description	Reference
{% include tool.html id="Trinity" %}	De novo transcript sequence reconstruction from RNA-Seq	Haas et al. 2013{:target="_blank"}
{: .table.table-striped .tooltable}

Quantification

Tool	Description	Reference
{% include tool.html id="featureCounts" %}	Ultrafast and accurate read summarization program	Liao et al. 2014{:target="_blank"}
{% include tool.html id="htseq-count" %}	Tool for counting reads in features	Anders et al. 2015{:target="_blank"}
{% include tool.html id="Sailfish" %}	Rapid Alignment-free Quantification of Isoform Abundance	Patro et al. 2014{:target="_blank"}
{% include tool.html id="Salmon" %}	Fast, accurate and bias-aware transcript quantification	Patro et al. 2017{:target="_blank"}
{: .table.table-striped .tooltable}

Differential expression analysis

Tool	Description	Reference
{% include tool.html id="DESeq2" %}	Differential gene expression analysis based on the negative binomial distribution	Love et al. 2014{:target="_blank"}
{: .table.table-striped .tooltable}

Utilities

Tool	Description	Reference
SAMtools	Utilities for manipulating alignments in the SAM format	Heng et al. 2009{:target="_blank"}
BEDTools	Utilities for genome arithmetic	Quinlan and Hall 2010{:target="_blank"}
deepTools	Tools for exploring deep-sequencing data	Ramirez et al. 2014{:target="_blank"}, Ramirez et al. 2016{:target="_blank"}
{: .table.table-striped .tooltable}

Ribosome profiling

Tool	Description	Reference
RiboTaper	An analysis pipeline for Ribo-Seq experiments, exploiting the triplet periodicity of ribosomal footprints to call translated regions	Calviello et al. 2016{:target="_blank"}
{: .table.table-striped .tooltable}

Contributors

Dannon Baker, Marius Van Den Beek, John Chilton, Nate Coraor, Bjorn Gruning, Delphine Larivière, Nicholas Keener, Sergei Kosakovsky, Wolfgang Maier, Anton Nekrutenko, James Taylor, Steven Weaver

Files

index-virology.md

Latest commit

History

index-virology.md

File metadata and controls

Welcome to the European Virology Galaxy flavor

Content

The SARS-CoV-2 project

Preprocessing of raw SARS-CoV-2 reads

What's the point?

The outline

Inputs

Outputs

The history and the workflow

Illumina

Oxford nanopore

Assembly of SARS-CoV-2 from pre-processed reads

What's the point?

Outline

Inputs

Outputs

History and workflow

Dating the most common recent ancestor (MCRA) of SARS-CoV-2

What's the point?

Outline

Inputs

Outputs

History and workflow

Analysis of variation within individual COVID-19 samples

What's the point?

Outline

Inputs

Workflow

Jupyter notebook

Outputs

History and workflow

Alignment of COVID-19 Spike protein with homologs from other coronaviruses

What's the point?

Outline

Input

Output

Workflow

Evolutionary Analysis

What's the point?

Outline

Inputs

Outputs

History and workflow

Comparative analysis of coronovirus sequences

What's the point?

Outline

Inputs

Outputs

History and workflow

Training

Available tools

RNA structure prediction and analysis

RNA alignment

RNA annotation

RNA-protein interaction

RNA-RNA interaction

RNA target prediction

RNA Seq and HTS analysis

Preprocessing

RNA-Seq

Read Mapping

Transcript Assembly

Quantification

Differential expression analysis

Utilities

Ribosome profiling

Contributors