
2. RNA Seq Workflow_Introduction

Nanjala Ruth edited this page Aug 27, 2020 · 3 revisions

RNA-Seq data processing and gene expression analysis workflow

Background

RNA-Seq is widely used in gene expression studies to quantify the RNA in a sample using next-generation sequencing (NGS). It is a powerful tool with many applications, including gene discovery and quantification. Here, we assume that differential expression is being assessed between two experimental conditions, i.e. a simple 1:1 comparison. The samples are human. It is highly recommended to use an HPC environment, which provides more RAM and computing power.

Pipeline

The pipeline for this mini-project includes:

  • FastQC for quality check
  • Trimmomatic for adaptor removal and trimming
  • HISAT2 for alignment and Subread’s featureCounts for count generation
  • Kallisto for pseudoalignment
  • MultiQC to collect the statistics
  • Statistical analysis in R using DESeq
  • Converting the pipeline to R Markdown
  • Converting the pipeline to a Snakemake pipeline
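The command-line steps above (everything before the R, R Markdown and Snakemake parts) can be sketched as a per-sample sequence of shell commands. This is a minimal dry-run sketch, not the project's own scripts: the reference index and annotation paths, the adapter file, the Trimmomatic trimming settings, the thread count and the `Results/` subfolders are all hypothetical placeholders, and running kallisto on the trimmed rather than the raw reads is an assumption. With `DRY_RUN=1` (the default) each command is only printed, so the sequence can be reviewed before anything heavy runs on the HPC.

```bash
#!/usr/bin/env bash
# Dry-run sketch of the pipeline. All paths and trimming settings below are
# HYPOTHETICAL placeholders; adjust them to your own reference files.

THREADS="${THREADS:-4}"
HISAT2_INDEX="${HISAT2_INDEX:-ref/hisat2_index/genome}"   # hypothetical HISAT2 index prefix
KALLISTO_INDEX="${KALLISTO_INDEX:-ref/transcripts.idx}"  # hypothetical kallisto index
GTF="${GTF:-ref/annotation.gtf}"                         # hypothetical gene annotation
ADAPTERS="${ADAPTERS:-TruSeq3-PE.fa}"                    # hypothetical adapter FASTA

# run: print the command when DRY_RUN=1 (the default), otherwise execute it
run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# process_sample: QC, trimming, alignment and pseudoalignment for one sample
process_sample() {
  local s="$1" raw="Rawdata" out="Results"
  local r1="$raw/${s}_R1.fastq.gz" r2="$raw/${s}_R2.fastq.gz"
  local p1="$out/trimmed/${s}_R1.paired.fastq.gz" p2="$out/trimmed/${s}_R2.paired.fastq.gz"

  run mkdir -p "$out/fastqc" "$out/trimmed" "$out/hisat2" "$out/kallisto"
  # 1. Quality check of the raw reads
  run fastqc -t "$THREADS" -o "$out/fastqc" "$r1" "$r2"
  # 2. Adapter removal and quality trimming (settings are an assumption)
  run trimmomatic PE -threads "$THREADS" "$r1" "$r2" \
    "$p1" "$out/trimmed/${s}_R1.unpaired.fastq.gz" \
    "$p2" "$out/trimmed/${s}_R2.unpaired.fastq.gz" \
    "ILLUMINACLIP:${ADAPTERS}:2:30:10" SLIDINGWINDOW:4:20 MINLEN:36
  # 3. Splice-aware alignment of the trimmed read pairs
  run hisat2 -p "$THREADS" -x "$HISAT2_INDEX" -1 "$p1" -2 "$p2" -S "$out/hisat2/${s}.sam"
  # 4. Pseudoalignment (paired-end is kallisto's default mode)
  run kallisto quant -t "$THREADS" -i "$KALLISTO_INDEX" -o "$out/kallisto/${s}" "$p1" "$p2"
}

# summarise: gene-level counts for all samples, then one MultiQC report
summarise() {
  run mkdir -p Results/featurecounts
  # -p: paired-end input (Subread >= 2.0.2 also needs --countReadPairs to count fragments)
  run featureCounts -T "$THREADS" -p -a "$GTF" -o Results/featurecounts/counts.txt Results/hisat2/*.sam
  run multiqc Results -o Results/multiqc
}

# main: process every sample listed in the sample file, then summarise.
# Reading from file descriptor 3 keeps the tools from consuming the sample list on stdin.
main() {
  local sample
  while read -r sample <&3; do
    [[ -z "$sample" ]] && continue
    process_sample "$sample"
  done 3< "${1:-Rawdata/sample_id.txt}"
  summarise
}
```

For example, `DRY_RUN=1 main Rawdata/sample_id.txt` prints every command; `DRY_RUN=0 main Rawdata/sample_id.txt` runs them. featureCounts reads the SAM files directly here; converting them to sorted BAM (e.g. with samtools) would save disk space but is left out of the sketch.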

Getting Set Up

A. Installing Miniconda (if needed)

  1. Download the Miniconda installer for your OS to your home directory:
    • Linux: wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
    • Mac: curl -L -O https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh (-O saves the file under its own name; -L follows redirects)
  2. Run the installer:
    • Linux: bash Miniconda3-latest-Linux-x86_64.sh
    • Mac: bash Miniconda3-latest-MacOSX-x86_64.sh
  3. Follow all the prompts; if unsure, accept the defaults.
  4. Close and re-open your terminal.
  5. Check the installation with conda list. If the installation succeeded, you should see a list of installed packages.
    • If the conda command cannot be found, add the Miniconda bin directory to your PATH: export PATH=~/miniconda3/bin:$PATH
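Once conda works, the pipeline tools can be installed from the Bioconda channel. A minimal sketch, assuming a hypothetical environment name of `rnaseq` and the Bioconda packages fastqc, trimmomatic, hisat2, subread (which provides featureCounts), kallisto and multiqc; `check_tools` is a hypothetical helper that reports which executables are on the PATH.

```bash
#!/usr/bin/env bash
# create_env: install the pipeline tools into a new conda environment.
# The environment name "rnaseq" is a HYPOTHETICAL choice; conda-forge is listed
# before bioconda so that it takes priority.
create_env() {
  conda create -y -n rnaseq -c conda-forge -c bioconda \
    fastqc trimmomatic hisat2 subread kallisto multiqc
}

# check_tools: print found/missing for each command; return non-zero if any is missing
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Typical use in an interactive shell: run `create_env`, then `conda activate rnaseq`, then `check_tools fastqc trimmomatic hisat2 featureCounts kallisto multiqc`.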

B. Setting up the folder structure

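Clone this repository, which contains the folder structure, into the current working folder, then make sure the expected folders exist:

git clone https://github.com/nanjalaruth/Group-5-miniproject_RNASEQ.git
cd Group-5-miniproject_RNASEQ
mkdir -p Rawdata Results Scripts   # -p: no error if a folder already exists
touch README.md                    # README.md is a file, not a folder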

C. Download the data files (if you want to use this dataset)

cd Rawdata/
# List the sample names in sample_id.txt, one per line (or type them in with nano)
printf '%s\n' sample37 sample38 sample39 sample40 sample41 sample42 > sample_id.txt

# Download both mates (R1 and R2) of each paired-end sample; -c resumes interrupted downloads
while read -r sample
do
	wget -c "http://h3data.cbio.uct.ac.za/assessments/RNASeq/practice/dataset/${sample}_R1.fastq.gz"
	wget -c "http://h3data.cbio.uct.ac.za/assessments/RNASeq/practice/dataset/${sample}_R2.fastq.gz"
done < sample_id.txt

#download the metadata file
wget -c http://h3data.cbio.uct.ac.za/assessments/RNASeq/practice/practice.dataset.metadata.tsv
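Before moving on to quality control, it can help to confirm that the download loop above fetched both mates of every sample intact, and that the metadata describes the expected two-condition (1:1) design. A minimal sketch with two hypothetical helpers: `check_pairs` reads sample_id.txt and tests each `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` with `gzip -t`; `count_conditions` tallies one column of a tab-separated file. The layout of practice.dataset.metadata.tsv is not described here, so the assumption that it has a header row and the condition in column 2 is only a placeholder; check the header first (e.g. `head -n 1 practice.dataset.metadata.tsv`) and pass the right column number.

```bash
#!/usr/bin/env bash
# check_pairs LIST [DIR]: for each sample in LIST, report missing or corrupt
# R1/R2 gzip files in DIR; return non-zero if any problem was found.
check_pairs() {
  local list="$1" dir="${2:-.}" bad=0 sample mate f
  while read -r sample; do
    [[ -z "$sample" ]] && continue
    for mate in R1 R2; do
      f="$dir/${sample}_${mate}.fastq.gz"
      if [[ ! -s "$f" ]]; then
        echo "MISSING $f"; bad=1
      elif ! gzip -t "$f" 2>/dev/null; then
        echo "CORRUPT $f"; bad=1
      fi
    done
  done < "$list"
  return "$bad"
}

# count_conditions TSV [COLUMN]: skip the header row and count each value in COLUMN.
# COLUMN defaults to 2, which is an ASSUMPTION about the metadata layout.
count_conditions() {
  local tsv="$1" col="${2:-2}"
  tail -n +2 "$tsv" | cut -f "$col" | sort | uniq -c
}
```

From inside Rawdata/, run `check_pairs sample_id.txt` and then `count_conditions practice.dataset.metadata.tsv 2`; for a 1:1 comparison the tally should show two conditions.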