Nanjala Ruth edited this page Aug 25, 2020
Welcome to the Group-5-miniproject_RNASEQ wiki!
RNA-seq is widely used in gene expression studies to quantify the RNA in a sample using next-generation sequencing (NGS). It is a powerful tool with many applications in gene discovery and quantification. Here, we assume differential expression is being assessed between two experimental conditions, i.e. a simple 1:1 comparison. The sample data are human RNA-seq reads. It is highly recommended to use an HPC environment for increased RAM and computational power.
The pipeline for this mini-project includes:
- FastQC for quality checks
- Trimmomatic for adaptor removal and trimming
- HISAT2 for alignment and Subread's featureCounts for count generation
- Kallisto for pseudoalignment
- MultiQC to collect the statistics
- Statistical analysis in R using DESEQ
- Converting the pipeline to R Markdown
- Converting the pipeline to Snakemake/Nextflow languages
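Before running the pipeline it can help to confirm that the command-line tools are installed. The sketch below assumes the executables are named `fastqc`, `trimmomatic`, `hisat2`, `featureCounts`, `kallisto` and `multiqc`, as they typically are when installed through Bioconda. `missing_tools` is a hypothetical helper name and is not part of this repository.

```shell
# Print the name of every tool given as an argument that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Assumed executable names for the tools listed above; adjust to your installation.
missing=$(missing_tools fastqc trimmomatic hisat2 featureCounts kallisto multiqc)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All pipeline tools found"
fi
```

R and its packages are not covered by this check, since they are loaded from within R rather than from the shell.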
- Download Miniconda for your specific OS to your home directory
  - Linux: wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
  - Mac: curl -LO https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
- Run the installer that matches your OS:
  - Linux: bash Miniconda3-latest-Linux-x86_64.sh
  - Mac: bash Miniconda3-latest-MacOSX-x86_64.sh
- Follow all the prompts: if unsure, accept defaults
- Close and re-open your terminal
- If the installation is successful, you should see a list of installed packages with:
  conda list
- If the command cannot be found, you can add the Miniconda bin directory to the path using:
  export PATH=~/miniconda3/bin:$PATH
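The PATH fix can be written so that it is safe to run more than once, for example from a shell startup file. This is a minimal sketch, assuming Miniconda was installed to the default `~/miniconda3` location; `prepend_to_path` is a hypothetical helper name.

```shell
# Prepend a directory to PATH only if it is not already listed there.
prepend_to_path() {
    dir="$1"
    case ":$PATH:" in
        *":$dir:"*) ;;                # already present, nothing to do
        *) PATH="$dir:$PATH" ;;
    esac
}

# Assumption: default Miniconda install location.
if ! command -v conda >/dev/null 2>&1 && [ -d "$HOME/miniconda3/bin" ]; then
    prepend_to_path "$HOME/miniconda3/bin"
    export PATH
fi

if command -v conda >/dev/null 2>&1; then
    echo "conda found at: $(command -v conda)"
else
    echo "conda not found; check where Miniconda was installed" >&2
fi
```

If the installer was pointed at a different directory, replace `$HOME/miniconda3/bin` with that directory's `bin` folder.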
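- Clone this repository with folder structure into the current working folder
git clone https://github.com/nanjalaruth/Group-5-miniproject_RNASEQ.git
# Create the working folders; README.md is a file, so it is created with touch rather than mkdir
mkdir -p Rawdata Results Scripts
touch README.md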
cd Rawdata/
# Create sample_id.txt (e.g. with nano) and add the sample names, one per line:
# sample37
# sample38
# sample39
# sample40
# sample41
# sample42
# Download both reads of each pair for every sample in the list
for sample in $(cat sample_id.txt)
do
    wget http://h3data.cbio.uct.ac.za/assessments/RNASeq/practice/dataset/${sample}_R1.fastq.gz
    wget http://h3data.cbio.uct.ac.za/assessments/RNASeq/practice/dataset/${sample}_R2.fastq.gz
done
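The sample list read by the loop above can also be generated without opening an editor. The sketch below relies on the sample names being consecutively numbered (sample37 to sample42, as listed); `write_sample_ids` is a hypothetical helper name.

```shell
# Write sample<first> .. sample<last>, one name per line, to the given file.
write_sample_ids() {
    first="$1"; last="$2"; out="$3"
    : > "$out"                        # create or empty the output file
    i="$first"
    while [ "$i" -le "$last" ]; do
        echo "sample$i" >> "$out"
        i=$((i + 1))
    done
}
```

Calling `write_sample_ids 37 42 sample_id.txt` inside `Rawdata/` should produce the six names listed in the comments.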
# Download the metadata file
wget -c http://h3data.cbio.uct.ac.za/assessments/RNASeq/practice/practice.dataset.metadata.tsv
# Create the folders that will hold the reference files
mkdir -p hisat2 Kallisto
# Download the human annotation file
wget -c ftp://ftp.ensembl.org/pub/release-100/gtf/homo_sapiens/Homo_sapiens.GRCh38.100.gtf.gz -O hisat2/Homo_sapiens.GRCh38.100.gtf.gz
gunzip hisat2/Homo_sapiens.GRCh38.100.gtf.gz
# Download the human reference genome
wget -c ftp://ftp.ensembl.org/pub/release-100/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz -O hisat2/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
gunzip hisat2/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
# Download the human transcriptome reference
wget -c ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_34/gencode.v34.transcripts.fa.gz -O Kallisto/gencode.v34.transcripts.fa.gz
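Large downloads are sometimes interrupted, so it may be worth checking the FASTQ files before starting the analysis. The sketch below reports any sample whose `_R1` or `_R2` file is missing, empty, or fails `gzip -t`. The name `check_fastq_pairs` is hypothetical, and the file naming follows the download loop above.

```shell
# Report FASTQ files that are missing/empty or fail gzip's integrity test.
# Usage: check_fastq_pairs <sample list file> <directory holding the FASTQ files>
# Returns 0 if every file is present and intact, 1 otherwise.
check_fastq_pairs() {
    list="$1"; dir="$2"; rc=0
    # The "|| [ -n ... ]" part keeps a last line that lacks a trailing newline.
    while IFS= read -r sample || [ -n "$sample" ]; do
        [ -z "$sample" ] && continue          # skip blank lines
        for read_end in R1 R2; do
            f="$dir/${sample}_${read_end}.fastq.gz"
            if [ ! -s "$f" ]; then
                echo "MISSING  $f"; rc=1
            elif ! gzip -t "$f" 2>/dev/null; then
                echo "CORRUPT  $f"; rc=1
            fi
        done
    done < "$list"
    return "$rc"
}
```

From inside `Rawdata/`, this could be run as `check_fastq_pairs sample_id.txt .`; any file it reports can then be re-downloaded with `wget -c`.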