Authors: Hugo Tavares, Georg Zeller
These are extra materials used as a complement to Data Carpentry in R courses, and therefore assume that some of those lessons have been covered beforehand.
These lessons are under active development and may change over time.
The lessons are modular, so they can be taught in a different order from the one shown here (apart from the introduction, which should always come first):
- Introduction to the dataset.
- Basic exploratory analysis to understand some properties of expression data.
- Using principal component analysis (PCA) to explore transcriptome-wide effects of our experimental design.
- Exploring gene expression patterns:
  - Identifying candidate genes from a differential analysis test.
  - Using clustering to partition genes into groups.
There are many dedicated packages for working with RNAseq data, mostly within the Bioconductor package repository.
This lesson is not about analysing RNAseq data (that would be a topic for a whole course!), but rather about showing you how the data manipulation principles learned so far can be applied to explore this kind of data.
If you are doing RNAseq analysis, you should use dedicated packages and workflows, which implement models that account for the particular features of these data.
If you are interested, you can see how the data for this lesson were pre-processed using the DESeq2 package.