<!DOCTYPE html>
<html lang="en-us">
<head>
<meta charset="UTF-8">
<title>Best Practices in the analysis of RNA-seq and ChIP-seq data by bioinformatics-core-shared-training</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" type="text/css" href="stylesheets/normalize.css" media="screen">
<link href='http://fonts.googleapis.com/css?family=Open+Sans:400,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" type="text/css" href="stylesheets/stylesheet.css" media="screen">
<link rel="stylesheet" type="text/css" href="stylesheets/github-light.css" media="screen">
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-63148050-2', 'auto');
ga('send', 'pageview');
</script>
</head>
<body>
<section class="page-header">
<h1 class="project-name">Best Practices in the analysis of RNA-seq and ChIP-seq data</h1>
<h2 class="project-tagline">Cambridge, UK, 27th - 31st July 2015</h2>
<a href="https://github.com/bioinformatics-core-shared-training/cruk-bioinf-sschool" class="btn">View on GitHub</a>
<a href="https://github.com/bioinformatics-core-shared-training/cruk-bioinf-sschool/zipball/master" class="btn">Download .zip</a>
<a href="https://github.com/bioinformatics-core-shared-training/cruk-bioinf-sschool/tarball/master" class="btn">Download .tar.gz</a>
</section>
<div class="container container-full-width card">
<div class="banner">
<a href="http://www.cruk.cam.ac.uk/" title="CRUK Cambridge Institute"><img width="600" align="center" src="img/cruk-cambridge-institute.jpg"/></a>
</div>
<section class="main-content">
<h3>
<img align="center" src="img/group.jpg"/>
<a id="description" class="anchor" href="#description" aria-hidden="true"><span class="octicon octicon-link"></span></a>Description.</h3>
<p>High-throughput technologies such as next-generation sequencing (NGS) can routinely produce massive amounts of data. These technologies allow us to describe all variants in a genome or to detect the whole set of transcripts that are present in a cell or tissue. However, such datasets pose new challenges in how the data have to be analyzed, annotated and interpreted; these challenges are not trivial and can be daunting to the wet-lab biologist. This course covers state-of-the-art and best-practice tools for NGS RNA-seq and ChIP-seq data analysis, which are of major relevance in today’s genomic and gene expression studies.
</p>
<h3>
<a id="authors" class="anchor" href="#authors" aria-hidden="true"><span class="octicon octicon-link"></span></a>Instructors.</h3>
<p>
<li><a href="https://github.com/markdunning">Mark Dunning</a></li>
<li>Bernard Pereira</li>
<li>Oscar Rueda</li>
<li>Ines De Santiago</li>
<li>Shamith Samarajiwa</li>
<img align="center" src="img/bioinf-tutors2_0.jpg"/>
</p>
<h3>
<a id="prerequisites" class="anchor" href="#prerequisites" aria-hidden="true"><span class="octicon octicon-link"></span></a>Prerequisites.</h3>
<p>
There is a lot of material to cover in the course, so we will assume that you are familiar with a few basics before you come. The tool that we will use for most of the analysis is R. There will be a short recap of the key concepts at the beginning of the course; however, it will be beneficial if you are already familiar with the following:
<li>Using the RStudio program</li>
<li>Setting your working directory</li>
<li>Creating variables and basic object types; in particular vectors and data frames</li>
<li>Using built-in R functions</li>
<li>Using R to get help on functions</li>
<li>Subset operations for vectors and data frames using the [] notation </li>
<li>Reading files into R</li>
<li>Basic plots: scatter plots, boxplots and histograms</li>
<li>Conditional statements using if and else (not essential, but highly recommended)</li>
<li>Achieving repetitive tasks using a for loop (not essential, but highly recommended)</li>
</p>
<p>Several online videos are available that cover this material. For example:</p>
<li><a href="http://shop.oreilly.com/product/0636920034834.do">http://shop.oreilly.com/product/0636920034834.do</a></li>
<li><a href="http://blog.revolutionanalytics.com/2012/12/coursera-videos.html">http://blog.revolutionanalytics.com/2012/12/coursera-videos.html</a></li>
<li><a href="http://bitesizebio.com/webinar/20600/beginners-introduction-to-r-statistical-software">http://bitesizebio.com/webinar/20600/beginners-introduction-to-r-statistical-software</a></li>
Or feel free to look through the lecture notes of our University R <a href="http://cambiotraining.github.io/r-intro/">course</a>.
Some introductory statistics will also be assumed. See <a href="http://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one">Statistics at Square One</a> for a good overview.
<h3>
<a id="aims" class="anchor" href="#aims" aria-hidden="true"><span class="octicon octicon-link"></span></a>Aims.</h3>
<p>
<li>To provide an understanding of how aligned sequencing reads, genome sequences and genomic regions are represented in R. </li>
<li>To build confidence in reading sequencing reads into R, performing quality assessment and executing standard pipelines for RNA-Seq and ChIP-Seq analysis. </li>
</p>
<h3>
<a id="objectives" class="anchor" href="#objectives" aria-hidden="true"><span class="octicon octicon-link"></span></a>Objectives.</h3>
<p>
<li>Know what tools are available in Bioconductor for HTS analysis and understand the basic object-types that are utilised. </li>
<li>Given a set of gene identifiers, find out whereabouts in the genome they are located, and vice-versa (i.e. go from genomic coordinates to genes). </li>
<li>Produce a list of differentially expressed genes from an RNA-Seq experiment. </li>
<li>Import a set of ChIP-Seq peaks and investigate their biological context.</li>
</p>
<h2><a id="materials" class="anchor" href="#materials" aria-hidden="true"><span class="octicon octicon-link"></span></a>Course Materials.</h2>
<h3>
<a id="day1" class="anchor" href="#day1" aria-hidden="true"><span class="octicon octicon-link"></span></a>Day One.</h3>
<p>
<li><a target="_blank" href="Day1/intro.html">Introduction to Bioconductor and exploratory data analysis (L)</a><a target="_blank" href="Day1/Course Introduction.pdf"> [ Printable PDF ]</a></li>
<li><a target="_blank" href="Day1/bioc-intro.pdf">R and Bioconductor recap (P)</a></li>
<li><a target="_blank" href="Day1/ngs-intro.html">Introduction to NGS Sequencing (L)</a> <a target="_blank" href="Day1/Introduction to NGS data.pdf">[ Printable PDF ] </a></li>
<li><a target="_blank" href="Day1/NGS_QC_inesdesantiago.pdf">Quality Assessment of NGS Data (L)</a></li>
<li><a target="_blank" href="Day1/fastqc_sweave.pdf">Quality Assessment of NGS Data (P)</a></li>
<li><a target="_blank" href="https://docs.google.com/forms/d/1h1sXLptX5FUhoVnmwdY7MKEaU-39fcBYpRIsdVpH-LM/viewform?c=0&w=1f">Quality Assessment of NGS Data (Quiz!)</a></li>
<li><a target="_blank" href="Day1/Sequence Alignment_July2015_ShamithSamarajiwa.pdf">Alignment Slides</a></li>
<li><a target="_blank" href="Day1/Alignment_Demo.pdf">Alignment Demo</a></li>
</p>
<h3>
<a id="day2" class="anchor" href="#day2" aria-hidden="true"><span class="octicon octicon-link"></span></a>Day Two.</h3>
<p>
<li><a target="_blank" href="Day2/repSeqData.html">Representing Sequencing data in Bioconductor (L) </a><a target="_blank" href="Day2/Representing sequencing data in R and Bioconductor.pdf">[ Printable PDF ] </a></li>
<li><a target="_blank" href="Day2/StringsAndRanges-Prac.pdf">Representing Sequencing data in Bioconductor (P)</a></li>
<li><a target="_blank" href="Day2/Lect4-DesignStatistics_HTSeq.pdf">Linear Models and Experimental Design (L)</a></li>
<li><a target="_blank" href="Day2/rnaSeq_July.pdf">Introduction to RNA Sequencing (L)</a></li>
<li><a target="_blank" href="Day2/rnaSeq_align.pdf">RNA-Seq reads to counts (P)</a></li>
</p>
<h3>
<a id="day3" class="anchor" href="#day3" aria-hidden="true"><span class="octicon octicon-link"></span></a>Day Three.</h3>
<p>
<li><a target="_blank" href="Day3/rnaSeq_DE.pdf">RNA-seq Practical</a></li>
<li><a target="_blank" href="Day3/Supplementary-RNAseq-practical.pdf">(Supplementary) RNA-seq Practical</a></li>
<li><a target="_blank" href="Day3/annoAndViz.html">Introduction to Genome Annotation </a><a target="_blank" href="Day3/Genome Annotation and Visualisation using R and Bioconductor.pdf">[ Printable PDF ] </a></li>
<li><a target="_blank" href="Day3/new-anno-prac.pdf">Genome Annotation Practical</a></li>
<li><a target="_blank" href="Day3/Genome Browsers_July2015_ShamithSamarajiwa.pdf">Using Genome Browsers (L) </a></li>
</p>
<h3>
<a id="day4" class="anchor" href="#day4" aria-hidden="true"><span class="octicon octicon-link"></span></a>Day Four.</h3>
<p>
<li><a target="_blank" href="Day4/Downstream_Analysis_of_Transcrptomic_Data.pdf">Downstream Analysis of RNA-seq Data (L)</a></li>
<li><a target="_blank" href="Day4/Day4_RNseq_DownstreamAnalysis.pdf">Downstream Analysis of RNA-seq Data (P)</a></li>
<li><a target="_blank" href="Day4/Introduction_to_ChIPseq_July2015_ShamithSamarajiwa.pdf">Introduction to ChIP-Seq (L) </a></li>
<li><a target="_blank" href="Day4/ChIP_QC_presentation.pdf">Analysis of ChIP-Seq (L) </a></li>
<li><a target="_blank" href="Day4/chipqc_sweave.pdf">ChIP-Seq Practical</a></li>
<li><a target="_blank" href="Day4/rep-research.html">Reproducible Research </a><a target="_blank" href="Day4/Reproducible Research.pdf">[ Printable PDF ] </a></li>
</p>
<h3>
<a id="day5" class="anchor" href="#day5" aria-hidden="true"><span class="octicon octicon-link"></span></a>Day Five.</h3>
<p>
<li><a target="_blank" href="Day5/ChIPseq_Downstream_analysis_July2015_ShamithSamarajiwa.pdf">Downstream Analysis of ChIP-Seq Data (L)</a></li>
<li><a target="_blank" href="Day5/ChIP-Seq_Practical_2.pdf">Downstream Analysis of ChIP-Seq Data (P)</a></li>
</p>
<h3>
<a id="software" class="anchor" href="#software" aria-hidden="true"><span class="octicon octicon-link"></span></a>How to Run the course.</h3>
We recommend using <a href="http://www.rstudio.com">RStudio</a> for the practicals, along with <a href="https://cran.r-project.org/">R version 3.2.1</a>.
<p>
Download the materials from this repository and install the required R and Bioconductor packages from within RStudio. This may take several minutes.
<pre class="input"><code>source("http://www.bioconductor.org/biocLite.R")
biocLite(c("Biostrings", "ShortRead", "DESeq", "edgeR","biomaRt", "BSgenome",
"pasillaBamSubset", "pasilla",
"rtracklayer", "ggbio", "vsn","gplots","RColorBrewer","chipseq","htSeqTools","limma","NBPSeq","tweeDEseqCountData","org.Hs.eg.db","Rcade", "ChIPQC","TxDb.Hsapiens.UCSC.hg19.knownGene","BSgenome.Hsapiens.UCSC.hg19","ChIPpeakAnno","statmod","locfit","Rsubread","goseq","GO.db"))
</code></pre>
The "Download .zip" link at the top of this page will download all the lectures and practicals, and some example data. However, the larger data files have to be downloaded from elsewhere because they are too large to share on GitHub.
</p>
<h2>
<a id="data" class="anchor" href="#data" aria-hidden="true"><span class="octicon octicon-link"></span></a>Example Data.</h2>
<h3>Day 1</h3>
<p>
A breast cancer dataset is also required for the Bioconductor introductory practical. It can be downloaded from <a href="https://www.dropbox.com/s/82p2dcwwo3qnf21/nki.zip">Dropbox</a>. Once downloaded and unzipped, the folder should be placed inside the Day1 directory.
</p>
<p>
<li><a target="_blank" href="http://training.bio.cam.ac.uk/SRR1186252_trimmed.fq.chr6.fq">Example chromosome 6 reads</a></li>
<li><a target="_blank" href="http://hgdownload.cse.ucsc.edu/goldenpath/hg19/chromosomes/chr6.fa.gz">Chromosome 6 reference sequence</a></li>
</p>
<h3>Day 2</h3>
<p>
<li><a target="_blank" href="http://archers.bio.cam.ac.uk/cruk/HG00096.chr22.bam">1000genomes sample, chromosome 22 aligned reads bam</a></li>
<li><a target="_blank" href="http://archers.bio.cam.ac.uk/cruk/HG00096.chr22.bam.bai">1000genomes sample, chromosome 22 aligned reads bam index</a></li>
<li><a target="_blank" href="http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/chromosomes/chr22.fa.gz">Chromosome 22 reference sequence</a></li>
<li><a target="_blank" href="http://archers.bio.cam.ac.uk/cruk/16N_aligned.bam">RNA-seq sample 16N aligned bam </a></li>
<li><a target="_blank" href="http://archers.bio.cam.ac.uk/cruk/16N_aligned.bam.bai">RNA-seq sample 16N aligned bam index </a></li>
<li><a target="_blank" href="http://archers.bio.cam.ac.uk/cruk/16T_aligned.bam">RNA-seq sample 16T aligned bam </a></li>
<li><a target="_blank" href="http://archers.bio.cam.ac.uk/cruk/16T_aligned.bam.bai">RNA-seq sample 16T aligned bam index </a></li>
<li><a target="_blank" href="http://archers.bio.cam.ac.uk/cruk/18N_aligned.bam">RNA-seq sample 18N aligned bam </a></li>
<li><a target="_blank" href="http://archers.bio.cam.ac.uk/cruk/18N_aligned.bam.bai">RNA-seq sample 18N aligned bam index </a></li>
<li><a target="_blank" href="http://archers.bio.cam.ac.uk/cruk/18T_aligned.bam">RNA-seq sample 18T aligned bam </a></li>
<li><a target="_blank" href="http://archers.bio.cam.ac.uk/cruk/18T_aligned.bam.bai">RNA-seq sample 18T aligned bam index </a></li>
<li><a target="_blank" href="http://archers.bio.cam.ac.uk/cruk/19N_aligned.bam">RNA-seq sample 19N aligned bam </a></li>
<li><a target="_blank" href="http://archers.bio.cam.ac.uk/cruk/19N_aligned.bam.bai">RNA-seq sample 19N aligned bam index </a></li>
<li><a target="_blank" href="http://archers.bio.cam.ac.uk/cruk/19T_aligned.bam">RNA-seq sample 19T aligned bam </a></li>
<li><a target="_blank" href="http://archers.bio.cam.ac.uk/cruk/19T_aligned.bam.bai">RNA-seq sample 19T aligned bam index </a></li>
</p>
<h3>
<a id="docker" class="anchor" href="#docker" aria-hidden="true"><span class="octicon octicon-link"></span></a>Using Docker.</h3>
<p>
<br>
If you are not attending one of our courses in person, you can still run the course materials using the <a href="http://www.docker.com">Docker</a> system.
First, you will need to install the <a href="http://boot2docker.io/">boot2docker</a> software.
</p>
<p>Once you have boot2docker installed, an icon should appear on your Desktop (Windows) or in your Applications folder (Mac). After running this new application, a new window should appear with various lines of white text on a black background. The last line should read:
<pre class="input"><code>docker@boot2docker:~$</code></pre>
Now carefully type the following line of text (using the correct spaces and punctuation is very important!)
<pre class="input"><code>docker run -p 8787:8787 markdunning/cruk-bioinf-sschool</code></pre>
This will download and install some data. Once this has finished, open a web browser and enter the following address. This will launch a version of RStudio within your browser. You will need to enter the username 'rstudio' and password 'rstudio'.
<pre class="input"><code>http://localhost:8787</code></pre>
For exercises which use the command line (e.g. the alignment and QA practicals), run the following command in boot2docker:
<pre class="input"><code>docker run -ti markdunning/cruk-bioinf-sschool /bin/bash</code></pre>
</p>
<h2>
<a id="license" class="anchor" href="#license" aria-hidden="true"><span class="octicon octicon-link"></span></a>License</h2>
<p>
This work is licensed under the Creative Commons Attribution-ShareAlike 2.0 UK: England &amp; Wales License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/2.0/uk/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.
</p>
<h2>
<a id="resources" class="anchor" href="#resources" aria-hidden="true"><span class="octicon octicon-link"></span></a>Resources</h2>
<li><a target="_blank" href="http://seqanswers.com/forums/forumdisplay.php?f=18">seqanswers Bioinformatics forum</a></li>
<li><a target="_blank" href="https://www.biostars.org/">Biostars forum</a></li>
<li><a target="_blank" href="https://support.bioconductor.org/">Bioconductor forum</a></li>
<li><a target="_blank" href="http://www.r-bloggers.com/">R-bloggers</a></li>
<li><a target="_blank" href="https://en.wikibooks.org/wiki/Next_Generation_Sequencing_%28NGS%29">NGS wiki</a></li>
</section>
</body>
</html>