-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathlecture6_annotation_resources_and_prelim_ORA.html
813 lines (622 loc) · 26 KB
/
lecture6_annotation_resources_and_prelim_ORA.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
<!DOCTYPE html>
<html lang="" xml:lang="">
<head>
<title>BCB420 - Computational Systems Biology</title>
<meta charset="utf-8" />
<meta name="author" content="Ruth Isserlin" />
<meta name="date" content="2020-02-23" />
<link href="libs/remark-css-0.0.1/default.css" rel="stylesheet" />
<link rel="stylesheet" href="libs/example.css" type="text/css" />
</head>
<body>
<textarea id="source">
class: center, middle, inverse, title-slide
# BCB420 - Computational Systems Biology
## Lecture 6 - Annotation Dataset and Intro to Pathway analysis
### Ruth Isserlin
### 2020-02-23
---
## Before we start
Please fill this out:
[<font size=8>Mid course Feedback : https://forms.gle/maGA529V7pxvgBXC6</font>](https://forms.gle/maGA529V7pxvgBXC6)
---
## Features of Markdown
Three **must** features to include in your next assignment:
* Table of contents
* Code chunk specifications.
* References using bibtex
For more information on Rmarkdown see the following useful resources:
* See [Rmarkdown cheatsheet](https://rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf) for the basics
* For more in depth on all the **great** features of markdown chekc out [this page](https://yihui.org/knitr/)
---
## Table of contents
.pull-left[
<img src=./images/img_lecture6/rmarkdown_toc.png>
]
.pull-right[
* TOC will be generated by taking all entries specified as a heading using "\#"
* Depending on what depth you specify this will only include the number of "\#" specified
]
---
## Code chunk specifications
.pull-left[
<img src=./images/img_lecture6/rmarkdown_code_chunks.png>
]
.pull-right[
* message = FALSE - to get rid of all those info messages that make the report un-readable.
* warning = FALSE - same as messages but it is important that you pay attention to these when you are iterating over your report. Some warning may indicate that a function is not being used correctly. You don't want these showing up in the final report but you do want to address all of them.
* echo and eval are both useful for reports where you don't want to show the code you are using. If you are giving this report to a biologist or experimentalist they don't want to see the code that generated the data or figures.
]
---
## References using bibtext
* Create a set of references that you use for your Rmarkdown notebook, extension .bib
* To get the bibtex go to google scholar and search for your article. Click on the quotation marks associated with your article and click on Bibtex. Copy it to .bib file.
* Example bibtex to put in your bib file:
```r
@article{cytoscape2003,
title={Cytoscape: a software environment for integrated models of biomolecular interaction networks},
author={Shannon, Paul and Markiel, Andrew and Ozier, Owen and Baliga, Nitin S and Wang, Jonathan T and Ramage, Daniel and Amin, Nada and Schwikowski, Benno and Ideker, Trey},
journal={Genome research},
volume={13},
number={11},
pages={2498--2504},
year={2003},
publisher={Cold Spring Harbor Lab}
}
```
---
## References using bibtex - step 2
Add your bibliography file to your Rmarkdown file
```r
---
title: "BCB420 - Computational Systems Biology"
subtitle: "Lecture 6 - Annotation Dataset and Intro to Pathway analysis"
author: "Ruth Isserlin"
date: "`r Sys.Date()`"
output:
html_document:
toc: true
toc_depth: 2
*bibliography: lecture6.bib
---
```
---
## References using bibtex - step3
If I want to add the following sentence to add the reference use the @ sign followed by the name you called the reference.
```r
* [Cytoscape](www.cytoscape.org)[@cytoscape_2003] is a freely available open-source, cross platform network analysis software.
```
--
In your markdown file the above will get rendered to look like:
* [Cytoscape](www.cytoscape.org)(Shannon et al. 2003) is a freely available open-source, cross platform network analysis software.
And the following will appear at the end of your document:
```r
Shannon, Paul, Andrew Markiel, Owen Ozier, Nitin S Baliga, Jonathan T Wang, Daniel Ramage, Nada Amin, Benno Schwikowski, and Trey Ideker. 2003. “Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks.” Genome Research 13 (11). Cold Spring Harbor Lab: 2498–2504.
```
---
## References using bibtex - controlling the format
* In order to change the style of your references you can specify a citation style.
* For a set of predefined styles see [here](https://github.com/citation-style-language/styles)
```r
---
title: "BCB420 - Computational Systems Biology"
subtitle: "Lecture 6 - Annotation Dataset and Intro to Pathway analysis"
author: "Ruth Isserlin"
date: "`r Sys.Date()`"
output:
html_document:
toc: true
toc_depth: 2
bibliography: lecture6.bib
*csl: biomed-central.csl
```
---
## Annotation Sources
[<font size=8>Fill in this short survey on your annotation source: https://forms.gle/shyJiddEKMk9qnU76</font>](https://forms.gle/shyJiddEKMk9qnU76)
---
## Annotation Sources
.pull-left[
<img src="images/img_lecture6/annotation_resources.png">
]
.pull-right[
* There are a lot of sources that try and collate a lot of the smaller databases
* [Pathway commons](https://www.pathwaycommons.org/)
* [MsigDb](https://www.gsea-msigdb.org/gsea/msigdb/collections.jsp)
* [iRefIndex](https://irefindex.vib.be/wiki/index.php/iRefIndex)
* [GeneCards](https://www.genecards.org/)
* There are hundreds of smaller databases!
]
---
## Information that is important to know about our annotation source
.pull-left[
* Where did it come from?
* is it from a primary publication?
* Is it curated by a consortium?
* Is it computationally defined?
]
.pull-right[
* How is it released and how often is it released?
* Is it a custom format? gmt format?
* available for download from a website or ftp site?
* queriable and accessibly through a REST interface?
* Is it versioned?
]
???
REST - REpresentational State Transfer
---
## GMT file format
* File format defined by the GSEA tool when it was originally released in 2003
* Each row represents a pathway or gene set
* Format is: pathway name <tab> pathway description <tab> tab delimited list of genes associated with this pathway.
* A lot of enrichment analysis tools can take this format in as input.
<img src="https://software.broadinstitute.org/cancer/software/gsea/wiki/images/3/3d/Gmt_format_snapshot.gif">
---
## Where are we?
So, what are **we** going to be covering during this course?<br>
![](./images/course_overview_flowchart.png)
---
## Recap from previous weeks
<img src=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4532886/bin/ncomms8956-f1.jpg width="80%">
<font size=1>Janzen DM, Tiourin E, Salehi JA, Paik DY, Lu J, Pellegrini M, Memarzadeh S. [An
apoptosis-enhancing drug overcomes platinum resistance in a tumour-initiating
subpopulation of ovarian cancer](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4532886/). Nat Commun. 2015 Aug 3;6:7956. - Figure 1</font>
---
First things first,
* Load the data, filter it
```r
library(GEOquery)
sfiles = getGEOSuppFiles('GSE70072')
fnames = rownames(sfiles)
# there is only one supplemental file
ca125_exp = read.delim(fnames[1],header=TRUE,
check.names = FALSE)
cpms = edgeR::cpm(ca125_exp[,3:22])
rownames(cpms) <- ca125_exp[,1]
# get rid of low counts
*keep = rowSums(cpms >1) >=3
ca125_exp_filtered = ca125_exp[keep,]
filtered_data_matrix <- as.matrix(ca125_exp_filtered[,3:22])
rownames(filtered_data_matrix) <- ca125_exp_filtered$ensembl75_id
```
---
* Define the groups
```r
#get the 2 and third token from the column names
samples <- data.frame(
lapply(colnames(ca125_exp_filtered)[3:22],
FUN=function(x){
unlist(strsplit(x, split = "\\."))[c(2,3)]}))
colnames(samples) <- colnames(ca125_exp_filtered)[3:22]
rownames(samples) <- c("patients","cell_type")
samples <- data.frame(t(samples))
```
```r
d = DGEList(counts=filtered_data_matrix, group=samples$cell_type)
```
* create the model
```r
model_design_pat <- model.matrix(
~ samples$patients + samples$cell_type)
```
---
* estimate the dispersion,
* calculate normalization factors,
* fit the model
* calculate differentail expression
```r
#estimate dispersion
d <- estimateDisp(d, model_design_pat)
#calculate normalization factors
d <- calcNormFactors(d)
#fit model
fit <- glmQLFit(d, model_design_pat)
#calculate differential expression
qlf.pos_vs_neg <- glmQLFTest(fit, coef='samples$cell_typeCA125+')
```
---
Get all the results
```r
qlf_output_hits <- topTags(qlf.pos_vs_neg,sort.by = "PValue",
n = nrow(filtered_data_matrix))
```
How many gene pass the threshold p-value < 0.05?
```r
length(which(qlf_output_hits$table$PValue < 0.05))
```
```
## [1] 1338
```
How many genes pass correction?
--
```r
length(which(qlf_output_hits$table$FDR < 0.05))
```
```
## [1] 0
```
---
Output our top hits
```r
kable(topTags(qlf.pos_vs_neg), type="html")
```
<table class="kable_wrapper">
<tbody>
<tr>
<td>
<table>
<thead>
<tr>
<th style="text-align:left;"> </th>
<th style="text-align:right;"> logFC </th>
<th style="text-align:right;"> logCPM </th>
<th style="text-align:right;"> F </th>
<th style="text-align:right;"> PValue </th>
<th style="text-align:right;"> FDR </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> ENSG00000182752 </td>
<td style="text-align:right;"> 2.2978937 </td>
<td style="text-align:right;"> 2.7763253 </td>
<td style="text-align:right;"> 35.38410 </td>
<td style="text-align:right;"> 0.0000416 </td>
<td style="text-align:right;"> 0.5020363 </td>
</tr>
<tr>
<td style="text-align:left;"> ENSG00000240864 </td>
<td style="text-align:right;"> 5.0129489 </td>
<td style="text-align:right;"> -0.7698429 </td>
<td style="text-align:right;"> 55.78796 </td>
<td style="text-align:right;"> 0.0000523 </td>
<td style="text-align:right;"> 0.5020363 </td>
</tr>
<tr>
<td style="text-align:left;"> ENSG00000102755 </td>
<td style="text-align:right;"> 1.3912815 </td>
<td style="text-align:right;"> 2.5717883 </td>
<td style="text-align:right;"> 30.52553 </td>
<td style="text-align:right;"> 0.0000857 </td>
<td style="text-align:right;"> 0.5483107 </td>
</tr>
<tr>
<td style="text-align:left;"> ENSG00000211625 </td>
<td style="text-align:right;"> 2.8877608 </td>
<td style="text-align:right;"> 1.8364482 </td>
<td style="text-align:right;"> 34.28879 </td>
<td style="text-align:right;"> 0.0001973 </td>
<td style="text-align:right;"> 0.6697894 </td>
</tr>
<tr>
<td style="text-align:left;"> ENSG00000188641 </td>
<td style="text-align:right;"> 1.1109197 </td>
<td style="text-align:right;"> 3.9514876 </td>
<td style="text-align:right;"> 24.50111 </td>
<td style="text-align:right;"> 0.0002383 </td>
<td style="text-align:right;"> 0.6697894 </td>
</tr>
<tr>
<td style="text-align:left;"> ENSG00000026508 </td>
<td style="text-align:right;"> 1.1113542 </td>
<td style="text-align:right;"> 8.4785833 </td>
<td style="text-align:right;"> 23.75535 </td>
<td style="text-align:right;"> 0.0002737 </td>
<td style="text-align:right;"> 0.6697894 </td>
</tr>
<tr>
<td style="text-align:left;"> ENSG00000085552 </td>
<td style="text-align:right;"> -0.9281010 </td>
<td style="text-align:right;"> 2.4534215 </td>
<td style="text-align:right;"> 22.06649 </td>
<td style="text-align:right;"> 0.0003784 </td>
<td style="text-align:right;"> 0.6697894 </td>
</tr>
<tr>
<td style="text-align:left;"> ENSG00000126878 </td>
<td style="text-align:right;"> -0.6443631 </td>
<td style="text-align:right;"> 6.6234342 </td>
<td style="text-align:right;"> 22.00129 </td>
<td style="text-align:right;"> 0.0003833 </td>
<td style="text-align:right;"> 0.6697894 </td>
</tr>
<tr>
<td style="text-align:left;"> ENSG00000168350 </td>
<td style="text-align:right;"> -1.4574338 </td>
<td style="text-align:right;"> 3.1352032 </td>
<td style="text-align:right;"> 21.85648 </td>
<td style="text-align:right;"> 0.0003945 </td>
<td style="text-align:right;"> 0.6697894 </td>
</tr>
<tr>
<td style="text-align:left;"> ENSG00000150938 </td>
<td style="text-align:right;"> 0.6396190 </td>
<td style="text-align:right;"> 5.8837098 </td>
<td style="text-align:right;"> 21.74652 </td>
<td style="text-align:right;"> 0.0004032 </td>
<td style="text-align:right;"> 0.6697894 </td>
</tr>
</tbody>
</table>
</td>
<td>
<table>
<thead>
<tr>
<th style="text-align:left;"> x </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> BH </td>
</tr>
</tbody>
</table>
</td>
<td>
<table>
<thead>
<tr>
<th style="text-align:left;"> x </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> samples$cell_typeCA125+ </td>
</tr>
</tbody>
</table>
</td>
<td>
<table>
<thead>
<tr>
<th style="text-align:left;"> x </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> glm </td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
---
How many genes are up regulated?
```r
length(which(qlf_output_hits$table$PValue < 0.05
& qlf_output_hits$table$logFC > 0))
```
```
## [1] 790
```
How many genes are down regulated?
```r
length(which(qlf_output_hits$table$PValue < 0.05
& qlf_output_hits$table$logFC < 0))
```
```
## [1] 548
```
---
Create **thresholded** lists of genes.
```r
#merge gene names with the top hits
qlf_output_hits_withgn <- merge(ca125_exp[,1:2],qlf_output_hits, by.x=1, by.y = 0)
qlf_output_hits_withgn[,"rank"] <- -log(qlf_output_hits_withgn$PValue,base =10) * sign(qlf_output_hits_withgn$logFC)
qlf_output_hits_withgn <- qlf_output_hits_withgn[order(qlf_output_hits_withgn$rank),]
upregulated_genes <- qlf_output_hits_withgn$gname[
which(qlf_output_hits_withgn$PValue < 0.05
& qlf_output_hits_withgn$logFC > 0)]
downregulated_genes <- qlf_output_hits_withgn$gname[
which(qlf_output_hits_withgn$PValue < 0.05
& qlf_output_hits_withgn$logFC < 0)]
write.table(x=upregulated_genes,
file=file.path("data","ca125_upregulated_genes.txt"),sep = "\t",
row.names = FALSE,col.names = FALSE,quote = FALSE)
write.table(x=downregulated_genes,
file=file.path("data","ca125_downregulated_genes.txt"),sep = "\t",
row.names = FALSE,col.names = FALSE,quote = FALSE)
write.table(x=data.frame(genename= qlf_output_hits_withgn$gname,F_stat= qlf_output_hits_withgn$rank),
file=file.path("data","ca125_ranked_genelist.txt"),sep = "\t",
row.names = FALSE,col.names = FALSE,quote = FALSE)
```
---
## Thresholded list?
* We have gone from a large datasets of 1000s of measurements to a list of differentially expressed genes. Why do we look at a subset of this list?
* Aren't thresholds arbitrary? Why define a threshold and limit our list to just that set?
--
Some reasons for starting with the thresholded list:
* Where else can we start?
* Get the low hanging fruit. Strongest signal might be the most important
* The strongest signals might be the best place to start.
* Easy.
---
## Is there an alternative to a Thresholded list?
### Thresholded list - genes with p-value < threshold
* Asks the question - Are there any gene sets or pathways that are **enriched/over-represented** or **depleted/under-represented** in my list
* Uses stastics like fisher exact test, hypergeometric test to assess this.
* Tools like: [DAVID](https://david.ncifcrf.gov/),[g:profiler](https://biit.cs.ut.ee/gprofiler/),[ErichR](https://amp.pharm.mssm.edu/Enrichr/),[GREAT](http://great.stanford.edu/public/html/)
### non-thresholded list - ranked gene list
* Asks the question - Are there any gene sets or pathways that are ranked surprisingly high or low in my ranked list.
* Uses stastics like KS test, linear models
* Tools and packages like:[GSEA - Gene set enrichment anlaysis](https://www.gsea-msigdb.org/gsea/index.jsp),[GSVA - Gene Set Variation Analysis](https://bioconductor.org/packages/release/bioc/html/GSVA.html)
---
## Pros and Cons of Thresholded list
.pull-left[
**Cons**
* No set value good for threshold - pvalue<0.05 is accepted but ultimately it is still arbitrary
* Weak signals are ignored (many weak signals can contribute to the whole)
* different results at different thresholds
]
.pull-right[
**Pros**
* No learning curve! Plug and play.
* Very accessible.
* Quick and fast results
* Endless of online tools available
* No requirement for a bioinformatician or computational biologist.
]
---
## Gene List Enrichment Analysis
* sometimes called over-representation analysis
### What do we start with:
* Gene list: eg: FUCA2, GCLC, NFYA, NIPAL3, SEMA3F, KRIT1
* Gene sets or pathways: come from any one of the annotation databases that we mentioned at the beginning of the day today.
### What are we looking for:
* Are any of the genelists surprisingly or statistically over-represented in our set of genes.
### What do we need in order to answer this question:
* A way to associate a statistic with the above question
* A way to correct for the multiple hypothesis that we are going to be asking.
---
<img src="./images/img_lecture6/two_class_design.png">
<font size=2>Voisin V, Bader GD, Pathway and network Analysis course - slide 12 [Bioinformatics.ca](https://bioinformatics.ca/workshops/2018-pathway-and-network-analysis-of-omics-data/#course-material)</font>
---
<img src="./images/img_lecture6/simple_enrichment_test.png">
<font size=2>Voisin V, Bader GD, Pathway and network Analysis course - slide 14 [Bioinformatics.ca](https://bioinformatics.ca/workshops/2018-pathway-and-network-analysis-of-omics-data/#course-material)</font>
---
<img src="./images/img_lecture6/fisher_exact1.png">
<font size=2>Voisin V, Bader GD, Pathway and network Analysis course - slide 16 [Bioinformatics.ca](https://bioinformatics.ca/workshops/2018-pathway-and-network-analysis-of-omics-data/#course-material)</font>
---
<img src="./images/img_lecture6/fisher_exact2.png">
<font size=2>Voisin V, Bader GD, Pathway and network Analysis course - slide 17 [Bioinformatics.ca](https://bioinformatics.ca/workshops/2018-pathway-and-network-analysis-of-omics-data/#course-material)</font>
---
<img src="./images/img_lecture6/fisher_exact3.png">
<font size=2>Voisin V, Bader GD, Pathway and network Analysis course - slide 18 [Bioinformatics.ca](https://bioinformatics.ca/workshops/2018-pathway-and-network-analysis-of-omics-data/#course-material)</font>
---
<img src="./images/img_lecture6/fisher_exact4.png">
<font size=2>Voisin V, Bader GD, Pathway and network Analysis course - slide 19 [Bioinformatics.ca](https://bioinformatics.ca/workshops/2018-pathway-and-network-analysis-of-omics-data/#course-material)</font>
---
## More about fisher exact test
* Video using M & M example with fisher exact test : https://www.youtube.com/watch?v=udyAvvaMjfM
* Great explanation on pathway commons site: https://www.pathwaycommons.org/guide/primers/statistics/fishers_exact_test/
---
<img src="./images/img_lecture6/difference_ora_gsea.png">
<font size=2>Voisin V, Bader GD, Pathway and network Analysis course - slide 21 [Bioinformatics.ca](https://bioinformatics.ca/workshops/2018-pathway-and-network-analysis-of-omics-data/#course-material)</font>
---
Check out this publication:
* Geistlinger L, Csaba G, Santarelli M, Ramos M, Schiffer L, Turaga N, Law C,
Davis S, Carey V, Morgan M, Zimmer R, Waldron L. Toward a gold standard for
benchmarking gene set enrichment analysis. Brief Bioinform. 2020 Feb 6 [PMID](https://www.ncbi.nlm.nih.gov/pubmed/32026945)
---
<img src="./images/img_lecture6/waldron_ora_methods.png">
<font size=2>Geistlinger L, Csaba G, Santarelli M, Ramos M, Schiffer L, Turaga N, Law C,Davis S, Carey V, Morgan M, Zimmer R, Waldron L. Toward a gold standard for
benchmarking gene set enrichment analysis. Brief Bioinform. 2020 Feb 6 [PMID](https://www.ncbi.nlm.nih.gov/pubmed/32026945)</font>
---
## Assignment #2
* differentail gene expression and preliminary ORA
* <font size=5> Due March 3, 2020! @ 20:00 </font>
## What to hand in?
* **html rendered RNotebook** - you should submit this through quercus
* Make sure the notebook and all associated code is checked into your github repo as I will be pulling all the repos at the deadline and using them to compile your code. - Your checked in code must replicate the handed in notebook.
* Document your work and your code directly in the notebook.
* **Reference the paper associated with your data!**
* **Introduce your paper and your data again**
* You are allowed to use helper functions or methods but make sure when you source those files the paths to them are relative and that they are checked into your repo as well.
</textarea>
<style data-target="print-only">@media screen {.remark-slide-container{display:block;}.remark-slide-scaler{box-shadow:none;}}</style>
<script src="https://remarkjs.com/downloads/remark-latest.min.js"></script>
<script>var slideshow = remark.create({
"highlightStyle": "github",
"highlightLines": true,
"highlightSpans": true,
"countIncrementalSlides": false
});
if (window.HTMLWidgets) slideshow.on('afterShowSlide', function (slide) {
window.dispatchEvent(new Event('resize'));
});
(function(d) {
var s = d.createElement("style"), r = d.querySelector(".remark-slide-scaler");
if (!r) return;
s.type = "text/css"; s.innerHTML = "@page {size: " + r.style.width + " " + r.style.height +"; }";
d.head.appendChild(s);
})(document);
(function(d) {
var el = d.getElementsByClassName("remark-slides-area");
if (!el) return;
var slide, slides = slideshow.getSlides(), els = el[0].children;
for (var i = 1; i < slides.length; i++) {
slide = slides[i];
if (slide.properties.continued === "true" || slide.properties.count === "false") {
els[i - 1].className += ' has-continuation';
}
}
var s = d.createElement("style");
s.type = "text/css"; s.innerHTML = "@media print { .has-continuation { display: none; } }";
d.head.appendChild(s);
})(document);
// delete the temporary CSS (for displaying all slides initially) when the user
// starts to view slides
(function() {
var deleted = false;
slideshow.on('beforeShowSlide', function(slide) {
if (deleted) return;
var sheets = document.styleSheets, node;
for (var i = 0; i < sheets.length; i++) {
node = sheets[i].ownerNode;
if (node.dataset["target"] !== "print-only") continue;
node.parentNode.removeChild(node);
}
deleted = true;
});
})();
// adds .remark-code-has-line-highlighted class to <pre> parent elements
// of code chunks containing highlighted lines with class .remark-code-line-highlighted
(function(d) {
const hlines = d.querySelectorAll('.remark-code-line-highlighted');
const preParents = [];
const findPreParent = function(line, p = 0) {
if (p > 1) return null; // traverse up no further than grandparent
const el = line.parentElement;
return el.tagName === "PRE" ? el : findPreParent(el, ++p);
};
for (let line of hlines) {
let pre = findPreParent(line);
if (pre && !preParents.includes(pre)) preParents.push(pre);
}
preParents.forEach(p => p.classList.add("remark-code-has-line-highlighted"));
})(document);</script>
<script>
(function() {
var links = document.getElementsByTagName('a');
for (var i = 0; i < links.length; i++) {
if (/^(https?:)?\/\//.test(links[i].getAttribute('href'))) {
links[i].target = '_blank';
}
}
})();
</script>
<script>
slideshow._releaseMath = function(el) {
var i, text, code, codes = el.getElementsByTagName('code');
for (i = 0; i < codes.length;) {
code = codes[i];
if (code.parentNode.tagName !== 'PRE' && code.childElementCount === 0) {
text = code.textContent;
if (/^\\\((.|\s)+\\\)$/.test(text) || /^\\\[(.|\s)+\\\]$/.test(text) ||
/^\$\$(.|\s)+\$\$$/.test(text) ||
/^\\begin\{([^}]+)\}(.|\s)+\\end\{[^}]+\}$/.test(text)) {
code.outerHTML = code.innerHTML; // remove <code></code>
continue;
}
}
i++;
}
};
slideshow._releaseMath(document);
</script>
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
var script = document.createElement('script');
script.type = 'text/javascript';
script.src = 'https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML';
if (location.protocol !== 'file:' && /^https?:/.test(script.src))
script.src = script.src.replace(/^https?:/, '');
document.getElementsByTagName('head')[0].appendChild(script);
})();
</script>
</body>
</html>