\chapter*{Introduction}\label{ch:intro}
\addcontentsline{toc}{chapter}{Introduction}
%\setlength{\epigraphwidth}{0.53\textwidth}
\setlength{\epigraphwidth}{0.62\textwidth}
%\epigraphhead[155]{%
\setlength{\epigraphrule}{0pt}
\epigraphhead[5]{%
\epigraph{\emph{I am not accustomed to saying anything with certainty after
only one or two observations.}}{Andreas Vesalius \mycite{omalley_1964}}
}
\vspace{-4mm}
Today, we have comprehensive knowledge %(extensive knowledge)
about the structure and functioning of the human body
at the macroscopic level\footnote{%
Human anatomy is nonetheless still being refined,
and new findings continue to be reported~\mycite{Kumar2019-oo}.}.
At the microscopic level, however,
the identification and mapping of macromolecules
(\eg\ \glspl{RNA}, proteins),
their function and whereabouts still need to be refined.\mybr\
Beyond the invaluable addition to our knowledge,
more practical reasons also sustain the effort
to build human expression atlases.
Genes with specific behaviour in particular conditions
are a convenient starting point
for designing new diagnostic tests
and discovering new effective drug targets.
Besides, a robust atlas for non-diseased tissue baseline expression
will allow a better understanding of unperturbed physiology.
It can also serve as a reference in studies where controls are unavailable
or hard to sample,
which is generally the case in cancer research.\mybr\
The completed and annotated human genome and
technological advances in high-throughput expression studies
have opened the way towards this new milestone.
The recent explosion of high-throughput transcriptomic atlases in the literature
is evidence of the community's shared interest.
Examples include expression atlases of
mouse~\mycite{Wu2009-lw,Ringwald2012}, pig~\mycite{Freeman2012},
sheep~\mycite{Clark2017-mw}, plants such as maize~\mycite{Stelpflug2016-sm},
vigna~\mycite{Yao2016-se} and pigeon pea~\mycite{Pazhamala2017-ig},
and parasites, \eg\ \species{Schistosoma mansoni}.
Many focus on mapping human gene expression, either as a whole,
see~\eg{} \mycite{Krupp2012,Jimenez-Lozano2012,Uhlen2015,GTEx2013}
or for specific aspects,
\eg\ the organogenesis in human embryos~\mycite{Gerrard2016-zu}.\mybr\
There are strong incentives to develop these atlases with
transcriptome shotgun sequencing (\ie\ \Rnaseq).
The technology involved in similar older projects\footnote{%
E.g.\ Gene Expression Atlas for Human Embryogenesis~\mycite{Yi2010-az},
Atlas of human primary cells~\mycite{Mabbott2013-xf},
Gene atlas of mouse and human protein-encoding transcriptomes
(now hosted by \soft{BioGPS}) \mycite{Su2004-kc},
Allen Brain Atlas~\mycite{Hawrylycz2012-mx}.}
has been shown to generate highly variable data~\mycite{Rung2013-ul}
that is challenging to integrate
(even when produced by the same laboratory
on the same platform)~\mycite{Walsh2015-nf}.
When I started my doctorate,
little was known about the interstudy robustness of \Rnaseq.
However, it had already shown less background noise and
a wider dynamic range of detection
than the array-based technology (microarray) used
in previous expression studies~\mycite{rnaseq-2009}.
\Rnaseq\ also has the added advantage of enabling the discovery of new transcripts
as it does not rely on previous knowledge~\mycite{rnaseq-2009}.\mybr\
Considering the growing number of studies referring to these atlases\footnote{%
More than 2,800 papers for the five human primary studies
(presented in \Cref{sec:rnaseq-data}) on 15 September 2019.
}
or the numerous efforts to compile them into new resources for the community ---
\eg\
\hFoCi{TISSUES}{https://tissues.jensenlab.org/Search}{Santos2015-rj},
\hFoCi{Harmonizome}{https://amp.pharm.mssm.edu/Harmonizome}{Harmonizome}
and, after reprocessing the raw data,
\hFoCi{Expression Atlas}{https://www.ebi.ac.uk/gxa/home}{EBIgxa}
---
assessing the consistency of the results from one study to another
has become paramount.
The recently reported reproducibility crisis in science \mycite{Begley2012-hm,%
Fatovich2017-lo,Lindner2018-qy,Lyu2018-ze}
has only further underlined this need.\mybr\
Given this context,
the first aim of my doctorate was
to examine the consistency of the expression landscape
of (non-diseased) human tissues
across independent large-scale transcriptomic studies.
Then, with the publication of the first drafts of the
human proteome~\mycite{PandeyData,KusterData},
my aims expanded to the integration of human tissue expression data
across different datasets and biological layers.\mybr\
\vspace{-2mm}
\section*{Outline of this thesis}
\vspace{-3mm}
First, I review the biological, chemical, experimental and computational
background of my doctoral work in \Cref{ch:background}.\mybr\
Then, in \Cref{ch:datasets},
I present the five transcriptomic (\Rnaseq)
and the three proteomic (\glsxtrshort{MS}) datasets
that I have preselected for my analysis.
I also describe the bioinformatic pipelines
that automate the processing of these large-scale datasets.\mybr\
After considering several possible sources of bias
and strategies to limit them in \Cref{ch:expression},
I compare and integrate the independent transcriptomic datasets
in \Cref{ch:Transcriptomics}.
Following an assessment of the consistency of the proteomic findings
in \Cref{ch:proteomics},
I then employ different approaches for integrating
transcriptomics and proteomics in \Cref{ch:Integration}.\mybr\
Finally, I close this thesis with a few remarks in \Cref{ch:conclusion}.\mybr\