index.Rmd

---
title:
author: 
date: 
site: bookdown::bookdown_site
output: 
  bookdown::pdf_book: 
    dev: png
    toc : true
    toc_depth: 4
    keep_tex : yes
    includes:
        in_header: Descartes.tex
        before_body: titre-doctorat.tex
mainfont : Montserrat
lof: yes
lot: yes        
documentclass: book
fontsize: 12pt
bibliography: [01-Interdisciplinarity.bib, 01-Intro.bib, 02-MathIntro.bib, packages.bib, UCzcite.bib, lva.bib, deconica.bib, data.bib, references.bib, gloss.bib]
biblio-style: "plainnat"
link-citations: true
colorlinks: true
geometry: "marginpar=2cm, top=3cm, bottom=4cm"
description: "PhDThesis"
linestretch: 1
---

<!-- # Prerequisites -->

<!-- This is a _sample_ book written in **Markdown**. You can use anything that Pandoc's Markdown supports, e.g., a math equation $a^2 + b^2 = c^2$. -->

<!-- For now, you have to install the development versions of **bookdown** from Github: -->

<!-- ```{r eval=FALSE} -->
<!-- devtools::install_github("rstudio/bookdown") -->
<!-- ``` -->

<!-- Remember each Rmd file contains one and only one chapter, and a chapter is defined by the first-level heading `#`. -->

<!-- To compile this example to PDF, you need to install XeLaTeX. 

# ```{r echo=TRUE}
# knitr::opts_knit$get("rmarkdown.pandoc.to")
# ```

# ```{r GlobalOptions}
# options(knitr.duplicate.label = 'allow')
# ```
-->
```{r include=FALSE}
library(kableExtra)
library(magick)
```

```{r include=FALSE}
# automatically create a bib database for R packages
knitr::write_bib(c(
  .packages(), 'bookdown', 'knitr', 'rmarkdown'
), 'packages.bib')

#optimizing R images output
# knitr::knit_hooks$set()
# knitr::knit_hooks$set(pngquant = '--speed=1')
knitr::opts_chunk$set(optipng = '-o7',pngquant = '--speed=1')
```


```{r setup, include=FALSE}
is_pdf_output = function() {
  knitr::opts_knit$get("rmarkdown.pandoc.to") =="latex"
}
```

```{r setupup, include=FALSE}
is_html_output = function() {
  knitr::opts_knit$get("rmarkdown.pandoc.to") =="html4"
}
```

```{r setsize, include=FALSE}
if(is_pdf_output()){
  size.fig = "70%"
} else {
  size.fig = "100%"
}


```

```{r index, echo=FALSE, results='asis', eval=is_html_output()}
cat("# **Unsupervised deconvolution of bulk omics profiles: methodology and application to characterize the immune landscape in tumors**{-}

by [Urszula Czerwinska](http://urszulaczerwinska.github.io)

INTERDISCIPLINARY Ph.D. thesis

Supervised by Andrei Zinovyev and  Vassili Soumelis

See the website [urszulaczerwinska.github.io](http://urszulaczerwinska.github.io) to learn more about my professional projects.\n\n

# Resumé {-}

Les tumeurs sont entourées d’un microenvironnement complexe comprenant des cellules tumorales, des fibroblastes et une diversité de cellules immunitaires. Avec le développement actuel des immunothérapies, la compréhension de la composition du microenvironnement tumoral est d'une importance critique pour effectuer un pronostic sur la progression tumorale et sa réponse au traitement. Cependant, nous manquons d'approches quantitatives fiables et validées pour caractériser le microenvironnement tumoral, facilitant ainsi le choix de la meilleure thérapie.

Une partie de ce défi consiste à quantifier la composition cellulaire d'un échantillon tumoral (appelé problème de déconvolution dans ce contexte), en utilisant son profil omique de masse (le profil quantitatif global de certains types de molécules, tels que l'ARNm ou les marqueurs épigénétiques). La plupart des méthodes existantes utilisent des signatures prédéfinies de types cellulaires et ensuite extrapolent cette information à des nouveaux contextes. Cela peut introduire un biais dans la quantification de microenvironnement tumoral dans les situations où le contexte étudié est significativement différent de la référence.

Sous certaines conditions, il est possible de séparer des mélanges de signaux complexes, en utilisant des méthodes de séparation de sources et de réduction des dimensions, sans définitions de sources préexistantes. Si une telle approche (déconvolution non supervisée) peut être appliquée à des profils omiques de masse de tumeurs, cela permettrait d'éviter les biais contextuels mentionnés précédemment et fournirait un aperçu des signatures cellulaires spécifiques au contexte.

Dans ce travail, j’ai développé une nouvelle méthode appelée DeconICA (Déconvolution de données omiques de masse par l'analyse en composantes immunitaires), basée sur la méthodologie de séparation aveugle de source. DeconICA a pour but l'interprétation et la quantification des signaux biologiques, façonnant les profils omiques d'échantillons tumoraux ou de tissus normaux, en mettant l'accent sur les signaux liés au système immunitaire et la découverte de nouvelles signatures.

Afin de rendre mon travail plus accessible, j'ai implémenté la méthode DeconICA en tant que librairie R. En appliquant ce logiciel aux jeux de données de référence, j'ai démontré qu’il est possible de quantifier les cellules immunitaires avec une précision comparable aux méthodes de pointe publiées, sans définir a priori des gènes spécifiques au type cellulaire. DeconICA peut fonctionner avec des techniques de factorisation matricielle telles que l'analyse indépendante des composants (ICA) ou la factorisation matricielle non négative (NMF).

Enfin, j’ai appliqué DeconICA à un grand volume de données : plus de 100 jeux de données, contenant au total plus de 28 000 échantillons de 40 types de tumeurs, générés par différentes technologies et traités indépendamment. Cette analyse a démontré que les signaux immunitaires basés sur l'ICA sont reproductibles entre les différents jeux de données. D’autre part, nous avons montré que les trois principaux types de cellules immunitaires, à savoir les lymphocytes T, les lymphocytes B et les cellules myéloïdes, peuvent y être identifiés et quantifiés.

Enfin, les métagènes dérivés de l'ICA, c’est-à-dire les valeurs de projection associées à une source, ont été utilisés comme des signatures spécifiques permettant d’étudier les caractéristiques des cellules immunitaires dans différents types de tumeurs. L'analyse a révélé une grande diversité de phénotypes cellulaires identifiés ainsi que la plasticité des cellules immunitaires, qu’elle soit dépendante ou indépendante du type de tumeur. Ces résultats pourraient être utilisés pour identifier des cibles médicamenteuses ou des biomarqueurs pour l'immunothérapie du cancer.\n\n
# Summary {-}
Tumors are engulfed in a complex microenvironment (TME) including tumor cells, fibroblasts, and a diversity of immune cells. Currently, a new generation of cancer therapies based on modulation of the immune system response is in active clinical development with first promising results. Therefore, understanding the composition of TME in each tumor case is critically important to make a prognosis on the tumor progression and its response to treatment. However, we lack reliable and validated quantitative approaches to characterize the TME in order to facilitate the choice of the best existing therapy. 

One part of this challenge is to be able to quantify the cellular composition of a tumor sample (called deconvolution problem in this context), using its bulk omics profile (global quantitative profiling of certain types of molecules, such as mRNA or epigenetic markers). In recent years, there was a remarkable explosion in the number of methods approaching this problem in several different ways. Most of them use pre-defined molecular signatures of specific cell types and extrapolate this information to previously unseen contexts. This can bias the TME quantification in those situations where the context under study is significantly different from the reference.

In theory, under certain assumptions, it is possible to separate complex signal mixtures, using classical and advanced methods of source separation and dimension reduction, without pre-existing source definitions. If such an approach (unsupervised deconvolution) is feasible to apply for bulk omic profiles of tumor samples, then this would make it possible to avoid the above mentioned contextual biases and provide insights into the context-specific signatures of cell types.

In this work, I developed a new method called DeconICA (Deconvolution of bulk omics datasets through Immune Component Analysis), based on the blind source separation methodology. DeconICA has an aim to decipher and quantify the biological signals shaping omics profiles of tumor samples or normal tissues. A particular focus of my study was on the immune system-related signals and discovering new signatures of immune cell types. 

In order to make my work more accessible, I implemented the DeconICA method as an R package named “DeconICA”.  By applying this software to the standard benchmark datasets, I demonstrated that DeconICA is able to quantify immune cells with accuracy comparable to published state-of-the-art methods but without a priori defining a cell type-specific signature genes. The implementation can work with existing deconvolution methods based on matrix factorization techniques such as Independent Component Analysis (ICA) or Non-Negative Matrix Factorization (NMF).

Finally, I applied DeconICA to a big corpus of data containing more than 100 transcriptomic datasets composed of, in total, over 28000 samples of 40 tumor types generated by different technologies and processed independently. This analysis demonstrated that ICA-based immune signals are reproducible between datasets and three major immune cell types: T-cells, B-cells and Myeloid cells can be reliably identified and quantified. 

Additionally, I used the ICA-derived metagenes as context-specific signatures in order to study the characteristics of immune cells in different tumor types. The analysis revealed a large diversity and plasticity of immune cells dependent and independent on tumor type. Some conclusions of the study can be helpful in identification of new drug targets or biomarkers for immunotherapy of cancer.

\n\n
# Acknowledgements {-}
I would like to thank my supervisors Andrei Zinovyev and Vassili Soumelis for guiding this project and enabling me to interact with their teams and sharing the resources. 
I would also like to thank the U900 lab and his head Emmanuel Barillot to generously equip me with the professional environment, the place and the tools.

I address my gratitude to the TAC committee members Franck Pagès and Denis Thieffry for helping me organizing the jury and giving constructive comments along with my thesis, for being present, at least remotely despite severe weather conditions or travels. 

I would also like to express gratitude towards the jury for taking the time to assess this work.

This is also the place to thank the ITMO Cancer - AVIESAN for funding my Ph.D. scholarship and the Pharmacology Faculty of Paris Descartes and specifically Prof. Chantal Guihenneuc for giving me the opportunity to teach in parallel of my Ph.D. as a part of her pedagogical team.  I would also thank a lot Center of Interdisciplinary Research for equipping me with unusual skills through numerous courses and Bettencourt Foundation for financing part of the training and sponsoring travel expenses. Special thanks to FdV coordinators: Sofie Leon, David Manset, Elodie Kaslikowski and Maria Molina Calvita for their availability and dynamism. Also, I would express my gratitude for supporting my application for French nationality to François Taddei, director of the FdV Ph.D. school.

Thanks to all people I worked with in both teams: Arnau, Pauline, Gaelle, Paul, Cristobal, Luca, Laura, Laurence, Loredana, Jonas, Floriane, Maude, Philemon, Lilith, Paula, Mihaly, Louis, Caroline. To my FdV mates: Roberta, Juanma, Miza, Guillermo, Aamir and others. To other Ph.D. students of the unit I got along with: Peter, Jo, Hector and Benoît.

This work would never be possible without help and patience of my family, my partner Arnaud and his family. Especially, I would like to thank Arnaud, who managed to be with me on the daily basis, spent the endless hours correcting my writing and speaking, discussed about the code and good practices, was making me laugh when I was coming home tired, angry or unmotivated, and just for being him adorable self. This thesis is dedicated to his father that will not be able anymore to profit from any breakthrough in cancer research. 

I am proud to finish the thesis and face new professional adventures. I learned a lot about myself during this three years. I would like to thank very much everyone whom I crossed on this path. I hope to meet you again one day.

")


```