Merge pull request #8 from TCP-Lab/2bioRxiv
Pre-preprint changes
Feat-FeAR authored Jul 18, 2023
2 parents 144f5b3 + df8ec15 commit 1a28577
Showing 6 changed files with 158 additions and 132 deletions.
12 changes: 6 additions & 6 deletions paper/src/main.tex
@@ -5,9 +5,9 @@
\usepackage{geometry}
\geometry{
a4paper,
total={170mm,257mm},
left=20mm,
top=20mm,
total={150mm,217mm},
left=30mm,
top=40mm,
}

% Fonts
@@ -183,11 +183,11 @@
\usepackage{fancyhdr}
\pagestyle{fancy}
\fancyhead[]{}
\fancyhead[LO]{\color{red}Draft generated from source code on \today}
%\fancyhead[LO]{\color{red}Draft generated from source code on \today}
\renewcommand{\headrulewidth}{0pt} % no line in header area
\fancyfoot{} % clear all footer fields
\fancyfoot[RO]{\thepage} % page number in "outer" position of footer line
\fancyfoot[LO]{\color{red}Draft generated from source code on \today}
%\fancyfoot[LO]{\color{red}Draft generated from source code on \today}

% Better abstracts
\usepackage{abstract}
@@ -196,7 +196,7 @@
% Better Tables
\usepackage{tabularx}

% My proposal for a title: The transcriptional landscape of the Transportome in Cancer
% Title and authors
\title{\normalfont Profiling the Expression of Transportome Genes in Cancer: A Systematic Approach}
\author[*]{Luca Visentin\footnote{L.V., [email protected]}}
\author[**]{Giorgia Scarpellino\footnote{G.S., [email protected]}}
19 changes: 10 additions & 9 deletions paper/src/resources/glossary.tex
@@ -1,19 +1,20 @@
\newacronym{mtpdb}{MTP-DB}{Membrane Transport Protein Database}
\newacronym{mtpdb}{MTP-DB}{Membrane Transport Protein DataBase}
\newacronym{hgnc}{HGNC}{HUGO Gene Nomenclature Committee}
\newacronym{iuphar}{IUPHAR}{International Union of basic and clinical Pharmacology}
\newacronym{tcdb}{TCDB}{Transporter Classification Database}
\newacronym{cosmic}{COSMIC}{Catalogue of Somatic Mutations in Cancer}
\newacronym{slc}{SLC}{Solute Carrier}
\newacronym{iuphar}{IUPHAR}{International Union of basic and clinical PHARmacology}
\newacronym{tcdb}{TCDB}{Transporter Classification DataBase}
\newacronym{cosmic}{COSMIC}{Catalogue Of Somatic Mutations In Cancer}
\newacronym{slc}{SLC}{SoLute Carrier}
\newacronym[
longplural={Ion Channels and Transporters}
]{ict}{ICT}{Ion Channel and Transporter}
\newacronym{gsea}{GSEA}{Gene Set Enrichment Analysis}
\newacronym{tgl}{TGL}{Transporter Gene List}
\newacronym{tgs}{TGS}{Transporter Gene Set}
\newacronym{tcga}{TCGA}{The Cancer Genome Atlas}
\newacronym{gtex}{GTEx}{Genotype-Tissue Expression}
\newacronym{pdac}{PDAC}{Pancreatic Ductal Adenocarcinoma}
\newacronym{pdac}{PDAC}{Pancreatic Ductal AdenoCarcinoma}
\newacronym{geo}{GEO}{Gene Expression Omnibus}
\newacronym{nes}{NES}{Normalized Enrichment Score}
\newacronym{go}{GO}{Gene Onthology}
\newacronym{go}{GO}{Gene Ontology}
\newacronym{abc}{ABC}{ATP-Binding Cassette}
\newacronym{fu}{FU}{functional unit}
\newacronym{tfu}{TFU}{Transport Functional Unit}
\newacronym{dea}{DEA}{Differential Expression Analysis}
19 changes: 12 additions & 7 deletions paper/src/sections/009_abstract.tex
@@ -1,10 +1,15 @@
\setcounter{footnote}{0} % Reset footnote counter

\begin{abstract}
\large
The transportome, the -omic layer encompassing all ion channels and transportes (ICTs), is crucial for cell physiology.
The transportome, the \textit{-omic} layer encompassing all \glspl{ict}, is crucial for cell physiology.
It is therefore reasonable to hypothesize a role of the transportome in disease, and in particular in cancer.
Here, we present the Membrane Transport Protein database (MTP-db), a database collecting information on ICTs, and a pipeline to take expression data and the MTP-db as input and produce a broad overview over the disregulated transportome in cancer.
The MTP-DB may prove useful for the study of the transportome in general, and the pipeline may be used to study the transportome in other diseases.
Both tools are open source and can be found on Github at \href{https://github.com/TCP-Lab/MTP-db}{TCP-Lab/mtp-db} and \href{https://github.com/TCP-Lab/transportome_profiler}{TCP-Lab/transportome\_profiler}, under permissive licenses.
We detect that the transportome is disregulated in cancer, and that dysregulation patterns are shared between different cancer types.
It is still unclear how these disregulation patterns are linked with cancer physiology.
\end{abstract}
Here, we present the \gls{mtpdb}, a database collecting information on \glspl{ict}, and a pipeline that takes expression data and the \gls{mtpdb} as input to produce a broad overview of transportome dysregulation in cancer.
The \gls{mtpdb} may prove useful for the study of the transportome in general, and the pipeline may be used to study the transportome in other diseases.
Both tools are open source and can be found on GitHub at \href{https://github.com/TCP-Lab/MTP-db}{TCP-Lab/mtp-db} and \href{https://github.com/TCP-Lab/transportome_profiler}{TCP-Lab/transportome\_profiler}, under permissive licenses.
We detect that the transportome is dysregulated in cancer, and that dysregulation patterns are shared among different cancer types.
It is still unclear how these patterns are linked to cancer pathophysiology.
\end{abstract}

\setcounter{footnote}{0} % Reset footnote counter again
\glsresetall % To make glossary forget about the abstract.
88 changes: 48 additions & 40 deletions paper/src/sections/010_introduction.tex
@@ -1,58 +1,66 @@
\section{Introduction}

% Ok so I used chatgpt a bit for this, mainly to rewrite some paragraphs that were hard to read in my first draft, and ho boy it's very good at its job.
In each kingdom of life, the diffusion of substances between the intracellular and extracellular environments is essential for cell survival.
Apart from the free diffusion of small lipophilic molecules, these exchanges are mediated by transmembrane proteins of various kinds, which we refer to as \textit{transmembrane transport proteins}; the \textit{-omic} layer that comprises all of them is referred to as the \textit{transportome}.

The movement of molecules between the intracellular and extracellular environments is essential for cell survival.
This movement is due, in part, by transmembrane proteins of various nature, which we refer to with the term \textit{transmembrane transporters}.
The -omic layer that comprises the transmembrane transporters is referred to as the \textit{transportome}.
The coordinated action of these proteins regulates a large number of physiological functions, such as membrane potential, nutrient absorption, waste product removal, cellular signaling, regulation of intra- and extracellular pH, and more.

The coordinated action of these proteins regulates a large number of physiological functions, such as membrane potential, nutrient absorption, waste product removal, cellular signalling, regulation of intracellular and extracellular pH, and more.
While the expression of these proteins and their proper targeting are necessary prerequisites for membrane transport, they are generally not sufficient for the fulfillment of the overall function.
In fact, the establishment of \glspl{tfu}, which are capable of performing specific tasks, often requires the assembly of multiple protein subunits to form homo- or heteromultimers, or even long-range interactions among different transmembrane transport proteins.
For instance, no secondary active transporter can function properly in the absence of a pump that creates the chemical gradient to be dissipated.
Another example is given by the communication between \textit{STIM} calcium sensors and \textit{Orai} channels, responsible for the calcium release-activated calcium currents.

While the expression of these proteins is necessary for membrane transport, it is not sufficient to accomplish the overall function effectively.
Indeed, the formation of \glspl{fu}, which are capable of performing specific tasks, often requires the interaction of transport proteins with accessory proteins or other complex-forming proteins.
The cataloging and characterization of \glspl{tfu} can be a complex task, especially due to current limitations in proteomics research and the still low-throughput nature of the biophysical techniques for functional assays.
Currently, according to a widely accepted approximation, the individual \gls{tfu} is identified with the gene transcript of one or more of its subunits, assuming transcriptional levels to be directly proportional to the abundance and activity of the \glspl{tfu} they encode.
Although this approximation may have many limitations (poor correlation between transcripts and proteins, lack of functional assessment, unclear role of auxiliary subunits, \ldots), it serves as a practical approach until more precise methods for \gls{tfu} quantification and characterization become available.
% The last issue can be ameliorated under the hypothesis that transcriptional dysregulation of even one of the subunits of which it is composed may impair the functionality of the entire macromolecular complex.

The cataloging and characterization of \glspl{fu} can be a complex task, especially due to current limitations in proteomics research.
However, advancements in technology and methodologies are continuously being made, so it is highly likely that the characterization of \glspl{fu} will become more precise in the future.

We can broadly classify \glspl{fu} into two main categories: \textit{pores} and \textit{transporters}.
According to the schema in Figure \ref{fig:BasicTree}, transport proteins can be broadly classified into two main categories: \textit{pores} and \textit{transporters}.
\begin{itemize}
\item Pores: water-filled pores that allow the facilitated passage of molecules through the membrane.
These may be additionally subdivided into \textit{channels} proper and \textit{aquaporins}, pores that mostly allow the passage of water.
\item Transporters: proteins or protein complexes that, embedded in the membrane, allow passage of molecules thanks to conformational changes.
These may be furthermore divided into those that require ATP hydrolysis to function and those that do not as \textit{ATP-driven transporters} and \textit{solute carriers}, respectively.
\begin{itemize}
\item A common distinction in atp-driven transporters is given by \textit{\gls{abc} transporters} and \textit{pumps}, where \gls{abc} transporters feature the conserved \gls{abc} subunit, while pumps do not.
\end{itemize}
\item \textbf{Pores:} water-filled pores that allow the facilitated passage of molecules through the membrane.
These may be additionally subdivided into \textit{ion channels} proper and \textit{aquaporins}, pores that mostly allow the passage of water.
\item \textbf{Transporters:} proteins or protein complexes that, embedded in the membrane, allow the passage of molecules upon conformational changes.
These may be further divided into those that require ATP hydrolysis to function and those that do not, known as \textit{ATP-driven (or primary active) transporters} and \textit{\glspl{slc}}, respectively.
A common distinction within the ATP-driven transporters is given by \textit{\gls{abc} transporters} and \textit{pumps}, where \gls{abc} transporters feature the conserved \gls{abc} domain, while pumps do not.
\end{itemize}
Overall, the phrase ``\glspl{ict}'' is now commonly used to refer to the set of all the transportome gene products that fall into one of these macro-categories.

Currently, a widely accepted approximation for defining functional units is the gene transcript.
It is assumed that the presence and quantity of gene transcripts are directly proportional to the abundance and activity of the corresponding \glspl{fu}.
Although this approximation has limitations, it serves as a practical approach until more precise methods for \gls{fu} characterization become available.

Cancer cells differ fundamentally in respect with their healthy counterparts, especially in their relationship with the extracellular environment.
Dramatic metabolic shifts are also seen in cancer.
Both of these aspects probably involve an alteration in transportome \glspl{fu}, and, by the above approximation, may be potentially reflected in the expression levels of the same genes.
It is therefore of interest to study the expression levels of transportome genes in cancer cells.
\begin{figure}
\centering
\includegraphics[width=0.65\textwidth]{resources/images/BasicTree.pdf}
\caption{\small Tree of the principal classes of \glspl{ict}.}
\label{fig:BasicTree}
\end{figure}

One commonly employed approach to accomplish this is by measuring gene expression in both healthy and diseased samples.
This is followed by performing differential expression analysis and enriching the resulting gene list with ontologies such as the \gls{go}, using tests like Fisher's exact test.
Cancer cells exhibit fundamental differences from their healthy counterparts, especially in their relationship with the extracellular environment.
Dramatic metabolic shifts are also observed in cancer.
Both of these aspects probably involve an alteration in the transportome, which acts as an ``adapter''\footnote{Literally in the sense of \textit{mediator of the adaptation} towards the extracellular environment.} between cancer cells and the tumor microenvironment, while at the same time ensuring the exchange of nutrients and metabolites capable of sustaining the altered metabolism.
By the above approximation, this transportome dysregulation may be reflected in the expression levels of the genes themselves, making it interesting to explore the transcriptional profile of cancer cells.

The effectiveness of this process relies on the careful curation and organization of the \gls{go} database.
% Next two paragraphs could even be moved to Discussion
One commonly employed approach to accomplish this task is to measure gene expression in both healthy and diseased samples.
This is followed by \gls{dea} and enrichment analysis of the resulting gene list using ontologies, such as the \gls{go}, and hypothesis tests such as the hypergeometric test or Fisher's exact test.
Finally, the list of all the significantly enriched terms needs to be screened \textit{a posteriori} to search for transportome-related terms.
The effectiveness of this process heavily relies on both the statistical power reached by the \gls{dea} and the careful curation and organization of the ontology database.
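
% Illustrative sketch: the symbols N, K, n, k are introduced here for illustration only and are not defined elsewhere in the manuscript.
As a minimal sketch of the test involved, assume a background of $N$ genes, of which $K$ are annotated to a given term, and a list of $n$ differentially expressed genes, of which $k$ carry that annotation (these symbols are introduced here purely for illustration).
The one-sided hypergeometric $p$-value for over-representation of the term is then
\[
    P(X \ge k) = \sum_{i=k}^{\min(n,K)} \frac{{K \choose i}{N-K \choose n-i}}{{N \choose n}},
\]
which coincides with the one-sided Fisher's exact test on the corresponding $2 \times 2$ contingency table.
Only counts enter this expression, so the magnitude of each gene's differential expression is discarded once the list has been thresholded.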

A similar, but "reversed" approach is to generate gene lists of interest and then test them against the data to see if the list is differentially expressed or not, for example with the \gls{gsea} method.
Assuming that these gene lists meaningfully group together genes by their functional role (i. e. group together similar \glspl{fu}), we have a few advantages:
A similar, but somewhat ``reversed'', approach is to generate a limited number of gene sets of interest \textit{a priori}, which meaningfully group together genes belonging to, for instance, similar \glspl{tfu}.
Then, a \gls{gsea} can be performed to test them against the data and see whether the function or the gene family they represent is dysregulated.
This second option has a few advantages:
\begin{itemize}
\item The weigthed \gls{gsea} method can take into account the magnitude of differential expression of the genes in the list, and not just their presence or absence.
\item The tested gene lists may be arbitrary and not necessarily be based on ontologies. For example, they may be manually curated, specifically crafted for a purpose (such as a list of genes involved in a specific function of interest), or generated by other methods.
\item Given a set of characteristics, it is possible to systematically generate all gene lists that may be meaningfully created, and test them all against the data.
    \item The weighted \gls{gsea} method can take into account the magnitude of differential expression of the genes in the list (i.e., the effect size), and not merely their presence or absence.
\item The tested gene lists may be arbitrary and not necessarily based on ontologies.
For example, they may be manually curated, specifically crafted for a purpose (such as a list of genes involved in a specific function or pathway of interest), or generated by other methods.
\item Given a set of features, it is possible to systematically generate all the gene sets that may be meaningfully conceived, and test them all against the data.
\end{itemize}
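
% Illustrative sketch: the notation (N, N_H, N_R, r_j, p) is an assumption introduced here, not taken from the manuscript or from the pipeline's code.
To make the first of these points concrete, the weighted enrichment score can be sketched as follows, in the usual formulation of the method and with notation introduced here for illustration only: $N$ genes ranked by a metric $r_j$, a gene set $S$ containing $N_H$ of them, and a weighting exponent $p$.
Walking down the ranked list up to position $i$, one accumulates
\[
    P_{\mathrm{hit}}(S,i) = \sum_{g_j \in S,\; j \le i} \frac{|r_j|^{p}}{N_R},
    \qquad
    N_R = \sum_{g_j \in S} |r_j|^{p},
\]
\[
    P_{\mathrm{miss}}(S,i) = \sum_{g_j \notin S,\; j \le i} \frac{1}{N - N_H},
\]
and the enrichment score $\mathrm{ES}(S)$ is the maximum deviation from zero of $P_{\mathrm{hit}}(S,i) - P_{\mathrm{miss}}(S,i)$ over all positions $i$.
With $p = 0$ every hit contributes equally, and the score reduces to a Kolmogorov--Smirnov-like statistic that only registers set membership; with $p > 0$ (commonly $p = 1$), genes with a larger $|r_j|$ contribute more to the running sum.
Which value of $p$ a given analysis uses is a choice of the implementation and is not implied by this sketch.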

The present work aims at profiling the expression levels of transportome genes in the context of cancer.
To do this, we collected information on these genes (such as their complete list, which molecule(s) they transport, their gating mechanism, their functional class, etc.) and used it to systematically arrange them into meaningful \glspl{tgl}.
After sorting all the protein-coding genes found in cancer cells based on their differential expression with respect to healthy cells, we run a pre-ranked \gls{gsea} on these ordered lists to obtain enrichment scores for every \gls{tgl}.
We therefore obtained the "deregulation status" of most functional facets of the transportome in 19 different cancer tissue types.
The present work aims at profiling the expression levels of transportome genes in the context of human cancer.
To do this, we collected information on these genes (such as their complete list, which molecule(s) they transport, their gating mechanism(s), their functional class, etc.) and used it to systematically arrange them into meaningful \glspl{tgs}.
After sorting all the protein-coding genes found in cancer cells based on their differential expression with respect to healthy cells, we ran a pre-ranked \gls{gsea} on these ordered lists to obtain enrichment scores for every \gls{tgs}.
We therefore obtained the ``dysregulation status'' of most functional facets of the transportome in $19$ different cancer tissue types.

We provide an open-source, documented and reproducible Python package, Daedalus (\href{https://github.com/CMA-Lab/MTP-DB}{github.com/CMA-Lab/MTP-DB}) that retrieves transportome-related data from various databases and compiles it in a local \mono{.sqlite} database, and we provide pre-compiled database files as periodic releases.
We provide an open-source, documented, and reproducible Python package, Daedalus (\href{https://github.com/CMA-Lab/MTP-DB}{github.com/CMA-Lab/MTP-DB}), that retrieves transportome-related data from various databases and compiles it in a local \mono{.sqlite} database.
In parallel, we also provide pre-compiled database files as periodic releases.

We additionally provide a \mono{make}-driven and docker-containerized pipeline, dubbed "transportome profiler", (\href{https://github.com/CMA-Lab/transportome_profiler}{github.com/CMA-Lab/transportome\_profiler}) that takes gene expression data and the aforementioned database to generate gene lists, sort genes based on their differential expression and run \gls{gsea}.
A \mono{make}-driven and Docker-containerized pipeline, named ``transportome profiler'' (\href{https://github.com/CMA-Lab/transportome_profiler}{github.com/CMA-Lab/transportome\_profiler}), is also available.
It takes gene expression data and the aforementioned database to generate gene sets, sorts genes based on their differential expression, and runs \gls{gsea}.
This pipeline was designed with modularity and reproducibility in mind, so that it can be easily adapted to other datasets and databases.