From d181048da1e00de3521fcba6757453e81cb3cce2 Mon Sep 17 00:00:00 2001
From: Luca Visentin
Date: Tue, 18 Jul 2023 14:54:14 +0200
Subject: [PATCH] fix: Added a reference to xsv + fixed boundary box violation

---
 paper/src/resources/bibliography.bib    | 11 +++++++++++
 paper/src/sections/010_introduction.tex |  2 +-
 paper/src/sections/020_MnMs.tex         |  2 +-
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/paper/src/resources/bibliography.bib b/paper/src/resources/bibliography.bib
index e1451ea..5baff81 100644
--- a/paper/src/resources/bibliography.bib
+++ b/paper/src/resources/bibliography.bib
@@ -260,3 +260,14 @@ @article{thegeneontologyconsortiumGeneOntologyKnowledgebase2023
   abstract = {The Gene Ontology (GO) knowledgebase (http://geneontology.org) is a comprehensive resource concerning the functions of genes and gene products (proteins and noncoding RNAs). GO annotations cover genes from organisms across the tree of life as well as viruses, though most gene function knowledge currently derives from experiments carried out in a relatively small number of model organisms. Here, we provide an updated overview of the GO knowledgebase, as well as the efforts of the broad, international consortium of scientists that develops, maintains, and updates the GO knowledgebase. The GO knowledgebase consists of three components: (1) the GO\textemdash a computational knowledge structure describing the functional characteristics of genes; (2) GO annotations\textemdash evidence-supported statements asserting that a specific gene product has a particular functional characteristic; and (3) GO Causal Activity Models (GO-CAMs)\textemdash mechanistic models of molecular ``pathways'' (GO biological processes) created by linking multiple GO annotations using defined relations. Each of these components is continually expanded, revised, and updated in response to newly published discoveries and receives extensive QA checks, reviews, and user feedback. For each of these components, we provide a description of the current contents, recent developments to keep the knowledgebase up to date with new discoveries, and guidance on how users can best make use of the data that we provide. We conclude with future directions for the project.},
   file = {/home/hedmad/Zotero/storage/2HRX6I4K/The Gene Ontology Consortium et al. - 2023 - The Gene Ontology knowledgebase in 2023.pdf;/home/hedmad/Zotero/storage/AGWTTS2G/7068118.html}
 }
+
+@misc{gallantBurntSushiXsv2023,
+  title = {{{BurntSushi}}/Xsv},
+  author = {Gallant, Andrew},
+  year = {2023},
+  month = jul,
+  urldate = {2023-07-18},
+  abstract = {A fast CSV command line toolkit written in Rust.},
+  copyright = {Unlicense},
+  keywords = {cli,command-line,csv,rust}
+}
diff --git a/paper/src/sections/010_introduction.tex b/paper/src/sections/010_introduction.tex
index 4a32e06..4fedb2c 100644
--- a/paper/src/sections/010_introduction.tex
+++ b/paper/src/sections/010_introduction.tex
@@ -61,6 +61,6 @@ \section{Introduction}
 We provide an open-source, documented, and reproducible Python package, Daedalus (\href{https://github.com/TCP-Lab/MTP-DB}{github.com/TCP-Lab/MTP-DB}) that retrieves transportome-related data from various databases and compiles it in a local \mono{.sqlite} database.
 In parallel, we also provide pre-compiled database files as periodic releases.
-A \mono{make}-driven and docker-containerized pipeline, named ``transportome profiler'', (\href{https://github.com/TCP-Lab/transportome_profiler}{github.com/TCP-Lab/transportome\_profiler}) is also available.
+A \mono{make}-driven and docker-containerized pipeline, named ``transportome profiler'',\\(\href{https://github.com/TCP-Lab/transportome_profiler}{github.com/TCP-Lab/transportome\_profiler}) is also available.
 It takes gene expression data and the aforementioned database to generate gene sets, sorts genes based on their differential expression, and runs \gls{gsea}.
 This pipeline was designed with modularity and reproducibility in mind, so that it would be easily adaptable on other datasets and databases.
diff --git a/paper/src/sections/020_MnMs.tex b/paper/src/sections/020_MnMs.tex
index ccb5f3d..ade959b 100644
--- a/paper/src/sections/020_MnMs.tex
+++ b/paper/src/sections/020_MnMs.tex
@@ -273,7 +273,7 @@ \subsection{GSEA analysis}
 However, it is unclear how to meaningfully pair such large databases, as methodology, possible batch effects due to sample handling, sample preprocessing (such as microdissection), and the intrinsic non-healthy nature of \gls{gtex} samples can cause confounding in the results.
 To pair the healthy and tumor data together, we therefore followed the macroscopic grouping provided by the metadata files in the Xena platform.
-To subset the very large expression matrix file based on the metadata, we implemented a Python package called \textit{metasplit}, which makes use of the low level Rust-compiled \mono{xsv} package to speed up the computation \todo{PUT A REFERENCE HERE}.
+To subset the very large expression matrix file based on the metadata, we implemented a Python package called \textit{metasplit}, which makes use of the low level Rust-compiled \mono{xsv} package to speed up the computation \cite{gallantBurntSushiXsv2023}.
 The specific calls used to subset the expression matrix and therefore match the tumor and healthy samples are available in the GitHub repository at \href{https://github.com/TCP-Lab/transportome_profiler/blob/main/src/run_dea/tcga_gtex_queries.json}{this url}.
 %Should we explicitly show them as a table??
 As an overview, we compared tumor samples with their healthy counterparts from the same tissue or organ of origin.