Commit

fix: Added a reference to xsv + fixed boundary box violation
MrHedmad committed Jul 18, 2023
1 parent df0203a commit d181048
Showing 3 changed files with 13 additions and 2 deletions.
11 changes: 11 additions & 0 deletions paper/src/resources/bibliography.bib
@@ -260,3 +260,14 @@ @article{thegeneontologyconsortiumGeneOntologyKnowledgebase2023
abstract = {The Gene Ontology (GO) knowledgebase (http://geneontology.org) is a comprehensive resource concerning the functions of genes and gene products (proteins and noncoding RNAs). GO annotations cover genes from organisms across the tree of life as well as viruses, though most gene function knowledge currently derives from experiments carried out in a relatively small number of model organisms. Here, we provide an updated overview of the GO knowledgebase, as well as the efforts of the broad, international consortium of scientists that develops, maintains, and updates the GO knowledgebase. The GO knowledgebase consists of three components: (1) the GO\textemdash a computational knowledge structure describing the functional characteristics of genes; (2) GO annotations\textemdash evidence-supported statements asserting that a specific gene product has a particular functional characteristic; and (3) GO Causal Activity Models (GO-CAMs)\textemdash mechanistic models of molecular ``pathways'' (GO biological processes) created by linking multiple GO annotations using defined relations. Each of these components is continually expanded, revised, and updated in response to newly published discoveries and receives extensive QA checks, reviews, and user feedback. For each of these components, we provide a description of the current contents, recent developments to keep the knowledgebase up to date with new discoveries, and guidance on how users can best make use of the data that we provide. We conclude with future directions for the project.},
file = {/home/hedmad/Zotero/storage/2HRX6I4K/The Gene Ontology Consortium et al. - 2023 - The Gene Ontology knowledgebase in 2023.pdf;/home/hedmad/Zotero/storage/AGWTTS2G/7068118.html}
}

+@misc{gallantBurntSushiXsv2023,
+title = {{{BurntSushi}}/Xsv},
+author = {Gallant, Andrew},
+year = {2023},
+month = jul,
+urldate = {2023-07-18},
+abstract = {A fast CSV command line toolkit written in Rust.},
+copyright = {Unlicense},
+keywords = {cli,command-line,csv,rust}
+}
2 changes: 1 addition & 1 deletion paper/src/sections/010_introduction.tex
@@ -61,6 +61,6 @@ \section{Introduction}
We provide an open-source, documented, and reproducible Python package, Daedalus (\href{https://github.com/TCP-Lab/MTP-DB}{github.com/TCP-Lab/MTP-DB}) that retrieves transportome-related data from various databases and compiles it in a local \mono{.sqlite} database.
In parallel, we also provide pre-compiled database files as periodic releases.

-A \mono{make}-driven and docker-containerized pipeline, named ``transportome profiler'', (\href{https://github.com/TCP-Lab/transportome_profiler}{github.com/TCP-Lab/transportome\_profiler}) is also available.
+A \mono{make}-driven and docker-containerized pipeline, named ``transportome profiler'',\\(\href{https://github.com/TCP-Lab/transportome_profiler}{github.com/TCP-Lab/transportome\_profiler}) is also available.
It takes gene expression data and the aforementioned database to generate gene sets, sorts genes based on their differential expression, and runs \gls{gsea}.
This pipeline was designed with modularity and reproducibility in mind, so that it would be easily adaptable on other datasets and databases.
2 changes: 1 addition & 1 deletion paper/src/sections/020_MnMs.tex
@@ -273,7 +273,7 @@ \subsection{GSEA analysis}
However, it is unclear how to meaningfully pair such large databases, as methodology, possible batch effects due to sample handling, sample preprocessing (such as microdissection), and the intrinsic non-healthy nature of \gls{gtex} samples can cause confounding in the results.

To pair the healthy and tumor data together, we therefore followed the macroscopic grouping provided by the metadata files in the Xena platform.
-To subset the very large expression matrix file based on the metadata, we implemented a Python package called \textit{metasplit}, which makes use of the low level Rust-compiled \mono{xsv} package to speed up the computation \todo{PUT A REFERENCE HERE}.
+To subset the very large expression matrix file based on the metadata, we implemented a Python package called \textit{metasplit}, which makes use of the low level Rust-compiled \mono{xsv} package to speed up the computation \cite{gallantBurntSushiXsv2023}.
The specific calls used to subset the expression matrix and therefore match the tumor and healthy samples are available in the GitHub repository at \href{https://github.com/TCP-Lab/transportome_profiler/blob/main/src/run_dea/tcga_gtex_queries.json}{this url}. %Should we explicitly show them as a table??
As an overview, we compared tumor samples with their healthy counterparts from the same tissue or organ of origin.
