This is the official codebase for Integrate Any Omics: Towards genome-wide data integration for patient stratification.
The IntegrAO package requires only a standard computer with enough RAM to support the in-memory operations.
IntegrAO works with Python >= 3.7. Please make sure a compatible version of Python is installed before you begin.
- Create a virtual environment:
conda create -n integrAO python=3.10 -y
conda activate integrAO
- Install PyTorch 2.1.0
- IntegrAO is available on PyPI. To install IntegrAO, run the following command:
pip install integrao
For development, clone this repo with the following commands:
$ git clone this-repo-url
$ cd IntegrAO
$ pip install -r requirement.txt
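Whichever installation route is used, a quick sanity check of the environment can save debugging time later. The snippet below is a minimal sketch using only the standard library: it checks the Python version against the minimum stated above and looks up whether the packages can be located. The import name `integrao` is an assumption based on the PyPI package name, and `check_environment` is a hypothetical helper, not part of IntegrAO.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 7)  # minimum Python version stated in this README


def check_environment(modules=("torch", "integrao")):
    """Report whether the interpreter and the named packages look usable.

    The module names are assumptions: "torch" is PyTorch's import name,
    and "integrao" is guessed from the PyPI package name.
    """
    report = {"python_ok": sys.version_info[:2] >= MIN_PYTHON}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'MISSING'}")
```

This only confirms that the packages can be found; it does not verify the installed PyTorch version or that a GPU is available.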
Advances in high-throughput omics profiling have greatly enhanced cancer patient stratification. However, incomplete data in multi-omics integration presents a significant challenge, as traditional methods like sample exclusion or imputation often compromise biological diversity and dependencies. Furthermore, the critical task of accurately classifying new patients with partial omics data into existing subtypes is commonly overlooked. We introduce IntegrAO, an unsupervised framework for integrating incomplete multi-omics data and classifying new biological samples. IntegrAO first combines partially overlapping patient graphs from diverse omics sources and then uses graph neural networks to produce unified patient embeddings.
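To make "partially overlapping patient graphs" concrete, here is a toy sketch in plain Python. It is not IntegrAO's algorithm or API: the fusion step is replaced by naive averaging of edge weights, and no graph neural network is involved. The function names, patient IDs, and feature values are all made up for illustration.

```python
import math


def similarity_graph(profiles):
    """Build a dense patient graph {patient: {other: weight}} for one omics layer.

    Weight is exp(-Euclidean distance), a simple stand-in similarity.
    """
    graph = {}
    for p, x in profiles.items():
        graph[p] = {}
        for q, y in profiles.items():
            if p != q:
                dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
                graph[p][q] = math.exp(-dist)
    return graph


def fuse_graphs(graphs):
    """Naively fuse graphs by averaging each edge over the layers that contain
    both patients. Pairs never profiled together get no edge."""
    patients = sorted(set().union(*graphs))
    fused = {p: {} for p in patients}
    for p in patients:
        for q in patients:
            if p == q:
                continue
            weights = [g[p][q] for g in graphs if p in g and q in g]
            if weights:
                fused[p][q] = sum(weights) / len(weights)
    return fused


# Hypothetical data: P1 lacks methylation, P4 lacks expression.
expression = {"P1": [1.0, 0.0], "P2": [0.9, 0.1], "P3": [0.0, 1.0]}
methylation = {"P2": [0.2, 0.8], "P3": [0.1, 0.9], "P4": [0.9, 0.2]}

graphs = [similarity_graph(expression), similarity_graph(methylation)]
fused = fuse_graphs(graphs)

for patient, edges in fused.items():
    print(patient, {q: round(w, 3) for q, w in edges.items()})
```

In this toy setup all four patients appear in the fused graph even though none of the excluded-sample or imputation steps were applied; P1 and P4 simply have no direct edge because they were never profiled in the same layer.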
An overview of IntegrAO can be seen below.
We offer the following tutorials for demonstration:
- Integrate simulated butterfly datasets
- Integrate simulated cancer omics datasets
- Classify new samples with incomplete omics datasets
@article{ma2024integrate,
title={Integrate Any Omics: Towards genome-wide data integration for patient stratification},
author={Ma, Shihao and Zeng, Andy GX and Haibe-Kains, Benjamin and Goldenberg, Anna and Dick, John E and Wang, Bo},
journal={arXiv preprint arXiv:2401.07937},
year={2024}
}