IntersectOmics

IntersectOmics is a computational framework for analyzing multi-omics datasets with time series or multiple conditions. It identifies biomolecules (e.g., genes, proteins, metabolites) that behave in a coordinated manner—either concordantly (similar direction) or discordantly (opposite direction)—across different omics layers, using correlation, graph theory, and community detection.

This tool is ideal for uncovering complex biological patterns across layers and conditions. The interpretation of whether clusters represent concordant or discordant behavior is left to the user, based on downstream inspection or visualization.

TODO: Add support for automatic detection and labeling of concordant vs. discordant biomolecule relationships across omics layers, along with a biological explanation, as detailed at https://substack.com/history/post/157552707

Installation

After cloning the repository, install the package in editable mode:

pip install -e .

Supported Dataset

IntersectOmics supports multi-omics datasets with any number of replicates and experimental conditions. Each omics layer (transcriptomics, proteomics, metabolomics, etc.) should be provided as a separate table. To keep the metadata and the measurements consistent, the columns of each table must be a MultiIndex.

Input Data Structure

Each input dataset should be structured as follows:

Rows: Unique biomolecule identifiers (e.g., gene/protein/metabolite names)
Columns: Sample measurements, indexed by condition and replicate as a MultiIndex

You need at least two different omics layers.
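
As an illustration of the layout described above, the following sketch builds two toy layers with pandas. The level names ("condition", "replicate"), the identifiers, and the helper make_layer are assumptions made for this example, not names required by IntersectOmics.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: columns are a (condition, replicate) MultiIndex,
# rows are biomolecule identifiers shared across layers.
conditions = ["t0", "t1", "t2"]
replicates = ["rep1", "rep2", "rep3"]
columns = pd.MultiIndex.from_product(
    [conditions, replicates], names=["condition", "replicate"]
)

rng = np.random.default_rng(0)
biomolecules = ["geneA", "geneB", "geneC"]


def make_layer() -> pd.DataFrame:
    """Build one toy omics table filled with random measurements."""
    values = rng.normal(loc=10.0, scale=1.0, size=(len(biomolecules), len(columns)))
    return pd.DataFrame(values, index=biomolecules, columns=columns)


# At least two layers are needed.
layers = {"transcriptomics": make_layer(), "proteomics": make_layer()}

# Selecting on the first column level returns all replicates of one condition.
t1_replicates = layers["proteomics"]["t1"]
print(t1_replicates.shape)  # (3, 3)
```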

Requirements

All omics layers must use consistent biomolecule identifiers to enable graph intersection.
Replicates must be distinguishable by naming convention or metadata.
Handle missing values appropriately before using the tool.
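
A quick pre-flight check along these lines can catch problems before running the pipeline. This is a minimal sketch; check_layers is a hypothetical helper written for this example and is not part of the package.

```python
import pandas as pd


def check_layers(layers: dict) -> set:
    """Return the identifiers shared by all layers.

    Raises ValueError if a layer has missing values or duplicate identifiers,
    or if no identifier is shared across layers.
    """
    shared = None
    for name, table in layers.items():
        if table.isna().any().any():
            raise ValueError(f"{name} contains missing values; impute or drop them first")
        if not table.index.is_unique:
            raise ValueError(f"{name} has duplicate biomolecule identifiers")
        ids = set(table.index)
        shared = ids if shared is None else shared & ids
    if not shared:
        raise ValueError("no biomolecule identifiers are shared across layers")
    return shared


rna = pd.DataFrame({"s1": [1.0, 2.0]}, index=["geneA", "geneB"])
protein = pd.DataFrame({"s1": [3.0, 4.0]}, index=["geneB", "geneC"])
print(check_layers({"rna": rna, "protein": protein}))  # {'geneB'}
```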

Example Dataset

We use a dataset from this publication, which measured transcriptomics and proteomics in springtail earthworms over time after insecticide exposure.

Correlation with Replicates

Rather than averaging replicates—which can lose valuable variance information—IntersectOmics fits a distribution for each condition and bootstraps correlation values by sampling from these distributions.

Bootstrap Workflow

Fit a normal distribution at each time point using replicate values.
Sample one value per time point from the fitted distribution and compute the correlation across time points.
Repeat this sampling process n times to compute a distribution of correlation values.
Average these correlations.
Combine p-values using the Pearson method (scipy.stats.combine_pvalues).

Note: While a normal distribution is currently used, future versions may support data-type-specific distributions (e.g., Poisson for RNA-seq).
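
The workflow above can be sketched for a single pair of biomolecules as follows. This is an illustration under stated assumptions, not the package's implementation: the function name bootstrap_spearman, its parameters, and the array layout (time points by replicates, with at least two replicates per time point) are hypothetical. The clipping of p-values is a guard added here to avoid taking the log of zero.

```python
import numpy as np
from scipy import stats


def bootstrap_spearman(x_reps, y_reps, n_boot=200, seed=0):
    """Bootstrap the Spearman correlation between two biomolecules.

    x_reps, y_reps: arrays of shape (n_timepoints, n_replicates).
    Returns (mean correlation, combined p-value).
    """
    rng = np.random.default_rng(seed)
    x_reps = np.asarray(x_reps, dtype=float)
    y_reps = np.asarray(y_reps, dtype=float)

    # Fit a normal distribution per time point (mean and sample std).
    x_mu, x_sd = x_reps.mean(axis=1), x_reps.std(axis=1, ddof=1)
    y_mu, y_sd = y_reps.mean(axis=1), y_reps.std(axis=1, ddof=1)

    rhos, pvals = [], []
    for _ in range(n_boot):
        # Draw one value per time point, then correlate the two series.
        x = rng.normal(x_mu, x_sd)
        y = rng.normal(y_mu, y_sd)
        rho, p = stats.spearmanr(x, y)
        rhos.append(rho)
        pvals.append(p)

    # Average the correlations over all bootstrap draws.
    mean_rho = float(np.mean(rhos))

    # Combine p-values with Pearson's method; clip to avoid log(0) (added guard).
    clipped = np.clip(pvals, 1e-12, 1.0 - 1e-12)
    _, combined_p = stats.combine_pvalues(clipped, method="pearson")
    return mean_rho, float(combined_p)


# Toy data: 6 time points, 3 replicates, both series increasing over time.
base = np.arange(6, dtype=float)
offsets = np.array([-0.1, 0.0, 0.1])
x_reps = base[:, None] + offsets
y_reps = 2.0 * base[:, None] + offsets
mean_rho, combined_p = bootstrap_spearman(x_reps, y_reps)
print(mean_rho, combined_p)
```

Because the two toy series rise together, the mean correlation is strongly positive. The combined p-value depends on the installed SciPy version, since the handling of the Pearson method changed between releases.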

Supported Correlation Metrics

Spearman (default): Rank-based, ideal for monotonic or curvilinear trends
Pearson: TODO
Euclidean Distance: TODO

Graph Construction

A graph is constructed for each omics layer. By default, correlations that are not significant (p-value above 0.05) are ignored; the user can change this threshold. Each graph consists of:

Nodes: Biomolecules
Edges: Pairwise similarity scores between biomolecules
Weights: Averaged correlation scores (from bootstrapped sampling)

Edges can reflect both positive (concordant) and negative (discordant) relationships.
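
A minimal sketch of this step with networkx, assuming that an edge is kept when its combined p-value is at or below the threshold. The function build_graph, the alpha parameter, and the input dictionary format are hypothetical and chosen for illustration only.

```python
import networkx as nx


def build_graph(correlations, alpha=0.05):
    """Build one layer's graph.

    correlations: dict mapping (u, v) -> (mean_correlation, p_value).
    Edges whose p-value exceeds alpha are dropped (assumed behavior).
    """
    graph = nx.Graph()
    for (u, v), (rho, p_value) in correlations.items():
        if p_value <= alpha:
            # The signed correlation is stored as the edge weight.
            graph.add_edge(u, v, weight=rho)
    return graph


correlations = {
    ("geneA", "geneB"): (0.9, 0.001),   # positive, significant
    ("geneA", "geneC"): (-0.8, 0.01),   # negative, significant
    ("geneB", "geneC"): (0.1, 0.6),     # not significant, dropped
}
layer_graph = build_graph(correlations)
print(sorted(layer_graph.edges(data="weight")))
```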

Graph Intersection

Once a graph is built for each omics layer, their intersection is computed:

Nodes: Must exist in all graphs
Edges: Retained only if present in all graphs

This ensures only biomolecule relationships that are consistent across all omics layers are preserved, regardless of the direction of the correlation.
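
The intersection rule can be sketched as below. How the per-layer weights are combined is not specified here, so this example makes an assumption: it keeps the list of per-layer weights on each edge and uses the mean absolute correlation as the combined weight. The function intersect_graphs and the attribute name layer_weights are hypothetical.

```python
import networkx as nx


def intersect_graphs(graphs):
    """Keep only nodes and edges that are present in every graph."""
    graphs = list(graphs)
    shared_nodes = set(graphs[0].nodes)
    for g in graphs[1:]:
        shared_nodes &= set(g.nodes)

    result = nx.Graph()
    result.add_nodes_from(shared_nodes)
    for u, v in graphs[0].edges:
        if all(g.has_edge(u, v) for g in graphs[1:]):
            layer_weights = [g[u][v]["weight"] for g in graphs]
            # Assumption: combine layers by mean absolute correlation,
            # so the direction of the correlation does not matter.
            combined = sum(abs(w) for w in layer_weights) / len(layer_weights)
            result.add_edge(u, v, weight=combined, layer_weights=layer_weights)
    return result


g1 = nx.Graph()
g1.add_weighted_edges_from([("A", "B", 0.75), ("A", "C", -0.8)])
g2 = nx.Graph()
g2.add_weighted_edges_from([("A", "B", -0.5), ("B", "C", 0.6)])

common = intersect_graphs([g1, g2])
print(list(common.edges(data=True)))
```

In this toy case only the A-B edge survives, even though its sign differs between the two layers.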

Community Detection

Community detection is performed on the intersected graph:

Identifies clusters of biomolecules that are similarly related across layers
Uses edge weights (correlation) as a measure of connectivity

Important: Communities may include biomolecules that are concordantly or discordantly related between omics layers. It is the user’s responsibility to inspect each community (e.g., via visualization or trend comparison) to interpret the biological meaning.
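
The community detection algorithm is not named here, so the following sketch swaps in one possible choice: greedy modularity maximization from networkx. Modularity-based methods generally expect non-negative weights, so this example assumes the edge weights are absolute correlations.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy intersected graph: two tightly connected triangles joined by a weak edge.
graph = nx.Graph()
graph.add_weighted_edges_from([
    ("A", "B", 0.9), ("B", "C", 0.9), ("A", "C", 0.9),
    ("D", "E", 0.9), ("E", "F", 0.9), ("D", "F", 0.9),
    ("C", "D", 0.1),  # weak bridge between the two groups
])

# Stand-in algorithm (an assumption, not necessarily what the package uses).
communities = greedy_modularity_communities(graph, weight="weight")
for i, members in enumerate(communities):
    print(i, sorted(members))
```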

Final Results

Each community represents a group of biomolecules with shared trends across layers. These may be:

Concordant: Biomolecules change in the same direction
Discordant: Biomolecules show opposing trends

Use visualization tools to evaluate and annotate the nature of each cluster.
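
For a first manual inspection, one option is to label each pair in a community by the sign of the correlation between replicate-averaged profiles. This is only an inspection aid sketched under assumptions: label_pairs is a hypothetical helper, the column level is assumed to be named "condition", and replicates are averaged here purely for readability, which is not how the pipeline itself computes correlations.

```python
from itertools import combinations

import pandas as pd


def label_pairs(table, members):
    """Label each pair of community members as concordant or discordant."""
    # Rows become conditions, columns become biomolecules.
    profiles = table.loc[members].T.groupby(level="condition", sort=False).mean()
    corr = profiles.corr(method="spearman")
    labels = {}
    for u, v in combinations(members, 2):
        rho = corr.loc[u, v]
        if rho > 0:
            labels[(u, v)] = "concordant"
        elif rho < 0:
            labels[(u, v)] = "discordant"
        else:
            labels[(u, v)] = "uncorrelated"
    return labels


columns = pd.MultiIndex.from_product(
    [["t0", "t1", "t2", "t3"], ["r1", "r2"]], names=["condition", "replicate"]
)
table = pd.DataFrame(
    [
        [1.0, 1.2, 2.0, 2.1, 3.0, 3.2, 4.0, 4.1],  # geneA rises
        [0.5, 0.6, 1.0, 1.1, 1.4, 1.5, 2.0, 2.2],  # protB rises
        [9.0, 9.1, 7.0, 7.2, 5.0, 5.1, 3.0, 3.3],  # metC falls
    ],
    index=["geneA", "protB", "metC"],
    columns=columns,
)
labels = label_pairs(table, ["geneA", "protB", "metC"])
print(labels)
```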

Notebooks

A complete example demonstrating the full pipeline, including data loading, correlation, graph construction, and visualization, can be found at:

notebooks/run_example.ipynb

Inspiration

IntersectOmics is inspired by Nikolay Oskolkov’s UMAPDataIntegration, with major additions including:

Support for time series data
Bootstrap-based correlation with replicates
Cross-layer graph intersection
Discovery of both concordant and discordant multi-omics communities
