IntersectOmics is a computational framework for analyzing multi-omics datasets with time series or multiple conditions. It identifies biomolecules (e.g., genes, proteins, metabolites) that behave in a coordinated manner—either concordantly (similar direction) or discordantly (opposite direction)—across different omics layers, using correlation, graph theory, and community detection.
This tool is ideal for uncovering complex biological patterns across layers and conditions. The interpretation of whether clusters represent concordant or discordant behavior is left to the user, based on downstream inspection or visualization.
TODO: Add support for automatic detection and labeling of concordant vs. discordant biomolecule relationships across omics layers, along with a biological explanation, as detailed [here](https://substack.com/history/post/157552707).
After cloning the repository, install the package in editable mode:
pip install -e .
IntersectOmics supports multi-omics datasets with any number of replicates and experimental conditions. Each omics layer (transcriptomics, proteomics, metabolomics, etc.) should be provided as a separate table. To keep the metadata and the data consistent, the program requires each table's columns to be a multiindex.
Each input dataset should be structured as follows:
- Rows: Unique biomolecule identifiers (e.g., gene/protein/metabolite names)
 - Columns: Sample measurements, ideally grouped by condition and replicate as multiindex
 
You need at least two different omics layers.
- All omics layers must use consistent biomolecule identifiers to enable graph intersection.
 - Replicates must be distinguishable by naming convention or metadata.
 - Handle missing values appropriately before using the tool.
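The input layout described above can be sketched with pandas. This is an illustration only: the level names `condition` and `replicate`, the sample labels, and the random values are assumptions, not names required by IntersectOmics.

```python
import numpy as np
import pandas as pd

# Hypothetical column layout: one level for the condition (here, time points)
# and one level for the replicate.
conditions = ["t0", "t1", "t2"]
replicates = ["rep1", "rep2", "rep3"]
columns = pd.MultiIndex.from_product(
    [conditions, replicates], names=["condition", "replicate"]
)

rng = np.random.default_rng(0)
ids = ["geneA", "geneB", "geneC"]  # identifiers shared by both layers

# One table per omics layer: rows are biomolecules, columns are samples.
transcriptomics = pd.DataFrame(rng.normal(size=(3, 9)), index=ids, columns=columns)
proteomics = pd.DataFrame(rng.normal(size=(3, 9)), index=ids, columns=columns)

# Identifiers present in every layer; only these can survive graph intersection.
shared_ids = transcriptomics.index.intersection(proteomics.index)

# Selecting a condition returns its replicate columns.
t0_values = transcriptomics["t0"]
```

Selecting by the first column level makes it straightforward to pull out all replicates of one condition, which is what the per-condition distribution fitting needs.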
 
We use a dataset from this publication, which measured transcriptomics and proteomics in springtail earthworms over time after insecticide exposure.
Rather than averaging replicates—which can lose valuable variance information—IntersectOmics fits a distribution for each condition and bootstraps correlation values by sampling from these distributions.
- Fit a normal distribution at each time point using replicate values.
 - Sample one value per time point from the fitted distribution and compute the correlation.
 - Repeat this sampling process n times to compute a distribution of correlation values.
 - Average these correlations.
 - Combine p-values using the Pearson method (scipy.stats.combine_pvalues).
Note: While a normal distribution is currently used, future versions may support data-type-specific distributions (e.g., Poisson for RNA-seq).
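The bootstrap procedure can be sketched for a single pair of biomolecules as follows. This is a minimal illustration and not the package's own implementation: the function name `bootstrap_spearman`, its parameters, and the clipping of p-values before combining them are assumptions.

```python
import numpy as np
from scipy import stats

def bootstrap_spearman(x_reps, y_reps, n_boot=200, seed=0):
    """Hypothetical helper. Inputs have shape (n_timepoints, n_replicates)."""
    rng = np.random.default_rng(seed)
    x_reps = np.asarray(x_reps, dtype=float)
    y_reps = np.asarray(y_reps, dtype=float)

    # Fit a normal distribution at each time point from the replicate values.
    x_mu, x_sd = x_reps.mean(axis=1), x_reps.std(axis=1, ddof=1)
    y_mu, y_sd = y_reps.mean(axis=1), y_reps.std(axis=1, ddof=1)

    rhos = np.empty(n_boot)
    pvals = np.empty(n_boot)
    for i in range(n_boot):
        # Sample one value per time point, then correlate the two profiles.
        x = rng.normal(x_mu, x_sd)
        y = rng.normal(y_mu, y_sd)
        rhos[i], pvals[i] = stats.spearmanr(x, y)

    # Assumption: clip p-values away from exactly 0 and 1 so the logarithms
    # used when combining them stay finite.
    pvals = np.clip(pvals, 1e-300, 1.0 - 1e-12)
    _, combined_p = stats.combine_pvalues(pvals, method="pearson")
    return rhos.mean(), combined_p

# Toy data: 8 time points, 3 replicates, two profiles that rise together.
time = np.arange(8, dtype=float)
noise = np.random.default_rng(1).normal(scale=0.1, size=(8, 3))
gene_a = time[:, None] + noise
gene_b = 2.0 * time[:, None] + noise[:, ::-1]
mean_rho, combined_p = bootstrap_spearman(gene_a, gene_b)
```

Because each draw uses the fitted spread of the replicates, noisy time points contribute more variable samples, so the averaged correlation reflects replicate variance instead of discarding it.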
- Spearman (default): Rank-based, ideal for monotonic or curvilinear trends
 - Pearson: TODO
 - Euclidean Distance: TODO
 
A graph is constructed for each omics layer. By default, any correlation with a p-value above 0.05 is ignored; the user can change this threshold:
- Nodes: Biomolecules
 - Edges: Pairwise similarity scores between biomolecules
 - Weights: Averaged correlation scores (from bootstrapped sampling)
 
Edges can reflect both positive (concordant) and negative (discordant) relationships.
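A per-layer graph of this kind could be built with networkx roughly as shown here. The sketch assumes the threshold is meant to keep only statistically significant correlations (combined p-value at or below 0.05); the function name `build_layer_graph` and the tuple layout of its input are hypothetical.

```python
import networkx as nx

def build_layer_graph(pairs, alpha=0.05):
    """Hypothetical helper. pairs: iterable of (u, v, mean_rho, combined_p)."""
    graph = nx.Graph()
    for u, v, rho, p in pairs:
        # Every biomolecule becomes a node, even if it ends up with no edges.
        graph.add_node(u)
        graph.add_node(v)
        # Assumption: keep an edge only when the correlation is significant.
        if p <= alpha:
            graph.add_edge(u, v, weight=rho)
    return graph

# Toy input: averaged correlations and combined p-values for three pairs.
pairs = [
    ("geneA", "geneB", 0.92, 0.001),   # significant, positive
    ("geneA", "geneC", -0.88, 0.004),  # significant, negative
    ("geneB", "geneC", 0.10, 0.700),   # not significant, no edge
]
layer_graph = build_layer_graph(pairs)
```

The sign of the correlation is kept in the edge weight, so positive and negative relationships both remain visible in the graph.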
Once a graph is built for each omics layer, their intersection is computed:
- Nodes: Must exist in all graphs
 - Edges: Retained only if present in all graphs
 
This ensures only biomolecule relationships that are consistent across all omics layers are preserved, regardless of the direction of the correlation.
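The intersection step can be sketched with networkx as follows. How the per-layer weights are merged into a single edge weight is not specified above, so storing all layer weights and using the mean absolute value as the combined weight is an assumption, as is the function name `intersect_graphs`.

```python
import networkx as nx

def intersect_graphs(graphs):
    """Hypothetical helper: keep nodes and edges present in every graph."""
    shared_nodes = set.intersection(*(set(g.nodes) for g in graphs))
    result = nx.Graph()
    result.add_nodes_from(shared_nodes)

    first, rest = graphs[0], graphs[1:]
    for u, v in first.edges:
        if u not in shared_nodes or v not in shared_nodes:
            continue
        if all(g.has_edge(u, v) for g in rest):
            layer_weights = [g[u][v]["weight"] for g in graphs]
            # Assumption: combine by mean absolute correlation so the edge is
            # retained regardless of the sign in each layer.
            combined = sum(abs(w) for w in layer_weights) / len(layer_weights)
            result.add_edge(u, v, weight=combined, layer_weights=layer_weights)
    return result

# Toy layers: only the A-B edge exists in both graphs.
rna = nx.Graph()
rna.add_weighted_edges_from([("A", "B", 0.9), ("B", "C", -0.8)])
protein = nx.Graph()
protein.add_weighted_edges_from([("A", "B", -0.7), ("A", "C", 0.6)])
shared = intersect_graphs([rna, protein])
```

Keeping the per-layer weights on each edge leaves enough information to check later whether a relationship has the same sign in every layer.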
Community detection is performed on the intersected graph:
- Identifies clusters of biomolecules that are similarly related across layers
 - Uses edge weights (correlation) as a measure of connectivity
 
Important: Communities may include biomolecules that are concordantly or discordantly related between omics layers. It is the user’s responsibility to inspect each community (e.g., via visualization or trend comparison) to interpret the biological meaning.
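Community detection on a weighted graph can be sketched as shown here. The text above does not name the algorithm, so the use of Louvain (`louvain_communities` in networkx 2.8 or later) is an assumption, as are the helper name `detect_communities` and the choice to take absolute edge weights so that strong negative correlations count as strong connections.

```python
import networkx as nx

def detect_communities(graph, seed=0):
    """Hypothetical helper: Louvain communities on absolute edge weights."""
    weighted = nx.Graph()
    weighted.add_nodes_from(graph.nodes)
    for u, v, data in graph.edges(data=True):
        # Assumption: connectivity is the magnitude of the correlation.
        weighted.add_edge(u, v, weight=abs(data.get("weight", 1.0)))
    found = nx.community.louvain_communities(weighted, weight="weight", seed=seed)
    # Return sorted lists so the result is easy to compare and print.
    return sorted(sorted(members) for members in found)

# Toy graph: two tightly connected triangles joined by one weak edge.
demo = nx.Graph()
demo.add_weighted_edges_from([
    ("g1", "g2", 0.90), ("g2", "g3", -0.80), ("g1", "g3", -0.85),
    ("g4", "g5", 0.90), ("g5", "g6", 0.95), ("g4", "g6", 0.90),
    ("g3", "g4", 0.05),
])
communities = detect_communities(demo)
```

In the toy graph the first triangle mixes positive and negative edges, which illustrates why each community still has to be inspected to decide whether its members are concordant or discordant.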
Each community represents a group of biomolecules with shared trends across layers. These may be:
- Concordant: Biomolecules change in the same direction
 - Discordant: Biomolecules show opposing trends
 
Use visualization tools to evaluate and annotate the nature of each cluster.
A complete example demonstrating the full pipeline, including data loading, correlation, graph construction, and visualization, can be found at:
notebooks/run_example.ipynb
IntersectOmics is inspired by Nikolay Oskolkov’s UMAPDataIntegration, with major additions including:
- Support for time series data
 - Bootstrap-based correlation with replicates
 - Cross-layer graph intersection
 - Discovery of both concordant and discordant multi-omics communities
 





