Learning Universal Representations of Intermolecular Interactions

Authors

Ada Fang
Zaixi Zhang
Andrew Zhou
Marinka Zitnik

ATOMICA is a geometric AI model that learns universal representations of molecular interactions at an atomic scale. The model is pretrained on 2,037,972 molecular interaction interfaces from the Protein Data Bank and Cambridge Structural Database, covering protein-small molecule, protein-ion, small molecule-small molecule, protein-protein, protein-peptide, protein-RNA, protein-DNA, and nucleic acid-small molecule complexes. ATOMICA embeddings can be generated with the open-source model weights and code and used for various downstream tasks. In the paper, we demonstrate the utility of ATOMICA embeddings for studying the human interfaceome network with ATOMICANets and for annotating ions and small molecules to proteins in the dark proteome.

🚀 Installation and Setup

1. Download the Repository

Clone the GitHub repository:

git clone https://github.com/mims-harvard/ATOMICA
cd ATOMICA

2. Set Up Environment

Set up the environment according to setup_env.sh.

3. (optional) Download Processed Datasets

The data for pretraining and downstream analyses is hosted at Harvard Dataverse.

We provide the following datasets:

Processed CSD and QBioLiP (based on PDB) interaction complex graphs for pretraining
Processed protein interfaces of human proteome binding sites to ion, small molecule, lipid, nucleic acid, and protein modalities
Processed protein interfaces of dark proteome binding sites to ion and small molecules

4. Download Model Checkpoints

Model checkpoints are provided on Hugging Face. The following models are available:

ATOMICA model
Pretrained ATOMICA-Interface model
Finetuned ATOMICA-Ligand prediction models for the following ligands:
- metal ions: Ca, Co, Cu, Fe, K, Mg, Mn, Na, Zn
- small molecules: ADP, ATP, GTP, GDP, FAD, NAD, NAP, NDP, HEM, HEC, CIT, CLA
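The checkpoints can also be fetched from a script rather than through the browser. The sketch below is one way to do that, assuming the `huggingface_hub` package is installed; the helper names (`ligand_class`, `download_checkpoints`) and the `checkpoints` directory are illustrative and not part of ATOMICA, and the repository id is deliberately left as an argument to be copied from the Hugging Face link above.

```python
from pathlib import Path

# Ligand codes copied from the checkpoint list above.
FINETUNED_LIGANDS = {
    "metal ions": ["Ca", "Co", "Cu", "Fe", "K", "Mg", "Mn", "Na", "Zn"],
    "small molecules": ["ADP", "ATP", "GTP", "GDP", "FAD", "NAD",
                        "NAP", "NDP", "HEM", "HEC", "CIT", "CLA"],
}


def ligand_class(code: str) -> str:
    """Return which group of finetuned models covers a ligand code."""
    for group, codes in FINETUNED_LIGANDS.items():
        if code in codes:
            return group
    raise KeyError(f"no finetuned ATOMICA-Ligand model listed for {code!r}")


def download_checkpoints(repo_id: str, local_dir: str = "checkpoints") -> Path:
    """Fetch every file in a Hugging Face model repository into local_dir.

    repo_id is an assumption left to the caller: use the id shown on the
    ATOMICA Hugging Face page.
    """
    # Imported lazily so the helpers above work without huggingface_hub.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=repo_id, local_dir=local_dir))
```

Calling `ligand_class("Zn")` before downloading is a quick way to check that a finetuned ATOMICA-Ligand model exists for the ligand of interest.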

⭐ Usage

Train ATOMICA

Training scripts for pretraining ATOMICA and finetuning ATOMICA-Interface and ATOMICA-Ligand are provided in scripts/.

Inference with ATOMICA-Ligand

Refer to the Jupyter notebook at case_studies/atomica_ligand/example_run_atomica_ligand.ipynb for an example of how to use the model for binder prediction.

Explore ATOMICANets

Refer to the Jupyter notebook at case_studies/atomica_net/example_atomica_net.ipynb for an example of exploring ATOMICANets.

Embedding your own structures

Make sure to download the ATOMICA model weights and config files from Hugging Face.

For embedding biomolecular complexes: process .pdb files with data/process_pdbs.py and embed with get_embeddings.py. See data/README.md for further details on data processing.

For embedding protein-(ion/small molecule/lipid/nucleic acid/protein) interfaces: first, predict (ion/small molecule/lipid/nucleic acid/protein) binding sites with PeSTo; second, process the PeSTo output .pdb files with data/process_PeSTo_results.py; finally, embed with get_embeddings.py.
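The two workflows above each chain a processing script into get_embeddings.py. Since the command-line arguments of these scripts are not listed in this README, the sketch below only records the order of the scripts and asks each one to print its own usage. It assumes the scripts respond to a `--help` flag (typical for argparse-based scripts, but not confirmed here); `help_command` and `show_usage` are hypothetical helper names, not part of ATOMICA.

```python
import subprocess
import sys
from pathlib import Path

# Script paths as named in the text above, relative to the repository root.
COMPLEX_PIPELINE = ["data/process_pdbs.py", "get_embeddings.py"]
# The PeSTo prediction step runs outside this repository, so it is not listed.
INTERFACE_PIPELINE = ["data/process_PeSTo_results.py", "get_embeddings.py"]


def help_command(script: str) -> list[str]:
    """Build the command that asks a script to print its own usage."""
    # Assumption: the script accepts --help.
    return [sys.executable, script, "--help"]


def show_usage(pipeline: list[str], repo_root: str = ".") -> None:
    """Print usage for each script in order; report scripts that are missing."""
    root = Path(repo_root)
    for script in pipeline:
        if not (root / script).is_file():
            print(f"missing: {script} (run from the ATOMICA repository root)")
            continue
        subprocess.run(help_command(script), cwd=root, check=False)
```

Running `show_usage(INTERFACE_PIPELINE)` from the repository root lists the arguments each step expects, which can then be cross-checked against data/README.md.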

💡 Questions

For questions, please leave a GitHub issue or contact Ada Fang at [email protected].

⚖️ License

The code in this package is licensed under the MIT License.

📜 Citation

If you use ATOMICA in your research, please cite the following preprint:

@article{Fang2025ATOMICA,
  author = {Fang, Ada and Zhang, Zaixi and Zhou, Andrew and Zitnik, Marinka},
  title = {ATOMICA: Learning Universal Representations of Intermolecular Interactions},
  year = {2025},
  journal = {bioRxiv},
  publisher = {Cold Spring Harbor Laboratory},
  doi = {10.1101/2025.04.02.646906},
  url = {https://www.biorxiv.org/content/10.1101/2025.04.02.646906v1},
  note = {preprint},
}

