The official implementation of the ICML'24 paper "A Graph is Worth K Words: Euclideanizing Graph using Pure Transformer".

[GraphsGPT] A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer (ICML 2024)

Zhangyang Gao*, Daize Dong*, Cheng Tan, Jun Xia, Bozhen Hu, Stan Z. Li

Published at the 41st International Conference on Machine Learning (ICML 2024).

Introduction

Can we model Non-Euclidean graphs as pure language, or even as Euclidean vectors, while retaining their inherent information? The Non-Euclidean nature of graphs has long been a challenge in graph modeling. Although recent graph neural networks and graph transformers encode graphs as Euclidean vectors, recovering the original graph from those vectors remains difficult. In this paper, we introduce GraphsGPT, featuring a Graph2Seq encoder that transforms Non-Euclidean graphs into learnable GraphWords in Euclidean space, and a GraphGPT decoder that reconstructs the original graph from GraphWords to ensure information equivalence. We pretrain GraphsGPT on $100$M molecules and report the following findings:

  • The pretrained Graph2Seq excels at graph representation learning, achieving state-of-the-art results on 8 of 9 graph classification and regression tasks.
  • The pretrained GraphGPT is a strong graph generator, performing well on both few-shot and conditional graph generation.
  • Graph2Seq+GraphGPT enables effective graph mixup in Euclidean space, overcoming previously known Non-Euclidean challenges.
  • As an edge-centric pretraining framework, GraphsGPT is effective across graph-domain tasks, excelling in both representation and generation.
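The key property claimed above is that a graph of any size becomes a fixed set of $K$ Euclidean vectors. The sketch below illustrates only that shape property and is not the Graph2Seq implementation: as an assumption, $K$ query vectors attend over the token embeddings of a serialized graph (single-head scaled dot-product attention pooling, without learned projections). All function and variable names are hypothetical.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pool_to_graph_words(tokens, queries):
    """Map n token embeddings (n varies per graph) to K vectors (K fixed).

    tokens:  list of n vectors of dimension d (a serialized graph)
    queries: list of K vectors of dimension d (hypothetical stand-ins
             for the Graph Word slots; learnable in a real model)
    Each output vector is a softmax-weighted average of the tokens,
    scored by scaled dot product against one query.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    graph_words = []
    for q in queries:
        weights = softmax([dot(q, t) * scale for t in tokens])
        graph_words.append(
            [sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(d)]
        )
    return graph_words

# Usage: graphs of different sizes all map to a K x d output.
rng = random.Random(0)
d, K = 4, 2
queries = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(K)]
small_graph = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(5)]
large_graph = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(40)]
print(len(pool_to_graph_words(small_graph, queries)),
      len(pool_to_graph_words(large_graph, queries)))  # prints: 2 2
```

Because every graph yields the same $K \times d$ shape, operations such as distance computation, clustering, and interpolation can be applied directly, without aligning nodes between graphs.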


This repository is the official code implementation of the ICML 2024 paper "A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer".

The model checkpoints are available on 🤗 Hugging Face. We provide foundation pretrained models with different numbers of Graph Words $\mathcal{W}$ (GraphsGPT-nW, where n is the number of Graph Words), as well as a conditional version with one Graph Word (GraphsGPT-1W-C).

| Model Name | Model Type |
| --- | --- |
| GraphsGPT-1W | Foundation Model |
| GraphsGPT-2W | Foundation Model |
| GraphsGPT-4W | Foundation Model |
| GraphsGPT-8W | Foundation Model |
| GraphsGPT-1W-C | Finetuned Model |

Installation

To get started with GraphsGPT, run the following commands to set up the environment.

git clone git@github.com:A4Bio/GraphsGPT.git
cd GraphsGPT
conda create --name graphsgpt python=3.12
conda activate graphsgpt
pip install -e .[dev]
pip install -r requirement.txt

Quick Start

We provide Jupyter notebooks in ./jupyter_notebooks, along with corresponding Google Colaboratory notebooks, for a quick start.

| Example Name | Jupyter Notebook | Google Colaboratory |
| --- | --- | --- |
| GraphsGPT Pipeline | example_pipeline.ipynb | Open In Colab |
| Graph Clustering Analysis | clustering.ipynb | Open In Colab |
| Graph Hybridization Analysis | hybridization.ipynb | Open In Colab |
| Graph Interpolation Analysis | interpolation.ipynb | Open In Colab |

Representation

First, download the fine-tuning configurations and data and place them in ./data_finetune. (The finetuned checkpoints are also included in the model_zoom.zip file for quick testing.)

To evaluate the representation performance of the Graph2Seq encoder, run:

bash ./scripts/representation/finetune.sh

You can also change the --mixup_strategy option to apply graph mixup with Graph2Seq.
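Because Graph Words have the same $K \times d$ shape for every molecule, mixing two graphs reduces to a convex combination of vectors. The following is a minimal sketch, assuming standard input mixup (mixing coefficient drawn from Beta(alpha, alpha)) applied to precomputed Graph Words. The strategies actually selectable with --mixup_strategy are not listed here, and mixup_graph_words is a hypothetical name, not part of the repository.

```python
import random

def mixup_graph_words(words_a, words_b, label_a, label_b, alpha=0.2, rng=random):
    """Standard input mixup applied to two graphs' Graph Words.

    words_a, words_b: K x d nested lists from the same encoder (same K and d)
    label_a, label_b: label vectors (one-hot for classification, or a
                      single-element list for regression)
    Returns the mixed Graph Words, the mixed label, and the coefficient lam.
    """
    if len(words_a) != len(words_b):
        raise ValueError("both graphs must be encoded with the same number of Graph Words")
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    mixed_words = [
        [lam * a + (1 - lam) * b for a, b in zip(va, vb)]
        for va, vb in zip(words_a, words_b)
    ]
    mixed_label = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_words, mixed_label, lam

# Usage with K = 2 Graph Words of dimension d = 2 and a binary one-hot label.
rng = random.Random(0)
graph_a = [[1.0, 0.0], [0.0, 1.0]]
graph_b = [[0.0, 1.0], [1.0, 0.0]]
words, label, lam = mixup_graph_words(graph_a, graph_b, [1.0, 0.0], [0.0, 1.0], rng=rng)
print(round(lam, 3), label)
```

Mixing graphs directly would require matching nodes between molecules of different sizes; in Graph Word space the combination is always well defined.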

Generation

For unconditional generation with the GraphGPT decoder, see README-Generation-Uncond.md.

For conditional generation with the GraphGPT-C decoder, see README-Generation-Cond.md.

To evaluate the few-shot generation performance of the GraphGPT decoder, run:

bash ./scripts/generation/evaluation/moses.sh
bash ./scripts/generation/evaluation/zinc250k.sh
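The exact metrics computed by these scripts are not described here. As an assumption, the sketch below computes three common MOSES-style statistics (validity, uniqueness, novelty) from samples that have already been decoded and canonicalized to SMILES (for example with RDKit), with None marking an invalid sample. generation_metrics is a hypothetical helper, not the scripts' code.

```python
from typing import Iterable, Optional

def generation_metrics(generated: list[Optional[str]],
                       training: Iterable[str]) -> dict[str, float]:
    """Validity, uniqueness and novelty of generated molecules.

    generated: canonical SMILES per sample, or None if decoding failed
    training:  canonical SMILES of the (few-shot) training molecules
    """
    if not generated:
        raise ValueError("no generated samples")
    valid = [s for s in generated if s is not None]
    unique = set(valid)
    train = set(training)
    return {
        # share of all samples that decode to a valid molecule
        "validity": len(valid) / len(generated),
        # share of valid samples that are distinct
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        # share of distinct valid samples absent from the training set
        "novelty": len(unique - train) / len(unique) if unique else 0.0,
    }

samples = ["CCO", "CCO", None, "c1ccccc1", "CCN"]
print(generation_metrics(samples, training=["CCO"]))
# {'validity': 0.8, 'uniqueness': 0.75, 'novelty': 0.6666666666666666}
```

Canonicalization matters here: without it, two different SMILES strings for the same molecule would be counted as distinct, inflating uniqueness and novelty.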

Citation

@article{gao2024graph,
  title={A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer},
  author={Gao, Zhangyang and Dong, Daize and Tan, Cheng and Xia, Jun and Hu, Bozhen and Li, Stan Z},
  journal={arXiv preprint arXiv:2402.02464},
  year={2024}
}

Contact Us

If you have any questions, please contact: