
AutoGraph

AutoGraph: Transformers are Scalable Graph Generators

This repository implements AutoGraph, as presented in the following paper:

Dexiong Chen, Markus Krimmel, and Karsten Borgwardt. Flatten Graphs as Sequences: Transformers are Scalable Graph Generators, NeurIPS 2025.

TL;DR: A scalable autoregressive model for attributed graph generation using decoder-only transformers

[Animations: unconditional generation and substructure-conditioned generation]

Updates

  • 2025-10-07: Our new PolyGraph benchmark has been released, along with a new metric for evaluating graph generation models.
  • 2025-09-19: Our codebase and model checkpoints have been released.

Overview

By flattening graphs into random sequences of tokens through a reversible process, AutoGraph models graphs as sequences in a manner akin to natural language. The resulting sampling complexity and sequence lengths scale linearly with the number of edges, which is optimal for sparse graphs and makes AutoGraph scalable and efficient on large graphs. A key success factor is that sequence prefixes represent induced subgraphs, creating a direct link to sub-sentences in language modeling. Empirically, AutoGraph achieves state-of-the-art performance on synthetic and molecular benchmarks, with up to 100x faster generation and 3x faster training than leading diffusion models. It also supports substructure-conditioned generation without fine-tuning and shows promising transferability, bridging language modeling and graph generation and laying the groundwork for graph foundation models.

The flattening process samples a sequence of random trail segments with neighborhood information (i.e., a SENT) by traversing the graph with a strategy similar to depth-first search; see Algorithm 1 in our paper for details. The resulting SENT is then tokenized so that it can be modeled effectively with a transformer.
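
To make the idea concrete, below is a minimal, simplified sketch of flattening a graph into a random edge sequence with a DFS-like traversal and recovering the graph from it. This is not the SENT construction from Algorithm 1; the function names and the edge-level token format are illustrative assumptions only.

# Simplified illustration only: a randomized DFS-like traversal that emits one
# token per edge, so the sequence length equals the number of edges, plus a
# reconstruction step showing that the flattening is reversible. This is NOT
# the SENT tokenization used by AutoGraph (see Algorithm 1 in the paper).
import random
import networkx as nx

def flatten_to_edge_sequence(graph: nx.Graph, seed: int = 0) -> list:
    """Return the edges of `graph` in a random DFS-like order."""
    rng = random.Random(seed)
    visited = set()
    tokens = []
    nodes = list(graph.nodes)
    rng.shuffle(nodes)
    for start in nodes:
        stack = [start]
        while stack:
            u = stack.pop()
            neighbors = list(graph.neighbors(u))
            rng.shuffle(neighbors)
            for v in neighbors:
                edge = frozenset((u, v))
                if edge not in visited:
                    visited.add(edge)
                    tokens.append((u, v))  # one token per newly visited edge
                    stack.append(v)        # continue the trail from v
    return tokens

def reconstruct(tokens) -> nx.Graph:
    """The sequence is reversible: the emitted edges recover the graph."""
    return nx.Graph(tokens)  # isolated nodes are ignored in this toy version

g = nx.gnp_random_graph(20, 0.2, seed=1)
seq = flatten_to_edge_sequence(g)
assert {frozenset(e) for e in seq} == {frozenset(e) for e in g.edges}
print(len(seq), g.number_of_edges())  # sequence length grows linearly with edges

In the actual model, each trail segment additionally carries neighborhood information (and node/edge attributes for attributed graphs) before being mapped to vocabulary tokens.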

Installation

We recommend managing dependencies with miniconda or micromamba:

# Replace micromamba with conda if you use conda or miniconda
micromamba env create -f environment.yaml
micromamba activate autograph
cd autograph/evaluation/orca; g++ -O2 -std=c++11 -o orca orca.cpp; cd ../../..
pip install -e .

Model Downloads

You can download all the pretrained models here and unzip them to ./pretrained_models.

Model Running

The configurations for all experiments are managed by Hydra and stored in ./config.

Below you can find the list of experiments conducted in the paper:

  • Small synthetic datasets: Planar and SBM introduced by SPECTRE.
  • Large graph datasets: Proteins and Point Clouds introduced by GRAN.
  • Molecular graph datasets: QM9, MOSES, and GuacaMol.
  • Our pre-training dataset (unattributed graphs): NetworkX, built from graph generators in the NetworkX library (see the sketch below).
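
For intuition only, the snippet below shows how unattributed graphs could be drawn from standard NetworkX generators. The actual generator families, parameter ranges, and dataset sizes used for the pre-training corpus are defined by the configs in ./config and may differ from the choices here.

# Illustrative only: a tiny sampler in the spirit of the NetworkX pre-training
# dataset, drawing unattributed graphs from a few standard NetworkX generators.
# The generator mix and parameters below are assumptions, not the ones used
# to build AutoGraph's pre-training corpus.
import random
import networkx as nx

def sample_random_graph(rng: random.Random) -> nx.Graph:
    family = rng.choice(["erdos_renyi", "barabasi_albert", "watts_strogatz"])
    n = rng.randint(10, 100)
    if family == "erdos_renyi":
        return nx.gnp_random_graph(n, p=rng.uniform(0.05, 0.3), seed=rng.randint(0, 10**9))
    if family == "barabasi_albert":
        return nx.barabasi_albert_graph(n, m=rng.randint(1, 4), seed=rng.randint(0, 10**9))
    return nx.watts_strogatz_graph(n, k=4, p=0.1, seed=rng.randint(0, 10**9))

rng = random.Random(0)
graphs = [sample_random_graph(rng) for _ in range(5)]
print([g.number_of_edges() for g in graphs])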

Pre-trained Model Evaluation

# You can replace planar with any of the above datasets
dataset=planar # can be sbm, protein, point_cloud, qm9, moses, guacamol, networkx
pretrained_path=${path_to_the_downloaded_model}
python test.py model.pretrained_path=${pretrained_path} experiment=test_${dataset}

Training from Scratch

# You can replace planar with any of the above datasets
python train.py experiment=planar # can be sbm, protein, point_cloud, qm9, moses, guacamol, networkx
