Add TokenGT model #9834

Open
wants to merge 26 commits into
base: master


@michailmelonas commented Dec 9, 2024

PyG implementation of the Tokenized Graph Transformer following "Pure Transformers are Powerful Graph Learners" (https://arxiv.org/pdf/2207.02505). Includes support for both Laplacian eigenvectors and ORF node identifiers (implemented via a simple data Transform object). A graph regression example is also included.

For a detailed blog post about the implementation, see https://medium.com/stanford-cs224w/pyg-implementation-tokengt-e4aa74dc867b.
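To make the ORF node identifiers mentioned above concrete, here is a minimal sketch in plain Python. It is not the Transform from this PR: the function names and the toy graph are hypothetical, and it assumes the identifiers are simply a set of mutually orthonormal random vectors, built here with modified Gram-Schmidt. It also shows why orthonormality is useful: the dot product between an edge token's identifier pair and a node token's identifier pair reveals whether the node is an endpoint of the edge.

```python
import math
import random


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def orf_node_identifiers(n_nodes, seed=0):
    """Return n_nodes mutually orthonormal vectors of length n_nodes.

    Hypothetical helper: draws Gaussian vectors and orthonormalizes
    them with modified Gram-Schmidt.
    """
    rng = random.Random(seed)
    basis = []
    while len(basis) < n_nodes:
        v = [rng.gauss(0.0, 1.0) for _ in range(n_nodes)]
        # Remove the component along every vector accepted so far.
        for b in basis:
            proj = dot(v, b)
            v = [x - proj * y for x, y in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-8:  # redraw in the unlikely degenerate case
            basis.append([x / norm for x in v])
    return basis


# Toy path graph with 4 nodes and 3 edges (illustrative data only).
edges = [(0, 1), (1, 2), (2, 3)]
P = orf_node_identifiers(4)

# A node token for v carries [P_v, P_v];
# an edge token for (u, v) carries [P_u, P_v].
node_ids = [P[v] + P[v] for v in range(4)]
edge_ids = [P[u] + P[v] for u, v in edges]

# Incidence check: about 1 when the node is an endpoint, about 0 otherwise.
incident = dot(edge_ids[0], node_ids[0])      # edge (0, 1) vs node 0
not_incident = dot(edge_ids[0], node_ids[3])  # edge (0, 1) vs node 3
```

Laplacian eigenvectors play the same role as `P` here; they are omitted because computing them needs an eigensolver.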

@michailmelonas
Author

@wsad1 @EdisonLeeeee @akihironitta any thoughts on when this contribution will get reviewed? :)

@puririshi98
Contributor

@michailmelonas this is cool. I'll review and help merge as soon as my time allows.

@puririshi98 self-requested a review January 14, 2025 20:56
@puririshi98
Contributor

this looks good, will do a deep review soon

@puririshi98
Contributor

puririshi98 commented Jan 15, 2025

This is good at a high level. However, I want to see how it compares to existing work. Can you please update this example:
https://github.com/pyg-team/pytorch_geometric/blob/master/examples/ogbn_train.py#L31
to have a "--gnn-choice" argparse option with choices ["sage", "gat", "tokengt_graph_transformer"], and run all three in your environment to see how they compare? Please make the model with the highest test accuracy the default. I can review a little closer once that initial test is done.
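A minimal sketch of what the requested option could look like, using only the standard `argparse` module. This is not the actual contents of ogbn_train.py: `build_parser` is a hypothetical helper, and the default of "sage" is a placeholder, since the request is to set the default to whichever model scores highest once all three have been run.

```python
import argparse

GNN_CHOICES = ["sage", "gat", "tokengt_graph_transformer"]


def build_parser():
    """Hypothetical helper that builds a parser with the requested flag."""
    parser = argparse.ArgumentParser(description="Model selection sketch")
    parser.add_argument(
        "--gnn-choice",
        type=str,
        choices=GNN_CHOICES,
        default="sage",  # placeholder; replace with the best-performing model
        help="Which model to train.",
    )
    return parser


# argparse exposes "--gnn-choice" as the attribute `gnn_choice`.
args = build_parser().parse_args(["--gnn-choice", "gat"])
selected = args.gnn_choice
```

Passing a value outside `GNN_CHOICES` makes `argparse` print a usage error and exit, so a misspelled model name fails before any training starts.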

@michailmelonas
Author

@puririshi98 sure thing, will do asap.

@puririshi98
Contributor

@michailmelonas lmk when ready for further review

@michailmelonas
Author

@puririshi98 apologies for only getting back to you now - have been swamped at work.

TokenGT requires specifying n_nodes orthogonal vectors ("node identifiers"). This is infeasible for the ogbn-papers100M dataset which has over 100M nodes. Therefore, rather than amending ogbn_train.py, I instead added token_gt_ogbn.py: a script that makes it easy to benchmark TokenGT against GCN on the ogbg-molhiv dataset (ideally, I'd like to run the model on PCQM4Mv2 as in the paper, but given my computational resources this was the best I could do). Running said script, I get slightly worse (but comparable) results for TokenGT vs GCN: the former has a validation ROC-AUC of 0.774 and the latter has 0.819.
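A rough back-of-the-envelope sketch of the scaling argument above. The assumptions are mine, not from the PR: identifiers are stored densely as float32, and exact mutual orthogonality of n_nodes vectors needs a dimension of at least n_nodes. The node counts are round illustrative figures (100M for the large graph, a hypothetical 50 for a small molecular graph), and `identifier_memory_bytes` is a hypothetical helper.

```python
def identifier_memory_bytes(n_nodes, dim, bytes_per_value=4):
    """Bytes needed for a dense n_nodes x dim table (float32 by default)."""
    return n_nodes * dim * bytes_per_value


# Large graph: round figure for "over 100M nodes".
large_n = 100_000_000
large_bytes = identifier_memory_bytes(large_n, large_n)
large_petabytes = large_bytes / 10**15  # decimal petabytes

# Small molecular graph: hypothetical size for illustration.
small_n = 50
small_bytes = identifier_memory_bytes(small_n, small_n)
```

Under these assumptions the large table would take tens of petabytes, while the small one takes about 10 kB, which is consistent with benchmarking on a dataset of small graphs instead.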
