Generative Recommendation with Semantic IDs (GRID)

GRID (Generative Recommendation with Semantic IDs) is a state-of-the-art framework for generative recommendation systems using semantic IDs, developed by a group of scientists and engineers from Snap Research. This project implements novel approaches for learning semantic IDs from text embeddings and generating recommendations through transformer-based generative models.

🚀 Overview

GRID facilitates generative recommendation in three overarching steps:

Embedding Generation with LLMs: Converting item text into embeddings using any LLM available on Hugging Face.
Semantic ID Learning: Converting item embeddings into hierarchical semantic IDs using residual quantization techniques such as RQ-KMeans, RQ-VAE, and RVQ.
Generative Recommendations: Using transformer architectures to generate recommendation sequences as semantic ID tokens.

📦 Installation

Prerequisites

Python 3.10+
CUDA-compatible GPU (recommended)

Setup Environment

# Clone the repository
git clone https://github.com/snap-research/GRID.git
cd GRID

# Install dependencies
pip install -r requirements.txt

🎯 Quick Start

1. Data Preparation

Prepare your dataset in the expected format:

data/
├── train/       # training sequence of user history 
├── validation/  # validation sequence of user history 
├── test/        # testing sequence of user history 
└── items/       # text of all items in the dataset
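
Before launching the pipeline, it can help to confirm that a dataset folder matches the layout above. The following is a minimal sketch and not part of GRID: the helper name `missing_subdirs` is hypothetical, and it only checks that the four sub-directories exist, not the format of the files inside them.

```python
from pathlib import Path

# Sub-directories named in the layout above.
EXPECTED_SUBDIRS = ("train", "validation", "test", "items")


def missing_subdirs(data_dir):
    """Return the expected sub-directories that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

Calling `missing_subdirs("data/amazon_data/beauty")` returns an empty list when all four folders are present, and the names of the absent ones otherwise.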

We provide pre-processed Amazon data explored in the P5 paper [4]. The data can be downloaded from this Google Drive link.

2. Embedding Generation with LLMs

Generate embeddings from LLMs, which later will be transformed into semantic IDs.

python -m src.inference experiment=sem_embeds_inference_flat data_dir=data/amazon_data/beauty # available datasets include 'beauty', 'sports', and 'toys'

3. Train and Generate Semantic IDs

Learn semantic ID centroids for embeddings generated in step 2:

# embedding_path: found in the log directory of step 2.
# embedding_dim: model dimension of the LLM used in step 2 (2048 for flan-t5-xl, as in this example).
# num_hierarchies: number of codebooks to train (3 here).
# codebook_width: number of centroid rows in each codebook (256 here).
python -m src.train experiment=rkmeans_train_flat \
    data_dir=data/amazon_data/beauty \
    embedding_path=<output_path_from_step_2>/merged_predictions_tensor.pt \
    embedding_dim=2048 \
    num_hierarchies=3 \
    codebook_width=256
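
To make the roles of `num_hierarchies` and `codebook_width` concrete, here is a minimal sketch of residual K-means in pure Python. It is an illustration under stated assumptions, not GRID's implementation: the function names (`kmeans`, `fit_residual_kmeans`), the random initialisation, and the fixed iteration count are all choices made for this example. Each level fits a plain K-means codebook on the residuals left after subtracting the centroid assigned at the previous level.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's K-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def fit_residual_kmeans(embeddings, num_hierarchies, codebook_width):
    """Fit one codebook per level, each on the residuals of the previous level."""
    residuals = [list(e) for e in embeddings]
    codebooks = []
    for level in range(num_hierarchies):
        codebook = kmeans(residuals, codebook_width, seed=level)
        codebooks.append(codebook)
        # Subtract the assigned centroid so the next level models what is left.
        residuals = [
            [x - c for x, c in zip(r, codebook[nearest(r, codebook)])]
            for r in residuals
        ]
    return codebooks


# Toy data: 50 random 3-dimensional "embeddings" (real ones would come from step 2).
rng = random.Random(42)
embeddings = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(50)]
codebooks = fit_residual_kmeans(embeddings, num_hierarchies=2, codebook_width=4)
print(len(codebooks), len(codebooks[0]), len(codebooks[0][0]))  # 2 4 3
```

With the settings used in the command above, the same procedure would produce 3 codebooks of 256 centroids each, every centroid having 2048 dimensions.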

Generate SIDs:

# ckpt_path: found in the log directory of the SID training run above.
python -m src.inference experiment=rkmeans_inference_flat \
    data_dir=data/amazon_data/beauty \
    embedding_path=<output_path_from_step_2>/merged_predictions_tensor.pt \
    embedding_dim=2048 \
    num_hierarchies=3 \
    codebook_width=256 \
    ckpt_path=<the_checkpoint_you_just_get_above>
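
Conceptually, generating a semantic ID from trained codebooks means picking the nearest centroid at each level and passing the residual on to the next level. The sketch below illustrates this with hand-written toy codebooks; the function name `encode` and the toy values are assumptions made for this example, and GRID's own inference code may differ in detail.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(embedding, codebooks):
    """Map one embedding to a tuple of codebook indices, one per level."""
    residual = list(embedding)
    code = []
    for codebook in codebooks:
        index = min(
            range(len(codebook)),
            key=lambda i: squared_distance(residual, codebook[i]),
        )
        code.append(index)
        # Remove the chosen centroid; the next level quantizes what remains.
        residual = [x - c for x, c in zip(residual, codebook[index])]
    return tuple(code)


# Hand-written toy codebooks: 2 levels, 2 centroids each, 2 dimensions.
toy_codebooks = [
    [[0.0, 0.0], [10.0, 10.0]],  # level 0: coarse
    [[-1.0, 0.0], [1.0, 0.0]],   # level 1: fine
]
print(encode([11.0, 10.0], toy_codebooks))  # (1, 1)
print(encode([-0.9, 0.1], toy_codebooks))   # (0, 0)
```

The first digit captures the coarse region of the embedding space and each later digit refines it, which is what makes the resulting IDs hierarchical.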

4. Train Generative Recommendation Model with Semantic IDs

Train the recommendation model using the learned semantic IDs:

# num_hierarchies is 4 rather than 3 because the previous step appends one extra digit to de-duplicate the semantic IDs.
python -m src.train experiment=tiger_train_flat \
    data_dir=data/amazon_data/beauty \
    semantic_id_path=<output_path_from_step_3>/pickle/merged_predictions_tensor.pt \
    num_hierarchies=4
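
The extra digit exists because two items can be quantized to the same 3-digit code, and the recommender needs every item to have a unique ID. One common way to do this is to append a running counter per colliding code, sketched below. This is an assumption about the general technique rather than a description of GRID's exact scheme, and the function name `append_dedup_digit` is hypothetical.

```python
from collections import defaultdict


def append_dedup_digit(semantic_ids):
    """Append a per-collision counter so that every returned ID is unique."""
    seen = defaultdict(int)  # how many items have already used each base ID
    unique_ids = []
    for sid in semantic_ids:
        base = tuple(sid)
        unique_ids.append(base + (seen[base],))
        seen[base] += 1
    return unique_ids


ids = [(3, 7, 1), (3, 7, 1), (5, 0, 2)]
print(append_dedup_digit(ids))
# [(3, 7, 1, 0), (3, 7, 1, 1), (5, 0, 2, 0)]
```

Each ID grows from 3 digits to 4, which is why the generative model is configured with `num_hierarchies=4`.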

5. Generate Recommendations

Run inference to generate recommendations:

# ckpt_path: found in the log directory of the GR model training run above.
# num_hierarchies is 4 rather than 3 because one extra digit was appended to de-duplicate the semantic IDs.
python -m src.inference experiment=tiger_inference_flat \
    data_dir=data/amazon_data/beauty \
    semantic_id_path=<output_path_from_step_3>/pickle/merged_predictions_tensor.pt \
    ckpt_path=<the_checkpoint_you_just_get_above> \
    num_hierarchies=4

Supported Models:

Semantic ID:

Residual K-means proposed in OneRec [2]
Residual Vector Quantization
Residual Quantization with Variational Autoencoder [3]

Generative Recommendation:

TIGER [1]

📚 Citation

If you use GRID in your research, please cite:

@inproceedings{grid,
  title     = {Generative Recommendation with Semantic IDs: A Practitioner's Handbook},
  author    = {Ju, Clark Mingxuan and Collins, Liam and Neves, Leonardo and Kumar, Bhuvesh and Wang, Louis Yufeng and Zhao, Tong and Shah, Neil},
  booktitle = {Proceedings of the 34th ACM International Conference on Information and Knowledge Management (CIKM)},
  year      = {2025}
}

🤝 Acknowledgments

Built with PyTorch and PyTorch Lightning
Configuration management by Hydra
Inspired by recent advances in generative AI and recommendation systems
Part of this repo is built on top of https://github.com/ashleve/lightning-hydra-template

📞 Contact

For questions and support:

Create an issue on GitHub
Contact the development team: Clark Mingxuan Ju ([email protected]), Liam Collins ([email protected]), and Leonardo Neves ([email protected]).

Bibliography

[1] Rajput, Shashank, et al. "Recommender systems with generative retrieval." Advances in Neural Information Processing Systems 36 (2023): 10299-10315.

[2] Deng, Jiaxin, et al. "OneRec: Unifying retrieve and rank with generative recommender and iterative preference alignment." arXiv preprint arXiv:2502.18965 (2025).

[3] Lee, Doyup, et al. "Autoregressive image generation using residual quantization." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.

[4] Geng, Shijie, et al. "Recommendation as language processing (RLP): A unified pretrain, personalized prompt & predict paradigm (P5)." Proceedings of the 16th ACM conference on recommender systems. 2022.
