This is the official PyTorch implementation of the following publication:
COS3D: Collaborative Open-Vocabulary 3D Segmentation
Runsong Zhu, Ka-Hei Hui, Zhengzhe Liu, Qianyi Wu, Weiliang Tang, Shi Qiu, Pheng-Ann Heng, Chi-Wing Fu.
NeurIPS 2025
Paper (NeurIPS) | Paper (arXiv)
TL;DR:
1). This paper contributes COS3D, a novel and effective collaborative prompt-segmentation framework for the 3D open-vocabulary segmentation task.
2). Extensive experiments demonstrate that COS3D i) significantly outperforms existing baselines while offering superior training efficiency, and ii) shows high potential for various applications, such as novel image-based 3D segmentation, hierarchical segmentation, and robotics.
The code has been tested on:
- Ubuntu 20.04
- CUDA 11.8
- Python 3.8.18
- PyTorch 1.12.1
- GeForce RTX 4090
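To compare a local setup against the tested configuration above, a small check like the following may help. It is a minimal sketch rather than part of the repository: the helper names `installed_versions` and `compare_versions` are hypothetical, and a mismatch only means the combination is untested, not that it will fail.

```python
import importlib
import platform

# Versions listed in this README as tested.
TESTED = {"python": "3.8.18", "torch": "1.12.1", "cuda": "11.8"}


def installed_versions():
    """Return the locally detected versions; None marks anything not found."""
    found = {"python": platform.python_version(), "torch": None, "cuda": None}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return found  # PyTorch is not installed in this interpreter
    # torch.__version__ may carry a build suffix such as "+cu118".
    found["torch"] = torch.__version__.split("+")[0]
    # torch.version.cuda is None for CPU-only builds.
    found["cuda"] = torch.version.cuda
    return found


def compare_versions(found, tested=TESTED):
    """Map each component to a (found, tested, matches) tuple."""
    return {name: (found.get(name), want, found.get(name) == want)
            for name, want in tested.items()}


if __name__ == "__main__":
    report = compare_versions(installed_versions())
    for name, (have, want, ok) in report.items():
        status = "ok" if ok else "differs"
        print(f"{name:<7} found={have} tested={want} [{status}]")
```

Run it inside the activated environment so that it inspects the same interpreter the training scripts will use.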
The repository contains submodules, so please clone it with the --recursive flag:
# HTTPS
git clone https://github.com/Runsong123/COS3D.git --recursive
Our default install method is based on Conda package and environment management:
conda env create --file environment.yml
conda activate COS3D
Then, download the SAM checkpoint from here and place it in the ckpts/ directory.
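Before starting a long run, it can be worth confirming that a SAM checkpoint actually landed in ckpts/. The sketch below assumes it is run from the repository root and that the weights use a `.pth` or `.pt` extension. The README does not name the checkpoint file, so the script matches by extension only, and `find_checkpoints` is a hypothetical helper, not repository code.

```python
from pathlib import Path


def find_checkpoints(ckpt_dir="ckpts", suffixes=(".pth", ".pt")):
    """List checkpoint-like files directly inside ckpt_dir, sorted by name."""
    root = Path(ckpt_dir)
    if not root.is_dir():
        return []  # directory missing: nothing has been downloaded yet
    return sorted(p for p in root.iterdir()
                  if p.is_file() and p.suffix in suffixes)


if __name__ == "__main__":
    files = find_checkpoints()
    if files:
        for path in files:
            size_mb = path.stat().st_size / 1e6
            print(f"found {path} ({size_mb:.1f} MB)")
    else:
        print("no checkpoint found in ckpts/; download the SAM weights first")
```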
- Downloading/preparing the dataset (images + segmentation/language features via a 2D foundation model).
- (Pre-processing) Obtaining the 3DGS (3D Gaussian Splatting) model from the given images.
Our training process consists of two stages:
- Stage 1: Training the instance field.
- Stage 2: Instance2language mapping.
cd ./script/train && bash train.sh
Our inference process consists of three main steps:
- 3D grounding for given queries.
- Render images for novel views.
- Exporting the metrics.
cd ./script/infer && bash infer.sh
You can download the LERF dataset from this OneDrive / Baidu link (provided by OpenGaussian). Additionally, we provide our COS3D checkpoint for quick testing.
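For the metric-export step of inference, the README does not say which metrics infer.sh reports. Intersection over union (IoU) between a predicted and a ground-truth mask, averaged over queries, is a common choice for open-vocabulary segmentation, so the sketch below assumes it. It is dependency-free Python for illustration only and is not taken from the repository; `binary_iou` and `mean_iou` are hypothetical names.

```python
def binary_iou(pred, gt):
    """IoU of two same-shaped binary masks given as nested lists of 0/1."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Two empty masks agree perfectly, so their IoU is defined as 1.0 here.
    return inter / union if union else 1.0


def mean_iou(pairs):
    """Average IoU over (pred, gt) mask pairs, e.g. one pair per text query."""
    scores = [binary_iou(pred, gt) for pred, gt in pairs]
    if not scores:
        raise ValueError("mean_iou needs at least one (pred, gt) pair")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 1, 0]]
    gt = [[1, 0, 0],
          [0, 1, 1]]
    # Two pixels overlap and four are covered by either mask.
    print(binary_iou(pred, gt))  # 0.5
```

Treating two empty masks as a perfect match is a design choice; some benchmarks skip such cases instead, so check it against the evaluation protocol you compare with.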
- Release training code
- Release evaluation code
- Release the preprocessing code to support various 2D foundation models (e.g., SAM2 and Semantic-SAM for segmentation results, and SigLIP for language features).
- Release the applications code (e.g., novel image-based query).
This repository is still under construction. Please feel free to open issues or submit pull requests. We appreciate all contributions to this project.
@article{zhu2025cos3d,
title={COS3D: Collaborative Open-Vocabulary 3D Segmentation},
author={Zhu, Runsong and Hui, Ka-Hei and Liu, Zhengzhe and Wu, Qianyi and Tang, Weiliang and Qiu, Shi and Heng, Pheng-Ann and Fu, Chi-Wing},
journal={arXiv preprint arXiv:2510.20238},
year={2025}
}
Some code snippets are borrowed from OpenGaussian, LangSplat, GAGS, and Unified-Lift. We thank the authors for releasing their code.


