
COS3D: Collaborative Open-Vocabulary 3D Segmentation

This is the official PyTorch implementation of the following publication:

COS3D: Collaborative Open-Vocabulary 3D Segmentation
Runsong Zhu, Ka-Hei Hui, Zhengzhe Liu, Qianyi Wu, Weiliang Tang, Shi Qiu, Pheng-Ann Heng, Chi-Wing Fu.
NeurIPS 2025
Paper (NeurIPS) | Paper (arXiv)

Introduction

TL;DR:

1. This paper contributes COS3D, a novel and effective collaborative prompt-segmentation framework for the open-vocabulary 3D segmentation task.

2. Extensive experiments demonstrate that COS3D (i) significantly outperforms existing baselines with superior training efficiency, and (ii) shows high potential for various applications, such as novel image-based 3D segmentation, hierarchical segmentation, and robotics.

Teaser

Teaser image

Overview

Method image

Applications

Method image

Requirements

The code has been tested on:

  • Ubuntu 20.04
  • CUDA 11.8
  • Python 3.8.18
  • PyTorch 1.12.1
  • GeForce RTX 4090

Installation

Cloning the Repository

The repository contains submodules, so please clone it with:

# HTTPS
git clone https://github.com/Runsong123/COS3D.git --recursive
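If you have already cloned the repository without --recursive, you can fetch the submodules afterwards with:

# fetch submodules for an existing clone
git submodule update --init --recursive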

Environment Setup

Our default installation method is based on Conda package and environment management:

conda env create --file environment.yml
conda activate COS3D

Then, download the SAM checkpoint from here and place it in the ckpts/ directory.
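For example, assuming the ViT-H SAM variant is the one expected (please check the repository scripts for the exact filename), the checkpoint can be fetched with:

# assumption: ViT-H variant; verify the expected filename in the repo scripts
mkdir -p ckpts
wget -P ckpts https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth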

Pre-processing

  1. Download/prepare the dataset (images, plus segmentation and language features extracted with a 2D foundation model).
  2. Obtain the 3DGS (3D Gaussian Splatting) reconstruction from the given images (see the sketch below).
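As a minimal sketch of step 2, assuming the bundled 3DGS submodule follows the original gaussian-splatting interface (the actual script location and flags may differ), the reconstruction can be obtained with:

# assumes a COLMAP-processed scene folder (images/ + sparse/); paths are illustrative
python train.py -s /path/to/scene -m /path/to/output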

Training

Our training process consists of two steps:

  1. Stage 1: Training the instance field.
  2. Stage 2: Learning the instance-to-language mapping.

For simplicity, you can run the following training script:

cd ./script/train && bash train.sh
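Conceptually, the script runs the two stages back to back. The script names and flags below are hypothetical placeholders, so refer to script/train/train.sh for the actual entry points:

# hypothetical placeholders -- see script/train/train.sh for the real commands
python train_instance_field.py -s /path/to/scene      # Stage 1: instance field
python train_instance2language.py -s /path/to/scene   # Stage 2: instance-to-language mapping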

Inference

Our inference process consists of three main steps:

  1. 3D grounding for the given queries.
  2. Rendering images for novel views.
  3. Exporting the metrics.

For simplicity, you can run the following inference script:

cd ./script/infer && bash infer.sh
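As a rough illustration only, the three steps might map to calls like the following; every script name and flag here is a hypothetical placeholder, so consult script/infer/infer.sh for the real commands:

# hypothetical placeholders -- see script/infer/infer.sh for the real commands
python ground_queries.py -s /path/to/scene --queries "green apple" "tea cup"   # 1. 3D grounding
python render_views.py -s /path/to/scene                                      # 2. novel-view rendering
python export_metrics.py -s /path/to/scene                                    # 3. metrics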

Data and checkpoint

You can download the LERF dataset from this OneDrive / Baidu (provided by OpenGaussian). Additionally, we provide our COS3D checkpoint for quick testing.

TODO list

  • Release training code
  • Release evaluation code
  • Release the preprocessing code to support various 2D foundation models (e.g., SAM2 and Semantic-SAM for segmentation results, and SigLIP for language features).
  • Release the applications code (e.g., novel image-based query).

This repository is still under construction. Please feel free to open issues or submit pull requests. We appreciate all contributions to this project.

Citation

@article{zhu2025cos3d,
  title={COS3D: Collaborative Open-Vocabulary 3D Segmentation},
  author={Zhu, Runsong and Hui, Ka-Hei and Liu, Zhengzhe and Wu, Qianyi and Tang, Weiliang and Qiu, Shi and Heng, Pheng-Ann and Fu, Chi-Wing},
  journal={arXiv preprint arXiv:2510.20238},
  year={2025}
}

Related Projects

Some code snippets are borrowed from OpenGaussian, LangSplat, GAGS, and Unified-Lift. We thank the authors for releasing their code.
