
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025
Chanyoung Kim, Dayun Ju, Woojung Han, Ming-Hsuan Yang, Seong Jae Hwang
Yonsei University & University of California, Merced
Open-Vocabulary Semantic Segmentation (OVSS) has advanced with recent vision-language models (VLMs), enabling segmentation beyond predefined categories through various learning schemes. Notably, training-free methods offer scalable, easily deployable solutions for handling unseen data, a key goal of OVSS. Yet, a critical issue persists: the lack of object-level context consideration when segmenting complex objects in the challenging environment of OVSS based on arbitrary query prompts. This oversight limits models' ability to group semantically consistent elements within an object and map them precisely to user-defined arbitrary classes. In this work, we introduce a novel approach that overcomes this limitation by incorporating object-level contextual knowledge within images. Specifically, our model enhances intra-object consistency by distilling spectral-driven features from vision foundation models into the attention mechanism of the visual encoder, enabling semantically coherent components to form a single object mask. Additionally, we refine the text embeddings with zero-shot object presence likelihood to ensure accurate alignment with the specific objects represented in the images. By leveraging object-level contextual knowledge, our proposed approach achieves state-of-the-art performance with strong generalizability across diverse datasets.
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 -f https://download.pytorch.org/whl/cu111/torch_stable.html
pip install openmim scikit-learn scikit-image
mim install mmcv==2.0.1 mmengine==0.8.4 mmsegmentation==1.1.1
pip install ftfy regex numpy==1.23.5 yapf==0.40.1
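After installation, the short Python snippet below (not part of the repository, just a convenience check) prints the installed versions of the key packages and whether CUDA is visible, which helps confirm the environment matches the versions above before running evaluation.

import importlib

# Print versions of the dependencies listed in the install commands above.
for name in ["torch", "torchvision", "mmcv", "mmengine", "mmseg"]:
    try:
        module = importlib.import_module(name)
        print(f"{name}: {getattr(module, '__version__', 'unknown version')}")
    except ImportError:
        print(f"{name}: not installed")

import torch  # installed above; an ImportError here means the install did not succeed
print("CUDA available:", torch.cuda.is_available())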
Following SCLIP, we include configurations for the following datasets in this repo: PASCAL VOC (with and without the background category), PASCAL Context (with and without the background category), Cityscapes, ADE20K, COCO-Stuff164k, and COCO-Object.
Please follow the MMSeg data preparation document to download and pre-process the datasets. The COCO-Object dataset can be converted from COCO-Stuff164k by executing the following command:
python ./datasets/cvt_coco_object.py PATH_TO_COCO_STUFF164K -o PATH_TO_COCO_OBJECT
To evaluate CASS on a single benchmark, run the following command:
python eval.py --config ./configs/cfg_{benchmark_name}.py --pamr off
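For example, assuming the Pascal VOC (without background) configuration follows the naming pattern above as cfg_voc20.py (the exact config file names may differ in this repo):

python eval.py --config ./configs/cfg_voc20.py --pamr off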
To evaluate CASS on all 8 benchmarks, run the following command:
bash eval_all.sh
Detailed illustration of our proposed training-free spectral object-level context distillation mechanism. By matching the attention graphs of the VFM and CLIP head-by-head to establish complementary relationships, and by distilling the fundamental object-level context of the VFM graph into CLIP, we enhance CLIP's ability to capture intra-object contextual coherence.
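As a rough illustration of this idea, here is a minimal PyTorch sketch written from the description above rather than from the released implementation; the eigendecomposition-based low-rank filtering, the head matching by lowest cosine similarity, and the blending weight alpha are all assumptions made for the sketch.

import torch
import torch.nn.functional as F

def spectral_filter(attn, k=8):
    """Keep only the k leading spectral components of (heads, N, N) attention graphs,
    roughly preserving their coarse object-level grouping structure."""
    sym = 0.5 * (attn + attn.transpose(-1, -2))        # symmetrize into a graph
    eigvals, eigvecs = torch.linalg.eigh(sym)          # eigenvalues in ascending order
    top_vecs = eigvecs[..., -k:]                       # leading eigenvectors
    top_vals = eigvals[..., -k:]
    return top_vecs @ torch.diag_embed(top_vals) @ top_vecs.transpose(-1, -2)

def distill_object_context(clip_attn, vfm_attn, k=8, alpha=0.5):
    """clip_attn, vfm_attn: (heads, N, N) attention maps over the same N patch tokens.
    Heads are paired by lowest cosine similarity (one possible reading of
    "complementary" matching), then the filtered VFM graph is blended into CLIP's."""
    filtered = spectral_filter(vfm_attn, k)                      # (H_vfm, N, N)
    c = F.normalize(clip_attn.flatten(1), dim=1)                 # (H_clip, N*N)
    v = F.normalize(filtered.flatten(1), dim=1)                  # (H_vfm, N*N)
    match = (c @ v.t()).argmin(dim=1)                            # most complementary VFM head
    blended = (1 - alpha) * clip_attn + alpha * filtered[match]
    return blended.softmax(dim=-1)                               # re-normalize rows

The low-rank reconstruction keeps only the dominant spectral components of the VFM graph, which is one simple way to retain coarse object-level grouping while discarding fine-grained noise before injecting it into CLIP's attention.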
Detailed illustration of our object presence prior-guided text embedding adjustment module. The CLIP text encoder generates text embeddings for each object class, and the object presence prior is derived from both visual and text embeddings. Within hierarchically defined class groups, text embeddings are selected based on object presence prior, then refined in an object-specific direction to align with components likely present in the image.
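A similarly hedged sketch of this module follows, again written from the caption rather than the released code; the use of a global image embedding for the prior, the hierarchical group input, the softmax temperature, and the interpolation weight beta are assumptions made for illustration.

import torch
import torch.nn.functional as F

def object_presence_prior(image_emb, text_embs, temperature=0.01):
    """Zero-shot likelihood that each class is present, from a global image
    embedding (D,) and the class text embeddings (C, D)."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_embs = F.normalize(text_embs, dim=-1)
    return ((image_emb @ text_embs.t()) / temperature).softmax(dim=-1)   # (C,)

def adjust_text_embeddings(image_emb, text_embs, groups, beta=0.3):
    """groups: list of index tensors, each a hierarchically defined set of related
    classes. Within each group, the class judged most likely present acts as the
    object-specific direction toward which the group's embeddings are nudged."""
    prior = object_presence_prior(image_emb, text_embs)
    adjusted = text_embs.clone()
    for idx in groups:
        anchor = idx[prior[idx].argmax()]          # most likely class in the group
        direction = text_embs[anchor]
        adjusted[idx] = (1 - beta) * text_embs[idx] + beta * direction
    return F.normalize(adjusted, dim=-1)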
Qualitative comparison across the Pascal VOC, Pascal Context, COCO, and ADE20K datasets using CLIP ViT-B/16.
Quantitative results with state-of-the-art unsupervised open-vocabulary semantic segmentation models on eight datasets.
If you find this code useful, please cite the following paper:
@InProceedings{kim2024distilling,
author = {Kim, Chanyoung and Ju, Dayun and Han, Woojung and Yang, Ming-Hsuan and Hwang, Seong Jae},
title = {Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2025}
}
This repository has been developed based on the NACLIP and SCLIP repositories. Thanks to the authors for their great work!