This is the official implementation of the paper "Unleashing the Power of Visual Foundation Models for Generalizable Semantic Segmentation." In this paper, we propose a novel framework that leverages visual foundation models (VFMs) for domain generalizable semantic segmentation (DGSS). The core idea is to fine-tune the VFM with minimal modifications and enable inference on high-resolution images. We argue that this approach maintains the pretrained knowledge of the VFM and unleashes its power for DGSS. We conduct experiments on various benchmarks and achieve an average mIoU of 70.3% on GTAV to {Cityscapes + BDD100K + Mapillary} and 71.62% on Cityscapes to {BDD100K + Mapillary}, outperforming the previous state-of-the-art approaches by 3.3% and 1.1% in average mIoU, respectively.
- [Environment Setup](#environment-setup)
- [Dataset Preparation](#dataset-preparation)
- [Preparing Visual Foundation Models](#preparing-visual-foundation-models)
- [Training](#training)
- [Evaluation](#evaluation)
- [Overview of Important Files](#overview-of-important-files)
## Environment Setup

To set up the environment for this project, execute the following script:

```bash
chmod +x install.sh
./install.sh
```
This script creates a conda virtual environment named `DGVFM` and installs all required dependencies. Before running any code, activate the environment:

```bash
conda activate DGVFM
```
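After activation, a quick sanity check confirms the environment is functional (a minimal sketch; it only assumes PyTorch is among the dependencies installed by `install.sh`):

```python
# Minimal environment sanity check: verifies that PyTorch imports and a
# GPU is visible. Assumes install.sh installs PyTorch, which a
# segmentation codebase requires.
import torch

print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```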
## Dataset Preparation

1. Download the datasets:

   - GTA: Download all image and label packages from here and extract them to `data/gta`.
   - Cityscapes: Download `leftImg8bit_trainvaltest.zip` and `gt_trainvaltest.zip` from here and extract them to `data/cityscapes`.
   - BDD100K: Download the 10K Images and Segmentation from here and extract them to `data/bdd100k`.
   - Mapillary: Download MAPILLARY v1.2 from here and extract it to `data/mapillary`.
The final folder structure should look like this:

```
DGVFM
├── ...
├── data
│   ├── gta
│   │   ├── images
│   │   ├── labels
│   ├── cityscapes
│   │   ├── leftImg8bit
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── gtFine
│   │   │   ├── train
│   │   │   ├── val
│   ├── bdd100k
│   │   ├── images
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── labels
│   │   │   ├── train
│   │   │   ├── val
│   ├── mapillary
│   │   ├── training
│   │   │   ├── images
│   │   │   ├── labels
│   │   ├── validation
│   │   │   ├── images
│   │   │   ├── val_label
├── ...
```
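Before converting, you can verify that the extracted datasets match the layout above with a short check (a sketch that uses only the directory names shown in the tree):

```python
# Verify the dataset layout matches the tree above before running conversion.
import os

expected = [
    "data/gta/images", "data/gta/labels",
    "data/cityscapes/leftImg8bit", "data/cityscapes/gtFine",
    "data/bdd100k/images", "data/bdd100k/labels",
    "data/mapillary/training/images", "data/mapillary/validation/images",
]
for d in expected:
    print(f"{d}: {'ok' if os.path.isdir(d) else 'MISSING'}")
```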
2. Convert the datasets with these commands:

```bash
cd DGVFM
python tools/convert_datasets/gta.py data/gta
python tools/convert_datasets/cityscapes.py data/cityscapes
python tools/convert_datasets/mapillary2cityscape.py data/mapillary \
    data/mapillary/cityscapes_trainIdLabel --train_id
# You do not need to convert BDD100K; it is already in the correct format.
```
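To spot-check the conversion, you can load one of the generated label maps (a sketch; the `*_labelTrainIds.png` filename pattern is an assumption based on the usual MMSegmentation convention for Cityscapes-style converters):

```python
# Spot-check a converted Cityscapes label map: train IDs must be 0-18,
# with 255 marking ignored pixels. The *_labelTrainIds.png pattern is
# assumed from the common MMSegmentation convention.
import glob

import numpy as np
from PIL import Image

files = glob.glob("data/cityscapes/gtFine/**/*_labelTrainIds.png",
                  recursive=True)
label = np.array(Image.open(files[0]))
assert {int(v) for v in np.unique(label)} <= set(range(19)) | {255}
print(f"{len(files)} converted label maps found; sample IDs look valid")
```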
## Preparing Visual Foundation Models

- Download: Download the pre-trained weights of the VFMs and place them in the `checkpoints` directory without changing the file names. You only need to download the model you want to run:
| Model | Download Link | Filename | Size |
|---|---|---|---|
| DINOv2 | DINOv2-ViT-L/14 | dinov2_vitl14_pretrain.pth | 1.2GB |
| EVA02 | EVA02-ViT-L/14 | eva02_L_pt_m38m_p14to16.pt | 613MB |
| CLIP | CLIP-ViT-L/14 | ViT-L-14.pt | 890MB |
| SAM | SAM-ViT-H/16 | sam_vit_h_4b8939.pth | 2.4GB |
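Before converting, you can confirm that a downloaded checkpoint deserializes correctly (a sketch; the DINOv2 filename is taken from the table above, so substitute the file of whichever model you downloaded):

```python
# Confirm a downloaded VFM checkpoint loads before converting it.
# The DINOv2 filename below is from the table; substitute your model's file.
import torch

state = torch.load("checkpoints/dinov2_vitl14_pretrain.pth",
                   map_location="cpu")
print(f"{len(state)} entries; first key: {next(iter(state))}")
```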
- Convert: Convert the pre-trained weights for training or evaluation:

```bash
# convert DINOv2
python tools/convert_models/convert_dinov2.py checkpoints/dinov2_vitl14_pretrain.pth checkpoints/dinov2_converted.pth
# convert EVA02
python tools/convert_models/convert_eva2_512x512.py checkpoints/eva02_L_pt_m38m_p14to16.pt checkpoints/eva02_L_converted.pth
# convert CLIP
python tools/convert_models/convert_clip.py checkpoints/ViT-L-14.pt checkpoints/CLIP-ViT-L_converted.pth
# convert SAM
python tools/convert_models/convert_sam.py checkpoints/sam_vit_h_4b8939.pth checkpoints/sam_vit_h_converted.pth
```
## Training

Start training on a single GPU:

```bash
python tools/train.py configs/dg/gta2citys/dg_lora_dinov2_ms_masked.py
```

You can also run the provided script:

```bash
./train.sh
```
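If you want to inspect or tweak settings before launching, the config can be loaded programmatically (a sketch that assumes an MMSegmentation/MMEngine-style config system, which the `tools/` and `configs/*.py` layout suggests):

```python
# Load and skim the training config before launching a run. Assumes an
# MMEngine-style config system, an inference from the configs/*.py layout.
from mmengine.config import Config

cfg = Config.fromfile("configs/dg/gta2citys/dg_lora_dinov2_ms_masked.py")
print(cfg.pretty_text[:1000])  # skim model/backbone/dataset settings
```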
## Evaluation

Run the evaluation:

```bash
python tools/test.py \
    configs/dg/gta2citys/dg_lora_dinov2_ms_masked.py \
    <path_to_your_checkpoint> \
    --backbone checkpoints/dinov2_converted.pth
```

You can also run the provided script:

```bash
./test.sh
```
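For reference, the reported mIoU averages per-class intersection-over-union. Here is a self-contained sketch of that computation (illustrative only; it is not the evaluation code that `tools/test.py` runs):

```python
# Illustrative mIoU computation from a per-class confusion matrix, where
# confusion[i, j] counts pixels of true class i predicted as class j.
import numpy as np

def mean_iou(confusion: np.ndarray) -> float:
    tp = np.diag(confusion).astype(float)
    fp = confusion.sum(axis=0) - tp
    fn = confusion.sum(axis=1) - tp
    denom = np.maximum(tp + fp + fn, 1.0)  # avoid division by zero
    return float((tp / denom).mean())

# Toy 2-class example: perfect predictions give an mIoU of 1.0.
print(mean_iou(np.array([[10, 0], [0, 5]])))  # -> 1.0
```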
## Overview of Important Files

This section provides an overview of the code files related to the model architecture and design:

- `core/models/backbones`: This folder contains the encoder implementations of the VFMs, including `dino_v2.py`, `eva_02.py`, `sam_vit.py`, and `clip.py`. `lora_backbone.py` implements the LoRA-based fine-tuning algorithm (see the sketch after this list).
- `core/models/heads`: This folder contains the head implementations for our VFMNet and MGRNet. `Liner_head.py` implements the head for VFMNet; `VFMHead.py` implements the head for MGRNet.
- `core/segmentors/Ms_VFM_encoder_decoder.py`: This file implements our multi-scale training algorithm and the two-stage coarse-to-fine inference algorithm.
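To make the fine-tuning design concrete, here is a minimal sketch of the LoRA idea that `lora_backbone.py` builds on (illustrative only; it is not the repository's implementation): the pretrained weights stay frozen and only a low-rank residual is trained, which is what preserves the VFM's pretrained knowledge.

```python
# Minimal LoRA sketch: a frozen pretrained linear layer plus a trainable
# low-rank update. Illustrative only; not the code in lora_backbone.py.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained knowledge stays frozen
        # A is small-random, B is zero, so training starts from the
        # unmodified pretrained mapping.
        self.lora_a = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + scale * (x A) B
        return self.base(x) + (x @ self.lora_a @ self.lora_b) * self.scale

# Example: wrap a 1024-dim projection, as found in a ViT-L block.
layer = LoRALinear(nn.Linear(1024, 1024))
out = layer(torch.randn(2, 16, 1024))
print(out.shape)  # torch.Size([2, 16, 1024])
```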