Depth Anything for Semantic Segmentation

We use our Depth Anything pre-trained ViT-L encoder to fine-tune downstream semantic segmentation models.

Performance

Cityscapes

Note that our results are obtained without Mapillary pre-training.

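| Method | Encoder | mIoU (s.s.) | mIoU (m.s.) |
| --- | --- | --- | --- |
| SegFormer | MiT-B5 | 82.4 | 84.0 |
| Mask2Former | Swin-L | 83.3 | 84.3 |
| OneFormer | Swin-L | 83.0 | 84.4 |
| OneFormer | ConvNeXt-XL | 83.6 | 84.6 |
| DDP | ConvNeXt-L | 83.2 | 83.9 |
| Ours | ViT-L | 84.8 | 86.2 |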

ADE20K

| Method | Encoder | mIoU |
| --- | --- | --- |
| SegFormer | MiT-B5 | 51.0 |
| Mask2Former | Swin-L | 56.4 |
| UperNet | BEiT-L | 56.3 |
| ViT-Adapter | BEiT-L | 58.3 |
| OneFormer | Swin-L | 57.4 |
| OneFormer | ConvNeXt-XL | 57.4 |
| Ours | ViT-L | 59.4 |
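
For readers unfamiliar with the metric, mIoU is the per-class intersection-over-union averaged over classes, reported in the tables above as a percentage. The following is a minimal sketch of that definition, not the MMSegmentation evaluation code. The function name `mean_iou`, the flat label lists, and the ignore label `255` are assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flat sequences of class ids, one per pixel.
    Pixels whose target equals ignore_index (assumed to be 255) are skipped.
    Predictions are assumed to lie in [0, num_classes).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            # Correct pixel: counts once in both intersection and union.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of both classes involved.
            union[t] += 1
            union[p] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("no valid pixels to evaluate")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 255]
    # Per-class IoU: 1/2, 2/3, 1/1, so the mean is 13/18.
    print(f"{mean_iou(pred, target, num_classes=3):.3f}")  # 0.722
```

In practice the intersection and union counts are accumulated over the whole validation set before dividing, rather than averaged per image.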

Pre-trained models

Installation

Please refer to MMSegmentation for installation instructions. Also install mmdet, which is required for Mask2Former support:

pip install "mmdet>=3.0.0rc4"
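
To confirm that the installed mmdet satisfies the `>=3.0.0rc4` requirement, a check along the following lines can help. This is a sketch using only the standard library. The helper names `version_key` and `check_min_version` are hypothetical, and the version parser handles only a simplified subset of version strings (numeric releases with optional `a`, `b`, or `rc` suffixes); `packaging.version.Version` is the more robust choice if the `packaging` library is available.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Pre-release tags sort below a final release (simplified subset of PEP 440).
_PRE_RANK = {"a": 0, "b": 1, "rc": 2, None: 3}
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?")


def version_key(text: str) -> Tuple[Tuple[int, ...], int, int]:
    """Turn a version such as '3.0.0rc4' into a sortable key."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    # Drop trailing zeros so that '3.0' and '3.0.0' compare equal.
    while len(release) > 1 and release[-1] == 0:
        release = release[:-1]
    pre_tag, pre_num = match.group(2), match.group(3)
    return release, _PRE_RANK[pre_tag], int(pre_num or 0)


def check_min_version(package: str, minimum: str) -> Optional[bool]:
    """Return True or False for an installed package, None if it is missing."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return version_key(installed) >= version_key(minimum)


if __name__ == "__main__":
    result = check_min_version("mmdet", "3.0.0rc4")
    if result is None:
        print("mmdet is not installed")
    elif result:
        print("mmdet version is sufficient")
    else:
        print("mmdet is older than 3.0.0rc4")
```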

After installation:

For training or inference with our pre-trained models, please refer to MMSegmentation instructions.