
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation



[Figure: result]

This repo is the official implementation of the ECCV 2024 paper In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation.

Conda installation command

conda env create -f environment.yml --prefix $YOURPREFIX

$YOURPREFIX is typically /home/$USER/anaconda3
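
Once created, the environment can be activated by its prefix path:

conda activate $YOURPREFIX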

Dependencies

This repo is built on CLIP, SCLIP, and MMSegmentation.

mim install mmcv==2.0.1 mmengine==0.8.4 mmsegmentation==1.1.1
pip install ftfy regex yapf==0.40.1
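
The mim command comes from OpenMIM; if it is not installed yet:

pip install -U openmim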

Dataset preparation

Please prepare Pascal VOC 2012, Pascal Context, COCO-Stuff 164K, COCO-Object, and ADEChallengeData2016 following the MMSeg data preparation. The COCO-Object dataset can be converted from COCO-Stuff 164K by executing the following command:

python datasets/cvt_coco_object.py PATH_TO_COCO_STUFF164K -o PATH_TO_COCO164K

Place them under the $yourdatasetroot/ directory such that:

    $yourdatasetroot/
    ├── ADEChallengeData2016/
    │   ├── annotations/
    │   ├── images/
    │   ├── ...
    ├── VOC2012/
    │   ├── Annotations/
    │   ├── JPEGImages/
    │   ├── ...
    ├── coco_stuff164k/
    │   ├── annotations/
    │   ├── images/
    │   ├── ...
    ├── ...

1) Panoptic Cut for unsupervised object mask discovery

cd panoptic_cut
python predict.py \
    --logs panoptic_cut \
    --dataset {coco_object, coco_stuff, ade20k, voc21, voc20, context60, context59} \
    --datasetroot $yourdatasetroot
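
For example, to run mask discovery on Pascal VOC with the background-included 21-class setting, pick voc21 from the choices above:

python predict.py --logs panoptic_cut --dataset voc21 --datasetroot $yourdatasetroot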

The precomputed mask predictions from stage 1) can be downloaded from the Google Drive links below:

| mask prediction root after stage 1) | benchmark id | Google Drive link |
| --- | --- | --- |
| coco_stuff164k | coco_object, coco_stuff164k | link to download (84.5 MB) |
| VOC2012 | context59, context60, voc20, voc21 | link to download (66.7 MB) |
| ADEChallengeData2016 | ade20k | link to download (29.4 MB) |

Place them under the lavg/panoptic_cut/pred/ directory such that:

    lavg/panoptic_cut/pred/panoptic_cut/
    ├── ADEChallengeData2016/
    │   ├── ADE_val_00000001.pth
    │   ├── ADE_val_00000002.pth
    │   ├── ...
    ├── VOC2012/
    │   ├── 2007_000033.pth
    │   ├── 2007_000042.pth
    │   ├── ...
    ├── coco_stuff164k/
    │   ├── 000000000139.pth
    │   ├── 000000000285.pth
    │   ├── ...
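
Each .pth file stores the predicted masks for one image. As a quick sanity check after downloading, a file can be loaded with PyTorch (a minimal sketch; the exact structure of the saved object depends on the predict.py output format):

python -c "import torch; pred = torch.load('lavg/panoptic_cut/pred/panoptic_cut/VOC2012/2007_000033.pth'); print(type(pred))"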

2) Visual grounding & Segmentation evaluation

Update $yourdatasetroot in configs/cfg_*.py

cd lavg
python eval.py --config ./configs/{cfg_context59/cfg_context60/cfg_voc20/cfg_voc21}.py --maskpred_root VOC2012/panoptic_cut
python eval.py --config ./configs/cfg_ade20k.py --maskpred_root ADEChallengeData2016/panoptic_cut
python eval.py --config ./configs/{cfg_coco_object/cfg_coco_stuff164k}.py --maskpred_root coco_stuff164k/panoptic_cut

The evaluation is single-GPU compatible.
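
For example, evaluating the VOC21 benchmark (Pascal VOC with the background category) expands the brace notation above to:

python eval.py --config ./configs/cfg_voc21.py --maskpred_root VOC2012/panoptic_cut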

Quantitative performance (mIoU, %) on open-vocabulary semantic segmentation benchmarks

The first three benchmarks are evaluated with a background category, the last four without:

| Method | VOC21 | Context60 | COCO-Obj | VOC20 | Context59 | ADE | COCO-Stuff |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LaVG | 62.1 | 31.6 | 34.2 | 82.5 | 34.7 | 15.8 | 23.2 |

Related repos

Our project refers to and heavily borrows code from the following repos: CLIP, SCLIP, and MMSegmentation.

Acknowledgements

This work was supported by Samsung Electronics (IO201208-07822-01), the NRF grant (NRF-2021R1A2C3012728) (45%), and the IITP grants (RS-2022-II220959: Few-Shot Learning of Causal Inference in Vision and Language for Decision Making (50%); RS-2019-II191906: AI Graduate School Program at POSTECH (5%)) funded by the Ministry of Science and ICT, Korea. We also thank Sua Choi for her helpful discussion.
