- Referring segmentation datasets (required for both training and evaluation): (FP-/R-)refcoco(+/g) annotations, COCO images
- Visual Question Answering dataset (required for training referring segmentation models): LLaVA-Instruct-150k
- Semantic segmentation datasets (required for training models on reasoning segmentation tasks): ADE20K, COCO-Stuff, PACO-LVIS, PASCAL-Part, COCO images

  Note: For COCO-Stuff, we use the annotation file `stuffthingmaps_trainval2017.zip`. We use only the PACO-LVIS part of PACO. COCO images should be placed in the `dataset/coco/` directory.
- Augmented reasoning segmentation dataset (with false-premise queries): FP-Aug ReasonSeg
Download the datasets from the links above and organize them as follows:
```
SESAME
├── dataset
│   ├── ade20k
│   │   ├── annotations
│   │   └── images
│   ├── coco
│   │   └── train2017
│   │       ├── 000000000009.jpg
│   │       └── ...
│   ├── cocostuff
│   │   └── train2017
│   │       ├── 000000000009.png
│   │       └── ...
│   ├── llava_dataset
│   │   └── llava_instruct_150k.json
│   ├── reason_seg
│   │   └── ReasonSeg
│   │       ├── train
│   │       └── val
│   ├── refer_seg
│   │   ├── images
│   │   │   └── mscoco
│   │   │       └── images
│   │   │           └── train2014
│   │   ├── refclef
│   │   ├── refcoco
│   │   ├── refcoco+
│   │   ├── refcocog
│   │   ├── R-refcoco
│   │   ├── R-refcoco+
│   │   ├── R-refcocog
│   │   ├── fprefcoco
│   │   ├── fprefcoco+
│   │   └── fprefcocog
│   └── vlpart
│       ├── paco
│       │   └── annotations
│       └── pascal_part
│           ├── train.json
│           └── VOCdevkit
```