This is an official PyTorch implementation of *Semantic-Guided Representation Enhancement for Multi-Label Image Classification* (IEEE Transactions on Circuits and Systems for Video Technology, 2024). [paper]
- Download the datasets and organize them as follows (a small layout sanity check is sketched after the tree):
```
|datasets
|---- MSCOCO
|-------- annotations
|-------- train2014
|-------- val2014
|---- NUS-WIDE
|-------- Flickr
|-------- Groundtruth
|-------- ImageList
|-------- NUS_WID_Tags
|-------- Concepts81.txt
|---- VOC2007
|-------- Annotations
|-------- ImageSets
|-------- JPEGImages
|-------- SegmentationClass
|-------- SegmentationObject
|---- VisualGenome
|-------- ssgrl_partition
|------------ test_list_500.txt
|------------ train_list_500.txt
|------------ vg_category_500_labels_index.json
|-------- VG_100K
|-------- VG_100K_2
|-------- objects.json
```
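Before preprocessing, a small script like the following can verify the layout above. This is a convenience sketch only; the `datasets` root location is an assumption and should be adjusted to your setup.

```python
import os

# Expected top-level layout, mirroring the tree above.
# The "datasets" root path is an assumption; adjust if yours differs.
EXPECTED = {
    "MSCOCO": ["annotations", "train2014", "val2014"],
    "NUS-WIDE": ["Flickr", "Groundtruth", "ImageList", "NUS_WID_Tags", "Concepts81.txt"],
    "VOC2007": ["Annotations", "ImageSets", "JPEGImages"],
    "VisualGenome": ["ssgrl_partition", "VG_100K", "VG_100K_2", "objects.json"],
}

def check_layout(root="datasets"):
    for dataset, entries in EXPECTED.items():
        for entry in entries:
            path = os.path.join(root, dataset, entry)
            status = "ok" if os.path.exists(path) else "MISSING"
            print(f"{status:>7}  {path}")

if __name__ == "__main__":
    check_layout()
```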
- Preprocess using the following commands (a hedged sketch of the label-embedding step follows the block):
```
python scripts/mscoco.py
python scripts/nuswide.py
python scripts/voc2007.py
python scripts/vg500.py
python embedding.py --data [mscoco, nuswide, voc2007, vg500]
```
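Presumably, `embedding.py` generates semantic embeddings for the category names of the chosen dataset, since the method is semantic-guided. Below is a hedged sketch of the general idea using averaged GloVe word vectors; the GloVe file name, the category list, and the output path are illustrative assumptions, not the script's actual interface.

```python
import numpy as np

def load_glove(path="glove.6B.300d.txt"):
    """Load GloVe vectors into a dict; the file path is an assumption."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *vals = line.rstrip().split(" ")
            vectors[word] = np.asarray(vals, dtype=np.float32)
    return vectors

def embed_labels(labels, vectors, dim=300):
    """Average word vectors for multi-word labels, e.g. 'traffic light'."""
    out = np.zeros((len(labels), dim), dtype=np.float32)
    for i, label in enumerate(labels):
        words = [vectors[w] for w in label.lower().split() if w in vectors]
        if words:
            out[i] = np.mean(words, axis=0)
    return out

if __name__ == "__main__":
    # Hypothetical usage with a few MSCOCO category names;
    # the output file name is also an assumption.
    emb = embed_labels(["person", "traffic light", "dog"], load_glove())
    np.save("label_embeddings.npy", emb)
```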
Requirements:
```
torch >= 1.9.0
torchvision >= 0.10.0
```
One can use the following commands to train models and reproduce the results reported in the paper.
```
python train.py --model mlic --arch resnet101 --data voc2007 --loss asl --batch-size 128 --lr 0.00009 --lamda 0.1 --ema-decay 0.9983 --pos
python train.py --model mlic --arch resnet101 --data mscoco --loss asl --batch-size 128 --lr 0.00009 --lamda 0.4 --pos
python train.py --model mlic --arch resnet101 --data nuswide --loss asl --batch-size 128 --lr 0.00009 --lamda 0.05 --pos
```
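`--loss asl` selects the asymmetric loss (ASL) of Ridnik et al., ICCV 2021. For reference, a minimal PyTorch sketch of ASL follows; the focusing parameters and clipping margin below are the defaults recommended in that paper and are not necessarily the values used in this repository.

```python
import torch
import torch.nn as nn

class AsymmetricLoss(nn.Module):
    """Minimal asymmetric loss (ASL, Ridnik et al. ICCV 2021) for multi-label logits."""
    def __init__(self, gamma_neg=4.0, gamma_pos=0.0, clip=0.05, eps=1e-8):
        super().__init__()
        self.gamma_neg, self.gamma_pos = gamma_neg, gamma_pos
        self.clip, self.eps = clip, eps

    def forward(self, logits, targets):
        # logits, targets: (batch, num_classes); targets in {0, 1}
        p = torch.sigmoid(logits)
        p_m = (p - self.clip).clamp(min=0)  # probability margin shifting for negatives
        loss_pos = targets * (1 - p) ** self.gamma_pos * torch.log(p.clamp(min=self.eps))
        loss_neg = (1 - targets) * p_m ** self.gamma_neg * torch.log((1 - p_m).clamp(min=self.eps))
        return -(loss_pos + loss_neg).sum()
```

The asymmetry (a large `gamma_neg`, a small `gamma_pos`, plus the margin `clip`) down-weights easy negatives, which dominate multi-label datasets where most classes are absent in any given image.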
Note that the choice for `--arch` can also be `tresnet_l`, `tresnet_l21k`, `vit_large_patch16_224`, `vit_large_patch16_224_in21k`, or `swin_large_patch4_window12_384_in22k`. One can prepend `CUDA_VISIBLE_DEVICES=0,1,2,3` to the commands to enable distributed data parallel training on the available GPUs.
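The `--ema-decay` flag in the VOC2007 command suggests that an exponential moving average (EMA) of the model weights is maintained during training. Below is a generic sketch of that technique with the decay from the command above; the class name and update hook are illustrative, not this repository's actual code.

```python
import copy
import torch

class ModelEMA:
    """Generic exponential moving average of model weights (illustrative only)."""
    def __init__(self, model, decay=0.9983):
        self.ema = copy.deepcopy(model).eval()
        self.decay = decay
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # Blend current weights into the running average: ema = d * ema + (1 - d) * w
        for ema_v, v in zip(self.ema.state_dict().values(), model.state_dict().values()):
            if ema_v.dtype.is_floating_point:
                ema_v.mul_(self.decay).add_(v.detach(), alpha=1 - self.decay)
```

Typically `update` is called after every optimizer step, and the smoothed copy (`ema.ema`) is the model that gets evaluated.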
Pre-trained models are available in link. Download them and put them in the `experiments` folder; then one can use the following commands to reproduce the results reported in the paper.
```
python evaluate.py --exp-dir experiments/mlic_mscoco/exp1   # evaluation for ResNet101 on MSCOCO
python evaluate.py --exp-dir experiments/mlic_mscoco/exp2   # evaluation for TResNetL on MSCOCO
python evaluate.py --exp-dir experiments/mlic_mscoco/exp3   # evaluation for ViT-Large on MSCOCO
```
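Multi-label results are conventionally reported as mean average precision (mAP). For reference, here is a self-contained sketch of per-class AP and mAP computation; it is not necessarily the exact evaluation code in `evaluate.py`.

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one class: scores (N,), labels (N,) in {0, 1}."""
    order = np.argsort(-scores)          # rank samples by descending score
    labels = labels[order]
    cum_pos = np.cumsum(labels)
    precision = cum_pos / (np.arange(len(labels)) + 1)
    return (precision * labels).sum() / max(labels.sum(), 1)

def mean_average_precision(scores, labels):
    """mAP over classes: scores and labels of shape (N, num_classes)."""
    return np.mean([average_precision(scores[:, c], labels[:, c])
                    for c in range(scores.shape[1])])
```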
To visualize the attention heatmaps shown in the paper, run the following command; the visualization results are saved in the `visualization` folder of the corresponding experiment.
```
python infer.py --exp-dir experiments/mlic_mscoco/exp1
```
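To inspect a saved attention map yourself, overlaying it on the input image is straightforward. Below is a generic matplotlib sketch, not the repository's `infer.py` code; the paths and the attention-map shape are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

def overlay_heatmap(image_path, attn, out_path="overlay.png", alpha=0.5):
    """Overlay a (h, w) attention map on an image; paths are placeholders."""
    img = Image.open(image_path).convert("RGB")
    attn = (attn - attn.min()) / (attn.max() - attn.min() + 1e-8)  # normalize to [0, 1]
    attn = np.array(Image.fromarray(np.uint8(attn * 255)).resize(img.size))
    plt.imshow(img)
    plt.imshow(attn, cmap="jet", alpha=alpha)  # semi-transparent heatmap on top
    plt.axis("off")
    plt.savefig(out_path, bbox_inches="tight", pad_inches=0)
    plt.close()
```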
```
@article{zhu2024semantic,
  title={Semantic-Guided Representation Enhancement for Multi-Label Image Classification},
  author={Zhu, Xuelin and Li, Jianshu and Cao, Jiuxin and Tang, Dongqi and Liu, Jian and Liu, Bo},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2024},
  volume={34},
  number={10},
  pages={10036--10049}
}
```