Skip to content

Latest commit

 

History

History
88 lines (78 loc) · 3.22 KB

README.md

File metadata and controls

88 lines (78 loc) · 3.22 KB

Semantic-Guided Representation Enhancement for Multi-Label Image Classification

Introduction

This is an official PyTorch implementation of Semantic-Guided Representation Enhancement for Multi-Label Image Classification, IEEE Transactions on Circuits and Systems for Video Technology 2024. [paper]

alt tsformer

Data Preparation

  1. Download dataset and organize them as follow:
|datasets
|---- MSCOCO
|-------- annotations
|-------- train2014
|-------- val2014
|---- NUS-WIDE
|-------- Flickr
|-------- Groundtruth
|-------- ImageList
|-------- NUS_WID_Tags
|-------- Concepts81.txt
|---- VOC2007
|-------- Annotations
|-------- ImageSets
|-------- JPEGImages
|-------- SegmentationClass
|-------- SegmentationObject
|---- VisualGenome
|-------- ssgrl_partition
|------------ test_list_500.txt
|------------ train_list_500.txt
|------------ vg_category_500_labels_index.json
|-------- VG_100K
|-------- VG_100K_2
|-------- objects.json
  1. Preprocess using following commands:
python scripts/mscoco.py
python scripts/nuswide.py
python scripts/voc2007.py
python scripts/vg500.py
python embedding.py --data [mscoco, nuswide, voc2007, vg500]

Requirements

torch >= 1.9.0
torchvision >= 0.10.0

Training

One can use following commands to train model and reproduce the results reported in paper.

python train.py --model mlic --arch resnet101 --data voc2007 --loss asl --batch-size 128 --lr 0.00009 --lamda 0.1 --ema-decay 0.9983 --pos
python train.py --model mlic --arch resnet101 --data mscoco --loss asl --batch-size 128 --lr 0.00009 --lamda 0.4 --pos
python train.py --model mlic --arch resnet101 --data nuswide --loss asl --batch-size 128 --lr 0.00009 --lamda 0.05 --pos

Note that the choice for --arch can be tresnet_l, tresnet_l21k, vit_large_patch16_224, vit_large_patch16_224_in21k and swin_large_patch4_window12_384_in22k. One can add CUDA_VISIBLE_DEVICES=0,1,2,3 in front of the commands to enable distributed data parallel training with available GPUs.

Evaluation

Pre-trained models are available in link. Download and put them in the experiments folder, then one can use following commands to reproduce results reported in paper.

python evaluate.py --exp-dir experiments/mlic_mscoco/exp1 # evaluation for ResNet101 on MSCOCO
python evaluate.py --exp-dir experiments/mlic_mscoco/exp2 # evaluation for TResNetL on MSCOCO
python evaluate.py --exp-dir experiments/mlic_mscoco/exp3 # evaluation for ViT-large on MSCOCO

Visualization

To visualize the attention heatmaps in the paper, run the following command and visualization results are saved in the visualization folder of the corresponding experiment.

python infer.py --exp-dir experiments/mlic_mscoco/exp1

Citation

@article{zhu2024semantic,
  title={Semantic-Guided Representation Enhancement for Multi-Label Image Classification}, 
  author={Zhu, Xuelin and Li, Jianshu and Cao, Jiuxin and Tang, Dongqi and Liu, Jian and Liu, Bo},
  journal={IEEE Transactions on Circuits and Systems for Video Technology}, 
  year={2024},
  volume={34},
  number={10},
  pages={10036-10049}
}