Deep Semantic Dictionary Learning for Multi-label Image Classification

Implementation of the paper: "Deep Semantic Dictionary Learning for Multi-label Image Classification" (AAAI 2021) by Fengtao Zhou, Sheng Huang, Yun Xing.

Abstract:
Compared with single-label image classification, multi-label image classification is more practical and challenging. Some recent studies attempted to leverage the semantic information of categories for improving multi-label image classification performance. However, these semantic-based methods only take semantic information as type of complements for visual representation without further exploitation. In this paper, we present a innovative path towards the solution of the multi-label image classification which considers it as a dictionary learning task. A novel end-to-end model named Deep Semantic Dictionary Learning (DSDL) is designed. In DSDL, an auto-encoder is applied to generate the semantic dictionary from class-level semantics and then such dictionary is utilized for representing the visual features extracted by Convolutional Neural Network (CNN) with label embeddings. The DSDL provides a simple but elegant way to exploit and reconcile the label, semantic and visual spaces simultaneously via conducting the dictionary learning among them. Moreover, inspired by iterative optimization of traditional dictionary learning, we further devise a novel training strategy named Alternately Parameters Update Strategy (APUS) for optimizing DSDL, which alteratively optimizes the representation coefficients and the semantic dictionary in forward and backward propagation. Extensive experimental results on three popular benchmarks demonstrate that our method achieves promising performances in comparison with the state-of-the-arts.

Environment

Python3.5.2

Pytorch 1.1.0

torchvision 0.2.2

Datasets

Microsoft COCO

Microsoft COCO is large scale images with Common Objects in Context (COCO), which is one of the most popular image datasets out there, with applications like object detection, segmentation, and image caption. The images in the dataset are everyday objects captured from everyday scenes. COCO provides multi-object labeling, segmentation mask annotations, image captioning, key-point detection and panoptic segmentation annotations with a total of 81 categories, making it a very versatile and multi-purpose dataset. More concretely, the dataset contains 122,218 images and covers 80 common categories (class 81 is background). For multi-label image classification, MS COCO is further divided into a training set of 82,081 images and a validation set of 40,137 images. Since the ground truth annotations of test set are unavailable, we usually train our model on the training datasets and evaluate on the validation set.
You can see more detailed description of this dataset on its website.

Pascal VOC 2007

Pascal VOC 2007 dataset is a benchmark in the field of computer vision, which is widely used in image classification, object detection, target segmentation and other tasks, which contains four categories: Vehicle, Indoor, Animal, Person, and can be subdivided into 20 classes.

Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

For multi-label image classification, VOC 2007 dataset is divided into training (2,501), validation (2,510) and testing (4,952) sets. Following the previous work, we usually use the train and validate set to train our model, and evaluate the classification performance on the test set.
You can see more detailed description of this dataset on its website.

Pascal VOC 2012

The classes of VOC 2012 are same as VOC 2007, but the 2012 dataset contains images from 2008-2011 and there is no intersection with 2007. VOC 2012 dataset is divided into training (5,717), validating (5,823) and testing (10,991) set. However, no ground truth labels are provided in the test set. Therefore, all the approaches have to be evaluated by submitting the testing results to the PASCAL VOC Evaluation Server. We usually train our model on training set, and fine-tune on validate set. Then, the result of testing set is submitted to Evaluation Server for evaluation.
You can see more detailed description of this dataset on its website.

Download pretrain models

coco/checkpoint/coco (GoogleDrive)

voc2007/checkpoint/voc2007 (GoogleDrive)

voc2012/checkpoint/voc2012 (GoogleDrive)

Demo VOC 2007

cd voc2007
bash run_voc2007

Demo VOC 2012

cd voc2012
bash run_voc2012

Demo MS-COCO

cd coco
bash run_coco

Experimental Results

The performance comparison on the MS-COCO

Method	mAP	CP-3	CR-3	CF1-3	OP-3	OR-3	OF1-3	CP	CR	CF1	OP	OR	OF1
WARP	-	59.3	52.5	55.7	59.8	61.4	60.7	-	-	-	-	-	-
CNN-RNN	61.2	66.0	55.6	60.4	69.2	66.4	67.8	-	-	-	-	-	-
ResNet-101	77.3	84.1	59.4	69.7	89.1	62.8	73.6	80.2	66.7	72.8	83.9	70.8	76.8
RDAR	73.4	79.1	58.7	67.4	84.0	63.0	72.0	-	-	-	-	-	-
ResNet-SRN	77.1	85.2	58.8	67.4	87.4	62.5	72.9	81.6	65.4	71.2	82.7	69.9	75.8
RLSD	68.2	67.6	57.2	62.0	70.1	63.4	66.5	-	-	-	-	-	-
KD-WSD	74.6	-	-	66.8	-	-	72.7	-	-	69.2	-	-	74.0
RARL	-	78.8	57.2	66.2	84.0	61.6	71.1	-	-	-	-	-	-
DELTA	71.3	-	-	-	-	-	-	-	-	-	-	-	-
ResNet101-ACfs	77.5	85.2	59.4	68.0	86.6	63.3	73.1	77.4	68.3	72.2	79.8	73.1	76.3
CoP	81.1	86.4	62.9	72.7	88.7	65.1	75.1	81.2	70.8	75.8	83.6	73.3	78.1
DSDL	81.7	88.1	62.9	73.4	89.6	65.3	75.6	84.1	70.4	76.7	85.1	73.9	79.1

The performance comparison on the VOC 2007

Method	aero	bike	bird	boat	bottle	bus	car	cat	chair	cow	table	dog	horse	motor	person	plant	sheep	sofa	train	tv	mAP
HCP	98.6	97.1	98.0	95.6	75.3	94.7	95.8	97.3	73.1	90.2	80.0	97.3	96.1	94.9	96.3	78.3	94.7	76.2	97.9	91.5	90.9
CNN-RNN	96.7	83.1	94.2	92.8	61.2	82.1	89.1	94.2	64.2	83.6	70.0	92.4	91.7	84.2	93.7	59.8	93.2	75.3	99.7	78.6	84.0
ResNet-101	99.5	97.7	97.8	96.4	65.7	91.8	96.1	97.6	74.2	80.9	85.0	98.4	96.5	95.9	98.4	70.1	88.3	80.2	98.9	89.2	89.9
FeV+LV	97.9	97.0	96.6	94.6	73.6	93.9	96.5	95.5	73.7	90.3	82.8	95.4	97.7	95.9	98.6	77.6	88.7	78.0	98.3	89.0	90.6
RCP	99.3	97.6	98.0	96.4	79.3	93.8	96.6	97.1	78.0	88.7	87.1	97.1	96.3	95.4	99.1	82.1	93.6	82.2	98.4	92.8	92.5
RDAR	98.6	97.4	96.3	96.2	75.2	92.4	96.5	97.1	76.5	92.0	87.7	96.8	97.5	93.8	98.5	81.6	93.7	82.8	98.6	89.3	91.9
RMIC	97.1	91.3	94.2	57.1	86.7	90.7	93.1	63.3	83.3	76.4	92.8	94.4	91.6	95.1	92.3	59.7	86.0	69.5	96.4	79.0	84.5
RLSD	96.4	92.7	93.8	94.1	71.2	92.5	94.2	95.7	74.3	90.0	74.2	95.4	96.2	92.1	97.9	66.9	93.5	73.7	97.5	87.6	88.5
RARL	98.6	97.1	97.1	95.5	75.6	92.8	96.8	97.3	78.3	92.2	87.6	96.9	96.5	93.6	98.5	81.6	93.1	83.2	98.5	89.3	92.0
ML-GCN	99.5	98.5	98.6	98.1	80.8	94.6	97.2	98.2	82.3	95.7	86.4	98.2	98.4	96.7	99.0	84.7	96.7	84.3	98.9	93.7	94.0
CoP	99.9	98.4	97.8	98.8	81.2	93.7	97.1	98.4	82.7	94.6	87.1	98.1	97.6	96.2	98.8	83.2	96.2	84.7	99.1	93.5	93.8
DSDL	99.8	98.7	98.4	97.9	81.9	95.4	97.6	98.3	83.3	95.0	88.6	98.0	97.9	95.8	99.0	86.6	95.9	86.4	98.6	94.4	94.4

The performance comparison on the VOC 2012

Method	aero	bike	bird	boat	bottle	bus	car	cat	chair	cow	table	dog	horse	motor	person	plant	sheep	sofa	train	tv	mAP
VGG19+SVM	99.1	88.7	95.7	93.9	73.1	92.1	84.8	97.7	79.1	90.7	83.2	97.3	96.2	94.3	96.9	63.4	93.2	74.6	97.3	87.9	89.0
HCP	99.1	92.8	97.4	94.4	79.9	93.6	89.8	98.2	78.2	94.9	79.8	97.8	97.0	93.8	96.4	74.3	94.7	71.9	96.7	88.6	90.5
FeV+LV	98.4	92.8	93.4	90.7	74.9	93.2	90.2	96.1	78.2	89.8	80.6	95.7	96.1	95.3	97.5	73.1	91.2	75.4	97.0	88.2	89.4
RCP	99.3	92.2	97.5	94.9	82.3	94.1	92.4	98.5	83.8	93.5	83.1	98.1	97.3	96.0	98.8	77.7	95.1	79.4	97.7	92.4	92.2
RMIC	98.0	85.5	92.6	88.7	64.0	86.8	82.0	94.9	72.7	83.1	73.4	95.2	91.7	90.8	95.5	58.3	87.6	70.6	93.8	83.0	84.4
RLSD	96.4	92.7	93.8	94.1	71.2	92.5	94.2	95.7	74.3	90.0	74.2	95.4	96.2	92.1	97.9	66.9	93.5	73.7	97.5	87.6	88.5
DELTA	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	90.3
DSDL	99.4	95.3	97.6	95.7	83.5	94.8	93.9	98.5	85.7	94.5	83.8	98.4	97.7	95.9	98.5	80.6	95.7	82.3	98.2	93.2	93.2

Tips

For any questions, please contact us ([email protected]).

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
coco		coco
voc2007		voc2007
voc2012		voc2012
README.md		README.md
framework.jpg		framework.jpg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Deep Semantic Dictionary Learning for Multi-label Image Classification

Environment

Datasets

Microsoft COCO

Pascal VOC 2007

Pascal VOC 2012

Download pretrain models

Demo VOC 2007

Demo VOC 2012

Demo MS-COCO

Experimental Results

The performance comparison on the MS-COCO

The performance comparison on the VOC 2007

The performance comparison on the VOC 2012

Tips

About

Releases

Packages

Languages

FT-ZHOU-ZZZ/DSDL

Folders and files

Latest commit

History

Repository files navigation

Deep Semantic Dictionary Learning for Multi-label Image Classification

Environment

Datasets

Microsoft COCO

Pascal VOC 2007

Pascal VOC 2012

Download pretrain models

Demo VOC 2007

Demo VOC 2012

Demo MS-COCO

Experimental Results

The performance comparison on the MS-COCO

The performance comparison on the VOC 2007

The performance comparison on the VOC 2012

Tips

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages