This repository is the official implementation of Deep Metric Learning with Spherical Embedding for the deep metric learning (DML) task.
📋 It covers training with a vanilla triplet loss / semihard triplet loss / normalized N-pair loss (tuplet loss) / multi-similarity loss on the CUB200-2011 / Cars196 / SOP / In-Shop datasets.
This repo was tested with Ubuntu 16.04.1 LTS, Python 3.6, PyTorch 1.1.0, and CUDA 10.1.
Requirements: torch==1.1.0, tensorboardX
Prepare datasets and pretrained BN-Inception.
Download datasets: CUB200-2011, Cars196, SOP, In-Shop, unzip and organize them as follows.
└───datasets
    └───split_train_test.py
    └───CUB_200_2011
    |   └───images.txt
    |   └───images
    |       └───001.Black_footed_Albatross
    |       └───...
    └───CARS196
    |   └───cars_annos.mat
    |   └───car_ims
    |       └───000001.jpg
    |       └───...
    └───SOP
    |   └───Stanford_Online_Products
    |       └───Ebay_train.txt
    |       └───Ebay_test.txt
    |       └───bicycle_final
    |       └───...
    └───Inshop
    |   └───list_eval_partition.txt
    |   └───img
    |       └───MEN
    |       └───WOMEN
    |       └───...
Then run split_train_test.py to generate the training and testing lists.
Download the ImageNet-pretrained BN-Inception and put it into ./pretrained_models.
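Before running split_train_test.py, it can help to confirm that the files named in the tree above are in place. The following is a minimal sketch, not part of this repository: the `EXPECTED` table and the `missing_entries` helper are hypothetical names, and the relative paths assume the nesting implied by the tree (for example, that Ebay_train.txt sits inside Stanford_Online_Products).

```python
from pathlib import Path

# Expected entries relative to the datasets root, taken from the tree above.
# The nesting is an assumption based on that tree, not read from the repo code.
EXPECTED = {
    "CUB": ["CUB_200_2011/images.txt", "CUB_200_2011/images"],
    "Cars": ["CARS196/cars_annos.mat", "CARS196/car_ims"],
    "SOP": [
        "SOP/Stanford_Online_Products/Ebay_train.txt",
        "SOP/Stanford_Online_Products/Ebay_test.txt",
    ],
    "Inshop": ["Inshop/list_eval_partition.txt", "Inshop/img"],
}


def missing_entries(root, dataset):
    """Return the expected paths for `dataset` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED[dataset] if not (root / rel).exists()]


if __name__ == "__main__":
    for name in EXPECTED:
        missing = missing_entries("datasets", name)
        print(name, "OK" if not missing else f"missing: {missing}")
```

Run from the directory that contains datasets; any dataset reported as missing should be fixed before generating the train/test lists.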
To train the model(s) in the paper, run the following commands or use sh mytrain.sh.
Train models with vanilla triplet loss.
CUDA_VISIBLE_DEVICES=0 python train.py --use_dataset CUB --instances 3 --lr 0.5e-5 --lr_p 0.25e-5 \
--lr_gamma 0.1 --use_loss triplet
Train models with vanilla triplet loss + SEC.
CUDA_VISIBLE_DEVICES=0 python train.py --use_dataset CUB --instances 3 --lr 0.5e-5 --lr_p 0.25e-5 \
--lr_gamma 0.1 --use_loss triplet --sec_wei 1.0
Train models with vanilla triplet loss + L2-reg.
CUDA_VISIBLE_DEVICES=0 python train.py --use_dataset CUB --instances 3 --lr 0.5e-5 --lr_p 0.25e-5 \
--lr_gamma 0.1 --use_loss triplet --l2reg_wei 1e-4
Similarly, we set --use_loss to semihtriplet/n-npair/ms and --instances to 3/2/5 for training models with semihard triplet loss / normalized N-pair loss / multi-similarity loss. We set --use_dataset to Cars/SOP/Inshop for training models on other datasets.
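The flag combinations described above can be assembled programmatically when sweeping over losses and datasets. The sketch below is illustrative only: `build_train_args`, `INSTANCES`, and `DATASETS` are hypothetical names, and only flags that appear in the commands above are used. Learning-rate values are passed in by the caller because they differ per dataset and loss.

```python
# --instances value per --use_loss value, as listed in the text above.
INSTANCES = {"triplet": 3, "semihtriplet": 3, "n-npair": 2, "ms": 5}
DATASETS = ("CUB", "Cars", "SOP", "Inshop")


def build_train_args(dataset, loss, lr, lr_p, lr_gamma, sec_wei=None, l2reg_wei=None):
    """Assemble the argument list for train.py; values are passed through as strings."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    if loss not in INSTANCES:
        raise ValueError(f"unknown loss: {loss}")
    args = [
        "python", "train.py",
        "--use_dataset", dataset,
        "--instances", str(INSTANCES[loss]),
        "--lr", lr, "--lr_p", lr_p, "--lr_gamma", lr_gamma,
        "--use_loss", loss,
    ]
    if sec_wei is not None:      # add the SEC term
        args += ["--sec_wei", sec_wei]
    if l2reg_wei is not None:    # add the L2-reg term
        args += ["--l2reg_wei", l2reg_wei]
    return args


# Rebuilds the "vanilla triplet loss + SEC" command shown earlier.
cmd = build_train_args("CUB", "triplet", "0.5e-5", "0.25e-5", "0.1", sec_wei="1.0")
print("CUDA_VISIBLE_DEVICES=0 " + " ".join(cmd))
```

Keeping the learning rates as required arguments, rather than defaults, avoids silently reusing the CUB triplet values for a configuration that needs different ones.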
📋 The detailed settings of the above hyper-parameters are provided in Appendix B of our paper (with two exceptions to the lr settings, listed below).
(a) multi-similarity loss without SEC/L2-reg on CUB: 1e-5/0.5e-5/0.1@3k, 6k
(b) multi-similarity loss without SEC/L2-reg on Cars: 2e-5/2e-5/0.1@2k
(We find that using a larger learning rate harms the original loss function.)
When training on a different dataset or with a different loss function, we only need to modify the hyper-parameters in the above commands and the head settings. The head setting changes only when using multi-similarity loss without SEC/L2-reg, in which case we set need_bn=False in line 24 of learner.py:
self.model = torch.nn.DataParallel(BNInception(need_bn=False)).cuda()
📋 Additionally, to use SEC with the EMA method, we need to set --norm_momentum <value>, where norm_momentum denotes $\rho$ in Appendix D of our paper.
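As a rough illustration of what this flag controls, the sketch below computes an SEC-style penalty (the mean squared deviation of each embedding's L2 norm from an average norm) and keeps that average as an exponential moving average across batches. This is not the code in this repository: the class name `EmaNormSEC` and the plain-Python lists are hypothetical, and the update convention, with $\rho$ weighting the previous running average, is an assumption. Gradient handling of the running average is not addressed. Appendix D of the paper gives the actual definition.

```python
import math


class EmaNormSEC:
    """Illustrative SEC-style penalty with an EMA of the average embedding norm."""

    def __init__(self, rho):
        self.rho = rho          # hypothetical stand-in for --norm_momentum
        self.avg_norm = None    # running average norm, unset until the first batch

    def __call__(self, embeddings):
        # L2 norm of every embedding in the batch.
        norms = [math.sqrt(sum(x * x for x in e)) for e in embeddings]
        batch_mean = sum(norms) / len(norms)
        if self.avg_norm is None:
            self.avg_norm = batch_mean
        else:
            # Assumed convention: rho weights the previous running average.
            self.avg_norm = self.rho * self.avg_norm + (1.0 - self.rho) * batch_mean
        # Mean squared deviation of each norm from the running average.
        return sum((n - self.avg_norm) ** 2 for n in norms) / len(norms)


sec = EmaNormSEC(rho=0.5)
print(sec([[3.0, 4.0], [6.0, 8.0]]))   # first batch: norms 5 and 10
print(sec([[0.0, 5.0], [0.0, 5.0]]))   # second batch: both norms are 5
```

In the second call every norm in the batch is identical, yet the penalty is non-zero because the running average still carries the first batch. That carry-over is the effect a moving average introduces compared with a per-batch mean.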
Testing NMI and F1 on SOP is time-consuming, so we run it only after the training process (during training we only test R@K). In particular, run:
CUDA_VISIBLE_DEVICES=0 python test_sop.py --use_dataset SOP --test_sop_model SOP_xxxx_xxxx
or use sh test_sop.sh for a complete test of NMI, F1, and R@K on SOP. Here SOP_xxxx_xxxx is the model to be tested, which can be found in ./work_space.
For the other three datasets, the tests of NMI, F1, and R@K are conducted during the training process.
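For readers unfamiliar with R@K, the sketch below shows the usual definition: a query counts as a hit if at least one of its K nearest neighbours, excluding itself, shares its label. It is a plain-Python illustration, not this repository's evaluation code. The function name `recall_at_k` is hypothetical, it uses squared Euclidean distance, and it assumes queries are retrieved from the same set they belong to. A separate query/gallery split is not covered.

```python
def recall_at_k(embeddings, labels, k):
    """Fraction of queries whose k nearest neighbours (self excluded) include a same-label item."""
    hits = 0
    for i, query in enumerate(embeddings):
        # Squared Euclidean distance from the query to every other item.
        dists = [
            (sum((a - b) ** 2 for a, b in zip(query, other)), j)
            for j, other in enumerate(embeddings)
            if j != i
        ]
        nearest = [j for _, j in sorted(dists)[:k]]
        if any(labels[j] == labels[i] for j in nearest):
            hits += 1
    return hits / len(embeddings)


points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
labels = [0, 0, 1, 0]
print(recall_at_k(points, labels, k=1))
print(recall_at_k(points, labels, k=2))
```

The item with label 1 has no same-label partner in this toy set, so it can never count as a hit regardless of K.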
Our model achieves the following performance on CUB200-2011, Cars196, SOP, and In-Shop datasets:
If you find this repo useful for your research, please consider citing this paper:
@article{zhang2020deep,
  title={Deep Metric Learning with Spherical Embedding},
  author={Zhang, Dingyi and Li, Yingming and Zhang, Zhongfei},
  journal={arXiv preprint arXiv:2011.02785},
  year={2020}
}