Visual factors such as viewpoint, pose, illumination, and background are usually considered important challenges in person re-identification (re-ID). Despite the acknowledgement that these factors are influential, quantitative studies of how they affect a re-ID system are still lacking. To derive such insights, the first problem to settle is collecting quantitative data. However, it is difficult to control the changes of visual factors in practice, and collecting this kind of data is expensive. Therefore, we build a synthetic data engine, PersonX.
The illustration of the data engine is available at PersonX; the paper is available at pdf-link.
To help grasp this work quickly, the contents are summarized as follows.
- 1. Dataset introduction
- 2. Dataset validation
- 3. Dissecting Person Re-identification from the Viewpoint of Viewpoint
- 4. Citation
The PersonX dataset contains six backgrounds, including three pure-color backgrounds and three scene backgrounds. There are 1266 hand-crafted identities (547 females and 719 males), and each identity has 36 images (corresponding to the 36 viewpoints defined below). In this work, we combine two different backgrounds into one dataset to study different situations. The backgrounds and subsets of PersonX are shown as follows.
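For concreteness, here is a minimal sketch of how a subset could be indexed for the experiments below. The flat layout and the file-name scheme (identity, then viewpoint) are hypothetical; adapt the parsing to the actual naming of the PersonX release.

```python
from pathlib import Path

def index_personx(root: str) -> dict:
    """Index a PersonX subset as {identity: [(viewpoint, path), ...]}.

    Assumes hypothetical file names like 0001_030.jpg (identity 1,
    viewpoint 30 degrees); the real naming may differ.
    """
    index = {}
    for img in sorted(Path(root).glob("*.jpg")):
        pid, view = img.stem.split("_")[:2]
        index.setdefault(int(pid), []).append((int(view), img))
    return index

# Each of the 1266 identities should contribute 36 images (one per viewpoint).
# index = index_personx("PersonX46/bounding_box_train")
```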
To show the feasibility of using synthetic data, we conduct experiments on both real-world (the Market-1501/1203 and Duke datasets) and synthetic datasets by evaluating three algorithms: IDE+, the triplet feature, and PCB. The results are shown in the following figure.
`“lr” means the frames are at a low resolution of 512×242 instead of the original resolution of 1024×768`
Three characteristics of PersonX can be observed from the validation results:
- Eligibility: the performance trend of the three algorithms is similar between PersonX and the real-world datasets.
- Purity: the re-ID accuracies on the PersonX subsets are relatively high compared to the real-world datasets.
- Sensitivity: these subsets are sensitive to changes in the environment, such as changes in resolution.
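The comparisons above use the standard re-ID metrics, rank-1 accuracy and mAP. For reference, here is a minimal sketch of how both can be computed from a query-gallery distance matrix; it is our own simplified version (the standard protocol additionally excludes same-camera true matches, omitted here for brevity), not code from the PersonX release.

```python
import numpy as np

def evaluate(dist, q_pids, g_pids):
    """Rank-1 and mAP from a (num_query x num_gallery) distance matrix."""
    num_q = dist.shape[0]
    aps, rank1_hits = [], 0
    for i in range(num_q):
        order = np.argsort(dist[i])                       # gallery sorted by distance
        matches = (g_pids[order] == q_pids[i]).astype(np.float64)
        if matches.sum() == 0:                            # query has no true match
            continue
        rank1_hits += int(matches[0])                     # is the top-ranked image correct?
        precision = np.cumsum(matches) / (np.arange(len(matches)) + 1.0)
        aps.append(float((precision * matches).sum() / matches.sum()))
    return rank1_hits / num_q, float(np.mean(aps))

# toy usage: 2 queries, 5 gallery images, numpy arrays of identity labels
dist = np.random.rand(2, 5)
rank1, mAP = evaluate(dist, np.array([1, 2]), np.array([1, 1, 2, 3, 4]))
```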
Based on the PersonX engine, this paper makes an early attempt at studying a particular factor: viewpoint.
Here, we define viewpoint as the pedestrian rotation angle (as shown in the figure above). Since different views of a person contain different details, the viewpoint of a person influences the visual information contained in the image, which is directly related to the performance of the algorithm. Therefore, we investigate the exact influence of viewpoint on the system from three aspects.
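Since PersonX samples 36 viewpoints at 10-degree steps, the four coarse orientations used in the groups below can be recovered directly from the rotation angle. A minimal sketch, assuming the angle convention given later in this README (due left = 0, due front = 90, due right = 180, due back = 270) and 45-degree quadrant boundaries of our own choosing:

```python
def orientation(angle_deg: float) -> str:
    """Map a rotation angle in degrees to a coarse orientation.

    Convention (from this README): due left = 0, due front = 90,
    due right = 180, due back = 270. The 45-degree quadrant
    boundaries are our assumption.
    """
    names = ("left", "front", "right", "back")
    return names[int(((angle_deg + 45) % 360) // 90)]

viewpoints = [10 * k for k in range(36)]  # 0, 10, ..., 350
assert orientation(0) == "left" and orientation(90) == "front"
```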
- Control group 1. We randomly select half (18 out of 36) or a quarter (9 out of 36) of the images of each identity for training.
- Control group 2. The training set is constructed by randomly selecting half (18 out of 36) or a quarter (9 out of 36) of the viewpoints for each identity. An example of Control group 1 and Control group 2 is shown in the following figure.
- Experimental group 1. Train with two orientations. The training images exhibit two orientations, left+right or front+back, so the training set is half of the original training set.
- Experimental group 2. Train with one orientation. The training set has one orientation, i.e., left, right, front, or back, and becomes a quarter of the size of the original training set. A sketch of how these four subsets can be constructed follows this list.
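Below is a minimal sketch of that construction, operating on an index shaped like `{identity: [(viewpoint, image), ...]}` (see the indexing sketch above). Whether Control group 2 shares one random viewpoint set across all identities is our reading of the protocol, not something this README states explicitly; the quadrant boundaries in `orientation` are likewise our assumption.

```python
import random

VIEWPOINTS = [10 * k for k in range(36)]  # 0, 10, ..., 350 degrees

def orientation(angle):
    # left=0, front=90, right=180, back=270; 45-degree quadrants are assumed
    return ("left", "front", "right", "back")[int(((angle + 45) % 360) // 90)]

def control_group_1(index, k):
    """Each identity keeps k randomly chosen images (k = 18 or 9)."""
    return {pid: random.sample(imgs, k) for pid, imgs in index.items()}

def control_group_2(index, k):
    """Every identity keeps the same k randomly chosen viewpoints (our reading)."""
    kept = set(random.sample(VIEWPOINTS, k))
    return {pid: [(v, im) for v, im in imgs if v in kept]
            for pid, imgs in index.items()}

def experimental_group(index, keep):
    """Keep given orientations, e.g. {"left", "right"} (group 1) or {"front"} (group 2)."""
    return {pid: [(v, im) for v, im in imgs if orientation(v) in keep]
            for pid, imgs in index.items()}
```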
The re-ID accuracy (mAP, %) when the training set has missing orientations/viewpoints is shown in the following figure.
Here, A and B: we use two orientations for training, e.g., training with the left and right orientations only (see the definition of viewpoint). C: we train with one orientation only, i.e., the left, right, front, or back orientation. D: impact of missing continuous viewpoints on PersonX46; the horizontal axis is the number of remaining viewpoints and the vertical axis is mAP. In this experimental group, continuous viewpoints are removed, and the number on the curve denotes the number of remaining viewpoints. “n.s.” means the difference between results is not statistically significant, ★ means it is statistically significant, and ★★★ means it is statistically very significant.
- Missing viewpoints compromises training.
- Missing continuous viewpoints is more detrimental than missing random viewpoints.
- When limited training viewpoints are available, the left/right orientations allow models to be better trained than the front/back orientations.
We denote the viewpoints of a query and its true match as \theta_{q} and \theta_{t}, respectively.
- Experimental group 1. The three true matches whose viewpoints satisfy \theta_{t} \in [\theta_{q}-10, \theta_{q}+10] are removed (set as “junk”).
- Control group 1. Three true matches are randomly removed from the gallery. The illustrations are shown as follows:
Similarly, experimental groups 2 and 3, as well as control groups 2 and 3, remove five (\theta_{t} \in [\theta_{q}-20, \theta_{q}+20]) and nine (\theta_{t} \in [\theta_{q}-40, \theta_{q}+40]) true matches, respectively.
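A minimal sketch of this junk-marking rule, assuming the 10-degree viewpoint spacing and a circular angle difference; with `delta` = 10, 20, and 40 it removes three, five, and nine true matches, respectively. The function names are our own.

```python
def circular_diff(a, b):
    """Smallest absolute angular difference between two angles, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def junk_indices(q_pid, q_view, gallery, delta):
    """Indices of true matches whose viewpoint is within +/- delta of the query.

    gallery: list of (identity, viewpoint) for each gallery image.
    """
    return [i for i, (pid, v) in enumerate(gallery)
            if pid == q_pid and circular_diff(v, q_view) <= delta]

# e.g. junk_indices(7, 90, gallery, delta=20) flags the five true matches
# of identity 7 whose viewpoints fall in [70, 110]
```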
Experiments are conducted on PersonX45, PersonX46, PersonX46-lr, and Market-1203.
- True matches whose viewpoints are dissimilar to the query are harder to retrieve than true matches with a viewpoint similar to the query.
- The above problem becomes more severe when the environment is challenging, e.g., a complex background or low resolution.
We train a model on the original training set comprising every viewpoint and then modify the query viewpoint to see its effect during testing. Specifically, the viewpoint of a probe image is set to due left (0), due front (90), due right (180), or due back (270) to represent different sides of a person. During retrieval, we assume only one true match in the gallery; the true match contains the same person as the query, and its viewpoint ranges from 0 to 350. The distractor gallery images are images of all other persons. Taking due left as the query viewpoint as an example, the setting is shown as follows:
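A minimal sketch of this retrieval setting for one query, assuming Euclidean distance on extracted features (the distance metric and variable names are our assumptions):

```python
import numpy as np

def rank1_per_true_match_view(q_feat, true_match_feats, distractor_feats):
    """Probe each of the 36 true-match viewpoints for one fixed query viewpoint.

    q_feat: (d,) query feature; true_match_feats: {viewpoint: (d,) feature}
    of the single true match; distractor_feats: (n, d) features of all other
    persons. Returns {viewpoint: 1 if the true match ranks first, else 0}.
    """
    hits = {}
    for view, t_feat in true_match_feats.items():
        gallery = np.vstack([t_feat[None, :], distractor_feats])  # true match at row 0
        dist = np.linalg.norm(gallery - q_feat[None, :], axis=1)
        hits[view] = int(np.argmin(dist) == 0)
    return hits
```

Averaging the 36 values for each query viewpoint gives the mean rank-1 score reported below.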
For the four kinds of query viewpoints, the results are shown in the figure below.
Under each query viewpoint, we report the 36 rank-1 scores obtained by using the query to retrieve the 36 types of true-match viewpoints. The mean of the 36 rank-1 scores for each query viewpoint is also reported.
- A left/right query viewpoint generally leads to higher re-ID accuracy than a front/back query viewpoint.
If you use this dataset in your research, please kindly cite our work as,
```
@inproceedings{sun2019dissecting,
  title={Dissecting Person Re-identification from the Viewpoint of Viewpoint},
  author={Sun, Xiaoxiao and Zheng, Liang},
  booktitle={CVPR},
  year={2019}
}
```