Meta Generative Attack on Person Reidentification
Abstract
Adversarial attacks have recently been investigated in person re-identification. These attacks perform well under the cross-dataset or cross-model setting. However, the challenges of the combined cross-dataset cross-model scenario do not allow these methods to achieve similar accuracy. To this end, we propose a method that aims at better transferability across different models and across datasets. We generate a mask to obtain better performance across models and use meta-learning to boost generalizability in the challenging cross-dataset cross-model setting. Experiments on Market-1501, DukeMTMC-reID and MSMT-17 demonstrate favorable results compared to other attacks.
Index Terms:
Adversarial attacks, Meta learning, ReID.
I Introduction
The tremendous performance of deep learning models has led to their rampant application in practice. However, these models can be manipulated by introducing minor perturbations [1, 2, 3, 4, 5]. This process is called an adversarial attack. In the case of person re-identification, for a given query input $x$, a target model $M$ and a gallery $G$, the attack is defined as,

$$\mathrm{topk}(M, x + \delta, G) = \{g_1, \dots, g_k\}, \quad \mathrm{id}(g_i) \neq \mathrm{id}(x), \quad \lVert \delta \rVert_p \leq \epsilon,$$

where $g_1, \dots, g_k$ are gallery samples belonging to a different identity than $x$, and $\delta$ is the adversarial perturbation with an $\ell_p$ norm bound of $\epsilon$. $\mathrm{topk}(\cdot)$ refers to the top-$k$ retrieved images for the given argument.
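To make this criterion concrete, the following minimal sketch checks whether an attack succeeds for one query, i.e., whether none of the top-$k$ retrieved gallery images shares the query identity. The names (`model` as an embedding network, `gallery_ids`, `query_id`) are our assumptions for illustration, not the paper's API.

```python
import torch

def attack_succeeds(model, query_adv, gallery, gallery_ids, query_id, k=10):
    """Re-id attack criterion: the attack succeeds when none of the top-k
    retrieved gallery images shares the query's identity."""
    with torch.no_grad():
        q = model(query_adv.unsqueeze(0))        # (1, d) query embedding
        g = model(gallery)                       # (N, d) gallery embeddings
    dist = torch.cdist(q, g).squeeze(0)          # (N,) Euclidean distances
    topk = dist.topk(k, largest=False).indices   # k nearest gallery images
    return bool((gallery_ids[topk] != query_id).all())
```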
Adversarial attacks have been extensively investigated in the classification setting [6] and have also been studied in other domains [7, 8, 9] in recent times. However, to the best of our knowledge, there are very few works that study these attacks in the person re-identification domain. In the following we briefly discuss some classical attacks in the classification setting. Szegedy et al. [1] proposed the first work on the generation of adversarial samples for deep neural networks using L-BFGS. Goodfellow et al. [2] proposed an efficient adversarial sample generation method, the fast gradient sign method (FGSM). Kurakin et al. [10] proposed an iterative FGSM. Other prominent works include [11, 12, 13, 14, 15, 16].
In person re-id [17, 18, 19, 20], both white-box and black-box attacks have been proposed [21, 22, 23, 24]. These attacks use a labeled source dataset and show that the attacks are transferable under the cross-dataset setting, the cross-model setting, or both. However, transferability of attacks in the challenging combined cross-dataset cross-model setting remains an issue. In this work, we propose to use a mask and meta-learning for better transferability of attacks. We also investigate adversarial attacks in a completely new setting where the source dataset does not have any labels and the target model structure and parameters are unknown.
II Related Works
In [25], the authors propose white-box and black-box attacks; the black-box attack only assumes that the victim model is unknown while the dataset is available. [26] introduces physically realizable attacks in the white-box setting by generating an adversarial clothing pattern. [24] proposes a query-based attack wherein the images obtained by querying the victim model are used to form triplets for a triplet loss. [27] proposes a white-box self-metric attack, wherein the positive sample is obtained by adding noise to the given input and negative samples are obtained from other images. In [21], the authors propose a meta-learning framework using a labeled source dataset and an extra association dataset; this method generalizes well in the cross-dataset scenario. In [22], Ding et al. propose a list-wise attack objective function along with model-agnostic regularization for better transferability. A GAN-based framework is proposed in [23], where the authors generate adversarial noise and a mask by training the network using a triplet loss.
In this work we use a GAN to generate adversarial samples. In order to achieve better transferability of the attack across models, we suppress the pixels that generate large gradients. Suppressing these gradients allows the network to focus on other pixels, i.e., on pixels that are not explicitly salient with respect to the model used for the attack. We further use meta-learning [28], which also allows the incorporation of an additional dataset, to boost transferability. We refer to this attack as Meta Generative Attack (MeGA). Our work is closest in spirit to [21, 23]; however, our mask generation and our application of meta-learning within a GAN framework are quite distinct from these works.
III Methodology
In this work we address both white-box and black-box attacks. We require the attack to be transferable across models and datasets. If we obtain the attack sample using a given model $M$, the attack is inherently tied to $M$ [16]. In order that the attack does not over-learn, we apply a mask so that the attack can focus on regions that are not highly salient for discrimination. This way the network can focus on less salient but still discriminative regions, thereby increasing the generalizability of the attack to other models. On the other hand, meta-learning has been used efficiently in adversarial attacks [29, 21, 30] to obtain better transferability across datasets; however, meta-learning has not been explored together with generative learning for attacks in PRID. We adapt the MAML meta-learning framework [28] in our proposed method. While prior black-box attack works assume the presence of a labeled source dataset, we additionally present a more challenging setting wherein no labels are available during the attack.

Our proposed model is illustrated in Figure 1. In the white-box setting, the generator is trained using the generator loss, the adversarial triplet loss and the meta-learning loss, while the discriminator is trained with the classical binary cross-entropy discriminator loss. The mask is obtained via a self-supervised triplet loss. The network learns to generate adversarial images: while the GAN loss itself focuses on generating realistic samples, the adversarial triplet loss guides the network to generate samples that are closer to negative samples and farther away from positive samples.
III-A GAN training
Given a clean sample $x$, we use the generator $\mathcal{G}$ to create the adversarial sample $x_{adv} = \Pi_{\epsilon}(x + \mathcal{G}(x))$. The overall GAN loss is given by,

$$\mathcal{L}_{GAN} = \mathbb{E}_{x}\left[\log \mathcal{D}(x)\right] + \mathbb{E}_{x}\left[\log\left(1 - \mathcal{D}(x_{adv})\right)\right],$$

where $\Pi_{\epsilon}$ denotes the projection into the $\epsilon$-radius ball around $x$ and $\mathcal{D}$ is the discriminator. In order to generate adversarial samples, a deep mis-ranking loss is used [23],
$$\mathcal{L}_{adv} = \max\big( \lVert M(x^{a}_{adv}) - M(x^{n}_{adv}) \rVert_2 - \lVert M(x^{a}_{adv}) - M(x^{p}_{adv}) \rVert_2 + m,\; 0 \big) \qquad (1)$$

where $m$ is the margin and $x^{a}_{adv}$ is the adversarial sample obtained from the anchor sample $x^{a}$. Similarly, $x^{p}_{adv}$ and $x^{n}_{adv}$ are the adversarial samples obtained from the respective positive and negative samples $x^{p}$ and $x^{n}$. This loss pushes the adversarial anchor closer to the negatives and farther away from the positives. Thus the network learns to generate convincing adversarial samples.
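As an illustration, a minimal PyTorch sketch of this training signal is given below, combining the $\epsilon$-ball projection, the binary cross-entropy GAN losses and the mis-ranking triplet loss of Eq. 1. The concrete values of `eps` and `margin` are our assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def make_adversarial(G, x, eps=8 / 255):
    """x_adv = Pi_eps(x + G(x)): clamp the generated perturbation to the
    eps-ball around x and keep the result a valid image."""
    delta = torch.clamp(G(x), -eps, eps)
    return torch.clamp(x + delta, 0.0, 1.0)

def gan_losses(D, x, x_adv):
    """Binary cross-entropy GAN losses for discriminator and generator."""
    real, fake = D(x), D(x_adv)
    d_loss = F.binary_cross_entropy_with_logits(real, torch.ones_like(real)) \
           + F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake))
    g_loss = F.binary_cross_entropy_with_logits(fake, torch.ones_like(fake))
    return d_loss, g_loss

def adv_triplet_loss(M, xa_adv, xp_adv, xn_adv, margin=0.3):
    """Mis-ranking triplet loss (Eq. 1): pull the adversarial anchor toward
    the negative and push it away from the positive in M's feature space."""
    fa, fp, fn = M(xa_adv), M(xp_adv), M(xn_adv)
    d_an = F.pairwise_distance(fa, fn)   # anchor-negative distance
    d_ap = F.pairwise_distance(fa, fp)   # anchor-positive distance
    return torch.clamp(d_an - d_ap + margin, min=0).mean()
```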
III-B Mask Generation
An attack obtained using the given model $M$ generalizes poorly to other networks. In order to achieve better transferability, we first compute the gradients with respect to a self-supervised triplet loss $\mathcal{L}_{sst}(x, x^{p}, x^{n})$, where the positive $x^{p}$ is obtained by augmentation of $x$ and the negative $x^{n}$ is the sample in the batch which lies at a maximum Euclidean distance from $x$. Here, the large gradients are primarily responsible for loss convergence. Since this way of achieving convergence is clearly coupled with $M$, we mask the large gradients. Thus, the convergence is not entirely dependent on the large gradients and focuses on other, smaller ones which can also potentially possess a discriminative nature. Overfitting can thereby be reduced by using the mask. To obtain the mask, we compute,
$$g = \nabla_{x} \mathcal{L}_{sst}(x, x^{p}, x^{n}) \qquad (2)$$

Note that we use the real samples in Eq. 2. The mask $W$ is computed from $|g|$, where $|\cdot|$ denotes the absolute value, by attenuating the entries of large magnitude. We mask $x$ before feeding it as an input to the generator $\mathcal{G}$; the masked input is given as $x_{m} = W \odot x$, where $\odot$ denotes the Hadamard product.
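A sketch of the mask computation under these definitions is given below. Since only the suppression of large gradients is specified above, the particular normalization used to map $|g|$ into a continuous mask in $[0, 1]$ is our assumption.

```python
import torch

def gradient_mask(M, x, x_pos, x_neg, margin=0.3):
    """Mask from Eq. 2: back-propagate the self-supervised triplet loss
    on clean samples, then attenuate pixels with large gradient magnitude."""
    x = x.clone().requires_grad_(True)
    fa, fp, fn = M(x), M(x_pos), M(x_neg)
    loss = torch.clamp((fa - fp).norm(dim=1) - (fa - fn).norm(dim=1) + margin,
                       min=0).mean()
    (g,) = torch.autograd.grad(loss, x)
    g_abs = g.abs()
    # continuous mask in [0, 1]; large |g| -> small weight (assumed form)
    w = 1.0 - g_abs / (g_abs.amax(dim=(1, 2, 3), keepdim=True) + 1e-12)
    return w

# masked generator input: x_m = w * x  (Hadamard product)
```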
Masking techniques have also been explored in [31, 32], where the idea is to learn the model such that it does not overfit to the training distribution. Our masking technique is motivated by the idea that an adversarial example should be transferable across different re-id models. Our technique is distinct and can be applied to an individual sample, whereas the masking technique in [31, 32] seeks agreement among the gradients obtained from all the samples of a batch. The technique in [31, 32] also suffers from the drawback of requiring hyperparameter tuning. Further, the masking technique of [31] is boolean while ours is continuous.
III-C Meta Learning
Meta-optimization allows learning from multiple datasets for different tasks while generalizing well on a given task. One of the popular meta-learning approaches, MAML [28], applies two update steps. The first update happens in an inner loop with a meta-train set, while the second update happens in an outer loop with a meta-test set. In our case, we perform the inner-loop update on the discriminator and generator parameters using the meta-train set, and the outer-loop update is performed on the parameters of the generator using the meta-test set.
More formally, given a generator $\mathcal{G}$ parametrized by $\theta_{\mathcal{G}}$ and a discriminator $\mathcal{D}$ parametrized by $\theta_{\mathcal{D}}$, we perform the meta-training phase to obtain the updated parameters $\theta_{\mathcal{G}}'$ and $\theta_{\mathcal{D}}'$. The update steps are given in Algorithm 1. We also obtain the adversarial sample as $x_{adv} = \Pi_{\epsilon}(x + \mathcal{G}(x))$.
We then apply the meta-testing update using the additional meta-test dataset $D^{te}$. We distinguish the datasets using superscripts: $tr$ for the meta-train set and $te$ for the meta-test set; in Algorithm 1, $x^{te}$ draws its samples from $D^{te}$. At the inference stage, we only use $\mathcal{G}$ to generate the adversarial sample.
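The sketch below illustrates this two-level update in a first-order MAML style. It is a simplification of Algorithm 1 under assumed step sizes: the discriminator's inner update and any second-order terms are omitted, and `attack_loss` stands for the combined attack objective.

```python
import copy
import torch

def meta_step(G, D, batch_tr, batch_te, attack_loss, opt_G, alpha=1e-4):
    """Inner loop: adapt a copy of the generator on the meta-train batch.
    Outer loop: evaluate the adapted copy on the meta-test batch and apply
    its gradients back to the original generator (first-order MAML)."""
    # inner update on the meta-train set
    G_fast = copy.deepcopy(G)
    loss_tr = attack_loss(G_fast, D, batch_tr)
    grads = torch.autograd.grad(loss_tr, G_fast.parameters())
    with torch.no_grad():
        for p, g in zip(G_fast.parameters(), grads):
            p -= alpha * g
    # outer update on the meta-test set
    loss_te = attack_loss(G_fast, D, batch_te)
    grads_te = torch.autograd.grad(loss_te, G_fast.parameters())
    opt_G.zero_grad()
    for p, g in zip(G.parameters(), grads_te):
        p.grad = g.clone()
    opt_G.step()
    return loss_tr.item(), loss_te.item()
```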
III-D Training in absence of labels
The deep mis-ranking loss of [23] can be used when labels are available for the source dataset. Here we present the case where no labels are available. In the absence of labels, and inspired by the unsupervised contrastive loss [33], we generate a positive sample by applying augmentation to the given sample $x$. The negative sample is chosen using a batch-hard strategy: we consider all samples in the batch except the augmented version of $x$ as candidate negatives and choose the one which is closest to $x$. We then use Eq. 1 to obtain the adversarial triplet loss; a sketch of this triplet construction is given below.
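A minimal sketch of this label-free triplet construction, assuming a batch tensor `x`, an augmentation function `augment` and a feature extractor `M` (all hypothetical names):

```python
import torch

def unlabeled_triplet(M, x, augment):
    """Label-free triplets: the positive is an augmented view of each sample;
    the negative is the batch-hard choice, i.e. the closest other sample."""
    x_pos = augment(x)                        # positive: augmentation of x
    with torch.no_grad():
        f = M(x)                              # (B, d) embeddings
    dist = torch.cdist(f, f)                  # pairwise distances in the batch
    dist.fill_diagonal_(float('inf'))         # a sample cannot be its own negative
    x_neg = x[dist.argmin(dim=1)]             # hardest (closest) negative
    return x, x_pos, x_neg
```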
IV Experimental Results
IV-A Implementation Details
We implement the proposed method in the PyTorch framework. The GAN architecture is similar to that of the GAN used in [34, 35]. We use the models from Model Zoo [36]: OSNet [17], MLFN [18], HACNN [37], ResNet-50 and ResNet-50-FC512. We also use AlignedReID [38, 39], LightMBN [40], and PCB [41, 42]. We use an Adam optimizer and train the model for 40 epochs. In the following subsections we first present the ablations for the mask and for meta-learning. In order to stabilize GAN training, we apply label flipping with 5% flipped labels, sketched below.
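A minimal sketch of this stabilization heuristic; the exact flipping variant used is our assumption.

```python
import torch

def flip_labels(targets, p=0.05):
    """Randomly invert 5% of the discriminator's real/fake targets to
    stabilize GAN training."""
    flip = torch.rand_like(targets) < p
    return torch.where(flip, 1.0 - targets, targets)

# usage: real_targets = flip_labels(torch.ones(batch_size))
```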
IV-B Effect of mask
We find that when we use the mask with ResNet50 and test on different models such as MLFN [18] and HACNN [37], there is a substantial gain in attack performance, as shown in Table I. In terms of R-1 accuracy, introduction of the mask gives a boost of 42.10% and 4.8% for MLFN and HACNN respectively. This indicates that the mask provides better transferability. Further, when we evaluate on ResNet50 itself, there is only a minor change in performance, which could be because the mask is learnt using ResNet50 itself.
| Model | ResNet50 mAP | ResNet50 R-1 | MLFN mAP | MLFN R-1 | HACNN mAP | HACNN R-1 |
|---|---|---|---|---|---|---|
| Before | 70.4 | 87.9 | 74.3 | 90.1 | 75.6 | 90.9 |
|  | 0.66 | 0.41 | 3.95 | 3.23 | 32.57 | 42.01 |
|  | 0.56 | 0.35 | 5.39 | 4.55 | 35.13 | 44.20 |
|  | 0.51 | 0.33 | 6.01 | 4.89 | 37.50 | 45.11 |
|  | 0.69 | 0.50 | 2.80 | 1.87 | 31.73 | 39.99 |
IV-C Effect of meta learning
We demonstrate the effect of meta-learning in Table II. In the cross-dataset (ResNet50) as well as the cross-dataset cross-model (MLFN) setting, we observe that the introduction of meta-learning gives a significant performance boost. In terms of R-1 accuracy, there is a boost of 69.87% and 69.29% for ResNet50 and MLFN respectively. We further observe that ResNet50 does not have good transferability towards HACNN. This could be due to two reasons: first, ResNet50 is a basic model compared to other, superior PRID models; second, HACNN is built on Inception units [44].
| Model | ResNet50 mAP | ResNet50 R-1 | MLFN mAP | MLFN R-1 | HACNN mAP | HACNN R-1 |
|---|---|---|---|---|---|---|
| Before | 58.9 | 78.3 | 63.2 | 81.1 | 63.2 | 80.1 |
| w/o meta | 17.96 | 24.86 | 18.25 | 24.10 | 42.75 | 58.48 |
| w/ meta | 5.80 | 7.49 | 6.15 | 7.40 | 43.12 | 58.97 |
IV-D Adversarial attack performance
We first present the results for the cross-model attack in Table III. We use the AlignedReID model, Market-1501 [43] as the training set and MSMT-17 [45] as the meta-test set. The results are reported for Market-1501 and DukeMTMC-reID [46]. In the case of Market-1501, it is clearly evident that the proposed method achieves strong transferability. We can see that incorporation of the meta-test set reduces the mAP and R-1 results to less than half of those obtained when only labels are used. For instance, the mAP and R-1 of AlignedReID go down from 7.00% and 6.38% to 3.51% and 2.82% respectively. This is consistently observed for all three models. Further, the combined usage of mask and meta-learning, denoted as MeGA, achieves the best results in the cross-model cases of PCB and HACNN, with respective R-1 improvements of 10.00% and 9.10%. Thus our method is extremely effective in generating adversarial samples.
| Setting | Method | AlignedReID mAP | AlignedReID R-1 | PCB mAP | PCB R-1 | HACNN mAP | HACNN R-1 |
|---|---|---|---|---|---|---|---|
| M→M | Before | 77.56 | 91.18 | 78.54 | 92.87 | 75.6 | 90.9 |
| M→M | label | 7.00 | 6.38 | 16.46 | 29.69 | 16.39 | 20.16 |
| M→M | label + mask | 6.62 | 5.93 | 15.96 | 28.94 | 16.01 | 19.47 |
| M→M | label + meta | 3.51 | 2.82 | 8.07 | 13.86 | 5.44 | 5.28 |
| M→M | MeGA | 5.50 | 5.07 | 7.39 | 12.47 | 4.85 | 4.80 |
| M→D | label | 16.04 | 21.14 | 13.35 | 15.66 | 15.94 | 21.85 |
| M→D | label + mask | 16.23 | 21.72 | 13.70 | 15.97 | 16.43 | 22.17 |
| M→D | label + meta | 4.69 | 5.70 | 11.10 | 12.88 | 5.40 | 6.55 |
| M→D | MeGA | 7.70 | 9.47 | 11.81 | 14.04 | 4.73 | 5.40 |
In the case of Market-1501 to DukeMTMC-reID, we observe that simply applying meta-learning generalizes very well. For AlignedReID, the mAP and R-1 of 4.69% and 5.70% respectively are significantly lower than the results obtained in the label-only or label+mask settings. The combined setting of mask and meta-learning yields better results for HACNN compared to AlignedReID and PCB. This may be because the learning of the mask is still tied to the training set and may thus result in overfitting.
In Table IV we present the results for the cross-dataset cross-model case against more models. Here also we can see that both AlignedReID and PCB lead to strong attacks against other models on a different dataset.
In Table V, we present the results for MSMT-17. Here, the model is trained using AlignedReID or PCB on Market-1501, with DukeMTMC-reID as the meta-test set. When trained and tested using AlignedReID, the R-1 accuracy drops from 67.6% on clean samples to 17.69%. On the other hand, when trained using PCB and tested on AlignedReID, the R-1 accuracy drops to 16.70%. This shows that our attack is very effective on large-scale datasets such as MSMT-17.
| Model | OSNet mAP | OSNet R-1 | OSNet R-10 | LightMBN mAP | LightMBN R-1 | LightMBN R-10 | ResNet50 mAP | ResNet50 R-1 | ResNet50 R-10 | MLFN mAP | MLFN R-1 | MLFN R-10 | ResNet50FC512 mAP | ResNet50FC512 R-1 | ResNet50FC512 R-10 | HACNN mAP | HACNN R-1 | HACNN R-10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Before | 70.2 | 87.0 | - | 73.4 | 87.9 | - | 58.9 | 78.3 | - | 63.2 | 81.1 | - | 64.0 | 81.0 | - | 63.2 | 80.1 | - |
| AlignedReID | 15.31 | 22.30 | 35.00 | 16.24 | 24.13 | 39.65 | 5.17 | 6.64 | 13.77 | 12.28 | 16.38 | 29.39 | 6.97 | 9.69 | 19.38 | 4.77 | 5.61 | 11.98 |
| PCB | 12.27 | 14.45 | 27.49 | 12.88 | 15.70 | 28.54 | 7.14 | 8.55 | 20.01 | 11.95 | 16.54 | 30.92 | 9.45 | 11.46 | 23.90 | 3.97 | 4.66 | 10.00 |
| Model | AlignedReID mAP | AlignedReID R-1 | AlignedReID R-10 |
|---|---|---|---|
| MeGA (AlignedReID) | 9.37 | 17.69 | 33.42 |
| MeGA (PCB) | 8.82 | 16.70 | 31.98 |
IV-E Comparison with SOTA models
In Table VI we present the comparison with TCIAA [23], UAP [47] and Meta-attack [21]. We observe that our method outperforms TCIAA by a huge margin. We can also see that when the mis-ranking loss is naively applied in the case of TCIAA† [23], the model's performance degrades. Our attack performs better than both TCIAA and Meta-attack.
| Model | mAP | R-1 | R-10 |
|---|---|---|---|
| AlignedReID: Before | 67.81 | 80.50 | 93.18 |
| TCIAA [23] | 14.2 | 17.7 | 32.6 |
| MeGA* (Ours) | 11.34 | 12.81 | 24.11 |
| MeGA (Ours) | 7.70 | 9.47 | 19.16 |
| PCB: Before | 69.94 | 84.47 | - |
| TCIAA [23] | 31.2 | 45.4 | - |
| TCIAA† [23] | 38.0 | 51.4 | - |
| UAP [47] | 29.0 | 41.9 | - |
| Meta-attack* [21] | 26.9 | 39.9 | - |
| MeGA* (Ours) | 22.91 | 31.70 | - |
| MeGA (Ours) | 18.01 | 21.85 | 44.29 |
IV-F Subjective Evaluation
We show example images obtained by our algorithm in Figure 2 and the top-5 retrieved results in Figure 3 for the OSNet model. We can see that in the case of clean samples the top-3 retrieved images match the query ID; however, none of the retrieved images match the query ID in the presence of our attack.
IV-G Attack using unlabelled source
In this section we discuss the attack when the source dataset is unlabeled and neither the victim model nor the dataset used for training the victim model is available. This is a very challenging scenario, as supervised models cannot be used for the attack. Towards this, we use the unsupervised trained models on Market-1501 and MSMT-17 from [49]. In Table VII, we present results for training on MSMT-17 and testing on Market-1501. We observe that IBN-R50 obtains a mAP of 40.7% and an R-1 accuracy of 52.34% when both labels and the mask are not used. When the mask is incorporated, there is a substantial boost of 3.82% in mAP and 4.81% in R-1 accuracy in the case of OSNet. These gains are even higher for MLFN and HACNN.
In the case of Market-1501 to MSMT-17 in Table VIII, we see that the attack using only the mask performs reasonably well compared to the attacks using labels, or both labels and the mask. Due to the comparatively small size of Market-1501, even the attacks using labels are not very efficient.
| Model | OSNet mAP | OSNet R-1 | MLFN mAP | MLFN R-1 | HACNN mAP | HACNN R-1 |
|---|---|---|---|---|---|---|
| Before | 82.6 | 94.2 | 74.3 | 90.1 | 75.6 | 90.9 |
| (R50) | 30.50 | 39.45 | 26.37 | 38.03 | 31.15 | 39.34 |
| (R50) | 24.50 | 33.07 | 21.76 | 32.18 | 18.81 | 23.66 |
| (R50) | 36.5 | 47.56 | 34.92 | 52.61 | 31.15 | 39.34 |
| IBN R50 | 40.7 | 52.34 | 40.62 | 61.46 | 35.44 | 44.84 |
| (IBN R50) | 36.88 | 47.53 | 35.01 | 52.79 | 30.98 | 38.98 |
| Model | OSNet mAP | OSNet R-1 | MLFN mAP | MLFN R-1 | HACNN mAP | HACNN R-1 |
|---|---|---|---|---|---|---|
| Before | 43.8 | 74.9 | 37.2 | 66.4 | 37.2 | 64.7 |
| (R50) | 31.78 | 60.43 | 25.17 | 49.33 | 28.9 | 54.91 |
| (R50) | 29.04 | 56.11 | 22.02 | 43.57 | 28.26 | 53.53 |
| (R50) | 35.16 | 66.28 | 29.16 | 56.65 | 29.69 | 57.81 |
V Conclusion
We present a generative adversarial attack method using mask and meta-learning techniques. The mask allows better transferability across different networks, whereas meta-learning allows better generalizability. We present elaborate results under various settings, and our ablations show the importance of both the mask and meta-learning. Extensive experiments on Market-1501, MSMT-17 and DukeMTMC-reID show the efficacy of the proposed method.
References
- [1] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” arXiv preprint arXiv:1312.6199, 2013.
- [2] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014.
- [3] B. Wang, M. Zhao, W. Wang, F. Wei, Z. Qin, and K. Ren, “Are you confident that you have successfully generated adversarial examples?” TCSVT, vol. 31, no. 6, pp. 2089–2099, 2020.
- [4] B. Wang, M. Zhao, W. Wang, X. Dai, Y. Li, and Y. Guo, “Adversarial analysis for source camera identification,” TCSVT, vol. 31, no. 11, pp. 4174–4186, 2020.
- [5] H. Zhang, B. Chen, J. Wang, and G. Zhao, “A local perturbation generation method for gan-generated face anti-forensics,” IEEE TCSVT, 2022.
- [6] N. Akhtar, A. Mian, N. Kardan, and M. Shah, “Advances in adversarial attacks and defenses in computer vision: A survey,” IEEE Access, vol. 9, pp. 155 161–155 196, 2021.
- [7] Q. Li, X. Wang, B. Ma, X. Wang, C. Wang, S. Gao, and Y. Shi, “Concealed attack for robust watermarking based on generative model and perceptual loss,” TCSVT, 2021.
- [8] Z. Li, Y. Shi, J. Gao, S. Wang, B. Li, P. Liang, and W. Hu, “A simple and strong baseline for universal targeted attacks on siamese visual tracking,” TCSVT, 2021.
- [9] S. Jia, X. Li, C. Hu, G. Guo, and Z. Xu, “3d face anti-spoofing with factorized bilinear coding,” TCSVT, vol. 31, no. 10, pp. 4031–4045, 2020.
- [10] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial machine learning at scale,” arXiv preprint arXiv:1611.01236, 2016.
- [11] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” arXiv preprint arXiv:1706.06083, 2017.
- [12] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in Symposium on security and privacy, 2017, pp. 39–57.
- [13] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, “The limitations of deep learning in adversarial settings,” in European symposium on security and privacy, 2016, pp. 372–387.
- [14] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li, “Boosting adversarial attacks with momentum,” in CVPR, 2018, pp. 9185–9193.
- [15] F. Croce and M. Hein, “Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks,” in ICML. PMLR, 2020, pp. 2206–2216.
- [16] Z. Wang, H. Guo, Z. Zhang, W. Liu, Z. Qin, and K. Ren, “Feature importance-aware transferable adversarial attacks,” in ICCV, 2021.
- [17] K. Zhou, Y. Yang, A. Cavallaro, and T. Xiang, “Omni-scale feature learning for person re-identification,” in ICCV, 2019, pp. 3702–3712.
- [18] X. Chang, T. M. Hospedales, and T. Xiang, “Multi-level factorisation net for person re-identification,” in CVPR, 2018, pp. 2109–2118.
- [19] Y.-J. Li, C.-S. Lin, Y.-B. Lin, and Y.-C. F. Wang, “Cross-dataset person re-identification via unsupervised pose disentanglement and adaptation,” in ICCV, 2019, pp. 7919–7929.
- [20] Y. Yang, P. Tiwari, H. M. Pandey, Z. Lei et al., “Pixel and feature transfer fusion for unsupervised cross-dataset person reidentification,” TNNLS, 2021.
- [21] F. Yang, Z. Zhong, H. Liu, Z. Wang, Z. Luo, S. Li, N. Sebe, and S. Satoh, “Learning to attack real-world models for person re-identification via virtual-guided meta-learning,” in AAAI, vol. 35, no. 4, 2021.
- [22] W. Ding, X. Wei, R. Ji, X. Hong, Q. Tian, and Y. Gong, “Beyond universal person re-identification attack,” IEEE TIFS, vol. 16, 2021.
- [23] H. Wang, G. Wang, Y. Li, D. Zhang, and L. Lin, “Transferable, controllable, and inconspicuous adversarial attacks on person re-identification with deep mis-ranking,” in CVPR, 2020, pp. 342–351.
- [24] X. Li, J. Li, Y. Chen, S. Ye, Y. He, S. Wang, H. Su, and H. Xue, “Qair: Practical query-efficient black-box attacks for image retrieval,” in CVPR, 2021, pp. 3330–3339.
- [25] S. Bai, Y. Li, Y. Zhou, Q. Li, and P. H. Torr, “Adversarial metric attack and defense for person re-identification,” IEEE TPAMI, vol. 43, no. 6, pp. 2119–2126, 2021.
- [26] Z. Wang, S. Zheng, M. Song, Q. Wang, A. Rahimpour, and H. Qi, “advpattern: Physical-world attacks on deep person re-identification via adversarially transformable patterns,” in ICCV, 2019, pp. 8341–8350.
- [27] Q. Bouniot, R. Audigier, and A. Loesch, “Vulnerability of person re-identification models to metric adversarial attacks,” in CVPR, 2020, pp. 794–795.
- [28] C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in ICML. PMLR, 2017, pp. 1126–1135.
- [29] Z. Yuan, J. Zhang, Y. Jia, C. Tan, T. Xue, and S. Shan, “Meta gradient adversarial attack,” in ICCV, 2021, pp. 7748–7757.
- [30] W. Feng, B. Wu, T. Zhang, Y. Zhang, and Y. Zhang, “Meta-attack: Class-agnostic and model-agnostic physical adversarial attack,” in ICCV, 2021, pp. 7787–7796.
- [31] G. Parascandolo, A. Neitz, A. Orvieto, L. Gresele, and B. Schölkopf, “Learning explanations that are hard to vary,” arXiv preprint arXiv:2009.00329, 2020.
- [32] S. Shahtalebi, J.-C. Gagnon-Audet, T. Laleh, M. Faramarzi, K. Ahuja, and I. Rish, “Sand-mask: An enhanced gradient masking strategy for the discovery of invariances in domain generalization,” arXiv preprint arXiv:2106.02266, 2021.
- [33] F. Wang and H. Liu, “Understanding the behaviour of contrastive loss,” in CVPR, 2021, pp. 2495–2504.
- [34] C. Xiao, B. Li, J.-Y. Zhu, W. He, M. Liu, and D. Song, “Generating adversarial examples with adversarial networks,” arXiv preprint arXiv:1801.02610, 2018.
- [35] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in CVPR, 2017, pp. 1125–1134.
- [36] K. Zhou, “Model zoo,” https://kaiyangzhou.github.io/deep-person-reid/MODEL_ZOO.html.
- [37] W. Li, X. Zhu, and S. Gong, “Harmonious attention network for person re-identification,” in CVPR, 2018, pp. 2285–2294.
- [38] X. Zhang, H. Luo, X. Fan, W. Xiang, Y. Sun, Q. Xiao, W. Jiang, C. Zhang, and J. Sun, “Alignedreid: Surpassing human-level performance in person re-identification,” arXiv preprint arXiv:1711.08184, 2017.
- [39] H. Huang, “Alignedreid,” https://github.com/michuanhaohao/AlignedReID.
- [40] F. Herzog, X. Ji, T. Teepe, S. Hörmann, J. Gilg, and G. Rigoll, “Lightweight multi-branch network for person re-identification,” in ICIP. IEEE, 2021, pp. 1129–1133.
- [41] Y. Sun, L. Zheng, Y. Yang, Q. Tian, and S. Wang, “Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline),” in ECCV, 2018, pp. 480–496.
- [42] H. Luo, “Pcb,” https://github.com/huanghoujing/beyond-part-models.
- [43] L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang, and Q. Tian, “Scalable person re-identification: A benchmark,” in ICCV, 2015, pp. 1116–1124.
- [44] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, inception-resnet and the impact of residual connections on learning,” in AAAI, 2017.
- [45] L. Wei, S. Zhang, W. Gao, and Q. Tian, “Person transfer gan to bridge domain gap for person re-identification,” in CVPR, 2018, pp. 79–88.
- [46] Z. Zheng, L. Zheng, and Y. Yang, “Unlabeled samples generated by gan improve the person re-identification baseline in vitro,” in ICCV, 2017.
- [47] J. Li, R. Ji, H. Liu, X. Hong, Y. Gao, and Q. Tian, “Universal perturbation attack against image retrieval,” in ICCV, 2019.
- [48] X. Sun and L. Zheng, “Dissecting person re-identification from the viewpoint of viewpoint,” in CVPR, 2019.
- [49] Y. Ge, F. Zhu, D. Chen, R. Zhao et al., “Self-paced contrastive learning with hybrid memory for domain adaptive object re-id,” NeurIPS, vol. 33, pp. 11 309–11 321, 2020.