
Meta Generative Attack on Person Reidentification

A V Subramanyam
A V Subramanyam is with IIITD, India (e-mail: [email protected]). We acknowledge the support of the IHUB-ANUBHUTI-IIITD FOUNDATION setup under the NM-ICPS scheme of the DST, India.
Abstract

Adversarial attacks have recently been investigated in person re-identification. These attacks perform well under cross-dataset or cross-model settings. However, the challenges present in the combined cross-dataset cross-model scenario do not allow these attacks to achieve similar accuracy. To this end, we propose a method with the goal of achieving better transferability against different models and across datasets. We generate a mask to obtain better performance across models and use meta learning to boost generalizability in the challenging cross-dataset cross-model setting. Experiments on Market-1501, DukeMTMC-reID and MSMT-17 demonstrate favorable results compared to other attacks.

Index Terms:
Adversarial attacks, Meta learning, ReID.

I Introduction

The tremendous performance of deep learning models has led to their widespread deployment in practice. However, these models can be manipulated by introducing minor perturbations [1, 2, 3, 4, 5]; such manipulations are known as adversarial attacks. In the case of person re-identification, for a given query input $\mathbf{x}$, a target model $f$ and a gallery, the attack is defined as,

$$\lVert f(\mathbf{x}+\boldsymbol{\delta})-f(\mathbf{x}_{g})\rVert_{2}>\lVert f(\mathbf{x}+\boldsymbol{\delta})-f(\bar{\mathbf{x}}_{g})\rVert_{2}\quad\text{s.t.}\quad\lVert\boldsymbol{\delta}\rVert_{p}\leq\epsilon,$$
$$\mathbf{x}_{g}\in topk(\mathbf{x}+\boldsymbol{\delta}),\quad ID(\mathbf{x})=ID(\mathbf{x}_{g})\neq ID(\bar{\mathbf{x}}_{g})$$

where $\mathbf{x}_{g}$ is a gallery sample sharing the query identity, $\bar{\mathbf{x}}_{g}$ is a gallery sample from a different identity, and $\boldsymbol{\delta}$ is the adversarial perturbation with an $l_{p}$-norm bound of $\epsilon$. $topk(\cdot)$ refers to the top $k$ retrieved images for the given query.
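For illustration, the retrieval criterion can be checked as in the short PyTorch sketch below. This is only a sketch under assumptions not stated above: the attack is judged by whether any same-identity gallery image survives in the top-$k$ list, and the helper name and tensor shapes are hypothetical.

```python
import torch

def attack_success(f, x_adv, gallery, gallery_ids, query_id, k=10):
    """Return True if no same-ID gallery image remains in the top-k list
    retrieved for the perturbed query (hypothetical helper)."""
    with torch.no_grad():
        q = f(x_adv.unsqueeze(0))            # (1, d) feature of the perturbed query
        g = f(gallery)                       # (N, d) gallery features
    dists = torch.cdist(q, g).squeeze(0)     # Euclidean distance to every gallery image
    topk_idx = dists.topk(k, largest=False).indices
    return not any(int(gallery_ids[i]) == int(query_id) for i in topk_idx)
```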

Adversarial attacks have been extensively investigated in the classification setting [6] and have also been studied in other domains [7, 8, 9] in recent times. However, to the best of our knowledge, there are very few works that study these attacks in the person re-identification domain. In the following we briefly discuss some classical attacks in the classification setting. Szegedy et al. [1] proposed the first work on generating adversarial samples for deep neural networks using L-BFGS. Goodfellow et al. [2] proposed an efficient adversarial sample generation method, the fast gradient sign method (FGSM). Kurakin et al. [10] proposed an iterative FGSM. Other prominent works include [11, 12, 13, 14, 15, 16].

In person re-identification [17, 18, 19, 20], both white-box and black-box attacks have been proposed [21, 22, 23, 24]. These attacks use a labeled source dataset and show that the attacks are transferable under cross-dataset, cross-model, or both settings. However, transferability of attacks in the challenging combined cross-dataset and cross-model setting remains an issue. In this work, we propose to use a mask and meta learning for better transferability of attacks. We also investigate adversarial attacks in a completely new setting where the source dataset does not have any labels and the target model structure and parameters are unknown.

II Related Works

In [25], the authors propose white-box and black-box attacks; the black-box attack only assumes that the victim model is unknown while the dataset is available. [26] introduces physically realizable attacks in the white-box setting by generating adversarial clothing patterns. [24] proposes a query-based attack wherein the images obtained by querying the victim model are used to form triplets for a triplet loss. [27] proposes a white-box self-metric attack, wherein the positive sample is obtained by adding noise to the given input and negative samples are obtained from other images. In [21], the authors propose a meta-learning framework using a labeled source and an extra association dataset; this method generalizes well in the cross-dataset scenario. In [22], Ding et al. propose a list-wise attack objective function along with model-agnostic regularization for better transferability. A GAN-based framework is proposed in [23], where the authors generate adversarial noise and a mask by training the network with a triplet loss.

In this work we use a GAN to generate adversarial samples. In order to achieve better transferability of the attack across models, we suppress the pixels that generate large gradients. Suppressing these gradients allows the network to focus on other pixels, i.e., pixels that are not explicitly salient with respect to the model used for the attack. We further use meta learning [28], which also allows the incorporation of an additional dataset to boost transferability. We refer to this attack as Meta Generative Attack (MeGA). Our work is closest in spirit to [21, 23]; however, the mask generation and the application of meta learning within a GAN framework are quite distinct from these works.

III Methodology

In this work we address both white-box and black-box attacks. We require the attack to be transferable across models and datasets. If we obtain the attack sample using a given model $f$, the attack is inherently tied to $f$ [16]. So that the attack does not over-learn, we apply a mask that shifts the focus to regions that are not highly salient for discrimination. This way the network can focus on less salient but still discriminative regions, thereby increasing the generalizability of the attack to other models. On the other hand, meta learning has been used efficiently in adversarial attacks [29, 21, 30] to obtain better transferability across datasets. However, meta learning has not been explored together with generative learning for attacks in the case of PRID. We adapt the MAML meta-learning framework [28] in our proposed method. While existing black-box attack works assume the presence of a labeled source dataset, we additionally present a more challenging setting wherein no labels are available during the attack.

Figure 1: Model architecture. Mask $\mathbf{M}$ is generated using model $f$ and is used to mask the input $\mathbf{x}$. The GAN is trained in a meta-learning framework with an adversarial triplet loss and a GAN loss.

Our proposed model is illustrated in Figure 1. In the white-box setting, the generator $\mathcal{G}$ is trained using the generator loss, the adversarial triplet loss and the meta-learning loss, while the discriminator $\mathcal{D}$ is trained with the classical binary cross-entropy discriminator loss. The mask is obtained via a self-supervised triplet loss. The network learns to generate adversarial images: while the GAN loss focuses on generating realistic samples, the adversarial triplet loss guides the network to generate samples that are closer to negative samples and farther away from positive samples.

III-A GAN training

Given a clean sample $\mathbf{x}$, we use the generator $\mathcal{G}$ to create the adversarial sample $\mathbf{x}_{adv}$. The overall GAN loss is given by $\mathcal{L}_{GAN}=E_{\mathbf{x}}\log\mathcal{D}(\mathbf{x})+E_{\mathbf{x}}\log(1-\mathcal{D}(\Pi(\mathcal{G}(\mathbf{x}))))$. Here $\Pi(\cdot)$ denotes the projection onto the $l_{\infty}$ ball of radius $\epsilon$ around $\mathbf{x}$, and $\mathbf{x}_{adv}=\Pi(\mathcal{G}(\mathbf{x}))$. In order to generate adversarial samples, a deep mis-ranking loss is used [23],

$$\mathcal{L}_{adv-trip}(\mathbf{x}_{adv}^{a},\mathbf{x}_{adv}^{n},\mathbf{x}_{adv}^{p})=\max(\lVert\mathbf{x}_{adv}^{a}-\mathbf{x}_{adv}^{n}\rVert_{2}-\lVert\mathbf{x}_{adv}^{a}-\mathbf{x}_{adv}^{p}\rVert_{2}+m,\,0)\quad(1)$$

where $m$ is the margin, $\mathbf{x}_{adv}^{a}$ is the adversarial sample obtained from the anchor sample $\mathbf{x}^{a}$, and $\mathbf{x}_{adv}^{p}$ and $\mathbf{x}_{adv}^{n}$ are the adversarial samples obtained from the respective positive and negative samples $\mathbf{x}^{p}$ and $\mathbf{x}^{n}$. This loss pulls the adversarial anchor closer to the negative and pushes it farther from the positive, so the network learns to generate convincing adversarial samples.
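A minimal PyTorch sketch of these components is given below, assuming images scaled to $[0,1]$ (so $\epsilon=16$ corresponds to $16/255$), a discriminator that outputs logits, and mis-ranking distances computed on features of the attacked model $f$; the helper names are illustrative, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def project_linf(x_gen, x, eps=16 / 255.0):
    """Pi(.): project the generated image into the l-infinity ball of radius
    eps around the clean image x, and into the valid pixel range."""
    delta = torch.clamp(x_gen - x, -eps, eps)
    return torch.clamp(x + delta, 0.0, 1.0)

def gan_losses(D, G, x, eps=16 / 255.0):
    """Discriminator loss (BCE form of max E[log D(x)] + E[log(1 - D(x_adv))])
    and the generator term L_G = E[log(1 - D(x_adv))], which G minimizes."""
    x_adv = project_linf(G(x), x, eps)
    real, fake = D(x), D(x_adv.detach())
    d_loss = F.binary_cross_entropy_with_logits(real, torch.ones_like(real)) + \
             F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake))
    g_loss = torch.log(1.0 - torch.sigmoid(D(x_adv)) + 1e-6).mean()
    return d_loss, g_loss, x_adv

def adv_triplet_loss(f, xa_adv, xn_adv, xp_adv, m=1.0):
    """Deep mis-ranking loss of Eq. (1): pull the adversarial anchor toward the
    negative and push it away from the positive in f's feature space."""
    fa, fn, fp = f(xa_adv), f(xn_adv), f(xp_adv)
    d_an = torch.norm(fa - fn, p=2, dim=1)
    d_ap = torch.norm(fa - fp, p=2, dim=1)
    return torch.clamp(d_an - d_ap + m, min=0.0).mean()
```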

III-B Mask Generation

An attack obtained using the given model $f$ generalizes poorly to other networks. In order to obtain better transferability, we first compute the gradients with respect to the self-supervised triplet loss $\mathcal{L}_{adv-trip}(\mathbf{x},\mathbf{x}^{n},\mathbf{x}^{p})$, where $\mathbf{x}^{p}$ is obtained by augmenting $\mathbf{x}$ and $\mathbf{x}^{n}$ is the sample in the batch at the maximum Euclidean distance from $\mathbf{x}$. Here, the large gradients are primarily responsible for loss convergence. Since this route to convergence is clearly coupled with $f$, we mask the large gradients. The convergence then does not depend entirely on the large gradients and shifts to smaller ones, which can also possess discriminative information; the overfitting to $f$ is thereby reduced by the mask. To obtain the mask, we compute,

$$\mathbf{grad}_{adv-triplet}=\nabla_{\mathbf{x}}\mathcal{L}_{adv-trip}(\mathbf{x},\mathbf{x}^{n},\mathbf{x}^{p})\quad(2)$$

Note that we use the real samples in Eq. 2. The mask is given by $\mathbf{M}=\mathrm{sigmoid}(\lvert\mathbf{grad}_{adv-triplet}\rvert)$, where $\lvert\cdot\rvert$ denotes the absolute value. We mask $\mathbf{x}$ before feeding it as input to the generator $\mathcal{G}$. The masked input is given as $\mathbf{x}\leftarrow\mathbf{x}\odot(1-\mathbf{M})$, where $\odot$ denotes the Hadamard product.
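The mask computation admits a compact sketch under the same assumptions ($f$ returns features; the positive is an augmented view of $\mathbf{x}$ and the negative is the farthest in-batch sample, both prepared elsewhere); the helper name is hypothetical.

```python
import torch

def masked_input(f, x, x_pos, x_neg, m=1.0):
    """Compute M = sigmoid(|grad of the self-supervised triplet loss w.r.t. x|)
    (Eq. 2) and return the masked input x * (1 - M) fed to the generator."""
    x = x.clone().detach().requires_grad_(True)
    d_n = torch.norm(f(x) - f(x_neg), p=2, dim=1)   # distance to in-batch negative
    d_p = torch.norm(f(x) - f(x_pos), p=2, dim=1)   # distance to augmented positive
    loss = torch.clamp(d_n - d_p + m, min=0.0).mean()
    grad = torch.autograd.grad(loss, x)[0]
    M = torch.sigmoid(grad.abs())                   # continuous mask; large gradients map near 1
    return (x * (1.0 - M)).detach(), M.detach()
```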

Masking techniques have also been explored in [31, 32], where the idea is to learn a model that does not overfit to the training distribution. Our masking technique is motivated by the requirement that an adversarial example should be transferable across different re-ID models. It is distinct in that it can be applied to an individual sample, whereas the masking techniques in [31, 32] seek agreement among the gradients obtained from all samples of a batch and require tuning a hyperparameter. Further, the mask of [31] is boolean while ours is continuous.

III-C Meta Learning

Meta optimization allows learning from multiple datasets for different tasks while generalizing well on a given task. One of the popular meta-learning approaches, MAML [28], applies two update steps: the first happens in an inner loop with a meta-train set, while the second happens in an outer loop with a meta-test set. In our case, the inner-loop update is performed on the discriminator and generator parameters using the meta-train set, and the outer-loop update is performed on the generator parameters using the meta-test set.

input: Datasets $\mathcal{T}$ and $\mathcal{A}$, model $f$
output: Generator network $\mathcal{G}$ parameters $\boldsymbol{\theta}_{g}$
while not converged do
       for samples in $\mathcal{T}$ do
             /* Obtain the mask */
             $\mathbf{M}\leftarrow\sigma(\lvert\nabla_{\mathbf{x}}\mathcal{L}_{adv-trip}(\mathbf{x},\mathbf{x}^{n},\mathbf{x}^{p})\rvert)$
             /* Meta-train update using $\mathcal{T}$ */
             $\boldsymbol{\theta}_{d}\leftarrow\operatorname{arg\,max}_{\boldsymbol{\theta}_{d}}E_{\mathbf{x}}\log\mathcal{D}(\mathbf{x})+E_{\mathbf{x}}\log(1-\mathcal{D}(\Pi(\mathcal{G}(\mathbf{x}))))$
             $\boldsymbol{\theta}_{g}\leftarrow\operatorname{arg\,min}_{\boldsymbol{\theta}_{g}}\mathcal{L}_{\mathcal{G}}^{\mathcal{T}}+\lambda\mathcal{L}_{adv-trip}^{\mathcal{T}}(\mathbf{x}_{adv}^{a},\mathbf{x}_{adv}^{n},\mathbf{x}_{adv}^{p})$
             $\boldsymbol{\delta}=\mathbf{x}-\Pi(\mathcal{G}(\mathbf{x}))$
             /* Meta-test loss using $\mathcal{A}$ */
             Sample triplets from meta-test set $\mathcal{A}$ and compute $\mathcal{L}=\mathcal{L}_{adv-trip}^{\mathcal{A}}(\mathbf{x}^{a}-\boldsymbol{\delta},\mathbf{x}^{n},\mathbf{x}^{p})$
       /* Meta-test update */
       $\boldsymbol{\theta}_{g}\leftarrow\operatorname{arg\,min}_{\boldsymbol{\theta}_{g}}\lambda\mathcal{L}$
Algorithm 1: Training for MeGA

More formally, given a discriminator $\mathcal{D}$ parametrized by $\boldsymbol{\theta}_{d}$ and a generator $\mathcal{G}$ parametrized by $\boldsymbol{\theta}_{g}$, we perform the meta-training phase to obtain the parameters $\boldsymbol{\theta}_{d}$ and $\boldsymbol{\theta}_{g}$. The update steps are given in Algorithm 1. We also obtain the adversarial perturbation as $\boldsymbol{\delta}=\mathbf{x}-\Pi(\mathcal{G}(\mathbf{x}))$.

We then apply the meta-test update using the additional meta-test dataset $\mathcal{A}$. In Algorithm 1, $\mathcal{L}_{\mathcal{G}}^{\mathcal{T}}=E_{\mathbf{x}}\log(1-\mathcal{D}(\Pi(\mathcal{G}(\mathbf{x}))))$. We distinguish the datasets using the superscripts $\mathcal{T}$ for the meta-train set and $\mathcal{A}$ for the meta-test set; $\mathcal{L}_{adv-trip}^{\mathcal{A}}$ draws its samples $\mathbf{x}$ from $\mathcal{A}$. At the inference stage, we only use $\mathcal{G}$ to generate the adversarial sample.
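A condensed per-batch sketch of Algorithm 1 is shown below. It assumes the helpers sketched earlier (masked_input, gan_losses, project_linf, adv_triplet_loss), loaders that yield (anchor, positive, negative) triplets for $\mathcal{T}$ and $\mathcal{A}$, and the hyperparameters of Section IV-A; for brevity the meta-test update of $\boldsymbol{\theta}_{g}$ is applied per batch rather than once per pass over $\mathcal{T}$ as in Algorithm 1.

```python
import torch

def train_mega(G, D, f, loader_T, loader_A, lam=0.01, eps=16 / 255.0,
               lr=1e-5, epochs=40):
    """Sketch of MeGA training: meta-train updates of D and G on the source
    set T, followed by a meta-test update of G on the additional set A."""
    opt_d = torch.optim.Adam(D.parameters(), lr=lr, betas=(0.5, 0.999))
    opt_g = torch.optim.Adam(G.parameters(), lr=lr, betas=(0.5, 0.999))
    for _ in range(epochs):
        for (xa, xp, xn), (ya, yp, yn) in zip(loader_T, loader_A):
            # Mask the anchor using gradients of the self-supervised triplet loss.
            xm, _ = masked_input(f, xa, xp, xn)

            # Meta-train: update the discriminator.
            d_loss, _, _ = gan_losses(D, G, xm, eps)
            opt_d.zero_grad(); d_loss.backward(); opt_d.step()

            # Meta-train: update the generator with the GAN and mis-ranking losses.
            _, g_loss, xa_adv = gan_losses(D, G, xm, eps)
            xp_adv = project_linf(G(xp), xp, eps)
            xn_adv = project_linf(G(xn), xn, eps)
            loss_g = g_loss + lam * adv_triplet_loss(f, xa_adv, xn_adv, xp_adv)
            opt_g.zero_grad(); loss_g.backward(); opt_g.step()

            # Meta-test: transfer the perturbation delta to a meta-test triplet
            # and update the generator again (gradients flow through delta).
            delta = xa - project_linf(G(xm), xm, eps)
            meta_loss = lam * adv_triplet_loss(f, ya - delta, yn, yp)
            opt_g.zero_grad(); meta_loss.backward(); opt_g.step()
    return G
```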

III-D Training in absence of labels

The deep mis-ranking loss [23] can be used when labels are available for $\mathcal{T}$. Here we present the case where no labels are available. In the absence of labels, inspired by the unsupervised contrastive loss [33], we generate a positive sample $\mathbf{x}_{adv}^{p}$ by applying augmentation to the given sample $\mathbf{x}_{adv}^{a}$. The negative sample $\mathbf{x}_{adv}^{n}$ is generated using a batch-hard negative strategy: we consider all samples except the augmented version of $\mathbf{x}_{adv}^{a}$ as negatives and choose the one closest to $\mathbf{x}_{adv}^{a}$. We then use Eq. 1 to obtain the adversarial triplet loss.
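A sketch of this label-free triplet construction is given below; a horizontal flip stands in for the augmentation (the specific augmentations are an assumption) and the hardest in-batch negative is chosen in the feature space of $f$.

```python
import torch

def unsupervised_triplets(f, batch):
    """Build (anchor, positive, negative) triplets without labels: the positive
    is an augmented view of each anchor and the negative is the closest other
    sample in the batch (batch-hard negative)."""
    pos = torch.flip(batch, dims=[-1])        # illustrative augmentation: horizontal flip
    with torch.no_grad():
        feats = f(batch)                      # (B, d) features of the batch
    dists = torch.cdist(feats, feats)         # pairwise Euclidean distances
    dists.fill_diagonal_(float('inf'))        # a sample cannot be its own negative
    neg = batch[dists.argmin(dim=1)]          # hardest (closest) in-batch negative
    return batch, pos, neg
```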

IV Experimental Results

IV-A Implementation Details

We implement the proposed method in the PyTorch framework. The GAN architecture is similar to that used in [34, 35]. We use the models from the Model Zoo [36]: OSNet [17], MLFN [18], HACNN [37], ResNet-50 and ResNet-50-FC512. We also use AlignedReID [38, 39], LightMBN [40] and PCB [41, 42]. We use the Adam optimizer with a learning rate of $10^{-5}$, $\beta_{1}=0.5$ and $\beta_{2}=0.999$, and train the model for 40 epochs. We set $m=1$, $\lambda=0.01$ and $\epsilon=16$. In order to stabilize GAN training, we apply label flipping with 5% flipped labels. We first present ablations for the mask and meta learning.
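The label-flipping trick mentioned above can be sketched as follows; the 5% probability matches the text, while the function name and its exact placement in the discriminator update are assumptions.

```python
import torch

def flip_labels(targets, flip_prob=0.05):
    """Randomly flip a small fraction of the real/fake labels fed to the
    discriminator; a common trick to stabilize GAN training."""
    flip = torch.rand_like(targets) < flip_prob
    return torch.where(flip, 1.0 - targets, targets)

# Example: real labels for a batch of 32 images, with roughly 5% flipped to fake.
real_targets = torch.ones(32, 1)
noisy_targets = flip_labels(real_targets)
```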

IV-B Effect of mask $\mathbf{M}$

We find that when the mask is generated with ResNet-50 and the attack is tested on different models such as MLFN [18] and HACNN [37], there is a substantial gain in performance, as shown in Table I. In terms of R-1 accuracy, introducing the mask strengthens the attack by a relative 42.10% and 4.8% for MLFN and HACNN respectively (R-1 drops from 3.23% to 1.87% and from 42.01% to 39.99%). This indicates that the mask provides better transferability. When we evaluate on ResNet-50 itself, there is only a minor change in performance, which could be because the mask is learnt using ResNet-50 itself.

TABLE I: Trained on Market-1501 [43]. Setting: Market-1501 → Market-1501. $l$ indicates Market-1501 labels are used for training; $\mathbf{M}$ indicates the incorporation of the mask; AND and SAND denote the gradient-masking strategies of [31] and [32], respectively. 'Before' indicates accuracy on clean samples.

| Attack | ResNet-50 mAP | ResNet-50 R-1 | MLFN mAP | MLFN R-1 | HACNN mAP | HACNN R-1 |
|---|---|---|---|---|---|---|
| Before | 70.4 | 87.9 | 74.3 | 90.1 | 75.6 | 90.9 |
| l | 0.66 | 0.41 | 3.95 | 3.23 | 32.57 | 42.01 |
| l + AND | 0.56 | 0.35 | 5.39 | 4.55 | 35.13 | 44.20 |
| l + SAND | 0.51 | 0.33 | 6.01 | 4.89 | 37.50 | 45.11 |
| l + M | 0.69 | 0.50 | 2.80 | 1.87 | 31.73 | 39.99 |

IV-C Effect of meta learning

We demonstrate the effect of meta learning in Table II. In the cross-dataset (ResNet-50) as well as the cross-dataset cross-model (MLFN) setting, we observe that introducing meta learning gives a significant performance boost. In terms of R-1 accuracy, the attack improves by a relative 69.87% and 69.29% for ResNet-50 and MLFN respectively (R-1 drops from 24.86% to 7.49% and from 24.10% to 7.4%). We further observe that ResNet-50 does not transfer well to HACNN. This could be due to two reasons: first, ResNet-50 is a basic model compared to other, superior PRID models; second, HACNN is built on Inception units [44].

TABLE II: Trained on Market-1501 using MSMT-17 [45] as meta-test set. Setting: Market-1501 → DukeMTMC-reID [46]. $\mathcal{A}$ indicates the incorporation of meta learning.

| Attack | ResNet-50 mAP | ResNet-50 R-1 | MLFN mAP | MLFN R-1 | HACNN mAP | HACNN R-1 |
|---|---|---|---|---|---|---|
| Before | 58.9 | 78.3 | 63.2 | 81.1 | 63.2 | 80.1 |
| l | 17.96 | 24.86 | 18.25 | 24.10 | 42.75 | 58.48 |
| l + A | 5.80 | 7.49 | 6.15 | 7.4 | 43.12 | 58.97 |

IV-D Adversarial attack performance

We first present the results for the cross-model attack in Table III. We use the AlignedReID model, Market-1501 [43] as the training set and MSMT-17 [45] as the meta-test set. Results are reported for Market-1501 and DukeMTMC-reID [46]. In the case of Market-1501, it is clearly evident that the proposed method achieves strong transferability. Incorporating the meta-test set reduces mAP and R-1 to less than half of the values obtained when only labels are used; for instance, mAP and R-1 of AlignedReID go down from 7.00% and 6.38% to 3.51% and 2.82%, respectively, and this is consistently observed for all three models. Further, the combined use of mask and meta learning ($l+\mathbf{M}+\mathcal{A}$), denoted MeGA, achieves the best results in the cross-model cases of PCB and HACNN, with relative R-1 improvements of 10.00% and 9.10%, respectively. Thus our method is very effective in generating adversarial samples.

TABLE III: AlignedReID trained on Market-1501 with MSMT-17 as meta-test set. M is Market-1501 and D is DukeMTMC-reID. MeGA denotes $l+\mathbf{M}+\mathcal{A}$.

| Setting | Attack | AlignedReID mAP | AlignedReID R-1 | PCB mAP | PCB R-1 | HACNN mAP | HACNN R-1 |
|---|---|---|---|---|---|---|---|
| M → M | Before | 77.56 | 91.18 | 78.54 | 92.87 | 75.6 | 90.9 |
| | l | 7.00 | 6.38 | 16.46 | 29.69 | 16.39 | 20.16 |
| | l + M | 6.62 | 5.93 | 15.96 | 28.94 | 16.01 | 19.47 |
| | l + A | 3.51 | 2.82 | 8.07 | 13.86 | 5.44 | 5.28 |
| | MeGA | 5.50 | 5.07 | 7.39 | 12.47 | 4.85 | 4.80 |
| M → D | l | 16.04 | 21.14 | 13.35 | 15.66 | 15.94 | 21.85 |
| | l + M | 16.23 | 21.72 | 13.70 | 15.97 | 16.43 | 22.17 |
| | l + A | 4.69 | 5.70 | 11.10 | 12.88 | 5.40 | 6.55 |
| | MeGA | 7.70 | 9.47 | 11.81 | 14.04 | 4.73 | 5.40 |

In the case of Market-1501 to DukeMTMC-reID, we observe that simply applying meta learning ($l+\mathcal{A}$) generalizes very well. For AlignedReID, the mAP and R-1 of 4.69% and 5.70% are significantly lower than the results obtained in the $l$ or $l+\mathbf{M}$ settings. The combined setting of mask and meta learning yields better results for HACNN than for AlignedReID and PCB. This may be because the learning of the mask is still tied to the training set and can thus overfit.

In Table IV we discuss the results for the cross-dataset cross-model case against more models. Here too, both AlignedReID and PCB lead to strong attacks against other models on a different dataset.

In Table V, we present the results for MSMT-17. Here, the attack is trained on Market-1501 using AlignedReID and PCB, with DukeMTMC-reID as the meta-test set. When trained and tested using AlignedReID, the R-1 accuracy drops from 67.6% on clean samples to 17.69%. When trained using PCB and tested on AlignedReID, the performance drops to 16.70%. This shows that our attack is very effective on large-scale datasets such as MSMT-17.

TABLE IV: AlignedReID and PCB trained on Market-1501 with MSMT-17 as meta-test set. Setting: Market-1501 → DukeMTMC-reID.

| Target model | Before mAP | Before R-1 | AlignedReID mAP | AlignedReID R-1 | AlignedReID R-10 | PCB mAP | PCB R-1 | PCB R-10 |
|---|---|---|---|---|---|---|---|---|
| OSNet | 70.2 | 87.0 | 15.31 | 22.30 | 35.00 | 12.27 | 14.45 | 27.49 |
| LightMBN | 73.4 | 87.9 | 16.24 | 24.13 | 39.65 | 12.88 | 15.70 | 28.54 |
| ResNet-50 | 58.9 | 78.3 | 5.17 | 6.64 | 13.77 | 7.14 | 8.55 | 20.01 |
| MLFN | 63.2 | 81.1 | 12.28 | 16.38 | 29.39 | 11.95 | 16.54 | 30.92 |
| ResNet-50-FC512 | 64.0 | 81.0 | 6.97 | 9.69 | 19.38 | 9.45 | 11.46 | 23.90 |
| HACNN | 63.2 | 80.1 | 4.77 | 5.61 | 11.98 | 3.97 | 4.66 | 10.00 |
TABLE V: Trained on Market-1501 using DukeMTMC-reID as meta-test set. Setting: Market-1501 → MSMT-17. Target model: AlignedReID.

| Attack | mAP | R-1 | R-10 |
|---|---|---|---|
| MeGA (AlignedReID) | 9.37 | 17.69 | 33.42 |
| MeGA (PCB) | 8.82 | 16.70 | 31.98 |
Figure 2: Left column: the red and blue boxes show a given image from Market-1501 and its mask ($1-\mathbf{M}$), respectively. Right column: attacked (top) and clean (bottom) images from MSMT-17.

IV-E Comparison with SOTA models

In Table VI we present the comparison with TCIAA [23], UAP [47] and Meta-attack [21]. We observe that our method outperforms TCIAA by a large margin. We can also see that when the mis-ranking loss is naively applied in the case of TCIAA [21], the attack performance degrades. Our attack performs better than both TCIAA and Meta-attack.

TABLE VI: AlignedReID trained on Market-1501 with MSMT-17 as meta-test set. Setting: Market-1501 → DukeMTMC-reID. For methods listed twice, the variants differ in their use of PersonX [48]: one uses it as an extra dataset, the other uses it for meta learning.

| Target: AlignedReID | mAP | R-1 | R-10 |
|---|---|---|---|
| Before | 67.81 | 80.50 | 93.18 |
| TCIAA [23] | 14.2 | 17.7 | 32.6 |
| MeGA (Ours) | 11.34 | 12.81 | 24.11 |
| MeGA (Ours) | 7.70 | 9.47 | 19.16 |

| Target: PCB | mAP | R-1 | R-10 |
|---|---|---|---|
| Before | 69.94 | 84.47 | - |
| TCIAA [23] | 31.2 | 45.4 | - |
| TCIAA [23] | 38.0 | 51.4 | - |
| UAP [47] | 29.0 | 41.9 | - |
| Meta-attack ($\epsilon=8$) [21] | 26.9 | 39.9 | - |
| MeGA ($\epsilon=8$) (Ours) | 22.91 | 31.70 | - |
| MeGA ($\epsilon=8$) (Ours) | 18.01 | 21.85 | 44.29 |

IV-F Subjective Evaluation

We show example images generated by our algorithm in Figure 2 and the top-5 retrieved results in Figure 3 for the OSNet model. For the clean query, the top-3 retrieved images match the query ID; however, none of the retrieved images match the query ID in the presence of our attack.

Figure 3: Query image marked with a blue border. Top: top-5 images retrieved by OSNet on Market-1501 for the clean query; green boxes indicate correct matches and red boxes incorrect ones. Bottom: images retrieved after attacking the query sample.

IV-G Attack using unlabelled source

In this section we discuss the attack when the source dataset $\mathcal{T}$ is unlabeled and neither the victim model nor the dataset used to train it is available. This is a very challenging scenario as supervised models cannot be used for the attack. Towards this, we use unsupervised models trained on Market-1501 and MSMT-17 from [49]. In Table VII, we present results for training on MSMT-17 and testing on Market-1501. We observe that, against OSNet, IBN R50 obtains an mAP and R-1 accuracy of 40.7% and 52.34% when neither labels nor the mask is used. When the mask is incorporated, the attack strengthens by 3.82% in mAP and 4.81% in R-1 for OSNet; these gains are even higher for MLFN and HACNN.

In the case of Market-1501 to MSMT-17 in Table VIII, we see that the attack using only the mask performs reasonably well compared to the attacks using labels or both labels and the mask. Due to the comparatively small size of Market-1501, even the attacks using labels are not very effective.

TABLE VII: MSMT-17 → Market-1501. R50 denotes ResNet-50.

| Attack | OSNet mAP | OSNet R-1 | MLFN mAP | MLFN R-1 | HACNN mAP | HACNN R-1 |
|---|---|---|---|---|---|---|
| Before | 82.6 | 94.2 | 74.3 | 90.1 | 75.6 | 90.9 |
| l (R50) | 30.50 | 39.45 | 26.37 | 38.03 | 31.15 | 39.34 |
| l + M (R50) | 24.50 | 33.07 | 21.76 | 32.18 | 18.81 | 23.66 |
| M (R50) | 36.5 | 47.56 | 34.92 | 52.61 | 31.15 | 39.34 |
| IBN R50 | 40.7 | 52.34 | 40.62 | 61.46 | 35.44 | 44.84 |
| M (IBN R50) | 36.88 | 47.53 | 35.01 | 52.79 | 30.98 | 38.98 |
TABLE VIII: Market-1501 → MSMT-17.

| Attack | OSNet mAP | OSNet R-1 | MLFN mAP | MLFN R-1 | HACNN mAP | HACNN R-1 |
|---|---|---|---|---|---|---|
| Before | 43.8 | 74.9 | 37.2 | 66.4 | 37.2 | 64.7 |
| l (R50) | 31.78 | 60.43 | 25.17 | 49.33 | 28.9 | 54.91 |
| l + M (R50) | 29.04 | 56.11 | 22.02 | 43.57 | 28.26 | 53.53 |
| M (R50) | 35.16 | 66.28 | 29.16 | 56.65 | 29.69 | 57.81 |

V Conclusion

We present a generative adversarial attack method that uses a mask and meta learning. The mask enables better transferability across different networks, whereas meta learning provides better generalizability across datasets. We present results under various settings, and our ablations show the importance of both the mask and meta learning. Extensive experiments on Market-1501, MSMT-17 and DukeMTMC-reID show the efficacy of the proposed method.

References

  • [1] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” arXiv preprint arXiv:1312.6199, 2013.
  • [2] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014.
  • [3] B. Wang, M. Zhao, W. Wang, F. Wei, Z. Qin, and K. Ren, “Are you confident that you have successfully generated adversarial examples?” TCSVT, vol. 31, no. 6, pp. 2089–2099, 2020.
  • [4] B. Wang, M. Zhao, W. Wang, X. Dai, Y. Li, and Y. Guo, “Adversarial analysis for source camera identification,” TCSVT, vol. 31, no. 11, pp. 4174–4186, 2020.
  • [5] H. Zhang, B. Chen, J. Wang, and G. Zhao, “A local perturbation generation method for gan-generated face anti-forensics,” IEEE TCSVT, 2022.
  • [6] N. Akhtar, A. Mian, N. Kardan, and M. Shah, “Advances in adversarial attacks and defenses in computer vision: A survey,” IEEE Access, vol. 9, pp. 155161–155196, 2021.
  • [7] Q. Li, X. Wang, B. Ma, X. Wang, C. Wang, S. Gao, and Y. Shi, “Concealed attack for robust watermarking based on generative model and perceptual loss,” TCSVT, 2021.
  • [8] Z. Li, Y. Shi, J. Gao, S. Wang, B. Li, P. Liang, and W. Hu, “A simple and strong baseline for universal targeted attacks on siamese visual tracking,” TCSVT, 2021.
  • [9] S. Jia, X. Li, C. Hu, G. Guo, and Z. Xu, “3d face anti-spoofing with factorized bilinear coding,” TCSVT, vol. 31, no. 10, pp. 4031–4045, 2020.
  • [10] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial machine learning at scale,” arXiv preprint arXiv:1611.01236, 2016.
  • [11] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” arXiv preprint arXiv:1706.06083, 2017.
  • [12] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in Symposium on security and privacy, 2017, pp. 39–57.
  • [13] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, “The limitations of deep learning in adversarial settings,” in European symposium on security and privacy, 2016, pp. 372–387.
  • [14] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li, “Boosting adversarial attacks with momentum,” in CVPR, 2018, pp. 9185–9193.
  • [15] F. Croce and M. Hein, “Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks,” in ICML.   PMLR, 2020, pp. 2206–2216.
  • [16] Z. Wang, H. Guo, Z. Zhang, W. Liu, Z. Qin, and K. Ren, “Feature importance-aware transferable adversarial attacks,” in ICCV, 2021.
  • [17] K. Zhou, Y. Yang, A. Cavallaro, and T. Xiang, “Omni-scale feature learning for person re-identification,” in ICCV, 2019, pp. 3702–3712.
  • [18] X. Chang, T. M. Hospedales, and T. Xiang, “Multi-level factorisation net for person re-identification,” in CVPR, 2018, pp. 2109–2118.
  • [19] Y.-J. Li, C.-S. Lin, Y.-B. Lin, and Y.-C. F. Wang, “Cross-dataset person re-identification via unsupervised pose disentanglement and adaptation,” in ICCV, 2019, pp. 7919–7929.
  • [20] Y. Yang, P. Tiwari, H. M. Pandey, Z. Lei et al., “Pixel and feature transfer fusion for unsupervised cross-dataset person reidentification,” TNNLS, 2021.
  • [21] F. Yang, Z. Zhong, H. Liu, Z. Wang, Z. Luo, S. Li, N. Sebe, and S. Satoh, “Learning to attack real-world models for person re-identification via virtual-guided meta-learning,” in AAAI, vol. 35, no. 4, 2021.
  • [22] W. Ding, X. Wei, R. Ji, X. Hong, Q. Tian, and Y. Gong, “Beyond universal person re-identification attack,” IEEE TIFS, vol. 16, 2021.
  • [23] H. Wang, G. Wang, Y. Li, D. Zhang, and L. Lin, “Transferable, controllable, and inconspicuous adversarial attacks on person re-identification with deep mis-ranking,” in CVPR, 2020, pp. 342–351.
  • [24] X. Li, J. Li, Y. Chen, S. Ye, Y. He, S. Wang, H. Su, and H. Xue, “Qair: Practical query-efficient black-box attacks for image retrieval,” in CVPR, 2021, pp. 3330–3339.
  • [25] S. Bai, Y. Li, Y. Zhou, Q. Li, and P. H. Torr, “Adversarial metric attack and defense for person re-identification,” IEEE TPAMI, vol. 43, no. 6, pp. 2119–2126, 2021.
  • [26] Z. Wang, S. Zheng, M. Song, Q. Wang, A. Rahimpour, and H. Qi, “advpattern: Physical-world attacks on deep person re-identification via adversarially transformable patterns,” in ICCV, 2019, pp. 8341–8350.
  • [27] Q. Bouniot, R. Audigier, and A. Loesch, “Vulnerability of person re-identification models to metric adversarial attacks,” in CVPR, 2020, pp. 794–795.
  • [28] C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in ICML.   PMLR, 2017, pp. 1126–1135.
  • [29] Z. Yuan, J. Zhang, Y. Jia, C. Tan, T. Xue, and S. Shan, “Meta gradient adversarial attack,” in ICCV, 2021, pp. 7748–7757.
  • [30] W. Feng, B. Wu, T. Zhang, Y. Zhang, and Y. Zhang, “Meta-attack: Class-agnostic and model-agnostic physical adversarial attack,” in ICCV, 2021, pp. 7787–7796.
  • [31] G. Parascandolo, A. Neitz, A. Orvieto, L. Gresele, and B. Schölkopf, “Learning explanations that are hard to vary,” arXiv preprint arXiv:2009.00329, 2020.
  • [32] S. Shahtalebi, J.-C. Gagnon-Audet, T. Laleh, M. Faramarzi, K. Ahuja, and I. Rish, “Sand-mask: An enhanced gradient masking strategy for the discovery of invariances in domain generalization,” arXiv preprint arXiv:2106.02266, 2021.
  • [33] F. Wang and H. Liu, “Understanding the behaviour of contrastive loss,” in CVPR, 2021, pp. 2495–2504.
  • [34] C. Xiao, B. Li, J.-Y. Zhu, W. He, M. Liu, and D. Song, “Generating adversarial examples with adversarial networks,” arXiv preprint arXiv:1801.02610, 2018.
  • [35] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in CVPR, 2017, pp. 1125–1134.
  • [36] K. Zhou, “Model zoo,” https://kaiyangzhou.github.io/deep-person-reid/MODEL_ZOO.html.
  • [37] W. Li, X. Zhu, and S. Gong, “Harmonious attention network for person re-identification,” in CVPR, 2018, pp. 2285–2294.
  • [38] X. Zhang, H. Luo, X. Fan, W. Xiang, Y. Sun, Q. Xiao, W. Jiang, C. Zhang, and J. Sun, “Alignedreid: Surpassing human-level performance in person re-identification,” arXiv preprint arXiv:1711.08184, 2017.
  • [39] H. Huang, “Alignedreid,” https://github.com/michuanhaohao/AlignedReID.
  • [40] F. Herzog, X. Ji, T. Teepe, S. Hörmann, J. Gilg, and G. Rigoll, “Lightweight multi-branch network for person re-identification,” in ICIP.   IEEE, 2021, pp. 1129–1133.
  • [41] Y. Sun, L. Zheng, Y. Yang, Q. Tian, and S. Wang, “Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline),” in ECCV, 2018, pp. 480–496.
  • [42] H. Luo, “Pcb,” https://github.com/huanghoujing/beyond-part-models.
  • [43] L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang, and Q. Tian, “Scalable person re-identification: A benchmark,” in ICCV, 2015, pp. 1116–1124.
  • [44] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, inception-resnet and the impact of residual connections on learning,” in AAAI, 2017.
  • [45] L. Wei, S. Zhang, W. Gao, and Q. Tian, “Person transfer gan to bridge domain gap for person re-identification,” in CVPR, 2018, pp. 79–88.
  • [46] Z. Zheng, L. Zheng, and Y. Yang, “Unlabeled samples generated by gan improve the person re-identification baseline in vitro,” in ICCV, 2017.
  • [47] J. Li, R. Ji, H. Liu, X. Hong, Y. Gao, and Q. Tian, “Universal perturbation attack against image retrieval,” in ICCV, 2019.
  • [48] X. Sun and L. Zheng, “Dissecting person re-identification from the viewpoint of viewpoint,” in CVPR, 2019.
  • [49] Y. Ge, F. Zhu, D. Chen, R. Zhao et al., “Self-paced contrastive learning with hybrid memory for domain adaptive object re-id,” NeurIPS, vol. 33, pp. 11309–11321, 2020.