
TkML-AP: Adversarial Attacks to Top-$k$ Multi-Label Learning

Shu Hu1,    Lipeng Ke1,     Xin Wang2,    Siwei Lyu1
1University at Buffalo, State University of New York     2Keya Medical
{shuhu, lipengke, siweilyu}@buffalo.edu,     [email protected]
Abstract

Top-$k$ multi-label learning, which returns the top-$k$ predicted labels for an input, has many practical applications such as image annotation, document analysis, and web search engines. However, the vulnerabilities of such algorithms with regard to dedicated adversarial perturbation attacks have not been extensively studied previously. In this work, we develop methods to create adversarial perturbations that can be used to attack top-$k$ multi-label learning-based image annotation systems (TkML-AP). Our methods explicitly consider the top-$k$ ranking relation and are based on novel loss functions. Experimental evaluations on large-scale benchmark datasets including PASCAL VOC and MS COCO demonstrate the effectiveness of our methods in reducing the performance of state-of-the-art top-$k$ multi-label learning methods, under both untargeted and targeted attacks.

1 Introduction

Figure 1: Illustrative examples of the untargeted and targeted attacks on the top-$3$ multi-label image annotation for an image from the PASCAL VOC 2012 dataset. The green icons correspond to the ground truth labels. The red icons represent the targeted labels for attacking. The figure is better viewed in color.

The past decade has witnessed the tour de force of modern deep neural networks (DNNs), which have significantly improved, or in some cases revolutionized, the state-of-the-art performance of many computer vision problems. Notwithstanding this tremendous success, these powerful DNN models are surprisingly vulnerable to adversarial attacks [24, 6, 12]. In particular, inputs with specially designed perturbations, commonly known as adversarial examples, can easily mislead a DNN model into making erroneous predictions. The vulnerability of DNN models to adversarial examples impedes the safe adoption of machine learning systems in practical applications. It also motivates the exploration of algorithms for generating adversarial examples [3, 17, 14] as a means to analyze the vulnerabilities of DNN models and improve their security.

Most existing works on generating adversarial examples have focused on the case of multi-class classification [1, 24, 6, 19, 3, 17], where one instance can be assigned to exactly one out of a set of mutually exclusive classes (labels). Because each instance has a single label, existing adversarial perturbation generation schemes for multi-class classification are based on the top-1 attack (e.g., C&W [3], DeepFool [17]), aiming only to alter the top predicted label using the adversarial perturbation.

However, in many real-world applications such as image annotation, document categorization, and web search engines, it is more natural to solve the multi-label learning problem, where an instance is associated with a non-empty subset of labels. Furthermore, in these applications, the output of the system is usually a set of labels of a fixed size, corresponding to the top-$k$ predicted labels. We term this top-$k$ multi-label learning (TkML). The practical cases of TkML open more opportunities for attackers and lead to larger uncertainties for defenders. There are two common settings that we consider subsequently for TkML adversarial attacks. The untargeted attack aims only to replace the top-$k$ labels with a set of arbitrary $k$ labels that are not true classes of the untampered input. The targeted attack, on the other hand, aims to coerce the TkML classifier into using a specific set of $k$ labels that are not true classes of the input as the top-$k$ predictions.

In this work, we describe the first untargeted and targeted adversarial attack algorithms for TkML based on a continuous formulation of the ranking operation, which we term TkML-AP. Specifically, we note that to perturb the predictions of a TkML algorithm, it is sufficient to clear the ground-truth labels from the top-$k$ set. There are many different ways to achieve this, but we focus on ones that take the "least action", i.e., perturbing the predicted labels with minimum changes to the original label rankings. For the untargeted attack, this means moving the ground-truth labels out of the top-$k$ predictions, and for the targeted attack, this means moving the target labels into the top-$k$ set. Fig.1 gives an illustrative explanation of the proposed idea.

Thus, the key challenge in generating adversarial examples for TkML is to optimize perturbations that can change the top-$k$ ranking of the predicted labels. To this end, we introduce a reformulation of the top-$k$ sum that lends itself to efficient numerical algorithms based on gradient descent methods. In particular, we provide loss functions for adversarial perturbations to TkML that are convex in terms of the individual prediction scores. This has the further advantage that, even though the model may be nonlinear, a convex loss function can encourage many equally effective local optima: any adversarial perturbation that leads the model to the same loss value will have equal effect. We demonstrate the effectiveness of our method in attacking state-of-the-art TkML algorithms using large-scale benchmark datasets (PASCAL VOC 2012 [4] and MS COCO 2014 [11]). The main contributions of our work can be summarized as follows:

  1. We present the first algorithms for untargeted and targeted adversarial attacks to the TkML problem.

  2. Our method is based on a continuous reformulation of the non-differentiable ranking operation. The objective function is convex in terms of the individual prediction scores, which is easier to optimize.

  3. Numerical experiments on large-scale benchmark datasets confirm the effectiveness of our method in attacking state-of-the-art TkML algorithms.

2 Background

2.1 Top-$k$ Multi-label Learning (TkML)

Let us assume a general multi-label classification problem with a total of $m>1$ possible labels. For an input $\mathbf{x}\in\mathbb{R}^{d}$, its true labels are represented by a binary label vector $\mathbf{y}=[y_{1},y_{2},\cdots,y_{m}]^{\top}\in\{0,1\}^{m}$, with $y_{j}=1$ indicating that $\mathbf{x}$ is associated with the $j$-th label. We also use $Y=\{j\,|\,y_{j}=1\}$ to represent the set of true labels of $\mathbf{x}$. Note that $Y$ and $\mathbf{y}$ are equivalent notations of the true labels of $\mathbf{x}$.

We introduce a continuous multi-label prediction function $F(\mathbf{x})=[f_{1}(\mathbf{x}),f_{2}(\mathbf{x}),\cdots,f_{m}(\mathbf{x})]^{\top}$, with each $f_{j}(\mathbf{x})\in[0,1]$ corresponding to the prediction score of $\mathbf{x}$ with regard to the $j$-th class. (Here we assume the prediction scores are calibrated, i.e., taking values in the range $[0,1]$. For $f\in\mathbb{R}$, we can use a simple transform such as ${1\over 1+e^{-f}}$ to map it to the range $[0,1]$ without changing the ranking.) We denote $[f_{[1]}(\mathbf{x}),f_{[2]}(\mathbf{x}),\cdots,f_{[m]}(\mathbf{x})]^{\top}$ as the sorted values of $F(\mathbf{x})$ in descending order, i.e., $f_{[1]}(\mathbf{x})$ is the largest (top-$1$) score, $f_{[2]}(\mathbf{x})$ is the second largest (top-$2$) score, and so on. Furthermore, $[j]$ corresponds to the label index of the top-$j$ prediction score, i.e., $j^{\prime}=[j]$ if $f_{j^{\prime}}(\mathbf{x})=f_{[j]}(\mathbf{x})$. In ranking the values, ties can be broken in any consistent way. For input $\mathbf{x}$, the top-$k$ multi-label classifier returns the set $\hat{Y}_{k}(\mathbf{x})=\{[1],\cdots,[k]\}$ for $1\leq k<m$. In other words, we can convert a general multi-label predictor $F$ to a top-$k$ multi-label classifier by returning the set of labels corresponding to the top-$k$ prediction scores of $F(\mathbf{x})$. This problem is related to many types of learning problems. If $|Y|=1$ and $k=1$, it becomes the conventional multi-class problem. If $|Y|=1$ and $k\geq 1$, it becomes the top-$k$ multi-class problem [10]. If $k=|Y|$ and $|Y|\geq 1$, it becomes the conventional multi-label problem. The top-$k$ setting is often implicitly used in applications of multi-label learning. For instance, in image annotation, when the number of possible labels is large, the system often returns a fixed number of top annotations that are most relevant to the image. (Another strategy in multi-label learning is to return all labels with prediction scores above a preset threshold, which yields a list of labels of varying length. Top-$k$ multi-label classification can be regarded as using a varying threshold to fix the number of returned labels.)
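To make the notation concrete, the conversion from a generic multi-label predictor to a top-$k$ multi-label classifier can be sketched in a few lines of Python; the score values below are made up purely for illustration:

```python
import numpy as np

def topk_labels(scores, k):
    """Return the set of indices of the k largest prediction scores (ties broken arbitrarily)."""
    return set(np.argsort(-scores)[:k].tolist())

# toy example with m = 5 labels and k = 3
scores = np.array([0.9, 0.2, 0.7, 0.4, 0.6])   # f_1(x), ..., f_m(x)
print(topk_labels(scores, k=3))                 # {0, 2, 4}: the top-3 label indices
```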

A successful top-$k$ multi-label classification should lead to consistency between the true labels ($Y$) and the predicted labels $\hat{Y}_{k}(\mathbf{x})$ of the input. The situation is complicated by the difference in size between $Y$ and $\hat{Y}_{k}(\mathbf{x})$, so we use the following criterion: when $k\geq|Y|$, it corresponds to $Y\subseteq\hat{Y}_{k}(\mathbf{x})$; when $k\leq|Y|$, it is the case that $\hat{Y}_{k}(\mathbf{x})\subseteq Y$. In other words, one is a subset of the other depending on the relation between $k$ and the number of true labels. We define the top-$k$ label consistency score as:

$E(Y,\hat{Y}_{k}(\mathbf{x}))=\mathbb{I}_{Y\subset\hat{Y}_{k}(\mathbf{x})}+\mathbb{I}_{\hat{Y}_{k}(\mathbf{x})\subset Y}+\mathbb{I}_{Y=\hat{Y}_{k}(\mathbf{x})},$ (1)

where $\mathbb{I}_{c}$ is the indicator function that takes value 1 when condition $c$ is true and 0 otherwise. As such, $E(Y,\hat{Y}_{k}(\mathbf{x}))$ is 1 for a successful top-$k$ multi-label classification of input $\mathbf{x}$ and 0 otherwise.
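The consistency score of Eq.(1) is a simple set comparison; a minimal Python sketch follows (the function name is ours, not from the released code):

```python
def consistency_score(Y, Y_hat):
    """Top-k label consistency E(Y, Y_hat) of Eq.(1): 1 if one label set contains the other, 0 otherwise."""
    Y, Y_hat = set(Y), set(Y_hat)
    return int(Y < Y_hat or Y_hat < Y or Y == Y_hat)

print(consistency_score({1, 3}, {1, 3, 7}))   # 1: all true labels are inside the top-k set
print(consistency_score({1, 3}, {2, 5, 7}))   # 0: no containment, i.e., the prediction is inconsistent
```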

2.2 Top-$k$ and Average Top-$k$

Top-$k$ ranking emerges as a natural element in the learning objectives of various problems such as multi-class learning and robust binary classification [10, 13, 8]. However, as a function of all elements in a set, the top-$k$ ranking function is non-continuous, non-differentiable, and non-convex. This makes optimization involving top-$k$ ranking challenging.

To mitigate these problems of the top-$k$ operator, we can use the average of top-$k$ function [5], which is defined for a set $F=\{f_{1},\cdots,f_{m}\}$ as

$\phi_{k}(F)={1\over k}\sum_{j=1}^{k}f_{[j]}.$ (2)

It is not difficult to show that (i) $\phi_{k}(F)\geq f_{[k]}$, and (ii) $\phi_{k}(F)=f_{[k]}$ when $f_{[1]}=\cdots=f_{[k]}$. As such, the average of top-$k$ is a tight upper bound of the top-$k$. It can be proved that $\phi_{k}(F)$ is a convex function of the elements of $F$ [2]. More importantly, it affords an equivalent form as an optimization problem [18].

Lemma 1.

For $f_{i}(\mathbf{x})\in[0,1]$, we have

$\phi_{k}(F)={1\over k}\min_{\lambda\in[0,1]}\Big\{k\lambda+\sum_{j=1}^{m}[f_{j}-\lambda]_{+}\Big\},$ (3)

$f_{[k]}\in\arg\min_{\lambda\in[0,1]}\Big\{k\lambda+\sum_{j=1}^{m}[f_{j}-\lambda]_{+}\Big\},$ (4)

where $[a]_{+}=\max\{0,a\}$ is the hinge function.

For completeness, we include the proof of Lemma 1 in Appendix A.1. Lemma 1 enables us to incorporate the average top-$k$ function in conventional sub-gradient based optimization.
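As a quick numerical sanity check of Lemma 1 (not part of the original paper), the average top-$k$ of Eq.(2) and the variational form of Eq.(3) can be compared on random scores in $[0,1]$:

```python
import numpy as np

def avg_topk(f, k):
    """Average of the k largest values, Eq.(2)."""
    return np.sort(f)[::-1][:k].mean()

def avg_topk_variational(f, k, grid=100001):
    """Right-hand side of Eq.(3), minimizing over lambda on a dense grid in [0, 1]."""
    lams = np.linspace(0.0, 1.0, grid)
    vals = [(k * lam + np.maximum(f - lam, 0.0).sum()) / k for lam in lams]
    return min(vals)

f = np.random.rand(20)   # prediction scores in [0, 1]
print(avg_topk(f, 5), avg_topk_variational(f, 5))   # the two values agree up to the grid resolution
```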

2.3 Related Works

Due to space limitations, we only provide a brief overview of related works. A full survey of adversarial attacks on deep learning models can be found in [1]. The major differences between the work in this paper and related works are summarized in Table 1.

Most existing adversarial attack methods target multi-class classification problems (corresponding to the special case of TkML with $k=1$ and $|Y|=1$ for all inputs). As such, these methods often target the top prediction and aim to change it with perturbations. For untargeted attacks, DeepFool [17] is a generalization of the minimum attack under general decision boundaries obtained by swapping labels from the top-$2$ predictions. The work of [16] (UAP) aims to find universal adversarial perturbations that are independent of individual input images. Both DeepFool and UAP are top-1 multi-class adversarial attack methods. For targeted attacks, FGSM [6] and I-FGSM [9] are two popular attack schemes that use the gradient of the DNN model with regard to the input to generate adversarial samples. The CW method [3] improves on these methods by using regularization and modified constraints.

Realizing that only attacking the top prediction may not be effective, several works introduce attacks on the top-$k$ (for $k>1$) predictions of a multi-class classification system. $k$Fool [25] and CWk [26] extend the original DeepFool [17] and CW [3] methods to exclude the true label from the top-$k$ predictions. $k$Fool is based on a geometric view of the decision boundary between $k$ labels and the true label in the multi-class problem. The UAP method is extended in [25] to top-$k$ Universal Adversarial Perturbations ($k$UAPs). In addition, the CW method is extended to a top-$k$ version known as CWk in [26]. However, all these methods are still designed for multi-class classification (i.e., $|Y|=1$) and cannot be directly adapted to attacks on the more general top-$k$ multi-label learning.

The authors of [22] describe an adversarial attack on multi-label classification that extends existing attacks on multi-class classification. This method is further studied in [27], which transforms the attack generation problem into a linear programming problem. To make the predictions of adversarial examples lie inside the training data distribution, [15] proposed a multi-label attack procedure with an additional domain knowledge-constrained classifier. These are all for multi-label learning without the top-$k$ constraint. Our experiments in Section 4 show that they are not effective in the top-$k$ setting.

Methods | Multi-Label | Untargeted Attack | Universal Attack | Targeted Attack | Top-$k$
$k$Fool [25] | × | ✓ | × | × | ✓
$k$UAPs [25] | × | ✓ | ✓ | × | ✓
CWk [26] | × | × | × | ✓ | ✓
ML-AP [22] | ✓ | × | × | ✓ | ×
TkML-AP-U (this paper) | ✓ | ✓ | × | × | ✓
TkML-AP-Uv (this paper) | ✓ | ✓ | ✓ | × | ✓
TkML-AP-T (this paper) | ✓ | × | × | ✓ | ✓
Table 1: Summary of the differences between previous works and our methods (TkML-AP).

3 Method

In this work, we introduce new methods to generate adversarial perturbations that attack top-$k$ multi-label classification. We term our method TkML-AP. Unlike the multi-label adversarial perturbation method in [22], we treat the top-$k$ ranking in the TkML problem as an essential requirement in designing the loss functions of our methods. Hence, in our methods, the ranking relation is explicitly handled using the results in Section 2.2. Specifically, we describe our methods in detail for the instance-specific and instance-independent (universal) untargeted attacks (TkML-AP-U and TkML-AP-Uv) and the targeted attack (TkML-AP-T). A comparison with previous works is given in Table 1.

3.1 Untargeted Attack

Formulation. The untargeted attack on top-$k$ multi-label learning (TkML-AP-U) aims to find a minimum perturbation to the input that can push the prediction scores of the true labels outside of the top-$k$ set. It can be formulated as finding a perturbation signal $\mathbf{z}$ for an input $\mathbf{x}$ such that

$\min_{\mathbf{z}}\|\mathbf{z}\|_{2},~\text{s.t.}~E(Y,\hat{Y}_{k}(\mathbf{x}+\mathbf{z}))=0,$ (5)

where $E(Y,\hat{Y}_{k})$ is defined in Eq.(1). (Note that the definition is minimal: it only changes labels that are correctly predicted by $F(\mathbf{x})$, i.e., $Y\cap\hat{Y}_{k}(\mathbf{x})$. True labels that are incorrectly predicted by $F$ and not in $\hat{Y}_{k}(\mathbf{x})$ are expected to remain intact.) Because $k\leq m-1$, we can rewrite Eq.(5) in a more revealing equivalent form,

$\min_{\mathbf{z}}{1\over 2}\|\mathbf{z}\|_{2}^{2},~\text{s.t.}~\max_{j\in Y}f_{j}(\mathbf{x}+\mathbf{z})\leq f_{[k+1]}(\mathbf{x}+\mathbf{z}).$ (6)

Note that the constraint is equivalent to having $Y\subseteq\{j\,|\,f_{j}(\mathbf{x}+\mathbf{z})<f_{[k]}(\mathbf{x}+\mathbf{z})\}$, the negation of which means that at least one true label is inside the top-$k$ range.

Relaxation. The optimization problem in Eq.(6) is difficult to optimize directly, so we introduce the top-$k$ multi-label loss, $\left[\max_{j\in Y}f_{j}(\mathbf{x}+\mathbf{z})-f_{[k+1]}(\mathbf{x}+\mathbf{z})\right]_{+}$, as a surrogate for the constraint. The top-$k$ multi-label loss precisely reflects the requirement to exclude the true labels from the top-$k$ range: it is zero when the maximal prediction score over the true labels is no greater than the $(k+1)$-th prediction score, and positive otherwise. Rewriting the objective function in Lagrangian form, we have

$\min_{\mathbf{z}}{\beta\over 2}\|\mathbf{z}\|_{2}^{2}+\left[\max_{j\in Y}f_{j}(\mathbf{x}+\mathbf{z})-f_{[k+1]}(\mathbf{x}+\mathbf{z})\right]_{+},$ (7)

where $\beta>0$ is a pre-chosen trade-off parameter.

Optimization. The ranking operation in Eq.(7) can be further removed. Specifically, denote $t_{m+1-j}=[\max_{y\in Y}f_{y}(\mathbf{x}+\mathbf{z})-f_{j}(\mathbf{x}+\mathbf{z})]_{+}$ for $j=1,\cdots,m$. With a slight abuse of notation, we denote $t_{[j]}$ as the top-$j$ element of the set $\{t_{1},\cdots,t_{m}\}$. (Note that $[j]$ in $t_{[j]}$ may not correspond to the same index as in the case of $f_{[j]}$, as it depends on the ranking of different sets.) Note that there is a simple correspondence: $t_{[m-k]}=[\max_{y\in Y}f_{y}(\mathbf{x}+\mathbf{z})-f_{[k+1]}(\mathbf{x}+\mathbf{z})]_{+}$.

As shown in Section 2.2, we have the following bound of the top-$(m-k)$ value in terms of the average of $\{t_{1},\cdots,t_{m}\}$: ${1\over m-k}\sum_{j=1}^{m-k}t_{[j]}\geq t_{[m-k]}=[\max_{y\in Y}f_{y}(\mathbf{x}+\mathbf{z})-f_{[k+1]}(\mathbf{x}+\mathbf{z})]_{+}$. Furthermore, using Lemma 1, we can rewrite the average of the top $(m-k)$ elements of $\{t_{1},\cdots,t_{m}\}$ as ${1\over m-k}\sum_{j=1}^{m-k}t_{[j]}=\min_{\lambda\in[0,1]}\lambda+{1\over m-k}\sum_{j=1}^{m}[t_{j}-\lambda]_{+}$. Substituting the definition of $t_{j}$, the inner term $[[\max_{y\in Y}f_{y}(\mathbf{x}+\mathbf{z})-f_{[m+1-j]}(\mathbf{x}+\mathbf{z})]_{+}-\lambda]_{+}$ can be further simplified by removing the double hinge function, yielding $[\max_{y\in Y}f_{y}(\mathbf{x}+\mathbf{z})-f_{[j]}(\mathbf{x}+\mathbf{z})-\lambda]_{+}$, according to the following result.

Lemma 2.

For $\lambda\geq 0$, $[[a-x]_{+}-\lambda]_{+}=[a-x-\lambda]_{+}$.

The proof of Lemma 2 is deferred to Appendix A.2. Putting all results together, we obtain the objective function for finding the adversarial perturbation in an untargeted attack on top-$k$ multi-label learning as:

$\min_{\lambda\in[0,1],\mathbf{z}}{\beta\over 2}\|\mathbf{z}\|_{2}^{2}+\lambda+{1\over m-k}\sum_{j=1}^{m}\Big[\max_{y\in Y}f_{y}(\mathbf{x}+\mathbf{z})-f_{j}(\mathbf{x}+\mathbf{z})-\lambda\Big]_{+}.$ (8)

This optimization problem can be solved with an iterative gradient descent approach [8, 5]. We initialize $\mathbf{z}$ and $\lambda$, then update them with the following steps:

$\mathbf{z}_{l+1}=(1-\beta\eta_{l})\mathbf{z}_{l}-\frac{\eta_{l}}{m-k}\sum_{j=1}^{m}\left(\frac{\partial f_{y^{\prime}}(\mathbf{x}^{\prime})}{\partial\mathbf{x}^{\prime}}-\frac{\partial f_{j}(\mathbf{x}^{\prime})}{\partial\mathbf{x}^{\prime}}\right)\Bigg|_{\mathbf{x}^{\prime}=\mathbf{x}+\mathbf{z}_{l}}\cdot\mathbb{I}_{[f_{y^{\prime}}(\mathbf{x}+\mathbf{z}_{l})-f_{j}(\mathbf{x}+\mathbf{z}_{l})>\lambda_{l}]},$ (9)

$\lambda_{l+1}=\lambda_{l}-\eta_{l}\cdot\Big(1-\frac{1}{m-k}\sum_{j=1}^{m}\mathbb{I}_{[f_{y^{\prime}}(\mathbf{x}+\mathbf{z}_{l})-f_{j}(\mathbf{x}+\mathbf{z}_{l})>\lambda_{l}]}\Big),$

where $\eta_{l}$ is the step size and $y^{\prime}=\arg\max_{y\in Y}f_{y}(\mathbf{x}+\mathbf{z}_{l})$. This iterative process continues until the termination conditions are met. The overall procedure is described in detail in Algorithm 1.

Input: $\mathbf{x}$, predictor $F$, $k$, $\eta_{l}$, $\beta$
Output: adversarial example $\mathbf{x}^{*}$, perturbation $\mathbf{z}^{*}$
Initialization: $l=0$, $\mathbf{x}^{*}=\mathbf{x}$, $\mathbf{z}_{0}$, and $\lambda_{0}$
while $E(Y,\hat{Y}_{k}(\mathbf{x}+\mathbf{z}))\neq 0$ do
      Compute $\mathbf{z}_{l+1}$ and $\lambda_{l+1}$ with Eq.(9);
      $\mathbf{x}^{*}=\mathbf{x}+\mathbf{z}_{l+1}$, $\mathbf{z}^{*}=\mathbf{z}_{l+1}$;
      $l=l+1$;
end while
return $\mathbf{x}^{*}$, $\mathbf{z}^{*}$
Algorithm 1 Untargeted Attack (TkML-AP-U)
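A minimal PyTorch sketch of this procedure is given below. It optimizes the relaxed objective of Eq.(8) directly with autograd, which follows the same subgradient directions as the hand-derived updates in Eq.(9); the model interface, hyper-parameter values, and function name are placeholders rather than the authors' released implementation.

```python
import torch

def tkml_ap_u(model, x, true_labels, k, beta=1e-3, lr=0.01, max_iter=1000):
    """Untargeted top-k attack: minimize Eq.(8) over the perturbation z and lambda.
    x is a 1 x C x H x W image tensor; true_labels is the set Y of ground-truth label indices."""
    z = torch.zeros_like(x, requires_grad=True)
    lam = torch.zeros((), requires_grad=True)
    opt = torch.optim.SGD([z, lam], lr=lr)
    for _ in range(max_iter):
        f = torch.sigmoid(model(x + z)).squeeze(0)            # calibrated scores f_1(x+z), ..., f_m(x+z)
        m = f.numel()
        # stop once no true label remains in the top-k set, i.e., E(Y, Y_hat_k(x+z)) = 0
        if set(torch.topk(f, k).indices.tolist()).isdisjoint(true_labels):
            break
        f_true_max = f[list(true_labels)].max()                # max_{y in Y} f_y(x+z)
        hinge = torch.clamp(f_true_max - f - lam, min=0.0)     # [max_y f_y - f_j - lambda]_+ for all j
        loss = 0.5 * beta * z.pow(2).sum() + lam + hinge.sum() / (m - k)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            lam.clamp_(0.0, 1.0)                               # keep lambda in [0, 1]
    return (x + z).detach(), z.detach()
```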

Universal untargeted attack. We can extend the instance-specific untargeted attack to a universal adversarial attack that is independent of the input [16] and can therefore be shared by all instances. Specifically, given a dataset $\textbf{X}=\{\mathbf{x}_{1},\cdots,\mathbf{x}_{n}\}$ and its ground truth label sets $Y=\{Y_{1},\cdots,Y_{n}\}$, where $Y_{i}:=Y(\mathbf{x}_{i})$, finding the instance-independent (universal) adversarial perturbation $\mathbf{z}$ is formulated as $\min_{\lambda\in[0,1],\mathbf{z}}{1\over n}\sum_{i=1}^{n}L(\mathbf{z},\lambda;\mathbf{x}_{i},Y_{i})$, where $L(\mathbf{z},\lambda;\mathbf{x}_{i},Y_{i})$ is the objective function in Eq.(8). The solution to the universal untargeted attack can be obtained using a similar procedure based on Algorithm 1. Please refer to Appendix B.3 for the details of the TkML-AP-Uv algorithm.

3.2 Targeted Attack

Formulation. We next consider the targeted attack, the aim of which is to plant a set of $k$ labels $\widetilde{Y}\subset\{1,\cdots,m\}$ with $\widetilde{Y}\cap Y=\varnothing$ as the top-$k$ predictions. We formulate the learning objective of the targeted attack on top-$k$ multi-label learning (TkML-AP-T) as

$\min_{\mathbf{z}}\|\mathbf{z}\|_{2},~\text{s.t.}~\widetilde{Y}=\hat{Y}_{k}(\mathbf{x}+\mathbf{z}).$ (10)

The constraint in Eq.(10) exactly reflects the requirement that the top-$k$ predicted labels of the perturbed input are all from the targeted label set.

Relaxation. Analogous to the untargeted attack, we rewrite the objective function into a form that lends itself to optimization. Specifically, if we have $\widetilde{Y}=\hat{Y}_{k}(\mathbf{x}+\mathbf{z})$, it means that the prediction scores of labels in $\widetilde{Y}$ occupy the top-$k$ ranks. So the sums of prediction scores over the sets $\widetilde{Y}$ and $\hat{Y}_{k}(\mathbf{x}+\mathbf{z})$ are also the same, i.e., we have $\sum_{j=1}^{k}f_{[j]}(\mathbf{x}+\mathbf{z})-\sum_{j\in\widetilde{Y}}f_{j}(\mathbf{x}+\mathbf{z})=0$. Furthermore, if $\widetilde{Y}\not=\hat{Y}_{k}(\mathbf{x}+\mathbf{z})$, then $\sum_{j=1}^{k}f_{[j]}(\mathbf{x}+\mathbf{z})-\sum_{j\in\widetilde{Y}}f_{j}(\mathbf{x}+\mathbf{z})\geq 0$ since, by definition, the second term cannot be greater than the first. This suggests that $\sum_{j=1}^{k}f_{[j]}(\mathbf{x}+\mathbf{z})-\sum_{j\in\widetilde{Y}}f_{j}(\mathbf{x}+\mathbf{z})$ is a surrogate for the constraint in Eq.(10): it is zero when all targeted labels are in the top-$k$ positions, and positive otherwise. Introducing the Lagrangian form, we can reformulate Eq.(10) as

$\min_{\mathbf{z}}{\beta\over 2}\|\mathbf{z}\|_{2}^{2}+\sum_{j=1}^{k}f_{[j]}(\mathbf{x}+\mathbf{z})-\sum_{j\in\widetilde{Y}}f_{j}(\mathbf{x}+\mathbf{z}),$ (11)

where $\beta>0$ is a pre-chosen trade-off parameter. Note that the second term in Eq.(11) is precisely the sum of the top-$k$ elements. Using the result of Lemma 1, we can remove the explicit ranking operation in Eq.(11). Specifically, we have $\sum_{j=1}^{k}f_{[j]}(\mathbf{x}+\mathbf{z})=\min_{\lambda\in[0,1]}\{k\lambda+\sum_{j=1}^{m}[f_{j}(\mathbf{x}+\mathbf{z})-\lambda]_{+}\}$. Further simplifying the last two terms in Eq.(11) yields

$\Big\{k\lambda+\sum_{j=1}^{m}[f_{j}(\mathbf{x}+\mathbf{z})-\lambda]_{+}\Big\}-\sum_{j\in\widetilde{Y}}f_{j}(\mathbf{x}+\mathbf{z})$
$=\sum_{j\in\widetilde{Y}}\Big([f_{j}(\mathbf{x}+\mathbf{z})-\lambda]_{+}-(f_{j}(\mathbf{x}+\mathbf{z})-\lambda)\Big)+\sum_{j\not\in\widetilde{Y}}[f_{j}(\mathbf{x}+\mathbf{z})-\lambda]_{+}$
$=\sum_{j\in\widetilde{Y}}[\lambda-f_{j}(\mathbf{x}+\mathbf{z})]_{+}+\sum_{j\not\in\widetilde{Y}}[f_{j}(\mathbf{x}+\mathbf{z})-\lambda]_{+},$

where we use the fact that $[a]_{+}-a=[-a]_{+}$. Introducing $s_{j}=2\mathbb{I}_{j\in\widetilde{Y}}-1\in\{-1,1\}$, we can rewrite Eq.(11) more concisely as

$\min_{\lambda\in[0,1],\mathbf{z}}{\beta\over 2}\|\mathbf{z}\|_{2}^{2}+\sum_{j=1}^{m}[s_{j}(\lambda-f_{j}(\mathbf{x}+\mathbf{z}))]_{+}.$ (12)

Optimization. This optimization problem can also be solved with an iterative gradient descent approach, as in the untargeted attack. We initialize $\mathbf{z}$ and $\lambda$, then update them with the following steps:

$\mathbf{z}_{l+1}=(1-\beta\eta_{l})\mathbf{z}_{l}-\eta_{l}\sum_{j=1}^{m}(-s_{j})\left.\frac{\partial f_{j}(\mathbf{x}^{\prime})}{\partial\mathbf{x}^{\prime}}\right|_{\mathbf{x}^{\prime}=\mathbf{x}+\mathbf{z}_{l}}\cdot\mathbb{I}_{[s_{j}(\lambda_{l}-f_{j}(\mathbf{x}+\mathbf{z}_{l}))>0]},$ (13)

$\lambda_{l+1}=\lambda_{l}-\eta_{l}\sum_{j=1}^{m}s_{j}\cdot\mathbb{I}_{[s_{j}(\lambda_{l}-f_{j}(\mathbf{x}+\mathbf{z}_{l}))>0]},$

where $\eta_{l}$ is the step size. The overall procedure is described in detail in Algorithm 2. The algorithm stops when the termination conditions are met.

Input: $\mathbf{x}$, predictor $F$, $\widetilde{Y}$, max_iter, $\eta_{l}$, $\beta$
Output: adversarial example $\mathbf{x}^{*}$, perturbation $\mathbf{z}^{*}$
Initialization: $l=0$, $\mathbf{x}^{*}=\mathbf{x}$, $\mathbf{z}_{0}$, and $\lambda_{0}$
while $l\leq$ max_iter do
      Compute $\mathbf{z}_{l+1}$ and $\lambda_{l+1}$ with Eq.(13);
      $\mathbf{x}^{*}=\mathbf{x}+\mathbf{z}_{l+1}$, $\mathbf{z}^{*}=\mathbf{z}_{l+1}$;
      $l=l+1$;
end while
return $\mathbf{x}^{*}$, $\mathbf{z}^{*}$
Algorithm 2 Targeted Attack (TkML-AP-T)
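The targeted objective of Eq.(12) admits an equally compact PyTorch sketch; as before, autograd stands in for the explicit updates of Eq.(13), and the names and hyper-parameter values are only placeholders:

```python
import torch

def tkml_ap_t(model, x, target_labels, k, beta=1e-3, lr=0.01, max_iter=1000):
    """Targeted top-k attack: minimize Eq.(12) so that the labels in target_labels occupy the top-k set."""
    z = torch.zeros_like(x, requires_grad=True)
    lam = torch.zeros((), requires_grad=True)
    opt = torch.optim.SGD([z, lam], lr=lr)
    for _ in range(max_iter):
        f = torch.sigmoid(model(x + z)).squeeze(0)      # calibrated scores f_1(x+z), ..., f_m(x+z)
        s = -torch.ones_like(f)
        s[list(target_labels)] = 1.0                     # s_j = +1 for targeted labels, -1 otherwise
        loss = 0.5 * beta * z.pow(2).sum() + torch.clamp(s * (lam - f), min=0.0).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            lam.clamp_(0.0, 1.0)                         # keep lambda in [0, 1]
    return (x + z).detach(), z.detach()
```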

4 Experiments

We evaluate the performance of the proposed adversarial attacks (i.e., TkML-AP-U, TkML-AP-Uv, and TkML-AP-T) on the practical problem of image annotation, the goal of which is to predict the labels of an input image. Due to space limitations, we present the most significant information and results of our experiments, with more detailed information and additional results in the supplementary materials. (Code: https://github.com/discovershu/TKML-AP.)

4.1 Experimental Settings

Datasets and baseline models. Our experiments are based on two popular large-scale image annotation datasets, namely PASCAL VOC 2012 [4] and MS COCO 2014 [11]. Both datasets have multiple true labels associated with each image: the average numbers of positive labels per instance in PASCAL VOC 2012 and MS COCO 2014 are 1.43 (out of 20) and 3.67 (out of 80), respectively. All RGB images have pixel intensities in the range $\{0,1,\cdots,255\}$.

On the two datasets, we train deep neural network-based top-$k$ multi-label classifiers as baseline models. For the PASCAL VOC 2012 dataset, similar to [22], we adopt the Inception-v3 [23] model pre-trained on ImageNet [21] and fine-tuned on PASCAL VOC 2012. For the MS COCO 2014 dataset, we use a ResNet50 [7] based model. Both models are originally designed for multi-class classification, so we convert them to multi-label models by replacing the softmax layer with a sigmoid classification layer as in [20]. (After retraining the models, we obtain 0.934 mAP on PASCAL VOC 2012 and 0.867 mAP on MS COCO 2014 on the corresponding validation sets, which are close to the state-of-the-art performance [22, 20].) We further modify the models to output the top-$k$ predicted labels.
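The conversion from a multi-class backbone to a top-$k$ multi-label annotator amounts to swapping the softmax for per-class sigmoid scores and reading off the $k$ highest-scoring labels; a schematic PyTorch sketch is shown below (the ResNet50 backbone and 80 classes follow the MS COCO setup; the rest is illustrative and not the exact training code):

```python
import torch
import torchvision

backbone = torchvision.models.resnet50()                      # in the paper: pre-trained, then fine-tuned
backbone.fc = torch.nn.Linear(backbone.fc.in_features, 80)    # 80 labels for MS COCO 2014

def topk_annotate(image_batch, k=3):
    logits = backbone(image_batch)                # shape: (batch, 80)
    scores = torch.sigmoid(logits)                # independent per-label scores in [0, 1]
    return torch.topk(scores, k, dim=1).indices   # top-k predicted label indices per image
```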

We select 1,000 images from the validation sets of PASCAL VOC 2012 and MS COCO 2014, respectively, as the test sets for the untargeted and targeted attack methods. These images are correctly predicted by the baseline TkML models, i.e., the predicted top-$k$ labels either contain or are completely from the true labels. For the universal untargeted attack, however, we need a training dataset to find the universal perturbation. Therefore, we select 3,000 images from the validation set of MS COCO 2014 as the training set and evaluate the attack performance on another 1,000 images from the same validation set. For the targeted attacks, we choose the target labels as in [3, 26], where we consider three different strategies (see Fig.4 for more details).

  • Best Case. In this case, we select the $k$ labels that are not true labels and have the highest prediction scores. These labels are the runner-ups and are regarded as the easiest labels to attack.

  • Random Case. In this case, we randomly select $k$ labels that are not true labels following a uniform distribution.

  • Worst Case. In this case, we select the $k$ labels that are not true labels and have the lowest prediction scores. These labels are the most difficult to attack.

Evaluation metrics. For the instance-specific untargeted and targeted attacks, we use the attack success rate (ASR) as an evaluation metric of the attack performance, which is defined as

$\mbox{ASR}=1-\frac{1}{n}\sum_{i=1}^{n}E(Y(\mathbf{x}_{i}),\hat{Y}_{k}(\mathbf{x}_{i}+\mathbf{z}_{i})),$ (14)

where $n$ is the number of evaluation data points. Higher ASR values indicate better attack performance. This metric extends the one used in [25] for multi-class classification ($|Y|=1$). In the universal untargeted attack, we use a slightly different ASR definition to reflect that the perturbation is shared by all instances: $\mbox{ASR}=1-\frac{1}{n}\sum_{i=1}^{n}E(Y(\mathbf{x}_{i}),\hat{Y}_{k}(\mathbf{x}_{i}+\mathbf{z})).$ To evaluate the perceptual quality, we define the average per-pixel perturbation over all successful attacks as

$\mbox{Pert}=\frac{1}{n\cdot\mbox{ASR}}\sum_{i=1}^{n}\frac{\|\mathbf{z}_{i}\|_{2}\,\big(1-E(Y(\mathbf{x}_{i}),\hat{Y}_{k}(\mathbf{x}_{i}+\mathbf{z}_{i}))\big)}{\#\text{ of pixels of }\mathbf{x}_{i}}.$ (15)

A lower Pert value means that the perturbation is less perceptible. The hyper-parameter $\beta$ is chosen to achieve a good trade-off between ASR and Pert.
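For reference, both metrics can be computed directly from the per-image consistency scores; a hypothetical helper (not the released evaluation code) is sketched below:

```python
import numpy as np

def asr_and_pert(E_values, perturbations, pixel_counts):
    """ASR of Eq.(14) and Pert of Eq.(15) from per-image consistency scores E_i (0 or 1),
    perturbation arrays z_i, and per-image pixel counts. E_i = 0 marks a successful attack."""
    E = np.asarray(E_values, dtype=float)
    asr = 1.0 - E.mean()
    per_image = [np.linalg.norm(z.ravel()) * (1.0 - e) / n_pix
                 for z, e, n_pix in zip(perturbations, E, pixel_counts)]
    pert = np.sum(per_image) / (len(E) * asr) if asr > 0 else 0.0
    return asr, pert
```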

Compared Methods. We use experiments to test the practical performance of TkML-AP. However, as there are no dedicated adversarial perturbation generation methods for top-$k$ multi-label learning, we adapt several existing adversarial attacks designed for general multi-label or multi-class learning as comparison baselines. Specifically, we use the following methods.

  • Untargeted attack ($k$Fool): We replace the prediction score of one ground-truth label in the $k$Fool [25] algorithm with the maximum prediction score among all ground-truth labels to obtain an untargeted attack baseline.

  • Universal attack ($k$UAPs): We use $k$UAPs from [25], replacing only the inner $k$Fool method with the modified untargeted attack baseline above.

  • Targeted attack (ML-AP): We adapt the Rank I algorithm from [22] with the loss function $[\max_{j\notin P}f_{j}(\mathbf{x}+\mathbf{z})-\min_{j\in P}f_{j}(\mathbf{x}+\mathbf{z})]_{+}$ as a targeted attack baseline, where $P$ contains the targeted labels (excluding the ground-truth labels) and $|P|=k$. It should be mentioned that this loss is similar to the loss function in [26] when we do not consider the order of the targeted labels.

These methods, together with the proposed methods, namely TkML-AP-U, TkML-AP-Uv, TkML-AP-T, are applied to attack the baseline models trained on the datasets.

4.2 Results

Untargeted Attacks.

$k$ | Methods | PASCAL VOC 2012 Pert ($\times 10^{-2}$) | PASCAL VOC 2012 ASR | MS COCO 2014 Pert ($\times 10^{-2}$) | MS COCO 2014 ASR
3 | $k$Fool | 1.64 | 93.7 | 5.49 | 61.4
3 | TkML-AP-U | 0.51 | 99.6 | 0.49 | 100
5 | $k$Fool | 2.39 | 93.5 | 9.91 | 65.2
5 | TkML-AP-U | 0.56 | 99.3 | 0.53 | 100
10 | $k$Fool | 4.88 | 88.7 | 16.44 | 68.1
10 | TkML-AP-U | 0.63 | 98.3 | 0.59 | 100
Table 2: Comparison of Pert and ASR (%) of the untargeted attack methods with $k$=3, 5, 10 on two datasets. The best results are shown in bold.

The performance of untargeted attacks is shown in Table 2. Note that for different $k$ values, the TkML-AP-U method achieves nearly complete attack success (very high ASR values) on both the PASCAL VOC 2012 dataset and the MS COCO 2014 dataset with small perturbation scales (indicated by the smaller Pert values). On the other hand, the simple adaptation of the DeepFool method ($k$Fool) is much less effective. This can be attributed to the explicit consideration of the top-$k$ prediction in TkML-AP-U. The quantitative results are corroborated by an example from PASCAL VOC 2012 shown in Fig.2. Although both $k$Fool and TkML-AP-U are effective at attacking the top-$k$ predictions of the baseline model, $k$Fool introduces larger perturbations in general. In many cases, the perturbations are visible, as shown in Fig.2.

When deployed in practice, it is possible that the attack is designed for top-$k$ predictions but the actual system is used to produce the top-$k^{\prime}$ outputs. In other words, there can be a mismatch between the cutoff rank $k$ used in the attack and the rank $k^{\prime}$ used by the system. Note that by the definition of TkML-AP-U, a successful attack on a top-$k$ multi-label learning system is necessarily a successful attack on the same system for the case of top-$k^{\prime}$ with $k^{\prime}\leq k$, because the top-$k^{\prime}$ set is a subset of the top-$k$ set. On the other hand, we perform a set of experiments to validate the case when $k^{\prime}>k$. Specifically, in Table 3, we show the results of running TkML-AP-U for $k=3$ and $k^{\prime}=5,10$, respectively. Note that in these cases, the effectiveness of the attack is significantly reduced compared with the case $k=k^{\prime}$. This is expected, since a successful top-$k$ attack will move the original top-$k$ labels to ranks greater than $k$. However, our objective in Eq.(8) cannot avoid the case where some of the original labels are placed between $k$ and $k^{\prime}$, so a successful attack in the top-$k$ case may not generalize to a successful attack in the top-$k^{\prime}$ case ($k<k^{\prime}$).

Figure 2: Visual examples of untargeted attack methods on PASCAL VOC 2012. The perturbations are scaled by a factor of 20 to increase visibility. Green icons represent the ground truth labels (GT) that are attacked. The figure is better viewed in color.
$k^{\prime}$ | Method ($k$=3) | PASCAL VOC 2012 Pert ($\times 10^{-2}$) | PASCAL VOC 2012 ASR | MS COCO 2014 Pert ($\times 10^{-2}$) | MS COCO 2014 ASR
3 | TkML-AP-U | 0.51 | 99.6 | 0.49 | 100
5 | TkML-AP-U | 0.24 | 3.6 | 0.43 | 26.5
10 | TkML-AP-U | 0.18 | 0.3 | 0.35 | 3.9
Table 3: Comparison of Pert and ASR (%) of the untargeted attack method for $k^{\prime}$=3, 5, 10 on two datasets when setting $k$=3.
Methods | $k$=1 Pert | $k$=1 ASR | $k$=2 Pert | $k$=2 ASR | $k$=3 Pert | $k$=3 ASR
$k$UAPs | 0.51 | 63.9 | 0.51 | 74.6 | 0.51 | 73.2
TkML-AP-Uv | 0.13 | 86.5 | 0.15 | 82 | 0.16 | 80.5
Table 4: Comparison of Pert and ASR (%) of the universal untargeted attack methods on MS COCO 2014.

Universal Untargeted Attacks. The results of universal untargeted attacks are shown in Table 4. On the MS COCO 2014 dataset, TkML-AP-Uv outperforms $k$UAPs in all cases. Fig.3 further exhibits visual examples of universal perturbations with TkML-AP-Uv and $k$UAPs for $k=3$. With similar perturbations, TkML-AP-Uv is successful in attacking all top-$3$ labels, while there are images that $k$UAPs fails to attack. On the other hand, because of the requirement of being instance-independent, universal untargeted attacks need to introduce larger visual perturbations than instance-specific attacks to achieve the same level of attack success.

Figure 3: Examples of universal untargeted attack methods with $k=3$ on MS COCO 2014. The figure is better viewed in color.
Cases | $k$ | Methods | PASCAL VOC 2012 Pert ($\times 10^{-2}$) | PASCAL VOC 2012 ASR | MS COCO 2014 Pert ($\times 10^{-2}$) | MS COCO 2014 ASR
Best | 3 | ML-AP | 0.44 | 96.2 | 0.55 | 100
Best | 3 | TkML-AP-T | 0.44 | 96.6 | 0.57 | 100
Best | 5 | ML-AP | 0.50 | 92 | 0.66 | 99.9
Best | 5 | TkML-AP-T | 0.50 | 92.8 | 0.69 | 99.9
Best | 10 | ML-AP | 0.55 | 84.2 | 0.81 | 99.8
Best | 10 | TkML-AP-T | 0.56 | 86.4 | 0.85 | 99.8
Random | 3 | ML-AP | 0.59 | 86 | 0.95 | 99.8
Random | 3 | TkML-AP-T | 0.59 | 89.8 | 0.99 | 99.9
Random | 5 | ML-AP | 0.62 | 77.9 | 1.11 | 96.5
Random | 5 | TkML-AP-T | 0.63 | 83.7 | 1.18 | 97.8
Random | 10 | ML-AP | 0.63 | 67.7 | 1.22 | 84.2
Random | 10 | TkML-AP-T | 0.64 | 76.4 | 1.28 | 94.5
Worst | 3 | ML-AP | 0.66 | 68 | 1.08 | 90
Worst | 3 | TkML-AP-T | 0.66 | 75.8 | 1.14 | 91.4
Worst | 5 | ML-AP | 0.67 | 53.3 | 1.18 | 81.8
Worst | 5 | TkML-AP-T | 0.69 | 66.6 | 1.25 | 87.2
Worst | 10 | ML-AP | 0.67 | 39.1 | 1.25 | 59
Worst | 10 | TkML-AP-T | 0.69 | 57 | 1.30 | 73.1
Table 5: Comparison of Pert and ASR (%) of the targeted attack methods with $k$=3, 5, 10 in the Best, Random, and Worst cases on two datasets. The best ASR results are shown in bold.
Figure 4: Targeted attack in the Best (top-left, targeted labels are near GT), Random (bottom-left, targeted labels are randomly selected), and Worst (right, targeted labels are far from GT) cases. TA means the targeted attack labels. The perturbations are scaled by a factor of 20 to increase visibility. The red icons represent the targeted labels for attacking. The figure is better viewed in color.

Targeted Attacks. In Table 5, we evaluate the performance of TkML-AP-T (our method) and compare it with the ML-AP method in the three attack settings (Best, Random, and Worst) described in Section 4.1. On both datasets and over all cases, TkML-AP-T outperforms ML-AP for different $k$ values and attack settings in terms of ASR scores with comparable perturbation strengths. In particular, with increasing $k$, the gap in ASR scores between TkML-AP-T and ML-AP also increases. On the other hand, we note that the ASR scores decrease as the $k$ value increases. This is because the attack methods need to take more effort to put the labels in the set $\widetilde{Y}$ into the top-$k$ positions. The attacks become more challenging for both methods when the target labels are chosen according to the Worst setting, reflecting that larger perturbations are required to steer the predictions to the more difficult labels. Visual results of targeted attacks using TkML-AP-T and ML-AP are shown in Fig.4.


5 Conclusion

Top-$k$ multi-label learning (TkML) has many practical applications. However, the vulnerability of such algorithms with regard to dedicated adversarial perturbation attacks has not been extensively studied previously. In this work, we develop white-box methods for generating adversarial perturbations for TkML-based image annotation systems for both untargeted and targeted attacks. Our methods explicitly consider the top-$k$ ranking relation and are based on novel loss functions. Experimental evaluations on large-scale benchmark datasets including PASCAL VOC and MS COCO demonstrate the effectiveness of our methods in reducing the performance of state-of-the-art TkML methods.

There are several directions in which we would like to further improve our current methods. First, we only consider white-box attacks in the current work; it is natural to extend similar attacks to the black-box setting, in which we do not have detailed knowledge of the model being attacked. Furthermore, labels are not independent, and targeted attacks tend to be easier for semantically similar labels; e.g., it is probably easier to change the label Dog to Cat than to Airplane. Therefore, in subsequent work, we would like to consider semantic dependencies in designing more effective attacks on TkML algorithms. We will also study defenses against such attacks as important future work.

Acknowledgments. This research was developed with funding from the National Science Foundation under Grant No. IIS-2103450.

References

  • [1] Naveed Akhtar and Ajmal Mian. Threat of adversarial attacks on deep learning in computer vision: A survey. IEEE Access, 6:14410–14430, 2018.
  • [2] Stephen Boyd, Stephen P Boyd, and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.
  • [3] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 ieee symposium on security and privacy (sp), pages 39–57. IEEE, 2017.
  • [4] Mark Everingham, SM Ali Eslami, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes challenge: A retrospective. International journal of computer vision, 111(1):98–136, 2015.
  • [5] Yanbo Fan, Siwei Lyu, Yiming Ying, and Baogang Hu. Learning with average top-k loss. In Advances in neural information processing systems, pages 497–505, 2017.
  • [6] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. International Conference on Learning Representations, 2015.
  • [7] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • [8] Shu Hu, Yiming Ying, Siwei Lyu, et al. Learning by minimizing the sum of ranked range. Advances in Neural Information Processing Systems, 33, 2020.
  • [9] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. International Conference on Learning Representations, 2017.
  • [10] Maksim Lapin, Matthias Hein, and Bernt Schiele. Top-k multiclass svm. Advances in neural information processing systems, 2015.
  • [11] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755. Springer, 2014.
  • [12] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. International Conference on Learning Representations, 2017.
  • [13] Siwei Lyu and Yiming Ying. A univariate bound of area under roc. In Proceedings of the Conference on Uncertainty on Artificial Intelligence (UAI), 2018.
  • [14] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. International Conference on Learning Representations, 2018.
  • [15] Stefano Melacci, Gabriele Ciravegna, Angelo Sotgiu, Ambra Demontis, Battista Biggio, Marco Gori, and Fabio Roli. Can domain knowledge alleviate adversarial attacks in multi-label classifiers? arXiv preprint arXiv:2006.03833, 2020.
  • [16] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1765–1773, 2017.
  • [17] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2574–2582, 2016.
  • [18] Wlodzimierz Ogryczak and Arie Tamir. Minimizing the sum of the k largest functions in linear time. Information Processing Letters, 85(3):117–122, 2003.
  • [19] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In 2016 IEEE European symposium on security and privacy (EuroS&P), pages 372–387. IEEE, 2016.
  • [20] Tal Ridnik, Hussam Lawen, Asaf Noy, Emanuel Ben Baruch, Gilad Sharir, and Itamar Friedman. Tresnet: High performance gpu-dedicated architecture. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1400–1409, 2021.
  • [21] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. International journal of computer vision, 115(3):211–252, 2015.
  • [22] Qingquan Song, Haifeng Jin, Xiao Huang, and Xia Hu. Multi-label adversarial perturbations. In 2018 IEEE International Conference on Data Mining (ICDM), pages 1242–1247. IEEE, 2018.
  • [23] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826, 2016.
  • [24] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. International Conference on Learning Representations, 2014.
  • [25] Nurislam Tursynbek, Aleksandr Petiushko, and Ivan Oseledets. Geometry-inspired top-k adversarial perturbations. arXiv preprint arXiv:2006.15669, 2020.
  • [26] Zekun Zhang and Tianfu Wu. Learning ordered top-k adversarial attacks via adversarial distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 776–777, 2020.
  • [27] Nan Zhou, Wenjian Luo, Xin Lin, Peilan Xu, and Zhenya Zhang. Generating multi-label adversarial examples by linear programming. In 2020 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2020.

Appendix


Appendix A Proofs

A.1 Proof of Lemma 1

Proof.

We know that $\sum_{j=1}^{k}f_{[j]}$ is the optimal value of

$\max_{\mathbf{p}}\ \mathbf{p}^{\top}F,~\text{s.t.}~\mathbf{p}^{\top}\mathbf{1}=k,\ \mathbf{0}\leq\mathbf{p}\leq\mathbf{1}.$

We apply the Lagrangian to this problem and get

$L=-\mathbf{p}^{\top}F-\mathbf{v}^{\top}\mathbf{p}+\mathbf{u}^{\top}(\mathbf{p}-\mathbf{1})+\lambda(\mathbf{p}^{\top}\mathbf{1}-k),$

where $\mathbf{u}\geq\mathbf{0}$, $\mathbf{v}\geq\mathbf{0}$, and $\lambda\in\mathbb{R}$ are Lagrangian multipliers. Taking its derivative with respect to $\mathbf{p}$ and setting it to 0, we have $\mathbf{v}=\mathbf{u}-F+\lambda\mathbf{1}$. Substituting this back into the Lagrangian, we get

$\min_{\mathbf{u},\lambda}\mathbf{u}^{\top}\mathbf{1}+k\lambda,~\text{s.t.}~\mathbf{u}\geq\mathbf{0},\ \mathbf{u}+\lambda\mathbf{1}-F\geq 0.$

This means

$\sum_{j=1}^{k}f_{[j]}=\min_{\lambda}\Big\{k\lambda+\sum_{j=1}^{m}[f_{j}-\lambda]_{+}\Big\}.$

Therefore,

$\phi_{k}(F)=\frac{1}{k}\min_{\lambda}\Big\{k\lambda+\sum_{j=1}^{m}[f_{j}-\lambda]_{+}\Big\}.$ (16)

Furthermore, we can see that $\lambda=f_{[k]}$ is always an optimal solution of Eq.(16). So

$f_{[k]}\in\arg\min_{\lambda}\Big\{k\lambda+\sum_{j=1}^{m}[f_{j}-\lambda]_{+}\Big\}.$ ∎

A.2 Proof of Lemma 2

Proof.

Denote $g(x)=[[a-x]_{+}-\lambda]_{+}$. For $\lambda\geq 0$, we have $g(x)=0=[a-x-\lambda]_{+}$ if $x\geq a$. On the other hand, if $x<a$, we have $g(x)=[a-x-\lambda]_{+}$. Therefore, $g(x)=[[a-x]_{+}-\lambda]_{+}=[a-x-\lambda]_{+}$ for any $\lambda\geq 0$. ∎

Appendix B Additional Experimental Details

B.1 Source Code

For the purpose of review, the source code is accessible in the supplementary file.

B.2 Computing Infrastructure Description

All algorithms are implemented in Python 3.6 and trained and tested on an Intel(R) Xeon(R) CPU W5590 @3.33GHz with 48GB of RAM and an NVIDIA Quadro RTX 6000 GPU with 24GB memory.

B.3 The Algorithm for The Universal Untargeted Attack

First, we define the universal attack success rate (UASR) as

$\mbox{UASR}=\frac{1}{n}\sum_{i=1}^{n}\mathbb{I}_{\left[E(Y(\mathbf{x}_{i}),\hat{Y}(\mathbf{x}_{i}+\mathbf{z}))=0\right]}.$ (17)

We apply Algorithm 1 (without using the projection in it) iteratively over all samples from $\textbf{X}$. At each iteration, Algorithm 1 finds a perturbation $\Delta\mathbf{z}_{i}$ for a given data point $\mathbf{x}_{i}+\mathbf{z}$ that successfully attacks all top-$k$ labels of the current data point $\mathbf{x}_{i}$. Then we update the universal adversarial perturbation $\mathbf{z}$ by applying a projection operation to it. The overall procedure is described in detail in Algorithm 3. The algorithm terminates when the predefined attack success rate $\xi$ is reached.

Input: $\textbf{X}=\{\mathbf{x}_{1},\cdots,\mathbf{x}_{n}\}$, predictor $F$, $k$, $\eta_{l}$, $\xi$, $\beta$
Output: perturbation $\mathbf{z}^{*}$
Initialization: $\mathbf{z}$
while $\mbox{UASR}<\xi$ do
      for $\mathbf{x}_{i}\in\textbf{X}$ do
            if $E(Y(\mathbf{x}_{i}),\hat{Y}(\mathbf{x}_{i}+\mathbf{z}))\neq 0$ then
                  $\_\,,\Delta\mathbf{z}_{i}=$ TkML-AP-U($\mathbf{x}_{i}+\mathbf{z},F,k,\eta_{l},\beta$) ▷ Algorithm 1
                  $\mathbf{z}=\mathcal{P}_{\epsilon}(\mathbf{z}+\Delta\mathbf{z}_{i})$
            end if
      end for
end while
$\mathbf{z}^{*}=\mathbf{z}$
return $\mathbf{z}^{*}$
Algorithm 3 TkML-AP-Uv
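A condensed Python sketch of this outer loop is shown below; `attack_fn` stands for the per-instance TkML-AP-U attack of Algorithm 1 and `consistency` for $E(\cdot,\cdot)$ evaluated on the perturbed input, both placeholders for the actual implementations:

```python
import torch

def universal_attack(attack_fn, consistency, dataset, eps, xi, max_epochs=20):
    """Outer loop of TkML-AP-Uv: accumulate per-instance perturbations into one shared z,
    projecting onto the L2 ball of radius eps after every update (dataset is a list of (x_i, Y_i) pairs)."""
    z = torch.zeros_like(dataset[0][0])
    for _ in range(max_epochs):
        n_success = sum(consistency(Y_i, x_i + z) == 0 for x_i, Y_i in dataset)
        if n_success / len(dataset) >= xi:            # target UASR reached
            break
        for x_i, Y_i in dataset:
            if consistency(Y_i, x_i + z) != 0:        # this instance is not fooled yet
                _, dz = attack_fn(x_i + z)            # per-instance perturbation (Algorithm 1)
                z = z + dz
                norm = z.norm(p=2)
                if norm > eps:                        # projection P_eps onto the L2 ball
                    z = z * (eps / norm)
    return z
```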

B.4 Baseline Model Settings

Specifically, we tune the models with the Adam optimizer with an initial learning rate of 0.001 and a batch size of 64, for 25 epochs on PASCAL VOC 2012 and 100 epochs on MS COCO 2014.

For the PASCAL VOC 2012 dataset, we follow the protocol of [22, 27] for a fair comparison: we train on the training set (5,717 images) and test on the validation set (5,823 images). The MS COCO 2014 dataset is larger than PASCAL VOC 2012 in terms of both the number of classes and the number of images, and it does not provide ground truth labels for the testing images either. Similarly, we train on the training set (82,081 images) and test on the validation set (40,137 images). The images are normalized into the range $[-1,1]$.

$k$ | Methods | VOC $k^{\prime}$=3 | VOC $k^{\prime}$=5 | VOC $k^{\prime}$=10 | COCO $k^{\prime}$=3 | COCO $k^{\prime}$=5 | COCO $k^{\prime}$=10
3 | $k$Fool | 1.64 / 93.7 | 0.95 / 15.7 | 0.91 / 3.2 | 5.48 / 61.4 | 0.99 / 5.5 | 0.65 / 0.4
3 | TkML-AP-U | 0.51 / 99.6 | 0.24 / 3.6 | 0.18 / 0.3 | 0.24 / 100 | 0.43 / 26.5 | 0.35 / 3.9
5 | $k$Fool | - | 2.39 / 93.5 | 1.59 / 20.4 | - | 9.92 / 65.2 | 1.10 / 6.8
5 | TkML-AP-U | - | 0.56 / 99.3 | 0.18 / 1 | - | 0.53 / 100 | 0.39 / 9.8
10 | $k$Fool | - | - | 4.87 / 88.7 | - | - | 1.66 / 68.1
10 | TkML-AP-U | - | - | 0.63 / 98.3 | - | - | 0.59 / 100
Table 6: Comparison of Pert and ASR (%) of the untargeted attack methods with $k=3,5,10$ on two datasets. Each cell reports Pert ($\times 10^{-2}$) / ASR (%). The best results are shown in bold. '-' represents that the results in the $k^{\prime}<k$ setting are the same as the results in the $k^{\prime}=k$ setting.

B.5 Settings of Attacking Methods

We use the same learning rate of 0.01 for the TkML-AP-U, TkML-AP-T, and ML-AP methods. We set 1000 as the maximum number of iterations in all untargeted and targeted attack methods. We only test the $\ell_{2}$ norm for the perturbations; however, our algorithms can also work with other $\ell_{p}$ norms, e.g., $\ell_{1}$ and $\ell_{\infty}$. Since the optimization may get stuck in extreme spots, we follow image processing and variable transformation methods similar to those in the algorithms based on [3] to avoid this problem.

Instead of spending a long time searching for a good trade-off hyper-parameter $\beta$ in all algorithms, we use a projection method [14, 9] on $\mathbf{z}$. After each iteration, we project $\mathbf{z}$ with a projection operator $\mathcal{P}_{\epsilon}$ that enforces the criterion $\|\mathbf{z}\|_{2}\leq\epsilon$, where $\epsilon$ is a predefined robustness threshold.
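A minimal sketch of such an $\ell_{2}$ projection operator is shown below (the function name is ours):

```python
import torch

def project_l2(z, eps):
    """P_eps(z): scale z back onto the L2 ball of radius eps if it lies outside."""
    norm = z.norm(p=2)
    return z * (eps / norm) if norm > eps else z
```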

In the untargeted attack experiments, we set the projection threshold $\epsilon=10$ in our algorithm. Since our algorithm and the $k$Fool method have no termination conditions, they would eventually attack all images in the test set successfully, i.e., both could achieve a 100% ASR, even if this takes a long time on some specific images. To avoid this situation, we set the maximum number of iterations to 1000, as mentioned before. After both algorithms finish attacking the 1,000 images in each dataset, we report the final performance.

We set $\epsilon=2$ in the targeted attack experiments and report the final performance in the main paper. However, we also test other $\epsilon$ settings in the following sections.

Appendix C Additional Experimental Results

C.1 Additional Untargeted Attacking Performance

Methods | $k$ | MS COCO 2014 Pert ($\times 10^{-2}$) | MS COCO 2014 ASR
$k$Fool | 10 | 16.45 | 68.1
$k$Fool | 15 | 23.49 | 66.7
$k$Fool | 20 | 26.01 | 65.7
$k$Fool | 25 | 57.38 | 63
$k$Fool | 30 | 97.67 | 62.7
$k$Fool | 40 | 103.53 | 51.2
Table 7: Pert and ASR (%) of the $k$Fool method under different $k$ settings on the MS COCO 2014 dataset.
Figure 5: Examples of $k$Fool and TkML-AP-U adversarial perturbations with $k=3,5,10$ on PASCAL VOC 2012. To better show the perturbations, we have multiplied the intensity of all perturbation images by 20. GT means the ground truth labels. Top-$k$ ($k$=3, 5, 10) means the top-$k$ predicted labels for the corresponding image. Advers. means adversarial. The green icons correspond to the ground truth labels.

First, we report the complete results with the same settings as Table 2 in Table 6. Second, we analyze some of the results displayed in Table 2. For the $k$Fool method, we can see that the ASR scores decrease from 93.7% to 88.7% as the $k$ value increases on the PASCAL VOC 2012 dataset. This is because the top-$k$ attack becomes harder as $k$ increases: the algorithm needs to take more effort to push the ground truth labels outside the top-$k$ positions. However, we find the opposite trend on the COCO dataset, where the ASR score increases as the $k$ value increases. The reason is that the number of labels in the COCO dataset is four times that in VOC. In other words, the number of non-ground-truth labels is very large in the COCO dataset. When the $k$ value is small, the performance of the $k$Fool method is not influenced much by the $k$ setting. However, when the $k$ value is large and continues to increase, the ASR score decreases because the $k$ value has more impact on the algorithm. We have verified this statement through additional experiments in Table 7.

C.2 Additional Untargeted Attacking Image Results

We show a failure example of the $k$Fool method in Figure 5. Since the original $k$Fool method is designed to attack top-$k$ multi-class classifiers, we also show results for an image that has only one ground-truth label in Figure 6. From Figure 6, we can see that our method outperforms the $k$Fool method even when it is reduced to the multi-class setting.

Figure 6: Examples of $k$Fool and TkML-AP-U adversarial perturbations with $k=3,5,10$ on PASCAL VOC 2012. To better show the perturbations, we have multiplied the intensity of all perturbation images by 20. GT means the ground truth labels. Top-$k$ ($k$=3, 5, 10) means the top-$k$ predicted labels for the corresponding image. Advers. means adversarial. The green icons correspond to the ground truth labels.

C.3 Additional Universal Untargeted Attacking Performance

We also compare the performance of our TkML-AP-Uv method and the $k$UAPs method. In the training of both algorithms, when the UASR is larger than 0.7, we output the universal perturbation $\mathbf{z}$ and use it to evaluate the attacking methods; the performance is reported in Table 8. Note that the performance in Table 4 is a subset of the results in Table 8. To evaluate the performance efficiently and avoid the algorithms getting stuck in the loop, we set 20 as the maximum number of iterations for the outer loop in both algorithms. We use '×' to indicate that an algorithm cannot satisfy the termination conditions within the maximum number of iterations. We also report the results when $\xi=0.8$.

ξ    | k | Methods     | PASCAL VOC 2012               | MS COCO 2014
     |   |             | k'=1   k'=2   k'=3   Pert     | k'=1   k'=2   k'=3   Pert
0.7  | 1 | kUAPs       | ×      ×      ×      ×        | 63.9   51.4   45     0.51
     |   | TkML-AP-Uv  | 72.3   60     50.2   0.15     | 86.5   68.5   62.9   0.13
     | 2 | kUAPs       | -      ×      ×      ×        | -      74.6   65.9   0.51
     |   | TkML-AP-Uv  | -      68     59.5   0.16     | -      82     82     0.15
     | 3 | kUAPs       | -      -      ×      ×        | -      -      73.2   0.51
     |   | TkML-AP-Uv  | -      -      65     0.17     | -      -      80.5   0.16
0.8  | 1 | kUAPs       | ×      ×      ×      ×        | 66.2   56.2   49.4   0.51
     |   | TkML-AP-Uv  | 74.2   60.7   50.4   0.14     | 84.5   70.4   63.7   0.13
     | 2 | kUAPs       | -      ×      ×      ×        | -      ×      ×      ×
     |   | TkML-AP-Uv  | -      70.5   61.7   0.16     | -      81.2   75.3   0.15
     | 3 | kUAPs       | -      -      ×      ×        | -      -      73.2   0.51
     |   | TkML-AP-Uv  | -      -      69     0.18     | -      -      78.9   0.16
Table 8: Comparison of Pert and ASR (%) (reported for k' = 1, 2, 3) of the universal untargeted attack methods on the two datasets, with ϵ = 100. '×' indicates that the algorithm cannot produce a result within the maximum number of iterations. '-' indicates that the result in the k' < k setting is the same as in the k' = k setting. The best results are shown in bold.
ξ    | k | Methods     | PASCAL VOC 2012               | MS COCO 2014
     |   |             | k'=1   k'=2   k'=3   Pert     | k'=1   k'=2   k'=3   Pert
0.7  | 1 | kUAPs       | ×      ×      ×      ×        | 97.6   91.5   85.4   1.24
     |   | TkML-AP-Uv  | 72.3   60     50.2   0.14     | 86.5   68.5   62.9   0.13
     | 2 | kUAPs       | -      78.5   70.6   0.85     | -      99.4   98.8   10.16
     |   | TkML-AP-Uv  | -      68     59.5   0.16     | -      82     82     0.15
     | 3 | kUAPs       | -      -      79.5   2.09     | -      -      73.2   10.16
     |   | TkML-AP-Uv  | -      -      65     0.17     | -      -      80.5   0.16
0.8  | 1 | kUAPs       | 82.9   70.6   60.7   0.66     | 96.7   78.6   59.4   2.60
     |   | TkML-AP-Uv  | 74.2   60.7   50.4   0.14     | 84.5   70.4   63.7   0.13
     | 2 | kUAPs       | -      84.2   77     0.98     | -      99.3   98.1   1.62
     |   | TkML-AP-Uv  | -      70.5   61.7   0.16     | -      81.2   75.3   0.15
     | 3 | kUAPs       | -      -      ×      ×        | -      -      98.6   3.95
     |   | TkML-AP-Uv  | -      -      69     0.18     | -      -      78.9   0.16
Table 9: Comparison of Pert and ASR (%) (reported for k' = 1, 2, 3) of the universal untargeted attack methods on the two datasets, with ϵ = 2000. '×' indicates that the algorithm cannot produce a result within the maximum number of iterations. '-' indicates that the result in the k' < k setting is the same as in the k' = k setting. The best results are shown in bold.

On the MS COCO 2014 dataset, the ASR scores of our method are higher than those of kUAPs; there is a large gap (12.6%) between the two methods when k = 1 and ξ = 0.7. Second, the Pert values of our method are smaller than those of the kUAPs method in all k settings. On the other hand, we find that the ASR scores of our method decrease as the k value increases, whereas kUAPs shows no such trend; its maximum score (74.6%) is achieved when k = 2. A potential explanation is that kUAPs is built on the modified kFool baseline, which has no guarantee of an optimal solution in its optimization procedure, whereas our method is based on TkML-AP-U, which uses the sum of the top-k scores (a convex relaxation) in its optimization.
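To make the last point concrete, the snippet below computes the sum of the top-k entries of a score vector, which is the convex quantity referred to above. It only illustrates this building block; the full TkML-AP-U loss construction around it is omitted.

```python
import numpy as np

def sum_top_k(scores, k):
    """Sum of the k largest entries of a score vector.

    As a maximum of sums over all size-k subsets of entries, this quantity
    is convex in `scores`, which makes it a convenient surrogate for
    top-k ranking constraints.
    """
    return np.sort(scores)[-k:].sum()

s = np.array([0.2, 0.9, 0.5, 0.7])
print(sum_top_k(s, 2))  # 0.9 + 0.7 = 1.6
```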

Cases  | k  | Methods    | VOC k'=3     | VOC k'=5     | VOC k'=10    | COCO k'=3    | COCO k'=5    | COCO k'=10
       |    |            | Pert   ASR   | Pert   ASR   | Pert   ASR   | Pert   ASR   | Pert   ASR   | Pert   ASR
Best   | 3  | ML-AP      | 3.04   64.7  | 2.63   3.6   | 1.14   0.1   | 4.96   97.5  | 4.65   11.2  | 4.35   1.1
       |    | TkML-AP-T  | 3.04   64.7  | 2.70   2.80  | 1.14   0.1   | 5.10   97.5  | 4.51   9.1   | 4.73   0.7
       | 5  | ML-AP      | -      -     | 3.20   51.5  | 2.02   0.5   | -      -     | 5.55   94    | 4.70   3
       |    | TkML-AP-T  | -      -     | 3.20   51.8  | 1.96   0.7   | -      -     | 5.68   94.5  | 5.14   2.5
       | 10 | ML-AP      | -      -     | -      -     | 3.33   34.4  | -      -     | -      -     | 6.10   86.5
       |    | TkML-AP-T  | -      -     | -      -     | 3.37   35.5  | -      -     | -      -     | 6.22   87.8
Random | 3  | ML-AP      | 3.58   34.3  | 3.44   12.9  | 3.31   1.6   | 6.58   84.5  | 6.50   45.2  | 6.27   14
       |    | TkML-AP-T  | 3.60   38.2  | 3.46   12.8  | 3.49   1.1   | 6.63   86.3  | 6.55   42.6  | 6.52   11.5
       | 5  | ML-AP      | -      -     | 3.67   23.7  | 3.82   4.7   | -      -     | 6.72   60.1  | 6.61   22.9
       |    | TkML-AP-T  | -      -     | 3.76   28.2  | 3.54   3.7   | -      -     | 6.81   68.1  | 6.65   21.1
       | 10 | ML-AP      | -      -     | -      -     | 3.75   17.6  | -      -     | -      -     | 6.95   26.4
       |    | TkML-AP-T  | -      -     | -      -     | 3.80   20.5  | -      -     | -      -     | 6.97   42.6
Worst  | 3  | ML-AP      | 3.88   13.7  | 3.85   9     | 3.82   2.2   | 6.71   61.2  | 6.68   42.4  | 6.78   18.5
       |    | TkML-AP-T  | 3.85   17.5  | 3.79   10.2  | 3.74   2.6   | 6.76   65.5  | 6.74   42    | 6.75   17.4
       | 5  | ML-AP      | -      -     | 3.96   8.3   | 4.03   3.4   | -      -     | 6.75   39.9  | 6.70   23.9
       |    | TkML-AP-T  | -      -     | 3.98   11.3  | 4.01   3.3   | -      -     | 6.80   47.2  | 6.76   23.2
       | 10 | ML-AP      | -      -     | -      -     | 3.95   6.2   | -      -     | -      -     | 6.90   14.7
       |    | TkML-AP-T  | -      -     | -      -     | 4.04   10.2  | -      -     | -      -     | 6.88   24.4
Table 10: Comparison of Pert and ASR (%) of the targeted attack methods with k = 3, 5, 10 in the Best, Random, and Worst cases on the PASCAL VOC 2012 (VOC) and MS COCO 2014 (COCO) datasets. Pert is reported in units of ×10^-3 and ASR in %. The best ASR results are shown in bold. ϵ = 1.

Comparing the results of our method between the two datasets, we find that the ASR scores on MS COCO 2014 are larger than those on PASCAL VOC 2012. The reason is that MS COCO 2014 has 80 labels, so there are many labels (excluding the ground truth labels) that can be pushed into the top-k positions, whereas PASCAL VOC 2012 has only 20 labels. Therefore, it is easier for the attack methods to attack the baseline models on a dataset that contains more labels.

Figure 7: Examples of kUAPs and TkML-AP-Uv adversarial perturbations with k = 3 on MS COCO 2014. GT denotes the ground truth labels. Top-3 denotes the top-3 predicted labels for the corresponding image. The green icons correspond to the ground truth labels.
Figure 8: Examples of TkML-AP-Uv adversarial perturbations with k = 3 on MS COCO 2014. GT denotes the ground truth labels. Top-3 denotes the top-3 predicted labels for the corresponding image. Advers. means adversarial. The green icons correspond to the ground truth labels. ϵ = 15.
Figure 9: Examples of TkML-AP-Uv adversarial perturbations with k = 3 on MS COCO 2014. GT denotes the ground truth labels. Top-3 denotes the top-3 predicted labels for the corresponding image. Advers. means adversarial. The green icons correspond to the ground truth labels. ϵ = 20.

Since we set the projection threshold to 100 in the algorithm, the norm of z from the kUAPs method can reach this maximum threshold in all k settings, which means the perturbed images from kUAPs will be visibly distorted. When we increase the projection threshold, these norms from kUAPs become even larger, while the values from the TkML-AP-Uv method remain stable. We show the results with the projection threshold set to ϵ = 2000 in Table 9.
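The projection threshold discussed here can be read as re-scaling the universal perturbation back into a norm ball of radius ϵ whenever it grows too large. The sketch below shows this for the ℓ2 norm; the choice of norm and the function name are assumptions made for illustration, not the paper's exact implementation.

```python
import numpy as np

def project_l2_ball(z, eps):
    """Project a perturbation z onto the l2 ball of radius eps."""
    norm = np.linalg.norm(z)          # l2 norm of the flattened perturbation
    if norm > eps:
        z = z * (eps / norm)          # rescale so that ||z||_2 == eps
    return z

z = np.random.randn(3, 224, 224).astype(np.float32)
z_proj = project_l2_ball(z, eps=100.0)
print(np.linalg.norm(z), np.linalg.norm(z_proj))
```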

Cases  | k  | Methods    | VOC k'=3     | VOC k'=5     | VOC k'=10    | COCO k'=3    | COCO k'=5    | COCO k'=10
       |    |            | Pert   ASR   | Pert   ASR   | Pert   ASR   | Pert   ASR   | Pert   ASR   | Pert   ASR
Best   | 3  | ML-AP      | 4.44   96.2  | 3.70   15    | 2.42   0.2   | 5.49   100   | 5.13   11.4  | 5.09   1.3
       |    | TkML-AP-T  | 4.45   96.6  | 3.57   4.10  | 1.14   0.1   | 5.71   100   | 5.55   11.4  | 5.68   0.9
       | 5  | ML-AP      | -      -     | 5.01   92    | 2.59   0.6   | -      -     | 6.57   99.9  | 5.94   3.9
       |    | TkML-AP-T  | -      -     | 5.02   92.8  | 1.18   0.1   | -      -     | 6.86   99.9  | 5.69   2.9
       | 10 | ML-AP      | -      -     | -      -     | 5.53   84.2  | -      -     | -      -     | 8.06   99.8
       |    | TkML-AP-T  | -      -     | -      -     | 5.59   86.4  | -      -     | -      -     | 8.52   99.8
Random | 3  | ML-AP      | 5.90   86    | 5.69   51.7  | 4.88   3.1   | 9.43   99.8  | 9.35   60.7  | 9.17   24.9
       |    | TkML-AP-T  | 5.90   89.8  | 5.39   25.3  | 4.57   2.4   | 9.89   99.9  | 9.79   58.4  | 9.72   22.7
       | 5  | ML-AP      | -      -     | 6.22   77.9  | 5.78   16.3  | -      -     | 11.10  96.5  | 11.00  43.2
       |    | TkML-AP-T  | -      -     | 6.27   83.7  | 5.66   10.5  | -      -     | 11.80  97.8  | 11.70  42.7
       | 10 | ML-AP      | -      -     | -      -     | 6.27   67.7  | -      -     | -      -     | 12.20  84.2
       |    | TkML-AP-T  | -      -     | -      -     | 6.41   76.4  | -      -     | -      -     | 12.80  94.5
Worst  | 3  | ML-AP      | 6.59   68    | 6.43   42.6  | 5.86   7.7   | 10.80  90    | 10.90  72.2  | 11.10  43.3
       |    | TkML-AP-T  | 6.64   75.8  | 6.35   36.2  | 5.79   7.3   | 11.40  91.4  | 11.50  71.6  | 11.70  43
       | 5  | ML-AP      | -      -     | 6.75   53.3  | 6.47   18.8  | -      -     | 11.90  81.8  | 11.90  56.6
       |    | TkML-AP-T  | -      -     | 6.90   66.6  | 6.41   16.6  | -      -     | 12.50  87.2  | 12.60  58.9
       | 10 | ML-AP      | -      -     | -      -     | 6.68   39.1  | -      -     | -      -     | 12.50  59
       |    | TkML-AP-T  | -      -     | -      -     | 6.91   57    | -      -     | -      -     | 13.00  73.1
Table 11: Comparison of Pert and ASR (%) of the targeted attack methods with k = 3, 5, 10 in the Best, Random, and Worst cases on the PASCAL VOC 2012 (VOC) and MS COCO 2014 (COCO) datasets. Pert is reported in units of ×10^-3 and ASR in %. The best ASR results are shown in bold. ϵ = 2.
Cases  | k  | Methods    | VOC k'=3     | VOC k'=5     | VOC k'=10    | COCO k'=3    | COCO k'=5    | COCO k'=10
       |    |            | Pert   ASR   | Pert   ASR   | Pert   ASR   | Pert   ASR   | Pert   ASR   | Pert   ASR
Best   | 3  | ML-AP      | 4.74   98.9  | 4.12   5.6   | 1.14   0.1   | 5.51   100   | 5.31   12.2  | 4.57   1
       |    | TkML-AP-T  | 4.76   99.2  | 3.91   4.20  | 1.14   0.1   | 5.72   100   | 5.48   11.1  | 4.91   0.7
       | 5  | ML-AP      | -      -     | 5.53   97.2  | 5.29   1.1   | -      -     | 6.60   99.9  | 5.57   3.3
       |    | TkML-AP-T  | -      -     | 5.54   97.8  | 4.71   1.1   | -      -     | 6.91   99.9  | 6.75   3.9
       | 10 | ML-AP      | -      -     | -      -     | 6.35   94    | -      -     | -      -     | 8.18   100
       |    | TkML-AP-T  | -      -     | -      -     | 6.43   95.7  | -      -     | -      -     | 8.69   100
Random | 3  | ML-AP      | 6.77   95.5  | 6.39   40.2  | 5.39   2.8   | 9.62   100   | 9.47   64.6  | 9.36   25
       |    | TkML-AP-T  | 6.72   97.7  | 6.06   29    | 5.03   1.9   | 10.10  100   | 10.00  61.6  | 9.94   23.4
       | 5  | ML-AP      | -      -     | 7.32   90.7  | 6.71   17.2  | -      -     | 11.90  98.5  | 11.50  45
       |    | TkML-AP-T  | -      -     | 7.34   94.8  | 6.30   12    | -      -     | 13.00  99.3  | 12.90  44.4
       | 10 | ML-AP      | -      -     | -      -     | 7.55   85    | -      -     | -      -     | 14.40  95
       |    | TkML-AP-T  | -      -     | -      -     | 7.74   91.8  | -      -     | -      -     | 16.50  99.1
Worst  | 3  | ML-AP      | 7.92   86.1  | 7.75   54.8  | 7.09   10.9  | 11.90  96.6  | 12.10  77.3  | 12.60  49.3
       |    | TkML-AP-T  | 8.04   92.9  | 7.67   48.5  | 6.83   9.3   | 12.70  98    | 13.00  77.7  | 13.50  50.7
       | 5  | ML-AP      | -      -     | 8.30   75.7  | 7.67   27.8  | -      -     | 13.50  92.8  | 13.80  67.8
       |    | TkML-AP-T  | -      -     | 8.58   89.2  | 7.74   21.5  | -      -     | 15.00  95.8  | 15.30  68.3
       | 10 | ML-AP      | -      -     | -      -     | 8.37   64.3  | -      -     | -      -     | 15.50  75.6
       |    | TkML-AP-T  | -      -     | -      -     | 8.84   83.6  | -      -     | -      -     | 17.90  90
Table 12: Comparison of Pert and ASR (%) of the targeted attack methods with k = 3, 5, 10 in the Best, Random, and Worst cases on the PASCAL VOC 2012 (VOC) and MS COCO 2014 (COCO) datasets. Pert is reported in units of ×10^-3 and ASR in %. The best ASR results are shown in bold. ϵ = 3.
Cases  | k  | Methods    | VOC k'=3     | VOC k'=5     | VOC k'=10    | COCO k'=3    | COCO k'=5    | COCO k'=10
       |    |            | Pert   ASR   | Pert   ASR   | Pert   ASR   | Pert   ASR   | Pert   ASR   | Pert   ASR
Best   | 3  | ML-AP      | 4.83   99.6  | 3.62   4.7   | 1.14   0.1   | 5.50   100   | 5.42   12.3  | 4.57   1.2
       |    | TkML-AP-T  | 4.86   99.8  | 3.64   4.3   | 1.14   0.1   | 5.74   100   | 5.66   11.4  | 6.84   1.1
       | 5  | ML-AP      | -      -     | 5.68   98.4  | 3.65   1     | -      -     | 6.60   99.9  | 5.98   3.8
       |    | TkML-AP-T  | -      -     | 5.73   99.1  | 2.92   0.9   | -      -     | 6.91   99.9  | 5.96   3.3
       | 10 | ML-AP      | -      -     | -      -     | 6.62   96.7  | -      -     | -      -     | 8.19   100
       |    | TkML-AP-T  | -      -     | -      -     | 6.75   98.2  | -      -     | -      -     | 8.70   100
Random | 3  | ML-AP      | 7.00   97.4  | 6.66   40.6  | 5.48   4     | 9.61   100   | 9.51   63.2  | 9.22   24.6
       |    | TkML-AP-T  | 6.96   99    | 6.31   29.5  | 5.04   3.1   | 10.10  100   | 9.98   61.7  | 9.91   23.4
       | 5  | ML-AP      | -      -     | 7.61   93.6  | 6.64   16.3  | -      -     | 12.00  99    | 11.70  44.7
       |    | TkML-AP-T  | -      -     | 7.84   98.6  | 6.33   12.3  | -      -     | 13.10  99.7  | 13.10  44.6
       | 10 | ML-AP      | -      -     | -      -     | 7.93   89.1  | -      -     | -      -     | 14.70  96.8
       |    | TkML-AP-T  | -      -     | -      -     | 8.33   96.4  | -      -     | -      -     | 17.20  99.9
Worst  | 3  | ML-AP      | 8.35   90.9  | 8.07   57.4  | 7.24   11.7  | 12.10  97.9  | 12.30  80.1  | 13.10  52.3
       |    | TkML-AP-T  | 8.66   97.8  | 8.09   50.6  | 7.09   10.9  | 12.90  98.9  | 13.30  77.4  | 13.90  50.9
       | 5  | ML-AP      | -      -     | 8.75   80.7  | 8.08   29.4  | -      -     | 13.80  94.1  | 14.20  68.9
       |    | TkML-AP-T  | -      -     | 9.47   96.2  | 8.62   25    | -      -     | 15.60  98.3  | 16.00  72.2
       | 10 | ML-AP      | -      -     | -      -     | 8.81   69.2  | -      -     | -      -     | 16.00  78.7
       |    | TkML-AP-T  | -      -     | -      -     | 10.10  94.7  | -      -     | -      -     | 19.90  95.7
Table 13: Comparison of Pert and ASR (%) of the targeted attack methods with k = 3, 5, 10 in the Best, Random, and Worst cases on the PASCAL VOC 2012 (VOC) and MS COCO 2014 (COCO) datasets. Pert is reported in units of ×10^-3 and ASR in %. The best ASR results are shown in bold. ϵ = 10.

C.4 Additional Universal Untargeted Attacking Image Results

We set ϵ = 100 and ξ = 0.7 and show the image results of both methods in Figure 7. However, we can set a smaller ϵ value to obtain a smaller perturbation. Here, we show additional perturbed images and report their top-3 predictions based on our TkML-AP-Uv method under different ϵ settings: the projection threshold is ϵ = 15 in Figure 8 and ϵ = 20 in Figure 9.

C.5 Additional Targeted Attacking Performance

In the main paper, we set the projection threshold ϵ = 2 and report partial results. Here, we evaluate additional projection thresholds, such as ϵ = 1, for both the TkML-AP-T and ML-AP algorithms and show the performance in Table 10. We also show the complete results with ϵ = 2, 3, and 10 in Tables 11, 12, and 13, respectively. From these tables, we can see that the attack performance (ASR) increases as ϵ increases.

At the level of the cases, we find that the ASR score decreases as the selected target labels become harder to attack for the same k value. For example, for k' = 3 on the PASCAL VOC 2012 dataset with ϵ = 2, the ASR score of the ML-AP method drops from 96.2% (Best) to 86% (Random), a difference of 10.2%. The difference becomes larger, 18%, from 86% (Random) to 68% (Worst). For our method, the difference is 6.8% from 96.6% (Best) to 89.8% (Random), and 14% from 89.8% (Random) to 75.8% (Worst). Comparing these differences between the two methods in the same scenario, the differences for TkML-AP-T are smaller than those for ML-AP, which means our method is more robust than the ML-AP method. For the Pert values of the perturbation norms, we find that they increase as the selected labels become harder to attack for the same k value. This is intuitive, because both methods need to add more perturbation to handle the harder tasks.
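For completeness, the sketch below illustrates one plausible way the Best, Random, and Worst target-label sets can be drawn for a targeted attack, under the assumption that they correspond to the highest-ranked, uniformly sampled, and lowest-ranked non-ground-truth labels, respectively. The exact selection rule is defined in the main paper; the function here is only an illustrative assumption.

```python
import numpy as np

def select_targets(scores, gt_labels, k, case="Best", rng=None):
    """Pick k target labels from the non-ground-truth labels of one image.

    Assumed convention: 'Best' takes the highest-scoring non-GT labels
    (easiest to promote), 'Worst' the lowest-scoring ones, and 'Random'
    draws uniformly; see the main paper for the exact definition.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Non-ground-truth labels ordered from highest to lowest predicted score.
    candidates = [i for i in np.argsort(scores)[::-1] if i not in gt_labels]
    if case == "Best":
        return candidates[:k]
    if case == "Worst":
        return candidates[-k:]
    return list(rng.choice(candidates, size=k, replace=False))  # 'Random'

scores = np.array([0.9, 0.1, 0.8, 0.2, 0.3, 0.05])
print(select_targets(scores, gt_labels={0, 2}, k=3, case="Worst"))
```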