
Is Face Recognition Safe from Realizable Attacks?

Sanjay Saha
Department of Computer Science
National University of Singapore
[email protected]
   Terence Sim
Department of Computer Science
National University of Singapore
[email protected]
Abstract

Face recognition is a popular form of biometric authentication, and due to its widespread use, attacks have become more common as well. Recent studies show that Face Recognition Systems (FRS) are vulnerable to attacks that can lead to erroneous identification of faces. Interestingly, most of these attacks are white-box, or they manipulate facial images in ways that are not physically realizable. In this paper, we propose an attack scheme in which the Attacker generates realistic synthesized face images with subtle perturbations and physically realizes those perturbations on his face to attack black-box face recognition systems. Comprehensive experiments and analyses show that subtle perturbations realized on the Attacker's face can create successful attacks on state-of-the-art face recognition systems in black-box settings. Our study exposes the underlying vulnerability of Face Recognition Systems to realizable black-box attacks.

978-1-7281-9186-7/20/$31.00 ©2020 IEEE

1 Introduction

Face recognition is one of the most convenient and popular biometric authentication systems. Being contactless, being applicable to a large crowd at a time, and, most importantly, recent improvements in accuracy are the main reasons behind the widespread use of Face Recognition Systems (FRS). Different kinds of attacks on FRS and their countermeasures are frequently studied in the field of biometrics.

Attacks on FRS can be broadly classified into two groups: 1. Presentation attacks. 2. Adversarial attacks. In presentation attacks, the aim of the Attacker is to fool the FRS by presenting a photo of a face or putting on a mask. Presentation attacks, and the detection of such attacks, have become very common recently; many studies[20, 23, 25, 26] have been done to develop presentation attacks and techniques to detect and prevent them. Adversarial attacks, on the other hand, rely on generating adversarial face images or adversarial patches (mostly through Generative Adversarial Networks, auto-encoders, etc.). Adversarial attacks also extend to other image classification problems. They rely on making subtle changes to the input images (mostly in the form of noise or small pixel changes that are unnoticeable to human eyes). Studies[8, 28] show that even state-of-the-art FRS are vulnerable to adversarial attacks. All these studies are helpful for finding inherent vulnerabilities in FRS.

We can also group attacks on image classification systems based on how much the Attacker knows about the target system: 1. White-box attacks. 2. Black-box attacks. In a white-box attack, the Attacker knows the underlying working principle of the target system and can use knowledge of its internal parameters to modify the system's decisions. Although these attacks are more successful against the targeted systems, it is rarely realistic to assume that those parameters are known. A more practical kind of attack is a black-box attack, where the internal parameters are unknown to the Attacker. Hence, the Attacker must rely only on the output of the system.

To better describe our Attack Scheme, let us assume a ‘Guarded Attack Scenario’ in which a FRS is used to secure a building by authenticating its users. The system is also attended by a security guard. The Attacker wants to get into the building by fooling the FRS, but he must not raise the suspicion of the security guard (e.g. by presenting a photograph or video of a face, or putting on a mask). Hence the attack needs to be realizable. Additionally, he does not know the internal structure of the FRS, which means it must be a black-box attack.

It is not practical to launch a presentation attack by showing a printed photo or a video, or by putting on a mask, in front of the security guard, as he would immediately notice the attack. In this scenario, it is also not possible to launch an adversarial attack, because adversarial images with subtle pixel manipulations cannot be recreated physically. However, the Attacker can attempt to grow a beard, put on a scar, or make a facial expression to fool the FRS. We call such attacks ‘physically realizable’. The security guard does not notice because the attack does not raise any suspicion. In this paper, our goal is to address the ‘Guarded Attack Scenario’ by making subtle perturbations on the Attacker’s face with the help of a face synthesizer, so that the same perturbations can be physically realized.

In this paper, we explore three types of attacks on FRS in the Guarded Attack Scenario: 1. Break-in. 2. Impersonation. 3. Evasion. Details on these attacks are given in Section 3.4. The contribution of this paper is to show that FRS are vulnerable to realizable attacks, and thus to urge more research in this area.

2 Related Works

Attacks on object and face recognition. Attacks on object classification systems and FRS have been studied broadly in recent times, especially after the advent of deep neural networks. Presentation attacks are the most common attacks against FRS. Studies [7, 20, 23, 24, 25, 26] on presentation attacks, and on the detection of these attacks, show that many FRS are vulnerable to them. However, these presentation attacks[24, 26] are developed for the scenario where the FRS is unattended, i.e. there is no security guard near the system to monitor it. Thus, these attacks are not useful in an attended scenario like the Guarded Attack Scenario, where it is necessary for the Attacker to be stealthy. Traditional presentation attacks (e.g. presenting a photograph or video) are likewise not useful in the Guarded Attack Scenario. Beyond presentation attacks, adversarial studies[4, 10, 15, 30] show that careful perturbations of images based on pixel manipulations can also fool a classifier into misclassifying. Deep Convolutional Neural Network models are vulnerable to carefully crafted white-box attacks; the attacks in [27, 28] manipulate the face of an attacker using printed adversarial eyeglasses. Although successful, these attacks utilized prior information about the models that were attacked. In this study, we focus on the black-box scenario, where the Attacker has no information from the model other than its predicted identity and its prediction score.

Black-box attacks. Black-box attacks[2, 3, 5, 14, 21] on object classification models are based on adding subtle perturbations to the input images. However, these attacks cannot be replicated in the real world, as they are based on pixel manipulations of the images rather than perturbations of the actual objects. In [4], the authors designed a black-box attack with a physical patch attached near an object, which causes the classifier to misclassify. However, this attack is not subtle and also does not scale to FRS. In [18, 19], attacks targeting face images were made using semi-adversarial networks to preserve gender privacy. In [8, 11], the identity of the face was the target of black-box attacks on FRS. Although these attacks make small perturbations on face images, they are pixel-based perturbations that cannot be realized on physical faces.

Realizable attacks. The closest work to our attack is [27]. In their paper, the authors succeeded in fooling a FRS by putting on a printed adversarial spectacle frame in a white-box setting. Although this attack is realizable and stealthy, it is only possible in a white-box scenario. On the other hand, the perturbations proposed in [8, 11, 17] are subtle but are only possible for face images, i.e. they cannot be realized on physical faces. Unlike these attacks, in our Attack Scheme the Attacker attacks a black-box FRS by adding facial features like a beard, marks, makeup, etc.

In this paper, we focus on the Guarded Attack Scenario described in Section 1. We propose realizable perturbations, so that it is possible to physically reproduce them on real faces. Our Attack Scheme succeeded in attacking multiple FRS in the above-mentioned scenario with realistic synthesized faces.

3 Method

3.1 The Attack Scheme

The basic structure of our Attack Scheme has the following major components: a target FRS, a gallery of authorized subjects, and a face synthesizer. We present a synthesized image of the Attacker’s face to the target FRS, which authenticates the face and returns the best-matched identity ($id$) and the corresponding score ($s$). The score can be the distance (euclidean, cosine, etc.) to the best match, or the confidence in the prediction. For simplicity, let us assume that the score $s$, $0 \leq s \leq 1$, is the distance to the best-matched face in the gallery. Hence, a lower $s$ means the FRS is highly confident of its prediction, and vice versa. When the score $s$ is below a predefined threshold $\theta$, the FRS authenticates the Attacker and the attack is considered successful.

We generate a new face image of the Attacker using the synthesizer by manipulating the controllable parameters $p$. More details on the face synthesizer and the parameter vector $p$ are in Section 3.5. The Attack Scheme is an optimization problem in which we try to minimize the score $s$ until $s < \theta$. Hence, we are essentially using an ‘analysis-by-synthesis’ method (Figure 1).

Figure 1: Attack Scheme: synthesis of optimized realizable face and using it to attack the target black-box FRS. The FRS matches the presented face with the face images in the gallery, and returns identity and score.

Let $f: X \rightarrow Y$ denote the target FRS, where $X$ is the set of input faces and $Y$ is the set of scores of the best-matched faces. Each element in $Y$ is composed of two parts: $id$ (predicted identity) and $s$ (score). Let $\mathcal{S}(p)$ be the face synthesizer that generates a new face for the Attacker using the synthesis parameter vector $p$, and let $\mathbb{P}$ be the parameter vector space. Our goal is to minimize the score $s$ returned by our objective function $f$. If $p_{min}$ is the optimized parameter vector at the end of an attack on the FRS, and $s_{min}$ is the score for the face synthesized using $p_{min}$, we can define our Attack Scheme as the optimization problem:

$$p_{min} = \operatorname*{arg\,min}_{p \in \mathbb{P}} f(\mathcal{S}(p)) \tag{1}$$
subject to $B(p) \leq 0$, where $B(\cdot)$ constrains the parameter vector $p$ (see Section 3.5.3),
$$s_{min} = f(\mathcal{S}(p_{min})) \tag{2}$$
$$result = \begin{cases} success, & \text{if } s_{min} \leq \theta \\ failed, & \text{otherwise} \end{cases} \tag{3}$$

In our experiments, this optimization was done using the Nelder-Mead method from the SciPy optimization package. We initialized the optimization parameter vector $p_0$ depending on the type of attack (see Section 3.4).
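
The optimization loop can be sketched as follows. This is a minimal illustration rather than the authors' code: `synthesize` and `query_frs` are hypothetical wrappers around the MMDA synthesizer and the black-box FRS, and the bound constraints are enforced by clipping inside the objective, since the plain Nelder-Mead method is unconstrained.

```python
import numpy as np
from scipy.optimize import minimize

THETA = 0.50  # decision threshold of the target FRS (e.g. face_recognition)
# Assumed bounds for Attacker 1 (Table 2): [h1, h2, m1, m2, m3, e, g]
BOUNDS_LO = np.array([-0.50, -0.50, -0.60, -0.60, -0.60, -0.40, -0.20])
BOUNDS_HI = np.array([ 0.45,  0.45,  0.50,  0.50,  0.50,  0.40,  0.20])

def objective(p, synthesize, query_frs):
    """Synthesize a face from parameters p and return the FRS score.
    `synthesize` and `query_frs` are hypothetical wrappers around the
    MMDA synthesizer and the black-box FRS, respectively."""
    p = np.clip(p, BOUNDS_LO, BOUNDS_HI)   # enforce B(p) <= 0 by clipping
    face = synthesize(p)
    _identity, score = query_frs(face)     # only the FRS output is used (black box)
    return score

def attack(p0, synthesize, query_frs, max_queries=500):
    """Run the analysis-by-synthesis attack starting from p0."""
    res = minimize(objective, p0, args=(synthesize, query_frs),
                   method="Nelder-Mead",
                   options={"maxfev": max_queries, "xatol": 1e-3, "fatol": 1e-3})
    p_min = np.clip(res.x, BOUNDS_LO, BOUNDS_HI)
    s_min = objective(p_min, synthesize, query_frs)
    return ("success" if s_min <= THETA else "failed"), p_min, s_min
```

In this setting every objective evaluation costs one query to the FRS, so `maxfev` directly bounds the number of probe images presented to the system.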

3.2 Target Face recognition systems (FRS)

We used three publicly available FRS as targets: 1. The Python face_recognition[9] library. 2. RESNET50[13] trained on Oxford’s VGGFace[22]. 3. The Face++[1] API. Although the system architectures of the first two systems are publicly available, we treat them as black boxes. We use the best-matched identity ($id$) and score ($s$) returned by the target systems to optimize our objective function $f$. Details on the thresholds ($\theta$) for the different FRS are given in Section 4.1.
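
As an illustration of how the first target system can be queried as a black box, the sketch below wraps the face_recognition library so that it returns only the best-matched identity and the euclidean distance. The gallery handling and function names are our own assumptions, intended only to approximate the setup.

```python
import numpy as np
import face_recognition

def build_gallery(image_paths):
    """Encode the gallery faces once; image_paths maps identity -> file path."""
    gallery = {}
    for identity, path in image_paths.items():
        image = face_recognition.load_image_file(path)
        gallery[identity] = face_recognition.face_encodings(image)[0]
    return gallery

def query_frs(probe_image, gallery):
    """Black-box query: return (best-matched identity, euclidean distance s)."""
    probe_encoding = face_recognition.face_encodings(probe_image)[0]
    ids = list(gallery.keys())
    distances = face_recognition.face_distance(
        np.array([gallery[i] for i in ids]), probe_encoding)
    best = int(np.argmin(distances))
    return ids[best], float(distances[best])
```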

3.3 Gallery

The target FRS have a gallery of faces that they recognize. We have taken a subset of the full-frontal face images from the CMU Multi-PIE dataset[12], which contains faces of 338 people; our experiments use subsets of these 338 faces. We made sure that people of different races, skin colors, and genders are present in the gallery, to ensure that face recognition is not biased towards any of these factors.

3.4 Types of attacks

We carried out three kinds of attack, as listed below. All the attacks work in a similar way as described in the Attack Scheme (Figure 1). We refer back to the ‘Guarded Attack Scenario’ from Section 1 to describe these attack types. A summary of these attacks, in terms of variations in the gallery and Victim(s), is presented in Table 1.

Break-in. In this attack, the Attacker tries to gain access to the secured building. He is not registered in the gallery. He presents his perturbed face to the FRS to fool the system into identifying him as one of the faces in the gallery. There is no specific Victim in the gallery. A break-in attack is deemed successful when the FRS authenticates the Attacker as any one of the registered faces in its gallery.

Impersonation. This attack is similar to the break-in attack in that the Attacker is not registered in the gallery. However, now the Attacker tries to impersonate a specific person from the gallery, whom we call the ‘Victim’. The Attacker tries to make the FRS recognize his face as the Victim’s face. An impersonation attack is successful only when the FRS identifies the Attacker as the Victim and no one else; being authenticated as some other person is not enough. Hence, impersonation attacks are more challenging than break-in attacks.

Evasion. Unlike the previous two attacks, the Attacker is now registered in the gallery. The goal of the Attacker is to avoid being identified as himself by the FRS. There are two scenarios:

  • Partial Evasion: The Attacker knows the people in the gallery and wants to impersonate someone else (a Victim). This attack is considered successful when the Attacker is identified as the Victim and no one else.

  • Full Evasion: This is where the Attacker does not want to be recognized as anyone in the gallery. This is also the ‘Police Watchlist Scenario’, in which the police use a FRS to identify a watchlist of criminals. The Attacker, who is on the watchlist, needs to avoid being recognized as himself, as well as anyone else on the watchlist.

Attack        | Gallery                        | Victim(s)
Break-in      | Attacker is NOT IN the gallery | No specific Victim
Impersonation | Attacker is NOT IN the gallery | One, selected by the Attacker
Evasion       | Attacker is IN the gallery     | No specific Victim, as long as it is not the Attacker
Table 1: Attack types’ summary.

3.5 Face Synthesizer

One of our main focuses in this work is to generate realizable face images, so that they can be replicated in real-world scenarios. To obtain such face images, we use the Multimodal Discriminant Analysis (MMDA) face synthesis model from [29]. This method is capable of generating realistic-looking faces from training face images that cover different variations of the face. Other synthesis methods[6, 16], based on Generative Adversarial Networks (GANs), could also be used, but the choice of synthesizer is not the focus of this work. We use MMDA because it is easy to train and produces good-quality, realistic images.

Figure 2: Sample images of the Attackers used to train the face synthesizer: (a) Attacker 1: South Asian Male, (b) Attacker 2: Caucasian Female, (c) Attacker 3: East Asian Female. They show different combinations of scars, moles, facial hair, expressions, etc. The three Attackers were chosen from different ethnicities and genders, which helps to generalize the Attack Scheme across ethnicity and gender.

Figure 3: Examples of successful Break-in attacks (top row: Attackers, bottom row: Victims): columns (a)-(b) with Attacker 1, (c)-(d) with Attacker 2, and (e) with Attacker 3. The FRS misclassified the attackers as the subjects in the corresponding columns.

3.5.1 Training

The face synthesizer needs to be trained with different variations of the Attacker’s face images. Each Attacker uses an individual face synthesizer model, trained with different variations of his or her face images (Figure 2). For example, the synthesizer model for Attacker 1 is trained with four (4) modes, each with multiple labels. In total there are 48 training images for Attacker 1, which is the Cartesian product of all the modes and labels ($48 = 3 \times 4 \times 2 \times 2$):

  • Facial hair: clean, beard + mustache, mustache only.

  • Marks: clean, forehead scar, cheek scar, mole.

  • Expressions: no expression, distorted cheek.

  • Eyeglasses: no eyeglasses, with eyeglasses.

These training images were preprocessed before applying the MMDA decomposition. The preprocessing steps consist of locating facial landmarks, aligning the image to a reference face via the eye positions, normalizing the shape by applying a mask, and then encoding the warped face and landmark coordinates in a vector $x$ (used in Equation 5).
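
A rough sketch of the eye-based alignment step is given below, assuming the face_recognition landmark detector and a two-point similarity transform; the reference eye positions, output size, and the exact preprocessing pipeline used in the paper are assumptions on our part.

```python
import numpy as np
import cv2
import face_recognition

REF_LEFT_EYE = complex(60.0, 80.0)    # assumed reference eye positions
REF_RIGHT_EYE = complex(140.0, 80.0)  # in an assumed 200x200 output frame
OUT_SIZE = (200, 200)

def align_to_reference(image):
    """Warp a face so that its eye centres land on fixed reference points
    (an assumed variant of the paper's alignment step)."""
    landmarks = face_recognition.face_landmarks(image)[0]
    left = np.mean(landmarks["left_eye"], axis=0)
    right = np.mean(landmarks["right_eye"], axis=0)
    src_l, src_r = complex(*left), complex(*right)

    # Two-point similarity transform: w = a*z + t maps detected eyes onto reference eyes
    a = (REF_RIGHT_EYE - REF_LEFT_EYE) / (src_r - src_l)
    t = REF_LEFT_EYE - a * src_l
    M = np.array([[a.real, -a.imag, t.real],
                  [a.imag,  a.real, t.imag]])
    return cv2.warpAffine(image, M, OUT_SIZE)
```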

3.5.2 Decomposition

The MMDA face synthesizer decomposes the training face images based on the different modes and the labels within the modes. During the decomposition it learns two matrices, $P$ and $V$. $P$ is the whitening PCA matrix. $V$ is an orthogonal matrix capturing the semantic information of the modes, and the shape of $V$ is $(n-1) \times (n-1)$, where $n$ is the number of training images. Here we show the matrices only for the face synthesizer trained on Attacker 1’s face images; the face synthesizers for the other Attackers follow similar calculations. Now $V$ is:

$$V = [V^{facehair}\;\; V^{mark}\;\; V^{expression}\;\; V^{0}] \tag{4}$$

The columns of $V$ are the bases for facial hair ($V^{facehair}$), marks ($V^{mark}$), and expressions ($V^{expression}$). $V^{0}$ is the Residual Space, which captures the facial identity. Each face $m$ can be decomposed into a vector $y$:

$$y = V^{T} P^{T} x \tag{5}$$
$$y^{T} = [\underbrace{h_{1}\,h_{2}}_{\text{face hair}}\;\; \underbrace{m_{1}\,m_{2}\,m_{3}}_{\text{marks}}\;\; \underbrace{e_{1}\,e_{2}}_{\text{expression}}\;\; \underbrace{s^{T}}_{\text{residual}}] \tag{6}$$
$$p = [h_{1}\;h_{2}\;m_{1}\;m_{2}\;m_{3}\;e] \tag{7}$$

During an attack, the parameter optimizer changes this vector, apart from the residual component, to synthesize new face images. This is why the parameter vector $p$ for each Attacker does not include the residual space present in the decomposed vector $y^{T}$. Each of the attributes in the parameter vector $p$ (face hair, marks, etc.) needs exactly one (1) fewer value than the number of labels for that attribute in the training images. The parameter vector $p$ has one more parameter, $g$, which controls whether or not eyeglasses are synthesized. $g$ is separate from the MMDA face synthesizer: eyeglasses are overlaid using alpha-blending on the synthesized faces depending on the value of $g$.
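
A minimal numpy sketch of Equations 5 and 7 is given below, assuming the matrices $P$ and $V$ have already been learned from Attacker 1's 48 training images; the slice sizes follow Attacker 1's modes, and the helper names are our own.

```python
import numpy as np

def decompose(x, P, V):
    """Project a preprocessed face vector x into the MMDA space (Eq. 5)."""
    return V.T @ P.T @ x   # y = [face hair | marks | expression | residual]

def parameters_from_face(x, P, V, g=0.0):
    """Assemble the attack parameter vector p of Eq. 7, plus the separate
    eyeglasses parameter g.  The first 6 coefficients (2 facial hair, 3 marks,
    1 expression) are illustrative and follow Attacker 1's modes."""
    y = decompose(x, P, V)
    return np.concatenate([y[:6], [g]])   # [h1 h2 m1 m2 m3 e g]; residual y[6:] is dropped
```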

3.5.3 Generating New Face

Given an altered parameter vector $p^{\prime}$, the synthesis is achieved by:

$$x^{\prime} = P_{r} V p^{\prime} \tag{8}$$

$V$ is the orthogonal matrix described in the previous section, and $P_{r}$ is learned during the decomposition; $P_{r}$ reverses the effects of $P$ in Equation 5. These matrices form a semantic basis of the training space. The new face vector $x^{\prime}$, which is a linear combination of the semantic basis faces, is then reshaped and unwarped so that we can visualize the new face.

facial hair | $-0.50 < h_1, h_2 < 0.45$
marks       | $-0.60 < m_1, m_2, m_3 < 0.50$
expression  | $-0.40 < e < 0.40$
eyeglasses  | $-0.20 < g < 0.20$
Table 2: Bound constraints on the parameters in vector $p$ for Attacker 1. These help keep the synthesized face from being unrealistic.

The values in the altered parameter vector $p^{\prime}$ need to be bounded in order to prevent the newly synthesized image from being unrealistic. These bounds on the parameters were selected by observing the learned whitening PCA matrix ($P$). Table 2 shows the bounds on the parameters for Attacker 1; similar bounds are set for the other Attackers.
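
A minimal sketch of the synthesis step (Equation 8) together with the Table 2 bounds might look as follows; $P_r$, $V$, and the Attacker's residual vector are assumed to come from the trained synthesizer, and the reshape/unwarp and eyeglasses alpha-blending steps are omitted.

```python
import numpy as np

# Assumed bound constraints for Attacker 1 (Table 2): [h1, h2, m1, m2, m3, e, g]
LOWER = np.array([-0.50, -0.50, -0.60, -0.60, -0.60, -0.40, -0.20])
UPPER = np.array([ 0.45,  0.45,  0.50,  0.50,  0.50,  0.40,  0.20])

def synthesize(p, Pr, V, residual):
    """Generate a new face vector from the altered parameters p (Eq. 8), keeping
    the Attacker's identity via his fixed residual component.  Pr and V are assumed
    to come from the trained synthesizer; reshaping/unwarping back to an image and
    the alpha-blended eyeglasses overlay (controlled by p[6]) are left out."""
    p = np.clip(p, LOWER, UPPER)               # keep the synthesized face realistic
    y_new = np.concatenate([p[:6], residual])  # attribute coefficients + identity part
    return Pr @ V @ y_new                      # Eq. 8
```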

4 Experiments and Results

4.1 Experiment setups

In the experiments we used the three FRS mentioned in Section 3.2. A gallery of 40 random face images from the CMU Multi-PIE dataset[12] was used for the majority of the experiments; for some experiments, different galleries (from the same dataset) were selected as required.

The Python face_recognition library and the RESNET50 implementation return euclidean and cosine distances respectively, where a lower distance is a better match. The threshold $\theta$ in Equation 1 for these two systems is $0.50$ and $0.45$ respectively; hence, any attack with a final distance below $\theta$ is considered successful. The Face++ API, on the other hand, returns a confidence (higher is better), with a minimum matching confidence of $0.65$; any attack with a higher confidence than this is considered successful. These threshold values were taken as strict cutoff values from the official implementation instructions of the systems. We also generated a ‘Genuine (match) and Impostor (non-match) Score Distribution’ (see Figure 4) for one of the FRS (Python ‘face_recognition’) with 35 subjects in each group (genuine and impostor), which justifies the selection of the thresholds.
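
The success criterion across the three systems can be summarized in a few lines; the dictionary keys below are illustrative labels of our own, not official system identifiers.

```python
THRESHOLDS = {
    "face_recognition": ("distance", 0.50),    # euclidean distance, lower is better
    "resnet50_vggface": ("distance", 0.45),    # cosine distance, lower is better
    "facepp":           ("confidence", 0.65),  # confidence, higher is better
}

def is_successful(frs_name, score):
    """Decide whether an attack fooled the given FRS, per the thresholds above."""
    kind, theta = THRESHOLDS[frs_name]
    return score <= theta if kind == "distance" else score >= theta
```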

Figure 4: Genuine (match) and Impostor (non-match) score distribution for FRS 1 (Python ‘face_recognition’). It justifies the selection of the threshold.

4.2 Break-in

In break-in attacks, the Attacker aims to be accepted by the FRS as ‘anyone’ from the gallery. The initial parameters ($p_0$) for the synthesizer are randomly selected within the range of the bound constraints of Table 2. Examples of successful break-in attacks are presented in Figure 3: the first row shows the Attackers, and the second row shows the subjects the FRS identified the Attackers as. The FRS misclassified the attack faces as the Victims in the corresponding columns.

4.3 Impersonation and Evasion

Figure 5: Examples of successful Impersonation attacks (top row: Attackers, bottom row: Victims): column (a) with Attacker 1, (b)-(c) with Attacker 2, and (d) with Attacker 3. The Attackers selected the Victims prior to the attack and successfully impersonated them.

Impersonation. In impersonation attacks, the Attacker tries to make the FRS classify him as a specific person whom he tries to impersonate. For impersonation, the initial parameter vector $p_0$ was set to the parameters obtained by decomposing the Victim’s face image with the face synthesizer (Equations 5 and 7). This is a reasonable starting point, as opposed to starting with a random $p_0$ as in break-in attacks. All three Attackers could successfully impersonate multiple targeted Victims. Figure 5 shows some of the successful impersonation attempts.
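
Putting the two initialization strategies together, a hedged sketch (reusing LOWER, UPPER, and parameters_from_face from the earlier sketches) might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def initial_parameters(attack_type, victim_vector=None, P=None, V=None):
    """Choose p0 per attack type: the Victim's decomposed parameters (Eq. 5 and 7)
    for impersonation, or a random point within the Table 2 bounds for
    break-in and evasion attacks."""
    if attack_type == "impersonation":
        return parameters_from_face(victim_vector, P, V)
    return rng.uniform(LOWER, UPPER)
```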

However, impersonation attacks succeed when the Attacker and the Victim share similar attributes (e.g. race, gender, facial hair, etc.). We also experimented with Victims of different genders and races from the Attackers and, as expected, failed to impersonate them, i.e. the target FRS did not output the Victim’s ID with a score below the threshold. Some of the Victims whom we failed to impersonate are presented in Figure 6.

Figure 6: Victims for whom impersonation failed. We found that success in impersonation attacks depended on the similarity between the Attacker and his Victims. The Victims in (a) and (b) are the targets who could not be impersonated by Attacker 1 and Attacker 2 respectively.

Evasion. Evasion is a more challenging task than the other two forms of attack, because the Attacker’s own face is also in the gallery. We experimented with the two scenarios discussed in Section 3.4. For the given gallery and synthesizer, evasion was not successful in our experiments: the FRS were not fooled and correctly identified the Attacker’s face.

4.4 Attacking FRS with Real Face

By replicating the makeup informed by the synthesized faces from some of the successful attacks, we presented the Attackers’ real faces to the FRS. Some of these results are shown in Figure 7. We could successfully break in and impersonate Victims using real faces with makeup informed by the synthesized face images.

Figure 7: The faces in the left-most column are the base real faces of Attackers 1 and 2, which did not fool the FRS. In (a), Attacker 1 successfully impersonated the Victim by presenting his actual face (not an image) to the FRS, applying the scars and mustache informed by the synthesized image. In (b), Attacker 2 applied the makeup (synthesized in Figure 3(c)) to her real face and successfully broke into the FRS, which misidentified her as the Victim shown.

4.5 Variations of Galleries

The following are some different combinations of galleries used for break-in attacks: 1. 3 random galleries of 50 individuals each. 2. 2 galleries of 40 individuals of different races. 3. 1 gallery of 50 individuals of different genders. 4. 1 gallery of 40 individuals of the same race. 5. 1 gallery of 50 individuals of the same gender. In Figure 8 we plot the number of successful attacks and the number of unique Victims from 20 attack attempts for each of the galleries listed above. The figure shows that break-in attacks succeed more often if the Attacker has the same gender and ethnicity as someone (anyone) in the gallery.

Figure 8: Impact of different galleries with varying difficulty: the number of successful attacks (out of 20 attempts) and the number of unique Victims for different variations of galleries. Break-in attacks are more successful when the gallery has subjects of the same race or gender as the Attacker.

4.6 Different Gallery Sizes

We experimented with the performance of the proposed Attack Scheme for varying gallery sizes, to find out whether the size of the gallery has any impact. We made 10 break-in attempts with Attacker 1 for gallery sizes ranging from 10 to 330. The results of these experiments are given in Figure 9.

Figure 9: Impact of gallery size. For each gallery size we plot the number of successful break-in attempts (green bars) by Attacker 1, and the average of the minimum distances (lower is better) over the 10 attempts. It is clear that a larger gallery makes break-in easier.

4.7 Variations in Face Synthesizer

Without Bound Constraints. When attack faces are synthesized without the bound constraints on the synthesis parameters (e.g. Table 2 for Attacker 1), the face images sometimes turn out to be unrealistic, because the parameter optimizer pushes the parameters too far. However, synthesizing faces without these bounds generates more successful attacks. This can be useful in scenarios where attacks on FRS can be carried out with an image rather than an actual face (e.g. an online upload). Although not realizable, these images fool the FRS successfully.

More Training Images. Training the face synthesizer with more training images (e.g. faces with different types of scars, moles, makeup, etc.) gives the synthesizer more room to generate new faces, and a higher number of successful attacks are generated. However, having more facial attributes also means there are more things to take care of during an attack with a real face. The number of combinations of training images for the face synthesizer is thus a trade-off between achieving more successful attacks and synthesizing faces that are easier to realize.

4.8 Scaling to Multiple FRS

Experiments were done with the three FRS listed in Section 3.2. Multiple experiments with all three Attackers show that these FRS are prone to misclassifying the attack faces of all three Attackers. Hence, it is evident that our Attack Scheme scales to multiple FRS and multiple Attackers.

5 Discussion and Conclusion

From the experiments, we counted that approximately $21.8\%$ of the attacks (break-in and impersonation) were successful for the three Attackers combined. Hence, it is clear that face recognition systems are vulnerable to our Attack Scheme. We showed that real perturbations of the Attackers’ faces succeed in break-in and impersonation attacks. However, evasion was not as successful as the other two types of attack. Moreover, impersonation attacks depend on the similarity between the Attacker and the Victim, and getting a good initial parameter vector plays an important role in finding an effective final image. We noticed that increasing the gallery size improves the chances of success, as does a gallery with faces of similar race and gender to the Attacker. Finally, the Attackers were successful in attacking with real faces realized from synthesized images. This shows that our Attack Scheme successfully fooled the FRS in black-box settings with real faces.

Our work makes a clarion call for urgent research to address these vulnerabilities in FRS. We hope other researchers will take up this challenge.

References

  • [1] https://faceplusplus.com.
  • [2] A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok. Synthesizing robust adversarial examples. arXiv preprint arXiv:1707.07397, 2017.
  • [3] W. Brendel, J. Rauber, and M. Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248, 2017.
  • [4] T. B. Brown, D. Mané, A. Roy, M. Abadi, and J. Gilmer. Adversarial patch. arXiv preprint arXiv:1712.09665, 2017.
  • [5] P.-Y. Chen, H. Zhang, Y. Sharma, J. Yi, and C.-J. Hsieh. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 15–26. ACM, 2017.
  • [6] Y. Choi, M. Choi, M. Kim, J.-W. Ha, S. Kim, and J. Choo. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8789–8797, 2018.
  • [7] T. I. Dhamecha, R. Singh, M. Vatsa, and A. Kumar. Recognizing disguised faces: Human and machine evaluation. PloS one, 9(7):e99212, 2014.
  • [8] Y. Dong, H. Su, B. Wu, Z. Li, W. Liu, T. Zhang, and J. Zhu. Efficient decision-based black-box adversarial attacks on face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7714–7722, 2019.
  • [9] A. Geitgey. Face recognition, 2017.
  • [10] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
  • [11] G. Goswami, N. Ratha, A. Agarwal, R. Singh, and M. Vatsa. Unravelling robustness of deep learning based face recognition against adversarial attacks. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  • [12] R. Gross, I. Matthews, J. Cohn, T. Kanade, and S. Baker. Multi-pie. Image and Vision Computing, 28(5):807–813, 2010.
  • [13] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • [14] A. Ilyas, L. Engstrom, A. Athalye, and J. Lin. Black-box adversarial attacks with limited queries and information. arXiv preprint arXiv:1804.08598, 2018.
  • [15] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
  • [16] J. Li, Y. Wong, and T. Sim. Learning controllable face generator from disjoint datasets. In Computer Analysis of Images and Patterns - 18th International Conference, CAIP 2019, Salerno, Italy, September 3-5, 2019, Proceedings, Part I, pages 209–223, 2019.
  • [17] V. Mirjalili, S. Raschka, A. Namboodiri, and A. Ross. Semi-adversarial networks: Convolutional autoencoders for imparting privacy to face images. In 2018 International Conference on Biometrics (ICB), pages 82–89. IEEE, 2018.
  • [18] V. Mirjalili, S. Raschka, and A. Ross. Gender privacy: An ensemble of semi adversarial networks for confounding arbitrary gender classifiers. In 2018 IEEE 9th International Conference on Biometrics Theory, Applications and Systems (BTAS), pages 1–10. IEEE, 2018.
  • [19] V. Mirjalili and A. Ross. Soft biometric privacy: Retaining biometric utility of face images while perturbing gender. In 2017 IEEE International joint conference on biometrics (IJCB), pages 564–573. IEEE, 2017.
  • [20] O. Nikisins, A. Mohammadi, A. Anjos, and S. Marcel. On effectiveness of anomaly detection approaches against unseen presentation attacks in face anti-spoofing. In 2018 International Conference on Biometrics (ICB), pages 75–81. IEEE, 2018.
  • [21] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia conference on computer and communications security, pages 506–519. ACM, 2017.
  • [22] O. M. Parkhi, A. Vedaldi, A. Zisserman, et al. Deep face recognition. In bmvc, volume 1, page 6, 2015.
  • [23] R. Raghavendra, K. B. Raja, and C. Busch. Presentation attack detection for face recognition using light field camera. IEEE Transactions on Image Processing, 24(3):1060–1075, 2015.
  • [24] R. Raghavendra, K. B. Raja, S. Venkatesh, F. A. Cheikh, and C. Busch. On the vulnerability of extended multispectral face recognition systems towards presentation attacks. In 2017 IEEE International Conference on Identity, Security and Behavior Analysis (ISBA), pages 1–8. IEEE, 2017.
  • [25] R. Ramachandra and C. Busch. Presentation attack detection methods for face recognition systems: A comprehensive survey. ACM Computing Surveys (CSUR), 50(1):8, 2017.
  • [26] U. Scherhag, R. Raghavendra, K. B. Raja, M. Gomez-Barrero, C. Rathgeb, and C. Busch. On the vulnerability of face recognition systems towards morphed face attacks. In 2017 5th International Workshop on Biometrics and Forensics (IWBF), pages 1–6. IEEE, 2017.
  • [27] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 1528–1540. ACM, 2016.
  • [28] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter. Adversarial generative nets: Neural network attacks on state-of-the-art face recognition. arXiv preprint arXiv:1801.00349, 2017.
  • [29] T. Sim and L. Zhang. Controllable face privacy. In 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), volume 4, pages 1–8. IEEE, 2015.
  • [30] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.