
Deep convolutional neural networks for face and iris presentation attack detection: Survey and case study

Yomna Safaa El-Din¹ (corresponding author), Mohamed N. Moustafa² and Hani Mahdi¹
[email protected]
¹ Department of Computer and Systems Engineering, Ain Shams University, Cairo, Egypt
² Department of Computer Science and Engineering, The American University in Cairo, New Cairo, Egypt
Abstract

Biometric presentation attack detection is gaining increasing attention. Users of mobile devices find it more convenient to unlock their smart applications with finger, face or iris recognition instead of passwords. In this paper, we survey the approaches presented in the recent literature to detect face and iris presentation attacks. Specifically, we investigate the effectiveness of fine-tuning very deep convolutional neural networks for the task of face and iris anti-spoofing. We compare two different fine-tuning approaches on six publicly available benchmark datasets. Results show the effectiveness of these deep models in learning discriminative features that can tell apart real from fake biometric images with very low error rates. Cross-dataset evaluation on face PAD showed better generalization than the state of the art. We also performed cross-dataset testing on iris PAD datasets in terms of equal error rate, which has not been reported in the literature before. Additionally, we propose the use of a single deep network trained to detect both face and iris attacks, and observed no accuracy degradation compared to networks trained for a single biometric separately. Finally, we analyzed the features learned by the network, in correlation with the image frequency components, to justify its prediction decisions.

1 Introduction

Biometric recognition has been increasingly used in recent applications that need authentication and verification. Instead of the usual username and password or token-based authentication, the use of biometric traits like face, iris or fingerprints is more convenient to users and has therefore gained a lot of popularity, especially after the wide spread of smartphones and their use in many areas, such as payment. However, as technology advances, spoofing these biometric traits has become easier. For example, an attacker can use a photograph or a video replay of the face or iris of a person to gain access to his/her smartphone or any application that needs authorization. Such an act is referred to by the ISO/IEC 30107-1:2016 standard as a presentation attack (PA), and the biometric or object used in a PA is called an artifact or presentation attack instrument (PAI). This has led to increased interest in designing algorithms that guard against these attacks, called presentation attack detection (PAD) algorithms.

A lot of research has been done in this area, starting with methods that depend on designing and extracting hand-crafted features from the acquired images and then using conventional classifiers to separate bona-fide (real) from attack presentations. The literature shows that methods relying on manually engineered features are suitable for solving the PAD problem for face and iris recognition systems. However, the design and selection of hand-crafted feature extractors is mainly based on the researchers' expert knowledge of the problem. Consequently, these features often reflect limited aspects of the problem and are often sensitive to varying acquisition conditions, such as camera devices, lighting conditions and PAIs. This causes their detection accuracy to vary significantly among different databases, indicating that hand-crafted features have poor generalizability and so do not completely solve the PAD problem.

In recent years, deep learning has evolved, and the use of deep neural networks or convolutional neural networks (CNNs) has proven effective in many computer vision tasks, especially with the availability of new advanced hardware and large data. CNNs have been successfully used for vision problems like image classification and object detection. This has encouraged many researchers to incorporate deep learning in the PAD problem and rely on a deep network or CNN to learn features that discriminate between bona-fide and attack face 319_18_f ; 317_18_f or iris 305_18_i ; 306_18_i samples, or both 312_18_fi , instead of using hand-crafted features. Several of these works used a CNN only for feature extraction followed by a classic classifier like SVM; others designed custom networks or combined information from CNNs and hand-crafted features.

Several surveys for presentation attack detections in either face or iris recognition are available in literature. Czajka and Bowyer provided a thorough assessment of the state-of-the-art for iris PAD in 183_18 and concluded that PAD for iris recognition is not a solved problem yet. Their main focus was on iris presentation attack categories, their countermeasures, competitions and applications.

For PAD in face recognition systems, Raghavendra and Busch provided a comprehensive survey in 526_17_f describing different types of presentation attacks and face artifacts, and showing the vulnerability of commercial face recognition systems to presentation attacks. They surveyed and evaluated fourteen state-of-the-art face PAD algorithms on the CASIA face-spoofing database and discussed the remaining challenges and open issues for a robust face presentation attack detection system. In 503_17_f , Ghaffar and Mohd focused their review on face PAD systems on smartphones, with recent face datasets captured with mobile devices.

Later, Li et al. 502_18_f provided a comprehensive review of more recent PAD algorithms in face recognition, discussing the available datasets and reported results. They proposed a color-LBP based countermeasure as a case study and evaluated it on two benchmark face datasets highlighting the importance of cross-dataset testing.

In this work, we survey most of the recent PAD approaches proposed in literature for both face and iris recognition applications, along with the competitions and benchmark datasets. Then for a case study, we assess the performance of recent well-known deep CNN architectures on the task of face and iris presentation attack detection.

For assessment, we use three different face datasets and three iris datasets. Using just the RGB images as input, we fine-tune the deep networks' weights, then perform intra- and cross-dataset evaluation to discuss the generalization ability of these deep models. The experiments are first done with separate models trained for each of the face and iris biometrics; we then experiment with training a single network to tell apart bona-fide from attack samples whether the sample is a face or an iris image. Such a generic PAD network could perfectly differentiate between the real and attack images. The specific features learned by this network are analyzed, showing that they are highly related to high-frequency regions in the input images. We finally use gradient-weighted class activation maps gradcam to visualize what the network focuses on when deciding whether a sample is bona-fide or not.

Hence, the main contributions of this work are: (1) surveying the recent methods for presentation attack detection in both face and iris modalities, along with competitions and datasets; (2) demonstrating the effectiveness of CNNs for the PAD problem by fine-tuning and comparing recent deep CNN architectures on three benchmark face datasets and three benchmark iris datasets, experimenting with two fine-tuning approaches and applying cross-dataset evaluation; (3) training a single network for presentation attack detection of the two biometric modalities, and comparing the results with networks trained on a single biometric; (4) analyzing specific features learned by the commonly-trained network; and (5) visualizing the class activation maps of the trained network on sample bona-fide and attack images.

The rest of the paper is organized as follows: we first review the PAD approaches, which can broadly be categorized into active (Section 2) vs. passive (Section 3) techniques. In Section 4, we summarize the face and iris PAD competitions and benchmark databases, followed by Section 5 where we explain the proposed PAD approach. Experiments, results and analysis are presented in Section 6, and finally conclusion and future work in Section 7.

2 Active PAD

Active presentation attack detection methods can be either hardware-based or challenge-response based; both are detailed in this section.

2.1 Hardware-based

Such methods depend on capturing the automatic response of the biometric trait (face/iris) to a certain stimulus. Examples in iris liveness detection include the use of eye hippus, which is the permanent oscillation of the eye pupil even under uniform lighting conditions, or the dilation of the pupil in reaction to sudden illumination changes 111_04 ; 104_12 ; 107_06 ; 150_15 ; 172_14 . Variations of pupil dilation were calculated by Bodade et al. 148_09 from multiple iris images, while Huang et al. 149_13 used pupil constriction to detect iris liveness. Other researchers used iris image brightness variations after light stimuli in some predefined iris regions 119_07 , or the reflections of infrared light on the moist cornea when stimulated with light sources positioned randomly in space. Lee et al. 145_10 used additional infrared sensors during iris image acquisition to construct the full 3D shape of the iris to aid in the detection of fake iris images or contact lenses. These approaches rely on an external device added to the sensor/camera, hence the name "hardware-based techniques". However, not all hardware-based methods are active; some such approaches detect facial thermograms hwp1_15_f ; hwp2_15_f , blood pressure, fingerprint sweat, gait, etc., passively in a non-intrusive manner.

2.2 Challenge-response

In these methods, the user is required to respond to a "challenge" instructed by the system. The system requests the user to perform some specific movement like "blink right eye" or "rotate the head clockwise" 531_11_f ; 562_15_f ; 528_08_f or gaze towards a predefined stimulus 563_13_f .

2.3 Drawbacks of active PAD methods

Although active spoof detection methods may provide higher presentation attack detection rates and robustness against different presentation attack types, they are more expensive and less convenient to users than software-based approaches, which do not require any additional devices besides the standard camera. For the rest of this survey we focus on software-based methods; they are more generic, less expensive, less intrusive and can easily be incorporated in real-world applications and smartphones. Table 2.3 summarizes the different approaches used in the literature for the detection of presentation attacks.

Table: PAD methods in literature.

Active (hardware-based or challenge-response)
  Examples: eye hippus or pupil dilation 111_04 ; 104_12 ; 107_06 ; 150_15 ; 172_14 ; 148_09 ; 149_13 ; 119_07 ; 145_10 ; challenges such as "blink eye" or "rotate head" 531_11_f ; 562_15_f ; 528_08_f ; 563_13_f .
  Advantages: robust against different PA types; higher PA detection rates.
  Weaknesses: more expensive; less convenient to users; rely on an external device.

Passive (software-based): liveness-detection
  Examples: eye blinking 529_09_f ; 528_08_f ; 513_13_f ; 506_07_f ; 318_16_f ; facial expression changes, head and/or mouth movements 521_12_f ; 520_08_f ; 519_07_f ; eye movements and gaze estimation for iris 152_13 ; 174_15 ; 175_14 ; 178_15 .
  Advantages: utilize features extracted from consecutive frames; detect signs of life in the captured biometric data.
  Weaknesses: can be easily fooled by a replay video, or by adding liveness properties to a still face image by cutting out the eye/mouth parts.

Passive (software-based): recapturing-detection
  Examples: detect abnormal movements in the video due to a hand holding the presenting media 523_16_f ; 507_10_f ; track the relative movements between different facial parts 508_09_f ; 509_09_f ; motion magnification 511_16_f ; 513_13_f ; visual rhythm analysis 535_12_f ; contextual analysis 531_11_f ; 530_13_f ; 521_12_f ; varF_13_f ; 3D reconstruction thrD_13_f .
  Advantages: very effective for replay attacks.
  Weaknesses: fail if the attack is done using an artifact like a mask.

Passive (software-based): texture-based (hand-crafted features)
  Examples: texture-reflectance 538_13_f ; 537_13_f ; 536_13_f ; frequency 518_04_f ; 507_10_f ; b_12_f ; 534_13_f ; 109_13 ; image quality analysis 104_12 ; 108_09 ; 119_07 ; 155_14 ; 164_14 ; 514_16_f ; 515_15_f ; 516_14_f ; 500_18_fi ; 505_17_fi ; color analysis 515_15_f ; 550_15_f ; 545_16_f ; 546_17_f ; compet_f3 ; 518_04_f ; 187_16 ; 555_18_f ; local descriptors 533_11_f ; 543_13_f ; 544_15_fi ; 552_13_f ; 540_13_f ; 532_11_f ; 547_12_f ; 548_13_f ; 549_12_f ; 556_15_f ; 553_16_f ; 546_17_f ; 541_11_f ; 530_13_f ; 310_15_f ; 554_11_f ; 507_10_f ; 161_15 ; 171_15 ; 558_14_fi ; fusion 542_11_f ; 541_11_f ; 540_13_f ; 543_13_f ; 311_16_f ; 558_14_fi ; 305_18_i ; 306_18_i .
  Advantages: detect effects caused by printers and display devices; faster than temporal or motion-based analysis; operate on only one image sample, or a selection of images, instead of a video sequence.
  Weaknesses: design and selection of feature extractors is based on expert knowledge; sensitive to varying acquisition conditions, such as camera devices, lighting conditions and PAIs; poor generalizability.

Passive (software-based): learned features
  Examples: extract deep features, classify with a classic classifier 315_16_f ; 318_16_f ; 304_18_f ; combine with temporal information 311_16_f ; 310_15_f ; other 301_17_f ; 300_18_f ; 302_18_f ; 313_14_f ; 319_18_f ; 314_17_f ; 316_17_f ; 317_18_f ; for iris 305_18_i ; 306_18_i ; 307_17_i ; 309_15_i ; 320_18_i ; 321_18_i ; face and iris 308_15_fi ; 312_18_fi .
  Advantages: instead of hand-engineering features, a deep network learns features that discriminate between bona-fide and attack samples; can be used jointly with hand-crafted features or motion analysis.
  Weaknesses: requires more data; can fail to generalize to unseen sensors.

3 Passive Software-based PAD

Passive software-based anti-spoofing methods can be categorized into several groups: temporal, or motion-analysis, based approaches, which utilize features extracted from consecutive frames captured of the face or iris; and texture-based methods, which use a single image of the biometric trait and do not depend on motion features. Texture-based PAD methods can further be classified into a group that depends on hand-crafted features and another group of recent methods that opt to learn these discriminative features.

3.1 Liveness-detection

The earliest category includes approaches that identify whether the presented biometric is live or an artifact; such approaches are known as "liveness-detection". An artifact can be a mask or paper in the case of face spoofing, and paper or a glass eye for the iris. These methods depend on detecting signs of life in the captured biometric data. Examples in the case of face PAD are eye blinking 529_09_f ; 528_08_f ; 513_13_f ; 506_07_f ; 318_16_f , facial expression changes, and head and/or mouth movements 521_12_f ; 520_08_f ; 519_07_f . For iris PAD, Komogortsev et al. 152_13 extracted liveness cues from eye movement in a simulated scenario of an attack by a mechanical replica of the human eye. The authors further investigated this line and published more iris liveness detection methods, based on eye movements and gaze estimation, in later years 174_15 ; 175_14 ; 178_15 .

Liveness-detection methods are considered temporal-based and can be easily fooled by a replay video of the face or iris or by adding liveness properties to a still face image by cutting the eye/mouth parts.

3.2 Recapturing-detection / contextual information

Another approach that belongs to the temporal-based category is methods that simply detect whether the biometric data is a replay of a recorded sample, referred to as recapturing-detection. These methods analyze the scene and environment to detect abnormal movements in the video due to a hand holding the presenting 2D mobile device or paper 523_16_f ; 507_10_f . Such movements differ from the movement of a real face against a static background. These techniques are sometimes referred to as motion analysis, depending on motion cues from planar objects. Some use optical flow to capture and track the relative movements between different facial parts to determine spoofing 508_09_f ; 509_09_f , while other works use motion magnification 511_16_f ; 513_13_f , Haralick features 510_16_f or visual rhythm analysis 535_12_f . Other types depend on contextual analysis of the scene 531_11_f ; 530_13_f ; 521_12_f ; varF_13_f or 3D reconstruction thrD_13_f .

The recapture detection methods fail if the attack is done using an artifact like a mask, because there is no replay to detect. A better and more generic approach is therefore texture-based methods, which utilize features that can tell the difference between real live biometric data and a replay or an artifact presented to the sensor.

3.3 Texture-based (Hand-crafted features)

The category of texture-based methods assumes that the texture, color and reflectance characteristics captured by genuine face images are different from those resulting from presentation attacks. For example, blur and other effects caused by printers and display devices lead to detectable color artifacts and texture patterns, which can then be explored to detect presentation attacks.

Researchers use handcrafted image feature extraction methods to extract image features from face or iris images. They then use a classification method such as support vector machines (SVM) to classify images into two classes of bona-fide or presentation attack based on the extracted image features. Such techniques are faster than temporal or motion-based analysis, as they can operate with only one image sample, or a selection of images, instead of having to analyze a complete video sequence.
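As an illustration of this pipeline, the sketch below (our own minimal example, not the exact setup of any surveyed work) computes a uniform LBP histogram from a grayscale face or iris crop and classifies it with an SVM; the variable names and parameter values are assumptions.

```python
# Minimal hand-crafted-feature PAD sketch: uniform LBP histogram + SVM.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_histogram(gray_img, P=8, R=1):
    """Uniform LBP histogram of a grayscale image."""
    lbp = local_binary_pattern(gray_img, P, R, method="uniform")
    n_bins = P + 2  # P+1 uniform patterns plus one "non-uniform" bin
    hist, _ = np.histogram(lbp, bins=n_bins, range=(0, n_bins), density=True)
    return hist

# X_train: list of grayscale crops, y_train: 1 = bona-fide, 0 = attack (assumed data)
# feats = np.stack([lbp_histogram(img) for img in X_train])
# clf = SVC(kernel="rbf", probability=True).fit(feats, y_train)
# score = clf.predict_proba(lbp_histogram(test_img)[None, :])[:, 1]
```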

3.3.1 Texture-reflectance

These techniques analyze appearance properties of face or iris, such as the face reflectance or texture. The reflectance and texture of a real face or iris are different than those acquired from a spoofing material; either printed paper, screen of a mobile device, mask 538_13_f ; 537_13_f ; 536_13_f or glass eye.

3.3.2 Image Quality analysis

Techniques that depend on analysis of image quality try to detect the presence of image distortion usually found in the spoofed face or iris image.

(1) Deformation for Face

For example, in print attacks, the face might appear skewed with a deformed shape if the imposter bends the paper while holding it.

(2) Frequency

This approach is based on the assumption that the reproduction of previously captured images or videos affects the frequency content of the image displayed in front of the sensor. Attack images or videos have more blur and less sharpness than genuine ones, so their high-frequency components are attenuated. Hence, frequency-domain analysis can be used as a face PAD method, as in 518_04_f ; 507_10_f ; b_12_f , or frequency entropy analysis by Lee et al. 534_13_f . For iris, Czajka 109_13 analyzed printing regularities left in printed irises and explored peaks in the frequency spectrum. Frequency-based approaches may fail when high-quality spoof images or videos are presented to the camera.
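The following sketch illustrates the underlying idea with a simple spectral measure: the fraction of spectral energy outside a low-frequency disc, which tends to drop for blurred, recaptured images. The radius fraction and any decision threshold are assumptions, not values taken from the cited works.

```python
# Hedged sketch of a frequency-domain liveness cue.
import numpy as np

def high_freq_ratio(gray_img, radius_frac=0.25):
    f = np.fft.fftshift(np.fft.fft2(gray_img.astype(np.float64)))
    mag = np.abs(f) ** 2
    h, w = mag.shape
    yy, xx = np.ogrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2)
    low = r <= radius_frac * min(h, w)   # low-frequency disc around the DC term
    return mag[~low].sum() / (mag.sum() + 1e-12)

# ratio = high_freq_ratio(img); flag as attack if ratio falls below a tuned threshold
```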

(3) Quality metrics

Some other methods explore the potential of quality assessment to identify real and fake face or iris samples acquired from a high-quality printed image, on the assumption that many image quality metrics are distorted by the display device or paper. Some works used individual quality-based features, while others used a combination of quality metrics to better detect spoofing and differentiate between bona-fide and attack iris 104_12 ; 108_09 ; 119_07 ; 155_14 ; 164_14 or face 514_16_f ; 515_15_f ; 516_14_f samples, or both 500_18_fi ; 505_17_fi . For example, Galbally et al. 104_12 investigated 22 iris-specific quality features and used feature selection to choose the best combination of features to discriminate between live and fake iris images. Later, in a follow-up work 565_14_fi , they assessed 25 general image-quality metrics and applied the method not only to iris PAD but also to face and fingerprint PAD. A different approach, by Garcia et al. 524_15_f , was to analyze the moiré patterns that may be produced when an image displayed on a mobile device is recaptured.

(4) Color analysis

Another type of algorithm utilizes the fact that the distribution of color in images taken from a bona-fide face or iris is different from the distribution found in images taken from printed papers or display devices. Methods for face PAD adopt solutions in an input domain other than the RGB space in order to improve robustness to illumination variation, such as the HSV and YCbCr color spaces 515_15_f ; 550_15_f ; 545_16_f ; 546_17_f ; compet_f3 or the Fourier spectrum 518_04_f . For iris PAD, color adaptive quantized patterns were used in 187_16 instead of a gray-textured image. Boulkenafet et al. studied the generalization of color-texture analysis methods in 555_18_f .
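A minimal sketch of such color analysis is given below, computing per-channel histograms in the HSV and YCbCr spaces with OpenCV; the number of bins and the choice to concatenate normalized histograms are our assumptions, not the exact features of the cited papers.

```python
# Illustrative color-distribution features in HSV and YCbCr spaces.
import cv2
import numpy as np

def color_histograms(bgr_img, bins=32):
    feats = []
    for code in (cv2.COLOR_BGR2HSV, cv2.COLOR_BGR2YCrCb):
        converted = cv2.cvtColor(bgr_img, code)
        for ch in range(3):
            h = cv2.calcHist([converted], [ch], None, [bins], [0, 256]).ravel()
            feats.append(h / (h.sum() + 1e-12))  # normalized per-channel histogram
    return np.concatenate(feats)  # 2 color spaces x 3 channels x `bins` values
```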

3.3.3 Local-descriptors

A different group of methods use hand-crafted features which extract local-texture for PAD 533_11_f ; 543_13_f . A large variety of local image descriptors have been compared with respect to their ability to identify spoofed iris, fingerprint, and face data 544_15_fi and some highly successful texture descriptors have been extensively used for this purpose. For example, in face PAD methods, descriptors such as local binary pattern (LBP) 552_13_f ; 540_13_f ; 532_11_f ; 547_12_f ; 548_13_f ; 549_12_f ; 556_15_f , SIFT 553_16_f , SURF 546_17_f , HoG 548_13_f ; 549_12_f ; 541_11_f ; 530_13_f ; 310_15_f , DoG 554_11_f ; 507_10_f and GLCM 541_11_f have been studied showing good results in discriminating real from attack face images. For PAD in iris recognition, boosted LBP was first used by He et al. 114_09 , weighted LBP by Zhang et al. 124_10 for contact lens detection and later, binarized statistical image features (BSIF) 161_15 ; 171_15 . Local phase quantization (LPQ) was investigated for biometric PAD in general, including both face and iris, by Gragnaniello et al. 544_15_fi , and census transform (CT) was used for mobile PAD in 558_14_fi . Such local texture descriptors were then used with a traditional classifier like SVM, LDA, neural networks and Random Forest.

3.3.4 Drawback of hand-crafted texture-based methods

Results of the above methods show that manually engineered features are suitable for solving the PAD problem for face and iris recognition systems. However, their drawback is that the design and selection of hand-crafted feature extractors is mainly based on the researchers' expert knowledge of the problem. Consequently, these features often reflect only limited aspects of the problem and are often sensitive to varying acquisition conditions, such as camera devices, lighting conditions and presentation attack instruments (PAIs). This causes their detection accuracy to vary significantly among different databases, indicating that hand-crafted features have poor generalizability and so do not completely solve the PAD problem. The available cross-database tests in the literature suggest that the performance of hand-engineered texture-based techniques can degrade dramatically when operating in unknown conditions. This leads to the need to automatically extract meaningful features directly from the data using deep representations to assist in the task of presentation attack detection.

3.4 Fusion

Presentation attack detection accuracy can be enhanced by using feature-level or score-level fusion to obtain a more general countermeasure effective against various types of presentation attack. For face PAD, Tronci et al. 542_11_f propose linear fusion of static and video analysis at the frame and video level. In 541_11_f , Schwartz et al. introduce feature-level fusion using Partial Least Squares (PLS) regression based on a set of low-level feature descriptors. Other works 540_13_f ; 543_13_f obtain an effective fusion scheme by measuring the level of independence of two anti-counterfeiting systems, while Feng et al. 311_16_f integrate both quality measures and motion cues and use neural networks for classification.

In the case of iris anti-spoofing, Akhtar et al. 558_14_fi proposed decision-level fusion of several local descriptors in addition to global features. More recently, Nguyen et al. 305_18_i ; 306_18_i explored both feature-level and score-level fusion of hand-crafted features with CNN-extracted features, and obtained the lowest error using the proposed score-level fusion methodology.
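The sketch below illustrates score-level fusion in its simplest weighted-sum form; the weight and threshold are hypothetical and not the scheme tuned in the cited works.

```python
# Minimal score-level fusion sketch: combine a CNN score and a hand-crafted
# feature score into a single PAD decision.
def fuse_scores(cnn_score, handcrafted_score, w=0.6, threshold=0.5):
    """Both scores are probabilities of the bona-fide class in [0, 1]."""
    fused = w * cnn_score + (1.0 - w) * handcrafted_score
    return fused, fused >= threshold  # (fused score, bona-fide decision)
```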

3.5 Smartphones

In the last few years, research on smartphone and mobile-device PAD has been gaining popularity. Many different algorithms have been developed, and datasets were released to test their performance. These databases became publicly available; in this section we list some of them together with the algorithms proposed by their authors.

One of the first databases dedicated to face PAD on smartphones is MSU-MFSD 515_15_f , whose authors, Wen et al., used a PAD method based on image distortion analysis. Later, the same group released MSU-USSA 553_16_f and used a combination of texture analysis (LBP and color) and image analysis techniques. In 2016, the IDIAP Research Institute released the Replay-Mobile 564_16_f database; again, a combination of image quality analysis and texture analysis (Gabor jets) was used as the proposed PAD.

In the field of iris spoofing, the MobBioFake dataset 155_14 ; 166_14 , presented in 2014, was the first to use the visible spectrum instead of near-infrared for iris spoofing, with a mobile device as the acquisition sensor. Sequeira et al. 155_14 used a texture-based iris PAD algorithm; they combined several image quality features at the feature level and used SVM for classification. Gragnaniello et al. 154_15 proposed a solution based on local descriptors but with very low complexity and low CPU power requirements, suitable for mobile applications. They evaluated it on the MobBioFake and MICHE 559_14_fi datasets and achieved good performance.

3.6 Learned-features

In recent years, deep learning has evolved, and the use of deep neural networks or convolutional neural networks (CNNs) has proved effective in many computer vision tasks, especially with the availability of new advanced hardware and large data. CNNs have been successfully used for vision problems like image classification and object detection. This has encouraged many researchers to incorporate deep learning in the PAD problem: instead of using hand-crafted features, they rely on a deep network or CNN to learn features that discriminate between bona-fide and attack face 311_16_f ; 313_14_f ; 315_16_f ; 318_16_f ; 319_18_f ; 314_17_f ; 316_17_f ; 317_18_f or iris 305_18_i ; 306_18_i ; 307_17_i ; 309_15_i samples, or both 308_15_fi ; 312_18_fi . Some proposed hybrid features that combine information from both hand-crafted and deeply-learnt features.

Several works use a pretrained CNN as a feature extractor, then use a conventional classifier such as SVM for classification, as in 315_16_f . Li et al. 315_16_f fine-tune a VGG CNN model pretrained on ImageNet p1_15 , use it to extract features, reduce dimensionality with principal component analysis (PCA), then classify with SVM. Combining deep features with motion cues was proposed in several face video spoof detection works. In 318_16_f , Patel et al. fine-tuned a pretrained VGG for texture feature extraction in addition to using frame differences as a motion cue. Similarly, Nguyen et al. 304_18_f used VGG for deep feature extraction, fused it with multi-level LBP features, and used SVM for final classification. Feng et al. 311_16_f combine image quality information with motion information from optical flow as input to a neural network for classification of bona-fide or attack faces. Xu et al. propose an LSTM-CNN architecture to utilize temporal information for binary classification in 310_15_f .
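To make this transfer-learning pipeline concrete, the sketch below (our illustration, not the exact configuration of 315_16_f ) extracts deep features from an ImageNet-pretrained VGG16 with Keras, reduces them with PCA and classifies them with an SVM; the layer choice ("fc2"), PCA dimensionality and data variables are assumptions.

```python
# Pretrained CNN as a fixed feature extractor, followed by PCA + SVM.
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.models import Model
from sklearn.decomposition import PCA
from sklearn.svm import SVC

vgg = VGG16(weights="imagenet", include_top=True)
extractor = Model(vgg.input, vgg.get_layer("fc2").output)  # 4096-D features

def deep_features(images_uint8):
    """images_uint8: (N, 224, 224, 3) RGB array."""
    x = preprocess_input(images_uint8.astype(np.float32))
    return extractor.predict(x, verbose=0)

# feats = deep_features(train_images)          # assumed training data
# pca = PCA(n_components=128).fit(feats)
# clf = SVC(kernel="rbf").fit(pca.transform(feats), train_labels)
```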

Atoum et al. 301_17_f propose a two-stream CNN-based face PAD method using texture and depth; they utilized both the full face and patches from the face in color spaces other than RGB. They evaluated on three benchmark databases but did not perform cross-dataset evaluation. Later, Liu et al. 300_18_f extended and refined this work by learning additional auxiliary information, the rPPG signal, from the face video and evaluated their depth-supervised learning approach on the more recent datasets OULU-NPU oulu and SiW. Wang et al. 302_18_f did not consider depth as an auxiliary supervision as in 300_18_f ; instead, they used multiple RGB frames to estimate face depth information and used two modules to extract short- and long-term motion. Four benchmark datasets were used to evaluate their approach.

The use of a CNN by itself to classify attack vs. bona-fide iris images has also been investigated in recent years. Andrey K. et al. 320_18_i proposed a lightweight CNN on binarized statistical image features extracted from the iris image; they evaluated their algorithm on LivDet-Warsaw 2017 182_17 and three other benchmark datasets. In 321_18_i , Hoffman et al. proposed to use 25 patches of the iris image as input to the CNN instead of the full iris image. A patch-level fusion method is then used to fuse scores from the input patches based on the percentage of iris and pupil pixels inside each patch. They performed cross-dataset evaluation on the LivDet-Iris 2015 Warsaw, CASIA-Iris-Fake, and BERC-Iris-Fake datasets and used the true detection rate (TDR) at 0.2% false detection rate (FDR) as their evaluation metric.

4 Competitions and Databases

In this section, we summarize the face and iris PAD competitions held, in addition to the benchmark datasets used to evaluate the performance of face and iris presentation attack detection solutions. All these databases are publicly available upon request from the authors. Table 4.1 shows a summary of the six datasets used.

4.1 Competitions

Since 2011, several competitions were held to assess the status of presentation attack detection for either face or iris. Such competitions were very useful for collecting new benchmark datasets and creating a common framework for evaluating the different PAD methods.

For face presentation attack detection, three competitions have been carried out since 2011. The first competition was held in 2011 compet_f1 on the IDIAP Print-Attack dataset 533_11_f with six competing teams; the winning methodology used hand-engineered texture-based approaches, which proved very effective for detecting print attacks. Two years later, the second face PAD competition was carried out in 2013 compet_f2 with 8 participating teams, on the Replay-Attack dataset 547_12_f . Most of the teams relied only on texture-based methods, while 3 teams used hybrid approaches combining both texture- and motion-based countermeasures. Two of these hybrid approaches reached 0% HTER on the test set of the Replay-Attack database.

Later, in 2017, a competition was held on generalized software-based face presentation attack detection in mobile scenarios compet_f3 . Thirteen teams participated and four protocols were used for evaluation. The winning team, which showed the best generalization to unseen cameras and attack types, used a combination of color, texture and motion-based features.

More recently, after the release of the large-scale multi-modal face anti-spoofing dataset CASIA-SURF 303_19_f , the ChaLearn LAP Multi-Modal Face Anti-spoofing Attack Detection Challenge chalearn_19_f was held in 2019. Thirteen teams qualified for the final round, with all submitted face PAD solutions relying on CNN-based feature extractors. The top 3 teams utilized ensembles of feature extractors and classifiers. For future work, the challenge paper suggested taking full advantage of the multi-modal dataset by defining a set of cross-modal testing protocols, as well as introducing 3D mask attacks in addition to the 2D attacks. These recommendations were fulfilled the following year, at the ChaLearn Multi-modal Cross-ethnicity Face Anti-spoofing Recognition Challenge, using the CASIA-SURF CeFA db_19_f dataset.

For iris presentation attack detection, the first competition was LivDet-Iris 2013 159_14 , and two sequels were held in 2015 181_15 and 2017 182_17 . Images in LivDet-Iris included iris printouts and textured contact lenses, the latter being more difficult to detect than printouts. For LivDet-Iris 2017, the test images were split into known and unknown parts, where the known part contained images taken under similar conditions and with the same sensor as the training images, while the unknown partition consisted of images taken in a different environment with different properties. Results on the known partition were much better than those on the unknown part. In 2014, another competition was held specifically for mobile authentication: the Mobile Iris Liveness Detection Competition (MobILive 2014) 156_14 . A mobile device was used to capture images of iris printouts in visible light (the MobBioFake dataset), not in the near-infrared light used by the datasets of the LivDet-Iris competitions. Six teams participated and the winning team achieved perfect detection of presentation attacks. For a more thorough assessment of these iris competitions, refer to Czajka and Bowyer 183_18 .

Table: Properties of the face and iris databases used.

Database (type)         Samples (attack + bona-fide)   Resolution (W × H)   FPS / video duration
Replay-Attack (face)    1000 + 200 videos              320 × 240            25 / 9-15 s
MSU-MFSD (face)         210 + 70 videos                720 × 480            30 / 9-15 s
Replay-Mobile (face)    640 + 390 videos               720 × 1280           30 / 10-15 s
Warsaw 2013 (iris)      815 + 825 images               640 × 480            -
ATVS-FIr (iris)         800 + 800 images               640 × 480            -
MobBioFake (iris)       800 + 800 images               250 × 200            -

Table: Face database details.

Replay-Attack (2012)
  Authentication sensor: built-in webcam of a MacBook laptop (320 × 240).
  PA artifact: high-res photo and 720p high-def video, both captured with a Canon PowerShot SX150 IS.
  PAI: (1) print on A4 paper, (2) video replay on a mobile phone (iPhone), (3) video replay on a tablet (iPad).
  Lighting conditions (bona-fide / attack): 2 / 2.
  Subjects (train/dev/test): 50 (15/15/20).
  Videos per subject (bona-fide + attack): 4 + 20.
  Videos per subset: train 60+300, dev 60+300, test 80+400.

CASIA-FASD (2012)
  Authentication sensor: (1) low quality: long-time-used USB camera (680 × 480), (2) normal quality: new USB camera (680 × 480), (3) high quality: Sony NEX-5 (1920 × 1080).
  PA artifact: high-res photo and 720p high-def video, both captured with a Sony NEX-5.
  PAI: (1) warped print on copper A4 paper, (2) cut print on copper A4 paper, (3) video replay on a tablet (iPad).
  Lighting conditions: 1 / 1.
  Subjects (train/-/test): 50 (20/-/30).
  Videos per subject: 3 + 9.
  Videos per subset: train 60+180, test 90+270.

MSU-MFSD (2014)
  Authentication sensor: (1) built-in webcam of a MacBook Air (640 × 480), (2) front camera of a Google Nexus 5 smartphone (720 × 480).
  PA artifact: (1) high-res photo (Canon PowerShot 550D SLR), (2) 1080p high-def video (Canon PowerShot 550D SLR), (3) 1080p mobile video (back camera of an iPhone 5S).
  PAI: (1) print on A3 paper, (2) high-def video replay on a tablet (iPad), (3) mobile video replay on a mobile phone (iPhone).
  Lighting conditions: 1 / 1.
  Subjects (train/-/test): 35 (15/-/20).
  Videos per subject: 2 + 6.
  Videos per subset: train 30+90, test 40+120.

Replay-Mobile (2016)
  Authentication sensor: (1) front camera of an iPad Mini 2 tablet (720 × 1280), (2) front camera of an LG-G4 smartphone (720 × 1280).
  PA artifact: (1) high-res photo (Nikon Coolpix P520), (2) 1080p mobile video (back camera of the LG-G4).
  PAI: (1) print on A4 paper, (2) video replay on a matte screen (Philips 227ELH).
  Lighting conditions: 5 / 2.
  Subjects (train/dev/test): 40 (12/16/12).
  Videos per subject: 10 + 16.
  Videos per subset: train 120+192, dev 160+256, test 110+192.

OULU-NPU (2017)
  Authentication sensor: front cameras of 6 smartphones (1920 × 1080).
  PA artifact: high-res photo and high-res mobile video, both captured with the back camera of a Samsung Galaxy S6 Edge.
  PAI: (1-2) prints on glossy A3 paper from 2 printers, (3) high-res video replay on a 19-inch Dell display, (4) high-res video replay on a 13-inch MacBook.
  Lighting conditions: 3 / 3.
  Subjects (train/dev/test): 55 (20/15/20).
  Videos per subject: 36 + 72.
  Videos per subset: 4 protocols, total 1980+3960.

SiW (2018)
  Authentication sensor: (1) high-quality camera: Canon EOS T6 (1920 × 1080), (2) high-quality camera: Logitech C920 webcam (1920 × 1080).
  PA artifact: (1) high-res photo (5,184 × 3,456), (2) the same live videos.
  PAI: (1) print, (2) print of a frontal-view frame from a live video, (3-6) video replay on 4 screens: Samsung S8, iPhone 7, iPad Pro, PC.
  Lighting conditions: 1 / 1.
  Subjects (train/-/test): 165 (90/-/75).
  Videos per subject: 8 + 20.
  Videos per subset: 3 protocols, total 1320+3300.

CASIA-SURF (2018)
  Authentication sensor: Intel RealSense SR300; RGB (1280 × 720), depth/IR (640 × 480).
  PA artifact: high-res photo.
  PAI: print on A4 paper.
  Lighting conditions: 1 / 1.
  Subjects (train/dev/test): 1000 (300/100/600).
  Videos per subject: 1 + 6.
  Videos per subset: train 900+5400, dev 300+1800, test 1800+10800.

CASIA-SURF CeFA (2019)
  Authentication sensor: Intel RealSense; RGB/depth/IR, all (1280 × 720).
  PA artifact: (1) high-res photo, (2) the same live videos, (3) 3D-printed face mask, (4) 3D silica gel face mask.
  PAI: (1) print, (2) video replay.
  Lighting conditions: 1 / 2 for the 2D attacks; 0 / 6 and 0 / 4 for the mask subsets.
  Subjects (train/dev/test): 1500 (600/300/600); mask subsets: 99 (0/0/99) and 8 (0/0/8).
  Videos per subject: 1 + 3; mask subsets: 0 + 18 and 0 + 8.
  Videos per subset: 4 protocols, 4500+13500; mask subsets: 0+5346 and 0+196.

4.2 Face spoofing Databases

In Table 4.1 we add details on the presentation media and presentation attack instruments (PAI) of the used face video datasets.

4.2.1 Replay-Attack

Figure 1: Sample images from the Replay-Attack dataset (adverse lighting, hand support). Left to right, top row: bona-fide, printed high-def photo, mobile photo; bottom row: high-def photo, mobile video, high-def video (client 007).

One of the earliest datasets presented in the literature (2012) for the problem of face spoofing is the Replay-Attack dataset 547_12_f , a sequel to its Print-Attack version introduced in 2011 533_11_f . The database contains a total of 1200 short videos between 9 and 15 seconds long. The videos are taken at a resolution of 320×240 from 50 different subjects. Each subject has 4 bona-fide videos and 20 spoofed ones, so the full dataset has 200 bona-fide videos and 1000 spoofed videos. The dataset is divided into 3 subject-disjoint subsets: one for training, one for development and one for testing. The training and development subsets each contain 15 subjects, while the test subset has videos from the remaining 20 subjects. Results on this dataset are reported as the HTER (Half Total Error Rate) on the test subset, with the detector threshold set at the EER (Equal Error Rate) of the development set. For both the real and spoof videos, each subject was recorded twice with a regular webcam under two different lighting settings: controlled and adverse.

As for the spoofed videos, high-resolution videos of each subject were first taken with a Canon PowerShot SX150 IS camera; then three techniques were used to generate 5 spoofing scenarios: (1) "hard-copy print attack", where high-resolution digital photographs, printed with a Triumph-Adler DCC 2520 color laser printer on A4 paper, are presented to the camera; (2) "mobile attack", using a photo once and a replay of the recorded video on an iPhone 3GS screen (resolution 480×320 pixels) once; and (3) "high-definition attack", displaying a photo or replaying the video on a high-resolution screen using an iPad (first generation, screen resolution 1024×768 pixels). Each of the 5 scenarios was recorded under the two lighting conditions stated above, then presented to the sensor in 2 different ways: either on a fixed tripod, or with an attacker holding the presenting device (printed paper or replay device) in his/her hand, leading to a total of 20 attack videos per subject. More details of the dataset can be found at the IDIAP Research Institute website (http://www.idiap.ch/dataset/replayattack).

4.2.2 MSU Mobile Face Spoofing Database (MSU-MFSD)

Figure 2: Sample images from the MSU-MFSD dataset, captured by (a) the laptop and (b) the Android phone. Left to right: bona-fide, printed photo, iPhone video, iPad video (client 001).

The MSU-MFSD dataset 515_15_f was introduced in 2014 by Michigan State University (MSU). Its main aim was to tackle the problem of face spoofing on smartphones, where some mobile applications use the face for authentication and unlocking the phone. The dataset includes real and spoofed videos from 35 subjects, each with a duration between 9 and 15 seconds. Unlike the Replay-Attack dataset, where only a webcam was used for authentication, in MSU-MFSD two devices were used: the built-in webcam of a 13" MacBook Air with resolution 640×480, and the front-facing camera of the Google Nexus 5 smartphone with 720×480 resolution. For each subject there are 2 bona-fide videos, one from each device, and six attack videos, 3 from each device, leading to a total of 280 videos: 210 attack and 70 bona-fide. The attack scenarios are all presented to the authentication sensor with a fixed support, unlike Replay-Attack, which has videos with fixed supports and others using hand-held devices. Three attack scenarios were used: (1) print attack on A3 paper, (2) video replay attack on the screen of an iPad Air, and (3) video replay attack on a Google Nexus 5 smartphone. The dataset is divided into two subject-disjoint subsets: one for training with 15 subjects, and the other for testing with 20 subjects. The EER on the test set is used as the evaluation metric on the MSU-MFSD database.

4.2.3 Replay-Mobile

Figure 3: Sample images from the Replay-Mobile dataset (hand support), captured by (a) the tablet and (b) the mobile phone. Left to right: bona-fide, printed photo, matte-screen photo, matte-screen video (client 001).

In 2016, the IDIAP research institute, which had released Replay-Attack, released a new dataset, Replay-Mobile 564_16_f . This came after the widespread use of face authentication on mobile devices and the increased need for face-PAD evaluation sets that capture the characteristics of mobile devices. The dataset has 1200 short recordings from 40 subjects captured by two mobile devices at resolution 720×1280. Ten bona-fide accesses were recorded for each subject, in addition to 16 attack videos taken under different attack modes. Like Replay-Attack, the dataset was divided into training, development and testing subsets of 12, 16 and 12 subjects respectively, and results are reported as the HTER on the test set.

Five lighting conditions were used in recording real accesses: controlled, adverse, direct, lateral and diffuse. For presentation attacks, high-resolution photos and videos of each client were taken under two different lighting conditions (lighton, lightoff).

Attacks were performed in two ways: (1) photo-print attacks, where the printed high-resolution photo was presented to the capturing device using either a fixed or a hand support, and (2) matte-screen attacks, displaying a digital photo or video with fixed support.

4.2.4 Others

Some other datasets for face PAD are also available but were not used in this paper.

CASIA-FASD b_12_f

was released in 2012 with 50 subjects. Three imaging qualities were used (low, normal and high), and three types of attack were generated from the high-quality recordings of the genuine faces: cut-photo attack, warped-photo attack and video attack on an iPad. The dataset is split into a training set of 20 subjects and a testing set with the other 30 subjects.

OULU-NPU oulu

is a mobile face PAD dataset released in 2017 with 55 subjects captured under three different lighting conditions and backgrounds with six different mobile devices. Both print and video-replay attacks are included, using two different PAIs each. Four protocols were proposed in the paper to evaluate the generalization of PAD algorithms to different conditions.

Spoof in the Wild (SiW) 300_18_f

contains 8 real and 20 attack videos for each of 165 subjects, more subjects than any previous dataset, and was released in 2018. Bona-fide videos are captured with two high-quality cameras and collected with different orientations, facial expressions and lighting conditions. Attack samples are either high-quality print attacks or replay videos on 4 different PA devices.

CASIA-SURF 303_19_f

Later in the same year, 2018, the large-scale multi-modal dataset CASIA-SURF was released, containing 21000 videos of 1000 subjects. Each video has three modalities, namely RGB, depth and IR, captured by an Intel RealSense camera. Only 2D print attacks were performed, with 6 different combinations of photo state (either full or cut) and 3 different operations such as bending the printed paper or moving it at different distances from the camera.

CASIA-SURF CeFA db_19_f

Although the CASIA-SURF dataset 303_19_f is large-scale, it only includes one ethnicity (Chinese) and one attack type (2D print). So in 2019, CASIA-SURF Cross-ethnicity Face Anti-spoofing (CeFA) was released to provide a wider range of ethnicities and attack types, in addition to multiple modalities as in CASIA-SURF. The dataset includes videos of 1607 subjects, covering three ethnicities (African, East Asian and Central Asian), three modalities, and 4 attack types: 2D print attacks and replay attacks, in addition to 3D mask and silica gel attacks.

4.3 Iris spoofing Databases

Figure 4: Sample (a) bona-fide and (b) print-attack images from the iris datasets. Left to right: ATVS-FIr (id u0001s0001_ir_r_0002), Warsaw 2013 (id 0039_R_1), MobBioFake (id 007_R2).

Many publicly available benchmark datasets have been published for the iris presentation attack problem. Some are captured with commercial iris recognition sensors operating with near-infrared illumination, like BioSec-ATVS-FIr 104_12 or the LivDet datasets 109_13 ; 181_15 ; 182_17 , while more recent datasets were captured in visible-light illumination using smartphones, e.g. MobBioFake 155_14 ; 166_14 . Below are details of the databases used in this paper.

4.3.1 Biosec ATVS-FIr

The ATVS-FIr dataset was introduced in 2012 104_12 based on the Biosec dataset 103_08 . It contains 800 bona-fide and 800 attack photos of 50 different subjects. The attack is performed by enhancing the quality of the bona-fide iris photo, printing it on high-quality paper using HP printers, then presenting it to the same LG IrisAccess EOU3000 sensor used to capture the bona-fide irises. All images are grayscale with 640×480 resolution. For each user, 16 bona-fide photos were captured, from which 16 attack photos were generated: four photos were taken of each of the two eyes in two sessions, so each user has a total of 4 images × 2 eyes × 2 sessions = 16 bona-fide photos, and the same number of fake attacks. Each eye of the 50 subjects was considered a subject on its own; the resulting 100 subjects were divided into a training set of 25 subjects (200 bona-fide + 200 attack) and a test set of 75 subjects (600 bona-fide + 600 attack).

4.3.2 LivDet Warsaw 2013

The Warsaw University of Technology published the first LivDet Warsaw iris dataset in 2013 109_13 as part of the first international iris liveness competition in 2013 159_14 . Two more versions of this dataset were later published, in 2015 181_15 and 2017 182_17 . LivDet-Iris-2013-Warsaw contains 852 bona-fide and 815 attack printout images from 237 subjects. For the attack images, the real iris was printed using two different printers, then captured by the same IrisGuard AD100 sensor as the bona-fide photos.

4.3.3 MobBioFake

The first competition for iris PAD in mobile scenarios was the MobILive competition in 2014 156_14 . The MobBioFake iris spoofing dataset 155_14 was generated from the iris images of the MobBIO Multimodal Database 166_14 . It comprises 800 bona-fide and 800 attack images captured by the back camera (8 MP) of an Android mobile device (Asus Transformer Pad TF300T) in visible light, not in near-infrared like previous iris datasets. The images are color RGB with a resolution of 250×200. The MobBioFake database has 100 subjects, each with 8 bona-fide + 8 attack images captured under different illumination and occlusion settings. The attack samples were generated from printed versions of the original images, captured under similar conditions with the same device. The images are split evenly between training and test subsets.

5 Proposed PAD Approach

Although several previous studies use CNN for presentation attack detection, they either used custom designed networks, e.g. Spoofnet 308_15_fi ; 309_15_i , or used pre-trained CNN to extract features that are later classified with conventional classification algorithms such as SVM. In this paper, we propose to train deeper CNNs for the direct classification of bona-fide and presentation attack images. We choose to assess the state-of-the-art modern deep convolutional neural networks that achieved high accuracies in the task of image classification on ImageNet p1_15 . This PAD approach is a passive software texture-based method that does not require additional hardware nor cooperation from the user.

In addition, we compare two different fine-tuning approaches starting from network weights trained on the ImageNet classification dataset: we can either retrain all network weights on the new datasets of the problem, or freeze the weights of most of the network layers and fine-tune only a selected set of final layers, including the classification layer.

Figure 5 shows the building blocks of the full pipeline proposed to classify face videos or iris images as bona-fide attempts or spoofed attacks. First, the face or iris image is preprocessed and resized, then fed to the CNN, a series of convolutional, pooling and dropout layers. Finally, the last fully connected layer of each network is replaced to output a binary bona-fide/attack classification using a sigmoid activation instead of the 1000 classes of ImageNet.

Figure 5: Overview of the PAD algorithm.

5.1 Preprocessing

5.1.1 Face videos

Each of the face benchmark datasets used in this paper provides face coordinates for each frame of each video. These coordinates are used to crop a square to be used as input to the network. The square is cropped such that the region of interest (ROI) includes the face and some background, in order to account for some contextual information. The side of this square is chosen empirically to be equal to twice the minimum of the width and height of the ground-truth face box. The cropped bounding box containing the face is then resized to each network's default input size before being passed to the first convolutional layer of the CNN.
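A minimal sketch of this cropping step is shown below; centering the square on the annotated box and clipping it to the frame borders are our assumptions about details the text leaves open.

```python
# Crop a square ROI of side 2 * min(w, h) around the annotated face box.
def crop_face_roi(frame, x, y, w, h):
    """frame: HxWx3 image array; (x, y, w, h): annotated face bounding box."""
    side = 2 * min(w, h)
    cx, cy = x + w // 2, y + h // 2          # face-box center (assumed anchor)
    x0 = max(cx - side // 2, 0)
    y0 = max(cy - side // 2, 0)
    x1 = min(x0 + side, frame.shape[1])      # clip to frame borders
    y1 = min(y0 + side, frame.shape[0])
    return frame[y0:y1, x0:x1]
```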

5.1.2 Iris images

When using iris images for recognition or presentation attack detection, most studies in the literature first extract and segment the iris region. In this paper, however, we directly use the full iris image captured by the NIR sensor or smartphone camera, in order to keep the approach as simple as possible. This lets the network learn, from the full periocular area, which regions best serve the classification problem. The presence of some contextual information, such as paper borders, can also be useful for attack detection.

5.2 Generalization / Data augmentation

Deep networks have a large number of parameters to train, which can cause them to overfit if the training set is not big enough. Data augmentation is a very helpful technique used to artificially increase the size of the available training set and to simulate real-life distortions and variations that may not be present in the original set. During training, for each mini-batch pulled from the training set, images are randomly cropped, rotated slightly by a random angle, scaled, or translated along the x or y axis, so the exact same image is never seen twice by the network. This leads to an artificially larger training dataset and a more general network that is robust to changes in pose and illumination.

5.3 Modern deep CNN architectures

Based on the official results published for recent CNN architectures on the ImageNet dataset p1_15 , three networks that achieved less than 10% top-5 error are InceptionV3 p2_15 , ResNet50 p3_15 and MobileNetV2 p4_18 . Table 5.3 states the single-model top-1 and top-5 error of each of these networks on the ImageNet validation set, its depth and its number of parameters, ordered by ascending top-5 error. One thing that can be noticed about the higher-accuracy networks is the increased number of layers; these networks are much deeper than, for example, the well-known VGG p6_14 .

Table: Performance of the selected deep CNN architectures on ImageNet, in terms of single-model error (%) reported in each model's paper.

Model                                   top-1 (single-crop / multi-crop)   top-5 (single-crop / multi-crop)   # parameters   # layers
InceptionV3 p2_15                       21.2 / 19.47 (12-crop)             5.6 / 4.48 (12-crop)               23.9M          159
MobileNetV2 (alpha=1.4) p4_18           25.3 / -                           7.58* / -                          6.2M           88
MobileNetV2 (default alpha=1.0) p4_18   28 / -                             9.8* / -                           3.5M           88
VGG16 p6_14                             - / 24.4                           9.95* / 7.1                        138.4M         23

* Not available in the paper; reported by Keras (https://keras.io/).

For the purpose of this paper, we chose networks that have fewer than 30 million parameters to fine-tune. For the sake of diversity, we selected two networks: one with a relatively high number of parameters, InceptionV3 p2_15 (about 24M parameters, 5.6% top-5 error), and a more recent, shallower network with far fewer parameters, MobileNetV2 p4_18 (about 4M parameters, 9.9% top-5 error), which makes it very suitable for mobile applications. Even though all three candidate networks have significantly fewer parameters than VGG, they are much deeper and more accurate.

In each architecture, the final softmax prediction layer is removed, and the output of the preceding global average pooling layer is fed into a dropout layer (dropout rate of 0.5) followed by a sigmoid activation layer for the final binary prediction.
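A sketch of this classification head in Keras (an assumed implementation, with MobileNetV2 as the example backbone) is shown below.

```python
# Backbone without its ImageNet classifier, plus the binary PAD head described above.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2

base = MobileNetV2(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dropout(0.5)(x)                        # dropout rate of 0.5
out = layers.Dense(1, activation="sigmoid")(x)    # binary bona-fide/attack output
model = models.Model(base.input, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```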

Table: Total number of parameters in each model and the number of parameters to learn when fine-tuning only the final block.

Model         Total # of parameters   Last block to train                                First trainable layer   # parameters to learn
InceptionV3   21,804,833              block after mixed9 (train mixed9_1 until mixed10)  conv2d_90               6,075,585
MobileNetV2   2,259,265               last "_inverted_res_block"                         block_16_expand         887,361

5.4 Fine-tuning approaches

As stated previously, the available datasets are small, while the networks used have millions of parameters to learn. Training the network weights from scratch, i.e. starting from random weights, would therefore likely cause the model to overfit these small sets and get stuck in a local minimum, leading to poor generalization. A better approach is to initialize the weights of the network with weights pre-trained on ImageNet. We then have three training options: (1) transfer learning: feed the images to the network to extract features from the last layer before prediction, then use these features instead of raw image pixels for classification with an SVM or a neural network; (2) fine-tune the weights of only the last several layers of the network while fixing the weights of earlier layers to their ImageNet-trained values; or (3) fine-tune the whole network weights using the new input dataset.

We experiment with fine-tuning only, either of the whole network or of a few top layers, which we choose to be the last convolutional block. For the approach of fine-tuning the last convolutional block, Table 5.3 states, for each architecture, the name of the block's first layer and the number of parameters in this block to be fine-tuned.
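The sketch below shows how the last-block option could be implemented in Keras, assuming the model built as above: every layer before the first trainable layer named in the table (e.g. block_16_expand for MobileNetV2, conv2d_90 for InceptionV3) is frozen. The helper function is our own illustration.

```python
# Freeze all layers up to (but not including) the first trainable layer,
# then recompile so only the final block and classification head are learned.
def freeze_up_to(model, first_trainable_layer_name):
    trainable = False
    for layer in model.layers:
        if layer.name == first_trainable_layer_name:
            trainable = True
        layer.trainable = trainable

# freeze_up_to(model, "block_16_expand")   # MobileNetV2: ~0.9M trainable parameters
# model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```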

5.5 Single model with multiple biometrics

In addition to training/testing with datasets that belong to the same biometric source, i.e. face only or iris only, we experiment with training a single generic network to detect presentation attacks whether the presented biometric is a face or an iris. We train with different combinations of face and iris sets, then cross-evaluate the trained models on the other face and iris datasets to see whether this approach is comparable to training on only face images or only iris images.

6 Experiments and Results

In this section, we describe the image and video preprocessing methods used, and explain the training strategy and experimental setup.

6.1 Preprocessing

6.1.1 Preparing images

For iris images, only the MobBioFake database consists of color images captured with a smartphone; the other datasets, ATVS-FIr and Warsaw, were acquired with near-infrared sensors producing grayscale images. Since all networks used in this paper were pretrained on the color images of ImageNet, their input layers only accept three-channel color images, so for grayscale iris images the single gray channel is replicated twice to form a 3-channel image that is passed as the network input. The whole iris image is used, without cropping or iris detection and segmentation.
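A minimal sketch of this channel replication (assuming NumPy arrays) is:

```python
import numpy as np

def gray_to_rgb(gray_image):
    """Stack a single-channel (H, W) NIR iris image into an (H, W, 3) input."""
    return np.stack([gray_image] * 3, axis=-1)
```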

On the other hand, all face PAD datasets consist of color videos. Video durations range from 9 to 15 seconds; videos of the Replay-Attack database have 25 frames per second (fps), while those of MSU-MFSD have 30 fps. To avoid redundancy and over-fitting to the subjects, not all video frames are used; instead, we sample one frame every 20 frames. This frame dropping yields around 10 to 19 sampled frames per video, depending on its length. Instead of using the whole sampled frame as the network input, each frame is first cropped either to include only the face region or to a box with approximately twice the face area, so as to include some background context. Annotated face bounding boxes have side lengths ranging from 70 to 100 pixels in Replay-Attack videos, and from 170 to 250 pixels in MSU-MFSD.
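The sketch below illustrates this frame sampling and context cropping, assuming OpenCV for video reading and annotated face boxes given as (x, y, w, h); the enlargement factor of roughly 1.4 (which approximately doubles the box area) is an illustrative choice, not a value from the paper.

```python
import cv2

def sample_frames(video_path, skip=20):
    """Yield one frame every `skip` frames of a video."""
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % skip == 0:
            yield frame
        idx += 1
    cap.release()

def crop_with_context(frame, box, scale=1.4):
    """Crop a box enlarged by `scale` around the annotated face (illustrative factor)."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    nw, nh = int(w * scale), int(h * scale)
    x0, y0 = max(int(cx - nw / 2), 0), max(int(cy - nh / 2), 0)
    return frame[y0:y0 + nh, x0:x0 + nw]
```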

Table 6.1.1 shows the training, validation and testing set sizes (in number of images) for each of the datasets used in the experiments (after frame dropping for videos). The total number of images available for training the CNN on each database is emphasized in bold.


Number of images used in training and testing. For face datasets, the number of videos is shown followed by the number of frames in parentheses*; for iris datasets, only the number of images is given.
Database | Subset | Attack + bona-fide: number of videos (frames)
Replay-Attack | train | 300 (3414) + 60 (1139) = \textbf{4553}
Replay-Attack | devel | 300 (3411) + 60 (1140)
Replay-Attack | test | 400 (4516) + 80 (1507)
MSU-MFSD | train | 90 (1257) + 30 (419) = \textbf{1676}
MSU-MFSD | test | 120 (1655) + 84 (551)
Replay-Mobile | train | 192 (2851) + 120 (1762) = \textbf{4613}
Replay-Mobile | devel | 256 (3804) + 160 (2356)
Replay-Mobile | test | 192 (2842) + 110 (1615)
ATVS-FIr | train | 200 + 200 = \textbf{400}
ATVS-FIr | test | 600 + 600
Warsaw 2013 | train | 203 + 228 = \textbf{431}
Warsaw 2013 | test | 612 + 624
MobBioFake | train | 400 + 400 = \textbf{800}
MobBioFake | test | 400 + 400
* frames obtained by sampling one frame every 20 frames

The images are then resized to the default input size of each network: 224 × 224 for MobileNetV2 and 299 × 299 for InceptionV3. The RGB values of the input images are then scaled to the range [-1, 1], as expected by both InceptionV3 and MobileNetV2.
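A sketch of this resizing and scaling step (OpenCV and NumPy assumed):

```python
import cv2
import numpy as np

def prepare_input(image, size=224):
    """Resize to the network input size (224 for MobileNetV2, 299 for InceptionV3)
    and scale RGB values from [0, 255] to [-1, 1]."""
    image = cv2.resize(image, (size, size))
    return image.astype(np.float32) / 127.5 - 1.0
```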

6.1.2 Data augmentation

As shown in Table 6.1.1, the total number of images available from each dataset for training the CNN is much smaller than the dataset sizes usually used to train deep networks. This could cause the network to memorize this small set of samples instead of learning useful features, and thus fail to generalize. To reduce overfitting, data augmentation is therefore used during training: before being fed to the network, images are randomly transformed with one or more of the following transformations: horizontal flip, rotation between 0 and 10 degrees, horizontal translation by 0 to 20 percent of the image width, vertical translation by 0 to 20 percent of the image height, or zooming by 0 to 20 percent.
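The sketch below expresses the listed transformations with the Keras ImageDataGenerator; the exact generator arguments are our reconstruction of the policy, not the authors' published configuration.

```python
from keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    horizontal_flip=True,     # random horizontal flip
    rotation_range=10,        # rotation between 0 and 10 degrees
    width_shift_range=0.2,    # horizontal translation up to 20% of width
    height_shift_range=0.2,   # vertical translation up to 20% of height
    zoom_range=0.2)           # zooming up to 20%
```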

6.2 Training strategy and Hyperparameters

For training the CNN networks, we used the Adam optimizer adam with beta1 = 0.9, beta2 = 0.999, and a learning rate of 1 × 10^-4. We trained each network for a maximum of 50 epochs, but added a stopping criterion that ends training if the training or validation accuracy reaches 99.95%, or if the validation accuracy does not improve (by a delta of 0.0005) for 40 epochs.

Intermediate models with high validation accuracy were saved, and the model with the lowest validation error was finally chosen for inference. Because of the randomness introduced by data augmentation and dropout, several training sessions may produce slightly different results, especially in cross-dataset evaluation; for this reason, three runs were performed for each configuration (model + dataset + fine-tuning approach) and the minimum error is reported.
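As a rough sketch of this training setup in Keras (not the authors' released code): `train_images`, `train_labels`, `val_images` and `val_labels` are placeholders, and the 99.95%-accuracy stopping rule would need a small custom callback that is omitted here.

```python
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ModelCheckpoint

# Adam with the stated hyper-parameters; binary cross-entropy for the sigmoid output.
model.compile(optimizer=Adam(lr=1e-4, beta_1=0.9, beta_2=0.999),
              loss='binary_crossentropy',
              metrics=['accuracy'])

callbacks = [
    # Stop if validation accuracy has not improved by at least 0.0005 for 40 epochs.
    EarlyStopping(monitor='val_acc', min_delta=0.0005, patience=40, mode='max'),
    # Keep the model with the best validation performance for inference.
    ModelCheckpoint('best_model.h5', monitor='val_acc', save_best_only=True, mode='max'),
]

model.fit(train_images, train_labels, epochs=50,
          validation_data=(val_images, val_labels), callbacks=callbacks)
```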


Intra- and cross-dataset results. Test HTER is reported when testing on Replay-Attack and Replay-Mobile, otherwise test EER. (A) = fine-tuning the last block only; (B) = fine-tuning all layers.
Train database | Test database | InceptionV3 (A) | MobileNetV2 (A) | InceptionV3 (B) | MobileNetV2 (B)
Replay-Attack | Replay-Attack | 6.9% | 1.6% | 0% | 0.63%
Replay-Attack | MSU-MFSD | 33% | 30% | 17.5% | 22.5%
Replay-Attack | Replay-Mobile | 35.8% | 43.4% | 64.7% | 56.7%
MSU-MFSD | Replay-Attack | 34.7% | 22.9% | 23.8% | 13.7%
MSU-MFSD | MSU-MFSD | 7.5% | 2.05% | 0% | 2.08%
MSU-MFSD | Replay-Mobile | 32.6% | 26.8% | 24.6% | 23.7%
Replay-Mobile | Replay-Attack | 33.2% | 45.1% | 31.1% | 45.5%
Replay-Mobile | MSU-MFSD | 35.4% | 30% | 32.9% | 37.5%
Replay-Mobile | Replay-Mobile | 3.6% | 3.2% | 0% | 0%
ATVS-FIr | ATVS-FIr | 0% | 0% | 0% | 0%
ATVS-FIr | Warsaw 2013 | 1.9% | 5.4% | 0.16% | 1.6%
ATVS-FIr | MobBioFake (Grayscale) | 44% | 39% | 36.5% | 39%
Warsaw 2013 | ATVS-FIr | 0% | 1% | 0.17% | 0.5%
Warsaw 2013 | Warsaw 2013 | 0% | 0.32% | 0% | 0%
Warsaw 2013 | MobBioFake (Grayscale) | 43% | 39% | 32.8% | 47%
MobBioFake | ATVS-FIr | 10% | 30% | 18.7% | 20.2%
MobBioFake | Warsaw 2013 | 15.7% | 32% | 23.4% | 18.8%
MobBioFake | MobBioFake | 8.75% | 7% | 0.5% | 0.75%
MobBioFake (Grayscale) | ATVS-FIr | 2.5% | 23.7% | 12% | 16.2%
MobBioFake (Grayscale) | Warsaw 2013 | 6.2% | 17.3% | 10.1% | 17.2%
MobBioFake (Grayscale) | MobBioFake (Grayscale) | 10.5% | 7.3% | 0.25% | 1.25%

6.3 Evaluation

6.3.1 Videos scoring

The CNN accepts single images as input, so each frame of a face video is treated as a separate image. To report the final score for each video, we perform score-level fusion by averaging the scores of the samples from the same video. Accuracy and error rates are then computed on these video-level scores, not on frame-level scores as in some results reported in the literature.
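A sketch of this fusion, assuming `model` outputs one sigmoid score per frame:

```python
import numpy as np

def video_score(model, frames):
    """Average per-frame sigmoid scores into a single video-level score.
    frames: preprocessed array of shape (n_frames, H, W, 3)."""
    frame_scores = model.predict(frames).ravel()
    return float(frame_scores.mean())
```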

6.3.2 Evaluation metrics

The equal error rate (EER) is used to report the error on the test set. EER is defined as the rate at which the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) are equal. FRR is the probability of the algorithm rejecting bona-fide samples as attacks, FRR = FR / total bona-fide attempts, while FAR is the probability of the algorithm accepting attack samples as bona-fide, FAR = FA / total attack attempts.

For datasets that have a separate development subset, e.g. Replay-Attack, a score threshold is first computed that achieves EER on the development set. This threshold is then used to compute the error on the test set, known as the half-total error rate (HTER), which is the sum of the FRR and the FAR at that threshold divided by two: HTER = (FRR + FAR)/2.
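The sketch below shows, under the assumption that higher scores mean more bona fide, how the EER threshold can be found on development-set scores and then reused to compute HTER on the test-set scores:

```python
import numpy as np

def far_frr(bona_fide_scores, attack_scores, threshold):
    frr = np.mean(bona_fide_scores < threshold)   # bona-fide samples rejected
    far = np.mean(attack_scores >= threshold)     # attack samples accepted
    return far, frr

def eer_threshold(bona_fide_scores, attack_scores):
    """Threshold at which FAR and FRR are (approximately) equal."""
    thresholds = np.unique(np.concatenate([bona_fide_scores, attack_scores]))
    diffs = [abs(np.subtract(*far_frr(bona_fide_scores, attack_scores, t)))
             for t in thresholds]
    return thresholds[int(np.argmin(diffs))]

def hter(bona_fide_scores, attack_scores, threshold):
    far, frr = far_frr(bona_fide_scores, attack_scores, threshold)
    return (far + frr) / 2.0
```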

An ISO/IEC international standard for PAD testing is described in ISO/IEC 30107-3:2017 (https://www.iso.org/obp/ui/#iso:std:iso-iec:30107:-3:ed-1:v1:en), where PAD subsystems are evaluated using two metrics: the attack presentation classification error rate (APCER) and the bona-fide presentation classification error rate (BPCER). However, we report results in this paper using HTER and EER to be able to compare with published state-of-the-art results; EER can also be understood as the rate at which APCER equals BPCER.

6.3.3 Cross-dataset evaluation

In order to test the generalization ability of the used networks, we perform cross-dataset evaluation, training on one dataset and testing on another. For the iris datasets, we are not aware of any previous work reporting cross-dataset EER evaluation on these iris PAD databases.

6.4 Experimental setup

For implementation, we used Keras (https://github.com/keras-team/keras) with the TensorFlow (https://www.tensorflow.org/) backend for the deep CNN models, and the Bob package bob2012 ; bob2017 for dataset management and experiment pipelines. Experiments were performed on an NVIDIA GeForce 840M GPU with CUDA 9.0.


Training with face + iris images: intra- and cross-dataset results. Test HTER is reported when testing on Replay-Attack and Replay-Mobile, otherwise test EER. Each pair of result columns corresponds to the iris dataset used for training, reported for InceptionV3 / MobileNetV2.
Train iris database: | ATVS-FIr | ATVS-FIr | Warsaw 2013 | Warsaw 2013 | MobBioFake (RGB) | MobBioFake (RGB) | MobBioFake (Grayscale) | MobBioFake (Grayscale)
Test database | InceptionV3 | MobileNetV2 | InceptionV3 | MobileNetV2 | InceptionV3 | MobileNetV2 | InceptionV3 | MobileNetV2
Train face database: Replay-Attack
Number of training epochs | 2 | 6 | 3 | 6 | 4 | 9 | 2 | 8
Replay-Attack | 0% | 0.63% | 0% | 0% | 0.25% | 1.5% | 0% | 1.4%
MSU-MFSD | 29.6% | 20.4% | 15% | 20% | 19.6% | 25% | 30% | 17.5%
Replay-Mobile | 58.1% | 48.7% | 57.3% | 35.4% | 60.6% | 58.1% | 39% | 40%
ATVS-FIr | 0% | 0% | 10.5% | 2.7% | 5.17% | 8.8% | 3.5% | 12.3%
Warsaw 2013 | 0.16% | 0.16% | 0% | 0% | 4.13% | 1.62% | 15.2% | 5.7%
MobBioFake (RGB) | 58.5% | 71.3% | 56% | 45.5% | 0.75% | 0.25% | 1.75% | 2%
MobBioFake (Grayscale) | 50% | 69% | 56.3% | 38.5% | 1.5% | 0.75% | 1.25% | 1%
Train face database: MSU-MFSD
Number of training epochs | 4 | 6 | 3 | 5 | 10 | 9 | 7 | 9
Replay-Attack | 22.8% | 14.5% | 25.3% | 11.9% | 22.8% | 27.7% | 16.6% | 21.6%
MSU-MFSD | 0% | 2.08% | 0.4% | 2.08% | 2.5% | 2.08% | 0.4% | 2.5%
Replay-Mobile | 28.7% | 26.1% | 19.2% | 24.8% | 22.7% | 26.8% | 25.2% | 17%
ATVS-FIr | 0% | 0% | 0.33% | 2.3% | 11.8% | 12.5% | 0.8% | 12.33%
Warsaw 2013 | 0.6% | 0.8% | 0% | 0% | 43.9% | 6.55% | 7.4% | 9.1%
MobBioFake (RGB) | 52.5% | 39.2% | 45.5% | 42% | 2% | 1.25% | 2.5% | 2.5%
MobBioFake (Grayscale) | 45.3% | 38.5% | 34.5% | 43% | 22% | 4.25% | 1.75% | 2%
Train face database: Replay-Mobile
Number of training epochs | 2 | 3 | 1 | 3 | 4 | 7 | 5 | 4
Replay-Attack | 24.2% | 42.6% | 31% | 35.7% | 41.6% | 64.3% | 33.7% | 54.6%
MSU-MFSD | 32.5% | 30.4% | 32.5% | 42.5% | 60% | 52.5% | 40% | 44.6%
Replay-Mobile | 0% | 0% | 0% | 0% | 0% | 0.26% | 0% | 0.46%
ATVS-FIr | 0% | 0% | 0.83% | 6% | 15.5% | 22.5% | 6.5% | 28.2%
Warsaw 2013 | 2.1% | 4% | 0% | 0% | 9.6% | 28.9% | 22.1% | 26.1%
MobBioFake (RGB) | 46.5% | 53% | 49.3% | 54.3% | 1% | 0.75% | 8.8% | 3.25%
MobBioFake (Grayscale) | 33.5% | 43.3% | 41.7% | 45.8% | 5.3% | 6.5% | 2% | 1.75%
(bold-underlined): best cross-dataset error for each test dataset; (bold): best cross-dataset error for each test dataset in each row (i.e. given a certain face train set)


Best cross-dataset results and comparison with SOTA. Test HTER is reported when testing on Replay-Attack, otherwise test EER. Underlined is the best cross-dataset error for each test dataset. Each entry gives the cross-dataset error / corresponding intra-dataset error, followed by the method or model (and, for face + iris training, the co-training dataset of the other biometric).
Train database | Test database | SOTA | Ours, model trained with only 1 biometric | Ours, model trained with face + iris
Replay-Attack | MSU-MFSD | 28.5% / 8.25% (GuidedScale-LBP 560_18_f, "LGBP") | 17.5% / 0% (InceptionV3 (B)) | 15% / 0% (InceptionV3, Warsaw 2013)
Replay-Attack | Replay-Mobile | 49.2% / 3.13% (GuidedScale-LBP 560_18_f, "LBP+GS-LBP") | 35.8% / 6.9% (InceptionV3 (A)) | 35.4% / 0% (MobileNetV2, Warsaw 2013)
MSU-MFSD | Replay-Attack | 21.40% / 1.5% (Color-texture 555_18_f) | 13.7% / 2.08% (MobileNetV2 (B)) | 11.9% / 2.08% (MobileNetV2, Warsaw 2013)
MSU-MFSD | Replay-Mobile | 32.1% / 8.5% (GuidedScale-LBP 560_18_f, "LBP+GS-LBP") | 23.7% / 2.08% (MobileNetV2 (B)) | 17% / 2.5% (MobileNetV2, MobBioFake (Gray))
Replay-Mobile | Replay-Attack | 46.19% / 0.52% (GuidedScale-LBP 560_18_f, "LGBP") | 31.1% / 0% (InceptionV3 (B)) | 24.2% / 0% (InceptionV3, ATVS-FIr)
Replay-Mobile | MSU-MFSD | 33.56% / 0.98% (GuidedScale-LBP 560_18_f, "LBP+GS-LBP") | 30% / 3.2% (MobileNetV2 (A)) | 30.4% / 0% (MobileNetV2, ATVS-FIr)
ATVS-FIr | Warsaw 2013 | None available | 0.16% / 0% (InceptionV3 (B)) | 0.16% / 0% (any model, Replay-Attack)
ATVS-FIr | MobBioFake (Gray) | None available | 36.5% / 0% (InceptionV3 (B)) | 33.5% / 0% (InceptionV3, Replay-Mobile)
Warsaw 2013 | ATVS-FIr | None available | 0% / 0% (InceptionV3 (A)) | 0.33% / 0% (InceptionV3, MSU-MFSD)
Warsaw 2013 | MobBioFake (Gray) | None available | 32.8% / 0% (InceptionV3 (B)) | 34.5% / 0% (InceptionV3, MSU-MFSD)
MobBioFake (Gray) | ATVS-FIr | None available | 2.5% / 10.5% (InceptionV3 (A)) | 0.8% / 1.75% (InceptionV3, MSU-MFSD)
MobBioFake (Gray) | Warsaw 2013 | None available | 6.2% / 10.5% (InceptionV3 (A)) | 5.7% / 1% (MobileNetV2, Replay-Attack)

6.5 Experiment 1 Results

In the first experiment, we compare the two fine-tuning approaches explained in Section 5.4. For this experiment, training and testing were done using face-only or iris-only datasets. We performed intra- and cross-dataset testing to evaluate the effectiveness and generalization of the proposed approach, and used benchmark datasets to be able to compare against the state-of-the-art. For each fine-tuning approach, we performed three runs and report the minimum error obtained. In Table 6.2, the first fine-tuning approach is referred to as (A), where only the weights of the last block are fine-tuned, while in approach (B) all the layers' weights are fine-tuned on the given dataset.

It can be noticed from Table 6.2 that fine-tuning all the network weights achieved better results than fixing the weights of the first layers to their ImageNet-trained values and fine-tuning only the final few layers. For intra-dataset errors, we achieved state-of-the-art (SOTA) 0% test error on all datasets with InceptionV3, while MobileNetV2 achieved 0% on most datasets and 1-2% on MSU-MFSD and MobBioFake. The table also shows that cross-dataset errors on the iris datasets ATVS-FIr and Warsaw are much lower when the grayscale version of the MobBioFake iris dataset is used for training instead of its original color images, without much affecting the intra-dataset error on MobBioFake itself.

6.6 Experiment 2 Results

For the second experiment, we only use fine-tuning approach (B), adjusting the weights of all the network's layers using the training images. Here, we use both face and iris images as training input to a single model that classifies bona-fide versus attack presentations regardless of the biometric type.

Results in Table 6.4 show that training a single model on both face and iris images to differentiate bona-fide from presentation-attack images achieves very good results on both face and iris test sets. For some face datasets, this combination even boosted cross-dataset performance: for example, on the Replay-Attack test set, 11.9% HTER was achieved when training a MobileNetV2 on MSU-MFSD plus the Warsaw iris dataset, versus 13.7% HTER when only MSU-MFSD was used for training, as reported in Table 6.2.

Table 6.4 also confirms the finding from Table 6.2 that training with the grayscale version of the MobBioFake iris dataset gives better cross-dataset error on the other grayscale iris datasets, without much affecting the intra-dataset error on MobBioFake itself. Moreover, it also yields lower cross-dataset error rates when evaluating on face datasets, arguably by encouraging the network to rely on color features only when the test image is a face, since all training face images were in color while all iris training images were grayscale.

Figure 6: High-frequency components of attack iris images from the Warsaw dataset. \figfooter*Left-to-right: Input image, frequency components (Fourier), high-pass filter, inverse Fourier of image, high-frequency components overlaid on image
(a) Bona-fide Iris samples
(b) Attack Iris samples
Figure 7: High-frequency components in bona-fide vs. attack iris images from the Warsaw dataset. \figfooter*Left-to-right: Input image, high-frequency components overlaid on image, network activation response of realF, network activation response of attackF \figfooter*Top: 0050_R_2, Bottom: 0088_R_3
(a) Bona-fide Face samples
(b) Attack Face samples
Figure 8: High-frequency components in bona-fide vs. attack face images from the Replay-Attack dataset. \figfooter*Left-to-right: Input image, high-frequency components overlaid on image, network activation response of realF, network activation response of attackF \figfooter*Top: client002, Bottom: client103
(a) Correctly classified bona-fide images
(b) Incorrectly classified bona-fide images
Figure 9: Sample classified bona-fide images. \figfooter*Left-to-right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIr, Warsaw, MobBioFake
(a) Correctly classified Attack images
(b) Incorrectly classified Attack images
Figure 10: Sample classified attack images. \figfooter*Left-to-right: Replay-Attack, MSU-MFSD, Replay-Mobile, ATVS-FIr, Warsaw, MobBioFake

6.7 Comparison with state of the art

In Table 6.4 we compare our best cross-dataset evaluation errors with the state-of-the-art (SOTA). The table shows that our cross-dataset results surpass SOTA on the face datasets. For models trained on a single biometric, InceptionV3 achieved the best cross-dataset error on most datasets, except when training with MSU-MFSD, where MobileNetV2 achieved the best test HTER on Replay-Attack and Replay-Mobile. When the network was trained with both face and iris images, the two models interchangeably achieved the better error rate.

6.8 Analysis and discussion

Given the results in Table 6.4, it is important to analyze the features learned by the networks that were trained on two biometrics.

6.8.1 Filters Activation and frequency components

By training the network on several biometrics, we believe the network is forced to learn to recognize specific patterns and textures that are present in real presentations in general but absent from attack samples, and vice versa. One way to support this is to visualize the response of certain feature maps of the network given sample input images. For this task, we chose a MobileNetV2 network trained on the Warsaw iris and Replay-Attack face datasets, and visualized the responses of selected filters from the final convolutional block before classification.

In addition to visualizing network activation responses, we were also interested in viewing the distribution of high-frequency components in the input images. We applied a high-pass filter to remove most of the low frequencies and keep only the high-frequency components of an image, then overlaid these high-frequency regions on the input image to better understand where high frequencies are concentrated in bona-fide versus attack images.

The steps for obtaining the high-frequency regions of an image are depicted in Figure 6. The Fourier transform is first applied to obtain the image's frequency components, and a high-pass filter then removes most of the low frequencies. The inverse Fourier transform yields an image in which only the high-frequency regions are visible, and finally these regions are overlaid on the original image to highlight them.
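A sketch of this pipeline in NumPy (the cut-off radius is an illustrative choice, not a value from the paper):

```python
import numpy as np

def high_frequency_map(gray_image, cutoff=30):
    """Return a normalised map of the high-frequency content of a grayscale image."""
    f = np.fft.fftshift(np.fft.fft2(gray_image))              # centred spectrum
    h, w = gray_image.shape
    cy, cx = h // 2, w // 2
    f[cy - cutoff:cy + cutoff, cx - cutoff:cx + cutoff] = 0   # high-pass: zero out low frequencies
    high = np.abs(np.fft.ifft2(np.fft.ifftshift(f)))          # inverse transform back to image domain
    return high / (high.max() + 1e-8)                         # normalised high-frequency regions
```

The returned map can then be blended with the original image to produce the overlays shown in the figures.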

Regarding the filters chosen for visualization: the final layer before classification consists of 1280 filters, and for an input image of size 224 × 224 the feature maps produced by these filters have size 7 × 7. From these 1280 filters, we selected two filters for each biometric: one with the highest average activation response to real images, which we call realF, and one that is highly activated by attack images, called attackF. We then visualize the 7 × 7 activation responses of these filters for some bona-fide and attack samples belonging to the same subject.
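A sketch of this filter selection; the layer name 'out_relu' (the final 7×7×1280 activation of the Keras MobileNetV2) and the `bona_fide_images` / `attack_images` arrays are assumptions made for illustration.

```python
import numpy as np
from keras.models import Model

# Sub-model that outputs the final 7x7x1280 feature maps (layer name assumed).
feature_model = Model(inputs=model.input,
                      outputs=model.get_layer('out_relu').output)

def mean_filter_activation(images):
    maps = feature_model.predict(images)    # shape (N, 7, 7, 1280)
    return maps.mean(axis=(0, 1, 2))        # average response per filter

# Filters with the strongest mean response over real / attack images.
realF = int(np.argmax(mean_filter_activation(bona_fide_images)))
attackF = int(np.argmax(mean_filter_activation(attack_images)))
```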

Figure 7 shows samples from the Warsaw iris dataset with their high-frequency regions highlighted, together with the activation responses of the realF and attackF filters. The visualization clearly shows that for real iris samples, high-frequency areas are concentrated at the bright regions around the inner eye corner and the lash line, whereas for attack presentations, high-frequency regions mostly correspond to noise patterns scattered over the attack medium (paper). The trained network successfully recognizes the real features concentrated at the eye corner, with realF responding strongly at those regions in real images while failing to detect them in attack samples. AttackF, in turn, detects the paper noise patterns in attack samples, which are missing in the real images.

A similar visualization for face images from the Replay-Attack dataset is shown in Figure 8. The attack presentations in these samples use a handheld mobile device displaying a photo of the subject. We can notice the absence of high-frequency response around the face edges, mouth and eyes, which is present in real samples and detected by the network through the realF feature. On the other hand, these attack images contain certain noise patterns in the clear background area, which are successfully detected by the network's attackF.

Visualizing the high-frequency regions in the images showed that, for real images, high-frequency variation concentrates on edges and depth changes of the real 3D biometric being presented, while for attack presentations the high-frequency components correspond to noise, either spread over the paper in the case of a print attack, or display-monitor noise patterns in the case of a mobile replay attack. Our trained CNN managed to correctly localize such features, leading to successful classification.

6.8.2 Gradient-weighted class activation maps

Another approach to analyzing what the network has learned is gradient-weighted class activation mapping (Grad-CAM) gradcam ; grad-cam, which highlights the areas that contributed most to the network's classification decision for an input image.
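A minimal Grad-CAM sketch for the bona-fide (sigmoid) output is given below, written against the TF1-style Keras backend used in this setup (TF2 eager mode would need GradientTape instead); the convolutional layer name is again an assumption for MobileNetV2.

```python
import numpy as np
from keras import backend as K

def grad_cam(model, image, conv_layer_name='out_relu'):
    """Heatmap of the regions driving the sigmoid (bona-fide) score for one image."""
    conv_output = model.get_layer(conv_layer_name).output
    class_score = model.output[:, 0]                      # sigmoid output (bona-fide score)
    grads = K.gradients(class_score, conv_output)[0]      # d(score) / d(feature maps)
    weights = K.mean(grads, axis=(1, 2))                  # global-average-pooled gradients
    cam = K.sum(weights[:, None, None, :] * conv_output, axis=-1)
    run = K.function([model.input], [cam])
    heatmap = run([image[None]])[0][0]
    heatmap = np.maximum(heatmap, 0)                      # ReLU on the weighted combination
    return heatmap / (heatmap.max() + 1e-8)
```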

We chose the MobileNetV2 network trained on the Replay-Attack face dataset and the grayscale MobBioFake iris dataset, and analyzed the activation maps for some bona-fide and attack test images from the six datasets. Figure 9 shows the Grad-CAM for the bona-fide class given real test samples that were either correctly or incorrectly classified, and Figure 10 shows the same for presentation-attack images.

The visualizations in these figures clearly show that the network focuses on features at the center of the face or iris when deciding on a bona-fide presentation, and when these center areas have lower activations, the network decides the image is an attack. Although the network was trained on two different biometrics, it focuses well on the important center regions of each biometric individually.

7 Conclusion and future work

In this paper, we surveyed the face and iris presentation attack detection methods proposed in the literature, with a focus on the use of very deep modern CNN architectures to tackle the PAD problem in both face and iris recognition.

We compared two different networks, InceptionV3 and MobileNetV2, with two fine-tuning approaches starting from network weights pretrained on the ImageNet dataset. Experimental results show that InceptionV3 achieved 0% SOTA intra-dataset error on six publicly available benchmark datasets when all of the network's weights were fine-tuned. The problem of small dataset sizes combined with very deep networks was mitigated by augmenting the original data with random flipping, slight rotations and translations, and by initializing the network with weights already pretrained on ImageNet rather than random weights, to avoid overfitting the small training sets. For future work, we would like to examine the effect of training these deep networks from scratch using the available PAD datasets.

The generalization ability of these networks was tested by performing cross-dataset evaluation, and better-than-SOTA results were reported on the face datasets. To the best of our knowledge, this is the first paper to include cross-dataset equal-error-rate evaluation on the three iris PAD datasets used, for which we achieved close to 0% test EER.

Beyond cross-dataset evaluation, we also trained a single network using multiple biometric data: face and iris images. This jointly trained network achieved lower cross-dataset error rates than networks trained on a single biometric type. This approach can later be applied to other biometric traits, such as the ear, to further demonstrate the effectiveness of using several biometric traits for training a PAD algorithm.

Finally, we analyzed the features learned by the networks and showed their correlation with the frequency components present in the input images, regardless of the biometric type. We also visualized the class heatmaps generated by the network for the bona-fide class, which showed that the centers of the face or iris are the most important regions driving the network's bona-fide classification decision. For future work, we would like to apply a patch-based CNN and identify the regions that contribute most to attack detection, in the hope of improving the generalization of the approach through a region-based CNN method.

References

  • [1] Jourabloo, A., Liu, Y., Liu, X.: ‘Face de-spoofing: Anti-spoofing via noise modeling’, CoRR, 2018,
  • [2] Nagpal, C., Dubey, S.R.: ‘A performance evaluation of convolutional neural networks for face anti spoofing’, CoRR, 2018,
  • [3] Nguyen, D., Rae.Baek, N., Pham, T., Ryoung.Park, K.: ‘Presentation attack detection for iris recognition system using nir camera sensor’, Sensors, 2018, 18, pp. 1315
  • [4] Nguyen, D.T., Pham, T.D., Lee, Y.W., Park, K.R.: ‘Deep learning-based enhanced presentation attack detection for iris recognition by combining features from local and global regions based on nir camera sensor’, Sensors, 2018, 18, (8)
  • [5] Pinto, A., Pedrini, H., Krumdick, M., Becker, B., Czajka, A., Bowyer, K.W., et al. Counteracting Presentation Attacks in Face Fingerprint and Iris Recognition. In: ‘Deep learning in biometrics’. (CRC Press, 2018. p. 49
  • [6] Czajka, A., Bowyer, K.W.: ‘Presentation attack detection for iris recognition: An assessment of the state-of-the-art’, ACM Computing Surveys, 2018, 51, (4), pp. 86:1–86:35
  • [7] Ramachandra, R., Busch, C.: ‘Presentation attack detection methods for face recognition systems: A comprehensive survey’, ACM Computing Surveys, 2017, 50, (1), pp. 8:1–8:37
  • [8] Abdul.Ghaffar, I., Norzali, M., Haji.Mohd, M.N.: ‘Presentation attack detection for face recognition on smartphones: A comprehensive review’, Journal of Telecommunication, Electronic and Computer Engineering, 2017, 9
  • [9] Li, L., Correia, P.L., Hadid, A.: ‘Face recognition under spoofing attacks: countermeasures and research directions’, IET Biometrics, 2018, 7, pp. 3–14(11)
  • [10] Selvaraju, R.R., Das, A., Vedantam, R., Cogswell, M., Parikh, D., Batra, D.: ‘Grad-cam: Why did you say that? visual explanations from deep networks via gradient-based localization’, CoRR, 2016, Available from: http://arxiv.org/abs/1610.02391
  • [11] Daugman, J. ‘Iris recognition and anti-spoofing countermeasures’. In: 7th International Biometrics Conference. (, 2004.
  • [12] Galbally, J., Ortiz.Lopez, J., Fierrez, J., Ortega.Garcia, J. ‘Iris liveness detection based on quality related features’. In: 2012 5th IAPR International Conference on Biometrics (ICB). (, 2012. pp.  271–276
  • [13] Pacut, A., Czajka, A.: ‘Aliveness detection for iris biometrics’, 40th Annual IEEE International Carnahan Conferences Security Technology, 2006, pp.  122 – 129
  • [14] Czajka, A.: ‘Pupil dynamics for iris liveness detection’, IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 726–735
  • [15] Czajka, A. ‘Pupil dynamics for presentation attack detection in iris recognition’. In: International Biometric Performance Conference (IBPC), NIST, Gaithersburg. (, 2014. pp.  1–3
  • [16] Bodade, R., Talbar, S. ‘Dynamic iris localisation: A novel approach suitable for fake iris detection’. In: 2009 International Conference on Ultra Modern Telecommunications Workshops. (, 2009. pp.  1–5
  • [17] Huang, X., Ti, C., Hou, Q., Tokuta, A., Yang, R. ‘An experimental study of pupil constriction for liveness detection’. In: 2013 IEEE Workshop on Applications of Computer Vision (WACV). (, 2013. pp.  252–258
  • [18] Kanematsu, M., Takano, H., Nakamura, K. ‘Highly reliable liveness detection method for iris recognition’. In: SICE Annual Conference 2007. (, 2007. pp.  361–364
  • [19] Lee, E.C., Ryoung.Park, K.: ‘Fake iris detection based on 3d structure of iris pattern’, International Journal of Imaging Systems and Technology, 2010, 20, pp. 162 – 166
  • [20] Mohd, M.N.H., Kashima, M., Sato, K., , Watanabe, M.: ‘Internal state measurement from facial stereo thermal and visible sensors through svm classification’, ARPN J Eng Appl Sci, 2015, 10, (18), pp. 8363–8371
  • [21] Mohd, M., M..N..H., K., M., S., K., W.: ‘A non-invasive facial visual-infrared stereo vision based measurement as an alternative for physiological measurement’, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2015, 10, (18), pp. 684–697
  • [22] Pan, G., Sun, L., Wu, Z., Wang, Y.: ‘Monocular camera-based face liveness detection by combining eyeblink and scene context’, Telecommunication Systems, 2011, 47, (3), pp. 215–225
  • [23] Smith, D.F., Wiliem, A., Lovell, B.C.: ‘Face recognition on consumer devices: Reflections on replay attacks’, IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 736–745
  • [24] Kollreider, K., Fronthaler, H., Bigun, J. ‘Verifying liveness by multiple experts in face biometrics’. In: 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. (, 2008. pp.  1–6
  • [25] Ali, A., Deravi, F., Hoque, S. ‘Directional sensitivity of gaze-collinearity features in liveness detection’. In: 2013 Fourth International Conference on Emerging Security Technologies. (, 2013. pp.  8–11
  • [26] Wang, L., Ding, X., Fang, C.: ‘Face live detection method based on physiological motion analysis’, Tsinghua Science and Technology, 2009, 14, (6), pp. 685–690
  • [27] Bharadwaj, S., Dhamecha, T.I., Vatsa, M., Singh, R. ‘Computationally efficient face spoofing detection with motion magnification’. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops. (, 2013. pp.  105–110
  • [28] Sun, L., Wu, Z., Lao, S., Pan, G. ‘Eyeblink-based anti-spoofing in face recognition from a generic webcamera’. In: 2007 11th IEEE International Conference on Computer Vision(ICCV). (, 2007. pp.  1–8
  • [29] Patel, K., Han, H., Jain, A.K. ‘Cross-database face antispoofing with robust feature representation’. In: You, Z., Zhou, J., Wang, Y., Sun, Z., Shan, S., Zheng, W., et al., editors. Biometric Recognition. (Cham: Springer International Publishing, 2016. pp.  611–619
  • [30] Marsico, M.D., Nappi, M., Riccio, D., Dugelay, J. ‘Moving face spoofing detection via 3d projective invariants’. In: 2012 5th IAPR International Conference on Biometrics (ICB). (, 2012. pp.  73–78
  • [31] Pan, G., Wu, Z., Sun, L. ‘Liveness detection for face recognition’. In: Delac, K., Grgic, M., Bartlett, M.S., editors. Recent Advances in Face Recognition. (Rijeka: IntechOpen, 2008.
  • [32] Kollreider, K., Fronthaler, H., Faraj, M.I., Bigun, J.: ‘Real-time face detection and motion analysis with application in “liveness” assessment’, IEEE Transactions on Information Forensics and Security, 2007, 2, (3), pp. 548–558
  • [33] Komogortsev, O.V., Karpov, A. ‘Liveness detection via oculomotor plant characteristics: Attack of mechanical replicas’. In: 2013 International Conference on Biometrics (ICB). (, 2013. pp.  1–8
  • [34] Komogortsev, O.V., Karpov, A., Holland, C.D.: ‘Attack of mechanical replicas: Liveness detection with eye movements’, IEEE Trans Inf Forens Security, 2015, 10, (4), pp. 716–725
  • [35] Rigas, I., Komogortsev, O.V.: ‘Gaze estimation as a framework for iris liveness detection’, IEEE International Joint Conference on Biometrics, 2014, pp.  1–8
  • [36] Rigas, I., Komogortsev, O.V.: ‘Eye movement-driven defense against iris print-attacks’, Pattern Recogn Lett, 2015, 68, (P2), pp. 316–326
  • [37] Akhtar, Z., Foresti, G.L.: ‘Face spoof attack recognition using discriminative image patches’, Journal of Electrical and Computer Engineering, 2016, 2016, pp. 1–14
  • [38] Tan, X., Li, Y., Liu, J., Jiang, L. ‘Face liveness detection from a single image with sparse low rank bilinear discriminative model’. In: Daniilidis, K., Maragos, P., Paragios, N., editors. Computer Vision – ECCV 2010. (Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. pp.  504–517
  • [39] Kollreider, K., Fronthaler, H., Bigun, J.: ‘Non-intrusive liveness detection by face images’, Image Vision Comput, 2009, 27, (3), pp. 233–244
  • [40] Bao, W., Li, H., Li, N., Jiang, W. ‘A liveness detection method for face recognition based on optical flow field’. In: Image Analysis and Signal Processing. (, 2009. pp.  233–236
  • [41] Siddiqui, T.A., Bharadwaj, S., Dhamecha, T.I., Agarwal, A., Vatsa, M., Singh, R., et al.: ‘Face anti-spoofing with multifeature videolet aggregation’, 2016 23rd International Conference on Pattern Recognition (ICPR), 2016, pp.  1035–1040
  • [42] d. S..Pinto, A., Pedrini, H., Schwartz, W., Rocha, A. ‘Video-based face spoofing detection through visual rhythm analysis’. In: 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images. (, 2012. pp.  221–228
  • [43] Komulainen, J., Hadid, A., Pietikäinen, M.: ‘Context based face anti-spoofing’, 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013, pp.  1–8
  • [44] Kim, S., Yu, S., Kim, K., Ban, Y., Lee, S.: ‘Face liveness detection using variable focusing’, ICB, 2013, pp.  1–6
  • [45] Wang, T., Yang, J., Lei, Z., Liao, S., Li, S.Z.: ‘Face liveness detection using 3d structure recovered from a single camera’, ICB, 2013, pp.  1–6
  • [46] Kose, N., Dugelay, J. ‘Reflectance analysis based countermeasure technique to detect face mask attacks’. In: 2013 18th International Conference on Digital Signal Processing (DSP). (, 2013. pp.  1–6
  • [47] Erdogmus, N., Marcel, S. ‘Spoofing in 2d face recognition with 3d masks and anti-spoofing with kinect’. In: 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS). (, 2013. pp.  1–6
  • [48] Erdogmus, N., Marcel, S. ‘Spoofing 2d face recognition systems with 3d masks’. In: 2013 International Conference of the BIOSIG Special Interest Group (BIOSIG). (, 2013. pp.  1–8
  • [49] Li, J., Wang, Y., Tan, T., Jain, A.K. ‘Live face detection based on the analysis of Fourier spectra’. In: Jain, A.K., Ratha, N.K., editors. Biometric Technology for Human Identification. vol. 5404. (, 2004. pp.  296–303
  • [50] Zhang, Z., Yan, J., Liu, S., Lei, Z., Yi, D., Li, S.Z. ‘A face antispoofing database with diverse attacks’. In: 2012 5th IAPR International Conference on Biometrics (ICB). (, 2012. pp.  26–31
  • [51] Lee, T., Ju, G., Liu, H., Wu, Y. ‘Liveness detection using frequency entropy of image sequences’. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. (, 2013. pp.  2367–2370
  • [52] Czajka, A. ‘Database of iris printouts and its application: Development of liveness detection method for iris recognition’. In: 2013 18th International Conference on Methods Models in Automation Robotics (MMAR). (, 2013. pp.  28–33
  • [53] Wei, Z., Qiu, X., Sun, Z., Tan, T. ‘Counterfeit iris detection based on texture analysis’. In: ICPR 2008 19th International Conference on Pattern Recognition(ICPR). (, 2009. pp.  1–4
  • [54] Sequeira, A.F., Murari, J., Cardoso, J.S. ‘Iris liveness detection methods in mobile applications’. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP). vol. 3. (, 2014. pp.  22–33
  • [55] Sequeira, A.F., Murari, J., Cardoso, J.S. ‘Iris liveness detection methods in the mobile biometrics scenario’. In: Proceedings of International Joint Conference on Neural Networks (IJCNN). (, 2014. pp.  3002–3008
  • [56] Unnikrishnan, S., Eshack, A. ‘Face spoof detection using image distortion analysis and image quality assessment’. In: 2016 International Conference on Emerging Technological Trends (ICETT). (, 2016. pp.  1–5
  • [57] Wen, D., Han, H., Jain, A.K.: ‘Face spoof detection with image distortion analysis’, IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 746–761
  • [58] Galbally, J., Marcel, S. ‘Face anti-spoofing based on general image quality assessment’. In: 2014 22nd International Conference on Pattern Recognition. (, 2014. pp.  1173–1178
  • [59] Söllinger, D., Trung, P., Uhl, A.: ‘Non-reference image quality assessment and natural scene statistics to counter biometric sensor spoofing’, IET Biometrics, 2018, 7, (4), pp. 314–324
  • [60] Bhogal, A.P.S., Söllinger, D., Trung, P., Uhl, A.: ‘Non-reference image quality assessment for biometric presentation attack detection’, 2017 5th International Workshop on Biometrics and Forensics (IWBF), 2017, pp.  1–6
  • [61] Boulkenafet, Z., Komulainen, J., Hadid, A. ‘Face anti-spoofing based on color texture analysis’. In: 2015 IEEE International Conference on Image Processing (ICIP). (, 2015. pp.  2636–2640
  • [62] Boulkenafet, Z., Komulainen, J., Hadid, A.: ‘Face spoofing detection using colour texture analysis’, IEEE Transactions on Information Forensics and Security, 2016, 11, (8), pp. 1818–1830
  • [63] Boulkenafet, Z., Komulainen, J., Hadid, A.: ‘Face antispoofing using speeded-up robust features and fisher vector encoding’, IEEE Signal Processing Letters, 2017, 24, (2), pp. 141–145
  • [64] Boulkenafet, Z., Komulainen, J., Akhtar, Z., Benlamoudi, A., Samai, D., Bekhouche, S.E., et al. ‘A competition on generalized software-based face presentation attack detection in mobile scenarios’. In: 2017 IEEE International Joint Conference on Biometrics (IJCB). (, 2017. pp.  688–696
  • [65] Raja, K.B., Raghavendra, R., Busch, C. ‘Color adaptive quantized patterns for presentation attack detection in ocular biometric systems’. In: Proceedings of the 9th International Conference on Security of Information and Networks. SIN ’16. (New York, NY, USA: ACM, 2016. pp.  9–15
  • [66] Boulkenafet, Z., Komulainen, J., Hadid, A.: ‘On the generalization of color texture-based face anti-spoofing’, Image Vision Comput, 2018, 77, pp. 1–9
  • [67] Anjos, A., Marcel, S. ‘Counter-measures to photo attacks in face recognition: A public database and a baseline’. In: 2011 International Joint Conference on Biometrics (IJCB). (, 2011. pp.  1–7
  • [68] Komulainen, J., Hadid, A., Pietikäinen, M., Anjos, A., Marcel, S. ‘Complementary countermeasures for detecting scenic face spoofing attacks’. In: 2013 International Conference on Biometrics (ICB). (, 2013. pp.  1–7
  • [69] Gragnaniello, D., Poggi, G., Sansone, C., Verdoliva, L.: ‘An investigation of local descriptors for biometric spoofing detection’, IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 849–863
  • [70] de Freitas.Pereira, T., Anjos, A., De.Martino, J.M., Marcel, S. ‘Lbp − top based countermeasure against face spoofing attacks’. In: Park, J.I., Kim, J., editors. Computer Vision - ACCV 2012 Workshops. (Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. pp.  121–132
  • [71] de Freitas.Pereira, T., Anjos, A., Martino, J.M.D., Marcel, S. ‘Can face anti-spoofing countermeasures work in a real world scenario?’. In: 2013 International Conference on Biometrics (ICB). (, 2013. pp.  1–8
  • [72] Määttä, J., Hadid, A., Pietikäinen, M. ‘Face spoofing detection from single images using micro-texture analysis’. In: 2011 International Joint Conference on Biometrics (IJCB). (, 2011. pp.  1–7
  • [73] Chingovska, I., Anjos, A., Marcel, S. ‘On the effectiveness of local binary patterns in face anti-spoofing’. In: 2012 BIOSIG - Proceedings of the International Conference of Biometrics Special Interest Group (BIOSIG). (, 2012. pp.  1–7
  • [74] Yang, J., Lei, Z., Liao, S., Li, S.Z. ‘Face liveness detection with component dependent descriptor’. In: 2013 International Conference on Biometrics (ICB). (, 2013. pp.  1–6
  • [75] Maatta, J., Hadid, A., Pietikainen, M.: ‘Face spoofing detection from single images using texture and local shape analysis’, IET Biometrics, 2012, 1, (1), pp. 3–10
  • [76] Benlamoudi, A., Samai, D., Ouafi, A., Bekhouche, S.E., Taleb.Ahmed, A., Hadid, A. ‘Face spoofing detection using local binary patterns and fisher score’. In: 2015 3rd International Conference on Control, Engineering Information Technology (CEIT). (, 2015. pp.  1–5
  • [77] Patel, K., Han, H., Jain, A.K.: ‘Secure face unlock: Spoof detection on smartphones’, IEEE Transactions on Information Forensics and Security, 2016, 11, (10), pp. 2268–2283
  • [78] Schwartz, W.R., Rocha, A., Pedrini, H. ‘Face spoofing detection through partial least squares and low-level descriptors’. In: 2011 International Joint Conference on Biometrics (IJCB). (, 2011. pp.  1–8
  • [79] Xu, Z., Li, S., Deng, W. ‘Learning temporal features using lstm-cnn architecture for face anti-spoofing’. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR). (, 2015. pp.  141–145
  • [80] Peixoto, B., Michelassi, C., Rocha, A. ‘Face liveness detection under bad illumination conditions’. In: 2011 18th IEEE International Conference on Image Processing. (, 2011. pp.  3557–3560
  • [81] Raghavendra, R., Busch, C.: ‘Robust scheme for iris presentation attack detection using multiscale binarized statistical image features’, IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 703–715
  • [82] Doyle, J.S., Bowyer, K.W.: ‘Robust detection of textured contact lenses in iris recognition using bsif’, IEEE Access, 2015, 3, pp. 1672–1683
  • [83] Akhtar, Z., Micheloni, C., Piciarelli, C., Foresti, G.L. ‘Mobio_livdet: Mobile biometric liveness detection’. In: 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). (, 2014. pp.  187–192
  • [84] Tronci, R., Muntoni, D., Fadda, G., Pili, M., Sirena, N., Murgia, G., et al. ‘Fusion of multiple clues for photo-attack detection in face recognition systems’. In: 2011 International Joint Conference on Biometrics (IJCB). (, 2011. pp.  1–6
  • [85] Feng, L., Po, L.M., Li, Y., Xu, X., Yuan, F., Cheung, T.C.H., et al.: ‘Integration of image quality and motion cues for face anti-spoofing: A neural network approach’, Journal of Visual Communication and Image Representation, 2016, 38, pp. 451 – 460
  • [86] Li, L., Feng, X., Boulkenafet, Z., Xia, Z., Li, M., Hadid, A. ‘An original face anti-spoofing approach using partial convolutional neural network’. In: 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA). (, 2016. pp.  1–6
  • [87] Nguyen, T.D., Pham, T.D., Baek, N.R., Park, K.R. ‘Combining deep and handcrafted image features for presentation attack detection in face recognition systems using visible-light camera sensors’. In: Sensors. (, 2018.
  • [88] Atoum, Y., Liu, Y., Jourabloo, A., Liu, X. ‘Face anti-spoofing using patch and depth-based cnns’. In: 2017 IEEE International Joint Conference on Biometrics (IJCB). (, 2017. pp.  319–328
  • [89] Liu, Y., Jourabloo, A., Liu, X.: ‘Learning deep models for face anti-spoofing: Binary or auxiliary supervision’, CoRR, 2018,
  • [90] Wang, Z., Zhao, C., Qin, Y., Zhou, Q., Lei, Z.: ‘Exploiting temporal and depth information for multi-frame face anti-spoofing’, CoRR, 2018,
  • [91] Yang, J., Lei, Z., Li, S.Z.: ‘Learn convolutional neural network for face anti-spoofing’, CoRR, 2014,
  • [92] Gan, J., Li, S., Zhai, Y., Liu, C. ‘3d convolutional neural network based on face anti-spoofing’. In: 2017 2nd International Conference on Multimedia and Image Processing (ICMIP). (, 2017. pp.  1–5
  • [93] Lucena, O., Junior, A., Moia, V., Souza, R., Valle, E., de Alencar.Lotufo, R. In: ‘Transfer learning using convolutional neural networks for face anti-spoofing’. (, 2017. pp.  27–34
  • [94] Czajka, A., Bowyer, K.W., Krumdick, M., VidalMata, R.G.: ‘Recognition of image-orientation-based iris spoofing’, IEEE Transactions on Information Forensics and Security, 2017, 12, (9), pp. 2184–2196
  • [95] Silva, P., Luz, E., Baeta, R., Pedrini, H., Falcão, A., Menotti, D. ‘An approach to iris contact lens detection based on deep image representations’. In: 2015 28th SIBGRAPI Conference on Graphics, Patterns and Images. (, 2015. pp.  157–164
  • [96] Kuehlkamp, A., da Silva.Pinto, A., Rocha, A., Bowyer, K.W., Czajka, A.: ‘Ensemble of multi-view learning classifiers for cross-domain iris presentation attack detection’, CoRR, 2018,
  • [97] Hoffman, S., Sharma, R., Ross, A. ‘Convolutional neural networks for iris presentation attack detection: Toward cross-dataset and cross-sensor generalization’. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). (, 2018. pp.  1701–17018
  • [98] Menotti, D., Chiachia, G., Pinto, A., Schwartz, W.R., Pedrini, H., Falcão, A.X., et al.: ‘Deep representations for iris, face, and fingerprint spoofing detection’, IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 864–879
  • [99] Agarwal, A., Singh, R., Vatsa, M.: ‘Face anti-spoofing using haralick features’, 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), 2016, pp.  1–6
  • [100] Galbally, J., Marcel, S., Fierrez, J.: ‘Image quality assessment for fake biometric detection: Application to iris, fingerprint, and face recognition’, IEEE Transactions on Image Processing, 2014, 23, (2), pp. 710–724
  • [101] Garcia, D.C., de Queiroz, R.L.: ‘Face-spoofing 2d-detection based on moiré-pattern analysis’, IEEE Transactions on Information Forensics and Security, 2015, 10, (4), pp. 778–786
  • [102] He, Z., Sun, Z., Tan, T., Wei, Z. ‘Efficient iris spoof detection via boosted local binary patterns’. In: Tistarelli, M., Nixon, M.S., editors. Advances in Biometrics. (Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. pp.  1080–1090
  • [103] Zhang, H., Sun, Z., Tan, T. ‘Contact lens detection based on weighted lbp’. In: 2010 20th International Conference on Pattern Recognition. (, 2010. pp.  4279–4282
  • [104] Costa.Pazo, A., Bhattacharjee, S., Vazquez.Fernandez, E., Marcel, S. ‘The replay-mobile face presentation-attack database’. In: 2016 International Conference of the Biometrics Special Interest Group (BIOSIG). (, 2016. pp.  1–7
  • [105] Sequeira, A.F., Monteiro, J.C., Rebelo, A., Oliveira, H.P. ‘Mobbio: A multimodal database captured with a portable handheld device’. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP). vol. 3. (, 2014. pp.  133–139
  • [106] Gragnaniello, D., Sansone, C., Verdoliva, L.: ‘Iris liveness detection for mobile devices based on local descriptors’, Pattern Recogn Lett, 2015, 57, (C), pp. 81–87
  • [107] Marsico, M.D., Galdi, C., Nappi, M., Riccio, D.: ‘Firme: Face and iris recognition for mobile engagement’, Image Vision Comput, 2014, 32, pp. 1161–1172
  • [108] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al.: ‘Imagenet large scale visual recognition challenge’, International Journal of Computer Vision, 2015, 115, (3), pp. 211–252
  • [109] Boulkenafet, Z., Komulainen, J., Li, L., Feng, X., Hadid, A. ‘Oulu-npu: A mobile face presentation attack database with real-world variations’. In: 2017 12th IEEE International Conference on Automatic Face Gesture Recognition (FG 2017). (, 2017. pp.  612–618
  • [110] Yambay, D., Becker, B., Kohli, N., Yadav, D., Czajka, A., Bowyer, K.W., et al. ‘Livdet iris 2017 - iris liveness detection competition 2017’. In: 2017 IEEE International Joint Conference on Biometrics, IJCB 2017, Denver, CO, USA, October 1-4, 2017. (, 2017. pp.  733–741
  • [111] Chakka, M.M., Anjos, A., Marcel, S., Tronci, R., Muntoni, D., Fadda, G., et al. ‘Competition on counter measures to 2-d facial spoofing attacks’. In: 2011 International Joint Conference on Biometrics (IJCB). (, 2011. pp.  1–6
  • [112] Chingovska, I., Yang, J., Lei, Z., Yi, D., Li, S.Z., Kahm, O., et al. ‘The 2nd competition on counter measures to 2d face spoofing attacks’. In: 2013 International Conference on Biometrics (ICB). (, 2013. pp.  1–6
  • [113] Zhang, S., Wang, X., Liu, A., Zhao, C., Wan, J., Escalera, S., et al. ‘A dataset and benchmark for large-scale multi-modal face anti-spoofing’. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). (, 2019. pp.  919–928
  • [114] Liu, A., Wan, J., Escalera, S., Jair.Escalante, H., Tan, Z., Yuan, Q., et al. ‘Multi-modal face anti-spoofing attack detection challenge at cvpr2019’. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. (, 2019.
  • [115] Liu, A., Tan, Z., Li, X., Wan, J., Escalera, S., Guo, G., et al.. ‘Static and dynamic fusion for multi-modal cross-ethnicity face anti-spoofing’. (, 2019
  • [116] Yambay, D., Jr., J.S.D., Bowyer, K.W., Czajka, A., Schuckers, S. ‘Livdet-iris 2013 - iris liveness detection competition 2013’. In: IEEE International Joint Conference on Biometrics, Clearwater, IJCB 2014, FL, USA, September 29 - October 2, 2014. (, 2014. pp.  1–8
  • [117] Yambay, D., Walczak, B., Schuckers, S., Czajka, A. ‘Livdet-iris 2015 - iris liveness detection competition 2015’. In: IEEE International Conference on Identity, Security and Behavior Analysis, ISBA 2017, New Delhi, India, February 22-24, 2017. (, 2017. pp.  1–6
  • [118] Sequeira, A.F., Oliveira, H.P., Monteiro, J.C., Monteiro, J.P., Cardoso, J.S. ‘Mobilive 2014 - mobile iris liveness detection competition’. In: Proceedings of the International Joint Conference on Biometrics (IJCB). (, 2014. pp.  1–6
  • [119] Ruiz.Albacete, V., Tome.Gonzalez, P., Alonso.Fernandez, F., Galbally, J., Fierrez, J., Ortega.Garcia, J. ‘Direct attacks using fake images in iris verification’. In: Schouten, B., Juul, N.C., Drygajlo, A., Tistarelli, M., editors. Biometrics and Identity Management. (Berlin, Heidelberg: Springer Berlin Heidelberg, 2008. pp.  181–190
  • [120] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: ‘Rethinking the inception architecture for computer vision’, CoRR, 2015, abs/1512.00567
  • [121] He, K., Zhang, X., Ren, S., Sun, J.: ‘Deep residual learning for image recognition’, CoRR, 2015, abs/1512.03385
  • [122] Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: ‘Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation’, CoRR, 2018, abs/1801.04381
  • [123] Simonyan, K., Zisserman, A.: ‘Very deep convolutional networks for large-scale image recognition’, CoRR, 2014, abs/1409.1556
  • [124] Kingma, D.P., Ba, J.: ‘Adam: A method for stochastic optimization’, CoRR, 2014, abs/1412.6980
  • [125] Anjos, A., Shafey, L.E., Wallace, R., Günther, M., McCool, C., Marcel, S. ‘Bob: a free signal processing and machine learning toolbox for researchers’. In: 20th ACM Conference on Multimedia Systems (ACMMM), Nara, Japan. (, 2012.
  • [126] Anjos, A., Günther, M., de Freitas.Pereira, T., Korshunov, P., Mohammadi, A., Marcel, S. ‘Continuously reproducing toolchains in pattern recognition and machine learning experiments’. In: International Conference on Machine Learning (ICML). (, 2017.
  • [127] Peng, F., Qin, L., Long, M.: ‘Face presentation attack detection using guided scale texture’, Multimedia Tools and Applications, 2018, 77, (7), pp. 8883–8909