
Fingerprint Feature Extraction by Combining Texture, Minutiae, and Frequency Spectrum Using Multi-Task CNN

Ai Takahashi1, Yoshinori Koda1,2, Koichi Ito1, and Takafumi Aoki1
1 Graduate School of Information Sciences, Tohoku University, Japan.
2 Biometrics Research Laboratories, NEC Corporation, Japan.
[email protected]
Abstract

Although most fingerprint matching methods utilize minutia points and/or the texture of fingerprint images as fingerprint features, the frequency spectrum is also a useful feature since a fingerprint consists of ridge patterns with an inherent frequency band. We propose a novel CNN-based method for extracting fingerprint features from texture, minutiae, and the frequency spectrum. In order to extract effective texture features from local regions around the minutiae, a minutia attention module is introduced into the proposed method. We also propose new data augmentation methods that take the characteristics of fingerprint images into account to increase the number of images during training, since we train only on a public dataset that contains a small number of fingerprint classes. Through a set of experiments using FVC2004 DB1 and DB2, we demonstrate that the proposed method exhibits efficient performance on fingerprint verification compared with commercial fingerprint matching software and a conventional method.


1 Introduction

A fingerprint is a biometric feature with high permanence and high discriminability, and it has been used in many biometric authentication systems, such as user authentication for mobile devices, mobile payments, and national IDs [13]. In general, fingerprint matching is based on the positions and angles of feature points, called minutiae, and on the relative positions between the feature points [17].

Fingerprint recognition using minutiae has two problems: (i) the recognition accuracy depends on the accuracy of minutia extraction, and (ii) the length of the feature depends on the number of minutiae. For example, in a scratched or dry fingerprint, breaks in the ridges are extracted as spurious minutiae. In wet or crushed fingerprints, the number of extracted minutiae is small since the ridges are collapsed. Highly accurate recognition of such fingerprint images requires combining multiple features that can be extracted from the fingerprint image. In addition, a fingerprint image may capture a different area of the finger at registration and at authentication, since the image is acquired by placing the finger on a scanner. It is also difficult to apply template protection to prevent leakage of biometric features since the number of extracted minutiae differs at each acquisition.

In order to address the above problems, fingerprint matching methods based on deep learning have been proposed [16, 8]. Li et al. [16] used a Fully Convolutional Network (FCN) to extract features based on the texture of fingerprint images. Engelsma et al. [8] proposed a Convolutional Neural Network (CNN) to extract texture features of the whole fingerprint and minutia features of the fingerprint. Using CNNs, it is possible to extract fixed-length fingerprint features that are independent of the number of minutiae. On the other hand, these deep learning-based fingerprint matching methods used large-scale private fingerprint image datasets in training, and their effectiveness and reproducibility are not always guaranteed.

In this paper, we propose a new CNN-based method for extracting fingerprint features that achieves highly accurate fingerprint recognition while training only on a public dataset. The proposed method employs the frequency spectrum as a new feature in addition to the texture and minutiae used in other fingerprint matching methods. We introduce a minutia attention module in order to extract effective texture features from local regions around the minutiae. We also propose new data augmentation methods that take the characteristics of fingerprint images into account to increase the number of images during training. We train our CNN model using 9,000 fingerprint images (1,000 classes) of the IIIT-D Multi-sensor Optical and Latent Fingerprint (MOLF) dataset [19] and show that the proposed method is more accurate than the commercial fingerprint authentication software VeriFinger and the method of [8] through experiments using FVC2004 [2].

The contributions of this paper are summarized below.

  1. We propose a novel CNN architecture to extract features from fingerprint images by combining texture, minutiae, and the frequency spectrum.

  2. We propose a minutia attention module that attends to the positions of the minutiae to extract global and local texture features.

  3. We propose new data augmentation methods specific to fingerprint images.

  4. We demonstrate that the recognition accuracy is higher than that of the commercial software VeriFinger SDK and a conventional method while using only a public dataset for training.

2 Related Work

Fingerprint recognition methods are classified into two approaches: one based on local features, i.e., minutiae, and the other based on global features, i.e., texture.

The most commonly used local features of a fingerprint are the minutiae, which are the end points and bifurcation points of the fingerprint ridge patterns. The positions, angles, and relative relations of the minutiae are used to compute the matching score. Minutia-based methods are widely used in fingerprint recognition; for example, VeriFinger SDK [5] is available as commercial software and NIST Biometric Image Software (NBIS) [3] as free software. Although the minutiae are highly discriminative features, the recognition accuracy decreases if the minutiae are not extracted correctly. Most of the methods using global features such as texture are based on image-based template matching [17]. Among them, a fingerprint matching method using frequency features of images has been proposed for matching fingerprint images from which minutiae cannot be extracted, such as those of dry or allergic skin [11]. Ito et al. [11] proposed a method called Band-Limited Phase-Only Correlation (BLPOC), which focuses on the phase information of the image. BLPOC improves the matching accuracy by utilizing the fact that the texture pattern of a fingerprint has an inherent frequency band and that the energy is concentrated in that band. Ito et al. [10] also showed that fingerprint recognition can be made more accurate by combining BLPOC with a minutia-based method. On the other hand, the recognition accuracy decreases under rotation and nonlinear deformation between fingerprint images, since only the translation between images can be handled.

Recently, fingerprint recognition methods based on CNNs have been proposed to address the above problems. Tang et al. [21] proposed FingerNet, which can extract minutiae from low-quality fingerprint images with high accuracy by incorporating Gabor filter-based fingerprint image enhancement into CNNs. Through experiments using NIST SD27 [4], a latent fingerprint image dataset, they demonstrated that FingerNet exhibits higher minutia extraction accuracy than commercial software. Nguyen et al. [18] proposed MinutiaeNet, which combines CoarseNet, which detects minutiae in the whole fingerprint image, with FineNet, which estimates the existence probability of minutiae. Li et al. [16] proposed a fingerprint feature extractor that does not require alignment and is robust to rotation and translation by training an FCN on local regions centered on minutiae. Engelsma et al. [8] proposed a multi-task learning method for texture feature extraction and minutia detection to extract fingerprint features from texture and minutiae. Since the fingerprint features extracted using CNNs are of fixed length, encryption can be applied to protect the biometric information. On the other hand, most CNN-based methods use a large-scale private fingerprint image dataset in training, and their effectiveness and reproducibility are not always guaranteed.

3 Method

The CNN-based fingerprint recognition methods proposed so far use minutiae, texture, or a combination of both [21, 18, 8]. To improve the recognition accuracy, we consider using the frequency spectrum as a feature in addition to minutiae and texture. Since the ridge patterns of fingerprints have an inherent frequency band [11], it is possible to match fingerprint images from which minutiae cannot be extracted by focusing on that frequency band. The proposed method consists of three CNNs that extract three features, i.e., texture, minutiae, and frequency spectrum, which are then concatenated to form the fingerprint feature. We define a loss function for each feature, train the model with multi-task learning, and introduce deep metric learning to improve the recognition accuracy. In order to train efficiently with fewer images, we introduce new data augmentation methods specialized for fingerprint images.

The network architecture of the proposed fingerprint feature extraction method is shown in Fig. 1. First, we use Spatial Transformer Networks (STN) [12] to align the rotation of the fingerprint image. Next, we extract texture-based, minutia-based, and frequency-based features from the fingerprint image using CNNs based on the Residual Network (ResNet) [9]. The three features are then concatenated to form the fingerprint feature, which is used for matching. The details of the proposed method are described below.
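For concreteness, the following is a minimal PyTorch-style sketch of the pipeline in Fig. 1. The sub-network interfaces and the final L2 normalization are assumptions made for illustration, not the exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FingerprintFeature(nn.Module):
    """Sketch of the overall pipeline of Fig. 1 (sub-module interfaces are
    hypothetical): STN alignment, three feature branches, concatenation."""

    def __init__(self, stn, texture_net, minutia_net, frequency_net):
        super().__init__()
        self.stn = stn                      # rotation alignment (Sec. 3.1)
        self.texture_net = texture_net      # texture branch with MAM (Sec. 3.3)
        self.minutia_net = minutia_net      # minutia branch + minutia map generator
        self.frequency_net = frequency_net  # frequency branch on the DFT input

    def forward(self, image, spectrum):
        aligned = self.stn(image)                           # (B, 1, H, W) aligned image
        t_min, minutia_map = self.minutia_net(aligned)      # minutia feature + minutia map
        t_tex = self.texture_net(aligned, minutia_map)      # texture feature with attention
        t_freq = self.frequency_net(spectrum)               # frequency feature
        feature = torch.cat([t_tex, t_min, t_freq], dim=1)  # fixed-length fingerprint feature
        return F.normalize(feature, dim=1)                  # unit norm (assumption) for Eq. (8)
```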

Figure 1: Overview of the network architecture used in the proposed method.

3.1 Rotation Alignment Using STN

Before processing with the proposed method, the input fingerprint image is enhanced to increase the recognition accuracy. In this paper, we employ a fingerprint enhancement method based on the intensity gradient, using the FastEnhanceTexture implementation [1]. In the first step of the proposed method, we align the rotation of the fingerprint image, since rotation causes a decrease in the accuracy of feature extraction. We use the STN [12] so that the alignment can be learned end-to-end together with the CNNs. The STN estimates the transformation parameters with a localization network and transforms the image based on the estimated parameters. Fig. 2 shows the architecture of the localization network used in the proposed method. Although it is possible to correct both translation and rotation using the STN, we have confirmed experimentally that correcting the rotation angle alone is sufficient. As a result, the number of parameters to be estimated is reduced and the training is stabilized.
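A rotation-only STN can be sketched as follows; the localization layer sizes here are placeholders and do not reproduce the exact network of Fig. 2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RotationSTN(nn.Module):
    """Sketch of a rotation-only spatial transformer: the localization network
    predicts a single angle theta and the input is resampled with the
    corresponding 2x3 affine matrix (layer sizes are assumptions)."""

    def __init__(self):
        super().__init__()
        self.localization = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1),  # one parameter: the rotation angle (radians)
        )

    def forward(self, x):
        theta = self.localization(x).squeeze(1)                # (B,)
        cos, sin = torch.cos(theta), torch.sin(theta)
        zeros = torch.zeros_like(theta)
        # rotation-only affine matrix; translation terms are fixed to zero
        mat = torch.stack([
            torch.stack([cos, -sin, zeros], dim=1),
            torch.stack([sin,  cos, zeros], dim=1),
        ], dim=1)                                              # (B, 2, 3)
        grid = F.affine_grid(mat, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```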

Figure 2: Architecture of localization network used in STN.

3.2 Feature Extraction Using CNNs

Frequency, texture, and minutia features are extracted from the rotation-corrected fingerprint image using ResNet [9]. The frequency features are extracted using the CNN shown in Fig. 3, where ResBlock is the same as the one used in ResNet [9]. The input is two channels consisting of the real and imaginary parts of the frequency spectrum obtained by the Discrete Fourier Transform (DFT) of the fingerprint image. The following preprocessing is applied to extract frequency features that represent the characteristics of the fingerprint image. The DC component is much larger than the other frequency components and represents the gain inherent to the sensor. In order to reduce the effect of the DC component, we normalize the image so that the average of the pixel values is zero and then apply the DFT. Since the energy of the fingerprint image is concentrated in an ellipse in the low-frequency band, the high-frequency region contains only perturbations such as noise and aliasing. In order to consider only the inherent frequency band of fingerprints, similar to BLPOC [11], only the region containing the elliptical frequency band is extracted as the input. The network architecture of the CNNs for extracting texture and minutia features is shown in Fig. 4. Note that the CNNs that extract texture and minutia features share weights up to the middle layers. The gray-scale fingerprint image is input to the CNN as one channel. In order to extract minutia features using the CNN, we introduce a minutia map [6], which can represent the positions and angles of minutiae. The minutia map is obtained with the minutia map generator shown in Fig. 5, which branches from the minutia feature extractor.
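The preprocessing for the frequency branch can be sketched as follows. Cropping a centered square of half the spectrum size (the 50% setting of Sec. 4.1) is our simple approximation of the elliptical low-frequency band.

```python
import torch

def frequency_input(image, band_ratio=0.5):
    """Sketch of the frequency-branch preprocessing: remove the DC component by
    zero-mean normalization, take the 2-D DFT, keep only the central
    low-frequency band, and stack the real and imaginary parts as a
    2-channel input.  The square crop approximates the elliptical band."""
    image = image - image.mean()                     # suppress the DC component
    spectrum = torch.fft.fftshift(torch.fft.fft2(image))
    h, w = spectrum.shape
    bh, bw = int(h * band_ratio), int(w * band_ratio)
    top, left = (h - bh) // 2, (w - bw) // 2
    band = spectrum[top:top + bh, left:left + bw]    # inherent frequency band only
    return torch.stack([band.real, band.imag])       # (2, bh, bw) input tensor

# usage: x = frequency_input(gray_image)  # gray_image: (H, W) float tensor
```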

Figure 3: Architecture of CNN for extracting frequency-based feature.
Figure 4: Architecture of CNNs for extracting minutiae-based and texture-based features.
Figure 5: Architecture of minutia map generator.

3.3 Minutia Attention Module

Through training, the CNN extracts texture features that are effective for fingerprint recognition from the entire image. On the other hand, Li et al. [16] demonstrated that using texture features in the local areas around the minutiae is more accurate than using features extracted from the entire fingerprint image. Based on this idea, we extract highly discriminative texture features by using the location information of the minutiae for texture feature extraction. In the proposed method, we employ a Minutia Attention Module (MAM), inspired by the Spatially Attentive Output Layer (SAOL) [14], to extract texture features that take into account the local regions around the minutiae. Instead of the Global Average Pooling (GAP) layer in the texture feature extractor, MAM is used with the minutia map as an attention mask to extract features that aggregate texture information around the minutiae. Fig. 6 shows the architecture of MAM used in the proposed method.

Let $\bm{X}^{l}\in\mathbb{R}^{C^{l}\times H^{l}\times W^{l}}$ be the feature map output from the $l$-th layer of the texture feature extractor, where $H^{l}$, $W^{l}$, and $C^{l}$ are the height, width, and number of channels of the feature map output from the $l$-th layer ($1\leq l\leq L$). When using the GAP layer and a Fully-Connected (FC) layer, the texture feature $\bm{t}_{\rm tex}$ is given by

$\bm{t}_{\rm tex}={\rm GAP}(\bm{X}^{L})^{\intercal}\bm{W}^{\rm FC}$, (1)

where ${\rm GAP}(\bm{X}^{L})\in\mathbb{R}^{C^{L}\times 1}$ indicates the feature vector spatially aggregated by GAP, $\bm{W}^{\rm FC}\in\mathbb{R}^{C^{L}\times K}$ indicates the weight matrix of the output FC layer, $(\cdot)^{\intercal}$ indicates transposition, $C^{L}=1{,}024$, and $K=512$. On the other hand, when using MAM instead of GAP, the texture feature with attention, $\bm{t}^{\rm att}_{\rm tex}$, is given by

$\bm{t}^{\rm att}_{\rm tex}={\rm MA}(\bm{X}^{L})^{\intercal}\bm{W}^{\rm FC}$, (2)

where ${\rm MA}(\bm{X}^{L})\in\mathbb{R}^{C^{\prime}\times 1}$ is defined by

${\rm MA}_{c^{\prime}}(\bm{X}^{L})=\sum_{i,j}A_{ij}(Y_{c^{\prime}})_{ij}$, (3)

for each $c^{\prime}$, where $c^{\prime}$ indicates the class index ($1\leq c^{\prime}\leq C^{\prime}$), $\bm{A}\in[0,1]^{H^{L}\times W^{L}}$ indicates the attention mask, $\bm{Y}\in[0,1]^{C^{\prime}\times H^{L}\times W^{L}}$ indicates the spatial logits obtained by applying softmax to $\bm{X}^{L}$, and $(Y_{c^{\prime}})_{ij}$ indicates the $(i,j)$-th element of the $c^{\prime}$-th feature map of $\bm{Y}$. Note that the size of $\bm{W}^{\rm FC}$ in Eq. (2) is changed to $\mathbb{R}^{C^{\prime}\times K}$ since the training is performed as a 1,000-class classification problem, i.e., $C^{\prime}=1{,}000$ in this paper.
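A possible implementation of MAM is sketched below. How the attention mask $\bm{A}$ is derived from the minutia map (a channel-wise maximum followed by a spatial softmax) and the softmax dimension for the spatial logits are our assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MinutiaAttention(nn.Module):
    """Sketch of the minutia attention module (MAM): the minutia map is resized
    to an attention mask A over the final feature map, spatial logits Y are
    obtained with a softmax, and the output aggregates Y weighted by A as in
    Eq. (3)."""

    def forward(self, x, minutia_map):
        # x: (B, C', H, W) final feature map;  minutia_map: (B, 6, Hm, Wm)
        b, c, h, w = x.shape
        mask = minutia_map.amax(dim=1, keepdim=True)            # (B, 1, Hm, Wm)
        mask = F.interpolate(mask, size=(h, w), mode='bilinear',
                             align_corners=False)
        a = mask.flatten(2).softmax(dim=-1).view(b, 1, h, w)    # attention mask A
        y = x.flatten(2).softmax(dim=1).view(b, c, h, w)        # spatial logits Y
        return (a * y).flatten(2).sum(dim=-1)                   # (B, C') aggregated feature
```

The aggregated vector then replaces the GAP output in Eq. (2) and is multiplied by $\bm{W}^{\rm FC}$.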

Figure 6: Architecture of minutiae attention module.

3.4 Loss Function and Deep Metric Learning

This section describes the loss functions to be trained in the proposed method and deep metric learning to improve the accuracy of fingerprint recognition.

We use the MSE loss for estimating the minutia map. Let the ground-truth and estimated minutia maps be $\bm{H}_{g}$ and $\bm{H}_{e}$, respectively. The loss function for minutia map estimation is defined by

$L_{\rm map}=\sum_{i,j,k}\rho\left\{H_{g}(i,j,k)-H_{e}(i,j,k)\right\}^{2}$, (4)

where $(i,j,k)$ indicates the $(x,y)$ coordinates and the channel $k$, and $\rho$ is a constant ($\rho=100$ in this paper). The loss functions for the texture, minutia, and frequency features are defined by the cross-entropy loss and are denoted by $L_{t}$, $L_{m}$, and $L_{f}$, respectively. The loss function of the proposed method, $L_{\rm all}$, is given by

$L_{\rm all}=L_{t}+L_{m}+L_{f}+\lambda_{\rm map}L_{\rm map}$, (5)

where $\lambda_{\rm map}$ is the weight for $L_{\rm map}$. Note that the weight parameters of the STN are also optimized so as to minimize $L_{\rm all}$.
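A sketch of Eqs. (4)-(5) follows. Plain cross-entropy stands in for the AdaCos-based losses of Sec. 3.4, and averaging the per-image sum of Eq. (4) over the batch is our assumption.

```python
import torch
import torch.nn.functional as F

def minutia_map_loss(h_est, h_gt, rho=100.0):
    """Weighted sum-of-squares loss of Eq. (4) over the 6-channel minutia map,
    summed per image and averaged over the batch (batch handling is assumed)."""
    return rho * ((h_gt - h_est) ** 2).sum(dim=(1, 2, 3)).mean()

def total_loss(logits_tex, logits_min, logits_freq, labels,
               h_est, h_gt, lambda_map=10.0):
    """Sketch of the multi-task loss of Eq. (5); in the paper L_t, L_m, and L_f
    are computed through the AdaCos formulation of Eqs. (6)-(7)."""
    l_t = F.cross_entropy(logits_tex, labels)
    l_m = F.cross_entropy(logits_min, labels)
    l_f = F.cross_entropy(logits_freq, labels)
    return l_t + l_m + l_f + lambda_map * minutia_map_loss(h_est, h_gt)
```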

The proposed method employs deep metric learning in training to improve the accuracy of fingerprint recognition. In this paper, we use AdaCos [22], which has been demonstrated to be effective in face recognition. The use of AdaCos makes it possible to increase the cosine similarity of genuine pairs and to decrease that of impostor pairs during training. Let $\bm{W}$ be the class weight matrix consisting of the center feature vector for each class. The probability $P_{i,y_{i}}$ that the $i$-th input image is classified into the class label $y_{i}$ of $\bm{W}$ is given by

$P_{i,y_{i}}=\dfrac{\exp\left(\tilde{s}_{d}^{(t)}\cdot\cos\theta_{i,y_{i}}\right)}{\exp\left(\tilde{s}_{d}^{(t)}\cdot\cos\theta_{i,y_{i}}\right)+\sum_{k\neq y_{i}}\exp\left(\tilde{s}_{d}^{(t)}\cdot\cos\theta_{i,k}\right)}$, (6)

where $\cos\theta_{i,y_{i}}$ indicates the cosine similarity between the $i$-th input feature and the class center of the corresponding label $y_{i}$ in $\bm{W}$, $\tilde{s}_{d}^{(t)}$ indicates the scaling parameter, $k$ indicates a class label of $\bm{W}$ other than $y_{i}$, and $t$ indicates the number of updates. When AdaCos is used, the final loss function $L$ is given by

$L=-\dfrac{1}{N}\sum_{i=1}^{N}\log P_{i,y_{i}}$. (7)

Applying AdaCos to the proposed method, $L_{t}$, $L_{m}$, and $L_{f}$ are calculated by Eq. (7).
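The cosine-logit formulation of Eqs. (6)-(7) can be sketched as follows. For brevity the scale is fixed here, whereas AdaCos [22] adapts $\tilde{s}_{d}^{(t)}$ from the logit statistics at every update, which this sketch omits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """Sketch of the cosine-logit classifier behind Eqs. (6)-(7): features and
    class-center weights W are L2-normalized so the logits are cosine
    similarities, then scaled and fed to cross-entropy."""

    def __init__(self, feat_dim=512, num_classes=1000, scale=30.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale  # fixed here; adaptively updated in AdaCos

    def forward(self, features, labels):
        cosine = F.linear(F.normalize(features), F.normalize(self.weight))  # cos(theta_{i,k})
        return F.cross_entropy(self.scale * cosine, labels)                 # Eq. (7)
```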

3.5 Data Augmentation

In this paper, data augmentation is important since the training is performed only on a public dataset that is not large. The general data augmentation methods of random contrast and random noise are applied. In addition, we introduce random deformation and random morphology as new data augmentation methods specialized for fingerprint images. During the acquisition process, nonlinear deformation may be added to the fingerprint image and the ridge pattern of the fingerprint may be collapsed. In order to represent deformation of fingers, random deformation warps the fingerprint image according to the fingerprint deformation model [7]. In order to represent collapse of ridge patterns, random morphology applies the morphological filters of dilation and erosion to a part of the fingerprint image. Note that the size of the filter for random morphology is $0.02\sim 0.2$% of the image size, following random erasing [23]. Fig. 7 shows examples of fingerprint images with data augmentation.
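A minimal sketch of the random-morphology augmentation is given below. The 3x3 structuring element and the patch-size range are illustrative assumptions; only the idea of applying local dilation or erosion follows the paper.

```python
import numpy as np
import cv2

def random_morphology(image, ratio_range=(0.02, 0.2), rng=np.random):
    """Dilate or erode a randomly chosen patch of a grayscale fingerprint image
    to imitate collapsed or broken ridges (patch size and kernel are assumed)."""
    h, w = image.shape
    ph = max(1, int(h * rng.uniform(*ratio_range)))   # patch height
    pw = max(1, int(w * rng.uniform(*ratio_range)))   # patch width
    y = rng.randint(0, h - ph + 1)
    x = rng.randint(0, w - pw + 1)
    kernel = np.ones((3, 3), np.uint8)
    op = cv2.dilate if rng.rand() < 0.5 else cv2.erode
    out = image.copy()
    out[y:y + ph, x:x + pw] = op(image[y:y + ph, x:x + pw], kernel)
    return out
```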

Figure 7: Example of fingerprint images after data augmentation: (a) original image, (b) random contrast, (c) random noise, (d) random morphology, and (e) random deformation.

3.6 Matching Score

The matching score is calculated between fingerprint features extracted by the proposed method. Let the fingerprint features extracted from two fingerprint images be $\bm{t}_{1}$ and $\bm{t}_{2}$, respectively. The matching score $S$ is calculated by

$S=\bm{t}_{1}^{\intercal}\bm{t}_{2}$. (8)
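In practice, Eq. (8) can be evaluated for all pairs at once; the snippet below assumes unit-norm features, in which case each score is a cosine similarity.

```python
import torch

def matching_scores(gallery, probe):
    """Matching scores of Eq. (8) between every gallery/probe pair.
    gallery: (N, D) and probe: (M, D) feature matrices; returns (N, M) scores."""
    return gallery @ probe.T
```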

4 Experiments and Discussion

This section describes the experiments for evaluating the proposed method and discusses their results.

4.1 Training

The proposed CNN model is trained as follows. We use 12,000 fingerprint images (DB1, DB2, and DB3) out of the 19,200 fingerprint images in MOLF in the experiments. Note that we do not use DB4 and DB5, which consist of latent fingerprint images, since we only use scanned fingerprint images in the experiments. The 12,000 fingerprint images are divided into 9,000 (1,000 fingers $\times$ 3 images $\times$ 3 DBs) for training and 3,000 (1,000 fingers $\times$ 1 image $\times$ 3 DBs) for validation. We perform multi-task learning for identifying 1,000 class labels and detecting minutiae. RMSProp is used as the optimizer and the weight decay for preventing overfitting is set to $10^{-5}$. The learning rates for the feature extractors and the STN are set to $10^{-3}$ and $5^{-4}$, respectively. Fingerprint images are resized to $256\times 256$ pixels as the input. The size of the minutia map is $128\times 128$ pixels with 6 channels. The effective frequency band is set to 50% of the size of the frequency spectrum. The weight for the minutia map loss is set to $\lambda_{\rm map}=10$. The probabilities of random noise, random contrast, random deformation, and random morphology are set to 80%, 80%, 50%, and 50%, respectively. We use the minutia information detected by VeriFinger SDK 10.0 [5] as the ground truth for the training data.

4.2 Experimental Condition

The performance of the proposed method is evaluated using the public fingerprint image dataset FVC2004 [2] DB1 and DB2. DB1 and DB2 each consist of 800 fingerprint images (100 fingers $\times$ 8 images). Fig. 8 shows examples of fingerprint images from genuine pairs in FVC2004 DB1 and DB2. DB1 contains many low-quality fingerprint images that exhibit little overlap in the fingerprint region, nonlinear deformation within the same subject, collapsed ridges due to wetness, or interrupted ridges due to dryness. DB2 contains many collapsed ridges, which makes it difficult to detect minutiae in many fingerprint images. We evaluate the matching scores of 2,800 genuine pairs and 368,500 impostor pairs, which are all the combinations of the dataset. The verification accuracy is evaluated by the Equal Error Rate (EER), which is defined as the error rate at which the False Acceptance Rate (FAR) and the False Rejection Rate (FRR) are equal.
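The EER can be computed from the genuine and impostor score distributions as sketched below; the threshold sweep over all observed scores is a simple approximation of the FAR/FRR crossing point.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Sweep the decision threshold over all observed scores and return the
    point where FRR (genuine pairs rejected) and FAR (impostor pairs accepted)
    are closest."""
    thresholds = np.unique(np.concatenate([genuine_scores, impostor_scores]))
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])
    far = np.array([(impostor_scores >= t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0
```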

Figure 8: Example of fingerprint images from the genuine pairs in FVC2004 DB1 and DB2.
Table 1: Summary of EERs [%] for FVC2004 DB1 and DB2.
Method          DB1   DB2
VeriFinger [5]  1.63  1.29
Engelsma [8]    1.75  2.44
Proposed (A)    1.84  1.84
Proposed (B)    1.69  1.73
Proposed (C)    1.58  1.10
Proposed (D)    1.41  1.36

4.3 Ablation Study

The effectiveness of the refinement techniques introduced in the proposed method, i.e., the frequency feature, deep metric learning, data augmentation, and MAM, is evaluated by the following ablation study using FVC2004 DB1 and DB2. In this experiment, we compare the verification accuracy of the proposed method with VeriFinger SDK 10.0 [5] and the deep learning-based method [8] to demonstrate the effectiveness of the proposed method. Note that we replaced the Inception modules used in [8] with the ResNet modules used in the proposed method, since the number of classes in the training data used in this experiment is quite small compared with the original implementation of [8], whose private training data includes 382,914 classes. Table 1 shows the summary of the experimental results. In FVC2004 DB1, the EERs of the proposed method are improved by adding the refinement techniques. The EER of the proposed method (D), which employs all the refinement techniques, is lower than that of VeriFinger and Engelsma et al. [8]. In DB2, the accuracy of the proposed method is also improved by adding the refinement techniques. The proposed method (C) is more accurate than VeriFinger and [8]. On the other hand, the proposed method (D) with MAM had more errors than the proposed method (C) without MAM. This is because the fingerprint images in FVC2004 DB2 have many areas with collapsed ridges and blurred minutiae, as shown in Fig. 8, and therefore proper attention may not be applied to them.

4.4 Neonate Fingerprint Recognition

We evaluate the accuracy of the proposed method on neonate fingerprint recognition, an example of fingerprint images from which it is extremely difficult to extract minutiae. In this experiment, we use the neonate fingerprint image dataset used in [15]. This dataset contains neonate fingerprint images taken with a 1,920-ppi sensor at 2, 6, and 18 hours of age. The numbers of genuine and impostor pairs are 664 and 9,632, respectively. The EERs of VeriFinger, the proposed method (C), and the proposed method (D) are 42.6%, 34.9%, and 38.4%, respectively. In VeriFinger, wrinkles and pores are extracted as minutiae, as shown in Fig. 9. The EER of VeriFinger is high due to the use of many wrong minutiae for fingerprint matching. On the other hand, the proposed method is more accurate than VeriFinger SDK in neonate fingerprint recognition since it uses texture and frequency features in addition to the minutia feature for matching. Minutiae extracted from neonate fingerprint images using the proposed method are shown in Fig. 9. This figure shows that the proposed method was able to extract relatively correct minutiae; however, the accuracy of the proposed method (D) with MAM was reduced due to the small number of minutiae.

Figure 9: Minutiae extracted from neonate fingerprint images at 2 and 18 hours of age using VeriFinger SDK and the proposed method (D).

4.5 Qualitative Analysis

We present a qualitative evaluation of minutia extraction using the proposed method. Fig. 10 shows the minutiae extracted by MINDTCT [3], MinutiaeNet [18], VeriFinger, and the proposed method (D) for fingerprint images with VeriFinger quality values of 96 and 45. For the fingerprint image with a quality value of 96, the same minutiae were extracted by all the methods. For the fingerprint image with a quality value of 45, MINDTCT produces many false positives and MinutiaeNet misses many minutiae, while the proposed method (D) is able to extract the majority of the minutiae extracted by VeriFinger.

The effectiveness of the proposed method is also verified using saliency maps, which visualize the effect of each pixel of the input image on the extracted features. We use guided backpropagation [20] to generate saliency maps from the gradients of the CNNs. Fig. 11 shows the saliency maps of the texture and minutia features. While the CNN for extracting the minutia feature focuses on the entire fingerprint image, the CNN for extracting the texture feature focuses on the core region. The CNN with MAM focuses not only on the core region but also on the regions around minutiae. Fig. 12 shows the saliency maps of the frequency feature. The CNN for extracting the frequency feature focuses on the frequency band inherent in the fingerprint pattern, and we confirmed that frequency features specific to each individual are extracted.
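The saliency maps can be reproduced roughly as follows. The scalar that is backpropagated (the L2 norm of the extracted feature) is our choice for illustration, not necessarily the one used in the paper.

```python
import torch
import torch.nn as nn

def guided_backprop_saliency(model, image):
    """Sketch of a guided-backpropagation saliency map [20]: ReLU backward
    passes are modified so that only positive gradients flow, then the gradient
    of the feature norm w.r.t. the input image is visualized."""
    handles = []
    for m in model.modules():
        if isinstance(m, nn.ReLU):
            handles.append(m.register_full_backward_hook(
                lambda mod, gin, gout: (torch.clamp(gin[0], min=0.0),)))
    x = image.clone().requires_grad_(True)
    model(x).norm().backward()                 # backprop a scalar summary of the feature
    for h in handles:
        h.remove()
    return x.grad.abs().amax(dim=1)            # (B, H, W) saliency map
```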

Figure 10: Example of minutia extraction from fingerprint images of FVC2004 DB1 (Upper: Quality 96, Lower: Quality 45).
Figure 11: Saliency maps of texture and minutia features for images in FVC2004 DB1: (a) the proposed method and (b) the proposed method (D).
Figure 12: Saliency maps of frequency features for images in FVC2004 DB1.

5 Conclusion

We proposed a novel CNN architecture to extract features from fingerprint images by combining texture, minutiae, and the frequency spectrum. We also proposed a minutia attention module that attends to the positions of the minutiae, and introduced novel data augmentation methods specific to fingerprint images for efficient training with a small number of training classes. Through a set of experiments using FVC2004 DB1 and DB2, we demonstrated that the matching accuracy is higher than that of VeriFinger and the conventional method. In future work, we will investigate the use of local frequency features and of relationships between local regions to recognize low-quality fingerprint images from which only a few minutiae are extracted.

References

  • [1] FastEnhanceTexture. https://github.com/luannd/MinutiaeNet/blob/master/CoarseNet/MinutiaeNet_utils.py.
  • [2] FVC2004. http://bias.csr.unibo.it/fvc2004/.
  • [3] NIST Biometric Image Software (NBIS). https://www.nist.gov/services-resources/software/nist-biometric-image-software-nbis.
  • [4] NIST Special Database 27. https://www.nist.gov/itl/iad/image-group/nist-special-database-2727a.
  • [5] VeriFinger SDK. https://www.neurotechnology.com/verifinger.html.
  • [6] K. Cao, D.-L. Nguyen, C. Tymoszek, and A. K. Jain. End-to-end latent fingerprint search. IEEE Trans. Information Forensics and Security, 15:880–894, July 2019.
  • [7] R. Cappelli, D. Maio, and D. Maltoni. Modelling plastic distortion in fingerprint images. Proc. Int’l Conf. Advances in Pattern Recognition (LNCS 2013), pages 371–378, Mar. 2001.
  • [8] J. J. Engelsma, K. Cao, and A. K. Jain. Learning a fixed-length fingerprint representation. IEEE Trans. Pattern Analysis and Machine Intelligence (Early access), pages 1–16, Dec. 2019.
  • [9] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. Proc. IEEE Conf. Computer Vision and Pattern Recognition, pages 770–778, June 2016.
  • [10] K. Ito, A. Morita, T. Aoki, H. Nakajima, K. Kobayashi, and T. Higuchi. Score-level fusion of phase-based and feature-based fingerprint matching algorithms. IEICE Trans. Fundamentals, E93-A(3):607–616, Mar. 2010.
  • [11] K. Ito, H. Nakajima, K. Kobayashi, T. Aoki, and T. Higuchi. A fingerprint matching algorithm using phase-only correlation. IEICE Trans. Fundamentals, E87-A(3):682–691, Mar. 2004.
  • [12] M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu. Spatial transformer networks. Proc. Annual Conf. Neural Information Processing Systems, pages 2017–2025, Dec. 2015.
  • [13] A. K. Jain, P. Flynn, and A. A. Ross. Handbook of Biometrics. Springer, 2008.
  • [14] I. Kim, W. Beak, and S. Kim. Spatially attentive output layer for image classification. Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, pages 9533–9542, June 2020.
  • [15] Y. Koda, A. Takahashi, K. Ito, T. Aoki, S. Kaneko, and S. M. Nzou. Development of 2,400ppi fingerprint sensor for capturing neonate fingerprint within 24 hours after birth. Int’l Conf. Biometrics Special Interest Group (Lecture Notes in Informatics 296), pages 95–106, Sept. 2019.
  • [16] R. Li, D. Song, Y. Liu, and J. Feng. Learning global fingerprint features by training a fully convolutional network with local patches. Proc. Int’l Conf. Biometrics, pages 1–8, June 2019.
  • [17] D. Maltoni, D. Maio, A. K. Jain, and S. Prabhakar. Handbook of Fingerprint Recognition. Springer, 2003.
  • [18] D.-L. Nguyen, K. Cao, and A. K. Jain. Robust minutiae extractor: Integrating deep networks and fingerprint domain knowledge. Proc. Int’l Conf. Biometrics, pages 9–16, Feb. 2018.
  • [19] A. Sankaran, M. Vatsa, and R. Singh. Multisensor optical and latent fingerprint database. IEEE Access, 3:653–665, Apr. 2015.
  • [20] J.-T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller. Striving for simplicity: The all convolutional net. Proc. Int’l Conf. Learning Representations (Workshop track), pages 1–14, May 2015.
  • [21] Y. Tang, F. Gao, and Y. Liu. FingerNet: An unified deep network for fingerprint minutiae extraction. Proc. Int’l Joint Conf. Biometrics, pages 108–116, Oct. 2017.
  • [22] X. Zhang, R. Zhao, Y. Qiao, X. Wang, and H. Li. AdaCos: Adaptively scaling cosine logits for effectively learning deep face representations. Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, pages 10823–10832, June 2019.
  • [23] Z. Zhong, L. Zheng, G. Kang, S. Li, and Y. Yang. Random erasing data augmentation. CoRR, abs/1708.04896:1–10, 2017.