
Face.evoLVe: A High-Performance Face Recognition Library

Qingzhong Wang ([email protected]), Baidu Research, Beijing, China; Pengfei Zhang ([email protected]), National University of Defense Technology, Changsha, China; Haoyi Xiong ([email protected]), Baidu Research, Beijing, China; and Jian Zhao ([email protected]), Institute of North Electronic Equipment, Beijing, China
Abstract.

Face recognition has drawn much attention, and a large number of algorithms and models have been proposed, with applications to daily life such as authentication for mobile payments. Recently, deep learning methods have dominated the field of face recognition, showing advantages over conventional approaches and even human perception. Despite the popular adoption of deep learning-based methods in the field, researchers and engineers frequently need to reproduce existing algorithms with unified implementations (i.e., the identical deep learning framework with standard implementations of operators and trainers) and compare the performance of face recognition methods under fair settings (i.e., the same set of evaluation metrics and preparation of datasets, with tricks on/off), so as to ensure the reproducibility of experiments.

To this end, we develop face.evoLVe — a comprehensive library that collects and implements a wide range of popular deep learning-based methods for face recognition. First, face.evoLVe is composed of key components that cover the full process of face analytics, including face alignment, data processing, various backbones, losses, and alternatives with bags of tricks for improving performance. Second, face.evoLVe supports multi-GPU training on top of different deep learning platforms, such as PyTorch and PaddlePaddle, which enables researchers to work on both large-scale datasets with millions of images and low-shot counterparts with limited well-annotated data. More importantly, along with face.evoLVe, images before and after alignment in the common benchmark datasets are released, with source codes and trained models provided. All these efforts lower the technical burden of reproducing existing methods for comparison, so that users of our library can focus on developing advanced approaches more efficiently. Last but not least, face.evoLVe is well designed and vibrantly evolving, so that new face recognition approaches can be easily plugged into our framework. Note that we have used face.evoLVe to participate in a number of face recognition competitions and secured the first place. The version that supports PyTorch (Paszke et al., 2019) is publicly available at https://github.com/ZhaoJ9014/face.evoLVe.PyTorch and the PaddlePaddle (Ma et al., 2019) version is available at https://github.com/ZhaoJ9014/face.evoLVe.PyTorch/tree/master/paddle. Face.evoLVe has been widely used for face analytics, receiving 2.4K stars and 622 forks.

Face Analytics, Toolbox, Library, PyTorch, PaddlePaddle
Figure 1. Face analytics from a scene of “The Big Bang Theory” TV series.

1. Introduction

As a critical research problem in multimedia computing (Zhu et al., 2011), face recognition has drawn much attention from both academia and industry, with intensive applications to daily life, e.g., authentication for mobile payment (Du, 2018) and tagging friends in photos/videos (e.g., Fig. 1). A face recognition system normally takes an image or a video as input and outputs the identities of the faces it contains. Recently, deep learning-based approaches have dominated the field of face recognition, showing incredible superiority to conventional face recognition methods, such as EigenFace (Gupta et al., 2010; Rizon et al., 2006; Sahoolizadeh and Ghassabeh, 2008) and subspace-based methods (Aishwarya and Marcus, 2010). Blessed by their over-parameterized nature, deep learning-based approaches enjoy unique advantages, including (1) more powerful feature extraction for face representation, (2) an integrated representation and discriminative learning process in an end-to-end manner, and (3) a higher capacity of memorization and generalization with large datasets (Zhang et al., 2021).

While deep face recognition approaches have been reported to outperform human perception (Sun et al., 2014, 2015), they raise concerns about reproducibility, i.e., it might be difficult to achieve the same performance when algorithms and models are re-implemented from the details released in the papers. Although some researchers release code, it is still inconvenient to reproduce the experiments for fair comparisons, as the released models have often been “cooked with recipes”, e.g., tricks in architectures, losses, data processing, training and evaluation. Hence, developing a comprehensive face recognition library, with all alternative backbones, loss functions and bags of tricks incorporated, is vital for both researchers and engineers. To this end, we develop a relatively comprehensive deep face recognition library named face.evoLVe to meet the goals above. In this paper, we present the features of the developed face.evoLVe library in detail. In summary, our main contributions are threefold:

  (1)

    We develop a comprehensive library, namely face.evoLVe, for face-related analytics and applications, covering face alignment (e.g., detection, landmark localization, affine transformation, etc.) and data processing (e.g., augmentation, data balancing, normalization, etc.), where various backbones (e.g., ResNet (He et al., 2016), IR, IR-SE (Hu et al., 2018), ResNeXt (Xie et al., 2017), SE-ResNeXt, DenseNet (Huang et al., 2017), LightCNN (Wu et al., 2018a), MobileNet (Howard et al., 2017), ShuffleNet (Zhang et al., 2018), DPN (Chen et al., 2017a), etc.), alternative losses (e.g., Softmax, Focal (Lin et al., 2017), Center (Wen et al., 2016), SphereFace (Liu et al., 2017), CosFace (Wang et al., 2018b), AmSoftmax (Wang et al., 2018a), ArcFace (Deng et al., 2019), Triplet (Schroff et al., 2015), etc.) and bags of tricks (e.g., training refinements, model tweaks, knowledge distillation (Hinton et al., 2015), etc.) for improving performance are provided in standard implementations.

  (2)

    Face.evoLVe supports multiple popular deep learning platforms, including both PaddlePaddle (Ma et al., 2019) and PyTorch (Paszke et al., 2019). On top of the native platform, i.e., PyTorch, face.evoLVe provides the necessary facilities to support parallel training with multiple GPUs, where users can enjoy the computation power of massive GPUs with a few lines of code/configuration. Note that the parallel training scheme in face.evoLVe not only supports the training of backbones (Paszke et al., 2019), but also accelerates the training of fully-connected (softmax) layers to fully scale up parallel training over multiple GPUs and large datasets on distributed storage.

  (3)

    Face.evoLVe helps researchers/engineers quickly develop high-performance deep face recognition models and algorithms for practical use and deployment. Specifically, all data before and after alignment, source codes and trained models are provided, which reduces the effort required to reproduce existing methods, facilitates the development of new advanced approaches, and provides training and evaluation environments for fair comparisons. We have used face.evoLVe to participate in a number of face recognition competitions and secured the first place. In addition, the library is well designed and evolving vibrantly with a group of active contributors. New face recognition approaches can be easily plugged into the face.evoLVe framework.

2. Related Work

2.1. A Brief Review of Face Recognition

Normally, a face recognition system consists of face detection, facial landmark localization, face alignment, feature extraction and matching (Wang and Deng, 2018; Jian, 2018). Each part of the system can be an individual research area, and recently a wide range of approaches have been proposed, not only for the feature extraction module to obtain better representations of faces, but also for other modules, such as loss functions. DeepFace (Taigman et al., 2014) employs deep neural networks (DNNs), such as AlexNet (Krizhevsky et al., 2012), to extract face features, which is much more powerful than using EigenFace (Gupta et al., 2010; Rizon et al., 2006; Sahoolizadeh and Ghassabeh, 2008). A marginalized CNN is proposed by Zhao et al. (Zhao et al., 2017a) to achieve more robust face representations. In terms of the loss function, DeepFace (Taigman et al., 2014) adopts softmax, which is widely used for classification (Krizhevsky et al., 2012; Simonyan and Zisserman, 2014; He et al., 2016). In contrast, Sun et al. employ a contrastive loss (Sun et al., 2015). However, neither the softmax loss nor the contrastive loss is sufficient to learn discriminative features for face recognition. Also, AlexNet cannot obtain satisfactory representations of faces; hence, the triplet loss is applied in (Schroff et al., 2015; Liu et al., 2015; Parkhi et al., 2015) to learn more discriminative features, and GoogleNet (Szegedy et al., 2015) and VGGNet (Simonyan and Zisserman, 2014) are adopted to learn better representations. The problem with the triplet loss is that the training process is not stable. To mitigate this problem, Wen et al. (Wen et al., 2016) propose a center loss, which learns a center for each class and penalizes the distances between the deep features and their corresponding class centers. More recently, angular loss functions (Wang et al., 2018b; Deng et al., 2019) and ResNet (He et al., 2016) dominate face recognition. Researchers have realized that, to accurately classify queries, a face recognition model should strictly separate faces in the feature space. A number of approaches based on angular distance have been proposed to achieve this goal, such as CosFace (Wang et al., 2018b), ArcFace (Deng et al., 2019), RegularFace (Zhao et al., 2019b), AdaptiveFace (Liu et al., 2019b) and AdaCos (Zhang et al., 2019).

Another challenging direction is age-invariant face recognition (Zheng et al., 2017; Park et al., 2010; Zhao et al., 2020), i.e., learning representations of faces that are robust to appearance changes caused by facial aging. Since it is difficult to obtain sufficient well-annotated facial images across different age ranges, Zhao et al. (Zhao et al., 2019a) propose an age-invariant model that learns disentangled representations while synthesizing photorealistic cross-age faces. Pose variations are also challenging for robust face recognition in the wild; 3D vision techniques have been used to estimate the pose and aid face recognition (Zhao et al., 2018) under such settings.

The face recognition library developed in this paper covers most state-of-the-art loss functions and backbones to achieve discriminative face representations for high-performance face recognition, and it remains flexible for users to design their own loss functions and backbones.

2.2. Face Recognition Toolboxes

Library | # backbones | # heads | Platforms
InsightFace (Guo et al., 2019) | - | 4 | PyTorch (Paszke et al., 2019), MXNet (Chen et al., 2015)
FaceX-Zoo (Wang et al., 2021) | 8 | 8 | PyTorch (Paszke et al., 2019)
Face.evoLVe | 13 | 16 | PyTorch (Paszke et al., 2019), PaddlePaddle (Ma et al., 2019)
Table 1. Comparison between face.evoLVe and other libraries. InsightFace (Guo et al., 2019) is the official implementation of Arcface (Deng et al., 2019).

Many widely used face recognition approaches have released their source code; however, the platforms differ and the data processing varies, which is inconvenient for the face recognition community and results in unfair comparisons. Using the same pipeline is therefore more convenient and flexible. Guo et al. (Guo et al., 2019) develop a toolbox named InsightFace for 2D and 3D face analysis, which supports two platforms: PyTorch (Paszke et al., 2019) and MXNet (Chen et al., 2015). However, InsightFace only implements a few popular deep face recognition models, such as ArcFace (Deng et al., 2019) and Sub-center ArcFace (Deng et al., 2020). FaceX-Zoo (Wang et al., 2021) is a relatively comprehensive face recognition library; however, it only supports the PyTorch platform (Paszke et al., 2019), which is inconvenient for users of other platforms, such as TensorFlow (Abadi et al., 2016) and PaddlePaddle (Ma et al., 2019). By contrast, the proposed face.evoLVe library is highly flexible and scalable: it implements the complete face recognition pipeline and most face recognition models, and supports both the PyTorch and PaddlePaddle platforms. In addition, distributed training is well supported in face.evoLVe and can be easily applied to large-scale datasets composed of millions of images. Tab. 1 compares the different libraries; face.evoLVe is more comprehensive and supports more platforms. Another feature of face.evoLVe is that few-shot learning is supported.

3. Face.evoLVe Library

Figure 2. An overview of the face.evoLVe pipeline. The input data can be images or videos. Face.evoLVe first detects faces in the images or videos and then localizes the facial landmarks for alignment. The backbone network takes the processed data as input and outputs the representations of faces. Finally, the head blocks are used to compute the loss. Note that face.evoLVe supports distributed training to accelerate the training process.

In this section, we present the developed face.evoLVe library.

3.1. Pipeline

The face.evoLVe library is designed to be convenient and flexible. We split the face recognition pipeline into four modules: face detection & alignment, data processing, feature extraction and the loss head. Fig. 2 illustrates the pipeline. Face.evoLVe first detects faces in images and localizes the facial landmarks for alignment, where MTCNN (Zhang et al., 2016) is adopted. The aligned faces are fed into the backbone networks to extract features; the library implements a large number of backbones. Finally, the face representations are used to compute the loss in the head block, where various loss functions are implemented. More details can be found in the following sections, and a conceptual sketch of the pipeline follows.
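
To show how the four modules compose at inference time, the sketch below wires user-supplied detector, aligner and backbone callables together. The function names and signatures are illustrative placeholders, not face.evoLVe's actual API.

```python
import torch
import torch.nn.functional as F

def recognize(image, detector, aligner, backbone, gallery):
    """Illustrative four-stage flow; detector/aligner/backbone are
    user-supplied callables, not face.evoLVe's actual interface."""
    boxes, landmarks = detector(image)              # 1) detect faces + landmarks
    faces = torch.stack([aligner(image, lm)         # 2) affine-align each face crop
                         for lm in landmarks])
    with torch.no_grad():
        emb = F.normalize(backbone(faces), dim=1)   # 3) extract & L2-normalize features
    return emb @ gallery.t()                        # 4) cosine match against a gallery
```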

3.2. Data Processing

Given a dataset composed of N images from m classes, to balance the class distribution, the face.evoLVe library first removes low-shot classes that occur fewer than num_min times in the training dataset. It is easy for users to define their own num_min for a specific dataset; a minimal sketch of this filtering step is given below.
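
In this sketch the function name, the (image_path, class_id) sample format and the signature are illustrative assumptions rather than the library's exact interface.

```python
from collections import Counter

def remove_low_shot(samples, num_min):
    """Drop classes with fewer than num_min images.
    `samples` is a list of (image_path, class_id) pairs."""
    counts = Counter(label for _, label in samples)
    kept = [(path, label) for path, label in samples
            if counts[label] >= num_min]
    # Re-index the surviving classes to a contiguous 0..m'-1 range.
    remap = {old: new for new, old in enumerate(sorted({l for _, l in kept}))}
    return [(path, remap[label]) for path, label in kept]
```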

Apart from removing low-shot classes, the face.evoLVe library also provides data augmentation approaches, e.g., horizontal flipping, scaling hue/saturation/brightness with coefficients uniformly drawn from [0.6, 1.4], and adding PCA noise with a coefficient sampled from a normal distribution N(0, 0.1). Moreover, to alleviate the problem of long-tailed distributions, we also implement weighted random sampling, which is user-friendly, i.e., users can define a specific sampler instead of using the default weighted sampling implementation during training, as sketched below.
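
The following sketch shows how such augmentation and inverse-frequency weighted sampling can be set up with standard PyTorch/torchvision facilities. The dataset path is hypothetical, torchvision's ColorJitter only approximates the hue scaling described above, and the PCA lighting noise is omitted for brevity.

```python
import torch
from collections import Counter
from torch.utils.data import WeightedRandomSampler, DataLoader
from torchvision import transforms
from torchvision.datasets import ImageFolder

# Augmentations roughly matching the ones described above.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=(0.6, 1.4),
                           saturation=(0.6, 1.4),
                           hue=0.4),  # additive hue jitter; an approximation
    transforms.ToTensor(),
])

# "data/train_aligned" is a hypothetical path to aligned face crops.
dataset = ImageFolder("data/train_aligned", transform=augment)

# Weighted random sampling: each sample's weight is the inverse of its
# class frequency, so tail classes are drawn more often.
labels = [label for _, label in dataset.samples]
freq = Counter(labels)
weights = torch.tensor([1.0 / freq[l] for l in labels], dtype=torch.double)
sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
loader = DataLoader(dataset, batch_size=128, sampler=sampler)
```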

3.3. Implemented Models

One advantage of the face.evoLVe library is that it contains many existing deep face recognition models. Generally, a model can be separated into two parts: (1) the backbone, which extracts face features, and (2) the loss function, which trains the model to learn better representations of faces. The face.evoLVe library is designed in a highly modular manner, i.e., backbones and loss heads are implemented separately.

3.3.1. Backbone

We implement many popular backbones in the face.evoLVe library, and users can easily change the configuration to specify the backbone architecture. In addition, the highly modular design allows users to conveniently plug their own backbone networks into the library.

The implemented backbones in face.evoLVe library are as follows:

  • ResNet (He et al., 2016) – a deep convolutional neural network with residual connections, which has various versions, e.g., ResNet18, ResNet34, ResNet50, ResNet101 and ResNet152. ResNet has been widely applied to many vision tasks, such as classification (He et al., 2016), segmentation (He et al., 2017) and object detection (Ren et al., 2015), achieving state-of-the-art performance.

  • IR – an improved version of ResNet. Similar to ResNet, IR also has various versions with different numbers of layers.

  • IR-SE – applies squeeze-and-excitation (SE) blocks (Hu et al., 2018) to improve IR (see the sketch after this list).

  • ResNeXt (Xie et al., 2017) – similar to ResNet, but splits channels into different paths, which achieves better performance on most vision tasks.

  • SE-ResNeXt – uses squeeze-and-excitation (SE) blocks (Hu et al., 2018) to improve ResNeXt.

  • DenseNet (Huang et al., 2017) – a deep convolutional neural network (CNN) with dense connections, i.e., a deep layer takes the outputs of all previous shallow layers as input, which costs more computation time.

  • LightCNN (Wu et al., 2018a) – adopts a Max-Feature-Map (MFM) operation for face recognition with noisy labels. It has three variants, LightCNN9, LightCNN29 and LightCNN29-V2, which balance performance and computational complexity.

  • MobileNet (Howard et al., 2017) – a CNN with fewer parameters that achieves performance competitive with VGG16 (Simonyan and Zisserman, 2014), a model 30 times larger.

  • ShuffleNet (Zhang et al., 2018) – a very small CNN that uses pointwise group convolution, channel shuffle and depthwise separable convolution.

  • DPN (Chen et al., 2017a) – a simple, highly efficient and modularized network. A shallow DPN surpasses the best ResNeXt101 (Xie et al., 2017) with smaller model size, less computational cost and lower memory consumption.

  • AttentionNet (Yoo et al., 2015) – introduces an attention mechanism (Bahdanau et al., 2014; Vaswani et al., 2017) into ResNet (He et al., 2016) to learn attention-aware features.

  • EfficientNet (Tan and Le, 2019) – a series of models that consider width scaling, depth scaling, resolution scaling and compound scaling, resulting in smaller model sizes with performance comparable to their counterparts.

  • GhostNet (Han et al., 2020) – a model that uses cheap operators to generate more features.
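
As referenced in the IR-SE entry above, the squeeze-and-excitation block shared by IR-SE and SE-ResNeXt can be sketched as follows. This is an illustrative re-implementation of the published design (Hu et al., 2018), not the library's exact code.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: learn per-channel gates from global context."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                      # squeeze: global average pooling
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                 # per-channel gates in (0, 1)
        )

    def forward(self, x):
        return x * self.fc(x)                             # excite: rescale each channel
```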

3.3.2. Head and Loss Function

A large number of heads and loss functions are implemented in the face.evoLVe library, covering most published works.

  • Softmax – a simple and naive loss function for face recognition, which is popular in the earlier works.

  • Focal (Lin et al., 2017) – alleviates the problem of class imbalance; it was first applied to object detection.

  • Center (Wen et al., 2016) – introduces an intra-class distance term into softmax, which forces samples with the same class label to be closer to each other.

  • SphereFace (Liu et al., 2017) – an angular softmax function. Compared to the original softmax, the SphereFace loss is more strict, hence the learned features are more separable.

  • CosFace (Wang et al., 2018b) – applies a cosine function and a large margin to the original softmax, so that samples are separated in the angular space.

  • AmSoftmax (Wang et al., 2018a) – introduces a margin into the angular softmax function to force the face recognition model to learn separable representations of faces.

  • ArcFace (Deng et al., 2019) – directly adds a margin to the angle, yielding an additive angular margin loss (see the sketch after this list).

  • Triplet (Schroff et al., 2015) – considers the relative difference of distances within a triplet composed of an anchor image, a positive image and a negative image; it requires triplet construction, and selecting informative negative images can be tricky.

  • AdaCos (Zhang et al., 2019) – introduces a dynamic scaling mechanism to adjust the margin and scale factor.

  • AdaMSoftmax (Liu et al., 2019b) – adopts learnable margins and encourages the average margin over all classes to be large.

  • ArcNegFace (Liu et al., 2019a) – takes the distance between anchors into consideration for hard negative mining and weakens the influence of labeling errors.

  • CircleLoss (Sun et al., 2020) – a general loss function from which other loss functions, such as the triplet loss and AmSoftmax, can be deduced.

  • CurricularFace (Huang et al., 2020) – applies curriculum learning to face recognition and dynamically ranks the samples based on the hardness.

  • MagFace (Meng et al., 2021) – learns a universal feature embedding whose magnitude can measure the quality of the given face.

  • NPCFace (Zeng et al., 2020) – exploits both hard negatives and hard positives to learn better representations.

  • MVSoftmax (Wang et al., 2020b) – adaptively emphasizes the mis-classified feature vectors to guide the discriminative feature learning.

Note that the focal loss (Lin et al., 2017) can be applied on top of all softmax-like functions, such as softmax, ArcFace and CosFace, to mitigate the problem of class imbalance.
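
To make the margin-based heads concrete, the following sketch implements the core ArcFace computation, logit_y = s·cos(θ_y + m), as described in (Deng et al., 2019). Details such as easy-margin handling and weight initialization may differ from the library's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """Additive angular margin head: a minimal sketch of the idea."""
    def __init__(self, feat_dim, num_classes, s=64.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.s, self.m = s, m

    def forward(self, features, labels):
        # Cosine similarity between L2-normalized features and class weights.
        cosine = F.linear(F.normalize(features), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the angular margin m only to the target-class angle.
        one_hot = F.one_hot(labels, cosine.size(1)).bool()
        logits = torch.where(one_hot, torch.cos(theta + self.m), cosine)
        return self.s * logits   # feed into softmax or focal cross-entropy
```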

3.4. Bag of Tricks

Besides implementing existing works, such as backbones and loss functions, we also implement a bag of tricks for face recognition, which can be helpful for both researchers and engineers.

Learning rate adjustment – at the beginning of training, all parameters are typically random values and therefore far from optimal. Using a large learning rate usually results in an unstable training process. One solution is warmup (Gotmare et al., 2018): in the warmup stage, we use a small learning rate at the beginning and then switch to the initial learning rate once the training process is stable (Goyal et al., 2017). After warmup, we apply learning rate decay, gradually decreasing the learning rate from the initial value. The cosine annealing strategy (Loshchilov and Hutter, 2016) is also implemented in the library; a simplified version reduces the learning rate from the initial value to 0 by following the cosine function. Cosine decay reduces the learning rate slowly at the beginning, becomes almost linear in the middle, and slows down again at the end. Compared to step decay, cosine decay is much smoother, and the learning rate in the middle of training is larger than with step decay, resulting in faster convergence and potentially better final performance.
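
A minimal sketch of linear warmup followed by cosine decay, built on PyTorch's LambdaLR, is given below. The per-step granularity and parameter names are illustrative choices (the library may instead use the built-in CosineAnnealingLR).

```python
import math
from torch.optim.lr_scheduler import LambdaLR

def warmup_cosine(optimizer, warmup_steps, total_steps):
    """Linear warmup to the initial LR, then cosine decay to 0."""
    def lr_lambda(step):
        if step < warmup_steps:
            return (step + 1) / warmup_steps                 # linear warmup
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))    # cosine decay
    return LambdaLR(optimizer, lr_lambda)
```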

Label smoothing – label smoothing was first proposed to train Inception-v2 (Szegedy et al., 2016); it changes the labels from one-hot vectors to smoothed distributions. We empirically compare the outputs of two ResNet50 models trained with and without label smoothing, respectively, and calculate the gap between the maximum output value and the average of the remaining values. With ε = 0.1 and K = 1000 classes, the theoretical gap is around 9.1; with label smoothing, the center of the output-gap distribution is close to the theoretical value and there are fewer extreme values.
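
Concretely, with smoothing parameter ε and K classes, the true class receives probability 1 − ε + ε/K and every other class receives ε/K; a minimal sketch of the resulting loss follows.

```python
import torch
import torch.nn.functional as F

def smoothed_cross_entropy(logits, labels, eps=0.1):
    """Cross-entropy against smoothed targets: the true class gets
    1 - eps + eps/K, every other class gets eps/K (K = #classes)."""
    K = logits.size(1)
    log_probs = F.log_softmax(logits, dim=1)
    target = torch.full_like(log_probs, eps / K)
    target.scatter_(1, labels.unsqueeze(1), 1.0 - eps + eps / K)
    return -(target * log_probs).sum(dim=1).mean()
```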

Model tweak – a model tweak is a minor adjustment to the network architecture, such as changing the stride of a particular convolution layer. Model tweaks depend heavily on the experience and knowledge of researchers. Generally, a tweak barely changes the computational complexity but may have a non-negligible effect on model accuracy. Many works tweak ResNet (Zhao et al., 2017b; Goyal et al., 2017), e.g., by changing its downsampling block. Some tweaks (Hu et al., 2018; Ma et al., 2018; Chen et al., 2017b) replace the 7×7 convolutional kernel in the input stem with a stack of three conventional 3×3 convolutional kernels.
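
For instance, the stem tweak mentioned last can be sketched as below: three stacked 3×3 convolutions replacing the single 7×7 stride-2 convolution. The channel widths shown are common choices, not necessarily the library's.

```python
import torch.nn as nn

def deep_stem(in_ch=3, mid_ch=32, out_ch=64):
    """Replace the 7x7 stride-2 stem with three stacked 3x3 convolutions
    (similar receptive field, cheaper). An illustrative snippet."""
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, 3, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, mid_ch, 3, stride=1, padding=1, bias=False),
        nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, out_ch, 3, stride=1, padding=1, bias=False),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )
```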

Knowledge distillation – existing models suffer from noisy identity labels, such as outliers and label flips, so automatically reducing label noise is beneficial for improving recognition accuracy. Self-training is a standard approach in semi-supervised learning and has been shown to significantly boost image classification performance (Wu et al., 2018b; Tai et al., 2021). Based on our estimation, there are more than 30% and 50% noisy labels in MegaFace2 (Kemelmacher-Shlizerman et al., 2016; Nech and Kemelmacher-Shlizerman, 2017) and MS-Celeb-1M (Guo et al., 2016), respectively. Since these datasets are relatively large, to reduce label noise and learn better representations of faces we can apply self-training and knowledge distillation (Hinton et al., 2015; Kim et al., 2020), i.e., learning the knowledge of a previously trained model. In Fig. 3, we use t-SNE (Van der Maaten and Hinton, 2008) to visualize the distribution of the learned features with and without knowledge distillation. Clearly, knowledge distillation leads to better representations: the features are more discriminative, i.e., it is much easier to classify the faces.
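
A minimal sketch of the standard distillation objective (Hinton et al., 2015) is given below; the temperature T and blending weight alpha are illustrative values, and face.evoLVe's exact recipe may differ.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soften both output distributions with temperature T, match them with
    KL divergence, and blend with the ordinary cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)   # T^2 keeps gradients scaled
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```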

Figure 3. T-SNE visualization of 10 classes randomly selected from MS-Celeb-1M. Left: without using knowledge distillation. Right: using knowledge distillation.

3.5. Training and Evaluation

Dataset | # identities | # images | # videos
LFW (Huang et al., 2008) | 5,749 | 13,233 | -
CALFW (Zheng et al., 2017) | 4,025 | 12,174 | -
CASIA-WebFace (Yi et al., 2014) | 10,575 | 494,414 | -
MS-Celeb-1M (Guo et al., 2016) | 100,000 | 5.08M | -
Vggface2 (Cao et al., 2018) | 8,631 | 3.09M | -
AgeDB (Moschoglou et al., 2017) | 570 | 16,488 | -
IJB-A (Klare et al., 2015) | 500 | 5,396 | 2,085
IJB-B (Whitelam et al., 2017) | 1,845 | 21,798 | 7,011
CFP (Sengupta et al., 2016) | 500 | 7,000 | -
Umdfaces (Bansal et al., 2017) | 8,277 | 367,888 | -
CelebA (Liu et al., 2018) | 10,177 | 202,599 | -
CACD-VS (Chen et al., 2014) | 2,000 | 163,446 | -
YTF (Wolf et al., 2011) | 1,595 | - | 621,127
DeepGlint (dee, [n.d.]) | 180,855 | 6.75M | -
UTKFace (Zhang and Qi, 2017) | - | 23,708 | -
BUAA-VisNir (Huang et al., 2012) | 150 | 5,952 | -
CASIA NIR-VIS 2.0 (Li et al., 2013) | 725 | 17,580 | -
Oulu-CASIA (Zhao et al., 2011) | 80 | 65,000 | -
NUAA-ImposterDB (Tan et al., 2010) | 15 | 12,614 | -
CASIA-SURF (Zhang et al., 2020) | 1,000 | - | 21,000
CASIA-FASD (Zhang et al., 2012) | 50 | - | 600
CASIA-MFSD (Zhang et al., 2012) | 50 | - | 600
Replay-Attack (Chingovska et al., 2012) | 50 | - | 1,200
WebFace260M (Zhu et al., 2021) | 2M | 42M | -
Table 2. Datasets supported by the face.evoLVe library. All datasets are publicly available, and we also provide pre-processed versions.

Face.evoLVe supports most public datasets, and users can directly download them from the links provided in the library. Tab. 2 presents the statistics of the datasets supported by the face.evoLVe library. For most datasets, we provide multiple versions, including the raw version and the aligned version.

Note that large-scale datasets have become more popular in recent years, hence distributed training is vital. The face.evoLVe library supports distributed training well, so users are able to train models on relatively large datasets, such as MS-Celeb-1M (Guo et al., 2016) and WebFace260M (Zhu et al., 2021).

In terms of evaluation, we provide many trained models, so that researchers can easily evaluate the models for comparison and engineers can also quickly deploy a face recognition model.

4. Performance

Backbone | Head | Loss | LFW (Huang et al., 2008) | CFP_FF (Sengupta et al., 2016) | CFP_FP (Sengupta et al., 2016) | AgeDB (Moschoglou et al., 2017) | CALFW (Zheng et al., 2017) | CPLFW (Zheng and Deng, 2018) | Vggface2_FP (Cao et al., 2018)
IR50 | Arcface (Deng et al., 2019) | Focal | 99.78 | 99.69 | 98.14 | 97.53 | 95.87 | 92.45 | 95.22
IR101 | Arcface (Deng et al., 2019) | Focal | 99.81 | 99.74 | 98.25 | 97.77 | 95.93 | 92.74 | 95.44
IR152 | Arcface (Deng et al., 2019) | Focal | 99.82 | 99.83 | 98.37 | 98.07 | 96.03 | 93.05 | 95.50
IR50 | AdaCos (Zhang et al., 2019) | Focal | 99.75 | 99.53 | 98.39 | 97.25 | 95.55 | 92.25 | 95.27
IR101 | AdaCos (Zhang et al., 2019) | Focal | 99.78 | 99.59 | 98.41 | 97.33 | 95.68 | 92.41 | 95.35
IR152 | AdaCos (Zhang et al., 2019) | Focal | 99.81 | 99.65 | 98.42 | 98.47 | 95.74 | 92.57 | 95.43
IR50 | AM-Softmax (Wang et al., 2018a) | Focal | 99.59 | 99.59 | 98.23 | 97.37 | 95.23 | 92.37 | 95.35
IR101 | AM-Softmax (Wang et al., 2018a) | Focal | 99.65 | 99.62 | 98.31 | 97.42 | 95.38 | 92.46 | 95.47
IR152 | AM-Softmax (Wang et al., 2018a) | Focal | 99.72 | 99.71 | 98.45 | 97.49 | 95.50 | 92.52 | 95.55
LightCNN-9 | Arcface (Deng et al., 2019) | Focal | 98.75 | 98.14 | 92.57 | 88.51 | 89.23 | 82.88 | 92.14
LightCNN-29 | Arcface (Deng et al., 2019) | Focal | 99.02 | 98.57 | 94.25 | 90.85 | 91.87 | 85.78 | 94.16
LightCNN-29v2 | Arcface (Deng et al., 2019) | Focal | 99.23 | 98.47 | 94.35 | 91.49 | 91.72 | 85.78 | 94.06
Table 3. Performance of the models implemented in the face.evoLVe library with different heads and backbones. The models are trained on the MS-Celeb-1M dataset (Guo et al., 2016).
Backbone | Head | Loss | LFW (Huang et al., 2008) | CFP_FF (Sengupta et al., 2016) | CFP_FP (Sengupta et al., 2016) | AgeDB (Moschoglou et al., 2017) | CALFW (Zheng et al., 2017) | CPLFW (Zheng and Deng, 2018) | Vggface2_FP (Cao et al., 2018)
IR152 | AdaCos (Zhang et al., 2019) | Focal | 99.82 | 99.84 | 98.37 | 98.07 | 96.03 | 93.05 | 95.50
HRnet (Wang et al., 2020a) | MV-Softmax (Wang et al., 2020b) | Focal | 99.82 | 99.51 | 98.41 | 97.88 | 95.43 | 88.95 | 94.70
TF-NAS-A (Hu et al., 2020) | AM-Softmax (Wang et al., 2018a) | Focal | 99.82 | 99.47 | 98.33 | 96.65 | 94.32 | 84.88 | 91.38
GhostNet (Han et al., 2020) | ArcFace (Deng et al., 2019) | Focal | 99.69 | 99.52 | 98.48 | 97.29 | 94.92 | 85.25 | 90.88
AttentionNet (Yoo et al., 2015) | AdaCos (Zhang et al., 2019) | Focal | 99.82 | 99.47 | 98.52 | 96.89 | 95.12 | 87.23 | 94.23
MobileFaceNet (Howard et al., 2017) | AdaCos (Zhang et al., 2019) | Focal | 99.73 | 99.84 | 97.75 | 95.87 | 94.87 | 89.29 | 93.20
Table 4. Performance of the models implemented in the face.evoLVe library with different heads and backbones. The models are trained on the WebFace260M dataset (Zhu et al., 2021).
Figure 4. Visualization of the training process. We train the model on MS-Celeb-1M (Guo et al., 2016), using IR152 as the backbone, Arcface (Deng et al., 2019) as the head, and the focal loss. We validate the model on seven datasets and report the accuracy.

We present the performance of the implemented models in Tab. 3 and Tab. 4. We train the models on the MS-Celeb-1M (Guo et al., 2016) dataset (Tab. 3) and WebFace260M (Zhu et al., 2021) (Tab. 4), then test them on different datasets. The implemented models achieve results competitive with the state of the art: e.g., the original Arcface (Deng et al., 2019) model with ResNet100 achieves 99.82 on LFW, 95.45 on CALFW and 92.08 on CPLFW, while face.evoLVe with IR50 achieves 99.78 on LFW, 95.87 on CALFW and 92.45 on CPLFW (see Tab. 3), performing even better than the original Arcface on the latter two. In addition, deeper backbones such as IR101 and IR152 obtain better performance. Compared to other implementations such as FaceX-Zoo (Wang et al., 2021), face.evoLVe is also competitive: with IR50+Arcface, face.evoLVe and FaceX-Zoo achieve the same accuracy (99.78) on LFW, while face.evoLVe outperforms FaceX-Zoo on CALFW (95.87 vs. 95.47).

Interestingly, looking at Tab. 4, where the models are trained on a relatively large dataset, WebFace260M (Zhu et al., 2021), the performance can be further improved. E.g., IR152+AdaCos trained on WebFace260M obtains 99.82, 96.03 and 93.05 on LFW, CALFW and CPLFW, respectively, while the same model trained on MS-Celeb-1M achieves 99.81, 95.74 and 92.57.

Apart from reporting accuracy, we also visualize the training process in Fig. 4 to help users analyze model behavior during training, such as stability and overfitting. E.g., we can easily find that overfitting occurs on the LFW dataset during training; hence, training for more iterations does not improve the final performance but takes more time, and in this case early stopping can be applied.

Moreover, we have used the face.evoLVe library to participate in many face recognition competitions, achieving first place and SoTA performance. E.g., we achieved first place in the ICCV 2017 MS-Celeb-1M Large-Scale Face Recognition Hard Set/Random Set/Low-Shot Learning Challenges using the face.evoLVe library, and No. 1 in the National Institute of Standards and Technology (NIST) IARPA Janus Benchmark A (IJB-A) Unconstrained Face Verification and Identification challenges in 2017. In addition, face.evoLVe achieves SoTA performance on other benchmarks, e.g., on the MS-Celeb-1M hard set it obtains 79.10% coverage at precision = 95%, and 87.50% coverage at precision = 95% on the random set.

5. Using Customized Datasets and Models

As mentioned above, it is relatively easy and convenient to use the developed face.evoLVe library for both training and evaluation. Researchers generally also require modularity from a toolbox, so that they can easily plug their proposed models into it by changing only a few lines of code. Fortunately, face.evoLVe is a highly modular library.

Apart from the datasets provided by the library, users can also use their own datasets. Face.evoLVe provides a data pre-processing SDK, including face detection and alignment, so that users can first use the pre-processing SDK to obtain aligned face images, which is relatively convenient.

The model is designed in a modular and extensible manner, e.g., the backbones and heads are independent of each other; thus, users can easily plug in either customized backbones or heads without changing the architecture of the library, as the sketch below illustrates. Also, during training users can easily change hyper-parameters such as the learning rate, batch size and momentum.
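
A hypothetical custom backbone might then look as follows. The only contract assumed here is the one implied above (aligned face crops in, fixed-dimensional embeddings out); the registration mechanism is not shown since it depends on the library's configuration files.

```python
import torch
import torch.nn as nn

class MyBackbone(nn.Module):
    """A hypothetical user-defined backbone, not part of face.evoLVe itself."""
    def __init__(self, embedding_size=512):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),                  # global pooling to (N, 128, 1, 1)
        )
        self.embed = nn.Linear(128, embedding_size)   # project to the embedding space

    def forward(self, x):                             # x: (N, 3, H, W) aligned faces
        return self.embed(self.features(x).flatten(1))
```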

6. Conclusion

In this paper, we have presented a comprehensive face recognition library, face.evoLVe, which is composed of the necessary components covering the full pipeline of face recognition practice, including alternative backbones and loss functions for face detection, alignment, feature extraction and matching. The goal of face.evoLVe is to lower the technical burden for researchers in the community to reproduce existing algorithms and models for comparison and benchmarking. In addition, face.evoLVe is designed in a highly modular and extensible manner, so users can easily implement and plug their own models into the library for potential extension and contribution. The library also provides a bag of tricks to improve performance and stabilize the training process. Note that we have used face.evoLVe to achieve SoTA performance and secure first place in a series of open competitions. Currently, face.evoLVe is still evolving with a group of active contributors. Contributions of novel models, tricks and datasets are welcome.

7. Acknowledgement

This work was partially supported by the National Science Foundation of China under Grant 62006244.

References

  • dee ([n.d.]) [n.d.]. Ms-celeb-1m challenge 3. http://trillionpairs.deepglint.com/overview.
  • Abadi et al. (2016) Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. Tensorflow: A system for large-scale machine learning. In OSDI. 265–283.
  • Aishwarya and Marcus (2010) P Aishwarya and Karnan Marcus. 2010. Face recognition using multiple eigenface subspaces. Journal of Engineering and Technology Research 2, 8 (2010), 139–143.
  • Bahdanau et al. (2014) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
  • Bansal et al. (2017) Ankan Bansal, Anirudh Nanduri, Carlos D Castillo, Rajeev Ranjan, and Rama Chellappa. 2017. Umdfaces: An annotated face dataset for training deep networks. In IJCB. 464–473.
  • Cao et al. (2018) Qiong Cao, Li Shen, Weidi Xie, Omkar M Parkhi, and Andrew Zisserman. 2018. Vggface2: A dataset for recognising faces across pose and age. In FG. 67–74.
  • Chen et al. (2014) Bor-Chun Chen, Chu-Song Chen, and Winston H Hsu. 2014. Cross-age reference coding for age-invariant face recognition and retrieval. In ECCV. 768–783.
  • Chen et al. (2017b) Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. 2017b. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 (2017).
  • Chen et al. (2015) Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274 (2015).
  • Chen et al. (2017a) Yunpeng Chen, Jianan Li, Huaxin Xiao, Xiaojie Jin, Shuicheng Yan, and Jiashi Feng. 2017a. Dual path networks. arXiv preprint arXiv:1707.01629 (2017).
  • Chingovska et al. (2012) Ivana Chingovska, André Anjos, and Sébastien Marcel. 2012. On the effectiveness of local binary patterns in face anti-spoofing. In BIOSIG. 1–7.
  • Deng et al. (2020) Jiankang Deng, Jia Guo, Tongliang Liu, Mingming Gong, and Stefanos Zafeiriou. 2020. Sub-center arcface: Boosting face recognition by large-scale noisy web faces. In ECCV. 741–757.
  • Deng et al. (2019) Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. 2019. Arcface: Additive angular margin loss for deep face recognition. In CVPR. 4690–4699.
  • Du (2018) Meiyan Du. 2018. Mobile payment recognition technology based on face detection algorithm. Concurrency and Computation: Practice and Experience 30, 22 (2018), e4655.
  • Gotmare et al. (2018) Akhilesh Gotmare, Nitish Shirish Keskar, Caiming Xiong, and Richard Socher. 2018. A closer look at deep learning heuristics: Learning rate restarts, warmup and distillation. arXiv preprint arXiv:1810.13243 (2018).
  • Goyal et al. (2017) Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. 2017. Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677 (2017).
  • Guo et al. (2019) Jia Guo, Jiankang Deng, Xiang An, and Jack Yu. 2019. InsightFace. https://github.com/deepinsight/insightface.git.
  • Guo et al. (2016) Yandong Guo, Lei Zhang, Yuxiao Hu, Xiaodong He, and Jianfeng Gao. 2016. Ms-celeb-1m: A dataset and benchmark for large-scale face recognition. In ECCV. 87–102.
  • Gupta et al. (2010) Sheifali Gupta, OP Sahoo, Ajay Goel, and Rupesh Gupta. 2010. A new optimized approach to face recognition using eigenfaces. Global Journal of Computer Science and Technology (2010).
  • Han et al. (2020) Kai Han, Yunhe Wang, Qi Tian, Jianyuan Guo, Chunjing Xu, and Chang Xu. 2020. Ghostnet: More features from cheap operations. In CVPR. 1580–1589.
  • He et al. (2017) Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. 2017. Mask r-cnn. In ICCV. 2961–2969.
  • He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In CVPR. 770–778.
  • Hinton et al. (2015) Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).
  • Howard et al. (2017) Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017).
  • Hu et al. (2018) Jie Hu, Li Shen, and Gang Sun. 2018. Squeeze-and-excitation networks. In CVPR. 7132–7141.
  • Hu et al. (2020) Yibo Hu, Xiang Wu, and Ran He. 2020. TF-NAS: Rethinking Three Search Freedoms of Latency-Constrained Differentiable Neural Architecture Search. In ECCV.
  • Huang et al. (2012) D Huang, J Sun, and Y Wang. 2012. The BUAA-VisNir face database instructions. School Comput. Sci. Eng., Beihang Univ., Beijing, China, Tech. Rep. IRIP-TR-12-FR-001 3 (2012).
  • Huang et al. (2017) Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. 2017. Densely connected convolutional networks. In CVPR. 4700–4708.
  • Huang et al. (2008) Gary B Huang, Marwan Mattar, Tamara Berg, and Eric Learned-Miller. 2008. Labeled faces in the wild: A database forstudying face recognition in unconstrained environments. In Workshop on Faces in ’Real-Life’ Images: Detection, Alignment, and Recognition.
  • Huang et al. (2020) Yuge Huang, Yuhan Wang, Ying Tai, Xiaoming Liu, Pengcheng Shen, Shaoxin Li, Jilin Li, and Feiyue Huang. 2020. Curricularface: adaptive curriculum learning loss for deep face recognition. In CVPR. 5901–5910.
  • Jian (2018) Zhao Jian. 2018. Deep learning for human-centric image analysis. (2018).
  • Kemelmacher-Shlizerman et al. (2016) Ira Kemelmacher-Shlizerman, Steven M Seitz, Daniel Miller, and Evan Brossard. 2016. The megaface benchmark: 1 million faces for recognition at scale. In CVPR.
  • Kim et al. (2020) Kyungyul Kim, ByeongMoon Ji, Doyoung Yoon, and Sangheum Hwang. 2020. Self-knowledge distillation: A simple way for better generalization. arXiv preprint arXiv:2006.12000 (2020).
  • Klare et al. (2015) Brendan F Klare, Ben Klein, Emma Taborsky, Austin Blanton, Jordan Cheney, Kristen Allen, Patrick Grother, Alan Mah, and Anil K Jain. 2015. Pushing the frontiers of unconstrained face detection and recognition: Iarpa janus benchmark a. In CVPR. 1931–1939.
  • Krizhevsky et al. (2012) Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. NeurIPS 25 (2012), 1097–1105.
  • Li et al. (2013) Stan Li, Dong Yi, Zhen Lei, and Shengcai Liao. 2013. The casia nir-vis 2.0 face database. In CVPRW. 348–353.
  • Lin et al. (2017) Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. 2017. Focal loss for dense object detection. In ICCV. 2980–2988.
  • Liu et al. (2019b) Hao Liu, Xiangyu Zhu, Zhen Lei, and Stan Z Li. 2019b. Adaptiveface: Adaptive margin and sampling for face recognition. In CVPR. 11947–11956.
  • Liu et al. (2015) Jingtuo Liu, Yafeng Deng, Tao Bai, Zhengping Wei, and Chang Huang. 2015. Targeting ultimate accuracy: Face recognition via deep embedding. arXiv preprint arXiv:1506.07310 (2015).
  • Liu et al. (2017) Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, and Le Song. 2017. Sphereface: Deep hypersphere embedding for face recognition. In CVPR. 212–220.
  • Liu et al. (2019a) Yu Liu et al. 2019a. Towards flops-constrained face recognition. In ICCVW. 0–0.
  • Liu et al. (2018) Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. 2018. Large-scale celebfaces attributes (celeba) dataset. Retrieved August 15, 2018 (2018), 11.
  • Loshchilov and Hutter (2016) Ilya Loshchilov and Frank Hutter. 2016. Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983 (2016).
  • Ma et al. (2018) Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, and Jian Sun. 2018. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In ECCV. 116–131.
  • Ma et al. (2019) Yanjun Ma, Dianhai Yu, Tian Wu, and Haifeng Wang. 2019. PaddlePaddle: An open-source deep learning platform from industrial practice. Frontiers of Data and Computing 1, 1 (2019), 105–115.
  • Meng et al. (2021) Qiang Meng, Shichao Zhao, Zhida Huang, and Feng Zhou. 2021. Magface: A universal representation for face recognition and quality assessment. In CVPR. 14225–14234.
  • Moschoglou et al. (2017) Stylianos Moschoglou, Athanasios Papaioannou, Christos Sagonas, Jiankang Deng, Irene Kotsia, and Stefanos Zafeiriou. 2017. Agedb: the first manually collected, in-the-wild age database. In CVPRW.
  • Nech and Kemelmacher-Shlizerman (2017) Aaron Nech and Ira Kemelmacher-Shlizerman. 2017. Level Playing Field For Million Scale Face Recognition. In CVPR.
  • Park et al. (2010) Unsang Park, Yiying Tong, and Anil K Jain. 2010. Age-invariant face recognition. T-PAMI 32, 5 (2010), 947–954.
  • Parkhi et al. (2015) Omkar M Parkhi, Andrea Vedaldi, and Andrew Zisserman. 2015. Deep face recognition. (2015).
  • Paszke et al. (2019) Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. Pytorch: An imperative style, high-performance deep learning library. NeurIPS 32 (2019), 8026–8037.
  • Ren et al. (2015) Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. NeurIPS 28 (2015), 91–99.
  • Rizon et al. (2006) Mohamed Rizon, Hashim Muhammad Firdaus, Saad Puteh, Yaacob Sazali, Mamat Mohd Rozailan, Ali Yeon, Md Shakaff, Saad Abdul Rahman, Desa Hazri, and M Karthigayan. 2006. Face recognition using eigenfaces and neural networks. (2006).
  • Sahoolizadeh and Ghassabeh (2008) Hossein Sahoolizadeh and Youness Aliyari Ghassabeh. 2008. Face recognition using eigen-faces, fisher-faces and neural networks. In 2008 7th IEEE International Conference on Cybernetic Intelligent Systems. 1–6.
  • Schroff et al. (2015) Florian Schroff, Dmitry Kalenichenko, and James Philbin. 2015. Facenet: A unified embedding for face recognition and clustering. In CVPR. 815–823.
  • Sengupta et al. (2016) Soumyadip Sengupta, Jun-Cheng Chen, Carlos Castillo, Vishal M Patel, Rama Chellappa, and David W Jacobs. 2016. Frontal to profile face verification in the wild. In WACV. 1–9.
  • Simonyan and Zisserman (2014) Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
  • Sun et al. (2020) Yifan Sun, Changmao Cheng, Yuhan Zhang, Chi Zhang, Liang Zheng, Zhongdao Wang, and Yichen Wei. 2020. Circle loss: A unified perspective of pair similarity optimization. In CVPR. 6398–6407.
  • Sun et al. (2015) Yi Sun, Ding Liang, Xiaogang Wang, and Xiaoou Tang. 2015. Deepid3: Face recognition with very deep neural networks. arXiv preprint arXiv:1502.00873 (2015).
  • Sun et al. (2014) Yi Sun, Xiaogang Wang, and Xiaoou Tang. 2014. Deep learning face representation from predicting 10,000 classes. In CVPR. 1891–1898.
  • Szegedy et al. (2015) Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In CVPR. 1–9.
  • Szegedy et al. (2016) Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. 2016. Rethinking the inception architecture for computer vision. In CVPR. 2818–2826.
  • Tai et al. (2021) Kai Sheng Tai, Peter Bailis, and Gregory Valiant. 2021. Sinkhorn Label Allocation: Semi-Supervised Classification via Annealed Self-Training. arXiv preprint arXiv:2102.08622 (2021).
  • Taigman et al. (2014) Yaniv Taigman, Ming Yang, Marc’Aurelio Ranzato, and Lior Wolf. 2014. Deepface: Closing the gap to human-level performance in face verification. In CVPR. 1701–1708.
  • Tan and Le (2019) Mingxing Tan and Quoc Le. 2019. Efficientnet: Rethinking model scaling for convolutional neural networks. In ICML. 6105–6114.
  • Tan et al. (2010) Xiaoyang Tan, Yi Li, Jun Liu, and Lin Jiang. 2010. Face liveness detection from a single image with sparse low rank bilinear discriminative model. In ECCV. 504–517.
  • Van der Maaten and Hinton (2008) Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. JMLR 9, 11 (2008).
  • Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NeurIPS. 5998–6008.
  • Wang et al. (2018a) Feng Wang, Jian Cheng, Weiyang Liu, and Haijun Liu. 2018a. Additive margin softmax for face verification. IEEE Signal Processing Letters 25, 7 (2018), 926–930.
  • Wang et al. (2018b) Hao Wang, Yitong Wang, Zheng Zhou, Xing Ji, Dihong Gong, Jingchao Zhou, Zhifeng Li, and Wei Liu. 2018b. Cosface: Large margin cosine loss for deep face recognition. In CVPR. 5265–5274.
  • Wang et al. (2021) Jun Wang, Yinglu Liu, Yibo Hu, Hailin Shi, and Tao Mei. 2021. FaceX-Zoo: A PyTorch Toolbox for Face Recognition. arXiv preprint arXiv:2101.04407 (2021).
  • Wang et al. (2020a) Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, et al. 2020a. Deep high-resolution representation learning for visual recognition. T-PAMI (2020).
  • Wang and Deng (2018) Mei Wang and Weihong Deng. 2018. Deep face recognition: A survey. arXiv preprint arXiv:1804.06655 (2018).
  • Wang et al. (2020b) Xiaobo Wang, Shifeng Zhang, Shuo Wang, Tianyu Fu, Hailin Shi, and Tao Mei. 2020b. Mis-classified vector guided softmax loss for face recognition. In AAAI, Vol. 34. 12241–12248.
  • Wen et al. (2016) Yandong Wen, Kaipeng Zhang, Zhifeng Li, and Yu Qiao. 2016. A discriminative feature learning approach for deep face recognition. In ECCV. 499–515.
  • Whitelam et al. (2017) Cameron Whitelam, Emma Taborsky, Austin Blanton, Brianna Maze, Jocelyn Adams, Tim Miller, Nathan Kalka, Anil K Jain, James A Duncan, Kristen Allen, et al. 2017. Iarpa janus benchmark-b face dataset. In CVPRW. 90–98.
  • Wolf et al. (2011) Lior Wolf, Tal Hassner, and Itay Maoz. 2011. Face recognition in unconstrained videos with matched background similarity. In CVPR. 529–534.
  • Wu et al. (2018b) Di Wu, Mingsheng Shang, Xin Luo, Ji Xu, Huyong Yan, Weihui Deng, and Guoyin Wang. 2018b. Self-training semi-supervised classification based on density peaks of data. Neurocomputing 275 (2018), 180–191.
  • Wu et al. (2018a) Xiang Wu, Ran He, Zhenan Sun, and Tieniu Tan. 2018a. A light cnn for deep face representation with noisy labels. TIFS 13, 11 (2018), 2884–2896.
  • Xie et al. (2017) Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. 2017. Aggregated residual transformations for deep neural networks. In CVPR. 1492–1500.
  • Yi et al. (2014) Dong Yi, Zhen Lei, Shengcai Liao, and Stan Z Li. 2014. Learning face representation from scratch. arXiv preprint arXiv:1411.7923 (2014).
  • Yoo et al. (2015) Donggeun Yoo, Sunggyun Park, Joon-Young Lee, Anthony S Paek, and In So Kweon. 2015. Attentionnet: Aggregating weak directions for accurate object detection. In ICCV. 2659–2667.
  • Zeng et al. (2020) Dan Zeng, Hailin Shi, Hang Du, Jun Wang, Zhen Lei, and Tao Mei. 2020. NPCFace: A Negative-Positive Cooperation Supervision for Training Large-scale Face Recognition. arXiv preprint arXiv:2007.10172 (2020).
  • Zhang et al. (2016) Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, and Yu Qiao. 2016. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters 23, 10 (2016), 1499–1503.
  • Zhang et al. (2020) Shifeng Zhang, Ajian Liu, Jun Wan, Yanyan Liang, Guodong Guo, Sergio Escalera, Hugo Jair Escalante, and Stan Z Li. 2020. Casia-surf: A large-scale multi-modal benchmark for face anti-spoofing. IEEE Transactions on Biometrics, Behavior, and Identity Science 2, 2 (2020), 182–193.
  • Zhang and Qi (2017) Song Yang Zhang, Zhifei and Hairong Qi. 2017. Age Progression/Regression by Conditional Adversarial Autoencoder. In CVPR.
  • Zhang et al. (2021) Xiao Zhang, Haoyi Xiong, and Dongrui Wu. 2021. Rethink the Connections among Generalization, Memorization and the Spectral Bias of DNNs. IJCAI (2021).
  • Zhang et al. (2019) Xiao Zhang, Rui Zhao, Yu Qiao, Xiaogang Wang, and Hongsheng Li. 2019. Adacos: Adaptively scaling cosine logits for effectively learning deep face representations. In CVPR. 10823–10832.
  • Zhang et al. (2018) Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. 2018. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In CVPR. 6848–6856.
  • Zhang et al. (2012) Zhiwei Zhang, Junjie Yan, Sifei Liu, Zhen Lei, Dong Yi, and Stan Z Li. 2012. A face antispoofing database with diverse attacks. In ICB. 26–31.
  • Zhao et al. (2011) Guoying Zhao, Xiaohua Huang, Matti Taini, Stan Z Li, and Matti PietikäInen. 2011. Facial expression recognition from near-infrared videos. Image and Vision Computing 29, 9 (2011), 607–619.
  • Zhao et al. (2017b) Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. 2017b. Pyramid scene parsing network. In CVPR. 2881–2890.
  • Zhao et al. (2019a) Jian Zhao, Yu Cheng, Yi Cheng, Yang Yang, Fang Zhao, Jianshu Li, Hengzhu Liu, Shuicheng Yan, and Jiashi Feng. 2019a. Look across elapse: Disentangled representation learning and photorealistic cross-age face synthesis for age-invariant face recognition. In AAAI, Vol. 33. 9251–9258.
  • Zhao et al. (2017a) Jian Zhao, Jianshu Li, Fang Zhao, Shuicheng Yan, and Jiashi Feng. 2017a. Marginalized cnn: Learning deep invariant representations. In BMVC.
  • Zhao et al. (2018) Jian Zhao, Lin Xiong, Yu Cheng, Yi Cheng, Jianshu Li, Li Zhou, Yan Xu, Jayashree Karlekar, Sugiri Pranata, Shengmei Shen, et al. 2018. 3D-Aided Deep Pose-Invariant Face Recognition.. In IJCAI, Vol. 2. 11.
  • Zhao et al. (2020) Jian Zhao, Shuicheng Yan, and Jiashi Feng. 2020. Towards age-invariant face recognition. T-PAMI (2020).
  • Zhao et al. (2019b) Kai Zhao, Jingyi Xu, and Ming-Ming Cheng. 2019b. Regularface: Deep face recognition via exclusive regularization. In CVPR. 1136–1144.
  • Zheng and Deng (2018) Tianyue Zheng and Weihong Deng. 2018. Cross-pose lfw: A database for studying cross-pose face recognition in unconstrained environments. Technical Report (2018).
  • Zheng et al. (2017) Tianyue Zheng, Weihong Deng, and Jiani Hu. 2017. Cross-age lfw: A database for studying cross-age face recognition in unconstrained environments. arXiv:1708.08197 (2017).
  • Zhu et al. (2011) Wenwu Zhu, Chong Luo, Jianfeng Wang, and Shipeng Li. 2011. Multimedia cloud computing. IEEE Signal Processing Magazine 28, 3 (2011), 59–69.
  • Zhu et al. (2021) Zheng Zhu, Guan Huang, Jiankang Deng, Yun Ye, Junjie Huang, Xinze Chen, Jiagang Zhu, Tian Yang, Jiwen Lu, Dalong Du, and Jie Zhou. 2021. WebFace260M: A Benchmark Unveiling the Power of Million-scale Deep Face Recognition. In CVPR.