Harnessing Unlabeled Data to Improve Generalization of Biometric Gender and Age Classifiers
Abstract
With significant advances in deep learning, many computer vision applications have reached an inflection point. However, deep learning models need large amounts of labeled data for model training and optimal parameter estimation, and limited labeled data for model training results in over-fitting and impacts generalization performance. At the same time, the collection and annotation of large amounts of data is a very time-consuming and expensive operation, and due to privacy and security concerns, large amounts of labeled data cannot be collected for certain applications, such as those in the medical field. Self-training, co-training, and self-ensemble methods are three types of semi-supervised learning that can be used to exploit unlabeled data. In this paper, we propose a self-ensemble based deep learning model that, along with limited labeled data, harnesses unlabeled data to improve generalization performance. We evaluated the proposed self-ensemble based deep learning model for soft-biometric gender and age classification. Experimental evaluation on the CelebA and VISOB datasets yields gender classification accuracies of 94.46% and 81.00%, respectively, using only 1,000 labeled samples with the remaining 199K samples of CelebA, and similarly 1,000 labeled samples with the remaining 107K samples of VISOB, serving as unlabeled data. Comparative evaluation suggests that the self-ensemble model approaches the accuracy of the supervised models trained on the entire CelebA and VISOB datasets, respectively. We also evaluated the proposed learning method for age-group prediction on the Adience dataset, where it outperformed the baseline supervised deep learning model with a better exact accuracy of 55.55 ± 4.28%, which is 3.92% more than the baseline.
Index Terms:
Unlabeled data, Deep Learning, Soft Biometrics, Self-ensemble, and Semi-supervised learning.
I Introduction
AI and computer vision have reached an inflection point with advances in deep learning. However, these high accuracy rates come at the cost of a large number of model parameters, which in turn require large-scale labeled (annotated) datasets for network training and for learning the optimal set of parameters.
The availability of large-scale annotated datasets may be limited because manual annotation by human experts is a cumbersome and time-consuming operation. Further, depending on the nature of the field, large datasets may not be publicly available; in the medical field, for instance, due to privacy and security concerns. Training on limited datasets impacts the generalization ability of deep learning models through over-fitting caused by the small number of data points.
A number of studies have explored the automatic extraction of demographic attributes such as gender, age, and ethnicity of an individual, known as soft biometrics [1]. These attributes have been deduced from biometric data such as facial images, voice, gait, and hand or body images. Automated soft-biometric prediction has drawn significant interest for numerous applications such as surveillance, human-computer interaction, anonymous customized advertisement systems, image retrieval, continuous user authentication, subject re-identification, and fusion with primary biometric modalities for performance enhancement [2].

Studies have been conducted to improve the prediction accuracy of soft biometrics from different biometric traits [3] using deep learning models. For face biometrics, studies have deduced gender [4, 5, 6], age [4, 5, 7], and ethnicity [6, 8] from facial images. Soft biometrics such as gender [9, 10, 11, 12, 13], ethnicity [10, 12, 14], age [15, 16], and eye color [17, 12] have been predicted from periocular and iris images in the visible and near-infrared spectra.
Studies in biometric recognition have shown steady improvement in matching performance by combining soft biometric attributes with primary biometrics [18, 19, 1]. The authors of [1] showed improvements in both identification and verification performance by combining primary biometrics with predicted soft biometric traits such as ethnicity, gender, and height.
Recent advancements in soft biometric prediction are mainly based on deep learning for feature extraction and classification, as in [4, 12, 11]. These deep learning methods require a large amount of training data to obtain high generalization accuracy. However, acquiring large biometric datasets along with soft-biometric attributes such as gender, race, and age may invoke privacy and security issues.
One way to overcome this problem is semi-supervised learning [20], which entails the joint use of labeled and unlabeled data to improve the generalization performance of the classifier.
In this paper, we propose a self-ensemble based semi-supervised learning method which improves the generalization ability of a deep learning model trained on limited labeled data and a large amount of unlabeled data. The proposed method uses a joint loss function that applies categorical cross-entropy to the labeled samples while reducing the distance between the model outputs for two different perturbations of the unlabeled data. Figure 2 shows the schema of the proposed self-ensemble model. Many semi-supervised methods use noise to induce perturbations [21, 22]. However, noise might corrupt the very features used in predicting soft-biometric attributes from biometric traits. Therefore, we propose to use data augmentation to generate different perturbations of the unlabeled data.
The contributions of this paper are two-fold:
• The proposal of a self-ensemble based deep learning model which harnesses unlabeled data to improve the generalization ability of soft-biometric prediction.
• Experimental evaluations are conducted for gender prediction on eye-region images from the VISOB dataset and face images from the CelebA dataset. We also evaluate age-group prediction on face images from the Adience dataset. We show that deep learning models can be trained using as few as 100 labeled samples with the proposed semi-supervised learning method.
The rest of the paper is organized as follows: a brief overview of semi-supervised learning and its recent deep learning based advances is provided in Section II. The proposed self-ensemble method is discussed in Section III. The experimental setup and datasets are presented in Section IV. Results are discussed in Section V, and the paper is concluded in Section VI.
II Prior Work on Semi-Supervised Learning
Semi-supervised learning, which makes joint use of labeled and unlabeled data for training a classifier, has a long history in machine learning research [20]. Typically, a model trained on labeled data is used to generate pseudo-labels for the unlabeled data, and the pseudo-labeled data is then used to re-train the classifier for enhanced accuracy. Regarding misclassifications in the pseudo-labeled data, deep learning models have proven to be good at dealing with noisy labels [23].
These semi-supervised learning methods can be broadly divided into three types as follows:
(1) Self-training: In self-training based semi-supervised learning, a single model trained on a small labeled dataset is used to generate pseudo-labels for the unlabeled dataset. The labeled data together with the pseudo-labeled data is then used to re-train the model [24, 25]. Self-training is one of the earliest semi-supervised learning methods and is used to enhance classifier performance.
(2) Co-training: Co-training is a semi-supervised learning approach that requires the dataset to be describable by two different views of the features. In co-training, two classifiers are used jointly for pseudo-label generation and re-training on the unlabeled dataset. One way of performing co-training is to let one or more models predict proxy (pseudo) labels for another model [26]. Another way is to take a majority vote over the labels predicted by two or more models to generate a proxy label for the unlabeled dataset, which is then used for classifier re-training [27].
(3) Self-ensemble: This training process is similar to co-training, except that instead of multiple classifiers, a single model with different perturbations of the input samples is used to predict the pseudo-labels. In self-ensemble, the model is trained on both labeled and unlabeled data during the training stage. Rasmus et al. [21] proposed to train a noise-robust model by reducing the prediction error between noisy and clean inputs of the unlabeled dataset while predicting targets for the labeled dataset in a supervised manner; the noise is induced by adding Gaussian noise to the input image as well as to the output of each layer in the model. Laine et al. [22] applied two different perturbations to the unlabeled samples using Gaussian noise and augmentations, and used dropout in the deep learning model instead of layer-wise Gaussian noise.
Our proposed method is based on the self-ensemble technique, where instead of using noise as the perturbation, as in the previous models, we use only data augmentations such as color jitter, horizontal flipping, and random translations. The proposed method is discussed in more detail in Section III.

III Proposed Method
A block diagram of the proposed self-ensemble method is shown in Figure 2. In the proposed method, the deep learning model is trained consecutively in a supervised manner on data with labels and in an unsupervised manner by reducing the error between the model outputs $z_1$ and $z_2$ for two augmented samples $\hat{x}_1$ and $\hat{x}_2$ generated from the unlabeled data $x_u$. The supervised part of the training ensures the model learns to predict the correct target class for a given input, while the unsupervised part ensures the model learns to produce consistent outputs.
Data Augmentation for Perturbation: As mentioned in Section I, the proposed method is based on self-ensemble semi-supervised learning, where two output images are generated by inducing different kinds of perturbation on the input image. However, inducing noise may distort the features in biometric traits, such as ocular and fingerprint images. We therefore propose to use data augmentation. In our experiments, we incorporated the following augmentations: first, we applied color jitter by changing the brightness and saturation of the image; second, we randomly flipped the input image horizontally; and finally, translation is applied by randomly cropping a region out of the image. This way, from unlabeled data $x_u$, we generate two perturbed samples $\hat{x}_1$ and $\hat{x}_2$ to train the deep learning model for the unsupervised part, as sketched below. Using data augmentation, around 108K unlabeled samples for the VISOB dataset and 199K unlabeled samples for the CelebA dataset were generated.
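A minimal sketch of this augmentation pipeline, assuming torchvision (the paper does not name its framework); the jitter magnitudes are assumptions, and the crop size matches the 128×128 model input from Table 1:

```python
from torchvision import transforms

# Each call applies random color jitter, a random horizontal flip, and a
# random crop (translation), so applying it twice to the same image
# yields two different perturbed views.
augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, saturation=0.4),  # assumed magnitudes
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomCrop(128),                   # 128x128 model input (Table 1)
    transforms.Grayscale(num_output_channels=1),
    transforms.ToTensor(),
])

def two_views(pil_image):
    """Generate the two perturbed samples for the unsupervised consistency loss."""
    return augment(pil_image), augment(pil_image)
```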
| Layer | Output Shape | Parameters |
|---|---|---|
| Conv 3x3, 32 filters | [32, 128, 128] | 288 |
| Conv 3x3, 32 filters | [32, 128, 128] | 9,280 |
| MaxPool 2x2 | [32, 64, 64] | - |
| Conv 3x3, 64 filters | [64, 64, 64] | 55,488 |
| MaxPool 2x2 | [64, 32, 32] | - |
| Conv 3x3, 128 filters | [128, 32, 32] | 221,568 |
| MaxPool 2x2 | [128, 16, 16] | - |
| Conv 3x3, 128 filters | [128, 16, 16] | 295,424 |
| MaxPool 2x2 | [128, 8, 8] | - |
| Conv 3x3, 128 filters | [128, 8, 8] | 295,424 |
| Global AvgPool 8x8 | [128] | - |
| Dense Layer | [2] | 256 |
| Total Parameters | | 877,728 |
Proposed Deep Learning Model: In our experiments, we used a simple sequential convolutional neural network (CNN) based on the visual geometry group (VGG) [28] architecture. Table 1 shows the proposed deep learning model used in our experiments. In all our experiments, we used a single-channel (grayscale) image of size 128x128 pixels as input, from which we extract features by taking the global average of the output of the last convolutional layer. This is followed by a final fully connected (dense) layer for predicting the soft-biometric attributes. The proposed deep learning model is relatively small, with only 877,728 parameters as shown in Table 1; a sketch of the architecture is given below.
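A minimal PyTorch sketch of the VGG-style CNN in Table 1. The layer output shapes follow the table; the exact per-block composition (bias, normalization, number of convolutions per block) is not spelled out in the paper, so the parameter count of this sketch only approximates the reported 877,728:

```python
import torch
import torch.nn as nn

def conv_block(cin, cout, pool=True):
    layers = [nn.Conv2d(cin, cout, kernel_size=3, padding=1, bias=False),
              nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return layers

class SmallVGG(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            *conv_block(1, 32, pool=False),     # [32, 128, 128]
            *conv_block(32, 32),                # [32, 128, 128] -> [32, 64, 64]
            *conv_block(32, 64),                # [64, 64, 64]   -> [64, 32, 32]
            *conv_block(64, 128),               # [128, 32, 32]  -> [128, 16, 16]
            *conv_block(128, 128),              # [128, 16, 16]  -> [128, 8, 8]
            *conv_block(128, 128, pool=False),  # [128, 8, 8]
        )
        self.pool = nn.AdaptiveAvgPool2d(1)     # global average pool -> [128]
        self.head = nn.Linear(128, num_classes, bias=False)  # dense layer -> [2]

    def forward(self, x):                       # x: [N, 1, 128, 128] grayscale
        h = self.pool(self.features(x)).flatten(1)
        return self.head(h)
```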
Loss Function: For given labeled data $x_l$ with targets $y$, the model predicts $\hat{y}$ as output, and we use the cross-entropy loss shown in Equation 1 for the supervised part of the model training.
$L_{CE} = -\sum_{i} y_i \log(\hat{y}_i)$   (1)
Let $z_1$ and $z_2$ be the predictions from the model for the two perturbed samples $\hat{x}_1$ and $\hat{x}_2$ generated from a given unlabeled input $x_u$; the model is then trained in an unsupervised manner by taking the mean square error (MSE) between the two model predictions, as shown in Equation 2.
$L_{MSE} = \lVert z_1 - z_2 \rVert^2$   (2)
Finally, the proposed model is trained with a joint combination of the cross-entropy and mean square error loss functions, as shown in Equation 3.
$L = L_{CE} + w \cdot L_{MSE}$   (3)
where $w$ is the coefficient for the unsupervised (MSE) loss; in all our experiments, its value was fixed based on empirical evidence. A sketch of one training step follows.
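A sketch of one optimization step with the joint loss of Equation 3, assuming PyTorch and softmax outputs for the consistency term (the paper does not state whether the MSE is taken before or after softmax); the default value of `w` below is an assumption, as the paper reports only that its coefficient was chosen empirically:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, x_l, y, x1_hat, x2_hat, w=1.0):
    """x_l, y: labeled batch; x1_hat, x2_hat: two augmented views of an
    unlabeled batch (see two_views above); w: assumed default weight."""
    optimizer.zero_grad()
    loss_ce = F.cross_entropy(model(x_l), y)        # Eq. 1 (supervised)
    p1 = torch.softmax(model(x1_hat), dim=1)
    p2 = torch.softmax(model(x2_hat), dim=1)
    loss_mse = F.mse_loss(p1, p2)                   # Eq. 2 (unsupervised)
    loss = loss_ce + w * loss_mse                   # Eq. 3 (joint loss)
    loss.backward()
    optimizer.step()
    return loss.item()
```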
IV Experimental Setup
IV-A Datasets
We evaluated the proposed self-ensemble method on gender and age-group soft biometrics predictions.
Gender Prediction: For predicting gender as a soft-biometric trait, we evaluated the proposed model on eye-region images in the visible light spectrum from the VISOB [30] dataset and on face images in the visible light spectrum from the CelebA [29] dataset. For gender prediction, we trained the proposed model in a subject-independent evaluation with very few labeled samples, ranging from 100 to 1,000 collectively for both genders, and a large number of unlabeled samples (about 108K for VISOB and 199K for CelebA).
In the case of the VISOB [30] dataset, we used the extended eye-region crops from [11] for our evaluation. VISOB consists of eye images captured from subjects using the selfie cameras of three different smartphones (iPhone 5s, Oppo N1, and Samsung Note 4). Eye captures were collected from the subjects in two visits held weeks apart, with two sessions per visit under multiple lighting conditions. The dataset is divided into subject-disjoint batches; we randomly pick a set of folds for training and validation and use the remaining folds for testing. This process is repeated several times, and the average performance is reported.
CelebA [29] is a publicly available face dataset with more than 200K images from 10,177 subjects, with 40 soft-biometric attributes assigned to each image. In our experiments, we divided the dataset into a subject-independent split with an equal distribution of male and female subjects in both sets. During training, we randomly picked a small subset of samples, ranging from 100 to 1,000, as the labeled set, and used the rest of the training data (about 199K samples) as the unlabeled set.
We evaluated gender prediction performance in terms of the male classification rate (MCR) and female classification rate (FCR), along with the total classification accuracy (ACC), as reported in [11]; MCR is the percentage of male samples correctly predicted as male, and FCR is defined analogously for female samples. A small metric sketch follows.
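A small sketch of the reported metrics, assuming integer labels with 1 = male and 0 = female (the paper does not state its label encoding):

```python
import numpy as np

def gender_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mcr = 100.0 * np.mean(y_pred[y_true == 1] == 1)  # male classification rate
    fcr = 100.0 * np.mean(y_pred[y_true == 0] == 0)  # female classification rate
    acc = 100.0 * np.mean(y_pred == y_true)          # total classification accuracy
    return mcr, fcr, acc
```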
Age-Group Prediction: We evaluated age-group prediction on the Adience [31] dataset with its eight age groups (0-2, 4-6, 8-13, 15-20, 25-32, 38-43, 48-53, 60+). The dataset consists of face images divided into five subject-disjoint sets. We followed the cross-validation protocol presented in [31] and report performance in terms of the exact age-group prediction rate (Exact Acc%) and the one-off age-group prediction rate (1-OFF Acc%), where one-off predictions also count the adjacent age groups as correct; both metrics are sketched below. The dataset contains unconstrained and unfiltered face samples collected from flickr.com albums, with large variations due to motion blur, resolution, capture device, gaze, and illumination, which makes age-group prediction a complex task. For this reason, we used all of the roughly 11K samples in the training set, as both labeled and unlabeled sets (50:50 split), to train the proposed self-ensemble based deep learning model.
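A sketch of the two age-group metrics: exact accuracy requires the predicted group index to match the true one, while 1-off also accepts the two adjacent groups (indices 0-7 for the eight Adience groups):

```python
import numpy as np

def age_group_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    exact = 100.0 * np.mean(y_pred == y_true)                 # Exact Acc%
    one_off = 100.0 * np.mean(np.abs(y_pred - y_true) <= 1)   # 1-OFF Acc%
    return exact, one_off
```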
IV-B Experimental Protocol
For both gender and age-group prediction, we first trained the proposed model shown in Table 1 as the baseline, using the entire training set as labeled samples. Once we obtained the best results from these experiments, we kept the same global hyper-parameters (batch size, learning rate, and number of training epochs, set separately for gender prediction and age-group prediction) for the rest of our experiments. For both tasks, we used the Adam [32] optimizer.
Before applying our proposed data augmentation pipeline, we resized all images to a fixed resolution. For each training batch, we randomly apply color jitter followed by random horizontal flipping; finally, we crop a 128x128-pixel region from the image and convert it to grayscale. During testing, we take the center 128x128-pixel crop from the image and convert it to grayscale. The trained gender-prediction models are evaluated on the held-out test samples from CelebA and VISOB, respectively, and the age-group classification model is evaluated on the test samples from the Adience dataset. The test accuracies of the gender and age-prediction models trained using supervised and self-ensemble based learning, with various combinations of labeled and unlabeled data, are shown in Tables II, III, and IV.
Table II: Gender prediction on the CelebA dataset (cell format: supervised / self-ensemble).

| Labeled samples | Unlabeled samples | MCR (%) | FCR (%) | ACC (%) |
|---|---|---|---|---|
| All samples | 0 | 96.8 / - | 98.51 / - | 97.91 / - |
| 100 | 199,900 | 0 / 75.49 | 100 / 89.67 | 64.8 / 84.68 |
| 200 | 199,800 | 3.67 / 90.45 | 98.86 / 92.45 | 65.35 / 91.74 |
| 500 | 199,500 | 49.42 / 92.55 | 95.44 / 94.62 | 79.24 / 93.89 |
| 1000 | 199,000 | 84.87 / 94.5 | 90.81 / 94.44 | 88.72 / 94.46 |
Table III: Gender prediction on the VISOB dataset (cell format: supervised / self-ensemble).

| Labeled samples | Unlabeled samples | MCR (%) | FCR (%) | ACC (%) |
|---|---|---|---|---|
| All samples | 0 | 92.16 / - | 75.94 / - | 84.65 / - |
| 100 | 108,335 | 100.00 / 80.01 | 0.00 / 57.32 | 53.70 / 69.51 |
| 200 | 108,235 | 92.47 / 76.42 | 9.11 / 71.33 | 53.87 / 74.07 |
| 500 | 107,935 | 84.79 / 85.19 | 47.71 / 66.41 | 67.63 / 76.49 |
| 1000 | 107,435 | 88.75 / 83.40 | 53.71 / 78.22 | 72.53 / 81.00 |
Table IV: Age-group prediction on the Adience dataset (5-fold cross-validation).

| Cross Validation | Exact Acc (%) baseline | Exact Acc (%) proposed | 1-OFF Acc (%) baseline | 1-OFF Acc (%) proposed |
|---|---|---|---|---|
| 1 | 49.55 | 54.39 | 84.60 | 87.30 |
| 2 | 48.28 | 55.10 | 87.67 | 90.14 |
| 3 | 54.46 | 52.87 | 88.68 | 88.71 |
| 4 | 48.46 | 51.64 | 86.41 | 85.59 |
| 5 | 57.42 | 63.76 | 90.06 | 91.88 |
| Overall | 51.63 ± 3.66 | 55.55 ± 4.28 | 87.48 ± 1.87 | 88.72 ± 2.18 |
| From [4] | 50.7 ± 5.1 | - | 84.7 ± 2.2 | - |
Table V: Gender prediction accuracy (%) on CelebA: FixMatch [33] vs. the proposed method (cell format: supervised / semi-supervised).

| Labeled samples | Unlabeled samples | ACC (%) FixMatch | ACC (%) Proposed |
|---|---|---|---|
| All samples | 0 | 97.8 / - | 97.91 / - |
| 100 | 199,900 | - / 91.90 | 64.8 / 84.68 |
| 200 | 199,800 | - / 92.85 | 65.35 / 91.74 |
| 500 | 199,500 | - / 93.25 | 79.24 / 93.89 |
| 1000 | 199,000 | - / 93.65 | 88.72 / 94.46 |
V Results
Table II and Table III show gender prediction accuracy for the supervised and self-ensemble models on face images from the CelebA dataset and ocular images from the VISOB dataset, respectively, along with the numbers of labeled and unlabeled samples used. It can be seen that with low numbers of labeled samples (100 and 200), the supervised model's accuracy drops to about 65% on CelebA and 54% on VISOB. In contrast, the proposed self-ensemble model predicts gender substantially better, with improvements of roughly 16 to 26 percentage points in total prediction accuracy (ACC%).
It can also be seen that with only 1,000 labeled samples, the proposed self-ensemble learning obtains 81.00% accuracy on VISOB and 94.46% on CelebA for gender prediction, which is close to the model trained in a supervised manner on the entire labeled training set (84.65% and 97.91% accuracy for VISOB and CelebA, respectively). The experimental results demonstrate that the self-ensemble gender prediction model trained using limited labeled data (as few as 100 samples) together with unlabeled data (about 108K-199K samples) obtains performance comparable to a supervised model trained on all 108K-199K labeled samples.
Table IV shows age-group prediction on the Adience dataset, with a deep learning model trained using supervised learning (on about 11K samples) as the baseline, compared against the proposed self-ensemble model under 5-fold cross-validation. The supervised baseline obtains an exact accuracy of 51.63 ± 3.66%, which is already higher than the model proposed in [4] with an exact accuracy of 50.7 ± 5.1%. On top of that, the proposed self-ensemble learning (using the 11K samples divided equally into labeled and unlabeled sets) obtains a better exact accuracy of 55.55 ± 4.28%, which is 3.92% more than the baseline.
Table V shows the gender prediction performance of the proposed self-ensemble method compared with FixMatch [33]; both models were evaluated on face images from the CelebA dataset. The proposed method outperforms FixMatch at 500 and 1,000 labeled samples (93.89% vs. 93.25% and 94.46% vs. 93.65%, respectively), while FixMatch is stronger when only 100 or 200 labeled samples are available.
The experimental results on gender and age prediction from face and ocular images suggest the efficacy of harnessing unlabeled data to improve the generalization accuracy of deep learning models.
VI Conclusion and Future Work
The generalization ability of deep learning models depends on training with large-scale, representative datasets. However, labeling a large-scale dataset requires human operators and is very expensive and time consuming. In this paper, we proposed a self-ensemble based semi-supervised learning model which is trained on labeled data in a supervised manner while consecutively harnessing unlabeled data in an unsupervised fashion through a multi-objective loss function. A case study on gender and age-group prediction using face and ocular images from the CelebA, Adience, and VISOB datasets suggests an increase in the generalization ability of the proposed deep learning model when trained with a small amount of labeled data (as few as 100 samples) along with a large unlabeled dataset. Deep learning models with the ability to harness unlabeled data also have applications in few-shot learning [34], bias mitigation for under-represented sub-populations in a dataset [35], and model adaptation to dynamic and temporal variations in operational data [36]. As future work, we will perform a comparative analysis of the proposed self-ensemble method, implemented with different backbone architectures such as ResNet and MobileNet, against other semi-supervised learning methods such as MixMatch [37], Noisy Student [38], and FixMatch [33].
References
- [1] Jain, A. K., Dass, S. C., and Nandakumar, K., “Soft biometric traits for personal recognition systems,” in Biometric authentication. Springer, 2004, pp. 731–738.
- [2] Reid, D., Samangooei, S., Chen, C., Nixon, M., and Ross, A., “Soft biometrics for surveillance: an overview,” in Handbook of statistics. Elsevier, 2013, vol. 31, pp. 327–352.
- [3] Dantcheva, A., Elia, P., and Ross, A., “What else does your biometric data reveal? a survey on soft biometrics,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 3, pp. 441–467, 2016.
- [4] Levi, G. and Hassner, T., “Age and gender classification using convolutional neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015, pp. 34–42.
- [5] Zhang, K., Gao, C., Guo, L., Sun, M., Yuan, X., Han, T. X., Zhao, Z., and Li, B., “Age group and gender estimation in the wild with deep ror architecture,” IEEE Access, vol. 5, pp. 22 492–22 503, 2017.
- [6] Narang, N. and Bourlai, T., “Gender and ethnicity classification using deep learning in heterogeneous face recognition,” in 2016 International Conference on Biometrics (ICB). IEEE, 2016, pp. 1–8.
- [7] Zhang, K., Liu, N., Yuan, X., Guo, X., Gao, C., and Zhao, Z., “Fine-grained age estimation in the wild with attention lstm networks,” arXiv preprint arXiv:1805.10445, 2018.
- [8] Masood, S., Gupta, S., Wajid, A., Gupta, S., and Ahmed, M., “Prediction of human ethnicity from facial images using neural networks,” in Data Engineering and Intelligent Computing. Springer, 2018, pp. 217–226.
- [9] Thomas, V., Chawla, N. V., Bowyer, K. W., and Flynn, P. J., “Learning to predict gender from iris images,” in 2007 First IEEE International Conference on Biometrics: Theory, Applications, and Systems. IEEE, 2007, pp. 1–5.
- [10] Lagree, S. and Bowyer, K. W., “Predicting ethnicity and gender from iris texture,” in 2011 IEEE International Conference on Technologies for Homeland Security (HST). IEEE, 2011, pp. 440–445.
- [11] Rattani, A., Reddy, N., and Derakhshani, R., “Convolutional neural networks for gender prediction from smartphone-based ocular images,” IET Biometrics, vol. 7, no. 5, pp. 423–430, 2018.
- [12] Bobeldyk, D. and Ross, A., “Predicting soft biometric attributes from 30 pixels: A case study in nir ocular images,” in 2019 IEEE Winter Applications of Computer Vision Workshops (WACVW), Jan 2019, pp. 116–124.
- [13] Tapia, J., Arellano, C., and Viedma, I., “Sex-classification from cell-phones periocular iris images,” arXiv preprint arXiv:1812.11702, 2018.
- [14] Mohammad, A. S. and Al-Ani, J. A., “Convolutional neural network for ethnicity classification using ocular region in mobile environment,” in 2018 10th Computer Science and Electronic Engineering (CEEC). IEEE, 2018, pp. 293–298.
- [15] Sgroi, A., Bowyer, K. W., and Flynn, P. J., “The prediction of old and young subjects from iris texture,” in 2013 International Conference on Biometrics (ICB). IEEE, 2013, pp. 1–5.
- [16] Rattani, A., Reddy, N., and Derakhshani, R., “Convolutional neural network for age classification from smart-phone based ocular images,” in 2017 IEEE International Joint Conference on Biometrics (IJCB). IEEE, 2017, pp. 756–761.
- [17] Bobeldyk, D. and Ross, A., “Predicting eye color from near infrared iris images,” in 2018 International Conference on Biometrics (ICB). IEEE, 2018, pp. 104–110.
- [18] Jain, A. K. and Park, U., “Facial marks: Soft biometric for face recognition,” in 2009 16th IEEE International Conference on Image Processing (ICIP). IEEE, 2009, pp. 37–40.
- [19] Gonzalez-Sosa, E., Fierrez, J., Vera-Rodriguez, R., and Alonso-Fernandez, F., “Facial soft biometrics for recognition in the wild: Recent works, annotation, and cots evaluation,” IEEE Transactions on Information Forensics and Security, vol. 13, no. 8, pp. 2001–2014, 2018.
- [20] Zhu, X. J., “Semi-supervised learning literature survey,” University of Wisconsin-Madison Department of Computer Sciences, Tech. Rep., 2005.
- [21] Rasmus, A., Berglund, M., Honkala, M., Valpola, H., and Raiko, T., “Semi-supervised learning with ladder networks,” in Advances in neural information processing systems, 2015, pp. 3546–3554.
- [22] Laine, S. and Aila, T., “Temporal ensembling for semi-supervised learning,” arXiv preprint arXiv:1610.02242, 2016.
- [23] Reed, S. E., Lee, H., Anguelov, D., Szegedy, C., Erhan, D., and Rabinovich, A., “Training deep neural networks on noisy labels with bootstrapping,” in ICLR 2015, 2015. [Online]. Available: http://arxiv.org/abs/1412.6596
- [24] Suzuki, J. and Isozaki, H., “Semi-supervised sequential labeling and segmentation using giga-word scale unlabeled data,” Proceedings of ACL-08: HLT, pp. 665–673, 2008.
- [25] Lee, H.-W., Kim, N.-r., and Lee, J.-H., “Deep neural network self-training based on unsupervised learning and dropout,” International Journal of Fuzzy Logic and Intelligent Systems, vol. 17, no. 1, pp. 1–9, 2017.
- [26] Ruder, S. and Plank, B., “Strong baselines for neural semi-supervised learning under domain shift,” arXiv preprint arXiv:1804.09530, 2018.
- [27] Zhou, Y. and Goldman, S., “Democratic co-learning,” in 16th IEEE International Conference on Tools with Artificial Intelligence. IEEE, 2004, pp. 594–602.
- [28] Simonyan, K. and Zisserman, A., “Very deep convolutional networks for large-scale image recognition,” 2015.
- [29] Liu, Z., Luo, P., Wang, X., and Tang, X., “Deep learning face attributes in the wild,” in Proceedings of International Conference on Computer Vision (ICCV), December 2015.
- [30] Nguyen, H., Reddy, N., Rattani, A., and Derakhshani, R., “VISOB 2.0 - second international competition on mobile ocular biometric recognition,” in IAPR ICPR, Rome, Italy, 2020, pp. 1–8.
- [31] Eidinger, E., Enbar, R., and Hassner, T., “Age and gender estimation of unfiltered faces,” IEEE Transactions on Information Forensics and Security, vol. 9, no. 12, pp. 2170–2179, 2014.
- [32] Kingma, D. P. and Ba, J., “Adam: A method for stochastic optimization,” 2017.
- [33] Sohn, K., Berthelot, D., Li, C.-L., Zhang, Z., Carlini, N., Cubuk, E. D., Kurakin, A., Zhang, H., and Raffel, C., “Fixmatch: Simplifying semi-supervised learning with consistency and confidence,” 2020.
- [34] Yao, F., “Cross-domain few-shot learning with unlabelled data,” 2021.
- [35] Krishnan, A., Almadan, A., and Rattani, A., “Understanding fairness of gender classification algorithms across gender-race groups,” in Proceedings of the IEEE Conference on Machine Learning and Applications, 2020.
- [36] Chen, M.-H., Li, B., Bao, Y., AlRegib, G., and Kira, Z., “Action segmentation with joint self-supervised temporal domain adaptation,” 2020.
- [37] Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., and Raffel, C., “Mixmatch: A holistic approach to semi-supervised learning,” 2019.
- [38] Xie, Q., Luong, M.-T., Hovy, E., and Le, Q. V., “Self-training with noisy student improves imagenet classification,” 2020.