

Face recognition via compact second order image gradient orientations

He-Feng Yin Jiangnan University, School of Artificial Intelligence and Computer Science, No. 1800, Lihu Avenue, Wuxi, China, 214122 Jiangsu Provincial Laboratory of Pattern Recognition and Computational Intelligence, No. 1800, Lihu Avenue, Wuxi, China, 214122 Xiao-Jun Wu Jiangnan University, School of Artificial Intelligence and Computer Science, No. 1800, Lihu Avenue, Wuxi, China, 214122 Jiangsu Provincial Laboratory of Pattern Recognition and Computational Intelligence, No. 1800, Lihu Avenue, Wuxi, China, 214122 Xiao-Ning Song Jiangnan University, School of Artificial Intelligence and Computer Science, No. 1800, Lihu Avenue, Wuxi, China, 214122 Jiangsu Provincial Laboratory of Pattern Recognition and Computational Intelligence, No. 1800, Lihu Avenue, Wuxi, China, 214122
Abstract

Conventional subspace learning approaches based on image gradient orientations employ only first-order gradient information. However, recent research on the human visual system (HVS) shows that the neural image is a landscape or surface whose geometric properties can be captured through second-order gradient information. Second-order image gradient orientations (SOIGO) can mitigate the adverse effect of noise in face images. To reduce the redundancy of SOIGO, we propose compact SOIGO (CSOIGO) by applying complex linear principal component analysis (PCA) to SOIGO. Combined with the collaborative representation based classification (CRC) algorithm, the classification performance of CSOIGO is further enhanced. CSOIGO is evaluated under real-world disguise, synthesized occlusion and mixed variations. Experimental results indicate that the proposed method is superior to competing approaches when few training samples are available, and even outperforms some prevailing deep neural network based approaches. The source code of CSOIGO is available at https://github.com/yinhefeng/SOIGO.

keywords:
face recognition, second order gradient, image gradient orientations, collaborative representation based classification

*Corresponding author: Xiao-Jun Wu, [email protected]

1 Introduction

Face recognition (FR) remains one of the most active research topics in the pattern recognition community. Feature extraction [1, 2, 3] is a key ingredient in FR, as well as in image fusion [4, 5, 6, 7] and many other computer vision tasks [8, 9, 10, 11, 12]. Though considerable progress has been made during the past decades, robust FR is still an open problem. Occlusion is ubiquitous in practical applications and dramatically degrades the performance of FR. To increase robustness to occlusion, researchers have developed a variety of approaches. Sparse representation based classification (SRC) [13] was developed for FR and shows robustness to occlusion and corruption in the test images when combined with a block partition technique. Naseem et al. [14] proposed a modular linear regression classification (Modular LRC) approach with a distance based evidence fusion (DEF) algorithm to tackle contiguous occlusion. To further enhance the performance of SRC, Li et al. [15] proposed a sparsity augmented weighted CRC approach for image recognition. Dong et al. [16] designed a low-rank Laplacian-uniform mixed (LR-LUM) model which characterizes complex errors as a combination of continuous structured noise and random noise. Yang et al. [17] presented nuclear norm based matrix regression (NMR), which employs a two-dimensional image-matrix-based error model rather than the one-dimensional pixel-based error model. The representation vector in NMR is regularized by the $\ell_{2}$ norm; to exploit the discriminative property of sparsity, Chen et al. [18] proposed a sparse regularized NMR (SR-NMR) which replaces the $\ell_{2}$ norm constraint on the representation vector with the $\ell_{1}$ norm. However, the above approaches require uncorrupted training images; when the training data are corrupted, their performance deteriorates. To handle the situation where both the training and test data are corrupted, low rank matrix recovery (LRMR) can be applied. Chen et al. [19] proposed a discriminative low rank representation (DLRR) method which introduces structural incoherence into the framework of low rank representation (LRR) [20]. Gao et al. [21] proposed to learn robust and discriminative low-rank representations (RDLRR) by introducing a low-rank constraint to simultaneously model the representation and each error term. Hu et al. [22] presented a robust FR method which employs dual nuclear norm low rank representation and a self-representation induced classifier. Yang et al. [23] developed a sparse low-rank component-based representation (SLCR) method for FR with low-quality images. Recently, Yang et al. [24] extended SLCR and proposed an FR technique named sparse individual low-rank component representation (SILR) for IoT-based systems. Inspired by LRR and deep learning techniques, Xia et al. [25] developed an embedded conformal deep low-rank auto-encoder (ECLAE) neural network architecture for matrix recovery.

Recently, image gradient orientations (IGO) have attracted much attention due to their impressive results in occluded FR. Wu et al. [26] presented a gradient direction-based hierarchical adaptive sparse and low-rank (GD-HASLR) model which operates in the image gradient direction domain rather than the image intensity domain. Li et al. [27] incorporated IGO into robust error coding and proposed an IGO-embedded structural error coding (IGO-SEC) model for FR with occlusion. Apart from these two works, Zhang et al. [28] designed Gradientfaces for FR under varying illumination conditions; in essence, Gradientfaces is based on IGO. Tzimiropoulos et al. [29] introduced the notion of subspace learning from IGO and developed approaches such as IGO-PCA and IGO-LDA. Vu [30] proposed a face representation approach called patterns of orientation difference (POD) which explores the relations of both gradient orientations and magnitudes. Zheng et al. [31] presented an online image alignment method via subspace learning from IGO. Qian et al. [32] presented a method called ID-NMR in which the local gradient distribution is exploited to decompose the image into several gradient images.

The above IGO based approaches only take first-order gradient information into account, thus neglecting second-order or higher-order gradient information. Recent research on human vision shows that the neural image is a landscape or surface whose geometric properties can be described by the local curvatures of differential geometry through second-order gradient information [33, 34]. Based on the second-order gradient, Huang et al. [33] presented a new local image descriptor called histograms of second order gradients (HSOG). Li et al. [35] proposed a patterned fabric defect detection method based on a second-order orientation-aware descriptor. Zhang et al. [36] designed a blind image quality assessment (IQA) method based on multi-order gradient statistics. Bastian et al. [37] developed a pedestrian detector utilizing both the first-order and second-order gradient information in the image. Nevertheless, the above second-order gradient based approaches do not involve dimensionality reduction, which results in redundant information. To alleviate this problem, we introduce PCA into the framework of SOIGO to extract more compact features. Moreover, we employ CRC as the final classifier due to its effectiveness and efficiency. Experimental results show that the proposed method (CSOIGO) is robust to real disguise, synthesized occlusion and mixed variations, and is superior to some popular deep neural network based approaches.

The remainder of this paper is arranged as follows. Section 2 reviews some related work. In Section 3, we present our proposed approach. Section 4 conducts several experiments to demonstrate the efficacy of our proposed method. Finally, conclusions are drawn in Section 5.

2 Related work

2.1 IGO-PCA

Given a set of images $\{\mathbf{Z}_{i}\}$ $(i=1,2,\ldots,N)$, where $N$ denotes the number of training images and $\mathbf{Z}_{i}\in\mathbb{R}^{m\times n}$. Suppose $\mathbf{I}(x,y)$ is the image intensity at pixel coordinates $(x,y)$ of sample $\mathbf{Z}_{i}$; the horizontal and vertical gradients can be obtained by the following formulations:

$\mathbf{G}_{i,x}=h_{x}*\mathbf{I}(x,y)$   (1)
$\mathbf{G}_{i,y}=h_{y}*\mathbf{I}(x,y)$

where $*$ denotes convolution, and $h_{x}$ and $h_{y}$ are filters employed to approximate the ideal differentiation operator along the image horizontal and vertical directions, respectively. In practice, image data are discrete, so the gradients are usually computed by differences between the gray values of adjacent pixels. Thus the horizontal and vertical gradients can be reformulated as:

$\mathbf{G}_{i,x}=\mathbf{I}(x+1,y)-\mathbf{I}(x,y)$   (2)
$\mathbf{G}_{i,y}=\mathbf{I}(x,y+1)-\mathbf{I}(x,y)$

Then the gradient orientation at pixel location $(x,y)$ is:

$\Phi_{i}(x,y)=\arctan\frac{\mathbf{G}_{i,y}}{\mathbf{G}_{i,x}},\quad i=1,2,\ldots,N$   (3)
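For concreteness, the following is a minimal NumPy sketch of Eqs. 2 and 3 (not the authors' released code). It assumes a 2-D grayscale array, takes the $x$ direction along the first array axis, and uses arctan2 so that orientations fall in $[0,2\pi)$, consistent with the range used below.

```python
import numpy as np

def first_order_igo(img):
    """Minimal sketch of Eqs. 2-3: forward-difference gradients and the
    gradient orientation map. Assumes img is a 2-D grayscale array; the
    x direction is taken along the first array axis (an assumption)."""
    img = img.astype(np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:-1, :] = img[1:, :] - img[:-1, :]   # I(x+1, y) - I(x, y)
    gy[:, :-1] = img[:, 1:] - img[:, :-1]   # I(x, y+1) - I(x, y)
    # arctan2 resolves the quadrant; the modulo maps angles to [0, 2*pi)
    phi = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    return gx, gy, phi
```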

For each image $\mathbf{Z}_{i}$ of size $m\times n$, we obtain a corresponding gradient orientation matrix $\Phi_{i}\in[0,2\pi)^{m\times n}$. The sample vectors are then obtained by converting the 2D matrices $\Phi_{i}$ into 1D vectors $\phi_{i}$. Following Ref. 29, we also define the mapping from $[0,2\pi)^{K}$ $(K=m\times n)$ onto a subset of the complex sphere with radius $\sqrt{K}$:

$\boldsymbol{t}_{i}(\phi_{i})=e^{j\phi_{i}}$   (4)

where $e^{j\phi_{i}}=[e^{j\phi_{1}},e^{j\phi_{2}},\ldots,e^{j\phi_{K}}]^{T}$, and $e^{j\theta}$ denotes the Euler formula, i.e., $e^{j\theta}=\cos\theta+j\sin\theta$. Then we can apply complex linear PCA to the transformed $\boldsymbol{t}_{i}$. That is, we seek a set of $d<K$ orthonormal bases $\mathbf{U}=[\mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{d}]\in\mathbb{C}^{K\times d}$ by solving the following problem:

$\epsilon(\mathbf{U})=\left\|\mathbf{X}-\mathbf{U}\mathbf{U}^{H}\mathbf{X}\right\|_{F}^{2}$   (5)

where $\mathbf{X}=[\boldsymbol{t}_{1},\boldsymbol{t}_{2},\ldots,\boldsymbol{t}_{N}]\in\mathbb{C}^{K\times N}$, $\mathbf{U}^{H}$ is the conjugate transpose of $\mathbf{U}$, and $\left\|\cdot\right\|_{F}$ denotes the Frobenius norm. Eq. 5 can be reformulated as:

$\mathbf{U}_{o}=\arg\max_{\mathbf{U}}\ \mathrm{tr}(\mathbf{U}^{H}\mathbf{X}\mathbf{X}^{H}\mathbf{U}),\ \text{s.t.}\ \mathbf{U}^{H}\mathbf{U}=\mathbf{I}$   (6)

The solution is given by the $d$ eigenvectors of $\mathbf{X}\mathbf{X}^{H}$ corresponding to the $d$ largest eigenvalues. The $d$-dimensional embedding $\mathbf{Y}\in\mathbb{C}^{d\times N}$ of $\mathbf{X}$ is then produced by $\mathbf{Y}=\mathbf{U}^{H}\mathbf{X}$.
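The following sketch illustrates one way to realize the mapping of Eq. 4 and the complex linear PCA of Eqs. 5 and 6 in NumPy. It obtains the leading eigenvectors of $\mathbf{X}\mathbf{X}^{H}$ through the smaller $N\times N$ Gram matrix, a standard trick when $K\gg N$; it is not the authors' released implementation.

```python
import numpy as np

def igo_pca(phi_list, d):
    """Sketch of Eqs. 4-6: map orientation maps onto the complex sphere and
    learn d orthonormal complex bases. phi_list holds m-by-n orientation maps."""
    X = np.stack([np.exp(1j * phi.ravel()) for phi in phi_list], axis=1)  # K x N
    G = X.conj().T @ X                        # N x N Hermitian Gram matrix
    evals, evecs = np.linalg.eigh(G)          # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:d]         # indices of the d largest eigenvalues
    U = X @ evecs[:, idx]                     # eigenvectors of X X^H (unnormalized)
    U /= np.linalg.norm(U, axis=0, keepdims=True)
    Y = U.conj().T @ X                        # d x N embedding, Y = U^H X
    return U, Y
```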

2.2 Collaborative representation based classification

During the past few years, representation based classification methods (RBCM) have attracted much attention in the pattern recognition community. The pioneering work is SRC [13], in which an $\ell_{1}$ norm constraint is employed to obtain the sparse coefficients of the test data. Afterwards, Zhang et al. [38] argued that it is the collaborative representation mechanism rather than the $\ell_{1}$ norm constraint that makes SRC successful for FR. Therefore, they developed the CRC method, which replaces the $\ell_{1}$ norm constraint with the $\ell_{2}$ norm. The objective function of CRC is formulated as follows,

$\min_{\boldsymbol{\alpha}}\left\{\left\|\boldsymbol{y}-\mathbf{D}\boldsymbol{\alpha}\right\|_{2}^{2}+\lambda\left\|\boldsymbol{\alpha}\right\|_{2}^{2}\right\}$   (7)

where $\boldsymbol{y}$ is the test data, $\mathbf{D}$ is the dictionary containing all the training data from $C$ classes, and $\lambda$ is a balancing parameter. Eq. 7 has the following closed-form solution,

$\boldsymbol{\alpha}=(\mathbf{D}^{T}\mathbf{D}+\lambda\mathbf{I})^{-1}\mathbf{D}^{T}\boldsymbol{y}$   (8)

In the classification stage, apart from the class-specific reconstruction error $\left\|\boldsymbol{y}-\mathbf{D}_{j}\boldsymbol{\alpha}_{j}\right\|_{2}$, $j=1,2,\ldots,C$, where $\boldsymbol{\alpha}_{j}$ is the coefficient vector corresponding to the $j$th class, Zhang et al. [38] found that $\left\|\boldsymbol{\alpha}_{j}\right\|_{2}$ also contains discriminative information for classification. Thus, they presented the following regularized residual for classification,

$\text{identity}(\boldsymbol{y})=\arg\min_{j}\frac{\left\|\boldsymbol{y}-\mathbf{D}_{j}\boldsymbol{\alpha}_{j}\right\|_{2}}{\left\|\boldsymbol{\alpha}_{j}\right\|_{2}}$   (9)
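A compact sketch of CRC as summarized in Eqs. 7-9 is given below; the regularization value `lam` is purely illustrative rather than a tuned setting.

```python
import numpy as np

def crc_classify(D, y, labels, lam=1e-3):
    """Sketch of CRC (Eqs. 7-9). D: d x n dictionary of training features,
    y: test feature vector, labels: class label of each column of D."""
    labels = np.asarray(labels)
    n = D.shape[1]
    # Closed-form collaborative coding (Eq. 8)
    alpha = np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ y)
    residuals = {}
    for c in np.unique(labels):
        mask = labels == c
        num = np.linalg.norm(y - D[:, mask] @ alpha[mask])
        residuals[c] = num / (np.linalg.norm(alpha[mask]) + 1e-12)  # Eq. 9
    return min(residuals, key=residuals.get)
```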

3 Proposed method

Previous studies reveal that gradient information of different orders characterizes different structural features of natural scenes. The first-order gradient is related to the slope and elasticity of a surface, while the second-order gradient conveys curvature-related geometric properties. Fig. 1 depicts two images and their corresponding landscapes plotted as surfaces; one can see that these landscapes contain a variety of local shapes, such as cliffs, ridges, summits, valleys and basins. Inspired by these observations, we propose a new FR method which exploits SOIGO. The second-order gradients are obtained from the first-order gradients defined in Eq. 2,

$\mathbf{G}_{i,x}^{2}=\mathbf{G}_{i,x}(x+1,y)-\mathbf{G}_{i,x}(x,y)$   (10)
$\mathbf{G}_{i,y}^{2}=\mathbf{G}_{i,y}(x,y+1)-\mathbf{G}_{i,y}(x,y)$

where $\mathbf{G}_{i,x}^{2}$ and $\mathbf{G}_{i,y}^{2}$ are the second-order gradients along the horizontal and vertical directions, respectively. The SOIGO is then computed as follows,

$\Phi_{i}^{2}(x,y)=\arctan\frac{\mathbf{G}_{i,y}^{2}}{\mathbf{G}_{i,x}^{2}}$   (11)
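A short sketch of Eqs. 10 and 11 follows, reusing the `first_order_igo` helper from the sketch in Sec. 2.1; it is again an illustrative implementation with $x$ taken along the first array axis.

```python
import numpy as np

def soigo(img):
    """Sketch of Eqs. 10-11: second-order gradients from the first-order
    ones, followed by the second-order gradient orientation map."""
    gx, gy, _ = first_order_igo(img)           # first-order gradients (Eq. 2)
    gxx = np.zeros_like(gx)
    gyy = np.zeros_like(gy)
    gxx[:-1, :] = gx[1:, :] - gx[:-1, :]       # G_x(x+1, y) - G_x(x, y)
    gyy[:, :-1] = gy[:, 1:] - gy[:, :-1]       # G_y(x, y+1) - G_y(x, y)
    phi2 = np.mod(np.arctan2(gyy, gxx), 2 * np.pi)   # Eq. 11, in [0, 2*pi)
    return phi2
```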
Figure 1: Original images (left part) and their surface plots (right part).
Figure 2: Original face image and its gradient orientations of the first and second orders, respectively.
Figure 3: t-SNE visualization of (a) the original data, (b) the first-order IGO with the mapping defined in Eq. 4, and (c) the SOIGO with the mapping defined in Eq. 4. For better visualization, please refer to the electronic version of this paper.

Fig. 2 presents an original face image and its gradient orientations of the first and second orders. One can see that, compared with the first-order IGO, the SOIGO significantly suppresses noise in the orientation domain. Moreover, the SOIGO preserves finer details than the first-order IGO, e.g., in the areas around the eyes, nose and mouth.

To further illustrate the effectiveness of SOIGO, Fig. 3 visualizes the original data, the first-order IGO and the SOIGO on the AR database using the t-SNE algorithm [39]. The data are taken from the first ten subjects of the AR database; for each person, the seven non-occluded face images in Session 1 are used. These images are then occluded by a square baboon image covering 30% of the face area (for detailed experimental settings, please refer to subsection 4.3). As can be seen from Fig. 3, though the first-order IGO looks better than the original data, clusters of different classes are mixed together. In Fig. 3 (c), the clusters of individual classes are more compact than those in Fig. 3 (b), which is beneficial for subsequent classification.
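A visualization along the lines of Fig. 3 can be produced as sketched below; this uses scikit-learn's t-SNE, and concatenating the real and imaginary parts of the mapped features is our assumption for feeding complex vectors to a real-valued implementation.

```python
import numpy as np
from sklearn.manifold import TSNE

def tsne_of_mapped_orientations(phi_list):
    """Sketch of the Fig. 3 visualization: map orientation maps via Eq. 4,
    split into real/imaginary parts, and embed into 2-D with t-SNE."""
    X = np.stack([np.exp(1j * phi.ravel()) for phi in phi_list])   # N x K
    X_real = np.hstack([X.real, X.imag])                           # N x 2K
    return TSNE(n_components=2, init="pca", random_state=0).fit_transform(X_real)
```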

The procedure for obtaining the projection matrix $\mathbf{U}$ is the same as in IGO-PCA. For a test image $\mathbf{Z}_{t}$, we first compute its SOIGO and obtain $\boldsymbol{t}$ via the mapping defined by Eq. 4. The embeddings of the training and test images are derived as follows,

$\mathbf{Y}=\mathbf{U}^{H}\mathbf{X}$   (12)
$\boldsymbol{z}=\mathbf{U}^{H}\boldsymbol{t}$   (13)

where $\mathbf{Y}\in\mathbb{C}^{d\times N}$ and $\boldsymbol{z}\in\mathbb{C}^{d\times 1}$. To make the embeddings of the training and test images suitable for CRC, we employ both the real and imaginary parts of $\mathbf{Y}$ and $\boldsymbol{z}$ as the input of CRC. Let

$\mathbf{D}=\begin{bmatrix}\text{real}(\mathbf{Y})\\ \text{imag}(\mathbf{Y})\end{bmatrix}$   (14)
$\boldsymbol{y}=\begin{bmatrix}\text{real}(\boldsymbol{z})\\ \text{imag}(\boldsymbol{z})\end{bmatrix}$   (15)

where $\text{real}(\cdot)$ and $\text{imag}(\cdot)$ denote the real and imaginary parts of a complex number, respectively. We then compute the representation coefficient vector of $\boldsymbol{y}$ over $\mathbf{D}$ and check which class yields the smallest regularized residual. The complete process of the proposed CSOIGO is outlined in Algorithm 1.

  Input: A set of $N$ training images $\{\mathbf{Z}_{i}\}$ $(i=1,2,\ldots,N)$ from $C$ classes, a test image $\mathbf{Z}_{t}$, the number of principal components $d$, and the regularization parameter $\lambda$ for CRC.
      1. Obtain the SOIGO $\Phi_{i}^{2}$ of each training image and convert it to a 1D vector $\phi_{i}^{2}$.
      2. Compute $\boldsymbol{t}_{i}(\phi_{i}^{2})=e^{j\phi_{i}^{2}}$; the mapped SOIGO of all training images form the matrix $\mathbf{X}=[\boldsymbol{t}_{1},\boldsymbol{t}_{2},\ldots,\boldsymbol{t}_{N}]$.
      3. Obtain the projection matrix $\mathbf{U}$ via Eq. 6.
      4. For the test image $\mathbf{Z}_{t}$, obtain its SOIGO $\Phi_{t}^{2}$, convert it to the 1D vector $\phi_{t}^{2}$, and compute $\boldsymbol{t}=e^{j\phi_{t}^{2}}$.
      5. Obtain the embeddings of the training and test images via Eqs. 12 and 13.
      6. Obtain $\mathbf{D}$ and $\boldsymbol{y}$ by Eqs. 14 and 15.
      7. Code $\boldsymbol{y}$ over $\mathbf{D}$ by Eq. 8.
      8. Compute the regularized residuals $r_{j}=\frac{\left\|\boldsymbol{y}-\mathbf{D}_{j}\boldsymbol{\alpha}_{j}\right\|_{2}}{\left\|\boldsymbol{\alpha}_{j}\right\|_{2}}$, $j=1,2,\ldots,C$.
  Output: $\text{identity}(\mathbf{Z}_{t})=\arg\min_{j} r_{j}$.
Algorithm 1 CSOIGO
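Putting the pieces together, the following sketch mirrors Algorithm 1 end to end, reusing the `soigo`, `igo_pca` and `crc_classify` helpers from the earlier sketches; `d` and `lam` are illustrative values rather than the settings tuned in the experiments.

```python
import numpy as np

def csoigo_predict(train_imgs, train_labels, test_img, d=100, lam=1e-3):
    """End-to-end sketch of Algorithm 1 (CSOIGO)."""
    phi2_train = [soigo(img) for img in train_imgs]            # step 1
    U, Y = igo_pca(phi2_train, d)                              # steps 2, 3 and 5
    t = np.exp(1j * soigo(test_img).ravel())                   # step 4
    z = U.conj().T @ t                                         # step 5 (Eq. 13)
    D = np.vstack([Y.real, Y.imag])                            # Eq. 14
    y = np.concatenate([z.real, z.imag])                       # Eq. 15
    return crc_classify(D, y, np.asarray(train_labels), lam)   # steps 7-8
```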

4 Experimental results and analysis

In this section, experiments are conducted under different scenarios to validate the effectiveness of the proposed method.

4.1 Recognition with real disguise

The AR database contains over 4000 images of 126 subjects. For each individual, 26 images were taken in two separate sessions, 13 images per session: 3 with sunglasses, 3 with scarves, and the remaining 7 with different illumination and expression changes. The 13 images of one subject from Session 1 are shown in Fig. 4. Each image is 165$\times$120 pixels. In our experiments, we choose a subset of the AR database consisting of 50 men and 50 women, and all images are resized to 42$\times$30 pixels. The neutral face image of each subject is used as training data, and the sunglasses/scarf occluded images in each session are used for testing. The proposed method is compared with other state-of-the-art approaches, including HQPAMI [40], NR [41], ProCRC [42], F-LR-IRNNLS [43], EGSNR [44], LDMR [45], and GD-HASLR [26]. To better illustrate the superiority of CSOIGO, we also present the results of IGO-PCA-NNC [29], IGO-PCA-CRC and SOIGO-PCA-NNC. Table 1 summarizes the experimental results. One can see that CSOIGO achieves the highest recognition accuracy in all cases except the sunglasses scenario of Session 1. It has the best overall result, with overall accuracy gains over GD-HASLR and IGO-PCA-CRC of 4.5% and 2.67%, respectively. These results indicate that the proposed CSOIGO is robust to real disguise even when only a single training sample per person is available.

Next, we utilize two neutral face images per subject, one from Session 1 and one from Session 2, for training; the test sets are identical to those of the first experiment. The results are reported in Table 2. As can be seen from Table 2, CSOIGO yields the best overall recognition accuracy and outperforms GD-HASLR by 2.92%.

Table 1: Recognition accuracy (%) of competing approaches on the AR database when only one neutral face image per subject from Session 1 is used as the training sample; the dimension that leads to the best result for IGO and SOIGO based approaches is given in parentheses.
Methods | Sunglasses, Session 1 | Sunglasses, Session 2 | Scarf, Session 1 | Scarf, Session 2 | Overall
HQPAMI[40] | 56.67 | 38.00 | 38.00 | 22.33 | 38.75
NR[41] | 28.33 | 16.67 | 29.67 | 17.33 | 23.00
ProCRC[42] | 53.07 | 31.00 | 18.67 | 7.33 | 27.52
F-LR-IRNNLS[43] | 88.67 | 60.33 | 67.00 | 49.67 | 66.42
EGSNR[44] | 84.00 | 54.00 | 70.33 | 48.33 | 64.16
LDMR[45] | 68.33 | 45.67 | 59.67 | 34.00 | 51.92
GD-HASLR[26] | 92.00 | 66.67 | 82.67 | 58.67 | 75.00
IGO-PCA-NNC[29] | 89.00 (99) | 69.00 (99) | 73.33 (97) | 53.33 (96) | 71.17
IGO-PCA-CRC | 93.00 (85) | 74.33 (92) | 81.67 (88) | 58.33 (95) | 76.83
SOIGO-PCA-NNC | 88.67 (92) | 73.33 (96) | 80.33 (99) | 61.00 (88) | 75.83
CSOIGO | 92.67 (89) | 76.67 (93) | 83.33 (75) | 65.33 (99) | 79.50
Table 2: Recognition accuracy (%) of competing approaches on the AR database when two neutral face images (from Sessions 1 and 2) per subject are used as training samples; the dimension that leads to the best result for IGO and SOIGO based approaches is given in parentheses.
Methods | Sunglasses, Session 1 | Sunglasses, Session 2 | Scarf, Session 1 | Scarf, Session 2 | Overall
HQPAMI[40] | 61.33 | 59.33 | 44.67 | 48.00 | 53.33
NR[41] | 34.00 | 33.33 | 33.00 | 35.67 | 34.00
ProCRC[42] | 53.00 | 54.67 | 18.00 | 17.67 | 35.84
F-LR-IRNNLS[43] | 90.33 | 87.67 | 78.67 | 76.00 | 83.17
EGSNR[44] | 88.00 | 89.33 | 80.00 | 73.00 | 82.58
LDMR[45] | 71.00 | 63.67 | 64.00 | 61.00 | 64.92
GD-HASLR[26] | 93.00 | 93.33 | 82.67 | 84.00 | 88.25
IGO-PCA-NNC[29] | 93.00 (182) | 91.67 (191) | 78.00 (199) | 74.00 (193) | 84.17
IGO-PCA-CRC | 96.00 (128) | 95.33 (116) | 85.00 (190) | 84.00 (160) | 90.08
SOIGO-PCA-NNC | 96.33 (187) | 92.67 (197) | 86.33 (166) | 83.67 (189) | 89.75
CSOIGO | 97.33 (144) | 95.67 (124) | 86.00 (119) | 85.67 (198) | 91.17
Figure 4: Some example face images from the AR database: (a) the neutral image of a subject from Session 1; (b) face images with illumination and expression variations; (c) images occluded by sunglasses/scarf.

4.2 Comparison with CNN-based approaches

In this subsection, we compare our proposed method with prevailing deep learning-based approaches. The first is VGGFace [46], which is based on VGGNet [47] and has 16 convolutional layers, five max-pooling layers, three fully-connected layers and a final linear layer with softmax activation. In our experiments, we employ FC6 and FC7 for feature extraction. The second is Lightened CNN [48], which has low computational complexity. Lightened CNN consists of two different models, i.e., Model A and Model B. Model A is based on AlexNet [49] and contains four convolutional layers using the max feature map (MFM) activation function, four max-pooling layers, two fully-connected layers, and a linear layer with softmax activation in the output. Model B is based on the Network in Network model [50] and consists of five convolutional layers using the MFM activation function, four convolutional layers for dimensionality reduction, five max-pooling layers, two fully-connected layers, and a linear layer with softmax activation in the output. For Lightened CNN, FC1 is used for feature extraction. All the features extracted by VGGFace and Lightened CNN are classified using the nearest neighbor classifier with cosine distance. As in subsection 4.1, the first experiment uses one neutral face image of each subject for training on the AR database, and the results are summarized in Table 3. Table 4 lists the results when two neutral face images are used for training. From Tables 3 and 4, we can see that VGGFace and Lightened CNN perform better in the scarf scenario than in the sunglasses scenario. This indicates that they have difficulty handling upper-face occlusion, a phenomenon also observed in Ref. 51. For Lightened CNN, Model A outperforms Model B. Whether one or two neutral face images per subject are used for training, our proposed CSOIGO achieves the best overall recognition accuracy.

Table 3: Comparison with CNN-based approaches on the AR database when only one neutral face image per subject from Session 1 is used as the training sample; the dimension that leads to the best result for IGO and SOIGO based approaches is given in parentheses.
Methods | Sunglasses, Session 1 | Sunglasses, Session 2 | Scarf, Session 1 | Scarf, Session 2 | Overall
VGGFace FC6[46] | 54.00 | 45.00 | 91.67 | 88.00 | 69.67
VGGFace FC7[46] | 45.67 | 40.00 | 88.67 | 84.00 | 64.59
Lightened CNN (A)[48] | 67.33 | 56.00 | 87.00 | 82.33 | 73.17
Lightened CNN (B)[48] | 36.33 | 31.33 | 80.67 | 73.67 | 55.50
GD-HASLR[26] | 92.00 | 66.67 | 82.67 | 58.67 | 75.00
IGO-PCA-NNC[29] | 89.00 (99) | 69.00 (99) | 73.33 (97) | 53.33 (96) | 71.17
IGO-PCA-CRC | 93.00 (85) | 74.33 (92) | 81.67 (88) | 58.33 (95) | 76.83
SOIGO-PCA-NNC | 88.67 (92) | 73.33 (96) | 80.33 (99) | 61.00 (88) | 75.83
CSOIGO | 92.67 (89) | 76.67 (93) | 83.33 (75) | 65.33 (99) | 79.50
Table 4: Comparison with CNN-based approaches on the AR database when two neutral face images (from Sessions 1 and 2) per subject are used as training samples; the dimension that leads to the best result for IGO and SOIGO based approaches is given in parentheses.
Methods | Sunglasses, Session 1 | Sunglasses, Session 2 | Scarf, Session 1 | Scarf, Session 2 | Overall
VGGFace FC6[46] | 44.67 | 51.00 | 91.67 | 93.33 | 70.17
VGGFace FC7[46] | 41.67 | 44.67 | 88.67 | 89.33 | 66.08
Lightened CNN (A)[48] | 64.67 | 58.33 | 86.67 | 85.33 | 73.75
Lightened CNN (B)[48] | 38.67 | 38.00 | 81.67 | 79.33 | 59.42
GD-HASLR[26] | 93.00 | 93.33 | 82.67 | 84.00 | 88.25
IGO-PCA-NNC[29] | 93.00 (182) | 91.67 (191) | 78.00 (199) | 74.00 (193) | 84.17
IGO-PCA-CRC | 96.00 (128) | 95.33 (116) | 85.00 (190) | 84.00 (160) | 90.08
SOIGO-PCA-NNC | 96.33 (187) | 92.67 (197) | 86.33 (166) | 83.67 (189) | 89.75
CSOIGO | 97.33 (144) | 95.67 (124) | 86.00 (119) | 85.67 (198) | 91.17

4.3 Random block occlusion

Here, we conduct additional experiments using synthetically occluded face images as test data. For each subject, the seven non-occluded face images of the AR dataset in Session 1 are used for training and the other seven non-occluded images in Session 2 are used for testing; the image size is 42$\times$30 pixels. Block occlusion is simulated by placing a square baboon image on each test image. The location of the occlusion is randomly chosen and is unknown during training. We consider different sizes of the occluding object such that it covers from 30% to 50% of the face area; some occluded face images are shown in Fig. 5. The above experimental results indicate that GD-HASLR is superior to the other competing approaches; therefore, in this subsection and the following subsection, we report the result of GD-HASLR for comparison. Recognition results for different levels of occlusion are shown in Table 5. One can see that CSOIGO outperforms GD-HASLR by a large margin, and the performance gain grows with the increasing percentage of occlusion. Moreover, SOIGO-PCA-NNC outperforms IGO-PCA-NNC and CSOIGO performs better than IGO-PCA-CRC, which demonstrates that SOIGO is more robust than IGO when dealing with artificial occlusion.

To illustrate the performance of the IGO and SOIGO based approaches under different numbers of features, Fig. 6 plots the recognition accuracy against the number of features when the occlusion percentage is 30%. As the number of features increases, CSOIGO consistently outperforms the other three competing approaches.

Figure 5: Original face image and its occluded versions with different occlusion percentages; from the second image to the last, the percentages are 30%, 40% and 50%, respectively.
Table 5: Recognition accuracy (%) of competing methods under different percentages of occlusion on the AR database; the dimension that leads to the best result for IGO and SOIGO based approaches is given in parentheses.
Methods | 30% occlusion | 40% occlusion | 50% occlusion
GD-HASLR[26] | 81.29 | 71.14 | 56.14
IGO-PCA-NNC[29] | 86.14 (588) | 80.57 (606) | 66.29 (321)
IGO-PCA-CRC | 89.14 (205) | 80.14 (185) | 71.29 (569)
SOIGO-PCA-NNC | 88.86 (458) | 84.57 (575) | 73.29 (693)
CSOIGO | 93.57 (423) | 87.00 (533) | 76.57 (698)
Figure 6: Recognition accuracy versus the number of features when the occlusion percentage is 30%.

4.4 Recognition with mixed variations

In this subsection, we evaluate the proposed CSOIGO and the compared approaches under mixed variations. As shown in Figs. 4 (a) and (b), the first seven images per subject in Session 1 have variations of expression and illumination; thus, the seven unoccluded images from Session 1 of the AR database are selected for training and the seven undisguised images from Session 2 are used for testing. Experimental results of the compared methods are shown in Table 6. We can see that CSOIGO has the best performance. Specifically, it achieves improvements of 1.86% and 0.86% over GD-HASLR and IGO-PCA-CRC, respectively.

Table 6: Recognition accuracy (%) of compared approaches with mixed variations on the AR database; the dimension that leads to the best result for IGO and SOIGO based approaches is given in parentheses.
Methods | Accuracy (%)
GD-HASLR[26] | 96.71
IGO-PCA-NNC[29] | 93.14 (478)
IGO-PCA-CRC | 97.71 (100)
SOIGO-PCA-NNC | 94.71 (371)
CSOIGO | 98.57 (171)

5 Conclusion

In this paper, we present a new method for occluded face recognition, namely CSOIGO, by exploiting the second order gradient information. SOIGO is robust to real disguise, synthesized occlusion and mixed variations. By employing CRC as the final classifier, our proposed method achieves impressive results in various scenarios and even outperforms some deep neural network based approaches.

In future work, we will introduce SOIGO into other popular subspace learning approaches, e.g., linear discriminant analysis (LDA), to extract more discriminative features. Moreover, other variants of CRC will also be investigated to further enhance the performance of recognition.

Acknowledgements.
This work was supported in part by the National Natural Science Foundation of China (Grant 62020106012, Grant U1836218, Grant 61902153, Grant 61876072, Grant 62006097, Grant 61672265), in part by the Fundamental Research Funds for the Central Universities (Grant JUSRP121104), in part by the Natural Science Foundation of Jiangsu Province (Grant BK20200593), and the 111 Project of Ministry of Education of China (Grant B12018).

References

  • [1] Y.-J. Zheng, J.-Y. Yang, J. Yang, et al., “Nearest neighbour line nonparametric discriminant analysis for feature extraction,” Electronics Letters 42(12), 679–680 (2006).
  • [2] Y.-j. Zheng, J. Yang, J.-y. Yang, et al., “A reformative kernel fisher discriminant algorithm and its application to face recognition,” Neurocomputing 69(13-15), 1806–1810 (2006).
  • [3] X.-J. Wu, J. Kittler, J.-Y. Yang, et al., “A new direct lda (d-lda) algorithm for feature extraction in face recognition,” in Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., 4, 545–548, IEEE (2004).
  • [4] X. Luo, Z. Zhang, and X. Wu, “A novel algorithm of remote sensing image fusion based on shift-invariant shearlet transform and regional selection,” AEU-International Journal of Electronics and Communications 70(2), 186–197 (2016).
  • [5] X. Luo, Z. Zhang, B. Zhang, et al., “Image fusion with contextual statistical similarity and nonsubsampled shearlet transform,” IEEE Sensors Journal 17(6), 1760–1771 (2017).
  • [6] H. Li and X.-J. Wu, “Multi-focus image fusion using dictionary learning and low-rank representation,” in International Conference on Image and Graphics, 675–686, Springer, Cham (2017).
  • [7] H. Li, X.-J. Wu, and T. Durrani, “Nestfuse: An infrared and visible image fusion architecture based on nest connection and spatial/channel attention models,” IEEE Transactions on Instrumentation and Measurement 69(12), 9645–9656 (2020).
  • [8] C. Li, W. Yuan, A. Bovik, et al., “No-reference blur index using blur comparisons,” Electronics letters 47(17), 962–963 (2011).
  • [9] S.-G. Chen and X.-J. Wu, “A new fuzzy twin support vector machine for pattern classification,” International Journal of Machine Learning and Cybernetics 9(9), 1553–1564 (2018).
  • [10] J. Sun, C. Li, X.-J. Wu, et al., “An effective method of weld defect detection and classification based on machine vision,” IEEE Transactions on Industrial Informatics 15(12), 6322–6333 (2019).
  • [11] M. Wang, S. Wang, and X.-J. Wu, “Initial results on fuzzy morphological associative memories,” Journal of Electronics 31(005), 690–693 (2003).
  • [12] J. Sun, W. Fang, and X.-J. Wu, “Quantum-behaved particle swarm optimization: principle and applications,” (2011).
  • [13] J. Wright, A. Y. Yang, A. Ganesh, et al., “Robust face recognition via sparse representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence 31(2), 210–227 (2009).
  • [14] I. Naseem, R. Togneri, and M. Bennamoun, “Linear regression for face recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence 32(11), 2106–2112 (2010).
  • [15] Z.-Q. Li, J. Sun, X.-J. Wu, et al., “Sparsity augmented weighted collaborative representation for image classification,” Journal of Electronic Imaging 28(5), 053032 (2019).
  • [16] J. Dong, H. Zheng, and L. Lian, “Low-rank laplacian-uniform mixed model for robust face recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11897–11906 (2019).
  • [17] J. Yang, L. Luo, J. Qian, et al., “Nuclear norm based matrix regression with applications to face recognition with occlusion and illumination changes,” IEEE Transactions on Pattern Analysis and Machine Intelligence 39(1), 156–171 (2017).
  • [18] Z. Chen, X.-J. Wu, and J. Kittler, “A sparse regularized nuclear norm based matrix regression for face recognition with contiguous occlusion,” Pattern Recognition Letters (2019).
  • [19] J. Chen and Z. Yi, “Sparse representation for face recognition by discriminative low-rank matrix recovery,” Journal of Visual Communication and Image Representation 25(5), 763–773 (2014).
  • [20] G. Liu, Z. Lin, S. Yan, et al., “Robust recovery of subspace structures by low-rank representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence 35(1), 171–184 (2012).
  • [21] G. Gao, J. Yang, X.-Y. Jing, et al., “Learning robust and discriminative low-rank representations for face recognition with occlusion,” Pattern Recognition 66, 129–143 (2017).
  • [22] Z. Hu, G. Gao, H. Gao, et al., “Robust face recognition via dual nuclear norm low-rank representation and self-representation induced classifier,” in 2018 5th IEEE International Conference on Cloud Computing and Intelligence Systems (CCIS), 920–924, IEEE (2018).
  • [23] S. Yang, L. Zhang, L. He, et al., “Sparse low-rank component-based representation for face recognition with low-quality images,” IEEE Transactions on Information Forensics and Security 14(1), 251–261 (2018).
  • [24] S. Yang, Y. Wen, L. He, et al., “Sparse individual low-rank component representation for face recognition in iot-based system,” IEEE Internet of Things Journal (2021).
  • [25] H. Xia, G. Feng, J.-x. Cai, et al., “Embedded conformal deep low-rank auto-encoder network for matrix recovery,” Pattern Recognition Letters 132, 38–45 (2020).
  • [26] C. Y. Wu and J. J. Ding, “Occluded face recognition using low-rank regression with generalized gradient direction,” Pattern Recognition 80, 256–268 (2018).
  • [27] X.-X. Li, P. Hao, L. He, et al., “Image gradient orientations embedded structural error coding for face recognition with occlusion,” Journal of Ambient Intelligence and Humanized Computing , 1–19 (2019).
  • [28] T. Zhang, Y. Y. Tang, B. Fang, et al., “Face recognition under varying illumination using gradientfaces,” IEEE Transactions on Image Processing 18(11), 2599–2606 (2009).
  • [29] G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, “Subspace learning from image gradient orientations,” IEEE Transactions on Pattern Analysis and Machine Intelligence 34(12), 2454–2466 (2012).
  • [30] N.-S. Vu, “Exploring patterns of gradient orientations and magnitudes for face recognition,” IEEE Transactions on Information Forensics and Security 8(2), 295–304 (2012).
  • [31] Q. Zheng, Y. Wang, and P. A. Heng, “Online subspace learning from gradient orientations for robust image alignment,” IEEE Transactions on Image Processing 28(7), 3383–3394 (2019).
  • [32] J. Qian, J. Yang, Y. Xu, et al., “Image decomposition based matrix regression with applications to robust face recognition,” Pattern Recognition 102, 107204 (2020).
  • [33] D. Huang, C. Zhu, Y. Wang, et al., “Hsog: a novel local image descriptor based on histograms of the second-order gradients,” IEEE Transactions on Image Processing 23(11), 4680–4695 (2014).
  • [34] M. J. Morgan, “Features and the primal sketch,” Vision Research 51(7), 738–753 (2011).
  • [35] C. Li, G. Gao, Z. Liu, et al., “Defect detection for patterned fabric images based on ghog and low-rank decomposition,” IEEE Access 7, 83962–83973 (2019).
  • [36] Y. Zhang, X. Bai, J. Yan, et al., “No-reference image quality assessment based on multi-order gradients statistics,” Journal of Imaging Science and Technology 64(1), 10505–1 (2020).
  • [37] B. T. Bastian and C. Jiji, “Pedestrian detection using first-and second-order aggregate channel features,” International Journal of Multimedia Information Retrieval 8(2), 127–133 (2019).
  • [38] L. Zhang, M. Yang, and X. Feng, “Sparse representation or collaborative representation: Which helps face recognition?,” in 2011 International Conference on Computer Vision, 471–478, IEEE (2011).
  • [39] L. v. d. Maaten and G. Hinton, “Visualizing data using t-sne,” Journal of Machine Learning Research 9(Nov), 2579–2605 (2008).
  • [40] R. He, W.-S. Zheng, T. Tan, et al., “Half-quadratic-based iterative minimization for robust sparse representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence 36(2), 261–275 (2013).
  • [41] J. Qian, L. Luo, J. Yang, et al., “Robust nuclear norm regularized regression for face recognition with occlusion,” Pattern Recognition 48(10), 3145–3159 (2015).
  • [42] S. Cai, L. Zhang, W. Zuo, et al., “A probabilistic collaborative representation based approach for pattern classification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2950–2959 (2016).
  • [43] M. Iliadis, H. Wang, R. Molina, et al., “Robust and low-rank representation for fast face identification with occlusions,” IEEE Transactions on Image Processing 26(5), 2203–2218 (2017).
  • [44] C. Zhang, H. Li, C. Chen, et al., “Enhanced group sparse regularized nonconvex regression for face recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence (2020).
  • [45] C. Zhang, H. Li, Y. Qian, et al., “Locality-constrained discriminative matrix regression for robust face identification,” IEEE Transactions on Neural Networks and Learning Systems (2020).
  • [46] O. M. Parkhi, A. Vedaldi, A. Zisserman, et al., “Deep face recognition.,” in British Machine Vision Conference, 1(3), 6 (2015).
  • [47] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556 (2014).
  • [48] X. Wu, R. He, and Z. Sun, “A lightened cnn for deep face representation,” arXiv preprint arXiv:1511.02683 (2015).
  • [49] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 1097–1105 (2012).
  • [50] M. Lin, Q. Chen, and S. Yan, “Network in network,” arXiv preprint arXiv:1312.4400 (2013).
  • [51] M. Mehdipour Ghazi and H. Kemal Ekenel, “A comprehensive analysis of deep learning based representation for face recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition workshops, 34–41 (2016).

He-Feng Yin received the B.S. degree in the School of Computer Science and Technology from Xuchang University, Xuchang, China, in 2011 and the Ph.D. degree from the School of Internet of Things Engineering, Jiangnan University, Wuxi, China, in 2020. Currently, he is a postdoctoral researcher in the School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China. He was a visiting PhD student at Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, under the supervision of Prof. Josef Kittler. His research interests include representation based classification methods, dictionary learning and low rank representation.

Xiao-Jun Wu received the B.Sc. degree in mathematics from Nanjing Normal University, Nanjing, China, in 1991, and the M.S. and Ph.D. degrees in pattern recognition and intelligent systems from the Nanjing University of Science and Technology, Nanjing, in 1996 and 2002, respectively. From 1996 to 2006, he taught at the School of Electronics and Information, Jiangsu University of Science and Technology, where he was promoted to Professor.

He has been with the School of AI & CS, Jiangnan University, since 2006, where he is a Professor of Computer Science and Technology. He was a Visiting Researcher with the CVSSP, University of Surrey, U.K., from 2003 to 2004. He has published over 300 research papers in refereed international journals and conferences. He is an Associate Editor of Pattern Recognition Letters, International Journal of Computer Mathematics, and several other journals. His current research interests include pattern recognition and computational intelligence. He was a Fellow of the International Institute for Software Technology, United Nations University, from 1999 to 2000. He was a recipient of the Most Outstanding Postgraduate Award from the Nanjing University of Science and Technology.

Xiao-Ning Song received the B.S. degree in computer science from Southeast University, Nanjing, China, in 1997, the M.S. degree in computer science from the Jiangsu University of Science and Technology, Zhenjiang, China, in 2005, and the Ph.D. degree in pattern recognition and intelligence system from the Nanjing University of Science and Technology, Nanjing, in 2010. He was a Visiting Researcher with the Centre for Vision, Speech, and Signal Processing, University of Surrey, Guildford, U.K., from 2014 to 2015. He is currently a Professor with the School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China. His current research interests include pattern recognition, machine learning, and computer vision.