Higher order multi-dimension reduction methods via Einstein-product
Abstract
This paper explores the extension of dimension reduction (DR) techniques to the multi-dimension case by using the Einstein product. Our focus lies on graph-based methods, encompassing both linear and nonlinear approaches, within both supervised and unsupervised learning paradigms. Additionally, we investigate variants such as repulsion graphs and kernel methods for linear approaches. Furthermore, we present two generalizations for each method, based on single or multiple weights. We demonstrate the straightforward nature of these generalizations and provide theoretical insights. Numerical experiments are conducted, and results are compared with original methods, highlighting the efficiency of our proposed methods, particularly in handling high-dimensional data such as color images.
keywords:
Tensor, Dimension reduction, Einstein product, Graph-based methods, Multi-dimensional Data, Trace optimization problem.
1 Introduction
In today's data-driven world, where vast amounts of information are collected and analyzed, the ability to simplify and interpret data has never been more critical. This challenge is particularly evident in the fields of data science and machine learning [29], where the curse of dimensionality is a major obstacle. Dimension reduction techniques aim to address this issue by projecting high-dimensional data onto a lower-dimensional space while preserving the underlying structure of the data. These methods have proven to be quite efficient in revealing hidden structures and patterns.
The landscape of DR is quite rich, with a wide range of methods, from linear to non-linear [18], supervised to unsupervised, and versions of these methods that incorporate repulsion-based principles or kernels. Examples include Principal Component Analysis (PCA) [9], Locality Preserving Projections (LPP) [12], Orthogonal Neighborhood Preserving Projections (ONPP) [14, 15], Neighborhood Preserving Projections (NPP) [14], Laplacian Eigenmap (LE) [4], and Locally Linear Embedding (LLE) [23]. Each of these techniques has been the subject of extensive research and applications, offering insights into data structures that are often hidden in high-dimensional spaces. These methods can also be cast as trace optimization problems with constraints [24].
Current approaches often require transforming multi-dimensional data, such as images [13, 3, 26, 19, 30], into flattened (vectorized) matrix form before analysis. While fast, this process can be problematic, as it may lead to a loss of the inherent structure and relational information within the data.
This paper proposes a novel approach to generalize dimensional reduction techniques, employing the Einstein product, a tool in tensor algebra, which is the natural extension of the usual matrix product. By reformulating the operations of both linear and non-linear methods in the context of tensor operations, the generalization maintains the multi-dimensional integrity of complex datasets. This approach circumvents the need for vectorization, preserving the rich intrinsic structure of the data.
Our contribution lies not only in proposing a generalized framework for dimension reduction, but also in demonstrating its effectiveness through empirical studies. We show that the proposed methods outperform, or at least match, their matrix-based counterparts, while preserving the integrity of the data.
This paper is organized as follows. Section 2 reviews the dimension reduction methods in the matrix case. Section 3 introduces the mathematical background of tensors and the Einstein product. Section 4 presents the different methods and their generalization using the Einstein product. Section 5 is dedicated to variants of these techniques. Section 6 presents the numerical experiments and the results. Lastly, Section 7 offers some concluding remarks and suggestions for future work.
2 Dimension reduction methods in matrix case
Given a set of data points and a set of corresponding points , denote the data matrix by and the low-dimensional matrix by . The objective is to find a mapping . The mapping is either non-linear or linear ; in the latter case, the problem reduces to finding the projection matrix .
We denote the similarity matrix of a graph by , the degree matrix by , and the Laplacian matrix by . For the sake of simplicity, we will define some new matrices
where is the centering matrix, and
The usual loss functions used are defined as follows
(1)
(2)
(3)
Equations (1) and (3) preserve the locality, i.e., a point and its representation stay close, while Equation (2) preserves the local geometry, i.e., a representation point can be written as a linear combination of its neighbours.
For simplicity, we will refer to the eigenvectors of a matrix corresponding to the largest and smallest eigenvalues, respectively, as the largest and smallest eigenvectors of a matrix. The same terminology applies to the left or right singular vectors. Table 1 summarizes the various dimension reduction methods, their corresponding optimization problems and the solutions.
Method | Loss function | Constraint | Solution
---|---|---|---
Linear methods | | |
Principal component analysis [9] | Maximize Equation 3 | | Largest left singular vectors of
Locality Preserving Projections [12] | Minimize Equation 1 | | Solution of
Orthogonal Locality Preserving Projections [14, 15] | Minimize Equation 1 | | Smallest eigenvectors of
Orthogonal Neighborhood Preserving Projections [14, 15] | Minimize Equation 2 | | Smallest eigenvectors of
Neighborhood Preserving Projections [14] | Minimize Equation 2 | | Solution of
Non-Linear methods | | |
Locally Linear Embedding [23] | Minimize Equation 2 | | Smallest eigenvectors of
Laplacian Eigenmap [4] | Minimize Equation 1 | | Solution of
Notice that the smallest eigenvalue is disregarded in the solutions; thus, the eigenvectors are taken starting from the second one. The graph-based methods are quite similar: each one tries to give an accurate representation of the data while preserving a desired property. The solution of the optimization problem is given by the eigenvectors or the singular vectors.
Next, we introduce notation related to tensor theory and the Einstein product, as well as some properties that guarantee the validity of the proposed generalization.
3 The Einstein product and its properties
Let and be two multi-indices, and and be two indices. The index mapping function maps the multi-index to the corresponding index in the vectorized form of a tensor of size . The unfolding, also known as flattening or matricization, is a function with , that maps a tensor into a matrix, with the subscripts and . The mapping is a linear isomorphism, and its inverse is denoted by . It allows some concepts of matrix theory to be generalized more easily.
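For concreteness, here is a minimal NumPy sketch of this unfolding and its inverse; the helper names `unfold`/`fold` and the convention that the first modes index the rows are our own choices, not notation taken from the text.

```python
import numpy as np

def unfold(T, n_row_modes):
    """Matricize a tensor: group the first n_row_modes indices into rows
    and the remaining indices into columns (the linear isomorphism)."""
    row_dims = T.shape[:n_row_modes]
    col_dims = T.shape[n_row_modes:]
    return T.reshape(int(np.prod(row_dims)), int(np.prod(col_dims)))

def fold(M, row_dims, col_dims):
    """Inverse map: reshape a matrix back into a tensor of the given shape."""
    return M.reshape(*row_dims, *col_dims)

# A 4th-order tensor viewed as a linear map between matrix spaces
A = np.random.rand(2, 3, 4, 5)
A_mat = unfold(A, 2)                 # shape (6, 20)
A_back = fold(A_mat, (2, 3), (4, 5))
assert np.allclose(A, A_back)        # the mapping is a bijection
```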
The frontal slice of the N-order tensor , denoted by is the tensor (the last mode is fixed to ). A tensor is called even if and square if for all [21].
Definition 1 (m-mode product).
[17]Let , and , the m-mode (matrix) product of and is a tensor of size , with element-wise
(4) |
Definition 2 (Einstein product).
[6]Let and
, the Einstein product of the tensors and is the tensor of size whose elements are defined by
(5) |
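As an illustration, a minimal NumPy sketch of the Einstein product is given below; it contracts the last N modes of the first tensor with the first N modes of the second, and the helper name `einstein_product` is ours.

```python
import numpy as np

def einstein_product(A, B, N):
    """Einstein product: contract the last N modes of A with the first N modes of B."""
    return np.tensordot(A, B, axes=N)

# Example: A of size (2, 3, 4, 5), B of size (4, 5, 6); the result has size (2, 3, 6).
A = np.random.rand(2, 3, 4, 5)
B = np.random.rand(4, 5, 6)
C = einstein_product(A, B, 2)
print(C.shape)  # (2, 3, 6)
```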
Next, we have some definitions related to the Einstein product.
Definition 3.
- Let ; then the transpose tensor [21] of , denoted by , is the tensor of size whose entries are defined by .
- is a diagonal tensor if all of its entries are zero except for those on its diagonal, denoted as , for all .
- The identity tensor, denoted by , is a diagonal tensor with only ones on its diagonal.
- A square tensor is called symmetric if .
Remark 1.
When there is no risk of confusion, the identity tensor will be denoted simply by .
Definition 4.
The inner product of tensors is defined by
(6) |
The inner product induces the Frobenius norm as follows
(7) |
Definition 5.
A square 2N-order tensor is invertible (non-singular) if there is a tensor, denoted by , of the same size such that . It is unitary if . It is positive semi-definite if for all non-zero . It is positive definite if the inequality is strict.
An important relationship that is easy to prove is the stability of the Frobenius norm under the Einstein product with a unitary tensor.
Proposition 6.
Let and be a unitary tensor, then
(8) |
The proof is straightforward using the inner product definition.
Proposition 7.
[27] For even order tensors , we have
(9) | ||||
Proposition 8.
[20] Given tensors , , we have
(10) |
The isomorphism has some properties that would be useful in the following.
Proposition 9.
[27] Given the tensors and of appropriate sizes, is a multiplicative morphism with respect to the Einstein product, i.e., .
It allows us to prove the Einstein Tensor Spectral Theorem.
Theorem 10 (Einstein Tensor Spectral Theorem).
A symmetric tensor is diagonalizable via the Einstein product.
Proof.
The proof uses the isomorphism and its properties (Proposition 9).
Let be a symmetric tensor of size , then is symmetric, and by the spectral theorem, there exists an orthogonal matrix such that , where is a diagonal matrix.
Then, , and , where is unitary and is a diagonal tensor.
∎
The cyclic property of the trace under the Einstein product also holds, which will be needed in the sequel.
Proposition 11 (Cyclic property of the trace).
Given tensors , , we have
(11) |
Theorem 12.
[20] Let , the Einstein singular value decomposition (E-SVD) of is defined by
(12) |
where the factors satisfy the following properties:
- and are unitary, where are the left and right singular tensors of , respectively.
- If , then .
- If , then .
- The numbers are the singular values of , in decreasing order with .
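To illustrate how the E-SVD can be computed in practice through the matricization, here is a hedged NumPy sketch; it returns the singular values as a vector rather than a diagonal tensor and stacks the singular tensors along one extra mode, which simplifies the factor structure described above.

```python
import numpy as np

def esvd(A, N):
    """Sketch of the E-SVD of A, with N 'row' modes and the remaining modes
    as 'column' modes, computed through the matricization."""
    row_dims, col_dims = A.shape[:N], A.shape[N:]
    A_mat = A.reshape(int(np.prod(row_dims)), int(np.prod(col_dims)))
    U, s, Vt = np.linalg.svd(A_mat, full_matrices=False)
    r = s.size
    U_t = U.reshape(*row_dims, r)     # left singular tensors, stacked on the last mode
    V_t = Vt.T.reshape(*col_dims, r)  # right singular tensors
    return U_t, s, V_t                # s holds the singular values in decreasing order
```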
We define the eigenvalues and eigen-tensors of a tensor with the following.
Definition 13.
[28] Let be square 2N-order tensors. Then:
- Tensor eigenvalue problem: if there is a non-null and such that , then is called an eigen-tensor of , and is the corresponding eigenvalue.
- Tensor generalized eigenvalue problem: if there is a non-null and such that , then is called an eigen-tensor of the pair , and is the corresponding eigenvalue.
Remark 2.
If , the two definitions above coincide with the eigenvalue and generalized eigenvalue problems, respectively.
We can also show a relationship between the singular values and the eigenvalues of a tensor.
Proposition 14.
Let the E-SVD of be defined as ; then:
- The eigenvalues of and are the squared singular values of .
- The eigen-tensors of are the left singular tensors of .
- The eigen-tensors of are the right singular tensors of .
The proof is straightforward.
To simplify matters, we’ll denote the eigen-tensors of a tensor, associated with the smallest eigenvalues, as the smallest eigen-tensors. Similarly, we will apply the same principle to the largest eigen-tensors. This terminology also extends to the left or right singular tensors.
Remark 3.
To generalize the notion of left and right inverses to non-square tensors, a tensor is called left or right -invertible if is left or right invertible, respectively. In case of possible confusion, we will write to denote the transformation of the tensor into a matrix .
Proposition 15.
1. A square 2N-order symmetric tensor is positive semi-definite (respectively, positive definite) if and only if there is a tensor (respectively, an invertible tensor) of the same size such that .
2. Let be a tensor of order N, with its transpose in ; then is positive semi-definite for any . Let be a tensor of order N and let be such that the tensor is invertible, with its transpose in ; then is positive definite.
3. The eigenvalues of a square symmetric tensor are real.
4. Let be a symmetric matrix and , with its transpose in ; then is symmetric.
5. If is positive semi-definite, then is positive semi-definite.
6. If is positive definite and is invertible, then is positive definite.
Proof.
The proof of the first item is straightforward using Proposition 9.
Let ; then .
The proof of the third item is similar to that of the second.
Let be a symmetric tensor and a non-zero tensor with ; then
, hence , which completes the proof.
We have ; we then conclude by the symmetry of .
Let be a positive semi-definite matrix; then there exists a matrix such that , and
with , so the result follows.
The last item has a similar proof: using the fact that is positive definite, is invertible, and is invertible, hence is invertible, and the result follows.
∎
We also have a property that relates the tensor generalized eigenvalue problem with the tensor eigenvalue problem.
Proposition 16.
Consider the generalized eigenvalue problem , where the tensors involved are square 2N-order tensors and is invertible; then is a solution of the tensor eigen-problem with .
Theorem 17.
Let be a symmetric tensor and a positive definite tensor of the same size; then
is equivalent to solving the generalized eigenvalue problem .
Proof.
Since is an isomorphism, the problem is equivalent to minimizing with ,
. We have symmetric and positive definite. The solution of the equivalent problem is given by the eigenvectors associated with the smallest eigenvalues of ; using the fact that is an isomorphism, we obtain the result.
A second proof without using the isomorphism property is the following.
Let the Lagrangian of the problem be
with the Lagrange multiplier. Using KKT conditions, we have
To compute the partial derivative with respect to , we introduce the functions and , defined as follows
Subsequently, we aim to determine the partial derivative.
Then, as is symmetric, the partial derivative in the direction is
It gives us the partial derivative of with respect to as
For the second function, we have
We used the cyclic property of the trace (Proposition 11) and the trace and transpose property (Proposition 9) in the last equality. Then, as is symmetric, the partial derivative in the direction is
This yields the partial derivative of with respect to as follows
Subsequently,
The third line utilizes the fact that is symmetric, and thus diagonalizable (Theorem 10), i.e., . The last two lines are justified by the fact that also satisfies , concluding the proof. ∎
A corollary of this theorem can be deduced.
Corollary 18.
Let , the solution of
is the smallest left singular tensors of .
Proof.
Theorem 17 tells us that the problem is equivalent to solving , i.e., finding the smallest eigen-tensors of , which correspond exactly to the smallest left singular tensors of . ∎
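A minimal NumPy/SciPy sketch of Theorem 17 and Corollary 18 through the matricization, assuming the square 2N-order tensors are stored so that the first half of their modes index the rows; the function name and the folding of the d smallest eigen-tensors are our own conventions.

```python
import numpy as np
from scipy.linalg import eigh

def trace_minimization(A, B, d):
    """Minimize the trace objective under the B-orthogonality constraint by
    solving the generalized eigen-problem of the matricized tensors and
    keeping the eigenvectors of the d smallest eigenvalues."""
    row_dims = A.shape[:A.ndim // 2]
    n = int(np.prod(row_dims))
    A_mat = A.reshape(n, n)
    B_mat = B.reshape(n, n)          # assumed symmetric positive definite
    vals, vecs = eigh(A_mat, B_mat)  # ascending eigenvalues
    U = vecs[:, :d]                  # d smallest generalized eigenvectors
    return U.reshape(*row_dims, d)   # folded back into eigen-tensors
```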
4 Multidimensional reduction
In this section, we present a generalized approach to DR methods using the Einstein product.
Given a tensor , our objective is to derive a low-dimensional representation of . This involves defining a mapping function .
First, we discuss the determination of the weight matrix, which can be computed in various ways; one common method uses the Gaussian kernel
Additionally, introducing a threshold can yield a sparse matrix (Gaussian-threshold). We also explore another method later in this section, which utilizes the reconstruction error.
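For concreteness, a hedged sketch of a Gaussian(-threshold) weight matrix is given below; the scaling of the exponent and the thresholding rule are assumptions, since the paper's exact formula is not reproduced here, and each sample is assumed to be flattened into one row.

```python
import numpy as np

def gaussian_weights(X, sigma, threshold=None):
    """Similarity matrix with the Gaussian kernel; X holds one flattened
    sample per row."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    W = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)
    if threshold is not None:
        W[W < threshold] = 0.0                       # Gaussian-threshold variant
    return W
```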
Next, we introduce our proposed methods.
Linear methods: The linear methods can be written as . It is sufficient to find the projection matrix .
Higher-order PCA based on the Einstein product [11] is the natural extension of PCA to higher-order tensors. It extends the PCA applied to images, using the notion of eigenfaces, to color images modeled by a fourth-order tensor via the Einstein product. It vectorizes the pixels (height and width) for each color channel (RGB) to get a third-order tensor, then computes the E-SVD of the centered tensor to obtain the eigenfaces.
This vectorization is not natural, since the spatial information is lost. The proposed approach avoids this explicit vectorization step by using the tensor directly, and seeks a solution of the following problem
(13) |
The objective function can be written as
with , where represents the mean. The solution of (13) is given by the largest left singular tensors of the centered tensor .
Since the feature dimension is typically larger than the number of data points , computing the E-SVD of can be computationally expensive. It’s preferable to have a runtime that depends on instead. To achieve this, we transform the equation to , with . This allows us to find the eigenvectors of a square matrix of size . The projected data would be these vectors reshaped to the appropriate size.
The algorithm below shows the steps of PCA via the Einstein product.
Input: (Data) (dimension output).
Output: (Projection space).
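A minimal NumPy sketch of this PCA step, assuming the samples are stored along the last mode of the data tensor and using the small n-by-n trick described above; the helper name and the numerical safeguard on the eigenvalues are ours.

```python
import numpy as np

def pca_einstein(X, d):
    """PCA-like projection tensor: d largest left singular tensors of the
    centered data, computed from the n x n Gram matrix."""
    n = X.shape[-1]
    feat_dims = X.shape[:-1]
    Xc = X - X.mean(axis=-1, keepdims=True)     # center along the sample mode
    Xm = Xc.reshape(-1, n)                      # unfolded: features x samples
    G = Xm.T @ Xm                               # small n x n matrix
    vals, vecs = np.linalg.eigh(G)
    idx = np.argsort(vals)[::-1][:d]            # d largest eigenvalues
    U = Xm @ vecs[:, idx] / np.sqrt(np.maximum(vals[idx], 1e-12))
    return U.reshape(*feat_dims, d)             # projection tensor
```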
4.1 Generalization of ONPP
Given a weight matrix , the objective is to solve
(14) |
The objective function can be written as
Using Corollary 18, the solution of (14) is given by the smallest left singular tensors of .
The algorithm below shows the steps of ONPP via the Einstein product.
Input: (Data) (subspace dimension).
Output: (Projection space).
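A hedged NumPy sketch of this ONPP step, assuming the samples lie on the last mode of the data tensor and the weight matrix W is already available; restricting to the economy SVD (and hence to at most n singular vectors) is a simplification when the feature dimension exceeds the number of samples.

```python
import numpy as np

def onpp_einstein(X, W, d):
    """Projection tensor from the d smallest left singular vectors of the
    unfolded data multiplied by (I - W^T)."""
    n = X.shape[-1]
    feat_dims = X.shape[:-1]
    Xm = X.reshape(-1, n)                        # unfolded: features x samples
    M = Xm @ (np.eye(n) - W.T)
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    V = U[:, np.argsort(s)[:d]]                  # d smallest (computed) singular vectors
    return V.reshape(*feat_dims, d)
```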
4.1.1 Multi-weight ONPP
In this section, we propose a generalization of ONPP where multiple weight matrices are employed along the mode. We denote by the weight tensor. Let denote the tensor obtained by fixing its last two indices to , i.e., ; similarly, denotes . We assume that the -th frontal slice of the weight tensor is constructed only from the -th frontal slice of the data tensor.
The objective function is
(15) |
For each , utilizing the independence of the frontal slices , we can divide the objective function into independent objective functions. The solution is obtained by concatenating the solutions of each objective function. The -th objective function can be written as
(16) |
The solution of this objective function is the smallest left singular tensors of . The solution of the original problem is obtained by concatenating the solutions of each objective function.
4.2 Generalization of OLPP
Given a weight matrix , the optimization problem to solve is
(17) |
The objective function can be written as
where is the Laplacian matrix corresponding to .
The solution, using Corollary 18, is given by the smallest eigen-tensors of the symmetric tensor (Proposition 15) .
The algorithm below shows the steps of OLPP via the Einstein product.
Input: (Data) (subspace dimension).
Output: (Projection space).
4.2.1 Multi-weight OLPP
In this section, we propose a generalization of OLPP, where multiple weights are utilized on the mode. The objective function is to solve
(18) |
Here, we have independent objective functions, and the solution is obtained by concatenating the solutions of each objective function.
4.3 Generalization of LPP
LPP is akin to the Laplacian Eigenmap, serving as its linear counterpart. The objective function of LPP solves
(19) |
The solution involves finding the smallest eigen-tensor of the generalized eigen-problem
(20) |
Using Proposition 15, we deduce that the tensors , are symmetric positive semi-definite. Here, we assume positive definiteness (although this is not guaranteed in general, especially if the number of sample points is less than the product of the feature dimensions). The projection tensor is obtained by gathering these eigen-tensors and reshaping to the desired dimension.
The algorithm below shows the steps of LPP via the Einstein product.
Input: (Data) (subspace dimension).
Output: (Projection space).
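A minimal NumPy/SciPy sketch of this LPP step under the same layout assumption (samples on the last mode); the small ridge added to the right-hand side matrix mirrors the definiteness assumption above and is our own safeguard.

```python
import numpy as np
from scipy.linalg import eigh

def lpp_einstein(X, W, d, reg=1e-8):
    """Projection tensor from the d smallest eigenvectors of the generalized
    problem (X L X^T) u = lambda (X D X^T) u in unfolded form."""
    n = X.shape[-1]
    feat_dims = X.shape[:-1]
    Xm = X.reshape(-1, n)
    D = np.diag(W.sum(axis=1))
    L = D - W                                        # graph Laplacian
    A = Xm @ L @ Xm.T
    B = Xm @ D @ Xm.T + reg * np.eye(Xm.shape[0])    # enforce definiteness
    vals, vecs = eigh(A, B)                          # ascending eigenvalues
    return vecs[:, :d].reshape(*feat_dims, d)
```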
Similarly, the multi-weight LPP can be proposed.
4.4 Generalization of NPP
The generalization of NPP resembles that of ONPP; under the LLE constraint, it aims to find
(21) |
The solution entails finding the smallest eigenvectors of the generalized eigen-problem
The projection tensor is obtained by concatenating these eigenvectors into a matrix and then reshaping it to the desired dimension.
The algorithm below shows the steps of NPP via the Einstein product.
Input: (Data) (subspace dimension).
Output: (Projection space).
Similarly, the multi-weight NPP can be proposed, and the solution is similar.
Nonlinear methods: Nonlinear DR methods are potent tools for uncovering the nonlinear structure within data. However, they present their own challenges, such as the absence of an inverse mapping, which is essential for data reconstruction. Another difficulty encountered is the out-of-sample extension, which involves extending the method to new data. While a variant of these methods utilizing multiple weights could be proposed, it would resemble the approach of linear methods, and thus we will not delve into them here.
4.5 Generalization of Laplacian Eigenmap
Given a weight matrix , the objective function is to solve
(22) |
The objective function can be written as
For , the constraint becomes , using , the objective function becomes
(23) |
Using the isomorphism and its properties, the problem is equivalent to Equation 1 under the constraint . Since the solution is the smallest eigenvectors of , the solution of the original problem would be .
The algorithm below shows the steps of LE via the Einstein product.
Input: (Data) (subspace dimension).
Output: (Projection data).
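A minimal sketch of the resulting matricized computation for the embedding, assuming the weight matrix W is given; skipping the first (constant) eigenvector is the usual convention.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmap(W, d):
    """Embedding coordinates (one row per sample) from the smallest
    non-trivial generalized eigenvectors of L y = lambda D y."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    vals, vecs = eigh(L, D)
    return vecs[:, 1:d + 1]     # discard the trivial constant eigenvector
```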
4.5.1 Projection on out of Sample Data
The out-of-sample extension is the problem of extending the method to new data. Many approaches have been proposed to solve this problem, such as the Nyström method, kernel mapping, and eigenfunctions [5, 25]. We will propose a method that is based on the eigenfunctions.
In the matrix case, the out-of-sample extension is done by simply computing the components explicitly as , where is the kernel matrix of the new data, is an eigenpair of the kernel matrix , and is the kernel vector of the test data .
It can be written as
Thus, the generalization is straightforward.
4.6 Generalization of LLE
LLE is a nonlinear method that aims to preserve the local structure of the data. After finding the -nearest neighbors, it determines the weights that minimize the reconstruction error. In other words, it seeks to solve the following objective function
(24) |
subject to two constraints: a sparsity constraint (the weights are zero if a point is not in the neighborhood of another point), ensuring the locality of the reconstruction weights, and an invariance constraint (the row sum of the weight matrix is 1). Finally, the projected data is obtained by solving the following objective function
(25) |
4.6.1 Computing the weights
Equation 24 can be decomposed into finding the weights of each point independently. For simplicity, we will refer to the neighbors of as , and to the weights of its neighbors as , with the number of neighbors, which plays the role of the sparseness constraint. Denote the local Grammian matrix as .
The invariance constraint can be written as , thus, we can write the problem as
This constrained problem can be solved using the Lagrangian
We compute the partial derivative with respect to and set it to zero
(26) | ||||
We utilize the fact that the Grammian matrix is symmetric, assuming that is full rank, which is typically the case. However, if is not full rank, a small value can be added to its diagonal to ensure full rank.
The partial derivative with respect to , set to zero, yields the invariance constraint of point , i.e., . Multiplying Equation (26) by , we can isolate and arrive at the following equation
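A hedged sketch of this closed form for a single point, where the neighbors are passed as rows of a flattened array; the ridge term implements the regularization of the local Grammian mentioned above.

```python
import numpy as np

def lle_weights(x, neighbors, eps=1e-3):
    """Reconstruction weights of one point from its k neighbors:
    solve G w = 1 and normalize so that the weights sum to one."""
    Z = neighbors - x                   # shift the neighbors to the query point
    G = Z @ Z.T                         # local Grammian (k x k)
    G += eps * np.trace(G) * np.eye(G.shape[0]) / G.shape[0]  # regularize if needed
    w = np.linalg.solve(G, np.ones(G.shape[0]))
    return w / w.sum()                  # invariance (sum-to-one) constraint
```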
4.6.2 Computing the projected data
The final step resembles the previous cases, ensuring that the mean constraint is satisfied. As can be represented as a Laplacian, we know that the number of connected components of the graph corresponds to the multiplicity of the eigenvalue 0. Hence, there is at least one eigenvalue 0 with multiplicity 1, and the identity tensor serves as the corresponding eigenvector, thereby satisfying the second constraint. For further elaboration, interested readers can refer to [10].
The solution is equivalent to solving the matricized version, where the solution comprises the smallest eigenvectors of the matrix . Consequently, the solution to the original problem is the inverse transform, denoted as , of the solution to the matricized version.
The algorithm below shows the steps of LLE via the Einstein product.
Input: (Data) (subspace dimension).
Output: (Projection data).
4.6.3 Projection on out of Sample Data
To extend these methods to new data (test data) not seen in the training set, various approaches can be employed. These include kernel mapping, eigenfunctions (as discussed in Bengio et al. [5]), and linear reconstruction. Here, we opt to generalize the latter approach. We can follow similar steps to those used in matrix-based methods. Specifically, we can perform the following steps without re-running the algorithm on the entire dataset
1. Find the neighbours in the training data of each new test point.
2. Compute the reconstruction weights that best reconstruct each test point from its k neighbours in the training data.
3. Compute the low-dimensional representation of the new data using the reconstruction weights.
More formally, after finding the neighbours of a test point , we solve the following problem
where is the reconstruction weight of the test point, under the same constraints, i.e., the row sum of the weight matrix is 1, with zero entries for points that are not in the neighbourhood.
The solution is , with .
Finally, the embedding of the test point is obtained as , where are the embedding representations of the neighbours of the test point.
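A minimal sketch of this out-of-sample step, assuming the training samples are stored along the first mode of `X_train` and that `Y_train` holds the training embeddings row-wise; the names and the ridge term are ours.

```python
import numpy as np

def lle_out_of_sample(x_new, X_train, Y_train, k=10, eps=1e-3):
    """Embed a new point: reconstruct it from its k nearest training samples
    and reuse the reconstruction weights on the training embeddings."""
    X = X_train.reshape(X_train.shape[0], -1)        # samples x flattened features
    x = x_new.ravel()
    idx = np.argsort(np.sum((X - x)**2, axis=1))[:k] # k nearest training samples
    Z = X[idx] - x
    G = Z @ Z.T
    G += eps * np.trace(G) * np.eye(k) / k           # regularize the local Grammian
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()                                     # invariance constraint
    return w @ Y_train[idx]                          # embedding of the test point
```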
5 Other variants via Einstein product
5.1 Kernel methods
Kernels serve as powerful tools enabling linear methods to operate effectively in high-dimensional spaces, allowing for the representation of nonlinear data without explicitly computing data coordinates in this feature space. The kernel trick, a pivotal breakthrough in machine learning, facilitates this process. Formally, instead of directly manipulating the data , we operate within a high-dimensional implicit feature space using a function .
We denote the tensor . The mapping need not be explicitly known; only the Gram matrix is required. This matrix represents the inner products of the data in the feature space, defined as . Consequently, any method expressible in terms of data inner products can be reformulated in terms of the Gram matrix, and the kernel trick can then be applied. Moreover, extending kernel methods to multi-linear operations using the Einstein product is straightforward. Commonly used kernels include the Gaussian, polynomial, Laplacian, and Sigmoid kernels, among others.
Denote , with .
5.1.1 Kernel PCA via Einstein product
The kernel multi-linear PCA solves the following problem
(27) |
where is the mean of the mapped data points; the solution is given by the largest left singular tensors of . This requires the eigen-tensors of , which is not accessible; however, the Grammian is available, so we can transform the problem to
where represents the Grammian of the centered data, which can easily be obtained from alone as , since
Then, the solution is the same as the matrix case, i.e., the largest eigenvectors of , reshaped to the appropriate size.
The algorithm below shows the steps of kernel PCA via the Einstein product.
Input: (Data) (dimension output) (Grammian).
Output: (Projected data).
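A minimal sketch of this computation on the Grammian, with the usual double centering; scaling the eigenvectors by the square roots of the eigenvalues gives the projected coordinates of the training samples.

```python
import numpy as np

def kernel_pca_einstein(K, d):
    """Projected training data from the d largest eigenpairs of the
    centered Grammian."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                                   # centered Grammian
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:d]                 # d largest eigenvalues
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```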
5.1.2 Kernel LPP via Einstein product
The kernel multi-linear LPP tackles the following problem
(28) |
The solution involves finding the smallest eigen-tensors of the generalized eigen-problem
Since is not available, the problem needs to be reformulated. Utilizing the fact that is invertible, we reformulate the problem as finding the vectors , solutions of the generalized eigen-problem
This formulation reduces to the same minimization problem as in the matrix case.
5.1.3 Kernel ONPP via Einstein product
The kernel multi-linear ONPP addresses the following optimization problem
(29) |
The solution involves finding the smallest eigen-tensors of the problem
By employing techniques similar to those above, we can derive the equivalent problem that seeks the vectors , solutions of the eigen-problem
This is the same minimization problem as in the matrix case; the solution can be obtained by reshaping the transpose of the concatenated vectors .
The algorithm below shows the steps of kernel ONPP via the Einstein product.
Input: (Grammian) (subspace dimension).
Output: (Projected data).
5.1.4 Kernel OLPP via Einstein product
The kernel multi-linear OLPP tackles the optimization problem defined as follows:
(30) |
The solution of the problem involves the eigen-tensors of . By transforming the problem, we arrive at finding the vectors , solutions of the eigen-problem
which mirrors the matrix case. Here, the solution can be obtained by reshaping the transpose of the concatenated vectors .
The algorithm below shows the steps of kernel OLPP via the Einstein product.
Input: (Grammian) (subspace dimension).
Output: (Projected data).
5.2 Supervised learning
In general, supervised learning differs from unsupervised learning primarily in how the weight matrix incorporates class label information. Supervised learning tends to outperform unsupervised learning, particularly with small datasets, due to the utilization of additional class label information.
In supervised learning, each data point is associated with a known class label. The weight matrix can be adapted to include this class label information. For instance, it may take the form of a block diagonal matrix, where represents sub-weight matrices, and denotes the number of data points in class . Let denote the class of data point .
Supervised PCA: PCA is the sole linear method presented devoid of a graph matrix. Consequently, Supervised PCA implementation is not straightforward, necessitating a detailed explanation. Following the approach proposed in [2], we address this challenge by formulating the problem and leveraging the empirical Hilbert-Schmidt independence criterion (HSIC):
where is the kernel of the outcome measurements . Thus, the generalization would be to solve
(31) |
The solution of (31) is given by the largest eigen-tensors of .
Notice that when , we recover the same problem as in the unsupervised case.
The algorithm below shows the steps of supervised PCA via the Einstein product.
Input: (Data) (dimension output). (Kernel of labels)
Output: (Projection space).
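A hedged sketch of this HSIC-based step through the matricization, assuming the samples lie on the last mode of the data tensor and a precomputed label kernel `Ky`; the helper name is ours.

```python
import numpy as np

def supervised_pca_einstein(X, Ky, d):
    """Projection tensor from the d largest eigenvectors of X H Ky H X^T
    in unfolded form (H is the centering matrix)."""
    n = X.shape[-1]
    feat_dims = X.shape[:-1]
    Xm = X.reshape(-1, n)
    H = np.eye(n) - np.ones((n, n)) / n
    M = Xm @ H @ Ky @ H @ Xm.T
    vals, vecs = np.linalg.eigh(M)
    idx = np.argsort(vals)[::-1][:d]
    return vecs[:, idx].reshape(*feat_dims, d)
```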
Supervised Laplacian Eigenmap: The supervised Laplacian Eigenmap is similar to the Laplacian Eigenmap, with the difference that the weight matrix is computed using the class labels; many approaches have been proposed [22, 8, 25]. We choose a simple approach that changes the weight matrix to , and the rest of the algorithm remains the same.
Supervised LLE: There are multiple variants of LLE that use the class labels to improve the performance of the method, e.g., supervised LLE (SLLE), probabilistic SLLE [33], supervised guided LLE using HSIC [1], and enhanced SLLE [31]. The general strategy is to incorporate the class labels either in computing the distance matrix, in the weight matrix, or in the objective function [10]. We choose the simplest, which is the first strategy: the distance matrix is modified by adding a term that increases the inter-class and decreases the intra-class variance. The rest of the steps are the same as in unsupervised LLE.
5.2.1 Repulsion approaches
In semi-supervised or supervised learning, how the class labels are used can affect the performance. Commonly, the similarity matrix only tells us whether two points are of the same class or not, without incorporating any additional information on data locality, e.g., the closeness of points of different classes. The repulsion technique is therefore used to take the class label information into account, by repulsing points of different classes and attracting points of the same class. It extends the traditional graph-based methods by incorporating repulsion or discrimination elements into the graph Laplacian, leading to a more distinct separation of the different classes in the reduced-dimensional space, since the class label information is integrated directly into the graph structure. The concept of repulsion had been used in DR with different formulations [32, 7] before the k-nn graph was used to derive it. In [16], a generic method was proposed that applies attraction to all points of the same class together with repulsion between nearby points of different classes, which was found to be significantly better than the previous approaches. Thus, we will use the same approach and generalize it to the Einstein product.
The repulsion graph is derived from the k-nn graph based on the class label information; in the simplest form, the weight of the edges can be computed as
Hence, in the case of a fully connected graph, the repulsion weight would be of the form
Other weight values can be proposed.
The new repulsion algorithms are similar to the previous ones, using the new weight matrix , where is a parameter.
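A hedged sketch of a simple repulsion graph built from a k-nn neighborhood structure; the unit weights and the combination shown in the final comment are assumptions consistent with the description above, not the paper's exact formulas.

```python
import numpy as np

def repulsion_weights(labels, knn_idx):
    """Repulsion graph: unit weight between nearby points (k-nn neighbours)
    that belong to different classes, zero otherwise."""
    n = len(labels)
    Wr = np.zeros((n, n))
    for i in range(n):
        for j in knn_idx[i]:
            if labels[i] != labels[j]:
                Wr[i, j] = Wr[j, i] = 1.0
    return Wr

# One plausible combination with the class-based weights (an assumption):
# W_new = W_class - beta * Wr, where beta is the parameter mentioned above.
```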
6 Experiments
To show the effectiveness of the proposed methods, we use datasets that are commonly used in the literature. The experiments are conducted on the GTDB dataset for facial recognition and on the MNIST dataset for digit recognition. We note that these datasets provide the raw images rather than extracted features. The results are compared with the state-of-the-art methods by feeding the projected data to a classifier. A baseline is also used for comparison, which utilizes the raw data as the input of the classifier, and the recognition rate is used as the evaluation metric for all methods. Images were chosen because the proposed methods are designed to work on multi-linear data, and images are a typical example of such data. The proposed methods that use multiple weights are denoted by adding a suffix to the name of the method (e.g., OLPP-E-MW). It is intuitive to use multiple weights for images, since the third mode represents the RGB channels while the first two modes represent the pixel locations.
The evaluation metric used to assess the performance of the proposed methods is the recognition rate (IR). It is defined as the number of correct classifications over the total number of testing samples. A classification is made by computing the minimum distance between the projected training data and the projected testing data. The IR is computed on the testing data.
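A minimal sketch of this metric, assuming the projected training and test samples are stored row-wise as NumPy arrays together with their label arrays.

```python
import numpy as np

def recognition_rate(Y_train, labels_train, Y_test, labels_test):
    """Each projected test sample takes the label of the closest projected
    training sample; return the fraction of correct predictions."""
    d2 = ((Y_test[:, None, :] - Y_train[None, :, :])**2).sum(axis=2)
    pred = labels_train[np.argmin(d2, axis=1)]   # nearest-neighbour rule
    return float(np.mean(pred == labels_test))
```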
For simplicity, we used the supervised versions of the methods, with Gaussian weights, and the parameter recommended in [16] (half the median of the data) for the Gaussian parameter.
All computations are carried out on a laptop with a 2.1 GHz 8th-generation Intel Core i7 processor and 8 GB of memory, using MATLAB 2021a.
6.1 Digit recognition
The dataset used in these experiments is the MNIST dataset (https://lucidar.me/en/matlab/load-mnist-database-of-handwritten-digits-in-matlab/). It contains 60,000 training images and 10,000 testing images of labeled handwritten digits. The images are of size and are normalized gray-scale images. The evaluation metric is the same as for facial recognition. We work with a smaller subset of the data to speed up the computation, e.g., 1000 training images and 200 testing images taken randomly from the data. Observe that the multi-weight methods are not used on this dataset since the images are gray-scale, so there are no multiple weights.
Table 2 shows the performance of the different approaches compared with the state-of-the-art for different subspace dimensions.
Dimension | OLPP | OLPP-E | ONPP | ONPP-E | PCA | PCA-E | Baseline
---|---|---|---|---|---|---|---
5 | 50.50 | 50.50 | 56.00 | 56.00 | 63.00 | 63.00 | 8.50
10 | 75.50 | 75.50 | 81.50 | 81.50 | 82.50 | 82.50 | 8.50
15 | 81.00 | 81.00 | 80.50 | 80.50 | 84.50 | 84.50 | 8.50
20 | 85.00 | 85.00 | 83.50 | 83.50 | 88.00 | 88.00 | 8.50
25 | 86.00 | 86.00 | 87.50 | 87.50 | 88.00 | 88.00 | 8.50
30 | 85.50 | 85.50 | 87.50 | 87.50 | 89.00 | 89.00 | 8.50
35 | 88.00 | 88.00 | 89.50 | 89.50 | 87.00 | 87.00 | 8.50
40 | 88.00 | 88.00 | 89.00 | 89.00 | 87.50 | 87.50 | 8.50
On the MNIST dataset, the results of each method and of its multi-dimensional counterpart are similar; we attribute this to the fact that vectorizing two dimensions into one does not affect the accuracy much, which leads to similar results with the proposed parameters.
Note that the objective is to compare each method with its proposed multi-dimensional counterpart via the Einstein product, to see whether the generalization holds up.
6.2 Facial recognition
The dataset used in these experiments is the Georgia Tech database GTDB crop (https://www.anefian.com/research/face_reco.htm). It contains 750 color JPEG images of 50 persons, each represented by exactly 15 images that show different facial expressions, scales, and lighting conditions. Figure 1 shows an example of 12 arbitrary images from the possible 15 of an arbitrary person in the dataset.
Our data in this case is a tensor of size when dealing with RGB images, and when dealing with gray-scale images. The height and width of the images are fixed to . The data is normalized.
[Figure 1: sample images of one person from the GTDB dataset]
The experiment is done using 12 images for training and 3 for testing per face. Figure 2 shows the results for different subspace dimensions.
[Figure 2: recognition rates of the compared methods for different subspace dimensions]
The results show that as the subspace dimension increases, the performance of most methods also increases, suggesting that these methods benefit from a higher-dimensional feature space up to a point that differs from one method to another. The generalized methods using the Einstein product give overall better results for all subspace dimensions compared to their counterparts, except for ONPP in the small-dimension case. The multi-weight methods show varying performance; they outperform the single-weight versions in some cases. Future work could consider enhancing how the results of the different weights are aggregated in order to give more robust results.
The objective is to compare a method with its multi-dimensional counterparts via the Einstein product, e.g., the OLPP method with OLPP-E and OLPP-E-MW.
The superiority of the Einstein-based methods can be justified by the fact that they preserve the multi-linear structure and the non-linear structure of the data, which is not the case when the data is vectorized, as in the other matricized methods.
7 Conclusion
The paper advances the field of dimension reduction by introducing refined graph-based methods and leveraging the Einstein product for tensor data. It extends both linear and nonlinear methods (supervised and unsupervised), as well as their variants, to higher-order tensors. The methods are evaluated on the GTDB and MNIST datasets, and the results are compared with the state-of-the-art methods, showing competitive results. Future work could address the generalization of trace-ratio methods such as Linear Discriminant Analysis. An acceleration of the computation could also be proposed, using the tensor Golub-Kahan decomposition to obtain an approximation of the eigen-tensors used to construct the projected space.
References
- [1] A. Álvarez-Meza, J. Valencia-Aguirre, G. Daza-Santacoloma, and G. Castellanos-Domínguez, Global and local choice of the number of nearest neighbors in locally linear embedding, Pattern Recognition Letters, 32 (2011), pp. 2171–2177.
- [2] E. Barshan, A. Ghodsi, Z. Azimifar, and M. Z. Jahromi, Supervised principal component analysis: Visualization, classification and regression on subspaces and submanifolds, Pattern Recognition, 44 (2011), pp. 1357–1371.
- [3] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, Eigenfaces vs. fisherfaces: Recognition using class specific linear projection, IEEE Transactions on pattern analysis and machine intelligence, 19 (1997), pp. 711–720.
- [4] M. Belkin and P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural computation, 15 (2003), pp. 1373–1396.
- [5] Y. Bengio, J.-f. Paiement, P. Vincent, O. Delalleau, N. Roux, and M. Ouimet, Out-of-sample extensions for lle, isomap, mds, eigenmaps, and spectral clustering, Advances in neural information processing systems, 16 (2003).
- [6] M. Brazell, N. Li, C. Navasca, and C. Tamon, Solving multilinear systems via tensor inversion, SIAM Journal on Matrix Analysis and Applications, 34 (2013), pp. 542–570.
- [7] H.-T. Chen, H.-W. Chang, and T.-L. Liu, Local discriminant embedding and its variants, in 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), vol. 2, IEEE, 2005, pp. 846–853.
- [8] J. A. Costa and A. Hero, Classification constrained dimensionality reduction, in Proceedings.(ICASSP’05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005., vol. 5, IEEE, 2005, pp. v–1077.
- [9] K. Pearson, LIII. On lines and planes of closest fit to systems of points in space, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2 (1901), pp. 559–572.
- [10] B. Ghojogh, A. Ghodsi, F. Karray, and M. Crowley, Locally linear embedding and its variants: Tutorial and survey, arXiv preprint arXiv:2011.10925, (2020).
- [11] A. E. Hachimi, K. Jbilou, M. Hached, and A. Ratnani, Tensor Golub-Kahan based on Einstein product, arXiv preprint arXiv:2311.03109, (2023).
- [12] X. He and P. Niyogi, Locality preserving projections, Advances in neural information processing systems, 16 (2003).
- [13] X. He, S. Yan, Y. Hu, P. Niyogi, and H.-J. Zhang, Face recognition using laplacianfaces, IEEE transactions on pattern analysis and machine intelligence, 27 (2005), pp. 328–340.
- [14] E. Kokiopoulou and Y. Saad, Orthogonal neighborhood preserving projections, in Fifth IEEE International Conference on Data Mining (ICDM’05), IEEE, 2005, pp. 8–pp.
- [15] E. Kokiopoulou and Y. Saad, Orthogonal neighborhood preserving projections: A projection-based dimensionality reduction technique, IEEE Transactions on Pattern Analysis and Machine Intelligence, 29 (2007), pp. 2143–2156.
- [16] E. Kokiopoulou and Y. Saad, Enhanced graph-based dimensionality reduction with repulsion laplaceans, Pattern Recognition, 42 (2009), pp. 2392–2402.
- [17] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM review, 51 (2009), pp. 455–500.
- [18] J. A. Lee, M. Verleysen, et al., Nonlinear dimensionality reduction, vol. 1, Springer, 2007.
- [19] Q. Liu, R. Huang, H. Lu, and S. Ma, Face recognition using kernel-based fisher discriminant analysis, in Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition, IEEE, 2002, pp. 197–201.
- [20] L. Sun, B. Zheng, C. Bu, and Y. Wei, Moore–Penrose inverse of tensors via Einstein product, Linear and Multilinear Algebra, 64 (2016), pp. 686–698.
- [21] L. Qi and Z. Luo, Tensor analysis: spectral theory and special tensors, SIAM, 2017.
- [22] B. Raducanu and F. Dornaika, A supervised non-linear dimensionality reduction approach for manifold learning, Pattern Recognition, 45 (2012), pp. 2432–2444.
- [23] S. T. Roweis and L. K. Saul, Nonlinear dimensionality reduction by locally linear embedding, science, 290 (2000), pp. 2323–2326.
- [24] L. K. Saul, K. Q. Weinberger, F. Sha, J. Ham, and D. D. Lee, Spectral methods for dimensionality reduction, in Semi-Supervised Learning, The MIT Press, 09 2006.
- [25] M. Tai, M. Kudo, A. Tanaka, H. Imai, and K. Kimura, Kernelized supervised laplacian eigenmap for visualization and classification of multi-label data, Pattern Recognition, 123 (2022), p. 108399.
- [26] M. A. Turk and A. P. Pentland, Face recognition using eigenfaces, in Proceedings. 1991 IEEE computer society conference on computer vision and pattern recognition, IEEE Computer Society, 1991, pp. 586–587.
- [27] Q.-W. Wang and X. Xu, Iterative algorithms for solving some tensor equations, Linear and Multilinear Algebra, 67 (2019), pp. 1325–1349.
- [28] Y. Wang and Y. Wei, Generalized eigenvalue for even order tensors via einstein product and its applications in multilinear control systems, Computational and Applied Mathematics, 41 (2022), p. 419.
- [29] A. R. Webb, K. D. Copsey, and G. Cawley, Statistical pattern recognition, vol. 2, Wiley Online Library, 2011.
- [30] M.-H. Yang, Face recognition using kernel methods, Advances in neural information processing systems, 14 (2001).
- [31] S.-q. Zhang, Enhanced supervised locally linear embedding, Pattern Recognition Letters, 30 (2009), pp. 1208–1218.
- [32] W. Zhang, X. Xue, H. Lu, and Y.-F. Guo, Discriminant neighborhood embedding for classification, Pattern Recognition, 39 (2006), pp. 2240–2243.
- [33] L. Zhao and Z. Zhang, Supervised locally linear embedding with probability-based distance for classification, Computers & Mathematics with Applications, 57 (2009), pp. 919–926.