
An adaptive weighted self-representation method for incomplete multi-view clustering

Lishan Feng ([email protected]), Guoxu Zhou ([email protected]), Jingya Chang ([email protected], corresponding author)

1. School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou 510520, China
2. School of Automation, Guangdong University of Technology, Guangzhou 510006, China
3. School of Mathematics and Statistics, Center for Mathematics and Interdisciplinary Sciences, Guangdong University of Technology, Guangzhou 510520, China
Abstract

For multi-view data collected in practice, some elements may be missing because of human or machine error. Incomplete multi-view clustering (IMC) groups such incomplete multi-view data according to the characteristics of the various views of the instances. Recently, IMC has attracted much attention and many related methods have been proposed. However, the existing approaches still leave room for improvement in the following aspects: (1) Current methods only consider the differences among views, while the differing influences of individual instances, as well as the distinction between missing values and observed values, are ignored. (2) The updating scheme for the weighting matrix in adaptive weighted algorithms usually relies on an optimization sub-problem whose optimal solution may not be easy to obtain. (3) The adaptive weighted subspace algorithms that can recover the incomplete data are anchor-based. The randomness of the anchor matrix may cause unreliability. To tackle these limitations, we propose an adaptive weighted self-representation (AWSR) subspace method for IMC. The AWSR method tunes the weighting matrix adaptively according to the views of different instances and the recovery process of the missing values. Low-rank and smoothness constraints on the representation matrix make the subspace reveal the underlying features of the dataset accurately. We also analyze the convergence of the block coordinate descent method for our optimization model theoretically. Numerical experiments on five real-world datasets show that the AWSR method is effective and delivers superior results compared with eight other widely used approaches in terms of clustering accuracy (ACC), normalized mutual information (NMI) and Purity.

keywords:
Multi-view clustering, Incomplete data, Subspace learning, Self-representation learning

1 Introduction

The rapid development of science and technology provides many ways to obtain information about an object. One can receive news from different news organizations or media. The same semantic content can be expressed in different languages through a smartphone application. However, in reality, due to human or machine error, such as lost files, anonymization, equipment malfunctions, technical failures and so on, it is common for part of the collected information to be missing. Data missing can be divided into two categories: value missing and view missing. Value missing means that some elements of the features are absent [5]. View missing indicates the complete absence of some features [21, 5], which is a special case of value missing. In this paper, we concentrate on the view-missing incomplete multi-view clustering (IMC) problem.

1.1 Methods from MVC (multi-view clustering) to IMC and the auto-weighted techniques

In this subsection, we review subspace methods for MVC and approaches for dealing with missing data in IMC. Finally, we summarize the auto-weighted techniques used for MVC.

Among MVC methods, the class of subspace clustering methods is widely studied and used [16, 39, 8, 45]. These methods first map the original data into a low-dimensional subspace, and then obtain the clustering results by applying a spectral clustering approach to the subspace. One kind of subspace clustering approach assumes that the features of items in the dataset are correlated and that each item can be represented numerically by the other items. In this context, self-representation subspace clustering methods arise [21, 31]. Another kind of subspace clustering method for MVC is the anchor-based approach, which chooses a small set of points called anchors as the basis of the feature space and generates the subspace of the given data from the anchors [12, 30, 11]. Moreover, Zhao et al. [44] proposed the Reinforced Tensor Graph Neural Network (RTGNN) method to learn multi-view graph data, which employs tensor decomposition to obtain the graph structure features of each view in a common feature space.

Compared to MVC, the missing data in IMC create difficulties for clustering. One way of dealing with missing data is to ignore it or simply fill it with a constant. Deng et al. introduced a missing index matrix to utilize the complete instances and formulated a projection model for IMC [7]. Hu et al. removed the effect of missing data by assigning 0 to it [14]. Zhao et al. suggested using the average vector of the non-missing views to estimate the missing instances [42]. These methods fail to explore the implicit information in the missing data and may lead to inaccurate clustering results. The other way of dealing with missing data is to recover it. A popular technique for restoring missing data is to combine matrix completion and the clustering task throughout the clustering process. Yin et al. reconstructed the incomplete data by adding a regularization term to keep the estimated matrix close enough to the existing data [37]. Liu et al. distinguished the complete and incomplete views and recovered the missing data via a self-representation model [21].

Different views may have different influences on the clustering result. For example, when clustering leaf data, the view derived from the color of a leaf may be more important than the view given by its shape. Treating all features equally may therefore harm clustering. Hence, a weighting matrix is used to adjust the importance of different views [28, 8]. Zhao et al. proposed a constant diagonal matrix to measure the importance of different views [42]. A majority of the weighted MVC methods are auto-weighted: the diagonal elements of the weighting matrix are restricted to the unit interval and sum to one. For subspace MVC methods, the weight is commonly multiplied by the representation error of the corresponding view [36, 34, 29, 30, 12, 22, 33, 18, 23, 15]. The weight is then updated by solving a constrained sub optimization problem whose stationary point always has a closed form. In another case, a Frobenius norm regularization term on the weights is added to smooth the weight distribution [26, 43]. In this case, the sub optimization problem for updating the weights becomes complex.

1.2 Motivation and contribution

Although great progress has been made in the area of IMC, current auto-weighted subspace IMC methods still face challenges. 1) The weighting mechanism multiplies the weight by the total representation error of the corresponding view, so the features of different instances are treated equally. However, the significance of different features varies from instance to instance. For example, the color feature plays different roles when clustering leaves and when clustering persons. Moreover, the known and unknown elements within the same feature are not distinguished and are treated equally. 2) The adaptive updating scheme of the weighting matrix relies on the optimal solution of a constrained optimization problem, whose global solution is not easy to find. 3) The existing auto-weighted subspace IMC methods that can recover the incomplete data are anchor-based methods. The randomness of the anchor matrix induces unreliability of the clustering results.

In order to address the above limitations, we propose an adaptive weighted self-representation subspace clustering (AWSR) method for the IMC problem. The framework of this approach is intuitively shown in Figure 1.

Figure 1: Overview of the algorithm. Here is an example of three views, including RGB, Text and Shape modalities. From left to right, first, feature vectors are extracted from the incomplete multi-view data 𝐗\mathbf{X}. Then the low rank and smoothness properties are imposed on the consensus coefficient matrix 𝐙\mathbf{Z}. Meanwhile, auto-weighted self-representation learning and missing data imputation are jointly performed to obtain the consensus representation matrix. Finally, cluster assignments are given based on the resulting affinity matrix.

In the AWSR model, the construction of the consensus representation matrix is designed in conjunction with the imputation of the missing data. The weighting matrix is adaptively updated according to the values assigned to the missing data and the features of different instances. Unlike traditional methods, we also impose low-rank and smoothness constraints on the representation matrix. The representation matrix, the missing data and the weighting matrix are updated alternately until an optimal point is reached.

In terms of the IMC problem, the contributions of our paper include:

(1) To the best of our knowledge, the proposed AWSR method is one of the first attempts to employ a self-representation subspace algorithm with an auto-weighted matrix that obtains the representation matrix and recovers the incomplete data simultaneously. A novel updating scheme for the weighting matrix is given, which not only avoids solving a sub optimization problem but also treats the values of missing data and existing data differently. Both the low-rank and smoothness properties of the representation matrix are considered.

(2) In the computation process, we introduce a splitting variable that enhances the convexity of the objective function in the optimization model and makes the model easy to solve by the block coordinate descent (BCD) algorithm. In theory, we prove the global convergence of the iteration sequence.

(3) The numerical experiments show the superiority of our method over eight other state-of-the-art approaches for IMC on five benchmark datasets.

2 Notations and related work

Table 1: Summary of the notations.
Notation  Description
\mathbf{X}_{i}  The data matrix of the ith view.
\mathbf{Z}  The consensus coefficient matrix for all views.
\mathbf{W}^{(i)}  The weighting matrix of the ith view.
\mathbf{Z}_{i}  The coefficient matrix of the ith view.
v  The number of views.
\|\cdot\|_{F}  The Frobenius norm of a matrix.
\|\cdot\|_{*}  The nuclear norm of a matrix.
\mathrm{Tr}(\cdot)  The trace of a matrix.
\mathbf{I}  The identity matrix.
\varOmega  The feasible regions of the variables.
\otimes  The Kronecker product.
\text{diag}(\mathbf{Z})=0  The diagonal elements of matrix \mathbf{Z} are zero.

First, we introduce the notation and preliminary knowledge. In this paper, matrices are denoted by bold capital letters, while bold lowercase letters and lowercase letters represent vectors and scalars, respectively. The nuclear norm of a matrix \mathbf{Z} is defined as \left\|\mathbf{Z}\right\|_{*}=\sum_{i}^{n}\sigma_{i}, where \sigma_{i} is the ith singular value of \mathbf{Z}. The Frobenius norm of \mathbf{Z} is \left\|\mathbf{Z}\right\|_{F}=(\sum_{i=1}^{n}\sum_{j=1}^{n}\mathbf{Z}_{ij}^{2})^{\frac{1}{2}}.
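As a quick sanity check on these definitions, the following minimal sketch (an illustration added here, not part of the original derivation) evaluates both norms with NumPy and verifies them against their explicit formulas.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((5, 5))

nuclear = np.linalg.norm(Z, ord="nuc")    # sum of the singular values of Z
frobenius = np.linalg.norm(Z, ord="fro")  # square root of the sum of squared entries

# The explicit definitions used in the text.
assert np.isclose(nuclear, np.linalg.svd(Z, compute_uv=False).sum())
assert np.isclose(frobenius, np.sqrt((Z ** 2).sum()))
```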

Definition 1 (Subspace clustering [24]).

Consider a collection of samples \mathbf{X}=[\mathbf{x}_{1},\cdots,\mathbf{x}_{n}]. Suppose that \mathbf{X} is drawn from a union of k subspaces \{S_{i}\}_{i=1}^{k} with dimensionalities \{D_{i}\}_{i=1}^{k}. The objective of subspace clustering is to segment the samples according to the underlying subspaces from which they are drawn.

The subspace here can be described as S_{i}=\{x\in\mathbb{R}^{D_{i}}:x=\nu_{i}+B_{i}y\}, where \nu_{i} is an arbitrary point in S_{i}, B_{i} is a basis of S_{i}, and y is a low-dimensional representation of x [32]. For example, the data points in the partition part of Figure 1 belong to three subspaces of the space \mathbb{R}^{3}.

One of the most important subspace clustering approaches is the self-representation subspace method [21]. When the number of instances is large, the data of one instance can be represented by the data of the others. The representation matrix characterizes the latent subspace. As a surrogate of the original dataset, the latent subspace usually inherits some properties of the original data, and certain constraints, such as sparsity, low rank or smoothness, are usually imposed on the representation matrix. The task of self-representation subspace multi-view clustering is to solve

\min\limits_{\mathbf{Z}_{*},\{\mathbf{Z}_{i}\}_{i=1}^{v}}\ \sum\limits_{i=1}^{v}\ell(\mathbf{X}_{i}-\mathbf{X}_{i}\mathbf{Z}_{i})+\lambda\mathcal{R}(\mathbf{Z}_{*},\{\mathbf{Z}_{i}\}_{i=1}^{v}),\quad\text{s.t.}\ \varOmega(\mathbf{Z}_{*},\{\mathbf{Z}_{i}\}_{i=1}^{v}),    (1)

where \ell is a loss function measuring the representation error, \mathcal{R} is the regularization term, and \mathbf{Z}_{*} is the consensus coefficient matrix of all views. \varOmega defines the feasible regions of the variables. The regularization term is usually determined by the expected properties of the subspace.

For a complete single-view dataset, if the subspace is low rank, the low rank self-representation model is [19]

\min\limits_{\mathbf{Z}_{1}}\ \left\|\mathbf{Z}_{1}\right\|_{*},\quad\text{s.t.}\ \mathbf{X}_{1}=\mathbf{X}_{1}\mathbf{Z}_{1},\ \text{diag}(\mathbf{Z}_{1})=0.    (2)

In order to better capture the within-cluster affinities and the smoothness property, the self-representation model becomes [24]

\min\limits_{\mathbf{Z}_{1}}\ \left\|\mathbf{X}_{1}-\mathbf{X}_{1}\mathbf{Z}_{1}\right\|_{F}^{2}+\lambda\left\|\mathbf{Z}_{1}\right\|_{F}^{2},\quad\text{s.t.}\ \text{diag}(\mathbf{Z}_{1})=0.    (3)

Following the framework of (Eq.(3)), the complete multi-view subspace clustering problem can be written as

\min\limits_{\mathbf{Z}_{i}}\ \frac{1}{v}\sum\limits_{i=1}^{v}\left\|\mathbf{X}_{i}-\mathbf{X}_{i}\mathbf{Z}_{i}\right\|_{F}^{2}+\lambda\left\|\mathbf{Z}_{i}\right\|_{F}^{2},\quad\text{s.t.}\ \text{diag}(\mathbf{Z}_{i})=0.    (4)
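For concreteness, the following sketch solves the per-view problem of Eq.(3)/(4). It is an assumed illustration: the data are synthetic, and the diag(\mathbf{Z}_{1})=0 constraint is only approximated by zeroing the diagonal after the unconstrained ridge solve rather than enforced exactly.

```python
import numpy as np

def self_representation(X, lam):
    """Unconstrained closed form of min_Z ||X - XZ||_F^2 + lam ||Z||_F^2,
    i.e. (X^T X + lam I) Z = X^T X, with the diagonal zeroed afterwards."""
    n = X.shape[1]
    G = X.T @ X
    Z = np.linalg.solve(G + lam * np.eye(n), G)
    np.fill_diagonal(Z, 0.0)  # crude surrogate for the diag(Z) = 0 constraint
    return Z

rng = np.random.default_rng(1)
X1 = rng.standard_normal((50, 120))   # one view: 50 features, 120 instances
Z1 = self_representation(X1, lam=0.1)
print(Z1.shape)                        # (120, 120)
```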

3 The proposed AWSR model

When the data are incomplete, the missing data matrix is explicitly expressed and involved in the optimization process. Liu et al. proposed the following model [21]:

\min\limits_{\mathbf{Z},\{\mathbf{X}_{i}^{(m)}\}_{i=1}^{v},\mathbf{F}}\ \frac{1}{v}\sum\limits_{i=1}^{v}\left\|[\mathbf{X}_{i}^{(o)},\mathbf{X}_{i}^{(m)}]-[\mathbf{X}_{i}^{(o)},\mathbf{X}_{i}^{(m)}]\mathbf{Z}\right\|_{F}^{2}+\lambda\left\|\mathbf{Z}\right\|_{F}^{2}+\gamma\left\|\mathbf{Z}-\mathbf{F}^{\top}\mathbf{F}\right\|_{F}^{2},\quad\text{s.t.}\ \text{diag}(\mathbf{Z})=0,\ \mathbf{F}\mathbf{F}^{\top}=\mathbf{I}_{k},    (5)

where \mathbf{X}_{i}^{(m)} stores the missing data of the ith view and \mathbf{X}_{i}^{(o)} stores the observed data of the ith view. In this model, all views share a consensus coefficient matrix \mathbf{Z}.

Since different views influence the clustering result differently, it has become common to include self- or auto-weighted techniques in MVC models. Two general auto-weighted models are widely used [12, 34, 26, 43]:

\min\limits_{\mathbf{Z}_{*},\{\mathbf{Z}_{i}\}_{i=1}^{v}}\ \sum\limits_{i=1}^{v}\mathrm{w}_{i}\,\ell(\mathbf{X}_{i}-\mathbf{X}_{i}\mathbf{Z}_{i})+\lambda\mathcal{R}(\mathbf{Z}_{*},\{\mathbf{Z}_{i}\}_{i=1}^{v}),\quad\text{s.t.}\ \varOmega(\mathbf{Z}_{*},\{\mathbf{Z}_{i}\}_{i=1}^{v},\mathbf{W}),    (6)

and

\min\limits_{\mathbf{Z}_{*},\{\mathbf{Z}_{i}\}_{i=1}^{v}}\ \sum\limits_{i=1}^{v}\mathrm{w}_{i}\,\ell(\mathbf{X}_{i}-\mathbf{X}_{i}\mathbf{Z}_{i})+\lambda_{1}\mathcal{R}(\mathbf{Z}_{*},\{\mathbf{Z}_{i}\}_{i=1}^{v})+\lambda_{2}\mathcal{R}(\mathbf{W}),\quad\text{s.t.}\ \varOmega(\mathbf{Z}_{*},\{\mathbf{Z}_{i}\}_{i=1}^{v},\mathbf{W}).    (7)

The vector \mathbf{w}=(\mathrm{w}_{1},\ldots,\mathrm{w}_{v}) collects the diagonal entries of the diagonal weighting matrix \mathbf{W}. The regularization term \mathcal{R}(\mathbf{W}) is the Frobenius norm in [26, 43].

As pointed out in [8], when coping with various real-life applications, auto-weighting should be more adaptive and automatic. Existing weighting matrices concentrate only on the differences among the influences of different views; the differences among instances, as well as the dissimilarities between missing values and observed values, are neglected. Besides, the weighting matrix is updated by solving a sub optimization problem, which may be troublesome. To overcome these difficulties, we suggest a new way to produce the weighting matrix automatically. Initially, the weighting matrix is a diagonal matrix whose diagonal elements are

\mathbf{W}_{jj}^{(i)}=\begin{cases}1,&\text{if the }j\text{th item is observed in the }i\text{th view},\\ 0,&\text{otherwise}.\end{cases}    (8)

Because a missing view is totally unknown at first, the zero diagonal elements of this weighting matrix make the representation error caused by the corresponding missing view zero. At the beginning, the values of the data in the missing views are set to zero. During the iteration process, some elements of a missing view are recovered while others remain zero. In this case, the influence of the recovered data should be taken into account. Therefore, the elements of the weighting matrix at the positions where the missing data are recovered are set to

\mathbf{W}_{mj}^{(i)}=\begin{cases}\alpha_{mj}^{(i)},&\text{if }\mathbf{X}_{i,mj}\text{ is recovered or nonzero},\\ 0,&\text{otherwise}.\end{cases}    (9)

This updating scheme means that if the mth element of the jth instance in the ith missing view, \mathbf{X}_{i,mj}, is recovered, then \mathbf{W}_{mj}^{(i)}=\alpha_{mj}^{(i)}. Here \alpha_{mj}^{(i)} is nonzero and determined by either the value of the missing element \mathbf{X}_{i,mj} or an empirical constant. On the other hand, if the nth element of the jth instance in the ith missing view, \mathbf{X}_{i,nj}, is zero, then \mathbf{W}_{nj}^{(i)}=0. We illustrate the influence tuned by \mathbf{W}_{mj}^{(i)} and \mathbf{W}_{nj}^{(i)} in detail. Let \mathbf{E}^{i}=\mathbf{X}_{i}-\mathbf{X}_{i}\mathbf{Z}. The matrix product

\mathbf{E}^{i}(\mathbf{W}^{(i)})^{\top}=\left(\begin{array}{ccc}\cdots&\mathbf{E}^{i}_{1j}&\cdots\\ &\vdots&\\ \cdots&\mathbf{E}^{i}_{pj}&\cdots\end{array}\right)\left(\begin{array}{ccccc}\cdots&\vdots&\cdots&\vdots&\cdots\\ \cdots&\mathbf{W}_{mj}^{(i)}&\cdots&\mathbf{W}_{nj}^{(i)}&\cdots\\ \cdots&\vdots&\cdots&\vdots&\cdots\end{array}\right)

implies that \mathbf{W}_{mj}^{(i)}=\alpha_{mj}^{(i)} affirms the effect of the jth instance in the ith view on the clustering result, while \mathbf{W}_{nj}^{(i)}=0 suppresses it. Therefore, by utilizing the adaptive weighting matrix we ensure that the recovered data influence the clustering result while the influence of the unrecovered data is eliminated, and the influences of different instances in the same view are distinguished. Furthermore, this updating scheme avoids solving an optimization sub-problem for \mathbf{W}^{(i)}.
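The sketch below gives one possible reading of the rule in Eq.(8)-(9). It is an assumption-laden illustration: \mathbf{W}^{(i)} is stored element-wise with the same shape as \mathbf{X}_{i}, observed instances receive weight 1, imputed (now nonzero) entries of missing instances receive \alpha (set to 1 in Section 5.4.1), and entries that are still zero keep weight 0.

```python
import numpy as np

def update_weight(X_i, missing_mask, alpha=1.0):
    """Return W^{(i)} with the same shape as X_i.

    X_i          : i-th view with missing instances zero-filled / imputed.
    missing_mask : boolean array, True where the entry belongs to a missing instance.
    """
    W = np.zeros_like(X_i, dtype=float)
    W[~missing_mask] = 1.0                    # Eq.(8): observed entries keep weight 1
    recovered = missing_mask & (X_i != 0)     # Eq.(9): entries that have been recovered
    W[recovered] = alpha
    return W                                  # still-zero missing entries keep weight 0
```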

In order to obtain a low-rank and smooth coefficient matrix, the nuclear norm and the Frobenius norm are employed as regularization terms, which also capture the data correlation structure in the same space and help to solve the minimization problem effectively. Finally, the overall model is

\min\limits_{\mathbf{Z},\{\mathbf{X}_{i}^{(m)}\}_{i=1}^{v}}\ \sum\limits_{i=1}^{v}\left\|([\mathbf{X}_{i}^{(o)},\mathbf{X}_{i}^{(m)}]-[\mathbf{X}_{i}^{(o)},\mathbf{X}_{i}^{(m)}]\mathbf{Z}){\mathbf{W}^{(i)}}^{\top}\right\|_{F}^{2}+\gamma\left\|\mathbf{Z}\right\|_{*}+\frac{\lambda}{2}\left\|\mathbf{Z}\right\|_{F}^{2},\quad\text{s.t.}\ \text{diag}(\mathbf{Z})=0,    (10)

where \gamma,\lambda are positive trade-off parameters and \mathbf{X}_{i}^{(o)} and \mathbf{X}_{i}^{(m)} refer to the observed and missing entries of the ith data view, respectively.

In summary, the entire framework of our method is as follows. First, we construct the adaptive weighted linearly constrained optimization model (Eq.(10)) based on the subspace representation approach. Then, problem (Eq.(10)) is solved iteratively by the BCD method. Finally, the Laplacian (affinity) matrix is generated from the coefficient matrix \mathbf{Z} and the spectral clustering approach [4] is employed to cluster the given objects.
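A minimal sketch of the last two steps of this pipeline (assuming scikit-learn is available for the spectral clustering step and that the coefficient matrix Z has already been learned) is:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_from_Z(Z, k):
    """Affinity L = (|Z| + |Z^T|)/2 followed by spectral clustering into k clusters."""
    L = 0.5 * (np.abs(Z) + np.abs(Z.T))
    sc = SpectralClustering(n_clusters=k, affinity="precomputed", random_state=0)
    return sc.fit_predict(L)
```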

4 Computation of the AWSR model

In this section, we describe the computation process for the optimization model (Eq.(10)). First, we note that the nonzero elements of the weighting matrix \mathbf{W} are determined empirically according to the numerical results. In order to enhance the convexity of the objective function and obtain a separable optimization model, we introduce a splitting variable \mathbf{J} and convert model (Eq.(10)) into

\min\limits_{\mathbf{J},\mathbf{Z},\{\mathbf{X}_{i}^{(m)}\}_{i=1}^{v}}\ \gamma\left\|\mathbf{Z}\right\|_{*}+\frac{\lambda}{2}\left\|\mathbf{Z}\right\|_{F}^{2}+\frac{\alpha}{2}\left\|\mathbf{J}-\mathbf{Z}\right\|_{F}^{2}+\sum\limits_{i=1}^{v}\left\|([\mathbf{X}_{i}^{(o)},\mathbf{X}_{i}^{(m)}]-[\mathbf{X}_{i}^{(o)},\mathbf{X}_{i}^{(m)}]\mathbf{J}){\mathbf{W}^{(i)}}^{\top}\right\|_{F}^{2},\quad\text{s.t.}\ \text{diag}(\mathbf{Z})=0.    (11)

Problems (Eq.(10)) and (Eq.(11)) are equivalent when \alpha>0 is large enough. The strong convexity of the objective function implies that the solution is stable and unique. We apply an alternating strategy [20] to the computation of the above optimization problem.

4.1 Sub-problem for J:

To update \mathbf{J}, we fix the values of all other variables. The sub-problem related to \mathbf{J} is

\min\limits_{\mathbf{J}}\ \psi(\mathbf{J})=\sum\limits_{i=1}^{v}\left\|([\mathbf{X}_{i}^{(o)},\mathbf{X}_{i}^{(m)}]-[\mathbf{X}_{i}^{(o)},\mathbf{X}_{i}^{(m)}]\mathbf{J}){\mathbf{W}^{(i)}}^{\top}\right\|_{F}^{2}+\frac{\alpha}{2}\left\|\mathbf{J}-\mathbf{Z}\right\|_{F}^{2}.    (12)

Problem (Eq.(12)) is equivalent to

\min\limits_{\mathbf{J}}\ \psi(\mathbf{J})=\mathrm{Tr}\Big(\sum\limits_{i=1}^{v}\mathbf{X}_{i}\mathbf{J}{\mathbf{W}^{(i)}}^{\top}\mathbf{W}^{(i)}\mathbf{J}^{\top}{\mathbf{X}_{i}}^{\top}-\mathbf{X}_{i}\mathbf{J}{\mathbf{W}^{(i)}}^{\top}\mathbf{W}^{(i)}{\mathbf{X}_{i}}^{\top}-\mathbf{X}_{i}{\mathbf{W}^{(i)}}^{\top}\mathbf{W}^{(i)}{\mathbf{J}}^{\top}{\mathbf{X}_{i}}^{\top}\Big)+\frac{\alpha}{2}\mathrm{Tr}(\mathbf{J}^{\top}\mathbf{J}-\mathbf{J}^{\top}\mathbf{Z}-\mathbf{Z}^{\top}\mathbf{J}),    (13)

in which \mathbf{X}_{i}=[\mathbf{X}_{i}^{(o)},\mathbf{X}_{i}^{(m)}]. Taking the derivative of the objective function in (Eq.(13)) with respect to \mathbf{J} and setting it to zero, we get

\sum\limits_{i=1}^{v}{\mathbf{X}_{i}}^{\top}\mathbf{X}_{i}\mathbf{J}{\mathbf{W}^{(i)}}^{\top}\mathbf{W}^{(i)}+\frac{\alpha}{2}\mathbf{J}-\sum\limits_{i=1}^{v}{\mathbf{X}_{i}}^{\top}\mathbf{X}_{i}{\mathbf{W}^{(i)}}^{\top}\mathbf{W}^{(i)}-\frac{\alpha}{2}\mathbf{Z}=0.    (14)

Since the matrix equation \mathbf{A}\mathbf{X}\mathbf{B}=\mathbf{C} is equivalent to [25]

(\mathbf{B}^{\top}\otimes\mathbf{A})\,\text{vec}(\mathbf{X})=\text{vec}(\mathbf{C}),

where \otimes denotes the Kronecker product, (Eq.(14)) is equivalent to

\left[\sum\limits_{i=1}^{v}{\mathbf{W}^{(i)}}^{\top}\mathbf{W}^{(i)}\otimes{\mathbf{X}_{i}}^{\top}\mathbf{X}_{i}+\frac{\alpha}{2}\mathbf{I}\right]\text{vec}(\mathbf{J})=\text{vec}(\mathbf{C}).    (15)

Here \mathbf{C}=\sum\limits_{i=1}^{v}{\mathbf{X}_{i}}^{\top}\mathbf{X}_{i}{\mathbf{W}^{(i)}}^{\top}\mathbf{W}^{(i)}+\frac{\alpha}{2}\mathbf{Z}. The coefficient matrix of the linear system (Eq.(15)) is positive definite. Hence, \text{vec}(\mathbf{J}) can be computed by the classical conjugate gradient method, and we obtain the matrix \mathbf{J} by rearranging \text{vec}(\mathbf{J}).
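The sketch below is an assumed reimplementation (not the authors' code) of this step: it runs conjugate gradients directly on the matrix equation (Eq.(14)), so the Kronecker product in (Eq.(15)) is never formed; the linear operator is applied in product form and inner products are taken in the Frobenius sense.

```python
import numpy as np

def solve_J(Xs, Ws, Z, alpha, iters=200, tol=1e-8):
    """Solve sum_i X_i^T X_i J W_i^T W_i + (alpha/2) J = C  (Eq.(14)) for J."""
    XtX = [X.T @ X for X in Xs]          # n x n Gram matrices of the views
    WtW = [W.T @ W for W in Ws]          # n x n weighted Gram matrices
    A = lambda J: sum(G @ J @ H for G, H in zip(XtX, WtW)) + 0.5 * alpha * J
    C = sum(G @ H for G, H in zip(XtX, WtW)) + 0.5 * alpha * Z

    J = np.zeros_like(Z)
    R = C - A(J)                         # residual of the matrix equation
    P = R.copy()
    rs = np.sum(R * R)
    for _ in range(iters):
        AP = A(P)
        step = rs / np.sum(P * AP)
        J += step * P
        R -= step * AP
        rs_new = np.sum(R * R)
        if np.sqrt(rs_new) < tol:
            break
        P = R + (rs_new / rs) * P
        rs = rs_new
    return J
```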

4.2 Sub-problem for \mathbf{X}_{i}^{(m)}:

To update \mathbf{X}_{i}^{(m)}, we fix the values of all other variables. The sub-problem related to \mathbf{X}_{i}^{(m)} is

\min\limits_{\mathbf{X}_{i}^{(m)}}\ \psi(\mathbf{X}_{i}^{(m)})=\sum\limits_{i=1}^{v}\left\|([\mathbf{X}_{i}^{(o)},\mathbf{X}_{i}^{(m)}]-[\mathbf{X}_{i}^{(o)},\mathbf{X}_{i}^{(m)}]\mathbf{J}){\mathbf{W}^{(i)}}^{\top}\right\|_{F}^{2}.    (16)

Problem (Eq.(16)) can be transformed into

\min\limits_{\mathbf{X}_{i}^{(m)}}\ \psi(\mathbf{X}_{i}^{(m)})=\mathrm{Tr}\big([\mathbf{X}_{i}^{(o)},\mathbf{X}_{i}^{(m)}]\,\mathbf{B}_{i}\,{[\mathbf{X}_{i}^{(o)},\mathbf{X}_{i}^{(m)}]}^{\top}\big),\quad\text{for}\ i=1,\ldots,v,    (17)

where

\mathbf{B}_{i}={\mathbf{W}^{(i)}}^{\top}\mathbf{W}^{(i)}-\mathbf{J}{\mathbf{W}^{(i)}}^{\top}\mathbf{W}^{(i)}-{\mathbf{W}^{(i)}}^{\top}\mathbf{W}^{(i)}\mathbf{J}^{\top}+\mathbf{J}{\mathbf{W}^{(i)}}^{\top}\mathbf{W}^{(i)}\mathbf{J}^{\top}.    (18)

Because \mathbf{X}_{i} is divided into observed and missing data, i.e., \mathbf{X}_{i}=[\mathbf{X}_{i}^{(o)},\mathbf{X}_{i}^{(m)}], we rewrite \mathbf{B}_{i} in block form. Then (Eq.(17)) is equivalent to

\min\limits_{\mathbf{X}_{i}^{(m)}}\ \psi(\mathbf{X}_{i}^{(m)})=\mathrm{Tr}\left([\mathbf{X}_{i}^{(o)},\mathbf{X}_{i}^{(m)}]\left(\begin{array}{cc}\mathbf{B}_{i}^{(oo)}&\mathbf{B}_{i}^{(om)}\\ \mathbf{B}_{i}^{(mo)}&\mathbf{B}_{i}^{(mm)}\end{array}\right){[\mathbf{X}_{i}^{(o)},\mathbf{X}_{i}^{(m)}]}^{\top}\right).    (19)

Dropping the terms that do not depend on \mathbf{X}_{i}^{(m)}, problem (Eq.(19)) can further be written as

\min\limits_{\mathbf{X}_{i}^{(m)}}\ \psi(\mathbf{X}_{i}^{(m)})=\mathrm{Tr}\big(\mathbf{X}_{i}^{(o)}(\mathbf{B}_{i}^{(om)}+{\mathbf{B}_{i}^{(mo)}}^{\top}){\mathbf{X}_{i}^{(m)}}^{\top}+\mathbf{X}_{i}^{(m)}\mathbf{B}_{i}^{(mm)}{\mathbf{X}_{i}^{(m)}}^{\top}\big).    (20)

From the definition of \mathbf{B}_{i}, we have

\mathbf{B}_{i}^{(mm)}=\left(\mathbf{W}^{(i)}-\mathbf{J}\mathbf{W}^{(i)}\right)^{(m)}\left[\left(\mathbf{W}^{(i)}-\mathbf{J}\mathbf{W}^{(i)}\right)^{(m)}\right]^{\top},    (21)

in which (\mathbf{W}^{(i)}-\mathbf{J}\mathbf{W}^{(i)})^{(m)} refers to the columns of \mathbf{W}^{(i)}-\mathbf{J}\mathbf{W}^{(i)} corresponding to the weighted missing entries of the ith view. Taking the derivative of the objective function in (Eq.(20)) with respect to \mathbf{X}_{i}^{(m)} and setting it to zero, we get

\mathbf{X}_{i}^{(m)}\mathbf{B}_{i}^{(mm)}=-\mathbf{X}_{i}^{(o)}\mathbf{B}_{i}^{(om)}.    (22)

Because \mathbf{B}_{i}^{(mm)} is a positive semi-definite matrix, we calculate its eigenvalue decomposition \mathbf{B}_{i}^{(mm)}=\mathbf{U}_{i}\mathbf{\Lambda}_{i}\mathbf{V}_{i}^{\top}. The numerical solution of the equation system (Eq.(22)) is

\mathbf{X}_{i}^{(m)}=-\mathbf{X}_{i}^{(o)}\mathbf{B}_{i}^{(om)}\mathbf{V}_{i}\mathbf{\Lambda}_{i}^{+}{\mathbf{U}_{i}}^{\top}.    (23)

Here \mathbf{\Lambda}_{i}^{+} is the pseudo-inverse of \mathbf{\Lambda}_{i}.
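A minimal sketch (assumed reimplementation) of this closed-form update, computing the pseudo-inverse of the positive semi-definite block \mathbf{B}_{i}^{(mm)} through its eigenvalue decomposition as in Eq.(23), is:

```python
import numpy as np

def update_missing(X_obs, B_om, B_mm, eps=1e-10):
    """Solve X_m B_mm = -X_obs B_om (Eq.(22)) via the pseudo-inverse of B_mm."""
    vals, U = np.linalg.eigh(B_mm)                     # B_mm = U diag(vals) U^T
    inv_vals = np.where(vals > eps, 1.0 / vals, 0.0)   # pseudo-inverse of the spectrum
    B_mm_pinv = (U * inv_vals) @ U.T
    return -X_obs @ B_om @ B_mm_pinv                   # Eq.(23)
```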

4.3 Sub-problem for \mathbf{Z}:

To update \mathbf{Z}, we fix the values of all other variables and solve the following sub-problem:

\min\limits_{\mathbf{Z}}\ \psi(\mathbf{Z})=\gamma\left\|\mathbf{Z}\right\|_{*}+\frac{\lambda}{2}\left\|\mathbf{Z}\right\|_{F}^{2}+\frac{\alpha}{2}\left\|\mathbf{J}-\mathbf{Z}\right\|_{F}^{2},\quad\text{s.t.}\ \text{diag}(\mathbf{Z})=0.    (24)

Rearranging the objective function in (Eq.(24)), we obtain the equivalent model

\min\limits_{\mathbf{Z}}\ \psi(\mathbf{Z})=\frac{\gamma}{\lambda+\alpha}\left\|\mathbf{Z}\right\|_{*}+\frac{1}{2}\left\|\mathbf{Z}-\frac{\alpha}{\lambda+\alpha}\mathbf{J}\right\|_{F}^{2},\quad\text{s.t.}\ \text{diag}(\mathbf{Z})=0.    (25)

Denote the Lagrange function

\mathcal{L}(\mathbf{Z},\mathbf{y})=\psi(\mathbf{Z})+\langle\mathbf{y},\text{diag}(\mathbf{Z})\rangle.

Then the dual function of the Lagrange multiplier \mathbf{y} is g(\mathbf{y})=\inf\limits_{\mathbf{Z}}\mathcal{L}(\mathbf{Z},\mathbf{y}). The optimization problem (Eq.(25)) is convex and satisfies the weak Slater condition, so strong duality holds, which means

\sup\limits_{\mathbf{y}}\inf\limits_{\mathbf{Z}}\mathcal{L}(\mathbf{Z},\mathbf{y})=\mathcal{L}(\mathbf{Z}^{*},\mathbf{y}^{*})=\inf\limits_{\mathbf{Z}}\sup\limits_{\mathbf{y}}\mathcal{L}(\mathbf{Z},\mathbf{y}).    (26)

We employ Uzawa's algorithm [2] to solve the dual problem of (Eq.(25)). The gradient of g(\mathbf{y}) is given by

\nabla g(\mathbf{y})=\frac{\partial\mathcal{L}(\tilde{\mathbf{Z}},\mathbf{y})}{\partial\mathbf{y}}=\text{diag}(\tilde{\mathbf{Z}}),    (27)

where \tilde{\mathbf{Z}}=\arg\min\limits_{\mathbf{Z}}\mathcal{L}(\mathbf{Z},\mathbf{y}). Uzawa's algorithm proceeds iteratively via the following scheme:

\begin{cases}\mathcal{L}(\mathbf{Z}^{t},\mathbf{y}^{t-1})=\min\limits_{\mathbf{Z}}\mathcal{L}(\mathbf{Z},\mathbf{y}^{t-1}),\\ \mathbf{y}^{t}=\mathbf{y}^{t-1}+\delta_{k}\,\text{diag}(\mathbf{Z}^{t}).\end{cases}    (28)

The parameter \delta_{k} is a step size along the gradient direction. Next, we consider the minimization problem \min\limits_{\mathbf{Z}}\mathcal{L}(\mathbf{Z},\mathbf{y}^{t-1}) in (Eq.(28)). Substituting the formula in (Eq.(25)) for \psi(\mathbf{Z}), we obtain

\mathcal{L}(\mathbf{Z},\mathbf{y}^{t-1})=\psi(\mathbf{Z})+\langle\mathbf{y}^{t-1},\text{diag}(\mathbf{Z})\rangle=\psi(\mathbf{Z})+\langle\mathbf{Y}^{t-1},\mathbf{Z}\rangle.    (29)

Here \mathbf{Y} is a matrix of the same size as \mathbf{Z}; its diagonal elements equal \mathbf{y} and its off-diagonal elements are zero. Furthermore, we have the equivalent model

\arg\min\limits_{\mathbf{Z}}\mathcal{L}(\mathbf{Z},\mathbf{y}^{t-1})=\arg\min\limits_{\mathbf{Z}}\ \frac{\gamma}{\lambda+\alpha}\left\|\mathbf{Z}\right\|_{*}+\frac{1}{2}\left\|\mathbf{Z}-\frac{\alpha}{\lambda+\alpha}\Big(\mathbf{J}+\frac{\mathbf{Y}^{t-1}}{\alpha}\Big)\right\|_{F}^{2}.    (30)

Based on (Eq.(30)), the minimization problem in (Eq.(28)) can be solved efficiently by the singular value thresholding operator [3], and Uzawa's process becomes

\begin{cases}\mathbf{Z}^{t}=\mathcal{D}_{\frac{\gamma}{\lambda+\alpha}}\left(\frac{\alpha}{\lambda+\alpha}\Big(\mathbf{J}+\frac{\mathbf{Y}^{t-1}}{\alpha}\Big)\right),\\ \mathbf{y}^{t}=\mathbf{y}^{t-1}+\delta_{k}\,\text{diag}(\mathbf{Z}^{t}),\end{cases}    (31)

where

\mathcal{D}_{\frac{\gamma}{\lambda+\alpha}}\left(\frac{\alpha}{\lambda+\alpha}\Big(\mathbf{J}+\frac{\mathbf{Y}^{t-1}}{\alpha}\Big)\right)=\mathbf{U}\mathcal{D}_{\frac{\gamma}{\lambda+\alpha}}(\mathbf{\Sigma})\mathbf{V}^{\top},\qquad\mathcal{D}_{\frac{\gamma}{\lambda+\alpha}}(\mathbf{\Sigma})=\text{diag}\{\max(0,\sigma_{i}-\tfrac{\gamma}{\lambda+\alpha})\},

and \mathbf{U}\mathbf{\Sigma}\mathbf{V}^{\top} is the singular value decomposition of the matrix \frac{\alpha}{\lambda+\alpha}(\mathbf{J}+\frac{\mathbf{Y}^{t-1}}{\alpha}) with \mathbf{\Sigma}=\text{diag}(\sigma_{i}).
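A minimal sketch (assumed reimplementation) of one Uzawa iteration from Eq.(31), i.e. a singular value thresholding step on \mathbf{Z} followed by the dual ascent step on the diag(\mathbf{Z})=0 constraint, is:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding operator D_tau(M)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def uzawa_step(J, y, alpha, lam, gamma, delta):
    """One step of Eq.(31); y is the dual variable for diag(Z) = 0."""
    Y = np.diag(y)                                     # y on the diagonal, zeros elsewhere
    Z = svt(alpha / (lam + alpha) * (J + Y / alpha), gamma / (lam + alpha))
    y_new = y + delta * np.diag(Z)                     # dual ascent step
    return Z, y_new
```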

When the iteration converges, we obtain the consensus coefficient matrix \mathbf{Z}. Finally, we apply the spectral clustering technique [4] to the affinity matrix \mathbf{L}=(|\mathbf{Z}|+|\mathbf{Z}^{\top}|)/2 to compute the cluster assignments. The procedure for solving our model (Eq.(11)) is summarized in Algorithm 1.

Algorithm 1 An adaptive weighted self-representation method for incomplete multi-view clustering (AWSR)
1: Input: initial weighting matrices \mathbf{W}_{0}^{(i)}, incomplete multi-view data \{[\mathbf{X}_{i}^{(o)},\mathbf{X}_{i}^{(m)}]\}_{i=1}^{v}, initial matrices \mathbf{Z}_{0}, \mathbf{J}_{0}, the number of clusters k, parameters \gamma, \lambda, \alpha, error=1e-3;
2: while abs((obj(t-1)-obj(t))/obj(t)) > error do
3:     Update \mathbf{J} by solving Eq.(15);
4:     Update \{\mathbf{X}_{i}^{(m)}\}_{i=1}^{v} by Eq.(23);
5:     Update \mathbf{Z} by Eq.(31);
6:     t = t + 1;
7: end while
8: Obtain the affinity matrix \mathbf{L}=(|\mathbf{Z}|+|\mathbf{Z}^{\top}|)/2;
9: Apply the spectral clustering method to the affinity matrix \mathbf{L};
10: Output: clustering result \mathcal{C}.

4.4 Convergence and complexity analysis

In this section, we prove that if the iteration sequence generated by the AWSR algorithm is infinite, it converges to a stationary point globally. Also, we demonstrate the convergence property of the sequences of objective function values numerically.

Let \tilde{\mathbf{X}}^{(m)}=\{\mathbf{X}_{1}^{(m)},\mathbf{X}_{2}^{(m)},\ldots,\mathbf{X}_{v}^{(m)}\}. First, we establish the properties of \{\mathbf{J}^{(t)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)}\} in the following lemma.

Lemma 1.

The sequence \{\mathbf{J}^{(t)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)}\} generated by Algorithm 1 satisfies the following properties:

(1) The sequence of function values f(\mathbf{J}^{(t)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)}) is monotonically decreasing. In particular, we have

f(\mathbf{J}^{(t+1)},\mathbf{Z}^{(t+1)},\tilde{\mathbf{X}}^{(m)(t+1)})\leqslant f(\mathbf{J}^{(t)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)})-\frac{\alpha}{2}\left\|\mathbf{J}^{(t+1)}-\mathbf{J}^{(t)}\right\|_{F}^{2}-\frac{\lambda}{2}\left\|\mathbf{Z}^{(t+1)}-\mathbf{Z}^{(t)}\right\|_{F}^{2}.    (32)

(2) \lim\limits_{t\rightarrow\infty}f(\mathbf{J}^{(t)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)})=c for some constant c.

(3) As t\rightarrow+\infty, \mathbf{J}^{(t+1)}-\mathbf{J}^{(t)}\rightarrow 0, \mathbf{Z}^{(t+1)}-\mathbf{Z}^{(t)}\rightarrow 0 and \tilde{\mathbf{X}}^{(m)(t+1)}-\tilde{\mathbf{X}}^{(m)(t)}\rightarrow 0.

(4) The sequences \{\mathbf{J}^{(t)}\}, \{\mathbf{Z}^{(t)}\} and \{\tilde{\mathbf{X}}^{(m)(t)}\} are bounded.

Proof.

(1). The function f(\mathbf{J},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)}) of \mathbf{J} in the tth update of \mathbf{J} (Eq.(12)) is \alpha-strongly convex. Therefore, we have

f(\mathbf{J}^{(t+1)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)})\leqslant f(\mathbf{J}^{(t)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)})-\frac{\alpha}{2}\left\|\mathbf{J}^{(t+1)}-\mathbf{J}^{(t)}\right\|_{F}^{2}.    (33)

The function f(\mathbf{J}^{(t+1)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)}) of \tilde{\mathbf{X}}^{(m)} in the tth update of \tilde{\mathbf{X}}^{(m)} (Eq.(16)) is convex, so the following inequality holds:

f(\mathbf{J}^{(t+1)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t+1)})\leq f(\mathbf{J}^{(t+1)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)}).    (34)

Similarly, the function f(\mathbf{J}^{(t+1)},\mathbf{Z},\tilde{\mathbf{X}}^{(m)(t+1)}) of \mathbf{Z} is \lambda-strongly convex in the sub-problem (Eq.(24)), and we have

f(\mathbf{J}^{(t+1)},\mathbf{Z}^{(t+1)},\tilde{\mathbf{X}}^{(m)(t+1)})\leqslant f(\mathbf{J}^{(t+1)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t+1)})-\frac{\lambda}{2}\left\|\mathbf{Z}^{(t+1)}-\mathbf{Z}^{(t)}\right\|_{F}^{2}.    (35)

Summing (Eq.(33)), (Eq.(34)) and (Eq.(35)) yields the inequality

f(\mathbf{J}^{(t+1)},\mathbf{Z}^{(t+1)},\tilde{\mathbf{X}}^{(m)(t+1)})\leqslant f(\mathbf{J}^{(t)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)})-\frac{\lambda}{2}\left\|\mathbf{Z}^{(t+1)}-\mathbf{Z}^{(t)}\right\|_{F}^{2}-\frac{\alpha}{2}\left\|\mathbf{J}^{(t+1)}-\mathbf{J}^{(t)}\right\|_{F}^{2},    (36)

which means the function values f(\mathbf{J}^{(t)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)}) decrease as t increases.

(2). Because f(\mathbf{J}^{(t)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)}) is non-negative and monotonically decreasing, it converges as t tends to infinity. That is,

\lim\limits_{t\rightarrow+\infty}f(\mathbf{J}^{(t)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)})=c\ \text{for some constant }c.

(3). Based on (Eq.(36)), we have

\sum\limits_{t=0}^{+\infty}\Big[\frac{\lambda}{2}\left\|\mathbf{Z}^{(t+1)}-\mathbf{Z}^{(t)}\right\|_{F}^{2}+\frac{\alpha}{2}\left\|\mathbf{J}^{(t+1)}-\mathbf{J}^{(t)}\right\|_{F}^{2}\Big]\leqslant\sum\limits_{t=0}^{+\infty}\big[f(\mathbf{J}^{(t)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)})-f(\mathbf{J}^{(t+1)},\mathbf{Z}^{(t+1)},\tilde{\mathbf{X}}^{(m)(t+1)})\big]\leq f(\mathbf{J}^{(0)},\mathbf{Z}^{(0)},\tilde{\mathbf{X}}^{(m)(0)}).    (37)

Hence, \mathbf{J}^{(t+1)}-\mathbf{J}^{(t)}\rightarrow 0 and \mathbf{Z}^{(t+1)}-\mathbf{Z}^{(t)}\rightarrow 0. In addition, the rule for updating \tilde{\mathbf{X}}^{(m)(t)} indicates

f(\mathbf{J}^{(t+1)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t+1)})\leq f(\mathbf{J}^{(t+1)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)}),

and then we obtain \tilde{\mathbf{X}}^{(m)(t+1)}-\tilde{\mathbf{X}}^{(m)(t)}\rightarrow 0 from (Eq.(17)).

(4). Since the value f(\mathbf{J}^{(t)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)}) is bounded and each term in (Eq.(11)) is nonnegative, the sequences \{\mathbf{J}^{(t)}\}, \{\mathbf{Z}^{(t)}\} and \{\tilde{\mathbf{X}}^{(m)(t)}\} are all bounded. ∎

Theorem 1.

Suppose \{(\mathbf{J}^{(t)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)})\} is the infinite iteration sequence produced by Algorithm 1. Then there exists (\mathbf{J}^{(*)},\mathbf{Z}^{(*)},\tilde{\mathbf{X}}^{(m)(*)}) such that

\lim\limits_{t\rightarrow\infty}(\mathbf{J}^{(t)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)})=(\mathbf{J}^{(*)},\mathbf{Z}^{(*)},\tilde{\mathbf{X}}^{(m)(*)})    (38)

with \text{diag}(\mathbf{Z}^{(*)})=0 and

\lim\limits_{t\rightarrow\infty}\|\nabla f(\mathbf{J}^{(t)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)})\|=0.    (39)
Proof.

First we prove that the sequence \{(\mathbf{J}^{(t)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)})\} converges. Because (\mathbf{J}^{(t)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)}) is bounded, there exists at least one accumulation point (\mathbf{J}^{(*)},\mathbf{Z}^{(*)},\tilde{\mathbf{X}}^{(m)(*)}) such that a sub-sequence \{\mathbf{J}^{(t_{j})},\mathbf{Z}^{(t_{j})},\tilde{\mathbf{X}}^{(m)(t_{j})}\} converges to it, i.e.,

\mathbf{J}^{(t_{j})}\rightarrow\mathbf{J}^{(*)},\quad\mathbf{Z}^{(t_{j})}\rightarrow\mathbf{Z}^{(*)},\quad\tilde{\mathbf{X}}^{(m)(t_{j})}\rightarrow\tilde{\mathbf{X}}^{(m)(*)}.

On the other hand, we have

\mathbf{J}^{(t+1)}-\mathbf{J}^{(t)}\rightarrow 0,\quad\mathbf{Z}^{(t+1)}-\mathbf{Z}^{(t)}\rightarrow 0,\quad\tilde{\mathbf{X}}^{(m)(t+1)}-\tilde{\mathbf{X}}^{(m)(t)}\rightarrow 0

from Lemma 1. Thus, the sequence \{(\mathbf{J}^{(t)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)})\} converges and (Eq.(38)) holds. By Theorem 4.4 of [3], the solution \mathbf{Z}^{(t)} of sub-problem (Eq.(24)) converges to \mathbf{Z}^{(*)} satisfying \text{diag}(\mathbf{Z}^{(*)})=0.

Next we show that the partial derivative sequences with respect to \mathbf{J}, \mathbf{Z} and \tilde{\mathbf{X}}^{(m)} converge to zero. It can be deduced from the solution process of the sub-problems that

\nabla f_{\mathbf{J}}(\mathbf{J}^{(t+1)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)})=0,\quad\nabla f_{\tilde{\mathbf{X}}^{(m)}}(\mathbf{J}^{(t+1)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t+1)})=0,\quad\nabla f_{\mathbf{Z}}(\mathbf{J}^{(t+1)},\mathbf{Z}^{(t+1)},\tilde{\mathbf{X}}^{(m)(t+1)})=0.    (40)

Since the sequence \{\mathbf{J}^{(t)},\mathbf{Z}^{(t)},\tilde{\mathbf{X}}^{(m)(t)}\} converges, the conclusion (Eq.(39)) follows immediately. ∎

Time complexity: We analyze the time complexity by counting the elementary operations performed in each sub-problem. (1) For the sub-problem of \mathbf{J}, the time- and space-consuming Kronecker product in (Eq.(15)) is never formed explicitly; the operator is applied in the product form of (Eq.(14)) within the conjugate gradient iterations. Since we employ the preconditioned conjugate gradient method, the algorithm usually terminates quickly. Its computational complexity is O(n^{3}+n^{2}d_{i}), where d_{i} denotes the number of features of the ith view. (2) The solution of \mathbf{X}_{i}^{(m)} has a closed form whose computational complexity is O(n_{m}^{3}+d_{i}nn_{m}), where n_{m} denotes the number of unobserved entries in the ith view. (3) Because the minimization problem in (Eq.(28)) is solved analytically, Uzawa's algorithm for computing \mathbf{Z} in the sub-problem (Eq.(24)) needs only a few iterations in practice; the computational complexity of obtaining \mathbf{Z} is O(n^{3}). (4) The computational complexity of the spectral clustering in the last step is O(n^{3}). In summary, the total time complexity of the AWSR method is O(n^{3}). Space complexity: The matrices involved are the ith view data \mathbf{X}_{i}\in\mathbb{R}^{d_{i}\times n}, the self-representation matrix \mathbf{Z}\in\mathbb{R}^{n\times n}, the weighting matrix \mathbf{W}^{(i)}\in\mathbb{R}^{d_{i}\times n} and the splitting variable \mathbf{J}\in\mathbb{R}^{n\times n}, for i=1,\ldots,v. Thus the space complexity is O(n^{2}).

5 Experiment

In this section, we compare the performance of our AWSR method with that of other widely used methods on five datasets.

5.1 Datasets

The proposed AWSR method, as well as the other approaches, is evaluated on five benchmark datasets: ORL (http://cam-orl.co.uk/facedatabase.html), Still (https://www.di.ens.fr/willow/research/stillactions/), BBCSport (http://mlg.ucd.ie/datasets/bbc.html), Olympics (http://mlg.ucd.ie/aggregation/) and Leaves (https://github.com/cswanghao/gbs/blob/). For the construction of missing views, we follow the protocol in [21]. First, n_{m}=[n\cdot r] instances of the first view are deleted, where n is the total number of instances and r is the missing ratio. Then n_{m} instances of the second view are deleted, excluding those already missing in the first view, so that every instance is observed in at least one view. Finally, n_{m} instances arbitrarily chosen from the other views are deleted. In the numerical experiments, we consider missing ratios of 0.1, 0.2, 0.3, 0.4 and 0.5. The details of each dataset are as follows.
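The following sketch gives one assumed reading of this protocol (the handling of the later views is simplified); it only produces the missing-instance mask, with True marking an instance that is unobserved in a view.

```python
import numpy as np

def make_missing_mask(n, n_views, ratio, seed=0):
    """missing[v, j] is True when instance j is missing in view v."""
    rng = np.random.default_rng(seed)
    n_m = int(round(n * ratio))
    missing = np.zeros((n_views, n), dtype=bool)
    missing[0, rng.choice(n, n_m, replace=False)] = True
    for v in range(1, n_views):
        # only pick instances that would still be observed in at least one view
        safe = np.where(missing.sum(axis=0) < n_views - 1)[0]
        missing[v, rng.choice(safe, min(n_m, len(safe)), replace=False)] = True
    return missing
```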

ORL: The dataset consists of 400 images of 40 different people taken at different times, under varying lighting, with different facial expressions (eyes open/closed, smiling/not smiling) and facial details (with/without glasses). Following [27] and [38], we use part of the images to generate the datasets ORL1 and ORL2. ORL1 is a two-view dataset with dimensions 1024 and 288, respectively. ORL2 has three views, with dimensions 4096, 3304 and 6750, respectively.

Still: The dataset [6] consists of 467 images of six different actions: grabbing, running, walking, throwing, squatting and kicking. In the experiments, Still is a three-view dataset with dimensions 200, 200 and 200.

BBCSport: The dataset [9] consists of 737 sports articles published on the BBC Sport website by journalists in five areas: athletics, cricket, football, rugby and tennis. In the experiments, we only select a subset of BBCSport containing news reports from 116 journalists. BBCSport is a four-view dataset with dimensions 1991, 2063, 2113 and 2158.

Olympics: The dataset [10] contains images of 464 athletes or organizers in 28 different sports at the London 2012 Summer Olympic Games. In the experiments, Olympics is a two-view dataset with dimensions 4942 and 3097, respectively.

Leaves: This dataset [1] contains 1600 leaf images covering 100 plant species. Each leaf is described by shape, fine-scale edge and texture histogram features. In the experiments, Leaves is a three-view dataset with dimensions 64, 64 and 64.

Statistical information about the datasets is presented in Table 2. The "Number of Clusters" and "Dimension of Each View" are taken from the dataset descriptions. For example, the ORL dataset consists of 400 images of 40 people, so its number of clusters is 40.

Table 2: Statistics of the datasets.
Dataset  Number of Clusters  Dimension of View 1  View 2  View 3  View 4
ORL1  40  1024  288  -  -
Still  6  200  200  200  -
BBCSport  5  1991  2063  2113  2158
Olympics  28  4942  3097  -  -
Leaves  100  64  64  64  -
ORL2  40  4096  3304  6750  -

5.2 Methods for comparison

In this subsection, we provide a brief introduction to the eight other algorithms and two baselines that we compare with our proposed algorithm.

(1) LSRs (single-best baseline) [19] first filled the missing data with random values and then applied the LSR algorithm to each view.

(2) In LSRc (concatenated baseline) [19], the missing data were filled with arbitrary values and the data of all views were concatenated; the LSR algorithm was then run on the concatenated data.

(3) IMG [41] factorized data of each view to learn the latent subspaces independently. Meanwhile, it utilized the graph Laplacian term to regularize the latent subspaces of each view.

(4) DAIMC [13] was a weighted non-negative matrix factorization based method. It employed the existing instance alignment information to learn the consistent latent feature matrix of all views. Besides, in order to reduce the influence of missing entries, the 2,1\ell_{2,1}-norm was applied to build the consistency basis matrix of all views.

(5) AGL [35] first performed a low rank representation of the graph for each view with spectral constraints. Then it built a consensus representation for all views using the co-regularization term.

(6) AWGF [40] factorized a weighted non-negative matrix to obtain the feature matrix of each view and then constructed the graph of each view. The feature extraction and the graph were fused into a large framework according to a certain weight.

(7) PLR [17] obtained the consensus feature matrix of all views based on non-negative matrix factorization. Then it imposed 2,1\ell_{2,1}-norm constraints on the basis matrix of each view and finally imposed regularization constraints on the local graph to obtain the shared feature matrix.

(8) IMSR [21] performed feature extraction and employed missing-data imputation together with a low-rank regularized consensus coefficient matrix via self-representation learning to obtain the complete consensus coefficient matrix.

(9) CAMVSC [31] employed an auto-weighted technique and constructed a self-representation subspace model to cluster complete multi-view datasets.

(10) SIMC_ADC [12] proposed an anchor point learning method to obtain the consensus coefficient matrix for large-scale multi-view clustering problems.

5.3 Evaluation metrics

Accuracy (ACC), normalized mutual information (NMI) and Purity are used to measure clustering performance. Generally speaking, higher values indicate better clustering. Next, we introduce the definitions of these metrics. The ACC is calculated by

ACC=\frac{TP+TN}{TP+TN+FP+FN},

where a true positive (TP) means that two similar documents are placed in the same cluster, a true negative (TN) means that two dissimilar documents are placed in different clusters, a false negative (FN) means that two similar documents are placed in different clusters, and a false positive (FP) means that two dissimilar documents are placed in the same cluster. The mutual information (MI) is

MI(\mathcal{C},\varTheta)=\sum\limits_{c_{i}\in\mathcal{C}}\sum\limits_{\omega_{k}\in\varTheta}P(c_{i},\omega_{k})\log\frac{P(c_{i},\omega_{k})}{P(c_{i})P(\omega_{k})},

where P(\cdot) is a probability function. If \mathcal{C} and \varTheta are independent of each other, the mutual information equals zero. In the clustering experiments, we use the normalized mutual information (NMI),

NMI(\mathcal{C},\varTheta)=\frac{MI(\mathcal{C},\varTheta)}{\max(H(\mathcal{C}),H(\varTheta))},

where H(\cdot) is the entropy function, defined as H(\varTheta)=-\sum\limits_{k}P(\omega_{k})\log P(\omega_{k}). The metric Purity is given by

Purity(\mathcal{C},\varTheta)=\frac{1}{n}\sum\limits_{k=1}^{K}\max\limits_{j}|\omega_{k}\cap c_{j}|,

where n is the total number of samples, \varTheta=\{\omega_{1},\omega_{2},\ldots,\omega_{K}\} is the set of predicted clusters and \mathcal{C}=\{c_{1},c_{2},\ldots,c_{J}\} is the set of ground-truth classes.
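The sketch below implements the three metrics as defined above (ACC in its pairwise TP/TN form, NMI normalized by the maximum entropy, and Purity from the contingency matrix); it is a standard implementation assumed here, not the authors' evaluation code.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score
from sklearn.metrics.cluster import contingency_matrix

def pairwise_acc(y_true, y_pred):
    """(TP + TN) / (TP + TN + FP + FN) over all pairs of samples."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    same_true = np.equal.outer(y_true, y_true)
    same_pred = np.equal.outer(y_pred, y_pred)
    iu = np.triu_indices(len(y_true), k=1)             # each unordered pair once
    return np.mean(same_true[iu] == same_pred[iu])

def nmi(y_true, y_pred):
    return normalized_mutual_info_score(y_true, y_pred, average_method="max")

def purity(y_true, y_pred):
    C = contingency_matrix(y_true, y_pred)             # rows: classes, columns: clusters
    return C.max(axis=0).sum() / len(y_true)
```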

5.4 Numerical experiments

In this section, we explain how the parameters are chosen, then present the numerical results of the proposed AWSR method compared with other prevailing methods, and finally plot the sequence of objective function values to verify the convergence of the computational algorithm.

5.4.1 Parameter selection

For the elements \alpha_{mj}^{(i)} of the weighting matrix, we assign the constant 1 based on experience. The main idea for determining the parameters in the objective function is to compare the performance of the AWSR method under different candidate values and choose the best ones.

First, we list the candidate values of the parameters. We roughly estimate the range that may contain a desirable value for each parameter. All candidate values are expressed as 2^{q}. In Table 3, we list the values of the exponent q for each parameter on the different datasets. For example, the values assigned to the parameter \gamma on the ORL1 dataset are 2^{3.1}, 2^{3.3} and 2^{3.5}, because the expression [3.1:0.2:3.5] indicates that the exponent q takes values from 3.1 to 3.5 with a common difference of 0.2 between successive values. The other candidate parameters are read from Table 3 in the same way.

Table 3: The values of the exponent q for the different parameters (2^{q}).
Dataset  q for \lambda=2^{q}  q for \gamma=2^{q}  q for \alpha=2^{q}
ORL1  [-2.9:0.1:-1.7]  [3.1:0.2:3.5]  [7.3:0.1:8]
Still  [-1.9:0.05:-1.7]  [3.1:0.2:3.5]  [7.2:0.1:7.7]
BBCSport  [-10:2:10]  [2.5:1:4.5]  [7.3:0.1:8]
Olympics  [-1.8:0.1:-1.7]  [3.1:0.2:3.5]  [7.2:0.1:8]
Leaves  [-2.3:0.1:-1.6]  [3.1:0.2:3.5]  [7.4:0.1:8]
ORL2  [-3:1:0]  [2:1:6]  [6.7:0.05:6.85]

Next, we take the Olympics dataset as an example to illustrate the selection process. As shown in the 'Olympics' row of Table 3, the parameter \lambda has 2 candidates, 2^{-1.8} and 2^{-1.7}, while \gamma may take the values 2^{3.1}, 2^{3.3} and 2^{3.5}. The parameter \alpha has 9 choices, from 2^{7.2} to 2^{8}. Thus there are 54 combinations of (\lambda,\gamma,\alpha). We sort these 54 combinations in lexical order, so that each number from 1 to 54 corresponds to one combination of (\lambda,\gamma,\alpha). For each combination, we run the AWSR method with missing ratios 0.1, 0.2, 0.3 and 0.4. Figure 2 shows the average values of ACC, NMI and Purity, with the number on the horizontal axis being the index of the corresponding combination. The model performs best at the 36th combination, which is \lambda=2^{-1.7}, \gamma=2^{3.1} and \alpha=2^{8.0}. A sketch of this grid search is given after Figure 2.

Figure 2: Comparison of numerical results of different parameter combinations on Olympics.
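A minimal sketch of this grid search is given below; the evaluation helper run_awsr_average_score is hypothetical and stands for running AWSR at the four missing ratios and averaging ACC, NMI and Purity.

```python
import itertools
import numpy as np

def run_awsr_average_score(lam, gamma, alpha):
    """Placeholder for the hypothetical AWSR evaluation over the four missing ratios."""
    return 0.0  # replace with the actual evaluation

lams   = 2.0 ** np.arange(-1.8, -1.65, 0.1)   # Olympics row of Table 3
gammas = 2.0 ** np.arange(3.1, 3.6, 0.2)
alphas = 2.0 ** np.arange(7.2, 8.05, 0.1)

best = None
for idx, (lam, gamma, alpha) in enumerate(itertools.product(lams, gammas, alphas), 1):
    score = run_awsr_average_score(lam, gamma, alpha)
    if best is None or score > best[0]:
        best = (score, idx, lam, gamma, alpha)   # index idx matches the lexical order
```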

5.4.2 Numerical results

Complete datasets: First, we compare the performance of our proposed algorithm with the CAMVSC algorithm on the complete datasets ORL2, BBCSport and Leaves. The experimental results are shown in Table 4. Since the CAMVSC algorithm is designed specifically for complete multi-view clustering, it outperforms the AWSR method on these complete datasets.

Table 4: Performance comparison between the algorithm proposed and CAMVSC algorithm on complete dataset ORL2, BBCSport and Leaves.
Dataset algorithm ACC NMI Purity
ORL2 AWSR 80.90 91.16 84.30
CAMVSC 82.45 91.73 85.33
BBCSport AWSR 80.17 75.89 90.52
CAMVSC 92.24 82.19 92.24
Leaves AWSR 77.80 89.47 80.30
CAMVSC 91.38 81.05 91.38
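For reference, the three metrics reported throughout this section can be computed from predicted and ground-truth labels in the standard way: ACC via an optimal (Hungarian) matching of cluster labels, NMI via scikit-learn, and Purity via majority voting. The following is a minimal sketch, not the exact evaluation code used for the tables.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one matching between clusters and classes (Hungarian algorithm)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    hits = np.zeros((clusters.size, classes.size))
    for i, c in enumerate(clusters):
        for j, k in enumerate(classes):
            hits[i, j] = np.sum((y_pred == c) & (y_true == k))
    row, col = linear_sum_assignment(-hits)      # maximize the matched counts
    return hits[row, col].sum() / y_true.size

def purity(y_true, y_pred):
    """Purity: each cluster votes for its majority ground-truth class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    total = 0
    for c in np.unique(y_pred):
        _, counts = np.unique(y_true[y_pred == c], return_counts=True)
        total += counts.max()
    return total / y_true.size

# Toy example: a perfect clustering up to label permutation scores 1.0 on all three metrics.
y_true, y_pred = [0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]
print(clustering_accuracy(y_true, y_pred),
      normalized_mutual_info_score(y_true, y_pred),
      purity(y_true, y_pred))
```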

Incomplete datasets: In [21], the IMSR algorithm is shown to be superior to LSRs, LSRc, IMG, DAIMC, AGL, AWGF and PLR on incomplete datasets. Therefore, we first compare the AWSR algorithm only with the IMSR algorithm and the state-of-the-art SIMC_ADC algorithm as the missing ratio changes from 0.1 to 0.5. The results are shown in Figure 3 and Table 5 respectively. In Figure 3, the values of ACC, NMI and Purity computed by AWSR and IMSR are plotted; the blue lines correspond to AWSR and the red ones to IMSR. In Table 5, the superior values are in bold type. In most cases, AWSR performs better than IMSR and SIMC_ADC.
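The incomplete versions of the datasets are obtained by discarding views at the prescribed missing ratio. As an illustration only, the sketch below builds a view-missing indicator matrix under our simplifying assumption that the stated fraction of (instance, view) entries is removed uniformly at random while every instance keeps at least one observed view; the exact masking protocol may differ.

```python
import numpy as np

def missing_view_mask(n_samples, n_views, missing_ratio, seed=0):
    """Boolean (n_samples, n_views) mask; True marks an observed view."""
    rng = np.random.default_rng(seed)
    mask = np.ones((n_samples, n_views), dtype=bool)
    n_remove = int(round(missing_ratio * n_samples * n_views))
    removed = 0
    for idx in rng.permutation(n_samples * n_views):   # random order over (sample, view) pairs
        if removed == n_remove:
            break
        i, v = divmod(int(idx), n_views)
        if mask[i].sum() > 1:          # keep at least one observed view per instance
            mask[i, v] = False
            removed += 1
    return mask

# Example: 300 instances, 3 views, missing ratio 0.3
mask = missing_view_mask(300, 3, 0.3)
print(mask.mean())                     # fraction of observed view entries, about 0.7
```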

Figure 3: Performance comparison of AWSR and IMSR methods on datasets Leaves, BBCSport, Olympics, ORL1 and Still.
Table 5: Comparison between AWSR method and the SIMC_ADC method on incomplete datasets with various missing ratios. The best results are shown in bold.
Metric Dataset Algorithm Missing ratio: 0.1 0.2 0.3 0.4 0.5
ACC ORL1 SIMC_ADC 51.73 45.85 37.80 29.23 30.80
AWSR 65.38 55.15 47.90 43.73 41.98
Olympics SIMC_ADC 56.51 51.44 45.60 39.01 34.24
AWSR 76.57 73.69 68.32 57.74 44.29
Leaves SIMC_ADC 52.62 42.84 36.36 32.03 27.36
AWSR 66.25 56.48 46.31 39.71 33.99
BBCSport SIMC_ADC 52.33 49.31 48.97 48.53 45.07
AWSR 79.31 76.55 71.53 61.46 53.97
Still SIMC_ADC 32.63 31.32 29.13 29.79 29.14
AWSR 33.02 31.18 29.87 28.56 27.20
NMI ORL1 SIMC_ADC 69.64 65.20 57.63 50.17 52.90
AWSR 78.01 72.43 67.13 62.94 64.22
Olympics SIMC_ADC 65.84 61.25 55.27 49.80 45.86
AWSR 83.35 81.36 78.19 71.35 65.33
Leaves SIMC_ADC 74.71 69.21 64.70 61.45 58.07
AWSR 81.48 75.47 69.72 66.00 60.00
BBCSport SIMC_ADC 28.65 23.47 23.71 22.88 20.61
AWSR 73.60 68.46 63.41 50.11 39.41
Still SIMC_ADC 11.93 10.54 9.74 9.21 8.70
AWSR 12.08 10.25 8.97 7.80 6.67
Purity ORL1 SIMC_ADC 54.39 48.78 40.48 31.38 32.95
AWSR 68.25 57.70 50.73 46.48 45.18
Olympics SIMC_ADC 64.38 58.92 52.97 39.09 43.82
AWSR 83.95 82.18 78.71 72.72 62.85
Leaves SIMC_ADC 54.73 45.10 38.68 39.94 29.16
AWSR 68.27 58.63 48.20 41.96 39.79
BBCSport SIMC_ADC 58.71 55.35 52.67 52.50 48.19
AWSR 89.40 86.38 84.40 72.07 64.40
Still SIMC_ADC 35.56 34.28 33.00 32.35 31.50
AWSR 35.25 33.43 32.29 30.90 29.27

Moreover, we also compare the AWSR method with all of the above-mentioned approaches. The clustering results averaged over missing ratios 0.1, 0.2, 0.3, 0.4 and 0.5 are reported in Table 6, where the best metric value for each problem is shown in bold. For each dataset, the overall performance of the proposed AWSR algorithm on ACC, NMI and Purity is better than that of the other algorithms, except for the NMI metric on the ORL1 dataset, on which the proposed algorithm ranks second.

Table 6: Comparison of the performance after collation. The average values of the clustering results provided with missing ratio being 0.1, 0.2, 0.3, 0.4 and 0.5 are given. The best results are shown in bold.
Metric Dataset LSRs LSRc IMG DAIMC AGL AWGF PLR IMSR SIMC_ADC AWSR
ACC ORL1 41.50 32.20 35.30 56.10 43.20 61.27 60.50 64.63 39.08 65.10
Still 29.85 27.54 29.46 32.29 28.22 29.38 31.65 33.09 30.60 33.26
BBCSport 60.17 38.97 43.90 73.28 67.24 37.29 70.17 76.72 48.84 78.10
Olympics 60.52 59.53 26.47 59.87 67.03 43.09 64.87 73.17 45.36 74.37
Leaves 41.39 21.96 37.36 48.33 47.88 44.34 47.63 51.81 38.24 61.02
NMI ORL1 60.98 53.31 52.55 73.73 63.80 76.79 75.83 78.57 59.08 77.35
Still 9.56 6.56 10.13 10.60 7.91 8.90 10.52 11.56 10.02 12.01
BBCSport 42.88 17.16 23.04 60.49 60.10 14.51 59.51 69.06 23.86 70.7
Olympics 66.45 69.67 35.12 72.44 74.28 55.92 77.09 82.16 55.60 82.94
Leaves 65.06 50.75 60.75 71.59 71.32 56.40 70.61 73.49 65.63 75.63
Purity ORL1 43.35 34.70 40.06 59.70 45.65 65.58 63.55 67.57 41.59 67.61
Still 32.38 29.98 31.15 34.90 31.13 31.76 34.30 35.20 33.34 35.35
BBCSport 68.10 47.07 45.10 81.21 78.28 43.19 80.00 86.38 53.48 87.75
Olympics 65.82 69.09 32.88 70.78 72.76 53.32 75.30 82.37 51.68 83.86
Leaves 42.69 22.85 40.23 50.68 50.13 46.91 49.85 54.03 40.32 63.51

We also visualize the results of Table 6 for the different metrics on each dataset in Figure 4.

Figure 4: Visualization histogram of the results in Table 6.

Time comparison: In terms of computational efficiency, the AWSR method is not as efficient as the IMSR and SIMC_ADC algorithms. Taking Olympics, BBCSport and Still as examples, Table 7 reports the computation time of AWSR, IMSR and SIMC_ADC when the missing ratio is 0.1.

Table 7: Comparison of time (seconds).
Method Olympics BBCSport Still
AWSR 6.596 1.656 5.159
IMSR 2.166 0.378 0.724
SIMC_ADC 2.174 1.097 1.439

5.4.3 Sequence of objective function values

To verify the convergence property of the BCD algorithm, we plot the objective function values on the Still, Leaves, ORL1 and Olympics datasets in Figure 5. Because the missing entries are initialized to zero, the first objective function value in each figure is smaller than the value at the second iterate, where some of the recovered entries become nonzero. Afterwards the objective function value decreases and eventually converges.
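The curves in Figure 5 are obtained by simply recording the objective value after every outer iteration. The generic sketch below illustrates such a loop together with a relative-change stopping test; `objective` and `bcd_update` stand in for the AWSR objective and one sweep of block updates, demonstrated here on a toy two-block quadratic rather than the actual subproblems.

```python
def run_bcd(x0, objective, bcd_update, max_iter=200, tol=1e-8):
    """Outer loop: record objective values and stop when the relative decrease is small."""
    x, history = x0, [objective(x0)]
    for _ in range(max_iter):
        x = bcd_update(x)
        history.append(objective(x))
        if abs(history[-2] - history[-1]) <= tol * max(1.0, abs(history[-2])):
            break
    return x, history

# Toy two-block example: f(x, y) = (x-1)^2 + (y+2)^2 + 0.5(x-y)^2,
# with each block minimized exactly in turn (closed-form coordinate updates).
f = lambda v: (v[0] - 1) ** 2 + (v[1] + 2) ** 2 + 0.5 * (v[0] - v[1]) ** 2
def update(v):
    x = (2 + v[1]) / 3.0               # argmin over x with y fixed
    y = (x - 4) / 3.0                  # argmin over y with the new x fixed
    return (x, y)

_, history = run_bcd((0.0, 0.0), f, update)
# Plotting `history` gives a convergence curve analogous to Figure 5 (monotone here,
# since the toy problem has no missing-value filling step that raises the first value).
```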

Figure 5: The convergence curves of the objective function value on datasets Still, Leaves, ORL1 and Olympics.

6 Conclusion

For the IMC problem, we introduce an adaptive weighted self-representation (AWSR) model. It adjusts the weighting matrix on the basis of the variations of views and the recovery process of the missing data. The proposed AWSR model is concise and can be solved by the traditional BCD algorithm, whose convergence property is proved. Numerical experiments on five real-world incomplete datasets demonstrate the effectiveness of AWSR and its superiority over eight other widely-used approaches. In addition, we compare its performance with CAMVSC on three complete datasets; our method can be applied to both complete and incomplete datasets, whereas CAMVSC can only be applied to complete ones.

Although the AWSR method is effective for IMC, some issues deserve further investigation. In theory, the elements of the weighting matrix are determined by the differences among instances and between missing and observed entries; in the experiments, however, the entries of the weighting matrix are adjusted empirically. A more principled and flexible updating strategy should be developed. In addition, high-efficiency optimization algorithms are needed for large-scale datasets.


Acknowledgments This work was supported by the National Natural Science Foundation of China [grant No.12326302, No.62073087].


Data availability The data used to support the findings of this study are available from the corresponding author upon request.

Declarations

  • Conflict of interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

  • Author contributions Conceptualization: [Lishan Feng], [Jingya Chang]; Software: [Lishan Feng]; Validation: [Lishan Feng], [Guoxu Zhou], [Jingya Chang]; Formal analysis: [Lishan Feng]; Data curation: [Lishan Feng]; Writing - original draft: [Lishan Feng]; Preparation: [Lishan Feng]; Visualization: [Lishan Feng]; Project administration: [Lishan Feng]; Investigation: [Guoxu Zhou], [Jingya Chang]; Resources: [Guoxu Zhou], [Jingya Chang]; Supervision: [Guoxu Zhou], [Jingya Chang]; Funding acquisition: [Guoxu Zhou], [Jingya Chang]; Methodology: [Jingya Chang]; Writing - review & editing: [Jingya Chang]. All authors have read and agreed to the published version of the manuscript.

References

  • Beghin et al [2010] Beghin T, Cope JS, Remagnino P, et al (2010) Shape and texture based plant leaf classification. In: International conference on advanced concepts for intelligent vision systems, Springer, pp 345–353
  • Bramble et al [1997] Bramble JH, Pasciak JE, Vassilev AT (1997) Analysis of the inexact uzawa algorithm for saddle point problems. SIAM Journal on Numerical Analysis 34(3):1072–1092
  • Cai et al [2010] Cai JF, Candès EJ, Shen Z (2010) A singular value thresholding algorithm for matrix completion. SIAM Journal on optimization 20(4):1956–1982
  • Cai et al [2018] Cai Y, Jiao Y, Zhuge W, et al (2018) Partial multi-view spectral clustering. Neurocomputing 311:316–324
  • Chao et al [2022] Chao G, Wang S, Yang S, et al (2022) Incomplete multi-view clustering with multiple imputation and ensemble clustering. Applied Intelligence pp 1–11
  • Delaitre et al [2010] Delaitre V, Laptev I, Sivic J (2010) Recognizing human actions in still images: a study of bag-of-features and part-based representations. In: BMVC 2010-21st British Machine Vision Conference
  • Deng et al [2023] Deng S, Wen J, Liu C, et al (2023) Projective incomplete multi-view clustering. IEEE Transactions on Neural Networks and Learning Systems
  • Fang et al [2023] Fang U, Li M, Li J, et al (2023) A comprehensive survey on multi-view clustering. IEEE Transactions on Knowledge and Data Engineering
  • Greene and Cunningham [2006] Greene D, Cunningham P (2006) Practical solutions to the problem of diagonal dominance in kernel document clustering. In: Proceedings of the 23rd international conference on Machine learning, pp 377–384
  • Greene and Cunningham [2013] Greene D, Cunningham P (2013) Producing a unified graph representation from multiple social network views. In: Proceedings of the 5th annual ACM web science conference, pp 118–121
  • Guo et al [2023] Guo W, Wang Z, Chi Z, et al (2023) Scalable one-stage multi-view subspace clustering with dictionary learning. Knowledge-Based Systems 259:110092
  • He et al [2023] He Wj, Zhang Z, Wei Y (2023) Scalable incomplete multi-view clustering with adaptive data completion. Information Sciences p 119562
  • Hu and Chen [2019] Hu M, Chen S (2019) Doubly aligned incomplete multi-view clustering. arXiv preprint arXiv:1903.02785
  • Hu et al [2021] Hu Y, Luo C, Wang B, et al (2021) Complete/incomplete multi-view subspace clustering via soft block-diagonal-induced regulariser. IET Computer Vision 15(8):618–632
  • Khan et al [2023] Khan MA, Khan GA, Khan J, et al (2023) Adaptive weighted low-rank sparse representation for multi-view clustering. IEEE Access
  • Li et al [2022] Li Z, Tang C, Zheng X, et al (2022) High-order correlation preserved incomplete multi-view subspace clustering. IEEE Transactions on Image Processing 31:2067–2080
  • Lian et al [2021] Lian H, Xu H, Wang S, et al (2021) Partial multiview clustering with locality graph regularization. International Journal of Intelligent Systems 36(6):2991–3010
  • Liang et al [2022] Liang N, Yang Z, Xie S (2022) Incomplete multi-view clustering with sample-level auto-weighted graph fusion. IEEE Transactions on Knowledge and Data Engineering 35(6):6504–6511
  • Liu et al [2012] Liu G, Lin Z, Yan S, et al (2012) Robust recovery of subspace structures by low-rank representation. IEEE transactions on pattern analysis and machine intelligence 35(1):171–184
  • Liu et al [2021a] Liu J, Liu X, Yang Y, et al (2021a) Hierarchical multiple kernel clustering. In: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI, pp 2–9
  • Liu et al [2021b] Liu J, Liu X, Zhang Y, et al (2021b) Self-representation subspace clustering for incomplete multi-view data. In: Proceedings of the 29th ACM International Conference on Multimedia, pp 2726–2734
  • Liu et al [2023] Liu M, Yang Z, Li L, et al (2023) Auto-weighted collective matrix factorization with graph dual regularization for multi-view clustering. Knowledge-Based Systems 260:110145
  • Liu and Lin [2023] Liu SS, Lin L (2023) Adaptive weighted multi-view clustering. In: Conference on Health, Inference, and Learning, PMLR, pp 19–36
  • Lu et al [2012] Lu CY, Min H, Zhao ZQ, et al (2012) Robust and efficient subspace segmentation via least squares regression. In: Computer Vision – ECCV 2012. Springer Berlin Heidelberg, Berlin, Heidelberg, pp 347–360
  • Merino [1992] Merino DI (1992) Topics in matrix analysis. The Johns Hopkins University
  • Ren et al [2018] Ren P, Xiao Y, Xu P, et al (2018) Robust auto-weighted multi-view clustering. In: IJCAI, pp 2644–2650
  • Samaria and Harter [1994] Samaria FS, Harter AC (1994) Parameterisation of a stochastic model for human face identification. In: Proceedings of 1994 IEEE workshop on applications of computer vision, IEEE, pp 138–142
  • Shi et al [2022] Shi S, Nie F, Wang R, et al (2022) Self-weighting multi-view spectral clustering based on nuclear norm. Pattern Recognition 124:108429
  • Shu et al [2022] Shu X, Zhang X, Gao Q, et al (2022) Self-weighted anchor graph learning for multi-view clustering. IEEE Transactions on Multimedia
  • Sun et al [2021] Sun M, Zhang P, Wang S, et al (2021) Scalable multi-view subspace clustering with unified anchors. In: Proceedings of the 29th ACM International Conference on Multimedia, pp 3528–3536
  • Tang et al [2022] Tang K, Cao L, Zhang N, et al (2022) Consistent auto-weighted multi-view subspace clustering. Pattern Analysis and Applications 25(4):879–890
  • Vidal [2011] Vidal R (2011) Subspace clustering. IEEE Signal Processing Magazine 28(2):52–68
  • Wan et al [2023] Wan X, Liu X, Liu J, et al (2023) Auto-weighted multi-view clustering for large-scale data. arXiv preprint arXiv:2303.01983
  • Wang et al [2022] Wang S, Wang Y, Le W (2022) Adaptive weight structure representation for multi-view subspace clustering. In: 2022 9th International Conference on Dependable Systems and Their Applications (DSA), pp 918–925, 10.1109/DSA56465.2022.00129
  • Wen et al [2018] Wen J, Xu Y, Liu H (2018) Incomplete multiview spectral clustering with adaptive graph learning. IEEE transactions on cybernetics 50(4):1418–1429
  • Wen et al [2019] Wen J, Zhang Z, Xu Y, et al (2019) Unified embedding alignment with missing views inferring for incomplete multi-view clustering. In: Proceedings of the AAAI conference on artificial intelligence, pp 5393–5400
  • Yin and Jiang [2023] Yin J, Jiang J (2023) Incomplete multi-view clustering based on self-representation. Neural Processing Letters pp 1–15
  • Zhang et al [2015] Zhang C, Fu H, Liu S, et al (2015) Low-rank tensor constrained multiview subspace clustering. In: Proceedings of the IEEE international conference on computer vision, pp 1582–1590
  • Zhang et al [2017] Zhang C, Hu Q, Fu H, et al (2017) Latent multi-view subspace clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4279–4287
  • Zhang et al [2020] Zhang P, Wang S, Hu J, et al (2020) Adaptive weighted graph fusion incomplete multi-view subspace clustering. Sensors 20(20):5755
  • Zhao et al [2016] Zhao H, Liu H, Fu Y (2016) Incomplete multi-modal visual data grouping. In: IJCAI, pp 2392–2398
  • Zhao et al [2022a] Zhao L, Zhang J, Yang T, et al (2022a) Incomplete multi-view clustering based on weighted sparse and low rank representation. Applied Intelligence 52(13):14822–14838
  • Zhao et al [2023] Zhao M, Yang W, Nie F (2023) Auto-weighted orthogonal and nonnegative graph reconstruction for multi-view clustering. Information Sciences 632:324–339
  • Zhao et al [2022b] Zhao X, Dai Q, Wu J, et al (2022b) Multi-view tensor graph neural networks through reinforced aggregation. IEEE Transactions on Knowledge and Data Engineering 35(4):4077–4091
  • Zhuge et al [2017] Zhuge W, Hou C, Jiao Y, et al (2017) Robust auto-weighted multi-view subspace clustering with common subspace representation matrix. PloS one 12(5):e0176769