
Partially Shared Semi-supervised Deep Matrix Factorization with Multi-view Data

Haonan Huang^2, Naiyao Liang^{2,3}, Wei Yan^2, Zuyuan Yang^{2,5}, Weijun Sun^{2,4}
{mrhaonan, naiyaogdut}@aliyun.com, [email protected], {yangzuyuan, gdutswj}@gdut.edu.cn
^2 Guangdong Key Laboratory of IoT Information Technology, Guangdong University of Technology, Guangzhou, China
^3 Key Laboratory of iDetection and Manufacturing-IoT, Ministry of Education, Guangzhou, China
^4 Guangdong-Hong Kong-Macao Joint Laboratory for Smart Discrete Manufacturing, Guangzhou, China
^5 Author to whom any correspondence should be addressed.
Abstract

Since many real-world data can be described from multiple views, multi-view learning has attracted considerable attention. Various methods have been proposed and successfully applied to multi-view learning, typically based on matrix factorization models. Recently, such models have been extended to deep structures to exploit the hierarchical information of multi-view data, but view-specific features and label information are seldom considered. To address these concerns, we present a partially shared semi-supervised deep matrix factorization model (PSDMF). By integrating a partially shared deep decomposition structure, graph regularization and a semi-supervised regression model, PSDMF can learn a compact and discriminative representation by eliminating the effects of uncorrelated information. In addition, we develop an efficient iterative updating algorithm for PSDMF. Extensive experiments on five benchmark datasets demonstrate that PSDMF achieves better performance than state-of-the-art multi-view learning approaches. The MATLAB source code is available at https://github.com/libertyhhn/PartiallySharedDMF.

Index Terms:
Multi-view learning; Deep matrix factorization; Semi-supervised learning; Partially shared structure.

I Introduction

In practical applications, real-world data can be described from different views, that is, the so-called multi-view data. For instance, an image can be described by several characteristics, e.g., shape, color, texture and so on. Because multi-view representation learning can exploit the implicitly dependent structure of multiple views and improve the performance of learning tasks, it has attracted increasing research attention in machine learning in recent years [1, 2, 3].

Over the past decade, a large body of research on multi-view learning has emerged. In particular, Non-negative Matrix Factorization (NMF) [4, 5], one of the most popular high-dimensional data processing algorithms, has been widely used for clustering multi-view data [6, 7]. NMF-based multi-view clustering methods have been shown to generate superior clustering results that are easy to interpret [8, 9]. Considering the relations between view-specific (uncorrelated) and common (correlated) information, partially shared NMF-based multi-view learning approaches have been proposed [10, 11].


Figure 1: The general framework of the Deep Multi-View Clustering. Data of the same class (shape) will become more compact and generate more discriminative representation as the number of decomposition layers increases.

Nevertheless, most existing methods have single-layer structures, which makes it hard to extract the hierarchical structural information [12] of multi-view data. With the development of deep learning, Wang et al. proposed DCCA to extract the hidden hierarchical information in two-view data [13]. Zhao et al. [14] extended single-view deep matrix factorization [15] to multi-view data and proposed graph-regularized deep multi-view clustering, which can eliminate the interference in multi-view data. Huang et al. [16] designed a novel robust deep multi-view clustering model to learn the hierarchical semantics without hyperparameters. For clarity, the general deep matrix factorization framework for multi-view data is illustrated in Fig. 1. However, most existing deep matrix factorization multi-view methods consider only the common information and ignore the effect of view-specific information in each individual view. Besides, they are formulated as unsupervised learning problems and cannot exploit partially labeled data when it is available. In fact, researchers have found that integrating such label information can produce a considerable improvement in learning performance [17, 18, 19].

To this end, we propose a novel deep multi-view clustering method called Partially Shared Semi-supervised Deep Matrix Factorization (PSDMF). In our method, both the correlated and the uncorrelated features of multi-view data are considered through a partially shared approach, in which the latent representation of each view is divided into a common part and a view-specific part. A robust sparse regression term with the $L_{2,1}$ norm is adopted to integrate the partial label information of the labeled data. Besides, to respect the intrinsic geometric relationships and avoid the parameter-tuning problem across different views, we apply graph regularization and an auto-weighted strategy to PSDMF. An efficient iterative updating algorithm with a pre-training scheme is designed for PSDMF. We summarize our major contributions as follows:

  • This paper proposes a semi-supervised deep multi-view clustering model that improves the performance of traditional unsupervised deep matrix factorization methods by using a label regression learning approach.

  • To respect both the common information and the view-specific features of multi-view data, we propose a partially shared deep matrix factorization method that jointly exploits the two kinds of information and learns a comprehensive final-layer representation of multi-view data.

  • Local invariant graph regularization and the auto-weighted strategy are introduced to preserve the intrinsic geometric structure in each view and further boost the quality of output representation.


Figure 2: Illustration of the work flow of the proposed Partially Shared Semi-supervised Deep Matrix Factorization.

The rest of this paper is organized as follows. In Section II, we give a brief review of related works and describe the proposed PSDMF. Section III presents an efficient algorithm to solve the optimization problem. In Section IV, we report the experimental results on five real-world datasets. Finally, Section V concludes this paper. Table I summarizes the general notations used in this article for the reader's convenience.

TABLE I: Notations.
Notation Description
$P$  The number of views
$m$  The number of layers
$k$  The layer sizes
$N$  The number of samples
$N_l$  The number of labeled samples
$N_u$  The number of unlabeled samples
$K_s$  The dimension of the view-specific encoding matrix
$K_c$  The dimension of the shared encoding matrix
$\mathbf{X}^p$  The data matrix of the $p$-th view
$\mathbf{U}_i^p$  The hidden matrix in the $i$-th layer of the $p$-th view
$\mathbf{V}$  The partially shared latent representation matrix
$\mathbf{V}_m^p$  The $m$-th layer partially shared factor of the $p$-th view
$\mathbf{W}$  The regression weight matrix

II Methodology

II-A Overview of Deep Matrix Factorization

Semi-NMF is not only a useful data dimensionality reduction technique but also beneficial to data clustering [20]. Motivated by deep neural network structures, Trigeorgis et al. [15] proposed a multi-layer Semi-NMF, called Deep Semi-NMF, to exploit the complex hierarchical information of data with implicit lower-level hidden attributes. Deep Semi-NMF decomposes the dataset $\mathbf{X}$ hierarchically, and the process can be formulated as:

$$
\begin{aligned}
\mathbf{X} &\approx \mathbf{U}_{1}\mathbf{V}_{1}^{+} \\
\mathbf{X} &\approx \mathbf{U}_{1}\mathbf{U}_{2}\mathbf{V}_{2}^{+} \\
&\;\;\vdots \\
\mathbf{X} &\approx \mathbf{U}_{1}\cdots\mathbf{U}_{m}\mathbf{V}_{m}^{+}
\end{aligned}
\tag{1}
$$

where $m$ is the number of layers, $\mathbf{U}_{i}$ is the $i$-th layer hidden matrix, $\mathbf{V}_{i}^{+}$ denotes the $i$-th layer low-dimensional representation matrix, and the superscript $+$ indicates that a matrix contains only non-negative elements. The loss function of Deep Semi-NMF is:

$$
\min_{\mathbf{U}_{i},\mathbf{V}_{m}}\left\|\mathbf{X}-\mathbf{U}_{1}\cdots\mathbf{U}_{m}\mathbf{V}_{m}\right\|_{F}^{2}\quad\text{s.t. }\mathbf{V}_{m}\geq 0.
\tag{2}
$$
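For intuition, the objective in Eq. (2) is easy to evaluate numerically once the factors are available; the following Python sketch (illustrative variable names and shapes, not the authors' MATLAB implementation) simply multiplies the layer factors back together and measures the Frobenius error.

import numpy as np

def deep_seminmf_loss(X, U_list, V_m):
    # ||X - U_1 ... U_m V_m||_F^2 from Eq. (2)
    recon = V_m
    for U in reversed(U_list):          # U_m V_m, then U_{m-1}(U_m V_m), ...
        recon = U @ recon
    return np.linalg.norm(X - recon, "fro") ** 2

# toy usage with a 2-layer factorization of random data
X = np.random.randn(100, 60)
U1, U2 = np.random.randn(100, 20), np.random.randn(20, 10)
V2 = np.abs(np.random.randn(10, 60))    # only the final-layer factor is non-negative
print(deep_seminmf_loss(X, [U1, U2], V2))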

The Semi-NMF and Deep Semi-NMF mentioned above can be regarded as single-view algorithms. For multi-view data, Zhao et al. [14] presented a new Deep Semi-NMF framework with graph regularization (DMVC), which can eliminate the negative interference in multi-source data and obtain an effective consensus representation in the final layer. Given data $\mathbf{X}$ consisting of $P$ views, denoted by $\mathbf{X}=\{\mathbf{X}^{1},\mathbf{X}^{2},\ldots,\mathbf{X}^{P}\}$, the loss function of DMVC is:

$$
\begin{aligned}
\min_{\mathbf{U}_{i}^{p},\mathbf{V}_{m},\alpha^{p}}\;&\sum_{p=1}^{P}\left(\alpha^{p}\right)^{r}\Big(\left\|\mathbf{X}^{p}-\mathbf{U}_{1}^{p}\mathbf{U}_{2}^{p}\cdots\mathbf{U}_{m}^{p}\mathbf{V}_{m}\right\|_{F}^{2}+\beta\,\mathrm{tr}\left(\mathbf{V}_{m}\mathbf{L}^{p}\mathbf{V}_{m}^{T}\right)\Big)\\
\text{s.t. }&\mathbf{V}_{i}^{p}\geq 0,\;\mathbf{V}_{m}\geq 0,\;\sum_{p=1}^{P}\alpha^{p}=1,\;\alpha^{p}\geq 0
\end{aligned}
\tag{3}
$$

where $\alpha^{p}$ denotes the weighting coefficient of the $p$-th view and $r$ is an important hyperparameter that controls the weight distribution. $\mathbf{L}^{p}$ is the graph Laplacian of the graph for view $p$, and $\beta$ is used to adjust the contribution of the graph constraints. The details of how to construct the graph matrix are discussed in the next subsection. However, as an unsupervised method, DMVC cannot make use of partial prior knowledge of the data (e.g., labels). Besides, DMVC considers only the common representation of multi-view data and ignores the view-specific features. In this paper, we propose a novel semi-supervised deep matrix factorization model to address these challenging problems.

II-B Partially Shared Semi-supervised Deep Matrix Factorization

To make full use of prior knowledge and learn low-dimensional factors with powerful discrimination, motivated by the recently proposed label regression learning technique [10, 17], we incorporate the following $L_{2,1}$-norm regularized regression into our model:

$$
\min\left\|\mathbf{W}^{T}\mathbf{V}_{l}-\mathbf{Y}\right\|_{F}^{2}+\gamma\|\mathbf{W}\|_{2,1},
\tag{4}
$$

where $\mathbf{V}_{l}\in\mathbb{R}^{K\times N_{l}}$ denotes the labeled part of the representation matrix $\mathbf{V}$ and $\mathbf{V}_{u}\in\mathbb{R}^{K\times N_{u}}$ is the unlabeled part, as shown in Fig. 2 (i.e., $\mathbf{V}=[\mathbf{V}_{l},\mathbf{V}_{u}]$). $\mathbf{W}\in\mathbb{R}^{K\times C}$ is the regression coefficient matrix, and the $L_{2,1}$ norm is used to enforce row sparsity on it. Given prior knowledge of the class labels of the labeled samples, a binary label matrix $\mathbf{Y}\in\mathbb{R}^{C\times N}$ is constructed by the following rule:

$$
\mathbf{Y}_{cn}=\begin{cases}
1 & \text{if the }n\text{-th data point belongs to the }c\text{-th class}\\
0 & \text{otherwise}
\end{cases}
\tag{5}
$$
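As a concrete illustration, the label matrix of Eq. (5) can be built directly from a list of class indices. The helper below is a hypothetical sketch (the function name and the convention that unlabeled columns are simply left all-zero are our assumptions; only the labeled columns enter the regression term of Eq. (4)).

import numpy as np

def build_label_matrix(labels, n_classes, n_total):
    # Y in R^{C x N} as in Eq. (5); labels holds the class index (0..C-1)
    # of each of the first N_l labeled samples.
    Y = np.zeros((n_classes, n_total))
    for n, c in enumerate(labels):
        Y[c, n] = 1.0
    return Y

# 3 classes, 4 labeled samples out of 10
print(build_label_matrix([0, 2, 1, 0], n_classes=3, n_total=10))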

The effectiveness of the graph regularization technique has been demonstrated in recent research [14, 28]; it is able to preserve the geometric structure of the data when the dimensionality is changed. Similar to DMVC, we construct a local graph Laplacian matrix to preserve the local geometrical structure of each view $\mathbf{X}^{p}$. As introduced in [14], the binary graph weight matrix $\mathbf{S}^{p}$ is constructed in a k-nearest-neighbor (k-NN) fashion. Formally, the regularization term $\mathcal{R}^{p}$ is calculated as:

$$
\begin{aligned}
\mathcal{R}^{p}&=\frac{1}{2}\sum_{j,q=1}^{N}\left\|\mathbf{v}_{j}^{p}-\mathbf{v}_{q}^{p}\right\|^{2}\mathbf{S}_{jq}^{p}\\
&=\sum_{j=1}^{N}(\mathbf{v}_{j}^{p})^{T}\mathbf{v}_{j}^{p}\mathbf{D}_{jj}^{p}-\sum_{j,q=1}^{N}(\mathbf{v}_{j}^{p})^{T}\mathbf{v}_{q}^{p}\mathbf{S}_{jq}^{p}\\
&=\mathrm{Tr}\left(\mathbf{V}^{p}\mathbf{D}^{p}(\mathbf{V}^{p})^{T}\right)-\mathrm{Tr}\left(\mathbf{V}^{p}\mathbf{S}^{p}(\mathbf{V}^{p})^{T}\right)\\
&=\mathrm{Tr}\left(\mathbf{V}^{p}\mathbf{L}^{p}(\mathbf{V}^{p})^{T}\right)
\end{aligned}
\tag{6}
$$

where $\mathbf{D}_{jj}^{p}=\sum_{q}\mathbf{S}_{jq}^{p}$, and $\mathbf{L}^{p}=\mathbf{D}^{p}-\mathbf{S}^{p}$ denotes the graph Laplacian matrix of each view.
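A minimal sketch of the binary k-NN graph and Laplacian construction described above is given below; the brute-force distance computation and the symmetrization step are implementation choices of this illustration rather than details specified in the paper.

import numpy as np

def knn_graph_laplacian(X, k=5):
    # Binary k-NN affinity S^p and Laplacian L^p = D^p - S^p for one view;
    # columns of X are samples.
    N = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)   # squared distances
    S = np.zeros((N, N))
    for j in range(N):
        neighbours = np.argsort(dist[j])[1:k + 1]        # skip the point itself
        S[j, neighbours] = 1.0
    S = np.maximum(S, S.T)                               # make the graph symmetric
    D = np.diag(S.sum(axis=1))
    return D - S

L = knn_graph_laplacian(np.random.randn(20, 50), k=5)    # (50, 50) Laplacian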

Different from the existing deep matrix factorization multi-view methods, we use a partially shared strategy to jointly exploit view-specific and common features. The partially shared latent representation matrix is $\mathbf{V}\in\mathbb{R}^{K\times N}$ with $K=K_{s}\times P+K_{c}$, and the common factor ratio is $\lambda=K_{c}/(K_{s}+K_{c})$. The final $m$-th layer partially shared factor $\mathbf{V}_{m}^{p}\in\mathbb{R}^{(K_{s}+K_{c})\times N}$ is divided into four parts: the labeled and unlabeled view-specific encoding matrices (i.e., $\mathbf{V}_{sl}^{p}\in\mathbb{R}^{K_{s}\times N_{l}}$ and $\mathbf{V}_{su}^{p}\in\mathbb{R}^{K_{s}\times N_{u}}$), and the labeled and unlabeled shared encoding matrices (i.e., $\mathbf{V}_{cl}\in\mathbb{R}^{K_{c}\times N_{l}}$ and $\mathbf{V}_{cu}\in\mathbb{R}^{K_{c}\times N_{u}}$). We propose a general partially shared deep matrix factorization framework as follows:

$$
\begin{aligned}
\min\;&\sum_{p=1}^{P}\left\|\mathbf{X}^{p}-\left[\mathbf{U}_{1s}^{p},\mathbf{U}_{1c}^{p}\right]\cdots\left[\mathbf{U}_{ms}^{p},\mathbf{U}_{mc}^{p}\right]\begin{bmatrix}\mathbf{V}_{sl}^{p},\mathbf{V}_{su}^{p}\\ \mathbf{V}_{cl},\mathbf{V}_{cu}\end{bmatrix}\right\|_{F}^{2}\\
\text{s.t. }&\mathbf{V}_{sl}^{p},\mathbf{V}_{su}^{p},\mathbf{V}_{cl},\mathbf{V}_{cu}\geq 0
\end{aligned}
\tag{7}
$$

where $\mathbf{U}_{is}^{p}$ and $\mathbf{U}_{ic}^{p}$ denote the hidden view-specific matrix and common matrix, respectively. To simplify the problem, we let $\mathbf{V}_{m}^{p}=[\mathbf{V}_{s}^{p};\mathbf{V}_{c}]=\left[[\mathbf{V}_{sl}^{p},\mathbf{V}_{su}^{p}];[\mathbf{V}_{cl},\mathbf{V}_{cu}]\right]$ and $\mathbf{U}_{i}^{p}=[\mathbf{U}_{is}^{p};\mathbf{U}_{ic}^{p}]$. By combining Eq. (4), Eq. (6) and Eq. (7), we obtain the cost function of PSDMF:

$$
\begin{aligned}
\min_{\mathbf{U}_{i}^{p},\mathbf{V}_{m}^{p},\mathbf{W}}O=\;&\sum_{p=1}^{P}\Big(\alpha^{p}\left\|\mathbf{X}^{p}-\mathbf{U}_{1}^{p}\cdots\mathbf{U}_{m}^{p}\mathbf{V}_{m}^{p}\right\|_{F}^{2}+\mu\,\mathrm{tr}\left(\mathbf{V}_{m}^{p}\mathbf{L}^{p}\left(\mathbf{V}_{m}^{p}\right)^{T}\right)\Big)\\
&+\beta\left(\left\|\mathbf{W}^{T}\mathbf{V}_{l}-\mathbf{Y}\right\|_{F}^{2}+\gamma\|\mathbf{W}\|_{2,1}\right)\\
\text{s.t. }&\mathbf{V}_{m}^{p}\geq 0,\;\forall p
\end{aligned}
\tag{8}
$$

where $\alpha^{p}$ is the weighting coefficient of the $p$-th view, which has a great influence on the behavior of the model. In DMVC [14], the smoothness of the weight distribution is determined by the parameter $r$, but $r$ has to be searched over a large range manually, which makes it hard to adjust. To avoid this problem, inspired by [25, 26], we use an auto-weighted strategy that obtains the value of $\alpha^{p}$ from the distance between the data and the decomposition matrices, as follows:

$$
\alpha^{p}=\frac{1}{2\left\|\mathbf{X}^{p}-\mathbf{U}_{1}^{p}\cdots\mathbf{U}_{m}^{p}\mathbf{V}_{m}^{p}\right\|_{F}}
\tag{9}
$$
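The auto-weight of Eq. (9) is straightforward to compute once the reconstruction of a view is available; in the sketch below, the small eps guarding against a perfect reconstruction is our own safeguard, not part of the paper.

import numpy as np

def auto_weight(X_p, U_list_p, V_m_p, eps=1e-12):
    # alpha^p = 1 / (2 ||X^p - U_1^p ... U_m^p V_m^p||_F), Eq. (9)
    recon = V_m_p
    for U in reversed(U_list_p):
        recon = U @ recon
    err = np.linalg.norm(X_p - recon, "fro")
    return 1.0 / (2.0 * err + eps)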

III Optimization

To expedite the approximation of the matrices in our proposed model, we conduct pre-training [24] by first decomposing each view as $\mathbf{X}^{p}\approx\mathbf{U}_{1}^{p}\mathbf{V}_{1}^{p}$, i.e., by minimizing the Semi-NMF objective $\left\|\mathbf{X}^{p}-\mathbf{U}_{1}^{p}\mathbf{V}_{1}^{p}\right\|_{F}^{2}$, where $\mathbf{U}_{1}^{p}\in\mathbb{R}^{M^{p}\times k_{1}}$ and $\mathbf{V}_{1}^{p}\in\mathbb{R}^{k_{1}\times N}$. The factor matrix $\mathbf{V}_{1}^{p}$ is then further decomposed as $\mathbf{V}_{1}^{p}\approx\mathbf{U}_{2}^{p}\mathbf{V}_{2}^{p}$ by minimizing $\left\|\mathbf{V}_{1}^{p}-\mathbf{U}_{2}^{p}\mathbf{V}_{2}^{p}\right\|_{F}^{2}$, where $\mathbf{U}_{2}^{p}\in\mathbb{R}^{k_{1}\times k_{2}}$, $\mathbf{V}_{2}^{p}\in\mathbb{R}^{k_{2}\times N}$, and $k_{1},k_{2}$ denote the dimensions of layers 1 and 2. This process is repeated until all of the layers are pre-trained. The optimization of Semi-NMF can be derived following a process similar to that described in [20]; to save space, we omit its updating rules here.
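A rough sketch of this layer-wise pre-training is shown below. The single-layer Semi-NMF updates follow the standard rules of Ding et al. [20] (U by a least-squares step, V by a multiplicative rule); the random initialization, iteration count and eps terms are placeholders of this illustration, not settings from the paper.

import numpy as np

def pos(M): return (np.abs(M) + M) / 2.0   # [M]^+, cf. Eq. (16)
def neg(M): return (np.abs(M) - M) / 2.0   # [M]^-

def semi_nmf(X, k, n_iter=200, eps=1e-10):
    # One-layer Semi-NMF X ~ U V with V >= 0 (Ding et al. [20]):
    #   U = X V^T (V V^T)^{-1}
    #   V <- V * sqrt(([U^T X]^+ + [U^T U]^- V) / ([U^T X]^- + [U^T U]^+ V))
    V = np.abs(np.random.randn(k, X.shape[1]))
    for _ in range(n_iter):
        U = X @ V.T @ np.linalg.pinv(V @ V.T)
        UtX, UtU = U.T @ X, U.T @ U
        V *= np.sqrt((pos(UtX) + neg(UtU) @ V + eps) /
                     (neg(UtX) + pos(UtU) @ V + eps))
    return U, V

def pretrain_view(X_p, layer_sizes):
    # Greedy layer-wise pre-training of one view: X^p ~ U_1^p ... U_m^p V_m^p
    U_list, V = [], X_p
    for k in layer_sizes:                  # e.g. [100, 50] as in Section IV-B
        U, V = semi_nmf(V, k)
        U_list.append(U)
    return U_list, V

U_list, V_m = pretrain_view(np.random.randn(60, 40), [20, 10])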

Update rule for the hidden matrix $\mathbf{U}_{i}^{p}$: Fixing $\mathbf{V}_{m}^{p}$ and $\mathbf{W}$, we minimize the objective function (8) with respect to $\mathbf{U}_{i}^{p}$. Setting the partial derivative $\partial O/\partial\mathbf{U}_{i}^{p}=0$, we obtain

$$
\mathbf{U}_{i}^{p}=\left(\Phi_{i-1}^{T}\Phi_{i-1}\right)^{-1}\left(\Phi_{i-1}^{T}\mathbf{X}^{p}\mathbf{U}_{im}^{T}\right)\left(\mathbf{U}_{im}\mathbf{U}_{im}^{T}\right)^{-1}
\tag{10}
$$

where $\Phi_{i-1}=\mathbf{U}_{1}^{p}\mathbf{U}_{2}^{p}\cdots\mathbf{U}_{i-1}^{p}$ and $\mathbf{U}_{im}=\mathbf{U}_{i+1}^{p}\cdots\mathbf{U}_{m}^{p}\mathbf{V}_{m}^{p}$.
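In code, Eq. (10) amounts to two small inverses around a product of the remaining factors. The sketch below uses numpy's pinv for numerical stability (the text uses plain inverses) and treats the empty product $\Phi_{0}$ as the identity; the 1-based index i follows the text, and all names are illustrative.

import numpy as np
from functools import reduce

def update_U(i, U_list, V_m, X_p):
    # Eq. (10): U_i^p = (Phi_{i-1}^T Phi_{i-1})^{-1} (Phi_{i-1}^T X^p U_im^T) (U_im U_im^T)^{-1}
    Phi = reduce(np.matmul, U_list[:i - 1], np.eye(X_p.shape[0]))               # U_1 ... U_{i-1}
    U_im = reduce(np.matmul, U_list[i:], np.eye(U_list[i - 1].shape[1])) @ V_m  # U_{i+1} ... U_m V_m
    return (np.linalg.pinv(Phi.T @ Phi)
            @ (Phi.T @ X_p @ U_im.T)
            @ np.linalg.pinv(U_im @ U_im.T))

# toy shape check for a 2-layer model
U1, U2 = np.random.randn(60, 20), np.random.randn(20, 10)
V2, X = np.abs(np.random.randn(10, 40)), np.random.randn(60, 40)
print(update_U(2, [U1, U2], V2, X).shape)   # (20, 10), same shape as U_2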

Update rule for the regression weight matrix $\mathbf{W}$: Following [27], the derivative of the objective function (8) with respect to $\mathbf{W}$ is:

$$
\frac{\partial O}{\partial\mathbf{W}}=2\beta\left(\mathbf{V}_{l}(\mathbf{V}_{l}^{T}\mathbf{W}-\mathbf{Y}^{T})+\gamma\mathbf{E}\mathbf{W}\right)
\tag{11}
$$

where $\mathbf{E}$ is a diagonal matrix with $e_{ii}=1/(2\|\mathbf{w}_{i}\|_{2})$, and $\mathbf{w}_{i}$ denotes the $i$-th row of $\mathbf{W}$. Setting $\partial O/\partial\mathbf{W}=0$, we obtain the update rule for $\mathbf{W}$:

$$
\mathbf{W}=\left(\mathbf{V}_{l}\mathbf{V}_{l}^{T}+\gamma\mathbf{E}\right)^{-1}\mathbf{V}_{l}\mathbf{Y}^{T}.
\tag{12}
$$
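Since E in Eq. (12) depends on the current W, the update is naturally alternated with the re-computation of E; the sketch below initializes W with a plain ridge solve and runs a few inner re-weighting iterations, both of which are our own choices rather than steps specified in the paper.

import numpy as np

def update_W(V_l, Y_l, gamma, n_inner=5, eps=1e-10):
    # Eq. (12): W = (V_l V_l^T + gamma E)^{-1} V_l Y_l^T,
    # with E diagonal and e_ii = 1 / (2 ||w_i||_2) taken from the current W.
    K = V_l.shape[0]
    W = np.linalg.solve(V_l @ V_l.T + gamma * np.eye(K), V_l @ Y_l.T)  # ridge start
    for _ in range(n_inner):
        E = np.diag(1.0 / (2.0 * np.linalg.norm(W, axis=1) + eps))
        W = np.linalg.solve(V_l @ V_l.T + gamma * E, V_l @ Y_l.T)
    return W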

Update rule for the partially shared factor matrix $\mathbf{V}_{m}^{p}$: As illustrated in Fig. 2, $\mathbf{V}_{m}^{p}$ can be divided into four parts: $\mathbf{V}_{sl}^{p}$, $\mathbf{V}_{su}^{p}$, $\mathbf{V}_{cl}$ and $\mathbf{V}_{cu}$. Similarly, each data matrix $\mathbf{X}^{p}$ is divided into two parts, $\mathbf{X}_{l}^{p}$ and $\mathbf{X}_{u}^{p}$, and each graph Laplacian matrix $\mathbf{L}^{p}$ is divided into two parts, $\mathbf{L}_{l}^{p}$ and $\mathbf{L}_{u}^{p}$.
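This block structure amounts to simple row and column slicing. The sketch below assumes the view-specific rows come first and the shared rows last (consistent with the stacking in Eq. (7)), and that the labeled samples occupy the first N_l columns; the function name is hypothetical.

import numpy as np

def split_factor(V_mp, K_s, N_l):
    # Blocks of the m-th layer factor of one view (Fig. 2)
    V_s, V_c = V_mp[:K_s, :], V_mp[K_s:, :]        # view-specific / shared rows
    return {"V_sl": V_s[:, :N_l], "V_su": V_s[:, N_l:],
            "V_cl": V_c[:, :N_l], "V_cu": V_c[:, N_l:]}

# e.g. K_s = 25 view-specific + K_c = 50 shared rows, 30 of 100 samples labeled
blocks = split_factor(np.abs(np.random.randn(75, 100)), K_s=25, N_l=30)
print({name: b.shape for name, b in blocks.items()})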

For the constraint $\mathbf{V}_{m}^{p}\geq 0$, we introduce the Lagrangian multiplier $\eta$ as follows:

$$
\begin{aligned}
\mathcal{L}\left(\mathbf{V}_{m}^{p}\right)=\sum_{p=1}^{P}\Big(&\alpha^{p}\left\|\mathbf{X}^{p}-\Phi_{m}\mathbf{V}_{m}^{p}\right\|_{F}^{2}+\mu\,\mathrm{tr}\left(\mathbf{V}_{m}^{p}\mathbf{L}^{p}(\mathbf{V}_{m}^{p})^{T}\right)\\
&+\beta\left\|\mathbf{W}^{T}\mathbf{V}_{l}-\mathbf{Y}\right\|_{F}^{2}-\mathrm{tr}(\eta^{T}\mathbf{V}_{m}^{p})\Big)
\end{aligned}
\tag{13}
$$

where $\Phi_{m}=\mathbf{U}_{1}^{p}\mathbf{U}_{2}^{p}\cdots\mathbf{U}_{m}^{p}$. Accordingly, $\Phi_{m}$ is divided into two parts, $\Phi_{ms}$ and $\Phi_{mc}$. For convenience, we denote $\mathbf{A}^{p}=[\Phi_{ms},\Phi_{mc}][\mathbf{V}_{sl}^{p};\mathbf{V}_{cl}]$ and $\mathbf{B}^{p}=[\Phi_{ms},\Phi_{mc}][\mathbf{V}_{su}^{p};\mathbf{V}_{cu}]$.

Setting the gradient of $\mathcal{L}\left(\mathbf{V}_{m}^{p}\right)$ with respect to $\mathbf{V}_{sl}^{p}$, $\mathbf{V}_{su}^{p}$, $\mathbf{V}_{cl}$ and $\mathbf{V}_{cu}$ to zero, respectively, we have

$$
\begin{aligned}
\frac{\partial\mathcal{L}}{\partial\mathbf{V}_{sl}^{p}}&=\alpha^{p}\Phi_{ms}^{T}\left(\mathbf{A}^{p}-\mathbf{X}_{l}^{p}\right)+\mu\mathbf{V}_{s}^{p}\mathbf{L}_{l}^{p}+\beta\mathbf{F}_{s}^{p}-\eta_{sl}=0\\
\frac{\partial\mathcal{L}}{\partial\mathbf{V}_{su}^{p}}&=\alpha^{p}\Phi_{ms}^{T}\left(\mathbf{B}^{p}-\mathbf{X}_{u}^{p}\right)+\mu\mathbf{V}_{s}^{p}\mathbf{L}_{u}^{p}-\eta_{su}=0\\
\frac{\partial\mathcal{L}}{\partial\mathbf{V}_{cl}}&=\sum_{p=1}^{P}\alpha^{p}\Phi_{mc}^{T}\left(\mathbf{A}^{p}-\mathbf{X}_{l}^{p}\right)+\mu\mathbf{V}_{c}^{p}\mathbf{L}_{l}^{p}+\beta\mathbf{F}_{c}-\eta_{cl}=0\\
\frac{\partial\mathcal{L}}{\partial\mathbf{V}_{cu}}&=\sum_{p=1}^{P}\alpha^{p}\Phi_{mc}^{T}\left(\mathbf{B}^{p}-\mathbf{X}_{u}^{p}\right)+\mu\mathbf{V}_{c}^{p}\mathbf{L}_{u}^{p}-\eta_{cu}=0
\end{aligned}
\tag{14}
$$

where $\mathbf{F}=\mathbf{W}(\mathbf{W}^{T}\mathbf{V}_{l}-\mathbf{Y})=[\mathbf{F}_{s}^{1};\ldots;\mathbf{F}_{s}^{P};\mathbf{F}_{c}]$. Following a proof similar to that in [14] and using the Karush-Kuhn-Tucker conditions, we can formulate the updating rules for $\mathbf{V}_{m}^{p}$:

$$
\begin{aligned}
\mathbf{V}_{sl}^{p}&=\mathbf{V}_{sl}^{p}\odot\sqrt{\frac{\alpha^{p}[\Phi_{ms}^{T}\mathbf{A}^{p}]^{-}+\alpha^{p}[\Phi_{ms}^{T}\mathbf{X}_{l}^{p}]^{+}+\mu[\mathbf{V}_{s}^{p}\mathbf{L}_{l}^{p}]^{-}+\beta[\mathbf{F}_{s}^{p}]^{-}}{\alpha^{p}[\Phi_{ms}^{T}\mathbf{A}^{p}]^{+}+\alpha^{p}[\Phi_{ms}^{T}\mathbf{X}_{l}^{p}]^{-}+\mu[\mathbf{V}_{s}^{p}\mathbf{L}_{l}^{p}]^{+}+\beta[\mathbf{F}_{s}^{p}]^{+}}}\\
\mathbf{V}_{su}^{p}&=\mathbf{V}_{su}^{p}\odot\sqrt{\frac{\alpha^{p}[\Phi_{ms}^{T}\mathbf{B}^{p}]^{-}+\alpha^{p}[\Phi_{ms}^{T}\mathbf{X}_{u}^{p}]^{+}+\mu[\mathbf{V}_{s}^{p}\mathbf{L}_{u}^{p}]^{-}}{\alpha^{p}[\Phi_{ms}^{T}\mathbf{B}^{p}]^{+}+\alpha^{p}[\Phi_{ms}^{T}\mathbf{X}_{u}^{p}]^{-}+\mu[\mathbf{V}_{s}^{p}\mathbf{L}_{u}^{p}]^{+}}}\\
\mathbf{V}_{cl}&=\mathbf{V}_{cl}\odot\sqrt{\frac{\sum_{p=1}^{P}\alpha^{p}\left([\Phi_{mc}^{T}\mathbf{A}^{p}]^{-}+[\Phi_{mc}^{T}\mathbf{X}_{l}^{p}]^{+}\right)+\mu[\mathbf{V}_{c}^{p}\mathbf{L}_{l}^{p}]^{-}+\beta[\mathbf{F}_{c}]^{-}}{\sum_{p=1}^{P}\alpha^{p}\left([\Phi_{mc}^{T}\mathbf{A}^{p}]^{+}+[\Phi_{mc}^{T}\mathbf{X}_{l}^{p}]^{-}\right)+\mu[\mathbf{V}_{c}^{p}\mathbf{L}_{l}^{p}]^{+}+\beta[\mathbf{F}_{c}]^{+}}}\\
\mathbf{V}_{cu}&=\mathbf{V}_{cu}\odot\sqrt{\frac{\sum_{p=1}^{P}\alpha^{p}[\Phi_{mc}^{T}\mathbf{B}^{p}]^{-}+\sum_{p=1}^{P}\alpha^{p}[\Phi_{mc}^{T}\mathbf{X}_{u}^{p}]^{+}+\mu[\mathbf{V}_{c}^{p}\mathbf{L}_{u}^{p}]^{-}}{\sum_{p=1}^{P}\alpha^{p}[\Phi_{mc}^{T}\mathbf{B}^{p}]^{+}+\sum_{p=1}^{P}\alpha^{p}[\Phi_{mc}^{T}\mathbf{X}_{u}^{p}]^{-}+\mu[\mathbf{V}_{c}^{p}\mathbf{L}_{u}^{p}]^{+}}}
\end{aligned}
\tag{15}
$$

where $[\mathbf{H}]^{+}$ and $[\mathbf{H}]^{-}$ denote the matrices obtained from $\mathbf{H}$ by replacing all negative elements with 0 and all positive elements with 0, respectively. That is,

$$
\forall k,j:\quad[\mathbf{H}]_{kj}^{+}=\frac{|\mathbf{H}_{kj}|+\mathbf{H}_{kj}}{2},\qquad[\mathbf{H}]_{kj}^{-}=\frac{|\mathbf{H}_{kj}|-\mathbf{H}_{kj}}{2}.
\tag{16}
$$
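To make the rules concrete, the sketch below implements the element-wise split of Eq. (16) and one representative rule of Eq. (15), the update of the unlabeled view-specific block V_su^p; the other three blocks follow the same pattern. The eps terms and the assumption that L_u^p is the N x N_u column block of L^p are ours.

import numpy as np

def pos(M): return (np.abs(M) + M) / 2.0   # [M]^+ of Eq. (16)
def neg(M): return (np.abs(M) - M) / 2.0   # [M]^-

def update_V_su(V_su, V_cu, Phi_ms, Phi_mc, X_u, L_u, V_s, alpha, mu, eps=1e-10):
    # Second rule of Eq. (15); B^p = [Phi_ms, Phi_mc][V_su^p; V_cu] reconstructs
    # the unlabeled columns. V_s is the full K_s x N view-specific block.
    B = Phi_ms @ V_su + Phi_mc @ V_cu
    num = (alpha * neg(Phi_ms.T @ B) + alpha * pos(Phi_ms.T @ X_u)
           + mu * neg(V_s @ L_u))
    den = (alpha * pos(Phi_ms.T @ B) + alpha * neg(Phi_ms.T @ X_u)
           + mu * pos(V_s @ L_u))
    return V_su * np.sqrt((num + eps) / (den + eps))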
Algorithm 1 Optimization algorithm of PSDMF
Input: Multi-view data $\{\mathbf{X}^{p}\}_{p=1}^{P}$, parameters $\mu,\beta,\gamma,\lambda,K$, layer sizes $k$
Output: Partially shared latent representation $\mathbf{V}$, regression weight matrix $\mathbf{W}$
1:  Initialize
2:  Construct the partial label matrix $\mathbf{Y}$ via Eq. (5)
3:  for all layers in each view do
4:     $(\mathbf{U}_{i}^{p},\mathbf{V}_{i}^{p})\leftarrow$ Semi-NMF$(\mathbf{V}_{i-1}^{p},k_{i})$
5:  end for
6:  while not converged do
7:     Update $\mathbf{W}$ via Eq. (12)
8:     for $p=1$ to $P$ do
9:        Update $\alpha^{p}$ via Eq. (9)
10:       for $i=1$ to $m$ do
11:          Update $\mathbf{U}_{i}^{p}$ via Eq. (10)
12:       end for
13:       Update $\mathbf{V}_{m}^{p}$ via Eq. (15)
14:    end for
15: end while
TABLE II: Results on five datasets (mean ± standard deviation). Higher values indicate better performance and the highest values are in boldface.
Dataset Metric DMVC GMC lLSMC LMVSC DICS MvSL PSLF GPSNMF Ours
Extended Yale B ACC 50.34±0.07 43.38±0.00 53.41±1.29 36.62±0.00 47.54±3.58 19.64±1.81 43.79±4.90 69.56±8.04 87.38±3.45
Extended Yale B NMI 49.97±0.14 44.90±0.00 53.40±0.64 28.09±0.00 50.05±5.11 11.82±2.88 32.14±4.85 60.44±9.12 83.53±2.44
Extended Yale B Purity 50.49±0.07 43.69±0.00 53.45±1.26 42.77±0.00 51.39±3.82 20.52±1.66 44.10±4.36 69.56±8.04 87.38±3.45
Prokaryotic ACC 53.50±2.23 49.55±0.00 52.31±6.01 57.53±0.00 56.07±11.76 35.68±2.51 46.49±6.63 63.45±2.59 66.83±2.94
Prokaryotic NMI 3.28±0.66 19.34±0.00 16.49±8.59 13.37±0.00 12.63±9.88 1.05±0.41 1.69±1.02 18.46±5.48 21.08±4.21
Prokaryotic Purity 57.31±0.65 58.44±0.00 63.08±4.15 62.94±0.00 60.14±5.16 57.03±0.07 57.46±0.73 63.45±2.59 66.53±2.94
Caltech101-7 ACC 54.60±0.43 69.20±0.00 51.98±2.61 69.47±0.00 50.95±10.61 46.42±4.14 76.88±3.09 89.77±2.54 90.24±3.08
Caltech101-7 NMI 37.60±5.79 65.95±0.00 53.22±2.22 46.68±0.00 53.08±7.71 51.59±2.81 40.94±7.75 74.53±4.10 75.52±5.36
Caltech101-7 Purity 79.55±4.64 88.47±0.00 85.61±1.64 77.07±0.00 85.28±3.70 83.73±1.49 77.56±4.12 90.38±2.33 91.12±2.03
Caltech101-20 ACC 51.72±1.72 45.64±0.00 45.03±4.16 42.08±0.00 35.75±3.83 42.01±2.86 55.75±10.35 77.33±3.34 79.74±3.73
Caltech101-20 NMI 52.18±1.49 48.09±0.00 61.21±1.14 49.64±0.00 55.99±2.58 60.48±1.45 44.85±5.77 67.48±3.52 71.24±3.06
Caltech101-20 Purity 70.23±0.92 55.49±0.00 76.95±0.79 50.84±0.00 69.32±2.47 76.16±1.17 62.55±4.94 77.81±2.60 79.74±2.84
MSRCV1 ACC 40.76±2.37 74.76±0.00 69.91±1.79 64.76±0.00 51.91±6.61 78.07±5.83 82.96±4.51 72.17±4.03 86.24±5.69
MSRCV1 NMI 27.60±3.17 70.09±0.00 62.37±2.71 58.81±0.00 50.48±11.29 68.32±4.69 75.04±3.49 67.41±4.34 76.14±3.42
MSRCV1 Purity 45.14±2.41 79.05±0.00 72.86±1.38 69.05±0.00 58.09±8.68 78.52±5.73 82.96±4.51 74.92±4.44 86.24±4.45

We now have all of the update rules. The updates are iterated repeatedly until convergence. The overall optimization process of PSDMF is outlined in Algorithm 1, where the "Semi-NMF" procedure performs the pre-training described earlier. Once the optimal partially shared latent representation matrix $\mathbf{V}$ and regression weight matrix $\mathbf{W}$ are obtained, the cluster label of sample $i$ is given by $y=\arg\max_{c}y_{c,i}$, where $\boldsymbol{y}_{i}=\mathbf{W}^{T}\boldsymbol{v}_{i}$.
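The final assignment step is a single matrix product followed by an argmax over classes; a minimal sketch:

import numpy as np

def predict_labels(W, V):
    # y_i = argmax_c (W^T v_i), evaluated for every column v_i of V at once
    scores = W.T @ V            # C x N matrix of regression scores
    return np.argmax(scores, axis=0)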

IV Experiments

To comparatively study the performance of PSDMF, we consider several state-of-the-art baselines, as described below:
Multi-view clustering via deep matrix factorization (DMVC) [14], graph-based multi-view clustering (GMC) [21], latent multi-view subspace clustering (lLSMC) [23], large-scale multi-view subspace clustering (LMVSC) [22], multi-view discriminative learning via joint non-negative matrix factorization (DICS) [18], graph-regularized multi-view semantic subspace learning (MvSL) [19], partially shared latent factor learning (PSLF) [10] and semi-supervised multi-view clustering with graph-regularized partially shared non-negative matrix factorization (GPSNMF) [17]. Among them, four methods are unsupervised (i.e., DMVC, GMC, lLSMC, LMVSC) and the others are semi-supervised.

IV-A Datasets and Evaluation Metric

We perform experiments on five benchmark datasets: Extended Yale B, Prokaryotic, MSRCV1, Caltech101-7 and Caltech101-20. More detailed information about these datasets is given in Table III.

TABLE III: Statistics of datasets used in experiments.
Dataset Dimension of Views Sizes Classes
Extended Yale B 2500/3304/6750 650 10
Prokaryotic 393/3/438 551 4
Caltech101-7/20 48/40/254/1984/512/928 1474/2386 7/20
MSRCV1 24/576/512/256/254 210 7

For evaluation metrics, we use Accuracy (ACC), Normalized Mutual Information (NMI) and Purity to comprehensively measure the clustering performance in our experiments. The formal definitions of these metrics, omitted here to save space, can be found in [9].

IV-B Experiment Setup

Following [17], for PSLF, GPSNMF and PSDMF, the dimension of the partially shared latent representation $K$ and the common factor ratio $\lambda$ are set to 100 and 0.5, respectively; thus $K_{c}+K_{s}\times P=100$ and $K_{c}/(K_{s}+K_{c})=0.5$. In addition, these three methods all predict the labels with the regression matrix $\mathbf{W}$. All the other methods are evaluated with the classic clustering algorithm K-means. For the semi-supervised methods (i.e., DICS, MvSL, PSLF, GPSNMF and PSDMF), the proportion of labeled data is set to 10%. Following [14], the layer sizes of the deep structure models (i.e., DMVC and PSDMF) are set to [100, 50].

Each compared method is run 10 times to reduce the randomness caused by initialization, and we report the average scores and standard deviations. The source codes of the compared methods are obtained from the authors' websites, and their parameters are set to the optimal values when available.

IV-C Performance Evaluation

The multi-view clustering performance is shown in Table II. From the experimental results, we draw the following conclusions. In general, the semi-supervised algorithms are superior to the unsupervised ones. For example, our PSDMF significantly improves the performance compared with DMVC on all five datasets, even when only 10% of the label information is used. For the Extended Yale B dataset, we raise the performance bar by around 18% in ACC, 13% in NMI and 18% in Purity. On average, we improve on the state-of-the-art GPSNMF by more than 16%, which indicates that multi-layer structure methods have clustering advantages over single-layer methods. Through the partially shared deep representation, PSDMF eliminates the influence of undesirable factors by considering both view-specific and common features, and keeps the important information in the final representation layer. Across the different datasets, our model achieves better performance than the state-of-the-art methods, which demonstrates its robustness.

Refer to caption

Figure 3: The performance of PSDMF in terms of ACC with different values of the parameters $\mu$ and $\beta$. (a) Extended Yale B; (b) Prokaryotic; (c) Caltech101-7; (d) Caltech101-20; (e) MSRCV1.

IV-D Parameters Sensitivity

There are three essential non-negative parameters (i.e., $\mu$, $\beta$ and $\gamma$) in our proposed model. $\mu$ controls the smoothness of the partially shared factor, while $\gamma$ and $\beta$ control the sparsity of the weight matrix $\mathbf{W}$ and the balance between the regression term and the data reconstruction, respectively. Following [10, 17], $\gamma=10$ usually yields good performance. As shown in Fig. 3, we test the sensitivity of the parameters $\mu$ and $\beta$ in terms of the clustering results, tuning them in the ranges $\{0.001, 0.005, 0.01, 0.05, 0.1, 0.5\}$ and $\{0.01, 0.1, 1, 10\}$, respectively. We observe that the parameters affect the experimental results differently on different datasets, which indicates that $\mu$ and $\beta$ play an important role in PSDMF. In practice, we fix $\mu=0.1$ and $\beta=10$ as defaults in our experiments.

V Conclusion

In this paper, we introduced a novel semi-supervised deep matrix factorization model for multi-view learning, called PSDMF, which is able to learn a comprehensive partially shared latent final-layer representation of multi-view data. Through the partially shared multi-layer matrix factorization structure, our method is capable of simultaneously exploiting the view-specific and common features among different views. Benefiting from the label regression term, it can incorporate information from the labeled data. Furthermore, we utilize graph regularization and an auto-weighted strategy to preserve the intrinsic geometric structure of the data. An iterative optimization algorithm is developed to solve PSDMF. Experimental results show that the proposed model achieves superior performance.

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61722304, 61801133, and 61803096, in part by the Guangdong Science and Technology Foundation under Grant Nos. 2019B010118001, 2019B010121001, and 2019B010154002, and in part by the National Key Research and Development Project of China under Grant No. 2018YFB1802400.

References

  • [1] S. Sun, “A survey of multi-view machine learning,” Neural Computing and Applications, vol. 23, no. 7–8, pp. 2031–2038, 2013.
  • [2] Y. Li, M. Yang, Z. Zhang, “A survey of multi-view representation learning,” IEEE Trans. Knowl. Data Eng., vol. 31, no. 10, pp. 1863–1883, 2018.
  • [3] R. Zhang, F. Nie, X. Li and X. Wei, “Feature selection with multi-view data: A survey,” Information Fusion, vol. 50, pp. 158–167, 2019.
  • [4] D. D. Lee, H. S. Seung, “Algorithms for non-negative matrix factorization,” in NIPS, pp. 556–562, 2001.
  • [5] Z. Yang, Y. Xiang, K. Xie and Y. Lai, “Adaptive method for nonsmooth nonnegative matrix factorization,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 4, pp. 948–960, 2017.
  • [6] X. Zhang, L. Zong, X. Liu and H. Yu, “Constrained NMF-based multi-view clustering on unmapped data,” in AAAI, pp. 3174–3180, 2015.
  • [7] J. Liu, C. Wang, J. Gao and J. Han, “Multi-view clustering via joint nonnegative matrix factorization,” in SDM, pp. 252–260, 2013.
  • [8] L. Zong, X. Zhang, L. Zhao, H. Yu and Q. Zhao, “Multi-view clustering via multi-manifold regularized non-negative matrix factorization,” Neural Netw., vol. 88, pp. 74–89, 2017.
  • [9] Z. Yang, N. Liang, W. Yan, Z. Li and S. Xie, “Uniform distribution non-negative matrix factorization for multiview clustering,” IEEE Trans. Cybern., in press, 2020.
  • [10] J. Liu, Y. Jiang, Z. Li, Z. H. Zhou and H. Lu, “Partially shared latent factor learning with multiview data,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 6, pp. 1233–1246, 2015.
  • [11] L. Zhao, T. Yang, J. Zhang, Z. Chen, Y. Yang and Z. J. Wang, “Co-learning non-negative correlated and uncorrelated features for multi-view data,” IEEE Trans. Neural Netw. Learn. Syst., in press, 2020.
  • [12] Y. Meng, R. Shang, F. Shang, L. Jiao, S. Yang and R. Stolkin, “Semi-supervised graph regularized deep NMF with bi-orthogonal constraints for data representation,” IEEE Trans. Neural Netw. Learn. Syst., in press, 2019.
  • [13] W. Wang, R. Arora, K. Livescu and J. Bilmes, “On deep multi-view representation learning,” in ICML, pp. 1083–1092, 2015.
  • [14] H. Zhao, Z. Ding and Y. Fu, “Multi-view clustering via deep matrix factorization,” in AAAI, pp. 2921–2927, 2017.
  • [15] G. Trigeorgis, K. Bousmalis, S. Zafeiriou, and B. W. Schuller, “A deep matrix factorization method for learning attribute representations,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 3, pp. 417–429, 2017.
  • [16] S. Huang, Z. Kang and Z. Xu, “Auto-weighted multi-view clustering via deep matrix decomposition,” Pattern Recognition, vol. 97, no. 107015, 2020.
  • [17] N. Liang, Z. Yang, Z. Li, S. Xie and C. Su, “Semi-supervised multi-view clustering with Graph-regularized Partially Shared Non-negative Matrix Factorization,” Knowledge-Based Systems, vol. 190, no. 105185, 2020.
  • [18] Z. Zhang, Z. Qin, P. Li, Q. Yang and J. Shao, “Multi-view discriminative learning via joint non-negative matrix factorization,” in Int. Conf. Database Systems for Advanced Applications, pp. 542–557, 2018.
  • [19] J. Peng, P. Luo, Z. Guan and J. Fan, “Graph-regularized multi-view semantic subspace learning,” Int. Jour. Mach. Learn. Cybern., vol. 10, no. 5, pp. 879–895, 2019.
  • [20] C. H. Ding, T. Li, and M. I. Jordan, “Convex and semi-nonnegative matrix factorizations,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 1, pp. 45–55, 2010.
  • [21] H. Wang, Y. Yang and B. Liu, “GMC: Graph-based multi-view clustering,” IEEE Trans. Knowl. Data Eng., vol. 32, no. 6, pp. 1116–1129, 2019.
  • [22] Z. Kang, W. Zhou, Z. Zhao, J. Shao, M. Han and Z. Xu, “Large-scale multi-view subspace clustering in linear time,” in AAAI, 2020.
  • [23] C. Zhang, H. Fu, Q. Hu, X. Cao, Y. Xie, D. Tao and D. Xu, “Generalized latent multi-view subspace clustering,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 1, pp. 86–99, 2020.
  • [24] W. Zhao, C. Xu, Z. Guan and Y. Liu, “Multiview concept learning via deep matrix factorization,” IEEE Trans. Neural Netw. Learn. Syst., in press, 2020.
  • [25] F. Nie, G. Cai, J. Li and X. Li, “Auto-weighted multi-view learning for image clustering and semi-supervised classification,” IEEE Trans. Image Process., vol. 27, no. 3, pp. 1501–1511, 2018.
  • [26] Z. Kang, X. Lu, J. Yi and Z. Xu, “Self-weighted multiple kernel learning for graph-based clustering and semi-supervised classification,” in IJCAI, pp. 2312–2318, 2018.
  • [27] F. Nie, X. Cai, H. Huang and C. Ding, “Efficient and robust feature selection via joint l2, 1-norms minimization,” in NIPS, pp. 1813–1821, 2010.
  • [28] D. Cai, X. He, J. Han and T.S. Huang, “Graph regularized nonnegative matrix factorization for data representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33 no. 8, pp. 1548–1560, 2011.