
On the Robustness of Cross-Concentrated Sampling for Matrix Completion
The authors contributed equally. This work was partially supported by NSF DMS 2304489. (Corresponding authors: HanQin Cai and Longxiu Huang.)

HanQin Cai Department of Statistics and Data Science
Department of Computer Science
University of Central Florida
Orlando, FL 32816, USA
[email protected]
   Longxiu Huang Department of Computational Mathematics, Science and Engineering
Department of Mathematics
Michigan State University
East Lansing, MI 48824, USA
[email protected]
   Chandra Kundu Department of Statistics and Data Science
University of Central Florida
Orlando, FL 32816, USA
[email protected]
   Bowen Su Department of Mathematics
Michigan State University
East Lansing, MI 48824, USA
[email protected]
Abstract

Matrix completion is one of the crucial tools in modern data science research. Recently, a novel sampling model for matrix completion, coined cross-concentrated sampling (CCS), has drawn much attention. However, the robustness of the CCS model against sparse outliers remains unclear in the existing studies. In this paper, we aim to answer this question by exploring a novel Robust CCS Completion problem. A highly efficient non-convex iterative algorithm, dubbed Robust CUR Completion (RCURC), is proposed. The empirical performance of the proposed algorithm, in terms of both efficiency and robustness, is verified on synthetic and real datasets.

Index Terms:
Robust matrix completion, cross-concentrated sampling, CUR decomposition, outlier detection.

I Introduction

The matrix completion problem [1, 2] aims to recover an underlying low-rank matrix $\bm{X}$ from partial, entry-wise observations. It has received tremendous attention in the past decades and has been widely applied in real-world applications such as collaborative filtering [3, 4], image processing [5, 6], and signal processing [7, 8]. While the vanilla matrix completion studies are often based on entry-wise uniform or Bernoulli sampling models, the recent development of a novel sampling model called Cross-Concentrated Sampling (CCS) [9] has caught the attention of the matrix completion community. The CCS model is summarized in Procedure 1.

Procedure 1 Cross-Concentrated Sampling (CCS) [9]
1: Input: $\bm{Y}$: access to the data matrix.
2: Uniformly choose row and column indices $\mathcal{I},\mathcal{J}$.
3: Set $\bm{R}=[\bm{Y}]_{\mathcal{I},:}$ and $\bm{C}=[\bm{Y}]_{:,\mathcal{J}}$.
4: Uniformly sample entries in $\bm{R}$ and $\bm{C}$, then record the sampled locations as $\Omega_{\bm{R}}$ and $\Omega_{\bm{C}}$, respectively.
5: Output: $[\bm{Y}]_{\Omega_{\bm{R}}\cup\Omega_{\bm{C}}}$, $\Omega_{\bm{R}}$, $\Omega_{\bm{C}}$, $\mathcal{I}$, $\mathcal{J}$.
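To make Procedure 1 concrete, below is a minimal NumPy sketch of CCS sampling. The function name `ccs_sample` and the rate parameters `row_frac`, `col_frac`, and `obs_frac` are our own illustrative choices; for simplicity, the per-entry sampling is Bernoulli rather than uniform with replacement, and the observation sets are returned as boolean masks.

```python
import numpy as np

def ccs_sample(Y, row_frac=0.3, col_frac=0.3, obs_frac=0.25, seed=None):
    """Sketch of Procedure 1: cross-concentrated sampling of a data matrix Y."""
    rng = np.random.default_rng(seed)
    n1, n2 = Y.shape

    # Step 2: uniformly choose row and column index sets I and J.
    I = rng.choice(n1, size=int(row_frac * n1), replace=False)
    J = rng.choice(n2, size=int(col_frac * n2), replace=False)

    # Step 4: sample entries within R = Y[I, :] and C = Y[:, J]; Omega_R and
    # Omega_C are recorded as boolean masks on those submatrices.
    mask_R = rng.random((len(I), n2)) < obs_frac
    mask_C = rng.random((n1, len(J))) < obs_frac

    # Step 5: observed values on Omega_R and Omega_C (zeros at unobserved entries).
    R_obs = np.where(mask_R, Y[I, :], 0.0)
    C_obs = np.where(mask_C, Y[:, J], 0.0)
    return I, J, mask_R, mask_C, R_obs, C_obs
```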

As illustrated in Figure 1, the CCS model bridges two popular sampling models for matrix completion: uniform sampling and CUR sampling, which recovers the underlying low-rank matrix from observed rows and columns. The CCS model provides extra flexibility, with a theoretical guideline, for samples concentrated on certain rows and columns. This model has the potential to enable more cost- and time-efficient sampling strategies in applications such as recommendation systems. However, data in real-world applications are often corrupted by sparse outliers, and uniformly sampled data must be processed by robust completion algorithms to obtain reliable recovery results. That being said, a crucial question has to be answered before the CCS model can be widely used: “Is CCS-based matrix completion robust to sparse outliers under some robust algorithm, like the uniform sampling model?”

Mathematically, given the partial observation $\mathcal{P}_{\Omega}(\bm{Y})$ of a corrupted data matrix $\bm{Y}=\bm{X}+\bm{S}$, we study the Robust CCS Completion problem:

$$\begin{split}
\operatorname*{minimize}_{\bm{X},\bm{S}} \quad & \frac{1}{2}\big\langle \mathcal{P}_{\Omega}(\bm{X}+\bm{S}-\bm{Y}),\ \bm{X}+\bm{S}-\bm{Y}\big\rangle \\
\operatorname*{subject~to} \quad & \mathrm{rank}(\bm{X})=r \quad\text{and}\quad \bm{S}\ \text{is $\alpha$-sparse},
\end{split} \tag{1}$$

where $\mathcal{P}_{\Omega}$ is the sampling operator defined as

$$\mathcal{P}_{\Omega}(\bm{Y}) = \sum_{(i,j)\in\Omega} [\bm{Y}]_{i,j}\, \bm{e}_i \bm{e}_j^{\top},$$

and the observation set $\Omega$ is generated according to the CCS model. As in standard robust matrix completion, a sparse variable $\bm{S}$ is introduced for better outlier tolerance, so that the underlying low-rank matrix $\bm{X}$ can be recovered accurately.
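For reference, the objective in (1) is simply half the squared Frobenius norm of the residual restricted to $\Omega$. A small sketch, assuming $\Omega$ is encoded as a boolean mask (an implementation choice on our part):

```python
import numpy as np

def ccs_objective(X, S, Y, omega_mask):
    """0.5 * <P_Omega(X + S - Y), X + S - Y>, with Omega given as a boolean mask."""
    residual = np.where(omega_mask, X + S - Y, 0.0)   # P_Omega(X + S - Y)
    return 0.5 * np.sum(residual ** 2)                # = 0.5 * ||P_Omega(X + S - Y)||_F^2
```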

(a) Uniform Sampling  (b) CCS–Less Concentrated  (c) CCS–More Concentrated  (d) CUR Sampling
Figure 1: [9] Visual illustrations of different sampling schemes. From left to right, sampling methods change from the uniform sampling style to the CUR sampling style with the same total observation rate. Colored pixels indicate observed entries, while black pixels mean missing entries.

I-A Related Work

The problem of low-rank matrix recovery in the presence of sparse outliers has been well studied under uniform and Bernoulli sampling. This problem is known as robust principal component analysis (RPCA) when the corrupted data matrix is fully observed, and it is called robust matrix completion (RMC) when the data is partially observed. The seminal work [10] considers both the RPCA and RMC problems via convex relaxed formulations and provides recovery guarantees. In particular, under the $\mu$-incoherence assumption on the low-rank $\bm{X}$, [10] requires the positions of the outliers to be placed uniformly at random and at least $0.1n^2$ entries to be observed uniformly at random. Later, a series of non-convex algorithms [11, 12, 13, 14, 15, 16] tackle the RPCA and/or RMC problems with an improved, non-random $\alpha$-sparsity assumption on the outlier matrix $\bm{S}$. The typical recovery guarantee shows linear convergence of a non-convex algorithm provided $\alpha\leq\mathcal{O}(1/\mathrm{poly}(\mu r))$; moreover, $\mathcal{O}(\mathrm{poly}(\mu r)\,\mathrm{polylog}(n)\,n)$ random samples are typically required in the RMC case. Another line of work [17, 7, 18, 19, 8] focuses on the robust recovery of structured low-rank matrices, e.g., Hankel matrices; by utilizing the structure, these methods typically require merely $\mathcal{O}(\mathrm{poly}(\mu r)\,\mathrm{polylog}(n))$ samples, even in the presence of structured outliers. More recently, [20, 21, 22] study the robust CUR decomposition problem, that is, recovering the low-rank matrix from row- and column-wise observations with entry-wise corruptions. These studies show that robust CUR decomposition is as robust as RPCA, with better computational efficiency and data interpretability.

On the other hand, [9] shows that CCS-based matrix completion requires $\mathcal{O}(\mu^2 r^2 n\log^2 n)$ samples, which is only a factor of $\log n$ worse than the state-of-the-art result; however, its outlier tolerance has not been studied.

I-B Notation

For a matrix $\bm{M}$, $[\bm{M}]_{i,j}$, $[\bm{M}]_{\mathcal{I},:}$, $[\bm{M}]_{:,\mathcal{J}}$, and $[\bm{M}]_{\mathcal{I},\mathcal{J}}$ denote its $(i,j)$-th entry, its row submatrix with row indices $\mathcal{I}$, its column submatrix with column indices $\mathcal{J}$, and its submatrix with row indices $\mathcal{I}$ and column indices $\mathcal{J}$, respectively. $\|\bm{M}\|_{\mathrm{F}} := (\sum_{i,j} [\bm{M}]_{i,j}^2)^{1/2}$ denotes the Frobenius norm, $\|\bm{M}\|_{2,\infty} := \max_i (\sum_j [\bm{M}]_{i,j}^2)^{1/2}$ denotes the largest row-wise $\ell_2$-norm, and $\|\bm{M}\|_{\infty} := \max_{i,j} |[\bm{M}]_{i,j}|$ denotes the largest entry-wise magnitude. $\bm{M}^{\dagger}$ represents the Moore–Penrose inverse of $\bm{M}$, and $\bm{M}^{\top}$ is the transpose of $\bm{M}$. $\langle\cdot,\cdot\rangle$ denotes the Frobenius inner product. We denote by $\bm{e}_i$ the $i$-th standard basis vector of the real vector space. The symbol $[n]$ denotes the set $\{1,2,\cdots,n\}$ for all $n\in\mathbb{Z}^{+}$. Throughout the paper, uniform sampling refers to uniform sampling with replacement.
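For readers implementing these quantities, the norms and the pseudoinverse above map directly to NumPy one-liners (a purely illustrative snippet):

```python
import numpy as np

M = np.arange(6, dtype=float).reshape(2, 3)

fro     = np.linalg.norm(M, 'fro')               # ||M||_F
two_inf = np.max(np.linalg.norm(M, axis=1))      # ||M||_{2,inf}: largest row-wise l2-norm
max_abs = np.max(np.abs(M))                      # ||M||_inf: largest entry-wise magnitude
M_pinv  = np.linalg.pinv(M)                      # Moore-Penrose inverse M^dagger
inner   = np.sum(M * M)                          # Frobenius inner product <M, M>
```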

II Preliminaries

II-A Assumptions

As discussed in the related work, $\mu$-incoherence and $\alpha$-sparsity are commonly used assumptions in RMC problems. We adopt the same assumptions in this study and give their formal definitions here:

Assumption 1 ($\mu$-incoherence).

Let $\bm{X}\in\mathbb{R}^{n_1\times n_2}$ be a rank-$r$ matrix. It is said to be $\mu$-incoherent if

$$\|\bm{U}\|_{2,\infty} \leq \sqrt{\frac{\mu r}{n_1}} \qquad\text{and}\qquad \|\bm{V}\|_{2,\infty} \leq \sqrt{\frac{\mu r}{n_2}}$$

for some positive $\mu$, where $\bm{U}\bm{\Sigma}\bm{V}^{\top}$ is the compact SVD of $\bm{X}$.

Assumption 2 ($\alpha$-sparsity).

$\bm{S}\in\mathbb{R}^{n_1\times n_2}$ is said to be $\alpha$-sparse if at most an $\alpha$ fraction of the entries in each of its rows and columns are non-zero. That is, for all $i\in[n_1]$ and $j\in[n_2]$, it holds that

$$\|[\bm{S}]_{i,:}\|_0 \leq \alpha n_2 \qquad\text{and}\qquad \|[\bm{S}]_{:,j}\|_0 \leq \alpha n_1.$$

Note that no randomness is required for the sparsity pattern.
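Both assumptions are straightforward to check numerically. The following sketch, with helper names of our own, estimates the smallest $\mu$ satisfying Assumption 1 and verifies Assumption 2 for a candidate outlier matrix:

```python
import numpy as np

def incoherence(X, r):
    """Smallest mu such that Assumption 1 holds for the rank-r matrix X."""
    n1, n2 = X.shape
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    U, V = U[:, :r], Vt[:r, :].T                     # compact SVD factors
    mu_row = np.max(np.sum(U**2, axis=1)) * n1 / r   # ||U||_{2,inf}^2 * n1 / r
    mu_col = np.max(np.sum(V**2, axis=1)) * n2 / r   # ||V||_{2,inf}^2 * n2 / r
    return max(mu_row, mu_col)

def is_alpha_sparse(S, alpha):
    """Check Assumption 2: at most an alpha fraction of non-zeros per row and column."""
    n1, n2 = S.shape
    nnz_row = np.count_nonzero(S, axis=1)
    nnz_col = np.count_nonzero(S, axis=0)
    return np.all(nnz_row <= alpha * n2) and np.all(nnz_col <= alpha * n1)
```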

II-B CUR Approximation

CUR approximation, also known as skeleton decomposition, is the backbone of our algorithm design, so we briefly review its background for the reader's convenience. CUR approximation is a self-expressive technique that fills the interpretability gap in matrix dimensionality reduction. Given a rank-$r$ matrix $\bm{X}\in\mathbb{R}^{n\times n}$, if we select rows and columns from $\bm{X}$ that span its row and column spaces respectively, then $\bm{X}$ can be reconstructed from these submatrices. This has been affirmed in previous studies:

Theorem 3 ([23]).

Consider row and column index sets $\mathcal{I},\mathcal{J}\subseteq[n]$. Denote the submatrices $\bm{C}=[\bm{X}]_{:,\mathcal{J}}$, $\bm{U}=[\bm{X}]_{\mathcal{I},\mathcal{J}}$, and $\bm{R}=[\bm{X}]_{\mathcal{I},:}$. If $\mathrm{rank}(\bm{U})=\mathrm{rank}(\bm{X})$, then $\bm{X}=\bm{C}\bm{U}^{\dagger}\bm{R}$.

For further details on CUR approximation, including its historical context and a formal proof, we refer the reader to [23] and the references therein. Note that under various row- and column-wise sampling methodologies, the rank condition in Theorem 3 holds with high probability, especially when a sufficient number of rows and columns are sampled. An example is detailed in Theorem 4.

Theorem 4 ([24, Theorem 1.1]).

Let $\bm{X}$ satisfy Assumption 1, and suppose we sample $|\mathcal{I}|=\mathcal{O}(r\log n)$ rows and $|\mathcal{J}|=\mathcal{O}(r\log n)$ columns uniformly at random. Then $\mathrm{rank}(\bm{U})=\mathrm{rank}(\bm{X})$ with probability at least $1-\mathcal{O}(rn^{-2})$.
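Theorems 3 and 4 are easy to verify numerically: for a random rank-$r$ matrix and $\mathcal{O}(r\log n)$ uniformly sampled rows and columns, the reconstruction $\bm{C}\bm{U}^{\dagger}\bm{R}$ matches $\bm{X}$ up to round-off. A quick sketch, with sizes chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 500, 5

# Random rank-r matrix X = W V^T.
W, V = rng.standard_normal((n, r)), rng.standard_normal((n, r))
X = W @ V.T

# Uniformly sample O(r log n) rows and columns (cf. Theorem 4).
m = int(2 * r * np.log(n))
I = rng.choice(n, size=m, replace=False)
J = rng.choice(n, size=m, replace=False)

C, U, R = X[:, J], X[np.ix_(I, J)], X[I, :]

# If rank(U) = rank(X), Theorem 3 gives X = C U^+ R.
X_cur = C @ np.linalg.pinv(U) @ R
print(np.linalg.norm(X - X_cur) / np.linalg.norm(X))  # ~1e-12, i.e., round-off level
```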

Many other sampling methodologies also ensure the recoverability of CUR approximation, for instance, the Discrete Empirical Interpolation Method (DEIM) [25, 26] and leverage score sampling [27]. The interested reader can find more details in [28, Table 1]. Nonetheless, these methods all sample complete rows and columns and are thus substantially different from the CCS model.

III Proposed Algorithm

In this section, we will present a novel non-convex algorithm for solving the proposed Robust CCS Completion problem (1). This iterative algorithm is based on the idea of projected gradient descent where CUR approximation is used for fast low-rank approximation in each iteration. The algorithm is dubbed Robust CUR Completion (RCURC) and summarized in Algorithm 2.

Algorithm 2 Robust CUR Completion (RCURC)
1: Input: $[\bm{Y}=\bm{X}+\bm{S}]_{\Omega_{\bm{R}}\cup\Omega_{\bm{C}}}$: observed data; $\Omega_{\bm{R}},\Omega_{\bm{C}}$: observation locations; $\mathcal{I},\mathcal{J}$: row and column indices that define $\bm{R}$ and $\bm{C}$, respectively; $\eta_R,\eta_C$: step sizes; $r$: target rank; $\varepsilon$: target precision level; $\zeta_0$: initial thresholding value; $\gamma$: thresholding decay parameter.
2: $\bm{X}_0=\bm{0}$;  $k=0$
3: while $e_k>\varepsilon$ do   ▷ $e_k$ is defined in (3)
4:     // Updating sparse component
5:     $\zeta_{k+1}=\gamma^{k}\zeta_{0}$
6:     $[\bm{S}_{k+1}]_{\mathcal{I},:}=\mathcal{T}_{\zeta_{k+1}}[\bm{Y}-\bm{X}_k]_{\mathcal{I},:}$
7:     $[\bm{S}_{k+1}]_{:,\mathcal{J}}=\mathcal{T}_{\zeta_{k+1}}[\bm{Y}-\bm{X}_k]_{:,\mathcal{J}}$
8:     // Updating low-rank component
9:     $\bm{R}_{k+1}=[\bm{X}_k]_{\mathcal{I},:}+\eta_R[\mathcal{P}_{\Omega_{\bm{R}}}(\bm{Y}-\bm{X}_k-\bm{S}_{k+1})]_{\mathcal{I},:}$
10:    $\bm{C}_{k+1}=[\bm{X}_k]_{:,\mathcal{J}}+\eta_C[\mathcal{P}_{\Omega_{\bm{C}}}(\bm{Y}-\bm{X}_k-\bm{S}_{k+1})]_{:,\mathcal{J}}$
11:    $\bm{U}_{k+1}=\mathcal{D}_r([\bm{R}_{k+1}]_{:,\mathcal{J}}\uplus[\bm{C}_{k+1}]_{\mathcal{I},:})$
12:    $[\bm{R}_{k+1}]_{:,\mathcal{J}}=\bm{U}_{k+1}$  and  $[\bm{C}_{k+1}]_{\mathcal{I},:}=\bm{U}_{k+1}$
13:    $\bm{X}_{k+1}=\bm{C}_{k+1}\bm{U}_{k+1}^{\dagger}\bm{R}_{k+1}$   ▷ Do not compute
14:    $k=k+1$
15: Output: $\bm{C}_k,\bm{U}_k,\bm{R}_k$: the CUR components of the estimated low-rank matrix.

We will go over the algorithm step by step in the following paragraphs. For ease of presentation, we start with the low-rank component.

Updating the low-rank component. To utilize the structure of cross-concentrated samples, it is efficient to enforce the low-rank constraint on $\bm{X}$ with the CUR approximation technique. Let $\bm{R}=[\bm{X}]_{\mathcal{I},:}$, $\bm{C}=[\bm{X}]_{:,\mathcal{J}}$, and $\bm{U}=[\bm{X}]_{\mathcal{I},\mathcal{J}}$. Applying gradient descent directly to $\bm{R}$ and $\bm{C}$ gives

$$\begin{split}
\bm{R}_{k+1} &= [\bm{X}_k]_{\mathcal{I},:} + \eta_R\,[\mathcal{P}_{\Omega_{\bm{R}}}(\bm{Y}-\bm{X}_k-\bm{S}_{k+1})]_{\mathcal{I},:}, \\
\bm{C}_{k+1} &= [\bm{X}_k]_{:,\mathcal{J}} + \eta_C\,[\mathcal{P}_{\Omega_{\bm{C}}}(\bm{Y}-\bm{X}_k-\bm{S}_{k+1})]_{:,\mathcal{J}},
\end{split}$$

where $\eta_R$ and $\eta_C$ are the stepsizes. However, the intersection submatrix $\bm{U}$ is more complicated to update, as $\Omega_{\bm{R}}$ and $\Omega_{\bm{C}}$ can overlap. Slightly abusing the notation $\uplus$, we define an operator called the union sum:

$$\big[[\bm{R}_{k+1}]_{:,\mathcal{J}} \uplus [\bm{C}_{k+1}]_{\mathcal{I},:}\big]_{i,j} =
\begin{cases}
[\bm{R}_{k+1}]_{i,j} & \text{if } (i,j)\in \Omega_{\bm{R}} \setminus \Omega_{\bm{C}}; \\
[\bm{C}_{k+1}]_{i,j} & \text{if } (i,j)\in \Omega_{\bm{C}} \setminus \Omega_{\bm{R}}; \\
\frac{\eta_R \eta_C}{\eta_R + \eta_C}\left( \frac{[\bm{R}_{k+1}]_{i,j}}{\eta_R} + \frac{[\bm{C}_{k+1}]_{i,j}}{\eta_C} \right) & \text{if } (i,j)\in \Omega_{\bm{C}} \cap \Omega_{\bm{R}}; \\
0 & \text{otherwise}.
\end{cases}$$

That is, we keep the available value for the non-overlapping entries and take a weighted average over the overlaps, where the weights are determined by the stepsizes used in the updates of $\bm{R}_{k+1}$ and $\bm{C}_{k+1}$. To ensure the rank-$r$ constraint, at least one of $\bm{C}_{k+1}$, $\bm{U}_{k+1}$, or $\bm{R}_{k+1}$ should be rank-$r$. For computational efficiency, we choose to enforce it on the smallest one. Thus,

$$\bm{U}_{k+1} = \mathcal{D}_r\left( [\bm{R}_{k+1}]_{:,\mathcal{J}} \uplus [\bm{C}_{k+1}]_{\mathcal{I},:} \right),$$

where $\mathcal{D}_r$ is the best rank-$r$ approximation operator via truncated SVD. After replacing the intersection part with $\bm{U}_{k+1}$ in the previously updated $\bm{R}_{k+1}$ and $\bm{C}_{k+1}$, we have the new estimate of the low-rank component:

$$\bm{X}_{k+1} = \bm{C}_{k+1}\,\bm{U}_{k+1}^{\dagger}\,\bm{R}_{k+1}. \tag{2}$$

However, (2) is only a conceptual step and should never be computed. In fact, the full matrix $\bm{X}$ is never needed and should not be formed in the algorithm, as updating the corresponding CUR components is sufficient.
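To make the union sum and the truncation step concrete, here is a hedged NumPy sketch of how $\bm{U}_{k+1}$ could be formed. The masks mark $\Omega_{\bm{R}}$ and $\Omega_{\bm{C}}$ restricted to the intersection block $(\mathcal{I},\mathcal{J})$, and the helper names are ours, not the authors':

```python
import numpy as np

def union_sum(R_block, C_block, mask_R, mask_C, eta_R, eta_C):
    """Union sum of the intersection blocks [R_{k+1}]_{:,J} and [C_{k+1}]_{I,:}.

    mask_R / mask_C mark the entries of the intersection covered by Omega_R /
    Omega_C; overlaps are combined with stepsize-dependent weights.
    """
    out = np.zeros_like(R_block)
    only_R = mask_R & ~mask_C
    only_C = mask_C & ~mask_R
    both = mask_R & mask_C
    out[only_R] = R_block[only_R]
    out[only_C] = C_block[only_C]
    w = eta_R * eta_C / (eta_R + eta_C)
    out[both] = w * (R_block[both] / eta_R + C_block[both] / eta_C)
    return out

def rank_r_truncation(M, r):
    """Best rank-r approximation D_r(M) via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]
```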

Updating the sparse component. We detect the outliers and put them into the sparse matrix $\bm{S}$ via the hard-thresholding operator

$$[\mathcal{T}_{\zeta}(\bm{M})]_{i,j} =
\begin{cases}
0 & \text{if } |[\bm{M}]_{i,j}| < \zeta; \\
[\bm{M}]_{i,j} & \text{otherwise.}
\end{cases}$$

Hard thresholding on the residual $\bm{Y}-\bm{X}_k$, paired with iteratively decaying thresholding values

$$\zeta_{k+1} = \gamma^{k}\zeta_{0} \quad\text{with some } \gamma\in(0,1),$$

has shown promising performance for outlier detection in prior art [14, 16, 29, 30]. Notice that we only need to remove outliers located on the selected rows and columns, i.e., $\bm{R}$ and $\bm{C}$, since they are the only components needed to update the low-rank component later. Therefore, for computational efficiency, we should only compute $\bm{X}_k$ on the selected rows and columns to update $\bm{S}_{k+1}$ correspondingly; as noted above, one should never compute the full $\bm{X}_k$ in this algorithm. In particular,

$$[\bm{X}_k]_{\mathcal{I},:} = [\bm{C}_k]_{\mathcal{I},:}\, \bm{U}_k^{\dagger}\, \bm{R}_k \qquad\text{and}\qquad [\bm{X}_k]_{:,\mathcal{J}} = \bm{C}_k\, \bm{U}_k^{\dagger}\, [\bm{R}_k]_{:,\mathcal{J}}.$$
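A minimal sketch of this sparse-component update, assuming the restricted residuals have already been formed from the CUR factors as above (function names are ours):

```python
import numpy as np

def hard_threshold(M, zeta):
    """Entry-wise hard thresholding T_zeta: zero out entries with magnitude below zeta."""
    return np.where(np.abs(M) < zeta, 0.0, M)

def update_sparse(Y_rows, Y_cols, Xk_rows, Xk_cols, zeta0, gamma, k):
    """Outlier estimates [S_{k+1}]_{I,:} and [S_{k+1}]_{:,J} at iteration k."""
    zeta = gamma ** k * zeta0                        # zeta_{k+1} = gamma^k * zeta_0
    S_rows = hard_threshold(Y_rows - Xk_rows, zeta)  # threshold residual on selected rows
    S_cols = hard_threshold(Y_cols - Xk_cols, zeta)  # threshold residual on selected columns
    return S_rows, S_cols
```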

Stopping criterion and stepsizes. We finish the algorithm with the last few pieces. The stopping criterion is $e_k \leq \varepsilon$, where $\varepsilon$ is the targeted accuracy and the computational error is

$$e_k = \frac{\big\langle \mathcal{P}_{\Omega_{\bm{R}}\cup\Omega_{\bm{C}}}(\bm{S}_k+\bm{X}_k-\bm{Y}),\ \bm{S}_k+\bm{X}_k-\bm{Y}\big\rangle}{\big\langle \mathcal{P}_{\Omega_{\bm{R}}\cup\Omega_{\bm{C}}}(\bm{Y}),\ \bm{Y}\big\rangle}. \tag{3}$$

The recommended stepsizes are $\eta_R=\frac{1}{p_R}$ and $\eta_C=\frac{1}{p_C}$, where $p_R$ and $p_C$ are the observation rates of $\Omega_{\bm{R}}$ and $\Omega_{\bm{C}}$, respectively. Smaller stepsizes should be used for larger $\alpha$, i.e., more outliers.
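Putting the pieces together, the following is a hedged end-to-end sketch of Algorithm 2, built on the helpers sketched earlier (`ccs_sample`, `hard_threshold`, `union_sum`, `rank_r_truncation`). It favors readability over efficiency: observations are stored as dense masked arrays rather than sparse formats, the error check only approximates (3) (overlapping entries on the intersection are counted twice), and the full $\bm{X}_k$ is never formed.

```python
import numpy as np

def rcurc(Y_R, Y_C, mask_R, mask_C, I, J, r,
          eta_R, eta_C, zeta0, gamma, eps=1e-6, max_iter=200):
    """Readable sketch of RCURC (Algorithm 2), not an optimized implementation.

    Y_R, Y_C      : observed entries of R = Y[I, :] and C = Y[:, J] (zeros elsewhere).
    mask_R, mask_C: boolean masks of Omega_R and Omega_C on those submatrices.
    Returns the CUR components (C_k, U_k, R_k) of the estimated low-rank matrix.
    """
    n1, n2 = Y_C.shape[0], Y_R.shape[1]
    Rk = np.zeros((len(I), n2))          # current R_k
    Ck = np.zeros((n1, len(J)))          # current C_k
    Uk = np.zeros((len(I), len(J)))      # current U_k

    # Omega_R and Omega_C restricted to the intersection block (I, J).
    mask_R_int, mask_C_int = mask_R[:, J], mask_C[I, :]
    denom = np.sum(Y_R[mask_R] ** 2) + np.sum(Y_C[mask_C] ** 2)

    for k in range(max_iter):
        # X_k restricted to the selected rows/columns; the full X_k is never formed.
        Uk_pinv = np.linalg.pinv(Uk)
        Xk_rows = Ck[I, :] @ Uk_pinv @ Rk            # [X_k]_{I,:}
        Xk_cols = Ck @ Uk_pinv @ Rk[:, J]            # [X_k]_{:,J}

        # Sparse component: decaying hard thresholding on the observed residual.
        zeta = gamma ** k * zeta0
        S_rows = hard_threshold(np.where(mask_R, Y_R - Xk_rows, 0.0), zeta)
        S_cols = hard_threshold(np.where(mask_C, Y_C - Xk_cols, 0.0), zeta)

        # Projected gradient steps on R and C.
        res_R = np.where(mask_R, Y_R - Xk_rows - S_rows, 0.0)
        res_C = np.where(mask_C, Y_C - Xk_cols - S_cols, 0.0)
        Rk = Xk_rows + eta_R * res_R
        Ck = Xk_cols + eta_C * res_C

        # Union sum on the intersection block, then rank-r truncation.
        Uk = rank_r_truncation(
            union_sum(Rk[:, J], Ck[I, :], mask_R_int, mask_C_int, eta_R, eta_C), r)
        Rk[:, J] = Uk
        Ck[I, :] = Uk

        # Stopping rule in the spirit of (3); overlap entries are counted twice here.
        err = (np.sum(res_R[mask_R] ** 2) + np.sum(res_C[mask_C] ** 2)) / denom
        if err < eps:
            break
    return Ck, Uk, Rk
```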

IV Numerical Experiments

In this section, we verify the empirical performance of RCURC on both synthetic and real datasets. All experiments are implemented in MATLAB R2022b and executed on a laptop equipped with an Intel i7-11800H CPU (2.3 GHz, 8 cores) and 16 GB DDR4 RAM.

IV-A Synthetic Datasets

In this simulation, we assess the computational efficiency of our algorithm, RCURC, in addressing the robust CCS completion problem. We construct $\bm{Y}=\bm{X}+\bm{S}$, a $d\times d$ matrix with $d=3000$, where $\bm{X}=\bm{W}\bm{V}^{\top}$ is a randomly generated rank-$r$ matrix. To create the sparse outlier matrix $\bm{S}$, we randomly select an $\alpha$ fraction of the entries to form the support of $\bm{S}$. The values of the non-zero entries are then uniformly sampled from the range $[-c\,\mathbb{E}(|\bm{S}_{ij}|),\ c\,\mathbb{E}(|\bm{S}_{ij}|)]$. To generate the robust CCS completion problems, we set $\alpha=0.2$, $\frac{|\mathcal{I}|}{d}=\frac{|\mathcal{J}|}{d}=30\%$, and $\frac{|\Omega_{\bm{R}}|}{|\mathcal{I}|d}=\frac{|\Omega_{\bm{C}}|}{|\mathcal{J}|d}=25\%$. The results are averaged over 50 runs and reported in Figure 2. Both panels of Figure 2 depict the relationship between the relative error $e_k$ and the computational time of RCURC for varying rank $r$ and outlier amplification factor $c$. It is noteworthy that RCURC consistently achieves nearly linear convergence rates across different scenarios.
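A hedged sketch of how such a synthetic instance can be generated is given below; in particular, we take $c$ times the mean absolute entry of $\bm{X}$ as the outlier scale, which is our reading of the amplification factor $c$, and all names are our own.

```python
import numpy as np

def make_robust_ccs_instance(d=3000, r=5, alpha=0.2, c=10.0, seed=None):
    """Generate a synthetic instance Y = X + S for the robust CCS completion problem."""
    rng = np.random.default_rng(seed)
    W, V = rng.standard_normal((d, r)), rng.standard_normal((d, r))
    X = W @ V.T                                   # random rank-r matrix

    S = np.zeros((d, d))
    n_out = int(alpha * d * d)                    # alpha fraction of entries are corrupted
    idx = rng.choice(d * d, size=n_out, replace=False)
    scale = c * np.mean(np.abs(X))                # our reading of the outlier scale
    S.flat[idx] = rng.uniform(-scale, scale, size=n_out)
    return X + S, X, S
```

The CCS observations can then be drawn from $\bm{Y}$ with, e.g., `ccs_sample(Y, 0.3, 0.3, 0.25)` from the sketch following Procedure 1.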

Figure 2: Empirical convergence of RCURC. Left: $c=10$ and varying $r$. Right: $r=5$ and varying $c$.

IV-B Video Background Subtraction

We have applied our robust CCS model and RCURC to the problem of background separation. Two popular videos, shoppingmall and restaurant, are used as benchmarks. The dimensions of the datasets are documented in Table I.

To evaluate the performance of our method, we apply IRCUR [31] to the entire video dataset and use the background derived from IRCUR as the benchmark ground truth. Subsequently, we randomly select $40\%$ of the rows and $40\%$ of the columns to create subrow and subcolumn matrices, and then observe $30\%$ of the entries of these submatrices. The reconstructed backgrounds obtained using RCURC are illustrated in Figure 3. It is evident that the background recovered by RCURC closely resembles that obtained via IRCUR. Furthermore, we use the Peak Signal-to-Noise Ratio (PSNR) as a quantitative measure of the reconstruction quality of RCURC relative to the results achieved with IRCUR. These findings are detailed in Table I.
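For completeness, a standard PSNR computation against the IRCUR background as the reference looks as follows; the peak value of 255 assumes 8-bit pixel intensities, which is our assumption rather than a detail stated in the paper:

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak Signal-to-Noise Ratio (dB) of an estimated background against a reference."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(estimate, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```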

TABLE I: Video size, runtime, and PSNR. Here $\mathcal{S}$ denotes shoppingmall and $\mathcal{R}$ denotes restaurant.

                  Frame Size        Frame Number   Runtime (s)   PSNR (dB)
  $\mathcal{S}$   $256\times 320$   1000           33.12         41.87
  $\mathcal{R}$   $120\times 160$   3055           31.16         39.84
Figure 3: Video background subtraction results. The first row shows shoppingmall, the second restaurant. Column one displays the ground-truth background obtained via IRCUR from the full datasets. Columns two and three show two observed frames under our CCS model. The last column presents the background recovered by RCURC from the partial videos.

V Concluding Remarks

This paper introduces a novel mathematical model for robust matrix completion with cross-concentrated samples. A highly efficient non-convex algorithm, dubbed RCURC, has been developed for the proposed model. The key techniques are projected gradient descent and CUR approximation. Numerical experiments, on both synthetic and real datasets, show the high potential of the approach. In particular, we consistently observe linear convergence of RCURC.

As for future work, we will study the statistical properties of the proposed robust CCS completion model, such as theoretical sample complexities and outlier tolerance. The recovery guarantee with a linear convergence rate will also be established for RCURC. We will also explore other real-world applications that suit the proposed model.

References

  • [1] E. J. Candès and B. Recht, “Exact matrix completion via convex optimization,” Found. Comput. Math., vol. 9, no. 6, pp. 717–772, 2009.
  • [2] B. Recht, “A simpler approach to matrix completion.” J. Mach. Learn. Res., vol. 12, no. 12, 2011.
  • [3] J. Bennett, C. Elkan, B. Liu, P. Smyth, and D. Tikk, “KDD cup and workshop 2007,” SIGKDD Explor. Newsl., vol. 9, no. 2, p. 51–52, 2007.
  • [4] D. Goldberg, D. Nichols, B. M. Oki, and D. Terry, “Using collaborative filtering to weave an information tapestry,” Commun. ACM, vol. 35, no. 12, pp. 61–70, 1992.
  • [5] P. Chen and D. Suter, “Recovering the missing components in a large noisy low-rank matrix: Application to SfM,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 8, pp. 1051–1063, 2004.
  • [6] Y. Hu, D. Zhang, J. Ye, X. Li, and X. He, “Fast and accurate matrix completion via truncated nuclear norm regularization,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 9, pp. 2117–2130, 2012.
  • [7] J.-F. Cai, T. Wang, and K. Wei, “Fast and provable algorithms for spectrally sparse signal reconstruction via low-rank Hankel matrix completion,” Appl. Comput. Harmon. Anal., vol. 46, no. 1, pp. 94–121, 2019.
  • [8] H. Cai, J.-F. Cai, and J. You, “Structured gradient descent for fast robust low-rank Hankel matrix completion,” SIAM J. Sci. Comput., vol. 45, no. 3, pp. A1172–A1198, 2023.
  • [9] H. Cai, L. Huang, P. Li, and D. Needell, “Matrix completion with cross-concentrated sampling: Bridging uniform sampling and CUR sampling,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 8, pp. 10100–10113, 2023.
  • [10] E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” Journal of the ACM, vol. 58, no. 3, pp. 1–37, 2011.
  • [11] P. Netrapalli, U. Niranjan, S. Sanghavi, A. Anandkumar, and P. Jain, “Non-convex robust PCA,” in Adv. Neural Inf. Process Syst., 2014, pp. 1107–1115.
  • [12] X. Yi, D. Park, Y. Chen, and C. Caramanis, “Fast algorithms for robust PCA via gradient descent,” in Adv. Neural Inf. Process Syst., 2016, pp. 4152–4160.
  • [13] T. Zhang and Y. Yang, “Robust PCA by manifold optimization,” J. Mach. Learn. Res., vol. 19, no. 1, pp. 3101–3139, 2018.
  • [14] H. Cai, J.-F. Cai, and K. Wei, “Accelerated alternating projections for robust principal component analysis,” J. Mach. Learn. Res., vol. 20, no. 1, pp. 685–717, 2019.
  • [15] T. Tong, C. Ma, and Y. Chi, “Accelerating ill-conditioned low-rank matrix estimation via scaled gradient descent,” J. Mach. Learn. Res., vol. 22, no. 150, pp. 1–63, 2021.
  • [16] H. Cai, J. Liu, and W. Yin, “Learned robust PCA: A scalable deep unfolding approach for high-dimensional outlier detection,” in Adv. Neural Inf. Process Syst., vol. 34, 2021, pp. 16977–16989.
  • [17] Y. Chen and Y. Chi, “Robust spectral compressed sensing via structured matrix completion,” IEEE Trans. Inf. Theory, vol. 60, no. 10, pp. 6576–6601, 2014.
  • [18] S. Zhang and M. Wang, “Correction of corrupted columns through fast robust Hankel matrix completion,” IEEE Trans. Signal Process., vol. 67, no. 10, pp. 2580–2594, 2019.
  • [19] H. Cai, J.-F. Cai, T. Wang, and G. Yin, “Accelerated structured alternating projections for robust spectrally sparse signal recovery,” IEEE Trans. Signal Process., vol. 69, pp. 809–821, 2021.
  • [20] H. Cai, K. Hamm, L. Huang, J. Li, and T. Wang, “Rapid robust principal component analysis: CUR accelerated inexact low rank estimation,” IEEE Signal Process. Lett., vol. 28, pp. 116–120, 2021.
  • [21] H. Cai, K. Hamm, L. Huang, and D. Needell, “Robust CUR decomposition: Theory and imaging applications,” SIAM J. Imaging Sci., vol. 14, no. 4, pp. 1472–1503, 2021.
  • [22] K. Hamm, M. Meskini, and H. Cai, “Riemannian CUR decompositions for robust principal component analysis,” in Topological, Algebraic and Geometric Learning Workshops, 2022, pp. 152–160.
  • [23] K. Hamm and L. Huang, “Perspectives on CUR decompositions,” Appl. Comput. Harmon. Anal., vol. 48, no. 3, pp. 1088–1099, 2020.
  • [24] J. Chiu and L. Demanet, “Sublinear randomized algorithms for skeleton decompositions,” SIAM J. Matrix Anal. Appl., vol. 34, no. 3, pp. 1361–1383, 2013.
  • [25] S. Chaturantabut and D. C. Sorensen, “Nonlinear model reduction via discrete empirical interpolation,” SIAM J. Sci. Comput., vol. 32, no. 5, pp. 2737–2764, 2010.
  • [26] D. C. Sorensen and M. Embree, “A DEIM induced CUR factorization,” SIAM J. Sci. Comput., vol. 38, no. 3, pp. A1454–A1482, 2016.
  • [27] P. Drineas, M. W. Mahoney, and S. Muthukrishnan, “Relative-error CUR matrix decompositions,” SIAM J. Matrix Anal. Appl., vol. 30, no. 2, pp. 844–881, 2008.
  • [28] K. Hamm and L. Huang, “Stability of sampling for CUR decompositions,” Found. Data Sci., vol. 2, no. 2, pp. 83–99, 2020.
  • [29] H. Cai, Z. Chao, L. Huang, and D. Needell, “Fast robust tensor principal component analysis via fiber CUR decomposition,” in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021, pp. 189–197.
  • [30] ——, “Robust tensor CUR decompositions: Rapid low-Tucker-rank tensor recovery with sparse corruption,” SIAM J. Imaging Sci., 2024.
  • [31] H. Cai, K. Hamm, L. Huang, J. Li, and T. Wang, “Rapid robust principal component analysis: CUR accelerated inexact low rank estimation,” IEEE Signal Process. Lett., vol. 28, pp. 116–120, 2021.