Fast Updating Truncated SVD for Representation Learning with Sparse Matrices
Abstract
Updating a truncated Singular Value Decomposition (SVD) is crucial in representation learning, especially when dealing with large-scale data matrices that continuously evolve in practical scenarios. Aligning SVD-based models with fast-paced updates becomes increasingly important. Existing methods for updating truncated SVDs employ Rayleigh-Ritz projection procedures, where projection matrices are augmented based on the original singular vectors. However, these methods suffer from inefficiency due to the densification of the update matrix and the application of the projection to all singular vectors. To address these limitations, we introduce a novel method for dynamically approximating the truncated SVD of a sparse and temporally evolving matrix. Our approach leverages sparsity in the orthogonalization process of augmented matrices and utilizes an extended decomposition to independently store projections in the column space of singular vectors. Numerical experiments demonstrate an efficiency improvement of an order of magnitude compared to previous methods. Notably, this improvement is achieved while maintaining precision comparable to existing approaches.
1 Introduction
Truncated Singular Value Decompositions (truncated SVDs) are widely used in various machine learning tasks, including computer vision (Turk & Pentland, 1991), natural language processing (Deerwester et al., 1990; Levy & Goldberg, 2014), recommender systems (Koren et al., 2009) and graph representation learning (Ramasamy & Madhow, 2015; Abu-El-Haija et al., 2021; Cai et al., 2022). Representation learning with a truncated SVD benefits from requiring neither gradients nor hyperparameters, better interpretability derived from its optimal approximation properties, and efficient adaptation to large-scale data using randomized numerical linear algebra techniques (Halko et al., 2011; Ubaru et al., 2015; Musco & Musco, 2015).
However, large-scale data matrices frequently undergo temporal evolution in practical applications. Consequently, it is imperative for a representation learning system that relies on the truncated SVD of these matrices to adjust the representations based on the evolving data. Node representation for a graph, for instance, can be computed with a truncated SVD of the adjacency matrix, where each row in the decomposed matrix corresponds to the representation of a node (Ramasamy & Madhow, 2015). When the adjacency matrix undergoes modifications, it is necessary to update the corresponding representation (Zhu et al., 2018; Zhang et al., 2018; Deng et al., 2023). Similar implementations exist in recommender systems based on the truncated SVD of an evolving and sparse user-item rating matrix (Sarwar et al., 2002; Cremonesi et al., 2010; Du et al., 2015; Nikolakopoulos et al., 2019). This requirement for timely updates emphasizes the significance of keeping SVD-based models in alignment with the ever-evolving data.
In the past twenty-five years, Rayleigh-Ritz projection methods (Zha & Simon, 1999; Vecharynski & Saad, 2014; Yamazaki et al., 2015; Kalantzis et al., 2021) have become the standard approach for updating the truncated SVD, owing to their high accuracy. Concretely, they construct the projection matrix by augmenting the columns of the current singular vectors into an orthonormal matrix that roughly captures the column space of the updated singular vectors. Notably, the augmentation procedure typically densifies the sparse regions of the matrices involved, which makes these algorithms inefficient. Moreover, because the projection process must be applied to all singular vectors, these methods become impractical when updates are frequent or when only a portion of the representations is used in downstream tasks.
In this paper, we present a novel algorithm for fast updating truncated SVDs in sparse matrices.
Contributions. We summarize major contributions of this paper as follows:
1. We study the orthogonalization process of the augmented matrix performed in an inner product space isometric to the column space of the augmented matrix, which exploits the sparsity of the update matrix to reduce the time complexity. In addition, we propose an extended decomposition of the obtained orthogonal basis to efficiently update the singular vectors.

2. We propose an algorithm for approximately updating the rank-$k$ truncated SVD with a theoretical guarantee (Theorem 1), which runs in time proportional to the sparsity of the update when $k$ and the rank of the update matrix are treated as constants. We also propose two variants of the algorithm that can be applied to cases where the rank of the update matrix is large.

3. We perform numerical experiments on updating truncated SVDs for sparse matrices in various real-world scenarios, such as representation learning applications in graphs and recommendations. The results demonstrate that the proposed method achieves a speed improvement of an order of magnitude compared to previous methods, while maintaining comparable accuracy.
2 Background and Notations
The singular value decomposition (SVD) of an $m$-by-$n$ real data matrix $\mathbf{A}$ is denoted by $\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^\top$, where $\mathbf{U}$ and $\mathbf{V}$ are orthogonal and $\mathbf{\Sigma}$ is a rectangular diagonal matrix with non-negative real numbers sorted across the diagonal. The rank-$k$ truncated SVD of $\mathbf{A}$ is obtained by keeping only the $k$ largest singular values and the corresponding columns of $\mathbf{U}$ and $\mathbf{V}$, and is denoted by $\mathbf{A}_k = \mathbf{U}_k\mathbf{\Sigma}_k\mathbf{V}_k^\top$.
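As a concrete reference point, the following minimal Python sketch (with illustrative sizes of our own choosing, not the paper's data) computes a rank-$k$ truncated SVD of a sparse matrix with SciPy; this is the kind of decomposition that subsequently needs to be kept up to date as the matrix evolves.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Rank-k truncated SVD of a sparse matrix (illustrative sizes, not the paper's data).
A = sparse_random(5000, 2000, density=1e-3, format="csr", random_state=0)
k = 32
U_k, S_k, Vt_k = svds(A, k=k)           # SciPy returns singular values in ascending order
order = np.argsort(-S_k)                # reorder to the conventional descending order
U_k, S_k, Vt_k = U_k[:, order], S_k[order], Vt_k[order]
# U_k: (5000, k), S_k: (k,), Vt_k: (k, 2000); A_k = U_k @ diag(S_k) @ Vt_k
```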
In this paper, we consider the problem of updating the truncated SVD of a sparse matrix after adding new rows (or columns) or applying low-rank modifications of weights. Specifically, when a truncated SVD of the matrix $\mathbf{A}$ is available, our goal is to approximate the truncated SVD of the updated matrix $\overline{\mathbf{A}}$, which is obtained from $\mathbf{A}$ by appending new rows (or columns) or by a low-rank modification of its weights.
In several related works, including Zha-Simon's, a key step is the QR decomposition of the matrix $(\mathbf{I} - \mathbf{U}\mathbf{U}^\top)\mathbf{E}$. Specifically, given an orthonormal matrix $\mathbf{U}$ and a sparse matrix $\mathbf{E}$, we refer to $(\mathbf{I} - \mathbf{U}\mathbf{U}^\top)\mathbf{E}$ as the Augmented Matrix with respect to $\mathbf{U}$; its column space is orthogonal to that of $\mathbf{U}$.
Notations. In this paper, we use lowercase letters (e.g., $a$), bold lowercase letters (e.g., $\mathbf{a}$) and bold uppercase letters (e.g., $\mathbf{A}$) to denote scalars, vectors, and matrices, respectively. Moreover, $\mathrm{nnz}(\cdot)$ denotes the number of non-zero entries of a matrix, $\overline{\mathbf{A}}$ denotes the updated matrix, $\mathbf{A}_k$ denotes the rank-$k$ approximation of a matrix, and $\mathrm{range}(\cdot)$ denotes a matrix's column space. A table of frequently used notations in this paper is listed in Appendix B.
2.1 Related Work
Projection Viewpoint. Recent perspectives (Vecharynski & Saad, 2014; Kalantzis et al., 2021) frame the prevailing methodologies for updating the truncated SVD as instances of the Rayleigh-Ritz projection process, which can be characterized by following steps.
1. Augment the current singular vectors $\mathbf{U}_k$ and $\mathbf{V}_k$ with extra columns (or rows), resulting in orthonormal matrices $\widehat{\mathbf{U}}$ and $\widehat{\mathbf{V}}$ whose column spaces approximately capture those of the updated singular vectors.

2. Compute $\mathbf{H} = \widehat{\mathbf{U}}^\top \overline{\mathbf{A}}\, \widehat{\mathbf{V}}$ together with its rank-$k$ truncated SVD $\mathbf{H}_k = \mathbf{F}_k \mathbf{\Theta}_k \mathbf{G}_k^\top$.

3. Approximate the updated truncated SVD by $\overline{\mathbf{A}}_k \approx (\widehat{\mathbf{U}}\mathbf{F}_k)\, \mathbf{\Theta}_k\, (\widehat{\mathbf{V}}\mathbf{G}_k)^\top$.
Zha-Simon’s Scheme. Let $(\mathbf{I} - \mathbf{U}_k\mathbf{U}_k^\top)\mathbf{E}_c = \mathbf{Q}\mathbf{R}$, where $\mathbf{Q}$'s columns are orthonormal and $\mathbf{R}$ is upper trapezoidal. The method of Zha & Simon (1999) updates the truncated SVD after appending new columns $\mathbf{E}_c$ to $\mathbf{A}$ by

$$\overline{\mathbf{A}} = [\mathbf{A} \;\; \mathbf{E}_c] \approx [\mathbf{U}_k\mathbf{\Sigma}_k\mathbf{V}_k^\top \;\; \mathbf{E}_c] = [\mathbf{U}_k \;\; \mathbf{Q}] \begin{bmatrix} \mathbf{\Sigma}_k & \mathbf{U}_k^\top\mathbf{E}_c \\ & \mathbf{R} \end{bmatrix} \begin{bmatrix} \mathbf{V}_k & \\ & \mathbf{I} \end{bmatrix}^\top, \qquad (1)$$

where $\mathbf{F}_k\mathbf{\Theta}_k\mathbf{G}_k^\top \leftarrow \mathrm{SVD}_k\!\left(\begin{bmatrix} \mathbf{\Sigma}_k & \mathbf{U}_k^\top\mathbf{E}_c \\ & \mathbf{R} \end{bmatrix}\right)$. The updated approximate left singular vectors, singular values and right singular vectors are $[\mathbf{U}_k \;\; \mathbf{Q}]\mathbf{F}_k$, $\mathbf{\Theta}_k$ and $\begin{bmatrix} \mathbf{V}_k & \\ & \mathbf{I} \end{bmatrix}\mathbf{G}_k$, respectively.
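For intuition, below is a minimal dense NumPy sketch of this column-append update. It ignores sparsity, and the variable names are our own, so it illustrates Eq. (1) rather than reproducing the efficient implementations developed later in the paper.

```python
import numpy as np

def zha_simon_add_columns(Uk, Sk, Vk, E, k):
    """Append columns E to a matrix whose rank-k truncated SVD is (Uk, Sk, Vk).

    Uk: (m, k) left singular vectors, Sk: (k,) singular values,
    Vk: (n, k) right singular vectors, E: (m, s) new columns.
    """
    C = Uk.T @ E                         # k x s, projection onto the current left space
    Q, R = np.linalg.qr(E - Uk @ C)      # orthonormal basis of the augmented matrix
    s = E.shape[1]

    # Small (k+s) x (k+s) core matrix from Eq. (1)
    M = np.zeros((k + s, k + s))
    M[:k, :k] = np.diag(Sk)
    M[:k, k:] = C
    M[k:, k:] = R

    F, Theta, Gt = np.linalg.svd(M)
    Fk, Thetak, Gk = F[:, :k], Theta[:k], Gt.T[:, :k]

    U_new = np.hstack([Uk, Q]) @ Fk                  # (m, k)
    V_aug = np.zeros((Vk.shape[0] + s, k + s))       # blockdiag(Vk, I_s)
    V_aug[:Vk.shape[0], :k] = Vk
    V_aug[Vk.shape[0]:, k:] = np.eye(s)
    V_new = V_aug @ Gk                               # (n+s, k)
    return U_new, Thetak, V_new
```

This dense form scales poorly precisely because of the QR step on the augmented matrix and the projections over all singular vectors, which is what the machinery of Section 3 is designed to avoid.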
When it comes to a weight update $\overline{\mathbf{A}} = \mathbf{A} + \mathbf{D}\mathbf{E}^\top$, let the QR-decompositions of the augmented matrices be $(\mathbf{I} - \mathbf{U}_k\mathbf{U}_k^\top)\mathbf{D} = \mathbf{Q}_D\mathbf{R}_D$ and $(\mathbf{I} - \mathbf{V}_k\mathbf{V}_k^\top)\mathbf{E} = \mathbf{Q}_E\mathbf{R}_E$; then the update procedure is

$$\overline{\mathbf{A}} = \mathbf{A} + \mathbf{D}\mathbf{E}^\top \approx [\mathbf{U}_k \;\; \mathbf{Q}_D]\left( \begin{bmatrix} \mathbf{\Sigma}_k & \\ & \mathbf{0} \end{bmatrix} + \begin{bmatrix} \mathbf{U}_k^\top\mathbf{D} \\ \mathbf{R}_D \end{bmatrix} \begin{bmatrix} \mathbf{V}_k^\top\mathbf{E} \\ \mathbf{R}_E \end{bmatrix}^\top \right) [\mathbf{V}_k \;\; \mathbf{Q}_E]^\top, \qquad (2)$$

where $\mathbf{F}_k\mathbf{\Theta}_k\mathbf{G}_k^\top \leftarrow \mathrm{SVD}_k\!\left( \begin{bmatrix} \mathbf{\Sigma}_k & \\ & \mathbf{0} \end{bmatrix} + \begin{bmatrix} \mathbf{U}_k^\top\mathbf{D} \\ \mathbf{R}_D \end{bmatrix} \begin{bmatrix} \mathbf{V}_k^\top\mathbf{E} \\ \mathbf{R}_E \end{bmatrix}^\top \right)$. The updated approximate truncated SVD is $([\mathbf{U}_k \;\; \mathbf{Q}_D]\mathbf{F}_k)\,\mathbf{\Theta}_k\,([\mathbf{V}_k \;\; \mathbf{Q}_E]\mathbf{G}_k)^\top$.
Orthogonalization of Augmented Matrix. The above QR decomposition of the augmented matrix and the subsequent compact SVD have high time complexity and occupy the majority of the running time when the granularity of the update is large (i.e., when many columns are added at once). To reduce the time complexity, a series of subsequent methods have been developed based on faster approximations of the orthogonal basis of the augmented matrix. Vecharynski & Saad (2014) used Lanczos vectors produced by a Golub-Kahan-Lanczos (GKL) bidiagonalization procedure to approximate the augmented matrix. Yamazaki et al. (2015) and Ubaru & Saad (2019) replaced the GKL procedure with Randomized Power Iteration (RPI) and graph coarsening, respectively. Kalantzis et al. (2021) suggested determining only the left singular projection subspace, with the augmented matrix set to the identity matrix, and obtaining the right singular vectors by projection.
3 Methodology
We propose an algorithm for fast updating the truncated SVD based on the Rayleigh-Ritz projection paradigm. In Section 3.1, we present a procedure for orthogonalizing the augmented matrix that takes advantage of the sparsity of the updated matrix. This procedure is performed in an inner product space that is isometric to the augmented space. In Section 3.2, we demonstrate how to utilize the resulting orthogonal basis to update the truncated SVD. We also propose an extended decomposition technique to reduce complexity. In Section 3.3, we provide a detailed description of the proposed algorithm and summarize the main findings in the form of theorems. In Section 3.4, we evaluate the time complexity of our algorithm in relation to existing methods.
3.1 Faster Orthogonalization of Augmented Matrix
In this section, we introduce an inner product space that is isometric to the column space of the augmented matrix, where each element is a tuple of a sparse vector and a low-dimensional vector. The orthogonalization process in this space can exploit sparsity and low dimensionality, and the resulting orthonormal basis maps bijectively to an orthonormal basis of the column space of the augmented matrix.
Previous methods perform a QR decomposition of the augmented matrix, $(\mathbf{I} - \mathbf{U}_k\mathbf{U}_k^\top)\mathbf{E} = \mathbf{Q}\mathbf{R}$, to obtain the updated orthogonal matrix. The matrix $\mathbf{Q}$ derived from this procedure is not only orthonormal, but its column space is also orthogonal to the column space of $\mathbf{U}_k$, implying that the matrix $[\mathbf{U}_k \;\; \mathbf{Q}]$ is orthonormal.
Let $(\mathbf{I} - \mathbf{U}_k\mathbf{U}_k^\top)\mathbf{E}$ be the augmented matrix; then each of its columns can be written as a sparse column vector minus a linear combination of the columns of the orthonormal matrix $\mathbf{U}_k$, i.e., $\mathbf{e}_i - \mathbf{U}_k(\mathbf{U}_k^\top\mathbf{e}_i)$. We refer to this form, a sparse vector minus a linear combination of orthonormal vectors, as SV-LCOV, and its definition is as follows:
Definition 1 (SV-LCOV).
Let $\mathbf{U}$ be an arbitrary matrix satisfying $\mathbf{U}^\top\mathbf{U} = \mathbf{I}$, and let $\mathbf{e}$ be a sparse vector. Then, the SV-LCOV form of the vector $(\mathbf{I} - \mathbf{U}\mathbf{U}^\top)\mathbf{e}$ is a tuple denoted as $(\mathbf{e}, \mathbf{c})_{\mathbf{U}}$, where $\mathbf{c} = \mathbf{U}^\top\mathbf{e}$.
Turning the columns of an augmented matrix into SV-LCOV is efficient, because the computation of $\mathbf{c} = \mathbf{U}^\top\mathbf{e}$ can be done by extracting the rows of $\mathbf{U}$ that correspond to the nonzero entries of $\mathbf{e}$ and multiplying them by the corresponding entries of $\mathbf{e}$.
Lemma 1.
For an orthonormal matrix $\mathbf{U}$ with $k$ columns satisfying $\mathbf{U}^\top\mathbf{U} = \mathbf{I}$, turning $(\mathbf{I} - \mathbf{U}\mathbf{U}^\top)\mathbf{e}$ with a sparse vector $\mathbf{e}$ into SV-LCOV can be done in $O\!\big(k \cdot \mathrm{nnz}(\mathbf{e})\big)$ time.
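A small Python sketch of this conversion (hypothetical variable names, SciPy sparse types assumed) illustrates how only the rows of $\mathbf{U}$ indexed by the nonzeros of $\mathbf{e}$ are touched:

```python
import numpy as np
from scipy.sparse import random as sparse_random

# Sketch of Lemma 1 (hypothetical names): form the SV-LCOV tuple (e, c) of
# (I - U U^T) e by touching only the rows of U indexed by the nonzeros of e.
rng = np.random.default_rng(0)
n, k = 100_000, 32
U, _ = np.linalg.qr(rng.standard_normal((n, k)))          # stand-in orthonormal U
e = sparse_random(n, 1, density=1e-4, format="csc", random_state=0)

idx, vals = e.indices, e.data          # nonzero row indices of e and their values
c = U[idx, :].T @ vals                 # c = U^T e using O(k * nnz(e)) work
# (e, c) now encodes (I - U U^T) e without ever forming that dense vector.
```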
Furthermore, we define the scalar multiplication, addition and inner product (i.e., dot product) of SV-LCOV, and show in Lemma 2 that these operations can be computed with low computational complexity when the involved vectors are sparse.
Lemma 2.
For an orthonormal matrix $\mathbf{U}$ with $k$ columns satisfying $\mathbf{U}^\top\mathbf{U} = \mathbf{I}$, the following operations on SV-LCOV can be done in the following time:

- Scalar multiplication: $\alpha(\mathbf{e}, \mathbf{c})_{\mathbf{U}} = (\alpha\mathbf{e}, \alpha\mathbf{c})_{\mathbf{U}}$ in $O\!\big(\mathrm{nnz}(\mathbf{e}) + k\big)$ time.
- Addition: $(\mathbf{e}_1, \mathbf{c}_1)_{\mathbf{U}} + (\mathbf{e}_2, \mathbf{c}_2)_{\mathbf{U}} = (\mathbf{e}_1 + \mathbf{e}_2,\, \mathbf{c}_1 + \mathbf{c}_2)_{\mathbf{U}}$ in $O\!\big(\mathrm{nnz}(\mathbf{e}_1) + \mathrm{nnz}(\mathbf{e}_2) + k\big)$ time.
- Inner product: $\big\langle(\mathbf{e}_1, \mathbf{c}_1)_{\mathbf{U}},\, (\mathbf{e}_2, \mathbf{c}_2)_{\mathbf{U}}\big\rangle = \langle\mathbf{e}_1, \mathbf{e}_2\rangle - \langle\mathbf{c}_1, \mathbf{c}_2\rangle$ in $O\!\big(\mathrm{nnz}(\mathbf{e}_1) + \mathrm{nnz}(\mathbf{e}_2) + k\big)$ time.
With the scalar multiplication, addition and inner product operations of SV-LCOV, we can further study the inner product space of SV-LCOV.
Lemma 3 (Isometry of SV-LCOV).
For an orthonormal matrix $\mathbf{U}$ with $\mathbf{U}^\top\mathbf{U} = \mathbf{I}$, let $\mathbf{E}$ be an arbitrary sparse matrix with columns $\mathbf{e}_1, \dots, \mathbf{e}_s$; then the column space of the augmented matrix $(\mathbf{I} - \mathbf{U}\mathbf{U}^\top)\mathbf{E}$ is isometric to the inner product space of the corresponding SV-LCOV tuples.
The isometry of an inner product space here is a transformation that preserves the inner product, and thereby the angles and lengths in the space. From Lemma 3 we see that, since the dot product is preserved in SV-LCOV, the orthogonalization of an augmented matrix can be carried out entirely as an orthogonalization in this inner product space.
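The identity underlying this isometry can be checked numerically; the short sketch below (dense stand-ins for the sparse columns, our own toy sizes) verifies that the SV-LCOV inner product matches the inner product of the projected vectors:

```python
import numpy as np

# Numerical check of the SV-LCOV inner product identity behind Lemma 3
# (dense stand-ins for the sparse columns, our own toy sizes).
rng = np.random.default_rng(0)
n, k = 2000, 16
U, _ = np.linalg.qr(rng.standard_normal((n, k)))
e1, e2 = rng.standard_normal(n), rng.standard_normal(n)
c1, c2 = U.T @ e1, U.T @ e2

lhs = (e1 - U @ c1) @ (e2 - U @ c2)    # <(I - U U^T) e1, (I - U U^T) e2>
rhs = e1 @ e2 - c1 @ c2                # inner product computed purely in SV-LCOV
assert np.isclose(lhs, rhs)
```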
As an example, the Modified Gram-Schmidt process (i.e. Algorithm 1) is commonly used to compute an orthonormal basis for a given set of vectors. It involves a series of orthogonalization and normalization steps to produce a set of mutually orthogonal vectors that span the same subspace as the original vectors. Numerically, the entire process consists of only three types of operations in Lemma 2, so we can replace them with SV-LCOV operations to obtain a more efficient method (i.e. Algorithm 2).
Lemma 4.
Given an orthonormal matrix $\mathbf{U}$ satisfying $\mathbf{U}^\top\mathbf{U} = \mathbf{I}$, there exists an algorithm that can obtain an orthonormal basis of a set of $s$ vectors in SV-LCOV form in $O\!\big(s^2(\mathrm{nnz}(\mathbf{E}) + k)\big)$ time.
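A compact Python sketch of this idea (dense vectors for clarity, not the paper's Algorithm 2 verbatim) runs Modified Gram-Schmidt entirely on the $(\mathbf{e}, \mathbf{c})$ pairs:

```python
import numpy as np

def mgs_svlcov(U, E):
    """Modified Gram-Schmidt run entirely on SV-LCOV pairs (e, c).

    Dense vectors are used for clarity; a faithful implementation would keep
    each e as a sparse column so every operation costs O(nnz(e) + k).
    """
    cols = [(E[:, j].copy(), U.T @ E[:, j]) for j in range(E.shape[1])]
    basis = []
    for e, c in cols:
        for eb, cb in basis:                        # orthogonalize against previous vectors
            coeff = e @ eb - c @ cb                 # SV-LCOV inner product
            e, c = e - coeff * eb, c - coeff * cb   # SV-LCOV addition / scalar multiplication
        norm = np.sqrt(max(e @ e - c @ c, 0.0))     # SV-LCOV norm
        if norm > 1e-12:
            basis.append((e / norm, c / norm))
    return basis                                    # each (e, c) encodes the column e - U @ c

rng = np.random.default_rng(0)
U = np.linalg.qr(rng.standard_normal((500, 8)))[0]
E = rng.standard_normal((500, 5))
basis = mgs_svlcov(U, E)   # the encoded columns are orthonormal and orthogonal to range(U)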
Approximating the augmented matrix with SV-LCOV. The Gram-Schmidt process is less efficient when the number of augmented columns is large. To this end, Vecharynski & Saad (2014) and Yamazaki et al. (2015) approximated the orthogonal basis of the augmented matrix with a Golub-Kahan-Lanczos bidiagonalization (GKL) procedure (Golub & Kahan, 1965) and a Randomized Power Iteration (RPI) process, respectively. We find that both consist of the three operations in Lemma 2 and can therefore be carried out in SV-LCOV for improved efficiency. Owing to space limitations, we elaborate on the proposed algorithms for approximating the augmented matrix with SV-LCOV in Appendix D.1 and Appendix D.2, respectively.
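For reference, a generic randomized power iteration range finder (in the spirit of Halko et al., 2011) is sketched below with dense operations and parameter names of our own; the SV-LCOV variants in Appendix D would replace these dense products with the tuple operations of Lemma 2.

```python
import numpy as np

def rpi_range_finder(matvec, rmatvec, n, l, t, seed=0):
    """Generic randomized power iteration range finder in the spirit of
    Halko et al. (2011): returns an approximate orthonormal basis for the
    range of the linear operator B given by matvec(x) ~ B @ x and
    rmatvec(y) ~ B.T @ y.  Parameter names are ours, not the paper's."""
    rng = np.random.default_rng(seed)
    Q = np.linalg.qr(np.column_stack([matvec(x) for x in rng.standard_normal((l, n))]))[0]
    for _ in range(t):
        Q = np.linalg.qr(np.column_stack([rmatvec(q) for q in Q.T]))[0]
        Q = np.linalg.qr(np.column_stack([matvec(q) for q in Q.T]))[0]
    return Q

B = np.random.default_rng(1).standard_normal((500, 40))          # toy operator
Q = rpi_range_finder(lambda x: B @ x, lambda y: B.T @ y, n=40, l=10, t=2)
```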
3.2 An Extended Decomposition for Reducing Complexity
Low-dimensional matrix multiplication and sparse matrix addition. Suppose an orthonormal basis of the augmented matrix has been obtained in SV-LCOV form. According to Definition 1, this basis corresponds to the matrix $\mathbf{Q} = \mathbf{E}' - \mathbf{U}_k\mathbf{C}'$, where the $i$-th columns of $\mathbf{E}'$ and $\mathbf{C}'$ are the sparse and low-dimensional parts of the $i$-th basis tuple, respectively. Regarding the final step of the Rayleigh-Ritz process for updating the truncated SVD when adding columns, the update of the left singular vectors is

$$\overline{\mathbf{U}}_k = [\mathbf{U}_k \;\; \mathbf{Q}]\,\mathbf{F}_k = \mathbf{U}_k\big(\mathbf{F}_k^{[:k]} - \mathbf{C}'\mathbf{F}_k^{[k:]}\big) + \mathbf{E}'\mathbf{F}_k^{[k:]}, \qquad (3)$$

where $\mathbf{F}_k^{[:k]}$ denotes the submatrix consisting of the first $k$ rows of $\mathbf{F}_k$, and $\mathbf{F}_k^{[k:]}$ denotes the submatrix consisting of the rows of $\mathbf{F}_k$ starting from the $(k+1)$-th row.
Equation (3) shows that each update of the singular vectors consists of the following operations:

1. A low-dimensional matrix multiplication of $\mathbf{U}_k$ by the $k$-by-$k$ matrix $\mathbf{F}_k^{[:k]} - \mathbf{C}'\mathbf{F}_k^{[k:]}$.

2. A sparse matrix addition of $\mathbf{E}'\mathbf{F}_k^{[k:]}$, whose non-zero entries are confined to the rows in which $\mathbf{E}'$ has non-zero entries. While this term may appear relatively dense in context, treating it as a sparse matrix during the addition preserves the asymptotic complexity.
An extended decomposition for reducing complexity. To reduce the complexity, we write the truncated SVD as a product of the following five matrices:

$$\mathbf{A}_k \approx (\widetilde{\mathbf{U}}\mathbf{U}')\,\mathbf{\Sigma}_k\,(\widetilde{\mathbf{V}}\mathbf{V}')^\top, \qquad (4)$$

with $\widetilde{\mathbf{U}}\mathbf{U}'$ and $\widetilde{\mathbf{V}}\mathbf{V}'$ orthonormal (but not $\widetilde{\mathbf{U}}$ or $\widetilde{\mathbf{V}}$ individually); that is, each set of singular vectors is obtained as the product of two matrices. Initially, $\mathbf{U}'$ and $\mathbf{V}'$ are set to the $k$-by-$k$ identity matrix, and $\widetilde{\mathbf{U}}$, $\widetilde{\mathbf{V}}$ are set to the left and right singular vectors. A similar additional decomposition was used by Brand (2006) for updating the SVD without any truncation and with much higher complexity. With the additional decomposition presented above, the two operations that update the singular vectors can be performed as:
$$\mathbf{U}' \leftarrow \mathbf{U}'\big(\mathbf{F}_k^{[:k]} - \mathbf{C}'\mathbf{F}_k^{[k:]}\big), \qquad \widetilde{\mathbf{U}} \leftarrow \widetilde{\mathbf{U}} + \mathbf{E}'\mathbf{F}_k^{[k:]}\,(\mathbf{U}')^{-1}, \qquad (5)$$

where the inverse is taken of the already-updated $\mathbf{U}'$, so that the product $\widetilde{\mathbf{U}}\mathbf{U}'$ equals the right-hand side of Equation (3); this is a low-dimensional matrix multiplication followed by a sparse matrix addition. The update of the right singular vectors is
$$\overline{\mathbf{V}}_k = \begin{bmatrix} \mathbf{V}_k & \\ & \mathbf{I} \end{bmatrix}\mathbf{G}_k = \begin{bmatrix} \mathbf{V}_k\mathbf{G}_k^{[:k]} \\ \mathbf{G}_k^{[k:]} \end{bmatrix}, \qquad (6)$$

where $\mathbf{G}_k^{[:k]}$ denotes the submatrix consisting of the first $k$ rows of $\mathbf{G}_k$ and $\mathbf{G}_k^{[k:]}$ denotes the submatrix consisting of the rows of $\mathbf{G}_k$ starting from the $(k+1)$-th row. With $\mathbf{V}_k = \widetilde{\mathbf{V}}\mathbf{V}'$, this can be performed as

$$\mathbf{V}' \leftarrow \mathbf{V}'\mathbf{G}_k^{[:k]}, \qquad \widetilde{\mathbf{V}} \leftarrow \begin{bmatrix} \widetilde{\mathbf{V}} \\ \mathbf{G}_k^{[k:]}\,(\mathbf{V}')^{-1} \end{bmatrix}, \qquad (7)$$

again with the inverse taken of the already-updated $\mathbf{V}'$.
Above we have only shown the scenario of adding columns, but a similar approach can be used to add rows and to modify weights. This extended decomposition reduces the time complexity of updating the left and right singular vectors, so the method can be deployed on large-scale matrices with large $m$ and $n$. In practical applications, the extended decomposition might pose numerical issues when the condition number of the $k$-by-$k$ matrix is large, even though in most cases these matrices have relatively low condition numbers. One solution to this issue is to reset the $k$-by-$k$ matrix to the identity matrix.
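To make the bookkeeping concrete, the toy sketch below (our own notation and a simplified update form, not Algorithm 3 itself) stores a factor as the product of a large matrix and a small $k$-by-$k$ matrix, applies an update of the form $\mathbf{U}\mathbf{W}_1 + \mathbf{S}$ by touching only the small matrix and the few rows hit by the sparse term, and exposes the inverse whose conditioning motivates the reset described above.

```python
import numpy as np

class ExtendedFactor:
    """Store singular vectors as the product B @ W of a large m x k matrix B
    and a small k x k matrix W, so that an update of the form
    U_new = U @ W1 + S (W1 small, S row-sparse) touches only W and the few
    rows of B hit by S."""
    def __init__(self, U0):
        self.B = U0.copy()                  # large factor, touched only by sparse row additions
        self.W = np.eye(U0.shape[1])        # small factor, touched by k x k multiplications

    def materialize(self):
        return self.B @ self.W              # the represented singular vectors

    def update(self, W1, rows, vals):
        """Apply U <- U @ W1 + S, where S has nonzeros only in `rows`."""
        self.W = self.W @ W1                # low-dimensional matrix multiplication
        W_inv = np.linalg.inv(self.W)       # an ill-conditioned W would motivate a reset to I
        self.B[rows] += vals @ W_inv        # sparse (row-wise) matrix addition

# Sanity check on random data
rng = np.random.default_rng(0)
m, k = 1000, 8
U0 = np.linalg.qr(rng.standard_normal((m, k)))[0]
fac = ExtendedFactor(U0)
W1 = rng.standard_normal((k, k))
rows = rng.choice(m, size=20, replace=False)
vals = rng.standard_normal((20, k))
S = np.zeros((m, k)); S[rows] = vals
fac.update(W1, rows, vals)
assert np.allclose(fac.materialize(), U0 @ W1 + S)
```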
3.3 Main Result
Algorithm 3 and Algorithm 4 are the pseudocodes of the proposed algorithm for adding columns and modifying weights, respectively.
The time complexity of the proposed algorithm in this paper is summarized in Theorem 1 and a detailed examination of the time complexity is provided in Appendix F.
Definition 2.
If a modification is applied to the matrix $\mathbf{A}$, and the (approximate) truncated SVD maintained for $\mathbf{A}$ is updated to the rank-$k$ truncated SVD of the matrix obtained by applying this modification to the maintained approximation, we call this maintaining an approximate rank-$k$ truncated SVD of $\mathbf{A}$.
Theorem 1 (Main result).
There is a data structure for maintaining an approximate rank-$k$ truncated SVD of $\mathbf{A}$ that supports the following operations, each in time independent of $m$ and $n$ for updates (the exact bounds are derived in Appendix F):

- Add rows: update the maintained decomposition after new sparse rows are appended to $\mathbf{A}$.
- Add columns: update the maintained decomposition after new sparse columns are appended to $\mathbf{A}$.
- Update weights: update the maintained decomposition after a low-rank modification of the weights of $\mathbf{A}$.
- Query: return a single row of the left (or right) singular vectors together with the singular values.

Here the update rank refers to the rank of the update matrix.
The proposed method can theoretically produce the same output as the method of Zha & Simon (1999) with a much lower time complexity.
The proposed variants with the approximate augmented space. To address updates with coarser granularity (i.e., a larger update rank), we propose two variants of the algorithm based on approximate augmented spaces computed with GKL and RPI (see Section 3.1) in SV-LCOV, denoted with the suffixes -GKL and -RPI, respectively. The proposed variants produce theoretically the same output as Vecharynski & Saad (2014) and Yamazaki et al. (2015), respectively. We elaborate on the proposed variants in Appendix D.3.
3.4 Time complexity comparing to previous methods
| Algorithm | Asymptotic complexity |
|---|---|
| Kalantzis et al. (2021) | |
| Zha & Simon (1999) | |
| Vecharynski & Saad (2014) | |
| Yamazaki et al. (2015) | |
| ours | |
| ours-GKL | |
| ours-RPI | |
Table 1 presents a comparison of the time complexity of our proposed algorithms with previous algorithms when updating rows. To simplify the presentation, the big-$O$ notation is omitted. The additional parameters are the rank of the approximation used in GKL and RPI, and the number of iterations performed by RPI.
Our method achieves an update time complexity with no $m$ or $n$ terms when $k$ and the update rank are treated as constants (i.e., the proposed algorithm runs in update-sparsity time). This is because SV-LCOV is used to obtain the orthogonal basis, eliminating the $m$ and $n$ terms when processing the augmented matrix. Additionally, our extended decomposition avoids the $m$ and $n$ terms when restoring from SV-LCOV and eliminates the projection over all rows of the singular vectors. Although the time complexity of querying a single representation increases, this trade-off remains acceptable given the optimizations above.
For updates with coarser granularity (i.e., a larger update rank), the proposed approach of approximating the augmented space with GKL or RPI in SV-LCOV removes the quadratic dependence on the update rank from the time complexity, making the proposed algorithm applicable to coarse-granularity updates as well.
4 Numerical Experiment
In this section, we conduct experimental evaluations of the update process for the truncated SVD on sparse matrices. Subsequently, we assess the performance of the proposed method by applying it to two downstream tasks: (1) link prediction, where we utilize node representations learned from an evolving adjacency matrix, and (2) collaborative filtering, where we utilize user/item representations learned from an evolving user-item matrix.
4.1 Experimental Description
Baselines. We evaluate the proposed algorithm and its variants against existing methods, including Zha & Simon (1999), Vecharynski & Saad (2014) and Yamazaki et al. (2015). Throughout the experiments, the rank of the approximate augmented space is set following previous settings (Vecharynski & Saad, 2014), and the number of RPI iterations is likewise fixed across runs. In the methods of Vecharynski & Saad (2014) and Yamazaki et al. (2015), there may be differences in running time between 1) directly constructing the augmented matrix, and 2) accessing the required matrix-vector products on demand without constructing it explicitly. We tested both implementations and report the minimum running time.
Tasks and Settings. For the adjacency matrix, we initialize the SVD with the first 50% of the rows and columns, and for the user-item matrix, we initialize the SVD with the first 50% of the columns.
The experiments involve batch updates: for the adjacency matrices, a fixed batch of rows and columns is added at each step, and for the user-item matrix, a fixed batch of columns is added per update.
• Link Prediction aims at predicting whether there is a link between a given pair of nodes. Each node's representation is obtained from a truncated SVD of the adjacency matrix. At inference time, we first query the representations of the node pair from the truncated SVD; a score, the inner product between the two representations, is then used to make the prediction, and for an undirected graph we take the maximum value over both directions (a toy sketch of this scoring follows the list). We sort the scores on the test set and label the node pairs with high scores as positive. Following prior research, we create the training set by randomly removing 30% of the edges from the original graph; the node pairs connected by the removed edges, together with an equal number of unconnected node pairs, form the testing set.

• Collaborative Filtering in recommender systems is a technique that uses a small sample of user preferences to predict likes and dislikes for a wide range of products. In this paper, we focus on predicting entries of the normalized user-item matrix. A truncated SVD is used to learn a representation for each user and item, and an entry is predicted by the inner product between the corresponding user and item representations. The matrix is normalized by subtracting the average rating of each item, and its observed entries are split into training and testing sets at a fixed ratio.
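A toy sketch of the scoring step for both tasks (our simplified setup and embedding scaling, not the exact experimental pipeline) is given below.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Toy scoring sketch (our simplified setup, not the exact experimental pipeline).
A = sparse_random(2000, 2000, density=1e-3, format="csr", random_state=0)
U, S, Vt = svds(A, k=16)
src = U * S            # one common choice: left vectors scaled by the singular values
dst = Vt.T             # right singular vectors

def link_score(i, j):
    # Undirected link prediction: take the larger of the two directions.
    return max(src[i] @ dst[j], src[j] @ dst[i])

# Collaborative filtering uses the same inner-product form on a user-item matrix:
# the predicted (normalized) rating of user u for item v would be src[u] @ dst[v].
print(link_score(10, 42))
```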
Datasets. The link prediction experiments are conducted on three publicly available graph datasets: Slashdot (Leskovec et al., 2009), Flickr (Tang & Liu, 2009), and Epinions (Richardson et al., 2003). Each social network consists of nodes representing users and edges indicating social relationships between them. In our setup, all graphs are undirected.
For the collaborative filtering task, we use data from MovieLens (Harper & Konstan, 2015). The MovieLens 25M dataset contains more than 25 million ratings across tens of thousands of movies. According to the dataset's selection mechanism, every selected user rated at least 20 movies, ensuring the dataset's validity and a moderate level of density. Ratings take values on a fixed numerical scale.
[Figures 1 and 2: runtime of each method on the three link prediction datasets under different values of $k$ (Fig. 1) and under different batch sizes (Fig. 2).]
4.2 Efficiency Study
To study the efficiency of the proposed algorithm, we evaluate the proposed method and our optimized GKL and RPI methods in the context of link prediction and collaborative filtering tasks.
Experiments are conducted on the undirected graphs of Slashdot (Leskovec et al., 2009), Flickr (Tang & Liu, 2009), and Epinions (Richardson et al., 2003) to investigate link prediction. The obtained results are presented in Table 2, Fig. 1 and Fig. 2. Our evaluation metrics for this efficiency study are runtime and Average Precision (AP), the fraction of correctly predicted edges among all predicted edges. In addition, we report the Frobenius norm between the maintained low-rank approximation and the training matrix. The results demonstrate that, across multiple datasets, the proposed method and its variants achieve significant efficiency gains without compromising AP, substantially reducing time consumption.
Table 3 reports the results of four experiments conducted for the collaborative filtering task: batch updates and streaming updates, each with $k=16$ and $k=64$. Our evaluation metrics are runtime and mean squared error (MSE). The results show that our method significantly reduces runtime while maintaining accuracy. Existing methods struggle with updating large and sparse datasets, especially in the streaming setting, making real-time updates challenging; the methodology employed in our study decreases the runtime by a large factor.
| Method | Slashdot Norm | Slashdot AP | Flickr Norm | Flickr AP | Epinions Norm | Epinions AP |
|---|---|---|---|---|---|---|
| Zha & Simon (1999) | 792.11 | 93.40% | 2079.23 | 95.16% | 1370.26 | 95.62% |
| Vecharynski & Saad (2014) | 792.01 | 93.56% | 2079.63 | 95.11% | 1370.64 | 95.70% |
| Yamazaki et al. (2015) | 792.11 | 93.52% | 2079.28 | 95.14% | 1370.29 | 95.61% |
| ours | 792.11 | 93.41% | 2079.23 | 95.16% | 1370.26 | 95.62% |
| ours-GKL | 792.01 | 93.56% | 2079.63 | 95.11% | 1370.64 | 95.70% |
| ours-RPI | 792.11 | 93.50% | 2079.28 | 95.14% | 1370.29 | 95.61% |
| $k$ | Method | Batch Runtime | Batch MSE | Streaming Runtime | Streaming MSE |
|---|---|---|---|---|---|
| 16 | Zha & Simon (1999) | 192s | 0.8616 | 626s | 0.8616 |
| 16 | Vecharynski & Saad (2014) | 323s | 0.8646 | 2529s | 0.8647 |
| 16 | Yamazaki et al. (2015) | 278s | 0.8618 | 352s | 0.8619 |
| 16 | ours | 23s | 0.8616 | 35s | 0.8616 |
| 16 | ours-GKL | 18s | 0.8646 | 48s | 0.8647 |
| 16 | ours-RPI | 45s | 0.8618 | 43s | 0.8619 |
| 64 | Zha & Simon (1999) | 343s | 0.8526 | 2410s | 0.8527 |
| 64 | Vecharynski & Saad (2014) | 124s | 0.8572 | 3786s | 0.8568 |
| 64 | Yamazaki et al. (2015) | 313s | 0.8527 | 758s | 0.8528 |
| 64 | ours | 49s | 0.8526 | 135s | 0.8527 |
| 64 | ours-GKL | 45s | 0.8572 | 147s | 0.8568 |
| 64 | ours-RPI | 98s | 0.8527 | 141s | 0.8528 |
4.3 Varying $k$ and the Batch Size
We conduct link prediction experiments on the three datasets under several choices of $k$, aiming to explore the selection of variants and approaches in various scenarios. The results in Fig. 1 show that our optimized methods are all significantly faster than the original ones for the different choices of $k$. All methods become somewhat slower as $k$ increases, consistent with the asymptotic complexities in Table 1.
We also evaluate the performance of each method for different batch sizes. As shown in Fig. 2, our methods (ours-GKL and ours-RPI) outperform the others when a large number of entries are updated simultaneously, while for small batch sizes our three methods exhibit similar runtime. These experiments provide guidance for selecting the most suitable variant for a specific context; tasks with different requirements on a single update should choose the most appropriate method accordingly.
5 Conclusion
In conclusion, we introduce a novel algorithm, along with two variants, for updating truncated SVDs of sparse matrices. Numerical experiments show a substantial speed boost of our method compared to previous approaches, while maintaining comparable accuracy.
References
- Abu-El-Haija et al. (2021) Sami Abu-El-Haija, Hesham Mostafa, Marcel Nassar, Valentino Crespi, Greg Ver Steeg, and Aram Galstyan. Implicit svd for graph representation learning. Advances in Neural Information Processing Systems, 34:8419–8431, 2021.
- Brand (2006) Matthew Brand. Fast low-rank modifications of the thin singular value decomposition. Linear algebra and its applications, 415(1):20–30, 2006.
- Cai et al. (2022) Xuheng Cai, Chao Huang, Lianghao Xia, and Xubin Ren. Lightgcl: Simple yet effective graph contrastive learning for recommendation. In The Eleventh International Conference on Learning Representations, 2022.
- Cremonesi et al. (2010) Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. Performance of recommender algorithms on top-n recommendation tasks. In Proceedings of the fourth ACM conference on Recommender systems, pp. 39–46, 2010.
- Deerwester et al. (1990) Scott Deerwester, Susan T Dumais, George W Furnas, Thomas K Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American society for information science, 41(6):391–407, 1990.
- Deng et al. (2023) Haoran Deng, Yang Yang, Jiahe Li, Haoyang Cai, Shiliang Pu, and Weihao Jiang. Accelerating dynamic network embedding with billions of parameter updates to milliseconds. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 414–425, 2023.
- Du et al. (2015) Nan Du, Yichen Wang, Niao He, Jimeng Sun, and Le Song. Time-sensitive recommendation from recurrent user activities. Advances in neural information processing systems, 28, 2015.
- Golub & Kahan (1965) Gene Golub and William Kahan. Calculating the singular values and pseudo-inverse of a matrix. Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis, 2(2):205–224, 1965.
- Halko et al. (2011) Nathan Halko, Per-Gunnar Martinsson, and Joel A Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review, 53(2):217–288, 2011.
- Harper & Konstan (2015) F. Maxwell Harper and Joseph A. Konstan. The movielens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4):19:1–19:19, 2015. doi: 10.1145/2827872.
- Kalantzis et al. (2021) Vasileios Kalantzis, Georgios Kollias, Shashanka Ubaru, Athanasios N Nikolakopoulos, Lior Horesh, and Kenneth Clarkson. Projection techniques to update the truncated svd of evolving matrices with applications. In International Conference on Machine Learning, pp. 5236–5246. PMLR, 2021.
- Koren et al. (2009) Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, 2009.
- Leskovec et al. (2009) J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29–123, 2009.
- Levy & Goldberg (2014) Omer Levy and Yoav Goldberg. Neural word embedding as implicit matrix factorization. Advances in neural information processing systems, 27, 2014.
- Musco & Musco (2015) Cameron Musco and Christopher Musco. Randomized block krylov methods for stronger and faster approximate singular value decomposition. Advances in neural information processing systems, 28, 2015.
- Nikolakopoulos et al. (2019) Athanasios N Nikolakopoulos, Vassilis Kalantzis, Efstratios Gallopoulos, and John D Garofalakis. Eigenrec: generalizing puresvd for effective and efficient top-n recommendations. Knowledge and Information Systems, 58:59–81, 2019.
- Ramasamy & Madhow (2015) Dinesh Ramasamy and Upamanyu Madhow. Compressive spectral embedding: sidestepping the svd. Advances in neural information processing systems, 28, 2015.
- Richardson et al. (2003) M. Richardson, R. Agrawal, and P. Domingos. Trust management for the semantic web. In ISWC, 2003.
- Sarwar et al. (2002) Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Incremental singular value decomposition algorithms for highly scalable recommender systems. In Fifth international conference on computer and information science, volume 1, pp. 27–8. Citeseer, 2002.
- Tang & Liu (2009) Lei Tang and Huan Liu. Relational learning via latent social dimensions. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 817–826, 2009.
- Turk & Pentland (1991) Matthew Turk and Alex Pentland. Eigenfaces for recognition. Journal of cognitive neuroscience, 3(1):71–86, 1991.
- Ubaru & Saad (2019) Shashanka Ubaru and Yousef Saad. Sampling and multilevel coarsening algorithms for fast matrix approximations. Numerical Linear Algebra with Applications, 26(3):e2234, 2019.
- Ubaru et al. (2015) Shashanka Ubaru, Arya Mazumdar, and Yousef Saad. Low rank approximation using error correcting coding matrices. In International Conference on Machine Learning, pp. 702–710. PMLR, 2015.
- Vecharynski & Saad (2014) Eugene Vecharynski and Yousef Saad. Fast updating algorithms for latent semantic indexing. SIAM Journal on Matrix Analysis and Applications, 35(3):1105–1131, 2014.
- Yamazaki et al. (2015) Ichitaro Yamazaki, Jakub Kurzak, Piotr Luszczek, and Jack Dongarra. Randomized algorithms to update partial singular value decomposition on a hybrid cpu/gpu cluster. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–12, 2015.
- Zha & Simon (1999) Hongyuan Zha and Horst D Simon. On updating problems in latent semantic indexing. SIAM Journal on Scientific Computing, 21(2):782–791, 1999.
- Zhang et al. (2018) Ziwei Zhang, Peng Cui, Jian Pei, Xiao Wang, and Wenwu Zhu. Timers: Error-bounded svd restart on dynamic networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
- Zhu et al. (2018) Dingyuan Zhu, Peng Cui, Ziwei Zhang, Jian Pei, and Wenwu Zhu. High-order proximity preserved embedding for dynamic networks. IEEE Transactions on Knowledge and Data Engineering, 30(11):2134–2144, 2018.
Appendix A Reproducibility
We make public our experimental code, which includes python implementations of the proposed and baseline methods, as well as all the datasets used and their preprocessing. The code is available at https://anonymous.4open.science/r/ISVDforICLR2024-7517.
Appendix B Notations
Frequently used notations throughout the paper are summarized in Table 4.
| Notation | Description |
|---|---|
| | The data matrix |
| | The update matrix (new columns / rows) |
| | The update matrix (low-rank update of weights) |
| | The identity matrix |
| | The left singular vectors |
| | The singular values |
| | The right singular vectors |
| | The (left) augmented matrix |
| | The (right) augmented matrix |
| | The number of rows and columns of the data matrix |
| | The rank of the truncated SVD |
| | The rank of the update matrix |
| | The rank of the approximate augmented space |
| | The number of Random Power Iterations performed |
| | A matrix of a given rank |
| | The transpose of a matrix |
| | A rank-$k$ approximation of a matrix |
| | The updated matrix |
| | The norm of a vector |
| | The dot product between two vectors |
| | A rank-$k$ truncated SVD of a matrix |
Appendix C Omitted Proofs
Proof of Lemma 2
Proof.
We prove each of the three operations as follows.

- Scalar multiplication. It takes $O(\mathrm{nnz}(\mathbf{e}))$ time to compute $\alpha\mathbf{e}$ and $O(k)$ time to compute $\alpha\mathbf{c}$, respectively. Therefore the overall time complexity for scalar multiplication is $O(\mathrm{nnz}(\mathbf{e}) + k)$.

- Addition. It takes $O(\mathrm{nnz}(\mathbf{e}_1) + \mathrm{nnz}(\mathbf{e}_2))$ time to compute $\mathbf{e}_1 + \mathbf{e}_2$ and $O(k)$ time to compute $\mathbf{c}_1 + \mathbf{c}_2$, respectively. Therefore the overall time complexity for addition is $O(\mathrm{nnz}(\mathbf{e}_1) + \mathrm{nnz}(\mathbf{e}_2) + k)$.

- Inner product. It takes $O(\mathrm{nnz}(\mathbf{e}_1) + \mathrm{nnz}(\mathbf{e}_2))$ time to compute $\langle\mathbf{e}_1, \mathbf{e}_2\rangle$ and $O(k)$ time to compute $\langle\mathbf{c}_1, \mathbf{c}_2\rangle$, respectively. Therefore the overall time complexity for the inner product is $O(\mathrm{nnz}(\mathbf{e}_1) + \mathrm{nnz}(\mathbf{e}_2) + k)$.
∎
Proof of Lemma 3
Proof.
Each of the three operations on SV-LCOV corresponds to an operation in the original space as follows.

- Scalar multiplication:
$$\alpha\big((\mathbf{I} - \mathbf{U}\mathbf{U}^\top)\mathbf{e}\big) = (\mathbf{I} - \mathbf{U}\mathbf{U}^\top)(\alpha\mathbf{e}), \qquad \mathbf{U}^\top(\alpha\mathbf{e}) = \alpha\mathbf{c}. \qquad (8)$$

- Addition:
$$(\mathbf{I} - \mathbf{U}\mathbf{U}^\top)\mathbf{e}_1 + (\mathbf{I} - \mathbf{U}\mathbf{U}^\top)\mathbf{e}_2 = (\mathbf{I} - \mathbf{U}\mathbf{U}^\top)(\mathbf{e}_1 + \mathbf{e}_2), \qquad \mathbf{U}^\top(\mathbf{e}_1 + \mathbf{e}_2) = \mathbf{c}_1 + \mathbf{c}_2. \qquad (9)$$

- Inner product:
$$\big\langle(\mathbf{I} - \mathbf{U}\mathbf{U}^\top)\mathbf{e}_1,\, (\mathbf{I} - \mathbf{U}\mathbf{U}^\top)\mathbf{e}_2\big\rangle = \mathbf{e}_1^\top(\mathbf{I} - \mathbf{U}\mathbf{U}^\top)\mathbf{e}_2 = \langle\mathbf{e}_1, \mathbf{e}_2\rangle - \langle\mathbf{c}_1, \mathbf{c}_2\rangle. \qquad (10)$$
∎
Proof of Lemma 4
Proof.
An orthogonal basis can be obtained by Algorithm 2, and we next analyze the time complexity of Algorithm 2.
Because the number of non-zero entries of any linear combination of a set of sparse vectors drawn from the columns of $\mathbf{E}$ is at most $\mathrm{nnz}(\mathbf{E})$, the number of non-zero entries of the sparse vector in each SV-LCOV tuple during the execution of Algorithm 2 is at most $\mathrm{nnz}(\mathbf{E})$.
There are a total of $O(s^2)$ projections and subtractions of SV-LCOV tuples in Algorithm 2, and by Lemma 2 the overall time complexity is $O\!\big(s^2(\mathrm{nnz}(\mathbf{E}) + k)\big)$.
∎
Proof of Theorem 1
Appendix D Approximating the Augmented Space
D.1 Approximating the Augmented Space via Golub-Kahan-Lanczos Process
A step-by-step description. Specifically, the quantity in Line 4 of Algorithm 7 can be viewed as a linear combination of SV-LCOV tuples, computed with Lines 4 and 5 of Algorithm 6. The norm in Line 5 of Algorithm 7 is the length of an SV-LCOV tuple and can be expressed through the inner product in Line 6 of Algorithm 6. Line 6 of Algorithm 7 is a scalar multiplication, corresponding to Lines 7 and 8 of Algorithm 6. The quantity in Line 7 of Algorithm 7 can be recognized as an inner product between SV-LCOV tuples, as demonstrated in Line 9 of Algorithm 6.
Complexity analysis of Algorithm 6. Line 4, 5 run in time. Line 6 run in time. Line 7, 8 run in time. Line 9 run in time. Line 10 run in time. Line 11, 12 run in time. Line 14, 15, 16 run in time.
The overall time complexity of Algorithm 6 is time.
D.2 Approximating the Augmented Space via Random Power Iteration Process
D.3 The Proposed Updating Truncated SVD with Approximate Augmented Matrix
Appendix E Experiments
E.1 Runtime of Each Step
We present a runtime analysis of each component of the experiments, focusing on the Slashdot dataset. Since all the baselines, as well as the proposed method, can be viewed as instances of the three-step procedure outlined in Section 2.1, we illustrate the runtime of each step. Specifically, we break the algorithm into three parts: the stage prior to the compact SVD, the compact SVD itself, and the stage following the compact SVD. The experimental results are depicted in Figure 3.
[Figure 3: runtime of each of the three steps for each method on the Slashdot dataset.]
Compared to the original methods, the proposed methods take approximately the same amount of time to execute the compact SVD. This similarity arises because both the proposed and original methods decompose a matrix of the same shape. Furthermore, as the update granularity becomes finer, the proportion of total time consumed by step 2 (the compact SVD) increases. This trend indicates that the efficiency improvement is concentrated in steps 1 and 3 of the algorithm.
The proposed method mainly reduces the time spent in step 1 and step 3, benefiting respectively from the SV-LCOV structure described in Section 3.1 and the extended decomposition in Section 3.2. The optimization of the first step is more pronounced when the batch is smaller, because in the SV-LCOV framework the sparse vectors then have fewer non-zero rows, a consequence of the update matrix having fewer columns; this yields larger efficiency gains from the sparse addition and subtraction operations.
Furthermore, when the batch is larger, using the proposed variants, which approximate the basis of the augmented space instead of computing it exactly, tends to yield greater efficiency.
E.2 Experiments on Synthetic Matrices with Different Sparsity
We conduct experiments using synthetic matrices with varying sparsity levels (i.e., varying numbers of non-zero entries) to investigate the influence of sparsity on the efficiency of updating the SVD. Specifically, we generate several random sparse matrices of fixed size 100,000 × 100,000, with the number of non-zero entries ranging from 1,000,000 to 1,000,000,000 (i.e., densities from $10^{-4}$ to $10^{-1}$). We initialize a truncated SVD using the matrix formed by the first 50% of the columns. The remaining 50% of the columns are then inserted incrementally in batches of equal size, and we vary the number of columns added per batch. The experimental results are shown in Figure 4.
[Figure 4: runtime versus matrix density for several batch sizes.]
Experimental results under various batch sizes show that: 1) the proposed method exhibits more substantial efficiency improvements when the matrix is relatively sparse; and 2) when the rank of the update matrix is higher, the variant using Lanczos vectors to approximate the augmented space achieves faster performance.
E.3 Effectiveness
We report the Frobenius norm between the maintained approximation and the original matrix on the Slashdot and MovieLens 25M datasets in Table 5. Our proposed approach maintains precision comparable to the baselines.
| Method | Slashdot | | | MovieLens 25M | | |
|---|---|---|---|---|---|---|
| Zha & Simon (1999) | 784.48 | 792.16 | 792.11 | 4043.86 | 4044.23 | 4044.40 |
| Vecharynski & Saad (2014) | 802.61 | 796.26 | 792.01 | 4073.41 | 4111.66 | 4050.53 |
| Yamazaki et al. (2015) | 796.97 | 795.94 | 792.11 | 4098.87 | 4098.62 | 4047.87 |
| ours | 784.48 | 792.16 | 792.11 | 4043.86 | 4044.23 | 4044.40 |
| ours-GKL | 802.85 | 795.65 | 792.01 | 4076.61 | 4110.71 | 4050.36 |
| ours-RPI | 796.65 | 795.19 | 792.11 | 4099.11 | 4099.09 | 4047.20 |
Appendix F Time Complexity Analysis
A line-by-line analysis of the time complexity of Algorithms 3, 4, 9 and 10 is given below. Note that, due to the extended decomposition, the time complexity of turning the augmented matrix into SV-LCOV differs from the bound stated in Lemma 1.
Asymptotic complexity (big- notation is omitted) | |
---|---|
Line 1 | |
Line 2 | |
Line 3 | |
Line 4 | |
Line 5 | |
Line 6 | |
Line 7 | |
Overall |
Asymptotic complexity (big- notation is omitted) | |
---|---|
Line 1 | |
Line 2 | |
Line 3 | |
Line 4 | |
Line 5 | |
Line 6 | |
Line 7 | |
Line 8 | |
Overall |
Asymptotic complexity (big- notation is omitted) | |
---|---|
Line 1 (GKL) | |
Line 1 (RPI) | |
Line 2 | |
Line 3 | |
Line 4 | |
Line 5 | |
Line 6 | |
Line 7 | |
Overall (GKL) | |
Overall (RPI) |
Asymptotic complexity (big- notation is omitted) | |
---|---|
Line 1 (GKL) | |
Line 1 (RPI) | |
Line 2 (GKL) | |
Line 2 (RPI) | |
Line 3 | |
Line 4 | |
Line 5 | |
Line 6 | |
Line 7 | |
Line 8 | |
Overall (GKL) | |
Overall (RPI) |