
Near-Optimal differentially private low-rank trace regression with guaranteed private initialization

Mengyue Zha (Department of Mathematics, Hong Kong University of Science and Technology, [email protected]). Mengyue Zha's research was supported by the Hong Kong PhD Fellowship Scheme.
Abstract

We study differentially private (DP) estimation of a rank-$r$ matrix $M\in\mathbb{R}^{d_{1}\times d_{2}}$ under the trace regression model with Gaussian measurement matrices. Theoretically, the sensitivity of non-private spectral initialization is precisely characterized, and the differential-privacy-constrained minimax lower bound for estimating $M$ under the Schatten-$q$ norm is established. Methodologically, the paper introduces a computationally efficient algorithm for DP-initialization with a sample size of $n\geq\widetilde{O}(r^{2}(d_{1}\vee d_{2}))$. Under certain regularity conditions, the DP-initialization falls within a local ball surrounding $M$. We also propose a differentially private algorithm for estimating $M$ based on Riemannian optimization (DP-RGrad), which achieves a near-optimal convergence rate with the DP-initialization and a sample size of $n\geq\widetilde{O}(r(d_{1}+d_{2}))$. Finally, the paper discusses the non-trivial gap between the minimax lower bound and the upper bound for low-rank matrix estimation under the trace regression model. It is shown that the estimator given by DP-RGrad attains the optimal convergence rate in a weaker notion of differential privacy. Our technique for analyzing the sensitivity of the initialization requires no eigengap condition between the $r$ non-zero singular values.

1 Introduction

The trace regression model (Rohde and Tsybakov, 2011; Koltchinskii et al., 2011), as an extension of the standard regression model, has been widely applied in various fields such as matrix completion, compressed sensing, and multi-task learning (Negahban and Wainwright, 2011; Koltchinskii et al., 2011; Hamidi and Bayati, 2022). Previous studies have proposed both convex and non-convex approaches for optimal estimation procedures for the model. However, the increasing demand for privacy protection has added new complexities to this extensively studied problem. Differential privacy (DP) (Dwork et al., 2006), a framework for protecting individual privacy, has been widely adopted in industrial and governmental applications (Erlingsson et al., 2014; Apple Differential Privacy Team, 2017; Abowd et al., 2020). This paper aims to develop a near-optimal differentially private method for low-rank matrix estimation under the trace regression model.

Trace regression model

Let $M\in\mathbb{R}^{d_{1}\times d_{2}}$ be an unknown rank-$r$ matrix and $X_{i}\in\mathbb{R}^{d_{1}\times d_{2}}$ be the measurement matrix for $i=1,\cdots,n$. Suppose the noisy observation $y_{i}$ satisfies

$$y_{i}=\left<X_{i},M\right>+\xi_{i}\quad{\rm for}\quad i=1,\cdots,n, \qquad (1)$$

where the model noise $\xi_{i}\overset{i.i.d.}{\sim}\mathcal{N}(0,\sigma_{\xi}^{2})$ and the inner product between $X_{i}$ and $M$ in Euclidean space is given by $\left<X_{i},M\right>:=\operatorname{Tr}(X_{i}^{\top}M)$. The goal of the present paper is to estimate the unknown rank-$r$ matrix $M\in\mathbb{R}^{d_{1}\times d_{2}}$ under the trace regression model (1), subject to differential privacy, based on $n$ independent observations $Z:=\{(X_{i},y_{i})\}_{i=1}^{n}$.
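For concreteness, the following minimal sketch simulates data from model (1) in the special case $\Lambda_{i}=I$; the dimensions, rank, and noise level below are hypothetical choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, r, n, sigma_xi = 20, 15, 3, 2000, 0.5   # hypothetical sizes

# rank-r ground truth M (product of two Gaussian factors)
M = rng.standard_normal((d1, r)) @ rng.standard_normal((r, d2))

# Gaussian design with Lambda_i = I: vec(X_i) ~ N(0, I)
X = rng.standard_normal((n, d1, d2))

# y_i = <X_i, M> + xi_i with xi_i ~ N(0, sigma_xi^2)
y = np.einsum('ijk,jk->i', X, M) + sigma_xi * rng.standard_normal(n)
```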

Our approaches are built upon the Gaussian mechanism (Dwork et al., 2006). The main difficulty in applying the Gaussian mechanism is sharply characterizing the sensitivities of the statistics whose privacy is under protection. Listed below are the definitions of sensitivity, differential privacy (DP), and the Gaussian mechanism. Interested readers may refer to Dwork et al. (2006, 2014); Vadhan (2017) for proofs and other details. Let $\|\cdot\|$ denote the spectral norm and $\|\cdot\|_{F}$ the Frobenius norm.

Sensitivity

The sensitivity of a function $f$ that maps a dataset $Z$ into $\mathbb{R}^{d_{1}\times d_{2}}$ is defined by $\Delta_{f}:=\sup_{\textrm{neighbouring}(Z,Z^{\prime})}\|f(Z)-f(Z^{\prime})\|_{\rm F}$, where the supremum is taken over all neighbouring datasets $Z$ and $Z^{\prime}$ that differ by at most one observation.

Differential privacy

Let $\varepsilon>0$ and $0<\delta<1$. We say a randomized algorithm $A$ is $(\varepsilon,\delta)$-differentially private if $\mathbb{P}\big{(}A(Z)\in\mathcal{Q}\big{)}\leq e^{\varepsilon}\mathbb{P}\big{(}A(Z^{\prime})\in\mathcal{Q}\big{)}+\delta$ for all neighbouring datasets $Z,Z^{\prime}$ and all subsets $\mathcal{Q}\subset\mathbb{R}^{d_{1}\times d_{2}}$.

Gaussian mechanism

The randomized algorithm defined by $A(Z)=f(Z)+E$ is $(\varepsilon,\delta)$-DP, where $E\in\mathbb{R}^{d_{1}\times d_{2}}$ has i.i.d. $N\big{(}0,2\Delta_{f}^{2}\varepsilon^{-2}\log(1.25/\delta)\big{)}$ entries.
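A minimal sketch of the Gaussian mechanism for a matrix-valued statistic, assuming the Frobenius-norm sensitivity $\Delta_{f}$ has already been computed:

```python
import numpy as np

def gaussian_mechanism(stat, sensitivity, eps, delta, rng=None):
    """Release f(Z) + E, where E has i.i.d. entries drawn from
    N(0, 2 * Delta_f^2 * log(1.25/delta) / eps^2)."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return stat + sigma * rng.standard_normal(stat.shape)
```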

RIP of Gaussian measurement matrices

The sensitivity of any statistic involving $\{X_{i}\}_{i\in[n]}$ depends on the properties of the measurement matrices. Moreover, it has been established since Candes and Tao (2005) that the restricted isometry property (RIP) of the measurement matrices is crucial to the recovery of the unknown matrix $M$. Hence, assumptions on $\{X_{i}\}_{i\in[n]}$ are necessary, and the present paper considers $\{X_{i}\}_{i\in[n]}$ with Gaussian design.

Assumption 1 (Gaussian design).

The vectorizations of the measurement matrices $X_{1},\ldots,X_{n}$ are independent Gaussian vectors $\operatorname{vec}(X_{i})\sim\mathcal{N}\left(\mathbf{0},\Lambda_{i}\right)$, where the $\Lambda_{i}$'s are known, symmetric, and positive definite. There exist absolute constants $C_{l},C_{u}>0$ such that $C_{l}\leq\lambda_{\min}\left(\Lambda_{i}\right)\leq\lambda_{\max}\left(\Lambda_{i}\right)\leq C_{u}$.

The following Lemma 1 shows that under Assumption 1, the measurement matrices $\{X_{i}\}_{i=1}^{n}$ satisfy the restricted isometry property (RIP) with high probability; see the proof in Appendix E.1.

Lemma 1.

Under Assumption 1, for any $B\in\mathbb{R}^{d_{1}\times d_{2}}$ of rank $r$, there exist constants $c_{1},c_{2},c_{3}>0$ and $c_{5}>c_{4}>0$ such that if $n\geq c_{1}r(d_{1}+d_{2})$, then with probability at least $1-c_{2}\exp\left(-c_{3}r(d_{1}+d_{2})\right)$ we have $c_{4}\sqrt{C_{u}C_{l}}\|B\|_{\mathrm{F}}^{2}\leq\frac{1}{n}\sum_{i=1}^{n}\left\langle X_{i},B\right\rangle^{2}\leq c_{5}\sqrt{C_{u}C_{l}}\|B\|_{\mathrm{F}}^{2}$.
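As a quick numerical sanity check of Lemma 1 in the identity-covariance case ($\Lambda_{i}=I$, so $C_{l}=C_{u}=1$), the empirical quadratic form $\frac{1}{n}\sum_{i}\left\langle X_{i},B\right\rangle^{2}$ concentrates around $\|B\|_{\rm F}^{2}$; all sizes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
d1, d2, r = 30, 40, 2
n = 10 * r * (d1 + d2)                       # n >= c1 * r * (d1 + d2)

B = rng.standard_normal((d1, r)) @ rng.standard_normal((r, d2))  # rank-r

X = rng.standard_normal((n, d1, d2))         # vec(X_i) ~ N(0, I)
quad = np.einsum('ijk,jk->i', X, B) ** 2     # <X_i, B>^2
print(quad.mean() / np.sum(B ** 2))          # close to 1 for large n
```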

Notations

Suppose $M$ has rank $r$ and singular value decomposition $M=U\Sigma V^{\top}\in\mathbb{R}^{d_{1}\times d_{2}}$, where $U\in\mathbb{O}_{d_{1},r}$, $V\in\mathbb{O}_{d_{2},r}$, and $\Sigma=\operatorname{diag}\{\sigma_{1},\cdots,\sigma_{r}\}$ with $\sigma_{1}\geq\cdots\geq\sigma_{r}$. Here, $\mathbb{O}_{d,r}$ denotes the set of $d\times r$ matrices satisfying $H^{\top}H=I_{r}$. Let $\kappa:=\sigma_{1}/\sigma_{r}$ be the condition number and $\kappa_{\xi}:=\sigma_{\xi}/\sigma_{r}$ be the noise-to-signal ratio. Let $\widetilde{O}$ stand for the typical big-$O$ notation up to logarithmic factors and $\widetilde{O}_{p}(\cdot)$ stand for $\widetilde{O}$ holding with high probability.

1.1 Main results

The paper presents several key results for differentially private low-rank matrix estimation. First, we propose a private initialization $\widetilde{M}_{0}$ (detailed in Algorithm 1). Second, we establish the privacy-constrained minimax lower bound under the general Schatten-$q$ norm (Theorem 2). Finally, we introduce a private estimator $\widetilde{M}_{l^{*}}$ (Algorithm 2) that achieves the near-optimal convergence rate under the Frobenius norm. The sensitivity analysis of $\widetilde{M}_{0}$ relies heavily on a spectral representation formula for asymmetric matrices (see Lemma 2).

We prove in Corollary 1 that the private initialization $\widetilde{M}_{0}$ satisfies $\|\widetilde{M}_{0}-M\|_{F}\leq\sqrt{2r}\|\widetilde{M}_{0}-M\|\leq c_{0}\sigma_{r}$ with high probability (w.h.p.), for a small constant $0<c_{0}<1$, provided that $n\geq\widetilde{O}\left((\kappa^{4}r^{2}+\kappa^{2}\kappa_{\xi}^{2}r)(d_{1}\vee d_{2})\right)$.

Theorem 2 establishes the DP-constrained minimax risk of estimating the rank-$r$ matrix $M\in\mathbb{R}^{d_{1}\times d_{2}}$ under model (1) and the general Schatten-$q$ norm. In particular, the minimax risk under the Frobenius norm is of the order $\sigma_{\xi}\sqrt{\frac{r(d_{1}\vee d_{2})}{n}}+\sigma_{\xi}\,\frac{r(d_{1}\vee d_{2})}{n\varepsilon}$.

Finally, we show in Theorem 3 that with a sample size of $n\geq\widetilde{O}\left(\left(\kappa_{\xi}^{2}\vee\kappa_{\xi}\right)r(d_{1}\vee d_{2})\right)$ and any initialization satisfying (2), Algorithm 2 converges geometrically. The private estimator $\widetilde{M}_{l^{*}}$ attains the near-optimal convergence rate

$$\left\|\widetilde{M}_{l^{*}}-M\right\|_{F}\leq\widetilde{O}_{p}\left(\sigma_{\xi}\sqrt{\frac{r(d_{1}+d_{2})}{n}}+(\sigma_{\xi}+\sigma_{r})\frac{r(d_{1}+d_{2})}{n\varepsilon}\right).$$

1.2 Motivations and related works

The trace regression model has been extensively studied, resulting in well-known optimal procedures and theoretical properties. Both convex (Rohde and Tsybakov, 2011; Koltchinskii et al., 2011; Candes and Plan, 2011; Negahban and Wainwright, 2011) and non-convex methods (Burer and Monteiro, 2003; Chen and Wainwright, 2015; Zheng and Lafferty, 2016; Wei et al., 2016) achieve the optimal convergence rate of the order $\sigma_{\xi}\sqrt{\frac{r\left(d_{1}\vee d_{2}\right)}{n}}$ without the constraint of differential privacy. However, the DP-constrained minimax rate of low-rank matrix estimation under the trace regression model is still unknown. (Near-)optimal DP algorithms have been developed for statistical problems such as learning Gaussians (Kamath et al., 2019; Kuditipudi et al., 2023; Brown et al., 2023) or heavy-tailed distributions (Kamath et al., 2020), (sparse or generalized) linear regression (Wang, 2018; Cai et al., 2021, 2023), and PCA (Blum et al., 2005; Dwork et al., 2014; Chaudhuri et al., 2012; Liu et al., 2022). Previous works on DP regression (Cai et al., 2021, 2023) assume that all measurements have bounded $\ell_{2}$ norm. This assumption significantly limits the study of the role the measurements play in the estimation error. Additionally, by treating the measurements as fixed vectors or matrices, their statistical properties are disregarded, and the opportunity for optimal statistical analysis subject to privacy constraints is inevitably lost. A line of work (McSherry and Mironov, 2009; Liu et al., 2015; Jain et al., 2018; Chien et al., 2021; Wang et al., 2023) proposes gradient-descent-based algorithms for DP low-rank matrix completion, attaining near-optimal sample complexity. However, the problem of sample-efficient, differentially private initialization remains under-explored, and it is unknown how to establish the minimax risk of low-rank matrix estimation under the constraint of differential privacy, especially when the matrix is asymmetric.

1.3 Organization

Section 2 proposes a DP-initialization algorithm and presents its privacy and utility guarantees. In Section 3, we establish a DP-constrained minimax lower bound (5) for estimating the rank-$r$ matrix $M$ under the trace regression model. Section 4 presents the DP-estimator based on non-convex optimization and derives the upper bound (7) on the DP-estimator's error. We discuss the score attack argument and the non-trivial gap between the upper bound (7) and the DP-constrained minimax lower bound (5) in Section 5. Proofs are given in Appendices A to F.

2 DP-initialization

Section 2.1 presents an $(\varepsilon,\delta)$-DP initialization $\widetilde{M}_{0}$, stated in Algorithm 1. In Section 2.2, we introduce a spectral representation formula (see Lemma 2) that is crucial to the sensitivity analysis of the initialization. With the help of the spectral representation formula, the privacy and utility guarantees of the DP-initialization $\widetilde{M}_{0}$ are given in Section 2.3.

2.1 Algorithm for DP-initialization

We begin with $\widehat{L}=n^{-1}\sum_{i=1}^{n}\operatorname{mat}\left(\Lambda_{i}^{-1}\operatorname{vec}\left(X_{i}\right)\right)y_{i}$, which is unbiased for $M$. Suppose that the leading-$r$ left and right singular vectors of $\widehat{L}$ are given by the columns of $\widehat{U}\in\mathbb{O}_{d_{1},r}$ and $\widehat{V}\in\mathbb{O}_{d_{2},r}$, respectively. Then $\widehat{M}_{0}:=\operatorname{SVD}_{r}(\widehat{L})$ is a non-private estimator of $M$. Let $\widehat{\Sigma}:=\widehat{U}^{\top}\widehat{L}\widehat{V}$; then we have

$$\widehat{M}_{0}=\operatorname{SVD}_{r}(\widehat{L})=\widehat{U}\widehat{U}^{\top}\widehat{L}\widehat{V}\widehat{V}^{\top}=\widehat{U}\widehat{\Sigma}\widehat{V}^{\top}.$$

It is natural to privatize $\widehat{U}$, $\widehat{V}$, and $\widehat{\Sigma}$ separately. We first privatize the empirical spectral projectors $\widehat{U}\widehat{U}^{\top}$ and $\widehat{V}\widehat{V}^{\top}$ by the Gaussian mechanism. Thanks to the post-processing property (Dwork et al., 2006), we obtain $\widetilde{U}\in\mathbb{O}_{d_{1},r}$ and $\widetilde{V}\in\mathbb{O}_{d_{2},r}$ whose columns are differentially private and orthonormal. Second, we privatize the $r\times r$ matrix $\widetilde{U}^{\top}\widehat{L}\widetilde{V}$ by the Gaussian mechanism and obtain $\widetilde{\Sigma}\in\mathbb{R}^{r\times r}$, a private surrogate for $\widehat{\Sigma}=\widehat{U}^{\top}\widehat{L}\widehat{V}$. Finally, we take $\widetilde{M}_{0}=\widetilde{U}\widetilde{\Sigma}\widetilde{V}^{\top}$ as the DP-initialization. The pseudo-code is displayed in Algorithm 1, and the privacy of $\widetilde{M}_{0}$ is guaranteed by the composition property (Dwork et al., 2006).

Algorithm 1 Differentially private initialization for trace regression
Input: the data set $\{(X_{i},y_{i})\}_{i=1}^{n}$; the covariance matrices $\{\Lambda_{i}\}_{i=1}^{n}$; sensitivities $\Delta^{(1)},\Delta^{(2)}>0$; rank $r$; noise variance $\sigma_{\xi}^{2}$; privacy budget $\varepsilon>0$ and $0<\delta<1$.
Output: $(\varepsilon,\delta)$-differentially private initialization $\widetilde{M}_{0}$.
Compute the unbiased sample estimator $\widehat{L}$ and its top-$r$ left and right singular vectors:
$$\widehat{L}\leftarrow n^{-1}\sum_{i=1}^{n}\operatorname{mat}\left(\Lambda_{i}^{-1}\operatorname{vec}\left(X_{i}\right)\right)y_{i}\quad{\rm and}\quad\widehat{M}_{0}=\widehat{U}\widehat{\Sigma}\widehat{V}^{\top}\leftarrow{\rm SVD}_{r}(\widehat{L})$$
Compute $(\varepsilon/3,\delta/3)$-differentially private singular subspaces by adding artificial Gaussian noise:
$$\widetilde{U}\leftarrow{\rm SVD}_{r}\Big{(}\widehat{U}\widehat{U}^{\top}+E_{U}\Big{)}\ {\rm with}\ (E_{U})_{ij}=(E_{U})_{ji}\stackrel{{\rm i.i.d.}}{\sim}N\Big{(}0,\frac{18(\Delta^{(1)})^{2}}{\varepsilon^{2}}\log(3/\delta)\Big{)},\ \forall\,1\leq i\leq j\leq d_{1}$$
$$\widetilde{V}\leftarrow{\rm SVD}_{r}\Big{(}\widehat{V}\widehat{V}^{\top}+E_{V}\Big{)}\ {\rm with}\ (E_{V})_{ij}=(E_{V})_{ji}\stackrel{{\rm i.i.d.}}{\sim}N\Big{(}0,\frac{18(\Delta^{(1)})^{2}}{\varepsilon^{2}}\log(3/\delta)\Big{)},\ \forall\,1\leq i\leq j\leq d_{2}$$
Compute $(\varepsilon/3,\delta/3)$-differentially private estimates of the singular values up to rotations:
$$\widetilde{\Sigma}\leftarrow\widetilde{U}^{\top}\widehat{L}\widetilde{V}+E_{\Sigma}\ {\rm with}\ (E_{\Sigma})_{ij}=(E_{\Sigma})_{ji}\stackrel{{\rm i.i.d.}}{\sim}N\Big{(}0,\frac{18(\Delta^{(2)})^{2}}{\varepsilon^{2}}\log(3/\delta)\Big{)},\ \forall\,1\leq i\leq j\leq r$$
Compute the $(\varepsilon,\delta)$-differentially private initialization: $\widetilde{M}_{0}\leftarrow\widetilde{U}\widetilde{\Sigma}\widetilde{V}^{\top}$
Return: $\widetilde{M}_{0}$
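The following Python sketch mirrors Algorithm 1 in the identity-covariance case $\Lambda_{i}=I$ (so $\operatorname{mat}(\Lambda_{i}^{-1}\operatorname{vec}(X_{i}))=X_{i}$); the sensitivities $\Delta^{(1)}$ and $\Delta^{(2)}$ are assumed to be supplied externally, e.g., from the values in Theorem 1.

```python
import numpy as np

def dp_initialization(X, y, r, eps, delta, Delta1, Delta2, rng=None):
    """Sketch of Algorithm 1 with Lambda_i = I.

    X: (n, d1, d2) measurements; y: (n,) responses;
    Delta1, Delta2: sensitivities of the singular subspaces and of
    the projected matrix U~^T L^ V~.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d1, d2 = X.shape
    L_hat = np.einsum('i,ijk->jk', y, X) / n           # unbiased for M

    U = np.linalg.svd(L_hat)[0][:, :r]                 # top-r left sing. vecs
    V = np.linalg.svd(L_hat.T)[0][:, :r]               # top-r right sing. vecs

    def sym_noise(d, Delta):
        # symmetric matrix, entries on/above the diagonal i.i.d.
        # N(0, 18 * Delta^2 * log(3/delta) / eps^2), as in Algorithm 1
        sig = Delta * np.sqrt(18.0 * np.log(3.0 / delta)) / eps
        E = sig * rng.standard_normal((d, d))
        return np.triu(E) + np.triu(E, 1).T

    # (eps/3, delta/3)-DP singular subspaces via noisy spectral projectors
    U_t = np.linalg.svd(U @ U.T + sym_noise(d1, Delta1))[0][:, :r]
    V_t = np.linalg.svd(V @ V.T + sym_noise(d2, Delta1))[0][:, :r]

    # (eps/3, delta/3)-DP singular values up to rotations
    Sigma_t = U_t.T @ L_hat @ V_t + sym_noise(r, Delta2)

    return U_t @ Sigma_t @ V_t.T                       # DP initialization
```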

We now define the sensitivities $\Delta^{(1)}$ and $\Delta^{(2)}$ that appear in Algorithm 1. Let

$$\widehat{L}^{(i)}:=n^{-1}\sum_{j\neq i}\operatorname{mat}\left(\Lambda_{j}^{-1}\operatorname{vec}\left(X_{j}\right)\right)y_{j}+n^{-1}\operatorname{mat}\left(\Lambda_{i}^{-1}\operatorname{vec}\left(X_{i}^{\prime}\right)\right)y_{i}^{\prime},$$

where $(X_{i}^{\prime},y_{i}^{\prime})$ is an i.i.d. copy of $(X_{i},y_{i})$. Then the estimator $\widehat{L}^{(i)}$ differs from $\widehat{L}$ only in the $i$-th pair of observations. Suppose the top-$r$ left and right singular vectors of $\widehat{L}^{(i)}$ are given by the columns of $\widehat{U}^{(i)}$ and $\widehat{V}^{(i)}$, respectively. The sensitivity of $\widehat{U}\widehat{U}^{\top}$ is defined by

$$\Delta_{\widehat{U}\widehat{U}^{\top}}=\sup_{{\rm neighbouring}(Z,Z^{\prime})}\left\|\widehat{U}(Z)\widehat{U}(Z)^{\top}-\widehat{U}(Z^{\prime})\widehat{U}(Z^{\prime})^{\top}\right\|_{F}=\max_{i\in[n]}\left\|\widehat{U}\widehat{U}^{\top}-\widehat{U}^{(i)}\widehat{U}^{(i)\top}\right\|_{F},$$

and the sensitivity $\Delta_{\widehat{V}\widehat{V}^{\top}}$ of $\widehat{V}\widehat{V}^{\top}$ is defined similarly. We refer to $\Delta^{(1)}\triangleq\Delta_{\widehat{U}\widehat{U}^{\top}}\vee\Delta_{\widehat{V}\widehat{V}^{\top}}$ as the sensitivity of the singular subspaces and define the sensitivity

$$\Delta^{(2)}\triangleq\Delta_{\widetilde{U}^{\top}\widehat{L}\widetilde{V}}=\sup_{{\rm neighbouring}(Z,Z^{\prime})}\left\|\widetilde{U}^{\top}\widehat{L}(Z)\widetilde{V}-\widetilde{U}^{\top}\widehat{L}(Z^{\prime})\widetilde{V}\right\|_{F}=\max_{i\in[n]}\left\|\widetilde{U}^{\top}\left(\widehat{L}-\widehat{L}^{(i)}\right)\widetilde{V}\right\|_{F}.$$

When privatizing $\widehat{\Sigma}=\widehat{U}^{\top}\widehat{L}\widehat{V}$ by the Gaussian mechanism, the scale of the artificial noise avoids growing with an unnecessary factor $\sqrt{d_{1}}\vee\sqrt{d_{2}}$ and instead grows with the smaller quantity $\sqrt{r}$. This benefit motivates us to privatize $\widehat{U}$, $\widehat{V}$, and $\widehat{\Sigma}$ separately. However, it is technically challenging to characterize $\Delta^{(1)}=\Delta_{\widehat{U}\widehat{U}^{\top}}\vee\Delta_{\widehat{V}\widehat{V}^{\top}}$ due to the non-linear dependence of $\widehat{U}\widehat{U}^{\top}$ and $\widehat{V}\widehat{V}^{\top}$ on the dataset $Z=\{(X_{i},y_{i})\}_{i=1}^{n}$. To address this challenge, we introduce an explicit spectral representation formula (see Lemma 2) to obtain a sharp upper bound on the sensitivity of the singular subspaces.
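The leave-one-out structure of these definitions suggests a simple empirical probe of $\Delta^{(1)}$: resample a single observation and track how much the spectral projector moves. The sketch below is only a Monte-Carlo proxy under the identity-covariance case $\Lambda_{i}=I$ with a hypothetical known $M$; it does not compute the worst-case supremum over neighbouring datasets.

```python
import numpy as np

def subspace_change_probe(X, y, M, sigma_xi, r, rng):
    """Leave-one-out proxy for Delta^(1) (left subspace, Lambda_i = I)."""
    n, d1, d2 = X.shape
    L = np.einsum('i,ijk->jk', y, X) / n
    U = np.linalg.svd(L)[0][:, :r]
    P = U @ U.T
    worst = 0.0
    for i in range(n):
        Xp = rng.standard_normal((d1, d2))          # fresh draw of X_i'
        yp = np.sum(Xp * M) + sigma_xi * rng.standard_normal()
        Li = L + (yp * Xp - y[i] * X[i]) / n        # replace observation i
        Ui = np.linalg.svd(Li)[0][:, :r]
        worst = max(worst, np.linalg.norm(Ui @ Ui.T - P, 'fro'))
    return worst
```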

2.2 Spectral representation formula

This section introduces a spectral representation formula for asymmetric matrices (see Lemma 2). To begin with, we quickly explain the standard symmetric dilation trick (see, e.g., Section 2.1.17 in Tropp et al. (2015)) and define the auxiliary operators used in Lemma 2.

Symmetric dilation and auxiliary operators

For any $M\in\mathbb{R}^{d_{1}\times d_{2}}$, the symmetric dilation $M_{*}$ of $M$ is the $(d_{1}+d_{2})\times(d_{1}+d_{2})$ matrix defined by $M_{*}:=\begin{pmatrix}0&M\\ M^{\top}&0\end{pmatrix}$. It is easy to check that $M_{*}=M_{*}^{\top}$ and $\|M_{*}\|=\|M\|$. Further, if $M$ has rank $r$ with SVD $M=U\Sigma V^{\top}\in\mathbb{R}^{d_{1}\times d_{2}}$, then $M_{*}$ has rank $2r$ and eigendecomposition of the form

$$\frac{1}{\sqrt{2}}\begin{pmatrix}U&U\\ V&-V\end{pmatrix}\cdot\begin{pmatrix}\Sigma&0\\ 0&-\Sigma\end{pmatrix}\cdot\frac{1}{\sqrt{2}}\begin{pmatrix}U&U\\ V&-V\end{pmatrix}^{\top}=:U_{M_{*}}\Sigma_{M_{*}}U_{M_{*}}^{\top}.$$

The $2r$ eigenvectors of $M_{*}$ are given by the columns of $U_{M_{*}}\in\mathbb{O}_{(d_{1}+d_{2}),2r}$. For integers $t\geq 1$, we define the operators

$$Q^{-t}:=U_{M_{*}}\Sigma_{M_{*}}^{-t}U_{M_{*}}^{\top}\quad{\rm and}\quad Q^{-0}:=Q^{\perp}\triangleq(U_{M_{*}})_{\perp}(U_{M_{*}})_{\perp}^{\top}=I_{d_{1}+d_{2}}-U_{M_{*}}U_{M_{*}}^{\top}.$$
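A short numerical check of the dilation structure (with hypothetical sizes): the non-zero eigenvalues of $M_{*}$ are exactly $\pm\sigma_{1},\cdots,\pm\sigma_{r}$.

```python
import numpy as np

rng = np.random.default_rng(2)
d1, d2, r = 5, 4, 2
U = np.linalg.qr(rng.standard_normal((d1, r)))[0]
V = np.linalg.qr(rng.standard_normal((d2, r)))[0]
M = U @ np.diag([3.0, 2.0]) @ V.T                  # singular values 3, 2

# symmetric dilation M_* = [[0, M], [M^T, 0]]
M_star = np.block([[np.zeros((d1, d1)), M],
                   [M.T, np.zeros((d2, d2))]])

print(np.round(np.sort(np.linalg.eigvalsh(M_star)), 6))
# -> [-3. -2.  0.  0.  0.  0.  0.  2.  3.]
```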
Lemma 2 (Spectral representation formula).

Let $M\in\mathbb{R}^{d_{1}\times d_{2}}$ be any rank-$r$ matrix with singular values $\sigma_{1}\geq\cdots\geq\sigma_{r}>0$, and let $\widehat{L}=M+\Delta\in\mathbb{R}^{d_{1}\times d_{2}}$ be a perturbation of $M$, where $\Delta\in\mathbb{R}^{d_{1}\times d_{2}}$ is the deviation matrix. Suppose the top-$r$ left and right singular vectors of $\widehat{L}$ and $M$ are given by the columns of $\widehat{U}$, $\widehat{V}$ and $U$, $V$, respectively. If $2\|\Delta\|\leq\sigma_{r}$, then

$$\begin{pmatrix}\widehat{U}\widehat{U}^{\top}-UU^{\top}&0\\ 0&\widehat{V}\widehat{V}^{\top}-VV^{\top}\end{pmatrix}=\sum_{k\geq 1}\mathcal{S}_{M_{*},k}(\Delta_{*}).$$

Here, $\Delta_{*}$ is the symmetric dilation of $\Delta:=\widehat{L}-M$, and the $k$-th order term $\mathcal{S}_{M_{*},k}(\Delta_{*})$ is a summation of $\binom{2k}{k}$ terms defined by $\mathcal{S}_{M_{*},k}(\Delta_{*})=\sum_{\mathbf{s}:s_{1}+\ldots+s_{k+1}=k}(-1)^{1+\tau(\mathbf{s})}\cdot Q^{-s_{1}}\Delta_{*}Q^{-s_{2}}\cdots\Delta_{*}Q^{-s_{k+1}}$, where $\mathbf{s}=\left(s_{1},\ldots,s_{k+1}\right)$ contains non-negative indices and $\tau(\mathbf{s})=\sum_{j=1}^{k+1}\mathbb{I}\left(s_{j}>0\right)$.

In Lemma 2, the spectral projectors $\widehat{U}\widehat{U}^{\top}$ and $\widehat{V}\widehat{V}^{\top}$ of the matrix $\widehat{L}=M+\Delta$ are explicitly represented in terms of the symmetric dilation of $\Delta$, with the help of the auxiliary operators $Q^{-0}$ and $Q^{-t}$ for integers $t\geq 1$. The proof of Lemma 2 is deferred to Appendix E.2. Note that Lemma 2 accommodates a diverging condition number and requires no eigengap condition between the $r$ non-zero singular values. In the proof of Theorem 1, we shall see that $\widehat{V}\widehat{V}^{\top}-\widehat{V}^{(i)}\widehat{V}^{(i)\top}$ and $\widehat{U}\widehat{U}^{\top}-\widehat{U}^{(i)}\widehat{U}^{(i)\top}$ are mainly contributed by the first-order approximation $\mathcal{S}_{M_{*},1}(\Delta_{*})-\mathcal{S}_{M_{*},1}(\Delta_{*}^{(i)})$, where $\Delta_{*}^{(i)}$ is the symmetric dilation of $\Delta^{(i)}:=\widehat{L}^{(i)}-M$.
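The dominance of the first-order term can also be seen numerically: for a small perturbation $\Delta$, the projector difference in Lemma 2 agrees with $Q^{-1}\Delta_{*}Q^{\perp}+Q^{\perp}\Delta_{*}Q^{-1}$ up to a residual of order $\|\Delta\|^{2}$. The sketch below, with hypothetical sizes, verifies this.

```python
import numpy as np

rng = np.random.default_rng(3)
d1, d2, r = 5, 4, 2
U = np.linalg.qr(rng.standard_normal((d1, r)))[0]
V = np.linalg.qr(rng.standard_normal((d2, r)))[0]
M = U @ np.diag([3.0, 2.0]) @ V.T
M_star = np.block([[np.zeros((d1, d1)), M], [M.T, np.zeros((d2, d2))]])

D = 1e-4 * rng.standard_normal((d1, d2))           # small perturbation Delta
D_star = np.block([[np.zeros((d1, d1)), D], [D.T, np.zeros((d2, d2))]])

w, W = np.linalg.eigh(M_star)
keep = np.abs(w) > 1e-8                            # the 2r non-zero eigenvalues
Q_inv = W[:, keep] @ np.diag(1.0 / w[keep]) @ W[:, keep].T   # Q^{-1}
Q_perp = np.eye(d1 + d2) - W[:, keep] @ W[:, keep].T         # Q^{-0}

Uh = np.linalg.svd(M + D)[0][:, :r]
Vh = np.linalg.svd((M + D).T)[0][:, :r]
lhs = np.block([[Uh @ Uh.T - U @ U.T, np.zeros((d1, d2))],
                [np.zeros((d2, d1)), Vh @ Vh.T - V @ V.T]])
first = Q_inv @ D_star @ Q_perp + Q_perp @ D_star @ Q_inv
print(np.linalg.norm(lhs - first))                 # ~1e-8: second-order residual
```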

2.3 Privacy and utility guarantees of the initialization

In this section, we study the privacy and utility guarantees of the initialization $\widetilde{M}_{0}$. Theorem 1 characterizes the sensitivities $\Delta^{(1)}$ and $\Delta^{(2)}$ needed to guarantee an $(\varepsilon,\delta)$-DP $\widetilde{M}_{0}$ and presents the upper bounds on $\|\widetilde{M}_{0}-M\|$ and $\|\widetilde{M}_{0}-M\|_{F}$. The proof of Theorem 1 is provided in Appendix A.

Theorem 1 (Privacy and utility guarantees of the initialization $\widetilde{M}_{0}$).

Consider i.i.d. observations $Z=\{z_{1},\cdots,z_{n}\}$ drawn from the trace regression model (1), where $z_{i}:=(X_{i},y_{i})$ for $i=1,\cdots,n$. Let the true rank-$r$ regression coefficient matrix be $M\in\mathbb{R}^{d_{1}\times d_{2}}$, and suppose that $\{X_{i}\}_{i\in[n]}$ satisfy Assumption 1. Under the mild condition $n\geq\frac{\log^{2}n}{(d_{1}\vee d_{2})\log(d_{1}+d_{2})}$, there exist absolute constants $C_{1},C_{2},C_{3}>0$ such that the following holds: if the sample size satisfies

$$n\geq n_{0}\triangleq C_{1}C_{l}^{-1}r\left(\frac{\sigma_{\xi}+\sqrt{C_{u}r}\sigma_{1}}{\sigma_{r}}\right)^{2}(d_{1}\vee d_{2})\log(d_{1}+d_{2}),$$

and Algorithm 1 takes in sensitivities at least $\Delta^{(1)}=C_{2}\left(\frac{\sqrt{C_{l}^{-1}}\left(\sqrt{C_{u}r}\sigma_{1}+\sigma_{\xi}\right)}{\sigma_{r}}\right)\frac{\sqrt{r}}{n}\log n$ and $\Delta^{(2)}=C_{2}\sqrt{C_{l}^{-1}}\left(\sqrt{C_{u}r}\sigma_{1}+\sigma_{\xi}\right)\frac{\sqrt{r}}{n}\log n$, then Algorithm 1 is guaranteed to be $(\varepsilon,\delta)$-DP. Moreover, the output $\widetilde{M}_{0}$ of Algorithm 1 satisfies

$$\|\widetilde{M}_{0}-M\|\bigvee\left(\|\widetilde{M}_{0}-M\|_{F}/\sqrt{2r}\right)\leq\underbrace{C_{3}\sqrt{C_{l}^{-1}}\left(\sqrt{C_{u}r}\sigma_{1}+\sigma_{\xi}\right)\frac{\sigma_{1}}{\sigma_{r}}\sqrt{\frac{(d_{1}\vee d_{2})\log(d_{1}+d_{2})}{n}}}_{e_{1}}+\underbrace{C_{3}\sqrt{C_{l}^{-1}}\left(\sqrt{C_{u}r}\sigma_{1}+\sigma_{\xi}\right)\left(\frac{\sigma_{1}}{\sigma_{r}}\frac{\sqrt{r(d_{1}\vee d_{2})}}{n\varepsilon}+\frac{r}{n\varepsilon}\right)\log n\,\log^{\frac{1}{2}}\!\left(\frac{3.75}{\delta}\right)}_{e_{2}},$$

with probability at least $1-(d_{1}+d_{2})^{-10}-n^{-9}-\exp(-d_{1})-\exp(-d_{2})-10^{-20r}$.

In Theorem 1, the sample size condition $n\geq n_{0}$ ensures that the spectral norm of the perturbations is small enough, i.e., $\|\Delta\|+\max_{i\in[n]}\|\Delta^{(i)}\|\leq\sigma_{r}/2$, to apply Lemma 2 and obtain a sharp characterization of $\Delta^{(1)}\triangleq\Delta_{\widehat{U}\widehat{U}^{\top}}\vee\Delta_{\widehat{V}\widehat{V}^{\top}}$. Theorem 1 also provides an upper bound on the sensitivity of the pseudo singular values, which is of the order $\Delta^{(2)}\triangleq\Delta_{\widetilde{U}^{\top}\widehat{L}\widetilde{V}}\asymp\sigma_{1}\Delta^{(1)}$. Based on these results, Algorithm 1 outputs an $(\varepsilon,\delta)$-DP initialization $\widetilde{M}_{0}$ under the sample size condition

$$n\geq\widetilde{O}\left((\kappa^{2}r^{2}+\kappa_{\xi}r)(d_{1}\vee d_{2})\right),$$

with an upper bound on the error $\|\widetilde{M}_{0}-M\|$ consisting of two terms. The first term $e_{1}$ accounts for the statistical error of $\widehat{M}_{0}$ and is greater than the optimal rate $\sigma_{\xi}\sqrt{\frac{d_{1}\vee d_{2}}{n}}$ (Koltchinskii, 2011). The second term $e_{2}$ can be further decomposed into the cost of privacy for the singular subspaces, which is of the order $\widetilde{O}_{p}\left(\frac{\sigma_{1}}{\sigma_{r}}\left(\sigma_{1}\sqrt{r}+\sigma_{\xi}\right)\frac{\sqrt{(d_{1}\vee d_{2})}}{n\varepsilon}\right)$, and the cost of privacy arising from privatizing the singular values by the Gaussian mechanism, which is of the order $\widetilde{O}_{p}\left((\sigma_{1}\sqrt{r}+\sigma_{\xi})r/(n\varepsilon)\right)$.

Next, Corollary 1 gives the sample size required for a DP-initialization $\widetilde{M}_{0}$ that falls within a local ball around $M$. The proof of Corollary 1 is deferred to Appendix A.5.

Corollary 1.

Under the conditions stated in Theorem 1, suppose the sample size satisfies

$$n\geq C_{1}\max\Bigg{\{}\underbrace{C_{l}^{-1}\left(\frac{\sqrt{C_{u}r}\sigma_{1}+\sigma_{\xi}}{\sigma_{r}}\right)^{2}\kappa^{2}r(d_{1}\vee d_{2})\log(d_{1}+d_{2})}_{n_{1}},\ \underbrace{\sqrt{C_{l}^{-1}}\left(\frac{\sqrt{C_{u}r}\sigma_{1}+\sigma_{\xi}}{\sigma_{r}}\right)\left(\kappa r\sqrt{d_{1}\vee d_{2}}+r^{\frac{3}{2}}\right)\log n\,\frac{\log^{\frac{1}{2}}(\frac{3.75}{\delta})}{\varepsilon}}_{n_{2}}\Bigg{\}},$$

for some absolute constant $C_{1}>0$. Then, for some small constant $0<c_{0}<1$,

$$\|\widetilde{M}_{0}-M\|_{F}\leq\sqrt{2r}\|\widetilde{M}_{0}-M\|\leq c_{0}\sigma_{r}. \qquad (2)$$

In Corollary 1, the error due to $\|M\|\cdot\left(\left\|\widehat{V}\widehat{V}^{\top}-VV^{\top}\right\|+\left\|\widehat{U}\widehat{U}^{\top}-UU^{\top}\right\|\right)$ dominates the statistical error $\|\widehat{L}-M\|$, and the sample size $n_{1}$ is required to control these two terms; the sample size $n_{2}$ controls the error due to privatizing the singular subspaces and singular values. According to Corollary 1, once the sample size satisfies $n\geq\widetilde{O}\left((\kappa^{4}r^{2}+\kappa^{2}\kappa_{\xi}^{2}r)(d_{1}\vee d_{2})\right)$, the $(\varepsilon,\delta)$-DP $\widetilde{M}_{0}$ is guaranteed to fall into a local ball surrounding $M$, as stated in (2). The condition (2) is a prerequisite for Algorithm 2 to converge geometrically, as discussed in Theorem 3.

3 Minimax lower bounds

This section applies DP-Fano's lemma (see Lemma 3) to establish the DP-constrained minimax lower bound for estimating the matrix $M\in\mathbb{M}_{r}:=\{M\in\mathbb{R}^{d_{1}\times d_{2}}:\operatorname{rank}(M)=r\}$ under the trace regression model

$$f_{M}(y_{i}|X_{i})=\frac{1}{\sqrt{2\pi}\sigma_{\xi}}\exp\left(\frac{-\left(y_{i}-\left<X_{i},M\right>\right)^{2}}{2\sigma_{\xi}^{2}}\right);\qquad\operatorname{vec}(X_{i})\sim\mathcal{N}\left(\mathbf{0},\Lambda_{i}\right). \qquad (3)$$

Suppose we observe an i.i.d. sample $\{(X_{i},y_{i}),(X_{i}^{\prime},y_{i}^{\prime})\}_{i\in[n]}$ of size $2n$ drawn from (3). Then we have

$$\bar{y}_{i}:=y_{i}+y_{i}^{\prime}=\left<\begin{pmatrix}0&M\\ M^{\top}&0\end{pmatrix},\begin{pmatrix}0&X_{i}\\ X_{i}^{\prime\top}&0\end{pmatrix}\right>+\xi_{i}+\xi_{i}^{\prime},$$

where the underlying matrix is $M_{*}$. Let $f(X_{i},X_{i}^{\prime})$ be the joint distribution of $X_{i}$ and $X_{i}^{\prime}$; let $f_{M_{*}}(\bar{y}_{i}\mid X_{i},X_{i}^{\prime})$ be the conditional distribution of $\bar{y}_{i}$ given $X_{i},X_{i}^{\prime}$; and denote the joint distribution of $\bar{y}_{i}$ and $X_{i},X_{i}^{\prime}$ by $f_{M_{*}}(\bar{y}_{i},X_{i},X_{i}^{\prime})$. It is clear that $f_{M_{*}}(\bar{y}_{i}\mid X_{i},X_{i}^{\prime})$ is given by the distribution

$$\mathcal{N}\left(\left<\begin{pmatrix}0&M\\ M^{\top}&0\end{pmatrix},\begin{pmatrix}0&X_{i}\\ X_{i}^{\prime\top}&0\end{pmatrix}\right>,\ 2\sigma_{\xi}^{2}\right).$$

Let \otimes represent the tensor product of marginal laws. For a given matrix Σ=diag{σ1,,σr}\Sigma=\operatorname{diag}\{\sigma_{1},\cdots,\sigma_{r}\} where Cσσ1σrcσC\sigma\geq\sigma_{1}\cdots\geq\sigma_{r}\geq c\sigma for some constants σ>0\sigma>0 and C>c>0C>c>0, we consider the family of normal distribution under trace regression model:

$$\mathcal{P}_{\Sigma}:=\Big{\{}\bigotimes_{i=1}^{n}f_{M_{*}}(\bar{y}_{i},X_{i},X_{i}^{\prime}):M_{*}=(U\Sigma V^{\top})_{*},\ U\in\mathbb{O}_{d_{1},r},\ V\in\mathbb{O}_{d_{2},r}\Big{\}}.$$

By definition, each distribution $P_{M_{*}}\in\mathcal{P}_{\Sigma}$ is indexed by $U_{M_{*}}=\frac{1}{\sqrt{2}}\begin{pmatrix}U&U\\ V&-V\end{pmatrix}\in\mathbb{O}_{d_{1}+d_{2},2r}$, whose columns are the $2r$ eigenvectors of $M_{*}$. Next, we employ DP-Fano's lemma to derive the minimax lower bound for estimating $M_{*}$ from a sample drawn from $\mathcal{P}_{\Sigma}$. Let ${\rm KL}(\cdot\|\cdot)$ and ${\rm TV}(\cdot,\cdot)$ denote the Kullback-Leibler divergence and the total variation distance between two distributions.

Lemma 3 (DP-Fano’s lemma, Acharya et al. (2021)).

Let $\mathcal{P}:=\{P:P=\mu^{(1)}\times\cdots\times\mu^{(n)}\}$ be a family of product measures indexed by a parameter from a pseudo-metric space $(\Theta,\rho)$. Denote by $\theta(P)\in\Theta$ the parameter associated with the distribution $P$. Let $\mathcal{Q}=\{P_{1},\cdots,P_{N}\}\subset\mathcal{P}$ contain $N$ probability measures for which there exist constants $\rho_{0},l_{0},t_{0}>0$ such that, for all $i\neq i^{\prime}\in[N]$, $\rho\left(\theta(P_{i}),\theta(P_{i^{\prime}})\right)\geq\rho_{0}$, $\operatorname{KL}\left(P_{i}\|P_{i^{\prime}}\right)\leq l_{0}$, and $\sum_{k\in[n]}\operatorname{TV}\left(\mu_{i}^{(k)},\mu_{i^{\prime}}^{(k)}\right)\leq t_{0}$, where $P_{i}=\mu_{i}^{(1)}\times\cdots\times\mu_{i}^{(n)}$ and $P_{i^{\prime}}=\mu_{i^{\prime}}^{(1)}\times\cdots\times\mu_{i^{\prime}}^{(n)}$. Suppose $\delta\lesssim e^{-n}$; then

$$\inf_{A\in\mathcal{A}_{\varepsilon,\delta}(\mathcal{P})}\ \sup_{P\in\mathcal{P}}\ \mathbb{E}_{A}\,\rho(A,\theta(P))\geq\max\left\{\frac{\rho_{0}}{2}\left(1-\frac{l_{0}+\log 2}{\log N}\right),\ \frac{\rho_{0}}{4}\left(1\bigwedge\frac{N-1}{\exp\left(4\varepsilon t_{0}\right)}\right)\right\}, \qquad (4)$$

where the infimum is taken over the class of $(\varepsilon,\delta)$-DP randomized algorithms $\mathcal{A}_{\varepsilon,\delta}(\mathcal{P}):=\{A:Z\mapsto\Theta\ \text{such that}\ A\text{ is }(\varepsilon,\delta)\text{-differentially private for all }Z\sim P\in\mathcal{P}\}$.

To apply DP-Fano's lemma, we need to construct a large subset of $\mathbb{O}_{d_{1}+d_{2},2r}$ with well-separated elements. By Lemma 6, there exists a subset $\mathcal{S}_{q}^{(d_{1}+d_{2})}\subset\mathbb{O}_{d_{1}+d_{2},2r}$ with cardinality $|\mathcal{S}_{q}^{(d_{1}+d_{2})}|\geq 2^{2r(d_{1}+d_{2}-2r)}$ such that for any $H\neq H^{\prime}\in\mathcal{S}_{q}^{(d_{1}+d_{2})}$,

$$\|HH^{\top}-H^{\prime}H^{\prime\top}\|_{q}\gtrsim\tau\varepsilon_{0}(2r)^{1/q}\quad{\rm and}\quad\|HH^{\top}-H^{\prime}H^{\prime\top}\|_{\rm F}\lesssim 2\sqrt{r}\varepsilon_{0},$$

for some small constants $\tau,\varepsilon_{0}>0$, where $\|\cdot\|_{q}$ denotes the Schatten-$q$ norm. We then consider the family of distributions

$$\mathcal{P}_{\sigma}=\left\{\bigotimes_{i=1}^{n}f_{M_{*}}(\bar{y}_{i},X_{i},X_{i}^{\prime}):M_{*}=\sigma HH^{\top},\ H\in\mathcal{S}_{q}^{(d_{1}+d_{2})}\right\}\subset\mathcal{P}_{\Sigma},$$

whose cardinality is $N:=\left|\mathcal{P}_{\sigma}\right|\geq 2^{2r(d_{1}+d_{2}-2r)}$. Let $M_{*}=\sigma HH^{\top}$ and $M_{*}^{\prime}=\sigma H^{\prime}H^{\prime\top}$. As shown in Appendix B, for any $H\neq H^{\prime}\in\mathcal{S}_{q}^{(d_{1}+d_{2})}$, we have

$$\operatorname{KL}\left(\bigotimes_{i=1}^{n}f_{M_{*}}(\bar{y}_{i},X_{i},X_{i}^{\prime})\,\Big{\|}\,\bigotimes_{i=1}^{n}f_{M_{*}^{\prime}}(\bar{y}_{i},X_{i},X_{i}^{\prime})\right)\lesssim\frac{n}{\sigma_{\xi}^{2}}C_{u}\sigma^{2}r\varepsilon_{0}^{2},$$

and $\sum_{i\in[n]}\operatorname{TV}\left(f_{M_{*}}(\bar{y}_{i},X_{i},X_{i}^{\prime}),f_{M^{\prime}_{*}}(\bar{y}_{i},X_{i},X_{i}^{\prime})\right)\lesssim n\frac{\sqrt{C_{u}}\sigma}{\sigma_{\xi}}\sqrt{r}\varepsilon_{0}$. We then obtain Theorem 2 by applying Lemma 3 with these bounds on the KL divergence and TV distance, together with the fact that $\mathcal{P}_{\sigma}\subset\mathcal{P}_{\Sigma}$. The proof of Theorem 2 is deferred to Appendix B.

Theorem 2.

Consider a sample of size $n$ drawn from a distribution $P_{M_{*}}\in\mathcal{P}_{\Sigma}$. Then for any $\delta\lesssim e^{-n}$ and any $q\in[1,\infty]$, there exists a constant $c>0$ such that

$$\inf_{\widetilde{M}}\sup_{P\in\mathcal{P}_{\Sigma}}\mathbb{E}\Big{\|}\widetilde{M}-M\Big{\|}_{q}\geq c\frac{\sigma_{\xi}}{\sqrt{C_{u}}}\left(r^{1/q}\sqrt{\frac{d_{1}\vee d_{2}}{n}}+r^{\frac{1}{2}+\frac{1}{q}}\frac{d_{1}\vee d_{2}}{n\varepsilon}\right)\bigwedge r^{1/q}\sigma,$$

where the infimum is taken over all possible $(\varepsilon,\delta)$-DP algorithms. It suffices to choose $q=1,2,\infty$ to obtain the bounds in the nuclear, Frobenius, and spectral norms, respectively. For example, when $q=2$, there exists a constant $c>0$ such that

$$\inf_{\widetilde{M}}\sup_{P\in\mathcal{P}_{\Sigma}}\mathbb{E}\Big{\|}\widetilde{M}-M\Big{\|}_{F}\geq c\Big{(}\underbrace{\frac{\sigma_{\xi}}{\sqrt{C_{u}}}\sqrt{\frac{r(d_{1}\vee d_{2})}{n}}}_{l_{1}}+\underbrace{\frac{\sigma_{\xi}}{\sqrt{C_{u}}}\frac{r(d_{1}\vee d_{2})}{n\varepsilon}}_{l_{2}}\Big{)}\bigwedge r^{1/2}\sigma. \qquad (5)$$

In Theorem 2, the lower bound (5) consists of two terms: the statistical error $l_{1}$ and the cost of privacy $l_{2}$. The next section proposes a DP-estimator that attains the minimax lower bound (5) up to an additional factor $\sigma_{r}$ and logarithmic factors. As a supplement to DP-Fano's lemma, which requires $\delta\lesssim e^{-n}$, we also employ the score attack argument, which is valid for the wider range $\delta\lesssim n^{-(1+\gamma)}$ for a constant $\gamma>0$. Theorem 5 presents the DP-constrained lower bound established by the score attack argument; its statement and proof are deferred to Appendix D. We also point out that it is straightforward to derive the minimax lower bound in the case $d_{1}=d_{2}=d$ from DP-Fano's lemma, since there is no need to apply the symmetrization trick.

4 Upper bounds with differential privacy

In this section, we present Algorithm 2, DP-RGrad, and show that it attains a near-optimal convergence rate for differentially private estimation of low-rank matrices under the trace regression model. Our approach is based on privatizing Riemannian gradient descent (RGrad) via the Gaussian mechanism. Interested readers may refer to Vandereycken (2013); Edelman et al. (1998); Adler et al. (2002); Absil et al. (2008) for the basics of RGrad. Let the estimate after $l$ iterations be the rank-$r$ matrix $M_{l}\in\mathbb{R}^{d_{1}\times d_{2}}$ with SVD $M_{l}=U_{l}\Sigma_{l}V_{l}^{\top}$. It is well known (Absil et al., 2008; Vandereycken, 2013) that the tangent space of $\mathbb{M}_{r}$ at $M_{l}$ is given by $\mathbb{T}_{l}:=\left\{Z\in\mathbb{R}^{d_{1}\times d_{2}}:Z=U_{l}R^{\top}+LV_{l}^{\top},\ R\in\mathbb{R}^{d_{2}\times r},\ L\in\mathbb{R}^{d_{1}\times r}\right\}$. The projection of the gradient $G_{l}$ onto $\mathbb{T}_{l}$ is $\mathcal{P}_{\mathbb{T}_{l}}\left(G_{l}\right)=U_{l}U_{l}^{\top}G_{l}+G_{l}V_{l}V_{l}^{\top}-U_{l}U_{l}^{\top}G_{l}V_{l}V_{l}^{\top}$, which has rank at most $2r$. Let the noisy gradient step on the tangent space be $M_{l}-\eta_{l}\mathcal{P}_{\mathbb{T}_{l}}\left(G_{l}\right)+\mathcal{P}_{\mathbb{T}_{l}}N_{l}$, where $N_{l}\in\mathbb{R}^{d_{1}\times d_{2}}$ is a Gaussian noise matrix. Then we retract it back to $\mathbb{M}_{r}$ and obtain

$$M_{l+1}=\operatorname{SVD}_{r}\left(M_{l}-\eta_{l}\mathcal{P}_{\mathbb{T}_{l}}\left(G_{l}\right)+\mathcal{P}_{\mathbb{T}_{l}}N_{l}\right). \qquad (6)$$

We update the estimate as in (6) for $l=0,\cdots,l^{*}-1$, where $l^{*}$ is the total number of iterations. Thanks to the composition property and the Gaussian mechanism, we only need to ensure that each iteration is $(\varepsilon/l^{*},\delta/l^{*})$-DP. For the trace regression model (1), the empirical mean squared loss is $\mathcal{L}_{n}(M_{l};Z):=\frac{1}{2n}\sum_{i=1}^{n}\left(\left\langle X_{i},M_{l}\right\rangle-y_{i}\right)^{2}$ and the empirical Euclidean gradient is $G_{l}:=\nabla\mathcal{L}_{n}(M_{l};Z)=\frac{1}{n}\sum_{i=1}^{n}\left(\left\langle X_{i},M_{l}\right\rangle-y_{i}\right)X_{i}$. The sensitivity of the $l$-th iteration is $\Delta_{l}:=\max_{{\rm neighbouring}(Z,Z^{\prime})}\left\|M_{l}-\eta\mathcal{P}_{\mathbb{T}_{l}}(G_{l}(Z))-\left[M_{l}-\eta\mathcal{P}_{\mathbb{T}_{l}}(G_{l}(Z^{\prime}))\right]\right\|_{F}$. A code sketch of one private iteration is given after Algorithm 2 below.

Algorithm 2 DP-RGrad for trace regression
Input: the loss function $\mathcal{L}$; the data set $\{(X_{i},y_{i})\}_{i=1}^{n}$; sensitivities $\{\Delta_{l}\}_{l\in[l^{*}]}$; DP-initialization $\widetilde{M}_{0}$; rank $r$; noise variance $\sigma_{\xi}^{2}$; privacy budget $\varepsilon>0$, $\delta\in(0,1)$.
Output: $(\varepsilon,\delta)$-differentially private estimate $\widetilde{M}_{l^{*}}$ for trace regression.
Initialization: $M_{0}\longleftarrow\widetilde{M}_{0}$.
for $l+1\in[l^{*}]$ do
$$M_{l+1}\longleftarrow\operatorname{SVD}_{r}\left(M_{l}-\eta_{l}\mathcal{P}_{\mathbb{T}_{l}}\left(G_{l}\right)+\mathcal{P}_{\mathbb{T}_{l}}N_{l}\right),$$
where $G_{l}$ is the empirical Euclidean gradient
$$G_{l}:=\nabla\mathcal{L}_{n}(M_{l};Z)=\frac{1}{n}\sum_{i=1}^{n}\left(\left\langle X_{i},M_{l}\right\rangle-y_{i}\right)X_{i},$$
and $N_{l}\in\mathbb{R}^{d_{1}\times d_{2}}$ has i.i.d. entries drawn from
$$\mathcal{N}\left(0,\frac{2\Delta_{l}^{2}l^{*2}}{\varepsilon^{2}}\log\left(\frac{1.25l^{*}}{\delta}\right)\right).$$
Return: $\widetilde{M}_{l^{*}}\longleftarrow M_{l^{*}}$
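A minimal sketch of one iteration of Algorithm 2 in the identity-covariance case $\Lambda_{i}=I$; the per-iteration noise scale `sigma_l` corresponds to $\Delta_{l}\,l^{*}\sqrt{2\log(1.25l^{*}/\delta)}/\varepsilon$ and is assumed to be supplied.

```python
import numpy as np

def dp_rgrad_step(M_l, X, y, eta, sigma_l, r, rng):
    """One DP Riemannian gradient step, i.e., the update (6)."""
    n = len(y)
    U = np.linalg.svd(M_l)[0][:, :r]                # left singular subspace
    V = np.linalg.svd(M_l.T)[0][:, :r]              # right singular subspace

    # empirical Euclidean gradient G_l = (1/n) sum_i (<X_i, M_l> - y_i) X_i
    resid = np.einsum('ijk,jk->i', X, M_l) - y
    G = np.einsum('i,ijk->jk', resid, X) / n

    def proj_T(Z):
        # projection onto the tangent space of the rank-r manifold at M_l
        return U @ (U.T @ Z) + (Z @ V) @ V.T - U @ (U.T @ Z @ V) @ V.T

    N = sigma_l * rng.standard_normal(M_l.shape)    # Gaussian mechanism noise
    Y = M_l - eta * proj_T(G) + proj_T(N)

    # retraction back to the rank-r manifold via truncated SVD
    Uy, sy, Vyt = np.linalg.svd(Y, full_matrices=False)
    return (Uy[:, :r] * sy[:r]) @ Vyt[:r, :]
```

Running this step for $l^{*}=O(\log n)$ iterations from the DP-initialization $\widetilde{M}_{0}$ reproduces the loop of Algorithm 2.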

Theorem 3 establishes the error bound for the estimator $\widetilde{M}_{l^{*}}$ produced by Algorithm 2. The proof of Theorem 3 is deferred to Appendix C.

Theorem 3.

Consider i.i.d. observations $Z=\{z_{1},\cdots,z_{n}\}$ drawn from the trace regression model (1) with true low-rank regression coefficient matrix $M\in\mathbb{M}_{r}$. Here $z_{i}:=(X_{i},y_{i})$ for $i=1,\cdots,n$, and we assume that $\{X_{i}\}_{i\in[n]}$ satisfy Assumption 1. Under Assumption 1 and the condition $(d_{1}+d_{2})>\log n$, suppose that Algorithm 2 takes in an $(\varepsilon,\delta)$-DP initialization such that, for some small constant $0<c_{0}<1$, $\|\widetilde{M}_{0}-M\|_{F}\leq\sqrt{2r}\|\widetilde{M}_{0}-M\|\leq c_{0}\sigma_{r}$, and the sensitivities $\Delta_{l}$ take the value

$$\Delta_{l}=C_{3}\frac{\eta}{n}(\sigma_{\xi}+\sigma_{r}\sqrt{C_{u}})\sqrt{C_{u}r(d_{1}+d_{2})\log n},\quad{\rm for\ all}\quad l=1,\cdots,l^{*},$$

for some absolute constant $C_{3}>0$. Then Algorithm 2 is $(2\varepsilon,2\delta)$-differentially private. Moreover, if the sample size satisfies

$$n\geq c_{4}\max\Bigg{\{}\underbrace{c_{1}r(d_{1}+d_{2})}_{n_{3}},\ \underbrace{\eta^{2}\kappa_{\xi}^{2}C_{u}r(d_{1}+d_{2})\log(d_{1}+d_{2})}_{n_{4}},\ \underbrace{\eta\sqrt{C_{u}}\left(\kappa_{\xi}+\sqrt{C_{u}}\right)r(d_{1}+d_{2})\log^{3/2}(n)\,\frac{\log^{1/2}\left(\frac{1.25\log(n)}{\delta}\right)}{\varepsilon}}_{n_{5}}\Bigg{\}},$$

for some small constant $0<c_{4}<1$, the number of iterations is $l^{*}=O(\log n)$, and the step size satisfies $0<\eta<1$, then the output $\widetilde{M}_{l^{*}}$ of Algorithm 2 satisfies

$$\left\|\widetilde{M}_{l^{*}}-M\right\|_{F}\leq\underbrace{C_{4}\sigma_{\xi}\sqrt{C_{u}}\sqrt{\frac{r(d_{1}+d_{2})}{n}}\log^{1/2}(d_{1}+d_{2})}_{u_{1}}+\underbrace{C_{4}\sqrt{C_{u}}(\sigma_{\xi}+\sigma_{r}\sqrt{C_{u}})\frac{r(d_{1}+d_{2})}{n\varepsilon}\log^{3/2}n\,\log^{1/2}\left(\frac{1.25\log(n)}{\delta}\right)}_{u_{2}},$$

with probability at least

$$1-\widetilde{c}_{2}\exp\left(-\widetilde{c}_{3}r(d_{1}+d_{2})\right)-(d_{1}+d_{2})^{-10}-n^{-9}-e^{-d_{1}}-e^{-d_{2}}-10^{-20r}-\left((d_{1}+d_{2})^{-10}+\exp(-(d_{1}+d_{2}))+n^{-9}+\exp\left(-10C_{u}(d_{1}+d_{2})\right)n^{-9}\right)\log n.$$

According to Theorem 3, the upper bound on $\left\|\widetilde{M}_{l^{*}}-M\right\|_{F}$ decomposes into the statistical error $u_{1}$ and the cost of privacy $u_{2}$. The term $u_{1}$ matches the optimal rate $l_{1}\sim\sigma_{\xi}\sqrt{\frac{r(d_{1}\vee d_{2})}{n}}$ up to logarithmic factors. However, the term $u_{2}$ differs from the theoretical lower bound on the cost of privacy, $l_{2}\sim\sigma_{\xi}\frac{r(d_{1}+d_{2})}{n\varepsilon}$, by a non-trivial factor $\sigma_{r}$, apart from logarithmic factors. In conclusion, Theorem 3 shows that once the sample size satisfies $n\gtrsim\widetilde{O}\left(\left(\kappa_{\xi}^{2}\vee\kappa_{\xi}\right)r(d_{1}\vee d_{2})\right)$, the estimator $\widetilde{M}_{l^{*}}$ given by Algorithm 2 attains the near-optimal convergence rate

$$\widetilde{O}_{p}\left(\sigma_{\xi}\sqrt{\frac{r(d_{1}+d_{2})}{n}}+(\sigma_{\xi}+\sigma_{r})\frac{r(d_{1}+d_{2})}{n\varepsilon}\right). \qquad (7)$$

The sample size requirement of Theorem 3 can be explained as follows. The sample size $n_{3}$ guarantees that the RIP condition stated in Lemma 1 holds with high probability. The sample size $n_{4}$ is necessary to control the statistical error contributed by $\sum_{i\in[n]}\xi_{i}X_{i}$ in each iteration, where $\xi_{i}$ is the model noise. The sample size $n_{5}$ arises from controlling the cost of privacy due to $\mathcal{P}_{\mathbb{T}_{l}}N_{l}$ in each iteration.

5 Discussion

In this section, we discuss the non-trivial gap $\sigma_{r}$ between $u_{2}\sim(\sigma_{\xi}+\sigma_{r})\frac{r(d_{1}+d_{2})}{n\varepsilon}$ and $l_{2}\sim\sigma_{\xi}\,\frac{r(d_{1}\vee d_{2})}{n\varepsilon}$. Note that $l_{2}$ is free of $\sigma_{r}$, while $u_{2}$ contains the factor $\sigma_{r}$ arising from the sensitivities

$$\Delta_{l}\asymp\frac{\eta}{n}(\sigma_{\xi}+\sigma_{r}\sqrt{C_{u}})\sqrt{C_{u}r(d_{1}+d_{2})\log n}\quad{\rm for}\quad l=1,\cdots,l^{*}.$$

The factor $\sigma_{r}\sqrt{C_{u}}$ in $\Delta_{l}$ arises from $\left\|\left<M_{l}-M,X_{i}\right>\right\|_{\psi_{2}}\leq\sqrt{C_{u}}\left\|M_{l}-M\right\|_{F}$, as elaborated in (S.12). Here, $\|\cdot\|_{\psi_{2}}$ denotes the sub-Gaussian norm. Therefore, we cannot get rid of the factor $\sigma_{r}$ as long as the measurement matrices $\{X_{i}\}_{i\in[n]}$ are themselves subject to differential privacy. In many real applications, however, the measurement matrices $\{X_{i}\}_{i\in[n]}$ are fixed with deterministic designs and are published with little concern for their privacy. Although the exposure of $\{X_{i}\}_{i\in[n]}$ alone reveals no information about $M$, the privacy of $M$ suffers leakage once the public has access to the joint observations $\{(X_{i},y_{i})\}_{i\in[n]}$. We therefore introduce the following notion of privacy for neighbouring datasets sharing the same measurement matrices.

Definition 1 (weak $(\varepsilon,\delta)$-differential privacy).

An algorithm $A$ that maps $Z$ into $\mathbb{R}^{d_{1}\times d_{2}}$ is weakly $(\varepsilon,\delta)$-differentially private over the dataset $Z$ if $\mathbb{P}\big{(}A(Z)\in\mathcal{Q}\big{)}\leq e^{\varepsilon}\mathbb{P}\big{(}A(Z^{\prime})\in\mathcal{Q}\big{)}+\delta$ for all neighbouring datasets $Z,Z^{\prime}$ sharing the same measurements $X$ and all subsets $\mathcal{Q}\subset\mathbb{R}^{d_{1}\times d_{2}}$.

In Theorem 7 (Appendix F), we show that once $n\gtrsim\widetilde{O}\left(\left(\kappa_{\xi}^{2}\vee\kappa_{\xi}\right)r(d_{1}\vee d_{2})\right)$, the estimator $\widetilde{M}_{l^{*}}$ given by Algorithm 2 attains the optimal convergence rate $\widetilde{O}_{p}\left(\sigma_{\xi}\sqrt{\frac{r(d_{1}+d_{2})}{n}}+\sigma_{\xi}\frac{r(d_{1}+d_{2})}{n\varepsilon}\right)$ in the sense of weak differential privacy. The analogues of Theorem 1, Corollary 1, and Theorem 3 in the sense of weak differential privacy can be found as Theorem 6, Corollary 2, and Theorem 7 in Appendix F. It would be interesting to explore in future work whether the score attack argument or DP-Fano's lemma can be generalized to capture the non-trivial factor $\sigma_{r}$.

Acknowledgement

The author would like to express sincere gratitude to the Hong Kong PhD Fellowship Scheme and Hong Kong RGC GRF Grant 16301622 for providing financial support for this research. The author also wishes to acknowledge the invaluable guidance provided by Prof. Dong Xia throughout the research process. Additionally, the author would like to extend heartfelt thanks to Mr. Zetao Fei for his constructive criticism during the revision of this paper. Their contributions have been instrumental in the successful completion of this research.

Appendix A Proof of Theorem 1

The proof of Theorem 1 consists of four parts. In Part A.1, we list several existing results that are useful in the later proofs. In Part A.2, Lemma 2 serves as the main technique for deriving the sensitivity $\Delta^{(1)}$. Part A.3 derives the sensitivity $\Delta^{(2)}$. Part A.4 establishes the upper bounds on $\|\widetilde{M}_{0}-M\|$ and $\|\widetilde{M}_{0}-M\|_{F}$ based on $\Delta^{(1)}$ and $\Delta^{(2)}$.

A.1 Part 1

The following Theorem 4, taken from Proposition 22 of Koltchinskii (2011), will be frequently used to establish upper bounds on the spectral norm of a sum of independent random matrices.

Theorem 4 (Bernstein’s inequality, Koltchinskii (2011)).

Let $B_{1},\cdots,B_{n}$ be independent $d_{1}\times d_{2}$ matrices such that for some $\alpha\geq 1$ and all $i\in[n]$,

$$\mathbb{E}B_{i}=0,\quad\left\|\Lambda_{\max}\left(B_{i}\right)\right\|_{\Psi_{\alpha}}=:K<+\infty.$$

Let

$$S^{2}:=\max\left\{\Lambda_{\max}\left(\sum_{i=1}^{n}\mathbb{E}B_{i}B_{i}^{\top}\right)/n,\ \Lambda_{\max}\left(\sum_{i=1}^{n}\mathbb{E}B_{i}^{\top}B_{i}\right)/n\right\}.$$

Then, for some constant C>0C>0 and for all t>0t>0,

$$\mathbb{P}\left(\left\|\frac{1}{n}\sum_{i=1}^{n}B_{i}\right\|\geq CS\sqrt{\frac{t+\log(d_{1}+d_{2})}{n}}+CK\log^{1/\alpha}\left(\frac{K}{S}\right)\frac{t+\log(d_{1}+d_{2})}{n}\right)\leq\exp(-t).$$

Theorem 4 can be applied to bound the spectral norm of $\Delta:=\widehat{L}-M$. An existing result for the case of heavy-tailed noise can be found in Theorem 6 of Shen et al. (2023). Adapting that result to the case of Gaussian noise, we have that for some absolute constant $C_{0}>0$,

$$\|\Delta\|=\|\widehat{L}-M\|\leq C_{0}\sqrt{C_{l}^{-1}}\left(\sigma_{\xi}+\sqrt{C_{u}}\sqrt{r}\sigma_{1}\right)\sqrt{\frac{(d_{1}\vee d_{2})\log(d_{1}+d_{2})}{n}}, \qquad{\rm(S.1)}$$

with probability at least $1-(d_{1}+d_{2})^{-10}$. The following lemma, which originates from Lemma 18 of Shen et al. (2023), is useful for analyzing the matrix perturbation induced by singular value decomposition.

Lemma 4 (Matrix perturbation, Shen et al. (2023)).

Let $M\in\mathbb{R}^{d_{1}\times d_{2}}$ be a rank-$r$ matrix with singular value decomposition $M=U\Sigma V^{\top}$, where $\Sigma=\operatorname{diag}\left\{\sigma_{1},\sigma_{2},\cdots,\sigma_{r}\right\}$ with $\sigma_{1}\geq\sigma_{2}\geq\cdots\geq\sigma_{r}>0$. For any $\widehat{M}\in\mathbb{R}^{d_{1}\times d_{2}}$ satisfying $\|\widehat{M}-M\|_{\mathrm{F}}<\sigma_{r}/8$, we have

$$\left\|\operatorname{SVD}_{r}(\widehat{M})-M\right\|\leq\|\widehat{M}-M\|+40\frac{\|\widehat{M}-M\|^{2}}{\sigma_{r}},\qquad\left\|\operatorname{SVD}_{r}(\widehat{M})-M\right\|_{\mathrm{F}}\leq\|\widehat{M}-M\|_{\mathrm{F}}+40\frac{\|\widehat{M}-M\|\,\|\widehat{M}-M\|_{\mathrm{F}}}{\sigma_{r}},$$

and

$$\left\|\widehat{U}\widehat{U}^{\top}-UU^{\top}\right\|\leq\frac{8}{\sigma_{r}}\|\widehat{M}-M\|,\quad\left\|\widehat{V}\widehat{V}^{\top}-VV^{\top}\right\|\leq\frac{8}{\sigma_{r}}\|\widehat{M}-M\|,$$

where the leading $r$ left singular vectors of $\widehat{M}$ are given by the columns of $\widehat{U}\in\mathbb{R}^{d_{1}\times r}$ and the leading $r$ right singular vectors of $\widehat{M}$ are given by the columns of $\widehat{V}\in\mathbb{R}^{d_{2}\times r}$.

A.2 Part 2

The second part aims to derive the sensitivity

$$\Delta^{(1)}:=\max_{i\in[n]}\left(\|\widehat{U}\widehat{U}^{\top}-\widehat{U}^{(i)}\widehat{U}^{(i)\top}\|_{F}\vee\|\widehat{V}\widehat{V}^{\top}-\widehat{V}^{(i)}\widehat{V}^{(i)\top}\|_{F}\right).$$

Before moving on, we present Lemma 5, which provides conclusions on $\Delta$ and $\Delta^{(i)}$ that are frequently used in the proof below. The proof of Lemma 5 can be found in Appendix E.4.

Lemma 5.

Under model (1), Assumption 1, and the condition $n\geq\frac{\log^{2}n}{(d_{1}\vee d_{2})\log(d_{1}+d_{2})}$, there exist absolute constants $C_{0},C_{1}>0$ such that the event

$$\mathcal{E}_{*}:=\Bigg{\{}\max_{i\in[n]}\big{\|}\Delta-\Delta^{(i)}\big{\|}\leq C_{0}\,n^{-1}\sqrt{C_{l}^{-1}}\left(\sqrt{C_{u}r}\sigma_{1}+\sigma_{\xi}\right)\log n\Bigg{\}}\bigcap\Bigg{\{}\|\Delta\|+\max_{i\in[n]}\|\Delta^{(i)}\|\leq C_{0}\sqrt{C_{l}^{-1}}\left(\sigma_{\xi}+\sqrt{C_{u}}\sqrt{r}\sigma_{1}\right)\sqrt{\frac{(d_{1}\vee d_{2})\log(d_{1}+d_{2})}{n}}\Bigg{\}}$$

holds with probability at least $1-(d_{1}+d_{2})^{-10}-n^{-9}$. Conditioned on the event $\mathcal{E}_{*}$, if the sample size satisfies

$$n\geq C_{1}C_{l}^{-1}r\left(\frac{\sigma_{\xi}+\sqrt{C_{u}r}\sigma_{1}}{\sigma_{r}}\right)^{2}(d_{1}\vee d_{2})\log(d_{1}+d_{2}), \qquad{\rm(S.2)}$$

we have

$$\|\Delta_{*}\|\vee\max_{i\in[n]}\|\Delta_{*}^{(i)}\|=\|\Delta\|\vee\max_{i\in[n]}\|\Delta^{(i)}\|\leq\frac{\sigma_{r}}{8\sqrt{2r}}<\frac{\sigma_{r}}{5+\delta}<\frac{\sigma_{r}}{2}, \qquad{\rm(S.3)}$$

and

$$\|\Delta_{*}\|_{F}\vee\max_{i\in[n]}\|\Delta_{*}^{(i)}\|_{F}=\max_{i\in[n]}\|\Delta\|_{F}\vee\|\Delta^{(i)}\|_{F}\leq\frac{\sigma_{r}}{8},$$

for some constant $\delta>0$, where $\Delta_{*}^{(i)}$ is the symmetric dilation of $\Delta^{(i)}:=\widehat{L}^{(i)}-M$.

The following analysis proceeds under the sample size condition (S.2) and is mainly conditioned on the event $\mathcal{E}_{*}$, which holds with probability at least $1-(d_{1}+d_{2})^{-10}-n^{-9}$.

Step 1: expansion. Conditioned on $\mathcal{E}_{*}$, we can apply Lemma 2 to $\Delta_{*}$ and $\Delta_{*}^{(i)}$ and get

$$\begin{pmatrix}\widehat{U}\widehat{U}^{\top}-\widehat{U}^{(i)}\widehat{U}^{(i)\top}&0\\ 0&\widehat{V}\widehat{V}^{\top}-\widehat{V}^{(i)}\widehat{V}^{(i)\top}\end{pmatrix}=\sum_{k\geq 1}\mathcal{S}_{M_{*},k}(\Delta_{*})-\sum_{k\geq 1}\mathcal{S}_{M_{*},k}(\Delta^{(i)}_{*}).$$

Our goal is to bound $\Delta^{(1)}$, which satisfies

$$\Delta^{(1)}\leq\max_{i\in[n]}\left(\|\widehat{U}\widehat{U}^{\top}-\widehat{U}^{(i)}\widehat{U}^{(i)\top}\|_{F}+\|\widehat{V}\widehat{V}^{\top}-\widehat{V}^{(i)}\widehat{V}^{(i)\top}\|_{F}\right)\leq\max_{i\in[n]}\left(\left\|\mathcal{S}_{M_{*},1}(\Delta_{*})-\mathcal{S}_{M_{*},1}(\Delta^{(i)}_{*})\right\|_{F}+\left\|\sum_{k\geq 2}\mathcal{S}_{M_{*},k}(\Delta_{*})-\sum_{k\geq 2}\mathcal{S}_{M_{*},k}(\Delta^{(i)}_{*})\right\|_{F}\right).$$

Step 2: bounding the first order term. By the definition of 𝒮M,1(Δ)\mathcal{S}_{M_{*},1}(\Delta_{*}) and 𝒮M,1(Δ(i))\mathcal{S}_{M_{*},1}(\Delta^{(i)}_{*}),

maxi[n]𝒮M,1(Δ)𝒮M,1(Δ(i))=maxi[n]Q1(ΔΔ(i))Q+Q(ΔΔ(i))Q1\displaystyle\max_{i\in[n]}\|\mathcal{S}_{M_{*},1}(\Delta_{*})-\mathcal{S}_{M_{*},1}(\Delta^{(i)}_{*})\|=\max_{i\in[n]}\|Q^{-1}(\Delta-\Delta^{(i)})^{\top}Q_{\perp}+Q_{\perp}(\Delta-\Delta^{(i)})^{\top}Q^{-1}\| (S.4)
2rσrmaxi[n]ΔΔ(i)C4rnσrCl1(Curσ1+σξ)logn,\displaystyle\leq\frac{2\sqrt{r}}{\sigma_{r}}\max_{i\in[n]}\|\Delta-\Delta^{(i)}\|\leq C_{4}\frac{\sqrt{r}}{n\sigma_{r}}\sqrt{C_{l}^{-1}}\left(\sqrt{C_{u}r}\sigma_{1}+\sigma_{\xi}\right)\log n,

conditioned on the event \mathcal{E}_{*}, for some absolute constant C4>0C_{4}>0.

Step 3: bounding the higher order terms. Let IkI_{k} be the index set for terms in 𝒮M,k\mathcal{S}_{M_{*},k}

Ik={𝐬:𝐬=(s1,,sk+1),m=1k+1sm=k,sm0m[k+1]},I_{k}=\left\{\mathbf{s}:\mathbf{s}=\left(s_{1},\ldots,s_{k+1}\right),\sum_{m=1}^{k+1}s_{m}=k,s_{m}\geq 0\quad\forall m\in[k+1]\right\},

with the cardinality |Ik|=(2kk)\left|I_{k}\right|=\binom{2k}{k}. We define

\mathcal{T}_{M_{*},k,\mathbf{s},l}\left(\Delta_{*}-\Delta_{*}^{(i)}\right):=Q^{-s_{1}}\Delta_{*}^{(i)}Q^{-s_{2}}\cdots Q^{-s_{l}}\left(\Delta_{*}-\Delta_{*}^{(i)}\right)Q^{-s_{l+1}}\cdots Q^{-s_{k}}\Delta_{*}Q^{-s_{k+1}},

for k2,𝐬=(s1,,sk+1)Ikk\geq 2,\mathbf{s}=\left(s_{1},\cdots,s_{k+1}\right)\in I_{k} and l[k]l\in[k]. Since |Ik|=(2kk)\left|I_{k}\right|=\binom{2k}{k}, the higher order terms maxi[n]k2𝒮M,k(Δ)k2𝒮M,k(Δ(i))F\max_{i\in[n]}\left\|\sum_{k\geq 2}\mathcal{S}_{M_{*},k}(\Delta_{*})-\sum_{k\geq 2}\mathcal{S}_{M_{*},k}\left(\Delta_{*}^{(i)}\right)\right\|_{\mathrm{F}}

=maxi[n]k2𝐬Ikl[k]𝒯M,k,𝐬,l(ΔΔ(i))Fmaxi[n]k2(2kk)l[k]𝒯M,k,𝐬,l(ΔΔ(i))F.\displaystyle=\max_{i\in[n]}\left\|\sum_{k\geq 2}\sum_{\mathbf{s}\in I_{k}}\sum_{l\in[k]}\mathcal{T}_{M_{*},k,\mathbf{s},l}\left(\Delta_{*}-\Delta_{*}^{(i)}\right)\right\|_{\mathrm{F}}\leq\max_{i\in[n]}\sum_{k\geq 2}\binom{2k}{k}\sum_{l\in[k]}\left\|\mathcal{T}_{M_{*},k,\mathbf{s},l}\left(\Delta_{*}-\Delta_{*}^{(i)}\right)\right\|_{\mathrm{F}}. (S.5)

It is sufficient to find an upper bound of \left\|\mathcal{T}_{M_{*},k,\mathbf{s},l}\left(\Delta_{*}-\Delta_{*}^{(i)}\right)\right\|_{\mathrm{F}}. Denote

Dmax:=C1Cl1(σξ+Curσ1)(d1d2)log(d1+d2)n,D_{\max}:=C_{1}\sqrt{C_{l}^{-1}}\left(\sigma_{\xi}+\sqrt{C_{u}}\sqrt{r}\sigma_{1}\right)\sqrt{\frac{(d_{1}\vee d_{2})\log(d_{1}+d_{2})}{n}},

which appeared in the event \mathcal{E}_{*} as an upper bound of Δ+maxi[n]Δ(i)\|\Delta\|+\max_{i\in[n]}\left\|\Delta^{(i)}\right\|.

Conditioned on \mathcal{E}_{*}, for all i[n]i\in[n], k2,𝐬Ikk\geq 2,\mathbf{s}\in I_{k} and l[k]l\in[k],

\displaystyle\left\|\mathcal{T}_{M_{*},k,\mathbf{s},l}\left(\Delta_{*}-\Delta_{*}^{(i)}\right)\right\|_{\mathrm{F}}\leq\sqrt{2r}\left\|\mathcal{T}_{M_{*},k,\mathbf{s},l}\left(\Delta_{*}-\Delta_{*}^{(i)}\right)\right\|\leq\frac{\sqrt{2r}}{\sigma_{r}}\left\|\Delta_{*}-\Delta_{*}^{(i)}\right\|\left(\frac{D_{\max}}{\sigma_{r}}\right)^{k-1},

where the first inequality is because \mathcal{T}_{M_{*},k,\mathbf{s},l}\left(\Delta_{*}-\Delta_{*}^{(i)}\right) has rank at most 2r. Let a be the function defined by a(k)=\binom{2k}{k}k; then a(2)=12 and \frac{a(k+1)}{a(k)}\leq 5 for all integers k\geq 2, so that

maxi[n]k2(2kk)l[k]𝒯M,k,𝐬,l(ΔΔ(i))F\displaystyle\max_{i\in[n]}\sum_{k\geq 2}\binom{2k}{k}\sum_{l\in[k]}\left\|\mathcal{T}_{M_{*},k,\mathbf{s},l}\left(\Delta_{*}-\Delta_{*}^{(i)}\right)\right\|_{\mathrm{F}} (S.6)
\displaystyle\leq\max_{i\in[n]}\binom{4}{2}\sum_{k\geq 0}5^{k}\left(\frac{D_{\max}}{\sigma_{r}}\right)^{k}\frac{\sqrt{2r}}{\sigma_{r}}\left\|\Delta_{*}-\Delta_{*}^{(i)}\right\|\left(\frac{D_{\max}}{\sigma_{r}}\right)
maxi[n]a(2)2rσrΔΔ(i)k0(55+δ)k(Dmaxσr)\displaystyle\leq\max_{i\in[n]}a(2)\frac{\sqrt{2r}}{\sigma_{r}}\left\|\Delta_{*}-\Delta_{*}^{(i)}\right\|\sum_{k\geq 0}\left(\frac{5}{5+\delta}\right)^{k}\left(\frac{D_{\max}}{\sigma_{r}}\right)
maxi[n]a(2)(5+δδ)(Dmaxσr)2rσrΔΔ(i),\displaystyle\leq\max_{i\in[n]}a(2)\left(\frac{5+\delta}{\delta}\right)\left(\frac{D_{\max}}{\sigma_{r}}\right)\frac{\sqrt{2r}}{\sigma_{r}}\left\|\Delta_{*}-\Delta_{*}^{(i)}\right\|,

where the last step is due to (S.3), which is guaranteed by the sample size condition (S.2) together with the event \mathcal{E}_{*}. Combining (S.5) and (S.6), since ΔΔ(i)=ΔΔ(i)\left\|\Delta_{*}-\Delta_{*}^{(i)}\right\|=\left\|\Delta-\Delta^{(i)}\right\|, conditioned on the event \mathcal{E}_{*}, we have

maxi[n]k2𝒮M,k(Δ)k2𝒮M,k(Δ(i))FC4(12δ)2rnCl1(Curσ1+σξσr)logn,\displaystyle\max_{i\in[n]}\left\|\sum_{k\geq 2}\mathcal{S}_{M_{*},k}(\Delta_{*})-\sum_{k\geq 2}\mathcal{S}_{M_{*},k}\left(\Delta_{*}^{(i)}\right)\right\|_{\mathrm{F}}\leq C_{4}\left(\frac{12}{\delta}\right)\frac{\sqrt{2r}}{n}\sqrt{C_{l}^{-1}}\left(\frac{\sqrt{C_{u}r}\,\sigma_{1}+\sigma_{\xi}}{\sigma_{r}}\right)\log n,

for some absolute constant C_{4}>0. In conclusion, conditioned on \mathcal{E}_{*}, when the sample size satisfies n\geq C_{1}C_{l}^{-1}r\left(\frac{\sigma_{\xi}+\sqrt{C_{u}r}\sigma_{1}}{\sigma_{r}}\right)^{2}(d_{1}\vee d_{2})\log(d_{1}+d_{2}) for some absolute constant C_{1}>0, we have \Delta^{(1)}\leq C_{4}\frac{\sqrt{r}}{n}\sqrt{C_{l}^{-1}}\left(\frac{\sqrt{C_{u}r}\,\sigma_{1}+\sigma_{\xi}}{\sigma_{r}}\right)\log n, for some absolute constant C_{4}>0.

Let E_{U}\in\mathbb{R}^{d_{1}\times d_{1}} be a symmetric matrix whose entries (E_{U})_{ij} are drawn i.i.d. from \mathcal{N}(0,\frac{18\Delta^{(1)^{2}}}{\varepsilon^{2}}\log(\frac{3.75}{\delta})) for 1\leq i\leq j\leq d_{1}. Then, conditioned on the event \mathcal{E}_{*} and (S.2), for some absolute constant \widetilde{C}_{4}>0, \|E_{U}\|\leq\widetilde{C}_{4}\frac{\sqrt{rd_{1}}}{n\varepsilon}\sqrt{C_{l}^{-1}}\left(\frac{\sqrt{C_{u}r}\,\sigma_{1}+\sigma_{\xi}}{\sigma_{r}}\right)\log n\log^{\frac{1}{2}}(\frac{3.75}{\delta}), with probability at least 1-e^{-d_{1}}. Moreover, by the Gaussian mechanism, \widehat{U}\widehat{U}^{\top}+E_{U} is (\varepsilon/3,\delta/3)-DP, and thus \widetilde{U} is also (\varepsilon/3,\delta/3)-DP thanks to the post-processing property of differential privacy. Furthermore, by the Davis–Kahan theorem, for some absolute constant \tilde{c}_{0}>0,

U~U~UU\displaystyle\left\|\widetilde{U}\widetilde{U}^{\top}-UU^{\top}\right\| 1(U^U^UU+EU)(a)1((8σrL^M1)+EU)\displaystyle\leq 1\wedge\left(\left\|\widehat{U}\widehat{U}^{\top}-UU^{\top}\right\|+\|E_{U}\|\right)\stackrel{{\scriptstyle(a)}}{{\leq}}1\wedge\left(\left(\frac{8}{\sigma_{r}}\|\widehat{L}-M\|\wedge 1\right)+\|E_{U}\|\right) (S.7)
1c0~(Cl1(Curσ1+σξσr)(d1d2)log(d1+d2)n\displaystyle\leq 1\wedge\tilde{c_{0}}\Bigg{(}\sqrt{C_{l}^{-1}}\left(\frac{\sqrt{C_{u}r}\,\sigma_{1}+\sigma_{\xi}}{\sigma_{r}}\right)\sqrt{\frac{(d_{1}\vee d_{2})\log(d_{1}+d_{2})}{n}}
+Cl1(Curσ1+σξσr)rd1nεlognlog12(3.75δ)),\displaystyle\quad\quad\quad\quad+\sqrt{C_{l}^{-1}}\left(\frac{\sqrt{C_{u}r}\,\sigma_{1}+\sigma_{\xi}}{\sigma_{r}}\right)\frac{\sqrt{rd_{1}}}{n\varepsilon}\log n\log^{\frac{1}{2}}(\frac{3.75}{\delta})\Bigg{)},

where we apply Lemma 4 in step (a)(a).
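For concreteness, the subspace privatization above can be sketched in a few lines of numpy. This is only an illustrative implementation of the Gaussian mechanism plus post-processing used in this part; the function name and interface are hypothetical, and Delta1 stands for the sensitivity bound on \Delta^{(1)} derived above.

import numpy as np

def private_left_subspace(L_hat, r, Delta1, eps, delta, rng):
    # (eps/3, delta/3)-DP release of the leading-r left singular subspace.
    d1 = L_hat.shape[0]
    U_hat = np.linalg.svd(L_hat, full_matrices=False)[0][:, :r]
    # Entrywise noise sd = sqrt(18 * log(3.75/delta)) * Delta1 / eps,
    # matching the variance 18 * Delta1^2 * log(3.75/delta) / eps^2 above.
    sd = np.sqrt(18.0 * np.log(3.75 / delta)) * Delta1 / eps
    W = rng.normal(scale=sd, size=(d1, d1))
    E_U = np.triu(W) + np.triu(W, 1).T          # symmetric Gaussian noise
    P_noisy = U_hat @ U_hat.T + E_U             # noisy private projector
    vals, vecs = np.linalg.eigh(P_noisy)
    return vecs[:, np.argsort(vals)[::-1][:r]]  # post-processing: top-r eigenvectors

The same routine with d_{2} in place of d_{1} yields \widetilde{V} below.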

Let E_{V}\in\mathbb{R}^{d_{2}\times d_{2}} be a symmetric matrix whose entries (E_{V})_{ij} are drawn i.i.d. from \mathcal{N}(0,\frac{18\Delta^{(1)^{2}}}{\varepsilon^{2}}\log(\frac{3.75}{\delta})) for 1\leq i\leq j\leq d_{2}. Then, for some absolute constant \widetilde{C}_{4}>0, conditioned on the event \mathcal{E}_{*} and (S.2), we have \|E_{V}\|\leq\widetilde{C}_{4}\frac{\sqrt{rd_{2}}}{n\varepsilon}\sqrt{C_{l}^{-1}}\left(\frac{\sqrt{C_{u}r}\,\sigma_{1}+\sigma_{\xi}}{\sigma_{r}}\right)\log n\log^{\frac{1}{2}}(\frac{3.75}{\delta}), with probability at least 1-e^{-d_{2}}. Moreover, by the Gaussian mechanism, \widehat{V}\widehat{V}^{\top}+E_{V} is (\varepsilon/3,\delta/3)-DP, and \widetilde{V} is also (\varepsilon/3,\delta/3)-DP thanks to the post-processing property of differential privacy. Furthermore, by the Davis–Kahan theorem, for some absolute constant \tilde{c}_{0}>0,

V~V~VV\displaystyle\left\|\widetilde{V}\widetilde{V}^{\top}-VV^{\top}\right\| 1c0~(Cl1(Curσ1+σξσr)(d1d2)log(d1+d2)n\displaystyle\leq 1\wedge\tilde{c_{0}}\Bigg{(}\sqrt{C_{l}^{-1}}\left(\frac{\sqrt{C_{u}r}\,\sigma_{1}+\sigma_{\xi}}{\sigma_{r}}\right)\sqrt{\frac{(d_{1}\vee d_{2})\log(d_{1}+d_{2})}{n}} (S.8)
+Cl1(Curσ1+σξσr)rd2nεlognlog12(3.75δ)).\displaystyle\quad\quad\quad\quad+\sqrt{C_{l}^{-1}}\left(\frac{\sqrt{C_{u}r}\,\sigma_{1}+\sigma_{\xi}}{\sigma_{r}}\right)\frac{\sqrt{rd_{2}}}{n\varepsilon}\log n\log^{\frac{1}{2}}(\frac{3.75}{\delta})\Bigg{)}.

A.3 Part 3

Given the (ε/3,δ/3)(\varepsilon/3,\delta/3)-DP singular vectors U~𝕆d1×r\widetilde{U}\in\mathbb{O}^{d_{1}\times r} and V~𝕆d2×r\widetilde{V}\in\mathbb{O}^{d_{2}\times r} obtained in Part A.2, we derive the sensitivity Δ(2):=maxi[n]U~(L^L^(i))V~F.\Delta^{(2)}:=\max_{i\in[n]}\left\|\widetilde{U}^{\top}\left(\widehat{L}-\widehat{L}^{(i)}\right)\widetilde{V}\right\|_{F}. Conditioned on the event \mathcal{E}_{*},

Δ(2)\displaystyle\Delta^{(2)} :=maxi[n]U~(L^L^(i))V~Fmaxi[n]rL^L^(i)\displaystyle:=\max_{i\in[n]}\left\|\widetilde{U}^{\top}\left(\widehat{L}-\widehat{L}^{(i)}\right)\widetilde{V}\right\|_{F}\leq\max_{i\in[n]}\sqrt{r}\left\|\widehat{L}-\widehat{L}^{(i)}\right\|
=maxi[n]rL^M+ML^(i)=maxi[n]rΔΔ(i)\displaystyle=\max_{i\in[n]}\sqrt{r}\left\|\widehat{L}-M+M-\widehat{L}^{(i)}\right\|=\max_{i\in[n]}\sqrt{r}\left\|\Delta-\Delta^{(i)}\right\|
C3n1Cl1r(Curσ1+σξ)logn.\displaystyle\leq C_{3}\cdot n^{-1}\sqrt{C_{l}^{-1}}\sqrt{r}\left(\sqrt{C_{u}r}\sigma_{1}+\sigma_{\xi}\right)\log n.

Let E_{\Sigma} be an r\times r matrix with entries drawn i.i.d. from \mathcal{N}(0,18\Delta^{(2)^{2}}\log(\frac{3.75}{\delta})/\varepsilon^{2}); then

EΣC~4rnεCl1(Curσ1+σξ)lognlog12(3.75δ),\|E_{\Sigma}\|\leq\widetilde{C}_{4}\cdot\frac{r}{n\varepsilon}\sqrt{C_{l}^{-1}}\left(\sqrt{C_{u}r}\sigma_{1}+\sigma_{\xi}\right)\log n\log^{\frac{1}{2}}\left(\frac{3.75}{\delta}\right), (S.9)

for some absolute constant \widetilde{C}_{4}>0, with probability at least 1-10^{-20r}. Moreover, by the Gaussian mechanism, \widetilde{\Sigma}=\widetilde{U}^{\top}\widehat{L}\widetilde{V}+E_{\Sigma} is (\varepsilon/3,\delta/3)-differentially private. Thanks to the composition property of differential privacy, the output of Algorithm 1, \widetilde{M}_{0}=\widetilde{U}\widetilde{\Sigma}\widetilde{V}^{\top}, is (\varepsilon,\delta)-differentially private.
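The remaining step of Part A.3 admits an equally short sketch under the same caveats (hypothetical names; Delta2 stands for the bound on \Delta^{(2)} just derived):

import numpy as np

def private_initialization(L_hat, U_tilde, V_tilde, Delta2, eps, delta, rng):
    # Privatize the r x r core matrix and assemble the (eps, delta)-DP output.
    r = U_tilde.shape[1]
    sd = np.sqrt(18.0 * np.log(3.75 / delta)) * Delta2 / eps
    E_Sigma = rng.normal(scale=sd, size=(r, r))
    Sigma_tilde = U_tilde.T @ L_hat @ V_tilde + E_Sigma  # (eps/3, delta/3)-DP core
    return U_tilde @ Sigma_tilde @ V_tilde.T             # (eps, delta)-DP by composition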

A.4 Part 4

In this part, we derive the upper bound of \|\widetilde{M}_{0}-M\|. Note that \widetilde{M}_{0}-M=\widetilde{U}\widetilde{\Sigma}\widetilde{V}^{\top}-U\Sigma V^{\top} is a d_{1}\times d_{2} matrix of rank at most 2r and that

\|\widetilde{M}_{0}-M\|=\left\|\widetilde{U}\widetilde{\Sigma}\widetilde{V}^{\top}-U\Sigma V^{\top}\right\|=\left\|\widetilde{U}(\widetilde{U}^{\top}\widehat{L}\widetilde{V}+E_{\Sigma})\widetilde{V}^{\top}-UU^{\top}MVV^{\top}\right\|.

Since

U~(U~L^V~+EΣ)V~UUMVVU~U~L^V~V~UUMVV+U~EΣV~\displaystyle\left\|\widetilde{U}(\widetilde{U}^{\top}\widehat{L}\widetilde{V}+E_{\Sigma})\widetilde{V}^{\top}-UU^{\top}MVV^{\top}\right\|\leq\left\|\widetilde{U}\widetilde{U}^{\top}\widehat{L}\widetilde{V}\widetilde{V}^{\top}-UU^{\top}MVV^{\top}\right\|+\left\|\widetilde{U}E_{\Sigma}\widetilde{V}^{\top}\right\| (S.10)
(U~U~UU)L^V~V~+UU(L^M)V~V~+UUM(V~V~VV)+U~EΣV~\displaystyle\leq\left\|\left(\widetilde{U}\widetilde{U}^{\top}-UU^{\top}\right)\widehat{L}\widetilde{V}\widetilde{V}^{\top}\right\|+\left\|UU^{\top}\left(\widehat{L}-M\right)\widetilde{V}\widetilde{V}^{\top}\right\|+\left\|UU^{\top}M\left(\widetilde{V}\widetilde{V}^{\top}-VV^{\top}\right)\right\|+\left\|\widetilde{U}E_{\Sigma}\widetilde{V}^{\top}\right\|
U~U~UUL^+L^M+MV~V~VV+EΣ\displaystyle\leq\left\|\widetilde{U}\widetilde{U}^{\top}-UU^{\top}\right\|\left\|\widehat{L}\right\|+\left\|\widehat{L}-M\right\|+\|M\|\left\|\widetilde{V}\widetilde{V}^{\top}-VV^{\top}\right\|+\left\|E_{\Sigma}\right\|
U~U~UUL^M+U~U~UUM+L^M+MV~V~VV+EΣ\displaystyle\leq\left\|\widetilde{U}\widetilde{U}^{\top}-UU^{\top}\right\|\left\|\widehat{L}-M\right\|+\left\|\widetilde{U}\widetilde{U}^{\top}-UU^{\top}\right\|\left\|M\right\|+\left\|\widehat{L}-M\right\|+\|M\|\left\|\widetilde{V}\widetilde{V}^{\top}-VV^{\top}\right\|+\left\|E_{\Sigma}\right\|
2L^M+M(V~V~VV+U~U~UU)+EΣ,\displaystyle\leq 2\left\|\widehat{L}-M\right\|+\|M\|\left(\left\|\widetilde{V}\widetilde{V}^{\top}-VV^{\top}\right\|+\left\|\widetilde{U}\widetilde{U}^{\top}-UU^{\top}\right\|\right)+\left\|E_{\Sigma}\right\|,

it is sufficient to plug in the upper bound (S.1) of \left\|\widehat{L}-M\right\|, the identity \|M\|=\sigma_{1}, as well as the upper bounds (S.7) of \|\widetilde{U}\widetilde{U}^{\top}-UU^{\top}\|, (S.8) of \|\widetilde{V}\widetilde{V}^{\top}-VV^{\top}\| and (S.9) of \|E_{\Sigma}\|. In conclusion, conditioned on the event \mathcal{E}_{*}, when the sample size satisfies

nC1Cl1r(Curσ1+σξσr)2(d1d2)log(d1+d2),n\geq C_{1}C_{l}^{-1}r\left(\frac{\sqrt{C_{u}r}\,\sigma_{1}+\sigma_{\xi}}{\sigma_{r}}\right)^{2}(d_{1}\vee d_{2})\log(d_{1}+d_{2}),

for some absolute constant C1>0C_{1}>0, we have

M~0M\displaystyle\|\widetilde{M}_{0}-M\| C5Cl1(Curσ1+σξ)σ1σr(d1d2)log(d1+d2)n\displaystyle\leq C_{5}\sqrt{C_{l}^{-1}}\left(\sqrt{C_{u}r}\sigma_{1}+\sigma_{\xi}\right)\frac{\sigma_{1}}{\sigma_{r}}\sqrt{\frac{(d_{1}\vee d_{2})\log(d_{1}+d_{2})}{n}}
+C5Cl1(Curσ1+σξ)σ1σrr(d1d2)nεlognlog12(3.75δ)\displaystyle\quad+C_{5}\sqrt{C_{l}^{-1}}\left(\sqrt{C_{u}r}\,\sigma_{1}+\sigma_{\xi}\right)\frac{\sigma_{1}}{\sigma_{r}}\frac{\sqrt{r(d_{1}\vee d_{2})}}{n\varepsilon}\log n\log^{\frac{1}{2}}(\frac{3.75}{\delta})
+C5Cl1(Curσ1+σξ)rnεlognlog12(3.75δ),\displaystyle\quad+C_{5}\sqrt{C_{l}^{-1}}\left(\sqrt{C_{u}r}\sigma_{1}+\sigma_{\xi}\right)\frac{r}{n\varepsilon}\log n\log^{\frac{1}{2}}(\frac{3.75}{\delta}),

for some absolute constant C5>0C_{5}>0 with probability at least 1exp(d1)exp(d2)1020r1-\exp(-d_{1})-\exp(-d_{2})-10^{-20r}.

A.5 Proof of Corollary 1

The proof of Corollary 1 is obtained by requiring the upper bound of \|\widetilde{M}_{0}-M\|_{F}\leq\sqrt{2r}\|\widetilde{M}_{0}-M\| given by Theorem 1 to be of smaller order than \sigma_{r}.

Appendix B Proof of Theorem 2

We first present some preliminary results on the KL-divergence and total variation distance between Gaussian distributions. Let 𝒩(μ1,Σ1)\mathcal{N}(\mu_{1},\Sigma_{1}) and 𝒩(μ2,Σ2)\mathcal{N}(\mu_{2},\Sigma_{2}) be two pp-dimensional multivariate Gaussians, then

KL(𝒩(μ1,Σ1)𝒩(μ2,Σ2))\displaystyle\operatorname{KL}\left(\mathcal{N}\left(\mu_{1},\Sigma_{1}\right)\|\mathcal{N}\left(\mu_{2},\Sigma_{2}\right)\right)
=12(Tr(Σ21Σ1Ip)+(μ2μ1)Σ21(μ2μ1)+log(detΣ2detΣ1)).\displaystyle=\frac{1}{2}\left(\operatorname{Tr}\left(\Sigma_{2}^{-1}\Sigma_{1}-I_{p}\right)+\left(\mu_{2}-\mu_{1}\right)^{\top}\Sigma_{2}^{-1}\left(\mu_{2}-\mu_{1}\right)+\log\left(\frac{\operatorname{det}\Sigma_{2}}{\operatorname{det}\Sigma_{1}}\right)\right).

Let M=σHHM_{*}=\sigma HH^{\top} and M=σHHM_{*}^{\prime}=\sigma H^{\prime}H^{\prime\top} where HH𝒮q(d1+d2)H\neq H^{\prime}\in\mathcal{S}_{q}^{(d_{1}+d_{2})}, then

KL(fM(y¯i,Xi,Xi)fM(y¯i,Xi,Xi))\displaystyle\operatorname{KL}\left(f_{M_{*}}(\bar{y}_{i},X_{i},X_{i}^{\prime})\|f_{M_{*}^{\prime}}(\bar{y}_{i},X_{i},X_{i}^{\prime})\right)
=𝔼fM(y¯i,Xi,Xi)[logfM(y¯i,Xi,Xi)fM(y¯i,Xi,Xi)]\displaystyle=\mathbb{E}_{f_{M_{*}}(\bar{y}_{i},X_{i},X_{i}^{\prime})}\left[\log\frac{f_{M_{*}}(\bar{y}_{i},X_{i},X_{i}^{\prime})}{f_{M_{*}^{\prime}}(\bar{y}_{i},X_{i},X_{i}^{\prime})}\right]
=𝔼Xi,Xi𝔼fM(y¯iXi,Xi)[(y¯i(0MM0),(0XiXi0))24σξ2]\displaystyle=\mathbb{E}_{X_{i},X_{i}^{\prime}}\mathbb{E}_{f_{M_{*}}(\bar{y}_{i}\mid X_{i},X_{i}^{\prime})}\left[-\frac{\left(\bar{y}_{i}-\left<\left(\begin{array}[]{cc}0&M\\ M^{\top}&0\end{array}\right),\left(\begin{array}[]{cc}0&X_{i}\\ X_{i}^{\prime\top}&0\end{array}\right)\right>\right)^{2}}{4\sigma_{\xi}^{2}}\right]
\displaystyle\quad+\mathbb{E}_{X_{i},X_{i}^{\prime}}\mathbb{E}_{f_{M_{*}}(\bar{y}_{i}\mid X_{i},X_{i}^{\prime})}\left[\frac{\left(\bar{y}_{i}-\left<\left(\begin{array}[]{cc}0&M^{\prime}\\ M^{\prime\top}&0\end{array}\right),\left(\begin{array}[]{cc}0&X_{i}\\ X_{i}^{\prime\top}&0\end{array}\right)\right>\right)^{2}}{4\sigma_{\xi}^{2}}\right]
=𝔼Xi,Xi[MM,(0XiXi0)MM,(0XiXi0)4σξ2]\displaystyle=\mathbb{E}_{X_{i},X_{i}^{\prime}}\left[\frac{\left<M_{*}-M_{*}^{\prime},\left(\begin{array}[]{cc}0&X_{i}\\ X_{i}^{\prime\top}&0\end{array}\right)\right>\cdot\left<M_{*}-M_{*}^{\prime},\left(\begin{array}[]{cc}0&X_{i}\\ X_{i}^{\prime\top}&0\end{array}\right)\right>}{4\sigma_{\xi}^{2}}\right]
1σξ2CuMMF21σξ2Cuσ2rε02,\displaystyle\lesssim\frac{1}{\sigma_{\xi}^{2}}C_{u}\|M_{*}-M_{*}^{\prime}\|_{F}^{2}\lesssim\frac{1}{\sigma_{\xi}^{2}}C_{u}\sigma^{2}r\varepsilon_{0}^{2},

and further, by Pinsker's inequality \operatorname{TV}(P,Q)\leq\sqrt{\operatorname{KL}(P\|Q)/2}, we have

TV(fM(y¯i,Xi,Xi),fM(y¯i,Xi,Xi))Cuσσξrε0.\operatorname{TV}\left(f_{M_{*}}(\bar{y}_{i},X_{i},X_{i}^{\prime}),f_{M_{*}^{\prime}}(\bar{y}_{i},X_{i},X_{i}^{\prime})\right)\lesssim\frac{\sqrt{C_{u}}\sigma}{\sigma_{\xi}}\sqrt{r}\varepsilon_{0}.

For any probability measures PMPM𝒫σP_{M_{*}}\neq P_{M_{*}^{\prime}}\in\mathcal{P}_{\sigma}, we have

KL(i=1nfM(y¯i,Xi,Xi)i=1nfM(y¯i,Xi,Xi))nσξ2Cuσ2rε02.\operatorname{KL}\left(\bigotimes_{i=1}^{n}f_{M_{*}}(\bar{y}_{i},X_{i},X_{i}^{\prime})\|\bigotimes_{i=1}^{n}f_{M_{*}^{\prime}}(\bar{y}_{i},X_{i},X_{i}^{\prime})\right)\lesssim\frac{n}{\sigma_{\xi}^{2}}C_{u}\sigma^{2}r\varepsilon_{0}^{2}.

and

\sum_{i\in[n]}\operatorname{TV}\left(f_{M_{*}}(\bar{y}_{i},X_{i},X_{i}^{\prime}),f_{M_{*}^{\prime}}(\bar{y}_{i},X_{i},X_{i}^{\prime})\right)\lesssim n\frac{\sqrt{C_{u}}\sigma}{\sigma_{\xi}}\sqrt{r}\varepsilon_{0}.

The next Lemma 6 states that there exists a sufficiently large subset of \mathbb{O}_{d_{1}+d_{2},2r} such that the elements of the subset are well separated.

Lemma 6.

For any rdr\leq d, there exists a subset 𝒮q(d)𝕆d,r\mathcal{S}_{q}^{(d)}\subset\mathbb{O}_{d,r} with cardinality |𝒮q(d)|2r(dr)\left|\mathcal{S}_{q}^{(d)}\right|\geq 2^{r(d-r)} such that for any UiUj𝒮q(d)U_{i}\neq U_{j}\in\mathcal{S}_{q}^{(d)},

UiUiUjUjqε02(1ε02)ViVjqε02(1ε02)ViViVjVjqε02(1ε02)r1/q\displaystyle\|U_{i}U_{i}^{\top}-U_{j}U_{j}^{\top}\|_{q}\geq\sqrt{\varepsilon_{0}^{2}(1-\varepsilon_{0}^{2})}\|V_{i}-V_{j}\|_{q}\gtrsim\sqrt{\varepsilon_{0}^{2}(1-\varepsilon_{0}^{2})}\|V_{i}V_{i}^{\top}-V_{j}V_{j}^{\top}\|_{q}\gtrsim\sqrt{\varepsilon_{0}^{2}(1-\varepsilon_{0}^{2})}r^{1/q}

where q\|\cdot\|_{q} denotes the Schatten-q norm and, meanwhile,

UiUiUjUjFUiUjFε0ViVjF2rε0.\|U_{i}U_{i}^{\top}-U_{j}U_{j}^{\top}\|_{\rm F}\lesssim\|U_{i}-U_{j}\|_{\rm F}\leq\varepsilon_{0}\|V_{i}-V_{j}\|_{\rm F}\leq\sqrt{2r}\varepsilon_{0}.

By Lemma 6, there exists a subset 𝒮q(d1+d2)𝕆d1+d2,2r\mathcal{S}_{q}^{(d_{1}+d_{2})}\subset\mathbb{O}_{d_{1}+d_{2},2r} with cardinality |𝒮q(d1+d2)|22r(d1+d22r)\left|\mathcal{S}_{q}^{(d_{1}+d_{2})}\right|\geq 2^{2r(d_{1}+d_{2}-2r)} such that for any HH𝒮q(d1+d2)H\neq H^{\prime}\in\mathcal{S}_{q}^{(d_{1}+d_{2})},

HHHHqε02(1ε02)(2r)1/q,\|HH^{\top}-H^{\prime}H^{\prime\top}\|_{q}\gtrsim\sqrt{\varepsilon_{0}^{2}(1-\varepsilon_{0}^{2})}(2r)^{1/q},

and meanwhile,

HHHHF2rε0.\|HH^{\top}-H^{\prime}H^{\prime\top}\|_{\rm F}\lesssim 2\sqrt{r}\varepsilon_{0}.

To invoke Lemma 3, we define the metric ρ:𝕆d1+d2,2r×𝕆d1+d2,2r+\rho:\mathbb{O}_{d_{1}+d_{2},2r}\times\mathbb{O}_{d_{1}+d_{2},2r}\mapsto\mathbb{R}^{+} as ρ(H,H):=HHHHq\rho(H,H^{\prime}):=\|HH^{\top}-H^{\prime}H^{\prime\top}\|_{q} for any q[1,]q\in[1,\infty] and take ρ0τε0r1/q\rho_{0}\asymp\tau\varepsilon_{0}r^{1/q},

l0=c0nσξ2Cuσ2rε02andt0=c0nCuσσξrε0,l_{0}=c_{0}\frac{n}{\sigma_{\xi}^{2}}C_{u}\sigma^{2}r\varepsilon_{0}^{2}\quad{\rm and}\quad t_{0}=c_{0}n\frac{\sqrt{C_{u}}\sigma}{\sigma_{\xi}}\sqrt{r}\varepsilon_{0},

for some small absolute constant c0,τ>0c_{0},\tau>0. Then, by Lemma 3, for any (ε,δ)(\varepsilon,\delta)-DP estimator H~\widetilde{H},

supP𝒫σ𝔼H~H~HHq\displaystyle\sup_{P\in\mathcal{P}_{\sigma}}\mathbb{E}\big{\|}\widetilde{H}\widetilde{H}^{\top}-HH^{\top}\big{\|}_{q}
max{τε0r1/q2(1c0nσξ2Cuσ2rε02+log2logN),τε0r1/q4(1N1exp(4εc0nCuσσξrε0))}.\displaystyle\geqslant\max\left\{\frac{\tau\varepsilon_{0}r^{1/q}}{2}\left(1-\frac{c_{0}\frac{n}{\sigma_{\xi}^{2}}C_{u}\sigma^{2}r\varepsilon_{0}^{2}+\log 2}{\log N}\right),\frac{\tau\varepsilon_{0}r^{1/q}}{4}\left(1\wedge\frac{N-1}{\exp\left(4\varepsilon c_{0}n\frac{\sqrt{C_{u}}\sigma}{\sigma_{\xi}}\sqrt{r}\varepsilon_{0}\right)}\right)\right\}.
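The choice of \varepsilon_{0} below is made to balance both terms against \log N\asymp r(d_{1}+d_{2}); a short check: with the statistical part of \varepsilon_{0},

l_{0}=c_{0}\frac{n}{\sigma_{\xi}^{2}}C_{u}\sigma^{2}r\varepsilon_{0}^{2}\asymp c_{0}\,r(d_{1}+d_{2})\asymp\log N,

and with the privacy part, 4\varepsilon t_{0}=4\varepsilon c_{0}n\frac{\sqrt{C_{u}}\sigma}{\sigma_{\xi}}\sqrt{r}\varepsilon_{0}\asymp c_{0}\,r(d_{1}+d_{2})\asymp\log N, so that both terms of the maximum remain bounded away from zero and the resulting bound is of order \varepsilon_{0}r^{1/q}.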

Recall that N22r(d1+d2)/2N\geq 2^{2r(d_{1}+d_{2})/2} if d1+d24rd_{1}+d_{2}\geq 4r. We can take

ε0σξCuσ(d1+d2n+r12d1+d2nε),\varepsilon_{0}\asymp\frac{\sigma_{\xi}}{\sqrt{C_{u}}\sigma}\left(\sqrt{\frac{d_{1}+d_{2}}{n}}+r^{\frac{1}{2}}\frac{d_{1}+d_{2}}{n\varepsilon}\right),

to get

supP𝒫σ𝔼H~H~HHqσξCuσ(r1/qd1+d2n+r12+1qd1+d2nε)r1/q,\displaystyle\sup_{P\in\mathcal{P}_{\sigma}}\mathbb{E}\Big{\|}\widetilde{H}\widetilde{H}^{\top}-HH^{\top}\Big{\|}_{q}\gtrsim\frac{\sigma_{\xi}}{\sqrt{C_{u}}\sigma}\left(r^{1/q}\sqrt{\frac{d_{1}+d_{2}}{n}}+r^{\frac{1}{2}+\frac{1}{q}}\frac{d_{1}+d_{2}}{n\varepsilon}\right)\bigwedge r^{1/q},

where the last term is due to the trivial upper bound \|\widetilde{H}\widetilde{H}^{\top}-HH^{\top}\|_{q}\leq(4r)^{1/q}. Since \sigma is already known, it suffices to estimate HH^{\top} differentially privately by the estimator \widetilde{H}\widetilde{H}^{\top}, and an estimator (\widetilde{M})_{*} for the matrix M_{*} is given by \sigma\widetilde{H}\widetilde{H}^{\top}. Therefore,

supP𝒫σ𝔼M~\displaystyle\underset{P\in\mathcal{P}_{\sigma}}{\sup}\mathbb{E}\big{\|}\widetilde{M}_{*}- MqσsupP𝒫σ𝔼H~H~HHq.\displaystyle M_{*}\big{\|}_{q}\geq\sigma\cdot\underset{P\in\mathcal{P}_{\sigma}}{\sup}\mathbb{E}\big{\|}\widetilde{H}\widetilde{H}^{\top}-HH^{\top}\big{\|}_{q}.

Due to 𝒫σ𝒫Σ\mathcal{P}_{\sigma}\subset\mathcal{P}_{\Sigma}, we have

supP𝒫Σ𝔼M~MqsupP𝒫σ𝔼M~MqσξCu(r1/qd1+d2n+r12+1qd1+d2nε)r1/qσ.\displaystyle\underset{P\in\mathcal{P}_{\Sigma}}{\sup}\mathbb{E}\big{\|}\widetilde{M}_{*}-M_{*}\big{\|}_{q}\geq\underset{P\in\mathcal{P}_{\sigma}}{\sup}\mathbb{E}\big{\|}\widetilde{M}_{*}-M_{*}\big{\|}_{q}\gtrsim\frac{\sigma_{\xi}}{\sqrt{C_{u}}}\left(r^{1/q}\sqrt{\frac{d_{1}+d_{2}}{n}}+r^{\frac{1}{2}+\frac{1}{q}}\frac{d_{1}+d_{2}}{n\varepsilon}\right)\bigwedge r^{1/q}\sigma.

There is a one-to-one mapping between M~\widetilde{M} and (M~)(\widetilde{M})_{*}. Let M~M=UΔΣΔVΔ\widetilde{M}-M=U_{\Delta}\Sigma_{\Delta}V_{\Delta}^{\top}, then M~Mqq=ΣΔqq\|\widetilde{M}-M\|_{q}^{q}=\left\|\Sigma_{\Delta}\right\|_{q}^{q}. Note that

(\widetilde{M})_{*}-M_{*}=\left(\begin{array}[]{cc}0&\widetilde{M}-M\\ \widetilde{M}^{\top}-M^{\top}&0\end{array}\right)=\frac{1}{\sqrt{2}}\left(\begin{array}[]{ll}U_{\Delta}&U_{\Delta}\\ V_{\Delta}&-V_{\Delta}\end{array}\right)\left(\begin{array}[]{cc}\Sigma_{\Delta}&0\\ 0&-\Sigma_{\Delta}\end{array}\right)\frac{1}{\sqrt{2}}\left(\begin{array}[]{ll}U_{\Delta}&U_{\Delta}\\ V_{\Delta}&-V_{\Delta}\end{array}\right)^{\top},

and

\left\|(\widetilde{M})_{*}-M_{*}\right\|_{q}^{q}=\left\|\left(\begin{array}[]{cc}\Sigma_{\Delta}&0\\ 0&-\Sigma_{\Delta}\end{array}\right)\right\|_{q}^{q}=2\left\|\Sigma_{\Delta}\right\|_{q}^{q}=2\|\widetilde{M}-M\|_{q}^{q}.
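This doubling of the Schatten-q norm under symmetric dilation is easy to verify numerically; a minimal numpy check (q and the matrix sizes are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
q = 3.0
D = rng.normal(size=(5, 4))  # stands in for M_tilde - M
D_star = np.block([[np.zeros((5, 5)), D], [D.T, np.zeros((4, 4))]])
schatten = lambda A: (np.linalg.svd(A, compute_uv=False) ** q).sum() ** (1.0 / q)
# Singular values of the dilation are those of D, each with multiplicity two.
assert np.isclose(schatten(D_star), 2.0 ** (1.0 / q) * schatten(D))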

Therefore, \left\|(\widetilde{M})_{*}-M_{*}\right\|_{q}=2^{1/q}\|\widetilde{M}-M\|_{q} and

infM~supP𝒫Σ𝔼M~\displaystyle\inf_{\widetilde{M}}\underset{P\in\mathcal{P}_{\Sigma}}{\sup}\mathbb{E}\big{\|}\widetilde{M}- MqσξCu(r1/qd1+d2n+r12+1qd1+d2nε)r1/qσ,\displaystyle M\big{\|}_{q}\gtrsim\frac{\sigma_{\xi}}{\sqrt{C_{u}}}\left(r^{1/q}\sqrt{\frac{d_{1}+d_{2}}{n}}+r^{\frac{1}{2}+\frac{1}{q}}\frac{d_{1}+d_{2}}{n\varepsilon}\right)\bigwedge r^{1/q}\sigma, (S.11)

where we use the fact d_{1}+d_{2}\lesssim d_{1}\vee d_{2} and the infimum is taken over all possible (\varepsilon,\delta)-DP algorithms.

Appendix C Proof of Theorem 3

In Appendix C, we aim to prove Theorem 3. The proof is composed of three parts. In Part C.1, we characterize the sensitivity \Delta_{l} for iterations l=1,\cdots,l^{*} and bound \|\mathcal{P}_{\mathbb{T}_{l}}N_{l}\|_{F}. In Part C.2, we use mathematical induction to prove that if the RIP condition holds and both \Delta_{l} and \|M_{l}-M\|_{F} are bounded with high probability, then \Delta_{l+1} and \|M_{l+1}-M\|_{F} are bounded with high probability as well. In Part C.3, we choose an appropriate l^{*} as the total number of iterations and give the convergence rate of \|\widetilde{M}_{l^{*}}-M\|_{F}.

C.1 Part 1

In Part C.1, we focus on upper bounding 𝒫𝕋lNlF\|\mathcal{P}_{\mathbb{T}_{l}}N_{l}\|_{F}. The first step is to characterize the sensitivity of the ll-th iteration for l[l]l\in[l^{*}]. Let

Gl(i):=1nji(Xj,Mlyj)Xj+1n[Xi,Mlyi]Xi,G_{l}^{(i)}:=\frac{1}{n}\sum_{j\neq i}\left(\left<X_{j},M_{l}\right>-y_{j}\right)X_{j}+\frac{1}{n}\left[\left<X_{i}^{\prime},M_{l}\right>-y_{i}^{\prime}\right]X_{i}^{\prime},

which is the gradient at the l-th iteration computed from a dataset that differs from the original one only in the i-th pair of observations. The sensitivity of the gradient step on the tangent space \mathbb{T}_{l} is

Δl\displaystyle\Delta_{l} :=maxneighbouring(Z,Z)Mlη𝒫𝕋l(Gl(Z))[Mlη𝒫𝕋l(Gl(Z))]F\displaystyle:=\max_{{\rm neighbouring}(Z,Z^{\prime})}\left\|M_{l}-\eta\mathcal{P}_{\mathbb{T}_{l}}(G_{l}(Z))-\left[M_{l}-\eta\mathcal{P}_{\mathbb{T}_{l}}(G_{l}(Z^{\prime}))\right]\right\|_{F}
=maxi[n]Mlη𝒫𝕋l(Gl)[Mlη𝒫𝕋l(Gl(i))]F.\displaystyle=\max_{i\in[n]}\left\|M_{l}-\eta\mathcal{P}_{\mathbb{T}_{l}}(G_{l})-\left[M_{l}-\eta\mathcal{P}_{\mathbb{T}_{l}}(G_{l}^{(i)})\right]\right\|_{F}.

By the definition of GlG_{l} and Gl(i)G_{l}^{(i)},

Δlηnmaxi[n][𝒫𝕋l(Xi,Mlyi)XiF+𝒫𝕋l(Xi,Mlyi)XiF],\displaystyle\Delta_{l}\leq\frac{\eta}{n}\max_{i\in[n]}\left[\left\|\mathcal{P}_{\mathbb{T}_{l}}\left(\left<X_{i},M_{l}\right>-y_{i}\right)X_{i}\right\|_{F}+\left\|\mathcal{P}_{\mathbb{T}_{l}}\left(\left<X_{i}^{\prime},M_{l}\right>-y_{i}^{\prime}\right)X_{i}^{\prime}\right\|_{F}\right],

where for all i[n]i\in[n] and l+1[l]l+1\in[l^{*}]

𝒫𝕋l(Xi,Mlyi)XiF\displaystyle\left\|\mathcal{P}_{\mathbb{T}_{l}}\left(\left<X_{i},M_{l}\right>-y_{i}\right)X_{i}\right\|_{F} |Xi,MlMξi|𝒫𝕋lXiF\displaystyle\leq\left|\left<X_{i},M_{l}-M\right>-\xi_{i}\right|\left\|\mathcal{P}_{\mathbb{T}_{l}}X_{i}\right\|_{F} (S.12)
(|ξi|+|Xi,MlM|)2rXi.\displaystyle\leq\left(|\xi_{i}|+\left|\left<X_{i},M_{l}-M\right>\right|\right)\sqrt{2r}\left\|X_{i}\right\|.

Here, the last inequality uses the fact that for any Bd1×d2B\in\mathbb{R}^{d_{1}\times d_{2}}, the matrix 𝒫𝕋lB\mathcal{P}_{\mathbb{T}_{l}}B is of rank at most 2r2r. Since both ξi\xi_{i} and MlM,Xi\left<M_{l}-M,X_{i}\right> are sub-Gaussians with ξiψ2=σξ\|\xi_{i}\|_{\psi_{2}}=\sigma_{\xi} and MlM,Xiψ2MlMFCu\left\|\left<M_{l}-M,X_{i}\right>\right\|_{\psi_{2}}\leq\left\|M_{l}-M\right\|_{F}\sqrt{C_{u}}, we turn to Lemma 7 to upper bound |ξi|+|Xi,MlM||\xi_{i}|+\left|\left<X_{i},M_{l}-M\right>\right|.

Lemma 7 (Vershynin (2018)).

For any sub-Gaussian random variable BB\in\mathbb{R},

\mathbb{P}(|B|\geq t)\leq 2\exp\left(-ct^{2}/\|B\|_{\psi_{2}}^{2}\right),\quad\forall t\geq 0.

According to the tail probability of a sub-Gaussian random variable stated in Lemma 7, we have, with probability at least 1-n^{-10},

|ξi|+|Xi,MlM|C1(σξ+MlMFCu)log1/2n,\displaystyle|\xi_{i}|+\left|\left<X_{i},M_{l}-M\right>\right|\leq C_{1}(\sigma_{\xi}+\left\|M_{l}-M\right\|_{F}\sqrt{C_{u}})\log^{1/2}n, (S.13)

for some absolute constant C_{1}>0. Shen et al. (2023) offers the following result on \|X_{i}\|.

Lemma 8 (Shen et al. (2023)).

Suppose the vectorization of X\in\mathbb{R}^{d_{1}\times d_{2}} follows a mean-zero multivariate Gaussian distribution N(\mathbf{0},\Lambda), where \Lambda\in\mathbb{R}^{d_{1}d_{2}\times d_{1}d_{2}} satisfies \lambda_{\max}(\Lambda)\leq C_{u}. Then, for some constant c>0,

(Xt+cCu(d1+d2))exp(t2).\mathbb{P}\left(\|X\|\geq t+c\sqrt{C_{u}\left(d_{1}+d_{2}\right)}\right)\leq\exp\left(-t^{2}\right).

It implies Xψ2c1Cu(d1+d2)\left\|\|X\|\right\|_{\psi_{2}}\leq c_{1}\sqrt{C_{u}(d_{1}+d_{2})} and Xψ1c2Cu(d1+d2)\left\|\|X\|\right\|_{\psi_{1}}\leq c_{2}\sqrt{C_{u}(d_{1}+d_{2})} for some constants c1,c2>0c_{1},c_{2}>0.
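Lemma 8 is also easy to probe empirically; the following sketch (with C_{u}=1, i.e. i.i.d. standard Gaussian entries, and sizes chosen only for illustration) shows that \|X\| concentrates at the scale \sqrt{C_{u}(d_{1}+d_{2})}:

import numpy as np

rng = np.random.default_rng(0)
d1, d2 = 200, 300
norms = [np.linalg.norm(rng.normal(size=(d1, d2)), 2) for _ in range(20)]
# For Gaussian matrices the spectral norm is about sqrt(d1) + sqrt(d2),
# which is within a constant factor of sqrt(d1 + d2).
print(max(norms), np.sqrt(d1) + np.sqrt(d2))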

Thus, for some absolute constant C2>0C_{2}>0, with probability at least 1exp(10Cu(d1+d2))n101-\exp\left(-10C_{u}(d_{1}+d_{2})\right)n^{-10}

Xi\displaystyle\|X_{i}\| C2Cu(d1+d2)+logn.\displaystyle\leq C_{2}\sqrt{C_{u}(d_{1}+d_{2})+\log n}. (S.14)

Combining (S.12), (S.13) and (S.14) and taking the maximum over i\in[n], for some constant C_{3}>0, we have that the event

Δl:={ΔlC3ηn(σξ+MlMFCu)Cur(d1+d2+logn)logn},\mathcal{E}_{\Delta_{l}}^{\prime}:=\left\{\Delta_{l}\leq C_{3}\frac{\eta}{n}(\sigma_{\xi}+\|M_{l}-M\|_{F}\sqrt{C_{u}})\sqrt{C_{u}r(d_{1}+d_{2}+\log n)\log n}\right\}, (S.15)

happens with probability at least 1-n^{-9}-\exp\left(-10C_{u}(d_{1}+d_{2})\right)n^{-9}. In the event \mathcal{E}_{\Delta_{l}}^{\prime} stated in (S.15), the sensitivity \Delta_{l} still depends on \|M_{l}-M\|_{F}. To obtain an upper bound that does not depend on l, we condition on the event

l={MlMFc0σr},\mathcal{E}_{l}=\left\{\left\|M_{l}-M\right\|_{F}\leq c_{0}\sigma_{r}\right\},

and obtain that for some absolute constant C~3>0\widetilde{C}_{3}>0, the event

Δl:={ΔlC~3ηn(σξ+σrCu)Cur(d1+d2+logn)logn},\mathcal{E}_{\Delta_{l}}:=\left\{\Delta_{l}\leq\widetilde{C}_{3}\frac{\eta}{n}(\sigma_{\xi}+\sigma_{r}\sqrt{C_{u}})\sqrt{C_{u}r(d_{1}+d_{2}+\log n)\log n}\right\}, (S.16)

happens with the probability (Δl)1n9exp(10Cu(d1+d2))n9.\mathbb{P}(\mathcal{E}_{\Delta_{l}})\geq 1-n^{-9}-\exp\left(-10C_{u}(d_{1}+d_{2})\right)n^{-9}.

In the (l+1)-th iteration of Algorithm 2, the matrix M_{l} and the operator \mathcal{P}_{\mathbb{T}_{l}} are known. Moreover, the rank-r approximation \operatorname{SVD}_{r} does not depend on the data set Z=\{(X_{i},y_{i})\}_{i=1}^{n}. Thanks to the post-processing and composition properties of differential privacy, we only need to guarantee that M_{l}-\eta\mathcal{P}_{\mathbb{T}_{l}}(G_{l}) is (\varepsilon/l^{*},\delta/l^{*})-DP, where the gradient

Gl=1ni=1n(Xi,Mlyi)Xi,G_{l}=\frac{1}{n}\sum_{i=1}^{n}\left(\left<X_{i},M_{l}\right>-y_{i}\right)X_{i},

is the only component that depends on the data set Z. Let N_{l} be a d_{1}\times d_{2} matrix with entries drawn i.i.d. from a normal distribution with variance \frac{l^{*2}\Delta_{l}^{2}}{\varepsilon^{2}}\log\left(\frac{1.25l^{*}}{\delta}\right). Under the condition that d_{1}+d_{2}\gtrsim\log n and conditioned on the event \mathcal{E}_{\Delta_{l}}\cap\mathcal{E}_{l}, we have, for some constant C_{4}>0,

𝒫𝕋lNlFC4ηlr(d1+d2)nε(σξ+σrCu)Culognlog1/2(1.25lδ),\displaystyle\|\mathcal{P}_{\mathbb{T}_{l}}N_{l}\|_{F}\leq C_{4}\eta l^{*}\frac{r(d_{1}+d_{2})}{n\varepsilon}(\sigma_{\xi}+\sigma_{r}\sqrt{C_{u}})\sqrt{C_{u}\log n}\log^{1/2}\left(\frac{1.25l^{*}}{\delta}\right),

with probability at least 1exp((d1+d2))1-\exp(-(d_{1}+d_{2})).
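Putting the pieces of Part C.1 together, one private iteration can be sketched as follows. The helper names are hypothetical, the tangent-space projection \mathcal{P}_{\mathbb{T}_{l}}(B)=U_{l}U_{l}^{\top}B+BV_{l}V_{l}^{\top}-U_{l}U_{l}^{\top}BV_{l}V_{l}^{\top} is the standard one for the rank-r manifold, Delta_l stands for the sensitivity bound in (S.16), and the noise scale matches the variance stated above.

import numpy as np

def dp_rgrad_step(M_l, U_l, V_l, X, y, eta, Delta_l, l_star, eps, delta, r, rng):
    # One (eps/l*, delta/l*)-DP iteration: noisy projected gradient, then SVD_r.
    n = y.shape[0]
    residuals = np.einsum('nij,ij->n', X, M_l) - y
    G_l = np.einsum('n,nij->ij', residuals, X) / n           # gradient G_l
    P_T = lambda B: (U_l @ (U_l.T @ B) + (B @ V_l) @ V_l.T
                     - U_l @ (U_l.T @ B @ V_l) @ V_l.T)      # projection onto T_l
    sd = l_star * Delta_l * np.sqrt(np.log(1.25 * l_star / delta)) / eps
    N_l = rng.normal(scale=sd, size=M_l.shape)               # Gaussian mechanism
    M_noisy = M_l - eta * P_T(G_l) + P_T(N_l)
    U, S, Vt = np.linalg.svd(M_noisy, full_matrices=False)   # retraction via SVD_r
    return U[:, :r] @ np.diag(S[:r]) @ Vt[:r, :], U[:, :r], Vt[:r, :].T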

C.2 Part 2

In Part C.2, we take mathematical induction to prove that when the events

l:=RIPlΔl,\mathcal{E}_{l}^{*}:=\mathcal{E}_{{\rm RIP}}\cap\mathcal{E}_{l}\cap\mathcal{E}_{\Delta_{l}},

occurs with high probability, then the event \mathcal{E}_{l+1}^{*} occurs with high probability as well. Here, \mathcal{E}_{{\rm RIP}} is defined as the event on which the RIP condition for \{X_{i}\}_{i\in[n]} holds; see (S.17).

Step 1: 0\mathcal{E}_{0}^{*} is true with high probability.

We first consider the RIP condition. According to Lemma 1, for any Bd1×d2B\in\mathbb{R}^{d_{1}\times d_{2}} of rank rr, there exist constants c1,c2,c3>0c_{1},c_{2},c_{3}>0 and 0<c4<c50<c_{4}<c_{5} such that when nc1r(d1+d2)n\geq c_{1}r(d_{1}+d_{2}), with probability at least 1c2exp(c3r(d1+d2))1-c_{2}\exp\left(-c_{3}r(d_{1}+d_{2})\right),

(1Rr)BF21ni=1nXi,B2(1+Rr)BF2,(1-R_{r})\|B\|_{\mathrm{F}}^{2}\leq\frac{1}{n}\sum_{i=1}^{n}\left\langle X_{i},B\right\rangle^{2}\leq(1+R_{r})\|B\|_{\mathrm{F}}^{2},

where Rr:=(1c4CuCl)(c5CuCl1)R_{r}:=\left(1-c_{4}\sqrt{C_{u}C_{l}}\right)\vee\left(c_{5}\sqrt{C_{u}C_{l}}-1\right). The values of R2r,R3rR_{2r},R_{3r} and R4rR_{4r} are defined similarly. Therefore, under the condition that nc~1r(d1+d2)n\geq\widetilde{c}_{1}r(d_{1}+d_{2}), for some constants c~1,c~2,c~3,c~4,c~5>0\widetilde{c}_{1},\widetilde{c}_{2},\widetilde{c}_{3},\widetilde{c}_{4},\widetilde{c}_{5}>0,

RIP:={RrR2rR3rR4r(1c~4CuCl)(c~5CuCl1)},\mathcal{E}_{{\rm RIP}}:=\left\{R_{r}\vee R_{2r}\vee R_{3r}\vee R_{4r}\leq\left(1-\widetilde{c}_{4}\sqrt{C_{u}C_{l}}\right)\vee\left(\widetilde{c}_{5}\sqrt{C_{u}C_{l}}-1\right)\right\}, (S.17)

happens with probability at least 1c~2exp(c~3r(d1+d2))1-\widetilde{c}_{2}\exp\left(-\widetilde{c}_{3}r(d_{1}+d_{2})\right).
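As a quick empirical illustration of the RIP event (with C_{l}=C_{u}=1, i.e. i.i.d. standard Gaussian measurement matrices, and sizes chosen only for illustration), the ratio below concentrates around one once n\gg r(d_{1}+d_{2}):

import numpy as np

rng = np.random.default_rng(0)
n, d1, d2, r = 2000, 20, 20, 2
X = rng.normal(size=(n, d1, d2))
B = rng.normal(size=(d1, r)) @ rng.normal(size=(r, d2))   # a rank-r matrix
ratio = (np.einsum('nij,ij->n', X, B) ** 2).mean() / np.linalg.norm(B, 'fro') ** 2
print(ratio)  # close to 1, so R_r is small on this draw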

As for \mathcal{E}_{0}, we refer to Corollary 1, which shows that when the sample size satisfies

nO~((κ4r2+κ2κξ2r)(d1d2)),n\geq\widetilde{O}\left((\kappa^{4}r^{2}+\kappa^{2}\kappa_{\xi}^{2}r)(d_{1}\vee d_{2})\right),

the event \mathcal{E}_{0} happens with probability at least 1-(d_{1}+d_{2})^{-10}-n^{-9}-\exp(-d_{1})-\exp(-d_{2})-10^{-20r}. Conditioned on \mathcal{E}_{0}, plugging l=0 into the event \mathcal{E}_{\Delta_{l}}^{\prime} defined in (S.15), we have that the event \mathcal{E}_{\Delta_{0}} defined in (S.16) happens with probability at least 1-n^{-9}-\exp\left(-10C_{u}\left(d_{1}+d_{2}\right)\right)n^{-9}. To this end, we have

(0)\displaystyle\mathbb{P}\left(\mathcal{E}_{0}^{*}\right) =(RIP0Δ0)1c~2exp(c~3r(d1+d2))\displaystyle=\mathbb{P}\left(\mathcal{E}_{{\rm RIP}}\cap\mathcal{E}_{0}\cap\mathcal{E}_{\Delta_{0}}\right)\geq 1-\widetilde{c}_{2}\exp\left(-\widetilde{c}_{3}r(d_{1}+d_{2})\right)
(d1+d2)10n9ed1ed21020r\displaystyle\quad\quad-(d_{1}+d_{2})^{-10}-n^{-9}-e^{-d_{1}}-e^{-d_{2}}-10^{-20r}
n9exp(10Cu(d1+d2))n9.\displaystyle\quad\quad-n^{-9}-\exp\left(-10C_{u}(d_{1}+d_{2})\right)n^{-9}.

Step 2: induction. The following analysis is conditioned on the event l\mathcal{E}_{l}^{*}. Let 𝒳:d1×d2n\mathcal{X}:\mathbb{R}^{d_{1}\times d_{2}}\rightarrow\mathbb{R}^{n} be an operator defined by 𝒳(B)=(X1,B,,Xn,B)n,\mathcal{X}(B)=\left(\left<X_{1},B\right>,\cdots,\left<X_{n},B\right>\right)^{\top}\in\mathbb{R}^{n}, for all Bd1×d2B\in\mathbb{R}^{d_{1}\times d_{2}}. It is easy to check that the adjoint operator of 𝒳\mathcal{X} is 𝒳:nd1×d2\mathcal{X}^{*}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{d_{1}\times d_{2}} which is defined by 𝒳(b):=i=1nbiXi,\mathcal{X}^{*}(b):=\sum_{i=1}^{n}b_{i}X_{i}, for all bnb\in\mathbb{R}^{n}. Therefore, 𝒳𝒳(Ml)=i=1nXi,MlXiand𝒳(ξ)=i=1nξiXi,\mathcal{X}^{*}\mathcal{X}(M_{l})=\sum_{i=1}^{n}\left<X_{i},M_{l}\right>X_{i}\quad{\rm and}\quad\mathcal{X}^{*}(\xi)=\sum_{i=1}^{n}\xi_{i}X_{i}, where ξ:=(ξ1,,ξn)n\xi:=(\xi_{1},\cdots,\xi_{n})\in\mathbb{R}^{n} and accordingly,

P_{\mathbb{T}_{l}}G_{l}=\frac{1}{n}\left[\mathcal{P}_{\mathbb{T}_{l}}\mathcal{X}^{*}\mathcal{X}\left(M_{l}-M\right)-\mathcal{P}_{\mathbb{T}_{l}}\mathcal{X}^{*}(\xi)\right].
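The adjoint relation \left<\mathcal{X}(B),b\right>=\left<B,\mathcal{X}^{*}(b)\right> behind this identity can be checked directly; a minimal numpy verification (all sizes arbitrary):

import numpy as np

rng = np.random.default_rng(0)
n, d1, d2 = 7, 4, 5
Xs = rng.normal(size=(n, d1, d2))
B, b = rng.normal(size=(d1, d2)), rng.normal(size=n)
XB = np.einsum('nij,ij->n', Xs, B)        # X(B) in R^n
Xstar_b = np.einsum('n,nij->ij', b, Xs)   # X*(b) in R^{d1 x d2}
assert np.isclose(XB @ b, (B * Xstar_b).sum())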

Our first goal is to upper bound

\displaystyle\|M_{l}-M-\eta\mathcal{P}_{\mathbb{T}_{l}}(G_{l})\|_{F}=\|M_{l}-M-\frac{\eta}{n}\mathcal{P}_{\mathbb{T}_{l}}\mathcal{X}^{*}\mathcal{X}\left(M_{l}-M\right)+\frac{\eta}{n}\mathcal{P}_{\mathbb{T}_{l}}\mathcal{X}^{*}(\xi)\|_{F} (S.18)
(1ηn)MlMF+ηn(𝒫𝕋l𝒳𝒳𝒫𝕋l)(MlM)FD1\displaystyle\leq\left(1-\frac{\eta}{n}\right)\|M_{l}-M\|_{F}+\underbrace{\frac{\eta}{n}\left\|\left(\mathcal{I}-\mathcal{P}_{\mathbb{T}_{l}}\mathcal{X}^{*}\mathcal{X}\mathcal{P}_{\mathbb{T}_{l}}\right)\left(M_{l}-M\right)\right\|_{F}}_{D_{1}}
+ηn𝒫𝕋l𝒳𝒳𝒫𝕋l(MlM)FD2+ηn𝒫𝕋l𝒳(ξ)FD3.\displaystyle\quad+\underbrace{\frac{\eta}{n}\left\|\mathcal{P}_{\mathbb{T}_{l}}\mathcal{X}^{*}\mathcal{X}\mathcal{P}_{\mathbb{T}_{l}^{\perp}}\left(M_{l}-M\right)\right\|_{F}}_{D_{2}}+\underbrace{\frac{\eta}{n}\left\|\mathcal{P}_{\mathbb{T}_{l}}\mathcal{X}^{*}\left(\xi\right)\right\|_{F}}_{D_{3}}.

Lemma 9 characterizes the operators \mathcal{P}_{\mathbb{T}_{l}}-\mathcal{P}_{\mathbb{T}_{l}}\mathcal{X}^{*}\mathcal{X}\mathcal{P}_{\mathbb{T}_{l}} and \mathcal{P}_{\mathbb{T}_{l}}\mathcal{X}^{*}\mathcal{X}\mathcal{P}_{\mathbb{T}_{l}^{\perp}}, which are critical for upper bounding D_{1} and D_{2} in (S.18).

Lemma 9 (Wei et al. (2016), Luo and Zhang (2022)).

Suppose the event RIP\mathcal{E}_{{\rm RIP}} happens, then the following conclusions hold

  1.

    \left\|\mathcal{P}_{\mathbb{T}_{l}}-\mathcal{P}_{\mathbb{T}_{l}}\mathcal{X}^{*}\mathcal{X}\mathcal{P}_{\mathbb{T}_{l}}\right\|\leq R_{2r}.

  2.

    \|\mathcal{P}_{\mathbb{T}_{l}}\mathcal{X}^{*}\mathcal{X}\mathcal{P}_{\mathbb{T}_{l}^{\perp}}\left(M_{l}-M\right)\|_{F}\leq R_{4r}\left\|\mathcal{P}_{\mathbb{T}_{l}^{\perp}}(M_{l}-M)\right\|_{F},

where 𝒫𝕋l(MlM)F1σrMlMMlMF\|\mathcal{P}_{\mathbb{T}_{l}^{\perp}}\left(M_{l}-M\right)\|_{F}\leq\frac{1}{\sigma_{r}}\|M_{l}-M\|\|M_{l}-M\|_{F} according to Wei et al. (2016).

According to Lemma 9, conditioned on the event l\mathcal{E}_{l}^{*}

\displaystyle D_{1}=\frac{\eta}{n}\left\|\left(\mathcal{I}-\mathcal{P}_{\mathbb{T}_{l}}\mathcal{X}^{*}\mathcal{X}\mathcal{P}_{\mathbb{T}_{l}}\right)\left(M_{l}-M\right)\right\|_{F}
\displaystyle\leq\frac{\eta}{n}\left[\left\|\left(\mathcal{P}_{\mathbb{T}_{l}}-\mathcal{P}_{\mathbb{T}_{l}}\mathcal{X}^{*}\mathcal{X}\mathcal{P}_{\mathbb{T}_{l}}\right)\left(M_{l}-M\right)\right\|_{F}+\|\mathcal{P}_{\mathbb{T}_{l}^{\perp}}\left(M_{l}-M\right)\|_{F}\right]\leq\frac{\eta}{n}\left(R_{2r}+c_{0}\right)\left\|M_{l}-M\right\|_{F},

and D_{2}=\frac{\eta}{n}\|\mathcal{P}_{\mathbb{T}_{l}}\mathcal{X}^{*}\mathcal{X}\mathcal{P}_{\mathbb{T}_{l}^{\perp}}\left(M_{l}-M\right)\|_{F}\leq\frac{\eta}{n}R_{4r}c_{0}\left\|M_{l}-M\right\|_{F}. To this end, the only term left unknown in (S.18) is

D_{3}=\frac{\eta}{n}\left\|\mathcal{P}_{\mathbb{T}_{l}}\mathcal{X}^{*}\left(\xi\right)\right\|_{F}=\frac{\eta}{n}\left\|\mathcal{P}_{\mathbb{T}_{l}}\sum_{i=1}^{n}\xi_{i}X_{i}\right\|_{F}\leq\frac{\eta}{n}\sqrt{2r}\left\|\sum_{i=1}^{n}\xi_{i}X_{i}\right\|,

where i=1nξiXi\left\|\sum_{i=1}^{n}\xi_{i}X_{i}\right\| is the spectral norm of a summation of nn i.i.d. mean zero sub-exponential random matrices. We upper bound i=1nξiXi\left\|\sum_{i=1}^{n}\xi_{i}X_{i}\right\| by Theorem 4. Let Bi:=ξiXiB_{i}:=\xi_{i}X_{i} for all i=1,,ni=1,\cdots,n, then

K:=maxi[n]Biψ1=maxi[n]ξiXiψ1maxi[n]ξiψ2Xiψ2Cu(d1+d2)σξ2.\displaystyle K:=\max_{i\in[n]}\left\|\|B_{i}\|\right\|_{\psi_{1}}=\max_{i\in[n]}\left\|\|\xi_{i}X_{i}\|\right\|_{\psi_{1}}\leq\max_{i\in[n]}\|\xi_{i}\|_{\psi_{2}}\left\|\|X_{i}\|\right\|_{\psi_{2}}\leq\sqrt{C_{u}(d_{1}+d_{2})\sigma_{\xi}^{2}}.

Since 𝔼ξi43σξ4\mathbb{E}\xi_{i}^{4}\leq 3\sigma_{\xi}^{4} and 𝔼Xi43Cu(d1+d2)2\mathbb{E}\|X_{i}\|^{4}\leq 3C_{u}(d_{1}+d_{2})^{2},

\|\mathbb{E}B_{i}B_{i}^{\top}\|\leq\mathbb{E}\xi_{i}^{2}\mathbb{E}\|X_{i}\|^{2}\leq\frac{C_{u}(d_{1}+d_{2})}{2\sigma_{\xi}^{2}}\mathbb{E}\xi_{i}^{4}+\frac{\sigma_{\xi}^{2}}{2C_{u}(d_{1}+d_{2})}\mathbb{E}\|X_{i}\|^{4}\leq 3\sigma_{\xi}^{2}C_{u}(d_{1}+d_{2}),

where the second inequality uses the fact ab\leq\frac{a^{2}}{2c}+\frac{cb^{2}}{2}. Similarly, \|\mathbb{E}B_{i}^{\top}B_{i}\|\leq 3\sigma_{\xi}^{2}C_{u}(d_{1}+d_{2}).

Therefore,

S2:=𝔼BiBi𝔼BiBi3σξ2Cu(d1+d2).S^{2}:=\|\mathbb{E}B_{i}B_{i}^{\top}\|\vee\|\mathbb{E}B_{i}^{\top}B_{i}\|\leq 3\sigma_{\xi}^{2}C_{u}(d_{1}+d_{2}).

Applying Theorem 4 with \alpha=1, K=c_{1}\sqrt{C_{u}(d_{1}+d_{2})\sigma_{\xi}^{2}} and S=\sqrt{3\sigma_{\xi}^{2}C_{u}(d_{1}+d_{2})}, we have

(1ni=1nξiXiC5Cu(d1+d2)σξ2log(d1+d2)n)(d1+d2)10.\displaystyle\mathbb{P}\left(\frac{1}{n}\left\|\sum_{i=1}^{n}\xi_{i}X_{i}\right\|\geq C_{5}\sqrt{C_{u}(d_{1}+d_{2})\sigma_{\xi}^{2}}\sqrt{\frac{\log(d_{1}+d_{2})}{n}}\right)\leq(d_{1}+d_{2})^{-10}.

In conclusion, with probability at least 1(d1+d2)101-(d_{1}+d_{2})^{-10}

\displaystyle D_{3}\leq C_{1}\frac{\eta}{n}\sqrt{r}\left\|\sum_{i=1}^{n}\xi_{i}X_{i}\right\|\leq C_{1}\eta\sigma_{\xi}\sqrt{\frac{C_{u}r(d_{1}+d_{2})\log(d_{1}+d_{2})}{n}},

for some constant C1>0C_{1}>0.

Conditioned on l\mathcal{E}_{l}^{*}, we plug D1D_{1}, D2D_{2} and D3D_{3} into (S.18) and obtain that for some small constant 0<c0<10<c_{0}<1 and absolute constant C2>0C_{2}>0

MlMη𝒫𝕋l(Gl)+𝒫𝕋lNlFMlMη𝒫𝕋l(Gl)F+𝒫𝕋lNlF\displaystyle\|M_{l}-M-\eta\mathcal{P}_{\mathbb{T}_{l}}(G_{l})+\mathcal{P}_{\mathbb{T}_{l}}N_{l}\|_{F}\leq\|M_{l}-M-\eta\mathcal{P}_{\mathbb{T}_{l}}(G_{l})\|_{F}+\|\mathcal{P}_{\mathbb{T}_{l}}N_{l}\|_{F}
(1ρ0)MlMF+C2ησξCur(d1+d2)nlog1/2(d1+d2)\displaystyle\leq\left(1-\rho_{0}\right)\|M_{l}-M\|_{F}+C_{2}\eta\sigma_{\xi}\sqrt{C_{u}}\sqrt{\frac{r(d_{1}+d_{2})}{n}}\log^{1/2}(d_{1}+d_{2})
+C2ηlCu(σξ+σrCu)r(d1+d2)nεlog1/2nlog1/2(1.25lδ),\displaystyle\quad+C_{2}\eta l^{*}\sqrt{C_{u}}(\sigma_{\xi}+\sigma_{r}\sqrt{C_{u}})\frac{r(d_{1}+d_{2})}{n\varepsilon}\log^{1/2}n\log^{1/2}\left(\frac{1.25l^{*}}{\delta}\right),

with probability at least 1(d1+d2)10exp((d1+d2))1-(d_{1}+d_{2})^{-10}-\exp(-(d_{1}+d_{2})) where we define

ρ0:=ηn(1R2rc0R4rc0).\rho_{0}:=\frac{\eta}{n}\left(1-R_{2r}-c_{0}-R_{4r}c_{0}\right).

Suppose that c_{0}\lesssim\frac{1}{R_{2r}(1+R_{4r})}\wedge\frac{1}{8} and that the step size \eta\leq n is chosen so that \eta/n is a small constant; then we have 0\leq\rho_{0}<1. Further, when the sample size satisfies, for some absolute constant C_{3}>0,

nC3max{\displaystyle n\geq C_{3}\max\Bigg{\{} η2(σξσr)2Cur(d1+d2)log(d1+d2),\displaystyle\eta^{2}\left(\frac{\sigma_{\xi}}{\sigma_{r}}\right)^{2}C_{u}r(d_{1}+d_{2})\log(d_{1}+d_{2}),
ηlCu(σξ+σrCuσr)r(d1+d2)εlog1/2nlog1/2(1.25lδ)},\displaystyle\eta l^{*}\sqrt{C_{u}}\left(\frac{\sigma_{\xi}+\sigma_{r}\sqrt{C_{u}}}{\sigma_{r}}\right)\frac{r(d_{1}+d_{2})}{\varepsilon}\log^{1/2}n\log^{1/2}\left(\frac{1.25l^{*}}{\delta}\right)\Bigg{\}},

we have for some small constant 0<ρ1<10<\rho_{1}<1,

MlMη𝒫𝕋l(Gl)+𝒫𝕋lNlF(1ρ1)MlMF(1ρ1)c0σr.\|M_{l}-M-\eta\mathcal{P}_{\mathbb{T}_{l}}(G_{l})+\mathcal{P}_{\mathbb{T}_{l}}N_{l}\|_{F}\leq\left(1-\rho_{1}\right)\|M_{l}-M\|_{F}\leq(1-\rho_{1})c_{0}\sigma_{r}.

Applying Lemma 4, we obtain that under the condition 40c0<ρ1240c_{0}<\frac{\rho_{1}}{2},

Ml+1MF\displaystyle\left\|M_{l+1}-M\right\|_{F} (1+40c0)MlMη𝒫𝕋l(Gl)+𝒫𝕋lNlF\displaystyle\leq\left(1+40c_{0}\right)\left\|M_{l}-M-\eta\mathcal{P}_{\mathbb{T}_{l}}(G_{l})+\mathcal{P}_{\mathbb{T}_{l}}N_{l}\right\|_{F} (S.19)
(1+40c0)(1ρ1)MlMF\displaystyle\leq\left(1+40c_{0}\right)(1-\rho_{1})\|M_{l}-M\|_{F}
(1ρ12)MlMF(1ρ12)c0σr.\displaystyle\leq\left(1-\frac{\rho_{1}}{2}\right)\|M_{l}-M\|_{F}\leq\left(1-\frac{\rho_{1}}{2}\right)c_{0}\sigma_{r}.

In summary, conditioned on the event l\mathcal{E}_{l}^{*}, the event

l+1:={Ml+1MFc0σr},\mathcal{E}_{l+1}:=\left\{\left\|M_{l+1}-M\right\|_{F}\leq c_{0}\sigma_{r}\right\},

occurs with probability at least 1(d1+d2)10exp((d1+d2))1-(d_{1}+d_{2})^{-10}-\exp(-(d_{1}+d_{2})). Besides, according to Δl\mathcal{E}_{\Delta_{l}}^{\prime} defined in (S.15), the event

Δl+1:={Δl+1C~3ηn(σξ+σrCu)Cur(d1+d2+logn)logn},\mathcal{E}_{\Delta_{l+1}}:=\left\{\Delta_{l+1}\leq\widetilde{C}_{3}\frac{\eta}{n}\left(\sigma_{\xi}+\sigma_{r}\sqrt{C_{u}}\right)\sqrt{C_{u}r\left(d_{1}+d_{2}+\log n\right)\log n}\right\},

occurs with probability at least 1-n^{-9}-\exp\left(-10C_{u}\left(d_{1}+d_{2}\right)\right)n^{-9}. Therefore, conditioned on \mathcal{E}_{l}^{*}, the event \mathcal{E}_{l+1}^{*} happens with probability at least 1-(d_{1}+d_{2})^{-10}-\exp(-(d_{1}+d_{2}))-n^{-9}-\exp\left(-10C_{u}\left(d_{1}+d_{2}\right)\right)n^{-9}.

To this end, we have finished the induction and conclude Part C.2 with

(i=0li)\displaystyle\mathbb{P}\left(\bigcap_{i=0}^{l}\mathcal{E}_{i}^{*}\right) 1c~2exp(c~3r(d1+d2))(d1+d2)10n9ed1ed21020r\displaystyle\geq 1-\widetilde{c}_{2}\exp\left(-\widetilde{c}_{3}r(d_{1}+d_{2})\right)-(d_{1}+d_{2})^{-10}-n^{-9}-e^{-d_{1}}-e^{-d_{2}}-10^{-20r}
l((d1+d2)10+exp((d1+d2)))(l+1)(n9+exp(10Cu(d1+d2))n9).\displaystyle\quad-l\left((d_{1}+d_{2})^{-10}+\exp(-(d_{1}+d_{2}))\right)-(l+1)\left(n^{-9}+\exp\left(-10C_{u}(d_{1}+d_{2})\right)n^{-9}\right).

C.3 Part 3

In Part C.3, we derive the convergence rate of MlMF\left\|M_{l^{*}}-M\right\|_{F} and choose an appropriate value for ll^{*}. Conditioned on the event i=0l1i\bigcap_{i=0}^{l^{*}-1}\mathcal{E}_{i}^{*}, according to (S.19), with probability at least 1(d1+d2)10exp((d1+d2))1-(d_{1}+d_{2})^{-10}-\exp(-(d_{1}+d_{2}))

M~lMF=MlMF\displaystyle\left\|\widetilde{M}_{l^{*}}-M\right\|_{F}=\left\|M_{l^{*}}-M\right\|_{F}
(1ρ0)lM0MF+(l=0l1(1ρ0)ll1)C2ησξCur(d1+d2)nlog1/2(d1+d2)\displaystyle\leq\left(1-\rho_{0}\right)^{l^{*}}\|M_{0}-M\|_{F}+\left(\sum_{l=0}^{l^{*}-1}\left(1-\rho_{0}\right)^{l^{*}-l-1}\right)C_{2}\eta\sigma_{\xi}\sqrt{C_{u}}\sqrt{\frac{r(d_{1}+d_{2})}{n}}\log^{1/2}(d_{1}+d_{2})
+(l=0l1(1ρ0)ll1)C2ηlCu(σξ+σrCu)r(d1+d2)nεlog1/2nlog1/2(1.25lδ).\displaystyle\quad+\left(\sum_{l=0}^{l^{*}-1}\left(1-\rho_{0}\right)^{l^{*}-l-1}\right)C_{2}\eta l^{*}\sqrt{C_{u}}(\sigma_{\xi}+\sigma_{r}\sqrt{C_{u}})\frac{r(d_{1}+d_{2})}{n\varepsilon}\log^{1/2}n\log^{1/2}\left(\frac{1.25l^{*}}{\delta}\right).

Let \|M_{0}-M\|_{F}=c_{0}^{*} and l^{*}:=\log\left(c_{0}^{*}n\right)/\rho_{0}; then \left(1-\rho_{0}\right)^{l^{*}}\|M_{0}-M\|_{F}\leq e^{-\rho_{0}l^{*}}c_{0}^{*}=\frac{1}{n}, indicating that there is little reason to run the algorithm for more than O(\log n) iterations.

In conclusion,

M~lMF\displaystyle\left\|\widetilde{M}_{l^{*}}-M\right\|_{F} C3σξCur(d1+d2)nlog1/2(d1+d2)\displaystyle\leq C_{3}\sigma_{\xi}\sqrt{C_{u}}\sqrt{\frac{r(d_{1}+d_{2})}{n}}\log^{1/2}(d_{1}+d_{2})
+C3Cu(σξ+σrCu)r(d1+d2)nεlog1/2nlog3/2(1.25lδ).\displaystyle+C_{3}\sqrt{C_{u}}(\sigma_{\xi}+\sigma_{r}\sqrt{C_{u}})\frac{r(d_{1}+d_{2})}{n\varepsilon}\log^{1/2}n\log^{3/2}\left(\frac{1.25l^{*}}{\delta}\right).

with probability at least

1\displaystyle 1 c~2exp(c~3r(d1+d2))(d1+d2)10n9ed1ed21020r\displaystyle-\widetilde{c}_{2}\exp\left(-\widetilde{c}_{3}r(d_{1}+d_{2})\right)-(d_{1}+d_{2})^{-10}-n^{-9}-e^{-d_{1}}-e^{-d_{2}}-10^{-20r}
((d1+d2)10+exp((d1+d2))+n9+exp(10Cu(d1+d2))n9)logn.\displaystyle-\left((d_{1}+d_{2})^{-10}+\exp(-(d_{1}+d_{2}))+n^{-9}+\exp\left(-10C_{u}(d_{1}+d_{2})\right)n^{-9}\right)\log n.

Appendix D The lower bound derived by score attack argument

Let \mathbb{M}_{r}:=\{M\in\mathbb{R}^{d_{1}\times d_{2}}:\operatorname{rank}(M)=r\}. This section establishes the minimax lower bound for differentially privately estimating the matrix M\in\bigcup_{k=1}^{r}\mathbb{M}_{k} within the trace regression model based on an alternative approach, the score attack argument of Cai et al. (2023).

The score attack argument involves designing a test statistic and establishing the lower bound of the statistic with the help of a prior distribution on the parameters to estimate. It is unclear, however, how to construct a prior distribution for the low-rank matrix M such that the prior complies with the parameter space \mathbb{M}_{r} and the score attack is easy to compute at the same time. Compared to DP-Fano's lemma (see Lemma 3), which requires \delta\lesssim e^{-n}, the score attack argument is valid for the wider range \delta\lesssim n^{-(1+\gamma)}, where \gamma>0 is a constant. We first define some necessary notation for the elaboration of the score attack argument. For any matrices B,C\in\mathbb{R}^{d_{1}\times d_{2}}, we denote by \operatorname{supp}(B):=\{(i,j)\in[d_{1}]\times[d_{2}]:B_{ij}\neq 0\} the support of B, and the matrix C restricted to \operatorname{supp}(B) is [C]_{\operatorname{supp}(B)}=\sum_{i=1}^{d_{1}}\sum_{j=1}^{d_{2}}C_{ij}e_{i}e_{j}^{\top}\mathbb{I}(B_{ij}\neq 0), where e_{i} is the i-th canonical basis vector in \mathbb{R}^{d_{1}} and e_{j} is the j-th canonical basis vector in \mathbb{R}^{d_{2}}.

To apply the score attack argument, we relax the problem to deriving minimax lower bounds over \mathbb{M}_{r,d_{1}}:=\{M\in\mathbb{R}^{d_{1}\times d_{2}}:\operatorname{supp}(M)\subset[d_{1}]\times[r]\}\subset\bigcup_{k=1}^{r}\mathbb{M}_{k}. The benefit is that there exists a trivial prior on M\in\mathbb{M}_{r,d_{1}} such that M_{ij}\stackrel{{i.i.d.}}{{\sim}}\mathcal{N}(0,1) for (i,j)\in[d_{1}]\times[r] and M_{ij}=0 otherwise. Similarly, we may consider establishing the minimax lower bound over \mathbb{M}_{r,d_{2}}:=\{M\in\mathbb{R}^{d_{1}\times d_{2}}:\operatorname{supp}(M)\subset[r]\times[d_{2}]\}\subset\bigcup_{k=1}^{r}\mathbb{M}_{k}. For any M\in\mathbb{M}_{r,d_{2}}, there is a trivial prior as well. Let A be a randomized algorithm mapping a dataset Z to a d_{1}\times d_{2} matrix. We define the DP-constrained minimax risk over \bigcup_{k=1}^{r}\mathbb{M}_{k} as

\operatorname{risk}(\bigcup_{k=1}^{r}\mathbb{M}_{k}):=\inf_{A}\sup_{M\in\bigcup_{k=1}^{r}\mathbb{M}_{k}}\mathbb{E}\left\|A(Z)-M\right\|_{F}^{2},

where the infimum is taken over all (\varepsilon,\delta)-DP algorithms A. Similarly, we define \operatorname{risk}(\mathbb{M}_{r,d_{1}}) and \operatorname{risk}(\mathbb{M}_{r,d_{2}}). Since \mathbb{M}_{r,d_{1}}\subset\bigcup_{k=1}^{r}\mathbb{M}_{k} and \mathbb{M}_{r,d_{2}}\subset\bigcup_{k=1}^{r}\mathbb{M}_{k}, we have

risk(k=1r𝕄k)risk(𝕄r,d1)risk(𝕄r,d2),\operatorname{risk}(\bigcup_{k=1}^{r}\mathbb{M}_{k})\geq\operatorname{risk}(\mathbb{M}_{r,d_{1}})\bigvee\operatorname{risk}(\mathbb{M}_{r,d_{2}}), (S.20)

which indicates that the lower bound of risk(k=1r𝕄k)\operatorname{risk}(\bigcup_{k=1}^{r}\mathbb{M}_{k}) will be an immediate result once we successfully lower bound risk(𝕄r,d1)\operatorname{risk}(\mathbb{M}_{r,d_{1}}) and risk(𝕄r,d2)\operatorname{risk}(\mathbb{M}_{r,d_{2}}).

Next, we construct score attacks to derive the lower bounds of risk(𝕄r,d1)\operatorname{risk}(\mathbb{M}_{r,d_{1}}) and risk(𝕄r,d2)\operatorname{risk}(\mathbb{M}_{r,d_{2}}). Let z=(X,y)z=(X,y) be the pair of a measurement matrix and its corresponding response variable, drawn independently from (3). The score function is defined by

SM(z):=MlogfM(z)=MlogfM(y|X),\displaystyle S_{M}(z):=\nabla_{M}\log f_{M}(z)=\nabla_{M}\log f_{M}(y|X),

and the score attack is defined by

𝒜M(1)(z,A(Z)):=[A(Z)M][d1]×[r],SM(z),\mathcal{A}^{(1)}_{M}(z,A(Z)):=\left<\left[A(Z)-M\right]_{[d_{1}]\times[r]},S_{M}(z)\right>,

where A is an (\varepsilon,\delta)-DP algorithm to estimate M\in\mathbb{M}_{r}; z=(X,y) is a piece of datum that we want to test for membership in Z=\{(X_{i},y_{i})\}_{i=1}^{n}; the quantity \left[A(Z)-M\right]_{[d_{1}]\times[r]} is obtained by restricting A(Z)-M\in\mathbb{R}^{d_{1}\times d_{2}} to the index set [d_{1}]\times[r]. Under some regularity conditions, the score attack \mathcal{A}^{(1)}_{M}(z,A(Z)) will lead to the lower bound of \operatorname{risk}(\mathbb{M}_{r,d_{1}}). Similarly, we derive the lower bound of \operatorname{risk}(\mathbb{M}_{r,d_{2}}) with the help of the attack

𝒜M(2)(z,A(Z)):=[A(Z)M][r]×[d2],SM(z).\mathcal{A}^{(2)}_{M}(z,A(Z)):=\left<\left[A(Z)-M\right]_{[r]\times[d_{2}]},S_{M}(z)\right>.
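For concreteness, under the Gaussian noise of model (1) the score function has a simple closed form (up to terms not depending on M), which is what makes both attacks computable:

S_{M}(z)=\nabla_{M}\log f_{M}(y|X)=\nabla_{M}\left(-\frac{\left(y-\left<X,M\right>\right)^{2}}{2\sigma_{\xi}^{2}}\right)=\frac{y-\left<X,M\right>}{\sigma_{\xi}^{2}}\,X.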

Finally, Theorem 5 establishes the lower bound for estimating the low-rank matrix M; its proof is given after the statement.

Theorem 5.

Consider i.i.d. observations Z=\{z_{1},\cdots,z_{n}\} drawn from the trace regression model defined in (1), where z_{i}:=(X_{i},y_{i}) for i=1,\cdots,n. We assume that \{X_{i}\}_{i\in[n]} satisfy Assumption 1, r(d_{1}\vee d_{2})\lesssim n\varepsilon, 0<\varepsilon<1 and \delta\lesssim n^{-(1+\gamma)} for some \gamma>0; then

risk(k=1r𝕄k)=infAsupMk=1r𝕄k𝔼A(Z)MF2σξ2r(d1d2)na1+σξ2r2(d1d2)2n2ε2a2.\operatorname{risk}(\bigcup_{k=1}^{r}\mathbb{M}_{k})=\inf_{A}\sup_{M\in\bigcup_{k=1}^{r}\mathbb{M}_{k}}\mathbb{E}\left\|A(Z)-M\right\|_{F}^{2}\gtrsim\underbrace{\sigma_{\xi}^{2}\;\frac{r(d_{1}\vee d_{2})}{n}}_{a_{1}}+\underbrace{\sigma_{\xi}^{2}\;\frac{r^{2}(d_{1}\vee d_{2})^{2}}{n^{2}\varepsilon^{2}}}_{a_{2}}. (S.21)

By Theorem 5, the lower bound of \operatorname{risk}(\bigcup_{k=1}^{r}\mathbb{M}_{k}) consists of two terms: the first term a_{1} accounts for the statistical error and the second term a_{2} is the cost of privacy. The proof for a_{1} can be found in Rohde and Tsybakov (2011), and the cost of privacy is deduced in the following proof.

Proof of Theorem 5.

We now start proving Theorem 5 by the score attack argument. Throughout the proof, we assume that Z=\{z_{1},\cdots,z_{n}\} is an i.i.d. sample drawn from f_{M} and Z_{i}^{\prime} is a neighbouring data set of Z obtained by replacing z_{i} with an independent copy z_{i}^{\prime}\sim f_{M}. Besides, we mainly focus on the case M\in\mathbb{M}_{r,d_{1}} and state the result for the case M\in\mathbb{M}_{r,d_{2}} in Remark 1. Let

𝒜M(1)(z,A(Z)):=[A(Z)M][d1]×[r],SM(z).\mathcal{A}^{(1)}_{M}(z,A(Z)):=\left<\left[A(Z)-M\right]_{[d_{1}]\times[r]},S_{M}(z)\right>.

We derive the lower bound of risk(𝕄r,d1):=infAsupM𝕄r,d1𝔼A(Z)MF2,\operatorname{risk}(\mathbb{M}_{r,d_{1}}):=\inf_{A}\sup_{M\in\mathbb{M}_{r,d_{1}}}\mathbb{E}\left\|A(Z)-M\right\|_{F}^{2}, in three steps. For ease of notation, we define

Ai:=𝒜M(zi,A(Zi))andAi:=𝒜M(zi,A(Z)).\displaystyle A_{i}^{\prime}:=\mathcal{A}_{M}\left(z_{i},A(Z_{i}^{\prime})\right)\quad{\rm and}\quad A_{i}:=\mathcal{A}_{M}\left(z_{i},A(Z)\right).

Step 1: bounding the summation. The following Lemma 10 bounds \mathbb{E}\left|A_{i}^{\prime}\right|; Lemma 11 develops the upper bound of \sum_{i\in[n]}\mathbb{E}\,A_{i} based on the bound of \mathbb{E}\left|A_{i}^{\prime}\right| from Lemma 10 and a tuning parameter T. The proofs of Lemmas 10 and 11 can be found in Appendices E.7 and E.8.

Lemma 10.

For i[n]i\in[n], we have 𝔼Ai=0and𝔼|Ai|𝔼A(Z)MF2Cuσξ2.\mathbb{E}A_{i}^{\prime}=0\quad{\rm and}\quad\mathbb{E}\left|A_{i}^{\prime}\right|\leq\sqrt{\mathbb{E}\left\|A(Z)-M\right\|^{2}_{F}}\sqrt{\frac{C_{u}}{\sigma_{\xi}^{2}}}.

Lemma 11.

Let AA be an (ε,δ)(\varepsilon,\delta)-DP algorithm with 0<ε<10<\varepsilon<1 and δ0\delta\geq 0, under model (1), by choosing T=2/σξ2rd1log(1δ)T=\sqrt{2/\sigma_{\xi}^{2}}rd_{1}\sqrt{\log(\frac{1}{\delta})}, we have

i[n]𝔼Ai2nε𝔼A(Z)MF2Cu/σξ2+42δrd1log(1/δ)/σξ2.\displaystyle\sum_{i\in[n]}\mathbb{E}A_{i}\leq 2n\varepsilon\sqrt{\mathbb{E}\|A(Z)-M\|_{F}^{2}}\sqrt{C_{u}/\sigma_{\xi}^{2}}+4\sqrt{2}\delta rd_{1}\sqrt{\log(1/\delta)/\sigma_{\xi}^{2}}. (S.22)

Step 2: lower bounding the summation. Under some regularity conditions, the following Lemma 12 characterizes the quantity \sum_{i\in[n]}A_{i} as a summation of functions of M. Lemma 13 lower bounds this summation by assigning an appropriate prior distribution \pi to M. The proofs of Lemmas 12 and 13 can be found in Appendix E.8.

Lemma 12.

If for every (i,j)[d1]×[r],logfM(Z)(i,j)\in[d_{1}]\times[r],\log f_{M}(Z) is continuously differentiable with respect to MijM_{ij} and |MijlogfM(Z)|<hij(Z)\left|\frac{\partial}{\partial M_{ij}}\log f_{M}(Z)\right|<h_{ij}(Z) such that 𝔼|hij(Z)A(M)ij|<\mathbb{E}\left|h_{ij}(Z)A(M)_{ij}\right|<\infty, we have

\sum_{i\in[n]}\mathbb{E}\,\mathcal{A}^{(1)}_{M}\left(z_{i},A(Z)\right)=\sum_{(i,j)\in[d_{1}]\times[r]}\frac{\partial}{\partial M_{ij}}\mathbb{E}A(Z)_{ij}.

Lemma 12 has its general form stated in Theorem 2.1 of Cai et al. (2023). Let g_{ij} be the function defined by g_{ij}(M):=\left(\mathbb{E}_{Z|M}A(Z)\right)_{ij} for all (i,j)\in[d_{1}]\times[r]; then

(i,j)[d1]×[r]Mij𝔼A(Z)ij=𝔼π((i,j)[d1]×[r]Mijgij).\sum_{(i,j)\in[d_{1}]\times[r]}\frac{\partial}{\partial M_{ij}}\mathbb{E}A(Z)_{ij}=\mathbb{E}_{\pi}\left(\sum_{(i,j)\in[d_{1}]\times[r]}\frac{\partial}{\partial M_{ij}}g_{ij}\right).

Lemma 13 lower bounds this quantity by assigning the prior distribution π\pi to MM such that Mij𝒩(0,1)M_{ij}\sim\mathcal{N}(0,1) for all (i,j)[d1]×[r](i,j)\in[d_{1}]\times[r] and otherwise, Mij=0M_{ij}=0.

Lemma 13.

Let M\in\mathbb{M}_{r,d_{1}} be distributed according to a density \pi whose marginal densities are \{\pi_{ij}\} for i=1,\cdots,d_{1} and j=1,\cdots,d_{2}, such that \pi_{ij}\sim\mathcal{N}(0,1) for all (i,j)\in[d_{1}]\times[r] and otherwise \pi_{ij} is the point mass at zero, that is, \mathbb{P}(M_{ij}=0)=1. Then,

\mathbb{E}_{\pi}\left(\sum_{(i,j)\in[d_{1}]\times[r]}\frac{\partial}{\partial M_{ij}}g_{ij}\right)\geq\sum_{(i,j)\in[d_{1}]\times[r]}\mathbb{E}_{\pi_{ij}}M_{ij}^{2}-\sqrt{C}\sqrt{\sum_{(i,j)\in[d_{1}]\times[r]}\mathbb{E}_{\pi_{ij}}M_{ij}^{2}}=rd_{1}-\sqrt{Crd_{1}}\gtrsim rd_{1}.

Combining Lemmas 12 and 13, we obtain

i[n]𝔼Ai=i[n]𝔼𝒜M(zi,A(Z))rd1.\sum_{i\in[n]}\mathbb{E}A_{i}=\sum_{i\in[n]}\mathbb{E}\,\mathcal{A}_{M}\left(z_{i},A(Z)\right)\gtrsim rd_{1}. (S.23)

Step 3: combining the upper and lower bounds. Combining the lower bound (S.23)(\ref{equ: lower_bound_summation_A_i}) of i[n]𝔼Ai\sum_{i\in[n]}\mathbb{E}A_{i} and the upper bound (S.22) of i[n]𝔼Ai\sum_{i\in[n]}\mathbb{E}A_{i}, we have

2nε𝔼π𝔼ZMA(Z)MF2Cu/σξ2rd142δrd1log(1/δ)/σξ2.2n\varepsilon\sqrt{\mathbb{E}_{\pi}\mathbb{E}_{Z\mid M}\|A(Z)-M\|_{F}^{2}}\sqrt{C_{u}/\sigma_{\xi}^{2}}\gtrsim rd_{1}-4\sqrt{2}\delta rd_{1}\sqrt{\log(1/\delta)/\sigma_{\xi}^{2}}.

Under the assumption that δ<n(1+γ)\delta<n^{-(1+\gamma)} for some γ>0\gamma>0, we have rd142δrd1log(1/δ)/σξ2rd1rd_{1}-4\sqrt{2}\delta rd_{1}\sqrt{\log(1/\delta)/\sigma_{\xi}^{2}}\gtrsim rd_{1}, and therefore 𝔼π𝔼ZMA(Z)MF2σξ2r2d12n2ε2.\mathbb{E}_{\pi}\mathbb{E}_{Z\mid M}\left\|A(Z)-M\right\|_{F}^{2}\gtrsim\sigma_{\xi}^{2}\cdot\frac{r^{2}d_{1}^{2}}{n^{2}\varepsilon^{2}}. Since the sup-risk is greater than the Bayesian risk,

supM𝕄r,d1𝔼ZMA(Z)MF2σξ2r2d12n2ε2.\sup_{M\in\mathbb{M}_{r,d_{1}}}\mathbb{E}_{Z\mid M}\left\|A(Z)-M\right\|_{F}^{2}\gtrsim\sigma_{\xi}^{2}\cdot\frac{r^{2}d_{1}^{2}}{n^{2}\varepsilon^{2}}.

Furthermore, due to 𝕄r,d1k=1r𝕄k\mathbb{M}_{r,d_{1}}\subset\bigcup_{k=1}^{r}\mathbb{M}_{k}, we have supMk=1r𝕄k𝔼ZMA(Z)MF2σξ2r2d12n2ε2\sup_{M\in\bigcup_{k=1}^{r}\mathbb{M}_{k}}\mathbb{E}_{Z\mid M}\left\|A(Z)-M\right\|_{F}^{2}\gtrsim\sigma_{\xi}^{2}\cdot\frac{r^{2}d_{1}^{2}}{n^{2}\varepsilon^{2}} and

infAsupMk=1r𝕄k𝔼ZMA(Z)MF2σξ2r2d12n2ε2,\inf_{A}\sup_{M\in\bigcup_{k=1}^{r}\mathbb{M}_{k}}\mathbb{E}_{Z\mid M}\left\|A(Z)-M\right\|_{F}^{2}\gtrsim\sigma_{\xi}^{2}\cdot\frac{r^{2}d_{1}^{2}}{n^{2}\varepsilon^{2}},

where AA is an (ε,δ)(\varepsilon,\delta)-DP algorithm that satisfies 𝔼ZMA(Z)MF21\mathbb{E}_{Z\mid M}\left\|A(Z)-M\right\|_{F}^{2}\lesssim 1. This conclusion extends to all differentially private AA if we assume that rd1nεrd_{1}\lesssim n\varepsilon such that r2d12n2ε21\frac{r^{2}d_{1}^{2}}{n^{2}\varepsilon^{2}}\lesssim 1.

Remark 1.

Lemmas 11 and 13 are also applicable to the case where the parameter space is 𝕄r,d2\mathbb{M}_{r,d_{2}}. For M𝕄r,d2M\in\mathbb{M}_{r,d_{2}}, Lemma 11 implies that

i[n]𝔼Ai2nε𝔼A(Z)MF2Cu/σξ2+42δrd2log(1/δ)/σξ2;\sum_{i\in[n]}\mathbb{E}A_{i}\leq 2n\varepsilon\sqrt{\mathbb{E}\|A(Z)-M\|_{F}^{2}}\sqrt{C_{u}/\sigma_{\xi}^{2}}+4\sqrt{2}\delta rd_{2}\sqrt{\log(1/\delta)/\sigma_{\xi}^{2}};

and Lemma 13 results in 𝔼((i,j)[r]×[d2]Mijgij)rd2.\mathbb{E}\left(\sum_{(i,j)\in[r]\times[d_{2}]}\frac{\partial}{\partial M_{ij}}g_{ij}\right)\gtrsim rd_{2}. Therefore, as δ<n(1+γ)\delta<n^{-(1+\gamma)} for some γ>0\gamma>0, we obtain the minimax lower bound

infAsupMk=1r𝕄k𝔼ZMA(Z)MF2σξ2r2d22n2ε2,\inf_{A}\sup_{M\in\bigcup_{k=1}^{r}\mathbb{M}_{k}}\mathbb{E}_{Z\mid M}\left\|A(Z)-M\right\|_{F}^{2}\gtrsim\sigma_{\xi}^{2}\cdot\frac{r^{2}d_{2}^{2}}{n^{2}\varepsilon^{2}},

where AA is an (ε,δ)(\varepsilon,\delta)-DP algorithm that satisfies 𝔼ZMA(Z)MF21\mathbb{E}_{Z\mid M}\left\|A(Z)-M\right\|_{F}^{2}\lesssim 1. Similarly, this conclusion extends to all differentially private AA if we assume that rd2nεrd_{2}\lesssim n\varepsilon such that r2d22n2ε21\frac{r^{2}d_{2}^{2}}{n^{2}\varepsilon^{2}}\lesssim 1.

Appendix E Proofs of Technical Lemmas

E.1 Proof of Lemma 1

Lemma 1 is a consequence of Proposition 10.4, Vershynin (2015) and Lemma 1, Chen et al. (2019) by setting cl2=Clc_{l}^{2}=C_{l} and cu2=Cuc_{u}^{2}=C_{u} and τ2CuCl\tau^{2}\asymp\frac{\sqrt{C_{u}}}{\sqrt{C_{l}}}. See the definition of cl2c_{l}^{2} and cu2c_{u}^{2} in Lemma 1, Chen et al. (2019).

E.2 Proof of Lemma 2

The proof of Lemma 2 involves applying the symmetric dilation trick to Theorem 1, Xia (2021).

Lemma 14 (Theorem 1, Xia (2021)).

Let Bd×dB\in\mathbb{R}^{d\times d} be a rank-rr symmetric matrix with eigen-decomposition of the form B=ΘΛΘB=\Theta\Lambda\Theta^{\top}, where Θ𝕆d,r\Theta\in\mathbb{O}_{d,r} and the diagonal matrix Λ=diag(λ1,,λr)\Lambda={\rm diag}(\lambda_{1},\cdots,\lambda_{r}) contains the eigenvalues of BB arranged in non-increasing order. Let B^=B+ΔB\widehat{B}=B+\Delta_{B} be another d×dd\times d symmetric matrix whose leading rr eigenvectors are given by Θ^𝕆d,r\widehat{\Theta}\in\mathbb{O}_{d,r}. Then,

Θ^Θ^ΘΘ=k1𝒮B,k(ΔB),\widehat{\Theta}\widehat{\Theta}^{\top}-\Theta\Theta^{\top}=\sum_{k\geq 1}\mathcal{S}_{B,k}(\Delta_{B}),

where the kk-th order term 𝒮B,k(ΔB)\mathcal{S}_{B,k}(\Delta_{B}) is a summation of (2kk)\binom{2k}{k} terms defined by

𝒮B,k(ΔB)=𝐬:s1++sk+1=k(1)1+τ(𝐬)Qs1ΔBQs2ΔBQsk+1,\mathcal{S}_{B,k}(\Delta_{B})=\sum_{\mathbf{s}:s_{1}+\ldots+s_{k+1}=k}(-1)^{1+\tau(\mathbf{s})}\cdot Q^{-s_{1}}\Delta_{B}Q^{-s_{2}}\ldots\Delta_{B}Q^{-s_{k+1}},

where 𝐬=(s1,,sk+1)\mathbf{s}=\left(s_{1},\ldots,s_{k+1}\right) contains non-negative indices and τ(𝐬)=j=1k+1𝕀(sj>0).\tau(\mathbf{s})=\sum_{j=1}^{k+1}\mathbb{I}\left(s_{j}>0\right).

Lemma 14 provides an explicit representation formula for the spectral projector Θ^Θ^\widehat{\Theta}\widehat{\Theta}^{\top} given that BB is symmetric and of rank rr. Since we are interested in the asymmetric rank-rr matrix M=UΣVd1×d2M=U\Sigma V^{\top}\in\mathbb{R}^{d_{1}\times d_{2}}, we apply the symmetric dilation trick to MM and obtain the rank-2r2r symmetric matrix MM_{*}, whose eigendecomposition takes the form

M_{*}=U_{M_{*}}\Sigma_{M_{*}}U_{M_{*}}^{\top}=\frac{1}{\sqrt{2}}\left(\begin{array}[]{cc}U&U\\ V&-V\end{array}\right)\left(\begin{array}[]{cc}\Sigma&0\\ 0&-\Sigma\end{array}\right)\frac{1}{\sqrt{2}}\left(\begin{array}[]{cc}U&U\\ V&-V\end{array}\right)^{\top}.

The proof is finished by applying Lemma 14 with B=MB=M_{*}, B^=M+Δ\widehat{B}=M_{*}+\Delta_{*}, d=d1+d2d=d_{1}+d_{2}, rank 2r2r, and

Θ=12(UUVV)andΘ^=12(U^U^V^V^).\displaystyle\Theta=\frac{1}{\sqrt{2}}\left(\begin{array}[]{cc}U&U\\ V&-V\end{array}\right)\quad{\rm and}\quad\widehat{\Theta}=\frac{1}{\sqrt{2}}\left(\begin{array}[]{cc}\widehat{U}&\widehat{U}\\ \widehat{V}&-\widehat{V}\end{array}\right).
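As a quick numerical sanity check of the dilation identity above (not part of the proof; the dimensions and singular values below are arbitrary choices of ours, and M_{*}=\left(\begin{smallmatrix}0&M\\ M^{\top}&0\end{smallmatrix}\right) is the standard symmetric dilation), one can verify the claimed eigendecomposition directly:

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, r = 6, 5, 2

# A random rank-r matrix M = U Sigma V^T.
U, _ = np.linalg.qr(rng.standard_normal((d1, r)))
V, _ = np.linalg.qr(rng.standard_normal((d2, r)))
Sigma = np.diag([3.0, 1.5])
M = U @ Sigma @ V.T

# Symmetric dilation M_* = [[0, M], [M^T, 0]].
M_star = np.block([[np.zeros((d1, d1)), M], [M.T, np.zeros((d2, d2))]])

# Claimed eigenvectors Theta = (1/sqrt(2)) [[U, U], [V, -V]] with
# eigenvalues (Sigma, -Sigma).
Theta = np.hstack([np.vstack([U, V]), np.vstack([U, -V])]) / np.sqrt(2)
Lam = np.diag(np.concatenate([np.diag(Sigma), -np.diag(Sigma)]))

print(np.allclose(M_star, Theta @ Lam @ Theta.T))   # True
print(np.allclose(Theta.T @ Theta, np.eye(2 * r)))  # True
```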

E.3 Proofs of Lemmas 3 and 4

See the proof of Lemma 3 in Acharya et al. (2021) and Cai et al. (2024), and the proof of Lemma 4 in Shen et al. (2023).

E.4 Proof of Lemma 5

By the triangle inequality,

\max_{i\in[n]}\big{\|}\Delta-\Delta^{(i)}\big{\|}\leq\max_{i\in[n]}\frac{1}{n}\left\|\operatorname{mat}(\Lambda_{i}^{-1}\operatorname{vec}(X_{i}))\left(\left<X_{i},M\right>+\xi_{i}\right)\right\|+\max_{i\in[n]}\frac{1}{n}\left\|\operatorname{mat}((\Lambda_{i}^{\prime})^{-1}\operatorname{vec}(X_{i}^{\prime}))\left(\left<X_{i}^{\prime},M\right>+\xi_{i}^{\prime}\right)\right\|,

where ξiψ2=σξ\|\xi_{i}\|_{\psi_{2}}=\sigma_{\xi} and Xi,MN(0,vec(M)Λivec(M)),Λi1vec(Xi)N(0,Λi1).\left\langle X_{i},M\right\rangle\sim N\left(0,\operatorname{vec}\left(M\right)^{\top}\Lambda_{i}\operatorname{vec}\left(M\right)\right),\quad\Lambda_{i}^{-1}\operatorname{vec}\left(X_{i}\right)\sim N\left(0,\Lambda_{i}^{-1}\right).

\left\|\left\|\left(\xi_{i}+\left\langle X_{i},M\right\rangle\right)\operatorname{mat}\left(\Lambda_{i}^{-1}\operatorname{vec}\left(X_{i}\right)\right)\right\|\right\|_{\Psi_{1}}\leq\left\|\xi_{i}+\left\langle X_{i},M\right\rangle\right\|_{\Psi_{2}}\left\|\left\|\operatorname{mat}\left(\Lambda_{i}^{-1}\operatorname{vec}\left(X_{i}\right)\right)\right\|\right\|_{\Psi_{2}}\leq c_{0}\sqrt{C_{l}^{-1}}\left(\sigma_{\xi}+\sqrt{C_{u}}\sqrt{r}\sigma_{1}\right),

for some absolute constant c0>0c_{0}>0. Therefore, for some absolute constant C3>0C_{3}>0, with probability at least 1n101-n^{-10},

mat(Λi1vec(Xi))(Xi,M+ξi)C3Cl1(σξ+Curσ1)logn.\left\|\operatorname{mat}(\Lambda_{i}^{-1}\operatorname{vec}(X_{i}))\left(\left<X_{i},M\right>+\xi_{i}\right)\right\|\leq C_{3}\sqrt{C_{l}^{-1}}\left(\sigma_{\xi}+\sqrt{C_{u}}\sqrt{r}\sigma_{1}\right)\log n.

Taking the maximum over i[n]i\in[n] via a union bound, with probability at least 1n91-n^{-9},

maxi[n]ΔΔ(i)C3n1Cl1(Curσ1+σξ)logn.\displaystyle\max_{i\in[n]}\big{\|}\Delta-\Delta^{(i)}\big{\|}\leq C_{3}\cdot n^{-1}\sqrt{C_{l}^{-1}}\left(\sqrt{C_{u}r}\sigma_{1}+\sigma_{\xi}\right)\log n.

In (S.1), we have already shown that for some absolute constant C1>0C_{1}>0,

Δ=L^MC1Cl1(σξ+Curσ1)(d1d2)log(d1+d2)n,\|\Delta\|=\|\widehat{L}-M\|\leq C_{1}\sqrt{C_{l}^{-1}}\left(\sigma_{\xi}+\sqrt{C_{u}}\sqrt{r}\sigma_{1}\right)\sqrt{\frac{(d_{1}\vee d_{2})\log(d_{1}+d_{2})}{n}}, (S.24)

with probability at least 1(d1+d2)101-(d_{1}+d_{2})^{-10}. Note that for all i[n]i\in[n], Δ(i)=Δ(ΔΔ(i))Δ+ΔΔ(i),\|\Delta^{(i)}\|=\|\Delta-(\Delta-\Delta^{(i)})\|\leq\|\Delta\|+\|\Delta-\Delta^{(i)}\|, and thus

Δ+maxi[n]Δ(i)2Δ+maxi[n]ΔΔ(i).\|\Delta\|+\max_{i\in[n]}\|\Delta^{(i)}\|\leq 2\|\Delta\|+\max_{i\in[n]}\|\Delta-\Delta^{(i)}\|.

As long as the sample size nlog2n(d1d2)log(d1+d2),n\geq\frac{\log^{2}n}{(d_{1}\vee d_{2})\log(d_{1}+d_{2})}, there exists an absolute constant C0>0C_{0}>0 such that

Δ+maxi[n]Δ(i)C0Cl1(σξ+Curσ1)(d1d2)log(d1+d2)n,\|\Delta\|+\max_{i\in[n]}\|\Delta^{(i)}\|\leq C_{0}\sqrt{C_{l}^{-1}}\left(\sigma_{\xi}+\sqrt{C_{u}}\sqrt{r}\sigma_{1}\right)\sqrt{\frac{(d_{1}\vee d_{2})\log(d_{1}+d_{2})}{n}},

with probability at least 1(d1+d2)10n91-(d_{1}+d_{2})^{-10}-n^{-9}.
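The two scales appearing in this proof are easy to observe empirically. The sketch below is illustrative only: it takes \Lambda_{i}=I so that \operatorname{mat}(\Lambda_{i}^{-1}\operatorname{vec}(X_{i}))=X_{i}, uses \widehat{L}=\frac{1}{n}\sum_{i}y_{i}X_{i} as the non-private initialization as in (S.24), and the dimensions are arbitrary choices of ours. It compares \|\Delta\| with the replace-one perturbations \|\Delta-\Delta^{(i)}\|, matching the two rates in Lemma 5.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, r, n, sigma_xi = 20, 15, 2, 2000, 0.5

U, _ = np.linalg.qr(rng.standard_normal((d1, r)))
V, _ = np.linalg.qr(rng.standard_normal((d2, r)))
M = U @ np.diag([2.0, 1.0]) @ V.T

# Standard Gaussian designs, so Lambda_i = I and mat(...) = X_i.
X = rng.standard_normal((n, d1, d2))
y = np.einsum('nab,ab->n', X, M) + sigma_xi * rng.standard_normal(n)

# Delta = L_hat - M with L_hat = (1/n) sum_i y_i X_i.
Delta = np.linalg.norm(np.einsum('n,nab->ab', y, X) / n - M, 2)

# Replace record i by an independent copy and track the change.
Xp = rng.standard_normal((n, d1, d2))
yp = np.einsum('nab,ab->n', Xp, M) + sigma_xi * rng.standard_normal(n)
max_change = max(
    np.linalg.norm((yp[i] * Xp[i] - y[i] * X[i]) / n, 2) for i in range(n)
)
print(Delta, max_change)  # roughly sqrt(d log d / n) vs log(n) / n scaling
```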

E.5 Proof of Lemma 6

By (Pajor, 1998, Proposition 8) and (Koltchinskii and Xia, 2015, Lemma 5), for any q[1,]q\in[1,\infty], there exists an absolute constant c>0c^{\prime}>0 and a subset 𝒮q(dr)𝕆dr,r\mathcal{S}_{q}^{(d-r)}\subset\mathbb{O}_{d-r,r} such that for any ViVj𝒮q(dr)V_{i}\neq V_{j}\in\mathcal{S}_{q}^{(d-r)}, ViViVjVjqcr1/q\|V_{i}V_{i}^{\top}-V_{j}V_{j}^{\top}\|_{q}\geq c^{\prime}r^{1/q}, and the cardinality of 𝒮q(dr)\mathcal{S}_{q}^{(d-r)} is at least 2r(dr)2^{r(d-r)}. Here, q\|\cdot\|_{q} denotes the Schatten-qq norm of a matrix; in particular, the spectral norm is the Schatten-\infty norm, the Frobenius norm is the Schatten-22 norm, and the nuclear norm is the Schatten-11 norm. Let ε0>0\varepsilon_{0}>0 be a small number to be decided later. Now, for each V𝒮q(dr)V\in\mathcal{S}_{q}^{(d-r)}, we define

U=(1ε02Irε02V)U=\left(\begin{array}[]{c}\sqrt{1-\varepsilon_{0}^{2}}I_{r}\\ \sqrt{\varepsilon_{0}^{2}}V\end{array}\right)

such that Ud×rU\in\mathbb{R}^{d\times r} and UU=IrU^{\top}U=I_{r}. This means that, for any V𝒮q(dr)V\in\mathcal{S}_{q}^{(d-r)}, we can construct a U𝕆d,rU\in\mathbb{O}_{d,r}. This defines a subset 𝒮q(d)𝕆d,r\mathcal{S}_{q}^{(d)}\subset\mathbb{O}_{d,r} with Card(𝒮q(d))2r(dr){\rm Card}\big{(}\mathcal{S}_{q}^{(d)}\big{)}\geq 2^{r(d-r)} such that for any UiUj𝒮q(d)U_{i}\neq U_{j}\in\mathcal{S}_{q}^{(d)},

UiUiUjUjqε02(1ε02)ViVjqε02(1ε02)ViViVjVjqε02(1ε02)r1/q\displaystyle\|U_{i}U_{i}^{\top}-U_{j}U_{j}^{\top}\|_{q}\geq\sqrt{\varepsilon_{0}^{2}(1-\varepsilon_{0}^{2})}\|V_{i}-V_{j}\|_{q}\gtrsim\sqrt{\varepsilon_{0}^{2}(1-\varepsilon_{0}^{2})}\|V_{i}V_{i}^{\top}-V_{j}V_{j}^{\top}\|_{q}\gtrsim\sqrt{\varepsilon_{0}^{2}(1-\varepsilon_{0}^{2})}r^{1/q}

and, meanwhile,

UiUiUjUjFUiUjFε0ViVjF2rε0.\|U_{i}U_{i}^{\top}-U_{j}U_{j}^{\top}\|_{\rm F}\lesssim\|U_{i}-U_{j}\|_{\rm F}\leq\varepsilon_{0}\|V_{i}-V_{j}\|_{\rm F}\leq\sqrt{2r}\varepsilon_{0}.
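For concreteness, the lifting step from VV to UU above is a single line of linear algebra. The sketch below (with a random orthonormal VV standing in for an element of the packing set, whose actual construction comes from Pajor (1998); dimensions are ours) builds UU and confirms U^{\top}U=I_{r}:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, eps0 = 12, 3, 0.1

# A random element of O_{d-r, r} standing in for V in S_q^{(d-r)}.
V, _ = np.linalg.qr(rng.standard_normal((d - r, r)))

# The lifted matrix U = [sqrt(1 - eps0^2) I_r ; eps0 V].
U = np.vstack([np.sqrt(1 - eps0**2) * np.eye(r), eps0 * V])

print(np.allclose(U.T @ U, np.eye(r)))  # True: U lies in O_{d, r}
```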

E.6 Proofs of Lemmas 7, 8 and 9

See the proof of Lemma 7 in Vershynin (2018), the proof of Lemma 8 in Shen et al. (2023), and the proof of Lemma 9 in Wei et al. (2016) and Luo and Zhang (2022).

E.7 Proof of Lemma 10

Since ZiZ_{i}^{\prime} is independent of ziz_{i} and 𝔼𝒮M(zi)=𝔼MlogfM(yi|Xi)=0\mathbb{E}\,\mathcal{S}_{M}(z_{i})=\mathbb{E}\,\nabla_{M}\log f_{M}(y_{i}|X_{i})=0,

𝔼𝒜M1(zi,A(Zi))=𝔼[A(Zi)M][d1]×[r],𝒮M(zi)=𝔼[A(Zi)M][d1]×[r],𝔼𝒮M(zi)=0,\displaystyle\mathbb{E}\,\mathcal{A}^{1}_{M}(z_{i},A(Z_{i}^{\prime}))=\mathbb{E}\left<\left[A(Z_{i}^{\prime})-M\right]_{[d_{1}]\times[r]},\mathcal{S}_{M}(z_{i})\right>=\left<\mathbb{E}\left[A(Z_{i}^{\prime})-M\right]_{[d_{1}]\times[r]},\mathbb{E}\,\mathcal{S}_{M}(z_{i})\right>=0,

As for 𝔼|Ai|=𝔼|𝒜M1(zi,A(Zi))|\mathbb{E}|A_{i}^{\prime}|=\mathbb{E}\left|\mathcal{A}^{1}_{M}(z_{i},A(Z_{i}^{\prime}))\right|, we apply Jensen's inequality and have

𝔼|𝒜M1(zi,A(Zi))|𝔼|𝒜M1(zi,A(Zi))|2\displaystyle\mathbb{E}\left|\mathcal{A}^{1}_{M}(z_{i},A(Z_{i}^{\prime}))\right|\leq\sqrt{\mathbb{E}\left|\mathcal{A}^{1}_{M}(z_{i},A(Z_{i}^{\prime}))\right|^{2}} (S.25)
𝔼vec([A(Zi)M][d1]×[r])vec(𝒮M(zi))vec(𝒮M(zi))vec([A(Zi)M][d1]×[r])\displaystyle\leq\sqrt{\mathbb{E}\operatorname{vec}\left(\left[A(Z_{i}^{\prime})-M\right]_{[d_{1}]\times[r]}\right)^{\top}\operatorname{vec}(\mathcal{S}_{M}(z_{i}))\operatorname{vec}(\mathcal{S}_{M}(z_{i}))^{\top}\operatorname{vec}\left(\left[A(Z_{i}^{\prime})-M\right]_{[d_{1}]\times[r]}\right)}
𝔼vec(𝒮M(zi))vec(𝒮M(zi))𝔼[A(Zi)M][d1]×[r]F2,\displaystyle\leq\sqrt{\left\|\mathbb{E}\operatorname{vec}(\mathcal{S}_{M}(z_{i}))\operatorname{vec}(\mathcal{S}_{M}(z_{i}))^{\top}\right\|}\cdot\sqrt{\mathbb{E}\left\|\left[A(Z_{i}^{\prime})-M\right]_{[d_{1}]\times[r]}\right\|_{F}^{2}},

where the second line is due to B,C=vec(B)vec(C)\left<B,C\right>=\operatorname{vec}(B)^{\top}\operatorname{vec}(C) and the last inequality is because ZiZ_{i}^{\prime} is independent of ziz_{i}. By the definition of 𝒮M(zi)=1σξ2(yiXi,M)Xi\mathcal{S}_{M}(z_{i})=\frac{1}{\sigma_{\xi}^{2}}(y_{i}-\left<X_{i},M\right>)X_{i} and the independence between ξi\xi_{i} and XiX_{i},

𝔼vec(𝒮M(zi))vec(𝒮M(zi))=𝔼(yiXi,Mσξ2)2𝔼vec(Xi)vec(Xi)=1σξ2ΛiCuσξ2.\displaystyle\left\|\mathbb{E}\operatorname{vec}(\mathcal{S}_{M}(z_{i}))\operatorname{vec}(\mathcal{S}_{M}(z_{i}))^{\top}\right\|=\left\|\mathbb{E}\left(\frac{y_{i}-\left<X_{i},M\right>}{\sigma_{\xi}^{2}}\right)^{2}\mathbb{E}\operatorname{vec}\left(X_{i}\right)\operatorname{vec}\left(X_{i}\right)^{\top}\right\|=\frac{1}{\sigma_{\xi}^{2}}\|\Lambda_{i}\|\leq\frac{C_{u}}{\sigma_{\xi}^{2}}.

Plugging the upper bound of 𝔼vec(𝒮M(zi))vec(𝒮M(zi))\left\|\mathbb{E}\operatorname{vec}(\mathcal{S}_{M}(z_{i}))\operatorname{vec}(\mathcal{S}_{M}(z_{i}))^{\top}\right\| into (S.25),

\mathbb{E}\left|\mathcal{A}^{1}_{M}(z_{i},A(Z_{i}^{\prime}))\right|\leq\sqrt{\mathbb{E}\left\|A(Z)-M\right\|^{2}_{F}}\sqrt{\frac{C_{u}}{\sigma_{\xi}^{2}}},

where we also used \mathbb{E}\|[A(Z_{i}^{\prime})-M]_{[d_{1}]\times[r]}\|_{F}^{2}\leq\mathbb{E}\|A(Z_{i}^{\prime})-M\|_{F}^{2}=\mathbb{E}\|A(Z)-M\|_{F}^{2}, since Z_{i}^{\prime} and ZZ are identically distributed. This completes the proof of Lemma 10.
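A small Monte Carlo sanity check of the display \|\mathbb{E}\operatorname{vec}(\mathcal{S}_{M}(z_{i}))\operatorname{vec}(\mathcal{S}_{M}(z_{i}))^{\top}\|=\|\Lambda_{i}\|/\sigma_{\xi}^{2} is given below; it is illustrative only, taking \Lambda_{i}=I so that the target spectral norm is 1/\sigma_{\xi}^{2}, with arbitrary dimensions of our choosing.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, sigma_xi, n_mc = 3, 2, 0.5, 200_000
M = rng.standard_normal((d1, d2))

X = rng.standard_normal((n_mc, d1, d2))      # Lambda_i = I
xi = sigma_xi * rng.standard_normal(n_mc)    # model noise, y - <X, M> = xi
S = (xi / sigma_xi**2)[:, None] * X.reshape(n_mc, -1)  # vec(S_M(z_i))

cov = S.T @ S / n_mc  # Monte Carlo estimate of E vec(S) vec(S)^T
print(np.linalg.norm(cov, 2), 1 / sigma_xi**2)  # both approximately 4.0
```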

E.8 Proofs of Lemmas 11, 12 and 13

Lemma 11 is a direct consequence of Proposition 2.1, Cai et al. (2023) with the choice T=2/σξ2rd1log(1δ)T=\sqrt{2/\sigma_{\xi}^{2}}rd_{1}\sqrt{\log(\frac{1}{\delta})}. Lemma 12 is a direct consequence of Theorem 2.1, Cai et al. (2023) along with the definition of 𝒜M1(zi,A(Z))\mathcal{A}_{M}^{1}\left(z_{i},A(Z)\right). Lemma 13 is a direct consequence of Proposition 2.2, Cai et al. (2023) by taking Mij𝒩(0,1)M_{ij}\sim\mathcal{N}(0,1) for (i,j)[d1]×[r](i,j)\in[d_{1}]\times[r] and Mij=0M_{ij}=0 otherwise.

E.9 Proof of Lemma 14

See the proof of Lemma 14 in Xia (2021).

Appendix F Weak Differential privacy

This section proposes a weaker notion of differential privacy under which the sensitivities are free of the measurements {Xi}i[n]\{X_{i}\}_{i\in[n]}.

Definition 2 (weak (ε,δ)(\varepsilon,\delta)-differential privacy).

Let ZZ be a given data set and ZZ^{\prime} be a weak neighbouring data set of ZZ, i.e., ZZ and ZZ^{\prime} differ in at most one pair of observations zZz\in Z and zZz^{\prime}\in Z^{\prime} that share the same measurement XX. The algorithm AA that maps ZZ into d1×d2\mathbb{R}^{d_{1}\times d_{2}} is weak (ε,δ)(\varepsilon,\delta)-differentially private over the data set ZZ if

(A(Z)𝒬)eε(A(Z)𝒬)+δ,\mathbb{P}\big{(}A(Z)\in\mathcal{Q}\big{)}\leq e^{\varepsilon}\mathbb{P}\big{(}A(Z^{\prime})\in\mathcal{Q}\big{)}+\delta, (S.26)

for all weak neighbouring data sets Z,ZZ,Z^{\prime} and all subsets 𝒬d1×d2\mathcal{Q}\subset\mathbb{R}^{d_{1}\times d_{2}}.

Compared to the standard (ε,δ)(\varepsilon,\delta)-DP, weak (ε,δ)(\varepsilon,\delta)-differential privacy is a less powerful constraint. Definition 2 only requires the algorithm AA to preserve the property (S.26) over weak neighbouring datasets, i.e., datasets that differ in at most one pair of observations sharing the same measurement XX. Indeed, for a pair of observations z=(X,y)z=(X,y) and z=(X,y)z^{\prime}=(X,y^{\prime}) under the model (1), where y=X,M+ξandy=X,M+ξ,y=\left<X,M\right>+\xi\quad{\rm and}\quad y^{\prime}=\left<X,M\right>+\xi^{\prime}, the difference yy=ξξy-y^{\prime}=\xi-\xi^{\prime} is free of the measurement XX.
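The distinction is easiest to see on a single record. In the sketch below (purely illustrative, with arbitrary dimensions of our choosing), the weak neighbour reuses the same XX and redraws only the noise, so the response difference carries no information about XX, whereas a standard neighbour replaces the whole pair (X,y)(X,y).

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, sigma_xi = 4, 3, 0.5
M = rng.standard_normal((d1, d2))

X = rng.standard_normal((d1, d2))
y = np.sum(X * M) + sigma_xi * rng.standard_normal()

# Weak neighbour: same measurement X, only the noise is redrawn.
y_weak = np.sum(X * M) + sigma_xi * rng.standard_normal()

# Standard neighbour: the entire record (X, y) is replaced.
X_new = rng.standard_normal((d1, d2))
y_new = np.sum(X_new * M) + sigma_xi * rng.standard_normal()

print(y - y_weak)  # equals xi - xi', carries no information about X
```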

Next, we state Theorem 6, Corollary 2 and Theorem 7 as the analogues of Theorem 1, Corollary 1 and Theorem 3, respectively. All proofs for this section are deferred to the end of the section.

Theorem 6 (Weak DP and utility guarantees of the initialization M~0\widetilde{M}_{0}).

Consider i.i.d. observations Z={z1,,zn}Z=\{z_{1},\cdots,z_{n}\} drawn from the trace regression model stated in (1)(\ref{equ: trace_regression_model}), where zi:=(Xi,yi)z_{i}:=(X_{i},y_{i}) for i=1,,ni=1,\cdots,n. Let the true low-rank regression coefficient matrix be M𝕄rM\in\mathbb{M}_{r}. Suppose that {Xi}i[n]\{X_{i}\}_{i\in[n]} satisfy Assumption 1. Under the mild condition nσξσξ+Curσ1n\geq\frac{\sigma_{\xi}}{\sigma_{\xi}+\sqrt{C_{u}r}\sigma_{1}}, there exist absolute constants C1,C2,C3>0C_{1},C_{2},C_{3}>0 such that as the sample size nn0n\geq n_{0}, the sensitivity for the leading rr left and right singular vectors takes the value

Δweak(1)\displaystyle\Delta_{weak}^{(1)} :=maxi[n](U^U^U^(i)U^(i)FV^V^V^(i)V^(i)F)=C2Cl1σξσrrnlogn;\displaystyle:=\max_{i\in[n]}\left(\|\widehat{U}\widehat{U}^{\top}-\widehat{U}^{(i)}\widehat{U}^{(i)\top}\|_{F}\vee\|\widehat{V}\widehat{V}^{\top}-\widehat{V}^{(i)}\widehat{V}^{(i)\top}\|_{F}\right)=C_{2}\sqrt{C_{l}^{-1}}\frac{\sigma_{\xi}}{\sigma_{r}}\frac{\sqrt{r}}{n}\log n;

the sensitivity for the rr singular values takes the value

Δweak(2)\displaystyle\Delta_{weak}^{(2)} :=maxi[n]U~(L^L^(i))V~F=C2Cl1σξrnlogn,\displaystyle:=\max_{i\in[n]}\left\|\widetilde{U}^{\top}\left(\widehat{L}-\widehat{L}^{(i)}\right)\widetilde{V}\right\|_{F}=C_{2}\sqrt{C_{l}^{-1}}\sigma_{\xi}\frac{\sqrt{r}}{n}\log n,

and Algorithm 1 is weak (ε,δ)(\varepsilon,\delta)-differentially private. Moreover,

M~0weakM(M~0weakMF/2r)\displaystyle\|\widetilde{M}_{0}^{weak}-M\|\bigvee\left(\|\widetilde{M}_{0}^{weak}-M\|_{F}/\sqrt{2r}\right) e1+C3Cl1σξ(σ1σrr(d1d2)nε+rnε)lognlog12(3.75δ)e2weak,\displaystyle\leq e_{1}+\underbrace{C_{3}\sqrt{C_{l}^{-1}}\sigma_{\xi}\left(\frac{\sigma_{1}}{\sigma_{r}}\frac{\sqrt{r(d_{1}\vee d_{2})}}{n\varepsilon}+\frac{r}{n\varepsilon}\right)\log n\log^{\frac{1}{2}}(\frac{3.75}{\delta})}_{e^{weak}_{2}},

with probability at least 1(d1+d2)10n9exp(d1)exp(d2)1020r1-(d_{1}+d_{2})^{-10}-n^{-9}-\exp(-d_{1})-\exp(-d_{2})-10^{-20r}.

Theorem 6 requires the same sample size condition nn0n\geq n_{0} as Theorem 1; however, the sensitivities Δweak(1)\Delta_{weak}^{(1)} and Δweak(2)\Delta_{weak}^{(2)}, derived under weak DP, differ from their DP counterparts Δ(1)\Delta^{(1)} and Δ(2)\Delta^{(2)} by the factor Curσ1\sqrt{C_{u}r}\sigma_{1}. This leads to a smaller cost of privacy e2weake^{weak}_{2} than the cost of privacy e2e_{2} obtained under the stronger standard DP constraint, as presented in Theorem 1.
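For intuition on how such sensitivities translate into noise, a generic Gaussian-mechanism sketch is given below. This is not Algorithm 1 itself (whose exact budget allocation is specified in the main text); it only illustrates the standard calibration \sigma=\Delta\sqrt{2\log(1.25/\delta)}/\varepsilon, which is consistent with the \log^{1/2}(3.75/\delta) factors above when the budget is split across three releases. All numerical values are placeholders of ours.

```python
import numpy as np

def gaussian_mechanism(value, sensitivity, eps, delta, rng):
    # Standard Gaussian mechanism: add N(0, sigma^2) noise with
    # sigma = sensitivity * sqrt(2 log(1.25 / delta)) / eps.
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return value + sigma * rng.standard_normal(np.shape(value))

# Example: privatize r singular values at a weak-sensitivity scale.
rng = np.random.default_rng(0)
singular_values = np.array([3.0, 1.5, 0.8])
private_sv = gaussian_mechanism(singular_values, sensitivity=1e-3,
                                eps=1.0, delta=1e-6, rng=rng)
print(private_sv)
```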

Corollary 2.

Under the conditions stated in Theorem 6, as the sample size is sufficiently large such that, for some absolute constant C1>0C_{1}>0,

nC1max{n1,Cl1(σξσr)(κrd1d2+r32)lognlog12(3.75δ)εn2weak},\displaystyle n\geq C_{1}\max\Bigg{\{}n_{1},\underbrace{\sqrt{C_{l}^{-1}}\left(\frac{\sigma_{\xi}}{\sigma_{r}}\right)\left(\kappa r\sqrt{d_{1}\vee d_{2}}+r^{\frac{3}{2}}\right)\log n\frac{\log^{\frac{1}{2}}(\frac{3.75}{\delta})}{\varepsilon}}_{n_{2}^{weak}}\Bigg{\}},

we have for some small constant 0<c0<10<c_{0}<1, M~0MF2rM~0Mc0σr.\|\widetilde{M}_{0}-M\|_{F}\leq\sqrt{2r}\|\widetilde{M}_{0}-M\|\leq c_{0}\sigma_{r}.

Compared with Corollary 1, Corollary 2 requires a smaller sample size since n2weakn2n_{2}^{weak}\leq n_{2}.

Theorem 7.

Consider i.i.d. observations Z={z1,,zn}Z=\{z_{1},\cdots,z_{n}\} drawn from the trace regression model stated in (1)(\ref{equ: trace_regression_model}), where the true low-rank regression coefficient matrix is M𝕄rM\in\mathbb{M}_{r}. Here, zi:=(Xi,yi)z_{i}:=(X_{i},y_{i}) for i=1,,ni=1,\cdots,n, and we assume that {Xi}i[n]\{X_{i}\}_{i\in[n]} satisfy Assumption 1 and (d1+d2)>logn(d_{1}+d_{2})>\log n. Suppose the weak (ε,δ)(\varepsilon,\delta)-DP initialization satisfies Assumption 2; then Algorithm 2 is weak (2ε,2δ)(2\varepsilon,2\delta)-differentially private with the sensitivity

Δl\displaystyle\Delta_{l} =C3ηnσξCur(d1+d2)logn,\displaystyle=C_{3}\frac{\eta}{n}\sigma_{\xi}\sqrt{C_{u}r(d_{1}+d_{2})\log n},

for some absolute constant C3>0C_{3}>0. Moreover, as the sample size

nc4max{n3,n4,ηCuκξr(d1+d2)log3/2(n)log1/2(1.25log(n)δ)εn5weak},\displaystyle n\geq c_{4}\max\Bigg{\{}n_{3},n_{4},\underbrace{\eta\sqrt{C_{u}}\kappa_{\xi}r(d_{1}+d_{2})\log^{3/2}(n)\frac{\log^{1/2}\left(\frac{1.25\log(n)}{\delta}\right)}{\varepsilon}}_{n_{5}^{weak}}\Bigg{\}},

for some small constant 0<c4<10<c_{4}<1, number of iterations l=O(logn)l^{*}=O(\log n), and step size 0<η<10<\eta<1, the output of Algorithm 2 satisfies

\left\|\widetilde{M}_{l^{*}}-M\right\|_{F}\leq u_{1}+\underbrace{C_{4}\sqrt{C_{u}}\sigma_{\xi}\frac{r(d_{1}+d_{2})}{n\varepsilon}\log^{3/2}n\log^{1/2}\left(\frac{1.25\log(n)}{\delta}\right)}_{u_{2}^{weak}},

with probability at least

1\displaystyle 1 c~2exp(c~3r(d1+d2))(d1+d2)10n9ed1ed21020r\displaystyle-\widetilde{c}_{2}\exp\left(-\widetilde{c}_{3}r(d_{1}+d_{2})\right)-(d_{1}+d_{2})^{-10}-n^{-9}-e^{-d_{1}}-e^{-d_{2}}-10^{-20r}
((d1+d2)10+exp((d1+d2))+n9+exp(10Cu(d1+d2))n9)logn.\displaystyle-\left((d_{1}+d_{2})^{-10}+\exp(-(d_{1}+d_{2}))+n^{-9}+\exp\left(-10C_{u}(d_{1}+d_{2})\right)n^{-9}\right)\log n.

Theorem 7 shows that as the sample size nO~((κξ2κξ)r(d1d2)),n\gtrsim\widetilde{O}\left(\left(\kappa_{\xi}^{2}\vee\kappa_{\xi}\right)r(d_{1}\vee d_{2})\right), the estimator M~l\widetilde{M}_{l^{*}} given by Algorithm 2 attains the optimal convergence rate O~p(σξr(d1+d2)n+σξr(d1+d2)nε),\widetilde{O}_{p}\left(\sigma_{\xi}\sqrt{\frac{r(d_{1}+d_{2})}{n}}+\sigma_{\xi}\frac{r(d_{1}+d_{2})}{n\varepsilon}\right), in the sense of weak differential privacy.

The proofs of Theorem 6, Theorem 7 and Corollary 2 are a direct consequence of replacing the first part of Lemma 5 with the following Lemma 15.

Lemma 15.

Under model (1), Assumption 1, and the condition nσξσξ+Curσ1n\geq\frac{\sigma_{\xi}}{\sigma_{\xi}+\sqrt{C_{u}r}\sigma_{1}}, there exists an absolute constant C0>0C_{0}>0 such that the event

:=\displaystyle\mathcal{E}_{*}:= {maxi[n]ΔΔ(i)C0n1Cl1σξlogn}\displaystyle\Bigg{\{}\max_{i\in[n]}\big{\|}\Delta-\Delta^{(i)}\big{\|}\leq C_{0}\cdot n^{-1}\sqrt{C_{l}^{-1}}\sigma_{\xi}\log n\Bigg{\}}
{Δ+maxi[n]Δ(i)C0Cl1(σξ+Curσ1)(d1d2)log(d1+d2)n},\displaystyle\bigcap\Bigg{\{}\|\Delta\|+\max_{i\in[n]}\|\Delta^{(i)}\|\leq C_{0}\sqrt{C_{l}^{-1}}\left(\sigma_{\xi}+\sqrt{C_{u}}\sqrt{r}\sigma_{1}\right)\sqrt{\frac{(d_{1}\vee d_{2})\log(d_{1}+d_{2})}{n}}\Bigg{\}},

holds with probability at least 1(d1+d2)10n91-(d_{1}+d_{2})^{-10}-n^{-9}.

Proof of Lemma 15.

We only need to focus on maxi[n]ΔΔ(i)\max_{i\in[n]}\big{\|}\Delta-\Delta^{(i)}\big{\|} since the rest of the proof is the same as that of Lemma 5.

maxi[n]ΔΔ(i)maxi[n]1nmat(Λi1vec(Xi))(ξiξi),\displaystyle\max_{i\in[n]}\big{\|}\Delta-\Delta^{(i)}\big{\|}\leq\max_{i\in[n]}\frac{1}{n}\left\|\operatorname{mat}(\Lambda_{i}^{-1}\operatorname{vec}(X_{i}))\left(\xi_{i}-\xi_{i}^{\prime}\right)\right\|,

where ξiψ2=ξiψ2=σξ\|\xi_{i}\|_{\psi_{2}}=\|\xi_{i}^{\prime}\|_{\psi_{2}}=\sigma_{\xi} and Λi1vec(Xi)N(0,Λi1)\Lambda_{i}^{-1}\operatorname{vec}\left(X_{i}\right)\sim N\left(0,\Lambda_{i}^{-1}\right) for all i=1,,ni=1,\cdots,n. Therefore, for some absolute constant c0>0c_{0}>0,

mat(Λi1vec(Xi))(ξiξi)Ψ1c0Cl1σξ.\displaystyle\left\|\left\|\operatorname{mat}(\Lambda_{i}^{-1}\operatorname{vec}(X_{i}))\left(\xi_{i}-\xi_{i}^{\prime}\right)\right\|\right\|_{\Psi_{1}}\leq c_{0}\sqrt{C_{l}^{-1}}\sigma_{\xi}.

We complete the proof by applying the tail bound for sub-exponential random variables and taking the maximum over i[n]i\in[n]. ∎

References

  • Abowd et al. (2020) Abowd, J. M., I. M. Rodriguez, W. N. Sexton, P. E. Singer, and L. Vilhuber (2020). The modernization of statistical disclosure limitation at the U.S. Census Bureau. US Census Bureau.
  • Absil et al. (2008) Absil, P.-A., R. Mahony, and R. Sepulchre (2008). Optimization Algorithms on Matrix Manifolds. Princeton, NJ: Princeton University Press.
  • Acharya et al. (2021) Acharya, J., Z. Sun, and H. Zhang (2021). Differentially private Assouad, Fano, and Le Cam. In Algorithmic Learning Theory, pp.  48–78. PMLR.
  • Adler et al. (2002) Adler, R. L., J. Dedieu, J. Y. Margulies, M. Martens, and M. Shub (2002, 07). Newton’s method on Riemannian manifolds and a geometric model for the human spine. IMA Journal of Numerical Analysis 22(3), 359–390.
  • Apple Differential Privacy Team (2017) Apple Differential Privacy Team (2017). Learning with privacy at scale.
  • Blum et al. (2005) Blum, A., C. Dwork, F. McSherry, and K. Nissim (2005, 06). Practical privacy: The SuLQ framework. Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 128–138.
  • Brown et al. (2023) Brown, G., S. Hopkins, and A. Smith (2023, 12–15 Jul). Fast, sample-efficient, affine-invariant private mean and covariance estimation for subgaussian distributions. In G. Neu and L. Rosasco (Eds.), Proceedings of Thirty Sixth Conference on Learning Theory, Volume 195 of Proceedings of Machine Learning Research, pp.  5578–5579. PMLR.
  • Burer and Monteiro (2003) Burer, S. and R. D. Monteiro (2003). A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical Programming 95(2), 329–357.
  • Cai et al. (2021) Cai, T. T., Y. Wang, and L. Zhang (2021). The cost of privacy: Optimal rates of convergence for parameter estimation with differential privacy. The Annals of Statistics 49, 2825–2850.
  • Cai et al. (2023) Cai, T. T., Y. Wang, and L. Zhang (2023). Score attack: A lower bound technique for optimal differentially private learning. arXiv preprint arXiv:2303.07152.
  • Cai et al. (2024) Cai, T. T., D. Xia, and M. Zha (2024). Optimal differentially private PCA and estimation for spiked covariance matrices. arXiv preprint arXiv:2401.03820.
  • Candes and Plan (2011) Candes, E. J. and Y. Plan (2011). Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Transactions on Information Theory 57(4), 2342–2359.
  • Candes and Tao (2005) Candes, E. J. and T. Tao (2005). Decoding by linear programming. IEEE Transactions on Information Theory 51(12), 4203–4215.
  • Chaudhuri et al. (2012) Chaudhuri, K., A. Sarwate, and K. Sinha (2012). Near-optimal differentially private principal components. Advances in Neural Information Processing Systems 25.
  • Chen et al. (2019) Chen, H., G. Raskutti, and M. Yuan (2019). Non-convex projected gradient descent for generalized low-rank tensor regression. The Journal of Machine Learning Research 20(1), 172–208.
  • Chen and Wainwright (2015) Chen, Y. and M. J. Wainwright (2015). Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees.
  • Chien et al. (2021) Chien, S., P. Jain, W. Krichene, S. Rendle, S. Song, A. Thakurta, and L. Zhang (2021). Private alternating least squares: Practical private matrix completion with tighter rates. pp.  1877–1887.
  • Dwork et al. (2006) Dwork, C., F. McSherry, K. Nissim, and A. Smith (2006). Calibrating noise to sensitivity in private data analysis. Theory of Cryptography Conference, 265–284.
  • Dwork et al. (2014) Dwork, C., A. Roth, et al. (2014). The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3–4), 211–407.
  • Dwork et al. (2014) Dwork, C., K. Talwar, A. Thakurta, and L. Zhang (2014). Analyze gauss: optimal bounds for privacy-preserving principal component analysis. Proceedings of the forty-sixth annual ACM symposium on Theory of computing, 11–20.
  • Edelman et al. (1998) Edelman, A., T. A. Arias, and S. T. Smith (1998). The geometry of algorithms with orthogonality constraints. SIAM journal on Matrix Analysis and Applications 20(2), 303–353.
  • Erlingsson et al. (2014) Erlingsson, U., V. Pihur, and A. Korolova (2014). Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS ’14, New York, NY, USA, pp.  1054–1067. Association for Computing Machinery.
  • Hamidi and Bayati (2022) Hamidi, N. and M. Bayati (2022). On low-rank trace regression under general sampling distribution. Journal of Machine Learning Research 23(321), 1–49.
  • Jain et al. (2018) Jain, P., O. D. Thakkar, and A. Thakurta (2018). Differentially private matrix completion revisited. pp.  2215–2224.
  • Kamath et al. (2019) Kamath, G., J. Li, V. Singhal, and J. Ullman (2019). Privately learning high-dimensional distributions. In Conference on Learning Theory, pp.  1853–1902. PMLR.
  • Kamath et al. (2020) Kamath, G., V. Singhal, and J. Ullman (2020, 09–12 Jul). Private mean estimation of heavy-tailed distributions. In J. Abernethy and S. Agarwal (Eds.), Proceedings of Thirty Third Conference on Learning Theory, Volume 125 of Proceedings of Machine Learning Research, pp.  2204–2235. PMLR.
  • Koltchinskii (2011) Koltchinskii, V. (2011). Von Neumann entropy penalization and low-rank matrix estimation.
  • Koltchinskii et al. (2011) Koltchinskii, V., K. Lounici, and A. B. Tsybakov (2011). Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion.
  • Koltchinskii and Xia (2015) Koltchinskii, V. and D. Xia (2015). Optimal estimation of low rank density matrices. J. Mach. Learn. Res. 16(53), 1757–1792.
  • Kuditipudi et al. (2023) Kuditipudi, R., J. Duchi, and S. Haque (2023, 12–15 Jul). A pretty fast algorithm for adaptive private mean estimation. In G. Neu and L. Rosasco (Eds.), Proceedings of Thirty Sixth Conference on Learning Theory, Volume 195 of Proceedings of Machine Learning Research, pp.  2511–2551. PMLR.
  • Liu et al. (2022) Liu, X., W. Kong, P. Jain, and S. Oh (2022). DP-PCA: Statistically optimal and differentially private PCA. Advances in Neural Information Processing Systems 35, 29929–29943.
  • Liu et al. (2015) Liu, Z., Y.-X. Wang, and A. Smola (2015). Fast differentially private matrix factorization. In Proceedings of the 9th ACM Conference on Recommender Systems, pp.  171–178.
  • Luo and Zhang (2022) Luo, Y. and A. R. Zhang (2022). Tensor-on-tensor regression: Riemannian optimization, over-parameterization, statistical-computational gap, and their interplay. arXiv preprint arXiv:2206.08756.
  • McSherry and Mironov (2009) McSherry, F. and I. Mironov (2009). Differentially private recommender systems: Building privacy into the Netflix Prize contenders. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.  627–636.
  • Negahban and Wainwright (2011) Negahban, S. and M. J. Wainwright (2011). Estimation of (near) low-rank matrices with noise and high-dimensional scaling.
  • Pajor (1998) Pajor, A. (1998). Metric entropy of the grassmann manifold. Convex Geometric Analysis 34(181-188), 0942–46013.
  • Rohde and Tsybakov (2011) Rohde, A. and A. B. Tsybakov (2011). Estimation of high-dimensional low-rank matrices.
  • Shen et al. (2023) Shen, Y., J. Li, J.-F. Cai, and D. Xia (2023). Computationally efficient and statistically optimal robust high-dimensional linear regression. arXiv preprint arXiv:2305.06199.
  • Tropp et al. (2015) Tropp, J. A. et al. (2015). An introduction to matrix concentration inequalities. Foundations and Trends® in Machine Learning 8(1-2), 1–230.
  • Vadhan (2017) Vadhan, S. (2017). The complexity of differential privacy. Tutorials on the Foundations of Cryptography: Dedicated to Oded Goldreich, 347–450.
  • Vandereycken (2013) Vandereycken, B. (2013). Low-rank matrix completion by Riemannian optimization. SIAM Journal on Optimization 23(2), 1214–1236.
  • Vershynin (2015) Vershynin, R. (2015). Estimation in high dimensions: a geometric perspective. In Sampling Theory, a Renaissance: Compressive Sensing and Other Developments, pp.  3–66. Springer.
  • Vershynin (2018) Vershynin, R. (2018). High-dimensional probability: An introduction with applications in data science, Volume 47. Cambridge University Press.
  • Wang et al. (2023) Wang, L., B. Zhao, and M. Kolar (2023, 25–27 Apr). Differentially private matrix completion through low-rank matrix factorization.  206, 5731–5748.
  • Wang (2018) Wang, Y.-X. (2018). Revisiting differentially private linear regression: optimal and adaptive prediction and estimation in unbounded domain. In UAI 2018.
  • Wei et al. (2016) Wei, K., J.-F. Cai, T. F. Chan, and S. Leung (2016). Guarantees of Riemannian optimization for low rank matrix recovery. SIAM Journal on Matrix Analysis and Applications 37(3), 1198–1222.
  • Xia (2021) Xia, D. (2021). Normal approximation and confidence region of singular subspaces. Electronic Journal of Statistics 15(2), 3798–3851.
  • Zheng and Lafferty (2016) Zheng, Q. and J. Lafferty (2016). A convergent gradient descent algorithm for rank minimization and semidefinite programming from random linear measurements.