This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

On Stewart’s Perturbation Theorem for SVD

Ren-Cang Li Department of Mathematics, University of Texas at Arlington, Arlington, TX 76019-0408, USA. Supported in part by NSF DMS-2009689. Email: [email protected].    Ninoslav Truhar Department of Mathematics, J. J. Strossmayer University of Osijek, Trg Ljudevita Gaja 6, 31000 Osijek, Croatia. Email: [email protected].    Lei-Hong Zhang School of Mathematical Sciences, Soochow University, Suzhou 215006, Jiangsu, China. Supported in part by the National Natural Science Foundation of China NSFC-11671246 NSFC-12371380, and Jiangsu Shuangchuang Project JSSCTD202209. Email: [email protected].
Abstract

This paper establishes a variant of Stewart’s theorem [Theorem 6.4 of Stewart, SIAM Rev., 15:727–764, 1973] for the singular subspaces associated with the SVD of a matrix subject to perturbations. Stewart’s original version uses both the Frobenius and spectral norms, whereas the new variant uses the spectral norm and any unitarily invariant norm that offer choices per convenience of particular applications and lead to sharper bounds than that straightforwardly derived from Stewart’s original theorem with the help of the well-known equivalence inequalities between matrix norms. Of interest in their own right, bounds on the solution to two couple Sylvester equations are established for a few different circumstances.


Keywords: SVD, perturbation, singular subspaces, coupled Sylvester equations

Mathematics Subject Classification 65F15, 15A18

1 Introduction

In [15], Stewart established a perturbation theory for the singular subspaces associated with the SVD of a matrix Gm×nG\in\mathbb{C}^{m\times n} slightly perturbed. In the same paper he also investigated the eigenspace of a matrix and the deflating subspace of a regular matrix pencil subject to perturbations. But in this paper, we limit our scope to one of his theorems, namely, [15, Theorem 6.4] on SVD, where both the Frobenius and spectral norms are used to measure perturbations. Our goal is to establish a more general and yet sharper version of [15, Theorem 6.4] in the spectral norm and any unitarily invariant norm. Our new version offers flexibility in its applications, for example, the version with the unitarily invariant norm also set to the spectral norm is more convenient to use in our recent work [18] for analyzing the quality of a reduced order model by approximate balanced truncation [1, 2]. Even in the case of using both the Frobenius and spectral norms, our results are slightly better than Stewart’s [15, Theorem 6.4] in terms of a less stringent condition and yet a sharper bound.

Given Gm×nG\in\mathbb{C}^{m\times n}, let

U [\@arstrutrm-r\\U1U2] m×m,V [\@arstrutrn-r\\V1V2] n×nU\equiv\hbox{}\vbox{\kern 0.86108pt\hbox{$\kern 0.0pt\kern 2.5pt\kern-5.0pt\left[\kern 0.0pt\kern-2.5pt\kern-5.55557pt\vbox{\kern-0.86108pt\vbox{\vbox{ \halign{\kern\arraycolsep\hfil\@arstrut$\kbcolstyle#$\hfil\kern\arraycolsep& \kern\arraycolsep\hfil$\@kbrowstyle#$\ifkbalignright\relax\else\hfil\fi\kern\arraycolsep&& \kern\arraycolsep\hfil$\@kbrowstyle#$\ifkbalignright\relax\else\hfil\fi\kern\arraycolsep\cr 5.0pt\hfil\@arstrut$\scriptstyle$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\scriptstyle r$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\scriptstyle m-r\\$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle U_{1}$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle U_{2}$\hfil\kern 5.0pt\crcr}}}}\right]$}}\in\mathbb{C}^{m\times m},\quad V\equiv\hbox{}\vbox{\kern 0.86108pt\hbox{$\kern 0.0pt\kern 2.5pt\kern-5.0pt\left[\kern 0.0pt\kern-2.5pt\kern-5.55557pt\vbox{\kern-0.86108pt\vbox{\vbox{ \halign{\kern\arraycolsep\hfil\@arstrut$\kbcolstyle#$\hfil\kern\arraycolsep& \kern\arraycolsep\hfil$\@kbrowstyle#$\ifkbalignright\relax\else\hfil\fi\kern\arraycolsep&& \kern\arraycolsep\hfil$\@kbrowstyle#$\ifkbalignright\relax\else\hfil\fi\kern\arraycolsep\cr 5.0pt\hfil\@arstrut$\scriptstyle$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\scriptstyle r$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\scriptstyle n-r\\$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle V_{1}$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle V_{2}$\hfil\kern 5.0pt\crcr}}}}\right]$}}\in\mathbb{C}^{n\times n} (1.1a)
be two unitary matrices such that GG admits the decomposition
UHGV[U1HU2H]G[V1,V2]= [\@arstrutrn-r\\rG10\\m-r0G2] ,U^{\operatorname{H}}GV\equiv\begin{bmatrix}U_{1}^{\operatorname{H}}\\ U_{2}^{\operatorname{H}}\end{bmatrix}G[V_{1},V_{2}]=\hbox{}\vbox{\kern 0.86108pt\hbox{$\kern 0.0pt\kern 2.5pt\kern-5.0pt\left[\kern 0.0pt\kern-2.5pt\kern-5.55557pt\vbox{\kern-0.86108pt\vbox{\vbox{ \halign{\kern\arraycolsep\hfil\@arstrut$\kbcolstyle#$\hfil\kern\arraycolsep& \kern\arraycolsep\hfil$\@kbrowstyle#$\ifkbalignright\relax\else\hfil\fi\kern\arraycolsep&& \kern\arraycolsep\hfil$\@kbrowstyle#$\ifkbalignright\relax\else\hfil\fi\kern\arraycolsep\cr 5.0pt\hfil\@arstrut$\scriptstyle$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\scriptstyle r$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\scriptstyle n-r\\\scriptstyle r$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle G_{1}$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle 0\\\scriptstyle m-r$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle 0$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle G_{2}$\hfil\kern 5.0pt\crcr}}}}\right]$}}, (1.1b)

where 1r<min{m,n}1\leq r<\min\{m,n\}. For example, this can be the SVD of GG, for which G1G_{1} is diagonal with nonnegative diagonal entries as some of the singular values of GG and G2G_{2} is leading diagonal by which we mean that only its entries along the main diagonal line starting at the top-left corner may be nonzero and nonnegative and they are some of the singular values of GG, too. By convention, GG has min{m,n}\min\{m,n\} singular values σi(G)\sigma_{i}(G) arranged in decreasing order:

σ1(G)σ2(G)σmin{m,n}(G).\sigma_{1}(G)\geq\sigma_{2}(G)\geq\cdots\geq\sigma_{\min\{m,n\}}(G). (1.2)

Denote the singular value set and its extended set of GG by

sv(G)={σi(G)}i=1min{m,n},svext(G)=sv(G){|mn| copies of 0s},\operatorname{sv}(G)=\{\sigma_{i}(G)\}_{i=1}^{\min\{m,n\}},\quad\operatorname{sv}_{\operatorname{ext}}(G)=\operatorname{sv}(G)\cup\{\mbox{$|m-n|$ copies of $0$s}\}, (1.3)

where the union is meant to be the multiset union.

Consider now that GG is perturbed to G~=G+Em×n\widetilde{G}=G+E\in\mathbb{C}^{m\times n}, and partition

UHG~V[U1HU2H](G+E)[V1,V2]= [\@arstrutrn-r\\rG1+E11E12\\m-rE21G2+E22] .U^{\operatorname{H}}\widetilde{G}V\equiv\begin{bmatrix}U_{1}^{\operatorname{H}}\\ U_{2}^{\operatorname{H}}\end{bmatrix}(G+E)[V_{1},V_{2}]=\hbox{}\vbox{\kern 0.86108pt\hbox{$\kern 0.0pt\kern 2.5pt\kern-5.0pt\left[\kern 0.0pt\kern-2.5pt\kern-5.55557pt\vbox{\kern-0.86108pt\vbox{\vbox{ \halign{\kern\arraycolsep\hfil\@arstrut$\kbcolstyle#$\hfil\kern\arraycolsep& \kern\arraycolsep\hfil$\@kbrowstyle#$\ifkbalignright\relax\else\hfil\fi\kern\arraycolsep&& \kern\arraycolsep\hfil$\@kbrowstyle#$\ifkbalignright\relax\else\hfil\fi\kern\arraycolsep\cr 5.0pt\hfil\@arstrut$\scriptstyle$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\scriptstyle r$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\scriptstyle n-r\\\scriptstyle r$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle G_{1}+E_{11}$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle E_{12}\\\scriptstyle m-r$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle E_{21}$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle G_{2}+E_{22}$\hfil\kern 5.0pt\crcr}}}}\right]$}}. (1.4)

Naturally, one would ask whether G~\widetilde{G} admits a decomposition that is “close” to the one for GG in (1.1). Stewart [15, Theorem 6.4] provided an answer to that by seeking orthogonal matrices [15, p.760]

Uˇ\displaystyle\check{U} [\@arstrutrm-r\\ˇU1ˇU2] =[U1,U2][IrΓHΓImr][(I+ΓHΓ)1/200(I+ΓΓH)1/2],\\Vˇ\displaystyle\equiv\hbox{}\vbox{\kern 0.86108pt\hbox{$\kern 0.0pt\kern 2.5pt\kern-5.0pt\left[\kern 0.0pt\kern-2.5pt\kern-5.55557pt\vbox{\kern-0.86108pt\vbox{\vbox{ \halign{\kern\arraycolsep\hfil\@arstrut$\kbcolstyle#$\hfil\kern\arraycolsep& \kern\arraycolsep\hfil$\@kbrowstyle#$\ifkbalignright\relax\else\hfil\fi\kern\arraycolsep&& \kern\arraycolsep\hfil$\@kbrowstyle#$\ifkbalignright\relax\else\hfil\fi\kern\arraycolsep\cr 5.0pt\hfil\@arstrut$\scriptstyle$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\scriptstyle r$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\scriptstyle m-r\\$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\check{U}_{1}$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\check{U}_{2}$\hfil\kern 5.0pt\crcr}}}}\right]$}}=[U_{1},U_{2}]\begin{bmatrix}I_{r}&-\Gamma^{\operatorname{H}}\\ \Gamma&I_{m-r}\end{bmatrix}\begin{bmatrix}(I+\Gamma^{\operatorname{H}}\Gamma)^{-1/2}&0\\ 0&(I+\Gamma\Gamma^{\operatorname{H}})^{-1/2}\end{bmatrix},\\\check{V} [\@arstrutrn-r\\ˇV1ˇV2] =[V1,V2][IrΩHΩInr][(I+ΩHΩ)1/200(I+ΩΩH)1/2]\displaystyle\equiv\hbox{}\vbox{\kern 0.86108pt\hbox{$\kern 0.0pt\kern 2.5pt\kern-5.0pt\left[\kern 0.0pt\kern-2.5pt\kern-5.55557pt\vbox{\kern-0.86108pt\vbox{\vbox{ \halign{\kern\arraycolsep\hfil\@arstrut$\kbcolstyle#$\hfil\kern\arraycolsep& \kern\arraycolsep\hfil$\@kbrowstyle#$\ifkbalignright\relax\else\hfil\fi\kern\arraycolsep&& \kern\arraycolsep\hfil$\@kbrowstyle#$\ifkbalignright\relax\else\hfil\fi\kern\arraycolsep\cr 5.0pt\hfil\@arstrut$\scriptstyle$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\scriptstyle r$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\scriptstyle n-r\\$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\check{V}_{1}$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\check{V}_{2}$\hfil\kern 5.0pt\crcr}}}}\right]$}}=[V_{1},V_{2}]\begin{bmatrix}I_{r}&-\Omega^{\operatorname{H}}\\ \Omega&I_{n-r}\end{bmatrix}\begin{bmatrix}(I+\Omega^{\operatorname{H}}\Omega)^{-1/2}&0\\ 0&(I+\Omega\Omega^{\operatorname{H}})^{-1/2}\end{bmatrix} (1.5c)

such that

UˇHG~Vˇ[Uˇ1HUˇ2H](G+E)[Vˇ1,Vˇ2]= [\@arstrutrn-r\\rˇG10\\m-r0ˇG2] .\check{U}^{\operatorname{H}}\widetilde{G}\check{V}\equiv\begin{bmatrix}\check{U}_{1}^{\operatorname{H}}\\ \check{U}_{2}^{\operatorname{H}}\end{bmatrix}(G+E)[\check{V}_{1},\check{V}_{2}]=\hbox{}\vbox{\kern 0.86108pt\hbox{$\kern 0.0pt\kern 2.5pt\kern-5.0pt\left[\kern 0.0pt\kern-2.5pt\kern-5.55557pt\vbox{\kern-0.86108pt\vbox{\vbox{ \halign{\kern\arraycolsep\hfil\@arstrut$\kbcolstyle#$\hfil\kern\arraycolsep& \kern\arraycolsep\hfil$\@kbrowstyle#$\ifkbalignright\relax\else\hfil\fi\kern\arraycolsep&& \kern\arraycolsep\hfil$\@kbrowstyle#$\ifkbalignright\relax\else\hfil\fi\kern\arraycolsep\cr 5.0pt\hfil\@arstrut$\scriptstyle$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\scriptstyle r$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\scriptstyle n-r\\\scriptstyle r$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\check{G}_{1}$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle 0\\\scriptstyle m-r$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle 0$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\check{G}_{2}$\hfil\kern 5.0pt\crcr}}}}\right]$}}. (1.6)

Theorem 1.1 below is [15, Theorem 6.4] in our notations here.

Theorem 1.1 ([15, Theorem 6.4]).

Given G,G~m×nG,\,\widetilde{G}\in\mathbb{C}^{m\times n}, let GG be decomposed as in (1.1), and partition UHG~VU^{\operatorname{H}}\widetilde{G}V according to (1.4). Let

ε^\displaystyle\hat{\varepsilon} =E12F2+E21F2,\displaystyle=\sqrt{\|E_{12}\|_{\operatorname{F}}^{2}+\|E_{21}\|_{\operatorname{F}}^{2}}, (1.7a)
δ\displaystyle\delta =minμsv(G1),νsvext(G2)|μν|,\displaystyle=\min_{\mu\in\operatorname{sv}(G_{1}),\,\nu\in\operatorname{sv}_{\operatorname{ext}}(G_{2})}\,|\mu-\nu|, (1.7b)
δ¯\displaystyle\underaccent{\bar}{\delta} =δE112E222,\displaystyle=\delta-\|E_{11}\|_{2}-\|E_{22}\|_{2}, (1.7c)

where 2\|\cdot\|_{2} and F\|\cdot\|_{\operatorname{F}} denote the spectral and Frobenius norms, respectively. If

δ¯>0andε^δ¯<12,\underaccent{\bar}{\delta}>0\quad\mbox{and}\quad\frac{\hat{\varepsilon}}{\underaccent{\bar}{\delta}}<\frac{1}{2}, (1.8)

then there exist Ω(nr)×r\Omega\in\mathbb{C}^{(n-r)\times r} and Γ(mr)×r\Gamma\in\mathbb{C}^{(m-r)\times r} satisfying

ΓF2+ΩF21+14(ε^/δ¯)212(ε^/δ¯)2+14(ε^/δ¯)2ε^δ¯<2ε^δ¯\sqrt{\|\Gamma\|_{\operatorname{F}}^{2}+\|\Omega\|_{\operatorname{F}}^{2}}\leq\frac{1+\sqrt{1-4(\hat{\varepsilon}/\underaccent{\bar}{\delta})^{2}}}{1-2(\hat{\varepsilon}/\underaccent{\bar}{\delta})^{2}+\sqrt{1-4(\hat{\varepsilon}/\underaccent{\bar}{\delta})^{2}}}\frac{\hat{\varepsilon}}{\underaccent{\bar}{\delta}}<2\,\frac{\hat{\varepsilon}}{\underaccent{\bar}{\delta}} (1.9)

such that (1.6) with (1.5) holds.

A number of results can be deduced from this informative theorem, e.g., bounds on UUˇF\|U-\check{U}\|_{\operatorname{F}} and VVˇF\|V-\check{V}\|_{\operatorname{F}}, the explicit expressions of Gˇ1\check{G}_{1} and Gˇ2\check{G}_{2} in terms of Ω\Omega and Γ\Gamma, G1G_{1}, G2G_{2}, and EijE_{ij}, and also the singular values of G~\widetilde{G} is the multiset union of the singular values of Gˇ1\check{G}_{1} and Gˇ2\check{G}_{2}.

Although the spectral norm 2\|\cdot\|_{2} is used in defining δ¯\underaccent{\bar}{\delta}, the Frobenius norm F\|\cdot\|_{\operatorname{F}} is in full display in defining ε^\hat{\varepsilon} and in bounding Ω\Omega and Γ\Gamma. A straightforward version of it, using only the spectral norm can be easily derived in light of the equivalency inequalities

B2BFrank(B)B2min{s,t}B2for Bs×t.\|B\|_{2}\leq\|B\|_{\operatorname{F}}\leq\sqrt{\operatorname{rank}(B)}\,\|B\|_{2}\leq\sqrt{\min\{s,t\}}\,\|B\|_{2}\quad\mbox{for $B\in\mathbb{C}^{s\times t}$}.

For example, we can conclude that if δ¯>0\underaccent{\bar}{\delta}>0 and

ε~:=min{mr,nr,r}E1222+E2122δ¯<12,\tilde{\varepsilon}:=\sqrt{\min\{m-r,n-r,r\}}\frac{\sqrt{\|E_{12}\|_{2}^{2}+\|E_{21}\|_{2}^{2}}}{\underaccent{\bar}{\delta}}<\frac{1}{2}, (1.10)

then there exist Ω(nr)×r\Omega\in\mathbb{C}^{(n-r)\times r} and Γ(mr)×r\Gamma\in\mathbb{C}^{(m-r)\times r} satisfying

ΓF2+ΩF21+14(ε~/δ¯)212(ε~/δ¯)2+14(ε~/δ¯)2ε~δ¯<2ε~δ¯\sqrt{\|\Gamma\|_{\operatorname{F}}^{2}+\|\Omega\|_{\operatorname{F}}^{2}}\leq\frac{1+\sqrt{1-4(\tilde{\varepsilon}/\underaccent{\bar}{\delta})^{2}}}{1-2(\tilde{\varepsilon}/\underaccent{\bar}{\delta})^{2}+\sqrt{1-4(\tilde{\varepsilon}/\underaccent{\bar}{\delta})^{2}}}\frac{\tilde{\varepsilon}}{\underaccent{\bar}{\delta}}<2\,\frac{\tilde{\varepsilon}}{\underaccent{\bar}{\delta}} (1.11)

such that (1.6) with (1.5) holds. There are two clear drawbacks of such a straightforward version: 1) a much stronger condition in (1.10) than something like

E1222+E2122δ¯<12\frac{\sqrt{\|E_{12}\|_{2}^{2}+\|E_{21}\|_{2}^{2}}}{\underaccent{\bar}{\delta}}<\frac{1}{2} (1.12)

as one might possibly expect, and 2) a much weaker conclusion in (1.11) than something like

Γ22+Ω22ΓF2+ΩF2<2E1222+E2122δ¯\sqrt{\|\Gamma\|_{2}^{2}+\|\Omega\|_{2}^{2}}\leq\sqrt{\|\Gamma\|_{\operatorname{F}}^{2}+\|\Omega\|_{\operatorname{F}}^{2}}<2\,\frac{\sqrt{\|E_{12}\|_{2}^{2}+\|E_{21}\|_{2}^{2}}}{\underaccent{\bar}{\delta}} (1.13)

also as one might possibly expect. The appearances of mm and nn in (1.10) are particularly unsatisfactory for large (huge) mm and usually modest rr.

Our goal in this paper is to create a version of Theorem 1.1 that uses the spectral norm and any unitarily invariant norm. Our eventual version, Theorem 3.1, when restricted exclusively to the spectral norm, does not exactly coincide with the possible expectations in (1.12) and (1.13), but is very much close to it, namely (1.12) and (1.13) with E1222+E2122\sqrt{\|E_{12}\|_{2}^{2}+\|E_{21}\|_{2}^{2}} and Γ22+Ω22\sqrt{\|\Gamma\|_{2}^{2}+\|\Omega\|_{2}^{2}} replaced by max{E122,E212}\max\{\|E_{12}\|_{2},\|E_{21}\|_{2}\} and max{Γ2,Ω2}\max\{\|\Gamma\|_{2},\|\Omega\|_{2}\}, respectively. In particular, mm and nn disappear altogether.

The rest of this paper is organized as follows. Section 2 states two preliminary results for later use in our proofs. Our main result, a variant of Stewart’s, is stated and proved in Section 3. In Section 4, we compare our main result realized for the spectral norm only and for both the spectral and Frobenius norm with Stewart’s result in Theorem 1.1 and its potential consequences in (1.10) and (1.11). Finally, we draw our conclusions in Section 5.

Notation. n×m\mathbb{C}^{n\times m} is the set of all n×mn\times m complex matrices, n=n×1\mathbb{C}^{n}=\mathbb{C}^{n\times 1}, and =1\mathbb{C}=\mathbb{C}^{1}. InI_{n} (or simply II if its dimension is clear from the context) is the n×nn\times n identity matrix. The superscript XHX^{\operatorname{H}} is the complex conjugate transpose of a matrix or vector XX. We shall also adopt MATLAB-like convention to access the entries of vectors and matrices. Let i:ji:j be the set of integers from ii to jj inclusive. X(k:,i:j)X_{(k:\ell,i:j)}, X(k:,:)X_{(k:\ell,:)}, and X(:,i:j)X_{(:,i:j)} are submatrices of XX, consisting of intersections of row kk to row \ell and column ii to column jj, row kk to row \ell, and column ii to column jj, respectively. (B){\cal R}(B) is the column space of BB, i.e., the subspace spanned by the columns of BB. Finally eig(A)\operatorname{eig}(A) is the spectrum of a square matrix AA.

We will continue to adopt the notation introduced so far in this section such as 2\|\cdot\|_{2} and F\|\cdot\|_{\operatorname{F}}. In particular, Gm×nG\in\mathbb{C}^{m\times n} has min{m,n}\min\{m,n\} singular values denoted by σi(G)\sigma_{i}(G) in decreasing order as in (1.2), and accordingly two singular value sets sv(G)\operatorname{sv}(G) and svext(G)\operatorname{sv}_{\operatorname{ext}}(G) in (1.3), and σmax(G):=σ1(G)\sigma_{\max}(G):=\sigma_{1}(G) and σmin(G):=σmin{m,n}(G)\sigma_{\min}(G):=\sigma_{\min\{m,n\}}(G).

2 Preliminaries

In this section, we will state four lemmas that we will need later. The first lemma, Lemma 2.1, is due to Stewart [14, 15], and the second one, Lemma 2.2, summarizes known bounds on the solution to the Sylvester equation with Hermitian coefficient matrices of Davis and Kahan [6] and of Bhatia, Davis, and McIntosh [4]. The third one, Lemma 2.3, establishes bounds on the solution pair to a set of the coupled Sylvester equations and the results are new, except the one for the Frobenius norm. Finally, the fourth lemma, Lemma 2.4, is likely known.

Lemma 2.1 ([14, Theorem 3.5], [15, Theorem 3.1]).

Let 𝐓\boldsymbol{T} be a bounded linear operator on the Banach space (,)(\mathscr{B},\|\cdot\|) that has a bounded inverse. Let ϕ\boldsymbol{\phi} be a continuous function on (,)(\mathscr{B},\|\cdot\|) that satisfies, for 𝐱,𝐲\boldsymbol{x},\,\boldsymbol{y}\in\mathscr{B},

(i) ϕ(𝒙)ε^𝒙2,\displaystyle\quad\|\boldsymbol{\phi}(\boldsymbol{x})\|\leq\hat{\varepsilon}\|\boldsymbol{x}\|^{2},
(ii) ϕ(𝒙)ϕ(𝒚)2ε^max{𝒙,𝒚}𝒙𝒚,\displaystyle\quad\|\boldsymbol{\phi}(\boldsymbol{x})-\boldsymbol{\phi}(\boldsymbol{y})\|\leq 2\hat{\varepsilon}\max\{\|\boldsymbol{x}\|,\|\boldsymbol{y}\|\}\|\boldsymbol{x}-\boldsymbol{y}\|,

for some ε^0\hat{\varepsilon}\geq 0. Let 0<δ^𝐓11.0<\hat{\delta}\leq\|\boldsymbol{T}^{-1}\|^{-1}. Given 𝐠\boldsymbol{g}\in\mathscr{B}, if

κ2:=(ε^/δ^2)𝒈<1/4,\kappa_{2}:=(\hat{\varepsilon}/\hat{\delta}^{2})\|\boldsymbol{g}\|<1/4,

then equation 𝐓𝐱=𝐠ϕ(𝐱)\boldsymbol{T}\boldsymbol{x}=\boldsymbol{g}-\boldsymbol{\phi}(\boldsymbol{x}) has a solution 𝐱\boldsymbol{x}\in\mathscr{B} that satisfies

𝒙1+14κ212κ2+14κ2𝒈δ^<2𝒈δ^.\|\boldsymbol{x}\|\leq\frac{1+\sqrt{1-4\kappa_{2}}}{1-2\kappa_{2}+\sqrt{1-4\kappa_{2}}}\frac{\|\boldsymbol{g}\|}{\hat{\delta}}<2\frac{\|\boldsymbol{g}\|}{\hat{\delta}}.

We need the notation of unitarily invariant norm to go forward. A matrix norm ui\|\cdot\|_{\operatorname{ui}} is called a unitarily invariant norm on m×n\mathbb{C}^{m\times n} if it is a matrix norm and has the following two additional properties [3, 16]:

  1. 1.

    UHBVui=Bui\|U^{\operatorname{H}}BV\|_{\operatorname{ui}}=\|B\|_{\operatorname{ui}} for all unitary matrices Um×mU\in\mathbb{C}^{m\times m} and Vn×nV\in\mathbb{C}^{n\times n} and Bm×nB\in\mathbb{C}^{m\times n};

  2. 2.

    Bui=B2\|B\|_{\operatorname{ui}}=\|B\|_{2}, the spectral norm of BB, if rank(B)=1\operatorname{rank}(B)=1.

Two commonly used unitarily invariant norms are

the spectral norm: B2=maxjσj\|B\|_{2}=\max_{j}\sigma_{j},
the Frobenius norm: BF=jσj2\|B\|_{\operatorname{F}}=\sqrt{\sum_{j}\sigma_{j}^{2}} ,

where σ1,σ2,,σmin{m,n}\sigma_{1},\sigma_{2},\ldots,\sigma_{\min\{m,n\}} are the singular values of BB. In what follows, ui\|\cdot\|_{\operatorname{ui}} denotes a general unitarily invariant norm.

In this article, for convenience, any ui\|\cdot\|_{\operatorname{ui}} we use is generic to matrix sizes in the sense that it applies to matrices of all sizes. Examples include the matrix spectral norm 2\|\cdot\|_{2}, the Frobenius norm F\|\cdot\|_{\operatorname{F}}, and the trace norm. One important property of unitarily invariant norms is

XYZuiX2YuiZ2\|XYZ\|_{\operatorname{ui}}\leq\|X\|_{2}\cdot\|Y\|_{\operatorname{ui}}\cdot\|Z\|_{2}

for any matrices XX, YY, and ZZ of compatible sizes.

The next lemma summarizes known bounds on the solution of the Sylvester equation AXXB=SAX-XB=S. These bounds have played important roles in eigenspace variations as demonstrated by Davis and Kahan [6, 1970], and Bhatia, Davis, and McIntosh [4, 1983]. For a brief review, see [10].

Lemma 2.2.

Consider matrix equation XABX=SXA-BX=S, where Ar×rA\in\mathbb{R}^{r\times r} and Bs×sB\in\mathbb{R}^{s\times s} are Hermitian, and Ss×rS\in\mathbb{C}^{s\times r}. If eig(A)eig(B)=\operatorname{eig}(A)\cap\operatorname{eig}(B)=\emptyset, then the equation has a unique solution Xs×rX\in\mathbb{C}^{s\times r}. Furthermore, the following statements hold.

  1. (a)

    [6] we have

    XFSF/δ,δ:=minμeig(A),νeig(B)|μν|;\|X\|_{\operatorname{F}}\leq\|S\|_{\operatorname{F}}/\delta,\quad\delta:=\min_{\mu\in\operatorname{eig}(A),\,\nu\in\operatorname{eig}(B)}\,|\mu-\nu|; (2.1)
  2. (b)

    [6] if there exist α<β\alpha<\beta and δ>0\delta>0 such that

    eithereig(A)[α,β] and eig(B)(,αδ][β+δ,),oreig(A)(,αδ][β+δ,) and eig(B)[α,β],\begin{array}[]{rl}\mbox{either}&\mbox{$\operatorname{eig}(A)\subset[\alpha,\beta]$ and $\operatorname{eig}(B)\subset(-\infty,\alpha-\delta]\cup[\beta+\delta,\infty)$},\\ \mbox{or}&\mbox{$\operatorname{eig}(A)\subset(-\infty,\alpha-\delta]\cup[\beta+\delta,\infty)$ and $\operatorname{eig}(B)\subset[\alpha,\beta]$},\end{array} (2.2)

    then for any unitarily invariant norm ui\|\cdot\|_{\operatorname{ui}},

    XuiSui/δ;\|X\|_{\operatorname{ui}}\leq\|S\|_{\operatorname{ui}}/\delta; (2.3)
  3. (c)

    [4, 5] we have

    Xui(π/2)Sui/δ,\|X\|_{\operatorname{ui}}\leq(\pi/2)\|S\|_{\operatorname{ui}}/\delta, (2.4)

    where δ\delta is as in (2.1).

The results of Lemma 2.2 have an alternative interpretation through a linear operator

𝑳:Xs×r𝑳(X)=XABX,\boldsymbol{L}\,:\,X\in\mathbb{C}^{s\times r}\,\to\,\boldsymbol{L}(X)=XA-BX\in\mathscr{B}, (2.5)

endowed with certain unitarily invariant norm ui\|\cdot\|_{\operatorname{ui}} on s×r\mathbb{C}^{s\times r}, including the spectral and Frobenius norms as special ones, where Ar×rA\in\mathbb{R}^{r\times r} and Bs×sB\in\mathbb{R}^{s\times s} are Hermitian. With any given ui\|\cdot\|_{\operatorname{ui}}, there is an induced operator norm \|\cdot\| on the linear operator from s×r\mathbb{C}^{s\times r} to itself. Translating the results of Lemma 2.2 yields the following corollary.

Corollary 2.1.

Let Ar×rA\in\mathbb{R}^{r\times r} and Bs×sB\in\mathbb{R}^{s\times s} be Hermitian, and define linear operator 𝐋\boldsymbol{L} on (s×r,ui)(\mathbb{C}^{s\times r},\|\cdot\|_{\operatorname{ui}}) as in (2.5). If eig(A)eig(B)=\operatorname{eig}(A)\cap\operatorname{eig}(B)=\emptyset, then 𝐋\boldsymbol{L} is invertible, and, furthermore, the following statements hold.

  1. (i)

    With ui=F\|\cdot\|_{\operatorname{ui}}=\|\cdot\|_{\operatorname{F}} and δ\delta as in (2.1), we have 𝑳11=δ\|\boldsymbol{L}^{-1}\|^{-1}=\delta;

  2. (ii)

    With general ui\|\cdot\|_{\operatorname{ui}} and assuming (2.2), then 𝑳11=δ\|\boldsymbol{L}^{-1}\|^{-1}=\delta, where δ\delta is the largest |μν||\mu-\nu| for some μeig(A),νeig(B)\mu\in\operatorname{eig}(A),\,\nu\in\operatorname{eig}(B) and subject to (2.2);

  3. (iii)

    With general ui\|\cdot\|_{\operatorname{ui}} and δ\delta as in (2.1), we have 𝑳11(2/π)δ\|\boldsymbol{L}^{-1}\|^{-1}\geq(2/\pi)\delta.

Proof.

To see these, we note that 𝑳11=min{γ:𝑳(X)γ(X)}\|\boldsymbol{L}^{-1}\|^{-1}=\min\{\gamma\,:\,\|\boldsymbol{L}(X)\|\geq\gamma\|(X)\|\}. Hence, immediately it follows that 𝑳11δ\|\boldsymbol{L}^{-1}\|^{-1}\geq\delta for items (i) and (ii) and 𝑳11(2/π)δ\|\boldsymbol{L}^{-1}\|^{-1}\geq(2/\pi)\delta for item (iii). The equality sign in items (i) and (ii) are achieved by letting X=𝒙𝒚HX=\boldsymbol{x}\boldsymbol{y}^{\operatorname{H}}, where 𝒙,𝒚\boldsymbol{x},\,\boldsymbol{y} are unit eigenvectors of AA and BB, respectively, such that A𝒙=μ𝒙A\boldsymbol{x}=\mu\,\boldsymbol{x} and B𝒚=ν𝒚B\boldsymbol{y}=\nu\,\boldsymbol{y}, assuming δ=|μν|\delta=|\mu-\nu|, and hence

𝑳(X)=(μν)𝒙𝒚H=(μν)X.\boldsymbol{L}(X)=(\mu-\nu)\,\boldsymbol{x}\boldsymbol{y}^{\operatorname{H}}=(\mu-\nu)X.

Also because sv(X)\operatorname{sv}(X) consists of one nonzero singular value 11 and some copies of zeros,

Xui=X2=1,\|X\|_{\operatorname{ui}}=\|X\|_{2}=1,

implying 𝑳(X)ui=δXui\|\boldsymbol{L}(X)\|_{\operatorname{ui}}=\delta\|X\|_{\operatorname{ui}} for item (i) or item (ii). ∎

For subspace variation associated with SVD, a set of two coupled Sylvester equations come into play:

XABY=S,YAHBHX=T,XA-BY=S,\quad YA^{\operatorname{H}}-B^{\operatorname{H}}X=T, (2.6)

where Ar×rA\in\mathbb{C}^{r\times r}, Bs×tB\in\mathbb{C}^{s\times t}, Ss×rS\in\mathbb{C}^{s\times r}, and Tt×rT\in\mathbb{C}^{t\times r}. Note that BB is possibly a nonsquare matrix, in which case we pad some blocks of zeros to BB, YY, and SS or TT to yield an equivalent set of two coupled Sylvester equations with BB being square. Specifically, if s>ts>t, let

Bext=[B,0s×(st)],Yext=[Y0(st)×r],Sext=S,Text=[T0(st)×r],B_{\operatorname{ext}}=[B,0_{s\times(s-t)}],\,Y_{\operatorname{ext}}=\begin{bmatrix}Y\\ 0_{(s-t)\times r}\end{bmatrix},\,S_{\operatorname{ext}}=S,\,\,T_{\operatorname{ext}}=\begin{bmatrix}T\\ 0_{(s-t)\times r}\end{bmatrix},

whereas if s<ts<t, let

Bext=[B0(ts)×t],Yext=[Y,0(ts)×r],Sext=[S0(ts)×r],Text=T.B_{\operatorname{ext}}=\begin{bmatrix}B\\ 0_{(t-s)\times t}\end{bmatrix},\,Y_{\operatorname{ext}}=[Y,0_{(t-s)\times r}],\,S_{\operatorname{ext}}=\begin{bmatrix}S\\ 0_{(t-s)\times r}\end{bmatrix},\,\,T_{\operatorname{ext}}=T.

Then (2.6) is equivalent to

XABextYext=Sext,YextAHBextHX=Text,XA-B_{\operatorname{ext}}Y_{\operatorname{ext}}=S_{\operatorname{ext}},\quad Y_{\operatorname{ext}}A^{\operatorname{H}}-B_{\operatorname{ext}}^{\operatorname{H}}X=T_{\operatorname{ext}}, (2.7)

which can be merged into one Sylvester equation

[0XYext0][0AHA0][0BextBextH0][0XYext0]=[Sext00Text].\begin{bmatrix}0&X\\ Y_{\operatorname{ext}}&0\end{bmatrix}\begin{bmatrix}0&A^{\operatorname{H}}\\ A&0\end{bmatrix}-\begin{bmatrix}0&B_{\operatorname{ext}}\\ B_{\operatorname{ext}}^{\operatorname{H}}&0\end{bmatrix}\begin{bmatrix}0&X\\ Y_{\operatorname{ext}}&0\end{bmatrix}=\begin{bmatrix}S_{\operatorname{ext}}&0\\ 0&T_{\operatorname{ext}}\end{bmatrix}. (2.8)

It can be seen that

eig([0AHA0])=sv(A)(sv(A)),eig([0BextBextH0])=svext(B)(svext(B)),\operatorname{eig}(\begin{bmatrix}0&A^{\operatorname{H}}\\ A&0\end{bmatrix})=\operatorname{sv}(A)\cup(-\operatorname{sv}(A)),\,\operatorname{eig}(\begin{bmatrix}0&B_{\operatorname{ext}}\\ B_{\operatorname{ext}}^{\operatorname{H}}&0\end{bmatrix})=\operatorname{sv}_{\operatorname{ext}}(B)\cup(-\operatorname{sv}_{\operatorname{ext}}(B)), (2.9)

where negating a set means negating each element of the set, and

[0XYext0]ui=[0XY0]ui,[Sext00Text]ui=[S00T]ui\left\|\begin{bmatrix}0&X\\ Y_{\operatorname{ext}}&0\end{bmatrix}\right\|_{\operatorname{ui}}=\left\|\begin{bmatrix}0&X\\ Y&0\end{bmatrix}\right\|_{\operatorname{ui}},\quad\left\|\begin{bmatrix}S_{\operatorname{ext}}&0\\ 0&T_{\operatorname{ext}}\end{bmatrix}\right\|_{\operatorname{ui}}=\left\|\begin{bmatrix}S&0\\ 0&T\end{bmatrix}\right\|_{\operatorname{ui}} (2.10)

for any unitary invariant norm. Apply Lemma 2.2 to get

Lemma 2.3.

Consider a set of two coupled Sylvester equations (2.6), where Ar×rA\in\mathbb{C}^{r\times r}, Bs×tB\in\mathbb{C}^{s\times t}, Ss×rS\in\mathbb{C}^{s\times r}, and Tt×rT\in\mathbb{C}^{t\times r}. If sv(A)svext(B)=\operatorname{sv}(A)\cap\operatorname{sv}_{\operatorname{ext}}(B)=\emptyset, then the set of equations has a unique solution pair (X,Y)s×r×t×r(X,Y)\in\mathbb{C}^{s\times r}\times\mathbb{C}^{t\times r}. Furthermore, the following statements hold:

  1. (a)

    we have [15]

    XF2+YF2SF2+TF2/δ,δ:=minωsv(A),γsvext(B)|ωγ|;\sqrt{\|X\|_{\operatorname{F}}^{2}+\|Y\|_{\operatorname{F}}^{2}}\leq\left.\sqrt{\|S\|_{\operatorname{F}}^{2}+\|T\|_{\operatorname{F}}^{2}}\right/\delta,\quad\delta:=\min_{\omega\in\operatorname{sv}(A),\,\gamma\in\operatorname{sv}_{\operatorname{ext}}(B)}\,|\omega-\gamma|; (2.11)
  2. (b)

    if δ:=σmin(A)σmax(B)>0\delta:=\sigma_{\min}(A)-\sigma_{\max}(B)>0, then for any unitarily invariant norm ui\|\cdot\|_{\operatorname{ui}},

    [0XY0]ui\displaystyle\left\|\begin{bmatrix}0&X\\ Y&0\end{bmatrix}\right\|_{\operatorname{ui}} 1δ[S00T]ui,\displaystyle\leq\frac{1}{\delta}\left\|\begin{bmatrix}S&0\\ 0&T\end{bmatrix}\right\|_{\operatorname{ui}}, (2.12a)
    max{Xui,Yui}\displaystyle\max\{\|X\|_{\operatorname{ui}},\|Y\|_{\operatorname{ui}}\} 1δmax{Sui,Tui},\displaystyle\leq\frac{1}{\delta}\max\{\|S\|_{\operatorname{ui}},\|T\|_{\operatorname{ui}}\}, (2.12b)

    and in particular for the spectral norm

    max{X2,Y2}1δmax{S2,T2};\max\{\|X\|_{2},\|Y\|_{2}\}\leq\frac{1}{\delta}\max\{\|S\|_{2},\|T\|_{2}\}; (2.13)
  3. (c)

    we have

    [0XY0]ui\displaystyle\left\|\begin{bmatrix}0&X\\ Y&0\end{bmatrix}\right\|_{\operatorname{ui}} π21δ[S00T]ui,\displaystyle\leq\frac{\pi}{2}\frac{1}{\delta}\left\|\begin{bmatrix}S&0\\ 0&T\end{bmatrix}\right\|_{\operatorname{ui}}, (2.14a)
    max{Xui,Yui}\displaystyle\max\{\|X\|_{\operatorname{ui}},\|Y\|_{\operatorname{ui}}\} π1δmax{Sui,Tui},\displaystyle\leq\pi\frac{1}{\delta}\max\{\|S\|_{\operatorname{ui}},\|T\|_{\operatorname{ui}}\}, (2.14b)

    where δ\delta is as in (2.11), and in particular for the spectral norm

    max{X2,Y2}π21δmax{S2,T2}.\max\{\|X\|_{2},\|Y\|_{2}\}\leq\frac{\pi}{2}\frac{1}{\delta}\max\{\|S\|_{2},\|T\|_{2}\}. (2.15)
Proof.

Recall (2.9) and (2.10). The inequality in (2.11) is essentially Stewart’s [15, Theorem 6.2], but here as a corollary of Lemma 2.2 applied to (2.8). Inequalities (2.12a) and (2.14a) are also corollaries of Lemma 2.2 applied to (2.8). Inequalities (2.13) and (2.15) follow from (2.12) and (2.14a), respectively, due to

[0XY0]2=max{X2,Y2},[S00T]2=max{S2,T2}.\left\|\begin{bmatrix}0&X\\ Y&0\end{bmatrix}\right\|_{2}=\max\{\|X\|_{2},\|Y\|_{2}\},\quad\left\|\begin{bmatrix}S&0\\ 0&T\end{bmatrix}\right\|_{2}=\max\{\|S\|_{2},\|T\|_{2}\}. (2.16)

It remains to show (2.12b) and (2.14b). Inequality (2.12b) is essentially implied in [17, section 3], but we will present a quick proof anyway. Note that for the case δ:=σmin(A)σmax(B)>0\delta:=\sigma_{\min}(A)-\sigma_{\max}(B)>0. We have

XABYui\displaystyle\|XA-BY\|_{\operatorname{ui}} XAuiBYui\displaystyle\geq\|XA\|_{\operatorname{ui}}-\|BY\|_{\operatorname{ui}}
XuiA121B2Yui\displaystyle\geq\|X\|_{\operatorname{ui}}\|A^{-1}\|_{2}^{-1}-\|B\|_{2}\|Y\|_{\operatorname{ui}}
=Xuiσmin(A)σmax(B)Yui.\displaystyle=\|X\|_{\operatorname{ui}}\sigma_{\min}(A)-\sigma_{\max}(B)\|Y\|_{\operatorname{ui}}. (2.17a)
Similarly, we can get
YAHBHXuiσmin(A)Yuiσmax(B)Xui.\|YA^{\operatorname{H}}-B^{\operatorname{H}}X\|_{\operatorname{ui}}\geq\sigma_{\min}(A)\|Y\|_{\operatorname{ui}}-\sigma_{\max}(B)\|X\|_{\operatorname{ui}}. (2.17b)

There are two cases to consider. If XuiYui\|X\|_{\operatorname{ui}}\geq\|Y\|_{\operatorname{ui}}, then by (2.17a) we get

XABYuiδXui=δmax{Xui,Yui};\|XA-BY\|_{\operatorname{ui}}\geq\delta\|X\|_{\operatorname{ui}}=\delta\,\max\{\|X\|_{\operatorname{ui}},\|Y\|_{\operatorname{ui}}\}; (2.18a)
if, on the other hand, Xui<Yui\|X\|_{\operatorname{ui}}<\|Y\|_{\operatorname{ui}}, then by (2.17b) we get
YAHBHXuiδYui=δmax{Xui,Yui}.\|YA^{\operatorname{H}}-B^{\operatorname{H}}X\|_{\operatorname{ui}}\geq\delta\|Y\|_{\operatorname{ui}}=\delta\,\max\{\|X\|_{\operatorname{ui}},\|Y\|_{\operatorname{ui}}\}. (2.18b)

Together, (2.18a) and (2.18b) yield

max{XABYui,YAHBHXui}δmax{Xui,Yui},\displaystyle\max\left\{\|XA-BY\|_{\operatorname{ui}},\|YA^{\operatorname{H}}-B^{\operatorname{H}}X\|_{\operatorname{ui}}\right\}\geq\delta\,\max\{\|X\|_{\operatorname{ui}},\|Y\|_{\operatorname{ui}}\},

as expected. Finally, noticing that

max{Xui,Yui}\displaystyle\max\{\|X\|_{\operatorname{ui}},\|Y\|_{\operatorname{ui}}\} [0XY0]ui,\displaystyle\leq\left\|\begin{bmatrix}0&X\\ Y&0\end{bmatrix}\right\|_{\operatorname{ui}},
[S00T]ui\displaystyle\left\|\begin{bmatrix}S&0\\ 0&T\end{bmatrix}\right\|_{\operatorname{ui}} Sui+Tui\displaystyle\leq\|S\|_{\operatorname{ui}}+\|T\|_{\operatorname{ui}}
2max{Sui,Tui},\displaystyle\leq 2\max\{\|S\|_{\operatorname{ui}},\|T\|_{\operatorname{ui}}\},

we see that (2.14a) implies (2.14b). ∎

Lemma 2.3 has more than what we need later. In fact, we will only use (2.13) and (2.15) in our later development.

The results of Lemma 2.3 have an alternative interpretation through a linear operator

𝑻::=s×r×t×r𝑻(X,Y)(X,Y)(XABY,YAHBHX),\begin{array}[]{cccc}\boldsymbol{T}\,:&\mathscr{B}:=\mathbb{C}^{s\times r}\times\mathbb{C}^{t\times r}&\,\to&\boldsymbol{T}(X,Y)\in\mathscr{B}\\ &(X,Y)&\,\to&(XA-BY,YA^{\operatorname{H}}-B^{\operatorname{H}}X),\end{array} (2.19)

endowed with certain norm on \mathscr{B} to make it a Banach space, including (cf. those used in Lemma 2.3)

(X,Y)=[0XY0]ui[X00Y]ui\|(X,Y)\|=\left\|\begin{bmatrix}0&X\\ Y&0\end{bmatrix}\right\|_{\operatorname{ui}}\equiv\left\|\begin{bmatrix}X&0\\ 0&Y\end{bmatrix}\right\|_{\operatorname{ui}} (2.20a)
for any (X,Y)(X,Y)\in\mathscr{B}, where ui\|\cdot\|_{\operatorname{ui}} is any given unitarily invariant norm. Two particular ones are
(X,Y)=max{X2,Y2}orXF2+YF2,\|(X,Y)\|=\max\{\|X\|_{2},\|Y\|_{2}\}\quad\mbox{or}\quad\sqrt{\|X\|_{\operatorname{F}}^{2}+\|Y\|_{\operatorname{F}}^{2}}, (2.20b)
upon realizing ui\|\cdot\|_{\operatorname{ui}} in (2.20a) as the spectral norm or the Frobenius norm. Another possible endowed norm on \mathscr{B} is
(X,Y)=max{Xui,Yui}.\|(X,Y)\|=\max\{\|X\|_{\operatorname{ui}},\|Y\|_{\operatorname{ui}}\}. (2.20c)

With each endowed norm, there is an induced operator norm \|\cdot\| on the linear operator from \mathscr{B} to itself. Translating the results of Lemma 2.3 yields the following corollary.

Corollary 2.2.

Let Ar×rA\in\mathbb{C}^{r\times r}, Bs×tB\in\mathbb{C}^{s\times t}, Ss×rS\in\mathbb{C}^{s\times r}, and Tt×rT\in\mathbb{C}^{t\times r}, and define linear operator 𝐓\boldsymbol{T} on (:=s×r×t×r,)(\mathscr{B}:=\mathbb{C}^{s\times r}\times\mathbb{C}^{t\times r},\|\cdot\|) as in (2.5) where \|\cdot\| is given by one of those in (2.20). If sv(A)svext(B)=\operatorname{sv}(A)\cap\operatorname{sv}_{\operatorname{ext}}(B)=\emptyset, then 𝐓\boldsymbol{T} is invertible, and, furthermore, the following statements hold.

  1. (i)

    With (X,Y)=XF2+YF2\|(X,Y)\|=\sqrt{\|X\|_{\operatorname{F}}^{2}+\|Y\|_{\operatorname{F}}^{2}} and δ\delta as in (2.11), we have 𝑻11=δ\|\boldsymbol{T}^{-1}\|^{-1}=\delta [15];

  2. (ii)

    With either (2.20a) or (2.20c), if δ:=σmin(A)σmax(B)>0\delta:=\sigma_{\min}(A)-\sigma_{\max}(B)>0, then 𝑻11=δ\|\boldsymbol{T}^{-1}\|^{-1}=\delta;

  3. (iii)

    With (2.20a) and δ\delta as in (2.11), we have 𝑻11(2/π)δ\|\boldsymbol{T}^{-1}\|^{-1}\geq(2/\pi)\delta;

  4. (iv)

    With (2.20c) and δ\delta as in (2.11), we have 𝑻11(1/π)δ\|\boldsymbol{T}^{-1}\|^{-1}\geq(1/\pi)\delta.

Proof.

To see these, we note that 𝑻11=min{γ:𝑻(X,Y)γ(X,Y)}\|\boldsymbol{T}^{-1}\|^{-1}=\min\{\gamma\,:\,\|\boldsymbol{T}(X,Y)\|\geq\gamma\|(X,Y)\|\}. Hence, immediately it follows that 𝑻11δ\|\boldsymbol{T}^{-1}\|^{-1}\geq\delta for items (i) and (ii), 𝑻11(2/π)δ\|\boldsymbol{T}^{-1}\|^{-1}\geq(2/\pi)\delta for item (iii), and 𝑻11(1/π)δ\|\boldsymbol{T}^{-1}\|^{-1}\geq(1/\pi)\delta for item (iv). The equality sign in items (i) and (ii) are achieved by letting X=𝒚𝒖HX=\boldsymbol{y}\boldsymbol{u}^{\operatorname{H}} and Y=𝒙𝒗HY=\boldsymbol{x}\boldsymbol{v}^{\operatorname{H}}, where 𝒖,𝒗,𝒙,𝒚\boldsymbol{u},\,\boldsymbol{v},\boldsymbol{x},\,\boldsymbol{y} are unit singular vectors of AA and BB, respectively such that A𝒗=σmin(A)𝒖A\boldsymbol{v}=\sigma_{\min}(A)\,\boldsymbol{u} and B𝒙=σmax(B)𝒚B\boldsymbol{x}=\sigma_{\max}(B)\,\boldsymbol{y}, and hence

𝑻(X,Y)=[σmin(A)σmax(B)](𝒚𝒗H,𝒙𝒖H).\boldsymbol{T}(X,Y)=[\sigma_{\min}(A)-\sigma_{\max}(B)](\boldsymbol{y}\boldsymbol{v}^{\operatorname{H}},\boldsymbol{x}\boldsymbol{u}^{\operatorname{H}}).

Also because sv(X)\operatorname{sv}(X), sv(Y)\operatorname{sv}(Y), sv(𝒚𝒗H)\operatorname{sv}(\boldsymbol{y}\boldsymbol{v}^{\operatorname{H}}), and sv(𝒙𝒖H)\operatorname{sv}(\boldsymbol{x}\boldsymbol{u}^{\operatorname{H}}) all consist of one nonzero singular value 11 and some copies of zeros,

[0XY0]ui[X00Y]ui=[𝒚𝒗H00𝒙𝒖H]ui,\displaystyle\left\|\begin{bmatrix}0&X\\ Y&0\end{bmatrix}\right\|_{\operatorname{ui}}\equiv\left\|\begin{bmatrix}X&0\\ 0&Y\end{bmatrix}\right\|_{\operatorname{ui}}=\left\|\begin{bmatrix}\boldsymbol{y}\boldsymbol{v}^{\operatorname{H}}&0\\ 0&\boldsymbol{x}\boldsymbol{u}^{\operatorname{H}}\end{bmatrix}\right\|_{\operatorname{ui}},\quad
Xui=X2=1,Yui=Y2=1,\displaystyle\|X\|_{\operatorname{ui}}=\|X\|_{2}=1,\,\,\|Y\|_{\operatorname{ui}}=\|Y\|_{2}=1,
𝒚𝒗Hui=𝒚𝒗H2=1,𝒙𝒖Hui=𝒙𝒖H2=1,\displaystyle\|\boldsymbol{y}\boldsymbol{v}^{\operatorname{H}}\|_{\operatorname{ui}}=\|\boldsymbol{y}\boldsymbol{v}^{\operatorname{H}}\|_{2}=1,\,\,\|\boldsymbol{x}\boldsymbol{u}^{\operatorname{H}}\|_{\operatorname{ui}}=\|\boldsymbol{x}\boldsymbol{u}^{\operatorname{H}}\|_{2}=1,

implying 𝑻(X,Y)=δ(X,Y)\|\boldsymbol{T}(X,Y)\|=\delta\|(X,Y)\| with the respective norm on \mathscr{B} as specified in item (i) or item (ii). ∎

The next lemma is likely known. We state it here with a proof for self-containedness.

Lemma 2.4.

Given Bm×nB\in\mathbb{C}^{m\times n} and an integer 1r<min{m,n}1\leq r<\min\{m,n\}, we have

σr(B)\displaystyle\sigma_{r}(B) max{σmin(B(:,1:r)),σmin(B(1:r,:))}σmin(B(1:r,1:r)),\displaystyle\geq\max\left\{\sigma_{\min}(B_{(:,1:r)}),\,\sigma_{\min}(B_{(1:r,:)})\right\}\geq\sigma_{\min}(B_{(1:r,1:r)}),
σr+1(B)\displaystyle\sigma_{r+1}(B) min{σmax(B(:,r+1:n)),σmax(B(r+1:m,:))}.\displaystyle\leq\min\left\{\sigma_{\max}(B_{(:,r+1:n)}),\,\sigma_{\max}(B_{(r+1:m,:)})\right\}.
Proof.

Partition BB as

B= [\@arstrutrn-r\\rB11B12\\m-rB21B22] ,B=\hbox{}\vbox{\kern 0.86108pt\hbox{$\kern 0.0pt\kern 2.5pt\kern-5.0pt\left[\kern 0.0pt\kern-2.5pt\kern-5.55557pt\vbox{\kern-0.86108pt\vbox{\vbox{ \halign{\kern\arraycolsep\hfil\@arstrut$\kbcolstyle#$\hfil\kern\arraycolsep& \kern\arraycolsep\hfil$\@kbrowstyle#$\ifkbalignright\relax\else\hfil\fi\kern\arraycolsep&& \kern\arraycolsep\hfil$\@kbrowstyle#$\ifkbalignright\relax\else\hfil\fi\kern\arraycolsep\cr 5.0pt\hfil\@arstrut$\scriptstyle$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\scriptstyle r$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle\scriptstyle n-r\\\scriptstyle r$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle B_{11}$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle B_{12}\\\scriptstyle m-r$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle B_{21}$\hfil\kern 5.0pt&5.0pt\hfil$\scriptstyle B_{22}$\hfil\kern 5.0pt\crcr}}}}\right]$}},

and then we have

BHB=[B11HB11+B21HB21B11HB12+B21HB22B12HB11+B22HB21B12HB12+B22HB22]n×n.B^{\operatorname{H}}B=\begin{bmatrix}B_{11}^{\operatorname{H}}B_{11}+B_{21}^{\operatorname{H}}B_{21}&B_{11}^{\operatorname{H}}B_{12}+B_{21}^{\operatorname{H}}B_{22}\\ B_{12}^{\operatorname{H}}B_{11}+B_{22}^{\operatorname{H}}B_{21}&B_{12}^{\operatorname{H}}B_{12}+B_{22}^{\operatorname{H}}B_{22}\end{bmatrix}\in\mathbb{C}^{n\times n}.

It follows from Fischer’s minimax principle for the symmetric eigenvalue problem (see, e.g., [12, eq. (1.3)], [13, p.206], [16, p.201]) that

[σr(B)]2=λr(BHB)\displaystyle[\sigma_{r}(B)]^{2}=\lambda_{r}(B^{\operatorname{H}}B) =maxdim𝒳=rmin𝒙𝒳n𝒙H(BHB)𝒙𝒙H𝒙\displaystyle=\max_{\dim{\cal X}=r}\min_{\boldsymbol{x}\in{\cal X}\subset\mathbb{C}^{n}}\frac{\boldsymbol{x}^{\operatorname{H}}(B^{\operatorname{H}}B)\boldsymbol{x}}{\boldsymbol{x}^{\operatorname{H}}\boldsymbol{x}}
min𝒛r𝒛H(B11HB11+B21HB21)𝒛𝒛H𝒛\displaystyle\geq\min_{\boldsymbol{z}\in\mathbb{C}^{r}}\frac{\boldsymbol{z}^{\operatorname{H}}(B_{11}^{\operatorname{H}}B_{11}+B_{21}^{\operatorname{H}}B_{21})\boldsymbol{z}}{\boldsymbol{z}^{\operatorname{H}}\boldsymbol{z}}
min𝒛r𝒛H(B11HB11)𝒛𝒛H𝒛=[σmin(B11)]2,\displaystyle\geq\min_{\boldsymbol{z}\in\mathbb{C}^{r}}\frac{\boldsymbol{z}^{\operatorname{H}}(B_{11}^{\operatorname{H}}B_{11})\boldsymbol{z}}{\boldsymbol{z}^{\operatorname{H}}\boldsymbol{z}}=[\sigma_{\min}(B_{11})]^{2}, (2.21)

where 𝒳n{\cal X}\subset\mathbb{C}^{n} denotes a subspace of n\mathbb{C}^{n}, λr(BHB)\lambda_{r}(B^{\operatorname{H}}B) is the rrth largest eigenvalue of BHBB^{\operatorname{H}}B, and the first inequality is due to limiting the subspace to the one composed of vectors with their last nrn-r entries being to 0. This proves σr(B)σmin(B(:,1:r))σmin(B(1:r,1:r))\sigma_{r}(B)\geq\sigma_{\min}(B_{(:,1:r)})\geq\sigma_{\min}(B_{(1:r,1:r)}). Next, using [σr(B)]2=λr(BBH)[\sigma_{r}(B)]^{2}=\lambda_{r}(BB^{\operatorname{H}}), in the same way as in (2.21), we can get σr(B)σmin(B(1:r,:))σmin(B(1:r,1:r))\sigma_{r}(B)\geq\sigma_{\min}(B_{(1:r,:)})\geq\sigma_{\min}(B_{(1:r,1:r)}). This completes the proof of the inequalities for σr(B)\sigma_{r}(B).

Analogously, again by Fischer’s minimax principle, we have

[σr+1(B)]2=λr+1(BHB)\displaystyle[\sigma_{r+1}(B)]^{2}=\lambda_{r+1}(B^{\operatorname{H}}B) =mindim𝒳=nrmax𝒙𝒳n𝒙H(BHB)𝒙𝒙H𝒙\displaystyle=\min_{\dim{\cal X}=n-r}\max_{\boldsymbol{x}\in{\cal X}\subset\mathbb{C}^{n}}\frac{\boldsymbol{x}^{\operatorname{H}}(B^{\operatorname{H}}B)\boldsymbol{x}}{\boldsymbol{x}^{\operatorname{H}}\boldsymbol{x}}
max𝒛nr𝒛H(B12HB12+B22HB22)𝒛𝒛H𝒛\displaystyle\leq\max_{\boldsymbol{z}\in\mathbb{C}^{n-r}}\frac{\boldsymbol{z}^{\operatorname{H}}(B_{12}^{\operatorname{H}}B_{12}+B_{22}^{\operatorname{H}}B_{22})\boldsymbol{z}}{\boldsymbol{z}^{\operatorname{H}}\boldsymbol{z}}
=λmax(B12HB12+B22HB22)=[σmax(B(:,r+1:n))]2,\displaystyle=\lambda_{\max}(B_{12}^{\operatorname{H}}B_{12}+B_{22}^{\operatorname{H}}B_{22})=[\sigma_{\max}(B_{(:,r+1:n)})]^{2}, (2.22)

where λr+1(BHB)\lambda_{r+1}(B^{\operatorname{H}}B) is the (r+1)(r+1)st largest eigenvalue of BHBB^{\operatorname{H}}B, and the first inequality is due to limiting the subspace to the one composed of vectors with their first rr entries being to 0. Next, using [σr+1(B)]2=λr+1(BBH)[\sigma_{r+1}(B)]^{2}=\lambda_{r+1}(BB^{\operatorname{H}}), in the same way as in (2.22), we can get σr+1(B)σmax(B(r+1:m,:))\sigma_{r+1}(B)\leq\sigma_{\max}(B_{(r+1:m,:)}). ∎

3 Main Result

In order to achieve (1.6), we need some Ω\Omega and Γ\Gamma to satisfy

Γ(G1+E11)(G2+E22)Ω\displaystyle\Gamma(G_{1}+E_{11})-(G_{2}+E_{22})\Omega =E21ΓE12Ω,\displaystyle=E_{21}-\Gamma\,E_{12}\Omega, (3.1a)
Ω(G1+E11)H(G2+E22)HΓ\displaystyle\Omega(G_{1}+E_{11})^{\operatorname{H}}-(G_{2}+E_{22})^{\operatorname{H}}\Gamma =E12HΩE21HΓ,\displaystyle=E_{12}^{\operatorname{H}}-\Omega E_{21}^{\operatorname{H}}\Gamma, (3.1b)

obtained from setting off-diagonal blocks of UˇHG~Vˇ\check{U}^{\operatorname{H}}\widetilde{G}\check{V}, partitioned accordingly, to 0.

In what follows, we will use Lemma 2.1 to prove the existence of a solution pair (Γ,Ω)(mr)×r×(nr)×r(\Gamma,\Omega)\in\mathbb{C}^{(m-r)\times r}\times\mathbb{C}^{(n-r)\times r} to (3.1) with an upper bound under certain conditions.

Keeping in mind that our goal is to create a variant of Theorem 1.1 in a unitarily invariant norm and the spectral norm, and hopefully the variant for the special case of the Frobenius norm is better than Theorem 1.1 in terms of both weaker conditions and stronger results. In the setting of Lemma 2.1, we will use the Banach space

:=(mr)×r×(nr)×r\mathscr{B}:=\mathbb{C}^{(m-r)\times r}\times\mathbb{C}^{(n-r)\times r}

endowed with one of the two norms: for (Γ,Ω)(\Gamma,\Omega)\in\mathscr{B},

(Γ,Ω)\displaystyle\|(\Gamma,\Omega)\| :=[0ΓΩ0]ui[Γ00Ω]ui,\displaystyle:=\left\|\begin{bmatrix}0&\Gamma\\ \Omega&0\end{bmatrix}\right\|_{\operatorname{ui}}\equiv\left\|\begin{bmatrix}\Gamma&0\\ 0&\Omega\end{bmatrix}\right\|_{\operatorname{ui}}, (3.2a)
(Γ,Ω)\displaystyle\|(\Gamma,\Omega)\| :=max{Γui,Ωui}.\displaystyle:=\max\{\|\Gamma\|_{\operatorname{ui}},\|\Omega\|_{\operatorname{ui}}\}. (3.2b)

The two endowed norms become one for the case ui=2\|\cdot\|_{\operatorname{ui}}=\|\cdot\|_{2}, the spectral norm, but otherwise are different. The linear operator 𝑻:\boldsymbol{T}\,:\,\mathscr{B}\to\mathscr{B} is given by

𝑻(Γ,Ω)=(Γ(G1+E11)(G2+E22)Ω,Ω(G1+E11)H(G2+E22)HΓ),\boldsymbol{T}(\Gamma,\Omega)=\Big{(}\Gamma(G_{1}+E_{11})-(G_{2}+E_{22})\Omega,\Omega(G_{1}+E_{11})^{\operatorname{H}}-(G_{2}+E_{22})^{\operatorname{H}}\Gamma\Big{)}, (3.3)

and the continuous function ϕ:\boldsymbol{\phi}\,:\,\mathscr{B}\to\mathscr{B} is

ϕ((Γ,Ω))=(ΓE12Ω,ΩE21HΓ).\boldsymbol{\phi}((\Gamma,\Omega))=\left(\Gamma\,E_{12}\Omega,\Omega E_{21}^{\operatorname{H}}\Gamma\right). (3.4)

It is not hard to verify that (Γ,Ω)\|(\Gamma,\Omega)\| defined in (3.2) is indeed a norm on \mathscr{B}.

Compactly, the two equations in (3.1) can be merged into one to take the form

𝑻(Γ,Ω)=(E21,E12H)ϕ((Γ,Ω)).\boldsymbol{T}(\Gamma,\Omega)=(E_{21},E_{12}^{\operatorname{H}})-\boldsymbol{\phi}((\Gamma,\Omega)). (3.5)

It remains to verify the conditions of Lemma 2.1 for 𝑻\boldsymbol{T} and ϕ\boldsymbol{\phi} we just defined. This is done in the next two lemmas. Let

δ\displaystyle\delta =minμsv(G1),νsvext(G2)|μν|,\displaystyle=\min_{\mu\in\operatorname{sv}(G_{1}),\,\nu\in\operatorname{sv}_{\operatorname{ext}}(G_{2})}\,|\mu-\nu|, (3.6a)
δ¯\displaystyle\underaccent{\bar}{\delta} =δE112E222,\displaystyle=\delta-\|E_{11}\|_{2}-\|E_{22}\|_{2}, (3.6b)
ε\displaystyle\varepsilon =max{E122,E212}.\displaystyle=\max\{\|E_{12}\|_{2},\|E_{21}\|_{2}\}. (3.6c)
Lemma 3.1.

Suppose δ¯>0\underaccent{\bar}{\delta}>0. Dependent of different cases, we have for any (Γ,Ω)(\Gamma,\Omega)\in\mathscr{B}

𝑻(Γ,Ω)δ¯c(Γ,Ω),\|\boldsymbol{T}(\Gamma,\Omega)\|\geq\frac{\underaccent{\bar}{\delta}}{c}\,\|(\Gamma,\Omega)\|, (3.7)

which implies 𝐓11δ¯/c\|\boldsymbol{T}^{-1}\|^{-1}\geq\underaccent{\bar}{\delta}/c, where δ¯\underaccent{\bar}{\delta} is as in (3.6b), and

  1. (i)

    with the endowed norm in (3.2a),

    c={1,if σmin(G1)>σmax(G2) or ui=F,π/2,otherwise;c=\begin{cases}1,\quad&\mbox{if $\sigma_{\min}(G_{1})>\sigma_{\max}(G_{2})$ or $\|\cdot\|_{\operatorname{ui}}=\|\cdot\|_{\operatorname{F}}$},\\ \pi/2,\quad&\mbox{otherwise};\end{cases} (3.8)
  2. (ii)

    with the endowed norm in (3.2b),

    c={1,if σmin(G1)>σmax(G2),π,otherwise.c=\begin{cases}1,\quad&\mbox{if $\sigma_{\min}(G_{1})>\sigma_{\max}(G_{2})$},\\ \pi,\quad&\mbox{otherwise}.\end{cases} (3.9)
Proof.

Notice that 𝑻(Γ,Ω)=(S,T)\boldsymbol{T}(\Gamma,\Omega)=(S,T) consists of a set of two coupled Sylvester equations in (2.6) with

A=G1+E11,B=G2+E22,A=G_{1}+E_{11},\,\,B=G_{2}+E_{22},

and Γ\Gamma and Ω\Omega correspond to XX and YY there, respectively. We claim that

δ¯minμsv(G1+E11),νsvext(G2+E22)|μν|=|μν|.\underaccent{\bar}{\delta}\leq\min_{\mu\in\operatorname{sv}(G_{1}+E_{11}),\,\nu\in\operatorname{sv}_{\operatorname{ext}}(G_{2}+E_{22})}\,|\mu-\nu|=|\mu^{\prime}-\nu^{\prime}|. (3.10)

where μsv(G1+E11),νsvext(G2+E22)\mu^{\prime}\in\operatorname{sv}(G_{1}+E_{11}),\,\nu^{\prime}\in\operatorname{sv}_{\operatorname{ext}}(G_{2}+E_{22}) achieve the minimum. By Mirsky’s theorem [16, p.204], there are μsv(G1),νsvext(G2)\mu\in\operatorname{sv}(G_{1}),\,\nu\in\operatorname{sv}_{\operatorname{ext}}(G_{2}) such that |μμ|E112|\mu^{\prime}-\mu|\leq\|E_{11}\|_{2} and |νν|E222|\nu^{\prime}-\nu|\leq\|E_{22}\|_{2}. Hence

|μν|\displaystyle|\mu^{\prime}-\nu^{\prime}| =|μν+(μμ)(νν)|\displaystyle=|\mu-\nu+(\mu^{\prime}-\mu)-(\nu^{\prime}-\nu)|
|μν||μμ||νν|\displaystyle\geq|\mu-\nu|-|\mu^{\prime}-\mu|-|\nu^{\prime}-\nu|
δE112E222=δ¯,\displaystyle\geq\delta-\|E_{11}\|_{2}-\|E_{22}\|_{2}=\underaccent{\bar}{\delta},

yielding (3.10).

If σmin(G1)>σmax(G2)\sigma_{\min}(G_{1})>\sigma_{\max}(G_{2}), then δ=σmin(G1)σmax(G2)>0\delta=\sigma_{\min}(G_{1})-\sigma_{\max}(G_{2})>0, and hence

0<δ¯\displaystyle 0<\underaccent{\bar}{\delta} =σmin(G1)σmax(G2)E112E222\displaystyle=\sigma_{\min}(G_{1})-\sigma_{\max}(G_{2})-\|E_{11}\|_{2}-\|E_{22}\|_{2}
=[σmin(G1)E112][σmax(G2)+E222].\displaystyle=[\sigma_{\min}(G_{1})-\|E_{11}\|_{2}]-[\sigma_{\max}(G_{2})+\|E_{22}\|_{2}].

Keep in mind that

σmin(G1+E11)σmin(G1)E112,σmax(G2+E22)σmax(G2)+E222\sigma_{\min}(G_{1}+E_{11})\geq\sigma_{\min}(G_{1})-\|E_{11}\|_{2},\quad\sigma_{\max}(G_{2}+E_{22})\leq\sigma_{\max}(G_{2})+\|E_{22}\|_{2}

by Mirsky’s theorem [16, p.204], and hence

sv(A)[σmin(G1)E112,),svext(B)[0,σmax(G2)+E222].\operatorname{sv}(A)\subset[\sigma_{\min}(G_{1})-\|E_{11}\|_{2},\infty),\quad\operatorname{sv}_{\operatorname{ext}}(B)\subset[0,\sigma_{\max}(G_{2})+\|E_{22}\|_{2}].

This lemma is a consequence of Lemma 2.3. ∎

Remark 3.1.

There are a few comments in order, regarding the assumptions in Lemma 3.1 that ensure (3.7).

  1. 1)

    Always sv(G1)svext(G2)=sv(G1+E11)svext(G2+E22)=\operatorname{sv}(G_{1})\cap\operatorname{sv}_{\operatorname{ext}}(G_{2})=\operatorname{sv}(G_{1}+E_{11})\cap\operatorname{sv}_{\operatorname{ext}}(G_{2}+E_{22})=\emptyset, guaranteed by δ¯>0\underaccent{\bar}{\delta}>0;

  2. 2)

    If mnm\neq n, then 0 is an element of both svext(G2)\operatorname{sv}_{\operatorname{ext}}(G_{2}) and svext(G2+E22)\operatorname{sv}_{\operatorname{ext}}(G_{2}+E_{22}) and hence σmin(G1)>0\sigma_{\min}(G_{1})>0 and σmin(G1+E11)>0\sigma_{\min}(G_{1}+E_{11})>0;

  3. 3)

    Two different ways of separation between sv(G1)\operatorname{sv}(G_{1}) and svext(G2)\operatorname{sv}_{\operatorname{ext}}(G_{2}) are assumed:

    1. (i)

      simply sv(G1)svext(G2)=\operatorname{sv}(G_{1})\cap\operatorname{sv}_{\operatorname{ext}}(G_{2})=\emptyset;

    2. (ii)

      σmin(G1)>σmax(G2)\sigma_{\min}(G_{1})>\sigma_{\max}(G_{2}), i..e, the interval [σmax(G2),σmin(G1)][\sigma_{\max}(G_{2}),\sigma_{\min}(G_{1})] separates sv(G1)\operatorname{sv}(G_{1}) from svext(G2)\operatorname{sv}_{\operatorname{ext}}(G_{2}).

    Assumption (i) of separation is weaker than Assumption (ii) of separation. Each implies the same way of separation between sv(G1+E11)\operatorname{sv}(G_{1}+E_{11}) and svext(G2+E22)\operatorname{sv}_{\operatorname{ext}}(G_{2}+E_{22}) by δ¯>0\underaccent{\bar}{\delta}>0;

  4. 4)

    The endowed norm (3.2a) works with both assumptions of separation on the singular values. Correspondingly, c=1c=1 always for the Frobenius norm, and c=1c=1 under Assumption (ii) of separation above and π/2\pi/2 otherwise;

  5. 5)

    The endowed norm (3.2b) works with both assumptions of separation, too. Correspondingly, c=1c=1 always for the Frobenius norm, and c=1c=1 under Assumption (ii) of separation and π\pi otherwise.

Finally, because of (2.16), items (i) and (ii) of Lemma 3.1 overlap at the case ui=2\|\cdot\|_{\operatorname{ui}}=\|\cdot\|_{2} and σmin(G1)>σmax(G2)\sigma_{\min}(G_{1})>\sigma_{\max}(G_{2}).

In what follows, our representation will assume that one of the endowed norm (,)\|(\cdot,\cdot)\| in (3.2) is selected and fixed, and, along with it, the part of Lemma 3.1, unless explicitly stated otherwise.

Lemma 3.2.

For the continuous function ϕ\boldsymbol{\phi} defined in (3.4), we have

(i) ϕ((Γ,Ω))ε(Γ,Ω)2,\displaystyle\quad\|\boldsymbol{\phi}((\Gamma,\Omega))\|\leq{\varepsilon}\,\|(\Gamma,\Omega)\|^{2},
(ii) ϕ((Γ,Ω))ϕ((Γ^,Ω^))2εmax{(Γ,Ω),(Γ^,Ω^)}(ΓΓ^,ΩΩ^),\displaystyle\quad\|\boldsymbol{\phi}((\Gamma,\Omega))-\boldsymbol{\phi}((\widehat{\Gamma},\widehat{\Omega}))\|\leq 2\varepsilon\max\{\|(\Gamma,\Omega)\|,\|(\widehat{\Gamma},\widehat{\Omega})\|\}\|(\Gamma-\widehat{\Gamma},\Omega-\widehat{\Omega})\|,

where ε=max{E122,E212}\varepsilon=\max\{\|E_{12}\|_{2},\|E_{21}\|_{2}\} is as in (3.6c).

Proof.

With (3.2a), we have

ϕ((Γ,Ω))\displaystyle\|\boldsymbol{\phi}((\Gamma,\Omega))\| =[ΓE12Ω00ΩE21HΓ]ui=[Γ00Ω][E1200E21H][Ω00Γ]ui\displaystyle=\left\|\begin{bmatrix}\Gamma\,E_{12}\Omega&0\\ 0&\Omega E_{21}^{\operatorname{H}}\Gamma\end{bmatrix}\right\|_{\operatorname{ui}}=\left\|\begin{bmatrix}\Gamma&0\\ 0&\Omega\end{bmatrix}\begin{bmatrix}E_{12}&0\\ 0&E_{21}^{\operatorname{H}}\end{bmatrix}\begin{bmatrix}\Omega&0\\ 0&\Gamma\end{bmatrix}\right\|_{\operatorname{ui}}
[Γ00Ω]ui[E1200E21H]2[Ω00Γ]ui\displaystyle\leq\left\|\begin{bmatrix}\Gamma&0\\ 0&\Omega\end{bmatrix}\right\|_{\operatorname{ui}}\left\|\begin{bmatrix}E_{12}&0\\ 0&E_{21}^{\operatorname{H}}\end{bmatrix}\right\|_{2}\left\|\begin{bmatrix}\Omega&0\\ 0&\Gamma\end{bmatrix}\right\|_{\operatorname{ui}}
=max{E122,E212}×(Γ,Ω)2;\displaystyle=\max\{\|E_{12}\|_{2},\|E_{21}\|_{2}\}\times\|(\Gamma,\Omega)\|^{2}; (3.11)

with (3.2b), we have

ϕ((Γ,Ω))\displaystyle\|\boldsymbol{\phi}((\Gamma,\Omega))\| =max{ΓE12Ωui,ΩE21HΓui}\displaystyle=\max\left\{\|\Gamma\,E_{12}\Omega\|_{\operatorname{ui}},\|\Omega E_{21}^{\operatorname{H}}\Gamma\|_{\operatorname{ui}}\right\}
max{E122,E212}×ΓuiΩui\displaystyle\leq\max\{\|E_{12}\|_{2},\|E_{21}\|_{2}\}\times\|\Gamma\|_{\operatorname{ui}}\|\Omega\|_{\operatorname{ui}}
max{E122,E212}×(Γ,Ω)2.\displaystyle\leq\max\{\|E_{12}\|_{2},\|E_{21}\|_{2}\}\times\|(\Gamma,\Omega)\|^{2}. (3.12)

This proves the inequality in item (i). For item (ii), we note

ϕ((Γ,Ω))ϕ((Γ^,Ω^))=(ΓE12ΩΓ^E12Ω^,ΩE21HΓΩ^E21HΓ^).\boldsymbol{\phi}((\Gamma,\Omega))-\boldsymbol{\phi}((\widehat{\Gamma},\widehat{\Omega}))=\left(\Gamma\,E_{12}\Omega-\widehat{\Gamma}\,E_{12}\widehat{\Omega},\Omega E_{21}^{\operatorname{H}}\Gamma-\widehat{\Omega}E_{21}^{\operatorname{H}}\widehat{\Gamma}\right).

For each of the two components, we have

ΓE12ΩΓ^E12Ω^\displaystyle\Gamma\,E_{12}\Omega-\widehat{\Gamma}\,E_{12}\widehat{\Omega} =(ΓΓ^)E12Ω+Γ^E12(ΩΩ^),\displaystyle=(\Gamma-\widehat{\Gamma})\,E_{12}\Omega+\widehat{\Gamma}\,E_{12}(\Omega-\widehat{\Omega}), (3.13a)
ΩE21HΓΩ^E21HΓ^\displaystyle\Omega E_{21}^{\operatorname{H}}\Gamma-\widehat{\Omega}E_{21}^{\operatorname{H}}\widehat{\Gamma} =(ΩΩ^)E21HΓ+Ω^E21H(ΓΓ^).\displaystyle=(\Omega-\widehat{\Omega})\,E_{21}^{\operatorname{H}}\Gamma+\widehat{\Omega}E_{21}^{\operatorname{H}}(\Gamma-\widehat{\Gamma}). (3.13b)

With (3.2a), we get, similar to the derivation in (3.11),

ϕ((Γ,Ω))\displaystyle\|\boldsymbol{\phi}((\Gamma,\Omega)) ϕ((Γ^,Ω^))\displaystyle-\boldsymbol{\phi}((\widehat{\Gamma},\widehat{\Omega}))\|
=[(ΓΓ^)E12Ω+Γ^E12(ΩΩ^)00(ΩΩ^)E21HΓ+Ω^E21H(ΓΓ^)]ui\displaystyle=\left\|\begin{bmatrix}(\Gamma-\widehat{\Gamma})\,E_{12}\Omega+\widehat{\Gamma}\,E_{12}(\Omega-\widehat{\Omega})&0\\ 0&(\Omega-\widehat{\Omega})\,E_{21}^{\operatorname{H}}\Gamma+\widehat{\Omega}E_{21}^{\operatorname{H}}(\Gamma-\widehat{\Gamma})\end{bmatrix}\right\|_{\operatorname{ui}}
[(ΓΓ^)E12Ω00(ΩΩ^)E21HΓ]ui+[Γ^E12(ΩΩ^)00Ω^E21H(ΓΓ^)]ui\displaystyle\leq\left\|\begin{bmatrix}(\Gamma-\widehat{\Gamma})\,E_{12}\Omega&0\\ 0&(\Omega-\widehat{\Omega})\,E_{21}^{\operatorname{H}}\Gamma\end{bmatrix}\right\|_{\operatorname{ui}}+\left\|\begin{bmatrix}\widehat{\Gamma}\,E_{12}(\Omega-\widehat{\Omega})&0\\ 0&\widehat{\Omega}E_{21}^{\operatorname{H}}(\Gamma-\widehat{\Gamma})\end{bmatrix}\right\|_{\operatorname{ui}}
ε(ΓΓ^,ΩΩ^)((Γ,Ω)+(Γ^,Ω^))\displaystyle\leq\varepsilon\,\|(\Gamma-\widehat{\Gamma},\Omega-\widehat{\Omega})\|\big{(}\|(\Gamma,\Omega)\|+\|(\widehat{\Gamma},\widehat{\Omega})\|\big{)}
2εmax{(Γ,Ω),(Γ^,Ω^)}(ΓΓ^,ΩΩ^);\displaystyle\leq 2\varepsilon\max\{\|(\Gamma,\Omega)\|,\|(\widehat{\Gamma},\widehat{\Omega})\|\}\|(\Gamma-\widehat{\Gamma},\Omega-\widehat{\Omega})\|;

with (3.2b), we get, similar to the derivation in (3.12),

max{ΓE12ΩΓ^E12Ω^ui,ΩE21HΓΩ^E21HΓ^ui}\displaystyle\max\{\|\Gamma\,E_{12}\Omega-\widehat{\Gamma}\,E_{12}\widehat{\Omega}\|_{\operatorname{ui}},\|\Omega E_{21}^{\operatorname{H}}\Gamma-\widehat{\Omega}E_{21}^{\operatorname{H}}\widehat{\Gamma}\|_{\operatorname{ui}}\}
εmax{ΓΓ^uiΩui+Γ^uiΩΩ^ui,ΩΩ^uiΓui+Ω^uiΓΓ^ui}\displaystyle\qquad\leq\varepsilon\max\{\|\Gamma-\widehat{\Gamma}\|_{\operatorname{ui}}\|\Omega\|_{\operatorname{ui}}+\|\widehat{\Gamma}\|_{\operatorname{ui}}\|\Omega-\widehat{\Omega}\|_{\operatorname{ui}},\|\Omega-\widehat{\Omega}\|_{\operatorname{ui}}\|\Gamma\|_{\operatorname{ui}}+\|\widehat{\Omega}\|_{\operatorname{ui}}\|\Gamma-\widehat{\Gamma}\|_{\operatorname{ui}}\}
εmax{ΓΓ^ui,ΩΩ^ui}max{Ωui+Γ^ui,Γui+Ω^ui}\displaystyle\qquad\leq\varepsilon\max\{\|\Gamma-\widehat{\Gamma}\|_{\operatorname{ui}},\|\Omega-\widehat{\Omega}\|_{\operatorname{ui}}\}\cdot\max\{\|\Omega\|_{\operatorname{ui}}+\|\widehat{\Gamma}\|_{\operatorname{ui}},\|\Gamma\|_{\operatorname{ui}}+\|\widehat{\Omega}\|_{\operatorname{ui}}\}
εmax{ΓΓ^ui,ΩΩ^ui}2max{(Γ,Ω),(Γ^,Ω^)}\displaystyle\qquad\leq\varepsilon\max\{\|\Gamma-\widehat{\Gamma}\|_{\operatorname{ui}},\|\Omega-\widehat{\Omega}\|_{\operatorname{ui}}\}\cdot 2\max\{\|(\Gamma,\Omega)\|,\|(\widehat{\Gamma},\widehat{\Omega})\|\}
=2εmax{(Γ,Ω),(Γ^,Ω^)}(ΓΓ^,ΩΩ^),\displaystyle\qquad=2\varepsilon\,\max\{\|(\Gamma,\Omega)\|,\|(\widehat{\Gamma},\widehat{\Omega})\|\}\|(\Gamma-\widehat{\Gamma},\Omega-\widehat{\Omega})\|,

completing the proof of item (ii). ∎

Next we apply Lemma 2.1 to ensure a particular solution (Γ,Ω)(\Gamma,\Omega)\in\mathscr{B} to (3.5).

Lemma 3.3.

Let δ\delta, δ¯\underaccent{\bar}{\delta}, and ε\varepsilon be defined as in (3.6), and suppose the conditions in Lemma 3.1 that ensure (3.7) with constant cc as specified there. If

δ¯>0andκ2:=c2εδ¯2(E21,E12H)<14,\underaccent{\bar}{\delta}>0\quad\mbox{and}\quad\kappa_{2}:=\frac{c^{2}\varepsilon}{\underaccent{\bar}{\delta}^{2}}\,\|(E_{21},E_{12}^{\operatorname{H}})\|<\frac{1}{4}, (3.14)

then (3.5) has a solution (Γ,Ω)(\Gamma,\Omega) that satisfies

(Γ,Ω)1+14κ212κ2+14κ2c(E21,E12H)δ¯<2c(E21,E12H)δ¯.\|(\Gamma,\Omega)\|\leq\frac{1+\sqrt{1-4\kappa_{2}}}{1-2\kappa_{2}+\sqrt{1-4\kappa_{2}}}\frac{c\|(E_{21},E_{12}^{\operatorname{H}})\|}{\underaccent{\bar}{\delta}}<2\,\frac{c\|(E_{21},E_{12}^{\operatorname{H}})\|}{\underaccent{\bar}{\delta}}. (3.15)

Here (E21,E12H)\|(E_{21},E_{12}^{\operatorname{H}})\| and (Γ,Ω)\|(\Gamma,\Omega)\| are understood as the same one of the endowed norms in (3.2) under consideration.

Proof.

In light of Lemmas 3.1 and 3.2, we find the conclusion is a straightforward consequence of Lemma 2.1 with ε^=ε\hat{\varepsilon}=\varepsilon, δ^=δ¯/c\hat{\delta}=\underaccent{\bar}{\delta}/c, and 𝒈=(E21,E12H)\|\boldsymbol{g}\|=\|(E_{21},E_{12}^{\operatorname{H}})\|. ∎

Finally, we state the main result of this section.

Theorem 3.1.

Given G,G~m×nG,\,\widetilde{G}\in\mathbb{C}^{m\times n}, let GG be decomposed as in (1.1), and partition UHG~VU^{\operatorname{H}}\widetilde{G}V according to (1.4). Let δ\delta, δ¯\underaccent{\bar}{\delta}, and ε\varepsilon be defined as in (3.6). If (3.14) is satisfied, then the following statements hold:

  1. (a)

    there exists a solution (Γ,Ω)(mr)×r×(nr)×r(\Gamma,\Omega)\in\mathbb{C}^{(m-r)\times r}\times\mathbb{C}^{(n-r)\times r} to (3.1), satisfying (3.15);

  2. (b)

    G~\widetilde{G} admits the decomposition (1.6), and the singular values of G~\widetilde{G} is the multiset union of those of

    Gˇ1\displaystyle\check{G}_{1} =Uˇ1HG~Vˇ1\displaystyle=\check{U}_{1}^{\operatorname{H}}\widetilde{G}\check{V}_{1}
    =(I+ΓHΓ)1/2(G1+E11+E12Ω)(I+ΩHΩ)1/2\displaystyle=(I+\Gamma^{\operatorname{H}}\Gamma)^{1/2}(G_{1}+E_{11}+E_{12}\Omega)(I+\Omega^{\operatorname{H}}\Omega)^{-1/2} (3.16a)
    =(I+ΓHΓ)1/2(G1+E11+ΓHE21)(I+ΩHΩ)1/2,\displaystyle=(I+\Gamma^{\operatorname{H}}\Gamma)^{-1/2}(G_{1}+E_{11}+\Gamma^{\operatorname{H}}E_{21})(I+\Omega^{\operatorname{H}}\Omega)^{1/2}, (3.16b)

    and

    Gˇ2\displaystyle\check{G}_{2} =Uˇ2HG~Vˇ2\displaystyle=\check{U}_{2}^{\operatorname{H}}\widetilde{G}\check{V}_{2}
    =(I+ΓΓH)1/2(G2+E22E21ΩH)(I+ΩΩH)1/2\displaystyle=(I+\Gamma\Gamma^{\operatorname{H}})^{1/2}(G_{2}+E_{22}-E_{21}\Omega^{\operatorname{H}})(I+\Omega\Omega^{\operatorname{H}})^{-1/2} (3.17a)
    =(I+ΓΓH)1/2(G2+E22ΓE12)(I+ΩΩH)1/2;\displaystyle=(I+\Gamma\Gamma^{\operatorname{H}})^{-1/2}(G_{2}+E_{22}-\Gamma E_{12})(I+\Omega\Omega^{\operatorname{H}})^{1/2}; (3.17b)
  3. (c)

    We have

    σmin(Gˇ1)σmin(G1)E1122cε(E21,E12H)δ¯,\displaystyle\sigma_{\min}(\check{G}_{1})\geq\sigma_{\min}(G_{1})-\|E_{11}\|_{2}-2c\,\frac{\varepsilon\,\|(E_{21},E_{12}^{\operatorname{H}})\|}{\underaccent{\bar}{\delta}}, (3.18a)
    σmax(Gˇ2)σmax(G2)+E222+2cε(E21,E12H)δ¯,\displaystyle\sigma_{\max}(\check{G}_{2})\leq\sigma_{\max}(G_{2})+\|E_{22}\|_{2}+2c\,\frac{\varepsilon\,\|(E_{21},E_{12}^{\operatorname{H}})\|}{\underaccent{\bar}{\delta}}, (3.18b)

    where σmin(Gˇ1)\sigma_{\min}(\check{G}_{1}) and σmax(Gˇ2)\sigma_{\max}(\check{G}_{2}) are the smallest singular value of Gˇ1\check{G}_{1} and the largest singular value of Gˇ2\check{G}_{2}, respectively;

  4. (d)

    The left and right singular subspaces of G~\widetilde{G} associated with the part of its singular values sv(Gˇ)\operatorname{sv}(\check{G}) are spanned by the columns of

    Uˇ1\displaystyle\check{U}_{1} =(U1+U2Γ)(I+ΓHΓ)1/2,\displaystyle=(U_{1}+U_{2}\Gamma)(I+\Gamma^{\operatorname{H}}\Gamma)^{-1/2}, (3.19a)
    Vˇ1\displaystyle\check{V}_{1} =(V1+V2Ω)(I+ΩHΩ)1/2,\displaystyle=(V_{1}+V_{2}\Omega)(I+\Omega^{\operatorname{H}}\Omega)^{-1/2}, (3.19b)

    respectively. In particular,

    Uˇ1U1ui\displaystyle\|\check{U}_{1}-U_{1}\|_{\operatorname{ui}} Γui(Γ,Ω)2c(E21,E12H)δ¯,\displaystyle\leq\|\Gamma\|_{\operatorname{ui}}\leq\|(\Gamma,\Omega)\|\leq 2\,\frac{c\|(E_{21},E_{12}^{\operatorname{H}})\|}{\underaccent{\bar}{\delta}}, (3.20a)
    Vˇ1V1ui\displaystyle\|\check{V}_{1}-V_{1}\|_{\operatorname{ui}} Ωui(Γ,Ω)2c(E21,E12H)δ¯.\displaystyle\leq\|\Omega\|_{\operatorname{ui}}\leq\|(\Gamma,\Omega)\|\leq 2\,\frac{c\|(E_{21},E_{12}^{\operatorname{H}})\|}{\underaccent{\bar}{\delta}}. (3.20b)
Proof.

Only item (c) and the inequalities in (3.20) of item (d) need proofs. For item (c), using (3.16a) for Gˇ1\check{G}_{1} and (3.16b) for Gˇ1\check{G}_{1} in Gˇ1H\check{G}_{1}^{\operatorname{H}} below, we get

Gˇ1Gˇ1H=(I+ΓHΓ)1/2(G1+E11+E12Ω)(G1+E11+ΓHE21)H(I+ΓHΓ)1/2.\check{G}_{1}\check{G}_{1}^{\operatorname{H}}=(I+\Gamma^{\operatorname{H}}\Gamma)^{1/2}(G_{1}+E_{11}+E_{12}\Omega)(G_{1}+E_{11}+\Gamma^{\operatorname{H}}E_{21})^{\operatorname{H}}(I+\Gamma^{\operatorname{H}}\Gamma)^{-1/2}.

Hence

[σmin(Gˇ1)]2=λmin(Gˇ1Gˇ1H)=λmin((G1+E11+E12Ω)(G1+E11+ΓHE21)H),\big{[}\sigma_{\min}(\check{G}_{1})\big{]}^{2}=\lambda_{\min}(\check{G}_{1}\check{G}_{1}^{\operatorname{H}})=\lambda_{\min}\left((G_{1}+E_{11}+E_{12}\Omega)(G_{1}+E_{11}+\Gamma^{\operatorname{H}}E_{21})^{\operatorname{H}}\right),

where λmin()\lambda_{\min}(\cdot) is the smallest eigenvalues of a Hermitian matrix. Therefore, with the help of (3.15), we have

[σmin(Gˇ1)]2\displaystyle\big{[}\sigma_{\min}(\check{G}_{1})\big{]}^{2} =[(G1+E11+E12Ω)(G1+E11+ΓHE21)H]121\displaystyle=\left\|\left[(G_{1}+E_{11}+E_{12}\Omega)(G_{1}+E_{11}+\Gamma^{\operatorname{H}}E_{21})^{\operatorname{H}}\right]^{-1}\right\|_{2}^{-1}
(G1+E11+ΓHE21)H21(G1+E11+E12Ω)121\displaystyle\geq\left\|(G_{1}+E_{11}+\Gamma^{\operatorname{H}}E_{21})^{-\operatorname{H}}\right\|_{2}^{-1}\left\|(G_{1}+E_{11}+E_{12}\Omega)^{-1}\right\|_{2}^{-1}
(σmin(G1)E112Γ2E212)(σmin(G1)E112E122Ω2)\displaystyle\geq\left(\sigma_{\min}(G_{1})-\|E_{11}\|_{2}-\|\Gamma\|_{2}\|E_{21}\|_{2}\right)\left(\sigma_{\min}(G_{1})-\|E_{11}\|_{2}-\|E_{12}\|_{2}\|\Omega\|_{2}\right)
(σmin(G1)E112(Γ,Ω)ε)2\displaystyle\geq\Big{(}\sigma_{\min}(G_{1})-\|E_{11}\|_{2}-\|(\Gamma,\Omega)\|\varepsilon\Big{)}^{2}
(σmin(G1)E1122cε(E21,E12H)δ¯)2,\displaystyle\geq\left(\sigma_{\min}(G_{1})-\|E_{11}\|_{2}-2c\,\frac{\varepsilon\,\|(E_{21},E_{12}^{\operatorname{H}})\|}{\underaccent{\bar}{\delta}}\right)^{2}, (3.21)

yielding the first inequality in (3.18). Similarly, we get the second inequality there by using (3.17).

Now we prove the inequalities in (3.20). We have

Uˇ1U1\displaystyle\check{U}_{1}-U_{1} =U1[(I+ΓHΓ)1/2I]+U2Γ(I+ΓHΓ)1/2\displaystyle=U_{1}\big{[}(I+\Gamma^{\operatorname{H}}\Gamma)^{-1/2}-I\big{]}+U_{2}\Gamma(I+\Gamma^{\operatorname{H}}\Gamma)^{-1/2}
=[U1,U2][I(I+ΓHΓ)1/2Γ(I+ΓHΓ)1/2].\displaystyle=[-U_{1},U_{2}]\begin{bmatrix}I-(I+\Gamma^{\operatorname{H}}\Gamma)^{-1/2}\\ \Gamma(I+\Gamma^{\operatorname{H}}\Gamma)^{-1/2}\end{bmatrix}.

Let Γ=ZΞWH\Gamma=Z\Xi W^{\operatorname{H}} be the SVD of Γ\Gamma. We find

[I(I+ΓHΓ)1/2Γ(I+ΓHΓ)1/2]=[WZ][I(I+ΞHΞ)1/2Ξ(I+ΞHΞ)1/2]WH,\begin{bmatrix}I-(I+\Gamma^{\operatorname{H}}\Gamma)^{-1/2}\\ \Gamma(I+\Gamma^{\operatorname{H}}\Gamma)^{-1/2}\end{bmatrix}=\begin{bmatrix}W&\\ &Z\end{bmatrix}\begin{bmatrix}I-(I+\Xi^{\operatorname{H}}\Xi)^{-1/2}\\ \Xi(I+\Xi^{\operatorname{H}}\Xi)^{-1/2}\end{bmatrix}W^{\operatorname{H}},

where for the middle matrix on the right, I(I+ΞHΞ)1/2I-(I+\Xi^{\operatorname{H}}\Xi)^{-1/2} is diagonal and Ξ(I+ΞHΞ)1/2\Xi(I+\Xi^{\operatorname{H}}\Xi)^{-1/2} is leading diagonal. Hence the singular values of the middle matrix are given by: for each singular value γ\gamma of Γ\Gamma,

(111+γ2)2+(γ1+γ2)2\displaystyle\sqrt{\left(1-\frac{1}{\sqrt{1+\gamma^{2}}}\right)^{2}+\left(\frac{\gamma}{\sqrt{1+\gamma^{2}}}\right)^{2}} =2(111+γ2)\displaystyle=\sqrt{2\left(1-\frac{1}{\sqrt{1+\gamma^{2}}}\right)} (3.22)
=2γ[1+γ2(1+γ2+1)]1/2\displaystyle=\frac{\sqrt{2}\,\gamma}{\big{[}\sqrt{1+\gamma^{2}}(\sqrt{1+\gamma^{2}}+1)\big{]}^{1/2}} (3.23)
γ.\displaystyle\leq\gamma.

Therefore, we get111By (3.22) and (3.23), we conclude that for the spectral norm Uˇ1U12=2Γ2[1+Γ22(1+Γ22+1)]1/2,Vˇ1V12=2Ω2[1+Ω22(1+Ω22+1)]1/2.\|\check{U}_{1}-U_{1}\|_{2}=\frac{\sqrt{2}\,\|\Gamma\|_{2}}{\big{[}\sqrt{1+\|\Gamma\|_{2}^{2}}(\sqrt{1+\|\Gamma\|_{2}^{2}}+1)\big{]}^{1/2}},\,\,\|\check{V}_{1}-V_{1}\|_{2}=\frac{\sqrt{2}\,\|\Omega\|_{2}}{\big{[}\sqrt{1+\|\Omega\|_{2}^{2}}(\sqrt{1+\|\Omega\|_{2}^{2}}+1)\big{]}^{1/2}}.

Uˇ1U1ui=[I(I+ΞHΞ)1/2Ξ(I+ΞHΞ)1/2]uiΓui,\|\check{U}_{1}-U_{1}\|_{\operatorname{ui}}=\left\|\begin{bmatrix}I-(I+\Xi^{\operatorname{H}}\Xi)^{-1/2}\\ \Xi(I+\Xi^{\operatorname{H}}\Xi)^{-1/2}\end{bmatrix}\right\|_{\operatorname{ui}}\leq\|\Gamma\|_{\operatorname{ui}},

yielding (3.20a) in light of (3.15). Similarly, we have (3.20b). ∎

The lower and upper bound on σmin(Gˇ1)\sigma_{\min}(\check{G}_{1}) and σmax(Gˇ2)\sigma_{\max}(\check{G}_{2}), respectively, in (3.18), although always true, do not provide useful information, unless also σmin(G1)>σmax(G2)\sigma_{\min}(G_{1})>\sigma_{\max}(G_{2}), in which case it can be used to establish a sufficient condition to ensure σmin(Gˇ1)>σmax(Gˇ2)\sigma_{\min}(\check{G}_{1})>\sigma_{\max}(\check{G}_{2}).

Corollary 3.1.

Given G,G~m×nG,\,\widetilde{G}\in\mathbb{C}^{m\times n}, let GG be decomposed as in (1.1), and partition UHG~VU^{\operatorname{H}}\widetilde{G}V according to (1.4). Let δ\delta, δ¯\underaccent{\bar}{\delta}, and ε\varepsilon be defined as in (3.6), and suppose σmin(G1)>σmax(G2)\sigma_{\min}(G_{1})>\sigma_{\max}(G_{2}). If (3.14) with c=1c=1 is satisfied, then the top rr singular values of G~\widetilde{G} are exactly the rr singular values of Gˇ1\check{G}_{1}, and

σmin(G1)E112σmin(Gˇ1)σmin(G1)+E112+2ε2δ¯+δ¯2+4ε2.\sigma_{\min}(G_{1})-\|E_{11}\|_{2}\leq\sigma_{\min}(\check{G}_{1})\leq\sigma_{\min}(G_{1})+\|E_{11}\|_{2}+\frac{2\varepsilon^{2}}{\underaccent{\bar}{\delta}+\sqrt{\underaccent{\bar}{\delta}^{2}+4\varepsilon^{2}}}. (3.24)
Proof.

We have all conclusions of Theorem 3.1. By (3.18), we have

σmin(Gˇ1)σmax(Gˇ2)δ¯4ε(E21,E12H)δ¯=δ¯[14ε(E21,E12H)δ¯2]>0,\sigma_{\min}(\check{G}_{1})-\sigma_{\max}(\check{G}_{2})\geq\underaccent{\bar}{\delta}-4\frac{\varepsilon\,\|(E_{21},E_{12}^{\operatorname{H}})\|}{\underaccent{\bar}{\delta}}=\underaccent{\bar}{\delta}\left[1-4\frac{\varepsilon\,\|(E_{21},E_{12}^{\operatorname{H}})\|}{\underaccent{\bar}{\delta}^{2}}\right]>0,

which says that the top rr singular values of G~\widetilde{G} are exactly the rr singular values of Gˇ1\check{G}_{1}. As a result, applying Lemma 2.4 to (1.4), we have

σmin(Gˇ1)=σr(UHG~V)σmin(G1+E11)σmin(G1)E112,\sigma_{\min}(\check{G}_{1})=\sigma_{r}\big{(}U^{\operatorname{H}}\widetilde{G}V\big{)}\geq\sigma_{\min}(G_{1}+E_{11})\geq\sigma_{\min}(G_{1})-\|E_{11}\|_{2}, (3.25)

where the last inequality in (3.25) is a consequence of Mirsky’s theorem [16, p.204]. This proves the first inequality in (3.24). It can be seen that

σmin(G1+E11)σmax(G2+E22)σmin(G1)E112[σmax(G2)+E222]=δ¯,\sigma_{\min}(G_{1}+E_{11})-\sigma_{\max}(G_{2}+E_{22})\geq\sigma_{\min}(G_{1})-\|E_{11}\|_{2}-\big{[}\sigma_{\max}(G_{2})+\|E_{22}\|_{2}\big{]}=\underaccent{\bar}{\delta},

and hence by [7, Theorem 3] combining with (3.25), we get

σmin(Gˇ1)σmin(G1+E11)2ε2δ¯+δ¯2+4ε2,\sigma_{\min}(\check{G}_{1})-\sigma_{\min}(G_{1}+E_{11})\leq\frac{2\varepsilon^{2}}{\underaccent{\bar}{\delta}+\sqrt{\underaccent{\bar}{\delta}^{2}+4\varepsilon^{2}}}, (3.26)

yielding the second inequality in (3.24). ∎

The sharper lower bound on σmin(Gˇ1)\sigma_{\min}(\check{G}_{1}) in (3.24) than the one by the first inequality in (3.18) is only made possible by first establishing that the top rr singular values of G~\widetilde{G} are exactly the rr singular values of Gˇ1\check{G}_{1}, which relies on the first inequality in (3.18) for a proof, however. More than (3.26) which is just for the smallest singular value only, [7, Theorem 3] yields

|σi(Gˇ1)σi(G1+E11)|2ε2δ¯+δ¯2+4ε2for 1ir|\sigma_{i}(\check{G}_{1})-\sigma_{i}(G_{1}+E_{11})|\leq\frac{2\varepsilon^{2}}{\underaccent{\bar}{\delta}+\sqrt{\underaccent{\bar}{\delta}^{2}+4\varepsilon^{2}}}\quad\mbox{for $1\leq i\leq r$}

under the conditions of Corollary 3.1.

4 Discussions

Theorem 3.1 serves the same purpose as Stewart’s [15, Theorem 6.4], but the former provides a variety of choices of norms for the Banach space \mathscr{B} and, as a consequence, a number of results dependent of circumstances to use.

Let us begin by realizing Theorem 3.1 and Corollary 3.1 with unitarily invariant norm ui\|\cdot\|_{\operatorname{ui}} being set to the Frobenius norm, in order to compare with Theorem 1.1 [15, Theorem 6.4]. We have two endowed norms from (3.2) to choose:

(Γ,Ω)=ΓF2+ΩF2or(Γ,Ω)=max{ΓF,ΩF}.\|(\Gamma,\Omega)\|=\sqrt{\|\Gamma\|_{\operatorname{F}}^{2}+\|\Omega\|_{\operatorname{F}}^{2}}\quad\mbox{or}\quad\|(\Gamma,\Omega)\|=\max\{\|\Gamma\|_{\operatorname{F}},\|\Omega\|_{\operatorname{F}}\}.

We have the following corollary to Theorem 3.1, where we state the conditions and the bounds on (Γ,Ω)\|(\Gamma,\Omega)\| but omit the others that can be deduced from the bounds on (Γ,Ω)\|(\Gamma,\Omega)\|.

Corollary 4.1.

Given G,G~m×nG,\,\widetilde{G}\in\mathbb{C}^{m\times n}, let GG be decomposed as in (1.1), and partition UHG~VU^{\operatorname{H}}\widetilde{G}V according to (1.4). Let δ\delta, δ¯\underaccent{\bar}{\delta}, and ε\varepsilon be defined as in (3.6).

  1. (i)

    If

    δ¯>0andκ2:=εδ¯2E12F2+E21F2<14,\underaccent{\bar}{\delta}>0\quad\mbox{and}\quad\kappa_{2}:=\frac{\varepsilon}{\underaccent{\bar}{\delta}^{2}}\,\sqrt{\|E_{12}\|_{\operatorname{F}}^{2}+\|E_{21}\|_{\operatorname{F}}^{2}}<\frac{1}{4}, (4.1)

    then (3.5) has a solution (Γ,Ω)(\Gamma,\Omega) that satisfies

    ΓF2+ΩF2\displaystyle\sqrt{\|\Gamma\|_{\operatorname{F}}^{2}+\|\Omega\|_{\operatorname{F}}^{2}} 1+14κ212κ2+14κ2E12F2+E21F2δ¯\displaystyle\leq\frac{1+\sqrt{1-4\kappa_{2}}}{1-2\kappa_{2}+\sqrt{1-4\kappa_{2}}}\frac{\sqrt{\|E_{12}\|_{\operatorname{F}}^{2}+\|E_{21}\|_{\operatorname{F}}^{2}}}{\underaccent{\bar}{\delta}}
    <2E12F2+E21F2δ¯.\displaystyle<2\,\frac{\sqrt{\|E_{12}\|_{\operatorname{F}}^{2}+\|E_{21}\|_{\operatorname{F}}^{2}}}{\underaccent{\bar}{\delta}}. (4.2)
  2. (ii)

    If σmin(G1)>σmax(G2)\sigma_{\min}(G_{1})>\sigma_{\max}(G_{2}) and if

    δ¯>0andκ2:=εδ¯2max{E12F,E21F}<14,\underaccent{\bar}{\delta}>0\quad\mbox{and}\quad\kappa_{2}:=\frac{\varepsilon}{\underaccent{\bar}{\delta}^{2}}\,\max\{\|E_{12}\|_{\operatorname{F}},\|E_{21}\|_{\operatorname{F}}\}<\frac{1}{4}, (4.3)

    then (3.5) has a solution (Γ,Ω)(\Gamma,\Omega) that satisfies

    max{ΓF,ΩF}\displaystyle\max\{\|\Gamma\|_{\operatorname{F}},\|\Omega\|_{\operatorname{F}}\} 1+14κ212κ2+14κ2max{E12F,E21F}δ¯\displaystyle\leq\frac{1+\sqrt{1-4\kappa_{2}}}{1-2\kappa_{2}+\sqrt{1-4\kappa_{2}}}\frac{\max\{\|E_{12}\|_{\operatorname{F}},\|E_{21}\|_{\operatorname{F}}\}}{\underaccent{\bar}{\delta}}
    <2max{E12F,E21F}δ¯.\displaystyle<2\,\frac{\max\{\|E_{12}\|_{\operatorname{F}},\|E_{21}\|_{\operatorname{F}}\}}{\underaccent{\bar}{\delta}}. (4.4)

In comparing Corollary 4.1 with Theorem 1.1 [15, Theorem 6.4], we note Corollary 4.1(i) provides better results than Theorem 1.1 in their conditions: (4.1) vs. (1.8), and bounds: (4.2) vs. (1.9), because

ε^2=E12F2+E21F2\displaystyle\hat{\varepsilon}^{2}=\|E_{12}\|_{\operatorname{F}}^{2}+\|E_{21}\|_{\operatorname{F}}^{2} max{E122,E212}E12F2+E21F2\displaystyle\geq\max\{\|E_{12}\|_{2},\|E_{21}\|_{2}\}\sqrt{\|E_{12}\|_{\operatorname{F}}^{2}+\|E_{21}\|_{\operatorname{F}}^{2}}
=εE12F2+E21F2.\displaystyle=\varepsilon\sqrt{\|E_{12}\|_{\operatorname{F}}^{2}+\|E_{21}\|_{\operatorname{F}}^{2}}.

Such an improvement of Corollary 4.1(i) over Theorem 1.1 could be considered marginal because it can be easily recovered by just refining Stewart’s relevant estimates [15]. However, the improvement by Corollary 4.1(ii) over Theorem 1.1, under the condition σmin(G1)>σmax(G2)\sigma_{\min}(G_{1})>\sigma_{\max}(G_{2}), unlikely can be achieved by simply refining Stewart’s arguments there.

Our original motivation to revisit this classical result of Stewart’s is the need for a version of Theorem 1.1 in the spectral norm while we are working on an error analysis for model order reduction by balanced truncation [18]. Let us look at what Theorem 3.1 leads to for the spectral norm, for which the two endowed norms from (3.2) collapse to the same one

(Γ,Ω)=max{Γ2,Ω2}.\|(\Gamma,\Omega)\|=\max\{\|\Gamma\|_{2},\|\Omega\|_{2}\}.

We have the following corollary to Theorem 3.1, which yields far sharper results than those such as (1.11) that would otherwise have to be derived from Theorem 1.1.

Corollary 4.2.

Given G,G~m×nG,\,\widetilde{G}\in\mathbb{C}^{m\times n}, let GG be decomposed as in (1.1), and partition UHG~VU^{\operatorname{H}}\widetilde{G}V according to (1.4). Let δ\delta, δ¯\underaccent{\bar}{\delta}, and ε\varepsilon be defined as in (3.6).

  1. (i)

    If

    δ¯>0andκ2:=(π2)2ε2δ¯2<14,\underaccent{\bar}{\delta}>0\quad\mbox{and}\quad\kappa_{2}:=\left(\frac{\pi}{2}\right)^{2}\frac{\varepsilon^{2}}{\underaccent{\bar}{\delta}^{2}}<\frac{1}{4}, (4.5)

    then (3.5) has a solution (Γ,Ω)(\Gamma,\Omega) that satisfies

    max{Γ2,Ω2}1+14κ212κ2+14κ2π2εδ¯<πεδ¯.\max\{\|\Gamma\|_{2},\|\Omega\|_{2}\}\leq\frac{1+\sqrt{1-4\kappa_{2}}}{1-2\kappa_{2}+\sqrt{1-4\kappa_{2}}}\frac{\pi}{2}\frac{\varepsilon}{\underaccent{\bar}{\delta}}<{\pi}\,\frac{\varepsilon}{\underaccent{\bar}{\delta}}. (4.6)
  2. (ii)

    If σmin(G1)>σmax(G2)\sigma_{\min}(G_{1})>\sigma_{\max}(G_{2}) and if

    δ¯>0andκ2:=ε2δ¯2<14,\underaccent{\bar}{\delta}>0\quad\mbox{and}\quad\kappa_{2}:=\frac{\varepsilon^{2}}{\underaccent{\bar}{\delta}^{2}}<\frac{1}{4}, (4.7)

    then (3.5) has a solution (Γ,Ω)(\Gamma,\Omega) that satisfies

    max{Γ2,Ω2}1+14κ212κ2+14κ2εδ¯<2εδ¯.\max\{\|\Gamma\|_{2},\|\Omega\|_{2}\}\leq\frac{1+\sqrt{1-4\kappa_{2}}}{1-2\kappa_{2}+\sqrt{1-4\kappa_{2}}}\frac{\varepsilon}{\underaccent{\bar}{\delta}}<2\,\frac{\varepsilon}{\underaccent{\bar}{\delta}}. (4.8)

During the proof of Lemma 2.3, we commented that inequality (2.12b) had been really implied by Wedin [17, section 3], and inequality (2.11) may also be essentially implied in [17] but with his δ\delta defined as

δ:=min{minμsv(A),νsv(B)|μν|,σmin(A)}\delta:=\min\Big{\{}\min_{\mu\in\operatorname{sv}(A),\,\nu\in\operatorname{sv}(B)}\,|\mu-\nu|,\,\sigma_{\min}(A)\Big{\}} (4.9)

which is the same as the one in (2.11) for the case sts\neq t because then 0svext(B)0\in\operatorname{sv}_{\operatorname{ext}}(B), but can be different if s=ts=t for which case svext(B)=sv(B)\operatorname{sv}_{\operatorname{ext}}(B)=\operatorname{sv}(B) that may or may not contain 0, however. Both (2.12b) and (2.11) are used to develop the generalized sinθ\sin\theta theorems for SVD there (see, also, [11, p.21-7]). In the same spirit, we can use (2.12a) and (2.14) to establish more generalized sinθ\sin\theta theorems for SVD. In fact, we have the following theorem.

Theorem 4.1.

Let Gm×nG\in\mathbb{C}^{m\times n} be decomposed as in (1.1), and let G~m×n\widetilde{G}\in\mathbb{C}^{m\times n} admit a decomposition in the same form as in (1.1), except with tildes on all symbols. Let

R=GV~1U~1G~1,S=GHU~1V~1G~1H.R=G\widetilde{V}_{1}-\widetilde{U}_{1}\widetilde{G}_{1},\quad S=G^{\operatorname{H}}\widetilde{U}_{1}-\widetilde{V}_{1}\widetilde{G}_{1}^{\operatorname{H}}.

If

δ:=minμsv(G~1),νsvext(G2)|μν|>0,\delta:=\min_{\mu\in\operatorname{sv}(\widetilde{G}_{1}),\,\nu\in\operatorname{sv}_{\operatorname{ext}}(G_{2})}\,|\mu-\nu|>0,

then

[sinΘ((U1),(U~1))00sinΘ((V1),(V~1))]uic1δ[R00S]ui,\left\|\begin{bmatrix}\sin\Theta({\cal R}(U_{1}),{\cal R}(\widetilde{U}_{1}))&0\\ 0&\sin\Theta({\cal R}(V_{1}),{\cal R}(\widetilde{V}_{1}))\end{bmatrix}\right\|_{\operatorname{ui}}\leq c\,\frac{1}{\delta}\left\|\begin{bmatrix}R&0\\ 0&S\end{bmatrix}\right\|_{\operatorname{ui}},

where c=π/2c={\pi}/2 in general, but c=1c=1 if also σmin(G~1)>σmax(G2)\sigma_{\min}(\widetilde{G}_{1})>\sigma_{\max}(G_{2}) or for the Frobenius norm. Here Θ((U1),(U~1))\Theta({\cal R}(U_{1}),{\cal R}(\widetilde{U}_{1})) is the diagonal matrix of the canonical angles between the subspaces (U1){\cal R}(U_{1}) and (U~1){\cal R}(\widetilde{U}_{1}) [11, p.21-2], [16].

The case for ui=F\|\cdot\|_{\operatorname{ui}}=\|\cdot\|_{\operatorname{F}} is not new and Stewart and Sun [16, p.260] credited it to Wedin [17] with a slightly different δ\delta (similar to (4.9) we commented moments ago). However, Wedin [17] did not explicitly mention it for the case. Evidently, Wedin could easily had it because of the machinery he had already built in the paper.

Proof of Theorem 4.1.

It can be seen that

U2HR=G2V2HV~1U2HU~1G~1,V2HS=G2HU2HU~1V2HV~1G~1H.U_{2}^{\operatorname{H}}R=G_{2}V_{2}^{\operatorname{H}}\widetilde{V}_{1}-U_{2}^{\operatorname{H}}\widetilde{U}_{1}\widetilde{G}_{1},\quad V_{2}^{\operatorname{H}}S=G_{2}^{\operatorname{H}}U_{2}^{\operatorname{H}}\widetilde{U}_{1}-V_{2}^{\operatorname{H}}\widetilde{V}_{1}\widetilde{G}_{1}^{\operatorname{H}}.

Or, equivalently,

(U2HU~1)G~1G2(V2HV~1)=U2HR,(V2HV~1)G~1HG2H(U2HU~1)=V2HS,(U_{2}^{\operatorname{H}}\widetilde{U}_{1})\widetilde{G}_{1}-G_{2}(V_{2}^{\operatorname{H}}\widetilde{V}_{1})=-U_{2}^{\operatorname{H}}R,\quad(V_{2}^{\operatorname{H}}\widetilde{V}_{1})\widetilde{G}_{1}^{\operatorname{H}}-G_{2}^{\operatorname{H}}(U_{2}^{\operatorname{H}}\widetilde{U}_{1})=-V_{2}^{\operatorname{H}}S,

which takes the form of the coupled Sylvester equations (2.6) with X=U2HU~1X=U_{2}^{\operatorname{H}}\widetilde{U}_{1} and Y=V2HV~1Y=V_{2}^{\operatorname{H}}\widetilde{V}_{1}. Noticing that the singular values of U2HU~1U_{2}^{\operatorname{H}}\widetilde{U}_{1} and those of V2HV~1V_{2}^{\operatorname{H}}\widetilde{V}_{1} are the same as the sines of the canonical angles between (U1){\cal R}(U_{1}) and (U~1){\cal R}(\widetilde{U}_{1}) and between (V1){\cal R}(V_{1}) and (V~1){\cal R}(\widetilde{V}_{1}), respectively, and hence [16, 8, 9]

[U2HU~100V2HV~1]ui=[sinΘ((U1),(U~1))00sinΘ((V1),(V~1))]ui,\left\|\begin{bmatrix}U_{2}^{\operatorname{H}}\widetilde{U}_{1}&0\\ 0&V_{2}^{\operatorname{H}}\widetilde{V}_{1}\end{bmatrix}\right\|_{\operatorname{ui}}=\left\|\begin{bmatrix}\sin\Theta({\cal R}(U_{1}),{\cal R}(\widetilde{U}_{1}))&0\\ 0&\sin\Theta({\cal R}(V_{1}),{\cal R}(\widetilde{V}_{1}))\end{bmatrix}\right\|_{\operatorname{ui}},

and noticing

[U2HR00V2HS]ui=[U2H00V2H][R00S]ui1δ[R00S]ui,\left\|\begin{bmatrix}-U_{2}^{\operatorname{H}}R&0\\ 0&-V_{2}^{\operatorname{H}}S\end{bmatrix}\right\|_{\operatorname{ui}}=\left\|-\begin{bmatrix}U_{2}^{\operatorname{H}}&0\\ 0&V_{2}^{\operatorname{H}}\end{bmatrix}\begin{bmatrix}R&0\\ 0&S\end{bmatrix}\right\|_{\operatorname{ui}}\leq\frac{1}{\delta}\left\|\begin{bmatrix}R&0\\ 0&S\end{bmatrix}\right\|_{\operatorname{ui}},

the conclusion in the theorem is a simple consequence of Lemma 2.3. ∎

5 Concluding Remarks

Stewart’s original theorem [15, Theorem 6.4] for the singular subspaces associated with the SVD of a matrix subject to some perturbations uses both the Frobenius and spectral norms. Although it is still versatile to apply in the situations such as only the spectral norm is suitable [18], it may lead to weaker results: stronger conditions and yet less sharp bounds, through equivalency bounds between the spectral norm and the Frobenius norm, as we argued in Section 1. Consequently it pays to establish variants of Stewart’s original theorem from scratch. Our main contribution in Theorem 3.1 provides perturbation bounds that encompass a variety of circumstances and that are dictated by the conditions as specified by Lemma 3.1, which are further explained in Remark 3.1.

Lemma 2.3 collects bounds, some old and some new, on the solution to the set of coupled Sylvester equations (2.6) under different circumstances. They form a part of the foundation based on which our main results in Theorem 3.1 are derived. Furthermore Lemma 2.3 can be put into good use to establish more generalized sinθ\sin\theta theorems for SVD as exemplified in Theorem 4.1, beyond existing ones due to Wedin [17].

References

  • [1] A. C. Antoulas. Approximation of Large-Scale Dynamical Systems. Advances in Design and Control. SIAM, Philadelphia, PA, 2005.
  • [2] Peter Benner, Mario Ohlberger, Albert Cohen, and Karen Willcox, editors. Model Reduction and Approximation: Theory and Algorithms. Computational Science & Engineering. SIAM, Philadelphia, 2017.
  • [3] R. Bhatia. Matrix Analysis. Graduate Texts in Mathematics, vol. 169. Springer, New York, 1996.
  • [4] R. Bhatia, C. Davis, and A. McIntosh. Perturbation of spectral subspaces and solution of linear operator equations. Linear Algebra Appl., 52-53:45–67, 1983.
  • [5] R. Bhatia and P. Rosenthal. How and why to solve the operator equation AXXB=YAX-XB=Y. Bull. London Math. Soc., 29:1–21, 1997.
  • [6] C. Davis and W. Kahan. The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal., 7:1–46, 1970.
  • [7] C.-K. Li and R.-C. Li. A note on eigenvalues of perturbed Hermitian matrices. Linear Algebra Appl., 395:183–190, 2005.
  • [8] R.-C. Li. Bounds on perturbations of generalized singular values and of associated subspaces. SIAM J. Matrix Anal. Appl., 14:195–234, 1993.
  • [9] R.-C. Li. On perturbations of matrix pencils with real spectra. Math. Comp., 62:231–265, 1994.
  • [10] R.-C. Li. A bound on the solution to a structured Sylvester equation with an application to relative perturbation theory. SIAM J. Matrix Anal. Appl., 21:440–445, 1999.
  • [11] R.-C. Li. Matrix perturbation theory. In L. Hogben, R. Brualdi, and G. W. Stewart, editors, Handbook of Linear Algebra, page Chapter 21. CRC Press, Boca Raton, FL, 2nd edition, 2014.
  • [12] X. Liang and R.-C. Li. Extensions of Wielandt’s min-max principles for positive semi-definite pencils. Lin. Multilin. Alg., 62(8):1032–1048, 2014.
  • [13] B. N. Parlett. The Symmetric Eigenvalue Problem. SIAM, Philadelphia, 1998.
  • [14] G. W. Stewart. Error bounds for approximate invariant subspaces of closed linear operators. SIAM J. Numer. Anal., 8:796–808, 1971.
  • [15] G. W. Stewart. Error and perturbation bounds for subspaces associated with certain eigenvalue problems. SIAM Rev., 15:727–764, 1973.
  • [16] G. W. Stewart and J.-G. Sun. Matrix Perturbation Theory. Academic Press, Boston, 1990.
  • [17] P.-Å. Wedin. Perturbation bounds in connection with singular value decomposition. BIT, 12:99–111, 1972.
  • [18] L.-H. Zhang and R.-C. Li. Quality of approximate balanced truncation. URL https://arxiv.org/pdf/2406.05665, 2024.