

Subspace Recycling-based Regularization Methods

Ronny Ramlau Industrial Mathematics Institute, Kepler University Linz, and Johann Radon Institute for Computational and Applied Mathematics (RICAM), Linz, Austria.([email protected]).    Kirk M. Soodhalter School of Mathematics, Trinity College Dublin, The University of Dublin, College Green, Dublin 2, Ireland.([email protected]).    Victoria Hutterer Industrial Mathematics Institute, Kepler University Linz, Austria. ([email protected]).
Abstract

Subspace recycling techniques have been used quite successfully for the acceleration of iterative methods for solving large-scale linear systems. These methods often work by augmenting a solution subspace generated iteratively by a known algorithm with a fixed subspace of vectors which are “useful” for solving the problem. Often, this has the effect of inducing a projected version of the original linear system to which the known iterative method is then applied, and this projection can act as a deflation preconditioner, accelerating convergence. Most often, these methods have been applied for the solution of well-posed problems. However, they have also begun to be considered for the solution of ill-posed problems.

In this paper, we consider subspace augmentation-type iterative schemes applied to linear ill-posed problems in a continuous Hilbert space setting, based on a recently developed framework describing these methods. We show that under suitable assumptions, a recycling method satisfies the formal definition of a regularization, as long as the underlying scheme is itself a regularization. We then develop an augmented subspace version of the gradient descent method and demonstrate its effectiveness, both on an academic Gaussian blur model and on problems arising from the adaptive optics community for the resolution of large sky images by ground-based extremely large telescopes.

keywords:
ill-posed problems, augmented methods, recycling, iterative methods, Landweber, gradient descent
{AMS}

68Q25, 68R10, 68U05

1 Introduction

This paper concerns the use of subspace recycling iterative methods in the context of solving linear ill-posed problems. There have already been a number of augmented or subspace recycling methods proposed specifically to treat ill-posed problems (e.g., [4, 3, 9, 5]), but these methods have yet to be formally analyzed as regularization schemes. More recently, an augmented scheme to accelerate the regularization of a problem in adaptive optics, producing a large improvement in performance, was proposed [32]. However, the augmentation was done after the regularized problem was set up rather than actually combining these techniques. It would thus be helpful to place all these methods into a common framework to streamline the development of new methods and to enable them to be analyzed using a common language and framework. In particular, it enables us in this paper to prove under what conditions augmented approaches described by the framework are regularizations.

We take advantage of a newly-proposed general framework [36] which can be used to describe the vast majority of these methods. This framework allows us to perform a general analysis of augmentation and recycling schemes as regularizations. From this, we develop sufficient conditions for a recycled iterative method to be a regularization, namely that a subspace recycling scheme is a regularization if it is built from an underlying regularization method. As a proof-of-concept, we then leverage the framework and our analysis to propose augmented steepest descent and Landweber methods, which we then apply to some test problems.

The linear inverse problem we consider is of the form

(1) Tx=y

where we approximately reconstruct x from noisy data y^{\delta}, with y^{\delta}=y+n and a perturbation n of the data satisfying \|n\|\leq\delta. Here, T:{\mathcal{X}}\to{\mathcal{Y}} is a continuous mapping between separable Hilbert spaces {\mathcal{X}},{\mathcal{Y}}. In many relevant applications, the operator T is ill-posed, i.e., (1) violates Hadamard's conditions: a solution might not exist, it might be non-unique, or it might not depend continuously on the data. In this case, regularization techniques have to be employed for the stable solution of (1); see [12, 13] and Section 2 for a summary of these methods.

The quality of the reconstruction for a specific problem depends heavily on available additional information on the solution, which is often directly included in the chosen reconstruction approach. For example, source conditions on the solution are used to estimate the reconstruction error [13, 27]. Alternatively, information on the smoothness of the solution can be incorporated into Tikhonov regularization by choosing a penalty term that enforces a smooth reconstruction. For iterative regularization methods, the iteration can be started with a rough guess for the solution, generally leading to shorter reconstruction times and better results. Those approaches are of particular interest for inverse problems that have to be solved repeatedly in time and where the previous reconstruction can be considered a good guess for the solution of the current problem. This is the case, e.g., for Adaptive Optics applications in Astronomy, where the image quality loss due to turbulence in the atmosphere is corrected by a suitably shaped deformable mirror. The shape of the deformable mirror has to be adjusted within milliseconds. The optimal shape of the mirror is derived by solving the atmospheric tomography problem repeatedly (about every 1-2 ms). Although the atmosphere changes rapidly, it can be regarded as frozen on a sub-millisecond timescale, meaning that the previous reconstruction of the atmospheric tomography problem can be regarded as a good approximation to the new solution [40, 31].

In this paper we consider the case that we know a finite-dimensional subspace {\mathcal{U}}\subset{\mathcal{X}} that contains a significant part of the solution x^{\dagger} which we seek. As we will see, this part can be reconstructed easily and stably. What then remains is to reconstruct the part that belongs to the orthogonal complement {\mathcal{U}}^{\perp} of {\mathcal{U}}; restricting the iteration to this complement possibly induces an acceleration in the convergence rate of the resulting iterative method, cf. Section 5.

For finite dimensional problems, this approach is known as a subspace recycling method and is currently a hot topic in numerical linear algebra. The goal of this paper is therefore to extend the theory to ill-posed problems in an infinite dimensional setting. Specifically, we will show that the concept of subspace recycling can be used in connection with most of the standard regularization methods.

The paper is organized as follows: In Section 2 we give a short introduction to inverse problems and regularization methods, while Section 3 reviews the state of research on subspace recycling methods. In Sections 4 and 5 we show that standard linear regularization methods can be combined with subspace recycling and prove that the resulting method is still a regularization. Finally, Section 6 considers recycled gradient descent methods, and Section 7 presents some numerical results.

2 Regularization of linear ill-posed problems

We begin with a brief review of some basic regularization theory, which we mainly base on [12, 13, 27]. As mentioned above, a solution to the problem Tx=y might not exist; if it exists, it might not be unique and/or might not depend continuously on the data. The first two problems can be circumvented by using the generalized inverse T^{\dagger}, whereas stability is enforced by using regularization methods. To be more specific, an operator

(2) R_{\alpha}:Y\to X

is a regularization for a linear operator T:X\to Y iff for any right-hand side y^{\delta} with \|y-y^{\delta}\|\leq\delta there exists a parameter choice rule

(3) \alpha:\mathbb{R}\times Y\to\mathbb{R}^{+}

such that

(4) \lim_{\delta\to 0}R_{\alpha(\delta)}y^{\delta}=T^{\dagger}y=:x^{\dagger}

holds. Here, the parameter is chosen such that \lim_{\delta\to 0}\alpha(\delta)=0. Over the last decades, many regularization methods have been investigated, most prominently Tikhonov regularization and iterative methods like Landweber iteration [13, 24] and the conjugate gradient method. For Tikhonov regularization, the approximation x_{\alpha}^{\delta} to the generalized solution x^{\dagger} is computed as the minimizer of the Tikhonov functional

(5) J_{\alpha}(x)=\|y^{\delta}-Tx\|_{Y}^{2}+\alpha\Omega(x),

where \Omega(x) is a suitable penalty term, e.g., \Omega(x)=\|x\|^{2}. For more general penalties we refer, e.g., to [12, 35]. For Tikhonov regularization to yield a regularization method, the regularization parameter has to be chosen appropriately; one suitable choice is the discrepancy principle, where \alpha is chosen such that

(6) \|y^{\delta}-Tx_{\alpha}^{\delta}\|=\tau\delta,\qquad\tau>1.

Iterative methods are popular as their computational realization is often straightforward. The Landweber iteration reads

(7) x_{k+1}^{\delta}=x_{k}^{\delta}+\beta T^{\ast}(y^{\delta}-Tx_{k}^{\delta}),\qquad 0<\beta<2/\|T\|^{2}.

For iterative methods the iteration index plays the role of the regularization parameter. According to the discrepancy principle, the iteration is terminated at the step k at which, for the first time, \|y^{\delta}-Tx_{k}^{\delta}\|\leq\tau\delta with \tau>1 fixed. The Landweber iteration with the discrepancy principle is a regularization method [13]. It is well known that the convergence of regularization methods for \delta\to 0 can be arbitrarily slow. Convergence rates can be obtained by using source conditions. Popular are Hölder-type source conditions

(8) x^{\dagger}=(T^{\ast}T)^{\nu}w

for some \nu\geq 0 and w\in X. A regularization method is called order optimal iff the reconstruction error can be estimated as

(9) \|x^{\dagger}-R_{\alpha(\delta)}y^{\delta}\|=\mathcal{O}(\delta^{\nu/(\nu+1)})

provided a source condition (8) is fulfilled. Both Tikhonov and Landweber equipped with suitable source conditions are order optimal [13].
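To make (7) and the discrepancy principle concrete, the following is a minimal numerical sketch (not part of the original presentation) for a discretized problem in Python/NumPy; the matrix T, the noisy data y_delta, and the noise level delta are assumed to be given, and tau, beta play the roles of the constants in (7) and the stopping rule.

```python
import numpy as np

def landweber(T, y_delta, delta, tau=1.1, beta=None, max_iter=10000):
    """Landweber iteration (7), stopped by the discrepancy principle."""
    if beta is None:
        # choose 0 < beta < 2/||T||^2; here a safe fraction of the bound
        beta = 1.0 / np.linalg.norm(T, 2) ** 2
    x = np.zeros(T.shape[1])
    for k in range(max_iter):
        residual = y_delta - T @ x
        if np.linalg.norm(residual) <= tau * delta:  # discrepancy principle
            break
        x = x + beta * T.T @ residual                # gradient step
    return x, k
```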

In this paper we will show that subspace recycling-based methods also form a regularization.

3 Augmented iterative methods

Augmented iterative methods, as we discuss them here, have been proposed in the literature beginning in the mid- to late-nineties; see [36] for a survey of this history. They can be considered as a specialized version of the class of iterative methods called residual constraint or residual projection methods. We discuss these methods briefly to put them into context, but for more details, the reader should see the survey paper [36], and references therein.

3.1 Iterations determined by residual constraint

Many iterative methods which are commonly used to treat both well- and ill-posed linear problems can be formulated according to a correction/constraint setup. Our discussion here in Section 3 focuses on the well-posed case for simplicity. Let \left(\cdot,\cdot\right)_{{\mathcal{X}}} and \left(\cdot,\cdot\right)_{{\mathcal{Y}}}, respectively, be the inner products on {\mathcal{X}} and {\mathcal{Y}}, and let T^{\ast} denote the adjoint of T, i.e., \left(Tx,y\right)_{{\mathcal{Y}}}=\left(x,T^{\ast}y\right)_{{\mathcal{X}}}.

Consider the case wherein we are solving Eq. 1 with no initial approximation, i.e., x_{0}=0. The strategies we outline here take the abstract form of approximating the solution x^{\dagger} by \hat{x}\in{\mathcal{S}} such that the residual satisfies y-T\hat{x}\perp\widetilde{{\mathcal{S}}}, where {\mathcal{S}}\subset{\mathcal{X}} and \widetilde{{\mathcal{S}}}\subset{\mathcal{Y}} are such that \dim{\mathcal{S}}=\dim\widetilde{{\mathcal{S}}}<\infty. These are called the correction and constraint spaces, respectively. A common example of a method for which \dim{\mathcal{S}}=1 is steepest descent (i.e., gradient descent with step size chosen to minimize the T^{\ast}T-norm of the error); see, e.g., [34, Section 5.3.1].

Proposition 3.1.

The (j+1)st step of steepest descent can be formulated as

(10) \mbox{select } t_{j+1}\in{\rm span}\left\{T^{\ast}r_{j}\right\} \mbox{ such that } y-T\left(x_{j}+t_{j+1}\right)\perp{\rm span}\left\{TT^{\ast}r_{j}\right\},

where r_{j}=y-Tx_{j}; i.e., {\mathcal{S}}={\rm span}\left\{T^{\ast}r_{j}\right\} and \widetilde{{\mathcal{S}}}={\rm span}\left\{TT^{\ast}r_{j}\right\}. (One could also, with some work, use this result to obtain a residual constraint formulation for a more general Landweber method, but it is not clear that this brings an analytic advantage.)

Proof 3.2.

Suppose we have constructed the approximation x_{j} at the previous iteration. Recall that gradient descent methods compute x_{j+1}=x_{j}+\alpha T^{\ast}r_{j}, where r_{j}=y-Tx_{j}. The choice of \alpha (called the step size) determines the method's behavior. Steepest descent computes

\alpha_{j+1}=\underset{\alpha\in\mathbb{R}}{\text{argmin}}\left\|y-T(x_{j}+\alpha T^{\ast}r_{j})\right\|_{{\mathcal{Y}}}.

Differentiating the right-hand side, setting it equal to zero, and solving for \alpha_{j+1} yields the minimizer \alpha_{j+1}=\dfrac{\left(r_{j},TT^{\ast}r_{j}\right)_{{\mathcal{Y}}}}{\left(TT^{\ast}r_{j},TT^{\ast}r_{j}\right)_{{\mathcal{Y}}}}. It remains to show that Eq. 10 produces the same choice of \alpha_{j+1}. Indeed, this constraint condition is equivalent to solving the equation

\left(r_{j}-\alpha_{j+1}TT^{\ast}r_{j},TT^{\ast}r_{j}\right)_{{\mathcal{Y}}}=0,

which, when solved for \alpha_{j+1}, produces the same T^{\ast}T-norm-minimizing choice of \alpha_{j+1}.
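As a small illustration of Proposition 3.1 in a discretized setting, the following sketch carries out one such correction/constraint step; T, y, and the current iterate x are assumed to be NumPy arrays.

```python
import numpy as np

def steepest_descent_step(T, y, x):
    """One steepest descent step, cf. Proposition 3.1 and Proof 3.2."""
    r = y - T @ x                           # residual r_j
    d = T.T @ r                             # correction direction T^* r_j
    Td = T @ d                              # T T^* r_j
    alpha = np.dot(r, Td) / np.dot(Td, Td)  # minimizing step length
    return x + alpha * d                    # x_{j+1}
```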

Steepest descent is an example of a method for which \dim{\mathcal{S}}=1. There are many methods for which the size of the correction and constraint spaces grows at each iteration. It is a well-known result that one can characterize the method of conjugate gradients in this way, which we present without proof; see [33, 34] for proofs.

Proposition 3.3.

The (j+1)st iterate produced by the method of conjugate gradients applied to the normal equations associated to Eq. 1 can be formulated as

(11) \mbox{select } x_{j+1}\in{\mathcal{K}}_{j+1}\left(T^{\ast}T,T^{\ast}y\right) \mbox{ such that } y-Tx_{j+1}\perp T{\mathcal{K}}_{j+1}\left(T^{\ast}T,T^{\ast}y\right),

where {\mathcal{K}}_{j+1}\left(T^{\ast}T,T^{\ast}y\right)={\rm span}\left\{T^{\ast}y,\left(T^{\ast}T\right)T^{\ast}y,\left(T^{\ast}T\right)^{2}T^{\ast}y,\ldots,\left(T^{\ast}T\right)^{j}T^{\ast}y\right\}.

3.2 Augmentation via constraint over a pair of spaces

Many flavors of augmented iterative methods have been proposed to treat both well- and ill-posed problems. These methods treat the situation in which we wish to approximate the solution using an iterative method, but we also have a fixed subspace {\mathcal{U}} which we want to use to build the approximation. Using the correction/constraint formulation in Section 3.1, we can describe many of these methods as selecting approximations from a correction space {\mathcal{S}}_{j}={\mathcal{U}}+{\mathcal{V}}_{j}\subset{\mathcal{X}} with a residual constraint space \widetilde{{\mathcal{S}}}_{j}=\widetilde{{\mathcal{U}}}+\widetilde{{\mathcal{V}}}_{j}\subset{\mathcal{Y}}, where {\mathcal{U}} and \widetilde{{\mathcal{U}}} are spaces of fixed dimension k and {\mathcal{V}}_{j} and \widetilde{{\mathcal{V}}}_{j} are generated iteratively, with dimension j at step j. It has been shown that this correction/constraint over the sum of subspaces reduces mathematically to applying the underlying iterative method to a projected problem, obtaining part of the approximation from {\mathcal{V}}_{j} and then obtaining the corrections from {\mathcal{U}} afterward. This will be explained in more detail in Section 5, but we give a brief outline here.

A projector is uniquely defined by its range and its null space. A standard result from linear algebra shows that this is equivalent to defining the ranges of the projector and its complement. In this setting, consider the projectors P\in{\mathcal{L}}\left({\mathcal{X}}\right) and Q\in{\mathcal{L}}\left({\mathcal{Y}}\right) having the relationship

(12) TP=QT.

Let {\rm Range}\left(P\right)={\mathcal{U}} and {\rm Null}\left(P\right)=\left(T^{\ast}\,\widetilde{{\mathcal{U}}}\right)^{\perp}, and {\rm Range}\left(Q\right)=T\,{\mathcal{U}} and {\rm Null}\left(Q\right)=\widetilde{{\mathcal{U}}}^{\perp}. One can easily show that defining the null spaces is equivalent to defining {\rm Range}\left(I_{{\mathcal{X}}}-P\right)=\left(T^{\ast}\,\widetilde{{\mathcal{U}}}\right)^{\perp} and {\rm Range}\left(I_{{\mathcal{Y}}}-Q\right)=\widetilde{{\mathcal{U}}}^{\perp}. For example, let v\in{\mathcal{X}} be chosen arbitrarily. It can be decomposed as v=v_{\mathcal{R}}+v_{\mathcal{N}} with v_{\mathcal{R}}\in{\rm Range}\left(P\right) and v_{\mathcal{N}}\in{\rm Null}\left(P\right). As expected, Pv=Pv_{\mathcal{R}}+Pv_{\mathcal{N}}=v_{\mathcal{R}}. We can then observe that \left(I_{{\mathcal{X}}}-P\right)v=v_{\mathcal{R}}+v_{\mathcal{N}}-Pv_{\mathcal{R}}-Pv_{\mathcal{N}}=v_{\mathcal{N}}. As v\in{\mathcal{X}} was chosen arbitrarily, we see that {\rm Range}\left(I_{{\mathcal{X}}}-P\right)={\rm Null}\left(P\right). The same can be shown for Q.

Approximating the solution of Eq. 1 by x_{j}\in{\mathcal{S}}_{j} subject to the constraint r_{j}\perp\widetilde{{\mathcal{S}}}_{j} has been shown to be equivalent to approximating the solution t of the projected problem

(13) \left(I-Q\right)Tt=\left(I-Q\right)y

by t_{j}\in{\mathcal{V}}_{j} with residual constraint space \widetilde{{\mathcal{V}}}_{j} and setting the approximation x_{j}\in{\mathcal{U}}+{\mathcal{V}}_{j} to be

(14) x_{j}=Px^{\dagger}+\left(I-P\right)t_{j};

see [7] and [15, 16, 18].

For well-posed problems, the space {\mathcal{U}} often contains vectors from previous iteration cycles applied to the current or previously-solved related problem; see, e.g., [2, 23, 29, 38]. These vectors are often (but not always) approximate eigenvectors (of a square operator) or left singular vectors, guided by Krylov subspace method convergence theory. For ill-posed problems, the strategy is generally to include some known features of the image/signal being reconstructed (see, e.g., [3, 4, 9]), but some current work by [5] still seeks to augment a Lanczos bidiagonalization-based solver for ill-posed, tall, skinny least-squares problems using approximate left singular vectors to accelerate semi-convergence and reduce the influence of smaller singular values to achieve a regularizing effect. See Section 5.1 for further discussion.

There is an alternative approach (originally proposed in the context of domain decomposition in, e.g., [14]) which begins with the projectors P and Q (rather than treating them as a consequence of the correction/constraint formulation) and produces the same algorithms but provides more flexibility to use methods like Landweber which do not have a clean residual constraint condition. This will be derived in detail in Section 5, but it is built on the simple idea that the projector P can be used to get the decomposition x^{\dagger}=Px^{\dagger}+\left(I-P\right)x^{\dagger}. We will show that Px^{\dagger} can be cheaply computed and we can use an iterative method to approximate \left(I-P\right)x^{\dagger}.

4 Linear Projection Methods

We consider here the case that \widetilde{{\mathcal{U}}}=T\,{\mathcal{U}}. For a finite-dimensional subspace {\mathcal{U}}\subset{\mathcal{X}} with \dim{\mathcal{U}}=k>0, we denote the sibling subspace {\mathcal{C}}=T\,{\mathcal{U}}\subset{\mathcal{Y}} and assume it also has dimension k. We can represent these subspaces in terms of bases, which we denote with a matrix-like notation. Let

(15) \mathbf{U}=\begin{bmatrix}u_{1}&u_{2}&\cdots&u_{k}\end{bmatrix}\in{\mathcal{X}}^{k}\mbox{\ \ \ and\ \ \ }\mathbf{C}=\begin{bmatrix}c_{1}&c_{2}&\cdots&c_{k}\end{bmatrix}\in{\mathcal{Y}}^{k}

be k-tuples of elements from {\mathcal{U}} and {\mathcal{C}}, respectively, which are bases for these spaces.

Remark 4.1.

To get a sense of these objects, observe that if {\mathcal{X}}={\mathcal{Y}}=\mathbb{R}^{n}, then we would have \mathbf{U},\mathbf{C}\in\mathbb{R}^{n\times k}; so this is just the generalization of the notion of a tall, skinny matrix.

We can consider \mathbf{U} and \mathbf{C} to be linear mappings induced by a basis expansion operation, with \mathbf{U}\in{\mathcal{L}}\left(\mathbb{R}^{k},{\mathcal{X}}\right) and \mathbf{C}\in{\mathcal{L}}\left(\mathbb{R}^{k},{\mathcal{Y}}\right), defined by

(16) \mathbf{U}\mathbf{z}=\sum_{i=1}^{k}z_{i}u_{i}\mbox{\ \ \ and\ \ \ }\mathbf{C}\mathbf{z}=\sum_{i=1}^{k}z_{i}c_{i}\mbox{\ \ \ for\ \ \ }\mathbf{z}=\begin{bmatrix}z_{1}&z_{2}&\cdots&z_{k}\end{bmatrix}^{T}\in\mathbb{R}^{k}.

Let

(17) \left(x,\mathbf{U}\right)_{{\mathcal{X}}}=\begin{bmatrix}\left(x,u_{1}\right)_{{\mathcal{X}}}\\ \vdots\\ \left(x,u_{k}\right)_{{\mathcal{X}}}\end{bmatrix}\in\mathbb{R}^{k},\mbox{\ \ \ and\ \ \ }\left(y,\mathbf{C}\right)_{{\mathcal{Y}}}=\begin{bmatrix}\left(y,c_{1}\right)_{{\mathcal{Y}}}\\ \vdots\\ \left(y,c_{k}\right)_{{\mathcal{Y}}}\end{bmatrix}\in\mathbb{R}^{k}.

Furthermore, for \mathbf{M},\mathbf{L}\in{\mathcal{X}}^{k}, we define the bilinear mapping

(18) \left(\cdot,\cdot\right)_{{\mathcal{X}}}:{\mathcal{X}}^{k}\times{\mathcal{X}}^{k}\rightarrow\mathbb{R}^{k\times k}\mbox{\ \ \ with\ \ \ }\left(\mathbf{M},\mathbf{L}\right)\mapsto\big{(}\left(m_{i},\ell_{j}\right)_{{\mathcal{X}}}\big{)}_{i,j=1}^{k}\in\mathbb{R}^{k\times k}.

We similarly define the corresponding bilinear mapping for two elements \mathbf{F},\mathbf{G}\in{\mathcal{Y}}^{k},

\left(\cdot,\cdot\right)_{{\mathcal{Y}}}:{\mathcal{Y}}^{k}\times{\mathcal{Y}}^{k}\rightarrow\mathbb{R}^{k\times k}\mbox{\ \ \ with\ \ \ }\left(\mathbf{F},\mathbf{G}\right)\mapsto\big{(}\left(f_{i},g_{j}\right)_{{\mathcal{Y}}}\big{)}_{i,j=1}^{k}\in\mathbb{R}^{k\times k}.
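In a discretized setting, where {\mathcal{X}}=\mathbb{R}^{n} and {\mathcal{Y}}=\mathbb{R}^{m} as in Remark 4.1, the tuples \mathbf{U} and \mathbf{C} become tall matrices and the mappings (16)-(18) are ordinary matrix-vector and matrix-matrix products. The following minimal NumPy sketch (sizes and data are illustrative assumptions) spells this out and checks the bound of the upcoming Lemma 4.2.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 200, 150, 5              # illustrative dimensions (assumption)
T = rng.standard_normal((m, n))    # discretized forward operator
U = rng.standard_normal((n, k))    # basis of the recycle space, cf. (15)
C = T @ U                          # C = T U

z = rng.standard_normal(k)
Uz = U @ z                         # basis expansion (16)

y = rng.standard_normal(m)
y_C = C.T @ y                      # the vector (y, C)_Y from (17)
G_UU = U.T @ U                     # the k x k matrix (U, U)_X from (18)
G_CC = C.T @ C                     # (C, C)_Y, a Gramian used later for Q

# sanity check of the estimate (19): ||Uz||^2 <= ||(U,U)||_F ||z||^2
assert np.dot(Uz, Uz) <= np.linalg.norm(G_UU, 'fro') * np.dot(z, z) + 1e-10
```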

In particular, we have the following relation:

Lemma 4.2.

For the operator \mathbf{U} (and similarly for \mathbf{C}) it holds that

(19) \|\mathbf{U}z\|_{{\mathcal{X}}}^{2}\leq\|(\mathbf{U},\mathbf{U})\|_{F}\|z\|^{2}_{\mathbb{R}^{k}}.

Proof 4.3.

With z=(z_{1},\dots,z_{k})\in\mathbb{R}^{k} and \mathbf{U}z=\sum_{i=1}^{k}z_{i}u_{i}\in{\mathcal{X}}, it follows that

\|\mathbf{U}z\|^{2}_{{\mathcal{X}}} =\sum_{i=1}^{k}\sum_{j=1}^{k}z_{i}z_{j}\langle u_{i},u_{j}\rangle_{{\mathcal{X}}}
=\langle z,(\mathbf{U},\mathbf{U})z\rangle_{\mathbb{R}^{k}}\leq\|(\mathbf{U},\mathbf{U})z\|_{\mathbb{R}^{k}}\|z\|_{\mathbb{R}^{k}}
\leq\|(\mathbf{U},\mathbf{U})\|_{F}\|z\|^{2}_{\mathbb{R}^{k}}.

From Lemma 4.2 it follows in particular that

(20) \|\mathbf{U}\|_{\mathbb{R}^{k}\to{\mathcal{X}}}\leq\sqrt{\|(\mathbf{U},\mathbf{U})\|_{F}}.

Further on, we will also need the following lemma.

Lemma 4.4.

For y\in{\mathcal{Y}} and \mathbf{C} as above, it holds that

(21) \|(y,\mathbf{C})\|_{\mathbb{R}^{k}}\leq\|T\|\|y\|_{\mathcal{Y}}\sqrt{\sum_{l=1}^{k}\|u_{l}\|^{2}}.

Proof 4.5.

Using (17), we obtain

(22) \|(y,\mathbf{C})\|_{\mathbb{R}^{k}}^{2} =\sum_{i=1}^{k}|(y,c_{i})_{\mathcal{Y}}|^{2}\leq\sum_{i=1}^{k}\|y\|^{2}_{\mathcal{Y}}\|c_{i}\|_{\mathcal{Y}}^{2}
(23) =\|y\|^{2}_{\mathcal{Y}}\sum_{i=1}^{k}\|Tu_{i}\|^{2}\leq\|T\|^{2}_{{\mathcal{X}}\to{\mathcal{Y}}}\|y\|^{2}_{{\mathcal{Y}}}\sum_{i=1}^{k}\|u_{i}\|_{{\mathcal{X}}}^{2}.

5 Projected Regularization Method

Let {\mathcal{U}}\subset{\mathcal{X}} be a k-dimensional subspace, and {\mathcal{C}}=T\,{\mathcal{U}}\subset{\mathcal{Y}} be its image under the action of T. Here it is important that we assume that the restricted operator T|_{{\mathcal{U}}} is invertible, so as to guarantee that \dim{\mathcal{U}}=\dim{\mathcal{C}}=k holds. This allows us to express clearly the structure of projectors onto these spaces.

Proposition 5.1.

Assume that the vectors u_{i} in \mathbf{U} are linearly independent, and that

(24) T_{|_{{\mathcal{U}}}}:{\mathcal{U}}\to Y

is invertible on its range \mathcal{R}(T_{|_{{\mathcal{U}}}}). Then \left(\mathbf{C},\mathbf{C}\right)\in\mathbb{R}^{k\times k} is continuously invertible.

Proof 5.2.

According to (15), \mathbf{C}=\begin{bmatrix}c_{1}&c_{2}&\cdots&c_{k}\end{bmatrix}\in{\mathcal{Y}}^{k} with c_{i}=Tu_{i}. As T is invertible on \mathcal{R}(T_{|_{{\mathcal{U}}}}), the vectors in \mathbf{C} are linearly independent if the vectors in \mathbf{U} are linearly independent. Moreover,

(25) \left(\mathbf{C},\mathbf{C}\right)=\left(\left(c_{i},c_{j}\right)\right)_{i,j=1,\dots,k}

is the Gramian matrix of the linearly independent system contained in \mathbf{C} and therefore an invertible matrix. Additionally, as \left(\mathbf{C},\mathbf{C}\right):\mathbb{R}^{k}\to\mathbb{R}^{k} maps between finite-dimensional spaces, its inverse is also bounded.

In general, the operator T is ill-posed and might have a non-trivial null space, i.e., T is not continuously invertible. However, on a finite-dimensional subspace, invertibility might be achieved easily.

Now let Q be the {\mathcal{Y}}-orthogonal projector onto {\mathcal{C}} and P be the \left(\cdot,T^{\ast}T\,\cdot\right)_{{\mathcal{X}}}-orthogonal projector onto {\mathcal{U}}. The orthogonal projector Q induces a direct sum splitting {\mathcal{Y}}={\mathcal{C}}\oplus{\mathcal{C}}^{\perp_{\left(\cdot,\cdot\right)_{{\mathcal{Y}}}}}. Consider w\in{\mathcal{Y}}. We represent the action of Q as Qw=\sum_{i=1}^{k}\left(w,c_{i}\right)_{{\mathcal{Y}}}c_{i}. Similarly, P induces a direct sum splitting of {\mathcal{X}} with respect to \left(\cdot,T^{\ast}T\,\cdot\right)_{{\mathcal{X}}}, namely {\mathcal{X}}={\mathcal{U}}\oplus{\mathcal{U}}^{\perp_{\left(\cdot,T^{\ast}T\,\cdot\right)_{{\mathcal{X}}}}}, where {\mathcal{U}}^{\perp_{\left(\cdot,T^{\ast}T\,\cdot\right)_{{\mathcal{X}}}}} denotes the orthogonal complement of {\mathcal{U}} with respect to the inner product \left(\cdot,T^{\ast}T\,\cdot\right)_{{\mathcal{X}}}. We represent the action of P as Pv=\sum_{i=1}^{k}\left(v,T^{\ast}T\,u_{i}\right)_{{\mathcal{X}}}u_{i}. We can also construct these projectors explicitly.

Proposition 5.3.

Let \mathbf{U}\in{\mathcal{X}}^{k} be a basis of {\mathcal{U}} and \mathbf{C}=T\,\mathbf{U}\in{\mathcal{Y}}^{k} be a basis of {\mathcal{C}}. Then we have

P\cdot=\mathbf{U}\left(\mathbf{U},T^{\ast}T\,\mathbf{U}\right)_{{\mathcal{X}}}^{-1}\left(\cdot,T^{\ast}T\,\mathbf{U}\right)_{{\mathcal{X}}}\mbox{\ \ \ and\ \ \ }Q\cdot=\mathbf{C}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(\cdot,\mathbf{C}\right)_{{\mathcal{Y}}}.

Proof 5.4.

This is a standard result, particularly in the setting where {\mathcal{X}}=\mathbb{R}^{m} and {\mathcal{Y}}=\mathbb{R}^{n}, but it is helpful to see why this is true in this more general setting, as well. From Proposition 5.1, we know that \left(\mathbf{C},\mathbf{C}\right)=\left(T\mathbf{U},T\mathbf{U}\right)=\left(\mathbf{U},T^{\ast}T\mathbf{U}\right) is continuously invertible. Let \left\{\hat{u}_{k+1},\hat{u}_{k+2},\ldots,\hat{u}_{\ell},\ldots\right\} be a basis for {\mathcal{U}}^{\perp_{\left(\cdot,T^{\ast}T\,\cdot\right)_{{\mathcal{X}}}}}. Then for any v\in{\mathcal{X}}, we can write

v=\sum_{i=1}^{k}\left(v,T^{\ast}T\,u_{i}\right)_{{\mathcal{X}}}u_{i}+\sum_{i=k+1}^{\infty}\left(v,T^{\ast}T\,\hat{u}_{i}\right)_{{\mathcal{X}}}\hat{u}_{i}.

Since by definition \left(\hat{u}_{i},T^{\ast}T\,\mathbf{U}\right)_{{\mathcal{X}}}=0 for all i>k, we can write

(26) \mathbf{U}\left(\mathbf{U},T^{\ast}T\,\mathbf{U}\right)_{{\mathcal{X}}}^{-1}\left(v,T^{\ast}T\,\mathbf{U}\right)_{{\mathcal{X}}}=\sum_{i=1}^{k}\left(v,T^{\ast}T\,u_{i}\right)_{{\mathcal{X}}}\mathbf{U}\left(\mathbf{U},T^{\ast}T\,\mathbf{U}\right)_{{\mathcal{X}}}^{-1}\left(u_{i},T^{\ast}T\,\mathbf{U}\right)_{{\mathcal{X}}}.

Observe that, according to the definitions in Eq. 17 and Eq. 18, the vector \left(u_{i},T^{\ast}T\,\mathbf{U}\right)_{{\mathcal{X}}} is the ith column of the matrix \left(\mathbf{U},T^{\ast}T\,\mathbf{U}\right)_{{\mathcal{X}}}. Thus,

\left(\mathbf{U},T^{\ast}T\,\mathbf{U}\right)_{{\mathcal{X}}}^{-1}\left(u_{i},T^{\ast}T\,\mathbf{U}\right)_{{\mathcal{X}}}=\mathbf{e}_{i}\in\mathbb{R}^{k},

the ith Cartesian basis vector. Thus, we have

(27) \mathbf{U}\left(\mathbf{U},T^{\ast}T\,\mathbf{U}\right)_{{\mathcal{X}}}^{-1}\left(u_{i},T^{\ast}T\,\mathbf{U}\right)_{{\mathcal{X}}}=\mathbf{U}\mathbf{e}_{i}=u_{i}

and thus according to Eq. 26

(28) \mathbf{U}\left(\mathbf{U},T^{\ast}T\,\mathbf{U}\right)_{{\mathcal{X}}}^{-1}\left(v,T^{\ast}T\,\mathbf{U}\right)_{{\mathcal{X}}}=\sum_{i=1}^{k}\left(v,T^{\ast}T\,u_{i}\right)_{{\mathcal{X}}}u_{i}.

As v was chosen arbitrarily, this is true for any element of {\mathcal{X}}, meaning the action of \mathbf{U}\left(\mathbf{U},T^{\ast}T\,\mathbf{U}\right)_{{\mathcal{X}}}^{-1}\left(\cdot,T^{\ast}T\,\mathbf{U}\right)_{{\mathcal{X}}} is the action of P. This completes the proof for P. The same line of reasoning yields the proof for Q.
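The following NumPy sketch (dimensions and operators are assumptions made for illustration) builds the actions of P and Q from Proposition 5.3 in a discretized setting and can be used to verify the relation TP=QT from (12) numerically.

```python
import numpy as np

def build_projectors(T, U):
    """Actions of the projectors from Proposition 5.3 for discretized T, U.

    P v = U (U, T^*T U)^{-1} (v, T^*T U)_X,   Q w = C (C, C)^{-1} (w, C)_Y,
    with C = T U; note (U, T^*T U)_X = (C, C)_Y.
    """
    C = T @ U
    G_inv = np.linalg.inv(C.T @ C)             # small k x k Gramian inverse

    def P(v):
        return U @ (G_inv @ (C.T @ (T @ v)))   # (v, T^*T U)_X = (T v, C)_Y

    def Q(w):
        return C @ (G_inv @ (C.T @ w))

    return P, Q

# quick check of (12): T P v == Q T v for a random v
rng = np.random.default_rng(1)
T = rng.standard_normal((150, 200)); U = rng.standard_normal((200, 5))
P, Q = build_projectors(T, U)
v = rng.standard_normal(200)
assert np.allclose(T @ P(v), Q(T @ v))
```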

We assume that, with exact data, Eq. 1 is consistent. Using the projector P, we decompose the exact solution,

x^{\dagger}=Px^{\dagger}+\left(I_{{\mathcal{X}}}-P\right)x^{\dagger}.

With exact data, Px^{\dagger} would be exactly computable, since we can rewrite

(29) x_{p}^{\dagger}:=Px^{\dagger}=\mathbf{U}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(Tx^{\dagger},\mathbf{C}\right)_{{\mathcal{Y}}}=\mathbf{U}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(y,\mathbf{C}\right)_{{\mathcal{Y}}}.

However, we do not have exact data. We have perturbed data y^{\delta} such that \left\|y-y^{\delta}\right\|_{{\mathcal{Y}}}\leq\delta. Rewriting

(30) Px^{\dagger}=\mathbf{U}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(y-y^{\delta},\mathbf{C}\right)_{{\mathcal{Y}}}+\mathbf{U}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(y^{\delta},\mathbf{C}\right)_{{\mathcal{Y}}},

we can define

(31) x_{p}^{\delta}:=\mathbf{U}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(y^{\delta},\mathbf{C}\right)_{{\mathcal{Y}}},

which can be computed inexpensively to high precision and may be used as an approximation x_{p}^{\delta}\approx Px^{\dagger}. We obtain

Proposition 5.5.

The approximation error between x_{p}^{\dagger} and x_{p}^{\delta} is given by

(32) x_{p}^{\dagger}-x_{p}^{\delta}=\mathbf{U}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(y-y^{\delta},\mathbf{C}\right)_{{\mathcal{Y}}},

which can be estimated as

(33) \left\|x_{p}^{\dagger}-x_{p}^{\delta}\right\|\leq\left(\|T\|\sqrt{\|(\mathbf{U},\mathbf{U})\|_{F}\sum_{l=1}^{k}\|u_{l}\|^{2}}\left\|\left(\mathbf{C},\mathbf{C}\right)^{-1}\right\|_{F}\right)\delta.

Proof 5.6.

Using (29)-(31) gives (32), which can be estimated as

(34) \left\|x_{p}^{\dagger}-x_{p}^{\delta}\right\|_{{\mathcal{X}}} \leq\left\|\mathbf{U}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(y-y^{\delta},\mathbf{C}\right)_{{\mathcal{Y}}}\right\|
(35) \leq\left\|\mathbf{U}\right\|_{\mathbb{R}^{k}\to{\mathcal{X}}}\left\|\left(\mathbf{C},\mathbf{C}\right)^{-1}\right\|_{\mathbb{R}^{k}\to\mathbb{R}^{k}}\left\|\left(y-y^{\delta},\mathbf{C}\right)_{{\mathcal{Y}}}\right\|_{\mathbb{R}^{k}}
(36) \stackrel{(20),(21)}{\leq}\sqrt{\|(\mathbf{U},\mathbf{U})\|_{F}}\left\|\left(\mathbf{C},\mathbf{C}\right)^{-1}\right\|_{\mathbb{R}^{k}\to\mathbb{R}^{k}}\|T\|\|y-y^{\delta}\|_{\mathcal{Y}}\sqrt{\sum_{l=1}^{k}\|u_{l}\|^{2}}
(37) \leq\left(\|T\|\sqrt{\|(\mathbf{U},\mathbf{U})\|_{F}\sum_{l=1}^{k}\|u_{l}\|^{2}}\left\|\left(\mathbf{C},\mathbf{C}\right)^{-1}\right\|_{F}\right)\delta.

Remark 5.7.

Often with such recycling methods, one takes either {\mathcal{U}} or {\mathcal{C}} to be represented with an orthonormal basis, for the purposes of convenience with regard to analysis and implementation. We assume here that {\mathcal{C}} is represented by an orthonormal basis. If we scale \mathbf{U} such that \mathbf{C}=T\,\mathbf{U} is an orthonormal system spanning {\mathcal{C}}, we can write Eq. 33 more compactly as

\left\|Px^{\dagger}-x_{p}^{\delta}\right\|_{{\mathcal{X}}}\leq\left\|\mathbf{U}\right\|_{{\mathcal{X}}}\delta={\mathcal{O}}\left(\delta\right).

Furthermore, P and Q can be expressed more compactly,

P\cdot=\mathbf{U}\left(\cdot,T^{\ast}T\,\mathbf{U}\right)_{{\mathcal{X}}}\mbox{\ \ \ and\ \ \ }Q\cdot=\mathbf{C}\left(\cdot,\mathbf{C}\right)_{{\mathcal{Y}}},

due to the fact that \left(\mathbf{U},T^{\ast}T\,\mathbf{U}\right)_{{\mathcal{X}}}=\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}=\mathbf{I} when \mathbf{C} is an orthonormal system.
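One practical way to obtain such a scaling in the discretized setting is a thin QR factorization of \mathbf{C}; the following sketch (an assumption-laden illustration, not part of the original text) rescales U so that C=TU has orthonormal columns.

```python
import numpy as np

def orthonormalize_pair(T, U):
    """Rescale U so that C = T U has orthonormal columns (cf. Remark 5.7).

    Assumes T restricted to span(U) is injective, so that R is invertible.
    Returns (U_new, C_new) with C_new = T @ U_new and C_new.T @ C_new = I.
    """
    C = T @ U
    Q_c, R = np.linalg.qr(C)               # thin QR: C = Q_c R
    U_new = np.linalg.solve(R.T, U.T).T    # U_new = U R^{-1}
    return U_new, Q_c
```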

It remains to find a suitable approximation for \left(I_{{\mathcal{X}}}-P\right)x^{\dagger}. We have the following result:

Proposition 5.8.

Under the assumption

(38) QT(I-P)x^{\dagger}=0,

it holds that

(39) y-y_{p}=(I-Q)T(I-P)x^{\dagger},

where

(40) y_{p}:=Tx_{p}^{\dagger}.

Proof 5.9.

We have

(41) y=Tx^{\dagger}=QTPx^{\dagger}+QT(I-P)x^{\dagger}+(I-Q)TPx^{\dagger}+(I-Q)T(I-P)x^{\dagger}.

As x_{p}^{\dagger}=Px^{\dagger}\in{\mathcal{U}}, we have y_{p}=Tx_{p}^{\dagger}\in T{\mathcal{U}} and therefore

(42) y_{p}=Tx_{p}^{\dagger}=QTPx^{\dagger}.

Furthermore, as TPx^{\dagger}\in T{\mathcal{U}} and (I-Q) projects onto the complement of T{\mathcal{U}}, we obtain

(43) (I-Q)TPx^{\dagger}=0.

Inserting (42), (43) into (41) yields

(44) y-y_{p}=QT(I-P)x^{\dagger}+(I-Q)T(I-P)x^{\dagger},

and Assumption (38) yields (39).

Assumption (38) can be reformulated as

(45) QT(I-P)x^{\dagger}=0 \Longleftrightarrow \langle z,QT(I-P)x^{\dagger}\rangle=0\quad\forall z\in{\mathcal{Y}}
\Longleftrightarrow \langle T^{\ast}Qz,(I-P)x^{\dagger}\rangle=0\quad\forall z\in{\mathcal{Y}}.

We arrive at

Proposition 5.10.

If

(46) T^{\ast}{\mathcal{C}}\perp(I-P)x^{\dagger},

then (38) holds. In particular, (46) is fulfilled if

(47) T^{\ast}T{\mathcal{U}}\subset{\mathcal{U}}

holds.

Proof 5.11.

As Q{\mathcal{Y}}={\mathcal{C}}, eq. (45) transforms to (46). Since (I-P)x^{\dagger}\in{\mathcal{U}}^{\perp} always holds, (46) is fulfilled if we additionally assume that {\mathcal{U}} is chosen such that T^{\ast}{\mathcal{C}}\perp{\mathcal{U}}^{\perp}; with {\mathcal{C}}=T{\mathcal{U}}, this transforms to T^{\ast}T{\mathcal{U}}\perp{\mathcal{U}}^{\perp} or, equivalently, (47).

Remark 5.12.

Obviously, (47) is much stronger than (46). However, there are easy examples where (46) always holds:

  1.

    If {\mathcal{U}}={\rm span}\{u_{i}:i=1,\dots,N\mbox{ and }T^{\ast}Tu_{i}=\sigma_{i}u_{i}\}, then clearly T^{\ast}T{\mathcal{U}}\subset{\mathcal{U}}.

  2.

    If T is a unitary operator, i.e., \langle Tx_{1},Tx_{2}\rangle=\langle x_{1},x_{2}\rangle holds for all x_{1},x_{2}, then we have for u\in{\mathcal{U}} and u^{\perp}\in{\mathcal{U}}^{\perp}

    \langle T^{\ast}Tu,u^{\perp}\rangle=\langle Tu,Tu^{\perp}\rangle=\langle u,u^{\perp}\rangle=0,

    i.e., (47).

Furthermore, the way we construct our projectors ensures that the assumption holds.

Lemma 5.13.

The assumption Eq. 38 holds for any pair of projectors \left(P,Q\right) satisfying Eq. 12.

Proof.  This follows from the fact that the product of complementary projectors is the zero operator, since from Eq. 12, we have

QT(I-P)x^{\dagger}=Q(I-Q)Tx^{\dagger}=0.

    

With the above results, we can now compute (I-P)x^{\dagger} as the solution of equation (39). However, as always in inverse problems, we might not have the exact right-hand side y-y_{p} but some noisy version y^{\delta}-y_{p}^{\delta}, where we define y_{p}^{\delta}:=Tx_{p}^{\delta}. The data error can be estimated as follows:

Proposition 5.14.

The error in the data y-y_{p} of (39) can be estimated as

(48) \|(y-y_{p})-(y^{\delta}-y_{p}^{\delta})\|\leq\kappa_{{\mathcal{U}}}\cdot\delta

with

(49) \kappa_{{\mathcal{U}}}:=1+\|T\|^{2}\sqrt{\|(\mathbf{U},\mathbf{U})\|_{F}\sum_{l=1}^{k}\|u_{l}\|^{2}}\left\|\left(\mathbf{C},\mathbf{C}\right)^{-1}\right\|_{F}.

Proof 5.15.

The proof is straightforward:

(50) \|(y-y_{p})-(y^{\delta}-y_{p}^{\delta})\|\leq\|y-y^{\delta}\|+\|y_{p}-y_{p}^{\delta}\|\leq\delta+\|T\|\|x_{p}^{\dagger}-x_{p}^{\delta}\|,

and with (33), estimate (48) follows.

We are now ready to formulate our solution approach:

1. Given:
\bullet data y^{\delta} fulfilling \|y-y^{\delta}\|\leq\delta,
\bullet a finite-dimensional subspace {\mathcal{U}}\subset{\mathcal{X}},
\bullet a regularization method R_{\alpha}(y,T) for the equation y=Tx with parameter choice rule \alpha=\alpha(\delta).
2. Compute x_{p}^{\delta} according to (31) and set y_{p}^{\delta}=Tx_{p}^{\delta}.
3. Set B:=(I-Q)T.
4. Compute \hat{x}_{p}^{\delta}:=R_{\alpha}(y^{\delta}-y_{p}^{\delta},B) with \alpha=\alpha(\kappa_{{\mathcal{U}}}\delta).
5. Set x_{{\mathcal{U}},\alpha}^{\delta}:=x_{p}^{\delta}+\hat{x}_{p}^{\delta}.
Algorithm 1 Augmented Regularization
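A minimal NumPy sketch of Algorithm 1 follows; the inner routine regularize is a placeholder for any regularization method R_{\alpha} applied to the projected operator B (for instance, a Landweber iteration written in terms of the callables B and Bt), and its name and interface are assumptions made for illustration.

```python
import numpy as np

def augmented_regularization(T, U, y_delta, delta, regularize, kappa_U=1.0):
    """Algorithm 1: augment a regularization method with the subspace spanned by U.

    regularize(B, Bt, rhs, noise) is assumed to return an approximate solution
    of B x = rhs for noise level `noise`, given callables B and Bt = B^*.
    kappa_U is the constant from (49); the inner noise level is kappa_U * delta.
    """
    C = T @ U
    G_inv = np.linalg.inv(C.T @ C)                 # (C, C)^{-1}
    x_p = U @ (G_inv @ (C.T @ y_delta))            # step 2: x_p^delta, eq. (31)
    y_p = T @ x_p                                  # = Q y_delta, cf. (54)

    def Q(w):
        return C @ (G_inv @ (C.T @ w))

    def B(v):                                      # step 3: B = (I - Q) T
        w = T @ v
        return w - Q(w)

    def Bt(w):                                     # adjoint B^* = T^* (I - Q)
        return T.T @ (w - Q(w))

    x_hat = regularize(B, Bt, y_delta - y_p, kappa_U * delta)   # step 4
    return x_p + x_hat                                          # step 5
```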
Proposition 5.16.

If eq. (38) holds for the solution x^{\dagger} of the equation Tx=y and the chosen augmentation subspace {\mathcal{U}}, then the augmented regularization with data y^{\delta} fulfilling \|y-y^{\delta}\|\leq\delta, as described in Algorithm 1, forms a regularization method.

Proof 5.17.

As R_{\alpha}(y^{\delta}-y_{p}^{\delta},B) is a regularization for (39), decomposing x^{\dagger}=Px^{\dagger}+(I-P)x^{\dagger}=x_{p}^{\dagger}+\hat{x}_{p}^{\dagger} yields, as \delta\to 0,

\|x^{\dagger}-x_{{\mathcal{U}},\alpha}^{\delta}\| \leq \|x_{p}^{\dagger}-x_{p}^{\delta}\|+\|\hat{x}_{p}^{\dagger}-\hat{x}_{p}^{\delta}\|
\stackrel{(33)}{\leq} \mathcal{O}(\delta)+\|\hat{x}_{p}^{\dagger}-R_{\alpha(\kappa_{{\mathcal{U}}}\delta)}(y^{\delta}-y_{p}^{\delta},B)\|\stackrel{\delta\to 0}{\longrightarrow}0,

which concludes the proof.

Remark 5.18.

We can thus extend any existing regularization R_{\alpha} to its augmented method and still obtain a regularization method.

Example 1.

Let us illustrate this for the Landweber method, which has been defined in (7). If the iteration is stopped by the discrepancy principle, i.e., if the stopping index k_{\ast} is determined as the first index such that

(51) \|y^{\delta}-Tx_{k_{\ast}}^{\delta}\|\leq\tau\delta,\qquad\tau>1,

holds, then it is a regularization method; see, e.g., [8]. For given data y^{\delta}, the augmented Landweber method would read as

(52) x_{k_{\ast}}^{\delta}=x_{p}^{\delta}+\hat{x}_{k_{\ast}}^{\delta},

where x_{p}^{\delta} is defined in (31) and \hat{x}_{k_{\ast}}^{\delta} is the k_{\ast}th iterate of the Landweber iteration applied to the equation

(53) (I-Q)Tx=y^{\delta}-y_{p}^{\delta},\qquad y_{p}^{\delta}=Tx_{p}^{\delta}.

Using

(54) y_{p}^{\delta}=Tx_{p}^{\delta}=T\mathbf{U}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(y^{\delta},\mathbf{C}\right)_{{\mathcal{Y}}}=\mathbf{C}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(y^{\delta},\mathbf{C}\right)_{{\mathcal{Y}}}=Qy^{\delta},

the iteration for \hat{x}_{k}^{\delta} reads as

(55) \hat{x}_{k+1}^{\delta} = \hat{x}_{k}^{\delta}+T^{\ast}(I-Q)\left(y^{\delta}-y_{p}^{\delta}-(I-Q)T\hat{x}_{k}^{\delta}\right)
= \hat{x}_{k}^{\delta}+T^{\ast}(I-Q)\left(y^{\delta}-T\hat{x}_{k}^{\delta}\right).

Usually such an iteration is started with \hat{x}_{0}^{\delta}=0. Instead, we can incorporate x_{p}^{\delta} directly into the iteration by setting \hat{x}_{0}^{\delta}=x_{p}^{\delta}.
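The following is a minimal sketch of this augmented Landweber method, eqs. (52)-(55), for a discretized problem; the step size beta as in (7) and the discrepancy-type stopping rule for the projected problem are implementation choices rather than part of the text above.

```python
import numpy as np

def augmented_landweber(T, U, y_delta, delta, tau=1.1, beta=None, max_iter=10000):
    """Augmented Landweber iteration (52)-(55) with discrepancy stopping."""
    C = T @ U
    G_inv = np.linalg.inv(C.T @ C)
    Q = lambda w: C @ (G_inv @ (C.T @ w))        # projector onto C = T U

    x_p = U @ (G_inv @ (C.T @ y_delta))          # x_p^delta, eq. (31)
    if beta is None:
        beta = 1.0 / np.linalg.norm(T, 2) ** 2   # 0 < beta < 2/||T||^2

    x_hat = np.zeros(T.shape[1])                 # iterate for the projected problem
    for k in range(max_iter):
        r = y_delta - T @ x_hat
        r_proj = r - Q(r)                        # (I - Q)(y^delta - T x_hat)
        if np.linalg.norm(r_proj) <= tau * delta:
            break
        x_hat = x_hat + beta * T.T @ r_proj      # iteration (55)
    return x_p + x_hat                           # combined approximation (52)
```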

5.1 What to recycle

One question we have not addressed thus far is what the subspace {\mathcal{U}} should encode. This is a question that hinges very much on the application. A detailed review of recycling strategies is beyond the scope of this paper; we refer the reader to [36, Section 6] for a more detailed discussion. Some common choices for augmentation vectors are approximate solutions or solutions to related problems, approximate or (when available) exact eigenvectors/singular vectors, and vectors satisfying convergence model optimality properties.

The use of approximate eigenvectors as a recycling strategy was first suggested for recycling between cycles of GMRES applied to one linear system to mitigate the effects on convergence speed of restarting [28]. The algorithmic choices in that paper necessitated that the approximate eigenvectors be harmonic Ritz vectors. Ritz-type eigenvector approximations with respect to a square matrix \mathbf{A} are obtained by applying a Galerkin or Petrov-Galerkin approximation strategy to the eigenvalue problem of the form

\mbox{select}\qquad\mathbf{v}\in\mathcal{V}\qquad\mbox{such that}\qquad\mathbf{A}\mathbf{v}-\lambda\mathbf{v}\perp\mathcal{W},

for some \lambda\in\mathbb{R}. The spaces \mathcal{V} and \mathcal{W} are usually spaces which have been generated during the iteration, so that the solution of the Ritz problem reduces to a (generalized) eigenvalue problem. This strategy was extended and combined with GCRO-type optimal methods in a way that did not require augmentation by harmonic Ritz vectors, and this enabled recycling between multiple linear systems [29]. This has been further extended to the case of short-recurrence schemes not requiring restart; see, e.g., [5, 23, 38]. In that case, one stores and updates a running recycle window that is not used in the current iteration; rather, it is built to be used for subsequent systems.

It should be noted that in the context of discrete ill-posed problems, a poor or noisy choice of {\mathcal{U}} may induce a projected problem Eq. 13 which is even more ill-posed. However, it has been demonstrated that there are many good choices for {\mathcal{U}}; see, e.g., [3, 4, 9]. More recently, a recycled LSQR algorithm has been proposed where approximate eigenvectors are used as well as other data which become available in real-time [5]. As LSQR is a short-recurrence method, the authors employ a windowed recycling strategy, where the recycled subspace is updated every p iterations so that only a window of vectors generated by the short-recurrence iteration must be stored. They propose a few different recycling approaches appropriate for ill-posed problems: truncated singular vectors, solution- or sparsity-oriented compression, and reduced basis decomposition.

In the next section, we use the recycling method framework from [36] to develop augmented versions of steepest descent and Landweber methods. In Section 7, we combine this method with recycling strategies appropriate to each individual application, building on strategies from the literature but adjusted to fit the particular applications.

6 Augmented gradient descent methods

We now delve further into the case that we apply a gradient descent-type method (i.e., Landweber) to the projected problem, in order to produce a practical implementation of this algorithm. In particular, we first explore augmenting the steepest descent method, which can be understood as a Landweber-type method with a step-length that is dynamically chosen to minimize the residual norm. We have shown in Proposition 3.1 that this is a projection method and thus can be augmented in the above-discussed framework. Augmented methods such as this rely on certain identities to achieve algorithmic advantages.

Lemma 6.1.

The residual produced by the augmented iterative method with approximation \hat{x}_{k}^{\delta}=\mathbf{U}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(y^{\delta},\mathbf{C}\right)_{{\mathcal{Y}}}+\left(I_{{\mathcal{X}}}-P\right)\hat{t}_{k}^{\delta}, defined as in Eq. 14 where \hat{t}_{k}^{\delta} is the approximate solution of Eq. 13, coincides with the residual associated to \hat{t}_{k}^{\delta} for the projected problem Eq. 13, i.e.,

y^{\delta}-T\hat{x}_{k}^{\delta}=\left(I_{{\mathcal{Y}}}-Q\right)\left(y^{\delta}-T\hat{t}_{k}^{\delta}\right)\perp{\mathcal{C}}.

Proof.  This result has been shown in a number of papers for finite-dimensional, well-posed problems; see [7] and [6, 15, 16, 18]. We show it here for completeness by direct calculation, by first writing

y^{\delta}-T\hat{x}_{j}^{\delta} =y^{\delta}-T\left(\mathbf{U}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(y^{\delta},\mathbf{C}\right)_{{\mathcal{Y}}}+\left(I_{{\mathcal{X}}}-P\right)\hat{t}_{j}^{\delta}\right)
=y^{\delta}-\left(\mathbf{C}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(y^{\delta},\mathbf{C}\right)_{{\mathcal{Y}}}+T\left(I_{{\mathcal{X}}}-P\right)\hat{t}_{j}^{\delta}\right).

Then we observe that it follows from Proposition 5.3 and Eq. 12 that this is equivalent to

y^{\delta}-T\hat{x}_{j}^{\delta} =y^{\delta}-\left(Qy^{\delta}+\left(I_{{\mathcal{Y}}}-Q\right)T\hat{t}_{j}^{\delta}\right)
=\left(I_{{\mathcal{Y}}}-Q\right)y^{\delta}-\left(I_{{\mathcal{Y}}}-Q\right)T\hat{t}_{j}^{\delta}.

It follows directly that y^{\delta}-T\hat{x}_{j}^{\delta}\perp{\mathcal{C}}. Note that this proof holds in the presence or absence of noise and whether or not the problem being treated is consistent. In the case that the right-hand side is noise-free and the problem is consistent, we have from Eq. 30 that

(56) \mathbf{U}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(y,\mathbf{C}\right)_{{\mathcal{Y}}}=Px^{\dagger}=x_{p}^{\dagger},

meaning it is the best approximation of x^{\dagger} in {\mathcal{U}} with respect to the T^{\ast}T-norm. In the presence of noise, we previously characterized and bounded the error introduced into Eq. 56 in Proposition 5.5.

It directly follows that the construction of the gradient descent direction for Landweber applied to Eq. 13 can be simplified.

Corollary 6.2.

The gradient descent direction for Landweber applied to Eq. 13 admits the simplification,

T^{\ast}\left(I_{{\mathcal{Y}}}-Q\right)\left(y^{\delta}-T\hat{t}^{\delta}_{j}\right)=T^{\ast}\left(y^{\delta}-T\hat{x}^{\delta}_{j}\right);

i.e., the descent direction is the same as that for the full problem Eq. 1.

With these simplifications, we now show that the above-described fully augmented Landweber construction is indeed updating the approximation to the solution of the full problem Eq. 1 in the optimal direction at each iteration. Furthermore, if, when applying Landweber to Eq. 13, we choose the step length dynamically such that we are applying the method of steepest descent to Eq. 13, this will be shown in the following theorem to be equivalent to minimizing the residual over the sum of subspaces {\mathcal{S}}_{j+1}={\mathcal{U}}+{\rm span}\left\{T^{\ast}\left(y^{\delta}-Tx_{j}\right)\right\}.

As this should be a steepest descent-type method, which is a {\mathcal{Y}}-norm minimization method, the correct constraint space is the image of the correction space under the action of the operator, namely {\mathcal{L}}_{j+1}=T{\mathcal{U}}+{\rm span}\left\{TT^{\ast}\left(y^{\delta}-Tx_{j}\right)\right\}. At step j+1, we have x_{j+1}=x_{j}+\mathbf{U}\mathbf{u}_{j+1}+\alpha_{j+1}T^{\ast}\left(y^{\delta}-Tx_{j}\right), where \mathbf{u}_{j+1}\in\mathbb{R}^{k} and \alpha_{j+1}\in\mathbb{R} are determined by the minimization constraint. The following is an adaption of a more general result, proven in [7].

Theorem 6.3.

The (j+1)st approximate solution of the augmented steepest descent minimization problem,

\mbox{select } \mathbf{U}\mathbf{u}_{j+1}+\alpha_{j+1}v_{j}\in{\mathcal{U}}+{\rm span}\left\{v_{j}\right\} \mbox{ where } v_{j}=T^{\ast}\left(y^{\delta}-Tx_{j}\right),
\mbox{such that } \left(\mathbf{u}_{j+1},\alpha_{j+1}\right)=\underset{\mathbf{u}\in\mathbb{R}^{k},\,\alpha\in\mathbb{R}}{\text{argmin}}\left\|y-T\left(x_{j}+\alpha v_{j}+\mathbf{U}\mathbf{u}\right)\right\|_{{\mathcal{Y}}},

satisfies

  1.

    \mathbf{u}_{j+1}=\mathbf{u}^{(1)}_{j+1}+\mathbf{u}^{(2)}_{j+1} with

    (57) \mathbf{U}\mathbf{u}^{(1)}_{j+1} =\begin{cases}\underset{x\in{\mathcal{U}}}{\text{argmin}}\left\|e^{\dagger}_{0}-x\right\|_{\left(T^{\ast}T\cdot,\cdot\right)_{{\mathcal{X}}}}-u_{y}&\mbox{ if }j=0\\ \mathbf{0}&\mbox{ otherwise}\end{cases}
    (58) \mathbf{U}\mathbf{u}^{(2)}_{j+1} =-\underset{x\in{\mathcal{U}}}{\text{argmin}}\left\|x-\alpha_{j+1}v_{j+1}\right\|_{\left(T^{\ast}T\cdot,\cdot\right)_{{\mathcal{X}}}},

    where u_{y}=\mathbf{U}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(y-y^{\delta},\mathbf{C}\right)_{{\mathcal{Y}}},

  2.

    and \alpha_{j+1} satisfies

    (59) \alpha_{j+1}v_{j+1}=\underset{x\in{\rm span}\left\{v_{j}\right\}}{\text{argmin}}\left\|\left(I_{{\mathcal{Y}}}-Q\right)r_{j}-\left(I_{{\mathcal{Y}}}-Q\right)Tx\right\|_{{\mathcal{Y}}},

i.e., \alpha_{j+1} is obtained by applying an iteration of steepest descent to the projected problem, where v_{j+1}=T^{\ast}r_{j} and e^{\dagger}_{0}=x^{\dagger}-x_{0}.

Proof.  As this is a minimization problem, we prove the result by studying the structure of the associated orthogonal projector. We can rewrite this minimization as

\left(\mathbf{u}_{j+1},\alpha_{j+1}\right)=\underset{\mathbf{u}\in\mathbb{R}^{k},\,\alpha\in\mathbb{R}}{\text{argmin}}\left\|r_{j}-T\left(\alpha v_{j}+\mathbf{U}\mathbf{u}\right)\right\|_{{\mathcal{Y}}}.

Thus, we seek the pair \left(\mathbf{u}_{j+1},\alpha_{j+1}\right) giving the {\mathcal{Y}}-norm best approximation of r_{j} in the space T{\mathcal{S}}_{j+1}. This is equivalent to computing the {\mathcal{Y}}-orthogonal projection of r_{j} into that space, which we denote Q_{T{\mathcal{S}}_{j+1}}. Recalling that \mathbf{C}=T\mathbf{U}, from Proposition 5.3, we can express this as

T\left(\alpha v_{j}+\mathbf{U}\mathbf{u}\right) =Q_{T{\mathcal{S}}_{j+1}}r_{j}
=\begin{bmatrix}\mathbf{C}&Tv_{j+1}\end{bmatrix}\left(\begin{bmatrix}\mathbf{C}&Tv_{j+1}\end{bmatrix},\begin{bmatrix}\mathbf{C}&Tv_{j+1}\end{bmatrix}\right)_{{\mathcal{Y}}}^{-1}\left(r_{j},\begin{bmatrix}\mathbf{C}&Tv_{j+1}\end{bmatrix}\right)_{{\mathcal{Y}}},

where \begin{bmatrix}\mathbf{C}&Tv_{j+1}\end{bmatrix}\in{\mathcal{L}}\left(\mathbb{R}^{(k+1)},{\mathcal{Y}}\right). This means that \left(\mathbf{u}_{j+1},\alpha_{j+1}\right) form the solution of

\begin{bmatrix}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}&\left(Tv_{j+1},\mathbf{C}\right)_{{\mathcal{Y}}}\\ \left(\mathbf{C},Tv_{j+1}\right)_{{\mathcal{Y}}}&\left(Tv_{j+1},Tv_{j+1}\right)_{{\mathcal{Y}}}\end{bmatrix}\begin{bmatrix}\mathbf{u}_{j+1}\\ \alpha_{j+1}\end{bmatrix}=\begin{bmatrix}\left(r_{j},\mathbf{C}\right)_{{\mathcal{Y}}}\\ \left(r_{j},Tv_{j+1}\right)_{{\mathcal{Y}}}\end{bmatrix}.

To prove the statement of the theorem, we eliminate \mathbf{u}_{j+1} from the second block of equations (following [30]), yielding the two sets of equations

\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}\mathbf{u}_{j+1}+\alpha_{j+1}\left(Tv_{j+1},\mathbf{C}\right)_{{\mathcal{Y}}} =\left(r_{j},\mathbf{C}\right)_{{\mathcal{Y}}}
\alpha_{j+1}\left[\left(Tv_{j+1},Tv_{j+1}\right)_{{\mathcal{Y}}}-\left(\mathbf{C},Tv_{j+1}\right)_{{\mathcal{Y}}}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(Tv_{j+1},\mathbf{C}\right)_{{\mathcal{Y}}}\right] =\left(r_{j},Tv_{j+1}\right)_{{\mathcal{Y}}}-\left(\mathbf{C},Tv_{j+1}\right)_{{\mathcal{Y}}}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(r_{j},\mathbf{C}\right)_{{\mathcal{Y}}}.

We can consolidate the second set of equations, yielding

\alpha_{j+1}\left(Tv_{j+1}-\mathbf{C}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(Tv_{j+1},\mathbf{C}\right)_{{\mathcal{Y}}},Tv_{j+1}\right)_{{\mathcal{Y}}}=\left(r_{j}-\mathbf{C}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(r_{j},\mathbf{C}\right)_{{\mathcal{Y}}},Tv_{j+1}\right)_{{\mathcal{Y}}}.

Recalling the definition of the projector QQ, we substitute and then take advantage of the self-adjointness of orthogonal projectors,

\alpha_{j+1}\left(\left(I_{{\mathcal{Y}}}-Q\right)Tv_{j+1},Tv_{j+1}\right)_{{\mathcal{Y}}}=\left(\left(I_{{\mathcal{Y}}}-Q\right)r_{j},Tv_{j+1}\right)_{{\mathcal{Y}}}
\alpha_{j+1}\left(\left(I_{{\mathcal{Y}}}-Q\right)^{2}Tv_{j+1},Tv_{j+1}\right)_{{\mathcal{Y}}}=\left(\left(I_{{\mathcal{Y}}}-Q\right)^{2}r_{j},Tv_{j+1}\right)_{{\mathcal{Y}}}
\alpha_{j+1}\left(\left(I_{{\mathcal{Y}}}-Q\right)Tv_{j+1},\left(I_{{\mathcal{Y}}}-Q\right)Tv_{j+1}\right)_{{\mathcal{Y}}}=\left(\left(I_{{\mathcal{Y}}}-Q\right)r_{j},\left(I_{{\mathcal{Y}}}-Q\right)Tv_{j+1}\right)_{{\mathcal{Y}}}.

Thus, the minimizer $\alpha_{j+1}$ satisfies

\left(\left(I_{{\mathcal{Y}}}-Q\right)r_{j}-\left(I_{{\mathcal{Y}}}-Q\right)T\left(\alpha_{j+1}v_{j+1}\right),\left(I_{{\mathcal{Y}}}-Q\right)Tv_{j+1}\right)_{{\mathcal{Y}}}=0.

Since $v_{j+1}=T^{\ast}r_{j}$, we see that $\alpha_{j+1}$ is the solution of

select $\alpha_{j+1}v_{j+1}\in{\rm span}\left\{T^{\ast}\left(I_{{\mathcal{Y}}}-Q\right)r_{j}\right\}$
such that $\left(I_{{\mathcal{Y}}}-Q\right)r_{j}-\left(I_{{\mathcal{Y}}}-Q\right)T\left(\alpha_{j+1}v_{j+1}\right)\perp{\rm span}\left\{\left(I_{{\mathcal{Y}}}-Q\right)TT^{\ast}\left(I_{{\mathcal{Y}}}-Q\right)r_{j}\right\}$.

From Proposition 3.1, we then see that this is the residual constraint formulation of the $(j+1)$st step of steepest descent applied to Eq. 59. We can then solve for $\mathbf{u}_{j+1}$,

\mathbf{u}_{j+1}=\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left[\left(r_{j},\mathbf{C}\right)_{{\mathcal{Y}}}-\alpha_{j+1}\left(Tv_{j+1},\mathbf{C}\right)_{{\mathcal{Y}}}\right].

Expanding to obtain the contribution from ${\mathcal{U}}$ yields

\mathbf{U}\mathbf{u}_{j+1}=\mathbf{U}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(r_{j},\mathbf{C}\right)_{{\mathcal{Y}}}-\alpha_{j+1}\mathbf{U}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(Tv_{j+1},\mathbf{C}\right)_{{\mathcal{Y}}},

and we observe that the second term in this sum is Eq. 58. For the first term, we must consider two cases. If $j=0$, then $x_{0}=0$ and $r_{0}^{\delta}=y^{\delta}$. Thus, we can write

\mathbf{U}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(r_{0},\mathbf{C}\right)_{{\mathcal{Y}}}=\mathbf{U}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(y^{\delta},\mathbf{C}\right)_{{\mathcal{Y}}}=x_{p}^{\delta}.

From Proposition 5.5, we have that $x_{p}^{\delta}=x_{p}^{\dagger}-\mathbf{U}\left(\mathbf{C},\mathbf{C}\right)_{{\mathcal{Y}}}^{-1}\left(y-y^{\delta},\mathbf{C}\right)_{{\mathcal{Y}}}$. It is established in Eq. 56 that $x_{p}^{\dagger}$ is the minimizer in Eq. 57. Lastly, if $j>0$, Lemma 6.1 tells us that $r_{j}\perp{\mathcal{C}}$; thus $\left(r_{j},\mathbf{C}\right)_{{\mathcal{Y}}}=\mathbf{0}$. ∎

Corollary 6.4.

The full augmented steepest descent approximation can be represented as

x_{j+1}=\begin{cases}x_{p}^{\delta}+\alpha_{1}T^{\ast}y^{\delta}-\alpha_{1}P\left(T^{\ast}y^{\delta}\right)&\mbox{if }j=0\\ x_{j}+\alpha_{j+1}T^{\ast}r_{j}-\alpha_{j+1}P\left(T^{\ast}r_{j}\right)&\mbox{otherwise}\end{cases}

with optimal step length

\alpha_{j+1}=\dfrac{\left\|T^{\ast}r_{j}\right\|_{{\mathcal{X}}}^{2}}{\left\|\left(I_{{\mathcal{Y}}}-Q\right)TT^{\ast}r_{j}\right\|_{{\mathcal{Y}}}^{2}}.

We note that $x_{p}^{\delta}\in{\mathcal{U}}$ can be computed once, initially, as it depends only on the initial data. We encapsulate all of this in the case of augmented steepest descent with dynamically determined optimal step length as Algorithm 2.

Note: the reader may notice that Line 3 of Algorithm 2 asks that the user compute a QR-factorization of $T\mathbf{U}$, which essentially has a finite number of elements from an infinite-dimensional space ${\mathcal{Y}}$ as its “columns”. By QR-factorization in this context, we mean performing steps of the Gram-Schmidt process and storing the orthogonalization and normalization coefficients in an upper-triangular matrix $\mathbf{R}$. Right-multiplication of $\mathbf{C}\in{\mathcal{L}}\left(\mathbb{R}^{k},{\mathcal{Y}}\right)$ by a matrix should be understood as applying the definition in Eq. 16 to each column of $\mathbf{R}$, thereby recovering the relationship between $T\mathbf{U}$ and $\mathbf{C}$ arising from the Gram-Schmidt process. This is a well-defined procedure, discussed, e.g., in [37, Lecture 7].
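To make this concrete in the discrete setting used in our experiments, the following is a minimal Matlab sketch of this QR step. It assumes that $T\mathbf{U}$ has been discretized as an n-by-k array TU whose columns approximate $Tu_{1},\dots,Tu_{k}$, that U is the corresponding array of recycle vectors, and that the Euclidean inner product stands in for $(\cdot,\cdot)_{{\mathcal{Y}}}$; these discretization choices are assumptions for illustration, not part of the algorithm itself.

[n, k] = size(TU);
C = zeros(n, k); R = zeros(k, k);
for i = 1:k
    w = TU(:, i);
    for l = 1:i-1                     % modified Gram-Schmidt sweep
        R(l, i) = C(:, l)' * w;       % orthogonalization coefficient
        w = w - R(l, i) * C(:, l);
    end
    R(i, i) = norm(w);                % normalization coefficient
    C(:, i) = w / R(i, i);
end
U = U / R;                            % replace U by U*inv(R), so that T*U = C (Line 4)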

1 Given: $\mathbf{U}\in{\mathcal{L}}\left(\mathbb{R}^{k},{\mathcal{X}}\right)$ representing ${\mathcal{U}}$
2 Set $r_{0}=y-Tx_{0}$
3 Compute QR-factorization $T\mathbf{U}=\mathbf{C}\mathbf{R}$
4 $\mathbf{U}\leftarrow\mathbf{U}\mathbf{R}^{-1}$
5 $\mathbf{u}^{(1)}=\left(r_{0},\mathbf{C}\right)_{{\mathcal{Y}}}$
6 $x\leftarrow x_{0}+\mathbf{U}\mathbf{u}^{(1)}$
7 $r\leftarrow r_{0}-\mathbf{C}\mathbf{u}^{(1)}$
8 while STOPPING-CRITERIA do
9       $\alpha=\dfrac{\left\|T^{\ast}r\right\|_{{\mathcal{X}}}^{2}}{\left\|\left(I_{{\mathcal{Y}}}-Q\right)TT^{\ast}r\right\|_{{\mathcal{Y}}}^{2}}$
10       $\widehat{\mathbf{w}}\leftarrow\left(TT^{\ast}r,\mathbf{C}\right)_{{\mathcal{Y}}}$
11       $x\leftarrow x+\alpha T^{\ast}r-\alpha\mathbf{U}\widehat{\mathbf{w}}$
12       $r\leftarrow r-\alpha TT^{\ast}r+\alpha\mathbf{C}\widehat{\mathbf{w}}$
13 end while
Algorithm 2 Augmented steepest descent for the normal equations
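For readers who prefer code, the following is a minimal discretized Matlab sketch of Algorithm 2, under the assumptions that the operator is available as a matrix T, the recycle space as an n-by-k array U, and that the Euclidean inner product discretizes $(\cdot,\cdot)_{{\mathcal{Y}}}$; the function name augmented_sd and the fixed iteration count are illustrative only, not the implementation used in our experiments.

function x = augmented_sd(T, U, y, x0, maxit)
    % Minimal sketch of Algorithm 2 (illustrative assumptions as stated above).
    [C, R] = qr(T*U, 0);              % thin QR: T*U = C*R, C has orthonormal columns
    U = U / R;                        % so that T*U = C (Lines 3-4)
    r = y - T*x0;
    u1 = C' * r;                      % (r_0, C)_Y
    x = x0 + U*u1;
    r = r - C*u1;
    for it = 1:maxit                  % stand-in for STOPPING-CRITERIA
        g  = T' * r;                  % T^* r
        Tg = T * g;                   % T T^* r
        w  = C' * Tg;                 % (T T^* r, C)_Y
        Pg = Tg - C*w;                % (I_Y - Q) T T^* r
        alpha = (g'*g) / (Pg'*Pg);    % optimal step length of Corollary 6.4
        x = x + alpha*g - alpha*(U*w);
        r = r - alpha*Tg + alpha*(C*w);
    end
end

In practice, the fixed iteration count above would be replaced by a stopping rule such as the discrepancy principle.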

Furthermore, it is then clear that if we no longer choose $\alpha$ dynamically according to a minimization criterion and instead fix $\alpha_{i}\equiv\alpha$, then we obtain an augmented Landweber method, which we encapsulate as Algorithm 3.

1 Given: $\mathbf{U}\in{\mathcal{L}}\left(\mathbb{R}^{k},{\mathcal{X}}\right)$ representing ${\mathcal{U}}$,  $\alpha>0$
2 Set $r_{0}=y-Tx_{0}$
3 Compute QR-factorization $T\mathbf{U}=\mathbf{C}\mathbf{R}$
4 $\mathbf{U}\leftarrow\mathbf{U}\mathbf{R}^{-1}$
5 $\mathbf{u}^{(1)}=\left(r_{0},\mathbf{C}\right)_{{\mathcal{Y}}}$
6 $x\leftarrow x_{0}+\mathbf{U}\mathbf{u}^{(1)}$
7 $r\leftarrow r_{0}-\mathbf{C}\mathbf{u}^{(1)}$
8 while STOPPING-CRITERIA do
9       $\widehat{\mathbf{w}}\leftarrow\left(TT^{\ast}r,\mathbf{C}\right)_{{\mathcal{Y}}}$
10       $x\leftarrow x+\alpha T^{\ast}r-\alpha\mathbf{U}\widehat{\mathbf{w}}$
11       $r\leftarrow r-\alpha TT^{\ast}r+\alpha\mathbf{C}\widehat{\mathbf{w}}$
12 end while
Algorithm 3 Augmented Landweber for the normal equations
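Relative to the code sketch given after Algorithm 2, the augmented Landweber variant only replaces the dynamically computed step length by a fixed one. The particular choice below, based on the classical Landweber condition $0<\alpha<2/\|T\|^{2}$, is an illustrative assumption rather than part of Algorithm 3.

% computed once, before the loop, in place of the per-iteration alpha above;
% any fixed 0 < alpha < 2/||T||^2 satisfies the classical Landweber condition
alpha = 1 / normest(T)^2;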

7 Numerical experiments

We present here experiments on two subproblems from the field of adaptive optics for large ground-based telescopes, as well as on a toy blurring problem that serves as a proof of concept for an alternative use of these methods.

7.1 Gaussian blurring model

Figure 1: Residual and error convergence for the large-scale Gaussian blur problem with $\sigma=6$ and augmentation space generated using vectors resulting from applying two iterations of steepest descent to the noisy (dashed curve) and noise-free (solid black curve) data using Gaussian blur operators for five smaller standard deviations chosen equally spaced from the interval $[0.5,1.5]$.

Our first experiment concerns an academic problem which serves as a proof of concept. However, we still execute a large-scale version of this experiment using an implicitly-defined, matrix-free problem generated by the IR Tools software package [17]. The IR Tools package does provide an out-of-the-box generator function for the Gaussian blurring problem, PRBlurGauss(); however, it does not allow one to specify the precise standard deviation (i.e., the spread) of the point-spread function (PSF). Looking under the hood, though, one sees that there is a helper function called psfGauss() which generates a PSF for a Gaussian with specified spread. This PSF can then be fed into the general-purpose PRBlur() problem generator, which outputs a matrix-free operator that convolves images with the PSF, as well as the true solution and noisy right-hand side. A variety of solution images are available; we choose a predefined geometric pattern for this experiment. To make the problem sufficiently large-scale, we choose the image size to be $500\times 500$, meaning the implicit size of the matrix induced by the PSF is $250000\times 250000$. The IR Tools package is careful not to generate exact right-hand side data unless this is specified. We rerun PRBlur() with the option CommitCrime=‘true’ to obtain $\mathbf{b}_{true}$, but this is only used to understand the effects of noise in generating the recycling vectors. In these experiments, we generated the problem for standard deviation $\sigma=6$. To construct an augmentation subspace ${\mathcal{U}}$, we selected vectors resulting from applying two iterations of steepest descent to the right-hand side data but for different Gaussian blurring operators with five smaller standard deviations chosen equally spaced from the interval $[0.5,1.5]$; a sketch of this construction is given below. An orthonormal basis for the span of these (nine) vectors was obtained and taken as ${\mathcal{U}}$. We also generated a set of “clean” recycled vectors using the noise-free $\mathbf{b}_{true}$, to see what effect noise has on the recycling process itself. It is demonstrated that such vectors do indeed offer better performance as recycled vectors, meaning that noise does affect the recycling process.
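The following is a hedged Matlab sketch of how such an augmentation basis can be assembled. It assumes the five smaller-spread blur operators are available in a cell array Aops (applied via multiplication, with the transpose giving the adjoint) and that b is the right-hand side vector; the variable names are illustrative, and the exact collection of iterates kept may differ from what was done in the experiment.

V = [];
for i = 1:numel(Aops)
    A = Aops{i};
    x = zeros(size(A, 2), 1);
    r = b;
    for it = 1:2                        % two steepest descent iterations per operator
        g = A' * r;                     % gradient direction A^T r
        alpha = (g' * g) / norm(A * g)^2;   % classical steepest descent step length
        x = x + alpha * g;
        r = r - alpha * (A * g);
        V = [V, x];                     % collect the iterates
    end
end
U = orth(V);                            % orthonormal basis taken as the recycle space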

In Figure 1, we see that this strategy yields an improvement in convergence for both variants, i.e., whether the recycling vectors are generated from the noise-free or the noise-contaminated data. This is a promising numerical result and a proof of concept, but further analysis and experimentation are required to understand it in the proper context and to guide the use of this strategy for real-world, non-academic problems.

Figure 2: The right-hand sides and, for each, the best reconstruction produced by steepest descent.
Figure 3: The true image and reconstructed images using augmented steepest descent. We show them both as images in the top row and as three-dimensional surfaces in the bottom row.

7.2 Adaptive optics: wavefront reconstruction

Turbulence in the atmosphere has a significant impact on the imaging quality of modern ground-based telescopes. In order to correct for the impact of the atmosphere, Adaptive Optics (AO) systems are utilized. In these systems, the incoming light from bright guide stars is measured by wavefront sensors in order to obtain information about the actual turbulence in the atmosphere. In a next step, deformable mirrors use this information to improve the obtained scientific images. For a review of the different mathematical aspects of AO systems we refer to [11]. For example, Single Conjugate Adaptive Optics (SCAO) systems use one sensor and one mirror for the correction and are able to obtain a good image quality close to the direction of the guide star. Figure 4 shows the setup of a SCAO system. For the instruments of the new generation of telescopes, e.g., the Extremely Large Telescope (ELT) of the European Southern Observatory (ESO), the pyramid wavefront sensor (P-WFS) will be used frequently to obtain the wavefront. A linear approximation of the connection between the incoming wavefront $\Phi$ and the sensor measurements $(s_{x},s_{y})$, assuming a non-modulated sensor, can be described as

(60)   s_{x}(x,y)=\frac{1}{\pi}\int_{-B(y)}^{B(y)}\frac{\Phi(x^{\prime},y)}{x-x^{\prime}}\,dx^{\prime},

and analogously for $s_{y}$. An overview of existing reconstruction methods for the P-WFS can be found in [22, 20, 19], and a thorough analysis of the various P-WFS models is given in [21].

Figure 4: Principle of SCAO

The linear models can be used in the case of small wavefront aberrations, which holds in a closed-loop setting, i.e., if the wavefront sensor is optically located behind the deformable mirror. Due to the rapidly changing atmosphere, the shape of the deformable mirror has to be readjusted every 1-2 milliseconds over the whole imaging process, which may last several minutes. Therefore, the reconstructions of the wavefronts have to be accurate and have to be obtained in real time. As a quality measure we use the Strehl ratio ($SR$), which relates the imaging quality of the telescope without a disturbing atmosphere ($SR=1$) to the corrected imaging quality, i.e., $SR\in[0,1]$. At the beginning of the observation the Strehl ratio is low, as the deformable mirror is not yet adjusted to the turbulence conditions. After a few time steps, the mirror is in a state where only small adjustments are necessary - the loop has been closed. In our case this happens when $SR\in[0.6,0.7]$. In this state, the linear models for the P-WFS are valid. We show the utility of the recycled steepest descent method for multiple steps of a closed-loop iteration for a wavefront reconstruction for the instrument METIS of the ELT. The recycling subspaces were formed by the previous 1-3 reconstructions; see Fig. 5 for the results. In the long run, both steepest descent with recycling and with warm restart (i.e., where the last reconstruction is used as the starting value for the new reconstruction) yield the same Strehl ratio. However, the method with recycling is able to close the loop much faster, which is important for a stable and efficient imaging of the scientific object.
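A schematic Matlab sketch of this closed-loop recycling strategy is given below. It assumes the (linearized) forward operator is available as a matrix T mapping wavefronts to P-WFS measurements, that column t of S holds the measurements at time step t, and that augmented_sd is the sketch given after Algorithm 2; all names and the simple first-step treatment are illustrative assumptions, not the METIS pipeline.

function phi_all = closed_loop_recycle(T, S, maxit)
    n       = size(T, 2);
    nsteps  = size(S, 2);
    nkeep   = 3;                              % recycle the previous 1-3 reconstructions
    L       = normest(T)^2;                   % rough estimate of ||T||^2
    phi     = zeros(n, 1);                    % warm-start value carried between steps
    history = zeros(n, 0);
    phi_all = zeros(n, nsteps);
    for t = 1:nsteps
        if isempty(history)                   % first step: plain gradient iterations
            for it = 1:maxit
                phi = phi + (1/L) * (T' * (S(:, t) - T*phi));
            end
        else                                  % later steps: augmented steepest descent
            U   = orth(history);
            phi = augmented_sd(T, U, S(:, t), phi, maxit);
        end
        history = [history, phi];             % keep only the most recent reconstructions
        if size(history, 2) > nkeep
            history = history(:, end-nkeep+1:end);
        end
        phi_all(:, t) = phi;
    end
end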

Figure 5: Monitoring of wavefront reconstruction accuracy over time.

7.3 Adaptive optics: Image reconstruction

Our final example concerns another problem related to astronomical imaging. As for any optical system, the design of the system determines the obtainable imaging quality: instead of the “true” image one always obtains a blurred image. The extent of blurring is described mathematically by the point spread function (PSF) of the optical system, and the measured image can be modeled as the convolution of the true image and the PSF:

(61)   I(t)=\int I_{true}(s)\,PSF(t-s)\,ds.

Thus, the original image can be recovered - assuming the PSF is known - by solving the linear ill-posed problem (61). In this example we want to reconstruct an astronomical image from the measured image obtained by a telescope using the approximate PSF of the telescope. It is well known that the PSF of a telescope without a turbulent atmosphere above the telescope (e.g., a telescope in space) is given by

(62)   PSF_{0}(x)=\left|{\mathcal{F}}\{P\}(x)\right|^{2}

where $P$ is the characteristic function of the aperture of the main mirror of the telescope (usually an annulus) and ${\mathcal{F}}(\cdot)$ represents the Fourier transform of the argument. Please note that the recovery of the original image from the measured one results in most cases in an ill-posed problem, at least for smooth PSFs, which is, e.g., the case for (62). For ground-based telescopes, however, the turbulent atmosphere above the telescope has to be taken into account. The related PSF can be modeled as

(63)   PSF_{\phi}(x,t)=\left|{\mathcal{F}}\{P(\cdot)e^{i\phi(\cdot,t)}\}(x)\right|^{2}.

Here, $\phi(r,t)$ describes the turbulence above the telescope. As we have seen above, an AO system aims to reduce the impact of the turbulence on the imaging process - see Section 7.2. For various reasons, e.g., the time delay between measuring the turbulence and its correction or the limited resolution of wavefront sensors and deformable mirrors, such a correction will never be perfect, and therefore there will always remain a residual (uncorrected) turbulence. Based on the data from the AO system it is, however, possible to estimate this uncorrected turbulence and obtain an approximation of $PSF_{\phi}$ for the observation. A full description of this problem, i.e., the reconstruction of the PSF from AO data of the scientific observation, can be found, e.g., in [10, Section 5.4] and references therein. As astronomical images are acquired over a period of time, one actually recovers a time-averaged PSF, denoted $\left\langle PSF_{\phi}(\cdot,t)\right\rangle$.
In our experiments, the time-averaged $PSF_{\phi}$ with residual turbulence taken into account is simulated by OCTOPUS [26, 25], the simulation tool of the European Southern Observatory. In addition, we use a representation of $PSF_{0}(x)$, the PSF of the telescope without turbulence. Both PSFs are represented by matrices, i.e., as the functions spatially discretized for images of a certain size. Please note that the recovery of the original image from the convolution data remains ill-posed in both cases.
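As a small illustration of Eqs. (62) and (63), the following Matlab sketch computes both PSFs on a coarse grid; the grid size, the annular aperture, and the simple quadratic residual phase are chosen purely for illustration and are not the OCTOPUS-simulated quantities used in the experiment.

N = 256;                                    % illustrative grid size
[xg, yg] = meshgrid(linspace(-1, 1, N));
rho = sqrt(xg.^2 + yg.^2);
P = double(rho <= 1 & rho >= 0.3);          % characteristic function of an annular aperture
psf0 = abs(fftshift(fft2(P))).^2;           % Eq. (62): PSF without turbulence
psf0 = psf0 / sum(psf0(:));                 % normalize to unit mass
phi = 2*pi*0.2*(xg.^2 - yg.^2);             % illustrative residual phase screen
psfphi = abs(fftshift(fft2(P .* exp(1i*phi)))).^2;   % Eq. (63): PSF with residual turbulence
psfphi = psfphi / sum(psfphi(:));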

Convolution of images with PSFs was performed using the function psfMatrix() available, e.g., as part of the Matlab toolbox IRTools [17]. We use the Star Cluster image, which can be obtained from the test problems of the Matlab package Restore Tools [1]. The provided artificial Star Cluster image is $257\times 257$, so for these experiments the discretized PSFs are resized (i.e., downscaled) to also be $257\times 257$ using the imresize() function of Matlab. A psfMatrix object is then instantiated which can be applied via multiplication to an image to perform a convolution, i.e.,

Apsf = psfMatrix(psf_257, 'periodic', [C C], [N N])

where $N=257$ and $C=129$, the center coordinate of the image, and we use periodic boundary conditions. Entry $(C,C)$ of the PSF indicates where the value of the PSF at $(0,0)$ is located, where we assume the square image is centered at the origin. The PSF inducing the adjoint operator can be created via the relation $PSF_{adj}(x,y)=PSF(-x,-y)$, which, along with the image being centered at the origin, can be used to construct the discrete PSF inducing the discretized adjoint operator [39]. In Figure 6, we show log plots of the true Star Cluster image and its blurred counterpart convolved with the PSF representing the atmospheric distortion. The blurred image was perturbed with random noise generated with uniform distribution using Matlab’s rand() function, which was then scaled to have norm $10^{-1}\left\|b\right\|_{F}$, where $b$ represents the convolved true image shown on the right in Figure 6.
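A possible realization of this adjoint construction is sketched below; it assumes, as above, that psf_257 is the $257\times 257$ PSF with its $(0,0)$ value stored at entry $(C,C)=(129,129)$, in which case flipping both array dimensions implements $PSF_{adj}(x,y)=PSF(-x,-y)$ while leaving the center entry in place.

N = 257; C = 129;
psf_adj = rot90(psf_257, 2);                            % flip both dimensions of the PSF
Apsf_adj = psfMatrix(psf_adj, 'periodic', [C C], [N N]);
% Apsf_adj then applies a discretization of the adjoint blurring operator,
% which is what the Landweber/steepest descent iterations require for T^*.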

Figure 6: The true and blurred Star Cluster image with generated stars of various brightnesses, with the true image shown in log plot to show all the generated stars clearly

In this study, we demonstrate that subspace recycling works well for the considered deconvolution problem. Since, for a telescope with a good AO system, $PSF_{0}$ is at least a coarse approximation of $PSF_{\phi}$, we propose to form the recycling subspace from a few eigenvectors of the matrix representing $PSF_{0}$. Here, we take advantage of the fact that although $PSF_{\phi}$ changes for every observation, $PSF_{0}$ remains constant. This enables us to pre-calculate eigenvectors of its induced convolution operator offline, ahead of time.

We calculated the telescope eigenvectors via the matrix-free eigs() routine of Matlab, which uses a Krylov-based iteration to compute eigenvalues and eigenvectors for a few of the largest eigenvalues of the operator. In this experiment, we chose to compute the top 200 but had to perform a post-processing step to eliminate the spurious complex eigenvalues/-vectors that are created in pairs, leaving us with 198 eigenvalues. Ordering the remaining eigenvalues in descending order, we augment with different collections of these pre-computed eigenvectors, starting with just the first, then the first two, and so on, up to all 198. In Figure 7, we show the residual and error curves for all experiments together. One sees that augmenting with these eigenvectors is effective in increasing the speed of convergence of the steepest descent method, with acceleration increasing as we add more vectors, but with diminishing returns.
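The offline eigenvector computation can be sketched as follows, assuming Apsf0 denotes the psfMatrix object built from the turbulence-free telescope PSF; the variable names, the tolerance used to discard complex pairs, and the reshaping convention are illustrative assumptions rather than the exact post-processing used in the experiment.

N = 257;
Afun = @(v) reshape(Apsf0 * reshape(v, N, N), N*N, 1);   % matrix-free application of the blur operator
[V, D] = eigs(Afun, N*N, 200);                           % roughly the 200 largest-magnitude eigenpairs
lam = diag(D);
keep = abs(imag(lam)) < 1e-10;                           % discard spurious complex pairs
lam = real(lam(keep));  V = real(V(:, keep));
[~, idx] = sort(lam, 'descend');                         % order eigenvalues in descending order
V = V(:, idx);
% Reshaped to N-by-N, the columns of V are the eigenimages used to build U.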

Figure 7: Residual and error convergence of steepest descent (shown in red) and augmented steepest descent (shown in shades of gray) with augmentation spaces being eigenvectors of the operator induced by the telescope PSF, where we used the eigenvectors associated with the largest eigenvalues. The curve’s darkness is determined by how many eigenvectors were used in the augmentation; more vectors produce a darker curve.

It is important to investigate how long it takes to achieve some manner of semiconvergence. Thus, for each experiment, we track at which iteration semiconvergence is achieved and also at which iteration the residual-based discrepancy principle is satisfied. For the operator induced by the telescope PSF, we also monitor jumps in eigenvalue magnitude, and these are plotted separately. This is shown in Figure 8. We also show for completeness the first nine eigenvectors (i.e., eigenimages) of the operator induced by the telescope PSF in Figure 9.

Figure 8: We show (left figure) for each recycled subspace dimension from this experiment at what iteration semiconvergence and the discrepancy principle are achieved, and we also mark where jumps in the eigenvalues of the operator induced by the PSF (the Airy function) of the telescope occur. On the right, we show the eigenvalues of this operator.
Figure 9: The first nine eigenfunctions (with eigenvalues in descending order) of the operator induced by the PSF, i.e., Eq. 62, of the telescope. Note that the first function is actually a constant up to roundoff noise.

Lastly, for the case in which we augmented with the first 37 eigenvectors, we show the actual image produced at the iteration in which semiconvergence was reached. We show both the raw image produced by augmented steepest descent and one post-processed by thresholding pixels with values less than 10 to zero. This is shown (again in log plot) in Figure 10. Augmentation with additional eigenvectors beyond the first 37 produced no additional improvement for this problem.

Figure 10: The image produced by augmented steepest descent at the semiconvergence iteration, raw and with pixel thresholding (all pixels of value less than 10 thresholded to zero), for the experiment augmenting the first 37 eigenvectors.

8 Conclusions

In this paper, we have shown that under basic assumptions, the subspace recycling approach can be combined with any known regularization scheme, and the resulting augmented scheme is still a regularization. This opens many possibilities for developing new augmented regularization schemes. We have further demonstrated this by proposing an augmented gradient descent scheme, and we have shown how useful recycling easily-computed information within the augmented gradient descent method can be for accelerating semiconvergence, both in an academic problem and in problems arising from applications in astronomy.

Acknowledgments

Victoria Hutterer and Ronny Ramlau are partially supported by the SFB “Tomography Across the Scales” (Austrian Science Fund Project F68-N36). The work in this paper was initiated during a visit by the second author to RICAM. The authors would like to thank Roland Wagner for providing sample PSFs for the last numerical experiment. Furthermore, the third author thanks Roland Wagner for his help in using the PSF to induce the adjoint operator needed for applying Landweber-type iterations. Lastly, the authors thank the referees for their constructive comments.

References

  • [1] RestoreTools: An object-oriented Matlab package for image restoration, 2012, http://www.mathcs.emory.edu/~nagy/RestoreTools/. Accessed on 28 June 2020.
  • [2] K. Ahuja, E. de Sturler, S. Gugercin, and E. R. Chang, Recycling BiCG with an application to model reduction, SIAM J. Sci. Comput., 34 (2012), pp. A1925–A1949, https://doi.org/10.1137/100801500, http://dx.doi.org/10.1137/100801500.
  • [3] J. Baglama and L. Reichel, Augmented GMRES-type methods, Numerical Linear Algebra with Applications, 14 (2007), pp. 337–350, https://doi.org/10.1002/nla.518, http://dx.doi.org/10.1002/nla.518.
  • [4] J. Baglama and L. Reichel, Decomposition methods for large linear discrete ill-posed problems, Journal of Computational and Applied Mathematics, 198 (2007), pp. 332–343, https://doi.org/10.1016/j.cam.2005.09.025, http://dx.doi.org/10.1016/j.cam.2005.09.025.
  • [5] J. Chung, E. de Sturler, and J. Jiang, Hybrid projection methods with recycling for inverse problems, (2020), https://arxiv.org/abs/2007.00207.
  • [6] E. de Sturler, Nested Krylov methods based on GCR, Journal of Computational and Applied Mathematics, 67 (1996), pp. 15–41, https://doi.org/10.1016/0377-0427(94)00123-5, http://dx.doi.org/10.1016/0377-0427(94)00123-5.
  • [7] E. de Sturler, M. Kilmer, and K. M. Soodhalter, Krylov subspace augmentation for the solution of shifted systems: a review. In preparation.
  • [8] M. Defrise and C. D. Mol, A note on stopping rules for iterative methods and filtered svd, in Inverse Problems: An Interdisciplinary Study, P. C. Sabatier, ed., Academic Press, 1987, pp. 261–268.
  • [9] Y. Dong, H. Garde, and P. C. Hansen, R3GMRES: including prior information in GMRES-type methods for discrete inverse problems, Electron. Trans. Numer. Anal., 42 (2014), pp. 136–146.
  • [10] L. Dykes, R. Ramlau, L. Reichel, K. M. Soodhalter, and R. Wagner, Lanczos-based fast blind deconvolution methods, Journal of Computational and Applied Mathematics, (To Appear).
  • [11] B. Ellerbroek and C. Vogel, Inverse problems in astronomical adaptive optics, Inverse Problems, 25 (2009), p. 063001.
  • [12] H. Engl and R. Ramlau, Regularization of Inverse Problems, Springer, New York, 2015, pp. 1233–1241.
  • [13] H. W. Engl, M. Hanke, and A. Neubauer, Regularization of inverse problems., Dordrecht: Kluwer Academic Publishers, 1996.
  • [14] Y. A. Erlangga and R. Nabben, Deflation and balancing preconditioners for Krylov subspace methods applied to nonsymmetric matrices, SIAM Journal on Matrix Analysis and Applications, 30 (2008), pp. 684–699.
  • [15] A. Gaul, Recycling Krylov subspace methods for sequences of linear systems: Analysis and applications, PhD thesis, Technischen Universität Berlin, 2014.
  • [16] A. Gaul, M. H. Gutknecht, J. Liesen, and R. Nabben, A framework for deflated and augmented Krylov subspace methods, SIAM Journal on Matrix Analysis and Applications, 34 (2013), pp. 495–518, https://doi.org/10.1137/110820713, http://dx.doi.org/10.1137/110820713.
  • [17] S. Gazzola, P. C. Hansen, and J. G. Nagy, IR Tools: a MATLAB package of iterative regularization methods and large-scale test problems, Numer. Algorithms, 81 (2019), pp. 773–811, https://doi.org/10.1007/s11075-018-0570-7.
  • [18] M. H. Gutknecht, Deflated and augmented Krylov subspace methods: a framework for deflated BiCG and related solvers, SIAM J. Matrix Anal. Appl., 35 (2014), pp. 1444–1466, https://doi.org/10.1137/130923087.
  • [19] V. Hutterer and R. Ramlau, Nonlinear wavefront reconstruction methods for pyramid sensors using Landweber and Landweber-Kaczmarz iteration, Applied Optics, 57 (2018), pp. 8790–8804.
  • [20] V. Hutterer and R. Ramlau, Wavefront reconstruction from non-modulated pyramid wavefront sensor data using a singular value type expansion, Inverse Problems, 34 (2018), p. 035002.
  • [21] V. Hutterer, R. Ramlau, and I. Shatokhina, Real-time adaptive optics with pyramid wavefront sensors: part I. a theoretical analysis of the pyramid sensor model, Inverse Problems, 35 (2019), p. 045007, https://doi.org/10.1088/1361-6420/ab0656, https://doi.org/10.1088%2F1361-6420%2Fab0656.
  • [22] I. Shatokhina, V. Hutterer, and R. Ramlau, Review on methods for wavefront reconstruction from pyramid wavefront sensor data, Journal of Astronomical Telescopes, Instruments and Systems, 6 (2020), p. 010901, https://doi.org/10.1117/1.JATIS.6.1.010901.
  • [23] M. E. Kilmer and E. de Sturler, Recycling subspace information for diffuse optical tomography, SIAM Journal on Scientific Computing, 27 (2006), pp. 2140–2166, https://doi.org/10.1137/040610271, http://dx.doi.org/10.1137/040610271.
  • [24] L. Landweber, An Iteration Formula for Fredholm Integral Equations of the First Kind, American Journal of Mathematics, 73 (1951), pp. 615–624.
  • [25] M. Le Louarn, C. Vérinaud, V. Korkiakoski, and E. Fedrigo, Parallel simulation tools for AO on ELTs, in Advancements in Adaptive Optics, Proc. SPIE 5490, 2004, pp. 705–712.
  • [26] M. Le Louarn, C. Vérinaud, V. Korkiakoski, N. Hubin, and E. Marchetti, Adaptive optics simulations for the European Extremely Large Telescope - art. no. 627234, in Advances in Adaptive Optics II, Prs 1-3, vol. 6272, 2006, pp. U1048–U1056.
  • [27] A. K. Louis, Inverse und schlecht gestellte Probleme, Teubner Studienbücher Mathematik, Vieweg+Teubner Verlag, 1989.
  • [28] R. B. Morgan, GMRES with deflated restarting, SIAM J. Sci. Comput., 24 (2002), pp. 20–37, https://doi.org/10.1137/S1064827599364659.
  • [29] M. L. Parks, E. de Sturler, G. Mackey, D. D. Johnson, and S. Maiti, Recycling Krylov subspaces for sequences of linear systems, SIAM Journal on Scientific Computing, 28 (2006), pp. 1651–1674, https://doi.org/10.1137/040607277, http://dx.doi.org/10.1137/040607277.
  • [30] M. L. Parks, K. M. Soodhalter, and D. B. Szyld, A block recycled gmres method with investigations into aspects of solver performance, (2016), https://arxiv.org/abs/1604.01713.
  • [31] R. Ramlau and M. Rosensteiner, An efficient solution to the atmospheric turbulence tomography problem using Kaczmarz iteration, Inverse Problems, 28 (2012), p. 095004.
  • [32] R. Ramlau and B. Stadler, An augmented wavelet reconstructor for atmospheric tomography, 2020, https://arxiv.org/abs/2011.06842.
  • [33] Y. Saad, Krylov subspace methods for solving large unsymmetric linear systems, Mathematics of Computation, 37 (1981), pp. 105–126, https://doi.org/10.2307/2007504, http://dx.doi.org/10.2307/2007504.
  • [34] Y. Saad, Iterative Methods for Sparse Linear Systems, SIAM, Philadelphia, Second ed., 2003.
  • [35] O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, and F. Lenzen, Variational Methods in Imaging, Springer, New York, 2009.
  • [36] K. M. Soodhalter, E. de Sturler, and M. Kilmer, A survey of subspace recycling iterative methods, GAMM Mitteilungen, (2020). Applied and numerical linear algebra topical issue.
  • [37] L. N. Trefethen and D. Bau, III, Numerical linear algebra, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1997, https://doi.org/10.1137/1.9780898719574.
  • [38] S. Wang, E. de Sturler, and G. H. Paulino, Large-scale topology optimization using preconditioned Krylov subspace methods with recycling, International Journal for Numerical Methods in Engineering, 69 (2007), pp. 2441–2468, https://doi.org/10.1002/nme.1798, http://dx.doi.org/10.1002/nme.1798.
  • [39] R. Wagner, Personal Communication, Mar. 2020.
  • [40] M. Yudytskiy, T. Helin, and R. Ramlau, Finite element-wavelet hybrid algorithm for atmospheric tomography, J. Opt. Soc. Am. A, 31 (2014), pp. 550–560, https://doi.org/10.1364/JOSAA.31.000550, http://josaa.osa.org/abstract.cfm?URI=josaa-31-3-550.