
Scalable iterative data-adaptive RKHS regularization

Haibo Li (School of Mathematics and Statistics, The University of Melbourne, Victoria, Australia), Jinchao Feng (Department of Mathematics, School of Sciences, Great Bay University, Dongguan, China), and Fei Lu (Department of Mathematics, Johns Hopkins University, Baltimore, USA).
Abstract

We present iDARR, a scalable iterative Data-Adaptive RKHS Regularization method, for solving ill-posed linear inverse problems. The method searches for solutions in subspaces where the true solution can be identified, with the data-adaptive RKHS penalizing the spaces of small singular values. At the core of the method is a new generalized Golub-Kahan bidiagonalization procedure that recursively constructs orthonormal bases for a sequence of RKHS-restricted Krylov subspaces. The method is scalable with a complexity of O(kmn) for m-by-n matrices, where k denotes the number of iterations. Numerical tests on the Fredholm integral equation and 2D image deblurring show that it outperforms the widely used L^{2} and l^{2} norms, producing stable, accurate solutions that converge consistently as the noise level decays.

Keywords: iterative regularization; ill-posed inverse problem; reproducing kernel Hilbert space; Golub-Kahan bidiagonalization; deconvolution.

AMS subject classifications(MSC2020): 47A52, 65F22, 65J20

1 Introduction

This study considers large-scale ill-posed linear inverse problems with little prior information on the regularization norm. The goal is to reliably recover high-dimensional vectors x\in\mathbb{R}^{n} from the equation

Ax+\mathbf{w}=b,\quad A\in\mathbb{R}^{m\times n}, (1.1)

where A and b are the data-dependent forward mapping and output, and \mathbf{w} denotes noise or measurement error. The problem is ill-posed in the sense that the least squares solution with minimal Euclidean norm, often computed as x_{LS}=A^{\dagger}b or x_{LS}=(A^{\top}A)^{\dagger}A^{\top}b with \dagger denoting the pseudo-inverse, is sensitive to perturbations in b. Such ill-posedness occurs when the singular values of A decay to zero faster than the projections of the perturbation in b onto the corresponding singular vectors.

Regularization is crucial to producing stable solutions for the ill-posed inverse problem. Broadly, it encompasses two integral components: a penalty term that defines the search domain and a hyperparameter that controls the strength of regularization. There are two primary approaches to implementing regularization: direct methods, which rely on matrix decomposition, e.g., the Tikhonov regularization [35], the truncated singular value decomposition (SVD) [15, 11]; and iterative methods, which use matrix-vector computations to scale for high-dimensional problems, see e.g., [12, 7, 41] for recent developments.

In our setting, we encounter two primary challenges: selecting an adaptive regularization norm and devising an iterative method to ensure scalability. The need for an adaptive norm arises from the variability of the forward map A across different applications and the often limited prior information about the regularity of x. Many existing regularization norms, such as the Euclidean norms used in Tikhonov methods in [15, 35] and the total variation norm in [33], lack this adaptability; for more examples, see the related work section below. Although a data-adaptive regularization norm has been proposed in [26, 25] for nonparametric regression, it is implicitly defined and requires a spectral decomposition of the normal operator.

We introduce iDARR, an iterative Data-Adaptive Reproducing kernel Hilbert space Regularization method. This method resolves both challenges by iteratively solving the subspace-projected problem

x_{k}=\mathop{\mathrm{argmin}}_{x\in\mathcal{X}_{k}}\|x\|_{C_{rkhs}},\ \ \mathcal{X}_{k}=\{x:\min_{x\in\mathcal{S}_{k}}\|Ax-b\|_{2}\},

where \|\cdot\|_{C_{rkhs}} is the implicitly defined semi-norm of a data-adaptive RKHS (DA-RKHS), and \mathcal{S}_{k} are subspaces of the DA-RKHS constructed by a generalized Golub-Kahan bidiagonalization (gGKB). By stopping the iteration early using the L-curve criterion, it produces a stable, accurate solution without using any matrix decomposition.

The DA-RKHS is a space defined by the data and model, embodying the intrinsic nature of the inverse problem. Its closure is the data-dependent space in which the true solution can be recovered, particularly when A is rank-deficient. Thus, when used for regularization, it confines the solution search to the right space and penalizes the small singular values, leading to stable solutions. We construct this DA-RKHS by reformulating eq. 1.1 as a weighted Fredholm integral equation of the first kind and examining the identifiability of the input signal, as detailed in Section 2.

Our key innovation is the gGKB. It constructs solution subspaces in the DA-RKHS without explicitly computing it. It is scalable with a cost of only O(kmn), where k is the number of iterations. This cost is orders of magnitude smaller than that of direct methods based on the spectral decomposition of A^{\top}A, which typically take O(n^{3}+mn^{2}) operations.

The iDARR and gGKB have solid mathematical foundations. We prove that each subspace \mathcal{S}_{k} is restricted to the DA-RKHS, and we therefore call them RKHS-restricted Krylov subspaces. Each is spanned by the orthonormal vectors produced by gGKB. Importantly, if not stopped early, the gGKB terminates when the RKHS-restricted Krylov subspace is fully explored, and the solution in each iteration is unique.

Systematic numerical tests employing the Fredholm integral equations demonstrate that iDARR surpasses traditional iterative methods employing l^{2} and L^{2} norms in the state-of-the-art IR TOOLS package [12]. Notably, iDARR delivers accurate estimators whose errors decay consistently with the noise level. This superior performance is evident irrespective of whether the spectral decay is exponential or polynomial, or whether the true solution resides inside or outside the DA-RKHS. Furthermore, our application to image deblurring underscores both its scalability and accuracy.

Main contributions. Our main contribution lies in developing iDARR, a scalable iterative regularization method tailored for large-scale ill-posed inverse problems with little prior knowledge about the solution. The cornerstone of iDARR is the introduction of a new data-adaptive RKHS determined by the underlying model and the data. A key technical innovation is the gGKB, which efficiently constructs solution subspaces of the implicitly defined DA-RKHS.

1.1 Related work

Numerous regularization methods have been developed, and the literature on this topic is too vast to be surveyed here; we refer to [15, 11, 12] and references therein for an overview. In the following, we compare iDARR with the most closely related works.

Regularization norms

Various regularization norms exist, such as the Euclidean norms of Tikhonov in [15, 35], the total variation norm \|x^{\prime}\|_{L^{1}} of the Rudin–Osher–Fatemi method in [33], the L^{1} norm \|x\|_{L^{1}} of LASSO in [34], and the RKHS norm \|x\|_{R}^{2} of an RKHS with a user-specified reproducing kernel R [37, 3, 9]. These norms, however, are often based on presumed properties of the solution and do not consider the specifics of each inverse problem. Our RKHS norm differs by adapting to the model and data: our RKHS has a reproducing kernel determined by the inverse problem, and its closure is the space in which the solution can be identified, making it an apt choice for regularization in the absence of additional solution information.

Iterative regularization (IR) methods

IR methods are scalable by accessing the matrix only via matrix-vector multiplications, producing a sequence of estimators until an early stopping, where the iteration number plays the role of the regularization parameter. IR has a rich and extensive history and continues to be a vibrant area of interest in contemporary studies [30, 1, 12, 24]. Different regularization terms lead to different methods. The LSQR algorithm [30, 5] with early stopping is standard for \|x\|_{2}^{2}-regularization. It solves projected problems in Krylov subspaces before transforming back to the original domain. For \|Lx\|_{2}^{2} with L\in\mathbb{R}^{p\times n}, widely used methods include the joint bidiagonalization method [19, 18], the generalized Krylov subspace method [21, 31], randomized SVD or generalized SVD methods [39, 40, 38], the modified truncated SVD method [2, 17], etc. For the general regularization norm of the form x^{T}Mx with a symmetric matrix M, the MLSQR in [1] treats positive definite M, and the preconditioned GKB [24] handles positive semi-definite M. Our iDARR addresses the case where M is unavailable but M^{\dagger} is readily available.

Golub-Kahan bidiagonalization (GKB)

The GKB was first used to solve inverse problems in [29], which generates orthonormal bases for Krylov subspaces in (\mathbb{R}^{n},\langle\cdot,\cdot\rangle_{2}) and (\mathbb{R}^{m},\langle\cdot,\cdot\rangle_{2}). This method extends to bounded linear compact operators between Hilbert spaces, with properties and convergence results detailed in [6]. Our gGKB extends the method to construct RKHS-restricted Krylov subspaces in (\mathbb{R}^{n},\langle\cdot,\cdot\rangle_{C_{rkhs}}) and (\mathbb{R}^{m},\langle\cdot,\cdot\rangle_{2}), where C_{rkhs} is positive semidefinite, and, in particular, only C_{rkhs}^{\dagger} is available.

The remainder of this paper is organized as follows: Section 2 introduces the adaptive RKHS with a characterization of its norm. Section 3 presents in detail the iDARR. Section 4 proves the desired properties of gGKB. In Section 5, we systematically examine the algorithm and demonstrate the robust convergence of the estimator when the noise becomes small. Finally, Section 6 concludes with a discussion on future developments.

2 A Data Adaptive RKHS for Regularization

This section introduces a data-adaptive RKHS (DA-RKHS) that adapts to the model and data in the inverse problem. The closure of this DA-RKHS is the function (or vector) space in which the true solution can be recovered, or equivalently, the inverse problem is well-defined in the sense that the loss function has a unique minimizer. Hence, when its norm is used for regularization, this DA-RKHS ensures that the minimization searches in the space where we can identify the solution.

To describe the DA-RKHS, we first present a unified notation that applies to both discrete and continuous time models using a weighted Fredholm integral equation of the first kind. Based on this notation, we write the normal operator as an integral operator emerging in a variational formulation of the inverse problem. The integral kernel is the reproducing kernel for the DA-RKHS. In other words, the normal operator defines the DA-RKHS. Finally, we briefly review a DA-RKHS Tikhonov regularization algorithm, the DARTR algorithm.

2.1 Unified notation for discrete and continuous models

The linear equation eq. 1.1 can arise from discrete or continuous inverse problems. In either case, we can present the inverse problem using the prototype of the Fredholm integral equation of the first kind. We consider only the 1D case for simplicity, and the extension to high-dimensional cases is straightforward. Specifically, let \mathcal{S},\mathcal{T}\subset\mathbb{R} be two compact sets. We aim to recover the function \phi:\mathcal{S}\to\mathbb{R} in the Fredholm equation

y(t)=\int_{\mathcal{S}}K(t,s)\phi(s)\nu(ds)+\sigma\dot{W}(t)=:L_{K}\phi(t)+\sigma\dot{W}(t),\quad\forall t\in\mathcal{T} (2.1)

from discrete noisy data

b=(y(t_{1}),\ldots,y(t_{m}))^{\top}\in\mathbb{R}^{m},

where we assume the observation index set \mathcal{T}=\{t_{j}\}_{j=1}^{m} satisfies 0=t_{0}<t_{1}<\cdots<t_{m} for simplicity. Here the measurement noise \sigma\dot{W}(t) is the white noise scaled by \sigma; that is, the noise at t_{j} has a Gaussian distribution \mathcal{N}(0,\sigma^{2}(t_{j+1}-t_{j})) for each j. Such noise is integrable when the observation mesh refines, i.e., when \max_{j}(t_{j+1}-t_{j}) vanishes.

Here the finite measure \nu can be either the Lebesgue measure with \mathcal{S} being an interval, or an atom measure with \mathcal{S} having finitely many elements. Correspondingly, the Fredholm integral equation eq. 2.1 is either a continuous or a discrete model.

In either case, the goal is to solve for the function \phi:\mathcal{S}\to\mathbb{R} in eq. 2.1. When seeking a solution in the form \phi=\sum_{i=1}^{n}x_{i}\phi_{i}, where \{\phi_{i}\}_{i=1}^{n} is a pre-selected set of basis functions, we obtain the linear equation eq. 1.1 with x=(x_{1},\ldots,x_{n})^{\top}\in\mathbb{R}^{n} and the matrix A with entries

A(j,i)=\int_{\mathcal{S}}K(t_{j},s)\phi_{i}(s)\nu(ds)=L_{K}\phi_{i}(t_{j}),\quad 1\leq j\leq m,\ 1\leq i\leq n. (2.2)

In particular, when \{\phi_{i}\} are piecewise constant, we obtain A as follows; a small construction sketch is given after the list.

  • Discrete model. Let \nu be an atom measure on \mathcal{S}=\{s_{i}\}_{i=1}^{n}, a set with n elements. Suppose that the basis functions are \phi_{i}(s)=\mathbf{1}_{\{s_{i}\}}(s). Then, \phi=x and the matrix A has entries A(j,i)=K(t_{j},s_{i})\nu(s_{i}).

  • Continuous model. Let \nu be the Lebesgue measure on \mathcal{S}=[0,1], and let \phi_{i}(s)=\mathbf{1}_{[s_{i-1},s_{i}]}(s) be piecewise constant functions on a partition of \mathcal{S} with 0=s_{0}<s_{1}<\ldots<s_{n}=1. Then, \phi=\sum_{i=1}^{n}x_{i}\phi_{i} and the matrix A has entries A(j,i)=K(t_{j},s_{i})(s_{i}-s_{i-1}).
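For concreteness, here is a minimal NumPy sketch of the two constructions above. The kernel K, the grids, and the sizes m and n are illustrative assumptions only, not the settings used later in the paper.

import numpy as np

def K(t, s):
    # hypothetical smooth kernel, for illustration only
    return np.exp(-(t - s) ** 2)

m, n = 200, 100
t = np.linspace(0.0, 1.0, m + 1)[1:]         # observation points t_1, ..., t_m

# Continuous model: nu = Lebesgue measure on S = [0,1], phi_i = 1_{[s_{i-1}, s_i]}
s = np.linspace(0.0, 1.0, n + 1)             # partition 0 = s_0 < s_1 < ... < s_n = 1
ds = np.diff(s)                              # s_i - s_{i-1}
A_cont = K(t[:, None], s[None, 1:]) * ds     # A(j,i) = K(t_j, s_i)(s_i - s_{i-1})

# Discrete model: nu = atom measure on {s_1, ..., s_n} with weights nu(s_i)
nu = np.full(n, 1.0 / n)                     # illustrative atom weights
A_disc = K(t[:, None], s[None, 1:]) * nu     # A(j,i) = K(t_j, s_i) nu(s_i)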

The default function spaces for \phi and y above are L^{2}_{\nu}(\mathcal{S}) and L^{2}_{\mu}(\mathcal{T}). The loss function \mathcal{E}(x)=\|Ax-b\|_{2}^{2} over L^{2}_{\nu}(\mathcal{S}) becomes

\mathcal{E}(\phi):=\left\|L_{K}\phi-y\right\|^{2}_{L^{2}_{\mu}(\mathcal{T})}=\langle L_{K}\phi,L_{K}\phi\rangle_{L^{2}_{\mu}(\mathcal{T})}-2\langle L_{K}\phi,y\rangle_{L^{2}_{\mu}(\mathcal{T})}+\left\|y\right\|^{2}_{L^{2}_{\mu}(\mathcal{T})}. (2.3)

Eq. 2.1 is a prototype of ill-posed inverse problems, dating back to Hadamard [14], and it remains a testbed for new regularization methods [36, 28, 15, 23].

The L^{2}_{\nu}(\mathcal{S}) norm is often the default choice for regularization. However, it has a major drawback: it does not take into account the operator L_{K}, particularly when L_{K} has zero eigenvalues, and it can lead to unstable solutions that may blow up in the small noise limit [22]. To avoid such instability, particularly for iterative methods, we introduce in the next sections a weighted function space and an RKHS that are adaptive to both the data and the model.

2.2 The function space of identifiability

We first introduce a weighted function space L^{2}_{\rho}(\mathcal{S}), where the measure \rho is defined as

\frac{d\rho}{d\nu}(s):=\frac{1}{Z}\int_{\mathcal{T}}|K(t,s)|\mu(dt),\quad\forall s\in\mathcal{S}, (2.4)

where Z=\int_{\mathcal{S}}\int_{\mathcal{T}}|K(t,s)|\mu(dt)\nu(ds) is a normalizing constant. This measure quantifies how the data explore the unknown function through the integral kernel K at the output set \mathcal{T}, i.e., through \{K(t_{j},\cdot)\}_{t_{j}\in\mathcal{T}}; hence it is referred to as the exploration measure. In particular, when eq. 1.1 is a discrete model, the exploration measure is the normalized column sum of the absolute values of the matrix A.
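In the discrete setting, \rho can therefore be computed directly from A; the following small sketch only makes the normalization explicit.

import numpy as np

def exploration_measure(A):
    # normalized column sums of |A|: rho_i is proportional to sum_j |A(j, i)|
    col = np.abs(A).sum(axis=0)
    return col / col.sum()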

The major advantage of the space L^{2}_{\rho}(\mathcal{S}) over the original space L^{2}_{\nu}(\mathcal{S}) is that it is adaptive to the specific setting of the inverse problem. In particular, this weighted space takes into account the structure of the integral kernel and the data points in \mathcal{T}. Thus, while the following introduction of the RKHS can be carried out in both L^{2}_{\rho}(\mathcal{S}) and L^{2}_{\nu}(\mathcal{S}), we will focus only on L^{2}_{\rho}(\mathcal{S}).

Next, we consider the variational inverse problem over L^{2}_{\rho}, and the goal is to find a minimizer of the loss function eq. 2.3 in L^{2}_{\rho}. Since the loss function is quadratic, its Hessian is a symmetric positive linear operator, and it has a unique minimizer in the linear subspace where the Hessian is strictly positive. We assign a name to this subspace in the next definition.

Definition 2.1 (Function space of identifiability)

In a variational inverse problem of minimizing a quadratic loss function \mathcal{E} in L^{2}_{\rho}, we call {\mathcal{L}_{\overline{G}}}=\frac{1}{2}\nabla^{2}\mathcal{E} the normal operator, where \nabla^{2}\mathcal{E} is the Hessian of \mathcal{E}, and we call H=\mathrm{Null}({\mathcal{L}_{\overline{G}}})^{\perp} the function space of identifiability (FSOI).

The next lemma specifies the FSOI for the loss function in eq. 2.3 (see [26] for its proof).

Lemma 2.2

Assume K\in C(\mathcal{T}\times\mathcal{S}). For \rho in eq. 2.4, define {\overline{G}}:\mathcal{S}\times\mathcal{S}\to\mathbb{R} as

{\overline{G}}(s,s^{\prime}):=\frac{G(s,s^{\prime})}{\frac{d\rho}{d\nu}(s)\frac{d\rho}{d\nu}(s^{\prime})},\qquad G(s,s^{\prime}):=\int_{\mathcal{T}}K(t,s)K(t,s^{\prime})\mu(dt). (2.5)
  • (a)

    The normal operator for \mathcal{E} in eq. 2.3 over L^{2}_{\rho} is {\mathcal{L}_{\overline{G}}}:L^{2}_{\rho}\to L^{2}_{\rho} defined by

    {\mathcal{L}_{\overline{G}}}\phi(s):=\int_{\mathcal{S}}\phi(s^{\prime}){\overline{G}}(s,s^{\prime})\rho(ds^{\prime}), (2.6)

    and the loss function can be written as

    \mathcal{E}(\phi)=\langle{\mathcal{L}_{\overline{G}}}\phi,\phi\rangle_{L^{2}_{\rho}}-2\langle\phi^{D},\phi\rangle_{L^{2}_{\rho}}+const., (2.7)

    where \phi^{D} comes from the Riesz representation such that \langle\phi^{D},\phi\rangle_{L^{2}_{\rho}}=\langle L_{K}\phi,y\rangle_{L^{2}_{\mu}(\mathcal{T})} for any \phi\in L^{2}_{\rho}.

  • (b)

    {\mathcal{L}_{\overline{G}}} is compact, self-adjoint, and positive. Hence, its eigenvalues \{\lambda_{i}\}_{i\geq 1} converge to zero and its orthonormal eigenfunctions \{\psi_{i}\}_{i} form a complete basis of L^{2}_{\rho}(\mathcal{S}).

  • (c)

    The FSOI is H:=\overline{\mathrm{span}\{\psi_{i}\}_{i:\lambda_{i}>0}}\subset L^{2}_{\rho}(\mathcal{S}), and the unique minimizer of \mathcal{E} in H is \widehat{\phi}={\mathcal{L}_{\overline{G}}}^{-1}\phi^{D}, where {\mathcal{L}_{\overline{G}}}^{-1} is the inverse of {\mathcal{L}_{\overline{G}}}:H\to L^{2}_{\rho}.

Lemma 2.2 reveals the cause of the ill-posedness, and provides insights on regularization:

  • The variational inverse problem is well-defined only in the FSOI H, which can be a proper subset of L^{2}_{\rho}. Its ill-posedness in H depends on the smallest eigenvalue of the operator {\mathcal{L}_{\overline{G}}} and the error in \phi^{D}.

  • When the data is noiseless, the loss function can only identify the H-projection of the true input function. When the data is noisy, the minimizer {\mathcal{L}_{\overline{G}}}^{-1}\phi^{D} is ill-defined in L^{2}_{\rho} when \phi^{D}\notin{\mathcal{L}_{\overline{G}}}(H).

As a result, when regularizing the ill-posed problem, an important task is to ensure that the search takes place in the FSOI and to remove the noise-contaminated components that make \phi^{D}\notin{\mathcal{L}_{\overline{G}}}(H).

2.3 A Data-adaptive RKHS

Our data-adaptive RKHS is the RKHS with {\overline{G}} in eq. 2.5 as a reproducing kernel. Hence, it is adaptive to the integral kernel K and the data in the model. When its norm is used for regularization, it ensures that the search takes place in the FSOI because its L^{2}_{\rho} closure is the FSOI; also, it penalizes the components in \phi^{D} corresponding to the small singular values.

The next lemma characterizes this RKHS, and we refer to [26] for its proof.

Lemma 2.3 (Characterization of the adaptive RKHS)

Assume K\in C(\mathcal{T}\times\mathcal{S}). The RKHS H_{G} with {\overline{G}} as its reproducing kernel satisfies the following properties.

  1. (a)

    H_{G}:={\mathcal{L}_{\overline{G}}}^{\frac{1}{2}}(L^{2}_{\rho}(\mathcal{S})) has inner product \langle\phi,\phi\rangle_{H_{G}}=\langle{\mathcal{L}_{\overline{G}}}^{-\frac{1}{2}}\phi,{\mathcal{L}_{\overline{G}}}^{-\frac{1}{2}}\phi\rangle_{L^{2}_{\rho}(\mathcal{S})}.

  2. (b)

    \{\sqrt{\lambda_{i}}\psi_{i}\}_{\lambda_{i}>0} is an orthonormal basis of H_{G}, where \{(\lambda_{i},\psi_{i})\}_{i} are the eigen-pairs of {\mathcal{L}_{\overline{G}}}.

  3. (c)

    For any \phi=\sum_{i=1}^{\infty}c_{i}\psi_{i}\in H_{G}, we have

    \langle L_{K}\phi,L_{K}\phi\rangle_{L^{2}_{\mu}(\mathcal{T})}=\sum_{i=1}^{\infty}\lambda_{i}c_{i}^{2},\quad\|\phi\|^{2}_{L^{2}_{\rho}}=\sum_{i=1}^{\infty}c_{i}^{2},\quad\|\phi\|^{2}_{H_{G}}=\sum_{i=1}^{\infty}\lambda_{i}^{-1}c_{i}^{2}.
  4. (d)

    H=\overline{H_{G}} with the closure taken in L^{2}_{\rho}(\mathcal{S}), where H=\overline{\mathrm{span}\{\psi_{i}\}_{i:\lambda_{i}>0}} is the FSOI.

The next theorem shows the computation of the RKHS norm for the problem eq. 1.1 when it is written in the form eq. 2.1–eq. 2.2. A key component is solving the eigenvalues of {\mathcal{L}_{\overline{G}}} through a generalized eigenvalue problem.

Theorem 2.4 (Computation of RKHS norm)

Suppose that eq. 1.1 is equivalent to eq. 2.1 under basis functions \{\phi_{i}\}_{i=1}^{n} with n\leq\infty and eq. 2.2. Let B with entries B(i,j)=\langle\phi_{i},\phi_{j}\rangle_{L^{2}_{\rho}} be the non-singular basis matrix, where \rho is the measure defined in eq. 2.4. Then, the operator {\mathcal{L}_{\overline{G}}} in eq. 2.6 has eigenvalues (\lambda_{1},\ldots,\lambda_{n}) solved by the generalized eigenvalue problem:

A^{\top}AV=BV\Lambda,\quad \text{s.t.}\ \ V^{\top}BV=I_{n},\quad\Lambda=\mathrm{diag}(\lambda_{1},\ldots,\lambda_{n}), (2.8)

and the eigenfunctions are \{\psi_{k}=\sum_{j=1}^{n}V_{jk}\phi_{j}\}_{k}. The RKHS norm of \phi=\sum_{i=1}^{n}x_{i}\phi_{i} satisfies

\|\phi\|_{H_{G}}^{2}=\|x\|_{C_{rkhs}}^{2}=x^{\top}C_{rkhs}x,\qquad C_{rkhs}=(V\Lambda V^{\top})^{\dagger}=B(A^{\top}A)^{\dagger}B,\quad C_{rkhs}^{\dagger}=B^{-1}(A^{\top}A)B^{-1}. (2.9)

In particular, if B=I_{n}, we have C_{rkhs}=(A^{\top}A)^{\dagger}.

Proof. Denote \boldsymbol{\Phi}=(\phi_{1},\ldots,\phi_{n})^{\top} and \boldsymbol{\Psi}=(\psi_{1},\ldots,\psi_{n})^{\top}. We first prove that the eigenvalues of {\mathcal{L}_{\overline{G}}} are solved by eq. 2.8. Suppose \{(\lambda_{i},\psi_{i})\}_{i=1}^{n} are the eigenvalues and eigenfunctions of {\mathcal{L}_{\overline{G}}} over L^{2}_{\rho}, with \{\psi_{i}\} being an orthonormal basis of L^{2}_{\rho}(\mathcal{S}). Since \mathcal{H}=\mathrm{span}\{\phi_{i}\}_{i=1}^{n}\supseteq{\mathcal{L}_{\overline{G}}}(L^{2}_{\rho}), there exists V\in\mathbb{R}^{n\times n} such that \boldsymbol{\Psi}=V^{\top}\boldsymbol{\Phi}, i.e., \psi_{k}=\sum_{j=1}^{n}V_{jk}\phi_{j} for each 1\leq k\leq n. Then, the task is to verify that V and \Lambda=\mathrm{Diag}(\lambda_{1},\ldots,\lambda_{n}) satisfy A^{\top}AV=BV\Lambda and V^{\top}BV=I_{n}.

The orthonormality of {ψi}\{\psi_{i}\} implies that

I_{n}=\big(\langle\psi_{k},\psi_{l}\rangle_{L^{2}_{\rho}}\big)_{1\leq k,l\leq n}=\big(\langle\sum_{i=1}^{n}V_{ik}\phi_{i},\sum_{j=1}^{n}V_{jl}\phi_{j}\rangle_{L^{2}_{\rho}}\big)_{1\leq k,l\leq n}=V^{\top}BV.

Note that [A^{\top}A](j,i)=\langle L_{K}\phi_{i},L_{K}\phi_{j}\rangle_{L^{2}_{\mu}(\mathcal{T})}=\langle\phi_{j},{\mathcal{L}_{\overline{G}}}\phi_{i}\rangle_{L^{2}_{\rho}} for 1\leq i,j\leq n. Then,

\langle\phi_{j},{\mathcal{L}_{\overline{G}}}\psi_{k}\rangle_{L^{2}_{\rho}}=\sum_{i=1}^{n}[A^{\top}A](j,i)V_{ik}.

Meanwhile, the eigen-equation {\mathcal{L}_{\overline{G}}}\psi_{k}=\lambda_{k}\psi_{k} implies that for each \phi_{j},

\langle\phi_{j},{\mathcal{L}_{\overline{G}}}\psi_{k}\rangle_{L^{2}_{\rho}}=\lambda_{k}\langle\phi_{j},\psi_{k}\rangle_{L^{2}_{\rho}}=\lambda_{k}\langle\phi_{j},\sum_{i=1}^{n}V_{ik}\phi_{i}\rangle_{L^{2}_{\rho}}=\lambda_{k}\sum_{i=1}^{n}B_{ji}V_{ik},

i.e., \big(\langle\phi_{j},{\mathcal{L}_{\overline{G}}}\psi_{k}\rangle_{L^{2}_{\rho}}\big)=BV\Lambda. Hence, these two equations imply that A^{\top}AV=BV\Lambda.

Next, to compute the norm of \phi=\sum_{i=1}^{n}x_{i}\phi_{i}\in H_{G}, we write it as \phi=x^{\top}\boldsymbol{\Phi}=x^{\top}V^{-1}\boldsymbol{\Psi}. Then, its norm is

\|\phi\|_{H_{G}}^{2}=\sum_{k=1}^{n}\lambda_{k}^{-1}\big(x^{\top}V^{-1}\big)_{k}^{2}=x^{\top}V^{-1}\Lambda^{\dagger}V^{-\top}x=x^{\top}(V\Lambda V^{\top})^{\dagger}x.

Thus, C_{rkhs}=(V\Lambda V^{\top})^{\dagger}=B(A^{\top}A)^{\dagger}B and C_{rkhs}^{\dagger}=V\Lambda V^{\top}=B^{-1}(A^{\top}A)B^{-1}.

In particular, when eq. 1.1 is either a discrete model or a discretization of eq. 2.1 based on a Riemann sum approximation of the integral, the exploration measure \rho is the normalized column sum of the absolute values of the matrix A, and B=\mathrm{diag}(\rho). See Section 5.1 for details.
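As an illustration of Theorem 2.4 in this diagonal-B setting, the following sketch computes \|x\|_{C_{rkhs}}^{2} through the generalized eigenvalue problem eq. 2.8, using SciPy's symmetric generalized eigensolver; the tolerance for discarding zero eigenvalues is an ad hoc choice.

import numpy as np
from scipy.linalg import eigh

def rkhs_norm_sq(A, rho, x, tol=1e-12):
    B = np.diag(rho)
    lam, V = eigh(A.T @ A, B)                 # A^T A V = B V Lambda with V^T B V = I, as in eq. 2.8
    lam = np.clip(lam, 0.0, None)
    c = V.T @ (B @ x)                         # coefficients of x in the eigenbasis {psi_k}
    keep = lam > tol * max(lam.max(), 1.0)    # restrict to the FSOI (positive eigenvalues)
    return float(np.sum(c[keep] ** 2 / lam[keep]))   # sum_k lambda_k^{-1} c_k^2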

2.4 DARTR: data-adaptive RKHS Tikhonov regularization

We review DARTR, a data-adaptive RKHS Tikhonov regularization algorithm introduced in [26].

Specifically, it solves the problem eq. 1.1 with regularization:

(\widehat{x}_{\lambda_{*}},\lambda_{*})=\mathop{\mathrm{argmin}}_{x\in\mathbb{R}^{n},\lambda\in\mathbb{R}^{+}}\mathcal{E}_{\lambda}(x),\quad\text{ where }\mathcal{E}_{\lambda}(x):=\|Ax-b\|^{2}+\lambda\|x\|_{C_{rkhs}}^{2},

where the norm \|\cdot\|_{C_{rkhs}} is the DA-RKHS norm introduced in Theorem 2.4. A direct approach to minimizing \mathcal{E}_{\lambda}(x) is to solve (A^{\top}A+\lambda C_{rkhs})x_{\lambda}=A^{\top}b. However, the computation of C_{rkhs} requires a pseudo-inverse that may cause numerical instability.

DARTR introduces a transformation matrix C_{*}:=V\Lambda^{1/2} to avoid using the pseudo-inverse. Note that C_{*}^{\top}C_{rkhs}C_{*}=\begin{pmatrix}I_{r}&0\\ 0&0\end{pmatrix}:=\mathbf{I}_{r}, where I_{r} is the identity matrix of rank r, with r the number of positive eigenvalues in \Lambda. Then, the linear equation (A^{\top}A+\lambda C_{rkhs})x_{\lambda}=A^{\top}b is equivalent to

(C_{*}A^{\top}AC_{*}+\lambda\mathbf{I}_{r})\widetilde{x}_{\lambda}=C_{*}A^{\top}b (2.10)

with \widetilde{x}_{\lambda}=C_{*}^{-1}x_{\lambda}. Thus, DARTR computes \widetilde{x}_{\lambda} from the above equation by least squares with minimal norm, and returns x_{\lambda}=C_{*}\widetilde{x}_{\lambda}.
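A minimal sketch of this direct solve is given below. It assumes B=\mathrm{diag}(\rho), uses SciPy's generalized eigensolver for eq. 2.8, and left-multiplies by C_{*}^{\top} (consistent with the identity C_{*}^{\top}C_{rkhs}C_{*}=\mathbf{I}_{r} above); the regularization parameter lam_reg is taken as an input rather than selected.

import numpy as np
from scipy.linalg import eigh

def dartr(A, b, rho, lam_reg):
    B = np.diag(rho)
    lam, V = eigh(A.T @ A, B)                        # generalized eigenproblem eq. 2.8
    lam = np.clip(lam, 0.0, None)
    C_star = V * np.sqrt(lam)                        # C_* = V Lambda^{1/2} (column scaling)
    M = C_star.T @ (A.T @ A) @ C_star + lam_reg * np.eye(A.shape[1])
    rhs = C_star.T @ (A.T @ b)
    x_tilde = np.linalg.lstsq(M, rhs, rcond=None)[0] # least squares with minimal norm
    return C_star @ x_tilde                          # x_lambda = C_* x_tilde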

DARTR is a direct method based on matrix decomposition, and it takes O(mn^{2}+n^{3}) flops. Hence, it is computationally infeasible when n is large. The iterative method in the next section implements the RKHS regularization in a scalable fashion.

3 Iterative Regularization by DA-RKHS

This section introduces a subspace projection method tailored to utilize the DA-RKHS for iterative regularization. As an iterative method, it achieves scalability by accessing the coefficient matrix only via matrix-vector multiplications, producing a sequence of estimators until reaching a desired solution. This section follows the notation conventions in Table 1.

Table 1: Table of notations.
A,B,C — matrix or array, denoted by capital letters
b,c,x,y,z,u,v — vector, denoted by regular letters
\alpha,\beta,\gamma — scalar, denoted by Greek letters
\mathcal{R}(A) and \mathcal{N}(A) — the range and null spaces of matrix A

3.1 Overview

Our regularization method is based on subspace projection in the DA-RKHS. It iteratively constructs a sequence of linear subspaces \mathcal{S}_{k} of the DA-RKHS (\mathbb{R}^{n},\langle\cdot,\cdot\rangle_{C_{rkhs}}), and recursively solves the projected problems

x_{k}=\mathop{\mathrm{argmin}}_{x\in\mathcal{X}_{k}}\|x\|_{C_{rkhs}},\ \ \mathcal{X}_{k}=\{x:\min_{x\in\mathcal{S}_{k}}\|Ax-b\|_{2}\}. (3.1)

This process yields a sequence of solutions \{x_{k}\}, each emerging from its corresponding subspace. We ensure the uniqueness of the solution within each iteration, as detailed in Theorem 4.7. The iteration proceeds until it meets an early stopping criterion, designed to exclude excessive noisy components and thereby achieve effective regularization. The spaces \mathcal{S}_{k} are called the solution subspaces, and the iteration number k plays the role of the regularization parameter.

Algorithm 1 outlines this procedure, which is a recursion of the following three parts.

  1. P1

    Construct the solution subspaces. We introduce a new generalized Golub-Kahan bidiagonalization (gGKB) to construct the solution subspaces in the DA-RKHS iteratively. The procedure is presented in Algorithm 2.

  2. P2

    Recursively update the solution to the projected problem. We solve the least squares problem in the solution subspaces in eq. 3.1 efficiently by a new LSQR-type algorithm, Algorithm 3, updating \|x_{k}\|_{C_{rkhs}} and the residual norm \|Ax-b\|_{2} without even computing the residual.

  3. P3

    Regularize by early stopping. We select the optimal k by either the discrepancy principle (DP), when we have an accurate estimate of \|\mathbf{w}\|_{2}, or the L-curve criterion otherwise.

Algorithm 1 iDARR: Iterative Data-Adaptive RKHS Regularization

Require: A\in\mathbb{R}^{m\times n}, b\in\mathbb{R}^{m}, B=\mathrm{diag}(\rho), x_{0}=\mathbf{0}, \bar{x}_{0}=\mathbf{0}, where \rho is the exploration measure in eq. 2.4.

1:for k=1,2,\ldots do
2:     P1. Compute u_{k}, z_{k}, \bar{z}_{k}, \alpha_{k}, \beta_{k} by gGKB in Algorithm 2.
3:     P2. Update x_{k}, \bar{\gamma}_{k+1}, \|x_{k}\|_{C_{rkhs}}, etc. by Algorithm 3.
4:     P3. Stop at iteration k_{*} if the early stopping criterion is satisfied. \triangleright L-curve or DP

Ensure: Final solution x_{k_{*}}

In the next subsection, we present details for these key parts. Then, we analyze the computational complexity of the algorithm.

3.2 Algorithm details and derivations

This subsection presents detailed derivations for the three parts P1-P3 in Algorithm 1.

P1. Construct the solution subspaces. We construct the solution subspaces by carefully incorporating the regularization term \|\cdot\|_{C_{rkhs}}^{2} into the Golub-Kahan bidiagonalization (GKB) process. A key point is to use C_{rkhs}^{\dagger}=B^{-1}A^{\top}AB^{-1} to avoid explicitly computing C_{rkhs}, which would involve the costly spectral decomposition of the normal operator; see eq. 2.9 in Theorem 2.4.

Consider first the case where C_{rkhs} is positive definite. In this scenario, A has full column rank, and the C_{rkhs}-inner product Hilbert space (\mathbb{R}^{n},\langle\cdot,\cdot\rangle_{C_{rkhs}}) is a discrete representation of the RKHS H_{G} with the given basis functions. Note that the true solution is mapped to the noisy b by A:(\mathbb{R}^{n},\langle\cdot,\cdot\rangle_{C_{rkhs}})\to(\mathbb{R}^{m},\langle\cdot,\cdot\rangle_{2}). Let A^{*}:(\mathbb{R}^{m},\langle\cdot,\cdot\rangle_{2})\to(\mathbb{R}^{n},\langle\cdot,\cdot\rangle_{C_{rkhs}}) be the adjoint of A, i.e., \langle Ax,b\rangle_{2}=\langle x,A^{*}b\rangle_{C_{rkhs}} for any x\in\mathbb{R}^{n} and b\in\mathbb{R}^{m}. By definition, the matrix form of A^{*} is

A^{*}=C_{rkhs}^{-1}A^{\top}, (3.2)

since \langle Ax,b\rangle_{2}=x^{\top}A^{\top}b and \langle x,A^{*}b\rangle_{C_{rkhs}}=x^{\top}C_{rkhs}A^{*}b for any x and b.

The Golub-Kahan bidiagonalization (GKB) process recursively constructs orthonormal bases for these two Hilbert spaces, starting with the vector b, as follows:

\beta_{1}u_{1}=b,\ \ \alpha_{1}z_{1}=A^{*}u_{1}, (3.3a)
\beta_{i+1}u_{i+1}=Az_{i}-\alpha_{i}u_{i}, (3.3b)
\alpha_{i+1}z_{i+1}=A^{*}u_{i+1}-\beta_{i+1}z_{i}, (3.3c)

where u_{i}\in(\mathbb{R}^{m},\langle\cdot,\cdot\rangle_{2}), z_{i}\in(\mathbb{R}^{n},\langle\cdot,\cdot\rangle_{C_{rkhs}}) with z_{0}=\boldsymbol{0}, and \alpha_{i}, \beta_{i} are normalizing factors such that \|u_{i}\|_{2}=\|z_{i}\|_{C_{rkhs}}=1. The iteration starts with u_{1}=b/\beta_{1}, where \beta_{1}=\|b\|_{2}. Using A^{*} in eq. 3.2, we write eq. 3.3c as

\alpha_{i+1}z_{i+1}=C_{rkhs}^{-1}A^{\top}u_{i+1}-\beta_{i+1}z_{i} (3.4)

with \alpha_{i+1}=\|C_{rkhs}^{-1}A^{\top}u_{i+1}-\beta_{i+1}z_{i}\|_{C_{rkhs}}. To compute \alpha_{i+1} without explicitly computing C_{rkhs}, define \bar{z}_{i}=C_{rkhs}z_{i}. Then we have

\alpha_{i+1}\bar{z}_{i+1}=A^{\top}u_{i+1}-\beta_{i+1}\bar{z}_{i}, (3.5)

where \bar{z}_{0}:=\boldsymbol{0}. Let p=A^{\top}u_{i+1}-\beta_{i+1}\bar{z}_{i}. Then we obtain \alpha_{i+1}=\|C_{rkhs}^{-1}p\|_{C_{rkhs}}=(p^{\top}C_{rkhs}^{-1}p)^{1/2}, which uses C_{rkhs}^{-1}=B^{-1}A^{\top}AB^{-1} without computing C_{rkhs}.

Next, consider the case where C_{rkhs} is positive semidefinite. The iterative process remains the same with C_{rkhs}^{-1} replaced by the pseudo-inverse C_{rkhs}^{\dagger}, because C_{rkhs}^{\dagger}=B^{-1}A^{\top}AB^{-1} has the same form as C_{rkhs}^{-1} in the non-singular case. Specifically, the recursive relation eq. 3.4 becomes

\alpha_{i+1}z_{i+1}=C_{rkhs}^{\dagger}A^{\top}u_{i+1}-\beta_{i+1}z_{i}. (3.6)

To compute \alpha_{i+1}, we use the property that z_{i}\in\mathcal{R}(C_{rkhs}), which will be proved in Proposition 4.3. Note that C_{rkhs}^{\dagger}C_{rkhs}=P_{\mathcal{N}(C_{rkhs})^{\perp}}=P_{\mathcal{R}(C_{rkhs})} since C_{rkhs} is symmetric, where P_{\mathcal{X}} is the projection operator onto the subspace \mathcal{X}. It follows that C_{rkhs}^{\dagger}C_{rkhs}z_{i}=z_{i}. Therefore, eq. 3.6 becomes

\alpha_{i+1}z_{i+1}=C_{rkhs}^{\dagger}(A^{\top}u_{i+1}-\beta_{i+1}C_{rkhs}z_{i}).

Letting \bar{z}_{i}=C_{rkhs}z_{i} and p=A^{\top}u_{i+1}-\beta_{i+1}\bar{z}_{i} again, we get \alpha_{i+1}=\|C_{rkhs}^{\dagger}p\|_{C_{rkhs}}=(p^{\top}C_{rkhs}^{\dagger}p)^{1/2}.

Thus, the two cases of C_{rkhs} lead to the same iterative process. We summarize the iterative process in Algorithm 2 and call it the generalized Golub-Kahan bidiagonalization (gGKB).

Algorithm 2 Generalized Golub-Kahan bidiagonalization (gGKB)

Require: A\in\mathbb{R}^{m\times n}, b\in\mathbb{R}^{m}, B=\mathrm{diag}(\rho)

1:Initialize: let \beta_{1}=\|b\|_{2},  u_{1}=b/\beta_{1}, and compute p=A^{\top}u_{1},   s=B^{-1}A^{\top}AB^{-1}p.
2:Let \alpha_{1}=(s^{\top}p)^{1/2},  z_{1}=s/\alpha_{1},  \bar{z}_{1}=p/\alpha_{1}.
3:for i=1,2,\dots,k do
4:     r=Az_{i}-\alpha_{i}u_{i},   \beta_{i+1}=\|r\|_{2},  u_{i+1}=r/\beta_{i+1};
5:     p=A^{\top}u_{i+1}-\beta_{i+1}\bar{z}_{i},   s=B^{-1}A^{\top}AB^{-1}p; \triangleright C_{rkhs}^{\dagger}=B^{-1}A^{\top}AB^{-1}
6:     \alpha_{i+1}=(s^{\top}p)^{1/2},  z_{i+1}=s/\alpha_{i+1},  \bar{z}_{i+1}=p/\alpha_{i+1}. \triangleright \bar{z}_{i}=C_{rkhs}z_{i}

Ensure: \{\alpha_{i},\beta_{i}\}_{i=1}^{k+1},  \{u_{i},z_{i},\bar{z}_{i}\}_{i=1}^{k+1}
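For readers who prefer code, here is a minimal NumPy sketch of Algorithm 2. It assumes B=\mathrm{diag}(\rho) is stored as the vector rho, applies C_{rkhs}^{\dagger} only through matrix-vector products, and uses an ad hoc tolerance in place of an exact check of \alpha_{k}\beta_{k}>0.

import numpy as np

def gGKB(A, b, rho, k, tol=1e-14):
    def apply_Cdag(p):                         # C_rkhs^† p = B^{-1} A^T A B^{-1} p
        return (A.T @ (A @ (p / rho))) / rho
    alphas, betas, U, Z, Zbar = [], [], [], [], []
    beta = np.linalg.norm(b)
    u = b / beta
    p = A.T @ u
    s = apply_Cdag(p)
    alpha = np.sqrt(s @ p)
    z, zbar = s / alpha, p / alpha
    alphas.append(alpha); betas.append(beta); U.append(u); Z.append(z); Zbar.append(zbar)
    for _ in range(k):
        r = A @ z - alpha * u                  # line 4 of Algorithm 2
        beta = np.linalg.norm(r)
        if beta < tol:
            break
        u = r / beta
        p = A.T @ u - beta * zbar              # line 5
        s = apply_Cdag(p)
        alpha = np.sqrt(max(s @ p, 0.0))
        if alpha < tol:
            break
        z, zbar = s / alpha, p / alpha         # line 6
        alphas.append(alpha); betas.append(beta); U.append(u); Z.append(z); Zbar.append(zbar)
    return alphas, betas, U, Z, Zbar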

Suppose the gGKB process terminates at step k_{t}:=\max_{k\geq 1}\{\alpha_{k}\beta_{k}>0\}. We show in Propositions 4.4 and 4.5 that the output vectors \{u_{i}\}_{i=1}^{k_{t}} and \{z_{i}\}_{i=1}^{k_{t}} are orthonormal in (\mathbb{R}^{m},\langle\cdot,\cdot\rangle_{2}) and (\mathbb{R}^{n},\langle\cdot,\cdot\rangle_{C_{rkhs}}), respectively. In particular, they span the two Krylov subspaces generated by \{AC_{rkhs}^{\dagger}A^{\top},b\} and \{C_{rkhs}^{\dagger}A^{\top}A,C_{rkhs}^{\dagger}A^{\top}b\}, respectively.

In matrix form, the k-step gGKB process with k<k_{t}, starting with the vector b, produces a 2-orthonormal matrix U_{k+1}=(u_{1},\dots,u_{k+1})\in\mathbb{R}^{m\times(k+1)} (i.e., U_{k+1}^{\top}U_{k+1}=I_{k+1}), a C_{rkhs}-orthonormal matrix Z_{k+1}=(z_{1},\dots,z_{k+1})\in\mathbb{R}^{n\times(k+1)}, and a bidiagonal matrix

B_{k}=\begin{pmatrix}\alpha_{1}&&&\\ \beta_{2}&\alpha_{2}&&\\ &\beta_{3}&\ddots&\\ &&\ddots&\alpha_{k}\\ &&&\beta_{k+1}\end{pmatrix}\in\mathbb{R}^{(k+1)\times k} (3.7)

such that the recursions in eq. 3.3a, eq. 3.3b and eq. 3.6 can be written as

\beta_{1}U_{k+1}e_{1}=b, (3.8a)
AZ_{k}=U_{k+1}B_{k}, (3.8b)
C_{rkhs}^{\dagger}A^{\top}U_{k+1}=Z_{k}B_{k}^{T}+\alpha_{k+1}z_{k+1}e_{k+1}^{\top}, (3.8c)

where e_{1} and e_{k+1} are the first and (k+1)-th columns of I_{k+1}. We emphasize that B_{k} is a bidiagonal matrix of full column rank, since \alpha_{i},\beta_{i}>0 for all i\leq k+1.

P2. Recursively update the solution to the projected problem. For each k\leq k_{t}, i.e., before the gGKB terminates, we solve eq. 3.1 in the subspace

\mathcal{S}_{k}:=\mathrm{span}\{z_{1},\dots,z_{k}\} (3.9)

and compute the RKHS norm of the solution.

We first show the uniqueness of the solution to eq. 3.1. From eq. 3.8b we have B_{k}=U_{k+1}^{\top}AZ_{k}, which implies that B_{k} is a projection of A onto the two subspaces \mathrm{span}\{U_{k+1}\} and \mathrm{span}\{Z_{k}\}. Since any vector x in \mathcal{S}_{k} can be written as x=Z_{k}y with y\in\mathbb{R}^{k}, we obtain from eq. 3.8a and eq. 3.8b that for any x=Z_{k}y,

\min_{x=Z_{k}y}\|Ax-b\|_{2}=\min_{y\in\mathbb{R}^{k}}\|AZ_{k}y-U_{k+1}\beta_{1}e_{1}\|_{2}=\min_{y\in\mathbb{R}^{k}}\|U_{k+1}B_{k}y-U_{k+1}\beta_{1}e_{1}\|_{2}=\min_{y\in\mathbb{R}^{k}}\|B_{k}y-\beta_{1}e_{1}\|_{2}. (3.10)

Since k\leq k_{t}, B_{k} has full column rank, so this k-dimensional least squares problem has a unique solution. Therefore, the unique solution to eq. 3.1 is

x_{k}=Z_{k}y_{k},\ \ y_{k}=\mathop{\mathrm{argmin}}_{y\in\mathbb{R}^{k}}\|B_{k}y-\beta_{1}e_{1}\|_{2}. (3.11)

We note that the above uniqueness is independent of the specific construction of the basis vectors \{z_{k}\}. In general, as long as \mathcal{S}_{k}\subseteq\mathcal{R}(C_{rkhs}), the solution to eq. 3.1 is unique; see Theorem 4.7.

Importantly, we compute x_{k} recursively without explicitly solving for y_{k} in eq. 3.11, avoiding the O(k^{3}) computational cost. To this end, we adopt a procedure similar to the LSQR algorithm in [30] to iteratively update x_{k} from x_{0}=\mathbf{0}. It starts from the following Givens QR factorization:

Q_{k}\begin{pmatrix}B_{k}&\beta_{1}e_{1}\end{pmatrix}=\begin{pmatrix}R_{k}&f_{k}\\ &\bar{\gamma}_{k+1}\end{pmatrix}:=\left(\begin{array}{ccccc:c}\rho_{1}&\theta_{2}&&&&\gamma_{1}\\ &\rho_{2}&\theta_{3}&&&\gamma_{2}\\ &&\ddots&\ddots&&\vdots\\ &&&\rho_{k-1}&\theta_{k}&\gamma_{k-1}\\ &&&&\rho_{k}&\gamma_{k}\\ \hdashline &&&&&\bar{\gamma}_{k+1}\end{array}\right),

where the orthogonal matrix Q_{k} is the product of a series of Givens rotation matrices, and R_{k} is a bidiagonal upper triangular matrix; see [13, §5.2.5]. We implement the Givens QR factorization using the procedure in [30], which recursively zeros out the subdiagonal elements \beta_{i} for 2\leq i\leq k+1. Specifically, at the i-th step, a Givens rotation zeros out \beta_{i+1} by

\begin{pmatrix}c_{i}&s_{i}\\ s_{i}&-c_{i}\end{pmatrix}\begin{pmatrix}\bar{\rho}_{i}&0&\bar{\gamma}_{i}\\ \beta_{i+1}&\alpha_{i+1}&0\end{pmatrix}=\begin{pmatrix}\rho_{i}&\theta_{i+1}&\gamma_{i}\\ 0&\bar{\rho}_{i+1}&\bar{\gamma}_{i+1}\end{pmatrix},

where the entries c_{i} and s_{i} of the Givens rotation satisfy c_{i}^{2}+s_{i}^{2}=1, and the elements \rho_{i}, \theta_{i+1}, \bar{\rho}_{i+1}, \gamma_{i}, \bar{\gamma}_{i+1} are updated recursively; see [30] for more details.

As a result of the QR factorization, we can write

\|B_{k}y-\beta_{1}e_{1}\|_{2}^{2}=\Big\|Q_{k}\begin{pmatrix}B_{k}&\beta_{1}e_{1}\end{pmatrix}\begin{pmatrix}y\\ -1\end{pmatrix}\Big\|_{2}^{2}=\|R_{k}y-f_{k}\|_{2}^{2}+|\bar{\gamma}_{k+1}|^{2}. (3.12)

Hence, the solution to \min_{y\in\mathbb{R}^{k}}\|B_{k}y-\beta_{1}e_{1}\|_{2} is y_{k}=R_{k}^{-1}f_{k}. Factorizing R_{k} as

R_{k}=D_{k}\bar{R}_{k},\ \ D_{k}:=\begin{pmatrix}\rho_{1}&&&\\ &\rho_{2}&&\\ &&\ddots&\\ &&&\rho_{k}\end{pmatrix},\ \bar{R}_{k}:=\begin{pmatrix}1&\theta_{2}/\rho_{1}&&\\ &1&\theta_{3}/\rho_{2}&\\ &&\ddots&\theta_{k}/\rho_{k-1}\\ &&&1\end{pmatrix}

and denoting W_{k}=Z_{k}\bar{R}_{k}^{-1}=(w_{1},\dots,w_{k}), we get

x_{k}=Z_{k}y_{k}=Z_{k}R_{k}^{-1}f_{k}=(Z_{k}\bar{R}_{k}^{-1})(D_{k}^{-1}f_{k})=W_{k}(D_{k}^{-1}f_{k})=\sum_{i=1}^{k}(\gamma_{i}/\rho_{i})w_{i}.

Updating x_{k} recursively, and solving W_{k}\bar{R}_{k}=Z_{k} by back substitution, we obtain:

x_{i}=x_{i-1}+(\gamma_{i}/\rho_{i})w_{i},\ \ \ w_{i+1}=z_{i+1}-(\theta_{i+1}/\rho_{i})w_{i},\quad\forall 1\leq i\leq k. (3.13)

Lastly, we compute \|x_{i}\|_{C_{rkhs}} without an explicit C_{rkhs}. We have from eq. 3.13:

C_{rkhs}x_{i}=C_{rkhs}x_{i-1}+(\gamma_{i}/\rho_{i})C_{rkhs}w_{i},\ \ C_{rkhs}w_{i+1}=C_{rkhs}z_{i+1}-(\theta_{i+1}/\rho_{i})C_{rkhs}w_{i}.

Letting \bar{x}_{i}=C_{rkhs}x_{i} and \bar{w}_{i}=C_{rkhs}w_{i}, and recalling that \bar{z}_{i}=C_{rkhs}z_{i}, we have

\|x_{i}\|_{C_{rkhs}}^{2}=x_{i}^{\top}\bar{x}_{i},\quad\bar{x}_{i}=\bar{x}_{i-1}+(\gamma_{i}/\rho_{i})\bar{w}_{i},\ \ \ \bar{w}_{i+1}=\bar{z}_{i+1}-(\theta_{i+1}/\rho_{i})\bar{w}_{i}. (3.14)
Algorithm 3 Updating procedure
1:Let x_{0}=\mathbf{0},\ \bar{x}_{0}=\mathbf{0},\ w_{1}=z_{1},\ \bar{w}_{1}=\bar{z}_{1},\ \bar{\gamma}_{1}=\beta_{1},\ \bar{\rho}_{1}=\alpha_{1}
2:for i=1,2,\ldots,k do
3:     \rho_{i}=(\bar{\rho}_{i}^{2}+\beta_{i+1}^{2})^{1/2},\ c_{i}=\bar{\rho}_{i}/\rho_{i},\ s_{i}=\beta_{i+1}/\rho_{i}
4:     \theta_{i+1}=s_{i}\alpha_{i+1},\ \bar{\rho}_{i+1}=-c_{i}\alpha_{i+1},\ \gamma_{i}=c_{i}\bar{\gamma}_{i},\ \bar{\gamma}_{i+1}=s_{i}\bar{\gamma}_{i}
5:     x_{i}=x_{i-1}+(\gamma_{i}/\rho_{i})w_{i},\ w_{i+1}=z_{i+1}-(\theta_{i+1}/\rho_{i})w_{i}
6:     \bar{x}_{i}=\bar{x}_{i-1}+(\gamma_{i}/\rho_{i})\bar{w}_{i},\ \bar{w}_{i+1}=\bar{z}_{i+1}-(\theta_{i+1}/\rho_{i})\bar{w}_{i}
7:     \|x_{i}\|_{C_{rkhs}}=(x_{i}^{\top}\bar{x}_{i})^{1/2}

The whole updating procedure is described in Algorithm 3. This algorithm yields the residual norm \|Ax_{k}-b\|_{2} without explicitly computing the residual. In fact, by eq. 3.10 and eq. 3.12 we have

\bar{\gamma}_{k+1}=\|B_{k}y_{k}-\beta_{1}e_{1}\|_{2}=\|Ax_{k}-b\|_{2}. (3.15)

Note that \bar{\gamma}_{k+1} decreases monotonically, since x_{k} minimizes \|Ax-b\|_{2} over the gradually expanding subspace \mathrm{span}\{Z_{k}\}.

Importantly, Algorithm 3 efficiently computes the solution x_{k} and \|x_{k}\|_{C_{rkhs}}. Each update of x_{i} or w_{i+1} takes O(2n) flops. Similarly, updating \bar{x}_{i} or \bar{w}_{i+1}, as well as computing \|x_{i}\|_{C_{rkhs}}, also takes O(2n) flops. Therefore, the dominant computational cost is O(10n). In contrast, if y_{k} were solved explicitly at each step, it would take O(\sum_{i=1}^{k}i^{3})\sim O(k^{4}) flops; together with forming x_{k}=Z_{k}y_{k}, which takes O(kn) flops, this leads to a total cost of O(kn+k^{4}) flops. Thus, the LSQR-type iteration in Algorithm 3 significantly reduces the number of flops from O(kn+k^{4}) to O(10n).
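To make the recursion concrete, the following sketch combines the hypothetical gGKB() sketched above with the updates of Algorithm 3; it returns the iterates x_{i}, their C_{rkhs}-norms, and the residual norms \bar{\gamma}_{i+1}. It is a simplified reading of the procedure, not the authors' implementation.

import numpy as np

def idarr_iterates(A, b, rho, k):
    alphas, betas, U, Z, Zbar = gGKB(A, b, rho, k)
    n = A.shape[1]
    x, xbar = np.zeros(n), np.zeros(n)
    w, wbar = Z[0].copy(), Zbar[0].copy()
    gamma_bar, rho_bar = betas[0], alphas[0]       # \bar{gamma}_1 = beta_1, \bar{rho}_1 = alpha_1
    xs, norms, residuals = [], [], []
    for i in range(len(alphas) - 1):
        rho_i = np.hypot(rho_bar, betas[i + 1])    # Givens rotation eliminating beta_{i+1}
        c, s = rho_bar / rho_i, betas[i + 1] / rho_i
        theta, rho_bar = s * alphas[i + 1], -c * alphas[i + 1]
        gamma, gamma_bar = c * gamma_bar, s * gamma_bar
        x = x + (gamma / rho_i) * w
        xbar = xbar + (gamma / rho_i) * wbar
        w = Z[i + 1] - (theta / rho_i) * w
        wbar = Zbar[i + 1] - (theta / rho_i) * wbar
        xs.append(x.copy())
        norms.append(np.sqrt(max(x @ xbar, 0.0)))  # ||x_i||_{C_rkhs} = (x_i^T xbar_i)^{1/2}
        residuals.append(abs(gamma_bar))           # equals ||A x_i - b||_2 by eq. 3.15
    return xs, norms, residuals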

P3. Regularize by early stopping. An early stopping strategy is imperative to prevent the solution subspace from becoming excessively large, which would otherwise compromise the regularization. This necessity is rooted in the phenomenon of semi-convergence: the iterate x_{k} initially approaches an optimal regularized solution but subsequently moves towards the unstable naive solution of \min_{x\in\mathcal{R}(C_{rkhs})}\|Ax-b\|_{2}, as detailed in Proposition 4.6.

For early stopping, we adopt the L-curve criterion, as outlined in [12]. This method identifies the ideal early stopping iteration k_{*} at the corner of the curve represented by

\left(\log\|Ax_{k}-b\|_{2},\log\|x_{k}\|_{C_{rkhs}}\right)=\left(\log\bar{\gamma}_{k+1},\log\|x_{k}\|_{C_{rkhs}}\right). (3.16)

Here \bar{\gamma}_{k+1} and \|x_{k}\|_{C_{rkhs}} are computed with negligible cost in Algorithm 3. To construct the L-curve effectively, we let the gGKB execute at least 10 iterations. Additionally, to enhance numerical stability, we stop the gGKB when either \alpha_{i} or \beta_{i} is near machine precision, as motivated by Proposition 4.6.

It is noteworthy that the discrepancy principle (DP) is a viable alternative when the measurement error \|\mathbf{w}\|_{2} in eq. 1.1 is known with high accuracy. The DP halts the iteration at the earliest k satisfying \bar{\gamma}_{k+1}=\|Ax_{k}-b\|_{2}\leq\tau\|\mathbf{w}\|_{2}, where \tau is chosen to be marginally greater than 1.
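Both stopping rules act only on the scalars produced by Algorithm 3. The sketch below implements the discrepancy principle as stated, together with a simple chord-distance heuristic for locating the L-curve corner; the latter is an illustrative stand-in for, not a reproduction of, the corner-finding rule of [12].

import numpy as np

def stop_dp(residuals, noise_norm, tau=1.01):
    # discrepancy principle: earliest k with ||A x_k - b||_2 <= tau * ||w||_2
    for k, r in enumerate(residuals):
        if r <= tau * noise_norm:
            return k
    return len(residuals) - 1

def stop_lcurve(residuals, norms):
    # pick the point of the (log residual, log norm) curve farthest from its end-to-end chord
    x = np.log(np.asarray(residuals)); y = np.log(np.asarray(norms))
    p0, p1 = np.array([x[0], y[0]]), np.array([x[-1], y[-1]])
    d = (p1 - p0) / np.linalg.norm(p1 - p0)
    pts = np.stack([x, y], axis=1) - p0
    dist = np.abs(pts[:, 0] * d[1] - pts[:, 1] * d[0])   # perpendicular distance to the chord
    return int(np.argmax(dist))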

3.3 Computational Complexity

Suppose the algorithm takes k iterations and the basis matrix B is diagonal. Recall that A\in\mathbb{R}^{m\times n}, B\in\mathbb{R}^{n\times n}, u_{i}\in\mathbb{R}^{m} and z_{i}\in\mathbb{R}^{n}. The total computational cost of Algorithm 1 is about O(3mnk) when m\leq n/3 or k<n/3, and about O((m+k)n^{2}) otherwise. The cost is dominated by the gGKB process, since the cost of the updating procedure in Algorithm 3 is only O(n) at each step.

The gGKB can be computed in two ways. The first approach uses only matrix-vector multiplications. The main computations in each iteration of gGKB are the matrix-vector products p=A^{\top}u_{i} and s=B^{-1}A^{\top}AB^{-1}p=B^{-1}(A^{\top}(A(B^{-1}p))), which take O(mn) and O(2mn) flops, respectively. Thus, the total computational cost of gGKB is O(3mnk) flops. The second approach uses \mathbf{A}=A^{\top}A instead of A^{\top} and A to compute s. In this approach, computing \mathbf{A} from A takes O(mn^{2}) flops, and the matrix-vector multiplication \mathbf{A}v in each iteration takes about O(n^{2}) flops. Hence, the total cost of k iterations is O(mn^{2}+kn^{2}). The second approach is faster when mn^{2}+n^{2}k<3mnk, or equivalently, (3m-n)k>mn; that is, roughly speaking, when m>n/3 and k>mn/(3m-n)>n/3.

In practice, the matrix-vector approach is preferred, since the iteration number k is often small. The resulting iDARR algorithm then takes about O(3mnk) flops.
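The two evaluation orders for s=C_{rkhs}^{\dagger}p discussed above differ only in where the O(mn^{2}) work is spent; a small sketch (with illustrative names) is:

import numpy as np

def s_matvec(A, rho, p):
    # nested matrix-vector products: O(2mn) per call, nothing precomputed
    return (A.T @ (A @ (p / rho))) / rho

def make_s_normal(A, rho):
    AtA = A.T @ A                              # formed once at O(mn^2) cost
    return lambda p: (AtA @ (p / rho)) / rho   # O(n^2) per subsequent call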

4 Properties of gGKB

This section studies the properties of the gGKB in Algorithm 2, including the structure of the solution subspaces, the orthogonality of the resulting vectors \{u_{i}\}_{i=1}^{k+1} and \{z_{i}\}_{i=1}^{k+1}, and the number of iterations at termination, defined as

\textbf{gGKB termination step: }\quad k_{t}:=\max_{k\geq 1}\{\alpha_{k}\beta_{k}>0\}. (4.1)

Additionally, we show that the solution to eq. 3.1 is unique in each iteration.

Throughout this section, let r denote the rank of \Lambda and let V_{r} denote the first r columns of V, where \Lambda and V are the matrices of generalized eigenvalues and eigenvectors of \{A,B\} in eq. 2.8 of Theorem 2.4. We have \mathrm{rank}(A)=r. Recall that C_{rkhs}=(V\Lambda V^{\top})^{\dagger}=B(A^{\top}A)^{\dagger}B. Note that the DA-RKHS is (\mathcal{R}(C_{rkhs}),\langle\cdot,\cdot\rangle_{C_{rkhs}}).

4.1 Properties of gGKB

We first show that the gGKB-produced vectors \{u_{i}\}_{i=1}^{k+1} and \{z_{i}\}_{i=1}^{k+1} are orthogonal in (\mathbb{R}^{m},\langle\cdot,\cdot\rangle_{2}) and in the DA-RKHS, respectively, and that the solution subspaces of the gGKB are RKHS-restricted Krylov subspaces.

Definition 4.1 (RKHS-restricted Krylov subspace)

Let A\in\mathbb{R}^{m\times n} and b\in\mathbb{R}^{m}, and let B\in\mathbb{R}^{n\times n} be a symmetric positive definite matrix. Let C_{rkhs}=B(A^{\top}A)^{\dagger}B, which defines an RKHS (\mathbb{R}^{n},\langle\cdot,\cdot\rangle_{C_{rkhs}}). The RKHS-restricted Krylov subspaces are

\mathcal{K}_{k+1}(C_{rkhs}^{\dagger}A^{\top}A,C_{rkhs}^{\dagger}A^{\top}b)=\mathrm{span}\{(C_{rkhs}^{\dagger}A^{\top}A)^{i}C_{rkhs}^{\dagger}A^{\top}b\}_{i=0}^{k},\quad k\geq 0. (4.2)

The main result is the following.

Theorem 4.2 (Properties of gGKB)

Recall k_{t} in eq. 4.1, and let \{u_{i}\}_{i=1}^{k} and \{z_{i}\}_{i=1}^{k} be the vectors generated by the gGKB, with \mathcal{S}_{k}=\mathrm{span}\{z_{i}\}_{i=1}^{k}. They satisfy the following properties:

  • (i)

    \{u_{i}\}_{i=1}^{k} and \{z_{i}\}_{i=1}^{k} are orthonormal in (\mathbb{R}^{m},\langle\cdot,\cdot\rangle_{2}) and in (\mathcal{R}(C_{rkhs}),\langle\cdot,\cdot\rangle_{C_{rkhs}}), respectively;

  • (ii)

    \mathcal{S}_{k}=\mathcal{K}_{k}(C_{rkhs}^{\dagger}A^{\top}A,C_{rkhs}^{\dagger}A^{\top}b) for each k\leq k_{t}, and the termination iteration number is k_{t}=\mathrm{dim}(\mathcal{K}_{\infty}(C_{rkhs}^{\dagger}A^{\top}A,C_{rkhs}^{\dagger}A^{\top}b)).

Proof. Part (i) follows from Proposition 4.3, where we show that z_{k}\in\mathcal{R}(C_{rkhs}), and Proposition 4.4, where we show the orthogonality of these vectors.

For Part (ii), \mathcal{S}_{k}=\mathcal{K}_{k}(C_{rkhs}^{\dagger}A^{\top}A,C_{rkhs}^{\dagger}A^{\top}b) follows from the fact that \{z_{i}\}_{i=1}^{k} form an orthonormal basis of the RKHS-restricted Krylov subspace. We prove k_{t}=\mathrm{dim}(\mathcal{K}_{\infty}(C_{rkhs}^{\dagger}A^{\top}A,C_{rkhs}^{\dagger}A^{\top}b)) in Proposition 4.5.

Proposition 4.3

For each z_{i} generated by gGKB in eq. 3.3, it holds that z_{i}\in\mathcal{R}(C_{rkhs}). Additionally, if q:=C_{rkhs}^{\dagger}A^{\top}u_{i+1}-\beta_{i+1}z_{i}\neq\boldsymbol{0}, then \alpha_{i+1}=\|q\|_{C_{rkhs}}\neq 0.

Proof. We prove this by mathematical induction. For i=1, we obtain from eq. 3.6 and Theorem 2.4 that \alpha_{1}z_{1}=C_{rkhs}^{\dagger}A^{\top}u_{1}=V\Lambda V^{\top}A^{\top}u_{1}\in\mathcal{R}(V_{r})=\mathcal{R}(C_{rkhs}), where r is the rank of \Lambda. Suppose z_{i}\in\mathcal{R}(C_{rkhs}) for some i\geq 1. Using eq. 3.6 and Theorem 2.4 again, we get

\alpha_{i+1}z_{i+1}=C_{rkhs}^{\dagger}A^{\top}u_{i+1}-\beta_{i+1}z_{i}=V\Lambda V^{\top}A^{\top}u_{i+1}-\beta_{i+1}z_{i}\in\mathcal{R}(C_{rkhs}).

Therefore, z_{i+1}\in\mathcal{R}(C_{rkhs}), and q\in\mathcal{R}(C_{rkhs}).

If \alpha_{i+1}=0, then q\in\mathcal{N}(C_{rkhs})=\mathcal{R}(C_{rkhs})^{\perp}. Therefore, q=\boldsymbol{0}.

Thus, even if C_{rkhs} is singular (positive semidefinite), the gGKB in Algorithm 2 does not terminate as long as the right-hand sides of eq. 3.3b and eq. 3.6 are nonzero, since the iterative computation of \{\beta_{i+1},u_{i+1}\} and \{\alpha_{i+1},z_{i+1}\} can continue. Next, we show that these vectors are orthogonal.

Proposition 4.4 (Orthogonality)

Suppose the k-step gGKB does not terminate, i.e., k<k_{t}. Then \{u_{i}\}_{i=1}^{k+1} is a 2-orthonormal basis of the Krylov subspace

\mathcal{K}_{k+1}(AC_{rkhs}^{\dagger}A^{\top},b)=\mathrm{span}\{(AC_{rkhs}^{\dagger}A^{\top})^{i}b\}_{i=0}^{k}, (4.3)

and \{z_{i}\}_{i=1}^{k+1} is a C_{rkhs}-orthonormal basis for the RKHS-restricted Krylov subspace in eq. 4.2.

Proof. Note from Theorem 2.4 that Crkhs=VΛVC_{rkhs}^{{\dagger}}=V\Lambda V^{\top}. Let Wr=VrΛr1/2W_{r}=V_{r}\Lambda_{r}^{1/2}. Then WrCrkhsWr=IrW_{r}^{\top}C_{rkhs}W_{r}=I_{r}, Crkhs=WrWrC_{rkhs}^{{\dagger}}=W_{r}W_{r}^{\top}, and (Wr)=(Crkhs)\mathcal{R}(W_{r})=\mathcal{R}(C_{rkhs}). For any ziz_{i}, Proposition 4.3 implies that there exists virv_{i}\in\mathbb{R}^{r} such that zi=Wrviz_{i}=W_{r}v_{i}. We get from eq. 3.3b and eq. 3.6 that

βi+1ui+1=AWrviαiui,\displaystyle\beta_{i+1}u_{i+1}=AW_{r}v_{i}-\alpha_{i}u_{i},
αi+1vi+1=WrAui+1βi+1vi,\displaystyle\alpha_{i+1}v_{i+1}=W_{r}^{\top}A^{\top}u_{i+1}-\beta_{i+1}v_{i},

where the second equation comes from αi+1Wrvi+1=WrWrAui+1βi+1Wrvi\alpha_{i+1}W_{r}v_{i+1}=W_{r}W_{r}^{\top}A^{\top}u_{i+1}-\beta_{i+1}W_{r}v_{i}. Combining the above two relations with eq. 3.3a, we conclude that the iterative process for generating uiu_{i} and viv_{i} is the standard GKB process of AWrAW_{r} with starting vector bb between the two finite dimensional Hilbert spaces (r,,2)(\mathbb{R}^{r},\langle\cdot,\cdot\rangle_{2}) and (m,,2)(\mathbb{R}^{m},\langle\cdot,\cdot\rangle_{2}). Therefore, {ui}i=1k+1\{u_{i}\}_{i=1}^{k+1} and {vi}i=1k+1\{v_{i}\}_{i=1}^{k+1} are two 2-orthonormal bases of the Krylov subspaces

𝒦k+1(AWr(AWr),b)=span{(AWrWrA)ib}i=0k,\displaystyle\mathcal{K}_{k+1}(AW_{r}(AW_{r})^{\top},b)=\mathrm{span}\{(AW_{r}W_{r}^{\top}A^{\top})^{i}b\}_{i=0}^{k},
𝒦k+1((AWr)AWr,(AWr)b)=span{(WrAAWr)iWrAb}i=0k,\displaystyle\mathcal{K}_{k+1}((AW_{r})^{\top}AW_{r},(AW_{r})^{\top}b)=\mathrm{span}\{(W_{r}^{\top}A^{\top}AW_{r})^{i}W_{r}^{\top}A^{\top}b\}_{i=0}^{k},

respectively; see e.g. [13, §10.4]. Then, WrWr=CrkhsW_{r}W_{r}^{\top}=C_{rkhs}^{{\dagger}} implies eq. 4.3. Also, {zi}i=1k+1={Wrvi}i=1k+1\{z_{i}\}_{i=1}^{k+1}=\{W_{r}v_{i}\}_{i=1}^{k+1} is a CrkhsC_{rkhs}-orthonormal basis of Wr𝒦k+1((AWr)AWr,(AWr)b)W_{r}\mathcal{K}_{k+1}((AW_{r})^{\top}AW_{r},(AW_{r})^{\top}b) since WrW_{r} is CrkhsC_{rkhs}-orthonormal.

Finally, {zi}i=1k+1\{z_{i}\}_{i=1}^{k+1} are CrkhsC_{rkhs}-orthonormal by construction, and by using the relation

Wr(WrAAWr)iWrAb=(WrWrAA)iWrWrAb=(CrkhsAA)iCrkhsAb,\displaystyle W_{r}(W_{r}^{\top}A^{\top}AW_{r})^{i}W_{r}^{\top}A^{\top}b=(W_{r}W_{r}^{\top}A^{\top}A)^{i}W_{r}W_{r}^{\top}A^{\top}b=(C_{rkhs}^{{\dagger}}A^{\top}A)^{i}C_{rkhs}^{{\dagger}}A^{\top}b,

we get that {zi}i=1k+1\{z_{i}\}_{i=1}^{k+1} lie in the RKHS-restricted Krylov subspace in eq. 4.2.   
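Continuing the gGKB sketch above, the orthogonality relations in Proposition 4.4 can be checked numerically on a small synthetic problem; the construction Crkhs=B(AA)BC_{rkhs}=B(A^{\top}A)^{\dagger}B and the exploration measure follow Section 5.1, and the sizes and seeds are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 30, 20
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

rho = np.abs(A).sum(axis=0)                     # exploration measure: column sums of |A|
rho = rho / rho.sum()                           # normalized to sum to one
B = np.diag(rho)                                # basis matrix of L^2_rho
C = B @ np.linalg.pinv(A.T @ A) @ B             # C_rkhs = B (A^T A)^dagger B
C_pinv = np.linalg.pinv(C)

U, Z, alphas, betas = ggkb(A, C, C_pinv, b, k_max=8)   # sketch from Section 4.1
k = U.shape[1]
print(np.max(np.abs(U.T @ U - np.eye(k))))      # u_i are 2-orthonormal (up to round-off)
print(np.max(np.abs(Z.T @ C @ Z - np.eye(k))))  # z_i are C_rkhs-orthonormal (up to round-off)
```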

Proposition 4.5 (gGKB termination number)

Suppose the gGKB in Algorithm 2 terminates at step kt=max{k1:αkβk>0}k_{t}=\max\{k\geq 1:\alpha_{k}\beta_{k}>0\}. Let the distinct nonzero eigenvalues of ACrkhsAAC_{rkhs}^{{\dagger}}A^{\top} be μ1>>μs>0\mu_{1}>\cdots>\mu_{s}>0 with multiplicities m1,,msm_{1},\dots,m_{s}, and let the corresponding eigenspaces be 𝒢1,,𝒢s\mathcal{G}_{1},\dots,\mathcal{G}_{s}. Then, kt=qk_{t}=q, where qq is the number of nonzero elements in {P𝒢1b,,P𝒢sb}\{P_{\mathcal{G}_{1}}b,\dots,P_{\mathcal{G}_{s}}b\}, and

q=dim(𝒦(ACrkhsA,b))=dim(𝒦(CrkhsAA,CrkhsAb)),q=\mathrm{dim}(\mathcal{K}_{\infty}(AC_{rkhs}^{{\dagger}}A^{\top},b))=\mathrm{dim}(\mathcal{K}_{\infty}(C_{rkhs}^{{\dagger}}A^{\top}A,C_{rkhs}^{{\dagger}}A^{\top}b)), (4.4)

where 𝒦(M,v)=span{Miv}i=0\mathcal{K}_{\infty}(M,v)=\mathrm{span}\{M^{i}v\}_{i=0}^{\infty} denotes the Krylov subspace generated by {M,v}\{M,v\}. Moreover, i=1smi=r\sum_{i=1}^{s}m_{i}=r with rr being the rank of AA and kt=qrk_{t}=q\leq r.

Proof. First, we prove eq. 4.4 with qq being the number of nonzero elements in {P𝒢1b,,P𝒢sb}\{P_{\mathcal{G}_{1}}b,\dots,P_{\mathcal{G}_{s}}b\}. Without loss of generality, assume P𝒢jb𝟎P_{\mathcal{G}_{j}}b\neq\mathbf{0} for 1jq1\leq j\leq q, and let gj=P𝒢jb/P𝒢jb2g_{j}=P_{\mathcal{G}_{j}}b/\|P_{\mathcal{G}_{j}}b\|_{2} for these jj. Note that {gj}j=1q\{g_{j}\}_{j=1}^{q} are orthonormal. Let GjG_{j} be a matrix whose orthonormal columns span 𝒢j\mathcal{G}_{j}, so that P𝒢j=GjGjP_{\mathcal{G}_{j}}=G_{j}G_{j}^{\top}. By the eigenvalue decomposition ACrkhsA=j=1sμjGjGjAC_{rkhs}^{{\dagger}}A^{\top}=\sum_{j=1}^{s}\mu_{j}G_{j}G_{j}^{\top}, we have

wi:=(ACrkhsA)i1b=j=1sμji1GjGjb=j=1qμji1P𝒢jb2gj.w_{i}:=(AC_{rkhs}^{{\dagger}}A^{\top})^{i-1}b=\sum_{j=1}^{s}\mu_{j}^{i-1}G_{j}G_{j}^{\top}b=\sum_{j=1}^{q}\mu_{j}^{i-1}\|P_{\mathcal{G}_{j}}b\|_{2}g_{j}. (4.5)

Hence, rank{wi}i=1q\mathrm{rank}\{w_{i}\}_{i=1}^{\infty}\leq q. On the other hand, for 1kq1\leq k\leq q, setting w¯j=P𝒢jb2gj\bar{w}_{j}=\|P_{\mathcal{G}_{j}}b\|_{2}g_{j} for 1jq1\leq j\leq q, we have (w1,,wk)=(w¯1,,w¯q)Tk(w_{1},\dots,w_{k})=(\bar{w}_{1},\dots,\bar{w}_{q})T_{k} with

Tk=(1μ1μ1k11μ2μ2k11μqμqk1)=(Tk1Tk2),T_{k}=\begin{pmatrix}1&\mu_{1}&\cdots&\mu_{1}^{k-1}\\ 1&\mu_{2}&\cdots&\mu_{2}^{k-1}\\ \vdots&\vdots&\cdots&\vdots\\ 1&\mu_{q}&\cdots&\mu_{q}^{k-1}\end{pmatrix}=\begin{pmatrix}T_{k1}\\ T_{k2}\end{pmatrix},

where Tk1k×kT_{k1}\in\mathbb{R}^{k\times k} consists of the first kk rows of TkT_{k}, and Tk2(qk)×kT_{k2}\in\mathbb{R}^{(q-k)\times k} consists of the remaining rows. Note that Tk1T_{k1} is a Vandermonde matrix and is nonsingular since μiμj\mu_{i}\neq\mu_{j} for 1ijk1\leq i\neq j\leq k. Then TkT_{k} has full column rank, so the rank of {wi}i=1k\{w_{i}\}_{i=1}^{k} is kk for 1kq1\leq k\leq q, and rank{wi}i=1rank{wi}i=1q=q\mathrm{rank}\{w_{i}\}_{i=1}^{\infty}\geq\mathrm{rank}\{w_{i}\}_{i=1}^{q}=q. Therefore, we have dim(𝒦(ACrkhsA,b))=rank{wi}i=1=q\mathrm{dim}(\mathcal{K}_{\infty}(AC_{rkhs}^{{\dagger}}A^{\top},b))=\mathrm{rank}\{w_{i}\}_{i=1}^{\infty}=q.

Also, we have dim(𝒦(CrkhsAA,CrkhsAb))=rank{CrkhsAwi}i=1=q\mathrm{dim}(\mathcal{K}_{\infty}(C_{rkhs}^{{\dagger}}A^{\top}A,C_{rkhs}^{{\dagger}}A^{\top}b))=\mathrm{rank}\{C_{rkhs}^{{\dagger}}A^{\top}w_{i}\}_{i=1}^{\infty}=q, where the first equality follows from

(CrkhsAA)i1CrkhsAb=CrkhsA(ACrkhsA)i1b=CrkhsAwi,(C_{rkhs}^{{\dagger}}A^{\top}A)^{i-1}C_{rkhs}^{{\dagger}}A^{\top}b=C_{rkhs}^{{\dagger}}A^{\top}(AC_{rkhs}^{{\dagger}}A^{\top})^{i-1}b=C_{rkhs}^{{\dagger}}A^{\top}w_{i}, (4.6)

and the second equality follows from rank{CrkhsAwi}i=1=rank{wi}i=1=q\mathrm{rank}\{C_{rkhs}^{{\dagger}}A^{\top}w_{i}\}_{i=1}^{\infty}=\mathrm{rank}\{w_{i}\}_{i=1}^{\infty}=q since CrkhsAC_{rkhs}^{{\dagger}}A^{\top} is non-singular on span{wi}i=1\mathrm{span}\{w_{i}\}_{i=1}^{\infty}, which is a subset of span{𝒢l}l=1s\mathrm{span}\{\mathcal{G}_{l}\}_{l=1}^{s} by eq. 4.5. In fact, CrkhsAC_{rkhs}^{{\dagger}}A^{\top} is non-singular on span{𝒢l}l=1s\mathrm{span}\{\mathcal{G}_{l}\}_{l=1}^{s} because {𝒢l}\{\mathcal{G}_{l}\} are eigenspaces of ACrkhsAAC_{rkhs}^{{\dagger}}A^{\top} corresponding to the positive eigenvalues.

Furthermore, eq. 4.6 and the non-degeneracy of CrkhsAC_{rkhs}^{{\dagger}}A^{\top} on span{wi}i=1\mathrm{span}\{w_{i}\}_{i=1}^{\infty} imply that

dim(𝒦q(CrkhsAA,CrkhsAb))=\displaystyle\mathrm{dim}(\mathcal{K}_{q}(C_{rkhs}^{{\dagger}}A^{\top}A,C_{rkhs}^{{\dagger}}A^{\top}b))= rank({(CrkhsAA)i1CrkhsAb}i=1q)\mathrm{rank}(\{(C_{rkhs}^{{\dagger}}A^{\top}A)^{i-1}C_{rkhs}^{{\dagger}}A^{\top}b\}_{i=1}^{q}) (4.7)
=\displaystyle= rank({CrkhsAwi}i=1q)=rank{wi}i=1q=q.\mathrm{rank}(\{C_{rkhs}^{{\dagger}}A^{\top}w_{i}\}_{i=1}^{q})=\mathrm{rank}\{w_{i}\}_{i=1}^{q}=q.

That is, the vectors {(CrkhsAA)i1CrkhsAb}i=1q\{(C_{rkhs}^{{\dagger}}A^{\top}A)^{i-1}C_{rkhs}^{{\dagger}}A^{\top}b\}_{i=1}^{q} are linearly independent.

Next, we prove that kt=qk_{t}=q. Clearly, ktqk_{t}\leq q since by Proposition 4.4 {zi}i=1kt\{z_{i}\}_{i=1}^{k_{t}} are orthogonal and they are in 𝒦kt(CrkhsAA,CrkhsAb)𝒦(CrkhsAA,CrkhsAb)\mathcal{K}_{k_{t}}(C_{rkhs}^{{\dagger}}A^{\top}A,C_{rkhs}^{{\dagger}}A^{\top}b)\subset\mathcal{K}_{\infty}(C_{rkhs}^{{\dagger}}A^{\top}A,C_{rkhs}^{{\dagger}}A^{\top}b), whose dimension is qq. On the other hand, we show next that if kt<qk_{t}<q, there will be a contradiction; hence, we must have kt=qk_{t}=q. In fact, eq. 3.3b and eq. 3.6 imply that, for each 1ikt1\leq i\leq k_{t},

CrkhsAAzi\displaystyle C_{rkhs}^{{\dagger}}A^{\top}Az_{i} =αiCrkhsAui+βi+1CrkhsAui+1\displaystyle=\alpha_{i}C_{rkhs}^{{\dagger}}A^{\top}u_{i}+\beta_{i+1}C_{rkhs}^{{\dagger}}A^{\top}u_{i+1}
=αi(αizi+βizi1)+βi+1(αi+1zi+1+βi+1zi),\displaystyle=\alpha_{i}(\alpha_{i}z_{i}+\beta_{i}z_{i-1})+\beta_{i+1}(\alpha_{i+1}z_{i+1}+\beta_{i+1}z_{i}),

which leads to

αi+1βi+1zi+1=CrkhsAAzi(αi2+βi+12)ziαiβizi1.\alpha_{i+1}\beta_{i+1}z_{i+1}=C_{rkhs}^{{\dagger}}A^{\top}Az_{i}-(\alpha_{i}^{2}+\beta_{i+1}^{2})z_{i}-\alpha_{i}\beta_{i}z_{i-1}.

Note that α1β1z1=CrkhsAβ1u1=CrkhsAb\alpha_{1}\beta_{1}z_{1}=C_{rkhs}^{{\dagger}}A^{\top}\beta_{1}u_{1}=C_{rkhs}^{{\dagger}}A^{\top}b. Combining the above two relations and using αkβk>0\alpha_{k}\beta_{k}>0 for all kktk\leq k_{t}, it follows that zkspan{(CrkhsAA)iCrkhsAb}i=0kz_{k}\in\mathrm{span}\{(C_{rkhs}^{{\dagger}}A^{\top}A)^{i}C_{rkhs}^{{\dagger}}A^{\top}b\}_{i=0}^{k} for all kktk\leq k_{t}. Hence, recursively applying zi=1αiβiCrkhsAAzi11αiβi(αi12+βi2)zi1αi1βi1αiβizi2z_{i}=\frac{1}{\alpha_{i}\beta_{i}}C_{rkhs}^{{\dagger}}A^{\top}Az_{i-1}-\frac{1}{\alpha_{i}\beta_{i}}(\alpha_{i-1}^{2}+\beta_{i}^{2})z_{i-1}-\frac{\alpha_{i-1}\beta_{i-1}}{\alpha_{i}\beta_{i}}z_{i-2} for all 2ikt2\leq i\leq k_{t}, we can write

αkt+1βkt+1zkt+1=i=0ktξi(CrkhsAA)iCrkhsAb,\alpha_{k_{t}+1}\beta_{k_{t}+1}z_{k_{t}+1}=\sum_{i=0}^{k_{t}}\xi_{i}(C_{rkhs}^{{\dagger}}A^{\top}A)^{i}C_{rkhs}^{{\dagger}}A^{\top}b,

with ξi\xi_{i}\in\mathbb{R} and, in particular, ξkt=1/Πi=1ktαiβi0\xi_{k_{t}}=1/\Pi_{i=1}^{k_{t}}\alpha_{i}\beta_{i}\neq 0. Now αkt+1βkt+1=0\alpha_{k_{t}+1}\beta_{k_{t}+1}=0 implies that {(CrkhsAA)iCrkhsAb}i=0kt\{(C_{rkhs}^{{\dagger}}A^{\top}A)^{i}C_{rkhs}^{{\dagger}}A^{\top}b\}_{i=0}^{k_{t}} are linearly dependent, which, since kt<qk_{t}<q, contradicts the linear independence established in eq. 4.7. Therefore, we have q=ktq=k_{t}.

Lastly, to prove that i=1smi=r\sum_{i=1}^{s}m_{i}=r, it suffices to show that rank(ACrkhsA)=r\mathrm{rank}(AC_{rkhs}^{{\dagger}}A^{\top})=r, since the eigenvalues of ACrkhsAAC_{rkhs}^{{\dagger}}A^{\top} are nonnegative. Following the proof of Proposition 4.4, we write ACrkhsA=AWrWrAAC_{rkhs}^{{\dagger}}A^{\top}=AW_{r}W_{r}^{\top}A^{\top}. Since AWr=AVrΛr1/2AW_{r}=AV_{r}\Lambda_{r}^{1/2}, we only need to prove that AVrAV_{r} has full column rank. Suppose AVry=𝟎AV_{r}y=\mathbf{0} with yry\in\mathbb{R}^{r}. Then AAVry=𝟎A^{\top}AV_{r}y=\mathbf{0}, and by Theorem 2.4, BVrΛry=𝟎BV_{r}\Lambda_{r}y=\mathbf{0}, which gives y=𝟎y=\mathbf{0}. Thus, rank(ACrkhsA)=rank(AWr)=r\mathrm{rank}(AC_{rkhs}^{{\dagger}}A^{\top})=\mathrm{rank}(AW_{r})=r.   
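Continuing the small synthetic example after Proposition 4.4, the last claim, that ACrkhsAAC_{rkhs}^{{\dagger}}A^{\top} and AA have the same rank, can be checked numerically as a quick sanity test; this is an illustration only, not part of the paper's experiments.

```python
# Continuing the synthetic example above: rank(A C_rkhs^+ A^T) should equal rank(A),
# so the multiplicities of the nonzero eigenvalues sum to r (Proposition 4.5).
M = A @ C_pinv @ A.T
print(np.linalg.matrix_rank(M), np.linalg.matrix_rank(A))
```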

4.2 Uniqueness of solutions in the iterations

Proposition 4.6

If gGKB terminates at step ktk_{t} in eq. 4.1, the iterative solution xktx_{k_{t}} is the unique least squares solution to minx(Crkhs)Axb2\min_{x\in\mathcal{R}(C_{rkhs})}\|Ax-b\|_{2}.

Proof. Following the proof of Proposition 4.4, the solution to minx(Crkhs)Axb2\min_{x\in\mathcal{R}(C_{rkhs})}\|Ax-b\|_{2} is x=Wryx_{\star}=W_{r}y_{\star} with y=argminyAWryb2y_{\star}=\mathop{\mathrm{argmin}}_{y}\|AW_{r}y-b\|_{2}. Since AWrAW_{r} has full column rank, yy_{\star} is the unique solution to WrA(AWryb)=𝟎W_{r}^{\top}A^{\top}(AW_{r}y-b)=\mathbf{0}, and hence xx_{\star} is unique. Note that (Wr)=(Crkhs)\mathcal{R}(W_{r})=\mathcal{R}(C_{rkhs}). Thus, a vector x(Crkhs)x\in\mathcal{R}(C_{rkhs}) solves minx(Crkhs)Axb2\min_{x\in\mathcal{R}(C_{rkhs})}\|Ax-b\|_{2} if and only if P(Crkhs)A(Axb)=𝟎P_{\mathcal{R}(C_{rkhs})}A^{\top}(Ax-b)=\boldsymbol{0}.

Now we only need to prove P(Crkhs)A(Axqb)=𝟎P_{\mathcal{R}(C_{rkhs})}A^{\top}(Ax_{q}-b)=\mathbf{0} since kt=qk_{t}=q by Proposition 4.5. Using the property P(Crkhs)=CrkhsCrkhsP_{\mathcal{R}(C_{rkhs})}=C_{rkhs}C_{rkhs}^{{\dagger}}, we get from eq. 3.8c

P(Crkhs)AUk+1=Crkhs(ZkBkT+αk+1zk+1ek+1).P_{\mathcal{R}(C_{rkhs})}A^{\top}U_{k+1}=C_{rkhs}(Z_{k}B_{k}^{T}+\alpha_{k+1}z_{k+1}e_{k+1}^{\top}).

Combining the above relation with eq. 3.10, we have

P(Crkhs)A(Axqb)\displaystyle P_{\mathcal{R}(C_{rkhs})}A^{\top}(Ax_{q}-b) =Crkhs(ZqBqT+αq+1zq+1eq+1)(Bqyqβ1e1)\displaystyle=C_{rkhs}(Z_{q}B_{q}^{T}+\alpha_{q+1}z_{q+1}e_{q+1}^{\top})(B_{q}y_{q}-\beta_{1}e_{1})
=Crkhs[Zq(BqBqyqBqβ1e1)+αq+1βq+1zq+1eqyq]\displaystyle=C_{rkhs}[Z_{q}(B_{q}^{\top}B_{q}y_{q}-B_{q}^{\top}\beta_{1}e_{1})+\alpha_{q+1}\beta_{q+1}z_{q+1}e_{q}^{\top}y_{q}]
=αq+1βq+1Crkhszq+1eqyq=𝟎,\displaystyle=\alpha_{q+1}\beta_{q+1}C_{rkhs}z_{q+1}e_{q}^{\top}y_{q}=\boldsymbol{0},

since αq+1βq+1=0\alpha_{q+1}\beta_{q+1}=0 when gGKB terminates.   

This result shows the necessity of early stopping: without it, the iteration converges to the naive least squares solution in (Crkhs)\mathcal{R}(C_{rkhs}). The next theorem shows the uniqueness of the solution at each iteration of the algorithm.

Theorem 4.7 (Uniqueness of solution in each iteration)

For each iteration with k<ktk<k_{t}, there exists a unique solution to eq. 3.1. Furthermore, there exists a unique solution to minx𝒮kAxb2\min_{x\in\mathcal{S}_{k}}\|Ax-b\|_{2}.

Proof. Let Wkn×kW_{k}\in\mathbb{R}^{n\times k} be a matrix whose orthonormal columns span 𝒮k\mathcal{S}_{k}. For any x𝒮kx\in\mathcal{S}_{k}, there is a unique yky\in\mathbb{R}^{k} such that x=Wkyx=W_{k}y. Then the solution to eq. 3.1 is xk=Wkykx_{k}=W_{k}y_{k}, where yky_{k} is the solution to

miny𝒴kWkyCrkhs,𝒴k=argminykAWkyb2.\min_{y\in\mathcal{Y}_{k}}\|W_{k}y\|_{C_{rkhs}},\ \ \mathcal{Y}_{k}=\mathop{\mathrm{argmin}}_{y\in\mathbb{R}^{k}}\|AW_{k}y-b\|_{2}.

By [10, Theorem 2.1], it has a unique solution yky_{k} iff 𝒩(Crkhs1/2Wk)𝒩(AWk)={𝟎}\mathcal{N}(C_{rkhs}^{1/2}W_{k})\bigcap\mathcal{N}(AW_{k})=\{\boldsymbol{0}\}.

Now we prove 𝒩(Crkhs1/2Wk)={𝟎}\mathcal{N}(C_{rkhs}^{1/2}W_{k})=\{\boldsymbol{0}\}. To this end, suppose y𝒩(Crkhs1/2Wk)y\in\mathcal{N}(C_{rkhs}^{1/2}W_{k}) and x=Wkyx=W_{k}y. Then x𝒮k𝒩(Crkhs1/2)x\in\mathcal{S}_{k}\bigcap\mathcal{N}(C_{rkhs}^{1/2}). Since 𝒩(Crkhs1/2)=𝒩(Crkhs)=(Crkhs)\mathcal{N}(C_{rkhs}^{1/2})=\mathcal{N}(C_{rkhs})=\mathcal{R}(C_{rkhs})^{\perp}, we get x𝒮k(Crkhs)(Crkhs)(Crkhs)={𝟎}x\in\mathcal{S}_{k}\bigcap\mathcal{R}(C_{rkhs})^{\perp}\subseteq\mathcal{R}(C_{rkhs})\bigcap\mathcal{R}(C_{rkhs})^{\perp}=\{\boldsymbol{0}\}. Therefore, x=Wky=𝟎x=W_{k}y=\boldsymbol{0}, which gives y=𝟎y=\boldsymbol{0} since WkW_{k} has orthonormal columns.

To prove the uniqueness of the solution to minx𝒮kAxb2\min_{x\in\mathcal{S}_{k}}\|Ax-b\|_{2}, suppose x1,x2𝒮kx_{1},x_{2}\in\mathcal{S}_{k} are both minimizers; we show that x1=x2x_{1}=x_{2}. Let x=x1x2x_{*}=x_{1}-x_{2}. Since Ax1Ax_{1} and Ax2Ax_{2} are both equal to the orthogonal projection of bb onto the subspace A𝒮kA\mathcal{S}_{k}, we have Ax=𝟎Ax_{*}=\boldsymbol{0}, and in particular x𝒩(AA)x_{*}\in\mathcal{N}(A^{\top}A).

On the other hand, since x𝒮k(Crkhs)x_{*}\in\mathcal{S}_{k}\subset\mathcal{R}(C_{rkhs}) by Proposition 4.3, and note that Crkhs=B(AA)B=BVrΛr1VrBC_{rkhs}=B(A^{\top}A)^{{\dagger}}B=BV_{r}\Lambda_{r}^{-1}V_{r}^{\top}B, we have B1x((AA)B)𝒩(AA)B^{-1}x_{*}\in\mathcal{R}((A^{\top}A)^{{\dagger}}B)\subset\mathcal{N}(A^{\top}A)^{\perp}.

Combining the above two, we have x,B1x=0\langle{x_{*},B^{-1}x_{*}}\rangle=0. But BB is a symmetric positive definite matrix, so we must have x=0x_{*}=0.   

5 Numerical Examples

5.1 The Fredholm equation of the first kind

We first examine iDARR on the discrete Fredholm integral equation of the first kind. The tests cover two distinct types of spectral decay: exponential and polynomial. The latter is well known to be challenging and often occurs in applications such as image deblurring. Additionally, we investigate two scenarios depending on whether the true solution is inside or outside the function space of identifiability (FSOI).

Three norms in iterative and direct methods. We compare the l2l^{2}, L2L^{2}, and DA-RKHS norms in iterative and direct methods. The direct methods are based on matrix decomposition. These regularizers are listed in Table 2.

Table 2: Three regularization norms in iterative and direct methods.
Norm | x2\|x\|_{*}^{2} | Iterative | Direct
l2l^{2} | xInxx^{\top}I_{n}x | IR-l2 | l2
L2L^{2} | xBxx^{\top}Bx | IR-L2 | L2
DA-RKHS | xCrkhsxx^{\top}C_{rkhs}x | iDARR | DARTR

The iterative methods differ primarily in their regularization norms. For the l2l^{2} norm, we use the LSQR method in the IR TOOLS package [12], and we stop the iteration when αi\alpha_{i} or βi\beta_{i} becomes negligible to maintain stability. For the L2L^{2} norm, we use gGKB to construct solution subspaces by replacing CrkhsC_{rkhs} with the basis matrix BB; this method is equivalent to the LSQR method using L=BL=\sqrt{B} as a preconditioner in the IR TOOLS package.

The direct methods are Tikhonov regularizers with the hyperparameter selected by the L-curve method [16].

Numerical settings. We consider the problem of recovering the input signal ϕ\phi in a discretization of the Fredholm integral equation in eq. 2.1 with s[a,b]s\in[a,b] and t[c,d]t\in[c,d]. The data are discrete noisy observations b=(y(t1),,y(tm))mb=(y(t_{1}),\ldots,y(t_{m}))\in\mathbb{R}^{m}, where tj=c+j(dc)/mt_{j}=c+j(d-c)/m for 1jm1\leq j\leq m. The task is to estimate the coefficient vector x=(ϕ(s1),,ϕ(sn))nx=(\phi(s_{1}),\ldots,\phi(s_{n}))\in\mathbb{R}^{n} in a piecewise-constant function ϕ(s)=i=1nϕ(si)𝟏[si1,si](s)\phi(s)=\sum_{i=1}^{n}\phi(s_{i})\mathbf{1}_{[s_{i-1},s_{i}]}(s), where 𝒮:={si}i=1n[a,b]\mathcal{S}:=\{s_{i}\}_{i=1}^{n}\subset[a,b] with si=a+iδs_{i}=a+i\delta, δ=(ba)/n\delta=(b-a)/n. We obtain the linear system eq. 1.1 with A(j,i)=K(tj,si)δA(j,i)=K(t_{j},s_{i})\delta by a Riemann sum approximation of the integral. We set (a,b,c,d)=(1,5,0,5)(a,b,c,d)=(1,5,0,5), m=500m=500, and take n=100n=100 except when testing the computational time with a sequence of large values of nn.

The m\mathbb{R}^{m}-valued noise 𝐰\mathbf{w} is Gaussian N(0,σ2ΔtIm)N(0,\sigma^{2}\Delta tI_{m}). We set the standard deviation of the noise to be σ=Ax×nsr\sigma=\|Ax\|\times nsr, where nsrnsr is the noise-to-signal ratio, and we test our methods with nsr={0.0625,0.125,0.25,0.5,1}nsr=\{0.0625,0.125,0.25,0.5,1\}.

We consider two integral kernels

(a) K(t,s):=s2est;(b) K(t,s):=s1|sin(st+1)|.\text{(a) }\,K(t,s):=s^{-2}e^{-st};\quad\text{(b) }\,K(t,s):=s^{-1}|\sin(st+1)|. (5.1)

These kernels lead to exponentially and polynomially decaying spectra, respectively, as shown in Figure 1. The first kernel arises from magnetic resonance relaxometry [4], with ϕ\phi being the distribution of transverse nuclear relaxation times.
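As an illustration of this setup, the following is a minimal NumPy sketch of the discretization and noise model for kernel (a); the true signal ϕ(s)=s2\phi(s)=s^{2} corresponds to the second comparison scenario below, we take Δt=(dc)/m\Delta t=(d-c)/m as an assumption on the notation, and all names are illustrative.

```python
import numpy as np

# Minimal sketch of the discretized Fredholm equation in Section 5.1, kernel (a).
a_, b_, c_, d_ = 1.0, 5.0, 0.0, 5.0
m, n = 500, 100
nsr = 0.5                                          # noise-to-signal ratio

delta = (b_ - a_) / n
s = a_ + np.arange(1, n + 1) * delta               # s_i = a + i*delta
t = c_ + np.arange(1, m + 1) * (d_ - c_) / m       # t_j = c + j*(d-c)/m

K = lambda t, s: s**(-2) * np.exp(-s * t)          # kernel (a) in eq. 5.1
A = K(t[:, None], s[None, :]) * delta              # A(j,i) = K(t_j, s_i)*delta (Riemann sum)

x_true = s**2                                      # e.g. phi(s) = s^2 (out-of-FSOI scenario below)
dt = (d_ - c_) / m                                 # assumed meaning of Delta t in the noise model
sigma = np.linalg.norm(A @ x_true) * nsr           # sigma = ||A x|| * nsr
rng = np.random.default_rng(1)
w = sigma * np.sqrt(dt) * rng.standard_normal(m)   # noise ~ N(0, sigma^2 * dt * I_m)
b = A @ x_true + w                                 # noisy data
```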

Figure 1: Singular values of AA and generalized eigenvalues of (AA,B)(A^{\top}A,B) for the kernels in eq. 5.1: (a) exponentially decaying spectrum; (b) polynomially decaying spectrum.

The DA-RKHS. By its definition in eq. 2.4, the exploration measure is ρ(si)=1γj=1m|K(tj,si)|δ\rho(s_{i})=\frac{1}{\gamma}\sum_{j=1}^{m}|K(t_{j},s_{i})|\delta for i=1,,ni=1,\dots,n with γ\gamma being the normalizing constant. In other words, it is the normalized column sum of the absolute values of the matrix AA. The discrete function space Lρ2(𝒮)L^{2}_{\rho}(\mathcal{S}) is equivalent to n\mathbb{R}^{n} with weight ρ=(ρ(s1),,ρ(sn))\rho=(\rho(s_{1}),\ldots,\rho(s_{n})), and its norm is x,xLρ2:=xdiag(ρ(si))x\langle{x,x}\rangle_{L^{2}_{\rho}}:=x^{\top}\mathrm{diag}(\rho(s_{i}))x for all xnx\in\mathbb{R}^{n}. The basis matrix for the Cartesian basis of n\mathbb{R}^{n} is B=diag(ρ(si))B=\mathrm{diag}(\rho(s_{i})), which is also the basis matrix for step functions in the Riemann sum discretization. The DA-RKHS in this discrete setting is (𝒩(AA),,Crkhs)(\mathcal{N}(A^{\top}A)^{\perp},\langle{\cdot,\cdot}\rangle_{C_{rkhs}}).
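Continuing the discretization sketch above, the discrete exploration measure, the basis matrix BB, and CrkhsC_{rkhs} can be formed directly; assembling them as dense matrices is only for illustration, as the iterative method relies on matrix–vector computations.

```python
# Continuing the sketch above: the discrete DA-RKHS quantities of Section 5.1.
rho = np.abs(A).sum(axis=0)                   # column sums of |A| (the delta factor is inside A)
rho = rho / rho.sum()                         # normalized column sums (one choice of gamma)
B = np.diag(rho)                              # basis matrix of L^2_rho
C_rkhs = B @ np.linalg.pinv(A.T @ A) @ B      # C_rkhs = B (A^T A)^dagger B (dense, illustration only)
penalty = x_true @ (C_rkhs @ x_true)          # the DA-RKHS penalty x^T C_rkhs x from Table 2
```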

Settings for comparisons. The comparison consists of two scenarios regarding whether the true solution is inside or outside of the FSOI: (i) the true solution is the second eigenvector of G¯{\mathcal{L}_{\overline{G}}}; thus it is inside the FSOI; and (ii) the true solution is ϕ(x)=x2\phi(x)=x^{2}, which has significant components outside the FSOI.

For each scenario, we conduct 100 independent simulations, with each simulation comprising five datasets at varying noise levels. The results are presented by a box plot, which illustrates the median, the lower and upper quartiles, any outliers, and the range of values excluding outliers. The key indicator of a regularizer’s effectiveness is its ability to produce accurate estimators whose errors decay consistently as the noise level decreases. Since exploring the decay rate in the small noise limit is not the focus of this study, we direct readers to [27, 22] for initial insights into how this rate is influenced by factors such as spectral decay, the smoothness of the true solution, and the choice of regularization strength.

Figure 2: Results in the case of exponentially decaying spectrum. Top row: typical estimators of IR-l2, IR-L2, and iDARR when nsr=0.5nsr=0.5 and their denoising of the output signal. Second row: the residual Axkb2\|Ax_{k}-b\|_{2} as the iteration number kk increases in one realization with nsr=0.5nsr=0.5, together with boxplots of the stopping iteration numbers in the 100 simulations. Bottom two rows: boxplots of the estimators’ Lρ2L^{2}_{\rho} errors and loss function values in the 100 simulations.

Results. We report the results separately according to the spectral decay.

(i) Exponentially decaying spectrum. The top row of Figure 2 shows typical estimators of IR-l2, IR-L2, and iDARR and their de-noising of the output signal when nsr=0.5nsr=0.5. When the true solution is inside the FSOI, iDARR significantly outperforms the other two in producing a more accurate estimator. However, both IR-L2 and IR-l2 can denoise the data accurately, even though their estimators are largely biased. When the true solution is outside the FSOI, none of the regularizers captures the true function accurately, but iDARR and IR-L2 clearly outperform the l2l^{2} regularizer. Yet again, all these largely biased estimators can de-noise the data accurately. Thus, this inverse problem is severely ill-posed, and one must restrict the inversion to the FSOI.

The second row of Figure 2 shows the decay of the residual Axkb2\|Ax_{k}-b\|_{2} as the iteration number increases, as well as the stopping iteration numbers of these regularizers in the 100 simulations. The fast-decaying residual suggests the need for early stopping, and all three regularizers indeed stop within a few steps. Notably, iDARR consistently stops at the second iteration across noise levels, outperforming the other two regularizers in stably detecting the stopping iteration.

The effectiveness of the DA-RKHS regularization becomes particularly evident in the lower two rows of Figure 2, which depict the decaying errors and loss values as the noise-to-signal ratio (nsrnsr) decreases in the 100 independent simulations. In both iterative and direct methods, the DA-RKHS norm demonstrates superior performance compared to the l2l^{2} and L2L^{2} norms, consistently delivering more accurate estimators that show a steady decrease in error alongside the noise level. Notably, the values of the corresponding loss functions are similar, underscoring the inherent ill-posedness of the inverse problem. Furthermore, iDARR marginally surpasses the direct method DARTR in producing more precise estimators, particularly when the true solution resides within the FSOI. The performance of iDARR suggests that its early stopping mechanism can reliably determine an optimal regularization level, achieving results that are slightly more refined than those obtained with DARTR using the L-curve method.

Figure 3: Results in the case of polynomially decaying spectrum. Top row: typical estimators of IR-l2, IR-L2, and iDARR when nsr=0.0625nsr=0.0625 and their denoising of the output signal. Second row: the residual Axkb2\|Ax_{k}-b\|_{2} as the iteration number kk increases in one realization with nsr=0.0625nsr=0.0625, together with box plots of the stopping iteration numbers in the 100 simulations. Bottom two rows: box plots of the estimators’ Lρ2L^{2}_{\rho} errors and loss function values in the 100 simulations.

(ii) Polynomially decaying spectrum. Figure 3 again illustrates the superior performance of iDARR over IR-L2 and IR-l2 in the case of polynomial spectral decay. The second row shows that the slow spectral decay poses a notable challenge to the iterative methods, as the noise level affects their stopping iteration numbers. Also, they all stop early within approximately twelve steps, even though the true solution may lie in a subspace of higher dimension.

The lower two rows show that iDARR remains effective. It continues to outperform the other two iterative regularizers when the true solution is in the FSOI, and it is marginally surpassed by IR-L2 and IR-l2 when the true solution is outside the FSOI. In both scenarios, the direct method DARTR outperforms all other methods, including iDARR, indicating DARTR is more effective in extracting information from the spectrum with slow decay.

Computational Complexity. The iterative method iDARR is orders of magnitude faster than the direct method DARTR, especially when nn is large. Figure 4 shows their computation time as nn increases in 1010 independent simulations, and the results align with the complexity order illustrated in Section 3.3.

Figure 4: Computational time in 10 simulations with m=500m=500.

In summary, iDARR outperforms IR-L2 and IR-l2 in yielding accurate estimators that consistently decay with the noise level. Its major advantage comes from the DA-RKHS norm that adaptively exploits the information in data and the model.

5.2 Image Deblurring

We further test iDARR on 2D image deblurring problems, where the task is to reconstruct images from blurred and noisy observations. The mathematical model of this problem can be expressed in the form of the first-kind Fredholm integral equation in eq. 2.1 with s,t2s,t\in\mathbb{R}^{2}. The kernel K(t,s)K(t,s) is a function that specifies how the points in the image are distorted, called the point spread function (PSF). We choose PRblurspeckle from [12] as the blurring operator, which simulates spatially invariant blurring caused by atmospheric turbulence, and we use zero boundary conditions to construct the matrix AA. For a true image with N×NN\times N pixels, the matrix AN2×N2A\in\mathbb{R}^{N^{2}\times N^{2}} is a psfMatrix object. We consider two images with 256×256256\times 256 and 320×320320\times 320 pixels, respectively, and set the noise level to nsr=0.01nsr=0.01 for both images. The true images, their blurred noisy observations, and the corresponding PSFs that define the matrices AA are presented in Figure 5.
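To illustrate the structure of AA in this setting, the sketch below assembles the blur matrix for a tiny N×NN\times N image from a given PSF with zero boundary conditions; the Gaussian PSF is only a stand-in for PRblurspeckle, and the dense assembly is for illustration only (in the experiments, AA is a psfMatrix object from IR Tools).

```python
import numpy as np

def blur_matrix(N, psf):
    """Assemble the (N^2, N^2) matrix of a spatially invariant blur with zero
    boundary conditions; dense assembly is only feasible for tiny N."""
    p = psf.shape[0] // 2
    A = np.zeros((N * N, N * N))
    for i in range(N):
        for j in range(N):
            row = i * N + j
            for di in range(-p, p + 1):
                for dj in range(-p, p + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < N and 0 <= jj < N:   # zero boundary conditions
                        A[row, ii * N + jj] = psf[di + p, dj + p]
    return A

# A small Gaussian PSF as a stand-in for the PRblurspeckle PSF (illustration only).
g = np.exp(-0.5 * (np.arange(-3, 4) / 1.5) ** 2)
psf = np.outer(g, g)
psf /= psf.sum()
A = blur_matrix(16, psf)   # in the experiments, N = 256 or 320 and A is a psfMatrix object
```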

Figure 5: The true images, the noisy images blurred by PRblurspeckle, and the corresponding PSFs: (a) True Image-1; (b) Blurred Image-1; (c) PSF for N=256N=256; (d) True Image-2; (e) Blurred Image-2; (f) PSF for N=320N=320.
Figure 6: The reconstructed images computed by iDARR, LSQR, and hybrid-l2 methods.
Figure 7: Relative errors versus iteration number, with circles marking the early stopping iterations chosen by the L-curve method shown in the right two columns: (a) relative error, Image-1; (b) iDARR, Image-1; (c) LSQR, Image-1; (d) relative error, Image-2; (e) iDARR, Image-2; (f) LSQR, Image-2.

Figure 6 shows the reconstructed images computed by iDARR, LSQR, and the hybrid-l2 method. Here the hybrid-l2 applies an l2l_{2}-norm Tikhonov regularization to the projected problem obtained by LSQR, and it uses the stopping strategy in [12]. The best estimates for iDARR and LSQR are the solutions at the iteration kk_{*} minimizing xkxtrue2\|x_{k}-x_{\text{true}}\|_{2}; the reconstructions shown are obtained by using the L-curve method for early stopping.

Figure 7(a)–(f) shows the relative errors as the iteration number increases, with the early stopping iterations selected by the L-curve method displayed in the right two columns.

Notably, Figure 7 reveals that iDARR achieves more accurate reconstructed images than LSQR for both tests, despite appearing to the contrary in Figure 6. LSQR is prone to stopping late, resulting in lower-quality reconstructions than iDARR. In contrast, iDARR tends to stop earlier than ideal, before achieving the best quality. However, the hybrid-l2 method consistently produces accurate estimators with stable convergence, suggesting potential benefits in developing a hybrid iDARR approach to enhance stability.

The effectiveness of iDARR depends on the alignment of regularities between the convolution kernel and the image, as the DA-RKHS’s regularity is tied to the smoothness of the convolution kernel. With the PRblurspeckle featuring a smooth PSF, iDARR obtains a higher accuracy for the smoother Image-2 compared to Image-1, producing reconstructions with smooth edges. An avenue for future exploration involves adjusting the DA-RKHS’s smoothness to better align with the smoothness of the data.

6 Conclusion and Future Work

We have introduced iDARR, a scalable iterative data-adaptive RKHS regularization method, for solving ill-posed linear inverse problems. It searches for solutions in the subspaces where the true signal can be identified and achieves reliable early stopping via the DA-RKHS norm. A core innovation is a generalized Golub-Kahan bidiagonalization procedure that recursively computes orthonormal bases for a sequence of RKHS-restricted Krylov subspaces. Systematic numerical tests on the Fredholm integral equation show that iDARR outperforms the widely used iterative regularizations using the L2L^{2} and l2l^{2} norms, in the sense that it produces stable, accurate solutions that converge consistently as the noise level decays. Applications to 2D image deblurring further show that iDARR outperforms the benchmark LSQR with the l2l^{2} norm.

Future Work: Hybrid Methods

The accuracy and stability of the regularized solution hinge on the choice of the iteration number for early stopping. While the L-curve criterion is a commonly used tool for determining this number, it can sometimes lead to suboptimal results due to its reliance on identifying a corner in a discrete curve. Hybrid methods are well-recognized alternatives that help stabilize this semi-convergence issue, as referenced in [20, 8, 32]. One promising approach is to apply Tikhonov regularization to each iteration of the projected problem, with the hyperparameter determined by the weighted generalized cross-validation (WGCV) method described in [8]. This approach is a focus of our upcoming research.
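As a sketch of this idea, Tikhonov regularization of the small projected problem minyBkyβ1e12\min_{y}\|B_{k}y-\beta_{1}e_{1}\|_{2} can be written as a stacked least squares problem; the projected matrix BkB_{k} and the choice of λ\lambda (e.g., by WGCV [8]) are assumed to be given, and the names below are illustrative rather than part of the method.

```python
import numpy as np

def hybrid_step(Bk, beta1, lam):
    """Tikhonov-regularize the projected problem min_y ||B_k y - beta_1 e_1||_2.

    Bk    : (k+1, k) projected bidiagonal matrix from the bidiagonalization
    beta1 : the scalar beta_1, so that beta_1 e_1 is the projected right-hand side
    lam   : regularization parameter, e.g., chosen by WGCV [8]
    Returns y_lam; the iterate is then formed from the solution subspace basis,
    e.g., x_k = Z_k y_lam in the notation of Section 3.
    """
    kp1, k = Bk.shape
    rhs = np.zeros(kp1)
    rhs[0] = beta1
    # Stacked form of min_y ||B_k y - beta_1 e_1||^2 + lam^2 ||y||^2.
    M = np.vstack([Bk, lam * np.eye(k)])
    r = np.concatenate([rhs, np.zeros(k)])
    y_lam, *_ = np.linalg.lstsq(M, r, rcond=None)
    return y_lam
```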

References

  • [1] Simon R Arridge, Marta M Betcke, and Lauri Harhanen. Iterated preconditioned LSQR method for inverse problems on unstructured grids. Inverse Probl., 30(7):075009, 2014.
  • [2] Xianglan Bai, Guang-Xin Huang, Xiao-Jun Lei, Lothar Reichel, and Feng Yin. A novel modified TRSVD method for large-scale linear discrete ill-posed problems. Appl. Numer. Math., 164:72–88, 2021.
  • [3] Frank Bauer, Sergei Pereverzev, and Lorenzo Rosasco. On regularization algorithms in learning theory. Journal of complexity, 23(1):52–72, 2007.
  • [4] Chuan Bi, M. Yvonne Ou, Mustapha Bouhrara, and Richard G. Spencer. Span of regularization for solution of inverse problems with application to magnetic resonance relaxometry of the brain. Scientific Reports, 12(1):20194, 2022.
  • [5] Åke Björck. A bidiagonalization algorithm for solving large and sparse ill-posed systems of linear equations. BIT Numer. Math., 28(3):659–670, 1988.
  • [6] Noe Angelo Caruso and Paolo Novati. Convergence analysis of LSQR for compact operator equations. Linear Algebra and its Applications, 583:146–164, 2019.
  • [7] Zhiming Chen, Wenlong Zhang, and Jun Zou. Stochastic convergence of regularized solutions and their finite element approximations to inverse source problems. SIAM Journal on Numerical Analysis, 60(2):751–780, 2022.
  • [8] Julianne Chung, James G Nagy, and Dianne P O’Leary. A weighted-GCV method for Lanczos-hybrid regularization. Electr. Trans. Numer. Anal., 28(29):149–167, 2008.
  • [9] Felipe Cucker and Ding Xuan Zhou. Learning theory: an approximation theory viewpoint, volume 24. Cambridge University Press, Cambridge, 2007.
  • [10] Lars Eldén. A weighted pseudoinverse, generalized singular values, and constrained least squares problems. BIT Numer. Math., 22:487–502, 1982.
  • [11] Heinz Werner Engl, Martin Hanke, and Andreas Neubauer. Regularization of inverse problems, volume 375. Springer Science & Business Media, 1996.
  • [12] Silvia Gazzola, Per Christian Hansen, and James G Nagy. IR Tools: a MATLAB package of iterative regularization methods and large-scale test problems. Numerical Algorithms, 81(3):773–811, 2019.
  • [13] Gene H. Golub and Charles F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, 4th edition, 2013.
  • [14] Jacques Salomon Hadamard. Lectures on Cauchy’s problem in linear partial differential equations, volume 18. Yale University Press, 1923.
  • [15] Per Christian Hansen. Rank-deficient and discrete ill-posed problems: numerical aspects of linear inversion. SIAM, 1998.
  • [16] Per Christian Hansen. The L-curve and its use in the numerical treatment of inverse problems. In in Computational Inverse Problems in Electrocardiology, ed. P. Johnston, Advances in Computational Bioengineering, pages 119–142. WIT Press, 2000.
  • [17] Guangxin Huang, Yuanyuan Liu, and Feng Yin. Tikhonov regularization with MTRSVD method for solving large-scale discrete ill-posed problems. J. Comput. Appl. Math., 405:113969, 2022.
  • [18] Zhongxiao Jia and Yanfei Yang. A joint bidiagonalization based iterative algorithm for large scale general-form Tikhonov regularization. Appl. Numer. Math., 157:159–177, 2020.
  • [19] Misha E. Kilmer, Per Christian Hansen, and Malena I. Espanol. A projection–based approach to general-form Tikhonov regularization. SIAM J. Sci. Comput., 29(1):315–330, 2007.
  • [20] Misha E Kilmer and Dianne P O’Leary. Choosing regularization parameters in iterative methods for ill-posed problems. SIAM J. Matrix Anal. Appl., 22(4):1204–1221, 2001.
  • [21] Jörg Lampe, Lothar Reichel, and Heinrich Voss. Large-scale Tikhonov regularization via reduction by orthogonal projection. Linear Algebra Appl., 436(8):2845–2865, 2012.
  • [22] Quanjun Lang and Fei Lu. Small noise analysis for Tikhonov and RKHS regularizations. arXiv preprint arXiv:2305.11055, 2023.
  • [23] Gongsheng Li and Zuhair Nashed. A modified Tikhonov regularization for linear operator equations. Numerical functional analysis and optimization, 26(4-5):543–563, 2005.
  • [24] Haibo Li. A preconditioned Krylov subspace method for linear inverse problems with general-form Tikhonov regularization. arXiv:2308.06577v1, 2023.
  • [25] Fei Lu, Qingci An, and Yue Yu. Nonparametric learning of kernels in nonlocal operators. Journal of Peridynamics and Nonlocal Modeling, pages 1–24, 2023.
  • [26] Fei Lu, Quanjun Lang, and Qingci An. Data adaptive RKHS Tikhonov regularization for learning kernels in operators. Proceedings of Mathematical and Scientific Machine Learning, PMLR 190:158-172, 2022.
  • [27] Fei Lu and Miao-Jung Yvonne Ou. An adaptive RKHS regularization for Fredholm integral equations. arXiv preprint arXiv:2303.13737, 2023.
  • [28] M Zuhair Nashed and Grace Wahba. Generalized inverses in reproducing kernel spaces: an approach to regularization of linear operator equations. SIAM Journal on Mathematical Analysis, 5(6):974–987, 1974.
  • [29] Dianne P O’Leary and John A Simmons. A bidiagonalization-regularization procedure for large scale discretizations of ill-posed problems. SIAM J. Sci. Statist. Comput., 2(4):474–489, 1981.
  • [30] Christopher C Paige and Michael A Saunders. LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software, 8:43–71, 1982.
  • [31] Lothar Reichel, Fiorella Sgallari, and Qiang Ye. Tikhonov regularization based on generalized Krylov subspace methods. Appl. Numer. Math., 62(9):1215–1228, 2012.
  • [32] R A Renaut, S Vatankhah, and V E Ardesta. Hybrid and iteratively reweighted regularization by unbiased predictive risk and weighted GCV for projected systems. SIAM J. Sci. Statist. Comput., 39(2):B221–B243, 2017.
  • [33] Leonid I Rudin, Stanley Osher, and Emad Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: nonlinear phenomena, 60(1-4):259–268, 1992.
  • [34] Robert Tibshirani. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.
  • [35] Andrei Nikolajevits Tikhonov. Solution of incorrectly formulated problems and the regularization method. Soviet Math., 4:1035–1038, 1963.
  • [36] Grace Wahba. Convergence rates of certain approximate solutions to Fredholm integral equations of the first kind. Journal of Approximation Theory, 7(2):167–185, 1973.
  • [37] Grace Wahba. Practical approximate solutions to linear operator equations when the data are noisy. SIAM journal on numerical analysis, 14(4):651–667, 1977.
  • [38] Yimin Wei, Pengpeng Xie, and Liping Zhang. Tikhonov regularization and randomized GSVD. SIAM J. Matrix Anal. Appl., 37(2):649–675, 2016.
  • [39] Hua Xiang and Jun Zou. Regularization with randomized SVD for large-scale discrete inverse problems. Inverse Problems, 29(8):085008, 2013.
  • [40] Hua Xiang and Jun Zou. Randomized algorithms for large-scale inverse problems with general Tikhonov regularizations. Inverse Probl., 31(8):085008, 2015.
  • [41] Ye Zhang and Chuchu Chen. Stochastic asymptotical regularization for linear inverse problems. Inverse Problems, 39(1):015007, 2022.