
Scalable iterative data-adaptive RKHS regularization

Haibo Li (School of Mathematics and Statistics, The University of Melbourne, Victoria, Australia), Jinchao Feng (Department of Mathematics, School of Sciences, Great Bay University, Dongguan, China), and Fei Lu (Department of Mathematics, Johns Hopkins University, Baltimore, USA).
Abstract

We present iDARR, a scalable iterative Data-Adaptive RKHS Regularization method, for solving ill-posed linear inverse problems. The method searches for solutions in subspaces where the true solution can be identified, with the data-adaptive RKHS penalizing the spaces of small singular values. At the core of the method is a new generalized Golub-Kahan bidiagonalization procedure that recursively constructs orthonormal bases for a sequence of RKHS-restricted Krylov subspaces. The method is scalable with a complexity of O(kmn) for m-by-n matrices, where k denotes the number of iterations. Numerical tests on the Fredholm integral equation and 2D image deblurring show that it outperforms the widely used L^{2} and l^{2} norms, producing stable, accurate solutions that converge consistently as the noise level decays.

Keywords: iterative regularization; ill-posed inverse problem; reproducing kernel Hilbert space; Golub-Kahan bidiagonalization; deconvolution.

AMS subject classifications(MSC2020): 47A52, 65F22, 65J20

1 Introduction

This study considers large-scale ill-posed linear inverse problems with little prior information on the regularization norm. The goal is to reliably recover high-dimensional vectors x\in\mathbb{R}^{n} from the equation

Ax+\mathbf{w}=b,\quad A\in\mathbb{R}^{m\times n}, (1.1)

where A and b are the data-dependent forward mapping and output, and \mathbf{w} denotes noise or measurement error. The problem is ill-posed in the sense that the least squares solution with minimal Euclidean norm, often computed as x_{LS}=A^{\dagger}b or x_{LS}=(A^{\top}A)^{\dagger}A^{\top}b with \dagger denoting the pseudo-inverse, is sensitive to perturbations in b. Such ill-posedness occurs when the singular values of A decay to zero faster than the projections of the perturbation in b onto the corresponding singular vectors.

Regularization is crucial to producing stable solutions for the ill-posed inverse problem. Broadly, it encompasses two integral components: a penalty term that defines the search domain and a hyperparameter that controls the strength of regularization. There are two primary approaches to implementing regularization: direct methods, which rely on matrix decomposition, e.g., the Tikhonov regularization [35], the truncated singular value decomposition (SVD) [15, 11]; and iterative methods, which use matrix-vector computations to scale for high-dimensional problems, see e.g., [12, 7, 41] for recent developments.

In our setting, we encounter two primary challenges: selecting an adaptive regularization norm and devising an iterative method to ensure scalability. The need for an adaptive norm arises from the variability of the forward map A across different applications and the often limited prior information about the regularity of x. Many existing regularization norms, such as the Euclidean norms used in Tikhonov methods in [15, 35] and the total variation norm in [33], lack this adaptability; for more examples, see the related work section below. Although a data-adaptive regularization norm has been proposed in [26, 25] for nonparametric regression, it is implicitly defined and requires a spectral decomposition of the normal operator.

We introduce iDARR, an iterative Data-Adaptive Reproducing kernel Hilbert space Regularization method. This method resolves both challenges by iteratively solving the subspace-projected problem

x_{k}=\mathop{\mathrm{argmin}}_{x\in\mathcal{X}_{k}}\|x\|_{C_{rkhs}},\ \ \mathcal{X}_{k}=\{x:\min_{x\in\mathcal{S}_{k}}\|Ax-b\|_{2}\},

where \|\cdot\|_{C_{rkhs}} is the implicitly defined semi-norm of a data-adaptive RKHS (DA-RKHS), and \mathcal{S}_{k} are subspaces of the DA-RKHS constructed by a generalized Golub-Kahan bidiagonalization (gGKB). By stopping the iteration early using the L-curve criterion, it produces a stable, accurate solution without using any matrix decomposition.

The DA-RKHS is a space defined by the data and model, embodying the intrinsic nature of the inverse problem. Its closure is the data-dependent space in which the true solution can be recovered, particularly when A is rank-deficient. Thus, when used for regularization, it confines the solution search to the right space and penalizes the small singular values, leading to stable solutions. We construct this DA-RKHS by reformulating eq. 1.1 as a weighted Fredholm integral equation of the first kind and examining the identifiability of the input signal, as detailed in Section 2.

Our key innovation is the gGKB. It constructs solution subspaces in the DA-RKHS without explicitly computing it. It is scalable with a cost of only O(kmn), where k is the number of iterations. This cost is orders of magnitude smaller than that of direct methods based on the spectral decomposition of A^{\top}A, which typically take O(n^{3}+mn^{2}) operations.

The iDARR and gGKB have solid mathematical foundations. We prove that each subspace \mathcal{S}_{k} is restricted to the DA-RKHS, and we therefore call them RKHS-restricted Krylov subspaces. Each is spanned by the orthonormal vectors produced by gGKB. Importantly, if not stopped early, the gGKB terminates when the RKHS-restricted Krylov subspace is fully explored, and the solution in each iteration is unique.

Systematic numerical tests employing the Fredholm integral equations demonstrate that iDARR surpasses traditional iterative methods employing l^{2} and L^{2} norms in the state-of-the-art IR TOOLS package [12]. Notably, iDARR delivers accurate estimators whose errors decay consistently with the noise level. This superior performance is evident irrespective of whether the spectral decay is exponential or polynomial, or whether the true solution resides inside or outside the DA-RKHS. Furthermore, our application to image deblurring underscores both its scalability and accuracy.

Main contributions. Our main contribution lies in developing iDARR, a scalable iterative regularization method tailored for large-scale ill-posed inverse problems with little prior knowledge about the solution. The cornerstone of iDARR is the introduction of a new data-adaptive RKHS determined by the underlying model and the data. A key technical innovation is the gGKB, which efficiently constructs solution subspaces of the implicitly defined DA-RKHS.

1.1 Related work

Numerous regularization methods have been developed, and the literature on this topic is too vast to be surveyed here; we refer to [15, 11, 12] and references therein for an overview. In the following, we compare iDARR with the most closely related works.

Regularization norms

Various regularization norms exist, such as the Euclidean norms of Tikhonov in [15, 35], the total variation norm \|x^{\prime}\|_{L^{1}} of the Rudin–Osher–Fatemi method in [33], the L^{1} norm \|x\|_{L^{1}} of LASSO in [34], and the RKHS norm \|x\|_{R}^{2} of an RKHS with a user-specified reproducing kernel R [37, 3, 9]. These norms, however, are often based on presumed properties of the solution and do not consider the specifics of each inverse problem. Our RKHS norm differs by adapting to the model and data: our RKHS has a reproducing kernel determined by the inverse problem, and its closure is the space in which the solution can be identified, making it an apt choice for regularization in the absence of additional solution information.

Iterative regularization (IR) methods

IR methods are scalable by accessing the matrix only via matrix-vector multiplications, producing a sequence of estimators until an early stopping, where the iteration number plays the role of the regularization parameter. IR has a rich and extensive history and continues to be a vibrant area of interest in contemporary studies [30, 1, 12, 24]. Different regularization terms lead to different methods. The LSQR algorithm [30, 5] with early stopping is standard for \|x\|_{2}^{2}-regularization. It solves projected problems in Krylov subspaces before transforming back to the original domain. For \|Lx\|_{2}^{2} with L\in\mathbb{R}^{p\times n}, widely used methods include the joint bidiagonalization method [19, 18], the generalized Krylov subspace method [21, 31], randomized SVD or generalized SVD methods [39, 40, 38], the modified truncated SVD method [2, 17], etc. For the general regularization norm of the form x^{T}Mx with a symmetric matrix M, the MLSQR in [1] treats positive definite M, and the preconditioned GKB [24] handles positive semi-definite M. Our iDARR addresses the case where M is unavailable but M^{\dagger} is readily available.

Golub-Kahan bidiagonalization (GKB)

The GKB was first used to solve inverse problems in [29], which generates orthonormal bases for Krylov subspaces in (\mathbb{R}^{n},\langle\cdot,\cdot\rangle_{2}) and (\mathbb{R}^{m},\langle\cdot,\cdot\rangle_{2}). This method extends to bounded linear compact operators between Hilbert spaces, with properties and convergence results detailed in [6]. Our gGKB extends the method to construct RKHS-restricted Krylov subspaces in (\mathbb{R}^{n},\langle\cdot,\cdot\rangle_{C_{rkhs}}) and (\mathbb{R}^{m},\langle\cdot,\cdot\rangle_{2}), where C_{rkhs} is positive semidefinite, and, in particular, only C_{rkhs}^{\dagger} is available.

The remainder of this paper is organized as follows: Section 2 introduces the adaptive RKHS with a characterization of its norm. Section 3 presents in detail the iDARR. Section 4 proves the desired properties of gGKB. In Section 5, we systematically examine the algorithm and demonstrate the robust convergence of the estimator when the noise becomes small. Finally, Section 6 concludes with a discussion on future developments.

2 A Data Adaptive RKHS for Regularization

This section introduces a data-adaptive RKHS (DA-RKHS) that adapts to the model and data in the inverse problem. The closure of this DA-RKHS is the function (or vector) space in which the true solution can be recovered, or equivalently, the inverse problem is well-defined in the sense that the loss function has a unique minimizer. Hence, when its norm is used for regularization, this DA-RKHS ensures that the minimization searches in the space where we can identify the solution.

To describe the DA-RKHS, we first present a unified notation that applies to both discrete and continuous time models using a weighted Fredholm integral equation of the first kind. Based on this notation, we write the normal operator as an integral operator emerging in a variational formulation of the inverse problem. The integral kernel is the reproducing kernel for the DA-RKHS. In other words, the normal operator defines the DA-RKHS. Finally, we briefly review a DA-RKHS Tikhonov regularization algorithm, the DARTR algorithm.

2.1 Unified notation for discrete and continuous models

The linear equation eq. 1.1 can arise from discrete or continuous inverse problems. In either case, we can present the inverse problem using the prototype of the Fredholm integral equation of the first kind. We consider only the 1D case for simplicity, and the extension to high-dimensional cases is straightforward. Specifically, let \mathcal{S},\mathcal{T}\subset\mathbb{R} be two compact sets. We aim to recover the function \phi:\mathcal{S}\to\mathbb{R} in the Fredholm equation

y(t)=\int_{\mathcal{S}}K(t,s)\phi(s)\nu(ds)+\sigma\dot{W}(t)=:L_{K}\phi(t)+\sigma\dot{W}(t),\quad\forall t\in\mathcal{T} (2.1)

from discrete noisy data

b=(y(t_{1}),\ldots,y(t_{m}))^{\top}\in\mathbb{R}^{m},

where we assume the observation index set \mathcal{T}=\{t_{j}\}_{j=1}^{m} satisfies 0=t_{0}<t_{1}<\cdots<t_{m} for simplicity. Here the measurement noise \sigma\dot{W}(t) is the white noise scaled by \sigma; that is, the noise at t_{j} has a Gaussian distribution \mathcal{N}(0,\sigma^{2}(t_{j+1}-t_{j})) for each j. Such noise is integrable when the observation mesh refines, i.e., when \max_{j}(t_{j+1}-t_{j}) vanishes.

Here the finite measure \nu can be either the Lebesgue measure with \mathcal{S} being an interval, or an atom measure with \mathcal{S} having finitely many elements. Correspondingly, the Fredholm integral equation eq. 2.1 is either a continuous or a discrete model.

In either case, the goal is to solve for the function \phi:\mathcal{S}\to\mathbb{R} in eq. 2.1. When seeking a solution in the form \phi=\sum_{i=1}^{n}x_{i}\phi_{i}, where \{\phi_{i}\}_{i=1}^{n} is a pre-selected set of basis functions, we obtain the linear equation eq. 1.1 with x=(x_{1},\ldots,x_{n})^{\top}\in\mathbb{R}^{n} and the matrix A with entries

A(j,i)=\int_{\mathcal{S}}K(t_{j},s)\phi_{i}(s)\nu(ds)=L_{K}\phi_{i}(t_{j}),\quad 1\leq j\leq m,\ 1\leq i\leq n. (2.2)

In particular, when \{\phi_{i}\} are piecewise constant, we obtain A as follows; a small construction sketch is given after the list.

  • Discrete model. Let \nu be an atom measure on \mathcal{S}=\{s_{i}\}_{i=1}^{n}, a set with n elements. Suppose that the basis functions are \phi_{i}(s)=\mathbf{1}_{\{s_{i}\}}(s). Then, \phi=x and the matrix A has entries A(j,i)=K(t_{j},s_{i})\nu(s_{i}).

  • Continuous model. Let \nu be the Lebesgue measure on \mathcal{S}=[0,1], and let \phi_{i}(s)=\mathbf{1}_{[s_{i-1},s_{i}]}(s) be piecewise constant functions on a partition of \mathcal{S} with 0=s_{0}<s_{1}<\ldots<s_{n}=1. Then, \phi=\sum_{i=1}^{n}x_{i}\phi_{i} and the matrix A has entries A(j,i)=K(t_{j},s_{i})(s_{i}-s_{i-1}).
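For concreteness, here is a minimal NumPy sketch of the two constructions above. The kernel K, the grids, and the sizes m and n are illustrative assumptions only, not the settings used later in the paper.

import numpy as np

def K(t, s):
    # hypothetical smooth kernel, for illustration only
    return np.exp(-(t - s) ** 2)

m, n = 200, 100
t = np.linspace(0.0, 1.0, m + 1)[1:]         # observation points t_1, ..., t_m

# Continuous model: nu = Lebesgue measure on S = [0,1], phi_i = 1_{[s_{i-1}, s_i]}
s = np.linspace(0.0, 1.0, n + 1)             # partition 0 = s_0 < s_1 < ... < s_n = 1
ds = np.diff(s)                              # s_i - s_{i-1}
A_cont = K(t[:, None], s[None, 1:]) * ds     # A(j,i) = K(t_j, s_i)(s_i - s_{i-1})

# Discrete model: nu = atom measure on {s_1, ..., s_n} with weights nu(s_i)
nu = np.full(n, 1.0 / n)                     # illustrative atom weights
A_disc = K(t[:, None], s[None, 1:]) * nu     # A(j,i) = K(t_j, s_i) nu(s_i)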

The default function spaces for \phi and y above are L^{2}_{\nu}(\mathcal{S}) and L^{2}_{\mu}(\mathcal{T}). The loss function \mathcal{E}(x)=\|Ax-b\|_{2}^{2} over L^{2}_{\nu}(\mathcal{S}) becomes

\mathcal{E}(\phi):=\left\|L_{K}\phi-y\right\|^{2}_{L^{2}_{\mu}(\mathcal{T})}=\langle L_{K}\phi,L_{K}\phi\rangle_{L^{2}_{\mu}(\mathcal{T})}-2\langle L_{K}\phi,y\rangle_{L^{2}_{\mu}(\mathcal{T})}+\left\|y\right\|^{2}_{L^{2}_{\mu}(\mathcal{T})}. (2.3)

Eq. 2.1 is a prototype of ill-posed inverse problems, dating back to Hadamard [14], and it remains a testbed for new regularization methods [36, 28, 15, 23].

The L^{2}_{\nu}(\mathcal{S}) norm is often the default choice for regularization. However, it has a major drawback: it does not take into account the operator L_{K}, particularly when L_{K} has zero eigenvalues, and it can lead to unstable solutions that may blow up in the small noise limit [22]. To avoid such instability, particularly for iterative methods, we introduce in the next sections a weighted function space and an RKHS that are adaptive to both the data and the model.

2.2 The function space of identifiability

We first introduce a weighted function space L^{2}_{\rho}(\mathcal{S}), where the measure \rho is defined as

\frac{d\rho}{d\nu}(s):=\frac{1}{Z}\int_{\mathcal{T}}|K(t,s)|\mu(dt),\quad\forall s\in\mathcal{S}, (2.4)

where Z=\int_{\mathcal{S}}\int_{\mathcal{T}}|K(t,s)|\mu(dt)\nu(ds) is a normalizing constant. This measure quantifies how the data explore the unknown function through the integral kernel K at the output set \mathcal{T}, i.e., through \{K(t_{j},\cdot)\}_{t_{j}\in\mathcal{T}}; hence it is referred to as the exploration measure. In particular, when eq. 1.1 is a discrete model, the exploration measure is the normalized column sum of the absolute values of the matrix A.
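In the discrete setting, \rho can therefore be computed directly from A; the following small sketch only makes the normalization explicit.

import numpy as np

def exploration_measure(A):
    # normalized column sums of |A|: rho_i is proportional to sum_j |A(j, i)|
    col = np.abs(A).sum(axis=0)
    return col / col.sum()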

The major advantage of the space L^{2}_{\rho}(\mathcal{S}) over the original space L^{2}_{\nu}(\mathcal{S}) is that it is adaptive to the specific setting of the inverse problem. In particular, this weighted space takes into account the structure of the integral kernel and the data points in \mathcal{T}. Thus, while the following introduction of the RKHS can be carried out in both L^{2}_{\rho}(\mathcal{S}) and L^{2}_{\nu}(\mathcal{S}), we will focus only on L^{2}_{\rho}(\mathcal{S}).

Next, we consider the variational inverse problem over L^{2}_{\rho}, and the goal is to find a minimizer of the loss function eq. 2.3 in L^{2}_{\rho}. Since the loss function is quadratic, its Hessian is a symmetric positive linear operator, and it has a unique minimizer in the linear subspace where the Hessian is strictly positive. We assign a name to this subspace in the next definition.

Definition 2.1 (Function space of identifiability)

In a variational inverse problem of minimizing a quadratic loss function \mathcal{E} in L^{2}_{\rho}, we call {\mathcal{L}_{\overline{G}}}=\frac{1}{2}\nabla^{2}\mathcal{E} the normal operator, where \nabla^{2}\mathcal{E} is the Hessian of \mathcal{E}, and we call H=\mathrm{Null}({\mathcal{L}_{\overline{G}}})^{\perp} the function space of identifiability (FSOI).

The next lemma specifies the FSOI for the loss function in eq. 2.3 (see [26] for its proof).

Lemma 2.2

Assume K\in C(\mathcal{T}\times\mathcal{S}). For \rho in eq. 2.4, define {\overline{G}}:\mathcal{S}\times\mathcal{S}\to\mathbb{R} as

{\overline{G}}(s,s^{\prime}):=\frac{G(s,s^{\prime})}{\frac{d\rho}{d\nu}(s)\frac{d\rho}{d\nu}(s^{\prime})},\qquad G(s,s^{\prime}):=\int_{\mathcal{T}}K(t,s)K(t,s^{\prime})\mu(dt). (2.5)
  • (a)

    The normal operator for \mathcal{E} in eq. 2.3 over L^{2}_{\rho} is {\mathcal{L}_{\overline{G}}}:L^{2}_{\rho}\to L^{2}_{\rho} defined by

    {\mathcal{L}_{\overline{G}}}\phi(s):=\int_{\mathcal{S}}\phi(s^{\prime}){\overline{G}}(s,s^{\prime})\rho(ds^{\prime}), (2.6)

    and the loss function can be written as

    \mathcal{E}(\phi)=\langle{\mathcal{L}_{\overline{G}}}\phi,\phi\rangle_{L^{2}_{\rho}}-2\langle\phi^{D},\phi\rangle_{L^{2}_{\rho}}+const., (2.7)

    where \phi^{D} comes from the Riesz representation such that \langle\phi^{D},\phi\rangle_{L^{2}_{\rho}}=\langle L_{K}\phi,y\rangle_{L^{2}_{\mu}(\mathcal{T})} for any \phi\in L^{2}_{\rho}.

  • (b)

    {\mathcal{L}_{\overline{G}}} is compact, self-adjoint, and positive. Hence, its eigenvalues \{\lambda_{i}\}_{i\geq 1} converge to zero and its orthonormal eigenfunctions \{\psi_{i}\}_{i} form a complete basis of L^{2}_{\rho}(\mathcal{S}).

  • (c)

    The FSOI is H:=\overline{\mathrm{span}\{\psi_{i}\}_{i:\lambda_{i}>0}}\subset L^{2}_{\rho}(\mathcal{S}), and the unique minimizer of \mathcal{E} in H is \widehat{\phi}={\mathcal{L}_{\overline{G}}}^{-1}\phi^{D}, where {\mathcal{L}_{\overline{G}}}^{-1} is the inverse of {\mathcal{L}_{\overline{G}}}:H\to L^{2}_{\rho}.

Lemma 2.2 reveals the cause of the ill-posedness, and provides insights on regularization:

  • The variational inverse problem is well-defined only in the FSOI H, which can be a proper subset of L^{2}_{\rho}. Its ill-posedness in H depends on the smallest eigenvalue of the operator {\mathcal{L}_{\overline{G}}} and the error in \phi^{D}.

  • When the data is noiseless, the loss function can only identify the H-projection of the true input function. When the data is noisy, the minimizer {\mathcal{L}_{\overline{G}}}^{-1}\phi^{D} is ill-defined in L^{2}_{\rho} when \phi^{D}\notin{\mathcal{L}_{\overline{G}}}(H).

As a result, when regularizing the ill-posed problem, an important task is to ensure that the search takes place in the FSOI and to remove the noise-contaminated components that make \phi^{D}\notin{\mathcal{L}_{\overline{G}}}(H).

2.3 A Data-adaptive RKHS

Our data-adaptive RKHS is the RKHS with {\overline{G}} in eq. 2.5 as a reproducing kernel. Hence, it is adaptive to the integral kernel K and the data in the model. When its norm is used for regularization, it ensures that the search takes place in the FSOI because its L^{2}_{\rho} closure is the FSOI; also, it penalizes the components in \phi^{D} corresponding to the small singular values.

The next lemma characterizes this RKHS, and we refer to [26] for its proof.

Lemma 2.3 (Characterization of the adaptive RKHS)

Assume K\in C(\mathcal{T}\times\mathcal{S}). The RKHS H_{G} with {\overline{G}} as its reproducing kernel satisfies the following properties.

  1. (a)

    H_{G}:={\mathcal{L}_{\overline{G}}}^{\frac{1}{2}}(L^{2}_{\rho}(\mathcal{S})) has inner product \langle\phi,\phi\rangle_{H_{G}}=\langle{\mathcal{L}_{\overline{G}}}^{-\frac{1}{2}}\phi,{\mathcal{L}_{\overline{G}}}^{-\frac{1}{2}}\phi\rangle_{L^{2}_{\rho}(\mathcal{S})}.

  2. (b)

    \{\sqrt{\lambda_{i}}\psi_{i}\}_{\lambda_{i}>0} is an orthonormal basis of H_{G}, where \{(\lambda_{i},\psi_{i})\}_{i} are the eigen-pairs of {\mathcal{L}_{\overline{G}}}.

  3. (c)

    For any \phi=\sum_{i=1}^{\infty}c_{i}\psi_{i}\in H_{G}, we have

    \langle L_{K}\phi,L_{K}\phi\rangle_{L^{2}_{\mu}(\mathcal{T})}=\sum_{i=1}^{\infty}\lambda_{i}c_{i}^{2},\quad\|\phi\|^{2}_{L^{2}_{\rho}}=\sum_{i=1}^{\infty}c_{i}^{2},\quad\|\phi\|^{2}_{H_{G}}=\sum_{i=1}^{\infty}\lambda_{i}^{-1}c_{i}^{2}.
  4. (d)

    H=\overline{H_{G}} with the closure taken in L^{2}_{\rho}(\mathcal{S}), where H=\overline{\mathrm{span}\{\psi_{i}\}_{i:\lambda_{i}>0}} is the FSOI.

The next theorem shows the computation of the RKHS norm for the problem eq. 1.1 when it is written in the form eq. 2.1–eq. 2.2. A key component is solving the eigenvalues of {\mathcal{L}_{\overline{G}}} through a generalized eigenvalue problem.

Theorem 2.4 (Computation of RKHS norm)

Suppose that eq. 1.1 is equivalent to eq. 2.1 under basis functions \{\phi_{i}\}_{i=1}^{n} with n\leq\infty and eq. 2.2. Let B with entries B(i,j)=\langle\phi_{i},\phi_{j}\rangle_{L^{2}_{\rho}} be the non-singular basis matrix, where \rho is the measure defined in eq. 2.4. Then, the operator {\mathcal{L}_{\overline{G}}} in eq. 2.6 has eigenvalues (\lambda_{1},\ldots,\lambda_{n}) solved by the generalized eigenvalue problem:

A^{\top}AV=BV\Lambda,\quad \text{s.t.}\ \ V^{\top}BV=I_{n},\quad\Lambda=\mathrm{diag}(\lambda_{1},\ldots,\lambda_{n}), (2.8)

and the eigenfunctions are \{\psi_{k}=\sum_{j=1}^{n}V_{jk}\phi_{j}\}_{k}. The RKHS norm of \phi=\sum_{i=1}^{n}x_{i}\phi_{i} satisfies

\|\phi\|_{H_{G}}^{2}=\|x\|_{C_{rkhs}}^{2}=x^{\top}C_{rkhs}x,\qquad C_{rkhs}=(V\Lambda V^{\top})^{\dagger}=B(A^{\top}A)^{\dagger}B,\quad C_{rkhs}^{\dagger}=B^{-1}(A^{\top}A)B^{-1}. (2.9)

In particular, if B=I_{n}, we have C_{rkhs}=(A^{\top}A)^{\dagger}.

Proof. Denote \boldsymbol{\Phi}=(\phi_{1},\ldots,\phi_{n})^{\top} and \boldsymbol{\Psi}=(\psi_{1},\ldots,\psi_{n})^{\top}. We first prove that the eigenvalues of {\mathcal{L}_{\overline{G}}} are solved by eq. 2.8. Suppose \{(\lambda_{i},\psi_{i})\}_{i=1}^{n} are the eigenvalues and eigenfunctions of {\mathcal{L}_{\overline{G}}} over L^{2}_{\rho}, with \{\psi_{i}\} being an orthonormal basis of L^{2}_{\rho}(\mathcal{S}). Since \mathcal{H}=\mathrm{span}\{\phi_{i}\}_{i=1}^{n}\supseteq{\mathcal{L}_{\overline{G}}}(L^{2}_{\rho}), there exists V\in\mathbb{R}^{n\times n} such that \boldsymbol{\Psi}=V^{\top}\boldsymbol{\Phi}, i.e., \psi_{k}=\sum_{j=1}^{n}V_{jk}\phi_{j} for each 1\leq k\leq n. Then, the task is to verify that V and \Lambda=\mathrm{Diag}(\lambda_{1},\ldots,\lambda_{n}) satisfy A^{\top}AV=BV\Lambda and V^{\top}BV=I_{n}.

The orthonormality of {ψi}\{\psi_{i}\} implies that

I_{n}=\big(\langle\psi_{k},\psi_{l}\rangle_{L^{2}_{\rho}}\big)_{1\leq k,l\leq n}=\big(\langle\sum_{i=1}^{n}V_{ik}\phi_{i},\sum_{j=1}^{n}V_{jl}\phi_{j}\rangle_{L^{2}_{\rho}}\big)_{1\leq k,l\leq n}=V^{\top}BV.

Note that [A^{\top}A](j,i)=\langle L_{K}\phi_{i},L_{K}\phi_{j}\rangle_{L^{2}_{\mu}(\mathcal{T})}=\langle\phi_{j},{\mathcal{L}_{\overline{G}}}\phi_{i}\rangle_{L^{2}_{\rho}} for 1\leq i,j\leq n. Then,

\langle\phi_{j},{\mathcal{L}_{\overline{G}}}\psi_{k}\rangle_{L^{2}_{\rho}}=\sum_{i=1}^{n}[A^{\top}A](j,i)V_{ik}.

Meanwhile, the eigen-equation {\mathcal{L}_{\overline{G}}}\psi_{k}=\lambda_{k}\psi_{k} implies that for each \phi_{j},

\langle\phi_{j},{\mathcal{L}_{\overline{G}}}\psi_{k}\rangle_{L^{2}_{\rho}}=\lambda_{k}\langle\phi_{j},\psi_{k}\rangle_{L^{2}_{\rho}}=\lambda_{k}\langle\phi_{j},\sum_{i=1}^{n}V_{ik}\phi_{i}\rangle_{L^{2}_{\rho}}=\lambda_{k}\sum_{i=1}^{n}B_{ji}V_{ik},

i.e., \big(\langle\phi_{j},{\mathcal{L}_{\overline{G}}}\psi_{k}\rangle_{L^{2}_{\rho}}\big)=BV\Lambda. Hence, these two equations imply that A^{\top}AV=BV\Lambda.

Next, to compute the norm of \phi=\sum_{i=1}^{n}x_{i}\phi_{i}\in H_{G}, we write it as \phi=x^{\top}\boldsymbol{\Phi}=x^{\top}V^{-1}\boldsymbol{\Psi}. Then, its norm is

\|\phi\|_{H_{G}}^{2}=\sum_{k=1}^{n}\lambda_{k}^{-1}\big(x^{\top}V^{-1}\big)_{k}^{2}=x^{\top}V^{-1}\Lambda^{\dagger}V^{-\top}x=x^{\top}(V\Lambda V^{\top})^{\dagger}x.

Thus, C_{rkhs}=(V\Lambda V^{\top})^{\dagger}=B(A^{\top}A)^{\dagger}B and C_{rkhs}^{\dagger}=V\Lambda V^{\top}=B^{-1}(A^{\top}A)B^{-1}.

In particular, when eq. 1.1 is either a discrete model or a discretization of eq. 2.1 based on a Riemann sum approximation of the integral, the exploration measure \rho is the normalized column sum of the absolute values of the matrix A, and B=\mathrm{diag}(\rho). See Section 5.1 for details.
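As an illustration of Theorem 2.4 in this diagonal-B setting, the following sketch computes \|x\|_{C_{rkhs}}^{2} through the generalized eigenvalue problem eq. 2.8, using SciPy's symmetric generalized eigensolver; the tolerance for discarding zero eigenvalues is an ad hoc choice.

import numpy as np
from scipy.linalg import eigh

def rkhs_norm_sq(A, rho, x, tol=1e-12):
    B = np.diag(rho)
    lam, V = eigh(A.T @ A, B)                 # A^T A V = B V Lambda with V^T B V = I, as in eq. 2.8
    lam = np.clip(lam, 0.0, None)
    c = V.T @ (B @ x)                         # coefficients of x in the eigenbasis {psi_k}
    keep = lam > tol * max(lam.max(), 1.0)    # restrict to the FSOI (positive eigenvalues)
    return float(np.sum(c[keep] ** 2 / lam[keep]))   # sum_k lambda_k^{-1} c_k^2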

2.4 DARTR: data-adaptive RKHS Tikhonov regularization

We review DARTR, a data-adaptive RKHS Tikhonov regularization algorithm introduced in [26].

Specifically, it solves the problem eq. 1.1 with regularization:

(\widehat{x}_{\lambda_{*}},\lambda_{*})=\mathop{\mathrm{argmin}}_{x\in\mathbb{R}^{n},\lambda\in\mathbb{R}^{+}}\mathcal{E}_{\lambda}(x),\quad\text{ where }\mathcal{E}_{\lambda}(x):=\|Ax-b\|^{2}+\lambda\|x\|_{C_{rkhs}}^{2},

where the norm \|\cdot\|_{C_{rkhs}} is the DA-RKHS norm introduced in Theorem 2.4. A direct approach to minimizing \mathcal{E}_{\lambda}(x) is to solve (A^{\top}A+\lambda C_{rkhs})x_{\lambda}=A^{\top}b. However, the computation of C_{rkhs} requires a pseudo-inverse that may cause numerical instability.

DARTR introduces a transformation matrix C_{*}:=V\Lambda^{1/2} to avoid using the pseudo-inverse. Note that C_{*}^{\top}C_{rkhs}C_{*}=\begin{pmatrix}I_{r}&0\\ 0&0\end{pmatrix}:=\mathbf{I}_{r}, where I_{r} is the identity matrix of rank r, with r the number of positive eigenvalues in \Lambda. Then, the linear equation (A^{\top}A+\lambda C_{rkhs})x_{\lambda}=A^{\top}b is equivalent to

(C_{*}A^{\top}AC_{*}+\lambda\mathbf{I}_{r})\widetilde{x}_{\lambda}=C_{*}A^{\top}b (2.10)

with \widetilde{x}_{\lambda}=C_{*}^{-1}x_{\lambda}. Thus, DARTR computes \widetilde{x}_{\lambda} from the above equation by least squares with minimal norm, and returns x_{\lambda}=C_{*}\widetilde{x}_{\lambda}.
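A minimal sketch of this direct solve is given below. It assumes B=\mathrm{diag}(\rho), uses SciPy's generalized eigensolver for eq. 2.8, and left-multiplies by C_{*}^{\top} (consistent with the identity C_{*}^{\top}C_{rkhs}C_{*}=\mathbf{I}_{r} above); the regularization parameter lam_reg is taken as an input rather than selected.

import numpy as np
from scipy.linalg import eigh

def dartr(A, b, rho, lam_reg):
    B = np.diag(rho)
    lam, V = eigh(A.T @ A, B)                        # generalized eigenproblem eq. 2.8
    lam = np.clip(lam, 0.0, None)
    C_star = V * np.sqrt(lam)                        # C_* = V Lambda^{1/2} (column scaling)
    M = C_star.T @ (A.T @ A) @ C_star + lam_reg * np.eye(A.shape[1])
    rhs = C_star.T @ (A.T @ b)
    x_tilde = np.linalg.lstsq(M, rhs, rcond=None)[0] # least squares with minimal norm
    return C_star @ x_tilde                          # x_lambda = C_* x_tilde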

DARTR is a direct method based on matrix decomposition, and it takes O(mn^{2}+n^{3}) flops. Hence, it is computationally infeasible when n is large. The iterative method in the next section implements the RKHS regularization in a scalable fashion.

3 Iterative Regularization by DA-RKHS

This section introduces a subspace projection method tailored to utilize the DA-RKHS for iterative regularization. As an iterative method, it achieves scalability by accessing the coefficient matrix only via matrix-vector multiplications, producing a sequence of estimators until reaching a desired solution. This section follows the notation conventions in Table 1.

Table 1: Table of notations.
A,B,C — matrix or array, denoted by capital letters
b,c,x,y,z,u,v — vector, denoted by regular letters
\alpha,\beta,\gamma — scalar, denoted by Greek letters
\mathcal{R}(A) and \mathcal{N}(A) — the range and null spaces of matrix A

3.1 Overview

Our regularization method is based on subspace projection in the DA-RKHS. It iteratively constructs a sequence of linear subspaces \mathcal{S}_{k} of the DA-RKHS (\mathbb{R}^{n},\langle\cdot,\cdot\rangle_{C_{rkhs}}), and recursively solves the projected problems

x_{k}=\mathop{\mathrm{argmin}}_{x\in\mathcal{X}_{k}}\|x\|_{C_{rkhs}},\ \ \mathcal{X}_{k}=\{x:\min_{x\in\mathcal{S}_{k}}\|Ax-b\|_{2}\}. (3.1)

This process yields a sequence of solutions \{x_{k}\}, each emerging from its corresponding subspace. We ensure the uniqueness of the solution within each iteration, as detailed in Theorem 4.7. The iteration proceeds until it meets an early stopping criterion, designed to exclude excessive noisy components and thereby achieve effective regularization. The spaces \mathcal{S}_{k} are called the solution subspaces, and the iteration number k plays the role of the regularization parameter.

Algorithm 1 outlines this procedure, which is a recursion of the following three parts.

  1. P1

    Construct the solution subspaces. We introduce a new generalized Golub-Kahan bidiagonalization (gGKB) to construct the solution subspaces in the DA-RKHS iteratively. The procedure is presented in Algorithm 2.

  2. P2

    Recursively update the solution to the projected problem. We solve the least squares problem in the solution subspaces in eq. 3.1 efficiently by a new LSQR-type algorithm, Algorithm 3, updating \|x_{k}\|_{C_{rkhs}} and the residual norm \|Ax-b\|_{2} without even computing the residual.

  3. P3

    Regularize by early stopping. We select the optimal k by either the discrepancy principle (DP), when we have an accurate estimate of \|\mathbf{w}\|_{2}, or the L-curve criterion otherwise.

Algorithm 1 iDARR: Iterative Data-Adaptive RKHS Regularization

Require: A\in\mathbb{R}^{m\times n}, b\in\mathbb{R}^{m}, B=\mathrm{diag}(\rho), x_{0}=\mathbf{0}, \bar{x}_{0}=\mathbf{0}, where \rho is the exploration measure in eq. 2.4.

1:for k=1,2,\ldots do
2:     P1. Compute u_{k}, z_{k}, \bar{z}_{k}, \alpha_{k}, \beta_{k} by gGKB in Algorithm 2.
3:     P2. Update x_{k}, \bar{\gamma}_{k+1}, \|x_{k}\|_{C_{rkhs}}, etc. by Algorithm 3.
4:     P3. Stop at iteration k_{*} if the early stopping criterion is satisfied. \triangleright L-curve or DP

Ensure: Final solution x_{k_{*}}

In the next subsection, we present details for these key parts. Then, we analyze the computational complexity of the algorithm.

3.2 Algorithm details and derivations

This subsection presents detailed derivations for the three parts P1-P3 in Algorithm 1.

P1. Construct the solution subspaces. We construct the solution subspaces by carefully incorporating the regularization term \|\cdot\|_{C_{rkhs}}^{2} into the Golub-Kahan bidiagonalization (GKB) process. A key point is to use C_{rkhs}^{\dagger}=B^{-1}A^{\top}AB^{-1} to avoid explicitly computing C_{rkhs}, which would involve the costly spectral decomposition of the normal operator; see eq. 2.9 in Theorem 2.4.

Consider first the case where C_{rkhs} is positive definite. In this scenario, A has full column rank, and the C_{rkhs}-inner product Hilbert space (\mathbb{R}^{n},\langle\cdot,\cdot\rangle_{C_{rkhs}}) is a discrete representation of the RKHS H_{G} with the given basis functions. Note that the true solution is mapped to the noisy b by A:(\mathbb{R}^{n},\langle\cdot,\cdot\rangle_{C_{rkhs}})\to(\mathbb{R}^{m},\langle\cdot,\cdot\rangle_{2}). Let A^{*}:(\mathbb{R}^{m},\langle\cdot,\cdot\rangle_{2})\to(\mathbb{R}^{n},\langle\cdot,\cdot\rangle_{C_{rkhs}}) be the adjoint of A, i.e., \langle Ax,b\rangle_{2}=\langle x,A^{*}b\rangle_{C_{rkhs}} for any x\in\mathbb{R}^{n} and b\in\mathbb{R}^{m}. By definition, the matrix form of A^{*} is

A^{*}=C_{rkhs}^{-1}A^{\top}, (3.2)

since \langle Ax,b\rangle_{2}=x^{\top}A^{\top}b and \langle x,A^{*}b\rangle_{C_{rkhs}}=x^{\top}C_{rkhs}A^{*}b for any x and b.

The Golub-Kahan bidiagonalization (GKB) process recursively constructs orthonormal bases for these two Hilbert spaces, starting with the vector b, as follows:

\beta_{1}u_{1}=b,\ \ \alpha_{1}z_{1}=A^{*}u_{1}, (3.3a)
\beta_{i+1}u_{i+1}=Az_{i}-\alpha_{i}u_{i}, (3.3b)
\alpha_{i+1}z_{i+1}=A^{*}u_{i+1}-\beta_{i+1}z_{i}, (3.3c)

where u_{i}\in(\mathbb{R}^{m},\langle\cdot,\cdot\rangle_{2}), z_{i}\in(\mathbb{R}^{n},\langle\cdot,\cdot\rangle_{C_{rkhs}}) with z_{0}=\boldsymbol{0}, and \alpha_{i}, \beta_{i} are normalizing factors such that \|u_{i}\|_{2}=\|z_{i}\|_{C_{rkhs}}=1. The iteration starts with u_{1}=b/\beta_{1}, where \beta_{1}=\|b\|_{2}. Using A^{*} in eq. 3.2, we write eq. 3.3c as

\alpha_{i+1}z_{i+1}=C_{rkhs}^{-1}A^{\top}u_{i+1}-\beta_{i+1}z_{i} (3.4)

with \alpha_{i+1}=\|C_{rkhs}^{-1}A^{\top}u_{i+1}-\beta_{i+1}z_{i}\|_{C_{rkhs}}. To compute \alpha_{i+1} without explicitly computing C_{rkhs}, define \bar{z}_{i}=C_{rkhs}z_{i}. Then we have

\alpha_{i+1}\bar{z}_{i+1}=A^{\top}u_{i+1}-\beta_{i+1}\bar{z}_{i}, (3.5)

where \bar{z}_{0}:=\boldsymbol{0}. Let p=A^{\top}u_{i+1}-\beta_{i+1}\bar{z}_{i}. Then we obtain \alpha_{i+1}=\|C_{rkhs}^{-1}p\|_{C_{rkhs}}=(p^{\top}C_{rkhs}^{-1}p)^{1/2}, which uses C_{rkhs}^{-1}=B^{-1}A^{\top}AB^{-1} without computing C_{rkhs}.

Next, consider the case where C_{rkhs} is positive semidefinite. The iterative process remains the same with C_{rkhs}^{-1} replaced by the pseudo-inverse C_{rkhs}^{\dagger}, because C_{rkhs}^{\dagger}=B^{-1}A^{\top}AB^{-1} has the same form as C_{rkhs}^{-1} in the non-singular case. Specifically, the recursive relation eq. 3.4 becomes

\alpha_{i+1}z_{i+1}=C_{rkhs}^{\dagger}A^{\top}u_{i+1}-\beta_{i+1}z_{i}. (3.6)

To compute \alpha_{i+1}, we use the property that z_{i}\in\mathcal{R}(C_{rkhs}), which will be proved in Proposition 4.3. Note that C_{rkhs}^{\dagger}C_{rkhs}=P_{\mathcal{N}(C_{rkhs})^{\perp}}=P_{\mathcal{R}(C_{rkhs})} since C_{rkhs} is symmetric, where P_{\mathcal{X}} is the projection operator onto the subspace \mathcal{X}. It follows that C_{rkhs}^{\dagger}C_{rkhs}z_{i}=z_{i}. Therefore, eq. 3.6 becomes

\alpha_{i+1}z_{i+1}=C_{rkhs}^{\dagger}(A^{\top}u_{i+1}-\beta_{i+1}C_{rkhs}z_{i}).

Letting \bar{z}_{i}=C_{rkhs}z_{i} and p=A^{\top}u_{i+1}-\beta_{i+1}\bar{z}_{i} again, we get \alpha_{i+1}=\|C_{rkhs}^{\dagger}p\|_{C_{rkhs}}=(p^{\top}C_{rkhs}^{\dagger}p)^{1/2}.

Thus, the two cases of C_{rkhs} lead to the same iterative process. We summarize the iterative process in Algorithm 2 and call it the generalized Golub-Kahan bidiagonalization (gGKB).

Algorithm 2 Generalized Golub-Kahan bidiagonalization (gGKB)

Require: A\in\mathbb{R}^{m\times n}, b\in\mathbb{R}^{m}, B=\mathrm{diag}(\rho)

1:Initialize: let \beta_{1}=\|b\|_{2},  u_{1}=b/\beta_{1}, and compute p=A^{\top}u_{1},   s=B^{-1}A^{\top}AB^{-1}p.
2:Let \alpha_{1}=(s^{\top}p)^{1/2},  z_{1}=s/\alpha_{1},  \bar{z}_{1}=p/\alpha_{1}.
3:for i=1,2,\dots,k do
4:     r=Az_{i}-\alpha_{i}u_{i},   \beta_{i+1}=\|r\|_{2},  u_{i+1}=r/\beta_{i+1};
5:     p=A^{\top}u_{i+1}-\beta_{i+1}\bar{z}_{i},   s=B^{-1}A^{\top}AB^{-1}p; \triangleright C_{rkhs}^{\dagger}=B^{-1}A^{\top}AB^{-1}
6:     \alpha_{i+1}=(s^{\top}p)^{1/2},  z_{i+1}=s/\alpha_{i+1},  \bar{z}_{i+1}=p/\alpha_{i+1}. \triangleright \bar{z}_{i}=C_{rkhs}z_{i}

Ensure: \{\alpha_{i},\beta_{i}\}_{i=1}^{k+1},  \{u_{i},z_{i},\bar{z}_{i}\}_{i=1}^{k+1}
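For readers who prefer code, here is a minimal NumPy sketch of Algorithm 2. It assumes B=\mathrm{diag}(\rho) is stored as the vector rho, applies C_{rkhs}^{\dagger} only through matrix-vector products, and uses an ad hoc tolerance in place of an exact check of \alpha_{k}\beta_{k}>0.

import numpy as np

def gGKB(A, b, rho, k, tol=1e-14):
    def apply_Cdag(p):                         # C_rkhs^† p = B^{-1} A^T A B^{-1} p
        return (A.T @ (A @ (p / rho))) / rho
    alphas, betas, U, Z, Zbar = [], [], [], [], []
    beta = np.linalg.norm(b)
    u = b / beta
    p = A.T @ u
    s = apply_Cdag(p)
    alpha = np.sqrt(s @ p)
    z, zbar = s / alpha, p / alpha
    alphas.append(alpha); betas.append(beta); U.append(u); Z.append(z); Zbar.append(zbar)
    for _ in range(k):
        r = A @ z - alpha * u                  # line 4 of Algorithm 2
        beta = np.linalg.norm(r)
        if beta < tol:
            break
        u = r / beta
        p = A.T @ u - beta * zbar              # line 5
        s = apply_Cdag(p)
        alpha = np.sqrt(max(s @ p, 0.0))
        if alpha < tol:
            break
        z, zbar = s / alpha, p / alpha         # line 6
        alphas.append(alpha); betas.append(beta); U.append(u); Z.append(z); Zbar.append(zbar)
    return alphas, betas, U, Z, Zbar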

Suppose the gGKB process terminates at step k_{t}:=\max_{k\geq 1}\{\alpha_{k}\beta_{k}>0\}. We show in Propositions 4.4 and 4.5 that the output vectors \{u_{i}\}_{i=1}^{k_{t}} and \{z_{i}\}_{i=1}^{k_{t}} are orthonormal in (\mathbb{R}^{m},\langle\cdot,\cdot\rangle_{2}) and (\mathbb{R}^{n},\langle\cdot,\cdot\rangle_{C_{rkhs}}), respectively. In particular, they span the two Krylov subspaces generated by \{AC_{rkhs}^{\dagger}A^{\top},b\} and \{C_{rkhs}^{\dagger}A^{\top}A,C_{rkhs}^{\dagger}A^{\top}b\}, respectively.

In matrix form, the k-step gGKB process with k<k_{t}, starting with the vector b, produces a 2-orthonormal matrix U_{k+1}=(u_{1},\dots,u_{k+1})\in\mathbb{R}^{m\times(k+1)} (i.e., U_{k+1}^{\top}U_{k+1}=I_{k+1}), a C_{rkhs}-orthonormal matrix Z_{k+1}=(z_{1},\dots,z_{k+1})\in\mathbb{R}^{n\times(k+1)}, and a bidiagonal matrix

B_{k}=\begin{pmatrix}\alpha_{1}&&&\\ \beta_{2}&\alpha_{2}&&\\ &\beta_{3}&\ddots&\\ &&\ddots&\alpha_{k}\\ &&&\beta_{k+1}\end{pmatrix}\in\mathbb{R}^{(k+1)\times k} (3.7)

such that the recursions in eq. 3.3a, eq. 3.3b and eq. 3.6 can be written as

\beta_{1}U_{k+1}e_{1}=b, (3.8a)
AZ_{k}=U_{k+1}B_{k}, (3.8b)
C_{rkhs}^{\dagger}A^{\top}U_{k+1}=Z_{k}B_{k}^{T}+\alpha_{k+1}z_{k+1}e_{k+1}^{\top}, (3.8c)

where e_{1} and e_{k+1} are the first and (k+1)-th columns of I_{k+1}. We emphasize that B_{k} is a bidiagonal matrix of full column rank, since \alpha_{i},\beta_{i}>0 for all i\leq k+1.

P2. Recursively update the solution to the projected problem. For each k\leq k_{t}, i.e., before the gGKB terminates, we solve eq. 3.1 in the subspace

\mathcal{S}_{k}:=\mathrm{span}\{z_{1},\dots,z_{k}\} (3.9)

and compute the RKHS norm of the solution.

We first show the uniqueness of the solution to eq. 3.1. From eq. 3.8b we have B_{k}=U_{k+1}^{\top}AZ_{k}, which implies that B_{k} is a projection of A onto the two subspaces \mathrm{span}\{U_{k+1}\} and \mathrm{span}\{Z_{k}\}. Since any vector x in \mathcal{S}_{k} can be written as x=Z_{k}y with y\in\mathbb{R}^{k}, we obtain from eq. 3.8a and eq. 3.8b that for any x=Z_{k}y,

\min_{x=Z_{k}y}\|Ax-b\|_{2}=\min_{y\in\mathbb{R}^{k}}\|AZ_{k}y-U_{k+1}\beta_{1}e_{1}\|_{2}=\min_{y\in\mathbb{R}^{k}}\|U_{k+1}B_{k}y-U_{k+1}\beta_{1}e_{1}\|_{2}=\min_{y\in\mathbb{R}^{k}}\|B_{k}y-\beta_{1}e_{1}\|_{2}. (3.10)

Since k\leq k_{t}, B_{k} has full column rank, so this k-dimensional least squares problem has a unique solution. Therefore, the unique solution to eq. 3.1 is

x_{k}=Z_{k}y_{k},\ \ y_{k}=\mathop{\mathrm{argmin}}_{y\in\mathbb{R}^{k}}\|B_{k}y-\beta_{1}e_{1}\|_{2}. (3.11)

We note that the above uniqueness is independent of the specific construction of the basis vectors \{z_{k}\}. In general, as long as \mathcal{S}_{k}\subseteq\mathcal{R}(C_{rkhs}), the solution to eq. 3.1 is unique; see Theorem 4.7.

Importantly, we compute x_{k} recursively without explicitly solving for y_{k} in eq. 3.11, avoiding the O(k^{3}) computational cost. To this end, we adopt a procedure similar to the LSQR algorithm in [30] to iteratively update x_{k} from x_{0}=\mathbf{0}. It starts from the following Givens QR factorization:

Q_{k}\begin{pmatrix}B_{k}&\beta_{1}e_{1}\end{pmatrix}=\begin{pmatrix}R_{k}&f_{k}\\ &\bar{\gamma}_{k+1}\end{pmatrix}:=\left(\begin{array}{ccccc:c}\rho_{1}&\theta_{2}&&&&\gamma_{1}\\ &\rho_{2}&\theta_{3}&&&\gamma_{2}\\ &&\ddots&\ddots&&\vdots\\ &&&\rho_{k-1}&\theta_{k}&\gamma_{k-1}\\ &&&&\rho_{k}&\gamma_{k}\\ \hdashline &&&&&\bar{\gamma}_{k+1}\end{array}\right),

where the orthogonal matrix Q_{k} is the product of a series of Givens rotation matrices, and R_{k} is a bidiagonal upper triangular matrix; see [13, §5.2.5]. We implement the Givens QR factorization using the procedure in [30], which recursively zeros out the subdiagonal elements \beta_{i} for 2\leq i\leq k+1. Specifically, at the i-th step, a Givens rotation zeros out \beta_{i+1} by

\begin{pmatrix}c_{i}&s_{i}\\ s_{i}&-c_{i}\end{pmatrix}\begin{pmatrix}\bar{\rho}_{i}&0&\bar{\gamma}_{i}\\ \beta_{i+1}&\alpha_{i+1}&0\end{pmatrix}=\begin{pmatrix}\rho_{i}&\theta_{i+1}&\gamma_{i}\\ 0&\bar{\rho}_{i+1}&\bar{\gamma}_{i+1}\end{pmatrix},

where the entries c_{i} and s_{i} of the Givens rotation satisfy c_{i}^{2}+s_{i}^{2}=1, and the elements \rho_{i}, \theta_{i+1}, \bar{\rho}_{i+1}, \gamma_{i}, \bar{\gamma}_{i+1} are updated recursively; see [30] for more details.

As a result of the QR factorization, we can write

\|B_{k}y-\beta_{1}e_{1}\|_{2}^{2}=\Big\|Q_{k}\begin{pmatrix}B_{k}&\beta_{1}e_{1}\end{pmatrix}\begin{pmatrix}y\\ -1\end{pmatrix}\Big\|_{2}^{2}=\|R_{k}y-f_{k}\|_{2}^{2}+|\bar{\gamma}_{k+1}|^{2}. (3.12)

Hence, the solution to \min_{y\in\mathbb{R}^{k}}\|B_{k}y-\beta_{1}e_{1}\|_{2} is y_{k}=R_{k}^{-1}f_{k}. Factorizing R_{k} as

R_{k}=D_{k}\bar{R}_{k},\ \ D_{k}:=\begin{pmatrix}\rho_{1}&&&\\ &\rho_{2}&&\\ &&\ddots&\\ &&&\rho_{k}\end{pmatrix},\ \bar{R}_{k}:=\begin{pmatrix}1&\theta_{2}/\rho_{1}&&\\ &1&\theta_{3}/\rho_{2}&\\ &&\ddots&\theta_{k}/\rho_{k-1}\\ &&&1\end{pmatrix}

and denoting W_{k}=Z_{k}\bar{R}_{k}^{-1}=(w_{1},\dots,w_{k}), we get

x_{k}=Z_{k}y_{k}=Z_{k}R_{k}^{-1}f_{k}=(Z_{k}\bar{R}_{k}^{-1})(D_{k}^{-1}f_{k})=W_{k}(D_{k}^{-1}f_{k})=\sum_{i=1}^{k}(\gamma_{i}/\rho_{i})w_{i}.

Updating x_{k} recursively, and solving W_{k}\bar{R}_{k}=Z_{k} by back substitution, we obtain:

x_{i}=x_{i-1}+(\gamma_{i}/\rho_{i})w_{i},\ \ \ w_{i+1}=z_{i+1}-(\theta_{i+1}/\rho_{i})w_{i},\quad\forall 1\leq i\leq k. (3.13)

Lastly, we compute \|x_{i}\|_{C_{rkhs}} without an explicit C_{rkhs}. We have from eq. 3.13:

C_{rkhs}x_{i}=C_{rkhs}x_{i-1}+(\gamma_{i}/\rho_{i})C_{rkhs}w_{i},\ \ C_{rkhs}w_{i+1}=C_{rkhs}z_{i+1}-(\theta_{i+1}/\rho_{i})C_{rkhs}w_{i}.

Letting \bar{x}_{i}=C_{rkhs}x_{i} and \bar{w}_{i}=C_{rkhs}w_{i}, and recalling that \bar{z}_{i}=C_{rkhs}z_{i}, we have

\|x_{i}\|_{C_{rkhs}}^{2}=x_{i}^{\top}\bar{x}_{i},\quad\bar{x}_{i}=\bar{x}_{i-1}+(\gamma_{i}/\rho_{i})\bar{w}_{i},\ \ \ \bar{w}_{i+1}=\bar{z}_{i+1}-(\theta_{i+1}/\rho_{i})\bar{w}_{i}. (3.14)
Algorithm 3 Updating procedure
1:Let x_{0}=\mathbf{0},\ \bar{x}_{0}=\mathbf{0},\ w_{1}=z_{1},\ \bar{w}_{1}=\bar{z}_{1},\ \bar{\gamma}_{1}=\beta_{1},\ \bar{\rho}_{1}=\alpha_{1}
2:for i=1,2,\ldots,k do
3:     \rho_{i}=(\bar{\rho}_{i}^{2}+\beta_{i+1}^{2})^{1/2},\ c_{i}=\bar{\rho}_{i}/\rho_{i},\ s_{i}=\beta_{i+1}/\rho_{i}
4:     \theta_{i+1}=s_{i}\alpha_{i+1},\ \bar{\rho}_{i+1}=-c_{i}\alpha_{i+1},\ \gamma_{i}=c_{i}\bar{\gamma}_{i},\ \bar{\gamma}_{i+1}=s_{i}\bar{\gamma}_{i}
5:     x_{i}=x_{i-1}+(\gamma_{i}/\rho_{i})w_{i},\ w_{i+1}=z_{i+1}-(\theta_{i+1}/\rho_{i})w_{i}
6:     \bar{x}_{i}=\bar{x}_{i-1}+(\gamma_{i}/\rho_{i})\bar{w}_{i},\ \bar{w}_{i+1}=\bar{z}_{i+1}-(\theta_{i+1}/\rho_{i})\bar{w}_{i}
7:     \|x_{i}\|_{C_{rkhs}}=(x_{i}^{\top}\bar{x}_{i})^{1/2}

The whole updating procedure is described in Algorithm 3. This algorithm yields the residual norm \|Ax_{k}-b\|_{2} without explicitly computing the residual. In fact, by eq. 3.10 and eq. 3.12 we have

\bar{\gamma}_{k+1}=\|B_{k}y_{k}-\beta_{1}e_{1}\|_{2}=\|Ax_{k}-b\|_{2}. (3.15)

Note that \bar{\gamma}_{k+1} decreases monotonically, since x_{k} minimizes \|Ax-b\|_{2} over the gradually expanding subspace \mathrm{span}\{Z_{k}\}.

Importantly, Algorithm 3 efficiently computes the solution x_{k} and \|x_{k}\|_{C_{rkhs}}. Each update of x_{i} or w_{i+1} takes O(2n) flops. Similarly, updating \bar{x}_{i} or \bar{w}_{i+1}, as well as computing \|x_{i}\|_{C_{rkhs}}, also takes O(2n) flops. Therefore, the dominant computational cost is O(10n). In contrast, if y_{k} were solved explicitly at each step, it would take O(\sum_{i=1}^{k}i^{3})\sim O(k^{4}) flops; together with forming x_{k}=Z_{k}y_{k}, which takes O(kn) flops, this leads to a total cost of O(kn+k^{4}) flops. Thus, the LSQR-type iteration in Algorithm 3 significantly reduces the number of flops from O(kn+k^{4}) to O(10n).
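To make the recursion concrete, the following sketch combines the hypothetical gGKB() sketched above with the updates of Algorithm 3; it returns the iterates x_{i}, their C_{rkhs}-norms, and the residual norms \bar{\gamma}_{i+1}. It is a simplified reading of the procedure, not the authors' implementation.

import numpy as np

def idarr_iterates(A, b, rho, k):
    alphas, betas, U, Z, Zbar = gGKB(A, b, rho, k)
    n = A.shape[1]
    x, xbar = np.zeros(n), np.zeros(n)
    w, wbar = Z[0].copy(), Zbar[0].copy()
    gamma_bar, rho_bar = betas[0], alphas[0]       # \bar{gamma}_1 = beta_1, \bar{rho}_1 = alpha_1
    xs, norms, residuals = [], [], []
    for i in range(len(alphas) - 1):
        rho_i = np.hypot(rho_bar, betas[i + 1])    # Givens rotation eliminating beta_{i+1}
        c, s = rho_bar / rho_i, betas[i + 1] / rho_i
        theta, rho_bar = s * alphas[i + 1], -c * alphas[i + 1]
        gamma, gamma_bar = c * gamma_bar, s * gamma_bar
        x = x + (gamma / rho_i) * w
        xbar = xbar + (gamma / rho_i) * wbar
        w = Z[i + 1] - (theta / rho_i) * w
        wbar = Zbar[i + 1] - (theta / rho_i) * wbar
        xs.append(x.copy())
        norms.append(np.sqrt(max(x @ xbar, 0.0)))  # ||x_i||_{C_rkhs} = (x_i^T xbar_i)^{1/2}
        residuals.append(abs(gamma_bar))           # equals ||A x_i - b||_2 by eq. 3.15
    return xs, norms, residuals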

P3. Regularize by early stopping. An early stopping strategy is imperative to prevent the solution subspace from becoming excessively large, which would otherwise compromise the regularization. This necessity is rooted in the phenomenon of semi-convergence: the iterate x_{k} initially approaches an optimal regularized solution but subsequently moves towards the unstable naive solution of \min_{x\in\mathcal{R}(C_{rkhs})}\|Ax-b\|_{2}, as detailed in Proposition 4.6.

For early stopping, we adopt the L-curve criterion, as outlined in [12]. This method identifies the ideal early stopping iteration k_{*} at the corner of the curve represented by

\left(\log\|Ax_{k}-b\|_{2},\log\|x_{k}\|_{C_{rkhs}}\right)=\left(\log\bar{\gamma}_{k+1},\log\|x_{k}\|_{C_{rkhs}}\right). (3.16)

Here \bar{\gamma}_{k+1} and \|x_{k}\|_{C_{rkhs}} are computed with negligible cost in Algorithm 3. To construct the L-curve effectively, we let the gGKB execute at least 10 iterations. Additionally, to enhance numerical stability, we stop the gGKB when either \alpha_{i} or \beta_{i} is near machine precision, as motivated by Proposition 4.6.

It is noteworthy that the discrepancy principle (DP) is a viable alternative when the measurement error \|\mathbf{w}\|_{2} in eq. 1.1 is known with high accuracy. The DP halts the iteration at the earliest k satisfying \bar{\gamma}_{k+1}=\|Ax_{k}-b\|_{2}\leq\tau\|\mathbf{w}\|_{2}, where \tau is chosen to be marginally greater than 1.
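Both stopping rules act only on the scalars produced by Algorithm 3. The sketch below implements the discrepancy principle as stated, together with a simple chord-distance heuristic for locating the L-curve corner; the latter is an illustrative stand-in for, not a reproduction of, the corner-finding rule of [12].

import numpy as np

def stop_dp(residuals, noise_norm, tau=1.01):
    # discrepancy principle: earliest k with ||A x_k - b||_2 <= tau * ||w||_2
    for k, r in enumerate(residuals):
        if r <= tau * noise_norm:
            return k
    return len(residuals) - 1

def stop_lcurve(residuals, norms):
    # pick the point of the (log residual, log norm) curve farthest from its end-to-end chord
    x = np.log(np.asarray(residuals)); y = np.log(np.asarray(norms))
    p0, p1 = np.array([x[0], y[0]]), np.array([x[-1], y[-1]])
    d = (p1 - p0) / np.linalg.norm(p1 - p0)
    pts = np.stack([x, y], axis=1) - p0
    dist = np.abs(pts[:, 0] * d[1] - pts[:, 1] * d[0])   # perpendicular distance to the chord
    return int(np.argmax(dist))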

3.3 Computational Complexity

Suppose the algorithm takes k iterations and the basis matrix B is diagonal. Recall that A\in\mathbb{R}^{m\times n}, B\in\mathbb{R}^{n\times n}, u_{i}\in\mathbb{R}^{m} and z_{i}\in\mathbb{R}^{n}. The total computational cost of Algorithm 1 is about O(3mnk) when m\leq n/3 or k<n/3, and about O((m+k)n^{2}) otherwise. The cost is dominated by the gGKB process, since the cost of the updating procedure in Algorithm 3 is only O(n) at each step.

The gGKB can be computed in two ways. The first approach uses only matrix-vector multiplications. The main computations in each iteration of gGKB are the matrix-vector products p=A^{\top}u_{i} and s=B^{-1}A^{\top}AB^{-1}p=B^{-1}(A^{\top}(A(B^{-1}p))), which take O(mn) and O(2mn) flops, respectively. Thus, the total computational cost of gGKB is O(3mnk) flops. The second approach uses \mathbf{A}=A^{\top}A instead of A^{\top} and A to compute s. In this approach, computing \mathbf{A} from A takes O(mn^{2}) flops, and the matrix-vector multiplication \mathbf{A}v in each iteration takes about O(n^{2}) flops. Hence, the total cost of k iterations is O(mn^{2}+kn^{2}). The second approach is faster when mn^{2}+n^{2}k<3mnk, or equivalently, (3m-n)k>mn; that is, roughly speaking, when m>n/3 and k>mn/(3m-n)>n/3.

In practice, the matrix-vector approach is preferred, since the iteration number k is often small. The resulting iDARR algorithm then takes about O(3mnk) flops.
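The two evaluation orders for s=C_{rkhs}^{\dagger}p discussed above differ only in where the O(mn^{2}) work is spent; a small sketch (with illustrative names) is:

import numpy as np

def s_matvec(A, rho, p):
    # nested matrix-vector products: O(2mn) per call, nothing precomputed
    return (A.T @ (A @ (p / rho))) / rho

def make_s_normal(A, rho):
    AtA = A.T @ A                              # formed once at O(mn^2) cost
    return lambda p: (AtA @ (p / rho)) / rho   # O(n^2) per subsequent call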

4 Properties of gGKB

This section studies the properties of the gGKB in Algorithm 2, including the structure of the solution subspaces, the orthogonality of the resulting vectors \{u_{i}\}_{i=1}^{k+1} and \{z_{i}\}_{i=1}^{k+1}, and the number of iterations at termination, defined as

\textbf{gGKB termination step: }\quad k_{t}:=\max_{k\geq 1}\{\alpha_{k}\beta_{k}>0\}. (4.1)

Additionally, we show that the solution to eq. 3.1 is unique in each iteration.

Throughout this section, let r denote the rank of \Lambda and let V_{r} denote the first r columns of V, where \Lambda and V are the matrices of generalized eigenvalues and eigenvectors of \{A,B\} in eq. 2.8 of Theorem 2.4. We have \mathrm{rank}(A)=r. Recall that C_{rkhs}=(V\Lambda V^{\top})^{\dagger}=B(A^{\top}A)^{\dagger}B. Note that the DA-RKHS is (\mathcal{R}(C_{rkhs}),\langle\cdot,\cdot\rangle_{C_{rkhs}}).

4.1 Properties of gGKB

We first show that the gGKB-produced vectors \{u_{i}\}_{i=1}^{k+1} and \{z_{i}\}_{i=1}^{k+1} are orthogonal in (\mathbb{R}^{m},\langle\cdot,\cdot\rangle_{2}) and in the DA-RKHS, respectively, and that the solution subspaces of the gGKB are RKHS-restricted Krylov subspaces.

Definition 4.1 (RKHS-restricted Krylov subspace)

Let A\in\mathbb{R}^{m\times n} and b\in\mathbb{R}^{m}, and let B\in\mathbb{R}^{n\times n} be a symmetric positive definite matrix. Let C_{rkhs}=B(A^{\top}A)^{\dagger}B, which defines an RKHS (\mathbb{R}^{n},\langle\cdot,\cdot\rangle_{C_{rkhs}}). The RKHS-restricted Krylov subspaces are

\mathcal{K}_{k+1}(C_{rkhs}^{\dagger}A^{\top}A,C_{rkhs}^{\dagger}A^{\top}b)=\mathrm{span}\{(C_{rkhs}^{\dagger}A^{\top}A)^{i}C_{rkhs}^{\dagger}A^{\top}b\}_{i=0}^{k},\quad k\geq 0. (4.2)

The main result is the following.

Theorem 4.2 (Properties of gGKB)

Recall k_{t} in eq. 4.1, and let \{u_{i}\}_{i=1}^{k} and \{z_{i}\}_{i=1}^{k} be the vectors generated by the gGKB, with \mathcal{S}_{k}=\mathrm{span}\{z_{i}\}_{i=1}^{k}. They satisfy the following properties:

  • (i)

    \{u_{i}\}_{i=1}^{k} and \{z_{i}\}_{i=1}^{k} are orthonormal in (\mathbb{R}^{m},\langle\cdot,\cdot\rangle_{2}) and in (\mathcal{R}(C_{rkhs}),\langle\cdot,\cdot\rangle_{C_{rkhs}}), respectively;

  • (ii)

    \mathcal{S}_{k}=\mathcal{K}_{k}(C_{rkhs}^{\dagger}A^{\top}A,C_{rkhs}^{\dagger}A^{\top}b) for each k\leq k_{t}, and the termination iteration number is k_{t}=\mathrm{dim}(\mathcal{K}_{\infty}(C_{rkhs}^{\dagger}A^{\top}A,C_{rkhs}^{\dagger}A^{\top}b)).

Proof. Part (i) follows from Proposition 4.3, where we show that z_{k}\in\mathcal{R}(C_{rkhs}), and Proposition 4.4, where we show the orthogonality of these vectors.

For Part (ii), \mathcal{S}_{k}=\mathcal{K}_{k}(C_{rkhs}^{\dagger}A^{\top}A,C_{rkhs}^{\dagger}A^{\top}b) follows from the fact that \{z_{i}\}_{i=1}^{k} form an orthonormal basis of the RKHS-restricted Krylov subspace. We prove k_{t}=\mathrm{dim}(\mathcal{K}_{\infty}(C_{rkhs}^{\dagger}A^{\top}A,C_{rkhs}^{\dagger}A^{\top}b)) in Proposition 4.5.

Proposition 4.3

For each z_{i} generated by gGKB in eq. 3.3, it holds that z_{i}\in\mathcal{R}(C_{rkhs}). Additionally, if q:=C_{rkhs}^{\dagger}A^{\top}u_{i+1}-\beta_{i+1}z_{i}\neq\boldsymbol{0}, then \alpha_{i+1}=\|q\|_{C_{rkhs}}\neq 0.

Proof. We prove this by mathematical induction. For i=1, we obtain from eq. 3.6 and Theorem 2.4 that \alpha_{1}z_{1}=C_{rkhs}^{\dagger}A^{\top}u_{1}=V\Lambda V^{\top}A^{\top}u_{1}\in\mathcal{R}(V_{r})=\mathcal{R}(C_{rkhs}), where r is the rank of \Lambda. Suppose z_{i}\in\mathcal{R}(C_{rkhs}) for some i\geq 1. Using eq. 3.6 and Theorem 2.4 again, we get

\alpha_{i+1}z_{i+1}=C_{rkhs}^{\dagger}A^{\top}u_{i+1}-\beta_{i+1}z_{i}=V\Lambda V^{\top}A^{\top}u_{i+1}-\beta_{i+1}z_{i}\in\mathcal{R}(C_{rkhs}).

Therefore, z_{i+1}\in\mathcal{R}(C_{rkhs}), and q\in\mathcal{R}(C_{rkhs}).

If \alpha_{i+1}=0, then q\in\mathcal{N}(C_{rkhs})=\mathcal{R}(C_{rkhs})^{\perp}. Therefore, q=\boldsymbol{0}.

Thus, even if C_{rkhs} is singular (positive semidefinite), the gGKB in Algorithm 2 does not terminate as long as the right-hand sides of eq. 3.3b and eq. 3.6 are nonzero, since the iterative computation of \{\beta_{i+1},u_{i+1}\} and \{\alpha_{i+1},z_{i+1}\} can continue. Next, we show that these vectors are orthogonal.

Proposition 4.4 (Orthogonality)

Suppose the k-step gGKB does not terminate, i.e., k<k_{t}. Then \{u_{i}\}_{i=1}^{k+1} is a 2-orthonormal basis of the Krylov subspace

\mathcal{K}_{k+1}(AC_{rkhs}^{\dagger}A^{\top},b)=\mathrm{span}\{(AC_{rkhs}^{\dagger}A^{\top})^{i}b\}_{i=0}^{k}, (4.3)

and \{z_{i}\}_{i=1}^{k+1} is a C_{rkhs}-orthonormal basis for the RKHS-restricted Krylov subspace in eq. 4.2.

Proof. Note from Theorem 2.4 that Crkhs=VΛVC_{rkhs}^{{\dagger}}=V\Lambda V^{\top}. Let Wr=VrΛr1/2W_{r}=V_{r}\Lambda_{r}^{1/2}. Then WrCrkhsWr=IrW_{r}^{\top}C_{rkhs}W_{r}=I_{r}, Crkhs=WrWrC_{rkhs}^{{\dagger}}=W_{r}W_{r}^{\top}, and (Wr)=(Crkhs)\mathcal{R}(W_{r})=\mathcal{R}(C_{rkhs}). For any ziz_{i}, Proposition 4.3 implies that there exists virv_{i}\in\mathbb{R}^{r} such that zi=Wrviz_{i}=W_{r}v_{i}. We get from eq. 3.3b and eq. 3.6 that

βi+1ui+1=AWrviαiui,\displaystyle\beta_{i+1}u_{i+1}=AW_{r}v_{i}-\alpha_{i}u_{i},
αi+1vi+1=WrAui+1βi+1vi,\displaystyle\alpha_{i+1}v_{i+1}=W_{r}^{\top}A^{\top}u_{i+1}-\beta_{i+1}v_{i},

where the second equation comes from αi+1Wrvi+1=WrWrAui+1βi+1Wrvi\alpha_{i+1}W_{r}v_{i+1}=W_{r}W_{r}^{\top}A^{\top}u_{i+1}-\beta_{i+1}W_{r}v_{i}. Combining the above two relations with eq. 3.3a, we conclude that the iterative process for generating uiu_{i} and viv_{i} is the standard GKB process of AWrAW_{r} with starting vector bb between the two finite dimensional Hilbert spaces (r,,2)(\mathbb{R}^{r},\langle\cdot,\cdot\rangle_{2}) and (m,,2)(\mathbb{R}^{m},\langle\cdot,\cdot\rangle_{2}). Therefore, {ui}i=1k+1\{u_{i}\}_{i=1}^{k+1} and {vi}i=1k+1\{v_{i}\}_{i=1}^{k+1} are two 2-orthonormal bases of the Krylov subspaces

𝒦k+1(AWr(AWr),b)=span{(AWrWrA)ib}i=0k,\displaystyle\mathcal{K}_{k+1}(AW_{r}(AW_{r})^{\top},b)=\mathrm{span}\{(AW_{r}W_{r}^{\top}A^{\top})^{i}b\}_{i=0}^{k},
𝒦k+1((AWr)AWr,(AWr)b)=span{(WrAAWr)iWrAb}i=0k,\displaystyle\mathcal{K}_{k+1}((AW_{r})^{\top}AW_{r},(AW_{r})^{\top}b)=\mathrm{span}\{(W_{r}^{\top}A^{\top}AW_{r})^{i}W_{r}^{\top}A^{\top}b\}_{i=0}^{k},

respectively; see e.g. [13, §10.4]. Then, WrWr=CrkhsW_{r}W_{r}^{\top}=C_{rkhs}^{{\dagger}} implies eq. 4.3. Also, {zi}i=1k+1={Wrvi}i=1k+1\{z_{i}\}_{i=1}^{k+1}=\{W_{r}v_{i}\}_{i=1}^{k+1} is a CrkhsC_{rkhs}-orthonormal basis of Wr𝒦k+1((AWr)AWr,(AWr)b)W_{r}\mathcal{K}_{k+1}((AW_{r})^{\top}AW_{r},(AW_{r})^{\top}b) since WrW_{r} is CrkhsC_{rkhs}-orthonormal.

Finally, {zi}i=1k+1\{z_{i}\}_{i=1}^{k+1} are CrkhsC_{rkhs}-orthonormal by construction, and by using the relation

Wr(WrAAWr)iWrAb=(WrWrAA)iWrWrAb=(CrkhsAA)iCrkhsAb,\displaystyle W_{r}(W_{r}^{\top}A^{\top}AW_{r})^{i}W_{r}^{\top}A^{\top}b=(W_{r}W_{r}^{\top}A^{\top}A)^{i}W_{r}W_{r}^{\top}A^{\top}b=(C_{rkhs}^{{\dagger}}A^{\top}A)^{i}C_{rkhs}^{{\dagger}}A^{\top}b,

we get that {zi}i=1k+1\{z_{i}\}_{i=1}^{k+1} lie in the RKHS-restricted Krylov subspace in eq. 4.2.   
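Continuing the gGKB sketch above, the orthogonality relations in Proposition 4.4 can be checked numerically on a small synthetic problem; the construction Crkhs=B(AA)BC_{rkhs}=B(A^{\top}A)^{\dagger}B and the exploration measure follow Section 5.1, and the sizes and seeds are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 30, 20
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

rho = np.abs(A).sum(axis=0)                     # exploration measure: column sums of |A|
rho = rho / rho.sum()                           # normalized to sum to one
B = np.diag(rho)                                # basis matrix of L^2_rho
C = B @ np.linalg.pinv(A.T @ A) @ B             # C_rkhs = B (A^T A)^dagger B
C_pinv = np.linalg.pinv(C)

U, Z, alphas, betas = ggkb(A, C, C_pinv, b, k_max=8)   # sketch from Section 4.1
k = U.shape[1]
print(np.max(np.abs(U.T @ U - np.eye(k))))      # u_i are 2-orthonormal (up to round-off)
print(np.max(np.abs(Z.T @ C @ Z - np.eye(k))))  # z_i are C_rkhs-orthonormal (up to round-off)
```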

Proposition 4.5 (gGKB termination number)

Suppose the gGKB in Algorithm 2 terminates at step kt=max{k1:αkβk>0}k_{t}=\max\{k\geq 1:\alpha_{k}\beta_{k}>0\}. Let the distinct nonzero eigenvalues of ACrkhsAAC_{rkhs}^{{\dagger}}A^{\top} be μ1>>μs>0\mu_{1}>\cdots>\mu_{s}>0 with multiplicities m1,,msm_{1},\dots,m_{s}, and let the corresponding eigenspaces be 𝒢1,,𝒢s\mathcal{G}_{1},\dots,\mathcal{G}_{s}. Then, kt=qk_{t}=q, where qq is the number of nonzero elements in {P𝒢1b,,P𝒢sb}\{P_{\mathcal{G}_{1}}b,\dots,P_{\mathcal{G}_{s}}b\}, and

q=dim(𝒦(ACrkhsA,b))=dim(𝒦(CrkhsAA,CrkhsAb)),q=\mathrm{dim}(\mathcal{K}_{\infty}(AC_{rkhs}^{{\dagger}}A^{\top},b))=\mathrm{dim}(\mathcal{K}_{\infty}(C_{rkhs}^{{\dagger}}A^{\top}A,C_{rkhs}^{{\dagger}}A^{\top}b)), (4.4)

where 𝒦(M,v)=span{Miv}i=0\mathcal{K}_{\infty}(M,v)=\mathrm{span}\{M^{i}v\}_{i=0}^{\infty} denotes the Krylov subspace generated by {M,v}\{M,v\}. Moreover, i=1smi=r\sum_{i=1}^{s}m_{i}=r with rr being the rank of AA and kt=qrk_{t}=q\leq r.

Proof. First, we prove eq. 4.4 with qq being the number of nonzero elements in {P𝒢1b,,P𝒢sb}\{P_{\mathcal{G}_{1}}b,\dots,P_{\mathcal{G}_{s}}b\}. Without loss of generality, assume P𝒢jb𝟎P_{\mathcal{G}_{j}}b\neq\mathbf{0} for 1jq1\leq j\leq q, and let gj=P𝒢jb/P𝒢jb2g_{j}=P_{\mathcal{G}_{j}}b/\|P_{\mathcal{G}_{j}}b\|_{2} for these jj. Note that {gj}j=1q\{g_{j}\}_{j=1}^{q} are orthonormal. Let GjG_{j} be a matrix whose orthonormal columns span 𝒢j\mathcal{G}_{j}, so that P𝒢j=GjGjP_{\mathcal{G}_{j}}=G_{j}G_{j}^{\top}. By the eigenvalue decomposition ACrkhsA=j=1sμjGjGjAC_{rkhs}^{{\dagger}}A^{\top}=\sum_{j=1}^{s}\mu_{j}G_{j}G_{j}^{\top}, we have

wi:=(ACrkhsA)i1b=j=1sμji1GjGjb=j=1qμji1P𝒢jb2gj.w_{i}:=(AC_{rkhs}^{{\dagger}}A^{\top})^{i-1}b=\sum_{j=1}^{s}\mu_{j}^{i-1}G_{j}G_{j}^{\top}b=\sum_{j=1}^{q}\mu_{j}^{i-1}\|P_{\mathcal{G}_{j}}b\|_{2}g_{j}. (4.5)

Hence, rank{wi}i=1q\mathrm{rank}\{w_{i}\}_{i=1}^{\infty}\leq q. On the other hand, for 1kq1\leq k\leq q, setting w¯j=P𝒢jb2gj\bar{w}_{j}=\|P_{\mathcal{G}_{j}}b\|_{2}g_{j} for 1jq1\leq j\leq q, we have (w1,,wk)=(w¯1,,w¯q)Tk(w_{1},\dots,w_{k})=(\bar{w}_{1},\dots,\bar{w}_{q})T_{k} with

Tk=(1μ1μ1k11μ2μ2k11μqμqk1)=(Tk1Tk2),T_{k}=\begin{pmatrix}1&\mu_{1}&\cdots&\mu_{1}^{k-1}\\ 1&\mu_{2}&\cdots&\mu_{2}^{k-1}\\ \vdots&\vdots&\cdots&\vdots\\ 1&\mu_{q}&\cdots&\mu_{q}^{k-1}\end{pmatrix}=\begin{pmatrix}T_{k1}\\ T_{k2}\end{pmatrix},

where Tk1k×kT_{k1}\in\mathbb{R}^{k\times k} consists of the first kk rows of TkT_{k}, and Tk2(qk)×kT_{k2}\in\mathbb{R}^{(q-k)\times k} consists of the remaining rows. Note that Tk1T_{k1} is a Vandermonde matrix and is nonsingular since μiμj\mu_{i}\neq\mu_{j} for 1ijk1\leq i\neq j\leq k. Then TkT_{k} has full column rank, so the rank of {wi}i=1k\{w_{i}\}_{i=1}^{k} is kk for 1kq1\leq k\leq q, and rank{wi}i=1rank{wi}i=1q=q\mathrm{rank}\{w_{i}\}_{i=1}^{\infty}\geq\mathrm{rank}\{w_{i}\}_{i=1}^{q}=q. Therefore, we have dim(𝒦(ACrkhsA,b))=rank{wi}i=1=q\mathrm{dim}(\mathcal{K}_{\infty}(AC_{rkhs}^{{\dagger}}A^{\top},b))=\mathrm{rank}\{w_{i}\}_{i=1}^{\infty}=q.

Also, we have dim(𝒦(CrkhsAA,CrkhsAb))=rank{CrkhsAwi}i=1=q\mathrm{dim}(\mathcal{K}_{\infty}(C_{rkhs}^{{\dagger}}A^{\top}A,C_{rkhs}^{{\dagger}}A^{\top}b))=\mathrm{rank}\{C_{rkhs}^{{\dagger}}A^{\top}w_{i}\}_{i=1}^{\infty}=q, where the first equality follows from

(CrkhsAA)i1CrkhsAb=CrkhsA(ACrkhsA)i1b=CrkhsAwi,(C_{rkhs}^{{\dagger}}A^{\top}A)^{i-1}C_{rkhs}^{{\dagger}}A^{\top}b=C_{rkhs}^{{\dagger}}A^{\top}(AC_{rkhs}^{{\dagger}}A^{\top})^{i-1}b=C_{rkhs}^{{\dagger}}A^{\top}w_{i}, (4.6)

and the second equality follows from rank{CrkhsAwi}i=1=rank{wi}i=1=q\mathrm{rank}\{C_{rkhs}^{{\dagger}}A^{\top}w_{i}\}_{i=1}^{\infty}=\mathrm{rank}\{w_{i}\}_{i=1}^{\infty}=q since CrkhsAC_{rkhs}^{{\dagger}}A^{\top} is non-singular on span{wi}i=1\mathrm{span}\{w_{i}\}_{i=1}^{\infty}, which is a subset of span{𝒢l}l=1s\mathrm{span}\{\mathcal{G}_{l}\}_{l=1}^{s} by eq. 4.5. In fact, CrkhsAC_{rkhs}^{{\dagger}}A^{\top} is non-singular on span{𝒢l}l=1s\mathrm{span}\{\mathcal{G}_{l}\}_{l=1}^{s} because {𝒢l}\{\mathcal{G}_{l}\} are eigenspaces of ACrkhsAAC_{rkhs}^{{\dagger}}A^{\top} corresponding to the positive eigenvalues.

Furthermore, eq. 4.6 and the non-degeneracy of CrkhsAC_{rkhs}^{{\dagger}}A^{\top} on span{wi}i=1\mathrm{span}\{w_{i}\}_{i=1}^{\infty} imply that

dim(𝒦q(CrkhsAA,CrkhsAb))=\displaystyle\mathrm{dim}(\mathcal{K}_{q}(C_{rkhs}^{{\dagger}}A^{\top}A,C_{rkhs}^{{\dagger}}A^{\top}b))= rank({(CrkhsAA)i1CrkhsAb}i=1q)\mathrm{rank}(\{(C_{rkhs}^{{\dagger}}A^{\top}A)^{i-1}C_{rkhs}^{{\dagger}}A^{\top}b\}_{i=1}^{q}) (4.7)
=\displaystyle= rank({CrkhsAwi}i=1q)=rank{wi}i=1q=q.\mathrm{rank}(\{C_{rkhs}^{{\dagger}}A^{\top}w_{i}\}_{i=1}^{q})=\mathrm{rank}\{w_{i}\}_{i=1}^{q}=q.

That is, the vectors {(CrkhsAA)i1CrkhsAb}i=1q\{(C_{rkhs}^{{\dagger}}A^{\top}A)^{i-1}C_{rkhs}^{{\dagger}}A^{\top}b\}_{i=1}^{q} are linearly independent.

Next, we prove that kt=qk_{t}=q. Clearly, ktqk_{t}\leq q since by Proposition 4.4 {zi}i=1kt\{z_{i}\}_{i=1}^{k_{t}} are orthogonal and they are in 𝒦kt(CrkhsAA,CrkhsAb)𝒦(CrkhsAA,CrkhsAb)\mathcal{K}_{k_{t}}(C_{rkhs}^{{\dagger}}A^{\top}A,C_{rkhs}^{{\dagger}}A^{\top}b)\subset\mathcal{K}_{\infty}(C_{rkhs}^{{\dagger}}A^{\top}A,C_{rkhs}^{{\dagger}}A^{\top}b), whose dimension is qq. On the other hand, we show next that if kt<qk_{t}<q, there will be a contradiction; hence, we must have kt=qk_{t}=q. In fact, eq. 3.3b and eq. 3.6 imply that, for each 1ikt1\leq i\leq k_{t},

CrkhsAAzi\displaystyle C_{rkhs}^{{\dagger}}A^{\top}Az_{i} =αiCrkhsAui+βi+1CrkhsAui+1\displaystyle=\alpha_{i}C_{rkhs}^{{\dagger}}A^{\top}u_{i}+\beta_{i+1}C_{rkhs}^{{\dagger}}A^{\top}u_{i+1}
=αi(αizi+βizi1)+βi+1(αi+1zi+1+βi+1zi),\displaystyle=\alpha_{i}(\alpha_{i}z_{i}+\beta_{i}z_{i-1})+\beta_{i+1}(\alpha_{i+1}z_{i+1}+\beta_{i+1}z_{i}),

which leads to

αi+1βi+1zi+1=CrkhsAAzi(αi2+βi+12)ziαiβizi1.\alpha_{i+1}\beta_{i+1}z_{i+1}=C_{rkhs}^{{\dagger}}A^{\top}Az_{i}-(\alpha_{i}^{2}+\beta_{i+1}^{2})z_{i}-\alpha_{i}\beta_{i}z_{i-1}.

Note that α1β1z1=CrkhsAβ1u1=CrkhsAb\alpha_{1}\beta_{1}z_{1}=C_{rkhs}^{{\dagger}}A^{\top}\beta_{1}u_{1}=C_{rkhs}^{{\dagger}}A^{\top}b. Combining the above two relations and using αkβk>0\alpha_{k}\beta_{k}>0 for all kktk\leq k_{t}, it follows that zkspan{(CrkhsAA)iCrkhsAb}i=0kz_{k}\in\mathrm{span}\{(C_{rkhs}^{{\dagger}}A^{\top}A)^{i}C_{rkhs}^{{\dagger}}A^{\top}b\}_{i=0}^{k} for all kktk\leq k_{t}. Hence, recursively applying zi=1αiβiCrkhsAAzi11αiβi(αi12+βi2)zi1αi1βi1αiβizi2z_{i}=\frac{1}{\alpha_{i}\beta_{i}}C_{rkhs}^{{\dagger}}A^{\top}Az_{i-1}-\frac{1}{\alpha_{i}\beta_{i}}(\alpha_{i-1}^{2}+\beta_{i}^{2})z_{i-1}-\frac{\alpha_{i-1}\beta_{i-1}}{\alpha_{i}\beta_{i}}z_{i-2} for all 2ikt2\leq i\leq k_{t}, we can write

αkt+1βkt+1zkt+1=i=0ktξi(CrkhsAA)iCrkhsAb,\alpha_{k_{t}+1}\beta_{k_{t}+1}z_{k_{t}+1}=\sum_{i=0}^{k_{t}}\xi_{i}(C_{rkhs}^{{\dagger}}A^{\top}A)^{i}C_{rkhs}^{{\dagger}}A^{\top}b,

with ξi\xi_{i}\in\mathbb{R} and, in particular, ξkt=1/Πi=1ktαiβi0\xi_{k_{t}}=1/\Pi_{i=1}^{k_{t}}\alpha_{i}\beta_{i}\neq 0. Now αkt+1βkt+1=0\alpha_{k_{t}+1}\beta_{k_{t}+1}=0 implies that {(CrkhsAA)iCrkhsAb}i=0kt\{(C_{rkhs}^{{\dagger}}A^{\top}A)^{i}C_{rkhs}^{{\dagger}}A^{\top}b\}_{i=0}^{k_{t}} are linearly dependent, which, since kt<qk_{t}<q, contradicts the linear independence established in eq. 4.7. Therefore, we have q=ktq=k_{t}.

Lastly, to prove that i=1smi=r\sum_{i=1}^{s}m_{i}=r, it suffices to show that rank(ACrkhsA)=r\mathrm{rank}(AC_{rkhs}^{{\dagger}}A^{\top})=r, since the eigenvalues of ACrkhsAAC_{rkhs}^{{\dagger}}A^{\top} are nonnegative. Following the proof of Proposition 4.4, we write ACrkhsA=AWrWrAAC_{rkhs}^{{\dagger}}A^{\top}=AW_{r}W_{r}^{\top}A^{\top}. Since AWr=AVrΛr1/2AW_{r}=AV_{r}\Lambda_{r}^{1/2}, we only need to prove that AVrAV_{r} has full column rank. Suppose AVry=𝟎AV_{r}y=\mathbf{0} with yry\in\mathbb{R}^{r}. Then AAVry=𝟎A^{\top}AV_{r}y=\mathbf{0}, and by Theorem 2.4, BVrΛry=𝟎BV_{r}\Lambda_{r}y=\mathbf{0}, which gives y=𝟎y=\mathbf{0}. Thus, rank(ACrkhsA)=rank(AWr)=r\mathrm{rank}(AC_{rkhs}^{{\dagger}}A^{\top})=\mathrm{rank}(AW_{r})=r.   
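Continuing the small synthetic example after Proposition 4.4, the last claim, that ACrkhsAAC_{rkhs}^{{\dagger}}A^{\top} and AA have the same rank, can be checked numerically as a quick sanity test; this is an illustration only, not part of the paper's experiments.

```python
# Continuing the synthetic example above: rank(A C_rkhs^+ A^T) should equal rank(A),
# so the multiplicities of the nonzero eigenvalues sum to r (Proposition 4.5).
M = A @ C_pinv @ A.T
print(np.linalg.matrix_rank(M), np.linalg.matrix_rank(A))
```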

4.2 Uniqueness of solutions in the iterations

Proposition 4.6

If gGKB terminates at step ktk_{t} in eq. 4.1, the iterative solution xktx_{k_{t}} is the unique least squares solution to minx(Crkhs)Axb2\min_{x\in\mathcal{R}(C_{rkhs})}\|Ax-b\|_{2}.

Proof. Following the proof of Proposition 4.4, the solution to minx(Crkhs)Axb2\min_{x\in\mathcal{R}(C_{rkhs})}\|Ax-b\|_{2} is x=Wryx_{\star}=W_{r}y_{\star} with y=argminyAWryb2y_{\star}=\mathop{\mathrm{argmin}}_{y}\|AW_{r}y-b\|_{2}. Since AWrAW_{r} has full column rank, yy_{\star} is the unique solution to WrA(AWryb)=𝟎W_{r}^{\top}A^{\top}(AW_{r}y-b)=\mathbf{0}, and hence xx_{\star} is unique. Note that (Wr)=(Crkhs)\mathcal{R}(W_{r})=\mathcal{R}(C_{rkhs}). Thus, a vector x(Crkhs)x\in\mathcal{R}(C_{rkhs}) solves minx(Crkhs)Axb2\min_{x\in\mathcal{R}(C_{rkhs})}\|Ax-b\|_{2} if and only if P(Crkhs)A(Axb)=𝟎P_{\mathcal{R}(C_{rkhs})}A^{\top}(Ax-b)=\boldsymbol{0}.

Now we only need to prove P(Crkhs)A(Axqb)=𝟎P_{\mathcal{R}(C_{rkhs})}A^{\top}(Ax_{q}-b)=\mathbf{0} since kt=qk_{t}=q by Proposition 4.5. Using the property P(Crkhs)=CrkhsCrkhsP_{\mathcal{R}(C_{rkhs})}=C_{rkhs}C_{rkhs}^{{\dagger}}, we get from eq. 3.8c

P(Crkhs)AUk+1=Crkhs(ZkBkT+αk+1zk+1ek+1).P_{\mathcal{R}(C_{rkhs})}A^{\top}U_{k+1}=C_{rkhs}(Z_{k}B_{k}^{T}+\alpha_{k+1}z_{k+1}e_{k+1}^{\top}).

Combining the above relation with eq. 3.10, we have

P(Crkhs)A(Axqb)\displaystyle P_{\mathcal{R}(C_{rkhs})}A^{\top}(Ax_{q}-b) =Crkhs(ZqBqT+αq+1zq+1eq+1)(Bqyqβ1e1)\displaystyle=C_{rkhs}(Z_{q}B_{q}^{T}+\alpha_{q+1}z_{q+1}e_{q+1}^{\top})(B_{q}y_{q}-\beta_{1}e_{1})
=Crkhs[Zq(BqBqyqBqβ1e1)+αq+1βq+1zq+1eqyq]\displaystyle=C_{rkhs}[Z_{q}(B_{q}^{\top}B_{q}y_{q}-B_{q}^{\top}\beta_{1}e_{1})+\alpha_{q+1}\beta_{q+1}z_{q+1}e_{q}^{\top}y_{q}]
=αq+1βq+1Crkhszq+1eqyq=𝟎,\displaystyle=\alpha_{q+1}\beta_{q+1}C_{rkhs}z_{q+1}e_{q}^{\top}y_{q}=\boldsymbol{0},

since αq+1βq+1=0\alpha_{q+1}\beta_{q+1}=0 when gGKB terminates.   

This result shows the necessity of early stopping: without it, the iteration converges to the naive least squares solution in (Crkhs)\mathcal{R}(C_{rkhs}). The next theorem shows the uniqueness of the solution at each iteration of the algorithm.

Theorem 4.7 (Uniqueness of solution in each iteration)

For each iteration with k<ktk<k_{t}, there exists a unique solution to eq. 3.1. Furthermore, there exists a unique solution to minx𝒮kAxb2\min_{x\in\mathcal{S}_{k}}\|Ax-b\|_{2}.

Proof. Let Wkn×kW_{k}\in\mathbb{R}^{n\times k} be a matrix whose orthonormal columns span 𝒮k\mathcal{S}_{k}. For any x𝒮kx\in\mathcal{S}_{k}, there is a unique yky\in\mathbb{R}^{k} such that x=Wkyx=W_{k}y. Then the solution to eq. 3.1 is xk=Wkykx_{k}=W_{k}y_{k}, where yky_{k} is the solution to

miny𝒴kWkyCrkhs,𝒴k=argminykAWkyb2.\min_{y\in\mathcal{Y}_{k}}\|W_{k}y\|_{C_{rkhs}},\ \ \mathcal{Y}_{k}=\mathop{\mathrm{argmin}}_{y\in\mathbb{R}^{k}}\|AW_{k}y-b\|_{2}.

By [10, Theorem 2.1], it has a unique solution yky_{k} iff 𝒩(Crkhs1/2Wk)𝒩(AWk)={𝟎}\mathcal{N}(C_{rkhs}^{1/2}W_{k})\bigcap\mathcal{N}(AW_{k})=\{\boldsymbol{0}\}.

Now we prove 𝒩(Crkhs1/2Wk)={𝟎}\mathcal{N}(C_{rkhs}^{1/2}W_{k})=\{\boldsymbol{0}\}. To this end, suppose y𝒩(Crkhs1/2Wk)y\in\mathcal{N}(C_{rkhs}^{1/2}W_{k}) and x=Wkyx=W_{k}y. Then x𝒮k𝒩(Crkhs1/2)x\in\mathcal{S}_{k}\bigcap\mathcal{N}(C_{rkhs}^{1/2}). Since 𝒩(Crkhs1/2)=𝒩(Crkhs)=(Crkhs)\mathcal{N}(C_{rkhs}^{1/2})=\mathcal{N}(C_{rkhs})=\mathcal{R}(C_{rkhs})^{\perp}, we get x𝒮k(Crkhs)(Crkhs)(Crkhs)={𝟎}x\in\mathcal{S}_{k}\bigcap\mathcal{R}(C_{rkhs})^{\perp}\subseteq\mathcal{R}(C_{rkhs})\bigcap\mathcal{R}(C_{rkhs})^{\perp}=\{\boldsymbol{0}\}. Therefore, x=Wky=𝟎x=W_{k}y=\boldsymbol{0}, which gives y=𝟎y=\boldsymbol{0} since WkW_{k} has orthonormal columns.

To prove the uniqueness of the solution to minx𝒮kAxb2\min_{x\in\mathcal{S}_{k}}\|Ax-b\|_{2}, suppose x1,x2𝒮kx_{1},x_{2}\in\mathcal{S}_{k} are both minimizers; we show that x1=x2x_{1}=x_{2}. Let x=x1x2x_{*}=x_{1}-x_{2}. Since Ax1Ax_{1} and Ax2Ax_{2} are both equal to the orthogonal projection of bb onto the subspace A𝒮kA\mathcal{S}_{k}, we have Ax=𝟎Ax_{*}=\boldsymbol{0}, and in particular x𝒩(AA)x_{*}\in\mathcal{N}(A^{\top}A).

On the other hand, since x𝒮k(Crkhs)x_{*}\in\mathcal{S}_{k}\subset\mathcal{R}(C_{rkhs}) by Proposition 4.3, and note that Crkhs=B(AA)B=BVrΛr1VrBC_{rkhs}=B(A^{\top}A)^{{\dagger}}B=BV_{r}\Lambda_{r}^{-1}V_{r}^{\top}B, we have B1x((AA)B)𝒩(AA)B^{-1}x_{*}\in\mathcal{R}((A^{\top}A)^{{\dagger}}B)\subset\mathcal{N}(A^{\top}A)^{\perp}.

Combining the above two, we have x,B1x=0\langle{x_{*},B^{-1}x_{*}}\rangle=0. But BB is a symmetric positive definite matrix, so we must have x=0x_{*}=0.   

5 Numerical Examples

5.1 The Fredholm equation of the first kind

We first examine iDARR on the discrete Fredholm integral equation of the first kind. The tests cover two distinct types of spectral decay: exponential and polynomial. The latter is well known to be challenging and often occurs in applications such as image deblurring. Additionally, we investigate two scenarios depending on whether the true solution is inside or outside the function space of identifiability (FSOI).

Three norms in iterative and direct methods. We compare the l2l^{2}, L2L^{2}, and DA-RKHS norms in iterative and direct methods. The direct methods are based on matrix decomposition. These regularizers are listed in Table 2.

Table 2: Three regularization norms in iterative and direct methods.
Norm | x2\|x\|_{*}^{2} | Iterative | Direct
l2l^{2} | xInxx^{\top}I_{n}x | IR-l2 | l2
L2L^{2} | xBxx^{\top}Bx | IR-L2 | L2
DA-RKHS | xCrkhsxx^{\top}C_{rkhs}x | iDARR | DARTR

The iterative methods differ primarily in their regularization norms. For the l2l^{2} norm, we use the LSQR method in the IR TOOLS package [12], and we stop the iteration when αi\alpha_{i} or βi\beta_{i} becomes negligible to maintain stability. For the L2L^{2} norm, we use gGKB to construct solution subspaces by replacing CrkhsC_{rkhs} with the basis matrix BB; this method is equivalent to the LSQR method using L=BL=\sqrt{B} as a preconditioner in the IR TOOLS package.

The direct methods are Tikhonov regularizers with the hyperparameter selected by the L-curve method [16].

Numerical settings. We consider the problem of recovering the input signal ϕ\phi in a discretization of the Fredholm integral equation in eq. 2.1 with s[a,b]s\in[a,b] and t[c,d]t\in[c,d]. The data are discrete noisy observations b=(y(t1),,y(tm))mb=(y(t_{1}),\ldots,y(t_{m}))\in\mathbb{R}^{m}, where tj=c+j(dc)/mt_{j}=c+j(d-c)/m for 1jm1\leq j\leq m. The task is to estimate the coefficient vector x=(ϕ(s1),,ϕ(sn))nx=(\phi(s_{1}),\ldots,\phi(s_{n}))\in\mathbb{R}^{n} in a piecewise-constant function ϕ(s)=i=1nϕ(si)𝟏[si1,si](s)\phi(s)=\sum_{i=1}^{n}\phi(s_{i})\mathbf{1}_{[s_{i-1},s_{i}]}(s), where 𝒮:={si}i=1n[a,b]\mathcal{S}:=\{s_{i}\}_{i=1}^{n}\subset[a,b] with si=a+iδs_{i}=a+i\delta, δ=(ba)/n\delta=(b-a)/n. We obtain the linear system eq. 1.1 with A(j,i)=K(tj,si)δA(j,i)=K(t_{j},s_{i})\delta by a Riemann sum approximation of the integral. We set (a,b,c,d)=(1,5,0,5)(a,b,c,d)=(1,5,0,5), m=500m=500, and take n=100n=100 except when testing the computational time with a sequence of large values of nn.

The m\mathbb{R}^{m}-valued noise 𝐰\mathbf{w} is Gaussian N(0,σ2ΔtIm)N(0,\sigma^{2}\Delta tI_{m}). We set the standard deviation of the noise to be σ=Ax×nsr\sigma=\|Ax\|\times nsr, where nsrnsr is the noise-to-signal ratio, and we test our methods with nsr={0.0625,0.125,0.25,0.5,1}nsr=\{0.0625,0.125,0.25,0.5,1\}.

We consider two integral kernels

(a) K(t,s):=s2est;(b) K(t,s):=s1|sin(st+1)|.\text{(a) }\,K(t,s):=s^{-2}e^{-st};\quad\text{(b) }\,K(t,s):=s^{-1}|\sin(st+1)|. (5.1)

These kernels lead to exponentially and polynomially decaying spectra, respectively, as shown in Figure 1. The first kernel arises from magnetic resonance relaxometry [4], with ϕ\phi being the distribution of transverse nuclear relaxation times.
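As an illustration of this setup, the following is a minimal NumPy sketch of the discretization and noise model for kernel (a); the true signal ϕ(s)=s2\phi(s)=s^{2} corresponds to the second comparison scenario below, we take Δt=(dc)/m\Delta t=(d-c)/m as an assumption on the notation, and all names are illustrative.

```python
import numpy as np

# Minimal sketch of the discretized Fredholm equation in Section 5.1, kernel (a).
a_, b_, c_, d_ = 1.0, 5.0, 0.0, 5.0
m, n = 500, 100
nsr = 0.5                                          # noise-to-signal ratio

delta = (b_ - a_) / n
s = a_ + np.arange(1, n + 1) * delta               # s_i = a + i*delta
t = c_ + np.arange(1, m + 1) * (d_ - c_) / m       # t_j = c + j*(d-c)/m

K = lambda t, s: s**(-2) * np.exp(-s * t)          # kernel (a) in eq. 5.1
A = K(t[:, None], s[None, :]) * delta              # A(j,i) = K(t_j, s_i)*delta (Riemann sum)

x_true = s**2                                      # e.g. phi(s) = s^2 (out-of-FSOI scenario below)
dt = (d_ - c_) / m                                 # assumed meaning of Delta t in the noise model
sigma = np.linalg.norm(A @ x_true) * nsr           # sigma = ||A x|| * nsr
rng = np.random.default_rng(1)
w = sigma * np.sqrt(dt) * rng.standard_normal(m)   # noise ~ N(0, sigma^2 * dt * I_m)
b = A @ x_true + w                                 # noisy data
```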

Figure 1: Singular values of AA and generalized eigenvalues of (AA,B)(A^{\top}A,B) for the kernels in eq. 5.1: (a) exponentially decaying spectrum; (b) polynomially decaying spectrum.

The DA-RKHS. By its definition in eq. 2.4, the exploration measure is ρ(si)=1γj=1m|K(tj,si)|δ\rho(s_{i})=\frac{1}{\gamma}\sum_{j=1}^{m}|K(t_{j},s_{i})|\delta for i=1,,ni=1,\dots,n with γ\gamma being the normalizing constant. In other words, it is the normalized column sum of the absolute values of the matrix AA. The discrete function space Lρ2(𝒮)L^{2}_{\rho}(\mathcal{S}) is equivalent to n\mathbb{R}^{n} with weight ρ=(ρ(s1),,ρ(sn))\rho=(\rho(s_{1}),\ldots,\rho(s_{n})), and its norm is x,xLρ2:=xdiag(ρ(si))x\langle{x,x}\rangle_{L^{2}_{\rho}}:=x^{\top}\mathrm{diag}(\rho(s_{i}))x for all xnx\in\mathbb{R}^{n}. The basis matrix for the Cartesian basis of n\mathbb{R}^{n} is B=diag(ρ(si))B=\mathrm{diag}(\rho(s_{i})), which is also the basis matrix for step functions in the Riemann sum discretization. The DA-RKHS in this discrete setting is (𝒩(AA),,Crkhs)(\mathcal{N}(A^{\top}A)^{\perp},\langle{\cdot,\cdot}\rangle_{C_{rkhs}}).
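Continuing the discretization sketch above, the discrete exploration measure, the basis matrix BB, and CrkhsC_{rkhs} can be formed directly; assembling them as dense matrices is only for illustration, as the iterative method relies on matrix–vector computations.

```python
# Continuing the sketch above: the discrete DA-RKHS quantities of Section 5.1.
rho = np.abs(A).sum(axis=0)                   # column sums of |A| (the delta factor is inside A)
rho = rho / rho.sum()                         # normalized column sums (one choice of gamma)
B = np.diag(rho)                              # basis matrix of L^2_rho
C_rkhs = B @ np.linalg.pinv(A.T @ A) @ B      # C_rkhs = B (A^T A)^dagger B (dense, illustration only)
penalty = x_true @ (C_rkhs @ x_true)          # the DA-RKHS penalty x^T C_rkhs x from Table 2
```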

Settings for comparisons. The comparison consists of two scenarios regarding whether the true solution is inside or outside of the FSOI: (i) the true solution is the second eigenvector of G¯{\mathcal{L}_{\overline{G}}}; thus it is inside the FSOI; and (ii) the true solution is ϕ(x)=x2\phi(x)=x^{2}, which has significant components outside the FSOI.

For each scenario, we conduct 100 independent simulations, with each simulation comprising five datasets at varying noise levels. The results are presented by a box plot, which illustrates the median, the lower and upper quartiles, any outliers, and the range of values excluding outliers. The key indicator of a regularizer’s effectiveness is its ability to produce accurate estimators whose errors decay consistently as the noise level decreases. Since exploring the decay rate in the small noise limit is not the focus of this study, we direct readers to [27, 22] for initial insights into how this rate is influenced by factors such as spectral decay, the smoothness of the true solution, and the choice of regularization strength.

Figure 2: Results in the case of exponentially decaying spectrum. Top row: typical estimators of IR-l2, IR-L2, and iDARR when nsr=0.5nsr=0.5 and their denoising of the output signal. Second row: the residual Axkb2\|Ax_{k}-b\|_{2} as the iteration number kk increases in one realization with nsr=0.5nsr=0.5, together with boxplots of the stopping iteration numbers in the 100 simulations. Bottom two rows: boxplots of the estimators’ Lρ2L^{2}_{\rho} errors and loss function values in the 100 simulations.

Results. We report the results separately according to the spectral decay.

(i) Exponentially decaying spectrum. The top row of Figure 2 shows typical estimators of IR-l2, IR-L2, and iDARR and their de-noising of the output signal when nsr=0.5nsr=0.5. When the true solution is inside the FSOI, iDARR significantly outperforms the other two in producing a more accurate estimator. However, both IR-L2 and IR-l2 can denoise the data accurately, even though their estimators are largely biased. When the true solution is outside the FSOI, none of the regularizers captures the true function accurately, but iDARR and IR-L2 clearly outperform the l2l^{2} regularizer. Yet again, all these largely biased estimators can de-noise the data accurately. Thus, this inverse problem is severely ill-posed, and one must restrict the inversion to the FSOI.

The second row of Figure 2 shows the decay of the residual Axkb2\|Ax_{k}-b\|_{2} as the iteration number increases, as well as the stopping iteration numbers of these regularizers in the 100 simulations. The fast-decaying residual suggests the need for early stopping, and all three regularizers indeed stop within a few steps. Notably, iDARR consistently stops at the second iteration across noise levels, outperforming the other two regularizers in stably detecting the stopping iteration.

The effectiveness of the DA-RKHS regularization becomes particularly evident in the lower two rows of Figure 2, which depict the decaying errors and loss values as the noise-to-signal ratio (nsrnsr) decreases in the 100 independent simulations. In both iterative and direct methods, the DA-RKHS norm demonstrates superior performance compared to the l2l^{2} and L2L^{2} norms, consistently delivering more accurate estimators that show a steady decrease in error alongside the noise level. Notably, the values of the corresponding loss functions are similar, underscoring the inherent ill-posedness of the inverse problem. Furthermore, iDARR marginally surpasses the direct method DARTR in producing more precise estimators, particularly when the true solution resides within the FSOI. The performance of iDARR suggests that its early stopping mechanism can reliably determine an optimal regularization level, achieving results that are slightly more refined than those obtained with DARTR using the L-curve method.

Figure 3: Results in the case of polynomially decaying spectrum. Top row: typical estimators of IR-l2, IR-L2, and iDARR when nsr=0.0625nsr=0.0625 and their denoising of the output signal. Second row: the residual Axkb2\|Ax_{k}-b\|_{2} as the iteration number kk increases in one realization with nsr=0.0625nsr=0.0625, together with box plots of the stopping iteration numbers in the 100 simulations. Bottom two rows: box plots of the estimators’ Lρ2L^{2}_{\rho} errors and loss function values in the 100 simulations.

(ii) Polynomially decaying spectrum. Figure 3 again illustrates the superior performance of iDARR over IR-L2 and IR-l2 in the case of polynomial spectral decay. The second row shows that the slow spectral decay poses a notable challenge to the iterative methods, as the noise level affects their stopping iteration numbers. Also, they all stop early within approximately twelve steps, even though the true solution may lie in a subspace of higher dimension.

The lower two rows show that iDARR remains effective. It continues to outperform the other two iterative regularizers when the true solution is in the FSOI, and it is marginally surpassed by IR-L2 and IR-l2 when the true solution is outside the FSOI. In both scenarios, the direct method DARTR outperforms all other methods, including iDARR, indicating DARTR is more effective in extracting information from the spectrum with slow decay.

Computational Complexity. The iterative method iDARR is orders of magnitude faster than the direct method DARTR, especially when nn is large. Figure 4 shows their computation time as nn increases in 1010 independent simulations, and the results align with the complexity order illustrated in Section 3.3.

Figure 4: Computational time in 10 simulations with m=500m=500.

In summary, iDARR outperforms IR-L2 and IR-l2 in yielding accurate estimators that consistently decay with the noise level. Its major advantage comes from the DA-RKHS norm that adaptively exploits the information in data and the model.

5.2 Image Deblurring

We further test iDARR on 2D image deblurring problems, where the task is to reconstruct images from blurred and noisy observations. The mathematical model of this problem can be expressed in the form of the first-kind Fredholm integral equation in eq. 2.1 with s,t2s,t\in\mathbb{R}^{2}. The kernel K(t,s)K(t,s) is a function that specifies how the points in the image are distorted, called the point spread function (PSF). We choose PRblurspeckle from [12] as the blurring operator, which simulates spatially invariant blurring caused by atmospheric turbulence, and we use zero boundary conditions to construct the matrix AA. For a true image with N×NN\times N pixels, the matrix AN2×N2A\in\mathbb{R}^{N^{2}\times N^{2}} is a psfMatrix object. We consider two images with 256×256256\times 256 and 320×320320\times 320 pixels, respectively, and set the noise level to nsr=0.01nsr=0.01 for both images. The true images, their blurred noisy observations, and the corresponding PSFs that define the matrices AA are presented in Figure 5.
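To illustrate the structure of AA in this setting, the sketch below assembles the blur matrix for a tiny N×NN\times N image from a given PSF with zero boundary conditions; the Gaussian PSF is only a stand-in for PRblurspeckle, and the dense assembly is for illustration only (in the experiments, AA is a psfMatrix object from IR Tools).

```python
import numpy as np

def blur_matrix(N, psf):
    """Assemble the (N^2, N^2) matrix of a spatially invariant blur with zero
    boundary conditions; dense assembly is only feasible for tiny N."""
    p = psf.shape[0] // 2
    A = np.zeros((N * N, N * N))
    for i in range(N):
        for j in range(N):
            row = i * N + j
            for di in range(-p, p + 1):
                for dj in range(-p, p + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < N and 0 <= jj < N:   # zero boundary conditions
                        A[row, ii * N + jj] = psf[di + p, dj + p]
    return A

# A small Gaussian PSF as a stand-in for the PRblurspeckle PSF (illustration only).
g = np.exp(-0.5 * (np.arange(-3, 4) / 1.5) ** 2)
psf = np.outer(g, g)
psf /= psf.sum()
A = blur_matrix(16, psf)   # in the experiments, N = 256 or 320 and A is a psfMatrix object
```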

Figure 5: The true images, the noisy images blurred by PRblurspeckle, and the corresponding PSFs: (a) True Image-1; (b) Blurred Image-1; (c) PSF for N=256N=256; (d) True Image-2; (e) Blurred Image-2; (f) PSF for N=320N=320.
Figure 6: The reconstructed images computed by iDARR, LSQR, and hybrid-l2 methods.
Figure 7: Relative errors versus iteration number, with circles marking the early stopping iterations chosen by the L-curve method shown in the right two columns: (a) relative error, Image-1; (b) iDARR, Image-1; (c) LSQR, Image-1; (d) relative error, Image-2; (e) iDARR, Image-2; (f) LSQR, Image-2.

Figure 6 shows the reconstructed images computed by iDARR, LSQR, and the hybrid-l2 method. Here the hybrid-l2 applies an l2l_{2}-norm Tikhonov regularization to the projected problem obtained by LSQR, and it uses the stopping strategy in [12]. The best estimates for iDARR and LSQR are the solutions at the iteration kk_{*} minimizing xkxtrue2\|x_{k}-x_{\text{true}}\|_{2}; the reconstructions shown are obtained by using the L-curve method for early stopping.

Figure 7(a)–(f) shows the relative errors as the iteration number increases, with the early stopping iterations selected by the L-curve method displayed in the right two columns.

Notably, Figure 7 reveals that iDARR achieves more accurate reconstructed images than LSQR for both tests, despite appearing to the contrary in Figure 6. LSQR is prone to stopping late, resulting in lower-quality reconstructions than iDARR. In contrast, iDARR tends to stop earlier than ideal, before achieving the best quality. However, the hybrid-l2 method consistently produces accurate estimators with stable convergence, suggesting potential benefits in developing a hybrid iDARR approach to enhance stability.

The effectiveness of iDARR depends on the alignment of regularities between the convolution kernel and the image, as the DA-RKHS’s regularity is tied to the smoothness of the convolution kernel. With the PRblurspeckle featuring a smooth PSF, iDARR obtains a higher accuracy for the smoother Image-2 compared to Image-1, producing reconstructions with smooth edges. An avenue for future exploration involves adjusting the DA-RKHS’s smoothness to better align with the smoothness of the data.

6 Conclusion and Future Work

We have introduced iDARR, a scalable iterative data-adaptive RKHS regularization method, for solving ill-posed linear inverse problems. It searches for solutions in the subspaces where the true signal can be identified and achieves reliable early stopping via the DA-RKHS norm. A core innovation is a generalized Golub-Kahan bidiagonalization procedure that recursively computes orthonormal bases for a sequence of RKHS-restricted Krylov subspaces. Systematic numerical tests on the Fredholm integral equation show that iDARR outperforms the widely used iterative regularizations using the L2L^{2} and l2l^{2} norms, in the sense that it produces stable, accurate solutions that converge consistently as the noise level decays. Applications to 2D image deblurring further show that iDARR outperforms the benchmark LSQR with the l2l^{2} norm.

Future Work: Hybrid Methods

The accuracy and stability of the regularized solution hinge on the choice of the iteration number for early stopping. While the L-curve criterion is a commonly used tool for determining this number, it can sometimes lead to suboptimal results due to its reliance on identifying a corner in a discrete curve. Hybrid methods are well-recognized alternatives that help stabilize this semi-convergence issue, as referenced in [20, 8, 32]. One promising approach is to apply Tikhonov regularization to each iteration of the projected problem, with the hyperparameter determined by the weighted generalized cross-validation (WGCV) method described in [8]. This approach is a focus of our upcoming research.
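As a sketch of this idea, Tikhonov regularization of the small projected problem minyBkyβ1e12\min_{y}\|B_{k}y-\beta_{1}e_{1}\|_{2} can be written as a stacked least squares problem; the projected matrix BkB_{k} and the choice of λ\lambda (e.g., by WGCV [8]) are assumed to be given, and the names below are illustrative rather than part of the method.

```python
import numpy as np

def hybrid_step(Bk, beta1, lam):
    """Tikhonov-regularize the projected problem min_y ||B_k y - beta_1 e_1||_2.

    Bk    : (k+1, k) projected bidiagonal matrix from the bidiagonalization
    beta1 : the scalar beta_1, so that beta_1 e_1 is the projected right-hand side
    lam   : regularization parameter, e.g., chosen by WGCV [8]
    Returns y_lam; the iterate is then formed from the solution subspace basis,
    e.g., x_k = Z_k y_lam in the notation of Section 3.
    """
    kp1, k = Bk.shape
    rhs = np.zeros(kp1)
    rhs[0] = beta1
    # Stacked form of min_y ||B_k y - beta_1 e_1||^2 + lam^2 ||y||^2.
    M = np.vstack([Bk, lam * np.eye(k)])
    r = np.concatenate([rhs, np.zeros(k)])
    y_lam, *_ = np.linalg.lstsq(M, r, rcond=None)
    return y_lam
```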

References

  • [1] Simon R Arridge, Marta M Betcke, and Lauri Harhanen. Iterated preconditioned LSQR method for inverse problems on unstructured grids. Inverse Probl., 30(7):075009, 2014.
  • [2] Xianglan Bai, Guang-Xin Huang, Xiao-Jun Lei, Lothar Reichel, and Feng Yin. A novel modified TRSVD method for large-scale linear discrete ill-posed problems. Appl. Numer. Math., 164:72–88, 2021.
  • [3] Frank Bauer, Sergei Pereverzev, and Lorenzo Rosasco. On regularization algorithms in learning theory. Journal of complexity, 23(1):52–72, 2007.
  • [4] Chuan Bi, M. Yvonne Ou, Mustapha Bouhrara, and Richard G. Spencer. Span of regularization for solution of inverse problems with application to magnetic resonance relaxometry of the brain. Scientific Reports, 12(1):20194, 2022.
  • [5] Åke Björck. A bidiagonalization algorithm for solving large and sparse ill-posed systems of linear equations. BIT Numer. Math., 28(3):659–670, 1988.
  • [6] Noe Angelo Caruso and Paolo Novati. Convergence analysis of LSQR for compact operator equations. Linear Algebra and its Applications, 583:146–164, 2019.
  • [7] Zhiming Chen, Wenlong Zhang, and Jun Zou. Stochastic convergence of regularized solutions and their finite element approximations to inverse source problems. SIAM Journal on Numerical Analysis, 60(2):751–780, 2022.
  • [8] Julianne Chung, James G Nagy, and Dianne P O’Leary. A weighted-GCV method for Lanczos-hybrid regularization. Electr. Trans. Numer. Anal., 28(29):149–167, 2008.
  • [9] Felipe Cucker and Ding Xuan Zhou. Learning theory: an approximation theory viewpoint, volume 24. Cambridge University Press, Cambridge, 2007.
  • [10] Lars Eldén. A weighted pseudoinverse, generalized singular values, and constrained least squares problems. BIT Numer. Math., 22:487–502, 1982.
  • [11] Heinz Werner Engl, Martin Hanke, and Andreas Neubauer. Regularization of inverse problems, volume 375. Springer Science & Business Media, 1996.
  • [12] Silvia Gazzola, Per Christian Hansen, and James G Nagy. IR Tools: a MATLAB package of iterative regularization methods and large-scale test problems. Numerical Algorithms, 81(3):773–811, 2019.
  • [13] Gene H. Golub and Charles F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, 4th edition, 2013.
  • [14] Jacques Salomon Hadamard. Lectures on Cauchy’s problem in linear partial differential equations, volume 18. Yale University Press, 1923.
  • [15] Per Christian Hansen. Rank-deficient and discrete ill-posed problems: numerical aspects of linear inversion. SIAM, 1998.
  • [16] Per Christian Hansen. The L-curve and its use in the numerical treatment of inverse problems. In in Computational Inverse Problems in Electrocardiology, ed. P. Johnston, Advances in Computational Bioengineering, pages 119–142. WIT Press, 2000.
  • [17] Guangxin Huang, Yuanyuan Liu, and Feng Yin. Tikhonov regularization with MTRSVD method for solving large-scale discrete ill-posed problems. J. Comput. Appl. Math., 405:113969, 2022.
  • [18] Zhongxiao Jia and Yanfei Yang. A joint bidiagonalization based iterative algorithm for large scale general-form Tikhonov regularization. Appl. Numer. Math., 157:159–177, 2020.
  • [19] Misha E. Kilmer, Per Christian Hansen, and Malena I. Espanol. A projection–based approach to general-form Tikhonov regularization. SIAM J. Sci. Comput., 29(1):315–330, 2007.
  • [20] Misha E Kilmer and Dianne P O’Leary. Choosing regularization parameters in iterative methods for ill-posed problems. SIAM J. Matrix Anal. Appl., 22(4):1204–1221, 2001.
  • [21] Jörg Lampe, Lothar Reichel, and Heinrich Voss. Large-scale Tikhonov regularization via reduction by orthogonal projection. Linear Algebra Appl., 436(8):2845–2865, 2012.
  • [22] Quanjun Lang and Fei Lu. Small noise analysis for Tikhonov and RKHS regularizations. arXiv preprint arXiv:2305.11055, 2023.
  • [23] Gongsheng Li and Zuhair Nashed. A modified Tikhonov regularization for linear operator equations. Numerical functional analysis and optimization, 26(4-5):543–563, 2005.
  • [24] Haibo Li. A preconditioned Krylov subspace method for linear inverse problems with general-form Tikhonov regularization. arXiv:2308.06577v1, 2023.
  • [25] Fei Lu, Qingci An, and Yue Yu. Nonparametric learning of kernels in nonlocal operators. Journal of Peridynamics and Nonlocal Modeling, pages 1–24, 2023.
  • [26] Fei Lu, Quanjun Lang, and Qingci An. Data adaptive RKHS Tikhonov regularization for learning kernels in operators. Proceedings of Mathematical and Scientific Machine Learning, PMLR 190:158-172, 2022.
  • [27] Fei Lu and Miao-Jung Yvonne Ou. An adaptive RKHS regularization for Fredholm integral equations. arXiv preprint arXiv:2303.13737, 2023.
  • [28] M Zuhair Nashed and Grace Wahba. Generalized inverses in reproducing kernel spaces: an approach to regularization of linear operator equations. SIAM Journal on Mathematical Analysis, 5(6):974–987, 1974.
  • [29] Dianne P O’Leary and John A Simmons. A bidiagonalization-regularization procedure for large scale discretizations of ill-posed problems. SIAM J. Sci. Statist. Comput., 2(4):474–489, 1981.
  • [30] Christopher C Paige and Michael A Saunders. LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software, 8:43–71, 1982.
  • [31] Lothar Reichel, Fiorella Sgallari, and Qiang Ye. Tikhonov regularization based on generalized Krylov subspace methods. Appl. Numer. Math., 62(9):1215–1228, 2012.
  • [32] R A Renaut, S Vatankhah, and V E Ardesta. Hybrid and iteratively reweighted regularization by unbiased predictive risk and weighted GCV for projected systems. SIAM J. Sci. Statist. Comput., 39(2):B221–B243, 2017.
  • [33] Leonid I Rudin, Stanley Osher, and Emad Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: nonlinear phenomena, 60(1-4):259–268, 1992.
  • [34] Robert Tibshirani. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.
  • [35] Andrei Nikolajevits Tikhonov. Solution of incorrectly formulated problems and the regularization method. Soviet Math., 4:1035–1038, 1963.
  • [36] Grace Wahba. Convergence rates of certain approximate solutions to Fredholm integral equations of the first kind. Journal of Approximation Theory, 7(2):167–185, 1973.
  • [37] Grace Wahba. Practical approximate solutions to linear operator equations when the data are noisy. SIAM journal on numerical analysis, 14(4):651–667, 1977.
  • [38] Yimin Wei, Pengpeng Xie, and Liping Zhang. Tikhonov regularization and randomized GSVD. SIAM J. Matrix Anal. Appl., 37(2):649–675, 2016.
  • [39] Hua Xiang and Jun Zou. Regularization with randomized SVD for large-scale discrete inverse problems. Inverse Problems, 29(8):085008, 2013.
  • [40] Hua Xiang and Jun Zou. Randomized algorithms for large-scale inverse problems with general Tikhonov regularizations. Inverse Probl., 31(8):085008, 2015.
  • [41] Ye Zhang and Chuchu Chen. Stochastic asymptotical regularization for linear inverse problems. Inverse Problems, 39(1):015007, 2022.