
Randomized block coordinate descent method for linear ill-posed problems

Qinian Jin Mathematical Sciences Institute, Australian National University, Canberra, ACT 2601, Australia [email protected]  and  Duo Liu School of Mathematics and Statistics, Beijing Jiaotong University, Beijing 100044, China [email protected]
Abstract.

Consider the linear ill-posed problems of the form \sum_{i=1}^{b}A_{i}x_{i}=y, where, for each i, A_{i} is a bounded linear operator between two Hilbert spaces X_{i} and {\mathcal{Y}}. When b is huge, solving the problem by an iterative method using the full gradient at each iteration step is both time-consuming and memory-demanding. Although the randomized block coordinate descent (RBCD) method has been shown to be an efficient method, requiring only a small amount of memory, for well-posed large-scale optimization problems, a convergence analysis of the RBCD method for solving ill-posed problems is still lacking. In this paper, we investigate the convergence property of the RBCD method with noisy data under either a priori or a posteriori stopping rules. We prove that the RBCD method combined with an a priori stopping rule yields a sequence that converges weakly to a solution of the problem almost surely. We also consider the early stopping of the RBCD method and demonstrate that the discrepancy principle can terminate the iteration after finitely many steps almost surely. For a class of ill-posed problems with a special tensor product form, we obtain strong convergence results on the RBCD method. Furthermore, we consider incorporating convex regularization terms into the RBCD method to enhance the detection of solution features. To illustrate the theory and the performance of the method, numerical simulations from the imaging modalities in computed tomography and compressive temporal imaging are reported.

Key words and phrases:
linear ill-posed problems, randomized block coordinate descent method, convex regularization term, convergence, imaging
2010 Mathematics Subject Classification:
65J20, 65J22, 65J10, 94A08

1. Introduction

In this paper we consider linear ill-posed problems of the form

i=1bAixi=y,\sum_{i=1}^{b}A_{i}x_{i}=y, (1)

where, for each i=1,\cdots,b, A_{i}:X_{i}\to{\mathcal{Y}} is a bounded linear operator between two real Hilbert spaces X_{i} and {\mathcal{Y}}. Ill-posed problems with such a form arise naturally from many applications in imaging. For example, coded aperture compressive temporal imaging (see Example 4.2 and [15] for more details), a kind of large-scale snapshot compressive imaging [33], can be expressed by the following integral equation

(Ax)(s):=Tm(s,t)x(s,t)dt=y(s),sΩ,(Ax)(s):=\int_{T}m(s,t)x(s,t)\mathop{}\!\mathrm{d}t=y(s),\quad s\in\Omega, (2)

where T=[t_{1},t_{2}]\subset[0,+\infty), \Omega is a bounded domain in the 2-dimensional Euclidean space and m is a bounded continuous mask function on \Omega\times T, which ensures that the forward operator A is a bounded linear operator from L^{2}(\Omega\times T) to L^{2}(\Omega). Decomposing the whole time domain T into b disjoint sub-domains T_{1},\ldots,T_{b}, where for any i=1,\ldots,b,

Ti:=[t¯i,t¯i+1],t¯i:=t2t1b(i1)+t1,T_{i}:=[\bar{t}_{i},\bar{t}_{i+1}],\quad\bar{t}_{i}:=\frac{t_{2}-t_{1}}{b}(i-1)+t_{1}, (3)

we denote the restriction of each function xL2(Ω×T)x\in L^{2}(\Omega\times T) to Ω×Ti\Omega\times T_{i} as xix_{i}, i.e. xi:=x|Ω×Tix_{i}:=x|_{\Omega\times T_{i}} and let

(Aixi)(s):=Tim(s,t)xi(s,t)dt,sΩ,(A_{i}x_{i})(s):=\int_{T_{i}}m(s,t)x_{i}(s,t)\mathop{}\!\mathrm{d}t,\quad s\in\Omega, (4)

then Ai:L2(Ω×Ti)L2(Ω)A_{i}:L^{2}(\Omega\times T_{i})\rightarrow L^{2}(\Omega) is a bounded linear operator for each ii. Since

Tm(s,t)x(s,t)dt=i=1bTim(s,t)xi(s,t)dt,\int_{T}m(s,t)x(s,t)\mathop{}\!\mathrm{d}t=\sum_{i=1}^{b}\int_{T_{i}}m(s,t)x_{i}(s,t)\mathop{}\!\mathrm{d}t,

the above integral equation can be written as the form (1). For more examples, one may refer to the light field reconstruction from the focal stack [14, 32] and other imaging modalities such as spectral imaging [28] and full 3-D computed tomography [18]. In these examples, the sought solution can be separated into many films or frames naturally.
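To make the block structure concrete, the following sketch builds the blocks A_{i} and their adjoints for a crude discretization of (2)-(4); the pixel grid, the number of time samples and the random mask are hypothetical placeholders and merely illustrate how each block acts only on its own time slice.

```python
import numpy as np

# Hypothetical discretization: n_s spatial pixels, n_t time samples, b blocks.
n_s, n_t, b = 64 * 64, 32, 8
frames = n_t // b                       # time samples per block
mask = np.random.rand(n_s, n_t)         # stand-in for the mask m(s, t)
dt = 1.0 / n_t                          # quadrature weight of the time integral

def A_block(i, x_i):
    """Apply A_i to the i-th time slice x_i of shape (n_s, frames)."""
    cols = slice(i * frames, (i + 1) * frames)
    return np.sum(mask[:, cols] * x_i, axis=1) * dt

def A_block_adjoint(i, r):
    """Adjoint of A_i (w.r.t. the Euclidean inner products of this discretization)."""
    cols = slice(i * frames, (i + 1) * frames)
    return mask[:, cols] * r[:, None] * dt

# The full forward operator is the sum of the blocks, as in (1):
x = np.random.rand(n_s, n_t)
y = sum(A_block(i, x[:, i * frames:(i + 1) * frames]) for i in range(b))
```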

Let 𝒳:=X1××Xb{\mathcal{X}}:=X_{1}\times\cdots\times X_{b} be the product space of X1,,XbX_{1},\cdots,X_{b} with the natural inner product inherited from those of XiX_{i} and define A:𝒳𝒴A:{\mathcal{X}}\to{\mathcal{Y}} by

Ax:=i=1bAixi,x=(x1,,xb)𝒳.Ax:=\sum_{i=1}^{b}A_{i}x_{i},\quad\forall x=(x_{1},\ldots,x_{b})\in{\mathcal{X}}. (5)

Then, (1) can be equivalently stated as Ax=yAx=y. Throughout the paper we assume that (1) has a solution, i.e. yRan(A)y\in\mbox{Ran}(A), the range of AA. In practical applications, instead of the exact data yy we can only acquire a noisy data yδy^{\delta} satisfying

yyδδ,\|y-y^{\delta}\|\leq\delta,

where δ>0\delta>0 is the noise level. How to stably reconstruct an approximate solution of ill-posed problems described by (1) using noisy data yδy^{\delta} is a central topic in computational inverse problems. In this situation, the regularization techniques should be used and various regularization methods have been developed in the literature ([6]).

In this paper we will exploit the decomposition structure of the equation (1) to develop a randomized block coordinate descent method for solving ill-posed linear problems. Research on the block coordinate descent method in the literature mainly focuses on solving large-scale well-posed optimization problems ([17, 19, 22, 24, 25, 26, 27, 30]). For the unconstrained minimization problem

min(x1,,xb)𝒳f(x1,,xb)\displaystyle\min_{(x_{1},\cdots,x_{b})\in{\mathcal{X}}}f(x_{1},\cdots,x_{b}) (6)

with {\mathcal{X}}=X_{1}\times\cdots\times X_{b}, where f:{\mathcal{X}}\to{\mathbb{R}} is a continuously differentiable function, the block coordinate descent method updates x_{k+1}:=(x_{k+1,1},\cdots,x_{k+1,b}) from x_{k}:=(x_{k,1},\cdots,x_{k,b}) by selecting an index i_{k} from \{1,\cdots,b\} and using the partial gradient \nabla_{i_{k}}f(x_{k}) of f at x_{k} to update the component x_{k+1,i_{k}} of x_{k+1}, leaving the other components unchanged; more precisely, we have the updating formula

xk+1,i={xk,ikγikf(xk),ifi=ik,xk,i,otherwise,x_{k+1,i}=\begin{cases}x_{k,i_{k}}-\gamma\nabla_{i_{k}}f(x_{k}),&\text{if}\ i=i_{k},\\ x_{k,i},&\text{otherwise},\end{cases} (7)

where γ\gamma is the step size. The index iki_{k} can be selected in various ways, either in a cyclic fashion or in a random manner. By applying the block coordinate descent method (7) to solve (6) with ff given by

f(x1,,xb)=12i=1bAixiyδ2,\displaystyle f(x_{1},\cdots,x_{b})=\frac{1}{2}\left\|\sum_{i=1}^{b}A_{i}x_{i}-y^{\delta}\right\|^{2}, (8)

it leads to the iterative method

xk+1,iδ={xk,ikδγAik(Axkδyδ),ifi=ik,xk,iδ,otherwisex_{k+1,i}^{\delta}=\begin{cases}x_{k,i_{k}}^{\delta}-\gamma A^{*}_{i_{k}}(Ax_{k}^{\delta}-y^{\delta}),&\text{if}\ i=i_{k},\\ x_{k,i}^{\delta},&\text{otherwise}\end{cases} (9)

for solving ill-posed linear problem (1) using noisy data yδy^{\delta}, where Ai:𝒴XiA_{i}^{*}:{\mathcal{Y}}\to X_{i} denotes the adjoint of AiA_{i} for each ii and the superscript “δ\delta” on all iterates is used to indicate their dependence on the noisy data.

The method (9) can also be motivated from the Landweber iteration

xk+1δ=xkδγA(Axkδyδ),x_{k+1}^{\delta}=x_{k}^{\delta}-\gamma A^{*}(Ax_{k}^{\delta}-y^{\delta}), (10)

which is one of the most prominent regularization methods for solving ill-posed problems, where γ\gamma is the step-size and A:𝒴𝒳A^{*}:{\mathcal{Y}}\to{\mathcal{X}} is the adjoint of AA. Note that, with the operator AA defined by (5), we have

Az=(A1z,,Abz),z𝒴.A^{*}z=\left(A_{1}^{*}z,\cdots,A_{b}^{*}z\right),\quad\forall z\in{\mathcal{Y}}.

Thus, updating x_{k+1}^{\delta} from x_{k}^{\delta} by (10) requires calculating A_{i}^{*}(Ax_{k}^{\delta}-y^{\delta}) for all i=1,\cdots,b. If b is huge, applying the method (10) to solve the equation (1) is both time-consuming and memory-demanding. To resolve these issues, it is natural to calculate only one component of A^{*}(Ax_{k}^{\delta}-y^{\delta}) and use it to update the corresponding component of x_{k+1}^{\delta}, keeping the other components fixed. This leads again to the block coordinate descent method (9) for solving (1).

The block coordinate descent method and its variants for large-scale well-posed optimization problems have been analyzed extensively, and all the established convergence results either require the objective function to be strongly convex or are stated in terms of the objective value. However, these results are not applicable to the method (9) for ill-posed problems because the corresponding objective function (8) is not strongly convex and, moreover, due to the ill-posedness of the underlying problem, convergence of the objective value does not directly imply any result on the iterates. Therefore, a new analysis is required to understand the method (9) for ill-posed problems.

In [21] the block coordinate descent method (9) was considered with the index i_{k} selected in a cyclic fashion, i.e. i_{k}=(k\ \text{mod}\ b)+1. The regularization property of the corresponding method was only established for a specific class of linear ill-posed problems in which the forward operator A is assumed to have a particular tensor product form; more precisely, the ill-posed linear system of the form

i=1bvliKxi=yl,l=1,,d,\displaystyle\sum_{i=1}^{b}v_{li}Kx_{i}=y_{l},\quad l=1,\cdots,d, (11)

was considered in [21], where K:XYK:X\to Y is a bounded linear operator between two real Hilbert spaces XX and YY and V=(vli)V=(v_{li}) is a d×bd\times b matrix with full column rank. Clearly this problem can be cast into the form (1) if we take Xi=XX_{i}=X for each ii, 𝒴=Yd{\mathcal{Y}}=Y^{d}, the dd-fold Cartesian product of YY, and define Ai:X𝒴A_{i}:X\to{\mathcal{Y}} by

Aiz=(v1iKz,,vdiKz),zXA_{i}z=(v_{1i}Kz,\cdots,v_{di}Kz),\quad z\in X

for each i=1,,bi=1,\cdots,b. The problem formulation (11), however, seems too specific to cover interesting problems arising from practical applications. How to establish the regularization property of this cyclic block coordinate descent method for ill-posed problem (1) in its full generality remains a challenging open question.

Instead of using a deterministic cyclic order, in this paper we will consider a randomized block coordinate descent method, i.e. the method (9) with iki_{k} being selected from {1,,b}\{1,\cdots,b\} uniformly at random. We will investigate the convergence behavior of the iterates for the ill-posed linear problem (1) with general forward operator AA. Note that in the method (9) the definition of xk+1δx_{k+1}^{\delta} involves rkδ:=Axkδyδr_{k}^{\delta}:=Ax_{k}^{\delta}-y^{\delta}. If this quantity is required to be calculated directly at each iteration step, then the advantages of the method will be largely reduced. Fortunately, this quantity can be calculated in a simple recursive way. Indeed, by the definition of xk+1δx_{k+1}^{\delta} we have

r_{k+1}^{\delta} =\sum_{i=1}^{b}A_{i}x_{k+1,i}^{\delta}-y^{\delta}=\sum_{i\neq i_{k}}A_{i}x_{k+1,i}^{\delta}+A_{i_{k}}x_{k+1,i_{k}}^{\delta}-y^{\delta}
=\sum_{i\neq i_{k}}A_{i}x_{k,i}^{\delta}+A_{i_{k}}x_{k,i_{k}}^{\delta}+A_{i_{k}}(x_{k+1,i_{k}}^{\delta}-x_{k,i_{k}}^{\delta})-y^{\delta}
=Ax_{k}^{\delta}-y^{\delta}+A_{i_{k}}(x_{k+1,i_{k}}^{\delta}-x_{k,i_{k}}^{\delta})
=r_{k}^{\delta}+A_{i_{k}}(x_{k+1,i_{k}}^{\delta}-x_{k,i_{k}}^{\delta}).

Therefore, we can formulate our randomized block coordinate descent (RBCD) method into the following equivalent form which is convenient for numerical implementation.

Algorithm 1 (RBCD method for ill-posed problems).

Pick an initial guess x0𝒳x_{0}\in{\mathcal{X}}, set x0δ:=x0x_{0}^{\delta}:=x_{0} and calculate r0δ:=Ax0δyδr_{0}^{\delta}:=Ax_{0}^{\delta}-y^{\delta}. Choose a suitable step size γ>0\gamma>0. For all integers k0k\geq 0 do the following:

(i) Pick an index i_{k}\in\{1,\cdots,b\} randomly via the uniform distribution;

(ii) Update x_{k+1}^{\delta} by setting x_{k+1,i}^{\delta}=x_{k,i}^{\delta} for i\neq i_{k} and

x_{k+1,i_{k}}^{\delta}=x_{k,i_{k}}^{\delta}-\gamma A_{i_{k}}^{*}r_{k}^{\delta};

(iii) Calculate r_{k+1}^{\delta}=r_{k}^{\delta}+A_{i_{k}}(x_{k+1,i_{k}}^{\delta}-x_{k,i_{k}}^{\delta}).
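For the finite-dimensional case where each A_{i} is a matrix, a minimal NumPy sketch of Algorithm 1 reads as follows; the block matrices, the step size and the iteration count are placeholders, and the residual is updated recursively as in step (iii) so that only one block is touched per iteration.

```python
import numpy as np

def rbcd(A_blocks, y_delta, gamma, num_iters, rng=None):
    """Randomized block coordinate descent (Algorithm 1) for sum_i A_i x_i = y.

    A_blocks : list of matrices A_i, each of shape (m, n_i)
    y_delta  : noisy data vector of length m
    gamma    : step size, e.g. 0 < gamma < 2 / ||A||^2
    """
    rng = np.random.default_rng() if rng is None else rng
    b = len(A_blocks)
    x = [np.zeros(A.shape[1]) for A in A_blocks]   # initial guess x_0 = 0
    r = -y_delta.copy()                            # r_0 = A x_0 - y^delta
    for _ in range(num_iters):
        i = rng.integers(b)                        # (i) uniform random block index
        step = -gamma * (A_blocks[i].T @ r)        # (ii) update only the i-th block
        x[i] = x[i] + step
        r = r + A_blocks[i] @ step                 # (iii) recursive residual update
    return x, r
```

In this sketch the number of iterations plays the role of the a priori stopping index discussed below; per iteration only one block A_{i_{k}} is accessed, which is the source of the savings in time and memory compared with the Landweber iteration (10).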

In this paper we present a convergence analysis of Algorithm 1 by deriving a stability estimate and establishing the regularization property. For the equation (1) in its general form, we obtain a weak convergence result, and for a particular tensor product form as studied in [21] we establish a strong convergence result. Moreover, we also consider the early stopping of Algorithm 1 and demonstrate that the discrepancy principle can terminate the iteration after finitely many steps almost surely. Note that Algorithm 1 does not incorporate a priori available information on the features of the sought solution. In case the sought solution has some special feature, such as non-negativity, sparsity or piece-wise constancy, Algorithm 1 may not be efficient enough to produce satisfactory approximate solutions. In order to resolve this issue, we further extend Algorithm 1 by incorporating into it a convex regularization term which is selected to capture the desired feature. For a detailed description of this extended algorithm, please refer to Algorithm 3, for which we will provide a convergence analysis.

It should be pointed out that the RBCD method is essentially different from the Landweber-Kaczmarz method and its stochastic version which have been studied in [8, 10, 11, 12, 13]. The Landweber-Kaczmarz method relies on decomposing the data into many blocks of small size, while our RBCD method makes use of the block structure of the sought solutions.

This paper is organized as follows. In Section 2 we consider Algorithm 1 by proving various convergence results and investigating the discrepancy principle as an early stopping rule. In Section 3 we consider an extension of Algorithm 1 by incorporating a convex regularization term into it so that the special feature of sought solutions can be detected. Finally in Section 4 we provide numerical simulations to test the performance of our proposed methods.

2. Convergence analysis

In this section we consider Algorithm 1. We first establish a stability estimate and prove a weak convergence result when the method is terminated by an a priori stopping rule. We then investigate the discrepancy principle and demonstrate that it can terminate the iteration in finitely many steps almost surely. When the forward operator A has a particular tensor product form as used in [21], we further show that strong convergence can be guaranteed.

Note that, once x_{0}\in{\mathcal{X}} and \gamma are fixed, the sequence \{x_{k}^{\delta}\} is completely determined by the sample path \{i_{0},i_{1},\cdots\}; changing the sample path can result in a different iterative sequence and thus \{x_{k}^{\delta}\} is a random sequence. Let {\mathcal{F}}_{0} denote the trivial \sigma-algebra and, for each integer k\geq 1, let {\mathcal{F}}_{k} denote the \sigma-algebra generated by the random variables i_{l} for 0\leq l<k. Then \{{\mathcal{F}}_{k}:k\geq 0\} forms a filtration which is natural to Algorithm 1. Let \mathbb{E} denote the expectation associated with this filtration, see [4]. The tower property of conditional expectation

𝔼[𝔼[ϕ|k]]=𝔼[ϕ] for any random variable ϕ\mathbb{E}[\mathbb{E}[\phi|{\mathcal{F}}_{k}]]=\mathbb{E}[\phi]\quad\mbox{ for any random variable }\phi

will be frequently used.

2.1. Stability estimate

Let {xk}\{x_{k}\} denote the iterative sequence produced by Algorithm 1 with yδy^{\delta} replaced by the exact data yy. By definition it is easy to see that, for any fixed integer k0k\geq 0, along any sample path there holds xkδxk0\|x_{k}^{\delta}-x_{k}\|\to 0 as δ0\delta\to 0 and hence 𝔼[xkδxk2]0\mathbb{E}[\|x_{k}^{\delta}-x_{k}\|^{2}]\to 0 as δ0\delta\to 0. The following result gives a more precise stability estimate.

Lemma 2.1.

Consider Algorithm 1 and assume 0<γ2/A20<\gamma\leq 2/\|A\|^{2}. Then, along any sample path there holds

A(xkδxk)yδ+yδ\displaystyle\|A(x_{k}^{\delta}-x_{k})-y^{\delta}+y\|\leq\delta (12)

for all integers k0k\geq 0. Furthermore

𝔼[xkδxk2]2γbkδ2\displaystyle\mathbb{E}[\|x_{k}^{\delta}-x_{k}\|^{2}]\leq\frac{2\gamma}{b}k\delta^{2} (13)

for all k0k\geq 0.

Proof.

Let ukδ:=xkδxku_{k}^{\delta}:=x_{k}^{\delta}-x_{k} for all integers k0k\geq 0. It follows from the definition of xk+1δx_{k+1}^{\delta} and xk+1x_{k+1} that

uk+1,iδ={uk,iδ if iik,uk,ikδγAik(Aukδyδ+y) if i=ik.\displaystyle u_{k+1,i}^{\delta}=\left\{\begin{array}[]{lll}u_{k,i}^{\delta}&\mbox{ if }i\neq i_{k},\\[4.73611pt] u_{k,i_{k}}^{\delta}-\gamma A_{i_{k}}^{*}(Au_{k}^{\delta}-y^{\delta}+y)&\mbox{ if }i=i_{k}.\end{array}\right.

Thus

Auk+1δyδ+y\displaystyle Au_{k+1}^{\delta}-y^{\delta}+y =iikAiuk+1,iδ+Aikuk+1,ikδyδ+y\displaystyle=\sum_{i\neq i_{k}}A_{i}u_{k+1,i}^{\delta}+A_{i_{k}}u_{k+1,i_{k}}^{\delta}-y^{\delta}+y
=i=1bAiuk,iδγAikAik(Aukδyδ+y)yδ+y\displaystyle=\sum_{i=1}^{b}A_{i}u_{k,i}^{\delta}-\gamma A_{i_{k}}A_{i_{k}}^{*}(Au_{k}^{\delta}-y^{\delta}+y)-y^{\delta}+y
=(IγAikAik)(Aukδyδ+y).\displaystyle=(I-\gamma A_{i_{k}}A_{i_{k}}^{*})(Au_{k}^{\delta}-y^{\delta}+y).

Since AikA\|A_{i_{k}}\|\leq\|A\| and 0<γ2/A20<\gamma\leq 2/\|A\|^{2}, we have IγAikAik1\|I-\gamma A_{i_{k}}A_{i_{k}}^{*}\|\leq 1 and thus

Auk+1δyδ+yIγAikAikAukδyδ+yAukδyδ+y.\|Au_{k+1}^{\delta}-y^{\delta}+y\|\leq\|I-\gamma A_{i_{k}}A_{i_{k}}^{*}\|\|Au_{k}^{\delta}-y^{\delta}+y\|\leq\|Au_{k}^{\delta}-y^{\delta}+y\|.

Consequently

Auk+1δyδ+yAu0δyδ+y=yδyδ\displaystyle\|Au_{k+1}^{\delta}-y^{\delta}+y\|\leq\|Au_{0}^{\delta}-y^{\delta}+y\|=\|y^{\delta}-y\|\leq\delta

which shows (12).

To derive (13), we note that

uk+1δ2\displaystyle\|u_{k+1}^{\delta}\|^{2} =iikuk+1,iδ2+uk+1,ikδ2\displaystyle=\sum_{i\neq i_{k}}\|u_{k+1,i}^{\delta}\|^{2}+\|u_{k+1,i_{k}}^{\delta}\|^{2}
=iikuk,iδ2+uk,ikδγAik(Aukδyδ+y)2\displaystyle=\sum_{i\neq i_{k}}\|u_{k,i}^{\delta}\|^{2}+\|u_{k,i_{k}}^{\delta}-\gamma A_{i_{k}}^{*}(Au_{k}^{\delta}-y^{\delta}+y)\|^{2}
=iikuk,iδ2+uk,ikδ2+γ2Aik(Aukδyδ+y)2\displaystyle=\sum_{i\neq i_{k}}\|u_{k,i}^{\delta}\|^{2}+\|u_{k,i_{k}}^{\delta}\|^{2}+\gamma^{2}\|A_{i_{k}}^{*}(Au_{k}^{\delta}-y^{\delta}+y)\|^{2}
2γuk,ikδ,Aik(Aukδyδ+y)\displaystyle\quad\,-2\gamma\langle u_{k,i_{k}}^{\delta},A_{i_{k}}^{*}(Au_{k}^{\delta}-y^{\delta}+y)\rangle
=ukδ2+γ2Aik(Aukδyδ+y)2\displaystyle=\|u_{k}^{\delta}\|^{2}+\gamma^{2}\|A_{i_{k}}^{*}(Au_{k}^{\delta}-y^{\delta}+y)\|^{2}
2γAikuk,ikδ,Aukδyδ+y.\displaystyle\quad\,-2\gamma\langle A_{i_{k}}u_{k,i_{k}}^{\delta},Au_{k}^{\delta}-y^{\delta}+y\rangle.

Taking the conditional expectation gives

𝔼[uk+1δ2|k]\displaystyle\mathbb{E}[\|u_{k+1}^{\delta}\|^{2}|{\mathcal{F}}_{k}]
=ukδ2+γ2bi=1bAi(Aukδyδ+y)22γbi=1bAiuk,iδ,Aukδyδ+y\displaystyle=\|u_{k}^{\delta}\|^{2}+\frac{\gamma^{2}}{b}\sum_{i=1}^{b}\|A_{i}^{*}(Au_{k}^{\delta}-y^{\delta}+y)\|^{2}-\frac{2\gamma}{b}\left\langle\sum_{i=1}^{b}A_{i}u_{k,i}^{\delta},Au_{k}^{\delta}-y^{\delta}+y\right\rangle
=ukδ2+γ2bA(Aukδyδ+y)22γbAukδ,Aukδyδ+y\displaystyle=\|u_{k}^{\delta}\|^{2}+\frac{\gamma^{2}}{b}\|A^{*}(Au_{k}^{\delta}-y^{\delta}+y)\|^{2}-\frac{2\gamma}{b}\left\langle Au_{k}^{\delta},Au_{k}^{\delta}-y^{\delta}+y\right\rangle
ukδ2+γ2A2bAukδyδ+y22γbAukδyδ+y2\displaystyle\leq\|u_{k}^{\delta}\|^{2}+\frac{\gamma^{2}\|A\|^{2}}{b}\|Au_{k}^{\delta}-y^{\delta}+y\|^{2}-\frac{2\gamma}{b}\|Au_{k}^{\delta}-y^{\delta}+y\|^{2}
2γbyδy,Aukδyδ+y\displaystyle\quad\,-\frac{2\gamma}{b}\langle y^{\delta}-y,Au_{k}^{\delta}-y^{\delta}+y\rangle
ukδ21b(2γA2)γAukδyδ+y2+2γbδAukδyδ+y.\displaystyle\leq\|u_{k}^{\delta}\|^{2}-\frac{1}{b}(2-\gamma\|A\|^{2})\gamma\|Au_{k}^{\delta}-y^{\delta}+y\|^{2}+\frac{2\gamma}{b}\delta\|Au_{k}^{\delta}-y^{\delta}+y\|.

By using 0<γ2/A20<\gamma\leq 2/\|A\|^{2} and (12), we further obtain

𝔼[uk+1δ2|k]ukδ2+2γbδAukδyδ+yukδ2+2γbδ2.\displaystyle\mathbb{E}[\|u_{k+1}^{\delta}\|^{2}|{\mathcal{F}}_{k}]\leq\|u_{k}^{\delta}\|^{2}+\frac{2\gamma}{b}\delta\|Au_{k}^{\delta}-y^{\delta}+y\|\leq\|u_{k}^{\delta}\|^{2}+\frac{2\gamma}{b}\delta^{2}.

Consequently, by taking the full expectation and using the tower property of conditional expectation, we can obtain

𝔼[uk+1δ2]=𝔼[𝔼[uk+1δ2|k]]𝔼[ukδ2]+2γδ2b.\displaystyle\mathbb{E}\left[\|u_{k+1}^{\delta}\|^{2}\right]=\mathbb{E}\left[\mathbb{E}\left[\|u_{k+1}^{\delta}\|^{2}|{\mathcal{F}}_{k}\right]\right]\leq\mathbb{E}\left[\|u_{k}^{\delta}\|^{2}\right]+\frac{2\gamma\delta^{2}}{b}.

Based on this inequality and the fact u0δ=0u_{0}^{\delta}=0, we can use an induction argument to complete the proof of (13) immediately. ∎

2.2. Weak convergence

Our goal is to investigate the approximation behavior of xkδx_{k}^{\delta} to a solution of Ax=yAx=y. Because of Lemma 2.1, we now focus our consideration on the sequence {xk}\{x_{k}\} defined by Algorithm 1 using exact data.

Lemma 2.2.

Assume 0<γ<2/A20<\gamma<2/\|A\|^{2}. Then for any solution x¯\bar{x} of (1) there holds

𝔼[xk+1x¯2|k]xkx¯2c0Axky2\displaystyle\mathbb{E}[\|x_{k+1}-\bar{x}\|^{2}|{\mathcal{F}}_{k}]\leq\|x_{k}-\bar{x}\|^{2}-c_{0}\|Ax_{k}-y\|^{2} (14)

for all integers k0k\geq 0, where c0:=(2γA2)γ/b>0c_{0}:=(2-\gamma\|A\|^{2})\gamma/b>0.

Proof.

Let x¯i\bar{x}_{i} denote the ii-th component of x¯\bar{x}, i.e. x¯=(x¯1,,x¯b)\bar{x}=(\bar{x}_{1},\cdots,\bar{x}_{b}). By the definition of xk+1x_{k+1} and the polarization identity, we have

xk+1x¯2\displaystyle\|x_{k+1}-\bar{x}\|^{2} =iikxk+1,ix¯i2+xk+1,ikx¯ik2\displaystyle=\sum_{i\neq i_{k}}\|x_{k+1,i}-\bar{x}_{i}\|^{2}+\|x_{k+1,i_{k}}-\bar{x}_{i_{k}}\|^{2}
=iikxk,ix¯i2+xk,ikx¯ikγAik(Axky)2\displaystyle=\sum_{i\neq i_{k}}\|x_{k,i}-\bar{x}_{i}\|^{2}+\|x_{k,i_{k}}-\bar{x}_{i_{k}}-\gamma A_{i_{k}}^{*}(Ax_{k}-y)\|^{2}
=iikxk,ix¯i2+xk,ikx¯ik2+γ2Aik(Axky)2\displaystyle=\sum_{i\neq i_{k}}\|x_{k,i}-\bar{x}_{i}\|^{2}+\|x_{k,i_{k}}-\bar{x}_{i_{k}}\|^{2}+\gamma^{2}\|A_{i_{k}}^{*}(Ax_{k}-y)\|^{2}
2γxk,ikx¯ik,Aik(Axky)\displaystyle\quad\,-2\gamma\langle x_{k,i_{k}}-\bar{x}_{i_{k}},A_{i_{k}}^{*}(Ax_{k}-y)\rangle
=xkx¯2+γ2Aik(Axky)2\displaystyle=\|x_{k}-\bar{x}\|^{2}+\gamma^{2}\|A_{i_{k}}^{*}(Ax_{k}-y)\|^{2}
2γAik(xk,ikx¯ik),Axky.\displaystyle\quad\,-2\gamma\langle A_{i_{k}}(x_{k,i_{k}}-\bar{x}_{i_{k}}),Ax_{k}-y\rangle.

Taking the conditional expectation and using Ax¯=yA\bar{x}=y, we can obtain

𝔼[xk+1x¯2|k]\displaystyle\mathbb{E}[\|x_{k+1}-\bar{x}\|^{2}|{\mathcal{F}}_{k}]
=xkx¯2+γ2bi=1bAi(Axky)22γbi=1bAi(xk,ix¯i),Axky\displaystyle=\|x_{k}-\bar{x}\|^{2}+\frac{\gamma^{2}}{b}\sum_{i=1}^{b}\|A_{i}^{*}(Ax_{k}-y)\|^{2}-\frac{2\gamma}{b}\sum_{i=1}^{b}\langle A_{i}(x_{k,i}-\bar{x}_{i}),Ax_{k}-y\rangle
=xkx¯2+γ2bA(Axky)22γbA(xkx¯),Axky\displaystyle=\|x_{k}-\bar{x}\|^{2}+\frac{\gamma^{2}}{b}\|A^{*}(Ax_{k}-y)\|^{2}-\frac{2\gamma}{b}\langle A(x_{k}-\bar{x}),Ax_{k}-y\rangle
xkx¯2+γ2A2bAxky22γbAxky2\displaystyle\leq\|x_{k}-\bar{x}\|^{2}+\frac{\gamma^{2}\|A\|^{2}}{b}\|Ax_{k}-y\|^{2}-\frac{2\gamma}{b}\|Ax_{k}-y\|^{2}

which shows (14). ∎

To proceed further, we need the following Doob’s martingale convergence theorem ([4]).

Proposition 2.3.

Let {Uk}\{U_{k}\} be a sequence of nonnegative random variables in a probability space that is a supermartingale with respect to a filtration {k}\{{\mathcal{F}}_{k}\}, i.e.

𝔼[Uk+1|k]Uk,k.\mathbb{E}[U_{k+1}|{\mathcal{F}}_{k}]\leq U_{k},\quad\forall k.

Then {Uk}\{U_{k}\} converges almost surely to a nonnegative random variable UU with finite expectation.

Based on Proposition 2.3, we next prove the almost sure weak convergence of {xk}\{x_{k}\}. We need 𝒳{\mathcal{X}} to be separable in the sense that 𝒳{\mathcal{X}} has a countable dense subset.

Theorem 2.4.

Consider the sequence \{x_{k}\} defined by Algorithm 1 using exact data. Assume that {\mathcal{X}} is separable and 0<\gamma<2/\|A\|^{2}. Then \{x_{k}\} converges weakly to a random solution \bar{x} of (1) almost surely.

Proof.

Let SS denote the set of solutions of (1). According to Lemma 2.2, we have for any solution zSz\in S that

𝔼[xk+1z2|k]xkz2,k\mathbb{E}[\|x_{k+1}-z\|^{2}|{\mathcal{F}}_{k}]\leq\|x_{k}-z\|^{2},\quad\forall k

which means {xkz2}\{\|x_{k}-z\|^{2}\} is a supermartingale. Thus, we may use Proposition 2.3 to conclude that the event

Ωz:={limkxkz exists and is finite}\Omega_{z}:=\left\{\lim_{k\to\infty}\|x_{k}-z\|\mbox{ exists and is finite}\right\}

has probability one. We now strengthen this result by showing that there is an event Ω1\Omega_{1} of probability one such that, for any z~S\tilde{z}\in S and any sample path ωΩ1\omega\in\Omega_{1}, the limit limkxk(ω)z~\lim_{k\to\infty}\|x_{k}(\omega)-\tilde{z}\| exists. We adapt the arguments in [2, 5]. By the separability of 𝒳{\mathcal{X}}, we can find a countable set CSC\subset S such that CC is dense in SS. Let

Ω1:=zCΩz.\Omega_{1}:=\bigcap_{z\in C}\Omega_{z}.

Since (Ωz)=1\mathbb{P}(\Omega_{z})=1 for each zCz\in C and CC is countable, we have (Ω1)=1\mathbb{P}(\Omega_{1})=1. Let z~S\tilde{z}\in S be any point. Then there is a sequence {zl}C\{z_{l}\}\subset C such that zlz~z_{l}\to\tilde{z} as ll\to\infty. For any sample path ωΩ1\omega\in\Omega_{1} we have by the triangle inequality that

zlz~xk(ω)z~xk(ω)zlzlz~.-\|z_{l}-\tilde{z}\|\leq\|x_{k}(\omega)-\tilde{z}\|-\|x_{k}(\omega)-z_{l}\|\leq\|z_{l}-\tilde{z}\|.

Thus

zlz~\displaystyle-\|z_{l}-\tilde{z}\| lim infk{xk(ω)z~xk(ω)zl}\displaystyle\leq\liminf_{k\to\infty}\left\{\|x_{k}(\omega)-\tilde{z}\|-\|x_{k}(\omega)-z_{l}\|\right\}
lim supk{xk(ω)z~xk(ω)zl}\displaystyle\leq\limsup_{k\to\infty}\left\{\|x_{k}(\omega)-\tilde{z}\|-\|x_{k}(\omega)-z_{l}\|\right\}
zlz~.\displaystyle\leq\|z_{l}-\tilde{z}\|.

Since ωΩ1Ωzl\omega\in\Omega_{1}\subset\Omega_{z_{l}}, limkxk(ω)zl\lim_{k\to\infty}\|x_{k}(\omega)-z_{l}\| exists. Thus, by the properties of lim inf\liminf and lim sup\limsup we have

zlz~\displaystyle-\|z_{l}-\tilde{z}\| lim infkxk(ω)z~limkxk(ω)zl\displaystyle\leq\liminf_{k\to\infty}\|x_{k}(\omega)-\tilde{z}\|-\lim_{k\to\infty}\|x_{k}(\omega)-z_{l}\|
lim supkxk(ω)z~limkxk(ω)zl\displaystyle\leq\limsup_{k\to\infty}\|x_{k}(\omega)-\tilde{z}\|-\lim_{k\to\infty}\|x_{k}(\omega)-z_{l}\|
zlz~.\displaystyle\leq\|z_{l}-\tilde{z}\|.

This implies that both lim infkxk(ω)z~\liminf_{k\to\infty}\|x_{k}(\omega)-\tilde{z}\| and lim supkxk(ω)z~\limsup_{k\to\infty}\|x_{k}(\omega)-\tilde{z}\| are finite with

0lim supkxk(ω)z~lim infkxk(ω)z~2zlz~.0\leq\limsup_{k\to\infty}\|x_{k}(\omega)-\tilde{z}\|-\liminf_{k\to\infty}\|x_{k}(\omega)-\tilde{z}\|\leq 2\|z_{l}-\tilde{z}\|.

Letting ll\to\infty shows that

lim infkxk(ω)z~=lim supkxk(ω)z~\liminf_{k\to\infty}\|x_{k}(\omega)-\tilde{z}\|=\limsup_{k\to\infty}\|x_{k}(\omega)-\tilde{z}\|

and hence limkxk(ω)z~\lim_{k\to\infty}\|x_{k}(\omega)-\tilde{z}\| exists and is finite for every ωΩ1\omega\in\Omega_{1} and z~S\tilde{z}\in S.

Next we use Lemma 2.2 again to obtain for any zSz\in S that

𝔼[xk+1z2]𝔼[xkz2]c0𝔼[Axky2]\mathbb{E}[\|x_{k+1}-z\|^{2}]\leq\mathbb{E}[\|x_{k}-z\|^{2}]-c_{0}\mathbb{E}[\|Ax_{k}-y\|^{2}]

which implies that

𝔼[k=0Axky2]=k=0𝔼[Axky2]x0z2c0<.\mathbb{E}\left[\sum_{k=0}^{\infty}\|Ax_{k}-y\|^{2}\right]=\sum_{k=0}^{\infty}\mathbb{E}[\|Ax_{k}-y\|^{2}]\leq\frac{\|x_{0}-z\|^{2}}{c_{0}}<\infty.

Consequently, the event

Ω2:={k=0Axky2<}\displaystyle\Omega_{2}:=\left\{\sum_{k=0}^{\infty}\|Ax_{k}-y\|^{2}<\infty\right\} (15)

has probability one. Let Ω0:=Ω1Ω2\Omega_{0}:=\Omega_{1}\cap\Omega_{2}. Then (Ω0)=1\mathbb{P}(\Omega_{0})=1. Note that along any sample path in Ω0\Omega_{0} we have {xkz}\{\|x_{k}-z\|\} is convergent for any zSz\in S and

k=0Axky2<\sum_{k=0}^{\infty}\|Ax_{k}-y\|^{2}<\infty

which implies Axky0\|Ax_{k}-y\|\to 0 as kk\to\infty. The convergence of {xkz}\{\|x_{k}-z\|\} implies that {xk}\{x_{k}\} is bounded and hence it has a weakly convergent subsequence {xkl}\{x_{k_{l}}\} such that xklx¯x_{k_{l}}\rightharpoonup\bar{x} as ll\to\infty for some x¯𝒳\bar{x}\in{\mathcal{X}}, where “\rightharpoonup” denotes the weak convergence. Since AxkyAx_{k}\to y, we must have Ax¯=yA\bar{x}=y, i.e. x¯S\bar{x}\in S and consequently xkx¯\|x_{k}-\bar{x}\| converges. We now show that xkx¯x_{k}\rightharpoonup\bar{x} for the whole sequence {xk}\{x_{k}\}. It suffices to show that x¯\bar{x} is the unique weak cluster point of {xk}\{x_{k}\}. Let xx^{*} be any cluster point of {xk}\{x_{k}\}. Then there is another subsequence {xnl}\{x_{n_{l}}\} of {xk}\{x_{k}\} such that xnlxx_{n_{l}}\rightharpoonup x^{*}. From the above argument we also have xSx^{*}\in S and thus {xkx}\{\|x_{k}-x^{*}\|\} is convergent. Noting the identity

2\langle x_{k},x^{*}-\bar{x}\rangle=\|x_{k}-\bar{x}\|^{2}-\|x_{k}-x^{*}\|^{2}-\|\bar{x}\|^{2}+\|x^{*}\|^{2}.

Since both {xkx¯}\{\|x_{k}-\bar{x}\|\} and {xkx}\{\|x_{k}-x^{*}\|\} are convergent, we can conclude that limkxk,xx¯\lim_{k\to\infty}\langle x_{k},x^{*}-\bar{x}\rangle exists. Therefore

limkxk,xx¯\displaystyle\lim_{k\to\infty}\langle x_{k},x^{*}-\bar{x}\rangle =limlxkl,xx¯=x¯,xx¯,\displaystyle=\lim_{l\to\infty}\langle x_{k_{l}},x^{*}-\bar{x}\rangle=\langle\bar{x},x^{*}-\bar{x}\rangle,
limkxk,xx¯\displaystyle\lim_{k\to\infty}\langle x_{k},x^{*}-\bar{x}\rangle =limlxnl,xx¯=x,xx¯\displaystyle=\lim_{l\to\infty}\langle x_{n_{l}},x^{*}-\bar{x}\rangle=\langle x^{*},x^{*}-\bar{x}\rangle

and thus x¯,xx¯=x,xx¯\langle\bar{x},x^{*}-\bar{x}\rangle=\langle x^{*},x^{*}-\bar{x}\rangle, i.e. xx¯2=0\|x^{*}-\bar{x}\|^{2}=0 and hence x=x¯x^{*}=\bar{x}. The proof is complete. ∎

Remark 2.5.

Theorem 2.4 shows that there is an event Ω0\Omega_{0} of probability one and a random vector x¯\bar{x} such that Ax¯=yA\bar{x}=y almost surely and xkx¯x_{k}\rightharpoonup\bar{x} along every sample path in Ω0\Omega_{0}. Let xx^{\dagger} denote the unique x0x_{0}-minimum-norm solution of Ax=yAx=y. We would like to point out that x¯=x\bar{x}=x^{\dagger} almost surely if {xk}\{x_{k}\} is uniformly bounded in the sense that

there is a constant C such that xkC for all k almost surely.\displaystyle\mbox{there is a constant }C\mbox{ such that }\|x_{k}\|\leq C\mbox{ for all }k\mbox{ almost surely}. (16)

To see this, we first claim that

𝔼[xkx0,x¯x]=0,k.\displaystyle\mathbb{E}[\langle x_{k}-x_{0},\bar{x}-x^{\dagger}\rangle]=0,\quad\forall k. (17)

Indeed this is trivial for k=0k=0. Assume it is true for some k0k\geq 0. Then, by the definition of xk+1x_{k+1}, we have

xk+1x0,x¯x\displaystyle\langle x_{k+1}-x_{0},\bar{x}-x^{\dagger}\rangle =xkx0,x¯xγAik(Axky),x¯ikxik\displaystyle=\langle x_{k}-x_{0},\bar{x}-x^{\dagger}\rangle-\gamma\langle A_{i_{k}}^{*}(Ax_{k}-y),\bar{x}_{i_{k}}-x_{i_{k}}^{\dagger}\rangle
=xkx0,x¯xγAxky,Aik(x¯ikxik).\displaystyle=\langle x_{k}-x_{0},\bar{x}-x^{\dagger}\rangle-\gamma\langle Ax_{k}-y,A_{i_{k}}(\bar{x}_{i_{k}}-x_{i_{k}}^{\dagger})\rangle.

Thus

𝔼[xk+1x0,x¯x|k]\displaystyle\mathbb{E}[\langle x_{k+1}-x_{0},\bar{x}-x^{\dagger}\rangle|{\mathcal{F}}_{k}] =xkx0,x¯xγbAxky,i=1bAi(x¯ixi)\displaystyle=\langle x_{k}-x_{0},\bar{x}-x^{\dagger}\rangle-\frac{\gamma}{b}\left\langle Ax_{k}-y,\sum_{i=1}^{b}A_{i}(\bar{x}_{i}-x_{i}^{\dagger})\right\rangle
=xkx0,x¯xγbAxky,A(x¯x)\displaystyle=\langle x_{k}-x_{0},\bar{x}-x^{\dagger}\rangle-\frac{\gamma}{b}\langle Ax_{k}-y,A(\bar{x}-x^{\dagger})\rangle
=xkx0,x¯x\displaystyle=\langle x_{k}-x_{0},\bar{x}-x^{\dagger}\rangle

because Ax¯=y=AxA\bar{x}=y=Ax^{\dagger} almost surely. Consequently, by taking the full expectation and using the induction hypothesis we can obtain 𝔼[xk+1x0,x¯x]=0\mathbb{E}[\langle x_{k+1}-x_{0},\bar{x}-x^{\dagger}\rangle]=0 and the claim (17) is proved.

Under the condition (16), we have

|xkx0,x¯x|xkx0x¯x(C+x0)x¯x.|\langle x_{k}-x_{0},\bar{x}-x^{\dagger}\rangle|\leq\|x_{k}-x_{0}\|\|\bar{x}-x^{\dagger}\|\leq(C+\|x_{0}\|)\|\bar{x}-x^{\dagger}\|.

Since xkx¯x_{k}\rightharpoonup\bar{x} almost surely, by using the weak lower semi-continuity of norms and the Fatou lemma we can obtain from Lemma 2.2 that

𝔼[x¯x2]\displaystyle\mathbb{E}[\|\bar{x}-x^{\dagger}\|^{2}] 𝔼[lim infkxkx2]lim infk𝔼[xkx2]\displaystyle\leq\mathbb{E}\left[\liminf_{k\to\infty}\|x_{k}-x^{\dagger}\|^{2}\right]\leq\liminf_{k\to\infty}\mathbb{E}[\|x_{k}-x^{\dagger}\|^{2}]
x0x2<\displaystyle\leq\|x_{0}-x^{\dagger}\|^{2}<\infty

and thus

𝔼[x¯x](𝔼[x¯x2])1/2x0x<.\mathbb{E}[\|\bar{x}-x^{\dagger}\|]\leq\left(\mathbb{E}[\|\bar{x}-x^{\dagger}\|^{2}]\right)^{1/2}\leq\|x_{0}-x^{\dagger}\|<\infty.

Therefore, we may use the dominated convergence theorem, xk,x¯xx¯,x¯x\langle x_{k},\bar{x}-x^{\dagger}\rangle\to\langle\bar{x},\bar{x}-x^{\dagger}\rangle and (17) to conclude

𝔼[x¯x0,x¯x]=limk𝔼[xkx0,x¯x]=0.\mathbb{E}[\langle\bar{x}-x_{0},\bar{x}-x^{\dagger}\rangle]=\lim_{k\to\infty}\mathbb{E}[\langle x_{k}-x_{0},\bar{x}-x^{\dagger}\rangle]=0.

Since xx0,x¯x=0\langle x^{\dagger}-x_{0},\bar{x}-x^{\dagger}\rangle=0 always holds as xx^{\dagger} is the x0x_{0}-minimum-norm solution, we thus obtain 𝔼[x¯x2]=0\mathbb{E}[\|\bar{x}-x^{\dagger}\|^{2}]=0 which implies that x¯=x\bar{x}=x^{\dagger} almost surely.

It should be emphasized that the above characterization of \bar{x} depends crucially on the condition (16). We have not yet verified it for a general forward operator A. However, for the operator A with a particular tensor product form as studied in [21], we will show in Subsection 2.4 that (16) holds.

By using Lemma 2.1 and Theorem 2.4, now we are ready to prove the following weak convergence result on Algorithm 1 under an a priori stopping rule.

Theorem 2.6.

For any sequence {yδn}\{y^{\delta_{n}}\} of noisy data satisfying yδnyδn\|y^{\delta_{n}}-y\|\leq\delta_{n} with δn0\delta_{n}\to 0 as nn\to\infty, let {xkδn}\{x_{k}^{\delta_{n}}\} be the iterative sequence produced by Algorithm 1 with yδy^{\delta} replaced by yδny^{\delta_{n}}, where 0<γ<2/A20<\gamma<2/\|A\|^{2}. Let the integer knk_{n} be chosen such that knk_{n}\to\infty and knδn20k_{n}\delta_{n}^{2}\to 0 as nn\to\infty. Then, by taking a subsequence of {xknδn}\{x_{k_{n}}^{\delta_{n}}\} if necessary, there holds xknδnx¯x_{k_{n}}^{\delta_{n}}\rightharpoonup\bar{x} as nn\to\infty almost surely, where x¯\bar{x} denotes the random solution of Ax=yAx=y determined in Theorem 2.4.

Proof.

According to Lemma 2.1 and knδn20k_{n}\delta_{n}^{2}\to 0 we have

𝔼[xknδnxkn2]2γbknδn20as n.\mathbb{E}\left[\|x_{k_{n}}^{\delta_{n}}-x_{k_{n}}\|^{2}\right]\leq\frac{2\gamma}{b}k_{n}\delta_{n}^{2}\to 0\quad\mbox{as }n\to\infty.

Therefore, by taking a subsequence of {xknδn}\{x_{k_{n}}^{\delta_{n}}\} if necessary, we can find an event Ω3\Omega_{3} with (Ω3)=1{\mathbb{P}}(\Omega_{3})=1 such that xknδnxkn0x_{k_{n}}^{\delta_{n}}-x_{k_{n}}\to 0 along every sample path in Ω3\Omega_{3}. According to Theorem 2.4 and knk_{n}\to\infty, there is an event Ω4\Omega_{4} of probability one such that xknx¯x_{k_{n}}\rightharpoonup\bar{x} as nn\to\infty along every sample path in Ω4\Omega_{4}. Let Ω:=Ω3Ω4\Omega:=\Omega_{3}\cap\Omega_{4}. Then (Ω)=1{\mathbb{P}}(\Omega)=1 and for any x𝒳x\in{\mathcal{X}} there holds

xknδnx¯,x=xknδnxkn,x+xknx¯,x0\displaystyle\langle x_{k_{n}}^{\delta_{n}}-\bar{x},x\rangle=\langle x_{k_{n}}^{\delta_{n}}-x_{k_{n}},x\rangle+\langle x_{k_{n}}-\bar{x},x\rangle\to 0

as nn\to\infty along every sample path in Ω\Omega. The proof is thus complete. ∎

2.3. The discrepancy principle

The above weak convergence result on Algorithm 1 is established under an a priori stopping rule. In applications, we usually expect to terminate the iteration by a posteriori rules. Since r_{k}^{\delta}:=Ax_{k}^{\delta}-y^{\delta} is already computed within the algorithm, it is natural to consider terminating the iteration by the discrepancy principle, which determines k_{\delta} to be the first integer such that \|r_{k_{\delta}}^{\delta}\|\leq\tau\delta, where \tau>1 is a given number. Incorporating the discrepancy principle into Algorithm 1 leads to the following algorithm.

Algorithm 2 (RBCD method with the discrepancy principle).

Pick an initial guess x0𝒳x_{0}\in{\mathcal{X}}, set x0δ:=x0x_{0}^{\delta}:=x_{0} and calculate r0δ:=Ax0δyδr_{0}^{\delta}:=Ax_{0}^{\delta}-y^{\delta}. Choose τ>1\tau>1 and γ>0\gamma>0. For all integers k0k\geq 0 do the following:

(i) Set the step size \gamma_{k} by

\gamma_{k}:=\left\{\begin{array}{ll}\gamma&\mbox{ if }\|r_{k}^{\delta}\|>\tau\delta,\\ 0&\mbox{ if }\|r_{k}^{\delta}\|\leq\tau\delta;\end{array}\right.

(ii) Pick an index i_{k}\in\{1,\cdots,b\} randomly via the uniform distribution;

(iii) Update x_{k+1}^{\delta} by setting x_{k+1,i}^{\delta}=x_{k,i}^{\delta} for i\neq i_{k} and

x_{k+1,i_{k}}^{\delta}=x_{k,i_{k}}^{\delta}-\gamma_{k}A_{i_{k}}^{*}r_{k}^{\delta};

(iv) Calculate r_{k+1}^{\delta}=r_{k}^{\delta}+A_{i_{k}}(x_{k+1,i_{k}}^{\delta}-x_{k,i_{k}}^{\delta}).
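A minimal NumPy sketch of Algorithm 2 is given below; since the iterates are no longer updated once \|r_{k}^{\delta}\|\leq\tau\delta (the step size \gamma_{k} becomes zero), the loop simply stops at that point. The function signature and sizes are again illustrative placeholders.

```python
import numpy as np

def rbcd_dp(A_blocks, y_delta, gamma, delta, tau=1.5, max_iters=10**6, rng=None):
    """RBCD method stopped by the discrepancy principle (Algorithm 2)."""
    rng = np.random.default_rng() if rng is None else rng
    b = len(A_blocks)
    x = [np.zeros(A.shape[1]) for A in A_blocks]
    r = -y_delta.copy()                       # r_0 = A x_0 - y^delta with x_0 = 0
    k = 0
    while k < max_iters and np.linalg.norm(r) > tau * delta:
        i = rng.integers(b)                   # (ii) uniform random block index
        step = -gamma * (A_blocks[i].T @ r)   # (iii) gamma_k = gamma above the threshold
        x[i] = x[i] + step
        r = r + A_blocks[i] @ step            # (iv) recursive residual update
        k += 1
    return x, r, k                            # k is the (random) stopping index
```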

Algorithm 2 is formulated in such a way that it incorporates the discrepancy principle to define an infinite sequence \{x_{k}^{\delta}\}, which is convenient for the analysis below. In numerical simulations, the iteration should be terminated as soon as \|r_{k}^{\delta}\|\leq\tau\delta is satisfied, because the iterates are no longer updated afterwards. It should be highlighted that the stopping index depends crucially on the sample path and thus is a random integer. Note also that the step size \gamma_{k} in Algorithm 2 is a random number; this sharply contrasts with the step size \gamma in Algorithm 1, which is deterministic.

Proposition 2.7.

Consider Algorithm 2 with \gamma=\mu/\|A\|^{2} for some 0<\mu<2. Then the iteration must terminate after finitely many steps almost surely. If in addition 0<\mu<2-2/\tau, then for any solution \bar{x} of (1) there holds

𝔼[xk+1δx¯2]𝔼[xkδx¯2]c1𝔼[γkAxkδyδ2]\displaystyle\mathbb{E}[\|x_{k+1}^{\delta}-\bar{x}\|^{2}]\leq\mathbb{E}[\|x_{k}^{\delta}-\bar{x}\|^{2}]-c_{1}\mathbb{E}\left[\gamma_{k}\|Ax_{k}^{\delta}-y^{\delta}\|^{2}\right] (18)

for all integers k0k\geq 0, where c1:=(22/τμ)/b>0c_{1}:=(2-2/\tau-\mu)/b>0.

Proof.

By virtue of (12) in Lemma 2.1 we have along any sample path that

AxkδyδAxky+A(xkδxk)yδ+yAxky+δ.\|Ax_{k}^{\delta}-y^{\delta}\|\leq\|Ax_{k}-y\|+\|A(x_{k}^{\delta}-x_{k})-y^{\delta}+y\|\leq\|Ax_{k}-y\|+\delta.

In the proof of Theorem 2.4 we have shown that

(limkAxky=0)=1.\mathbb{P}\left(\lim_{k\to\infty}\|Ax_{k}-y\|=0\right)=1.

Therefore, as τ>1\tau>1, it follows that

lim supkAxkδyδδ<τδalmost surely\displaystyle\limsup_{k\to\infty}\|Ax_{k}^{\delta}-y^{\delta}\|\leq\delta<\tau\delta\quad\mbox{almost surely}

which means that \|Ax_{k}^{\delta}-y^{\delta}\|<\tau\delta for some finite integer k almost surely, i.e. Algorithm 2 must terminate after finitely many steps almost surely.

Next we show (18) under the additional condition 0<μ<22/τ0<\mu<2-2/\tau. By following the proof of Lemma 2.2 we can obtain

xk+1δx¯2\displaystyle\|x_{k+1}^{\delta}-\bar{x}\|^{2} =xkδx¯2+γk2Aik(Axkδyδ)2\displaystyle=\|x_{k}^{\delta}-\bar{x}\|^{2}+\gamma_{k}^{2}\|A_{i_{k}}^{*}(Ax_{k}^{\delta}-y^{\delta})\|^{2}
2γkAik(xk,ikδx¯ik),Axkδyδ.\displaystyle\quad\,-2\gamma_{k}\langle A_{i_{k}}(x_{k,i_{k}}^{\delta}-\bar{x}_{i_{k}}),Ax_{k}^{\delta}-y^{\delta}\rangle.

By taking the conditional expectation on k{\mathcal{F}}_{k} and noting that γk\gamma_{k} is k{\mathcal{F}}_{k}-measurable, we have

𝔼[xk+1δx¯2|k]\displaystyle\mathbb{E}[\|x_{k+1}^{\delta}-\bar{x}\|^{2}|{\mathcal{F}}_{k}]
=xkδx¯2+γk2bi=1bAi(Axkδyδ)22γkbi=1bAi(xk,iδx¯i),Axkδyδ\displaystyle=\|x_{k}^{\delta}-\bar{x}\|^{2}+\frac{\gamma_{k}^{2}}{b}\sum_{i=1}^{b}\|A_{i}^{*}(Ax_{k}^{\delta}-y^{\delta})\|^{2}-\frac{2\gamma_{k}}{b}\left\langle\sum_{i=1}^{b}A_{i}(x_{k,i}^{\delta}-\bar{x}_{i}),Ax_{k}^{\delta}-y^{\delta}\right\rangle
=xkδx¯2+γk2bA(Axkδyδ)22γkbA(xkδx¯),Axkδyδ\displaystyle=\|x_{k}^{\delta}-\bar{x}\|^{2}+\frac{\gamma_{k}^{2}}{b}\|A^{*}(Ax_{k}^{\delta}-y^{\delta})\|^{2}-\frac{2\gamma_{k}}{b}\langle A(x_{k}^{\delta}-\bar{x}),Ax_{k}^{\delta}-y^{\delta}\rangle
xkδx¯2+γk2A2bAxkδyδ22γkbAxkδyδ2+2γkδbAxkδyδ.\displaystyle\leq\|x_{k}^{\delta}-\bar{x}\|^{2}+\frac{\gamma_{k}^{2}\|A\|^{2}}{b}\|Ax_{k}^{\delta}-y^{\delta}\|^{2}-\frac{2\gamma_{k}}{b}\|Ax_{k}^{\delta}-y^{\delta}\|^{2}+\frac{2\gamma_{k}\delta}{b}\|Ax_{k}^{\delta}-y^{\delta}\|.

By the definition of γk\gamma_{k} we have γkδγkτAxkδyδ\gamma_{k}\delta\leq\frac{\gamma_{k}}{\tau}\|Ax_{k}^{\delta}-y^{\delta}\|. Therefore

𝔼[xk+1δx¯2|k]\displaystyle\mathbb{E}[\|x_{k+1}^{\delta}-\bar{x}\|^{2}|{\mathcal{F}}_{k}] xkδx¯21b(22τγkA2)γkAxkδyδ2\displaystyle\leq\|x_{k}^{\delta}-\bar{x}\|^{2}-\frac{1}{b}\left(2-\frac{2}{\tau}-\gamma_{k}\|A\|^{2}\right)\gamma_{k}\|Ax_{k}^{\delta}-y^{\delta}\|^{2}
xkδx¯21b(22τμ)γkAxkδyδ2.\displaystyle\leq\|x_{k}^{\delta}-\bar{x}\|^{2}-\frac{1}{b}\left(2-\frac{2}{\tau}-\mu\right)\gamma_{k}\|Ax_{k}^{\delta}-y^{\delta}\|^{2}.

Taking the full expectation gives (18). ∎

2.4. Strong convergence

In Subsection 2.2 we obtained a weak convergence result on Algorithm 1. In this section we will show that, for the special case studied in [21], a strong convergence result can be derived. To start with, let V=(v_{li}) be a d\times b matrix and let K:X\to Y be a bounded linear operator. We will consider the problem (11), which can be written as Ax=y by setting y:=(y_{1},\cdots,y_{d})\in Y^{d} and defining A:X^{b}\to Y^{d} as

Ax:=(i=1bvliKxi)l=1d,x=(xi)i=1bXb.Ax:=\left(\sum_{i=1}^{b}v_{li}Kx_{i}\right)_{l=1}^{d},\quad\forall x=(x_{i})_{i=1}^{b}\in X^{b}.

It is easy to see that (11) is a special case of (1) with Xi=XX_{i}=X for each ii, 𝒴=Yd{\mathcal{Y}}=Y^{d}, and

Aiz:=(vliKz)l=1d,zX.A_{i}z:=\left(v_{li}Kz\right)_{l=1}^{d},\quad\forall z\in X.

Note that Ai:YdXA^{*}_{i}:Y^{d}\to X, the adjoint of AiA_{i}, has the following form

Aiy~=l=1dvliKy~l,y~=(y~l)l=1dYd.A^{*}_{i}\tilde{y}=\sum_{l=1}^{d}v_{li}K^{*}\tilde{y}_{l},\quad\forall\tilde{y}=(\tilde{y}_{l})_{l=1}^{d}\in Y^{d}.
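In the finite-dimensional case these block operators and their adjoints can be implemented directly from the matrix V and a matrix representation of K, as in the following sketch; the sizes are hypothetical.

```python
import numpy as np

d, b, n, m = 6, 4, 50, 40        # hypothetical sizes; K maps R^n to R^m
V = np.random.randn(d, b)        # assumed to have full column rank
K = np.random.randn(m, n)

def A_i(i, z):
    """A_i z = (v_{1i} K z, ..., v_{di} K z), stacked as a (d, m) array."""
    return np.outer(V[:, i], K @ z)

def A_i_adjoint(i, y_tilde):
    """A_i^* y_tilde = sum_l v_{li} K^T y_tilde_l for y_tilde of shape (d, m)."""
    return K.T @ (V[:, i] @ y_tilde)
```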

Thus, when our randomized block coordinate descent method, i.e. Algorithm 1, is used to solve (11) with the exact data, the iteration scheme becomes xk+1,i=xk,ix_{k+1,i}=x_{k,i} if iiki\neq i_{k} and

xk+1,ik=xk,ikγAik(Axky)=xk,ikγl=1dvlikK(Axky)l,\displaystyle x_{k+1,i_{k}}=x_{k,i_{k}}-\gamma A_{i_{k}}^{*}(Ax_{k}-y)=x_{k,i_{k}}-\gamma\sum_{l=1}^{d}v_{li_{k}}K^{*}(Ax_{k}-y)_{l},

where (Axky)l(Ax_{k}-y)_{l} denotes the ll-th component of AxkyAx_{k}-y. In order to give a convergence analysis, we introduce θk=(θk,l)l=1d\theta_{k}=(\theta_{k,l})_{l=1}^{d} with

θk,l=i=1bvlixk,i,l=1,,d\theta_{k,l}=\sum_{i=1}^{b}v_{li}x_{k,i},\quad l=1,\cdots,d

and for any solution x^\hat{x} of (11) we set θ^=(θ^l)l=1d\hat{\theta}=(\hat{\theta}_{l})_{l=1}^{d} with

θ^l=i=1bvlix^i,l=1,,d.\hat{\theta}_{l}=\sum_{i=1}^{b}v_{li}\hat{x}_{i},\quad l=1,\cdots,d.

Then, we have

θk+1,l\displaystyle\theta_{k+1,l} =iikvlixk+1,i+vlikxk+1,ik\displaystyle=\sum_{i\neq i_{k}}v_{li}x_{k+1,i}+v_{li_{k}}x_{k+1,i_{k}}
=iikvlixk,i+vlikxk,ikγvlikl=1dvlikK(Axky)l\displaystyle=\sum_{i\neq i_{k}}v_{li}x_{k,i}+v_{li_{k}}x_{k,i_{k}}-\gamma v_{li_{k}}\sum_{l^{\prime}=1}^{d}v_{l^{\prime}i_{k}}K^{*}(Ax_{k}-y)_{l^{\prime}}
=θk,lγvlikl=1dvlikK(Axky)l.\displaystyle=\theta_{k,l}-\gamma v_{li_{k}}\sum_{l^{\prime}=1}^{d}v_{l^{\prime}i_{k}}K^{*}(Ax_{k}-y)_{l^{\prime}}.

Consequently,

θk+1θ^2\displaystyle\|\theta_{k+1}-\hat{\theta}\|^{2} =l=1dθk+1,lθ^l2=l=1dθk,lθ^lγvlikl=1dvlikK(Axky)l2\displaystyle=\sum_{l=1}^{d}\|\theta_{k+1,l}-\hat{\theta}_{l}\|^{2}=\sum_{l=1}^{d}\left\|\theta_{k,l}-\hat{\theta}_{l}-\gamma v_{li_{k}}\sum_{l^{\prime}=1}^{d}v_{l^{\prime}i_{k}}K^{*}(Ax_{k}-y)_{l^{\prime}}\right\|^{2}
=l=1dθk,lθ^l22γΔ1+γ2Δ2,\displaystyle=\sum_{l=1}^{d}\|\theta_{k,l}-\hat{\theta}_{l}\|^{2}-2\gamma\Delta_{1}+\gamma^{2}\Delta_{2}, (19)

where

Δ1\displaystyle\Delta_{1} =l=1dθk,lθ^l,vlikl=1dvlikK(Axky)l,\displaystyle=\sum_{l=1}^{d}\left\langle\theta_{k,l}-\hat{\theta}_{l},v_{li_{k}}\sum_{l^{\prime}=1}^{d}v_{l^{\prime}i_{k}}K^{*}(Ax_{k}-y)_{l^{\prime}}\right\rangle,
Δ2\displaystyle\Delta_{2} =l=1dvlikl=1dvlikK(Axky)l2.\displaystyle=\sum_{l=1}^{d}\left\|v_{li_{k}}\sum_{l^{\prime}=1}^{d}v_{l^{\prime}i_{k}}K^{*}(Ax_{k}-y)_{l^{\prime}}\right\|^{2}.

By straightforward calculation, we can obtain

Δ1\displaystyle\Delta_{1} =l=1di=1bvlixk,ix^i,vlikl=1dvlikK(Axky)l\displaystyle=\sum_{l=1}^{d}\sum_{i=1}^{b}v_{li}\left\langle x_{k,i}-\hat{x}_{i},v_{li_{k}}\sum_{l^{\prime}=1}^{d}v_{l^{\prime}i_{k}}K^{*}(Ax_{k}-y)_{l^{\prime}}\right\rangle
=l=1dl=1dvlikvliki=1bvliK(xk,ix^i),(Axky)l\displaystyle=\sum_{l=1}^{d}\sum_{l^{\prime}=1}^{d}v_{li_{k}}v_{l^{\prime}i_{k}}\left<\sum_{i=1}^{b}v_{li}K(x_{k,i}-\hat{x}_{i}),(Ax_{k}-y)_{l^{\prime}}\right>
=l=1dl=1dvlikvlik(Axky)l,(Axky)l\displaystyle=\sum_{l=1}^{d}\sum_{l^{\prime}=1}^{d}v_{li_{k}}v_{l^{\prime}i_{k}}\left<(Ax_{k}-y)_{l},(Ax_{k}-y)_{l^{\prime}}\right>
=l=1dvlik(Axky)l2\displaystyle=\left\|\sum_{l=1}^{d}v_{li_{k}}(Ax_{k}-y)_{l}\right\|^{2}

and

Δ2\displaystyle\Delta_{2} l=1d|vlik|2K2l=1dvlik(Axky)l2=vik2K2l=1dvlik(Axky)l2,\displaystyle\leq\sum_{l=1}^{d}|v_{li_{k}}|^{2}\|K\|^{2}\left\|\sum_{l^{\prime}=1}^{d}v_{l^{\prime}i_{k}}(Ax_{k}-y)_{l^{\prime}}\right\|^{2}=\|v_{i_{k}}\|^{2}\|K\|^{2}\left\|\sum_{l=1}^{d}v_{li_{k}}(Ax_{k}-y)_{l}\right\|^{2},

where, for each i, we use v_{i} to denote the i-th column of V. Combining these two estimates with (19) gives

θk+1θ^2θkθ^2(2γvik2K2)γl=1dvlik(Axky)l2\|\theta_{k+1}-\hat{\theta}\|^{2}\leq\|\theta_{k}-\hat{\theta}\|^{2}-(2-\gamma\|v_{i_{k}}\|^{2}\|K\|^{2})\gamma\left\|\sum_{l=1}^{d}v_{li_{k}}(Ax_{k}-y)_{l}\right\|^{2}

which shows the following result.

Lemma 2.8.

Consider Algorithm 1 for solving (11) with the exact data. Let v=max{vi2:i=1,,b}v^{*}=\max\{\|v_{i}\|^{2}:i=1,\cdots,b\}. If 0<γ<2/(vK2)0<\gamma<2/(v^{*}\|K\|^{2}), then

θk+1θ^2θkθ^2c2l=1dvlik(Axky)l2\|\theta_{k+1}-\hat{\theta}\|^{2}\leq\|\theta_{k}-\hat{\theta}\|^{2}-c_{2}\left\|\sum_{l=1}^{d}v_{li_{k}}(Ax_{k}-y)_{l}\right\|^{2}

for all k0k\geq 0, where c2:=(2γvK2)γ>0c_{2}:=(2-\gamma v^{*}\|K\|^{2})\gamma>0. Consequently, {θkθ^2}\{\|\theta_{k}-\hat{\theta}\|^{2}\} is monotonically decreasing.

Based on Lemma 2.8, now we show the almost sure strong convergence of {xk}\{x_{k}\} under the assumption that V=(vli)V=(v_{li}) is of full column rank.

Theorem 2.9.

Consider Algorithm 1 for solving (11) with the exact data. Assume that V=(vli)V=(v_{li}) is of full column rank and let vv^{*} be defined as in Lemma 2.8. If 0<γ<2/(vK2)0<\gamma<2/(v^{*}\|K\|^{2}), then xkx0\|x_{k}-x^{\dagger}\|\to 0 almost surely and 𝔼[xkx2]0\mathbb{E}[\|x_{k}-x^{\dagger}\|^{2}]\to 0 as kk\to\infty, where xx^{\dagger} denotes the unique x0x_{0}-minimum-norm solution of (11).

Proof.

Consider the event Ω2\Omega_{2} defined in (15). It is known that (Ω2)=1\mathbb{P}(\Omega_{2})=1. We now fix an arbitrary sample path {ik:k=0,1,}\{i_{k}:k=0,1,\cdots\} in Ω2\Omega_{2} and show that, along this sample path, {θk}\{\theta_{k}\} is a Cauchy sequence. Recall that

k=0Axky2<\displaystyle\sum_{k=0}^{\infty}\|Ax_{k}-y\|^{2}<\infty (20)

and hence Axky0\|Ax_{k}-y\|\to 0 as kk\to\infty. Given any two positive integers pqp\leq q, let kk^{*} be an integer such that pkqp\leq k^{*}\leq q and

Axky=min{Axky:pkq}.\displaystyle\|Ax_{k^{*}}-y\|=\min\left\{\|Ax_{k}-y\|:p\leq k\leq q\right\}. (21)

Then

θpθq22(θpθk2+θqθk2).\|\theta_{p}-\theta_{q}\|^{2}\leq 2\left(\|\theta_{p}-\theta_{k^{*}}\|^{2}+\|\theta_{q}-\theta_{k^{*}}\|^{2}\right).

We now show that θpθk20\|\theta_{p}-\theta_{k^{*}}\|^{2}\rightarrow 0 and θqθk20\|\theta_{q}-\theta_{k^{*}}\|^{2}\rightarrow 0 as pp\to\infty. To show the first assertion, we note that

θpθk2=θpθ^2θkθ^2+2θkθp,θkθ^.\|\theta_{p}-\theta_{k^{*}}\|^{2}=\|\theta_{p}-\hat{\theta}\|^{2}-\|\theta_{k^{*}}-\hat{\theta}\|^{2}+2\left<\theta_{k^{*}}-\theta_{p},\theta_{k^{*}}-\hat{\theta}\right>.

Since, by Lemma 2.8, {θpθ^}\{\|\theta_{p}-\hat{\theta}\|\} is monotonically decreasing, limpθpθ^\lim_{p\to\infty}\|\theta_{p}-\hat{\theta}\| exists and thus θpθ^2θkθ^20\|\theta_{p}-\hat{\theta}\|^{2}-\|\theta_{k^{*}}-\hat{\theta}\|^{2}\to 0 as pp\to\infty. To estimate θkθp,θkθ^\langle\theta_{k^{*}}-\theta_{p},\theta_{k^{*}}-\hat{\theta}\rangle, we first write

θkθp,θkθ^=k=pk1θk+1θk,θkθ^.\displaystyle\left\langle\theta_{k^{*}}-\theta_{p},\theta_{k^{*}}-\hat{\theta}\right\rangle=\sum_{k=p}^{k^{*}-1}\left\langle\theta_{k+1}-\theta_{k},\theta_{k^{*}}-\hat{\theta}\right\rangle.

By the definition of θk+1\theta_{k+1} and θ^\hat{\theta}, we have

θkθp,θkθ^\displaystyle\left\langle\theta_{k^{*}}-\theta_{p},\theta_{k^{*}}-\hat{\theta}\right\rangle =k=pk1l=1dθk+1,lθk,l,θk,lθ^l\displaystyle=\sum_{k=p}^{k^{*}-1}\sum_{l=1}^{d}\left\langle\theta_{k+1,l}-\theta_{k,l},\theta_{k^{*},l}-\hat{\theta}_{l}\right\rangle
=k=pk1l=1di=1bvli(xk+1,ixk,i),i=1bvli(xk,ix^i)\displaystyle=\sum_{k=p}^{k^{*}-1}\sum_{l=1}^{d}\left\langle\sum_{i=1}^{b}v_{li}(x_{k+1,i}-x_{k,i}),\sum_{i^{\prime}=1}^{b}v_{li^{\prime}}(x_{k^{*},i^{\prime}}-\hat{x}_{i^{\prime}})\right\rangle
=k=pk1l=1dvlik(xk+1,ikxk,ik),i=1bvli(xk,ix^i)\displaystyle=\sum_{k=p}^{k^{*}-1}\sum_{l=1}^{d}\left<v_{li_{k}}(x_{k+1,i_{k}}-x_{k,i_{k}}),\sum_{i^{\prime}=1}^{b}v_{li^{\prime}}(x_{k^{*},i^{\prime}}-\hat{x}_{i^{\prime}})\right>
=γk=pk1l=1dvlikAik(Axky),i=1bvli(xk,ix^i)\displaystyle=-\gamma\sum_{k=p}^{k^{*}-1}\sum_{l=1}^{d}\left<v_{li_{k}}A_{i_{k}}^{*}(Ax_{k}-y),\sum_{i^{\prime}=1}^{b}v_{li^{\prime}}(x_{k^{*},i^{\prime}}-\hat{x}_{i^{\prime}})\right>
=γk=pk1l=1di=1bvlikvliAxky,Aik(xk,ix^i).\displaystyle=-\gamma\sum_{k=p}^{k^{*}-1}\sum_{l=1}^{d}\sum_{i^{\prime}=1}^{b}v_{li_{k}}v_{li^{\prime}}\left\langle Ax_{k}-y,A_{i_{k}}(x_{k^{*},i^{\prime}}-\hat{x}_{i^{\prime}})\right\rangle.

Using the definition of AikA_{i_{k}} we further have

θkθp,θkθ^\displaystyle\left\langle\theta_{k^{*}}-\theta_{p},\theta_{k^{*}}-\hat{\theta}\right\rangle =γk=pk1l=1di=1bvlikvliAxky,(vlikK(xk,ix^i))l=1d\displaystyle=-\gamma\sum_{k=p}^{k^{*}-1}\sum_{l=1}^{d}\sum_{i^{\prime}=1}^{b}v_{li_{k}}v_{li^{\prime}}\left<Ax_{k}-y,\left(v_{l^{\prime}i_{k}}K(x_{k^{*},i^{\prime}}-\hat{x}_{i^{\prime}})\right)_{l^{\prime}=1}^{d}\right>
=γk=pk1l,l=1di=1bvlikvlikvli(Axky)l,K(xk,ix^i)\displaystyle=-\gamma\sum_{k=p}^{k^{*}-1}\sum_{l,l^{\prime}=1}^{d}\sum_{i^{\prime}=1}^{b}v_{li_{k}}v_{l^{\prime}i_{k}}v_{li^{\prime}}\left<(Ax_{k}-y)_{l^{\prime}},K(x_{k^{*},i^{\prime}}-\hat{x}_{i^{\prime}})\right>
=γk=pk1l,l=1dvlikvlik(Axky)l,(Axky)l\displaystyle=-\gamma\sum_{k=p}^{k^{*}-1}\sum_{l,l^{\prime}=1}^{d}v_{li_{k}}v_{l^{\prime}i_{k}}\left<(Ax_{k}-y)_{l^{\prime}},(Ax_{k^{*}}-y)_{l}\right>
=γk=pk1l=1dvlik(Axky)l,l=1dvlik(Axky)l.\displaystyle=-\gamma\sum_{k=p}^{k^{*}-1}\left<\sum_{l^{\prime}=1}^{d}v_{l^{\prime}i_{k}}(Ax_{k}-y)_{l^{\prime}},\sum_{l=1}^{d}v_{li_{k}}(Ax_{k^{*}}-y)_{l}\right>.

By virtue of the Cauchy-Schwarz inequality, we then obtain

|θkθp,θkθ^|\displaystyle\left|\left\langle\theta_{k^{*}}-\theta_{p},\theta_{k^{*}}-\hat{\theta}\right\rangle\right| γk=pk1l=1dvlik(Axky)ll=1dvlik(Axky)l\displaystyle\leq\gamma\sum_{k=p}^{k^{*}-1}\left\|\sum_{l^{\prime}=1}^{d}v_{l^{\prime}i_{k}}(Ax_{k}-y)_{l^{\prime}}\right\|\left\|\sum_{l=1}^{d}v_{li_{k}}(Ax_{k^{*}}-y)_{l}\right\|
γk=pk1l=1d|vlik|(Axky)ll=1d|vlik|(Axky)l\displaystyle\leq\gamma\sum_{k=p}^{k^{*}-1}\sum_{l^{\prime}=1}^{d}|v_{l^{\prime}i_{k}}|\|(Ax_{k}-y)_{l^{\prime}}\|\sum_{l=1}^{d}|v_{li_{k}}|\|(Ax_{k^{*}}-y)_{l}\|
γk=pk1vik2AxkyAxky\displaystyle\leq\gamma\sum_{k=p}^{k^{*}-1}\|v_{i_{k}}\|^{2}\|Ax_{k}-y\|\|Ax_{k^{*}}-y\|
γvk=pk1AxkyAxky.\displaystyle\leq\gamma v^{*}\sum_{k=p}^{k^{*}-1}\|Ax_{k}-y\|\|Ax_{k^{*}}-y\|.

With the help of (21) and (20) we further obtain

|θkθp,θkθ^|\displaystyle\left|\left\langle\theta_{k^{*}}-\theta_{p},\theta_{k^{*}}-\hat{\theta}\right\rangle\right| γvk=pk1Axky20\displaystyle\leq\gamma v^{*}\sum_{k=p}^{k^{*}-1}\|Ax_{k}-y\|^{2}\to 0

as pp\rightarrow\infty. Thus θpθk0\|\theta_{p}-\theta_{k^{*}}\|\rightarrow 0 as pp\to\infty. Similarly, we can also obtain θqθk0\|\theta_{q}-\theta_{k^{*}}\|\rightarrow 0 as pp\rightarrow\infty. Therefore, {θk}\{\theta_{k}\} is a Cauchy sequence.

Recall that VV is assumed to be of full column rank. Thus we can find a b×db\times d matrix U=(uil)U=(u_{il}) such that UV=IbUV=I_{b}, where IbI_{b} is the b×bb\times b identity matrix. Then

l=1duilθk,l\displaystyle\sum_{l=1}^{d}u_{il}\theta_{k,l} =l=1duili=1bvlixk,i=i=1b(l=1duilvli)xk,i=i=1bδiixk,i=xk,i\displaystyle=\sum_{l=1}^{d}u_{il}\sum_{i^{\prime}=1}^{b}v_{li^{\prime}}x_{k,i^{\prime}}=\sum_{i^{\prime}=1}^{b}\left(\sum_{l=1}^{d}u_{il}v_{li^{\prime}}\right)x_{k,i^{\prime}}=\sum_{i^{\prime}=1}^{b}\delta_{ii^{\prime}}x_{k,i^{\prime}}=x_{k,i}

for i=1,\cdots,b. Hence we can recover x_{k} from \theta_{k}. Let \|U\|_{F} denote the Frobenius norm of U. Then by the Cauchy-Schwarz inequality we can obtain

xpxq2\displaystyle\|x_{p}-x_{q}\|^{2} =i=1bl=1duil(θp,lθq,l)2i=1b(l=1d|uil|θp,lθq,l)2\displaystyle=\sum_{i=1}^{b}\left\|\sum_{l=1}^{d}u_{il}(\theta_{p,l}-\theta_{q,l})\right\|^{2}\leq\sum_{i=1}^{b}\left(\sum_{l=1}^{d}|u_{il}|\|\theta_{p,l}-\theta_{q,l}\|\right)^{2}
(i=1bl=1d|uil|2)(l=1dθp,lθq,l2)\displaystyle\leq\left(\sum_{i=1}^{b}\sum_{l=1}^{d}|u_{il}|^{2}\right)\left(\sum_{l=1}^{d}\|\theta_{p,l}-\theta_{q,l}\|^{2}\right)
UF2θpθq2\displaystyle\leq\|U\|_{F}^{2}\|\theta_{p}-\theta_{q}\|^{2}

which implies that {xk}\{x_{k}\} is also a Cauchy sequence and hence xkxx_{k}\to x^{*} as kk\to\infty for some xXbx^{*}\in X^{b}. Since Axky0\|Ax_{k}-y\|\to 0 as kk\to\infty, we can conclude Axy=0\|Ax^{*}-y\|=0, i.e. xx^{*} is a solution of Ax=yAx=y.

The above argument actually shows that there is a random solution x^{*} of Ax=y such that x_{k}\to x^{*} as k\to\infty along any sample path in \Omega_{2}. Since \mathbb{P}(\Omega_{2})=1, we have x_{k}\to x^{*} as k\to\infty almost surely. Since Lemma 2.8 implies \|\theta_{k}-\hat{\theta}\|\leq\|\theta_{0}-\hat{\theta}\|, we can conclude that

xkx^2UF2θkθ^2UF2θ0θ^2\displaystyle\|x_{k}-\hat{x}\|^{2}\leq\|U\|_{F}^{2}\|\theta_{k}-\hat{\theta}\|^{2}\leq\|U\|_{F}^{2}\|\theta_{0}-\hat{\theta}\|^{2}

which implies that {xk}\{x_{k}\} is uniformly bounded in the sense of (16). Thus we may use Remark 2.5 to conclude that x=xx^{*}=x^{\dagger} almost surely and hence xkxx_{k}\to x^{\dagger} as kk\to\infty almost surely. Furthermore, by the dominated convergence theorem we can further obtain 𝔼[xkx2]0\mathbb{E}[\|x_{k}-x^{\dagger}\|^{2}]\to 0 as kk\to\infty. The proof is therefore complete. ∎
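As a side note, the left inverse U used in the above proof can be taken as the Moore-Penrose pseudo-inverse of V, which satisfies UV=I_{b} whenever V has full column rank; a small NumPy check with hypothetical sizes is:

```python
import numpy as np

d, b = 6, 4
V = np.random.randn(d, b)            # full column rank almost surely
U = np.linalg.pinv(V)                # U = (V^T V)^{-1} V^T, so U V = I_b
assert np.allclose(U @ V, np.eye(b))

# Recover the blocks x_{k,i} = sum_l u_{il} theta_{k,l} from theta_k,
# with theta_k stored as a (d, dim X) array:
theta_k = np.random.randn(d, 10)     # stand-in for (theta_{k,l})_{l=1}^d
x_k_blocks = U @ theta_k             # row i equals x_{k,i}
```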

Remark 2.10.

In their exploration of the cyclic block coordinate descent method detailed in [21] for solving the ill-posed problem (11), the authors established the convergence of the generated sequence {xk}\{x_{k}\} to a point x𝒳x^{*}\in{\mathcal{X}} satisfying

\sum_{l=1}^{d}v_{li}(Ax^{*}-y)_{l}=0,\quad i=1,\cdots,b.

They then concluded that the full column rank property of V implies Ax^{*}=y. However, this condition on V is not sufficient for drawing such a conclusion when d>b. We circumvented this issue for our RBCD method by leveraging Lemma 2.2. Furthermore, when characterizing the limit x^{*}, the authors in [21] inferred that x^{*} is the unique x_{0}-minimum-norm solution of (11) by asserting that x_{k+1}-x_{k}\in\mbox{Ran}(A^{*}) for all k. Regrettably, this assertion is inaccurate. Actually

x_{k+1}-x_{k}\in\mbox{Ran}(A_{1}^{*})\times\cdots\times\mbox{Ran}(A_{b}^{*})

which is considerably larger than Ran(A)\mbox{Ran}(A^{*}). We demonstrated that the limit of our method is the x0x_{0}-minimum-norm solution by using Remark 2.5.

Theorem 2.11.

Consider Algorithm 1 for solving (11) with noisy data. Assume V=(vli)V=(v_{li}) is of full column rank and define vv^{*} as in Lemma 2.8. Assume 0<γ<2/(vK2)0<\gamma<2/(v^{*}\|K\|^{2}) and let xx^{\dagger} denote the unique x0x_{0}-minimum-norm solution of (11). If the integer kδk_{\delta} is chosen such that kδk_{\delta}\to\infty and δ2kδ0\delta^{2}k_{\delta}\to 0 as δ0\delta\to 0, then 𝔼[xkδδx2]0\mathbb{E}[\|x_{k_{\delta}}^{\delta}-x^{\dagger}\|^{2}]\to 0 as δ0\delta\to 0.

Proof.

By virtue of the inequality a+b22(a2+b2)\|a+b\|^{2}\leq 2(\|a\|^{2}+\|b\|^{2}) and Lemma 2.1, we have

𝔼[xkδδx2]\displaystyle\mathbb{E}[\|x_{k_{\delta}}^{\delta}-x^{\dagger}\|^{2}] 2𝔼[xkδδxkδ2]+2𝔼[xkδx2]\displaystyle\leq 2\mathbb{E}[\|x_{k_{\delta}}^{\delta}-x_{k_{\delta}}\|^{2}]+2\mathbb{E}[\|x_{k_{\delta}}-x^{\dagger}\|^{2}]
2γbδ2kδ+2𝔼[xkδx2].\displaystyle\leq\frac{2\gamma}{b}\delta^{2}k_{\delta}+2\mathbb{E}[\|x_{k_{\delta}}-x^{\dagger}\|^{2}].

Thus, we may use the choice of kδk_{\delta} and Theorem 2.9 to conclude the proof. ∎
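For instance, the a priori choice $k_{\delta}=\lceil\delta^{-\theta}\rceil$ with any fixed $0<\theta<2$ is admissible, since then $k_{\delta}\to\infty$ and $\delta^{2}k_{\delta}\leq\delta^{2-\theta}+\delta^{2}\to 0$ as $\delta\to 0$.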

The above theorem provides a convergence result on Algorithm 1 under an a priori stopping rule when it is used to solve (11). We can also apply Algorithm 2 to solve (11). Correspondingly we have the following convergence result.

Theorem 2.12.

Consider Algorithm 2 for solving (11) with noisy data. Assume V=(vli)V=(v_{li}) is of full column rank and define vv^{*} as in Lemma 2.8. Assume

0<γ<min{2/(vK2),(22/τ)/A2}0<\gamma<\min\{2/(v^{*}\|K\|^{2}),(2-2/\tau)/\|A\|^{2}\}

and let xx^{\dagger} denote the unique x0x_{0}-minimum-norm solution of (11). If the integer kδk_{\delta} is chosen such that kδk_{\delta}\to\infty as δ0\delta\to 0, then 𝔼[xkδδx2]0\mathbb{E}[\|x_{k_{\delta}}^{\delta}-x^{\dagger}\|^{2}]\to 0 as δ0\delta\to 0.

Proof.

Let $k\geq 0$ be any integer. Since $k_{\delta}\to\infty$, we have $k_{\delta}>k$ for small $\delta>0$. According to (18) in Proposition 2.7 we have

𝔼[xkδδx2]𝔼[xkδx2]2𝔼[xkδxk2]+2𝔼[xkx2].\mathbb{E}[\|x_{k_{\delta}}^{\delta}-x^{\dagger}\|^{2}]\leq\mathbb{E}[\|x_{k}^{\delta}-x^{\dagger}\|^{2}]\leq 2\mathbb{E}[\|x_{k}^{\delta}-x_{k}\|^{2}]+2\mathbb{E}[\|x_{k}-x^{\dagger}\|^{2}].

By virtue of (13) in Lemma 2.1, we then obtain

𝔼[xkδδx2]4γbkδ2+2𝔼[xkx2].\mathbb{E}[\|x_{k_{\delta}}^{\delta}-x^{\dagger}\|^{2}]\leq\frac{4\gamma}{b}k\delta^{2}+2\mathbb{E}[\|x_{k}-x^{\dagger}\|^{2}].

Therefore

lim supδ0𝔼[xkδδx2]2𝔼[xkx2].\limsup_{\delta\to 0}\mathbb{E}[\|x_{k_{\delta}}^{\delta}-x^{\dagger}\|^{2}]\leq 2\mathbb{E}[\|x_{k}-x^{\dagger}\|^{2}].

Letting kk\to\infty and using Theorem 2.9, we can conclude lim supδ0𝔼[xkδδx2]0\limsup_{\delta\to 0}\mathbb{E}[\|x_{k_{\delta}}^{\delta}-x^{\dagger}\|^{2}]\leq 0 and hence 𝔼[xkδδx2]0\mathbb{E}[\|x_{k_{\delta}}^{\delta}-x^{\dagger}\|^{2}]\to 0 as δ0\delta\to 0. ∎

3. Extension

In Algorithm 1 we proposed a randomized block coordinate descent method for solving the linear ill-posed problem (1) and provided a convergence analysis which demonstrates that the iterates in general converge to the $x_0$-minimum-norm solution. In many applications, however, we are interested in reconstructing solutions with other features, such as non-negativity, sparsity, and piece-wise constancy. Incorporating such feature information into the algorithm design can significantly improve the reconstruction accuracy. Assume the feature of the sought solution can be detected by a convex function ${\mathcal{R}}:{\mathcal{X}}\to(-\infty,\infty]$. We may consider determining a solution $x^{\dagger}$ of (1) such that

(x)=min{(x):i=1bAixi=y}.\displaystyle{\mathcal{R}}(x^{\dagger})=\min\left\{{\mathcal{R}}(x):\sum_{i=1}^{b}A_{i}x_{i}=y\right\}. (22)

We assume the following condition on {\mathcal{R}}.

Assumption 3.1.

:𝒳(,]{\mathcal{R}}:{\mathcal{X}}\to(-\infty,\infty] is proper, lower semi-continuous and strongly convex in the sense that there is a constant κ>0\kappa>0 such that

(tx+(1t)x¯)+κt(1t)xx¯2t(x)+(1t)(x¯)\displaystyle{\mathcal{R}}(tx+(1-t)\bar{x})+\kappa t(1-t)\|x-\bar{x}\|^{2}\leq t{\mathcal{R}}(x)+(1-t){\mathcal{R}}(\bar{x})

for all x,x¯dom()x,\bar{x}\in\emph{dom}({\mathcal{R}}) and 0t10\leq t\leq 1. Moreover, {\mathcal{R}} has the separable structure

(x):=i=1bi(xi),x=(x1,,xb)𝒳,{\mathcal{R}}(x):=\sum_{i=1}^{b}{\mathcal{R}}_{i}(x_{i}),\qquad\forall x=(x_{1},\cdots,x_{b})\in{\mathcal{X}},

where each i{\mathcal{R}}_{i} is a function from XiX_{i} to (,](-\infty,\infty].

Under Assumption 3.1, it is easy to see that each ${\mathcal{R}}_{i}$ is proper, lower semi-continuous, and strongly convex from $X_{i}$ to $(-\infty,\infty]$. Furthermore, let $\partial{\mathcal{R}}$ denote the subdifferential of ${\mathcal{R}}$; then the following facts on ${\mathcal{R}}$ hold (see [12, 34]):

  1. (i)

    For any x𝒳x\in{\mathcal{X}} with (x)\partial{\mathcal{R}}(x)\neq\emptyset and ξ(x)\xi\in\partial{\mathcal{R}}(x) there holds

    Dξ(x¯,x)κx¯x2,x¯𝒳,\displaystyle D_{\mathcal{R}}^{\xi}(\bar{x},x)\geq\kappa\|\bar{x}-x\|^{2},\quad\forall\bar{x}\in{\mathcal{X}}, (23)

    where

    Dξ(x¯,x):=(x¯)(x)ξ,x¯x,x¯𝒳\displaystyle D_{\mathcal{R}}^{\xi}(\bar{x},x):={\mathcal{R}}(\bar{x})-{\mathcal{R}}(x)-\langle\xi,\bar{x}-x\rangle,\quad\bar{x}\in{\mathcal{X}}

    is the Bregman distance induced by {\mathcal{R}} at xx in the direction ξ\xi.

  2. (ii)

    For any x,x¯𝒳x,\bar{x}\in{\mathcal{X}}, ξ(x)\xi\in\partial{\mathcal{R}}(x) and ξ¯(x¯)\bar{\xi}\in\partial{\mathcal{R}}(\bar{x}) there holds

    ξξ¯,xx¯2κxx¯2.\langle\xi-\bar{\xi},x-\bar{x}\rangle\geq 2\kappa\|x-\bar{x}\|^{2}.
  3. (iii)

    For all x,x¯𝒳x,\bar{x}\in{\mathcal{X}}, ξ(x)\xi\in\partial{\mathcal{R}}(x) and ξ¯(x¯)\bar{\xi}\in\partial{\mathcal{R}}(\bar{x}) there holds

    Dξ(x¯,x)14κξ¯ξ2\displaystyle D_{\mathcal{R}}^{\xi}(\bar{x},x)\leq\frac{1}{4\kappa}\|\bar{\xi}-\xi\|^{2} (24)

Let us elucidate how to extend the method (9) to solve (1) so that the convex function {\mathcal{R}} can be incorporated to detect the solution feature. By introducing gkδ:=(gk,1δ,,gk,bδ)𝒳g_{k}^{\delta}:=(g_{k,1}^{\delta},\cdots,g_{k,b}^{\delta})\in{\mathcal{X}} with

gk,iδ={Aik(Axkδyδ), if i=ik,0, otherwiseg_{k,i}^{\delta}=\left\{\begin{array}[]{lll}A_{i_{k}}^{*}(Ax_{k}^{\delta}-y^{\delta}),&\mbox{ if }i=i_{k},\\ 0,&\mbox{ otherwise}\end{array}\right.

we can rewrite (9) as

xk+1δ\displaystyle x_{k+1}^{\delta} =xkδγgkδ=argminx𝒳{12x(xkδγgkδ)2}\displaystyle=x_{k}^{\delta}-\gamma g_{k}^{\delta}=\arg\min_{x\in{\mathcal{X}}}\left\{\frac{1}{2}\|x-(x_{k}^{\delta}-\gamma g_{k}^{\delta})\|^{2}\right\}
=argminx𝒳{12xxkδ2+γgkδ,x}.\displaystyle=\arg\min_{x\in{\mathcal{X}}}\left\{\frac{1}{2}\|x-x_{k}^{\delta}\|^{2}+\gamma\langle g_{k}^{\delta},x\rangle\right\}.

Assuming $\partial{\mathcal{R}}(x_{k}^{\delta})\neq\emptyset$ and taking $\xi_{k}^{\delta}\in\partial{\mathcal{R}}(x_{k}^{\delta})$, we may use the Bregman distance $D_{\mathcal{R}}^{\xi_{k}^{\delta}}(x,x_{k}^{\delta})$ to replace $\frac{1}{2}\|x-x_{k}^{\delta}\|^{2}$ in the above equation to obtain the new updating formula

xk+1δ\displaystyle x_{k+1}^{\delta} =argminx𝒳{Dξkδ(x,xkδ)+γgkδ,x}=argminx𝒳{(x)ξkδγgkδ,x}.\displaystyle=\arg\min_{x\in{\mathcal{X}}}\left\{D_{\mathcal{R}}^{\xi_{k}^{\delta}}(x,x_{k}^{\delta})+\gamma\langle g_{k}^{\delta},x\rangle\right\}=\arg\min_{x\in{\mathcal{X}}}\left\{{\mathcal{R}}(x)-\langle\xi_{k}^{\delta}-\gamma g_{k}^{\delta},x\rangle\right\}.

Letting $\xi_{k+1}^{\delta}:=\xi_{k}^{\delta}-\gamma g_{k}^{\delta}$, we then have

xk+1δ=argminx𝒳{(x)ξk+1δ,x}.x_{k+1}^{\delta}=\arg\min_{x\in{\mathcal{X}}}\left\{{\mathcal{R}}(x)-\langle\xi_{k+1}^{\delta},x\rangle\right\}.

Recalling the definition of $g_{k}^{\delta}$, we can see that $\xi_{k+1}^{\delta}=(\xi_{k+1,1}^{\delta},\cdots,\xi_{k+1,b}^{\delta})$ with

ξk+1,iδ={ξk,ikδγAik(Axkδyδ), if i=ik,ξk,iδ otherwise.\displaystyle\xi_{k+1,i}^{\delta}=\left\{\begin{array}[]{lll}\xi_{k,i_{k}}^{\delta}-\gamma A_{i_{k}}^{*}(Ax_{k}^{\delta}-y^{\delta}),&\mbox{ if }i=i_{k},\\ \xi_{k,i}^{\delta}&\mbox{ otherwise}.\end{array}\right. (27)

Under Assumption 3.1, it is known that xk+1δx_{k+1}^{\delta} is uniquely defined and ξk+1δ(xk+1δ)\xi_{k+1}^{\delta}\in\partial{\mathcal{R}}(x_{k+1}^{\delta}). According to the separable structure of {\mathcal{R}}, it is easy to see that xk+1δ=(xk+1,1δ,,xk+1,bδ)x_{k+1}^{\delta}=(x_{k+1,1}^{\delta},\cdots,x_{k+1,b}^{\delta}) with

xk+1,iδ={argminzXik{ik(z)ξk+1,ikδ,z}, if i=ik,xk,iδ, otherwise.\displaystyle x_{k+1,i}^{\delta}=\left\{\begin{array}[]{lll}\displaystyle{\arg\min_{z\in X_{i_{k}}}\left\{{\mathcal{R}}_{i_{k}}(z)-\langle\xi_{k+1,i_{k}}^{\delta},z\rangle\right\}},&\mbox{ if }i=i_{k},\\ x_{k,i}^{\delta},&\mbox{ otherwise}.\end{array}\right. (30)

Since ξk+1δ(xk+1δ)\xi_{k+1}^{\delta}\in\partial{\mathcal{R}}(x_{k+1}^{\delta}), we may repeat the above procedure. This leads us to propose the following algorithm.

Algorithm 3 (RBCD method with convex regularization function).

Let ξ0=0𝒳\xi_{0}=0\in{\mathcal{X}} and define x0:=argminx𝒳(x)x_{0}:=\arg\min_{x\in{\mathcal{X}}}{\mathcal{R}}(x) as an initial guess. Set ξ0δ:=ξ0\xi_{0}^{\delta}:=\xi_{0}, x0δ:=x0x_{0}^{\delta}:=x_{0} and calculate r0δ:=Ax0δyδr_{0}^{\delta}:=Ax_{0}^{\delta}-y^{\delta}. Choose a suitable step size γ>0\gamma>0. For all integers k0k\geq 0 do the following:

  1. (i)

    Pick an index ik{1,,b}i_{k}\in\{1,\cdots,b\} randomly via the uniform distribution;

  2. (ii)

    Update ξk+1δ\xi_{k+1}^{\delta} by the equation (27), i.e. setting ξk+1,iδ=ξk,iδ\xi_{k+1,i}^{\delta}=\xi_{k,i}^{\delta} for iiki\neq i_{k} and

    ξk+1,ikδ=ξk,ikδγAikrkδ;\xi_{k+1,i_{k}}^{\delta}=\xi_{k,i_{k}}^{\delta}-\gamma A_{i_{k}}^{*}r_{k}^{\delta};
  3. (iii)

    Update xk+1δx_{k+1}^{\delta} by the equation (30);

  4. (iv)

    Calculate rk+1δ=rkδ+Aik(xk+1,ikδxk,ikδ)r_{k+1}^{\delta}=r_{k}^{\delta}+A_{i_{k}}(x_{k+1,i_{k}}^{\delta}-x_{k,i_{k}}^{\delta}).
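To make the updates (27) and (30) concrete, the following is a minimal Python sketch of Algorithm 3 for the illustrative choice ${\mathcal{R}}_{i}(x_{i})=\frac{1}{2}\|x_{i}\|^{2}+\lambda\|x_{i}\|_{1}$, for which the minimization in (30) reduces to componentwise soft-thresholding. All names are ours; this is only one possible implementation and not the code used in the experiments of Section 4.

```python
import numpy as np

def soft_threshold(v, lam):
    # Componentwise minimizer of R_i(z) - <v, z> for R_i(z) = 0.5*||z||^2 + lam*||z||_1,
    # i.e. of 0.5*||z - v||^2 + lam*||z||_1.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def rbcd_regularized(A_blocks, y_delta, gamma, lam, n_iter, seed=0):
    """Sketch of Algorithm 3 with R_i(x_i) = 0.5*||x_i||^2 + lam*||x_i||_1.
    A_blocks is the list of column blocks A_1, ..., A_b of A."""
    rng = np.random.default_rng(seed)
    b = len(A_blocks)
    xi = [np.zeros(Ai.shape[1]) for Ai in A_blocks]            # xi_0 = 0
    x = [soft_threshold(v, lam) for v in xi]                   # x_0 = argmin R (= 0 here)
    r = sum(Ai @ xc for Ai, xc in zip(A_blocks, x)) - y_delta  # residual r_0 = A x_0 - y^delta
    for _ in range(n_iter):
        i = rng.integers(b)                            # (i)   pick a block uniformly at random
        xi[i] = xi[i] - gamma * (A_blocks[i].T @ r)    # (ii)  update xi by (27)
        x_new = soft_threshold(xi[i], lam)             # (iii) update x by (30)
        r = r + A_blocks[i] @ (x_new - x[i])           # (iv)  update the residual
        x[i] = x_new
    return np.concatenate(x)
```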

The analysis of Algorithm 3 is rather challenging. In the following we will prove some results which support the use of Algorithm 3 for solving ill-posed problems. We start with the following lemma.

Lemma 3.2.

Let Assumption 3.1 hold and consider Algorithm 3. Then for any solution x^\hat{x} of (1) in dom()\emph{dom}({\mathcal{R}}) there holds

𝔼[Dξk+1δ(x^,xk+1δ)|k]Dξkδ(x^,xkδ)\displaystyle\mathbb{E}\left[D_{\mathcal{R}}^{\xi_{k+1}^{\delta}}(\hat{x},x_{k+1}^{\delta})\Big{|}{\mathcal{F}}_{k}\right]-D_{\mathcal{R}}^{\xi_{k}^{\delta}}(\hat{x},x_{k}^{\delta}) c3γbAxkδyδ2+γbδAxkδyδ\displaystyle\leq-c_{3}\frac{\gamma}{b}\|Ax_{k}^{\delta}-y^{\delta}\|^{2}+\frac{\gamma}{b}\delta\|Ax_{k}^{\delta}-y^{\delta}\|

for all integers k0k\geq 0, where c3:=1γA2/(4κ)c_{3}:=1-\gamma\|A\|^{2}/(4\kappa).

Proof.

For any solution x^\hat{x} of (1) in dom()\mbox{dom}({\mathcal{R}}) let Δkδ:=Dξkδ(x^,xkδ)\Delta_{k}^{\delta}:=D_{\mathcal{R}}^{\xi_{k}^{\delta}}(\hat{x},x_{k}^{\delta}) for all integers kk. Then, by using (24) and the definition of ξk+1δ\xi_{k+1}^{\delta}, we have

Δk+1δΔkδ\displaystyle\Delta_{k+1}^{\delta}-\Delta_{k}^{\delta} =Dξk+1δ(xkδ,xk+1δ)+ξk+1δξkδ,xkδx^\displaystyle=D_{\mathcal{R}}^{\xi_{k+1}^{\delta}}(x_{k}^{\delta},x_{k+1}^{\delta})+\langle\xi_{k+1}^{\delta}-\xi_{k}^{\delta},x_{k}^{\delta}-\hat{x}\rangle
14κξk+1δξkδ2+ξk+1δξkδ,xkδx^\displaystyle\leq\frac{1}{4\kappa}\|\xi_{k+1}^{\delta}-\xi_{k}^{\delta}\|^{2}+\langle\xi_{k+1}^{\delta}-\xi_{k}^{\delta},x_{k}^{\delta}-\hat{x}\rangle
=14κξk+1,ikδξk,ikδ2+ξk+1,ikδξk,ikδ,xk,ikδx^ik\displaystyle=\frac{1}{4\kappa}\|\xi_{k+1,i_{k}}^{\delta}-\xi_{k,i_{k}}^{\delta}\|^{2}+\langle\xi_{k+1,i_{k}}^{\delta}-\xi_{k,i_{k}}^{\delta},x_{k,i_{k}}^{\delta}-\hat{x}_{i_{k}}\rangle
\displaystyle=\frac{\gamma^{2}}{4\kappa}\|A_{i_{k}}^{*}(Ax_{k}^{\delta}-y^{\delta})\|^{2}-\gamma\langle Ax_{k}^{\delta}-y^{\delta},A_{i_{k}}(x_{k,i_{k}}^{\delta}-\hat{x}_{i_{k}})\rangle.

By taking the conditional expectation on k{\mathcal{F}}_{k} and using yδyδ\|y^{\delta}-y\|\leq\delta we can obtain

𝔼[Δk+1δ|k]Δkδ\displaystyle\mathbb{E}[\Delta_{k+1}^{\delta}|{\mathcal{F}}_{k}]-\Delta_{k}^{\delta} γ24κbi=1bAi(Axkδyδ)2γbAxkδyδ,Axkδy\displaystyle\leq\frac{\gamma^{2}}{4\kappa b}\sum_{i=1}^{b}\|A_{i}^{*}(Ax_{k}^{\delta}-y^{\delta})\|^{2}-\frac{\gamma}{b}\langle Ax_{k}^{\delta}-y^{\delta},Ax_{k}^{\delta}-y\rangle
=γ24κbA(Axkδyδ)2γbAxkδyδ2γbAxkδyδ,yδy\displaystyle=\frac{\gamma^{2}}{4\kappa b}\|A^{*}(Ax_{k}^{\delta}-y^{\delta})\|^{2}-\frac{\gamma}{b}\|Ax_{k}^{\delta}-y^{\delta}\|^{2}-\frac{\gamma}{b}\langle Ax_{k}^{\delta}-y^{\delta},y^{\delta}-y\rangle
(1γA24κ)γbAxkδyδ2+γbδAxkδyδ.\displaystyle\leq-\left(1-\frac{\gamma\|A\|^{2}}{4\kappa}\right)\frac{\gamma}{b}\|Ax_{k}^{\delta}-y^{\delta}\|^{2}+\frac{\gamma}{b}\delta\|Ax_{k}^{\delta}-y^{\delta}\|.

The proof is therefore complete. ∎

Theorem 3.3.

Let 𝒳{\mathcal{X}} be finite dimensional and let :𝒳{\mathcal{R}}:{\mathcal{X}}\to{\mathbb{R}} satisfy Assumption 3.1. Consider Algorithm 3 with the exact data. If 0<γ<4κ/A20<\gamma<4\kappa/\|A\|^{2}, then {xk}\{x_{k}\} converges to a random solution x¯\bar{x} of (1) almost surely.

Proof.

Since 𝒳{\mathcal{X}} is finite dimensional and {\mathcal{R}} maps 𝒳{\mathcal{X}} to {\mathbb{R}}, the convex function {\mathcal{R}} is Lipschitz continuous on bounded sets and (x)\partial{\mathcal{R}}(x)\neq\emptyset for all x𝒳x\in{\mathcal{X}}; moreover, for any bounded set K𝒳K\subset{\mathcal{X}} there is a constant CKC_{K} such that ξCK\|\xi\|\leq C_{K} for all ξ(x)\xi\in\partial{\mathcal{R}}(x) and xKx\in K.

In the following we will use an argument similar to that in the proof of Theorem 2.4 to show the almost sure convergence of $\{x_{k}\}$. Let $S$ denote the set of solutions of (1) in $\mbox{dom}({\mathcal{R}})$. By using Lemma 3.2 with exact data and Proposition 2.3 we can conclude for any $z\in S$ that the event

Ω~z:={limkDξk(z,xk) exists and is finite}\tilde{\Omega}_{z}:=\left\{\lim_{k\to\infty}D_{\mathcal{R}}^{\xi_{k}}(z,x_{k})\mbox{ exists and is finite}\right\}

has probability one. Since 𝒳{\mathcal{X}} is separable, we can find a countable set DSD\subset S such that DD is dense in SS. Let

Ω~1:=zDΩ~z.\tilde{\Omega}_{1}:=\bigcap_{z\in D}\tilde{\Omega}_{z}.

Then (Ω~1)=1\mathbb{P}(\tilde{\Omega}_{1})=1. We now show that, for any z~S\tilde{z}\in S, along any sample path in Ω~1\tilde{\Omega}_{1} the sequence {Dξk(z~,xk)}\{D_{\mathcal{R}}^{\xi_{k}}(\tilde{z},x_{k})\} is convergent. To see this, we take a sequence {zl}D\{z_{l}\}\subset D such that zlz~z_{l}\to\tilde{z} as ll\to\infty. For any sample path ωΩ~1\omega\in\tilde{\Omega}_{1} we have

Dξk(ω)(z~,xk(ω))Dξk(ω)(zl,xk(ω))=(z~)(zl)ξk(ω),z~zl.D_{\mathcal{R}}^{\xi_{k}(\omega)}(\tilde{z},x_{k}(\omega))-D_{\mathcal{R}}^{\xi_{k}(\omega)}(z_{l},x_{k}(\omega))={\mathcal{R}}(\tilde{z})-{\mathcal{R}}(z_{l})-\langle\xi_{k}(\omega),\tilde{z}-z_{l}\rangle.

Since {Dξk(ω)(zl,xk(ω))}\{D_{\mathcal{R}}^{\xi_{k}(\omega)}(z_{l},x_{k}(\omega))\}, for a fixed ll, is convergent, it is bounded. By (23), {xk(ω)}\{x_{k}(\omega)\} is then bounded and hence we can find a constant MM such that ξk(ω)M\|\xi_{k}(\omega)\|\leq M for all kk. Therefore

(z~)(zl)Mz~zl\displaystyle{\mathcal{R}}(\tilde{z})-{\mathcal{R}}(z_{l})-M\|\tilde{z}-z_{l}\| Dξk(ω)(z~,xk(ω))Dξk(ω)(zl,xk(ω))\displaystyle\leq D_{\mathcal{R}}^{\xi_{k}(\omega)}(\tilde{z},x_{k}(\omega))-D_{\mathcal{R}}^{\xi_{k}(\omega)}(z_{l},x_{k}(\omega))
(z~)(zl)+Mz~zl.\displaystyle\leq{\mathcal{R}}(\tilde{z})-{\mathcal{R}}(z_{l})+M\|\tilde{z}-z_{l}\|.

This together with the existence of limkDξk(ω)(zl,xk(ω))\lim_{k\to\infty}D_{\mathcal{R}}^{\xi_{k}(\omega)}(z_{l},x_{k}(\omega)) implies

0lim supkDξk(ω)(z~,xk(ω))lim infkDξk(ω)(z~,xk(ω))2Mzlz~.0\leq\limsup_{k\to\infty}D_{\mathcal{R}}^{\xi_{k}(\omega)}(\tilde{z},x_{k}(\omega))-\liminf_{k\to\infty}D_{\mathcal{R}}^{\xi_{k}(\omega)}(\tilde{z},x_{k}(\omega))\leq 2M\|z_{l}-\tilde{z}\|.

Letting ll\to\infty then shows that limkDξk(ω)(z~,xk(ω))\lim_{k\to\infty}D_{\mathcal{R}}^{\xi_{k}(\omega)}(\tilde{z},x_{k}(\omega)) exists and is finite for every ωΩ~1\omega\in\tilde{\Omega}_{1} and z~S\tilde{z}\in S.

Next we show the almost sure convergence of {xk}\{x_{k}\}. By using Lemma 3.2 with exact data we can conclude the event

Ω~2:={k=0Axky2<}\displaystyle\tilde{\Omega}_{2}:=\left\{\sum_{k=0}^{\infty}\|Ax_{k}-y\|^{2}<\infty\right\}

has probability one. Let Ω~0:=Ω~1Ω~2\tilde{\Omega}_{0}:=\tilde{\Omega}_{1}\cap\tilde{\Omega}_{2}. Then (Ω~0)=1\mathbb{P}(\tilde{\Omega}_{0})=1. Note that, along any sample path in Ω~0\tilde{\Omega}_{0}, {Dξk(z,xk)}\{D_{\mathcal{R}}^{\xi_{k}}(z,x_{k})\} is convergent for any zSz\in S and thus {xk}\{x_{k}\} and {ξk}\{\xi_{k}\} are bounded. Thus, we can find a subsequence {kl}\{k_{l}\} of positive integers such that xklx¯x_{k_{l}}\to\bar{x} and ξklξ¯\xi_{k_{l}}\to\bar{\xi} as ll\to\infty. Because ξkl(xkl)\xi_{k_{l}}\in\partial{\mathcal{R}}(x_{k_{l}}), we have ξ¯(x¯)\bar{\xi}\in\partial{\mathcal{R}}(\bar{x}). Since Axky0\|Ax_{k}-y\|\to 0 as kk\to\infty, we can obtain Ax¯=yA\bar{x}=y, i.e. x¯S\bar{x}\in S. Consequently {Dξk(x¯,xk)}\{D_{\mathcal{R}}^{\xi_{k}}(\bar{x},x_{k})\} is convergent. To show xkx¯x_{k}\to\bar{x}, it suffices to show that x¯\bar{x} is the unique cluster point of {xk}\{x_{k}\}. Let xx^{*} be any cluster point of {xk}\{x_{k}\}. Then, by using the boundedness of {ξk}\{\xi_{k}\}, there is another subsequence {nl}\{n_{l}\} of positive integers such that xnlxx_{n_{l}}\to x^{*} and ξnlξ\xi_{n_{l}}\to\xi^{*} as ll\to\infty. Then ξ(x)\xi^{*}\in\partial{\mathcal{R}}(x^{*}), xSx^{*}\in S and thus {Dξk(x,xk)}\{D_{\mathcal{R}}^{\xi_{k}}(x^{*},x_{k})\} is convergent. Noting the identity

ξk,xx¯=Dξk(x¯,xk)Dξk(x,xk)+(x)(x¯),\langle\xi_{k},x^{*}-\bar{x}\rangle=D_{\mathcal{R}}^{\xi_{k}}(\bar{x},x_{k})-D_{\mathcal{R}}^{\xi_{k}}(x^{*},x_{k})+{\mathcal{R}}(x^{*})-{\mathcal{R}}(\bar{x}),

we can conclude that limkξk,xx¯\lim_{k\to\infty}\langle\xi_{k},x^{*}-\bar{x}\rangle exists. Therefore

limkξk,xx¯\displaystyle\lim_{k\to\infty}\langle\xi_{k},x^{*}-\bar{x}\rangle =limlξkl,xx¯=ξ¯,xx¯,\displaystyle=\lim_{l\to\infty}\langle\xi_{k_{l}},x^{*}-\bar{x}\rangle=\langle\bar{\xi},x^{*}-\bar{x}\rangle,
limkξk,xx¯\displaystyle\lim_{k\to\infty}\langle\xi_{k},x^{*}-\bar{x}\rangle =limlξnl,xx¯=ξ,xx¯\displaystyle=\lim_{l\to\infty}\langle\xi_{n_{l}},x^{*}-\bar{x}\rangle=\langle\xi^{*},x^{*}-\bar{x}\rangle

and thus ξξ¯,xx¯=0\langle\xi^{*}-\bar{\xi},x^{*}-\bar{x}\rangle=0. Since ξ¯(x¯)\bar{\xi}\in\partial{\mathcal{R}}(\bar{x}) and ξ(x)\xi^{*}\in\partial{\mathcal{R}}(x^{*}), we may use the strong convexity of {\mathcal{R}} to conclude x=x¯x^{*}=\bar{x}. The proof is complete. ∎

For Algorithm 3 with noisy data, we have the following convergence result under an a priori stopping rule.

Theorem 3.4.

Let 𝒳{\mathcal{X}} be finite dimensional and let :𝒳{\mathcal{R}}:{\mathcal{X}}\to{\mathbb{R}} satisfy Assumption 3.1. Consider Algorithm 3 with 0<γ<4κ/A20<\gamma<4\kappa/\|A\|^{2}. Let {xk}\{x_{k}\} be the random sequence determined by Algorithm 3 with the exact data. If {xk}\{x_{k}\} is uniformly bounded in the sense of (16) and if the integer kδk_{\delta} is chosen such that kδk_{\delta}\to\infty and δ2kδ0\delta^{2}k_{\delta}\to 0 as δ0\delta\to 0, then

𝔼[xkδδx2]0\mathbb{E}\left[\|x_{k_{\delta}}^{\delta}-x^{\dagger}\|^{2}\right]\to 0

as δ0\delta\to 0, where xx^{\dagger} denotes the unique solution of (1) satisfying (22).

Proof.

According to Theorem 3.3, there is a random solution x¯\bar{x} of (1) such that xkx¯x_{k}\to\bar{x} as kk\to\infty almost surely. Since {xk}\{x_{k}\} is assumed to satisfy (16), we can use the dominated convergence theorem to conclude 𝔼[xkx¯2]0\mathbb{E}[\|x_{k}-\bar{x}\|^{2}]\to 0 as kk\to\infty. Based on this, we will show

limk𝔼[xkx2]=0.\displaystyle\lim_{k\to\infty}\mathbb{E}[\|x_{k}-x^{\dagger}\|^{2}]=0. (31)

Indeed, by using an argument similar to that in Remark 2.5 and noting $\xi_{0}=0$ we can conclude that

𝔼[ξk,x¯x]=0,k.\displaystyle\mathbb{E}[\langle\xi_{k},\bar{x}-x^{\dagger}\rangle]=0,\quad\forall k. (32)

Since the solution set of (1) is an affine subspace, $x^{\dagger}+t(\bar{x}-x^{\dagger})$ solves (1) for every $t\in{\mathbb{R}}$; hence, according to the definition of $x^{\dagger}$, the convex function $\varphi(t):={\mathcal{R}}(x^{\dagger}+t(\bar{x}-x^{\dagger}))$ on ${\mathbb{R}}$ achieves its minimum at $t=0$ and thus $0\in\partial\varphi(0)$. Note that

φ(0)={ξ,x¯x:ξ(x)}.\partial\varphi(0)=\{\langle\xi,\bar{x}-x^{\dagger}\rangle:\xi\in\partial{\mathcal{R}}(x^{\dagger})\}.

Consequently, there exists ξ(x)\xi^{\dagger}\in\partial{\mathcal{R}}(x^{\dagger}) such that ξ,x¯x=0\langle\xi^{\dagger},\bar{x}-x^{\dagger}\rangle=0. This ξ\xi^{\dagger} depends on x¯\bar{x} and hence it is a random element in (x)\partial{\mathcal{R}}(x^{\dagger}). Nevertheless, 𝔼[ξ,x¯x]=0\mathbb{E}[\langle\xi^{\dagger},\bar{x}-x^{\dagger}\rangle]=0. Combining this with (32) gives

0=𝔼[ξkξ,x¯x]=𝔼[ξkξ,xkx]+𝔼[ξkξ,x¯xk]0=\mathbb{E}[\langle\xi_{k}-\xi^{\dagger},\bar{x}-x^{\dagger}\rangle]=\mathbb{E}[\langle\xi_{k}-\xi^{\dagger},x_{k}-x^{\dagger}\rangle]+\mathbb{E}[\langle\xi_{k}-\xi^{\dagger},\bar{x}-x_{k}\rangle]

for all integers k0k\geq 0. Since {xk}\{x_{k}\} is uniformly bounded in the sense of (16) and {\mathcal{R}} is Lipschitz continuous on bounded sets, we can find a constant MM such that ξM\|\xi^{\dagger}\|\leq M and ξkM\|\xi_{k}\|\leq M for all kk almost surely. Thus

|𝔼[ξkξ,x¯xk]|2M𝔼[x¯xk]2M(𝔼[xkx¯2])1/20\displaystyle\left|\mathbb{E}[\langle\xi_{k}-\xi^{\dagger},\bar{x}-x_{k}\rangle]\right|\leq 2M\mathbb{E}[\|\bar{x}-x_{k}\|]\leq 2M(\mathbb{E}[\|x_{k}-\bar{x}\|^{2}])^{1/2}\to 0

as kk\to\infty. Consequently

\lim_{k\to\infty}\mathbb{E}\big[|\langle\xi_{k}-\xi^{\dagger},x_{k}-x^{\dagger}\rangle|\big]=0.

By the strong convexity of {\mathcal{R}}, we have ξkξ,xkx2κxkx2\langle\xi_{k}-\xi^{\dagger},x_{k}-x^{\dagger}\rangle\geq 2\kappa\|x_{k}-x^{\dagger}\|^{2}. Therefore limk𝔼[xkx2]=0\lim_{k\to\infty}\mathbb{E}[\|x_{k}-x^{\dagger}\|^{2}]=0 which shows (31).

Next, by using the definition of ξkδ,xkδ\xi_{k}^{\delta},x_{k}^{\delta} and ξk,xk\xi_{k},x_{k}, it is easy to show by an induction argument that, along any sample path there holds

ξkδξkandxkδxkas δ0\displaystyle\xi_{k}^{\delta}\to\xi_{k}\quad\mbox{and}\quad x_{k}^{\delta}\to x_{k}\quad\mbox{as }\delta\to 0 (33)

for every integer k0k\geq 0.

Finally, we show $\mathbb{E}[\|x_{k_{\delta}}^{\delta}-x^{\dagger}\|^{2}]\to 0$ as $\delta\to 0$ if $k_{\delta}$ is chosen such that $k_{\delta}\to\infty$ and $\delta^{2}k_{\delta}\to 0$ as $\delta\to 0$. To see this, let $\Delta_{k}^{\delta}:=D_{\mathcal{R}}^{\xi_{k}^{\delta}}(x^{\dagger},x_{k}^{\delta})$ for all $k\geq 0$. From Lemma 3.2 and the elementary inequality $\delta t\leq c_{3}t^{2}+\delta^{2}/(4c_{3})$ it follows that

𝔼[Δk+1δ]𝔼[Δkδ]+γ4bc3δ2,k0.\displaystyle\mathbb{E}[\Delta_{k+1}^{\delta}]\leq\mathbb{E}[\Delta_{k}^{\delta}]+\frac{\gamma}{4bc_{3}}\delta^{2},\quad\forall k\geq 0.

Let k0k\geq 0 be any fixed integer. Since kδk_{\delta}\to\infty, we have kδ>kk_{\delta}>k for small δ>0\delta>0. Consequently, by virtue of the above inequality, we can obtain

𝔼[Δkδδ]𝔼[Δkδ]+γ4bc3(kδk)δ2𝔼[Δkδ]+γ4bc3kδδ2\displaystyle\mathbb{E}[\Delta_{k_{\delta}}^{\delta}]\leq\mathbb{E}[\Delta_{k}^{\delta}]+\frac{\gamma}{4bc_{3}}(k_{\delta}-k)\delta^{2}\leq\mathbb{E}[\Delta_{k}^{\delta}]+\frac{\gamma}{4bc_{3}}k_{\delta}\delta^{2}

Since δ2kδ0\delta^{2}k_{\delta}\to 0 as δ0\delta\to 0, we may use (33) and the continuity of {\mathcal{R}} to further obtain

lim supδ0𝔼[Δkδδ]lim supδ0𝔼[Δkδ]=𝔼[Dξk(x,xk)].\displaystyle\limsup_{\delta\to 0}\mathbb{E}[\Delta_{k_{\delta}}^{\delta}]\leq\limsup_{\delta\to 0}\mathbb{E}[\Delta_{k}^{\delta}]=\mathbb{E}\left[D_{\mathcal{R}}^{\xi_{k}}(x^{\dagger},x_{k})\right].

Since {xk}\{x_{k}\} satisfies (16) and {\mathcal{R}} is Lipschitz continuous on bounded sets, there is a constant LL such that

Dξk(x,xk)\displaystyle D_{\mathcal{R}}^{\xi_{k}}(x^{\dagger},x_{k}) =(x)(xk)ξk,xxk\displaystyle={\mathcal{R}}(x^{\dagger})-{\mathcal{R}}(x_{k})-\langle\xi_{k},x^{\dagger}-x_{k}\rangle
|(x)(xk)|+ξkxxk\displaystyle\leq|{\mathcal{R}}(x^{\dagger})-{\mathcal{R}}(x_{k})|+\|\xi_{k}\|\|x^{\dagger}-x_{k}\|
(L+M)xxk\displaystyle\leq(L+M)\|x^{\dagger}-x_{k}\|

Therefore

lim supδ0𝔼[Δkδδ](L+M)𝔼[xkx](L+M)(𝔼[xkx2])1/2\displaystyle\limsup_{\delta\to 0}\mathbb{E}[\Delta_{k_{\delta}}^{\delta}]\leq(L+M)\mathbb{E}[\|x_{k}-x^{\dagger}\|]\leq(L+M)(\mathbb{E}[\|x_{k}-x^{\dagger}\|^{2}])^{1/2}

for all k0k\geq 0. By taking kk\to\infty and using (31) we thus have lim supδ0𝔼[Δkδδ]0\limsup_{\delta\to 0}\mathbb{E}[\Delta_{k_{\delta}}^{\delta}]\leq 0 which implies 𝔼[Δkδδ]0\mathbb{E}[\Delta_{k_{\delta}}^{\delta}]\to 0 as δ0\delta\to 0. Consequently, it follows from the strong convexity of {\mathcal{R}} that 𝔼[xkδδx2]0\mathbb{E}[\|x_{k_{\delta}}^{\delta}-x^{\dagger}\|^{2}]\to 0 as δ0\delta\to 0. ∎

In the above results we considered Algorithm 3 by terminating the iterations via a priori stopping rules. Analogous to Algorithm 2, we may consider incorporating the discrepancy principle into Algorithm 3. This leads to the following algorithm.

Algorithm 4.

Let ξ0=0𝒳\xi_{0}=0\in{\mathcal{X}} and define x0:=argminx𝒳(x)x_{0}:=\arg\min_{x\in{\mathcal{X}}}{\mathcal{R}}(x) as an initial guess. Set ξ0δ:=ξ0\xi_{0}^{\delta}:=\xi_{0}, x0δ:=x0x_{0}^{\delta}:=x_{0} and calculate r0δ:=Ax0δyδr_{0}^{\delta}:=Ax_{0}^{\delta}-y^{\delta}. Choose τ>1\tau>1 and γ>0\gamma>0. For all integers k0k\geq 0 do the following:

  1. (i)

    Set the step-size γk\gamma_{k} by

    γk:={γ if rkδ>τδ,0 if rkδτδ;\displaystyle\gamma_{k}:=\left\{\begin{array}[]{lll}\gamma&\mbox{ if }\|r_{k}^{\delta}\|>\tau\delta,\\ 0&\mbox{ if }\|r_{k}^{\delta}\|\leq\tau\delta;\end{array}\right.
  2. (ii)

    Pick an index ik{1,,b}i_{k}\in\{1,\cdots,b\} randomly via the uniform distribution;

  3. (iii)

    Update ξk+1δ\xi_{k+1}^{\delta} by the equation (27), i.e. setting ξk+1,iδ=ξk,iδ\xi_{k+1,i}^{\delta}=\xi_{k,i}^{\delta} for iiki\neq i_{k} and

    ξk+1,ikδ=ξk,ikδγkAikrkδ;\xi_{k+1,i_{k}}^{\delta}=\xi_{k,i_{k}}^{\delta}-\gamma_{k}A_{i_{k}}^{*}r_{k}^{\delta};
  4. (iv)

    Update xk+1δx_{k+1}^{\delta} by the equation (30);

  5. (v)

    Calculate rk+1δ=rkδ+Aik(xk+1,ikδxk,ikδ)r_{k+1}^{\delta}=r_{k}^{\delta}+A_{i_{k}}(x_{k+1,i_{k}}^{\delta}-x_{k,i_{k}}^{\delta}).
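The only change compared with the Python sketch given after Algorithm 3 is the step-size gating: $\gamma_{k}$ is set to zero (and, in practice, the iteration is stopped) once the residual norm drops to $\tau\delta$ or below. A minimal sketch under the same illustrative choice of ${\mathcal{R}}_{i}$, again with our own naming:

```python
import numpy as np

def rbcd_regularized_dp(A_blocks, y_delta, delta, gamma, lam, tau, max_iter, seed=0):
    """Sketch of Algorithm 4 with R_i(x_i) = 0.5*||x_i||^2 + lam*||x_i||_1.
    The iteration is frozen (gamma_k = 0) once ||r_k|| <= tau*delta; here we
    simply stop and return the stopping index at that point."""
    rng = np.random.default_rng(seed)
    soft = lambda v: np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
    b = len(A_blocks)
    xi = [np.zeros(Ai.shape[1]) for Ai in A_blocks]
    x = [soft(v) for v in xi]
    r = sum(Ai @ xc for Ai, xc in zip(A_blocks, x)) - y_delta
    for k in range(max_iter):
        if np.linalg.norm(r) <= tau * delta:          # (i)   discrepancy principle: gamma_k = 0
            return np.concatenate(x), k
        i = rng.integers(b)                           # (ii)  random block
        xi[i] = xi[i] - gamma * (A_blocks[i].T @ r)   # (iii) dual update (27)
        x_new = soft(xi[i])                           # (iv)  primal update (30)
        r = r + A_blocks[i] @ (x_new - x[i])          # (v)   residual update
        x[i] = x_new
    return np.concatenate(x), max_iter
```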

For Algorithm 4 we have the following result which in particular shows that the iteration must terminate in finitely many steps almost surely, i.e. there is an event of probability one such that, along any sample path in this event, $\gamma_{k}=0$ for all sufficiently large $k$.

Theorem 3.5.

Let Assumption 3.1 hold and consider Algorithm 4 with $\gamma=\mu/\|A\|^{2}$ for some $0<\mu<4\kappa(1-1/\tau)$. Then the iteration must terminate in finitely many steps almost surely. If, in addition, all the conditions in Theorem 3.4 hold, then for any integer $k_{\delta}$ with $k_{\delta}\to\infty$ as $\delta\to 0$ there holds

limδ0𝔼[xkδδx2]=0,\displaystyle\lim_{\delta\to 0}\mathbb{E}[\|x_{k_{\delta}}^{\delta}-x^{\dagger}\|^{2}]=0, (34)

where xx^{\dagger} denotes the unique solution of (1) satisfying (22).

Proof.

For any integer $k\geq 0$ let $\Delta_{k}^{\delta}:=D_{\mathcal{R}}^{\xi_{k}^{\delta}}(x^{\dagger},x_{k}^{\delta})$. By noting that $\gamma_{k}$ is ${\mathcal{F}}_{k}$-measurable, we may use an argument similar to that in the proof of Lemma 3.2 to obtain

𝔼[Δk+1δ|k]Δkδ\displaystyle\mathbb{E}[\Delta_{k+1}^{\delta}|{\mathcal{F}}_{k}]-\Delta_{k}^{\delta} 1b(1γkA24κ)γkAxkδyδ2+γkbδAxkδyδ.\displaystyle\leq-\frac{1}{b}\left(1-\frac{\gamma_{k}\|A\|^{2}}{4\kappa}\right)\gamma_{k}\|Ax_{k}^{\delta}-y^{\delta}\|^{2}+\frac{\gamma_{k}}{b}\delta\|Ax_{k}^{\delta}-y^{\delta}\|.

According to the definition of γk\gamma_{k}, we have γkγ\gamma_{k}\leq\gamma and γkδγkτAxkδyδ\gamma_{k}\delta\leq\frac{\gamma_{k}}{\tau}\|Ax_{k}^{\delta}-y^{\delta}\|. Therefore

𝔼[Δk+1δ|k]Δkδ\displaystyle\mathbb{E}[\Delta_{k+1}^{\delta}|{\mathcal{F}}_{k}]-\Delta_{k}^{\delta} 1b(11τγkA24κ)γkAxkδyδ2\displaystyle\leq-\frac{1}{b}\left(1-\frac{1}{\tau}-\frac{\gamma_{k}\|A\|^{2}}{4\kappa}\right)\gamma_{k}\|Ax_{k}^{\delta}-y^{\delta}\|^{2}
1b(11τμ4κ)γkAxkδyδ2\displaystyle\leq-\frac{1}{b}\left(1-\frac{1}{\tau}-\frac{\mu}{4\kappa}\right)\gamma_{k}\|Ax_{k}^{\delta}-y^{\delta}\|^{2}

Taking the full expectation then gives

𝔼[Δk+1δ]𝔼[Δkδ]c4𝔼[γkAxkδyδ2]\displaystyle\mathbb{E}[\Delta_{k+1}^{\delta}]\leq\mathbb{E}[\Delta_{k}^{\delta}]-c_{4}\mathbb{E}\left[\gamma_{k}\|Ax_{k}^{\delta}-y^{\delta}\|^{2}\right] (35)

for all k0k\geq 0, where c4:=(11/τμ/(4κ))/b>0c_{4}:=(1-1/\tau-\mu/(4\kappa))/b>0.

Based on (35) we now show that the method must terminate after finitely many steps almost surely. To see this, consider the event

:={Axkδyδ>τδ for all integers k0}{\mathcal{E}}:=\left\{\|Ax_{k}^{\delta}-y^{\delta}\|>\tau\delta\mbox{ for all integers }k\geq 0\right\}

It suffices to show ()=0\mathbb{P}({\mathcal{E}})=0. By virtue of (35) we have

c4𝔼[γkAxkδyδ2]𝔼[Δkδ]𝔼[Δk+1δ]\displaystyle c_{4}\mathbb{E}\left[\gamma_{k}\|Ax_{k}^{\delta}-y^{\delta}\|^{2}\right]\leq\mathbb{E}[\Delta_{k}^{\delta}]-\mathbb{E}[\Delta_{k+1}^{\delta}]

and hence for any integer n0n\geq 0 that

\displaystyle c_{4}\sum_{k=0}^{n}\mathbb{E}\left[\gamma_{k}\|Ax_{k}^{\delta}-y^{\delta}\|^{2}\right]\leq\mathbb{E}[\Delta_{0}^{\delta}]={\mathcal{R}}(x^{\dagger})-{\mathcal{R}}(x_{0})<\infty. (36)

Let χ\chi_{\mathcal{E}} denote the characteristic function of {\mathcal{E}}, i.e. χ(ω)=1\chi_{\mathcal{E}}(\omega)=1 if ω\omega\in{\mathcal{E}} and 0 otherwise. Then

𝔼[γkAxkδyδ2]𝔼[γkAxkδyδ2χ]γτ2δ2𝔼[χ]=γτ2δ2().\displaystyle\mathbb{E}\left[\gamma_{k}\|Ax_{k}^{\delta}-y^{\delta}\|^{2}\right]\geq\mathbb{E}\left[\gamma_{k}\|Ax_{k}^{\delta}-y^{\delta}\|^{2}\chi_{\mathcal{E}}\right]\geq\gamma\tau^{2}\delta^{2}\mathbb{E}[\chi_{\mathcal{E}}]=\gamma\tau^{2}\delta^{2}\mathbb{P}({\mathcal{E}}).

Combining this with (36) gives

c_{4}\gamma\tau^{2}\delta^{2}(n+1)\mathbb{P}({\mathcal{E}})\leq{\mathcal{R}}(x^{\dagger})-{\mathcal{R}}(x_{0})

for all $n\geq 0$ and hence $\mathbb{P}({\mathcal{E}})\leq({\mathcal{R}}(x^{\dagger})-{\mathcal{R}}(x_{0}))/(c_{4}\gamma\tau^{2}\delta^{2}(n+1))\to 0$ as $n\to\infty$. Thus $\mathbb{P}({\mathcal{E}})=0$.

Next we show (34) under the conditions given in Theorem 3.4. Let k0k\geq 0 be any integer. Since kδk_{\delta}\to\infty, we have kδ>kk_{\delta}>k for small δ>0\delta>0. By virtue of (35) and analogous to the proof of Theorem 3.4 we can obtain

lim supδ0𝔼[Δkδδ]𝔼[Dξk(x,xk)]C(𝔼[xkx2])1/2,\displaystyle\limsup_{\delta\to 0}\mathbb{E}[\Delta_{k_{\delta}}^{\delta}]\leq\mathbb{E}\left[D_{\mathcal{R}}^{\xi_{k}}(x^{\dagger},x_{k})\right]\leq C\left(\mathbb{E}[\|x_{k}-x^{\dagger}\|^{2}]\right)^{1/2},

where CC is a generic constant independent of kk. Letting kk\to\infty and using (31) we thus obtain 𝔼[Δkδδ]0\mathbb{E}[\Delta_{k_{\delta}}^{\delta}]\to 0 as δ0\delta\to 0 which together with the strong convexity of {\mathcal{R}} implies (34). The proof is therefore complete. ∎

4. Numerical results

In this section, we present numerical simulations to verify the theoretical results and test the performance of the RBCD method for solving some specific linear ill-posed problems that arise from two imaging modalities, including X-ray computed tomography (CT) and coded aperture compressive temporal imaging. In all simulations, the noisy data yδy^{\delta} is generated from the exact data yy by

yδ=y+δrelyξ,y^{\delta}=y+\delta_{\text{rel}}\|y\|\xi,

where $\delta_{\text{rel}}$ is the relative noise level and $\xi$ is a random noise vector drawn from the standard Gaussian distribution and then normalized so that $\|\xi\|=1$; the noise level is thus $\delta=\delta_{\text{rel}}\|y\|$. The experiments are carried out in MATLAB 2021a on a laptop computer (1.80 GHz Intel Core i7 processor with 16 GB of random access memory).
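For reproducibility, the data generation above can be sketched in a few lines of Python (a schematic of the formula above; the experiments themselves were carried out in MATLAB):

```python
import numpy as np

def add_noise(y, delta_rel, seed=0):
    """Return (y_delta, delta) with y_delta = y + delta_rel*||y||*xi and ||xi|| = 1."""
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal(y.shape)
    xi /= np.linalg.norm(xi)               # normalize the Gaussian noise so that ||xi|| = 1
    delta = delta_rel * np.linalg.norm(y)  # noise level delta = delta_rel * ||y||
    return y + delta * xi, delta
```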

4.1. Computed tomography

As the first test example, we consider standard 2D parallel beam X-ray CT with simulated tomographic data to illustrate the performance of the RBCD method in this classical imaging task. This modality reconstructs a slice through the patient's body by collecting the attenuation of the X-ray dose as it passes through biological tissues, which can be mathematically expressed as finding a compactly supported function from its line integrals, the so-called Radon transform.

In this example, we discretize the sought image into a $256\times 256$ pixel grid and use the function paralleltomo from the package AIR TOOLS [9] in MATLAB to generate the discrete model. The true images used in our experiments are the Shepp-Logan phantom and a real chest CT image (see Figure 1); the latter is provided by the dataset chestVolume in MATLAB. Considering the reconstruction from tomographic data with $p$ projections and 367 X-ray lines per projection leads to a linear system $Ax=y$, where $A$ is a coefficient matrix of size $(367\times p)\times 65536$. Let $x^{\dagger}$ be the true image vector obtained by stacking all the columns of the original image matrix and let $y^{\dagger}=Ax^{\dagger}$ be the exact data. We divide the true solution $x^{\dagger}$ equally into $b$ blocks, and the coefficient matrix $A$ is then divided correspondingly into $b$ column blocks, as in the sketch below.
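As an illustration, the column-block splitting of $A$ and the plain RBCD iteration (9) of Algorithm 1 can be sketched in Python as follows; here `A` stands for the system matrix generated, e.g., by paralleltomo, and all names are ours, so this is only a schematic of one possible implementation rather than the code used in our MATLAB experiments.

```python
import numpy as np

def split_columns(A, b):
    """Split A into b column blocks of equal size (this sketch assumes b divides the
    number of columns, which holds for 65536 and b = 1, 2, 4, 8, 16)."""
    size = A.shape[1] // b
    return [A[:, i * size:(i + 1) * size] for i in range(b)]

def rbcd_plain(A_blocks, y_delta, gamma, n_iter, seed=0):
    """Sketch of Algorithm 1: at each step only the block i_k is updated by
    x_{k+1, i_k} = x_{k, i_k} - gamma * A_{i_k}^T (A x_k - y^delta)."""
    rng = np.random.default_rng(seed)
    b = len(A_blocks)
    x = [np.zeros(Ai.shape[1]) for Ai in A_blocks]   # initial guess x_0 = 0
    r = -np.asarray(y_delta, dtype=float)            # residual A x_0 - y^delta
    for _ in range(n_iter):
        i = rng.integers(b)
        step = -gamma * (A_blocks[i].T @ r)          # block gradient step
        x[i] = x[i] + step
        r = r + A_blocks[i] @ step                   # keep the residual up to date
    return np.concatenate(x)
```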

Figure 1. The Shepp-Logan phantom (left) and a real chest CT image (right).

Assuming the image to be reconstructed is the Shepp-Logan phantom, we compare the RBCD method with the cyclic block coordinate descent (CBCD) method studied in [21] for solving the full angle CT problem from the exact tomographic data with 90 projection angles evenly distributed between 1 and 180 degrees. In theory, the inversion problem with complete tomographic data is mildly ill-posed. To make a fair comparison, we run both the RBCD method (Algorithm 1) and the CBCD method for 4000 iteration steps with the common parameters $b=8$, $\gamma=\mu/\|A\|^{2}$ with $\mu=0.5$, and the initial guess $x_{0}=0$. The reconstruction results are reported in the left and middle plots of Figure 2, which shows that the two methods produce similar results from the noiseless tomographic data. To clarify the difference between the two methods, the right plot of Figure 2 shows the relative square errors $\|x_{n}-x^{\dagger}\|_{L^{2}}^{2}/\|x^{\dagger}\|_{L^{2}}^{2}$ of the CBCD method over 500 iterations and the relative mean square errors $\mathbb{E}[\|x_{n}-x^{\dagger}\|_{L^{2}}^{2}/\|x^{\dagger}\|_{L^{2}}^{2}]$ of the RBCD method over 500 iterations, approximated by the average of 100 independent runs. To the best of our knowledge, for the general linear ill-posed problem (1), neither convergence with noiseless data nor the regularization property with noisy data has been established for the CBCD method so far.

Figure 2. The reconstruction results by the RBCD method (left) and CBCD method (middle), and the numerical reconstruction errors of two methods (right) from the exact full angle tomographic data of the Shepp-Logan phantom, respectively.
block number 1 2 4 8 16
iteration number 202 205 424 870 1819
Time (s) 9.4262 4.9443 5.1250 5.5956 6.5644
Table 1. The reconstruction results by the Algorithm 1 with different block numbers from the exact full angle tomographic data of the Shepp-Logan phantom.

To further compare the RBCD method with the Landweber iteration, we apply Algorithm 1 with various block numbers $b=1,2,4,8,16$ to the same problem as above using the same exact tomographic data. Note that the Landweber iteration can be seen as the special case of Algorithm 1 with $b=1$. To give a fair comparison, for all experiments with different block numbers we use the same step size $\gamma=\mu/\|A\|^{2}$ with $\mu=1.99$ and terminate the algorithm when the relative error satisfies $\|x_{k}^{\delta}-x^{\dagger}\|_{L^{2}}^{2}/\|x^{\dagger}\|_{L^{2}}^{2}<0.05$ for the first time. We perform 100 independent runs and calculate the average of the required iteration numbers and computational times. The results of these experiments are recorded in Table 1, which shows that, to reach the same relative error, more iterations are required when a larger block number is used; however, the computational times of the RBCD method with $b=2,4,8,16$ are all less than that of the Landweber iteration (i.e., $b=1$).

Figure 3. The reconstruction results of Algorithm 2 from the sparse view tomographic data with 60 projection angles and three relative noise levels δrel=0.01\delta_{\text{rel}}=0.01 (left), 0.020.02 (middle), and 0.030.03 (right) of the Shepp-Logan phantom (top) and real chest CT image (bottom).
Figure 4. The reconstruction results of Algorithm 2 from the limited angle tomographic data with 160 projection angles within [10,170)[10^{\circ},170^{\circ}) with 11^{\circ} steps and three relative noise levels δrel=0.01\delta_{\text{rel}}=0.01 (left), 0.020.02 (middle), and 0.030.03 (right) of the Shepp-Logan phantom (top) and real chest CT image (bottom).

It is well known that if the tomographic data is incomplete (here we only consider the case where the sampling of the angular variable of the Radon transform is incomplete), the computed tomography problem is severely ill-posed [16]. To test the performance on this ill-posed imaging problem, we use the RBCD method to solve the CT problem from incomplete view (including limited angle and sparse view) tomographic data with three distinct relative noise levels $\delta_{\text{rel}}=0.01,0.02$ and $0.03$. The original images here are the Shepp-Logan phantom and the real chest CT image in Figure 1. We consider simulated sparse view tomographic data with 60 projection angles evenly distributed between 1 and 180 degrees and simulated limited angle tomographic data with 160 projection angles within the angular range of $[10^{\circ},170^{\circ})$ with $1^{\circ}$ steps, respectively. In these experiments, we use Algorithm 2 with the block number $b=4$, $\mu=0.18$ and $\tau=1.1$, and plot the reconstruction results of the sparse view tomography and limited angle tomography in Figure 3 and Figure 4, respectively. Figures 3 and 4 show that Algorithm 2 can always terminate within a reasonable number of iterations and thus reconstruct a valid image from the incomplete view tomographic data with relatively small noise (i.e., $\delta_{\text{rel}}$ no more than 0.02 for the Shepp-Logan phantom and 0.01 for the real chest CT image). For the reconstruction results of limited angle tomography, we observe that the parts of the image boundary that are tangent to the missing angular directions are hard to recover and that streak artifacts appear in the reconstructions. The theoretical reasons for these phenomena can be found in [3, 7, 20].

4.2. Compressive temporal imaging

In this example, the RBCD method is applied to a video reconstruction application, the so-called coded aperture compressive temporal imaging [15]. In this imaging modality, one uses a 2D detector to capture the 3D video data in only one snapshot measurement through a time-varying mask stack and then reconstructs the video from the coded compressed data using an appropriate algorithm.

As described in (2), (3), and (4) in the introduction, the coded aperture compressive temporal imaging in the continuous case can be formulated as the general problem (1). To numerically implement this application, we need to discretize the continuous model appropriately.

Let $T_{i}$ be the $i$-th temporal interval defined in (3) such that $|T_{i}|=1$ for $i=1,\ldots,b$. If $b$ is large enough so that the length of each interval $T_{i}$ is short relative to the total interval length $|T|$, we may assume that the functions $x_{i}(s,t)$ and $m_{i}(s,t)$ only depend on the spatial variable $s\in\Omega$, i.e. they are time-invariant within each interval. Thus, for any $s=(s_{1},s_{2})\in\Omega$,

mi(s,t)miTi(s1,s2),xi(s,t)xiTi(s1,s2).m_{i}(s,t)\equiv m_{i}^{T_{i}}(s_{1},s_{2}),\quad x_{i}(s,t)\equiv x_{i}^{T_{i}}(s_{1},s_{2}).

Denote the video to be reconstructed by $\{x_{i}^{T_{i}}\}_{i=1}^{b}$, where $x_{i}^{T_{i}}$ is the frame within the $i$-th time interval, denote the snapshot measurement by $y$, and suppose the mask stack $\{m_{i}^{T_{i}}\}_{i=1}^{b}$ is known. Then the mathematical problem in the semi-discrete setting is to solve the following equation given the measurement $y$ and the known $m_{i}^{T_{i}}$:

i=1bmiTi(s1,s2)xiTi(s1,s2)=y(s1,s2).\sum_{i=1}^{b}m_{i}^{T_{i}}(s_{1},s_{2})x_{i}^{T_{i}}(s_{1},s_{2})=y(s_{1},s_{2}). (37)

Next, we discretize the spatial domain $\Omega$ into a $256\times 256$ pixel grid. The video, mask stack, and snapshot measurement are represented by the matrices $\{X_{i}(m,n)\}_{i=1}^{b}$, $\{M_{i}(m,n)\}_{i=1}^{b}$, and $Y(m,n)$, $m,n=1,\ldots,256$, respectively. Let $\texttt{x}_{i}\in\mathbb{R}^{256^{2}\times 1}$ and $\texttt{y}\in\mathbb{R}^{256^{2}\times 1}$ be the vectors obtained by stacking all columns of the matrices $X_{i}(m,n)$ and $Y(m,n)$. Define $\texttt{M}_{i}:=\texttt{Diag}(M_{i})\in\mathbb{R}^{256^{2}\times 256^{2}}$ by arranging all the elements of $M_{i}$ correspondingly on the diagonal of the matrix $\texttt{M}_{i}$. Therefore, equation (37) is transformed into the fully discrete linear equation

i=1bMixi=y.\sum_{i=1}^{b}\texttt{M}_{i}\texttt{x}_{i}=\texttt{y}. (38)

In practice, the number of frames bb in the video can naturally be chosen as the block number for numerical experiments.
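Since $\texttt{M}_{i}=\texttt{Diag}(M_{i})$ is diagonal, applying $\texttt{M}_{i}$ or its adjoint never requires forming a $256^{2}\times 256^{2}$ matrix explicitly: it amounts to an elementwise product with the mask. A minimal Python sketch of the forward model (38), with our own naming, reads:

```python
import numpy as np

def vec(X):
    # Stack all columns of a 2D array into one vector, matching the convention above.
    return X.flatten(order="F")

def snapshot(frames, masks):
    """Forward model (38): y = sum_i Diag(M_i) x_i, realized by elementwise products.
    frames and masks are lists of b matrices X_i and M_i of the same shape."""
    return sum(vec(M) * vec(X) for M, X in zip(masks, frames))

def block_adjoint(M_i, r):
    # Adjoint block action M_i^T r = Diag(M_i) r, again an elementwise product.
    return vec(M_i) * r
```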

Figure 5. The original video of the dataset Runner.
Figure 6. The original video of the dataset Kobe.
Figure 7. The exact snapshot measurements of Runner and Kobe.

We assume that the true videos are chosen from the datasets Runner [23] and Kobe [31] with $b=8$ as shown in Figures 5 and 6. The masks in the stack $\{M_{i}(m,n)\}_{i=1}^{b}$ are chosen as shifting random binary matrices, in which the first mask is simulated by a random binary matrix with elements drawn from a Bernoulli distribution with parameter $0.5$ and each subsequent mask shifts the previous one by one pixel horizontally [31]; see the sketch below. Let $\texttt{x}^{\dagger}=(\texttt{x}_{1}^{\dagger},\ldots,\texttt{x}_{b}^{\dagger})$ be the true solution and $\texttt{M}=(\texttt{M}_{1},\ldots,\texttt{M}_{b})$ be the forward matrix with $\|\texttt{M}\|^{2}=b$. The exact snapshot measurements of the datasets Runner and Kobe are given by $\texttt{y}^{\dagger}:=\sum_{i=1}^{b}\texttt{M}_{i}\texttt{x}_{i}^{\dagger}$ as shown in Figure 7. In this application, we need to recover $\texttt{x}^{\dagger}\in\mathbb{R}^{(b\times 256^{2})\times 1}$ from the noisy measurement $\texttt{y}^{\delta}\in\mathbb{R}^{256^{2}\times 1}$ with a relative noise level $\delta_{\text{rel}}=0.01$. This implies that the linear system (38) is under-determined due to the compression across different frames. Additionally, it is important to note that a significant amount of useful information in each frame of the original videos is lost since nearly half of the elements in every mask are zero.
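A minimal Python sketch of this mask generation (with our own naming; the horizontal shift is realized here as a circular shift, which is an assumption about the boundary handling made only for this illustration):

```python
import numpy as np

def shifting_binary_masks(n, b, seed=0):
    """Shifting random binary mask stack: the first mask has i.i.d. Bernoulli(0.5)
    entries and each subsequent mask shifts the previous one by one pixel
    horizontally (realized here as a circular shift)."""
    rng = np.random.default_rng(seed)
    m0 = (rng.random((n, n)) < 0.5).astype(float)
    return [np.roll(m0, shift=i, axis=1) for i in range(b)]
```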

Figure 8. The reconstruction results by Algorithm 3 with {\mathcal{R}} given by (39) from the noisy snapshot measurement with the relative noise level δrel=0.01\delta_{\text{rel}}=0.01 of the dataset Runner.
Figure 9. The reconstruction results by Algorithm 3 with {\mathcal{R}} given by (39) from the noisy snapshot measurement with the relative noise level δrel=0.01\delta_{\text{rel}}=0.01 of the dataset Kobe.

To solve this ill-posed video reconstruction problem, we incorporate the piece-wise constancy feature of the video into our proposed method, Algorithm 3. To this end, we take

(x)=i=1bi(xi)with i(xi):=12xi2+λ|xi|TV,\displaystyle{\mathcal{R}}(\texttt{x})=\sum_{i=1}^{b}{\mathcal{R}}_{i}(\texttt{x}_{i})\quad\mbox{with }{\mathcal{R}}_{i}(\texttt{x}_{i}):=\frac{1}{2}\|\texttt{x}_{i}\|^{2}+\lambda|\texttt{x}_{i}|_{TV}, (39)

where $\lambda>0$ is a positive constant and $|\texttt{x}_{i}|_{TV}$ denotes the 2-dimensional isotropic total variation of $\texttt{x}_{i}$ for each frame; see [1]. It is clear that ${\mathcal{R}}$ satisfies Assumption 3.1 with $\kappa=1/2$. In order to implement Algorithm 3, we need to update $\texttt{x}_{k+1,i_{k}}^{\delta}$ from $\xi_{k+1,i_{k}}^{\delta}$, which requires solving a total variation denoising problem of the form

xk+1,ikδ=argminz{ik(z)ξk+1,ikδ,z}=argminz{λ|z|TV+12zξk+1,ikδ2}.\texttt{x}_{k+1,i_{k}}^{\delta}=\arg\min_{z}\left\{{\mathcal{R}}_{i_{k}}(z)-\langle\xi_{k+1,i_{k}}^{\delta},z\rangle\right\}=\arg\min_{z}\left\{\lambda|z|_{TV}+\frac{1}{2}\|z-\xi_{k+1,i_{k}}^{\delta}\|^{2}\right\}.

In the experiments, this sub-problem is solved by the iterative clipping algorithm [1]. When executing Algorithm 3, we use $\xi_{0}=0$ and $\gamma=2\kappa\mu/b$ with $\mu=1.99$, and for the datasets Runner and Kobe we use $\lambda=15$ and $30$, respectively. We run the algorithm for 1500 iteration steps and report the reconstruction results in Figures 8 and 9. The detailed numerical results of the above experiments are recorded in Table 2, including the computational time, the peak signal-to-noise ratio (PSNR), the structural similarity index measure (SSIM) [29], and the relative error. These results demonstrate that, with a TV denoiser acting on the primal variable, Algorithm 3 can recover a great deal of missing information from the compressed snapshot measurement and thus reconstruct a good approximation of the original video within an acceptable amount of time, even if the data is corrupted by noise.

Time (s) PSNR (dB) SSIM relative error
Runner 23.5595 27.8292 0.8012 0.0154
Kobe 23.6561 28.3153 0.8883 0.0216
Table 2. Numerical results of Algorithm 3 with {\mathcal{R}} given by (39) for the datasets Runner and Kobe using the noisy snapshot measurement with the relative noise level δrel=0.01\delta_{\text{rel}}=0.01.

Considering the same problem for Runner and Kobe using the noisy data with $\delta_{\text{rel}}=0.01$, we run Algorithm 4 with $\tau=2$, $\mu=1.99$, and ${\mathcal{R}}$ given by (39). We use $\lambda=15$ for the dataset Runner and $\lambda=30$ for the dataset Kobe. This algorithm terminates the iteration based on the a posteriori discrepancy principle. The reconstruction results are shown in Figures 10 and 11. We also record the stopping index $k_{\delta}$, PSNR, SSIM, and relative error of the reconstructions in Table 3. It can be observed that Algorithm 4 consistently terminates the iterations within a finite number of steps and provides efficient reconstructions of the true videos, even when the data is corrupted by noise. Therefore, Algorithm 4 effectively meets the key requirement of automatic stopping for this large-scale video reconstruction task in practice.

Figure 10. The reconstruction results by Algorithm 4 with {\mathcal{R}} given by (39) from the noisy snapshot measurement with the relative noise level δrel=0.01\delta_{\text{rel}}=0.01 of the dataset Runner.
Figure 11. The reconstruction results by Algorithm 4 with {\mathcal{R}} given by (39) from the noisy snapshot measurement with the relative noise level δrel=0.01\delta_{\text{rel}}=0.01 of the dataset Kobe.
$k_{\delta}$ PSNR (dB) SSIM relative error
Runner 1306 27.5785 0.7983 0.0163
Kobe 2611 28.4458 0.8842 0.0209
Table 3. Numerical results of Algorithm 4 with ${\mathcal{R}}$ given by (39) for the datasets Runner and Kobe using the noisy snapshot measurement with the relative noise level $\delta_{\text{rel}}=0.01$.

5. Conclusions

In this paper, we addressed linear ill-posed inverse problems of the separable form i=1bAixi=y\sum_{i=1}^{b}A_{i}x_{i}=y in Hilbert spaces and proposed a randomized block coordinate descent (RBCD) method for solving such problems. Although the RBCD method has been extensively studied for well-posed convex optimization problems, there has been a lack of convergence analysis for ill-posed problems. In this work, we investigated the convergence properties of the RBCD method with noisy data under both a priori and a posteriori stopping rules. We proved that the RBCD method, when combined with an a priori stopping rule, serves as a convergent regularization method in the sense of weak convergence almost surely. Additionally, we explored the early stopping of the RBCD method, demonstrating that the discrepancy principle can terminate the iteration after a finite number of steps almost surely. Furthermore, we developed a strategy to incorporate convex regularization terms into the RBCD method to enhance the detection of solution features. To validate the performance of our proposed method, we conducted numerical simulations, demonstrating its effectiveness.

It should be noted that when the sought solution is smooth and decomposed into many small blocks artificially, the numerical results obtained by the RBCD method, which are not reported here, may not meet expectations due to the mismatch between adjacent blocks. Therefore, there is significant interest in developing novel strategies to overcome this mismatch between adjacent blocks, thereby improving the performance of the RBCD method. However, in cases where the object to be reconstructed naturally consists of many separate frames, such as the coded aperture compressive temporal imaging, the RBCD method demonstrates its effectiveness and yields satisfactory results.

Acknowledgement

The work of Q. Jin is partially supported by the Future Fellowship of the Australian Research Council (FT170100231). The work of D. Liu is supported by the China Scholarship Council program (Project ID: 202307090070).

References

  • [1] A. Beck and M. Teboulle, Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems, IEEE Trans. Image Process., 18 (2009), pp. 2419–2434.
  • [2] D. P. Bertsekas, Incremental proximal methods for large scale convex optimization, Math. Program., 129 (2011), pp. 163–195.
  • [3] L. Borg, J. Frikel, J. S. Jørgensen and E. T. Quinto, Analyzing reconstruction artifacts from arbitrary incomplete X-ray CT data, SIAM J. Imaging Sci., 11 (2018), pp. 2786–2814.
  • [4] P. Brémaud, Probability theory and stochastic processes, Universitext, Springer, Cham, 2020.
  • [5] P. L. Combettes and J.-C. Pesquet, Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping, SIAM J. Optim., 25 (2015), pp. 1221–1248.
  • [6] H. W. Engl, M. Hanke and A. Neubauer, Regularization of Inverse Problems, Dordrecht, Kluwer, 1996.
  • [7] J. Frikel and E. T. Quinto, Characterization and reduction of artifacts in limited angle tomography, Inverse Problems, 29 (2013), 125007.
  • [8] M. Haltmeier, A. Leitão and O. Scherzer, Kaczmarz methods for regularizing nonlinear ill-posed equations. I. Convergence analysis, Inverse Probl. Imaging, 1 (2007), pp. 289–298.
  • [9] P. C. Hansen and M. Saxild-Hansen, AIR tools—a MATLAB package of algebraic iterative reconstruction methods, J. Comput. Appl. Math., 236 (2012), pp. 2167–2178.
  • [10] Q. Jin, Landweber-Kaczmarz method in Banach spaces with inexact inner solvers, Inverse Problems, 32 (2016), 104005.
  • [11] Q. Jin, X. Lu and L. Zhang, Stochastic mirror descent method for linear ill-posed problems in Banach spaces, Inverse Problems, 39 (2023), 065010.
  • [12] Q. Jin and W. Wang, Landweber iteration of Kaczmarz type with general non-smooth convex penalty functionals, Inverse Problems, 29 (2013), no. 8, 085011, 22 pp.
  • [13] R. Kowar and O. Scherzer, Convergence analysis of a Landweber-Kaczmarz method for solving nonlinear ill-posed problems, Ill-posed and inverse problems, 253–270, VSP, Zeist, 2002.
  • [14] C. Liu, J. Qiu, and M. Jiang, Light field reconstruction from projection modeling of focal stack, Opt. Express, 25 (2017), pp. 11377–11388.
  • [15] P. Llull, X. Liao, X. Yuan, J. Yang, D. Kittle, L. Carin, G. Sapiro, and D. J. Brady, Coded aperture compressive temporal imaging, Opt. Express, 21 (2013), pp. 10526–10545.
  • [16] A. K. Louis, Incomplete data problems in X-ray computerized tomography. I. singular value decomposition of the limited angle transform, Numer. Math., 48 (1986), pp. 251–262.
  • [17] Z. Lu and L. Xiao, On the complexity analysis of randomized block-coordinate descent methods, Math. Program., 152 (2015), 615–642.
  • [18] F. Natterer, The Mathematics of Computerized Tomography, SIAM, Philadelphia, 2001.
  • [19] Y. Nesterov, Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM J. Optim., 22 (2012), no. 2, pp. 341–362.
  • [20] E. T. Quinto, Singularities of the X-ray transform and limited data tomography in 2\mathbb{R}^{2} and 3\mathbb{R}^{3}, SIAM J. Appl. Math., 24 (1993), pp. 1215–1225.
  • [21] S. Rabanser, L. Neumann and M. Haltmeier, Analysis of the block coordinate descent method for linear ill-posed problems, SIAM J. Imaging Sci., 12 (2019), no. 4, pp. 1808–1832.
  • [22] P. Richtárik and M. Takác, Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function, Math. Program., 144 (2014), 1–38.
  • [23] Runner data, https://www.videvo.net/video/elite-runner-slow-motion/4541/.
  • [24] A. Saha and A. Tewari, On the nonasymptotic convergence of cyclic coordinate descent methods, SIAM J. Optim., 23 (2013), pp. 576–601.
  • [25] P. Tseng, Convergence of a block coordinate descent method for nondifferentiable minimization, J. Optim. Theory Appl., 109 (2001), 475–494.
  • [26] P. Tseng and S. Yun, Block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization, J. Optim. Theory Appl. 140 (2009), 513–535.
  • [27] P. Tseng and S. Yun, A coordinate gradient descent method for nonsmooth separable minimization, Math. Program., 117 (2009), 387–423.
  • [28] A. Wagadarikar, R. John, R. Willett and D. J. Brady, Single disperser design for coded aperture snapshot spectral imaging, Appl. Opt., 47 (2008), pp. B44–B51.
  • [29] Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process., 13 (2004), pp. 600–612.
  • [30] S. J. Wright, Coordinate descent algorithms, Math. Program., 151 (2015), pp. 3–34.
  • [31] J. Yang, X. Yuan, X. Liao, P. Llull, G. Sapiro, D. J. Brady and L. Carin, Video compressive sensing using Gaussian mixture models, IEEE Trans. Image Process., 23 (2014), pp. 4863–4878.
  • [32] X. Yin, G. Wang, W. Li and Q. Liao, Iteratively reconstructing 4D light fields from focal stacks, Appl. Opt., 55 (2016), pp. 8457–8463.
  • [33] X. Yuan, D. J. Brady and A. K. Katsaggelos, Snapshot compressive imaging: theory, algorithms, and applications, IEEE Signal Process. Mag., 38 (2021), pp. 65–88.
  • [34] C. Zălinscu, Convex Analysis in General Vector Spaces, World Scientific Publishing Co., Inc., River Edge, New Jersey, 2002.