Randomized block coordinate descent method for linear ill-posed problems
Abstract.
Consider linear ill-posed problems of the form $A_1 x_1 + \cdots + A_p x_p = y$, where, for each $i$, $A_i$ is a bounded linear operator between two Hilbert spaces $X_i$ and $Y$. When $p$ is huge, solving the problem by an iterative method using the full gradient at each iteration step is both time-consuming and memory-demanding. Although the randomized block coordinate descent (RBCD) method has been shown to be an efficient method for well-posed large-scale optimization problems with a small amount of memory, a convergence analysis of the RBCD method for solving ill-posed problems is still lacking. In this paper, we investigate the convergence property of the RBCD method with noisy data under either a priori or a posteriori stopping rules. We prove that the RBCD method combined with an a priori stopping rule yields a sequence that converges weakly to a solution of the problem almost surely. We also consider the early stopping of the RBCD method and demonstrate that the discrepancy principle can terminate the iteration after finitely many steps almost surely. For a class of ill-posed problems with a special tensor product form, we obtain strong convergence results for the RBCD method. Furthermore, we consider incorporating convex regularization terms into the RBCD method to enhance the detection of solution features. To illustrate the theory and the performance of the method, numerical simulations based on the imaging modalities of computed tomography and compressive temporal imaging are reported.
Key words and phrases:
linear ill-posed problems, randomized block coordinate descent method, convex regularization term, convergence, imaging
2010 Mathematics Subject Classification:
65J20, 65J22, 65J10, 94A08
1. Introduction
In this paper we consider linear ill-posed problems of the form
(1)  $A_1 x_1 + A_2 x_2 + \cdots + A_p x_p = y,$
where, for each $i = 1, \ldots, p$, $A_i : X_i \to Y$ is a bounded linear operator between two real Hilbert spaces $X_i$ and $Y$. Ill-posed problems of this form arise naturally in many imaging applications. For example, coded aperture compressive temporal imaging (see Example 4.2 and [15] for more details), a kind of large-scale snapshot compressive imaging [33], can be expressed by the following integral equation
(2)  $g(s) = \int_0^T \phi(s, t)\, f(s, t)\, \mathrm{d}t, \qquad s \in \Omega,$
where $\Omega$ is a bounded domain in the 2-dimensional Euclidean space and $\phi$ is a bounded continuous mask function on $\Omega \times [0, T]$, which ensures that the forward operator is a bounded linear operator from $L^2(\Omega \times [0, T])$ to $L^2(\Omega)$. Decomposing the whole time domain $[0, T]$ into $p$ disjoint sub-domains $\Delta_1, \ldots, \Delta_p$, where for any $i \ne j$,
(3)  $[0, T] = \bigcup_{i=1}^p \Delta_i \quad \text{and} \quad \Delta_i \cap \Delta_j = \emptyset,$
we denote the restriction of the function $f$ to $\Omega \times \Delta_i$ by $f_i$, i.e. $f_i := f|_{\Omega \times \Delta_i}$, and let
(4)  $(A_i f_i)(s) := \int_{\Delta_i} \phi(s, t)\, f_i(s, t)\, \mathrm{d}t, \qquad s \in \Omega,$
then $A_i : L^2(\Omega \times \Delta_i) \to L^2(\Omega)$ is a bounded linear operator for each $i$. Since
$\sum_{i=1}^p (A_i f_i)(s) = \int_0^T \phi(s, t)\, f(s, t)\, \mathrm{d}t,$
the above integral equation can be written in the form (1). For more examples, one may refer to light field reconstruction from the focal stack [14, 32] and other imaging modalities such as spectral imaging [28] and full 3-D computed tomography [18]. In these examples, the sought solution can be naturally separated into many films or frames.
Let $X := X_1 \times \cdots \times X_p$ be the product space of $X_1, \ldots, X_p$ with the natural inner product inherited from those of $X_i$, and define $A : X \to Y$ by
(5)  $A x := \sum_{i=1}^p A_i x_i \quad \text{for } x = (x_1, \ldots, x_p) \in X.$
Then, (1) can be equivalently stated as $A x = y$. Throughout the paper we assume that (1) has a solution, i.e. $y \in \mathrm{Ran}(A)$, the range of $A$. In practical applications, instead of the exact data $y$ we can only acquire noisy data $y^\delta$ satisfying
$\|y^\delta - y\| \le \delta,$
where $\delta > 0$ is the noise level. How to stably reconstruct an approximate solution of ill-posed problems described by (1) using noisy data is a central topic in computational inverse problems. In this situation, regularization techniques should be used, and various regularization methods have been developed in the literature ([6]).
In this paper we will exploit the decomposition structure of the equation (1) to develop a randomized block coordinate descent method for solving linear ill-posed problems. The research on the block coordinate descent method in the literature has mainly focused on solving large-scale well-posed optimization problems ([17, 19, 22, 24, 25, 26, 27, 30]). For the unconstrained minimization problem
(6)  $\min_{x \in X} f(x)$
with $x = (x_1, \ldots, x_p) \in X := X_1 \times \cdots \times X_p$, where $f : X \to \mathbb{R}$ is a continuously differentiable function, the block coordinate descent method updates $x_{k+1}$ from $x_k$ by selecting an index $i_k$ from $\{1, \ldots, p\}$ and using the partial gradient $\nabla_{i_k} f(x_k)$ of $f$ at $x_k$ to update the $i_k$-th component of $x_k$, leaving the other components unchanged; more precisely, we have the updating formula
(7)  $x_{k+1, i} = \begin{cases} x_{k, i} - \gamma \nabla_i f(x_k) & \text{if } i = i_k, \\ x_{k, i} & \text{otherwise}, \end{cases}$
where $\gamma > 0$ is the step size. The index $i_k$ can be selected in various ways, either in a cyclic fashion or in a random manner. By applying the block coordinate descent method (7) to solve (6) with $f$ given by
(8)  $f(x) := \frac{1}{2} \Big\| \sum_{i=1}^p A_i x_i - y^\delta \Big\|^2,$
it leads to the iterative method
(9)  $x_{k+1, i}^\delta = \begin{cases} x_{k, i}^\delta - \gamma A_i^* \big( \sum_{j=1}^p A_j x_{k, j}^\delta - y^\delta \big) & \text{if } i = i_k, \\ x_{k, i}^\delta & \text{otherwise}, \end{cases}$
for solving the ill-posed linear problem (1) using noisy data $y^\delta$, where $A_i^*$ denotes the adjoint of $A_i$ for each $i$ and the superscript "$\delta$" on all iterates is used to indicate their dependence on the noisy data.
The method (9) can also be motivated from the Landweber iteration
(10)  $x_{k+1}^\delta = x_k^\delta - \gamma A^* (A x_k^\delta - y^\delta),$
which is one of the most prominent regularization methods for solving ill-posed problems, where $\gamma > 0$ is the step size and $A^*$ is the adjoint of $A$. Note that, with the operator $A$ defined by (5), we have
$A^* z = (A_1^* z, \ldots, A_p^* z), \qquad z \in Y.$
Thus, updating $x_{k+1}^\delta$ from $x_k^\delta$ by (10) requires calculating $A_i^* (A x_k^\delta - y^\delta)$ for all $i = 1, \ldots, p$. If $p$ is huge, applying the method (10) to solve the equation (1) is both time-consuming and memory-demanding. To resolve these issues, it is natural to calculate only one component of $A^* (A x_k^\delta - y^\delta)$ and use it to update the corresponding component of $x_k^\delta$ while keeping the other components fixed. This leads again to the block coordinate descent method (9) for solving (1).
The block coordinate descent method and its variants for large-scale well-posed optimization problems have been analyzed extensively, and all the established convergence results either require the objective function to be strongly convex or establish convergence only in terms of the objective value. However, these results are not applicable to the method (9) for ill-posed problems, because the corresponding objective function (8) is not strongly convex and, moreover, due to the ill-posedness of the underlying problem, convergence of the objective value does not directly imply any result on the iterates. Therefore, a new analysis is required to understand the method (9) for ill-posed problems.
In [21] the block coordinate descent method (9) was considered with the index $i_k$ selected in a cyclic fashion, i.e. $i_k = (k \bmod p) + 1$. The regularization property of the corresponding method was only established for a specific class of linear ill-posed problems in which the forward operator is assumed to have a particular tensor product form; more precisely, the ill-posed linear system of the form
(11)  $(M \otimes A)\, x = y$
was considered in [21], where $A : X \to Y$ is a bounded linear operator between two real Hilbert spaces $X$ and $Y$ and $M = (m_{ij})$ is a $q \times p$ matrix with full column rank. Clearly this problem can be cast into the form (1) if we take $X_i = X$ for each $i$, take $Y^q$, the $q$-fold Cartesian product of $Y$, as the data space, and define $A_i : X \to Y^q$ by
$A_i x_i := (m_{1 i} A x_i, \ldots, m_{q i} A x_i)$
for each $i$. The problem formulation (11), however, seems too specific to cover interesting problems arising from practical applications. How to establish the regularization property of this cyclic block coordinate descent method for the ill-posed problem (1) in its full generality remains a challenging open question.
Instead of using a deterministic cyclic order, in this paper we will consider a randomized block coordinate descent method, i.e. the method (9) with $i_k$ selected from $\{1, \ldots, p\}$ uniformly at random. We will investigate the convergence behavior of the iterates for the ill-posed linear problem (1) with a general forward operator $A$. Note that in the method (9) the definition of $x_{k+1}^\delta$ involves the residual $r_k^\delta := \sum_{j=1}^p A_j x_{k, j}^\delta - y^\delta$. If this quantity is required to be calculated directly at each iteration step, then the advantages of the method will be largely reduced. Fortunately, this quantity can be calculated in a simple recursive way. Indeed, by the definition of $x_{k+1}^\delta$ we have
$r_{k+1}^\delta = \sum_{j=1}^p A_j x_{k+1, j}^\delta - y^\delta = r_k^\delta + A_{i_k} \big( x_{k+1, i_k}^\delta - x_{k, i_k}^\delta \big).$
Therefore, we can formulate our randomized block coordinate descent (RBCD) method into the following equivalent form which is convenient for numerical implementation.
Algorithm 1 (RBCD method for ill-posed problems).
Pick an initial guess $x_0 = (x_{0,1}, \ldots, x_{0,p}) \in X$, set $x_0^\delta := x_0$ and calculate $r_0^\delta := \sum_{i=1}^p A_i x_{0,i}^\delta - y^\delta$. Choose a suitable step size $\gamma > 0$. For all integers $k \ge 0$ do the following:
- (i) Pick an index $i_k \in \{1, \ldots, p\}$ randomly via the uniform distribution;
- (ii) Update $x_{k+1}^\delta$ by setting $x_{k+1, i}^\delta = x_{k, i}^\delta$ for $i \ne i_k$ and $x_{k+1, i_k}^\delta = x_{k, i_k}^\delta - \gamma A_{i_k}^* r_k^\delta$;
- (iii) Calculate $r_{k+1}^\delta = r_k^\delta + A_{i_k} \big( x_{k+1, i_k}^\delta - x_{k, i_k}^\delta \big)$.
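To make the structure of Algorithm 1 concrete, the following is a minimal NumPy sketch under simplifying assumptions: the block operators are given as a list of matrices `A_blocks`, the noisy data as a vector `y_delta`, the initial guess is zero, and the step size is a user-supplied constant. The function name and interface are ours, not part of the paper.

```python
import numpy as np

def rbcd(A_blocks, y_delta, gamma, n_iters, rng=None):
    """Sketch of Algorithm 1: randomized block coordinate descent for A_1 x_1 + ... + A_p x_p = y."""
    rng = np.random.default_rng() if rng is None else rng
    p = len(A_blocks)
    x = [np.zeros(A.shape[1]) for A in A_blocks]               # initial guess x_0 = 0
    r = sum(A @ xi for A, xi in zip(A_blocks, x)) - y_delta    # residual r_0 = A x_0 - y^delta

    for _ in range(n_iters):
        i = rng.integers(p)                                    # (i) pick a block uniformly at random
        step = gamma * (A_blocks[i].T @ r)                     # gamma * A_i^* r_k
        x[i] = x[i] - step                                     # (ii) update only the i-th block
        r = r - A_blocks[i] @ step                             # (iii) recursive residual update
    return x
```

Each step applies only one block operator and its adjoint, which is exactly the memory and time saving over the full Landweber step discussed above.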
In this paper we present a convergence analysis of Algorithm 1 by deriving a stability estimate and establishing the regularization property. For the equation (1) in its general form, we obtain a weak convergence result, and for the particular tensor product form studied in [21] we establish a strong convergence result. Moreover, we also consider the early stopping of Algorithm 1 and demonstrate that the discrepancy principle can terminate the iteration after finitely many steps almost surely. Note that Algorithm 1 does not incorporate a priori available information on the features of the sought solution. In case the sought solution has some special feature, such as non-negativity, sparsity or piece-wise constancy, Algorithm 1 may not be efficient enough to produce satisfactory approximate solutions. In order to resolve this issue, we further extend Algorithm 1 by incorporating into it a convex regularization term which is selected to capture the desired feature. For a detailed description of this extended algorithm, please refer to Algorithm 3, for which we will provide a convergence analysis.
It should be pointed out that the RBCD method is essentially different from the Landweber-Kaczmarz method and its stochastic version which have been studied in [8, 10, 11, 12, 13]. The Landweber-Kaczmarz method relies on decomposing the data into many blocks of small size, while our RBCD method makes use of the block structure of the sought solutions.
This paper is organized as follows. In Section 2 we consider Algorithm 1 by proving various convergence results and investigating the discrepancy principle as an early stopping rule. In Section 3 we consider an extension of Algorithm 1 by incorporating a convex regularization term into it so that the special feature of sought solutions can be detected. Finally in Section 4 we provide numerical simulations to test the performance of our proposed methods.
2. Convergence analysis
In this section we consider Algorithm 1. We first establish a stability estimate, and prove a weak convergence result when the method is terminated by an a priori stopping rule. We then investigate the discrepancy principle and demonstrate that it can terminate the iteration in finitely many steps almost surely. When the forward operator has the particular tensor product form used in [21], we further show that strong convergence can be guaranteed.
Note that, once the initial guess and the step size are fixed, the sequence $\{x_k^\delta\}$ is completely determined by the sample path $\{i_0, i_1, \ldots\}$; changing the sample path can result in a different iterative sequence, and thus $\{x_k^\delta\}$ is a random sequence. Let $\mathcal{F}_0$ denote the trivial $\sigma$-algebra and, for each integer $k \ge 1$, let $\mathcal{F}_k$ denote the $\sigma$-algebra generated by the random variables $i_j$ for $0 \le j < k$. Then $\{\mathcal{F}_k\}$ forms a filtration which is natural to Algorithm 1. Let $\mathbb{E}$ denote the expectation associated with this filtration; see [4]. The tower property of conditional expectation
$\mathbb{E}\big[\, \mathbb{E}[\, \cdot \mid \mathcal{F}_{k+1}] \,\big|\, \mathcal{F}_k \big] = \mathbb{E}[\, \cdot \mid \mathcal{F}_k]$
will be frequently used.
2.1. Stability estimate
Let $\{x_k\}$ denote the iterative sequence produced by Algorithm 1 with $y^\delta$ replaced by the exact data $y$. By definition it is easy to see that, for any fixed integer $k$, along any sample path there holds $x_k^\delta \to x_k$ as $\delta \to 0$ and hence $\mathbb{E}[\|x_k^\delta - x_k\|^2] \to 0$ as $\delta \to 0$. The following result gives a more precise stability estimate.
Lemma 2.1.
Consider Algorithm 1 and assume . Then, along any sample path there holds
(12) |
for all integers . Furthermore
(13) |
for all .
Proof.
Let for all integers . It follows from the definition of and that
Thus
Since and , we have and thus
Consequently
which shows (12).
To derive (13), we note that
Taking the conditional expectation gives
By using and (12), we further obtain
Consequently, by taking the full expectation and using the tower property of conditional expectation, we can obtain
Based on this inequality and the fact , we can use an induction argument to complete the proof of (13) immediately. ∎
2.2. Weak convergence
Our goal is to investigate the approximation behavior of to a solution of . Because of Lemma 2.1, we now focus our consideration on the sequence defined by Algorithm 1 using exact data.
Lemma 2.2.
Proof.
Let denote the -th component of , i.e. . By the definition of and the polarization identity, we have
Taking the conditional expectation and using , we can obtain
which shows (14). ∎
To proceed further, we need the following Doob’s martingale convergence theorem ([4]).
Proposition 2.3.
Let $\{Z_k\}$ be a sequence of nonnegative random variables in a probability space that is a supermartingale with respect to a filtration $\{\mathcal{F}_k\}$, i.e. each $Z_k$ is $\mathcal{F}_k$-measurable and
$\mathbb{E}[Z_{k+1} \mid \mathcal{F}_k] \le Z_k \quad \text{for all } k.$
Then $\{Z_k\}$ converges almost surely to a nonnegative random variable with finite expectation.
Based on Proposition 2.3, we next prove the almost sure weak convergence of the sequence $\{x_k\}$. We need $X$ to be separable in the sense that it has a countable dense subset.
Theorem 2.4.
Proof.
Let denote the set of solutions of (1). According to Lemma 2.2, we have for any solution that
which means is a supermartingale. Thus, we may use Proposition 2.3 to conclude that the event
has probability one. We now strengthen this result by showing that there is an event of probability one such that, for any and any sample path , the limit exists. We adapt the arguments in [2, 5]. By the separability of , we can find a countable set such that is dense in . Let
Since for each and is countable, we have . Let be any point. Then there is a sequence such that as . For any sample path we have by the triangle inequality that
Thus
Since , exists. Thus, by the properties of and we have
This implies that both and are finite with
Letting shows that
and hence exists and is finite for every and .
Next we use Lemma 2.2 again to obtain for any that
which implies that
Consequently, the event
(15) |
has probability one. Let . Then . Note that along any sample path in we have is convergent for any and
which implies as . The convergence of implies that is bounded and hence it has a weakly convergent subsequence such that as for some , where “” denotes the weak convergence. Since , we must have , i.e. and consequently converges. We now show that for the whole sequence . It suffices to show that is the unique weak cluster point of . Let be any cluster point of . Then there is another subsequence of such that . From the above argument we also have and thus is convergent. Noting the identity
Since both and are convergent, we can conclude that exists. Therefore
and thus , i.e. and hence . The proof is complete. ∎
Remark 2.5.
Theorem 2.4 shows that there is an event of probability one and a random vector such that almost surely and along every sample path in . Let denote the unique -minimum-norm solution of . We would like to point out that almost surely if is uniformly bounded in the sense that
(16) |
To see this, we first claim that
(17) |
Indeed this is trivial for . Assume it is true for some . Then, by the definition of , we have
Thus
because almost surely. Consequently, by taking the full expectation and using the induction hypothesis we can obtain and the claim (17) is proved.
Under the condition (16), we have
Since almost surely, by using the weak lower semi-continuity of norms and the Fatou lemma we can obtain from Lemma 2.2 that
and thus
Therefore, we may use the dominated convergence theorem, and (17) to conclude
Since always holds as is the -minimum-norm solution, we thus obtain which implies that almost surely.
By using Lemma 2.1 and Theorem 2.4, now we are ready to prove the following weak convergence result on Algorithm 1 under an a priori stopping rule.
Theorem 2.6.
For any sequence of noisy data satisfying with as , let be the iterative sequence produced by Algorithm 1 with replaced by , where . Let the integer be chosen such that and as . Then, by taking a subsequence of if necessary, there holds as almost surely, where denotes the random solution of determined in Theorem 2.4.
Proof.
According to Lemma 2.1 and we have
Therefore, by taking a subsequence of if necessary, we can find an event with such that along every sample path in . According to Theorem 2.4 and , there is an event of probability one such that as along every sample path in . Let . Then and for any there holds
as along every sample path in . The proof is thus complete. ∎
2.3. The discrepancy principle
The above weak convergence result on Algorithm 1 is established under an a priori stopping rule. In applications, we usually expect to terminate the iteration by a posteriori rules. Note that the residual $r_k^\delta = \sum_{i=1}^p A_i x_{k, i}^\delta - y^\delta$ is already computed in the algorithm; it is therefore natural to consider terminating the iteration by the discrepancy principle, which determines the stopping index to be the first integer $k$ such that $\|r_k^\delta\| \le \tau \delta$, where $\tau > 1$ is a given number. Incorporating the discrepancy principle into Algorithm 1 leads to the following algorithm.
Algorithm 2 (RBCD method with the discrepancy principle).
Pick an initial guess $x_0 \in X$, set $x_0^\delta := x_0$ and calculate $r_0^\delta := \sum_{i=1}^p A_i x_{0, i}^\delta - y^\delta$. Choose $\tau > 1$ and a suitable step size $\gamma > 0$. For all integers $k \ge 0$ do the following:
- (i) Set the step size $\gamma_k$ by
  $\gamma_k = \begin{cases} \gamma & \text{if } \|r_k^\delta\| > \tau \delta, \\ 0 & \text{otherwise}; \end{cases}$
- (ii) Pick an index $i_k \in \{1, \ldots, p\}$ randomly via the uniform distribution;
- (iii) Update $x_{k+1}^\delta$ by setting $x_{k+1, i}^\delta = x_{k, i}^\delta$ for $i \ne i_k$ and $x_{k+1, i_k}^\delta = x_{k, i_k}^\delta - \gamma_k A_{i_k}^* r_k^\delta$;
- (iv) Calculate $r_{k+1}^\delta = r_k^\delta + A_{i_k} \big( x_{k+1, i_k}^\delta - x_{k, i_k}^\delta \big)$.
Algorithm 2 is formulated in such a way that it incorporates the discrepancy principle to define an infinite sequence $\{x_k^\delta\}$, which is convenient for the analysis below. In numerical simulations, the iteration should be terminated as soon as $\|r_k^\delta\| \le \tau \delta$ is satisfied, because the iterates are no longer updated afterwards. It should be highlighted that the stopping index depends crucially on the sample path and thus is a random integer. Note also that the step size $\gamma_k$ in Algorithm 2 is a random number; this is in sharp contrast to the step size in Algorithm 1, which is deterministic.
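Under the same assumptions as in the earlier sketch, and using the reconstruction of step (i) above (step size $\gamma_k = \gamma$ while the residual norm exceeds $\tau\delta$ and $\gamma_k = 0$ afterwards), a practical implementation of Algorithm 2 simply stops the loop once the discrepancy principle is met:

```python
import numpy as np

def rbcd_discrepancy(A_blocks, y_delta, delta, gamma, tau, max_iters, rng=None):
    """Sketch of Algorithm 2: RBCD stopped by the discrepancy principle ||r_k|| <= tau * delta."""
    rng = np.random.default_rng() if rng is None else rng
    p = len(A_blocks)
    x = [np.zeros(A.shape[1]) for A in A_blocks]
    r = sum(A @ xi for A, xi in zip(A_blocks, x)) - y_delta

    for _ in range(max_iters):
        if np.linalg.norm(r) <= tau * delta:        # discrepancy principle satisfied;
            break                                   # gamma_k = 0 from now on, so stop updating
        i = rng.integers(p)
        step = gamma * (A_blocks[i].T @ r)
        x[i] = x[i] - step
        r = r - A_blocks[i] @ step
    return x
```

The safeguard `max_iters` is ours; the analysis in this section is what guarantees that, with noisy data, the stopping criterion is reached after finitely many steps almost surely.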
Proposition 2.7.
2.4. Strong convergence
In Subsection 2.2, we obtained weak convergence result on Algorithm 1. In this section, we will show that, for a special case studied in [21], a strong convergence result can be derived. To start with, let be a matrix and let be a bounded linear operator. We will consider the problem (11) which can be written as by setting and define as
It is easy to see that (11) is a special case of (1) with for each , , and
Note that , the adjoint of , has the following form
Thus, when our randomized block coordinate descent method, i.e. Algorithm 1, is used to solve (11) with the exact data, the iteration scheme becomes if and
where denotes the -th component of . In order to give a convergence analysis, we introduce with
and for any solution of (11) we set with
Then, we have
Consequently,
(19) |
where
By straightforward calculation, we can obtain
and
where, for each , we use to denote the -th column of . Combining these two equations with (19) gives
which shows the following result.
Lemma 2.8.
Based on Lemma 2.8, now we show the almost sure strong convergence of under the assumption that is of full column rank.
Theorem 2.9.
Proof.
Consider the event defined in (15). It is known that . We now fix an arbitrary sample path in and show that, along this sample path, is a Cauchy sequence. Recall that
(20) |
and hence as . Given any two positive integers , let be an integer such that and
(21) |
Then
We now show that and as . To show the first assertion, we note that
Since, by Lemma 2.8, is monotonically decreasing, exists and thus as . To estimate , we first write
By the definition of and , we have
Using the definition of we further have
By virtue of the Cauchy-Schwarz inequality, we then obtain
With the help of (21) and (20) we further obtain
as . Thus as . Similarly, we can also obtain as . Therefore, is a Cauchy sequence.
Recall that is assumed to be of full column rank. Thus we can find a matrix such that , where is the identity matrix. Then
for . Hence we can recover from . Let denote the Frobenius norm of . Then by the Cauchy-Schwarz inequality we can obtain
which implies that is also a Cauchy sequence and hence as for some . Since as , we can conclude , i.e. is a solution of .
The above argument actually shows that there is a random solution of such that as along any sample path in . Since , we have as almost surely. Note that Lemma 2.8 implies , we can conclude that
which implies that is uniformly bounded in the sense of (16). Thus we may use Remark 2.5 to conclude that almost surely and hence as almost surely. Furthermore, by the dominated convergence theorem we can further obtain as . The proof is therefore complete. ∎
Remark 2.10.
In their exploration of the cyclic block coordinate descent method detailed in [21] for solving the ill-posed problem (11), the authors established the convergence of the generated sequence to a point satisfying
They then concluded that the full column rank property of implies . However, this condition on is not sufficient for drawing such a conclusion when . We circumvented this issue for our RBCD method by leveraging Lemma 2.2. Furthermore, when classifying the limit , the authors in [21] inferred that is the unique -minimum-norm solution of (11) by asserting that for all . Regrettably, this assertion is inaccurate. Actually
which is considerably larger than . We demonstrated that the limit of our method is the -minimum-norm solution by using Remark 2.5.
Theorem 2.11.
Proof.
The above theorem provides a convergence result on Algorithm 1 under an a priori stopping rule when it is used to solve (11). We can also apply Algorithm 2 to solve (11). Correspondingly we have the following convergence result.
Theorem 2.12.
3. Extension
In Algorithm 1 we proposed a randomized block coordinate descent method for solving the linear ill-posed problem (1) and provided a convergence analysis of the method which demonstrates that the iterates in general converge to the $x_0$-minimum-norm solution. In many applications, however, we are interested in reconstructing solutions with other features, such as non-negativity, sparsity, and piece-wise constancy. Incorporating such feature information into the algorithm design can significantly improve the reconstruction accuracy. Assume the feature of the sought solution can be detected by a convex function. We may consider determining a solution of (1) such that
(22) |
We assume the following condition on .
Assumption 3.1.
is proper, lower semi-continuous and strongly convex in the sense that there is a constant such that
for all and . Moreover, has the separable structure
where each is a function from to .
Under Assumption 3.1, it is easy to see that each is proper, lower semi-continuous, and strongly convex from to . Furthermore, let denote the subdifferential of ; then the following facts on hold (see [12, 34]):
-
(i)
For any with and there holds
(23) where
is the Bregman distance induced by at in the direction .
-
(ii)
For any , and there holds
-
(iii)
For all , and there holds
(24)
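As a quick illustration of the Bregman distance appearing in item (i) above, recall the standard definition $D_\theta^{\xi}(\bar x, x) := \theta(\bar x) - \theta(x) - \langle \xi, \bar x - x \rangle$ for $\xi \in \partial\theta(x)$; this is a generic fact, independent of the particular penalty used later in this paper. For the quadratic choice $\theta(x) = \frac{1}{2}\|x\|^2$ one has $\partial\theta(x) = \{x\}$ and
$D_\theta^{x}(\bar x, x) = \frac{1}{2}\|\bar x\|^2 - \frac{1}{2}\|x\|^2 - \langle x, \bar x - x \rangle = \frac{1}{2}\|\bar x - x\|^2,$
so in this case the Bregman distance reduces to half the squared norm distance.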
Let us elucidate how to extend the method (9) to solve (1) so that the convex function can be incorporated to detect the solution feature. By introducing with
we can rewrite (9) as
Assume and take , we may use the Bregman distance to replace in the above equation to obtain the new updating formula
Letting , then we have
Recall the definition of , we can see that with
(27) |
Under Assumption 3.1, it is known that is uniquely defined and . According to the separable structure of , it is easy to see that with
(30) |
Since , we may repeat the above procedure. This leads us to propose the following algorithm.
Algorithm 3 (RBCD method with convex regularization function).
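Since the full statement of Algorithm 3 follows the derivation above, we record here only a sketch of the primal-dual structure that this derivation suggests: a dual block variable is updated by the partial gradient, and the primal block is recovered by minimizing $\theta_i(z) - \langle \xi_i, z \rangle$. This is an illustration under our reading of the derivation, not a verbatim transcription of Algorithm 3; the prox solver is assumed to be supplied by the user.

```python
import numpy as np

def rbcd_regularized(A_blocks, y_delta, gamma, n_iters, prox, rng=None):
    """Sketch of an RBCD step with a convex penalty theta_i on each block.

    prox(i, xi) should return argmin_z { theta_i(z) - <xi, z> }; e.g. a TV denoising
    solver when theta_i(z) = 0.5 * ||z||^2 + beta * TV(z).
    """
    rng = np.random.default_rng() if rng is None else rng
    p = len(A_blocks)
    xi = [np.zeros(A.shape[1]) for A in A_blocks]        # dual block variables, xi_0 = 0
    x = [prox(i, xi[i]) for i in range(p)]               # primal blocks induced by the penalty
    r = sum(A @ xb for A, xb in zip(A_blocks, x)) - y_delta

    for _ in range(n_iters):
        i = rng.integers(p)
        xi[i] = xi[i] - gamma * (A_blocks[i].T @ r)      # dual update with the partial gradient
        x_new = prox(i, xi[i])                           # primal update through the penalty
        r = r + A_blocks[i] @ (x_new - x[i])             # recursive residual update
        x[i] = x_new
    return x

# Consistency check: with theta_i(z) = 0.5 * ||z||^2 we have prox(i, xi) = xi,
# and the scheme reduces to Algorithm 1.
```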
The analysis of Algorithm 3 is rather challenging. In the following we will prove some results which support the use of Algorithm 3 for solving ill-posed problems. We start with the following lemma.
Lemma 3.2.
Proof.
Theorem 3.3.
Proof.
Since is finite dimensional and maps to , the convex function is Lipschitz continuous on bounded sets and for all ; moreover, for any bounded set there is a constant such that for all and .
In the following we will use the similar argument in the proof of Theorem 2.4 to show the almost sure convergence of . Let denote the set of solutions of (1) in . By using Lemma 3.2 with exact data and Proposition 2.3 we can conclude for any that the event
has probability one. Since is separable, we can find a countable set such that is dense in . Let
Then . We now show that, for any , along any sample path in the sequence is convergent. To see this, we take a sequence such that as . For any sample path we have
Since , for a fixed , is convergent, it is bounded. By (23), is then bounded and hence we can find a constant such that for all . Therefore
This together with the existence of implies
Letting then shows that exists and is finite for every and .
Next we show the almost sure convergence of . By using Lemma 3.2 with exact data we can conclude the event
has probability one. Let . Then . Note that, along any sample path in , is convergent for any and thus and are bounded. Thus, we can find a subsequence of positive integers such that and as . Because , we have . Since as , we can obtain , i.e. . Consequently is convergent. To show , it suffices to show that is the unique cluster point of . Let be any cluster point of . Then, by using the boundedness of , there is another subsequence of positive integers such that and as . Then , and thus is convergent. Noting the identity
we can conclude that exists. Therefore
and thus . Since and , we may use the strong convexity of to conclude . The proof is complete. ∎
For Algorithm 3 with noisy data, we have the following convergence result under an a priori stopping rule.
Theorem 3.4.
Let be finite dimensional and let satisfy Assumption 3.1. Consider Algorithm 3 with . Let be the random sequence determined by Algorithm 3 with the exact data. If is uniformly bounded in the sense of (16) and if the integer is chosen such that and as , then
as , where denotes the unique solution of (1) satisfying (22).
Proof.
According to Theorem 3.3, there is a random solution of (1) such that as almost surely. Since is assumed to satisfy (16), we can use the dominated convergence theorem to conclude as . Based on this, we will show
(31) |
Indeed, by using a similar argument in Remark 2.5 and noting we can conclude that
(32) |
According to the definition of , the convex function on achieves its minimum at and thus . Note that
Consequently, there exists such that . This depends on and hence it is a random element in . Nevertheless, . Combining this with (32) gives
for all integers . Since is uniformly bounded in the sense of (16) and is Lipschitz continuous on bounded sets, we can find a constant such that and for all almost surely. Thus
as . Consequently
By the strong convexity of , we have . Therefore which shows (31).
Next, by using the definition of and , it is easy to show by an induction argument that, along any sample path there holds
(33) |
for every integer .
Finally, we show as if is chosen such that and as . To see this, let for all . From Lemma 3.2 it follows that
Let be any fixed integer. Since , we have for small . Consequently, by virtue of the above inequality, we can obtain
Since as , we may use (33) and the continuity of to further obtain
Since satisfies (16) and is Lipschitz continuous on bounded sets, there is a constant such that
Therefore
for all . By taking and using (31) we thus have which implies as . Consequently, it follows from the strong convexity of that as . ∎
In the above results we considered Algorithm 3 by terminating the iterations via a priori stopping rules. Analogous to Algorithm 2, we may consider incorporating the discrepancy principle into Algorithm 3. This leads to the following algorithm.
Algorithm 4.
For Algorithm 4 we have the following result which in particular shows that the iteration must terminate in finitely many steps almost surely, i.e. there is an event of probability one such that, along any sample path in this event, for sufficiently large .
Theorem 3.5.
Proof.
For any integer let . By noting that is -measurable, we may use the similar argument in the proof of Lemma 3.2 to obtain
According to the definition of , we have and . Therefore
Taking the full expectation then gives
(35) |
for all , where .
Based on (35) we now show that the method must terminate after finitely many steps almost surely. To see this, consider the event
It suffices to show . By virtue of (35) we have
and hence for any integer that
(36) |
Let denote the characteristic function of , i.e. if and otherwise. Then
Combining this with (36) gives
for all and hence as . Thus .
Next we show (34) under the conditions given in Theorem 3.4. Let be any integer. Since , we have for small . By virtue of (35) and analogous to the proof of Theorem 3.4 we can obtain
where is a generic constant independent of . Letting and using (31) we thus obtain as which together with the strong convexity of implies (34). The proof is therefore complete. ∎
4. Numerical results
In this section, we present numerical simulations to verify the theoretical results and test the performance of the RBCD method for solving some specific linear ill-posed problems that arise from two imaging modalities, namely X-ray computed tomography (CT) and coded aperture compressive temporal imaging. In all simulations, the noisy data $y^\delta$ is generated from the exact data $y$ by
$y^\delta = y + \delta_{\mathrm{rel}} \|y\| \frac{\xi}{\|\xi\|},$
where $\delta_{\mathrm{rel}}$ is the relative noise level and $\xi$ is a normalized random noise obeying the standard Gaussian distribution; the noise level is then $\delta = \delta_{\mathrm{rel}} \|y\|$. The experiments are carried out in MATLAB 2021a on a laptop computer (1.80 GHz Intel Core i7 processor with 16 GB random access memory).
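A small sketch of this noise model, under the normalization written above (which is itself a reconstruction from the surrounding text):

```python
import numpy as np

def add_noise(y, rel_noise, rng=None):
    """Return noisy data y_delta and the noise level delta = rel_noise * ||y||."""
    rng = np.random.default_rng() if rng is None else rng
    xi = rng.standard_normal(y.shape)                     # Gaussian noise
    y_delta = y + rel_noise * np.linalg.norm(y) * xi / np.linalg.norm(xi)
    return y_delta, rel_noise * np.linalg.norm(y)
```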
4.1. Computed tomography
As the first test example, we consider standard 2D parallel beam X-ray CT from simulated tomographic data to illustrate the performance of the RBCD method in this classical imaging task. The process in this modality consists of reconstructing a slice through the patient's body by collecting the attenuation of the X-ray dose as it passes through biological tissues, which can be mathematically expressed as finding a compactly supported function from its line integrals, the so-called Radon transform.
In this example, we discretize the sought image into a 256×256 pixel grid and use the function paralleltomo from the package AIR Tools [9] in MATLAB to generate the discrete model. The true images used in our experiments are the Shepp-Logan phantom and a real chest CT image (see Figure 1). The real chest CT image is provided by the dataset chestVolume in MATLAB. Considering the reconstruction from tomographic data with a given number of projections and 367 X-ray lines per projection leads to a linear system whose coefficient matrix has one row per X-ray measurement and one column per pixel. The true image vector is obtained by stacking all the columns of the original image matrix, and the exact data is computed by applying the coefficient matrix to it. We divide the true solution into $p$ blocks equally, and then the coefficient matrix can also be divided into $p$ blocks of columns correspondingly.
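To indicate how the block structure is obtained in this example, the following sketch splits a sparse system matrix into $p$ column blocks corresponding to an equal partition of the image pixels. The matrix itself is assumed to be available (e.g., exported from the AIR Tools routine mentioned above); the helper name is ours.

```python
import numpy as np
import scipy.sparse as sp

def split_into_column_blocks(A, p):
    """Split a matrix A with N columns into p column blocks of (nearly) equal size."""
    A = sp.csc_matrix(A)                            # column slicing is cheap in CSC format
    N = A.shape[1]
    edges = np.linspace(0, N, p + 1, dtype=int)
    return [A[:, edges[i]:edges[i + 1]] for i in range(p)]

# The RBCD iteration then touches only one block per step, e.g. A_blocks[i].T @ r
# and A_blocks[i] @ step, instead of the whole matrix A.
```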
[Figure 1. True images used in the CT experiments: the Shepp-Logan phantom and a real chest CT image.]
Assuming the image to be reconstructed is the Shepp-Logan phantom, we compare the RBCD method with the cyclic block coordinate descent (CBCD) method studied in [21] for solving the full angle CT problem from the exact tomographic data with 90 projection angles evenly distributed between 1 and 180 degrees. In theory, the inversion problem with complete tomographic data is mildly ill-posed. To make a fair comparison, we run both the RBCD method (Algorithm 1) and the CBCD method for 4000 iteration steps with the same step size, block number, and initial guess. The reconstruction results are reported in the left and middle figures of Figure 2, which shows that the two methods produce similar results from the noiseless tomographic data. To clarify the difference between the two methods, we also plot, in the right figure of Figure 2, the relative square errors of the CBCD method over 500 iterations and the relative mean square errors of the RBCD method over 500 iterations, the latter calculated approximately by averaging 100 independent runs. To the best of our knowledge, for the general linear ill-posed problem (1), neither the convergence of the CBCD method with noiseless data nor its regularization property with noisy data has been established so far.
[Figure 2. Reconstructions of the Shepp-Logan phantom by the RBCD and CBCD methods from noiseless tomographic data (left and middle) and the corresponding relative errors over 500 iterations (right).]
block number | 1 | 2 | 4 | 8 | 16 |
---|---|---|---|---|---|
iteration number | 202 | 205 | 424 | 870 | 1819 |
Time (s) | 9.4262 | 4.9443 | 5.1250 | 5.5956 | 6.5644 |
To further compare the RBCD method with the Landweber iteration, we consider Algorithm 1 with various block numbers for solving the same problem using the same exact tomographic data. Note that the Landweber iteration can be seen as the special case of Algorithm 1 with block number $p = 1$. To give a fair comparison, for all experiments with different block numbers we use the same step size and terminate the algorithm when the relative error drops below a prescribed tolerance for the first time. We perform 100 independent runs and calculate the average of the required iteration numbers and computational times. The results of these experiments are recorded in Table 1, which shows that, to reach the same relative error, more iterations are required if a larger block number is used; however, the computational times of the RBCD method with block numbers larger than one are less than that of the Landweber iteration (i.e., $p = 1$).
[Figure 3. Reconstruction results for sparse view tomography at different noise levels.]
[Figure 4. Reconstruction results for limited angle tomography at different noise levels.]
It is well known that if the tomographic data is not complete (here, we only consider the case where the collection of the angular variable of the Radon transform is incomplete), the computed tomography problem is severely ill-posed [16]. To test the performance on this ill-posed imaging problem, we use the RBCD method to solve the CT problem from incomplete view (including limited angle and sparse view cases) tomographic data with three distinct relative noise levels. The original images are again the Shepp-Logan phantom and the real chest CT image in Figure 1. We consider simulated sparse view tomographic data with 60 projection angles evenly distributed between 1 and 180 degrees and simulated limited angle tomographic data with 160 projection angles within a restricted angular range, respectively. In these experiments, we apply Algorithm 2 and plot the reconstruction results of the sparse view tomography and limited angle tomography in Figure 3 and Figure 4, respectively. It can be seen from Figures 3 and 4 that Algorithm 2 can always terminate within a suitable number of iterations and thus reconstruct a valid image from the incomplete view tomographic data with relatively small noise (i.e., no more than 0.02 for the Shepp-Logan phantom and 0.01 for the real chest CT image). For the reconstruction results of limited angle tomography, we can observe that the boundary of the image which is tangent to the missing angles in the data is hard to recover, and there exist streak artifacts in the reconstruction results. The theoretical reasons for these phenomena can be found in [3, 7, 20].
4.2. Compressive temporal imaging
In this example, the RBCD method is applied to a video reconstruction application, the so-called coded aperture compressive temporal imaging [15]. In this imaging modality, one uses a 2D detector to capture 3D video data in only one snapshot measurement through a time-varying mask stack and then reconstructs the video from the coded compressed data using an appropriate algorithm.
As described in (2), (3), and (4) in the introduction, the coded aperture compressive temporal imaging in the continuous case can be formulated as the general problem (1). To numerically implement this application, we need to discretize the continuous model appropriately.
Let $\Delta_i$ be the $i$-th temporal interval defined in (3), with $|\Delta_i| = T / p$ for each $i$. If $p$ is large enough so that the length of each interval is short enough relative to the total interval length $T$, we may assume that the functions $f$ and $\phi$ restricted to $\Omega \times \Delta_i$ only depend on the spatial variable $s$, i.e. they are time-invariant within each interval. Thus, for any $s \in \Omega$ and $t \in \Delta_i$, we may write $f(s, t) \approx f_i(s)$ and $\phi(s, t) \approx \phi_i(s)$.
Denote the video to be reconstructed by $(f_1, \ldots, f_p)$, where $f_i$ is the frame within the $i$-th time interval, denote the snapshot measurement by $g$, and suppose the mask stack $(\phi_1, \ldots, \phi_p)$ is known. Then the mathematical problem in the semi-discrete scheme is to solve the following equation given the measurement $g$ and the known masks $\phi_i$:
(37)  $\sum_{i=1}^p \phi_i(s)\, f_i(s) = g(s), \qquad s \in \Omega.$
Next, we discretize the spatial domain into an $n \times n$ pixel grid. The frames of the video, the masks in the stack, and the snapshot measurement are then represented by matrices $F_i$, $\Phi_i$ ($i = 1, \ldots, p$), and $G$, respectively. Let $x_i$ and $g$ be the vectors obtained by stacking all columns of the matrices $F_i$ and $G$, and define $D_i$ by arranging all the elements of $\Phi_i$ on the diagonal of a matrix correspondingly. Therefore, equation (37) is transformed into the fully discrete linear equations
(38)  $D_1 x_1 + D_2 x_2 + \cdots + D_p x_p = g.$
In practice, the number of frames in the video can naturally be chosen as the block number for numerical experiments.
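Under this discretization (which follows our reconstruction of (37) and (38), so the details are indicative rather than definitive), the block forward operators are diagonal matrices built from the masks and therefore never need to be stored as full matrices:

```python
import numpy as np

def forward(masks, frames):
    """Snapshot measurement sum_i diag(mask_i) x_i; masks and frames are lists of n-by-n arrays."""
    return sum(m * f for m, f in zip(masks, frames))   # elementwise product = diagonal operator

def adjoint_block(mask, residual):
    """Adjoint of the i-th block applied to a residual image: diag(mask_i)^T r = mask_i * r."""
    return mask * residual
```

With these two operations, one step of Algorithm 1 or 2 for this problem costs only a few elementwise multiplications per frame.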
[Figure 5. Frames of the true video Runner.]
[Figure 6. Frames of the true video Kobe.]
[Figure 7. Exact snapshot measurements of the datasets Runner and Kobe.]
We assume that the true videos are chosen from the datasets Runner [23] and Kobe [31], as shown in Figures 5 and 6. The masks in the stack are selected as shifting random binary matrices, in which the first mask is simulated by a random binary matrix with elements drawn from a Bernoulli distribution with parameter 0.5 and the subsequent masks are shifted by one pixel horizontally in order [31]. The true solution is the stacked video and the forward matrix is assembled from the masks as in (38). The exact snapshot measurements of the datasets Runner and Kobe, obtained by applying the forward matrix to the true solutions, are shown in Figure 7. In this application, we need to recover the video from the noisy measurement with a given relative noise level. This implies that the linear system (38) is under-determined due to the compression between different frames. Additionally, it is important to note that a significant amount of useful information in each frame of the original videos is lost, since nearly half of the elements in every mask are zero.
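A possible way to simulate such a shifting random binary mask stack is sketched below; the Bernoulli parameter 0.5 and the one-pixel horizontal shift follow the description above, while the circular shift and the function name are simplifying choices of ours.

```python
import numpy as np

def shifting_binary_masks(n, p, prob=0.5, rng=None):
    """First mask: i.i.d. Bernoulli(prob) entries; each subsequent mask is the previous one
    shifted by one pixel horizontally (circularly, for simplicity)."""
    rng = np.random.default_rng() if rng is None else rng
    first = (rng.random((n, n)) < prob).astype(float)
    return [np.roll(first, shift=i, axis=1) for i in range(p)]
```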
[Figure 8. Reconstruction of the video Runner by Algorithm 3.]
[Figure 9. Reconstruction of the video Kobe by Algorithm 3.]
To solve this ill-posed video reconstruction problem, we incorporate the piece-wise constancy feature of the video into our proposed method, Algorithm 3. To this end, we take
(39)  $\theta_i(z) := \frac{1}{2}\|z\|^2 + \beta\, \mathrm{TV}(z), \qquad i = 1, \ldots, p,$
where $\beta$ is a positive constant and $\mathrm{TV}(z)$ denotes the 2-dimensional isotropic total variation of the frame $z$; see [1]. It is clear that this choice satisfies Assumption 3.1. In order to implement Algorithm 3, we need to update each primal block from the corresponding dual block, which requires solving a total variation denoising problem of the form
$\min_z \Big\{ \frac{1}{2}\|z - c\|^2 + \beta\, \mathrm{TV}(z) \Big\}.$
In the experiments, this sub-problem is solved by the iterative clipping algorithm [1]. When executing Algorithm 3, we use suitable step sizes together with regularization parameters chosen separately for the datasets Runner and Kobe. We run the algorithm for a fixed number of iteration steps and report the reconstruction results in Figures 8 and 9. The detailed numerical results of the above experiments are recorded in Table 2, including the computational time, the peak signal-to-noise ratio (PSNR), the structural similarity index measure (SSIM) [29], and the relative error. These results demonstrate that, with a TV denoiser in the domain of the primal variable, Algorithm 3 can recover a great deal of missing information from the compressed snapshot measurement and thus reconstruct a good approximation of the original video within an acceptable amount of time, even if the data is corrupted by noise.
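For reference, the 2-dimensional isotropic total variation used in (39) can be evaluated with forward differences as follows; this is only the evaluation of the penalty (with replication at the boundary), not the iterative clipping solver of [1] used for the denoising sub-problem.

```python
import numpy as np

def isotropic_tv(u):
    """2D isotropic total variation of an image u, using forward differences."""
    dx = np.diff(u, axis=1, append=u[:, -1:])   # horizontal differences (zero in the last column)
    dy = np.diff(u, axis=0, append=u[-1:, :])   # vertical differences (zero in the last row)
    return float(np.sum(np.sqrt(dx**2 + dy**2)))
```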
 | Time (s) | PSNR (dB) | SSIM | relative error |
---|---|---|---|---|
Runner | 23.5595 | 27.8292 | 0.8012 | 0.0154 |
Kobe | 23.6561 | 28.3153 | 0.8883 | 0.0216 |
Considering the same problem for Runner and Kobe using the noisy data, we run Algorithm 4 with the penalty given by (39), choosing the regularization parameter separately for the datasets Runner and Kobe. This algorithm terminates the iteration based on the a posteriori discrepancy principle. The reconstruction results are shown in Figures 10 and 11. We also record the stopping index, PSNR, SSIM, and relative error of the reconstructions in Table 3. It can be observed that Algorithm 4 consistently terminates the iterations within a finite number of steps and provides efficient reconstructions of the true videos, even when the data is corrupted by noise. Therefore, Algorithm 4 effectively meets the key requirement of automatic stopping for this large-scale video reconstruction task in practice.
[Figure 10. Reconstruction of the video Runner by Algorithm 4.]
[Figure 11. Reconstruction of the video Kobe by Algorithm 4.]
5. Conclusions
In this paper, we addressed linear ill-posed inverse problems of the separable form in Hilbert spaces and proposed a randomized block coordinate descent (RBCD) method for solving such problems. Although the RBCD method has been extensively studied for well-posed convex optimization problems, there has been a lack of convergence analysis for ill-posed problems. In this work, we investigated the convergence properties of the RBCD method with noisy data under both a priori and a posteriori stopping rules. We proved that the RBCD method, when combined with an a priori stopping rule, serves as a convergent regularization method in the sense of weak convergence almost surely. Additionally, we explored the early stopping of the RBCD method, demonstrating that the discrepancy principle can terminate the iteration after a finite number of steps almost surely. Furthermore, we developed a strategy to incorporate convex regularization terms into the RBCD method to enhance the detection of solution features. To validate the performance of our proposed method, we conducted numerical simulations, demonstrating its effectiveness.
It should be noted that when the sought solution is smooth and decomposed into many small blocks artificially, the numerical results obtained by the RBCD method, which are not reported here, may not meet expectations due to the mismatch between adjacent blocks. Therefore, there is significant interest in developing novel strategies to overcome this mismatch between adjacent blocks, thereby improving the performance of the RBCD method. However, in cases where the object to be reconstructed naturally consists of many separate frames, such as the coded aperture compressive temporal imaging, the RBCD method demonstrates its effectiveness and yields satisfactory results.
Acknowledgement
The work of Q. Jin is partially supported by the Future Fellowship of the Australian Research Council (FT170100231). The work of D. Liu is supported by the China Scholarship Council program (Project ID: 202307090070).
References
- [1] A. Beck and M. Teboulle, Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems, IEEE Trans. Image Process., 18 (2009), pp. 2419–2434.
- [2] D. P. Bertsekas, Incremental proximal methods for large scale convex optimization, Math. Program., 129 (2011), pp. 163–195.
- [3] L. Borg, J. Frikel, J. S. Jørgensen and E. T. Quinto, Analyzing reconstruction artifacts from arbitrary incomplete X-ray CT data, SIAM J. Imaging Sci., 11 (2018), pp. 2786–2814.
- [4] P. Brémaud, Probability theory and stochastic processes, Universitext, Springer, Cham, 2020.
- [5] P. L. Combettes and J.-C. Pesquet, Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping, SIAM J. Optim., 25 (2015), pp. 1221–1248.
- [6] H. W. Engl, M. Hanke and A. Neubauer, Regularization of Inverse Problems, Dordrecht, Kluwer, 1996.
- [7] J. Frikel and E. T. Quinto, Characterization and reduction of artifacts in limited angle tomography, Inverse Problems, 29 (2013), 125007.
- [8] M. Haltmeier, A. Leitão and O. Scherzer, Kaczmarz methods for regularizing nonlinear ill-posed equations. I. Convergence analysis, Inverse Probl. Imaging, 1 (2007), pp. 289–298.
- [9] P. C. Hansen and M. Saxild-Hansen, AIR tools—a MATLAB package of algebraic iterative reconstruction methods, J. Comput. Appl. Math., 236 (2012), pp. 2167–2178.
- [10] Q. Jin, Landweber-Kaczmarz method in Banach spaces with inexact inner solvers, Inverse Problems, 32 (2016), 104005.
- [11] Q. Jin, X. Lu and L. Zhang, Stochastic mirror descent method for linear ill-posed problems in Banach spaces, Inverse Problems, 39 (2023), 065010.
- [12] Q. Jin and W. Wang, Landweber iteration of Kaczmarz type with general non-smooth convex penalty functionals, Inverse Problems, 29 (2013), no. 8, 085011, 22 pp.
- [13] R. Kowar and O. Scherzer, Convergence analysis of a Landweber-Kaczmarz method for solving nonlinear ill-posed problems, Ill-posed and inverse problems, 253–270, VSP, Zeist, 2002.
- [14] C. Liu, J. Qiu, and M. Jiang, Light field reconstruction from projection modeling of focal stack, Opt. Express, 25 (2017), pp. 11377–11388.
- [15] P. Llull, X. Liao, X. Yuan, J. Yang, D. Kittle, L. Carin, G. Sapiro, and D. J. Brady, Coded aperture compressive temporal imaging, Opt. Express, 21 (2013), pp. 10526–10545.
- [16] A. K. Louis, Incomplete data problems in X-ray computerized tomography. I. singular value decomposition of the limited angle transform, Numer. Math., 48 (1986), pp. 251–262.
- [17] Z. Lu and L. Xiao, On the complexity analysis of randomized block-coordinate descent methods, Math. Program., 152 (2015), 615–642.
- [18] F. Natterer, The Mathematics of Computerized Tomography, SIAM, Philadelphia, 2001.
- [19] Y. Nesterov, Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM J. Optim., 22 (2012), no. 2, pp. 341–362.
- [20] E. T. Quinto, Singularities of the X-ray transform and limited data tomography in and , SIAM J. Appl. Math., 24 (1993), pp. 1215–1225.
- [21] S. Rabanser, L. Neumann and M. Haltmeier, Analysis of the block coordinate descent method for linear ill-posed problems, SIAM J. Imaging Sci., 12 (2019), no. 4, pp. 1808–1832.
- [22] P. Richtárik and M. Takác, Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function, Math. Program., 144 (2014), 1–38.
- [23] Runner data, https://www.videvo.net/video/elite-runner-slow-motion/4541/.
- [24] A. Saha and A. Tewari, On the nonasymptotic convergence of cyclic coordinate descent methods, SIAM J. Optim., 23 (2013), pp. 576–601.
- [25] P. Tseng, Convergence of a block coordinate descent method for nondifferentiable minimization, J. Optim. Theory Appl., 109 (2001), 475–494.
- [26] P. Tseng and S. Yun, Block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization, J. Optim. Theory Appl. 140 (2009), 513–535.
- [27] P. Tseng and S. Yun, A coordinate gradient descent method for nonsmooth separable minimization, Math. Program., 117 (2009), 387–423.
- [28] A. Wagadarikar, R. John, R. Willett and D. J. Brady, Single disperser design for coded aperture snapshot spectral imaging, Appl. Opt., 47 (2008), pp. B44–B51.
- [29] Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process., 13 (2004), pp. 600–612.
- [30] S. J. Wright, Coordinate descent algorithms, Math. Program., 151 (2015), pp. 3–34.
- [31] J. Yang, X. Yuan, X. Liao, P. Llull, G. Sapiro, D. J. Brady and L. Carin, Video compressive sensing using Gaussian mixture models, IEEE Trans. Image Process., 23 (2014), pp. 4863–4878.
- [32] X. Yin, G. Wang, W. Li and Q. Liao, Iteratively reconstructing 4D light fields from focal stacks, Appl. Opt., 55 (2016), pp. 8457–8463.
- [33] X. Yuan, D. J. Brady and A. K. Katsaggelos, Snapshot compressive imaging: theory, algorithms, and applications, IEEE Signal Process. Mag., 38 (2021), pp. 65–88.
- [34] C. Zălinescu, Convex Analysis in General Vector Spaces, World Scientific Publishing Co., Inc., River Edge, New Jersey, 2002.