
Convergence rates for the joint solution of inverse problems with compressed sensing data

Andrea Ebner Department of Mathematics, University of Innsbruck
Technikerstrasse 13, 6020 Innsbruck, Austria
E-mail: [email protected]
Markus Haltmeier Department of Mathematics, University of Innsbruck
Technikerstrasse 13, 6020 Innsbruck, Austria
E-mail: [email protected]
(May 15, 2022)
Abstract

Compressed sensing (CS) is a powerful tool for reducing the amount of data to be collected while maintaining high spatial resolution. Such techniques work well in practice and at the same time are supported by solid theory. Standard CS results assume measurements to be made directly on the targeted signal. In many practical applications, however, CS information can only be taken from indirect data $h_{\star}=\mathbf{W}x_{\star}$ related to the original signal by an additional forward operator. If inverting the forward operator is ill-posed, then existing CS theory is not applicable. In this paper, we address this issue and present two joint reconstruction approaches, namely relaxed $\ell^{1}$ co-regularization and strict $\ell^{1}$ co-regularization, for CS from indirect data. As main results, we derive error estimates for recovering $x_{\star}$ and $h_{\star}$. In particular, we derive a linear convergence rate in the norm for the latter. To obtain these results, solutions are required to satisfy a source condition and the CS measurement operator is required to satisfy a restricted injectivity condition. We further show that these conditions are not only sufficient but even necessary to obtain linear convergence.

Keywords: Compressed sensing from indirect data, joint recovery, inverse problems, regularization, convergence rate, sparse recovery

1 Introduction

Compressed sensing (CS) allows one to significantly reduce the amount of measurements while keeping high spatial resolution [4, 7, 9]. In mathematical terms, CS requires recovering a targeted signal $x_{\star}\in\mathbb{X}$ from data $y^{\delta}=\mathbf{M}x_{\star}+z^{\delta}$. Here $\mathbf{M}\colon\mathbb{X}\to\mathbb{Y}$ is the CS measurement operator, $\mathbb{X}$, $\mathbb{Y}$ are Hilbert spaces and $z^{\delta}\in\mathbb{Y}$ is the unknown data perturbation with $\|z^{\delta}\|\leq\delta$. CS theory shows that even when the measurement operator is severely underdetermined one can derive linear error estimates $\|x^{\delta}-x_{\star}\|=\mathcal{O}(\delta)$ for the CS reconstruction $x^{\delta}$. Such results can be derived uniformly for all sparse $x_{\star}\in\mathbb{X}$ assuming the restricted isometry property (RIP), requiring that $\|\mathbf{M}x_{1}-\mathbf{M}x_{2}\|\asymp\|x_{1}-x_{2}\|$ for sufficiently sparse elements [5]. The RIP is known to be satisfied with high probability for a wide range of random matrices [1]. Under a restricted injectivity condition, related results for elements satisfying a range condition are derived in [11, 12]. In [10] a strong form of the source condition has been shown to be sufficient and necessary for the uniqueness of $\ell^{1}$ minimization. In [12] it is shown that the RIP implies the source condition and the restricted injectivity for all sufficiently sparse elements.

1.1 Problem formulation

In many applications, CS measurements can only be made on indirect data $h_{\star}=\mathbf{W}x_{\star}$ instead of the targeted signal $x_{\star}\in\mathbb{X}$, where $\mathbf{W}\colon\mathbb{X}\to\mathbb{H}$ is the forward operator coming from the specific application at hand. For example, in computed tomography, the forward operator is the Radon transform, and in microscopy, the forward operator is a convolution operator. The problem of recovering $x_{\star}\in\mathbb{X}$ from CS measurements of indirect data becomes

Recover $x_{\star}$ from $y^{\delta}=\mathbf{A}\mathbf{W}x_{\star}+z^{\delta}$. (1.1)

where $\mathbf{A}\colon\mathbb{H}\to\mathbb{Y}$ is the CS measurement operator. In this paper we study the stable solution of (1.1).
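To make the composite model concrete, the following minimal Python sketch simulates data of the form (1.1). The discretized integration operator for $\mathbf{W}$ and the Bernoulli matrix for $\mathbf{A}$ are illustrative assumptions borrowed from the numerical example in Section 3.3; all sizes and the test signal are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

N, m = 256, 64                       # signal length, number of CS measurements
W = np.tril(np.ones((N, N))) / N     # discrete integration operator (ill-posed W)
A = rng.integers(0, 2, (m, N)).astype(float)  # Bernoulli 0/1 CS matrix

x_true = np.zeros(N)                 # piecewise constant test signal
x_true[60:120] = 1.0
x_true[150:180] = -0.5

h_true = W @ x_true                  # indirect data h = W x
delta = 1e-3                         # noise level
z = rng.standard_normal(m)
z *= delta / np.linalg.norm(z)       # perturbation scaled to ||z|| = delta

y_delta = A @ h_true + z             # CS data y = A W x + z, as in (1.1)
```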

The naive reconstruction approach is the single-step approach, which considers (1.1) as a standard CS problem with the composite measurement operator $\mathbf{M}=\mathbf{A}\mathbf{W}$. However, CS recovery conditions (such as the RIP) are not expected to hold for the composite operator $\mathbf{A}\mathbf{W}$ due to the ill-posedness of the operator $\mathbf{W}$. As an alternative one may use a two-step approach, where one first solves the CS problem of recovering $\mathbf{W}x_{\star}$ and afterwards inverts the operator equation of the inverse problem. Apart from the additional effort, both recovery problems need to be regularized and the risk of error propagation is high. Moreover, recovering $h_{\star}\in\mathbb{H}$ from sparsity alone suffers from increased non-uniqueness if $\overline{\operatorname{ran}(\mathbf{W})}\subsetneq\mathbb{H}$.

1.2 Proposed $\ell^{1}$ co-regularization

In order to overcome the drawbacks of the single-step and the two-step approach, we introduce two joint reconstruction methods for solving (1.1), using a weighted $\ell^{1}$ norm $\|\cdot\|_{1,\kappa}$ (defined in (2.1)) to address the CS part and variational regularization with an additional penalty $\mathcal{R}$ to address the inverse problem part. More precisely, we study the following two regularization approaches.

(a) Strict $\ell^{1}$ co-regularization: Here we construct a regularized solution pair $(x_{\alpha}^{\delta},h_{\alpha}^{\delta})$ with $h_{\alpha}^{\delta}=\mathbf{W}x_{\alpha}^{\delta}$ by minimizing

$\mathcal{A}_{\alpha,y^{\delta}}(x)\coloneqq\frac{1}{2}\|\mathbf{A}\mathbf{W}x-y^{\delta}\|^{2}+\alpha\bigl(\mathcal{R}(x)+\|\mathbf{W}x\|_{1,\kappa}\bigr),$ (1.2)

where $\alpha>0$ is a regularization parameter. This is equivalent to minimizing $\|\mathbf{A}h-y^{\delta}\|^{2}/2+\alpha(\mathcal{R}(x)+\|h\|_{1,\kappa})$ under the strict constraint $h=\mathbf{W}x$.

(b) Relaxed $\ell^{1}$ co-regularization: Here we relax the constraint $h=\mathbf{W}x$ by adding a penalty and construct a regularized solution $(x_{\alpha}^{\delta},h_{\alpha}^{\delta})$ by minimizing

$\mathcal{B}_{\alpha,y^{\delta}}(x,h)\coloneqq\frac{1}{2}\|\mathbf{W}x-h\|^{2}+\frac{1}{2}\|\mathbf{A}h-y^{\delta}\|^{2}+\alpha\bigl(\mathcal{R}(x)+\|h\|_{1,\kappa}\bigr).$ (1.3)

The relaxed version in particular allows some defect between $\mathbf{W}x_{\alpha}^{\delta}$ and $h_{\alpha}^{\delta}$. A numerical sketch of both objectives is given after this list.
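To fix ideas, here is a minimal Python sketch of the two objective functionals (1.2) and (1.3), under the simplifying assumptions that $(\phi_{\lambda})_{\lambda\in\Lambda}$ is the canonical basis of $\mathbb{R}^{N}$ and that $\mathcal{R}=\|\cdot\|^{2}/2$; the function names are hypothetical.

```python
import numpy as np

def strict_objective(x, A, W, y, alpha, kappa):
    # A_{alpha,y}(x) = ||A W x - y||^2/2 + alpha (R(x) + ||W x||_{1,kappa}),
    # here with the quadratic penalty R(x) = ||x||^2/2
    h = W @ x
    return (0.5 * np.linalg.norm(A @ h - y) ** 2
            + alpha * (0.5 * np.linalg.norm(x) ** 2
                       + np.sum(kappa * np.abs(h))))

def relaxed_objective(x, h, A, W, y, alpha, kappa):
    # B_{alpha,y}(x, h) = ||W x - h||^2/2 + ||A h - y||^2/2
    #                     + alpha (R(x) + ||h||_{1,kappa})
    return (0.5 * np.linalg.norm(W @ x - h) ** 2
            + 0.5 * np.linalg.norm(A @ h - y) ** 2
            + alpha * (0.5 * np.linalg.norm(x) ** 2
                       + np.sum(kappa * np.abs(h))))
```

The strict model optimizes over $x$ only, while the relaxed model treats $x$ and $h$ as independent variables coupled by the quadratic term $\|\mathbf{W}x-h\|^{2}/2$.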

Under standard assumptions, both the strict and the relaxed version provide convergent regularization methods [18].

1.3 Main results

As main results of this paper, under the parameter choice $\alpha\asymp\delta$, we derive the linear convergence rates (see Theorems 2.7 and 2.8)

π’ŸΞΎβ„›β€‹(xΞ±Ξ΄,x⋆)\displaystyle\mathcal{D}_{\xi}^{\mathcal{R}}(x_{\alpha}^{\delta},x_{\star}) =π’ͺ​(Ξ΄)Β as ​δ→0\displaystyle=\mathcal{O}(\delta)\quad\text{ as }\delta\to 0
β€–hΞ±Ξ΄βˆ’π–β€‹x⋆‖\displaystyle\|h_{\alpha}^{\delta}-\mathbf{W}x_{\star}\| =π’ͺ​(Ξ΄)Β as ​δ→0,\displaystyle=\mathcal{O}(\delta)\quad\text{ as }\delta\to 0\,,

where π’ŸΞΎβ„›\mathcal{D}_{\xi}^{\mathcal{R}} is the Bregman distance with respect to β„›\mathcal{R} and ΞΎ\xi (see DefinitionΒ 2.1) for strict as well as for relaxed β„“1\ell^{1} co-regularization. In order to archive these results, we assume a restricted injectivity condition for 𝐀\mathbf{A} and source conditions for x⋆x_{\star} and 𝐖​x⋆\mathbf{W}x_{\star}. These above error estimates are optimal in the sense that they cannot be improved even in the cases where 𝐀=Id\mathbf{A}=\operatorname{Id}, which corresponds to an inverse problem only, or the case 𝐖=Id\mathbf{W}=\operatorname{Id} where (1.1) is a standard CS problem on direct data.

As a further main result we derive converse results, showing that the source condition and the restricted injectivity condition are also necessary to obtain linear convergence rates (see Theorem 3.4).

We note that our results and analysis closely follow [11, 12], where the source condition and restricted injectivity are shown to be necessary and sufficient for linear convergence of $\ell^{1}$-regularization for CS on direct data. In that context, one considers CS as a particular instance of an inverse problem under a sparsity prior, using variational regularization with an $\ell^{1}$-penalty (that is, $\ell^{1}$-regularization). Error estimates in the norm distance for $\ell^{1}$-regularization based on the source condition were first derived in [14] and strengthened in [11]. In the finite dimensional setting, the source condition (under a different name) for $\ell^{1}$-regularization has been used previously in [10]. For some more recent developments of $\ell^{1}$-regularization and source conditions see [8].

Further note that for the direct CS problem, where $\mathbf{W}=\operatorname{Id}$ is the identity operator, and if we take the regularizer $\mathcal{R}=\|\cdot\|^{2}/2$, then strict $\ell^{1}$ co-regularization reduces to the well-known elastic net regression model [19]. Closely following the work [11], error estimates for elastic net regularization have been derived in [13]. Finally, we note that another interesting line of research in the context of $\ell^{1}$ co-regularization would be the derivation of error estimates under the RIP. While we expect this to be possible, such an analysis is beyond the scope of this work.

2 Linear convergence rates

Throughout this paper $\mathbb{X}$, $\mathbb{Y}$ and $\mathbb{H}$ denote separable Hilbert spaces with inner product $\langle\cdot,\cdot\rangle$ and norm $\|\cdot\|$. Moreover we make the following assumptions.

Assumption A.

(A.1) $\mathbf{W}\colon\mathbb{X}\to\mathbb{H}$ is linear and bounded.

(A.2) $\mathbf{A}\colon\mathbb{H}\to\mathbb{Y}$ is linear and bounded.

(A.3) $\mathcal{R}\colon\mathbb{X}\to[0,\infty]$ is proper, convex and wlsc.

(A.4) $\Lambda$ is a countable index set.

(A.5) $(\phi_{\lambda})_{\lambda\in\Lambda}\in\mathbb{H}^{\Lambda}$ is an orthonormal basis (ONB) of $\mathbb{H}$.

(A.6) $(\kappa_{\lambda})_{\lambda\in\Lambda}\in[a,\infty)^{\Lambda}$ for some $a>0$.

(A.7) $\exists x\in\mathbb{X}\colon\mathcal{R}(x)+\sum_{\lambda\in\Lambda}\kappa_{\lambda}|\langle\phi_{\lambda},\mathbf{W}x\rangle|<\infty$.

Recall that $\mathcal{R}$ is wlsc (weakly lower semi-continuous) if $\liminf_{k\to\infty}\mathcal{R}(x_{k})\geq\mathcal{R}(x)$ for all sequences $(x_{k})_{k\in\mathbb{N}}\in\mathbb{X}^{\mathbb{N}}$ weakly converging to $x\in\mathbb{X}$. We write $\operatorname{ran}(\mathbf{W})\coloneqq\{\mathbf{W}x\mid x\in\mathbb{X}\}$ for the range of $\mathbf{W}$ and

$\operatorname{supp}(h)\coloneqq\{\lambda\in\Lambda\mid\langle\phi_{\lambda},h\rangle\neq 0\}$

for the support of $h\in\mathbb{H}$ with respect to $(\phi_{\lambda})_{\lambda\in\Lambda}$. A signal $h\in\mathbb{H}$ is sparse if $|\operatorname{supp}(h)|<\infty$. The weighted $\ell^{1}$-norm $\|\cdot\|_{1,\kappa}\colon\mathbb{H}\to[0,\infty]$ with weights $(\kappa_{\lambda})_{\lambda\in\Lambda}$ is defined by

$\|h\|_{1,\kappa}\coloneqq\sum_{\lambda\in\Lambda}\kappa_{\lambda}|\langle\phi_{\lambda},h\rangle|.$ (2.1)

We have $\operatorname{dom}(\|\cdot\|_{1,\kappa})=\{h\in\mathbb{H}\mid\sum_{\lambda\in\Lambda}\kappa_{\lambda}|\langle\phi_{\lambda},h\rangle|<\infty\}$. For a finite subset of indices $\Omega\subseteq\Lambda$, we write

$\mathbb{H}_{\Omega}\coloneqq\operatorname{span}\{\phi_{\lambda}\mid\lambda\in\Omega\}$ (2.2)
$\mathrm{i}_{\Omega}\colon\mathbb{H}_{\Omega}\to\mathbb{H}\colon h\mapsto h$ (2.3)
$\mathbf{A}_{\Omega}\coloneqq\mathbf{A}\circ\mathrm{i}_{\Omega}\colon\mathbb{H}_{\Omega}\to\mathbb{Y}$ (2.4)
$\pi_{\Omega}\colon\mathbb{H}\to\mathbb{H}_{\Omega}\colon h\mapsto\sum_{\lambda\in\Omega}\langle\phi_{\lambda},h\rangle\phi_{\lambda}.$ (2.5)

Because $(\phi_{\lambda})_{\lambda\in\Lambda}\subseteq\mathbb{H}$ is an ONB, every $h\in\mathbb{H}$ has the basis representation $h=\sum_{\lambda\in\Lambda}\langle\phi_{\lambda},h\rangle\phi_{\lambda}$. Finally, $\|\mathbf{A}\|$ denotes the standard operator norm.
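In a finite dimensional discretization, these notions reduce to elementary coefficient manipulations. The following Python sketch, assuming the finite index set $\Lambda=\{0,\ldots,N-1\}$ and arrays of basis coefficients $\langle\phi_{\lambda},h\rangle$, illustrates (2.1) and the support and projection operations; the function names are hypothetical.

```python
import numpy as np

def weighted_l1(coeff, kappa):
    # ||h||_{1,kappa} = sum_lambda kappa_lambda |<phi_lambda, h>|, see (2.1)
    return np.sum(kappa * np.abs(coeff))

def support(coeff):
    # supp(h) = { lambda : <phi_lambda, h> != 0 }
    return np.flatnonzero(coeff)

def project(coeff, omega):
    # pi_Omega h = sum_{lambda in Omega} <phi_lambda, h> phi_lambda, see (2.5)
    out = np.zeros_like(coeff)
    out[omega] = coeff[omega]
    return out
```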

2.1 Auxiliary estimates

One main ingredient for our results are error estimates for general variational regularization in terms of the Bregman distance. Recall that $\xi\in\mathbb{X}$ is called a subgradient of a functional $\mathcal{Q}\colon\mathbb{X}\to[0,\infty]$ at $x_{\star}\in\mathbb{X}$ if

$\forall x\in\mathbb{X}\colon\quad\mathcal{Q}(x)\geq\mathcal{Q}(x_{\star})+\langle\xi,x-x_{\star}\rangle.$

The set of all subgradients is called the subdifferential of $\mathcal{Q}$ at $x_{\star}$ and is denoted by $\partial\mathcal{Q}(x_{\star})$.

Definition 2.1 (Bregman distance).

Given $\mathcal{Q}\colon\mathbb{X}\to[0,\infty]$ and $\xi\in\partial\mathcal{Q}(x_{\star})$, the Bregman distance between $x_{\star},x\in\mathbb{X}$ with respect to $\mathcal{Q}$ and $\xi$ is defined by

$\mathcal{D}_{\xi}^{\mathcal{Q}}(x,x_{\star})\coloneqq\mathcal{Q}(x)-\mathcal{Q}(x_{\star})-\langle\xi,x-x_{\star}\rangle.$ (2.6)

The Bregman distance is a valuable tool for deriving error estimates for variational regularization. Specifically, for our purpose we use the following convergence rates result.

Lemma 2.2 (Variational regularization).

Let $\mathbf{M}\colon\mathbb{X}\to\mathbb{Y}$ be bounded and linear, let $\mathcal{Q}\colon\mathbb{X}\to[0,\infty]$ be proper, convex and wlsc, and let $(x_{\star},y_{\star})\in\mathbb{X}\times\mathbb{Y}$ satisfy $\mathbf{M}x_{\star}=y_{\star}$ and $\mathbf{M}^{*}\eta\in\partial\mathcal{Q}(x_{\star})$ for some $\eta\in\mathbb{Y}$. Then for all $\delta,\alpha>0$, $y^{\delta}\in\mathbb{Y}$ with $\|y^{\delta}-y_{\star}\|\leq\delta$ and $x_{\alpha}^{\delta}\in\operatorname{argmin}\{\|\mathbf{M}x-y^{\delta}\|^{2}/2+\alpha\mathcal{Q}(x)\}$ we have

$\|\mathbf{M}x_{\alpha}^{\delta}-y^{\delta}\|\leq\delta+2\alpha\|\eta\|$ (2.7)
$\mathcal{D}_{\mathbf{M}^{*}\eta}^{\mathcal{Q}}(x_{\alpha}^{\delta},x_{\star})\leq(\delta+\alpha\|\eta\|)^{2}/(2\alpha).$ (2.8)
Proof.

Lemma 2.2 has been derived in [12, Lemma 3.5]. Note that error estimates for variational regularization in the Bregman distance were first derived in [3]. ∎

For our purpose we will apply Lemma 2.2 where $\mathcal{Q}$ is a combination formed by $\mathcal{R}$ and $\|\cdot\|_{1,\kappa}$. We will use that the subdifferential of $\|\cdot\|_{1,\kappa}$ at $h_{\star}$ consists of all $\eta=\sum_{\lambda\in\Lambda}\eta_{\lambda}\phi_{\lambda}\in\mathbb{H}$ with

$\eta_{\lambda}=\kappa_{\lambda}\operatorname{sign}(\langle\phi_{\lambda},h_{\star}\rangle)$ for $\lambda\in\operatorname{supp}(h_{\star})$,
$\eta_{\lambda}\in[-\kappa_{\lambda},\kappa_{\lambda}]$ for $\lambda\notin\operatorname{supp}(h_{\star})$.

Since the family $(\eta_{\lambda})_{\lambda\in\Lambda}$ is square summable and $\kappa_{\lambda}\geq a>0$, the equality $|\eta_{\lambda}|=\kappa_{\lambda}$ can hold for only finitely many $\lambda$, and therefore $\partial\|h_{\star}\|_{1,\kappa}$ is nonempty if and only if $h_{\star}$ is sparse.

Remark 2.3 (Weighted $\ell^{1}$-norm).

For $\eta=\sum_{\lambda\in\Lambda}\eta_{\lambda}\phi_{\lambda}\in\partial\|h_{\star}\|_{1,\kappa}$ define

$\Omega[\eta]\coloneqq\{\lambda\in\Lambda\colon|\eta_{\lambda}|=\kappa_{\lambda}\}$ (2.9)
$m[\eta]\coloneqq\min\{\kappa_{\lambda}-|\eta_{\lambda}|\colon\lambda\notin\Omega[\eta]\}.$ (2.10)

Then $\Omega[\eta]$ is finite and, since $(\eta_{\lambda})_{\lambda\in\Lambda}$ converges to zero, $m[\eta]$ is well defined with $m[\eta]>0$. Because $\|\cdot\|_{1,\kappa}$ is positively homogeneous, it holds that $\|h_{\star}\|_{1,\kappa}=\langle\eta,h_{\star}\rangle$. Thus, for $h\in\mathbb{H}$,

π’ŸΞ·βˆ₯β‹…βˆ₯1,κ​(h,h⋆)\displaystyle\mathcal{D}_{\eta}^{\|\,\cdot\,\|_{1,\kappa}}(h,h_{\star}) =β€–hβ€–1,ΞΊβˆ’βŸ¨Ξ·,h⟩\displaystyle=\|h\|_{1,\kappa}-\langle\eta,h\rangle
=βˆ‘Ξ»βˆˆΞ›(κλ​|βŸ¨Ο•Ξ»,h⟩|βˆ’Ξ·Ξ»β€‹βŸ¨Ο•Ξ»,h⟩)\displaystyle=\sum_{\lambda\in\Lambda}(\kappa_{\lambda}\left|\langle\phi_{\lambda},h\rangle\right|-\eta_{\lambda}\langle\phi_{\lambda},h\rangle)
β‰₯βˆ‘Ξ»βˆˆΞ›(ΞΊΞ»βˆ’|Ξ·Ξ»|)​|βŸ¨Ο•Ξ»,h⟩|\displaystyle\geq\sum_{\lambda\in\Lambda}(\kappa_{\lambda}-\left|\eta_{\lambda}\right|)\left|\langle\phi_{\lambda},h\rangle\right|
β‰₯m​[Ξ·]β€‹βˆ‘Ξ»βˆ‰Ξ©β€‹[Ξ·]|βŸ¨Ο•Ξ»,h⟩|.\displaystyle\geq m[\eta]\sum_{\lambda\notin\Omega[\eta]}\left|\langle\phi_{\lambda},h\rangle\right|\,. (2.11)

Estimate (2.11) implies that if $\mathcal{D}_{\eta}^{\|\cdot\|_{1,\kappa}}(h_{\alpha}^{\delta},h_{\star})$ converges linearly to $0$, then so does $\sum_{\lambda\notin\Omega[\eta]}|\langle\phi_{\lambda},h_{\alpha}^{\delta}\rangle|$.
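The following Python sketch, again assuming a finite index set and coefficient arrays, constructs one particular subgradient of $\|\cdot\|_{1,\kappa}$ at a sparse $h_{\star}$ and computes the quantities $\Omega[\eta]$ and $m[\eta]$ from (2.9), (2.10); the function names are hypothetical.

```python
import numpy as np

def l1_kappa_subgradient(h_coeff, kappa):
    # one element eta of the subdifferential of ||.||_{1,kappa} at h:
    # eta_lambda = kappa_lambda * sign(<phi_lambda, h>) on supp(h),
    # eta_lambda = 0 (any value in [-kappa_lambda, kappa_lambda]) elsewhere
    return kappa * np.sign(h_coeff)

def omega_and_m(eta_coeff, kappa):
    # Omega[eta] = { lambda : |eta_lambda| = kappa_lambda }, see (2.9), and
    # m[eta] = min { kappa_lambda - |eta_lambda| : lambda not in Omega[eta] }
    on = np.isclose(np.abs(eta_coeff), kappa)
    omega = np.flatnonzero(on)
    m = np.min((kappa - np.abs(eta_coeff))[~on]) if not on.all() else np.inf
    return omega, m
```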

Lemma 2.4.

Let $\Omega\subseteq\Lambda$ be finite, $\mathbf{A}_{\Omega}\colon\mathbb{H}_{\Omega}\to\mathbb{Y}$ injective and $h_{\star}\in\mathbb{H}_{\Omega}$. Then, for all $h\in\mathbb{H}$,

$\|h-h_{\star}\|\leq\|\mathbf{A}_{\Omega}^{-1}\|\|\mathbf{A}h-\mathbf{A}h_{\star}\|+(1+\|\mathbf{A}_{\Omega}^{-1}\|\|\mathbf{A}\|)\sum_{\lambda\notin\Omega}|\langle\phi_{\lambda},h\rangle|.$ (2.12)
Proof.

Because $\mathbb{H}_{\Omega}$ is finite dimensional and $\mathbf{A}_{\Omega}$ is injective, the inverse $\mathbf{A}_{\Omega}^{-1}\colon\operatorname{ran}(\mathbf{A}_{\Omega})\to\mathbb{H}_{\Omega}$ is well defined and bounded. Consequently,

$\|h-h_{\star}\|\leq\|\pi_{\Omega}h-h_{\star}\|+\|\pi_{\Lambda\setminus\Omega}h\|$
$\quad\leq\|\mathbf{A}_{\Omega}^{-1}\|\|\mathbf{A}_{\Omega}(\pi_{\Omega}h-h_{\star})\|+\|\pi_{\Lambda\setminus\Omega}h\|$
$\quad\leq\|\mathbf{A}_{\Omega}^{-1}\|\|\mathbf{A}(h-h_{\star})-\mathbf{A}\pi_{\Lambda\setminus\Omega}h\|+\|\pi_{\Lambda\setminus\Omega}h\|$
$\quad\leq\|\mathbf{A}_{\Omega}^{-1}\|\|\mathbf{A}h-\mathbf{A}h_{\star}\|+(1+\|\mathbf{A}_{\Omega}^{-1}\|\|\mathbf{A}\|)\|\pi_{\Lambda\setminus\Omega}h\|.$

Bounding the $\ell^{2}$-norm by the $\ell^{1}$-norm yields (2.12). ∎
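Estimate (2.12) is easy to test numerically. The following Python sketch, with hypothetical random data, evaluates both sides of the inequality; $\|\mathbf{A}_{\Omega}^{-1}\|$ is computed as the reciprocal of the smallest singular value of the restricted matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, s = 50, 20, 5                        # ambient dim, measurements, |Omega|
A = rng.standard_normal((m, n))
omega = np.arange(s)                       # Omega = {0, ..., s-1}

A_om = A[:, omega]                         # restriction A_Omega
inv_norm = 1.0 / np.linalg.svd(A_om, compute_uv=False).min()  # ||A_Omega^{-1}||

h_star = np.zeros(n)
h_star[omega] = rng.standard_normal(s)     # h_star in H_Omega
h = rng.standard_normal(n)                 # arbitrary h

lhs = np.linalg.norm(h - h_star)
rhs = (inv_norm * np.linalg.norm(A @ (h - h_star))
       + (1 + inv_norm * np.linalg.norm(A, 2)) * np.sum(np.abs(h[s:])))
assert lhs <= rhs + 1e-9                   # inequality (2.12)
```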

Lemma 2.5.

Let $h_{\star}\in\mathbb{H}$ be sparse, $\eta\in\partial\|h_{\star}\|_{1,\kappa}$ and assume that $\mathbf{A}_{\Omega[\eta]}\colon\mathbb{H}_{\Omega[\eta]}\to\mathbb{Y}$ is injective. Then, for $h\in\mathbb{H}$,

$\|h-h_{\star}\|\leq\|\mathbf{A}_{\Omega[\eta]}^{-1}\|\|\mathbf{A}h-\mathbf{A}h_{\star}\|+\frac{1+\|\mathbf{A}_{\Omega[\eta]}^{-1}\|\|\mathbf{A}\|}{m[\eta]}\mathcal{D}_{\eta}^{\|\cdot\|_{1,\kappa}}(h,h_{\star}).$ (2.13)
Proof.

Follows from (2.12) and (2.11). ∎

2.2 Relaxed $\ell^{1}$ co-regularization

First we derive linear rates for the relaxed model $\mathcal{B}_{\alpha,y^{\delta}}$. These results will be derived under the following condition.

Condition 1.

(1.1) $(x_{\star},h_{\star},y_{\star})\in\mathbb{X}\times\mathbb{H}\times\mathbb{Y}$ with $\mathbf{W}x_{\star}=h_{\star}$, $\mathbf{A}h_{\star}=y_{\star}$.

(1.2) $\exists u\in\mathbb{H}\colon\mathbf{W}^{*}u\in\partial\mathcal{R}(x_{\star})$.

(1.3) $\exists v\in\mathbb{Y}\colon\mathbf{A}^{*}v-u\in\partial\|h_{\star}\|_{1,\kappa}$.

(1.4) $\mathbf{A}_{\Omega[\mathbf{A}^{*}v-u]}$ is injective.

Conditions (1.2), (1.3) are source conditions commonly assumed in regularization theory. From (1.3) it follows that $h_{\star}$ is sparse and contained in $\mathbb{H}_{\Omega[\mathbf{A}^{*}v-u]}$. Condition (1.4) is the restricted injectivity condition.

Remark 2.6 (Product formulation).

We introduce the operator $\mathbf{M}\colon\mathbb{X}\times\mathbb{H}\to\mathbb{H}\times\mathbb{Y}$ and the functional $\mathcal{Q}\colon\mathbb{X}\times\mathbb{H}\to[0,\infty]$,

$\mathbf{M}(x,h)\coloneqq(\mathbf{W}x-h,\mathbf{A}h)$ (2.14)
$\mathcal{Q}(x,h)\coloneqq\mathcal{R}(x)+\|h\|_{1,\kappa}.$ (2.15)

Using these notions, one can rewrite the relaxed co-regularization functional $\mathcal{B}_{\alpha,y^{\delta}}$ as

$\mathcal{B}_{\alpha,y^{\delta}}(x,h)=\frac{1}{2}\|\mathbf{M}(x,h)-(0,y^{\delta})\|^{2}+\alpha\mathcal{Q}(x,h).$

Because $\mathbf{W}$ and $\mathbf{A}$ are linear and bounded, $\mathbf{M}$ is linear and bounded, too. Moreover, since $\mathcal{R}$ and $\|\cdot\|_{1,\kappa}$ are proper, convex and wlsc, $\mathcal{Q}$ has these properties, too. The subdifferential of $\mathcal{Q}$ is given by $\partial\mathcal{Q}(x_{\star},h_{\star})=\partial\mathcal{R}(x_{\star})\times\partial\|h_{\star}\|_{1,\kappa}$. The Bregman distance with respect to $\xi=(\xi_{1},\xi_{2})$ is given by

$\mathcal{D}_{(\xi_{1},\xi_{2})}^{\mathcal{Q}}((x,h),(x_{\star},h_{\star}))=\mathcal{D}_{\xi_{1}}^{\mathcal{R}}(x,x_{\star})+\mathcal{D}_{\xi_{2}}^{\|\cdot\|_{1,\kappa}}(h,h_{\star}).$ (2.16)

Finally, conditions (1.2), (1.3) can jointly be written as $\mathbf{M}^{*}(u,v)\in\partial\mathcal{Q}(x_{\star},h_{\star})$.

We now state our main estimate for the relaxed model.

Theorem 2.7 (Relaxed $\ell^{1}$ co-regularization).

Let Condition 1 hold and consider the parameter choice $\alpha=C\delta$ for $C>0$. Then for all $y^{\delta}\in\mathbb{Y}$ with $\|y^{\delta}-y_{\star}\|\leq\delta$ and all $(x_{\alpha}^{\delta},h_{\alpha}^{\delta})\in\operatorname{argmin}\mathcal{B}_{\alpha,y^{\delta}}$ we have

$\mathcal{D}_{\mathbf{W}^{*}u}^{\mathcal{R}}(x_{\alpha}^{\delta},x_{\star})\leq c_{(u,v)}\delta,$ (2.17)
$\|h_{\alpha}^{\delta}-h_{\star}\|\leq d_{(u,v)}\delta,$ (2.18)

where

$c_{(u,v)}\coloneqq(1+C\|(u,v)\|)^{2}/(2C)$
$d_{(u,v)}\coloneqq 2\|\mathbf{A}_{\Omega[\mathbf{A}^{*}v-u]}^{-1}\|(1+C\|(u,v)\|)+\dfrac{1+\|\mathbf{A}_{\Omega[\mathbf{A}^{*}v-u]}^{-1}\|\|\mathbf{A}\|}{m[\mathbf{A}^{*}v-u]}\,c_{(u,v)}.$
Proof.

According to (1.3), $\eta\coloneqq\mathbf{A}^{*}v-u\in\partial\|h_{\star}\|_{1,\kappa}$, which implies that $\Omega[\eta]$ is finite and $h_{\star}\in\mathbb{H}_{\Omega[\eta]}$. With (1.4) and Lemma 2.5 we therefore get

$\|h_{\alpha}^{\delta}-h_{\star}\|\leq\|\mathbf{A}_{\Omega[\eta]}^{-1}\|\|\mathbf{A}h_{\alpha}^{\delta}-y_{\star}\|+\frac{1+\|\mathbf{A}_{\Omega[\eta]}^{-1}\|\|\mathbf{A}\|}{m[\eta]}\mathcal{D}_{\eta}^{\|\cdot\|_{1,\kappa}}(h_{\alpha}^{\delta},h_{\star}).$ (2.19)

Using the product formulation as in Remark 2.6, according to (1.2), (1.3) the source condition $\mathbf{M}^{*}(u,v)\in\partial\mathcal{Q}(x_{\star},h_{\star})$ holds. By Lemma 2.2 and the choice $\alpha=C\delta$ we obtain

$\|\mathbf{M}(x_{\alpha}^{\delta},h_{\alpha}^{\delta})-(0,y^{\delta})\|\leq(1+2C\|(u,v)\|)\,\delta$
$\mathcal{D}_{\mathbf{M}^{*}(u,v)}^{\mathcal{Q}}\bigl((x_{\alpha}^{\delta},h_{\alpha}^{\delta}),(x_{\star},h_{\star})\bigr)\leq(1+C\|(u,v)\|)^{2}/(2C)\,\delta.$

Using (2.14), (2.15), (2.16) we obtain

$\|\mathbf{A}h_{\alpha}^{\delta}-y^{\delta}\|\leq(1+2C\|(u,v)\|)\,\delta$
$\|\mathbf{W}x_{\alpha}^{\delta}-h_{\alpha}^{\delta}\|\leq(1+2C\|(u,v)\|)\,\delta$
$\mathcal{D}_{\eta}^{\|\cdot\|_{1,\kappa}}(h_{\alpha}^{\delta},h_{\star})\leq(1+C\|(u,v)\|)^{2}\,\delta/(2C)$
$\mathcal{D}_{\mathbf{W}^{*}u}^{\mathcal{R}}(x_{\alpha}^{\delta},x_{\star})\leq(1+C\|(u,v)\|)^{2}\,\delta/(2C).$

Together with $\|\mathbf{A}h_{\alpha}^{\delta}-y_{\star}\|\leq\|\mathbf{A}h_{\alpha}^{\delta}-y^{\delta}\|+\delta\leq 2(1+C\|(u,v)\|)\,\delta$, combining these estimates with (2.19) completes the proof. ∎

If $\mathcal{R}$ is totally convex, then convergence in the Bregman distance implies convergence in the norm [18, Lemma 3.31]. For example, for the standard penalty $\mathcal{R}=\|\cdot\|^{2}/2$, from Theorem 2.7 one deduces the rate $\|x_{\alpha}^{\delta}-x_{\star}\|=\mathcal{O}(\sqrt{\delta})$.
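For completeness, here is the one-line computation behind this example: for $\mathcal{R}=\|\cdot\|^{2}/2$ the subgradient at $x_{\star}$ is $\xi=x_{\star}$, so

$\mathcal{D}_{x_{\star}}^{\mathcal{R}}(x,x_{\star})=\tfrac{1}{2}\|x\|^{2}-\tfrac{1}{2}\|x_{\star}\|^{2}-\langle x_{\star},x-x_{\star}\rangle=\tfrac{1}{2}\|x-x_{\star}\|^{2},$

and hence $\mathcal{D}_{x_{\star}}^{\mathcal{R}}(x_{\alpha}^{\delta},x_{\star})=\mathcal{O}(\delta)$ yields $\|x_{\alpha}^{\delta}-x_{\star}\|=\mathcal{O}(\sqrt{\delta})$.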

2.3 Strict $\ell^{1}$ co-regularization

Next we analyze the strict approach (1.2). We derive linear convergence rates under the following condition.

Condition 2.

(2.1) $(x_{\star},y_{\star})\in\mathbb{X}\times\mathbb{Y}$ satisfies $\mathbf{A}\mathbf{W}x_{\star}=y_{\star}$.

(2.2) $\exists\nu\in\mathbb{Y}\colon\mathbf{W}^{*}\mathbf{A}^{*}\nu\in\partial\bigl(\mathcal{R}+\|\mathbf{W}(\cdot)\|_{1,\kappa}\bigr)(x_{\star})$.

(2.3) $\exists\xi\in\partial\mathcal{R}(x_{\star})$ $\exists\eta\in\partial\|\cdot\|_{1,\kappa}(\mathbf{W}x_{\star})\colon\mathbf{W}^{*}\mathbf{A}^{*}\nu=\xi+\mathbf{W}^{*}\eta$.

(2.4) $\mathbf{A}_{\Omega[\eta]}$ is injective.

Condition (2.2) is a source condition for the composite operator $\mathbf{A}\mathbf{W}$ and the regularization functional $\mathcal{R}+\|\mathbf{W}(\cdot)\|_{1,\kappa}$. Condition (2.3) assumes the splitting of the subgradient $\mathbf{W}^{*}\mathbf{A}^{*}\nu=\xi+\mathbf{W}^{*}\eta$ into subgradients $\xi\in\partial\mathcal{R}(x_{\star})$ and $\mathbf{W}^{*}\eta\in\partial\|\mathbf{W}(\cdot)\|_{1,\kappa}(x_{\star})$. Assumption (2.4) is the restricted injectivity.

Theorem 2.8 (Strict $\ell^{1}$ co-regularization).

Let Condition 2 hold and consider the parameter choice $\alpha=C\delta$ for $C>0$. Then for $\|y^{\delta}-y_{\star}\|\leq\delta$ and $x_{\alpha}^{\delta}\in\operatorname{argmin}\mathcal{A}_{\alpha,y^{\delta}}$ we have

$\mathcal{D}_{\xi}^{\mathcal{R}}(x_{\alpha}^{\delta},x_{\star})\leq c_{(\nu,\eta)}\delta$ (2.20)
$\|\mathbf{W}x_{\alpha}^{\delta}-\mathbf{W}x_{\star}\|\leq d_{(\nu,\eta)}\delta,$ (2.21)

with the constants

c(Ξ½,Ξ·)≔(1+C​‖ν‖)2/(2​C)\displaystyle c_{(\nu,\eta)}\coloneqq(1+C\|\nu\|)^{2}/(2C)
d(Ξ½,Ξ·)≔2​‖𝐀Ω​[Ξ·]βˆ’1‖​(1+C​‖ν‖)+1+‖𝐀Ω​[Ξ·]βˆ’1‖​‖𝐀‖m​[Ξ·]​c(Ξ½,Ξ·).\displaystyle\begin{multlined}d_{(\nu,\eta)}\coloneqq 2\|\mathbf{A}^{-1}_{\Omega[\eta]}\|(1+C\|\nu\|)+\frac{1+\|\mathbf{A}^{-1}_{\Omega[\eta]}\|\|\mathbf{A}\|}{m[\eta]}c_{(\nu,\eta)}\,.\end{multlined}d_{(\nu,\eta)}\coloneqq 2\|\mathbf{A}^{-1}_{\Omega[\eta]}\|(1+C\|\nu\|)+\frac{1+\|\mathbf{A}^{-1}_{\Omega[\eta]}\|\|\mathbf{A}\|}{m[\eta]}c_{(\nu,\eta)}\,.
Proof.

Condition 2 implies that $\Omega[\eta]$ is finite and $\mathbf{W}x_{\star}\in\mathbb{H}_{\Omega[\eta]}$. From Lemma 2.5 we obtain

$\|\mathbf{W}x_{\alpha}^{\delta}-\mathbf{W}x_{\star}\|\leq\|\mathbf{A}_{\Omega[\eta]}^{-1}\|\|\mathbf{A}\mathbf{W}x_{\alpha}^{\delta}-y_{\star}\|+\frac{1+\|\mathbf{A}_{\Omega[\eta]}^{-1}\|\|\mathbf{A}\|}{m[\eta]}\mathcal{D}_{\eta}^{\|\cdot\|_{1,\kappa}}(\mathbf{W}x_{\alpha}^{\delta},\mathbf{W}x_{\star}).$ (2.22)

According to (2.2) and Lemma 2.2, applied with $\mathbf{M}=\mathbf{A}\mathbf{W}$ and $\mathcal{Q}=\mathcal{R}+\|\mathbf{W}(\cdot)\|_{1,\kappa}$, we obtain

$\|\mathbf{A}\mathbf{W}x_{\alpha}^{\delta}-y^{\delta}\|\leq(1+2C\|\nu\|)\,\delta$ (2.23)
$\mathcal{D}_{\mathbf{W}^{*}\mathbf{A}^{*}\nu}^{\mathcal{Q}}(x_{\alpha}^{\delta},x_{\star})\leq(1+C\|\nu\|)^{2}/(2C)\,\delta.$ (2.24)

From (2.3) we obtain $\mathcal{D}_{\mathbf{W}^{*}\mathbf{A}^{*}\nu}^{\mathcal{Q}}=\mathcal{D}_{\eta}^{\|\cdot\|_{1,\kappa}}(\mathbf{W}(\cdot),\mathbf{W}(\cdot))+\mathcal{D}_{\xi}^{\mathcal{R}}$. Together with (2.22), (2.23), (2.24) this shows the claim. ∎

3 Necessary conditions

In this section we show that the source condition and restricted injectivity are not only sufficient but also necessary for linear convergence of relaxed $\ell^{1}$ co-regularization. In the following we restrict ourselves to the $\ell^{1}$-norm

$\|\cdot\|_{1}\coloneqq\|\cdot\|_{1,1}=\sum_{\lambda\in\Lambda}|\langle\phi_{\lambda},\cdot\rangle|.$

We denote by $\mathbf{M}$ and $\mathcal{Q}$ the product operator and regularizer defined in (2.14), (2.15). We call $(x_{\star},h_{\star})$ a $\mathcal{Q}$-minimizing solution of $\mathbf{M}(x,h)=(0,y_{\star})$ if $(x_{\star},h_{\star})\in\operatorname{argmin}\{\mathcal{Q}(x,h)\mid\mathbf{M}(x,h)=(0,y_{\star})\}$. In this section we fix the following list of assumptions, which is slightly stronger than Assumption A.

Assumption B.

(B.1) $\mathbf{W}\colon\mathbb{X}\to\mathbb{H}$ is linear and bounded with dense range.

(B.2) $\mathbf{A}\colon\mathbb{H}\to\mathbb{Y}$ is linear and bounded.

(B.3) $\mathcal{R}\colon\mathbb{X}\to[0,\infty]$ is proper, strictly convex and wlsc.

(B.4) $\Lambda$ is a countable index set.

(B.5) $(\phi_{\lambda})_{\lambda\in\Lambda}\subseteq\mathbb{H}$ is an ONB of $\mathbb{H}$.

(B.6) $\forall\lambda\in\Lambda\colon\phi_{\lambda}\in\operatorname{ran}(\mathbf{W})$.

(B.7) $\exists x\in\mathbb{X}\colon\mathcal{R}(x)+\|\mathbf{W}x\|_{1}<\infty$.

(B.8) $\mathcal{R}$ is Gateaux differentiable at $x_{\star}$ if $(x_{\star},h_{\star})$ is the unique $\mathcal{Q}$-minimizing solution of $\mathbf{M}(x,h)=(0,y_{\star})$.

Under Assumption B, the equation $\mathbf{M}(x,h)=(0,y_{\star})$ has a unique $\mathcal{Q}$-minimizing solution.

Condition 3.

(3.1) $(x_{\star},h_{\star},y_{\star})\in\mathbb{X}\times\mathbb{H}\times\mathbb{Y}$ with $\mathbf{A}h_{\star}=y_{\star}$, $\mathbf{W}x_{\star}=h_{\star}$.

(3.2) $\exists u\in\mathbb{H}\colon\mathbf{W}^{*}u\in\partial\mathcal{R}(x_{\star})$.

(3.3) $\exists v\in\mathbb{Y}\colon\mathbf{A}^{*}v-u\in\partial\|h_{\star}\|_{1}$.

(3.4) $\forall\lambda\notin\operatorname{supp}(h_{\star})\colon|\langle\phi_{\lambda},\mathbf{A}^{*}v-u\rangle|<1$.

(3.5) $\mathbf{A}_{\operatorname{supp}(h_{\star})}$ is injective.

3.1 Auxiliary results

Condition 3 is clearly stronger than Condition 1. Below we will show that these conditions are actually equivalent. For that purpose we start with several lemmas. These results are in the spirit of [12], where necessary conditions for standard $\ell^{1}$ minimization have been derived.

Lemma 3.1.

Assume that $(x_{\star},h_{\star})\in\mathbb{X}\times\mathbb{H}$ is the unique $\mathcal{Q}$-minimizing solution of $\mathbf{M}(x,h)=(0,y_{\star})$, let $u\in\mathbb{H}$ satisfy $\mathbf{W}^{*}u=\partial\mathcal{R}(x_{\star})$ and assume that $h_{\star}$ is sparse. Then:

(a) The restricted mapping $\mathbf{A}_{\operatorname{supp}(h_{\star})}$ is injective.

(b) For every finite set $\Omega_{1}$ with $\operatorname{supp}(h_{\star})\cap\Omega_{1}=\emptyset$ there exists $\theta\in\mathbb{Y}$ such that

$\forall\lambda\in\operatorname{supp}(h_{\star})\colon\langle\phi_{\lambda},\mathbf{A}^{*}\theta-u\rangle=\operatorname{sign}(\langle\phi_{\lambda},h_{\star}\rangle)$
$\forall\lambda\in\Omega_{1}\colon|\langle\phi_{\lambda},\mathbf{A}^{*}\theta-u\rangle|<1.$
Proof.

(a): Denote $\Omega\coloneqq\operatorname{supp}(h_{\star})$. After possibly replacing some basis vectors by $-\phi_{\lambda}$, we may assume without loss of generality that $\operatorname{sign}(\langle\phi_{\lambda},h_{\star}\rangle)=1$ for $\lambda\in\Omega$. Since $(x_{\star},h_{\star})$ is the unique $\mathcal{Q}$-minimizing solution of $\mathbf{M}(x,h)=(0,y_{\star})$, it follows that

$\mathcal{Q}(x_{\star},h_{\star})<\mathcal{Q}(x_{\star}+tx,h_{\star}+t\mathbf{W}x)$

for all $t\neq 0$ and all $x\in\mathbb{X}$ with $w\coloneqq\mathbf{W}x\in\ker(\mathbf{A})\setminus\{0\}$. Because $\Omega$ is finite, the mapping

$t\mapsto\|h_{\star}+tw\|_{1}=\sum_{\lambda\in\Omega}|\langle\phi_{\lambda},h_{\star}\rangle+t\langle\phi_{\lambda},w\rangle|+|t|\sum_{\lambda\notin\Omega}|\langle\phi_{\lambda},w\rangle|$

is piecewise linear. Taking the one-sided directional derivative of $\mathcal{Q}$ with respect to $t$, we have

$0<\lim_{t\downarrow 0}\frac{\mathcal{Q}(x_{\star}+tx,h_{\star}+tw)-\mathcal{Q}(x_{\star},h_{\star})}{t}$
$\ =\lim_{t\downarrow 0}\frac{\|h_{\star}+tw\|_{1}-\|h_{\star}\|_{1}}{t}+\lim_{t\downarrow 0}\frac{\mathcal{R}(x_{\star}+tx)-\mathcal{R}(x_{\star})}{t}$
$\ =\sum_{\lambda\in\Omega}\langle\phi_{\lambda},w\rangle+\sum_{\lambda\notin\Omega}|\langle\phi_{\lambda},w\rangle|+\langle\mathbf{W}^{*}u,x\rangle.$ (3.1)

For the last equality we used that $\langle\phi_{\lambda},h_{\star}\rangle>0$ for all $\lambda\in\Omega$, that $\mathcal{R}$ is Gateaux differentiable and that $\mathbf{W}^{*}u=\partial\mathcal{R}(x_{\star})$. Inserting $-(x,w)$ instead of $(x,w)$ in (3.1) we deduce

$\sum_{\lambda\notin\Omega}|\langle\phi_{\lambda},w\rangle|>\Bigl|\sum_{\lambda\in\Omega}\langle\phi_{\lambda},w\rangle+\langle u,w\rangle\Bigr|$ (3.2)

for all $w\in(\ker(\mathbf{A})\cap\operatorname{ran}(\mathbf{W}))\setminus\{0\}$. In particular,

$\forall w\in(\ker(\mathbf{A})\cap\operatorname{ran}(\mathbf{W}))\setminus\{0\}\colon\sum_{\lambda\notin\Omega}|\langle\phi_{\lambda},w\rangle|>0$ (3.3)

and consequently $\ker(\mathbf{A})\cap\operatorname{ran}(\mathbf{W})\cap\mathbb{H}_{\Omega}=\{0\}$. Because $\phi_{\lambda}\in\operatorname{ran}(\mathbf{W})$ for all $\lambda\in\Lambda$ and $\Omega$ is finite, we have $\mathbb{H}_{\Omega}\subseteq\operatorname{ran}(\mathbf{W})$. Therefore $\ker(\mathbf{A})\cap\mathbb{H}_{\Omega}=\{0\}$, which verifies that $\mathbf{A}_{\Omega}$ is injective.

(b): Let $\Omega_{1}\subseteq\Lambda$ be finite with $\Omega\cap\Omega_{1}=\emptyset$. Inequality (3.2) and the finiteness of $\Omega\cup\Omega_{1}$ imply the existence of a constant $\mu\in(0,1)$ such that, for $w\in\ker(\mathbf{A})\cap\mathbb{H}_{\Omega\cup\Omega_{1}}$,

$\mu\sum_{\lambda\in\Omega_{1}}|\langle\phi_{\lambda},w\rangle|\geq\Bigl|\sum_{\lambda\in\Omega}\langle\phi_{\lambda},w\rangle+\langle u,w\rangle\Bigr|.$ (3.4)

Assume for the moment that $\xi\in\operatorname{ran}(\mathbf{A}^{*}_{\Omega\cup\Omega_{1}})$. Then $\xi=\mathbf{A}^{*}_{\Omega\cup\Omega_{1}}\theta$ for some $\theta\in\mathbb{Y}$. Due to (B.5), $\pi_{\Omega}$ is an orthogonal projection and the adjoint of the embedding $\mathrm{i}_{\Omega}$. The identity $\pi_{\Omega\cup\Omega_{1}}\circ\mathbf{A}^{*}=(\mathbf{A}\circ\mathrm{i}_{\Omega\cup\Omega_{1}})^{*}=\mathbf{A}^{*}_{\Omega\cup\Omega_{1}}$ implies that

$\forall\lambda\in\Omega\cup\Omega_{1}\colon\langle\phi_{\lambda},\xi\rangle=\langle\phi_{\lambda},\mathbf{A}^{*}_{\Omega\cup\Omega_{1}}\theta\rangle=\langle\phi_{\lambda},\mathbf{A}^{*}\theta\rangle.$

By assumption, $\mathbb{H}_{\Omega\cup\Omega_{1}}$ is finite dimensional and therefore $\operatorname{ran}(\mathbf{A}^{*}_{\Omega\cup\Omega_{1}})=\ker(\mathbf{A}_{\Omega\cup\Omega_{1}})^{\perp}\subseteq\mathbb{H}_{\Omega\cup\Omega_{1}}$, where $(\cdot)^{\perp}$ denotes the orthogonal complement in $\mathbb{H}_{\Omega\cup\Omega_{1}}$. Consequently, we have to show the existence of $\xi\in\ker(\mathbf{A}_{\Omega\cup\Omega_{1}})^{\perp}\subseteq\mathbb{H}_{\Omega\cup\Omega_{1}}$ with

$\langle\phi_{\lambda},\xi\rangle=1+u_{\lambda}$ for all $\lambda\in\Omega$,
$\langle\phi_{\lambda},\xi\rangle\in(u_{\lambda}-1,u_{\lambda}+1)$ for all $\lambda\in\Omega_{1}$, (3.5)

where $u_{\lambda}\coloneqq\langle\phi_{\lambda},u\rangle$.

Define the element $z\in\mathbb{H}_{\Omega\cup\Omega_{1}}$ by $\langle\phi_{\lambda},z\rangle=1+u_{\lambda}$ for $\lambda\in\Omega$ and $\langle\phi_{\lambda},z\rangle=u_{\lambda}$ for $\lambda\in\Omega_{1}$. If $z\in\ker(\mathbf{A}_{\Omega\cup\Omega_{1}})^{\perp}$, then we choose $\xi\coloneqq z$ and (3.5) is fulfilled. If, on the other hand, $z\notin\ker(\mathbf{A}_{\Omega\cup\Omega_{1}})^{\perp}$, then $\dim(\ker(\mathbf{A}_{\Omega\cup\Omega_{1}}))\eqqcolon s\geq 1$ and there exists a basis $(w^{(1)},\ldots,w^{(s)})$ of $\ker(\mathbf{A}_{\Omega\cup\Omega_{1}})$ such that, for all $i\in\{1,\ldots,s\}$,

$1=\langle z,w^{(i)}\rangle=\sum_{\lambda\in\Omega}(1+u_{\lambda})\langle\phi_{\lambda},w^{(i)}\rangle+\sum_{\lambda\in\Omega_{1}}u_{\lambda}\langle\phi_{\lambda},w^{(i)}\rangle=\sum_{\lambda\in\Omega}\langle\phi_{\lambda},w^{(i)}\rangle+\langle u,w^{(i)}\rangle,$ (3.6)

where we used that $\langle\phi_{\lambda},w^{(i)}\rangle=0$ for $\lambda\notin\Omega\cup\Omega_{1}$.

Consider now the constrained minimization problem on $\mathbb{H}_{\Omega_{1}}$:

$\max_{\lambda\in\Omega_{1}}|\langle\phi_{\lambda},z^{\prime}\rangle|\to\min$ subject to $\langle z^{\prime},w^{(i)}\rangle=-1$ for $i\in\{1,\ldots,s\}$. (3.7)

Because of the equality $1=\langle z,w^{(i)}\rangle$, the admissible vectors $z^{\prime}$ in (3.7) are precisely those for which $\xi\coloneqq z+z^{\prime}\in\ker(\mathbf{A}_{\Omega\cup\Omega_{1}})^{\perp}$. Thus, the task of finding $\xi$ satisfying (3.5) reduces to showing that the value of (3.7) is strictly smaller than $1$. Note that the convex conjugate of the function $z^{\prime}\mapsto\max_{\lambda\in\Omega_{1}}|\langle\phi_{\lambda},z^{\prime}\rangle|$ on $\mathbb{H}_{\Omega_{1}}$ is

$z^{\prime}\mapsto\begin{cases}0&\text{if }\sum_{\lambda\in\Omega_{1}}|\langle\phi_{\lambda},z^{\prime}\rangle|\leq 1\\ +\infty&\text{if }\sum_{\lambda\in\Omega_{1}}|\langle\phi_{\lambda},z^{\prime}\rangle|>1.\end{cases}$ (3.8)

Recalling that $\langle z^{\prime},w^{(i)}\rangle=\sum_{\lambda\in\Omega_{1}}\langle\phi_{\lambda},z^{\prime}\rangle\langle\phi_{\lambda},w^{(i)}\rangle$, it follows that the dual problem to (3.7) is the following constrained problem on $\mathbb{R}^{s}$:

$S(p)\coloneqq-\sum_{i=1}^{s}p_{i}\to\min$ subject to $\sum_{\lambda\in\Omega_{1}}\Bigl|\sum_{i=1}^{s}p_{i}\langle\phi_{\lambda},w^{(i)}\rangle\Bigr|\leq 1.$ (3.9)

From (3.6) we obtain that

$\sum_{\lambda\in\Omega}\sum_{i=1}^{s}p_{i}\langle\phi_{\lambda},w^{(i)}\rangle+\sum_{i=1}^{s}p_{i}\langle u,w^{(i)}\rangle=\sum_{i=1}^{s}p_{i}=-S(p)$

for every $p\in\mathbb{R}^{s}$. Taking $w=\sum_{i=1}^{s}p_{i}w^{(i)}\in\ker(\mathbf{A})\cap\mathbb{H}_{\Omega\cup\Omega_{1}}$, inequality (3.4) therefore implies that, for every $p\in\mathbb{R}^{s}$,

$\mu\sum_{\lambda\in\Omega_{1}}\Bigl|\sum_{i=1}^{s}p_{i}\langle\phi_{\lambda},w^{(i)}\rangle\Bigr|\geq\Bigl|\sum_{\lambda\in\Omega}\sum_{i=1}^{s}p_{i}\langle\phi_{\lambda},w^{(i)}\rangle+\sum_{i=1}^{s}p_{i}\langle u,w^{(i)}\rangle\Bigr|=\Bigl|\sum_{i=1}^{s}p_{i}\Bigr|=|S(p)|.$

Together with the constraint in (3.9), it follows that $|S(p)|\leq\mu$ for every admissible vector $p\in\mathbb{R}^{s}$ for (3.9). Thus the value of $S(p)$ in (3.9) is greater than or equal to $-\mu$. Since the value of the primal problem (3.7) is the negative of the value of the dual problem (3.9), this shows that the value of (3.7) is at most $\mu$. As $\mu\in(0,1)$, this proves that the value of (3.7) is strictly smaller than $1$ and, as we have shown above, this proves assertion (3.5). ∎

Lemma 3.2.

Assume that $(x_{\star},h_{\star})\in\mathbb{X}\times\mathbb{H}$ is the unique $\mathcal{Q}$-minimizing solution of $\mathbf{M}(x,h)=(0,y_{\star})$ and suppose $\mathbf{W}^{*}u\in\partial\mathcal{R}(x_{\star})$, $\mathbf{A}^{*}v-u\in\partial\|h_{\star}\|_{1}$ for some $(u,v)\in\mathbb{H}\times\mathbb{Y}$. Then $(x_{\star},h_{\star})$ satisfies Condition 3.

Proof.

The restricted injectivity condition (3.5) follows from Lemma 3.1. Conditions (3.1), (3.2) are satisfied by assumption. Define now

$\Omega_{1}\coloneqq\Omega[\mathbf{A}^{*}v-u]\setminus\operatorname{supp}(h_{\star})=\{\lambda\in\Lambda\setminus\operatorname{supp}(h_{\star})\mid|\langle\phi_{\lambda},\mathbf{A}^{*}v-u\rangle|=1\}.$

Because $(\langle\phi_{\lambda},\mathbf{A}^{*}v-u\rangle)_{\lambda\in\Lambda}\in\ell^{2}(\Lambda)$, the set $\Omega_{1}$ is finite. Let $\theta\in\mathbb{Y}$ be as in Lemma 3.1 (b) and set

$\|\theta\|_{\infty}\coloneqq\sup\{|\langle\phi_{\lambda},\mathbf{A}^{*}\theta-u\rangle|\mid\lambda\in\Lambda\}$
$a\coloneqq m[\mathbf{A}^{*}v-u]/(2\|\theta\|_{\infty})$
$\hat{v}\coloneqq(1-a)v+a\theta.$

Note that $a\in(0,1/2]$. Then the following hold:

β€’ If $\lambda\in\operatorname{supp}(h_{\star})$, then

$\langle\phi_{\lambda},\mathbf{A}^{*}\hat{v}-u\rangle=(1-a)\langle\phi_{\lambda},\mathbf{A}^{*}v-u\rangle+a\langle\phi_{\lambda},\mathbf{A}^{*}\theta-u\rangle=\operatorname{sign}(\langle\phi_{\lambda},h_{\star}\rangle).$

β€’ If $\lambda\in\Omega_{1}$, then

$|\langle\phi_{\lambda},\mathbf{A}^{*}\hat{v}-u\rangle|\leq(1-a)|\langle\phi_{\lambda},\mathbf{A}^{*}v-u\rangle|+a|\langle\phi_{\lambda},\mathbf{A}^{*}\theta-u\rangle|<(1-a)+a=1.$

β€’ If $\lambda\in\Lambda\setminus(\operatorname{supp}(h_{\star})\cup\Omega_{1})$, then $|\langle\phi_{\lambda},\mathbf{A}^{*}v-u\rangle|\leq 1-m[\mathbf{A}^{*}v-u]$ by the definition (2.10) of $m[\cdot]$, and therefore

$|\langle\phi_{\lambda},\mathbf{A}^{*}\hat{v}-u\rangle|\leq(1-a)\bigl(1-m[\mathbf{A}^{*}v-u]\bigr)+a\|\theta\|_{\infty}\leq\bigl(1-m[\mathbf{A}^{*}v-u]\bigr)+m[\mathbf{A}^{*}v-u]/2=1-m[\mathbf{A}^{*}v-u]/2<1.$

Consequently $(u,\hat{v})$ satisfies (3.3), (3.4). ∎

Lemma 3.3.

Let $(\delta_{k})_{k\in\mathbb{N}}\in(0,\infty)^{\mathbb{N}}$ converge to $0$, let $(y_{k})_{k\in\mathbb{N}}\in\mathbb{Y}^{\mathbb{N}}$ satisfy $\|y_{k}-y_{\star}\|\leq\delta_{k}$ and let $(x_{k},h_{k})\in\operatorname{argmin}\mathcal{B}_{\alpha_{k},y_{k}}$ with $\alpha_{k}\geq C\delta_{k}$ for some $C>0$. Then $\|(x_{k},h_{k})-(x_{\star},h_{\star})\|\to 0$ and $\|\mathbf{M}(x_{k},h_{k})-(0,y_{k})\|=\mathcal{O}(\delta_{k})$ as $k\to\infty$ imply $\operatorname{ran}(\mathbf{M}^{*})\cap\partial\mathcal{Q}(x_{\star},h_{\star})\neq\emptyset$.

Proof.

See [12, Lemma 4.1]. The proof given there also applies to our situation. ∎

3.2 Main result

The following theorem is the main result of this section and shows that the source condition and restricted injectivity are in fact necessary for linear convergence.

Theorem 3.4 (Converse results).

Let $(x_{\star},h_{\star},y_{\star})\in\mathbb{X}\times\mathbb{H}\times\mathbb{Y}$ satisfy $\mathbf{A}h_{\star}=y_{\star}$ and $\mathbf{W}x_{\star}=h_{\star}$ and let Assumption B hold. Then the following statements are equivalent:

(i) $(x_{\star},h_{\star},y_{\star})$ satisfies Condition 3.

(ii) $(x_{\star},h_{\star},y_{\star})$ satisfies Condition 1.

(iii) $\exists\xi\in\partial\mathcal{R}(x_{\star})$ $\forall C>0$ $\exists c_{1},c_{2}>0\colon$ For $\alpha=C\delta$, $\|y^{\delta}-y_{\star}\|\leq\delta$, and $(x_{\alpha}^{\delta},h_{\alpha}^{\delta})\in\operatorname{argmin}\mathcal{B}_{\alpha,y^{\delta}}$ we have

$\mathcal{D}_{\xi}^{\mathcal{R}}(x_{\alpha}^{\delta},x_{\star})\leq c_{1}\delta$ (3.10)
$\|h_{\alpha}^{\delta}-h_{\star}\|\leq c_{2}\delta.$ (3.11)

(iv) $(x_{\star},h_{\star})$ is the unique $\mathcal{Q}$-minimizing solution of $\mathbf{M}(x,h)=(0,y_{\star})$ and $\forall C>0$ $\exists c_{3},c_{4}>0$ with

$\|\mathbf{A}h_{\alpha}^{\delta}-y_{\star}\|\leq c_{3}\delta$ (3.12)
$\|\mathbf{W}x_{\alpha}^{\delta}-h_{\alpha}^{\delta}\|\leq c_{4}\delta$ (3.13)

for $\|y^{\delta}-y_{\star}\|\leq\delta$, $\alpha=C\delta$ and $(x_{\alpha}^{\delta},h_{\alpha}^{\delta})\in\operatorname{argmin}\mathcal{B}_{\alpha,y^{\delta}}$.

Proof.

Item (i) obviously implies Item (ii). The implication (ii) $\Rightarrow$ (iii) has been shown in Theorem 2.7. The rate (3.11) implies that $h_{\star}$ is the second component of every $\mathcal{Q}$-minimizing solution of $\mathbf{M}(x,h)=(0,y_{\star})$. As $\mathcal{R}$ is strictly convex, (3.10) implies that $(x_{\star},h_{\star})$ is the unique $\mathcal{Q}$-minimizing solution of $\mathbf{M}(x,h)=(0,y_{\star})$. The rate in (3.12) follows trivially from (3.11), since $\mathbf{A}$ is linear and bounded. To prove (3.13) we choose $u\in\mathbb{H}$ with $\mathbf{W}^{*}u=\partial\mathcal{R}(x_{\star})$ and proceed similarly as in the proof of Lemma 2.2. Because $(x_{\alpha}^{\delta},h_{\alpha}^{\delta})\in\operatorname{argmin}\mathcal{B}_{\alpha,y^{\delta}}$,

$\|\mathbf{W}x_{\alpha}^{\delta}-h_{\alpha}^{\delta}\|^{2}+\|\mathbf{A}h_{\alpha}^{\delta}-y^{\delta}\|^{2}+2\alpha\mathcal{R}(x_{\alpha}^{\delta})+2\alpha\|h_{\alpha}^{\delta}\|_{1}\leq\|h_{\star}-h_{\alpha}^{\delta}\|^{2}+\|\mathbf{A}h_{\alpha}^{\delta}-y^{\delta}\|^{2}+2\alpha\mathcal{R}(x_{\star})+2\alpha\|h_{\alpha}^{\delta}\|_{1}$

and therefore

$\|\mathbf{W}x_{\alpha}^{\delta}-h_{\alpha}^{\delta}\|^{2}\leq(c_{2}\delta)^{2}+2\alpha\bigl(\mathcal{R}(x_{\star})-\mathcal{R}(x_{\alpha}^{\delta})\bigr).$

By the definition of the Bregman distance,

$\mathcal{R}(x_{\star})-\mathcal{R}(x_{\alpha}^{\delta})\leq-\mathcal{D}_{\mathbf{W}^{*}u}^{\mathcal{R}}(x_{\alpha}^{\delta},x_{\star})-\langle u,\mathbf{W}x_{\alpha}^{\delta}-h_{\star}\rangle$
$\quad\leq-\mathcal{D}_{\mathbf{W}^{*}u}^{\mathcal{R}}(x_{\alpha}^{\delta},x_{\star})+\|u\|\|\mathbf{W}x_{\alpha}^{\delta}-h_{\star}\|$
$\quad\leq-\mathcal{D}_{\mathbf{W}^{*}u}^{\mathcal{R}}(x_{\alpha}^{\delta},x_{\star})+\|u\|\|\mathbf{W}x_{\alpha}^{\delta}-h_{\alpha}^{\delta}\|+\|u\|c_{2}\delta.$

Since the Bregman distance is nonnegative, it follows that

$0\geq\|\mathbf{W}x_{\alpha}^{\delta}-h_{\alpha}^{\delta}\|^{2}-2\alpha\|u\|\|\mathbf{W}x_{\alpha}^{\delta}-h_{\alpha}^{\delta}\|-2\alpha\|u\|c_{2}\delta-(c_{2}\delta)^{2}$
$\quad=\bigl(\|\mathbf{W}x_{\alpha}^{\delta}-h_{\alpha}^{\delta}\|+c_{2}\delta\bigr)\bigl(\|\mathbf{W}x_{\alpha}^{\delta}-h_{\alpha}^{\delta}\|-2\alpha\|u\|-c_{2}\delta\bigr)$

and hence $\|\mathbf{W}x_{\alpha}^{\delta}-h_{\alpha}^{\delta}\|\leq(2C\|u\|+c_{2})\delta$.

Now let (iv) hold and let $\|y_{k}-y_{\star}\|\leq\delta_{k}$ with $\delta_{k}\to 0$. Choose $\alpha_{k}=C\delta_{k}$ and $(x_{k},h_{k})\in\operatorname{argmin}\mathcal{B}_{\alpha_{k},y_{k}}$. The uniqueness of $(x_{\star},h_{\star})$ implies $\|(x_{k},h_{k})-(x_{\star},h_{\star})\|\to 0$ as $k\to\infty$, see [11, Prop. 7]. Moreover,

$\|(\mathbf{W}x_{k}-h_{k},\mathbf{A}h_{k}-y_{k})\|\leq\|\mathbf{W}x_{k}-h_{k}\|+\|\mathbf{A}h_{k}-y_{k}\|\leq(c_{4}+c_{3}+1)\delta_{k}.$

Lemma 3.3 implies $\operatorname{ran}(\mathbf{M}^{*})\cap\partial\mathcal{Q}(x_{\star},h_{\star})\neq\emptyset$, which means that there exists $(u,v)\in\mathbb{H}\times\mathbb{Y}$ such that $\mathbf{W}^{*}u\in\partial\mathcal{R}(x_{\star})$ and $\mathbf{A}^{*}v-u\in\partial\|h_{\star}\|_{1}$. Lemma 3.2 finally shows that Condition 3 holds, which concludes the proof. ∎

3.3 Numerical example

We consider recovering a function from CS measurements of its primitive. The aim of this elementary example is to point out possible implementations of the two proposed models and to support the linear error estimates. A detailed comparison with other methods and an exploration of the limitations of each method are subjects of future research.

Figure 1: Top: x_{\star} (left) and \mathbf{W}x_{\star} (right). Middle: Reconstruction using relaxed (left) and strict (right) \ell^{1} co-regularization. Bottom: \|x_{\star}-x_{\alpha}^{\delta}\|^{2}/2 (left) and \|h_{\star}-h_{\alpha}^{\delta}\| (right) as functions of the noise level \delta.

The discrete operator \mathbf{W}\colon\mathbb{R}^{N}\to\mathbb{R}^{N} is taken as a discretization of the integration operator L^{2}[0,1]\to L^{2}[0,1]\colon f\mapsto\int_{0}^{t}f. The CS measurement matrix \mathbf{A}\in\mathbb{R}^{m\times N} is taken as a random Bernoulli matrix with entries in \{0,1\}. We apply strict and relaxed \ell^{1} co-regularization with \mathcal{R}=\|\,\cdot\,\|^{2}/2, (\phi_{\lambda})_{\lambda\in\Lambda} the Daubechies wavelet ONB with two vanishing moments, and \kappa_{\lambda}=1. For minimizing the relaxed \ell^{1} co-regularization functional we apply the Douglas-Rachford algorithm [6, Algorithm 4.2], and for strict \ell^{1} co-regularization we apply the ADMM algorithm [6, Algorithm 6.4] to the constrained formulation \operatorname{argmin}_{x,h}\{\|\mathbf{A}\mathbf{W}x-y^{\delta}\|^{2}/2+\alpha\|x\|^{2}/2+\alpha\|h\|_{1}\mid\mathbf{W}x=h\}.
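To make the setup concrete, the following Python sketch discretizes the problem under several simplifying assumptions: it minimizes a relaxed joint functional of the assumed form \|\mathbf{A}h-y^{\delta}\|^{2}/2+\lambda\|\mathbf{W}x-h\|^{2}/2+\alpha\|x\|^{2}/2+\alpha\|h\|_{1} by a plain proximal-gradient iteration instead of the Douglas-Rachford and ADMM algorithms used in our experiments, and it imposes sparsity in the canonical basis instead of the Daubechies wavelet ONB. The helper relaxed_l1_coreg, the coupling weight lam, and all problem sizes are illustrative choices, not those of the paper.

import numpy as np

def soft(v, t):
    # soft-thresholding, the proximal map of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def relaxed_l1_coreg(A, W, y, alpha, lam=1.0, iters=5000):
    # proximal gradient for the (assumed) relaxed functional
    #   ||A h - y||^2/2 + lam*||W x - h||^2/2 + alpha*||x||^2/2 + alpha*||h||_1
    N = A.shape[1]
    # Lipschitz constant of the smooth part = spectral norm of its block Hessian
    H = np.block([[lam * W.T @ W + alpha * np.eye(N), -lam * W.T],
                  [-lam * W, A.T @ A + lam * np.eye(N)]])
    tau = 1.0 / np.linalg.norm(H, 2)
    x, h = np.zeros(N), np.zeros(N)
    for _ in range(iters):
        gx = lam * W.T @ (W @ x - h) + alpha * x    # gradient in x
        gh = A.T @ (A @ h - y) - lam * (W @ x - h)  # gradient in h
        x = x - tau * gx
        h = soft(h - tau * gh, tau * alpha)         # prox step on the l1 term
    return x, h

rng = np.random.default_rng(0)
N, m = 128, 64                                      # illustrative sizes
t = np.arange(1, N + 1) / N
W = np.tril(np.ones((N, N))) / N                    # discrete integration operator
A = rng.integers(0, 2, size=(m, N)).astype(float)   # Bernoulli matrix, entries {0,1}
x_true = np.piecewise(t, [t < 0.3, (t >= 0.3) & (t < 0.7), t >= 0.7], [1.0, -0.5, 0.0])
h_true = W @ x_true
delta = 1e-3
z = rng.standard_normal(m)
y = A @ h_true + delta * z / np.linalg.norm(z)      # perturbation of norm delta
x_rec, h_rec = relaxed_l1_coreg(A, W, y, alpha=delta)
print(np.linalg.norm(x_rec - x_true), np.linalg.norm(h_rec - h_true))

The step size 1/\|H\| guarantees convergence of the forward-backward iteration, since the smooth part is quadratic with Hessian H; the regularization parameter is coupled to the noise level as \alpha\sim\delta, in line with the parameter choice used in the theory.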

Results are shown in Figure 1. The top row depicts the targeted signal x_{\star}\in\mathbb{R}^{N} (left), for which \mathbf{W}x_{\star} is sparsely represented by (\phi_{\lambda})_{\lambda\in\Lambda} (right). The middle row shows reconstructions using the relaxed (left) and the strict (right) \ell^{1} co-regularization from noisy data with \|y^{\delta}-y_{\star}\|\leq 10^{-5}. The bottom row plots \mathcal{D}_{x_{\star}}^{\mathcal{R}}(x_{\alpha}^{\delta},x_{\star}) and \|h^{\delta}_{\alpha}-\mathbf{W}x_{\star}\| as functions of the noise level. Note that for \mathcal{R}=\|\,\cdot\,\|^{2}/2 the Bregman distance is given by \mathcal{D}_{x_{\star}}^{\mathcal{R}}(x_{\alpha}^{\delta},x_{\star})=\|x_{\alpha}^{\delta}-x_{\star}\|^{2}/2. Both error plots show a linear convergence rate, supporting Theorems 2.7, 2.8 and 3.4.
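A sweep over noise levels in the spirit of the bottom row of Figure 1 can be sketched as follows, continuing the code above (relaxed_l1_coreg, W, A, m, rng, x_true and h_true are assumed from there); an empirical log-log slope close to one for the h-error is consistent with the linear rate \mathcal{O}(\delta).

deltas = np.logspace(-4, -1, 8)
errs_x, errs_h = [], []
for delta in deltas:
    z = rng.standard_normal(m)
    y = A @ h_true + delta * z / np.linalg.norm(z)          # noise of norm exactly delta
    x_rec, h_rec = relaxed_l1_coreg(A, W, y, alpha=delta)   # alpha coupled to delta
    errs_x.append(0.5 * np.linalg.norm(x_rec - x_true) ** 2)  # Bregman distance for R = ||.||^2/2
    errs_h.append(np.linalg.norm(h_rec - h_true))
# empirical convergence order: slope of log(error) against log(delta)
print(np.polyfit(np.log(deltas), np.log(errs_h), 1)[0])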

4 Conclusion

While the theory of CS on direct data is well developed, this is by far not the case when compressed measurements are made on indirect data. To address this gap, in this paper we studied CS from indirect data written as the composite problem y^{\delta}=\mathbf{A}\mathbf{W}x_{\star}+z^{\delta}, where \mathbf{A} models the CS measurement operator and \mathbf{W} the forward model generating the indirect data, depending on the application at hand. For signal reconstruction we have proposed two novel reconstruction methods, named relaxed and strict \ell^{1} co-regularization, for jointly estimating x_{\star} and h_{\star}=\mathbf{W}x_{\star}. The main conceptual difference between the proposed methods and standard CS is that we use the \ell^{1} penalty on the indirect data \mathbf{W}x_{\star} instead of on x_{\star}, together with another penalty on x_{\star} accounting for the ill-posed inversion of \mathbf{W}, and recover both unknowns jointly.

As the main results, for both reconstruction models we derived linear error estimates under a source condition and a restricted injectivity condition (see Theorems 2.7 and 2.8). Moreover, these conditions have been shown to be even necessary for such estimates (see Theorem 3.4). Our results have been illustrated by a simple numerical example combining CS and numerical differentiation. In future work, detailed numerical investigations are in order, comparing our models with standard CS approaches in practically important applications and demonstrating the strengths and limitations of the different methods. Potential applications include magnetic resonance imaging [2, 15] and photoacoustic tomography [16, 17].

Acknowledgment

The presented work of A.E. and M.H. has been supported by the Austrian Science Fund (FWF), project P 30747-N32.

References

  • [1] R.Β Baraniuk, M.Β Davenport, R.Β DeVore, and M.Β Wakin. A simple proof of the restricted isometry property for random matrices. Constructive Approximation, 28(3):253–263, 2008.
  • [2] K. T. Block, M. Uecker, and J. Frahm. Undersampled radial MRI with multiple coils: Iterative image reconstruction using a total variation constraint. Magnetic Resonance in Medicine, 57(6):1086–1098, 2007.
  • [3] M. Burger and S. Osher. Convergence rates of convex variational regularization. Inverse Problems, 20(5):1411, 2004.
  • [4] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, 2006.
  • [5] E. J. Candès, J. K. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8):1207–1223, 2006.
  • [6] P.Β L. Combettes and J.Β C. Pesquet. Proximal splitting methods in signal processing. Springer Optimization and Its Applications, 49:185–212, 2011.
  • [7] D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.
  • [8] J.Β Flemming. Variational Source Conditions, Quadratic Inverse Problems, Sparsity Promoting Regularization: New Results in Modern Theory of Inverse Problems and an Application in Laser Optics. Springer, 2018.
  • [9] S.Β Foucart and H.Β Rauhut. A mathematical introduction to compressive sensing. Applied and Numerical Harmonic Analysis. BirkhΓ€user/Springer, New York, 2013.
  • [10] J.-J. Fuchs. On sparse representations in arbitrary redundant bases. IEEE Transactions on Information Theory, 50(6):1341–1344, 2004.
  • [11] M.Β Grasmair, M.Β Haltmeier, and O.Β Scherzer. Sparse regularization with β„“p\ell^{p} penalty term. Inverse Problems, 24(5), 2008.
  • [12] M.Β Grasmair, O.Β Scherzer, and M.Β Haltmeier. Necessary and sufficient conditions for linear convergence of β„“1\ell^{1}-regularization. Communications on Pure and Applied Mathematics, 64(2):161–182, 2011.
  • [13] B.Β Jin, D.Β A. Lorenz, and S.Β Schiffler. Elastic-net regularization: error estimates and active set methods. Inverse Problems, 25(11):115022, 2009.
  • [14] D. Lorenz. Convergence rates and source conditions for Tikhonov regularization with sparsity constraints. Journal of Inverse and Ill-Posed Problems, 16(5), 2008.
  • [15] M. Lustig, D. Donoho, and J. M. Pauly. Sparse MRI: The application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine, 58(6):1182–1195, 2007.
  • [16] J. Provost and F. Lesage. The application of compressed sensing for photo-acoustic tomography. IEEE Transactions on Medical Imaging, 28(4):585–594, 2008.
  • [17] M.Β Sandbichler, F.Β Krahmer, T.Β Berer, P.Β Burgholzer, and M.Β Haltmeier. A novel compressed sensing scheme for photoacoustic tomography. SIAM Journal on Applied Mathematics, 75(6):2475–2494, 2015.
  • [18] O.Β Scherzer, M.Β Grasmair, H.Β Grossauer, M.Β Haltmeier, and F.Β Lenzen. Variational methods in imaging. Springer, 2009.
  • [19] H.Β Zou and T.Β Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society. Series B: Statistical Methodology, 67(2):301–320, 2005.