
Optimal Recovery from Inaccurate Data in Hilbert Spaces:
Regularize, but what of the Parameter?  

Simon Foucart and Chunyang Liao — Texas A&M University
(S. F. is supported by grants from the NSF (CCF-1934904, DMS-2053172) and from the ONR (N00014-20-1-2787).)
Abstract

In Optimal Recovery, the task of learning a function from observational data is tackled deterministically by adopting a worst-case perspective tied to an explicit model assumption made on the functions to be learned. Working in the framework of Hilbert spaces, this article considers a model assumption based on approximability. It also incorporates observational inaccuracies modeled via additive errors bounded in 2\ell_{2}. Earlier works have demonstrated that regularization provides algorithms that are optimal in this situation, but did not fully identify the desired hyperparameter. This article fills the gap in both a local scenario and a global scenario. In the local scenario, which amounts to the determination of Chebyshev centers, the semidefinite recipe of Beck and Eldar (legitimately valid in the complex setting only) is complemented by a more direct approach, with the proviso that the observational functionals have orthonormal representers. In the said approach, the desired parameter is the solution to an equation that can be resolved via standard methods. In the global scenario, where linear algorithms rule, the parameter elusive in the works of Micchelli et al. is found as the byproduct of a semidefinite program. Additionally and quite surprisingly, in the case of observational functionals with orthonormal representers, it is established that any regularization parameter is optimal.

Key words and phrases: Regularization, Chebyshev center, semidefinite programming, S-procedure, hyperparameter selection.

AMS classification: 41A65, 46N40, 90C22, 90C47.

 

1 Introduction

1.1 Background on Optimal Recovery

This article is concerned with a central problem in Data Science, namely: a function ff is acquired through point evaluations

(1) yi=f(x(i)),i=1,,m,y_{i}=f(x^{(i)}),\qquad i=1,\ldots,m,

and these data should be used to learn ff—or to recover it, with the terminology preferred in this article. Importantly, the evaluation points x(1),,x(m)x^{(1)},\ldots,x^{(m)} are considered fixed entities in our scenario: they cannot be chosen in a favorable way, as in Information-Based Complexity [Novak and Woźniakowski, 2008], nor do they occur as independent realizations of a random variable, as in Statistical Learning Theory [Hastie, Tibshirani, and Friedman, 2009]. In particular, without an underlying probability distribution, the performance of the recovery process cannot be assessed via generalization error. Instead, it is assessed via a notion of worst-case error, central to the theory of Optimal Recovery [Micchelli and Rivlin, 1977].

To outline this theory, we make the framework slightly more abstract. Precisely, given a normed space FF, the unknown function is replaced by an element fFf\in F. This element is accessible only through a priori information expressing an educated belief about ff and a posteriori information akin to (1). In other words, our partial knowledge about ff is summed up via

  • the fact that f𝒦f\in\mathcal{K} for a subset 𝒦\mathcal{K} of FF called a model set;

  • the observational data yi=λi(f)y_{i}=\lambda_{i}(f), i=1,,mi=1,\ldots,m, for some linear functionals λ1,,λmF\lambda_{1},\ldots,\lambda_{m}\in F^{*} making up the observation map Λ:gF[λ1(g);;λm(g)]m\Lambda:g\in F\mapsto[\lambda_{1}(g);\ldots;\lambda_{m}(g)]\in\mathbb{R}^{m}.

We wish to approximate ff by some f^F\widehat{f}\in F produced using this partial knowledge of ff. Since the error ff^\|f-\widehat{f}\| involves the unknown ff, which is only accessible via f𝒦f\in\mathcal{K} and Λ(f)=y\Lambda(f)=y, we take a worst-case perspective leading to the local worst-case error

(2) lwce(y,f^):=supf𝒦Λ(f)=yff^.{\rm lwce}(y,\widehat{f}):=\sup_{\begin{subarray}{c}f\in\mathcal{K}\\ \Lambda(f)=y\end{subarray}}\|f-\widehat{f}\|.

Our objective consists in finding an element f^\widehat{f} that minimizes lwce(y,f^){\rm lwce}(y,\widehat{f}). Such an f^\widehat{f} can be described, almost tautologically, as a center of a smallest ball containing 𝒦Λ1({y})\mathcal{K}\cap\Lambda^{-1}(\{y\}). It is called a Chebyshev center of this set of model- and data-consistent elements. This remark, however, does not come with any practical construction of a Chebyshev center.

The term local was used above to make a distinction with the global worst-case error of a recovery map Δ(=Δ𝒦):mF\Delta(=\Delta_{\mathcal{K}}):\mathbb{R}^{m}\to F, defined as

(3) gwce(Δ):=supyΛ(𝒦)lwce(y,Δ(y))=supf𝒦fΔ(Λ(f)).{\rm gwce}(\Delta):=\sup_{y\in\Lambda(\mathcal{K})}{\rm lwce}(y,\Delta(y))=\sup_{f\in\mathcal{K}}\|f-\Delta(\Lambda(f))\|.

The minimal value of gwce(Δ){\rm gwce}(\Delta) is called the intrinsic error (of the observation map Λ\Lambda over the model set 𝒦\mathcal{K}) and the maps Δ\Delta that achieve this minimal value are called globally optimal recovery maps. Our objective consists in constructing such maps—of course, the map that assigns to yy a Chebyshev center of 𝒦Λ1({y})\mathcal{K}\cap\Lambda^{-1}(\{y\}) is one of them, but it may be impractical. By contrast, for model sets that are convex and symmetric, the existence of linear maps among the set of globally optimal recovery maps is guaranteed by fundamental results from Optimal Recovery in at least two settings: when FF is a Hilbert space and when FF is an arbitrary normed space but the full recovery of ff gives way to the recovery of a quantity of interest Q(f)Q(f), QQ being a linear functional. We refer the readers to [Foucart, To appear, Chapter 9] for details.

1.2 The specific problem

The problem solved in this article is a quintessential Optimal Recovery problem—its specificity lies in the particular model set and in the incorporation of errors in the observation process. The underlying normed space FF is a Hilbert space and is therefore denoted by HH from now on. Reproducing kernel Hilbert spaces, whose usage is widespread in Data Science [Schölkopf and Smola, 2002], are of particular interest as point evaluations of type (1) make perfect sense there.

Concerning the model set, we concentrate on an approximation-based choice that is increasingly scrutinized, see e.g. [Maday, Patera, Penn, and Yano, 2015], [DeVore, Petrova, and Wojtaszczyk, 2017] and [Cohen, Dahmen, Mula, and Nichols, 2020]. Depending on a linear subspace 𝒱\mathcal{V} of HH and on a parameter ε>0\varepsilon>0, it takes the form

𝒦={fH:dist(f,𝒱)ε}.\mathcal{K}=\{f\in H:{\rm dist}(f,\mathcal{V})\leq\varepsilon\}.

Binev, Cohen, Dahmen, DeVore, Petrova, and Wojtaszczyk [2017] completely solved the Optimal Recovery problem with exact data in this situation (locally and globally). Precisely, they showed that the solution f^\widehat{f} to

(4) minimizefHdist(f,𝒱)s.toΛ(f)=y,\underset{f\in H}{\rm minimize}\,\;{\rm dist}(f,\mathcal{V})\qquad\mbox{s.to}\quad\Lambda(f)=y,

which clearly belongs to the model- and data-consistent set 𝒦Λ1({y})\mathcal{K}\cap\Lambda^{-1}(\{y\}), turns out to be its Chebyshev center. Moreover, with P𝒱P_{\mathcal{V}} and P𝒱P_{\mathcal{V}^{\perp}} denoting the orthogonal projectors onto 𝒱\mathcal{V} and onto the orthogonal complement 𝒱\mathcal{V}^{\perp} of 𝒱\mathcal{V}, the fact that dist(f,𝒱)=fP𝒱f=P𝒱f{\rm dist}(f,\mathcal{V})=\|f-P_{\mathcal{V}}f\|=\|P_{\mathcal{V}^{\perp}}f\| makes the optimization program (4) tractable. It can actually be seen that Δ:yf^\Delta:y\mapsto\widehat{f} is a linear map. This is a significant advantage because Δ\Delta can then be precomputed in an offline stage knowing only 𝒱\mathcal{V} and Λ\Lambda and the program (4) need not be solved afresh for each new data ymy\in\mathbb{R}^{m} arriving in an online stage.

Concerning the observation process, instead of exact data y=Λ(f)my=\Lambda(f)\in\mathbb{R}^{m}, it is now assumed that

y=Λ(f)+emy=\Lambda(f)+e\in\mathbb{R}^{m}

for some unknown error vector eme\in\mathbb{R}^{m}. This error vector is not modeled as random noise but through the deterministic 2\ell_{2}-bound e2η\|e\|_{2}\leq\eta. Although other p\ell_{p}-norms can be considered for the optimal recovery of Q(f0)Q(f_{0}) when QQ is a linear functional on an arbitrary normed space FF (see [Ettehad and Foucart, 2021]), here the arguments rely critically on m\mathbb{R}^{m} being endowed with the 2\ell_{2}-norm. It will be written simply as \|\cdot\| below, hoping that it does not create confusion with the Hilbert norm on HH.

For our specific problem, the worst-case recovery errors (2) and (3) need to be adjusted. The local worst-case recovery error at yy for f^\widehat{f} becomes

lwce(y,f^)=supP𝒱fεΛ(f)yηff^.{\rm lwce}(y,\widehat{f})=\sup_{\begin{subarray}{c}\|P_{\mathcal{V}^{\perp}}f\|\leq\varepsilon\\ \|\Lambda(f)-y\|\leq\eta\end{subarray}}\|f-\widehat{f}\|.

As for the global worst-case error of Δ:mH\Delta:\mathbb{R}^{m}\to H, it reads

gwce(Δ)=supP𝒱fεeηfΔ(Λ(f)+e).{\rm gwce}(\Delta)=\sup_{\begin{subarray}{c}\|P_{\mathcal{V}^{\perp}}f\|\leq\varepsilon\\ \|e\|\leq\eta\end{subarray}}\|f-\Delta(\Lambda(f)+e)\|.

Note that both worst-case errors are infinite if one can find a nonzero hh in 𝒱ker(Λ)\mathcal{V}\cap\ker(\Lambda). Indeed, the element ft:=f+thf_{t}:=f+th, tt\in\mathbb{R}, obeys P𝒱ft=P𝒱fε\|P_{\mathcal{V}^{\perp}}f_{t}\|=\|P_{\mathcal{V}^{\perp}}f\|\leq\varepsilon and yΛ(ft)=yΛ(f)η\|y-\Lambda(f_{t})\|=\|y-\Lambda(f)\|\leq\eta, so for instance lwce(y,f^)suptftf^=+{\rm lwce}(y,\widehat{f})\geq\sup_{t\in\mathbb{R}}\|f_{t}-\widehat{f}\|=+\infty. Thus, we always make the assumption that

(5) 𝒱ker(Λ)={0}.\mathcal{V}\cap\ker(\Lambda)=\{0\}.

We keep in mind that the latter forces n:=dim(𝒱)mn:=\dim(\mathcal{V})\leq m, as can be seen by dimension arguments. With Λ\Lambda^{*} denoting the Hermitian adjoint of Λ\Lambda, another assumption that we sometimes make reads

(6) ΛΛ=Idm.\Lambda\Lambda^{*}=\mathrm{Id}_{\mathbb{R}^{m}}.

This is not extremely stringent: assuming the surjectivity of Λ\Lambda is quite natural, otherwise certain observations need not be collected; then the map Λ\Lambda can be preprocessed into another map Λ~\widetilde{\Lambda} satisfying Λ~Λ~=Idm\widetilde{\Lambda}\widetilde{\Lambda}^{*}=\mathrm{Id}_{\mathbb{R}^{m}} by setting Λ~=(ΛΛ)1/2Λ\widetilde{\Lambda}=(\Lambda\Lambda^{*})^{-1/2}\Lambda. Incidentally, if u1,,umHu_{1},\ldots,u_{m}\in H denote the Riesz representers of the observation functionals λ1,,λmH\lambda_{1},\ldots,\lambda_{m}\in H^{*}, characterized by ui,f=λi(f)\langle u_{i},f\rangle=\lambda_{i}(f) for all fHf\in H, then the assumption (6) is equivalent to the orthonormality of the system (u1,,um)(u_{1},\ldots,u_{m}). In a reproducing kernel Hilbert space with kernel KK, if the λi\lambda_{i}’s are point evaluations at some x(i)x^{(i)}’s, so that ui=K(,x(i))u_{i}=K(\cdot,x^{(i)}), then (6) is equivalent to K(x(i),x(j))=δi,jK(x^{(i)},x^{(j)})=\delta_{i,j} for all i,j=1,,mi,j=1,\ldots,m. This occurs e.g. for the Paley–Wiener space of functions with Fourier transform supported on [π,π][-\pi,\pi] when the evaluation points come from an integer grid, since the kernel is given by K(x,x)=sinc(π(xx))K(x,x^{\prime})={\rm sinc}(\pi(x-x^{\prime})), x,xx,x^{\prime}\in\mathbb{R}.
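In finite dimensions, this preprocessing is a one-line computation. The snippet below is a minimal numpy sketch, illustrative only and not part of the reproducible files mentioned later: it identifies H with R^N and stores Λ as an m-by-N matrix L whose rows represent the observation functionals; the observed data would of course have to be transformed by the same matrix (ΛΛ*)^{-1/2}.

```python
import numpy as np

def orthonormalize_observations(L):
    """Return Lt = (L L^T)^{-1/2} L, so that Lt @ Lt.T is the identity.
    L is an m-by-N matrix representing a surjective observation map,
    i.e., the Gram matrix L @ L.T is assumed to be invertible."""
    w, V = np.linalg.eigh(L @ L.T)                 # Gram matrix Lambda Lambda^*
    G_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    return G_inv_sqrt @ L

# quick sanity check on random data
rng = np.random.default_rng(0)
L = rng.standard_normal((4, 10))                   # m = 4 observations on H = R^10
Lt = orthonormalize_observations(L)
print(np.allclose(Lt @ Lt.T, np.eye(4)))           # True
```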

1.3 Main results

There are previous works on Optimal Recovery in Hilbert spaces in the presence of observation error bounded in 2\ell_{2}. Notably, [Beck and Eldar, 2007] dealt with the local setting, while [Melkman and Micchelli, 1979] and [Micchelli, 1993] dealt with the global setting. These works underline the importance of regularization, which is prominent in many other settings [Chen and Haykin, 2002]. They establish that the optimal recovery maps are obtained by solving the unconstrained program

(7) minimizefH(1τ)P𝒱f2+τΛfy2\underset{f\in H}{\rm minimize}\,\;(1-\tau)\|P_{\mathcal{V}^{\perp}}f\|^{2}+\tau\|\Lambda f-y\|^{2}

for some τ[0,1]\tau\in[0,1]. It is the precise choice of this regularization parameter τ\tau which is the purpose of this article. Assuming from now on that HH is finite dimensional (it is likely that the results are still valid in the infinite-dimensional case, but then it is unclear how to solve (8) and (9) numerically, so the infinite-dimensional case is not given proper scrutiny in the rest of the article), we provide an (almost) complete picture of the local and global Optimal Recovery solutions, as summarized in the four points below, three of them being new:

  1. L1.

    With HH restricted here to be a complex Hilbert space, the Chebyshev center of the set {fH:P𝒱fε,Λfyη}\{f\in H:\|P_{\mathcal{V}^{\perp}}f\|\leq\varepsilon,\|\Lambda f-y\|\leq\eta\} is the minimizer of (7) for the choice τ=d/(c+d)\tau=d_{\sharp}/(c_{\sharp}+d_{\sharp}), where c,dc_{\sharp},d_{\sharp} are solutions to the semidefinite program333In the statement of this semidefinite program and elsewhere, the notation T0T\succeq 0 means that an operator TT is positive semidefinite on HH, i.e., that Tf,f0\langle Tf,f\rangle\geq 0 for all fHf\in H.

    minimizec,d,t0ε2c+(η2y2)d+t\displaystyle\underset{c,d,t\geq 0}{\rm minimize}\,\;\varepsilon^{2}c+(\eta^{2}-\|y\|^{2})d+t s.to cP𝒱+dΛΛId,\displaystyle cP_{\mathcal{V}^{\perp}}+d\Lambda^{*}\Lambda\succeq\mathrm{Id},
    and [cP𝒱+dΛΛ|dΛyd(Λy)|t]0.\displaystyle\begin{bmatrix}cP_{\mathcal{V}^{\perp}}+d\Lambda^{*}\Lambda&|&-d\Lambda^{*}y\\ \hline\cr-d(\Lambda^{*}y)^{*}&|&t\end{bmatrix}\succeq 0.
  2. L2.

    Under the orthonormal observations assumption (6) but without the above restriction on HH, the Chebyshev center of the set {fH:P𝒱fε,Λfyη}\{f\in H:\|P_{\mathcal{V}^{\perp}}f\|\leq\varepsilon,\|\Lambda f-y\|\leq\eta\} is the minimizer of (7) for the choice τ\tau that satisfies

    (8) λmin((1τ)P𝒱+τΛΛ)=(1τ)2ε2τ2η2(1τ)ε2τη2+(1τ)τ(12τ)δ2,\lambda_{\min}((1-\tau)P_{\mathcal{V}^{\perp}}+\tau\Lambda^{*}\Lambda)=\frac{(1-\tau)^{2}\varepsilon^{2}-\tau^{2}\eta^{2}}{(1-\tau)\varepsilon^{2}-\tau\eta^{2}+(1-\tau)\tau(1-2\tau)\delta^{2}},

    where δ\delta is precomputed as δ=min{P𝒱f:Λf=y}=min{Λfy:f𝒱}\delta=\min\{\|P_{\mathcal{V}^{\perp}}f\|:\Lambda f=y\}=\min\{\|\Lambda f-y\|:f\in\mathcal{V}\}. For the distinct case 𝒱={0}\mathcal{V}=\{0\}, the best choice of parameter is more simply τ=max{1η/y,0}\tau=\max\{1-\eta/\|y\|,0\}.

  3. G1.

    A globally optimal recovery map is provided by the linear map sending ymy\in\mathbb{R}^{m} to the minimizer of (7) with parameter τ=d/(c+d)\tau=d_{\flat}/(c_{\flat}+d_{\flat}), where c,dc_{\flat},d_{\flat} are solutions to the semidefinite program

    (9) minimizec,dε2c+η2ds.tocP𝒱+dΛΛId.\underset{c,d}{\rm minimize}\,\;\varepsilon^{2}c+\eta^{2}d\qquad\mbox{s.to}\quad cP_{\mathcal{V}^{\perp}}+d\Lambda^{*}\Lambda\succeq\mathrm{Id}.
  4. G2.

    Under the orthonormal observations assumption (6), the linear map sending ymy\in\mathbb{R}^{m} to the minimizer of (7) is a globally optimal recovery map for any choice of parameter τ[0,1]\tau\in[0,1].

Before entering the technicalities, a few comments are in order to put these results in context. Item L1 is the result of [Beck and Eldar, 2007] (see Corollary 3.2 there) adapted to our situation. It relies on an extension of the S-lemma involving two quadratic constraints. This extension is valid in the complex finite-dimensional setting, but not necessarily in the real setting, hence the restriction on HH (this does not preclude the validity of the result in the real setting, though). It is worth pointing out the nonlinearity of the map that sends ymy\in\mathbb{R}^{m} to the above Chebyshev center. Incidentally, we can safely talk about the Chebyshev center, because it is known [Garkavi, 1962] that a bounded set in a uniformly convex Banach space has exactly one Chebyshev center. A sketch of the argument adapted to our situation is presented in the appendix.

For item L2, working with an observation map Λ\Lambda satisfying ΛΛ=Idm\Lambda\Lambda^{*}=\mathrm{Id}_{\mathbb{R}^{m}} allows us to construct the Chebyshev center even in the setting of a real Hilbert space. This is possible because our argument does not rely on the extension of the S-lemma—it just uses the obvious implication. As for equation (8), it is easily solved using the bisection method or the Newton/secant method. Moreover, it gives some insight into the value of the optimal parameter τ\tau. For instance, the proof reveals that τ\tau is always between 1/21/2 and ε/(ε+η)\varepsilon/(\varepsilon+\eta). When εη\varepsilon\geq\eta, say, the optimal parameter should then satisfy τ1/2\tau\geq 1/2, which is somewhat intuitive: εη\varepsilon\geq\eta means that there is more model mismatch than data mismatch, so the regularization should penalize model fidelity less than data fidelity by taking 1ττ1-\tau\leq\tau, i.e., τ1/2\tau\geq 1/2. As an aside, we point out that, here too, the map that sends ymy\in\mathbb{R}^{m} to the Chebyshev center is not a linear map—if it were, then the optimal parameter would be independent of yy.

In contrast, the globally optimal recovery map of item G1 is linear. It is one of several globally optimal recovery maps, since the locally optimal one (which is nonlinear) is also globally optimal. However, as revealed in the reproducible files accompanying this article (matlab and Python files illustrating the findings of this article are located at https://github.com/foucart/COR), it is in general the only regularization map that turns out to be globally optimal. The fact that regularization produces globally optimal recovery maps was recognized by Micchelli, who wrote in the abstract of [Micchelli, 1993] that “the regularization parameter must be chosen with care”. However, a recipe for selecting the parameter was not given there, except on a specific example. The closest to a nonexhaustive search is found in [Plaskota, 1996, Lemma 2.6.2] for the case 𝒱={0}\mathcal{V}=\{0\}, but even this result does not translate into a numerically tractable recipe. The selection stemming from (9) does, at least when HH is finite-dimensional, which is assumed here. Semidefinite programs can indeed be solved in matlab using CVX [Grant and Boyd, 2014] and in Python using CVXPY [Diamond and Boyd, 2016].

Finally, a surprise arises in item G2. Working with an observation map Λ\Lambda satisfying ΛΛ=Idm\Lambda\Lambda^{*}=\mathrm{Id}_{\mathbb{R}^{m}}, the latter indeed reveals that the regularization parameter does not need to be chosen with care after all, since regularization maps are globally optimal no matter how the parameter τ[0,1]\tau\in[0,1] is chosen. The precise interpretation of the choices τ=0\tau=0 and τ=1\tau=1 will be elucidated later.

The rest of this article is organized as follows. Section 2 gathers some auxiliary results that are used in the proofs of the main results. Section 3 elucidates item L1 and establishes item L2—in other words, it is concerned with local optimality. Section 4, which is concerned with global optimality, is the place where items G1 and G2 are proved. Lastly, a short appendix containing some side information is included after the bibliography.

2 Technical Preparation

This section establishes (or recalls) a few results that we isolate here in order not to disrupt the flow of subsequent arguments.

2.1 S-lemma and S-procedure

Loosely speaking, the S-procedure is a relaxation technique expressing the fact that a quadratic inequality is a consequence of some quadratic constraints. In case of a single quadratic constraint, the relaxation turns out to be exact. This result, known as the S-lemma, can be stated as follows: given quadratic functions q0q_{0} and q1q_{1} defined on 𝕂N\mathbb{K}^{N}, with 𝕂=\mathbb{K}=\mathbb{R} or 𝕂=\mathbb{K}=\mathbb{C},

[q0(x)0 whenever q1(x)0][there exists a0:q0aq1],[q_{0}(x)\leq 0\;\mbox{ whenever }q_{1}(x)\leq 0]\iff[\mbox{there exists }a\geq 0:q_{0}\leq aq_{1}],

provided q1(x~)<0q_{1}(\widetilde{x})<0 for some x~𝕂N\widetilde{x}\in\mathbb{K}^{N}. With more than one quadratic constraint, q1,,qkq_{1},\ldots,q_{k}, say, q0(x)0q_{0}(x)\leq 0 whenever q1(x)0,,qk(x)0q_{1}(x)\leq 0,\ldots,q_{k}(x)\leq 0 is still a consequence of q0a1q1++akqkq_{0}\leq a_{1}q_{1}+\cdots+a_{k}q_{k} for some a1,,ak0a_{1},\ldots,a_{k}\geq 0, but the reverse implication does not hold anymore. There is a subtlety when k=2k=2, as the reverse implication holds for 𝕂=\mathbb{K}=\mathbb{C} but not for 𝕂=\mathbb{K}=\mathbb{R}, see [Pólik and Terlaky, 2007, Section 3]. However, if the quadratic constraints do not feature linear terms, then the reverse implication holds for k=2k=2 also when 𝕂=\mathbb{K}=\mathbb{R}. Since this result of [Polyak, 1998, Theorem 4.1] is to be invoked later, we state it formally below.

Theorem 1.

Suppose that N3N\geq 3 and that quadratic functions q0,q1,q2q_{0},q_{1},q_{2} on N\mathbb{R}^{N} take the form qi(x)=Aix,x+αiq_{i}(x)=\langle A_{i}x,x\rangle+\alpha_{i} for symmetric matrices A0,A1,A2N×NA_{0},A_{1},A_{2}\in\mathbb{R}^{N\times N} and scalars α0,α1,α2\alpha_{0},\alpha_{1},\alpha_{2}\in\mathbb{R}. Then

[q0(x)0 whenever q1(x)0 and q2(x)0][there exist a1,a20:q0a1q1+a2q2],[q_{0}(x)\leq 0\;\mbox{ whenever }q_{1}(x)\leq 0\mbox{ and }q_{2}(x)\leq 0]\iff[\mbox{there exist }a_{1},a_{2}\geq 0:q_{0}\leq a_{1}q_{1}+a_{2}q_{2}],

provided q1(x~)<0q_{1}(\widetilde{x})<0 and q2(x~)<0q_{2}(\widetilde{x})<0 for some x~N\widetilde{x}\in\mathbb{R}^{N} and b1A1+b2A20b_{1}A_{1}+b_{2}A_{2}\succ 0 for some b1,b2b_{1},b_{2}\in\mathbb{R}.

2.2 Regularization

In this subsection, we take a closer look at the regularization program (7). The result below shows that its solution depends linearly on ymy\in\mathbb{R}^{m}. In fact, the result covers a slightly more general program and the linearity claim follows by taking R=P𝒱R=P_{\mathcal{V}^{\perp}}, r=0r=0, S=ΛS=\Lambda, and s=ys=y.

Proposition 2.

Let R,SR,S be linear maps from HH into other Hilbert spaces containing points r,sr,s, respectively. For τ(0,1)\tau\in(0,1), the optimization program

(10) minimizefH(1τ)Rfr2+τSfs2\underset{f\in H}{\rm minimize}\,\;(1-\tau)\|Rf-r\|^{2}+\tau\|Sf-s\|^{2}

has solutions fτHf_{\tau}\in H characterized by

(11) ((1τ)RR+τSS)fτ=(1τ)Rr+τSs.\big{(}(1-\tau)R^{*}R+\tau S^{*}S\big{)}f_{\tau}=(1-\tau)R^{*}r+\tau S^{*}s.

Moreover, if ker(R)ker(S)={0}\ker(R)\cap\ker(S)=\{0\}, then fτf_{\tau} is uniquely given by

(12) fτ=((1τ)RR+τSS)1((1τ)Rr+τSs).f_{\tau}=\big{(}(1-\tau)R^{*}R+\tau S^{*}S\big{)}^{-1}\big{(}(1-\tau)R^{*}r+\tau S^{*}s\big{)}.
Proof.

The program (10) can be interpreted as a standard least squares problem, namely as

minimizefH[1τRτS]f[1τrτs]2.\underset{f\in H}{\rm minimize}\,\left\|\begin{bmatrix}\sqrt{1-\tau}R\\ \hline\cr\sqrt{\tau}S\end{bmatrix}f-\begin{bmatrix}\sqrt{1-\tau}r\\ \hline\cr\sqrt{\tau}s\end{bmatrix}\right\|^{2}.

According to the normal equations, its solutions fτf_{\tau} are characterized by

[1τRτS][1τRτS]fτ=[1τRτS][1τrτs],\begin{bmatrix}\sqrt{1-\tau}R\\ \hline\cr\sqrt{\tau}S\end{bmatrix}^{*}\begin{bmatrix}\sqrt{1-\tau}R\\ \hline\cr\sqrt{\tau}S\end{bmatrix}f_{\tau}=\begin{bmatrix}\sqrt{1-\tau}R\\ \hline\cr\sqrt{\tau}S\end{bmatrix}^{*}\begin{bmatrix}\sqrt{1-\tau}r\\ \hline\cr\sqrt{\tau}s\end{bmatrix},

which is a rewriting of (11). Next, if ker(R)ker(S)={0}\ker(R)\cap\ker(S)=\{0\}, then

((1τ)RR+τSS)f,f=(1τ)Rf2+τSf20,\langle((1-\tau)R^{*}R+\tau S^{*}S)f,f\rangle=(1-\tau)\|Rf\|^{2}+\tau\|Sf\|^{2}\geq 0,

with equality only possible when fker(R)ker(S)f\in\ker(R)\cap\ker(S), i.e., f=0f=0. This shows that (1τ)RR+τSS(1-\tau)R^{*}R+\tau S^{*}S is positive definite, and hence invertible, which allows us to write (12) as a consequence of (11). ∎
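In finite dimensions, the formula (12) translates into a single linear solve. The snippet below is a minimal numpy sketch (illustrative only, not part of the reproducible files mentioned in the introduction), with H identified with R^N and with R, S stored as matrices; for the regularization program (7), one would take R to be a matrix representing P_{V^\perp}, r = 0, S = Λ, and s = y.

```python
import numpy as np

def regularized_solution(R, r, S, s, tau):
    """Minimizer (12) of (1-tau)||R f - r||^2 + tau ||S f - s||^2,
    assuming ker(R) ∩ ker(S) = {0} so that the normal-equation
    matrix (1-tau) R^T R + tau S^T S is invertible."""
    A = (1 - tau) * R.T @ R + tau * S.T @ S
    b = (1 - tau) * R.T @ r + tau * S.T @ s
    return np.linalg.solve(A, b)
```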

The expression (12) is not always the most convenient one. Under extra conditions on RR and SS, we shall see that fτf_{\tau}, τ[0,1]\tau\in[0,1], can in fact be expressed as the convex combination fτ=(1τ)f0+τf1f_{\tau}=(1-\tau)f_{0}+\tau f_{1}. The elements f0f_{0} and f1f_{1} should be interpreted555Intuitively, the solution to the program (10) written as the minimization of Rfr2+(τ/(1τ))Sfs2\|Rf-r\|^{2}+(\tau/(1-\tau))\|Sf-s\|^{2} becomes, as τ1\tau\to 1, the minimizer of Rfr2\|Rf-r\|^{2} subject to Sfs2=0\|Sf-s\|^{2}=0. This explains the interpretation of f1f_{1}. A similar argument explains the interpretation of f0f_{0}. as

f0=argminfHSfs\displaystyle f_{0}=\underset{f\in H}{{\rm argmin}\,}\|Sf-s\| s.to Rf=r,\displaystyle\qquad\mbox{s.to }\;Rf=r,
f1=argminfHRfr\displaystyle f_{1}=\underset{f\in H}{{\rm argmin}\,}\|Rf-r\| s.to Sf=s.\displaystyle\qquad\mbox{s.to }\;Sf=s.

The requirements that rran(R)r\in{\rm ran}(R) and sran(S)s\in{\rm ran}(S) need to be imposed for f0f_{0} and f1f_{1} to even exist and the condition ker(R)ker(S)={0}\ker(R)\cap\ker(S)=\{0\} easily guarantees that f0f_{0} and f1f_{1} are unique. They obey

(13) Rf0=r,S(Sf0s)ker(R),Sf1=s,R(Rf1r)ker(S).Rf_{0}=r,\quad S^{*}(Sf_{0}-s)\in\ker(R)^{\perp},\quad Sf_{1}=s,\quad R^{*}(Rf_{1}-r)\in\ker(S)^{\perp}.

For instance, the identity Rf0=rRf_{0}=r reflects the constraint in the optimization program defining f0f_{0}, while S(Sf0s)ker(R)S^{*}(Sf_{0}-s)\in\ker(R)^{\perp} is obtained by expanding S(f0+tu)s2S(f0)s2\|S(f_{0}+tu)-s\|^{2}\geq\|S(f_{0})-s\|^{2} around t=0t=0 for any uker(R)u\in\ker(R). At this point, we are ready to establish our claim under extra conditions on RR and SS, namely that they are orthogonal projections. These conditions will be in place when the observation map satisfies ΛΛ=Idm\Lambda\Lambda^{*}=\mathrm{Id}_{\mathbb{R}^{m}}. Indeed, in view of w2=w,ΛΛw=Λw2\|w\|^{2}=\langle w,\Lambda\Lambda^{*}w\rangle=\|\Lambda^{*}w\|^{2} for any wmw\in\mathbb{R}^{m}, the regularization program (7) also reads

minimizefH(1τ)P𝒱f2+τΛΛfΛy2,\underset{f\in H}{\rm minimize}\,\;(1-\tau)\|P_{\mathcal{V}^{\perp}}f\|^{2}+\tau\|\Lambda^{*}\Lambda f-\Lambda^{*}y\|^{2},

where both P𝒱P_{\mathcal{V}^{\perp}} and ΛΛ\Lambda^{*}\Lambda are orthogonal projections. The result below will then be applied with R=P𝒱R=P_{\mathcal{V}^{\perp}}, r=0r=0, S=ΛΛS=\Lambda^{*}\Lambda, and s=Λys=\Lambda^{*}y.

Proposition 3.

Let R,SR,S be two orthogonal projectors on HH such that ker(R)ker(S)={0}\ker(R)\cap\ker(S)=\{0\} and let rran(R)r\in{\rm ran}(R), sran(S)s\in{\rm ran}(S). For τ[0,1]\tau\in[0,1], the solution fτf_{\tau} to the optimization program

(14) minimizefH(1τ)Rfr2+τSfs2\underset{f\in H}{\rm minimize}\,\;(1-\tau)\|Rf-r\|^{2}+\tau\|Sf-s\|^{2}

satisfies

(15) fτ=(1τ)f0+τf1.f_{\tau}=(1-\tau)f_{0}+\tau f_{1}.

Moreover, one has

(16) Rfτr=τf1f0andSfτs=(1τ)f1f0.\|Rf_{\tau}-r\|=\tau\|f_{1}-f_{0}\|\qquad\mbox{and}\qquad\|Sf_{\tau}-s\|=(1-\tau)\|f_{1}-f_{0}\|.
Proof.

Taking the extra conditions on RR and SS into account, the identities (13) read

Rf0=r,Sf0sran(R),Sf1=s,Rf1rran(S).Rf_{0}=r,\quad Sf_{0}-s\in{\rm ran}(R),\quad Sf_{1}=s,\quad Rf_{1}-r\in{\rm ran}(S).

In this form, they imply that

S(f0f1),(RS)(f0f1)\displaystyle\langle S(f_{0}-f_{1}),(R-S)(f_{0}-f_{1})\rangle =Sf0s,R(f0f1)Sf0s,S(f0f1)\displaystyle=\langle Sf_{0}-s,R(f_{0}-f_{1})\rangle-\langle Sf_{0}-s,S(f_{0}-f_{1})\rangle
=Sf0s,f0f1Sf0s,f0f1\displaystyle=\langle Sf_{0}-s,f_{0}-f_{1}\rangle-\langle Sf_{0}-s,f_{0}-f_{1}\rangle
(17) =0.\displaystyle=0.

In a similar fashion, by exchanging the roles of RR and SS, and consequently also of f0f_{0} and f1f_{1}, we have R(f1f0),(SR)(f1f0)=0\langle R(f_{1}-f_{0}),(S-R)(f_{1}-f_{0})\rangle=0, i.e., R(f0f1),(RS)(f0f1)=0\langle R(f_{0}-f_{1}),(R-S)(f_{0}-f_{1})\rangle=0. Subtracting (17) from the latter yields (RS)(f0f1)2=0\|(R-S)(f_{0}-f_{1})\|^{2}=0, in other words R(f0f1)=S(f0f1)R(f_{0}-f_{1})=S(f_{0}-f_{1}). Then, the element h:=f0f1R(f0f1)=f0f1S(f0f1)h:=f_{0}-f_{1}-R(f_{0}-f_{1})=f_{0}-f_{1}-S(f_{0}-f_{1}) belongs to ker(R)ker(S)\ker(R)\cap\ker(S), so that h=0h=0. In summary, we have established that

(18) R(f0f1)=S(f0f1)=f0f1.R(f_{0}-f_{1})=S(f_{0}-f_{1})=f_{0}-f_{1}.

From here, we can deduce the two parts of the proposition. For the first part, we notice that

((1τ)R+τS)((1τ)f0+τf1)\displaystyle\big{(}(1-\tau)R+\tau S\big{)}\big{(}(1-\tau)f_{0}+\tau f_{1}\big{)} =(1τ)2Rf0+(1τ)τ(Rf1+Sf0)+τ2Sf1\displaystyle=(1-\tau)^{2}Rf_{0}+(1-\tau)\tau(Rf_{1}+Sf_{0})+\tau^{2}Sf_{1}
=(1τ)2Rf0+(1τ)τ(Rf0+Sf1)+τ2Sf1\displaystyle=(1-\tau)^{2}Rf_{0}+(1-\tau)\tau(Rf_{0}+Sf_{1})+\tau^{2}Sf_{1}
=(1τ)Rf0+τSf1\displaystyle=(1-\tau)Rf_{0}+\tau Sf_{1}
=(1τ)r+τs,\displaystyle=(1-\tau)r+\tau s,

which shows that (1τ)f0+τf1(1-\tau)f_{0}+\tau f_{1} satisfies the relation (11) characterizing the minimizer fτf_{\tau} of (14), so that fτ=(1τ)f0+τf1f_{\tau}=(1-\tau)f_{0}+\tau f_{1}, as announced in (15). For the second part, we notice that

Rfτr=(1τ)Rf0+τRf1Rf0=τR(f1f0)=τ(f1f0),Rf_{\tau}-r=(1-\tau)Rf_{0}+\tau Rf_{1}-Rf_{0}=\tau R(f_{1}-f_{0})=\tau(f_{1}-f_{0}),

so the first equality of (16) follows by taking the norm. The second equality of (16) is derived in a similar fashion. ∎
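The identity (15) is easy to confirm numerically. The snippet below is a self-contained numpy sketch (illustrative only): it builds two random orthogonal projectors on R^4, computes f_0 and f_1 by constrained least squares, computes f_τ from the normal equations (11), and checks (15). For orthogonal projectors, the normal equations simplify to ((1-τ)R + τS) f_τ = (1-τ)r + τs, since R*R = R, S*S = S, R*r = r, and S*s = s.

```python
import numpy as np

def constrained_minimizer(P, p, Q, q):
    """argmin ||Q f - q||  s.to  P f = p, for orthogonal projectors P, Q
    with p in ran(P): parametrize f = p + K c, K a basis of ker(P)."""
    w, V = np.linalg.eigh(P)
    K = V[:, w < 0.5]                               # orthonormal basis of ker(P)
    c, *_ = np.linalg.lstsq(Q @ K, q - Q @ p, rcond=None)
    return p + K @ c

rng = np.random.default_rng(1)
QR, _ = np.linalg.qr(rng.standard_normal((4, 3)))
QS, _ = np.linalg.qr(rng.standard_normal((4, 3)))
R, S = QR @ QR.T, QS @ QS.T             # projectors with ker(R) ∩ ker(S) = {0} (generically)
r, s = R @ rng.standard_normal(4), S @ rng.standard_normal(4)

f0 = constrained_minimizer(R, r, S, s)  # argmin ||S f - s||  s.to  R f = r
f1 = constrained_minimizer(S, s, R, r)  # argmin ||R f - r||  s.to  S f = s
tau = 0.3
f_tau = np.linalg.solve((1 - tau) * R + tau * S, (1 - tau) * r + tau * s)
print(np.allclose(f_tau, (1 - tau) * f0 + tau * f1))    # True, cf. (15)
```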

We complement Proposition 3 with a few additional pieces of information.

Remark.

Under the assumptions of Proposition 3, the solution fτf_{\tau} to (14) is also solution to

minimizefHmax{(1τ)Rfr,τSfs}.\underset{f\in H}{\rm minimize}\,\;\max\big{\{}(1-\tau)\|Rf-r\|,\tau\|Sf-s\|\big{\}}.

Indeed, at f=fτf=f_{\tau}, the squared objective function equals (1τ)2τ2f1f02(1-\tau)^{2}\tau^{2}\|f_{1}-f_{0}\|^{2}, while at an arbitrary fHf\in H, it satisfies

max{(1τ)2Rfr2,τ2Sfs2}\displaystyle\max\big{\{}(1-\tau)^{2}\|Rf-r\|^{2},\tau^{2}\|Sf-s\|^{2}\big{\}} 111τ+1τ((1τ)Rfr2+τSfs2)\displaystyle\geq\frac{1}{\dfrac{1}{1-\tau}+\dfrac{1}{\tau}}\big{(}(1-\tau)\|Rf-r\|^{2}+\tau\|Sf-s\|^{2}\big{)}
(1τ)τ((1τ)Rfτr2+τSfτs2)\displaystyle\geq(1-\tau)\tau\big{(}(1-\tau)\|Rf_{\tau}-r\|^{2}+\tau\|Sf_{\tau}-s\|^{2}\big{)}
=(1τ)τ((1τ)τ2f1f02+τ(1τ)2f1f02)\displaystyle=(1-\tau)\tau\big{(}(1-\tau)\tau^{2}\|f_{1}-f_{0}\|^{2}+\tau(1-\tau)^{2}\|f_{1}-f_{0}\|^{2}\big{)}
=(1τ)2τ2f1f02.\displaystyle=(1-\tau)^{2}\tau^{2}\|f_{1}-f_{0}\|^{2}.

In the case R=P𝒱R=P_{\mathcal{V}^{\perp}}, r=0r=0, S=ΛΛS=\Lambda^{*}\Lambda, and s=Λys=\Lambda^{*}y, the choice τ=ε/(ε+η)\tau=\varepsilon/(\varepsilon+\eta) is quite relevant, since the above optimization program becomes equivalent to

minimizefHmax{1εP𝒱f,1ηΛfy}.\underset{f\in H}{\rm minimize}\,\;\max\bigg{\{}\frac{1}{\varepsilon}\|P_{\mathcal{V}^{\perp}}f\|,\frac{1}{\eta}\|\Lambda f-y\|\bigg{\}}.

Its solution is clearly in the model- and data-consistent set {fH:P𝒱fε,Λfyη}\{f\in H:\|P_{\mathcal{V}^{\perp}}f\|\leq\varepsilon,\|\Lambda f-y\|\leq\eta\}. In fact, this could have been a natural guess for its Chebyshev center, but item L2 reveals the invalidity of such a guess. Nonetheless, the special parameter τ=ε/(ε+η)\tau=\varepsilon/(\varepsilon+\eta) will make a reappearance in the argument leading to item L2.

Remark.

The proof of Proposition 3 showcased the important identities Rf0=rRf_{0}=r, Sf1=sSf_{1}=s, and R(f0f1)=S(f0f1)=f0f1R(f_{0}-f_{1})=S(f_{0}-f_{1})=f_{0}-f_{1}. In the case R=P𝒱R=P_{\mathcal{V}^{\perp}}, r=0r=0, S=ΛΛS=\Lambda^{*}\Lambda, and s=Λys=\Lambda^{*}y, if Δτ\Delta_{\tau} denotes the recovery map assigning to ymy\in\mathbb{R}^{m} the solution fτf_{\tau} to the regularization program (7), these identities read, when ΛΛ=Idm\Lambda\Lambda^{*}=\mathrm{Id}_{\mathbb{R}^{m}},

(19) P𝒱Δ0=0,ΛΛΔ1=Λ,P𝒱(Δ0Δ1)=ΛΛ(Δ0Δ1)=Δ0Δ1.P_{\mathcal{V}^{\perp}}\Delta_{0}=0,\qquad\Lambda^{*}\Lambda\Delta_{1}=\Lambda^{*},\qquad P_{\mathcal{V}^{\perp}}(\Delta_{0}-\Delta_{1})=\Lambda^{*}\Lambda(\Delta_{0}-\Delta_{1})=\Delta_{0}-\Delta_{1}.
Remark.

Considering again the case R=P𝒱R=P_{\mathcal{V}^{\perp}}, r=0r=0, S=ΛΛS=\Lambda^{*}\Lambda, and s=Λys=\Lambda^{*}y, Proposition 3 implies that fτ𝒱+ran(Λ)f_{\tau}\in\mathcal{V}+{\rm ran}(\Lambda^{*}) for any τ[0,1]\tau\in[0,1], given that the latter holds for τ=0\tau=0 and for τ=1\tau=1. For τ=0\tau=0, this is because the constraint P𝒱f=0P_{\mathcal{V}^{\perp}}f=0 of the optimization program defining f0f_{0} imposes f0𝒱f_{0}\in\mathcal{V}. For τ=1\tau=1, this is a result established e.g. in [Foucart, Liao, Shahrampour, and Wang, 2020, Theorem 2]. The said result also provides an efficient way to compute the solution fτf_{\tau} of (7) even when HH is infinite dimensional, as stated in the appendix.

3 Local Optimality

Our goal in this section is to determine locally optimal recovery maps. In other words, the section is concerned with Chebyshev centers. We start by considering the situation of an arbitrary observation map Λ\Lambda, but with a restriction on the space HH. Next, lifting this restriction on HH, we refine the result in the particular case of an observation map satisfying ΛΛ=Idm\Lambda\Lambda^{*}=\mathrm{Id}_{\mathbb{R}^{m}}.

3.1 Arbitrary observations

In this subsection, we reproduce a result from [Beck and Eldar, 2007], albeit with different notation, and explain how it implies the statement of item L1. The result in question, namely Corollary 3.2, relies on the S-procedure with two constraints, and as such cannot be claimed in the real setting.

Theorem 4.

Let HH be a complex Hilbert space. Let R,SR,S be two linear maps from HH into other Hilbert spaces containing points r,sr,s, respectively. Suppose the existence of f~H\widetilde{f}\in H such that Rf~r<ε\|R\widetilde{f}-r\|<\varepsilon and Sf~s<η\|S\widetilde{f}-s\|<\eta and the existence of τ[0,1]\tau\in[0,1] such that (1τ)RR+τSS(1-\tau)R^{*}R+\tau S^{*}S is positive definite. Then the Chebyshev center of {fH:Rfrε,Sfsη}\{f\in H:\|Rf-r\|\leq\varepsilon,\|Sf-s\|\leq\eta\} equals f=(cRR+dSS)1(cRr+dSs)f_{\sharp}=\big{(}c_{\sharp}R^{*}R+d_{\sharp}S^{*}S\big{)}^{-1}(c_{\sharp}R^{*}r+d_{\sharp}S^{*}s), where c,dc_{\sharp},d_{\sharp} are solutions to

minimizec,d,t0(ε2r2)c+(η2s2)d+t\displaystyle\underset{c,d,t\geq 0}{\rm minimize}\,\;(\varepsilon^{2}-\|r\|^{2})c+(\eta^{2}-\|s\|^{2})d+t s.to cRR+dSSId,\displaystyle cR^{*}R+dS^{*}S\succeq\mathrm{Id},
and [cRR+dSS|cRrdSsc(Rr)d(Ss)|t]0.\displaystyle\begin{bmatrix}cR^{*}R+dS^{*}S&|&-cR^{*}r-dS^{*}s\\ \hline\cr-c(R^{*}r)^{*}-d(S^{*}s)^{*}&|&t\end{bmatrix}\succeq 0.

The statement made in item L1 is of course derived by taking R=P𝒱R=P_{\mathcal{V}^{\perp}}, r=0r=0, S=ΛS=\Lambda, and s=ys=y. Theorem 4 is indeed applicable, as f~=(f0+f1)/2\widetilde{f}=(f_{0}+f_{1})/2 satisfies the strict feasibility conditions, while the positive definiteness condition is not only fulfilled for some τ[0,1]\tau\in[0,1], but for all τ(0,1)\tau\in(0,1), since ((1τ)P𝒱+τΛΛ)f,f=(1τ)P𝒱f2+τΛf20\langle((1-\tau)P_{\mathcal{V}^{\perp}}+\tau\Lambda^{*}\Lambda)f,f\rangle=(1-\tau)\|P_{\mathcal{V}^{\perp}}f\|^{2}+\tau\|\Lambda f\|^{2}\geq 0, with equality only possible if f𝒱ker(Λ)f\in\mathcal{V}\cap\ker(\Lambda), i.e., if f=0f=0 thanks to the assumption (5). We also note that, by virtue of (12), the element ff_{\sharp} defined above is nothing else than the regularized solution with parameter τ=d/(c+d)\tau=d_{\sharp}/(c_{\sharp}+d_{\sharp}).
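For concreteness, the semidefinite program of item L1 can be set up in a few lines of CVXPY. The snippet below is a sketch only and is not taken from the repository mentioned in the introduction: it assumes H = R^N, with the projector P_{V^\perp} and the observation map Λ stored as matrices P_Vperp and L, and it relies on an SDP-capable solver such as the SCS solver bundled with CVXPY.

```python
import cvxpy as cp
import numpy as np

def chebyshev_center_item_L1(P_Vperp, L, y, eps, eta):
    """Solve the semidefinite program of item L1 (Theorem 4 with R = P_Vperp,
    r = 0, S = L, s = y), then return tau = d/(c+d) and the corresponding
    regularized solution, i.e., the Chebyshev center."""
    N = P_Vperp.shape[0]
    c = cp.Variable(nonneg=True)
    d = cp.Variable(nonneg=True)
    t = cp.Variable(nonneg=True)
    G = L.T @ L                                           # Lambda^* Lambda
    M = c * P_Vperp + d * G
    v = cp.reshape(-d * (L.T @ y), (N, 1))                # -d Lambda^* y as a column
    bordered = cp.bmat([[M, v], [v.T, cp.reshape(t, (1, 1))]])
    constraints = [M >> np.eye(N), bordered >> 0]
    objective = cp.Minimize(eps**2 * c + (eta**2 - y @ y) * d + t)
    cp.Problem(objective, constraints).solve()
    c_s, d_s = c.value, d.value
    tau = d_s / (c_s + d_s)
    # minimizer of (7) for this parameter, cf. (12)
    f_sharp = np.linalg.solve(c_s * P_Vperp + d_s * G, d_s * L.T @ y)
    return tau, f_sharp
```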

3.2 Orthonormal observations

In this subsection, we place ourselves in the situation of an observation map satisfying ΛΛ=Idm\Lambda\Lambda^{*}=\mathrm{Id}_{\mathbb{R}^{m}} and we provide a proof of the statements made in item L2. In fact, we prove some slightly more general results and L2 follows by taking R=P𝒱R=P_{\mathcal{V}^{\perp}}, r=0r=0, S=ΛΛS=\Lambda^{*}\Lambda, and s=Λys=\Lambda^{*}y. Note that we must separate the cases where R=IdR=\mathrm{Id} (corresponding to 𝒱={0}\mathcal{V}=\{0\}) and where RR is a proper orthogonal projection (corresponding to 𝒱{0}\mathcal{V}\not=\{0\}). We emphasize that, in each of these two cases, the optimal parameter τ\tau_{\sharp} is not independent of yy. Therefore, in view of (15) and of the linear dependence of f0f_{0} and f1f_{1} on yy, the regularized solution fτf_{\tau_{\sharp}} does not depend linearly on yy. In other words, the locally optimal recovery map is not a linear map. The following two simple lemmas will be used to deal with both cases.

Lemma 5.

Let R,SR,S be two linear maps from HH into other Hilbert spaces containing points r,sr,s, respectively. Given fHf_{\sharp}\in H, let

hargmaxhHhs.to {Rfr+Rhε,Sfs+Shη.h_{\sharp}\in\underset{h\in H}{{\rm argmax}\,}\|h\|\qquad\mbox{s.to }\left\{\begin{matrix}\|Rf_{\sharp}-r+Rh\|&\leq\varepsilon,\\ \|Sf_{\sharp}-s+Sh\|&\leq\eta.\end{matrix}\right.

If the orthogonality conditions

(20) R(Rfr),h=0andS(Sfs),h=0\langle R^{*}(Rf_{\sharp}-r),h_{\sharp}\rangle=0\qquad\mbox{and}\qquad\langle S^{*}(Sf_{\sharp}-s),h_{\sharp}\rangle=0

are fulfilled, then ff_{\sharp} is the Chebyshev center of the set {fH:Rfrε,Sfsη},\{f\in H:\|Rf-r\|\leq\varepsilon,\|Sf-s\|\leq\eta\}, i.e., for any gHg\in H,

(21) supRfrεSfsηfgsupRfrεSfsηff.\sup_{\begin{subarray}{c}\|Rf-r\|\leq\varepsilon\\ \|Sf-s\|\leq\eta\end{subarray}}\|f-g\|\geq\sup_{\begin{subarray}{c}\|Rf-r\|\leq\varepsilon\\ \|Sf-s\|\leq\eta\end{subarray}}\|f-f_{\sharp}\|.
Proof.

First, writing f=f+hf=f_{\sharp}+h, we easily see that the right-hand side of (21) reduces to h\|h_{\sharp}\|. Second, let us remark that the orthogonality conditions guarantee that f±:=f±hf_{\pm}:=f_{\sharp}\pm h_{\sharp} both satisfy Rf±rε\|Rf_{\pm}-r\|\leq\varepsilon and Sf±sη\|Sf_{\pm}-s\|\leq\eta. For instance, we have

(22) Rf±r2=Rfr±Rh2=Rfr2+Rh2=Rfr+Rh2ε2,\|Rf_{\pm}-r\|^{2}=\|Rf_{\sharp}-r\pm Rh_{\sharp}\|^{2}=\|Rf_{\sharp}-r\|^{2}+\|Rh_{\sharp}\|^{2}=\|Rf_{\sharp}-r+Rh_{\sharp}\|^{2}\leq\varepsilon^{2},

where the latter inequality reflects the feasibility of hh_{\sharp}. Therefore, the left-hand side of (21) is bounded below by

(23) max±f±g12(f+g+fg)12(f+g)(fg)=122h=h,\max_{\pm}\|f_{\pm}-g\|\geq\frac{1}{2}\big{(}\|f_{+}-g\|+\|f_{-}-g\|\big{)}\geq\frac{1}{2}\|(f_{+}-g)-(f_{-}-g)\|=\frac{1}{2}\|2h_{\sharp}\|=\|h_{\sharp}\|,

i.e., by the right-hand side of (21). ∎

The next lemma somehow relates to the S-procedure. However, it does not involve the coveted (and usually invalid) equivalence, but only the straightforward implication.

Lemma 6.

Let R,SR,S be two linear maps from HH into other Hilbert spaces containing points r,sr,s, respectively. Given fHf_{\sharp}\in H and hHh_{\sharp}\in H, suppose that

(24) Rfr+Rh2=ε2andSfs+Sh2=η2,\|Rf_{\sharp}-r+Rh_{\sharp}\|^{2}=\varepsilon^{2}\qquad\mbox{and}\qquad\|Sf_{\sharp}-s+Sh_{\sharp}\|^{2}=\eta^{2},

and that there exist a,b0a,b\geq 0 such that

(25) aRR+bSSIdaR^{*}R+bS^{*}S\succeq\mathrm{Id}

as well as

(26) aR(Rfr)+bS(Sfs)+(aRR+bSS)h=h.aR^{*}(Rf_{\sharp}-r)+bS^{*}(Sf_{\sharp}-s)+(aR^{*}R+bS^{*}S)h_{\sharp}=h_{\sharp}.

Then, one has

(27) hargmaxhHhs.to {Rfr+Rhε,Sfs+Shη.h_{\sharp}\in\underset{h\in H}{{\rm argmax}\,}\|h\|\qquad\mbox{s.to }\left\{\begin{matrix}\|Rf_{\sharp}-r+Rh\|&\leq\varepsilon,\\ \|Sf_{\sharp}-s+Sh\|&\leq\eta.\end{matrix}\right.
Proof.

By writing the variable in the optimization program (27) as h=h+gh=h_{\sharp}+g, the constraints on hh transform into constraints on gg. Thanks to (24), the latter constraints read

RRg,g+2R(Rfr+Rh),g0andSSg,g+2S(Sfs+Sh),g0.\langle R^{*}Rg,g\rangle+2\langle R^{*}(Rf_{\sharp}-r+Rh_{\sharp}),g\rangle\leq 0\quad\mbox{and}\quad\langle S^{*}Sg,g\rangle+2\langle S^{*}(Sf_{\sharp}-s+Sh_{\sharp}),g\rangle\leq 0.

Combining these constraints—specifically, multiplying the first by aa, the second by bb, and summing—implies that

0\displaystyle 0 (aRR+bSS)g,g+2aR(Rfr)+bS(Sfs)+(aRR+bSS)h,g\displaystyle\geq\langle(aR^{*}R+bS^{*}S)g,g\rangle+2\langle aR^{*}(Rf_{\sharp}-r)+bS^{*}(Sf_{\sharp}-s)+(aR^{*}R+bS^{*}S)h_{\sharp},g\rangle
g,g+2h,g,\displaystyle\geq\langle g,g\rangle+2\langle h_{\sharp},g\rangle,

where (25) and (26) were exploited in the last step. In other words, one has 0h+g2h20\geq\|h_{\sharp}+g\|^{2}-\|h_{\sharp}\|^{2}, i.e., h2h2\|h\|^{2}\leq\|h_{\sharp}\|^{2}, under the constraints on hh, proving that hh_{\sharp} is indeed a maximizer in (27). ∎

3.2.1 The case R=IdR=\mathrm{Id}

As mentioned earlier, the case R=IdR=\mathrm{Id} corresponds to the choice 𝒱={0}\mathcal{V}=\{0\}, i.e., to a model set 𝒦\mathcal{K} being an origin-centered ball in HH, and to regularizations being classical Tikhonov regularizations. The arguments are slightly less involved here than for the case RIdR\not=\mathrm{Id}. Here is the main result.

Theorem 7.

Let SS be an orthogonal projector on HH with ker(S){0}\ker(S)\not=\{0\} and let rHr\in H, sran(S)s\in{\rm ran}(S). The solution fτf_{\tau_{\sharp}} to the regularization program (14) with parameter

τ=max{1ηSrs,0}\tau_{\sharp}=\max\bigg{\{}1-\frac{\eta}{\|Sr-s\|},0\bigg{\}}

is the Chebyshev center of the set {fH:frε,Sfsη}\{f\in H:\|f-r\|\leq\varepsilon,\|Sf-s\|\leq\eta\}.

Proof.

Before separating two cases, we remark that Srsε+η\|Sr-s\|\leq\varepsilon+\eta is implicitly assumed for the above set to be nonempty. Now, we first consider the case Srs>η\|Sr-s\|>\eta. Defining f:=fτf_{\sharp}:=f_{\tau_{\sharp}} with τ=1η/Srs(0,1)\tau_{\sharp}=1-\eta/\|Sr-s\|\in(0,1), our objective is to find hHh_{\sharp}\in H and a,b0a,b\geq 0 for which conditions (24), (25), and (26) of Lemma 6 are fulfilled, so that hh_{\sharp} is a maximizer appearing in Lemma 5, and then to verify that the orthogonality conditions (20) hold, so that ff_{\sharp} is indeed the required Chebyshev center. We take any hker(S)h_{\sharp}\in\ker(S), with a normalization to be decided later, and a=1a=1, b=τ/(1τ)b=\tau_{\sharp}/(1-\tau_{\sharp}). In this way, since R=IdR=\mathrm{Id}, condition (25) is automatic, and condition (26) follows from the characterization (11) written here as (1τ)(fr)=τ(Sfs)(1-\tau_{\sharp})(f_{\sharp}-r)=-\tau_{\sharp}(Sf_{\sharp}-s). This characterization also allows us to deduce (20) only from Sfs,h=0\langle Sf_{\sharp}-s,h_{\sharp}\rangle=0, which holds because the spaces ran(S){\rm ran}(S) and ker(S)\ker(S) are orthogonal. The remaining condition (24) now reads fr2+h2=ε2\|f_{\sharp}-r\|^{2}+\|h_{\sharp}\|^{2}=\varepsilon^{2} and Sfs2=η2\|Sf_{\sharp}-s\|^{2}=\eta^{2}. Recalling from Proposition 3 that f=(1τ)f0+τf1f_{\sharp}=(1-\tau_{\sharp})f_{0}+\tau_{\sharp}f_{1}, while taking into account that f0=rf_{0}=r here and that f1=f0+S(f1f0)=r+sSrf_{1}=f_{0}+S(f_{1}-f_{0})=r+s-Sr thanks to (18), we have fr=τ(sSr)f_{\sharp}-r=\tau_{\sharp}(s-Sr) and Sfs=(1τ)(sSr)Sf_{\sharp}-s=-(1-\tau_{\sharp})(s-Sr). Thus, condition (24) reads

τ2sSr2+h2=ε2and(1τ)2sSr2=η2.\tau_{\sharp}^{2}\|s-Sr\|^{2}+\|h_{\sharp}\|^{2}=\varepsilon^{2}\qquad\mbox{and}\qquad(1-\tau_{\sharp})^{2}\|s-Sr\|^{2}=\eta^{2}.

The latter is justified by our choice of τ\tau_{\sharp}, while the former can simply be achieved by normalizing hh_{\sharp}, so long as ετsSr\varepsilon\geq\tau_{\sharp}\|s-Sr\|, i.e., εsSrη\varepsilon\geq\|s-Sr\|-\eta, which is our implicit assumption for nonemptiness of the set under consideration.

Next, we consider the case Srsη\|Sr-s\|\leq\eta. We note that this implies that rr belongs to the set {fH:frε,Sfsη}\{f\in H:\|f-r\|\leq\varepsilon,\|Sf-s\|\leq\eta\}—we are going to show that rr is actually the Chebyshev center of this set. In other words, since r=f0r=f_{0}, this means that fτf_{\tau_{\sharp}} with τ=0\tau_{\sharp}=0 is the Chebyshev center. To this end, we shall establish that, for any gHg\in H,

supfrεSfsηfgsupfrεSfsηfr.\sup_{\begin{subarray}{c}\|f-r\|\leq\varepsilon\\ \|Sf-s\|\leq\eta\end{subarray}}\|f-g\|\geq\sup_{\begin{subarray}{c}\|f-r\|\leq\varepsilon\\ \|Sf-s\|\leq\eta\end{subarray}}\|f-r\|.

On the one hand, the right-hand side is obviously bounded above by ε\varepsilon. On the other hand, selecting hker(S)h\in\ker(S) with h=ε\|h\|=\varepsilon, we define f±:=r±hf_{\pm}:=r\pm h to obtain f±r=h=ε\|f_{\pm}-r\|=\|h\|=\varepsilon and Sf±s=Srsη\|Sf_{\pm}-s\|=\|Sr-s\|\leq\eta. Thus, the left-hand side is bounded below by

max±f±g12f+g+12fg12(f+g)(fg)=122h=ε.\max_{\pm}\|f_{\pm}-g\|\geq\frac{1}{2}\|f_{+}-g\|+\frac{1}{2}\|f_{-}-g\|\geq\frac{1}{2}\|(f_{+}-g)-(f_{-}-g)\|=\frac{1}{2}\|2h\|=\varepsilon.

This proves that the left-hand side is larger than or equal to the right-hand side, as required. ∎

3.2.2 The case RIdR\not=\mathrm{Id}

We now assume that RR is a proper orthogonal projection, i.e., that RIdR\not=\mathrm{Id}, which corresponds to the case 𝒱{0}\mathcal{V}\not=\{0\}. The main result is stated below. To apply it in practice, the optimal parameter τ\tau needs to be computed by solving an equation involving the smallest eigenvalue of a self-adjoint operator depending on τ\tau. This can be done using an all-purpose routine. We could also devise our own bisection method, Newton method (since the derivative dλmin/dτd\lambda_{\min}/d\tau is accessible, see the appendix), or secant method.
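As an illustration, here is a minimal bisection sketch for the parameter equation (8), restated as (28) in Theorem 8 below. It assumes H = R^N with the two orthogonal projectors stored as matrices R and S (in the setting of item L2, R represents P_{V^\perp} and S represents Λ*Λ), and with δ precomputed as in the theorem; degenerate configurations, e.g. ε = η, where the two endpoints of the bracketing interval coincide, are not treated. The snippet is illustrative only and not part of the accompanying repository.

```python
import numpy as np

def theta(tau, R, S, eps, eta, delta):
    """Left-hand side of equation (28)."""
    lam_min = np.linalg.eigvalsh((1 - tau) * R + tau * S)[0]
    num = (1 - tau)**2 * eps**2 - tau**2 * eta**2
    den = (1 - tau) * eps**2 - tau * eta**2 + (1 - tau) * tau * (1 - 2 * tau) * delta**2
    return lam_min - num / den

def optimal_tau(R, S, eps, eta, delta, tol=1e-12):
    """Bisection for the root of (28) between 1/2 and eps/(eps+eta)."""
    a, b = sorted((0.5, eps / (eps + eta)))
    if b - a < tol:
        return 0.5 * (a + b)
    for _ in range(200):
        m = 0.5 * (a + b)
        # keep the subinterval over which theta changes sign
        if theta(a, R, S, eps, eta, delta) * theta(m, R, S, eps, eta, delta) <= 0:
            b = m
        else:
            a = m
        if b - a < tol:
            break
    return 0.5 * (a + b)
```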

Theorem 8.

Let RId,SIdR\not=\mathrm{Id},S\not=\mathrm{Id} be two orthogonal projectors on HH such that ker(R)ker(S)={0}\ker(R)\cap\ker(S)=\{0\} and let rran(R)r\in{\rm ran}(R), sran(S)s\in{\rm ran}(S). Consider τ\tau_{\sharp} to be an (often unique) τ\tau between 1/21/2 and ε/(ε+η)\varepsilon/(\varepsilon+\eta) such that

(28) λmin((1τ)R+τS)(1τ)2ε2τ2η2(1τ)ε2τη2+(1τ)τ(12τ)δ2=0,\lambda_{\min}((1-\tau)R+\tau S)-\frac{(1-\tau)^{2}\varepsilon^{2}-\tau^{2}\eta^{2}}{(1-\tau)\varepsilon^{2}-\tau\eta^{2}+(1-\tau)\tau(1-2\tau)\delta^{2}}=0,

where δ\delta is precomputed as δ=min{Rfr:Sf=s}=min{Sfs:Rf=r}\delta=\min\{\|Rf-r\|:Sf=s\}=\min\{\|Sf-s\|:Rf=r\}. Then the solution fτf_{\tau_{\sharp}} of the regularization program (14) with parameter τ\tau_{\sharp} is the Chebyshev center of the set {fH:Rfrε,Sfsη}\{f\in H:\|Rf-r\|\leq\varepsilon,\|Sf-s\|\leq\eta\}.

Remark.

If there is no observation error, i.e., if η=0\eta=0, then the parameter solving equation (28) is τ=1\tau_{\sharp}=1. In case R=P𝒱R=P_{\mathcal{V}^{\perp}}, r=0r=0, S=ΛΛS=\Lambda^{*}\Lambda, and s=Λys=\Lambda^{*}y, this means that the Chebyshev center is f1=argminP𝒱ff_{1}={\rm argmin}\,\|P_{\mathcal{V}^{\perp}}f\| s.to Λf=y\Lambda f=y and we thus retrieve the result of [Binev, Cohen, Dahmen, DeVore, Petrova, and Wojtaszczyk, 2017].

The proof of Theorem 8 requires an additional result that gives information about the norms of the projections RhRh and ShSh when hh is an eigenvector of the positive semidefinite operator (1τ)R+τS(1-\tau)R+\tau S. This result will be applied for the eigenvector associated with the smallest eigenvalue.

Lemma 9.

Let R,SR,S be two orthogonal projectors on HH. For τ(0,1)\tau\in(0,1), let hHh\in H be an eigenvector of (1τ)R+τS(1-\tau)R+\tau S corresponding to an eigenvalue λ1/2\lambda\not=1/2. Then

(29) Rh2=(τλ)λ(1τ)(12λ)h2andSh2=(1τλ)λτ(12λ)h2.\|Rh\|^{2}=\frac{(\tau-\lambda)\lambda}{(1-\tau)(1-2\lambda)}\|h\|^{2}\qquad\mbox{and}\qquad\|Sh\|^{2}=\frac{(1-\tau-\lambda)\lambda}{\tau(1-2\lambda)}\|h\|^{2}.
Proof.

We notice, on the one hand, that

(30) (1τ)Rh2+τSh2\displaystyle(1-\tau)\|Rh\|^{2}+\tau\|Sh\|^{2} =(1τ)Rh,h+τSh,h=((1τ)R+τS)h,h\displaystyle=(1-\tau)\langle Rh,h\rangle+\tau\langle Sh,h\rangle=\langle((1-\tau)R+\tau S)h,h\rangle
=λh2,\displaystyle=\lambda\|h\|^{2},

and, on the other hand, that

(1τ)2Rh2τ2Sh2\displaystyle(1-\tau)^{2}\|Rh\|^{2}-\tau^{2}\|Sh\|^{2} =(1τ)Rh+τSh,(1τ)RhτSh=λh,(1τ)RhτSh\displaystyle=\langle(1-\tau)Rh+\tau Sh,(1-\tau)Rh-\tau Sh\rangle=\langle\lambda h,(1-\tau)Rh-\tau Sh\rangle
=λ(1τ)Rh2λτSh2.\displaystyle=\lambda(1-\tau)\|Rh\|^{2}-\lambda\tau\|Sh\|^{2}.

Rearranging the latter yields

(31) (1τ)(1τλ)Rh2τ(τλ)Sh2=0.(1-\tau)(1-\tau-\lambda)\|Rh\|^{2}-\tau(\tau-\lambda)\|Sh\|^{2}=0.

Together, the equations (30) and (31) form a two-by-two linear system in the unknowns Rh2\|Rh\|^{2} and Sh2\|Sh\|^{2} with determinant (1τ)τ(12λ)0-(1-\tau)\tau(1-2\lambda)\not=0. Its solutions are easily verified to be the ones given in (29). ∎
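A quick numerical sanity check of (29), with two random orthogonal projectors on R^5 and τ = 0.3, is sketched below; this snippet is illustrative only and not part of the accompanying repository.

```python
import numpy as np

rng = np.random.default_rng(2)
QR, _ = np.linalg.qr(rng.standard_normal((5, 3)))
QS, _ = np.linalg.qr(rng.standard_normal((5, 3)))
R, S = QR @ QR.T, QS @ QS.T                   # two orthogonal projectors on R^5
tau = 0.3
w, V = np.linalg.eigh((1 - tau) * R + tau * S)
lam, h = w[0], V[:, 0]                        # smallest eigenvalue, unit eigenvector
print(np.isclose(np.linalg.norm(R @ h)**2,
                 (tau - lam) * lam / ((1 - tau) * (1 - 2 * lam))))    # True
print(np.isclose(np.linalg.norm(S @ h)**2,
                 (1 - tau - lam) * lam / (tau * (1 - 2 * lam))))      # True
```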

Remark.

Because Rh2\|Rh\|^{2}, Sh2\|Sh\|^{2}, and h2\|h\|^{2} are all nonnegative, Lemma 9 implicitly guarantees that τλ\tau-\lambda and 1τλ1-\tau-\lambda have the same sign as 12λ01-2\lambda\not=0. These quantities are nonnegative when RIdR\not=\mathrm{Id}, SIdS\not=\mathrm{Id}, and λ\lambda is the smallest eigenvalue—the case of application of the lemma. Indeed, taking fker(R)f\in\ker(R) with f=1\|f\|=1 (which is possible because RIdR\not=\mathrm{Id}), one has

λmin:=λmin((1τ)R+τS)(1τ)Rf+τSf=τSfτ,\lambda_{\min}:=\lambda_{\min}((1-\tau)R+\tau S)\leq\|(1-\tau)Rf+\tau Sf\|=\tau\|Sf\|\leq\tau,

i.e., τλmin0\tau-\lambda_{\min}\geq 0. The inequality λmin1τ\lambda_{\min}\leq 1-\tau, i.e., 1τλmin01-\tau-\lambda_{\min}\geq 0, is obtained in a similar fashion. These inequalities sum up to give 12λmin01-2\lambda_{\min}\geq 0. The latter is in fact (strictly) positive when τ1/2\tau\not=1/2, since either τ\tau or 1τ1-\tau is smaller than 1/21/2, so that λmin<1/2\lambda_{\min}<1/2.

With the above result at hand, we are ready to fully justify the main result of this subsection.

Proof of Theorem 8.

Let us temporarily take for granted the existence of a solution τ\tau_{\sharp} to (28). Defining f:=fτf_{\sharp}:=f_{\tau_{\sharp}}, our objective is again to find hHh_{\sharp}\in H and a,b0a,b\geq 0 for which conditions (24), (25), and (26) of Lemma 6 are fulfilled, so that hh_{\sharp} is a maximizer appearing in Lemma 5, and then to verify that the orthogonality conditions (20) hold, so that ff_{\sharp} is indeed the required Chebyshev center. Writing λ:=λmin((1τ)R+τS)\lambda_{\sharp}:=\lambda_{\min}((1-\tau_{\sharp})R+\tau_{\sharp}S), we choose hh_{\sharp} to be a (so far unnormalized) eigenvector of (1τ)R+τS(1-\tau_{\sharp})R+\tau_{\sharp}S corresponding to the eigenvalue λ\lambda_{\sharp}. Setting a:=(1τ)/λa:=(1-\tau_{\sharp})/\lambda_{\sharp} and b:=τ/λb:=\tau_{\sharp}/\lambda_{\sharp}, condition (25) is swiftly verified, since RR=RRR^{*}=R, SS=SSS^{*}=S, and

aR+bS=(1τ)R+τSλmin((1τ)R+τS)Id.aR+bS=\frac{(1-\tau_{\sharp})R+\tau_{\sharp}S}{\lambda_{\min}((1-\tau_{\sharp})R+\tau_{\sharp}S)}\succeq\mathrm{Id}.

Then, the characterization (1τ)R(fr)=τS(fs)(1-\tau_{\sharp})R(f_{\sharp}-r)=-\tau_{\sharp}S(f_{\sharp}-s) of the regularization solution ff_{\sharp}, see (11), allows us to validate condition (26) via

aR(fr)+bS(fs)+(aR+bS)h\displaystyle aR(f_{\sharp}-r)+bS(f_{\sharp}-s)+(aR+bS)h_{\sharp} =1λ((1τ)R(fr)+τS(fs)+((1τ)R+τS)h)\displaystyle=\frac{1}{\lambda_{\sharp}}\Big{(}(1-\tau_{\sharp})R(f_{\sharp}-r)+\tau_{\sharp}S(f_{\sharp}-s)+((1-\tau_{\sharp})R+\tau_{\sharp}S)h_{\sharp}\Big{)}
=1λ(0+λh)=h.\displaystyle=\frac{1}{\lambda_{\sharp}}\left(0+\lambda_{\sharp}h_{\sharp}\right)=h_{\sharp}.

The orthogonality conditions (20) are also swiftly verified: the second one follows from the first one using (11); the first one holds because, while hh_{\sharp} is an eigenvector of (1τ)R+τS(1-\tau_{\sharp})R+\tau_{\sharp}S corresponding to its smallest eigenvalue, R(fr)=τ/(1τ)S(fs)R(f_{\sharp}-r)=-\tau_{\sharp}/(1-\tau_{\sharp})S(f_{\sharp}-s) is an eigenvector corresponding to the largest eigenvalue (i.e., to one), since it is invariant when applying both RR and SS. Thus, it remains to verify that the two conditions of (24) are fulfilled. In view of the orthogonality conditions (20), they read

(32) Rfr2+Rh2=ε2andSfs2+Sh2=η2.\|Rf_{\sharp}-r\|^{2}+\|Rh_{\sharp}\|^{2}=\varepsilon^{2}\qquad\mbox{and}\qquad\|Sf_{\sharp}-s\|^{2}+\|Sh_{\sharp}\|^{2}=\eta^{2}.

Now, invoking Proposition 3, as well as Lemma 9, the two conditions of (24) become

(33) τ2δ2+(τλ)λ(1τ)(12λ)h2\displaystyle\tau_{\sharp}^{2}\delta^{2}+\frac{(\tau_{\sharp}-\lambda_{\sharp})\lambda_{\sharp}}{(1-\tau_{\sharp})(1-2\lambda_{\sharp})}\|h_{\sharp}\|^{2} =ε2\displaystyle=\varepsilon^{2}
(34) (1τ)2δ2+(1τλ)λτ(12λ)h2\displaystyle(1-\tau_{\sharp})^{2}\delta^{2}+\frac{(1-\tau_{\sharp}-\lambda_{\sharp})\lambda_{\sharp}}{\tau_{\sharp}(1-2\lambda_{\sharp})}\|h_{\sharp}\|^{2} =η2.\displaystyle=\eta^{2}.

After some simplification work, starting by forming the combinations (1τ)×(1-\tau_{\sharp})\times(33)τ2×-\tau_{\sharp}^{2}\times(34) and (1τλ)(1τ)×(1-\tau_{\sharp}-\lambda_{\sharp})(1-\tau_{\sharp})\times(33)(τλ)(τ)×-(\tau_{\sharp}-\lambda_{\sharp})(\tau_{\sharp})\times(34), these two conditions are seen to be equivalent to

(35) h2\displaystyle\|h_{\sharp}\|^{2} =12λ(2τ1)λ2((1τ)2ε2τ2η2),\displaystyle=\frac{1-2\lambda_{\sharp}}{(2\tau_{\sharp}-1)\lambda_{\sharp}^{2}}\big{(}(1-\tau_{\sharp})^{2}\varepsilon^{2}-\tau_{\sharp}^{2}\eta^{2}\big{)},
(36) λ\displaystyle\lambda_{\sharp} =(1τ)2ε2τ2η2(1τ)ε2τη2+(1τ)τ(12τ)δ2.\displaystyle=\frac{(1-\tau_{\sharp})^{2}\varepsilon^{2}-\tau_{\sharp}^{2}\eta^{2}}{(1-\tau_{\sharp})\varepsilon^{2}-\tau_{\sharp}\eta^{2}+(1-\tau_{\sharp})\tau_{\sharp}(1-2\tau_{\sharp})\delta^{2}}.

These two conditions can be fulfilled: the latter is the condition that defined τ\tau_{\sharp}, i.e., (28), while the former is simply guaranteed by properly normalizing the eigenvector hh_{\sharp}.

Before establishing the existence of τ\tau_{\sharp}, we point out that its uniqueness holds when f0f1f_{0}\not=f_{1}, i.e., when there is no fHf\in H such that Rf=rRf=r and Sf=sSf=s—such an ff would solve the regularization program for any τ[0,1]\tau\in[0,1]. Indeed, if ττ\tau\not=\tau^{\prime} were two solutions to (28), then the previous argument would imply that fτf_{\tau} and fτf_{\tau^{\prime}} are both Chebyshev centers, which could only happen if they were equal, i.e., if f0=f1f_{0}=f_{1} by (15). Now, the existence of τ\tau_{\sharp} will be justified by the fact that the function

θ:τλmin((1τ)R+τS)(1τ)2ε2τ2η2(1τ)ε2τη2+(1τ)τ(12τ)δ2\theta:\tau\mapsto\lambda_{\min}((1-\tau)R+\tau S)-\frac{(1-\tau)^{2}\varepsilon^{2}-\tau^{2}\eta^{2}}{(1-\tau)\varepsilon^{2}-\tau\eta^{2}+(1-\tau)\tau(1-2\tau)\delta^{2}}

is continuous between 1/21/2 and ε/(ε+η)\varepsilon/(\varepsilon+\eta) and takes values of different signs there. To see the difference in sign, notice that λmin((1τ)R+τS)[0,1/2]\lambda_{\min}((1-\tau)R+\tau S)\in[0,1/2] by the remark after Lemma 9—this is where the assumption RIdR\not=\mathrm{Id} is critical—so that

θ(12)12120andθ(εε+η)000.\theta\bigg{(}\frac{1}{2}\bigg{)}\leq\frac{1}{2}-\frac{1}{2}\leq 0\qquad\mbox{and}\qquad\theta\bigg{(}\frac{\varepsilon}{\varepsilon+\eta}\bigg{)}\geq 0-0\geq 0.

To see the continuity, we need the continuity of the smallest eigenvalue as a function of τ\tau and the nonvanishing of the denominator (1τ)ε2τη2+(1τ)τ(12τ)δ2(1-\tau)\varepsilon^{2}-\tau\eta^{2}+(1-\tau)\tau(1-2\tau)\delta^{2} between 1/21/2 and ε/(ε+η)\varepsilon/(\varepsilon+\eta). The former is a consequence of Weyl’s inequality, yielding

|λmin((1τ)R+τS)λmin((1τ)R+τS)|((1τ)R+τS)((1τ)R+τS)=|ττ|RS.|\lambda_{\min}((1-\tau)R+\tau S)-\lambda_{\min}((1-\tau^{\prime})R+\tau^{\prime}S)|\leq\|((1-\tau)R+\tau S)-((1-\tau^{\prime})R+\tau^{\prime}S)\|=|\tau-\tau^{\prime}|\,\|R-S\|.

The latter is less immediate. We start by using (16) and recalling the very definition of fτf_{\tau} to write

(1-\tau)\tau\delta^{2}=(1-\tau)\|Rf_{\tau}-r\|^{2}+\tau\|Sf_{\tau}-s\|^{2}\leq(1-\tau)\varepsilon^{2}+\tau\eta^{2}.

Therefore, if the denominator vanished for some τ(0,1){1/2}\tau\in(0,1)\setminus\{1/2\}, we would have

0\displaystyle 0 =(1τ)ε2τη212τ+(1τ)τδ2(1τ)ε2τη212τ+(1τ)ε2+τη2=2(1τ)2ε22τ2η212τ\displaystyle=\frac{(1-\tau)\varepsilon^{2}-\tau\eta^{2}}{1-2\tau}+(1-\tau)\tau\delta^{2}\leq\frac{(1-\tau)\varepsilon^{2}-\tau\eta^{2}}{1-2\tau}+(1-\tau)\varepsilon^{2}+\tau\eta^{2}=\frac{2(1-\tau)^{2}\varepsilon^{2}-2\tau^{2}\eta^{2}}{1-2\tau}
=((1τ)ε+τη)((1τ)ετη)1/2τ=((1τ)ε+τη)(ε+η)ε/(ε+η)τ1/2τ.\displaystyle=\frac{\big{(}(1-\tau)\varepsilon+\tau\eta\big{)}\big{(}(1-\tau)\varepsilon-\tau\eta\big{)}}{1/2-\tau}=\big{(}(1-\tau)\varepsilon+\tau\eta\big{)}\big{(}\varepsilon+\eta\big{)}\frac{\varepsilon/(\varepsilon+\eta)-\tau}{1/2-\tau}.

This would force ε/(ε+η)τ\varepsilon/(\varepsilon+\eta)-\tau and 1/2τ1/2-\tau to have the same sign, contrary to the assumption that τ\tau runs between 1/21/2 and ε/(ε+η)\varepsilon/(\varepsilon+\eta). Thus, the nonvanishing of the denominator is explained, concluding the proof. ∎

Remark.

The above arguments contain the value of the minimal local worst-case error, i.e., of the Chebyshev radius of the set 𝒞={fH:Rfrε,Sfsη}\mathcal{C}=\{f\in H:\|Rf-r\|\leq\varepsilon,\|Sf-s\|\leq\eta\}. Indeed, we recall from the proof of Lemma 5 that this radius equals h\|h_{\sharp}\|, whose value was derived in (35). This expression can be simplified with the help of (36) by noticing that

12λλ=(2τ1)(1τ)ε2+τη2(1τ)τδ2(1τ)2ε2τ2η2.\frac{1-2\lambda_{\sharp}}{\lambda_{\sharp}}=(2\tau_{\sharp}-1)\frac{(1-\tau_{\sharp})\varepsilon^{2}+\tau_{\sharp}\eta^{2}-(1-\tau_{\sharp})\tau_{\sharp}\delta^{2}}{(1-\tau_{\sharp})^{2}\varepsilon^{2}-\tau_{\sharp}^{2}\eta^{2}}.

As a consequence, we deduce that the Chebyshev radius satisfies

radius(𝒞)2=1τλε2+τλη2(1τ)τλδ2,λ:=λmin((1τ)R+τS).{\rm radius}(\mathcal{C})^{2}=\frac{1-\tau_{\sharp}}{\lambda_{\sharp}}\varepsilon^{2}+\frac{\tau_{\sharp}}{\lambda_{\sharp}}\eta^{2}-\frac{(1-\tau_{\sharp})\tau_{\sharp}}{\lambda_{\sharp}}\delta^{2},\qquad\quad\lambda_{\sharp}:=\lambda_{\min}((1-\tau_{\sharp})R+\tau_{\sharp}S).

4 Global Optimality

Our goal in this section is to uncover some favorable globally optimal recovery maps—favorable in the sense that they are linear maps. We start by considering the situation of an arbitrary observation map Λ\Lambda before moving to the particular case where it satisfies ΛΛ=Idm\Lambda\Lambda^{*}=\mathrm{Id}_{\mathbb{R}^{m}}.

4.1 Arbitrary observations

In this subsection, we first recall a standard lower bound for the global worst-case error. This lower bound, already exploited e.g. in [Micchelli, 1993], shall be expressed as the minimal value of a certain semidefinite program. This expression will allow us to demonstrate that the lower bound is achieved by the regularization map

Δτ:ymargminfH(1τ)P𝒱f2+τΛfy2\Delta_{\tau}:y\in\mathbb{R}^{m}\mapsto\underset{f\in H}{{\rm argmin}\,}\;(1-\tau)\|P_{\mathcal{V}^{\perp}}f\|^{2}+\tau\|\Lambda f-y\|^{2}

for some parameter τ(0,1)\tau\in(0,1) to be explicitly determined. Here is a precise formulation of the result.

Theorem 10.

Given the approximability set 𝒦={fH:dist(f,𝒱)ε}\mathcal{K}=\{f\in H:{\rm dist}(f,\mathcal{V})\leq\varepsilon\} and the uncertainty set ={em:eη}\mathcal{E}=\{e\in\mathbb{R}^{m}:\|e\|\leq\eta\}, define τ:=d/(c+d)\tau_{\flat}:=d_{\flat}/(c_{\flat}+d_{\flat}) where c,d0c_{\flat},d_{\flat}\geq 0 are solutions to

minimizec,d0cε2+dη2s.tocP𝒱+dΛΛId.\underset{c,d\geq 0}{\rm minimize}\,\;c\varepsilon^{2}+d\eta^{2}\qquad\mbox{s.to}\quad cP_{\mathcal{V}^{\perp}}+d\Lambda^{*}\Lambda\succeq\mathrm{Id}.

Then the regularization map Δτ\Delta_{\tau_{\flat}} is a globally optimal recovery map over 𝒦\mathcal{K} and \mathcal{E}, i.e.,

(37) gwce(Δτ)=infΔ:mHgwce(Δ).{\rm gwce}(\Delta_{\tau_{\flat}})=\inf_{\Delta:\mathbb{R}^{m}\to H}{\rm gwce}(\Delta).
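In practice, the pair (c_{\flat},d_{\flat}), and hence the parameter \tau_{\flat}, can be computed with off-the-shelf semidefinite solvers once H is identified with \mathbb{R}^{N} and P_{\mathcal{V}^{\perp}} and \Lambda are represented by matrices. The following minimal sketch, written with CVXPY [Diamond and Boyd, 2016], illustrates one way to do so; the function name and its arguments are ours and are not part of the statement above.

import numpy as np
import cvxpy as cp

# Minimal sketch (assumed setup): P is the N x N matrix representing P_{V^perp},
# Lam is the m x N matrix representing Lambda, eps and eta are the bounds.
def optimal_parameter(P, Lam, eps, eta):
    N = P.shape[0]
    c = cp.Variable(nonneg=True)
    d = cp.Variable(nonneg=True)
    constraints = [c * P + d * (Lam.T @ Lam) >> np.eye(N)]
    problem = cp.Problem(cp.Minimize(c * eps**2 + d * eta**2), constraints)
    problem.solve()
    return d.value / (c.value + d.value), problem.value   # tau_flat and the optimal value

As a byproduct, the optimal value of this program coincides with the square of the minimal global worst-case error, as will follow from Lemma 12 below.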

The proof relies on three lemmas given below, the first of which introduces the said lower bound.

Lemma 11.

For any recovery map Δ:mH\Delta:\mathbb{R}^{m}\to H, one has gwce(Δ)lb{\rm gwce}(\Delta)\geq{\rm lb}, where

lb:=supP𝒱hεΛhηh.{\rm lb}:=\sup_{\begin{subarray}{c}\|P_{\mathcal{V}^{\perp}}h\|\leq\varepsilon\\ \|\Lambda h\|\leq\eta\end{subarray}}\|h\|.
Proof.

As a reminder, the global worst-case error of Δ\Delta is defined by

gwce(Δ)=supP𝒱fεeηfΔ(Λf+e).{\rm gwce}(\Delta)=\sup_{\begin{subarray}{c}\|P_{\mathcal{V}^{\perp}}f\|\leq\varepsilon\\ \|e\|\leq\eta\end{subarray}}\|f-\Delta(\Lambda f+e)\|.

For any hHh\in H such that P𝒱hε\|P_{\mathcal{V}^{\perp}}h\|\leq\varepsilon and Λhη\|\Lambda h\|\leq\eta, since f±=±hf_{\pm}=\pm h satisfies P𝒱f±ε\|P_{\mathcal{V}^{\perp}}f_{\pm}\|\leq\varepsilon and e±=Λhe_{\pm}=\mp\Lambda h satisfies e±η\|e_{\pm}\|\leq\eta, we have

gwce(Δ)\displaystyle{\rm gwce}(\Delta) max±f±Δ(Λf±+e±)=max±f±Δ(0)12f+Δ(0)+12fΔ(0)\displaystyle\geq\max_{\pm}\|f_{\pm}-\Delta(\Lambda f_{\pm}+e_{\pm})\|=\max_{\pm}\|f_{\pm}-\Delta(0)\|\geq\frac{1}{2}\|f_{+}-\Delta(0)\|+\frac{1}{2}\|f_{-}-\Delta(0)\|
12(f+Δ(0))(fΔ(0))=122h=h.\displaystyle\geq\frac{1}{2}\|(f_{+}-\Delta(0))-(f_{-}-\Delta(0))\|=\frac{1}{2}\|2h\|=\|h\|.

Taking the supremum over hh leads to the required inequality gwce(Δ)lb{\rm gwce}(\Delta)\geq{\rm lb}. ∎

The second lemma expresses the square of the lower bound as the minimal value of a semidefinite program. In passing, the square of the global worst-case error of a linear recovery map is also bounded above by the minimal value of a semidefinite program.

Lemma 12.

One has

(38) lb2=minc,d0cε2+dη2s.tocP𝒱+dΛΛId.{\rm lb}^{2}=\min_{c,d\geq 0}\;c\varepsilon^{2}+d\eta^{2}\qquad\mbox{s.to}\quad cP_{\mathcal{V}^{\perp}}+d\Lambda^{*}\Lambda\succeq\mathrm{Id}.

Moreover, if a recovery map Δ:mH\Delta:\mathbb{R}^{m}\to H is linear, one also has

(39) gwce(Δ)2minc,d0cε2+dη2s.to[cP𝒱|00|dIdm][IdΛΔΔ][IdΔΛ|Δ].{\rm gwce}(\Delta)^{2}\leq\min_{c,d\geq 0}\;c\varepsilon^{2}+d\eta^{2}\qquad\mbox{s.to}\quad\begin{bmatrix}cP_{\mathcal{V}^{\perp}}&|&0\\ \hline\cr 0&|&d\,\mathrm{Id}_{\mathbb{R}^{m}}\end{bmatrix}\succeq\begin{bmatrix}\mathrm{Id}-\Lambda^{*}\Delta^{*}\\ \hline\cr\Delta^{*}\end{bmatrix}\begin{bmatrix}\mathrm{Id}-\Delta\Lambda\;|\;\Delta\end{bmatrix}.
Proof.

The first semidefinite characterization is based on the version of the S-procedure stated in Theorem 1. Precisely, we write the square of the lower bound as

lb2\displaystyle{\rm lb}^{2} =suphh2\displaystyle=\;\,\sup_{h}\,\|h\|^{2} s.to P𝒱h2ε2 and Λh2η2\displaystyle\mbox{s.to }\|P_{\mathcal{V}^{\perp}}h\|^{2}\leq\varepsilon^{2}\mbox{ and }\|\Lambda h\|^{2}\leq\eta^{2}
=infγγ\displaystyle=\;\,\inf_{\gamma}\,\gamma s.to h2γ whenever P𝒱h2ε2 and Λh2η2\displaystyle\mbox{s.to }\|h\|^{2}\leq\gamma\mbox{ whenever }\|P_{\mathcal{V}^{\perp}}h\|^{2}\leq\varepsilon^{2}\mbox{ and }\|\Lambda h\|^{2}\leq\eta^{2}
=infγγ\displaystyle=\;\,\inf_{\gamma}\,\gamma s.to c,d0:h2γc(P𝒱h2ε2)+d(Λh2η2) for all hH\displaystyle\mbox{s.to }\exists\,c,d\geq 0:\;\|h\|^{2}-\gamma\leq c(\|P_{\mathcal{V}^{\perp}}h\|^{2}-\varepsilon^{2})+d(\|\Lambda h\|^{2}-\eta^{2})\mbox{ for all }h\in H
=infγc,d0γ\displaystyle=\inf_{\begin{subarray}{c}\gamma\\ c,d\geq 0\end{subarray}}\,\gamma s.to cP𝒱h,h+dΛΛh,hh,h+γcε2dη20 for all hH.\displaystyle\mbox{s.to }c\langle P_{\mathcal{V}^{\perp}}h,h\rangle+d\langle\Lambda^{*}\Lambda h,h\rangle-\langle h,h\rangle+\gamma-c\varepsilon^{2}-d\eta^{2}\geq 0\mbox{ for all }h\in H.

The validity of Theorem 1 is ensured by the facts that \|P_{\mathcal{V}^{\perp}}\widetilde{h}\|^{2}-\varepsilon^{2}<0 and \|\Lambda\widetilde{h}\|^{2}-\eta^{2}<0 for \widetilde{h}=0 and that P_{\mathcal{V}^{\perp}}+\Lambda^{*}\Lambda\succ 0. Note that the resulting constraint decouples as \langle cP_{\mathcal{V}^{\perp}}h+d\Lambda^{*}\Lambda h-h,h\rangle\geq 0 for all h\in H, i.e., cP_{\mathcal{V}^{\perp}}+d\Lambda^{*}\Lambda-\mathrm{Id}\succeq 0, and \gamma-c\varepsilon^{2}-d\eta^{2}\geq 0. Taking the minimal value of \gamma under the latter constraint, namely c\varepsilon^{2}+d\eta^{2}, leads to the expression of {\rm lb}^{2} given in (38).

As for (39), we start by remarking that the linearity of the recovery map Δ\Delta allows us to write

gwce(Δ)2\displaystyle{\rm gwce}(\Delta)^{2} =supf,efΔΛfΔe2s.to P𝒱f2ε2 and e2η2\displaystyle=\sup_{f,e}\,\|f-\Delta\Lambda f-\Delta e\|^{2}\quad\mbox{s.to }\|P_{\mathcal{V}^{\perp}}f\|^{2}\leq\varepsilon^{2}\mbox{ and }\|e\|^{2}\leq\eta^{2}
=infγγs.to fΔΛfΔe2γ whenever P𝒱f2ε2 and e2η2.\displaystyle=\inf_{\gamma}\,\gamma\quad\mbox{s.to }\|f-\Delta\Lambda f-\Delta e\|^{2}\leq\gamma\mbox{ whenever }\|P_{\mathcal{V}^{\perp}}f\|^{2}\leq\varepsilon^{2}\mbox{ and }\|e\|^{2}\leq\eta^{2}.

The latter constraint can be expressed in terms of the combined variable v=(f,e)H×mv=(f,-e)\in H\times\mathbb{R}^{m} as

(40) [IdΔΛ|Δ]v2γ whenever [P𝒱| 0]v2ε2 and [0|Idm]v2η2.\Big{\|}\begin{bmatrix}\mathrm{Id}-\Delta\Lambda\;|\;\Delta\end{bmatrix}v\Big{\|}^{2}\leq\gamma\mbox{ whenever }\Big{\|}\begin{bmatrix}P_{\mathcal{V}^{\perp}}\;|\;0\end{bmatrix}v\Big{\|}^{2}\leq\varepsilon^{2}\mbox{ and }\Big{\|}\begin{bmatrix}0\;|\;\mathrm{Id}_{\mathbb{R}^{m}}\end{bmatrix}v\Big{\|}^{2}\leq\eta^{2}.

Although the proviso of Theorem 1 is not fulfilled here, the constraint (40) is still a consequence of (but is not equivalent to) the existence of c,d\geq 0 such that

[IdΔΛ|Δ]v2γc([P𝒱| 0]v2ε2)+d([0|Idm]v2η2)for all vH×m.\;\Big{\|}\begin{bmatrix}\mathrm{Id}-\Delta\Lambda\;|\;\Delta\end{bmatrix}v\Big{\|}^{2}-\gamma\leq c\Big{(}\Big{\|}\begin{bmatrix}P_{\mathcal{V}^{\perp}}\;|\;0\end{bmatrix}v\Big{\|}^{2}-\varepsilon^{2}\Big{)}+d\Big{(}\Big{\|}\begin{bmatrix}0\;|\;\mathrm{Id}_{\mathbb{R}^{m}}\end{bmatrix}v\Big{\|}^{2}-\eta^{2}\Big{)}\quad\mbox{for all }v\in H\times\mathbb{R}^{m}.

The latter can also be written as the existence of c,d0c,d\geq 0 such that, for all vH×mv\in H\times\mathbb{R}^{m},

(c[P𝒱| 0][P𝒱| 0]+d[0|Idm][0|Idm][IdΔΛ|Δ][IdΔΛ|Δ])v,v+γcε2dη20.\Big{\langle}\Big{(}c\begin{bmatrix}P_{\mathcal{V}^{\perp}}\;|\;0\end{bmatrix}^{*}\begin{bmatrix}P_{\mathcal{V}^{\perp}}\;|\;0\end{bmatrix}+d\begin{bmatrix}0\;|\;\mathrm{Id}_{\mathbb{R}^{m}}\end{bmatrix}^{*}\begin{bmatrix}0\;|\;\mathrm{Id}_{\mathbb{R}^{m}}\end{bmatrix}-\begin{bmatrix}\mathrm{Id}-\Delta\Lambda\;|\;\Delta\end{bmatrix}^{*}\begin{bmatrix}\mathrm{Id}-\Delta\Lambda\;|\;\Delta\end{bmatrix}\Big{)}v,v\Big{\rangle}\\ +\gamma-c\varepsilon^{2}-d\eta^{2}\geq 0.

Therefore, we obtain the inequality (instead of the equality)

gwce(Δ)2\displaystyle{\rm gwce}(\Delta)^{2} infγc,d0γ\displaystyle\leq\inf_{\begin{subarray}{c}\gamma\\ c,d\geq 0\end{subarray}}\,\gamma s.to c[P𝒱|00|0]+d[0|00|Idm][IdΛΔΔ][IdΔΛ|Δ]0\displaystyle c\begin{bmatrix}P_{\mathcal{V}^{\perp}}&|&0\\ \hline\cr 0&|&0\end{bmatrix}+d\begin{bmatrix}0&|&0\\ \hline\cr 0&|&\mathrm{Id}_{\mathbb{R}^{m}}\end{bmatrix}-\begin{bmatrix}\mathrm{Id}-\Lambda^{*}\Delta^{*}\\ \hline\cr\Delta^{*}\end{bmatrix}\begin{bmatrix}\mathrm{Id}-\Delta\Lambda\;|\;\Delta\end{bmatrix}\succeq 0
and γcε2dη20.\displaystyle\gamma-c\varepsilon^{2}-d\eta^{2}\geq 0.

The variable γ\gamma can be eliminated from this optimization program by assigning it the value cε2+dη2c\varepsilon^{2}+d\eta^{2}, thus arriving at the semidefinite program announced in (39). ∎
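In the same spirit, for a given linear recovery map \Delta, the right-hand side of (39) can be evaluated with standard semidefinite solvers. Here is a hedged sketch of such an evaluation (names and conventions are ours, with H identified with \mathbb{R}^{N} as before and the block matrices formed via cp.bmat).

import numpy as np
import cvxpy as cp

# Minimal sketch: semidefinite upper bound of (39) on gwce(Delta)^2 for a given
# linear map Delta, stored as an N x m matrix; P and Lam are as previously.
def gwce_upper_bound(P, Lam, Delta, eps, eta):
    N, m = P.shape[0], Lam.shape[0]
    c = cp.Variable(nonneg=True)
    d = cp.Variable(nonneg=True)
    M = np.hstack([np.eye(N) - Delta @ Lam, Delta])    # [Id - Delta Lambda | Delta]
    lhs = cp.bmat([[c * P, np.zeros((N, m))],
                   [np.zeros((m, N)), d * np.eye(m)]])
    problem = cp.Problem(cp.Minimize(c * eps**2 + d * eta**2), [lhs >> M.T @ M])
    problem.solve()
    return problem.value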

The third and final lemma relates the constraints of (38) and (39): while the constraint of (39) with any regularization map Δτ\Delta_{\tau} implies the constraint of (38), see the appendix, we need the partial converse that the constraint of (38) implies the constraint of (39) for a specific regularization map Δτ\Delta_{\tau}.

Lemma 13.

If cP𝒱+dΛΛIdcP_{\mathcal{V}^{\perp}}+d\Lambda^{*}\Lambda\succeq\mathrm{Id}, then setting τ=d/(c+d)\tau=d/(c+d) yields

[cP𝒱|00|dIdm][IdΛΔτΔτ][IdΔτΛ|Δτ].\begin{bmatrix}cP_{\mathcal{V}^{\perp}}&|&0\\ \hline\cr 0&|&d\,\mathrm{Id}_{\mathbb{R}^{m}}\end{bmatrix}\succeq\begin{bmatrix}\mathrm{Id}-\Lambda^{*}\Delta_{\tau}^{*}\\ \hline\cr\Delta_{\tau}^{*}\end{bmatrix}\begin{bmatrix}\mathrm{Id}-\Delta_{\tau}\Lambda\;|\;\Delta_{\tau}\end{bmatrix}.
Proof.

We recall from Proposition 2 adapted to the current situation that, for any τ(0,1)\tau\in(0,1),

Δτ=((1τ)P𝒱+τΛΛ)1(τΛ),henceIdΔτΛ=((1τ)P𝒱+τΛΛ)1((1τ)P𝒱).\Delta_{\tau}=\big{(}(1-\tau)P_{\mathcal{V}^{\perp}}+\tau\Lambda^{*}\Lambda\big{)}^{-1}(\tau\Lambda^{*}),\quad\mbox{hence}\quad\mathrm{Id}-\Delta_{\tau}\Lambda=\big{(}(1-\tau)P_{\mathcal{V}^{\perp}}+\tau\Lambda^{*}\Lambda\big{)}^{-1}((1-\tau)P_{\mathcal{V}^{\perp}}).

We now notice that the hypothesis cP𝒱+dΛΛIdcP_{\mathcal{V}^{\perp}}+d\Lambda^{*}\Lambda\succeq\mathrm{Id} is equivalent to λmin(cP𝒱+dΛΛ)1\lambda_{\min}(cP_{\mathcal{V}^{\perp}}+d\Lambda^{*}\Lambda)\geq 1. With our particular choice of τ\tau, this reads λmin((1τ)P𝒱+τΛΛ)1/(c+d)\lambda_{\min}((1-\tau)P_{\mathcal{V}^{\perp}}+\tau\Lambda^{*}\Lambda)\geq 1/(c+d). It follows that

λmax(((1τ)P𝒱+τΛΛ)1)=1λmin((1τ)P𝒱+τΛΛ)c+d.\lambda_{\max}\big{(}((1-\tau)P_{\mathcal{V}^{\perp}}+\tau\Lambda^{*}\Lambda)^{-1}\big{)}=\frac{1}{\lambda_{\min}((1-\tau)P_{\mathcal{V}^{\perp}}+\tau\Lambda^{*}\Lambda)}\leq c+d.

The inverse appearing above can be written as

((1τ)P𝒱+τΛΛ)1[1τP𝒱|τΛ][1τP𝒱τΛ]((1τ)P𝒱+τΛΛ)1\big{(}(1-\tau)P_{\mathcal{V}^{\perp}}+\tau\Lambda^{*}\Lambda\big{)}^{-1}\begin{bmatrix}\sqrt{1-\tau}P_{\mathcal{V}^{\perp}}\;|\;\sqrt{\tau}\Lambda^{*}\end{bmatrix}\begin{bmatrix}\sqrt{1-\tau}P_{\mathcal{V}^{\perp}}\\ \hline\cr\sqrt{\tau}\Lambda\end{bmatrix}\big{(}(1-\tau)P_{\mathcal{V}^{\perp}}+\tau\Lambda^{*}\Lambda\big{)}^{-1}

and since ABAB and BABA always have the same nonzero eigenvalues, we derive that

λmax([1τP𝒱τΛ]((1τ)P𝒱+τΛΛ)2[1τP𝒱|τΛ])c+d.\lambda_{\max}\left(\begin{bmatrix}\sqrt{1-\tau}P_{\mathcal{V}^{\perp}}\\ \hline\cr\sqrt{\tau}\Lambda\end{bmatrix}\big{(}(1-\tau)P_{\mathcal{V}^{\perp}}+\tau\Lambda^{*}\Lambda\big{)}^{-2}\begin{bmatrix}\sqrt{1-\tau}P_{\mathcal{V}^{\perp}}\;|\;\sqrt{\tau}\Lambda^{*}\end{bmatrix}\right)\leq c+d.

Writing the latter as

[1τP𝒱τΛ]((1τ)P𝒱+τΛΛ)2[1τP𝒱|τΛ](c+d)Id\begin{bmatrix}\sqrt{1-\tau}P_{\mathcal{V}^{\perp}}\\ \hline\cr\sqrt{\tau}\Lambda\end{bmatrix}\big{(}(1-\tau)P_{\mathcal{V}^{\perp}}+\tau\Lambda^{*}\Lambda\big{)}^{-2}\begin{bmatrix}\sqrt{1-\tau}P_{\mathcal{V}^{\perp}}\;|\;\sqrt{\tau}\Lambda^{*}\end{bmatrix}\preceq(c+d)\mathrm{Id}

and multiplying on both sides by [1τP𝒱|00|τIdm]\begin{bmatrix}\sqrt{1-\tau}P_{\mathcal{V}^{\perp}}&|&0\\ \hline\cr 0&|&\sqrt{\tau}\,\mathrm{Id}_{\mathbb{R}^{m}}\end{bmatrix} yields

[(1τ)P𝒱τΛ]((1τ)P𝒱+τΛΛ)2[(1τ)P𝒱|τΛ](c+d)[(1τ)P𝒱|00|τIdm].\begin{bmatrix}(1-\tau)P_{\mathcal{V}^{\perp}}\\ \hline\cr\tau\Lambda\end{bmatrix}\big{(}(1-\tau)P_{\mathcal{V}^{\perp}}+\tau\Lambda^{*}\Lambda\big{)}^{-2}\begin{bmatrix}(1-\tau)P_{\mathcal{V}^{\perp}}\;|\;\tau\Lambda^{*}\end{bmatrix}\preceq(c+d)\begin{bmatrix}(1-\tau)P_{\mathcal{V}^{\perp}}&|&0\\ \hline\cr 0&|&\tau\,\mathrm{Id}_{\mathbb{R}^{m}}\end{bmatrix}.

Taking the expressions of Δτ\Delta_{\tau} and IdΔτΛ\mathrm{Id}-\Delta_{\tau}\Lambda into account, we conclude that

[IdΛΔτΔτ][IdΔτΛ|Δτ][cP𝒱|00|dIdm],\begin{bmatrix}\mathrm{Id}-\Lambda^{*}\Delta_{\tau}^{*}\\ \hline\cr\Delta_{\tau}^{*}\end{bmatrix}\begin{bmatrix}\mathrm{Id}-\Delta_{\tau}\Lambda\;|\;\Delta_{\tau}\end{bmatrix}\preceq\begin{bmatrix}cP_{\mathcal{V}^{\perp}}&|&0\\ \hline\cr 0&|&d\,\mathrm{Id}_{\mathbb{R}^{m}}\end{bmatrix},

as announced. ∎

With the above three lemmas at hand, the main result of this subsection follows easily.

Proof of Theorem 10.

Since Lemma 11 guarantees that \inf\{{\rm gwce}(\Delta),\Delta:\mathbb{R}^{m}\to H\}\geq{\rm lb}, we only need to show that {\rm gwce}(\Delta_{\tau_{\flat}})\leq{\rm lb}. By the first part of Lemma 12, we have {\rm lb}^{2}=c_{\flat}\varepsilon^{2}+d_{\flat}\eta^{2} with c_{\flat} and d_{\flat} satisfying c_{\flat}P_{\mathcal{V}^{\perp}}+d_{\flat}\Lambda^{*}\Lambda\succeq\mathrm{Id}. By Lemma 13, the latter implies that

[cP𝒱|00|dIdm][IdΛΔτΔτ][IdΔτΛ|Δτ].\begin{bmatrix}c_{\flat}P_{\mathcal{V}^{\perp}}&|&0\\ \hline\cr 0&|&d_{\flat}\mathrm{Id}_{\mathbb{R}^{m}}\end{bmatrix}\succeq\begin{bmatrix}\mathrm{Id}-\Lambda^{*}\Delta_{\tau_{\flat}}^{*}\\ \hline\cr\Delta_{\tau_{\flat}}^{*}\end{bmatrix}\begin{bmatrix}\mathrm{Id}-\Delta_{\tau_{\flat}}\Lambda\;|\;\Delta_{\tau_{\flat}}\end{bmatrix}.

By the second part of Lemma 12, it follows that gwce(Δτ)2cε2+dη2=lb2{\rm gwce}(\Delta_{\tau_{\flat}})^{2}\leq c_{\flat}\varepsilon^{2}+d_{\flat}\eta^{2}={\rm lb}^{2}, which is the required inequality.∎

Remark.

When 𝒱={0}\mathcal{V}=\{0\}, so that P𝒱=IdP_{\mathcal{V}^{\perp}}=\mathrm{Id}, we obtain c=1c_{\flat}=1 and d=0d_{\flat}=0, resulting in a minimal global worst-case error equal to ε\varepsilon and achieved for the regularization map Δ0=0\Delta_{0}=0. This result can be seen directly from gwce(Δ)sup{h:hε,Λhη}=ε{\rm gwce}(\Delta)\geq\sup\{\|h\|:\|h\|\leq\varepsilon,\|\Lambda h\|\leq\eta\}=\varepsilon for any Δ:mH\Delta:\mathbb{R}^{m}\to H, while gwce(Δ0)=sup{f:fε}=ε{\rm gwce}(\Delta_{0})=\sup\{\|f\|:\|f\|\leq\varepsilon\}=\varepsilon.

4.2 Orthonormal observations

In this subsection, we demonstrate that the use of orthonormal observations guarantees, rather unexpectedly, that regularization provides optimal recovery maps even without a careful parameter selection. The main result reads as follows.

Theorem 14.

Given the approximability set 𝒦={fH:dist(f,𝒱)ε}\mathcal{K}=\{f\in H:{\rm dist}(f,\mathcal{V})\leq\varepsilon\} and the uncertainty set ={em:eη}\mathcal{E}=\{e\in\mathbb{R}^{m}:\|e\|\leq\eta\}, if ΛΛ=Idm\Lambda\Lambda^{*}=\mathrm{Id}_{\mathbb{R}^{m}}, then all the regularization maps Δτ\Delta_{\tau} are optimal recovery maps, i.e., for all τ[0,1]\tau\in[0,1],

(41) gwce(Δτ)=infΔ:mHgwce(Δ).{\rm gwce}(\Delta_{\tau})=\inf_{\Delta:\mathbb{R}^{m}\to H}{\rm gwce}(\Delta).

The proof strategy consists in establishing that the constraints in (38) and in (39) with Δ=Δτ\Delta=\Delta_{\tau} are in fact equivalent for any τ[0,1]\tau\in[0,1]. This yields the inequality gwce(Δτ)lb{\rm gwce}(\Delta_{\tau})\leq{\rm lb}, which proves the required result, given that lb{\rm lb} was introduced as a lower bound on gwce(Δ){\rm gwce}(\Delta) for every Δ\Delta. While the constraint in (39) implies the constraint in (38) for any observation map Λ\Lambda (see the appendix), the reverse implication relies on the fact that ΛΛ=Idm\Lambda\Lambda^{*}=\mathrm{Id}_{\mathbb{R}^{m}}, e.g. via the identity Δτ=(1τ)Δ0+τΔ1\Delta_{\tau}=(1-\tau)\Delta_{0}+\tau\Delta_{1} derived in Proposition 3. The following realization is also a crucial point of our argument.

Lemma 15.

Assume that ΛΛ=Idm\Lambda\Lambda^{*}=\mathrm{Id}_{\mathbb{R}^{m}}. For c,d0c,d\geq 0, let hh be an eigenvector of cP𝒱+dΛΛcP_{\mathcal{V}^{\perp}}+d\Lambda^{*}\Lambda associated with an eigenvalue λ\lambda. For any τ[0,1]\tau\in[0,1], one has

  • if λc+d\lambda\not=c+d, then

    (IdΛΔτ)h=cλP𝒱handΛΔτh=dλΛΛh;(\mathrm{Id}-\Lambda^{*}\Delta_{\tau}^{*})h=\dfrac{c}{\lambda}P_{\mathcal{V}^{\perp}}h\qquad\mbox{and}\qquad\Lambda^{*}\Delta_{\tau}^{*}h=\dfrac{d}{\lambda}\Lambda^{*}\Lambda h;
  • if λ=c+d\lambda=c+d, then

    (IdΛΔτ)h=(1τ)handΛΔτh=τh.(\mathrm{Id}-\Lambda^{*}\Delta_{\tau}^{*})h=(1-\tau)h\qquad\mbox{and}\qquad\Lambda^{*}\Delta_{\tau}^{*}h=\tau h.
Proof.

Multiplying the eigenequation defining hh on the left by ΛΔτ\Lambda^{*}\Delta_{\tau}^{*}, we obtain

(42) cΛΔτP𝒱h+dΛΔτΛΛh=λΛΔτh.c\Lambda^{*}\Delta_{\tau}^{*}P_{\mathcal{V}^{\perp}}h+d\Lambda^{*}\Delta_{\tau}^{*}\Lambda^{*}\Lambda h=\lambda\Lambda^{*}\Delta_{\tau}^{*}h.

According to (19), we have Δ0P𝒱=0\Delta_{0}^{*}P_{\mathcal{V}^{\perp}}=0, Δ1P𝒱=Δ1Δ0\Delta_{1}^{*}P_{\mathcal{V}^{\perp}}=\Delta_{1}^{*}-\Delta_{0}^{*}, Δ1ΛΛ=Λ\Delta_{1}^{*}\Lambda^{*}\Lambda=\Lambda, and Δ0ΛΛ=Δ0Δ1+Λ\Delta_{0}^{*}\Lambda^{*}\Lambda=\Delta_{0}^{*}-\Delta_{1}^{*}+\Lambda. Thus, the relation (42) specified to τ=0\tau=0 and to τ=1\tau=1 yields

(43) dΛΔ0hdΛΔ1h+dΛΛh\displaystyle d\Lambda^{*}\Delta_{0}^{*}h-d\Lambda^{*}\Delta_{1}^{*}h+d\Lambda^{*}\Lambda h =λΛΔ0h,\displaystyle=\lambda\Lambda^{*}\Delta_{0}^{*}h,
(44) cΛΔ1hcΛΔ0h+dΛΛh\displaystyle c\Lambda^{*}\Delta_{1}^{*}h-c\Lambda^{*}\Delta_{0}^{*}h+d\Lambda^{*}\Lambda h =λΛΔ1h.\displaystyle=\lambda\Lambda^{*}\Delta_{1}^{*}h.

Subtracting (44) from (43) yields (c+d)(ΛΔ0hΛΔ1h)=λ(ΛΔ0hΛΔ1h)(c+d)(\Lambda^{*}\Delta_{0}^{*}h-\Lambda^{*}\Delta_{1}^{*}h)=\lambda(\Lambda^{*}\Delta_{0}^{*}h-\Lambda^{*}\Delta_{1}^{*}h). Therefore, we derive that ΛΔ0h=ΛΔ1h\Lambda^{*}\Delta_{0}^{*}h=\Lambda^{*}\Delta_{1}^{*}h provided λc+d\lambda\not=c+d. In this case, the equations (43)-(44) reduce to ΛΔ0h=ΛΔ1h=(d/λ)ΛΛh\Lambda^{*}\Delta_{0}^{*}h=\Lambda^{*}\Delta_{1}^{*}h=(d/\lambda)\Lambda^{*}\Lambda h. In view of Δτ=(1τ)Δ0+τΔ1\Delta_{\tau}=(1-\tau)\Delta_{0}+\tau\Delta_{1}, we arrive at ΛΔτh=(d/λ)ΛΛh\Lambda^{*}\Delta_{\tau}^{*}h=(d/\lambda)\Lambda^{*}\Lambda h for any τ[0,1]\tau\in[0,1]. The relation (IdΛΔτ)h=(c/λ)P𝒱h(\mathrm{Id}-\Lambda^{*}\Delta_{\tau}^{*})h=(c/\lambda)P_{\mathcal{V}^{\perp}}h follows from the eigenequation rewritten as (c/λ)P𝒱h+(d/λ)ΛΛh=h(c/\lambda)P_{\mathcal{V}^{\perp}}h+(d/\lambda)\Lambda^{*}\Lambda h=h.

It remains to deal with the case λ=c+d\lambda=c+d. Notice that this case is not vacuous, as it is equivalent to h𝒱ran(ΛΛ)h\in\mathcal{V}^{\perp}\cap{\rm ran}(\Lambda^{*}\Lambda), which is nontrivial by a dimension argument involving assumption (5). To see this equivalence, notice that h𝒱ran(ΛΛ)h\in\mathcal{V}^{\perp}\cap{\rm ran}(\Lambda^{*}\Lambda) clearly implies cP𝒱h+dΛΛh=(c+d)hcP_{\mathcal{V}^{\perp}}h+d\Lambda^{*}\Lambda h=(c+d)h, while the latter eigenequation forces cP𝒱h2+dΛΛh2=(c+d)h2c\|P_{\mathcal{V}^{\perp}}h\|^{2}+d\|\Lambda^{*}\Lambda h\|^{2}=(c+d)\|h\|^{2}, hence P𝒱h2=h2\|P_{\mathcal{V}^{\perp}}h\|^{2}=\|h\|^{2} and ΛΛh2=h2\|\Lambda^{*}\Lambda h\|^{2}=\|h\|^{2}, i.e., h𝒱h\in\mathcal{V}^{\perp} and hran(ΛΛ)h\in{\rm ran}(\Lambda^{*}\Lambda). We now consider such an eigenvector hh associated with the eigenvalue c+dc+d: in view of h𝒱ran(ΛΛ)h\in\mathcal{V}^{\perp}\cap{\rm ran}(\Lambda^{*}\Lambda), we remark that Δ0h=Δ0P𝒱h=0\Delta_{0}^{*}h=\Delta_{0}^{*}P_{\mathcal{V}^{\perp}}h=0 and that Δ1h=Δ1ΛΛh=Λh\Delta_{1}^{*}h=\Delta_{1}^{*}\Lambda^{*}\Lambda h=\Lambda h. We deduce that ΛΔτh=(1τ)ΛΔ0h+τΛΔ1h=τΛΛh=τh\Lambda^{*}\Delta_{\tau}^{*}h=(1-\tau)\Lambda^{*}\Delta_{0}^{*}h+\tau\Lambda^{*}\Delta_{1}^{*}h=\tau\Lambda^{*}\Lambda h=\tau h and in turn that (IdΛΔτ)h=(1τ)h(\mathrm{Id}-\Lambda^{*}\Delta_{\tau}^{*})h=(1-\tau)h. ∎

We are now ready to establish the main result of this subsection.

Proof of Theorem 14.

Let τ[0,1]\tau\in[0,1] be fixed throughout. As announced earlier, our objective is to establish that, thanks to ΛΛ=Idm\Lambda\Lambda^{*}=\mathrm{Id}_{\mathbb{R}^{m}}, the condition cP𝒱+dΛΛIdcP_{\mathcal{V}^{\perp}}+d\Lambda^{*}\Lambda\succeq\mathrm{Id} implies the condition

[cP𝒱|00|dIdm][IdΛΔτΔτ][IdΔτΛ|Δτ],\begin{bmatrix}cP_{\mathcal{V}^{\perp}}&|&0\\ \hline\cr 0&|&d\,\mathrm{Id}_{\mathbb{R}^{m}}\end{bmatrix}\succeq\begin{bmatrix}\mathrm{Id}-\Lambda^{*}\Delta_{\tau}^{*}\\ \hline\cr\Delta_{\tau}^{*}\end{bmatrix}\begin{bmatrix}\mathrm{Id}-\Delta_{\tau}\Lambda\;|\;\Delta_{\tau}\end{bmatrix},

or equivalently the condition

[cP𝒱|00|dΛΛ][IdΛΔτΛΔτ][IdΔτΛ|ΔτΛ].\begin{bmatrix}cP_{\mathcal{V}^{\perp}}&|&0\\ \hline\cr 0&|&d\Lambda^{*}\Lambda\end{bmatrix}\succeq\begin{bmatrix}\mathrm{Id}-\Lambda^{*}\Delta_{\tau}^{*}\\ \hline\cr\Lambda^{*}\Delta_{\tau}^{*}\end{bmatrix}\begin{bmatrix}\mathrm{Id}-\Delta_{\tau}\Lambda\;|\;\Delta_{\tau}\Lambda\end{bmatrix}.

The equivalence of these conditions is seen as follows: the former implies the latter by multiplying on the left by [Id|00|Λ]\begin{bmatrix}\mathrm{Id}&|&0\\ \hline\cr 0&|&\Lambda^{*}\end{bmatrix} and on the right by [Id|00|Λ]\begin{bmatrix}\mathrm{Id}&|&0\\ \hline\cr 0&|&\Lambda\end{bmatrix}, while the latter implies the former under the assumption ΛΛ=Idm\Lambda\Lambda^{*}=\mathrm{Id}_{\mathbb{R}^{m}} by multiplying on the left by [Id|00|Λ]\begin{bmatrix}\mathrm{Id}&|&0\\ \hline\cr 0&|&\Lambda\end{bmatrix} and on the right by [Id|00|Λ]\begin{bmatrix}\mathrm{Id}&|&0\\ \hline\cr 0&|&\Lambda^{*}\end{bmatrix}. As a matter of fact, according to a classical result about Schur complements, see e.g. [Boyd and Vandenberghe, 2004, Section A.5.5], the latter is further equivalent to

[Id|IdΔτΛ|ΔτΛIdΛΔτ|cP𝒱|0ΛΔτ|0|dΛΛ]0.\begin{bmatrix}\mathrm{Id}&|&\mathrm{Id}-\Delta_{\tau}\Lambda&|&\Delta_{\tau}\Lambda\\ \hline\cr\mathrm{Id}-\Lambda^{*}\Delta_{\tau}^{*}&|&cP_{\mathcal{V}^{\perp}}&|&0\\ \hline\cr\Lambda^{*}\Delta_{\tau}^{*}&|&0&|&d\Lambda^{*}\Lambda\end{bmatrix}\succeq 0.

Thus, considering f,g,hHf,g,h\in H, our objective is to prove the nonnegativity of the inner product

ip:=\displaystyle{\rm ip}:=\, [Id|IdΔτΛ|ΔτΛIdΛΔτ|cP𝒱|0ΛΔτ|0|dΛΛ][fgh],[fgh]\displaystyle\left\langle\begin{bmatrix}\mathrm{Id}&|&\mathrm{Id}-\Delta_{\tau}\Lambda&|&\Delta_{\tau}\Lambda\\ \hline\cr\mathrm{Id}-\Lambda^{*}\Delta_{\tau}^{*}&|&cP_{\mathcal{V}^{\perp}}&|&0\\ \hline\cr\Lambda^{*}\Delta_{\tau}^{*}&|&0&|&d\Lambda^{*}\Lambda\end{bmatrix}\begin{bmatrix}f\\ \hline\cr g\\ \hline\cr h\end{bmatrix},\begin{bmatrix}f\\ \hline\cr g\\ \hline\cr h\end{bmatrix}\right\rangle
=\displaystyle=\, f,f+cP𝒱g,g+dΛΛh,h+2(IdΛΔτ)f,g+2ΛΔτf,h.\displaystyle\langle f,f\rangle+c\langle P_{\mathcal{V}^{\perp}}g,g\rangle+d\langle\Lambda^{*}\Lambda h,h\rangle+2\langle(\mathrm{Id}-\Lambda^{*}\Delta_{\tau}^{*})f,g\rangle+2\langle\Lambda^{*}\Delta_{\tau}^{*}f,h\rangle.

Let us decompose ff, gg, and hh as f=f+f′′f=f^{\prime}+f^{\prime\prime}, g=g+g′′g=g^{\prime}+g^{\prime\prime}, and h=h+h′′h=h^{\prime}+h^{\prime\prime}, where ff^{\prime}, gg^{\prime}, and hh^{\prime} belong to the space HH^{\prime} spanned by eigenvectors of cP𝒱+dΛΛcP_{\mathcal{V}^{\perp}}+d\Lambda^{*}\Lambda corresponding to eigenvalues λc+d\lambda\not=c+d and where f′′f^{\prime\prime}, g′′g^{\prime\prime}, and h′′h^{\prime\prime} belong to the eigenspace H′′H^{\prime\prime} of cP𝒱+dΛΛcP_{\mathcal{V}^{\perp}}+d\Lambda^{*}\Lambda corresponding to the eigenvalue λ=c+d\lambda=c+d, i.e., H′′=𝒱ran(ΛΛ)H^{\prime\prime}=\mathcal{V}^{\perp}\cap{\rm ran}(\Lambda^{*}\Lambda). We take notice of the fact that the spaces HH^{\prime} and H′′H^{\prime\prime} are orthogonal. With this decomposition, the above inner product becomes

ip=ip+ip′′+ip′′′,{\rm ip}={\rm ip}^{\prime}+{\rm ip}^{\prime\prime}+{\rm ip}^{\prime\prime\prime},

where we have set

\displaystyle{\rm ip}^{\prime} =\langle f^{\prime},f^{\prime}\rangle+c\langle P_{\mathcal{V}^{\perp}}g^{\prime},g^{\prime}\rangle+d\langle\Lambda^{*}\Lambda h^{\prime},h^{\prime}\rangle+2\langle(\mathrm{Id}-\Lambda^{*}\Delta_{\tau}^{*})f^{\prime},g^{\prime}\rangle+2\langle\Lambda^{*}\Delta_{\tau}^{*}f^{\prime},h^{\prime}\rangle,
ip′′\displaystyle{\rm ip}^{\prime\prime} =f′′,f′′+cP𝒱g′′,g′′+dΛΛh′′,h′′+2(IdΛΔτ)f′′,g′′+2ΛΔτf′′,h′′,\displaystyle=\langle f^{\prime\prime},f^{\prime\prime}\rangle+c\langle P_{\mathcal{V}^{\perp}}g^{\prime\prime},g^{\prime\prime}\rangle+d\langle\Lambda^{*}\Lambda h^{\prime\prime},h^{\prime\prime}\rangle+2\langle(\mathrm{Id}-\Lambda^{*}\Delta_{\tau}^{*})f^{\prime\prime},g^{\prime\prime}\rangle+2\langle\Lambda^{*}\Delta_{\tau}^{*}f^{\prime\prime},h^{\prime\prime}\rangle,
ip′′′\displaystyle{\rm ip}^{\prime\prime\prime} =2f,f′′+2P𝒱g,g′′+2ΛΛh,h′′+2(IdΛΔτ)f,g′′+2ΛΔτf,h′′\displaystyle=2\langle f^{\prime},f^{\prime\prime}\rangle+2\langle P_{\mathcal{V}^{\perp}}g^{\prime},g^{\prime\prime}\rangle+2\langle\Lambda^{*}\Lambda h^{\prime},h^{\prime\prime}\rangle+2\langle(\mathrm{Id}-\Lambda^{*}\Delta_{\tau}^{*})f^{\prime},g^{\prime\prime}\rangle+2\langle\Lambda^{*}\Delta_{\tau}^{*}f^{\prime},h^{\prime\prime}\rangle
+2(IdΛΔτ)f′′,g+2ΛΔτf′′,h.\displaystyle\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\,+2\langle(\mathrm{Id}-\Lambda^{*}\Delta_{\tau}^{*})f^{\prime\prime},g^{\prime}\rangle+2\langle\Lambda^{*}\Delta_{\tau}^{*}f^{\prime\prime},h^{\prime}\rangle.

We first remark that the terms in ip′′′{\rm ip}^{\prime\prime\prime} are all zero: first, it is clear that f,f′′=0\langle f^{\prime},f^{\prime\prime}\rangle=0; then, one has P𝒱g,g′′=g,P𝒱g′′=g,g′′=0\langle P_{\mathcal{V}^{\perp}}g^{\prime},g^{\prime\prime}\rangle=\langle g^{\prime},P_{\mathcal{V}^{\perp}}g^{\prime\prime}\rangle=\langle g^{\prime},g^{\prime\prime}\rangle=0 and ΛΛh,h′′=0\langle\Lambda^{*}\Lambda h^{\prime},h^{\prime\prime}\rangle=0 is obtained similarly; next, Lemma 15 ensures that (IdΛΔτ)f′′,g=(1τ)f′′,g=0\langle(\mathrm{Id}-\Lambda^{*}\Delta_{\tau}^{*})f^{\prime\prime},g^{\prime}\rangle=(1-\tau)\langle f^{\prime\prime},g^{\prime}\rangle=0 and ΛΔτf′′,h=0\langle\Lambda^{*}\Delta_{\tau}^{*}f^{\prime\prime},h^{\prime}\rangle=0 is obtained similarly; last, writing f=ifif^{\prime}=\sum_{i}f_{i} where the fiHf_{i}\in H^{\prime} are orthogonal eigenvectors of cP𝒱+dΛΛcP_{\mathcal{V}^{\perp}}+d\Lambda^{*}\Lambda corresponding to eigenvalues λi<c+d\lambda_{i}<c+d, we derive from Lemma 15 that

(IdΛΔτ)f,g′′=icλiP𝒱fi,g′′=icλifi,P𝒱g′′=icλifi,g′′=0,\langle(\mathrm{Id}-\Lambda^{*}\Delta_{\tau}^{*})f^{\prime},g^{\prime\prime}\rangle=\sum_{i}\frac{c}{\lambda_{i}}\langle P_{\mathcal{V}^{\perp}}f_{i},g^{\prime\prime}\rangle=\sum_{i}\frac{c}{\lambda_{i}}\langle f_{i},P_{\mathcal{V}^{\perp}}g^{\prime\prime}\rangle=\sum_{i}\frac{c}{\lambda_{i}}\langle f_{i},g^{\prime\prime}\rangle=0,

and ΛΔτf,h′′=0\langle\Lambda^{*}\Delta_{\tau}^{*}f^{\prime},h^{\prime\prime}\rangle=0 is obtained similarly. As a result, we have ip′′′=0{\rm ip}^{\prime\prime\prime}=0.

We now turn to the quantity ip{\rm ip}^{\prime}. Exploiting Lemma 15 again, we write

\displaystyle{\rm ip}^{\prime} =\langle f^{\prime},f^{\prime}\rangle+c\langle P_{\mathcal{V}^{\perp}}g^{\prime},g^{\prime}\rangle+d\langle\Lambda^{*}\Lambda h^{\prime},h^{\prime}\rangle+2\bigg{\langle}\sum_{i}\frac{c}{\lambda_{i}}P_{\mathcal{V}^{\perp}}f_{i},g^{\prime}\bigg{\rangle}+2\bigg{\langle}\sum_{i}\frac{d}{\lambda_{i}}\Lambda^{*}\Lambda f_{i},h^{\prime}\bigg{\rangle}
=f,f+c(P𝒱g,P𝒱g+2i1λiP𝒱fi,P𝒱g)\displaystyle=\langle f^{\prime},f^{\prime}\rangle+c\Bigg{(}\langle P_{\mathcal{V}^{\perp}}g^{\prime},P_{\mathcal{V}^{\perp}}g^{\prime}\rangle+2\bigg{\langle}\sum_{i}\frac{1}{\lambda_{i}}P_{\mathcal{V}^{\perp}}f_{i},P_{\mathcal{V}^{\perp}}g^{\prime}\bigg{\rangle}\Bigg{)}
+d(ΛΛh,ΛΛh+2i1λiΛΛfi,ΛΛh)\displaystyle\phantom{=\langle f^{\prime},f^{\prime}\rangle}\;+d\Bigg{(}\langle\Lambda^{*}\Lambda h^{\prime},\Lambda^{*}\Lambda h^{\prime}\rangle+2\bigg{\langle}\sum_{i}\frac{1}{\lambda_{i}}\Lambda^{*}\Lambda f_{i},\Lambda^{*}\Lambda h^{\prime}\bigg{\rangle}\Bigg{)}
=f,f+c(P𝒱g+i1λiP𝒱fi2i1λiP𝒱fi2)\displaystyle=\langle f^{\prime},f^{\prime}\rangle+c\Bigg{(}\bigg{\|}P_{\mathcal{V}^{\perp}}g^{\prime}+\sum_{i}\frac{1}{\lambda_{i}}P_{\mathcal{V}^{\perp}}f_{i}\bigg{\|}^{2}-\bigg{\|}\sum_{i}\frac{1}{\lambda_{i}}P_{\mathcal{V}^{\perp}}f_{i}\bigg{\|}^{2}\Bigg{)}
+d(ΛΛh+i1λiΛΛfi2i1λiΛΛfi2).\displaystyle\phantom{=\langle f^{\prime},f^{\prime}\rangle}\;+d\Bigg{(}\bigg{\|}\Lambda^{*}\Lambda h^{\prime}+\sum_{i}\frac{1}{\lambda_{i}}\Lambda^{*}\Lambda f_{i}\bigg{\|}^{2}-\bigg{\|}\sum_{i}\frac{1}{\lambda_{i}}\Lambda^{*}\Lambda f_{i}\bigg{\|}^{2}\Bigg{)}.

At this point, we can bound ip{\rm ip}^{\prime} from below as

ip\displaystyle{\rm ip}^{\prime} f,f(cP𝒱(i1λifi)2+dΛΛ(i1λifi)2)\displaystyle\geq\langle f^{\prime},f^{\prime}\rangle-\Bigg{(}c\bigg{\|}P_{\mathcal{V}^{\perp}}\Big{(}\sum_{i}\frac{1}{\lambda_{i}}f_{i}\Big{)}\bigg{\|}^{2}+d\bigg{\|}\Lambda^{*}\Lambda\Big{(}\sum_{i}\frac{1}{\lambda_{i}}f_{i}\Big{)}\bigg{\|}^{2}\Bigg{)}
=f,f(cP𝒱+dΛΛ)(i1λifi),(i1λifi)\displaystyle=\langle f^{\prime},f^{\prime}\rangle-\bigg{\langle}\Big{(}cP_{\mathcal{V}^{\perp}}+d\Lambda^{*}\Lambda\Big{)}\Big{(}\sum_{i}\frac{1}{\lambda_{i}}f_{i}\Big{)},\Big{(}\sum_{i}\frac{1}{\lambda_{i}}f_{i}\Big{)}\bigg{\rangle}
=ifi2ifi,i1λifi=ifi2(11λi).\displaystyle=\sum_{i}\|f_{i}\|^{2}-\bigg{\langle}\sum_{i}f_{i},\sum_{i}\frac{1}{\lambda_{i}}f_{i}\bigg{\rangle}=\sum_{i}\|f_{i}\|^{2}\bigg{(}1-\frac{1}{\lambda_{i}}\bigg{)}.

This shows that ip0{\rm ip}^{\prime}\geq 0 since the condition cP𝒱+dΛΛIdcP_{\mathcal{V}^{\perp}}+d\Lambda^{*}\Lambda\succeq\mathrm{Id} ensures that λi1\lambda_{i}\geq 1 for every ii.

Finally, Lemma 15 also helps us to bound the quantity ip′′{\rm ip}^{\prime\prime} from below according to

ip′′\displaystyle{\rm ip}^{\prime\prime} =f′′2+cg′′2+dh′′2+2(1τ)f′′,g′′+2τf′′,h′′\displaystyle=\|f^{\prime\prime}\|^{2}+c\|g^{\prime\prime}\|^{2}+d\|h^{\prime\prime}\|^{2}+2(1-\tau)\langle f^{\prime\prime},g^{\prime\prime}\rangle+2\tau\langle f^{\prime\prime},h^{\prime\prime}\rangle
=(1τ)(f′′2+2f′′,g′′)+τ(f′′2+2f′′,h′′)+cg′′2+dh′′2\displaystyle=(1-\tau)\big{(}\|f^{\prime\prime}\|^{2}+2\langle f^{\prime\prime},g^{\prime\prime}\rangle\big{)}+\tau\big{(}\|f^{\prime\prime}\|^{2}+2\langle f^{\prime\prime},h^{\prime\prime}\rangle\big{)}+c\|g^{\prime\prime}\|^{2}+d\|h^{\prime\prime}\|^{2}
(1τ)g′′2τh′′2+cg′′2+dh′′2.\displaystyle\geq-(1-\tau)\|g^{\prime\prime}\|^{2}-\tau\|h^{\prime\prime}\|^{2}+c\|g^{\prime\prime}\|^{2}+d\|h^{\prime\prime}\|^{2}.

This allows us to obtain ip′′0{\rm ip}^{\prime\prime}\geq 0 since the condition cP𝒱+dΛΛIdcP_{\mathcal{V}^{\perp}}+d\Lambda^{*}\Lambda\succeq\mathrm{Id} ensures that c1c\geq 1 and d1d\geq 1. Altogether, we have shown that ip=ip+ip′′+ip′′′0{\rm ip}={\rm ip}^{\prime}+{\rm ip}^{\prime\prime}+{\rm ip}^{\prime\prime\prime}\geq 0, which concludes the proof. ∎

Remark.

The value of the minimal global worst-case error can, in general, be computed by solving the semidefinite program (38) characterizing the lower bound lb{\rm lb}. In the case where ΛΛ=Idm\Lambda\Lambda^{*}=\mathrm{Id}_{\mathbb{R}^{m}}, it can also be computed without resorting to semidefinite programming. Precisely, if τ\tau_{\sharp} denotes the (unique) τ\tau between 1/21/2 and ε/(ε+η)\varepsilon/(\varepsilon+\eta) such that

(45) λmin((1τ)P𝒱+τΛΛ)=(1τ)2ε2τ2η2(1τ)ε2τη2\lambda_{\min}((1-\tau)P_{\mathcal{V}^{\perp}}+\tau\Lambda^{*}\Lambda)=\frac{(1-\tau)^{2}\varepsilon^{2}-\tau^{2}\eta^{2}}{(1-\tau)\varepsilon^{2}-\tau\eta^{2}}

and if λ\lambda_{\sharp} denotes λmin((1τ)P𝒱+τΛΛ)\lambda_{\min}((1-\tau_{\sharp})P_{\mathcal{V}^{\perp}}+\tau_{\sharp}\Lambda^{*}\Lambda), then we claim that, for any τ[0,1]\tau\in[0,1],

gwce(Δτ)2=1τλε2+τλη2.{\rm gwce}(\Delta_{\tau})^{2}=\frac{1-\tau_{\sharp}}{\lambda_{\sharp}}\varepsilon^{2}+\frac{\tau_{\sharp}}{\lambda_{\sharp}}\eta^{2}.

Indeed, since we now know that the global worst-case error gwce(Δτ){\rm gwce}(\Delta_{\tau}) equals its lower bound lb{\rm lb} independently of τ[0,1]\tau\in[0,1] and since c:=(1τ)/λc_{\sharp}:=(1-\tau_{\sharp})/\lambda_{\sharp} and d:=τ/λd_{\sharp}:=\tau_{\sharp}/\lambda_{\sharp} are feasible for the semidefinite program (38) characterizing lb{\rm lb}, we obtain

(46) gwce(Δτ)21τλε2+τλη2.{\rm gwce}(\Delta_{\tau})^{2}\leq\frac{1-\tau_{\sharp}}{\lambda_{\sharp}}\varepsilon^{2}+\frac{\tau_{\sharp}}{\lambda_{\sharp}}\eta^{2}.

Moreover, going back to the proof of Theorem 8, we recognize that the choice of τ\tau_{\sharp} here corresponds to the instance y=0y=0 there. This instance comes with ff_{\sharp} being equal to zero and with hh_{\sharp} being equal to a properly normalized eigenvector of (1τ)P𝒱+τΛΛ(1-\tau_{\sharp})P_{\mathcal{V}^{\perp}}+\tau_{\sharp}\Lambda^{*}\Lambda corresponding to the eigenvalue λ\lambda_{\sharp}. The identities (32) now read P𝒱h2=ε2\|P_{\mathcal{V}^{\perp}}h_{\sharp}\|^{2}=\varepsilon^{2} and ΛΛh2=η2\|\Lambda^{*}\Lambda h_{\sharp}\|^{2}=\eta^{2}, i.e., Λh2=η2\|\Lambda h_{\sharp}\|^{2}=\eta^{2}. Setting f=hf=h_{\sharp} and e=Λhe=-\Lambda h_{\sharp}, which satisfy P𝒱f=ε\|P_{\mathcal{V}^{\perp}}f\|=\varepsilon and e=η\|e\|=\eta, the very definition of the global worst-case error yields

(47) gwce(Δτ)2\displaystyle{\rm gwce}(\Delta_{\tau})^{2} fΔτ(Λf+e)2=h2\displaystyle\geq\|f-\Delta_{\tau}(\Lambda f+e)\|^{2}=\|h_{\sharp}\|^{2}
=1λ((1τ)P𝒱+τΛΛ)h,h\displaystyle=\frac{1}{\lambda_{\sharp}}\big{\langle}\big{(}(1-\tau_{\sharp})P_{\mathcal{V}^{\perp}}+\tau_{\sharp}\Lambda^{*}\Lambda\big{)}h_{\sharp},h_{\sharp}\big{\rangle}
=1τλP𝒱h2+τλΛΛh2\displaystyle=\frac{1-\tau_{\sharp}}{\lambda_{\sharp}}\|P_{\mathcal{V}^{\perp}}h_{\sharp}\|^{2}+\frac{\tau_{\sharp}}{\lambda_{\sharp}}\|\Lambda^{*}\Lambda h_{\sharp}\|^{2}
=1τλε2+τλη2.\displaystyle=\frac{1-\tau_{\sharp}}{\lambda_{\sharp}}\varepsilon^{2}+\frac{\tau_{\sharp}}{\lambda_{\sharp}}\eta^{2}.

Together, the inequalities (46) and (47) justify our claim about the value of the global worst-case error. In passing, it is worth noticing that the above argument reveals that f=hf=h_{\sharp} and e=Λhe=-\Lambda h_{\sharp} are extremal in the defining expression for the global worst-case error of the regularization map Δτ\Delta_{\tau} independently of the parameter τ[0,1]\tau\in[0,1].
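Numerically, the parameter \tau_{\sharp} can be located by a simple bisection on the interval with endpoints 1/2 and \varepsilon/(\varepsilon+\eta), since the difference between the two sides of (45) changes sign there. The sketch below (our names and conventions, with H identified with \mathbb{R}^{N}) returns the corresponding value of the minimal global worst-case error.

import numpy as np

# Minimal sketch (assumes Lambda Lambda^* = Id): bisection for tau_sharp solving (45),
# followed by the evaluation of the minimal global worst-case error.
def minimal_gwce(P, Lam, eps, eta, tol=1e-12):
    S = Lam.T @ Lam
    def theta(t):          # lambda_min minus the right-hand side of (45)
        lam_min = np.linalg.eigvalsh((1 - t) * P + t * S)[0]
        rhs = ((1 - t)**2 * eps**2 - t**2 * eta**2) / ((1 - t) * eps**2 - t * eta**2)
        return lam_min - rhs
    a, b = sorted([0.5, eps / (eps + eta)])
    while b - a > tol:     # theta takes values of opposite signs at a and b
        mid = (a + b) / 2
        if theta(a) * theta(mid) <= 0:
            b = mid
        else:
            a = mid
    tau = (a + b) / 2
    lam = np.linalg.eigvalsh((1 - tau) * P + tau * S)[0]
    return np.sqrt(((1 - tau) * eps**2 + tau * eta**2) / lam)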

References

  • Beck and Eldar [2007] A. Beck and Y. C. Eldar. Regularization in regression with bounded noise: a Chebyshev center approach. SIAM Journal on Matrix Analysis and Applications, 29(2):606–625, 2007.
  • Binev et al. [2017] P. Binev, A. Cohen, W. Dahmen, R. DeVore, G. Petrova, and P. Wojtaszczyk. Data assimilation in reduced modeling. SIAM/ASA Journal on Uncertainty Quantification, 5(1):1–29, 2017.
  • Boyd and Vandenberghe [2004] S. P. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
  • Chen and Haykin [2002] Z. Chen and S. Haykin. On different facets of regularization theory. Neural Computation, 14(12):2791–2846, 2002.
  • Cohen et al. [2020] A. Cohen, W. Dahmen, O. Mula, and J. Nichols. Nonlinear reduced models for state and parameter estimation. arXiv preprint arXiv:2009.02687, 2020.
  • DeVore et al. [2017] R. DeVore, G. Petrova, and P. Wojtaszczyk. Data assimilation and sampling in Banach spaces. Calcolo, 54(3):963–1007, 2017.
  • Diamond and Boyd [2016] S. Diamond and S. Boyd. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83):1–5, 2016.
  • Ettehad and Foucart [2021] M. Ettehad and S. Foucart. Instances of computational optimal recovery: dealing with observation errors. SIAM/ASA Journal on Uncertainty Quantification, 9(4):1438–1456, 2021.
  • Foucart [To appear] S. Foucart. Mathematical Pictures at a Data Science Exhibition. Cambridge University Press, To appear.
  • Foucart et al. [2020] S. Foucart, C. Liao, S. Shahrampour, and Y. Wang. Learning from non-random data in Hilbert spaces: an optimal recovery perspective. arXiv preprint arXiv:2006.03706, 2020.
  • Garkavi [1962] A. L. Garkavi. On the optimal net and best cross-section of a set in a normed space. Izvestiya Rossiiskoi Akademii Nauk. Seriya Matematicheskaya, 26(1):87–106, 1962.
  • Grant and Boyd [2014] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx, March 2014.
  • Hastie et al. [2009] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, second edition, 2009.
  • Maday et al. [2015] Y. Maday, A. T. Patera, J. D. Penn, and M. Yano. A parameterized-background data-weak approach to variational data assimilation: formulation, analysis, and application to acoustics. International Journal for Numerical Methods in Engineering, 102(5):933–965, 2015.
  • Melkman and Micchelli [1979] A. A. Melkman and C. A. Micchelli. Optimal estimation of linear operators in Hilbert spaces from inaccurate data. SIAM Journal on Numerical Analysis, 16(1):87–105, 1979.
  • Micchelli [1993] C. A. Micchelli. Optimal estimation of linear operators from inaccurate data: a second look. Numerical Algorithms, 5(8):375–390, 1993.
  • Micchelli and Rivlin [1977] C. A. Micchelli and T. J. Rivlin. A survey of optimal recovery. In Optimal Estimation in Approximation Theory, pages 1–54. Springer, 1977.
  • Novak and Woźniakowski [2008] E. Novak and H. Woźniakowski. Tractability of Multivariate Problems: Linear Information. European Mathematical Society, 2008.
  • Plaskota [1996] L. Plaskota. Noisy Information and Computational Complexity. Cambridge University Press, 1996.
  • Pólik and Terlaky [2007] I. Pólik and T. Terlaky. A survey of the S-lemma. SIAM Review, 49(3):371–418, 2007.
  • Polyak [1998] B. T. Polyak. Convexity of quadratic transformations and its use in control and optimization. Journal of Optimization Theory and Applications, 99(3):553–583, 1998.
  • Schölkopf and Smola [2002] B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.

Appendix

This additional section collects justifications for a few facts that were mentioned but not explained in the main text. These facts are: the uniqueness of a Chebyshev center for the model- and data-consistent set, the efficient computation of the solution to (7) when \Lambda\Lambda^{*}=\mathrm{Id}_{\mathbb{R}^{m}}, the form of the Newton method used for solving equation (28), and the reason why the constraint of (39) always implies the constraint of (38).

Uniqueness of the Chebyshev center.

Let f1^,f2^\widehat{f_{1}},\widehat{f_{2}} be two Chebyshev centers, i.e., minimizers of max{fg:P𝒱gε,Λgyη}\max\{\|f-g\|:\|P_{\mathcal{V}^{\perp}}g\|\leq\varepsilon,\|\Lambda g-y\|\leq\eta\} and let μ\mu be the value of the minimum. Consider g¯H\overline{g}\in H such that (f1^+f2^)/2g¯=max{(f1^+f2^)/2g:P𝒱gε,Λgyη}\|(\widehat{f_{1}}+\widehat{f_{2}})/2-\overline{g}\|=\max\{\|(\widehat{f_{1}}+\widehat{f_{2}})/2-g\|:\|P_{\mathcal{V}^{\perp}}g\|\leq\varepsilon,\|\Lambda g-y\|\leq\eta\}. Then

μ\displaystyle\mu (f1^+f2^)/2g¯12f1^g¯+12f2^g¯\displaystyle\leq\|(\widehat{f_{1}}+\widehat{f_{2}})/2-\overline{g}\|\leq\frac{1}{2}\|\widehat{f_{1}}-\overline{g}\|+\frac{1}{2}\|\widehat{f_{2}}-\overline{g}\|
12max{f1^g:P𝒱gε,Λgyη}+12max{f2^g:P𝒱gε,Λgyη}\displaystyle\leq\frac{1}{2}\max\{\|\widehat{f_{1}}-g\|:\|P_{\mathcal{V}^{\perp}}g\|\leq\varepsilon,\|\Lambda g-y\|\leq\eta\}+\frac{1}{2}\max\{\|\widehat{f_{2}}-g\|:\|P_{\mathcal{V}^{\perp}}g\|\leq\varepsilon,\|\Lambda g-y\|\leq\eta\}
=12μ+12μ=μ.\displaystyle=\frac{1}{2}\mu+\frac{1}{2}\mu=\mu.

Thus, equality must hold all the way through. In particular, \|\widehat{f_{1}}-\overline{g}\|=\|\widehat{f_{2}}-\overline{g}\|=\mu and equality occurs in the triangle inequality, so strict convexity of the Hilbert norm implies that \widehat{f_{1}}-\overline{g}=\widehat{f_{2}}-\overline{g}, i.e., that \widehat{f_{1}}=\widehat{f_{2}}, as expected.

Computation of the regularized solution.

Let (v1,,vn)(v_{1},\ldots,v_{n}) be a basis for 𝒱\mathcal{V} and let u1,,umu_{1},\ldots,u_{m} denote the Riesz representers of the observation functionals λ1,,λm\lambda_{1},\ldots,\lambda_{m}, which form an orthonormal basis for ran(Λ){\rm ran}(\Lambda^{*}) under the assumption that ΛΛ=Idm\Lambda\Lambda^{*}=\mathrm{Id}_{\mathbb{R}^{m}}. With Cm×nC\in\mathbb{R}^{m\times n} representing the cross-gramian with entries ui,vj=λi(vj)\langle u_{i},v_{j}\rangle=\lambda_{i}(v_{j}), the solution to the regularization program (7) is given, even when HH is infinite dimensional, by

fτ=τi=1maiui+j=1nbjvj,f_{\tau}=\tau\sum_{i=1}^{m}a_{i}u_{i}+\sum_{j=1}^{n}b_{j}v_{j},

where the coefficient vectors ama\in\mathbb{R}^{m} and bnb\in\mathbb{R}^{n} are computed according to

b=(CC)1Cyanda=yCb.b=\big{(}C^{\top}C\big{)}^{-1}C^{\top}y\qquad\mbox{and}\qquad a=y-Cb.

This is fairly easy to see for τ=0\tau=0 and it has been established in [Foucart, Liao, Shahrampour, and Wang, 2020, Theorem 2] for τ=1\tau=1, so the general result follows from Proposition 3. Alternatively, it can be obtained by replicating the steps from the proof of the case τ=1\tau=1 with minor changes.
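For concreteness, the two linear-algebra steps just described can be carried out as follows (a minimal sketch with names of our choosing; C is the m x n cross-gramian and y the data vector).

import numpy as np

# Minimal sketch: coefficients of f_tau on the bases (u_1,...,u_m) and (v_1,...,v_n),
# under the assumption Lambda Lambda^* = Id.
def regularization_coefficients(C, y, tau):
    b = np.linalg.solve(C.T @ C, C.T @ y)    # b = (C^T C)^{-1} C^T y
    a = y - C @ b                            # a = y - C b
    return tau * a, b                        # f_tau = sum_i (tau a_i) u_i + sum_j b_j v_j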

Newton method.

Equation (28) takes the form F(τ)=0F(\tau)=0, where

F(τ)=λmin((1τ)R+τS)(1τ)2ε2τ2η2(1τ)ε2τη2+(1τ)τ(12τ)δ2.F(\tau)=\lambda_{\min}((1-\tau)R+\tau S)-\frac{(1-\tau)^{2}\varepsilon^{2}-\tau^{2}\eta^{2}}{(1-\tau)\varepsilon^{2}-\tau\eta^{2}+(1-\tau)\tau(1-2\tau)\delta^{2}}.

The Newton method produces a sequence (\tau_{k})_{k\geq 0} converging to a solution using the recursion

(48) τk+1=τkF(τk)F(τk),k0.\tau_{k+1}=\tau_{k}-\frac{F(\tau_{k})}{F^{\prime}(\tau_{k})},\qquad k\geq 0.

In order to apply this method, we need the ability to compute the derivative of F with respect to \tau. Setting \lambda_{\min}=\lambda_{\min}((1-\tau)R+\tau S), this essentially reduces to the computation of d\lambda_{\min}/d\tau, which is performed via the argument below. Note that the argument is not rigorous, as we take for granted the differentiability of the eigenvalue \lambda_{\min} and of a normalized eigenvector h associated with it. However, nothing prevents us from applying the scheme (48) using the expression for d\lambda_{\min}/d\tau given in (49) below and agreeing that a solution has been found if the output \tau_{K} satisfies |F(\tau_{K})|<\iota for some prescribed tolerance \iota>0. Now, the argument starts from the identities

((1τ)R+τS)h=λminhandh,h=1,((1-\tau)R+\tau S)h=\lambda_{\min}h\qquad\mbox{and}\qquad\langle h,h\rangle=1,

which we differentiate to obtain

(SR)h+((1τ)R+τS)dhdτ=dλmindτh+λmindhdτand2h,dhdτ=0.(S-R)h+((1-\tau)R+\tau S)\frac{dh}{d\tau}=\frac{d\lambda_{\min}}{d\tau}h+\lambda_{\min}\frac{dh}{d\tau}\qquad\mbox{and}\qquad 2\Big{\langle}h,\frac{dh}{d\tau}\Big{\rangle}=0.

By taking the inner product with hh in the first identity and using the second identity, we derive

(SR)h,h=dλmindτ,i.e., dλmindτ=Sh2Rh2.\langle(S-R)h,h\rangle=\frac{d\lambda_{\min}}{d\tau},\qquad\mbox{i.e., }\qquad\frac{d\lambda_{\min}}{d\tau}=\|Sh\|^{2}-\|Rh\|^{2}.

According to Lemma 9, this expression can be transformed, after some work, into

(49) dλmindτ=12ττ(1τ)λmin(1λmin)12λmin.\frac{d\lambda_{\min}}{d\tau}=\frac{1-2\tau}{\tau(1-\tau)}\,\frac{\lambda_{\min}(1-\lambda_{\min})}{1-2\lambda_{\min}}.
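Putting these observations together, the scheme (48) can be implemented in a few lines. The sketch below (our names, offered under the stated assumptions rather than as a definitive implementation) evaluates F through the smallest eigenvalue of (1-\tau)R+\tau S, takes d\lambda_{\min}/d\tau from (49), and differentiates the rational part of F by the quotient rule; it should be started from an initial guess \tau_{0} between 1/2 and \varepsilon/(\varepsilon+\eta).

import numpy as np

# Minimal sketch of the Newton iteration (48) for equation (28); R and S are
# symmetric matrices representing the operators of the same names.
def newton_parameter(R, S, eps, eta, delta, tau0, tol=1e-10, max_iter=100):
    def lam_min(t):
        return np.linalg.eigvalsh((1 - t) * R + t * S)[0]
    def num_den(t):        # numerator and denominator of the fraction appearing in F
        num = (1 - t)**2 * eps**2 - t**2 * eta**2
        den = (1 - t) * eps**2 - t * eta**2 + (1 - t) * t * (1 - 2 * t) * delta**2
        return num, den
    def F(t):
        num, den = num_den(t)
        return lam_min(t) - num / den
    def Fprime(t):
        lam = lam_min(t)
        dlam = (1 - 2 * t) / (t * (1 - t)) * lam * (1 - lam) / (1 - 2 * lam)   # formula (49)
        num, den = num_den(t)
        dnum = -2 * (1 - t) * eps**2 - 2 * t * eta**2
        dden = -eps**2 - eta**2 + (1 - 6 * t + 6 * t**2) * delta**2
        return dlam - (dnum * den - num * dden) / den**2   # quotient rule for the fraction
    tau = tau0
    for _ in range(max_iter):
        if abs(F(tau)) < tol:
            break
        tau = tau - F(tau) / Fprime(tau)
    return tau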
Relation between semidefinite constraints.

Suppose that the constraint of (39) holds for a regularization map Δτ\Delta_{\tau}. In view of the expressions

Δτ=((1τ)P𝒱+τΛΛ)1(τΛ)andIdΔτΛ=((1τ)P𝒱+τΛΛ)1((1τ)P𝒱),\Delta_{\tau}=\big{(}(1-\tau)P_{\mathcal{V}^{\perp}}+\tau\Lambda^{*}\Lambda\big{)}^{-1}(\tau\Lambda^{*})\quad\mbox{and}\quad\mathrm{Id}-\Delta_{\tau}\Lambda=\big{(}(1-\tau)P_{\mathcal{V}^{\perp}}+\tau\Lambda^{*}\Lambda\big{)}^{-1}((1-\tau)P_{\mathcal{V}^{\perp}}),

this constraint also reads

[cP𝒱|00|dIdm][(1τ)P𝒱τΛ]((1τ)P𝒱+τΛΛ)2[(1τ)P𝒱|τΛ].\begin{bmatrix}cP_{\mathcal{V}^{\perp}}&|&0\\ \hline\cr 0&|&d\,\mathrm{Id}_{\mathbb{R}^{m}}\end{bmatrix}\succeq\begin{bmatrix}(1-\tau)P_{\mathcal{V}^{\perp}}\\ \hline\cr\tau\Lambda\end{bmatrix}\big{(}(1-\tau)P_{\mathcal{V}^{\perp}}+\tau\Lambda^{*}\Lambda\big{)}^{-2}\begin{bmatrix}(1-\tau)P_{\mathcal{V}^{\perp}}\;|\;\tau\Lambda^{*}\end{bmatrix}.

Multiplying on the left by [P𝒱|Λ]\begin{bmatrix}P_{\mathcal{V}^{\perp}}\;|\;\Lambda^{*}\end{bmatrix} and on the right by [P𝒱Λ]\begin{bmatrix}P_{\mathcal{V}^{\perp}}\\ \hline\cr\Lambda\end{bmatrix} yields

cP𝒱+dΛΛ((1τ)P𝒱+τΛΛ)((1τ)P𝒱+τΛΛ)2((1τ)P𝒱+τΛΛ)=Id.cP_{\mathcal{V}^{\perp}}+d\Lambda^{*}\Lambda\succeq((1-\tau)P_{\mathcal{V}^{\perp}}+\tau\Lambda^{*}\Lambda)\big{(}(1-\tau)P_{\mathcal{V}^{\perp}}+\tau\Lambda^{*}\Lambda\big{)}^{-2}((1-\tau)P_{\mathcal{V}^{\perp}}+\tau\Lambda^{*}\Lambda)=\mathrm{Id}.

This is the constraint of (38).