
Radius of Information for Two Intersected Centered Hyperellipsoids and Implications in Optimal Recovery from Inaccurate Data

Simon Foucart (Texas A&M University, [email protected]) and Chunyang Liao (University of California, Los Angeles, [email protected])
S. F. is supported by grants from the NSF (DMS-2053172) and from the ONR (N00014-20-1-2787).
Abstract

For objects belonging to a known model set and observed through a prescribed linear process, we aim at determining methods to recover linear quantities of these objects that are optimal from a worst-case perspective. Working in a Hilbert setting, we show that, if the model set is the intersection of two hyperellipsoids centered at the origin, then there is an optimal recovery method which is linear. It is specifically given by a constrained regularization procedure whose parameters, short of being explicit, can be precomputed by solving a semidefinite program. This general framework can be swiftly applied to several scenarios: the two-space problem, the problem of recovery from $\ell_2$-inaccurate data, and the problem of recovery from a mixture of accurate and $\ell_2$-inaccurate data. With more effort, it can also be applied to the problem of recovery from $\ell_1$-inaccurate data. For the latter, we reach the conclusion of existence of an optimal recovery method which is linear, again given by constrained regularization, under a computationally verifiable sufficient condition. Experimentally, this condition seems to hold whenever the level of $\ell_1$-inaccuracy is small enough. We also point out that, independently of the inaccuracy level, the minimal worst-case error of a linear recovery method can be found by semidefinite programming.

Key words and phrases: Optimal Recovery, Regularization, Semidefinite Programming, S-procedure.

AMS classification: 41A65, 46N40, 90C22, 90C47.

 

1 Introduction

The question “Do linear problems have linear optimal algorithms?” was surveyed by Packel [1988]. He gave the commonly accepted answer “usually but not always”. This question, central to the subject of Optimal Recovery, is also one of the main concerns of the present article. We shall start by recalling the meaning of this cryptic question and by introducing our notation, already employed in [Foucart, 2022], which is inspired by the field of Learning Theory. The concepts differ by names only from familiar concepts traditionally encountered in the field of Information-Based Complexity (IBC), see e.g. [Novak and Woźniakowski, 2008]. We try to draw parallels between the terminologies of these fields below.

Common to the two notational settings is the use of the letter $f$ for the objects of interest, since in both cases they are primarily thought of as functions, although they could be seen as arbitrary elements from a prescribed normed space. Whereas we use the notation $F$ for this normed space (and $H$ when it is a Hilbert space), $F$ typically stands for a strict subset of the said normed space in IBC. We too assume that our objects of interest live in a strict subset of $F$, but it is denoted by $\mathcal{K}$ and called model set. The premise that $f\in\mathcal{K}$ is referred to as a priori information, since it reflects some prior scientific knowledge about realistic objects of interest. In addition, we have at our disposal some a posteriori information in the form $y_i=\lambda_i(f)$, $i=1,\ldots,m$, for some linear functionals $\lambda_1,\ldots,\lambda_m\in F^*$. Oftentimes, these linear functionals are point evaluations, giving rise, in IBC parlance, to the standard information $y_1=f(x^{(1)}),\ldots,y_m=f(x^{(m)})$. We call $y\in\mathbb{R}^m$ the observation vector and notice that it can be written as $y=\Lambda f$ for some linear map $\Lambda:F\to\mathbb{R}^m$, referred to as observation map. From the available information, both a priori and a posteriori, the task is to recover (approximate, learn, ...) the object $f$ in full or maybe just to estimate a quantity of interest $Q(f)$, where $Q:F\to Z$ is a linear map from $F$ into another normed space $Z$. Such a map $Q$ is called the solution operator in IBC. Our task is realized by way of a recovery map $\Delta:\mathbb{R}^m\to Z$; we refrain from using the IBC term algorithm, since computational feasibility is not a requirement at this point. The performance of this recovery map is assessed by the (global) worst-case error defined as

\[{\rm Err}_{Q,\mathcal{K}}(\Lambda,\Delta):=\sup_{f\in\mathcal{K}}\|Q(f)-\Delta(\Lambda f)\|_{Z}.\]

We are interested in how small the latter can be, in other words in the intrinsic error—often labeled radius of information in IBC—defined as

\[{\rm Err}^{*}_{Q,\mathcal{K}}(\Lambda):=\inf_{\Delta:\mathbb{R}^m\to Z}{\rm Err}_{Q,\mathcal{K}}(\Lambda,\Delta).\]

Moreover, our quest is concerned with optimal recovery maps, i.e., recovery maps $\Delta^{\rm opt}:\mathbb{R}^m\to Z$ that achieve the above infimum. With the terminology settled, the initial question may now be phrased as: "among all the possible optimal recovery maps, is there one which is linear?". It is well known that the answer is affirmative in two prototypical situations: (i) when the quantity of interest $Q$ is a linear functional and the model set $\mathcal{K}$ is symmetric and convex and (ii) when $F$ is a Hilbert space and the model set is a centered hyperellipsoid. Another situation allowing for linear optimal recovery maps involves $F=C(\mathcal{X})$, although the existence arguments rarely turn into practical constructions, except in a handful of cases such as [Foucart, 2023].

One contribution of the present article is to uncover yet another situation where optimality of linear recovery maps occurs, precisely when the model set is the intersection of two centered hyperellipsoids. We do actually construct the linear optimal recovery map: it is given by constrained regularization with parameters that are clearly determined. In fact, we determine the corresponding radius of information simultaneously: it is the optimal value of a semidefinite program. The main theoretical tool is Polyak's S-procedure, which elucidates exactly when a quadratic inequality (with no linear terms) is a consequence of two quadratic inequalities (with no linear terms). This S-procedure is in general not valid with more quadratic inequalities, explaining why our result pertains to the intersection of two centered hyperellipsoids only. This foremost result is established in Section 2 and some implications are derived in subsequent sections. Specifically, Section 3 lists a few simple consequences. One of them is the solution, in the so-called global Optimal Recovery setting, of the two-space problem, where the model set is based on approximability capabilities of two linear subspaces. The other consequences concern two situations for which the observations are inaccurate: first, when inaccuracies are bounded in $\ell_2$, we retrieve, and extend to infinite dimensions, some earlier results of ours; second, when some observations are exact while an $\ell_2$-bound is available for the inaccurate ones, we uncover an optimal recovery map built by constrained regularization, hence linear. Section 4 presents a more intricate consequence of Section 2, namely the scenario of inaccurate observations bounded in $\ell_1$. There, the result is not as pleasing as hoped for, but it nonetheless reveals, somewhat surprisingly, that linear recovery maps can be optimal in this scenario, too. The caveat is that the result holds conditionally on a certain sufficient condition. This condition is close to tautological, but it has the advantage of being computationally verifiable. Our numerical experiments (outlined in the reproducible files accompanying this article) indicate that the condition is likely to hold in case of small $\ell_1$-inaccuracies.

2 Solution for the two-hyperellipsoid-intersection model set

From now on, the space $F$ where the objects of interest live will be a Hilbert space (of possibly infinite dimension), hence it shall be designated by $H$. There are other Hilbert spaces involved as ranges of linear maps, such as the quantity of interest $Q$. We will use the notation $\|\cdot\|$ and $\langle\cdot,\cdot\rangle$ indistinctly for all the associated Hilbert norms and inner products. Thus, the model set considered in this section, an intersection of two centered hyperellipsoids, takes the form

(1) \[\mathcal{K}=\{f\in H:\|Rf\|\leq 1\mbox{ and }\|Sf\|\leq 1\}\]

for some Hilbert-valued bounded linear maps $R$ and $S$ defined on $H$. We assume throughout that

(2) \[\ker(R)\cap\ker(S)\cap\ker(\Lambda)=\{0\},\]

for otherwise the worst-case error of any recovery map $\Delta$, i.e.,

(3) \[{\rm Err}_{Q,\mathcal{K}}(\Lambda,\Delta):=\sup_{\substack{\|Rf\|\leq 1\\ \|Sf\|\leq 1}}\|Qf-\Delta(\Lambda f)\|\]

would be infinite for $Q=\mathrm{Id}$, say. We also assume that $\Lambda:H\to\mathbb{R}^m$ is surjective, for otherwise some observations would be redundant. This allows us to define the pseudo-inverse of $\Lambda$ as

\[\Lambda^{\dagger}=\Lambda^*(\Lambda\Lambda^*)^{-1}:\mathbb{R}^m\to H.\]

2.1 Main result

The result stated below not only provides the value of the radius of information, i.e., of the minimal worst-case error over all recovery maps, but it also identifies an optimal recovery map. The latter involves the constrained regularization maps parametrized by $a,b\geq 0$ and defined as

(4) \[\Delta_{a,b}:y\in\mathbb{R}^m\mapsto\Big[\underset{f\in H}{{\rm argmin}\,}\;a\|Rf\|^2+b\|Sf\|^2\quad\mbox{s.to }\Lambda f=y\Big]\in H,\qquad a,b>0,\]
(5) \[\Delta_{a,0}:y\in\mathbb{R}^m\mapsto\Big[\underset{f\in H}{{\rm argmin}\,}\;\|Sf\|^2\quad\mbox{s.to }\Lambda f=y\mbox{ and }Rf=0\Big]\in\ker(R),\qquad a>0,\]
(6) \[\Delta_{0,b}:y\in\mathbb{R}^m\mapsto\Big[\underset{f\in H}{{\rm argmin}\,}\;\|Rf\|^2\quad\mbox{s.to }\Lambda f=y\mbox{ and }Sf=0\Big]\in\ker(S),\qquad b>0.\]

Although not obvious at first sight, the maps $\Delta_{a,b}$ are linear. For instance, when $a,b>0$, they indeed take the form, with $\mathcal{N}:=\ker\Lambda$ denoting the null space of $\Lambda$ and $R_{\mathcal{N}},S_{\mathcal{N}}$ standing for the restrictions of $R,S$ to $\mathcal{N}$,

(7) \[\Delta_{a,b}=\Lambda^{\dagger}-\big[aR_{\mathcal{N}}^*R_{\mathcal{N}}+bS_{\mathcal{N}}^*S_{\mathcal{N}}\big]^{-1}\big(aR_{\mathcal{N}}^*R+bS_{\mathcal{N}}^*S\big)\Lambda^{\dagger},\]

where the invertibility of $aR_{\mathcal{N}}^*R_{\mathcal{N}}+bS_{\mathcal{N}}^*S_{\mathcal{N}}$ follows from (2). (In the infinite-dimensional setting, this assumption should in fact be strengthened to the existence of $\delta>0$ such that $\max\{\|Rh\|,\|Sh\|\}\geq\delta\|h\|$ for all $h\in\ker\Lambda$, see e.g. [Rudin, 1991, Theorem 12.12].) It is also worth pointing out that

(8) \[\mathrm{Id}-\Delta_{a,b}\Lambda=\big[aR_{\mathcal{N}}^*R_{\mathcal{N}}+bS_{\mathcal{N}}^*S_{\mathcal{N}}\big]^{-1}\big(aR_{\mathcal{N}}^*R+bS_{\mathcal{N}}^*S\big).\]

The justification of both (7) and (8) can be found in the appendix. There, we also establish the convergence of $\Delta_{a,b}(y)$ to $\Delta_{a,0}(y)$ as $b\to 0$ and of $\Delta_{a,b}(y)$ to $\Delta_{0,b}(y)$ as $a\to 0$, the convergence being understood in the weak sense when $\dim(H)=\infty$.
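To make formula (7) concrete, here is a minimal finite-dimensional sketch (a NumPy illustration under the assumption $H=\mathbb{R}^N$, with hypothetical arrays Lam, R, S standing for $\Lambda$, $R$, $S$); it is not the reproducible code accompanying the article.

```python
# A hedged sketch of the constrained regularization map Delta_{a,b} via formula (7).
import numpy as np

def null_space_basis(A, tol=1e-10):
    """Orthonormal basis (as columns) of ker(A), computed via the SVD."""
    _, s, Vt = np.linalg.svd(A)
    return Vt[int((s > tol).sum()):].T

def delta_ab(y, Lam, R, S, a, b):
    """Evaluate Delta_{a,b}(y) through formula (7), for a, b > 0."""
    Lam_pinv = Lam.T @ np.linalg.inv(Lam @ Lam.T)   # Lambda^dagger = Lambda^*(Lambda Lambda^*)^{-1}
    N0 = null_space_basis(Lam)                       # coordinates for the null space N = ker(Lambda)
    RN, SN = R @ N0, S @ N0                          # restrictions R_N, S_N in these coordinates
    M = a * RN.T @ RN + b * SN.T @ SN                # a R_N^* R_N + b S_N^* S_N, invertible by (2)
    f0 = Lam_pinv @ y                                # particular solution of Lambda f = y
    x = np.linalg.solve(M, N0.T @ (a * R.T @ R + b * S.T @ S) @ f0)
    return f0 - N0 @ x                               # formula (7); note that Lam @ (f0 - N0 @ x) = y
```

As a quick sanity check, the output always interpolates the observations, since the correction subtracted from $\Lambda^{\dagger}y$ lies in $\ker\Lambda$.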

Theorem 1.

For the two-hyperellipsoid-intersection model set (1), the square of the radius of information of the observation map $\Lambda:H\to\mathbb{R}^m$ for the estimation of $Q$ is given by the optimal value of the program

(9) \[\underset{a,b\geq 0}{\rm minimize}\;a+b\qquad\mbox{s.to }\quad a\|Rh\|^2+b\|Sh\|^2\geq\|Qh\|^2\quad\mbox{for all }h\in\ker\Lambda.\]

Further, if $a_\sharp,b_\sharp\geq 0$ are minimizers of this program, then $Q\circ\Delta_{a_\sharp,b_\sharp}$ is an optimal recovery map. In short,

(10) \[{\rm Err}_{Q,\mathcal{K}}(\Lambda,Q\circ\Delta_{a_\sharp,b_\sharp})^2=\inf_{\Delta:\mathbb{R}^m\to Z}{\rm Err}_{Q,\mathcal{K}}(\Lambda,\Delta)^2=a_\sharp+b_\sharp.\]

The proof of this result is postponed for a short while. Before that, we address the question of whether the optimization program (9) can be solved in practice. The answer is yes, at least when $H$ is finite-dimensional. Indeed, if $(h_1,\ldots,h_n)$ denotes a basis for $\mathcal{N}=\ker\Lambda$, representing $h\in\mathcal{N}$ as $h=\sum_{i=1}^n x_ih_i$ for $x\in\mathbb{R}^n$ allows us to reformulate the constraint $a\|Rh\|^2+b\|Sh\|^2\geq\|Qh\|^2$ for all $h\in\mathcal{N}$ as $a\langle{\sf R'}x,x\rangle+b\langle{\sf S'}x,x\rangle\geq\langle{\sf Q'}x,x\rangle$ for all $x\in\mathbb{R}^n$, where ${\sf R'},{\sf S'},{\sf Q'}\in\mathbb{R}^{n\times n}$ are symmetric matrices with entries

(11) \[{\sf R'}_{i,j}=\langle R(h_i),R(h_j)\rangle,\qquad{\sf S'}_{i,j}=\langle S(h_i),S(h_j)\rangle,\qquad{\sf Q'}_{i,j}=\langle Q(h_i),Q(h_j)\rangle.\]

Thus, the program (9) is equivalent to the semidefinite program

\[\underset{a,b\geq 0}{\rm minimize}\;a+b\qquad\mbox{s.to }\quad a{\sf R'}+b{\sf S'}\succeq{\sf Q'}.\]

Such a semidefinite program can be solved efficiently via a variety of solvers, e.g. the ones embedded in the MATLAB-based modeling system CVX [Grant and Boyd, 2014], although they (currently) all struggle when $n$ is in the thousands.
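For illustration, here is a minimal sketch of this semidefinite program in Python with CVXPY (an assumed toolchain; the article's own experiments rely on CVX and are not reproduced here), taking as inputs the symmetric matrices ${\sf R'},{\sf S'},{\sf Q'}$ of (11), named Rp, Sp, Qp below for concreteness.

```python
# A hedged sketch, not the authors' code: solve the semidefinite reformulation of (9).
import cvxpy as cp

def squared_radius(Rp, Sp, Qp):
    a, b = cp.Variable(nonneg=True), cp.Variable(nonneg=True)
    constraints = [a * Rp + b * Sp - Qp >> 0]       # a R' + b S' - Q' positive semidefinite
    prob = cp.Problem(cp.Minimize(a + b), constraints)
    prob.solve()
    return prob.value, a.value, b.value             # squared radius and the parameters a_sharp, b_sharp
```

The returned parameters can then be fed to the constrained regularization sketch above to form the optimal recovery map $Q\circ\Delta_{a_\sharp,b_\sharp}$.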

2.2 Justification

The proof of Theorem 1 is broken down into three small results which we find useful to isolate as separate lemmas. The first lemma estimates the radius of information from below and the second lemma is a key step for the third lemma, which estimates the radius of information from above using constrained regularization maps. Here is the first lemma.

Lemma 2.

The squared worst-case error of any recovery map $\Delta$ satisfies, with $\mathcal{N}:=\ker\Lambda$,

(12) \[{\rm Err}_{Q,\mathcal{K}}(\Lambda,\Delta)^2\geq{\rm LB}:=\sup_{h\in\mathcal{K}\cap\mathcal{N}}\|Qh\|^2\]

and this lower bound ${\rm LB}$ can be reformulated as

\[{\rm LB}=\inf_{a,b\geq 0}a+b\qquad\mbox{s.to}\quad a\|Rh\|^2+b\|Sh\|^2\geq\|Qh\|^2\quad\mbox{for all }h\in\mathcal{N}.\]
Proof.

We include the argument for the first part, even though it is very classical. It starts by considering any $h\in\mathcal{K}\cap\mathcal{N}$ and by noticing that both $+h$ and $-h$ belong to $\mathcal{K}\cap\mathcal{N}$ before observing that

\begin{align*}
{\rm Err}_{Q,\mathcal{K}}(\Lambda,\Delta)^2 &=\sup_{f\in\mathcal{K}}\|Qf-\Delta(\Lambda f)\|^2\geq\max_{\pm}\|Q(\pm h)-\Delta(0)\|^2\\
&\geq\frac{1}{2}\|Qh-\Delta(0)\|^2+\frac{1}{2}\|-Qh-\Delta(0)\|^2=\|Qh\|^2+\|\Delta(0)\|^2\\
&\geq\|Qh\|^2.
\end{align*}

Finally, it finishes by taking the supremum over $h\in\mathcal{K}\cap\mathcal{N}$ to derive that ${\rm Err}_{Q,\mathcal{K}}(\Lambda,\Delta)^2\geq{\rm LB}$.

The argument for the second part begins by reformulating the lower bound as

\[{\rm LB}=\inf_{\gamma}\gamma\quad\mbox{s.to }\|Qh\|^2\leq\gamma\mbox{ whenever }h\in\mathcal{N}\mbox{ satisfies }\|Rh\|^2\leq 1\mbox{ and }\|Sh\|^2\leq 1.\]

By the version of Polyak's S-procedure recalled in the appendix and its extension to the infinite-dimensional case, the latter constraint is equivalent to the following. (To verify the applicability of the S-procedure, note that $h=0$ satisfies the strict feasibility condition $h\in\mathcal{N}$, $\|Rh\|^2<1$, $\|Sh\|^2<1$, and that any $a,b>0$ satisfy the positive definiteness condition $aR_{\mathcal{N}}^*R_{\mathcal{N}}+bS_{\mathcal{N}}^*S_{\mathcal{N}}\succ 0$.)

\[\mbox{there exist }a,b\geq 0\mbox{ such that }\|Qh\|^2-\gamma\leq a\big(\|Rh\|^2-1\big)+b\big(\|Sh\|^2-1\big)\mbox{ for all }h\in\mathcal{N}.\]

The latter decouples as

\[\mbox{there exist }a,b\geq 0\mbox{ such that }\gamma\geq a+b\;\mbox{ and }\;a\|Rh\|^2+b\|Sh\|^2\geq\|Qh\|^2\mbox{ for all }h\in\mathcal{N}.\]

Therefore, the lower bound takes the form

\[{\rm LB}=\inf_{\gamma,\;a,b\geq 0}\gamma\quad\mbox{s.to }\gamma\geq a+b\;\mbox{ and }\;a\|Rh\|^2+b\|Sh\|^2\geq\|Qh\|^2\mbox{ for all }h\in\mathcal{N}.\]

Since the minimal value that $\gamma$ can achieve under these constraints is $a+b$, this infimum indeed reduces to the form of the lower bound announced in the statement of the lemma. ∎

The second lemma is reminiscent of a result already obtained (with $n=2$) in [Foucart and Liao, 2022, Lemma 13] for $Q=\mathrm{Id}$ and in [Foucart, Liao, and Veldt, 2023, Lemma 3] for an arbitrary linear quantity of interest $Q$, but the new proof presented here is more transparent, as it avoids arguments involving semidefinite matrices. As such, it is valid in infinite-dimensional Hilbert spaces, too.

Lemma 3.

Let $\mathcal{N}$ be a linear subspace of $H$ and let $R_1,\ldots,R_n$ be Hilbert-valued linear maps defined on $H$. Suppose that $c_1,\ldots,c_n>0$ satisfy

(13) \[\|Qh\|^2\leq\sum_{i=1}^n c_i\|R_ih\|^2\quad\mbox{ for all }h\in\mathcal{N}.\]

Then, setting $T=\sum_{i=1}^n c_iR_{i,\mathcal{N}}^*R_i\colon H\to\mathcal{N}$ and assuming that $T_{\mathcal{N}}=\sum_{i=1}^n c_iR_{i,\mathcal{N}}^*R_{i,\mathcal{N}}\colon\mathcal{N}\to\mathcal{N}$ is invertible, one has

\[\Big\|Q\,T_{\mathcal{N}}^{-1}\Big(\sum_{i=1}^n c_iR_{i,\mathcal{N}}^*R_if_i\Big)\Big\|^2\leq\sum_{i=1}^n c_i\big\|R_if_i\big\|^2\quad\mbox{ for all }f_1,\ldots,f_n\in H.\]
Proof.

To ease notation, let $h:=\sum_{i=1}^n c_iR_{i,\mathcal{N}}^*R_if_i$. Note that $h$ belongs to $\mathcal{N}$, and so does $T_{\mathcal{N}}^{-1}h$. In view of (13), it is enough to prove that

(14) \[\sum_{i=1}^n c_i\|R_iT_{\mathcal{N}}^{-1}h\|^2\leq\sum_{i=1}^n c_i\big\|R_if_i\big\|^2.\]

The left-hand side of (14), which we denote by ${\rm LHS}$ for short, is manipulated as follows:

\begin{align*}
{\rm LHS} &=\sum_{i=1}^n c_i\|R_{i,\mathcal{N}}T_{\mathcal{N}}^{-1}h\|^2=\sum_{i=1}^n c_i\Big\langle R_{i,\mathcal{N}}^*R_{i,\mathcal{N}}T_{\mathcal{N}}^{-1}h,T_{\mathcal{N}}^{-1}h\Big\rangle=\Big\langle\sum_{i=1}^n c_iR_{i,\mathcal{N}}^*R_{i,\mathcal{N}}T_{\mathcal{N}}^{-1}h,T_{\mathcal{N}}^{-1}h\Big\rangle\\
&=\Big\langle T_{\mathcal{N}}T_{\mathcal{N}}^{-1}h,T_{\mathcal{N}}^{-1}h\Big\rangle=\Big\langle h,T_{\mathcal{N}}^{-1}h\Big\rangle=\Big\langle\sum_{i=1}^n c_iR_{i,\mathcal{N}}^*R_if_i,T_{\mathcal{N}}^{-1}h\Big\rangle=\sum_{i=1}^n c_i\Big\langle R_{i,\mathcal{N}}^*R_if_i,T_{\mathcal{N}}^{-1}h\Big\rangle\\
&=\sum_{i=1}^n c_i\Big\langle R_if_i,R_iT_{\mathcal{N}}^{-1}h\Big\rangle.
\end{align*}

From the general inequality $\langle u,v\rangle\leq(\|u\|^2+\|v\|^2)/2$, we derive that

\[{\rm LHS}\leq\sum_{i=1}^n\frac{c_i}{2}\left(\big\|R_if_i\big\|^2+\big\|R_iT_{\mathcal{N}}^{-1}h\big\|^2\right)=\frac{1}{2}\sum_{i=1}^n c_i\big\|R_if_i\big\|^2+\frac{1}{2}{\rm LHS},\]

which is just a rearrangement of the desired inequality (14). ∎

The third and final lemma gives an upper bound for the squared worst-case error of the constrained regularization map $\Delta_{a,b}$.

Lemma 4.

Suppose that $a,b>0$ satisfy

\[\|Qh\|^2\leq a\|Rh\|^2+b\|Sh\|^2\quad\mbox{ for all }h\in\ker\Lambda.\]

Then one has

\[{\rm Err}_{Q,\mathcal{K}}(\Lambda,Q\circ\Delta_{a,b})^2\leq a+b.\]
Proof.

The squared worst-case error of the recovery map $Q\circ\Delta_{a,b}$ is

\begin{align*}
{\rm Err}_{Q,\mathcal{K}}(\Lambda,Q\circ\Delta_{a,b})^2 &=\sup_{f\in\mathcal{K}}\|Qf-Q\circ\Delta_{a,b}(\Lambda f)\|^2=\sup_{\substack{\|Rf\|\leq 1\\ \|Sf\|\leq 1}}\|Q(\mathrm{Id}-\Delta_{a,b}\Lambda)f\|^2\\
&=\sup_{\substack{\|Rf\|\leq 1\\ \|Sf\|\leq 1}}\Big\|Q\big[aR_{\mathcal{N}}^*R_{\mathcal{N}}+bS_{\mathcal{N}}^*S_{\mathcal{N}}\big]^{-1}\big(aR_{\mathcal{N}}^*Rf+bS_{\mathcal{N}}^*Sf\big)\Big\|^2,
\end{align*}

where we have made use of (8) with $\mathcal{N}=\ker\Lambda$. Then, invoking Lemma 3 for $n=2$, $R_1=R$, $R_2=S$, and $f_1=f_2=f$, we obtain

\[{\rm Err}_{Q,\mathcal{K}}(\Lambda,Q\circ\Delta_{a,b})^2\leq\sup_{\substack{\|Rf\|\leq 1\\ \|Sf\|\leq 1}}\big(a\|Rf\|^2+b\|Sf\|^2\big)\leq a+b,\]

which is the announced result. ∎

With this series of lemmas at hand, we are now ready to justify the main result of this section.

Proof of Theorem 1.

Let $a_\sharp,b_\sharp\geq 0$ be minimizers of the optimization program (9). On the one hand, Lemma 2 guarantees that

(15) \[\inf_{\Delta:\mathbb{R}^m\to Z}{\rm Err}_{Q,\mathcal{K}}(\Lambda,\Delta)^2\geq a_\sharp+b_\sharp.\]

On the other hand, by the feasibility of $a_\sharp$ and $b_\sharp$, we have $\|Qh\|^2\leq a_\sharp\|Rh\|^2+b_\sharp\|Sh\|^2$ for all $h\in\ker\Lambda$. To deal with the possibility of $a_\sharp$ or $b_\sharp$ being zero, we consider, for any $\varepsilon>0$, $a_\sharp^{(\varepsilon)}:=a_\sharp+\varepsilon>0$ and $b_\sharp^{(\varepsilon)}:=b_\sharp+\varepsilon>0$ and notice that $\|Qh\|^2\leq a_\sharp^{(\varepsilon)}\|Rh\|^2+b_\sharp^{(\varepsilon)}\|Sh\|^2$ for all $h\in\ker\Lambda$. Lemma 4 then guarantees that ${\rm Err}_{Q,\mathcal{K}}(\Lambda,Q\circ\Delta_{a_\sharp^{(\varepsilon)},b_\sharp^{(\varepsilon)}})^2\leq a_\sharp^{(\varepsilon)}+b_\sharp^{(\varepsilon)}$. It is now easy to see that taking (possibly weak) limits as $\varepsilon\to 0$ yields

(16) \[{\rm Err}_{Q,\mathcal{K}}(\Lambda,Q\circ\Delta_{a_\sharp,b_\sharp})^2\leq a_\sharp+b_\sharp.\]

The inequalities (15) and (16) together fully justify (10) and thus complete the proof. ∎

2.3 Side results

In this section, we put forward an interpretation of the radius of information that differs from the minimal value of the program (9) and we shed light on the extremizer appearing in the expression of the lower bound from (12), which is now known to coincide with the squared radius of information. Although these results are not used later, we include them here because they appear interesting for their own sake. Both results call upon the largest eigenvalue, denoted by $\lambda_{\max}$, of self-adjoint operators.

Proposition 5.

For the two-hyperellipsoid-intersection model set (1), the square of the radius of information of the observation map $\Lambda:H\to\mathbb{R}^m$ for the estimation of $Q$ is also given as the optimal value $\lambda_\sharp$ of the program

\[\underset{\tau\in[0,1]}{\rm minimize}\;\lambda_{\max}\big(Q_{\mathcal{N}}[(1-\tau)R_{\mathcal{N}}^*R_{\mathcal{N}}+\tau S_{\mathcal{N}}^*S_{\mathcal{N}}]^{-1}Q_{\mathcal{N}}^*\big),\]

where $\mathcal{N}:=\ker\Lambda$. Moreover, if $\tau_\sharp\in(0,1)$ represents a minimizer of the above program, then $a_\sharp:=(1-\tau_\sharp)\lambda_\sharp$ and $b_\sharp:=\tau_\sharp\lambda_\sharp$ are minimizers of (9).

Proof.

The foremost observation consists in reformulating the constraint in (9) as

(17) \[\lambda_{\max}\big(Q_{\mathcal{N}}[aR_{\mathcal{N}}^*R_{\mathcal{N}}+bS_{\mathcal{N}}^*S_{\mathcal{N}}]^{-1}Q_{\mathcal{N}}^*\big)\leq 1.\]

Indeed, the said constraint can be equivalently expressed in the form

\begin{align*}
aR_{\mathcal{N}}^*R_{\mathcal{N}}+bS_{\mathcal{N}}^*S_{\mathcal{N}}\succeq Q_{\mathcal{N}}^*Q_{\mathcal{N}} &\iff\mathrm{Id}\succeq[aR_{\mathcal{N}}^*R_{\mathcal{N}}+bS_{\mathcal{N}}^*S_{\mathcal{N}}]^{-1/2}Q_{\mathcal{N}}^*Q_{\mathcal{N}}[aR_{\mathcal{N}}^*R_{\mathcal{N}}+bS_{\mathcal{N}}^*S_{\mathcal{N}}]^{-1/2}\\
&\iff 1\geq\lambda_{\max}\big([aR_{\mathcal{N}}^*R_{\mathcal{N}}+bS_{\mathcal{N}}^*S_{\mathcal{N}}]^{-1/2}Q_{\mathcal{N}}^*Q_{\mathcal{N}}[aR_{\mathcal{N}}^*R_{\mathcal{N}}+bS_{\mathcal{N}}^*S_{\mathcal{N}}]^{-1/2}\big)\\
&\iff 1\geq\lambda_{\max}\big(Q_{\mathcal{N}}[aR_{\mathcal{N}}^*R_{\mathcal{N}}+bS_{\mathcal{N}}^*S_{\mathcal{N}}]^{-1}Q_{\mathcal{N}}^*\big).
\end{align*}

Thus, the above-defined $a_\sharp=(1-\tau_\sharp)\lambda_\sharp$ and $b_\sharp=\tau_\sharp\lambda_\sharp$ are feasible for (9), since then

(18) \[\lambda_{\max}\big(Q_{\mathcal{N}}[a_\sharp R_{\mathcal{N}}^*R_{\mathcal{N}}+b_\sharp S_{\mathcal{N}}^*S_{\mathcal{N}}]^{-1}Q_{\mathcal{N}}^*\big)=\frac{1}{\lambda_\sharp}\lambda_{\max}\big(Q_{\mathcal{N}}[(1-\tau_\sharp)R_{\mathcal{N}}^*R_{\mathcal{N}}+\tau_\sharp S_{\mathcal{N}}^*S_{\mathcal{N}}]^{-1}Q_{\mathcal{N}}^*\big)=1.\]

It now remains to show that $a+b\geq a_\sharp+b_\sharp$ whenever $a,b>0$ are feasible for (9). To see this, with $\tau=b/(a+b)$ and $1-\tau=a/(a+b)$, notice that

\begin{align*}
\lambda_\sharp &\leq\lambda_{\max}\big(Q_{\mathcal{N}}[(1-\tau)R_{\mathcal{N}}^*R_{\mathcal{N}}+\tau S_{\mathcal{N}}^*S_{\mathcal{N}}]^{-1}Q_{\mathcal{N}}^*\big)=(a+b)\,\lambda_{\max}\big(Q_{\mathcal{N}}[aR_{\mathcal{N}}^*R_{\mathcal{N}}+bS_{\mathcal{N}}^*S_{\mathcal{N}}]^{-1}Q_{\mathcal{N}}^*\big)\\
&\leq(a+b),
\end{align*}

where (17) was used for the last inequality. The desired conclusion follows from $\lambda_\sharp=a_\sharp+b_\sharp$. ∎

Proposition 6.

Under the setting of Proposition 5, recall that

(19) \[\sup_{h\in\mathcal{N}}\big\{\|Qh\|^2:\;\|Rh\|^2\leq 1,\|Sh\|^2\leq 1\big\}=\inf_{a,b\geq 0}\big\{a+b:\;aR_{\mathcal{N}}^*R_{\mathcal{N}}+bS_{\mathcal{N}}^*S_{\mathcal{N}}\succeq Q_{\mathcal{N}}^*Q_{\mathcal{N}}\big\}.\]

If $h_\sharp\in\mathcal{N}$ and $a_\sharp,b_\sharp>0$ are extremizers of the above two programs, then

(i) $\|Rh_\sharp\|=1$ and $\|Sh_\sharp\|=1$;

(ii) $\big(a_\sharp R_{\mathcal{N}}^*R_{\mathcal{N}}+b_\sharp S_{\mathcal{N}}^*S_{\mathcal{N}}\big)h_\sharp=Q_{\mathcal{N}}^*Q_{\mathcal{N}}h_\sharp$.

Proof.

Setting $T_{\mathcal{N}}:=a_\sharp R_{\mathcal{N}}^*R_{\mathcal{N}}+b_\sharp S_{\mathcal{N}}^*S_{\mathcal{N}}$, we already know from (18) that $\lambda_{\max}\big(Q_{\mathcal{N}}T_{\mathcal{N}}^{-1}Q_{\mathcal{N}}^*\big)=1$. In view of $\|Rh_\sharp\|\leq 1$, $\|Sh_\sharp\|\leq 1$, $\|Qh_\sharp\|^2=a_\sharp+b_\sharp$, and writing $g_\sharp:=T_{\mathcal{N}}^{1/2}h_\sharp$, we observe that

\begin{align*}
\|g_\sharp\|^2 &=\langle T_{\mathcal{N}}h_\sharp,h_\sharp\rangle=\langle(a_\sharp R_{\mathcal{N}}^*R_{\mathcal{N}}+b_\sharp S_{\mathcal{N}}^*S_{\mathcal{N}})h_\sharp,h_\sharp\rangle=a_\sharp\|Rh_\sharp\|^2+b_\sharp\|Sh_\sharp\|^2\\
&\underset{(1)}{\leq}a_\sharp+b_\sharp=\|Qh_\sharp\|^2=\|Q_{\mathcal{N}}T_{\mathcal{N}}^{-1/2}g_\sharp\|^2=\langle(T_{\mathcal{N}}^{-1/2}Q_{\mathcal{N}}^*Q_{\mathcal{N}}T_{\mathcal{N}}^{-1/2})g_\sharp,g_\sharp\rangle\\
&\underset{(2)}{\leq}\lambda_{\max}\big(T_{\mathcal{N}}^{-1/2}Q_{\mathcal{N}}^*Q_{\mathcal{N}}T_{\mathcal{N}}^{-1/2}\big)\|g_\sharp\|^2=\lambda_{\max}\big(Q_{\mathcal{N}}T_{\mathcal{N}}^{-1}Q_{\mathcal{N}}^*\big)\|g_\sharp\|^2=\|g_\sharp\|^2.
\end{align*}

Since the left-hand and right-hand sides are identical, equality must hold throughout. In particular, equality in (1) implies (i). As for equality in (2), it imposes that $g_\sharp$ is an eigenvector associated with the eigenvalue $\lambda_{\max}\big(T_{\mathcal{N}}^{-1/2}Q_{\mathcal{N}}^*Q_{\mathcal{N}}T_{\mathcal{N}}^{-1/2}\big)=1$, meaning that $(T_{\mathcal{N}}^{-1/2}Q_{\mathcal{N}}^*Q_{\mathcal{N}}T_{\mathcal{N}}^{-1/2})g_\sharp=g_\sharp$, i.e., $Q_{\mathcal{N}}^*Q_{\mathcal{N}}h_\sharp=T_{\mathcal{N}}h_\sharp$, which is (ii). ∎

Remark.

The result of Proposition 6 is, in a sense, a characterization of the equality between the supremum and the infimum in (19). Indeed, the argument can easily be turned around: given minimizers $a_\sharp,b_\sharp>0$, if we can find $h_\sharp\in\mathcal{N}$ satisfying (i) and (ii), then the supremum equals the infimum (it is always "at most" by the trivial part of the S-procedure and it is "at least" thanks to the existence of $h_\sharp$). Such an approach was used, in essence, to determine explicit solutions of specific differential-equation-inspired Optimal Recovery problems featuring two quadratic constraints but without invoking Polyak's S-procedure, see [Magaril-Il'yaev, Osipenko, and Tikhomirov, 2004, Vvedenskaya, 2009]. The same circle of ideas extends to the intersection of $n>2$ hyperellipsoids. Indeed, leaving the details to the reader, we state the loose equivalence between the equality

\[\sup_{h\in\mathcal{N}}\big\{\|Qh\|^2:\;\|R_ih\|^2\leq 1,\,i=1,\ldots,n\big\}=\inf_{c_1,\ldots,c_n\geq 0}\Big\{\sum_{i=1}^n c_i:\;\sum_{i=1}^n c_iR_{i,\mathcal{N}}^*R_{i,\mathcal{N}}\succeq Q_{\mathcal{N}}^*Q_{\mathcal{N}}\Big\}\]

and, with $c_1^\sharp,\ldots,c_n^\sharp>0$ denoting minimizers of the latter, the existence of $h_\sharp\in\mathcal{N}$ such that

\[\|R_ih_\sharp\|=1,\quad i=1,\ldots,n,\qquad\mbox{and}\qquad\Big(\sum_{i=1}^n c_i^\sharp R_{i,\mathcal{N}}^*R_{i,\mathcal{N}}\Big)h_\sharp=Q_{\mathcal{N}}^*Q_{\mathcal{N}}h_\sharp.\]

This gives us a practical way of deciding whether Theorem 1 extends to $n>2$ hyperellipsoids: after solving a semidefinite program, construct a candidate $h_\sharp$ by solving an eigenvalue problem and test if the $\|R_ih_\sharp\|$ are all equal. As observed numerically, this occurs in some situations, but certainly not in all, in particular not when the $R_i$ are orthogonal projectors (as in the multispace problem described below). The moral is that this article deals with the intersection of $n=2$ hyperellipsoids not only because the strategy based on Polyak's S-procedure does not apply to $n>2$, but also because the natural extension is not valid for $n>2$.
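A hedged sketch of this practical test (not the authors' code; the function and variable names are illustrative): Rg is a list of the Gram matrices ${\sf R'}_i$ and Qg the Gram matrix ${\sf Q'}$, all computed on a basis of $\mathcal{N}=\ker\Lambda$ as in (11). The sketch assumes that the optimal combination $\sum_i c_i^\sharp{\sf R'}_i$ is positive definite.

```python
# Solve the n-ellipsoid SDP, build a candidate h_sharp from an eigenvalue problem,
# and test whether the quantities ||R_i h_sharp||^2 are all equal.
import numpy as np
import cvxpy as cp

def test_n_ellipsoid_extension(Rg, Qg, tol=1e-6):
    c = cp.Variable(len(Rg), nonneg=True)
    T = sum(c[i] * Rg[i] for i in range(len(Rg)))
    prob = cp.Problem(cp.Minimize(cp.sum(c)), [T - Qg >> 0])
    prob.solve()
    Tval = sum(c.value[i] * Rg[i] for i in range(len(Rg)))
    w, V = np.linalg.eigh(Tval)                          # Tval assumed positive definite
    T_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    lam, W = np.linalg.eigh(T_inv_sqrt @ Qg @ T_inv_sqrt)
    h = T_inv_sqrt @ W[:, -1]                            # candidate h_sharp, up to normalization
    norms = np.array([h @ Rgi @ h for Rgi in Rg])        # the values ||R_i h_sharp||^2
    equal = np.all(np.abs(norms - norms.mean()) < tol * max(norms.mean(), 1.0))
    return bool(equal), c.value, norms
```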

3 Three easy consequences

The two-hyperellipsoid-intersection framework has direct implications for optimal recovery from (partially) inaccurate data, to be discussed later, and even more directly for optimal recovery from accurate data under the two-space approximability model, to be elucidated right now.

3.1 The two-space problem

A model set based on approximability by a linear subspace $V$ of $F$ with parameter $\varepsilon>0$, namely

\[\mathcal{K}=\{f\in F:{\rm dist}(f,V)\leq\varepsilon\},\]

gained traction after the work of Binev, Cohen, Dahmen, DeVore, Petrova, and Wojtaszczyk [2017]. When $F$ is a Hilbert space, these authors completely solved the full recovery ($Q=\mathrm{Id}$) problem even in the local setting; the present article deals solely with the global setting. They also raised the question of the multispace problem, a particular case of which is the two-space problem, where

(20) \[\mathcal{K}=\{f\in H:{\rm dist}(f,V)\leq\varepsilon\mbox{ and }{\rm dist}(f,W)\leq\eta\}.\]

For the multispace problem, they proposed two iterative algorithms which, in the limit, produce model- and data-consistent objects. As such, these algorithms yield worst-case errors that are near-optimal by a factor at most two.

The two-space problem, and in fact even the multispace problem, in an arbitrary normed space $F$ was solved in [Foucart, 2021] but only when the quantity of interest $Q$ is a linear functional. For more general linear maps $Q$, but when $F$ is a Hilbert space, the two-space problem is a special case of our two-hyperellipsoid-intersection problem. Indeed, the model set (20) is an instantiation of the model set (1) with $R$ and $S$ being scaled versions of the orthogonal projectors onto the orthogonal complements of $V$ and $W$, precisely $R=(1/\varepsilon)P_{V^\perp}$ and $S=(1/\eta)P_{W^\perp}$. Thus, Theorem 1 applies directly and we arrive, through the change of optimization variables $a=c\varepsilon^2$ and $b=d\eta^2$, at the result stated below for completeness.

Theorem 7.

For the two-space model set (20), the square of the radius of information of the observation map $\Lambda:H\to\mathbb{R}^m$ for the estimation of $Q$ is given by the optimal value of the program

(21) \[\underset{c,d\geq 0}{\rm minimize}\;c\varepsilon^2+d\eta^2\qquad\mbox{s.to }\quad c\|P_{V^\perp}h\|^2+d\|P_{W^\perp}h\|^2\geq\|Qh\|^2\quad\mbox{for all }h\in\ker\Lambda.\]

Further, if $c_\sharp,d_\sharp\geq 0$ are minimizers of this program and if $\Delta_{c_\sharp,d_\sharp}$ is the map defined for $y\in\mathbb{R}^m$ by

\[\Delta_{c_\sharp,d_\sharp}(y)=\Big[\underset{f\in H}{{\rm argmin}\,}\,c_\sharp\|P_{V^\perp}f\|^2+d_\sharp\|P_{W^\perp}f\|^2\quad\mbox{s.to }\Lambda f=y\Big]\]

(and interpreted via continuity in case $c_\sharp=0$ or $d_\sharp=0$), then the linear map $Q\circ\Delta_{c_\sharp,d_\sharp}$ provides an optimal recovery map.
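As a small illustration (a sketch under the assumption $H=\mathbb{R}^N$, with hypothetical matrices V and W whose columns span the two subspaces), the orthogonal projectors appearing in Theorem 7 are easily assembled and then plugged into the machinery of Section 2.1.

```python
# A hedged sketch (not the authors' code): reduce the two-space model to the
# two-hyperellipsoid setting via R = (1/eps) P_{V^perp} and S = (1/eta) P_{W^perp}.
import numpy as np

def proj_complement(V):
    """Orthogonal projector onto the orthogonal complement of range(V)."""
    Qv, _ = np.linalg.qr(V)                  # orthonormal basis of range(V)
    return np.eye(V.shape[0]) - Qv @ Qv.T

def two_space_maps(V, W, eps, eta):
    R = proj_complement(V) / eps
    S = proj_complement(W) / eta
    return R, S                              # feed these to the SDP of Section 2.1 and to Delta_{a,b}
```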

3.2 Recovery from $\ell_2$-inaccurate data

Suppose now that the observations made on the objects of interest $f\in\mathcal{K}$ are not accurate anymore, but rather of the form $y=\Lambda f+e$ with an error vector $e\in\mathbb{R}^m$ belonging to some uncertainty set $\mathcal{E}$. We then need to adjust the notion of worst-case error of a recovery map $\Delta:\mathbb{R}^m\to Z$ and thus define the quantity

(22) \[{\rm Err}_{Q,\mathcal{K},\mathcal{E}}(\Lambda,\Delta):=\sup_{\substack{f\in\mathcal{K}\\ e\in\mathcal{E}}}\|Qf-\Delta(\Lambda f+e)\|.\]

In this subsection, both the model set and uncertainty set are hyperellipsoids, i.e.,

(23) \[\mathcal{K}=\{f\in H:\|Rf\|\leq\varepsilon\},\]
(24) \[\mathcal{E}=\{e\in\mathbb{R}^m:\|Se\|\leq\eta\}.\]

In this situation, the problem at hand reduces to the two-hyperellipsoid-intersection problem with accurate data. Indeed, considering the compound variable $\widetilde{f}=(f,e)$ belonging to the extended Hilbert space $\widetilde{H}:=H\times\mathbb{R}^m$, let us introduce linear maps $\widetilde{\Lambda}$, $\widetilde{Q}$, $\widetilde{R}$, and $\widetilde{S}$ defined on $\widetilde{H}$ by

(25) \[\widetilde{\Lambda}((f,e))=\Lambda f+e,\quad\widetilde{Q}((f,e))=Qf,\quad\widetilde{R}((f,e))=(1/\varepsilon)Rf,\quad\widetilde{S}((f,e))=(1/\eta)Se.\]

The worst-case error (22) of the recovery map $\Delta$ is then expressed as

\[{\rm Err}_{Q,\mathcal{K},\mathcal{E}}(\Lambda,\Delta)=\sup_{\substack{\|\widetilde{R}\widetilde{f}\|\leq 1\\ \|\widetilde{S}\widetilde{f}\|\leq 1}}\|\widetilde{Q}\widetilde{f}-\Delta(\widetilde{\Lambda}\widetilde{f})\|,\]

i.e., exactly as in (3). Exploiting this analogy, Theorem 1 yields the result stated below. Note that it is not entirely new: it was obtained in [Foucart and Liao, 2022] for $Q=\mathrm{Id}$ and $S=\mathrm{Id}$ and in [Foucart, Liao, and Veldt, 2023] for an arbitrary $Q$ but still with $S=\mathrm{Id}$. The extension to $S\not=\mathrm{Id}$ is minor; more pertinent is the fact that the result is now valid in infinite dimensions (although solving (26) in practice would then be a challenge).

Theorem 8.

For the hyperellipsoidal model and uncertainty sets (23) and (24), the square of the radius of information of the observation map $\Lambda:H\to\mathbb{R}^m$ for the estimation of $Q$ is given by the optimal value of the program

(26) \[\underset{c,d\geq 0}{\rm minimize}\;c\varepsilon^2+d\eta^2\qquad\mbox{s.to }\quad c\|Rf\|^2+d\|S\Lambda f\|^2\geq\|Qf\|^2\quad\mbox{for all }f\in H.\]

Further, if $c_\sharp,d_\sharp\geq 0$ are minimizers of this program and if $\Delta_{c_\sharp,d_\sharp}$ is the map defined for $y\in\mathbb{R}^m$ by

(27) \[\Delta_{c_\sharp,d_\sharp}(y)=\Big[\underset{f\in H}{{\rm argmin}\,}\,c_\sharp\|Rf\|^2+d_\sharp\|S(y-\Lambda f)\|^2\Big]\]

(and interpreted via continuity in case $c_\sharp=0$ or $d_\sharp=0$), then the linear map $Q\circ\Delta_{c_\sharp,d_\sharp}$ provides an optimal recovery map.

Proof.

With the change of optimization variables $a=c\varepsilon^2$ and $b=d\eta^2$, the program (9) for $\widetilde{\Lambda}$, $\widetilde{Q}$, $\widetilde{R}$, and $\widetilde{S}$ becomes

\[\underset{c,d\geq 0}{\rm minimize}\;c\varepsilon^2+d\eta^2\quad\mbox{s.to }\;c\|Rf\|^2+d\|Se\|^2\geq\|Qf\|^2\quad\mbox{when }f\in H,\,e\in\mathbb{R}^m\mbox{ satisfy }\Lambda f+e=0.\]

Eliminating $e\in\mathbb{R}^m$ from the above yields the program (26). As for the constrained regularization map $\Delta_{a_\sharp,b_\sharp}$ from (4), it is to be replaced by

\[y\in\mathbb{R}^m\mapsto\Big[\underset{f\in H,\,e\in\mathbb{R}^m}{{\rm argmin}\,}\,c_\sharp\|Rf\|^2+d_\sharp\|Se\|^2\quad\mbox{s.to }\Lambda f+e=y\Big].\]

Again, eliminating $e\in\mathbb{R}^m$ from the above leads to (27). ∎
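In finite dimensions, the map (27) has a simple closed form: setting the gradient of the objective to zero gives the normal equations $(c_\sharp R^*R+d_\sharp\Lambda^*S^*S\Lambda)f=d_\sharp\Lambda^*S^*Sy$. A minimal sketch (assuming $H=\mathbb{R}^N$, with illustrative arrays R, S, Lam standing for $R$, $S$, $\Lambda$; it is not the article's reproducible code):

```python
# A hedged sketch of the regularization map (27); invertibility of the system matrix
# requires ker(R) and ker(S @ Lam) to intersect trivially.
import numpy as np

def delta_cd(y, Lam, R, S, c, d):
    A = c * R.T @ R + d * Lam.T @ S.T @ S @ Lam
    rhs = d * Lam.T @ S.T @ S @ y
    return np.linalg.solve(A, rhs)
```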

3.3 Recovery for mixed accurate and $\ell_2$-inaccurate data

In some situations, parts of the observations on the objects of interest $f\in\mathcal{K}$ can be made accurately, while other parts are subject to errors. Such a situation occurs e.g. when learning the parameters of partial differential equations (using kernel methods, say) from example solutions that can be perfectly evaluated at points on the boundary of the domain but imprecisely at points inside the domain, see e.g. [Anandkumar, Azizzadenesheli, Bhattacharya, Kovachki, Li, Liu, and Stuart, 2020, Long, Mrvaljevic, Zhe, and Hosseini, 2023]. To cover this possibility, we can decompose the error vector $e\in\mathbb{R}^m$ as $e=(e',e'')\in\mathbb{R}^{m'}\times\mathbb{R}^{m''}$, with $e'=0$ and $\|e''\|\leq\eta$. More generally, we shall assume that $S'e=0$ and $\|S''e\|\leq\eta$ for some Hilbert-valued linear maps $S',S''$ defined on $\mathbb{R}^m$. We shall therefore consider model and uncertainty sets of the form

(28) \[\mathcal{K}=\{f\in H:\|Rf\|\leq\varepsilon\},\]
(29) \[\mathcal{E}=\{e\in\ker(S'):\|S''e\|\leq\eta\}.\]

This time working with the different extended space $\widetilde{H}=H\times\ker(S')$, we still introduce linear maps $\widetilde{\Lambda}$, $\widetilde{Q}$, $\widetilde{R}$, and $\widetilde{S}$ defined on the compound variable $\widetilde{f}=(f,e)\in\widetilde{H}$ almost as in (25), but with one slight modification for $\widetilde{S}$, namely

(30) \[\widetilde{\Lambda}((f,e))=\Lambda f+e,\quad\widetilde{Q}((f,e))=Qf,\quad\widetilde{R}((f,e))=(1/\varepsilon)Rf,\quad\widetilde{S}((f,e))=(1/\eta)S''e.\]

The worst-case error (22) of a recovery map for this mixed error scenario is still identifiable with the worst-case error (3) for the two-hyperellipsoid-intersection scenario, so we can once more leverage Theorem 1 to derive the following result.

Theorem 9.

For the model set (28) and the mixed-uncertainty set (29), the square of the radius of information of the observation map $\Lambda:H\to\mathbb{R}^m$ for the estimation of $Q$ is given by the optimal value of the program

(31) \[\underset{c,d\geq 0}{\rm minimize}\;c\varepsilon^2+d\eta^2\qquad\mbox{s.to }\quad c\|Rf\|^2+d\|S''\Lambda f\|^2\geq\|Qf\|^2\quad\mbox{for all }f\in\ker(S'\Lambda).\]

Further, if $c_\sharp,d_\sharp\geq 0$ are minimizers of this program and if $\Delta_{c_\sharp,d_\sharp}$ is the map defined for $y\in\mathbb{R}^m$ by

(32) \[\Delta_{c_\sharp,d_\sharp}(y)=\Big[\underset{f\in H}{{\rm argmin}\,}\,c_\sharp\|Rf\|^2+d_\sharp\|S''(y-\Lambda f)\|^2\quad\mbox{s.to }S'\Lambda f=S'y\Big]\]

(interpreted via continuity in case $c_\sharp=0$ or $d_\sharp=0$), then the linear map $Q\circ\Delta_{c_\sharp,d_\sharp}$ provides an optimal recovery map.

Proof.

With the change of optimization variables $a=c\varepsilon^2$ and $b=d\eta^2$, the program (9) for $\widetilde{\Lambda}$, $\widetilde{Q}$, $\widetilde{R}$, and $\widetilde{S}$ becomes

\[\underset{c,d\geq 0}{\rm minimize}\;c\varepsilon^2+d\eta^2\quad\mbox{s.to }\;c\|Rf\|^2+d\|S''e\|^2\geq\|Qf\|^2\;\mbox{when }f\in H,\,e\in\ker(S')\mbox{ satisfy }\Lambda f+e=0.\]

The form of the program (31) is obtained by eliminating $e\in\mathbb{R}^m$ from the above via $e=-\Lambda f$ and noticing that $e\in\ker(S')$ means that $S'\Lambda f=0$, i.e., $f\in\ker(S'\Lambda)$. The constrained regularization map is now to be replaced by

\[y\in\mathbb{R}^m\mapsto\Big[\underset{f\in H,\,e\in\ker(S')}{{\rm argmin}\,}\,c_\sharp\|Rf\|^2+d_\sharp\|S''e\|^2\quad\mbox{s.to }\Lambda f+e=y\Big].\]

The form of the map (32) is obtained by eliminating $e\in\mathbb{R}^m$ from the above via $e=y-\Lambda f$ and noticing that $e\in\ker(S')$ means that $S'\Lambda f=S'y$. ∎

It is worth making this result more explicit for our motivating example where $e\in\mathbb{R}^m$ is decomposed as $e=(e',e'')\in\mathbb{R}^{m'}\times\mathbb{R}^{m''}$ and we have $S'e=e'$, $S''e=e''$. The observation process on the object of interest $f\in H$ satisfying $\|Rf\|\leq\varepsilon$ is itself decomposed as $y'=\Lambda'f$ and $y''=\Lambda''f+e''$, where $\Lambda':H\to\mathbb{R}^{m'}$, $\Lambda'':H\to\mathbb{R}^{m''}$ are linear maps and where $\|e''\|\leq\eta$. In this case, a linear optimal recovery map is obtained, maybe somewhat intuitively, via the constrained regularization

\[\underset{f\in H}{\rm minimize}\,\,c_\sharp\|Rf\|^2+d_\sharp\|y''-\Lambda''f\|^2\quad\mbox{s.to }\Lambda'f=y'.\]

Our more significant contribution consists in uncovering a principled way of selecting the parameters $c_\sharp,d_\sharp\geq 0$, namely as solutions to the program

\[\underset{c,d\geq 0}{\rm minimize}\;c\varepsilon^2+d\eta^2\quad\mbox{s.to }\;c\|Rf\|^2+d\|\Lambda''f\|^2\geq\|Qf\|^2\;\mbox{ for all }f\in\ker(\Lambda').\]
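For this motivating example, the constrained regularization above is an equality-constrained least-squares problem, which can be solved through its KKT system. A minimal sketch (assuming $H=\mathbb{R}^N$, with illustrative arrays R, Lam1, Lam2 standing for $R$, $\Lambda'$, $\Lambda''$ and data y1, y2; not the article's reproducible code):

```python
# Stationarity gives (2c R^T R + 2d Lam2^T Lam2) f + Lam1^T mu = 2d Lam2^T y2,
# together with the interpolation constraint Lam1 f = y1.
import numpy as np

def mixed_regularization(y1, y2, Lam1, Lam2, R, c, d):
    N, m1 = R.shape[1], Lam1.shape[0]
    A = 2 * (c * R.T @ R + d * Lam2.T @ Lam2)          # Hessian of the objective
    KKT = np.block([[A, Lam1.T], [Lam1, np.zeros((m1, m1))]])
    rhs = np.concatenate([2 * d * Lam2.T @ y2, y1])
    sol = np.linalg.solve(KKT, rhs)
    return sol[:N]                                      # the recovered f; sol[N:] are the multipliers
```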

4 One intricate consequence: recovery from $\ell_1$-inaccurate data

In this final section, we contemplate yet another scenario of optimal recovery from inaccurate data which borrows from the results of Section 2. The situation is more delicate than in Section 3, though, because the observation error is not modeled through an $\ell_2$-bound but an $\ell_1$-bound. Thus, the objects of interest $f$ from a Hilbert space $H$ are acquired via inaccurate linear observations of the form $y=\Lambda f+e\in\mathbb{R}^m$, where the model set for $f$ and the uncertainty set for $e$ are given relative to some parameters $\varepsilon>0$ and $\eta>0$ by

(33) \[\mathcal{K}=\{f\in H:\|Rf\|\leq\varepsilon\},\]
(34) \[\mathcal{E}=\{e\in\mathbb{R}^m:\|e\|_1\leq\eta\}.\]

Towards the goal of optimally estimating a Hilbert-valued linear quantity of interest $Q:H\to Z$, the worst-case error of a recovery map $\Delta:\mathbb{R}^m\to Z$ is defined as

(35) \[{\rm Err}_{Q,\mathcal{K},\mathcal{E}}(\Lambda,\Delta)=\sup_{\substack{\|Rf\|\leq\varepsilon\\ \|e\|_1\leq\eta}}\|Qf-\Delta(\Lambda f+e)\|.\]

We will reveal that, conditionally on a checkable sufficient condition, the radius of information can still be computed and a constrained-regularization-based optimal recovery map, turning out, perhaps surprisingly, to be linear, can still be constructed efficiently. The sufficient condition is not vacuous: numerically, it even appears to hold whenever $\eta$ is small enough. Unfortunately, we were not able to establish this fact theoretically.

4.1 Main result

The result presented below involves constrained regularization maps $\Delta^{(j)}_{c,d}$ defined, for $j=1,\ldots,m$ and for $c,d>0$, by

(36) \[\Delta_{c,d}^{(j)}:y\in\mathbb{R}^m\mapsto\Big[\underset{f\in H}{{\rm argmin}\,}\,c\|Rf\|^2+d\|y-\Lambda f\|^2\quad\mbox{s.to }\lambda_i(f)=y_i\;\mbox{for }i\not=j\Big]\in H,\]

with the usual interpretation when $c=0$ or $d=0$. These constrained regularization maps are linear. Indeed, as a consequence of Lemma 16 in the appendix, they are given by

\[\Delta_{c,d}^{(j)}=\Lambda^{\dagger}-\big[cR_{\mathcal{N}_j}^*R_{\mathcal{N}_j}+d\Lambda_{\mathcal{N}_j}^*\Lambda_{\mathcal{N}_j}\big]^{-1}(cR_{\mathcal{N}_j}^*R\Lambda^{\dagger}),\qquad\mathcal{N}_j=\bigcap_{i\not=j}\ker(\lambda_i).\]

For each $j=1,\ldots,m$, fixing from now on an element $u_j\in H$ such that $\Lambda u_j=e_j$ (e.g. $u_j=\Lambda^{\dagger}e_j$), where, in this section, the notation $e_j$ does not represent the $j$th entry of the error vector but the $j$th element of the canonical basis for $\mathbb{R}^m$, we compute

(37) \[{\rm lb}'_j:=\min_{c,d\geq 0}\,c\varepsilon^2+d\eta^2\quad\mbox{s.to }c\|R(h-\theta u_j)\|^2+d\theta^2\geq\|Q(h-\theta u_j)\|^2\;\mbox{ for all }h\in\ker\Lambda\mbox{ and }\theta\in\mathbb{R}\]

and we let $c_j,d_j\geq 0$ denote extremizers of this optimization program. In addition, we consider (and we shall compute some of) the quantities $M_{i,j}$ defined for $i,j=1,\ldots,m$ by

\begin{align*}
M_{i,j}:=\inf_{c,d\geq 0}\,c\varepsilon^2+d\eta^2\quad\mbox{ s.to }\;&c\|R(h-\theta u_i)\|^2+d\theta^2\geq\|Q(h-\theta u_i)-Q\Delta^{(j)}_{c_j,d_j}\Lambda h\|^2\\
&\mbox{ for all }h\in H\mbox{ and }\theta\in\mathbb{R}.
\end{align*}

The main result of this section can now be stated as follows.

Theorem 10.

Aiming at estimating a Hilbert-valued linear map $Q:H\to Z$ from the observation map $\Lambda:H\to\mathbb{R}^m$ under the hyperellipsoid model set (33) and the $\ell_1$-uncertainty set (34), assume that

(38) \[M_{i,k}\leq M_{k,k}\quad\mbox{for all }i=1,\ldots,m,\qquad\mbox{where }k:=\underset{j=1,\ldots,m}{{\rm argmax}\,}\;{\rm lb}'_j.\]

Then the square of the radius of information is equal to ${\rm lb}'_k$ and the linear map $Q\circ\Delta^{(k)}_{c_k,d_k}:\mathbb{R}^m\to Z$ provides an optimal recovery map.

The proof of this result is given in the next subsection. Before getting there, we reiterate that the sufficient condition (38) seems to hold whenever $\eta$ is small enough, as supported by the numerical experiments presented in the reproducible files accompanying this article.
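To make the verifiability of (38) concrete, here is a hedged sketch (in Python with NumPy and CVXPY, under the assumption $H=\mathbb{R}^N$; it is not the reproducible code accompanying the article, and all names are illustrative) that computes the ${\rm lb}'_j$, builds $\Delta^{(k)}_{c_k,d_k}$, computes the $M_{i,k}$, and tests the sufficient condition. Each quantity is the value of a small semidefinite program obtained, as in Section 2, from Polyak's S-procedure.

```python
# Lam, R, Q are matrices representing Lambda, R, Q; eps, eta are the parameters of (33)-(34).
import numpy as np
import cvxpy as cp

def sdp_value(A, E, C, eps, eta):
    """min c*eps^2 + d*eta^2  s.to  c*A^T A + d*E - C^T C positive semidefinite."""
    c, d = cp.Variable(nonneg=True), cp.Variable(nonneg=True)
    prob = cp.Problem(cp.Minimize(c * eps**2 + d * eta**2),
                      [c * (A.T @ A) + d * E - C.T @ C >> 0])
    prob.solve()
    return prob.value, c.value, d.value

def check_condition_38(Lam, R, Q, eps, eta, tol=1e-8):
    m, N = Lam.shape
    U = np.linalg.pinv(Lam)                            # columns u_j = Lambda^dagger e_j
    _, s, Vt = np.linalg.svd(Lam)
    N0 = Vt[int((s > 1e-10).sum()):].T                 # orthonormal basis of ker(Lambda)
    n = N0.shape[1]
    E_ker = np.zeros((n + 1, n + 1)); E_ker[-1, -1] = 1.0
    lb, cd = np.zeros(m), []
    for j in range(m):                                 # lb'_j from (37), via the S-procedure
        A = np.hstack([R @ N0, -(R @ U[:, j])[:, None]])
        C = np.hstack([Q @ N0, -(Q @ U[:, j])[:, None]])
        lb[j], cj, dj = sdp_value(A, E_ker, C, eps, eta)
        cd.append((cj, dj))
    k = int(np.argmax(lb)); ck, dk = cd[k]
    # Delta^(k)_{ck,dk}, column by column, from the KKT system of (36)
    Lk = np.delete(Lam, k, axis=0)
    KKT = np.block([[2 * (ck * R.T @ R + dk * np.outer(Lam[k], Lam[k])), Lk.T],
                    [Lk, np.zeros((m - 1, m - 1))]])
    Delta_k = np.zeros((N, m))
    for j in range(m):
        y = np.zeros(m); y[j] = 1.0
        rhs = np.concatenate([2 * dk * y[k] * Lam[k], np.delete(y, k)])
        Delta_k[:, j] = np.linalg.solve(KKT, rhs)[:N]
    # the quantities M_{i,k} and the test (38)
    E_full = np.zeros((N + 1, N + 1)); E_full[-1, -1] = 1.0
    M = np.zeros(m)
    for i in range(m):
        A = np.hstack([R, -(R @ U[:, i])[:, None]])
        C = np.hstack([Q - Q @ Delta_k @ Lam, -(Q @ U[:, i])[:, None]])
        M[i], _, _ = sdp_value(A, E_full, C, eps, eta)
    return bool(np.all(M <= M[k] + tol)), lb[k], Delta_k
```

When the returned flag is True, the square of the radius of information is the second returned value and $Q$ composed with the returned matrix is an optimal recovery map, in accordance with Theorem 10.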

4.2 Justification

Much like the proof of Theorem 1, the proof of Theorem 10 is divided into three separate lemmas: one which establishes lower bounds for the radius of information, one which indicates that each lower bound is achieved by an associated constrained regularization map, and one that establishes a key property of such constrained regularization maps. Finally, these three ingredients will be put together while incorporating the sufficient condition (38). Throughout the argument, it will be convenient to work with the linear maps $\Gamma$, $Q^{(1)},\ldots,Q^{(m)}$, $R^{(1)},\ldots,R^{(m)}$, and $S$ defined for $g=(h,\theta)$ in the extended Hilbert space $H\times\mathbb{R}$ via

(39) \[\Gamma(g)=\Lambda h,\quad Q^{(j)}(g)=Q(h-\theta u_j),\quad R^{(j)}(g)=(1/\varepsilon)R(h-\theta u_j),\quad S(g)=(1/\eta)\theta.\]

Let us now state the first of our series of three lemmas.

Lemma 11.

The squared global worst-case error of any recovery map $\Delta$ satisfies

\[{\rm Err}_{Q,\mathcal{K},\mathcal{E}}(\Lambda,\Delta)^2\geq\max_{j=1,\ldots,m}{\rm lb}_j(\Delta)\geq\max_{j=1,\ldots,m}{\rm lb}'_j,\]

where the lower bounds are expressed as

(40) \[{\rm lb}_j(\Delta)=\sup_{\substack{\|Rf\|\leq\varepsilon\\ |\theta|\leq\eta}}\|Qf-\Delta(\Lambda f+\theta e_j)\|^2\]
(41) \[{\rm lb}_j(\Delta)=\sup_{\substack{\|R^{(j)}g\|\leq 1\\ \|Sg\|\leq 1}}\|Q^{(j)}g-\Delta(\Gamma g)\|^2,\]
(42) \[{\rm lb}'_j=\sup_{\substack{\|R^{(j)}g\|\leq 1,\,\|Sg\|\leq 1\\ g\in\ker\Gamma}}\|Q^{(j)}g\|^2\]
(43) \[{\rm lb}'_j=\inf_{a,b\geq 0}\;a+b\qquad\mbox{s.to}\quad a\|R^{(j)}g\|^2+b\|Sg\|^2\geq\|Q^{(j)}g\|^2\;\mbox{ for all }g\in\ker\Gamma.\]
Proof.

For $j=1,\ldots,m$, noticing in (35) that the supremum over all $e\in\mathbb{R}^m$ satisfying $\|e\|_1\leq\eta$ is larger than or equal to the supremum over all $e=\theta e_j$ with $|\theta|\leq\eta$ leads to the lower bound on ${\rm Err}_{Q,\mathcal{K},\mathcal{E}}(\Lambda,\Delta)^2$ expressed in (40). To arrive at (41), we make the change of variable $h=f+\theta u_j$, so that

\[{\rm lb}_j(\Delta)=\sup_{\substack{\|R(h-\theta u_j)\|\leq\varepsilon\\ |\theta|\leq\eta}}\|Q(h-\theta u_j)-\Delta(\Lambda h)\|^2,\]

and (41) follows by setting $g=(h,\theta)\in H\times\mathbb{R}$ and taking the expressions (39) for $\Gamma$, $Q^{(j)}$, $R^{(j)}$, and $S$ into account. At this point, it is apparent that ${\rm lb}_j(\Delta)$ coincides with the worst-case error of $\Delta$ for the estimation of $Q^{(j)}g$ from $\Gamma g$ under the two-hyperellipsoid-intersection model assumption $\|R^{(j)}g\|\leq 1$ and $\|Sg\|\leq 1$. Thus, the further lower bound (42) and its reformulation (43) follow from an application of Lemma 2. ∎

The lemma below, although not used explicitly later, gives us an idea of the coveted optimal recovery map by looking at the case of equality in ${\rm lb}_j(\Delta)\geq{\rm lb}'_j$. For the latter, note that the expression (43) is seen to coincide with the expression (37) by making the change of optimization variables $a=c\varepsilon^2$, $b=d\eta^2$.

Lemma 12.

For each $j=1,\ldots,m$, if $\Delta^{(j)}_{c_j,d_j}$ is the constrained regularization map defined in (36) and its parameters $c_j,d_j\geq 0$ are extremizers of the expression (37) for ${\rm lb}'_j$, then

\[{\rm lb}_j(Q\circ\Delta^{(j)}_{c_j,d_j})={\rm lb}'_j.\]
Proof.

According to the results of Section 2, we know that equality in lbj(Δ)lbj{\rm lb}_{j}(\Delta)\geq{\rm lb}^{\prime}_{j} occurs for Δ=Q(j)Δj,aj,bj\Delta_{\flat}=Q^{(j)}\circ\Delta_{j,a_{j},b_{j}}, where aj,bj>0a_{j},b_{j}>0 are extremizers of the program

minimizea,b0a+bs.toaR(j)g2+bSg2Q(j)g2 for all gkerΓ\underset{a,b\geq 0}{\rm minimize}\,\,a+b\qquad\mbox{s.to}\quad a\|R^{(j)}g\|^{2}+b\|Sg\|^{2}\geq\|Q^{(j)}g\|^{2}\;\mbox{ for all }g\in\ker\Gamma

and where the recovery map Δj,aj,bj:mH×\Delta_{j,a_{j},b_{j}}:\mathbb{R}^{m}\to H\times\mathbb{R} is defined, for ymy\in\mathbb{R}^{m}, by

Δj,aj,bj(y)=[argmingH×ajR(j)g2+bjSg2s.to Γg=y].\Delta_{j,a_{j},b_{j}}(y)=\Big{[}\underset{g\in H\times\mathbb{R}}{{\rm argmin}\,\,}a_{j}\|R^{(j)}g\|^{2}+b_{j}\|Sg\|^{2}\quad\mbox{s.to }\;\Gamma g=y\Big{]}.

It now remains to verify that Δ\Delta_{\flat} agrees with QΔcj,dj(j)Q\circ\Delta^{(j)}_{c_{j},d_{j}}. First, according to the expressions (39) for Γ\Gamma, Q(j)Q^{(j)}, R(j)R^{(j)}, and SS, and in view of the relations a=cε2a=c\varepsilon^{2} and b=dη2b=d\eta^{2}, it is easily seen that aj=cjε2a_{j}=c_{j}\varepsilon^{2} and bj=djη2b_{j}=d_{j}\eta^{2}. Then, writing Δj,aj,bj(y)=(h,θ)\Delta_{j,a_{j},b_{j}}(y)=(h_{\flat},\theta_{\flat}) with hHh_{\flat}\in H and θ\theta_{\flat}\in\mathbb{R}, we see that

(h,θ)=[argmin(h,θ)H×cjR(hθuj)2+djθ2s.to Λh=y],(h_{\flat},\theta_{\flat})=\Big{[}\underset{(h,\theta)\in H\times\mathbb{R}}{{\rm argmin}\,\,}c_{j}\|R(h-\theta u_{j})\|^{2}+d_{j}\theta^{2}\quad\mbox{s.to }\Lambda h=y\Big{]},

and consequently, making the change f=hθujf=h-\theta u_{j}, we deduce that

(hθuj,θ)=[argmin(f,θ)H×cjRf2+djθ2s.to Λf+θej=y].(h_{\flat}-\theta_{\flat}u_{j},\theta_{\flat})=\Big{[}\underset{(f,\theta)\in H\times\mathbb{R}}{{\rm argmin}\,\,}c_{j}\|Rf\|^{2}+d_{j}\theta^{2}\quad\mbox{s.to }\Lambda f+\theta e_{j}=y\Big{]}.

Since the constraint Λf+θej=y\Lambda f+\theta e_{j}=y decomposes as λj(f)+θ=yj\lambda_{j}(f)+\theta=y_{j} and λi(f)=yi\lambda_{i}(f)=y_{i} for iji\not=j, we obtain, after eliminating θ\theta,

hθuj\displaystyle h_{\flat}-\theta_{\flat}u_{j} =[argminfHcjRf2+dj(yjλj(f))2s.to λi(f)=yi for ij]\displaystyle=\Big{[}\underset{f\in H}{{\rm argmin}\,\,}c_{j}\|Rf\|^{2}+d_{j}(y_{j}-\lambda_{j}(f))^{2}\quad\mbox{s.to }\lambda_{i}(f)=y_{i}\mbox{ for }i\not=j\Big{]}
=Δcj,dj(j)(y),\displaystyle=\Delta^{(j)}_{c_{j},d_{j}}(y),

where the last equality is simply the definition of $\Delta^{(j)}_{c_j,d_j}$. The remaining justification is settled by remarking that $\Delta_\flat(y)=Q^{(j)}(\Delta_{j,a_j,b_j}(y))=Q^{(j)}((h_\flat,\theta_\flat))=Q(h_\flat-\theta_\flat u_j)=Q(\Delta^{(j)}_{c_j,d_j}(y))$. ∎
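
To make the elimination of $\theta$ concrete, here is a minimal finite-dimensional sketch in Python (using numpy; the dimensions, the random matrices standing for $R$ and $\Lambda$, and the values of $c_j,d_j$ are all hypothetical). It checks that the $f$-part of the lifted program, with variable $(f,\theta)$ and constraint $\Lambda f+\theta e_j=y$, coincides with the minimizer of the reduced program defining $\Delta^{(j)}_{c_j,d_j}(y)$. Both equality-constrained quadratic programs are solved through their KKT systems.

import numpy as np

rng = np.random.default_rng(0)
n, m, j = 6, 3, 1                        # dim(H), number of observations, perturbed index (0-based)
R = rng.standard_normal((n, n))          # R injective, so both programs have unique minimizers
Lam = rng.standard_normal((m, n))        # observation map Lambda
y = rng.standard_normal(m)
c, d = 1.3, 0.7                          # stand-ins for the parameters c_j, d_j

def solve_eq_qp(P, q, A, b):
    """Minimize (1/2) x^T P x + q^T x subject to A x = b, via the KKT system."""
    k = A.shape[0]
    KKT = np.block([[P, A.T], [A, np.zeros((k, k))]])
    return np.linalg.solve(KKT, np.concatenate([-q, b]))[:P.shape[0]]

# Lifted program: minimize c*||R f||^2 + d*theta^2  s.to  Lam f + theta e_j = y
P_lift = np.block([[2*c*R.T @ R, np.zeros((n, 1))],
                   [np.zeros((1, n)), 2*d*np.ones((1, 1))]])
A_lift = np.hstack([Lam, np.eye(m)[:, [j]]])
f_lift = solve_eq_qp(P_lift, np.zeros(n + 1), A_lift, y)[:n]

# Reduced program: minimize c*||R f||^2 + d*(y_j - lambda_j(f))^2  s.to  lambda_i(f) = y_i for i != j
lj = Lam[j]
P_red = 2*(c*R.T @ R + d*np.outer(lj, lj))
q_red = -2*d*y[j]*lj
mask = np.arange(m) != j
f_red = solve_eq_qp(P_red, q_red, Lam[mask], y[mask])

print(np.allclose(f_lift, f_red))        # True: eliminating theta does not change the f-part
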

The third lemma is a step towards the determination of the worst-case error of the constrained regularization map Δcj,dj(j)\Delta_{c_{j},d_{j}}^{(j)}.

Lemma 13.

For each j=1,,mj=1,\ldots,m, one has

supRfε|θ|ηQfQΔcj,dj(j)(Λf+θej)2=lbj.\sup_{\begin{subarray}{c}\|Rf\|\leq\varepsilon\\ |\theta|\leq\eta\end{subarray}}\|Qf-Q\Delta_{c_{j},d_{j}}^{(j)}(\Lambda f+\theta e_{j})\|^{2}={\rm lb}^{\prime}_{j}.
Proof.

Setting g=(f+θuj,θ)H×g=(f+\theta u_{j},\theta)\in H\times\mathbb{R}, the quantity under consideration becomes

supR(j)g1Sg1Q(j)gQΔcj,dj(j)(Γg)2=supR(j)g1Sg1Q(j)gQ(j)Δj,aj,bj(Γg)2,\sup_{\begin{subarray}{c}\|R^{(j)}g\|\leq 1\\ \|Sg\|\leq 1\end{subarray}}\|Q^{(j)}g-Q\Delta^{(j)}_{c_{j},d_{j}}(\Gamma g)\|^{2}=\sup_{\begin{subarray}{c}\|R^{(j)}g\|\leq 1\\ \|Sg\|\leq 1\end{subarray}}\|Q^{(j)}g-Q^{(j)}\Delta_{j,a_{j},b_{j}}(\Gamma g)\|^{2},

where we have borrowed from the previous proof the observation that $Q\circ\Delta^{(j)}_{c_j,d_j}=Q^{(j)}\circ\Delta_{j,a_j,b_j}$ with $a_j=c_j\varepsilon^2$ and $b_j=d_j\eta^2$. Thus, our quantity is precisely the worst-case error of $Q^{(j)}\circ\Delta_{j,a_j,b_j}$ for the estimation of $Q^{(j)}g$ from $\Gamma g$ given that $\|R^{(j)}g\|\leq 1$ and $\|Sg\|\leq 1$. We know from the results of Section 2 that the latter is equal to $a_j+b_j=c_j\varepsilon^2+d_j\eta^2$, i.e., to ${\rm lb}^\prime_j$, as announced. ∎

Having these three lemmas at our disposal, we now turn to the justification of the main result of this section.

Proof of Theorem 10.

According to Lemma 11, there holds

infΔ:mZErrQ,𝒦,(Λ,Δ)2lbk,\inf_{\Delta:\mathbb{R}^{m}\to Z}{\rm Err}_{Q,\mathcal{K},\mathcal{E}}(\Lambda,\Delta)^{2}\geq{\rm lb}^{\prime}_{k},

where we recall that the index kk is obtained as the maximizer of lbj{\rm lb}^{\prime}_{j} over all j=1,,mj=1,\ldots,m. In order to prove our result, we have to show that this infimum is actually achieved for the linear recovery map QΔck,dk(k)Q\circ\Delta^{(k)}_{c_{k},d_{k}}. To this end, we notice that the linearity of QΔck,dk(k)Q\circ\Delta^{(k)}_{c_{k},d_{k}} guarantees that

ErrQ,𝒦,(Λ,QΔck,dk(k))2\displaystyle{\rm Err}_{Q,\mathcal{K},\mathcal{E}}(\Lambda,Q\circ\Delta^{(k)}_{c_{k},d_{k}})^{2} =supRfεe1ηQfQΔck,dk(k)(Λf+e)2\displaystyle=\sup_{\begin{subarray}{c}\|Rf\|\leq\varepsilon\\ \|e\|_{1}\leq\eta\end{subarray}}\|Qf-Q\Delta^{(k)}_{c_{k},d_{k}}(\Lambda f+e)\|^{2}
=maxi=1,,msupRfε|θ|ηQfQΔck,dk(k)(Λf+θei)2\displaystyle=\max_{i=1,\ldots,m}\sup_{\begin{subarray}{c}\|Rf\|\leq\varepsilon\\ |\theta|\leq\eta\end{subarray}}\|Qf-Q\Delta^{(k)}_{c_{k},d_{k}}(\Lambda f+\theta e_{i})\|^{2}
=maxi=1,,msupR(hθui)ε|θ|ηQ(hθui)QΔck,dk(k)Λh2.\displaystyle=\max_{i=1,\ldots,m}\sup_{\begin{subarray}{c}\|R(h-\theta u_{i})\|\leq\varepsilon\\ |\theta|\leq\eta\end{subarray}}\|Q(h-\theta u_{i})-Q\Delta^{(k)}_{c_{k},d_{k}}\Lambda h\|^{2}.

By now, it is a familiar step, relying on Polyak’s S-procedure, to transform the latter supremum into

infc,d0cε2+dη2s.to cR(hθui)2+dθ2Q(hθui)QΔck,dk(k)Λh2 for all hH and θ,\inf_{c,d\geq 0}\,c\varepsilon^{2}+d\eta^{2}\quad\mbox{s.to }c\|R(h-\theta u_{i})\|^{2}+d\theta^{2}\geq\|Q(h-\theta u_{i})-Q\Delta^{(k)}_{c_{k},d_{k}}\Lambda h\|^{2}\mbox{ for all }h\in H\mbox{ and }\theta\in\mathbb{R},

which we recognize as the quantity Mi,kM_{i,k}. Now, calling upon the sufficient condition (38), we derive that

ErrQ,𝒦,(Λ,QΔck,dk(k))2=maxi=1,,mMi,k=Mk,k=lbk,{\rm Err}_{Q,\mathcal{K},\mathcal{E}}(\Lambda,Q\circ\Delta^{(k)}_{c_{k},d_{k}})^{2}=\max_{i=1,\ldots,m}M_{i,k}=M_{k,k}={\rm lb}^{\prime}_{k},

where the last step was due to Lemma 13. This equality completes the proof. ∎

4.3 Side result

When the sufficient condition (38) fails, it is no longer guaranteed that the linear map $Q\circ\Delta^{(k)}_{c_k,d_k}$ provides an optimal recovery map. Regardless, we can always solve a semidefinite program to obtain a linear recovery map with minimal worst-case error, according to the result stated below with the notation introduced in (39).

Proposition 14.

The squared worst-case error of a linear recovery map $\Delta^{\rm lin}:\mathbb{R}^m\to Z$ can be computed as the optimal value of a semidefinite program, namely as

(44) ErrQ,𝒦,(Λ,Δlin)2=minγa1,b1,,am,bm0γ\displaystyle{\rm Err}_{Q,\mathcal{K},\mathcal{E}}(\Lambda,\Delta^{\rm lin})^{2}=\min_{\begin{subarray}{c}\gamma\in\mathbb{R}\\ a_{1},b_{1},\ldots,a_{m},b_{m}\geq 0\end{subarray}}\,\gamma s.to [IdQ(i)ΔlinΓ(Q(i)ΔlinΓ)aiR(i)R(i)+biSS]0\displaystyle\quad\mbox{s.to }\begin{bmatrix}\mathrm{Id}&\vline&Q^{(i)}-\Delta^{\rm lin}\Gamma\\ \hline\cr(Q^{(i)}-\Delta^{\rm lin}\Gamma)^{*}&\vline&a_{i}{R^{(i)}}^{*}R^{(i)}+b_{i}S^{*}S\end{bmatrix}\succeq 0
and ai+biγfor all i=1,,m.\displaystyle\quad\mbox{and }\;a_{i}+b_{i}\leq\gamma\quad\mbox{for all }i=1,\ldots,m.

This quantity can further be minimized over all linear maps Δlin:mZ\Delta^{\rm lin}:\mathbb{R}^{m}\to Z, yielding a linear recovery map with smallest worst-case error.

Proof.

For a linear recovery map Δlin:mZ\Delta^{\rm lin}:\mathbb{R}^{m}\to Z, we have

ErrQ,𝒦,(Λ,Δlin)2\displaystyle{\rm Err}_{Q,\mathcal{K},\mathcal{E}}(\Lambda,\Delta^{\rm lin})^{2} =maxi=1,,msupRfε|θ|ηQfΔlin(Λf+θei)2\displaystyle=\max_{i=1,\ldots,m}\sup_{\begin{subarray}{c}\|Rf\|\leq\varepsilon\\ |\theta|\leq\eta\end{subarray}}\|Qf-\Delta^{\rm lin}(\Lambda f+\theta e_{i})\|^{2}
(45) =infγγs.to supRfε|θ|ηQfΔlin(Λf+θei)2γ for all i=1,,m.\displaystyle=\;\;\;\inf_{\gamma\in\mathbb{R}}\,\gamma\quad\mbox{s.to }\;\sup_{\begin{subarray}{c}\|Rf\|\leq\varepsilon\\ |\theta|\leq\eta\end{subarray}}\|Qf-\Delta^{\rm lin}(\Lambda f+\theta e_{i})\|^{2}\leq\gamma\mbox{ for all }i=1,\ldots,m.

Note that the above ii-dependent suprema can also be expressed as

supR(hθui)ε|θ|η\displaystyle\sup_{\begin{subarray}{c}\|R(h-\theta u_{i})\|\leq\varepsilon\\ |\theta|\leq\eta\end{subarray}} Q(hθui)ΔlinΛh2=supR(i)g1Sg1Q(i)gΔlinΓg2\displaystyle\|Q(h-\theta u_{i})-\Delta^{\rm lin}\Lambda h\|^{2}=\sup_{\begin{subarray}{c}\|R^{(i)}g\|\leq 1\\ \|Sg\|\leq 1\end{subarray}}\|Q^{(i)}g-\Delta^{\rm lin}\Gamma g\|^{2}
=infai,bi0ai+bis.toaiR(i)g2+biSg2Q(i)gΔlinΓg2 for all gH×\displaystyle=\inf_{a_{i},b_{i}\geq 0}\;a_{i}+b_{i}\quad\mbox{s.to}\quad a_{i}\|R^{(i)}g\|^{2}+b_{i}\|Sg\|^{2}\geq\|Q^{(i)}g-\Delta^{\rm lin}\Gamma g\|^{2}\mbox{ for all }g\in H\times\mathbb{R}
=infai,bi0ai+bis.toaiR(i)R(i)+biSS(Q(i)ΔlinΓ)(Q(i)ΔlinΓ).\displaystyle=\inf_{a_{i},b_{i}\geq 0}\;a_{i}+b_{i}\quad\mbox{s.to}\quad a_{i}{R^{(i)}}^{*}R^{(i)}+b_{i}S^{*}S\succeq(Q^{(i)}-\Delta^{\rm lin}\Gamma)^{*}(Q^{(i)}-\Delta^{\rm lin}\Gamma).

Therefore, the ii-dependent constraint in (45) is equivalent to the existence of ai,bi0a_{i},b_{i}\geq 0 such that ai+biγa_{i}+b_{i}\leq\gamma and aiR(i)R(i)+biSS(Q(i)ΔlinΓ)(Q(i)ΔlinΓ)a_{i}{R^{(i)}}^{*}R^{(i)}+b_{i}S^{*}S\succeq(Q^{(i)}-\Delta^{\rm lin}\Gamma)^{*}(Q^{(i)}-\Delta^{\rm lin}\Gamma). As such, we arrive at

ErrQ,𝒦,(Λ,Δlin)2=infγa1,b1,,am,bm0γ\displaystyle{\rm Err}_{Q,\mathcal{K},\mathcal{E}}(\Lambda,\Delta^{\rm lin})^{2}=\inf_{\begin{subarray}{c}\gamma\in\mathbb{R}\\ a_{1},b_{1},\ldots,a_{m},b_{m}\geq 0\end{subarray}}\,\gamma s.to aiR(i)R(i)+biSS(Q(i)ΔlinΓ)(Q(i)ΔlinΓ)\displaystyle\quad\mbox{s.to }\;a_{i}{R^{(i)}}^{*}R^{(i)}+b_{i}S^{*}S\succeq(Q^{(i)}-\Delta^{\rm lin}\Gamma)^{*}(Q^{(i)}-\Delta^{\rm lin}\Gamma)
and ai+biγfor all i=1,,m.\displaystyle\quad\mbox{and }\;a_{i}+b_{i}\leq\gamma\quad\mbox{for all }i=1,\ldots,m.

Using Schur complements (since the top-left block is the identity, hence positive definite, positive semidefiniteness of the block matrix below is equivalent to the domination constraint above), the $i$-dependent semidefinite constraints can each be rephrased as

[IdQ(i)ΔlinΓ(Q(i)ΔlinΓ)aiR(i)R(i)+biSS]0,\begin{bmatrix}\mathrm{Id}&\vline&Q^{(i)}-\Delta^{\rm lin}\Gamma\\ \hline\cr(Q^{(i)}-\Delta^{\rm lin}\Gamma)^{*}&\vline&a_{i}{R^{(i)}}^{*}R^{(i)}+b_{i}S^{*}S\end{bmatrix}\succeq 0,

leading to ErrQ,𝒦,(Λ,Δlin)2{\rm Err}_{Q,\mathcal{K},\mathcal{E}}(\Lambda,\Delta^{\rm lin})^{2} being expressed as in (44). We finally note that the linear dependence on Δlin\Delta^{\rm lin} of the constraints in (44) allows us to further view the minimization of ErrQ,𝒦,(Λ,Δlin)2{\rm Err}_{Q,\mathcal{K},\mathcal{E}}(\Lambda,\Delta^{\rm lin})^{2} over all linear maps Δlin\Delta^{\rm lin} as a semidefinite program. ∎

We should remark that, even when $\eta$ is small and the sufficient condition (38) holds, the linear recovery map with smallest worst-case error obtained by semidefinite programming may differ from the optimal recovery map $Q\circ\Delta^{(k)}_{c_k,d_k}$, illustrating the nonuniqueness of optimal recovery maps. Moreover, when $\eta$ is not small, our numerical experiments (available from the reproducible files) suggest that $Q\circ\Delta^{(k)}_{c_k,d_k}$ may no longer be optimal among linear recovery maps.
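
For concreteness, here is a minimal sketch in Python with cvxpy of the semidefinite program (44), together with its further minimization over all linear maps $\Delta^{\rm lin}$. The matrices Gamma, Qs[i], Rs[i], S below are placeholders standing for finite-dimensional representations of the operators $\Gamma$, $Q^{(i)}$, $R^{(i)}$, $S$ of (39); in an actual computation, they would be assembled from $\Lambda$, $Q$, $R$, $\varepsilon$, and $\eta$, and the sketch is not a substitute for the reproducible files mentioned above.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n, m, p = 5, 3, 2                  # dim(H), number of observations, dim(Z); all hypothetical
N = n + 1                          # dimension of the lifted space H x R
# Placeholders: in an actual computation, build these from Lambda, Q, R, epsilon, eta as in (39)
Gamma = rng.standard_normal((m, N))
Qs = [rng.standard_normal((p, N)) for _ in range(m)]    # Q^{(i)}, i = 1,...,m
Rs = [rng.standard_normal((n, N)) for _ in range(m)]    # R^{(i)}, i = 1,...,m
S = rng.standard_normal((1, N))

def sq_worst_case_error(Delta):
    """Objective and constraints of the semidefinite program (44) for a linear map Delta."""
    gamma = cp.Variable()
    a = cp.Variable(m, nonneg=True)
    b = cp.Variable(m, nonneg=True)
    constraints = []
    for i in range(m):
        B = Qs[i] - Delta @ Gamma
        M = cp.bmat([[np.eye(p), B],
                     [B.T, a[i]*(Rs[i].T @ Rs[i]) + b[i]*(S.T @ S)]])
        constraints += [0.5*(M + M.T) >> 0, a[i] + b[i] <= gamma]   # M is symmetric; symmetrizing is harmless
    return gamma, constraints

# (i) squared worst-case error of a given linear recovery map, as in (44)
Delta0 = rng.standard_normal((p, m))
gamma, cons = sq_worst_case_error(Delta0)
print(cp.Problem(cp.Minimize(gamma), cons).solve(solver=cp.SCS))

# (ii) further minimization over all linear maps Delta, yielding an optimal linear recovery map
Delta = cp.Variable((p, m))
gamma, cons = sq_worst_case_error(Delta)
cp.Problem(cp.Minimize(gamma), cons).solve(solver=cp.SCS)
print(gamma.value)
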

Appendix

In this appendix, we provide justifications for a few facts not fully explained in the main text.

Polyak’s S-procedure.

Given quadratic functions $q_0,q_1,\ldots,q_n$, the statement “$q_0(x)\leq 0$ whenever $q_1(x)\leq 0,\ldots,q_n(x)\leq 0$” holds if there exist $a_1,\ldots,a_n\geq 0$ such that $q_0\leq a_1q_1+\cdots+a_nq_n$. The following result, paraphrased from [Polyak, 1998, Theorem 4.1], establishes that this sufficient condition is also necessary when $n=2$ and the $q_i$’s contain no linear terms.

Theorem 15.

Suppose that N3N\geq 3 and that quadratic functions q0,q1,q2q_{0},q_{1},q_{2} on N\mathbb{R}^{N} take the form qi(x)=Aix,x+αiq_{i}(x)=\langle A_{i}x,x\rangle+\alpha_{i} for symmetric matrices A0,A1,A2N×NA_{0},A_{1},A_{2}\in\mathbb{R}^{N\times N} and scalars α0,α1,α2\alpha_{0},\alpha_{1},\alpha_{2}\in\mathbb{R}. Then

[q0(x)0 whenever q1(x)0 and q2(x)0][there exist a1,a20:q0a1q1+a2q2],[q_{0}(x)\leq 0\;\mbox{ whenever }q_{1}(x)\leq 0\mbox{ and }q_{2}(x)\leq 0]\iff[\mbox{there exist }a_{1},a_{2}\geq 0:q_{0}\leq a_{1}q_{1}+a_{2}q_{2}],

provided q1(x~)<0q_{1}(\widetilde{x})<0 and q2(x~)<0q_{2}(\widetilde{x})<0 for some x~N\widetilde{x}\in\mathbb{R}^{N} (strict feasibility) and b1A1+b2A20b_{1}A_{1}+b_{2}A_{2}\succ 0 for some b1,b2b_{1},b_{2}\in\mathbb{R} (positive definiteness).

As established in [Contino, Fongi, and Muro, 2022, Proposition 5.2], such a result remains valid when N\mathbb{R}^{N} is replaced by an arbitrary Hilbert space HH—even of infinite dimension—and the AiA_{i}’s are self-adjoint bounded linear operators on HH. This generalized version is the one called upon in the main text.
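
In finite dimensions, the existence of the multipliers $a_1,a_2$ appearing in Theorem 15 can be decided by a small semidefinite feasibility problem, since $q_0\leq a_1q_1+a_2q_2$ as functions amounts to $a_1A_1+a_2A_2-A_0\succeq 0$ together with $a_1\alpha_1+a_2\alpha_2-\alpha_0\geq 0$. The Python sketch below (using cvxpy, with a hypothetical choice of $q_0,q_1,q_2$) illustrates this reformulation.

import numpy as np
import cvxpy as cp

N = 4                                             # ambient dimension (N >= 3, as in Theorem 15)
# Hypothetical quadratic functions q_i(x) = <A_i x, x> + alpha_i
A1 = np.eye(N);                  alpha1 = -1.0    # q1(x) = ||x||^2 - 1
A2 = np.diag([2., 1., 1., 1.]);  alpha2 = -1.0    # q2: a second centered hyperellipsoid constraint
A0 = np.diag([1., 1., 0., 0.]);  alpha0 = -2.0    # q0(x) = x_1^2 + x_2^2 - 2

# q0 <= a1*q1 + a2*q2 (as functions of x) is equivalent to
#   a1*A1 + a2*A2 - A0 >> 0   and   a1*alpha1 + a2*alpha2 - alpha0 >= 0,
# so the existence of the multipliers is a semidefinite feasibility problem.
a = cp.Variable(2, nonneg=True)
constraints = [a[0]*A1 + a[1]*A2 - A0 >> 0,
               a[0]*alpha1 + a[1]*alpha2 - alpha0 >= 0]
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve(solver=cp.SCS)
# Strict feasibility holds (q1(0) < 0, q2(0) < 0) and A1 is positive definite, so Theorem 15
# says that feasibility here is equivalent to: q1(x) <= 0 and q2(x) <= 0 imply q0(x) <= 0.
print(prob.status, a.value)
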

Constrained Regularization.

The goal here is to justify the identities (7) and (8), which are consequences of the general observation below.

Lemma 16.

Let A:HHA:H\to H^{\prime} and B:HH′′B:H\to H^{\prime\prime} be two bounded linear maps between Hilbert spaces. Assume that there exists δ>0\delta>0 such that Azδz\|Az\|\geq\delta\|z\| for all z:=ker(B)z\in\mathcal{B}:=\ker(B), so that AA:A_{\mathcal{B}}^{*}A_{\mathcal{B}}:\mathcal{B}\to\mathcal{B} is invertible, where A:HA_{\mathcal{B}}:\mathcal{B}\to H^{\prime} denotes the restriction of AA to \mathcal{B}. Given aHa\in H^{\prime} and bH′′b\in H^{\prime\prime}, the solution xHx^{\sharp}\in H to

minimizexHAxa2s.toBx=b\underset{x\in H}{\rm minimize\;}\|Ax-a\|^{2}\qquad\mbox{s.to}\quad Bx=b

can be expressed, for any x¯\overline{x} such that Bx¯=bB\overline{x}=b, as

x=x¯[AA]1A(Ax¯a).x^{\sharp}=\overline{x}-\big{[}A_{\mathcal{B}}^{*}A_{\mathcal{B}}\big{]}^{-1}A_{\mathcal{B}}^{*}(A\overline{x}-a).
Proof.

Writing the optimization variable $x\in H$ as $x=\overline{x}-z$ with $z\in\mathcal{B}$ and the minimizer $x^\sharp$ as $x^\sharp=\overline{x}-z^\sharp$ with $z^\sharp\in\mathcal{B}$, we see that $z^\sharp$ is the solution to

minimizezAx¯aAz2.\underset{z\in\mathcal{B}}{\rm minimize\;}\|A\overline{x}-a-Az\|^{2}.

This solution is characterized by the orthogonality condition Ax¯aAz,Az=0\langle A\overline{x}-a-Az^{\sharp},Az\rangle=0 for all zz\in\mathcal{B}, which is equivalent to A(Ax¯aAz)=0A_{\mathcal{B}}^{*}(A\overline{x}-a-Az^{\sharp})=0, or to AAz=A(Ax¯a)A_{\mathcal{B}}^{*}A_{\mathcal{B}}z^{\sharp}=A_{\mathcal{B}}^{*}(A\overline{x}-a). Left-multiplying by [AA]1\big{[}A_{\mathcal{B}}^{*}A_{\mathcal{B}}\big{]}^{-1} to obtain zz^{\sharp} and substituting into x=x¯zx^{\sharp}=\overline{x}-z^{\sharp} yields the announced result. ∎

As a consequence, if $R_1,\ldots,R_n$ are Hilbert-valued bounded linear maps defined on $H$ such that there exists $\delta>0$ with $\max\{\|R_1z\|,\ldots,\|R_nz\|\}\geq\delta\|z\|$ for all $z\in\mathcal{N}:=\ker(\Lambda)$ (this assumption simply reduces to $\ker(R_1)\cap\cdots\cap\ker(R_n)\cap\ker(\Lambda)=\{0\}$ when $H$ is finite dimensional) and if $c_1,\ldots,c_n>0$, then, for any $y\in\mathbb{R}^m$,

(46) Δc1,,cn(y)\displaystyle\Delta_{c_{1},\ldots,c_{n}}(y) :=[argminxHi=1nciRix2s.to Λx=y]\displaystyle:=\bigg{[}\underset{x\in H}{{\rm argmin}\,\,}\sum_{i=1}^{n}c_{i}\|R_{i}x\|^{2}\quad\mbox{s.to }\Lambda x=y\bigg{]}
=Λy[i=1nciRi,𝒩Ri,𝒩]1(i=1nciRi,𝒩Ri)Λy.\displaystyle=\Lambda^{\dagger}y-\bigg{[}\sum_{i=1}^{n}c_{i}R_{i,\mathcal{N}}^{*}R_{i,\mathcal{N}}\bigg{]}^{-1}\Big{(}\sum_{i=1}^{n}c_{i}R_{i,\mathcal{N}}^{*}R_{i}\Big{)}\Lambda^{\dagger}y.

To arrive at this identity, which reduces to (7) when n=2n=2, it suffices to apply Lemma 16 with

A=[c1R1cnRn],a=0,B=Λ,b=y,x¯=Λy.A=\begin{bmatrix}\sqrt{c_{1}}R_{1}\\ \hline\cr\vdots\\ \hline\cr\sqrt{c_{n}}R_{n}\end{bmatrix},\qquad a=0,\qquad B=\Lambda,\qquad b=y,\qquad\overline{x}=\Lambda^{\dagger}y.

Furthermore, if y=Λxy=\Lambda x for some xHx\in H, taking x¯=x\overline{x}=x instead of x¯=Λy\overline{x}=\Lambda^{\dagger}y leads, after rearrangement, to

xΔc1,,cnΛx=[i=1nciRi,𝒩Ri,𝒩]1(i=1nciRi,𝒩Ri)x.x-\Delta_{c_{1},\ldots,c_{n}}\Lambda x=\bigg{[}\sum_{i=1}^{n}c_{i}R_{i,\mathcal{N}}^{*}R_{i,\mathcal{N}}\bigg{]}^{-1}\Big{(}\sum_{i=1}^{n}c_{i}R_{i,\mathcal{N}}^{*}R_{i}\Big{)}x.

The latter reduces to (8) when n=2n=2.
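
As a sanity check of the identity (46) in finite dimensions, the closed-form expression can be compared with a direct solution of the constrained regularization program. The Python sketch below (using numpy, with hypothetical sizes, random data, and two maps $R_1,R_2$, i.e., the case $n=2$) represents the restrictions $R_{i,\mathcal{N}}$ through an orthonormal basis of $\ker(\Lambda)$.

import numpy as np

rng = np.random.default_rng(3)
dimH, m = 7, 3                               # dim(H) and number of observations (hypothetical)
Lam = rng.standard_normal((m, dimH))         # observation map Lambda
R1 = rng.standard_normal((4, dimH))          # two Hilbert-valued maps R_1, R_2
R2 = rng.standard_normal((5, dimH))
c1, c2 = 0.8, 1.9
y = rng.standard_normal(m)

# Orthonormal basis V of N = ker(Lambda); the restrictions R_{i,N} are represented by R_i V
_, _, Vt = np.linalg.svd(Lam)
V = Vt[m:].T                                 # valid since Lambda has full row rank here
xbar = np.linalg.pinv(Lam) @ y               # xbar = Lambda^dagger y

# Closed form (46)
G = c1*(V.T @ R1.T @ R1 @ V) + c2*(V.T @ R2.T @ R2 @ V)
w = c1*(V.T @ R1.T @ R1 @ xbar) + c2*(V.T @ R2.T @ R2 @ xbar)
x_closed = xbar - V @ np.linalg.solve(G, w)

# Direct solution of: minimize c1*||R1 x||^2 + c2*||R2 x||^2  s.to  Lambda x = y  (KKT system)
P = 2*(c1*(R1.T @ R1) + c2*(R2.T @ R2))
KKT = np.block([[P, Lam.T], [Lam, np.zeros((m, m))]])
x_direct = np.linalg.solve(KKT, np.concatenate([np.zeros(dimH), y]))[:dimH]

print(np.allclose(x_closed, x_direct))       # True: (46) matches the constrained minimizer
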

Finally, we want to justify the statement made in Section 2 that $\Delta_{a,b}(y)$ converges weakly to $\Delta_{a,0}(y)$ as $b\to 0$ for any fixed $y\in\mathbb{R}^m$. We shall do so under the working assumption that there exists $\delta>0$ such that $\max\{\|Rz\|,\|Sz\|\}\geq\delta\|z\|$ for all $z\in\mathcal{N}=\ker(\Lambda)$ (this assumption reduces to $\ker(R)\cap\ker(S)\cap\ker(\Lambda)=\{0\}$ when $H$ is finite dimensional). Supposing without loss of generality that $a=1$, we thus want to establish that

xb:=argminxH[Rx2+bSx2s.to Λx=y]b0x0:=argminxH[Sx2s.to Λx=y,Rx=0].x_{b}:=\underset{x\in H}{{\rm argmin}\,}\left[\|Rx\|^{2}+b\|Sx\|^{2}\quad\mbox{s.to }\Lambda x=y\right]\underset{b\to 0}{\rightharpoonup}x_{0}:=\underset{x\in H}{{\rm argmin}\,}\left[\|Sx\|^{2}\quad\mbox{s.to }\Lambda x=y,\;Rx=0\right].

If this was not the case, there would exist vHv\in H, ε>0\varepsilon>0, and a sequence (bk)k1(b_{k})_{k\geq 1} decreasing to zero such that |xbkx0,v|ε|\langle x_{b_{k}}-x_{0},v\rangle|\geq\varepsilon for each k1k\geq 1. Now, from the optimality property of xbkx_{b_{k}}, we have Rxbk2+bkSxbk2Rx02+bkSx02\|Rx_{b_{k}}\|^{2}+b_{k}\|Sx_{b_{k}}\|^{2}\leq\|Rx_{0}\|^{2}+b_{k}\|Sx_{0}\|^{2}, which yields, in view of Rx0=0Rx_{0}=0,

Rxbk2bkSx02andSxbk2Sx02.\|Rx_{b_{k}}\|^{2}\leq b_{k}\|Sx_{0}\|^{2}\qquad\mbox{and}\qquad\|Sx_{b_{k}}\|^{2}\leq\|Sx_{0}\|^{2}.

Thanks to our working assumption, it follows that the sequence $(x_{b_k}-x_0)_{k\geq 1}$ of elements in $\mathcal{N}$ is bounded, and then so is the sequence $(x_{b_k})_{k\geq 1}$. As such, it possesses a subsequence weakly converging to some $\widetilde{x}\in H$, say. We still write $(x_{b_k})_{k\geq 1}$ for this subsequence and we note that $|\langle\widetilde{x}-x_0,v\rangle|\geq\varepsilon$. Next, in view of $\Lambda x_{b_k}=y$ for all $k\geq 1$, we derive that $\Lambda\widetilde{x}=y$ from $x_{b_k}\rightharpoonup\widetilde{x}$. From there, we also obtain $Rx_{b_k}\rightharpoonup R\widetilde{x}$ and $Sx_{b_k}\rightharpoonup S\widetilde{x}$, and in turn $\|R\widetilde{x}\|\leq\liminf\|Rx_{b_k}\|=0$ and $\|S\widetilde{x}\|\leq\liminf\|Sx_{b_k}\|\leq\|Sx_0\|$. These facts imply that $\widetilde{x}$ is also a minimizer for the program defining $x_0$, so that $\widetilde{x}=x_0$ by uniqueness of the minimizer. This is of course incompatible with $|\langle\widetilde{x}-x_0,v\rangle|\geq\varepsilon$ and provides the required contradiction.
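
In finite dimensions, where weak and strong convergence coincide, this limiting behavior can be observed numerically. The Python sketch below (using numpy, with hypothetical random data chosen so that the constraint set $\{\Lambda x=y,\ Rx=0\}$ is nonempty) solves the regularized program for decreasing values of $b$ and compares the solutions with $x_0$.

import numpy as np

rng = np.random.default_rng(4)
dimH, m = 7, 2                                # hypothetical dimensions
Lam = rng.standard_normal((m, dimH))          # observation map Lambda
R = rng.standard_normal((3, dimH))            # sizes chosen so that {Lambda x = y, R x = 0} is nonempty
S = rng.standard_normal((6, dimH))
y = rng.standard_normal(m)

def solve_eq_qp(P, A, rhs):
    """Minimize (1/2) x^T P x subject to A x = rhs, via the KKT system."""
    k = A.shape[0]
    KKT = np.block([[P, A.T], [A, np.zeros((k, k))]])
    return np.linalg.solve(KKT, np.concatenate([np.zeros(P.shape[0]), rhs]))[:P.shape[0]]

# Limit program: x_0 minimizes ||S x||^2  s.to  Lambda x = y and R x = 0
x0 = solve_eq_qp(2*(S.T @ S), np.vstack([Lam, R]), np.concatenate([y, np.zeros(R.shape[0])]))

# Regularized programs: x_b minimizes ||R x||^2 + b*||S x||^2  s.to  Lambda x = y
for b in [1e-1, 1e-2, 1e-4, 1e-6]:
    xb = solve_eq_qp(2*(R.T @ R + b*(S.T @ S)), Lam, y)
    print(b, np.linalg.norm(xb - x0))         # the distance to x_0 shrinks as b decreases
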

References

  • Anandkumar et al. [2020] A. Anandkumar, K. Azizzadenesheli, K. Bhattacharya, N. Kovachki, Z. Li, B. Liu, and A. Stuart. Neural operator: graph kernel network for partial differential equations. In ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations, 2020.
  • Binev et al. [2017] P. Binev, A. Cohen, W. Dahmen, R. DeVore, G. Petrova, and P. Wojtaszczyk. Data assimilation in reduced modeling. SIAM/ASA Journal on Uncertainty Quantification, 5(1):1–29, 2017.
  • Contino et al. [2022] M. Contino, G. Fongi, and S. Muro. Polyak’s theorem on Hilbert spaces. Optimization, pages 1–15, 2022.
  • Foucart [2021] S. Foucart. Instances of computational optimal recovery: refined approximability models. Journal of Complexity, 62:101503, 2021.
  • Foucart [2022] S. Foucart. Mathematical Pictures at a Data Science Exhibition. Cambridge University Press, 2022.
  • Foucart [2023] S. Foucart. Full recovery from point values: an optimal algorithm for Chebyshev approximability prior. Advances in Computational Mathematics, 49(4):57, 2023.
  • Foucart and Liao [2022] S. Foucart and C. Liao. Optimal recovery from inaccurate data in Hilbert spaces: regularize, but what of the parameter? Constructive Approximation, pages 1–32, 2022.
  • Foucart et al. [2023] S. Foucart, C. Liao, and N. Veldt. On the optimal recovery of graph signals. In 2023 International Conference on Sampling Theory and Applications (SampTA), pages 1–5, 2023. doi: 10.1109/SampTA59647.2023.10301205.
  • Grant and Boyd [2014] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx, March 2014.
  • Long et al. [2023] D. Long, N. Mrvaljevic, S. Zhe, and B. Hosseini. A kernel approach for PDE discovery and operator learning. arXiv preprint arXiv:2210.08140, 2023.
  • Magaril-Il’yaev et al. [2004] G. G. Magaril-Il’yaev, K. Yu. Osipenko, and V. M. Tikhomirov. On optimal recovery of heat equation solutions. Approximation theory: a volume dedicated to B. Bojanov, pages 163–175, 2004.
  • Novak and Woźniakowski [2008] E. Novak and H. Woźniakowski. Tractability of Multivariate Problems: Linear Information. European Mathematical Society, 2008.
  • Packel [1988] E. W. Packel. Do linear problems have linear optimal algorithms? SIAM Review, 30(3):388–403, 1988.
  • Polyak [1998] B. T. Polyak. Convexity of quadratic transformations and its use in control and optimization. Journal of Optimization Theory and Applications, 99(3):553–583, 1998.
  • Rudin [1991] W. Rudin. Functional Analysis. McGraw-Hill, second edition, 1991.
  • Vvedenskaya [2009] E. V. Vvedenskaya. On the optimal recovery of a solution of a system of linear homogeneous differential equations. Differential Equations, 45(2):262–266, 2009.