Radius of Information for Two Intersected Centered Hyperellipsoids
and Implications in Optimal Recovery from Inaccurate Data
Abstract
For objects belonging to a known model set and observed through a prescribed linear process, we aim at determining methods to recover linear quantities of these objects that are optimal from a worst-case perspective. Working in a Hilbert setting, we show that, if the model set is the intersection of two hyperellipsoids centered at the origin, then there is an optimal recovery method which is linear. It is specifically given by a constrained regularization procedure whose parameters, short of being explicit, can be precomputed by solving a semidefinite program. This general framework can be swiftly applied to several scenarios: the two-space problem, the problem of recovery from ℓ2-inaccurate data, and the problem of recovery from a mixture of accurate and ℓ2-inaccurate data. With more effort, it can also be applied to the problem of recovery from ℓ1-inaccurate data. For the latter, we reach the conclusion that there exists an optimal recovery method which is linear, again given by constrained regularization, under a computationally verifiable sufficient condition. Experimentally, this condition seems to hold whenever the level of ℓ1-inaccuracy is small enough. We also point out that, independently of the inaccuracy level, the minimal worst-case error of a linear recovery method can be found by semidefinite programming.
Key words and phrases: Optimal Recovery, Regularization, Semidefinite Programming, S-procedure.
AMS classification: 41A65, 46N40, 90C22, 90C47.
1 Introduction
The question “Do linear problems have linear optimal algorithms?” was surveyed by Packel [1988]. He gave the commonly accepted answer “usually but not always”. This question, central to the subject of Optimal Recovery, is also one of the main concerns of the present article. We shall start by recalling the meaning of this cryptic question and by introducing our notation, already employed in [Foucart, 2022], which is inspired by the field of Learning Theory. The concepts differ by names only from familiar concepts traditionally encountered in the field of Information-Based Complexity (IBC), see e.g. [Novak and Woźniakowski, 2008]. We try to draw parallels between the terminologies of these fields below.
Common to the two notational settings is the use of the letter for the objects of interest, since in both cases they are primarily thought of as functions, although they could be seen as arbitrary elements from a prescribed normed space. Whereas we use the notation for this normed space (and when it is a Hilbert space), typically stands for a strict subset of the said normed space in IBC. We too assume that our objects of interest live in a strict subset of , but it is denoted by and called model set. The premise that is referred to as a priori information, since it reflects some prior scientific knowledge about realistic objects of interest. In addition, we have at our disposal some a posteriori information in the form , , for some linear functionals . Oftentimes, these linear functionals are point evaluations, giving rise, in IBC parlance, to the standard information . We call the observation vector and notice that it can be written as for some linear map , referred to as observation map. From the available information, both a priori and a posteriori, the task is to recover (approximate, learn, …) the object in full or maybe just to estimate a quantity of interest , where is a linear map from into another normed space . Such a map is called the solution operator in IBC. Our task is realized by way of a recovery map —we refrain from using the IBC term algorithm, since computational feasibility is not a requirement at this point. The performance of this recovery map is assessed by the (global) worst-case error defined as
We are interested in how small the latter can be, in other words in the intrinsic error—often labeled radius of information in IBC—defined as
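In symbols, the two definitions just alluded to take the following standard form (the notation here is the conventional one, writing K for the model set, Λ for the observation map, Q for the quantity of interest, and Δ for a recovery map; it is a reminder of the usual definitions rather than a verbatim transcription of the lost displays):
\[
\mathrm{wce}(\Delta) \,=\, \sup_{f \in \mathcal{K}} \big\| Q(f) - \Delta(\Lambda(f)) \big\|
\qquad \text{and} \qquad
\inf_{\Delta} \, \mathrm{wce}(\Delta),
\]
the infimum being taken over all maps Δ defined on the observation vectors.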
Moreover, our quest is concerned with optimal recovery maps, i.e., recovery maps that achieve the above infimum. With the terminology settled, the initial question may now be phrased as: “among all the possible optimal recovery maps, is there one which is linear?”. It is well known that the answer is affirmative in two prototypical situations: (i) when the quantity of interest is a linear functional and the model set is symmetric and convex, and (ii) when the underlying space is a Hilbert space and the model set is a centered hyperellipsoid. Another situation allowing for linear optimal recovery maps involves , although the existence arguments rarely turn into practical constructions, except in a handful of cases such as [Foucart, 2023].
One contribution of the present article is to uncover yet another situation where optimality of linear recovery maps occurs, precisely when the model set is the intersection of two centered hyperellipsoids. We do actually construct the linear optimal recovery map: it is given by constrained regularization with parameters that are clearly determined. In fact, we determine the corresponding radius of information simultaneously: it is the optimal value of a semidefinite program. The main theoretical tool is Polyak’s S-procedure, which elucidates exactly when a quadratic inequality (with no linear terms) is a consequence of two quadratic inequalities (with no linear terms). This S-procedure is in general not valid with more quadratic inequalities, explaining why our result pertains to the intersection of two centered hyperellipsoids only. This foremost result is established in Section 2 and some implications are derived in subsequent sections. Specifically, Section 3 lists a few simple consequences. One of them is the solution, in the so-called global Optimal Recovery setting, of the two-space problem, where the model set is based on approximability capabilities of two linear subspaces. The other consequences concern two situations for which the observations are inaccurate: first, when inaccuracies are bounded in ℓ2, we retrieve—and extend to infinite dimensions—some earlier results of ours; second, when some observations are exact while an ℓ2-bound is available for the inaccurate ones, we uncover an optimal recovery map built by constrained regularization, hence linear. Section 4 presents a more intricate consequence of Section 2, namely the scenario of inaccurate observations bounded in ℓ1. There, the result is not as pleasing as hoped for, but it nonetheless reveals, somewhat surprisingly, that linear recovery maps can be optimal in this scenario, too. The caveat is that the result holds conditionally on a certain sufficient condition. This condition is close to tautological, but it has the advantage of being computationally verifiable. Our numerical experiments (outlined in the reproducible files accompanying this article) indicate that the condition is likely to hold in case of small ℓ1-inaccuracies.
2 Solution for the two-hyperellipsoid-intersection model set
From now on, the space where the objects of interest live will be a Hilbert space (of possibly infinite dimension), hence it shall be designated by . There are other Hilbert spaces involved as ranges of linear maps, such as the quantity of interest . We will use the notation and indistinctly for all the associated Hilbert norms and inner products. Thus, the model set considered in this section—an intersection of two centered hyperellipsoids—takes the form
(1)
for some Hilbert-valued bounded linear maps and defined on . We assume throughout that
(2)
for otherwise the worst-case error of any recovery map , i.e.,
(3)
would be infinite for , say. We also assume that is surjective, for otherwise some observations would be redundant. This allows us to define the pseudo-inverse of as
2.1 Main result
The result stated below not only provides the value of the radius of information, i.e., of the minimal worst-case error over all recovery maps, but it also identifies an optimal recovery map. The latter involves the constrained regularization maps parametrized by and defined as
(4)
(5)
(6)
Although not obvious at first sight, the maps are linear. For instance, when , they indeed take the form, with denoting the null space of and standing for the restrictions of to ,
(7)
where the invertibility of follows from (2) (in the infinite-dimensional setting, this assumption should in fact be strengthened to the existence of such that for all , see e.g. [Rudin, 1991, Theorem 12.12]). It is also worth pointing out that
(8)
The justification of both (7) and (8) can be found in the appendix. There, we also establish the convergence of to as and of to as , the convergence being understood in the weak sense when .
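For readers who want to experiment numerically, here is a minimal finite-dimensional sketch of such a constrained regularization map. It assumes the reading Δ_{a,b}(y) = argmin { a‖Rf‖² + b‖Sf‖² : Λf = y } suggested by (4) and (7); the names Lam, R, S and the KKT-system construction are illustrative choices, not the paper's own reproducible code.

```python
# Minimal sketch (illustrative names): the constrained regularization
#   Delta_{a,b}(y) = argmin { a*||R f||^2 + b*||S f||^2 : Lam f = y }
# computed by solving the KKT system of this equality-constrained quadratic program.
import numpy as np

def constrained_regularization(Lam, R, S, a, b, y):
    m, n = Lam.shape
    H = 2 * (a * R.T @ R + b * S.T @ S)              # Hessian of the objective
    KKT = np.block([[H, Lam.T], [Lam, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(n), y])
    sol = np.linalg.solve(KKT, rhs)                  # invertible under (a strengthened form of) (2)
    return sol[:n]                                   # the minimizer f; sol[n:] are the multipliers

# Toy usage
rng = np.random.default_rng(1)
n, m = 6, 2
Lam = rng.standard_normal((m, n))
R, S = rng.standard_normal((n, n)), rng.standard_normal((n, n))
y = rng.standard_normal(m)
f_ab = constrained_regularization(Lam, R, S, a=1.0, b=2.0, y=y)
print(np.allclose(Lam @ f_ab, y))                    # data consistency holds
```

Since y enters only through the right-hand side of the linear system, the output depends linearly on y, in line with the linearity of the maps pointed out above.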
Theorem 1.
For the two-hyperellipsoid-intersection model set (1), the square of the radius of information of the observation map for the estimation of is given by the optimal value of the program
(9)
Further, if are minimizers of this program, then is an optimal recovery map. In short,
(10)
The proof of this result is postponed for a short while. Before that, we address the question of whether the optimization program (9) can be solved in practice. The answer is yes, at least when is finite-dimensional. Indeed, if denotes a basis for , representing as for allows us to reformulate the constraint for all as for all , where are symmetric matrices with entries
(11)
Thus, the program (9) is equivalent to the semidefinite program
Such a semidefinite program can be solved efficiently via a variety of solvers, e.g. the ones embedded in the MATLAB-based modeling system CVX [Grant and Boyd, 2014], although they (currently) all struggle when the problem dimension is in the thousands.
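To make this concrete, here is a minimal sketch in Python, with CVXPY standing in for CVX. It assumes that the two hyperellipsoids in (1) are normalized to radius one and that, as suggested by the discussion around (11), the semidefinite reformulation of (9) reads: minimize a + b over a, b ≥ 0 subject to a R̂ + b Ŝ − Q̂ ⪰ 0, where R̂, Ŝ, Q̂ are the symmetric matrices built from a basis of the null space of the observation map. All names are illustrative assumptions, not the paper's reproducible code.

```python
# Sketch of the semidefinite program equivalent to (9), under the assumptions
# stated above; Rhat, Shat, Qhat are the matrices with entries as in (11).
import numpy as np
import cvxpy as cp
from scipy.linalg import null_space

def radius_of_information_sq(Lam, R, S, Q):
    """Return (optimal value, a, b) for the assumed semidefinite reformulation of (9)."""
    V = null_space(Lam)                                   # columns: a basis of ker(Lam)
    RV, SV, QV = R @ V, S @ V, Q @ V
    mats = [M.T @ M for M in (RV, SV, QV)]
    Rhat, Shat, Qhat = [(M + M.T) / 2 for M in mats]      # symmetrize against rounding
    a, b = cp.Variable(nonneg=True), cp.Variable(nonneg=True)
    prob = cp.Problem(cp.Minimize(a + b), [a * Rhat + b * Shat - Qhat >> 0])
    prob.solve()
    return prob.value, a.value, b.value

# Toy usage on random data
rng = np.random.default_rng(0)
n, m = 8, 3
Lam, R, S = rng.standard_normal((m, n)), rng.standard_normal((n, n)), rng.standard_normal((n, n))
Q = rng.standard_normal((2, n))
val, a_opt, b_opt = radius_of_information_sq(Lam, R, S, Q)
print(val, a_opt, b_opt)
```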
2.2 Justification
The proof of Theorem 1 is broken down into three small results which we find useful to isolate as separate lemmas. The first lemma estimates the radius of information from below and the second lemma is a key step for the third lemma, which estimates the radius of information from above using constrained regularization maps. Here is the first lemma.
Lemma 2.
The squared worst-case error of any recovery map satisfies, with ,
(12)
and this lower bound can be reformulated as
Proof.
We include the argument for the first part, even though it is very classical. It starts by considering any and by noticing that both and belong to before observing that
Finally, it finishes by taking the supremum over to derive that .
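In symbols, this classical first part reads as follows (writing Δ for the recovery map, Q for the quantity of interest, Λ for the observation map, and K for the model set, a notation fixed only for this display): for any h ∈ K with Λ(h) = 0, the element −h also belongs to K (by symmetry of the two centered hyperellipsoids) and satisfies Λ(−h) = 0, so that
\[
\sup_{f \in \mathcal{K}} \|Q(f) - \Delta(\Lambda(f))\|
\;\ge\; \max\big( \|Q(h) - \Delta(0)\| , \|Q(-h) - \Delta(0)\| \big)
\;\ge\; \tfrac{1}{2}\big( \|Q(h) - \Delta(0)\| + \|Q(h) + \Delta(0)\| \big)
\;\ge\; \|Q(h)\|,
\]
and taking the supremum over all such h yields the lower bound (12).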
The argument for the second part begins by reformulating the lower bound as
By the version of Polyak’s S-procedure recalled in the appendix and its extension to the infinite-dimensional case, the latter constraint is equivalent to the following (to verify the applicability of the S-procedure, note that satisfies the strict feasibility condition and that any satisfy the positive definiteness condition).
The latter decouples as
Therefore, the lower bound takes the form
Since the minimal value that can achieve under these constraints is , this infimum indeed reduces to the form of the lower bound announced in the statement of the lemma. ∎
The second lemma is reminiscent of a result already obtained (with ) in [Foucart and Liao, 2022, Lemma 13] for and in [Foucart, Liao, and Veldt, 2023, Lemma 3] for an arbitrary linear quantity of interest , but the new proof presented here is more transparent, as it avoids arguments involving semidefinite matrices. As such, it is valid in infinite-dimensional Hilbert spaces, too.
Lemma 3.
Let be a linear subspace of and let be Hilbert-valued linear maps defined on . Suppose that satisfy
(13)
Then, setting and assuming that is invertible, one has
Proof.
To ease notation, let . Note that belongs to , and so does . In view of (13), it is enough to prove that
(14)
The left-hand side of (14), which we denote by for short, is manipulated as follows:
From the general inequality , we derive that
which is just a rearrangement of the desired inequality (14). ∎
The third and final lemma gives an upper bound for the squared worst-case error of the constrained regularization map .
Lemma 4.
Suppose that satisfy
Then one has
Proof.
With this series of lemmas at hand, we are now ready to justify the main result of this section.
Proof of Theorem 1.
Let be minimizers of the optimization program (9). On the one hand, Lemma 2 guarantees that
(15)
On the other hand, by the feasibility of and , we have for all . To deal with the possibility of or being zero, we consider, for any , and and notice that for all . Lemma 4 then guarantees that . It is now easy to see that taking (possibly weak) limits as yields
(16)
The inequalities (15) and (16) together fully justify (10) and thus complete the proof. ∎
2.3 Side results
In this section, we put forward an interpretation of the radius of information that differs from the minimal value of the program (9) and we shed light on the extremizer appearing in the expression of the lower bound from (12)—which is now known to coincide with the squared radius of information. Although these results are not used later, we include them here because they appear interesting in their own right. Both results call upon the largest eigenvalue, denoted by , of self-adjoint operators.
Proposition 5.
Proof.
The foremost observation consists in reformulating the constraint in (9) as
(17)
Indeed, the said constraint can be equivalently expressed in the form
Thus, the above-defined and are feasible for (9), since then
(18)
It now remains to show that whenever are feasible for (9). To see this, with and , notice that
where (17) was used for the last inequality. The desired conclusion follows from . ∎
Proposition 6.
Under the setting of Proposition 5, recall that
(19)
If and are extremizers of the above two programs, then
(i) and ;
(ii) .
Proof.
Setting , we already know from (18) that . In view of , , , and writing , we observe that
Since the left-hand and right-hand sides are identical, equality must hold throughout. In particular, equality in (1) implies (i). As for equality in (2), it imposes that is an eigenvector associated with the eigenvalue , meaning that , i.e., , which is (ii). ∎
Remark.
The result of Proposition 6 is, in a sense, a characterization of the equality between the supremum and the infimum in (19). Indeed, the argument can easily be turned around: given minimizers , if we can find satisfying (i) and (ii), then the supremum equals the infimum (it is always “at most” by the trivial part of the S-procedure and it is “at least” thanks to the existence of ). Such an approach was used, in essence, to determine explicit solutions of specific differential-equation-inspired Optimal Recovery problems featuring two quadratic constraints but without invoking Polyak’s S-procedure, see [Magaril-Il’yaev, Osipenko, and Tikhomirov, 2004, Vvedenskaya, 2009]. The same circle of ideas extends to the intersection of hyperellipsoids. Indeed, leaving the details to the reader, we state the loose equivalence between the equality
and, with denoting minimizers of the latter, the existence of such that
This gives us a practical way of deciding whether Theorem 1 extends to hyperellipsoids: after solving a semidefinite program, construct a candidate by solving an eigenvalue problem and test if the are all equal. As observed numerically, this occurs in some situations, but certainly not in all, in particular not when the are orthogonal projectors (as in the multispace problem described below). The moral is that this article deals with the intersection of hyperellipsoids not only because the strategy based on Polyak’s S-procedure does not apply to , but also because the natural extension is not valid for .
3 Three easy consequences
The two-hyperellipsoid-intersection framework has direct implications for optimal recovery from (partially) inaccurate data, to be discussed later, and even more directly for optimal recovery from accurate data under the two-space approximability model, to be elucidated right now.
3.1 The two-space problem
A model set based on approximability by a linear subspace of with parameter , namely
gained traction after the work of Binev, Cohen, Dahmen, DeVore, Petrova, and Wojtaszczyk [2017]. When is a Hilbert space, these authors completely solved the full recovery () problem even in the local setting—the present article deals solely with the global setting. They also raised the question of the multispace problem, a particular case of which is the two-space problem where
(20)
For the multispace problem, they proposed two iterative algorithms which, in the limit, produce model- and data-consistent objects. As such, these algorithms yield worst-case errors that are near-optimal by a factor at most two.
The two-space problem—in fact, even the multispace problem—in an arbitrary normed space was solved in [Foucart, 2021] but only when the quantity of interest is a linear functional. For more general linear maps , but when is a Hilbert space, the two-space problem is a special case of our two-hyperellipsoid-intersection problem. Indeed, the model set (20) is an instantiation of the model set (1) with and being scaled versions of the orthogonal projectors onto the orthogonal complements of and , precisely and . Thus, Theorem 1 applies directly and we arrive, through the change of optimization variables and , at the result stated below for completeness.
Theorem 7.
For the two-space model set (20), the square of the radius of information of the observation map for the estimation of is given by the optimal value of the program
(21)
Further, if are minimizers of this program and if is the map defined for by
(and interpreted via continuity in case or ), then the linear map provides an optimal recovery map.
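As a numerical illustration of the reduction just described, here is a short sketch that builds the scaled projectors onto the orthogonal complements of the two subspaces from bases of V and W and then calls the generic semidefinite program of Section 2 (via the hypothetical helper radius_of_information_sq sketched there); the names eps_V and eps_W are illustrative stand-ins for the two approximability levels in (20).

```python
# Two-space model set (20): R and S are taken as the projectors onto the
# orthogonal complements of V and W, scaled by the approximability levels.
import numpy as np

def complement_projector(basis):
    """Orthogonal projector onto the orthogonal complement of span(basis columns)."""
    Qb, _ = np.linalg.qr(basis)
    return np.eye(basis.shape[0]) - Qb @ Qb.T

rng = np.random.default_rng(2)
n, m = 10, 4
V, W = rng.standard_normal((n, 2)), rng.standard_normal((n, 3))   # bases of the subspaces
eps_V, eps_W = 0.5, 0.2
R = complement_projector(V) / eps_V        # ||R f|| <= 1  iff  dist(f, V) <= eps_V
S = complement_projector(W) / eps_W        # ||S f|| <= 1  iff  dist(f, W) <= eps_W
Lam = rng.standard_normal((m, n))          # observation map
Q = np.eye(n)                              # full recovery as the quantity of interest
val, a_opt, b_opt = radius_of_information_sq(Lam, R, S, Q)   # helper from the Section 2 sketch
print(val)
```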
3.2 Recovery from ℓ2-inaccurate data
Suppose now that the observations made on the objects of interest are not accurate anymore, but rather of the form with an error vector belonging to some uncertainty set . We then need to adjust the notion of worst-case error of a recovery map and thus define the quantity
(22)
In this subsection, both the model set and uncertainty set are hyperellipsoids, i.e.,
(23)
(24) |
In this situation, the problem at hand reduces to the two-hyperellipsoid-intersection problem with accurate data. Indeed, considering the compound variable belonging to the extended Hilbert space , let us introduce linear maps , , , and defined on by
(25)
The worst-case error (22) of the recovery map is then expressed as
i.e., exactly as in (3). Exploiting this analogy, Theorem 1 yields the result stated below. Note that it is not entirely new: it was obtained in [Foucart and Liao, 2022] for and and in [Foucart, Liao, and Veldt, 2023] for an arbitrary but still with . The extension to is minor—more pertinent is the fact that the result is now valid in infinite dimensions (although solving (26) in practice would then be a challenge).
Theorem 8.
For the hyperellipsoidal model and uncertainty sets (23) and (24), the square of the radius of information of the observation map for the estimation of is given by the optimal value of the program
(26)
Further, if are minimizers of this program and if is the map defined for by
(27)
(and interpreted via continuity in case or ), then the linear map provides an optimal recovery map.
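For concreteness, here is a small sketch of the kind of regularization map that Theorem 8 points to, under the assumed (and possibly simplified) reading Δ(y) = Q f(y) with f(y) = argmin_f a‖Rf‖² + b‖T(y − Λf)‖², where T is the map defining the uncertainty hyperellipsoid (24); the exact expression (27) should of course be taken from the statement above, so all names below are illustrative.

```python
# Sketch of a regularization map for l2-inaccurate data, solving the normal
# equations of the assumed functional  a*||R f||^2 + b*||T (y - Lam f)||^2.
import numpy as np

def regularized_fit(Lam, R, T, a, b, y):
    A = a * R.T @ R + b * Lam.T @ T.T @ T @ Lam
    rhs = b * Lam.T @ T.T @ T @ y
    return np.linalg.solve(A, rhs)

# Toy usage with noisy observations
rng = np.random.default_rng(3)
n, m = 8, 5
Lam, R = rng.standard_normal((m, n)), rng.standard_normal((n, n))
T = np.eye(m)                              # plain l2-bounded observation error
f_true = rng.standard_normal(n)
y = Lam @ f_true + 0.01 * rng.standard_normal(m)
f_hat = regularized_fit(Lam, R, T, a=1.0, b=50.0, y=y)
print(np.linalg.norm(f_hat - f_true))
```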
3.3 Recovery from mixed accurate and ℓ2-inaccurate data
In some situations, parts of the observations on the objects of interest can be made accurately, while other parts are subject to errors. Such a situation occurs e.g. when learning the parameters of partial differential equations (using kernel methods, say) from example solutions that can be perfectly evaluated at points on the boundary of the domain but imprecisely at points inside the domain, see e.g. [Anandkumar, Azizzadenesheli, Bhattacharya, Kovachki, Li, Liu, and Stuart, 2020, Long, Mrvaljevic, Zhe, and Hosseini, 2023]. To cover this possibility, we can decompose the error vector as , with and . More generally, we shall assume that and for some Hilbert-valued linear maps defined on . We shall therefore consider model and uncertainty sets of the form
(28)
(29) |
This time working with the different extended space , we still introduce linear maps , , , and defined on the compound variable almost as in (25), but with one slight modification for , namely
(30)
The worst-case error (22) of a recovery map for this mixed error scenario is still identifiable with the worst-case error (3) for the two-hyperellipsoid-intersection scenario, so we can once more leverage Theorem 1 to derive the following result.
Theorem 9.
For the model set (28) and the mixed-uncertainty set (29), the square of the radius of information of the observation map for the estimation of is given by the optimal value of the program
(31)
Further, if are minimizers of this program and if is the map defined for by
(32)
(interpreted via continuity in case or ), then the linear map provides an optimal recovery map.
Proof.
With the change of optimization variables and , the program (9) for , , , and becomes
The form of the program (31) is obtained by eliminating from the above via and noticing that means that , i.e., . The constrained regularization map is now to be replaced by
The form of the map (32) is obtained by eliminating from the above via and noticing that means that . ∎
It is worth making this result more explicit for our motivating example where is decomposed as and we have , . The observation process on the object of interest satisfying is itself decomposed as and , where , are linear maps and where . In this case, a linear optimal recovery map is obtained, maybe somewhat intuitively, via the constrained regularization
Our more significant contribution consists in uncovering a principled way of selecting the parameters , namely as solutions to the program
4 One intricate consequence: recovery from ℓ1-inaccurate data
In this final section, we contemplate yet another scenario of optimal recovery from inaccurate data which borrows from the results of Section 2. The situation is more delicate than in Section 3, though, because the observation error is not modeled through an ℓ2-bound but through an ℓ1-bound. Thus, the objects of interest from a Hilbert space are acquired via inaccurate linear observations of the form , where the model set for and the uncertainty set for are given relative to some parameter and by
(33)
(34) |
Towards the goal of optimally estimating a Hilbert-valued linear quantity of interest , the worst-case error of a recovery map is defined as
(35)
We will reveal that, conditionally on a checkable sufficient condition, the radius of information can still be computed and a constrained-regularization-based optimal recovery map—turning out, perhaps surprisingly, to be linear—can still be constructed efficiently. The sufficient condition is not vacuous: numerically, it even appears to hold whenever is small enough. Unfortunately, we were not able to establish this fact theoretically.
4.1 Main result
The result presented below involves constrained regularization maps defined, for and for , by
(36)
with the usual interpretation when or . These constrained regularization maps are linear. Indeed, as a consequence of Lemma 16 in the appendix, they are given by
(In this section, the notation does not represent the th entry of the error vector, but the th element of the canonical basis for .) For each , fixing from now on an element such that (e.g. ), we compute
(37)
and we let denote extremizers of this optimization program. In addition, we consider (and we shall compute some of) the quantities defined for by
The main result of this section can now be stated as follows.
Theorem 10.
The proof of this result is given in the next subsection. Before getting there, we reiterate that the sufficient condition (38) seems to hold whenever is small enough, as supported by the numerical experiments presented in the reproducible files accompanying this article.
4.2 Justification
Much like the proof of Theorem 1, the proof of Theorem 10 is divided into three separate lemmas: one which establishes lower bounds for the radius of information, one which indicates that each lower bound is achieved by an associated constrained regularization map, and one that establishes a key property of such constrained regularization maps. Finally, these three ingredients will be put together while incorporating the sufficient condition (38). Throughout the argument, it will be convenient to work with the linear maps , , , and defined for in the extended Hilbert space via
(39)
Let us now state the first of our series of three lemmas.
Lemma 11.
The squared global worst-case error of any recovery map satisfies
where the lower bounds are expressed as
(40)
(41)
(42)
(43) |
Proof.
For , noticing in (35) that the supremum over all satisfying is larger than or equal to the supremum over all with leads to the lower bound on expressed in (40). To arrive at (41), we make the change of variable , so that
and (41) follows by setting and taking the expressions (39) for , , , and into account. At this point, it is apparent that coincides with the worst-case error of for the estimation of from under the two-hyperellipsoid-intersection model assumption and . Thus, the further lower bound (42) and its reformulation (43) follow from an application of Lemma 2. ∎
The lemma below, although not used explicitly later, gives us an idea of the coveted optimal recovery map by looking at the case of equality in . For the latter, note that the expression (43) is seen to coincide with the expression (37) by making the change of optimization variables , .
Lemma 12.
Proof.
According to the results of Section 2, we know that equality in occurs for , where are extremizers of the program
and where the recovery map is defined, for , by
It now remains to verify that agrees with . First, according to the expressions (39) for , , , and , and in view of the relations and , it is easily seen that and . Then, writing with and , we see that
and consequently, making the change , we deduce that
Since the constraint decomposes as and for , we obtain, after eliminating ,
where the last equality is simply the definition of . The remaining justification is settled by remarking that . ∎
The third lemma is a step towards the determination of the worst-case error of the constrained regularization map .
Lemma 13.
For each , one has
Proof.
Setting , the quantity under consideration becomes
where we have borrowed from the previous proof the observation that with and . Thus, our quantity appears to be the worst-case error of for the estimation of from given that and . We know from the results of Section 2 that the latter is equal to , i.e., to , as announced. ∎
Having these three lemmas at our disposal, we now turn to the justification of the main result of this section.
Proof of Theorem 10.
According to Lemma 11, there holds
where we recall that the index is obtained as the maximizer of over all . In order to prove our result, we have to show that this infimum is actually achieved for the linear recovery map . To this end, we notice that the linearity of guarantees that
It has become familiar, relying on Polyak’s S-procedure, to transform the latter supremum into
which we recognize as the quantity . Now, calling upon the sufficient condition (38), we derive that
where the last step was due to Lemma 13. This equality completes the proof. ∎
4.3 Side result
When the sufficient condition (38) fails, it is no longer guaranteed that the linear map provides an optimal recovery map. Regardless, we can always solve a semidefinite program to obtain a linear recovery map with minimal worst-case error, according to the result stated below with the notation introduced in (39).
Proposition 14.
The squared worst-case error of a linear recovery map can be computed as the optimal value of a semidefinite program, namely as
(44)
This quantity can further be minimized over all linear maps , yielding a linear recovery map with smallest worst-case error.
Proof.
For a linear recovery map , we have
(45)
Note that the above -dependent suprema can also be expressed as
Therefore, the -dependent constraint in (45) is equivalent to the existence of such that and . As such, we arrive at
Using Schur complements, the above -dependent semidefinite constraints can each be rephrased as
leading to being expressed as in (44). We finally note that the linear dependence on of the constraints in (44) allows us to further view the minimization of over all linear maps as a semidefinite program. ∎
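For reference, the generic Schur-complement equivalence underlying this last step is the following (notation introduced only for this display, with C positive definite):
\[
\begin{pmatrix} A & B \\ B^* & C \end{pmatrix} \succeq 0
\quad \Longleftrightarrow \quad
A - B\, C^{-1} B^* \succeq 0,
\]
which allows a constraint that is quadratic in the map being optimized to be traded for a semidefinite constraint depending on it linearly; the precise instantiation leading to (44) involves the maps introduced in (39).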
We should remark that, even when is small and the sufficient condition (38) holds, the linear recovery map with smallest worst-case error obtained by semidefinite programming may differ from the optimal recovery map , illustrating the nonuniqueness of optimal recovery maps. Moreover, when is not small, our numerical experiments (available from the reproducible files) suggest that may not be optimal among linear recovery maps anymore.
Appendix
In this appendix, we provide justifications for a few facts not fully explained in the main text.
Polyak’s S-procedure.
Given quadratic functions , the statement whenever holds if there exists such that . The following result, paraphrased from [Polyak, 1998, Theorem 4.1], establishes that this sufficient condition is also necessary when and the ’s contain no linear terms.
Theorem 15.
Suppose that and that quadratic functions on take the form for symmetric matrices and scalars . Then
provided and for some (strict feasibility) and for some (positive definiteness).
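In symbols, and with notation introduced only for this reminder (homogeneous quadratic forms q_j(x) = ⟨A_j x, x⟩ on R^n with n ≥ 3, symmetric matrices A_j, and thresholds α_j), the statement asserts that, under the two provisos above,
\[
\Big( q_1(x) \le \alpha_1 \ \text{and} \ q_2(x) \le \alpha_2 \;\Longrightarrow\; q_0(x) \le \alpha_0 \Big)
\quad \Longleftrightarrow \quad
\exists\, a, b \ge 0:\; A_0 \preceq a A_1 + b A_2 \ \text{and} \ a\alpha_1 + b\alpha_2 \le \alpha_0 .
\]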
As established in [Contino, Fongi, and Muro, 2022, Proposition 5.2], such a result remains valid when is replaced by an arbitrary Hilbert space —even of infinite dimension—and the ’s are self-adjoint bounded linear operators on . This generalized version is the one called upon in the main text.
Constrained Regularization.
The goal here is to justify the identities (7) and (8), which are consequences of the general observation below.
Lemma 16.
Let and be two bounded linear maps between Hilbert spaces. Assume that there exists such that for all , so that is invertible, where denotes the restriction of to . Given and , the solution to
can be expressed, for any such that , as
Proof.
Writing the optimization variable as with and the minimizer as with , we see that is solution to
This solution is characterized by the orthogonality condition for all , which is equivalent to , or to . Left-multiplying by to obtain and substituting into yields the announced result. ∎
It follows as a consequence that, if are Hilbert-valued bounded linear maps defined on such that there exists with for all (this assumption simply reduces to when is finite dimensional) and if , then, for any ,
(46)
To arrive at this identity, which reduces to (7) when , it suffices to apply Lemma 16 with
Furthermore, if for some , taking instead of leads, after rearrangement, to
The latter reduces to (8) when .
Finally, we want to justify the statement made in Section 2 that converges weakly to as for any fixed . We shall do so under the working assumption that there exists such that for all (this assumption reduces to when is finite dimensional). Supposing without loss of generality that , we thus want to establish that
If this were not the case, there would exist , , and a sequence decreasing to zero such that for each . Now, from the optimality property of , we have , which yields, in view of ,
Thanks to our working assumption, it follows that the sequence of elements in is bounded, and then so is the sequence . As such, it possesses a subsequence weakly converging to some , say. We still write for this subsequence and we note that . Next, in view of for all , we derive that from . From there, we also obtain and , and in turn and . These facts imply that is also a minimizer for the program defining , so that by uniqueness of the minimizer. This is of course incompatible with and provides the required contradiction.
References
- Anandkumar et al. [2020] A. Anandkumar, K. Azizzadenesheli, K. Bhattacharya, N. Kovachki, Z. Li, B. Liu, and A. Stuart. Neural operator: graph kernel network for partial differential equations. In ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations, 2020.
- Binev et al. [2017] P. Binev, A. Cohen, W. Dahmen, R. DeVore, G. Petrova, and P. Wojtaszczyk. Data assimilation in reduced modeling. SIAM/ASA Journal on Uncertainty Quantification, 5(1):1–29, 2017.
- Contino et al. [2022] M. Contino, G. Fongi, and S. Muro. Polyak’s theorem on Hilbert spaces. Optimization, pages 1–15, 2022.
- Foucart [2021] S. Foucart. Instances of computational optimal recovery: refined approximability models. Journal of Complexity, 62:101503, 2021.
- Foucart [2022] S. Foucart. Mathematical Pictures at a Data Science Exhibition. Cambridge University Press, 2022.
- Foucart [2023] S. Foucart. Full recovery from point values: an optimal algorithm for Chebyshev approximability prior. Advances in Computational Mathematics, 49(4):57, 2023.
- Foucart and Liao [2022] S. Foucart and C. Liao. Optimal recovery from inaccurate data in Hilbert spaces: regularize, but what of the parameter? Constructive Approximation, pages 1–32, 2022.
- Foucart et al. [2023] S. Foucart, C. Liao, and N. Veldt. On the optimal recovery of graph signals. In 2023 International Conference on Sampling Theory and Applications (SampTA), pages 1–5, 2023. doi: 10.1109/SampTA59647.2023.10301205.
- Grant and Boyd [2014] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx, March 2014.
- Long et al. [2023] D. Long, N. Mrvaljevic, S. Zhe, and B. Hosseini. A kernel approach for PDE discovery and operator learning. arXiv preprint arXiv:2210.08140, 2023.
- Magaril-Il’yaev et al. [2004] G. G. Magaril-Il’yaev, K. Yu. Osipenko, and V. M. Tikhomirov. On optimal recovery of heat equation solutions. Approximation theory: a volume dedicated to B. Bojanov, pages 163–175, 2004.
- Novak and Woźniakowski [2008] E. Novak and H. Woźniakowski. Tractability of Multivariate Problems: Linear Information. European Mathematical Society, 2008.
- Packel [1988] E. W. Packel. Do linear problems have linear optimal algorithms? SIAM Review, 30(3):388–403, 1988.
- Polyak [1998] B. T. Polyak. Convexity of quadratic transformations and its use in control and optimization. Journal of Optimization Theory and Applications, 99(3):553–583, 1998.
- Rudin [1991] W. Rudin. Functional Analysis. McGraw-Hill, second edition, 1991.
- Vvedenskaya [2009] E. V. Vvedenskaya. On the optimal recovery of a solution of a system of linear homogeneous differential equations. Differential Equations, 45(2):262–266, 2009.