
Parametric Complexity Bounds for Approximating PDEs
with Neural Networks

   Tanya Marwah, Zachary C. Lipton, Andrej Risteski
Machine Learning Department, Carnegie Mellon University
{tmarwah, zlipton, aristesk}@andrew.cmu.edu
Abstract

Recent experiments have shown that deep networks can approximate solutions to high-dimensional PDEs, seemingly escaping the curse of dimensionality. However, questions regarding the theoretical basis for such approximations, including the required network size, remain open. In this paper, we investigate the representational power of neural networks for approximating solutions to linear elliptic PDEs with Dirichlet boundary conditions. We prove that when a PDE's coefficients are representable by small neural networks, the parameters required to approximate its solution scale polynomially with the input dimension $d$ and proportionally to the parameter counts of the coefficient networks. To this end, we develop a proof technique that simulates gradient descent (in an appropriate Hilbert space) by growing a neural network architecture whose iterates each participate as sub-networks in their (slightly larger) successors, and converge to the solution of the PDE. We bound the size of the solution network, showing a polynomial dependence on $d$ and no dependence on the volume of the domain.

1 Introduction

A partial differential equation (PDE) relates a multivariate function defined over some domain to its partial derivatives. Typically, one's goal is to solve for the (unknown) function, often subject to additional constraints, such as the function's value on the boundary of the domain. PDEs are ubiquitous in both the natural and social sciences, where they model such diverse processes as heat diffusion [Crank and Nicolson, 1947; Özişik et al., 2017], fluid dynamics [Anderson and Wendt, 1995; Temam, 2001], and financial markets [Black and Scholes, 1973; Ehrhardt and Mickens, 2008]. Because most PDEs of interest lack closed-form solutions, computational approximation methods remain a vital and active field of research [Ames, 2014]. For low-dimensional functions, dominant approaches include the finite difference and finite element methods [LeVeque, 2007], which discretize the domain. After partitioning the domain into a mesh, these methods solve for the function value at its vertices. However, these techniques scale exponentially with the input dimension, rendering them unsuitable for high-dimensional problems.

Following breakthroughs in deep learning for approximating high-dimensional functions in such diverse domains as computer vision [Krizhevsky et al., 2012; Radford et al., 2015] and natural language processing [Bahdanau et al., 2014; Devlin et al., 2018; Vaswani et al., 2017], a burgeoning line of research leverages neural networks to approximate solutions to PDEs. This line of work has produced promising empirical results for common PDEs such as the Hamilton-Jacobi-Bellman and Black-Scholes equations [Han et al., 2018; Grohs et al., 2018; Sirignano and Spiliopoulos, 2018]. Because they do not explicitly discretize the domain, and given their empirical success on high-dimensional problems, these methods appear not to suffer the curse of dimensionality. However, these methods are not well understood theoretically, leaving open questions about when they are applicable, what their performance depends on, and just how many parameters are required to approximate the solution to a given PDE.

Over the past three years, several theoretical works have investigated questions of representational power under various assumptions. Exploring a variety of settings, Kutyniok et al. [2019], Grohs et al. [2018], and Jentzen et al. [2018] proved that the number of parameters required to approximate a solution to a PDE exhibits a less-than-exponential dependence on the input dimension for some special parabolic PDEs that admit straightforward analysis. Grohs and Herrmann [2020] consider elliptic PDEs with Dirichlet boundary conditions. However, their rate depends on the volume of the domain, and thus can have an implicit exponential dependence on dimension (e.g., consider a hypercube with side length greater than one).

In this paper, we focus on linear elliptic PDEs with Dirichlet boundary conditions, which are prevalent in science and engineering (e.g., the Laplace and Poisson equations). Notably, linear elliptic PDEs define the steady state of processes like heat diffusion and fluid dynamics. Our work asks:

Question.

How many parameters suffice to approximate the solution to a linear elliptic PDE up to a specified level of precision using a neural network?

We show that when the coefficients of the PDE are expressible as small neural networks (note that PDE coefficients are functions), the number of parameters required to approximate the PDE’s solution is proportional to the number of parameters required to express the coefficients. Furthermore, we show that the number of parameters depends polynomially on the dimension and does not depend upon the volume of the domain.

2 Overview of Results

To begin, we formally define linear elliptic PDEs.

Definition 1 (Linear Elliptic PDE [Evans, 1998]).

Linear elliptic PDEs with Dirichlet boundary condition can be expressed in the following form:

$$\left\{\begin{aligned}(Lu)(x)\equiv\bigl(-\operatorname{div}(A\nabla u)+cu\bigr)(x)&=f(x),&&\forall x\in\Omega,\\ u(x)&=0,&&\forall x\in\partial\Omega,\end{aligned}\right.$$

where $\Omega\subset\mathbb{R}^{d}$ is a bounded open set with boundary $\partial\Omega$. Further, $A:\Omega\to\mathbb{R}^{d\times d}$ is a matrix-valued function such that $A(x)\succ 0$ for all $x\in\Omega$, and $c:\Omega\to\mathbb{R}$ satisfies $c(x)>0$ for all $x\in\Omega$.¹

¹Here, $\operatorname{div}$ denotes the divergence operator: given a vector field $F:\mathbb{R}^{d}\to\mathbb{R}^{d}$, $\operatorname{div}(F)=\nabla\cdot F=\sum_{i=1}^{d}\frac{\partial F_{i}}{\partial x_{i}}$.

We refer to $A$ and $c$ as the coefficients of the PDE. The divergence form in Definition 1 is one of two canonical ways to define a linear elliptic PDE [Evans, 1998] and is convenient for several technical reasons (see Section 4). The Dirichlet boundary condition states that the solution takes a constant value (here 0) on the boundary $\partial\Omega$.

Our goal is to express the number of parameters required to approximate the solution of a PDE in terms of those required to approximate its coefficients $A$ and $c$. Our key result shows:

Theorem (Informal).

If the coefficients $A, c$ and the function $f$ are approximable by neural networks with at most $N$ parameters, then the solution $u^{\star}$ to the PDE in Definition 1 is approximable by a neural network with $O(\mathrm{poly}(d)\,N)$ parameters.

This result, formally expressed in Section 5, may help to explain the practical efficacy of neural networks in approximating solutions to high-dimensional PDEs with boundary conditions [Sirignano and Spiliopoulos, 2018; Li et al., 2020a]. To establish this result, we develop a constructive proof technique that simulates gradient descent (in an appropriate Hilbert space) through the very architecture of a neural network. Each iterate, given by a neural network, is subsumed into the (slightly larger) network representing the subsequent iterate. The key to our analysis is to bound both (i) the growth in network size across consecutive iterates; and (ii) the total number of iterates required.

Organization of the paper

We introduce the required notation along with some mathematical preliminaries on PDEs in Section 4. The problem setting and formal statement of the main result are provided in Section 5. Finally, we provide the proof of the main result in Section 6.

3 Prior Work

Among the first papers to leverage neural networks to approximate solutions to PDEs with boundary conditions are Lagaris et al. [1998], Lagaris et al. [2000], and Malek and Beidokhti [2006]. However, these methods discretize the input space and thus are not suitable for high-dimensional input spaces. More recently, mesh-free neural network approaches have been proposed for high-dimensional PDEs [Han et al., 2018; Raissi et al., 2017, 2019], achieving impressive empirical results in various applications. Sirignano and Spiliopoulos [2018] design a loss function that penalizes failure to satisfy the PDE, training their network on minibatches sampled uniformly from the input domain. They also provide a universal approximation result, showing that for sufficiently regularized PDEs, there exists a multilayer network that approximates its solution. However, they do not comment on the complexity of the neural network or how it scales with the input dimension. Khoo et al. [2017] also prove universal approximation power, albeit with networks of size exponential in the input dimension. Recently, Grohs et al. [2018] and Jentzen et al. [2018] provided a better-than-exponential dependence on the input dimension for some special parabolic PDEs, for which simulating a PDE solver by a neural network is straightforward.

Several recent works [Bhattacharya et al., 2020; Kutyniok et al., 2019; Li et al., 2020b, a] show (experimentally) that a single neural network can solve for an entire family of PDEs. They approximate the map from a PDE’s parameters to its solution, potentially avoiding the trouble of retraining for every set of coefficients. Among these, only Kutyniok et al. [2019] provides theoretical grounding. However, they assume the existence of a finite low-dimensional space with basis functions that can approximate this parametric map—and it is unclear when this would obtain. Our work proves the existence of such maps, under the assumption that the family of PDEs has coefficients described by neural networks with a fixed architecture (Section 7).

In the work most closely related to ours, Grohs and Herrmann [2020] provide approximation rates polynomial in the input dimension $d$ for the Poisson equation (a special kind of linear elliptic PDE) with Dirichlet boundary conditions. They introduce a walk-on-spheres algorithm, which simulates a stochastic differential equation that can be used to solve a Poisson equation with Dirichlet boundary conditions (see, e.g., Oksendal [2013], Theorem 9.13). The rates provided in Grohs and Herrmann [2020] depend on the volume of the domain, and thus implicitly depend exponentially on the input dimension $d$. Our result handles the boundary condition for the PDE and is independent of the volume of the domain. Further, we note that our results hold for a more general linear elliptic PDE, of which the Poisson equation is a special case.

4 Notation and Definitions

We now introduce several key concepts from PDEs and some notation. For any open set $\Omega\subset\mathbb{R}^{d}$, we denote its boundary by $\partial\Omega$ and its closure by $\bar{\Omega}:=\Omega\cup\partial\Omega$. By $C^{0}(\Omega)$, we denote the space of real-valued continuous functions defined over the domain $\Omega$. Furthermore, for $k\in\mathbb{N}$, a function $g$ belongs to $C^{k}(\Omega)$ if all partial derivatives $\partial^{\alpha}g$ exist and are continuous for any multi-index $\alpha$ such that $|\alpha|\leq k$. Finally, a function $g\in C^{\infty}(\Omega)$ if $g\in C^{k}(\Omega)$ for all $k\in\mathbb{N}$. Next, we define several relevant function spaces:

Definition 2.

For any $k\in\mathbb{N}\cup\{\infty\}$, $C^{k}_{0}(\Omega):=\{g:g\in C^{k}(\Omega),\ \overline{\operatorname{supp}(g)}\subset\Omega\}$.

Definition 3.

For a domain $\Omega$, the function space $L^{2}(\Omega)$ consists of all functions $g:\Omega\to\mathbb{R}$ such that $\|g\|_{L^{2}(\Omega)}<\infty$, where $\|g\|_{L^{2}(\Omega)}=\left(\int_{\Omega}|g(x)|^{2}dx\right)^{1/2}$. This function space is equipped with the inner product

$$\langle g,h\rangle_{L^{2}(\Omega)}=\int_{\Omega}g(x)h(x)\,dx.$$
Definition 4.

For a domain $\Omega$ and a function $g:\Omega\rightarrow\mathbb{R}$, the function space $L^{\infty}(\Omega)$ is defined analogously, where $\|g\|_{L^{\infty}(\Omega)}=\inf\{c\geq 0:|g(x)|\leq c\text{ for almost all }x\in\Omega\}$.

Definition 5.

For a domain $\Omega$ and $m\in\mathbb{N}$, we define the Hilbert space $H^{m}(\Omega)$ as

$$H^{m}(\Omega):=\{g:\Omega\rightarrow\mathbb{R}\ :\ \partial^{\alpha}g\in L^{2}(\Omega)\ \ \forall\alpha\ \text{s.t.}\ |\alpha|\leq m\}.$$

Furthermore, $H^{m}(\Omega)$ is equipped with the inner product $\langle g,h\rangle_{H^{m}(\Omega)}=\sum_{|\alpha|\leq m}\int_{\Omega}(\partial^{\alpha}g)(\partial^{\alpha}h)\,dx$ and the corresponding norm

$$\|g\|_{H^{m}(\Omega)}=\left(\sum_{|\alpha|\leq m}\|\partial^{\alpha}g\|^{2}_{L^{2}(\Omega)}\right)^{1/2}.$$
Definition 6.

The closure of $C^{\infty}_{0}(\Omega)$ in $H^{m}(\Omega)$ is denoted by $H^{m}_{0}(\Omega)$.

Informally, $H_{0}^{m}(\Omega)$ is the set of functions belonging to $H^{m}(\Omega)$ that can be approximated by a sequence of functions $\phi_{n}\in C_{0}^{\infty}(\Omega)$. This also implies that if a function $g\in H_{0}^{m}(\Omega)$, then $g(x)=0$ for all $x\in\partial\Omega$. This space (particularly with $m=1$) is often useful when analyzing elliptic PDEs with Dirichlet boundary conditions.

Definition 7 (Weak Solution).

Given the PDE in Definition 1, if $f\in L^{2}(\Omega)$, then a function $u:\Omega\to\mathbb{R}$ solves the PDE in a weak sense if $u\in H^{1}_{0}(\Omega)$ and for all $v\in H^{1}_{0}(\Omega)$, we have

$$\int_{\Omega}\left(A\nabla u\cdot\nabla v+cuv\right)dx=\int_{\Omega}fv\,dx. \quad (1)$$

The left-hand side of (1) is also equal to $\langle Lu,v\rangle_{L^{2}(\Omega)}$ for all $u,v\in H^{1}_{0}(\Omega)$ (see Lemma A.1), whereas, following the definition of the $L^{2}(\Omega)$ inner product, the right-hand side is simply $\langle f,v\rangle_{L^{2}(\Omega)}$. Having introduced these preliminaries, we now introduce some important facts about linear PDEs that feature prominently in our analysis.

Proposition 1.

For the PDE in Definition 1, if $f\in L^{2}(\Omega)$, the following hold:

  1. The solution to Equation (1) exists and is unique.

  2. The weak solution is also the unique solution of the following minimization problem:

$$u^{\star}=\mathop{\mathrm{argmin}}_{v\in H^{1}_{0}(\Omega)}J(v):=\mathop{\mathrm{argmin}}_{v\in H^{1}_{0}(\Omega)}\left\{\frac{1}{2}\langle Lv,v\rangle_{L^{2}(\Omega)}-\langle f,v\rangle_{L^{2}(\Omega)}\right\}. \quad (2)$$

This proposition is standard (we include a proof in the Appendix, Section A.1, for completeness) and states that there exists a unique solution to the PDE (referred to as $u^{\star}$), which is also the solution we get from the variational formulation in (2). In this work, we introduce a sequence of functions that minimizes the loss in the variational formulation.

Definition 8 (Eigenvalues and Eigenfunctions, Evans [1998]).

Given an operator $L$, the tuples $(\lambda_{i},\varphi_{i})_{i=1}^{\infty}$, where $\lambda_{i}\in\mathbb{R}$ and $\varphi_{i}\in H^{1}_{0}(\Omega)$, are (eigenvalue, eigenfunction) pairs that satisfy $L\varphi_{i}=\lambda_{i}\varphi_{i}$ for all $x\in\Omega$. Since $\varphi_{i}\in H^{1}_{0}(\Omega)$, we know that $\varphi_{i}|_{\partial\Omega}=0$. The eigenvalues can be written as

$$\lambda_{i}=\inf_{u\in X_{i}}\frac{\langle Lu,u\rangle_{L^{2}(\Omega)}}{\|u\|^{2}_{L^{2}(\Omega)}}, \quad (3)$$

where $X_{i}:=\operatorname{span}\{\varphi_{1},\ldots,\varphi_{i}\}^{\perp}=\{u\in H^{1}_{0}(\Omega):\langle u,\varphi_{j}\rangle_{L^{2}(\Omega)}=0\ \forall j\in\{1,\cdots,i\}\}$ and $0<\lambda_{1}\leq\lambda_{2}\leq\cdots$. Furthermore, we denote by $\Phi_{k}$ the span of the first $k$ eigenfunctions of $L$, i.e., $\Phi_{k}:=\operatorname{span}\{\varphi_{1},\cdots,\varphi_{k}\}$.

We note that since the operator $L$ is self-adjoint and elliptic (in particular, $L^{-1}$ is compact), the eigenvalues are real and countable. Moreover, the eigenfunctions form an orthonormal basis of $H^{1}_{0}(\Omega)$ (see Evans [1998], Section 6.5).

5 Main Result

Before stating our results, we provide the formal assumptions on the PDEs of interest:

Assumptions:

  (i) Smoothness: We assume that $\partial\Omega\in C^{\infty}$. We also assume that the coefficient $A:\Omega\rightarrow\mathbb{R}^{d\times d}$ is a symmetric matrix-valued function, i.e., $A=(a_{ij}(x))$ with $a_{ij}\in L^{\infty}(\Omega)$ for all $i,j\in[d]$, and that the function $c\in L^{\infty}(\Omega)$ satisfies $c(x)\geq\zeta>0$ for all $x\in\Omega$. Furthermore, we assume that $a_{ij},c\in C^{\infty}$. We define a constant

$$C:=(2d^{2}+1)\max\left\{\max_{\alpha:|\alpha|\leq 3}\max_{i,j}\|\partial^{\alpha}a_{ij}\|_{L^{\infty}(\Omega)},\ \max_{\alpha:|\alpha|\leq 2}\|\partial^{\alpha}c\|_{L^{\infty}(\Omega)}\right\}.$$

  Further, the function $f\in L^{2}(\Omega)$ is also in $C^{\infty}$, and the projection of $f$ onto $\Phi_{k}$, which we denote $f_{\operatorname{span}}$, satisfies for any multi-index $\alpha$: $\|\partial^{\alpha}f-\partial^{\alpha}f_{\operatorname{span}}\|_{L^{2}(\Omega)}\leq\epsilon_{\operatorname{span}}$.²

  ²Since $\partial\Omega\in C^{\infty}$ and the functions $a_{ij},c$ and $f$ are all in $C^{\infty}$, it follows from Nirenberg [1955] (Theorem, Section 5) that the eigenfunctions of $L$ are also $C^{\infty}$. Hence, the function $f_{\operatorname{span}}$ is in $C^{\infty}$ as well.

  (ii) Ellipticity: There exist constants $M\geq m>0$ such that, for all $x\in\Omega$ and $\xi\in\mathbb{R}^{d}$,

$$m\|\xi\|^{2}\leq\sum_{i,j=1}^{d}a_{ij}(x)\xi_{i}\xi_{j}\leq M\|\xi\|^{2}.$$
  (iii) Neural network approximability: There exist neural networks $\tilde{A}$ and $\tilde{c}$ with $N_{A},N_{c}\in\mathbb{N}$ parameters, respectively, that approximate the functions $A$ and $c$, i.e., $\|A-\tilde{A}\|_{L^{\infty}(\Omega)}\leq\epsilon_{A}$ and $\|c-\tilde{c}\|_{L^{\infty}(\Omega)}\leq\epsilon_{c}$, for small $\epsilon_{A},\epsilon_{c}\geq 0$. We assume that the operator $\tilde{L}$, defined for all $u\in H^{1}_{0}(\Omega)$ as

$$\tilde{L}u=-\operatorname{div}(\tilde{A}\nabla u)+\tilde{c}u, \quad (4)$$

  is elliptic with (eigenvalue, eigenfunction) pairs $(\tilde{\lambda}_{i},\tilde{\varphi}_{i})_{i=1}^{\infty}$. We also assume that there exists a neural network $f_{\operatorname{nn}}\in C^{\infty}$ with $N_{f}\in\mathbb{N}$ parameters such that for any multi-index $\alpha$, $\|\partial^{\alpha}f-\partial^{\alpha}f_{\operatorname{nn}}\|_{L^{2}(\Omega)}\leq\epsilon_{\operatorname{nn}}$. By $\Sigma$, we denote the set of all (infinitely differentiable) activation functions used by the networks $\tilde{A}$, $\tilde{c}$, and $f_{\operatorname{nn}}$. By $\Sigma^{\prime}$, we denote the set that contains all the $n$-th order derivatives of the activation functions in $\Sigma$, for all $n\in\mathbb{N}_{0}$.

Intuitively, ellipticity of $L$ in a linear PDE $Lu=f$ is analogous to positive definiteness of a matrix $Q\in\mathbb{R}^{d\times d}$ in a linear system $Qx=k$, where $x,k\in\mathbb{R}^{d}$.

In (iii), we assume that the coefficients $A$ and $c$, and the function $f$, can be approximated by neural networks. While this is true for any smooth functions given sufficiently large $N_{A},N_{c},N_{f}$, our results are most interesting when these quantities are small (e.g., subexponential in the input dimension $d$). For many PDEs used in practice, approximating the coefficients using small neural networks is straightforward. For example, in heat diffusion (whose equilibrium is defined by a linear elliptic PDE), $A(x)$ defines the conductivity of the material at point $x$. If the conductivity is constant, then the coefficients can be written as neural networks with $O(1)$ parameters.

The part of assumption (i) that stipulates that $f$ is close to $f_{\operatorname{span}}$ can be thought of as a smoothness condition on $f$. For instance, if $L=-\Delta$ (the Laplacian operator), the Dirichlet form satisfies $\frac{\langle Lu,u\rangle_{L^{2}(\Omega)}}{\|u\|^{2}_{L^{2}(\Omega)}}=\frac{\|\nabla u\|^{2}_{L^{2}(\Omega)}}{\|u\|^{2}_{L^{2}(\Omega)}}$, so eigenfunctions corresponding to higher eigenvalues tend to exhibit a higher degree of spikiness. The reader can also think of the eigenfunctions corresponding to larger $k$ as Fourier basis functions corresponding to higher frequencies.

Finally, in (i) and (iii), the requirement that the function pairs ($f$, $f_{\operatorname{nn}}$) and ($f$, $f_{\operatorname{span}}$) are close not only in their values but in their derivatives as well is a matter of analytical convenience; our key results do not necessarily depend on this precise assumption. Alternatively, we could replace this assumption with similar (but incomparable) conditions: e.g., we could assume closeness of the values and a rapid decay of the $L^{2}$ norms of the derivatives. We require control over the derivatives because our method's gradient descent iterations involve repeatedly applying the operator $L$ to $f$, which results in progressively higher derivatives.

We can now formally state our main result:

Theorem 1 (Main Theorem).

Consider a linear elliptic PDE satisfying Assumptions (i)-(iii), and let $u^{\star}\in H^{1}_{0}(\Omega)$ denote its unique solution. If there exists a neural network $u_{0}\in H_{0}^{1}(\Omega)$ with $N_{0}$ parameters such that $\|u^{\star}-u_{0}\|_{L^{2}(\Omega)}\leq R$ for some $R<\infty$, then for every $T\in\mathbb{N}$ such that $T\leq\frac{1}{20\min(\lambda_{k},1)\delta}$, there exists a neural network $u_{T}$ with size

$$O\left(d^{2T}\left(N_{0}+N_{A}\right)+T(N_{f}+N_{c})\right)$$

such that $\|u^{\star}-u_{T}\|_{L^{2}(\Omega)}\leq\epsilon+\tilde{\epsilon}$, where

$$\epsilon:=\left(\frac{\tilde{\lambda}_{k}-\tilde{\lambda}_{1}}{\tilde{\lambda}_{k}+\tilde{\lambda}_{1}}\right)^{T}R,$$
$$\tilde{\epsilon}:=\frac{\epsilon_{\operatorname{span}}}{\lambda_{1}}+\frac{\delta}{\lambda_{1}}\frac{\|f\|_{L^{2}(\Omega)}}{\gamma-\delta}+\delta\|u^{\star}\|_{L^{2}(\Omega)}+(\max\{1,T^{2}C\eta\})^{T}\left(\epsilon_{\operatorname{span}}+\epsilon_{\operatorname{nn}}+4\left(1+\frac{\delta}{\gamma-\delta}\right)\lambda_{k}^{T}\|f\|_{L^{2}(\Omega)}\right),$$

and $\eta:=\frac{2}{\tilde{\lambda}_{1}+\tilde{\lambda}_{k}}$, $\delta:=\max\left\{\frac{\epsilon_{A}}{m},\frac{\epsilon_{c}}{\zeta}\right\}$. Furthermore, the activation functions used in $u_{T}$ belong to the set $\Sigma\cup\Sigma^{\prime}\cup\{\rho\}$, where $\rho(y)=y^{2}$ for all $y\in\mathbb{R}$ is the square activation function.

This theorem shows that given an initial neural network $u_{0}\in H^{1}_{0}(\Omega)$ containing $N_{0}$ parameters, we can recover a neural network $u_{T}$ that is close to the unique solution $u^{\star}$. The number of parameters in $u_{T}$ depends on how close the initial estimate $u_{0}$ is to the solution $u^{\star}$, and on $N_{0}$. This results in a trade-off, where better approximations may require more parameters, compared to a poorer approximation with fewer parameters.

Note that $\epsilon\to 0$ as $T\to\infty$, while $\tilde{\epsilon}$ is a "bias" error term that does not go to 0 as $T\to\infty$. The first three terms in the expression for $\tilde{\epsilon}$ result from bounding the difference between the solutions to the equations $Lu=f$ and $\tilde{L}u=f_{\operatorname{span}}$, whereas the last term is due to the difference between $f$ and $f_{\operatorname{nn}}$ and to the fact that our proof involves simulating the gradient descent updates with neural networks. Further, if the constant $\zeta$ is equal to 0, then the error term $\epsilon_{c}$ will also be 0, in which case the term $\delta$ will equal $\epsilon_{A}/m$.

The fact that $\epsilon:=\left(\frac{\tilde{\lambda}_{k}-\tilde{\lambda}_{1}}{\tilde{\lambda}_{k}+\tilde{\lambda}_{1}}\right)^{T}R$ comes from the fact that we are simulating $T$ steps of a gradient descent-like procedure on a strongly convex loss. The parameters $\tilde{\lambda}_{k}$ and $\tilde{\lambda}_{1}$ can be thought of as the effective Lipschitz and strong-convexity constants of the loss. Finally, to give a sense of what $R$ looks like, we show in Lemma 1 that if $u_{0}$ is initialized to be identically zero, then $R\leq\frac{\|f\|_{L^{2}(\Omega)}}{\lambda_{1}}$.

Lemma 1.

If $u_{0}=0$, then $R\leq\frac{\|f\|_{L^{2}(\Omega)}}{\lambda_{1}}$.

Proof.

Given that $u_{0}$ is identically 0, the value of $R$ in Theorem 1 equals $\|u^{\star}-u_{0}\|_{L^{2}(\Omega)}=\|u^{\star}\|_{L^{2}(\Omega)}$. Using the variational characterization of $\lambda_{1}$ in (3) and the weak formulation (1), we have

$$\|u^{\star}\|_{L^{2}(\Omega)}^{2}\leq\frac{\langle Lu^{\star},u^{\star}\rangle_{L^{2}(\Omega)}}{\lambda_{1}}\leq\frac{1}{\lambda_{1}}\langle f,u^{\star}\rangle_{L^{2}(\Omega)}\leq\frac{1}{\lambda_{1}}\|f\|_{L^{2}(\Omega)}\|u^{\star}\|_{L^{2}(\Omega)},$$

which implies $\|u^{\star}\|_{L^{2}(\Omega)}\leq\frac{1}{\lambda_{1}}\|f\|_{L^{2}(\Omega)}$. ∎

We make a few remarks about the theorem statement:

Remark 1.

While we state our convergence results in the $L^{2}(\Omega)$ norm, our proof works for the $H^{1}_{0}(\Omega)$ norm as well. This is because in the space defined by the top-$k$ eigenfunctions of the operator $L$, the $L^{2}(\Omega)$ and $H^{1}_{0}(\Omega)$ norms are equivalent (shown in Proposition A.1). Further, note that even though we have assumed that $u^{\star}\in H^{1}_{0}(\Omega)$ is the unique solution of (1), from the boundary regularity condition we have that $u^{\star}\in H^{2}(\Omega)$ (see Evans [1998], Chapter 6, Section 6.3). This ensures that the solution $u^{\star}$ is twice differentiable as well.

Remark 2.

To get a sense of the scale of $\lambda_{1}$ and $\lambda_{k}$: when $L=-\Delta$ (the Laplacian operator), the eigenvalue $\lambda_{1}=\inf_{u\in H^{1}_{0}(\Omega)}\frac{\|\nabla u\|^{2}_{L^{2}(\Omega)}}{\|u\|^{2}_{L^{2}(\Omega)}}=\frac{1}{C_{p}}$, where $C_{p}$ is the Poincaré constant (see Theorem A.1 in the Appendix). For geometrically well-behaved sets $\Omega$ (e.g., convex sets with a strongly convex boundary, like a sphere), $C_{p}$ is even dimension-independent. Further, from Weyl's law (Evans [1998], Section 6.5) we have

$$\lim_{k\to\infty}\frac{\lambda_{k}^{d/2}}{k}=\frac{(2\pi)^{d}}{\mathrm{vol}(\Omega)\alpha(d)},$$

where $\alpha(d)$ is the volume of a unit ball in $d$ dimensions. So, if $\mathrm{vol}(\Omega)\geq 1/\alpha(d)$, then $\lambda_{k}$ grows as $O(k^{2/d})$, which is a constant so long as $\log k\ll d$.

6 Proof of Main Result

First, we provide some intuition behind the proof, via an analogy between a uniformly elliptic operator and a positive definite matrix in linear algebra. We can think of finding the solution to the equation $Lu=f$ for an elliptic $L$ as analogous to finding the solution to the linear system $Qx=k$, where $Q$ is a $d\times d$ positive definite matrix, and $x$ and $k$ are $d$-dimensional vectors. One way to solve such a linear system is by minimizing the strongly convex function $\|Qx-k\|^{2}$ using gradient descent. Since the objective is strongly convex, after $O(\log(1/\epsilon))$ gradient steps, we reach an $\epsilon$-optimal point in an $\ell_{2}$ sense.
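To make this analogy concrete, the following sketch (a toy NumPy illustration; the matrix $Q$, vector $k$, and sizes are arbitrary choices, not quantities from the paper) runs gradient descent on the finite-dimensional analogue of the variational objective (2), $J(x)=\frac{1}{2}x^{\top}Qx-k^{\top}x$, whose gradient step $x-\eta(Qx-k)$ mirrors the update $u-\eta(Lu-f)$ used below.

```python
# Toy illustration (assumed sizes and data): gradient descent on
# J(x) = 1/2 x^T Q x - k^T x, whose minimizer solves Q x = k.
import numpy as np

rng = np.random.default_rng(0)
d = 50
B = rng.standard_normal((d, d))
Q = B @ B.T + d * np.eye(d)            # positive definite "operator"
k = rng.standard_normal(d)
x_star = np.linalg.solve(Q, k)         # exact solution, for reference only

lam = np.linalg.eigvalsh(Q)
eta = 2.0 / (lam[0] + lam[-1])         # step size 2 / (lambda_min + lambda_max)
rate = (lam[-1] - lam[0]) / (lam[-1] + lam[0])

x = np.zeros(d)
for t in range(200):
    x = x - eta * (Q @ x - k)          # analogue of u_{t+1} <- u_t - eta (L u_t - f)

print("final error:", np.linalg.norm(x - x_star))
print("contraction factor per step:", rate)
```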

Our proof uses a similar strategy. First, we show that for the operator $L$, we can define a sequence of functions that converges to an $\epsilon$-optimal function approximation (in this case in the $L^{2}(\Omega)$ norm) after $O(\log(1/\epsilon))$ steps, similar to the rate of convergence for strongly convex functions. Next, we inductively show that each iterate in the sequence can be approximated by a small neural network. More precisely, we show that given a bound on the size of the $t$-th iterate $u_{t}$, we can, in turn, upper bound the size of the $(t+1)$-th iterate $u_{t+1}$, because the update transforming $u_{t}$ to $u_{t+1}$ can be simulated by a small neural network (Lemma 7). These iterations look roughly like $u_{t+1}\leftarrow u_{t}-\eta(Lu_{t}-f)$, and we use a "backpropagation" lemma (Lemma 8) which bounds the size of the derivative of a neural network.

6.1 Defining a Convergent Sequence

The rough idea is to perform gradient descent in $L^{2}(\Omega)$ [Neuberger, 2009; Faragó and Karátson, 2001, 2002] to define a convergent sequence whose iterates converge to $u^{\star}$ in the $L^{2}(\Omega)$ norm (and, following Remark 1, in $H^{1}_{0}(\Omega)$ as well). However, there are two obstacles to defining the iterates as simply $u_{t+1}\leftarrow u_{t}-\eta(Lu_{t}-f)$: (1) $L$ is unbounded, so the standard way of choosing a step size for gradient descent (roughly the ratio of the minimum and maximum eigenvalues of $L$) would imply choosing a step size $\eta=0$, and (2) $L$ does not necessarily preserve the boundary conditions, so if we start with $u_{t}\in H^{1}_{0}(\Omega)$, it may be that $Lu_{t}-f$ does not even lie in $H^{1}_{0}(\Omega)$.

We resolve both issues by restricting the updates to the span of the first $k$ eigenfunctions of $L$. More concretely, as shown in Lemma 2, if a function $u$ lies in $\Phi_{k}$, then the function $Lu$ will also lie in $\Phi_{k}$. We also show that within the span of the first $k$ eigenfunctions, $L$ is bounded (with maximum eigenvalue $\lambda_{k}$), and can therefore be viewed as an operator from $\Phi_{k}$ to $\Phi_{k}$. Further, we use $f_{\operatorname{span}}$ instead of $f$ in our updates, which now have the form $u_{t+1}\leftarrow u_{t}-\eta(Lu_{t}-f_{\operatorname{span}})$. Since $f_{\operatorname{span}}$ belongs to $\Phi_{k}$, for $u_{t}$ in $\Phi_{k}$ the next iterate $u_{t+1}$ will also remain in $\Phi_{k}$. Continuing the matrix analogy, we can choose the usual step size $\eta=\frac{2}{\lambda_{1}+\lambda_{k}}$. Precisely, we show:

Lemma 2.

Let $L$ be an elliptic operator. Then, for all $v\in\Phi_{k}$, the following hold:

  1. $Lv\in\Phi_{k}$.

  2. $\lambda_{1}\|v\|^{2}_{L^{2}(\Omega)}\leq\langle Lv,v\rangle_{L^{2}(\Omega)}\leq\lambda_{k}\|v\|^{2}_{L^{2}(\Omega)}$.

  3. $\left\|\left(I-\frac{2}{\lambda_{k}+\lambda_{1}}L\right)v\right\|_{L^{2}(\Omega)}\leq\frac{\lambda_{k}-\lambda_{1}}{\lambda_{k}+\lambda_{1}}\|v\|_{L^{2}(\Omega)}$.

Proof.

Writing $u\in\Phi_{k}$ as $u=\sum_{i=1}^{k}d_{i}\varphi_{i}$, where $d_{i}=\langle u,\varphi_{i}\rangle_{L^{2}(\Omega)}$, we have $Lu=\sum_{i=1}^{k}\lambda_{i}d_{i}\varphi_{i}$. Therefore $Lu\in\Phi_{k}$ and $Lu$ lies in $H^{1}_{0}(\Omega)$, proving (1.).

Since $v\in\Phi_{k}$, we use the definition of the eigenvalues in (3) to get

$$\frac{\langle Lv,v\rangle_{L^{2}(\Omega)}}{\|v\|^{2}_{L^{2}(\Omega)}}\leq\sup_{v\in\Phi_{k}}\frac{\langle Lv,v\rangle_{L^{2}(\Omega)}}{\|v\|^{2}_{L^{2}(\Omega)}}=\lambda_{k}\implies\langle Lv,v\rangle_{L^{2}(\Omega)}\leq\lambda_{k}\|v\|^{2}_{L^{2}(\Omega)},$$

and similarly

$$\frac{\langle Lv,v\rangle_{L^{2}(\Omega)}}{\|v\|^{2}_{L^{2}(\Omega)}}\geq\inf_{v\in\Phi_{k}}\frac{\langle Lv,v\rangle_{L^{2}(\Omega)}}{\|v\|^{2}_{L^{2}(\Omega)}}=\lambda_{1}\implies\langle Lv,v\rangle_{L^{2}(\Omega)}\geq\lambda_{1}\|v\|^{2}_{L^{2}(\Omega)},$$

proving (2.).

In order to prove (3.), let us first denote $\bar{L}:=\left(I-\frac{2}{\lambda_{k}+\lambda_{1}}L\right)$. Note that if $\varphi$ is an eigenfunction of $L$ with corresponding eigenvalue $\lambda$, it is also an eigenfunction of $\bar{L}$ with corresponding eigenvalue $\frac{\lambda_{k}+\lambda_{1}-2\lambda}{\lambda_{k}+\lambda_{1}}$.

Hence, writing $u\in\Phi_{k}$ as $u=\sum_{i=1}^{k}d_{i}\varphi_{i}$, where $d_{i}=\langle u,\varphi_{i}\rangle_{L^{2}(\Omega)}$, we have

$$\|\bar{L}u\|_{L^{2}(\Omega)}^{2}=\left\|\sum_{i=1}^{k}\frac{\lambda_{k}+\lambda_{1}-2\lambda_{i}}{\lambda_{k}+\lambda_{1}}d_{i}\varphi_{i}\right\|_{L^{2}(\Omega)}^{2}\leq\max_{i\in[k]}\left(\frac{\lambda_{k}+\lambda_{1}-2\lambda_{i}}{\lambda_{k}+\lambda_{1}}\right)^{2}\left\|\sum_{i=1}^{k}d_{i}\varphi_{i}\right\|_{L^{2}(\Omega)}^{2}. \quad (5)$$

By the orthonormality of $\{\varphi_{i}\}_{i=1}^{k}$, we have

$$\left\|\sum_{i=1}^{k}d_{i}\varphi_{i}\right\|_{L^{2}(\Omega)}^{2}=\sum_{i=1}^{k}d_{i}^{2}=\|u\|_{L^{2}(\Omega)}^{2}.$$

Since $\lambda_{1}\leq\lambda_{2}\leq\dots\leq\lambda_{k}$, we have $\lambda_{k}+\lambda_{1}-2\lambda_{i}\geq\lambda_{1}-\lambda_{k}$ and $\lambda_{k}+\lambda_{1}-2\lambda_{i}\leq\lambda_{k}-\lambda_{1}$, so $|\lambda_{k}+\lambda_{1}-2\lambda_{i}|\leq\lambda_{k}-\lambda_{1}$. This implies $\max_{i\in[k]}\left(\frac{\lambda_{k}+\lambda_{1}-2\lambda_{i}}{\lambda_{k}+\lambda_{1}}\right)^{2}\leq\left(\frac{\lambda_{k}-\lambda_{1}}{\lambda_{k}+\lambda_{1}}\right)^{2}$. Plugging this back into (5), we get the claim we wanted. ∎

In fact, we will use a slight variant of these updates and instead set $u_{t+1}\leftarrow u_{t}-\eta(\tilde{L}u_{t}-\tilde{f}_{\operatorname{span}})$ as the iterates of the convergent sequence, where $\tilde{f}_{\operatorname{span}}$ is the projection of $f$ onto $\tilde{\Phi}_{k}$. This sequence satisfies two important properties: (1) the convergence point of the sequence and $u^{\star}$, the solution to the original PDE, are not too far from each other; (2) the sequence of functions converges exponentially fast. In Section 6.2, we will see that updates defined thusly are more convenient to simulate via a neural network.
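To illustrate the mechanism in the finite-dimensional analogy (a toy sketch only; the matrix $Q$, vector $b$, and sizes below are assumptions for the example), the following code projects the right-hand side onto the span of the $k$ eigenvectors with smallest eigenvalues and runs gradient descent with step size $2/(\lambda_{1}+\lambda_{k})$; the iterates then remain in that span and contract at rate $(\lambda_{k}-\lambda_{1})/(\lambda_{k}+\lambda_{1})$.

```python
# Finite-dimensional analogue (toy, assumed data) of restricting the updates to the
# span of the first k eigenvectors and using f_span in place of f.
import numpy as np

rng = np.random.default_rng(1)
d, k = 80, 10
B = rng.standard_normal((d, d))
Q = B @ B.T + np.eye(d)                  # stand-in for the elliptic operator L
b = rng.standard_normal(d)               # stand-in for the right-hand side f

lam, V = np.linalg.eigh(Q)               # eigenvalues in increasing order
Vk = V[:, :k]                            # analogue of Phi_k
b_span = Vk @ (Vk.T @ b)                 # analogue of f_span

eta = 2.0 / (lam[0] + lam[k - 1])        # step size 2 / (lambda_1 + lambda_k)
x = np.zeros(d)
for _ in range(500):
    x = x - eta * (Q @ x - b_span)       # iterate stays in the span of Vk

x_span = Vk @ ((Vk.T @ b) / lam[:k])     # exact solution of Q x = b_span
print("error within the span:", np.linalg.norm(x - x_span))
```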

The first property is formalized as follows:

Lemma 3.

Assume that $\tilde{u}_{\operatorname{span}}^{\star}$ is the solution to the PDE $\tilde{L}u=\tilde{f}_{\operatorname{span}}$, where $\tilde{f}_{\operatorname{span}}$ is the projection of $f$ onto $\tilde{\Phi}_{k}$. Given Assumptions (i)-(iii), we have $\|u^{\star}-\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)}\leq\epsilon$, where $\epsilon=\frac{\epsilon_{\operatorname{span}}}{\lambda_{1}}+\frac{\delta}{\lambda_{1}}\frac{\|f\|_{L^{2}(\Omega)}}{\gamma-\delta}+\delta\|\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)}$, with $\gamma=\frac{1}{\lambda_{k}}-\frac{1}{\lambda_{k+1}}$ and $\delta=\max\left\{\frac{\epsilon_{A}}{m},\frac{\epsilon_{c}}{\zeta}\right\}$.

The proof of Lemma 3 is provided in the Appendix (Section B.1). Each of the three terms in the final error captures a different source of perturbation: the first term comes from approximating $f$ by $f_{\operatorname{span}}$; the second term comes from applying Davis-Kahan [Davis and Kahan, 1970] to bound the "misalignment" between the eigenspaces $\Phi_{k}$ and $\tilde{\Phi}_{k}$ (hence the appearance of the eigengap between the $k$-th and $(k+1)$-st eigenvalues of $L^{-1}$); the third term is a type of "relative" error bounding the difference between the solutions to the PDEs $Lu=\tilde{f}_{\operatorname{span}}$ and $\tilde{L}u=\tilde{f}_{\operatorname{span}}$.

The “misalignment” term can be characterized through the following lemma:

Lemma 4 (Bounding the distance between $f_{\operatorname{span}}$ and $\tilde{f}_{\operatorname{span}}$).

Given Assumptions (i)-(iii), and denoting the projection of $f$ onto $\tilde{\Phi}_{k}$ by $\tilde{f}_{\operatorname{span}}$, we have:

$$\|f_{\operatorname{span}}-\tilde{f}_{\operatorname{span}}\|_{L^{2}(\Omega)}\leq\frac{\|f\|_{L^{2}(\Omega)}\delta}{\gamma-\delta}, \quad (6)$$

where $\delta=\max\left\{\frac{\epsilon_{A}}{m},\frac{\epsilon_{c}}{\zeta}\right\}$.

Proof.

Let us write $f_{\operatorname{span}}=\sum_{i=1}^{k}f_{i}\varphi_{i}$, where $f_{i}=\langle f,\varphi_{i}\rangle_{L^{2}(\Omega)}$. Similarly, we can write $\tilde{f}_{\operatorname{span}}\in\tilde{\Phi}_{k}$ as $\tilde{f}_{\operatorname{span}}=\sum_{i=1}^{k}\tilde{f}_{i}\tilde{\varphi}_{i}$, where $\tilde{f}_{i}=\langle f,\tilde{\varphi}_{i}\rangle_{L^{2}(\Omega)}$.

If $P_{k}g:=\sum_{i=1}^{k}\langle g,\varphi_{i}\rangle_{L^{2}(\Omega)}\varphi_{i}$ and $\tilde{P}_{k}g:=\sum_{i=1}^{k}\langle g,\tilde{\varphi}_{i}\rangle_{L^{2}(\Omega)}\tilde{\varphi}_{i}$ denote the projections of a function $g$ onto $\Phi_{k}$ and $\tilde{\Phi}_{k}$, respectively, then from Lemma C.1 we have:

$$\|f_{\operatorname{span}}-\tilde{f}_{\operatorname{span}}\|_{L^{2}(\Omega)}=\left\|\sum_{i=1}^{k}\langle f,\varphi_{i}\rangle_{L^{2}(\Omega)}\varphi_{i}-\langle f,\tilde{\varphi}_{i}\rangle_{L^{2}(\Omega)}\tilde{\varphi}_{i}\right\|_{L^{2}(\Omega)}=\left\|P_{k}f-\tilde{P}_{k}f\right\|_{L^{2}(\Omega)}\leq\|P_{k}-\tilde{P}_{k}\|\,\|f\|_{L^{2}(\Omega)}\leq\frac{\delta}{\gamma-\delta}\|f\|_{L^{2}(\Omega)},$$

where $\gamma=\frac{1}{\lambda_{k}}-\frac{1}{\lambda_{k+1}}$ and $\delta=\max\left\{\frac{\epsilon_{A}}{m},\frac{\epsilon_{c}}{\zeta}\right\}$. ∎

The main technical tool for bounding the difference between the operators $L$ and $\tilde{L}$ is formalized in the lemma below. Note that the "relative" nature of the perturbation arises because $L$ and $\tilde{L}$ are not bounded operators.

Lemma 5 (Relative operator perturbation bound).

Consider the operator $\tilde{L}$ defined in (4). Then for all $u\in H^{1}_{0}(\Omega)$ we have the following:

  1. $\langle(\tilde{L}-L)u,u\rangle_{L^{2}(\Omega)}\leq\delta\langle Lu,u\rangle_{L^{2}(\Omega)}$.

  2. $\langle(L^{-1}\tilde{L}-I)u,u\rangle_{L^{2}(\Omega)}\leq\delta\|u\|_{L^{2}(\Omega)}^{2}$.

Here $\delta=\max\left\{\frac{\epsilon_{A}}{m},\frac{\epsilon_{c}}{\zeta}\right\}$.

Proof.
We have

$$\langle(\tilde{L}-L)u,u\rangle=\int_{\Omega}\left((\tilde{A}-A)\nabla u\cdot\nabla u+(\tilde{c}-c)u^{2}\right)dx\leq\left(\max_{ij}\|\tilde{A}_{ij}-A_{ij}\|_{L^{\infty}(\Omega)}\right)\|\nabla u\|^{2}_{L^{2}(\Omega)}+\|\tilde{c}-c\|_{L^{\infty}(\Omega)}\|u\|^{2}_{L^{2}(\Omega)}\leq\epsilon_{A}\|\nabla u\|_{L^{2}(\Omega)}^{2}+\epsilon_{c}\|u\|^{2}_{L^{2}(\Omega)}. \quad (7)$$

Further, note that

$$\langle Lu,u\rangle=\int_{\Omega}A\nabla u\cdot\nabla u+cu^{2}\,dx\geq m\|\nabla u\|^{2}_{L^{2}(\Omega)}+\zeta\|u\|_{L^{2}(\Omega)}^{2}. \quad (8)$$

Using the inequality $\frac{a+b}{c+d}\geq\min\{\frac{a}{c},\frac{b}{d}\}$ together with (7) and (8), we have

$$\frac{m\|\nabla u\|^{2}_{L^{2}(\Omega)}+\zeta\|u\|_{L^{2}(\Omega)}^{2}}{\epsilon_{A}\|\nabla u\|_{L^{2}(\Omega)}^{2}+\epsilon_{c}\|u\|_{L^{2}(\Omega)}^{2}}\geq\min\left\{\frac{m}{\epsilon_{A}},\frac{\zeta}{\epsilon_{c}}\right\}. \quad (9)$$

Hence this implies that

$$\langle(\tilde{L}-L)u,u\rangle\leq\delta\langle Lu,u\rangle,$$

where $\delta=\max\left\{\frac{\epsilon_{A}}{m},\frac{\epsilon_{c}}{\zeta}\right\}$, proving part (1.).

Further, for part (2.), we have for all $u\in H^{1}_{0}(\Omega)$:

$$\langle(\tilde{L}-L)u,u\rangle_{L^{2}(\Omega)}\leq\delta\langle Lu,u\rangle_{L^{2}(\Omega)}\implies\langle(\tilde{L}L^{-1}-I)Lu,u\rangle_{L^{2}(\Omega)}\leq\delta\langle Lu,u\rangle_{L^{2}(\Omega)}\implies\langle(\tilde{L}L^{-1}-I)v,u\rangle_{L^{2}(\Omega)}\leq\delta\langle v,u\rangle_{L^{2}(\Omega)}\implies\langle(\tilde{L}L^{-1})v,u\rangle_{L^{2}(\Omega)}\leq(1+\delta)\langle v,u\rangle_{L^{2}(\Omega)}, \quad (10)$$

where $v=Lu$. Therefore, using (10), the following holds for all $u\in H^{1}_{0}(\Omega)$:

$$\langle(\tilde{L}L^{-1})u,u\rangle_{L^{2}(\Omega)}\leq(1+\delta)\|u\|_{L^{2}(\Omega)}^{2}\ \stackrel{(1)}{\implies}\ \langle u,(L^{-1}\tilde{L})u\rangle_{L^{2}(\Omega)}\leq(1+\delta)\|u\|_{L^{2}(\Omega)}^{2}\ \stackrel{(2)}{\implies}\ \langle(L^{-1}\tilde{L}-I)u,u\rangle_{L^{2}(\Omega)}\leq\delta\|u\|_{L^{2}(\Omega)}^{2}, \quad (11)$$

where we use the fact that the operators $\tilde{L}$ and $L^{-1}$ are self-adjoint to get (1), and then bring the appropriate terms to the left-hand side in (2). ∎
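A matrix analogue of part (1.) of Lemma 5 can be checked numerically; in the sketch below, the dimensions, the split $Q=A+\zeta I$, and the perturbation sizes are all assumptions made for the illustration, and the perturbed operator satisfies the same relative bound with $\delta=\max\{\epsilon_{A}/m,\epsilon_{c}/\zeta\}$.

```python
# Matrix analogue (toy, assumed data) of the relative perturbation bound
# <(L~ - L)u, u> <= delta <Lu, u> with delta = max(eps_A / m, eps_c / zeta).
import numpy as np

rng = np.random.default_rng(4)
d = 40
B = rng.standard_normal((d, d))
A = B @ B.T + 2.0 * np.eye(d)            # "A" part; its smallest eigenvalue plays the role of m
zeta = 1.5                               # "c" part, assumed constant here
eps_A, eps_c = 0.3, 0.2

E = rng.standard_normal((d, d)); E = (E + E.T) / 2
E *= eps_A / np.linalg.norm(E, 2)        # symmetric perturbation with spectral norm eps_A
A_tilde, c_tilde = A + E, zeta + eps_c

m = np.linalg.eigvalsh(A).min()
delta = max(eps_A / m, eps_c / zeta)

L  = A + zeta * np.eye(d)
L_tilde = A_tilde + c_tilde * np.eye(d)
u = rng.standard_normal(d)
print(u @ (L_tilde - L) @ u <= delta * (u @ L @ u))   # prints True
```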

The second property of the sequence of functions is that they converge exponentially fast. Namely, we show:

Lemma 6 (Convergence of gradient descent in $L^{2}$).

Let $\tilde{u}_{\operatorname{span}}^{\star}$ denote the unique solution to the PDE $\tilde{L}u=\tilde{f}_{\operatorname{span}}$, where $\tilde{f}_{\operatorname{span}}\in\tilde{\Phi}_{k}$, and the operator $\tilde{L}$ satisfies the conditions in Lemma 2. For any $u_{0}\in H_{0}^{1}(\Omega)$ such that $u_{0}\in\tilde{\Phi}_{k}$, we define the sequence

$$u_{t+1}\leftarrow u_{t}-\frac{2}{\tilde{\lambda}_{1}+\tilde{\lambda}_{k}}(\tilde{L}u_{t}-\tilde{f}_{\operatorname{span}}),\quad t\in\mathbb{N}, \quad (12)$$

where for all $t\in\mathbb{N}$, $u_{t}\in H_{0}^{1}(\Omega)$. Then for any $t\in\mathbb{N}$, we have

$$\|u_{t}-\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)}\leq\left(\frac{\tilde{\lambda}_{k}-\tilde{\lambda}_{1}}{\tilde{\lambda}_{k}+\tilde{\lambda}_{1}}\right)^{t-1}\|u_{0}-\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)}.$$

The proof is essentially the same as the analysis of the convergence time of gradient descent for strongly convex losses. Namely, we have:

Proof.

Given that $u_{0}\in H^{1}_{0}(\Omega)$ and $u_{0}\in\tilde{\Phi}_{k}$, the function $\tilde{L}u_{0}$ lies in $H^{1}_{0}(\Omega)$ and in $\tilde{\Phi}_{k}$ as well (from Lemma 2).

Since $\tilde{f}_{\operatorname{span}}\in\tilde{\Phi}_{k}$, all the iterates in the sequence will also belong to $H^{1}_{0}(\Omega)$ and will lie in $\tilde{\Phi}_{k}$.

Now, at step $t$ the iteration reads

$$u_{t+1}=u_{t}-\frac{2}{\tilde{\lambda}_{k}+\tilde{\lambda}_{1}}\left(\tilde{L}u_{t}-\tilde{f}_{\operatorname{span}}\right)\implies u_{t+1}-\tilde{u}_{\operatorname{span}}^{\star}=\left(I-\frac{2}{\tilde{\lambda}_{k}+\tilde{\lambda}_{1}}\tilde{L}\right)(u_{t}-\tilde{u}_{\operatorname{span}}^{\star}),$$

where the second step uses $\tilde{L}\tilde{u}_{\operatorname{span}}^{\star}=\tilde{f}_{\operatorname{span}}$. Using the result from Lemma 2, part 3, we have

$$\|u_{t+1}-\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)}\leq\left(\frac{\tilde{\lambda}_{k}-\tilde{\lambda}_{1}}{\tilde{\lambda}_{k}+\tilde{\lambda}_{1}}\right)\|u_{t}-\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)}\implies\|u_{t+1}-\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)}\leq\left(\frac{\tilde{\lambda}_{k}-\tilde{\lambda}_{1}}{\tilde{\lambda}_{k}+\tilde{\lambda}_{1}}\right)^{t}\|u_{0}-\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)}.$$

This finishes the proof. ∎

Combining the results from Lemma 3 and Lemma 6 via triangle inequality, we have:

$$\|u^{\star}-u_{T}\|_{L^{2}(\Omega)}\leq\|u^{\star}-\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)}+\|\tilde{u}_{\operatorname{span}}^{\star}-u_{T}\|_{L^{2}(\Omega)},$$

and the first term on the right-hand side subsumes the first three summands of $\tilde{\epsilon}$ defined in Theorem 1.

6.2 Approximating Iterates by Neural Networks

In Lemma 6, we showed that there exists a sequence of functions (12) which converges quickly to a function close to $u^{\star}$. The next step in the proof is to approximate the iterates by neural networks.

The main idea is as follows. Suppose first that the iterates $u_{t+1}=u_{t}-\eta(\tilde{L}u_{t}-\tilde{f}_{\operatorname{span}})$ are such that $\tilde{f}_{\operatorname{span}}$ is exactly representable as a neural network. Then, the iterate $u_{t+1}$ can be written in terms of three operations performed on $u_{t}$, the coefficients, and $f$: taking derivatives, multiplication, and addition. Moreover, if $g$ is representable as a neural network with $N$ parameters, the coordinates of the vector $\nabla g$ can be represented by a neural network with $O(N)$ parameters. This is a classic result (Lemma 8), essentially following from the backpropagation algorithm. Finally, the addition or multiplication of two functions representable as neural networks with sizes $N_{1},N_{2}$ can be represented as a neural network of size $O(N_{1}+N_{2})$ (see Lemma 9).

Using these facts, we can write down a recurrence upper bounding the size of the neural network approximation of $u_{t+1}$, denoted by $\hat{u}_{t+1}$, in terms of the number of parameters in $\hat{u}_{t}$ (which is the neural network approximation to $u_{t}$). Formally, we have:

Lemma 7 (Recursion Lemma).

Given Assumptions (i)-(iii), consider the update equation

$$\hat{u}_{t+1}\leftarrow\hat{u}_{t}-\frac{2}{\tilde{\lambda}_{1}+\tilde{\lambda}_{k}}\left(\tilde{L}\hat{u}_{t}-f_{\operatorname{nn}}\right). \quad (13)$$

If at step $t$, $\hat{u}_{t}:\mathbb{R}^{d}\to\mathbb{R}$ is a neural network with $N_{t}$ parameters, then the function $\hat{u}_{t+1}$ is a neural network with $O(d^{2}(N_{A}+N_{t})+N_{t}+N_{f}+N_{c})$ parameters.

Proof.

Expand the update $\hat{u}_{t+1}\leftarrow\hat{u}_{t}-\eta\left(\tilde{L}\hat{u}_{t}-f_{\operatorname{nn}}\right)$ as follows:

$$\hat{u}_{t+1}\leftarrow\hat{u}_{t}-\eta\left(-\sum_{i,j=1}^{d}\tilde{a}_{ij}\partial_{ij}\hat{u}_{t}-\sum_{j=1}^{d}\left(\sum_{i=1}^{d}\partial_{i}\tilde{a}_{ij}\right)\partial_{j}\hat{u}_{t}+\tilde{c}\hat{u}_{t}-f_{\operatorname{nn}}\right).$$

Using Lemma 8, $\partial_{ij}\hat{u}_{t}$, $\partial_{j}\hat{u}_{t}$ and $\partial_{i}\tilde{a}_{ij}$ can be represented by neural networks with $O(N_{t})$, $O(N_{t})$ and $O(N_{A})$ parameters, respectively. Further, by Lemma 9, $\partial_{i}\tilde{a}_{ij}\partial_{j}\hat{u}_{t}$ and $\tilde{a}_{ij}\partial_{ij}\hat{u}_{t}$ can be represented by neural networks with $O(N_{A}+N_{t})$ parameters, and $\tilde{c}\hat{u}_{t}$ can be represented by a network with $O(N_{t}+N_{c})$ parameters. Hence $\hat{u}_{t+1}$ can be represented with $O(d^{2}(N_{A}+N_{t})+N_{f}+N_{c}+N_{t})$ parameters. Note that throughout the proofs, $O(\cdot)$ hides absolute constants. ∎
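As a sanity check of this expansion, the following SymPy sketch (a two-dimensional toy instance; the symbols and the dimension are illustrative choices) verifies that $-\operatorname{div}(A\nabla u)+cu$ splits into exactly the second-order, first-order, and zeroth-order terms used above.

```python
# SymPy check (2-D toy instance) that -div(A grad u) + c u equals
# -sum_ij a_ij d_ij u - sum_j (sum_i d_i a_ij) d_j u + c u.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
xs = [x1, x2]
u = sp.Function('u')(x1, x2)
c = sp.Function('c')(x1, x2)
a = [[sp.Function(f'a{i}{j}')(x1, x2) for j in range(2)] for i in range(2)]

grad_u = [sp.diff(u, xi) for xi in xs]
A_grad_u = [sum(a[i][j] * grad_u[j] for j in range(2)) for i in range(2)]
Lu = -sum(sp.diff(A_grad_u[i], xs[i]) for i in range(2)) + c * u

expanded = (-sum(a[i][j] * sp.diff(u, xs[i], xs[j]) for i in range(2) for j in range(2))
            - sum(sum(sp.diff(a[i][j], xs[i]) for i in range(2)) * sp.diff(u, xs[j])
                  for j in range(2))
            + c * u)

print(sp.simplify(Lu - expanded) == 0)   # prints True
```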

Combining the results of Lemma 6 and Lemma 7, we get a recurrence for the number of parameters required to represent the neural network $\hat{u}_{t}$:

$$N_{t+1}\leq d^{2}N_{t}+d^{2}N_{A}+N_{t}+N_{f}+N_{c}.$$

Unfolding this recurrence, we get $N_{T}\leq d^{2T}N_{0}+\frac{d^{2}(d^{2T}-1)}{d^{2}-1}N_{A}+T(N_{f}+N_{c})$.
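To make the growth concrete, the short sketch below (toy parameter counts; the constants inside the $O(\cdot)$ are dropped) unrolls the recurrence numerically, making the $d^{2T}$ growth of the leading term visible.

```python
# Toy unroll (assumed constants; O(.) factors dropped) of the parameter-count
# recurrence N_{t+1} <= d^2 (N_A + N_t) + N_t + N_f + N_c from Lemma 7.
def parameter_bound(d, N0, NA, Nf, Nc, T):
    N = N0
    for _ in range(T):
        N = d**2 * (NA + N) + N + Nf + Nc
    return N

# Example: input dimension d = 10, small coefficient networks, T = 1..3 iterations.
for T in range(1, 4):
    print(T, parameter_bound(d=10, N0=100, NA=50, Nf=50, Nc=50, T=T))
```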

The formal lemmas for the different operations on neural networks that we can simulate with a new neural network are as follows:

Lemma 8 (Backpropagation, Rumelhart et al. [1986]).

Consider a neural network $g:\mathbb{R}^{m}\rightarrow\mathbb{R}$ with depth $l$, $N$ parameters, and differentiable activation functions in the set $\{\sigma_{i}\}_{i=1}^{A}$. There exists a neural network of size $O(l+N)$, with activation functions in the set $\{\sigma_{i},\sigma^{\prime}_{i}\}_{i=1}^{A}$, that calculates the gradient $\frac{\partial g}{\partial x_{i}}$ for all $i\in[m]$.
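For intuition, the following NumPy sketch (a two-layer toy network; the architecture and activation are assumptions made for the illustration) shows the content of Lemma 8: the gradient of $g(x)=w_{2}^{\top}\sigma(W_{1}x+b_{1})$ reuses the same weights and simply replaces the activation $\sigma$ by its derivative $\sigma^{\prime}$, so it is itself a network of comparable size.

```python
# Toy illustration of Lemma 8: the gradient of a two-layer network is itself a
# network with the same parameters and the activation replaced by its derivative.
import numpy as np

def sigma(z):
    return np.tanh(z)

def sigma_prime(z):
    return 1.0 - np.tanh(z) ** 2

rng = np.random.default_rng(2)
m, h = 5, 16                       # assumed input dimension and hidden width
W1 = rng.standard_normal((h, m))
b1 = rng.standard_normal(h)
w2 = rng.standard_normal(h)

def g(x):                          # g(x) = w2 . sigma(W1 x + b1)
    return w2 @ sigma(W1 @ x + b1)

def grad_g(x):                     # "gradient network": same weights, activation sigma'
    return W1.T @ (sigma_prime(W1 @ x + b1) * w2)

x = rng.standard_normal(m)
eps = 1e-6
fd = np.array([(g(x + eps * e) - g(x - eps * e)) / (2 * eps) for e in np.eye(m)])
print(np.allclose(grad_g(x), fd, atol=1e-5))   # finite-difference check, prints True
```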

Lemma 9 (Addition and Multiplication).

Given neural networks $g:\Omega\rightarrow\mathbb{R}$ and $h:\Omega\rightarrow\mathbb{R}$, with $N_{g}$ and $N_{h}$ parameters respectively, the operations $g(x)+h(x)$ and $g(x)\cdot h(x)$ can be represented by neural networks of size $O(N_{g}+N_{h})$, using square activation functions.

Proof.

For addition, there exists a network containing both networks $g$ and $h$ as subnetworks, plus an extra layer to compute the addition of their outputs. Hence, the total number of parameters in such a network is $O(N_{g}+N_{h})$.

For multiplication, consider the identity $g(x)\cdot h(x)=\frac{1}{2}\left(\left(g(x)+h(x)\right)^{2}-g(x)^{2}-h(x)^{2}\right)$. Then, following the same argument as for the addition of two networks, we can construct a network containing both networks as subnetworks, together with square activation functions. ∎

While the representation result in Lemma 9 is shown using square activation, we refer to Yarotsky [2017] for approximation results with ReLU activation. The scaling with respect to the number of parameters in the network remains the same.
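A minimal sketch of the multiplication construction (the values below are arbitrary toy outputs): with a square activation $\rho(y)=y^{2}$, one extra layer on top of the two subnetworks computes their product.

```python
# Product of two subnetwork outputs via the identity g*h = ((g+h)^2 - g^2 - h^2) / 2,
# realized with three square-activation units and a fixed linear readout.
def rho(y):
    return y ** 2                     # square activation

def product_layer(g_out, h_out):
    return 0.5 * (rho(g_out + h_out) - rho(g_out) - rho(h_out))

g_out, h_out = 3.7, -1.2              # outputs of two subnetworks at some input x (toy values)
print(product_layer(g_out, h_out), g_out * h_out)   # both print -4.44 (up to rounding)
```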


Finally, we have to deal with the fact that $\tilde{f}_{\operatorname{span}}$ is not exactly a neural network, but only approximately so. The error due to this discrepancy can be characterized through the following lemma:

Lemma 10 (Error using $f_{\operatorname{nn}}$).

Consider the update equation in (13), where $f_{\operatorname{nn}}$ is a neural network with $N_{f}$ parameters. Then the neural network $\hat{u}_{t}$ approximates the function $u_{t}$ such that $\|u_{t}-\hat{u}_{t}\|_{L^{2}(\Omega)}\leq\epsilon^{(t)}_{\operatorname{nn}}$, where $\epsilon^{(t)}_{\operatorname{nn}}$ is

$$O\left((\max\{1,t^{2}\eta eC\})^{t}\left(\epsilon_{\operatorname{span}}+\epsilon_{\operatorname{nn}}+4\left(1+\frac{\delta}{\gamma-\delta}\right)\lambda_{k}^{t}\|f\|_{L^{2}(\Omega)}\right)\right),$$

where $\delta=\max\left\{\frac{\epsilon_{A}}{m},\frac{\epsilon_{c}}{\zeta}\right\}$, $\gamma=\frac{1}{\lambda_{k}}-\frac{1}{\lambda_{k+1}}$, and $\alpha$ is a multi-index.

The proof of this lemma is deferred to Section B.2 of the Appendix. The main strategy is to track the "residual" non-neural-network part of the iterates. Precisely, for every $t\in\mathbb{N}$, we write $u_{t}=\hat{u}_{t}+r_{t}$, where $\hat{u}_{t}$ is a neural network, and bound $\|r_{t}\|_{L^{2}(\Omega)}$. The sequence $\{\hat{u}_{t}\}_{t=0}^{\infty}$ is defined such that

$$\begin{cases}\hat{u}_{0}=u_{0},\\ \hat{u}_{t+1}=\hat{u}_{t}-\eta\left(\tilde{L}\hat{u}_{t}-f_{\operatorname{nn}}\right).\end{cases}$$

Correspondingly, since $r_{t}=u_{t}-\hat{u}_{t}$, we have

$$\begin{cases}r_{0}=0,\\ r_{t+1}=(I-\eta\tilde{L})r_{t}-r,\end{cases}$$

where $r:=\eta(f_{\operatorname{nn}}-\tilde{f}_{\operatorname{span}})$. Unfolding the recurrence, we have $r_{t}=-\sum_{i=0}^{t-1}(I-\eta\tilde{L})^{i}r$, which reduces the proof to bounding $\|(I-\eta\tilde{L})^{i}\|_{L^{2}(\Omega)}$.³

³The reason we require that $f_{\operatorname{nn}}$ is close to $f$ not only in the $L^{2}$ sense, but also in terms of their higher-order derivatives, is that $\tilde{L}^{(t)}r$ involves $2t$-order derivatives of $r$, which need to be bounded at each step.

7 Applications to Learning Operators

A number of recent works attempt to simultaneously approximate the solutions for an entire family of PDEs by learning a parametric map that takes as input (some representation of) the coefficients of a PDE and returns its solution [Bhattacharya et al., 2020; Li et al., 2020b, a]. For example, given a set of observations $\{a_{j},u_{j}\}_{j=1}^{N}$, where each $a_{j}$ denotes a coefficient of a PDE with corresponding solution $u_{j}$, they learn a neural network $G$ such that $u_{j}=G(a_{j})$ for all $j$. Our parametric results provide useful insights into why simultaneously solving an entire family of PDEs with a single neural network $G$ is possible in the case of linear elliptic PDEs.

Consider the case where the coefficients $a_{j}$ in the family of PDEs are given by neural networks with a fixed architecture, but where each instance of a PDE is characterized by a different setting of the weights in the models representing the coefficients. Lemma 7 shows that each iteration of our sequence (12) constructs a new network containing both the current solution and the coefficient networks as subnetworks. We can view our approximation as not merely approximating the solution to a single PDE but to every PDE in the family, by treating the coefficient networks as placeholder architectures whose weights are provided as inputs. Thus, our construction provides a parametric map between the coefficients of an elliptic PDE in this family and its solution.
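As a finite-dimensional illustration of this viewpoint (an analogy only, with assumed sizes and data, not the paper's construction), a single unrolled iteration scheme acts as one map $G$ from the coefficients of any instance $Qx=b$ in a family of linear systems to an approximate solution, with the coefficients supplied as inputs rather than baked into the weights.

```python
# One fixed unrolled computation maps the coefficients of every instance (Q, b)
# to an approximate solution: a toy analogue of the parametric map discussed above.
import numpy as np

def G(Q, b, T=200):
    lam = np.linalg.eigvalsh(Q)
    eta = 2.0 / (lam[0] + lam[-1])      # step size computed from the instance's spectrum
    x = np.zeros_like(b)
    for _ in range(T):                  # the same unrolled architecture for every instance
        x = x - eta * (Q @ x - b)
    return x

rng = np.random.default_rng(3)
d = 30
for _ in range(3):                      # three instances from the "family"
    B = rng.standard_normal((d, d))
    Q = B @ B.T + d * np.eye(d)
    b = rng.standard_normal(d)
    print(np.linalg.norm(G(Q, b) - np.linalg.solve(Q, b)))
```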

8 Conclusion and Future Work

We derive parametric complexity bounds for neural network approximations of the solutions to linear elliptic PDEs with Dirichlet boundary conditions, whenever the coefficients can be approximated by neural networks with finite parameter counts. By simulating gradient descent in function space using neural networks, we construct a neural network that approximates the solution of a PDE. We show that the number of parameters in the neural network depends on the parameters required to represent the coefficients and has a $\mathrm{poly}(d)$ dependence on the dimension of the input space, therefore avoiding the curse of dimensionality.

An immediate open question relates to tightening our results: our current error bound is sensitive to the neural network approximation lying close to $\Phi_{k}$, which could be alleviated by relaxing (e.g., by adding some kind of "regularity" assumptions) the dependence of our analysis on the first $k$ eigenfunctions. Further, the dependence of the exponent of $d$ on $R$ and $\kappa$ in the parametric bound may also be improvable. Finally, the idea of simulating an iterative algorithm by a neural network to derive a representation-theoretic result is broadly applicable, and may be fertile ground for further work, both theoretically and empirically, as it suggests a particular kind of weight tying.

References

  • Ames [2014] William F Ames. Numerical methods for partial differential equations. Academic press, 2014.
  • Anderson and Wendt [1995] John David Anderson and J Wendt. Computational fluid dynamics, volume 206. Springer, 1995.
  • Bahdanau et al. [2014] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
  • Bhattacharya et al. [2020] Kaushik Bhattacharya, Bamdad Hosseini, Nikola B Kovachki, and Andrew M Stuart. Model reduction and neural networks for parametric pdes. arXiv preprint arXiv:2005.03180, 2020.
  • Black and Scholes [1973] Fischer Black and Myron Scholes. The pricing of options and corporate liabilities. Journal of political economy, 81(3):637–654, 1973.
  • Crank and Nicolson [1947] John Crank and Phyllis Nicolson. A practical method for numerical evaluation of solutions of partial differential equations of the heat-conduction type. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 43, pages 50–67. Cambridge University Press, 1947.
  • Davis and Kahan [1970] Chandler Davis and William Morton Kahan. The rotation of eigenvectors by a perturbation. iii. SIAM Journal on Numerical Analysis, 7(1):1–46, 1970.
  • Devlin et al. [2018] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
  • Ehrhardt and Mickens [2008] Matthias Ehrhardt and Ronald E Mickens. A fast, stable and accurate numerical method for the black–scholes equation of american options. International Journal of Theoretical and Applied Finance, 11(05):471–501, 2008.
  • Evans [1998] Lawrence C Evans. Partial Differential Equations. Graduate Studies in Mathematics. American Mathematical Society, 1998. ISBN 9780821807729.
  • Faragó and Karátson [2001] I Faragó and J Karátson. The gradient-finite element method for elliptic problems. Computers & Mathematics with Applications, 42(8-9):1043–1053, 2001.
  • Faragó and Karátson [2002] István Faragó and János Karátson. Numerical solution of nonlinear elliptic problems via preconditioning operators: Theory and applications, volume 11. Nova Publishers, 2002.
  • Gilbarg and Trudinger [2001] David Gilbarg and Neil S Trudinger. Elliptic partial differential equations of second order. 2001.
  • Grohs and Herrmann [2020] Philipp Grohs and Lukas Herrmann. Deep neural network approximation for high-dimensional elliptic pdes with boundary conditions. arXiv preprint arXiv:2007.05384, 2020.
  • Grohs et al. [2018] Philipp Grohs, Fabian Hornung, Arnulf Jentzen, and Philippe Von Wurstemberger. A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of black-scholes partial differential equations. arXiv preprint arXiv:1809.02362, 2018.
  • Han et al. [2018] Jiequn Han, Arnulf Jentzen, and E Weinan. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018.
  • Jentzen et al. [2018] Arnulf Jentzen, Diyora Salimova, and Timo Welti. A proof that deep artificial neural networks overcome the curse of dimensionality in the numerical approximation of kolmogorov partial differential equations with constant diffusion and nonlinear drift coefficients. arXiv preprint arXiv:1809.07321, 2018.
  • Khoo et al. [2017] Yuehaw Khoo, Jianfeng Lu, and Lexing Ying. Solving parametric pde problems with artificial neural networks. arXiv preprint arXiv:1707.03351, 2017.
  • Krizhevsky et al. [2012] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25, pages 1097–1105. Curran Associates, Inc., 2012.
  • Kutyniok et al. [2019] Gitta Kutyniok, Philipp Petersen, Mones Raslan, and Reinhold Schneider. A theoretical analysis of deep neural networks and parametric pdes. arXiv preprint arXiv:1904.00377, 2019.
  • Lagaris et al. [1998] Isaac E Lagaris, Aristidis Likas, and Dimitrios I Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE transactions on neural networks, 9(5):987–1000, 1998.
  • Lagaris et al. [2000] Isaac E Lagaris, Aristidis C Likas, and Dimitris G Papageorgiou. Neural-network methods for boundary value problems with irregular boundaries. IEEE Transactions on Neural Networks, 11(5):1041–1049, 2000.
  • Lax and Milgram [1954] Peter D Lax and Arthur N Milgram. Parabolic equations, volume 33 of annals of mathematics studies, 1954.
  • LeVeque [2007] Randall J LeVeque. Finite difference methods for ordinary and partial differential equations: steady-state and time-dependent problems. SIAM, 2007.
  • Li et al. [2020a] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895, 2020a.
  • Li et al. [2020b] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Graph kernel network for partial differential equations. arXiv preprint arXiv:2003.03485, 2020b.
  • Malek and Beidokhti [2006] Alaeddin Malek and R Shekari Beidokhti. Numerical solution for high order differential equations using a hybrid neural network—optimization method. Applied Mathematics and Computation, 183(1):260–271, 2006.
  • Neuberger [2009] John Neuberger. Sobolev gradients and differential equations. Springer Science & Business Media, 2009.
  • Nirenberg [1955] Louis Nirenberg. Remarks on strongly elliptic partial differential equations. Communications on pure and applied mathematics, 8(4):648–674, 1955.
  • Oksendal [2013] Bernt Oksendal. Stochastic differential equations: an introduction with applications. Springer Science & Business Media, 2013.
  • Özişik et al. [2017] M Necati Özişik, Helcio RB Orlande, Marcelo J Colaço, and Renato M Cotta. Finite difference methods in heat transfer. CRC press, 2017.
  • Radford et al. [2015] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
  • Raissi et al. [2017] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics informed deep learning (part i): Data-driven solutions of nonlinear partial differential equations. arXiv preprint arXiv:1711.10561, 2017.
  • Raissi et al. [2019] Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
  • Rumelhart et al. [1986] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.
  • Sirignano and Spiliopoulos [2018] Justin Sirignano and Konstantinos Spiliopoulos. Dgm: A deep learning algorithm for solving partial differential equations. Journal of computational physics, 375:1339–1364, 2018.
  • Temam [2001] Roger Temam. Navier-Stokes equations: theory and numerical analysis, volume 343. American Mathematical Soc., 2001.
  • Vaswani et al. [2017] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008, 2017.
  • Yarotsky [2017] Dmitry Yarotsky. Error bounds for approximations with deep relu networks. Neural Networks, 94:103–114, 2017.

Appendix A Brief Overview of Partial Differential Equations

In this section, we introduce a few key definitions and results from the PDE literature. We note that the results in this section are standard and have been included in the appendix for completeness. We refer the reader to classical texts on PDEs [Evans, 1998; Gilbarg and Trudinger, 2001] for more details.

We will use the following Poincaré inequality throughout our proofs.

Theorem A.1 (Poincaré inequality).

Given Ωd\Omega\subset\mathbb{R}^{d}, a bounded open subset, there exists a constant Cp>0C_{p}>0 such that for all uH01(Ω)u\in H_{0}^{1}(\Omega)

uL2(Ω)CpuL2(Ω).\|u\|_{L^{2}(\Omega)}\leq C_{p}\|\nabla u\|_{L^{2}(\Omega)}.
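For intuition, a standard one-dimensional example (included purely for illustration): for $\Omega=(0,1)$ and $u(x)=\sin(\pi x)\in H^{1}_{0}(\Omega)$ we have $\|u\|_{L^{2}(\Omega)}=1/\sqrt{2}$ and $\|\nabla u\|_{L^{2}(\Omega)}=\pi/\sqrt{2}$, so the inequality holds for this $u$ with $C_{p}=1/\pi$; indeed, $1/\pi$ is the optimal Poincaré constant for this domain.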
Corollary A.1.

For the bounded open subset Ωd\Omega\subset\mathbb{R}^{d}, for all uH01(Ω)u\in H^{1}_{0}(\Omega), we define the norm in the Hilbert space H01(Ω)H^{1}_{0}(\Omega) as

uH01(Ω)=uL2(Ω).\|u\|_{H^{1}_{0}(\Omega)}=\|\nabla u\|_{L^{2}(\Omega)}. (14)

Further, the norm in H01(Ω)H^{1}_{0}(\Omega) is equivalent to the norm in H1(Ω)H^{1}(\Omega).

Proof.

Note that for uH01(Ω)u\in H^{1}_{0}(\Omega) we have,

uH1(Ω)\displaystyle\|u\|_{H^{1}(\Omega)} =uL2(Ω)+uL2(Ω)\displaystyle=\|\nabla u\|_{L^{2}(\Omega)}+\|u\|_{L^{2}(\Omega)}
uL2(Ω)\displaystyle\geq\|\nabla u\|_{L^{2}(\Omega)}
uH1(Ω)\displaystyle\implies\|u\|_{H^{1}(\Omega)} uH01(Ω).\displaystyle\geq\|u\|_{H^{1}_{0}(\Omega)}.

where we have used the definition of the norm in the H01(Ω)H^{1}_{0}(\Omega) space.

Further, using the result in Theorem A.1 we have

\displaystyle\|u\|^{2}_{H^{1}(\Omega)}=\|u\|^{2}_{L^{2}(\Omega)}+\|\nabla u\|^{2}_{L^{2}(\Omega)}\leq\left(C_{p}^{2}+1\right)\|\nabla u\|^{2}_{L^{2}(\Omega)}=\left(C_{p}^{2}+1\right)\|u\|^{2}_{H^{1}_{0}(\Omega)} (15)

Therefore, combining the two inequalities we have

uH01(Ω)uH1(Ω)ChuH01(Ω)\|u\|_{H^{1}_{0}(\Omega)}\leq\|u\|_{H^{1}(\Omega)}\leq C_{h}\|u\|_{H^{1}_{0}(\Omega)} (16)

where Ch=(Cp2+1)C_{h}=(C_{p}^{2}+1). Hence the norms in the H01(Ω)H^{1}_{0}(\Omega) and H1(Ω)H^{1}(\Omega) spaces are equivalent. ∎

Proposition A.1 (Equivalence between L2(Ω)L^{2}(\Omega) and H01(Ω)H^{1}_{0}(\Omega) norms).

If vΦkv\in\Phi_{k} then we have that vL2(Ω)\|v\|_{L^{2}(\Omega)} is equivalent to vH01(Ω)\|v\|_{H^{1}_{0}(\Omega)}.

Proof.

We have from the Poincaré inequality in Theorem A.1 that for all vH01(Ω)v\in H^{1}_{0}(\Omega), the norm in L2(Ω)L^{2}(\Omega) is upper bounded (up to the constant CpC_{p}) by the norm in H01(Ω)H^{1}_{0}(\Omega), i.e.,

\displaystyle\|v\|^{2}_{L^{2}(\Omega)}\leq C_{p}^{2}\|v\|^{2}_{H^{1}_{0}(\Omega)}

Further, using results from (18) and (17) (where b(u,v):=Lu,vL2(Ω)b(u,v):=\langle Lu,v\rangle_{L^{2}(\Omega)}), we know that for all vH01(Ω)v\in H^{1}_{0}(\Omega) we have

mvH01(Ω)2Lv,vL2(Ω)max{M,CpcL(Ω)}vH01(Ω)2\displaystyle m\|v\|_{H^{1}_{0}(\Omega)}^{2}\leq\langle Lv,v\rangle_{L^{2}(\Omega)}\leq\max\{M,C_{p}\|c\|_{L^{\infty}(\Omega)}\}\|v\|_{H^{1}_{0}(\Omega)}^{2}

This implies that Lu,vL2(Ω)\langle Lu,v\rangle_{L^{2}(\Omega)} is equivalent to the inner product u,vH01(Ω)\langle u,v\rangle_{H^{1}_{0}(\Omega)}, i.e., for all u,vH01(Ω)u,v\in H^{1}_{0}(\Omega),

mu,vH01(Ω)Lu,vL2(Ω)max{M,CpcL(Ω)}u,vH01(Ω)m\langle u,v\rangle_{H^{1}_{0}(\Omega)}\leq\langle Lu,v\rangle_{L^{2}(\Omega)}\leq\max\left\{M,C_{p}\|c\|_{L^{\infty}(\Omega)}\right\}\langle u,v\rangle_{H^{1}_{0}(\Omega)}

Further, since vΦkv\in\Phi_{k}, we have from Lemma 2 that

Lv,vL2(Ω)λkvL2(Ω)2\displaystyle\langle Lv,v\rangle_{L^{2}(\Omega)}\leq\lambda_{k}\|v\|_{L^{2}(\Omega)}^{2}
\displaystyle\implies \displaystyle\|v\|^{2}_{H^{1}_{0}(\Omega)}\leq\frac{\lambda_{k}}{m}\|v\|^{2}_{L^{2}(\Omega)}

Hence we have that for all vΦkv\in\Phi_{k} vL2(Ω)\|v\|_{L^{2}(\Omega)} is equivalent to vH01(Ω)\|v\|_{H^{1}_{0}(\Omega)} and by Corollary A.1 is also equivalent to vH1(Ω)\|v\|_{H^{1}(\Omega)}. ∎

We now introduce a form for Lu,vL2(Ω)\langle Lu,v\rangle_{L^{2}(\Omega)} that is more amenable to the existence and uniqueness results.

Lemma A.1.

For all u,vH01(Ω)u,v\in H^{1}_{0}(\Omega), we have the following,

  1. 1.

    The inner product Lu,vL2(Ω)\langle Lu,v\rangle_{L^{2}(\Omega)} equals,

    Lu,vL2(Ω)=Ω(Auv+cuv)𝑑x\langle Lu,v\rangle_{L^{2}(\Omega)}=\int_{\Omega}\left(A\nabla u\cdot\nabla v+cuv\right)\;dx
  2. 2.

    The operator LL is self-adjoint.

Proof.
  1. 1.

    We will be using the following integration by parts formula,

\int_{\Omega}\frac{\partial u}{\partial x_{i}}v\,dx=-\int_{\Omega}u\frac{\partial v}{\partial x_{i}}\,dx+\int_{\partial\Omega}uvn_{i}\,d\Gamma

where nin_{i} is the ii-th component of the outward unit normal at the boundary and dΓd\Gamma denotes an infinitesimal boundary element.

    Hence we have for all u,vH01(Ω)u,v\in H^{1}_{0}(\Omega),

    Lu,vL2(Ω)\displaystyle\langle Lu,v\rangle_{L^{2}(\Omega)} =Ω(i=1d(i(Au)i))v+cuvdx\displaystyle=\int_{\Omega}-\left(\sum_{i=1}^{d}\left(\partial_{i}\left(A\nabla u\right)_{i}\right)\right)v+cuv\;dx
    =ΩAuvdxΩ(i=1d(Au)ini)v𝑑Γ+Ωcuv𝑑x\displaystyle=\int_{\Omega}A\nabla u\cdot\nabla vdx-\int_{\partial\Omega}\left(\sum_{i=1}^{d}(A\nabla u)_{i}n_{i}\right)vd\Gamma+\int_{\Omega}cuvdx
    =ΩAuvdx+Ωcuvdx(v|Ω=0)\displaystyle=\int_{\Omega}A\nabla u\cdot\nabla vdx+\int_{\Omega}cuvdx\qquad(\because v_{|\partial\Omega}=0)
  2. 2.

    To show that the operator L:H01(Ω)H01(Ω)L:H^{1}_{0}(\Omega)\to H^{1}_{0}(\Omega) is self-adjoint, we show that for all u,vH01(Ω)u,v\in H^{1}_{0}(\Omega) we have Lu,v=u,Lv\langle Lu,v\rangle=\langle u,Lv\rangle.

Using the integral form derived in part 1, for functions u,vH01(Ω)u,v\in H^{1}_{0}(\Omega) we have

    Lu,vL2(Ω)\displaystyle\langle Lu,v\rangle_{L^{2}(\Omega)} =ΩAuvdx+Ωcuv𝑑x\displaystyle=\int_{\Omega}A\nabla u\cdot\nabla vdx+\int_{\Omega}cuvdx
    =ΩAvudx+Ωcvu𝑑x\displaystyle=\int_{\Omega}A\nabla v\cdot\nabla udx+\int_{\Omega}cvudx
    =u,Lv\displaystyle=\langle u,Lv\rangle
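As a purely illustrative aside (our own toy discretization, not used anywhere in the analysis), self-adjointness is mirrored at the discrete level: the standard symmetric finite-difference stencil for a one-dimensional operator Lu = -(a u')' + c u with zero Dirichlet boundary conditions produces a symmetric matrix. A minimal Python sketch:

import numpy as np

# Toy 1-D discretization (illustration only): the symmetric finite-difference stencil
# for L u = -(a u')' + c u with zero Dirichlet boundary conditions gives a symmetric
# matrix, mirroring the fact that L is self-adjoint on H_0^1.
n = 40
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
a = lambda s: 1.0 + 0.5 * np.sin(2 * np.pi * s)   # coefficient a(x) > 0
c = lambda s: 1.0 + s ** 2                        # coefficient c(x) > 0

L = np.zeros((n, n))
for i in range(n):
    a_plus, a_minus = a(x[i] + h / 2), a(x[i] - h / 2)
    L[i, i] = (a_plus + a_minus) / h ** 2 + c(x[i])
    if i + 1 < n:
        L[i, i + 1] = -a_plus / h ** 2
    if i > 0:
        L[i, i - 1] = -a_minus / h ** 2

print(np.allclose(L, L.T))                        # prints True: the discrete operator is symmetric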

A.1 Proof of Proposition 1

We first show that if uu is the weak solution, then it minimizes the variational objective JJ.

Let uu denote the weak solution, and for any wH01(Ω)w\in H^{1}_{0}(\Omega) let v=u+wv=u+w. Using the fact that LL is self-adjoint (as shown in Lemma A.1), we have

J(v)=J(u+w)\displaystyle J(v)=J(u+w) =12L(u+w),(u+w)L2(Ω)f,u+wL2(Ω)\displaystyle=\frac{1}{2}\langle L(u+w),(u+w)\rangle_{L^{2}(\Omega)}-\langle f,u+w\rangle_{L^{2}(\Omega)}
=12Lu,uL2(Ω)+12Lw,wL2(Ω)+Lu,wL2(Ω)f,uL2(Ω)f,wL2(Ω)\displaystyle=\frac{1}{2}\langle Lu,u\rangle_{L^{2}(\Omega)}+\frac{1}{2}\langle Lw,w\rangle_{L^{2}(\Omega)}+\langle Lu,w\rangle_{L^{2}(\Omega)}-\langle f,u\rangle_{L^{2}(\Omega)}-\langle f,w\rangle_{L^{2}(\Omega)}
=J(u)+12Lw,wL2(Ω)+Lu,wL2(Ω)f,wL2(Ω)\displaystyle=J(u)+\frac{1}{2}\langle Lw,w\rangle_{L^{2}(\Omega)}+\langle Lu,w\rangle_{L^{2}(\Omega)}-\langle f,w\rangle_{L^{2}(\Omega)}
J(u)\displaystyle\geq J(u)

where we use the fact that \langle Lw,w\rangle_{L^{2}(\Omega)}\geq 0 and that, since uu is a weak solution, (1) gives \langle Lu,w\rangle_{L^{2}(\Omega)}=\langle f,w\rangle_{L^{2}(\Omega)} for all wH01(Ω)w\in H^{1}_{0}(\Omega).

To show the other direction, assume that uu minimizes JJ, i.e., for all λ>0\lambda>0 and vH01(Ω)v\in H^{1}_{0}(\Omega) we have J(u+λv)J(u)J(u+\lambda v)\geq J(u). Then,

J(u+λv)J(u)\displaystyle J(u+\lambda v)\geq J(u)
12L(u+λv),(u+λv)L2(Ω)f,(u+λv)L2(Ω)12Lu,uL2(Ω)f,uL2(Ω)\displaystyle\frac{1}{2}\langle L(u+\lambda v),(u+\lambda v)\rangle_{L^{2}(\Omega)}-\langle f,(u+\lambda v)\rangle_{L^{2}(\Omega)}\geq\frac{1}{2}\langle Lu,u\rangle_{L^{2}(\Omega)}-\langle f,u\rangle_{L^{2}(\Omega)}
\displaystyle\implies \displaystyle\frac{\lambda}{2}\langle Lv,v\rangle_{L^{2}(\Omega)}+\langle Lu,v\rangle_{L^{2}(\Omega)}-\langle f,v\rangle_{L^{2}(\Omega)}\geq 0\qquad\text{(dividing both sides by }\lambda>0\text{)}

Taking λ0\lambda\to 0, we get

Lu,vL2(Ω)f,vL2(Ω)0\langle Lu,v\rangle_{L^{2}(\Omega)}-\langle f,v\rangle_{L^{2}(\Omega)}\geq 0

and also taking vv as v-v, we have

Lu,vL2(Ω)f,vL2(Ω)0\langle Lu,v\rangle_{L^{2}(\Omega)}-\langle f,v\rangle_{L^{2}(\Omega)}\leq 0

Together, this implies that if uu is the solution to (2), then uu is also the weak solution, i.e., for all vH01(Ω)v\in H^{1}_{0}(\Omega) we have

Lu,vL2(Ω)=f,vL2(Ω)\langle Lu,v\rangle_{L^{2}(\Omega)}=\langle f,v\rangle_{L^{2}(\Omega)}

Proof for Existence and Uniqueness of the Solution

In order to prove the existence and uniqueness of the solution, we first state the Lax-Milgram theorem.

Theorem A.2 (Lax-Milgram, Lax and Milgram [1954]).

Let \mathcal{H} be a Hilbert space with inner-product (,):×(\cdot,\cdot):\mathcal{H}\times\mathcal{H}\rightarrow\mathbb{R}, and let b:×b:\mathcal{H}\times\mathcal{H}\rightarrow\mathbb{R} and l:l:\mathcal{H}\rightarrow\mathbb{R} be a bilinear form and a linear form, respectively. Assume that there exist constants C1,C2,C3>0C_{1},C_{2},C_{3}>0 such that for all u,vu,v\in\mathcal{H} we have,

C1u2b(u,u),|b(u,v)|C2uv,and |l(u)|C3u.C_{1}\|u\|^{2}_{\mathcal{H}}\leq b(u,u),\quad|b(u,v)|\leq C_{2}\|u\|_{\mathcal{H}}\|v\|_{\mathcal{H}},\quad\text{and\;}|l(u)|\leq C_{3}\|u\|_{\mathcal{H}}.

Then there exists a unique uu\in\mathcal{H} such that,

b(u,v)=l(v)for allv.b(u,v)=l(v)\quad\text{for all}\;v\in\mathcal{H}.
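As a simple instance (included only for illustration, and not needed for the proof below): for the Poisson problem $-\Delta u=f$ with $\mathcal{H}=H^{1}_{0}(\Omega)$, the choices $b(u,v)=\int_{\Omega}\nabla u\cdot\nabla v\,dx$ and $l(v)=\int_{\Omega}fv\,dx$ satisfy the three conditions with $C_{1}=C_{2}=1$ (by the definition of the $H^{1}_{0}(\Omega)$ norm and Cauchy-Schwarz) and $C_{3}=C_{p}\|f\|_{L^{2}(\Omega)}$ (by the Poincaré inequality), so a unique weak solution exists.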

With the Lax-Milgram theorem in hand, we state the following proposition.

Proposition A.2.

Under Assumptions (i)-(iii), the solution to the variational formulation in (1) exists and is unique.

Proof.

Using the variational formulation defined in (1), we introduce the bilinear form b(,):H01(Ω)×H01(Ω)b(\cdot,\cdot):H^{1}_{0}(\Omega)\times H^{1}_{0}(\Omega)\to\mathbb{R} where b(u,v):=Lu,vb(u,v):=\langle Lu,v\rangle, and the linear form l(v):=\langle f,v\rangle_{L^{2}(\Omega)}. We prove the proposition by showing that bb and ll satisfy the conditions in Theorem A.2.

We first show that for all u,vH01(Ω)u,v\in H^{1}_{0}(\Omega) the following holds,

|b(u,v)|\displaystyle|b(u,v)| =|Ω(Auv+cuv)𝑑x|\displaystyle=\left|\int_{\Omega}\left(A\nabla u\cdot\nabla v+cuv\right)dx\right|
Ω|(Auv+cuv)|𝑑x\displaystyle\leq\int_{\Omega}\left|\left(A\nabla u\cdot\nabla v+cuv\right)\right|dx
Ω|Auv|𝑑x+Ω|cuv|𝑑x\displaystyle\leq\int_{\Omega}\left|A\nabla u\cdot\nabla v\right|dx+\int_{\Omega}\left|cuv\right|dx
\displaystyle\leq\|A\|_{L^{\infty}(\Omega)}\|\nabla u\|_{L^{2}(\Omega)}\|\nabla v\|_{L^{2}(\Omega)}+\|c\|_{L^{\infty}(\Omega)}\|u\|_{L^{2}(\Omega)}\|v\|_{L^{2}(\Omega)}
MuL2(Ω)vL2(Ω)+cL(Ω)uL2(Ω)vL2(Ω)\displaystyle\leq M\|\nabla u\|_{L^{2}(\Omega)}\|\nabla v\|_{L^{2}(\Omega)}+\|c\|_{L^{\infty}(\Omega)}\|u\|_{L^{2}(\Omega)}\|v\|_{L^{2}(\Omega)}
max{M,CpcL(Ω)}uH01(Ω)vH01(Ω)\displaystyle\leq\max\left\{M,C_{p}\|c\|_{L^{\infty}(\Omega)}\right\}\|u\|_{H^{1}_{0}(\Omega)}\|v\|_{H^{1}_{0}(\Omega)} (17)

Now we show that the bilinear form b(v,v)b(v,v) is lower bounded (i.e., bb is coercive).

b(v,v)\displaystyle b(v,v) =Ω(Avv+cv2)𝑑x\displaystyle=\int_{\Omega}\left(A\nabla v\cdot\nabla v+cv^{2}\right)dx
\displaystyle\geq m\int_{\Omega}\|\nabla v\|^{2}dx=m\|v\|^{2}_{H^{1}_{0}(\Omega)} (18)

Finally, for vH01(Ω)v\in H^{1}_{0}(\Omega)

|(f,v)|=|Ωfv𝑑x|fL2(Ω)vL2(Ω)CpfL2(Ω)vH01(Ω)\displaystyle|(f,v)|=\left|\int_{\Omega}fvdx\right|\leq\|f\|_{L^{2}(\Omega)}\|v\|_{L^{2}(\Omega)}\leq C_{p}\|f\|_{L^{2}(\Omega)}\|v\|_{H^{1}_{0}(\Omega)}

Hence, the assumptions required in Theorem A.2 are satisfied, and therefore the variational problem defined in (1) has a unique solution. ∎

Appendix B Perturbation Analysis

B.1 Proof of Lemma 3

Proof.

Applying the triangle inequality to the error between uu^{\star} and u~span\tilde{u}_{\operatorname{span}}^{\star}, we have

uu~spanL2(Ω)uuspanL2(Ω)(I)+uspanu~spanL2(Ω)(II)\|u^{\star}-\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)}\leq\underbrace{\|u^{\star}-u_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)}}_{(I)}+\underbrace{\|u_{\operatorname{span}}^{\star}-\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)}}_{(II)} (19)

where uspanu_{\operatorname{span}}^{\star} is the solution to the PDE Lu=fspanLu=f_{\operatorname{span}}.

In order to bound Term (I), we use the inequality in (2) to get,

uuspanL2(Ω)2\displaystyle\|u^{\star}-u_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)}^{2} 1λ1L(uuspan),uuspanL2(Ω)\displaystyle\leq\frac{1}{\lambda_{1}}\langle L(u^{\star}-u_{\operatorname{span}}^{\star}),u^{\star}-u_{\operatorname{span}}^{\star}\rangle_{L^{2}(\Omega)}
=1λ1ffspan,uuspanL2(Ω)\displaystyle=\frac{1}{\lambda_{1}}\langle f-f_{\operatorname{span}},u^{\star}-u_{\operatorname{span}}^{\star}\rangle_{L^{2}(\Omega)}
1λ1ffspanL2(Ω)uuspanL2(Ω)\displaystyle\leq\frac{1}{\lambda_{1}}\|f-f_{\operatorname{span}}\|_{L^{2}(\Omega)}\|u^{\star}-u_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)}
uuspanL2(Ω)\displaystyle\implies\|u^{\star}-u_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)} 1λ1ffspanL2(Ω)ϵspanλ1\displaystyle\leq\frac{1}{\lambda_{1}}\|f-f_{\operatorname{span}}\|_{L^{2}(\Omega)}\leq\frac{\epsilon_{\operatorname{span}}}{\lambda_{1}} (20)

We now bound Term (II).

First we introduce an intermediate PDE Lu=f~spanLu=\tilde{f}_{\operatorname{span}}, and denote its solution by u~\tilde{u}. Using the triangle inequality again, Term (II) can be bounded as follows:

uspanu~spanL2(Ω)uspanu~L2(Ω)+u~u~spanL2(Ω)\|u_{\operatorname{span}}^{\star}-\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)}\leq\|u_{\operatorname{span}}^{\star}-\tilde{u}\|_{L^{2}(\Omega)}+\|\tilde{u}-\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)} (21)

We will tackle the second term in (21) first.

Using u~=L1f~span\tilde{u}=L^{-1}\tilde{f}_{\operatorname{span}} and u~span=L~1f~span\tilde{u}_{\operatorname{span}}^{\star}=\tilde{L}^{-1}\tilde{f}_{\operatorname{span}},

u~u~spanL2(Ω)\displaystyle\|\tilde{u}-\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)} =(L1L~1)f~spanL2(Ω)\displaystyle=\|(L^{-1}-\tilde{L}^{-1})\tilde{f}_{\operatorname{span}}\|_{L^{2}(\Omega)}
=(L1L~I)L~1f~spanL2(Ω)\displaystyle=\|(L^{-1}\tilde{L}-I)\tilde{L}^{-1}\tilde{f}_{\operatorname{span}}\|_{L^{2}(\Omega)}
u~u~spanL2(Ω)\displaystyle\implies\|\tilde{u}-\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)} =(L1L~I)u~spanL2(Ω)\displaystyle=\|(L^{-1}\tilde{L}-I)\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)} (22)

Therefore, using the inequality in Lemma 5 part (2.), we can upper bound (22) to get

u~u~spanL2(Ω)δu~spanL2(Ω)\|\tilde{u}-\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)}\leq\delta\|\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)} (23)

where δ=max{ϵAm,ϵcζ}\delta=\max\left\{\frac{\epsilon_{A}}{m},\frac{\epsilon_{c}}{\zeta}\right\}.

Proceeding to the first term in (21), using Lemma 4, and the inequality in (2), the term uspanu~L2(Ω)\|u_{\operatorname{span}}^{\star}-\tilde{u}\|_{L^{2}(\Omega)} can be upper bounded by,

uspanu~L2(Ω)2\displaystyle\|u_{\operatorname{span}}^{\star}-\tilde{u}\|^{2}_{L^{2}(\Omega)} 1λ1L(uspanu~),uspanu~L2(Ω)\displaystyle\leq\frac{1}{\lambda_{1}}\langle L(u_{\operatorname{span}}^{\star}-\tilde{u}),u_{\operatorname{span}}^{\star}-\tilde{u}\rangle_{L^{2}(\Omega)}
1λ1fspanf~span,uspanu~L2(Ω)\displaystyle\leq\frac{1}{\lambda_{1}}\langle f_{\operatorname{span}}-\tilde{f}_{\operatorname{span}},u_{\operatorname{span}}^{\star}-\tilde{u}\rangle_{L^{2}(\Omega)}
1λ1fspanf~spanL2(Ω)uspanu~L2(Ω)\displaystyle\leq\frac{1}{\lambda_{1}}\|f_{\operatorname{span}}-\tilde{f}_{\operatorname{span}}\|_{L^{2}(\Omega)}\|u_{\operatorname{span}}^{\star}-\tilde{u}\|_{L^{2}(\Omega)}
uspanu~L2(Ω)\displaystyle\implies\|u_{\operatorname{span}}^{\star}-\tilde{u}\|_{L^{2}(\Omega)} 1λ1fspanf~spanL2(Ω)δλ1fL2(Ω)γδ\displaystyle\leq\frac{1}{\lambda_{1}}\|f_{\operatorname{span}}-\tilde{f}_{\operatorname{span}}\|_{L^{2}(\Omega)}\leq\frac{\delta}{\lambda_{1}}\cdot\frac{\|f\|_{L^{2}(\Omega)}}{\gamma-\delta} (24)

Therefore Term (II), i.e., uspanu~spanL2(Ω)\|u_{\operatorname{span}}^{\star}-\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)} can be upper bounded by

uspanu~spanL2(Ω)uspanu~L2(Ω)+u~u~spanL2(Ω)ϵ^fλ1+δu~spanL2(Ω)\|u_{\operatorname{span}}^{\star}-\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)}\leq\|u_{\operatorname{span}}^{\star}-\tilde{u}\|_{L^{2}(\Omega)}+\|\tilde{u}-\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)}\leq\frac{\hat{\epsilon}_{f}}{\lambda_{1}}+\delta\|\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)} (25)

Putting everything together, we can upper bound (19) as

uu~spanL2(Ω)\displaystyle\|u^{\star}-\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)} uuspanL2(Ω)+uspanu~spanL2(Ω)\displaystyle\leq\|u^{\star}-u_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)}+\|u_{\operatorname{span}}^{\star}-\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)}
ϵspanλ1+δλ1fL2(Ω)γδ+δu~spanL2(Ω)\displaystyle\leq\frac{\epsilon_{\operatorname{span}}}{\lambda_{1}}+\frac{\delta}{\lambda_{1}}\frac{\|f\|_{L^{2}(\Omega)}}{\gamma-\delta}+\delta\|\tilde{u}_{\operatorname{span}}^{\star}\|_{L^{2}(\Omega)}

where γ=1λk1λk+1\gamma=\frac{1}{\lambda_{k}}-\frac{1}{\lambda_{k+1}} and δ=max{ϵAm,ϵcζ}\delta=\max\left\{\frac{\epsilon_{A}}{m},\frac{\epsilon_{c}}{\zeta}\right\}. ∎

B.2 Proof of Lemma 10

Proof.

We define r=f~spanfnnr=\tilde{f}_{\operatorname{span}}-f_{\operatorname{nn}}; from Lemma C.2 we then have that for any tt\in\mathbb{N},

L~trL2(Ω)(t!)2Ct(ϵnn+ϵspan)+4(1+δγδ)λktfspanL2(Ω).\|\tilde{L}^{t}r\|_{L^{2}(\Omega)}\leq(t!)^{2}\cdot C^{t}\left(\epsilon_{\operatorname{nn}}+\epsilon_{\operatorname{span}}\right)+4\left(1+\frac{\delta}{\gamma-\delta}\right)\lambda_{k}^{t}\|f_{\operatorname{span}}\|_{L^{2}(\Omega)}.

For every tt\in\mathbb{N}, we will write ut=u^t+rtu_{t}=\hat{u}_{t}+r_{t}, s.t. u^t\hat{u}_{t} is a neural network and we (iteratively) bound rtL2(Ω)\|r_{t}\|_{L^{2}(\Omega)}. Precisely, we define a sequence of neural networks {u^t}t=0\{\hat{u}_{t}\}_{t=0}^{\infty}, s.t.

{u^0=u0,u^t+1=u^tη(L~u^tfnn)\begin{cases}\hat{u}_{0}=u_{0},\\ \hat{u}_{t+1}=\hat{u}_{t}-\eta\left(\tilde{L}\hat{u}_{t}-f_{\operatorname{nn}}\right)\end{cases}

Since rt=utu^tr_{t}=u_{t}-\hat{u}_{t}, we can define a corresponding recurrence for rtr_{t}:

{r0=0,rt+1=(IηL~)rtr\begin{cases}r_{0}=0,\\ r_{t+1}=(I-\eta\tilde{L})r_{t}-r\end{cases}

Unfolding the recurrence, we get

rt+1\displaystyle r_{t+1} =i=0t(IηL~)ir\displaystyle=\sum_{i=0}^{t}(I-\eta\tilde{L})^{i}r (26)

Using the binomial expansion we can write:

(IηL~)tr\displaystyle(I-\eta\tilde{L})^{t}r =i=0t(ti)(1)i(ηL~)ir\displaystyle=\sum_{i=0}^{t}\binom{t}{i}(-1)^{i}(\eta\tilde{L})^{i}r
(IηL~)(t)rL2(Ω)\displaystyle\implies\|(I-\eta\tilde{L})^{(t)}r\|_{L^{2}(\Omega)} =i=0t(ti)(1)i(ηL~)irL2(Ω)\displaystyle=\left\|\sum_{i=0}^{t}\binom{t}{i}(-1)^{i}(\eta\tilde{L})^{i}r\right\|_{L^{2}(\Omega)}
i=0t(ti)ηiL~irL2(Ω)\displaystyle\leq\sum_{i=0}^{t}\binom{t}{i}\eta^{i}\|\tilde{L}^{i}r\|_{L^{2}(\Omega)}
i=0t(tei)iηiL~irL2(Ω)(ti)(tei)i\displaystyle\leq\sum_{i=0}^{t}\left(\frac{te}{i}\right)^{i}\eta^{i}\|\tilde{L}^{i}r\|_{L^{2}(\Omega)}\qquad\because\text{$\binom{t}{i}\leq\left(\frac{te}{i}\right)^{i}$}
(1)i=0t(teiη)i((i!)2Ci(ϵnn+ϵspan)+4(1+δγδ)λkifspanL2(Ω))\displaystyle\stackrel{{\scriptstyle(1)}}{{\leq}}\sum_{i=0}^{t}\left(\frac{te}{i}\eta\right)^{i}\left((i!)^{2}C^{i}\left(\epsilon_{\operatorname{nn}}+\epsilon_{\operatorname{span}}\right)+4\left(1+\frac{\delta}{\gamma-\delta}\right)\lambda_{k}^{i}\|f_{\operatorname{span}}\|_{L^{2}(\Omega)}\right)
i=0t(teiη)i(i!)2Ci((ϵnn+ϵspan)+4(1+δγδ)λki(i!)2CifspanL2(Ω))\displaystyle\leq\sum_{i=0}^{t}\left(\frac{te}{i}\eta\right)^{i}(i!)^{2}C^{i}\left(\left(\epsilon_{\operatorname{nn}}+\epsilon_{\operatorname{span}}\right)+4\left(1+\frac{\delta}{\gamma-\delta}\right)\frac{\lambda_{k}^{i}}{(i!)^{2}C^{i}}\|f_{\operatorname{span}}\|_{L^{2}(\Omega)}\right)
(2)i=0t(teiηi2C)i((ϵnn+ϵspan)+4(1+δγδ)λki(i!)2CifspanL2(Ω))\displaystyle\stackrel{{\scriptstyle(2)}}{{\leq}}\sum_{i=0}^{t}\left(\frac{te}{i}\eta i^{2}C\right)^{i}\left(\left(\epsilon_{\operatorname{nn}}+\epsilon_{\operatorname{span}}\right)+4\left(1+\frac{\delta}{\gamma-\delta}\right)\frac{\lambda_{k}^{i}}{(i!)^{2}C^{i}}\|f_{\operatorname{span}}\|_{L^{2}(\Omega)}\right)
(3)i=0t(teiηi2C)i((ϵnn+ϵspan)+4(1+δγδ)λkifspanL2(Ω))\displaystyle\stackrel{{\scriptstyle(3)}}{{\leq}}\sum_{i=0}^{t}\left(\frac{te}{i}\eta i^{2}C\right)^{i}\left(\left(\epsilon_{\operatorname{nn}}+\epsilon_{\operatorname{span}}\right)+4\left(1+\frac{\delta}{\gamma-\delta}\right)\lambda_{k}^{i}\|f_{\operatorname{span}}\|_{L^{2}(\Omega)}\right)
i=0t(tieηC)i((ϵnn+ϵspan)+4(1+δγδ)λkifspanL2(Ω))\displaystyle\leq\sum_{i=0}^{t}(tie\eta C)^{i}\left(\left(\epsilon_{\operatorname{nn}}+\epsilon_{\operatorname{span}}\right)+4\left(1+\frac{\delta}{\gamma-\delta}\right)\lambda_{k}^{i}\|f_{\operatorname{span}}\|_{L^{2}(\Omega)}\right)
tmax{1,(t2eηC)t}((ϵnn+ϵspan)+4(1+δγδ)λktfspanL2(Ω))\displaystyle\leq t\max\{1,(t^{2}e\eta C)^{t}\}\left(\left(\epsilon_{\operatorname{nn}}+\epsilon_{\operatorname{span}}\right)+4\left(1+\frac{\delta}{\gamma-\delta}\right)\lambda_{k}^{t}\|f_{\operatorname{span}}\|_{L^{2}(\Omega)}\right)

Here inequality (1)(1) follows from the bound derived in Lemma C.2. Further, in (2)(2) we use that for all ii\in\mathbb{N} we have i!iii!\leq i^{i}, and inequality (3)(3) follows from the fact that 1(i!)2Ci1\frac{1}{(i!)^{2}C^{i}}\leq 1.

Hence we have the final upper bound:

rtL2(Ω)t2max{1,(t2eηC)t}(ϵnn+ϵspan+4(1+δγδ)λktfspanL2(Ω))\displaystyle\|r_{t}\|_{L^{2}(\Omega)}\leq t^{2}\max\{1,(t^{2}e\eta C)^{t}\}\left(\epsilon_{\operatorname{nn}}+\epsilon_{\operatorname{span}}+4\left(1+\frac{\delta}{\gamma-\delta}\right)\lambda_{k}^{t}\|f_{\operatorname{span}}\|_{L^{2}(\Omega)}\right)
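For intuition only, here is a small numerical sketch (our own toy example, not the construction used in the proof) of the damped iteration \hat{u}_{t+1}=\hat{u}_{t}-\eta(\tilde{L}\hat{u}_{t}-f_{\operatorname{nn}}), with \tilde{L} replaced by a standard finite-difference Laplacian in one dimension; the iterates contract geometrically toward the solution of Lu=f, up to an error floor determined by the perturbation f-f_{\operatorname{nn}}.

import numpy as np

# Minimal numerical sketch: simulate u_{t+1} = u_t - eta * (L u_t - f_nn) for a
# discretized operator L = -d^2/dx^2 on (0, 1) with zero Dirichlet boundary conditions.
n = 31
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)

# Three-point finite-difference Laplacian, standing in for the elliptic operator.
L = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

f = np.sin(np.pi * x)                                         # right-hand side f
f_nn = f + 1e-3 * np.random.default_rng(0).normal(size=n)     # perturbed stand-in for f_nn

u_star = np.linalg.solve(L, f)              # reference solution of L u = f
eta = 1.0 / np.linalg.eigvalsh(L).max()     # step size below 2 / lambda_max, so the iteration is stable

u = np.zeros(n)
for t in range(5000):
    u = u - eta * (L @ u - f_nn)            # one step of the simulated gradient descent

# The error contracts geometrically, up to a floor set by the size of f - f_nn.
print(np.sqrt(h) * np.linalg.norm(u - u_star))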

Appendix C Technical Lemmas: Perturbation Bounds

In this section we introduce some useful lemmas about perturbation bounds used in the preceding parts of the appendix.

First we show a lemma that’s ostensibly an application of Davis-Kahan to the (bounded) operators L1L^{-1} and L~1\tilde{L}^{-1}.

Lemma C.1 (Subspace alignment).

Consider linear elliptic operators LL and L~\tilde{L} with eigenvalues λ1λ2\lambda_{1}\leq\lambda_{2}\leq\cdots and λ~1λ~2\tilde{\lambda}_{1}\leq\tilde{\lambda}_{2}\leq\cdots respectively. Assume that γ:=1λk1λk+1>0\gamma:=\frac{1}{\lambda_{k}}-\frac{1}{\lambda_{k+1}}>0. For any function gH01(Ω)g\in H^{1}_{0}(\Omega), we define Pkg:=i=1kg,φiL2(Ω)φiP_{k}g:=\sum_{i=1}^{k}\langle g,\varphi_{i}\rangle_{L^{2}(\Omega)}\varphi_{i} and P~kg:=i=1kg,φ~iL2(Ω)φ~i\tilde{P}_{k}g:=\sum_{i=1}^{k}\langle g,\tilde{\varphi}_{i}\rangle_{L^{2}(\Omega)}\tilde{\varphi}_{i} as the projections of gg onto Φk\Phi_{k} and Φ~k\tilde{\Phi}_{k}, respectively. Then we have:

PkgP~kgL2(Ω)δγδgL2(Ω)\|P_{k}g-\tilde{P}_{k}g\|_{L^{2}(\Omega)}\leq\frac{\delta}{\gamma-\delta}\|g\|_{L^{2}(\Omega)} (27)

where δ=max{ϵAm,ϵcζ}\delta=\max\left\{\frac{\epsilon_{A}}{m},\frac{\epsilon_{c}}{\zeta}\right\}.

Proof.

We begin the proof by first showing that the inverses of the operators LL and L~\tilde{L} are close. Using the result from Lemma 5 with δ=max{ϵAm,ϵcζ}\delta=\max\left\{\frac{\epsilon_{A}}{m},\frac{\epsilon_{c}}{\zeta}\right\}, we have:

(L1L~I)u,uL2(Ω)δuL2(Ω)2\displaystyle\langle(L^{-1}\tilde{L}-I)u,u\rangle_{L^{2}(\Omega)}\leq\delta\|u\|_{L^{2}(\Omega)}^{2}
(L1L~1)L~u,uL2(Ω)δuL2(Ω)2\displaystyle\implies\langle(L^{-1}-\tilde{L}^{-1})\tilde{L}u,u\rangle_{L^{2}(\Omega)}\leq\delta\|u\|_{L^{2}(\Omega)}^{2}
(L1L~1)v,uL2(Ω)δuL2(Ω)2\displaystyle\implies\langle(L^{-1}-\tilde{L}^{-1})v,u\rangle_{L^{2}(\Omega)}\leq\delta\|u\|_{L^{2}(\Omega)}^{2}

Now, the operator norm L1L~1\|L^{-1}-\tilde{L}^{-1}\| can be written as,

L1L~1=supvH01(Ω)(L1L~1)v,vL2(Ω)vL2(Ω)2δ\|L^{-1}-\tilde{L}^{-1}\|=\sup_{v\in H^{1}_{0}(\Omega)}\frac{\langle(L^{-1}-\tilde{L}^{-1})v,v\rangle_{L^{2}(\Omega)}}{\|v\|^{2}_{L^{2}(\Omega)}}\leq\delta (28)

Further note that, {1λi}i=1\{\frac{1}{\lambda_{i}}\}_{i=1}^{\infty} and {1λ~i}i=1\{\frac{1}{\tilde{\lambda}_{i}}\}_{i=1}^{\infty} are the eigenvalues of the operators L1L^{-1} and L~1\tilde{L}^{-1}, respectively. Therefore from Weyl’s Inequality and (28) we have:

supi|1λi1λ~i|L1L~1δ\sup_{i}\left|\frac{1}{\lambda_{i}}-\frac{1}{\tilde{\lambda}_{i}}\right|\leq\|L^{-1}-\tilde{L}^{-1}\|\leq\delta (29)

Therefore, for all ii\in\mathbb{N}, we have that 1λ~i[1λiδ,1λi+δ]\frac{1}{\tilde{\lambda}_{i}}\in[\frac{1}{\lambda_{i}}-\delta,\frac{1}{\lambda_{i}}+\delta], i.e., every eigenvalue of L~1\tilde{L}^{-1} is within δ\delta of the corresponding eigenvalue of L1L^{-1}, which in turn implies that

1λ~k1λk+11λk1λk+1δ\frac{1}{\tilde{\lambda}_{k}}-\frac{1}{\lambda_{k+1}}\geq\frac{1}{\lambda_{k}}-\frac{1}{\lambda_{k+1}}-\delta

Since the operators L1L^{-1}, L~1\tilde{L}^{-1} are bounded, the Davis-Kahan sinΘ\sin\Theta theorem [Davis and Kahan, 1970] can be used to conclude that:

sinΘ(Φk,Φ~k)=PkP~kL1L~1γδδγδ\|\sin\Theta(\Phi_{k},\tilde{\Phi}_{k})\|=\|P_{k}-\tilde{P}_{k}\|\leq\frac{\|L^{-1}-\tilde{L}^{-1}\|}{\gamma-\delta}\leq\frac{\delta}{\gamma-\delta} (30)

where \|\cdot\| is understood to be the operator norm, and γ=1λk1λk+1\gamma=\frac{1}{\lambda_{k}}-\frac{1}{\lambda_{k+1}}. Therefore for any function gH01(Ω)g\in H^{1}_{0}(\Omega) we have

PkgP~kgL2(Ω)\displaystyle\|P_{k}g-\tilde{P}_{k}g\|_{L^{2}(\Omega)} PkP~kgL2(Ω)\displaystyle\leq\|P_{k}-\tilde{P}_{k}\|\|g\|_{L^{2}(\Omega)}
L1L~1γδgL2(Ω)\displaystyle\leq\frac{\|L^{-1}-\tilde{L}^{-1}\|}{\gamma-\delta}\|g\|_{L^{2}(\Omega)}

By (30), we then get PkgP~kgL2(Ω)δγδgL2(Ω)\|P_{k}g-\tilde{P}_{k}g\|_{L^{2}(\Omega)}\leq\frac{\delta}{\gamma-\delta}\|g\|_{L^{2}(\Omega)}, which finishes the proof. ∎
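As a sanity check (a finite-dimensional toy example of our own; the matrices below merely stand in for the compact operators L^{-1} and \tilde{L}^{-1}), the following sketch compares the top-k spectral projectors of a symmetric matrix and a small perturbation of it against a Davis-Kahan style bound of the form in (30).

import numpy as np

# Toy finite-dimensional check of a Davis-Kahan style bound on spectral projectors.
rng = np.random.default_rng(1)
n, k = 50, 5
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
vals = np.linspace(10.0, 1.0, n)            # well-separated eigenvalues (uniform gap 9/49)
A = Q @ np.diag(vals) @ Q.T                 # stands in for L^{-1}

E = rng.normal(size=(n, n)); E = 1e-3 * (E + E.T) / 2.0
B = A + E                                   # perturbed matrix, stands in for \tilde{L}^{-1}
delta = np.linalg.norm(E, 2)                # operator-norm size of the perturbation

def top_k_projector(M, k):
    w, V = np.linalg.eigh(M)
    Vk = V[:, np.argsort(w)[::-1][:k]]      # eigenvectors of the k largest eigenvalues
    return Vk @ Vk.T

gamma = vals[k - 1] - vals[k]               # eigengap between the k-th and (k+1)-th eigenvalue
lhs = np.linalg.norm(top_k_projector(A, k) - top_k_projector(B, k), 2)
print(lhs, delta / (gamma - delta))         # the projector difference should stay below the bound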

Finally, we show that repeated applications of L~\tilde{L} to fnnfspanf_{\operatorname{nn}}-f_{\operatorname{span}} and fnnf~spanf_{\operatorname{nn}}-\tilde{f}_{\operatorname{span}} also have bounded norms:

Lemma C.2 (Bounding norms of applications of L~\tilde{L}).

The functions fnnf_{\operatorname{nn}}, fspanf_{\operatorname{span}}, and f~span\tilde{f}_{\operatorname{span}} satisfy:

  1. 1.

    L~n(fnnfspan)L2(Ω)(n!)2Cn(ϵspan+ϵnn)\|\tilde{L}^{n}(f_{\operatorname{nn}}-f_{\operatorname{span}})\|_{L^{2}(\Omega)}\leq(n!)^{2}\cdot C^{n}(\epsilon_{\operatorname{span}}+\epsilon_{\operatorname{nn}})

  2. 2.

    L~n(fnnf~span)L2(Ω)(n!)2Cn(ϵspan+ϵnn)+4(1+δγδ)λknfL2(Ω)\|\tilde{L}^{n}(f_{\operatorname{nn}}-\tilde{f}_{\operatorname{span}})\|_{L^{2}(\Omega)}\leq(n!)^{2}\cdot C^{n}(\epsilon_{\operatorname{span}}+\epsilon_{\operatorname{nn}})+4\left(1+\frac{\delta}{\gamma-\delta}\right)\lambda_{k}^{n}\|f\|_{L^{2}(\Omega)}

where δ=max{ϵAm,ϵcζ}\delta=\max\left\{\frac{\epsilon_{A}}{m},\frac{\epsilon_{c}}{\zeta}\right\}.

Proof.

For Part 1, by Lemma D.4 we have that

L~n(fnnfspan)L2(Ω)(n!)2Cnmaxα:|α|n+2α(fnnfspan)L2(Ω)\|\tilde{L}^{n}(f_{\operatorname{nn}}-f_{\operatorname{span}})\|_{L^{2}(\Omega)}\leq(n!)^{2}\cdot C^{n}\max_{\alpha:|\alpha|\leq n+2}\|\partial^{\alpha}(f_{\operatorname{nn}}-f_{\operatorname{span}})\|_{L^{2}(\Omega)} (31)

From Assumptions (i)-(iii), for any multi-index α\alpha we have:

αfnnαfspanL2(Ω)\displaystyle\|\partial^{\alpha}f_{\operatorname{nn}}-\partial^{\alpha}f_{\operatorname{span}}\|_{L^{2}(\Omega)} αfnnαfL2(Ω)+αfαfspanL2(Ω)\displaystyle\leq\|\partial^{\alpha}f_{\operatorname{nn}}-\partial^{\alpha}f\|_{L^{2}(\Omega)}+\|\partial^{\alpha}f-\partial^{\alpha}f_{\operatorname{span}}\|_{L^{2}(\Omega)}
ϵnn+ϵspan\displaystyle\leq\epsilon_{\operatorname{nn}}+\epsilon_{\operatorname{span}} (32)

Combining (31) and (32) we get the result for Part 1.

For Part 2 we have,

L~n(f~spanfnn)L2(Ω)\displaystyle\|\tilde{L}^{n}(\tilde{f}_{\operatorname{span}}-f_{\operatorname{nn}})\|_{L^{2}(\Omega)} =L~n(f~spanfspan+fspanfnn)L2(Ω)\displaystyle=\|\tilde{L}^{n}(\tilde{f}_{\operatorname{span}}-f_{\operatorname{span}}+f_{\operatorname{span}}-f_{\operatorname{nn}})\|_{L^{2}(\Omega)} (33)
L~n(f~spanfspan)L2(Ω)+L~n(fspanfnn)L2(Ω)\displaystyle\leq\|\tilde{L}^{n}(\tilde{f}_{\operatorname{span}}-f_{\operatorname{span}})\|_{L^{2}(\Omega)}+\|\tilde{L}^{n}\left(f_{\operatorname{span}}-f_{\operatorname{nn}}\right)\|_{L^{2}(\Omega)} (34)

Note that from Lemma 5 part (2.) we have that L1L~Iδ\|L^{-1}\tilde{L}-I\|\leq\delta (where \|\cdot\| denotes the operator norm). This implies that there exists an operator Σ\Sigma, such that Σδ\|\Sigma\|\leq\delta and we can express L~\tilde{L} as:

L~=L(I+Σ)\displaystyle\tilde{L}=L(I+\Sigma)

We will show that there exists a Σ~\tilde{\Sigma}, s.t. Σ~2nδ\|\tilde{\Sigma}\|\leq 2n\delta and L~n=(I+Σ~)Ln\tilde{L}^{n}=(I+\tilde{\Sigma})L^{n}. Towards that, we will denote Ln:=L1L1L1n timesL^{-n}:=\underbrace{L^{-1}\circ L^{-1}\circ\cdots L^{-1}}_{\text{n times}} and show that

\left\|L^{-n}\tilde{L}^{n}\right\|\leq 1+2n\delta (35)

We have:

LnL~n\displaystyle\left\|L^{-n}\tilde{L}^{n}\right\| =Ln(L(I+Σ))n\displaystyle=\left\|L^{-n}\left(L(I+\Sigma)\right)^{n}\right\|
=Ln(Ln+j=1nLj1(LΣ)Lnj++(LΣ)n)\displaystyle=\left\|L^{-n}\left(L^{n}+\sum_{j=1}^{n}L^{j-1}\circ(L\circ\Sigma)\circ L^{n-j}+\cdots+(L\circ\Sigma)^{n}\right)\right\|
=I+j=1nLnLj1ΣLnj++Ln(LΣ)(n)\displaystyle=\left\|I+\sum_{j=1}^{n}L^{-n}\circ L^{j-1}\circ\Sigma\circ L^{n-j}+\cdots+L^{-n}\circ(L\circ\Sigma)^{(n)}\right\|
(1)1+j=1nLnLj1ΣLnj++Ln(LΣ)n\displaystyle\stackrel{{\scriptstyle(1)}}{{\leq}}1+\left\|\sum_{j=1}^{n}L^{-n}\circ L^{j-1}\circ\Sigma\circ L^{n-j}\right\|+\cdots+\|L^{-n}\circ(L\circ\Sigma)^{n}\|
(2)1+i=1n(ni)δi\displaystyle\stackrel{{\scriptstyle(2)}}{{\leq}}1+\sum_{i=1}^{n}\binom{n}{i}\delta^{i}
=(1+δ)n\displaystyle=(1+\delta)^{n}
(3)enδ\displaystyle\stackrel{{\scriptstyle(3)}}{{\leq}}e^{n\delta}
1+2nδ\displaystyle\leq 1+2n\delta

where (1) follows from the triangle inequality, (2) follows from Lemma D.5, (3) follows from 1+xex1+x\leq e^{x}, and the last part follows from nδ1/10n\delta\leq 1/10 and Taylor expanding exe^{x}. Next, since LL and L~\tilde{L} are elliptic operators, we have LnL~n=L~nLn\|L^{-n}\tilde{L}^{n}\|=\|\tilde{L}^{n}L^{-n}\|. From this, it immediately follows that there exists a Σ~\tilde{\Sigma}, s.t. L~n=(I+Σ~)Ln\tilde{L}^{n}=(I+\tilde{\Sigma})L^{n} with Σ~2nδ\|\tilde{\Sigma}\|\leq 2n\delta.

Plugging this into the first term of (34), we have

L~n(f~spanfspan)L2(Ω)\displaystyle\|\tilde{L}^{n}(\tilde{f}_{\operatorname{span}}-f_{\operatorname{span}})\|_{L^{2}(\Omega)} =L~nf~spanL~nfspanL2(Ω)\displaystyle=\|\tilde{L}^{n}\tilde{f}_{\operatorname{span}}-\tilde{L}^{n}f_{\operatorname{span}}\|_{L^{2}(\Omega)}
=L~nf~span(I+Σ~)LnfspanL2(Ω)\displaystyle=\|\tilde{L}^{n}\tilde{f}_{\operatorname{span}}-(I+\tilde{\Sigma})L^{n}f_{\operatorname{span}}\|_{L^{2}(\Omega)}
L~nf~spanLnfspanL2(Ω)+Σ~LnfspanL2(Ω)\displaystyle\leq\|\tilde{L}^{n}\tilde{f}_{\operatorname{span}}-L^{n}f_{\operatorname{span}}\|_{L^{2}(\Omega)}+\|\tilde{\Sigma}L^{n}f_{\operatorname{span}}\|_{L^{2}(\Omega)}
L~nf~spanLnfspanL2(Ω)+Σ~LnfspanL2(Ω)\displaystyle\leq\|\tilde{L}^{n}\tilde{f}_{\operatorname{span}}-L^{n}f_{\operatorname{span}}\|_{L^{2}(\Omega)}+\|\tilde{\Sigma}\|\|L^{n}f_{\operatorname{span}}\|_{L^{2}(\Omega)}
\displaystyle\leq\|\tilde{L}^{n}\tilde{f}_{\operatorname{span}}-L^{n}f_{\operatorname{span}}\|_{L^{2}(\Omega)}+2n\delta\lambda_{k}^{n}\|f_{\operatorname{span}}\|_{L^{2}(\Omega)} (36)

The first term in (36) can be expanded as follows:

\displaystyle\|\tilde{L}^{n}\tilde{f}_{\operatorname{span}}-L^{n}f_{\operatorname{span}}\|_{L^{2}(\Omega)}=\|\tilde{L}^{n}\tilde{f}_{\operatorname{span}}-L^{n}\tilde{f}_{\operatorname{span}}+L^{n}\tilde{f}_{\operatorname{span}}-L^{n}f_{\operatorname{span}}\|_{L^{2}(\Omega)}
\displaystyle\leq\|\tilde{L}^{n}\tilde{f}_{\operatorname{span}}-L^{n}\tilde{f}_{\operatorname{span}}\|_{L^{2}(\Omega)}+\|L^{n}\tilde{f}_{\operatorname{span}}-L^{n}f_{\operatorname{span}}\|_{L^{2}(\Omega)} (37)

We’ll consider the two terms in turn.

For the first term, the same proof as that of (35) shows that there exists an operator Σ^\hat{\Sigma}, s.t. Σ^2nδ\|\hat{\Sigma}\|\leq 2n\delta and Ln=(I+Σ^)L~nL^{n}=(I+\hat{\Sigma})\tilde{L}^{n}. Hence, we have:

L~nf~spanLnf~span\displaystyle\|\tilde{L}^{n}\tilde{f}_{\operatorname{span}}-L^{n}\tilde{f}_{\operatorname{span}}\| =L~nf~span(I+Σ^)L~nf~span\displaystyle=\|\tilde{L}^{n}\tilde{f}_{\operatorname{span}}-(I+\hat{\Sigma})\tilde{L}^{n}\tilde{f}_{\operatorname{span}}\|
=Σ^L~nf~span\displaystyle=\|\hat{\Sigma}\tilde{L}^{n}\tilde{f}_{\operatorname{span}}\|
λ~knΣ^f~spanL2(Ω)\displaystyle\leq\tilde{\lambda}_{k}^{n}\|\hat{\Sigma}\|\|\tilde{f}_{\operatorname{span}}\|_{L^{2}(\Omega)}
2nδλ~knfL2(Ω)(f~spanL2(Ω)fL2(Ω))\displaystyle\leq 2n\delta\tilde{\lambda}_{k}^{n}\|f\|_{L^{2}(\Omega)}\qquad(\because\|\tilde{f}_{\operatorname{span}}\|_{L^{2}(\Omega)}\leq\|f\|_{L^{2}(\Omega)}) (38)

For the second term in (37) we have:

Ln(f~spanfspan)L2(Ω)\displaystyle\|L^{n}(\tilde{f}_{\operatorname{span}}-f_{\operatorname{span}})\|_{L^{2}(\Omega)} supv:v=v1v2,v1Φk,v2Φ~kLnvL2(Ω)vL2(Ω)f~spanfspanL2(Ω)\displaystyle\leq\sup_{v:v=v_{1}-v_{2},v_{1}\in\Phi_{k},v_{2}\in\tilde{\Phi}_{k}}\frac{\|L^{n}v\|_{L^{2}(\Omega)}}{\|v\|_{L^{2}(\Omega)}}\|\tilde{f}_{\operatorname{span}}-f_{\operatorname{span}}\|_{L^{2}(\Omega)} (39)

To bound the first factor we have:

LnvL2(Ω)\displaystyle\|L^{n}v\|_{L^{2}(\Omega)} =Ln(v1v2)L2(Ω)\displaystyle=\|L^{n}(v_{1}-v_{2})\|_{L^{2}(\Omega)}
Lnv1L2(Ω)+Lnv2L2(Ω)\displaystyle\leq\|L^{n}v_{1}\|_{L^{2}(\Omega)}+\|L^{n}v_{2}\|_{L^{2}(\Omega)}
=Lnv1L2(Ω)+(I+Σ^)L~nv2L2(Ω)\displaystyle=\|L^{n}v_{1}\|_{L^{2}(\Omega)}+\|(I+\hat{\Sigma})\tilde{L}^{n}v_{2}\|_{L^{2}(\Omega)}
λknv1L2(Ω)+λ~knI+Σ^2v2L2(Ω)\displaystyle\leq\lambda_{k}^{n}\|v_{1}\|_{L^{2}(\Omega)}+\tilde{\lambda}^{n}_{k}\|I+\hat{\Sigma}\|_{2}\|v_{2}\|_{L^{2}(\Omega)}
(λkn+λ~kn(1+2nδ))vL2(Ω)\displaystyle\leq(\lambda_{k}^{n}+\tilde{\lambda}^{n}_{k}(1+2n\delta))\|v\|_{L^{2}(\Omega)}

where we use the fact that v1L2(Ω),v2L2(Ω)vL2(Ω)\|v_{1}\|_{L^{2}(\Omega)},\|v_{2}\|_{L^{2}(\Omega)}\leq\|v\|_{L^{2}(\Omega)} and Σ^2nδ\|\hat{\Sigma}\|\leq 2n\delta. Hence, we can bound

supv:v=v1v2,v1Φk,v2Φ~kLnvL2(Ω)vL2(Ω)(λkn+λ~kn(1+2nδ))\sup_{v:v=v_{1}-v_{2},v_{1}\in\Phi_{k},v_{2}\in\tilde{\Phi}_{k}}\frac{\|L^{n}v\|_{L^{2}(\Omega)}}{\|v\|_{L^{2}(\Omega)}}\leq(\lambda^{n}_{k}+\tilde{\lambda}^{n}_{k}(1+2n\delta)) (40)

From (40) and Lemma 4 we have:

Ln(f~spanfspan)L2(Ω)\displaystyle\|L^{n}(\tilde{f}_{\operatorname{span}}-f_{\operatorname{span}})\|_{L^{2}(\Omega)} supv:v=v1v2,v1Φk,v2Φ~kLnvL2(Ω)vL2(Ω)f~spanfspanL2(Ω)\displaystyle\leq\sup_{v:v=v_{1}-v_{2},v_{1}\in\Phi_{k},v_{2}\in\tilde{\Phi}_{k}}\frac{\|L^{n}v\|_{L^{2}(\Omega)}}{\|v\|_{L^{2}(\Omega)}}\|\tilde{f}_{\operatorname{span}}-f_{\operatorname{span}}\|_{L^{2}(\Omega)}
(λkn+λ~kn(1+2nδ))δγδfL2(Ω)\displaystyle\leq(\lambda^{n}_{k}+\tilde{\lambda}^{n}_{k}(1+2n\delta))\frac{\delta}{\gamma-\delta}\|f\|_{L^{2}(\Omega)} (41)

Therefore from (38) and (41), we can upper bound L~n(f~spanfspan)L2(Ω)\|\tilde{L}^{n}(\tilde{f}_{\operatorname{span}}-f_{\operatorname{span}})\|_{L^{2}(\Omega)} using (36) as follows:

L~n(f~spanfspan)L2(Ω)\displaystyle\|\tilde{L}^{n}(\tilde{f}_{\operatorname{span}}-f_{\operatorname{span}})\|_{L^{2}(\Omega)} L~nf~spanLnfspanL2(Ω)+2nδλknfL2(Ω)\displaystyle\leq\|\tilde{L}^{n}\tilde{f}_{\operatorname{span}}-L^{n}f_{\operatorname{span}}\|_{L^{2}(\Omega)}+2n\delta\lambda_{k}^{n}\|f\|_{L^{2}(\Omega)}
2nδλ~knfL2(Ω)+(λkn+λ~kn(1+2nδ))δγδfL2(Ω)+2nδλknfL2(Ω)\displaystyle\leq 2n\delta\tilde{\lambda}_{k}^{n}\|f\|_{L^{2}(\Omega)}+(\lambda_{k}^{n}+\tilde{\lambda}^{n}_{k}(1+2n\delta))\frac{\delta}{\gamma-\delta}\|f\|_{L^{2}(\Omega)}+2n\delta\lambda_{k}^{n}\|f\|_{L^{2}(\Omega)}
(i)(1+2nδλk)(1+(1+2nδ))δγδλknfL2(Ω)+2nδλknfL2(Ω)\displaystyle\stackrel{{\scriptstyle(i)}}{{\leq}}(1+2n\delta\lambda_{k})\left(1+(1+2n\delta)\right)\frac{\delta}{\gamma-\delta}\lambda_{k}^{n}\|f\|_{L^{2}(\Omega)}+2n\delta\lambda_{k}^{n}\|f\|_{L^{2}(\Omega)}
(ii)4(1+δγδ)λknfL2(Ω)\displaystyle\stackrel{{\scriptstyle(ii)}}{{\leq}}4\left(1+\frac{\delta}{\gamma-\delta}\right)\lambda_{k}^{n}\|f\|_{L^{2}(\Omega)}

Here in (i)(i) we use the result from Lemma D.1 and write λ~knλkn1+2nδλk\frac{\tilde{\lambda}^{n}_{k}}{\lambda_{k}^{n}}\leq 1+2n\delta\lambda_{k}. In (ii)(ii), we use nTn\leq T and the fact that 2Tmin(1,λk)δ1/102T\min(1,\lambda_{k})\delta\leq 1/10.

Therefore, finally we have:

\|\tilde{L}^{n}(\tilde{f}_{\operatorname{span}}-f_{\operatorname{span}})\|_{L^{2}(\Omega)}\leq 4\left(\frac{\delta}{\gamma-\delta}+1\right)\lambda_{k}^{n}\|f\|_{L^{2}(\Omega)}

Combining this with the result from Part 1, we have the following:

L~n(f~spanfnn)L2(Ω)(n!)2Cn(ϵspan+ϵnn)+4(1+δγδ)λknfL2(Ω)\displaystyle\|\tilde{L}^{n}(\tilde{f}_{\operatorname{span}}-f_{\operatorname{nn}})\|_{L^{2}(\Omega)}\leq(n!)^{2}\cdot C^{n}(\epsilon_{\operatorname{span}}+\epsilon_{\operatorname{nn}})+4\left(1+\frac{\delta}{\gamma-\delta}\right)\lambda_{k}^{n}\|f\|_{L^{2}(\Omega)}

Appendix D Technical Lemmas: Manipulating Operators

Before we state the lemmas we introduce some common notation used throughout this section. We denote Ln=LLLn timesL^{n}=\underbrace{L\circ L\circ\cdots\circ L}_{\text{n times}}. Further we use LkL_{k} to denote the operator with kaij\partial_{k}a_{ij} for all i,j[d]i,j\in[d] and kc\partial_{k}c as coefficients, that is:

L_{k}u=-\sum_{i,j=1}^{d}\left(\partial_{k}a_{ij}\right)\partial_{ij}u-\sum_{i,j=1}^{d}\partial_{k}\left(\partial_{i}a_{ij}\right)\partial_{j}u+(\partial_{k}c)u

Similarly the operator LklL_{kl} is defined as:

L_{kl}u=-\sum_{i,j=1}^{d}\left(\partial_{kl}a_{ij}\right)\partial_{ij}u-\sum_{i,j=1}^{d}\partial_{kl}\left(\partial_{i}a_{ij}\right)\partial_{j}u+(\partial_{kl}c)u
Lemma D.1.

Let λi\lambda_{i} and λ~i\tilde{\lambda}_{i} for i[k]i\in[k] denote the first kk eigenvalues of the operators LL and L~\tilde{L} respectively, with corresponding eigenfunctions φi\varphi_{i} and φ~i\tilde{\varphi}_{i}, and suppose L1L~1\|L^{-1}-\tilde{L}^{-1}\| is bounded. Then for all nn\in\mathbb{N} we have that

λ~in(1+e^)λin\tilde{\lambda}^{n}_{i}\leq(1+\hat{e})\lambda^{n}_{i}

where i[k]i\in[k] and |e^|2nδλk|\hat{e}|\leq 2n\delta\lambda_{k} and δ=max{ϵAm,ϵcζ}\delta=\max\left\{\frac{\epsilon_{A}}{m},\frac{\epsilon_{c}}{\zeta}\right\}.

Proof.

From (28) and Weyl’s inequality we have for all ii\in\mathbb{N}

supi|1λi1λ~i|L1L~1δ\displaystyle\sup_{i}\left|\frac{1}{\lambda_{i}}-\frac{1}{\tilde{\lambda}_{i}}\right|\leq\|L^{-1}-\tilde{L}^{-1}\|\leq\delta

From this, we can conclude that:

|λ~iλi|δλiλ~i\displaystyle\left|\tilde{\lambda}_{i}-\lambda_{i}\right|\leq\delta\lambda_{i}\tilde{\lambda}_{i}
\displaystyle\implies λ~i(1δλi)λi\displaystyle\tilde{\lambda}_{i}(1-\delta\lambda_{i})\leq\lambda_{i}
\displaystyle\implies λ~iλi(1δλi)\displaystyle\tilde{\lambda}_{i}\leq\frac{\lambda_{i}}{(1-\delta\lambda_{i})}
\displaystyle\implies λ~i(1+δλi)λi\displaystyle\tilde{\lambda}_{i}\leq(1+\delta\lambda_{i})\lambda_{i}

Writing λ~i=(1+e~i)λi\tilde{\lambda}_{i}=(1+\tilde{e}_{i})\lambda_{i} (where |e~i|δλi|\tilde{e}_{i}|\leq\delta\lambda_{i}), we have

|λ~inλin|\displaystyle\left|\tilde{\lambda}^{n}_{i}-\lambda_{i}^{n}\right| =|((1+e~i)λi)nλin|\displaystyle=\left|((1+\tilde{e}_{i})\lambda_{i})^{n}-\lambda_{i}^{n}\right|
=|λin((1+e~i)n1)|\displaystyle=\left|\lambda_{i}^{n}((1+\tilde{e}_{i})^{n}-1)\right|
\displaystyle\stackrel{(1)}{\leq}\lambda_{i}^{n}|\tilde{e}_{i}|\left|\sum_{j=1}^{n}(1+\tilde{e}_{i})^{j}\right|
(2)λinn|e~i|en|e~i|\displaystyle\stackrel{{\scriptstyle(2)}}{{\leq}}\lambda_{i}^{n}n|\tilde{e}_{i}|e^{n|\tilde{e}_{i}|}
(3)λinn|e~i|(1+|2ne~i|)\displaystyle\stackrel{{\scriptstyle(3)}}{{\leq}}\lambda_{i}^{n}n|\tilde{e}_{i}|(1+|2n\tilde{e}_{i}|)
2λinn|e~i|\displaystyle\leq 2\lambda_{i}^{n}n|\tilde{e}_{i}|

where (1) follows from the factorization a^{n}-b^{n}=(a-b)(\sum_{i=0}^{n-1}a^{i}b^{n-1-i}), (2) follows from 1+xex1+x\leq e^{x}, and (3) follows from n|e~i|1/20n|\tilde{e}_{i}|\leq 1/20 and Taylor expanding exe^{x}. Hence, there exists a e^i\hat{e}_{i}, s.t. λ~in=(1+e^i)λin\tilde{\lambda}^{n}_{i}=(1+\hat{e}_{i})\lambda^{n}_{i} and |e^i|2n|e~i||\hat{e}_{i}|\leq 2n|\tilde{e}_{i}| (i.e., |e^i|2nδλi|\hat{e}_{i}|\leq 2n\delta\lambda_{i}). Using the fact that λiλk\lambda_{i}\leq\lambda_{k} for all i[k]i\in[k] completes the proof. ∎
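For concreteness (toy numbers of our own choosing, only to illustrate the regime where the smallness condition holds), a quick numerical check of the resulting bound:

# Toy numerical check of the bound in Lemma D.1 (numbers are our own choice).
delta, lam, n = 1e-3, 5.0, 10                 # here n * delta * lam = 0.05 = 1/20
lam_tilde = (1.0 + delta * lam) * lam         # largest value allowed by lambda_tilde <= (1 + delta*lambda) * lambda
print(lam_tilde**n <= (1.0 + 2 * n * delta * lam) * lam**n)   # prints True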

Lemma D.2 (Operator Chain Rule).

Given an elliptic operator LL, for all uC(Ω)u\in C^{\infty}(\Omega) we have the following

kLnu=i=1n(LniLkLi1)(u)+Ln(ku)\nabla_{k}L^{n}u=\sum_{i=1}^{n}\left(L^{n-i}\circ L_{k}\circ L^{i-1}\right)(u)+L^{n}(\nabla_{k}u) (42)
kl(Lnu)=i,ji<j(LniLkLji1LlLj1)u+i,ji>j(LnjLkLij1LlLi1)u+i(LniLklLi1)u+Ln(klu)\displaystyle\begin{split}\nabla_{kl}(L^{n}u)&=\sum_{\begin{subarray}{c}i,j\\ i<j\end{subarray}}\left(L^{n-i}\circ L_{k}\circ L^{j-i-1}\circ L_{l}\circ L^{j-1}\right)u\\ &+\sum_{\begin{subarray}{c}i,j\\ i>j\end{subarray}}\left(L^{n-j}\circ L_{k}\circ L^{i-j-1}\circ L_{l}\circ L^{i-1}\right)u\\ &+\sum_{i}\left(L^{n-i}\circ L_{kl}\circ L^{i-1}\right)u+L^{n}(\nabla_{kl}u)\end{split} (43)

where we assume that L(0)=IL^{(0)}=I.

Proof.

We show the proof using induction on nn. To handle the base case, for n=1n=1, we have

k(Lu)\displaystyle\nabla_{k}(Lu) =k(div(Au)+cu)\displaystyle=\nabla_{k}\left(-\mathrm{div}(A\nabla u)+cu\right)
=k(ijaijijuijiaijju+cu)\displaystyle=\nabla_{k}\left(-\sum_{ij}a_{ij}\partial_{ij}u-\sum_{ij}\partial_{i}a_{ij}\partial_{j}u+cu\right)
=(ijaijij(ku)ijiaijjku+cku)\displaystyle=\left(-\sum_{ij}a_{ij}\partial_{ij}(\partial_{k}u)-\sum_{ij}\partial_{i}a_{ij}\partial_{j}\partial_{k}u+c\partial_{k}u\right)
+(ijkaijijuijikaijju+kcu)\displaystyle+\left(-\sum_{ij}\partial_{k}a_{ij}\partial_{ij}u-\sum_{ij}\partial_{i}\partial_{k}a_{ij}\partial_{j}u+\partial_{k}cu\right)
=L(ku)+Lku\displaystyle=L(\nabla_{k}u)+L_{k}u (44)

Similarly, for n=1n=1 and k,l[d]k,l\in[d],

kl(Lu)\displaystyle\nabla_{kl}(Lu) =kl(div(Au)+cu)\displaystyle=\nabla_{kl}\left(-\mathrm{div}(A\nabla u)+cu\right)
=kl(ijaijijuijiaijju+cu)\displaystyle=\nabla_{kl}\left(-\sum_{ij}a_{ij}\partial_{ij}u-\sum_{ij}\partial_{i}a_{ij}\partial_{j}u+cu\right)
=(ijaijij(klu)ijiaijjklu+cklu)\displaystyle=\left(-\sum_{ij}a_{ij}\partial_{ij}(\partial_{kl}u)-\sum_{ij}\partial_{i}a_{ij}\partial_{j}\partial_{kl}u+c\partial_{kl}u\right)
+(ijkaijijluijikaijjlu+kclu)\displaystyle+\left(-\sum_{ij}\partial_{k}a_{ij}\partial_{ij}\partial_{l}u-\sum_{ij}\partial_{i}\partial_{k}a_{ij}\partial_{j}\partial_{l}u+\partial_{k}c\partial_{l}u\right)
+(ijlaijijkuijilaijjku+lcku)\displaystyle+\left(-\sum_{ij}\partial_{l}a_{ij}\partial_{ij}\partial_{k}u-\sum_{ij}\partial_{i}\partial_{l}a_{ij}\partial_{j}\partial_{k}u+\partial_{l}c\partial_{k}u\right)
+(ijklaijijuijiklaijju+klcu)\displaystyle+\left(-\sum_{ij}\partial_{kl}a_{ij}\partial_{ij}u-\sum_{ij}\partial_{i}\partial_{kl}a_{ij}\partial_{j}u+\partial_{kl}cu\right)
=L(klu)+Lk(lu)+Ll(ku)+Lklu\displaystyle=L(\nabla_{kl}u)+L_{k}(\nabla_{l}u)+L_{l}(\nabla_{k}u)+L_{kl}u (45)

For the inductive case, assume that for all m<nm<n, (42) and (43) hold. Then, for any k[d]k\in[d] we have:

k(Lnu)\displaystyle\nabla_{k}(L^{n}u) =k(LLn1(u))\displaystyle=\nabla_{k}\left(L\circ L^{n-1}(u)\right)
=L(k(Ln1u))+Lk(Ln1u)\displaystyle=L\left(\nabla_{k}(L^{n-1}u)\right)+L_{k}\left(L^{n-1}u\right)
=L(i=1n1(Ln1iLkLi1)u+Ln1(ku))+Lk(Ln1)u\displaystyle=L\left(\sum_{i=1}^{n-1}\left(L^{n-1-i}\circ L_{k}\circ L^{i-1}\right)u+L^{n-1}(\nabla_{k}u)\right)+L_{k}\left(L^{n-1}\right)u
=i=1n(LniLkLi1)(u)+Ln(ku)\displaystyle=\sum_{i=1}^{n}\left(L^{n-i}\circ L_{k}\circ L^{i-1}\right)(u)+L^{n}(\nabla_{k}u) (46)

Similarly, for all k,l[d]k,l\in[d] we have:

kl(Lnu)\displaystyle\nabla_{kl}(L^{n}u) =kl(LLn1(u))\displaystyle=\nabla_{kl}\left(L\circ L^{n-1}(u)\right)
=L(kl(Ln1u))+Lk(l(Ln1u))+Ll(k(Ln1u))+Lkl(Ln1u)\displaystyle=L\left(\nabla_{kl}(L^{n-1}u)\right)+L_{k}\left(\nabla_{l}\left(L^{n-1}u\right)\right)+L_{l}\left(\nabla_{k}\left(L^{n-1}u\right)\right)+L_{kl}\left(L^{n-1}u\right)
=L(i,ji<jn1(Ln1iLkLji1LlLj1)u\displaystyle=L\Bigg{(}\sum_{\begin{subarray}{c}i,j\\ i<j\end{subarray}}^{n-1}\left(L^{n-1-i}\circ L_{k}\circ L^{j-i-1}\circ L_{l}\circ L^{j-1}\right)u
+i,ji>jn1(Ln1jLkLij1LlLi1)u\displaystyle\qquad\;+\sum_{\begin{subarray}{c}i,j\\ i>j\end{subarray}}^{n-1}\left(L^{n-1-j}\circ L_{k}\circ L^{i-j-1}\circ L_{l}\circ L^{i-1}\right)u
+i=1n1(Ln1iLklLi1)u+Ln1(klu))\displaystyle\qquad\;+\sum_{i=1}^{n-1}\left(L^{n-1-i}\circ L_{kl}\circ L^{i-1}\right)u+L^{n-1}(\nabla_{kl}u)\Bigg{)}
+Lk(i=1n1(Ln1iLlLi1)(u)+Ln1(lu))(from (46))\displaystyle+L_{k}\Bigg{(}\sum_{i=1}^{n-1}\left(L^{n-1-i}\circ L_{l}\circ L^{i-1}\right)(u)+L^{n-1}(\nabla_{l}u)\Bigg{)}\qquad\text{(from \eqref{eq:operator_inequality_final_beta_1})}
+Ll(i=1n1(Ln1iLkLi1)(u)+Ln1(ku))(from (46))\displaystyle+L_{l}\Bigg{(}\sum_{i=1}^{n-1}\left(L^{n-1-i}\circ L_{k}\circ L^{i-1}\right)(u)+L^{n-1}(\nabla_{k}u)\Bigg{)}\qquad\text{(from \eqref{eq:operator_inequality_final_beta_1})}
+Lkl(Ln1u)\displaystyle+L_{kl}\left(L^{n-1}u\right)
=i,ji<jn(LniLkLji1LlLj1)u\displaystyle=\sum_{\begin{subarray}{c}i,j\\ i<j\end{subarray}}^{n}\left(L^{n-i}\circ L_{k}\circ L^{j-i-1}\circ L_{l}\circ L^{j-1}\right)u
+i,ji>jn(LnjLkLij1LlLi1)u\displaystyle\qquad+\sum_{\begin{subarray}{c}i,j\\ i>j\end{subarray}}^{n}\left(L^{n-j}\circ L_{k}\circ L^{i-j-1}\circ L_{l}\circ L^{i-1}\right)u
+in(LniLklLi1)u+Ln(klu)\displaystyle\qquad+\sum_{i}^{n}\left(L^{n-i}\circ L_{kl}\circ L^{i-1}\right)u+L^{n}(\nabla_{kl}u) (47)

By induction, the claim follows. ∎

Lemma D.3.

For all uC(Ω)u\in C^{\infty}(\Omega) and all k,l[d]k,l\in[d], the following upper bounds hold:

LuL2(Ω)Cmaxα:|α|2αuL2(Ω)\|Lu\|_{L^{2}(\Omega)}\leq C\max_{\alpha:|\alpha|\leq 2}\|\partial^{\alpha}u\|_{L^{2}(\Omega)} (48)
k(Lu)L2(Ω)2Cmaxα:|α|3αuL2(Ω)\|\nabla_{k}(Lu)\|_{L^{2}(\Omega)}\leq 2\cdot C\max_{\alpha:|\alpha|\leq 3}\|\partial^{\alpha}u\|_{L^{2}(\Omega)} (49)

and

kl(Lu)L2(Ω)4Cmaxα:|α|4αuL2(Ω)\|\nabla_{kl}(Lu)\|_{L^{2}(\Omega)}\leq 4\cdot C\max_{\alpha:|\alpha|\leq 4}\|\partial^{\alpha}u\|_{L^{2}(\Omega)} (50)

where

C:=(2d2+1)max{maxα:|α|3maxi,jαaijL(Ω),maxα:|α|2αcL(Ω)}.C:=(2d^{2}+1)\max\left\{\max_{\alpha:|\alpha|\leq 3}\max_{i,j}\|\partial^{\alpha}a_{ij}\|_{L^{\infty}(\Omega)},\max_{\alpha:|\alpha|\leq 2}\|\partial^{\alpha}c\|_{L^{\infty}(\Omega)}\right\}.
Proof.

We first show the upper bound on LuL2(Ω)\|Lu\|_{L^{2}(\Omega)}:

LuL2(Ω)\displaystyle\|Lu\|_{L^{2}(\Omega)} i,j=1daijijui,j=1diaijju+cuL2(Ω)\displaystyle\leq\left\|-\sum_{i,j=1}^{d}a_{ij}\partial_{ij}u-\sum_{i,j=1}^{d}\partial_{i}a_{ij}\partial_{j}u+cu\right\|_{L^{2}(\Omega)}
(1)(2d2+1)max{maxi,jiaijL(Ω),maxi,jaijL(Ω),cL(Ω)}C1maxα:|α|2αuL2(Ω)\displaystyle\leq^{(1)}\underbrace{(2d^{2}+1)\max\left\{\max_{i,j}\|\partial_{i}a_{ij}\|_{L^{\infty}(\Omega)},\max_{i,j}\|a_{ij}\|_{L^{\infty}(\Omega)},\|c\|_{L^{\infty}(\Omega)}\right\}}_{C_{1}}\max_{\alpha:|\alpha|\leq 2}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}
C1maxα:|α|2αuL2(Ω)\displaystyle\leq C_{1}\max_{\alpha:|\alpha|\leq 2}\|\partial^{\alpha}u\|_{L^{2}(\Omega)} (51)

where (1) follows by Hölder.

Proceeding to k(Lu)L2(Ω)\|\nabla_{k}(Lu)\|_{L^{2}(\Omega)}, from Lemma D.2 (equation (44)) we have

k(Lu)L2(Ω)\displaystyle\|\nabla_{k}(Lu)\|_{L^{2}(\Omega)} LkuL2(Ω)+L(ku)L2(Ω)\displaystyle\leq\|L_{k}u\|_{L^{2}(\Omega)}+\|L(\nabla_{k}u)\|_{L^{2}(\Omega)}
i,j=1dkaijijui,j=1dikaijju+kcuL2(Ω)\displaystyle\leq\left\|-\sum_{i,j=1}^{d}\partial_{k}a_{ij}\partial_{ij}u-\sum_{i,j=1}^{d}\partial_{ik}a_{ij}\partial_{j}u+\partial_{k}cu\right\|_{L^{2}(\Omega)}
+i,j=1daijijkui,j=1diaijjku+ckuL2(Ω)\displaystyle+\left\|-\sum_{i,j=1}^{d}a_{ij}\partial_{ijk}u-\sum_{i,j=1}^{d}\partial_{i}a_{ij}\partial_{jk}u+c\partial_{k}u\right\|_{L^{2}(\Omega)}
(2d2+1)max{maxα:|α|2maxi,jαaijL(Ω),kcL(Ω)}maxα:|α|2αuL2(Ω)\displaystyle\leq(2d^{2}+1)\max\left\{\max_{\alpha:|\alpha|\leq 2}\max_{i,j}\|\partial^{\alpha}a_{ij}\|_{L^{\infty}(\Omega)},\|\partial_{k}c\|_{L^{\infty}(\Omega)}\right\}\max_{\alpha:|\alpha|\leq 2}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}
+(2d2+1)max{maxα:|α|1maxi,jαaijL(Ω),cL(Ω)}maxα:|α|3αuL2(Ω)\displaystyle+(2d^{2}+1)\max\left\{\max_{\alpha:|\alpha|\leq 1}\max_{i,j}\|\partial^{\alpha}a_{ij}\|_{L^{\infty}(\Omega)},\|c\|_{L^{\infty}(\Omega)}\right\}\max_{\alpha:|\alpha|\leq 3}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}
k(Lu)L2(Ω)\displaystyle\implies\|\nabla_{k}(Lu)\|_{L^{2}(\Omega)} 2(2d2+1)max{maxα:|α|2maxi,jαaijL(Ω),maxα:|α|1αcL(Ω)}C2maxα:|α|3αuL2(Ω)\displaystyle\leq 2\cdot\underbrace{(2d^{2}+1)\max\left\{\max_{\alpha:|\alpha|\leq 2}\max_{i,j}\|\partial^{\alpha}a_{ij}\|_{L^{\infty}(\Omega)},\max_{\alpha:|\alpha|\leq 1}\|\partial^{\alpha}c\|_{L^{\infty}(\Omega)}\right\}}_{C_{2}}\max_{\alpha:|\alpha|\leq 3}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}
2C2maxα:|α|3αuL2(Ω)\displaystyle\leq 2\cdot C_{2}\max_{\alpha:|\alpha|\leq 3}\|\partial^{\alpha}u\|_{L^{2}(\Omega)} (52)

We use the result from Lemma D.2 (equation (45)), to upper bound the quantity kl(Lu)L2(Ω)\|\nabla_{kl}(Lu)\|_{L^{2}(\Omega)}

\displaystyle\|\nabla_{kl}(Lu)\|_{L^{2}(\Omega)}\leq\|L_{kl}u\|_{L^{2}(\Omega)}+\|L_{k}(\nabla_{l}u)\|_{L^{2}(\Omega)}+\|L_{l}(\nabla_{k}u)\|_{L^{2}(\Omega)}+\|L(\nabla_{kl}u)\|_{L^{2}(\Omega)}
\displaystyle\leq\left\|-\sum_{i,j=1}^{d}\partial_{kl}a_{ij}\partial_{ij}u-\sum_{i,j=1}^{d}\partial_{ikl}a_{ij}\partial_{j}u+\partial_{kl}c\,u\right\|_{L^{2}(\Omega)}
\displaystyle+\left\|-\sum_{i,j=1}^{d}\partial_{k}a_{ij}\partial_{ij}\partial_{l}u-\sum_{i,j=1}^{d}\partial_{i}\partial_{k}a_{ij}\partial_{j}\partial_{l}u+\partial_{k}c\,\partial_{l}u\right\|_{L^{2}(\Omega)}
\displaystyle+\left\|-\sum_{i,j=1}^{d}\partial_{l}a_{ij}\partial_{ij}\partial_{k}u-\sum_{i,j=1}^{d}\partial_{i}\partial_{l}a_{ij}\partial_{j}\partial_{k}u+\partial_{l}c\,\partial_{k}u\right\|_{L^{2}(\Omega)}
\displaystyle+\left\|-\sum_{i,j=1}^{d}a_{ij}\partial_{ijkl}u-\sum_{i,j=1}^{d}\partial_{i}a_{ij}\partial_{jkl}u+c\,\partial_{kl}u\right\|_{L^{2}(\Omega)}
\displaystyle\leq(2d^{2}+1)\max\left\{\max_{\alpha:|\alpha|\leq 3}\max_{i,j}\|\partial^{\alpha}a_{ij}\|_{L^{\infty}(\Omega)},\|\partial_{kl}c\|_{L^{\infty}(\Omega)}\right\}\max_{\alpha:|\alpha|\leq 2}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}
\displaystyle+2(2d^{2}+1)\max\left\{\max_{\alpha:|\alpha|\leq 2}\max_{i,j}\|\partial^{\alpha}a_{ij}\|_{L^{\infty}(\Omega)},\max_{\alpha:|\alpha|\leq 1}\|\partial^{\alpha}c\|_{L^{\infty}(\Omega)}\right\}\max_{\alpha:|\alpha|\leq 3}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}
\displaystyle+(2d^{2}+1)\max\left\{\max_{\alpha:|\alpha|\leq 2}\max_{i,j}\|\partial^{\alpha}a_{ij}\|_{L^{\infty}(\Omega)},\|c\|_{L^{\infty}(\Omega)}\right\}\max_{\alpha:|\alpha|\leq 4}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}
\displaystyle\implies\|\nabla_{kl}(Lu)\|_{L^{2}(\Omega)}\leq 4\cdot\underbrace{(2d^{2}+1)\max\left\{\max_{\alpha:|\alpha|\leq 3}\max_{i,j}\|\partial^{\alpha}a_{ij}\|_{L^{\infty}(\Omega)},\max_{\alpha:|\alpha|\leq 2}\|\partial^{\alpha}c\|_{L^{\infty}(\Omega)}\right\}}_{C_{3}}\max_{\alpha:|\alpha|\leq 4}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}
\displaystyle\leq 4\cdot C_{3}\max_{\alpha:|\alpha|\leq 4}\|\partial^{\alpha}u\|_{L^{2}(\Omega)} (53)

Since $C_{1}\leq C_{2}\leq C_{3}$, we define $C:=C_{3}$; the claim then follows from equations (51), (52), and (53).

Further, we note that from (52) we also have

\|L_{k}(u)\|_{L^{2}(\Omega)},\ \|L(\nabla_{k}u)\|_{L^{2}(\Omega)}\leq C\max_{\alpha:|\alpha|\leq 3}\|\partial^{\alpha}u\|_{L^{2}(\Omega)} (54)

and similarly from (53) we have that,

\|L_{kl}(u)\|_{L^{2}(\Omega)},\ \|L_{k}(\nabla_{l}u)\|_{L^{2}(\Omega)},\ \|L_{l}(\nabla_{k}u)\|_{L^{2}(\Omega)},\ \|L(\nabla_{kl}u)\|_{L^{2}(\Omega)}\leq C\max_{\alpha:|\alpha|\leq 4}\|\partial^{\alpha}u\|_{L^{2}(\Omega)} (55)
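As a quick sanity check of the constant (an illustration, not part of the argument), consider the special case $L=-\Delta$, i.e. $a_{ij}=\delta_{ij}$ and $c\equiv 0$. Every derivative of the coefficients vanishes and $\max_{i,j}\|a_{ij}\|_{L^{\infty}(\Omega)}=1$, so $C=2d^{2}+1$ and the bounds above specialize to

\|\nabla_{k}(\Delta u)\|_{L^{2}(\Omega)}\leq 2(2d^{2}+1)\max_{\alpha:|\alpha|\leq 3}\|\partial^{\alpha}u\|_{L^{2}(\Omega)},\qquad\|\nabla_{kl}(\Delta u)\|_{L^{2}(\Omega)}\leq 4(2d^{2}+1)\max_{\alpha:|\alpha|\leq 4}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}

These are of course far from tight for the Laplacian, but they make explicit that the dimension enters only through the factor $2d^{2}+1$.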

Lemma D.4.

For all $u\in C^{\infty}(\Omega)$, $k,l\in[d]$, and $n\in\mathbb{N}$, we have the following upper bounds:

\|L^{n}u\|_{L^{2}(\Omega)}\leq(n!)^{2}\cdot C^{n}\max_{\alpha:|\alpha|\leq n+2}\|\partial^{\alpha}u\|_{L^{2}(\Omega)} (56)
\|\nabla_{k}(L^{n}u)\|_{L^{2}(\Omega)}\leq(n+1)\cdot(n!)^{2}\cdot C^{n}\max_{\alpha:|\alpha|\leq n+2}\|\partial^{\alpha}u\|_{L^{2}(\Omega)} (57)
\|\nabla_{kl}(L^{n}u)\|_{L^{2}(\Omega)}\leq((n+1)!)^{2}\cdot C^{n}\max_{\alpha:|\alpha|\leq n+3}\|\partial^{\alpha}u\|_{L^{2}(\Omega)} (58)

where $C=(2d^{2}+1)\max\left\{\max_{\alpha:|\alpha|\leq 3}\max_{i,j}\|\partial^{\alpha}a_{ij}\|_{L^{\infty}(\Omega)},\max_{\alpha:|\alpha|\leq 2}\|\partial^{\alpha}c\|_{L^{\infty}(\Omega)}\right\}$.

Proof.

We prove the lemma by induction on $n$. The base case $n=1$ follows from Lemma D.3, along with the fact that $\max_{\alpha:|\alpha|\leq 2}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}\leq\max_{\alpha:|\alpha|\leq 3}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}$.

To show the inductive case, assume that the claim holds for all $m\leq n-1$. By Lemma D.3, we have

\displaystyle\|L^{n}u\|_{L^{2}(\Omega)}=\|L(L^{n-1}u)\|_{L^{2}(\Omega)}
\displaystyle\leq\left\|-\sum_{i,j=1}^{d}a_{ij}\partial_{ij}(L^{n-1}u)-\sum_{i,j=1}^{d}\partial_{i}a_{ij}\partial_{j}(L^{n-1}u)+c(L^{n-1}u)\right\|_{L^{2}(\Omega)}
\displaystyle\leq C\cdot\max\left\{\|L^{n-1}u\|_{L^{2}(\Omega)},\max_{i}\|\nabla_{i}(L^{n-1}u)\|_{L^{2}(\Omega)},\max_{i,j}\|\nabla_{ij}(L^{n-1}u)\|_{L^{2}(\Omega)}\right\}
\displaystyle\leq C\cdot(n!)^{2}\cdot C^{n-1}\max_{\alpha:|\alpha|\leq(n-1)+3}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}

Thus, we have

\|L^{n}u\|_{L^{2}(\Omega)}\leq(n!)^{2}\cdot C^{n}\max_{\alpha:|\alpha|\leq n+2}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}

as required.
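For concreteness, the case $n=2$ of this step reads as follows: the $n=1$ bounds (56)–(58) give $\max\left\{\|Lu\|_{L^{2}(\Omega)},\max_{i}\|\nabla_{i}(Lu)\|_{L^{2}(\Omega)},\max_{i,j}\|\nabla_{ij}(Lu)\|_{L^{2}(\Omega)}\right\}\leq(2!)^{2}\,C\max_{\alpha:|\alpha|\leq 4}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}$, and multiplying by $C$ yields $\|L^{2}u\|_{L^{2}(\Omega)}\leq 4C^{2}\max_{\alpha:|\alpha|\leq 4}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}$, matching (56) with $n=2$.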

Similarly, for $k\in[d]$, we have:

\displaystyle\|\nabla_{k}(L^{n}u)\|_{L^{2}(\Omega)}\leq\sum_{i=1}^{n}\left\|\left(L^{n-i}\circ L_{k}\circ L^{i-1}\right)(u)\right\|_{L^{2}(\Omega)}+\|L^{n}(\nabla_{k}u)\|_{L^{2}(\Omega)}
\displaystyle\leq n\cdot(n!)^{2}\cdot C^{n}\max_{\alpha:|\alpha|\leq n+2}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}+(n!)^{2}\cdot C^{n}\max_{\alpha:|\alpha|\leq n+2}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}
\displaystyle\leq(n+1)\cdot(n!)^{2}\cdot C^{n}\max_{\alpha:|\alpha|\leq n+2}\|\partial^{\alpha}u\|_{L^{2}(\Omega)} (59)
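For clarity, the first inequality above is the triangle inequality applied to the exact expansion obtained by iterating the product-rule identity $\nabla_{k}(Lv)=L_{k}v+L(\nabla_{k}v)$, where, as before, $L_{k}$ denotes the operator whose coefficients are the $\partial_{k}$-derivatives of the coefficients of $L$:

\nabla_{k}(L^{n}u)=\sum_{i=1}^{n}\left(L^{n-i}\circ L_{k}\circ L^{i-1}\right)(u)+L^{n}(\nabla_{k}u)

The second inequality then bounds each of the resulting terms using the inductive hypothesis.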

Finally, for $k,l\in[d]$ we have

\displaystyle\|\nabla_{kl}(L^{n}u)\|_{L^{2}(\Omega)}\leq\sum_{\begin{subarray}{c}i,j\\ i<j\end{subarray}}\left\|\left(L^{n-j}\circ L_{k}\circ L^{j-i-1}\circ L_{l}\circ L^{i-1}\right)u\right\|_{L^{2}(\Omega)}
\displaystyle+\sum_{\begin{subarray}{c}i,j\\ i>j\end{subarray}}\left\|\left(L^{n-i}\circ L_{k}\circ L^{i-j-1}\circ L_{l}\circ L^{j-1}\right)u\right\|_{L^{2}(\Omega)}
\displaystyle+\sum_{i=1}^{n}\left\|\left(L^{n-i}\circ L_{kl}\circ L^{i-1}\right)u\right\|_{L^{2}(\Omega)}+\|L^{n}(\nabla_{kl}u)\|_{L^{2}(\Omega)}
\displaystyle\leq n(n+1)\cdot(n!)^{2}\cdot C^{n}\max_{\alpha:|\alpha|\leq n+2}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}
\displaystyle+n\cdot(n!)^{2}\cdot C^{n}\max_{\alpha:|\alpha|\leq n+2}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}+(n!)^{2}\cdot C^{n}\max_{\alpha:|\alpha|\leq n+3}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}
\displaystyle\implies\|\nabla_{kl}(L^{n}u)\|_{L^{2}(\Omega)}\leq\left((n+1)!\right)^{2}\cdot C^{n}\max_{\alpha:|\alpha|\leq n+3}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}
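The final implication uses the elementary identity $n(n+1)\cdot(n!)^{2}+n\cdot(n!)^{2}+(n!)^{2}=(n+1)^{2}(n!)^{2}=\left((n+1)!\right)^{2}$, together with $\max_{\alpha:|\alpha|\leq n+2}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}\leq\max_{\alpha:|\alpha|\leq n+3}\|\partial^{\alpha}u\|_{L^{2}(\Omega)}$.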

Thus, the claim follows. ∎

Lemma D.5.

Let $A_{n}^{i}$, $i\in[n]$, be defined as a composition of $(n-i)$ applications of $L$ and $i$ applications of $L\circ\Sigma$ (in any order), s.t. $\|\Sigma\|\leq\delta$. Then, we have:

\|L^{-n}A_{n}^{i}\|\leq\delta^{i} (61)
Proof.

We prove the above claim by induction on $n$.

For $n=1$ we have two cases. If $A_{1}^{1}=L\circ\Sigma$, we have:

\|L^{-1}\circ L\circ\Sigma\|=\|\Sigma\|\leq\delta

If $A_{1}^{0}=L$ we have:

\|L^{-1}L\|=1
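In both cases, therefore, $\|L^{-1}A_{1}^{i}\|\leq\delta^{i}$ (with $i=1$ and $i=0$, respectively), which establishes the base case.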

Towards the inductive hypothesis, assume that for all $m\leq n-1$ and $i\in[m]$ it holds that

\|L^{-m}A_{m}^{i}\|\leq\delta^{i}

For $n$, we have two cases. First, if $A_{n}^{i+1}=A_{n-1}^{i}\circ L\circ\Sigma$, then by submultiplicativity of the operator norm, as well as the fact that similar operators have identical spectra (hence equal operator norm), we have:

\displaystyle\|L^{-n}\circ A^{i+1}_{n}\|=\|L^{-1}\circ L^{-(n-1)}\circ A_{n-1}^{i}\circ L\circ\Sigma\|
\displaystyle=\|L^{-(n-1)}\circ A_{n-1}^{i}\circ L\circ\Sigma\circ L^{-1}\|
\displaystyle\leq\|L^{-(n-1)}\circ A_{n-1}^{i}\|\,\|L\circ\Sigma\circ L^{-1}\|
\displaystyle\leq\delta^{i}\cdot\delta=\delta^{i+1}

so the inductive claim is proved. In the second case, $A_{n}^{i}=A_{n-1}^{i}\circ L$, and using again the fact that similar operators have identical spectra, we have:

\displaystyle\|L^{-n}\circ A_{n}^{i}\|=\|L^{-(n-1)}\circ A_{n-1}^{i}\circ L\circ L^{-1}\|
\displaystyle=\|L^{-(n-1)}\circ A_{n-1}^{i}\|\leq\delta^{i}

where the last inequality follows by the inductive hypothesis. ∎
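As a quick illustration of the lemma (not needed for the rest of the argument), take $n=2$ and $i=1$: the admissible compositions are $L\circ(L\circ\Sigma)$ and $(L\circ\Sigma)\circ L$, and indeed

\|L^{-2}\circ L\circ L\circ\Sigma\|=\|\Sigma\|\leq\delta,\qquad\|L^{-2}\circ L\circ\Sigma\circ L\|=\|L^{-1}\circ\Sigma\circ L\|\leq\delta

where the second bound uses, as in the proof, that $L^{-1}\circ\Sigma\circ L$ is similar to $\Sigma$.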