
Non-parametric estimates for graphon mean-field particle systems

Erhan Bayraktar Department of Mathematics, University of Michigan, Ann Arbor, MI 48109. [email protected]  and  Hongyi Zhou Department of Mathematics, University of Michigan, Ann Arbor, MI 48109. [email protected]
(Date: March 9, 2024)
Abstract.

We consider the graphon mean-field system introduced in the work of Bayraktar, Chakraborty, and Wu. It is the large-population limit of a heterogeneously interacting diffusive particle system, where the interaction is of mean-field type with weights characterized by an underlying graphon function. Through observation of continuous-time trajectories within the particle system, we construct plug-in estimators of the particle density and the drift coefficient, and thus of the graphon interaction weights of the mean-field system. Our estimators for the density and the drift are direct results of kernel interpolation on the empirical data, and a deconvolution method leads to an estimator of the underlying graphon function. We show that, as the number of particles increases, the graphon estimator converges to the true graphon function pointwise and, as a consequence, in the cut metric. In addition, we conduct a minimax analysis within a particular class of particle systems to justify the pointwise optimality of the density and drift estimators.

Key words and phrases:
graphon mean-field system, interacting particles, kernel estimation, minimax analysis
2020 Mathematics Subject Classification:
Primary 62G07, 62H22, 62M05; Secondary 05C80, 60J60, 60K35.
This research is supported in part by the National Science Foundation.

1. Introduction

We study a statistical method to estimate the interaction strength in the graphon mean-field interacting particle system introduced in [3]. The particles in such a system are characterized not only by a feature vector in the physical space \mathbb{R}^{d} but also by a “type” indexed by I=[0,1]. The interaction strength between particles of different types is quantified by a graphon function G:I\times I\to[0,1].

Precisely, the system consists of a family of diffusion processes with dynamics

(1.1) X_{u}(t)=X_{u}(0)+\int_{0}^{t}\int_{I}\int_{\mathbb{R}^{d}}b(X_{u}(s),y)\,G(u,v)\,\mu_{s,v}(dy)\,dv\,ds+\int_{0}^{t}\sigma(X_{u}(s))\,dB_{u}(s)\,,\qquad t\geqslant 0\,,\qquad u\in I\,,

where b:\mathbb{R}^{d}\times\mathbb{R}^{d}\to\mathbb{R}^{d} and \sigma:\mathbb{R}^{d}\to\mathbb{R}^{d\times d} are Lipschitz functions, \{X_{u}(0)\mid u\in I\} is a collection of independent \mathbb{R}^{d}-valued random variables with distributions \{\mu_{0,u}\mid u\in I\}, and \{B_{u}\mid u\in I\} are i.i.d. d-dimensional Brownian motions independent of \{X_{u}(0)\mid u\in I\}. Here, we assume that the interactions between particles enter only through the drift term.

The main purpose of this study is to estimate the graphon function G by continuous observation of a finite-population system. It is shown in [3] that system (1.1) is the large-population limit of the following system

(1.2) X^{n}_{i}(t)=X_{\frac{i}{n}}(0)+\int_{0}^{t}\frac{1}{n}\sum_{j=1}^{n}b(X^{n}_{i}(s),X^{n}_{j}(s))\,g^{n}_{ij}\,ds+\int_{0}^{t}\sigma(X^{n}_{i}(s))\,dB_{\frac{i}{n}}(s)\,,\qquad t\geqslant 0\,,\quad i=1,\dots,n\,,

where g^{n}_{ij}=G(\frac{i}{n},\frac{j}{n}). We continuously observe (1.2) over a (fixed) time horizon [0,T]. Using the empirical data, we construct estimators of the particle density and the drift term of (1.1), and finally an estimator of the graphon function G. With proper choices of parameters and under certain conditions, the estimation error is well controlled as the number of particles increases.

In this work, we are mainly interested in a model of McKean-Vlasov type, where the drift integrand b takes the form

b(x,y)=F(x-y)+V(x)\,,\qquad x,y\in\mathbb{R}^{d}\,,

for some sufficiently regular functions F and V. The function F acts as an interaction force between two particles depending on their relative positions, while the function V accounts for an external force applied to every single particle. We also consider graphon functions G of the form

G(u,v)=g(u-v)\,,\qquad u,v\in I\,,

for some function g:\mathbb{R}\to\mathbb{R} with certain regularity. When we specialize to this case, the problem boils down to estimating the function g.

1.1. Background

The study of classical mean-field systems with homogeneous interaction and the associated parabolic equations in the sense of McKean [29] dates back to the 1960s. The original motivation came from plasma theory in statistical physics (see [35, 34, 25] and references therein), and its significance in applied mathematics has been well demonstrated throughout the past decades. Several analytic and probabilistic methods were developed during this period to push forward the study of mean-field systems (see the references in [28]).

However, the early formulation of this problem focused on the theoretical properties of the systems, which rely on precise knowledge of the dynamics. Statistical methods that fit these properties into noisy models remained scarce until the 21st century. A modern formulation came to the stage in the 2010s, when developments in other areas of research created a high demand for statistical inference models. Empirical data from a particle system can be used to estimate the dynamics of the system and thus predict its future behavior. These ideas have been applied in various fields, including chemical and biological systems [2, 11, 30], economics and finance [23, 15], collective behaviors [13, 16], etc.

While the features of particles are usually embedded in a physical space \mathbb{R}^{d}, the situation in the study of modern networks can be more complicated. Inhomogeneous systems contain different types of entities (e.g. social networks [36] and power supplies [33]), and the interaction between two individuals also depends on their types. The idea of studying such sophisticated networks via graphon particle systems has therefore drawn significant attention.

A heterogeneous particle system can be embedded into a (deterministic or random) graph (see [17, 19, 20, 21, 22, 31]), where the interaction strength between two types of particles is quantified by the corresponding edge weight. As the number of vertices increases and the graph becomes denser, the interaction strength approaches some bounded symmetric kernel G:[0,1]^{2}\to[0,1] called a graphon. In fact, every graphon is the limit of a sequence of finite graphs, as discussed in Chapter 11 of [27].

In recent years, graphon mean-field games have demonstrated their ability to model densely interacting networks in multiple studies, e.g. [14, 12, 32]. On the purely theoretical side, several results on the stability and stationarity of graphon mean-field systems have been established in [5, 6, 3, 7], and the concentration of measures is well studied in [6, 4]. These properties enable the study of mean-field systems from a statistical inference point of view.

Statistical inference methods are widely applied to learning the dynamics of interacting systems. Empirical data in a McKean-Vlasov model can be interpolated using a kernel to obtain estimates of the particle density [28, 8, 1]. In particular, the data-driven estimation algorithms in [28] automatically choose the best kernel bandwidths among a predetermined, possibly opaque, set, which ensures pointwise optimality even without explicitly specifying the parameters. Estimating the interaction force requires more technical tools, including the deconvolution method introduced by Johannes in [24]. These strategies offer firm technical support for the analysis of interacting systems with unknown driving forces, allowing predictions of the evolution of the systems based solely on empirical data.

1.2. Our contributions and organization of this paper

Recall the graphon function G in the mean-field system (1.1). In Section 2, we introduce a kernel interpolation method adopted from [28, 8]. We make continuous observations of the n-particle system (1.2) during a finite time interval [0,T]. The empirical data from the finite-population system are then interpolated in both the feature space \mathbb{R}^{d} and the index space I to produce a pointwise estimator \hat{\mu}^{n}_{h}(t,u,x) of the particle density function \mu(t,u,x). A further interpolation in the time variable leads to an estimator \hat{\beta}^{n}_{h,\kappa}(t,u,x) of the drift coefficient

\beta(t,u,x)\stackrel{\textup{def}}{=}\int_{I}G(u,v)\big(V(x)+F\ast\mu_{t,v}(x)\big)dv\,.

Then we apply a deconvolution method introduced in [24] to build a pointwise estimator \hat{G}^{n}_{\vartheta} of G. Here

\vartheta=(h_{1},h_{2},h_{3},\kappa_{0},\kappa_{1},\kappa_{2},r,\tilde{r})

are the parameters associated with the estimators: h=(h_{1},h_{2},h_{3})\in\mathbb{R}_{+}^{3} are the bandwidths of the kernels, \kappa=(\kappa_{0},\kappa_{1},\kappa_{2})\in\mathbb{R}_{+}^{3} are the denominator cutoff factors that prevent fractions from blowing up, and r,\tilde{r}>0 are cutoff radii. We explain them in more detail in Section 2. We show in Section 3 that there exists a sequence (\vartheta_{n})_{n\in\mathbb{N}} of parameters such that

\lim_{n\to\infty}\mathbf{E}\left|\hat{G}^{n}_{\vartheta_{n}}(u_{0},v_{0})-G(u_{0},v_{0})\right|^{2}=0\,,

subject to the regularity conditions.

We will disclose the particular setting of our problem in Section 2. This includes the continuity and integrability of the coefficients F,V,G and the initial data \mu_{0,u}. Then we define the kernel-interpolated estimators \hat{\mu}^{n}_{h} and \hat{\beta}^{n}_{h,\kappa}, with free choices of the kernel bandwidth vector h and cutoff factors \kappa. It is worth noticing that the bandwidths of our estimators are fixed throughout the algorithm, whereas the data-driven Goldenshluger-Lepski estimators applied in [28] make dynamic choices of bandwidths from a pre-determined finite set of candidates. The pre-determined set can be invisible to the user, and the algorithm automatically selects the best candidate and outputs the corresponding estimate. Such an algorithm attains optimal pointwise oracle estimates without precise knowledge of the system’s continuity properties and does not lose much efficiency at any tuple of plug-in arguments. However, the convergence of our estimator \hat{G}^{n}_{\vartheta} depends on the total L^{2}-errors of the plug-in estimators (instead of the pointwise errors), so it becomes more beneficial to fix the bandwidths all along. The minimax analysis in Section 4 shows that our estimator \hat{\mu}^{n}_{h} is still pointwise optimal when given enough information.

We will present upper bounds on the errors of the pointwise estimators in Section 3, with proofs in Section 5. The main ideas behind the proofs are the stability of the mean-field systems and the concentration of the particle density. We connect the (observed) finite-population system (1.2) to the intrinsic graphon mean-field system (1.1) through the following inequality. For the particle density \mu, for example, we have

(1.3) \mathbf{E}\left|\hat{\mu}^{n}_{h}-\mu\right|^{2}\leqslant 2\,\mathbf{E}\left|\hat{\mu}^{n}_{h}-\bar{\mu}^{n}_{h}\right|^{2}+2\,\mathbf{E}\left|\bar{\mu}^{n}_{h}-\mu\right|^{2}\,,

where

\bar{\mu}^{n}_{h}(t_{0},u_{0},x_{0})=\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\tfrac{i}{n})\,K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t_{0}))\,.

The first part is controlled by the convergence of (1.2) to (1.1) (see [3]). For the second part, we follow the idea of [28] and derive a Bernstein concentration inequality. The use of Bernstein’s inequality here avoids the extra constants that arise from the change of measures in [28], thanks to the independence of particles in the graphon mean-field system. It is worth noticing that all the constants appearing in the inequalities are global (independent of the plug-in arguments t_{0},u_{0},x_{0}), and we keep some of the explicit summations in the upper bounds on purpose (as can be seen in Lemmas 3.1 and 3.2). These properties preserve the integrability of the whole sums and maintain the nice asymptotic behavior of the estimator \hat{G}^{n}_{\vartheta}.

In Section 4, we perform a minimax analysis on the plug-in estimators of the particle density \mu and the drift coefficient \beta. We restrict our view to particle systems with locally Hölder continuous density functions, whose existence can be found in several classical texts on Fokker-Planck equations such as [9]. We present an alternative analysis of the pointwise behaviors of \hat{\mu}^{n}_{h} and \hat{\pi}^{n}_{h} with a change-of-measure strategy adapted from [28]. This improves the pointwise errors obtained in Section 3.1 at the cost of a constant factor depending on the value of \mu near (t_{0},u_{0},x_{0}). Balancing the several error terms leads to optimal asymptotic upper bounds. On the other hand, we derive (theoretical) lower bounds on the estimation error and compare them with the upper bounds, which demonstrates the optimality of our estimators. The proofs are given in Section 6.

2. Model and estimators

2.1. Setting, notation and assumptions

Let us fix a finite time horizon T>0. All observations are made within the time interval [0,T]. We denote by \mathcal{C}_{d} the space of \mathbb{R}^{d}-valued continuous functions on [0,T], i.e. \mathcal{C}_{d}\stackrel{\textup{def}}{=}C([0,T];\mathbb{R}^{d}). More generally, we write C^{k}(\mathcal{X};\mathcal{Y}) for the space of k-times continuously differentiable functions defined on \mathcal{X} taking values in \mathcal{Y}. Similarly, we write L^{p}(\mathcal{X};\mathcal{Y}) for the space of p-th power Lebesgue-integrable functions, and W^{s,p}(\mathcal{X};\mathcal{Y}) for the Sobolev space. The argument \mathcal{Y} is often omitted when \mathcal{Y}=\mathbb{R}.

An \mathbb{R}^{d}-valued function f can be written componentwise as (f_{k})_{1\leqslant k\leqslant d}. The Fourier transform of a function f:\mathbb{R}^{d}\to\mathbb{R}^{d} is defined componentwise via

\mathcal{F}_{\mathbb{R}^{d}}f(\xi)=(\mathcal{F}_{\mathbb{R}^{d}}f_{k}(\xi))_{1\leqslant k\leqslant d}=\left(\int_{\mathbb{R}^{d}}e^{-ix\cdot\xi}f_{k}(x)dx\right)_{1\leqslant k\leqslant d}\,.

This will be applied to the drift coefficients in our deconvolution method.

We impose the following assumptions on the graphon mean-field system (1.1).

Condition 2.1.
(1) The drift coefficient b:\mathbb{R}^{d}\times\mathbb{R}^{d}\to\mathbb{R}^{d} is bounded and has bounded first derivatives. It is Lipschitz continuous in the sense that there exists some constant C>0 such that

(2.1) \left|b(x,y)-b(x^{\prime},y^{\prime})\right|\leqslant C(\left|x-x^{\prime}\right|+\left|y-y^{\prime}\right|)\,,\qquad x,x^{\prime},y,y^{\prime}\in\mathbb{R}^{d}\,.

(2) The drift coefficient b takes the form b(x,y)=F(x-y)+V(x), where F,V\in W^{1,p}(\mathbb{R}^{d}) for p=1,2,\infty.

(3) The diffusion coefficient \sigma:\mathbb{R}^{d}\to\mathbb{R}^{d\times d} is Lipschitz in the operator norm on \mathbb{R}^{d\times d}, i.e. there exists some constant C>0 such that

\left\|\sigma(x)-\sigma(x^{\prime})\right\|\leqslant C\left|x-x^{\prime}\right|\,,\qquad x,x^{\prime}\in\mathbb{R}^{d}\,,

where \left\|\cdot\right\| is the operator norm of d\times d matrices.

(4) The diffusion coefficient \sigma is uniformly non-degenerate and bounded in the sense that there exist constants \sigma_{\pm}>0 such that

\sigma_{-}^{2}I\preceq\sigma\sigma^{T}\preceq\sigma_{+}^{2}I\,,

where two square matrices M and N satisfy M\preceq N if N-M is positive semi-definite.

Recall that the types of particles are indexed by I=[0,1], and the interaction strength between particles of two types is given by a graphon function G:I\times I\to[0,1]. We consider the following conditions on the structure of the graphon function.

Condition 2.2.
(1) The graphon function is piecewise Lipschitz in the sense that there exist a constant C>0 and a finite partition \bigcup_{j\in J}I_{j} of I such that

\left|G(u,v)-G(u^{\prime},v^{\prime})\right|\leqslant C(\left|u-u^{\prime}\right|+\left|v-v^{\prime}\right|)\,,\qquad(u,v),(u^{\prime},v^{\prime})\in I_{i}\times I_{j},\;i,j\in J\,.

(2) The graphon function G has the form G(u,v)=g(u-v), where g:\mathbb{R}\to[0,1] is a Lipschitz continuous function with g(0)=g_{0}\in(0,1] a given constant.

(3) In addition to item (2), the Fourier transform of g lies in L^{1}\cap L^{2} and decays fast enough that

\tilde{r}^{2}\int_{\left|w\right|>\tilde{r}}\left|\mathcal{F}g(w)\right|^{2}dw\to 0

as \tilde{r}\to\infty.

Finally, we examine the initial state of the system.

Condition 2.3.

We denote by \mathcal{P}(S) the space of probability measures on a Polish space S (e.g. \mathbb{R}^{d}, \mathcal{C}_{d}).

(1) The initial distributions \mu_{0,u}(dx) admit density functions x\mapsto\mu(0,u,x) with respect to the Lebesgue measure on \mathbb{R}^{d}. There exist constants c_{0}>0 and c_{1}\geqslant 1 such that

\sup_{u\in I}\int_{\mathbb{R}^{d}}\exp(c_{0}\left|x\right|^{2})\mu(0,u,x)\,dx\leqslant c_{1}\,.

(2) There exist a constant C>0 and a finite collection of intervals \{I_{j}\}_{j\in J} with \bigcup_{j\in J}I_{j}=I such that

\mathcal{W}_{2}(\mu_{0,u},\mu_{0,v})\leqslant C\left|u-v\right|\,,\qquad u,v\in I_{j},\quad j\in J\,,

where \mathcal{W}_{2}:\mathcal{P}(\mathbb{R}^{d})\times\mathcal{P}(\mathbb{R}^{d})\to[0,\infty] is the Wasserstein 2-distance.

(3) There exists a function \rho_{I}\in L^{2}\cap L^{\infty}(\mathbb{R}^{d}) such that \left|\mu_{0,u}-\mu_{0,v}\right|\leqslant\rho_{I}\left|u-v\right| almost everywhere, for every u,v\in I.

The (continuously indexed) graphon mean-field system built on appropriately chosen conditions from above has dynamics

dX_{u}(t)=\int_{I}\int_{\mathbb{R}^{d}}b(X_{u}(t),x)G(u,v)\mu_{t,v}(dx)\,dv\,dt+\sigma(X_{u}(t))dB_{u}(t)\,,\qquad X_{u}(0)\sim\mu_{0,u}\,,

for u\in I, t\in[0,T]. Define the drift term

\beta(t,u,x,\mu_{t})\stackrel{\textup{def}}{=}\int_{I}\int_{\mathbb{R}^{d}}b(x,y)G(u,v)\mu_{t,v}(dy)dv

and observe that \beta:[0,T]\times I\times\mathbb{R}^{d}\times\mathcal{P}(\mathbb{R}^{d})^{I}\to\mathbb{R}^{d} is measurable. We will always abbreviate it as \beta(t,u,x) (suppressing the mean-field argument). Under Conditions 2.1(1)(3), we know that \beta(t,u,\cdot) is Lipschitz continuous and has at most linear growth for every t\in[0,T] and u\in I. This means \mu_{t,u} is the unique weak solution to the Fokker-Planck equation associated with the diffusion process dX_{u}(t)=\beta(t,u,X_{u}(t))dt+\sigma(X_{u}(t))dB_{u}(t), and the map I\ni u\mapsto(\mu_{t,u})_{t\in[0,T]}\in\mathcal{P}(\mathcal{C}_{d}) is measurable by Proposition 2.1 in [3]. Further, under Conditions 2.1(4) and 2.3(1), every \mu_{\cdot,u} admits a density function \mu(t,u,x) with respect to the Lebesgue measure on [0,T]\times\mathbb{R}^{d} (see [9]). Note that \mu:[0,T]\times I\times\mathbb{R}^{d}\to\mathbb{R}_{+} is measurable. We claim that the densities are asymptotically bounded.

Proposition 2.1.

Assume Conditions 2.1(1)(3)(4), 2.3(1), and that b is almost everywhere bounded. There exist some C,R>0 such that, for every p>d+2 and every bounded open set U disjoint from the closed ball \overline{B(0,R)}, we have for all t\in(0,T) and u\in I that

\left\|\mu_{t,u}\right\|_{L^{\infty}(U)}\leqslant C\left\|\mu_{0,u}\right\|_{L^{\infty}(U)}+Ct^{\frac{p-d-2}{2}}(1+\left\|b\right\|_{\infty}^{p})\,.

As a consequence,

\sup_{t\in[0,T],u\in I}\left\|\mu_{t,u}\mathbf{1}_{\{\left|x\right|>R\}}\right\|_{2}<\infty\,.

The proof is given in Appendix B. It also shows the L^{2}-integrability of the density function \mu_{t,u} at any t\in(0,T) (see also Corollary 8.2.2 in [9]).

Our goal in the next subsection is to construct estimators of the functions \mu(t,u,x) and \beta(t,u,x), and in turn an estimator of G(u,v)=g(u-v). We will use the L^{2}-distance in the probability space to describe our estimation errors.

2.2. Plug-in estimators

To estimate the underlying functions described in the previous paragraph, we make continuous-time observations of the n-particle system (1.2),

X^{n}_{i}(t)=X_{\frac{i}{n}}(0)+\int_{0}^{t}\frac{1}{n}\sum_{j=1}^{n}b(X^{n}_{i}(s),X^{n}_{j}(s))g^{n}_{ij}ds+\int_{0}^{t}\sigma(X^{n}_{i}(s))dB_{\frac{i}{n}}(s)\,,\qquad i=1,\dots,n\,,

where g^{n}_{ij}=G(\frac{i}{n},\frac{j}{n}) for every i,j\in\{1,\dots,n\}. This finite system is consistent with the mean-field system in the following sense.

Lemma 2.1 (Theorem 3.2, [3]).

Assume Conditions 2.1(1)(3), 2.2(1), and 2.3(1)(2) hold. Then there exists a constant C>0 such that

(2.2) \sup_{t\in[0,T]}\max_{1\leqslant i\leqslant n}\mathbf{E}\left|X^{n}_{i}(t)-X_{\frac{i}{n}}(t)\right|^{2}\leqslant\frac{C}{n}\,.
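To fix ideas, the following minimal sketch simulates the observed finite system (1.2) with an Euler-Maruyama scheme in dimension d=1. The particular choices of b, \sigma, G, the initial law, and all numerical parameters below are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def simulate_finite_system(n, T, n_steps, b, sigma, G, x0_sampler, seed=0):
    """Euler-Maruyama sketch of the n-particle system (1.2) with d = 1.

    b(x, y)            : pairwise drift integrand
    sigma(x)           : diffusion coefficient
    G(u, v)            : graphon interaction weights
    x0_sampler(u, rng) : draws the initial value X_{i/n}(0) for a particle of type u
    Returns an array of shape (n_steps + 1, n) containing the discretized paths.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    u = np.arange(1, n + 1) / n                      # particle types i/n
    g = G(u[:, None], u[None, :])                    # weights g^n_{ij} = G(i/n, j/n)
    X = np.empty((n_steps + 1, n))
    X[0] = np.array([x0_sampler(ui, rng) for ui in u])
    for k in range(n_steps):
        x = X[k]
        # drift of particle i: (1/n) * sum_j b(x_i, x_j) * g_{ij}
        drift = (b(x[:, None], x[None, :]) * g).mean(axis=1)
        X[k + 1] = x + drift * dt + sigma(x) * np.sqrt(dt) * rng.standard_normal(n)
    return X

# Illustrative choices (assumptions, not from the paper):
# F(z) = -z, V = 0, constant diffusion, g(u - v) = exp(-|u - v|), N(0, 1) initial data.
paths = simulate_finite_system(
    n=200, T=1.0, n_steps=500,
    b=lambda x, y: -(x - y),
    sigma=lambda x: 0.5 * np.ones_like(x),
    G=lambda u, v: np.exp(-np.abs(u - v)),
    x0_sampler=lambda u, rng: rng.standard_normal(),
)
```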

Kernel Interpolation

We introduce an HJK-kernel adapted from [28]. Choose three functions H\in C^{1}_{c}(\mathbb{R}), J\in C^{1}_{c}(\mathbb{R}), K\in C^{1}_{c}(\mathbb{R}^{d}) that are non-negative and normalized:

\int_{\mathbb{R}}H(t)dt=\int_{\mathbb{R}}J(u)du=\int_{\mathbb{R}^{d}}K(x)dx=1\,,

and have order (at least) 1:

\int_{\mathbb{R}}tH(t)dt=\int_{\mathbb{R}}uJ(u)du=\int_{\mathbb{R}^{d}}x_{i}K(x)dx=0\,,\quad i=1,\dots,d\,.

With the bandwidth vector h=(h_{1},h_{2},h_{3})\in\mathbb{R}_{+}^{3}, the dilations are defined by

H_{h_{1}}(t)=h_{1}^{-1}H(h_{1}^{-1}t)\,,\quad J_{h_{2}}(u)=h_{2}^{-1}J(h_{2}^{-1}u)\,,\quad K_{h_{3}}(x)=h_{3}^{-d}K(h_{3}^{-1}x)\,,

and the products are written as

(J\otimes K)_{h}(u,x)=J_{h_{2}}(u)K_{h_{3}}(x)\,,\qquad(H\otimes J\otimes K)_{h}(t,u,x)=H_{h_{1}}(t)J_{h_{2}}(u)K_{h_{3}}(x)\,.

Since the bandwidths are free to vary, we may assume without loss of generality that the kernels H,J,K are supported in the closed unit ball (of the space on which they are defined).

With a given number n of particles, we run the finite system (X^{n}_{i})_{i=1,\dots,n} over the time interval [0,T]. This gives us the empirical distribution

\mu^{n}_{t}(du,dx)=\frac{1}{n}\sum_{i=1}^{n}\delta_{X^{n}_{i}(t)}(dx)\delta_{\frac{i}{n}}(du)\,,\qquad t\in[0,T]\,.

Interpolating it with the JK part of the kernel gives a plug-in estimator of the density \mu:

(2.3) \hat{\mu}^{n}_{h}(t,u,x)\stackrel{\textup{def}}{=}(J\otimes K)_{h}\ast\mu^{n}_{t}(u,x)=\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}(u-\tfrac{i}{n})K_{h_{3}}(x-X^{n}_{i}(t))

for t\in[0,T], u\in I, x\in\mathbb{R}^{d}.
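As a concrete illustration, the following sketch evaluates \hat{\mu}^{n}_{h}(t,u_{0},x_{0}) in dimension d=1 from the observed positions at a single time. The biweight kernel used for both J and K is an illustrative example of a non-negative, compactly supported C^{1} kernel that is normalized and of order 1; it is an assumption, not a choice made in the paper.

```python
import numpy as np

def biweight(z):
    """Non-negative C^1 kernel on [-1, 1]; integrates to 1 and has zero first moment."""
    return np.where(np.abs(z) <= 1.0, 15.0 / 16.0 * (1.0 - z ** 2) ** 2, 0.0)

def density_estimator(X_t, u0, x0, h2, h3):
    """Plug-in estimator (2.3) of mu(t, u0, x0) for d = 1.

    X_t : array of shape (n,), the observed positions X^n_i(t), i = 1, ..., n.
    """
    n = X_t.shape[0]
    u = np.arange(1, n + 1) / n                     # types i/n
    Jh = biweight((u0 - u) / h2) / h2               # J_{h2}(u0 - i/n)
    Kh = biweight((x0 - X_t) / h3) / h3             # K_{h3}(x0 - X^n_i(t))
    return np.mean(Jh * Kh)                         # (1/n) sum_i J_{h2} * K_{h3}

# Toy usage with synthetic positions (placeholder data only)
X_t = np.random.default_rng(0).standard_normal(500)
mu_hat = density_estimator(X_t, u0=0.5, x0=0.0, h2=0.1, h3=0.3)
```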

We also consider an auxiliary quantity \pi:[0,T]\times I\times\mathbb{R}^{d}\to\mathbb{R}^{d} defined by

\pi(t,u,x)\stackrel{\textup{def}}{=}\beta(t,u,x)\mu(t,u,x)\,.

A discrete approximation is given by

\pi^{n}(dt,du,dx)\stackrel{\textup{def}}{=}\frac{1}{n}\sum_{i=1}^{n}\delta_{X^{n}_{i}(t)}(dx)\delta_{\frac{i}{n}}(du)dX^{n}_{i}(t)\,,

so that for any test function f, we have

\int_{[0,T]\times I\times\mathbb{R}^{d}}f(t,u,x)\pi^{n}(dt,du,dx)=\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}f(t,\tfrac{i}{n},X^{n}_{i}(t))dX^{n}_{i}(t)

as a stochastic integral. Interpolating it with the HJK kernel gives a plug-in estimator of \pi:

(2.4) \hat{\pi}^{n}_{h}(t,u,x)\stackrel{\textup{def}}{=}(H\otimes J\otimes K)_{h}\ast\pi^{n}(t,u,x)=\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}H_{h_{1}}(t-s)J_{h_{2}}(u-\tfrac{i}{n})K_{h_{3}}(x-X^{n}_{i}(s))dX^{n}_{i}(s)

for t\in[0,T], u\in I, x\in\mathbb{R}^{d}. This leads to a plug-in estimator of \beta,

(2.5) \hat{\beta}^{n}_{h,\kappa}\stackrel{\textup{def}}{=}\frac{\hat{\pi}^{n}_{h}}{\hat{\mu}^{n}_{h}\lor\kappa_{2}}\,,

where \kappa_{2}>0 is a cutoff parameter that keeps the denominator away from zero and thus prevents the fraction from blowing up.
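In practice the stochastic integral in (2.4) has to be approximated from discretely sampled paths. The following minimal sketch does this in dimension d=1 using a left-point Riemann sum over the path increments, with the same biweight kernel as in the previous sketch playing the roles of H, J, and K; all of these numerical choices are illustrative assumptions.

```python
import numpy as np

def biweight(z):
    return np.where(np.abs(z) <= 1.0, 15.0 / 16.0 * (1.0 - z ** 2) ** 2, 0.0)

def drift_estimator(X, times, t0, u0, x0, h, kappa2):
    """Plug-in estimator (2.5) of beta(t0, u0, x0) for d = 1.

    X     : array of shape (m + 1, n), X[k, i] approximating X^n_i(times[k])
    times : array of shape (m + 1,), observation times in [0, T]
    h     : bandwidth vector (h1, h2, h3)
    The stochastic integral defining pi_hat in (2.4) is approximated by a
    left-point sum over the increments dX^n_i.
    """
    h1, h2, h3 = h
    n = X.shape[1]
    u = np.arange(1, n + 1) / n
    Jh = biweight((u0 - u) / h2) / h2                  # J_{h2}(u0 - i/n), shape (n,)
    dX = np.diff(X, axis=0)                            # path increments, shape (m, n)
    Hh = biweight((t0 - times[:-1]) / h1) / h1         # H_{h1}(t0 - s), shape (m,)
    Kh = biweight((x0 - X[:-1]) / h3) / h3             # K_{h3}(x0 - X^n_i(s)), shape (m, n)
    pi_hat = np.sum(Hh[:, None] * Jh[None, :] * Kh * dX) / n
    # mu_hat (2.3) evaluated at the observation time closest to t0
    k0 = int(np.argmin(np.abs(times - t0)))
    mu_hat = np.mean(Jh * biweight((x0 - X[k0]) / h3) / h3)
    return pi_hat / max(mu_hat, kappa2)                # beta_hat = pi_hat / (mu_hat v kappa2)

# Toy usage with Brownian placeholder paths
rng = np.random.default_rng(1)
m, n, T = 400, 300, 1.0
times = np.linspace(0.0, T, m + 1)
X = np.cumsum(np.sqrt(T / m) * rng.standard_normal((m + 1, n)), axis=0)
beta_hat = drift_estimator(X, times, t0=0.5, u0=0.5, x0=0.0, h=(0.1, 0.2, 0.3), kappa2=0.05)
```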

Deconvolution

The deconvolution method is typically used to recover a function f from the convolution f\ast g. We follow the ideas of [24] and [28]. Here, we only present the definitions and estimators, deferring the full intuition to Appendix A.

To apply the Fourier transform on the index space I=[0,1], we extend all measurable functions defined on I by zero. With some abuse of notation, we let

\mu(t,u,x)=\begin{cases}\mu(t,u,x)\,,&u\in I\,,\\ 0\,,&u\in\mathbb{R}\setminus I\,,\end{cases}\qquad\hat{\mu}^{n}_{h}(t,u,x)=\begin{cases}\hat{\mu}^{n}_{h}(t,u,x)\,,&u\in I\,,\\ 0\,,&u\in\mathbb{R}\setminus I\,,\end{cases}

\pi(t,u,x)=\begin{cases}\pi(t,u,x)\,,&u\in I\,,\\ 0\,,&u\in\mathbb{R}\setminus I\,,\end{cases}\qquad\hat{\pi}^{n}_{h}(t,u,x)=\begin{cases}\hat{\pi}^{n}_{h}(t,u,x)\,,&u\in I\,,\\ 0\,,&u\in\mathbb{R}\setminus I\,,\end{cases}

\beta(t,u,x)=\begin{cases}\beta(t,u,x)\,,&u\in I\,,\\ 0\,,&u\in\mathbb{R}\setminus I\,,\end{cases}\qquad\hat{\beta}^{n}_{h,\kappa}(t,u,x)=\begin{cases}\hat{\beta}^{n}_{h,\kappa}(t,u,x)\,,&u\in I\,,\\ 0\,,&u\in\mathbb{R}\setminus I\,.\end{cases}

Then we define the Fourier transform of a function f supported on I via

\mathcal{F}_{I}f(w)=\int_{I}e^{-iwu}f(u)du=\int_{\mathbb{R}}e^{-iwu}f(u)du\,.

Note that we may view \mathcal{F}_{I} as a linear operator acting in the index variable of function-valued functions, and it admits an inverse transform on L^{2}-spaces.

In addition, we consider a linear operator \mathcal{L}_{\phi} acting on time-dependent functions, defined by

\mathcal{L}_{\phi}f=\int_{0}^{T}f(t)\phi(t)dt\,,

where \phi\in L^{\infty}([0,T];\mathbb{C}) has compact support in (0,T) and satisfies \int_{0}^{T}\phi(t)dt=0 (we denote this subspace of L^{\infty} functions by \dot{L}^{\infty}). We write \mathcal{L} for \mathcal{L}_{\phi} when \phi is fixed and there is no ambiguity. The intuition behind this operator is also explained in Appendix A.

Main estimator and its convergence

Finally, with some additional cutoff parameters, we introduce our estimator of the graphon function,

(2.6) \hat{G}^{n}_{\vartheta}(u_{0},v_{0})\stackrel{\textup{def}}{=}g_{0}\cdot\frac{\left\|\mathcal{F}_{I}^{-1}\big(\frac{\mathcal{T}\hat{\beta}^{n}_{h,\kappa,r}}{\mathcal{T}\hat{\mu}^{n}_{h,r}}\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}^{n}_{h,r}\right|>\kappa_{1},\left|w\right|\leqslant\tilde{r}\}}\big)(u_{0}-v_{0})\right\|_{L^{2}(\mathbb{R}^{d})}}{\left\|\mathcal{F}_{I}^{-1}\big(\frac{\mathcal{T}\hat{\beta}^{n}_{h,\kappa,r}}{\mathcal{T}\hat{\mu}^{n}_{h,r}}\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}^{n}_{h,r}\right|>\kappa_{1},\left|w\right|\leqslant\tilde{r}\}}\big)(0)\right\|_{L^{2}(\mathbb{R}^{d})}\lor\kappa_{0}}\,,

where \mathcal{T}\stackrel{\textup{def}}{=}\mathcal{F}_{I}\mathcal{F}_{\mathbb{R}^{d}}\mathcal{L}_{\phi}, and

\hat{\mu}^{n}_{h,r}\stackrel{\textup{def}}{=}\hat{\mu}^{n}_{h}\mathbf{1}_{\{\left|x\right|\leqslant r\}}\,,\qquad\hat{\beta}^{n}_{h,\kappa,r}\stackrel{\textup{def}}{=}\hat{\beta}^{n}_{h,\kappa}\mathbf{1}_{\{\left|x\right|\leqslant r\}}\,.

We will explain the intuition of this estimator in Appendix A.
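For readers who prefer to see (2.6) operationally, the following sketch evaluates it on a regular (t,u,x) grid in dimension d=1 using discrete Fourier transforms. The grid-based discretization, the use of plain FFTs (whose phase factors cancel in the ratio \mathcal{T}\hat{\beta}/\mathcal{T}\hat{\mu}), and all numerical choices are illustrative assumptions rather than the paper's prescription.

```python
import numpy as np

def graphon_estimator(mu_hat, beta_hat, phi, dt, du, dx,
                      u0, v0, g0, kappa0, kappa1, r_tilde):
    """Sketch of the deconvolution estimator (2.6) on a regular grid, d = 1.

    mu_hat, beta_hat : arrays of shape (n_t, n_u, n_x) with the plug-in estimators
                       on a (t, u, x) grid, where the u-grid starts at 0
    phi              : array of shape (n_t,), a mean-zero weight supported in (0, T)
    """
    # L_phi: integrate out the time variable against phi
    Lmu = np.tensordot(phi, mu_hat, axes=(0, 0)) * dt          # shape (n_u, n_x)
    Lbeta = np.tensordot(phi, beta_hat, axes=(0, 0)) * dt
    # T = F_I F_{R^d} L_phi: discrete Fourier transforms in x (axis 1) and in u (axis 0)
    Tmu = np.fft.fft(np.fft.fft(Lmu, axis=1) * dx, axis=0) * du
    Tbeta = np.fft.fft(np.fft.fft(Lbeta, axis=1) * dx, axis=0) * du
    n_u, n_x = Lmu.shape
    w = 2.0 * np.pi * np.fft.fftfreq(n_u, d=du)                # frequencies in the index variable
    keep = (np.abs(Tmu) > kappa1) & (np.abs(w)[:, None] <= r_tilde)
    ratio = np.where(keep, Tbeta / np.where(keep, Tmu, 1.0), 0.0)

    def A(s):
        # F_I^{-1}(ratio)(s) as a function of the x-frequency, then its L2 norm
        dw = np.abs(w[1] - w[0])
        inv = (np.exp(1j * w * s) @ ratio) * dw / (2.0 * np.pi)
        dxi = 2.0 * np.pi / (dx * n_x)                         # x-frequency spacing
        return np.sqrt(np.sum(np.abs(inv) ** 2) * dxi)

    return g0 * A(u0 - v0) / max(A(0.0), kappa0)
```

The quadrature on the frequency grid replaces the exact inverse Fourier transform \mathcal{F}_{I}^{-1}; refining the grid and enlarging the x-window brings this sketch closer to (2.6).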

For the estimate to converge, we need a further (relatively strong) assumption on some data intrinsic to the particle system.

Assumption 2.1.

Given Conditions 2.1, 2.2, and 2.3, there exists \phi\in\dot{L}^{\infty}([0,T];\mathbb{C}), compactly supported in (0,T), such that \mathcal{F}_{I}\mathcal{F}_{\mathbb{R}^{d}}\mathcal{L}_{\phi}\mu\neq 0 almost everywhere on \mathbb{R}\times\mathbb{R}^{d}.

Theorem 2.1 (Main theorem).

Assume Conditions 2.1, 2.2, 2.3, and Assumption 2.1, and take \phi given by Assumption 2.1. There exists a function \mathcal{U} of the number of particles n and the parameters

\vartheta=(h_{1},h_{2},h_{3},\kappa_{0},\kappa_{1},\kappa_{2},r,\tilde{r})

such that

(2.7) \max_{u_{0},v_{0}\in I}\mathbf{E}\left|\hat{G}^{n}_{\vartheta}(u_{0},v_{0})-G(u_{0},v_{0})\right|^{2}\leqslant\mathcal{U}(n,\vartheta)\,,

whenever n,r,\tilde{r} are sufficiently large, h_{1},h_{2},h_{3},\kappa_{1}>0 are sufficiently small, and \kappa_{0},\kappa_{2}>0 satisfy \kappa_{0}<g_{0}\left\|F\right\|_{2} and

\kappa_{2}<\inf\{\mu(t,u,x)\mid t\in\operatorname{supp}(\phi),\,u\in I,\,\left|x\right|\leqslant r\}\,.

Moreover, there exists a sequence of parameter choices (\vartheta_{n})_{n\in\mathbb{N}} such that \mathcal{U}(n,\vartheta_{n})\to 0 as n\to\infty.

Remark 2.1.

Notice that the bound \mathcal{U}(n,\vartheta) is uniform over all u_{0},v_{0}\in I. Hence the cut distance from an estimator \hat{G} to the true graphon function G is bounded by

d_{\square}(\hat{G},G)\stackrel{\textup{def}}{=}\sup_{J_{1},J_{2}\in\mathcal{B}(I)}\left|\int_{J_{1}\times J_{2}}\big(\hat{G}(u,v)-G(u,v)\big)\,du\,dv\right|\leqslant\int_{I\times I}\left|\hat{G}(u,v)-G(u,v)\right|du\,dv\,.

This implies

\mathbf{E}\big(d_{\square}(\hat{G}^{n}_{\vartheta},G)\big)\leqslant\int_{I\times I}\mathbf{E}\left|\hat{G}^{n}_{\vartheta}(u,v)-G(u,v)\right|du\,dv\leqslant\mathcal{U}(n,\vartheta)^{1/2}\,.

Hence we also have convergence of the estimator \hat{G}^{n}_{\vartheta} in the cut metric.

Remark 2.2.

Assumption 2.1 is made for the purpose of dominated convergence and is standard in a variety of applications of the deconvolution method [24]. Yet it is nontrivial to verify, as it involves the distribution over the whole time interval. We discuss this further in Appendix C.

3. Convergence of estimators

3.1. Error bounds for density and drift

We give estimates of the particle density \mu and the intermediate quantity \pi in the general setting. These ultimately contribute to the estimate of the graphon function. From this section onwards, we use the asymptotic comparison symbol \lesssim, where f\lesssim g means that there exists some constant c>0 such that f\leqslant cg. In addition, we write f\lesssim_{q}g if f\leqslant cg for some constant c depending on the quantity q (e.g. the time horizon T or the dimension d).

Estimates of the particle density \mu(t,u,x)

Lemma 3.1.

Assume that Conditions 2.1(1)(3), 2.2(1), and 2.3(1)(2) hold. For t_{0}\in(0,T), u_{0}\in(0,1), x_{0}\in\Omega, we have

\mathbf{E}\left|\hat{\mu}^{n}_{h}(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\leqslant C\Big(n^{-2}h_{3}^{-2-2d}\left\|\nabla K\right\|_{\infty}^{2}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\tfrac{i}{n})^{2}+n^{-2}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\tfrac{i}{n})^{2}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|^{2}_{L^{2}(\mu_{t_{0},\frac{i}{n}})}+n^{-2}h_{2}^{-2}h_{3}^{-2d}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{\infty}^{2}+n^{-2}h_{3}^{-2-2d}\left\|J\right\|_{2}^{2}\left\|\nabla K\right\|_{\infty}^{2}+n^{-3}h_{2}^{-2}\left\|\nabla J\right\|_{\infty}^{2}\sum_{i=1}^{n}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|_{L^{2}(\mu_{t_{0},\frac{i}{n}})}^{2}\Big)+\left|(J\otimes K)_{h}\ast\mu_{t_{0}}(u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\,,

where

(J\otimes K)_{h}\ast\mu_{t_{0}}(u_{0},x_{0})=\int_{I}\int_{\mathbb{R}^{d}}(J\otimes K)_{h}(u_{0}-u,x_{0}-x)\mu(t_{0},u,x)\,dx\,du\,.

Integrating the above pointwise errors, we obtain the following L^{2}-error bound for the estimator \hat{\mu}^{n}_{h,r}.

Corollary 3.1.

Assume the same hypotheses as in Lemma 3.1. Fix a compact interval [\tau_{1},\tau_{2}]\subset(0,T). For n\gg 1, h_{2},h_{3}\ll 1, r\gg 1, we have

\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\mathbb{R}^{d}}\mathbf{E}\left|\hat{\mu}^{n}_{h,r}(t,u,x)-\mu(t,u,x)\right|^{2}dx\,du\,dt\leqslant C\big(\theta_{2,\mu}(r)+\theta_{3,\mu}(h)+r^{d}(n^{-2}h_{3}^{-2-2d}+n^{-2}h_{2}^{-2}h_{3}^{-2d})+n^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{2}^{-2}h_{3}^{-d}\big)\,,

where C is a constant depending on T,d,b,J,K. Here \theta_{2,\mu}:\mathbb{R}_{+}\to\mathbb{R}_{+} is a function such that \theta_{2,\mu}(r)\to 0 as r\to\infty, and \theta_{3,\mu}:\mathbb{R}_{+}^{3}\to\mathbb{R}_{+} is a function such that \theta_{3,\mu}(h)\to 0 as h_{2}+h_{3}\to 0.

Estimates of the drift term \beta(t,u,x)

Lemma 3.2.

Assume Conditions 2.1(1)(3)(4), 2.2(1), 2.3(1)(2), and that b is bounded. Then, for t_{0}\in(0,T), u_{0}\in(0,1), x_{0}\in\mathbb{R}^{d}, we have

\mathbf{E}\left|\hat{\pi}^{n}_{h}(t_{0},u_{0},x_{0})-\pi(t_{0},u_{0},x_{0})\right|^{2}\leqslant C\Big(Td^{2}\sigma_{+}^{2}n^{-1}h_{1}^{-2}h_{2}^{-2}h_{3}^{-2d}+Tn^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-2-2d}\left\|b\right\|_{\infty}^{2}\left\|H\right\|_{2}^{2}\left\|J\right\|_{\infty}^{2}\left\|\nabla K\right\|_{\infty}^{2}+Tn^{-2}h_{1}^{-1}h_{3}^{-2d}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\tfrac{i}{n})^{2}+Tn^{-2}h_{1}^{-1}h_{2}^{-2}h_{3}^{-2d}\left\|b\right\|_{\infty}^{2}\left\|H\right\|_{2}^{2}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{\infty}^{2}+Tn^{-2}\left\|b\right\|_{\infty}^{2}\int_{0}^{T}H_{h_{1}}^{2}(t_{0}-t)\sum_{i=1}^{n}J_{h_{2}}^{2}(u_{0}-\tfrac{i}{n})\left\|K_{h_{3}}(x_{0}-\cdot)\right\|_{L^{2}(\mu_{t,\frac{i}{n}})}^{2}dt+Tn^{-2}h_{1}^{-1}h_{2}^{-1}h_{3}^{-2-2d}\left\|b\right\|_{\infty}^{2}\left\|H\right\|_{2}^{2}\left\|J\right\|_{2}^{2}\left\|\nabla K\right\|_{\infty}^{2}+Tn^{-2}h_{1}^{-1}h_{2}^{-1}h_{3}^{-2d}\left\|H\right\|_{2}^{2}\left\|J\right\|_{2}^{2}\left\|K\right\|_{\infty}^{2}+Tn^{-3}h_{2}^{-4}\left\|b\right\|_{\infty}^{2}\left\|\nabla J\right\|_{\infty}^{2}\int_{0}^{T}H_{h_{1}}^{2}(t_{0}-t)\sum_{i=1}^{n}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|_{L^{2}(\mu_{t,\frac{i}{n}})}^{2}dt\Big)+\left|(H\otimes J\otimes K)_{h}\ast\pi(t_{0},u_{0},x_{0})-\pi(t_{0},u_{0},x_{0})\right|^{2}\,.

Recall that \hat{\beta}^{n}_{h,\kappa}=\frac{\hat{\pi}^{n}_{h}}{\hat{\mu}^{n}_{h}\lor\kappa_{2}}. Integrating the above pointwise errors and using Corollary 3.1, we obtain the following L^{2}-error bound for the estimator \hat{\beta}^{n}_{h,\kappa,r}.

Corollary 3.2.

Assume the same hypotheses as in Lemma 3.2. Fix a compact interval [\tau_{1},\tau_{2}]\subset(0,T). For n\gg 1, h_{1},h_{2},h_{3}\ll 1, r\gg 1, and 0<\kappa_{2}<\inf\{\mu(t,u,x)\mid t\in[\tau_{1},\tau_{2}],\,u\in I,\,\left|x\right|\leqslant r\}, we have

\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\mathbb{R}^{d}}\mathbf{E}\left|\hat{\beta}^{n}_{h,\kappa,r}(t,u,x)-\beta(t,u,x)\right|^{2}dx\,du\,dt\leqslant C\big(\kappa_{2}^{-2}(n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-4}h_{3}^{-d})+\kappa_{2}^{-2}r^{d}(n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-2-2d}+n^{-1}h_{1}^{-2}h_{2}^{-2}h_{3}^{-2d})+\kappa_{2}^{-2}(\theta_{3,\mu}(h)+\theta_{3,\pi}(h))+\theta_{2,\beta}(r)\big)\,,

where the constant C depends on T,d,b,\sigma,H,J,K. Here \theta_{2,\beta}:\mathbb{R}_{+}\to\mathbb{R}_{+} is a function such that \theta_{2,\beta}(r)\to 0 as r\to\infty, and \theta_{3,\pi}:\mathbb{R}_{+}^{3}\to\mathbb{R}_{+} is a function such that \theta_{3,\pi}(h)\to 0 as h_{1}+h_{2}+h_{3}\to 0.

Although the proofs of these bounds are postponed to Section 5, we can already use them to justify the main result (Theorem 2.1).

3.2. Proof of main theorem

Proof of Theorem 2.1.

Step 1: For simplicity, we usually abbreviate \hat{G}^{n}_{\vartheta} as \hat{G}, and similarly for \hat{\mu} and \hat{\beta}.

Fix arbitrary u_{0},v_{0}\in I. Write \hat{G} as

\hat{G}(u_{0},v_{0})=g_{0}\cdot\frac{\hat{A}(u_{0}-v_{0})}{\hat{A}(0)\lor\kappa_{0}}\,,

where

\hat{A}(u)\stackrel{\textup{def}}{=}\left\|\mathcal{F}_{I}^{-1}\big(\frac{\mathcal{T}\hat{\beta}^{n}_{h,\kappa,r}}{\mathcal{T}\hat{\mu}^{n}_{h,r}}\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}^{n}_{h,r}\right|>\kappa_{1},\left|w\right|\leqslant\tilde{r}\}}\big)(u)\right\|_{L^{2}(\mathbb{R}^{d})}\,.

Also let A(u)=g(u)\left\|F\right\|_{2}. When \kappa_{0}<A(0), we have

\left|\hat{G}(u_{0},v_{0})-G(u_{0},v_{0})\right|^{2}=g_{0}^{2}\left|\frac{A(u_{0}-v_{0})(\hat{A}(0)-A(0))}{A(0)(\hat{A}(0)\lor\kappa_{0})}+\frac{A(u_{0}-v_{0})-\hat{A}(u_{0}-v_{0})}{\hat{A}(0)\lor\kappa_{0}}\right|^{2}\lesssim\frac{G(u_{0},v_{0})^{2}\left|\hat{A}(0)\lor\kappa_{0}-A(0)\right|^{2}}{(\hat{A}(0)\lor\kappa_{0})^{2}}+\frac{\left|A(u_{0}-v_{0})-\hat{A}(u_{0}-v_{0})\right|^{2}}{(\hat{A}(0)\lor\kappa_{0})^{2}}\lesssim\kappa_{0}^{-2}\left(\left|\hat{A}(0)-A(0)\right|^{2}+\left|A(u_{0}-v_{0})-\hat{A}(u_{0}-v_{0})\right|^{2}\right)\,.

So it suffices to bound the expressions \mathbf{E}\left|\hat{A}(u)-A(u)\right|^{2}, and we will do that in the following steps.

Step 2: By Minkowski’s inequality, we have for each u\in\mathbb{R} that

|I1(𝒯β^h,κ,rn𝒯μ^h,rn𝟏{|𝒯μ^h,rn|>κ1,|w|r~})(u)L2(d)g(u)FL2(d)|\displaystyle\left|\left\|\mathcal{F}_{I}^{-1}\big{(}\frac{\mathcal{T}\hat{\beta}^{n}_{h,\kappa,r}}{\mathcal{T}\hat{\mu}^{n}_{h,r}}\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}^{n}_{h,r}\right|>\kappa_{1},\left|w\right|\leqslant\tilde{r}\}}\big{)}(u)\right\|_{L^{2}(\mathbb{R}^{d})}-g(u)\left\|F\right\|_{L^{2}(\mathbb{R}^{d})}\right|
I1(𝒯β^h,κ,rn𝒯μ^h,rn𝟏{|𝒯μ^h,rn|>κ1,|w|r~})(u)g(u)dFL2(d)\displaystyle\leqslant\left\|\mathcal{F}_{I}^{-1}\big{(}\frac{\mathcal{T}\hat{\beta}^{n}_{h,\kappa,r}}{\mathcal{T}\hat{\mu}^{n}_{h,r}}\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}^{n}_{h,r}\right|>\kappa_{1},\left|w\right|\leqslant\tilde{r}\}}\big{)}(u)-g(u)\mathcal{F}_{\mathbb{R}^{d}}F\right\|_{L^{2}(\mathbb{R}^{d})}
=I1((𝒯β^h,κ,rn𝒯μ^h,rn𝟏{|𝒯μ^h,rn|>κ1,|w|r~}IgdF))(u)L2(d).\displaystyle=\left\|\mathcal{F}_{I}^{-1}\left(\big{(}\frac{\mathcal{T}\hat{\beta}^{n}_{h,\kappa,r}}{\mathcal{T}\hat{\mu}^{n}_{h,r}}\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}^{n}_{h,r}\right|>\kappa_{1},\left|w\right|\leqslant\tilde{r}\}}-\mathcal{F}_{I}g\mathcal{F}_{\mathbb{R}^{d}}F\big{)}\right)(u)\right\|_{L^{2}(\mathbb{R}^{d})}\,.

Then

𝐄|A^(u)A(u)|2\displaystyle\mathbf{E}\left|\hat{A}(u)-A(u)\right|^{2} 𝐄I1(𝒯β^h,κ,rn𝒯μ^h,rn𝟏{|𝒯μ^h,rn|>κ1,|w|r~}IgdF)(u)L2(d)2\displaystyle\lesssim\mathbf{E}\left\|\mathcal{F}_{I}^{-1}\big{(}\frac{\mathcal{T}\hat{\beta}^{n}_{h,\kappa,r}}{\mathcal{T}\hat{\mu}^{n}_{h,r}}\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}^{n}_{h,r}\right|>\kappa_{1},\left|w\right|\leqslant\tilde{r}\}}-\mathcal{F}_{I}g\mathcal{F}_{\mathbb{R}^{d}}F\big{)}(u)\right\|_{L^{2}(\mathbb{R}^{d})}^{2}
r~2𝐄|w|r~(𝒯β^h,κ,rn𝒯μ^h,rn𝟏{|𝒯μ^h,rn|>κ1}IgdF)(w)L2(d)2𝑑w\displaystyle\lesssim\tilde{r}^{2}\mathbf{E}\int_{\left|w\right|\leqslant\tilde{r}}\left\|\big{(}\frac{\mathcal{T}\hat{\beta}^{n}_{h,\kappa,r}}{\mathcal{T}\hat{\mu}^{n}_{h,r}}\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}^{n}_{h,r}\right|>\kappa_{1}\}}-\mathcal{F}_{I}g\mathcal{F}_{\mathbb{R}^{d}}F\big{)}(w)\right\|_{L^{2}(\mathbb{R}^{d})}^{2}dw
+dF|w|>r~eiuwIg(w)𝑑wL2(d)2.\displaystyle\qquad\qquad+\left\|\mathcal{F}_{\mathbb{R}^{d}}F\int_{\left|w\right|>\tilde{r}}e^{iuw}\mathcal{F}_{I}g(w)dw\right\|_{L^{2}(\mathbb{R}^{d})}^{2}\,.

For the second term, we observe by Parseval’s identity that

\left\|\mathcal{F}_{I}^{-1}(\mathcal{F}_{I}g\,\mathcal{F}_{\mathbb{R}^{d}}F)(u)\right\|_{L^{2}(\mathbb{R}^{d})}=\left|g(u)\right|\left\|F\right\|_{2}\leqslant\left\|F\right\|_{2}<\infty\,,

so that

\left\|\mathcal{F}_{\mathbb{R}^{d}}F\int_{\left|w\right|>\tilde{r}}e^{iuw}\mathcal{F}_{I}g(w)dw\right\|_{L^{2}(\mathbb{R}^{d})}^{2}\leqslant\left\|F\right\|_{2}^{2}\left(\int_{\left|w\right|>\tilde{r}}\left|\mathcal{F}_{I}g(w)\right|dw\right)^{2}\to 0

as \tilde{r}\to\infty due to Condition 2.2(3), at some rate \tilde{\theta}(\tilde{r}) independent of u.

Step 3: Now we look at the first term under Assumption 2.1,

\tilde{r}^{2}\,\mathbf{E}\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\left|\big(\frac{\mathcal{T}\hat{\beta}^{n}_{h,\kappa,r}}{\mathcal{T}\hat{\mu}^{n}_{h,r}}\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}^{n}_{h,r}\right|>\kappa_{1},\left|w\right|\leqslant\tilde{r}\}}-\frac{\mathcal{T}\beta}{\mathcal{T}\mu}\big)(w,\xi)\right|^{2}d\xi\,dw\,.

Notice that \mathcal{T}\beta=(\mathcal{F}_{I}g)(\mathcal{F}_{\mathbb{R}^{d}}F)(\mathcal{T}\mu). Following [28], we split the integrand by

𝒯β^𝒯μ^IgdF=(𝒯β^𝒯β)+(Ig)(dF)(𝒯μ𝒯μ^)𝒯μ^,\frac{\mathcal{T}\hat{\beta}}{\mathcal{T}\hat{\mu}}-\mathcal{F}_{I}g\mathcal{F}_{\mathbb{R}^{d}}F=\frac{(\mathcal{T}\hat{\beta}-\mathcal{T}\beta)+(\mathcal{F}_{I}g)(\mathcal{F}_{\mathbb{R}^{d}}F)(\mathcal{T}\mu-\mathcal{T}\hat{\mu})}{\mathcal{T}\hat{\mu}}\,,

when the division is well-defined. Then

|𝒯β^𝒯μ^𝟏{|𝒯μ^|>κ1,|w|r~}IgdF|2\displaystyle\left|\frac{\mathcal{T}\hat{\beta}}{\mathcal{T}\hat{\mu}}\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}\right|>\kappa_{1},\left|w\right|\leqslant\tilde{r}\}}-\mathcal{F}_{I}g\mathcal{F}_{\mathbb{R}^{d}}F\right|^{2}
κ12|𝒯β^𝒯β|2+κ12|g|2|F|2|𝒯μ𝒯μ^|2+|IgdF(𝟏{|𝒯μ^|κ1}+𝟏{|w|>r~})|2\displaystyle\quad\lesssim\kappa_{1}^{-2}\left|\mathcal{T}\hat{\beta}-\mathcal{T}\beta\right|^{2}+\kappa_{1}^{-2}\left|\mathcal{F}g\right|^{2}\left|\mathcal{F}F\right|^{2}\left|\mathcal{T}\mu-\mathcal{T}\hat{\mu}\right|^{2}+\left|\mathcal{F}_{I}g\mathcal{F}_{\mathbb{R}^{d}}F(\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}\right|\leqslant\kappa_{1}\}}+\mathbf{1}_{\{\left|w\right|>\tilde{r}\}})\right|^{2}
=:𝒜1+𝒜2+𝒜3.\displaystyle\quad=:\mathcal{A}_{1}+\mathcal{A}_{2}+\mathcal{A}_{3}\,.

For \mathcal{A}_{1}, Parseval’s identity gives

dκ12|𝒯β^𝒯β|2(w,ξ)𝑑ξ𝑑w\displaystyle\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\kappa_{1}^{-2}\left|\mathcal{T}\hat{\beta}-\mathcal{T}\beta\right|^{2}(w,\xi)d\xi dw =κ12d|ϕβ^(u,x)ϕβ(u,x)|2𝑑x𝑑u\displaystyle=\kappa_{1}^{-2}\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\left|\mathcal{L}_{\phi}\hat{\beta}(u,x)-\mathcal{L}_{\phi}\beta(u,x)\right|^{2}dxdu
κ12ϕ22τ1τ2Id|β^(t,u,x)β(t,u,x)|2𝑑x𝑑u𝑑t,\displaystyle\leqslant\kappa_{1}^{-2}\left\|\phi\right\|_{2}^{2}\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\mathbb{R}^{d}}\left|\hat{\beta}(t,u,x)-\beta(t,u,x)\right|^{2}dxdudt\,,

where \operatorname{supp}(\phi)\subset[\tau_{1},\tau_{2}]. From Corollary 3.2 we get

𝐄dκ12|𝒯β^(w,ξ)𝒯β(w,ξ)|2𝑑ξ𝑑wT,d,b,σ,H,J,K,ϕ\displaystyle\mathbf{E}\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\kappa_{1}^{-2}\left|\mathcal{T}\hat{\beta}(w,\xi)-\mathcal{T}\beta(w,\xi)\right|^{2}d\xi dw\lesssim_{T,d,b,\sigma,H,J,K,\phi}
κ12κ22(n1h11h21h3d+n2h11h24h3d)\displaystyle\qquad\qquad\kappa_{1}^{-2}\kappa_{2}^{-2}\big{(}n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-4}h_{3}^{-d}\big{)}
+κ12κ22rd(n1h11h22h322d+n1h12h22h32d)\displaystyle\qquad\quad+\kappa_{1}^{-2}\kappa_{2}^{-2}r^{d}(n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-2-2d}+n^{-1}h_{1}^{-2}h_{2}^{-2}h_{3}^{-2d})
+κ12κ22(θ3,μ(h)+θ3,π(h))+κ12θ2,β(r).\displaystyle\qquad\quad+\kappa_{1}^{-2}\kappa_{2}^{-2}(\theta_{3,\mu}(h)+\theta_{3,\pi}(h))+\kappa_{1}^{-2}\theta_{2,\beta}(r)\,.

For \mathcal{A}_{2}, similarly we have

d|𝒯μ^𝒯μ|2(w,ξ)𝑑ξ𝑑wϕ220Td|μ^(t,u,x)μ(t,u,x)|2𝑑x𝑑u𝑑t.\displaystyle\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\left|\mathcal{T}\hat{\mu}-\mathcal{T}\mu\right|^{2}(w,\xi)d\xi dw\leqslant\left\|\phi\right\|_{2}^{2}\int_{0}^{T}\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\left|\hat{\mu}(t,u,x)-\mu(t,u,x)\right|^{2}dxdudt\,.

Also, \left|\mathcal{F}g\right|\leqslant\left\|g\right\|_{1}\leqslant 2, and \left|\mathcal{F}F\right|\leqslant\left\|F\right\|_{1}<\infty. Along with Corollary 3.1 we have that

𝐄dκ12|g|2|F|2|𝒯μ𝒯μ^|2𝑑ξ𝑑wT,d,b,σ,J,K,ϕ\displaystyle\mathbf{E}\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\kappa_{1}^{-2}\left|\mathcal{F}g\right|^{2}\left|\mathcal{F}F\right|^{2}\left|\mathcal{T}\mu-\mathcal{T}\hat{\mu}\right|^{2}d\xi dw\lesssim_{T,d,b,\sigma,J,K,\phi}
κ12(θ2,μ(r)+θ3,μ(h))\displaystyle\quad\qquad\kappa_{1}^{-2}(\theta_{2,\mu}(r)+\theta_{3,\mu}(h))
+κ12rd(n2h322d+n2h22h32d)+κ12(n1h21h3d+n2h22h3d),\displaystyle\quad\quad+\kappa_{1}^{-2}r^{d}(n^{-2}h_{3}^{-2-2d}+n^{-2}h_{2}^{-2}h_{3}^{-2d})+\kappa_{1}^{-2}(n^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{2}^{-2}h_{3}^{-d})\,,

For \mathcal{A}_{3}, we first observe that

𝐄|Ig(w)dF(ξ)𝟏{|𝒯μ^|κ1}(w,ξ)|2\displaystyle\mathbf{E}\left|\mathcal{F}_{I}g(w)\mathcal{F}_{\mathbb{R}^{d}}F(\xi)\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}\right|\leqslant\kappa_{1}\}}(w,\xi)\right|^{2}
|Ig(w)dF(ξ)|2𝐄[𝟏{|𝒯μ^𝒯μ|κ1}+𝟏{|𝒯μ|2κ1}]\displaystyle\leqslant\left|\mathcal{F}_{I}g(w)\mathcal{F}_{\mathbb{R}^{d}}F(\xi)\right|^{2}\mathbf{E}\left[\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}-\mathcal{T}\mu\right|\geqslant\kappa_{1}\}}+\mathbf{1}_{\{\left|\mathcal{T}\mu\right|\leqslant 2\kappa_{1}\}}\right]
|Ig(w)dF(ξ)|2(κ12𝐄|𝒯(μ^μ)(w,ξ)|2+𝟏{|𝒯μ|2κ1}(w,ξ)).\displaystyle\leqslant\left|\mathcal{F}_{I}g(w)\mathcal{F}_{\mathbb{R}^{d}}F(\xi)\right|^{2}\left(\kappa_{1}^{-2}\mathbf{E}\left|\mathcal{T}(\hat{\mu}-\mu)(w,\xi)\right|^{2}+\mathbf{1}_{\{\left|\mathcal{T}\mu\right|\leqslant 2\kappa_{1}\}}(w,\xi)\right)\,.

Integrating the first part gives

d|g(w)F(ξ)|2κ12𝐄|𝒯(μ^μ)(w,ξ)|2𝑑ξ𝑑w\displaystyle\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\left|\mathcal{F}g(w)\mathcal{F}F(\xi)\right|^{2}\kappa_{1}^{-2}\mathbf{E}\left|\mathcal{T}(\hat{\mu}-\mu)(w,\xi)\right|^{2}d\xi dw
κ12g12F12d𝐄|𝒯(μ^μ)(w,ξ)|2𝑑ξ𝑑w\displaystyle\leqslant\kappa_{1}^{-2}\left\|g\right\|_{1}^{2}\left\|F\right\|_{1}^{2}\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\mathbf{E}\left|\mathcal{T}(\hat{\mu}-\mu)(w,\xi)\right|^{2}d\xi dw
g,b,ϕκ120Td𝐄|μ^h,rn(t,u,x)μ(t,u,x)|2𝑑x𝑑u𝑑t,\displaystyle\lesssim_{g,b,\phi}\kappa_{1}^{-2}\int_{0}^{T}\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\mathbf{E}\left|\hat{\mu}^{n}_{h,r}(t,u,x)-\mu(t,u,x)\right|^{2}dxdudt\,,

which can be bounded in the same way as \mathcal{A}_{2} using Corollary 3.1.

Integrating for the second part, we get

(3.1) \int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\left|\mathcal{F}_{I}g(w)\mathcal{F}_{\mathbb{R}^{d}}F(\xi)\right|^{2}\mathbf{1}_{\{\left|\mathcal{T}\mu\right|\leqslant 2\kappa_{1}\}}(w,\xi)\,d\xi\,dw\,.

Under Assumption 2.1, we apply dominated convergence to see that this quantity goes to 0 as \kappa_{1}\to 0.

In addition,

0Td𝐄|Ig(w)dF(ξ)𝟏{|w|>r~}(w,ξ)|2\displaystyle\int_{0}^{T}\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\mathbf{E}\left|\mathcal{F}_{I}g(w)\mathcal{F}_{\mathbb{R}^{d}}F(\xi)\mathbf{1}_{\{\left|w\right|>\tilde{r}\}}(w,\xi)\right|^{2}
0Td|Ig(w)𝟏{|w|>r~}|2|dF(ξ)|2\displaystyle\leqslant\int_{0}^{T}\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\left|\mathcal{F}_{I}g(w)\mathbf{1}_{\{\left|w\right|>\tilde{r}\}}\right|^{2}\left|\mathcal{F}_{\mathbb{R}^{d}}F(\xi)\right|^{2}
=TF22{|w|>r~}|Ig(w)|2.\displaystyle=T\left\|F\right\|_{2}^{2}\int_{\{\left|w\right|>\tilde{r}\}}\left|\mathcal{F}_{I}g(w)\right|^{2}\,.

Condition 2.2(3) guarantees that it converges to 0 faster than \tilde{r}^{-2} as \tilde{r}\to\infty. We denote the total convergence rate of \mathcal{A}_{3} by \theta_{1}(\tilde{r},\kappa_{1}).

To summarize, we define

𝒰(n,ϑ)\displaystyle\mathcal{U}(n,\vartheta) =C(κ02r~2θ~(r~)\displaystyle=C(\kappa_{0}^{-2}\tilde{r}^{2}\tilde{\theta}(\tilde{r})
+r~2κ02θ1(r~,κ1)\displaystyle+\tilde{r}^{2}\kappa_{0}^{-2}\theta_{1}(\tilde{r},\kappa_{1})
+r~2κ02κ12(θ2,β(r)+κ22(θ2,μ(r)+θ3,μ(h)+θ3,π(h)))\displaystyle+\tilde{r}^{2}\kappa_{0}^{-2}\kappa_{1}^{-2}(\theta_{2,\beta}(r)+\kappa_{2}^{-2}(\theta_{2,\mu}(r)+\theta_{3,\mu}(h)+\theta_{3,\pi}(h)))
+r~2κ02κ12κ22(n1h11h21h3d+n2h11h24h3d)\displaystyle+\tilde{r}^{2}\kappa_{0}^{-2}\kappa_{1}^{-2}\kappa_{2}^{-2}(n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-4}h_{3}^{-d})
+r~2κ02κ12κ22rd(n1h11h22h322d+n1h12h22h32d)).\displaystyle+\tilde{r}^{2}\kappa_{0}^{-2}\kappa_{1}^{-2}\kappa_{2}^{-2}r^{d}(n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-2-2d}+n^{-1}h_{1}^{-2}h_{2}^{-2}h_{3}^{-2d}))\,.

Here the constant C depends only on T,d,b,\sigma,H,J,K, which are fixed for the model. The upper bound is independent of u_{0},v_{0}, so we obtain the uniform bound presented in the theorem.

We can fix \kappa_{0} and \phi. Let \vartheta_{n}=(h_{1}^{(n)},h_{2}^{(n)},h_{3}^{(n)},\kappa_{0},\kappa_{1}^{(n)},\kappa_{2}(r^{(n)}),r^{(n)},\tilde{r}^{(n)}), where r^{(n)},\tilde{r}^{(n)}\to\infty slowly enough as n\to\infty, \kappa_{2}(r)=\frac{1}{2}\inf\{\mu(t,u,x)\mid t\in\operatorname{supp}(\phi),\,u\in I,\,\left|x\right|\leqslant r\}, and \kappa_{1}^{(n)},h_{1}^{(n)},h_{2}^{(n)},h_{3}^{(n)}\to 0 accordingly. We may guarantee that the quantities \theta_{1}, \theta_{2,\beta}, \theta_{2,\mu}, \theta_{3,\mu}, \theta_{3,\pi} all converge to 0. Then \mathcal{U}(n,\vartheta_{n})\to 0 as n\to\infty, finishing the proof. ∎

4. Minimax analysis on plug-in estimators

In Section 3.1 we presented upper bounds for the estimation errors \mathbf{E}\left|\hat{\mu}^{n}_{h}-\mu\right|^{2} and \mathbf{E}\left|\hat{\pi}^{n}_{h}-\pi\right|^{2}. These bounds are similar to those given in [10] and are not tight, although they do converge. However, the estimators themselves are indeed optimal whenever the parameters h_{1}, h_{2}, and h_{3} are properly chosen. In this section, we conduct a minimax analysis, studying both upper and lower bounds on the estimation errors, to demonstrate this optimality.

We first look at an improved upper bound on the error of the plug-in estimator \hat{\mu}^{n}_{h}.

Lemma 4.1.
(1) Assume Conditions 2.1(1)(3), 2.2(1), and 2.3 hold. For every t_{0}\in(0,T), u_{0}\in(0,1), x_{0}\in\mathbb{R}^{d}, we have

\mathbf{E}\left|\hat{\mu}^{n}_{h}(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\leqslant C_{0}\big(n^{-1}h_{2}^{-1}h_{3}^{-d}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{2}^{2}+n^{-2}h_{2}^{-2}h_{3}^{-2d}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{\infty}^{2}+n^{-2}h_{3}^{-2-2d}\left\|J\right\|_{\infty}^{2}\left\|\nabla K\right\|_{\infty}^{2}+n^{-2}h_{2}^{-2}h_{3}^{-d}\left\|\nabla J\right\|_{\infty}^{2}\left\|K\right\|_{2}^{2}\big)+\left|(J\otimes K)_{h}\ast\mu_{t_{0}}(u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\,,

where C_{0}>0 is independent of the bandwidths h_{2},h_{3} and the number of particles n.

(2) Assume further that there exist some p>2 and c_{p}>0 such that

\mathcal{W}_{p}(\mu_{0,u},\mu_{0,v})\leqslant c_{p}\left|u-v\right|\,,\qquad\forall\,u,v\in I\,.

Then, for every t_{0}\in(0,T), u_{0}\in(0,1), x_{0}\in\mathbb{R}^{d}, we have

\mathbf{E}\left|\hat{\mu}^{n}_{h}(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\leqslant C_{0}\big(n^{-1}h_{2}^{-1}h_{3}^{-d}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{2}^{2}+n^{-2}h_{2}^{-2}h_{3}^{-2d}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{\infty}^{2}+n^{-2}h_{3}^{-2-\frac{p+2}{p}d}\left\|J\right\|_{\infty}^{2}\left\|\nabla K\right\|_{\infty}^{2}+n^{-2}h_{2}^{-2}h_{3}^{-d}\left\|\nabla J\right\|_{\infty}^{2}\left\|K\right\|_{2}^{2}\big)+\left|(J\otimes K)_{h}\ast\mu_{t_{0}}(u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\,,

where C_{0}>0 is independent of the bandwidths h_{1},h_{2},h_{3} and the number of particles n.

There is also an improved upper bound on the error of the plug-in estimator \hat{\beta}^{n}_{h,\kappa}.

Lemma 4.2.

Assume Conditions 2.1(1)(3)(4), 2.2(1), and 2.3. Assume further that b is bounded and has bounded first and second derivatives. For every t_{0}\in(0,T), u_{0}\in(0,1), x_{0}\in\mathbb{R}^{d}, we have

\mathbf{E}\left|\hat{\pi}^{n}_{h}(t_{0},u_{0},x_{0})-\pi(t_{0},u_{0},x_{0})\right|^{2}\lesssim n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-2}h_{3}^{-1-2d}+n^{-2}h_{1}^{-1}h_{3}^{-2-2d}+\left|(H\otimes J\otimes K)_{h}\ast\pi(t_{0},u_{0},x_{0})-\pi(t_{0},u_{0},x_{0})\right|^{2}\,.
Remark 4.1.

We will use the above results for the minimax analysis below. However, they are not well suited for controlling the total error in L^{2}([0,T]\times I\times\mathbb{R}^{d}) (as needed in the proof of Theorem 2.1), since the improvement relies on the local boundedness of the density function \mu, which follows from a crude estimate of the form \mu(t,u,x)\leqslant\exp(c(1+\left|x\right|^{2})) (see, for instance, Corollary 8.2.2 of [9]).

4.1. Anisotropic Hölder smoothness classes

In the estimates of Lemma 4.1, all items are explicitly quantitative except the bias term

(JK)hμt0(u0,x0)μ(t0,u0,x0)=\displaystyle(J\otimes K)_{h}\ast\mu_{t_{0}}(u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})=
dJh2(u0u)Kh3(x0x)(μ(t0,u,x)μ(t0,u0,x0))𝑑x𝑑u.\displaystyle\qquad\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}J_{h_{2}}(u_{0}-u)K_{h_{3}}(x_{0}-x)(\mu(t_{0},u,x)-\mu(t_{0},u_{0},x_{0}))dxdu\,.

The analysis of this quantity relies on some continuity of the density function μ\mu. Here, we introduce a specific class of particle systems following the idea in [28].

Definition 4.1.

Let α=(α1,,αd)d\alpha=(\alpha_{1},\dots,\alpha_{d})\in\mathbb{N}^{d} be a multi-index. Its norm is given by

|α|=i=1dαi,\left|\alpha\right|=\sum_{i=1}^{d}\alpha_{i}\,,

and the differential operator of order α\alpha is defined by Dα=1α1dαdD^{\alpha}=\partial^{\alpha_{1}}_{1}\cdots\partial^{\alpha_{d}}_{d}.

Definition 4.2.

Let UdU\subset\mathbb{R}^{d} be an open neighborhood of a point x0dx_{0}\in\mathbb{R}^{d}. We say a function f:df:\mathbb{R}^{d}\to\mathbb{R} belongs to the ss-Hölder continuity class at (x0,U)(x_{0},U) with s>0s>0 if for every x,yUx,y\in U and every multi-index α\alpha with |α|s\left|\alpha\right|\leqslant s, we have

|Dαf(x)Dαf(y)|C|xy|ss,\left|D^{\alpha}f(x)-D^{\alpha}f(y)\right|\leqslant C\left|x-y\right|^{s-\lfloor s\rfloor}\,,

where C=C(f,U)C=C(f,U) is the smallest constant that satisfies the above inequality. We denote this class of functions by s(x0)\mathcal{H}^{s}(x_{0}). The s\mathcal{H}^{s}-norm in this class is defined by

fs(x0)=supxU|f(x)|+C(f,U).\left\|f\right\|_{\mathcal{H}^{s}(x_{0})}=\sup_{x\in U}\left|f(x)\right|+C(f,U)\,.
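For instance, for s\in(0,1) and any bounded open neighborhood U of x_{0}, the function f(x)=\left|x-x_{0}\right|^{s} belongs to \mathcal{H}^{s}(x_{0}): here \lfloor s\rfloor=0, and \left|\,\left|x-x_{0}\right|^{s}-\left|y-x_{0}\right|^{s}\right|\leqslant\left|x-y\right|^{s} for all x,y\in U, so that C(f,U)\leqslant 1.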

4.2. Minimax estimation for density

Notice that the particle density function μ\mu solves the following system of equations [18]

tμt,u=(μt,uIdb(,y)G(u,v)μt,v(dy)𝑑v)+12i,j=1dij2((σσT)ijμt,u),uI.\partial_{t}\mu_{t,u}=-\nabla\cdot\left(\mu_{t,u}\int_{I}\int_{\mathbb{R}^{d}}b(\cdot,y)G(u,v)\mu_{t,v}(dy)dv\right)+\frac{1}{2}\sum_{i,j=1}^{d}\partial_{ij}^{2}((\sigma\sigma^{T})_{ij}\mu_{t,u})\,,\quad u\in I\,.

This is a system of fully coupled Fokker-Planck equations, and the solution is uniquely determined by (b,σ,G,μ0)𝒫(b,\sigma,G,\mu_{0})\in\mathcal{P}, where 𝒫\mathcal{P} is the class of (b,σ,G,μ0)(b,\sigma,G,\mu_{0}) satisfying Condition 2.1(1)(3), 2.2(1), and 2.3. We denote by 𝒫(b,σ,G,μ0)μ=S(b,σ,G,μ0)\mathcal{P}\ni(b,\sigma,G,\mu_{0})\mapsto\mu=S(b,\sigma,G,\mu_{0}) the solution operator. We consider a specific class of coefficients and initial data.

For s>0s>0, we define

𝒜s(t0,u0,x0)={(b,σ,G,μ0)𝒫\nonscript|\nonscriptμ=S(b,σ,G,μ0),μt0,u0s(x0)},\mathcal{A}^{s}(t_{0},u_{0},x_{0})=\{(b,\sigma,G,\mu_{0})\in\mathcal{P}\nonscript\>|\nonscript\>\mathopen{}\allowbreak\mu=S(b,\sigma,G,\mu_{0}),\mu_{t_{0},u_{0}}\in\mathcal{H}^{s}(x_{0})\}\,,

and set

𝒜s(t0,x0)=uI𝒜s(t0,u,x0).\mathcal{A}^{s}(t_{0},x_{0})=\bigcap_{u\in I}\mathcal{A}^{s}(t_{0},u,x_{0})\,.

Moreover, we consider a restriction of this class

𝒜Ls(t0,x0)={(b,σ,G,μ0)𝒫\nonscript|\nonscriptS(b,σ,G,μ0)t0,u0s(x0)+bL}\mathcal{A}^{s}_{L}(t_{0},x_{0})=\{(b,\sigma,G,\mu_{0})\in\mathcal{P}\nonscript\>|\nonscript\>\mathopen{}\allowbreak\left\|S(b,\sigma,G,\mu_{0})_{t_{0},u_{0}}\right\|_{\mathcal{H}^{s}(x_{0})}+\left\|b\right\|_{\infty}\leqslant L\}

for L>0L>0.

Several articles have discussed the richness of these classes of functions. In particular, Proposition 13 in [28] gives an example of homogeneous McKean-Vlasov particle systems that fall into this class. With some slight modifications to its proof, we obtain the following result.
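To illustrate how such homogeneous systems fit into the present framework, note that when G\equiv 1 and the initial law is type-independent, \mu_{0,u}\equiv\mu_{0}, uniqueness forces \mu_{t,u}\equiv\mu_{t} for all u\in I, and the Fokker-Planck system above reduces to the single McKean-Vlasov equation

\partial_{t}\mu_{t}=-\nabla\cdot\left(\mu_{t}\int_{\mathbb{R}^{d}}b(\cdot,y)\mu_{t}(dy)\right)+\frac{1}{2}\sum_{i,j=1}^{d}\partial_{ij}^{2}((\sigma\sigma^{T})_{ij}\mu_{t})\,,

so the homogeneous McKean-Vlasov setting is a special case of the present one.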

Proposition 4.1.

Let σ=σ0Id×d\sigma=\sigma_{0}I_{d\times d} for some σ0>0\sigma_{0}>0. Let b(x,y)=V(x)+F(xy)b(x,y)=V(x)+F(x-y) with F,VCc1F,V\in C_{c}^{1} and

Vs+Fs+supuIμ0,us′′<,\left\|V\right\|_{\mathcal{H}^{s}}+\left\|F\right\|_{\mathcal{H}^{s^{\prime}}}+\sup_{u\in I}\left\|\mu_{0,u}\right\|_{\mathcal{H}^{s^{\prime\prime}}}<\infty\,,

for some s,s>1s,s^{\prime}>1 with ss\notin\mathbb{Z}, s′′>0s^{\prime\prime}>0. Here, s\mathcal{H}^{s} denotes the global Hölder class (where we simply choose U=dU=\mathbb{R}^{d}).

Suppose further that μ0,u\mu_{0,u} are probability measures with finite first moments and continuous density functions uniformly bounded over uIu\in I. Then, for every t0(0,T)t_{0}\in(0,T) and x0dx_{0}\in\mathbb{R}^{d}, we have μt0,us(x0)\mu_{t_{0},u}\in\mathcal{H}^{s}(x_{0}) for all uIu\in I.

We now present the minimax theorem over particle systems within these restricted smoothness classes.

Theorem 4.1.

Let L>0L>0 and s(0,1)s\in(0,1). Assume one of the following holds:

  1. (a)

    Hypothesis of Lemma 4.1(1) and s12s\geqslant\frac{1}{2},

  2. (b)

    Hypothesis of Lemma 4.1(2) with p>2p>2 and s(0,12)s\in(0,\frac{1}{2}) such that p(24s)(p2)dp(2-4s)\leqslant(p-2)d.

Then for every t0(0,T)t_{0}\in(0,T), u0(0,1)u_{0}\in(0,1), x0dx_{0}\in\mathbb{R}^{d}, we have

(4.1) sup(b,σ,G,μ0)𝒜Ls(t0,x0)infh2,h3>0𝐄|μ^hn(t0,u0,x0)μ(t0,u0,x0)|2Cn2sd+3s.\sup_{(b,\sigma,G,\mu_{0})\in\mathcal{A}^{s}_{L}(t_{0},x_{0})}\inf_{h_{2},h_{3}>0}\mathbf{E}\left|\hat{\mu}^{n}_{h}(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\leqslant Cn^{-\frac{2s}{d+3s}}\,.

Moreover,

(4.2) infμ^sup(b,σ,G,μ0)𝒜Ls(t0,x0)𝐄|μ^μ(t0,u0,x0)|2cn2sd+3s,\inf_{\hat{\mu}}\sup_{(b,\sigma,G,\mu_{0})\in\mathcal{A}^{s}_{L}(t_{0},x_{0})}\mathbf{E}\left|\hat{\mu}-\mu(t_{0},u_{0},x_{0})\right|^{2}\geqslant cn^{-\frac{2s}{d+3s}}\,,

where the infimum is taken over all possible estimators of μ(t0,u0,x0)\mu(t_{0},u_{0},x_{0}) constructed from μt0n\mu^{n}_{t_{0}}. Both constants CC and cc depend only on the parameters T,d,LT,d,L, the kernels J,KJ,K, the function ρI\rho_{I} given by Condition 2.3(3), and the values of μ\mu in a small neighborhood of t0,u0,x0t_{0},u_{0},x_{0}.
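The following back-of-the-envelope computation (a sketch only; the full argument is in Section 6.1) indicates where the exponent in (4.1) comes from: balancing the leading variance term of Lemma 4.1 against the bias bound of order h_{2}^{2}+h_{3}^{2s} obtained via Lemma 6.2 and the Hölder condition gives

n^{-1}h_{2}^{-1}h_{3}^{-d}\asymp h_{2}^{2}\asymp h_{3}^{2s}\quad\Longrightarrow\quad h_{2}\asymp h_{3}^{s}\,,\quad h_{3}\asymp n^{-\frac{1}{d+3s}}\,,\quad\mathbf{E}\left|\hat{\mu}^{n}_{h}(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\lesssim n^{-\frac{2s}{d+3s}}\,.

For this choice the only potentially binding remaining term of Lemma 4.1(1) is n^{-2}h_{3}^{-2-2d}, and n^{-2}h_{3}^{-2-2d}\lesssim n^{-\frac{2s}{d+3s}} if and only if 4s\geqslant 2, which is where assumption (a) enters; under the Wasserstein continuity of Lemma 4.1(2) this term is replaced by n^{-2}h_{3}^{-2-\frac{p+2}{p}d}, which is of lower order precisely when p(2-4s)\leqslant(p-2)d, matching assumption (b).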

The proofs are written in Section 6.1. Here we make several remarks on the above results.

Remark 4.2.

Without the extra assumption on the pp-Wasserstein continuity on initial data in Lemma 4.1(2), we would obtain a suboptimal upper bound when s<12s<\frac{1}{2}, namely n1d+1+sn^{-\frac{1}{d+1+s}} (though we still attain the optimal bound when s12s\geqslant\frac{1}{2}).

Remark 4.3.

We remark that the asymptotic rate is slower than that of [28], namely n^{-\frac{2s}{d+3s}} rather than n^{-\frac{2s}{d+2s}}. This is due to the heterogeneity of our graphon particle system: both the index gap and the density gap between the target particle u_{0} and the regularly spaced observations are of order O(n^{-1}), which reduces the approximation accuracy. It is, however, possible to estimate the averaged density \bar{\mu}=\int_{I}\mu_{u}du using exactly the same strategy as [28], and that estimator should exhibit the same asymptotic behavior as in [28].

Remark 4.4.

Note that our algorithm for estimating \mu is not adaptive to the observed data. Instead, users are free to set the parameters (e.g. the bandwidths) according to their own accuracy demands, and the parameters are fixed from the start. As long as the bandwidths are chosen appropriately, our estimator still achieves optimality. This also improves computational efficiency compared to the data-driven Goldenshluger-Lepski algorithm applied in [28], which selects the best bandwidths among a set of candidates by making O(n)-many comparisons. The adaptive estimator, on the other hand, automatically fits the data with the best parameters and produces an error only a logarithmic factor above the lower bound; it is nonparametric and requires less knowledge about the initial state of the particle system.

4.3. Minimax estimation for drift

In this section, we consider a slight generalization of the graphon mean-field system, where the drift coefficient is time-inhomogeneous. Namely, we extend (1.1) to

(4.3) Xu(t)=Xu(0)+0tIb(t,Xu(s),x)G(u,v)μs,v(dy)𝑑v𝑑s+0tσ(Xu(s))𝑑Bu(s),X_{u}(t)=X_{u}(0)+\int_{0}^{t}\int_{I}b(t,X_{u}(s),x)G(u,v)\mu_{s,v}(dy)dvds+\int_{0}^{t}\sigma(X_{u}(s))dB_{u}(s)\,,

where bC1([0,T];W2,(d×d;d))b\in C^{1}([0,T];W^{2,\infty}(\mathbb{R}^{d}\times\mathbb{R}^{d};\mathbb{R}^{d})) satisfies that, t,t[0,T]\forall t,t^{\prime}\in[0,T], x,x,y,ydx,x^{\prime},y,y^{\prime}\in\mathbb{R}^{d},

(4.4) |b(t,x,y)b(t,x,y)|C(|tt|+|xx|+|yy|),\left|b(t,x,y)-b(t^{\prime},x^{\prime},y^{\prime})\right|\leqslant C(\left|t-t^{\prime}\right|+\left|x-x^{\prime}\right|+\left|y-y^{\prime}\right|)\,,

for some constant C>0C>0 (compare with (2.1)). Then the drift term β\beta is given by

β(t,u,x,μt)=IG(u,v)db(t,x,y)μt,v(dy)𝑑v.\beta(t,u,x,\mu_{t})=\int_{I}G(u,v)\int_{\mathbb{R}^{d}}b(t,x,y)\mu_{t,v}(dy)dv\,.

Note that the stability analysis in [3] and previous estimates (Lemmata 4.1 and 4.2) still hold. We will conduct a minimax analysis on the drift term β\beta to show that the estimator β^h,κn\hat{\beta}^{n}_{h,\kappa} is optimal in this general setting.
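As an aside, the estimator \hat{\beta}^{n}_{h,\kappa}=\hat{\pi}^{n}_{h}/(\hat{\mu}^{n}_{h}\lor\kappa_{2}) remains straightforward to evaluate in this generalized setting. The following is a minimal numerical sketch (not the authors' implementation) for d=1, with trajectories sampled on a time grid and the stochastic integral defining \hat{\pi}^{n}_{h} replaced by a left-point Riemann sum over the increments of X^{n}_{i}; the Gaussian kernels, the discretization, and all names are our illustrative choices.

import numpy as np

def gauss(z):
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

def beta_hat(t0, u0, x0, times, X, h1, h2, h3, kappa2):
    # times: shape (M+1,), sampling times 0 = t_0 < ... < t_M = T
    # X:     shape (M+1, n), X[k, i] = X_i^n(t_k); particle i has type label i/n
    _, n = X.shape
    labels = np.arange(1, n + 1) / n
    J = gauss((u0 - labels) / h2) / h2                 # J_{h2}(u0 - i/n)

    # mu_hat at the time slice closest to t0
    k0 = int(np.argmin(np.abs(times - t0)))
    K0 = gauss((x0 - X[k0]) / h3) / h3
    mu_h = np.mean(J * K0)

    # pi_hat: Riemann-sum approximation of the integral against dX_i^n(t)
    dX = np.diff(X, axis=0)                            # increments, shape (M, n)
    H = gauss((t0 - times[:-1]) / h1) / h1             # H_{h1}(t0 - t_k)
    K = gauss((x0 - X[:-1]) / h3) / h3                 # K_{h3}(x0 - X_i^n(t_k))
    pi_h = np.sum(H[:, None] * J[None, :] * K * dX) / n

    return pi_h / max(mu_h, kappa2)

# Usage on placeholder Brownian paths (zero drift, sigma = 1):
rng = np.random.default_rng(1)
M, n, T = 200, 500, 1.0
times = np.linspace(0.0, T, M + 1)
dW = np.sqrt(T / M) * rng.standard_normal((M, n))
X = np.vstack([np.zeros((1, n)), np.cumsum(dW, axis=0)])
print(beta_hat(0.5, 0.5, 0.0, times, X, h1=0.1, h2=0.1, h3=0.3, kappa2=0.05))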

We can extend the notion of Hölder continuity to the space of time-dependent functions.

Definition 4.3.

Let t0(0,T)t_{0}\in(0,T) and x0dx_{0}\in\mathbb{R}^{d}, and s1,s3>0s_{1},s_{3}>0. We say that a function f:(0,T)×df:(0,T)\times\mathbb{R}^{d}\to\mathbb{R} belongs to class s1,s3(t0,x0)\mathcal{H}^{s_{1},s_{3}}(t_{0},x_{0}) if there exists an open neighborhood UU of (t0,x0)(t_{0},x_{0}) in (0,T)×d(0,T)\times\mathbb{R}^{d} such that

fs1,s3(t0,x0)=defsup(t,x)U|f(t,x)|+C(f,U)<,\left\|f\right\|_{\mathcal{H}^{s_{1},s_{3}}(t_{0},x_{0})}\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\sup_{(t,x)\in U}\left|f(t,x)\right|+C(f,U)<\infty\,,

where C(f,U)C(f,U) is the minimum positive number CC such that

|Dα1,α3f(t,x)Dα1,α3f(s,y)|C(|ts|s1s1+|xy|s3s3)\left|D^{\alpha_{1},\alpha_{3}}f(t,x)-D^{\alpha_{1},\alpha_{3}}f(s,y)\right|\leqslant C(\left|t-s\right|^{s_{1}-\lfloor s_{1}\rfloor}+\left|x-y\right|^{s_{3}-\lfloor s_{3}\rfloor})

for all (t,x),(s,y)U(t,x),(s,y)\in U and all (multi-)indices α1,α3\alpha_{1},\alpha_{3} with |α1|s1\left|\alpha_{1}\right|\leqslant s_{1}, |α3|s3\left|\alpha_{3}\right|\leqslant s_{3}.

We say that an \mathbb{R}^{m}-valued function f=(f_{1},\dots,f_{m}) belongs to the class \mathcal{H}^{s_{1},s_{3}}(t_{0},x_{0}) if each component f_{j}\in\mathcal{H}^{s_{1},s_{3}}(t_{0},x_{0}).

Let 𝒫ˇ\check{\mathcal{P}} be the set of all parameters (b,σ,G,μ0)(b,\sigma,G,\mu_{0}) satisfying the same conditions as 𝒫\mathcal{P} defined above Proposition 4.1, except that bb has the form b(t,x,y)b(t,x,y) and satisfies (4.4). For s1,s3>0s_{1},s_{3}>0, we define

𝒜ˇs1,s3(t0,u0,x0)={(b,σ,G,μ0)𝒫ˇ\nonscript|\nonscriptμ=S(b,σ,G,μ0),μu0s1,s3(t0,x0)},\check{\mathcal{A}}^{s_{1},s_{3}}(t_{0},u_{0},x_{0})=\{(b,\sigma,G,\mu_{0})\in\check{\mathcal{P}}\nonscript\>|\nonscript\>\mathopen{}\allowbreak\mu=S(b,\sigma,G,\mu_{0}),\mu_{u_{0}}\in\mathcal{H}^{s_{1},s_{3}}(t_{0},x_{0})\}\,,

and set

𝒜ˇs1,s3(t0,x0)=uI𝒜ˇs1,s3(t0,u,x0).\check{\mathcal{A}}^{s_{1},s_{3}}(t_{0},x_{0})=\bigcap_{u\in I}\check{\mathcal{A}}^{s_{1},s_{3}}(t_{0},u,x_{0})\,.

Moreover, we consider a restriction of this class

𝒜ˇLs1,s3(t0,x0)={(b,σ,G,μ0)𝒫ˇ\nonscript|\nonscriptS(b,σ,G,μ0)s1,s3(t0,x0)+bCt1Wx,y2,L}\check{\mathcal{A}}^{s_{1},s_{3}}_{L}(t_{0},x_{0})=\{(b,\sigma,G,\mu_{0})\in\check{\mathcal{P}}\nonscript\>|\nonscript\>\mathopen{}\allowbreak\left\|S(b,\sigma,G,\mu_{0})\right\|_{\mathcal{H}^{s_{1},s_{3}}(t_{0},x_{0})}+\left\|b\right\|_{C^{1}_{t}W_{x,y}^{2,\infty}}\leqslant L\}

for L>0L>0.

Then we present a minimax result on the estimation of the drift term β\beta. The proof will be given in Section 6.2.

Theorem 4.2.

Let L>0L>0, and s1,s3(0,1)s_{1},s_{3}\in(0,1). Define the effective smoothness sbs_{b} by

1sb=1s1+1+1s3.\frac{1}{s_{b}}=\frac{1}{s_{1}}+1+\frac{1}{s_{3}}\,.

Assume the hypothesis of Lemma 4.2, and s1,s3s_{1},s_{3} satisfy the following conditions

1s11s3+20,1s12s3+40.\frac{1}{s_{1}}-\frac{1}{s_{3}}+2\geqslant 0\,,\qquad\frac{1}{s_{1}}-\frac{2}{s_{3}}+4\geqslant 0\,.

Then, for every t0(0,T)t_{0}\in(0,T), u0(0,1)u_{0}\in(0,1), x0dx_{0}\in\mathbb{R}^{d}, we have

(4.5) sup(b,σ,G,μ0)𝒜ˇLs1,s3(t0,x0)infh1,h2,h3,κ2>0𝐄|β^h,κn(t0,u0,x0)β(t0,u0,x0)|2Cn2sb2sb+1.\sup_{(b,\sigma,G,\mu_{0})\in\check{\mathcal{A}}^{s_{1},s_{3}}_{L}(t_{0},x_{0})}\inf_{h_{1},h_{2},h_{3},\kappa_{2}>0}\mathbf{E}\left|\hat{\beta}^{n}_{h,\kappa}(t_{0},u_{0},x_{0})-\beta(t_{0},u_{0},x_{0})\right|^{2}\leqslant Cn^{-\frac{2s_{b}}{2s_{b}+1}}\,.

Moreover

(4.6) infβ^sup(b,σ,G,μ0)𝒜ˇLs1,s3(t0,x0)𝐄|β^β(t0,u0,x0)|2cn2sb2sb+1,\inf_{\hat{\beta}}\sup_{(b,\sigma,G,\mu_{0})\in\check{\mathcal{A}}^{s_{1},s_{3}}_{L}(t_{0},x_{0})}\mathbf{E}\left|\hat{\beta}-\beta(t_{0},u_{0},x_{0})\right|^{2}\geqslant cn^{-\frac{2s_{b}}{2s_{b}+1}}\,,

where the infimum is taken over all possible estimators β^\hat{\beta} built on empirical data (μtn)t[0,T](\mu^{n}_{t})_{t\in[0,T]}. Both constants CC and cc depend only on the parameters T,d,L,κ2T,d,L,\kappa_{2}, the kernels H,J,KH,J,K, the function ρI\rho_{I} given by Condition 2.3(3), and the values of μ\mu in a small neighborhood of t0,u0,x0t_{0},u_{0},x_{0}.
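For instance, for s_{1}=s_{3}=\frac{1}{2} both displayed conditions are satisfied (each left-hand side equals 2), the effective smoothness is determined by \frac{1}{s_{b}}=2+1+2=5, that is s_{b}=\frac{1}{5}, and the minimax rate appearing in (4.5) and (4.6) is n^{-\frac{2s_{b}}{2s_{b}+1}}=n^{-\frac{2}{7}}.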

Remark 4.5.

It is also possible to obtain a similar result in the case s_{3}<\frac{1}{2}, subject to some additional assumptions on the \mathcal{W}_{p}-continuity of the initial data, as in Theorem 4.1. We leave this to the reader.

Remark 4.6.

As mentioned above, the stability of graphon particle systems given in [3] and the consistency of the estimators in Lemmata 4.1 and 4.2 remain true under the above generalization with a time-inhomogeneous drift coefficient b. In this case, \beta=\beta(t,u,x,\mu_{t}) depends on time not only through the measure flow \mu_{t} but also through the explicit variable t. This gives more freedom in constructing examples and thus opens up a cleaner path towards the theoretical lower bound; it is for this reason that we include time-inhomogeneous drifts in the minimax analysis.

4.4. Optimality of the graphon estimator \hat{G}

Recall from (2.6) and the proof of Theorem 2.1 that the error of our estimator \hat{G} is controlled by the L^{2} errors of the estimators \hat{\mu} and \hat{\beta}. Several of the convergent quantities involved would require stronger conditions to be made explicitly quantitative, and we do not pursue these details in this work. Moreover, the particular term \theta_{1}(\tilde{r},\kappa_{1}) in the proof of Theorem 2.1, whose convergence is due to Assumption 2.1, is not explicitly quantifiable without further assumptions. This prevents us from carrying out a minimax analysis of the error \mathbf{E}\left|\hat{G}-G\right|^{2} to study its optimality. Nevertheless, given the pointwise optimality of \hat{\mu} and \hat{\beta}, our estimator is expected to perform well.

5. Proofs for Section 3.1

5.1. Proof of Lemma 3.1 and Corollary 3.1

Proof of Lemma 3.1.

Fix t0,u0,x0t_{0},u_{0},x_{0}. Recall that

μ^hn(t0,u0,x0)=(JK)hμt0n=1ni=1nJh2(u0in)Kh3(x0Xin(t0)).\hat{\mu}^{n}_{h}(t_{0},u_{0},x_{0})=(J\otimes K)_{h}\ast\mu^{n}_{t_{0}}=\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})K_{h_{3}}(x_{0}-X^{n}_{i}(t_{0}))\,.

We obtain the following inequality via a telescoping sum and the elementary bound (a+b+c+d)^{2}\leqslant 4(a^{2}+b^{2}+c^{2}+d^{2}):

𝐄|μ^hn(t0,u0,x0)μ(t0,u0,x0)|2\displaystyle\mathbf{E}\left|\hat{\mu}^{n}_{h}(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\leqslant
4𝐄|1ni=1nJh2(u0in)(Kh3(x0Xin(t0))Kh3(x0Xin(t0)))|2\displaystyle\qquad 4\mathbf{E}\left|\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})\big{(}K_{h_{3}}(x_{0}-X^{n}_{i}(t_{0}))-K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t_{0}))\big{)}\right|^{2}
+4𝐄|1ni=1nJh2(u0in)(Kh3(x0Xin(t0))𝐄Kh3(x0Xin(t0)))|2\displaystyle\quad+4\mathbf{E}\left|\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})\big{(}K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t_{0}))-\mathbf{E}K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t_{0}))\big{)}\right|^{2}
\displaystyle\quad+4\mathbf{E}\left|\int_{I}J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})\mathbf{E}(K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t_{0})))-J_{h_{2}}(u_{0}-u)\mathbf{E}(K_{h_{3}}(x_{0}-X_{u}(t_{0})))du\right|^{2}
+4𝐄|(JK)hμ(t0,u0,x0)μ(t0,u0,x0)|2\displaystyle\quad+4\mathbf{E}\left|(J\otimes K)_{h}\ast\mu(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}
=:4(M1+M2+M3+M4).\displaystyle\quad=:4(M_{1}+M_{2}+M_{3}+M_{4})\,.

We only need to provide the appropriate bounds for M1M_{1}, M2M_{2}, and M3M_{3}.

Step 1. We bound M1M_{1} using the convergence of the finite-population system to the graphon mean-field system.

M1\displaystyle M_{1} 1ni=1n|Jh2(u0in)|2𝐄|Kh3(x0Xin(t0))Kh3(x0Xin(t0))|2\displaystyle\leqslant\frac{1}{n}\sum_{i=1}^{n}\left|J_{h_{2}}(u_{0}-\frac{i}{n})\right|^{2}\mathbf{E}\left|K_{h_{3}}(x_{0}-X^{n}_{i}(t_{0}))-K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t_{0}))\right|^{2}
1ni=1nJh2(u0in)2Kh32𝐄|Xin(t0)Xin(t0)|2.\displaystyle\leqslant\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})^{2}\left\|\nabla K_{h_{3}}\right\|_{\infty}^{2}\mathbf{E}\left|X^{n}_{i}(t_{0})-X_{\frac{i}{n}}(t_{0})\right|^{2}\,.

Then applying Lemma 2.1 gives the bound

M1n2h322dK2i=1nJh2(u0in)2.M_{1}\lesssim n^{-2}h_{3}^{-2-2d}\left\|\nabla K\right\|_{\infty}^{2}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})^{2}\,.

Step 2. We bound M2M_{2} following the idea of [28].

For i=1,,ni=1,\dots,n, let

Zi=Jh2(u0in)(Kh3(x0Xin(t0))𝐄(Kh3(x0Xin(t0)))).Z_{i}=J_{h_{2}}(u_{0}-\frac{i}{n})\big{(}K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t_{0}))-\mathbf{E}(K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t_{0})))\big{)}\,.

The second part simplifies to

M2=𝐄|1ni=1nZi|2=0𝐏(|1ni=1nZi|>z)𝑑z.M_{2}=\mathbf{E}\left|\frac{1}{n}\sum_{i=1}^{n}Z_{i}\right|^{2}=\int_{0}^{\infty}\mathbf{P}\left(\left|\frac{1}{n}\sum_{i=1}^{n}Z_{i}\right|>\sqrt{z}\right)dz\,.

From (1.1) we see that each X_{u} is driven only by its own Brownian motion B_{u} and the deterministic measure flow; hence, by the independence of the Brownian motions \{B_{u}:u\in I\} (and of the initial conditions), Z_{1},\dots,Z_{n} are independent. We have \mathbf{E}Z_{i}=0 and \left|Z_{i}\right|\leqslant\left\|(J\otimes K)_{h}\right\|_{\infty}<\infty for every i=1,\dots,n. Then Bernstein’s inequality reads

𝐏(|1ni=1nZi|>z)2exp(12n2zi=1n𝐄Zi2+13nz(JK)h).\mathbf{P}\left(\left|\frac{1}{n}\sum_{i=1}^{n}Z_{i}\right|>\sqrt{z}\right)\leqslant 2\exp\left(-\frac{\frac{1}{2}n^{2}z}{\sum_{i=1}^{n}\mathbf{E}Z_{i}^{2}+\frac{1}{3}n\sqrt{z}\left\|(J\otimes K)_{h}\right\|_{\infty}}\right)\,.

Now we apply the inequality (48) in [28] to see that

(5.1) 0𝐏(|1ni=1nZi|>z)𝑑zmax{2n2i=1n𝐄Zi2,49n2(JK)h2}.\int_{0}^{\infty}\mathbf{P}\left(\left|\frac{1}{n}\sum_{i=1}^{n}Z_{i}\right|>\sqrt{z}\right)dz\lesssim\max\left\{2n^{-2}\sum_{i=1}^{n}\mathbf{E}Z_{i}^{2},\frac{4}{9}n^{-2}\left\|(J\otimes K)_{h}\right\|_{\infty}^{2}\right\}\,.

Observe that

𝐄Zi24Jh2(u0in)2𝐄(Kh3(x0Xin(t0))2)=4Jh2(u0in)2Kh3(x0)L2(μt0,in)2.\mathbf{E}Z_{i}^{2}\leqslant 4J_{h_{2}}(u_{0}-\frac{i}{n})^{2}\mathbf{E}(K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t_{0}))^{2})=4J_{h_{2}}(u_{0}-\frac{i}{n})^{2}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|_{L^{2}(\mu_{t_{0},\frac{i}{n}})}^{2}\,.

Plugging into (5.1) gives the bound

M2n2i=1nJh2(u0in)2Kh3(x0)L2(μt0,in)2+n2h22h32dJ2K2.M_{2}\lesssim n^{-2}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})^{2}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|^{2}_{L^{2}(\mu_{t_{0},\frac{i}{n}})}+n^{-2}h_{2}^{-2}h_{3}^{-2d}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{\infty}^{2}\,.

Step 3. Notice that J_{h_{2}} is supported on \overline{B(0,h_{2})}. We bound M_{3} using only Minkowski’s inequality and the mean-value theorem.

M3\displaystyle M_{3} 4h2I|Jh2(u0nun)Jh2(u0u)|2(𝐄(Kh3(x0Xnun(t0))))2\displaystyle\leqslant 4h_{2}\int_{I}\left|J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})-J_{h_{2}}(u_{0}-u)\right|^{2}(\mathbf{E}(K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t_{0}))))^{2}
\displaystyle\qquad+J_{h_{2}}(u_{0}-u)^{2}\,\mathbf{E}\left|K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t_{0}))-K_{h_{3}}(x_{0}-X_{u}(t_{0}))\right|^{2}du
4h22(u0+supp(Jh2))Jh22|nunu|2Kh3(x0)L2(μt0,nun)2\displaystyle\leqslant 4h_{2}\int_{2(u_{0}+\operatorname{supp}(J_{h_{2}}))}\left\|\nabla J_{h_{2}}\right\|_{\infty}^{2}\left|\frac{\lceil nu\rceil}{n}-u\right|^{2}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|_{L^{2}(\mu_{t_{0},\frac{\lceil nu\rceil}{n}})}^{2}
\displaystyle\qquad+J_{h_{2}}(u_{0}-u)^{2}\left\|\nabla K_{h_{3}}\right\|_{\infty}^{2}\mathbf{E}\left|X_{\frac{\lceil nu\rceil}{n}}(t_{0})-X_{u}(t_{0})\right|^{2}du
n3h22J2i=1nKh3(x0)L2(μt0,in)2+n2h322dJ22K2,\displaystyle\lesssim n^{-3}h_{2}^{-2}\left\|\nabla J\right\|_{\infty}^{2}\sum_{i=1}^{n}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|_{L^{2}(\mu_{t_{0},\frac{i}{n}})}^{2}+n^{-2}h_{3}^{-2-2d}\left\|J\right\|_{2}^{2}\left\|\nabla K\right\|_{\infty}^{2}\,,

where the last step uses Theorem 2.1 of [3]. That finishes the proof. ∎

We will consistently use the following equality, which is a consequence of the Fubini-Tonelli theorem,

(5.2) dKh3(x)L2(μt,u)2dx=h3dK22,\int_{\mathbb{R}^{d}}\left\|K_{h_{3}}(x-\cdot)\right\|_{L^{2}(\mu_{t,u})}^{2}dx=h_{3}^{-d}\left\|K\right\|_{2}^{2}\,,

where the latter L2L^{2}-norm is taken with respect to the Lebesgue measure on d\mathbb{R}^{d}.
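For completeness, with the usual normalization K_{h_{3}}(z)=h_{3}^{-d}K(z/h_{3}), the identity (5.2) is just an exchange of the order of integration followed by the substitution z=x-y:

\int_{\mathbb{R}^{d}}\left\|K_{h_{3}}(x-\cdot)\right\|_{L^{2}(\mu_{t,u})}^{2}dx=\int_{\mathbb{R}^{d}}\int_{\mathbb{R}^{d}}K_{h_{3}}(x-y)^{2}\,dx\,\mu_{t,u}(dy)=\int_{\mathbb{R}^{d}}K_{h_{3}}(z)^{2}dz=h_{3}^{-d}\left\|K\right\|_{2}^{2}\,.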

Proof of Corollary 3.1.

Recall that μ^h,rn=μ^hn𝟏{|x|r}\hat{\mu}^{n}_{h,r}=\hat{\mu}^{n}_{h}\mathbf{1}_{\{\left|x\right|\leqslant r\}}. We break the integral into two parts

τ1τ2Id𝐄|μ^h,rn(t,u,x)μ(t,u,x)|2𝑑x𝑑u𝑑t=\displaystyle\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\mathbb{R}^{d}}\mathbf{E}\left|\hat{\mu}^{n}_{h,r}(t,u,x)-\mu(t,u,x)\right|^{2}dxdudt=
τ1τ2I{|x|r}𝐄|μ^h,rn(t,u,x)μ(t,u,x)|2𝑑x𝑑u𝑑t+τ1τ2I{|x|>r}μ(t,u,x)2𝑑x𝑑u𝑑t.\displaystyle\qquad\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\{\left|x\right|\leqslant r\}}\mathbf{E}\left|\hat{\mu}^{n}_{h,r}(t,u,x)-\mu(t,u,x)\right|^{2}dxdudt+\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\{\left|x\right|>r\}}\mu(t,u,x)^{2}dxdudt\,.

The second part tends to 0 as rr\to\infty due to Proposition 2.1 and dominated convergence. We denote the convergence rate by θ2,μ(r)\theta_{2,\mu}(r).

For the first part, we rearrange and integrate the terms given in Lemma 3.1. Recall that

𝐄|μ^hn(t,u,x)μ(t,u,x)|2\displaystyle\mathbf{E}\left|\hat{\mu}^{n}_{h}(t,u,x)-\mu(t,u,x)\right|^{2}\lesssim
n2h22h32d+n2h322d\displaystyle\qquad n^{-2}h_{2}^{-2}h_{3}^{-2d}+n^{-2}h_{3}^{-2-2d}
(5.3) +n2h322dK2i=1nJh2(uin)2\displaystyle\quad+n^{-2}h_{3}^{-2-2d}\left\|\nabla K\right\|_{\infty}^{2}\sum_{i=1}^{n}J_{h_{2}}(u-\frac{i}{n})^{2}
(5.4) +n2i=1nJh2(uin)2Kh3(x)L2(μt,in)2\displaystyle\quad+n^{-2}\sum_{i=1}^{n}J_{h_{2}}(u-\frac{i}{n})^{2}\left\|K_{h_{3}}(x-\cdot)\right\|^{2}_{L^{2}(\mu_{t,\frac{i}{n}})}
(5.5) +n3h22J2i=1nKh3(x)L2(μt,in)2\displaystyle\quad+n^{-3}h_{2}^{-2}\left\|\nabla J\right\|_{\infty}^{2}\sum_{i=1}^{n}\left\|K_{h_{3}}(x-\cdot)\right\|_{L^{2}(\mu_{t_{,}\frac{i}{n}})}^{2}
+|(JK)hμt(u,x)μ(t,u,x)|2.\displaystyle\quad+\left|(J\otimes K)_{h}\ast\mu_{t}(u,x)-\mu(t,u,x)\right|^{2}\,.

The bounds in the first line are independent of t, u, and x, so integrating them over [\tau_{1},\tau_{2}]\times I\times\{\left|x\right|\leqslant r\} gives

Trd(n2h22h32d+n2h322d).Tr^{d}(n^{-2}h_{2}^{-2}h_{3}^{-2d}+n^{-2}h_{3}^{-2-2d})\,.

The middle three lines are computed as follows. For line (5.3) we have

I{|x|r}n2h322dK2i=1nJh2(uin)2dxdu\displaystyle\int_{I}\int_{\{\left|x\right|\leqslant r\}}n^{-2}h_{3}^{-2-2d}\left\|\nabla K\right\|_{\infty}^{2}\sum_{i=1}^{n}J_{h_{2}}(u-\frac{i}{n})^{2}dxdu
\displaystyle\qquad\leqslant r^{d}n^{-2}h_{3}^{-2-2d}\left\|\nabla K\right\|_{\infty}^{2}\sum_{i=1}^{n}\int_{\mathbb{R}}J_{h_{2}}(u-\frac{i}{n})^{2}du
\displaystyle\qquad=r^{d}n^{-1}h_{2}^{-1}h_{3}^{-2-2d}\left\|\nabla K\right\|_{\infty}^{2}\left\|J\right\|_{2}^{2}\,.

With the same idea and using (5.2), line (5.4) gives

Idn2i=1nJh2(uin)2Kh3(x)L2(μt,in)2dxdu\displaystyle\int_{I}\int_{\mathbb{R}^{d}}n^{-2}\sum_{i=1}^{n}J_{h_{2}}(u-\frac{i}{n})^{2}\left\|K_{h_{3}}(x-\cdot)\right\|^{2}_{L^{2}(\mu_{t,\frac{i}{n}})}dxdu
n1h21h3dJ22K22.\displaystyle\qquad\leqslant n^{-1}h_{2}^{-1}h_{3}^{-d}\left\|J\right\|_{2}^{2}\left\|K\right\|_{2}^{2}\,.

Analogously, line (5.5) gives

Idn3h22J2i=1nKh3(x)L2(μt,in)2dxdu\displaystyle\int_{I}\int_{\mathbb{R}^{d}}n^{-3}h_{2}^{-2}\left\|\nabla J\right\|_{\infty}^{2}\sum_{i=1}^{n}\left\|K_{h_{3}}(x-\cdot)\right\|_{L^{2}(\mu_{t_{,}\frac{i}{n}})}^{2}dxdu
=n2h22h3dJ2K22.\displaystyle\qquad=n^{-2}h_{2}^{-2}h_{3}^{-d}\left\|\nabla J\right\|_{\infty}^{2}\left\|K\right\|_{2}^{2}\,.

In addition, the final line expands as

Id|(JK)hμt(u,x)μ(t,u,x)|2𝑑u𝑑x\displaystyle\int_{I}\int_{\mathbb{R}^{d}}\left|(J\otimes K)_{h}\ast\mu_{t}(u,x)-\mu(t,u,x)\right|^{2}dudx
\displaystyle\leqslant\int_{\mathbb{R}\times\mathbb{R}^{d}}\left|\int_{\mathbb{R}\times\mathbb{R}^{d}}(J\otimes K)_{h}(u^{\prime},x^{\prime})(\mu(t,u-u^{\prime},x-x^{\prime})-\mu(t,u,x))dx^{\prime}du^{\prime}\right|^{2}dxdu
(×d(JK)h(u,x)(×d(μ(t,uu,xx)μ(t,u,x))2𝑑x𝑑u)1/2𝑑x𝑑u)2\displaystyle\leqslant\left(\int_{\mathbb{R}\times\mathbb{R}^{d}}(J\otimes K)_{h}(u^{\prime},x^{\prime})\left(\int_{\mathbb{R}\times\mathbb{R}^{d}}(\mu(t,u-u^{\prime},x-x^{\prime})-\mu(t,u,x))^{2}dxdu\right)^{1/2}dx^{\prime}du^{\prime}\right)^{2}
sup|u|h2,|x|h3μ(t,u,x)μ(t,,)L2(×d)2.\displaystyle\leqslant\sup_{\left|u^{\prime}\right|\leqslant h_{2},\left|x^{\prime}\right|\leqslant h_{3}}\left\|\mu(t,\cdot-u^{\prime},\cdot-x^{\prime})-\mu(t,\cdot,\cdot)\right\|_{L^{2}(\mathbb{R}\times\mathbb{R}^{d})}^{2}\,.

Note that \left\|\mu_{t,u}\right\|_{2} is uniformly bounded for t\in[\tau_{1},\tau_{2}] and u\in I as a consequence of Proposition 2.1 (see also Corollary 8.2.2 of [9] for details on local upper bounds of the particle density). Since translation is continuous in L^{2}, dominated convergence gives

(5.6) limh2,h30τ1τ2Id|(JK)hμt(u,x)μ(t,u,x)|2𝑑u𝑑x𝑑t=0.\lim_{h_{2},h_{3}\to 0}\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\mathbb{R}^{d}}\left|(J\otimes K)_{h}\ast\mu_{t}(u,x)-\mu(t,u,x)\right|^{2}dudxdt=0\,.

We denote the convergence rate by θ3,μ(h)\theta_{3,\mu}(h).

In summary, the L2L^{2}-error of μ^h,rn\hat{\mu}^{n}_{h,r} is given by

τ1τ2Id𝐄|μ^h,rn(t,u,x)μ(t,u,x)|2𝑑x𝑑u𝑑tT,b,J,Kθ2,μ(r)+θ3,μ(h)+\displaystyle\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\mathbb{R}^{d}}\mathbf{E}\left|\hat{\mu}^{n}_{h,r}(t,u,x)-\mu(t,u,x)\right|^{2}dxdudt\lesssim_{T,b,J,K}\theta_{2,\mu}(r)+\theta_{3,\mu}(h)+
rd(n2h322d+n2h22h32d)+n1h21h3d+n2h22h3d.\displaystyle\qquad r^{d}(n^{-2}h_{3}^{-2-2d}+n^{-2}h_{2}^{-2}h_{3}^{-2d})+n^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{2}^{-2}h_{3}^{-d}\,.

5.2. Proof of Lemma 3.2 and Corollary 3.2

Recall the dynamics of X_{u} defined in (1.1). For simplicity, we let

Y_{u}(t)\stackrel{\scriptscriptstyle\textup{def}}{=}\beta(t,u,X_{u}(t))=\int_{I}\int_{\mathbb{R}^{d}}b(X_{u}(t),y)G(u,v)\mu_{t,v}(dy)dv

for every uIu\in I and t[0,T]t\in[0,T]. Similarly, for the finite-population system, we let

Yin(t)=def1nj=1nb(Xin(t),Xjn(t))gijnY^{n}_{i}(t)\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\frac{1}{n}\sum_{j=1}^{n}b(X^{n}_{i}(t),X^{n}_{j}(t))g^{n}_{ij}

for i=1,,ni=1,\dots,n, and t[0,T]t\in[0,T]. Observe that |Yin(t)|,|Yu(t)|b\left|Y^{n}_{i}(t)\right|,\left|Y_{u}(t)\right|\leqslant\left\|b\right\|_{\infty}. We have the following consistency result.

Lemma 5.1.

We assume the same hypothesis as in Lemma 2.1. In the nn-particle system, we have the following

(5.7) max1in𝐄|Yin(t)Yin(t)|2max1in𝐄|Xin(t)Xin(t)|2+1n,\max_{1\leqslant i\leqslant n}\mathbf{E}\left|Y^{n}_{i}(t)-Y_{\frac{i}{n}}(t)\right|^{2}\lesssim\max_{1\leqslant i\leqslant n}\mathbf{E}\left|X^{n}_{i}(t)-X_{\frac{i}{n}}(t)\right|^{2}+\frac{1}{n}\,,

for every t[0,T]t\in[0,T]. As a consequence,

\max_{i=1,\dots,n}\int_{0}^{T}\psi(t)\mathbf{E}\left|Y_{\frac{i}{n}}(t)-Y^{n}_{i}(t)\right|^{2}dt\lesssim_{\psi}\frac{1}{n}\,,

for any bounded continuous function ψ\psi.

We defer the proof of Lemma 5.1 to Appendix B. With it in hand, we are ready to prove our estimate of \pi.

Proof of Lemma 3.2.

Fix t0,u0,x0t_{0},u_{0},x_{0}. Recall that

π^hn(t0,u0,x0)\displaystyle\hat{\pi}^{n}_{h}(t_{0},u_{0},x_{0}) =0T1ni=1n(HJK)h(t0t,u0in,x0Xin(t))dXin(t)\displaystyle=\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-\frac{i}{n},x_{0}-X^{n}_{i}(t))dX^{n}_{i}(t)
=0T1ni=1n(HJK)h(t0t,u0in,x0Xin(t))Yin(t)dt\displaystyle=\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-\frac{i}{n},x_{0}-X^{n}_{i}(t))Y^{n}_{i}(t)dt
+0T1ni=1n(HJK)h(t0t,u0in,x0Xin(t))σ(Xin(t))dBin(t).\displaystyle\qquad+\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-\frac{i}{n},x_{0}-X^{n}_{i}(t))\sigma(X^{n}_{i}(t))dB_{\frac{i}{n}}(t)\,.

From this we may then write

\displaystyle\mathbf{E}\left|\hat{\pi}^{n}_{h}(t_{0},u_{0},x_{0})-\pi(t_{0},u_{0},x_{0})\right|^{2}\leqslant
5𝐄|0THh1(t0t)1ni=1nJh2(u0in)\displaystyle\qquad 5\mathbf{E}\left|\int_{0}^{T}H_{h_{1}}(t_{0}-t)\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})\right.
(Kh3(x0Xin(t))Yin(t)Kh3(x0Xin(t))Yin(t))dt|2\displaystyle\qquad\qquad\qquad\left.\big{(}K_{h_{3}}(x_{0}-X^{n}_{i}(t))Y^{n}_{i}(t)-K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t))Y_{\frac{i}{n}}(t)\big{)}dt\right|^{2}
+5𝐄|0THh1(t0t)1ni=1nJh2(u0in)\displaystyle\quad+5\mathbf{E}\left|\int_{0}^{T}H_{h_{1}}(t_{0}-t)\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})\right.
(Kh3(x0Xin(t))Yin(t)𝐄(Kh3(x0Xin(t))Yin(t)))dt|2\displaystyle\qquad\qquad\qquad\left.\big{(}K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t))Y_{\frac{i}{n}}(t)-\mathbf{E}(K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t))Y_{\frac{i}{n}}(t))\big{)}dt\right|^{2}
+5𝐄|0THh1(t0t)I(Jh2(u0nun)𝐄(Kh3(x0Xnun(t))Ynun(t))\displaystyle\quad+5\mathbf{E}\left|\int_{0}^{T}H_{h_{1}}(t_{0}-t)\int_{I}\left(J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})\mathbf{E}(K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t))Y_{\frac{\lceil nu\rceil}{n}}(t))\right.\right.
Jh2(u0u)𝐄(Kh3(x0Xu(t))Yu(t)))dudt|2\displaystyle\qquad\qquad\qquad\left.\left.-J_{h_{2}}(u_{0}-u)\mathbf{E}(K_{h_{3}}(x_{0}-X_{u}(t))Y_{u}(t))\right)dudt\right|^{2}
+5𝐄|0TId(HJK)h(t0t,u0u,x0x)π(t,u,x)𝑑x𝑑u𝑑tπ(t0,u0,x0)|2\displaystyle\quad+5\mathbf{E}\left|\int_{0}^{T}\int_{I}\int_{\mathbb{R}^{d}}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-u,x_{0}-x)\pi(t,u,x)dxdudt-\pi(t_{0},u_{0},x_{0})\right|^{2}
+5𝐄|0T1ni=1nHh1(t0t)Jh2(u0in)Kh3(x0Xin(t))σ(Xin(t))dBin(t)|2\displaystyle\quad+5\mathbf{E}\left|\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}H_{h_{1}}(t_{0}-t)J_{h_{2}}(u_{0}-\frac{i}{n})K_{h_{3}}(x_{0}-X^{n}_{i}(t))\sigma(X^{n}_{i}(t))dB_{\frac{i}{n}}(t)\right|^{2}
=:5(P1+P2+P3+P4+P5).\displaystyle\quad=:5(P_{1}+P_{2}+P_{3}+P_{4}+P_{5})\,.

We do not need to do anything with P_{4} for now, and P_{5} will be handled at the end using standard techniques from stochastic analysis. The remaining terms are bounded analogously to M_{1} through M_{3} in the proof of Lemma 3.1.

Step 1. Observe that P1P_{1} is upper bounded by

T0THh12(t0t)1ni=1nJh2(u0in)2\displaystyle T\int_{0}^{T}H_{h_{1}}^{2}(t_{0}-t)\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})^{2}
𝐄|Kh3(x0Xin(t))Yin(t)Kh3(x0Xin(t))Yin(t)|2dt.\displaystyle\qquad\qquad\mathbf{E}\left|K_{h_{3}}(x_{0}-X^{n}_{i}(t))Y^{n}_{i}(t)-K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t))Y_{\frac{i}{n}}(t)\right|^{2}dt\,.

For each tt and ii, we have

|Kh3(x0Xin(t))Yin(t)Kh3(x0Xin(t))Yin(t)|\displaystyle\left|K_{h_{3}}(x_{0}-X^{n}_{i}(t))Y^{n}_{i}(t)-K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t))Y_{\frac{i}{n}}(t)\right|
|Kh3(x0Xin(t))Kh3(x0Xin(t))||Yin(t)|+Kh3(x0Xin(t))|Yin(t)Yin(t)|\displaystyle\leqslant\left|K_{h_{3}}(x_{0}-X^{n}_{i}(t))-K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t))\right|\left|Y^{n}_{i}(t)\right|+K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t))\left|Y^{n}_{i}(t)-Y_{\frac{i}{n}}(t)\right|
Kh3|Xin(t)Xin(t)|b+Kh3|Yin(t)Yin(t)|,\displaystyle\leqslant\left\|\nabla K_{h_{3}}\right\|_{\infty}\left|X^{n}_{i}(t)-X_{\frac{i}{n}}(t)\right|\left\|b\right\|_{\infty}+\left\|K_{h_{3}}\right\|_{\infty}\left|Y^{n}_{i}(t)-Y_{\frac{i}{n}}(t)\right|\,,

so

𝐄|Kh3(x0Xin(t))Yin(t)Kh3(x0Xin(t))Yin(t)|2\displaystyle\mathbf{E}\left|K_{h_{3}}(x_{0}-X^{n}_{i}(t))Y^{n}_{i}(t)-K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t))Y_{\frac{i}{n}}(t)\right|^{2}
2Kh32b2𝐄|Xin(t)Xin(t)|2+2Kh32𝐄|Yin(t)Yin(t)|2.\displaystyle\leqslant 2\left\|\nabla K_{h_{3}}\right\|_{\infty}^{2}\left\|b\right\|_{\infty}^{2}\mathbf{E}\left|X^{n}_{i}(t)-X_{\frac{i}{n}}(t)\right|^{2}+2\left\|K_{h_{3}}\right\|_{\infty}^{2}\mathbf{E}\left|Y^{n}_{i}(t)-Y_{\frac{i}{n}}(t)\right|^{2}\,.

Then, with Lemmata 2.1 and 5.1, we get

P_{1}\lesssim_{b,H,J,K,T}n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-2-2d}+n^{-2}h_{1}^{-1}h_{3}^{-2d}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})^{2}\,.

Step 2. To bound P_{2}, we apply Bernstein’s inequality. Let

Zi(t)=Jh2(u0in)(Kh3(x0Xin(t))Yin(t)𝐄(Kh3(x0Xin(t))Yin(t))).Z_{i}(t)=J_{h_{2}}(u_{0}-\frac{i}{n})\big{(}K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t))Y_{\frac{i}{n}}(t)-\mathbf{E}(K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t))Y_{\frac{i}{n}}(t))\big{)}\,.

Then

P2\displaystyle P_{2} T0THh1(t0t)2𝐄|1ni=1nZi(t)|2𝑑t\displaystyle\leqslant T\int_{0}^{T}H_{h_{1}}(t_{0}-t)^{2}\mathbf{E}\left|\frac{1}{n}\sum_{i=1}^{n}Z_{i}(t)\right|^{2}dt
=T0THh1(t0t)20𝐏(|i=1nZi(t)|>nz)𝑑z𝑑t.\displaystyle=T\int_{0}^{T}H_{h_{1}}(t_{0}-t)^{2}\int_{0}^{\infty}\mathbf{P}\left(\left|\sum_{i=1}^{n}Z_{i}(t)\right|>n\sqrt{z}\right)dzdt\,.

Observe that \mathbf{E}Z_{i}(t)=0 and \left|Z_{i}(t)\right|\leqslant 2\left\|(J\otimes K)_{h}\right\|_{\infty}\left\|b\right\|_{\infty} for every i=1,\dots,n and every t. Also, each Z_{i}(t) is a function of X_{\frac{i}{n}}(t) only, so Z_{1}(t),\dots,Z_{n}(t) are mutually independent. We may therefore apply Bernstein’s inequality and inequality (48) in [28] to obtain

𝐏(|i=1nZi(t)|>nz)\displaystyle\mathbf{P}\left(\left|\sum_{i=1}^{n}Z_{i}(t)\right|>n\sqrt{z}\right) 2exp(12n2zi=1n𝐄|Zi(t)|2+nz32(JK)hb)\displaystyle\leqslant 2\exp\left(-\frac{\frac{1}{2}n^{2}z}{\sum_{i=1}^{n}\mathbf{E}\left|Z_{i}(t)\right|^{2}+\frac{n\sqrt{z}}{3}2\left\|(J\otimes K)_{h}\right\|_{\infty}\left\|b\right\|_{\infty}}\right)
max{2n2i=1n𝐄|Zi(t)|2,169n2(JK)h2b2}.\displaystyle\lesssim\max\left\{2n^{-2}\sum_{i=1}^{n}\mathbf{E}\left|Z_{i}(t)\right|^{2},\frac{16}{9}n^{-2}\left\|(J\otimes K)_{h}\right\|_{\infty}^{2}\left\|b\right\|_{\infty}^{2}\right\}\,.

Further notice that

𝐄|Zi(t)|2b2Jh2(u0in)2Kh3(x0)L2(μt,in)2.\mathbf{E}\left|Z_{i}(t)\right|^{2}\leqslant\left\|b\right\|_{\infty}^{2}J_{h_{2}}(u_{0}-\frac{i}{n})^{2}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|_{L^{2}(\mu_{t,\frac{i}{n}})}^{2}\,.

Thus

P2\displaystyle P_{2} Tn2b20THh1(t0t)2i=1nJh2(u0in)2Kh3(x0)L2(μt,in)2\displaystyle\lesssim Tn^{-2}\left\|b\right\|_{\infty}^{2}\int_{0}^{T}H_{h_{1}}(t_{0}-t)^{2}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})^{2}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|_{L^{2}(\mu_{t,\frac{i}{n}})}^{2}
+Tn2h11h22h32db2H22J2K2.\displaystyle\qquad+Tn^{-2}h_{1}^{-1}h_{2}^{-2}h_{3}^{-2d}\left\|b\right\|_{\infty}^{2}\left\|H\right\|_{2}^{2}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{\infty}^{2}\,.

Step 3. The idea for bounding P3P_{3} is analogous to that of M3M_{3} in the proof of Lemma 3.1, which uses the stability of the graphon mean-field system. Observe that

P3T0THh1(t0t)24h2I|𝐄P3(t,u)|2𝑑u𝑑t,P_{3}\leqslant T\int_{0}^{T}H_{h_{1}}(t_{0}-t)^{2}4h_{2}\int_{I}\left|\mathbf{E}P_{3}(t,u)\right|^{2}dudt\,,

where

P3(t,u)=Jh2(u0nun)Kh3(x0Xnun(t))Ynun(t)Jh2(u0u)Kh3(x0Xu(t))Yu(t).P_{3}(t,u)=J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t))Y_{\frac{\lceil nu\rceil}{n}}(t)-J_{h_{2}}(u_{0}-u)K_{h_{3}}(x_{0}-X_{u}(t))Y_{u}(t)\,.

Note that

|P3(t,u)|2\displaystyle\left|P_{3}(t,u)\right|^{2} 2|Jh2(u0nun)Jh2(u0u)|2Kh3(x0Xnun(t))2|Ynun(t)|2\displaystyle\leqslant 2\left|J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})-J_{h_{2}}(u_{0}-u)\right|^{2}K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t))^{2}\left|Y_{\frac{\lceil nu\rceil}{n}}(t)\right|^{2}
+2Jh2(u0u)2|Kh3(x0Xnun(t))Ynun(t)Kh3(x0Xu(t))Yu(t)|2\displaystyle\qquad+2J_{h_{2}}(u_{0}-u)^{2}\left|K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t))Y_{\frac{\lceil nu\rceil}{n}}(t)-K_{h_{3}}(x_{0}-X_{u}(t))Y_{u}(t)\right|^{2}
2Jh22|nunu|2Kh3(x0Xnun(t))2b2\displaystyle\leqslant 2\left\|\nabla J_{h_{2}}\right\|_{\infty}^{2}\left|\frac{\lceil nu\rceil}{n}-u\right|^{2}K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t))^{2}\left\|b\right\|_{\infty}^{2}
+4Jh2(u0u)2Kh3(x0Xnun(t))2|Ynun(t)Yu(t)|2\displaystyle\qquad+4J_{h_{2}}(u_{0}-u)^{2}K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t))^{2}\left|Y_{\frac{\lceil nu\rceil}{n}}(t)-Y_{u}(t)\right|^{2}
+4Jh2(u0u)2|Kh3(x0Xnun(t))Kh3(x0Xu(t))|2|Yu(t)|2\displaystyle\qquad+4J_{h_{2}}(u_{0}-u)^{2}\left|K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t))-K_{h_{3}}(x_{0}-X_{u}(t))\right|^{2}\left|Y_{u}(t)\right|^{2}
2n2h22J2b2Kh3(x0Xnun(t))2\displaystyle\leqslant 2n^{-2}h_{2}^{-2}\left\|\nabla J\right\|_{\infty}^{2}\left\|b\right\|_{\infty}^{2}K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t))^{2}
+4Jh2(u0u)2h32dK2|Ynun(t)Yu(t)|2\displaystyle\qquad+4J_{h_{2}}(u_{0}-u)^{2}h_{3}^{-2d}\left\|K\right\|_{\infty}^{2}\left|Y_{\frac{\lceil nu\rceil}{n}}(t)-Y_{u}(t)\right|^{2}
+4Jh2(u0u)2h322dK2|Xnun(t)Xu(t)|2b2.\displaystyle\qquad+4J_{h_{2}}(u_{0}-u)^{2}h_{3}^{-2-2d}\left\|\nabla K\right\|_{\infty}^{2}\left|X_{\frac{\lceil nu\rceil}{n}}(t)-X_{u}(t)\right|^{2}\left\|b\right\|_{\infty}^{2}\,.

Then, by Jensen’s inequality, \left|\mathbf{E}P_{3}(t,u)\right|^{2}\leqslant\mathbf{E}\left|P_{3}(t,u)\right|^{2}, and

𝐄|P3(t,u)|2\displaystyle\mathbf{E}\left|P_{3}(t,u)\right|^{2} 2n2h24J2b2Kh3(x0)L2(μt,nun)2\displaystyle\leqslant 2n^{-2}h_{2}^{-4}\left\|\nabla J\right\|_{\infty}^{2}\left\|b\right\|_{\infty}^{2}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|_{L^{2}(\mu_{t,\frac{\lceil nu\rceil}{n}})}^{2}
+4n2h32dK2Jh2(u0u)2+4n2h322dK2b2Jh2(u0u)2.\displaystyle\quad+4n^{-2}h_{3}^{-2d}\left\|K\right\|_{\infty}^{2}J_{h_{2}}(u_{0}-u)^{2}+4n^{-2}h_{3}^{-2-2d}\left\|\nabla K\right\|_{\infty}^{2}\left\|b\right\|_{\infty}^{2}J_{h_{2}}(u_{0}-u)^{2}\,.

Integrating those produces

P3\displaystyle P_{3} Tn2h22J2b20THh1(t0t)2IKh3(x0)L2(μt,nun)2dudt\displaystyle\lesssim Tn^{-2}h_{2}^{-2}\left\|\nabla J\right\|_{\infty}^{2}\left\|b\right\|_{\infty}^{2}\int_{0}^{T}H_{h_{1}}(t_{0}-t)^{2}\int_{I}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|_{L^{2}(\mu_{t,\frac{\lceil nu\rceil}{n}})}^{2}dudt
+Tn2h11h322dH22J22K2b2\displaystyle\qquad+Tn^{-2}h_{1}^{-1}h_{3}^{-2-2d}\left\|H\right\|_{2}^{2}\left\|J\right\|_{2}^{2}\left\|\nabla K\right\|_{\infty}^{2}\left\|b\right\|_{\infty}^{2}
+Tn2h11h32dH22J22K2.\displaystyle\qquad+Tn^{-2}h_{1}^{-1}h_{3}^{-2d}\left\|H\right\|_{2}^{2}\left\|J\right\|_{2}^{2}\left\|K\right\|_{\infty}^{2}\,.

Step 4. For P5P_{5}, notice that {Bin\nonscript|\nonscripti=1,,n}\{B_{\frac{i}{n}}\nonscript\>|\nonscript\>\mathopen{}\allowbreak i=1,\dots,n\} are distinct independent Brownian motions. Then we apply Itô’s isometry to see that

P5\displaystyle P_{5} =1n2𝐄|0Ti=1n(HJK)h(t0t,u0in,x0Xin(t))σ(Xin(t))dBin(t)|2\displaystyle=\frac{1}{n^{2}}\mathbf{E}\left|\int_{0}^{T}\sum_{i=1}^{n}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-\frac{i}{n},x_{0}-X^{n}_{i}(t))\sigma(X^{n}_{i}(t))dB_{\frac{i}{n}}(t)\right|^{2}
=dn2𝐄(0Ti=1n(HJK)h(t0t,u0in,x0Xin(t))2tr(σσT)(Xin(t))dt)\displaystyle=\frac{d}{n^{2}}\mathbf{E}\left(\int_{0}^{T}\sum_{i=1}^{n}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-\frac{i}{n},x_{0}-X^{n}_{i}(t))^{2}\operatorname{tr}(\sigma\sigma^{T})(X^{n}_{i}(t))dt\right)
σ+2d2n2i=1n0T𝐄(HJK)h(t0t,u0in,x0Xin(t))2\displaystyle\leqslant\frac{\sigma_{+}^{2}d^{2}}{n^{2}}\sum_{i=1}^{n}\int_{0}^{T}\mathbf{E}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-\frac{i}{n},x_{0}-X^{n}_{i}(t))^{2}
Td2σ+2n1h12h22h32d.\displaystyle\lesssim Td^{2}\sigma_{+}^{2}n^{-1}h_{1}^{-2}h_{2}^{-2}h_{3}^{-2d}\,.

Adding all the above bounds finishes the proof. ∎

Proof of Corollary 3.2.

Recall that β^h,κ,rn=β^h,κn𝟏{|x|r}\hat{\beta}^{n}_{h,\kappa,r}=\hat{\beta}^{n}_{h,\kappa}\mathbf{1}_{\{\left|x\right|\leqslant r\}}. We break the integral into two parts

\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\mathbb{R}^{d}}\mathbf{E}\left|\hat{\beta}^{n}_{h,\kappa,r}(t,u,x)-\beta(t,u,x)\right|^{2}dxdudt
(5.8) =\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\{\left|x\right|\leqslant r\}}\mathbf{E}\left|\hat{\beta}^{n}_{h,\kappa}(t,u,x)-\beta(t,u,x)\right|^{2}dxdudt+\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\{\left|x\right|>r\}}\left|\beta(t,u,x)\right|^{2}dxdudt\,.

Step 1. The convergence of the second part is due to the L2L^{2}-integrability of β\beta. More precisely, recall that

β(t,u,x)=Idb(x,y)G(u,v)μt,v(dy)𝑑v,\beta(t,u,x)=\int_{I}\int_{\mathbb{R}^{d}}b(x,y)G(u,v)\mu_{t,v}(dy)dv\,,

where b(x,y)=F(xy)+V(x)b(x,y)=F(x-y)+V(x) with F,VL1L2LF,V\in L^{1}\cap L^{2}\cap L^{\infty}. Then

\left|\beta(t,u,x)\right|^{2}\leqslant 2\left|V(x)\right|^{2}+2\int_{I}\int_{\mathbb{R}^{d}}\left|F(x-y)\right|^{2}\mu_{t,v}(dy)dv\,,

so that

\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\mathbb{R}^{d}}\left|\beta(t,u,x)\right|^{2}dxdudt\leqslant 2T(\left\|V\right\|_{2}^{2}+\left\|F\right\|_{2}^{2})<\infty\,.

Thus by dominated convergence, we have

θ2,β(r)=defτ1τ2I{|x|>r}|β(t,u,x)|2𝑑x𝑑u𝑑t0\theta_{2,\beta}(r)\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\{\left|x\right|>r\}}\left|\beta(t,u,x)\right|^{2}dxdudt\to 0

as rr\to\infty.

Step 2. We now look at the first part of (5.8). Recalling that β^h,κn=π^hnμ^hnκ2\hat{\beta}^{n}_{h,\kappa}=\frac{\hat{\pi}^{n}_{h}}{\hat{\mu}^{n}_{h}\lor\kappa_{2}}, we obtain

|β^h,κn(t,u,x)β(t,u,x)|2\displaystyle\left|\hat{\beta}^{n}_{h,\kappa}(t,u,x)-\beta(t,u,x)\right|^{2}\lesssim
κ22(|π^hn(t,u,x)π(t,u,x)|2+b2|μ^hn(t,u,x)μ(t,u,x)|2)\displaystyle\qquad\kappa_{2}^{-2}\left(\left|\hat{\pi}^{n}_{h}(t,u,x)-\pi(t,u,x)\right|^{2}+\left\|b\right\|_{\infty}^{2}\left|\hat{\mu}^{n}_{h}(t,u,x)-\mu(t,u,x)\right|^{2}\right)

whenever 0<\kappa_{2}<\mu(t,u,x). Note that \mu has a strictly positive lower bound over [\tau_{1},\tau_{2}]\times I\times B(0,r) thanks to Harnack’s inequality (see, for instance, Corollary 8.2.2 in [9]). This allows us to choose a strictly positive \kappa_{2} depending on r; without loss of generality, we take \kappa_{2}=\kappa_{2}(r) to be decreasing in r.
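To see the displayed inequality, note that when \kappa_{2}<\mu(t,u,x) we have \beta=\pi/\mu=\pi/(\mu\lor\kappa_{2}), and hence (suppressing the arguments (t,u,x))

\left|\hat{\beta}^{n}_{h,\kappa}-\beta\right|\leqslant\frac{\left|\hat{\pi}^{n}_{h}-\pi\right|}{\hat{\mu}^{n}_{h}\lor\kappa_{2}}+\left|\pi\right|\left|\frac{1}{\hat{\mu}^{n}_{h}\lor\kappa_{2}}-\frac{1}{\mu\lor\kappa_{2}}\right|\leqslant\kappa_{2}^{-1}\left|\hat{\pi}^{n}_{h}-\pi\right|+\kappa_{2}^{-1}\left\|b\right\|_{\infty}\left|\hat{\mu}^{n}_{h}-\mu\right|\,,

where we used \left|\pi\right|=\mu\left|\beta\right|\leqslant\left\|b\right\|_{\infty}\mu and \left|\frac{1}{\hat{\mu}^{n}_{h}\lor\kappa_{2}}-\frac{1}{\mu\lor\kappa_{2}}\right|\leqslant\frac{\left|\hat{\mu}^{n}_{h}-\mu\right|}{\kappa_{2}\,\mu}; squaring yields the claim.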

We already have an upper bound of

τ1τ2I{|x|r}𝐄|μ^hn(t,u,x)μ(t,u,x)|2𝑑x𝑑u𝑑t\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\{\left|x\right|\leqslant r\}}\mathbf{E}\left|\hat{\mu}^{n}_{h}(t,u,x)-\mu(t,u,x)\right|^{2}dxdudt

from Corollary 3.1. It remains to look at the errors of π^\hat{\pi}.

For the estimates of π\pi, we rearrange and combine the terms in the upper bound given in Lemma 3.2 to see that

𝐄|π^hn(t,u,x)π(t,u,x)|2T,b,H,J,K\displaystyle\mathbf{E}\left|\hat{\pi}^{n}_{h}(t,u,x)-\pi(t,u,x)\right|^{2}\lesssim_{T,b,H,J,K}
n1h11h22h322d+n1h12h22h32d\displaystyle\qquad n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-2-2d}+n^{-1}h_{1}^{-2}h_{2}^{-2}h_{3}^{-2d}
+n2h11h32di=1nJh2(uin)2\displaystyle\quad+n^{-2}h_{1}^{-1}h_{3}^{-2d}\sum_{i=1}^{n}J_{h_{2}}(u-\frac{i}{n})^{2}
+n20THh12(ts)i=1nJh22(uin)Kh3(x)L2(μs,in)2ds\displaystyle\quad+n^{-2}\int_{0}^{T}H_{h_{1}}^{2}(t-s)\sum_{i=1}^{n}J_{h_{2}}^{2}(u-\frac{i}{n})\left\|K_{h_{3}}(x-\cdot)\right\|_{L^{2}(\mu_{s,\frac{i}{n}})}^{2}ds
+n3h220THh12(ts)i=1nKh3(x)L2(μs,in)2ds\displaystyle\quad+n^{-3}h_{2}^{-2}\int_{0}^{T}H_{h_{1}}^{2}(t-s)\sum_{i=1}^{n}\left\|K_{h_{3}}(x-\cdot)\right\|_{L^{2}(\mu_{s,\frac{i}{n}})}^{2}ds
+|(HJK)hπ(t,u,x)π(t,u,x)|2.\displaystyle\quad+\left|(H\otimes J\otimes K)_{h}\ast\pi(t,u,x)-\pi(t,u,x)\right|^{2}\,.

Analogous to the proof of Corollary 3.1, integrating those items over [τ1,τ2]×I×{|x|r}[\tau_{1},\tau_{2}]\times I\times\{\left|x\right|\leqslant r\} produces

τ1τ2I{|x|r}𝐄|π^hn(t,u,x)π(t,u,x)|2T,b,H,J,K\displaystyle\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\{\left|x\right|\leqslant r\}}\mathbf{E}\left|\hat{\pi}^{n}_{h}(t,u,x)-\pi(t,u,x)\right|^{2}\lesssim_{T,b,H,J,K}
rd(n1h11h22h322d+n1h12h22h32d)\displaystyle\qquad r^{d}(n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-2-2d}+n^{-1}h_{1}^{-2}h_{2}^{-2}h_{3}^{-2d})
+n1h11h21h3d+n2h11h22h3d\displaystyle\quad+n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-2}h_{3}^{-d}
+(HJK)hππL2([τ1,τ2]×I×d)2.\displaystyle\quad+\left\|(H\otimes J\otimes K)_{h}\ast\pi-\pi\right\|_{L^{2}([\tau_{1},\tau_{2}]\times I\times\mathbb{R}^{d})}^{2}\,.

Recall that \pi=\mu\beta, where \left|\beta\right|\leqslant\left\|b\right\|_{\infty}. Using the same argument that led to (5.6), we see that

limh1,h2,h30(HJK)hππL2([τ1,τ2]×I×d)2=0,\lim_{h_{1},h_{2},h_{3}\to 0}\left\|(H\otimes J\otimes K)_{h}\ast\pi-\pi\right\|_{L^{2}([\tau_{1},\tau_{2}]\times I\times\mathbb{R}^{d})}^{2}=0\,,

and we denote the convergence rate by θ3,π(h)\theta_{3,\pi}(h).

Therefore, joining all the items, we obtain an overall upper bound

τ1τ2Id𝐄|β^h,κ,rn(t,u,x)β(t,u,x)|2𝑑x𝑑u𝑑t\displaystyle\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\mathbb{R}^{d}}\mathbf{E}\left|\hat{\beta}^{n}_{h,\kappa,r}(t,u,x)-\beta(t,u,x)\right|^{2}dxdudt\lesssim
κ2(r)2(n1h11h21h3d+n2h11h22h3d)\displaystyle\qquad\kappa_{2}(r)^{-2}(n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-2}h_{3}^{-d})
+κ2(r)2rd(n1h12h22h32d+n1h11h22h322d)\displaystyle\quad+\kappa_{2}(r)^{-2}r^{d}(n^{-1}h_{1}^{-2}h_{2}^{-2}h_{3}^{-2d}+n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-2-2d})
+κ2(r)2(θ3,μ(h)+θ3,π(h))+θ2,β(r),\displaystyle\quad+\kappa_{2}(r)^{-2}(\theta_{3,\mu}(h)+\theta_{3,\pi}(h))+\theta_{2,\beta}(r)\,,

finishing the proof. ∎

6. Proofs for Section 4

The main improvement of the estimates in Lemmata 4.1 and 4.2 over Lemmata 3.1 and 3.2 is the elimination of the step involving inequality (1.3). At a given point (t_{0},u_{0},x_{0}), we are able to remove a heavy error term at the cost of a constant multiple (depending on the point (t_{0},u_{0},x_{0})). This relies on a change-of-measure argument via Girsanov’s theorem, and the analysis of the constant multiple follows Proposition 19 of [28].

Recall that the finite-population system has the following dynamics

dXin(t)=1nj=1ngijnb(Xin(t),Xjn(t))dt+σ(Xin(t))dBin(t),dX^{n}_{i}(t)=\frac{1}{n}\sum_{j=1}^{n}g^{n}_{ij}b(X^{n}_{i}(t),X^{n}_{j}(t))dt+\sigma(X^{n}_{i}(t))dB_{\frac{i}{n}}(t)\,,

for i=1,,ni=1,\dots,n. We define

B¯in(t)=0t(σσT)1/2(Xin(s))(dXin(s)β(s,in,Xin(s))ds)\bar{B}^{n}_{i}(t)=\int_{0}^{t}(\sigma\sigma^{T})^{-1/2}(X^{n}_{i}(s))(dX^{n}_{i}(s)-\beta(s,\frac{i}{n},X^{n}_{i}(s))ds)

for i=1,,ni=1,\dots,n, and t[0,T]t\in[0,T]. Then

dXin(t)=β(t,in,Xin(t))dt+σ(Xin(t))dB¯in(t),i=1,,n.dX^{n}_{i}(t)=\beta(t,\frac{i}{n},X^{n}_{i}(t))dt+\sigma(X^{n}_{i}(t))d\bar{B}^{n}_{i}(t)\,,\qquad i=1,\dots,n\,.

Let M¯n\bar{M}^{n} be the process

M¯tn=i=1n0t(1nj=1ngijnb(Xin(s),Xjn(s))β(s,in,Xin(s)))T(σσT)1/2(Xin(s))𝑑B¯in(s).\bar{M}^{n}_{t}=\sum_{i=1}^{n}\int_{0}^{t}\left(\frac{1}{n}\sum_{j=1}^{n}g^{n}_{ij}b(X^{n}_{i}(s),X^{n}_{j}(s))-\beta(s,\frac{i}{n},X^{n}_{i}(s))\right)^{T}(\sigma\sigma^{T})^{-1/2}(X^{n}_{i}(s))d\bar{B}^{n}_{i}(s)\,.

Define a new probability measure 𝐏¯\bar{\mathbf{P}} via

d𝐏¯d𝐏=exp(M¯Tn12M¯nT),\frac{d\bar{\mathbf{P}}}{d\mathbf{P}}=\exp\left(\bar{M}^{n}_{T}-\frac{1}{2}\langle\bar{M}^{n}\rangle_{T}\right)\,,

where \langle\cdot\rangle denotes the quadratic variation. Observe that \{\bar{B}^{n}_{i}\nonscript\>|\nonscript\>\mathopen{}\allowbreak i=1,\dots,n\} are independent \bar{\mathbf{P}}-Brownian motions, and that \bar{M}^{n} is a \bar{\mathbf{P}}-martingale. So \{X^{n}_{i}\nonscript\>|\nonscript\>\mathopen{}\allowbreak i=1,\dots,n\} are independent under \bar{\mathbf{P}}, and the \bar{\mathbf{P}}-law of X^{n}_{i} coincides with the \mathbf{P}-law of X_{\frac{i}{n}} for every i=1,\dots,n. A slight modification of Proposition 19 of [28] gives the following relation.

Lemma 6.1.

There exist constants C,a>0C,a>0 such that, for any T\mathcal{F}_{T}-measurable event EE, we have

𝐏(E)C(𝐏¯(E))a.\mathbf{P}(E)\leqslant C(\bar{\mathbf{P}}(E))^{a}\,.

Here T\mathcal{F}_{T} is the σ\sigma-algebra generated by the Brownian motions {Bu(t)}t[0,T],uI\{B_{u}(t)\}_{t\in[0,T],u\in I}.

Now we have the tools to complete the proof of the improved estimations and thus the minimax analysis.

6.1. Proof of Theorem 4.1

We first justify the improved upper bound.

Proof of Lemma 4.1.

Observe that

𝐄|μ^hn(t0,u0,x0)μ(t0,u0,x0)|2\displaystyle\mathbf{E}\left|\hat{\mu}^{n}_{h}(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}
3𝐄|1ni=1nJh2(u0in)(Kh3(x0Xin(t0))𝐄¯(Kh3(x0Xin(t0))))|2\displaystyle\leqslant 3\mathbf{E}\left|\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})\big{(}K_{h_{3}}(x_{0}-X^{n}_{i}(t_{0}))-\bar{\mathbf{E}}(K_{h_{3}}(x_{0}-X^{n}_{i}(t_{0})))\big{)}\right|^{2}
\displaystyle+3\mathbf{E}\left|\int_{I}J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})\bar{\mathbf{E}}(K_{h_{3}}(x_{0}-X^{n}_{\lceil nu\rceil}(t_{0})))-J_{h_{2}}(u_{0}-u)\mathbf{E}(K_{h_{3}}(x_{0}-X_{u}(t_{0})))du\right|^{2}
\displaystyle+3\left|(J\otimes K)_{h}\ast\mu_{t_{0}}(u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}
=:3(M1+M2+M3).\displaystyle=:3(M^{\prime}_{1}+M^{\prime}_{2}+M^{\prime}_{3})\,.

For i=1,,ni=1,\dots,n, let

Z¯i=Jh2(u0in)(Kh3(x0Xin(t0))𝐄¯(Kh3(x0Xin(t0)))).\bar{Z}_{i}=J_{h_{2}}(u_{0}-\frac{i}{n})\big{(}K_{h_{3}}(x_{0}-X^{n}_{i}(t_{0}))-\bar{\mathbf{E}}(K_{h_{3}}(x_{0}-X^{n}_{i}(t_{0})))\big{)}\,.

Note that Z¯i=0\bar{Z}_{i}=0 whenever |u0in|>h2\left|u_{0}-\frac{i}{n}\right|>h_{2}, so the number of nonzero terms in the summation is O(nh2)O(nh_{2}).

The main improvement upon Lemma 3.1 comes from the upper bound of M1M^{\prime}_{1} via the change-of-measure argument. Following the same strategy as in the proof of Lemma 3.1, we have

M1\displaystyle M^{\prime}_{1} =𝐄|1ni=1nZ¯i|2\displaystyle=\mathbf{E}\left|\frac{1}{n}\sum_{i=1}^{n}\bar{Z}_{i}\right|^{2}
\displaystyle=\int_{0}^{\infty}\mathbf{P}\left(\left|\sum_{i=1}^{n}\bar{Z}_{i}\right|>n\sqrt{z}\right)dz
C0𝐏¯(|i=1nZ¯i|>nz)a𝑑z.\displaystyle\leqslant C\int_{0}^{\infty}\bar{\mathbf{P}}\left(\left|\sum_{i=1}^{n}\bar{Z}_{i}\right|>n\sqrt{z}\right)^{a}dz\,.

Recall that {Xin\nonscript|\nonscripti=1,,n}\{X^{n}_{i}\nonscript\>|\nonscript\>\mathopen{}\allowbreak i=1,\dots,n\} are independent under 𝐏¯\bar{\mathbf{P}}. Then so are {Z¯i\nonscript|\nonscripti=1,,n}\{\bar{Z}_{i}\nonscript\>|\nonscript\>\mathopen{}\allowbreak i=1,\dots,n\}. Moreover, we have 𝐄¯Z¯i=0\bar{\mathbf{E}}\bar{Z}_{i}=0 and |Z¯i|2(JK)h\left|\bar{Z}_{i}\right|\leqslant 2\left\|(J\otimes K)_{h}\right\|_{\infty} a.s. We may thus apply Bernstein’s inequality,

\bar{\mathbf{P}}\left(\left|\sum_{i=1}^{n}\bar{Z}_{i}\right|>n\sqrt{z}\right)\leqslant 2\exp\left(-\frac{\frac{1}{2}n^{2}z}{\sum_{i=1}^{n}\bar{\mathbf{E}}\bar{Z}_{i}^{2}+\frac{1}{3}n\sqrt{z}\left\|(J\otimes K)_{h}\right\|_{\infty}}\right)\,.

For index ii such that |u0in|h2\left|u_{0}-\frac{i}{n}\right|\leqslant h_{2}, we have

\displaystyle\bar{\mathbf{E}}\bar{Z}_{i}^{2}\leqslant J_{h_{2}}(u_{0}-\frac{i}{n})^{2}\bar{\mathbf{E}}(K_{h_{3}}(x_{0}-X^{n}_{i}(t_{0}))^{2})
=Jh2(u0in)2𝐄(Kh3(x0Xin(t0))2)\displaystyle=J_{h_{2}}(u_{0}-\frac{i}{n})^{2}\mathbf{E}(K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t_{0}))^{2})
h22J2dKh3(x0x)2μt0,in(dx)\displaystyle\leqslant h_{2}^{-2}\left\|J\right\|_{\infty}^{2}\int_{\mathbb{R}^{d}}K_{h_{3}}(x_{0}-x)^{2}\mu_{t_{0},\frac{i}{n}}(dx)
Cμh22h3dJ2K22,\displaystyle\leqslant C_{\mu}h_{2}^{-2}h_{3}^{-d}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{2}^{2}\,,

thanks to the local boundedness of \mu_{t_{0},\frac{i}{n}} in a neighborhood of x_{0}. Using estimate (48) in [28], we get

0𝐏¯(|i=1nZ¯i|>nz)a𝑑z\displaystyle\int_{0}^{\infty}\bar{\mathbf{P}}\left(\left|\sum_{i=1}^{n}\bar{Z}_{i}\right|>n\sqrt{z}\right)^{a}dz
2max{2Cμnh21h3dJ2K22an2,(2nh21h3d(JK)3an2)2}\displaystyle\quad\leqslant 2\max\left\{\frac{2C_{\mu}nh_{2}^{-1}h_{3}^{-d}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{2}^{2}}{an^{2}},\big{(}\frac{2nh_{2}^{-1}h_{3}^{-d}\left\|(J\otimes K)\right\|_{\infty}}{3an^{2}}\big{)}^{2}\right\}
C(n1h21h3dJ2K22+n2h22h32dJ2K2).\displaystyle\quad\leqslant C(n^{-1}h_{2}^{-1}h_{3}^{-d}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{2}^{2}+n^{-2}h_{2}^{-2}h_{3}^{-2d}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{\infty}^{2})\,.

The term M^{\prime}_{2} is where items (1) and (2) of Lemma 4.1 differ. We first work on item (1). Recall that J_{h_{2}} is supported on \overline{B(0,h_{2})}. Then the Cauchy-Schwarz inequality gives

M2\displaystyle M^{\prime}_{2}\leqslant
h2I|Jh2(u0nun)𝐄¯(Kh3(x0Xnun(t0)))Jh2(u0u)𝐄(Kh3(x0Xu(t0)))|2𝑑u.\displaystyle\;h_{2}\int_{I}\left|J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})\bar{\mathbf{E}}(K_{h_{3}}(x_{0}-X^{n}_{\lceil nu\rceil}(t_{0})))-J_{h_{2}}(u_{0}-u)\mathbf{E}(K_{h_{3}}(x_{0}-X_{u}(t_{0})))\right|^{2}du\,.

For each u[u0h2,u0+h2]u\in[u_{0}-h_{2},u_{0}+h_{2}], since the 𝐏¯\bar{\mathbf{P}}-law of XnunX^{n}_{\lceil nu\rceil} is identical to the 𝐏\mathbf{P}-law of XnunX_{\frac{\lceil nu\rceil}{n}}, we have

Jh2(u0nun)𝐄¯(Kh3(x0Xnun(t0)))Jh2(u0u)𝐄(Kh3(x0Xu(t0)))\displaystyle J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})\bar{\mathbf{E}}(K_{h_{3}}(x_{0}-X^{n}_{\lceil nu\rceil}(t_{0})))-J_{h_{2}}(u_{0}-u)\mathbf{E}(K_{h_{3}}(x_{0}-X_{u}(t_{0})))
=Jh2(u0nun)(𝐄(Kh3(x0Xnun(t0)))𝐄(Kh3(x0Xu(t0))))\displaystyle=J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})\left(\mathbf{E}(K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t_{0})))-\mathbf{E}(K_{h_{3}}(x_{0}-X_{u}(t_{0})))\right)
+(Jh2(u0nun)Jh2(u0u))𝐄(Kh3(x0Xu(t0))).\displaystyle+\left(J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})-J_{h_{2}}(u_{0}-u)\right)\mathbf{E}(K_{h_{3}}(x_{0}-X_{u}(t_{0})))\,.

So

|Jh2(u0nun)𝐄¯(Kh3(x0Xnun(t0)))Jh2(u0u)𝐄(Kh3(x0Xu(t0)))|2\displaystyle\left|J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})\bar{\mathbf{E}}(K_{h_{3}}(x_{0}-X^{n}_{\lceil nu\rceil}(t_{0})))-J_{h_{2}}(u_{0}-u)\mathbf{E}(K_{h_{3}}(x_{0}-X_{u}(t_{0})))\right|^{2}
2Jh2(u0nun)2𝐄|Kh3(x0Xnun(t0))Kh3(x0Xu(t0))|2\displaystyle\quad\leqslant 2J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})^{2}\mathbf{E}\left|K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t_{0}))-K_{h_{3}}(x_{0}-X_{u}(t_{0}))\right|^{2}
+2|Jh2(u0nun)Jh2(u0u)|2𝐄(Kh3(x0Xu(t0))2)\displaystyle\qquad+2\left|J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})-J_{h_{2}}(u_{0}-u)\right|^{2}\mathbf{E}(K_{h_{3}}(x_{0}-X_{u}(t_{0}))^{2})
\displaystyle\quad\leqslant 2h_{2}^{-2}\left\|J\right\|_{\infty}^{2}h_{3}^{-2-2d}\left\|\nabla K\right\|_{\infty}^{2}\mathbf{E}\left|X_{\frac{\lceil nu\rceil}{n}}(t_{0})-X_{u}(t_{0})\right|^{2}
+2h24n2J2Cμh3dK22\displaystyle\qquad+2h_{2}^{-4}n^{-2}\left\|\nabla J\right\|_{\infty}^{2}C_{\mu}h_{3}^{-d}\left\|K\right\|_{2}^{2}
C(n2h22h322d+n2h24h3d),\displaystyle\quad\leqslant C(n^{-2}h_{2}^{-2}h_{3}^{-2-2d}+n^{-2}h_{2}^{-4}h_{3}^{-d})\,,

where the last inequality uses Theorem 2.1 of [3].

Integrating the above errors, we obtain

M2Cn2(h322dJ2K2+h22h3dJ2K22).M^{\prime}_{2}\leqslant Cn^{-2}(h_{3}^{-2-2d}\left\|J\right\|_{\infty}^{2}\left\|\nabla K\right\|_{\infty}^{2}+h_{2}^{-2}h_{3}^{-d}\left\|\nabla J\right\|_{\infty}^{2}\left\|K\right\|_{2}^{2})\,.

That finishes the proof of item (1).

Looking at the proof of item (1), we notice that the only difference in item (2) compared to item (1) happens at the term

𝐄|Kh3(x0Xnun(t0))Kh3(x0Xu(t0))|2.\mathbf{E}\left|K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t_{0}))-K_{h_{3}}(x_{0}-X_{u}(t_{0}))\right|^{2}\,.

The previous (crude) analysis in the proof of Lemma 3.1 gives an upper bound of order O(n^{-2}h_{3}^{-2-2d}) simply by the mean-value theorem. However, the mean-value bound is only needed on the event that \left|x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t_{0})\right|\leqslant h_{3} or \left|x_{0}-X_{u}(t_{0})\right|\leqslant h_{3}. Given the local boundedness of \mu, we have

𝐏(A(t0,u,x0))=def𝐏(|x0Xnun(t0)|h3 or |x0Xu(t0)|h3)2Cμh3d\mathbf{P}(A(t_{0},u,x_{0}))\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\mathbf{P}\left(\left|x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t_{0})\right|\leqslant h_{3}\text{ or }\left|x_{0}-X_{u}(t_{0})\right|\leqslant h_{3}\right)\leqslant 2C_{\mu}^{\prime}h_{3}^{d}

for some constant CμC_{\mu}^{\prime}.

Now, with the additional assumption on the continuity of initial data with respect to the pp-Wasserstein metric, we adjust the proof of Theorem 2.1(b) in [3] to see that

𝐄|Xnun(t0)Xu(t0)|pCpnp.\mathbf{E}\left|X_{\frac{\lceil nu\rceil}{n}}(t_{0})-X_{u}(t_{0})\right|^{p}\leqslant C_{p}^{\prime}n^{-p}\,.

Then, by Hölder’s inequality, we get

𝐄|Kh3(x0Xnun(t0))Kh3(x0Xu(t0))|2\displaystyle\mathbf{E}\left|K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t_{0}))-K_{h_{3}}(x_{0}-X_{u}(t_{0}))\right|^{2}
𝐄(Kh32|Xnun(t0)Xu(t0)|2𝟏A(t0,u,x0))\displaystyle\leqslant\mathbf{E}\left(\left\|\nabla K_{h_{3}}\right\|_{\infty}^{2}\left|X_{\frac{\lceil nu\rceil}{n}}(t_{0})-X_{u}(t_{0})\right|^{2}\mathbf{1}_{A(t_{0},u,x_{0})}\right)
\displaystyle\leqslant h_{3}^{-2-2d}\left\|\nabla K\right\|_{\infty}^{2}\left(\mathbf{E}\left|X_{\frac{\lceil nu\rceil}{n}}(t_{0})-X_{u}(t_{0})\right|^{p}\right)^{\frac{2}{p}}\mathbf{P}(A(t_{0},u,x_{0}))^{\frac{p-2}{p}}
\displaystyle\leqslant(C_{p}^{\prime})^{\frac{2}{p}}(2C_{\mu}^{\prime})^{\frac{p-2}{p}}n^{-2}h_{3}^{-2-\frac{p+2}{p}d}\left\|\nabla K\right\|_{\infty}^{2}\,.

Note that CpC_{p}^{\prime} is independent of uu. That finishes the proof of item (2). ∎
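
Purely for intuition, the error terms controlled above are fluctuations of a kernel smoother of the particle cloud in the type variable u and the space variable x. A minimal numerical sketch of such a smoother is given below; the kernels J and K, the bandwidths, and the particle data are placeholders and are not the choices fixed in Section 2.

import numpy as np

def mu_smoother(t_idx, u0, x0, X, h2, h3):
    # Kernel-in-type, kernel-in-space smoother of a particle cloud.
    # X has shape (n, n_times, d): observed trajectories of the n particles.
    # J and K below are illustrative Epanechnikov-type kernels, not the paper's choices.
    n, _, d = X.shape
    u = np.arange(1, n + 1) / n                                         # particle labels i/n
    J = lambda v: 0.75 * np.maximum(1.0 - v**2, 0.0)                    # kernel on I
    K = lambda y: np.prod(0.75 * np.maximum(1.0 - y**2, 0.0), axis=-1)  # product kernel on R^d
    Jw = J((u0 - u) / h2) / h2                                          # J_{h2}(u0 - i/n)
    Kw = K((x0 - X[:, t_idx, :]) / h3) / h3**d                          # K_{h3}(x0 - X_i(t))
    return np.mean(Jw * Kw)

# toy usage: n = 500 particles in d = 1, a single observation time
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1, 1))
print(mu_smoother(0, u0=0.5, x0=np.zeros(1), X=X, h2=0.2, h3=0.3))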

It remains to analyze the bias term. Fix t0(0,T)t_{0}\in(0,T), u0(0,1)u_{0}\in(0,1), and x0dx_{0}\in\mathbb{R}^{d}. When h2<u0h_{2}<u_{0}, we have

(JK)hμ(t0,u0,x0)μ(t0,u0,x0)=\displaystyle(J\otimes K)_{h}\ast\mu(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})=
dJh2(u0u)Kh3(x0x)(μ(t0,u,x)μ(t0,u0,x0))𝑑x𝑑u.\displaystyle\qquad\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}J_{h_{2}}(u_{0}-u)K_{h_{3}}(x_{0}-x)(\mu(t_{0},u,x)-\mu(t_{0},u_{0},x_{0}))dxdu\,.

For uIu\in I and xdx\in\mathbb{R}^{d} such that |u0u|<h2\left|u_{0}-u\right|<h_{2} and |x0x|<h3\left|x_{0}-x\right|<h_{3}, we have

μ(t0,u,x)μ(t0,u0,x0)=\displaystyle\mu(t_{0},u,x)-\mu(t_{0},u_{0},x_{0})=
(μ(t0,u,x)μ(t0,u0,x))+(μ(t0,u0,x)μ(t0,u0,x0)).\displaystyle\qquad(\mu(t_{0},u,x)-\mu(t_{0},u_{0},x))+(\mu(t_{0},u_{0},x)-\mu(t_{0},u_{0},x_{0}))\,.

The second term has order O(\left|h_{3}\right|^{s}) since \mu(t_{0},u_{0},\cdot) belongs to the chosen Hölder continuity class. We will bound the first term with the following technical lemma.

Lemma 6.2.

There exists some constant C_{I}>0, depending only on T,d,b,\sigma, such that

|μ(t,u,x)μ(t,v,x)|CI|uv|dx-a.s.\left|\mu(t,u,x)-\mu(t,v,x)\right|\leqslant C_{I}\left|u-v\right|\qquad dx\text{-a.s.}

for every u,vIu,v\in I and every t[0,T]t\in[0,T].

The proof relies on several properties of parabolic equations, and we defer it to Appendix B.

With the technical estimates given above, we are now able to prove Theorem 4.1. We start with the upper bound.

Proof of Theorem 4.1, upper bound.

We first work under assumption (a).

Given (b,σ,G,μ0)𝒜Ls(t0,x0)(b,\sigma,G,\mu_{0})\in\mathcal{A}^{s}_{L}(t_{0},x_{0}), we know that

|μ(t0,u0,x)μ(t0,u0,x0)|L|xx0|sLh3s\left|\mu(t_{0},u_{0},x)-\mu(t_{0},u_{0},x_{0})\right|\leqslant L\left|x-x_{0}\right|^{s}\leqslant Lh_{3}^{s}

whenever x0xsupp(Kh3)x_{0}-x\in supp(K_{h_{3}}). Thanks to Lemma 6.2, we have

|μ(t0,u,x)μ(t0,u0,x)|CI|uu0|CIh2,\left|\mu(t_{0},u,x)-\mu(t_{0},u_{0},x)\right|\leqslant C_{I}\left|u-u_{0}\right|\leqslant C_{I}h_{2}\,,

whenever u,u0Iu,u_{0}\in I. So the bias term is bounded by

|(JK)hμt0(u0,x0)μ(t0,u0,x0)|22(CI2+L2)(h22+h32s).\left|(J\otimes K)_{h}\ast\mu_{t_{0}}(u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\leqslant 2(C_{I}^{2}+L^{2})(h_{2}^{2}+h_{3}^{2s})\,.

Then, along with Lemma 4.1, the total upper bound of the estimation error is given by

𝐄|μ^hnμ|2C(n1h21h3d+n2h22h32d+n2h322d+n2h22h3d+h22+h32s).\mathbf{E}\left|\hat{\mu}^{n}_{h}-\mu\right|^{2}\leqslant C(n^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{2}^{-2}h_{3}^{-2d}+n^{-2}h_{3}^{-2-2d}+n^{-2}h_{2}^{-2}h_{3}^{-d}+h_{2}^{2}+h_{3}^{2s})\,.

Taking h2=nsd+3sh_{2}=n^{-\frac{s}{d+3s}} and h3=n1d+3sh_{3}=n^{-\frac{1}{d+3s}}, we get

𝐄|μ^hn(t0,u0,x0)μ(t0,u0,x0)|2n2sd+3s+n6s2d+3sn2sd+3s.\mathbf{E}\left|\hat{\mu}^{n}_{h}(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\lesssim n^{-\frac{2s}{d+3s}}+n^{-\frac{6s-2}{d+3s}}\lesssim n^{-\frac{2s}{d+3s}}\,.

The last inequality holds when s\geqslant\frac{1}{2}. Note that the implicit constant in the inequality depends only on T,d,C_{I},L,\left\|J\otimes K\right\|_{2},\left\|J\otimes K\right\|_{\infty}, and the values of \mu near (t_{0},u_{0},x_{0}).
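
As a sanity check on the exponent arithmetic behind this choice of bandwidths, one can verify symbolically that every term in the bound above is O(n^{-\frac{2s}{d+3s}}) once s\geqslant\frac{1}{2}. A short sketch in sympy, purely as an illustration:

import sympy as sp

s, d = sp.symbols('s d', positive=True)
e2, e3 = -s/(d + 3*s), -sp.Integer(1)/(d + 3*s)      # h2 = n^{e2}, h3 = n^{e3}
target = -2*s/(d + 3*s)                              # claimed rate n^{target}

# exponents of n in the six terms of the error bound
exponents = [-1 - e2 - d*e3, -2 - 2*e2 - 2*d*e3, -2 - (2 + 2*d)*e3,
             -2 - 2*e2 - d*e3, 2*e2, 2*s*e3]
for e in exponents:
    print(sp.simplify(e - target))   # each printed gap is <= 0 whenever s >= 1/2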

Next, we work under the assumption (b). Analogous to above, we have

𝐄|μ^hnμ|2C(n1h21h3d+n2h22h32d+n2h32p+2pd+n2h22h3d+h22+h32s).\mathbf{E}\left|\hat{\mu}^{n}_{h}-\mu\right|^{2}\leqslant C(n^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{2}^{-2}h_{3}^{-2d}+n^{-2}h_{3}^{-2-\frac{p+2}{p}d}+n^{-2}h_{2}^{-2}h_{3}^{-d}+h_{2}^{2}+h_{3}^{2s})\,.

Taking h2=nsd+3sh_{2}=n^{-\frac{s}{d+3s}} and h3=n1d+3sh_{3}=n^{-\frac{1}{d+3s}}, we get

𝐄|μ^hn(t0,u0,x0)μ(t0,u0,x0)|2n2sd+3s+n6s2+d2d/pd+3s.\mathbf{E}\left|\hat{\mu}^{n}_{h}(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\lesssim n^{-\frac{2s}{d+3s}}+n^{-\frac{6s-2+d-2d/p}{d+3s}}\,.

As 0<s<120<s<\frac{1}{2}, p>2p>2, and p(24s)(p2)dp(2-4s)\leqslant(p-2)d, we have

6s2+d2dp2s.6s-2+d-\frac{2d}{p}\geqslant 2s\,.

This leads to the final asymptotic upper bound in (4.1), namely

n2sd+3s.n^{-\frac{2s}{d+3s}}\,.
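
The elementary inequality used above, namely that p(2-4s)\leqslant(p-2)d implies 6s-2+d-\frac{2d}{p}\geqslant 2s, is a one-line algebraic identity; the following sympy sketch verifies it (illustration only):

import sympy as sp

s, d, p = sp.symbols('s d p', positive=True)
lhs = 6*s - 2 + d - 2*d/p - 2*s            # want lhs >= 0
constraint = (p - 2)*d - p*(2 - 4*s)       # assumed >= 0
print(sp.expand(lhs*p - constraint))       # prints 0, so lhs = constraint / p >= 0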

Finally, we demonstrate the lower bound using Le Cam's two-point method (see for instance Chapter 2 of [26]). We shall construct two examples of graphon mean-field systems such that the total variation distance between their laws is bounded by \frac{1}{2}, while their densities at (t_{0},u_{0},x_{0}) differ by a quantity of order n^{-\frac{s}{d+3s}}. The construction is adapted from [28], with an extra factor for the graphon index u\in I, so we will skip some technical details in the proof below.

Proof of Theorem 4.1, lower bound.

Step 1. We consider graphon particle systems with no interactions.

Pick a smooth potential function U1:dU_{1}:\mathbb{R}^{d}\to\mathbb{R} such that U1\nabla U_{1} is Lipschitz, U1=0U_{1}=0 in a neighborhood of x0x_{0}, and

lim sup|x|xTU1(x)|x|2>0.\limsup_{\left|x\right|\to\infty}\frac{x^{T}\nabla U_{1}(x)}{\left|x\right|^{2}}>0\,.

Define the drift b(x,y)=b_{1}(x)\stackrel{\scriptscriptstyle\textup{def}}{=}-\nabla U_{1}(x). Pick a Lipschitz continuous function G_{1}:I\to[0,1] such that G_{1}=0 in a neighborhood of u_{0}, and define the graphon weight G(u,v)=G_{1}(u). We set C_{1,u}\stackrel{\scriptscriptstyle\textup{def}}{=}\int_{\mathbb{R}^{d}}\exp(-2G_{1}(u)U_{1}(x))dx and define

ν1(u,x)=C1,u1exp(2G1(u)U1(x)),uI.\nu_{1}(u,x)=C_{1,u}^{-1}\exp(-2G_{1}(u)U_{1}(x))\,,\qquad u\in I\,.

Then we obtain a family of diffusion processes {Xu}uI\{X_{u}\}_{u\in I} such that

(6.1) dXu(t)=b1(Xu(t))G1(u)dt+dBu(t),Xu(0)ν1(u),uI.dX_{u}(t)=b_{1}(X_{u}(t))G_{1}(u)dt+dB_{u}(t)\,,\quad X_{u}(0)\sim\nu_{1}(u)\,,\qquad u\in I\,.

Notice that XuX_{u}’s are independent, and ν1(u)\nu_{1}(u) is the invariant distribution of XuX_{u}. This gives a graphon particle system with time-invariant density function ν1\nu_{1}. In particular, we may assume that (b1,Id×d,G1,ν1)SL/2s(t0,x0)(b_{1},I_{d\times d},G_{1},\nu_{1})\in S^{s}_{L/2}(t_{0},x_{0}).

Now we consider a deviation from the system (6.1). Let \psi\in C_{c}^{\infty}(\mathbb{R}\times\mathbb{R}^{d}) be a cut-off function such that

  • ψ(0,0)=1\psi(0,0)=1 and ψ=1\left\|\psi\right\|_{\infty}=1,

  • dψ(u,x)𝑑x=0\int_{\mathbb{R}^{d}}\psi(u,x)dx=0 for every uu\in\mathbb{R}, and ψ2=1\left\|\psi\right\|_{2}=1,

  • supuψ(u,)s(x0)<\sup_{u\in\mathbb{R}}\left\|\psi(u,\cdot)\right\|_{\mathcal{H}^{s}(x_{0})}<\infty.

Let \alpha\in(0,1) be sufficiently small. Then define U_{2}^{n}:\mathbb{R}^{d}\to\mathbb{R} and G_{2}^{n}:I\to[0,1] so that

G2n(u)U2n(x)=G1(u)U1(x)+αC1,un1/2ζn1/2τnd/2ψ(ζn(uu0),τn(xx0)),G_{2}^{n}(u)U_{2}^{n}(x)=G_{1}(u)U_{1}(x)+\alpha C_{1,u}n^{-1/2}\zeta_{n}^{1/2}\tau_{n}^{d/2}\psi(\zeta_{n}(u-u_{0}),\tau_{n}(x-x_{0}))\,,

where τn,ζn\tau_{n},\zeta_{n} are positive scalars that tend to \infty as nn\to\infty. Let b2n=U2nb_{2}^{n}=-\nabla U_{2}^{n}. Then we construct the second particle system similar to the above, with time-invariant density

ν2n(u,x)=C2,n,u1exp(2G2n(u)U2n(x)),C2,n,u=defdexp(2G2n(u)U2n(x))𝑑x.\nu_{2}^{n}(u,x)=C_{2,n,u}^{-1}\exp(-2G_{2}^{n}(u)U_{2}^{n}(x))\,,\quad C_{2,n,u}\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\int_{\mathbb{R}^{d}}\exp(-2G_{2}^{n}(u)U_{2}^{n}(x))dx\,.

Moreover, to maintain the desired Lipschitz and Hölder continuity, we need

n1/2ζn3/2τns+d/21.n^{-1/2}\zeta_{n}^{3/2}\tau_{n}^{s+d/2}\lesssim 1\,.

This allows us to take τn=n1d+3s\tau_{n}=n^{\frac{1}{d+3s}} and ζn=nsd+3s\zeta_{n}=n^{\frac{s}{d+3s}}, which also ensures that (b2n,Id×d,G2n,ν2n)SLs(t0,x0)(b_{2}^{n},I_{d\times d},G_{2}^{n},\nu_{2}^{n})\in S^{s}_{L}(t_{0},x_{0}).

Step 2. Now we run the finite-population systems derived from the above two graphon particle systems and make observations of the particle positions. For (6.1), the nn particles display the dynamics

dX^{n}_{i}(t)=b_{1}(X^{n}_{i}(t))G_{1}(\frac{i}{n})dt+dB_{\frac{i}{n}}(t)\,,\qquad i=1,\dots,n\,.

The distributions of the particles coincide with those in the graphon system, with joint law

μ1=defi=1nν1(in).\mu_{1}\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\bigotimes_{i=1}^{n}\nu_{1}(\frac{i}{n})\,.

Similarly, the joint law in the second system is given by

μ2n=defi=1nν2n(in).\mu_{2}^{n}\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\bigotimes_{i=1}^{n}\nu_{2}^{n}(\frac{i}{n})\,.

Then, following the strategy in [28], with Pinsker’s inequality, we have

μ1μ2nTV22i=1n|logC2,n,inC1,in|\displaystyle\left\|\mu_{1}-\mu_{2}^{n}\right\|_{TV}^{2}\leqslant 2\sum_{i=1}^{n}\left|\log\frac{C_{2,n,\frac{i}{n}}}{C_{1,\frac{i}{n}}}\right|

Taylor’s theorem gives

|logC2,n,inC1,in|\displaystyle\left|\log\frac{C_{2,n,\frac{i}{n}}}{C_{1,\frac{i}{n}}}\right| |C2,n,inC1,in1|\displaystyle\leqslant\left|\frac{C_{2,n,\frac{i}{n}}}{C_{1,\frac{i}{n}}}-1\right|
(6.2) =2α2n1ζnτnddν1(u,x)1ψ(ζn(inu0),τn(xx0))2Ri(x)𝑑x,\displaystyle=2\alpha^{2}n^{-1}\zeta_{n}\tau_{n}^{d}\int_{\mathbb{R}^{d}}\nu_{1}(u,x)^{-1}\psi(\zeta_{n}(\frac{i}{n}-u_{0}),\tau_{n}(x-x_{0}))^{2}R_{i}(x)dx\,,

where the remainder term Ri[0,2]R_{i}\in[0,2]. Notice that ν1(u,x)1\nu_{1}(u,x)^{-1} is bounded above in a neighborhood of (u0,x0)(u_{0},x_{0}). So there exists some constant c1c_{1} such that

μ1μ2nTV2c1α2ζnni=1nψ(ζn(inu0),)22.\left\|\mu_{1}-\mu_{2}^{n}\right\|_{TV}^{2}\leqslant\frac{c_{1}\alpha^{2}\zeta_{n}}{n}\sum_{i=1}^{n}\left\|\psi(\zeta_{n}(\frac{i}{n}-u_{0}),\cdot)\right\|_{2}^{2}\,.

Since ψCc\psi\in C_{c}^{\infty}, we have

Id|ψ(ζn(nunu0),x)2ψ(ζn(uu0),x)2|𝑑x𝑑u\displaystyle\int_{I}\int_{\mathbb{R}^{d}}\left|\psi(\zeta_{n}(\frac{\lceil nu\rceil}{n}-u_{0}),x)^{2}-\psi(\zeta_{n}(u-u_{0}),x)^{2}\right|dxdu
2Id|ζn(nunu)|ψ0(x)𝑑x𝑑u\displaystyle\leqslant 2\int_{I}\int_{\mathbb{R}^{d}}\left|\zeta_{n}(\frac{\lceil nu\rceil}{n}-u)\right|\psi_{0}(x)dxdu
2n1ζnψ01,\displaystyle\leqslant 2n^{-1}\zeta_{n}\left\|\psi_{0}\right\|_{1}\,,

where we set \psi_{0}(x)\stackrel{\scriptscriptstyle\textup{def}}{=}\left\|\partial_{u}\psi(\cdot,x)\right\|_{L^{\infty}(\mathbb{R})}. This implies

1ni=1nψ(ζn(inu0),)22ζn1ψ22+2n1ζnψ01.\frac{1}{n}\sum_{i=1}^{n}\left\|\psi(\zeta_{n}(\frac{i}{n}-u_{0}),\cdot)\right\|_{2}^{2}\leqslant\zeta_{n}^{-1}\left\|\psi\right\|_{2}^{2}+2n^{-1}\zeta_{n}\left\|\psi_{0}\right\|_{1}\,.

Thus

μ1μ2nTV2c1α2+o(1)14\left\|\mu_{1}-\mu_{2}^{n}\right\|_{TV}^{2}\leqslant c_{1}\alpha^{2}+o(1)\leqslant\frac{1}{4}

when α\alpha is chosen to be small enough and nn is sufficiently large.

Step 3. Finally, we apply Le Cam's lemma to see that

infμ^sup(b,σ,G,μ0)𝒜Ls(t0,x0)𝐄|μ^μ(t0,u0,x0)|\displaystyle\inf_{\hat{\mu}}\sup_{(b,\sigma,G,\mu_{0})\in\mathcal{A}^{s}_{L}(t_{0},x_{0})}\mathbf{E}\left|\hat{\mu}-\mu(t_{0},u_{0},x_{0})\right|
\displaystyle\geqslant infμ^maxμ~{μ1,μ2n}𝐄|μ^μ~(t0,u0,x0)|\displaystyle\inf_{\hat{\mu}}\max_{\tilde{\mu}\in\{\mu_{1},\mu_{2}^{n}\}}\mathbf{E}\left|\hat{\mu}-\tilde{\mu}(t_{0},u_{0},x_{0})\right|
\displaystyle\geqslant 12|ν1(u0,x0)ν2n(u0,x0)|(1μ1μ2nTV)\displaystyle\frac{1}{2}\left|\nu_{1}(u_{0},x_{0})-\nu_{2}^{n}(u_{0},x_{0})\right|(1-\left\|\mu_{1}-\mu_{2}^{n}\right\|_{TV})
\displaystyle\geqslant 14|ν1(u0,x0)ν2n(u0,x0)|.\displaystyle\frac{1}{4}\left|\nu_{1}(u_{0},x_{0})-\nu_{2}^{n}(u_{0},x_{0})\right|\,.

The same strategy as in (6.2) (see also equation (68) in [28]) gives that

|ν1(u0,x0)ν2n(u0,x0)|n1/2ζn1/2τnd/2=nsd+3s.\left|\nu_{1}(u_{0},x_{0})-\nu_{2}^{n}(u_{0},x_{0})\right|\gtrsim n^{-1/2}\zeta_{n}^{1/2}\tau_{n}^{d/2}=n^{-\frac{s}{d+3s}}\,.

Therefore, we get the lower bound (4.2) as well:

\inf_{\hat{\mu}}\sup_{(b,\sigma,G,\mu_{0})\in\mathcal{A}^{s}_{L}(t_{0},x_{0})}\mathbf{E}\left|\hat{\mu}-\mu(t_{0},u_{0},x_{0})\right|^{2}\gtrsim n^{-\frac{2s}{d+3s}}\,,

completing the proof of Theorem 4.1. ∎
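
As a final check on the exponent arithmetic of the two-point construction, the perturbation magnitude n^{-1/2}\zeta_{n}^{1/2}\tau_{n}^{d/2} with \zeta_{n}=n^{\frac{s}{d+3s}} and \tau_{n}=n^{\frac{1}{d+3s}} is exactly n^{-\frac{s}{d+3s}}; a one-line symbolic verification (illustration only):

import sympy as sp

s, d = sp.symbols('s d', positive=True)
zeta_exp, tau_exp = s/(d + 3*s), 1/(d + 3*s)               # zeta_n = n^{zeta_exp}, tau_n = n^{tau_exp}
magnitude = -sp.Rational(1, 2) + zeta_exp/2 + d*tau_exp/2  # exponent of n^{-1/2} zeta_n^{1/2} tau_n^{d/2}
print(sp.simplify(magnitude + s/(d + 3*s)))                # prints 0, i.e. magnitude = -s/(d+3s)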

6.2. Proof of Theorem 4.2

We first justify the improved upper bound. For notational simplicity, let us define the n,p-norm of a function f:[0,T]\times I\times\mathbb{R}^{d}\to\mathbb{R}^{d} via

\left\|f\right\|_{n,p}^{p}\stackrel{\scriptscriptstyle\textup{def}}{=}\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}\int_{\mathbb{R}^{d}}\left|f(t,\frac{i}{n},x)\right|^{p}dxdt\,.
Proof of Lemma 4.2.

Fix t0,u0,x0t_{0},u_{0},x_{0}. We consider the following telescoping sum

π^hn(t0,u0,x0)π(t0,u0,x0)\displaystyle\hat{\pi}^{n}_{h}(t_{0},u_{0},x_{0})-\pi(t_{0},u_{0},x_{0})
=0T1ni=1n(HJK)h(t0t,u0in,x0Xin(t))(Yin(t)β(t,in,Xin(t)))dt\displaystyle=\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-\frac{i}{n},x_{0}-X^{n}_{i}(t))(Y^{n}_{i}(t)-\beta(t,\frac{i}{n},X^{n}_{i}(t)))dt
+0TId(HJK)h(t0t,u0u,x0x)β(t,u,x)(μtn(du,dx)μt,u(dx)du)𝑑t\displaystyle+\int_{0}^{T}\int_{I}\int_{\mathbb{R}^{d}}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-u,x_{0}-x)\beta(t,u,x)(\mu^{n}_{t}(du,dx)-\mu_{t,u}(dx)du)dt
+0T1ni=1n(HJK)h(t0t,u0in,x0Xin(t))σ(Xin(t))dBin(t)\displaystyle+\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-\frac{i}{n},x_{0}-X^{n}_{i}(t))\sigma(X^{n}_{i}(t))dB_{\frac{i}{n}}(t)
+((HJK)hππ)(t0,u0,x0).\displaystyle+((H\otimes J\otimes K)_{h}\ast\pi-\pi)(t_{0},u_{0},x_{0})\,.

This allows us to write

𝐄|π^hn(t0,u0,x0)π(t0,u0,x0)|24(𝐄|P1|2+𝐄|P2|2+𝐄|P3|2+𝐄|P4|2)\mathbf{E}\left|\hat{\pi}^{n}_{h}(t_{0},u_{0},x_{0})-\pi(t_{0},u_{0},x_{0})\right|^{2}\leqslant 4(\mathbf{E}\left|P^{\prime}_{1}\right|^{2}+\mathbf{E}\left|P^{\prime}_{2}\right|^{2}+\mathbf{E}\left|P^{\prime}_{3}\right|^{2}+\mathbf{E}\left|P^{\prime}_{4}\right|^{2})

with obvious definitions of the four components.

Step 1. To bound P2P^{\prime}_{2}, we observe that

P2\displaystyle P^{\prime}_{2} =[0,T]×I×d(φβ)(t,u,x)(μtn(du,dx)μt,u(dx)du)𝑑t\displaystyle=\int_{[0,T]\times I\times\mathbb{R}^{d}}(\varphi\beta)(t,u,x)(\mu^{n}_{t}(du,dx)-\mu_{t,u}(dx)du)dt
=[0,T]×I×d(φβ)(t,u,x)(μtn(du,dx)μ¯tn(du,dx))𝑑t\displaystyle=\int_{[0,T]\times I\times\mathbb{R}^{d}}(\varphi\beta)(t,u,x)(\mu^{n}_{t}(du,dx)-\bar{\mu}^{n}_{t}(du,dx))dt
+[0,T]×I×d(φβ)(t,u,x)(μ¯tn(du,dx)μt,u(dx)du)𝑑t,\displaystyle\quad+\int_{[0,T]\times I\times\mathbb{R}^{d}}(\varphi\beta)(t,u,x)(\bar{\mu}^{n}_{t}(du,dx)-\mu_{t,u}(dx)du)dt\,,

where μ¯tn(du,dx)=1ni=1nμt,u(dx)δin(du)\bar{\mu}^{n}_{t}(du,dx)=\frac{1}{n}\sum_{i=1}^{n}\mu_{t,u}(dx)\delta_{\frac{i}{n}}(du), and φ=(HJK)h(t0,u0,x0)\varphi=(H\otimes J\otimes K)_{h}(t_{0}-\cdot,u_{0}-\cdot,x_{0}-\cdot). Then the first part becomes

P^{\prime}_{2,1}\stackrel{\scriptscriptstyle\textup{def}}{=}\frac{1}{n}\sum_{i=1}^{n}\int_{0}^{T}\Big((\varphi\beta)(t,\frac{i}{n},X^{n}_{i}(t))-\bar{\mathbf{E}}[(\varphi\beta)(t,\frac{i}{n},X^{n}_{i}(t))]\Big)dt\,,

where 𝐏¯\bar{\mathbf{P}} is the measure under which (Xin)i=1,,n(X^{n}_{i})_{i=1,\dots,n} are independent, with corresponding laws μin\mu_{\frac{i}{n}}, i=1,,ni=1,\dots,n. Since β\beta is bounded, and μ(t,u,x)\mu(t,u,x) is bounded in a small neighborhood of (t0,u0,x0)(t_{0},u_{0},x_{0}), we have

𝐄¯[(φβ)(t,in,Xin(t))]2Hh12(t0t)Jh22(u0in)Kh322,\bar{\mathbf{E}}[(\varphi\beta)(t,\frac{i}{n},X^{n}_{i}(t))]^{2}\lesssim H_{h_{1}}^{2}(t_{0}-t)J_{h_{2}}^{2}(u_{0}-\frac{i}{n})\left\|K_{h_{3}}\right\|_{2}^{2}\,,

where the implicit constant is uniform over i=1,,ni=1,\dots,n. This gives

\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}\bar{\mathbf{E}}[(\varphi\beta)(t,\frac{i}{n},X^{n}_{i}(t))]^{2}dt\lesssim\left\|(H\otimes J\otimes K)_{h}\right\|_{n,2}^{2}\,.

Applying Bernstein’s inequality gives

𝐏(|P2,1|z)\displaystyle\mathbf{P}\big{(}\left|P^{\prime}_{2,1}\right|\geqslant\sqrt{z}\big{)} c1𝐏¯(|P2,1|z)c2\displaystyle\leqslant c_{1}\bar{\mathbf{P}}\big{(}\left|P^{\prime}_{2,1}\right|\geqslant\sqrt{z}\big{)}^{c_{2}}
2dc1exp(c2nzc3(T1(HJK)hn,22+z(HJK)h)),\displaystyle\leqslant 2dc_{1}\exp\left(-\frac{c_{2}nz}{c_{3}(T^{-1}\left\|(H\otimes J\otimes K)_{h}\right\|_{n,2}^{2}+\sqrt{z}\left\|(H\otimes J\otimes K)_{h}\right\|_{\infty})}\right)\,,

for some positive constants c1,c2,c3c_{1},c_{2},c_{3}. Note that Jh2J_{h_{2}} is supported on [u0h2,u0+h2][u_{0}-h_{2},u_{0}+h_{2}], so

(HJK)hn,22\displaystyle\left\|(H\otimes J\otimes K)_{h}\right\|_{n,2}^{2} =0THh12(t0t)1ni=1nJh22(u0in)Kh322dt\displaystyle=\int_{0}^{T}H_{h_{1}}^{2}(t_{0}-t)\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}^{2}(u_{0}-\frac{i}{n})\left\|K_{h_{3}}\right\|_{2}^{2}dt
Hh122h2Jh22Kh322\displaystyle\lesssim\left\|H_{h_{1}}\right\|_{2}^{2}h_{2}\left\|J_{h_{2}}\right\|_{\infty}^{2}\left\|K_{h_{3}}\right\|_{2}^{2}
h11h21h3d.\displaystyle\lesssim h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}\,.

Thus

𝐄|P2,1|2\displaystyle\mathbf{E}\left|P^{\prime}_{2,1}\right|^{2} 2dc10exp(c2nzc3(T1(HJK)hn,22+z(HJK)h))𝑑z\displaystyle\lesssim 2dc_{1}\int_{0}^{\infty}\exp\left(-\frac{c_{2}nz}{c_{3}(T^{-1}\left\|(H\otimes J\otimes K)_{h}\right\|_{n,2}^{2}+\sqrt{z}\left\|(H\otimes J\otimes K)_{h}\right\|_{\infty})}\right)dz
n1h11h21h3d+n2h12h22h32d.\displaystyle\lesssim n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-2}h_{2}^{-2}h_{3}^{-2d}\,.

For the second part, we observe that

P2,2\displaystyle P^{\prime}_{2,2} =def[0,T]×I×d(φβ)(t,u,x)(μ¯tn(du,dx)μt,u(dx)du)𝑑t\displaystyle\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\int_{[0,T]\times I\times\mathbb{R}^{d}}(\varphi\beta)(t,u,x)(\bar{\mu}^{n}_{t}(du,dx)-\mu_{t,u}(dx)du)dt
=0TI𝐄[(φβ)(t,nun,Xnun(t))(φβ)(t,u,Xu(t))]𝑑u𝑑t.\displaystyle=\int_{0}^{T}\int_{I}\mathbf{E}\left[(\varphi\beta)(t,\frac{\lceil nu\rceil}{n},X_{\frac{\lceil nu\rceil}{n}}(t))-(\varphi\beta)(t,u,X_{u}(t))\right]dudt\,.

This is identical to P3P_{3} in the proof of Lemma 3.2, which gives

|P2,2|n2h11h32d(h22+h32).\left|P^{\prime}_{2,2}\right|\lesssim n^{-2}h_{1}^{-1}h_{3}^{-2d}(h_{2}^{-2}+h_{3}^{-2})\,.

Thus

\mathbf{E}\left|P^{\prime}_{2}\right|^{2}\lesssim n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{3}^{-2d}(h_{1}^{-1}h_{2}^{-2}+h_{3}^{-2})\,.

Step 2. To bound P^{\prime}_{1}, let us abuse notation and write \beta:[0,T]\times I\times\mathbb{R}^{d}\times\mathcal{P}(I\times\mathbb{R}^{d})\to\mathbb{R}^{d} for the measure-dependent quantity

\beta(t,u,x,\mu_{t})=\int_{I}\int_{\mathbb{R}^{d}}G(u,v)b(x,y)\mu_{t,v}(dy)dv\,.

Then we may write

Yin(t)β(t,in,Xin(t))\displaystyle Y^{n}_{i}(t)-\beta(t,\frac{i}{n},X^{n}_{i}(t)) =β(t,in,Xin(t),μtn)β(t,in,Xin(t),μt)\displaystyle=\beta(t,\frac{i}{n},X^{n}_{i}(t),\mu^{n}_{t})-\beta(t,\frac{i}{n},X^{n}_{i}(t),\mu_{t})
=IdG(in,u)b(Xin(t),x)(μtn(du,dx)μt,u(dx)du).\displaystyle=\int_{I}\int_{\mathbb{R}^{d}}G(\frac{i}{n},u)b(X^{n}_{i}(t),x)(\mu^{n}_{t}(du,dx)-\mu_{t,u}(dx)du)\,.

Since b and G are bounded and Lipschitz, we may compare with Proposition 19 under Assumption 4(iii) of [28]. This gives a uniform-in-time bound

𝐏(|Yin(t)β(t,in,Xin(t))|z)c1exp(c2nz21+nz),z>0.\mathbf{P}\left(\left|Y^{n}_{i}(t)-\beta(t,\frac{i}{n},X^{n}_{i}(t))\right|\geqslant z\right)\leqslant c_{1}\exp\big{(}-\frac{c_{2}nz^{2}}{1+\sqrt{n}z}\big{)}\,,\qquad z>0\,.

Now, applying Cauchy-Schwarz twice, we obtain

𝐄|P1|2\displaystyle\mathbf{E}\left|P^{\prime}_{1}\right|^{2} [𝐄(0T1ni=1n(HJK)h(t0t,u0in,x0Xin(t))2dt)2]1/2\displaystyle\leqslant\left[\mathbf{E}\left(\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-\frac{i}{n},x_{0}-X^{n}_{i}(t))^{2}dt\right)^{2}\right]^{1/2}
[𝐄(0T1ni=1n|Yin(t)β(t,in,Xin(t))|2dt)2]1/2\displaystyle\quad\cdot\left[\mathbf{E}\left(\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}\left|Y^{n}_{i}(t)-\beta(t,\frac{i}{n},X^{n}_{i}(t))\right|^{2}dt\right)^{2}\right]^{1/2}

Applying Cauchy-Schwarz to the second term again, we see that

[𝐄(0T1ni=1n|Yin(t)β(t,in,Xin(t))|2dt)2]1/2\displaystyle\left[\mathbf{E}\left(\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}\left|Y^{n}_{i}(t)-\beta(t,\frac{i}{n},X^{n}_{i}(t))\right|^{2}dt\right)^{2}\right]^{1/2}
T1/2(0T1ni=1n𝐄|Yin(t)β(t,in,Xin(t))|4dt)1/2\displaystyle\leqslant T^{1/2}\left(\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}\mathbf{E}\left|Y^{n}_{i}(t)-\beta(t,\frac{i}{n},X^{n}_{i}(t))\right|^{4}dt\right)^{1/2}
supt[0,T]supi=1,,n(0𝐏(|Yin(t)β(t,in,Xin(t))|z1/4)𝑑z)1/2\displaystyle\lesssim\sup_{t\in[0,T]}\sup_{i=1,\dots,n}\left(\int_{0}^{\infty}\mathbf{P}\left(\left|Y^{n}_{i}(t)-\beta(t,\frac{i}{n},X^{n}_{i}(t))\right|\geqslant z^{1/4}\right)dz\right)^{1/2}
\displaystyle\lesssim\left(\int_{0}^{\infty}2c_{1}\exp\big(-\frac{c_{2}nz^{1/2}}{1+\sqrt{n}z^{1/4}}\big)dz\right)^{1/2}\lesssim n^{-1}\,.
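
The last integral is of order n^{-2}, hence the n^{-1} after taking the square root: on the region where \sqrt{n}z^{1/4}\geqslant 1 the exponent is at least \frac{c_{2}}{2}\sqrt{n}z^{1/4}, the complementary region has Lebesgue measure n^{-2}, and \int_{0}^{\infty}e^{-\lambda z^{1/4}}dz=24\lambda^{-4}. A symbolic sketch of this last elementary integral (illustration only):

import sympy as sp

z, lam, n = sp.symbols('z lambda n', positive=True)
I = sp.integrate(sp.exp(-lam*z**sp.Rational(1, 4)), (z, 0, sp.oo))
print(sp.simplify(I))                         # 24/lambda**4
print(sp.simplify(I.subs(lam, sp.sqrt(n))))   # 24/n**2, so the square root is of order n^{-1}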

For the first term, Minkowski’s inequality gives an upper bound

[0,T]×I×dφ(t,u,x)2μt,u(dx)𝑑u𝑑t\displaystyle\int_{[0,T]\times I\times\mathbb{R}^{d}}\varphi(t,u,x)^{2}\mu_{t,u}(dx)dudt
(6.3) +[𝐄([0,T]×I×dφ(t,u,x)2(μtn(du,dx)μt,u(dx)du)𝑑t)2]1/2,\displaystyle+\left[\mathbf{E}\left(\int_{[0,T]\times I\times\mathbb{R}^{d}}\varphi(t,u,x)^{2}(\mu^{n}_{t}(du,dx)-\mu_{t,u}(dx)du)dt\right)^{2}\right]^{1/2}\,,

where φ=(HJK)h(t0,u0,x0)\varphi=(H\otimes J\otimes K)_{h}(t_{0}-\cdot,u_{0}-\cdot,x_{0}-\cdot). With the same strategy applied to bound P2,1P^{\prime}_{2,1}, we get

𝐄[[0,T]×I×dφ(t,u,x)2(μtn(du,dx)μ¯tn(du,dx))𝑑t]2\displaystyle\mathbf{E}\left[\int_{[0,T]\times I\times\mathbb{R}^{d}}\varphi(t,u,x)^{2}(\mu^{n}_{t}(du,dx)-\bar{\mu}^{n}_{t}(du,dx))dt\right]^{2}
n1h13h23h33d+n2h14h24h34d,\displaystyle\qquad\qquad\lesssim n^{-1}h_{1}^{-3}h_{2}^{-3}h_{3}^{-3d}+n^{-2}h_{1}^{-4}h_{2}^{-4}h_{3}^{-4d}\,,

which follows from the fact that

φn,44h13h23h33d.\left\|\varphi\right\|_{n,4}^{4}\lesssim h_{1}^{-3}h_{2}^{-3}h_{3}^{-3d}\,.

On the other hand, using the idea of P2,2P^{\prime}_{2,2}, we get

\displaystyle\mathbf{E}\left[\int_{[0,T]\times I\times\mathbb{R}^{d}}\varphi(t,u,x)^{2}(\bar{\mu}^{n}_{t}(du,dx)-\mu_{t,u}(dx)du)dt\right]
n1h11h22h3d+n1h11h22h312d.\displaystyle\qquad\qquad\lesssim n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-d}+n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-1-2d}\,.

Those lead to the following asymptotic upper bound for the first term,

h11h21h3d+n1/2h13/2h23/2h33d/2+n1h12h22h32d\displaystyle h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-1/2}h_{1}^{-3/2}h_{2}^{-3/2}h_{3}^{-3d/2}+n^{-1}h_{1}^{-2}h_{2}^{-2}h_{3}^{-2d}
\displaystyle+n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-d}+n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-1-2d}\,.

Joining the above leads to

𝐄|P1|2n1h11h21h3d+n2h11h22h312d,\mathbf{E}\left|P^{\prime}_{1}\right|^{2}\lesssim n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-2}h_{3}^{-1-2d}\,,

where we assume n1h11h21h3d1n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}\lesssim 1.

Step 3. Now, to bound P3P^{\prime}_{3}, we apply Itô’s isometry to see that

𝐄|P3|2\displaystyle\mathbf{E}\left|P^{\prime}_{3}\right|^{2} σ+2d2n𝐄[0T1ni=1nφ(t,in,Xin(t))2dt]\displaystyle\leqslant\frac{\sigma_{+}^{2}d^{2}}{n}\mathbf{E}\left[\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}\varphi(t,\frac{i}{n},X^{n}_{i}(t))^{2}dt\right]
=σ+2d2n𝐄[[0,T]×I×dφ(t,u,x)2μtn(du,dx)𝑑t].\displaystyle=\frac{\sigma_{+}^{2}d^{2}}{n}\mathbf{E}\left[\int_{[0,T]\times I\times\mathbb{R}^{d}}\varphi(t,u,x)^{2}\mu^{n}_{t}(du,dx)dt\right]\,.

Note that

𝐄[[0,T]×I×dφ(t,u,x)2μt,u(dx)𝑑u𝑑t]φ22.\mathbf{E}\left[\int_{[0,T]\times I\times\mathbb{R}^{d}}\varphi(t,u,x)^{2}\mu_{t,u}(dx)dudt\right]\lesssim\left\|\varphi\right\|_{2}^{2}\,.

We write again

μtn(du,dx)μt,u(dx)du=(μtn(du,dx)μ¯tn(du,dx))+(μ¯tn(du,dx)μt,u(dx)du).\mu^{n}_{t}(du,dx)-\mu_{t,u}(dx)du=(\mu^{n}_{t}(du,dx)-\bar{\mu}^{n}_{t}(du,dx))+(\bar{\mu}^{n}_{t}(du,dx)-\mu_{t,u}(dx)du)\,.

Using the bound for (6.3) in Step 2 leads to

𝐄|P3|2n1h11h21h3d+n2h11h22h312d,\displaystyle\mathbf{E}\left|P^{\prime}_{3}\right|^{2}\lesssim n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-2}h_{3}^{-1-2d}\,,

whenever n1h11h21h3d1n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}\lesssim 1.

Summarizing the above, we conclude that

\mathbf{E}\left|\hat{\pi}^{n}_{h}-\pi\right|^{2}\lesssim n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-2}h_{3}^{-1-2d}+n^{-2}h_{1}^{-1}h_{3}^{-2-2d}+\left|\varphi\ast\pi-\pi\right|^{2}\,. ∎

Finally, analogous to the analysis of μ\mu, we are able to show the optimality of β^\hat{\beta}.

Proof of Theorem 4.2.

The strategy is identical to that used in the proof of Theorem 4.1.

Step 1: Upper bound. Fix t0(0,T)t_{0}\in(0,T), u0Iu_{0}\in I, x0dx_{0}\in\mathbb{R}^{d}, and κ2>0\kappa_{2}>0. We assume that n1h11h21h3d1n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}\lesssim 1.

Recall that

|β^h,κβ|2κ22(|π^hnπ|2+b2|μ^hnμ|2)\left|\hat{\beta}_{h,\kappa}-\beta\right|^{2}\lesssim\kappa_{2}^{-2}(\left|\hat{\pi}^{n}_{h}-\pi\right|^{2}+\left\|b\right\|_{\infty}^{2}\left|\hat{\mu}^{n}_{h}-\mu\right|^{2})

whenever κ2<infsupp(HJK)hμ\kappa_{2}<\inf_{\operatorname{supp}(H\otimes J\otimes K)_{h}}\mu. From Lemmata 4.1 and 4.2, we have

𝐄|μ^hnμ|2\displaystyle\mathbf{E}\left|\hat{\mu}^{n}_{h}-\mu\right|^{2}\lesssim
n1h21h3d+n2h22h32d+n2h322d+n2h22h3d\displaystyle\qquad n^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{2}^{-2}h_{3}^{-2d}+n^{-2}h_{3}^{-2-2d}+n^{-2}h_{2}^{-2}h_{3}^{-d}
+|(JK)hμμ|2,\displaystyle\qquad+\left|(J\otimes K)_{h}\ast\mu-\mu\right|^{2}\,,

and

𝐄|π^hnπ|2\displaystyle\mathbf{E}\left|\hat{\pi}^{n}_{h}-\pi\right|^{2}\lesssim
n1h11h21h3d+n2h11h22h312d+n2h11h322d\displaystyle\qquad n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-2}h_{3}^{-1-2d}+n^{-2}h_{1}^{-1}h_{3}^{-2-2d}
+|(HJK)hππ|2.\displaystyle\qquad+\left|(H\otimes J\otimes K)_{h}\ast\pi-\pi\right|^{2}\,.

In the proof of Theorem 4.1 we saw that

|(JK)hμμ|2h22+h32s3.\left|(J\otimes K)_{h}\ast\mu-\mu\right|^{2}\lesssim h_{2}^{2}+h_{3}^{2s_{3}}\,.

We may obtain a similar upper bound for |(HJK)hππ|2\left|(H\otimes J\otimes K)_{h}\ast\pi-\pi\right|^{2}.

Recall that π=μβ\pi=\mu\beta, where

β(t,u,x)=IG(u,v)db(t,x,y)μt,v(dy)𝑑v=IG(u,v)𝐄[b(t,x,Xv(t))]𝑑v.\beta(t,u,x)=\int_{I}G(u,v)\int_{\mathbb{R}^{d}}b(t,x,y)\mu_{t,v}(dy)dv=\int_{I}G(u,v)\mathbf{E}[b(t,x,X_{v}(t))]dv\,.

The boundedness of bb ensures the Lipschitz continuity of β\beta in variable uu, while the Lipschitz continuity of bb leads to local Hölder s3s_{3}-continuity of β\beta in variable xx for s3[0,1]s_{3}\in[0,1]. Moreover, for 0<t1<t2<T0<t_{1}<t_{2}<T, by Itô’s formula we have

β(t2,u,x)β(t1,u,x)\displaystyle\beta(t_{2},u,x)-\beta(t_{1},u,x)
=IG(u,v)𝐄(b(t2,x,Xv(t2))b(t1,x,Xv(t1)))𝑑v\displaystyle=\int_{I}G(u,v)\mathbf{E}(b(t_{2},x,X_{v}(t_{2}))-b(t_{1},x,X_{v}(t_{1})))dv
=IG(u,v)𝐄[t1t2(tbt(x)+yb(x)βt,v+12y2b(x)tr(σσT))(Xv(t))𝑑t]𝑑v.\displaystyle=\int_{I}G(u,v)\mathbf{E}\left[\int_{t_{1}}^{t_{2}}(\partial_{t}b_{t}(x)+\partial_{y}b(x)\beta_{t,v}+\frac{1}{2}\partial_{y}^{2}b(x)\operatorname{tr}(\sigma\sigma^{T}))(X_{v}(t))dt\right]dv\,.

Given the uniform boundedness of bb, tb\partial_{t}b, yb\partial_{y}b, and y2b\partial_{y}^{2}b, we have a uniform bound

|β(t2,u,x)β(t1,u,x)||t2t1|,\left|\beta(t_{2},u,x)-\beta(t_{1},u,x)\right|\lesssim\left|t_{2}-t_{1}\right|\,,

and this is further bounded by O(|t2t1|s1)O(\left|t_{2}-t_{1}\right|^{s_{1}}) for s1[0,1]s_{1}\in[0,1] whenever t2t1<1t_{2}-t_{1}<1. Then we have βs1,s3(t0,x0)\beta\in\mathcal{H}^{s_{1},s_{3}}(t_{0},x_{0}) as well, and so does π=μβ\pi=\mu\beta. Thus

|(HJK)hππ|2h12s1+h22+h32s3.\left|(H\otimes J\otimes K)_{h}\ast\pi-\pi\right|^{2}\lesssim h_{1}^{2s_{1}}+h_{2}^{2}+h_{3}^{2s_{3}}\,.

Joining the above leads to the bound

𝐄|β^h,κnβ|2n1h11h21h3d+n2h11h22h312d+n2h11h322d+h12s1+h22+h32s3.\mathbf{E}\left|\hat{\beta}^{n}_{h,\kappa}-\beta\right|^{2}\lesssim n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-2}h_{3}^{-1-2d}+n^{-2}h_{1}^{-1}h_{3}^{-2-2d}+h_{1}^{2s_{1}}+h_{2}^{2}+h_{3}^{2s_{3}}\,.

Choosing h1=nsbs1(2sb+1)h_{1}=n^{-\frac{s_{b}}{s_{1}(2s_{b}+1)}}, h2=nsb2sb+1h_{2}=n^{-\frac{s_{b}}{2s_{b}+1}}, and h3=nsbs3(2sb+1)h_{3}=n^{-\frac{s_{b}}{s_{3}(2s_{b}+1)}}, we get

𝐄|β^h,κn(t0,u0,x0)β(t0,u0,x0)|2n2sb2sb+1.\mathbf{E}\left|\hat{\beta}^{n}_{h,\kappa}(t_{0},u_{0},x_{0})-\beta(t_{0},u_{0},x_{0})\right|^{2}\lesssim n^{-\frac{2s_{b}}{2s_{b}+1}}\,.

Note that the implicit constant depends only on T,d,bT,d,\left\|b\right\|_{\infty}, the L2L^{2} and LL^{\infty} norms of (HJK)(H\otimes J\otimes K), and the values of μ\mu in a small neighborhood of (t0,u0,x0)(t_{0},u_{0},x_{0}). That concludes (4.5).
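
As in Theorem 4.1, the bandwidth choice can be checked by exponent arithmetic: the three bias terms h_{1}^{2s_{1}}, h_{2}^{2}, h_{3}^{2s_{3}} are all of order n^{-\frac{2s_{b}}{2s_{b}+1}} by construction, and requiring the leading variance term n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d} to be of the same order determines a value of s_{b}. The sympy sketch below performs this consistency check; we solve the balance equation only as an illustration and do not restate the definition of s_{b} from Section 4.

import sympy as sp

s1, s3, d, sb = sp.symbols('s1 s3 d s_b', positive=True)
e1, e2, e3 = -sb/(s1*(2*sb + 1)), -sb/(2*sb + 1), -sb/(s3*(2*sb + 1))   # h_i = n^{e_i}
target = -2*sb/(2*sb + 1)

# the three bias exponents all coincide with the target rate
print([sp.simplify(2*s1*e1 - target), sp.simplify(2*e2 - target), sp.simplify(2*s3*e3 - target)])

# balancing the variance exponent against the target pins down s_b
print(sp.solve(sp.Eq(-1 - e1 - e2 - d*e3, target), sb))   # [s1*s3/(d*s1 + s1*s3 + s3)]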

Step 2: Lower bound. We construct examples and apply the two-point comparison lemma in a similar way as in the proof of Theorem 4.1.

Let L>0. We also consider models with no interactions, so that

β(t,u,x)=IG(u)db(t,x)μt,v(dy)𝑑v=G(u)b(t,x).\beta(t,u,x)=\int_{I}G(u)\int_{\mathbb{R}^{d}}b(t,x)\mu_{t,v}(dy)dv=G(u)b(t,x)\,.

Pick b1(t,x)b_{1}(t,x) and G1(u)G_{1}(u) so that (b1,Id×d,G1,μ0)𝒜ˇL/2s1,s3(t0,x0)(b_{1},I_{d\times d},G_{1},\mu_{0})\in\check{\mathcal{A}}^{s_{1},s_{3}}_{L/2}(t_{0},x_{0}). Note that there is no interaction, so the particles (X1n,,Xnn)(X^{n}_{1},\dots,X^{n}_{n}) are independent, with joint law denoted by ν1\nu_{1}.

Choose some ψCc(××d)\psi\in C_{c}^{\infty}(\mathbb{R}\times\mathbb{R}\times\mathbb{R}^{d}) such that

  • \psi(0,0,0)=1 and \left\|\psi\right\|_{\infty}=1,

  • ψ𝑑t𝑑u𝑑x=0\int\psi dtdudx=0, ψ2=1\left\|\psi\right\|_{2}=1,

  • \sup_{u\in\mathbb{R}}\left\|\psi(\cdot,u,\cdot)\right\|_{\mathcal{H}^{s_{1},s_{3}}(t_{0},x_{0})}<\infty.

For n1n\geqslant 1 and some small enough α(0,1)\alpha\in(0,1), pick b2n(t,x)b_{2}^{n}(t,x) and G2n(u)G_{2}^{n}(u) so that

G2n(u)b2n(t,x)=G1(u)b1(t,x)+αn12(τn1)12(τn2)12(τn3)d2ψ(τn1(tt0),τn2(uu0),τn3(xx0)),G_{2}^{n}(u)b_{2}^{n}(t,x)=G_{1}(u)b_{1}(t,x)+\alpha n^{-\frac{1}{2}}(\tau^{1}_{n})^{\frac{1}{2}}(\tau^{2}_{n})^{\frac{1}{2}}(\tau^{3}_{n})^{\frac{d}{2}}\psi(\tau^{1}_{n}(t-t_{0}),\tau^{2}_{n}(u-u_{0}),\tau^{3}_{n}(x-x_{0}))\,,

where (τni)n1(\tau^{i}_{n})_{n\geqslant 1}, i=1,2,3i=1,2,3, are sequences of scalars such that

(τn1)s1=(τn2)s2=(τn3)s3d=nsb2sb+1.(\tau^{1}_{n})^{s_{1}}=(\tau^{2}_{n})^{s_{2}}=(\tau^{3}_{n})^{\frac{s_{3}}{d}}=n^{\frac{s_{b}}{2s_{b}+1}}\,.

With proper choice of parameters (b1,G1,μ0)(b_{1},G_{1},\mu_{0}), small enough α\alpha and large enough nn, the local Hölder continuity of the density follows from classical estimates given in [9]. This means we may assume (b2n,G2n,Id×d,μ0)𝒜ˇLs1,s3(t0,x0)(b_{2}^{n},G_{2}^{n},I_{d\times d},\mu_{0})\in\check{\mathcal{A}}^{s_{1},s_{3}}_{L}(t_{0},x_{0}).

Denote by ν2n\nu_{2}^{n} the joint law of the particles (X1n,,Xnn)(X^{n}_{1},\dots,X^{n}_{n}) derived from the parameters (b2n,G2n,Id×d,μ0)(b_{2}^{n},G_{2}^{n},I_{d\times d},\mu_{0}). Following the idea of Lemma 28 in [28], we see that

(6.4) ν1ν2nTV2140Ti=1n𝐄|G2n(in)b2n(t,Xin(t))G1(in)b1(t,Xin(t))|2dt.\left\|\nu_{1}-\nu_{2}^{n}\right\|_{TV}^{2}\leqslant\frac{1}{4}\int_{0}^{T}\sum_{i=1}^{n}\mathbf{E}\left|G_{2}^{n}(\frac{i}{n})b_{2}^{n}(t,X^{n}_{i}(t))-G_{1}(\frac{i}{n})b_{1}(t,X^{n}_{i}(t))\right|^{2}dt\,.

Given the compact support of ψ\psi and local boundedness of μt,u\mu_{t,u}, (6.4) is further bounded by

α240Tτ1ni=1nτ2n(τ3n)dψ(τ1n(tt0),τ2n(inu0),τ3n(x0))22dtα24<14\displaystyle\frac{\alpha^{2}}{4}\int_{0}^{T}\tau^{1}_{n}\sum_{i=1}^{n}\tau^{2}_{n}(\tau^{3}_{n})^{d}\left\|\psi(\tau^{1}_{n}(t-t_{0}),\tau^{2}_{n}(\frac{i}{n}-u_{0}),\tau^{3}_{n}(\cdot-x_{0}))\right\|_{2}^{2}dt\lesssim\frac{\alpha^{2}}{4}<\frac{1}{4}

when α\alpha is sufficiently small.

On the other hand,

|G2n(u0)b2n(t0,x0)G1(u0)b1(t0,x0)|=αn12(τ1n)12(τ2n)12(τ3n)d2nsb2sb+1.\left|G_{2}^{n}(u_{0})b_{2}^{n}(t_{0},x_{0})-G_{1}(u_{0})b_{1}(t_{0},x_{0})\right|=\alpha n^{-\frac{1}{2}}(\tau^{1}_{n})^{\frac{1}{2}}(\tau^{2}_{n})^{\frac{1}{2}}(\tau^{3}_{n})^{\frac{d}{2}}\gtrsim n^{-\frac{s_{b}}{2s_{b}+1}}\,.

Applying Le Cam’s two-point comparison method gives the lower bound

infβ^sup(b,σ,G,μ0)𝒜ˇs1,s3L(t0,x0)𝐄|β^β(t0,u0,x0)|\displaystyle\inf_{\hat{\beta}}\sup_{(b,\sigma,G,\mu_{0})\in\check{\mathcal{A}}^{s_{1},s_{3}}_{L}(t_{0},x_{0})}\mathbf{E}\left|\hat{\beta}-\beta(t_{0},u_{0},x_{0})\right|
\displaystyle\geqslant infβ^maxβ~{β1,β2n}𝐄|β^β~(t0,u0,x0)|\displaystyle\inf_{\hat{\beta}}\max_{\tilde{\beta}\in\{\beta_{1},\beta_{2}^{n}\}}\mathbf{E}\left|\hat{\beta}-\tilde{\beta}(t_{0},u_{0},x_{0})\right|
\displaystyle\geqslant 12|β1(t0,u0,x0)β2n(t0,u0,x0)|(1ν1ν2nTV)\displaystyle\frac{1}{2}\left|\beta_{1}(t_{0},u_{0},x_{0})-\beta_{2}^{n}(t_{0},u_{0},x_{0})\right|(1-\left\|\nu_{1}-\nu_{2}^{n}\right\|_{TV})
\displaystyle\geqslant 14|G1(u0)b1(t0,x0)G2n(u0)b2n(t0,x0)|\displaystyle\frac{1}{4}\left|G_{1}(u_{0})b_{1}(t_{0},x_{0})-G_{2}^{n}(u_{0})b_{2}^{n}(t_{0},x_{0})\right|
\displaystyle\gtrsim nsb2sb+1.\displaystyle n^{-\frac{s_{b}}{2s_{b}+1}}\,.

Acknowledgement

We express our sincere gratitude to Marc Hoffmann for helpful discussions.

Appendix A Intuitions of Estimators

Recall that

β(t,u,x)=Idb(x,y)G(u,v)μt,v(dy)dv.\beta(t,u,x)=\int_{I}\int_{\mathbb{R}^{d}}b(x,y)G(u,v)\mu_{t,v}(dy)dv\,.

With Condition 2.1(2) and 2.2(2), we may further expand it as

\beta(t,u,x)=V(x)\int_{I}g(u-v)dv+(g\otimes F)\ast\mu_{t}(u,x)\,,

where the convolution here is done on the space ×d\mathbb{R}\times\mathbb{R}^{d}.

The first term is independent of time t, so we have \partial_{t}\beta=(g\otimes F)\ast\partial_{t}\mu. Note that \partial_{t} is a linear operator, and we may approximate it by some finite difference operator

Dhf(t0)=deff(t0+h)f(t0)h=0Tf(t)δt0+h(dt)δt0(dt)h.D_{h}f(t_{0})\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\frac{f(t_{0}+h)-f(t_{0})}{h}=\int_{0}^{T}f(t)\frac{\delta_{t_{0}+h}(dt)-\delta_{t_{0}}(dt)}{h}\,.

We consider a linear operator \mathcal{L}_{\phi}, built from a bounded function \phi, that approximates the differential operator \partial_{t}, so that

ϕβ=(gF)ϕμ.\mathcal{L}_{\phi}\beta=(g\otimes F)\ast\mathcal{L}_{\phi}\mu\,.
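
For concreteness, one admissible bounded choice of \phi is a difference of two normalized indicator bumps around t_{0}, so that \mathcal{L}_{\phi}f is a difference of local time averages and converges to \partial_{t}f(t_{0}) as the window shrinks. The sketch below illustrates this particular choice on gridded data; it is only one possible \phi, not the one fixed in Section 2.

import numpy as np

def L_phi(f_vals, t_grid, t0, h):
    # Bounded-phi approximation of the time derivative at t0, with
    # phi_h = (1_{[t0, t0+h]} - 1_{[t0-h, t0]}) / h^2, so that
    # int f(t) phi_h(t) dt = (mean of f on [t0, t0+h] - mean of f on [t0-h, t0]) / h.
    dt = t_grid[1] - t_grid[0]
    phi = (((t_grid >= t0) & (t_grid <= t0 + h)).astype(float)
           - ((t_grid >= t0 - h) & (t_grid <= t0)).astype(float)) / h**2
    return np.sum(f_vals * phi) * dt

# toy check: f(t) = sin(t); the output approximates cos(1.0) ~ 0.5403
t_grid = np.linspace(0.0, 2.0, 20001)
print(L_phi(np.sin(t_grid), t_grid, t0=1.0, h=0.01))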

Then the deconvolution method gives

Idϕβ=(Ig)(dF)(Idϕμ),\mathcal{F}_{I}\mathcal{F}_{\mathbb{R}^{d}}\mathcal{L}_{\phi}\beta=(\mathcal{F}_{I}g)(\mathcal{F}_{\mathbb{R}^{d}}F)(\mathcal{F}_{I}\mathcal{F}_{\mathbb{R}^{d}}\mathcal{L}_{\phi}\mu)\,,

i.e. 𝒯β=(g)(F)𝒯μ\mathcal{T}\beta=(\mathcal{F}g)(\mathcal{F}F)\mathcal{T}\mu. Thus it leads to the formula

G(u0,v0)=g(u0v0)F2g0F2=I1(𝒯β𝒯μ)(u=u0v0)2I1(𝒯β𝒯μ)(u=0)2G(u_{0},v_{0})=\frac{g(u_{0}-v_{0})\left\|F\right\|_{2}}{g_{0}\left\|F\right\|_{2}}=\frac{\left\|\mathcal{F}_{I}^{-1}\big{(}\frac{\mathcal{T}\beta}{\mathcal{T}\mu}\big{)}(u=u_{0}-v_{0})\right\|_{2}}{\left\|\mathcal{F}_{I}^{-1}\big{(}\frac{\mathcal{T}\beta}{\mathcal{T}\mu}\big{)}(u=0)\right\|_{2}}

whenever well-defined.

Once we have an estimate of μ\mu and an estimate of β\beta, we may plug them into this formula. With some additional cutoff factors (to avoid the denominators being too small), we produce the estimator (2.6).
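
For intuition, the deconvolution step can be mimicked numerically on a grid: transform gridded surrogates of \mathcal{L}_{\phi}\beta and \mathcal{L}_{\phi}\mu in both variables, form the ratio, invert only the u-transform, and compare L^{2} norms in \xi. The sketch below (in d=1) uses a crude truncation level eps of our own choosing; it illustrates the formula above and is not the estimator (2.6) itself.

import numpy as np

def G_ratio(Lphi_beta, Lphi_mu, shift, eps=1e-3):
    # Lphi_beta, Lphi_mu: arrays of shape (Nu, Nx) holding gridded surrogates of
    # L_phi beta and L_phi mu on a periodic (u, x) grid. 'shift' is the grid index
    # corresponding to the lag u0 - v0; eps is an illustrative cutoff. Constant
    # scaling factors of the discrete transforms cancel in the ratio.
    T_beta = np.fft.fft2(Lphi_beta)                       # transform in u and x
    T_mu = np.fft.fft2(Lphi_mu)
    safe = np.where(np.abs(T_mu) > eps, T_mu, 1.0)        # avoid division by tiny values
    ratio = np.where(np.abs(T_mu) > eps, T_beta / safe, 0.0)
    back = np.fft.ifft(ratio, axis=0)                     # invert the u-transform only
    return np.linalg.norm(back[shift, :]) / np.linalg.norm(back[0, :])

In practice one would evaluate the kernel estimates \hat{\mu} and \hat{\beta} on such a grid and plug them in, with the cutoff playing the role of the truncation factors mentioned above.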

Appendix B Proofs of Technical Lemmas

Proof of Proposition 2.1.

Recall that for each uIu\in I, the dynamic of type-uu particles is given by dXu(t)=β(t,u,Xu(t))dt+σ(Xu(t))dBu(t)dX_{u}(t)=\beta(t,u,X_{u}(t))dt+\sigma(X_{u}(t))dB_{u}(t), where βt,u=defβ(t,u,)\beta_{t,u}\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\beta(t,u,\cdot) is Lipschitz on d\mathbb{R}^{d} (this is not hard to verify). So μt,u\mu_{t,u} uniquely solves the Fokker-Planck equation

tμt,u=(βt,uμt,u)+12i,j=1dxixj((σσT)ijμt,u).\partial_{t}\mu_{t,u}=-\nabla\cdot(\beta_{t,u}\mu_{t,u})+\frac{1}{2}\sum_{i,j=1}^{d}\partial_{x_{i}x_{j}}((\sigma\sigma^{T})_{ij}\mu_{t,u})\,.

Theorem 7.3.3 in [9] gives a local upper bound

μt,uL(U)Cμ0,uL(U)+Ct(pd2)/2(1+βLp(μt,u)p)\left\|\mu_{t,u}\right\|_{L^{\infty}(U)}\leqslant C\left\|\mu_{0,u}\right\|_{L^{\infty}(U)}+Ct^{(p-d-2)/2}(1+\left\|\beta\right\|_{L^{p}(\mu_{t,u})}^{p})

for all p>d+2p>d+2 and any bounded open set UU, whenever the LL^{\infty}-norm is well-defined.

We find that for any pp,

d|β(t,u,x)|pμ(t,u,dx)\displaystyle\int_{\mathbb{R}^{d}}\left|\beta(t,u,x)\right|^{p}\mu(t,u,dx)
=d|Idb(x,y)G(u,v)μ(t,v,dy)dv|pμ(t,u,dx)\displaystyle=\int_{\mathbb{R}^{d}}\left|\int_{I}\int_{\mathbb{R}^{d}}b(x,y)G(u,v)\mu(t,v,dy)dv\right|^{p}\mu(t,u,dx)
Id|db(x,y)μ(t,v,dy)|pμ(t,u,dx)dv\displaystyle\lesssim\int_{I}\int_{\mathbb{R}^{d}}\left|\int_{\mathbb{R}^{d}}b(x,y)\mu(t,v,dy)\right|^{p}\mu(t,u,dx)dv
bp<.\displaystyle\leqslant\left\|b\right\|_{\infty}^{p}<\infty\,.

From Condition 2.3(1) we know that there exists some R>0 such that \mu(0,u,x) is uniformly bounded by some constant M outside the ball B(0,R) for all u\in I. We cover \overline{B(0,R)}^{c} with open balls, so that

μ(t,u,x)CM+Ct(pd2)/2(1+bp)\mu(t,u,x)\leqslant CM+Ct^{(p-d-2)/2}(1+\left\|b\right\|_{\infty}^{p})

holds for every t[0,T]t\in[0,T], uIu\in I, |x|>R\left|x\right|>R.

Denote the above upper bound by C0C_{0}. Then we have for every t[0,T]t\in[0,T] and uIu\in I that

|x|>Rμ(t,u,x)2dxC0|x|>Rμ(t,u,x)dx=C0𝐏(|Xu(t)|>R)C0𝐄|Xu(t)|2R2.\int_{\left|x\right|>R}\mu(t,u,x)^{2}dx\leqslant C_{0}\int_{\left|x\right|>R}\mu(t,u,x)dx=C_{0}\mathbf{P}(\left|X_{u}(t)\right|>R)\leqslant\frac{C_{0}\mathbf{E}\left|X_{u}(t)\right|^{2}}{R^{2}}\,.

A classical estimate shows that \sup_{t\in[0,T],u\in I}\mathbf{E}\left|X_{u}(t)\right|^{2}<\infty, so the above quantity tends to 0 as R\to\infty. ∎

Proof of Lemma 5.1.

From the proof of Theorem 3.2 of [3] we see that there exists some constant C>0C>0 such that

𝐄|Yni(t)Yin(t)|26Cmax1jn𝐄|Xnj(t)Xjn(t)|2+6Cn2\mathbf{E}\left|Y^{n}_{i}(t)-Y_{\frac{i}{n}}(t)\right|^{2}\leqslant 6C\max_{1\leqslant j\leqslant n}\mathbf{E}\left|X^{n}_{j}(t)-X_{\frac{j}{n}}(t)\right|^{2}+\frac{6C}{n^{2}}

for all i=1,,ni=1,\dots,n and t[0,T]t\in[0,T], which gives the first inequality. The second inequality follows immediately from dominated convergence. ∎

Proof of Lemma 6.2.

We first recall that \mu satisfies the Fokker-Planck equation in the following sense:

tμt,u+(βt,uμt,u)=12i,j=1dij2((σσT)ijμt,u).\partial_{t}\mu_{t,u}+\nabla\cdot(\beta_{t,u}\mu_{t,u})=\frac{1}{2}\sum_{i,j=1}^{d}\partial_{ij}^{2}((\sigma\sigma^{T})_{ij}\mu_{t,u})\,.

For simplicity we work under the (additional) assumption that σ=σ0Id×d\sigma=\sigma_{0}I_{d\times d} for some constant σ0>0\sigma_{0}>0.

Given distinct u,vIu,v\in I, we have

t(μt,uμt,v)\displaystyle\partial_{t}(\mu_{t,u}-\mu_{t,v})
=σ022Δ(μt,uμt,v)(βt,u(μt,uμt,v))(μt,v(βt,uβt,v)).\displaystyle\qquad=\frac{\sigma_{0}^{2}}{2}\Delta(\mu_{t,u}-\mu_{t,v})-\nabla\cdot(\beta_{t,u}(\mu_{t,u}-\mu_{t,v}))-\nabla\cdot(\mu_{t,v}(\beta_{t,u}-\beta_{t,v}))\,.

Notice that,

βt,u(x)=Idb(x,y)G(u,u)μt,u(dy)du,\beta_{t,u}(x)=\int_{I}\int_{\mathbb{R}^{d}}b(x,y)G(u,u^{\prime})\mu_{t,u^{\prime}}(dy)du^{\prime}\,,

so there exists some constant c_{1}>0 such that

|βt,u(x)βt,v(x)|\displaystyle\left|\beta_{t,u}(x)-\beta_{t,v}(x)\right| Id|b(x,y)(G(u,u)G(v,u))|μt,u(dy)du\displaystyle\leqslant\int_{I}\int_{\mathbb{R}^{d}}\left|b(x,y)(G(u,u^{\prime})-G(v,u^{\prime}))\right|\mu_{t,u^{\prime}}(dy)du^{\prime}
c1bId|uv|μt,u(dy)du\displaystyle\leqslant c_{1}\left\|b\right\|_{\infty}\int_{I}\int_{\mathbb{R}^{d}}\left|u-v\right|\mu_{t,u^{\prime}}(dy)du^{\prime}
c1b|uv|\displaystyle\leqslant c_{1}\left\|b\right\|_{\infty}\left|u-v\right|

and similarly, |βt,u(x)βt,v(x)|c1xb|uv|\left|\nabla\cdot\beta_{t,u}(x)-\nabla\cdot\beta_{t,v}(x)\right|\leqslant c_{1}\left\|\nabla_{x}\cdot b\right\|_{\infty}\left|u-v\right|, for any u,vIu,v\in I. Then we have

t(μt,uμt,v)\displaystyle\partial_{t}(\mu_{t,u}-\mu_{t,v})\leqslant
σ022Δ(μt,uμt,v)βt,u(μt,uμt,v)βt,u(μt,uμt,v)+C|uv|,\displaystyle\qquad\frac{\sigma_{0}^{2}}{2}\Delta(\mu_{t,u}-\mu_{t,v})-\nabla\cdot\beta_{t,u}(\mu_{t,u}-\mu_{t,v})-\beta_{t,u}\cdot\nabla(\mu_{t,u}-\mu_{t,v})+C\left|u-v\right|\,,

where the constant CC depends only on the Lipschitz coefficients of bb and GG.

We compare the above inequality with the following differential equation

tφt=σ022Δφtβt,uφtβt,uφt,\partial_{t}\varphi_{t}=\frac{\sigma_{0}^{2}}{2}\Delta\varphi_{t}-\nabla\cdot\beta_{t,u}\varphi_{t}-\beta_{t,u}\cdot\nabla\varphi_{t}\,,

with initial condition \varphi_{0}=\mu_{0,u}-\mu_{0,v}. This is a linear homogeneous parabolic equation. Thanks to the maximum principle, we have

μt,uμt,vφt+Ct|uv|.\mu_{t,u}-\mu_{t,v}\leqslant\varphi_{t}+Ct\left|u-v\right|\,.

It remains to bound φt\varphi_{t}.

Consider a time reversal of φ\varphi, namely ψ(t,x)=defφ(Tt,x)\psi(t,x)\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\varphi(T-t,x). It satisfies the following equation

tψt+(βTt,u)ψt+βTt,uψt+σ022Δψt=0,\partial_{t}\psi_{t}+(\nabla\cdot\beta_{T-t,u})\psi_{t}+\beta_{T-t,u}\cdot\nabla\psi_{t}+\frac{\sigma_{0}^{2}}{2}\Delta\psi_{t}=0\,,

with terminal condition ψT=φ0\psi_{T}=\varphi_{0}.

Note that \nabla\cdot\beta_{t,u}=\int_{I}\int_{\mathbb{R}^{d}}\nabla_{x}\cdot b(x,y)G(u,w^{\prime})\mu_{t,w^{\prime}}(dy)dw^{\prime} is bounded. Then the Feynman-Kac formula reads

ψ(t,x)\displaystyle\psi(t,x) =𝐄~Zt=x[exp(tTβTs,u(Zs)ds)ψT(ZT)]\displaystyle=\tilde{\mathbf{E}}^{Z_{t}=x}\left[\exp\left(\int_{t}^{T}\nabla\cdot\beta_{T-s,u}(Z_{s})ds\right)\psi_{T}(Z_{T})\right]
=𝐄~Zt=x[exp(tTβs,u(ZTs)ds)φ0(ZT)],\displaystyle=\tilde{\mathbf{E}}^{Z_{t}=x}\left[\exp\left(-\int_{t}^{T}\nabla\cdot\beta_{s,u}(Z_{T-s})ds\right)\varphi_{0}(Z_{T})\right]\,,

where (Zt)t[0,T](Z_{t})_{t\in[0,T]} is a diffusion process with dynamics

dZt=β(Tt,u,Zt)dt+σdW~t,dZ_{t}=\beta(T-t,u,Z_{t})dt+\sigma d\tilde{W}_{t}\,,

and W~\tilde{W} is a dd-dimensional Brownian motion, under the measure 𝐏~\tilde{\mathbf{P}}.

Thus we have

|φt(x)|=|ψ(Tt,x)|φ0exp(txb)exp(txb)ρI(x)|uv|,\left|\varphi_{t}(x)\right|=\left|\psi(T-t,x)\right|\leqslant\left\|\varphi_{0}\right\|_{\infty}\exp\big{(}t\left\|\nabla_{x}\cdot b\right\|_{\infty}\big{)}\leqslant\exp\big{(}t\left\|\nabla_{x}\cdot b\right\|_{\infty}\big{)}\rho_{I}(x)\left|u-v\right|\,,

where ρI\rho_{I} is given by Condition 2.3(3). This implies μt,uμt,v|uv|\mu_{t,u}-\mu_{t,v}\lesssim\left|u-v\right| at every fixed xx, where the implicit constant is independent of x,u,v,tx,u,v,t.

Similarly, the other direction μt,vμt,u\mu_{t,v}-\mu_{t,u} produces the same bound. Hence, there exists some constant C>0C>0 such that, for every t[0,T]t\in[0,T], xdx\in\mathbb{R}^{d}, and u,vIu,v\in I, it holds that

\left|\mu(t,u,x)-\mu(t,v,x)\right|\leqslant C\left|u-v\right|\,. ∎

Appendix C Reduction of Assumption 2.1

Recall the operator \mathcal{T}=\mathcal{F}_{I}\mathcal{F}_{\mathbb{R}^{d}}\mathcal{L}_{\phi}. Our estimator \hat{G} requires computing the quantity \frac{\mathcal{T}\hat{\beta}}{\mathcal{T}\hat{\mu}}. If \mathcal{T}\mu=0 on a set U\subset\mathbb{R}\times\mathbb{R}^{d} of positive Lebesgue measure, then a good estimator \hat{\mu} would lead to small \left|\mathcal{T}\hat{\mu}\right| there. This might blow up the fraction and prevent a good estimation of G. Therefore, the ad hoc assumption on the nonvanishing of \mathcal{T}\mu is to some extent inevitable in this problem. Similar assumptions are common in density estimation with unknown error distribution (see [24] for instance).

However, Assumption 2.1 is not a trivial property of \mu. Given the nonlinearity of the Fokker-Planck equation associated with (1.1), computing an explicit formula for \mathcal{T}\mu is in general impossible. To the best of our knowledge, no explicit solutions of graphon systems (1.1) satisfying all our assumptions have been presented. In this appendix, we study some special cases in which Assumption 2.1 reduces to weaker conditions that are easier to verify. We work under the hypothesis of Theorem 2.1, and assume \sigma=I_{d\times d} for simplicity.

Case 1: Degeneration to a homogeneous system

Suppose I\ni u\mapsto\mu_{0,u}\in\mathcal{P}(\mathbb{R}^{d}) is constant. Corollary 2.3 of [18] states that, if the map

Iu01G(u,v)dvI\ni u\mapsto\int_{0}^{1}G(u,v)dv

is constant (denote its value by \bar{g}), then the law map I\ni u\mapsto\mu_{u}\in\mathcal{P}(\mathcal{C}_{d}) is also constant, and the common time-marginal density \bar{\mu} solves the classical McKean-Vlasov equation

tμ¯(t,x)=12Δμ¯(t,x)(μ¯(t,x)dg¯b(x,y)μ¯(t,y)dy),\partial_{t}\bar{\mu}(t,x)=\frac{1}{2}\Delta\bar{\mu}(t,x)-\nabla\cdot\left(\bar{\mu}(t,x)\int_{\mathbb{R}^{d}}\bar{g}b(x,y)\bar{\mu}(t,y)dy\right)\,,

where we may view \bar{g}b as a single quantity. In our model G(u,v)=g(u-v), this forces g to be 1-periodic on [-1,1].

Notice that for w0w\neq 0,

𝒯μ(w,ξ)\displaystyle\mathcal{T}\mu(w,\xi) =0Tϕ(t)eiwudeiξxμ(t,u,x)dxdudt\displaystyle=\int_{0}^{T}\phi(t)\int_{\mathbb{R}}e^{-iwu}\int_{\mathbb{R}^{d}}e^{-i\xi\cdot x}\mu(t,u,x)dxdudt
=0Tϕ(t)01eiwudeiξxμ(t,u,x)dxdudt\displaystyle=\int_{0}^{T}\phi(t)\int_{0}^{1}e^{-iwu}\int_{\mathbb{R}^{d}}e^{-i\xi\cdot x}\mu(t,u,x)dxdudt
=0Tϕ(t)1eiwiwμ¯t(ξ)dt\displaystyle=\int_{0}^{T}\phi(t)\frac{1-e^{-iw}}{iw}\mathcal{F}\bar{\mu}_{t}(\xi)dt
=1eiwiwμ¯(ξ).\displaystyle=\frac{1-e^{-iw}}{iw}\mathcal{F}\mathcal{L}\bar{\mu}(\xi)\,.

Note that 1eiw=01-e^{-iw}=0 if and only if w=2πkw=2\pi k for kk\in\mathbb{Z}, which means it is nonzero dwdw-a.e. Then our Assumption 2.1 reduces to Assumption 16 (on the solution μ¯\bar{\mu}) in [28].

Case 2: Degeneration to a finite graph

Define the degree of an index uu with respect to a subset JIJ\subset I by

degJ(u)=JG(u,v)dv,\deg_{J}(u)=\int_{J}G(u,v)dv\,,

and

deg(u)=01G(u,v)dv.\deg(u)=\int_{0}^{1}G(u,v)dv\,.

Consider the partition I=j=1mIjI=\bigcup_{j=1}^{m}I_{j}, and denote by [u0][u_{0}] the subset Iju0I_{j}\ni u_{0}. Assume the degree on each part of the partition is constant, i.e., for u0,u1,u2Iu_{0},u_{1},u_{2}\in I, we have

(C.1) deg[u0](u1)=[u0]G(u1,v)dv=[u0]G(u2,v)dv=deg[u0](u2),\deg_{[u_{0}]}(u_{1})=\int_{[u_{0}]}G(u_{1},v)dv=\int_{[u_{0}]}G(u_{2},v)dv=\deg_{[u_{0}]}(u_{2})\,,

whenever [u1]=[u2][u_{1}]=[u_{2}]. An example is G(u,v)=g(uv)G(u,v)=g(u-v), where gg is defined on \mathbb{R}, supported on [1,1][-1,1], and 1m\frac{1}{m}-periodic on [1,1][-1,1]. Then we take Ij=(j1m,jm]I_{j}=(\frac{j-1}{m},\frac{j}{m}] for every j=1,,mj=1,\dots,m, so that

\deg_{I_{j}}(u_{1})=\int_{\frac{j-1}{m}}^{\frac{j}{m}}g(u_{1}-v)dv=\int_{u_{1}-\frac{j}{m}}^{u_{1}-\frac{j-1}{m}}g(v)dv=\int_{0}^{\frac{1}{m}}g(v)dv\,.

Assume further that the initial data map u\mapsto\mu_{0,u}\in\mathcal{P}(\mathbb{R}^{d}) is also constant on each I_{j}. Theorem 2.1 in [18] tells us that the map u\mapsto\mu_{u}\in\mathcal{P}(\mathcal{C}_{d}) is constant on each part of the partition, that is, \mu_{u_{1}}=\mu_{u_{2}} whenever [u_{1}]=[u_{2}].

We set \mu^{(j)}(t,x)=\mu_{t,u}(x) for u\in I_{j}, j=1,\dots,m, and set D_{jk}=\deg_{I_{k}}(u) for u\in I_{j}, which is well-defined due to (C.1). Then the \mu^{(j)} satisfy a family of coupled equations:

tμ(j)=12Δμ(j)(D1)j(μ(j)V)(μ(j)k=1mDjkFμ(k)),j=1,,m.\partial_{t}\mu^{(j)}=\frac{1}{2}\Delta\mu^{(j)}-(D\vec{1})_{j}\nabla\cdot(\mu^{(j)}V)-\nabla\cdot\left(\mu^{(j)}\sum_{k=1}^{m}D_{jk}F\ast\mu^{(k)}\right)\,,\qquad j=1,\dots,m\,.

where D=(Djk)D=(D_{jk}) is treated as an m×mm\times m matrix. Assume that all DjkD_{jk} are equal to some constant d0>0d_{0}>0. We see that

tμ(j)=12Δμ(j)md0(μ(j)V)(μ(j)d0F(k=1mμ(k))).\partial_{t}\mu^{(j)}=\frac{1}{2}\Delta\mu^{(j)}-md_{0}\nabla\cdot(\mu^{(j)}V)-\nabla\cdot\left(\mu^{(j)}d_{0}F\ast\big{(}\sum_{k=1}^{m}\mu^{(k)}\big{)}\right)\,.

Let μ¯=1mj=1mμ(j)\bar{\mu}=\frac{1}{m}\sum_{j=1}^{m}\mu^{(j)} (it is indeed Iμudu\int_{I}\mu_{u}du in this situation), then it solves

(C.2) tμ¯=12Δμ¯md0(μ¯V)md0(μ¯Fμ¯).\partial_{t}\bar{\mu}=\frac{1}{2}\Delta\bar{\mu}-md_{0}\nabla\cdot(\bar{\mu}V)-md_{0}\nabla\cdot(\bar{\mu}F\ast\bar{\mu})\,.

Note that μ¯𝒫(d)\bar{\mu}\in\mathcal{P}(\mathbb{R}^{d}), and we may define

b¯(t,x,μ¯t)=defmd0(V(x)+dF(xy)μ¯t(dy)).\bar{b}(t,x,\bar{\mu}_{t})\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}md_{0}(V(x)+\int_{\mathbb{R}^{d}}F(x-y)\bar{\mu}_{t}(dy))\,.

Then (C.2) is the associated Fokker-Planck equation for the mean-field diffusion process

dUt=b¯(t,Ut,μ¯t)dt+dB¯t,U0μ¯0,dU_{t}=\bar{b}(t,U_{t},\bar{\mu}_{t})dt+d\bar{B}_{t}\,,\qquad U_{0}\sim\bar{\mu}_{0}\,,

and μ¯\bar{\mu} is the density of UU.
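
For concreteness, the averaged dynamics can be simulated by a standard particle approximation of (C.2). The sketch below uses hypothetical choices V(x)=-x and F(x)=-xe^{-x^{2}} in d=1, neither of which is prescribed by the paper, together with an Euler-Maruyama discretization.

import numpy as np

def simulate_bar_mu(n_particles=500, n_steps=100, T=1.0, m=3, d0=0.2, seed=0):
    # Euler-Maruyama particle approximation of the averaged dynamics (C.2) in d = 1.
    # V and F are illustrative placeholders; the drift is m*d0*(V(x) + (F * bar_mu_t)(x)),
    # with bar_mu_t replaced by the empirical measure of the particles.
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    V = lambda x: -x                        # hypothetical confinement term
    F = lambda x: -x * np.exp(-x**2)        # hypothetical interaction kernel
    U = rng.normal(size=n_particles)        # U_0 ~ bar_mu_0, here standard normal
    for _ in range(n_steps):
        interaction = F(U[:, None] - U[None, :]).mean(axis=1)   # (F * bar_mu_t)(U_i)
        U = U + m * d0 * (V(U) + interaction) * dt + np.sqrt(dt) * rng.normal(size=n_particles)
    return U   # approximate samples from bar_mu at time T

samples = simulate_bar_mu()
print(samples.mean(), samples.var())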

Now we may compute

𝒯μ(w,ξ)=k=1mck(w)μ(k)(ξ),\displaystyle\mathcal{T}\mu(w,\xi)=\sum_{k=1}^{m}c_{k}(w)\mathcal{F}\mathcal{L}\mu^{(k)}(\xi)\,,

where in this case

ck(w)=defIkeiwvdv={eiwkm(eiwm1)iww01mw=0.c_{k}(w)\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\int_{I_{k}}e^{-iwv}dv=\begin{cases}\frac{e^{-\frac{iwk}{m}}(e^{\frac{iw}{m}}-1)}{iw}&w\neq 0\\ \frac{1}{m}&w=0\end{cases}.
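
The formula for c_{k}(w) is an elementary integral; for completeness, a symbolic check (illustration only):

import sympy as sp

v, w, k, m = sp.symbols('v w k m', positive=True)
ck = sp.integrate(sp.exp(-sp.I*w*v), (v, (k - 1)/m, k/m))
claimed = sp.exp(-sp.I*w*k/m) * (sp.exp(sp.I*w/m) - 1) / (sp.I*w)
print(sp.simplify(sp.expand(ck - claimed)))   # prints 0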

We focus on the part where w0w\neq 0, where 𝒯μ0\mathcal{T}\mu\neq 0 if and only if

(k=1meiwkmμ(k))0.\mathcal{F}\mathcal{L}\big{(}\sum_{k=1}^{m}e^{-\frac{iwk}{m}}\mu^{(k)}\big{)}\neq 0\,.

Set

ρw=defk=1meiwkmμ(k).\rho_{w}\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\sum_{k=1}^{m}e^{-\frac{iwk}{m}}{\mu^{(k)}}\,.

Notice that for every ww\in\mathbb{R}, ρw\rho_{w} solves the linear differential equation

(C.3) tρ=12Δρmd0(ρ(V+Fμ¯t)).\partial_{t}\rho=\frac{1}{2}\Delta\rho-md_{0}\nabla\cdot(\rho(V+F\ast\bar{\mu}_{t}))\,.

Its canonical diffusion process has the following dynamics

dRt=β¯(t,Rt)dt+dBt,dR_{t}=\bar{\beta}(t,R_{t})dt+dB_{t}\,,

where β¯(t,x)=b¯(t,x,μ¯t)\bar{\beta}(t,x)=\bar{b}(t,x,\bar{\mu}_{t}). Then Assumption 2.1 now reduces to Assumption 16 in [28] on ρw\rho_{w} for almost every w[0,2πm]w\in[0,2\pi m].

Remark C.1.

In the most general case, the family (\mu_{u})_{u\in I} solves a system of infinitely many fully coupled nonlinear differential equations. In particular, there is no explicit formula for \mathcal{F}_{I}\mu_{t}(w)=\int_{I}e^{-iwu}\mu_{t,u}du that fits into a condition involving only the operator \mathcal{F}_{\mathbb{R}^{d}}\mathcal{L}. However, each \rho_{w} in Case 2 is the solution to a linear equation (though its coefficients involve \bar{\mu}_{t}, which solves another equation and can be treated as a known quantity). The assumption becomes much milder in this sense, and that is the main reduction in Case 2.

References

  • [1] C. Amorino, A. Heidari, V. Pilipauskaitė, and M. Podolskij. Parameter estimation of discretely observed interacting particle systems. Stochastic Processes and their Applications, 163:350–386, 2023.
  • [2] J. Baladron, D. Fasoli, O. Faugeras, and J. Touboul. Mean-field description and propagation of chaos in networks of Hodgkin-Huxley and FitzHugh-Nagumo neurons. Electron. J. Statist., 2, 2012.
  • [3] E. Bayraktar, S. Chakraborty, and R. Wu. Graphon mean field systems. Ann. Appl. Probab., 33(5):3587 – 3619, 2023.
  • [4] E. Bayraktar and D. Kim. Concentration of measure for graphon particle system. Adv. Appl. Probab., pages 1–28, 2024.
  • [5] E. Bayraktar and R. Wu. Stationarity and uniform in time convergence for the graphon particle system. Stochastic Processes and their Applications, 150:532–568, 2022.
  • [6] E. Bayraktar and R. Wu. Graphon particle system: Uniform-in-time concentration bounds. Stochastic Processes and their Applications, 156:196–225, 2023.
  • [7] E. Bayraktar, R. Wu, and X. Zhang. Propagation of chaos of forward-backward stochastic differential equations with graphon interactions. Applied Mathematics and Optimization, 88(1):25, 2023.
  • [8] D. Belomestny, V. Pilipauskaitė, and M. Podolskij. Semiparametric estimation of McKean-Vlasov SDEs. Ann. Inst. H. Poincaré Probab. Statist., 59(1):79–96, 2023.
  • [9] V. I. Bogachev, N. V. Krylov, M. Rockner, and S. V. Shaposhnikov. Fokker-Planck-Kolmogorov equations. Mathematical surveys and monographs, volume 207. American Mathematical Society, Providence, Rhode Island, 2015.
  • [10] F. Bolley, A. Guillin, and C. Villani. Quantitative concentration inequalities for empirical measures on non-compact spaces. Probab. Theory Relat. Fields, 137(3):541–593, 2007.
  • [11] M. Burger, V. Capasso, and D. Morale. On an aggregation model with long and short range interactions. Nonlinear Analysis: Real World Applications, 8(3):939–958, 2007.
  • [12] P. E. Caines and M. Huang. Graphon mean field games and their equations. SIAM Journal on Control and Optimization, 59(6):4373–4399, 2021.
  • [13] C. Canuto, F. Fagnani, and P. Tilli. An Eulerian approach to the analysis of Krause's consensus models. SIAM Journal on Control and Optimization, 50(1):243–265, 2012.
  • [14] R. Carmona, D. B. Cooney, C. V. Graves, and M. Laurière. Stochastic graphon games: I. the static case. Mathematics of Operations Research, 47(1):750–778, 2021.
  • [15] R. A. Carmona. Applications of mean field games in financial engineering and economic theory. Proceedings of Symposia in Applied Mathematics, 78, 2021.
  • [16] B. Chazelle, Q. Jiu, Q. Li, and C. Wang. Well-posedness of the limiting equation of a noisy consensus model in opinion dynamics. J. Differ. Equ., 263(1):365–397, 2017.
  • [17] F. Coppini. Long time dynamics for interacting oscillators on graphs. Ann. Appl. Probab., 32(1):360–391, 2022.
  • [18] F. Coppini. A note on Fokker-Planck equations and graphons, 2022.
  • [19] F. Coppini, H. Dietert, and G. Giacomin. A law of large numbers and large deviations for interacting diffusions on Erdős-Rényi graphs. Stochastics and Dynamics, page 2050010, 2019.
  • [20] F. Delarue. Mean field games: A toy model on an Erdős-Rényi graph. ESAIM: Proceedings and Surveys, 60:1–26, 2017.
  • [21] S. Delattre, G. Giacomin, and E. Luçon. A note on dynamical models on random graphs and Fokker-Planck equations. Journal of Statistical Physics, 165(4):785–798, 2016.
  • [22] P. Dupuis and G. S. Medvedev. The large deviation principle for interacting dynamical systems on random graphs. Communications in Mathematical Physics, 390(2):545–575, 2022.
  • [23] J.-P. Fouque and L.-H. Sun. Systemic Risk Illustrated, pages 444–452. Cambridge University Press, 2013.
  • [24] J. Johannes. Deconvolution with unknown error distribution. The Annals of Statistics, 37(5A):2301–2323, 2009.
  • [25] V. N. Kolokoltsov. Nonlinear Markov processes and kinetic equations. Cambridge Tracts in Mathematics. Cambridge University Press, 2010.
  • [26] L. Le Cam. Asymptotic Methods in Statistical Decision Theory. Springer Series in Statistics. Springer-Verlag, New York, NY, 1986.
  • [27] L. Lovász. Large networks and graph limits. Colloquium Publications. American Mathematical Society, 2012.
  • [28] L. D. Maestra and M. Hoffmann. Nonparametric estimation for interacting particle systems: Mckean-vlasov models. Probab. Theory Relat. Fields, 182:551–613, 2022.
  • [29] H. P. McKean Jr. A class of Markov processes associated with nonlinear parabolic equations. Proceedings of the National Academy of Sciences of the United States of America, 56:1907–1911, 1966.
  • [30] A. Mogilner and L. Edelstein-Keshet. A non-local model for a swarm. J. Math. Biol., 38:534–570, 1999.
  • [31] R. I. Oliveira and G. H. Reis. Interacting diffusions on random graphs with diverging average degrees: Hydrodynamics and large deviations. Journal of Statistical Physics, 2019.
  • [32] F. Parise and A. Ozdaglar. Graphon games: A statistical framework for network games and interventions. Econometrica, 91(1):191–225, 2023.
  • [33] T. Sarkar, A. Bhattacharjee, H. Samanta, K. Bhattacharya, and H. Saha. Optimal design and implementation of solar pv-wind-biogas-vrfb storage integrated smart hybrid microgrid for ensuring zero loss of power supply probability. Energy Conversion and Management, 191:102–118, 2019.
  • [34] A.-S. Sznitman. Topics in propagation of chaos. Lecture Notes in Mathematics. Springer-Verlag, New York, 1991.
  • [35] A. A. Vlasov. Many-Particle Theory and Its Application to Plasma. Russian Monographs and Texts on Advanced Mathematics and Physics 8. Gordon and Breach, New York, 1961.
  • [36] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, 1994.