
Non-parametric estimates for graphon mean-field particle systems

Erhan Bayraktar Department of Mathematics, University of Michigan, Ann Arbor, MI 48109. [email protected]  and  Hongyi Zhou Department of Mathematics, University of Michigan, Ann Arbor, MI 48109. [email protected]
(Date: March 9, 2024)
Abstract.

We consider the graphon mean-field system introduced in the work of Bayraktar, Chakraborty, and Wu. It is the large-population limit of a heterogeneously interacting diffusive particle system, where the interaction is of mean-field type with weights characterized by an underlying graphon function. Through observation of continuous-time trajectories within the particle system, we construct plug-in estimators of the particle density and the drift coefficient, and thus of the graphon interaction weights of the mean-field system. Our estimators for the density and the drift are direct results of kernel interpolation on the empirical data, and a deconvolution method leads to an estimator of the underlying graphon function. We show that, as the number of particles increases, the graphon estimator converges to the true graphon function pointwise and, as a consequence, in the cut metric. In addition, we conduct a minimax analysis within a particular class of particle systems to justify the pointwise optimality of the density and drift estimators.

Key words and phrases:
graphon mean-field system, interacting particles, kernel estimation, minimax analysis
2020 Mathematics Subject Classification:
Primary 62G07, 62H22, 62M05; Secondary 05C80, 60J60, 60K35.
This research is supported in part by the National Science Foundation.

1. Introduction

We study a statistical method to estimate the interaction strength in the graphon mean-field interacting particle system introduced in [3]. The particles in such a system are characterized not only by a feature vector in the physical space \mathbb{R}^{d} but also by a “type” indexed by I=[0,1]. The interaction strength between particles of different types is quantified by a graphon function G:I\times I\to[0,1].

Precisely, the system consists of a family of diffusion processes with dynamics

(1.1) X_{u}(t)=X_{u}(0)+\int_{0}^{t}\int_{I}\int_{\mathbb{R}^{d}}b(X_{u}(s),y)\,G(u,v)\,\mu_{s,v}(dy)\,dv\,ds+\int_{0}^{t}\sigma(X_{u}(s))\,dB_{u}(s)\,,\qquad t\geqslant 0\,,\qquad u\in I\,,

where b:\mathbb{R}^{d}\times\mathbb{R}^{d}\to\mathbb{R}^{d} and \sigma:\mathbb{R}^{d}\to\mathbb{R}^{d\times d} are Lipschitz functions, \{X_{u}(0)\mid u\in I\} is a collection of independent \mathbb{R}^{d}-valued random variables with distributions \{\mu_{0,u}\mid u\in I\}, and \{B_{u}\mid u\in I\} are i.i.d. d-dimensional Brownian motions independent of \{X_{u}(0)\mid u\in I\}. Here, we assume that the interactions between particles enter only through the drift term.

The main purpose of this study is to estimate the graphon function G by continuous observation of a finite-population system. It is shown in [3] that system (1.1) is the large-population limit of the following system

(1.2) X^{n}_{i}(t)=X_{\frac{i}{n}}(0)+\int_{0}^{t}\frac{1}{n}\sum_{j=1}^{n}b(X^{n}_{i}(s),X^{n}_{j}(s))\,g^{n}_{ij}\,ds+\int_{0}^{t}\sigma(X^{n}_{i}(s))\,dB_{\frac{i}{n}}(s)\,,\qquad t\geqslant 0\,,\quad i=1,\dots,n\,,

where g^{n}_{ij}=G(\frac{i}{n},\frac{j}{n}). We continuously observe (1.2) over a (fixed) time horizon [0,T]. Using the empirical data, we construct estimators of the particle density and the drift term of (1.1), and finally an estimator of the graphon function G. With proper choices of parameters and under certain conditions, the estimation error is well controlled as the number of particles increases.

In this work, we are mainly interested in a model of McKean-Vlasov type, where the drift integrand b takes the form

b(x,y)=F(x-y)+V(x)\,,\qquad x,y\in\mathbb{R}^{d}\,,

for some sufficiently regular functions F and V. The function F acts as an interaction force between two particles depending on their relative positions, while the function V accounts for an external force applied to every single particle. We also consider graphon functions G of the form

G(u,v)=g(u-v)\,,\qquad u,v\in I\,,

for some function g:\mathbb{R}\to\mathbb{R} with certain regularity. When we specialize to this case, the problem boils down to estimating the function g.

1.1. Background

The study of classical mean-field systems with homogeneous interaction and the associated parabolic equations in the sense of McKean [29] dates back to the 1960s. The original motivation came from plasma theory in statistical physics (see [35, 34, 25] and references therein), and its significance in applied mathematics has been well demonstrated throughout the past decades. Several analytic and probabilistic methods were developed during this period to push forward the study of mean-field systems (see the references in [28]).

However, the early formulation of this problem focused on the theoretical properties of the systems, which rely on precise knowledge of the dynamics. Statistical methods that fit these properties into noisy models remained scarce until the 21st century. A modern formulation came to the stage in the 2010s, when developments in other areas of research created a high demand for statistical inference models. Empirical data from a particle system can be used to estimate the dynamics of the system and thus predict its future behavior. These ideas have been applied in various fields, including chemical and biological systems [2, 11, 30], economics and finance [23, 15], collective behaviors [13, 16], etc.

While the features of particles are usually embedded in a physical space \mathbb{R}^{d}, the situation in the study of modern networks can be more complicated. Inhomogeneous systems contain different types of entities (e.g. social networks [36] and power supplies [33]), and the interaction between two individuals also depends on their types. The idea of studying such sophisticated networks via graphon particle systems has therefore drawn significant attention.

A heterogeneous particle system can be embedded into a (deterministic or random) graph (see [17, 19, 20, 21, 22, 31]), where the interaction strength between two types of particles is quantified by the corresponding edge weight. As the number of vertices increases and the graph becomes denser, the interaction strength approaches some bounded symmetric kernel G:[0,1]^{2}\to[0,1] called a graphon. In fact, every graphon is the limit of a sequence of finite graphs, as discussed in Chapter 11 of [27].

In recent years, graphon mean-field games have demonstrated their ability to model densely interacting networks in multiple studies, e.g. [14, 12, 32]. On the purely theoretical side, several results on the stability and stationarity of graphon mean-field systems have been established in [5, 6, 3, 7], and the concentration of measures is well studied in [6, 4]. These properties enable the study of mean-field systems from a statistical inference point of view.

Statistical inference methods are widely applied to learning the dynamics of interacting systems. Empirical data in a McKean-Vlasov model can be interpolated using a kernel to obtain estimates of the particle density [28, 8, 1]. In particular, the data-driven estimation algorithms in [28] automatically choose the best kernel bandwidths among a predetermined, possibly opaque, set, which ensures pointwise optimality even without explicitly specifying the parameters. Estimating the interaction force requires more technical tools, including the deconvolution method introduced by Johannes in [24]. These strategies offer firm technical support for the analysis of interacting systems with unknown driving forces, allowing predictions of the evolution of the systems based solely on empirical data.

1.2. Our contributions and organization of this paper

Recall the graphon function G in the mean-field system (1.1). In Section 2, we introduce a kernel interpolation method adopted from [28, 8]. We make continuous observations of the n-particle system (1.2) during a finite time interval [0,T]. The empirical data from the finite-population system are then interpolated in both the feature space \mathbb{R}^{d} and the index space I to produce a pointwise estimator \hat{\mu}^{n}_{h}(t,u,x) of the particle density function \mu(t,u,x). A further interpolation in the time variable leads to an estimator \hat{\beta}^{n}_{h,\kappa}(t,u,x) of the drift coefficient

\beta(t,u,x)\stackrel{\textup{def}}{=}\int_{I}G(u,v)\big(V(x)+F\ast\mu_{t,v}(x)\big)dv\,.

Then we apply a deconvolution method introduced in [24] to build a pointwise estimator \hat{G}^{n}_{\vartheta} of G. Here

\vartheta=(h_{1},h_{2},h_{3},\kappa_{0},\kappa_{1},\kappa_{2},r,\tilde{r})

are the parameters associated with the estimators: h=(h_{1},h_{2},h_{3})\in\mathbb{R}_{+}^{3} are the bandwidths of the kernels, \kappa=(\kappa_{0},\kappa_{1},\kappa_{2})\in\mathbb{R}_{+}^{3} are the denominator cutoff factors that prevent fractions from blowing up, and r,\tilde{r}>0 are cutoff radii. We explain them in more detail in Section 2. We show in Section 3 that there exists a sequence (\vartheta_{n})_{n\in\mathbb{N}} of parameters such that

\lim_{n\to\infty}\mathbf{E}\left|\hat{G}^{n}_{\vartheta_{n}}(u_{0},v_{0})-G(u_{0},v_{0})\right|^{2}=0\,,

subject to the regularity conditions.

We will disclose the particular setting of our problem in Section 2. This includes the continuity and integrability of the coefficients F,V,G and the initial data \mu_{0,u}. Then we define the kernel-interpolated estimators \hat{\mu}^{n}_{h} and \hat{\beta}^{n}_{h,\kappa}, with free choices of the kernel bandwidth vector h and cutoff factors \kappa. It is worth noticing that the bandwidths of our estimators are fixed throughout the algorithm, whereas the data-driven Goldenshluger-Lepski estimators applied in [28] make dynamic choices of bandwidths from a pre-determined finite set of candidates. The pre-determined set can be invisible to the user, and the algorithm automatically selects the best candidate and outputs the corresponding estimate. Such an algorithm attains optimal pointwise oracle estimates without precise knowledge of the system’s continuity properties and does not lose much efficiency at any tuple of plug-in arguments. However, the convergence of our estimator \hat{G}^{n}_{\vartheta} depends on the total L^{2}-errors of the plug-in estimators (instead of the pointwise errors), so it becomes more beneficial to fix the bandwidths all along. The minimax analysis in Section 4 shows that our estimator \hat{\mu}^{n}_{h} is still pointwise optimal when given enough information.

We will present upper bounds on the errors of the pointwise estimators in Section 3, with proofs in Section 5. The main ideas behind the proofs are the stability of the mean-field systems and the concentration of the particle density. We connect the (observed) finite-population system (1.2) to the intrinsic graphon mean-field system (1.1) through the following inequality. For the particle density \mu, for example, we have

(1.3) \mathbf{E}\left|\hat{\mu}^{n}_{h}-\mu\right|^{2}\leqslant 2\,\mathbf{E}\left|\hat{\mu}^{n}_{h}-\bar{\mu}^{n}_{h}\right|^{2}+2\,\mathbf{E}\left|\bar{\mu}^{n}_{h}-\mu\right|^{2}\,,

where

\bar{\mu}^{n}_{h}(t_{0},u_{0},x_{0})=\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\tfrac{i}{n})\,K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t_{0}))\,.

The first part is controlled by the convergence of (1.2) to (1.1) (see [3]). For the second part, we follow the idea of [28] and derive a Bernstein concentration inequality. The use of Bernstein’s inequality here avoids the extra constants that arise from the change of measures in [28], thanks to the independence of particles in the graphon mean-field system. It is worth noticing that all the constants appearing in the inequalities are global (independent of the plug-in arguments t_{0},u_{0},x_{0}), and we keep some of the explicit summations in the upper bounds on purpose (as can be seen in Lemmas 3.1 and 3.2). These properties preserve the integrability of the whole sums and maintain the nice asymptotic behavior of the estimator \hat{G}^{n}_{\vartheta}.

In Section 4, we perform a minimax analysis on the plug-in estimators of the particle density \mu and the drift coefficient \beta. We restrict our view to particle systems with locally Hölder continuous density functions, whose existence can be found in several classical texts on Fokker-Planck equations such as [9]. We present an alternative analysis of the pointwise behaviors of \hat{\mu}^{n}_{h} and \hat{\pi}^{n}_{h} with a change-of-measure strategy adapted from [28]. This improves the pointwise errors obtained in Section 3.1 at the cost of a constant factor depending on the value of \mu near (t_{0},u_{0},x_{0}). Balancing the several error terms leads to optimal asymptotic upper bounds. On the other hand, we derive (theoretical) lower bounds on the estimation error and compare them with the upper bounds, which demonstrates the optimality of our estimators. The proofs are given in Section 6.

2. Model and estimators

2.1. Setting, notation and assumptions

Let us fix a finite time horizon T>0. All observations are made within the time interval [0,T]. We denote by \mathcal{C}_{d} the space of \mathbb{R}^{d}-valued continuous functions on [0,T], i.e. \mathcal{C}_{d}\stackrel{\textup{def}}{=}C([0,T];\mathbb{R}^{d}). More generally, we write C^{k}(\mathcal{X};\mathcal{Y}) for the space of k-times continuously differentiable functions defined on \mathcal{X} taking values in \mathcal{Y}. Similarly, we write L^{p}(\mathcal{X};\mathcal{Y}) for the space of p-th power Lebesgue-integrable functions, and W^{s,p}(\mathcal{X};\mathcal{Y}) for the Sobolev space. The argument \mathcal{Y} is often omitted when \mathcal{Y}=\mathbb{R}.

An \mathbb{R}^{d}-valued function f can be written componentwise as (f_{k})_{1\leqslant k\leqslant d}. The Fourier transform of a function f:\mathbb{R}^{d}\to\mathbb{R}^{d} is defined componentwise via

\mathcal{F}_{\mathbb{R}^{d}}f(\xi)=(\mathcal{F}_{\mathbb{R}^{d}}f_{k}(\xi))_{1\leqslant k\leqslant d}=\left(\int_{\mathbb{R}^{d}}e^{-ix\cdot\xi}f_{k}(x)dx\right)_{1\leqslant k\leqslant d}\,.

This will be applied to the drift coefficients in our deconvolution method.

We impose the following assumptions on the graphon mean-field system (1.1).

Condition 2.1.
(1) The drift coefficient b:\mathbb{R}^{d}\times\mathbb{R}^{d}\to\mathbb{R}^{d} is bounded and has bounded first derivatives. It is Lipschitz continuous in the sense that there exists some constant C>0 such that

(2.1) \left|b(x,y)-b(x^{\prime},y^{\prime})\right|\leqslant C(\left|x-x^{\prime}\right|+\left|y-y^{\prime}\right|)\,,\qquad x,x^{\prime},y,y^{\prime}\in\mathbb{R}^{d}\,.

(2) The drift coefficient b takes the form b(x,y)=F(x-y)+V(x), where F,V\in W^{1,p}(\mathbb{R}^{d}) for p=1,2,\infty.

(3) The diffusion coefficient \sigma:\mathbb{R}^{d}\to\mathbb{R}^{d\times d} is Lipschitz in the operator norm on \mathbb{R}^{d\times d}, i.e. there exists some constant C>0 such that

\left\|\sigma(x)-\sigma(x^{\prime})\right\|\leqslant C\left|x-x^{\prime}\right|\,,\qquad x,x^{\prime}\in\mathbb{R}^{d}\,,

where \left\|\cdot\right\| is the operator norm of d\times d matrices.

(4) The diffusion coefficient \sigma is uniformly non-degenerate and bounded in the sense that there exist constants \sigma_{\pm}>0 such that

\sigma_{-}^{2}I\preceq\sigma\sigma^{T}\preceq\sigma_{+}^{2}I\,,

where two square matrices M and N satisfy M\preceq N if N-M is positive semi-definite.

Recall that the types of particles are indexed by I=[0,1], and the interaction strength between particles of two types is given by a graphon function G:I\times I\to[0,1]. We consider the following conditions on the structure of the graphon function.

Condition 2.2.
(1) The graphon function is piecewise Lipschitz in the sense that there exist a constant C>0 and a finite partition \bigcup_{j\in J}I_{j} of I such that

\left|G(u,v)-G(u^{\prime},v^{\prime})\right|\leqslant C(\left|u-u^{\prime}\right|+\left|v-v^{\prime}\right|)\,,\qquad(u,v),(u^{\prime},v^{\prime})\in I_{i}\times I_{j},\;i,j\in J\,.

(2) The graphon function G has the form G(u,v)=g(u-v), where g:\mathbb{R}\to[0,1] is a Lipschitz continuous function with g(0)=g_{0}\in(0,1] a given constant.

(3) In addition to item (2), the Fourier transform of g lies in L^{1}\cap L^{2} and decays fast enough that

\tilde{r}^{2}\int_{\left|w\right|>\tilde{r}}\left|\mathcal{F}g(w)\right|^{2}dw\to 0

as \tilde{r}\to\infty.

Finally, we examine the initial state of the system.

Condition 2.3.

We denote by \mathcal{P}(S) the space of probability measures on a Polish space S (e.g. \mathbb{R}^{d}, \mathcal{C}_{d}).

(1) The initial distributions \mu_{0,u}(dx) admit density functions x\mapsto\mu(0,u,x) with respect to the Lebesgue measure on \mathbb{R}^{d}. There exist constants c_{0}>0 and c_{1}\geqslant 1 such that

\sup_{u\in I}\int_{\mathbb{R}^{d}}\exp(c_{0}\left|x\right|^{2})\mu(0,u,x)\,dx\leqslant c_{1}\,.

(2) There exist a constant C>0 and a finite collection of intervals \{I_{j}\}_{j\in J} with \bigcup_{j\in J}I_{j}=I such that

\mathcal{W}_{2}(\mu_{0,u},\mu_{0,v})\leqslant C\left|u-v\right|\,,\qquad u,v\in I_{j},\quad j\in J\,,

where \mathcal{W}_{2}:\mathcal{P}(\mathbb{R}^{d})\times\mathcal{P}(\mathbb{R}^{d})\to[0,\infty] is the Wasserstein 2-distance.

(3) There exists a function \rho_{I}\in L^{2}\cap L^{\infty}(\mathbb{R}^{d}) such that \left|\mu_{0,u}-\mu_{0,v}\right|\leqslant\rho_{I}\left|u-v\right| almost everywhere, for every u,v\in I.

The (continuously indexed) graphon mean-field system built on appropriately chosen conditions from above has dynamics

dX_{u}(t)=\int_{I}\int_{\mathbb{R}^{d}}b(X_{u}(t),x)G(u,v)\mu_{t,v}(dx)\,dv\,dt+\sigma(X_{u}(t))dB_{u}(t)\,,\qquad X_{u}(0)\sim\mu_{0,u}\,,

for u\in I, t\in[0,T]. Define the drift term

\beta(t,u,x,\mu_{t})\stackrel{\textup{def}}{=}\int_{I}\int_{\mathbb{R}^{d}}b(x,y)G(u,v)\mu_{t,v}(dy)dv

and observe that \beta:[0,T]\times I\times\mathbb{R}^{d}\times\mathcal{P}(\mathbb{R}^{d})^{I}\to\mathbb{R}^{d} is measurable. We will always abbreviate it as \beta(t,u,x) (suppressing the mean-field argument). Under Conditions 2.1(1)(3), we know that \beta(t,u,\cdot) is Lipschitz continuous and has at most linear growth for every t\in[0,T] and u\in I. This means \mu_{t,u} is the unique weak solution to the Fokker-Planck equation associated with the diffusion process dX_{u}(t)=\beta(t,u,X_{u}(t))dt+\sigma(X_{u}(t))dB_{u}(t), and the map I\ni u\mapsto(\mu_{t,u})_{t\in[0,T]}\in\mathcal{P}(\mathcal{C}_{d}) is measurable by Proposition 2.1 in [3]. Further, under Conditions 2.1(4) and 2.3(1), every \mu_{\cdot,u} admits a density function \mu(t,u,x) with respect to the Lebesgue measure on [0,T]\times\mathbb{R}^{d} (see [9]). Note that \mu:[0,T]\times I\times\mathbb{R}^{d}\to\mathbb{R}_{+} is measurable. We claim that the densities are asymptotically bounded.

Proposition 2.1.

Assume Conditions 2.1(1)(3)(4), 2.3(1), and that b is almost everywhere bounded. There exist some C,R>0 such that, for every p>d+2 and every bounded open set U disjoint from the closed ball \overline{B(0,R)}, we have for all t\in(0,T) and u\in I that

\left\|\mu_{t,u}\right\|_{L^{\infty}(U)}\leqslant C\left\|\mu_{0,u}\right\|_{L^{\infty}(U)}+Ct^{\frac{p-d-2}{2}}(1+\left\|b\right\|_{\infty}^{p})\,.

As a consequence,

\sup_{t\in[0,T],u\in I}\left\|\mu_{t,u}\mathbf{1}_{\{\left|x\right|>R\}}\right\|_{2}<\infty\,.

The proof is given in Appendix B. It also shows the L^{2}-integrability of the density function \mu_{t,u} at any t\in(0,T) (see also Corollary 8.2.2 in [9]).

Our goal in the next subsection is to construct estimators of the functions \mu(t,u,x) and \beta(t,u,x), and in turn an estimator of G(u,v)=g(u-v). We will use the L^{2}-distance in the probability space to describe our estimation errors.

2.2. Plug-in estimators

To estimate the underlying functions described in the previous paragraph, we make continuous-time observations of the n-particle system (1.2),

X^{n}_{i}(t)=X_{\frac{i}{n}}(0)+\int_{0}^{t}\frac{1}{n}\sum_{j=1}^{n}b(X^{n}_{i}(s),X^{n}_{j}(s))g^{n}_{ij}ds+\int_{0}^{t}\sigma(X^{n}_{i}(s))dB_{\frac{i}{n}}(s)\,,\qquad i=1,\dots,n\,,

where g^{n}_{ij}=G(\frac{i}{n},\frac{j}{n}) for every i,j\in\{1,\dots,n\}. This finite system is consistent with the mean-field system in the following sense.

Lemma 2.1 (Theorem 3.2, [3]).

Assume Conditions 2.1(1)(3), 2.2(1), and 2.3(1)(2) hold. Then there exists a constant C>0 such that

(2.2) \sup_{t\in[0,T]}\max_{1\leqslant i\leqslant n}\mathbf{E}\left|X^{n}_{i}(t)-X_{\frac{i}{n}}(t)\right|^{2}\leqslant\frac{C}{n}\,.
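To fix ideas, the following minimal sketch simulates the observed finite system (1.2) with an Euler-Maruyama scheme in dimension d=1. The particular choices of b, \sigma, G, the initial law, and all numerical parameters below are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def simulate_finite_system(n, T, n_steps, b, sigma, G, x0_sampler, seed=0):
    """Euler-Maruyama sketch of the n-particle system (1.2) with d = 1.

    b(x, y)            : pairwise drift integrand
    sigma(x)           : diffusion coefficient
    G(u, v)            : graphon interaction weights
    x0_sampler(u, rng) : draws the initial value X_{i/n}(0) for a particle of type u
    Returns an array of shape (n_steps + 1, n) containing the discretized paths.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    u = np.arange(1, n + 1) / n                      # particle types i/n
    g = G(u[:, None], u[None, :])                    # weights g^n_{ij} = G(i/n, j/n)
    X = np.empty((n_steps + 1, n))
    X[0] = np.array([x0_sampler(ui, rng) for ui in u])
    for k in range(n_steps):
        x = X[k]
        # drift of particle i: (1/n) * sum_j b(x_i, x_j) * g_{ij}
        drift = (b(x[:, None], x[None, :]) * g).mean(axis=1)
        X[k + 1] = x + drift * dt + sigma(x) * np.sqrt(dt) * rng.standard_normal(n)
    return X

# Illustrative choices (assumptions, not from the paper):
# F(z) = -z, V = 0, constant diffusion, g(u - v) = exp(-|u - v|), N(0, 1) initial data.
paths = simulate_finite_system(
    n=200, T=1.0, n_steps=500,
    b=lambda x, y: -(x - y),
    sigma=lambda x: 0.5 * np.ones_like(x),
    G=lambda u, v: np.exp(-np.abs(u - v)),
    x0_sampler=lambda u, rng: rng.standard_normal(),
)
```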

Kernel Interpolation

We introduce an HJK-kernel adapted from [28]. Choose three functions H\in C^{1}_{c}(\mathbb{R}), J\in C^{1}_{c}(\mathbb{R}), K\in C^{1}_{c}(\mathbb{R}^{d}) that are non-negative and normalized:

\int_{\mathbb{R}}H(t)dt=\int_{\mathbb{R}}J(u)du=\int_{\mathbb{R}^{d}}K(x)dx=1\,,

and have order (at least) 1:

\int_{\mathbb{R}}tH(t)dt=\int_{\mathbb{R}}uJ(u)du=\int_{\mathbb{R}^{d}}x_{i}K(x)dx=0\,,\quad i=1,\dots,d\,.

With the bandwidth vector h=(h_{1},h_{2},h_{3})\in\mathbb{R}_{+}^{3}, the dilations are defined by

H_{h_{1}}(t)=h_{1}^{-1}H(h_{1}^{-1}t)\,,\quad J_{h_{2}}(u)=h_{2}^{-1}J(h_{2}^{-1}u)\,,\quad K_{h_{3}}(x)=h_{3}^{-d}K(h_{3}^{-1}x)\,,

and the products are written as

(J\otimes K)_{h}(u,x)=J_{h_{2}}(u)K_{h_{3}}(x)\,,\qquad(H\otimes J\otimes K)_{h}(t,u,x)=H_{h_{1}}(t)J_{h_{2}}(u)K_{h_{3}}(x)\,.

Since the bandwidths are free to vary, we may assume without loss of generality that the kernels H,J,K are supported in the closed unit ball (of the space on which they are defined).

With a given number n of particles, we run the finite system (X^{n}_{i})_{i=1,\dots,n} over the time interval [0,T]. This gives us the empirical distribution

\mu^{n}_{t}(du,dx)=\frac{1}{n}\sum_{i=1}^{n}\delta_{X^{n}_{i}(t)}(dx)\delta_{\frac{i}{n}}(du)\,,\qquad t\in[0,T]\,.

Interpolating it with the JK part of the kernel gives a plug-in estimator of the density \mu:

(2.3) \hat{\mu}^{n}_{h}(t,u,x)\stackrel{\textup{def}}{=}(J\otimes K)_{h}\ast\mu^{n}_{t}(u,x)=\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}(u-\tfrac{i}{n})K_{h_{3}}(x-X^{n}_{i}(t))

for t\in[0,T], u\in I, x\in\mathbb{R}^{d}.
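As a concrete illustration, the following sketch evaluates \hat{\mu}^{n}_{h}(t,u_{0},x_{0}) in dimension d=1 from the observed positions at a single time. The biweight kernel used for both J and K is an illustrative example of a non-negative, compactly supported C^{1} kernel that is normalized and of order 1; it is an assumption, not a choice made in the paper.

```python
import numpy as np

def biweight(z):
    """Non-negative C^1 kernel on [-1, 1]; integrates to 1 and has zero first moment."""
    return np.where(np.abs(z) <= 1.0, 15.0 / 16.0 * (1.0 - z ** 2) ** 2, 0.0)

def density_estimator(X_t, u0, x0, h2, h3):
    """Plug-in estimator (2.3) of mu(t, u0, x0) for d = 1.

    X_t : array of shape (n,), the observed positions X^n_i(t), i = 1, ..., n.
    """
    n = X_t.shape[0]
    u = np.arange(1, n + 1) / n                     # types i/n
    Jh = biweight((u0 - u) / h2) / h2               # J_{h2}(u0 - i/n)
    Kh = biweight((x0 - X_t) / h3) / h3             # K_{h3}(x0 - X^n_i(t))
    return np.mean(Jh * Kh)                         # (1/n) sum_i J_{h2} * K_{h3}

# Toy usage with synthetic positions (placeholder data only)
X_t = np.random.default_rng(0).standard_normal(500)
mu_hat = density_estimator(X_t, u0=0.5, x0=0.0, h2=0.1, h3=0.3)
```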

We also consider an auxiliary quantity \pi:[0,T]\times I\times\mathbb{R}^{d}\to\mathbb{R}^{d} defined by

\pi(t,u,x)\stackrel{\textup{def}}{=}\beta(t,u,x)\mu(t,u,x)\,.

A discrete approximation is given by

\pi^{n}(dt,du,dx)\stackrel{\textup{def}}{=}\frac{1}{n}\sum_{i=1}^{n}\delta_{X^{n}_{i}(t)}(dx)\delta_{\frac{i}{n}}(du)dX^{n}_{i}(t)\,,

so that for any test function f, we have

\int_{[0,T]\times I\times\mathbb{R}^{d}}f(t,u,x)\pi^{n}(dt,du,dx)=\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}f(t,\tfrac{i}{n},X^{n}_{i}(t))dX^{n}_{i}(t)

as a stochastic integral. Interpolating it with the HJK kernel gives a plug-in estimator of \pi:

(2.4) \hat{\pi}^{n}_{h}(t,u,x)\stackrel{\textup{def}}{=}(H\otimes J\otimes K)_{h}\ast\pi^{n}(t,u,x)=\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}H_{h_{1}}(t-s)J_{h_{2}}(u-\tfrac{i}{n})K_{h_{3}}(x-X^{n}_{i}(s))dX^{n}_{i}(s)

for t\in[0,T], u\in I, x\in\mathbb{R}^{d}. This leads to a plug-in estimator of \beta,

(2.5) \hat{\beta}^{n}_{h,\kappa}\stackrel{\textup{def}}{=}\frac{\hat{\pi}^{n}_{h}}{\hat{\mu}^{n}_{h}\lor\kappa_{2}}\,,

where \kappa_{2}>0 is a cutoff parameter that keeps the denominator away from zero and thus prevents the fraction from blowing up.
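In practice the stochastic integral in (2.4) has to be approximated from discretely sampled paths. The following minimal sketch does this in dimension d=1 using a left-point Riemann sum over the path increments, with the same biweight kernel as in the previous sketch playing the roles of H, J, and K; all of these numerical choices are illustrative assumptions.

```python
import numpy as np

def biweight(z):
    return np.where(np.abs(z) <= 1.0, 15.0 / 16.0 * (1.0 - z ** 2) ** 2, 0.0)

def drift_estimator(X, times, t0, u0, x0, h, kappa2):
    """Plug-in estimator (2.5) of beta(t0, u0, x0) for d = 1.

    X     : array of shape (m + 1, n), X[k, i] approximating X^n_i(times[k])
    times : array of shape (m + 1,), observation times in [0, T]
    h     : bandwidth vector (h1, h2, h3)
    The stochastic integral defining pi_hat in (2.4) is approximated by a
    left-point sum over the increments dX^n_i.
    """
    h1, h2, h3 = h
    n = X.shape[1]
    u = np.arange(1, n + 1) / n
    Jh = biweight((u0 - u) / h2) / h2                  # J_{h2}(u0 - i/n), shape (n,)
    dX = np.diff(X, axis=0)                            # path increments, shape (m, n)
    Hh = biweight((t0 - times[:-1]) / h1) / h1         # H_{h1}(t0 - s), shape (m,)
    Kh = biweight((x0 - X[:-1]) / h3) / h3             # K_{h3}(x0 - X^n_i(s)), shape (m, n)
    pi_hat = np.sum(Hh[:, None] * Jh[None, :] * Kh * dX) / n
    # mu_hat (2.3) evaluated at the observation time closest to t0
    k0 = int(np.argmin(np.abs(times - t0)))
    mu_hat = np.mean(Jh * biweight((x0 - X[k0]) / h3) / h3)
    return pi_hat / max(mu_hat, kappa2)                # beta_hat = pi_hat / (mu_hat v kappa2)

# Toy usage with Brownian placeholder paths
rng = np.random.default_rng(1)
m, n, T = 400, 300, 1.0
times = np.linspace(0.0, T, m + 1)
X = np.cumsum(np.sqrt(T / m) * rng.standard_normal((m + 1, n)), axis=0)
beta_hat = drift_estimator(X, times, t0=0.5, u0=0.5, x0=0.0, h=(0.1, 0.2, 0.3), kappa2=0.05)
```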

Deconvolution

The deconvolution method is typically used to recover a function f from the convolution f\ast g. We follow the ideas of [24] and [28]. Here, we only present the definitions and estimators, deferring the full intuition to Appendix A.

To apply the Fourier transform on the index space I=[0,1], we extend all measurable functions defined on I by zero. With some abuse of notation, we let

\mu(t,u,x)=\begin{cases}\mu(t,u,x)\,,&u\in I\,,\\ 0\,,&u\in\mathbb{R}\setminus I\,,\end{cases}\qquad\hat{\mu}^{n}_{h}(t,u,x)=\begin{cases}\hat{\mu}^{n}_{h}(t,u,x)\,,&u\in I\,,\\ 0\,,&u\in\mathbb{R}\setminus I\,,\end{cases}

\pi(t,u,x)=\begin{cases}\pi(t,u,x)\,,&u\in I\,,\\ 0\,,&u\in\mathbb{R}\setminus I\,,\end{cases}\qquad\hat{\pi}^{n}_{h}(t,u,x)=\begin{cases}\hat{\pi}^{n}_{h}(t,u,x)\,,&u\in I\,,\\ 0\,,&u\in\mathbb{R}\setminus I\,,\end{cases}

\beta(t,u,x)=\begin{cases}\beta(t,u,x)\,,&u\in I\,,\\ 0\,,&u\in\mathbb{R}\setminus I\,,\end{cases}\qquad\hat{\beta}^{n}_{h,\kappa}(t,u,x)=\begin{cases}\hat{\beta}^{n}_{h,\kappa}(t,u,x)\,,&u\in I\,,\\ 0\,,&u\in\mathbb{R}\setminus I\,.\end{cases}

Then we define the Fourier transform of a function f supported on I via

\mathcal{F}_{I}f(w)=\int_{I}e^{-iwu}f(u)du=\int_{\mathbb{R}}e^{-iwu}f(u)du\,.

Note that we may view \mathcal{F}_{I} as a linear operator acting in the index variable of function-valued functions, and it admits an inverse transform on L^{2}-spaces.

In addition, we consider a linear operator \mathcal{L}_{\phi} acting on time-dependent functions, defined by

\mathcal{L}_{\phi}f=\int_{0}^{T}f(t)\phi(t)dt\,,

where \phi\in L^{\infty}([0,T];\mathbb{C}) has compact support in (0,T) and satisfies \int_{0}^{T}\phi(t)dt=0 (we denote this subspace of L^{\infty} functions by \dot{L}^{\infty}). We write \mathcal{L} for \mathcal{L}_{\phi} when \phi is fixed and there is no ambiguity. The intuition behind this operator is also explained in Appendix A.

Main estimator and its convergence

Finally, with some additional cutoff parameters, we introduce our estimator of the graphon function,

(2.6) \hat{G}^{n}_{\vartheta}(u_{0},v_{0})\stackrel{\textup{def}}{=}g_{0}\cdot\frac{\left\|\mathcal{F}_{I}^{-1}\big(\frac{\mathcal{T}\hat{\beta}^{n}_{h,\kappa,r}}{\mathcal{T}\hat{\mu}^{n}_{h,r}}\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}^{n}_{h,r}\right|>\kappa_{1},\left|w\right|\leqslant\tilde{r}\}}\big)(u_{0}-v_{0})\right\|_{L^{2}(\mathbb{R}^{d})}}{\left\|\mathcal{F}_{I}^{-1}\big(\frac{\mathcal{T}\hat{\beta}^{n}_{h,\kappa,r}}{\mathcal{T}\hat{\mu}^{n}_{h,r}}\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}^{n}_{h,r}\right|>\kappa_{1},\left|w\right|\leqslant\tilde{r}\}}\big)(0)\right\|_{L^{2}(\mathbb{R}^{d})}\lor\kappa_{0}}\,,

where \mathcal{T}\stackrel{\textup{def}}{=}\mathcal{F}_{I}\mathcal{F}_{\mathbb{R}^{d}}\mathcal{L}_{\phi}, and

\hat{\mu}^{n}_{h,r}\stackrel{\textup{def}}{=}\hat{\mu}^{n}_{h}\mathbf{1}_{\{\left|x\right|\leqslant r\}}\,,\qquad\hat{\beta}^{n}_{h,\kappa,r}\stackrel{\textup{def}}{=}\hat{\beta}^{n}_{h,\kappa}\mathbf{1}_{\{\left|x\right|\leqslant r\}}\,.

We will explain the intuition of this estimator in Appendix A.
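For readers who prefer to see (2.6) operationally, the following sketch evaluates it on a regular (t,u,x) grid in dimension d=1 using discrete Fourier transforms. The grid-based discretization, the use of plain FFTs (whose phase factors cancel in the ratio \mathcal{T}\hat{\beta}/\mathcal{T}\hat{\mu}), and all numerical choices are illustrative assumptions rather than the paper's prescription.

```python
import numpy as np

def graphon_estimator(mu_hat, beta_hat, phi, dt, du, dx,
                      u0, v0, g0, kappa0, kappa1, r_tilde):
    """Sketch of the deconvolution estimator (2.6) on a regular grid, d = 1.

    mu_hat, beta_hat : arrays of shape (n_t, n_u, n_x) with the plug-in estimators
                       on a (t, u, x) grid, where the u-grid starts at 0
    phi              : array of shape (n_t,), a mean-zero weight supported in (0, T)
    """
    # L_phi: integrate out the time variable against phi
    Lmu = np.tensordot(phi, mu_hat, axes=(0, 0)) * dt          # shape (n_u, n_x)
    Lbeta = np.tensordot(phi, beta_hat, axes=(0, 0)) * dt
    # T = F_I F_{R^d} L_phi: discrete Fourier transforms in x (axis 1) and in u (axis 0)
    Tmu = np.fft.fft(np.fft.fft(Lmu, axis=1) * dx, axis=0) * du
    Tbeta = np.fft.fft(np.fft.fft(Lbeta, axis=1) * dx, axis=0) * du
    n_u, n_x = Lmu.shape
    w = 2.0 * np.pi * np.fft.fftfreq(n_u, d=du)                # frequencies in the index variable
    keep = (np.abs(Tmu) > kappa1) & (np.abs(w)[:, None] <= r_tilde)
    ratio = np.where(keep, Tbeta / np.where(keep, Tmu, 1.0), 0.0)

    def A(s):
        # F_I^{-1}(ratio)(s) as a function of the x-frequency, then its L2 norm
        dw = np.abs(w[1] - w[0])
        inv = (np.exp(1j * w * s) @ ratio) * dw / (2.0 * np.pi)
        dxi = 2.0 * np.pi / (dx * n_x)                         # x-frequency spacing
        return np.sqrt(np.sum(np.abs(inv) ** 2) * dxi)

    return g0 * A(u0 - v0) / max(A(0.0), kappa0)
```

The quadrature on the frequency grid replaces the exact inverse Fourier transform \mathcal{F}_{I}^{-1}; refining the grid and enlarging the x-window brings this sketch closer to (2.6).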

For the estimate to converge, we need a further (relatively strong) assumption on some data intrinsic to the particle system.

Assumption 2.1.

Given Conditions 2.1, 2.2, and 2.3, there exists \phi\in\dot{L}^{\infty}([0,T];\mathbb{C}), compactly supported in (0,T), such that \mathcal{F}_{I}\mathcal{F}_{\mathbb{R}^{d}}\mathcal{L}_{\phi}\mu\neq 0 almost everywhere on \mathbb{R}\times\mathbb{R}^{d}.

Theorem 2.1 (Main theorem).

Assume Conditions 2.1, 2.2, 2.3, and Assumption 2.1, and take \phi given by Assumption 2.1. There exists a function \mathcal{U} of the number of particles n and the parameters

\vartheta=(h_{1},h_{2},h_{3},\kappa_{0},\kappa_{1},\kappa_{2},r,\tilde{r})

such that

(2.7) \max_{u_{0},v_{0}\in I}\mathbf{E}\left|\hat{G}^{n}_{\vartheta}(u_{0},v_{0})-G(u_{0},v_{0})\right|^{2}\leqslant\mathcal{U}(n,\vartheta)\,,

whenever n,r,\tilde{r} are sufficiently large, h_{1},h_{2},h_{3},\kappa_{1}>0 are sufficiently small, and \kappa_{0},\kappa_{2}>0 satisfy \kappa_{0}<g_{0}\left\|F\right\|_{2} and

\kappa_{2}<\inf\{\mu(t,u,x)\mid t\in\operatorname{supp}(\phi),\,u\in I,\,\left|x\right|\leqslant r\}\,.

Moreover, there exists a sequence of parameter choices (\vartheta_{n})_{n\in\mathbb{N}} such that \mathcal{U}(n,\vartheta_{n})\to 0 as n\to\infty.

Remark 2.1.

Notice that the bound \mathcal{U}(n,\vartheta) is uniform over all u_{0},v_{0}\in I. Hence the cut distance from an estimator \hat{G} to the true graphon function G is bounded by

d_{\square}(\hat{G},G)\stackrel{\textup{def}}{=}\sup_{J_{1},J_{2}\in\mathcal{B}(I)}\left|\int_{J_{1}\times J_{2}}\big(\hat{G}(u,v)-G(u,v)\big)\,du\,dv\right|\leqslant\int_{I\times I}\left|\hat{G}(u,v)-G(u,v)\right|du\,dv\,.

This implies

\mathbf{E}\big(d_{\square}(\hat{G}^{n}_{\vartheta},G)\big)\leqslant\int_{I\times I}\mathbf{E}\left|\hat{G}^{n}_{\vartheta}(u,v)-G(u,v)\right|du\,dv\leqslant\mathcal{U}(n,\vartheta)^{1/2}\,.

Hence we also have convergence of the estimator \hat{G}^{n}_{\vartheta} in the cut metric.

Remark 2.2.

Assumption 2.1 is made for the purpose of dominated convergence and is standard in a variety of applications of the deconvolution method [24]. Yet it is nontrivial to verify, as it involves the distribution over the whole time interval. We discuss this further in Appendix C.

3. Convergence of estimators

3.1. Error bounds for density and drift

We give estimates of the particle density \mu and the intermediate quantity \pi in the general setting. These ultimately contribute to the estimate of the graphon function. From this section onwards, we use the asymptotic comparison symbol \lesssim, where f\lesssim g means that there exists some constant c>0 such that f\leqslant cg. In addition, we write f\lesssim_{q}g if f\leqslant cg for some constant c depending on the quantity q (e.g. the time horizon T or the dimension d).

Estimates of the particle density \mu(t,u,x)

Lemma 3.1.

Assume that Conditions 2.1(1)(3), 2.2(1), and 2.3(1)(2) hold. For t_{0}\in(0,T), u_{0}\in(0,1), x_{0}\in\Omega, we have

\mathbf{E}\left|\hat{\mu}^{n}_{h}(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\leqslant C\Big(n^{-2}h_{3}^{-2-2d}\left\|\nabla K\right\|_{\infty}^{2}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\tfrac{i}{n})^{2}+n^{-2}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\tfrac{i}{n})^{2}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|^{2}_{L^{2}(\mu_{t_{0},\frac{i}{n}})}+n^{-2}h_{2}^{-2}h_{3}^{-2d}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{\infty}^{2}+n^{-2}h_{3}^{-2-2d}\left\|J\right\|_{2}^{2}\left\|\nabla K\right\|_{\infty}^{2}+n^{-3}h_{2}^{-2}\left\|\nabla J\right\|_{\infty}^{2}\sum_{i=1}^{n}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|_{L^{2}(\mu_{t_{0},\frac{i}{n}})}^{2}\Big)+\left|(J\otimes K)_{h}\ast\mu_{t_{0}}(u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\,,

where

(J\otimes K)_{h}\ast\mu_{t_{0}}(u_{0},x_{0})=\int_{I}\int_{\mathbb{R}^{d}}(J\otimes K)_{h}(u_{0}-u,x_{0}-x)\mu(t_{0},u,x)\,dx\,du\,.

Integrating the above pointwise errors, we obtain the following L^{2}-error bound for the estimator \hat{\mu}^{n}_{h,r}.

Corollary 3.1.

Assume the same hypotheses as in Lemma 3.1. Fix a compact interval [\tau_{1},\tau_{2}]\subset(0,T). For n\gg 1, h_{2},h_{3}\ll 1, r\gg 1, we have

\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\mathbb{R}^{d}}\mathbf{E}\left|\hat{\mu}^{n}_{h,r}(t,u,x)-\mu(t,u,x)\right|^{2}dx\,du\,dt\leqslant C\big(\theta_{2,\mu}(r)+\theta_{3,\mu}(h)+r^{d}(n^{-2}h_{3}^{-2-2d}+n^{-2}h_{2}^{-2}h_{3}^{-2d})+n^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{2}^{-2}h_{3}^{-d}\big)\,,

where C is a constant depending on T,d,b,J,K. Here \theta_{2,\mu}:\mathbb{R}_{+}\to\mathbb{R}_{+} is a function such that \theta_{2,\mu}(r)\to 0 as r\to\infty, and \theta_{3,\mu}:\mathbb{R}_{+}^{3}\to\mathbb{R}_{+} is a function such that \theta_{3,\mu}(h)\to 0 as h_{2}+h_{3}\to 0.

Estimates of the drift term \beta(t,u,x)

Lemma 3.2.

Assume Conditions 2.1(1)(3)(4), 2.2(1), 2.3(1)(2), and that b is bounded. Then, for t_{0}\in(0,T), u_{0}\in(0,1), x_{0}\in\mathbb{R}^{d}, we have

\mathbf{E}\left|\hat{\pi}^{n}_{h}(t_{0},u_{0},x_{0})-\pi(t_{0},u_{0},x_{0})\right|^{2}\leqslant C\Big(Td^{2}\sigma_{+}^{2}n^{-1}h_{1}^{-2}h_{2}^{-2}h_{3}^{-2d}+Tn^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-2-2d}\left\|b\right\|_{\infty}^{2}\left\|H\right\|_{2}^{2}\left\|J\right\|_{\infty}^{2}\left\|\nabla K\right\|_{\infty}^{2}+Tn^{-2}h_{1}^{-1}h_{3}^{-2d}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\tfrac{i}{n})^{2}+Tn^{-2}h_{1}^{-1}h_{2}^{-2}h_{3}^{-2d}\left\|b\right\|_{\infty}^{2}\left\|H\right\|_{2}^{2}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{\infty}^{2}+Tn^{-2}\left\|b\right\|_{\infty}^{2}\int_{0}^{T}H_{h_{1}}^{2}(t_{0}-t)\sum_{i=1}^{n}J_{h_{2}}^{2}(u_{0}-\tfrac{i}{n})\left\|K_{h_{3}}(x_{0}-\cdot)\right\|_{L^{2}(\mu_{t,\frac{i}{n}})}^{2}dt+Tn^{-2}h_{1}^{-1}h_{2}^{-1}h_{3}^{-2-2d}\left\|b\right\|_{\infty}^{2}\left\|H\right\|_{2}^{2}\left\|J\right\|_{2}^{2}\left\|\nabla K\right\|_{\infty}^{2}+Tn^{-2}h_{1}^{-1}h_{2}^{-1}h_{3}^{-2d}\left\|H\right\|_{2}^{2}\left\|J\right\|_{2}^{2}\left\|K\right\|_{\infty}^{2}+Tn^{-3}h_{2}^{-4}\left\|b\right\|_{\infty}^{2}\left\|\nabla J\right\|_{\infty}^{2}\int_{0}^{T}H_{h_{1}}^{2}(t_{0}-t)\sum_{i=1}^{n}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|_{L^{2}(\mu_{t,\frac{i}{n}})}^{2}dt\Big)+\left|(H\otimes J\otimes K)_{h}\ast\pi(t_{0},u_{0},x_{0})-\pi(t_{0},u_{0},x_{0})\right|^{2}\,.

Recall that \hat{\beta}^{n}_{h,\kappa}=\frac{\hat{\pi}^{n}_{h}}{\hat{\mu}^{n}_{h}\lor\kappa_{2}}. Integrating the above pointwise errors and using Corollary 3.1, we obtain the following L^{2}-error bound for the estimator \hat{\beta}^{n}_{h,\kappa,r}.

Corollary 3.2.

Assume the same hypotheses as in Lemma 3.2. Fix a compact interval [\tau_{1},\tau_{2}]\subset(0,T). For n\gg 1, h_{1},h_{2},h_{3}\ll 1, r\gg 1, and 0<\kappa_{2}<\inf\{\mu(t,u,x)\mid t\in[\tau_{1},\tau_{2}],\,u\in I,\,\left|x\right|\leqslant r\}, we have

\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\mathbb{R}^{d}}\mathbf{E}\left|\hat{\beta}^{n}_{h,\kappa,r}(t,u,x)-\beta(t,u,x)\right|^{2}dx\,du\,dt\leqslant C\big(\kappa_{2}^{-2}(n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-4}h_{3}^{-d})+\kappa_{2}^{-2}r^{d}(n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-2-2d}+n^{-1}h_{1}^{-2}h_{2}^{-2}h_{3}^{-2d})+\kappa_{2}^{-2}(\theta_{3,\mu}(h)+\theta_{3,\pi}(h))+\theta_{2,\beta}(r)\big)\,,

where the constant C depends on T,d,b,\sigma,H,J,K. Here \theta_{2,\beta}:\mathbb{R}_{+}\to\mathbb{R}_{+} is a function such that \theta_{2,\beta}(r)\to 0 as r\to\infty, and \theta_{3,\pi}:\mathbb{R}_{+}^{3}\to\mathbb{R}_{+} is a function such that \theta_{3,\pi}(h)\to 0 as h_{1}+h_{2}+h_{3}\to 0.

Although the proofs of these bounds are postponed to Section 5, we can already use them to justify the main result (Theorem 2.1).

3.2. Proof of main theorem

Proof of Theorem 2.1.

Step 1: For simplicity, we usually abbreviate \hat{G}^{n}_{\vartheta} as \hat{G}, and similarly for \hat{\mu} and \hat{\beta}.

Fix arbitrary u_{0},v_{0}\in I. Write \hat{G} as

\hat{G}(u_{0},v_{0})=g_{0}\cdot\frac{\hat{A}(u_{0}-v_{0})}{\hat{A}(0)\lor\kappa_{0}}\,,

where

\hat{A}(u)\stackrel{\textup{def}}{=}\left\|\mathcal{F}_{I}^{-1}\big(\frac{\mathcal{T}\hat{\beta}^{n}_{h,\kappa,r}}{\mathcal{T}\hat{\mu}^{n}_{h,r}}\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}^{n}_{h,r}\right|>\kappa_{1},\left|w\right|\leqslant\tilde{r}\}}\big)(u)\right\|_{L^{2}(\mathbb{R}^{d})}\,.

Also let A(u)=g(u)\left\|F\right\|_{2}. When \kappa_{0}<A(0), we have

\left|\hat{G}(u_{0},v_{0})-G(u_{0},v_{0})\right|^{2}=g_{0}^{2}\left|\frac{A(u_{0}-v_{0})(\hat{A}(0)-A(0))}{A(0)(\hat{A}(0)\lor\kappa_{0})}+\frac{A(u_{0}-v_{0})-\hat{A}(u_{0}-v_{0})}{\hat{A}(0)\lor\kappa_{0}}\right|^{2}\lesssim\frac{G(u_{0},v_{0})^{2}\left|\hat{A}(0)\lor\kappa_{0}-A(0)\right|^{2}}{(\hat{A}(0)\lor\kappa_{0})^{2}}+\frac{\left|A(u_{0}-v_{0})-\hat{A}(u_{0}-v_{0})\right|^{2}}{(\hat{A}(0)\lor\kappa_{0})^{2}}\lesssim\kappa_{0}^{-2}\left(\left|\hat{A}(0)-A(0)\right|^{2}+\left|A(u_{0}-v_{0})-\hat{A}(u_{0}-v_{0})\right|^{2}\right)\,.

So it suffices to bound the expressions \mathbf{E}\left|\hat{A}(u)-A(u)\right|^{2}, and we will do that in the following steps.

Step 2: By Minkowski’s inequality, we have for each u\in\mathbb{R} that

|I1(𝒯β^h,κ,rn𝒯μ^h,rn𝟏{|𝒯μ^h,rn|>κ1,|w|r~})(u)L2(d)g(u)FL2(d)|\displaystyle\left|\left\|\mathcal{F}_{I}^{-1}\big{(}\frac{\mathcal{T}\hat{\beta}^{n}_{h,\kappa,r}}{\mathcal{T}\hat{\mu}^{n}_{h,r}}\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}^{n}_{h,r}\right|>\kappa_{1},\left|w\right|\leqslant\tilde{r}\}}\big{)}(u)\right\|_{L^{2}(\mathbb{R}^{d})}-g(u)\left\|F\right\|_{L^{2}(\mathbb{R}^{d})}\right|
I1(𝒯β^h,κ,rn𝒯μ^h,rn𝟏{|𝒯μ^h,rn|>κ1,|w|r~})(u)g(u)dFL2(d)\displaystyle\leqslant\left\|\mathcal{F}_{I}^{-1}\big{(}\frac{\mathcal{T}\hat{\beta}^{n}_{h,\kappa,r}}{\mathcal{T}\hat{\mu}^{n}_{h,r}}\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}^{n}_{h,r}\right|>\kappa_{1},\left|w\right|\leqslant\tilde{r}\}}\big{)}(u)-g(u)\mathcal{F}_{\mathbb{R}^{d}}F\right\|_{L^{2}(\mathbb{R}^{d})}
=I1((𝒯β^h,κ,rn𝒯μ^h,rn𝟏{|𝒯μ^h,rn|>κ1,|w|r~}IgdF))(u)L2(d).\displaystyle=\left\|\mathcal{F}_{I}^{-1}\left(\big{(}\frac{\mathcal{T}\hat{\beta}^{n}_{h,\kappa,r}}{\mathcal{T}\hat{\mu}^{n}_{h,r}}\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}^{n}_{h,r}\right|>\kappa_{1},\left|w\right|\leqslant\tilde{r}\}}-\mathcal{F}_{I}g\mathcal{F}_{\mathbb{R}^{d}}F\big{)}\right)(u)\right\|_{L^{2}(\mathbb{R}^{d})}\,.

Then

𝐄|A^(u)A(u)|2\displaystyle\mathbf{E}\left|\hat{A}(u)-A(u)\right|^{2} 𝐄I1(𝒯β^h,κ,rn𝒯μ^h,rn𝟏{|𝒯μ^h,rn|>κ1,|w|r~}IgdF)(u)L2(d)2\displaystyle\lesssim\mathbf{E}\left\|\mathcal{F}_{I}^{-1}\big{(}\frac{\mathcal{T}\hat{\beta}^{n}_{h,\kappa,r}}{\mathcal{T}\hat{\mu}^{n}_{h,r}}\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}^{n}_{h,r}\right|>\kappa_{1},\left|w\right|\leqslant\tilde{r}\}}-\mathcal{F}_{I}g\mathcal{F}_{\mathbb{R}^{d}}F\big{)}(u)\right\|_{L^{2}(\mathbb{R}^{d})}^{2}
r~2𝐄|w|r~(𝒯β^h,κ,rn𝒯μ^h,rn𝟏{|𝒯μ^h,rn|>κ1}IgdF)(w)L2(d)2𝑑w\displaystyle\lesssim\tilde{r}^{2}\mathbf{E}\int_{\left|w\right|\leqslant\tilde{r}}\left\|\big{(}\frac{\mathcal{T}\hat{\beta}^{n}_{h,\kappa,r}}{\mathcal{T}\hat{\mu}^{n}_{h,r}}\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}^{n}_{h,r}\right|>\kappa_{1}\}}-\mathcal{F}_{I}g\mathcal{F}_{\mathbb{R}^{d}}F\big{)}(w)\right\|_{L^{2}(\mathbb{R}^{d})}^{2}dw
+dF|w|>r~eiuwIg(w)𝑑wL2(d)2.\displaystyle\qquad\qquad+\left\|\mathcal{F}_{\mathbb{R}^{d}}F\int_{\left|w\right|>\tilde{r}}e^{iuw}\mathcal{F}_{I}g(w)dw\right\|_{L^{2}(\mathbb{R}^{d})}^{2}\,.

For the second term, we observe by Parseval’s identity that

\left\|\mathcal{F}_{I}^{-1}(\mathcal{F}_{I}g\,\mathcal{F}_{\mathbb{R}^{d}}F)(u)\right\|_{L^{2}(\mathbb{R}^{d})}=\left|g(u)\right|\left\|F\right\|_{2}\leqslant\left\|F\right\|_{2}<\infty\,,

so that

\left\|\mathcal{F}_{\mathbb{R}^{d}}F\int_{\left|w\right|>\tilde{r}}e^{iuw}\mathcal{F}_{I}g(w)dw\right\|_{L^{2}(\mathbb{R}^{d})}^{2}\leqslant\left\|F\right\|_{2}^{2}\left(\int_{\left|w\right|>\tilde{r}}\left|\mathcal{F}_{I}g(w)\right|dw\right)^{2}\to 0

as \tilde{r}\to\infty due to Condition 2.2(3), at some rate \tilde{\theta}(\tilde{r}) independent of u.

Step 3: Now we look at the first term under Assumption 2.1,

\tilde{r}^{2}\,\mathbf{E}\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\left|\big(\frac{\mathcal{T}\hat{\beta}^{n}_{h,\kappa,r}}{\mathcal{T}\hat{\mu}^{n}_{h,r}}\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}^{n}_{h,r}\right|>\kappa_{1},\left|w\right|\leqslant\tilde{r}\}}-\frac{\mathcal{T}\beta}{\mathcal{T}\mu}\big)(w,\xi)\right|^{2}d\xi\,dw\,.

Notice that \mathcal{T}\beta=(\mathcal{F}_{I}g)(\mathcal{F}_{\mathbb{R}^{d}}F)(\mathcal{T}\mu). Following [28], we split the integrand by

𝒯β^𝒯μ^IgdF=(𝒯β^𝒯β)+(Ig)(dF)(𝒯μ𝒯μ^)𝒯μ^,\frac{\mathcal{T}\hat{\beta}}{\mathcal{T}\hat{\mu}}-\mathcal{F}_{I}g\mathcal{F}_{\mathbb{R}^{d}}F=\frac{(\mathcal{T}\hat{\beta}-\mathcal{T}\beta)+(\mathcal{F}_{I}g)(\mathcal{F}_{\mathbb{R}^{d}}F)(\mathcal{T}\mu-\mathcal{T}\hat{\mu})}{\mathcal{T}\hat{\mu}}\,,

when the division is well-defined. Then

|𝒯β^𝒯μ^𝟏{|𝒯μ^|>κ1,|w|r~}IgdF|2\displaystyle\left|\frac{\mathcal{T}\hat{\beta}}{\mathcal{T}\hat{\mu}}\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}\right|>\kappa_{1},\left|w\right|\leqslant\tilde{r}\}}-\mathcal{F}_{I}g\mathcal{F}_{\mathbb{R}^{d}}F\right|^{2}
κ12|𝒯β^𝒯β|2+κ12|g|2|F|2|𝒯μ𝒯μ^|2+|IgdF(𝟏{|𝒯μ^|κ1}+𝟏{|w|>r~})|2\displaystyle\quad\lesssim\kappa_{1}^{-2}\left|\mathcal{T}\hat{\beta}-\mathcal{T}\beta\right|^{2}+\kappa_{1}^{-2}\left|\mathcal{F}g\right|^{2}\left|\mathcal{F}F\right|^{2}\left|\mathcal{T}\mu-\mathcal{T}\hat{\mu}\right|^{2}+\left|\mathcal{F}_{I}g\mathcal{F}_{\mathbb{R}^{d}}F(\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}\right|\leqslant\kappa_{1}\}}+\mathbf{1}_{\{\left|w\right|>\tilde{r}\}})\right|^{2}
=:𝒜1+𝒜2+𝒜3.\displaystyle\quad=:\mathcal{A}_{1}+\mathcal{A}_{2}+\mathcal{A}_{3}\,.

For \mathcal{A}_{1}, Parseval’s identity gives

dκ12|𝒯β^𝒯β|2(w,ξ)𝑑ξ𝑑w\displaystyle\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\kappa_{1}^{-2}\left|\mathcal{T}\hat{\beta}-\mathcal{T}\beta\right|^{2}(w,\xi)d\xi dw =κ12d|ϕβ^(u,x)ϕβ(u,x)|2𝑑x𝑑u\displaystyle=\kappa_{1}^{-2}\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\left|\mathcal{L}_{\phi}\hat{\beta}(u,x)-\mathcal{L}_{\phi}\beta(u,x)\right|^{2}dxdu
κ12ϕ22τ1τ2Id|β^(t,u,x)β(t,u,x)|2𝑑x𝑑u𝑑t,\displaystyle\leqslant\kappa_{1}^{-2}\left\|\phi\right\|_{2}^{2}\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\mathbb{R}^{d}}\left|\hat{\beta}(t,u,x)-\beta(t,u,x)\right|^{2}dxdudt\,,

where \operatorname{supp}(\phi)\subset[\tau_{1},\tau_{2}]. From Corollary 3.2 we get

𝐄dκ12|𝒯β^(w,ξ)𝒯β(w,ξ)|2𝑑ξ𝑑wT,d,b,σ,H,J,K,ϕ\displaystyle\mathbf{E}\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\kappa_{1}^{-2}\left|\mathcal{T}\hat{\beta}(w,\xi)-\mathcal{T}\beta(w,\xi)\right|^{2}d\xi dw\lesssim_{T,d,b,\sigma,H,J,K,\phi}
κ12κ22(n1h11h21h3d+n2h11h24h3d)\displaystyle\qquad\qquad\kappa_{1}^{-2}\kappa_{2}^{-2}\big{(}n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-4}h_{3}^{-d}\big{)}
+κ12κ22rd(n1h11h22h322d+n1h12h22h32d)\displaystyle\qquad\quad+\kappa_{1}^{-2}\kappa_{2}^{-2}r^{d}(n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-2-2d}+n^{-1}h_{1}^{-2}h_{2}^{-2}h_{3}^{-2d})
+κ12κ22(θ3,μ(h)+θ3,π(h))+κ12θ2,β(r).\displaystyle\qquad\quad+\kappa_{1}^{-2}\kappa_{2}^{-2}(\theta_{3,\mu}(h)+\theta_{3,\pi}(h))+\kappa_{1}^{-2}\theta_{2,\beta}(r)\,.

For \mathcal{A}_{2}, similarly we have

d|𝒯μ^𝒯μ|2(w,ξ)𝑑ξ𝑑wϕ220Td|μ^(t,u,x)μ(t,u,x)|2𝑑x𝑑u𝑑t.\displaystyle\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\left|\mathcal{T}\hat{\mu}-\mathcal{T}\mu\right|^{2}(w,\xi)d\xi dw\leqslant\left\|\phi\right\|_{2}^{2}\int_{0}^{T}\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\left|\hat{\mu}(t,u,x)-\mu(t,u,x)\right|^{2}dxdudt\,.

Also, \left|\mathcal{F}g\right|\leqslant\left\|g\right\|_{1}\leqslant 2, and \left|\mathcal{F}F\right|\leqslant\left\|F\right\|_{1}<\infty. Along with Corollary 3.1 we have that

𝐄dκ12|g|2|F|2|𝒯μ𝒯μ^|2𝑑ξ𝑑wT,d,b,σ,J,K,ϕ\displaystyle\mathbf{E}\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\kappa_{1}^{-2}\left|\mathcal{F}g\right|^{2}\left|\mathcal{F}F\right|^{2}\left|\mathcal{T}\mu-\mathcal{T}\hat{\mu}\right|^{2}d\xi dw\lesssim_{T,d,b,\sigma,J,K,\phi}
κ12(θ2,μ(r)+θ3,μ(h))\displaystyle\quad\qquad\kappa_{1}^{-2}(\theta_{2,\mu}(r)+\theta_{3,\mu}(h))
+κ12rd(n2h322d+n2h22h32d)+κ12(n1h21h3d+n2h22h3d),\displaystyle\quad\quad+\kappa_{1}^{-2}r^{d}(n^{-2}h_{3}^{-2-2d}+n^{-2}h_{2}^{-2}h_{3}^{-2d})+\kappa_{1}^{-2}(n^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{2}^{-2}h_{3}^{-d})\,,

For \mathcal{A}_{3}, we first observe that

𝐄|Ig(w)dF(ξ)𝟏{|𝒯μ^|κ1}(w,ξ)|2\displaystyle\mathbf{E}\left|\mathcal{F}_{I}g(w)\mathcal{F}_{\mathbb{R}^{d}}F(\xi)\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}\right|\leqslant\kappa_{1}\}}(w,\xi)\right|^{2}
|Ig(w)dF(ξ)|2𝐄[𝟏{|𝒯μ^𝒯μ|κ1}+𝟏{|𝒯μ|2κ1}]\displaystyle\leqslant\left|\mathcal{F}_{I}g(w)\mathcal{F}_{\mathbb{R}^{d}}F(\xi)\right|^{2}\mathbf{E}\left[\mathbf{1}_{\{\left|\mathcal{T}\hat{\mu}-\mathcal{T}\mu\right|\geqslant\kappa_{1}\}}+\mathbf{1}_{\{\left|\mathcal{T}\mu\right|\leqslant 2\kappa_{1}\}}\right]
|Ig(w)dF(ξ)|2(κ12𝐄|𝒯(μ^μ)(w,ξ)|2+𝟏{|𝒯μ|2κ1}(w,ξ)).\displaystyle\leqslant\left|\mathcal{F}_{I}g(w)\mathcal{F}_{\mathbb{R}^{d}}F(\xi)\right|^{2}\left(\kappa_{1}^{-2}\mathbf{E}\left|\mathcal{T}(\hat{\mu}-\mu)(w,\xi)\right|^{2}+\mathbf{1}_{\{\left|\mathcal{T}\mu\right|\leqslant 2\kappa_{1}\}}(w,\xi)\right)\,.

Integrating the first part gives

d|g(w)F(ξ)|2κ12𝐄|𝒯(μ^μ)(w,ξ)|2𝑑ξ𝑑w\displaystyle\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\left|\mathcal{F}g(w)\mathcal{F}F(\xi)\right|^{2}\kappa_{1}^{-2}\mathbf{E}\left|\mathcal{T}(\hat{\mu}-\mu)(w,\xi)\right|^{2}d\xi dw
κ12g12F12d𝐄|𝒯(μ^μ)(w,ξ)|2𝑑ξ𝑑w\displaystyle\leqslant\kappa_{1}^{-2}\left\|g\right\|_{1}^{2}\left\|F\right\|_{1}^{2}\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\mathbf{E}\left|\mathcal{T}(\hat{\mu}-\mu)(w,\xi)\right|^{2}d\xi dw
g,b,ϕκ120Td𝐄|μ^h,rn(t,u,x)μ(t,u,x)|2𝑑x𝑑u𝑑t,\displaystyle\lesssim_{g,b,\phi}\kappa_{1}^{-2}\int_{0}^{T}\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\mathbf{E}\left|\hat{\mu}^{n}_{h,r}(t,u,x)-\mu(t,u,x)\right|^{2}dxdudt\,,

which can be bounded in the same way as \mathcal{A}_{2} using Corollary 3.1.

Integrating for the second part, we get

(3.1) \int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\left|\mathcal{F}_{I}g(w)\mathcal{F}_{\mathbb{R}^{d}}F(\xi)\right|^{2}\mathbf{1}_{\{\left|\mathcal{T}\mu\right|\leqslant 2\kappa_{1}\}}(w,\xi)\,d\xi\,dw\,.

Under Assumption 2.1, we apply dominated convergence to see that this quantity goes to 0 as \kappa_{1}\to 0.

In addition,

0Td𝐄|Ig(w)dF(ξ)𝟏{|w|>r~}(w,ξ)|2\displaystyle\int_{0}^{T}\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\mathbf{E}\left|\mathcal{F}_{I}g(w)\mathcal{F}_{\mathbb{R}^{d}}F(\xi)\mathbf{1}_{\{\left|w\right|>\tilde{r}\}}(w,\xi)\right|^{2}
0Td|Ig(w)𝟏{|w|>r~}|2|dF(ξ)|2\displaystyle\leqslant\int_{0}^{T}\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}\left|\mathcal{F}_{I}g(w)\mathbf{1}_{\{\left|w\right|>\tilde{r}\}}\right|^{2}\left|\mathcal{F}_{\mathbb{R}^{d}}F(\xi)\right|^{2}
=TF22{|w|>r~}|Ig(w)|2.\displaystyle=T\left\|F\right\|_{2}^{2}\int_{\{\left|w\right|>\tilde{r}\}}\left|\mathcal{F}_{I}g(w)\right|^{2}\,.

Condition 2.2(3) guarantees that it converges to 0 faster than \tilde{r}^{-2} as \tilde{r}\to\infty. We denote the total convergence rate of \mathcal{A}_{3} by \theta_{1}(\tilde{r},\kappa_{1}).

To summarize, we define

𝒰(n,ϑ)\displaystyle\mathcal{U}(n,\vartheta) =C(κ02r~2θ~(r~)\displaystyle=C(\kappa_{0}^{-2}\tilde{r}^{2}\tilde{\theta}(\tilde{r})
+r~2κ02θ1(r~,κ1)\displaystyle+\tilde{r}^{2}\kappa_{0}^{-2}\theta_{1}(\tilde{r},\kappa_{1})
+r~2κ02κ12(θ2,β(r)+κ22(θ2,μ(r)+θ3,μ(h)+θ3,π(h)))\displaystyle+\tilde{r}^{2}\kappa_{0}^{-2}\kappa_{1}^{-2}(\theta_{2,\beta}(r)+\kappa_{2}^{-2}(\theta_{2,\mu}(r)+\theta_{3,\mu}(h)+\theta_{3,\pi}(h)))
+r~2κ02κ12κ22(n1h11h21h3d+n2h11h24h3d)\displaystyle+\tilde{r}^{2}\kappa_{0}^{-2}\kappa_{1}^{-2}\kappa_{2}^{-2}(n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-4}h_{3}^{-d})
+r~2κ02κ12κ22rd(n1h11h22h322d+n1h12h22h32d)).\displaystyle+\tilde{r}^{2}\kappa_{0}^{-2}\kappa_{1}^{-2}\kappa_{2}^{-2}r^{d}(n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-2-2d}+n^{-1}h_{1}^{-2}h_{2}^{-2}h_{3}^{-2d}))\,.

Here the constant C depends only on T,d,b,\sigma,H,J,K, which are fixed for the model. The upper bound is independent of u_{0},v_{0}, so we obtain the uniform bound presented in the theorem.

We can fix \kappa_{0} and \phi. Let \vartheta_{n}=(h_{1}^{(n)},h_{2}^{(n)},h_{3}^{(n)},\kappa_{0},\kappa_{1}^{(n)},\kappa_{2}(r^{(n)}),r^{(n)},\tilde{r}^{(n)}), where r^{(n)},\tilde{r}^{(n)}\to\infty slowly enough as n\to\infty, \kappa_{2}(r)=\frac{1}{2}\inf\{\mu(t,u,x)\mid t\in\operatorname{supp}(\phi),\,u\in I,\,\left|x\right|\leqslant r\}, and \kappa_{1}^{(n)},h_{1}^{(n)},h_{2}^{(n)},h_{3}^{(n)}\to 0 accordingly. We may guarantee that the quantities \theta_{1}, \theta_{2,\beta}, \theta_{2,\mu}, \theta_{3,\mu}, \theta_{3,\pi} all converge to 0. Then \mathcal{U}(n,\vartheta_{n})\to 0 as n\to\infty, finishing the proof. ∎

4. Minimax analysis on plug-in estimators

In Section 3.1 we presented upper bounds for the estimation errors \mathbf{E}\left|\hat{\mu}^{n}_{h}-\mu\right|^{2} and \mathbf{E}\left|\hat{\pi}^{n}_{h}-\pi\right|^{2}. These bounds are similar to those given in [10] and are not tight, although they do converge. However, the estimators themselves are indeed optimal whenever the parameters h_{1}, h_{2}, and h_{3} are properly chosen. In this section, we conduct a minimax analysis, studying both upper and lower bounds on the estimation errors, to demonstrate this optimality.

We first look at an improved upper bound on the error of the plug-in estimator \hat{\mu}^{n}_{h}.

Lemma 4.1.
(1) Assume Conditions 2.1(1)(3), 2.2(1), and 2.3 hold. For every t_{0}\in(0,T), u_{0}\in(0,1), x_{0}\in\mathbb{R}^{d}, we have

\mathbf{E}\left|\hat{\mu}^{n}_{h}(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\leqslant C_{0}\big(n^{-1}h_{2}^{-1}h_{3}^{-d}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{2}^{2}+n^{-2}h_{2}^{-2}h_{3}^{-2d}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{\infty}^{2}+n^{-2}h_{3}^{-2-2d}\left\|J\right\|_{\infty}^{2}\left\|\nabla K\right\|_{\infty}^{2}+n^{-2}h_{2}^{-2}h_{3}^{-d}\left\|\nabla J\right\|_{\infty}^{2}\left\|K\right\|_{2}^{2}\big)+\left|(J\otimes K)_{h}\ast\mu_{t_{0}}(u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\,,

where C_{0}>0 is independent of the bandwidths h_{2},h_{3} and the number of particles n.

(2) Assume further that there exist some p>2 and c_{p}>0 such that

\mathcal{W}_{p}(\mu_{0,u},\mu_{0,v})\leqslant c_{p}\left|u-v\right|\,,\qquad\forall\,u,v\in I\,.

Then, for every t_{0}\in(0,T), u_{0}\in(0,1), x_{0}\in\mathbb{R}^{d}, we have

\mathbf{E}\left|\hat{\mu}^{n}_{h}(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\leqslant C_{0}\big(n^{-1}h_{2}^{-1}h_{3}^{-d}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{2}^{2}+n^{-2}h_{2}^{-2}h_{3}^{-2d}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{\infty}^{2}+n^{-2}h_{3}^{-2-\frac{p+2}{p}d}\left\|J\right\|_{\infty}^{2}\left\|\nabla K\right\|_{\infty}^{2}+n^{-2}h_{2}^{-2}h_{3}^{-d}\left\|\nabla J\right\|_{\infty}^{2}\left\|K\right\|_{2}^{2}\big)+\left|(J\otimes K)_{h}\ast\mu_{t_{0}}(u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\,,

where C_{0}>0 is independent of the bandwidths h_{1},h_{2},h_{3} and the number of particles n.

There is also an improved upper bound on the error of the plug-in estimator \hat{\beta}^{n}_{h,\kappa}.

Lemma 4.2.

Assume Conditions 2.1(1)(3)(4), 2.2(1), and 2.3. Assume further that b is bounded and has bounded first and second derivatives. For every t_{0}\in(0,T), u_{0}\in(0,1), x_{0}\in\mathbb{R}^{d}, we have

\mathbf{E}\left|\hat{\pi}^{n}_{h}(t_{0},u_{0},x_{0})-\pi(t_{0},u_{0},x_{0})\right|^{2}\lesssim n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-2}h_{3}^{-1-2d}+n^{-2}h_{1}^{-1}h_{3}^{-2-2d}+\left|(H\otimes J\otimes K)_{h}\ast\pi(t_{0},u_{0},x_{0})-\pi(t_{0},u_{0},x_{0})\right|^{2}\,.
Remark 4.1.

We will use the above results for the minimax analysis below. However, they are not well suited for controlling the total error in L^{2}([0,T]\times I\times\mathbb{R}^{d}) (as needed in the proof of Theorem 2.1), since the improvement relies on the local boundedness of the density function \mu, which follows from a crude estimate of the form \mu(t,u,x)\leqslant\exp(c(1+\left|x\right|^{2})) (see, for instance, Corollary 8.2.2 of [9]).

4.1. Anisotropic Hölder smoothness classes

In the estimates of Lemma 4.1, all items are explicitly quantitative except the bias term

(JK)hμt0(u0,x0)μ(t0,u0,x0)=\displaystyle(J\otimes K)_{h}\ast\mu_{t_{0}}(u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})=
dJh2(u0u)Kh3(x0x)(μ(t0,u,x)μ(t0,u0,x0))𝑑x𝑑u.\displaystyle\qquad\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}J_{h_{2}}(u_{0}-u)K_{h_{3}}(x_{0}-x)(\mu(t_{0},u,x)-\mu(t_{0},u_{0},x_{0}))dxdu\,.

The analysis of this quantity relies on some continuity of the density function μ\mu. Here, we introduce a specific class of particle systems following the idea in [28].

Definition 4.1.

Let α=(α1,,αd)d\alpha=(\alpha_{1},\dots,\alpha_{d})\in\mathbb{N}^{d} be a multi-index. Its norm is given by

|α|=i=1dαi,\left|\alpha\right|=\sum_{i=1}^{d}\alpha_{i}\,,

and the differential operator of order α\alpha is defined by Dα=1α1dαdD^{\alpha}=\partial^{\alpha_{1}}_{1}\cdots\partial^{\alpha_{d}}_{d}.

Definition 4.2.

Let UdU\subset\mathbb{R}^{d} be an open neighborhood of a point x0dx_{0}\in\mathbb{R}^{d}. We say a function f:df:\mathbb{R}^{d}\to\mathbb{R} belongs to the ss-Hölder continuity class at (x0,U)(x_{0},U) with s>0s>0 if for every x,yUx,y\in U and every multi-index α\alpha with |α|s\left|\alpha\right|\leqslant s, we have

|Dαf(x)Dαf(y)|C|xy|ss,\left|D^{\alpha}f(x)-D^{\alpha}f(y)\right|\leqslant C\left|x-y\right|^{s-\lfloor s\rfloor}\,,

where C=C(f,U)C=C(f,U) is the smallest constant that satisfies the above inequality. We denote this class of functions by s(x0)\mathcal{H}^{s}(x_{0}). The s\mathcal{H}^{s}-norm in this class is defined by

fs(x0)=supxU|f(x)|+C(f,U).\left\|f\right\|_{\mathcal{H}^{s}(x_{0})}=\sup_{x\in U}\left|f(x)\right|+C(f,U)\,.
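For instance, for s\in(0,1) and any bounded open neighborhood U of x_{0}, the function f(x)=\left|x-x_{0}\right|^{s} belongs to \mathcal{H}^{s}(x_{0}): here \lfloor s\rfloor=0, and \left|\,\left|x-x_{0}\right|^{s}-\left|y-x_{0}\right|^{s}\right|\leqslant\left|x-y\right|^{s} for all x,y\in U, so that C(f,U)\leqslant 1.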

4.2. Minimax estimation for density

Notice that the particle density function μ\mu solves the following system of equations [18]

tμt,u=(μt,uIdb(,y)G(u,v)μt,v(dy)𝑑v)+12i,j=1dij2((σσT)ijμt,u),uI.\partial_{t}\mu_{t,u}=-\nabla\cdot\left(\mu_{t,u}\int_{I}\int_{\mathbb{R}^{d}}b(\cdot,y)G(u,v)\mu_{t,v}(dy)dv\right)+\frac{1}{2}\sum_{i,j=1}^{d}\partial_{ij}^{2}((\sigma\sigma^{T})_{ij}\mu_{t,u})\,,\quad u\in I\,.

This is a system of fully coupled Fokker-Planck equations, and the solution is uniquely determined by (b,σ,G,μ0)𝒫(b,\sigma,G,\mu_{0})\in\mathcal{P}, where 𝒫\mathcal{P} is the class of (b,σ,G,μ0)(b,\sigma,G,\mu_{0}) satisfying Condition 2.1(1)(3), 2.2(1), and 2.3. We denote by 𝒫(b,σ,G,μ0)μ=S(b,σ,G,μ0)\mathcal{P}\ni(b,\sigma,G,\mu_{0})\mapsto\mu=S(b,\sigma,G,\mu_{0}) the solution operator. We consider a specific class of coefficients and initial data.

For s>0s>0, we define

𝒜s(t0,u0,x0)={(b,σ,G,μ0)𝒫\nonscript|\nonscriptμ=S(b,σ,G,μ0),μt0,u0s(x0)},\mathcal{A}^{s}(t_{0},u_{0},x_{0})=\{(b,\sigma,G,\mu_{0})\in\mathcal{P}\nonscript\>|\nonscript\>\mathopen{}\allowbreak\mu=S(b,\sigma,G,\mu_{0}),\mu_{t_{0},u_{0}}\in\mathcal{H}^{s}(x_{0})\}\,,

and set

𝒜s(t0,x0)=uI𝒜s(t0,u,x0).\mathcal{A}^{s}(t_{0},x_{0})=\bigcap_{u\in I}\mathcal{A}^{s}(t_{0},u,x_{0})\,.

Moreover, we consider a restriction of this class

𝒜Ls(t0,x0)={(b,σ,G,μ0)𝒫\nonscript|\nonscriptS(b,σ,G,μ0)t0,u0s(x0)+bL}\mathcal{A}^{s}_{L}(t_{0},x_{0})=\{(b,\sigma,G,\mu_{0})\in\mathcal{P}\nonscript\>|\nonscript\>\mathopen{}\allowbreak\left\|S(b,\sigma,G,\mu_{0})_{t_{0},u_{0}}\right\|_{\mathcal{H}^{s}(x_{0})}+\left\|b\right\|_{\infty}\leqslant L\}

for L>0L>0.

Several articles have discussed the richness of these classes of functions. In particular, Proposition 13 in [28] gives an example of homogeneous McKean-Vlasov particle systems that fall into this class. With some slight modifications to its proof, we obtain the following result.
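To illustrate how such homogeneous systems fit into the present framework, note that when G\equiv 1 and the initial law is type-independent, \mu_{0,u}\equiv\mu_{0}, uniqueness forces \mu_{t,u}\equiv\mu_{t} for all u\in I, and the Fokker-Planck system above reduces to the single McKean-Vlasov equation

\partial_{t}\mu_{t}=-\nabla\cdot\left(\mu_{t}\int_{\mathbb{R}^{d}}b(\cdot,y)\mu_{t}(dy)\right)+\frac{1}{2}\sum_{i,j=1}^{d}\partial_{ij}^{2}((\sigma\sigma^{T})_{ij}\mu_{t})\,,

so the homogeneous McKean-Vlasov setting is a special case of the present one.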

Proposition 4.1.

Let σ=σ0Id×d\sigma=\sigma_{0}I_{d\times d} for some σ0>0\sigma_{0}>0. Let b(x,y)=V(x)+F(xy)b(x,y)=V(x)+F(x-y) with F,VCc1F,V\in C_{c}^{1} and

Vs+Fs+supuIμ0,us′′<,\left\|V\right\|_{\mathcal{H}^{s}}+\left\|F\right\|_{\mathcal{H}^{s^{\prime}}}+\sup_{u\in I}\left\|\mu_{0,u}\right\|_{\mathcal{H}^{s^{\prime\prime}}}<\infty\,,

for some s,s>1s,s^{\prime}>1 with ss\notin\mathbb{Z}, s′′>0s^{\prime\prime}>0. Here, s\mathcal{H}^{s} denotes the global Hölder class (where we simply choose U=dU=\mathbb{R}^{d}).

Suppose further that μ0,u\mu_{0,u} are probability measures with finite first moments and continuous density functions uniformly bounded over uIu\in I. Then, for every t0(0,T)t_{0}\in(0,T) and x0dx_{0}\in\mathbb{R}^{d}, we have μt0,us(x0)\mu_{t_{0},u}\in\mathcal{H}^{s}(x_{0}) for all uIu\in I.

We now present the minimax theorem over particle systems within these restricted smoothness classes.

Theorem 4.1.

Let L>0L>0 and s(0,1)s\in(0,1). Assume one of the following holds:

  1. (a)

    Hypothesis of Lemma 4.1(1) and s12s\geqslant\frac{1}{2},

  2. (b)

    Hypothesis of Lemma 4.1(2) with p>2p>2 and s(0,12)s\in(0,\frac{1}{2}) such that p(24s)(p2)dp(2-4s)\leqslant(p-2)d.

Then for every t0(0,T)t_{0}\in(0,T), u0(0,1)u_{0}\in(0,1), x0dx_{0}\in\mathbb{R}^{d}, we have

(4.1) sup(b,σ,G,μ0)𝒜Ls(t0,x0)infh2,h3>0𝐄|μ^hn(t0,u0,x0)μ(t0,u0,x0)|2Cn2sd+3s.\sup_{(b,\sigma,G,\mu_{0})\in\mathcal{A}^{s}_{L}(t_{0},x_{0})}\inf_{h_{2},h_{3}>0}\mathbf{E}\left|\hat{\mu}^{n}_{h}(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\leqslant Cn^{-\frac{2s}{d+3s}}\,.

Moreover,

(4.2) infμ^sup(b,σ,G,μ0)𝒜Ls(t0,x0)𝐄|μ^μ(t0,u0,x0)|2cn2sd+3s,\inf_{\hat{\mu}}\sup_{(b,\sigma,G,\mu_{0})\in\mathcal{A}^{s}_{L}(t_{0},x_{0})}\mathbf{E}\left|\hat{\mu}-\mu(t_{0},u_{0},x_{0})\right|^{2}\geqslant cn^{-\frac{2s}{d+3s}}\,,

where the infimum is taken over all possible estimators of μ(t0,u0,x0)\mu(t_{0},u_{0},x_{0}) constructed from μt0n\mu^{n}_{t_{0}}. Both constants CC and cc depend only on the parameters T,d,LT,d,L, the kernels J,KJ,K, the function ρI\rho_{I} given by Condition 2.3(3), and the values of μ\mu in a small neighborhood of t0,u0,x0t_{0},u_{0},x_{0}.
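The following back-of-the-envelope computation (a sketch only; the full argument is in Section 6.1) indicates where the exponent in (4.1) comes from: balancing the leading variance term of Lemma 4.1 against the bias bound of order h_{2}^{2}+h_{3}^{2s} obtained via Lemma 6.2 and the Hölder condition gives

n^{-1}h_{2}^{-1}h_{3}^{-d}\asymp h_{2}^{2}\asymp h_{3}^{2s}\quad\Longrightarrow\quad h_{2}\asymp h_{3}^{s}\,,\quad h_{3}\asymp n^{-\frac{1}{d+3s}}\,,\quad\mathbf{E}\left|\hat{\mu}^{n}_{h}(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\lesssim n^{-\frac{2s}{d+3s}}\,.

For this choice the only potentially binding remaining term of Lemma 4.1(1) is n^{-2}h_{3}^{-2-2d}, and n^{-2}h_{3}^{-2-2d}\lesssim n^{-\frac{2s}{d+3s}} if and only if 4s\geqslant 2, which is where assumption (a) enters; under the Wasserstein continuity of Lemma 4.1(2) this term is replaced by n^{-2}h_{3}^{-2-\frac{p+2}{p}d}, which is of lower order precisely when p(2-4s)\leqslant(p-2)d, matching assumption (b).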

The proofs are written in Section 6.1. Here we make several remarks on the above results.

Remark 4.2.

Without the extra assumption on the pp-Wasserstein continuity on initial data in Lemma 4.1(2), we would obtain a suboptimal upper bound when s<12s<\frac{1}{2}, namely n1d+1+sn^{-\frac{1}{d+1+s}} (though we still attain the optimal bound when s12s\geqslant\frac{1}{2}).

Remark 4.3.

We remark that the asymptotic rate is slower than that of [28], namely n^{-\frac{2s}{d+3s}} rather than n^{-\frac{2s}{d+2s}}. This is due to the heterogeneity of our graphon particle system: both the index gap and the density gap between the target particle u_{0} and the regularly spaced observations are of order O(n^{-1}), which reduces the approximation accuracy. It is, however, possible to estimate the averaged density \bar{\mu}=\int_{I}\mu_{u}du using exactly the same strategy as [28], and that estimator should exhibit the same asymptotic behavior as in [28].

Remark 4.4.

Note that our algorithm for estimating \mu is not adaptive to the observed data. Instead, users are free to set the parameters (e.g. the bandwidths) according to their own accuracy demands, and the parameters are fixed from the start. As long as the bandwidths are chosen appropriately, our estimator still achieves optimality. This also improves computational efficiency compared to the data-driven Goldenshluger-Lepski algorithm applied in [28], which selects the best bandwidths among a set of candidates by making O(n)-many comparisons. The adaptive estimator, on the other hand, automatically fits the data with the best parameters and produces an error only a logarithmic factor above the lower bound; it is nonparametric and requires less knowledge about the initial state of the particle system.

4.3. Minimax estimation for drift

In this section, we consider a slight generalization of the graphon mean-field system, where the drift coefficient is time-inhomogeneous. Namely, we extend (1.1) to

(4.3) Xu(t)=Xu(0)+0tIb(t,Xu(s),x)G(u,v)μs,v(dy)𝑑v𝑑s+0tσ(Xu(s))𝑑Bu(s),X_{u}(t)=X_{u}(0)+\int_{0}^{t}\int_{I}b(t,X_{u}(s),x)G(u,v)\mu_{s,v}(dy)dvds+\int_{0}^{t}\sigma(X_{u}(s))dB_{u}(s)\,,

where bC1([0,T];W2,(d×d;d))b\in C^{1}([0,T];W^{2,\infty}(\mathbb{R}^{d}\times\mathbb{R}^{d};\mathbb{R}^{d})) satisfies that, t,t[0,T]\forall t,t^{\prime}\in[0,T], x,x,y,ydx,x^{\prime},y,y^{\prime}\in\mathbb{R}^{d},

(4.4) |b(t,x,y)b(t,x,y)|C(|tt|+|xx|+|yy|),\left|b(t,x,y)-b(t^{\prime},x^{\prime},y^{\prime})\right|\leqslant C(\left|t-t^{\prime}\right|+\left|x-x^{\prime}\right|+\left|y-y^{\prime}\right|)\,,

for some constant C>0C>0 (compare with (2.1)). Then the drift term β\beta is given by

β(t,u,x,μt)=IG(u,v)db(t,x,y)μt,v(dy)𝑑v.\beta(t,u,x,\mu_{t})=\int_{I}G(u,v)\int_{\mathbb{R}^{d}}b(t,x,y)\mu_{t,v}(dy)dv\,.

Note that the stability analysis in [3] and previous estimates (Lemmata 4.1 and 4.2) still hold. We will conduct a minimax analysis on the drift term β\beta to show that the estimator β^h,κn\hat{\beta}^{n}_{h,\kappa} is optimal in this general setting.
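As an aside, the estimator \hat{\beta}^{n}_{h,\kappa}=\hat{\pi}^{n}_{h}/(\hat{\mu}^{n}_{h}\lor\kappa_{2}) remains straightforward to evaluate in this generalized setting. The following is a minimal numerical sketch (not the authors' implementation) for d=1, with trajectories sampled on a time grid and the stochastic integral defining \hat{\pi}^{n}_{h} replaced by a left-point Riemann sum over the increments of X^{n}_{i}; the Gaussian kernels, the discretization, and all names are our illustrative choices.

import numpy as np

def gauss(z):
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

def beta_hat(t0, u0, x0, times, X, h1, h2, h3, kappa2):
    # times: shape (M+1,), sampling times 0 = t_0 < ... < t_M = T
    # X:     shape (M+1, n), X[k, i] = X_i^n(t_k); particle i has type label i/n
    _, n = X.shape
    labels = np.arange(1, n + 1) / n
    J = gauss((u0 - labels) / h2) / h2                 # J_{h2}(u0 - i/n)

    # mu_hat at the time slice closest to t0
    k0 = int(np.argmin(np.abs(times - t0)))
    K0 = gauss((x0 - X[k0]) / h3) / h3
    mu_h = np.mean(J * K0)

    # pi_hat: Riemann-sum approximation of the integral against dX_i^n(t)
    dX = np.diff(X, axis=0)                            # increments, shape (M, n)
    H = gauss((t0 - times[:-1]) / h1) / h1             # H_{h1}(t0 - t_k)
    K = gauss((x0 - X[:-1]) / h3) / h3                 # K_{h3}(x0 - X_i^n(t_k))
    pi_h = np.sum(H[:, None] * J[None, :] * K * dX) / n

    return pi_h / max(mu_h, kappa2)

# Usage on placeholder Brownian paths (zero drift, sigma = 1):
rng = np.random.default_rng(1)
M, n, T = 200, 500, 1.0
times = np.linspace(0.0, T, M + 1)
dW = np.sqrt(T / M) * rng.standard_normal((M, n))
X = np.vstack([np.zeros((1, n)), np.cumsum(dW, axis=0)])
print(beta_hat(0.5, 0.5, 0.0, times, X, h1=0.1, h2=0.1, h3=0.3, kappa2=0.05))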

We can extend the notion of Hölder continuity to the space of time-dependent functions.

Definition 4.3.

Let t0(0,T)t_{0}\in(0,T) and x0dx_{0}\in\mathbb{R}^{d}, and s1,s3>0s_{1},s_{3}>0. We say that a function f:(0,T)×df:(0,T)\times\mathbb{R}^{d}\to\mathbb{R} belongs to class s1,s3(t0,x0)\mathcal{H}^{s_{1},s_{3}}(t_{0},x_{0}) if there exists an open neighborhood UU of (t0,x0)(t_{0},x_{0}) in (0,T)×d(0,T)\times\mathbb{R}^{d} such that

fs1,s3(t0,x0)=defsup(t,x)U|f(t,x)|+C(f,U)<,\left\|f\right\|_{\mathcal{H}^{s_{1},s_{3}}(t_{0},x_{0})}\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\sup_{(t,x)\in U}\left|f(t,x)\right|+C(f,U)<\infty\,,

where C(f,U)C(f,U) is the minimum positive number CC such that

|Dα1,α3f(t,x)Dα1,α3f(s,y)|C(|ts|s1s1+|xy|s3s3)\left|D^{\alpha_{1},\alpha_{3}}f(t,x)-D^{\alpha_{1},\alpha_{3}}f(s,y)\right|\leqslant C(\left|t-s\right|^{s_{1}-\lfloor s_{1}\rfloor}+\left|x-y\right|^{s_{3}-\lfloor s_{3}\rfloor})

for all (t,x),(s,y)U(t,x),(s,y)\in U and all (multi-)indices α1,α3\alpha_{1},\alpha_{3} with |α1|s1\left|\alpha_{1}\right|\leqslant s_{1}, |α3|s3\left|\alpha_{3}\right|\leqslant s_{3}.

We say that an \mathbb{R}^{m}-valued function f=(f_{1},\dots,f_{m}) belongs to the class \mathcal{H}^{s_{1},s_{3}}(t_{0},x_{0}) if each component f_{j}\in\mathcal{H}^{s_{1},s_{3}}(t_{0},x_{0}).

Let 𝒫ˇ\check{\mathcal{P}} be the set of all parameters (b,σ,G,μ0)(b,\sigma,G,\mu_{0}) satisfying the same conditions as 𝒫\mathcal{P} defined above Proposition 4.1, except that bb has the form b(t,x,y)b(t,x,y) and satisfies (4.4). For s1,s3>0s_{1},s_{3}>0, we define

𝒜ˇs1,s3(t0,u0,x0)={(b,σ,G,μ0)𝒫ˇ\nonscript|\nonscriptμ=S(b,σ,G,μ0),μu0s1,s3(t0,x0)},\check{\mathcal{A}}^{s_{1},s_{3}}(t_{0},u_{0},x_{0})=\{(b,\sigma,G,\mu_{0})\in\check{\mathcal{P}}\nonscript\>|\nonscript\>\mathopen{}\allowbreak\mu=S(b,\sigma,G,\mu_{0}),\mu_{u_{0}}\in\mathcal{H}^{s_{1},s_{3}}(t_{0},x_{0})\}\,,

and set

𝒜ˇs1,s3(t0,x0)=uI𝒜ˇs1,s3(t0,u,x0).\check{\mathcal{A}}^{s_{1},s_{3}}(t_{0},x_{0})=\bigcap_{u\in I}\check{\mathcal{A}}^{s_{1},s_{3}}(t_{0},u,x_{0})\,.

Moreover, we consider a restriction of this class

𝒜ˇLs1,s3(t0,x0)={(b,σ,G,μ0)𝒫ˇ\nonscript|\nonscriptS(b,σ,G,μ0)s1,s3(t0,x0)+bCt1Wx,y2,L}\check{\mathcal{A}}^{s_{1},s_{3}}_{L}(t_{0},x_{0})=\{(b,\sigma,G,\mu_{0})\in\check{\mathcal{P}}\nonscript\>|\nonscript\>\mathopen{}\allowbreak\left\|S(b,\sigma,G,\mu_{0})\right\|_{\mathcal{H}^{s_{1},s_{3}}(t_{0},x_{0})}+\left\|b\right\|_{C^{1}_{t}W_{x,y}^{2,\infty}}\leqslant L\}

for L>0L>0.

Then we present a minimax result on the estimation of the drift term β\beta. The proof will be given in Section 6.2.

Theorem 4.2.

Let L>0L>0, and s1,s3(0,1)s_{1},s_{3}\in(0,1). Define the effective smoothness sbs_{b} by

1sb=1s1+1+1s3.\frac{1}{s_{b}}=\frac{1}{s_{1}}+1+\frac{1}{s_{3}}\,.

Assume the hypothesis of Lemma 4.2, and s1,s3s_{1},s_{3} satisfy the following conditions

1s11s3+20,1s12s3+40.\frac{1}{s_{1}}-\frac{1}{s_{3}}+2\geqslant 0\,,\qquad\frac{1}{s_{1}}-\frac{2}{s_{3}}+4\geqslant 0\,.

Then, for every t0(0,T)t_{0}\in(0,T), u0(0,1)u_{0}\in(0,1), x0dx_{0}\in\mathbb{R}^{d}, we have

(4.5) sup(b,σ,G,μ0)𝒜ˇLs1,s3(t0,x0)infh1,h2,h3,κ2>0𝐄|β^h,κn(t0,u0,x0)β(t0,u0,x0)|2Cn2sb2sb+1.\sup_{(b,\sigma,G,\mu_{0})\in\check{\mathcal{A}}^{s_{1},s_{3}}_{L}(t_{0},x_{0})}\inf_{h_{1},h_{2},h_{3},\kappa_{2}>0}\mathbf{E}\left|\hat{\beta}^{n}_{h,\kappa}(t_{0},u_{0},x_{0})-\beta(t_{0},u_{0},x_{0})\right|^{2}\leqslant Cn^{-\frac{2s_{b}}{2s_{b}+1}}\,.

Moreover

(4.6) infβ^sup(b,σ,G,μ0)𝒜ˇLs1,s3(t0,x0)𝐄|β^β(t0,u0,x0)|2cn2sb2sb+1,\inf_{\hat{\beta}}\sup_{(b,\sigma,G,\mu_{0})\in\check{\mathcal{A}}^{s_{1},s_{3}}_{L}(t_{0},x_{0})}\mathbf{E}\left|\hat{\beta}-\beta(t_{0},u_{0},x_{0})\right|^{2}\geqslant cn^{-\frac{2s_{b}}{2s_{b}+1}}\,,

where the infimum is taken over all possible estimators β^\hat{\beta} built on empirical data (μtn)t[0,T](\mu^{n}_{t})_{t\in[0,T]}. Both constants CC and cc depend only on the parameters T,d,L,κ2T,d,L,\kappa_{2}, the kernels H,J,KH,J,K, the function ρI\rho_{I} given by Condition 2.3(3), and the values of μ\mu in a small neighborhood of t0,u0,x0t_{0},u_{0},x_{0}.
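For instance, for s_{1}=s_{3}=\frac{1}{2} both displayed conditions are satisfied (each left-hand side equals 2), the effective smoothness is determined by \frac{1}{s_{b}}=2+1+2=5, that is s_{b}=\frac{1}{5}, and the minimax rate appearing in (4.5) and (4.6) is n^{-\frac{2s_{b}}{2s_{b}+1}}=n^{-\frac{2}{7}}.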

Remark 4.5.

It is also possible to obtain a similar result in the case s_{3}<\frac{1}{2}, subject to some additional assumptions on the \mathcal{W}_{p}-continuity of the initial data, as in Theorem 4.1. We leave this to the reader.

Remark 4.6.

As mentioned above, the stability of graphon particle systems given in [3] and the consistency of the estimators in Lemmata 4.1 and 4.2 remain true under the above generalization with a time-inhomogeneous drift coefficient b. In this case, \beta=\beta(t,u,x,\mu_{t}) depends on time not only through the measure flow \mu_{t} but also through the explicit variable t. This gives more freedom in constructing examples and thus opens up a cleaner path towards the theoretical lower bound; it is for this reason that we include time-inhomogeneous drifts in the minimax analysis.

4.4. Optimality of the graphon estimator \hat{G}

Recall from (2.6) and the proof of Theorem 2.1 that the error of our estimator \hat{G} is controlled by the L^{2} errors of the estimators \hat{\mu} and \hat{\beta}. Several of the convergent quantities involved would require stronger conditions to be made explicitly quantitative, and we do not pursue these details in this work. Moreover, the particular term \theta_{1}(\tilde{r},\kappa_{1}) in the proof of Theorem 2.1, whose convergence is due to Assumption 2.1, is not explicitly quantifiable without further assumptions. This prevents us from carrying out a minimax analysis of the error \mathbf{E}\left|\hat{G}-G\right|^{2} to study its optimality. Nevertheless, given the pointwise optimality of \hat{\mu} and \hat{\beta}, our estimator is expected to perform well.

5. Proofs for Section 3.1

5.1. Proof of Lemma 3.1 and Corollary 3.1

Proof of Lemma 3.1.

Fix t0,u0,x0t_{0},u_{0},x_{0}. Recall that

μ^hn(t0,u0,x0)=(JK)hμt0n=1ni=1nJh2(u0in)Kh3(x0Xin(t0)).\hat{\mu}^{n}_{h}(t_{0},u_{0},x_{0})=(J\otimes K)_{h}\ast\mu^{n}_{t_{0}}=\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})K_{h_{3}}(x_{0}-X^{n}_{i}(t_{0}))\,.

We obtain the following inequality via a telescoping sum and the elementary bound (a+b+c+d)^{2}\leqslant 4(a^{2}+b^{2}+c^{2}+d^{2}):

𝐄|μ^hn(t0,u0,x0)μ(t0,u0,x0)|2\displaystyle\mathbf{E}\left|\hat{\mu}^{n}_{h}(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\leqslant
4𝐄|1ni=1nJh2(u0in)(Kh3(x0Xin(t0))Kh3(x0Xin(t0)))|2\displaystyle\qquad 4\mathbf{E}\left|\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})\big{(}K_{h_{3}}(x_{0}-X^{n}_{i}(t_{0}))-K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t_{0}))\big{)}\right|^{2}
+4𝐄|1ni=1nJh2(u0in)(Kh3(x0Xin(t0))𝐄Kh3(x0Xin(t0)))|2\displaystyle\quad+4\mathbf{E}\left|\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})\big{(}K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t_{0}))-\mathbf{E}K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t_{0}))\big{)}\right|^{2}
\displaystyle\quad+4\mathbf{E}\left|\int_{I}J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})\mathbf{E}(K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t_{0})))-J_{h_{2}}(u_{0}-u)\mathbf{E}(K_{h_{3}}(x_{0}-X_{u}(t_{0})))du\right|^{2}
+4𝐄|(JK)hμ(t0,u0,x0)μ(t0,u0,x0)|2\displaystyle\quad+4\mathbf{E}\left|(J\otimes K)_{h}\ast\mu(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}
=:4(M1+M2+M3+M4).\displaystyle\quad=:4(M_{1}+M_{2}+M_{3}+M_{4})\,.

We only need to provide the appropriate bounds for M1M_{1}, M2M_{2}, and M3M_{3}.

Step 1. We bound M1M_{1} using the convergence of the finite-population system to the graphon mean-field system.

M1\displaystyle M_{1} 1ni=1n|Jh2(u0in)|2𝐄|Kh3(x0Xin(t0))Kh3(x0Xin(t0))|2\displaystyle\leqslant\frac{1}{n}\sum_{i=1}^{n}\left|J_{h_{2}}(u_{0}-\frac{i}{n})\right|^{2}\mathbf{E}\left|K_{h_{3}}(x_{0}-X^{n}_{i}(t_{0}))-K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t_{0}))\right|^{2}
1ni=1nJh2(u0in)2Kh32𝐄|Xin(t0)Xin(t0)|2.\displaystyle\leqslant\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})^{2}\left\|\nabla K_{h_{3}}\right\|_{\infty}^{2}\mathbf{E}\left|X^{n}_{i}(t_{0})-X_{\frac{i}{n}}(t_{0})\right|^{2}\,.

Then applying Lemma 2.1 gives the bound

M1n2h322dK2i=1nJh2(u0in)2.M_{1}\lesssim n^{-2}h_{3}^{-2-2d}\left\|\nabla K\right\|_{\infty}^{2}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})^{2}\,.

Step 2. We bound M2M_{2} following the idea of [28].

For i=1,,ni=1,\dots,n, let

Zi=Jh2(u0in)(Kh3(x0Xin(t0))𝐄(Kh3(x0Xin(t0)))).Z_{i}=J_{h_{2}}(u_{0}-\frac{i}{n})\big{(}K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t_{0}))-\mathbf{E}(K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t_{0})))\big{)}\,.

The second part simplifies to

M2=𝐄|1ni=1nZi|2=0𝐏(|1ni=1nZi|>z)𝑑z.M_{2}=\mathbf{E}\left|\frac{1}{n}\sum_{i=1}^{n}Z_{i}\right|^{2}=\int_{0}^{\infty}\mathbf{P}\left(\left|\frac{1}{n}\sum_{i=1}^{n}Z_{i}\right|>\sqrt{z}\right)dz\,.

From (1.1) we see that each X_{u} is driven only by its own Brownian motion B_{u} and the deterministic measure flow; hence, by the independence of the Brownian motions \{B_{u}:u\in I\} (and of the initial conditions), Z_{1},\dots,Z_{n} are independent. We have \mathbf{E}Z_{i}=0 and \left|Z_{i}\right|\leqslant\left\|(J\otimes K)_{h}\right\|_{\infty}<\infty for every i=1,\dots,n. Then Bernstein’s inequality reads

𝐏(|1ni=1nZi|>z)2exp(12n2zi=1n𝐄Zi2+13nz(JK)h).\mathbf{P}\left(\left|\frac{1}{n}\sum_{i=1}^{n}Z_{i}\right|>\sqrt{z}\right)\leqslant 2\exp\left(-\frac{\frac{1}{2}n^{2}z}{\sum_{i=1}^{n}\mathbf{E}Z_{i}^{2}+\frac{1}{3}n\sqrt{z}\left\|(J\otimes K)_{h}\right\|_{\infty}}\right)\,.

Now we apply the inequality (48) in [28] to see that

(5.1) 0𝐏(|1ni=1nZi|>z)𝑑zmax{2n2i=1n𝐄Zi2,49n2(JK)h2}.\int_{0}^{\infty}\mathbf{P}\left(\left|\frac{1}{n}\sum_{i=1}^{n}Z_{i}\right|>\sqrt{z}\right)dz\lesssim\max\left\{2n^{-2}\sum_{i=1}^{n}\mathbf{E}Z_{i}^{2},\frac{4}{9}n^{-2}\left\|(J\otimes K)_{h}\right\|_{\infty}^{2}\right\}\,.

Observe that

𝐄Zi24Jh2(u0in)2𝐄(Kh3(x0Xin(t0))2)=4Jh2(u0in)2Kh3(x0)L2(μt0,in)2.\mathbf{E}Z_{i}^{2}\leqslant 4J_{h_{2}}(u_{0}-\frac{i}{n})^{2}\mathbf{E}(K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t_{0}))^{2})=4J_{h_{2}}(u_{0}-\frac{i}{n})^{2}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|_{L^{2}(\mu_{t_{0},\frac{i}{n}})}^{2}\,.

Plugging into (5.1) gives the bound

M2n2i=1nJh2(u0in)2Kh3(x0)L2(μt0,in)2+n2h22h32dJ2K2.M_{2}\lesssim n^{-2}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})^{2}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|^{2}_{L^{2}(\mu_{t_{0},\frac{i}{n}})}+n^{-2}h_{2}^{-2}h_{3}^{-2d}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{\infty}^{2}\,.

Step 3. Notice that J_{h_{2}} is supported on \overline{B(0,h_{2})}. We bound M_{3} using only Minkowski’s inequality and the mean-value theorem.

M3\displaystyle M_{3} 4h2I|Jh2(u0nun)Jh2(u0u)|2(𝐄(Kh3(x0Xnun(t0))))2\displaystyle\leqslant 4h_{2}\int_{I}\left|J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})-J_{h_{2}}(u_{0}-u)\right|^{2}(\mathbf{E}(K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t_{0}))))^{2}
\displaystyle\qquad+J_{h_{2}}(u_{0}-u)^{2}\,\mathbf{E}\left|K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t_{0}))-K_{h_{3}}(x_{0}-X_{u}(t_{0}))\right|^{2}du
4h22(u0+supp(Jh2))Jh22|nunu|2Kh3(x0)L2(μt0,nun)2\displaystyle\leqslant 4h_{2}\int_{2(u_{0}+\operatorname{supp}(J_{h_{2}}))}\left\|\nabla J_{h_{2}}\right\|_{\infty}^{2}\left|\frac{\lceil nu\rceil}{n}-u\right|^{2}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|_{L^{2}(\mu_{t_{0},\frac{\lceil nu\rceil}{n}})}^{2}
\displaystyle\qquad+J_{h_{2}}(u_{0}-u)^{2}\left\|\nabla K_{h_{3}}\right\|_{\infty}^{2}\mathbf{E}\left|X_{\frac{\lceil nu\rceil}{n}}(t_{0})-X_{u}(t_{0})\right|^{2}du
n3h22J2i=1nKh3(x0)L2(μt0,in)2+n2h322dJ22K2,\displaystyle\lesssim n^{-3}h_{2}^{-2}\left\|\nabla J\right\|_{\infty}^{2}\sum_{i=1}^{n}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|_{L^{2}(\mu_{t_{0},\frac{i}{n}})}^{2}+n^{-2}h_{3}^{-2-2d}\left\|J\right\|_{2}^{2}\left\|\nabla K\right\|_{\infty}^{2}\,,

where the last step uses Theorem 2.1 of [3]. That finishes the proof. ∎

We will consistently use the following equality, which is a consequence of the Fubini-Tonelli theorem,

(5.2) dKh3(x)L2(μt,u)2dx=h3dK22,\int_{\mathbb{R}^{d}}\left\|K_{h_{3}}(x-\cdot)\right\|_{L^{2}(\mu_{t,u})}^{2}dx=h_{3}^{-d}\left\|K\right\|_{2}^{2}\,,

where the latter L2L^{2}-norm is taken with respect to the Lebesgue measure on d\mathbb{R}^{d}.
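For completeness, with the usual normalization K_{h_{3}}(z)=h_{3}^{-d}K(z/h_{3}), the identity (5.2) is just an exchange of the order of integration followed by the substitution z=x-y:

\int_{\mathbb{R}^{d}}\left\|K_{h_{3}}(x-\cdot)\right\|_{L^{2}(\mu_{t,u})}^{2}dx=\int_{\mathbb{R}^{d}}\int_{\mathbb{R}^{d}}K_{h_{3}}(x-y)^{2}\,dx\,\mu_{t,u}(dy)=\int_{\mathbb{R}^{d}}K_{h_{3}}(z)^{2}dz=h_{3}^{-d}\left\|K\right\|_{2}^{2}\,.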

Proof of Corollary 3.1.

Recall that μ^h,rn=μ^hn𝟏{|x|r}\hat{\mu}^{n}_{h,r}=\hat{\mu}^{n}_{h}\mathbf{1}_{\{\left|x\right|\leqslant r\}}. We break the integral into two parts

τ1τ2Id𝐄|μ^h,rn(t,u,x)μ(t,u,x)|2𝑑x𝑑u𝑑t=\displaystyle\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\mathbb{R}^{d}}\mathbf{E}\left|\hat{\mu}^{n}_{h,r}(t,u,x)-\mu(t,u,x)\right|^{2}dxdudt=
τ1τ2I{|x|r}𝐄|μ^h,rn(t,u,x)μ(t,u,x)|2𝑑x𝑑u𝑑t+τ1τ2I{|x|>r}μ(t,u,x)2𝑑x𝑑u𝑑t.\displaystyle\qquad\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\{\left|x\right|\leqslant r\}}\mathbf{E}\left|\hat{\mu}^{n}_{h,r}(t,u,x)-\mu(t,u,x)\right|^{2}dxdudt+\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\{\left|x\right|>r\}}\mu(t,u,x)^{2}dxdudt\,.

The second part tends to 0 as rr\to\infty due to Proposition 2.1 and dominated convergence. We denote the convergence rate by θ2,μ(r)\theta_{2,\mu}(r).

For the first part, we rearrange and integrate the terms given in Lemma 3.1. Recall that

𝐄|μ^hn(t,u,x)μ(t,u,x)|2\displaystyle\mathbf{E}\left|\hat{\mu}^{n}_{h}(t,u,x)-\mu(t,u,x)\right|^{2}\lesssim
n2h22h32d+n2h322d\displaystyle\qquad n^{-2}h_{2}^{-2}h_{3}^{-2d}+n^{-2}h_{3}^{-2-2d}
(5.3) +n2h322dK2i=1nJh2(uin)2\displaystyle\quad+n^{-2}h_{3}^{-2-2d}\left\|\nabla K\right\|_{\infty}^{2}\sum_{i=1}^{n}J_{h_{2}}(u-\frac{i}{n})^{2}
(5.4) +n2i=1nJh2(uin)2Kh3(x)L2(μt,in)2\displaystyle\quad+n^{-2}\sum_{i=1}^{n}J_{h_{2}}(u-\frac{i}{n})^{2}\left\|K_{h_{3}}(x-\cdot)\right\|^{2}_{L^{2}(\mu_{t,\frac{i}{n}})}
(5.5) +n3h22J2i=1nKh3(x)L2(μt,in)2\displaystyle\quad+n^{-3}h_{2}^{-2}\left\|\nabla J\right\|_{\infty}^{2}\sum_{i=1}^{n}\left\|K_{h_{3}}(x-\cdot)\right\|_{L^{2}(\mu_{t_{,}\frac{i}{n}})}^{2}
+|(JK)hμt(u,x)μ(t,u,x)|2.\displaystyle\quad+\left|(J\otimes K)_{h}\ast\mu_{t}(u,x)-\mu(t,u,x)\right|^{2}\,.

The bounds in the first line are independent of t, u, and x, so integrating them over [\tau_{1},\tau_{2}]\times I\times\{\left|x\right|\leqslant r\} gives

Trd(n2h22h32d+n2h322d).Tr^{d}(n^{-2}h_{2}^{-2}h_{3}^{-2d}+n^{-2}h_{3}^{-2-2d})\,.

The middle three lines are computed as follows. For line (5.3) we have

I{|x|r}n2h322dK2i=1nJh2(uin)2dxdu\displaystyle\int_{I}\int_{\{\left|x\right|\leqslant r\}}n^{-2}h_{3}^{-2-2d}\left\|\nabla K\right\|_{\infty}^{2}\sum_{i=1}^{n}J_{h_{2}}(u-\frac{i}{n})^{2}dxdu
\displaystyle\qquad\leqslant r^{d}n^{-2}h_{3}^{-2-2d}\left\|\nabla K\right\|_{\infty}^{2}\sum_{i=1}^{n}\int_{\mathbb{R}}J_{h_{2}}(u-\frac{i}{n})^{2}du
\displaystyle\qquad=r^{d}n^{-1}h_{2}^{-1}h_{3}^{-2-2d}\left\|\nabla K\right\|_{\infty}^{2}\left\|J\right\|_{2}^{2}\,.

With the same idea and using (5.2), line (5.4) gives

Idn2i=1nJh2(uin)2Kh3(x)L2(μt,in)2dxdu\displaystyle\int_{I}\int_{\mathbb{R}^{d}}n^{-2}\sum_{i=1}^{n}J_{h_{2}}(u-\frac{i}{n})^{2}\left\|K_{h_{3}}(x-\cdot)\right\|^{2}_{L^{2}(\mu_{t,\frac{i}{n}})}dxdu
n1h21h3dJ22K22.\displaystyle\qquad\leqslant n^{-1}h_{2}^{-1}h_{3}^{-d}\left\|J\right\|_{2}^{2}\left\|K\right\|_{2}^{2}\,.

Analogously, line (5.5) gives

Idn3h22J2i=1nKh3(x)L2(μt,in)2dxdu\displaystyle\int_{I}\int_{\mathbb{R}^{d}}n^{-3}h_{2}^{-2}\left\|\nabla J\right\|_{\infty}^{2}\sum_{i=1}^{n}\left\|K_{h_{3}}(x-\cdot)\right\|_{L^{2}(\mu_{t_{,}\frac{i}{n}})}^{2}dxdu
=n2h22h3dJ2K22.\displaystyle\qquad=n^{-2}h_{2}^{-2}h_{3}^{-d}\left\|\nabla J\right\|_{\infty}^{2}\left\|K\right\|_{2}^{2}\,.

In addition, the final line expands as

Id|(JK)hμt(u,x)μ(t,u,x)|2𝑑u𝑑x\displaystyle\int_{I}\int_{\mathbb{R}^{d}}\left|(J\otimes K)_{h}\ast\mu_{t}(u,x)-\mu(t,u,x)\right|^{2}dudx
\displaystyle\leqslant\int_{\mathbb{R}\times\mathbb{R}^{d}}\left|\int_{\mathbb{R}\times\mathbb{R}^{d}}(J\otimes K)_{h}(u^{\prime},x^{\prime})(\mu(t,u-u^{\prime},x-x^{\prime})-\mu(t,u,x))dx^{\prime}du^{\prime}\right|^{2}dxdu
(×d(JK)h(u,x)(×d(μ(t,uu,xx)μ(t,u,x))2𝑑x𝑑u)1/2𝑑x𝑑u)2\displaystyle\leqslant\left(\int_{\mathbb{R}\times\mathbb{R}^{d}}(J\otimes K)_{h}(u^{\prime},x^{\prime})\left(\int_{\mathbb{R}\times\mathbb{R}^{d}}(\mu(t,u-u^{\prime},x-x^{\prime})-\mu(t,u,x))^{2}dxdu\right)^{1/2}dx^{\prime}du^{\prime}\right)^{2}
sup|u|h2,|x|h3μ(t,u,x)μ(t,,)L2(×d)2.\displaystyle\leqslant\sup_{\left|u^{\prime}\right|\leqslant h_{2},\left|x^{\prime}\right|\leqslant h_{3}}\left\|\mu(t,\cdot-u^{\prime},\cdot-x^{\prime})-\mu(t,\cdot,\cdot)\right\|_{L^{2}(\mathbb{R}\times\mathbb{R}^{d})}^{2}\,.

Note that \left\|\mu_{t,u}\right\|_{2} is uniformly bounded for t\in[\tau_{1},\tau_{2}] and u\in I as a consequence of Proposition 2.1 (see also Corollary 8.2.2 of [9] for details on local upper bounds of the particle density). Since translation is continuous in L^{2}, dominated convergence gives

(5.6) limh2,h30τ1τ2Id|(JK)hμt(u,x)μ(t,u,x)|2𝑑u𝑑x𝑑t=0.\lim_{h_{2},h_{3}\to 0}\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\mathbb{R}^{d}}\left|(J\otimes K)_{h}\ast\mu_{t}(u,x)-\mu(t,u,x)\right|^{2}dudxdt=0\,.

We denote the convergence rate by θ3,μ(h)\theta_{3,\mu}(h).

In summary, the L2L^{2}-error of μ^h,rn\hat{\mu}^{n}_{h,r} is given by

τ1τ2Id𝐄|μ^h,rn(t,u,x)μ(t,u,x)|2𝑑x𝑑u𝑑tT,b,J,Kθ2,μ(r)+θ3,μ(h)+\displaystyle\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\mathbb{R}^{d}}\mathbf{E}\left|\hat{\mu}^{n}_{h,r}(t,u,x)-\mu(t,u,x)\right|^{2}dxdudt\lesssim_{T,b,J,K}\theta_{2,\mu}(r)+\theta_{3,\mu}(h)+
rd(n2h322d+n2h22h32d)+n1h21h3d+n2h22h3d.\displaystyle\qquad r^{d}(n^{-2}h_{3}^{-2-2d}+n^{-2}h_{2}^{-2}h_{3}^{-2d})+n^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{2}^{-2}h_{3}^{-d}\,.

5.2. Proof of Lemma 3.2 and Corollary 3.2

Recall the dynamics of X_{u} defined in (1.1). For simplicity, we let

Y_{u}(t)\stackrel{\scriptscriptstyle\textup{def}}{=}\beta(t,u,X_{u}(t))=\int_{I}\int_{\mathbb{R}^{d}}b(X_{u}(t),y)G(u,v)\mu_{t,v}(dy)dv

for every uIu\in I and t[0,T]t\in[0,T]. Similarly, for the finite-population system, we let

Yin(t)=def1nj=1nb(Xin(t),Xjn(t))gijnY^{n}_{i}(t)\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\frac{1}{n}\sum_{j=1}^{n}b(X^{n}_{i}(t),X^{n}_{j}(t))g^{n}_{ij}

for i=1,,ni=1,\dots,n, and t[0,T]t\in[0,T]. Observe that |Yin(t)|,|Yu(t)|b\left|Y^{n}_{i}(t)\right|,\left|Y_{u}(t)\right|\leqslant\left\|b\right\|_{\infty}. We have the following consistency result.

Lemma 5.1.

We assume the same hypothesis as in Lemma 2.1. In the nn-particle system, we have the following

(5.7) max1in𝐄|Yin(t)Yin(t)|2max1in𝐄|Xin(t)Xin(t)|2+1n,\max_{1\leqslant i\leqslant n}\mathbf{E}\left|Y^{n}_{i}(t)-Y_{\frac{i}{n}}(t)\right|^{2}\lesssim\max_{1\leqslant i\leqslant n}\mathbf{E}\left|X^{n}_{i}(t)-X_{\frac{i}{n}}(t)\right|^{2}+\frac{1}{n}\,,

for every t[0,T]t\in[0,T]. As a consequence,

\max_{i=1,\dots,n}\int_{0}^{T}\psi(t)\mathbf{E}\left|Y_{\frac{i}{n}}(t)-Y^{n}_{i}(t)\right|^{2}dt\lesssim_{\psi}\frac{1}{n}\,,

for any bounded continuous function ψ\psi.

We defer the proof of Lemma 5.1 to Appendix B. With it in hand, we are ready to prove our estimate of \pi.

Proof of Lemma 3.2.

Fix t0,u0,x0t_{0},u_{0},x_{0}. Recall that

π^hn(t0,u0,x0)\displaystyle\hat{\pi}^{n}_{h}(t_{0},u_{0},x_{0}) =0T1ni=1n(HJK)h(t0t,u0in,x0Xin(t))dXin(t)\displaystyle=\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-\frac{i}{n},x_{0}-X^{n}_{i}(t))dX^{n}_{i}(t)
=0T1ni=1n(HJK)h(t0t,u0in,x0Xin(t))Yin(t)dt\displaystyle=\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-\frac{i}{n},x_{0}-X^{n}_{i}(t))Y^{n}_{i}(t)dt
+0T1ni=1n(HJK)h(t0t,u0in,x0Xin(t))σ(Xin(t))dBin(t).\displaystyle\qquad+\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-\frac{i}{n},x_{0}-X^{n}_{i}(t))\sigma(X^{n}_{i}(t))dB_{\frac{i}{n}}(t)\,.

From this we may then write

\displaystyle\mathbf{E}\left|\hat{\pi}^{n}_{h}(t_{0},u_{0},x_{0})-\pi(t_{0},u_{0},x_{0})\right|^{2}\leqslant
5𝐄|0THh1(t0t)1ni=1nJh2(u0in)\displaystyle\qquad 5\mathbf{E}\left|\int_{0}^{T}H_{h_{1}}(t_{0}-t)\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})\right.
(Kh3(x0Xin(t))Yin(t)Kh3(x0Xin(t))Yin(t))dt|2\displaystyle\qquad\qquad\qquad\left.\big{(}K_{h_{3}}(x_{0}-X^{n}_{i}(t))Y^{n}_{i}(t)-K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t))Y_{\frac{i}{n}}(t)\big{)}dt\right|^{2}
+5𝐄|0THh1(t0t)1ni=1nJh2(u0in)\displaystyle\quad+5\mathbf{E}\left|\int_{0}^{T}H_{h_{1}}(t_{0}-t)\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})\right.
(Kh3(x0Xin(t))Yin(t)𝐄(Kh3(x0Xin(t))Yin(t)))dt|2\displaystyle\qquad\qquad\qquad\left.\big{(}K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t))Y_{\frac{i}{n}}(t)-\mathbf{E}(K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t))Y_{\frac{i}{n}}(t))\big{)}dt\right|^{2}
+5𝐄|0THh1(t0t)I(Jh2(u0nun)𝐄(Kh3(x0Xnun(t))Ynun(t))\displaystyle\quad+5\mathbf{E}\left|\int_{0}^{T}H_{h_{1}}(t_{0}-t)\int_{I}\left(J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})\mathbf{E}(K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t))Y_{\frac{\lceil nu\rceil}{n}}(t))\right.\right.
Jh2(u0u)𝐄(Kh3(x0Xu(t))Yu(t)))dudt|2\displaystyle\qquad\qquad\qquad\left.\left.-J_{h_{2}}(u_{0}-u)\mathbf{E}(K_{h_{3}}(x_{0}-X_{u}(t))Y_{u}(t))\right)dudt\right|^{2}
+5𝐄|0TId(HJK)h(t0t,u0u,x0x)π(t,u,x)𝑑x𝑑u𝑑tπ(t0,u0,x0)|2\displaystyle\quad+5\mathbf{E}\left|\int_{0}^{T}\int_{I}\int_{\mathbb{R}^{d}}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-u,x_{0}-x)\pi(t,u,x)dxdudt-\pi(t_{0},u_{0},x_{0})\right|^{2}
+5𝐄|0T1ni=1nHh1(t0t)Jh2(u0in)Kh3(x0Xin(t))σ(Xin(t))dBin(t)|2\displaystyle\quad+5\mathbf{E}\left|\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}H_{h_{1}}(t_{0}-t)J_{h_{2}}(u_{0}-\frac{i}{n})K_{h_{3}}(x_{0}-X^{n}_{i}(t))\sigma(X^{n}_{i}(t))dB_{\frac{i}{n}}(t)\right|^{2}
=:5(P1+P2+P3+P4+P5).\displaystyle\quad=:5(P_{1}+P_{2}+P_{3}+P_{4}+P_{5})\,.

We do not need to do anything with P_{4} for now, and P_{5} will be handled at the end using standard techniques from stochastic analysis. The remaining terms are bounded analogously to M_{1} through M_{3} in the proof of Lemma 3.1.

Step 1. Observe that P1P_{1} is upper bounded by

T0THh12(t0t)1ni=1nJh2(u0in)2\displaystyle T\int_{0}^{T}H_{h_{1}}^{2}(t_{0}-t)\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})^{2}
𝐄|Kh3(x0Xin(t))Yin(t)Kh3(x0Xin(t))Yin(t)|2dt.\displaystyle\qquad\qquad\mathbf{E}\left|K_{h_{3}}(x_{0}-X^{n}_{i}(t))Y^{n}_{i}(t)-K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t))Y_{\frac{i}{n}}(t)\right|^{2}dt\,.

For each tt and ii, we have

|Kh3(x0Xin(t))Yin(t)Kh3(x0Xin(t))Yin(t)|\displaystyle\left|K_{h_{3}}(x_{0}-X^{n}_{i}(t))Y^{n}_{i}(t)-K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t))Y_{\frac{i}{n}}(t)\right|
|Kh3(x0Xin(t))Kh3(x0Xin(t))||Yin(t)|+Kh3(x0Xin(t))|Yin(t)Yin(t)|\displaystyle\leqslant\left|K_{h_{3}}(x_{0}-X^{n}_{i}(t))-K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t))\right|\left|Y^{n}_{i}(t)\right|+K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t))\left|Y^{n}_{i}(t)-Y_{\frac{i}{n}}(t)\right|
Kh3|Xin(t)Xin(t)|b+Kh3|Yin(t)Yin(t)|,\displaystyle\leqslant\left\|\nabla K_{h_{3}}\right\|_{\infty}\left|X^{n}_{i}(t)-X_{\frac{i}{n}}(t)\right|\left\|b\right\|_{\infty}+\left\|K_{h_{3}}\right\|_{\infty}\left|Y^{n}_{i}(t)-Y_{\frac{i}{n}}(t)\right|\,,

so

𝐄|Kh3(x0Xin(t))Yin(t)Kh3(x0Xin(t))Yin(t)|2\displaystyle\mathbf{E}\left|K_{h_{3}}(x_{0}-X^{n}_{i}(t))Y^{n}_{i}(t)-K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t))Y_{\frac{i}{n}}(t)\right|^{2}
2Kh32b2𝐄|Xin(t)Xin(t)|2+2Kh32𝐄|Yin(t)Yin(t)|2.\displaystyle\leqslant 2\left\|\nabla K_{h_{3}}\right\|_{\infty}^{2}\left\|b\right\|_{\infty}^{2}\mathbf{E}\left|X^{n}_{i}(t)-X_{\frac{i}{n}}(t)\right|^{2}+2\left\|K_{h_{3}}\right\|_{\infty}^{2}\mathbf{E}\left|Y^{n}_{i}(t)-Y_{\frac{i}{n}}(t)\right|^{2}\,.

Then, with Lemmata 2.1 and 5.1, we get

P_{1}\lesssim_{b,H,J,K,T}n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-2-2d}+n^{-2}h_{1}^{-1}h_{3}^{-2d}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})^{2}\,.

Step 2. To bound P_{2}, we apply Bernstein’s inequality. Let

Zi(t)=Jh2(u0in)(Kh3(x0Xin(t))Yin(t)𝐄(Kh3(x0Xin(t))Yin(t))).Z_{i}(t)=J_{h_{2}}(u_{0}-\frac{i}{n})\big{(}K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t))Y_{\frac{i}{n}}(t)-\mathbf{E}(K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t))Y_{\frac{i}{n}}(t))\big{)}\,.

Then

P2\displaystyle P_{2} T0THh1(t0t)2𝐄|1ni=1nZi(t)|2𝑑t\displaystyle\leqslant T\int_{0}^{T}H_{h_{1}}(t_{0}-t)^{2}\mathbf{E}\left|\frac{1}{n}\sum_{i=1}^{n}Z_{i}(t)\right|^{2}dt
=T0THh1(t0t)20𝐏(|i=1nZi(t)|>nz)𝑑z𝑑t.\displaystyle=T\int_{0}^{T}H_{h_{1}}(t_{0}-t)^{2}\int_{0}^{\infty}\mathbf{P}\left(\left|\sum_{i=1}^{n}Z_{i}(t)\right|>n\sqrt{z}\right)dzdt\,.

Observe that \mathbf{E}Z_{i}(t)=0 and \left|Z_{i}(t)\right|\leqslant 2\left\|(J\otimes K)_{h}\right\|_{\infty}\left\|b\right\|_{\infty} for every i=1,\dots,n and every t. Also, each Z_{i}(t) is a function of X_{\frac{i}{n}}(t) only, so Z_{1}(t),\dots,Z_{n}(t) are mutually independent. We may therefore apply Bernstein’s inequality and inequality (48) in [28] to obtain

𝐏(|i=1nZi(t)|>nz)\displaystyle\mathbf{P}\left(\left|\sum_{i=1}^{n}Z_{i}(t)\right|>n\sqrt{z}\right) 2exp(12n2zi=1n𝐄|Zi(t)|2+nz32(JK)hb)\displaystyle\leqslant 2\exp\left(-\frac{\frac{1}{2}n^{2}z}{\sum_{i=1}^{n}\mathbf{E}\left|Z_{i}(t)\right|^{2}+\frac{n\sqrt{z}}{3}2\left\|(J\otimes K)_{h}\right\|_{\infty}\left\|b\right\|_{\infty}}\right)
max{2n2i=1n𝐄|Zi(t)|2,169n2(JK)h2b2}.\displaystyle\lesssim\max\left\{2n^{-2}\sum_{i=1}^{n}\mathbf{E}\left|Z_{i}(t)\right|^{2},\frac{16}{9}n^{-2}\left\|(J\otimes K)_{h}\right\|_{\infty}^{2}\left\|b\right\|_{\infty}^{2}\right\}\,.

Further notice that

𝐄|Zi(t)|2b2Jh2(u0in)2Kh3(x0)L2(μt,in)2.\mathbf{E}\left|Z_{i}(t)\right|^{2}\leqslant\left\|b\right\|_{\infty}^{2}J_{h_{2}}(u_{0}-\frac{i}{n})^{2}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|_{L^{2}(\mu_{t,\frac{i}{n}})}^{2}\,.

Thus

P2\displaystyle P_{2} Tn2b20THh1(t0t)2i=1nJh2(u0in)2Kh3(x0)L2(μt,in)2\displaystyle\lesssim Tn^{-2}\left\|b\right\|_{\infty}^{2}\int_{0}^{T}H_{h_{1}}(t_{0}-t)^{2}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})^{2}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|_{L^{2}(\mu_{t,\frac{i}{n}})}^{2}
+Tn2h11h22h32db2H22J2K2.\displaystyle\qquad+Tn^{-2}h_{1}^{-1}h_{2}^{-2}h_{3}^{-2d}\left\|b\right\|_{\infty}^{2}\left\|H\right\|_{2}^{2}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{\infty}^{2}\,.

Step 3. The idea for bounding P3P_{3} is analogous to that of M3M_{3} in the proof of Lemma 3.1, which uses the stability of the graphon mean-field system. Observe that

P3T0THh1(t0t)24h2I|𝐄P3(t,u)|2𝑑u𝑑t,P_{3}\leqslant T\int_{0}^{T}H_{h_{1}}(t_{0}-t)^{2}4h_{2}\int_{I}\left|\mathbf{E}P_{3}(t,u)\right|^{2}dudt\,,

where

P3(t,u)=Jh2(u0nun)Kh3(x0Xnun(t))Ynun(t)Jh2(u0u)Kh3(x0Xu(t))Yu(t).P_{3}(t,u)=J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t))Y_{\frac{\lceil nu\rceil}{n}}(t)-J_{h_{2}}(u_{0}-u)K_{h_{3}}(x_{0}-X_{u}(t))Y_{u}(t)\,.

Note that

|P3(t,u)|2\displaystyle\left|P_{3}(t,u)\right|^{2} 2|Jh2(u0nun)Jh2(u0u)|2Kh3(x0Xnun(t))2|Ynun(t)|2\displaystyle\leqslant 2\left|J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})-J_{h_{2}}(u_{0}-u)\right|^{2}K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t))^{2}\left|Y_{\frac{\lceil nu\rceil}{n}}(t)\right|^{2}
+2Jh2(u0u)2|Kh3(x0Xnun(t))Ynun(t)Kh3(x0Xu(t))Yu(t)|2\displaystyle\qquad+2J_{h_{2}}(u_{0}-u)^{2}\left|K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t))Y_{\frac{\lceil nu\rceil}{n}}(t)-K_{h_{3}}(x_{0}-X_{u}(t))Y_{u}(t)\right|^{2}
2Jh22|nunu|2Kh3(x0Xnun(t))2b2\displaystyle\leqslant 2\left\|\nabla J_{h_{2}}\right\|_{\infty}^{2}\left|\frac{\lceil nu\rceil}{n}-u\right|^{2}K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t))^{2}\left\|b\right\|_{\infty}^{2}
+4Jh2(u0u)2Kh3(x0Xnun(t))2|Ynun(t)Yu(t)|2\displaystyle\qquad+4J_{h_{2}}(u_{0}-u)^{2}K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t))^{2}\left|Y_{\frac{\lceil nu\rceil}{n}}(t)-Y_{u}(t)\right|^{2}
+4Jh2(u0u)2|Kh3(x0Xnun(t))Kh3(x0Xu(t))|2|Yu(t)|2\displaystyle\qquad+4J_{h_{2}}(u_{0}-u)^{2}\left|K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t))-K_{h_{3}}(x_{0}-X_{u}(t))\right|^{2}\left|Y_{u}(t)\right|^{2}
2n2h22J2b2Kh3(x0Xnun(t))2\displaystyle\leqslant 2n^{-2}h_{2}^{-2}\left\|\nabla J\right\|_{\infty}^{2}\left\|b\right\|_{\infty}^{2}K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t))^{2}
+4Jh2(u0u)2h32dK2|Ynun(t)Yu(t)|2\displaystyle\qquad+4J_{h_{2}}(u_{0}-u)^{2}h_{3}^{-2d}\left\|K\right\|_{\infty}^{2}\left|Y_{\frac{\lceil nu\rceil}{n}}(t)-Y_{u}(t)\right|^{2}
+4Jh2(u0u)2h322dK2|Xnun(t)Xu(t)|2b2.\displaystyle\qquad+4J_{h_{2}}(u_{0}-u)^{2}h_{3}^{-2-2d}\left\|\nabla K\right\|_{\infty}^{2}\left|X_{\frac{\lceil nu\rceil}{n}}(t)-X_{u}(t)\right|^{2}\left\|b\right\|_{\infty}^{2}\,.

Then, by Jensen’s inequality, \left|\mathbf{E}P_{3}(t,u)\right|^{2}\leqslant\mathbf{E}\left|P_{3}(t,u)\right|^{2}, and

𝐄|P3(t,u)|2\displaystyle\mathbf{E}\left|P_{3}(t,u)\right|^{2} 2n2h24J2b2Kh3(x0)L2(μt,nun)2\displaystyle\leqslant 2n^{-2}h_{2}^{-4}\left\|\nabla J\right\|_{\infty}^{2}\left\|b\right\|_{\infty}^{2}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|_{L^{2}(\mu_{t,\frac{\lceil nu\rceil}{n}})}^{2}
+4n2h32dK2Jh2(u0u)2+4n2h322dK2b2Jh2(u0u)2.\displaystyle\quad+4n^{-2}h_{3}^{-2d}\left\|K\right\|_{\infty}^{2}J_{h_{2}}(u_{0}-u)^{2}+4n^{-2}h_{3}^{-2-2d}\left\|\nabla K\right\|_{\infty}^{2}\left\|b\right\|_{\infty}^{2}J_{h_{2}}(u_{0}-u)^{2}\,.

Integrating those produces

P3\displaystyle P_{3} Tn2h22J2b20THh1(t0t)2IKh3(x0)L2(μt,nun)2dudt\displaystyle\lesssim Tn^{-2}h_{2}^{-2}\left\|\nabla J\right\|_{\infty}^{2}\left\|b\right\|_{\infty}^{2}\int_{0}^{T}H_{h_{1}}(t_{0}-t)^{2}\int_{I}\left\|K_{h_{3}}(x_{0}-\cdot)\right\|_{L^{2}(\mu_{t,\frac{\lceil nu\rceil}{n}})}^{2}dudt
+Tn2h11h322dH22J22K2b2\displaystyle\qquad+Tn^{-2}h_{1}^{-1}h_{3}^{-2-2d}\left\|H\right\|_{2}^{2}\left\|J\right\|_{2}^{2}\left\|\nabla K\right\|_{\infty}^{2}\left\|b\right\|_{\infty}^{2}
+Tn2h11h32dH22J22K2.\displaystyle\qquad+Tn^{-2}h_{1}^{-1}h_{3}^{-2d}\left\|H\right\|_{2}^{2}\left\|J\right\|_{2}^{2}\left\|K\right\|_{\infty}^{2}\,.

Step 4. For P5P_{5}, notice that {Bin\nonscript|\nonscripti=1,,n}\{B_{\frac{i}{n}}\nonscript\>|\nonscript\>\mathopen{}\allowbreak i=1,\dots,n\} are distinct independent Brownian motions. Then we apply Itô’s isometry to see that

P5\displaystyle P_{5} =1n2𝐄|0Ti=1n(HJK)h(t0t,u0in,x0Xin(t))σ(Xin(t))dBin(t)|2\displaystyle=\frac{1}{n^{2}}\mathbf{E}\left|\int_{0}^{T}\sum_{i=1}^{n}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-\frac{i}{n},x_{0}-X^{n}_{i}(t))\sigma(X^{n}_{i}(t))dB_{\frac{i}{n}}(t)\right|^{2}
=dn2𝐄(0Ti=1n(HJK)h(t0t,u0in,x0Xin(t))2tr(σσT)(Xin(t))dt)\displaystyle=\frac{d}{n^{2}}\mathbf{E}\left(\int_{0}^{T}\sum_{i=1}^{n}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-\frac{i}{n},x_{0}-X^{n}_{i}(t))^{2}\operatorname{tr}(\sigma\sigma^{T})(X^{n}_{i}(t))dt\right)
σ+2d2n2i=1n0T𝐄(HJK)h(t0t,u0in,x0Xin(t))2\displaystyle\leqslant\frac{\sigma_{+}^{2}d^{2}}{n^{2}}\sum_{i=1}^{n}\int_{0}^{T}\mathbf{E}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-\frac{i}{n},x_{0}-X^{n}_{i}(t))^{2}
Td2σ+2n1h12h22h32d.\displaystyle\lesssim Td^{2}\sigma_{+}^{2}n^{-1}h_{1}^{-2}h_{2}^{-2}h_{3}^{-2d}\,.

Adding all the above bounds finishes the proof. ∎

Proof of Corollary 3.2.

Recall that β^h,κ,rn=β^h,κn𝟏{|x|r}\hat{\beta}^{n}_{h,\kappa,r}=\hat{\beta}^{n}_{h,\kappa}\mathbf{1}_{\{\left|x\right|\leqslant r\}}. We break the integral into two parts

\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\mathbb{R}^{d}}\mathbf{E}\left|\hat{\beta}^{n}_{h,\kappa,r}(t,u,x)-\beta(t,u,x)\right|^{2}dxdudt
(5.8) =\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\{\left|x\right|\leqslant r\}}\mathbf{E}\left|\hat{\beta}^{n}_{h,\kappa}(t,u,x)-\beta(t,u,x)\right|^{2}dxdudt+\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\{\left|x\right|>r\}}\left|\beta(t,u,x)\right|^{2}dxdudt\,.

Step 1. The convergence of the second part is due to the L2L^{2}-integrability of β\beta. More precisely, recall that

β(t,u,x)=Idb(x,y)G(u,v)μt,v(dy)𝑑v,\beta(t,u,x)=\int_{I}\int_{\mathbb{R}^{d}}b(x,y)G(u,v)\mu_{t,v}(dy)dv\,,

where b(x,y)=F(xy)+V(x)b(x,y)=F(x-y)+V(x) with F,VL1L2LF,V\in L^{1}\cap L^{2}\cap L^{\infty}. Then

\left|\beta(t,u,x)\right|^{2}\leqslant 2\left|V(x)\right|^{2}+2\int_{I}\int_{\mathbb{R}^{d}}\left|F(x-y)\right|^{2}\mu_{t,v}(dy)dv\,,

so that

\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\mathbb{R}^{d}}\left|\beta(t,u,x)\right|^{2}dxdudt\leqslant 2T(\left\|V\right\|_{2}^{2}+\left\|F\right\|_{2}^{2})<\infty\,.

Thus by dominated convergence, we have

θ2,β(r)=defτ1τ2I{|x|>r}|β(t,u,x)|2𝑑x𝑑u𝑑t0\theta_{2,\beta}(r)\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\{\left|x\right|>r\}}\left|\beta(t,u,x)\right|^{2}dxdudt\to 0

as rr\to\infty.

Step 2. We now look at the first part of (5.8). Recalling that β^h,κn=π^hnμ^hnκ2\hat{\beta}^{n}_{h,\kappa}=\frac{\hat{\pi}^{n}_{h}}{\hat{\mu}^{n}_{h}\lor\kappa_{2}}, we obtain

|β^h,κn(t,u,x)β(t,u,x)|2\displaystyle\left|\hat{\beta}^{n}_{h,\kappa}(t,u,x)-\beta(t,u,x)\right|^{2}\lesssim
κ22(|π^hn(t,u,x)π(t,u,x)|2+b2|μ^hn(t,u,x)μ(t,u,x)|2)\displaystyle\qquad\kappa_{2}^{-2}\left(\left|\hat{\pi}^{n}_{h}(t,u,x)-\pi(t,u,x)\right|^{2}+\left\|b\right\|_{\infty}^{2}\left|\hat{\mu}^{n}_{h}(t,u,x)-\mu(t,u,x)\right|^{2}\right)

whenever 0<\kappa_{2}<\mu(t,u,x). Note that \mu has a strictly positive lower bound over [\tau_{1},\tau_{2}]\times I\times B(0,r) thanks to Harnack’s inequality (see, for instance, Corollary 8.2.2 in [9]). This allows us to choose a strictly positive \kappa_{2} depending on r; without loss of generality, we take \kappa_{2}=\kappa_{2}(r) to be decreasing in r.
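To see the displayed inequality, note that when \kappa_{2}<\mu(t,u,x) we have \beta=\pi/\mu=\pi/(\mu\lor\kappa_{2}), and hence (suppressing the arguments (t,u,x))

\left|\hat{\beta}^{n}_{h,\kappa}-\beta\right|\leqslant\frac{\left|\hat{\pi}^{n}_{h}-\pi\right|}{\hat{\mu}^{n}_{h}\lor\kappa_{2}}+\left|\pi\right|\left|\frac{1}{\hat{\mu}^{n}_{h}\lor\kappa_{2}}-\frac{1}{\mu\lor\kappa_{2}}\right|\leqslant\kappa_{2}^{-1}\left|\hat{\pi}^{n}_{h}-\pi\right|+\kappa_{2}^{-1}\left\|b\right\|_{\infty}\left|\hat{\mu}^{n}_{h}-\mu\right|\,,

where we used \left|\pi\right|=\mu\left|\beta\right|\leqslant\left\|b\right\|_{\infty}\mu and \left|\frac{1}{\hat{\mu}^{n}_{h}\lor\kappa_{2}}-\frac{1}{\mu\lor\kappa_{2}}\right|\leqslant\frac{\left|\hat{\mu}^{n}_{h}-\mu\right|}{\kappa_{2}\,\mu}; squaring yields the claim.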

We already have an upper bound of

τ1τ2I{|x|r}𝐄|μ^hn(t,u,x)μ(t,u,x)|2𝑑x𝑑u𝑑t\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\{\left|x\right|\leqslant r\}}\mathbf{E}\left|\hat{\mu}^{n}_{h}(t,u,x)-\mu(t,u,x)\right|^{2}dxdudt

from Corollary 3.1. It remains to look at the errors of π^\hat{\pi}.

For the estimates of π\pi, we rearrange and combine the terms in the upper bound given in Lemma 3.2 to see that

𝐄|π^hn(t,u,x)π(t,u,x)|2T,b,H,J,K\displaystyle\mathbf{E}\left|\hat{\pi}^{n}_{h}(t,u,x)-\pi(t,u,x)\right|^{2}\lesssim_{T,b,H,J,K}
n1h11h22h322d+n1h12h22h32d\displaystyle\qquad n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-2-2d}+n^{-1}h_{1}^{-2}h_{2}^{-2}h_{3}^{-2d}
+n2h11h32di=1nJh2(uin)2\displaystyle\quad+n^{-2}h_{1}^{-1}h_{3}^{-2d}\sum_{i=1}^{n}J_{h_{2}}(u-\frac{i}{n})^{2}
+n20THh12(ts)i=1nJh22(uin)Kh3(x)L2(μs,in)2ds\displaystyle\quad+n^{-2}\int_{0}^{T}H_{h_{1}}^{2}(t-s)\sum_{i=1}^{n}J_{h_{2}}^{2}(u-\frac{i}{n})\left\|K_{h_{3}}(x-\cdot)\right\|_{L^{2}(\mu_{s,\frac{i}{n}})}^{2}ds
+n3h220THh12(ts)i=1nKh3(x)L2(μs,in)2ds\displaystyle\quad+n^{-3}h_{2}^{-2}\int_{0}^{T}H_{h_{1}}^{2}(t-s)\sum_{i=1}^{n}\left\|K_{h_{3}}(x-\cdot)\right\|_{L^{2}(\mu_{s,\frac{i}{n}})}^{2}ds
+|(HJK)hπ(t,u,x)π(t,u,x)|2.\displaystyle\quad+\left|(H\otimes J\otimes K)_{h}\ast\pi(t,u,x)-\pi(t,u,x)\right|^{2}\,.

Analogous to the proof of Corollary 3.1, integrating those items over [τ1,τ2]×I×{|x|r}[\tau_{1},\tau_{2}]\times I\times\{\left|x\right|\leqslant r\} produces

τ1τ2I{|x|r}𝐄|π^hn(t,u,x)π(t,u,x)|2T,b,H,J,K\displaystyle\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\{\left|x\right|\leqslant r\}}\mathbf{E}\left|\hat{\pi}^{n}_{h}(t,u,x)-\pi(t,u,x)\right|^{2}\lesssim_{T,b,H,J,K}
rd(n1h11h22h322d+n1h12h22h32d)\displaystyle\qquad r^{d}(n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-2-2d}+n^{-1}h_{1}^{-2}h_{2}^{-2}h_{3}^{-2d})
+n1h11h21h3d+n2h11h22h3d\displaystyle\quad+n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-2}h_{3}^{-d}
+(HJK)hππL2([τ1,τ2]×I×d)2.\displaystyle\quad+\left\|(H\otimes J\otimes K)_{h}\ast\pi-\pi\right\|_{L^{2}([\tau_{1},\tau_{2}]\times I\times\mathbb{R}^{d})}^{2}\,.

Recall that \pi=\mu\beta, where \left|\beta\right|\leqslant\left\|b\right\|_{\infty}. Using the same argument that led to (5.6), we see that

limh1,h2,h30(HJK)hππL2([τ1,τ2]×I×d)2=0,\lim_{h_{1},h_{2},h_{3}\to 0}\left\|(H\otimes J\otimes K)_{h}\ast\pi-\pi\right\|_{L^{2}([\tau_{1},\tau_{2}]\times I\times\mathbb{R}^{d})}^{2}=0\,,

and we denote the convergence rate by θ3,π(h)\theta_{3,\pi}(h).

Therefore, joining all the items, we obtain an overall upper bound

τ1τ2Id𝐄|β^h,κ,rn(t,u,x)β(t,u,x)|2𝑑x𝑑u𝑑t\displaystyle\int_{\tau_{1}}^{\tau_{2}}\int_{I}\int_{\mathbb{R}^{d}}\mathbf{E}\left|\hat{\beta}^{n}_{h,\kappa,r}(t,u,x)-\beta(t,u,x)\right|^{2}dxdudt\lesssim
κ2(r)2(n1h11h21h3d+n2h11h22h3d)\displaystyle\qquad\kappa_{2}(r)^{-2}(n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-2}h_{3}^{-d})
+κ2(r)2rd(n1h12h22h32d+n1h11h22h322d)\displaystyle\quad+\kappa_{2}(r)^{-2}r^{d}(n^{-1}h_{1}^{-2}h_{2}^{-2}h_{3}^{-2d}+n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-2-2d})
+κ2(r)2(θ3,μ(h)+θ3,π(h))+θ2,β(r),\displaystyle\quad+\kappa_{2}(r)^{-2}(\theta_{3,\mu}(h)+\theta_{3,\pi}(h))+\theta_{2,\beta}(r)\,,

finishing the proof. ∎

6. Proofs for Section 4

The main improvement of the estimates in Lemmata 4.1 and 4.2 over Lemmata 3.1 and 3.2 is the elimination of the step involving inequality (1.3). At a given point (t_{0},u_{0},x_{0}), we are able to remove a heavy error term at the cost of a constant multiple (depending on the point (t_{0},u_{0},x_{0})). This relies on a change-of-measure argument via Girsanov’s theorem, and the analysis of the constant multiple follows Proposition 19 of [28].

Recall that the finite-population system has the following dynamics

dXin(t)=1nj=1ngijnb(Xin(t),Xjn(t))dt+σ(Xin(t))dBin(t),dX^{n}_{i}(t)=\frac{1}{n}\sum_{j=1}^{n}g^{n}_{ij}b(X^{n}_{i}(t),X^{n}_{j}(t))dt+\sigma(X^{n}_{i}(t))dB_{\frac{i}{n}}(t)\,,

for i=1,,ni=1,\dots,n. We define

B¯in(t)=0t(σσT)1/2(Xin(s))(dXin(s)β(s,in,Xin(s))ds)\bar{B}^{n}_{i}(t)=\int_{0}^{t}(\sigma\sigma^{T})^{-1/2}(X^{n}_{i}(s))(dX^{n}_{i}(s)-\beta(s,\frac{i}{n},X^{n}_{i}(s))ds)

for i=1,,ni=1,\dots,n, and t[0,T]t\in[0,T]. Then

dXin(t)=β(t,in,Xin(t))dt+σ(Xin(t))dB¯in(t),i=1,,n.dX^{n}_{i}(t)=\beta(t,\frac{i}{n},X^{n}_{i}(t))dt+\sigma(X^{n}_{i}(t))d\bar{B}^{n}_{i}(t)\,,\qquad i=1,\dots,n\,.

Let M¯n\bar{M}^{n} be the process

M¯tn=i=1n0t(1nj=1ngijnb(Xin(s),Xjn(s))β(s,in,Xin(s)))T(σσT)1/2(Xin(s))𝑑B¯in(s).\bar{M}^{n}_{t}=\sum_{i=1}^{n}\int_{0}^{t}\left(\frac{1}{n}\sum_{j=1}^{n}g^{n}_{ij}b(X^{n}_{i}(s),X^{n}_{j}(s))-\beta(s,\frac{i}{n},X^{n}_{i}(s))\right)^{T}(\sigma\sigma^{T})^{-1/2}(X^{n}_{i}(s))d\bar{B}^{n}_{i}(s)\,.

Define a new probability measure 𝐏¯\bar{\mathbf{P}} via

d𝐏¯d𝐏=exp(M¯Tn12M¯nT),\frac{d\bar{\mathbf{P}}}{d\mathbf{P}}=\exp\left(\bar{M}^{n}_{T}-\frac{1}{2}\langle\bar{M}^{n}\rangle_{T}\right)\,,

where \langle\cdot\rangle denotes the quadratic variation. Observe that \{\bar{B}^{n}_{i}\nonscript\>|\nonscript\>\mathopen{}\allowbreak i=1,\dots,n\} are independent \bar{\mathbf{P}}-Brownian motions, and that \bar{M}^{n} is a \bar{\mathbf{P}}-martingale. So \{X^{n}_{i}\nonscript\>|\nonscript\>\mathopen{}\allowbreak i=1,\dots,n\} are independent under \bar{\mathbf{P}}, and the \bar{\mathbf{P}}-law of X^{n}_{i} coincides with the \mathbf{P}-law of X_{\frac{i}{n}} for every i=1,\dots,n. A slight modification of Proposition 19 of [28] gives the following relation.

Lemma 6.1.

There exist constants C,a>0C,a>0 such that, for any T\mathcal{F}_{T}-measurable event EE, we have

𝐏(E)C(𝐏¯(E))a.\mathbf{P}(E)\leqslant C(\bar{\mathbf{P}}(E))^{a}\,.

Here T\mathcal{F}_{T} is the σ\sigma-algebra generated by the Brownian motions {Bu(t)}t[0,T],uI\{B_{u}(t)\}_{t\in[0,T],u\in I}.

Now we have the tools to complete the proof of the improved estimations and thus the minimax analysis.

6.1. Proof of Theorem 4.1

We first justify the improved upper bound.

Proof of Lemma 4.1.

Observe that

𝐄|μ^hn(t0,u0,x0)μ(t0,u0,x0)|2\displaystyle\mathbf{E}\left|\hat{\mu}^{n}_{h}(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}
3𝐄|1ni=1nJh2(u0in)(Kh3(x0Xin(t0))𝐄¯(Kh3(x0Xin(t0))))|2\displaystyle\leqslant 3\mathbf{E}\left|\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}(u_{0}-\frac{i}{n})\big{(}K_{h_{3}}(x_{0}-X^{n}_{i}(t_{0}))-\bar{\mathbf{E}}(K_{h_{3}}(x_{0}-X^{n}_{i}(t_{0})))\big{)}\right|^{2}
\displaystyle+3\mathbf{E}\left|\int_{I}J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})\bar{\mathbf{E}}(K_{h_{3}}(x_{0}-X^{n}_{\lceil nu\rceil}(t_{0})))-J_{h_{2}}(u_{0}-u)\mathbf{E}(K_{h_{3}}(x_{0}-X_{u}(t_{0})))du\right|^{2}
\displaystyle+3\left|(J\otimes K)_{h}\ast\mu_{t_{0}}(u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}
=:3(M1+M2+M3).\displaystyle=:3(M^{\prime}_{1}+M^{\prime}_{2}+M^{\prime}_{3})\,.

For i=1,,ni=1,\dots,n, let

Z¯i=Jh2(u0in)(Kh3(x0Xin(t0))𝐄¯(Kh3(x0Xin(t0)))).\bar{Z}_{i}=J_{h_{2}}(u_{0}-\frac{i}{n})\big{(}K_{h_{3}}(x_{0}-X^{n}_{i}(t_{0}))-\bar{\mathbf{E}}(K_{h_{3}}(x_{0}-X^{n}_{i}(t_{0})))\big{)}\,.

Note that Z¯i=0\bar{Z}_{i}=0 whenever |u0in|>h2\left|u_{0}-\frac{i}{n}\right|>h_{2}, so the number of nonzero terms in the summation is O(nh2)O(nh_{2}).

The main improvement upon Lemma 3.1 comes from the upper bound of M1M^{\prime}_{1} via the change-of-measure argument. Following the same strategy as in the proof of Lemma 3.1, we have

M1\displaystyle M^{\prime}_{1} =𝐄|1ni=1nZ¯i|2\displaystyle=\mathbf{E}\left|\frac{1}{n}\sum_{i=1}^{n}\bar{Z}_{i}\right|^{2}
\displaystyle=\int_{0}^{\infty}\mathbf{P}\left(\left|\sum_{i=1}^{n}\bar{Z}_{i}\right|>n\sqrt{z}\right)dz
C0𝐏¯(|i=1nZ¯i|>nz)a𝑑z.\displaystyle\leqslant C\int_{0}^{\infty}\bar{\mathbf{P}}\left(\left|\sum_{i=1}^{n}\bar{Z}_{i}\right|>n\sqrt{z}\right)^{a}dz\,.

Recall that {Xin\nonscript|\nonscripti=1,,n}\{X^{n}_{i}\nonscript\>|\nonscript\>\mathopen{}\allowbreak i=1,\dots,n\} are independent under 𝐏¯\bar{\mathbf{P}}. Then so are {Z¯i\nonscript|\nonscripti=1,,n}\{\bar{Z}_{i}\nonscript\>|\nonscript\>\mathopen{}\allowbreak i=1,\dots,n\}. Moreover, we have 𝐄¯Z¯i=0\bar{\mathbf{E}}\bar{Z}_{i}=0 and |Z¯i|2(JK)h\left|\bar{Z}_{i}\right|\leqslant 2\left\|(J\otimes K)_{h}\right\|_{\infty} a.s. We may thus apply Bernstein’s inequality,

\bar{\mathbf{P}}\left(\left|\sum_{i=1}^{n}\bar{Z}_{i}\right|>n\sqrt{z}\right)\leqslant 2\exp\left(-\frac{\frac{1}{2}n^{2}z}{\sum_{i=1}^{n}\bar{\mathbf{E}}\bar{Z}_{i}^{2}+\frac{1}{3}n\sqrt{z}\left\|(J\otimes K)_{h}\right\|_{\infty}}\right)\,.

For index ii such that |u0in|h2\left|u_{0}-\frac{i}{n}\right|\leqslant h_{2}, we have

\displaystyle\bar{\mathbf{E}}\bar{Z}_{i}^{2}\leqslant J_{h_{2}}(u_{0}-\frac{i}{n})^{2}\bar{\mathbf{E}}(K_{h_{3}}(x_{0}-X^{n}_{i}(t_{0}))^{2})
=Jh2(u0in)2𝐄(Kh3(x0Xin(t0))2)\displaystyle=J_{h_{2}}(u_{0}-\frac{i}{n})^{2}\mathbf{E}(K_{h_{3}}(x_{0}-X_{\frac{i}{n}}(t_{0}))^{2})
h22J2dKh3(x0x)2μt0,in(dx)\displaystyle\leqslant h_{2}^{-2}\left\|J\right\|_{\infty}^{2}\int_{\mathbb{R}^{d}}K_{h_{3}}(x_{0}-x)^{2}\mu_{t_{0},\frac{i}{n}}(dx)
Cμh22h3dJ2K22,\displaystyle\leqslant C_{\mu}h_{2}^{-2}h_{3}^{-d}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{2}^{2}\,,

thanks to the local boundedness of \mu_{t_{0},\frac{i}{n}} in a neighborhood of x_{0}. Using estimate (48) in [28], we get

0𝐏¯(|i=1nZ¯i|>nz)a𝑑z\displaystyle\int_{0}^{\infty}\bar{\mathbf{P}}\left(\left|\sum_{i=1}^{n}\bar{Z}_{i}\right|>n\sqrt{z}\right)^{a}dz
2max{2Cμnh21h3dJ2K22an2,(2nh21h3d(JK)3an2)2}\displaystyle\quad\leqslant 2\max\left\{\frac{2C_{\mu}nh_{2}^{-1}h_{3}^{-d}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{2}^{2}}{an^{2}},\big{(}\frac{2nh_{2}^{-1}h_{3}^{-d}\left\|(J\otimes K)\right\|_{\infty}}{3an^{2}}\big{)}^{2}\right\}
C(n1h21h3dJ2K22+n2h22h32dJ2K2).\displaystyle\quad\leqslant C(n^{-1}h_{2}^{-1}h_{3}^{-d}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{2}^{2}+n^{-2}h_{2}^{-2}h_{3}^{-2d}\left\|J\right\|_{\infty}^{2}\left\|K\right\|_{\infty}^{2})\,.

The term M^{\prime}_{2} is where items (1) and (2) of Lemma 4.1 differ. We first work on item (1). Recall that J_{h_{2}} is supported on \overline{B(0,h_{2})}. Then the Cauchy-Schwarz inequality gives

M2\displaystyle M^{\prime}_{2}\leqslant
h2I|Jh2(u0nun)𝐄¯(Kh3(x0Xnun(t0)))Jh2(u0u)𝐄(Kh3(x0Xu(t0)))|2𝑑u.\displaystyle\;h_{2}\int_{I}\left|J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})\bar{\mathbf{E}}(K_{h_{3}}(x_{0}-X^{n}_{\lceil nu\rceil}(t_{0})))-J_{h_{2}}(u_{0}-u)\mathbf{E}(K_{h_{3}}(x_{0}-X_{u}(t_{0})))\right|^{2}du\,.

For each u[u0h2,u0+h2]u\in[u_{0}-h_{2},u_{0}+h_{2}], since the 𝐏¯\bar{\mathbf{P}}-law of XnunX^{n}_{\lceil nu\rceil} is identical to the 𝐏\mathbf{P}-law of XnunX_{\frac{\lceil nu\rceil}{n}}, we have

Jh2(u0nun)𝐄¯(Kh3(x0Xnun(t0)))Jh2(u0u)𝐄(Kh3(x0Xu(t0)))\displaystyle J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})\bar{\mathbf{E}}(K_{h_{3}}(x_{0}-X^{n}_{\lceil nu\rceil}(t_{0})))-J_{h_{2}}(u_{0}-u)\mathbf{E}(K_{h_{3}}(x_{0}-X_{u}(t_{0})))
=Jh2(u0nun)(𝐄(Kh3(x0Xnun(t0)))𝐄(Kh3(x0Xu(t0))))\displaystyle=J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})\left(\mathbf{E}(K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t_{0})))-\mathbf{E}(K_{h_{3}}(x_{0}-X_{u}(t_{0})))\right)
+(Jh2(u0nun)Jh2(u0u))𝐄(Kh3(x0Xu(t0))).\displaystyle+\left(J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})-J_{h_{2}}(u_{0}-u)\right)\mathbf{E}(K_{h_{3}}(x_{0}-X_{u}(t_{0})))\,.

So

|Jh2(u0nun)𝐄¯(Kh3(x0Xnun(t0)))Jh2(u0u)𝐄(Kh3(x0Xu(t0)))|2\displaystyle\left|J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})\bar{\mathbf{E}}(K_{h_{3}}(x_{0}-X^{n}_{\lceil nu\rceil}(t_{0})))-J_{h_{2}}(u_{0}-u)\mathbf{E}(K_{h_{3}}(x_{0}-X_{u}(t_{0})))\right|^{2}
2Jh2(u0nun)2𝐄|Kh3(x0Xnun(t0))Kh3(x0Xu(t0))|2\displaystyle\quad\leqslant 2J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})^{2}\mathbf{E}\left|K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t_{0}))-K_{h_{3}}(x_{0}-X_{u}(t_{0}))\right|^{2}
+2|Jh2(u0nun)Jh2(u0u)|2𝐄(Kh3(x0Xu(t0))2)\displaystyle\qquad+2\left|J_{h_{2}}(u_{0}-\frac{\lceil nu\rceil}{n})-J_{h_{2}}(u_{0}-u)\right|^{2}\mathbf{E}(K_{h_{3}}(x_{0}-X_{u}(t_{0}))^{2})
\displaystyle\quad\leqslant 2h_{2}^{-2}\left\|J\right\|_{\infty}^{2}h_{3}^{-2-2d}\left\|\nabla K\right\|_{\infty}^{2}\mathbf{E}\left|X_{\frac{\lceil nu\rceil}{n}}(t_{0})-X_{u}(t_{0})\right|^{2}
+2h24n2J2Cμh3dK22\displaystyle\qquad+2h_{2}^{-4}n^{-2}\left\|\nabla J\right\|_{\infty}^{2}C_{\mu}h_{3}^{-d}\left\|K\right\|_{2}^{2}
C(n2h22h322d+n2h24h3d),\displaystyle\quad\leqslant C(n^{-2}h_{2}^{-2}h_{3}^{-2-2d}+n^{-2}h_{2}^{-4}h_{3}^{-d})\,,

where the last inequality uses Theorem 2.1 of [3].

Integrating the above errors, we obtain

M2Cn2(h322dJ2K2+h22h3dJ2K22).M^{\prime}_{2}\leqslant Cn^{-2}(h_{3}^{-2-2d}\left\|J\right\|_{\infty}^{2}\left\|\nabla K\right\|_{\infty}^{2}+h_{2}^{-2}h_{3}^{-d}\left\|\nabla J\right\|_{\infty}^{2}\left\|K\right\|_{2}^{2})\,.

That finishes the proof of item (1).

Looking at the proof of item (1), we notice that the only difference in item (2) compared to item (1) happens at the term

𝐄|Kh3(x0Xnun(t0))Kh3(x0Xu(t0))|2.\mathbf{E}\left|K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t_{0}))-K_{h_{3}}(x_{0}-X_{u}(t_{0}))\right|^{2}\,.

The previous (crude) analysis in the proof of Lemma 3.1 gives an upper bound of order O(n^{-2}h_{3}^{-2-2d}) simply by the mean-value theorem. However, the mean-value bound is only needed on the event that \left|x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t_{0})\right|\leqslant h_{3} or \left|x_{0}-X_{u}(t_{0})\right|\leqslant h_{3}. Given the local boundedness of \mu, we have

𝐏(A(t0,u,x0))=def𝐏(|x0Xnun(t0)|h3 or |x0Xu(t0)|h3)2Cμh3d\mathbf{P}(A(t_{0},u,x_{0}))\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\mathbf{P}\left(\left|x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t_{0})\right|\leqslant h_{3}\text{ or }\left|x_{0}-X_{u}(t_{0})\right|\leqslant h_{3}\right)\leqslant 2C_{\mu}^{\prime}h_{3}^{d}

for some constant CμC_{\mu}^{\prime}.

Now, with the additional assumption on the continuity of initial data with respect to the pp-Wasserstein metric, we adjust the proof of Theorem 2.1(b) in [3] to see that

𝐄|Xnun(t0)Xu(t0)|pCpnp.\mathbf{E}\left|X_{\frac{\lceil nu\rceil}{n}}(t_{0})-X_{u}(t_{0})\right|^{p}\leqslant C_{p}^{\prime}n^{-p}\,.

Then, by Hölder’s inequality, we get

𝐄|Kh3(x0Xnun(t0))Kh3(x0Xu(t0))|2\displaystyle\mathbf{E}\left|K_{h_{3}}(x_{0}-X_{\frac{\lceil nu\rceil}{n}}(t_{0}))-K_{h_{3}}(x_{0}-X_{u}(t_{0}))\right|^{2}
𝐄(Kh32|Xnun(t0)Xu(t0)|2𝟏A(t0,u,x0))\displaystyle\leqslant\mathbf{E}\left(\left\|\nabla K_{h_{3}}\right\|_{\infty}^{2}\left|X_{\frac{\lceil nu\rceil}{n}}(t_{0})-X_{u}(t_{0})\right|^{2}\mathbf{1}_{A(t_{0},u,x_{0})}\right)
\displaystyle\leqslant h_{3}^{-2-2d}\left\|\nabla K\right\|_{\infty}^{2}\left(\mathbf{E}\left|X_{\frac{\lceil nu\rceil}{n}}(t_{0})-X_{u}(t_{0})\right|^{p}\right)^{\frac{2}{p}}\mathbf{P}(A(t_{0},u,x_{0}))^{\frac{p-2}{p}}
\displaystyle\leqslant(C_{p}^{\prime})^{\frac{2}{p}}(2C_{\mu}^{\prime})^{\frac{p-2}{p}}n^{-2}h_{3}^{-2-\frac{p+2}{p}d}\left\|\nabla K\right\|_{\infty}^{2}\,.

Note that CpC_{p}^{\prime} is independent of uu. That finishes the proof of item (2). ∎
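
Purely for intuition, the error terms controlled above are fluctuations of a kernel smoother of the particle cloud in the type variable u and the space variable x. A minimal numerical sketch of such a smoother is given below; the kernels J and K, the bandwidths, and the particle data are placeholders and are not the choices fixed in Section 2.

import numpy as np

def mu_smoother(t_idx, u0, x0, X, h2, h3):
    # Kernel-in-type, kernel-in-space smoother of a particle cloud.
    # X has shape (n, n_times, d): observed trajectories of the n particles.
    # J and K below are illustrative Epanechnikov-type kernels, not the paper's choices.
    n, _, d = X.shape
    u = np.arange(1, n + 1) / n                                         # particle labels i/n
    J = lambda v: 0.75 * np.maximum(1.0 - v**2, 0.0)                    # kernel on I
    K = lambda y: np.prod(0.75 * np.maximum(1.0 - y**2, 0.0), axis=-1)  # product kernel on R^d
    Jw = J((u0 - u) / h2) / h2                                          # J_{h2}(u0 - i/n)
    Kw = K((x0 - X[:, t_idx, :]) / h3) / h3**d                          # K_{h3}(x0 - X_i(t))
    return np.mean(Jw * Kw)

# toy usage: n = 500 particles in d = 1, a single observation time
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1, 1))
print(mu_smoother(0, u0=0.5, x0=np.zeros(1), X=X, h2=0.2, h3=0.3))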

It remains to analyze the bias term. Fix t0(0,T)t_{0}\in(0,T), u0(0,1)u_{0}\in(0,1), and x0dx_{0}\in\mathbb{R}^{d}. When h2<u0h_{2}<u_{0}, we have

(JK)hμ(t0,u0,x0)μ(t0,u0,x0)=\displaystyle(J\otimes K)_{h}\ast\mu(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})=
dJh2(u0u)Kh3(x0x)(μ(t0,u,x)μ(t0,u0,x0))𝑑x𝑑u.\displaystyle\qquad\int_{\mathbb{R}}\int_{\mathbb{R}^{d}}J_{h_{2}}(u_{0}-u)K_{h_{3}}(x_{0}-x)(\mu(t_{0},u,x)-\mu(t_{0},u_{0},x_{0}))dxdu\,.

For uIu\in I and xdx\in\mathbb{R}^{d} such that |u0u|<h2\left|u_{0}-u\right|<h_{2} and |x0x|<h3\left|x_{0}-x\right|<h_{3}, we have

μ(t0,u,x)μ(t0,u0,x0)=\displaystyle\mu(t_{0},u,x)-\mu(t_{0},u_{0},x_{0})=
(μ(t0,u,x)μ(t0,u0,x))+(μ(t0,u0,x)μ(t0,u0,x0)).\displaystyle\qquad(\mu(t_{0},u,x)-\mu(t_{0},u_{0},x))+(\mu(t_{0},u_{0},x)-\mu(t_{0},u_{0},x_{0}))\,.

The second term has order O(\left|h_{3}\right|^{s}) since \mu(t_{0},u_{0},\cdot) belongs to the chosen Hölder continuity class. We will bound the first term with the following technical lemma.

Lemma 6.2.

There exists some constant C_{I}>0, depending only on T,d,b,\sigma, such that

|μ(t,u,x)μ(t,v,x)|CI|uv|dx-a.s.\left|\mu(t,u,x)-\mu(t,v,x)\right|\leqslant C_{I}\left|u-v\right|\qquad dx\text{-a.s.}

for every u,vIu,v\in I and every t[0,T]t\in[0,T].

The proof relies on several properties of parabolic equations, and we defer it to Appendix B.

With the technical estimates given above, we are now able to prove Theorem 4.1. We start with the upper bound.

Proof of Theorem 4.1, upper bound.

We first work under assumption (a).

Given (b,σ,G,μ0)𝒜Ls(t0,x0)(b,\sigma,G,\mu_{0})\in\mathcal{A}^{s}_{L}(t_{0},x_{0}), we know that

|μ(t0,u0,x)μ(t0,u0,x0)|L|xx0|sLh3s\left|\mu(t_{0},u_{0},x)-\mu(t_{0},u_{0},x_{0})\right|\leqslant L\left|x-x_{0}\right|^{s}\leqslant Lh_{3}^{s}

whenever x0xsupp(Kh3)x_{0}-x\in supp(K_{h_{3}}). Thanks to Lemma 6.2, we have

|μ(t0,u,x)μ(t0,u0,x)|CI|uu0|CIh2,\left|\mu(t_{0},u,x)-\mu(t_{0},u_{0},x)\right|\leqslant C_{I}\left|u-u_{0}\right|\leqslant C_{I}h_{2}\,,

whenever u,u0Iu,u_{0}\in I. So the bias term is bounded by

|(JK)hμt0(u0,x0)μ(t0,u0,x0)|22(CI2+L2)(h22+h32s).\left|(J\otimes K)_{h}\ast\mu_{t_{0}}(u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\leqslant 2(C_{I}^{2}+L^{2})(h_{2}^{2}+h_{3}^{2s})\,.

Then, along with Lemma 4.1, the total upper bound of the estimation error is given by

𝐄|μ^hnμ|2C(n1h21h3d+n2h22h32d+n2h322d+n2h22h3d+h22+h32s).\mathbf{E}\left|\hat{\mu}^{n}_{h}-\mu\right|^{2}\leqslant C(n^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{2}^{-2}h_{3}^{-2d}+n^{-2}h_{3}^{-2-2d}+n^{-2}h_{2}^{-2}h_{3}^{-d}+h_{2}^{2}+h_{3}^{2s})\,.

Taking h2=nsd+3sh_{2}=n^{-\frac{s}{d+3s}} and h3=n1d+3sh_{3}=n^{-\frac{1}{d+3s}}, we get

𝐄|μ^hn(t0,u0,x0)μ(t0,u0,x0)|2n2sd+3s+n6s2d+3sn2sd+3s.\mathbf{E}\left|\hat{\mu}^{n}_{h}(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\lesssim n^{-\frac{2s}{d+3s}}+n^{-\frac{6s-2}{d+3s}}\lesssim n^{-\frac{2s}{d+3s}}\,.

The last inequality holds when s\geqslant\frac{1}{2}. Note that the implicit constant in the inequality depends only on T,d,C_{I},L,\left\|J\otimes K\right\|_{2},\left\|J\otimes K\right\|_{\infty}, and the values of \mu near (t_{0},u_{0},x_{0}).
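
As a sanity check on the exponent arithmetic behind this choice of bandwidths, one can verify symbolically that every term in the bound above is O(n^{-\frac{2s}{d+3s}}) once s\geqslant\frac{1}{2}. A short sketch in sympy, purely as an illustration:

import sympy as sp

s, d = sp.symbols('s d', positive=True)
e2, e3 = -s/(d + 3*s), -sp.Integer(1)/(d + 3*s)      # h2 = n^{e2}, h3 = n^{e3}
target = -2*s/(d + 3*s)                              # claimed rate n^{target}

# exponents of n in the six terms of the error bound
exponents = [-1 - e2 - d*e3, -2 - 2*e2 - 2*d*e3, -2 - (2 + 2*d)*e3,
             -2 - 2*e2 - d*e3, 2*e2, 2*s*e3]
for e in exponents:
    print(sp.simplify(e - target))   # each printed gap is <= 0 whenever s >= 1/2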

Next, we work under the assumption (b). Analogous to above, we have

𝐄|μ^hnμ|2C(n1h21h3d+n2h22h32d+n2h32p+2pd+n2h22h3d+h22+h32s).\mathbf{E}\left|\hat{\mu}^{n}_{h}-\mu\right|^{2}\leqslant C(n^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{2}^{-2}h_{3}^{-2d}+n^{-2}h_{3}^{-2-\frac{p+2}{p}d}+n^{-2}h_{2}^{-2}h_{3}^{-d}+h_{2}^{2}+h_{3}^{2s})\,.

Taking h2=nsd+3sh_{2}=n^{-\frac{s}{d+3s}} and h3=n1d+3sh_{3}=n^{-\frac{1}{d+3s}}, we get

𝐄|μ^hn(t0,u0,x0)μ(t0,u0,x0)|2n2sd+3s+n6s2+d2d/pd+3s.\mathbf{E}\left|\hat{\mu}^{n}_{h}(t_{0},u_{0},x_{0})-\mu(t_{0},u_{0},x_{0})\right|^{2}\lesssim n^{-\frac{2s}{d+3s}}+n^{-\frac{6s-2+d-2d/p}{d+3s}}\,.

As 0<s<120<s<\frac{1}{2}, p>2p>2, and p(24s)(p2)dp(2-4s)\leqslant(p-2)d, we have

6s2+d2dp2s.6s-2+d-\frac{2d}{p}\geqslant 2s\,.

This leads to the final asymptotic upper bound in (4.1), namely

n2sd+3s.n^{-\frac{2s}{d+3s}}\,.
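
The elementary inequality used above, namely that p(2-4s)\leqslant(p-2)d implies 6s-2+d-\frac{2d}{p}\geqslant 2s, is a one-line algebraic identity; the following sympy sketch verifies it (illustration only):

import sympy as sp

s, d, p = sp.symbols('s d p', positive=True)
lhs = 6*s - 2 + d - 2*d/p - 2*s            # want lhs >= 0
constraint = (p - 2)*d - p*(2 - 4*s)       # assumed >= 0
print(sp.expand(lhs*p - constraint))       # prints 0, so lhs = constraint / p >= 0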

Finally, we demonstrate the lower bound using Le Cam's two-point method (see for instance Chapter 2 of [26]). We shall construct two examples of graphon mean-field systems such that the total variation distance between their laws is bounded by \frac{1}{2}, while their densities at (t_{0},u_{0},x_{0}) differ by a quantity of order n^{-\frac{s}{d+3s}}. The construction is adapted from [28], with an extra factor for the graphon index u\in I, so we will skip some technical details in the proof below.

Proof of Theorem 4.1, lower bound.

Step 1. We consider graphon particle systems with no interactions.

Pick a smooth potential function U1:dU_{1}:\mathbb{R}^{d}\to\mathbb{R} such that U1\nabla U_{1} is Lipschitz, U1=0U_{1}=0 in a neighborhood of x0x_{0}, and

lim sup|x|xTU1(x)|x|2>0.\limsup_{\left|x\right|\to\infty}\frac{x^{T}\nabla U_{1}(x)}{\left|x\right|^{2}}>0\,.

Define the drift b(x,y)=b_{1}(x)\stackrel{\scriptscriptstyle\textup{def}}{=}-\nabla U_{1}(x). Pick a Lipschitz continuous function G_{1}:I\to[0,1] such that G_{1}=0 in a neighborhood of u_{0}, and define the graphon weight G(u,v)=G_{1}(u). We set C_{1,u}\stackrel{\scriptscriptstyle\textup{def}}{=}\int_{\mathbb{R}^{d}}\exp(-2G_{1}(u)U_{1}(x))dx and define

ν1(u,x)=C1,u1exp(2G1(u)U1(x)),uI.\nu_{1}(u,x)=C_{1,u}^{-1}\exp(-2G_{1}(u)U_{1}(x))\,,\qquad u\in I\,.

Then we obtain a family of diffusion processes {Xu}uI\{X_{u}\}_{u\in I} such that

(6.1) dXu(t)=b1(Xu(t))G1(u)dt+dBu(t),Xu(0)ν1(u),uI.dX_{u}(t)=b_{1}(X_{u}(t))G_{1}(u)dt+dB_{u}(t)\,,\quad X_{u}(0)\sim\nu_{1}(u)\,,\qquad u\in I\,.

Notice that XuX_{u}’s are independent, and ν1(u)\nu_{1}(u) is the invariant distribution of XuX_{u}. This gives a graphon particle system with time-invariant density function ν1\nu_{1}. In particular, we may assume that (b1,Id×d,G1,ν1)SL/2s(t0,x0)(b_{1},I_{d\times d},G_{1},\nu_{1})\in S^{s}_{L/2}(t_{0},x_{0}).

Now we consider a deviation from the system (6.1). Let \psi\in C_{c}^{\infty}(\mathbb{R}\times\mathbb{R}^{d}) be a cut-off function such that

  • ψ(0,0)=1\psi(0,0)=1 and ψ=1\left\|\psi\right\|_{\infty}=1,

  • dψ(u,x)𝑑x=0\int_{\mathbb{R}^{d}}\psi(u,x)dx=0 for every uu\in\mathbb{R}, and ψ2=1\left\|\psi\right\|_{2}=1,

  • supuψ(u,)s(x0)<\sup_{u\in\mathbb{R}}\left\|\psi(u,\cdot)\right\|_{\mathcal{H}^{s}(x_{0})}<\infty.

Let \alpha\in(0,1) be sufficiently small. Then define U_{2}^{n}:\mathbb{R}^{d}\to\mathbb{R} and G_{2}^{n}:I\to[0,1] so that

G2n(u)U2n(x)=G1(u)U1(x)+αC1,un1/2ζn1/2τnd/2ψ(ζn(uu0),τn(xx0)),G_{2}^{n}(u)U_{2}^{n}(x)=G_{1}(u)U_{1}(x)+\alpha C_{1,u}n^{-1/2}\zeta_{n}^{1/2}\tau_{n}^{d/2}\psi(\zeta_{n}(u-u_{0}),\tau_{n}(x-x_{0}))\,,

where τn,ζn\tau_{n},\zeta_{n} are positive scalars that tend to \infty as nn\to\infty. Let b2n=U2nb_{2}^{n}=-\nabla U_{2}^{n}. Then we construct the second particle system similar to the above, with time-invariant density

ν2n(u,x)=C2,n,u1exp(2G2n(u)U2n(x)),C2,n,u=defdexp(2G2n(u)U2n(x))𝑑x.\nu_{2}^{n}(u,x)=C_{2,n,u}^{-1}\exp(-2G_{2}^{n}(u)U_{2}^{n}(x))\,,\quad C_{2,n,u}\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\int_{\mathbb{R}^{d}}\exp(-2G_{2}^{n}(u)U_{2}^{n}(x))dx\,.

Moreover, to maintain the desired Lipschitz and Hölder continuity, we need

n1/2ζn3/2τns+d/21.n^{-1/2}\zeta_{n}^{3/2}\tau_{n}^{s+d/2}\lesssim 1\,.

This allows us to take τn=n1d+3s\tau_{n}=n^{\frac{1}{d+3s}} and ζn=nsd+3s\zeta_{n}=n^{\frac{s}{d+3s}}, which also ensures that (b2n,Id×d,G2n,ν2n)SLs(t0,x0)(b_{2}^{n},I_{d\times d},G_{2}^{n},\nu_{2}^{n})\in S^{s}_{L}(t_{0},x_{0}).

Step 2. Now we run the finite-population systems derived from the above two graphon particle systems and make observations of the particle positions. For (6.1), the nn particles display the dynamics

dX^{n}_{i}(t)=b_{1}(X^{n}_{i}(t))G_{1}(\frac{i}{n})dt+dB_{\frac{i}{n}}(t)\,,\qquad i=1,\dots,n\,.

The distributions of the particles coincide with those in the graphon system, with joint law

μ1=defi=1nν1(in).\mu_{1}\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\bigotimes_{i=1}^{n}\nu_{1}(\frac{i}{n})\,.

Similarly, the joint law in the second system is given by

μ2n=defi=1nν2n(in).\mu_{2}^{n}\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\bigotimes_{i=1}^{n}\nu_{2}^{n}(\frac{i}{n})\,.

Then, following the strategy in [28], with Pinsker’s inequality, we have

μ1μ2nTV22i=1n|logC2,n,inC1,in|\displaystyle\left\|\mu_{1}-\mu_{2}^{n}\right\|_{TV}^{2}\leqslant 2\sum_{i=1}^{n}\left|\log\frac{C_{2,n,\frac{i}{n}}}{C_{1,\frac{i}{n}}}\right|

Taylor’s theorem gives

|logC2,n,inC1,in|\displaystyle\left|\log\frac{C_{2,n,\frac{i}{n}}}{C_{1,\frac{i}{n}}}\right| |C2,n,inC1,in1|\displaystyle\leqslant\left|\frac{C_{2,n,\frac{i}{n}}}{C_{1,\frac{i}{n}}}-1\right|
(6.2) =2α2n1ζnτnddν1(u,x)1ψ(ζn(inu0),τn(xx0))2Ri(x)𝑑x,\displaystyle=2\alpha^{2}n^{-1}\zeta_{n}\tau_{n}^{d}\int_{\mathbb{R}^{d}}\nu_{1}(u,x)^{-1}\psi(\zeta_{n}(\frac{i}{n}-u_{0}),\tau_{n}(x-x_{0}))^{2}R_{i}(x)dx\,,

where the remainder term Ri[0,2]R_{i}\in[0,2]. Notice that ν1(u,x)1\nu_{1}(u,x)^{-1} is bounded above in a neighborhood of (u0,x0)(u_{0},x_{0}). So there exists some constant c1c_{1} such that

μ1μ2nTV2c1α2ζnni=1nψ(ζn(inu0),)22.\left\|\mu_{1}-\mu_{2}^{n}\right\|_{TV}^{2}\leqslant\frac{c_{1}\alpha^{2}\zeta_{n}}{n}\sum_{i=1}^{n}\left\|\psi(\zeta_{n}(\frac{i}{n}-u_{0}),\cdot)\right\|_{2}^{2}\,.

Since ψCc\psi\in C_{c}^{\infty}, we have

Id|ψ(ζn(nunu0),x)2ψ(ζn(uu0),x)2|𝑑x𝑑u\displaystyle\int_{I}\int_{\mathbb{R}^{d}}\left|\psi(\zeta_{n}(\frac{\lceil nu\rceil}{n}-u_{0}),x)^{2}-\psi(\zeta_{n}(u-u_{0}),x)^{2}\right|dxdu
2Id|ζn(nunu)|ψ0(x)𝑑x𝑑u\displaystyle\leqslant 2\int_{I}\int_{\mathbb{R}^{d}}\left|\zeta_{n}(\frac{\lceil nu\rceil}{n}-u)\right|\psi_{0}(x)dxdu
2n1ζnψ01,\displaystyle\leqslant 2n^{-1}\zeta_{n}\left\|\psi_{0}\right\|_{1}\,,

where we set \psi_{0}(x)\stackrel{\scriptscriptstyle\textup{def}}{=}\left\|\partial_{u}\psi(\cdot,x)\right\|_{L^{\infty}(\mathbb{R})}. This implies

1ni=1nψ(ζn(inu0),)22ζn1ψ22+2n1ζnψ01.\frac{1}{n}\sum_{i=1}^{n}\left\|\psi(\zeta_{n}(\frac{i}{n}-u_{0}),\cdot)\right\|_{2}^{2}\leqslant\zeta_{n}^{-1}\left\|\psi\right\|_{2}^{2}+2n^{-1}\zeta_{n}\left\|\psi_{0}\right\|_{1}\,.

Thus

μ1μ2nTV2c1α2+o(1)14\left\|\mu_{1}-\mu_{2}^{n}\right\|_{TV}^{2}\leqslant c_{1}\alpha^{2}+o(1)\leqslant\frac{1}{4}

when α\alpha is chosen to be small enough and nn is sufficiently large.

Step 3. Finally, we apply Le Cam's lemma to see that

infμ^sup(b,σ,G,μ0)𝒜Ls(t0,x0)𝐄|μ^μ(t0,u0,x0)|\displaystyle\inf_{\hat{\mu}}\sup_{(b,\sigma,G,\mu_{0})\in\mathcal{A}^{s}_{L}(t_{0},x_{0})}\mathbf{E}\left|\hat{\mu}-\mu(t_{0},u_{0},x_{0})\right|
\displaystyle\geqslant infμ^maxμ~{μ1,μ2n}𝐄|μ^μ~(t0,u0,x0)|\displaystyle\inf_{\hat{\mu}}\max_{\tilde{\mu}\in\{\mu_{1},\mu_{2}^{n}\}}\mathbf{E}\left|\hat{\mu}-\tilde{\mu}(t_{0},u_{0},x_{0})\right|
\displaystyle\geqslant 12|ν1(u0,x0)ν2n(u0,x0)|(1μ1μ2nTV)\displaystyle\frac{1}{2}\left|\nu_{1}(u_{0},x_{0})-\nu_{2}^{n}(u_{0},x_{0})\right|(1-\left\|\mu_{1}-\mu_{2}^{n}\right\|_{TV})
\displaystyle\geqslant 14|ν1(u0,x0)ν2n(u0,x0)|.\displaystyle\frac{1}{4}\left|\nu_{1}(u_{0},x_{0})-\nu_{2}^{n}(u_{0},x_{0})\right|\,.

The same strategy as in (6.2) (see also equation (68) in [28]) gives that

|ν1(u0,x0)ν2n(u0,x0)|n1/2ζn1/2τnd/2=nsd+3s.\left|\nu_{1}(u_{0},x_{0})-\nu_{2}^{n}(u_{0},x_{0})\right|\gtrsim n^{-1/2}\zeta_{n}^{1/2}\tau_{n}^{d/2}=n^{-\frac{s}{d+3s}}\,.

Therefore, we get the lower bound (4.2) as well:

\inf_{\hat{\mu}}\sup_{(b,\sigma,G,\mu_{0})\in\mathcal{A}^{s}_{L}(t_{0},x_{0})}\mathbf{E}\left|\hat{\mu}-\mu(t_{0},u_{0},x_{0})\right|^{2}\gtrsim n^{-\frac{2s}{d+3s}}\,,

completing the proof of Theorem 4.1. ∎
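
As a final check on the exponent arithmetic of the two-point construction, the perturbation magnitude n^{-1/2}\zeta_{n}^{1/2}\tau_{n}^{d/2} with \zeta_{n}=n^{\frac{s}{d+3s}} and \tau_{n}=n^{\frac{1}{d+3s}} is exactly n^{-\frac{s}{d+3s}}; a one-line symbolic verification (illustration only):

import sympy as sp

s, d = sp.symbols('s d', positive=True)
zeta_exp, tau_exp = s/(d + 3*s), 1/(d + 3*s)               # zeta_n = n^{zeta_exp}, tau_n = n^{tau_exp}
magnitude = -sp.Rational(1, 2) + zeta_exp/2 + d*tau_exp/2  # exponent of n^{-1/2} zeta_n^{1/2} tau_n^{d/2}
print(sp.simplify(magnitude + s/(d + 3*s)))                # prints 0, i.e. magnitude = -s/(d+3s)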

6.2. Proof of Theorem 4.2

We first justify the improved upper bound. For notational simplicity, let us define the n,p-norm of a function f:[0,T]\times I\times\mathbb{R}^{d}\to\mathbb{R}^{d} via

\left\|f\right\|_{n,p}^{p}\stackrel{\scriptscriptstyle\textup{def}}{=}\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}\int_{\mathbb{R}^{d}}\left|f(t,\frac{i}{n},x)\right|^{p}dxdt\,.
Proof of Lemma 4.2.

Fix t0,u0,x0t_{0},u_{0},x_{0}. We consider the following telescoping sum

π^hn(t0,u0,x0)π(t0,u0,x0)\displaystyle\hat{\pi}^{n}_{h}(t_{0},u_{0},x_{0})-\pi(t_{0},u_{0},x_{0})
=0T1ni=1n(HJK)h(t0t,u0in,x0Xin(t))(Yin(t)β(t,in,Xin(t)))dt\displaystyle=\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-\frac{i}{n},x_{0}-X^{n}_{i}(t))(Y^{n}_{i}(t)-\beta(t,\frac{i}{n},X^{n}_{i}(t)))dt
+0TId(HJK)h(t0t,u0u,x0x)β(t,u,x)(μtn(du,dx)μt,u(dx)du)𝑑t\displaystyle+\int_{0}^{T}\int_{I}\int_{\mathbb{R}^{d}}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-u,x_{0}-x)\beta(t,u,x)(\mu^{n}_{t}(du,dx)-\mu_{t,u}(dx)du)dt
+0T1ni=1n(HJK)h(t0t,u0in,x0Xin(t))σ(Xin(t))dBin(t)\displaystyle+\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-\frac{i}{n},x_{0}-X^{n}_{i}(t))\sigma(X^{n}_{i}(t))dB_{\frac{i}{n}}(t)
+((HJK)hππ)(t0,u0,x0).\displaystyle+((H\otimes J\otimes K)_{h}\ast\pi-\pi)(t_{0},u_{0},x_{0})\,.

This allows us to write

𝐄|π^hn(t0,u0,x0)π(t0,u0,x0)|24(𝐄|P1|2+𝐄|P2|2+𝐄|P3|2+𝐄|P4|2)\mathbf{E}\left|\hat{\pi}^{n}_{h}(t_{0},u_{0},x_{0})-\pi(t_{0},u_{0},x_{0})\right|^{2}\leqslant 4(\mathbf{E}\left|P^{\prime}_{1}\right|^{2}+\mathbf{E}\left|P^{\prime}_{2}\right|^{2}+\mathbf{E}\left|P^{\prime}_{3}\right|^{2}+\mathbf{E}\left|P^{\prime}_{4}\right|^{2})

with obvious definitions of the four components.

Step 1. To bound P2P^{\prime}_{2}, we observe that

P2\displaystyle P^{\prime}_{2} =[0,T]×I×d(φβ)(t,u,x)(μtn(du,dx)μt,u(dx)du)𝑑t\displaystyle=\int_{[0,T]\times I\times\mathbb{R}^{d}}(\varphi\beta)(t,u,x)(\mu^{n}_{t}(du,dx)-\mu_{t,u}(dx)du)dt
=[0,T]×I×d(φβ)(t,u,x)(μtn(du,dx)μ¯tn(du,dx))𝑑t\displaystyle=\int_{[0,T]\times I\times\mathbb{R}^{d}}(\varphi\beta)(t,u,x)(\mu^{n}_{t}(du,dx)-\bar{\mu}^{n}_{t}(du,dx))dt
+[0,T]×I×d(φβ)(t,u,x)(μ¯tn(du,dx)μt,u(dx)du)𝑑t,\displaystyle\quad+\int_{[0,T]\times I\times\mathbb{R}^{d}}(\varphi\beta)(t,u,x)(\bar{\mu}^{n}_{t}(du,dx)-\mu_{t,u}(dx)du)dt\,,

where μ¯tn(du,dx)=1ni=1nμt,u(dx)δin(du)\bar{\mu}^{n}_{t}(du,dx)=\frac{1}{n}\sum_{i=1}^{n}\mu_{t,u}(dx)\delta_{\frac{i}{n}}(du), and φ=(HJK)h(t0,u0,x0)\varphi=(H\otimes J\otimes K)_{h}(t_{0}-\cdot,u_{0}-\cdot,x_{0}-\cdot). Then the first part becomes

P^{\prime}_{2,1}\stackrel{\scriptscriptstyle\textup{def}}{=}\frac{1}{n}\sum_{i=1}^{n}\int_{0}^{T}\Big((\varphi\beta)(t,\frac{i}{n},X^{n}_{i}(t))-\bar{\mathbf{E}}[(\varphi\beta)(t,\frac{i}{n},X^{n}_{i}(t))]\Big)dt\,,

where 𝐏¯\bar{\mathbf{P}} is the measure under which (Xin)i=1,,n(X^{n}_{i})_{i=1,\dots,n} are independent, with corresponding laws μin\mu_{\frac{i}{n}}, i=1,,ni=1,\dots,n. Since β\beta is bounded, and μ(t,u,x)\mu(t,u,x) is bounded in a small neighborhood of (t0,u0,x0)(t_{0},u_{0},x_{0}), we have

𝐄¯[(φβ)(t,in,Xin(t))]2Hh12(t0t)Jh22(u0in)Kh322,\bar{\mathbf{E}}[(\varphi\beta)(t,\frac{i}{n},X^{n}_{i}(t))]^{2}\lesssim H_{h_{1}}^{2}(t_{0}-t)J_{h_{2}}^{2}(u_{0}-\frac{i}{n})\left\|K_{h_{3}}\right\|_{2}^{2}\,,

where the implicit constant is uniform over i=1,,ni=1,\dots,n. This gives

\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}\bar{\mathbf{E}}[(\varphi\beta)(t,\frac{i}{n},X^{n}_{i}(t))]^{2}dt\lesssim\left\|(H\otimes J\otimes K)_{h}\right\|_{n,2}^{2}\,.

Applying Bernstein’s inequality gives

𝐏(|P2,1|z)\displaystyle\mathbf{P}\big{(}\left|P^{\prime}_{2,1}\right|\geqslant\sqrt{z}\big{)} c1𝐏¯(|P2,1|z)c2\displaystyle\leqslant c_{1}\bar{\mathbf{P}}\big{(}\left|P^{\prime}_{2,1}\right|\geqslant\sqrt{z}\big{)}^{c_{2}}
2dc1exp(c2nzc3(T1(HJK)hn,22+z(HJK)h)),\displaystyle\leqslant 2dc_{1}\exp\left(-\frac{c_{2}nz}{c_{3}(T^{-1}\left\|(H\otimes J\otimes K)_{h}\right\|_{n,2}^{2}+\sqrt{z}\left\|(H\otimes J\otimes K)_{h}\right\|_{\infty})}\right)\,,

for some positive constants c1,c2,c3c_{1},c_{2},c_{3}. Note that Jh2J_{h_{2}} is supported on [u0h2,u0+h2][u_{0}-h_{2},u_{0}+h_{2}], so

(HJK)hn,22\displaystyle\left\|(H\otimes J\otimes K)_{h}\right\|_{n,2}^{2} =0THh12(t0t)1ni=1nJh22(u0in)Kh322dt\displaystyle=\int_{0}^{T}H_{h_{1}}^{2}(t_{0}-t)\frac{1}{n}\sum_{i=1}^{n}J_{h_{2}}^{2}(u_{0}-\frac{i}{n})\left\|K_{h_{3}}\right\|_{2}^{2}dt
Hh122h2Jh22Kh322\displaystyle\lesssim\left\|H_{h_{1}}\right\|_{2}^{2}h_{2}\left\|J_{h_{2}}\right\|_{\infty}^{2}\left\|K_{h_{3}}\right\|_{2}^{2}
h11h21h3d.\displaystyle\lesssim h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}\,.

Thus

𝐄|P2,1|2\displaystyle\mathbf{E}\left|P^{\prime}_{2,1}\right|^{2} 2dc10exp(c2nzc3(T1(HJK)hn,22+z(HJK)h))𝑑z\displaystyle\lesssim 2dc_{1}\int_{0}^{\infty}\exp\left(-\frac{c_{2}nz}{c_{3}(T^{-1}\left\|(H\otimes J\otimes K)_{h}\right\|_{n,2}^{2}+\sqrt{z}\left\|(H\otimes J\otimes K)_{h}\right\|_{\infty})}\right)dz
n1h11h21h3d+n2h12h22h32d.\displaystyle\lesssim n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-2}h_{2}^{-2}h_{3}^{-2d}\,.

For the second part, we observe that

P2,2\displaystyle P^{\prime}_{2,2} =def[0,T]×I×d(φβ)(t,u,x)(μ¯tn(du,dx)μt,u(dx)du)𝑑t\displaystyle\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\int_{[0,T]\times I\times\mathbb{R}^{d}}(\varphi\beta)(t,u,x)(\bar{\mu}^{n}_{t}(du,dx)-\mu_{t,u}(dx)du)dt
=0TI𝐄[(φβ)(t,nun,Xnun(t))(φβ)(t,u,Xu(t))]𝑑u𝑑t.\displaystyle=\int_{0}^{T}\int_{I}\mathbf{E}\left[(\varphi\beta)(t,\frac{\lceil nu\rceil}{n},X_{\frac{\lceil nu\rceil}{n}}(t))-(\varphi\beta)(t,u,X_{u}(t))\right]dudt\,.

This is identical to P3P_{3} in the proof of Lemma 3.2, which gives

|P2,2|n2h11h32d(h22+h32).\left|P^{\prime}_{2,2}\right|\lesssim n^{-2}h_{1}^{-1}h_{3}^{-2d}(h_{2}^{-2}+h_{3}^{-2})\,.

Thus

\mathbf{E}\left|P^{\prime}_{2}\right|^{2}\lesssim n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{3}^{-2d}(h_{1}^{-1}h_{2}^{-2}+h_{3}^{-2})\,.

Step 2. To bound P^{\prime}_{1}, let us abuse notation and write \beta:[0,T]\times I\times\mathbb{R}^{d}\times\mathcal{P}(I\times\mathbb{R}^{d})\to\mathbb{R}^{d} for the measure-dependent quantity

\beta(t,u,x,\mu_{t})=\int_{I}\int_{\mathbb{R}^{d}}G(u,v)b(x,y)\mu_{t,v}(dy)dv\,.

Then we may write

Yin(t)β(t,in,Xin(t))\displaystyle Y^{n}_{i}(t)-\beta(t,\frac{i}{n},X^{n}_{i}(t)) =β(t,in,Xin(t),μtn)β(t,in,Xin(t),μt)\displaystyle=\beta(t,\frac{i}{n},X^{n}_{i}(t),\mu^{n}_{t})-\beta(t,\frac{i}{n},X^{n}_{i}(t),\mu_{t})
=IdG(in,u)b(Xin(t),x)(μtn(du,dx)μt,u(dx)du).\displaystyle=\int_{I}\int_{\mathbb{R}^{d}}G(\frac{i}{n},u)b(X^{n}_{i}(t),x)(\mu^{n}_{t}(du,dx)-\mu_{t,u}(dx)du)\,.

Since b and G are bounded and Lipschitz, we may compare with Proposition 19 under Assumption 4(iii) of [28]. This gives a uniform-in-time bound

𝐏(|Yin(t)β(t,in,Xin(t))|z)c1exp(c2nz21+nz),z>0.\mathbf{P}\left(\left|Y^{n}_{i}(t)-\beta(t,\frac{i}{n},X^{n}_{i}(t))\right|\geqslant z\right)\leqslant c_{1}\exp\big{(}-\frac{c_{2}nz^{2}}{1+\sqrt{n}z}\big{)}\,,\qquad z>0\,.

Now, applying Cauchy-Schwarz twice, we obtain

𝐄|P1|2\displaystyle\mathbf{E}\left|P^{\prime}_{1}\right|^{2} [𝐄(0T1ni=1n(HJK)h(t0t,u0in,x0Xin(t))2dt)2]1/2\displaystyle\leqslant\left[\mathbf{E}\left(\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}(H\otimes J\otimes K)_{h}(t_{0}-t,u_{0}-\frac{i}{n},x_{0}-X^{n}_{i}(t))^{2}dt\right)^{2}\right]^{1/2}
[𝐄(0T1ni=1n|Yin(t)β(t,in,Xin(t))|2dt)2]1/2\displaystyle\quad\cdot\left[\mathbf{E}\left(\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}\left|Y^{n}_{i}(t)-\beta(t,\frac{i}{n},X^{n}_{i}(t))\right|^{2}dt\right)^{2}\right]^{1/2}

Applying Cauchy-Schwarz to the second term again, we see that

[𝐄(0T1ni=1n|Yin(t)β(t,in,Xin(t))|2dt)2]1/2\displaystyle\left[\mathbf{E}\left(\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}\left|Y^{n}_{i}(t)-\beta(t,\frac{i}{n},X^{n}_{i}(t))\right|^{2}dt\right)^{2}\right]^{1/2}
T1/2(0T1ni=1n𝐄|Yin(t)β(t,in,Xin(t))|4dt)1/2\displaystyle\leqslant T^{1/2}\left(\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}\mathbf{E}\left|Y^{n}_{i}(t)-\beta(t,\frac{i}{n},X^{n}_{i}(t))\right|^{4}dt\right)^{1/2}
supt[0,T]supi=1,,n(0𝐏(|Yin(t)β(t,in,Xin(t))|z1/4)𝑑z)1/2\displaystyle\lesssim\sup_{t\in[0,T]}\sup_{i=1,\dots,n}\left(\int_{0}^{\infty}\mathbf{P}\left(\left|Y^{n}_{i}(t)-\beta(t,\frac{i}{n},X^{n}_{i}(t))\right|\geqslant z^{1/4}\right)dz\right)^{1/2}
\displaystyle\lesssim\left(\int_{0}^{\infty}2c_{1}\exp\big(-\frac{c_{2}nz^{1/2}}{1+\sqrt{n}z^{1/4}}\big)dz\right)^{1/2}\lesssim n^{-1}\,.
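
The last integral is of order n^{-2}, hence the n^{-1} after taking the square root: on the region where \sqrt{n}z^{1/4}\geqslant 1 the exponent is at least \frac{c_{2}}{2}\sqrt{n}z^{1/4}, the complementary region has Lebesgue measure n^{-2}, and \int_{0}^{\infty}e^{-\lambda z^{1/4}}dz=24\lambda^{-4}. A symbolic sketch of this last elementary integral (illustration only):

import sympy as sp

z, lam, n = sp.symbols('z lambda n', positive=True)
I = sp.integrate(sp.exp(-lam*z**sp.Rational(1, 4)), (z, 0, sp.oo))
print(sp.simplify(I))                         # 24/lambda**4
print(sp.simplify(I.subs(lam, sp.sqrt(n))))   # 24/n**2, so the square root is of order n^{-1}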

For the first term, Minkowski’s inequality gives an upper bound

[0,T]×I×dφ(t,u,x)2μt,u(dx)𝑑u𝑑t\displaystyle\int_{[0,T]\times I\times\mathbb{R}^{d}}\varphi(t,u,x)^{2}\mu_{t,u}(dx)dudt
(6.3) +[𝐄([0,T]×I×dφ(t,u,x)2(μtn(du,dx)μt,u(dx)du)𝑑t)2]1/2,\displaystyle+\left[\mathbf{E}\left(\int_{[0,T]\times I\times\mathbb{R}^{d}}\varphi(t,u,x)^{2}(\mu^{n}_{t}(du,dx)-\mu_{t,u}(dx)du)dt\right)^{2}\right]^{1/2}\,,

where φ=(HJK)h(t0,u0,x0)\varphi=(H\otimes J\otimes K)_{h}(t_{0}-\cdot,u_{0}-\cdot,x_{0}-\cdot). With the same strategy applied to bound P2,1P^{\prime}_{2,1}, we get

𝐄[[0,T]×I×dφ(t,u,x)2(μtn(du,dx)μ¯tn(du,dx))𝑑t]2\displaystyle\mathbf{E}\left[\int_{[0,T]\times I\times\mathbb{R}^{d}}\varphi(t,u,x)^{2}(\mu^{n}_{t}(du,dx)-\bar{\mu}^{n}_{t}(du,dx))dt\right]^{2}
n1h13h23h33d+n2h14h24h34d,\displaystyle\qquad\qquad\lesssim n^{-1}h_{1}^{-3}h_{2}^{-3}h_{3}^{-3d}+n^{-2}h_{1}^{-4}h_{2}^{-4}h_{3}^{-4d}\,,

which follows from the fact that

φn,44h13h23h33d.\left\|\varphi\right\|_{n,4}^{4}\lesssim h_{1}^{-3}h_{2}^{-3}h_{3}^{-3d}\,.

On the other hand, using the idea of P2,2P^{\prime}_{2,2}, we get

\displaystyle\mathbf{E}\left[\int_{[0,T]\times I\times\mathbb{R}^{d}}\varphi(t,u,x)^{2}(\bar{\mu}^{n}_{t}(du,dx)-\mu_{t,u}(dx)du)dt\right]
n1h11h22h3d+n1h11h22h312d.\displaystyle\qquad\qquad\lesssim n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-d}+n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-1-2d}\,.

Those lead to the following asymptotic upper bound for the first term,

h11h21h3d+n1/2h13/2h23/2h33d/2+n1h12h22h32d\displaystyle h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-1/2}h_{1}^{-3/2}h_{2}^{-3/2}h_{3}^{-3d/2}+n^{-1}h_{1}^{-2}h_{2}^{-2}h_{3}^{-2d}
\displaystyle+n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-d}+n^{-1}h_{1}^{-1}h_{2}^{-2}h_{3}^{-1-2d}\,.

Joining the above leads to

𝐄|P1|2n1h11h21h3d+n2h11h22h312d,\mathbf{E}\left|P^{\prime}_{1}\right|^{2}\lesssim n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-2}h_{3}^{-1-2d}\,,

where we assume n1h11h21h3d1n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}\lesssim 1.

Step 3. Now, to bound P3P^{\prime}_{3}, we apply Itô’s isometry to see that

𝐄|P3|2\displaystyle\mathbf{E}\left|P^{\prime}_{3}\right|^{2} σ+2d2n𝐄[0T1ni=1nφ(t,in,Xin(t))2dt]\displaystyle\leqslant\frac{\sigma_{+}^{2}d^{2}}{n}\mathbf{E}\left[\int_{0}^{T}\frac{1}{n}\sum_{i=1}^{n}\varphi(t,\frac{i}{n},X^{n}_{i}(t))^{2}dt\right]
=σ+2d2n𝐄[[0,T]×I×dφ(t,u,x)2μtn(du,dx)𝑑t].\displaystyle=\frac{\sigma_{+}^{2}d^{2}}{n}\mathbf{E}\left[\int_{[0,T]\times I\times\mathbb{R}^{d}}\varphi(t,u,x)^{2}\mu^{n}_{t}(du,dx)dt\right]\,.

Note that

𝐄[[0,T]×I×dφ(t,u,x)2μt,u(dx)𝑑u𝑑t]φ22.\mathbf{E}\left[\int_{[0,T]\times I\times\mathbb{R}^{d}}\varphi(t,u,x)^{2}\mu_{t,u}(dx)dudt\right]\lesssim\left\|\varphi\right\|_{2}^{2}\,.

We write again

μtn(du,dx)μt,u(dx)du=(μtn(du,dx)μ¯tn(du,dx))+(μ¯tn(du,dx)μt,u(dx)du).\mu^{n}_{t}(du,dx)-\mu_{t,u}(dx)du=(\mu^{n}_{t}(du,dx)-\bar{\mu}^{n}_{t}(du,dx))+(\bar{\mu}^{n}_{t}(du,dx)-\mu_{t,u}(dx)du)\,.

Using the bound for (6.3) in Step 2 leads to

𝐄|P3|2n1h11h21h3d+n2h11h22h312d,\displaystyle\mathbf{E}\left|P^{\prime}_{3}\right|^{2}\lesssim n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-2}h_{3}^{-1-2d}\,,

whenever n1h11h21h3d1n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}\lesssim 1.

Summarizing the above, we conclude that

\mathbf{E}\left|\hat{\pi}^{n}_{h}-\pi\right|^{2}\lesssim n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-2}h_{3}^{-1-2d}+n^{-2}h_{1}^{-1}h_{3}^{-2-2d}+\left|\varphi\ast\pi-\pi\right|^{2}\,. ∎

Finally, analogous to the analysis of μ\mu, we are able to show the optimality of β^\hat{\beta}.

Proof of Theorem 4.2.

The strategy is identical to that used in the proof of Theorem 4.1.

Step 1: Upper bound. Fix t0(0,T)t_{0}\in(0,T), u0Iu_{0}\in I, x0dx_{0}\in\mathbb{R}^{d}, and κ2>0\kappa_{2}>0. We assume that n1h11h21h3d1n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}\lesssim 1.

Recall that

|β^h,κβ|2κ22(|π^hnπ|2+b2|μ^hnμ|2)\left|\hat{\beta}_{h,\kappa}-\beta\right|^{2}\lesssim\kappa_{2}^{-2}(\left|\hat{\pi}^{n}_{h}-\pi\right|^{2}+\left\|b\right\|_{\infty}^{2}\left|\hat{\mu}^{n}_{h}-\mu\right|^{2})

whenever κ2<infsupp(HJK)hμ\kappa_{2}<\inf_{\operatorname{supp}(H\otimes J\otimes K)_{h}}\mu. From Lemmata 4.1 and 4.2, we have

𝐄|μ^hnμ|2\displaystyle\mathbf{E}\left|\hat{\mu}^{n}_{h}-\mu\right|^{2}\lesssim
n1h21h3d+n2h22h32d+n2h322d+n2h22h3d\displaystyle\qquad n^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{2}^{-2}h_{3}^{-2d}+n^{-2}h_{3}^{-2-2d}+n^{-2}h_{2}^{-2}h_{3}^{-d}
+|(JK)hμμ|2,\displaystyle\qquad+\left|(J\otimes K)_{h}\ast\mu-\mu\right|^{2}\,,

and

𝐄|π^hnπ|2\displaystyle\mathbf{E}\left|\hat{\pi}^{n}_{h}-\pi\right|^{2}\lesssim
n1h11h21h3d+n2h11h22h312d+n2h11h322d\displaystyle\qquad n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-2}h_{3}^{-1-2d}+n^{-2}h_{1}^{-1}h_{3}^{-2-2d}
+|(HJK)hππ|2.\displaystyle\qquad+\left|(H\otimes J\otimes K)_{h}\ast\pi-\pi\right|^{2}\,.

In the proof of Theorem 4.1 we saw that

|(JK)hμμ|2h22+h32s3.\left|(J\otimes K)_{h}\ast\mu-\mu\right|^{2}\lesssim h_{2}^{2}+h_{3}^{2s_{3}}\,.

We may obtain a similar upper bound for |(HJK)hππ|2\left|(H\otimes J\otimes K)_{h}\ast\pi-\pi\right|^{2}.

Recall that π=μβ\pi=\mu\beta, where

β(t,u,x)=IG(u,v)db(t,x,y)μt,v(dy)𝑑v=IG(u,v)𝐄[b(t,x,Xv(t))]𝑑v.\beta(t,u,x)=\int_{I}G(u,v)\int_{\mathbb{R}^{d}}b(t,x,y)\mu_{t,v}(dy)dv=\int_{I}G(u,v)\mathbf{E}[b(t,x,X_{v}(t))]dv\,.

The boundedness of bb ensures the Lipschitz continuity of β\beta in variable uu, while the Lipschitz continuity of bb leads to local Hölder s3s_{3}-continuity of β\beta in variable xx for s3[0,1]s_{3}\in[0,1]. Moreover, for 0<t1<t2<T0<t_{1}<t_{2}<T, by Itô’s formula we have

β(t2,u,x)β(t1,u,x)\displaystyle\beta(t_{2},u,x)-\beta(t_{1},u,x)
=IG(u,v)𝐄(b(t2,x,Xv(t2))b(t1,x,Xv(t1)))𝑑v\displaystyle=\int_{I}G(u,v)\mathbf{E}(b(t_{2},x,X_{v}(t_{2}))-b(t_{1},x,X_{v}(t_{1})))dv
=IG(u,v)𝐄[t1t2(tbt(x)+yb(x)βt,v+12y2b(x)tr(σσT))(Xv(t))𝑑t]𝑑v.\displaystyle=\int_{I}G(u,v)\mathbf{E}\left[\int_{t_{1}}^{t_{2}}(\partial_{t}b_{t}(x)+\partial_{y}b(x)\beta_{t,v}+\frac{1}{2}\partial_{y}^{2}b(x)\operatorname{tr}(\sigma\sigma^{T}))(X_{v}(t))dt\right]dv\,.

Given the uniform boundedness of bb, tb\partial_{t}b, yb\partial_{y}b, and y2b\partial_{y}^{2}b, we have a uniform bound

|β(t2,u,x)β(t1,u,x)||t2t1|,\left|\beta(t_{2},u,x)-\beta(t_{1},u,x)\right|\lesssim\left|t_{2}-t_{1}\right|\,,

and this is further bounded by O(|t2t1|s1)O(\left|t_{2}-t_{1}\right|^{s_{1}}) for s1[0,1]s_{1}\in[0,1] whenever t2t1<1t_{2}-t_{1}<1. Then we have βs1,s3(t0,x0)\beta\in\mathcal{H}^{s_{1},s_{3}}(t_{0},x_{0}) as well, and so does π=μβ\pi=\mu\beta. Thus

|(HJK)hππ|2h12s1+h22+h32s3.\left|(H\otimes J\otimes K)_{h}\ast\pi-\pi\right|^{2}\lesssim h_{1}^{2s_{1}}+h_{2}^{2}+h_{3}^{2s_{3}}\,.

Joining the above leads to the bound

𝐄|β^h,κnβ|2n1h11h21h3d+n2h11h22h312d+n2h11h322d+h12s1+h22+h32s3.\mathbf{E}\left|\hat{\beta}^{n}_{h,\kappa}-\beta\right|^{2}\lesssim n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d}+n^{-2}h_{1}^{-1}h_{2}^{-2}h_{3}^{-1-2d}+n^{-2}h_{1}^{-1}h_{3}^{-2-2d}+h_{1}^{2s_{1}}+h_{2}^{2}+h_{3}^{2s_{3}}\,.

Choosing h1=nsbs1(2sb+1)h_{1}=n^{-\frac{s_{b}}{s_{1}(2s_{b}+1)}}, h2=nsb2sb+1h_{2}=n^{-\frac{s_{b}}{2s_{b}+1}}, and h3=nsbs3(2sb+1)h_{3}=n^{-\frac{s_{b}}{s_{3}(2s_{b}+1)}}, we get

𝐄|β^h,κn(t0,u0,x0)β(t0,u0,x0)|2n2sb2sb+1.\mathbf{E}\left|\hat{\beta}^{n}_{h,\kappa}(t_{0},u_{0},x_{0})-\beta(t_{0},u_{0},x_{0})\right|^{2}\lesssim n^{-\frac{2s_{b}}{2s_{b}+1}}\,.

Note that the implicit constant depends only on T,d,bT,d,\left\|b\right\|_{\infty}, the L2L^{2} and LL^{\infty} norms of (HJK)(H\otimes J\otimes K), and the values of μ\mu in a small neighborhood of (t0,u0,x0)(t_{0},u_{0},x_{0}). That concludes (4.5).
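
As in Theorem 4.1, the bandwidth choice can be checked by exponent arithmetic: the three bias terms h_{1}^{2s_{1}}, h_{2}^{2}, h_{3}^{2s_{3}} are all of order n^{-\frac{2s_{b}}{2s_{b}+1}} by construction, and requiring the leading variance term n^{-1}h_{1}^{-1}h_{2}^{-1}h_{3}^{-d} to be of the same order determines a value of s_{b}. The sympy sketch below performs this consistency check; we solve the balance equation only as an illustration and do not restate the definition of s_{b} from Section 4.

import sympy as sp

s1, s3, d, sb = sp.symbols('s1 s3 d s_b', positive=True)
e1, e2, e3 = -sb/(s1*(2*sb + 1)), -sb/(2*sb + 1), -sb/(s3*(2*sb + 1))   # h_i = n^{e_i}
target = -2*sb/(2*sb + 1)

# the three bias exponents all coincide with the target rate
print([sp.simplify(2*s1*e1 - target), sp.simplify(2*e2 - target), sp.simplify(2*s3*e3 - target)])

# balancing the variance exponent against the target pins down s_b
print(sp.solve(sp.Eq(-1 - e1 - e2 - d*e3, target), sb))   # [s1*s3/(d*s1 + s1*s3 + s3)]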

Step 2: Lower bound. We construct examples and apply the two-point comparison lemma in a similar way as in the proof of Theorem 4.1.

Let L>0. We also consider models with no interactions, so that

β(t,u,x)=IG(u)db(t,x)μt,v(dy)𝑑v=G(u)b(t,x).\beta(t,u,x)=\int_{I}G(u)\int_{\mathbb{R}^{d}}b(t,x)\mu_{t,v}(dy)dv=G(u)b(t,x)\,.

Pick b1(t,x)b_{1}(t,x) and G1(u)G_{1}(u) so that (b1,Id×d,G1,μ0)𝒜ˇL/2s1,s3(t0,x0)(b_{1},I_{d\times d},G_{1},\mu_{0})\in\check{\mathcal{A}}^{s_{1},s_{3}}_{L/2}(t_{0},x_{0}). Note that there is no interaction, so the particles (X1n,,Xnn)(X^{n}_{1},\dots,X^{n}_{n}) are independent, with joint law denoted by ν1\nu_{1}.

Choose some ψCc(××d)\psi\in C_{c}^{\infty}(\mathbb{R}\times\mathbb{R}\times\mathbb{R}^{d}) such that

  • \psi(0,0,0)=1 and \left\|\psi\right\|_{\infty}=1,

  • ψ𝑑t𝑑u𝑑x=0\int\psi dtdudx=0, ψ2=1\left\|\psi\right\|_{2}=1,

  • \sup_{u\in\mathbb{R}}\left\|\psi(\cdot,u,\cdot)\right\|_{\mathcal{H}^{s_{1},s_{3}}(t_{0},x_{0})}<\infty.

For n1n\geqslant 1 and some small enough α(0,1)\alpha\in(0,1), pick b2n(t,x)b_{2}^{n}(t,x) and G2n(u)G_{2}^{n}(u) so that

G2n(u)b2n(t,x)=G1(u)b1(t,x)+αn12(τn1)12(τn2)12(τn3)d2ψ(τn1(tt0),τn2(uu0),τn3(xx0)),G_{2}^{n}(u)b_{2}^{n}(t,x)=G_{1}(u)b_{1}(t,x)+\alpha n^{-\frac{1}{2}}(\tau^{1}_{n})^{\frac{1}{2}}(\tau^{2}_{n})^{\frac{1}{2}}(\tau^{3}_{n})^{\frac{d}{2}}\psi(\tau^{1}_{n}(t-t_{0}),\tau^{2}_{n}(u-u_{0}),\tau^{3}_{n}(x-x_{0}))\,,

where (τni)n1(\tau^{i}_{n})_{n\geqslant 1}, i=1,2,3i=1,2,3, are sequences of scalars such that

(τn1)s1=(τn2)s2=(τn3)s3d=nsb2sb+1.(\tau^{1}_{n})^{s_{1}}=(\tau^{2}_{n})^{s_{2}}=(\tau^{3}_{n})^{\frac{s_{3}}{d}}=n^{\frac{s_{b}}{2s_{b}+1}}\,.

With proper choice of parameters (b1,G1,μ0)(b_{1},G_{1},\mu_{0}), small enough α\alpha and large enough nn, the local Hölder continuity of the density follows from classical estimates given in [9]. This means we may assume (b2n,G2n,Id×d,μ0)𝒜ˇLs1,s3(t0,x0)(b_{2}^{n},G_{2}^{n},I_{d\times d},\mu_{0})\in\check{\mathcal{A}}^{s_{1},s_{3}}_{L}(t_{0},x_{0}).

Denote by ν2n\nu_{2}^{n} the joint law of the particles (X1n,,Xnn)(X^{n}_{1},\dots,X^{n}_{n}) derived from the parameters (b2n,G2n,Id×d,μ0)(b_{2}^{n},G_{2}^{n},I_{d\times d},\mu_{0}). Following the idea of Lemma 28 in [28], we see that

(6.4) ν1ν2nTV2140Ti=1n𝐄|G2n(in)b2n(t,Xin(t))G1(in)b1(t,Xin(t))|2dt.\left\|\nu_{1}-\nu_{2}^{n}\right\|_{TV}^{2}\leqslant\frac{1}{4}\int_{0}^{T}\sum_{i=1}^{n}\mathbf{E}\left|G_{2}^{n}(\frac{i}{n})b_{2}^{n}(t,X^{n}_{i}(t))-G_{1}(\frac{i}{n})b_{1}(t,X^{n}_{i}(t))\right|^{2}dt\,.

Given the compact support of ψ\psi and local boundedness of μt,u\mu_{t,u}, (6.4) is further bounded by

α240Tτ1ni=1nτ2n(τ3n)dψ(τ1n(tt0),τ2n(inu0),τ3n(x0))22dtα24<14\displaystyle\frac{\alpha^{2}}{4}\int_{0}^{T}\tau^{1}_{n}\sum_{i=1}^{n}\tau^{2}_{n}(\tau^{3}_{n})^{d}\left\|\psi(\tau^{1}_{n}(t-t_{0}),\tau^{2}_{n}(\frac{i}{n}-u_{0}),\tau^{3}_{n}(\cdot-x_{0}))\right\|_{2}^{2}dt\lesssim\frac{\alpha^{2}}{4}<\frac{1}{4}

when α\alpha is sufficiently small.

On the other hand,

|G2n(u0)b2n(t0,x0)G1(u0)b1(t0,x0)|=αn12(τ1n)12(τ2n)12(τ3n)d2nsb2sb+1.\left|G_{2}^{n}(u_{0})b_{2}^{n}(t_{0},x_{0})-G_{1}(u_{0})b_{1}(t_{0},x_{0})\right|=\alpha n^{-\frac{1}{2}}(\tau^{1}_{n})^{\frac{1}{2}}(\tau^{2}_{n})^{\frac{1}{2}}(\tau^{3}_{n})^{\frac{d}{2}}\gtrsim n^{-\frac{s_{b}}{2s_{b}+1}}\,.

Applying Le Cam’s two-point comparison method gives the lower bound

infβ^sup(b,σ,G,μ0)𝒜ˇs1,s3L(t0,x0)𝐄|β^β(t0,u0,x0)|\displaystyle\inf_{\hat{\beta}}\sup_{(b,\sigma,G,\mu_{0})\in\check{\mathcal{A}}^{s_{1},s_{3}}_{L}(t_{0},x_{0})}\mathbf{E}\left|\hat{\beta}-\beta(t_{0},u_{0},x_{0})\right|
\displaystyle\geqslant infβ^maxβ~{β1,β2n}𝐄|β^β~(t0,u0,x0)|\displaystyle\inf_{\hat{\beta}}\max_{\tilde{\beta}\in\{\beta_{1},\beta_{2}^{n}\}}\mathbf{E}\left|\hat{\beta}-\tilde{\beta}(t_{0},u_{0},x_{0})\right|
\displaystyle\geqslant 12|β1(t0,u0,x0)β2n(t0,u0,x0)|(1ν1ν2nTV)\displaystyle\frac{1}{2}\left|\beta_{1}(t_{0},u_{0},x_{0})-\beta_{2}^{n}(t_{0},u_{0},x_{0})\right|(1-\left\|\nu_{1}-\nu_{2}^{n}\right\|_{TV})
\displaystyle\geqslant 14|G1(u0)b1(t0,x0)G2n(u0)b2n(t0,x0)|\displaystyle\frac{1}{4}\left|G_{1}(u_{0})b_{1}(t_{0},x_{0})-G_{2}^{n}(u_{0})b_{2}^{n}(t_{0},x_{0})\right|
\displaystyle\gtrsim nsb2sb+1.\displaystyle n^{-\frac{s_{b}}{2s_{b}+1}}\,.

Acknowledgement

We express our sincere gratitude to Marc Hoffmann for helpful discussions.

Appendix A Intuitions of Estimators

Recall that

β(t,u,x)=Idb(x,y)G(u,v)μt,v(dy)dv.\beta(t,u,x)=\int_{I}\int_{\mathbb{R}^{d}}b(x,y)G(u,v)\mu_{t,v}(dy)dv\,.

With Condition 2.1(2) and 2.2(2), we may further expand it as

\beta(t,u,x)=V(x)\int_{I}g(u-v)dv+(g\otimes F)\ast\mu_{t}(u,x)\,,

where the convolution here is done on the space ×d\mathbb{R}\times\mathbb{R}^{d}.

The first term is independent of time t, so we have \partial_{t}\beta=(g\otimes F)\ast\partial_{t}\mu. Note that \partial_{t} is a linear operator, and we may approximate it by some finite difference operator

Dhf(t0)=deff(t0+h)f(t0)h=0Tf(t)δt0+h(dt)δt0(dt)h.D_{h}f(t_{0})\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\frac{f(t_{0}+h)-f(t_{0})}{h}=\int_{0}^{T}f(t)\frac{\delta_{t_{0}+h}(dt)-\delta_{t_{0}}(dt)}{h}\,.

We consider a linear operator \mathcal{L}_{\phi}, built from a bounded function \phi, that approximates the differential operator \partial_{t}, so that

ϕβ=(gF)ϕμ.\mathcal{L}_{\phi}\beta=(g\otimes F)\ast\mathcal{L}_{\phi}\mu\,.
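
For concreteness, one admissible bounded choice of \phi is a difference of two normalized indicator bumps around t_{0}, so that \mathcal{L}_{\phi}f is a difference of local time averages and converges to \partial_{t}f(t_{0}) as the window shrinks. The sketch below illustrates this particular choice on gridded data; it is only one possible \phi, not the one fixed in Section 2.

import numpy as np

def L_phi(f_vals, t_grid, t0, h):
    # Bounded-phi approximation of the time derivative at t0, with
    # phi_h = (1_{[t0, t0+h]} - 1_{[t0-h, t0]}) / h^2, so that
    # int f(t) phi_h(t) dt = (mean of f on [t0, t0+h] - mean of f on [t0-h, t0]) / h.
    dt = t_grid[1] - t_grid[0]
    phi = (((t_grid >= t0) & (t_grid <= t0 + h)).astype(float)
           - ((t_grid >= t0 - h) & (t_grid <= t0)).astype(float)) / h**2
    return np.sum(f_vals * phi) * dt

# toy check: f(t) = sin(t); the output approximates cos(1.0) ~ 0.5403
t_grid = np.linspace(0.0, 2.0, 20001)
print(L_phi(np.sin(t_grid), t_grid, t0=1.0, h=0.01))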

Then the deconvolution method gives

Idϕβ=(Ig)(dF)(Idϕμ),\mathcal{F}_{I}\mathcal{F}_{\mathbb{R}^{d}}\mathcal{L}_{\phi}\beta=(\mathcal{F}_{I}g)(\mathcal{F}_{\mathbb{R}^{d}}F)(\mathcal{F}_{I}\mathcal{F}_{\mathbb{R}^{d}}\mathcal{L}_{\phi}\mu)\,,

i.e. 𝒯β=(g)(F)𝒯μ\mathcal{T}\beta=(\mathcal{F}g)(\mathcal{F}F)\mathcal{T}\mu. Thus it leads to the formula

G(u0,v0)=g(u0v0)F2g0F2=I1(𝒯β𝒯μ)(u=u0v0)2I1(𝒯β𝒯μ)(u=0)2G(u_{0},v_{0})=\frac{g(u_{0}-v_{0})\left\|F\right\|_{2}}{g_{0}\left\|F\right\|_{2}}=\frac{\left\|\mathcal{F}_{I}^{-1}\big{(}\frac{\mathcal{T}\beta}{\mathcal{T}\mu}\big{)}(u=u_{0}-v_{0})\right\|_{2}}{\left\|\mathcal{F}_{I}^{-1}\big{(}\frac{\mathcal{T}\beta}{\mathcal{T}\mu}\big{)}(u=0)\right\|_{2}}

whenever well-defined.

Once we have an estimate of μ\mu and an estimate of β\beta, we may plug them into this formula. With some additional cutoff factors (to avoid the denominators being too small), we produce the estimator (2.6).
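
For intuition, the deconvolution step can be mimicked numerically on a grid: transform gridded surrogates of \mathcal{L}_{\phi}\beta and \mathcal{L}_{\phi}\mu in both variables, form the ratio, invert only the u-transform, and compare L^{2} norms in \xi. The sketch below (in d=1) uses a crude truncation level eps of our own choosing; it illustrates the formula above and is not the estimator (2.6) itself.

import numpy as np

def G_ratio(Lphi_beta, Lphi_mu, shift, eps=1e-3):
    # Lphi_beta, Lphi_mu: arrays of shape (Nu, Nx) holding gridded surrogates of
    # L_phi beta and L_phi mu on a periodic (u, x) grid. 'shift' is the grid index
    # corresponding to the lag u0 - v0; eps is an illustrative cutoff. Constant
    # scaling factors of the discrete transforms cancel in the ratio.
    T_beta = np.fft.fft2(Lphi_beta)                       # transform in u and x
    T_mu = np.fft.fft2(Lphi_mu)
    safe = np.where(np.abs(T_mu) > eps, T_mu, 1.0)        # avoid division by tiny values
    ratio = np.where(np.abs(T_mu) > eps, T_beta / safe, 0.0)
    back = np.fft.ifft(ratio, axis=0)                     # invert the u-transform only
    return np.linalg.norm(back[shift, :]) / np.linalg.norm(back[0, :])

In practice one would evaluate the kernel estimates \hat{\mu} and \hat{\beta} on such a grid and plug them in, with the cutoff playing the role of the truncation factors mentioned above.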

Appendix B Proofs of Technical Lemmas

Proof of Proposition 2.1.

Recall that for each uIu\in I, the dynamic of type-uu particles is given by dXu(t)=β(t,u,Xu(t))dt+σ(Xu(t))dBu(t)dX_{u}(t)=\beta(t,u,X_{u}(t))dt+\sigma(X_{u}(t))dB_{u}(t), where βt,u=defβ(t,u,)\beta_{t,u}\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\beta(t,u,\cdot) is Lipschitz on d\mathbb{R}^{d} (this is not hard to verify). So μt,u\mu_{t,u} uniquely solves the Fokker-Planck equation

tμt,u=(βt,uμt,u)+12i,j=1dxixj((σσT)ijμt,u).\partial_{t}\mu_{t,u}=-\nabla\cdot(\beta_{t,u}\mu_{t,u})+\frac{1}{2}\sum_{i,j=1}^{d}\partial_{x_{i}x_{j}}((\sigma\sigma^{T})_{ij}\mu_{t,u})\,.

Theorem 7.3.3 in [9] gives a local upper bound

μt,uL(U)Cμ0,uL(U)+Ct(pd2)/2(1+βLp(μt,u)p)\left\|\mu_{t,u}\right\|_{L^{\infty}(U)}\leqslant C\left\|\mu_{0,u}\right\|_{L^{\infty}(U)}+Ct^{(p-d-2)/2}(1+\left\|\beta\right\|_{L^{p}(\mu_{t,u})}^{p})

for all p>d+2p>d+2 and any bounded open set UU, whenever the LL^{\infty}-norm is well-defined.

We find that for any pp,

d|β(t,u,x)|pμ(t,u,dx)\displaystyle\int_{\mathbb{R}^{d}}\left|\beta(t,u,x)\right|^{p}\mu(t,u,dx)
=d|Idb(x,y)G(u,v)μ(t,v,dy)dv|pμ(t,u,dx)\displaystyle=\int_{\mathbb{R}^{d}}\left|\int_{I}\int_{\mathbb{R}^{d}}b(x,y)G(u,v)\mu(t,v,dy)dv\right|^{p}\mu(t,u,dx)
Id|db(x,y)μ(t,v,dy)|pμ(t,u,dx)dv\displaystyle\lesssim\int_{I}\int_{\mathbb{R}^{d}}\left|\int_{\mathbb{R}^{d}}b(x,y)\mu(t,v,dy)\right|^{p}\mu(t,u,dx)dv
bp<.\displaystyle\leqslant\left\|b\right\|_{\infty}^{p}<\infty\,.

From Condition 2.3(1) we know that there exists some R>0 such that \mu(0,u,x) is uniformly bounded by some constant M outside the ball B(0,R) for all u\in I. We cover \overline{B(0,R)}^{c} with open balls, so that

μ(t,u,x)CM+Ct(pd2)/2(1+bp)\mu(t,u,x)\leqslant CM+Ct^{(p-d-2)/2}(1+\left\|b\right\|_{\infty}^{p})

holds for every t[0,T]t\in[0,T], uIu\in I, |x|>R\left|x\right|>R.

Denote the above upper bound by C0C_{0}. Then we have for every t[0,T]t\in[0,T] and uIu\in I that

|x|>Rμ(t,u,x)2dxC0|x|>Rμ(t,u,x)dx=C0𝐏(|Xu(t)|>R)C0𝐄|Xu(t)|2R2.\int_{\left|x\right|>R}\mu(t,u,x)^{2}dx\leqslant C_{0}\int_{\left|x\right|>R}\mu(t,u,x)dx=C_{0}\mathbf{P}(\left|X_{u}(t)\right|>R)\leqslant\frac{C_{0}\mathbf{E}\left|X_{u}(t)\right|^{2}}{R^{2}}\,.

A classical estimate shows that \sup_{t\in[0,T],u\in I}\mathbf{E}\left|X_{u}(t)\right|^{2}<\infty, so the above quantity tends to 0 as R\to\infty. ∎

Proof of Lemma 5.1.

From the proof of Theorem 3.2 of [3] we see that there exists some constant C>0C>0 such that

𝐄|Yni(t)Yin(t)|26Cmax1jn𝐄|Xnj(t)Xjn(t)|2+6Cn2\mathbf{E}\left|Y^{n}_{i}(t)-Y_{\frac{i}{n}}(t)\right|^{2}\leqslant 6C\max_{1\leqslant j\leqslant n}\mathbf{E}\left|X^{n}_{j}(t)-X_{\frac{j}{n}}(t)\right|^{2}+\frac{6C}{n^{2}}

for all i=1,,ni=1,\dots,n and t[0,T]t\in[0,T], which gives the first inequality. The second inequality follows immediately from dominated convergence. ∎

Proof of Lemma 6.2.

We first recall that \mu satisfies the Fokker-Planck equation in the following sense:

tμt,u+(βt,uμt,u)=12i,j=1dij2((σσT)ijμt,u).\partial_{t}\mu_{t,u}+\nabla\cdot(\beta_{t,u}\mu_{t,u})=\frac{1}{2}\sum_{i,j=1}^{d}\partial_{ij}^{2}((\sigma\sigma^{T})_{ij}\mu_{t,u})\,.

For simplicity we work under the (additional) assumption that σ=σ0Id×d\sigma=\sigma_{0}I_{d\times d} for some constant σ0>0\sigma_{0}>0.

Given distinct u,vIu,v\in I, we have

t(μt,uμt,v)\displaystyle\partial_{t}(\mu_{t,u}-\mu_{t,v})
=σ022Δ(μt,uμt,v)(βt,u(μt,uμt,v))(μt,v(βt,uβt,v)).\displaystyle\qquad=\frac{\sigma_{0}^{2}}{2}\Delta(\mu_{t,u}-\mu_{t,v})-\nabla\cdot(\beta_{t,u}(\mu_{t,u}-\mu_{t,v}))-\nabla\cdot(\mu_{t,v}(\beta_{t,u}-\beta_{t,v}))\,.

Notice that,

βt,u(x)=Idb(x,y)G(u,u)μt,u(dy)du,\beta_{t,u}(x)=\int_{I}\int_{\mathbb{R}^{d}}b(x,y)G(u,u^{\prime})\mu_{t,u^{\prime}}(dy)du^{\prime}\,,

so there exists some constant c_{1}>0 such that

|βt,u(x)βt,v(x)|\displaystyle\left|\beta_{t,u}(x)-\beta_{t,v}(x)\right| Id|b(x,y)(G(u,u)G(v,u))|μt,u(dy)du\displaystyle\leqslant\int_{I}\int_{\mathbb{R}^{d}}\left|b(x,y)(G(u,u^{\prime})-G(v,u^{\prime}))\right|\mu_{t,u^{\prime}}(dy)du^{\prime}
c1bId|uv|μt,u(dy)du\displaystyle\leqslant c_{1}\left\|b\right\|_{\infty}\int_{I}\int_{\mathbb{R}^{d}}\left|u-v\right|\mu_{t,u^{\prime}}(dy)du^{\prime}
c1b|uv|\displaystyle\leqslant c_{1}\left\|b\right\|_{\infty}\left|u-v\right|

and similarly, |βt,u(x)βt,v(x)|c1xb|uv|\left|\nabla\cdot\beta_{t,u}(x)-\nabla\cdot\beta_{t,v}(x)\right|\leqslant c_{1}\left\|\nabla_{x}\cdot b\right\|_{\infty}\left|u-v\right|, for any u,vIu,v\in I. Then we have

t(μt,uμt,v)\displaystyle\partial_{t}(\mu_{t,u}-\mu_{t,v})\leqslant
σ022Δ(μt,uμt,v)βt,u(μt,uμt,v)βt,u(μt,uμt,v)+C|uv|,\displaystyle\qquad\frac{\sigma_{0}^{2}}{2}\Delta(\mu_{t,u}-\mu_{t,v})-\nabla\cdot\beta_{t,u}(\mu_{t,u}-\mu_{t,v})-\beta_{t,u}\cdot\nabla(\mu_{t,u}-\mu_{t,v})+C\left|u-v\right|\,,

where the constant CC depends only on the Lipschitz coefficients of bb and GG.

We compare the above inequality with the following differential equation

tφt=σ022Δφtβt,uφtβt,uφt,\partial_{t}\varphi_{t}=\frac{\sigma_{0}^{2}}{2}\Delta\varphi_{t}-\nabla\cdot\beta_{t,u}\varphi_{t}-\beta_{t,u}\cdot\nabla\varphi_{t}\,,

with initial condition \varphi_{0}=\mu_{0,u}-\mu_{0,v}. This is a linear homogeneous parabolic equation. Thanks to the maximum principle, we have

μt,uμt,vφt+Ct|uv|.\mu_{t,u}-\mu_{t,v}\leqslant\varphi_{t}+Ct\left|u-v\right|\,.

It remains to bound φt\varphi_{t}.

Consider a time reversal of φ\varphi, namely ψ(t,x)=defφ(Tt,x)\psi(t,x)\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\varphi(T-t,x). It satisfies the following equation

tψt+(βTt,u)ψt+βTt,uψt+σ022Δψt=0,\partial_{t}\psi_{t}+(\nabla\cdot\beta_{T-t,u})\psi_{t}+\beta_{T-t,u}\cdot\nabla\psi_{t}+\frac{\sigma_{0}^{2}}{2}\Delta\psi_{t}=0\,,

with terminal condition ψT=φ0\psi_{T}=\varphi_{0}.

Note that \nabla\cdot\beta_{t,u}=\int_{I}\int_{\mathbb{R}^{d}}\nabla_{x}\cdot b(x,y)G(u,w^{\prime})\mu_{t,w^{\prime}}(dy)dw^{\prime} is bounded. Then the Feynman-Kac formula reads

ψ(t,x)\displaystyle\psi(t,x) =𝐄~Zt=x[exp(tTβTs,u(Zs)ds)ψT(ZT)]\displaystyle=\tilde{\mathbf{E}}^{Z_{t}=x}\left[\exp\left(\int_{t}^{T}\nabla\cdot\beta_{T-s,u}(Z_{s})ds\right)\psi_{T}(Z_{T})\right]
=𝐄~Zt=x[exp(tTβs,u(ZTs)ds)φ0(ZT)],\displaystyle=\tilde{\mathbf{E}}^{Z_{t}=x}\left[\exp\left(-\int_{t}^{T}\nabla\cdot\beta_{s,u}(Z_{T-s})ds\right)\varphi_{0}(Z_{T})\right]\,,

where (Zt)t[0,T](Z_{t})_{t\in[0,T]} is a diffusion process with dynamics

dZt=β(Tt,u,Zt)dt+σdW~t,dZ_{t}=\beta(T-t,u,Z_{t})dt+\sigma d\tilde{W}_{t}\,,

and W~\tilde{W} is a dd-dimensional Brownian motion, under the measure 𝐏~\tilde{\mathbf{P}}.

Thus we have

|φt(x)|=|ψ(Tt,x)|φ0exp(txb)exp(txb)ρI(x)|uv|,\left|\varphi_{t}(x)\right|=\left|\psi(T-t,x)\right|\leqslant\left\|\varphi_{0}\right\|_{\infty}\exp\big{(}t\left\|\nabla_{x}\cdot b\right\|_{\infty}\big{)}\leqslant\exp\big{(}t\left\|\nabla_{x}\cdot b\right\|_{\infty}\big{)}\rho_{I}(x)\left|u-v\right|\,,

where ρI\rho_{I} is given by Condition 2.3(3). This implies μt,uμt,v|uv|\mu_{t,u}-\mu_{t,v}\lesssim\left|u-v\right| at every fixed xx, where the implicit constant is independent of x,u,v,tx,u,v,t.

Similarly, the other direction μt,vμt,u\mu_{t,v}-\mu_{t,u} produces the same bound. Hence, there exists some constant C>0C>0 such that, for every t[0,T]t\in[0,T], xdx\in\mathbb{R}^{d}, and u,vIu,v\in I, it holds that

\left|\mu(t,u,x)-\mu(t,v,x)\right|\leqslant C\left|u-v\right|\,. ∎

Appendix C Reduction of Assumption 2.1

Recall the operator \mathcal{T}=\mathcal{F}_{I}\mathcal{F}_{\mathbb{R}^{d}}\mathcal{L}_{\phi}. Our estimator \hat{G} requires computing the quantity \frac{\mathcal{T}\hat{\beta}}{\mathcal{T}\hat{\mu}}. If \mathcal{T}\mu=0 on a set U\subset\mathbb{R}\times\mathbb{R}^{d} of positive Lebesgue measure, then a good estimator \hat{\mu} would lead to small \left|\mathcal{T}\hat{\mu}\right| there. This might blow up the fraction and prevent a good estimation of G. Therefore, the ad hoc assumption on the nonvanishing of \mathcal{T}\mu is to some extent inevitable in this problem. Similar assumptions are common in density estimation with unknown error distribution (see [24] for instance).

However, Assumption 2.1 is not a trivial property of \mu. Given the nonlinearity of the Fokker-Planck equation associated with (1.1), computing an explicit formula for \mathcal{T}\mu is in general impossible. To the best of our knowledge, no explicit solutions of graphon systems (1.1) satisfying all our assumptions have been presented. In this appendix, we study some special cases in which Assumption 2.1 reduces to weaker conditions that are easier to verify. We work under the hypothesis of Theorem 2.1, and assume \sigma=I_{d\times d} for simplicity.

Case 1: Degeneration to a homogeneous system

Suppose I\ni u\mapsto\mu_{0,u}\in\mathcal{P}(\mathbb{R}^{d}) is constant. Corollary 2.3 of [18] states that, if the map

Iu01G(u,v)dvI\ni u\mapsto\int_{0}^{1}G(u,v)dv

is constant (denote its value by \bar{g}), then the law map I\ni u\mapsto\mu_{u}\in\mathcal{P}(\mathcal{C}_{d}) is also constant, and the common time-marginal density \bar{\mu} solves the classical McKean-Vlasov equation

tμ¯(t,x)=12Δμ¯(t,x)(μ¯(t,x)dg¯b(x,y)μ¯(t,y)dy),\partial_{t}\bar{\mu}(t,x)=\frac{1}{2}\Delta\bar{\mu}(t,x)-\nabla\cdot\left(\bar{\mu}(t,x)\int_{\mathbb{R}^{d}}\bar{g}b(x,y)\bar{\mu}(t,y)dy\right)\,,

where we may view \bar{g}b as a single quantity. In our model G(u,v)=g(u-v), this forces g to be 1-periodic on [-1,1].

Notice that for w0w\neq 0,

𝒯μ(w,ξ)\displaystyle\mathcal{T}\mu(w,\xi) =0Tϕ(t)eiwudeiξxμ(t,u,x)dxdudt\displaystyle=\int_{0}^{T}\phi(t)\int_{\mathbb{R}}e^{-iwu}\int_{\mathbb{R}^{d}}e^{-i\xi\cdot x}\mu(t,u,x)dxdudt
=0Tϕ(t)01eiwudeiξxμ(t,u,x)dxdudt\displaystyle=\int_{0}^{T}\phi(t)\int_{0}^{1}e^{-iwu}\int_{\mathbb{R}^{d}}e^{-i\xi\cdot x}\mu(t,u,x)dxdudt
=0Tϕ(t)1eiwiwμ¯t(ξ)dt\displaystyle=\int_{0}^{T}\phi(t)\frac{1-e^{-iw}}{iw}\mathcal{F}\bar{\mu}_{t}(\xi)dt
=1eiwiwμ¯(ξ).\displaystyle=\frac{1-e^{-iw}}{iw}\mathcal{F}\mathcal{L}\bar{\mu}(\xi)\,.

Note that 1eiw=01-e^{-iw}=0 if and only if w=2πkw=2\pi k for kk\in\mathbb{Z}, which means it is nonzero dwdw-a.e. Then our Assumption 2.1 reduces to Assumption 16 (on the solution μ¯\bar{\mu}) in [28].

Case 2: Degeneration to a finite graph

Define the degree of an index uu with respect to a subset JIJ\subset I by

degJ(u)=JG(u,v)dv,\deg_{J}(u)=\int_{J}G(u,v)dv\,,

and

deg(u)=01G(u,v)dv.\deg(u)=\int_{0}^{1}G(u,v)dv\,.

Consider the partition I=j=1mIjI=\bigcup_{j=1}^{m}I_{j}, and denote by [u0][u_{0}] the subset Iju0I_{j}\ni u_{0}. Assume the degree on each part of the partition is constant, i.e., for u0,u1,u2Iu_{0},u_{1},u_{2}\in I, we have

(C.1) deg[u0](u1)=[u0]G(u1,v)dv=[u0]G(u2,v)dv=deg[u0](u2),\deg_{[u_{0}]}(u_{1})=\int_{[u_{0}]}G(u_{1},v)dv=\int_{[u_{0}]}G(u_{2},v)dv=\deg_{[u_{0}]}(u_{2})\,,

whenever [u1]=[u2][u_{1}]=[u_{2}]. An example is G(u,v)=g(uv)G(u,v)=g(u-v), where gg is defined on \mathbb{R}, supported on [1,1][-1,1], and 1m\frac{1}{m}-periodic on [1,1][-1,1]. Then we take Ij=(j1m,jm]I_{j}=(\frac{j-1}{m},\frac{j}{m}] for every j=1,,mj=1,\dots,m, so that

\deg_{I_{j}}(u_{1})=\int_{\frac{j-1}{m}}^{\frac{j}{m}}g(u_{1}-v)dv=\int_{u_{1}-\frac{j}{m}}^{u_{1}-\frac{j-1}{m}}g(v)dv=\int_{0}^{\frac{1}{m}}g(v)dv\,.

Assume further that the initial data map u\mapsto\mu_{0,u}\in\mathcal{P}(\mathbb{R}^{d}) is also constant on each I_{j}. Theorem 2.1 in [18] tells us that the map u\mapsto\mu_{u}\in\mathcal{P}(\mathcal{C}_{d}) is constant on each part of the partition, that is, \mu_{u_{1}}=\mu_{u_{2}} whenever [u_{1}]=[u_{2}].

We set \mu^{(j)}(t,x)=\mu_{t,u}(x) for u\in I_{j}, j=1,\dots,m, and set D_{jk}=\deg_{I_{k}}(u) for u\in I_{j}, which is well-defined due to (C.1). Then the \mu^{(j)} satisfy a family of coupled equations:

tμ(j)=12Δμ(j)(D1)j(μ(j)V)(μ(j)k=1mDjkFμ(k)),j=1,,m.\partial_{t}\mu^{(j)}=\frac{1}{2}\Delta\mu^{(j)}-(D\vec{1})_{j}\nabla\cdot(\mu^{(j)}V)-\nabla\cdot\left(\mu^{(j)}\sum_{k=1}^{m}D_{jk}F\ast\mu^{(k)}\right)\,,\qquad j=1,\dots,m\,.

where D=(Djk)D=(D_{jk}) is treated as an m×mm\times m matrix. Assume that all DjkD_{jk} are equal to some constant d0>0d_{0}>0. We see that

tμ(j)=12Δμ(j)md0(μ(j)V)(μ(j)d0F(k=1mμ(k))).\partial_{t}\mu^{(j)}=\frac{1}{2}\Delta\mu^{(j)}-md_{0}\nabla\cdot(\mu^{(j)}V)-\nabla\cdot\left(\mu^{(j)}d_{0}F\ast\big{(}\sum_{k=1}^{m}\mu^{(k)}\big{)}\right)\,.

Let μ¯=1mj=1mμ(j)\bar{\mu}=\frac{1}{m}\sum_{j=1}^{m}\mu^{(j)} (it is indeed Iμudu\int_{I}\mu_{u}du in this situation), then it solves

(C.2) tμ¯=12Δμ¯md0(μ¯V)md0(μ¯Fμ¯).\partial_{t}\bar{\mu}=\frac{1}{2}\Delta\bar{\mu}-md_{0}\nabla\cdot(\bar{\mu}V)-md_{0}\nabla\cdot(\bar{\mu}F\ast\bar{\mu})\,.

Note that μ¯𝒫(d)\bar{\mu}\in\mathcal{P}(\mathbb{R}^{d}), and we may define

b¯(t,x,μ¯t)=defmd0(V(x)+dF(xy)μ¯t(dy)).\bar{b}(t,x,\bar{\mu}_{t})\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}md_{0}(V(x)+\int_{\mathbb{R}^{d}}F(x-y)\bar{\mu}_{t}(dy))\,.

Then (C.2) is the associated Fokker-Planck equation for the mean-field diffusion process

dUt=b¯(t,Ut,μ¯t)dt+dB¯t,U0μ¯0,dU_{t}=\bar{b}(t,U_{t},\bar{\mu}_{t})dt+d\bar{B}_{t}\,,\qquad U_{0}\sim\bar{\mu}_{0}\,,

and μ¯\bar{\mu} is the density of UU.
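
For concreteness, the averaged dynamics can be simulated by a standard particle approximation of (C.2). The sketch below uses hypothetical choices V(x)=-x and F(x)=-xe^{-x^{2}} in d=1, neither of which is prescribed by the paper, together with an Euler-Maruyama discretization.

import numpy as np

def simulate_bar_mu(n_particles=500, n_steps=100, T=1.0, m=3, d0=0.2, seed=0):
    # Euler-Maruyama particle approximation of the averaged dynamics (C.2) in d = 1.
    # V and F are illustrative placeholders; the drift is m*d0*(V(x) + (F * bar_mu_t)(x)),
    # with bar_mu_t replaced by the empirical measure of the particles.
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    V = lambda x: -x                        # hypothetical confinement term
    F = lambda x: -x * np.exp(-x**2)        # hypothetical interaction kernel
    U = rng.normal(size=n_particles)        # U_0 ~ bar_mu_0, here standard normal
    for _ in range(n_steps):
        interaction = F(U[:, None] - U[None, :]).mean(axis=1)   # (F * bar_mu_t)(U_i)
        U = U + m * d0 * (V(U) + interaction) * dt + np.sqrt(dt) * rng.normal(size=n_particles)
    return U   # approximate samples from bar_mu at time T

samples = simulate_bar_mu()
print(samples.mean(), samples.var())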

Now we may compute

𝒯μ(w,ξ)=k=1mck(w)μ(k)(ξ),\displaystyle\mathcal{T}\mu(w,\xi)=\sum_{k=1}^{m}c_{k}(w)\mathcal{F}\mathcal{L}\mu^{(k)}(\xi)\,,

where in this case

ck(w)=defIkeiwvdv={eiwkm(eiwm1)iww01mw=0.c_{k}(w)\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\int_{I_{k}}e^{-iwv}dv=\begin{cases}\frac{e^{-\frac{iwk}{m}}(e^{\frac{iw}{m}}-1)}{iw}&w\neq 0\\ \frac{1}{m}&w=0\end{cases}.
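
The formula for c_{k}(w) is an elementary integral; for completeness, a symbolic check (illustration only):

import sympy as sp

v, w, k, m = sp.symbols('v w k m', positive=True)
ck = sp.integrate(sp.exp(-sp.I*w*v), (v, (k - 1)/m, k/m))
claimed = sp.exp(-sp.I*w*k/m) * (sp.exp(sp.I*w/m) - 1) / (sp.I*w)
print(sp.simplify(sp.expand(ck - claimed)))   # prints 0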

We focus on the part where w0w\neq 0, where 𝒯μ0\mathcal{T}\mu\neq 0 if and only if

(k=1meiwkmμ(k))0.\mathcal{F}\mathcal{L}\big{(}\sum_{k=1}^{m}e^{-\frac{iwk}{m}}\mu^{(k)}\big{)}\neq 0\,.

Set

ρw=defk=1meiwkmμ(k).\rho_{w}\stackrel{{\scriptstyle\scriptscriptstyle\textup{def}}}{{=}}\sum_{k=1}^{m}e^{-\frac{iwk}{m}}{\mu^{(k)}}\,.

Notice that for every ww\in\mathbb{R}, ρw\rho_{w} solves the linear differential equation

(C.3) tρ=12Δρmd0(ρ(V+Fμ¯t)).\partial_{t}\rho=\frac{1}{2}\Delta\rho-md_{0}\nabla\cdot(\rho(V+F\ast\bar{\mu}_{t}))\,.

Its canonical diffusion process has the following dynamics

dRt=β¯(t,Rt)dt+dBt,dR_{t}=\bar{\beta}(t,R_{t})dt+dB_{t}\,,

where β¯(t,x)=b¯(t,x,μ¯t)\bar{\beta}(t,x)=\bar{b}(t,x,\bar{\mu}_{t}). Then Assumption 2.1 now reduces to Assumption 16 in [28] on ρw\rho_{w} for almost every w[0,2πm]w\in[0,2\pi m].

Remark C.1.

In the most general case, the family (\mu_{u})_{u\in I} solves a system of infinitely many fully coupled nonlinear differential equations. In particular, there is no explicit formula for \mathcal{F}_{I}\mu_{t}(w)=\int_{I}e^{-iwu}\mu_{t,u}du that fits into a condition involving only the operator \mathcal{F}_{\mathbb{R}^{d}}\mathcal{L}. However, each \rho_{w} in Case 2 is the solution to a linear equation (though its coefficients involve \bar{\mu}_{t}, which solves another equation and can be treated as a known quantity). The assumption becomes much milder in this sense, and that is the main reduction in Case 2.

References

  • [1] C. Amorino, A. Heidari, V. Pilipauskaitė, and M. Podolskij. Parameter estimation of discretely observed interacting particle systems. Stochastic Processes and their Applications, 163:350–386, 2023.
  • [2] J. Baladron, D. Fasoli, O. Faugeras, and J. Touboul. Mean-field description and propagation of chaos in networks of Hodgkin-Huxley and FitzHugh-Nagumo neurons. Electron. J. Statist., 2, 2012.
  • [3] E. Bayraktar, S. Chakraborty, and R. Wu. Graphon mean field systems. Ann. Appl. Probab., 33(5):3587 – 3619, 2023.
  • [4] E. Bayraktar and D. Kim. Concentration of measure for graphon particle system. Adv. Appl. Probab., pages 1–28, 2024.
  • [5] E. Bayraktar and R. Wu. Stationarity and uniform in time convergence for the graphon particle system. Stochastic Processes and their Applications, 150:532–568, 2022.
  • [6] E. Bayraktar and R. Wu. Graphon particle system: Uniform-in-time concentration bounds. Stochastic Processes and their Applications, 156:196–225, 2023.
  • [7] E. Bayraktar, R. Wu, and X. Zhang. Propagation of chaos of forward-backward stochastic differential equations with graphon interactions. Applied Mathematics and Optimization, 88(1):25, 2023.
  • [8] D. Belomestny, V. Pilipauskaitė, and M. Podolskij. Semiparametric estimation of McKean-Vlasov SDEs. Ann. Inst. H. Poincaré Probab. Statist., 59(1):79–96, 2023.
  • [9] V. I. Bogachev, N. V. Krylov, M. Rockner, and S. V. Shaposhnikov. Fokker-Planck-Kolmogorov equations. Mathematical surveys and monographs, volume 207. American Mathematical Society, Providence, Rhode Island, 2015.
  • [10] F. Bolley, A. Guillin, and C. Villani. Quantitative concentration inequalities for empirical measures on non-compact spaces. Probab. Theory Relat. Fields, 137(3):541–593, 2007.
  • [11] M. Burger, V. Capasso, and D. Morale. On an aggregation model with long and short range interactions. Nonlinear Analysis: Real World Applications, 8(3):939–958, 2007.
  • [12] P. E. Caines and M. Huang. Graphon mean field games and their equations. SIAM Journal on Control and Optimization, 59(6):4373–4399, 2021.
  • [13] C. Canuto, F. Fagnani, and P. Tilli. An Eulerian approach to the analysis of Krause's consensus models. SIAM Journal on Control and Optimization, 50(1):243–265, 2012.
  • [14] R. Carmona, D. B. Cooney, C. V. Graves, and M. Laurière. Stochastic graphon games: I. the static case. Mathematics of Operations Research, 47(1):750–778, 2021.
  • [15] R. A. Carmona. Applications of mean field games in financial engineering and economic theory. Proceedings of Symposia in Applied Mathematics, 78, 2021.
  • [16] B. Chazelle, Q. Jiu, Q. Li, and C. Wang. Well-posedness of the limiting equation of a noisy consensus model in opinion dynamics. J. Differ. Equ., 263(1):365–397, 2017.
  • [17] F. Coppini. Long time dynamics for interacting oscillators on graphs. Ann. Appl. Probab., 32(1):360–391, 2022.
  • [18] F. Coppini. A note on Fokker-Planck equations and graphons, 2022.
  • [19] F. Coppini, H. Dietert, and G. Giacomin. A law of large numbers and large deviations for interacting diffusions on Erdős-Rényi graphs. Stochastics and Dynamics, page 2050010, 2019.
  • [20] F. Delarue. Mean field games: A toy model on an Erdős-Rényi graph. ESAIM: Proceedings and Surveys, 60:1–26, 2017.
  • [21] S. Delattre, G. Giacomin, and E. Luçon. A note on dynamical models on random graphs and Fokker-Planck equations. Journal of Statistical Physics, 165(4):785–798, 2016.
  • [22] P. Dupuis and G. S. Medvedev. The large deviation principle for interacting dynamical systems on random graphs. Communications in Mathematical Physics, 390(2):545–575, 2022.
  • [23] J.-P. Fouque and L.-H. Sun. Systemic Risk Illustrated, pages 444–452. Cambridge University Press, 2013.
  • [24] J. Johannes. Deconvolution with unknown error distribution. The Annals of Statistics, 37(5A):2301–2323, 2009.
  • [25] V. N. Kolokoltsov. Nonlinear Markov processes and kinetic equations. Cambridge Tracts in Mathematics. Cambridge University Press, 2010.
  • [26] L. Le Cam. Asymptotic Methods in Statistical Decision Theory. Springer Series in Statistics. Springer-Verlag, New York, NY, 1986.
  • [27] L. Lovász. Large networks and graph limits. Colloquium Publications. American Mathematical Society, 2012.
  • [28] L. D. Maestra and M. Hoffmann. Nonparametric estimation for interacting particle systems: Mckean-vlasov models. Probab. Theory Relat. Fields, 182:551–613, 2022.
  • [29] H. P. McKean Jr. A class of Markov processes associated with nonlinear parabolic equations. Proceedings of the National Academy of Sciences of the United States of America, 56:1907–1911, 1966.
  • [30] A. Mogilner and L. Edelstein-Keshet. A non-local model for a swarm. J. Math. Biol., 38:534–570, 1999.
  • [31] R. I. Oliveira and G. H. Reis. Interacting diffusions on random graphs with diverging average degrees: Hydrodynamics and large deviations. Journal of Statistical Physics, 2019.
  • [32] F. Parise and A. Ozdaglar. Graphon games: A statistical framework for network games and interventions. Econometrica, 91(1):191–225, 2023.
  • [33] T. Sarkar, A. Bhattacharjee, H. Samanta, K. Bhattacharya, and H. Saha. Optimal design and implementation of solar pv-wind-biogas-vrfb storage integrated smart hybrid microgrid for ensuring zero loss of power supply probability. Energy Conversion and Management, 191:102–118, 2019.
  • [34] A.-S. Sznitman. Topics in propagation of chaos. Lecture Notes in Mathematics. Springer-Verlag, New York, 1991.
  • [35] A. A. Vlasov. Many-Particle Theory and Its Application to Plasma. Russian Monographs and Texts on Advanced Mathematics and Physics 8. Gordon and Breach, New York, 1961.
  • [36] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, 1994.