Non-parametric estimates for graphon mean-field particle systems
Abstract.
We consider the graphon mean-field system introduced in the work of Bayraktar, Chakraborty, and Wu. It is the large-population limit of a heterogeneously interacting diffusive particle system, where the interaction is of mean-field type with weights characterized by an underlying graphon function. Through observation of continuous-time trajectories within the particle system, we construct plug-in estimators of the particle density, the drift coefficient, and thus the graphon interaction weights of the mean-field system. Our estimators for the density and drift are direct results of kernel interpolation on the empirical data, and a deconvolution method leads to an estimator of the underlying graphon function. We show that, as the number of particles increases, the graphon estimator converges to the true graphon function pointwise and, as a consequence, in the cut metric. In addition, we conduct a minimax analysis within a particular class of particle systems to justify the pointwise optimality of the density and drift estimators.
Key words and phrases:
graphon mean-field system, interacting particles, kernel estimation, minimax analysis
2020 Mathematics Subject Classification:
Primary 62G07, 62H22, 62M05; Secondary 05C80, 60J60, 60K35.
1. Introduction
We study a statistical method to estimate the interaction strength in the graphon mean-field interacting particle system introduced in [3]. The particles in such a system are characterized not only by a feature vector in a physical space but also by a "type" index. The interaction strength between particles of different types is quantified by a graphon function.
Precisely, the system consists of a family of diffusion processes with dynamics
(1.1)
where and are some Lipschitz functions, are a collection of independent random variables in with distributions , and are i.i.d. -dimensional Brownian motions independent of . Here, we are assuming that the interactions between particles only happen in the drift term.
The main purpose of this study is to estimate the graphon function by continuous observation of a finite-population system. It is shown in [3] that system (1.1) is the large-population limit of the following system
(1.2)
where . We continuously observe (1.2) over a (fixed) time horizon . Using empirical data, we construct estimators of the particle density and the drift term of (1.1) and finally an estimator of the graphon function . The error of our estimation is well-controlled when the number of particles increases, with proper choices of parameters and under certain conditions.
In this work, we are mainly interested in a model resembling the McKean-Vlasov type, where the drift integrand takes the form
for some sufficiently regular functions and . The function acts as an interacting force between two particles depending on their relative positions, while the function accounts for an external force applied to every single particle. Also, we consider graphon functions of the form
for some function with certain regularities. When we specialize to this case, the problem boils down to estimating the function .
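As a concrete illustration of the finite-population dynamics (1.2) in this McKean-Vlasov-type setting, the following sketch simulates an N-particle system with an Euler-Maruyama scheme. The interaction force, external force, graphon profile, diffusion coefficient, and initial law below are hypothetical choices made only for the example and are not claimed to be the specific model studied here.

```python
import numpy as np

# Hypothetical model ingredients chosen only for this illustration (d = 1):
# interaction force f, external force g, graphon profile, constant diffusion.
f = lambda r: -r                                     # f(x - y): linear attraction
g = lambda x: -x                                     # confining external force
graphon = lambda u, v: 1.0 + 0.5 * np.cos(2.0 * np.pi * (u - v))
sigma = 1.0

def simulate(N=200, T=1.0, n_steps=500, seed=0):
    """Euler-Maruyama simulation of an N-particle system of type (1.2).

    Particle i carries the type index u_i = i / N; its drift is the graphon-weighted
    average of f(X_i - X_j) over all particles j, plus the external force g(X_i)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    u = np.arange(1, N + 1) / N                      # type indices in (0, 1]
    W = graphon(u[:, None], u[None, :])              # N x N interaction weights
    X = rng.standard_normal(N)                       # hypothetical initial law
    path = np.empty((n_steps + 1, N))
    path[0] = X
    for k in range(n_steps):
        drift = (W * f(X[:, None] - X[None, :])).mean(axis=1) + g(X)
        X = X + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(N)
        path[k + 1] = X
    return u, path
```

The graphon-weighted average over all particles in the drift is the discrete analogue of the integral over types appearing in the limit system (1.1).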
1.1. Background
The study of classical mean-field systems with homogeneous interaction and the associated parabolic equations in the sense of McKean [29] dates back to the 1960s. The original motivation of this study came from plasma theory in statistical physics (see [35, 34, 25] and references therein), and its significance in applied mathematics has been well demonstrated throughout the past decades. Several analytic and probabilistic methods were developed during this period to push forward the study of mean-field systems (see the references in [28]).
However, the early formulation of this problem focused on the theoretical properties of the systems, which rely heavily on precise knowledge of their dynamics. Statistical methods that fit those properties into noisy models remained scarce until the 21st century. A modern formulation came to the stage in the 2010s, when developments in other areas of research created a high demand for statistical inference models. Empirical data from a particle system can be used to infer the dynamics of the system and thus predict its future behavior. These ideas have been applied in various fields, including chemical and biological systems [2, 11, 30], economics and finance [23, 15], collective behaviors [13, 16], etc.
While the features of particles are usually embedded in a physical space, the situations arising in the study of modern networks can be more complicated. Inhomogeneous systems contain different types of entities (e.g. social networks [36] and power supplies [33]), and interactions between two individuals also depend on their types. The idea of studying such sophisticated networks via graphon particle systems has therefore begun to draw significant attention.
A heterogeneous particle system can be embedded into a (deterministic or random) graph (see [17, 19, 20, 21, 22, 31]), where the interaction strength between two types of particles is quantified by the corresponding edge weight. As the number of vertices increases and the graph becomes denser, the interaction strength approaches some bounded symmetric kernel called a graphon. In fact, every graphon is the limit of a sequence of finite graphs, as discussed in Chapter 11 of [27].
In recent years, graphon mean-field games have demonstrated their ability to model densely interacting networks in multiple studies, e.g. [14, 12, 32]. On the purely theoretical side, several results on the stability and stationarity of graphon mean-field systems have been established in [5, 6, 3, 7], and concentration of measure is well studied in [6, 4]. These properties enable the study of such mean-field systems from a statistical inference point of view.
Statistical inference methods are widely applied to learning the dynamics of interacting systems. Empirical data in a McKean-Vlasov model can be interpolated using a kernel to obtain estimates of the particle density [28, 8, 1]. In particular, the data-driven estimation algorithms in [28] automatically choose the best kernel bandwidths from a predetermined, possibly hidden, set of candidates, which ensures pointwise optimality even without explicitly specifying the parameters. Estimating the interaction force requires more technical tools, including the deconvolution methods introduced by Johannes in [24]. These strategies offer firm technical support for the analysis of interacting systems with unknown driving forces, allowing one to predict the evolution of such systems solely from empirical data.
1.2. Our contributions and organization of this paper
Recall the graphon function in the mean-field system (1.1). In Section 2, we introduce a kernel interpolation method adopted from [28, 8]. We make continuous observations of the -particle system (1.2) during a finite time interval . The empirical data from the finite-population system are then interpolated in both the feature space and the index space to produce a pointwise estimator of the particle density functions . A further interpolation in the time variable leads to an estimator for the drift coefficients
Then we apply a deconvolution method introduced in [24] to build a pointwise estimator of . Here
are the parameters associated with the estimators: are the bandwidths of the kernels, are the denominator cutoff factors to prohibit fraction blowups, and are cutoff radii. We will explain them in more detail in Section 2. We show in Section 3 that there exists a sequence of the parameters such that
subject to the regularity conditions.
We will disclose the particular settings of our problem in Section 2. These include the continuity and integrability of the coefficients and the initial data . Then we define the kernel-interpolated estimators and , with free choices of the kernel bandwidth vector and cutoff factors . It is worth noticing that the bandwidths of our estimators are fixed throughout the algorithm, whereas the data-driven Goldenshluger-Lepski estimators applied in [28] make dynamic choices of bandwidths from a predetermined finite set of candidates. This set can be invisible to the users, and the algorithm automatically selects the best candidate and outputs the corresponding estimate. Such an algorithm attains optimal pointwise oracle estimates without precise knowledge of the system's continuity properties and does not lose too much efficiency for each tuple of plug-in arguments. However, the convergence of our estimator depends on the total -errors of the plug-in estimators (instead of the pointwise errors), so it is more beneficial to keep the bandwidths fixed throughout. The minimax analysis in Section 4 shows that our estimator is still pointwise optimal when given enough information.
We will present upper bounds on the errors of the pointwise estimators in Section 3, with proofs in Section 5. The main ideas behind the proofs are the stability of the mean-field systems and the concentration of the particle density. We make a direct connection between the (observed) finite-population system (1.2) and the intrinsic graphon mean-field system (1.1) through the following inequality. For the particle density , for example, we have
(1.3)
where
The first part is controlled by the convergence of (1.2) to (1.1) (see [3]). For the second part, we follow the idea of [28] and produce a Bernstein concentration inequality. The use of Bernstein's inequality here avoids the extra constants that arise from the change of measures in [28], thanks to the independence of particles in the graphon mean-field system. It is worth noticing that all the constants appearing in the inequalities are global (independent of the plug-in arguments ), and we keep some of the explicit summations in the upper bound on purpose (as can be seen in Lemmata 3.1 and 3.2). Those properties preserve the integrability of the whole sums and maintain nice asymptotic behavior of the estimator .
In Section 4, we perform a minimax analysis on the plug-in estimators of the particle density and the drift coefficient . We restrict our view to particle systems with locally Hölder continuous density functions, the existence of which can be found in several classical texts on Fokker-Planck equations such as [9]. We present an alternative analysis of the pointwise behaviors of and with a change-of-measure strategy adapted from [28]. This improves the pointwise errors obtained in Section 3.1 at the cost of a constant factor depending on the value of near . Balancing the several error terms leads to optimal asymptotic upper bounds. On the other hand, we find the (theoretical) lower bounds of the estimation error and compare them with the upper bounds, which demonstrates the optimality of our estimators. The proofs are given in Section 6.
2. Model and estimators
2.1. Setting, notation and assumptions
Let us fix a finite time horizon . All observations are made within the time interval . We denote by the space of -valued continuous functions on , i.e. . In more general cases, we write for the space of -times continuously differentiable functions defined on taking values in . Similarly, we write for the space of -th Lebesgue-integrable functions, and for the Sobolev space. The position of is often neglected if .
An -valued function can be written componentwise . The Fourier transform of a function is defined componentwise via
This will be applied to the drift coefficients in our deconvolution method.
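For reference, one common normalization of such a componentwise transform (the paper's exact convention is not reproduced here) is, for \(h = (h_1, \dots, h_m)\),
\[
\mathcal{F}h_k(\xi) \;=\; \int_{\mathbb{R}^d} h_k(x)\, e^{-i\langle \xi, x\rangle}\, dx,
\qquad \xi \in \mathbb{R}^d, \quad k = 1, \dots, m .
\]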
We usually have the following assumptions on the graphon mean field system (1.1).
Condition 2.1.
(1) The drift coefficient is bounded and has bounded first derivatives. It is Lipschitz continuous in the sense that there exists some constant such that
(2.1)
(2) The drift coefficient takes the form , where for .
(3) The diffusion coefficient is Lipschitz in the operator norm on , i.e. there exists some constant such that
where is the operator norm of matrices.
(4) The diffusion coefficient is bounded in the sense that there exist constants such that
where two square matrices and satisfy if is positive semi-definite.
Recall that the types of particles are indexed by , and the interaction strength between particles of two types is given by a graphon function . We consider the following conditions for the structure of the graphon function.
Condition 2.2.
(1) The graphon function is piecewise Lipschitz in the sense that there exists a constant and a finite partition of , such that
(2) The graphon function has the form , where is a Lipschitz continuous function with a given constant.
(3) In addition to item (2), we assume further that the Fourier transform of is in and decays fast enough, so that
as .
Finally, we examine the initial state of the system.
Condition 2.3.
We denote by the space of probability measures on a Polish space (e.g. , ).
(1) The initial distributions admit density functions with respect to the Lebesgue measure on . There exists some constant , such that
(2) There exist a constant and a finite collection of intervals such that , and
where is the Wasserstein 2-distance.
(3) There exists a function such that almost everywhere, for every .
The (continuously indexed) graphon mean-field system built on appropriately chosen conditions from above has dynamics
for , . Define the drift term
and observe that is measurable. We will always abbreviate it as (ignoring the mean-field argument). Under Condition 2.1(1)(3), we know that is Lipschitz continuous and has at most linear growth for every and . This means is the unique weak solution to the associated Fokker-Planck equation of the diffusion process , and the map is measurable due to Proposition 2.1 in [3]. Further, with Condition 2.1(4) and 2.3(1), every admits a density function with respect to the Lebesgue measure on (see [9]). Note that is measurable. We claim that the densities are asymptotically bounded.
Proposition 2.1.
The proof will be given in Appendix B. It also shows the -integrability of the density function at any (see also Corollary 8.2.2, [9]).
Our goal in the next section is to construct estimators of the functions and , and thereby an estimator of . We will use the -distance in the probability space to describe our estimation errors.
2.2. Plug-in estimators
To estimate the underlying functions described in the previous paragraph, we make continuous-time observations of the -particle system (1.2),
where for every . This finite system shows some consistency with respect to the mean-field system in the following sense.
Lemma 2.1 (Theorem 3.2, [3]).
Kernel Interpolation
We introduce an HJK-kernel adapted from [28]. Choose three functions , , that are non-negative and normalized:
and have order (at least) 1:
With the bandwidth vector , the dilations are defined by
and the products are written as
Since the bandwidths are free parameters, we choose without loss of generality kernels supported in the closed unit ball (of the space on which they are defined).
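For instance, in dimension one the Epanechnikov kernel satisfies all of the above requirements (non-negativity, normalization, vanishing first moment, support in the closed unit ball):
\[
K(x) \;=\; \tfrac{3}{4}\,(1 - x^2)\,\mathbf{1}_{\{|x| \le 1\}},
\qquad \int_{\mathbb{R}} K(x)\,dx = 1, \qquad \int_{\mathbb{R}} x\,K(x)\,dx = 0 .
\]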
With a given number of particles, we run the finite system over the time interval . This gives us the empirical distribution
Using the JK part of the kernel to interpolate it gives a plug-in estimator of the density :
(2.3)
for , , .
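The following sketch shows one plausible implementation of this kernel density estimator in dimension d = 1. The assignment of the J part to the feature variable and the K part to the index variable, the Epanechnikov kernel, and the type indices j/N are assumptions made only for illustration and need not match the exact construction in (2.3).

```python
import numpy as np

def epanechnikov(z):
    """Order-1 kernel, normalized and supported in [-1, 1]."""
    return 0.75 * np.clip(1.0 - z ** 2, 0.0, None)

def density_estimate(x, u, X_t, h_x, h_u):
    """Kernel-interpolated plug-in estimator of the particle density at (t, x, u),
    in the spirit of (2.3): the empirical measure of the N particles observed at
    time t is smoothed in the feature variable (bandwidth h_x) and in the index
    variable (bandwidth h_u); particle j is assigned the type index (j + 1) / N."""
    N = X_t.shape[0]
    u_types = np.arange(1, N + 1) / N
    J_x = epanechnikov((x - X_t) / h_x) / h_x        # feature-space dilation
    K_u = epanechnikov((u - u_types) / h_u) / h_u    # index-space dilation
    return float(np.mean(J_x * K_u))

# Example, reusing the trajectories from the simulation sketch above:
# u, path = simulate()
# mu_hat = density_estimate(0.0, 0.5, path[-1], h_x=0.3, h_u=0.1)
```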
We also look at an auxiliary quantity defined by
A discrete approximation is given by
so for any test function , we have
as a stochastic integral. Using the HJK kernel to interpolate it gives a plug-in estimator of :
(2.4)
for , , . That leads to a plug-in estimator of ,
(2.5)
where is a cut-off parameter to prevent the denominator from getting too small.
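A matching sketch of the drift estimator (2.4)-(2.5), under the same illustrative assumptions as above (Epanechnikov kernels for all three parts of the HJK kernel, type indices j/N) and with the stochastic integral approximated on the observation time grid; it reuses `epanechnikov` and `density_estimate` from the previous sketch.

```python
import numpy as np

def drift_estimate(t, x, u, path, dt, h_t, h_x, h_u, cutoff):
    """Plug-in estimator of the drift at (t, x, u), in the spirit of (2.4)-(2.5):
    an HJK-interpolated approximation of the stochastic integral, divided by the
    density estimate floored at `cutoff` so the denominator cannot get too small.

    path : array of shape (n_steps + 1, N), trajectories observed on a grid of step dt.
    Reuses `epanechnikov` and `density_estimate` from the previous sketch."""
    n_steps, N = path.shape[0] - 1, path.shape[1]
    u_types = np.arange(1, N + 1) / N
    grid = dt * np.arange(n_steps)                    # left endpoints of the time grid
    H_t = epanechnikov((t - grid) / h_t) / h_t        # time dilation, shape (n_steps,)
    J_x = epanechnikov((x - path[:-1]) / h_x) / h_x   # feature dilation, (n_steps, N)
    K_u = epanechnikov((u - u_types) / h_u) / h_u     # index dilation, (N,)
    dX = np.diff(path, axis=0)                        # particle increments, (n_steps, N)
    numerator = np.sum(H_t[:, None] * J_x * K_u[None, :] * dX) / N
    k = min(int(round(t / dt)), n_steps)              # nearest grid time for the density
    mu_hat = density_estimate(x, u, path[k], h_x, h_u)
    return numerator / max(mu_hat, cutoff)
```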
Deconvolution
The deconvolution method is usually applied to recover a function from a convolution . We follow the ideas of [24] and [28]. Here, we only present the definitions and estimators, deferring the full intuition to Appendix A.
To apply the Fourier transform on the index space , we consider the zero extension to all measurable functions defined in . With some abuse of notation, we let
Then we define the Fourier transform of function supported on via
Note that we may view as a linear operator on function-valued functions, and it admits an inverse transform on -spaces.
In addition, we consider a linear operator on time-dependent functions, defined by
where has compact support in , such that (we denote this subspace of functions by ). We write for when is fixed and the context has no ambiguity. The intuition of this operator is also explained in Appendix A.
Main estimator and its convergence
Finally, with some additional cutoff parameters, we introduce our estimator of the graphon function
(2.6)
where , and
We will explain the intuition of this estimator in Appendix A.
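For illustration only, the following sketch mimics the Fourier-deconvolution step with the two cutoff mechanisms described above, using a discrete Fourier transform along the index variable as a stand-in for the transform on the zero-extended functions. The function and argument names are hypothetical, and this is not claimed to reproduce the exact formula (2.6).

```python
import numpy as np

def graphon_deconvolution(weighted_drift, weighted_conv, spectral_floor, freq_cutoff):
    """Schematic deconvolution along the index variable, mimicking (2.6).

    weighted_drift : time-weighted drift estimates on a regular grid of u in [0, 1);
    weighted_conv  : time-weighted values of (interaction kernel) * (density estimate)
                     on the same index grid, at the same fixed feature point.
    The ratio of discrete Fourier coefficients is suppressed where the denominator is
    smaller than `spectral_floor` (> 0) and truncated beyond the radius `freq_cutoff`."""
    num = np.fft.fft(np.asarray(weighted_drift))
    den = np.fft.fft(np.asarray(weighted_conv))
    small = np.abs(den) < spectral_floor
    ratio = np.where(small, 0.0, num / np.where(small, 1.0, den))
    freqs = np.fft.fftfreq(num.size, d=1.0 / num.size)   # integer frequencies
    ratio[np.abs(freqs) > freq_cutoff] = 0.0             # cutoff radius in frequency
    return np.real(np.fft.ifft(ratio))                   # estimate of the graphon profile
```

The two regularizations (flooring small denominators and truncating high frequencies) play the role of the additional cutoff parameters in the definition above.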
For the estimator to converge, we need a further (relatively strong) assumption on some data intrinsic to the particle system.
Assumption 2.1.
Theorem 2.1 (Main theorem).
Assume Conditions 2.1, 2.2, 2.3, and Assumption 2.1. Take given by Assumption 2.1. There exists a function in the number of particles and the parameters
such that
(2.7)
whenever are sufficiently large, are sufficiently small, and such that and
Moreover, there exists a sequence of choices of the parameters such that as .
Remark 2.1.
Notice that the bound is uniform over all . Then the cut distance from an estimator to the true graphon function is bounded by
This implies
Hence we also have the convergence of the estimator in the cut metric.
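For completeness, recall the chain of inequalities behind this remark: for bounded kernels on \([0,1]^2\),
\[
\|W - W'\|_\square \;=\; \sup_{S,\,T \subseteq [0,1]} \Bigl| \int_{S \times T} \bigl(W(u,v) - W'(u,v)\bigr)\, du\, dv \Bigr|
\;\le\; \|W - W'\|_{L^1([0,1]^2)} \;\le\; \|W - W'\|_{L^\infty([0,1]^2)} ,
\]
so uniform (or integrated) convergence of the estimator implies convergence in the cut norm, hence in the cut metric.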
Remark 2.2.
3. Convergence of estimators
3.1. Error bounds for density and drift
We give estimates of the particle density and the intermediate quantity in the general setting. These ultimately contribute to the estimate of the graphon function. From this section onwards, we will keep using the asymptotic comparison symbol , where means there exists some constant such that . In addition, we write if for some constant depending on the quantity (e.g. time horizon , dimension ).
Estimates of particle density
Integrating the above pointwise errors, we have the following -error on the estimator .
Corollary 3.1.
Assume the same hypothesis as in Lemma 3.1. Fix a compact interval . For , , , we have
where is a constant depending on . Here is a function such that as , and is a function such that as .
Estimates of the drift term
Lemma 3.2.
Recall that . Integrating the above pointwise errors and using Corollary 3.1, we have the following -error on the estimator .
Corollary 3.2.
Assume the same hypothesis as in Lemma 3.2. Fix a compact interval . For , , , and , we have
where the constant depends on . Here is some function such that as , and is some function such that as .
3.2. Proof of main theorem
Proof of Theorem 2.1.
Step 1: For simplicity, we usually abbreviate as , and similarly for and .
Fix arbitrary . Write as
where
Also let . When , we have
So it suffices to bound the expressions , and we will do that in the following steps.
Step 2: By Minkowski’s inequality, we have for each that
Then
For the second term, we observe by Parseval’s identity that
so that
as due to Condition 2.2(3), at some rate independent of .
Step 3: Now we look at the first term under Assumption 2.1,
Notice that . Following [28], we split the integrand by
when the division is well-defined. Then
For , we first observe that
Integrating the first part gives
which can be bounded in the same way as using Corollary 3.1.
Integrating for the second part, we get
(3.1)
Under Assumption 2.1, we apply dominated convergence to see that this quantity goes to 0 as .
In addition,
Condition 2.2(3) guarantees that it converges to 0 faster than as . We denote the total convergence rate of by .
To summarize, we define
Here the constant depends only on , which are fixed for the model. The upper bound is independent of , so we obtain the uniform bound presented in the theorem.
We can fix and . Let , where slowly enough as , , and accordingly. We may guarantee that the quantities , , , , all converge to 0. Then as , finishing the proof. ∎
4. Minimax analysis on plug-in estimators
In Section 3.1 we presented upper bounds for the estimation errors of and . These are similar to those given in [10]; they are convergent but not tight. However, the estimators themselves are indeed optimal whenever the parameters , , and are properly chosen. In this section, we conduct a minimax analysis to study both upper and lower bounds of the estimation errors and thereby demonstrate this optimality.
We first look at an improved upper bound on the error of the plug-in estimator .
Lemma 4.1.
(1)
(2) Assume further that there exist some and such that
Then, for every , , , we have
where is independent of the bandwidths and the number of particles .
There is also an improved upper bound on the error of the plug-in estimator .
Lemma 4.2.
Remark 4.1.
We will use the above results for the minimax analysis in the next step. Yet, they are not ideal for estimating the total error of (as used in the proof of Theorem 2.1), since they rely on the local boundedness of the density function , which follows from a crude estimate of the form (see, for instance, Corollary 8.2.2, [9]).
4.1. Anisotropic Hölder smoothness classes
In the estimates of Lemma 4.1, all items are explicitly quantitative except the bias term
The analysis of this quantity relies on some continuity of the density function . Here, we introduce a specific class of particle systems following the idea in [28].
Definition 4.1.
Let be a multi-index. Its norm is given by
and the differential operator of order is defined by .
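Written out with the standard conventions, for a multi-index \(\alpha = (\alpha_1, \dots, \alpha_d) \in \mathbb{N}^d\),
\[
|\alpha| \;=\; \alpha_1 + \cdots + \alpha_d,
\qquad
D^{\alpha} \;=\; \frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}} .
\]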
Definition 4.2.
Let be an open neighborhood of a point . We say a function belongs to the -Hölder continuity class at with if for every and every multi-index with , we have
where is the smallest constant that satisfies the above inequality. We denote this class of functions by . The -norm in this class is defined by
4.2. Minimax estimation for density
Notice that the particle density function solves the following system of equations [18]
This is a system of fully coupled Fokker-Planck equations, and the solution is uniquely determined by , where is the class of satisfying Condition 2.1(1)(3), 2.2(1), and 2.3. We denote by the solution operator. We consider a specific class of coefficients and initial data.
For , we define
and set
Moreover, we consider a restriction of this class
for .
Several articles have discussed the richness of those classes of functions. In particular, Proposition 13 in [28] gives an example of homogeneous McKean-Vlasov particle systems that fall into this class. With some slight modifications to its proof, we have the following result.
Proposition 4.1.
Let for some . Let with and
for some with , . Here, denotes the global Hölder class (where we simply choose ).
Suppose further that are probability measures with finite first moments and continuous density functions uniformly bounded over . Then, for every and , we have for all .
Now we present the minimax theorem, over the particle systems within those restricted smoothness classes.
Theorem 4.1.
Let and . Assume one of the following holds:
Then for every , , , we have
(4.1)
Moreover,
(4.2)
where the infimum is taken over all possible estimators of constructed from . Both constants and depend only on the parameters , the kernels , the function given by Condition 2.3(3), and the values of in a small neighborhood of .
The proofs are written in Section 6.1. Here we make several remarks on the above results.
Remark 4.2.
Without the extra assumption on the -Wasserstein continuity of the initial data in Lemma 4.1(2), we would obtain a suboptimal upper bound when , namely (though we still attain the optimal bound when ).
Remark 4.3.
We note that the asymptotic rate is slower than that of [28], namely rather than . This is due to the heterogeneity of our graphon particle system. Both the index gap and the density gap between the target particle and the regularly spaced observations are of order , which reduces the approximation accuracy. Yet it is possible to estimate the average density using exactly the same strategy as [28], and that should exhibit the same asymptotic behavior.
Remark 4.4.
Note that our algorithm to estimate is not adaptive to the observed data. Instead, users are free to set the parameters (e.g. bandwidths) according to their own accuracy demands, and the parameters are fixed from the start. As long as the bandwidths are chosen appropriately, our estimator still achieves optimality. It also improves computational efficiency compared to the data-driven Goldenshluger-Lepski algorithm applied in [28], which selects the best bandwidths among the set of candidates by making -many comparisons. Nevertheless, the adaptive estimator automatically fits the data with the best parameters and produces an error just a logarithmic factor higher than the lower bound. It is nonparametric and requires less knowledge about the initial state of the particle systems.
4.3. Minimax estimation for drift
In this section, we consider a slight generalization of the graphon mean-field system, where the drift coefficient is time-inhomogeneous. Namely, we extend (1.1) to
(4.3)
where satisfies that, , ,
(4.4)
for some constant (compare with (2.1)). Then the drift term is given by
Note that the stability analysis in [3] and previous estimates (Lemmata 4.1 and 4.2) still hold. We will conduct a minimax analysis on the drift term to show that the estimator is optimal in this general setting.
We can extend the notion of Hölder continuity to the space of time-dependent functions.
Definition 4.3.
Let and , and . We say that a function belongs to class if there exists an open neighborhood of in such that
where is the minimum positive number such that
for all and all (multi-)indices with , .
We say that a function with values belongs to class if each component .
Let be the set of all parameters satisfying the same conditions as defined above Proposition 4.1, except that has the form and satisfies (4.4). For , we define
and set
Moreover, we consider a restriction of this class
for .
Then we present a minimax result on the estimation of the drift term . The proof will be given in Section 6.2.
Theorem 4.2.
Let , and . Define the effective smoothness by
Assume the hypothesis of Lemma 4.2, and satisfy the following conditions
Then, for every , , , we have
(4.5)
Moreover
(4.6)
where the infimum is taken over all possible estimators built on empirical data . Both constants and depend only on the parameters , the kernels , the function given by Condition 2.3(3), and the values of in a small neighborhood of .
Remark 4.5.
It is also possible to obtain a similar result in the case , subject to some additional assumptions on the -continuity of the initial data, as in Theorem 4.1. We leave this to the reader.
Remark 4.6.
As mentioned above, the stability of graphon particle systems given in [3] and the consistency of the estimates in Lemmata 4.1 and 4.2 remain true in the above generalization with the time-inhomogeneous drift coefficient . In this case, the drift depends on time not only through the mean field but also through an independent time variable. This gives more freedom in constructing examples and thus opens up a cleaner path towards the theoretical lower bound. It is for this reason that we include time-inhomogeneous drifts in the minimax analysis.
Optimality of graphon estimator
Recall from (2.6) and the proof of Theorem 2.1 that the error of our estimator depends on the variation of the estimators and . Several convergent quantities would require stronger conditions to make them explicitly quantitative, and we do not pursue the details in this work. There is also a particular term in the proof of Theorem 2.1, whose convergence is due to Assumption 2.1, that is not explicitly quantifiable without further assumptions. This prevents us from carrying out a minimax analysis of the error to study its optimality. However, given the pointwise optimality of and , our estimator should show relatively good performance.
5. Proofs for Section 3.1
5.1. Proof of Lemma 3.1 and Corollary 3.1
Proof of Lemma 3.1.
Fix . Recall that
We derive the following inequality via a telescoping sum,
We only need to provide the appropriate bounds for , , and .
Step 1. We bound using the convergence of the finite-population system to the graphon mean-field system.
Then applying Lemma 2.1 gives the bound
Step 2. We bound following the idea of [28].
For , let
The second part simplifies to
From (1.1) we know that are autonomous, and due to the independence of the Brownian motions , they are all independent. We have and for every . Then Bernstein's inequality reads
Now we apply the inequality (48) in [28] to see that
(5.1)
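For reference, a standard form of Bernstein's inequality, for independent centered random variables \(Z_1, \dots, Z_N\) with \(|Z_j| \le M\) almost surely and \(\sum_{j} \mathbb{E}[Z_j^2] \le v\), reads
\[
\mathbb{P}\Bigl( \Bigl| \sum_{j=1}^{N} Z_j \Bigr| \ge z \Bigr)
\;\le\; 2 \exp\!\Bigl( - \frac{z^2}{2\,(v + M z / 3)} \Bigr), \qquad z > 0 ;
\]
this is the type of bound fed into inequality (48) of [28] above.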
Step 3. Notice that is supported on . We bound using only Minkowski's inequality and the mean-value theorem.
where the last step uses Theorem 2.1 of [3]. That finishes the proof. ∎
We will consistently use the following equality, which follows from the Fubini-Tonelli theorem,
(5.2)
where the latter -norm is taken with respect to the Lebesgue measure on .
Proof of Corollary 3.1.
Recall that . We break the integral into two parts
The second part tends to 0 as due to Proposition 2.1 and dominated convergence. We denote the convergence rate by .
For the first part, we rearrange and integrate the terms given in Lemma 3.1. Recall that
(5.3)
(5.4)
(5.5)
The bounds in the first line are independent of , so integrating them gives
The middle three lines are computed as follows. For line (5.3) we have
With the same idea and using (5.2), line (5.4) gives
Analogously, line (5.5) gives
In addition, the final line expands as
Note that is uniformly bounded for and as a consequence of Proposition 2.1 (see also Corollary 8.2.2 of [9] for details on local upper bounds of particle density). As translations converge in , with dominated convergence we have
(5.6)
We denote the convergence rate by .
In summary, the -error of is given by
∎
5.2. Proof of Lemma 3.2 and Corollary 3.2
Recall the dynamics of defined in (1.1). For simplicity, we let
for every and . Similarly, for the finite-population system, we let
for , and . Observe that . We have the following consistency result.
Lemma 5.1.
We assume the same hypothesis as in Lemma 2.1. In the -particle system, we have the following
(5.7)
for every . As a consequence,
for any bounded continuous function .
We defer its proof to Appendix B; we are now ready to prove our estimates of .
Proof of Lemma 3.2.
Fix . Recall that
From this we may then write
We leave aside for now; it will be handled using standard techniques from stochastic analysis at the end. The remaining terms are bounded analogously to through in the proof of Lemma 3.1.
Step 1. Observe that is upper bounded by
For each and , we have
so
Then, with Lemmata 2.1 and 5.1, we get
Step 2. To bound , we apply Bernstein's inequality. Let
Then
Observe that , and for every , for every . Also, every is a function of , which makes them independent of each other among . So we may apply Bernstein’s inequality and inequality (48) in [28] to obtain
Further notice that
Thus
Step 3. The idea for bounding is analogous to that of in the proof of Lemma 3.1, which uses the stability of the graphon mean-field system. Observe that
where
Note that
Then
Integrating those produces
Step 4. For , notice that are distinct independent Brownian motions. Then we apply Itô’s isometry to see that
Adding all the above bounds finishes the proof. ∎
Proof of Corollary 3.2.
Recall that . We break the integral into two parts
(5.8)
Step 1. The convergence of the second part is due to the -integrability of . More precisely, recall that
where with . Then
so that
Thus by dominated convergence, we have
as .
Step 2. We now look at the first part of (5.8). Recalling that , we obtain
whenever . Note that has a strictly positive lower bound over thanks to Harnack’s inequality (see for instance Corollary 8.2.2 in [9]). This allows us to choose a strictly positive depending on . We may set without loss of generality decreasing as increases.
For the estimates of , we rearrange and combine the terms in the upper bound given in Lemma 3.2 to see that
Analogous to the proof of Corollary 3.1, integrating those items over produces
Recall that , where . Using the same argument that led to (5.6), we see that
and we denote the convergence rate by .
Therefore, joining all the items, we obtain an overall upper bound
finishing the proof. ∎
6. Proofs for Section 4
The main improvement of the estimates in Lemmata 4.1 and 4.2 over Lemmata 3.1 and 3.2 is the elimination of the inequality step (1.3). At a given point , we are able to remove a heavy error term by sacrificing only a constant multiple (depending on the point ). This requires a change-of-measure argument based on Girsanov's theorem, and the analysis of the constant multiple follows from Proposition 19 of [28].
Recall that the finite-population system has the following dynamics
for . We define
for , and . Then
Let be the process
Define a new probability measure via
where denotes the quadratic variation. Observe that are independent -Brownian motions, and that is a -martingale. So are independent under , and the -law of coincides with the -law of , respectively, for every . A slight modification of Proposition 19 of [28] gives the following relation.
Lemma 6.1.
There exist constants such that, for any -measurable event , we have
Here is the -algebra generated by the Brownian motions .
Now we have the tools to complete the proof of the improved estimations and thus the minimax analysis.
6.1. Proof of Theorem 4.1
We first justify the improved upper bound.
Proof of Lemma 4.1.
Observe that
For , let
Note that whenever , so the number of nonzero terms in the summation is .
The main improvement upon Lemma 3.1 comes from the upper bound of via the change-of-measure argument. Following the same strategy as in the proof of Lemma 3.1, we have
Recall that are independent under . Then so are . Moreover, we have and a.s. We may thus apply Bernstein’s inequality,
For index such that , we have
thanks to the local boundedness of , in a neighborhood of . Using estimate (48) in [28], we get
The term is where items (1) and (2) of Lemma 4.1 become different. We first work on item (1). Recall that is supported on . Then Cauchy-Schwarz inequality gives
For each , since the -law of is identical to the -law of , we have
So
where the last inequality uses Theorem 2.1 of [3].
Integrating the above errors, we obtain
That finishes the proof of item (1).
Looking at the proof of item (1), we notice that the only difference in item (2) compared to item (1) happens at the term
The previous (crude) analysis (in the proof of Lemma 3.1) gives an upper bound simply by the mean-value theorem. However, the mean-value theorem applies only when and . Given the local boundedness of , we have
for some constant .
Now, with the additional assumption on the continuity of initial data with respect to the -Wasserstein metric, we adjust the proof of Theorem 2.1(b) in [3] to see that
Then, by Hölder’s inequality, we get
Note that is independent of . That finishes the proof of item (2). ∎
It remains to analyze the bias term. Fix , , and . When , we have
For and such that and , we have
The second term has order due to the selection of Hölder continuity class. We will bound the first term with the following technical lemma.
Lemma 6.2.
Under the foregoing assumptions, there exists some constant , depending only on , such that
for every and every .
The proof relies on several properties of parabolic equations, and we defer it to Appendix B.
With the technical estimates given above, we are now able to prove Theorem 4.1. We start with the upper bound.
Proof of Theorem 4.1, upper bound.
We first work under assumption (a).
Given , we know that
whenever . Thanks to Lemma 6.2, we have
whenever . So the bias term is bounded by
Then, along with Lemma 4.1, the total upper bound of the estimation error is given by
Taking and , we get
The last inequality holds when . Note that the implicit constant in the inequality depends only on , and the values of near .
Next, we work under the assumption (b). Analogous to above, we have
Taking and , we get
As , , and , we have
This leads to the final asymptotic upper bound in (4.1), namely
∎
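Schematically, and ignoring the anisotropy of the bandwidths, the balancing used above follows the usual kernel-rate calculus: for an estimator built from N observations, smoothed with a single bandwidth h over k coordinates, and with bias of order h^s,
\[
\text{bias} \;\asymp\; h^{s}, \qquad \text{stochastic error} \;\asymp\; \bigl(N h^{k}\bigr)^{-1/2}
\quad\Longrightarrow\quad
h \;\asymp\; N^{-\frac{1}{2s+k}}, \qquad \text{error} \;\asymp\; N^{-\frac{s}{2s+k}} .
\]
This is only a heuristic, single-bandwidth picture; the exact exponents in (4.1) come from the anisotropic balancing carried out in the proof above.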
Finally, we demonstrate the lower bound using Le Cam's two-point method (see for instance Chapter 2 of [26]). We shall construct two examples of graphon mean-field systems such that the total variation distance between their laws is bounded by , while the densities at differ by some quantity of order . The construction is adapted from [28], with an extra factor for the graphon index , so we will skip some technical details in the proof below.
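Recall one standard form of the two-point bound: if two models \(P_0, P_1\) in the class produce densities whose values at the point of interest differ by at least \(2\delta\), then any estimator \(\widehat{\mu}\) built from the observations satisfies
\[
\max_{i \in \{0,1\}} \mathbb{E}_{P_i}\bigl[\,|\widehat{\mu} - \mu_i|\,\bigr]
\;\ge\; \delta \,\bigl( 1 - \operatorname{TV}(P_0, P_1) \bigr) ,
\]
where the symbols \(\widehat{\mu}, \mu_i\) are placeholders for this summary. It therefore suffices to keep the total variation distance bounded away from 1 while separating the densities at the prescribed rate.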
Proof of Theorem 4.1, lower bound.
Step 1. We consider graphon particle systems with no interactions.
Pick a smooth potential function such that is Lipschitz, in a neighborhood of , and
Define the drift . Pick a Lipschitz continuous function such that in a neighborhood of , and define the graphon weight . We set and define
Then we obtain a family of diffusion processes such that
(6.1)
Notice that ’s are independent, and is the invariant distribution of . This gives a graphon particle system with time-invariant density function . In particular, we may assume that .
Now we consider a deviation from the system (6.1). Let be a cut-off function such that
• and ,
• for every , and ,
• .
Let be sufficiently small. Then define and so that
where are positive scalars that tend to as . Let . Then we construct the second particle system similar to the above, with time-invariant density
Moreover, to maintain the desired Lipschitz and Hölder continuity, we need
This allows us to take and , which also ensures that .
Step 2. Now we run the finite-population systems derived from the above two graphon particle systems and make observations of the particle positions. For (6.1), the particles display the dynamics
The distributions of the particles coincide with those in the graphon system, with joint law
Similarly, the joint law in the second system is given by
Then, following the strategy in [28], with Pinsker’s inequality, we have
Taylor’s theorem gives
(6.2)
where the remainder term . Notice that is bounded above in a neighborhood of . So there exists some constant such that
Since , we have
where we set . This implies
Thus
when is chosen to be small enough and is sufficiently large.
6.2. Proof of Theorem 4.2
We first justify the improved upper bound. For notational simplicity, let us define the -norm of a function via
Proof of Lemma 4.2.
Fix . We consider the following telescoping sum
This allows us to write
with obvious definitions of the four components.
Step 1. To bound , we observe that
where , and . Then the first part becomes
where is the measure under which are independent, with corresponding laws , . Since is bounded, and is bounded in a small neighborhood of , we have
where the implicit constant is uniform over . This gives
Applying Bernstein’s inequality gives
for some positive constants . Note that is supported on , so
Thus
For the second part, we observe that
This is identical to in the proof of Lemma 3.2, which gives
Thus
Step 2. To bound , let us abuse the notation for the mean-field dependent quantity
Then we may write
Since and are bounded and Lipschitz, we may compare with Proposition 19 under Assumption 4(iii) of [28]. This gives a uniform-in-time bound
Now, applying Cauchy-Schwarz twice, we obtain
Applying Cauchy-Schwarz to the second term again, we see that
For the first term, Minkowski’s inequality gives an upper bound
(6.3)
where . With the same strategy applied to bound , we get
which follows from the fact that
On the other hand, using the idea of , we get
Those lead to the following asymptotic upper bound for the first term,
Joining the above leads to
where we assume .
Step 3. Now, to bound , we apply Itô’s isometry to see that
Note that
We write again
Using the bound for (6.3) in Step 2 leads to
whenever .
Summarizing the above, we conclude that
∎
Finally, analogous to the analysis of , we are able to show the optimality of .
Proof of Theorem 4.2.
The strategies are identical to the proof of Theorem 4.1.
Step 1: Upper bound. Fix , , , and . We assume that .
Recall that , where
The boundedness of ensures the Lipschitz continuity of in variable , while the Lipschitz continuity of leads to local Hölder -continuity of in variable for . Moreover, for , by Itô’s formula we have
Given the uniform boundedness of , , , and , we have a uniform bound
and this is further bounded by for whenever . Then we have as well, and so does . Thus
Joining the above leads to the bound
Choosing , , and , we get
Note that the implicit constant depends only on , the and norms of , and the values of in a small neighborhood of . That concludes (4.5).
Step 2: Lower bound. We construct examples and apply the two-point comparison lemma in a similar way as in the proof of Theorem 4.1.
Let . We consider also models with no interactions, so that
Pick and so that . Note that there is no interaction, so the particles are independent, with joint law denoted by .
Choose some such that
• , ,
• , ,
• .
For and some small enough , pick and so that
where , , are sequences of scalars such that
With proper choice of parameters , small enough and large enough , the local Hölder continuity of the density follows from classical estimates given in [9]. This means we may assume .
Denote by the joint law of the particles derived from the parameters . Following the idea of Lemma 28 in [28], we see that
(6.4)
Given the compact support of and local boundedness of , (6.4) is further bounded by
when is sufficiently small.
On the other hand,
Applying Le Cam’s two-point comparison method gives the lower bound
∎
Acknowledgement
We express our sincere gratitude to Marc Hoffmann for helpful discussions.
Appendix A Intuitions of Estimators
Recall that
With Condition 2.1(2) and 2.2(2), we may further expand it as
where the convolution here is done on the space .
The first term is independent of time , so we have . Note that is a linear operator, and we may approximate it by some finite difference operator
We consider some linear operator with bounded function that approximates the differential operator, and
Then the deconvolution method gives
i.e. . Thus it leads to the formula
whenever well-defined.
Once we have an estimate of and an estimate of , we may plug them into this formula. With some additional cutoff factors (to avoid the denominators being too small), we produce the estimator (2.6).
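In short, the construction rests on the convolution theorem. Writing \(g_W\) for the one-variable graphon profile from Condition 2.2(2), \(\varphi\) for the (time-weighted) interaction part of the drift, and \(\psi\) for the known convolution factor, both zero-extended in the index variable (the symbols \(g_W, \varphi, \psi\) are ours, introduced only for this summary),
\[
\mathcal{F}_u[\varphi] \;=\; \mathcal{F}[g_W]\cdot \mathcal{F}_u[\psi]
\qquad\Longrightarrow\qquad
g_W \;=\; \mathcal{F}^{-1}\!\Bigl[ \frac{\mathcal{F}_u[\varphi]}{\mathcal{F}_u[\psi]} \Bigr]
\quad \text{wherever } \mathcal{F}_u[\psi] \neq 0 ,
\]
and the cutoff factors in (2.6) regularize the division where the denominator is small.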
Appendix B Proofs of Technical Lemmas
Proof of Proposition 2.1.
Recall that for each , the dynamics of type- particles are given by , where is Lipschitz on (this is not hard to verify). So uniquely solves the Fokker-Planck equation
Theorem 7.3.3 in [9] gives a local upper bound
for all and any bounded open set , whenever the -norm is well-defined.
We find that for any ,
From Condition 2.3(1) we know that there exists some such that is uniformly bounded by some constant outside the unit ball for all . We cover with open balls, so that
holds for every , , .
Denote the above upper bound by . Then we have for every and that
A classical estimate shows that , so the above quantity tends to 0 as . ∎
Proof of Lemma 5.1.
From the proof of Theorem 3.2 of [3] we see that there exists some constant such that
for all and , which gives the first inequality. The second inequality follows immediately from dominated convergence. ∎
Proof of Lemma 6.2.
For the first part, we recall that satisfies the Fokker-Planck equation in the following sense:
For simplicity we work under the (additional) assumption that for some constant .
Given distinct , we have
Notice that,
so there exists some constant such that
and similarly, , for any . Then we have
where the constant depends only on the Lipschitz coefficients of and .
We compare the above inequality with the following differential equation
with initial condition . This is a linear homogeneous parabolic equation. Thanks to the maximum principle, we have
It remains to bound .
Consider a time reversal of , namely . It satisfies the following equation
with terminal condition .
Note that is bounded. Then the Feynman-Kac formula reads
where is a diffusion process with dynamics
and is a -dimensional Brownian motion, under the measure .
Thus we have
where is given by Condition 2.3(3). This implies at every fixed , where the implicit constant is independent of .
Similarly, the other direction produces the same bound. Hence, there exists some constant such that, for every , , and , it holds that
∎
Appendix C Reduction of Assumption 2.1
Recall the operator . Our estimator requires computing the quantity . If on a set of positive Lebesgue measure we have , a good estimator would lead to small . This might blow up the fraction and prevent a good estimation of . Therefore, the ad hoc assumption on the nonvanishing property of is somewhat inevitable in this problem. It is also commonly encountered in learning a sample density with an unknown error distribution (see [24] for instance).
However, Assumption 2.1 is not a trivial property of . Given the nonlinearity of the Fokker-Planck equation associated with (1.1), computing an explicit formula for is mostly impossible. To the best of our knowledge, no explicit solutions of graphon systems (1.1) satisfying all our assumptions have been presented. In this appendix, we study some special cases in which Assumption 2.1 reduces to weaker conditions that are easier to verify. We work under the hypothesis of Theorem 2.1, and assume for simplicity.
Case 1: Degeneration to a homogeneous system
Suppose is constant. Corollary 2.3 of [18] states that, provided the map
is also constant (denoted by ), the density map is also constant, and it solves the classical McKean-Vlasov equation
where we may view as a single quantity. In our model , must be 1-periodic on .
Case 2: Degeneration to a finite graph
Define the degree of an index with respect to a subset by
and
Consider the partition , and denote by the subset . Assume the degree on each part of the partition is constant, i.e., for , we have
(C.1)
whenever . An example is , where is defined on , supported on , and -periodic on . Then we take for every , so that
Assume further that the initial data map is also constant on each . Theorem 2.1 in [18] tells us that the map is constant in each part of the partition, that is, whenever .
We set , for , , and set for , which is well-defined due to (C.1). Then they satisfy a family of coupled equations:
where is treated as an matrix. Assume that all are equal to some constant . We see that
Let (it is indeed in this situation), then it solves
(C.2)
Note that , and we may define
Then (C.2) is the associated Fokker-Planck equation for the mean-field diffusion process
and is the density of .
Now we may compute
where in this case
We focus on the part where , where if and only if
Set
Notice that for every , solves the linear differential equation
(C.3)
Its canonical diffusion process has the following dynamics
where . Then Assumption 2.1 now reduces to Assumption 16 in [28] on for almost every .
Remark C.1.
In the most general case, are the solutions to a system of infinitely many fully coupled nonlinear differential equations. There is then no explicit formula for that fits into a condition involving only the operator . However, in Case 2, each is the solution to some linear equation (though the coefficients involve , which solves another equation and may be viewed as a known quantity). The assumption becomes much milder in this sense, and that is the main reduction achieved in Case 2.
References
- [1] C. Amorino, A. Heidari, V. Pilipauskaitė, and M. Podolskij. Parameter estimation of discretely observed interacting particle systems. Stochastic Processes and their Applications, 163:350–386, 2023.
- [2] J. Baladron, D. Fasoli, O. Faugeras, and J. Touboul. Mean-field description and propagation of chaos in networks of Hodgkin-Huxley and FitzHugh-Nagumo neurons. Electron. J. Statist., 2, 2012.
- [3] E. Bayraktar, S. Chakraborty, and R. Wu. Graphon mean field systems. Ann. Appl. Probab., 33(5):3587 – 3619, 2023.
- [4] E. Bayraktar and D. Kim. Concentration of measure for graphon particle system. Adv. Appl. Probab., pages 1–28, 2024.
- [5] E. Bayraktar and R. Wu. Stationarity and uniform in time convergence for the graphon particle system. Stochastic Processes and their Applications, 150:532–568, 2022.
- [6] E. Bayraktar and R. Wu. Graphon particle system: Uniform-in-time concentration bounds. Stochastic Processes and their Applications, 156:196–225, 2023.
- [7] E. Bayraktar, R. Wu, and X. Zhang. Propagation of chaos of forward-backward stochastic differential equations with graphon interactions. Applied Mathematics and Optimization, 88(1):25, 2023.
- [8] D. Belomestny, V. Pilipauskaitė, and M. Podolskij. Semiparametric estimation of McKean-Vlasov SDEs. Ann. Inst. H. Poincaré Probab. Statist., 59(1):79–96, 2023.
- [9] V. I. Bogachev, N. V. Krylov, M. Röckner, and S. V. Shaposhnikov. Fokker-Planck-Kolmogorov equations. Mathematical surveys and monographs, volume 207. American Mathematical Society, Providence, Rhode Island, 2015.
- [10] F. Bolley, A. Guillin, and C. Villani. Quantitative concentration inequalities for empirical measures on non-compact spaces. Probab. Theory Relat. Fields, 137(3):541–593, 2007.
- [11] M. Burger, V. Capasso, and D. Morale. On an aggregation model with long and short range interactions. Nonlinear Analysis: Real World Applications, 8(3):939–958, 2007.
- [12] P. E. Caines and M. Huang. Graphon mean field games and their equations. SIAM Journal on Control and Optimization, 59(6):4373–4399, 2021.
- [13] C. Canuto, F. Fagnani, and P. Tilli. An Eulerian approach to the analysis of Krause's consensus models. SIAM Journal on Control and Optimization, 50(1):243–265, 2012.
- [14] R. Carmona, D. B. Cooney, C. V. Graves, and M. Laurière. Stochastic graphon games: I. the static case. Mathematics of Operations Research, 47(1):750–778, 2021.
- [15] R. A. Carmona. Applications of mean field games in financial engineering and economic theory. Proceedings of Symposia in Applied Mathematics, 78, 2021.
- [16] B. Chazelle, Q. Jiu, Q. Li, and C. Wang. Well-posedness of the limiting equation of a noisy consensus model in opinion dynamics. J. Differ. Equ., 263(1):365–397, 2017.
- [17] F. Coppini. Long time dynamics for interacting oscillators on graphs. Ann. Appl. Probab., 32(1):360–391, 2022.
- [18] F. Coppini. A note on fokker-planck equations and graphons, 2022.
- [19] F. Coppini, H. Dietert, and G. Giacomin. A law of large numbers and large deviations for interacting diffusions on Erdős-Rényi graphs. Stochastics and Dynamics, page 2050010, 2019.
- [20] F. Delarue. Mean field games: A toy model on an Erdős-Rényi graph. ESAIM: Proceedings and Surveys, 60:1–26, 2017.
- [21] S. Delattre, G. Giacomin, and E. Lucon. A note on dynamical models on random graphs and Fokker-Planck equations. Journal of Statistical Physics, 165(4):785–798, 2016.
- [22] P. Dupuis and G. S. Medvedev. The large deviation principle for interacting dynamical systems on random graphs. Communications in Mathematical Physics, 390(2):545–575, 2022.
- [23] J.-P. Fouque and L.-H. Sun. Systemic Risk Illustrated, pages 444–452. Cambridge University Press, 2013.
- [24] J. Johannes. Deconvolution with unknown error distribution. The Annals of Statistics, 37(5A):2301–2323, 2009.
- [25] V. N. Kolokoltsov. Nonlinear Markov processes and kinetic equations. Cambridge Tracts in Mathematics. Cambridge University Press, 2010.
- [26] L. Le Cam. Asymptotic Methods in Statistical Decision Theory. Springer Series in Statistics. Springer-Verlag, New York, NY, 1986.
- [27] L. Lovász. Large networks and graph limits. Colloquium Publications. American Mathematical Society, 2012.
- [28] L. D. Maestra and M. Hoffmann. Nonparametric estimation for interacting particle systems: McKean-Vlasov models. Probab. Theory Relat. Fields, 182:551–613, 2022.
- [29] H. P. McKean Jr. A class of Markov processes associated with nonlinear parabolic equations. Proceedings of the National Academy of Sciences of the United States of America, 56:1907–1911, 1966.
- [30] A. Mogilner and L. Edelstein-Keshet. A non-local model for a swarm. J. Math. Biol., 38:534–570, 1999.
- [31] R. I. Oliveira and G. H. Reis. Interacting diffusions on random graphs with diverging average degrees: Hydrodynamics and large deviations. Journal of Statistical Physics, 2019.
- [32] F. Parise and A. Ozdaglar. Graphon games: A statistical framework for network games and interventions. Econometrica, 91(1):191–225, 2023.
- [33] T. Sarkar, A. Bhattacharjee, H. Samanta, K. Bhattacharya, and H. Saha. Optimal design and implementation of solar PV-wind-biogas-VRFB storage integrated smart hybrid microgrid for ensuring zero loss of power supply probability. Energy Conversion and Management, 191:102–118, 2019.
- [34] A.-S. Sznitman. Topics in propagation of chaos. Lecture Notes in Mathematics. Springer-Verlag, New York, 1991.
- [35] A. A. Vlasov. Many-Particle Theory and Its Application to Plasma. Russian Monographs and Texts on Advanced Mathematics and Physics 8. Gordon and Breach, New York, 1961.
- [36] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, 1994.