
Convergence of Random Batch Method for interacting particles with disparate species and weights

Shi Jin [email protected] Lei Li [email protected] Jian-Guo Liu [email protected] Department of Mathematics and Department of Physics, Duke University, Durham, NC 27708, USA.
Abstract

We consider in this work the convergence of the Random Batch Method proposed in our previous work [Jin et al., J. Comput. Phys., 400(1), 2020] for interacting particles to the case of disparate species and weights. We show that the strong error is of O(\sqrt{\tau}) while the weak error is of O(\tau), where \tau is the time step between two random divisions of batches. Both types of convergence are uniform in N, the number of particles. The proof of strong convergence follows closely the proof in [Jin et al., J. Comput. Phys., 400(1), 2020] for indistinguishable particles, but there are still some differences: since there is no exchangeability now, we have to use a certain weighted average of the errors; some refined auxiliary lemmas have to be proved compared with our previous work. To show that the weak convergence of the empirical measure is uniform in N, certain sharp estimates for the derivatives of the backward equations have been used. The weak convergence analysis is also illustrative for the convergence of the Random Batch Method for N-body Liouville equations.

1 Introduction

Interacting particle systems are ubiquitous in nature; examples include molecules in fluids [19], plasma [7], and galaxies in the universe. In addition, many collective behaviors in the natural and social sciences are due to interacting individuals; examples include swarming [44, 11, 10, 15], flocking [14, 23, 1], chemotaxis [24, 5], and consensus clusters in opinion dynamics [38]. In many models for these phenomena, individual particles can have weights. For example, in the point vortex model [12, 22, 33], the “particles” correspond to point vortices with different strengths, and they interact with each other through a Hamiltonian system. To approximate the nonlinear Fokker-Planck equation for interacting particles, like the Keller-Segel equation, one can use interacting particles with different masses to approximate the dynamics and then compute the empirical density more efficiently [13, 34]. There may also be several species, where the particles have different features; for example, in the microscopic description of the Poisson-Boltzmann equation, particles with different charges interact with each other through Coulomb forces [2, 26].

The systems mentioned above can be written as the second order ODE/SDE system for 1\leq i\leq N

\displaystyle\begin{split}&dX^{i}=V^{i}\,dt\\ &dV^{i}=\frac{1}{N-1}\sum_{j=1,j\neq i}^{N}q_{j}F(X^{i}-X^{j})\,dt-\gamma V^{i}\,dt+\sigma\,dW^{i}\end{split} (1.1)

or the first order ODE/SDE system

\displaystyle dX^{i}=\frac{1}{N-1}\sum_{j=1:j\neq i}^{N}q_{j}F(X^{i}-X^{j})\,dt+\sigma\,dW^{i},~{}~{}i=1,\cdots,N, (1.2)

where the q_{j}’s are the weights (they can be masses, charges, etc.). Note that (1.1) contains the Hamiltonian case when the force F is conservative and \gamma=\sigma=0. If one discretizes (1.1) or (1.2) directly, the computational cost per time step is O(N^{2}), which is undesirable for large N. For interactions that decay fast enough, the Fast Multipole Method (FMM) [42] can reduce the complexity per iteration to O(N). However, the implementation is not easy, as it needs some advanced data structures.
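To make the baseline cost concrete, the following is a minimal sketch (in Python with NumPy; the kernel F, the parameters, and the explicit Euler–Maruyama discretization are our own illustrative assumptions, not part of the schemes above) of one time step for (1.2) with the direct O(N^{2}) interaction sum:

import numpy as np

def direct_step(X, q, F, sigma, dt, rng):
    """One Euler-Maruyama step for (1.2) with the direct O(N^2) pair sum.

    X: (N, d) positions; q: (N,) weights; F: pairwise force F(x_i - x_j).
    """
    N, d = X.shape
    drift = np.zeros_like(X)
    for i in range(N):            # the double loop costs O(N^2) kernel calls
        for j in range(N):
            if j != i:
                drift[i] += q[j] * F(X[i] - X[j])
    drift /= N - 1
    return X + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal((N, d))

Every step touches all N(N-1) ordered pairs; the Random Batch Method recalled below replaces the inner sum by a sum over a small random batch.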

The authors proposed in [25] a simple random algorithm, called the Random Batch Method (RBM), to reduce the computational cost per time step from O(N^{2}) to O(N) for indistinguishable particles (thus with the same weight). The idea was to apply the “mini-batch” idea, famous for its application in the so-called stochastic gradient descent (SGD) [41, 8, 9] in machine learning. Later on, the “mini-batch” idea was used to develop a Markov Chain Monte Carlo method, called stochastic gradient Langevin dynamics (SGLD), by Welling and Teh for Bayesian inference [45]. Random binary collisions were also used to simulate the Boltzmann equation [6, 39, 4] or the mean-field equation for flocking [1]. How to design the mini-batch strategy depends on the specific application. For interacting particle systems, the strategy in [25] is to do a random grouping at each time interval and then let the particles interact within the groups on that small time interval. Compared with FMM, the accuracy is lower (half order in the time step), but RBM is simpler to implement and is valid for more general potentials (e.g. the SVGD ODE [35, 30]). Intuitively, the method converges because of the averaging in time, so the convergence is like that in the Law of Large Numbers, but in time (see [25] for a more detailed explanation). Moreover, the RBM converges in the regime of the mean field limit ([43, 20, 29]). In fact, the method is asymptotic-preserving with respect to the mean-field limit, which means the algorithm can approximate the one-marginal distribution with an error bound independent of N.

The proof in [25] depends on the exchangeability of the particles. If the particles carry weights, they are not exchangeable. Even so, it is clear that RBM can be equally well applied to interacting particles with multiple species and weights (see Algorithm 1 below). The goal of this paper is to analyze the convergence of RBM for interacting particles with weights or multiple species. We will discuss both strong and weak convergence, which rely on different techniques so that the error estimates are independent of N. The proof of strong convergence follows closely the proof in [25] for indistinguishable particles, but there are still some differences: since there is no exchangeability, we have to use a kind of weighted average of the errors; some refined auxiliary lemmas have to be used here, compared with those in [25]. The weak convergence is for the empirical measure instead of the one-marginal distribution as in [25]. Some sharp estimates for the derivatives of the backward equations have to be established so that the weak convergence is uniform in N. The weak convergence analysis is also illustrative for the convergence of the Random Batch Method for N-body Liouville equations.

The rest of the paper is organized as follows. In section 2, we introduce some basic notations and give a detailed description of the Random Batch Method. Section 3 is devoted to the proof of strong convergence. In section 4, we show the first order weak convergence, where a key proposition for the estimates of derivatives of the backward equation is needed.

2 Setup and notations

We consider interacting particles with weights. Since in practice there can be an external field, and there are also applications where the interaction kernels depend on the two specific particles (for example, the SVGD ODE [30]), we consider in general the following first order system

\displaystyle dX^{i}=b(X^{i})\,dt+\frac{1}{N-1}\sum_{j:j\neq i}m_{j}K_{ij}(X^{i},X^{j})\,dt+\sigma dW^{i},~{}~{}i=1,2,\cdots,N. (2.1)

The argument in this paper can be generalized to second order systems without difficulty, so we omit this. In (2.1), b(\cdot) is the external force field, the \{W^{i}\}’s are given independent d-dimensional Wiener processes (standard Brownian motions), and we impose

\displaystyle m_{j}\geq 0. (2.2)

Note that this model includes the multispecies case, since the signs of the interaction can be included into K_{ij}, and for (1.2) one takes m_{j}=|q_{j}|.

For notational convenience, we define

\displaystyle F_{i}(x):=b(x_{i})+\frac{1}{N-1}\sum_{j:j\neq i}m_{j}K_{ij}(x_{i},x_{j}), (2.3)

where

x:=(x_{1},\cdots,x_{N})\in\mathbb{R}^{Nd}.

The (random) empirical probability measure corresponding to (2.1) is given by

\displaystyle\mu^{N}:=\frac{1}{N}\sum_{j=1}^{N}\omega_{j}\delta(x-X^{j}), (2.4)

where

\omega_{j}=\frac{Nm_{j}}{\sum_{\ell}m_{\ell}}.
Assumption 2.1.

We assume there are positive constants A,M independent of N such that

\displaystyle\frac{1}{N}\sum_{j}m_{j}=M,~{}~{}\max_{j}|m_{j}|\leq A. (2.5)

With the assumption above, we find

\displaystyle\omega_{j}=O(1). (2.6)

Below are some assumptions on the external field and interaction kernels.

Assumption 2.2.

We assume b(\cdot) is one-sided Lipschitz:

\displaystyle(z_{1}-z_{2})\cdot(b(z_{1})-b(z_{2}))\leq\beta|z_{1}-z_{2}|^{2} (2.7)

for some constant \beta, and that b,\nabla b have polynomial growth:

\displaystyle|b(z)|+|\nabla b(z)|\leq C(1+|z|)^{q}. (2.8)

The functions K_{ij}(\cdot,\cdot) and their derivatives up to second order are uniformly bounded in 1\leq i,j\leq N.

2.1 The Random Batch Method and mathematical setup

We now give a detailed explanation of the Random Batch Method proposed in [25], applied to (2.1). Suppose the computational interval is [0,T]. We pick a small time step \tau and define the discrete time grid

\displaystyle t_{k}:=k\tau,~{}~{}k\in\mathbb{N}. (2.9)

The number of iterations of the algorithm is

\displaystyle N_{T}:=\left\lceil\frac{T}{\tau}\right\rceil. (2.10)

At each time grid point t_{k}, we randomly divide the

N=np

particles into n small batches of equal size p (p\ll N, often p=2). We have assumed that p divides N for convenience. Denote the n batches by \mathcal{C}_{q},q=1,\cdots,n; then each particle interacts only with particles within its own batch. The details are shown in Algorithm 1. Clearly, each iteration contains two main steps: (1) randomly dividing the particles into n batches (implemented by a random permutation, costing O(N) [17]); (2) letting the particles interact inside their batches only.

Algorithm 1 (RBM)
1:for k in 1:[T/\tau] do
2:     Divide \{1,2,\ldots,pn\} into n batches randomly.
3:     for each batch \mathcal{C}_{q} do
4:         Update \tilde{X}^{i}’s (i\in\mathcal{C}_{q}) by solving the following SDE for t\in[t_{k-1},t_{k}).
\displaystyle d\tilde{X}^{i}=b(\tilde{X}^{i})dt+\frac{1}{p-1}\sum_{j\in\mathcal{C}_{q},j\neq i}m_{j}K_{ij}(\tilde{X}^{i},\tilde{X}^{j})dt+\sigma dW^{i}. (2.11)
5:     end for
6:end for

Above, the Wiener process W^{i} (Brownian motion) used is the same as in (2.1).
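To make Algorithm 1 concrete, here is a minimal sketch in Python with NumPy. It is a sketch under our own assumptions: (2.11) is solved by a single Euler–Maruyama step on each subinterval, and the kernel is passed as a generic function K(x,y); neither choice is prescribed by the algorithm itself.

import numpy as np

def random_division(N, p, rng):
    """Step (1) of each iteration: divide {0,...,N-1} into n = N/p batches
    of size p via a random permutation, at O(N) cost [17]."""
    return rng.permutation(N).reshape(N // p, p)

def rbm_step(X, m, b, K, sigma, dt, rng, p=2):
    """One RBM iteration for (2.1): each particle interacts only within its
    batch, with the 1/(p-1) normalization of (2.11)."""
    N, d = X.shape
    Xnew = np.empty_like(X)
    for batch in random_division(N, p, rng):
        for i in batch:
            force = b(X[i]) + sum(m[j] * K(X[i], X[j])
                                  for j in batch if j != i) / (p - 1)
            Xnew[i] = X[i] + force * dt + sigma * np.sqrt(dt) * rng.standard_normal(d)
    return Xnew

For fixed p, each iteration costs O(N) for the division plus O(Np) kernel evaluations, compared with O(N^{2}) for the direct sum; any stable integrator may replace the single Euler–Maruyama step inside the batches.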

Remark 2.1.

For particles with weights, it may be desirable to construct the random batches by importance sampling. This is left for future study.

We denote by \mathcal{C}_{q}^{(k)}, 1\leq q\leq n, the batches at t_{k}, and

\displaystyle\mathcal{C}^{(k)}:=\{\mathcal{C}_{1}^{(k)},\cdots,\mathcal{C}_{n}^{(k)}\} (2.12)

will denote the random division of batches at t_{k}. It is standard by the Kolmogorov extension theorem [16] that there exists a probability space (\Omega,\mathcal{F},\mathbb{P}) such that the random variables \{X_{0}^{i},W^{i},\mathcal{C}^{(k)}:1\leq i\leq N,k\geq 0\} are defined on this probability space and are all independent. As usual, we use \mathbb{E} to denote the integration on \Omega with respect to the probability measure \mathbb{P}. For the convenience of the analysis, we introduce the L^{2}(\mathbb{P}) norm:

\displaystyle\|v\|=\sqrt{\mathbb{E}|v|^{2}}. (2.13)

Define the filtration \{\mathcal{F}_{k}\}_{k\geq 0} by

\displaystyle\mathcal{F}_{k-1}=\sigma(X_{0}^{i},W^{i}(t),\mathcal{C}^{(j)};t\leq t_{k-1},j\leq k-1). (2.14)

In other words, \mathcal{F}_{k-1} is the \sigma-algebra generated by the initial values X_{0}^{i} (i=1,\ldots,N), W^{i}(t), t\leq t_{k-1}, and \mathcal{C}^{(j)}, j\leq k-1. Hence, \mathcal{F}_{k-1} contains the information of how batches are constructed for t\in[t_{k-1},t_{k}). We also introduce the filtration \{\mathcal{G}_{k}\}_{k\geq 0} by

\displaystyle\mathcal{G}_{k-1}=\sigma(X_{0}^{i},W^{i}(t),\mathcal{C}^{(j)};t\leq t_{k-1},j\leq k-2). (2.15)

If we use \sigma(\mathcal{C}^{(k-1)}) to denote the \sigma-algebra generated by \mathcal{C}^{(k-1)}, the random division of batches at t_{k-1}, then \mathcal{F}_{k-1}=\sigma(\mathcal{G}_{k-1}\cup\sigma(\mathcal{C}^{(k-1)})).

For further discussion, given some random batches \mathcal{C}, we define the random variables

\displaystyle I_{ij}=\begin{cases}1&i,j\text{~{}are in the same batch,}\\ 0&\text{~{}otherwise~{}}\end{cases},~{}~{}1\leq i,j\leq N. (2.16)

We will focus on the approximation error of \tilde{X} for X on t\in[0,T]. In particular, we define the error process

\displaystyle Z^{i}=\tilde{X}^{i}-X^{i},~{}~{}i=1,\cdots,N. (2.17)

2.2 A comment on interacting particles with multiple species

In applications, the most important cases where particles carry weights are the multispecies cases. For example, when we simulate the microscopic particles for the Poisson-Boltzmann equations [2, 26], we need to consider charged particles with different valences, in particular

\displaystyle dX^{i}=\frac{1}{N-1}\sum_{j=1:j\neq i}^{N}Q_{j}z_{i}z_{j}F(X^{i}-X^{j})\,dt+\sigma\,dW^{i},~{}~{}i=1,\cdots,N, (2.18)

where z_{i}=\pm 1 represents whether the charge is positive or negative and Q_{j}\geq 0 is the absolute value of the charge. In this case, we can define

\displaystyle K_{ij}(x,y)=z_{i}z_{j}F(x-y), (2.19)

and this reduces to (2.1).

One may also care about the densities of the different species separately. Similar to (2.4), one can compute the empirical measures for all these species separately; the empirical measure (2.4) is then a mixture of them. Hence, the model (2.1) is rich enough to include interacting particles with disparate species and weights. Below, we will address these uniformly under the framework of (2.1).
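As an illustration of this reduction, here is a short sketch (Python; the helper name and the placeholder force are hypothetical) encoding (2.18) into the form (2.1) via (2.19), with weights m_{j}=Q_{j}:

import numpy as np

def multispecies_kernel(z, F):
    """Build K_ij(x, y) = z_i z_j F(x - y) as in (2.19); the weights of
    (2.1) are then m_j = Q_j, the charge magnitudes."""
    def K(i, j, x, y):
        return z[i] * z[j] * F(x - y)
    return K

# Example: two species with opposite unit charges and a placeholder force.
z = np.array([1, -1, 1, -1])
K = multispecies_kernel(z, F=lambda r: -r)   # placeholder linear force, for illustration only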

3 The strong convergence

Since there is no exchangeability, we have to use a weighted average of the errors. We consider

\displaystyle J(t):=\frac{1}{2N}\sum_{i=1}^{N}m_{i}\mathbb{E}|\tilde{X}^{i}-X^{i}|^{2}, (3.1)

which is the square of the strong error. As a common convention, the \frac{1}{2} prefactor is used for energies of quadratic forms. Recently, a certain quantum counterpart of this error has been used by Golse et al. to prove the convergence of the Random Batch Method for N-body Schrödinger equations [21].

Theorem 3.1.

Suppose Assumptions 2.1 and 2.2 hold. Then, it holds that

\displaystyle\sup_{t\leq T}\sqrt{J(t)}\leq C\sqrt{\frac{\tau}{p-1}+\tau^{2}}, (3.2)

where C is independent of N and p.
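Before turning to the proof, the following minimal numerical sketch (Python; the drift, kernel, weights, and all parameters are our own illustrative choices) indicates how the rate can be observed: the full dynamics and the RBM dynamics are driven by the same Brownian increments and the same Euler–Maruyama step, so their difference isolates the random batch error. A single realization is only a crude proxy for J(T); in practice one would average over several runs.

import numpy as np

rng = np.random.default_rng(0)
N, d, p, T, sigma = 64, 1, 2, 1.0, 0.5
m = rng.uniform(0.5, 1.5, N)              # bounded weights, as in Assumption 2.1
b = lambda x: -x                          # one-sided Lipschitz drift
K = lambda x, y: np.tanh(y - x)           # smooth bounded kernel (placeholder)
X0 = rng.standard_normal((N, d))

def drift_full(X):
    D = np.zeros_like(X)
    for i in range(N):
        for j in range(N):
            if j != i:
                D[i] += m[j] * K(X[i], X[j])
    return b(X) + D / (N - 1)

def error_at_T(tau):
    X, Xr = X0.copy(), X0.copy()          # same initial data
    for _ in range(int(T / tau)):
        dW = np.sqrt(tau) * rng.standard_normal((N, d))   # shared noise
        X = X + drift_full(X) * tau + sigma * dW
        Dr = b(Xr)
        for batch in rng.permutation(N).reshape(N // p, p):
            for i in batch:
                for j in batch:
                    if j != i:
                        Dr[i] = Dr[i] + m[j] * K(Xr[i], Xr[j]) / (p - 1)
        Xr = Xr + Dr * tau + sigma * dW
    return np.sqrt(np.sum(m * np.sum((Xr - X) ** 2, axis=1)) / (2 * N))

for tau in [0.1, 0.05, 0.025, 0.0125]:
    print(tau, error_at_T(tau))           # sqrt(J(T)) should scale roughly like sqrt(tau)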

Define the error of the random approximation for the interacting force by

\displaystyle\begin{split}\chi_{i}(x)&:=\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}K_{ij}(x_{i},x_{j})-\frac{1}{N-1}\sum_{j:j\neq i}m_{j}K_{ij}(x_{i},x_{j})\\ &=\frac{1}{p-1}\sum_{j:j\neq i}I_{ij}m_{j}K_{ij}(x_{i},x_{j})-\frac{1}{N-1}\sum_{j:j\neq i}m_{j}K_{ij}(x_{i},x_{j}),\end{split} (3.3)

where x_{i}\in\mathbb{R}^{d}, and again x=(x_{1},\cdots,x_{N})\in\mathbb{R}^{Nd}.

We have the following facts.

Lemma 3.1.

For i\neq j, it holds that

\displaystyle\mathbb{E}I_{ij}=\frac{p-1}{N-1}, (3.4)

and for distinct i,j,\ell, it holds that

\displaystyle\mathbb{E}I_{ij}I_{i\ell}=\frac{(p-1)(p-2)}{(N-1)(N-2)}. (3.5)

These identities were established in the proof of [25, Lemma 3.1], so we omit the proofs. Using Lemma 3.1, one obtains the following consistency of the Random Batch Method.

Lemma 3.2.

For given x=(x_{1},\ldots,x_{N})\in\mathbb{R}^{Nd}, it holds that

\displaystyle\mathbb{E}\chi_{i}(x)=0. (3.6)

Moreover, the second moment is given by

\displaystyle\mathbb{E}|\chi_{i}(x)|^{2}=\left(\frac{1}{p-1}-\frac{1}{N-1}\right)\Lambda_{i}(x), (3.7)

where

\displaystyle\Lambda_{i}(x):=\frac{1}{N-2}\sum_{j:j\neq i}\Big{|}m_{j}K_{ij}(x_{i},x_{j})-\frac{1}{N-1}\sum_{\ell:\ell\neq i}m_{\ell}K_{i\ell}(x_{i},x_{\ell})\Big{|}^{2}. (3.8)

Though slightly different from [25, Lemma 3.1], the proof is essentially the same, so we omit it.
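The combinatorial identities behind these lemmas are easy to test numerically; a small Monte Carlo sketch (Python; the parameters are arbitrary) checking (3.4) and (3.5), which can be extended in the same way to check the unbiasedness (3.6):

import numpy as np

rng = np.random.default_rng(1)
N, p, trials = 12, 3, 200_000
c_ij = 0
c_ijl = 0
for _ in range(trials):
    batch_of = np.empty(N, dtype=int)
    batch_of[rng.permutation(N)] = np.arange(N) // p   # batch index of each particle
    same_ij = batch_of[0] == batch_of[1]
    c_ij += same_ij
    c_ijl += same_ij and batch_of[0] == batch_of[2]
print(c_ij / trials, (p - 1) / (N - 1))                          # checks (3.4)
print(c_ijl / trials, (p - 1) * (p - 2) / ((N - 1) * (N - 2)))   # checks (3.5)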

We move to some important additional estimates:

Lemma 3.3.

Let X^{i} and \tilde{X}^{i} be solutions to (2.1) and (2.11) respectively. Suppose Assumptions 2.1 and 2.2 hold. Then,

\displaystyle\sup_{t\leq T}(\mathbb{E}|X^{i}(t)|^{q}+\mathbb{E}|\tilde{X}^{i}(t)|^{q})\leq C_{q}. (3.9)

Besides, for any k>0 and q\geq 2,

\displaystyle\sup_{t\in[t_{k-1},t_{k})}\left|\mathbb{E}(|\tilde{X}^{i}(t)|^{q}|\mathcal{F}_{k-1})\right|\leq C_{1}|\tilde{X}^{i}(t_{k-1})|^{q}+C_{2} (3.10)

holds almost surely.

Moreover, almost surely, it holds that

\displaystyle\begin{split}&|\mathbb{E}(\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})|\mathcal{F}_{k-1})|\leq C(1+|\tilde{X}^{i}(t_{k-1})|^{q})\tau,\\ &\Big{|}\mathbb{E}\left(|\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})|^{2}|\mathcal{F}_{k-1}\right)\Big{|}\leq C(1+|\tilde{X}^{i}(t_{k-1})|^{q})\tau.\end{split} (3.11)

Note that the second estimate in (3.11) is different from that in [25]. In fact, the proof in [25] has a small gap, and one needs this refined estimate to fill in that gap as well (on page 13 of [25], from line 17 to line 19, the quantity \mathbb{E}(|\delta\tilde{X}^{i}|^{2}+|\delta\tilde{X}^{j}|^{2}|\mathcal{F}_{m-1}) is not independent of \mathcal{C}_{\theta}; to get line 19, we need this refined version). Though the proof is not hard, we attach it in Appendix A.

The following lemma is an improved version of [25, Lemma 3.2], which is essential for establishing the strong convergence for the problem considered in this paper.

Lemma 3.4.

Fix i\in\{1,\ldots,N\}. Let \mathcal{C}_{\theta} be the random batch of size p that contains i in the random division. Let Y_{j} (1\leq j\leq N) be N random variables (or random vectors) that are independent of \mathcal{C}_{\theta}. Then, for p\geq 2,

\displaystyle\left\|\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}Y_{j}\right\|\leq\left(\frac{1}{N-1}\sum_{j:j\neq i}\|Y_{j}\|^{2}\right)^{1/2}. (3.12)
Proof.

By the definition and independence,

\begin{split}\left\|\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}Y_{j}\right\|^{2}&=\frac{1}{(p-1)^{2}}\mathbb{E}\left|\sum_{j:j\neq i}I_{ij}Y_{j}\right|^{2}\\ &=\frac{1}{(p-1)^{2}}\sum_{j,\ell:j\neq i,\ell\neq i}\mathbb{E}(I_{ij}I_{i\ell})\mathbb{E}(Y_{j}\cdot Y_{\ell})\\ &\leq\frac{1}{(p-1)^{2}}\Bigg{[}\sum_{j,\ell:j\neq i,\ell\neq i,j\neq\ell}\frac{(p-1)(p-2)}{(N-1)(N-2)}\|Y_{j}\|\|Y_{\ell}\|+\sum_{j:j\neq i}\mathbb{E}I_{ij}\|Y_{j}\|^{2}\Bigg{]}\\ &=:R_{1}+R_{2}.\end{split}

Note that the independence is used in the second equality. The inequality is due to Lemma 3.1, together with the Cauchy-Schwarz inequality \mathbb{E}(Y_{j}\cdot Y_{\ell})\leq\|Y_{j}\|\|Y_{\ell}\|.

It is easy to calculate

\begin{split}R_{1}&\leq\frac{p-2}{(p-1)(N-1)(N-2)}\sum_{j,\ell:j\neq i,\ell\neq i,j\neq\ell}\left(\frac{1}{2}\|Y_{j}\|^{2}+\frac{1}{2}\|Y_{\ell}\|^{2}\right)\\ &=\frac{p-2}{(p-1)(N-1)}\sum_{j:j\neq i}\|Y_{j}\|^{2},\end{split}

while

R_{2}=\frac{1}{(p-1)(N-1)}\sum_{j:j\neq i}\|Y_{j}\|^{2}.

Hence,

R_{1}+R_{2}\leq\frac{1}{N-1}\sum_{j:j\neq i}\|Y_{j}\|^{2}.

The claim thus follows. ∎

We now consider the error process defined in (2.17). The time derivative of Z^{i} is clearly given by

\frac{d}{dt}Z^{i}=[b(\tilde{X}^{i})-b(X^{i})]+\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}K_{ij}(\tilde{X}^{i},\tilde{X}^{j})-\frac{1}{N-1}\sum_{j:j\neq i}m_{j}K_{ij}(X^{i},X^{j}), (3.13)

where \mathcal{C}_{\theta} is the random batch in \mathcal{C} that contains i.

Define

\displaystyle\delta K_{ij}(t):=K_{ij}(\tilde{X}^{i}(t),\tilde{X}^{j}(t))-K_{ij}(X^{i}(t),X^{j}(t)). (3.14)

The right hand side of (3.13) can then be written as

\displaystyle\begin{split}\frac{d}{dt}Z^{i}&=[b(\tilde{X}^{i})-b(X^{i})]+\frac{1}{N-1}\sum_{j:j\neq i}m_{j}\delta K_{ij}+\chi_{i}(\tilde{X})\\ &=[b(\tilde{X}^{i})-b(X^{i})]+\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}\delta K_{ij}+\chi_{i}(X).\end{split} (3.15)

With this, one can obtain the following simple lemma.

Lemma 3.5.

For t\in[t_{k-1},t_{k}),

\displaystyle\|Z^{i}(t)-Z^{i}(t_{k-1})\|\leq C\tau. (3.16)

Also, almost surely,

\displaystyle|Z^{i}(t)|\leq|Z^{i}(t_{k-1})|(1+C\tau)+C\tau. (3.17)
Proof.

By (3.15), since b has polynomial growth, the claim for \|Z^{i}(t)-Z^{i}(t_{k-1})\| is an easy consequence of the q-moment estimates in Lemma 3.3.

Dotting (3.15) with Z^{i} and using the one-sided Lipschitz condition in Assumption 2.2, one has

\frac{1}{2}\frac{d}{dt}|Z^{i}|^{2}\leq C|Z^{i}|^{2}+C|Z^{i}|.

Hence, almost surely, it holds that

\frac{d}{dt}|Z^{i}|\leq C|Z^{i}|+C.

The second claim then follows. ∎

We now give the proof of the strong convergence in Theorem 3.1.

Proof of Theorem 3.1.

First,

\frac{dJ(t)}{dt}=\frac{1}{N}\sum_{i=1}^{N}m_{i}\mathbb{E}\Big{\{}Z^{i}\cdot[(F_{i}(\tilde{X})-F_{i}(X))+\chi_{i}(\tilde{X})]\Big{\}}.

The first term is easy to bound by the one-sided Lipschitz condition of b and the conditions on K_{ij} in Assumption 2.2:

\frac{1}{N}\sum_{i=1}^{N}m_{i}\mathbb{E}\Big{\{}Z^{i}\cdot(F_{i}(\tilde{X})-F_{i}(X))\Big{\}}\leq\frac{C}{N}\sum_{i}m_{i}\mathbb{E}|Z^{i}|^{2}+\frac{C}{N(N-1)}\sum_{i\neq j}m_{i}m_{j}\mathbb{E}(|Z^{i}|^{2}+|Z^{j}|^{2}).

This is clearly bounded by CJ(t).

Now, we focus on the second term. The technique is the same as in our previous work [25], but some special modifications are needed for our problem here.

\displaystyle\begin{split}\frac{1}{N}\sum_{i=1}^{N}m_{i}\mathbb{E}\Big{\{}Z^{i}(t)\cdot\chi_{i}(\tilde{X}(t))\Big{\}}=&\frac{1}{N}\sum_{i=1}^{N}m_{i}\mathbb{E}\Big{\{}Z^{i}(t_{k-1})\cdot\chi_{i}(\tilde{X}(t))\Big{\}}\\ &+\frac{1}{N}\sum_{i=1}^{N}m_{i}\mathbb{E}\Big{\{}(Z^{i}(t)-Z^{i}(t_{k-1}))\cdot\chi_{i}(\tilde{X}(t))\Big{\}}\\ =:&I_{1}+I_{2}.\end{split}

Step 1 – Estimate of I_{1}:

For I_{1}, using the consistency result in Lemma 3.2, one has

I_{1}=\frac{1}{N}\sum_{i=1}^{N}m_{i}\mathbb{E}\left\{Z^{i}(t_{k-1})\cdot\Big{[}\chi_{i}(\tilde{X}(t))-\chi_{i}(\tilde{X}(t_{k-1}))\Big{]}\right\}.

In fact,

\mathbb{E}\left\{Z^{i}(t_{k-1})\cdot\chi_{i}(\tilde{X}(t_{k-1}))\right\}=\mathbb{E}\left\{Z^{i}(t_{k-1})\cdot\mathbb{E}\Big{[}\chi_{i}(\tilde{X}(t_{k-1}))|\mathcal{G}_{k-1}\Big{]}\right\}=\mathbb{E}\,0=0.

This is the only place where the \sigma-algebra \mathcal{G}_{k-1} is used.

Note that

\begin{split}&\mathbb{E}\left\{Z^{i}(t_{k-1})\cdot\Big{[}\chi_{i}(\tilde{X}(t))-\chi_{i}(\tilde{X}(t_{k-1}))\Big{]}\right\}\\ &=\mathbb{E}\Bigg{(}Z^{i}(t_{k-1})\cdot\mathbb{E}(\chi_{i}(\tilde{X}(t))-\chi_{i}(\tilde{X}(t_{k-1}))|\mathcal{F}_{k-1})\Bigg{)}\\ &\leq C\|Z^{i}(t_{k-1})\|\left\|\mathbb{E}[\chi_{i}(\tilde{X}(t))-\chi_{i}(\tilde{X}(t_{k-1}))|\mathcal{F}_{k-1}]\right\|.\end{split}

Using the definition of \chi_{i} in (3.3), one has

\begin{split}&\mathbb{E}\left[\chi_{i}(\tilde{X}(s))-\chi_{i}(\tilde{X}(t_{k-1}))|\mathcal{F}_{k-1}\right]\\ &=\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}\mathbb{E}(\delta\tilde{K}^{ij}|\mathcal{F}_{k-1})-\frac{1}{N-1}\sum_{j:j\neq i}m_{j}\mathbb{E}(\delta\tilde{K}^{ij}|\mathcal{F}_{k-1}),\end{split}

where

\delta\tilde{K}^{ij}=K_{ij}(\tilde{X}^{i}(s),\tilde{X}^{j}(s))-K_{ij}(\tilde{X}^{i}(t_{k-1}),\tilde{X}^{j}(t_{k-1})).

We first estimate \mathbb{E}(\delta\tilde{K}^{ij}|\mathcal{F}_{k-1}). Denote \delta\tilde{X}^{j}:=\tilde{X}^{j}(s)-\tilde{X}^{j}(t_{k-1}). Performing Taylor expansion around t_{k-1}, one has

\delta\tilde{K}^{ij}=(\nabla_{x_{i}}K_{ij}|_{t_{k-1}}\cdot\delta\tilde{X}^{i}+\nabla_{x_{j}}K_{ij}|_{t_{k-1}}\cdot\delta\tilde{X}^{j})+\frac{1}{2}M:[\delta\tilde{X}^{i},\delta\tilde{X}^{j}]\otimes[\delta\tilde{X}^{i},\delta\tilde{X}^{j}],

with M being a random variable (tensor) bounded by \|\nabla^{2}K\|_{\infty}. By (3.11), one finds that

|\mathbb{E}(\delta\tilde{K}^{ij}|\mathcal{F}_{k-1})|\leq C(1+|\tilde{X}^{i}(t_{k-1})|^{q}+|\tilde{X}^{j}(t_{k-1})|^{q})\tau.

The right hand side is independent of \mathcal{C}_{\theta}, and this is the place where we need the almost sure bound (3.11). Applying Lemma 3.4, one has

\left\|\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}\mathbb{E}(\delta\tilde{K}^{ij}|\mathcal{F}_{k-1})\right\|\leq C\tau\left(1+\left(\frac{1}{N-1}\sum_{j:j\neq i}\||\tilde{X}^{j}(t_{k-1})|^{q}\|^{2}\right)^{1/2}\right)\leq C\tau.

The term \|\frac{1}{N-1}\sum_{j:j\neq i}m_{j}\mathbb{E}(\delta\tilde{K}^{ij}|\mathcal{F}_{k-1})\| is much easier to estimate, and it is also bounded by C\tau.

Hence

\mathbb{E}\,Z^{i}(t_{k-1})\cdot\Big{[}\chi_{i}(\tilde{X}(t))-\chi_{i}(\tilde{X}(t_{k-1}))\Big{]}\leq C\|Z^{i}(t_{k-1})\|\tau\leq C\|Z^{i}(t)\|\tau+C\tau^{2}.

Then,

I_{1}\leq\left(\frac{C}{N}\sum_{i=1}^{N}m_{i}\|Z^{i}(t)\|\tau\right)+C\tau^{2}\leq\delta\frac{1}{N}\sum_{i=1}^{N}m_{i}\|Z^{i}(t)\|^{2}+C(\delta)\tau^{2}.

Step 2 – Estimate of I_{2}.

We decompose

I_{2}=\frac{1}{N}\sum_{i=1}^{N}m_{i}\mathbb{E}(Z^{i}(t)-Z^{i}(t_{k-1}))\cdot[\chi_{i}(\tilde{X}(t))-\chi_{i}(X(t))]+\frac{1}{N}\sum_{i=1}^{N}m_{i}\mathbb{E}(Z^{i}(t)-Z^{i}(t_{k-1}))\cdot\chi_{i}(X(t))=:I_{21}+I_{22}.

We first consider I_{21}. Clearly,

I_{21}\leq\frac{1}{N}\sum_{i=1}^{N}m_{i}C\tau\|\chi_{i}(\tilde{X}(t))-\chi_{i}(X(t))\|.

Since K is Lipschitz continuous,

|\delta K_{ij}(t)|\leq L(|Z^{i}(t)|+|Z^{j}(t)|).

Hence,

\displaystyle\left\|\left(\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}\delta K_{ij}(t)\right)\right\|\leq L\left(\|Z^{i}(t)\|+\left\|\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}|Z^{j}(t)|\right\|\right). (3.18)

Note that Z^{j}(t) depends on \mathcal{C}_{\theta}, so we cannot apply Lemma 3.4 directly. Instead, by Lemma 3.5, one has

|Z^{j}(t)|\leq C|Z^{j}(t_{k-1})|+C\tau.

Since Z^{j}(t_{k-1}) is independent of \mathcal{C}_{\theta}, Lemma 3.4 then gives us that

\displaystyle\begin{split}\left\|\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}|Z^{j}(t)|\right\|&\leq C\left(\frac{1}{N-1}\sum_{j:j\neq i}m_{j}^{2}\|Z^{j}(t_{k-1})\|^{2}\right)^{1/2}+C\tau\\ &\leq C\left(\frac{1}{N-1}\sum_{j:j\neq i}m_{j}\|Z^{j}(t_{k-1})\|^{2}\right)^{1/2}+C\tau.\end{split} (3.19)
Remark 3.1.

This is the place where we need Lemma 3.4, a refined version of [25, Lemma 3.2]. If we used [25, Lemma 3.2], one would control \|\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}|Z^{j}(t_{k-1})|\| by \sup_{j}m_{j}\|Z^{j}(t_{k-1})\|, which is not enough to close the estimate.

With this, we find

\begin{split}I_{21}&\leq C\left(\frac{1}{N}\sum_{i=1}^{N}m_{i}\tau\left(\frac{1}{N-1}\sum_{j:j\neq i}m_{j}\|Z^{j}(t_{k-1})\|^{2}\right)^{1/2}\right)+C\tau^{2}\\ &\leq\delta\frac{1}{N}\sum_{i}m_{i}\|Z^{i}(t_{k-1})\|^{2}+C(\delta)\tau^{2},\end{split}

where Assumption 2.1 has been used.

We now consider I_{22}. We first recall

\displaystyle\frac{d}{dt}Z^{i}=[b(\tilde{X}^{i})-b(X^{i})]+\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}\delta K_{ij}(t)+\chi_{i}(X(t)). (3.20)

Note that

|b(\tilde{X}^{i})-b(X^{i})|\leq\int_{0}^{1}|(\tilde{X}^{i}-X^{i})\cdot\nabla b((1-\sigma)\tilde{X}^{i}+\sigma X^{i})|\,d\sigma,

and \nabla b((1-\sigma)\tilde{X}^{i}+\sigma X^{i}) is controlled by C(|\tilde{X}^{i}|^{q}+|X^{i}|^{q}) for some q>0. Hence,

\displaystyle\begin{split}\mathbb{E}|b(\tilde{X}^{i})-b(X^{i})||\chi_{i}(X(t))|&\leq\|\chi_{i}(X(t))\|_{\infty}\|\tilde{X}^{i}-X^{i}\|(\mathbb{E}(|\tilde{X}^{i}|^{q}+|X^{i}|^{q})^{2})^{1/2}\\ &\leq C\|Z^{i}(t)\|,\end{split}

by (3.9).

Integrating (3.20) in time over [t_{k-1},t], dotting with \chi_{i}(X(t)), and taking the expectation, one gets

\left|\mathbb{E}((Z^{i}(t)-Z^{i}(t_{k-1}))\cdot\chi_{i}(X(t)))\right|\leq C\int_{t_{k-1}}^{t}\|Z^{i}(s)\|ds+\int_{t_{k-1}}^{t}\mathbb{E}\left[\left(\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}\delta K_{ij}(s)\right)\cdot\chi_{i}(X(t))\right]\,ds+\mathbb{E}\int_{t_{k-1}}^{t}\chi_{i}(X(s))\cdot\chi_{i}(X(t))\,ds. (3.21)

Applying (3.18) and (3.19), the second term on the right hand side of (3.21) is bounded by

C\left(\frac{1}{N-1}\sum_{j:j\neq i}m_{j}^{2}\|Z^{j}(t_{k-1})\|^{2}\right)^{1/2}\tau+C\tau^{2}.

Similarly to the estimate of I_{21}, this is controlled by \delta\frac{1}{N}\sum_{i}m_{i}\|Z^{i}(t_{k-1})\|^{2}+C(\delta)\tau^{2}. The last term on the right hand side of (3.21) is controlled using Lemma 3.2 (in particular, equation (3.7)) by

\left[\frac{1}{p-1}-\frac{1}{N-1}\right]\|\Lambda_{i}\|_{\infty}\tau.

Therefore,

I_{22}\leq\delta\frac{1}{N}\sum_{i}m_{i}\|Z^{i}(t_{k-1})\|^{2}+C(\delta)\tau^{2}+\left[\frac{1}{p-1}-\frac{1}{N-1}\right]\|\Lambda_{i}\|_{\infty}\tau.

Moreover,

\|Z^{i}(t_{k-1})\|^{2}\leq(\|Z^{i}(t)\|+C\tau)^{2}\leq 2\|Z^{i}(t)\|^{2}+2C^{2}\tau^{2}.

Finally, putting all these estimates together, one has

\frac{d}{dt}J\leq CJ+C(\delta)\tau^{2}+\left[\frac{1}{p-1}-\frac{1}{N-1}\right]\max_{i}\|\Lambda_{i}\|_{\infty}\tau.

The claim then follows by Grönwall’s lemma. ∎

Remark 3.2.

The strong convergence implies the convergence of the one-marginal distributions; see [25] for more details. The weak convergence below, however, is for the empirical measure (2.4).

4 The weak convergence and Random Batch Method for backward equations

In practice, one may be more interested in the distributions of the particles instead of the trajectories of X^{i}(t). Hence, the error (3.1) is not suitable for this purpose, and we seek to study the weak convergence of the empirical measure (2.4), as commonly done in the numerical SDE literature [36, 28]. Roughly speaking, we say a sequence of measures \mu^{N} converges to some measure \mu weakly if for any suitable test function \varphi, it holds that

\int\varphi\,d\mu^{N}=:\langle\mu^{N},\varphi\rangle\to\langle\mu,\varphi\rangle:=\int\varphi\,d\mu.

For our problem, we consider the weak convergence of the empirical measure \mu^{N} defined in (2.4), which is defined on the Borel sets of \mathbb{R}^{d}, to the empirical measure corresponding to (2.1) as \tau\to 0. Hence, to show that the empirical measures given by X and \tilde{X} are close in law, we pick a test function \varphi\in C_{b}^{\infty}(\mathbb{R}^{d}) and aim to show that the weak error defined below is small:

\displaystyle E_{k}:=\left|\frac{1}{N}\sum_{i=1}^{N}\omega_{i}\mathbb{E}\varphi(\tilde{X}^{i}(k\tau))-\frac{1}{N}\sum_{i=1}^{N}\omega_{i}\mathbb{E}\varphi(X^{i}(k\tau))\right|. (4.1)
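In computations, E_{k} would be estimated by Monte Carlo: average the weighted observable \langle\mu^{N},\varphi\rangle over many independent runs of the RBM dynamics and of the exact dynamics, and take the absolute difference of the two averages. A minimal sketch (Python; the helper name and interface are our own):

import numpy as np

def weighted_observable(X, m, phi):
    """<mu^N, phi> = (1/N) sum_j omega_j phi(X_j), with omega_j = N m_j / sum_l m_l
    as in (2.4)."""
    omega = len(m) * m / m.sum()
    return np.mean(omega * phi(X))

# Example with a smooth bounded test function phi in C_b^infinity:
# phi = lambda X: np.tanh(X[:, 0])
# E_k ≈ |mean over runs of weighted_observable(X_rbm, m, phi)
#        - mean over runs of weighted_observable(X_exact, m, phi)|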
Remark 4.1.

Traditionally, the test functions used for schemes of numerical SDEs are those with polynomial growth at infinity [37, 36]. We use C_{b}^{\infty} test functions, as is now common [3, 32], which induces a weaker topology that disregards high order moments. If one has moment control, the convergence using these two types of test functions is the same.

For the weak convergence, we need some different assumptions on b and K_{ij}.

Assumption 4.1.

The functions b,K_{ij} are C^{4}, and the derivatives of b and K_{ij} up to order 4 are uniformly bounded (uniformly in i,j).

Remark 4.2.

A proof of weak convergence using Assumption 2.2 instead of Assumption 4.1 should also be achievable. Using Assumption 4.1 makes the proof, based on semigroup techniques, elegant (see the details below).

We now state the main theorem for the weak convergence of RBM.

Theorem 4.1.

Under Assumption 2.1 and Assumption 4.1, the Random Batch Method converges weakly with first order for the empirical measure (2.4). In particular, the weak error defined in (4.1) satisfies

\displaystyle\sup_{k:k\tau\leq T}|E_{k}|\leq C\tau, (4.2)

where C=C(\varphi,T) is independent of N,\tau, but depends on \varphi.

Usually, in numerical SDEs, to prove the weak convergence one makes use of the backward Kolmogorov equation (“backward equation” for short), which is defined on the same Euclidean space. For our problem, we need to lift the Euclidean space from \mathbb{R}^{d} to \mathbb{R}^{Nd}. In particular, define

\displaystyle u(x,t):=\frac{1}{N}\sum_{i=1}^{N}\omega_{i}\mathbb{E}[\varphi(X^{i}(t))|X(0)=x], (4.3)

where x\in\mathbb{R}^{Nd} and X is the solution to (2.1). Then, we make use of this function to study the weak convergence. This function u satisfies the following backward equation [40]:

\displaystyle\partial_{t}u=\mathcal{L}u:=\sum_{i=1}^{N}\left[b(x_{i})+\frac{1}{N-1}\sum_{j:j\neq i}m_{j}K_{ij}(x_{i},x_{j})\right]\cdot\nabla_{x_{i}}u+\sum_{i=1}^{N}\frac{1}{2}\sigma^{2}\Delta_{x_{i}}u. (4.4)

The operator \mathcal{L} is called the generator of the ODE/SDE (2.1). The Laplacian \Delta_{x_{i}} is given by

\displaystyle\Delta_{x_{i}}:=\sum_{j=1}^{d}\partial_{x_{i}^{(j)}x_{i}^{(j)}},~{}~{}x_{i}=(x_{i}^{(1)},\cdots,x_{i}^{(d)})\in\mathbb{R}^{d}. (4.5)

The solution semigroup for (4.4) will be denoted by

\displaystyle e^{t\mathcal{L}}u(\cdot,0):=u(\cdot,t). (4.6)

By the well-known property of the backward equation [40, 31], one has

\displaystyle\|u(\cdot,t)\|_{\infty}=\|e^{t\mathcal{L}}u(\cdot,0)\|_{\infty}\leq\|u(\cdot,0)\|_{\infty}. (4.7)

The function u is defined on \mathbb{R}^{Nd}, and naive estimates of the norms of its derivatives will depend on N. This is not sufficient for us to show the N-independent weak convergence for the empirical measure. In fact, it is clear that

\displaystyle\|u\|_{\infty}\leq C, (4.8)

with C independent of N. The following proposition then gives the crucial estimates of the derivatives of u.

Proposition 4.1.

For any i and j\neq i, one has

\displaystyle\|\nabla_{x_{i}}u\|_{\infty}+\|\nabla^{2}_{x_{i}}u\|_{\infty}+\|\nabla_{x_{i}}^{3}u\|_{\infty}+\|\nabla_{x_{i}}^{4}u\|_{\infty}\leq C(T)\frac{1}{N}, (4.9)

and

\displaystyle\|\nabla_{x_{i}}\nabla_{x_{j}}u\|_{\infty}+\|\nabla_{x_{i}}\nabla^{2}_{x_{j}}u\|_{\infty}+\|\nabla^{2}_{x_{i}}\nabla^{2}_{x_{j}}u\|_{\infty}\leq C(T)\frac{1}{N^{2}}. (4.10)

Here, \nabla_{x_{i}}^{2} is the Hessian matrix. Similarly, \nabla_{x_{i}}^{4}u is the fourth order tensor with entries of the form (\prod_{k=1}^{4}\partial_{x_{i}^{(j_{k})}})u, j_{k}\in\{1,\cdots,d\}. The norm \|\nabla_{x_{i}}^{4}u\|_{\infty} is understood as

\sup_{x}\left[\sum_{j_{1},j_{2},j_{3},j_{4}}\Big{(}\big{(}\prod_{k=1}^{4}\partial_{x_{i}^{(j_{k})}}\big{)}u\Big{)}^{2}\right]^{1/2}.

The proof is somewhat tedious and is given in Appendix B.

For further discussion, we introduce the generator corresponding to the Random Batch Method. For t\in(t_{k-1},t_{k}],

\displaystyle\mathcal{L}_{C}:=\sum_{i=1}^{N}\left[b(x_{i})+\frac{1}{p-1}\sum_{j:j\neq i}I_{ij}^{(k-1)}m_{j}K_{ij}(x_{i},x_{j})\right]\cdot\nabla_{x_{i}}+\sum_{i=1}^{N}\frac{1}{2}\sigma^{2}\Delta_{x_{i}}.

We recall that I_{ij} is the indicator for i,j being in the same batch, and we use I_{ij}^{(k-1)} to denote the indicator corresponding to \mathcal{C}^{(k-1)}.

The importance of Proposition 4.1 is that one can bound \mathcal{L}^{i}u uniformly in N, where \mathcal{L}^{i} denotes the composition of \mathcal{L} with itself i times (if i=0, it is the identity operator). This is crucial for establishing the weak convergence uniformly in N. In particular, we have the following.

Lemma 4.1.

For the function u defined in (4.3), it holds for i=0,1,2 that

\displaystyle\|\mathcal{L}^{i}u\|_{\infty}\leq C,~{}~{}~{}\|\mathcal{L}_{\mathcal{C}}^{i}u\|_{\infty}\leq C. (4.11)

Moreover,

\displaystyle\mathbb{E}\mathcal{L}_{\mathcal{C}}=\mathcal{L}. (4.12)
Proof.

The first assertion is a corollary of Proposition 4.1. The second claim is exactly what was proved in Lemma 3.2. ∎

To go further, we need to introduce a semigroup associated with RBM (see also [18] for the semigroup used in SGD). Considering the transition \tilde{X}(t_{k-1})\to\tilde{X}(t_{k}), it is clear that this transition gives a time homogeneous Markov chain. Define the operator \mathcal{S}^{(k)}:C_{b}(\mathbb{R}^{Nd})\to C_{b}(\mathbb{R}^{Nd}) as

\displaystyle\mathcal{S}^{(k)}\phi(x):=\mathbb{E}[\phi(\tilde{X}(k\tau))|\tilde{X}(0)=x]. (4.13)

Below, we sometimes use \mathbb{E}_{x} to mean \mathbb{E}(\cdot|\tilde{X}(0)=x) for convenience. Then, by the Markov property [16], it can be shown that

\displaystyle\mathcal{S}^{(k)}=\mathcal{S}^{k}:=\mathcal{S}\circ\cdots\circ\mathcal{S}. (4.14)

In fact, for any test function \phi, let u^{k}:=\mathcal{S}^{(k)}\phi. Then, it holds that

\begin{split}u^{k}(x)&=\mathbb{E}[\phi(\tilde{X}(t_{k}))|\tilde{X}(0)=x]\\ &=\mathbb{E}[\mathbb{E}[\phi(\tilde{X}(t_{k}))|\tilde{X}(t_{1})]|\tilde{X}(0)=x]\\ &=\mathbb{E}[u^{k-1}(\tilde{X}(t_{1}))|\tilde{X}(0)=x],\end{split}

where the third equality is by the Markov property, namely

\mathcal{S}^{(k)}=\mathcal{S}\circ\mathcal{S}^{(k-1)}.

Clearly, the \mathcal{S}^{k} are nonexpansive in L^{\infty}(\mathbb{R}^{Nd}), i.e.

\displaystyle\|\mathcal{S}^{k}\phi\|_{\infty}\leq\|\phi\|_{\infty}. (4.15)

We give the proof of the weak convergence.

Proof of Theorem 4.1.

The weak error in (4.1) compares u in (4.3) with \mathcal{S}^{n}\phi for the test function

\displaystyle\phi(x)=\frac{1}{N}\sum_{i=1}^{N}\omega_{i}\varphi(x_{i}). (4.16)

By Equation (4.14) (\mathcal{S}^{(k)}=\mathcal{S}^{k}), we need to estimate

\mathcal{S}^{n}\phi(x)-u(x,n\tau)=\sum_{k=1}^{n}\Big{[}\mathcal{S}^{k}u(x,(n-k)\tau)-\mathcal{S}^{k-1}u(x,(n-k+1)\tau)\Big{]}.

Clearly, by (4.15), one has

|\mathcal{S}^{n}\phi(x)-u(x,n\tau)|\leq\sum_{k=1}^{n}\|\mathcal{S}u(x,(n-k)\tau)-u(x,(n-k+1)\tau)\|_{\infty}.

By the definition of \mathcal{S}, one has

\begin{split}\mathcal{S}u(x,(n-k)\tau)&=\mathbb{E}_{x}u(\tilde{X}(\tau),(n-k)\tau)\\ &=\mathbb{E}_{\mathcal{C}}\mathbb{E}_{x}(u(\tilde{X}(\tau),(n-k)\tau)|\mathcal{C})\\ &=\mathbb{E}_{\mathcal{C}}e^{\tau\mathcal{L}_{\mathcal{C}}}u(x,(n-k)\tau).\end{split}

The second equality means that we fix a division of batches to compute the expectation with respect to the Brownian motions, and then average over the random batches. For the last equality, we recall that e^{\tau\mathcal{L}_{\mathcal{C}}} denotes the solution semigroup of \partial_{t}v=\mathcal{L}_{\mathcal{C}}v.

Note

e^{\tau\mathcal{L}_{\mathcal{C}}}u(x,(n-k)\tau)=u(x,(n-k)\tau)+\tau\mathcal{L}_{\mathcal{C}}u(x,(n-k)\tau)+\int_{0}^{\tau}(\tau-z)\mathcal{L}_{\mathcal{C}}^{2}e^{z\mathcal{L}_{\mathcal{C}}}u(x,(n-k)\tau)\,dz.

For the remainder term, by Lemma 4.1, it holds for every partition of batches that

\begin{split}\|\mathcal{L}_{\mathcal{C}}^{2}e^{z\mathcal{L}_{\mathcal{C}}}u(x,(n-k)\tau)\|_{\infty}&=\|e^{z\mathcal{L}_{\mathcal{C}}}\mathcal{L}_{\mathcal{C}}^{2}u(x,(n-k)\tau)\|_{\infty}\\ &\leq\|\mathcal{L}_{\mathcal{C}}^{2}u(x,(n-k)\tau)\|_{\infty}\leq C.\end{split}

Hence,

\|\mathcal{S}u(x,(n-k)\tau)-u(x,(n-k)\tau)-\mathbb{E}\,\tau\mathcal{L}_{\mathcal{C}}u(x,(n-k)\tau)\|_{\infty}\leq C\tau^{2},

with C independent of N. Here, we used the fact that e^{\tau\mathcal{L}_{\mathcal{C}}} is nonexpansive in L^{\infty}, similarly as in (4.7).

Similarly,

\begin{split}&u(x,(n-k+1)\tau)=e^{\tau\mathcal{L}}u(x,(n-k)\tau)\\ &=u(x,(n-k)\tau)+\tau\mathcal{L}u(x,(n-k)\tau)+\int_{0}^{\tau}(\tau-z)\mathcal{L}^{2}e^{z\mathcal{L}}u(x,(n-k)\tau)\,dz.\end{split}

The remainder term is again bounded by C\tau^{2}. Since

\mathbb{E}\mathcal{L}_{\mathcal{C}}=\mathcal{L},

one finds that

\|\mathcal{S}u(x,(n-k)\tau)-u(x,(n-k+1)\tau)\|_{\infty}\leq C\tau^{2},

with C independent of N.

Hence,

|\mathcal{S}^{n}\phi(x)-u(x,n\tau)|\leq\sum_{k=1}^{n}\|\mathcal{S}u(x,(n-k)\tau)-u(x,(n-k+1)\tau)\|_{\infty}\leq Cn\tau^{2}\leq C\tau.

The claim is then proved. ∎

Remark 4.3.

This result is reminiscent of the results by Golse et al. [21], where the average of the one-marginal density matrices for the N-body quantum system has been used for the Random Batch Method, and the convergence rate is also O(\tau) under a certain weak norm.

The backward equation for the Random Batch system is given by

\displaystyle\partial_{t}\tilde{u}=\sum_{i=1}^{N}\left[b(x_{i})+\frac{1}{p-1}\sum_{j:j\neq i}I_{ij}^{(k-1)}m_{j}K_{ij}(x_{i},x_{j})\right]\cdot\nabla_{x_{i}}\tilde{u}+\sum_{i}\frac{1}{2}\sigma^{2}\Delta_{x_{i}}\tilde{u}. (4.17)

Hence, Theorem 4.1 in fact implies the following result when we apply the Random Batch Method to backward equations or Liouville equations (\sigma=0).

Theorem 4.2.

Consider the backward equation (4.4) and its corresponding equation (4.17) obtained from the Random Batch Method. If the initial value is given by

\displaystyle u|_{t=0}=\tilde{u}|_{t=0}=\frac{1}{N}\sum_{i}\omega_{i}\varphi(x_{i}), (4.18)

then it holds that

\displaystyle\sup_{t\leq T}\|u(t)-\tilde{u}(t)\|_{\infty}\leq C(T)\tau. (4.19)
Remark 4.4.

For general initial data, the approximation in L^{\infty} given by the Random Batch Method for the backward equation (the Liouville equation when \sigma=0) cannot be uniform in N.

Remark 4.5.

The situation can be different if one considers the Liouville equation of the second order system for the density distribution [27]. Clearly, in this case, we cannot use the L^{\infty} norm to gauge the difference. Instead, a certain weak norm for the combination of one-marginals should be considered, as in [21]. This is left for future work.

Acknowledgement

S. Jin was partially supported by the NSFC grant No. 31571071. The work of L. Li was partially sponsored by NSFC 11901389, Shanghai Sailing Program 19YF1421300 and NSFC 11971314. The work of J.-G. Liu was partially supported by KI-Net NSF RNMS11-07444 and NSF DMS-1812573.

Appendix A Proof of Lemma 3.3

Proof.

The first part is similar to that in [25], except that for the b(\cdot) terms we need to use the one-sided Lipschitz condition. Hence, we omit the proof.

We now prove (3.11). Consider a realization so that the equation is written as

d\tilde{X}^{i}(t)=b(\tilde{X}^{i})\,dt+\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}K_{ij}(\tilde{X}^{i},\tilde{X}^{j})\,dt+\sigma dW^{i},

where \mathcal{C}_{\theta} again is the random batch that contains i in the random division at t_{k-1}, i.e. in \mathcal{C}^{(k-1)}. It follows that

\displaystyle\begin{split}\mathbb{E}\Big{(}\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})\Big{|}\mathcal{F}_{k-1}\Big{)}=&\int_{t_{k-1}}^{t}\mathbb{E}(b(\tilde{X}^{i})|\mathcal{F}_{k-1})\,ds\\ &+\int_{t_{k-1}}^{t}\mathbb{E}\left(\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}K_{ij}(\tilde{X}^{i},\tilde{X}^{j})\Big{|}\mathcal{F}_{k-1}\right)\,ds.\end{split}

Note that K is bounded and |b(x)|\leq C(1+|x|^{q}) for some q>0. Together with (3.10), this implies the first estimate in (3.11).

For the second equation in (3.11), Itô’s formula implies that

\frac{d}{dt}\mathbb{E}\left[|\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})|^{2}|\mathcal{F}_{k-1}\right]=2\mathbb{E}\left[\Big{(}\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})\Big{)}\cdot\left(b(\tilde{X}^{i})+\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}K_{ij}(\tilde{X}^{i},\tilde{X}^{j})\right)\Big{|}\mathcal{F}_{k-1}\right]+\sigma^{2}.

Note that

\Big{(}\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})\Big{)}\cdot b(\tilde{X}^{i})\leq\beta|\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})|^{2}+\Big{(}\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})\Big{)}\cdot b(\tilde{X}^{i}(t_{k-1}))\leq C|\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})|^{2}+C(1+|\tilde{X}^{i}(t_{k-1})|^{q}).

Hence,

\frac{d}{dt}\mathbb{E}\left[|\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})|^{2}|\mathcal{F}_{k-1}\right]\leq C\mathbb{E}\left[|\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})|^{2}|\mathcal{F}_{k-1}\right]+C(1+|\tilde{X}^{i}(t_{k-1})|^{q}).

The claim then follows. ∎

Appendix B Proof of Proposition 4.1

Step 1 – Estimates of \nabla_{x_{i}}u.

Taking \nabla_{x_{\ell}} in (4.4), one has

\displaystyle\partial_{t}\nabla_{x_{\ell}}u=\mathcal{L}\nabla_{x_{\ell}}u+f_{\ell}(x) (B.1)

with

f_{\ell}(x,t)=\left[\nabla_{x_{\ell}}b(x_{\ell})+\frac{1}{N-1}\sum_{j=1}^{N}m_{j}\nabla_{x_{\ell}}K_{\ell j}(x_{\ell},x_{j})\right]\cdot\nabla_{x_{\ell}}u+\frac{1}{N-1}\sum_{j=1}^{N}m_{\ell}\nabla_{x_{\ell}}K_{j\ell}(x_{j},x_{\ell})\cdot\nabla_{x_{j}}u. (B.2)

Here \nabla_{x_{\ell}}K is a second order tensor and we use the convention that (A\cdot v)_{i}:=\sum_{j}A_{ij}v_{j}.

Hence,

\nabla_{x_{\ell}}u(t)=e^{t\mathcal{L}}\nabla_{x_{\ell}}u(0)+\int_{0}^{t}e^{(t-s)\mathcal{L}}f_{\ell}(x,s)\,ds.

Since the semigroup e^{t\mathcal{L}} is nonexpansive in L^{\infty} as in (4.7), we find

\|\nabla_{x_{\ell}}u(t)\|_{\infty}\leq\|\nabla_{x_{\ell}}u(0)\|_{\infty}+\int_{0}^{t}\|f_{\ell}(x,s)\|_{\infty}\,ds.

Moreover, by (B.2), the linear transform from [\nabla_{x_{1}}u,\cdots,\nabla_{x_{N}}u] to [f_{1},\cdots,f_{N}] is a block matrix with bounded L^{\infty} norm. This is because all the off-diagonal blocks are of order O(\frac{1}{N}). Hence, with a(t):=\max_{\ell}\|\nabla_{x_{\ell}}u(t)\|_{\infty},

\|f_{\ell}(x,s)\|_{\infty}\leq Ca(s).

Therefore,

a(t)\leq a(0)+C\int_{0}^{t}a(s)\,ds.

This means

\sup_{t\leq T}a(t)\leq C(T)a(0).

Clearly, since

u(0)=\frac{1}{N}\sum_{i}\omega_{i}\varphi(x_{i})\Rightarrow\|\nabla_{x_{i}}u(0)\|_{\infty}\leq C\frac{1}{N},

the estimate for \nabla_{x_{i}}u follows by Grönwall’s lemma.

Step 2 – Second order derivatives.

We first compute the \nabla_{x_{\ell}}^{2} derivatives (the x_{\ell} Hessian):

\displaystyle\partial_{t}\nabla_{x_{\ell}}^{2}u=\mathcal{L}\nabla_{x_{\ell}}^{2}u+\tilde{f}_{\ell}, (B.3)

with

\displaystyle\begin{split}\tilde{f}_{\ell}=&2\nabla_{x_{\ell}}b\cdot\nabla_{x_{\ell}}^{2}u+\frac{1}{N-1}\sum_{j}2m_{j}\nabla_{x_{\ell}}K_{\ell j}(x_{\ell},x_{j})\cdot\nabla_{x_{\ell}}^{2}u\\ &+\frac{1}{N-1}\sum_{i:i\neq\ell}2m_{\ell}\nabla_{x_{\ell}}K_{i\ell}(x_{i},x_{\ell})\cdot\nabla_{x_{i}}\nabla_{x_{\ell}}u\\ &+\left(\nabla_{x_{\ell}}^{2}b+\frac{1}{N-1}\sum_{j}m_{j}\nabla_{x_{\ell}}^{2}K_{\ell j}(x_{\ell},x_{j})\right)\cdot\nabla_{x_{\ell}}u\\ &+\frac{1}{N-1}\sum_{i}m_{\ell}\nabla_{x_{\ell}}^{2}K_{i\ell}(x_{i},x_{\ell})\cdot\nabla_{x_{i}}u.\end{split} (B.4)

Again, \nabla_{x_{i}}b for a vector field b is a second order tensor with (\nabla_{x_{i}}b)_{rs}=\partial_{x_{i}^{(r)}}b^{(s)}. For the dot product between tensors, A\cdot B means the contraction between the last index of A and the first index of B. For example, we use the convention that

(\nabla_{x_{\ell}}K_{i\ell}(x_{i},x_{\ell})\cdot\nabla_{x_{i}}\nabla_{x_{\ell}}u)_{rs}:=\sum_{q=1}^{d}\partial_{x_{\ell}^{(r)}}K_{i\ell}^{(q)}\partial_{x_{i}^{(q)}}\partial_{x_{\ell}^{(s)}}u.

Thus,

\displaystyle\begin{split}\tilde{f}_{\ell}=&\tilde{A}_{\ell}^{1}\cdot\nabla_{x_{\ell}}^{2}u+\frac{1}{N-1}\sum_{i}\tilde{A}_{i\ell}^{2}\cdot\nabla_{x_{i}}\nabla_{x_{\ell}}u\\ &+\tilde{A}_{\ell}^{3}\cdot\nabla_{x_{\ell}}u+\frac{1}{N-1}\sum_{i}\tilde{A}_{i\ell}^{4}\cdot\nabla_{x_{i}}u.\end{split}

Here, all the \tilde{A} tensors are bounded independently of N. There are O(N) terms in the summation in the first line, and thus the linear transform is again bounded from L^{\infty} to L^{\infty} independently of N. The terms on the second line are clearly controlled by C\frac{1}{N}, by the estimates of \nabla_{x_{i}}u.

Similarly, one can compute, for \ell\neq q,

\displaystyle\partial_{t}\nabla_{x_{\ell}}\nabla_{x_{q}}u=\mathcal{L}\nabla_{x_{\ell}}\nabla_{x_{q}}u+g_{\ell q}, (B.5)

where the expression of g_{\ell q} is complicated but is of the following form:

g_{\ell q}=A_{\ell q}^{1}\cdot\nabla_{x_{\ell}}\nabla_{x_{q}}u+\frac{1}{N-1}\sum_{j}(A_{j\ell}^{2}\cdot\nabla_{x_{j}}\nabla_{x_{\ell}}u+A_{jq}^{3}\cdot\nabla_{x_{j}}\nabla_{x_{q}}u)\\ +\frac{1}{N-1}(m_{\ell}\nabla_{x_{\ell}}\nabla_{x_{q}}K_{\ell q}(x_{\ell},x_{q})\cdot\nabla_{x_{\ell}}u+m_{q}\nabla_{x_{q}}\nabla_{x_{\ell}}K_{q\ell}(x_{q},x_{\ell})\cdot\nabla_{x_{q}}u). (B.6)

Here, the A^{k}’s are second order tensors that are made up of the derivatives of b,K_{ij}, and are thus bounded by constants independent of N. Note that there are O(N) terms in the summation on the first line, and therefore the linear transform is again bounded from L^{\infty} to L^{\infty} independently of N. The terms on the second line are clearly controlled by C\frac{1}{N^{2}}, by the estimates of \nabla_{x_{i}}u.

We first consider all the second order derivatives \nabla_{x_{\ell}}\nabla_{x_{q}}u, where \ell may or may not equal q. By a similar argument as for \nabla_{x_{i}}u, one can get the estimate

\displaystyle\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(t)\|_{\infty}\leq\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(0)\|_{\infty}+C\int_{0}^{t}\max_{\ell,q}\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(s)\|_{\infty}\,ds+C(T)\frac{1}{N}. (B.7)

Moreover,

\|\nabla_{x_{\ell}}^{2}u(0)\|_{\infty}\leq C\frac{1}{N},~{}~{}~{}\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(0)\|_{\infty}=0~{}(\ell\neq q).

Then, Grönwall’s inequality tells us that

\displaystyle\max_{\ell,q}\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(t)\|_{\infty}\leq C(T)\frac{1}{N}. (B.8)

Now, we focus on (B.5) for \ell\neq q. To do this, we need to separate out the j=\ell and j=q terms in (B.6), which are of order O(\frac{1}{N}); thus, with the prefactor 1/(N-1), they contribute O(1/N^{2}) to g_{\ell q}. Hence, one has for \ell\neq q that

\displaystyle\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(t)\|_{\infty}\leq\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(0)\|_{\infty}+C\int_{0}^{t}\max_{\ell\neq q}\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(s)\|_{\infty}\,ds+C(T)\frac{1}{N^{2}}. (B.9)

Since \|\nabla_{x_{\ell}}\nabla_{x_{q}}u(0)\|_{\infty}=0 for \ell\neq q, one then has

\displaystyle\max_{\ell\neq q}\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(t)\|_{\infty}\leq C\int_{0}^{t}\max_{\ell\neq q}\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(s)\|_{\infty}\,ds+C(T)\frac{1}{N^{2}}. (B.10)

Grönwall’s lemma then tells us that

\displaystyle\max_{\ell\neq q}\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(t)\|_{\infty}\leq C\frac{1}{N^{2}}. (B.11)

Step 3 – Higher order derivatives.

The higher order derivatives can be estimated similarly, using induction. In these proofs, some derivatives that are not listed in Proposition 4.1 are involved. For example, for third order derivatives, one should expect

\|\nabla_{x_{i}}\nabla_{x_{j}}\nabla_{x_{k}}u\|_{\infty}=O(1/N^{3})

for distinct i,j,k. These proofs are tedious, but the essential ideas are the same as in the proof of the claims for the Hessian. We omit the details.

References

  • [1] G. Albi and L. Pareschi. Binary interaction algorithms for the simulation of flocking and swarming dynamics. Multiscale Modeling & Simulation, 11(1):1–29, 2013.
  • [2] G. Allaire, J.-F. Dufrêche, A. Mikelić, and A. Piatnitski. Asymptotic analysis of the Poisson–Boltzmann equation describing electrokinetics in porous media. Nonlinearity, 26(3):881, 2013.
  • [3] D. F. Anderson and J. C. Mattingly. A weak trapezoidal method for a class of stochastic differential equations. Commun. Math. Sci., 9(1):301–318, 2011.
  • [4] Hans Babovsky and Reinhard Illner. A convergence proof for Nanbu’s simulation method for the full Boltzmann equation. SIAM journal on numerical analysis, 26(1):45–65, 1989.
  • [5] A. L. Bertozzi, J. B. Garnett, and T. Laurent. Characterization of radially symmetric finite time blowup in multidimensional aggregation equations. SIAM J. Math. Anal., 44(2):651–681, 2012.
  • [6] G. A. Bird. Approach to translational equilibrium in a rigid sphere gas. The Physics of Fluids, 6(10):1518–1519, 1963.
  • [7] C. K. Birdsall and A. B. Langdon. Plasma physics via computer simulation. CRC press, 2004.
  • [8] L. Bottou. Online learning and stochastic approximations. On-line learning in neural networks, 17(9):142, 1998.
  • [9] S. Bubeck. Convex optimization: Algorithms and complexity. Foundations and Trends® in Machine Learning, 8(3-4):231–357, 2015.
  • [10] E. Carlen, P. Degond, and B. Wennberg. Kinetic limits for pair-interaction driven master equations and biological swarm models. Mathematical Models and Methods in Applied Sciences, 23(07):1339–1376, 2013.
  • [11] J. A. Carrillo, L. Pareschi, and M. Zanella. Particle based gPC methods for mean-field models of swarming with uncertainty. Communications in Computational Physics, 25(2), 2019.
  • [12] A. J. Chorin. Numerical study of slightly viscous flow. J. Fluid Mech., 57(4):785–796, 1973.
  • [13] K. Craig and A. L Bertozzi. A blob method for the aggregation equation. Math. Comp., 85(300):1681–1717, 2016.
  • [14] F. Cucker and S. Smale. Emergent behavior in flocks. IEEE Transactions on automatic control, 52(5):852–862, 2007.
  • [15] P. Degond, J.-G. Liu, and R. L. Pego. Coagulation–fragmentation model for animal group-size statistics. Journal of Nonlinear Science, 27(2):379–424, 2017.
  • [16] R. Durrett. Probability: Theory and Examples. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 4 edition, 2010.
  • [17] R. Durstenfeld. Algorithm 235: random permutation. Communications of the ACM, 7(7):420, 1964.
  • [18] Y. Feng, L. Li, and J.-G. Liu. Semi-groups of stochastic gradient descent and online principal component analysis: properties and diffusion approximations. Commun. Math. Sci., 16(3), 2018.
  • [19] D. Frenkel and B. Smit. Understanding molecular simulation: from algorithms to applications, volume 1. Elsevier, 2001.
  • [20] A. Georges, G. Kotliar, W. Krauth, and M. J. Rozenberg. Dynamical mean-field theory of strongly correlated fermion systems and the limit of infinite dimensions. Reviews of Modern Physics, 68(1):13, 1996.
  • [21] F. Golse, S. Jin, and T. Paul. The random batch method for n-body quantum dynamics. arXiv preprint arXiv:1912.07424, 2019.
  • [22] K. E. Gustafson and J. A. Sethian. Vortex methods and vortex motion, volume 19. SIAM, 1991.
  • [23] S.-Y. Ha and J.-G. Liu. A simple proof of the Cucker-Smale flocking dynamics and mean-field limit. Commun. Math. Sci., 7(2):297–325, 2009.
  • [24] D. Horstmann. From 1970 until present: the Keller-Segel model in chemotaxis and its consequences. Jahresber. Dtsch. Math.-Ver., 105:103–165, 2003.
  • [25] S. Jin, L. Li, and J.-G. Liu. Random Batch methods (RBM) for interacting particle systems. Journal of Computational Physics, 400:108877, 2020.
  • [26] S. Jin, L. Li, J.-G. Liu, and Y. Tang. A direct simulation approach for the Poisson-Boltzmann equation using the Random Batch Method. preprint, 2020.
  • [27] S. Jin and X. Wen. Hamiltonian-preserving schemes for the Liouville equation with discontinuous potentials. Commun. Math. Sci., 3(3):285–315, 2005.
  • [28] P. E. Kloeden and E. Platen. Numerical solution of stochastic differential equations, volume 23. Springer Science & Business Media, 2013.
  • [29] J.-M. Lasry and P.-L. Lions. Mean field games. Japanese journal of mathematics, 2(1):229–260, 2007.
  • [30] L. Li, Y. Li, J.-G. Liu, Z. Liu, and J. Lu. A stochastic version of Stein variational gradient descent for efficient sampling. Comm. App. Math. Comp. Sci., 2020.
  • [31] L. Li and J.-G. Liu. Large time behaviors of upwind schemes and B-schemes for Fokker-Planck equations on \mathbb{R} by jump processes. Math. Comp., 2020.
  • [32] L. Li, J. Lu, J. C. Mattingly, and L. Wang. Numerical methods for stochastic differential equations based on Gaussian mixture. arXiv preprint arXiv:1812.11932, 2018.
  • [33] J.-G. Liu and Z. Xin. Convergence of the point vortex method for 2-d vortex sheet. Math. Comput., 70(234):595–606, 2001.
  • [34] J.-G. Liu and R. Yang. A random particle blob method for the Keller-Segel equation and convergence analysis. Mathematics of Computation, 86(304):725–745, 2017.
  • [35] Q. Liu and D. Wang. Stein variational gradient descent: A general purpose bayesian inference algorithm. In Advances In Neural Information Processing Systems, pages 2378–2386, 2016.
  • [36] G. N. Milstein and M. V. Tretyakov. Stochastic numerics for mathematical physics. Springer Science & Business Media, 2013.
  • [37] G. N. Milstein. Weak approximation of solutions of systems of stochastic differential equations. Theory of Probability & Its Applications, 30(4):750–766, 1986.
  • [38] S. Motsch and E. Tadmor. Heterophilious dynamics enhances consensus. SIAM review, 56(4):577–621, 2014.
  • [39] Kenichi Nanbu. Direct simulation scheme derived from the Boltzmann equation. i. monocomponent gases. Journal of the Physical Society of Japan, 49(5):2042–2049, 1980.
  • [40] B. Øksendal. Stochastic differential equations: an introduction with applications. Springer, Berlin, Heidelberg, sixth edition, 2003.
  • [41] H. Robbins and S. Monro. A stochastic approximation method. The annals of mathematical statistics, pages 400–407, 1951.
  • [42] V. Rokhlin. Rapid solution of integral equations of classical potential theory. Journal of computational physics, 60(2):187–207, 1985.
  • [43] H. E. Stanley. Phase transitions and critical phenomena. Clarendon Press, Oxford, 1971.
  • [44] T. Vicsek, A. Czirók, E. Ben-Jacob, I. Cohen, and O. Shochet. Novel type of phase transition in a system of self-driven particles. Physical review letters, 75(6):1226, 1995.
  • [45] M. Welling and Y. W. Teh. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 681–688, 2011.