
Convergence of Random Batch Method for interacting particles with disparate species and weights

Shi Jin [email protected] Lei Li [email protected] Jian-Guo Liu [email protected] Department of Mathematics and Department of Physics, Duke University, Durham, NC 27708, USA.
Abstract

We consider in this work the convergence of the Random Batch Method proposed in our previous work [Jin et al., J. Comput. Phys., 400(1), 2020] for interacting particles to the case of disparate species and weights. We show that the strong error is of O(\sqrt{\tau}) while the weak error is of O(\tau), where \tau is the time step between two random divisions of batches. Both types of convergence are uniform in N, the number of particles. The proof of strong convergence follows closely the proof in [Jin et al., J. Comput. Phys., 400(1), 2020] for indistinguishable particles, but there are still some differences: since there is no exchangeability now, we have to use a certain weighted average of the errors; some refined auxiliary lemmas have to be proved compared with our previous work. To show that the weak convergence of the empirical measure is uniform in N, certain sharp estimates for the derivatives of the backward equations have been used. The weak convergence analysis is also illustrative for the convergence of the Random Batch Method for N-body Liouville equations.

1 Introduction

Interacting particle systems are ubiquitous in nature; examples include molecules in fluids [19], plasma [7], and galaxies in the universe. In addition, many collective behaviors in the natural and social sciences are due to interacting individuals; examples include swarming [44, 11, 10, 15], flocking [14, 23, 1], chemotaxis [24, 5], and consensus clusters in opinion dynamics [38]. In many models for these phenomena, individual particles can have weights. For example, in the point vortex model [12, 22, 33], the “particles” correspond to point vortices with different strengths, and they interact with each other through a Hamiltonian system. To approximate the nonlinear Fokker-Planck equation for interacting particles, like the Keller-Segel equation, one can use interacting particles with different masses to approximate the dynamics and then compute the empirical density more efficiently [13, 34]. There may also be several species, where the particles have different features; for example, in the microscopic description of the Poisson-Boltzmann equation, particles with different charges interact with each other through Coulomb forces [2, 26].

The systems mentioned above can be written as the second order ODE/SDE system for 1\leq i\leq N

\displaystyle\begin{split}&dX^{i}=V^{i}\,dt\\ &dV^{i}=\frac{1}{N-1}\sum_{j=1,j\neq i}^{N}q_{j}F(X^{i}-X^{j})\,dt-\gamma V^{i}\,dt+\sigma\,dW^{i}\end{split} (1.1)

or the first order ODE/SDE system

\displaystyle dX^{i}=\frac{1}{N-1}\sum_{j=1:j\neq i}^{N}q_{j}F(X^{i}-X^{j})\,dt+\sigma\,dW^{i},~{}~{}i=1,\cdots,N, (1.2)

where the q_{j}’s are the weights (they can be masses, charges, etc.). Note that (1.1) contains the Hamiltonian case when the force F is conservative and \gamma=\sigma=0. If one discretizes (1.1) or (1.2) directly, the computational cost per time step is O(N^{2}), which is undesirable for large N. For interactions that decay fast enough, the Fast Multipole Method (FMM) [42] can reduce the complexity per iteration to O(N). However, the implementation is not easy, as it needs some advanced data structures.
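To make the baseline cost concrete, the following is a minimal sketch (in Python with NumPy; the kernel F, the parameters, and the explicit Euler–Maruyama discretization are our own illustrative assumptions, not part of the schemes above) of one time step for (1.2) with the direct O(N^{2}) interaction sum:

import numpy as np

def direct_step(X, q, F, sigma, dt, rng):
    """One Euler-Maruyama step for (1.2) with the direct O(N^2) pair sum.

    X: (N, d) positions; q: (N,) weights; F: pairwise force F(x_i - x_j).
    """
    N, d = X.shape
    drift = np.zeros_like(X)
    for i in range(N):            # the double loop costs O(N^2) kernel calls
        for j in range(N):
            if j != i:
                drift[i] += q[j] * F(X[i] - X[j])
    drift /= N - 1
    return X + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal((N, d))

Every step touches all N(N-1) ordered pairs; the Random Batch Method recalled below replaces the inner sum by a sum over a small random batch.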

The authors proposed in [25] a simple random algorithm, called the Random Batch Method (RBM), to reduce the computational cost per time step from O(N^{2}) to O(N) for indistinguishable particles (thus with the same weight). The idea was to apply the “mini-batch” idea, famous for its application in the so-called stochastic gradient descent (SGD) [41, 8, 9] in machine learning. Later on, the “mini-batch” idea was used to develop a Markov Chain Monte Carlo method, called stochastic gradient Langevin dynamics (SGLD), by Welling and Teh for Bayesian inference [45]. Random binary collisions were also used to simulate the Boltzmann equation [6, 39, 4] or the mean-field equation for flocking [1]. How to design the mini-batch strategy depends on the specific application. For interacting particle systems, the strategy in [25] is to do a random grouping at each time interval and then let the particles interact within the groups on that small time interval. Compared with FMM, the accuracy is lower (half order in the time step), but RBM is simpler to implement and is valid for more general potentials (e.g. the SVGD ODE [35, 30]). Intuitively, the method converges because of the averaging in time, so the convergence is like that in the Law of Large Numbers, but in time (see [25] for a more detailed explanation). Moreover, the RBM converges in the regime of the mean field limit ([43, 20, 29]). In fact, the method is asymptotic-preserving with respect to the mean-field limit, which means the algorithm can approximate the one-marginal distribution with an error bound independent of N.

The proof in [25] depends on the exchangeability of the particles. If the particles carry weights, they are not exchangeable. Even so, it is clear that RBM can be equally well applied to interacting particles with multiple species and weights (see Algorithm 1 below). The goal of this paper is to analyze the convergence of RBM for interacting particles with weights or multiple species. We will discuss both strong and weak convergence, which rely on different techniques so that the error estimates are independent of N. The proof of strong convergence follows closely the proof in [25] for indistinguishable particles, but there are still some differences: since there is no exchangeability, we have to use a kind of weighted average of the errors; some refined auxiliary lemmas have to be used here, compared with those in [25]. The weak convergence is for the empirical measure instead of the one-marginal distribution as in [25]. Some sharp estimates for the derivatives of the backward equations have to be established so that the weak convergence is uniform in N. The weak convergence analysis is also illustrative for the convergence of the Random Batch Method for N-body Liouville equations.

The rest of the paper is organized as follows. In section 2, we introduce some basic notations and give a detailed description of the Random Batch Method. Section 3 is devoted to the proof of strong convergence. In section 4, we show the first order weak convergence, where a key proposition for the estimates of derivatives of the backward equation is needed.

2 Setup and notations

We consider interacting particles with weights. Since in practice there can be an external field, and there are also applications where the interaction kernels depend on the two specific particles (for example, the SVGD ODE [30]), we consider in general the following first order system

\displaystyle dX^{i}=b(X^{i})\,dt+\frac{1}{N-1}\sum_{j:j\neq i}m_{j}K_{ij}(X^{i},X^{j})\,dt+\sigma dW^{i},~{}~{}i=1,2,\cdots,N. (2.1)

The argument in this paper can be generalized to second order systems without difficulty, so we omit this. In (2.1), b(\cdot) is the external force field, the \{W^{i}\}’s are given independent d-dimensional Wiener processes (standard Brownian motions), and we impose

\displaystyle m_{j}\geq 0. (2.2)

Note that this model includes the multispecies case, since the signs of the interaction can be included into K_{ij}, and for (1.2) one takes m_{j}=|q_{j}|.

For notational convenience, we define

\displaystyle F_{i}(x):=b(x_{i})+\frac{1}{N-1}\sum_{j:j\neq i}m_{j}K_{ij}(x_{i},x_{j}), (2.3)

where

x:=(x_{1},\cdots,x_{N})\in\mathbb{R}^{Nd}.

The (random) empirical probability measure corresponding to (2.1) is given by

\displaystyle\mu^{N}:=\frac{1}{N}\sum_{j=1}^{N}\omega_{j}\delta(x-X^{j}), (2.4)

where

\omega_{j}=\frac{Nm_{j}}{\sum_{\ell}m_{\ell}}.
Assumption 2.1.

We assume there are positive constants A,M independent of N such that

\displaystyle\frac{1}{N}\sum_{j}m_{j}=M,~{}~{}\max_{j}|m_{j}|\leq A. (2.5)

With the assumption above, we find

\displaystyle\omega_{j}=O(1). (2.6)

Below are some assumptions on the external field and interaction kernels.

Assumption 2.2.

We assume b(\cdot) is one-sided Lipschitz:

\displaystyle(z_{1}-z_{2})\cdot(b(z_{1})-b(z_{2}))\leq\beta|z_{1}-z_{2}|^{2} (2.7)

for some constant \beta, and that b,\nabla b have polynomial growth:

\displaystyle|b(z)|+|\nabla b(z)|\leq C(1+|z|)^{q}. (2.8)

The functions K_{ij}(\cdot,\cdot) and their derivatives up to second order are uniformly bounded in 1\leq i,j\leq N.

2.1 The Random Batch Method and mathematical setup

We now give a detailed explanation of the Random Batch Method proposed in [25], applied to (2.1). Suppose the computational interval is [0,T]. We pick a small time step \tau and define the discrete time grid

\displaystyle t_{k}:=k\tau,~{}~{}k\in\mathbb{N}. (2.9)

The number of iterations of the algorithm is

\displaystyle N_{T}:=\left\lceil\frac{T}{\tau}\right\rceil. (2.10)

At each time grid point t_{k}, we randomly divide the

N=np

particles into n small batches of equal size p (p\ll N, often p=2). We have assumed that p divides N for convenience. Denote the n batches by \mathcal{C}_{q},q=1,\cdots,n; then each particle interacts only with particles within its own batch. The details are shown in Algorithm 1. Clearly, each iteration contains two main steps: (1) randomly dividing the particles into n batches (implemented by a random permutation, costing O(N) [17]); (2) letting the particles interact inside their batches only.

Algorithm 1 (RBM)
1:for k in 1:[T/\tau] do
2:     Divide \{1,2,\ldots,pn\} into n batches randomly.
3:     for each batch \mathcal{C}_{q} do
4:         Update \tilde{X}^{i}’s (i\in\mathcal{C}_{q}) by solving the following SDE for t\in[t_{k-1},t_{k}).
\displaystyle d\tilde{X}^{i}=b(\tilde{X}^{i})dt+\frac{1}{p-1}\sum_{j\in\mathcal{C}_{q},j\neq i}m_{j}K_{ij}(\tilde{X}^{i},\tilde{X}^{j})dt+\sigma dW^{i}. (2.11)
5:     end for
6:end for

Above, the Wiener process W^{i} (Brownian motion) used is the same as in (2.1).
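To make Algorithm 1 concrete, here is a minimal sketch in Python with NumPy. It is a sketch under our own assumptions: (2.11) is solved by a single Euler–Maruyama step on each subinterval, and the kernel is passed as a generic function K(x,y); neither choice is prescribed by the algorithm itself.

import numpy as np

def random_division(N, p, rng):
    """Step (1) of each iteration: divide {0,...,N-1} into n = N/p batches
    of size p via a random permutation, at O(N) cost [17]."""
    return rng.permutation(N).reshape(N // p, p)

def rbm_step(X, m, b, K, sigma, dt, rng, p=2):
    """One RBM iteration for (2.1): each particle interacts only within its
    batch, with the 1/(p-1) normalization of (2.11)."""
    N, d = X.shape
    Xnew = np.empty_like(X)
    for batch in random_division(N, p, rng):
        for i in batch:
            force = b(X[i]) + sum(m[j] * K(X[i], X[j])
                                  for j in batch if j != i) / (p - 1)
            Xnew[i] = X[i] + force * dt + sigma * np.sqrt(dt) * rng.standard_normal(d)
    return Xnew

For fixed p, each iteration costs O(N) for the division plus O(Np) kernel evaluations, compared with O(N^{2}) for the direct sum; any stable integrator may replace the single Euler–Maruyama step inside the batches.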

Remark 2.1.

For particles with weights, it may be desirable to construct the random batches by importance sampling. This is left for future study.

We denote by \mathcal{C}_{q}^{(k)}, 1\leq q\leq n, the batches at t_{k}, and

\displaystyle\mathcal{C}^{(k)}:=\{\mathcal{C}_{1}^{(k)},\cdots,\mathcal{C}_{n}^{(k)}\} (2.12)

will denote the random division of batches at t_{k}. It is standard by the Kolmogorov extension theorem [16] that there exists a probability space (\Omega,\mathcal{F},\mathbb{P}) such that the random variables \{X_{0}^{i},W^{i},\mathcal{C}^{(k)}:1\leq i\leq N,k\geq 0\} are defined on this probability space and are all independent. As usual, we use \mathbb{E} to denote the integration on \Omega with respect to the probability measure \mathbb{P}. For the convenience of the analysis, we introduce the L^{2}(\mathbb{P}) norm:

\displaystyle\|v\|=\sqrt{\mathbb{E}|v|^{2}}. (2.13)

Define the filtration \{\mathcal{F}_{k}\}_{k\geq 0} by

\displaystyle\mathcal{F}_{k-1}=\sigma(X_{0}^{i},W^{i}(t),\mathcal{C}^{(j)};t\leq t_{k-1},j\leq k-1). (2.14)

In other words, \mathcal{F}_{k-1} is the \sigma-algebra generated by the initial values X_{0}^{i} (i=1,\ldots,N), W^{i}(t), t\leq t_{k-1}, and \mathcal{C}^{(j)}, j\leq k-1. Hence, \mathcal{F}_{k-1} contains the information of how batches are constructed for t\in[t_{k-1},t_{k}). We also introduce the filtration \{\mathcal{G}_{k}\}_{k\geq 0} by

\displaystyle\mathcal{G}_{k-1}=\sigma(X_{0}^{i},W^{i}(t),\mathcal{C}^{(j)};t\leq t_{k-1},j\leq k-2). (2.15)

If we use \sigma(\mathcal{C}^{(k-1)}) to denote the \sigma-algebra generated by \mathcal{C}^{(k-1)}, the random division of batches at t_{k-1}, then \mathcal{F}_{k-1}=\sigma(\mathcal{G}_{k-1}\cup\sigma(\mathcal{C}^{(k-1)})).

For further discussion, given some random batches \mathcal{C}, we define the random variables

\displaystyle I_{ij}=\begin{cases}1&i,j\text{~{}are in the same batch,}\\ 0&\text{~{}otherwise~{}}\end{cases},~{}~{}1\leq i,j\leq N. (2.16)

We will focus on the approximation error of \tilde{X} for X on t\in[0,T]. In particular, we define the error process

\displaystyle Z^{i}=\tilde{X}^{i}-X^{i},~{}~{}i=1,\cdots,N. (2.17)

2.2 A comment on interacting particles with multiple species

In applications, the most important cases where particles carry weights are the multispecies cases. For example, when we simulate the microscopic particles for the Poisson-Boltzmann equations [2, 26], we need to consider charged particles with different valences, in particular

\displaystyle dX^{i}=\frac{1}{N-1}\sum_{j=1:j\neq i}^{N}Q_{j}z_{i}z_{j}F(X^{i}-X^{j})\,dt+\sigma\,dW^{i},~{}~{}i=1,\cdots,N, (2.18)

where z_{i}=\pm 1 represents whether the charge is positive or negative and Q_{j}\geq 0 is the absolute value of the charge. In this case, we can define

\displaystyle K_{ij}(x,y)=z_{i}z_{j}F(x-y), (2.19)

and this reduces to (2.1).

One may also care about the densities of the different species separately. Similar to (2.4), one can compute the empirical measures for all these species separately; the empirical measure (2.4) is then a mixture of them. Hence, the model (2.1) is rich enough to include interacting particles with disparate species and weights. Below, we will address these uniformly under the framework of (2.1).
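As an illustration of this reduction, here is a short sketch (Python; the helper name and the placeholder force are hypothetical) encoding (2.18) into the form (2.1) via (2.19), with weights m_{j}=Q_{j}:

import numpy as np

def multispecies_kernel(z, F):
    """Build K_ij(x, y) = z_i z_j F(x - y) as in (2.19); the weights of
    (2.1) are then m_j = Q_j, the charge magnitudes."""
    def K(i, j, x, y):
        return z[i] * z[j] * F(x - y)
    return K

# Example: two species with opposite unit charges and a placeholder force.
z = np.array([1, -1, 1, -1])
K = multispecies_kernel(z, F=lambda r: -r)   # placeholder linear force, for illustration only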

3 The strong convergence

Since there is no exchangeability, we have to use a weighted average of the errors. We consider

\displaystyle J(t):=\frac{1}{2N}\sum_{i=1}^{N}m_{i}\mathbb{E}|\tilde{X}^{i}-X^{i}|^{2}, (3.1)

which is the square of the strong error. As a common convention, the \frac{1}{2} prefactor is used for energies of quadratic forms. Recently, a certain quantum counterpart of this error has been used by Golse et al. to prove the convergence of the Random Batch Method for N-body Schrödinger equations [21].

Theorem 3.1.

Suppose Assumptions 2.1 and 2.2 hold. Then, it holds that

\displaystyle\sup_{t\leq T}\sqrt{J(t)}\leq C\sqrt{\frac{\tau}{p-1}+\tau^{2}}, (3.2)

where C is independent of N and p.
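Before turning to the proof, the following minimal numerical sketch (Python; the drift, kernel, weights, and all parameters are our own illustrative choices) indicates how the rate can be observed: the full dynamics and the RBM dynamics are driven by the same Brownian increments and the same Euler–Maruyama step, so their difference isolates the random batch error. A single realization is only a crude proxy for J(T); in practice one would average over several runs.

import numpy as np

rng = np.random.default_rng(0)
N, d, p, T, sigma = 64, 1, 2, 1.0, 0.5
m = rng.uniform(0.5, 1.5, N)              # bounded weights, as in Assumption 2.1
b = lambda x: -x                          # one-sided Lipschitz drift
K = lambda x, y: np.tanh(y - x)           # smooth bounded kernel (placeholder)
X0 = rng.standard_normal((N, d))

def drift_full(X):
    D = np.zeros_like(X)
    for i in range(N):
        for j in range(N):
            if j != i:
                D[i] += m[j] * K(X[i], X[j])
    return b(X) + D / (N - 1)

def error_at_T(tau):
    X, Xr = X0.copy(), X0.copy()          # same initial data
    for _ in range(int(T / tau)):
        dW = np.sqrt(tau) * rng.standard_normal((N, d))   # shared noise
        X = X + drift_full(X) * tau + sigma * dW
        Dr = b(Xr)
        for batch in rng.permutation(N).reshape(N // p, p):
            for i in batch:
                for j in batch:
                    if j != i:
                        Dr[i] = Dr[i] + m[j] * K(Xr[i], Xr[j]) / (p - 1)
        Xr = Xr + Dr * tau + sigma * dW
    return np.sqrt(np.sum(m * np.sum((Xr - X) ** 2, axis=1)) / (2 * N))

for tau in [0.1, 0.05, 0.025, 0.0125]:
    print(tau, error_at_T(tau))           # sqrt(J(T)) should scale roughly like sqrt(tau)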

Define the error of the random approximation for the interacting force by

\displaystyle\begin{split}\chi_{i}(x)&:=\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}K_{ij}(x_{i},x_{j})-\frac{1}{N-1}\sum_{j:j\neq i}m_{j}K_{ij}(x_{i},x_{j})\\ &=\frac{1}{p-1}\sum_{j:j\neq i}I_{ij}m_{j}K_{ij}(x_{i},x_{j})-\frac{1}{N-1}\sum_{j:j\neq i}m_{j}K_{ij}(x_{i},x_{j}),\end{split} (3.3)

where x_{i}\in\mathbb{R}^{d}, and again x=(x_{1},\cdots,x_{N})\in\mathbb{R}^{Nd}.

We have the following facts.

Lemma 3.1.

For i\neq j, it holds that

\displaystyle\mathbb{E}I_{ij}=\frac{p-1}{N-1}, (3.4)

and for distinct i,j,\ell, it holds that

\displaystyle\mathbb{E}I_{ij}I_{i\ell}=\frac{(p-1)(p-2)}{(N-1)(N-2)}. (3.5)

These identities were established in the proof of [25, Lemma 3.1], so we omit the proofs. Using Lemma 3.1, one obtains the following consistency of the Random Batch Method.

Lemma 3.2.

For given x=(x_{1},\ldots,x_{N})\in\mathbb{R}^{Nd}, it holds that

\displaystyle\mathbb{E}\chi_{i}(x)=0. (3.6)

Moreover, the second moment is given by

\displaystyle\mathbb{E}|\chi_{i}(x)|^{2}=\left(\frac{1}{p-1}-\frac{1}{N-1}\right)\Lambda_{i}(x), (3.7)

where

\displaystyle\Lambda_{i}(x):=\frac{1}{N-2}\sum_{j:j\neq i}\Big{|}m_{j}K_{ij}(x_{i},x_{j})-\frac{1}{N-1}\sum_{\ell:\ell\neq i}m_{\ell}K_{i\ell}(x_{i},x_{\ell})\Big{|}^{2}. (3.8)

Though slightly different from [25, Lemma 3.1], the proof is essentially the same, so we omit it.
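The combinatorial identities behind these lemmas are easy to test numerically; a small Monte Carlo sketch (Python; the parameters are arbitrary) checking (3.4) and (3.5), which can be extended in the same way to check the unbiasedness (3.6):

import numpy as np

rng = np.random.default_rng(1)
N, p, trials = 12, 3, 200_000
c_ij = 0
c_ijl = 0
for _ in range(trials):
    batch_of = np.empty(N, dtype=int)
    batch_of[rng.permutation(N)] = np.arange(N) // p   # batch index of each particle
    same_ij = batch_of[0] == batch_of[1]
    c_ij += same_ij
    c_ijl += same_ij and batch_of[0] == batch_of[2]
print(c_ij / trials, (p - 1) / (N - 1))                          # checks (3.4)
print(c_ijl / trials, (p - 1) * (p - 2) / ((N - 1) * (N - 2)))   # checks (3.5)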

We move to some important additional estimates:

Lemma 3.3.

Let X^{i} and \tilde{X}^{i} be solutions to (2.1) and (2.11) respectively. Suppose Assumptions 2.1 and 2.2 hold. Then,

\displaystyle\sup_{t\leq T}(\mathbb{E}|X^{i}(t)|^{q}+\mathbb{E}|\tilde{X}^{i}(t)|^{q})\leq C_{q}. (3.9)

Besides, for any k>0 and q\geq 2,

\displaystyle\sup_{t\in[t_{k-1},t_{k})}\left|\mathbb{E}(|\tilde{X}^{i}(t)|^{q}|\mathcal{F}_{k-1})\right|\leq C_{1}|\tilde{X}^{i}(t_{k-1})|^{q}+C_{2} (3.10)

holds almost surely.

Moreover, almost surely, it holds that

\displaystyle\begin{split}&|\mathbb{E}(\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})|\mathcal{F}_{k-1})|\leq C(1+|\tilde{X}^{i}(t_{k-1})|^{q})\tau,\\ &\Big{|}\mathbb{E}\left(|\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})|^{2}|\mathcal{F}_{k-1}\right)\Big{|}\leq C(1+|\tilde{X}^{i}(t_{k-1})|^{q})\tau.\end{split} (3.11)

Note that the second estimate in (3.11) is different from that in [25]. In fact, the proof in [25] has a small gap, and one needs this refined estimate to fill in that gap as well (on page 13 of [25], from line 17 to line 19, the quantity \mathbb{E}(|\delta\tilde{X}^{i}|^{2}+|\delta\tilde{X}^{j}|^{2}|\mathcal{F}_{m-1}) is not independent of \mathcal{C}_{\theta}; to get line 19, we need this refined version). Though the proof is not hard, we attach it in Appendix A.

The following lemma is an improved version of [25, Lemma 3.2], which is essential for establishing the strong convergence for the problem considered in this paper.

Lemma 3.4.

Fix i\in\{1,\ldots,N\}. Let \mathcal{C}_{\theta} be the random batch of size p that contains i in the random division. Let Y_{j} (1\leq j\leq N) be N random variables (or random vectors) that are independent of \mathcal{C}_{\theta}. Then, for p\geq 2,

\displaystyle\left\|\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}Y_{j}\right\|\leq\left(\frac{1}{N-1}\sum_{j:j\neq i}\|Y_{j}\|^{2}\right)^{1/2}. (3.12)
Proof.

By the definition and independence,

\begin{split}\left\|\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}Y_{j}\right\|^{2}&=\frac{1}{(p-1)^{2}}\mathbb{E}\left|\sum_{j:j\neq i}I_{ij}Y_{j}\right|^{2}\\ &=\frac{1}{(p-1)^{2}}\sum_{j,\ell:j\neq i,\ell\neq i}\mathbb{E}(I_{ij}I_{i\ell})\mathbb{E}(Y_{j}\cdot Y_{\ell})\\ &\leq\frac{1}{(p-1)^{2}}\Bigg{[}\sum_{j,\ell:j\neq i,\ell\neq i,j\neq\ell}\frac{(p-1)(p-2)}{(N-1)(N-2)}\|Y_{j}\|\|Y_{\ell}\|+\sum_{j:j\neq i}\mathbb{E}I_{ij}\|Y_{j}\|^{2}\Bigg{]}\\ &=:R_{1}+R_{2}.\end{split}

Note that the independence is used in the second equality. The inequality is due to Lemma 3.1, together with the Cauchy-Schwarz inequality \mathbb{E}(Y_{j}\cdot Y_{\ell})\leq\|Y_{j}\|\|Y_{\ell}\|.

It is easy to calculate

\begin{split}R_{1}&\leq\frac{p-2}{(p-1)(N-1)(N-2)}\sum_{j,\ell:j\neq i,\ell\neq i,j\neq\ell}\left(\frac{1}{2}\|Y_{j}\|^{2}+\frac{1}{2}\|Y_{\ell}\|^{2}\right)\\ &=\frac{p-2}{(p-1)(N-1)}\sum_{j:j\neq i}\|Y_{j}\|^{2},\end{split}

while

R_{2}=\frac{1}{(p-1)(N-1)}\sum_{j:j\neq i}\|Y_{j}\|^{2}.

Hence,

R_{1}+R_{2}\leq\frac{1}{N-1}\sum_{j:j\neq i}\|Y_{j}\|^{2}.

The claim thus follows. ∎

We now consider the error process defined in (2.17). The time derivative of Z^{i} is clearly given by

\frac{d}{dt}Z^{i}=[b(\tilde{X}^{i})-b(X^{i})]+\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}K_{ij}(\tilde{X}^{i},\tilde{X}^{j})-\frac{1}{N-1}\sum_{j:j\neq i}m_{j}K_{ij}(X^{i},X^{j}), (3.13)

where \mathcal{C}_{\theta} is the random batch in \mathcal{C} that contains i.

Define

\displaystyle\delta K_{ij}(t):=K_{ij}(\tilde{X}^{i}(t),\tilde{X}^{j}(t))-K_{ij}(X^{i}(t),X^{j}(t)). (3.14)

The right hand side of (3.13) can then be written as

\displaystyle\begin{split}\frac{d}{dt}Z^{i}&=[b(\tilde{X}^{i})-b(X^{i})]+\frac{1}{N-1}\sum_{j:j\neq i}m_{j}\delta K_{ij}+\chi_{i}(\tilde{X})\\ &=[b(\tilde{X}^{i})-b(X^{i})]+\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}\delta K_{ij}+\chi_{i}(X).\end{split} (3.15)

With this, one can obtain the following simple lemma.

Lemma 3.5.

For t\in[t_{k-1},t_{k}),

\displaystyle\|Z^{i}(t)-Z^{i}(t_{k-1})\|\leq C\tau. (3.16)

Also, almost surely,

\displaystyle|Z^{i}(t)|\leq|Z^{i}(t_{k-1})|(1+C\tau)+C\tau. (3.17)
Proof.

By (3.15), since b has polynomial growth, the claim for \|Z^{i}(t)-Z^{i}(t_{k-1})\| is an easy consequence of the q-moment estimates in Lemma 3.3.

Dotting (3.15) with Z^{i} and using the one-sided Lipschitz condition in Assumption 2.2, one has

\frac{1}{2}\frac{d}{dt}|Z^{i}|^{2}\leq C|Z^{i}|^{2}+C|Z^{i}|.

Hence, almost surely, it holds that

\frac{d}{dt}|Z^{i}|\leq C|Z^{i}|+C.

The second claim then follows. ∎

We now give the proof of the strong convergence in Theorem 3.1.

Proof of Theorem 3.1.

First,

\frac{dJ(t)}{dt}=\frac{1}{N}\sum_{i=1}^{N}m_{i}\mathbb{E}\Big{\{}Z^{i}\cdot[(F_{i}(\tilde{X})-F_{i}(X))+\chi_{i}(\tilde{X})]\Big{\}}.

The first term is easy to bound by the one-sided Lipschitz condition of b and the conditions on K_{ij} in Assumption 2.2:

\frac{1}{N}\sum_{i=1}^{N}m_{i}\mathbb{E}\Big{\{}Z^{i}\cdot(F_{i}(\tilde{X})-F_{i}(X))\Big{\}}\leq\frac{C}{N}\sum_{i}m_{i}\mathbb{E}|Z^{i}|^{2}+\frac{C}{N(N-1)}\sum_{i\neq j}m_{i}m_{j}\mathbb{E}(|Z^{i}|^{2}+|Z^{j}|^{2}).

This is clearly bounded by CJ(t).

Now, we focus on the second term. The technique is the same as in our previous work [25], but some special modifications are needed for our problem here.

\displaystyle\begin{split}\frac{1}{N}\sum_{i=1}^{N}m_{i}\mathbb{E}\Big{\{}Z^{i}(t)\cdot\chi_{i}(\tilde{X}(t))\Big{\}}=&\frac{1}{N}\sum_{i=1}^{N}m_{i}\mathbb{E}\Big{\{}Z^{i}(t_{k-1})\cdot\chi_{i}(\tilde{X}(t))\Big{\}}\\ &+\frac{1}{N}\sum_{i=1}^{N}m_{i}\mathbb{E}\Big{\{}(Z^{i}(t)-Z^{i}(t_{k-1}))\cdot\chi_{i}(\tilde{X}(t))\Big{\}}\\ =:&I_{1}+I_{2}.\end{split}

Step 1 – Estimate of I_{1}:

For I_{1}, using the consistency result in Lemma 3.2, one has

I_{1}=\frac{1}{N}\sum_{i=1}^{N}m_{i}\mathbb{E}\left\{Z^{i}(t_{k-1})\cdot\Big{[}\chi_{i}(\tilde{X}(t))-\chi_{i}(\tilde{X}(t_{k-1}))\Big{]}\right\}.

In fact,

\mathbb{E}\left\{Z^{i}(t_{k-1})\cdot\chi_{i}(\tilde{X}(t_{k-1}))\right\}=\mathbb{E}\left\{Z^{i}(t_{k-1})\cdot\mathbb{E}\Big{[}\chi_{i}(\tilde{X}(t_{k-1}))|\mathcal{G}_{k-1}\Big{]}\right\}=\mathbb{E}\,0=0.

This is the only place where the \sigma-algebra \mathcal{G}_{k-1} is used.

Note that

\begin{split}&\mathbb{E}\left\{Z^{i}(t_{k-1})\cdot\Big{[}\chi_{i}(\tilde{X}(t))-\chi_{i}(\tilde{X}(t_{k-1}))\Big{]}\right\}\\ &=\mathbb{E}\Bigg{(}Z^{i}(t_{k-1})\cdot\mathbb{E}(\chi_{i}(\tilde{X}(t))-\chi_{i}(\tilde{X}(t_{k-1}))|\mathcal{F}_{k-1})\Bigg{)}\\ &\leq C\|Z^{i}(t_{k-1})\|\left\|\mathbb{E}[\chi_{i}(\tilde{X}(t))-\chi_{i}(\tilde{X}(t_{k-1}))|\mathcal{F}_{k-1}]\right\|.\end{split}

Using the definition of \chi_{i} in (3.3), one has

\begin{split}&\mathbb{E}\left[\chi_{i}(\tilde{X}(s))-\chi_{i}(\tilde{X}(t_{k-1}))|\mathcal{F}_{k-1}\right]\\ &=\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}\mathbb{E}(\delta\tilde{K}^{ij}|\mathcal{F}_{k-1})-\frac{1}{N-1}\sum_{j:j\neq i}m_{j}\mathbb{E}(\delta\tilde{K}^{ij}|\mathcal{F}_{k-1}),\end{split}

where

\delta\tilde{K}^{ij}=K_{ij}(\tilde{X}^{i}(s),\tilde{X}^{j}(s))-K_{ij}(\tilde{X}^{i}(t_{k-1}),\tilde{X}^{j}(t_{k-1})).

We first estimate \mathbb{E}(\delta\tilde{K}^{ij}|\mathcal{F}_{k-1}). Denote \delta\tilde{X}^{j}:=\tilde{X}^{j}(s)-\tilde{X}^{j}(t_{k-1}). Performing Taylor expansion around t_{k-1}, one has

\delta\tilde{K}^{ij}=(\nabla_{x_{i}}K_{ij}|_{t_{k-1}}\cdot\delta\tilde{X}^{i}+\nabla_{x_{j}}K_{ij}|_{t_{k-1}}\cdot\delta\tilde{X}^{j})+\frac{1}{2}M:[\delta\tilde{X}^{i},\delta\tilde{X}^{j}]\otimes[\delta\tilde{X}^{i},\delta\tilde{X}^{j}],

with M being a random variable (tensor) bounded by \|\nabla^{2}K\|_{\infty}. By (3.11), one finds that

|\mathbb{E}(\delta\tilde{K}^{ij}|\mathcal{F}_{k-1})|\leq C(1+|\tilde{X}^{i}(t_{k-1})|^{q}+|\tilde{X}^{j}(t_{k-1})|^{q})\tau.

The right hand side is independent of \mathcal{C}_{\theta}, and this is the place where we need the almost sure bound (3.11). Applying Lemma 3.4, one has

\left\|\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}\mathbb{E}(\delta\tilde{K}^{ij}|\mathcal{F}_{k-1})\right\|\leq C\tau\left(1+\left(\frac{1}{N-1}\sum_{j:j\neq i}\||\tilde{X}^{j}(t_{k-1})|^{q}\|^{2}\right)^{1/2}\right)\leq C\tau.

The term \|\frac{1}{N-1}\sum_{j:j\neq i}m_{j}\mathbb{E}(\delta\tilde{K}^{ij}|\mathcal{F}_{k-1})\| is much easier to estimate, and it is also bounded by C\tau.

Hence

\mathbb{E}\,Z^{i}(t_{k-1})\cdot\Big{[}\chi_{i}(\tilde{X}(t))-\chi_{i}(\tilde{X}(t_{k-1}))\Big{]}\leq C\|Z^{i}(t_{k-1})\|\tau\leq C\|Z^{i}(t)\|\tau+C\tau^{2}.

Then,

I_{1}\leq\left(\frac{C}{N}\sum_{i=1}^{N}m_{i}\|Z^{i}(t)\|\tau\right)+C\tau^{2}\leq\delta\frac{1}{N}\sum_{i=1}^{N}m_{i}\|Z^{i}(t)\|^{2}+C(\delta)\tau^{2}.

Step 2 – Estimate of I_{2}.

We decompose

I_{2}=\frac{1}{N}\sum_{i=1}^{N}m_{i}\mathbb{E}(Z^{i}(t)-Z^{i}(t_{k-1}))\cdot[\chi_{i}(\tilde{X}(t))-\chi_{i}(X(t))]+\frac{1}{N}\sum_{i=1}^{N}m_{i}\mathbb{E}(Z^{i}(t)-Z^{i}(t_{k-1}))\cdot\chi_{i}(X(t))=:I_{21}+I_{22}.

We first consider I_{21}. Clearly,

I_{21}\leq\frac{1}{N}\sum_{i=1}^{N}m_{i}C\tau\|\chi_{i}(\tilde{X}(t))-\chi_{i}(X(t))\|.

Since K is Lipschitz continuous,

|\delta K_{ij}(t)|\leq L(|Z^{i}(t)|+|Z^{j}(t)|).

Hence,

\displaystyle\left\|\left(\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}\delta K_{ij}(t)\right)\right\|\leq L\left(\|Z^{i}(t)\|+\left\|\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}|Z^{j}(t)|\right\|\right). (3.18)

Note that Z^{j}(t) depends on \mathcal{C}_{\theta}, so we cannot apply Lemma 3.4 directly. Instead, by Lemma 3.5, one has

|Z^{j}(t)|\leq C|Z^{j}(t_{k-1})|+C\tau.

Since Z^{j}(t_{k-1}) is independent of \mathcal{C}_{\theta}, Lemma 3.4 then gives us that

\displaystyle\begin{split}\left\|\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}|Z^{j}(t)|\right\|&\leq C\left(\frac{1}{N-1}\sum_{j:j\neq i}m_{j}^{2}\|Z^{j}(t_{k-1})\|^{2}\right)^{1/2}+C\tau\\ &\leq C\left(\frac{1}{N-1}\sum_{j:j\neq i}m_{j}\|Z^{j}(t_{k-1})\|^{2}\right)^{1/2}+C\tau.\end{split} (3.19)
Remark 3.1.

This is the place where we need Lemma 3.4, a refined version of [25, Lemma 3.2]. If we used [25, Lemma 3.2], one would control \|\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}|Z^{j}(t_{k-1})|\| by \sup_{j}m_{j}\|Z^{j}(t_{k-1})\|, which is not enough to close the estimate.

With this, we find

\begin{split}I_{21}&\leq C\left(\frac{1}{N}\sum_{i=1}^{N}m_{i}\tau\left(\frac{1}{N-1}\sum_{j:j\neq i}m_{j}\|Z^{j}(t_{k-1})\|^{2}\right)^{1/2}\right)+C\tau^{2}\\ &\leq\delta\frac{1}{N}\sum_{i}m_{i}\|Z^{i}(t_{k-1})\|^{2}+C(\delta)\tau^{2},\end{split}

where Assumption 2.1 has been used.

We now consider I_{22}. We first recall

\displaystyle\frac{d}{dt}Z^{i}=[b(\tilde{X}^{i})-b(X^{i})]+\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}\delta K_{ij}(t)+\chi_{i}(X(t)). (3.20)

Note that

|b(\tilde{X}^{i})-b(X^{i})|\leq\int_{0}^{1}|(\tilde{X}^{i}-X^{i})\cdot\nabla b((1-\sigma)\tilde{X}^{i}+\sigma X^{i})|\,d\sigma,

and \nabla b((1-\sigma)\tilde{X}^{i}+\sigma X^{i}) is controlled by C(|\tilde{X}^{i}|^{q}+|X^{i}|^{q}) for some q>0. Hence,

\displaystyle\begin{split}\mathbb{E}|b(\tilde{X}^{i})-b(X^{i})||\chi_{i}(X(t))|&\leq\|\chi_{i}(X(t))\|_{\infty}\|\tilde{X}^{i}-X^{i}\|(\mathbb{E}(|\tilde{X}^{i}|^{q}+|X^{i}|^{q})^{2})^{1/2}\\ &\leq C\|Z^{i}(t)\|,\end{split}

by (3.9).

Integrating (3.20) in time over [t_{k-1},t], dotting with \chi_{i}(X(t)), and taking the expectation, one gets

\left|\mathbb{E}((Z^{i}(t)-Z^{i}(t_{k-1}))\cdot\chi_{i}(X(t)))\right|\leq C\int_{t_{k-1}}^{t}\|Z^{i}(s)\|ds+\int_{t_{k-1}}^{t}\mathbb{E}\left[\left(\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}\delta K_{ij}(s)\right)\cdot\chi_{i}(X(t))\right]\,ds+\mathbb{E}\int_{t_{k-1}}^{t}\chi_{i}(X(s))\cdot\chi_{i}(X(t))\,ds. (3.21)

Applying (3.18) and (3.19), the second term on the right hand side of (3.21) is bounded by

C\left(\frac{1}{N-1}\sum_{j:j\neq i}m_{j}^{2}\|Z^{j}(t_{k-1})\|^{2}\right)^{1/2}\tau+C\tau^{2}.

Similarly to the estimate of I_{21}, this is controlled by \delta\frac{1}{N}\sum_{i}m_{i}\|Z^{i}(t_{k-1})\|^{2}+C(\delta)\tau^{2}. The last term on the right hand side of (3.21) is controlled using Lemma 3.2 (in particular, equation (3.7)) by

\left[\frac{1}{p-1}-\frac{1}{N-1}\right]\|\Lambda_{i}\|_{\infty}\tau.

Therefore,

I_{22}\leq\delta\frac{1}{N}\sum_{i}m_{i}\|Z^{i}(t_{k-1})\|^{2}+C(\delta)\tau^{2}+\left[\frac{1}{p-1}-\frac{1}{N-1}\right]\|\Lambda_{i}\|_{\infty}\tau.

Moreover,

\|Z^{i}(t_{k-1})\|^{2}\leq(\|Z^{i}(t)\|+C\tau)^{2}\leq 2\|Z^{i}(t)\|^{2}+2C^{2}\tau^{2}.

Finally, putting all these estimates together, one has

\frac{d}{dt}J\leq CJ+C(\delta)\tau^{2}+\left[\frac{1}{p-1}-\frac{1}{N-1}\right]\max_{i}\|\Lambda_{i}\|_{\infty}\tau.

The claim then follows by Grönwall’s lemma. ∎

Remark 3.2.

The strong convergence implies the convergence of the one-marginal distributions; see [25] for more details. The weak convergence below, however, is for the empirical measure (2.4).

4 The weak convergence and Random Batch Method for backward equations

In practice, one may be more interested in the distributions of the particles instead of the trajectories of X^{i}(t). Hence, the error (3.1) is not suitable for this purpose, and we seek to study the weak convergence of the empirical measure (2.4), as commonly done in the numerical SDE literature [36, 28]. Roughly speaking, we say a sequence of measures \mu^{N} converges to some measure \mu weakly if for any suitable test function \varphi, it holds that

\int\varphi\,d\mu^{N}=:\langle\mu^{N},\varphi\rangle\to\langle\mu,\varphi\rangle:=\int\varphi\,d\mu.

For our problem, we consider the weak convergence of the empirical measure \mu^{N} defined in (2.4), which is defined on the Borel sets of \mathbb{R}^{d}, to the empirical measure corresponding to (2.1) as \tau\to 0. Hence, to show that the empirical measures given by X and \tilde{X} are close in law, we pick a test function \varphi\in C_{b}^{\infty}(\mathbb{R}^{d}) and aim to show that the weak error defined below is small:

\displaystyle E_{k}:=\left|\frac{1}{N}\sum_{i=1}^{N}\omega_{i}\mathbb{E}\varphi(\tilde{X}^{i}(k\tau))-\frac{1}{N}\sum_{i=1}^{N}\omega_{i}\mathbb{E}\varphi(X^{i}(k\tau))\right|. (4.1)
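In computations, E_{k} would be estimated by Monte Carlo: average the weighted observable \langle\mu^{N},\varphi\rangle over many independent runs of the RBM dynamics and of the exact dynamics, and take the absolute difference of the two averages. A minimal sketch (Python; the helper name and interface are our own):

import numpy as np

def weighted_observable(X, m, phi):
    """<mu^N, phi> = (1/N) sum_j omega_j phi(X_j), with omega_j = N m_j / sum_l m_l
    as in (2.4)."""
    omega = len(m) * m / m.sum()
    return np.mean(omega * phi(X))

# Example with a smooth bounded test function phi in C_b^infinity:
# phi = lambda X: np.tanh(X[:, 0])
# E_k ≈ |mean over runs of weighted_observable(X_rbm, m, phi)
#        - mean over runs of weighted_observable(X_exact, m, phi)|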
Remark 4.1.

Traditionally, the test functions used for schemes of numerical SDEs are those with polynomial growth at infinity [37, 36]. We use C_{b}^{\infty} test functions, as is now common [3, 32], which induces a weaker topology that disregards high order moments. If one has moment control, the convergence using these two types of test functions is the same.

For the weak convergence, we need some different assumptions on b and K_{ij}.

Assumption 4.1.

The functions b,K_{ij} are C^{4}, and the derivatives of b and K_{ij} up to order 4 are uniformly bounded (uniformly in i,j).

Remark 4.2.

A proof of weak convergence using Assumption 2.2 instead of Assumption 4.1 should also be achievable. Using Assumption 4.1 makes the proof, based on semigroup techniques, elegant (see the details below).

We now state the main theorem for the weak convergence of RBM.

Theorem 4.1.

Under Assumption 2.1 and Assumption 4.1, the Random Batch Method converges weakly with first order for the empirical measure (2.4). In particular, the weak error defined in (4.1) satisfies

\displaystyle\sup_{k:k\tau\leq T}|E_{k}|\leq C\tau, (4.2)

where C=C(\varphi,T) is independent of N,\tau, but depends on \varphi.

Usually, in numerical SDEs, to prove the weak convergence one makes use of the backward Kolmogorov equation (“backward equation” for short), which is defined on the same Euclidean space. For our problem, we need to lift the Euclidean space from \mathbb{R}^{d} to \mathbb{R}^{Nd}. In particular, define

\displaystyle u(x,t):=\frac{1}{N}\sum_{i=1}^{N}\omega_{i}\mathbb{E}[\varphi(X^{i}(t))|X(0)=x], (4.3)

where x\in\mathbb{R}^{Nd} and X is the solution to (2.1). Then, we make use of this function to study the weak convergence. This function u satisfies the following backward equation [40]:

\displaystyle\partial_{t}u=\mathcal{L}u:=\sum_{i=1}^{N}\left[b(x_{i})+\frac{1}{N-1}\sum_{j:j\neq i}m_{j}K_{ij}(x_{i},x_{j})\right]\cdot\nabla_{x_{i}}u+\sum_{i=1}^{N}\frac{1}{2}\sigma^{2}\Delta_{x_{i}}u. (4.4)

The operator \mathcal{L} is called the generator of the ODE/SDE (2.1). The Laplacian \Delta_{x_{i}} is given by

\displaystyle\Delta_{x_{i}}:=\sum_{j=1}^{d}\partial_{x_{i}^{(j)}x_{i}^{(j)}},~{}~{}x_{i}=(x_{i}^{(1)},\cdots,x_{i}^{(d)})\in\mathbb{R}^{d}. (4.5)

The solution semigroup for (4.4) will be denoted by

\displaystyle e^{t\mathcal{L}}u(\cdot,0):=u(\cdot,t). (4.6)

By the well-known property of the backward equation [40, 31], one has

\displaystyle\|u(\cdot,t)\|_{\infty}=\|e^{t\mathcal{L}}u(\cdot,0)\|_{\infty}\leq\|u(\cdot,0)\|_{\infty}. (4.7)

The function u is defined on \mathbb{R}^{Nd}, and naive estimates of the norms of its derivatives will depend on N. This is not sufficient for us to show the N-independent weak convergence for the empirical measure. In fact, it is clear that

\displaystyle\|u\|_{\infty}\leq C, (4.8)

with C independent of N. The following proposition then gives the crucial estimates of the derivatives of u.

Proposition 4.1.

For any i and j\neq i, one has

\displaystyle\|\nabla_{x_{i}}u\|_{\infty}+\|\nabla^{2}_{x_{i}}u\|_{\infty}+\|\nabla_{x_{i}}^{3}u\|_{\infty}+\|\nabla_{x_{i}}^{4}u\|_{\infty}\leq C(T)\frac{1}{N}, (4.9)

and

\displaystyle\|\nabla_{x_{i}}\nabla_{x_{j}}u\|_{\infty}+\|\nabla_{x_{i}}\nabla^{2}_{x_{j}}u\|_{\infty}+\|\nabla^{2}_{x_{i}}\nabla^{2}_{x_{j}}u\|_{\infty}\leq C(T)\frac{1}{N^{2}}. (4.10)

Here, \nabla_{x_{i}}^{2} is the Hessian matrix. Similarly, \nabla_{x_{i}}^{4}u is the fourth order tensor with entries of the form (\prod_{k=1}^{4}\partial_{x_{i}^{(j_{k})}})u, j_{k}\in\{1,\cdots,d\}. The norm \|\nabla_{x_{i}}^{4}u\|_{\infty} is understood as

\sup_{x}\left[\sum_{j_{1},j_{2},j_{3},j_{4}}\Big{(}\big{(}\prod_{k=1}^{4}\partial_{x_{i}^{(j_{k})}}\big{)}u\Big{)}^{2}\right]^{1/2}.

The proof is somewhat tedious and is given in Appendix B.

For further discussion, we introduce the generator corresponding to the Random Batch Method. For t\in(t_{k-1},t_{k}],

\displaystyle\mathcal{L}_{C}:=\sum_{i=1}^{N}\left[b(x_{i})+\frac{1}{p-1}\sum_{j:j\neq i}I_{ij}^{(k-1)}m_{j}K_{ij}(x_{i},x_{j})\right]\cdot\nabla_{x_{i}}+\sum_{i=1}^{N}\frac{1}{2}\sigma^{2}\Delta_{x_{i}}.

We recall that I_{ij} is the indicator for i,j being in the same batch, and we use I_{ij}^{(k-1)} to denote the indicator corresponding to \mathcal{C}^{(k-1)}.

The importance of Proposition 4.1 is that one can bound \mathcal{L}^{i}u uniformly in N, where \mathcal{L}^{i} denotes the composition of \mathcal{L} with itself i times (if i=0, it is the identity operator). This is crucial for establishing the weak convergence uniformly in N. In particular, we have the following.

Lemma 4.1.

For the function u defined in (4.3), it holds for i=0,1,2 that

\displaystyle\|\mathcal{L}^{i}u\|_{\infty}\leq C,~{}~{}~{}\|\mathcal{L}_{\mathcal{C}}^{i}u\|_{\infty}\leq C. (4.11)

Moreover,

\displaystyle\mathbb{E}\mathcal{L}_{\mathcal{C}}=\mathcal{L}. (4.12)
Proof.

The first assertion is a corollary of Proposition 4.1. The second claim is exactly what was proved in Lemma 3.2. ∎

To go further, we need to introduce a semigroup associated with RBM (see also [18] for the semigroup used in SGD). Considering the transition \tilde{X}(t_{k-1})\to\tilde{X}(t_{k}), it is clear that this transition gives a time homogeneous Markov chain. Define the operator \mathcal{S}^{(k)}:C_{b}(\mathbb{R}^{Nd})\to C_{b}(\mathbb{R}^{Nd}) as

\displaystyle\mathcal{S}^{(k)}\phi(x):=\mathbb{E}[\phi(\tilde{X}(k\tau))|\tilde{X}(0)=x]. (4.13)

Below, we sometimes use \mathbb{E}_{x} to mean \mathbb{E}(\cdot|\tilde{X}(0)=x) for convenience. Then, by the Markov property [16], it can be shown that

\displaystyle\mathcal{S}^{(k)}=\mathcal{S}^{k}:=\mathcal{S}\circ\cdots\circ\mathcal{S}. (4.14)

In fact, for any test function \phi, let u^{k}:=\mathcal{S}^{(k)}\phi. Then, it holds that

\begin{split}u^{k}(x)&=\mathbb{E}[\phi(\tilde{X}(t_{k}))|\tilde{X}(0)=x]\\ &=\mathbb{E}[\mathbb{E}[\phi(\tilde{X}(t_{k}))|\tilde{X}(t_{1})]|\tilde{X}(0)=x]\\ &=\mathbb{E}[u^{k-1}(\tilde{X}(t_{1}))|\tilde{X}(0)=x],\end{split}

where the third equality is by the Markov property, namely

\mathcal{S}^{(k)}=\mathcal{S}\circ\mathcal{S}^{(k-1)}.

Clearly, the \mathcal{S}^{k} are nonexpansive in L^{\infty}(\mathbb{R}^{Nd}), i.e.

\displaystyle\|\mathcal{S}^{k}\phi\|_{\infty}\leq\|\phi\|_{\infty}. (4.15)

We give the proof of the weak convergence.

Proof of Theorem 4.1.

The weak error in (4.1) compares u in (4.3) with \mathcal{S}^{n}\phi for the test function

\displaystyle\phi(x)=\frac{1}{N}\sum_{i=1}^{N}\omega_{i}\varphi(x_{i}). (4.16)

By Equation (4.14) (\mathcal{S}^{(k)}=\mathcal{S}^{k}), we need to estimate

\mathcal{S}^{n}\phi(x)-u(x,n\tau)=\sum_{k=1}^{n}\Big{[}\mathcal{S}^{k}u(x,(n-k)\tau)-\mathcal{S}^{k-1}u(x,(n-k+1)\tau)\Big{]}.

Clearly, by (4.15), one has

|\mathcal{S}^{n}\phi(x)-u(x,n\tau)|\leq\sum_{k=1}^{n}\|\mathcal{S}u(x,(n-k)\tau)-u(x,(n-k+1)\tau)\|_{\infty}.

By the definition of \mathcal{S}, one has

\begin{split}\mathcal{S}u(x,(n-k)\tau)&=\mathbb{E}_{x}u(\tilde{X}(\tau),(n-k)\tau)\\ &=\mathbb{E}_{\mathcal{C}}\mathbb{E}_{x}(u(\tilde{X}(\tau),(n-k)\tau)|\mathcal{C})\\ &=\mathbb{E}_{\mathcal{C}}e^{\tau\mathcal{L}_{\mathcal{C}}}u(x,(n-k)\tau).\end{split}

The second equality means that we fix a division of batches to compute the expectation with respect to the Brownian motions, and then average over the random batches. For the last equality, we recall that e^{\tau\mathcal{L}_{\mathcal{C}}} denotes the solution semigroup of \partial_{t}v=\mathcal{L}_{\mathcal{C}}v.

Note

e^{\tau\mathcal{L}_{\mathcal{C}}}u(x,(n-k)\tau)=u(x,(n-k)\tau)+\tau\mathcal{L}_{\mathcal{C}}u(x,(n-k)\tau)+\int_{0}^{\tau}(\tau-z)\mathcal{L}_{\mathcal{C}}^{2}e^{z\mathcal{L}_{\mathcal{C}}}u(x,(n-k)\tau)\,dz.

For the remainder term, by Lemma 4.1, it holds for every partition of batches that

\begin{split}\|\mathcal{L}_{\mathcal{C}}^{2}e^{z\mathcal{L}_{\mathcal{C}}}u(x,(n-k)\tau)\|_{\infty}&=\|e^{z\mathcal{L}_{\mathcal{C}}}\mathcal{L}_{\mathcal{C}}^{2}u(x,(n-k)\tau)\|_{\infty}\\ &\leq\|\mathcal{L}_{\mathcal{C}}^{2}u(x,(n-k)\tau)\|_{\infty}\leq C.\end{split}

Hence,

\|\mathcal{S}u(x,(n-k)\tau)-u(x,(n-k)\tau)-\mathbb{E}\,\tau\mathcal{L}_{\mathcal{C}}u(x,(n-k)\tau)\|_{\infty}\leq C\tau^{2},

with C independent of N. Here, we used the fact that e^{\tau\mathcal{L}_{\mathcal{C}}} is nonexpansive in L^{\infty}, similarly as in (4.7).

Similarly,

\begin{split}&u(x,(n-k+1)\tau)=e^{\tau\mathcal{L}}u(x,(n-k)\tau)\\ &=u(x,(n-k)\tau)+\tau\mathcal{L}u(x,(n-k)\tau)+\int_{0}^{\tau}(\tau-z)\mathcal{L}^{2}e^{z\mathcal{L}}u(x,(n-k)\tau)\,dz.\end{split}

The remainder term is again bounded by C\tau^{2}. Since

\mathbb{E}\mathcal{L}_{\mathcal{C}}=\mathcal{L},

one finds that

\|\mathcal{S}u(x,(n-k)\tau)-u(x,(n-k+1)\tau)\|_{\infty}\leq C\tau^{2},

with C independent of N.

Hence,

|\mathcal{S}^{n}\phi(x)-u(x,n\tau)|\leq\sum_{k=1}^{n}\|\mathcal{S}u(x,(n-k)\tau)-u(x,(n-k+1)\tau)\|_{\infty}\leq Cn\tau^{2}\leq C\tau.

The claim is then proved. ∎

Remark 4.3.

This result is reminiscent of the results by Golse et al. [21], where the average of the one-marginal density matrices for the N-body quantum system has been used for the Random Batch Method, and the convergence rate is also O(\tau) under a certain weak norm.

The backward equation for the Random Batch system is given by

\displaystyle\partial_{t}\tilde{u}=\sum_{i=1}^{N}\left[b(x_{i})+\frac{1}{p-1}\sum_{j:j\neq i}I_{ij}^{(k-1)}m_{j}K_{ij}(x_{i},x_{j})\right]\cdot\nabla_{x_{i}}\tilde{u}+\sum_{i}\frac{1}{2}\sigma^{2}\Delta_{x_{i}}\tilde{u}. (4.17)

Hence, Theorem 4.1 in fact implies the following result when we apply the Random Batch Method to backward equations or Liouville equations (\sigma=0).

Theorem 4.2.

Consider the backward equation (4.4) and its corresponding equation (4.17) obtained from the Random Batch Method. If the initial value is given by

\displaystyle u|_{t=0}=\tilde{u}|_{t=0}=\frac{1}{N}\sum_{i}\omega_{i}\varphi(x_{i}), (4.18)

then it holds that

\displaystyle\sup_{t\leq T}\|u(t)-\tilde{u}(t)\|_{\infty}\leq C(T)\tau. (4.19)
Remark 4.4.

For general initial data, the approximation in L^{\infty} given by the Random Batch Method for the backward equation (the Liouville equation when \sigma=0) cannot be uniform in N.

Remark 4.5.

The situation can be different if one considers the Liouville equation of the second order system for the density distribution [27]. Clearly, in this case, we cannot use the L^{\infty} norm to gauge the difference. Instead, a certain weak norm for the combination of one-marginals should be considered, as in [21]. This is left for future work.

Acknowledgement

S. Jin was partially supported by the NSFC grant No. 31571071. The work of L. Li was partially sponsored by NSFC 11901389, Shanghai Sailing Program 19YF1421300 and NSFC 11971314. The work of J.-G. Liu was partially supported by KI-Net NSF RNMS11-07444 and NSF DMS-1812573.

Appendix A Proof of Lemma 3.3

Proof.

The first part is similar to that in [25], except that for the b(\cdot) terms we need to use the one-sided Lipschitz condition. Hence, we omit the proof.

We now prove (3.11). Consider a realization so that the equation is written as

d\tilde{X}^{i}(t)=b(\tilde{X}^{i})\,dt+\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}K_{ij}(\tilde{X}^{i},\tilde{X}^{j})\,dt+\sigma dW^{i},

where \mathcal{C}_{\theta} again is the random batch that contains i in the random division at t_{k-1}, i.e. in \mathcal{C}^{(k-1)}. It follows that

\displaystyle\begin{split}\mathbb{E}\Big{(}\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})\Big{|}\mathcal{F}_{k-1}\Big{)}=&\int_{t_{k-1}}^{t}\mathbb{E}(b(\tilde{X}^{i})|\mathcal{F}_{k-1})\,ds\\ &+\int_{t_{k-1}}^{t}\mathbb{E}\left(\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}K_{ij}(\tilde{X}^{i},\tilde{X}^{j})\Big{|}\mathcal{F}_{k-1}\right)\,ds.\end{split}

Note that K is bounded and |b(x)|\leq C(1+|x|^{q}) for some q>0. Together with (3.10), this implies the first estimate in (3.11).

For the second equation in (3.11), Itô’s formula implies that

\frac{d}{dt}\mathbb{E}\left[|\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})|^{2}|\mathcal{F}_{k-1}\right]=2\mathbb{E}\left[\Big{(}\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})\Big{)}\cdot\left(b(\tilde{X}^{i})+\frac{1}{p-1}\sum_{j\in\mathcal{C}_{\theta},j\neq i}m_{j}K_{ij}(\tilde{X}^{i},\tilde{X}^{j})\right)\Big{|}\mathcal{F}_{k-1}\right]+\sigma^{2}.

Note that

\Big{(}\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})\Big{)}\cdot b(\tilde{X}^{i})\leq\beta|\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})|^{2}+\Big{(}\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})\Big{)}\cdot b(\tilde{X}^{i}(t_{k-1}))\leq C|\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})|^{2}+C(1+|\tilde{X}^{i}(t_{k-1})|^{q}).

Hence,

\frac{d}{dt}\mathbb{E}\left[|\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})|^{2}|\mathcal{F}_{k-1}\right]\leq C\mathbb{E}\left[|\tilde{X}^{i}(t)-\tilde{X}^{i}(t_{k-1})|^{2}|\mathcal{F}_{k-1}\right]+C(1+|\tilde{X}^{i}(t_{k-1})|^{q}).

The claim then follows. ∎

Appendix B Proof of Proposition 4.1

Step 1 – Estimates of \nabla_{x_{i}}u.

Taking \nabla_{x_{\ell}} in (4.4), one has

\displaystyle\partial_{t}\nabla_{x_{\ell}}u=\mathcal{L}\nabla_{x_{\ell}}u+f_{\ell}(x) (B.1)

with

f_{\ell}(x,t)=\left[\nabla_{x_{\ell}}b(x_{\ell})+\frac{1}{N-1}\sum_{j=1}^{N}m_{j}\nabla_{x_{\ell}}K_{\ell j}(x_{\ell},x_{j})\right]\cdot\nabla_{x_{\ell}}u+\frac{1}{N-1}\sum_{j=1}^{N}m_{\ell}\nabla_{x_{\ell}}K_{j\ell}(x_{j},x_{\ell})\cdot\nabla_{x_{j}}u. (B.2)

Here \nabla_{x_{\ell}}K is a second order tensor and we use the convention that (A\cdot v)_{i}:=\sum_{j}A_{ij}v_{j}.

Hence,

\nabla_{x_{\ell}}u(t)=e^{t\mathcal{L}}\nabla_{x_{\ell}}u(0)+\int_{0}^{t}e^{(t-s)\mathcal{L}}f_{\ell}(x,s)\,ds.

Since the semigroup e^{t\mathcal{L}} is nonexpansive in L^{\infty} as in (4.7), we find

\|\nabla_{x_{\ell}}u(t)\|_{\infty}\leq\|\nabla_{x_{\ell}}u(0)\|_{\infty}+\int_{0}^{t}\|f_{\ell}(x,s)\|_{\infty}\,ds.

Moreover, by (B.2), the linear transform from [\nabla_{x_{1}}u,\cdots,\nabla_{x_{N}}u] to [f_{1},\cdots,f_{N}] is a block matrix with bounded L^{\infty} norm. This is because all the off-diagonal blocks are of order O(\frac{1}{N}). Hence, with a(t):=\max_{\ell}\|\nabla_{x_{\ell}}u(t)\|_{\infty},

\|f_{\ell}(x,s)\|_{\infty}\leq Ca(s).

Therefore,

a(t)\leq a(0)+C\int_{0}^{t}a(s)\,ds.

This means

\sup_{t\leq T}a(t)\leq C(T)a(0).

Clearly, since

u(0)=\frac{1}{N}\sum_{i}\omega_{i}\varphi(x_{i})\Rightarrow\|\nabla_{x_{i}}u(0)\|_{\infty}\leq C\frac{1}{N},

the estimate for \nabla_{x_{i}}u follows by Grönwall’s lemma.

Step 2 – Second order derivatives.

We first compute the \nabla_{x_{\ell}}^{2} derivatives (the x_{\ell} Hessian):

\displaystyle\partial_{t}\nabla_{x_{\ell}}^{2}u=\mathcal{L}\nabla_{x_{\ell}}^{2}u+\tilde{f}_{\ell}, (B.3)

with

\displaystyle\begin{split}\tilde{f}_{\ell}=&2\nabla_{x_{\ell}}b\cdot\nabla_{x_{\ell}}^{2}u+\frac{1}{N-1}\sum_{j}2m_{j}\nabla_{x_{\ell}}K_{\ell j}(x_{\ell},x_{j})\cdot\nabla_{x_{\ell}}^{2}u\\ &+\frac{1}{N-1}\sum_{i:i\neq\ell}2m_{\ell}\nabla_{x_{\ell}}K_{i\ell}(x_{i},x_{\ell})\cdot\nabla_{x_{i}}\nabla_{x_{\ell}}u\\ &+\left(\nabla_{x_{\ell}}^{2}b+\frac{1}{N-1}\sum_{j}m_{j}\nabla_{x_{\ell}}^{2}K_{\ell j}(x_{\ell},x_{j})\right)\cdot\nabla_{x_{\ell}}u\\ &+\frac{1}{N-1}\sum_{i}m_{\ell}\nabla_{x_{\ell}}^{2}K_{i\ell}(x_{i},x_{\ell})\cdot\nabla_{x_{i}}u.\end{split} (B.4)

Again, \nabla_{x_{i}}b for a vector field b is a second order tensor with (\nabla_{x_{i}}b)_{rs}=\partial_{x_{i}^{(r)}}b^{(s)}. For the dot product between tensors, A\cdot B means the contraction between the last index of A and the first index of B. For example, we use the convention that

(\nabla_{x_{\ell}}K_{i\ell}(x_{i},x_{\ell})\cdot\nabla_{x_{i}}\nabla_{x_{\ell}}u)_{rs}:=\sum_{q=1}^{d}\partial_{x_{\ell}^{(r)}}K_{i\ell}^{(q)}\partial_{x_{i}^{(q)}}\partial_{x_{\ell}^{(s)}}u.

Thus,

\displaystyle\begin{split}\tilde{f}_{\ell}=&\tilde{A}_{\ell}^{1}\cdot\nabla_{x_{\ell}}^{2}u+\frac{1}{N-1}\sum_{i}\tilde{A}_{i\ell}^{2}\cdot\nabla_{x_{i}}\nabla_{x_{\ell}}u\\ &+\tilde{A}_{\ell}^{3}\cdot\nabla_{x_{\ell}}u+\frac{1}{N-1}\sum_{i}\tilde{A}_{i\ell}^{4}\cdot\nabla_{x_{i}}u.\end{split}

Here, all the \tilde{A} tensors are bounded independently of N. There are O(N) terms in the summation in the first line, and thus the linear transform is again bounded from L^{\infty} to L^{\infty} independently of N. The terms on the second line are clearly controlled by C\frac{1}{N}, by the estimates of \nabla_{x_{i}}u.

Similarly, one can compute, for \ell\neq q,

\displaystyle\partial_{t}\nabla_{x_{\ell}}\nabla_{x_{q}}u=\mathcal{L}\nabla_{x_{\ell}}\nabla_{x_{q}}u+g_{\ell q}, (B.5)

where the expression of g_{\ell q} is complicated but is of the following form:

g_{\ell q}=A_{\ell q}^{1}\cdot\nabla_{x_{\ell}}\nabla_{x_{q}}u+\frac{1}{N-1}\sum_{j}(A_{j\ell}^{2}\cdot\nabla_{x_{j}}\nabla_{x_{\ell}}u+A_{jq}^{3}\cdot\nabla_{x_{j}}\nabla_{x_{q}}u)\\ +\frac{1}{N-1}(m_{\ell}\nabla_{x_{\ell}}\nabla_{x_{q}}K_{\ell q}(x_{\ell},x_{q})\cdot\nabla_{x_{\ell}}u+m_{q}\nabla_{x_{q}}\nabla_{x_{\ell}}K_{q\ell}(x_{q},x_{\ell})\cdot\nabla_{x_{q}}u). (B.6)

Here, the A^{k}’s are second order tensors that are made up of the derivatives of b,K_{ij}, and are thus bounded by constants independent of N. Note that there are O(N) terms in the summation on the first line, and therefore the linear transform is again bounded from L^{\infty} to L^{\infty} independently of N. The terms on the second line are clearly controlled by C\frac{1}{N^{2}}, by the estimates of \nabla_{x_{i}}u.

We first consider all the second order derivatives \nabla_{x_{\ell}}\nabla_{x_{q}}u, where \ell may or may not equal q. By a similar argument as for \nabla_{x_{i}}u, one can get the estimate

\displaystyle\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(t)\|_{\infty}\leq\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(0)\|_{\infty}+C\int_{0}^{t}\max_{\ell,q}\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(s)\|_{\infty}\,ds+C(T)\frac{1}{N}. (B.7)

Moreover,

\|\nabla_{x_{\ell}}^{2}u(0)\|_{\infty}\leq C\frac{1}{N},~{}~{}~{}\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(0)\|_{\infty}=0~{}(\ell\neq q).

Then, Grönwall’s inequality tells us that

\displaystyle\max_{\ell,q}\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(t)\|_{\infty}\leq C(T)\frac{1}{N}. (B.8)

Now, we focus on (B.5) for \ell\neq q. To do this, we need to separate out the j=\ell and j=q terms in (B.6), which are of order O(\frac{1}{N}); thus, with the prefactor 1/(N-1), they contribute O(1/N^{2}) to g_{\ell q}. Hence, one has for \ell\neq q that

\displaystyle\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(t)\|_{\infty}\leq\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(0)\|_{\infty}+C\int_{0}^{t}\max_{\ell\neq q}\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(s)\|_{\infty}\,ds+C(T)\frac{1}{N^{2}}. (B.9)

Since \|\nabla_{x_{\ell}}\nabla_{x_{q}}u(0)\|_{\infty}=0 for \ell\neq q, one then has

\displaystyle\max_{\ell\neq q}\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(t)\|_{\infty}\leq C\int_{0}^{t}\max_{\ell\neq q}\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(s)\|_{\infty}\,ds+C(T)\frac{1}{N^{2}}. (B.10)

Grönwall’s lemma then tells us that

\displaystyle\max_{\ell\neq q}\|\nabla_{x_{\ell}}\nabla_{x_{q}}u(t)\|_{\infty}\leq C\frac{1}{N^{2}}. (B.11)

Step 3 – Higher order derivatives.

The higher order derivatives can be estimated similarly, using induction. In these proofs, some derivatives that are not listed in Proposition 4.1 are involved. For example, for third order derivatives, one should expect

\|\nabla_{x_{i}}\nabla_{x_{j}}\nabla_{x_{k}}u\|_{\infty}=O(1/N^{3})

for distinct i,j,k. These proofs are tedious, but the essential ideas are the same as in the proof of the claims for the Hessian. We omit the details.

References

  • [1] G. Albi and L. Pareschi. Binary interaction algorithms for the simulation of flocking and swarming dynamics. Multiscale Modeling & Simulation, 11(1):1–29, 2013.
  • [2] G. Allaire, J.-F. Dufrêche, A. Mikelić, and A. Piatnitski. Asymptotic analysis of the Poisson–Boltzmann equation describing electrokinetics in porous media. Nonlinearity, 26(3):881, 2013.
  • [3] D. F. Anderson and J. C. Mattingly. A weak trapezoidal method for a class of stochastic differential equations. Commun. Math. Sci., 9(1):301–318, 2011.
  • [4] Hans Babovsky and Reinhard Illner. A convergence proof for Nanbu’s simulation method for the full Boltzmann equation. SIAM journal on numerical analysis, 26(1):45–65, 1989.
  • [5] A. L. Bertozzi, J. B. Garnett, and T. Laurent. Characterization of radially symmetric finite time blowup in multidimensional aggregation equations. SIAM J. Math. Anal., 44(2):651–681, 2012.
  • [6] G. A. Bird. Approach to translational equilibrium in a rigid sphere gas. The Physics of Fluids, 6(10):1518–1519, 1963.
  • [7] C. K. Birdsall and A. B. Langdon. Plasma physics via computer simulation. CRC press, 2004.
  • [8] L. Bottou. Online learning and stochastic approximations. On-line learning in neural networks, 17(9):142, 1998.
  • [9] S. Bubeck. Convex optimization: Algorithms and complexity. Foundations and Trends® in Machine Learning, 8(3-4):231–357, 2015.
  • [10] E. Carlen, P. Degond, and B. Wennberg. Kinetic limits for pair-interaction driven master equations and biological swarm models. Mathematical Models and Methods in Applied Sciences, 23(07):1339–1376, 2013.
  • [11] J. A. Carrillo, L. Pareschi, and M. Zanella. Particle based gPC methods for mean-field models of swarming with uncertainty. Communications in Computational Physics, 25(2), 2019.
  • [12] A. J. Chorin. Numerical study of slightly viscous flow. J. Fluid Mech., 57(4):785–796, 1973.
  • [13] K. Craig and A. L Bertozzi. A blob method for the aggregation equation. Math. Comp., 85(300):1681–1717, 2016.
  • [14] F. Cucker and S. Smale. Emergent behavior in flocks. IEEE Transactions on automatic control, 52(5):852–862, 2007.
  • [15] P. Degond, J.-G. Liu, and R. L. Pego. Coagulation–fragmentation model for animal group-size statistics. Journal of Nonlinear Science, 27(2):379–424, 2017.
  • [16] R. Durrett. Probability: Theory and Examples. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 4 edition, 2010.
  • [17] R. Durstenfeld. Algorithm 235: random permutation. Communications of the ACM, 7(7):420, 1964.
  • [18] Y. Feng, L. Li, and J.-G. Liu. Semi-groups of stochastic gradient descent and online principal component analysis: properties and diffusion approximations. Commun. Math. Sci., 16(3), 2018.
  • [19] D. Frenkel and B. Smit. Understanding molecular simulation: from algorithms to applications, volume 1. Elsevier, 2001.
  • [20] A. Georges, G. Kotliar, W. Krauth, and M. J. Rozenberg. Dynamical mean-field theory of strongly correlated fermion systems and the limit of infinite dimensions. Reviews of Modern Physics, 68(1):13, 1996.
  • [21] F. Golse, S. Jin, and T. Paul. The random batch method for n-body quantum dynamics. arXiv preprint arXiv:1912.07424, 2019.
  • [22] K. E. Gustafson and J. A. Sethian. Vortex methods and vortex motion, volume 19. SIAM, 1991.
  • [23] S.-Y. Ha and J.-G. Liu. A simple proof of the Cucker-Smale flocking dynamics and mean-field limit. Commun. Math. Sci., 7(2):297–325, 2009.
  • [24] D. Horstmann. From 1970 until present: the Keller-Segel model in chemotaxis and its consequences. Jahresber. Dtsch. Math.-Ver., 105:103–165, 2003.
  • [25] S. Jin, L. Li, and J.-G. Liu. Random Batch methods (RBM) for interacting particle systems. Journal of Computational Physics, 400:108877, 2020.
  • [26] S. Jin, L. Li, J.-G. Liu, and Y. Tang. A direct simulation approach for the Poisson-Boltzmann equation using the Random Batch Method. preprint, 2020.
  • [27] S. Jin and X. Wen. Hamiltonian-preserving schemes for the Liouville equation with discontinuous potentials. Commun. Math. Sci., 3(3):285–315, 2005.
  • [28] P. E. Kloeden and E. Platen. Numerical solution of stochastic differential equations, volume 23. Springer Science & Business Media, 2013.
  • [29] J.-M. Lasry and P.-L. Lions. Mean field games. Japanese journal of mathematics, 2(1):229–260, 2007.
  • [30] L. Li, Y. Li, J.-G. Liu, Z. Liu, and J. Lu. A stochastic version of Stein variational gradient descent for efficient sampling. Comm. App. Math. Comp. Sci., 2020.
  • [31] L. Li and J.-G. Liu. Large time behaviors of upwind schemes and B-schemes for Fokker-Planck equations on \mathbb{R} by jump processes. Math. Comp., 2020.
  • [32] L. Li, J. Lu, J. C. Mattingly, and L. Wang. Numerical methods for stochastic differential equations based on Gaussian mixture. arXiv preprint arXiv:1812.11932, 2018.
  • [33] J.-G. Liu and Z. Xin. Convergence of the point vortex method for 2-d vortex sheet. Math. Comput., 70(234):595–606, 2001.
  • [34] J.-G. Liu and R. Yang. A random particle blob method for the Keller-Segel equation and convergence analysis. Mathematics of Computation, 86(304):725–745, 2017.
  • [35] Q. Liu and D. Wang. Stein variational gradient descent: A general purpose bayesian inference algorithm. In Advances In Neural Information Processing Systems, pages 2378–2386, 2016.
  • [36] G. N. Milstein and M. V. Tretyakov. Stochastic numerics for mathematical physics. Springer Science & Business Media, 2013.
  • [37] G. N. Milstein. Weak approximation of solutions of systems of stochastic differential equations. Theory of Probability & Its Applications, 30(4):750–766, 1986.
  • [38] S. Motsch and E. Tadmor. Heterophilious dynamics enhances consensus. SIAM review, 56(4):577–621, 2014.
  • [39] Kenichi Nanbu. Direct simulation scheme derived from the Boltzmann equation. i. monocomponent gases. Journal of the Physical Society of Japan, 49(5):2042–2049, 1980.
  • [40] B. Øksendal. Stochastic differential equations: an introduction with applications. Springer, Berlin, Heidelberg, sixth edition, 2003.
  • [41] H. Robbins and S. Monro. A stochastic approximation method. The annals of mathematical statistics, pages 400–407, 1951.
  • [42] V. Rokhlin. Rapid solution of integral equations of classical potential theory. Journal of computational physics, 60(2):187–207, 1985.
  • [43] H. E. Stanley. Phase transitions and critical phenomena. Clarendon Press, Oxford, 1971.
  • [44] T. Vicsek, A. Czirók, E. Ben-Jacob, I. Cohen, and O. Shochet. Novel type of phase transition in a system of self-driven particles. Physical review letters, 75(6):1226, 1995.
  • [45] M. Welling and Y. W. Teh. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 681–688, 2011.