Convergence Analysis of Schrödinger-Föllmer Sampler without Convexity
Abstract
The Schrödinger-Föllmer sampler (SFS) [26] is a novel and efficient approach for sampling from possibly unnormalized distributions without ergodicity. SFS is based on the Euler-Maruyama discretization of the Schrödinger-Föllmer diffusion process
on the unit interval $[0,1]$, which transports the degenerate distribution $\delta_0$ at time zero to the target distribution at time one. In [26], the consistency of SFS is established under a restrictive assumption that the potential is uniformly (on $[0,1]$) strongly convex (on $\mathbb{R}^d$). In this paper we provide a nonasymptotic error bound for SFS in Wasserstein distance under smoothness and boundedness conditions on the density ratio of the target distribution over the standard normal distribution, but without requiring strong convexity of the potential.
1 Introduction
Sampling from possibly unnormalized distributions is an important task in Bayesian statistics and machine learning. Ever since the Metropolis-Hastings (MH) algorithm [32, 25] was introduced, various random sampling methods have been proposed, including the Gibbs sampler, random walk sampler, independent sampler, Langevin sampler, bouncy particle sampler, and zig-zag sampler [23, 22, 37, 29, 35, 4, 2], among others; see [5, 14] and the references therein. The above-mentioned sampling algorithms generate random samples by running an ergodic Markov chain whose stationary distribution is the target distribution.
In [26], the Schrödinger-Föllmer sampler (SFS), a novel sampling approach that does not require ergodicity, was proposed. SFS is based on the Schrödinger-Föllmer diffusion process, defined as
$$\mathrm{d}X_t = b(X_t, t)\,\mathrm{d}t + \mathrm{d}B_t, \qquad X_0 = 0, \quad t \in [0, 1], \tag{1}$$
where $\{B_t\}_{t \in [0,1]}$ is a $d$-dimensional Brownian motion and the drift function is
$$b(x, t) = \nabla \log Q_{1-t} f(x), \qquad Q_{1-t} f(x) = \mathbb{E}_{Z \sim N(0, I_d)}\big[f(x + \sqrt{1-t}\, Z)\big],$$
with $f = \mathrm{d}\mu / \mathrm{d}N(0, I_d)$ the density ratio of the target distribution $\mu$ over the $d$-dimensional standard normal distribution. According to [28] and [18], the process in (1) was first formulated by Föllmer [19, 20, 21] when studying the Schrödinger bridge problem [36]. The main feature of the above Schrödinger-Föllmer process is that it interpolates between $\delta_0$ and the target $\mu$ in time $[0,1]$, i.e., $X_0 \sim \delta_0$ and $X_1 \sim \mu$; see Proposition 1. SFS samples from $\mu$ via the following Euler-Maruyama discretization of (1),
$$Y_{t_{k+1}} = Y_{t_k} + s\, b(Y_{t_k}, t_k) + \sqrt{s}\, \epsilon_{k+1}, \qquad Y_{t_0} = 0, \quad k = 0, 1, \ldots, K-1,$$
where $s = 1/K$ is the step size, $t_k = k s$, and $\epsilon_1, \ldots, \epsilon_K$ are independent and identically distributed from $N(0, I_d)$. If the expectations in the drift term $b$ do not have analytical forms, one can use the Monte Carlo method to evaluate $b$ approximately, i.e., one can sample according to
$$\tilde{Y}_{t_{k+1}} = \tilde{Y}_{t_k} + s\, \tilde{b}_m(\tilde{Y}_{t_k}, t_k) + \sqrt{s}\, \epsilon_{k+1}, \qquad \tilde{Y}_{t_0} = 0,$$
where $\tilde{b}_m$ is an $m$-sample Monte Carlo estimate of $b$ built from i.i.d. $Z_1, \ldots, Z_m \sim N(0, I_d)$; see (6)-(7) below. The numerical simulations in [26] demonstrate that SFS outperforms the existing samplers based on ergodicity.
In Section 4.2 of [26], they prove that
$$W_2\big(\mathrm{Law}(\tilde{Y}_{t_K}), \mu\big) \to 0$$
under a restrictive assumption that the potential $U(x, t) := \|x\|_2^2 / 2 - \log Q_{1-t} f(x)$ is uniformly strongly convex, i.e.,
$$\nabla_x^2 U(x, t) \succeq \kappa I_d \quad \text{for all } (x, t) \in \mathbb{R}^d \times [0, 1], \tag{2}$$
where $\kappa$ is a finite and positive constant. In this paper we provide a new analysis of the above SFS iteration. We establish a nonasymptotic error bound on $W_2(\mathrm{Law}(\tilde{Y}_{t_K}), \mu)$ under the condition that $f$ and $\nabla f$ are Lipschitz continuous and $f$ has a positive lower bound, but without using the uniform strong convexity requirement (2).
2 Schrödinger-Föllmer sampler
In this section we briefly recall the Schrödinger-Föllmer sampler. For more background on the Schrödinger-Föllmer diffusion process, please see [10, 28, 7, 26].
Let $\mu$ be the target distribution on $\mathbb{R}^d$, absolutely continuous with respect to the $d$-dimensional standard normal measure $\gamma_d = N(0, I_d)$. Let
$$f = \frac{\mathrm{d}\mu}{\mathrm{d}\gamma_d}$$
denote the corresponding density ratio. We assume that
- (A1) $f$ and $\nabla f$ are Lipschitz continuous with constant $\gamma > 0$;
- (A2) there exists $\xi > 0$ such that $f(x) \ge \xi$ for all $x \in \mathbb{R}^d$.
Define the heat semigroup $\{Q_t\}_{t \ge 0}$ as
$$Q_t u(x) = \mathbb{E}_{Z \sim N(0, I_d)}\big[u(x + \sqrt{t}\, Z)\big].$$
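As a simple closed-form illustration of the action of $Q_t$ (our example, not from [26]): for $u(x) = e^{a^{\top} x}$ with a fixed vector $a \in \mathbb{R}^d$,
$$Q_t u(x) = \mathbb{E}_{Z \sim N(0, I_d)}\big[e^{a^{\top}(x + \sqrt{t}\, Z)}\big] = e^{a^{\top} x}\, \mathbb{E}\big[e^{\sqrt{t}\, a^{\top} Z}\big] = e^{a^{\top} x + t \|a\|_2^2 / 2},$$
using the Gaussian moment generating function $\mathbb{E}[e^{v^{\top} Z}] = e^{\|v\|_2^2 / 2}$.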
Proposition 1.
Define a drift function
$$b(x, t) = \nabla \log Q_{1-t} f(x), \qquad (x, t) \in \mathbb{R}^d \times [0, 1].$$
If $f$ satisfies assumptions (A1) and (A2), then the Schrödinger-Föllmer diffusion
$$\mathrm{d}X_t = b(X_t, t)\,\mathrm{d}t + \mathrm{d}B_t, \qquad X_0 = 0, \quad t \in [0, 1], \tag{3}$$
has a unique strong solution and $X_1 \sim \mu$.
Remark 2.1.
- (i) From the definition of $b$ in Proposition 1, it follows that
$$b(x, t) = \frac{Q_{1-t} \nabla f(x)}{Q_{1-t} f(x)} = \frac{\mathbb{E}_{Z \sim N(0, I_d)}\big[\nabla f(x + \sqrt{1-t}\, Z)\big]}{\mathbb{E}_{Z \sim N(0, I_d)}\big[f(x + \sqrt{1-t}\, Z)\big]}.$$
- (ii) If the target distribution is $\mu(\mathrm{d}x) = \exp(-V(x))\,\mathrm{d}x / C$ with normalizing constant $C$, then $f(x) \propto \exp\big(\|x\|_2^2 / 2 - V(x)\big)$. Once $V$ is twice differentiable and the first and second derivatives of $\exp\big(\|x\|_2^2 / 2 - V(x)\big)$ are uniformly bounded over $\mathbb{R}^d$, both $f$ and $\nabla f$ are Lipschitz continuous, i.e., (A1) holds. (A2) is equivalent to the growth condition on the potential that $V(x) \le \|x\|_2^2 / 2 + c$ for some finite constant $c$.
- (iii) Under (A1) and (A2), some calculation shows that
$$\sup_{(x, t) \in \mathbb{R}^d \times [0, 1]} \|b(x, t)\|_2 \le \frac{\gamma}{\xi},$$
and
$$\|b(x, t) - b(y, t)\|_2 \le \frac{\gamma}{\xi}\Big(1 + \frac{\gamma}{\xi}\Big) \|x - y\|_2, \qquad x, y \in \mathbb{R}^d,$$
and that $b(x, \cdot)$ is Hölder continuous of order $1/2$ in the time variable; a worked derivation of the first bound is sketched after this remark. We conclude that $b$ satisfies the growth and regularity conditions (C1)-(C3) stated in Section 3.
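To make the "some calculation" in item (iii) concrete, here is a short derivation of the uniform bound on the drift (our sketch, under (A1)-(A2) with the constants $\gamma$ and $\xi$ introduced above). Since $f$ is $\gamma$-Lipschitz, $\|\nabla f(x)\|_2 \le \gamma$ for all $x$, and $Q_{1-t} f(x) = \mathbb{E}_Z[f(x + \sqrt{1-t}\, Z)] \ge \xi$ by (A2); hence
$$\|b(x, t)\|_2 = \frac{\big\|\mathbb{E}_Z\big[\nabla f(x + \sqrt{1-t}\, Z)\big]\big\|_2}{\mathbb{E}_Z\big[f(x + \sqrt{1-t}\, Z)\big]} \le \frac{\mathbb{E}_Z\big[\|\nabla f(x + \sqrt{1-t}\, Z)\|_2\big]}{\xi} \le \frac{\gamma}{\xi}.$$
The space-Lipschitz bound in (iii) follows in the same way after adding and subtracting $Q_{1-t} \nabla f(y) / Q_{1-t} f(x)$.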
Proposition 1 shows that the Schrödinger-Föllmer diffusion transports $\delta_0$ to the target $\mu$ over the unit time interval. Moreover, the drift term is scale-invariant with respect to $f$ in the sense that, for any constant $c > 0$, $\nabla \log Q_{1-t}(c f)(x) = \nabla \log Q_{1-t} f(x)$. Therefore, the Schrödinger-Föllmer diffusion can be used for sampling from $\mu$ even when the normalizing constant of $\mu$ is unknown. To this end, we use the Euler-Maruyama method to discretize the Schrödinger-Föllmer diffusion (3). Let $s = 1/K$ be the step size and $t_k = k s$ for $k = 0, 1, \ldots, K$;
the Euler-Maruyama scheme reads
$$Y_{t_{k+1}} = Y_{t_k} + s\, b(Y_{t_k}, t_k) + \sqrt{s}\, \epsilon_{k+1}, \qquad Y_{t_0} = 0, \quad k = 0, 1, \ldots, K-1, \tag{4}$$
where $\epsilon_1, \ldots, \epsilon_K$ are i.i.d. $N(0, I_d)$ and
$$b(x, t_k) = \frac{\mathbb{E}_{Z \sim N(0, I_d)}\big[\nabla f(x + \sqrt{1-t_k}\, Z)\big]}{\mathbb{E}_{Z \sim N(0, I_d)}\big[f(x + \sqrt{1-t_k}\, Z)\big]}. \tag{5}$$
The expectations in (5) may not admit explicit expressions. In that case, we can obtain an estimator of $b$ by replacing each expectation in (5) with an $m$-sample mean, i.e.,
$$\tilde{b}_m(x, t_k) = \frac{\frac{1}{m} \sum_{j=1}^m \nabla f(x + \sqrt{1-t_k}\, Z_j)}{\frac{1}{m} \sum_{j=1}^m f(x + \sqrt{1-t_k}\, Z_j)}, \tag{6}$$
or, using the Gaussian integration-by-parts identity $\mathbb{E}_Z[\nabla f(x + \sqrt{1-t}\, Z)] = \mathbb{E}_Z[Z f(x + \sqrt{1-t}\, Z)] / \sqrt{1-t}$,
$$\tilde{b}_m(x, t_k) = \frac{\frac{1}{m} \sum_{j=1}^m Z_j f(x + \sqrt{1-t_k}\, Z_j)}{\sqrt{1-t_k}\, \cdot\, \frac{1}{m} \sum_{j=1}^m f(x + \sqrt{1-t_k}\, Z_j)}, \tag{7}$$
where $Z_1, \ldots, Z_m$ are i.i.d. $N(0, I_d)$.
The detailed description of SFS is summarized in Algorithm 1 below, which is Algorithm 2 in [26].
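To make the procedure concrete, the following is a minimal NumPy sketch of the iteration (4) with the Monte Carlo drift estimator (6). It is our illustration rather than the authors' code; the names `sfs_sampler`, `f`, and `grad_f` are ours, and `f`, `grad_f` are assumed to be vectorized over the rows of their input.

```python
import numpy as np

def sfs_sampler(f, grad_f, d, K=100, m=1000, rng=None):
    """Schrödinger-Föllmer sampler: Euler-Maruyama scheme (4)
    with the m-sample Monte Carlo drift estimator (6).

    f      : density ratio d(mu)/d(N(0, I_d)); maps an (n, d) array to (n,)
    grad_f : gradient of f; maps an (n, d) array to (n, d)
    """
    rng = np.random.default_rng() if rng is None else rng
    s = 1.0 / K                              # step size
    y = np.zeros(d)                          # Y_{t_0} = 0
    for k in range(K):
        t = k * s                            # t_k = k s, so 1 - t_k >= 1/K > 0
        z = rng.standard_normal((m, d))      # Z_1, ..., Z_m ~ N(0, I_d)
        x = y + np.sqrt(1.0 - t) * z         # points y + sqrt(1 - t_k) Z_j
        b = grad_f(x).mean(axis=0) / f(x).mean()              # estimator (6)
        y = y + s * b + np.sqrt(s) * rng.standard_normal(d)   # update (4)
    return y

# Sanity check: for f == 1 the target is N(0, I_d) itself and the drift vanishes.
sample = sfs_sampler(lambda x: np.ones(len(x)), lambda x: np.zeros_like(x), d=2)
```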
In Section 4.2 of [26], they proved that $W_2(\mathrm{Law}(\tilde{Y}_{t_K}), \mu) \to 0$
under the restrictive assumption that the potential is uniformly strongly convex, i.e., satisfies (2). However, (2) is not easy to verify. In the next section, we establish a nonasymptotic bound on the Wasserstein distance between the law of $\tilde{Y}_{t_K}$ generated by SFS (Algorithm 1) and the target $\mu$ under the smoothness and boundedness conditions (A1) and (A2), but without using the uniform strong convexity assumption (2) on the potential.
3 Nonasymptotic bound without convexity
Under conditions (A1) and (A2), one can easily deduce the growth condition and the Lipschitz/Hölder continuity of the drift term $b$ [26], i.e., for all $x, y \in \mathbb{R}^d$ and $t, u \in [0, 1]$,
$$\|b(x, t)\|_2 \le C_1 (1 + \|x\|_2), \tag{C1}$$
$$\|b(x, t) - b(y, t)\|_2 \le C_2 \|x - y\|_2, \tag{C2}$$
$$\|b(x, t) - b(y, u)\|_2 \le C_2 \big(\|x - y\|_2 + (1 + \|x\|_2)\, |t - u|^{1/2}\big), \tag{C3}$$
where $C_1$ and $C_2$ are two finite and positive constants.
Remark 3.1.
(C1) and (C2) are essentially the sufficient conditions under which the Schrödinger-Föllmer SDE (3) admits a unique strong solution. (C3) was introduced in Theorem 4.1 of [38], and it is also similar to condition H2 of [6] and Assumption 3.2 of [1]. Obviously, (C3) implies (C2), and (C1) holds if the drift term is bounded over $\mathbb{R}^d \times [0, 1]$, which is the case here by Remark 2.1(iii).
Let $\mathcal{C}(\mu_1, \mu_2)$ be the collection of coupling probability measures on $\mathbb{R}^d \times \mathbb{R}^d$ whose respective marginal distributions are $\mu_1$ and $\mu_2$. The Wasserstein distance of order 2, with which we measure the discrepancy between $\mathrm{Law}(\tilde{Y}_{t_K})$ and $\mu$, is defined as
$$W_2(\mu_1, \mu_2) = \inf_{\nu \in \mathcal{C}(\mu_1, \mu_2)} \Big( \int_{\mathbb{R}^d \times \mathbb{R}^d} \|x - y\|_2^2 \, \nu(\mathrm{d}x, \mathrm{d}y) \Big)^{1/2}.$$
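For intuition about the quantity being bounded, the following minimal sketch (our illustration, not part of the analysis) estimates $W_2$ in the one-dimensional case, where the optimal coupling between two equal-size empirical samples is the monotone (quantile) coupling, so $W_2$ reduces to matching sorted samples:

```python
import numpy as np

def w2_empirical_1d(x, y):
    """Order-2 Wasserstein distance between two equal-size empirical
    measures on the real line, via the monotone (sorted) coupling."""
    return np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2))

rng = np.random.default_rng(0)
a = rng.standard_normal(100_000)           # sample from N(0, 1)
b = 0.5 + rng.standard_normal(100_000)     # sample from N(0.5, 1)
print(w2_empirical_1d(a, b))               # approximately 0.5, the exact W_2
```

For two Gaussians with equal variance, $W_2$ equals the distance between the means, which the empirical estimate recovers up to sampling error.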
Theorem 2.
Assume (A1) and (A2) hold; then
where $s = 1/K$ is the step size and $m$ is the Monte Carlo sample size in (6)-(7).
Remark 3.2.
This theorem provides guidance on the selection of $s$ and $m$. To ensure convergence of the distribution of $\tilde{Y}_{t_K}$, we should take the step size $s$ sufficiently small and the Monte Carlo sample size $m$ sufficiently large. In high-dimensional models with a large $d$, we need to generate a large number of random vectors from $N(0, I_d)$ to obtain an accurate estimate of the drift term $b$. If we assume that $f$ is bounded from above, we can improve the nonasymptotic error bound, in the sense that the required Monte Carlo sample size $m$ can be reduced; see Theorem 3.
Theorem 3.
Assume that, in addition to the conditions of Theorem 2, $f$ has a finite upper bound; then
where $s = 1/K$ is the step size.
Remark 3.3.
With the boundedness condition on $f$, to ensure convergence of the sampling distribution, we can again take the step size $s$ sufficiently small and $m$ sufficiently large. Note that the sample size $m$ required for approximating the drift term is significantly less stringent than that in Theorem 2.
Remark 3.4.
Langevin sampling methods have been studied under the (strongly) convex potential assumption [15, 16, 17, 11, 12, 8, 13], the dissipativity condition on the drift term [34, 33, 40], and the local convexity condition on the potential function outside a ball [17, 9, 30, 3]. However, these conditions may not hold for models with multiple modes, for example Gaussian mixtures, whose potentials are not convex and for which the log-Sobolev inequality may not be satisfied. Moreover, the constant in the log-Sobolev inequality may depend exponentially on the dimension [39, 24, 31, 34], implying that Langevin samplers can suffer from the curse of dimensionality. SFS does not require the underlying Markov process to be ergodic; therefore, our results in Theorems 2 and 3 are established under the smoothness and boundedness assumptions (A1) and (A2) on $f$, and do not need the above-mentioned conditions used in the analysis of Langevin samplers.
In Theorem 2 and Theorem 3 we use (A2), i.e., $f$ has a positive lower bound; however, (A2) may not hold if the target distribution admits compact support. To circumvent this difficulty, we consider the regularized probability measure
$$\mu_\epsilon = (1 - \epsilon)\, \mu + \epsilon\, \gamma_d, \qquad \epsilon \in (0, 1).$$
The corresponding density ratio is
$$f_\epsilon = \frac{\mathrm{d}\mu_\epsilon}{\mathrm{d}\gamma_d} = (1 - \epsilon) f + \epsilon.$$
Obviously, $f_\epsilon \ge \epsilon$, so $f_\epsilon$ satisfies (A1) and (A2) as long as $f$ and $\nabla f$ are Lipschitz continuous. Since $\mu_\epsilon$ approximates $\mu$ well if we set $\epsilon$ small enough, we consider sampling from $\mu_\epsilon$ by running SFS (Algorithm 1) with $f$ replaced by $f_\epsilon$, as sketched below. We use $\tilde{Y}^{\epsilon}_{t_K}$ to denote the last iterate of SFS.
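To make this modification concrete, here is a minimal sketch (ours, not from [26]) of the Monte Carlo drift estimator with $f$ replaced by $f_\epsilon$; it uses $\nabla f_\epsilon = (1 - \epsilon) \nabla f$ and the fact that the denominator is now at least $\epsilon$:

```python
import numpy as np

def drift_regularized(y, t, f, grad_f, m, eps, rng):
    """Monte Carlo drift estimator (6) for the regularized ratio
    f_eps = (1 - eps) * f + eps, whose denominator is at least eps > 0."""
    z = rng.standard_normal((m, y.shape[0]))     # Z_1, ..., Z_m ~ N(0, I_d)
    x = y + np.sqrt(1.0 - t) * z                 # points y + sqrt(1 - t) Z_j
    num = (1.0 - eps) * grad_f(x).mean(axis=0)   # grad f_eps = (1 - eps) grad f
    den = (1.0 - eps) * f(x).mean() + eps        # f_eps >= eps by construction
    return num / den
```

Plugging this drift into the Euler-Maruyama loop of the earlier sketch yields samples approximately distributed as $\mu_\epsilon$, and hence, for small $\epsilon$, approximately as $\mu$.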
Theorem 4.
Assume (A1) holds and choose $\epsilon$ accordingly; then
where $s = 1/K$ is the step size and the constant in the bound depends on $\epsilon$. Moreover, if $f$ has a finite upper bound and $\epsilon$ is chosen accordingly, then
4 Conclusion
In [26], the Schrödinger-Föllmer sampler (SFS) was proposed for sampling from possibly unnormalized distributions. The key feature of SFS is that it does not need ergodicity as its theoretical basis. However, the consistency of SFS proved in [26] relies on a restrictive assumption that the potential function is uniformly strongly convex. In this paper we provide a new convergence analysis of SFS without the strong convexity condition on the potential. We establish a nonasymptotic error bound in Wasserstein distance between the law of the output of SFS and the target distribution, under smoothness and boundedness assumptions on the density ratio of the target distribution over the standard normal distribution.
5 Acknowledgment
The authors would like to thank Professor Liming Wu at Université Clermont-Auvergne for helpful discussions on this topic.
Appendix A Appendix
A.1 Proof of Proposition 1
A.2 Preliminary lemmas for Theorems 2-3
Lemma 5.
Assume (A1) and (A2) hold; then
Proof.
Lemma 6.
Assume (A1) and (A2) hold; then for any
Proof.
Lemma 7.
Assume (A1) and (A2) hold; then for any
Moreover, if $f$ has a finite upper bound, then
Proof.
Denote by $\{Z_j\}_{j=1}^m$ and $\{Z'_j\}_{j=1}^m$ two independent sets of independent copies of $Z \sim N(0, I_d)$. For notational convenience, we denote
By (A2), $f \ge \xi$, and hence the denominators appearing below are bounded from below by $\xi$. Then,
(A.1) |
where the second inequality holds by (A1). Similarly, we also have
(A.2) |
where the second inequality holds due to (A1). Thus, by (A.1) and (A.2), it follows that
(A.3) |
(A.4) |
Then, by (A1) and (A2), a simple calculation yields
(A.5) |
Let , then
(A.6) |
Therefore, by (A.3)-(A.6), it can be concluded that
Lemma 8.
Assume (A1) and (A2) hold; then for
Proof.
Define and , where with . By (A1) and (A2), it follows that for all and ,
(A.8) |
Then, by (A.8), we have
Further, we can get
Therefore,
Since the scheme starts from $\tilde{Y}_{t_0} = 0$, by induction we have
∎
Lemma 9.
Assume (A1) and (A2) hold; then for
Moreover, if $f$ has a finite upper bound, then
Proof.
Let , then
(A.9) |
Next, we bound the two terms on the right-hand side of (A.9). First, by Lemma 7, we have
Second, combining (A.8) and Lemma 8 with the Markov inequality, we obtain
Hence,
(A.10) |
Choosing the free parameter in (A.10) appropriately, we then have
Moreover, if $f$ has a finite upper bound, then by Lemma 7 we can similarly get
This completes the proof. ∎
A.3 Proof of Theorem 2
A.4 Proof of Theorem 3
A.5 Preliminary lemmas for Theorem 4
Lemma 10.
Assume (A1) holds; then for any
where . Moreover, if $f$ has a finite upper bound, then
Proof.
Denote by $\{Z_j\}_{j=1}^m$ and $\{Z'_j\}_{j=1}^m$ two independent sets of independent copies of $Z \sim N(0, I_d)$. For notational convenience, we denote
where $f_\epsilon = (1 - \epsilon) f + \epsilon$. Since $f_\epsilon \ge \epsilon$, the denominators appearing below are bounded from below by $\epsilon$. By (A1), $f_\epsilon$ and $\nabla f_\epsilon$ are Lipschitz continuous. Thus there exists a finite and positive constant such that for all $x \in \mathbb{R}^d$,
(A.12) |
(A.13) |
Therefore,
(A.14) |
where the second inequality follows from (A.13). Similarly, we also have
(A.15) |
where the second inequality follows from (A.12). Hence, by (A.14) and (A.15), we have
(A.16) |
(A.17) |
Then, by (A.12) and (A.13), a simple calculation yields
(A.18) |
Let , then
(A.19) |
Therefore, by (A.16)-(A.19), it can be concluded that
Lemma 11.
Assume (A1) holds; then for
where .
Proof.
Define and , where with . By (A1), there exists a finite and positive constant such that the drift is Lipschitz continuous. Then, for all and , we have
(A.21) |
By (A.21), we have
Furthermore, it can be shown that
Therefore,
Since the scheme starts from $\tilde{Y}^{\epsilon}_{t_0} = 0$, by induction we have
∎
Lemma 12.
Assume (A1) holds; then for
where . Moreover, if $f$ has a finite upper bound, then
Proof.
Let , then
(A.22) |
Next, we bound the two terms on the right-hand side of (A.22). First, by Lemma 10, we have
Second, by combining (A.21) and Lemma 11 with the Markov inequality, we have
Therefore,
(A.23) |
Choosing the free parameter in (A.23) appropriately, we have
Moreover, if $f$ has a finite upper bound, then by Lemma 10 we can similarly get
This completes the proof. ∎
A.6 Proof of Theorem 4
Proof.
By the triangle inequality, we have
$$W_2\big(\mathrm{Law}(\tilde{Y}^{\epsilon}_{t_K}), \mu\big) \le W_2\big(\mathrm{Law}(\tilde{Y}^{\epsilon}_{t_K}), \mu_\epsilon\big) + W_2(\mu_\epsilon, \mu),$$
and we bound the two terms on the right-hand side in turn.
First, similar to the proof of Theorem 2, by Lemma 12 and through some calculation, we can conclude that
(A.24) |
Second, we need an upper bound on $W_2(\mu_\epsilon, \mu)$. Let $X \sim \mu$ and $Z \sim \gamma_d$, and let $\eta$ be a Bernoulli random variable satisfying $P(\eta = 1) = \epsilon$ and $P(\eta = 0) = 1 - \epsilon$. Assume $X$, $Z$ and $\eta$ are independent of each other. Then $\big((1 - \eta) X + \eta Z,\, X\big)$ is a coupling of $(\mu_\epsilon, \mu)$; denote its joint distribution by $\Theta$. Therefore, we have
$$\mathbb{E}_{\Theta}\big[\|(1 - \eta) X + \eta Z - X\|_2^2\big] = \epsilon\, \mathbb{E}\big[\|Z - X\|_2^2\big] = \epsilon \big(d + \mathbb{E}_{\mu}[\|X\|_2^2]\big).$$
Then we have
$$W_2(\mu_\epsilon, \mu) \le \Big(\epsilon \big(d + \mathbb{E}_{\mu}[\|X\|_2^2]\big)\Big)^{1/2}. \tag{A.25}$$
Combining (A.24) with (A.25), it follows that
(A.26) |
Choosing $\epsilon$ appropriately in (A.26), there exists a finite and positive constant such that
Moreover, if $f$ has a finite upper bound, then, similarly to the proof of Theorem 3, by (A.25) and Lemma 12 we have
(A.27) |
Choosing $\epsilon$ appropriately in (A.27), there exists a finite and positive constant such that
∎
References
- [1] Mathias Barkhagen, Ngoc Huy Chau, Éric Moulines, Miklós Rásonyi, Sotirios Sabanis, and Ying Zhang, On stochastic gradient Langevin dynamics with dependent data streams in the logconcave case, arXiv preprint arXiv:1812.02709, (2018).
- [2] Joris Bierkens, Paul Fearnhead, and Gareth Roberts, The zig-zag process and super-efficient sampling for Bayesian analysis of big data, The Annals of Statistics, 47 (2019), pp. 1288–1320.
- [3] Nawaf Bou-Rabee, Andreas Eberle, and Raphael Zimmer, Coupling and convergence for Hamiltonian Monte Carlo, The Annals of Applied Probability, 30 (2020), pp. 1209–1250.
- [4] Alexandre Bouchard-Côté, Sebastian J. Vollmer, and Arnaud Doucet, The bouncy particle sampler: A nonreversible rejection-free Markov chain Monte Carlo method, Journal of the American Statistical Association, 113 (2018), pp. 855–867.
- [5] Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng, Handbook of Markov Chain Monte Carlo, CRC Press, 2011.
- [6] Ngoc Huy Chau, Éric Moulines, Miklós Rásonyi, Sotirios Sabanis, and Ying Zhang, On stochastic gradient Langevin dynamics with dependent data streams: the fully non-convex case, arXiv preprint arXiv:1905.13142, (2019).
- [7] Yongxin Chen, Tryphon T. Georgiou, and Michele Pavon, Stochastic control liaisons: Richard Sinkhorn meets Gaspard Monge on a Schrödinger bridge, SIAM Review, 63 (2021), pp. 249–313.
- [8] Xiang Cheng and Peter Bartlett, Convergence of Langevin MCMC in KL-divergence, Proceedings of Machine Learning Research, Volume 83: Algorithmic Learning Theory, (2018), pp. 186–211.
- [9] Xiang Cheng, Niladri S. Chatterji, Yasin Abbasi-Yadkori, Peter L. Bartlett, and Michael I. Jordan, Sharp convergence rates for Langevin dynamics in the nonconvex setting, arXiv preprint arXiv:1805.01648, (2018).
- [10] Paolo Dai Pra, A stochastic control approach to reciprocal diffusion processes, Applied Mathematics and Optimization, 23 (1991), pp. 313–329.
- [11] Arnak S. Dalalyan, Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent, arXiv: Statistics Theory, (2017).
- [12] Arnak S. Dalalyan, Theoretical guarantees for approximate sampling from smooth and log-concave densities, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79 (2017), pp. 651–676.
- [13] Arnak S. Dalalyan and Avetik G. Karagulyan, User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient, Stochastic Processes and their Applications, 129 (2019), pp. 5278–5311.
- [14] David B. Dunson and J. E. Johndrow, The Hastings algorithm at fifty, Biometrika, 107 (2020), pp. 1–23.
- [15] Alain Durmus and Éric Moulines, High-dimensional Bayesian inference via the unadjusted Langevin algorithm, arXiv: Statistics Theory, (2016).
- [16] Alain Durmus and Éric Moulines, Sampling from a strongly log-concave distribution with the unadjusted Langevin algorithm, arXiv: Statistics Theory, (2016).
- [17] Alain Durmus and Éric Moulines, Nonasymptotic convergence analysis for the unadjusted Langevin algorithm, The Annals of Applied Probability, 27 (2017), pp. 1551–1587.
- [18] Ronen Eldan, Joseph Lehec, and Yair Shenfeld, Stability of the logarithmic Sobolev inequality via the Föllmer process, Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 56 (2020), pp. 2253–2269.
- [19] Hans Föllmer, An entropy approach to the time reversal of diffusion processes, in Stochastic Differential Systems Filtering and Control, Springer, 1985, pp. 156–163.
- [20] Hans Föllmer, Time reversal on Wiener space, in Stochastic Processes - Mathematics and Physics, Springer, 1986, pp. 119–129.
- [21] Hans Föllmer, Random fields and diffusion processes, in École d'Été de Probabilités de Saint-Flour XV–XVII, 1985–87, Springer, 1988, pp. 101–203.
- [22] Alan E. Gelfand and Adrian F. M. Smith, Sampling-based approaches to calculating marginal densities, Journal of the American Statistical Association, 85 (1990), pp. 398–409.
- [23] Stuart Geman and Donald Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence, (1984), pp. 721–741.
- [24] Jack K. Hale, Asymptotic Behavior of Dissipative Systems, no. 25, American Mathematical Society, 2010.
- [25] W. Keith Hastings, Monte Carlo sampling methods using Markov chains and their applications, Biometrika, 57 (1970), pp. 97–109.
- [26] Jian Huang, Yuling Jiao, Lican Kang, Xu Liao, Jin Liu, and Yanyan Liu, Schrödinger-Föllmer sampler: Sampling without ergodicity, arXiv preprint arXiv:2106.10880, (2021).
- [27] Joseph Lehec, Representation formula for the entropy and functional inequalities, Annales de l'IHP Probabilités et Statistiques, 49 (2013), pp. 885–899.
- [28] Christian Léonard, A survey of the Schrödinger problem and some of its connections with optimal transport, Discrete and Continuous Dynamical Systems - A, 34 (2014), p. 1533.
- [29] Jun S. Liu, Monte Carlo Strategies in Scientific Computing, Springer Science & Business Media, 2008.
- [30] Yi-An Ma, Yuansi Chen, Chi Jin, Nicolas Flammarion, and Michael I. Jordan, Sampling can be faster than optimization, Proceedings of the National Academy of Sciences, 116 (2019), pp. 20881–20885.
- [31] Georg Menz and André Schlichting, Poincaré and logarithmic Sobolev inequalities by decomposition of the energy landscape, The Annals of Probability, 42 (2014), pp. 1809–1884.
- [32] Nicholas Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller, Equation of state calculations by fast computing machines, The Journal of Chemical Physics, 21 (1953), pp. 1087–1092.
- [33] Wenlong Mou, Nicolas Flammarion, Martin J. Wainwright, and Peter L. Bartlett, Improved bounds for discretization of Langevin diffusions: Near-optimal rates without convexity, arXiv preprint arXiv:1907.11331, (2019).
- [34] Maxim Raginsky, Alexander Rakhlin, and Matus Telgarsky, Non-convex learning via stochastic gradient Langevin dynamics: a nonasymptotic analysis, in Conference on Learning Theory, PMLR, 2017, pp. 1674–1703.
- [35] Christian P. Robert and George Casella, Introducing Monte Carlo Methods with R, vol. 18, Springer, 2010.
- [36] Erwin Schrödinger, Sur la théorie relativiste de l'électron et l'interprétation de la mécanique quantique, Annales de l'Institut Henri Poincaré, 2 (1932), pp. 269–310.
- [37] Luke Tierney, Markov chains for exploring posterior distributions, The Annals of Statistics, 22 (1994), pp. 1701–1728.
- [38] Belinda Tzen and Maxim Raginsky, Theoretical guarantees for sampling and inference in generative models with latent diffusions, in Conference on Learning Theory, PMLR, 2019, pp. 3084–3114.
- [39] Feng-Yu Wang, Log-Sobolev inequalities: different roles of Ric and Hess, The Annals of Probability, 37 (2009), pp. 1587–1604.
- [40] Ying Zhang, Ömer Deniz Akyildiz, Theo Damoulas, and Sotirios Sabanis, Nonasymptotic estimates for stochastic gradient Langevin dynamics under local conditions in nonconvex optimization, arXiv preprint arXiv:1910.02008, (2019).