
Convergence Analysis of Schrödinger-Föllmer Sampler without Convexity

Yuling Jiao School of Mathematics and Statistics, Wuhan University, Wuhan, China. Email: [email protected]    Lican Kang School of Mathematics and Statistics, Wuhan University, Wuhan, China. Email: [email protected]    Yanyan Liu School of Mathematics and Statistics, Wuhan University, Wuhan, China. Email: [email protected]    Youzhou Zhou Department of Pure Mathematics, Xi’an Jiaotong-Liverpool University, Suzhou, China. Email: [email protected]
Abstract

The Schrödinger-Föllmer sampler (SFS) [26] is a novel and efficient approach for sampling from possibly unnormalized distributions without requiring ergodicity. SFS is based on the Euler-Maruyama discretization of the Schrödinger-Föllmer diffusion process

\mathrm{d}X_{t}=-\nabla U\left(X_{t},t\right)\mathrm{d}t+\mathrm{d}B_{t},\quad t\in[0,1],\quad X_{0}=0

on the unit interval, which transports the degenerate distribution at time zero to the target distribution at time one. In [26], the consistency of SFS is established under the restrictive assumption that the potential U(x,t) is strongly convex in x, uniformly in t. In this paper we provide a nonasymptotic error bound for SFS in Wasserstein distance under smoothness and boundedness conditions on the density ratio of the target distribution with respect to the standard normal distribution, without requiring strong convexity of the potential.

1 Introduction

Sampling from possibly unnormalized distributions is an important task in Bayesian statistics and machine learning. Ever since the Metropolis-Hastings (MH) algorithm [32, 25] was introduced, various random sampling methods have been proposed, including the Gibbs sampler, random walk sampler, independent sampler, Langevin sampler, bouncy particle sampler, and zig-zag sampler [23, 22, 37, 29, 35, 4, 2], among others; see [5, 14] and the references therein. These sampling algorithms generate random samples by running an ergodic Markov chain whose stationary distribution is the target distribution.

In [26], the Schrödinger-Föllmer sampler (SFS), a novel sampling approach that does not require ergodicity, was proposed. SFS is based on the Schrödinger-Föllmer diffusion process, defined as

\mathrm{d}X_{t}=b\left(X_{t},t\right)\mathrm{d}t+\mathrm{d}B_{t},\quad t\in[0,1],\quad X_{0}=0, (1)

where the drift function

b(x,t)=-\nabla U(x,t)=\frac{\mathbb{E}_{Z\sim N\left(0,\mbox{\bf I}_{p}\right)}[\nabla f(x+\sqrt{1-t}Z)]}{\mathbb{E}_{Z\sim N\left(0,\mbox{\bf I}_{p}\right)}[f(x+\sqrt{1-t}Z)]}:\mathbb{R}^{p}\times[0,1]\rightarrow\mathbb{R}^{p}

with f(\cdot)=\frac{d\mu}{dN\left(0,\mbox{\bf I}_{p}\right)}(\cdot). According to [28] and [18], the process \{X_{t}\}_{t\in[0,1]} in (1) was first formulated by Föllmer [19, 20, 21] when studying the Schrödinger bridge problem [36]. The main feature of the Schrödinger-Föllmer process is that it interpolates between \delta_{0} and the target \mu on the time interval [0,1], i.e., X_{1}\sim\mu; see Proposition 1. SFS samples from \mu via the following Euler-Maruyama discretization of (1),

Y_{t_{k+1}}=Y_{t_{k}}+sb\left(Y_{t_{k}},t_{k}\right)+\sqrt{s}\epsilon_{k+1},\quad Y_{t_{0}}=0,\quad k=0,1,\ldots,K-1,

where s=1/K is the step size, t_{k}=sk, and \{\epsilon_{k}\}_{k=1}^{K} are independent and identically distributed from N(0,\mbox{\bf I}_{p}). If the expectations in the drift term b(x,t) do not have analytical forms, one can use the Monte Carlo method to evaluate b\left(Y_{t_{k}},t_{k}\right) approximately, i.e., one can sample from \mu according to

\widetilde{Y}_{t_{k+1}}=\widetilde{Y}_{t_{k}}+s\tilde{b}_{m}\left(\widetilde{Y}_{t_{k}},t_{k}\right)+\sqrt{s}\epsilon_{k+1},\quad\widetilde{Y}_{t_{0}}=0,\quad k=0,1,\ldots,K-1,

where \tilde{b}_{m}(\widetilde{Y}_{t_{k}},t_{k})=\frac{\frac{1}{m}\sum_{j=1}^{m}\nabla f(\widetilde{Y}_{t_{k}}+\sqrt{1-t_{k}}Z_{j})}{\frac{1}{m}\sum_{j=1}^{m}f(\widetilde{Y}_{t_{k}}+\sqrt{1-t_{k}}Z_{j})} with Z_{1},\ldots,Z_{m} i.i.d. N(0,\mbox{\bf I}_{p}). The numerical simulations in [26] demonstrate that SFS outperforms the existing samplers based on ergodicity.
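For concreteness, the ratio estimator above can be sketched in a few lines. This is an illustrative sketch of ours (the names drift_mc, f, grad_f are not from [26]), using as f the density ratio of N(a, I_p) against the standard normal, f(x)=\exp(a^{\top}x-\|a\|_{2}^{2}/2) with \nabla f(x)=af(x), for which the drift is identically a and the estimator can be checked exactly:

```python
import numpy as np

def drift_mc(f, grad_f, y, t, m, rng):
    """Ratio estimator of b(y, t): replace both expectations by m-sample means.

    f, grad_f act on a batch of points of shape (m, p).
    """
    Z = rng.standard_normal((m, y.shape[0]))   # Z_1, ..., Z_m ~ N(0, I_p)
    X = y + np.sqrt(1.0 - t) * Z               # y + sqrt(1 - t) Z_j
    return grad_f(X).mean(axis=0) / f(X).mean()

# Example: mu = N(a, I_p), whose density ratio against N(0, I_p) is
#   f(x) = exp(a.x - ||a||^2 / 2),  grad f(x) = a f(x).
a = np.array([1.0, -0.5])
f = lambda X: np.exp(X @ a - 0.5 * a @ a)
grad_f = lambda X: f(X)[:, None] * a

rng = np.random.default_rng(0)
b_hat = drift_mc(f, grad_f, np.array([0.3, -0.2]), 0.5, 500, rng)
# For this f the numerator equals a times the denominator, so b_hat is a
# (up to floating point) for any m.
```

The exact cancellation here is special to this f; in general the estimator carries Monte Carlo error, which is what Theorems 2 and 3 below quantify.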

In Section 4.2 of [26], the authors prove that

W_{2}(\mbox{Law}(\widetilde{Y}_{t_{K}}),\mu)\rightarrow 0,\quad\mathrm{as}\ s\rightarrow 0,\ m\rightarrow\infty

under the restrictive assumption that the potential U(x,t) is uniformly strongly convex, i.e.,

U(x,t)-U(y,t)-\nabla U(y,t)^{\top}(x-y)\geq(M/2)\left\|x-y\right\|^{2}_{2},\quad\forall x,y\in\mathbb{R}^{p},\ \forall t\in[0,1], (2)

where M is a finite positive constant. In this paper we provide a new analysis of the above SFS iteration. We establish a nonasymptotic error bound on W_{2}(\mbox{Law}(\widetilde{Y}_{t_{K}}),\mu) under the condition that f and \nabla f are Lipschitz continuous and f has a positive lower bound, without using the uniform strong convexity requirement (2).

The rest of this paper is organized as follows. In Section 2, we recall the SFS method. In Section 3, we present our theoretical analysis. We conclude in Section 4. Proofs for all the theorems are provided in Appendix A.

2 Schrödinger-Föllmer sampler

In this section we briefly recall the Schrödinger-Föllmer sampler. For more background on the Schrödinger-Föllmer diffusion process, see [10, 28, 7, 26].

Let \mu\in\mathcal{P}\left(\mathbb{R}^{p}\right) be the target distribution, assumed absolutely continuous with respect to the p-dimensional standard normal measure G=N(0,\mbox{\bf I}_{p}). Let

f(x)=\frac{d\mu}{dG}(x).

We assume that

  • (A1) f,\nabla f are Lipschitz continuous with constant \gamma.

  • (A2) There exists \xi>0 such that f\geq\xi.

Define the heat semigroup Q_{t}, t\in[0,1], as

Q_{t}f(x)=\mathbb{E}_{Z\sim G}[f(x+\sqrt{t}Z)].
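The heat semigroup is the only ingredient of the drift that requires integration, and it is easy to evaluate by plain Monte Carlo. As an illustrative sanity check of ours (not from [26]), for the density ratio f of N(a, I_p) against G the Gaussian moment generating function gives the closed form Q_{t}f(x)=\exp(a^{\top}x-(1-t)\|a\|_{2}^{2}/2), which the Monte Carlo average should reproduce:

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.array([1.0, -0.5])

# f = d N(a, I_p) / d G, i.e. f(x) = exp(a.x - ||a||^2 / 2); the Gaussian mgf
# E[exp(sqrt(t) a.Z)] = exp(t ||a||^2 / 2) yields
#   Q_t f(x) = exp(a.x - (1 - t) ||a||^2 / 2).
f = lambda X: np.exp(X @ a - 0.5 * a @ a)

def Qt_f_mc(x, t, m=200_000):
    """Plain Monte Carlo evaluation of the heat semigroup Q_t f(x)."""
    Z = rng.standard_normal((m, x.shape[0]))
    return f(x + np.sqrt(t) * Z).mean()

x, t = np.array([0.3, 0.7]), 0.5
exact = np.exp(a @ x - 0.5 * (1.0 - t) * (a @ a))
approx = Qt_f_mc(x, t)   # close to exact for large m
```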
Proposition 1.

Define a drift function

b(x,t)=\nabla\log Q_{1-t}f(x).

If f satisfies assumptions (A1) and (A2), then the Schrödinger-Föllmer diffusion

\mathrm{d}X_{t}=b\left(X_{t},t\right)\mathrm{d}t+\mathrm{d}B_{t},\quad t\in[0,1],\quad X_{0}=0, (3)

has a unique strong solution and X_{1}\sim\mu.

Remark 2.1.
  • (i)

    From the definition of b(x,t) in Proposition 1, it follows that U(x,t)=-\log Q_{1-t}f(x).

  • (ii)

    If the target distribution is \mu(dx)=\exp(-V(x))dx/C with normalizing constant C, then f(x)=\frac{\left(\sqrt{2\pi}\right)^{p}}{C}\exp(-V(x)+\frac{\|x\|_{2}^{2}}{2}). Once V(x) is twice differentiable and

    \lim_{R\to\infty}\sup_{\|x\|_{2}\geq R}\exp\left(-V(x)+\|x\|_{2}^{2}/2\right)\|x-\nabla V\|_{2}<\infty,
    \lim_{R\to\infty}\sup_{\|x\|_{2}\geq R}\exp\left(-V(x)+\|x\|_{2}^{2}/2\right)\|\mathrm{Hess}(V)\|_{2}<\infty,

    then both f and \nabla f are Lipschitz continuous, i.e., (A1) holds. (A2) is equivalent to the growth condition V(x)\leq\frac{\|x\|_{2}^{2}}{2}-\log\xi+\mathrm{constant} on the potential.

  • (iii)

    Under (A1) and (A2), some calculation shows that

    \|\nabla f\|_{2}\leq\gamma,\quad\|\mathrm{Hess}(f)\|_{2}\leq\gamma,

    and

    \sup_{x\in\mathbb{R}^{p},t\in[0,1]}\|\nabla Q_{1-t}f(x)\|_{2}\leq\gamma,\quad\sup_{x\in\mathbb{R}^{p},t\in[0,1]}\|\mathrm{Hess}(Q_{1-t}f(x))\|_{2}\leq\gamma,

    and

    b(x,t)=\frac{\nabla Q_{1-t}f(x)}{Q_{1-t}f(x)},\quad\nabla b(x,t)=\frac{\mathrm{Hess}(Q_{1-t}f)(x)}{Q_{1-t}f(x)}-b(x,t)b(x,t)^{\top}.

    We conclude that

    \sup_{x\in\mathbb{R}^{p},t\in[0,1]}\|b(x,t)\|_{2}\leq\frac{\gamma}{\xi},\quad\sup_{x\in\mathbb{R}^{p},t\in[0,1]}\|\nabla b(x,t)\|_{2}\leq\frac{\gamma}{\xi}+\frac{\gamma^{2}}{\xi^{2}}.

Proposition 1 shows that the Schrödinger-Föllmer diffusion transports \delta_{0} to the target \mu on the unit time interval. The drift term b(x,t) is scale-invariant with respect to f in the sense that b(x,t)=\nabla\log Q_{1-t}(Cf)(x) for all C>0. Therefore, the Schrödinger-Föllmer diffusion can be used for sampling from \mu(dx)=\exp(-V(x))dx/C, where the normalizing constant C need not be known. To this end, we use the Euler-Maruyama method to discretize the Schrödinger-Föllmer diffusion (3). Let

t_{k}=ks,\quad k=0,1,\ldots,K,\quad\mbox{with}\ s=1/K,\quad Y_{t_{0}}=0,

the Euler-Maruyama scheme reads

Y_{t_{k+1}}=Y_{t_{k}}+sb\left(Y_{t_{k}},t_{k}\right)+\sqrt{s}\epsilon_{k+1},\quad k=0,1,\ldots,K-1, (4)

where \{\epsilon_{k}\}_{k=1}^{K} are i.i.d. N(0,\mbox{\bf I}_{p}) and

b(Y_{t_{k}},t_{k})=\frac{\mathbb{E}_{Z}[\nabla f(Y_{t_{k}}+\sqrt{1-t_{k}}Z)]}{\mathbb{E}_{Z}[f(Y_{t_{k}}+\sqrt{1-t_{k}}Z)]}=\frac{\mathbb{E}_{Z}[Zf(Y_{t_{k}}+\sqrt{1-t_{k}}Z)]}{\mathbb{E}_{Z}[f(Y_{t_{k}}+\sqrt{1-t_{k}}Z)]\sqrt{1-t_{k}}}. (5)
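The second equality in (5) is an instance of Gaussian integration by parts (Stein's identity); a short verification of this step under the stated smoothness assumptions:

```latex
% Stein's identity: for Z ~ N(0, I_p) and differentiable g with suitable growth,
%     \mathbb{E}[\nabla g(Z)] = \mathbb{E}[Z\, g(Z)].
% Taking g(z) = f(Y_{t_k} + \sqrt{1-t_k}\, z), the chain rule gives
%     \nabla_z g(z) = \sqrt{1-t_k}\, (\nabla f)(Y_{t_k} + \sqrt{1-t_k}\, z),
% so that
\sqrt{1-t_{k}}\;\mathbb{E}_{Z}\big[\nabla f(Y_{t_{k}}+\sqrt{1-t_{k}}Z)\big]
   =\mathbb{E}_{Z}\big[Z\, f(Y_{t_{k}}+\sqrt{1-t_{k}}Z)\big].
% Dividing both sides by \sqrt{1-t_k}\,\mathbb{E}_Z[f(Y_{t_k}+\sqrt{1-t_k}Z)]
% yields the second form of b in (5).
```

The second form is useful precisely because it avoids evaluating \nabla f, as in estimator (7) below.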

The expectations in (5) may not admit explicit expressions. In that case, we can obtain an estimator \tilde{b}_{m} of b by replacing \mathbb{E}_{Z} in b with the m-sample mean, i.e.,

\tilde{b}_{m}(Y_{t_{k}},t_{k})=\frac{\frac{1}{m}\sum_{j=1}^{m}\nabla f(Y_{t_{k}}+\sqrt{1-t_{k}}Z_{j})}{\frac{1}{m}\sum_{j=1}^{m}f(Y_{t_{k}}+\sqrt{1-t_{k}}Z_{j})},\quad k=0,\ldots,K-1, (6)

or

\tilde{b}_{m}(Y_{t_{k}},t_{k})=\frac{\frac{1}{m}\sum_{j=1}^{m}Z_{j}f(Y_{t_{k}}+\sqrt{1-t_{k}}Z_{j})}{\frac{1}{m}\sum_{j=1}^{m}f(Y_{t_{k}}+\sqrt{1-t_{k}}Z_{j})\cdot\sqrt{1-t_{k}}},\quad k=0,\ldots,K-1, (7)

where Z_{1},\ldots,Z_{m} are i.i.d. N(0,\mbox{\bf I}_{p}). The detailed description of SFS is summarized in Algorithm 1 below, which is Algorithm 2 in [26].

Algorithm 1 SFS for \mu=\exp(-V(x))/C with Monte Carlo estimation of the drift term
1:  Input: m, K. Initialize s=1/K, \widetilde{Y}_{t_{0}}=0.
2:  for k=0,1,\ldots,K-1 do
3:     Sample \epsilon_{k+1}\sim N(0,\mbox{\bf I}_{p}).
4:     Sample Z_{i}, i=1,\ldots,m, from N(0,\mbox{\bf I}_{p}).
5:     Compute \tilde{b}_{m} according to (6) or (7).
6:     \widetilde{Y}_{t_{k+1}}=\widetilde{Y}_{t_{k}}+s\tilde{b}_{m}\left(\widetilde{Y}_{t_{k}},t_{k}\right)+\sqrt{s}\epsilon_{k+1}.
7:  end for
8:  Output: \{\widetilde{Y}_{t_{k}}\}_{k=1}^{K}.
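Algorithm 1 can be sketched in a few lines. This is an illustrative sketch of ours (the function names are not from [26]), taking f and \nabla f as batch-vectorized callables and using estimator (6). For the sanity-check target \mu=N(a,\mbox{\bf I}_{p}) below, the drift equals a exactly, so Y_{t_{K}}\sim N(a,\mbox{\bf I}_{p}) and the sample mean over many runs should be close to a:

```python
import numpy as np

def sfs_sample(f, grad_f, p, K, m, rng):
    """One trajectory of SFS; returns Y_{t_K}, an approximate draw from mu."""
    s = 1.0 / K                                    # step size
    y = np.zeros(p)                                # Y_{t_0} = 0
    for k in range(K):
        t_k = k * s
        Z = rng.standard_normal((m, p))            # Z_1, ..., Z_m ~ N(0, I_p)
        X = y + np.sqrt(1.0 - t_k) * Z
        b = grad_f(X).mean(axis=0) / f(X).mean()   # drift estimator (6)
        y = y + s * b + np.sqrt(s) * rng.standard_normal(p)   # Euler-Maruyama step
    return y

# Sanity check: mu = N(a, I_p), density ratio f(x) = exp(a.x - ||a||^2 / 2).
a = np.array([1.0, -0.5])
f = lambda X: np.exp(X @ a - 0.5 * a @ a)
grad_f = lambda X: f(X)[:, None] * a

rng = np.random.default_rng(0)
draws = np.array([sfs_sample(f, grad_f, 2, K=50, m=200, rng=rng) for _ in range(400)])
```

Note that f only enters through the ratio in (6), so any unnormalized version of f may be passed in, which is the scale-invariance property noted after Proposition 1.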

In Section 4.2 of [26], the authors proved that

W_{2}(\mbox{Law}(\widetilde{Y}_{t_{K}}),\mu)\rightarrow 0,\quad\mathrm{as}\ s\rightarrow 0,\ m\rightarrow\infty

under the restrictive assumption that the potential is uniformly strongly convex, i.e., U(x,t) satisfies (2). However, (2) is not easy to verify. In the next section, we establish a nonasymptotic bound on the Wasserstein distance between the law of \widetilde{Y}_{t_{K}} generated by SFS (Algorithm 1) and the target \mu under the smoothness and boundedness conditions (A1) and (A2), without using the uniform strong convexity assumption (2) on U(x,t).

3 Nonasymptotic Bound without convexity

Under conditions (A1) and (A2), one can easily deduce the growth condition and the Lipschitz/Hölder continuity of the drift term b(x,t) [26], i.e.,

\|b(x,t)\|_{2}^{2}\leq C_{0}(1+\|x\|_{2}^{2}), (C1)

and

\|b(x,t)-b(y,t)\|_{2}\leq C_{1}\|x-y\|_{2}, (C2)

and

\|b(x,t)-b(y,s)\|_{2}\leq C_{1}\left(\|x-y\|_{2}+|t-s|^{\frac{1}{2}}\right), (C3)

where C_{0} and C_{1} are finite positive constants.

Remark 3.1.

(C1) and (C2) are essentially the sufficient conditions under which the Schrödinger-Föllmer SDE (3) admits a unique strong solution. (C3) was introduced in Theorem 4.1 of [38], and it is also similar to condition H2 of [6] and Assumption 3.2 of [1]. Obviously, (C3) implies (C2), and (C1) holds if the drift term b(x,t) is bounded over \mathbb{R}^{p}\times[0,1].

Let \mathcal{D}(\nu_{1},\nu_{2}) be the collection of coupling probability measures on \left(\mathbb{R}^{2p},\mathcal{B}(\mathbb{R}^{2p})\right) whose respective marginal distributions are \nu_{1} and \nu_{2}. The Wasserstein distance of order d\geq 1, with which we measure the discrepancy between \mbox{Law}(\widetilde{Y}_{t_{K}}) and \mu, is defined as

W_{d}(\nu_{1},\nu_{2})=\inf_{\nu\in\mathcal{D}(\nu_{1},\nu_{2})}\left(\int_{\mathbb{R}^{p}}\int_{\mathbb{R}^{p}}\left\|\theta_{1}-\theta_{2}\right\|_{2}^{d}\mathrm{d}\nu\left(\theta_{1},\theta_{2}\right)\right)^{1/d}.
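For empirical checks of convergence it is useful that, in one dimension, the order-d Wasserstein distance between two empirical measures with equal sample sizes is computed exactly by pairing order statistics (quantile coupling). A small sketch for d=2 (the function name is ours):

```python
import numpy as np

def w2_empirical_1d(x, y):
    """Exact W_2 between the empirical measures (1/n) sum_i delta_{x_i} and
    (1/n) sum_i delta_{y_i} on the real line: the optimal coupling matches
    order statistics, so W_2^2 = (1/n) sum_i (x_(i) - y_(i))^2."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    return float(np.sqrt(np.mean((x - y) ** 2)))

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
# Translating a sample by c shifts every order statistic by c, so W_2 = |c|:
d0 = w2_empirical_1d(x, x)        # 0.0
d2 = w2_empirical_1d(x, x + 2.0)  # essentially 2.0
```

In dimension p>1 no such closed form exists and one falls back on optimal transport solvers; the 1-D formula is still handy for checking marginals of SFS output.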
Theorem 2.

Assume (A1) and (A2) hold, then

W_{2}(\mbox{Law}(\widetilde{Y}_{t_{K}}),\mu)\leq\mathcal{O}(\sqrt{ps})+\mathcal{O}\left(\sqrt{\frac{p}{\log(m)}}\right),

where s=1/K is the step size.

Remark 3.2.

This theorem provides guidance on the selection of s and m. To ensure convergence of the distribution of \widetilde{Y}_{t_{K}}, we should set the step size s=o(1/p) and m=\exp(p/o(1)). In high-dimensional models with large p, we thus need to generate a large number of random vectors from N(0,\mbox{\bf I}_{p}) to obtain an accurate estimate of the drift term b. If we further assume that f is bounded above, the nonasymptotic error bound improves: \mathcal{O}\left(\sqrt{p/\log(m)}\right) can be improved to \mathcal{O}\left(\sqrt{p/m}\right).
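Purely for illustration, if one suppresses both hidden constants in Theorem 2 to one (an assumption of ours, not part of the theorem) and splits a target accuracy \delta evenly between the two terms, the resulting back-of-the-envelope budget makes the exponential dependence on p concrete:

```python
import math

def budget(p, delta):
    """Illustrative only: with both hidden constants of Theorem 2 taken as 1,
      sqrt(p * s)       <= delta / 2   =>   s <= (delta / 2)^2 / p,
      sqrt(p / log(m))  <= delta / 2   =>   m >= exp(p / (delta / 2)^2).
    """
    s = (delta / 2.0) ** 2 / p
    K = math.ceil(1.0 / s)                            # Euler-Maruyama steps
    m = math.ceil(math.exp(p / (delta / 2.0) ** 2))   # Gaussian draws per step
    return K, m

K, m = budget(p=2, delta=1.0)   # already m = ceil(e^8) = 2981 draws per step
```

The point of Theorem 3 is precisely that, when f is bounded above, the exponential in the second line becomes linear: m of order p/(\delta/2)^2 suffices.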

Theorem 3.

Assume that, in addition to the conditions of Theorem 2, f has a finite upper bound; then

W_{2}(\mbox{Law}(\widetilde{Y}_{t_{K}}),\mu)\leq\mathcal{O}(\sqrt{ps})+\mathcal{O}\left(\sqrt{\frac{p}{m}}\right),

where s=1/K is the step size.

Remark 3.3.

With the boundedness condition on f, to ensure convergence of the sampling distribution we can set the step size s=o(1/p) and m=p/o(1). Note that the sample size requirement for approximating the drift term is significantly less stringent than that in Theorem 2.

Remark 3.4.

The Langevin sampling method has been studied under the (strongly) convex potential assumption [15, 16, 17, 11, 12, 8, 13], the dissipativity condition on the drift term [34, 33, 40], and the local convexity condition on the potential function outside a ball [17, 9, 30, 3]. However, these conditions may not hold for models with multiple modes, for example Gaussian mixtures, whose potentials are not convex and for which the log-Sobolev inequality may not be satisfied. Moreover, the constant in the log-Sobolev inequality can depend exponentially on the dimension [39, 24, 31, 34], implying that Langevin samplers suffer from the curse of dimensionality. SFS does not require the underlying Markov process to be ergodic; therefore, our results in Theorems 2 and 3 are established under the smoothness and boundedness assumptions (A1) and (A2) on f and do not need the above-mentioned conditions used in the analysis of Langevin samplers.

In Theorem 2 and Theorem 3 we use (A2), i.e., f has a positive lower bound; however, (A2) may not hold if the target distribution has compact support. To circumvent this difficulty, we consider the regularized probability measure

\mu_{\varepsilon}=(1-\varepsilon)\mu+\varepsilon G,\quad\varepsilon\in(0,1).

The corresponding density ratio is

f_{\varepsilon}=\frac{d\mu_{\varepsilon}}{dG}=(1-\varepsilon)f+\varepsilon.

Obviously, f_{\varepsilon} satisfies (A1) and (A2) if f and \nabla f are Lipschitz continuous. Since \mu_{\varepsilon} approximates \mu well when \varepsilon is small enough, we consider sampling from \mu_{\varepsilon} by running SFS (Algorithm 1) with f replaced by f_{\varepsilon}. We use \widetilde{Y}_{t_{K}}^{\varepsilon} to denote the last iterate of SFS.
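The regularization amounts to a one-line change in Algorithm 1; a sketch (the names are ours), assuming batch-vectorized f and \nabla f, with a toy nonnegative f that vanishes outside the unit ball standing in for a compactly supported target:

```python
import numpy as np

def regularize(f, grad_f, eps):
    """Return (f_eps, grad_f_eps) for f_eps = (1 - eps) f + eps, which satisfies
    (A2) with lower bound xi = eps even where f itself vanishes; its gradient is
    grad f_eps = (1 - eps) grad f."""
    f_eps = lambda X: (1.0 - eps) * f(X) + eps
    grad_f_eps = lambda X: (1.0 - eps) * grad_f(X)
    return f_eps, grad_f_eps

# Toy f vanishing outside the unit ball (stand-in for a compactly supported mu):
f = lambda X: np.maximum(1.0 - np.sum(X ** 2, axis=1), 0.0)
grad_f = lambda X: -2.0 * X * (np.sum(X ** 2, axis=1) < 1.0)[:, None]

f_eps, grad_f_eps = regularize(f, grad_f, eps=0.1)
X = np.array([[2.0, 0.0], [0.0, 0.0]])
vals = f_eps(X)   # [0.1, 1.0]: bounded below by eps, so the ratio in (6) is safe
```

Theorem 4 then balances the two error sources: a smaller \varepsilon brings \mu_{\varepsilon} closer to \mu but weakens the lower bound \xi=\varepsilon entering the drift estimates.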

Theorem 4.

Assume (A1) holds and set \varepsilon=(\log(m))^{-1/5}; then

W_{2}(\mbox{Law}(\widetilde{Y}_{t_{K}}^{\varepsilon}),\mu)\leq\mathcal{O}(\sqrt{ps})+\widetilde{C}_{p}\cdot\mathcal{O}\left(\frac{1}{(\log(m))^{1/10}}\right),

where s=1/K is the step size and \widetilde{C}_{p} is a constant depending on p. Moreover, if f has a finite upper bound and we set \varepsilon=m^{-1/5}, then

W_{2}(\mbox{Law}(\widetilde{Y}_{t_{K}}^{\varepsilon}),\mu)\leq\mathcal{O}(\sqrt{ps})+\widetilde{C}_{p}\cdot\mathcal{O}\left(\frac{1}{m^{1/10}}\right).

4 Conclusion

In [26], the Schrödinger-Föllmer sampler (SFS) was proposed for sampling from possibly unnormalized distributions. The key feature of SFS is that it does not need ergodicity as its theoretical basis. The consistency of SFS proved in [26] relies on the restrictive assumption that the potential function is uniformly strongly convex. In this paper we provide a new convergence analysis of SFS without the strong convexity condition on the potential. We establish a nonasymptotic error bound in Wasserstein distance between the law of the output of SFS and the target distribution under smoothness and boundedness assumptions on the density ratio of the target distribution with respect to the standard normal distribution.

5 Acknowledgment

The authors would like to thank Professor Liming Wu at Université Clermont-Auvergne for helpful discussions on this topic.

Appendix A Appendix

In this appendix, we prove Proposition 1 and Theorems 2-4.

A.1 Proof of Proposition 1

Proof.

This is a known result, see for example [10, 27]. ∎

A.2 Preliminary lemmas for Theorems 2-3

First, we introduce Lemmas 5-9 in preparation for the proofs of Theorems 2-3.

Lemma 5.

Assume (A1) and (A2) hold, then

\mathbb{E}[\|X_{t}\|_{2}^{2}]\leq 2(C_{0}+p)\exp(2C_{0}t).
Proof.

From the definition of X_{t} in (3), we have \|X_{t}\|_{2}\leq\int_{0}^{t}\|b(X_{u},u)\|_{2}\mathrm{d}u+\|B_{t}\|_{2}. Then, we can get

\|X_{t}\|_{2}^{2}\leq 2\left(\int_{0}^{t}\|b(X_{u},u)\|_{2}\mathrm{d}u\right)^{2}+2\|B_{t}\|_{2}^{2}
\leq 2t\int_{0}^{t}\|b(X_{u},u)\|_{2}^{2}\mathrm{d}u+2\|B_{t}\|_{2}^{2}
\leq 2t\int_{0}^{t}C_{0}[\|X_{u}\|_{2}^{2}+1]\mathrm{d}u+2\|B_{t}\|_{2}^{2},

where the first inequality holds by the inequality (a+b)^{2}\leq 2a^{2}+2b^{2}, the second by the Cauchy-Schwarz inequality, and the last inequality holds by (C1). Thus,

\mathbb{E}\|X_{t}\|_{2}^{2}\leq 2t\int_{0}^{t}C_{0}(\mathbb{E}\|X_{u}\|_{2}^{2}+1)\mathrm{d}u+2\mathbb{E}\|B_{t}\|_{2}^{2}
\leq 2C_{0}\int_{0}^{t}\mathbb{E}\|X_{u}\|_{2}^{2}\mathrm{d}u+2(C_{0}+p).

By the Bellman-Gronwall inequality, we have

\mathbb{E}\|X_{t}\|_{2}^{2}\leq 2(C_{0}+p)\exp(2C_{0}t).

Lemma 6.

Assume (A1) and (A2) hold, then for any 0\leq t_{1}\leq t_{2}\leq 1,

\mathbb{E}[\|X_{t_{2}}-X_{t_{1}}\|_{2}^{2}]\leq 4C_{0}\exp(2C_{0})(C_{0}+p)(t_{2}-t_{1})^{2}+2C_{0}(t_{2}-t_{1})^{2}+2p(t_{2}-t_{1}).
Proof.

From the definition of XtX_{t} in (3), we have

\|X_{t_{2}}-X_{t_{1}}\|_{2}\leq\int_{t_{1}}^{t_{2}}\|b(X_{u},u)\|_{2}\mathrm{d}u+\|B_{t_{2}}-B_{t_{1}}\|_{2}.

Then, we can get

\|X_{t_{2}}-X_{t_{1}}\|_{2}^{2}\leq 2\left(\int_{t_{1}}^{t_{2}}\|b(X_{u},u)\|_{2}\mathrm{d}u\right)^{2}+2\|B_{t_{2}}-B_{t_{1}}\|_{2}^{2}
\leq 2(t_{2}-t_{1})\int_{t_{1}}^{t_{2}}\|b(X_{u},u)\|_{2}^{2}\mathrm{d}u+2\|B_{t_{2}}-B_{t_{1}}\|_{2}^{2}
\leq 2(t_{2}-t_{1})\int_{t_{1}}^{t_{2}}C_{0}[\|X_{u}\|_{2}^{2}+1]\mathrm{d}u+2\|B_{t_{2}}-B_{t_{1}}\|_{2}^{2},

where the last inequality holds by (C1). Hence,

\mathbb{E}\|X_{t_{2}}-X_{t_{1}}\|_{2}^{2}\leq 2(t_{2}-t_{1})\int_{t_{1}}^{t_{2}}C_{0}(\mathbb{E}\|X_{u}\|_{2}^{2}+1)\mathrm{d}u+2\mathbb{E}\|B_{t_{2}}-B_{t_{1}}\|_{2}^{2}
\leq 4C_{0}\exp(2C_{0})(C_{0}+p)(t_{2}-t_{1})^{2}+2C_{0}(t_{2}-t_{1})^{2}+2p(t_{2}-t_{1}),

where the last inequality holds by Lemma 5. ∎

Lemma 7.

Assume (A1) and (A2) hold, then for any R>0,

\sup_{\|x\|_{2}\leq R,t\in[0,1]}\mathbb{E}\left[\|b(x,t)-\tilde{b}_{m}(x,t)\|_{2}^{2}\right]\leq\mathcal{O}\left(\frac{p\exp(R^{2})}{m}\right).

Moreover, if f has a finite upper bound, then

\sup_{x\in\mathbb{R}^{p},t\in[0,1]}\mathbb{E}\left[\|b(x,t)-\tilde{b}_{m}(x,t)\|_{2}^{2}\right]\leq\mathcal{O}\left(\frac{p}{m}\right).
Proof.

Denote by {\bf Z}=\{Z_{1},\ldots,Z_{m}\} and {\bf Z}^{\prime}=\{Z_{1}^{\prime},\ldots,Z_{m}^{\prime}\} two independent sets of independent copies of Z\sim N(0,\mbox{\bf I}_{p}). For notational convenience, we denote

d=\mathbb{E}_{Z}\nabla f(x+\sqrt{1-t}Z),\quad d_{m}=\frac{\sum_{i=1}^{m}\nabla f(x+\sqrt{1-t}Z_{i})}{m},
e=\mathbb{E}_{Z}f(x+\sqrt{1-t}Z),\quad e_{m}=\frac{\sum_{i=1}^{m}f(x+\sqrt{1-t}Z_{i})}{m},
d_{m}^{\prime}=\frac{\sum_{i=1}^{m}\nabla f(x+\sqrt{1-t}Z_{i}^{\prime})}{m},\quad e_{m}^{\prime}=\frac{\sum_{i=1}^{m}f(x+\sqrt{1-t}Z_{i}^{\prime})}{m}.

Since d-d_{m}=\mathbb{E}\left[d_{m}^{\prime}-d_{m}|{\bf Z}\right], Jensen's inequality gives \|d-d_{m}\|^{2}_{2}\leq\mathbb{E}\left[\|d_{m}^{\prime}-d_{m}\|^{2}_{2}|{\bf Z}\right]. Then,

\mathbb{E}\|d-d_{m}\|_{2}^{2}\leq\mathbb{E}\left[\mathbb{E}[\|d_{m}^{\prime}-d_{m}\|^{2}_{2}|{\bf Z}]\right]=\mathbb{E}\|d_{m}^{\prime}-d_{m}\|^{2}_{2}
=\frac{\mathbb{E}_{Z_{1},Z_{1}^{\prime}}\left\|\nabla f(x+\sqrt{1-t}Z_{1})-\nabla f(x+\sqrt{1-t}Z_{1}^{\prime})\right\|^{2}_{2}}{m}
\leq\frac{(1-t)\gamma^{2}}{m}\mathbb{E}_{Z_{1},Z_{1}^{\prime}}\left\|Z_{1}-Z_{1}^{\prime}\right\|^{2}_{2}
\leq\frac{2p\gamma^{2}}{m}, (A.1)

where the second inequality holds by (A1). Similarly, we also have

\mathbb{E}|e-e_{m}|^{2}\leq\mathbb{E}|e_{m}^{\prime}-e_{m}|^{2}
=\frac{\mathbb{E}_{Z_{1},Z_{1}^{\prime}}\left|f(x+\sqrt{1-t}Z_{1})-f(x+\sqrt{1-t}Z_{1}^{\prime})\right|^{2}}{m}
\leq\frac{(1-t)\gamma^{2}}{m}\mathbb{E}_{Z_{1},Z_{1}^{\prime}}\left\|Z_{1}-Z_{1}^{\prime}\right\|^{2}_{2}
\leq\frac{2p\gamma^{2}}{m}, (A.2)

where the second inequality holds due to (A1). Thus, by (A.1) and (A.2), it follows that

\sup_{x\in\mathbb{R}^{p},t\in[0,1]}\mathbb{E}\left\|d-d_{m}\right\|_{2}^{2}\leq\frac{2p\gamma^{2}}{m}, (A.3)
\sup_{x\in\mathbb{R}^{p},t\in[0,1]}\mathbb{E}|e-e_{m}|^{2}\leq\frac{2p\gamma^{2}}{m}. (A.4)

Then, by (A1) and (A2), a simple calculation yields

\|b(x,t)-\tilde{b}_{m}(x,t)\|_{2}=\left\|\frac{d}{e}-\frac{d_{m}}{e_{m}}\right\|_{2}
\leq\frac{\|d\|_{2}|e_{m}-e|+\|d-d_{m}\|_{2}|e|}{|ee_{m}|}
\leq\frac{\gamma|e_{m}-e|+\|d-d_{m}\|_{2}|e|}{\xi^{2}}. (A.5)

Let R>0; then

\sup_{\|x\|_{2}\leq R}f(x)\leq\mathcal{O}\left(\exp(R^{2}/2)\right). (A.6)

Therefore, by (A.3)-(A.6), it can be concluded that

\sup_{\|x\|_{2}\leq R,t\in[0,1]}\mathbb{E}\left[\|b(x,t)-\tilde{b}_{m}(x,t)\|_{2}^{2}\right]\leq\mathcal{O}\left(\frac{p\exp(R^{2})}{m}\right).

Moreover, suppose f has a finite upper bound, that is, there exists a finite positive constant \zeta such that f\leq\zeta. Then, similar to (A.5), it follows that for all x\in\mathbb{R}^{p} and t\in[0,1],

\|b(x,t)-\tilde{b}_{m}(x,t)\|_{2}^{2}\leq 2\frac{\gamma^{2}|e_{m}-e|^{2}+\zeta^{2}\|d-d_{m}\|_{2}^{2}}{\xi^{4}}. (A.7)

Then, by (A.3)-(A.4) and (A.7), it follows that

\sup_{x\in\mathbb{R}^{p},t\in[0,1]}\mathbb{E}\left[\|b(x,t)-\tilde{b}_{m}(x,t)\|_{2}^{2}\right]\leq\mathcal{O}\left(\frac{p}{m}\right).

Lemma 8.

Assume (A1) and (A2) hold, then for k=0,1,\ldots,K,

\mathbb{E}[\|\widetilde{Y}_{t_{k}}\|^{2}_{2}]\leq\frac{6\gamma^{2}}{\xi^{2}}+3p.
Proof.

Define \Theta_{k,t}=\widetilde{Y}_{t_{k}}+(t-t_{k})\tilde{b}_{m}(\widetilde{Y}_{t_{k}},t_{k}) and \widetilde{Y}_{t}=\Theta_{k,t}+B_{t}-B_{t_{k}}, where t_{k}\leq t\leq t_{k+1} with k=0,1,\ldots,K-1. By (A1) and (A2), it follows that for all x\in\mathbb{R}^{p} and t\in[0,1],

\|b(x,t)\|_{2}^{2}\leq\frac{\gamma^{2}}{\xi^{2}},\quad\|\tilde{b}_{m}(x,t)\|_{2}^{2}\leq\frac{\gamma^{2}}{\xi^{2}}. (A.8)

Then, by (A.8), we have

\|\Theta_{k,t}\|^{2}_{2}=\|\widetilde{Y}_{t_{k}}\|^{2}_{2}+(t-t_{k})^{2}\|\tilde{b}_{m}(\widetilde{Y}_{t_{k}},t_{k})\|^{2}_{2}+2(t-t_{k})\widetilde{Y}_{t_{k}}^{\top}\tilde{b}_{m}(\widetilde{Y}_{t_{k}},t_{k})
\leq(1+s)\|\widetilde{Y}_{t_{k}}\|^{2}_{2}+\frac{(s+s^{2})\gamma^{2}}{\xi^{2}}.

Further, we can get

\mathbb{E}[\|\widetilde{Y}_{t}\|^{2}_{2}|\widetilde{Y}_{t_{k}}]=\mathbb{E}[\|\Theta_{k,t}\|^{2}_{2}|\widetilde{Y}_{t_{k}}]+(t-t_{k})p
\leq(1+s)\mathbb{E}\|\widetilde{Y}_{t_{k}}\|^{2}_{2}+\frac{(s+s^{2})\gamma^{2}}{\xi^{2}}+sp.

Therefore,

\mathbb{E}[\|\widetilde{Y}_{t_{k+1}}\|^{2}_{2}]\leq(1+s)\mathbb{E}\|\widetilde{Y}_{t_{k}}\|^{2}_{2}+\frac{(s+s^{2})\gamma^{2}}{\xi^{2}}+sp.

Since \widetilde{Y}_{t_{0}}=0, by induction we have

\mathbb{E}[\|\widetilde{Y}_{t_{k+1}}\|^{2}_{2}]\leq\frac{6\gamma^{2}}{\xi^{2}}+3p.

Lemma 9.

Assume (A1) and (A2) hold, then for k=0,1,\ldots,K,

\mathbb{E}\left\|b(\widetilde{Y}_{t_{k}},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}},t_{k})\right\|_{2}^{2}\leq\mathcal{O}\left(\frac{p}{\log(m)}\right).

Moreover, if f has a finite upper bound, then

\mathbb{E}\left\|b(\widetilde{Y}_{t_{k}},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}},t_{k})\right\|_{2}^{2}\leq\mathcal{O}\left(\frac{p}{m}\right).
Proof.

Let R>0; then

\begin{split}\mathbb{E}\left\|b(\widetilde{Y}_{t_{k}},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}},t_{k})\right\|_{2}^{2}&=\mathbb{E}_{\widetilde{Y}_{t_{k}}}\mathbb{E}_{Z}\left[\left\|b(\widetilde{Y}_{t_{k}},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}},t_{k})\right\|_{2}^{2}1(\|\widetilde{Y}_{t_{k}}\|_{2}\leq R)\right]\\ &\quad+\mathbb{E}_{\widetilde{Y}_{t_{k}}}\mathbb{E}_{Z}\left[\left\|b(\widetilde{Y}_{t_{k}},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}},t_{k})\right\|_{2}^{2}1(\|\widetilde{Y}_{t_{k}}\|_{2}>R)\right].\end{split} (A.9)

Next, we need to bound the two terms of (A.9). First, by Lemma 7, we have

\mathbb{E}_{\widetilde{Y}_{t_{k}}}\mathbb{E}_{Z}\left[\left\|b(\widetilde{Y}_{t_{k}},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}},t_{k})\right\|_{2}^{2}1(\|\widetilde{Y}_{t_{k}}\|_{2}\leq R)\right]\leq\mathcal{O}\left(\frac{p\exp(R^{2})}{m}\right).

Second, combining (A.8) and Lemma 8 with the Markov inequality yields

\mathbb{E}_{\widetilde{Y}_{t_{k}}}\mathbb{E}_{Z}\left[\left\|b(\widetilde{Y}_{t_{k}},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}},t_{k})\right\|_{2}^{2}1(\|\widetilde{Y}_{t_{k}}\|_{2}>R)\right]\leq\mathcal{O}\left(p/R^{2}\right).

Hence,

\mathbb{E}\left\|b(\widetilde{Y}_{t_{k}},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}},t_{k})\right\|_{2}^{2}\leq\mathcal{O}\left(\frac{p\exp(R^{2})}{m}\right)+\mathcal{O}\left(p/R^{2}\right). (A.10)

Setting R=\left(\frac{\log(m)}{2}\right)^{1/2} in (A.10), we have

\mathbb{E}\left\|b(\widetilde{Y}_{t_{k}},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}},t_{k})\right\|_{2}^{2}\leq\mathcal{O}\left(\frac{p}{\log(m)}\right).

Moreover, if f has a finite upper bound, then by Lemma 7 we can similarly get

\mathbb{E}\left\|b(\widetilde{Y}_{t_{k}},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}},t_{k})\right\|_{2}^{2}=\mathbb{E}_{\widetilde{Y}_{t_{k}}}\mathbb{E}_{Z}\left[\left\|b(\widetilde{Y}_{t_{k}},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}},t_{k})\right\|_{2}^{2}\right]\leq\mathcal{O}\left(\frac{p}{m}\right).

This completes the proof. ∎

A.3 Proof of Theorem 2

Proof.

From the definitions of \widetilde{Y}_{t_{k}} and X_{t_{k}}, we have

\|\widetilde{Y}_{t_{k}}-X_{t_{k}}\|_{2}^{2}
\leq\|\widetilde{Y}_{t_{k-1}}-X_{t_{k-1}}\|_{2}^{2}+\left(\int_{t_{k-1}}^{t_{k}}\|b(X_{u},u)-\tilde{b}_{m}(\widetilde{Y}_{t_{k-1}},t_{k-1})\|_{2}\mathrm{d}u\right)^{2}
\quad+2\|\widetilde{Y}_{t_{k-1}}-X_{t_{k-1}}\|_{2}\left(\int_{t_{k-1}}^{t_{k}}\|b(X_{u},u)-\tilde{b}_{m}(\widetilde{Y}_{t_{k-1}},t_{k-1})\|_{2}\mathrm{d}u\right)
\leq(1+s)\|\widetilde{Y}_{t_{k-1}}-X_{t_{k-1}}\|_{2}^{2}+(1+s)\int_{t_{k-1}}^{t_{k}}\|b(X_{u},u)-\tilde{b}_{m}(\widetilde{Y}_{t_{k-1}},t_{k-1})\|_{2}^{2}\mathrm{d}u
\leq(1+s)\|\widetilde{Y}_{t_{k-1}}-X_{t_{k-1}}\|_{2}^{2}+2(1+s)\int_{t_{k-1}}^{t_{k}}\|b(X_{u},u)-b(\widetilde{Y}_{t_{k-1}},t_{k-1})\|_{2}^{2}\mathrm{d}u
\quad+2s(1+s)\|b(\widetilde{Y}_{t_{k-1}},t_{k-1})-\tilde{b}_{m}(\widetilde{Y}_{t_{k-1}},t_{k-1})\|_{2}^{2}
\leq(1+s)\|\widetilde{Y}_{t_{k-1}}-X_{t_{k-1}}\|_{2}^{2}+4C_{1}^{2}(1+s)\int_{t_{k-1}}^{t_{k}}[\|X_{u}-\widetilde{Y}_{t_{k-1}}\|_{2}^{2}+|u-t_{k-1}|]\mathrm{d}u
\quad+2s(1+s)\|b(\widetilde{Y}_{t_{k-1}},t_{k-1})-\tilde{b}_{m}(\widetilde{Y}_{t_{k-1}},t_{k-1})\|_{2}^{2}
\leq(1+s)\|\widetilde{Y}_{t_{k-1}}-X_{t_{k-1}}\|_{2}^{2}+8C_{1}^{2}(1+s)\int_{t_{k-1}}^{t_{k}}\|X_{u}-X_{t_{k-1}}\|_{2}^{2}\mathrm{d}u
\quad+8C_{1}^{2}s(1+s)\|X_{t_{k-1}}-\widetilde{Y}_{t_{k-1}}\|_{2}^{2}+4C_{1}^{2}(1+s)s^{2}
\quad+2s(1+s)\|b(\widetilde{Y}_{t_{k-1}},t_{k-1})-\tilde{b}_{m}(\widetilde{Y}_{t_{k-1}},t_{k-1})\|_{2}^{2}
\leq(1+s+8C_{1}^{2}(s+s^{2}))\|\widetilde{Y}_{t_{k-1}}-X_{t_{k-1}}\|_{2}^{2}+8C_{1}^{2}(1+s)\int_{t_{k-1}}^{t_{k}}\|X_{u}-X_{t_{k-1}}\|_{2}^{2}\mathrm{d}u
\quad+4C_{1}^{2}(1+s)s^{2}+2s(1+s)\|b(\widetilde{Y}_{t_{k-1}},t_{k-1})-\tilde{b}_{m}(\widetilde{Y}_{t_{k-1}},t_{k-1})\|_{2}^{2},

where the second inequality holds due to 2ab\leq sa^{2}+\frac{b^{2}}{s}, and the fourth inequality holds by (C3). Then,

\mathbb{E}\|\widetilde{Y}_{t_{k}}-X_{t_{k}}\|_{2}^{2} \leq(1+s+8C_{1}^{2}(s+s^{2}))\mathbb{E}\|\widetilde{Y}_{t_{k-1}}-X_{t_{k-1}}\|_{2}^{2}
\quad+8C_{1}^{2}(1+s)\int_{t_{k-1}}^{t_{k}}\mathbb{E}\|X_{u}-X_{t_{k-1}}\|_{2}^{2}\mathrm{d}u+4C_{1}^{2}(s^{2}+s^{3})
\quad+2s(1+s)\mathbb{E}[\|b(\widetilde{Y}_{t_{k-1}},t_{k-1})-\tilde{b}_{m}(\widetilde{Y}_{t_{k-1}},t_{k-1})\|_{2}^{2}]
\leq(1+s+8C_{1}^{2}(s+s^{2}))\mathbb{E}\|\widetilde{Y}_{t_{k-1}}-X_{t_{k-1}}\|_{2}^{2}+h(s)
\quad+4C_{1}^{2}(s^{2}+s^{3})+2s(1+s)\mathbb{E}[\|b(\widetilde{Y}_{t_{k-1}},t_{k-1})-\tilde{b}_{m}(\widetilde{Y}_{t_{k-1}},t_{k-1})\|_{2}^{2}]
\leq(1+s+8C_{1}^{2}(s+s^{2}))\mathbb{E}\|\widetilde{Y}_{t_{k-1}}-X_{t_{k-1}}\|_{2}^{2}+h(s)
\quad+4C_{1}^{2}(s^{2}+s^{3})+2s(1+s)\mathcal{O}\left(\frac{p}{\log(m)}\right), (A.11)

where h(s)=8C_{1}^{2}(s+s^{2})[4C_{0}\exp(2C_{0})(C_{0}+p)s^{2}+2C_{0}s^{2}+2ps] and the last inequality holds by Lemma 9. Since \widetilde{Y}_{t_{0}}=X_{t_{0}}=0, we conclude that

\mathbb{E}\|\widetilde{Y}_{t_{K}}-X_{t_{K}}\|_{2}^{2}
\leq\frac{(1+s+8C_{1}^{2}(s+s^{2}))^{K}-1}{s+8C_{1}^{2}(s+s^{2})}\left[h(s)+4C_{1}^{2}(s^{2}+s^{3})+2(s+s^{2})\mathcal{O}\left(\frac{p}{\log(m)}\right)\right]
\leq\mathcal{O}(ps)+\mathcal{O}\left(\frac{p}{\log(m)}\right).

Therefore,

W_{2}(Law(\widetilde{Y}_{t_{K}}),\mu)\leq\mathcal{O}(\sqrt{ps})+\mathcal{O}\left(\sqrt{\frac{p}{\log(m)}}\right).

∎
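The iterates \widetilde{Y}_{t_{k}} compared above come from the Euler-Maruyama scheme on [0,1] with step s=1/K started from \widetilde{Y}_{t_{0}}=0. A minimal sketch, with a caller-supplied drift estimate standing in for \tilde{b}_{m} (the function name sfs_euler_maruyama is ours, not the paper's): when the target is N(0,\mbox{\bf I}_{p}) the density ratio f is constant, the drift vanishes, and Y_{1}\sim N(0,\mbox{\bf I}_{p}) exactly, which gives an easy correctness check.

```python
import numpy as np

def sfs_euler_maruyama(drift, p, K, rng=None):
    """Euler-Maruyama discretization of the Schrodinger-Follmer diffusion on [0, 1]:
    Y_{t_{k+1}} = Y_{t_k} + s * drift(Y_{t_k}, t_k) + sqrt(s) * xi_k, s = 1/K, Y_0 = 0."""
    rng = np.random.default_rng(rng)
    s = 1.0 / K
    y = np.zeros(p)
    for k in range(K):
        y = y + s * drift(y, k * s) + np.sqrt(s) * rng.standard_normal(p)
    return y

# zero drift corresponds to the target mu = N(0, I_p): the density ratio is
# constant, so b(x, t) = 0 and the scheme returns an exact N(0, I_p) draw
zero_drift = lambda y, t: np.zeros_like(y)
```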

A.4 Proof of Theorem 3

Proof.

This proof is the same as that of Theorem 2. Similar to (A.11), Lemma 9 yields

\mathbb{E}\|\widetilde{Y}_{t_{k}}-X_{t_{k}}\|_{2}^{2} \leq(1+s+8C_{1}^{2}(s+s^{2}))\mathbb{E}\|\widetilde{Y}_{t_{k-1}}-X_{t_{k-1}}\|_{2}^{2}+h(s)
\quad+4C_{1}^{2}(s^{2}+s^{3})+2s(1+s)\mathcal{O}\left(\frac{p}{m}\right).

Then, we also have

\mathbb{E}\|\widetilde{Y}_{t_{K}}-X_{t_{K}}\|_{2}^{2}
\leq\frac{(1+s+8C_{1}^{2}(s+s^{2}))^{K}-1}{s+8C_{1}^{2}(s+s^{2})}\left[h(s)+4C_{1}^{2}(s^{2}+s^{3})+2(s+s^{2})\mathcal{O}\left(\frac{p}{m}\right)\right]
\leq\mathcal{O}(ps)+\mathcal{O}\left(\frac{p}{m}\right).

Hence, it follows that

W_{2}(Law(\widetilde{Y}_{t_{K}}),\mu)\leq\mathcal{O}(\sqrt{ps})+\mathcal{O}\left(\sqrt{\frac{p}{m}}\right).

∎

A.5 Preliminary lemmas for Theorem 4

To prove Theorem 4, we first establish Lemmas 10-12.

Lemma 10.

Assume (A1) holds. Then for any R>0,

\underset{\|x\|_{2}\leq R,t\in[0,1]}{\sup}\mathbb{E}\left[\|b(x,t)-\tilde{b}_{m}(x,t)\|_{2}^{2}\right]\leq\mathcal{O}\left(\frac{p\exp(R^{2})(C_{p})^{4}}{m\varepsilon^{4}}\right)+\mathcal{O}\left(\frac{p(C_{p})^{2}}{m\varepsilon^{2}}\right),

where C_{p}=(2\pi)^{p/2}C^{-1}. Moreover, if f has a finite upper bound, then

\underset{x\in\mathbb{R}^{p},t\in[0,1]}{\sup}\mathbb{E}\left[\|b(x,t)-\tilde{b}_{m}(x,t)\|_{2}^{2}\right]\leq\mathcal{O}\left(\frac{p(C_{p})^{4}}{m\varepsilon^{4}}\right)+\mathcal{O}\left(\frac{p(C_{p})^{2}}{m\varepsilon^{2}}\right).
Proof.

Denote two independent sets of independent copies of Z\sim N(0,\mbox{\bf I}_{p}) by {\bf Z}=\{Z_{1},\ldots,Z_{m}\} and {\bf Z}^{\prime}=\{Z_{1}^{\prime},\ldots,Z_{m}^{\prime}\}. For notational convenience, we denote

d=\mathbb{E}_{Z}\nabla g(x+\sqrt{1-t}Z),\quad d_{m}=\frac{\sum_{i=1}^{m}\nabla g(x+\sqrt{1-t}Z_{i})}{m},
e=\mathbb{E}_{Z}\left[g(x+\sqrt{1-t}Z)+\frac{\varepsilon}{C_{p}(1-\varepsilon)}\right],\quad e_{m}=\frac{\sum_{i=1}^{m}g(x+\sqrt{1-t}Z_{i})}{m}+\frac{\varepsilon}{C_{p}(1-\varepsilon)},
d_{m}^{\prime}=\frac{\sum_{i=1}^{m}\nabla g(x+\sqrt{1-t}Z_{i}^{\prime})}{m},\quad e_{m}^{\prime}=\frac{\sum_{i=1}^{m}g(x+\sqrt{1-t}Z_{i}^{\prime})}{m}+\frac{\varepsilon}{C_{p}(1-\varepsilon)},

where g(x)=\exp(\|x\|^{2}_{2}/2-V(x)). Since d-d_{m}=\mathbb{E}\left[d_{m}^{\prime}-d_{m}|{\bf Z}\right], the conditional Jensen inequality gives \|d-d_{m}\|^{2}_{2}\leq\mathbb{E}\left[\|d_{m}^{\prime}-d_{m}\|^{2}_{2}|{\bf Z}\right]. By (A1), g and \nabla g are Lipschitz continuous, so there exists a finite positive constant \gamma such that for all x,y\in\mathbb{R}^{p},

|g(x)-g(y)|\leq\gamma\|x-y\|_{2}, (A.12)
\|\nabla g(x)-\nabla g(y)\|_{2}\leq\gamma\|x-y\|_{2}. (A.13)

Therefore,

\mathbb{E}\|d-d_{m}\|^{2}_{2} \leq\mathbb{E}\left[\mathbb{E}[\|d_{m}^{\prime}-d_{m}\|^{2}_{2}|{\bf Z}]\right]=\mathbb{E}\|d_{m}^{\prime}-d_{m}\|^{2}_{2}
=\frac{\mathbb{E}_{Z_{1},Z_{1}^{\prime}}\left\|\nabla g(x+\sqrt{1-t}Z_{1})-\nabla g(x+\sqrt{1-t}Z_{1}^{\prime})\right\|^{2}_{2}}{m}
\leq\frac{(1-t)\gamma^{2}}{m}\mathbb{E}_{Z_{1},Z_{1}^{\prime}}\left\|Z_{1}-Z_{1}^{\prime}\right\|^{2}_{2}
\leq\frac{2p\gamma^{2}}{m}, (A.14)

where the second inequality follows from (A.13). Similarly, we also have

\mathbb{E}|e-e_{m}|^{2} \leq\mathbb{E}|e_{m}^{\prime}-e_{m}|^{2}
=\frac{\mathbb{E}_{Z_{1},Z_{1}^{\prime}}\left|g(x+\sqrt{1-t}Z_{1})-g(x+\sqrt{1-t}Z_{1}^{\prime})\right|^{2}}{m}
\leq\frac{(1-t)\gamma^{2}}{m}\mathbb{E}_{Z_{1},Z_{1}^{\prime}}\left\|Z_{1}-Z_{1}^{\prime}\right\|^{2}_{2}
\leq\frac{2p\gamma^{2}}{m}, (A.15)

where the second inequality follows from (A.12). Hence, by (A.14) and (A.15), we have

\underset{x\in\mathbb{R}^{p},t\in[0,1]}{\sup}\mathbb{E}\left\|d-d_{m}\right\|_{2}^{2}\leq\frac{2p\gamma^{2}}{m}, (A.16)
\underset{x\in\mathbb{R}^{p},t\in[0,1]}{\sup}\mathbb{E}|e-e_{m}|^{2}\leq\frac{2p\gamma^{2}}{m}. (A.17)

Then, by (A.12) and (A.13), a direct calculation yields

\|b(x,t)-\tilde{b}_{m}(x,t)\|_{2} =\left\|\frac{d}{e}-\frac{d_{m}}{e_{m}}\right\|_{2}
\leq\frac{\|d\|_{2}|e_{m}-e|+\|d-d_{m}\|_{2}|e|}{|ee_{m}|}
\leq\frac{\gamma|e_{m}-e|+\|d-d_{m}\|_{2}|e|}{(\varepsilon/(C_{p}-C_{p}\varepsilon))^{2}}. (A.18)

For any R>0,

\underset{\|x\|_{2}\leq R}{\sup}g(x)\leq\mathcal{O}\left(\exp(R^{2}/2)\right). (A.19)

Therefore, by (A.16)-(A.19), it can be concluded that

\underset{\|x\|_{2}\leq R,t\in[0,1]}{\sup}\mathbb{E}\left[\|b(x,t)-\tilde{b}_{m}(x,t)\|_{2}^{2}\right] \leq\mathcal{O}\left(\frac{p\exp(R^{2})}{m(\varepsilon/(C_{p}-C_{p}\varepsilon))^{4}}\right)+\mathcal{O}\left(\frac{p}{m(\varepsilon/(C_{p}-C_{p}\varepsilon))^{2}}\right)
\leq\mathcal{O}\left(\frac{p\exp(R^{2})(C_{p})^{4}}{m\varepsilon^{4}}\right)+\mathcal{O}\left(\frac{p(C_{p})^{2}}{m\varepsilon^{2}}\right).

Moreover, if f has a finite upper bound, then so does g. Hence there exists a finite positive constant \zeta such that g\leq\zeta. Similar to (A.18), it follows that for all x\in\mathbb{R}^{p} and t\in[0,1],

\|b(x,t)-\tilde{b}_{m}(x,t)\|_{2}^{2} \leq 2\frac{\gamma^{2}|e_{m}-e|^{2}+(\zeta+\varepsilon/(C_{p}-C_{p}\varepsilon))^{2}\|d-d_{m}\|_{2}^{2}}{(\varepsilon/(C_{p}-C_{p}\varepsilon))^{4}}. (A.20)

Then, by (A.16)-(A.17) and (A.20), it follows that

\underset{x\in\mathbb{R}^{p},t\in[0,1]}{\sup}\mathbb{E}\left[\|b(x,t)-\tilde{b}_{m}(x,t)\|_{2}^{2}\right] \leq\mathcal{O}\left(\frac{p}{m(\varepsilon/(C_{p}-C_{p}\varepsilon))^{4}}\right)+\mathcal{O}\left(\frac{p}{m(\varepsilon/(C_{p}-C_{p}\varepsilon))^{2}}\right)
\leq\mathcal{O}\left(\frac{p(C_{p})^{4}}{m\varepsilon^{4}}\right)+\mathcal{O}\left(\frac{p(C_{p})^{2}}{m\varepsilon^{2}}\right).

∎
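Behind (A.16)-(A.17) is the standard fact that the mean-squared Monte Carlo error of an m-sample average of a Lipschitz function of Z\sim N(0,\mbox{\bf I}_{p}) decays like O(p\gamma^{2}/m). A small empirical sketch of the 1/m rate, using the illustrative test function g(x)=\exp(-\|x\|_{2}^{2}/2), for which \mathbb{E}g(Z)=2^{-p/2} in closed form; the helper name mse_of_mean and the trial counts are our choices.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3
g = lambda x: np.exp(-0.5 * np.sum(x ** 2, axis=-1))   # a bounded Lipschitz test function

def mse_of_mean(m, trials=4000):
    # squared error of the m-sample average e_m against e = E_Z g(Z) = 2^{-p/2}
    z = rng.standard_normal((trials, m, p))
    e_m = g(z).mean(axis=1)
    return np.mean((e_m - 2.0 ** (-p / 2)) ** 2)

# halving-law check: the MSE should shrink roughly 4x as m grows from 40 to 160
```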

Lemma 11.

Assume (A1) holds. Then for k=0,1,\ldots,K,

\mathbb{E}[\|\widetilde{Y}_{t_{k}}^{\varepsilon}\|^{2}_{2}]\leq\mathcal{O}\left(\frac{(C_{p})^{2}}{\varepsilon^{2}}\right)+\mathcal{O}\left(p\right),

where C_{p}=(2\pi)^{p/2}C^{-1}.

Proof.

Define \Theta_{k,t}^{\varepsilon}=\widetilde{Y}_{t_{k}}^{\varepsilon}+(t-t_{k})\tilde{b}_{m}(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k}) and \widetilde{Y}_{t}^{\varepsilon}=\Theta_{k,t}^{\varepsilon}+B_{t}-B_{t_{k}}, where t_{k}\leq t\leq t_{k+1} with k=0,1,\ldots,K-1. By (A1), there exists a finite positive constant \gamma such that g is \gamma-Lipschitz continuous. Then, for all x\in\mathbb{R}^{p} and t\in[0,1], we have

\|b(x,t)\|_{2}^{2}\leq\frac{\gamma^{2}}{(\varepsilon/(C_{p}-C_{p}\varepsilon))^{2}},\quad\|\tilde{b}_{m}(x,t)\|_{2}^{2}\leq\frac{\gamma^{2}}{(\varepsilon/(C_{p}-C_{p}\varepsilon))^{2}}. (A.21)

By (A.21), we have

\|\Theta_{k,t}^{\varepsilon}\|^{2}_{2} =\|\widetilde{Y}_{t_{k}}^{\varepsilon}\|^{2}_{2}+(t-t_{k})^{2}\|\tilde{b}_{m}(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})\|^{2}_{2}+2(t-t_{k})(\widetilde{Y}_{t_{k}}^{\varepsilon})^{\top}\tilde{b}_{m}(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})
\leq(1+s)\|\widetilde{Y}_{t_{k}}^{\varepsilon}\|^{2}_{2}+\frac{(s+s^{2})\gamma^{2}}{(\varepsilon/(C_{p}-C_{p}\varepsilon))^{2}}.

Furthermore, it can be shown that

\mathbb{E}[\|\widetilde{Y}_{t}^{\varepsilon}\|^{2}_{2}|\widetilde{Y}_{t_{k}}^{\varepsilon}] =\mathbb{E}[\|\Theta_{k,t}^{\varepsilon}\|^{2}_{2}|\widetilde{Y}_{t_{k}}^{\varepsilon}]+(t-t_{k})p
\leq(1+s)\mathbb{E}\|\widetilde{Y}_{t_{k}}^{\varepsilon}\|^{2}_{2}+\frac{(s+s^{2})\gamma^{2}}{(\varepsilon/(C_{p}-C_{p}\varepsilon))^{2}}+sp.

Therefore,

\mathbb{E}[\|\widetilde{Y}_{t_{k+1}}^{\varepsilon}\|^{2}_{2}]\leq(1+s)\mathbb{E}[\|\widetilde{Y}_{t_{k}}^{\varepsilon}\|^{2}_{2}]+\frac{(s+s^{2})\gamma^{2}}{(\varepsilon/(C_{p}-C_{p}\varepsilon))^{2}}+sp.

Since \widetilde{Y}_{t_{0}}^{\varepsilon}=0, by induction we have

\mathbb{E}[\|\widetilde{Y}_{t_{k+1}}^{\varepsilon}\|^{2}_{2}]\leq\frac{6\gamma^{2}}{(\varepsilon/(C_{p}-C_{p}\varepsilon))^{2}}+3p\leq\mathcal{O}\left(\frac{(C_{p})^{2}}{\varepsilon^{2}}\right)+\mathcal{O}\left(p\right).

∎
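The induction above unrolls a linear recursion: with a_{0}=0 and a_{k+1}\leq(1+s)a_{k}+c, the geometric sum gives a_{K}\leq c((1+s)^{K}-1)/s\leq(e-1)c/s when Ks=1, which is essentially where the constants 6\gamma^{2} and 3p come from. A few lines verify the closed form numerically (the values of K and c are arbitrary stand-ins):

```python
import math

K = 1000
s = 1.0 / K        # step size, with K * s = 1
c = 0.01           # stands in for (s + s^2) gamma^2 / (eps/(C_p - C_p eps))^2 + s p
a = 0.0
for _ in range(K):
    a = (1 + s) * a + c            # the recursion a_{k+1} = (1+s) a_k + c
closed_form = c * ((1 + s) ** K - 1) / s
bound = (math.e - 1) * c / s       # since (1+s)^K <= e when K * s = 1
```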

Lemma 12.

Assume (A1) holds. Then for k=0,1,\ldots,K and t\in[0,1],

\mathbb{E}\left\|b(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})\right\|_{2}^{2} \leq\mathcal{O}\left(\frac{p(C_{p})^{4}}{\sqrt{m}\varepsilon^{4}}\right)+\mathcal{O}\left(\frac{(C_{p})^{4}}{\log(m)\varepsilon^{4}}\right)+\mathcal{O}\left(\frac{p(C_{p})^{2}}{\log(m)\varepsilon^{2}}\right),

where C_{p}=(2\pi)^{p/2}C^{-1}. Moreover, if f has a finite upper bound, then

\mathbb{E}\left\|b(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})\right\|_{2}^{2}\leq\mathcal{O}\left(\frac{p(C_{p})^{4}}{m\varepsilon^{4}}\right)+\mathcal{O}\left(\frac{p(C_{p})^{2}}{m\varepsilon^{2}}\right).
Proof.

For any R>0,

\begin{split}\mathbb{E}\left\|b(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})\right\|_{2}^{2}&=\mathbb{E}_{\widetilde{Y}_{t_{k}}^{\varepsilon}}\mathbb{E}_{Z}\left[\left\|b(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})\right\|_{2}^{2}1(\|\widetilde{Y}_{t_{k}}^{\varepsilon}\|_{2}\leq R)\right]\\ &\quad+\mathbb{E}_{\widetilde{Y}_{t_{k}}^{\varepsilon}}\mathbb{E}_{Z}\left[\left\|b(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})\right\|_{2}^{2}1(\|\widetilde{Y}_{t_{k}}^{\varepsilon}\|_{2}>R)\right]\end{split} (A.22)

Next, we bound the two terms on the right-hand side of (A.22). First, by Lemma 10, we have

\mathbb{E}_{\widetilde{Y}_{t_{k}}^{\varepsilon}}\mathbb{E}_{Z}\left[\left\|b(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})\right\|_{2}^{2}1(\|\widetilde{Y}_{t_{k}}^{\varepsilon}\|_{2}\leq R)\right]\leq\mathcal{O}\left(\frac{p\exp(R^{2})(C_{p})^{4}}{m\varepsilon^{4}}\right)+\mathcal{O}\left(\frac{p(C_{p})^{2}}{m\varepsilon^{2}}\right).

Second, by combining (A.21) and Lemma 11 with Markov's inequality, we have

\mathbb{E}_{\widetilde{Y}_{t_{k}}^{\varepsilon}}\mathbb{E}_{Z}\left[\left\|b(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})\right\|_{2}^{2}1(\|\widetilde{Y}_{t_{k}}^{\varepsilon}\|_{2}>R)\right] \leq\mathcal{O}\left(\frac{(C_{p})^{4}}{R^{2}\varepsilon^{4}}\right)+\mathcal{O}\left(\frac{p(C_{p})^{2}}{R^{2}\varepsilon^{2}}\right).

Therefore,

\mathbb{E}\left\|b(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})\right\|_{2}^{2} \leq\mathcal{O}\left(\frac{p\exp(R^{2})(C_{p})^{4}}{m\varepsilon^{4}}\right)+\mathcal{O}\left(\frac{p(C_{p})^{2}}{m\varepsilon^{2}}\right)
\quad+\mathcal{O}\left(\frac{(C_{p})^{4}}{R^{2}\varepsilon^{4}}\right)+\mathcal{O}\left(\frac{p(C_{p})^{2}}{R^{2}\varepsilon^{2}}\right). (A.23)

Setting R=\left(\frac{\log(m)}{2}\right)^{1/2} in (A.23), we have

\mathbb{E}\left\|b(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})\right\|_{2}^{2} \leq\mathcal{O}\left(\frac{p(C_{p})^{4}}{\sqrt{m}\varepsilon^{4}}\right)+\mathcal{O}\left(\frac{(C_{p})^{4}}{\log(m)\varepsilon^{4}}\right)+\mathcal{O}\left(\frac{p(C_{p})^{2}}{\log(m)\varepsilon^{2}}\right).

Moreover, if f has a finite upper bound, then by Lemma 10, we can similarly get

\mathbb{E}\left\|b(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})\right\|_{2}^{2} =\mathbb{E}_{\widetilde{Y}_{t_{k}}^{\varepsilon}}\mathbb{E}_{Z}\left[\left\|b(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})-\tilde{b}_{m}(\widetilde{Y}_{t_{k}}^{\varepsilon},t_{k})\right\|_{2}^{2}\right]
\leq\mathcal{O}\left(\frac{p(C_{p})^{4}}{m\varepsilon^{4}}\right)+\mathcal{O}\left(\frac{p(C_{p})^{2}}{m\varepsilon^{2}}\right).

This completes the proof. ∎

A.6 Proof of Theorem 4

Proof.

By the triangle inequality, W_{2}(Law(\widetilde{Y}_{t_{K}}^{\varepsilon}),\mu)\leq W_{2}(Law(\widetilde{Y}_{t_{K}}^{\varepsilon}),\mu_{\varepsilon})+W_{2}(\mu,\mu_{\varepsilon}); we bound the two terms on the right-hand side separately.

First, arguing as in the proof of Theorem 2, using Lemma 12 and \widetilde{Y}_{t_{0}}^{\varepsilon}=X_{t_{0}}=0, we conclude that

W_{2}(Law(\widetilde{Y}_{t_{K}}^{\varepsilon}),\mu_{\varepsilon})\leq\mathcal{O}(\sqrt{ps})+\mathcal{O}\left(\frac{\sqrt{p}(C_{p})^{2}}{m^{1/4}\varepsilon^{2}}\right)+\mathcal{O}\left(\frac{(C_{p})^{2}}{\sqrt{\log(m)}\varepsilon^{2}}\right)+\mathcal{O}\left(\frac{\sqrt{p}C_{p}}{\sqrt{\log(m)}\varepsilon}\right). (A.24)

Second, we bound W_{2}(\mu,\mu_{\varepsilon}). Let Y\sim\mu, Z\sim N(0,\mbox{\bf I}_{p}), and let \theta be a Bernoulli random variable with P(\theta=1)=1-\varepsilon and P(\theta=0)=\varepsilon. Assume Y, Z and \theta are mutually independent. Then (Y,(1-\theta)Z+\theta Y) is a coupling of (\mu,\mu_{\varepsilon}); denote its joint distribution by \pi. Therefore, we have

\int_{\mathbb{R}^{p}\times\mathbb{R}^{p}}\|x-y\|_{2}^{2}\,d\pi =\mathbb{E}\left\|Y-((1-\theta)Z+\theta Y)\right\|_{2}^{2}
=\mathbb{E}\left[\mathbb{E}\left[\|Y-((1-\theta)Z+\theta Y)\|_{2}^{2}|\theta\right]\right]
=\mathbb{E}\left[\mathbb{E}\left[\|Y-((1-\theta)Z+\theta Y)\|_{2}^{2}|\theta=1\right]\right]P(\theta=1)
\quad+\mathbb{E}\left[\mathbb{E}\left[\|Y-((1-\theta)Z+\theta Y)\|_{2}^{2}|\theta=0\right]\right]P(\theta=0)
=\mathbb{E}[\|Y-Z\|_{2}^{2}|\theta=0]P(\theta=0)
=\varepsilon\mathbb{E}\|Y-Z\|_{2}^{2}
\leq\mathcal{O}(p\varepsilon).

Then we have

W_{2}(\mu,\mu_{\varepsilon})\leq\mathcal{O}(\sqrt{p\varepsilon}). (A.25)
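The coupling cost just computed is easy to check by simulation: drawing (Y,(1-\theta)Z+\theta Y) directly and averaging the squared distance recovers \varepsilon\mathbb{E}\|Y-Z\|_{2}^{2}. The Gaussian toy target N(2\cdot\mbox{\bf 1}_{p},\mbox{\bf I}_{p}) and all constants below are our illustration, not the paper's setting:

```python
import numpy as np

rng = np.random.default_rng(1)
p, eps, n = 4, 0.1, 200_000
y = 2.0 + rng.standard_normal((n, p))     # Y ~ mu = N(2 * 1_p, I_p), a toy target
z = rng.standard_normal((n, p))           # Z ~ N(0, I_p)
theta = rng.random(n) < 1 - eps           # P(theta = 1) = 1 - eps
coupled = np.where(theta[:, None], y, z)  # (1 - theta) Z + theta Y
cost = np.mean(np.sum((y - coupled) ** 2, axis=1))
# theoretical cost: eps * E||Y - Z||^2 = eps * (||2 * 1_p||^2 + 2p) = 0.1 * 24 = 2.4
```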

Combining (A.24) with (A.25) yields

W_{2}(Law(\widetilde{Y}_{t_{K}}^{\varepsilon}),\mu)
\leq\mathcal{O}(\sqrt{p\varepsilon})+\mathcal{O}(\sqrt{ps})+\mathcal{O}\left(\frac{\sqrt{p}(C_{p})^{2}}{m^{1/4}\varepsilon^{2}}\right)+\mathcal{O}\left(\frac{(C_{p})^{2}}{\sqrt{\log(m)}\varepsilon^{2}}\right)+\mathcal{O}\left(\frac{\sqrt{p}C_{p}}{\sqrt{\log(m)}\varepsilon}\right). (A.26)

Setting \varepsilon=(\log(m))^{-1/5} in (A.26), there exists a constant \widetilde{C}_{p} depending on p such that

W_{2}(Law(\widetilde{Y}_{t_{K}}^{\varepsilon}),\mu)\leq\widetilde{C}_{p}\cdot\mathcal{O}\left(\frac{1}{(\log(m))^{1/10}}\right)+\mathcal{O}(\sqrt{ps}).

Moreover, if f has a finite upper bound, then, similar to the proof of Theorem 3 and by (A.25) and Lemma 12, we have

W_{2}(Law(\widetilde{Y}_{t_{K}}^{\varepsilon}),\mu)\leq\mathcal{O}(\sqrt{p\varepsilon})+\mathcal{O}(\sqrt{ps})+\mathcal{O}\left(\frac{\sqrt{p}(C_{p})^{2}}{\sqrt{m}\varepsilon^{2}}\right)+\mathcal{O}\left(\frac{\sqrt{p}C_{p}}{\sqrt{m}\varepsilon}\right). (A.27)

Setting \varepsilon=m^{-1/5} in (A.27), there exists a constant \widetilde{C}_{p} depending on p such that

W_{2}(Law(\widetilde{Y}_{t_{K}}^{\varepsilon}),\mu)\leq\widetilde{C}_{p}\cdot\mathcal{O}\left(\frac{1}{m^{1/10}}\right)+\mathcal{O}(\sqrt{ps}).

∎

References

  • [1] Mathias Barkhagen, Ngoc Huy Chau, Eric Moulines, Miklos Rasonyi, Sotirios Sabanis, and Ying Zhang, On stochastic gradient langevin dynamics with dependent data streams in the logconcave case, arXiv preprint arXiv:1812.02709, (2018).
  • [2] Joris Bierkens, Paul Fearnhead, Gareth Roberts, et al., The zig-zag process and super-efficient sampling for bayesian analysis of big data, The Annals of Statistics, 47 (2019), pp. 1288–1320.
  • [3] Nawaf Bou-Rabee, Andreas Eberle, Raphael Zimmer, et al., Coupling and convergence for hamiltonian monte carlo, Annals of Applied Probability, 30 (2020), pp. 1209–1250.
  • [4] Alexandre Bouchard-Cote, Sebastian J Vollmer, and Arnaud Doucet, The bouncy particle sampler: A nonreversible rejection-free markov chain monte carlo method, Journal of the American Statistical Association, 113 (2018), pp. 855–867.
  • [5] Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng, Handbook of markov chain monte carlo, CRC press, 2011.
  • [6] Ngoc Huy Chau, Eric Moulines, Miklos Rasonyi, Sotirios Sabanis, and Ying Zhang, On stochastic gradient langevin dynamics with dependent data streams: the fully non-convex case, arXiv preprint arXiv:1905.13142, (2019).
  • [7] Yongxin Chen, Tryphon T Georgiou, and Michele Pavon, Stochastic control liaisons: Richard Sinkhorn meets Gaspard Monge on a Schrödinger bridge, SIAM Review, 63 (2021), pp. 249–313.
  • [8] Xiang Cheng and Peter Bartlett, Convergence of langevin mcmc in kl-divergence, Proceedings of Machine Learning Research, Volume 83: Algorithmic Learning Theory, (2018), pp. 186–211.
  • [9] Xiang Cheng, Niladri S Chatterji, Yasin Abbasi-Yadkori, Peter L Bartlett, and Michael I Jordan, Sharp convergence rates for langevin dynamics in the nonconvex setting, arXiv preprint arXiv:1805.01648, (2018).
  • [10] Paolo Dai Pra, A stochastic control approach to reciprocal diffusion processes, Applied mathematics and Optimization, 23 (1991), pp. 313–329.
  • [11] Arnak S Dalalyan, Further and stronger analogy between sampling and optimization: Langevin monte carlo and gradient descent, arXiv: Statistics Theory, (2017).
  • [12]  , Theoretical guarantees for approximate sampling from smooth and log-concave densities, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79 (2017), pp. 651–676.
  • [13] Arnak S Dalalyan and Avetik G Karagulyan, User-friendly guarantees for the langevin monte carlo with inaccurate gradient, Stochastic Processes and their Applications, 129 (2019), pp. 5278–5311.
  • [14] David B Dunson and JE Johndrow, The hastings algorithm at fifty, Biometrika, 107 (2020), pp. 1–23.
  • [15] Alain Durmus and Eric Moulines, High-dimensional bayesian inference via the unadjusted langevin algorithm, arXiv: Statistics Theory, (2016).
  • [16]  , Sampling from a strongly log-concave distribution with the unadjusted langevin algorithm, arXiv: Statistics Theory, (2016).
  • [17] Alain Durmus, Eric Moulines, et al., Nonasymptotic convergence analysis for the unadjusted langevin algorithm, The Annals of Applied Probability, 27 (2017), pp. 1551–1587.
  • [18] Ronen Eldan, Joseph Lehec, Yair Shenfeld, et al., Stability of the logarithmic Sobolev inequality via the Föllmer process, in Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, vol. 56, Institut Henri Poincaré, 2020, pp. 2253–2269.
  • [19] Hans Föllmer, An entropy approach to the time reversal of diffusion processes, in Stochastic Differential Systems Filtering and Control, Springer, 1985, pp. 156–163.
  • [20]  , Time reversal on Wiener space, in Stochastic processes-mathematics and physics, Springer, 1986, pp. 119–129.
  • [21]  , Random fields and diffusion processes, in École d'Été de Probabilités de Saint-Flour XV–XVII, 1985–87, Springer, 1988, pp. 101–203.
  • [22] Alan E Gelfand and Adrian FM Smith, Sampling-based approaches to calculating marginal densities, Journal of the American statistical association, 85 (1990), pp. 398–409.
  • [23] Stuart Geman and Donald Geman, Stochastic relaxation, gibbs distributions, and the bayesian restoration of images, IEEE Transactions on pattern analysis and machine intelligence, (1984), pp. 721–741.
  • [24] Jack K Hale, Asymptotic behavior of dissipative systems, no. 25, American Mathematical Soc., 2010.
  • [25] W Keith Hastings, Monte carlo sampling methods using markov chains and their applications, (1970).
  • [26] Jian Huang, Yuling Jiao, Lican Kang, Xu Liao, Jin Liu, and Yanyan Liu, Schrödinger-Föllmer sampler: Sampling without ergodicity, arXiv preprint arXiv:2106.10880, (2021).
  • [27] Joseph Lehec, Representation formula for the entropy and functional inequalities, in Annales de l'IHP Probabilités et statistiques, vol. 49, 2013, pp. 885–899.
  • [28] Christian Léonard, A survey of the Schrödinger problem and some of its connections with optimal transport, Discrete and Continuous Dynamical Systems-A, 34 (2014), p. 1533.
  • [29] Jun S Liu, Monte Carlo strategies in scientific computing, Springer Science & Business Media, 2008.
  • [30] Yi-An Ma, Yuansi Chen, Chi Jin, Nicolas Flammarion, and Michael I Jordan, Sampling can be faster than optimization, Proceedings of the National Academy of Sciences, 116 (2019), pp. 20881–20885.
  • [31] Georg Menz, André Schlichting, et al., Poincaré and logarithmic Sobolev inequalities by decomposition of the energy landscape, Annals of Probability, 42 (2014), pp. 1809–1884.
  • [32] Nicholas Metropolis, Arianna W Rosenbluth, Marshall N Rosenbluth, Augusta H Teller, and Edward Teller, Equation of state calculations by fast computing machines, The journal of chemical physics, 21 (1953), pp. 1087–1092.
  • [33] Wenlong Mou, Nicolas Flammarion, Martin J Wainwright, and Peter L Bartlett, Improved bounds for discretization of langevin diffusions: Near-optimal rates without convexity, arXiv preprint arXiv:1907.11331, (2019).
  • [34] Maxim Raginsky, Alexander Rakhlin, and Matus Telgarsky, Non-convex learning via stochastic gradient langevin dynamics: a nonasymptotic analysis, in Conference on Learning Theory, PMLR, 2017, pp. 1674–1703.
  • [35] Christian P Robert, George Casella, and George Casella, Introducing monte carlo methods with r, vol. 18, Springer, 2010.
  • [36] Erwin Schrödinger, Sur la théorie relativiste de l'électron et l'interprétation de la mécanique quantique, in Annales de l'institut Henri Poincaré, vol. 2, 1932, pp. 269–310.
  • [37] Luke Tierney, Markov chains for exploring posterior distributions, The Annals of Statistics, 22 (1994), pp. 1701–1728.
  • [38] Belinda Tzen and Maxim Raginsky, Theoretical guarantees for sampling and inference in generative models with latent diffusions, in Conference on Learning Theory, PMLR, 2019, pp. 3084–3114.
  • [39] Feng-Yu Wang et al., Log-sobolev inequalities: different roles of ric and hess, The Annals of Probability, 37 (2009), pp. 1587–1604.
  • [40] Ying Zhang, Omer Deniz Akyildiz, Theo Damoulas, and Sotirios Sabanis, Nonasymptotic estimates for stochastic gradient langevin dynamics under local conditions in nonconvex optimization, arXiv preprint arXiv:1910.02008, (2019).