
Fast-oscillating random perturbations of Hamiltonian systems

Shuo Yan
Department of Mathematics, University of Maryland
College Park, Maryland, United States
[email protected]
Abstract

We consider coupled slow-fast stochastic processes, where the averaged slow motion is given by a two-dimensional Hamiltonian system with multiple critical points. On a proper time scale, the evolution of the first integral converges to a diffusion process on the corresponding Reeb graph, with certain gluing conditions specified at the interior vertices, as in the case of additive white noise perturbations of Hamiltonian systems considered by M. Freidlin and A. Wentzell. The current paper provides the first result where the motion on a graph and the corresponding gluing conditions appear due to the averaging of a slow-fast system, with a Hamiltonian structure, on a large time scale. The result allows one to consider, for instance, long-time diffusion approximation for an oscillator with a potential with more than one well.
Keywords: Averaging, slow-fast system, gluing conditions, diffusion approximation
Mathematics Subject Classification: 37J40, 60F17

1 Introduction

Consider a family of diffusion processes (𝑿tε,𝝃tε)({\bm{X}}_{t}^{\varepsilon},{\bm{\xi}}_{t}^{\varepsilon}) in 2×𝕋m\mathbb{R}^{2}\times\mathbb{T}^{m} satisfying

d𝑿tε=\displaystyle d\bm{X}_{t}^{\varepsilon}= b(𝑿tε,𝝃tε)dt,𝑿0ε=x02,\displaystyle\leavevmode\nobreak\ b({\bm{X}}_{t}^{\varepsilon},{\bm{\xi}}_{t}^{\varepsilon})dt,\leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ {\bm{X}}_{0}^{\varepsilon}=x_{0}\in\mathbb{R}^{2}, (1.1)
d𝝃tε=\displaystyle d{\bm{\xi}}_{t}^{\varepsilon}= 1εv(𝝃tε)dt+1εσ(𝝃tε)dWt,𝝃0ε=y0𝕋m,\displaystyle\leavevmode\nobreak\ \frac{1}{\varepsilon}v({\bm{\xi}}_{t}^{\varepsilon})dt+\frac{1}{\sqrt{\varepsilon}}\sigma({\bm{\xi}}_{t}^{\varepsilon})dW_{t},\leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ {\bm{\xi}}_{0}^{\varepsilon}=y_{0}\in\mathbb{T}^{m},

where ε\varepsilon is a small positive parameter, 𝕋m\mathbb{T}^{m} is the mm-dimensional torus, and WtW_{t} is an mm-dimensional Brownian motion. In the coupled slow-fast system, 𝑿tε{\bm{X}}_{t}^{\varepsilon} is the slow component and 𝝃tε{\bm{\xi}}_{t}^{\varepsilon} is the fast component, since the generator of the diffusion in the second equation is multiplied by ε1\varepsilon^{-1}. On the space 2×𝕋m\mathbb{R}^{2}\times\mathbb{T}^{m}, the diffusion (1.1) is everywhere degenerate. All the randomness comes from the second equation and is transmitted to 𝑿tε{\bm{X}}_{t}^{\varepsilon} through the vector b(𝑿tε,𝝃tε)b({\bm{X}}_{t}^{\varepsilon},{\bm{\xi}}_{t}^{\varepsilon}), which is fast-oscillating in time. Under natural conditions, the averaging principle holds for the process in (1.1) (cf. [11]). For example, if σ(y)\sigma(y) is non-degenerate (and thus 𝝃tε{\bm{\xi}}_{t}^{\varepsilon} has a unique invariant measure μ\mu independent of ε\varepsilon), then 𝑿tε{\bm{X}}_{t}^{\varepsilon} converges as ε0\varepsilon\to 0 in probability on each finite interval [0,T][0,T] to an averaged process defined by the differential equation

d𝒙t=b¯(𝒙t)dt,d{\bm{x}_{t}}=\bar{b}({\bm{x}_{t}})dt, (1.2)

where b¯(x)=𝕋mb(x,y)𝑑μ(y)\bar{b}(x)=\int_{\mathbb{T}^{m}}b(x,y)d\mu(y). Therefore, 𝑿tε{\bm{X}}_{t}^{\varepsilon} can be viewed as a result of fast-oscillating random perturbations, i.e. b(x,y)b¯(x)b(x,y)-\bar{b}(x), of the deterministic process 𝒙t{\bm{x}_{t}}. Moreover, the deviation can be described more precisely: the process ε1/2(𝑿tε𝒙t)\varepsilon^{-1/2}({\bm{X}}_{t}^{\varepsilon}-{\bm{x}_{t}}) converges weakly to a Gaussian Markov process on a finite interval [0,T][0,T] (cf. [14], [2]), and, if we assume a special type of vector b(x,y)b(x,y), then the local limit theorem holds for ε1(𝑿tε𝒙t)\varepsilon^{-1}({\bm{X}}_{t}^{\varepsilon}-{\bm{x}_{t}}) at time tt ([16]).
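To make the averaging statement concrete, here is a minimal numerical sketch (with an illustrative double-well Hamiltonian, a one-dimensional fast variable, and a particular choice of $b$, none of which are taken from the paper) that simulates (1.1) by an Euler-Maruyama scheme and checks that $H({\bm{X}}_{t}^{\varepsilon})$ is nearly constant on a finite time interval when $\varepsilon$ is small.

```python
# A hedged numerical sketch of the slow-fast system (1.1); H, b, v, sigma below are
# illustrative choices, not the ones considered in the paper.
import numpy as np

def grad_H(x):                       # H(x) = x2^2/2 + (x1^2 - 1)^2/4  (assumed double well)
    return np.array([x[0]**3 - x[0], x[1]])

def grad_perp_H(x):                  # Hamiltonian (rotated gradient) vector field
    g = grad_H(x)
    return np.array([g[1], -g[0]])

def b(x, y):                         # the y-part has zero mean, so the averaged drift is grad_perp_H
    return grad_perp_H(x) + np.array([np.cos(2*np.pi*y), np.sin(2*np.pi*y)])

def simulate(eps, T, dt, x0=(1.5, 0.0), y0=0.0, seed=0):
    rng = np.random.default_rng(seed)
    x, y = np.array(x0, dtype=float), y0
    H_path = np.empty(int(T/dt))
    for k in range(len(H_path)):
        # fast component: v = 0, sigma = 1, so xi is Brownian motion on the circle
        y = (y + np.sqrt(dt/eps)*rng.standard_normal()) % 1.0
        x = x + b(x, y)*dt            # Euler step for the slow component of (1.1)
        H_path[k] = 0.5*x[1]**2 + 0.25*(x[0]**2 - 1)**2
    return H_path

H_path = simulate(eps=1e-3, T=2.0, dt=1e-5)
print("oscillation of H over [0, T]:", H_path.max() - H_path.min())   # small when eps is small
```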

If the system (1.2) has a first integral $H$, then, by the averaging principle, $H({\bm{X}}_{t}^{\varepsilon})$ is nearly constant on finite time intervals when $\varepsilon$ is small. Nontrivial behavior can, however, be observed on larger time intervals (of order $\varepsilon^{-1}$). Assume, momentarily, that $H$ has a single critical point. Then it was demonstrated in [3] that, under additional assumptions, $H({\bm{X}}_{t/\varepsilon}^{\varepsilon})$ converges weakly in $C([0,T])$, as $\varepsilon\to 0$, to a diffusion process for each finite $T$. A similar result in the case of multiple degrees of freedom was obtained recently in [9], where the main goal was to overcome difficulties related to resonances, which are typical when there are multiple degrees of freedom. That result holds in the region where no critical points of the first integrals are present and action-angle-type coordinates can be introduced.

Figure 1: Level sets and the Reeb graph.

Let us return to the two-dimensional situation. In the presence of multiple critical points, including saddle points, the problem gets more complicated as we need to consider the Reeb graph in order to describe the evolution of the first integrals denoted by h=(k,H)h=(k,H), where the additional discrete-valued first integral kk is the label of the edge on the Reeb graph. (For instance, in Figure 1, we have one saddle point and two local minima of HH in the space 2\mathbb{R}^{2}. Accordingly, on the graph, we have one interior vertex, three exterior vertices, including one formally representing the infinity, and three edges connecting them.) In particular, the interior vertices on the graph correspond to the level curves that contain the saddle points, and those level curves are called the separatrices. In this situation, the limiting behavior has already been described for the white-noise-type additive perturbations of dynamical systems: Hamiltonian systems in 2\mathbb{R}^{2} ([12]), general dynamical systems with conservation laws in n\mathbb{R}^{n} ([10]), and Hamiltonian systems with an ergodic component on two-dimensional surfaces ([5],[6],[4]).
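For instance (an illustrative example mirroring Figure 1, not a construction from the paper), a double-well Hamiltonian has two non-degenerate minima and one non-degenerate saddle: the saddle gives the interior vertex, the minima and the formal point at infinity give the three exterior vertices, and the three edges are labeled by $k$ in $h=(k,H)$. The following sketch classifies the critical points by the sign pattern of the Hessian eigenvalues.

```python
# Classification of critical points for an assumed double-well H (cf. Figure 1).
import numpy as np

def H(x):
    return 0.5*x[1]**2 + 0.25*(x[0]**2 - 1)**2

def hessian(x):
    return np.array([[3*x[0]**2 - 1, 0.0],
                     [0.0,           1.0]])

for cp in [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]:          # zeros of grad H
    eigs = np.linalg.eigvalsh(hessian(np.array(cp)))
    kind = "saddle -> interior vertex" if eigs[0]*eigs[1] < 0 else "extremum -> exterior vertex"
    print(cp, "H =", H(np.array(cp)), kind)

# Resulting Reeb graph: edges I_1, I_2 for the two wells below the saddle level H = 1/4,
# edge I_3 for the region outside the separatrix, all meeting at the interior vertex.
```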

In this article, we consider fast-oscillating random perturbations, as discussed above, of a Hamiltonian system in $\mathbb{R}^{2}$ with multiple critical points and prove that the evolution of the first integrals $h$ converges to a diffusion process defined by an operator $(\mathcal{L},D(\mathcal{L}))$ on the corresponding Reeb graph. In particular, the exterior vertices turn out to be inaccessible, and the behavior of the process near the interior vertices is described in terms of the domain $D(\mathcal{L})$ in the following way: for each interior vertex $O_{i}$, there are constants $p_{k}$ such that each function $f\in D(\mathcal{L})$ satisfies

IkOipklimhkOif(hk)=0,\sum_{I_{k}\sim O_{i}}p_{k}\lim_{h_{k}\to O_{i}}f^{\prime}(h_{k})=0, (1.3)

where $I_{k}\sim O_{i}$ means that $O_{i}$ is an endpoint of $I_{k}$. Intuitively, the absolute value of $p_{k}$ is proportional to the probability of entering the edge $I_{k}$ after the process arrives at the vertex $O_{i}$. The relation (1.3) is usually referred to as the gluing condition. In the next section, we formulate the results along with the assumptions more precisely. The coefficients $p_{k}$ will be calculated explicitly. As mentioned above, similar results hold in the case of additive perturbations of Hamiltonian systems. However, the techniques in the present proof are more involved and require new ideas, with analysis on multiple time scales: $O(\varepsilon^{-1})$, $O(1)$, $O(\varepsilon)$, etc. It is worth noting that our result provides the first example where the motion on a graph and the corresponding gluing conditions appear as a result of averaging of a slow-fast system, with a Hamiltonian structure, on a large time scale.

In the remainder of this section, we briefly introduce the main idea and the critical steps of the proof, and outline the plan of the paper. To start with, since our interest is in the long-time behavior of 𝑿tε{\bm{X}}_{t}^{\varepsilon} on O(ε1)O(\varepsilon^{-1}) time scales, it is often convenient to consider a temporally re-scaled process (Xtε,ξtε)(X_{t}^{\varepsilon},\xi_{t}^{\varepsilon}):

dXtε=\displaystyle dX_{t}^{\varepsilon}= 1εb(Xtε,ξtε)dt,X0ε=x02,\displaystyle\leavevmode\nobreak\ \frac{1}{\varepsilon}b(X_{t}^{\varepsilon},\xi_{t}^{\varepsilon})dt,\leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ X_{0}^{\varepsilon}=x_{0}\in\mathbb{R}^{2}, (1.4)
dξtε=\displaystyle d\xi_{t}^{\varepsilon}= 1ε2v(ξtε)dt+1εσ(ξtε)dWt,ξ0ε=y0𝕋m.\displaystyle\leavevmode\nobreak\ \frac{1}{\varepsilon^{2}}v(\xi_{t}^{\varepsilon})dt+\frac{1}{{\varepsilon}}\sigma(\xi_{t}^{\varepsilon})dW_{t},\leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \xi_{0}^{\varepsilon}=y_{0}\in\mathbb{T}^{m}.

It is clear that (Xtε,ξtε)=(𝑿t/εε,𝝃t/εε)(X_{t}^{\varepsilon},\xi_{t}^{\varepsilon})=({\bm{X}}_{t/\varepsilon}^{\varepsilon},{\bm{\xi}}_{t/\varepsilon}^{\varepsilon}) in distribution. Thus, it suffices to prove the weak convergence of h(Xtε)h(X_{t}^{\varepsilon}) in the space 𝐂([0,T],𝔾)\bm{\mathrm{C}}([0,T],\mathbb{G}), where 𝔾\mathbb{G} is the Reeb graph. The proof of the weak convergence relies on demonstrating that the pre-limiting process asymptotically solves the martingale problem. Namely, we will show that, for each ff in a sufficiently large subset of D()D(\mathcal{L}) and T>0T>0,

𝐄(x,y)[f(h(XTε))f(h(x))0Tf(h(Xtε))𝑑t]0,\bm{\mathrm{E}}_{(x,y)}[f(h(X_{T}^{\varepsilon}))-f(h(x))-\int_{0}^{T}\mathcal{L}f(h(X_{t}^{\varepsilon}))dt]\to 0, (1.5)

as ε0\varepsilon\to 0, uniformly in xx in any compact set in 2\mathbb{R}^{2} and in y𝕋my\in\mathbb{T}^{m}. Note that, contrary to the standard formulation of the martingale problem, there is no conditioning in (1.5). However, (1.5) is still enough for our purpose (see Lemma 2.4), since (Xtε,ξtε)ε>0(X_{t}^{\varepsilon},\xi_{t}^{\varepsilon})_{\varepsilon>0} is a family of strong Markov processes. The main idea in our proof of (1.5) is to divide the time interval [0,T][0,T] into excursions between different visits to the separatrices and show that the contribution from each excursion is small and they do not accumulate. For example, suppose for now that there is only one saddle point, as shown in Figure 2. Let OO be the saddle point with H(O)=0H(O)=0, γ\gamma be the separatrix, γ={x:|H(x)|=εα}\gamma^{\prime}=\{x:|H(x)|=\varepsilon^{\alpha}\} be a set near the separatrix, where 0<α<1/20<\alpha<1/2, and σ0\sigma\geq 0 be the first time when the process XtεX_{t}^{\varepsilon} reaches γ\gamma. Define inductively the two sequences of stopping times:

σ0=\displaystyle\sigma_{0}= σ,\displaystyle\leavevmode\nobreak\ \sigma, (1.6)
τn=\displaystyle\tau_{n}= inf{t>σn1:Xtεγ},\displaystyle\inf\{t>\sigma_{n-1}:X_{t}^{\varepsilon}\in\gamma^{\prime}\},
σn=\displaystyle\sigma_{n}= inf{t>τn:Xtεγ},\displaystyle\inf\{t>\tau_{n}:X_{t}^{\varepsilon}\in\gamma\},

and consequently two Markov chains (Xτnε,ξτnε)(X_{\tau_{n}}^{\varepsilon},\xi_{\tau_{n}}^{\varepsilon}) and (Xσnε,ξσnε)(X_{\sigma_{n}}^{\varepsilon},\xi_{\sigma_{n}}^{\varepsilon}).
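The following sketch (a crude discretization, not the construction used in the proofs) shows how the alternating stopping times in (1.6) can be read off a sampled trajectory of $H(X_{t}^{\varepsilon})$, for instance the array produced by the earlier simulation snippet after subtracting the saddle value of $H$, so that the saddle level is $H(O)=0$.

```python
# Hedged sketch: extract the indices of the alternating visits to gamma = {H = 0} and
# gamma' = {|H| = eps^alpha} from a discretized trajectory of H(X_t^eps).
import numpy as np

def excursion_times(H_path, eps, alpha=0.25):
    level = eps**alpha
    sigmas, taus = [], []
    looking_for = "sigma"                       # first find sigma_0, the hitting time of gamma
    for k in range(1, len(H_path)):
        if looking_for == "sigma" and H_path[k-1]*H_path[k] <= 0.0:
            sigmas.append(k)                    # a sign change approximates crossing {H = 0}
            looking_for = "tau"
        elif looking_for == "tau" and abs(H_path[k]) >= level:
            taus.append(k)                      # reached gamma' = {|H| = eps^alpha}
            looking_for = "sigma"
    return np.array(sigmas), np.array(taus)
```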

Figure 2: Construction of discrete Markov chains.

As pointed out earlier, we wish to prove that the contributions to (1.5) from all individual excursions are small and the sum converges to zero as ε0\varepsilon\downarrow 0. Thus, except for the first and the last excursions, it is sufficient to show that (a) the expectation corresponding to one excursion [σn,σn+1][\sigma_{n},\sigma_{n+1}] converges to zero as ε0\varepsilon\downarrow 0 uniformly in initial distribution, (b) the expectation corresponding to one excursion [σn,σn+1][\sigma_{n},\sigma_{n+1}] is exactly zero, for all ε\varepsilon, if the process starts with the invariant measure of the Markov chain (Xσnε,ξσnε)(X_{\sigma_{n}}^{\varepsilon},\xi_{\sigma_{n}}^{\varepsilon}) on γ×𝕋m\gamma\times\mathbb{T}^{m}, and (c) the measures on γ×𝕋m\gamma\times\mathbb{T}^{m} induced by (Xσnε,ξσnε)(X_{\sigma_{n}}^{\varepsilon},\xi_{\sigma_{n}}^{\varepsilon}) converge exponentially, as nn\to\infty, uniformly in ε\varepsilon and in initial distribution, to the invariant one.
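Schematically, the decomposition behind (a)-(c) is the telescoping identity

$$\bm{\mathrm{E}}_{(x,y)}\Big[f(h(X_{T}^{\varepsilon}))-f(h(x))-\int_{0}^{T}\mathcal{L}f(h(X_{t}^{\varepsilon}))\,dt\Big]=\bm{\mathrm{E}}_{(x,y)}\Big[f(h(X_{\sigma_{0}\wedge T}^{\varepsilon}))-f(h(x))-\int_{0}^{\sigma_{0}\wedge T}\mathcal{L}f(h(X_{t}^{\varepsilon}))\,dt\Big]$$
$$+\sum_{n\geq 0}\bm{\mathrm{E}}_{(x,y)}\Big[f(h(X_{\sigma_{n+1}\wedge T}^{\varepsilon}))-f(h(X_{\sigma_{n}\wedge T}^{\varepsilon}))-\int_{\sigma_{n}\wedge T}^{\sigma_{n+1}\wedge T}\mathcal{L}f(h(X_{t}^{\varepsilon}))\,dt\Big],$$

so (a) makes each generic term small, while (b) and (c) ensure that the errors coming from the (possibly many) excursions do not accumulate in the sum.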

The claim in (a) is an extension of the results obtained away from the singularities in [9], and new difficulties arise due to the degeneracies occurring near the boundaries. The claim in (b) is true if there is a common invariant measure for the processes $(X_{t}^{\varepsilon},\xi_{t}^{\varepsilon})$ for all $\varepsilon$ and the gluing conditions are chosen appropriately. In general, there is no common invariant measure for all $\varepsilon$, so we need to consider a family of auxiliary processes that do have a common invariant measure, and then use the proximity of the auxiliary and the original processes near the separatrix, together with the Girsanov theorem, to show that the gluing conditions are actually the same. The assertion in (c) is hard to verify, and its proof requires new techniques, including a local limit theorem for time-inhomogeneous functions of Markov processes and density estimates for hypoelliptic diffusions, which will be discussed in later sections.

Plan of the paper.

In Section 2, we introduce the notation, state the assumptions, formulate the main result and the lemma we use to establish the weak convergence. In Section 3, the problem is reduced to the case we discussed where there is only one saddle point. Besides, we construct an auxiliary process with ε\varepsilon-independent invariant measure and derive diffusion approximations of the processes. In Section 4, we prove the averaging principle for the process on the Reeb graph up to the time when the process reaches an interior vertex. In Section 5, we construct the Markov chain on the product space of the separatrix and the mm-torus (see (1.6)) and prove its convergence to the invariant measure with an exponential rate uniformly in ε\varepsilon. In Section 6, we prove the main result. A few technical results including time estimates near the vertices and tightness of the processes are included in the Appendix.

2 Main result

Throughout this article, $\bm{\mathrm{P}}$ and $\bm{\mathrm{E}}$ denote probability and expectation, respectively, and the subscripts pertain to initial conditions. $\mathcal{F}_{t}^{X^{\varepsilon}_{\cdot}}$ denotes the natural filtration generated by the process $X_{t}^{\varepsilon}$. For brevity, the dependence of stopping times on parameters and initial conditions is not always indicated in the notation when they are introduced (e.g., in (1.6)). $\nabla$ denotes a first-order differential operator, i.e., a derivative, gradient, Jacobian, etc., depending on the context. $\chi_{A}$ denotes the indicator function of the event $A$. If $A$ and $B$ are two non-negative functions that depend on an asymptotic parameter, we write $A\lesssim B$ if $A=O(B)$. $\bm{\mathrm{C}}_{0}(\mathbb{G})$ is the space of continuous functions on the Reeb graph $\mathbb{G}$ that tend to zero at infinity, equipped with the uniform norm. $h$ is the projection onto $\mathbb{G}$. In order to formulate the assumptions and results, we further introduce the following notation.

Notation.

  1. (i) $O_{i}$'s are the vertices on the graph; they are occasionally used to denote the corresponding critical points on the plane when there is no ambiguity. $I_{k}$'s are the edges on the graph and $U_{k}$'s are the corresponding two-dimensional domains. Formally, $O_{\infty}$ is the vertex that corresponds to infinity. A symbol $\sim$ between a vertex and an edge means that the vertex is an endpoint of the edge.

  2. (ii) Consider the following metric on $\mathbb{G}$: $r(h_{1},h_{2})$ is the length of the shortest path connecting $h_{1}$ and $h_{2}$. For example, if $h_{1}=(1,H_{1})$, $I_{1}\sim O_{1}$, $O_{1}\sim I_{2}$, $I_{2}\sim O_{2}$, $O_{2}\sim I_{3}$, and $h_{2}=(3,H_{2})$, then $r(h_{1},h_{2})=|H_{1}-H(O_{1})|+|H(O_{1})-H(O_{2})|+|H(O_{2})-H_{2}|$.

  3. (iii) $\gamma(h)=\{x:H(x)=h\}$, and $\gamma_{k}(h)$ is the connected component of $\gamma(h)$ in the domain $U_{k}$.

  4. (iv) $b_{h}(x,y)=\nabla H(x)\cdot b(x,y)$.

  5. (v) $\xi_{t}$ is the diffusion process on $\mathbb{T}^{m}$ with the generator $L$, where

    Lf(y)=v(y)yf(y)+12i,j(σσ)ij(y)2yiyjf(y).Lf(y)=v(y)\cdot\nabla_{y}f(y)+\frac{1}{2}\sum_{i,j}(\sigma\sigma^{*})_{ij}(y)\frac{\partial^{2}}{\partial y_{i}\partial y_{j}}f(y). (2.1)
  6. (vi) For $h$ in the interior of $I_{k}$, define (a small numerical sketch for $Q_{k}$ is given right after this list):

    Qk(h)\displaystyle Q_{k}(h) =γk(h)dl|H(x)|,\displaystyle=\int_{\gamma_{k}(h)}\frac{dl}{|\nabla H(x)|}, (2.2)
    Ak(h)\displaystyle A_{k}(h) =2Qk(h)γk(h)1|H(x)|0𝐄μbh(x,ξs)bh(x,ξ0)𝑑s𝑑l,\displaystyle=\frac{2}{Q_{k}(h)}\int_{\gamma_{k}(h)}\frac{1}{|\nabla H(x)|}\int_{0}^{\infty}\bm{\mathrm{E}}_{\mu}b_{h}(x,\xi_{s})b_{h}(x,\xi_{0})dsdl,
    Bk(h)\displaystyle B_{k}(h) =1Qk(h)γk(h)1|H(x)|0𝐄μxbh(x,ξs)(b(x,ξ0)H(x))𝑑s𝑑l,\displaystyle=\frac{1}{Q_{k}(h)}\int_{\gamma_{k}(h)}\frac{1}{|\nabla H(x)|}\int_{0}^{\infty}\bm{\mathrm{E}}_{\mu}\nabla_{x}b_{h}(x,\xi_{s})\cdot(b(x,\xi_{0})-\nabla^{\perp}H(x))dsdl,
    Lkf(h)\displaystyle L_{k}f(h) =12Ak(h)f′′(h)+Bk(h)f(h).\displaystyle=\frac{1}{2}A_{k}(h)f^{\prime\prime}(h)+B_{k}(h)f^{\prime}(h).

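As a concrete illustration of the quantity $Q_{k}(h)$ in (2.2) (for the same illustrative double-well Hamiltonian as in the earlier snippets, not the paper's setting), note that the averaged motion travels along $\gamma_{k}(h)$ with speed $|\nabla H|$, so $Q_{k}(h)$ is the period of one rotation of $\dot{x}=\nabla^{\perp}H(x)$; the sketch below estimates it by integrating the averaged field until its first return to a cross-section.

```python
# Hedged sketch: Q_k(h) as the rotation period of the averaged field, for an assumed
# double-well H(x) = x2^2/2 + (x1^2 - 1)^2/4 and the level curve through (1.5, 0).
import numpy as np

def grad_perp_H(x):
    return np.array([x[1], -(x[0]**3 - x[0])])

def period_Q(x_start, dt=1e-4, t_max=200.0):
    x = np.array(x_start, dtype=float)
    prev_x2, t = x[1], 0.0
    while t < t_max:
        x = x + dt*grad_perp_H(x)        # explicit Euler; crude, but enough for a sketch
        t += dt
        # first return to the cross-section {x2 = 0, x1 > 0}
        if t > 10*dt and prev_x2*x[1] <= 0.0 and x[0] > 0.0:
            return t
        prev_x2 = x[1]
    return float("nan")

print("Q(h) for h = H(1.5, 0):", period_Q((1.5, 0.0)))
```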
The following conditions are assumed to hold throughout the article.

Assumptions.

  1. (H1) $v(y)$ and $\sigma(y)$ are $C^{\infty}$ functions on $\mathbb{T}^{m}$. $\sigma(y)$ is an $m\times m$ matrix-valued function, and $\sigma(y)\sigma(y)^{\mathsf{T}}$ is positive-definite for all $y\in\mathbb{T}^{m}$.

  2. (H2) $H(x)$ is a $C^{\infty}$ function from $\mathbb{R}^{2}$ to $\mathbb{R}$ with bounded second derivatives. $H(x)$ has a finite number of non-degenerate critical points. Each level curve corresponding to a vertex on the Reeb graph contains at most one critical point. As $|x|\to+\infty$, $H(x)/|x|\to+\infty$.

  3. (H3) $b(x,y)$ is a $C^{\infty}$ function from $\mathbb{R}^{2}\times\mathbb{T}^{m}$ to $\mathbb{R}^{2}$ such that the averaged process is a Hamiltonian system with $H$, i.e., $\bar{b}(x)=\nabla^{\perp}H(x)$.

  4. (H4) The fast-oscillating perturbation is non-degenerate, i.e., $\{b(x,y)-\bar{b}(x):y\in\mathbb{T}^{m}\}$ spans $\mathbb{R}^{2}$ for each $x\in\mathbb{R}^{2}$, and the perturbation $b(x,y)-\bar{b}(x)$ is uniformly bounded together with its first derivatives.

  5. (H5) For each $x$ that belongs to one of the separatrices, there exists $y\in\mathbb{T}^{m}$ such that the process in (1.1) satisfies the parabolic Hörmander condition at $(x,y)$. Namely, with $\varepsilon^{-1}\tilde{v}(y)$ being the drift term in the equation for ${\bm{\xi}}^{\varepsilon}_{t}$ in the Stratonovich form, we have that

    Lie({(0σk(y)),1km}{[(b(x,y)v~(y)),(0σk(y))],1km})\mathrm{Lie}\left(\left\{\begin{pmatrix}0\\ \sigma_{k}(y)\end{pmatrix},1\leq k\leq m\right\}\bigcup\left\{\left[\begin{pmatrix}b(x,y)\\ \tilde{v}(y)\end{pmatrix},\begin{pmatrix}0\\ \sigma_{k}(y)\end{pmatrix}\right],1\leq k\leq m\right\}\right) (2.3)

    at (x,y)(x,y) spans 2+m\mathbb{R}^{2+m}, where σk(y)\sigma_{k}(y) is the kk-th column of σ(y)\sigma(y), [,][\cdot,\cdot] is the Lie bracket, and Lie()(\cdot) is the Lie algebra generated by a set (cf. [13] or Section 2.3.2 of [17]).

Definition 2.1.

The domain D()D(\mathcal{L}) consists of functions f𝐂0(𝔾)f\in\bm{\mathrm{C}}_{0}(\mathbb{G}) satisfying:

  1. (i) $f$ is twice continuously differentiable in the interior of each edge $I_{k}$ of $\mathbb{G}$;

  2. (ii) The limits $\lim_{h_{k}\to O_{i}}L_{k}f(h_{k})$ exist and do not depend on the edge $I_{k}$;

  3. (iii) For each interior vertex $O_{i}$, there are constants $p_{k}:=\pm\lim_{h\to O_{i}}A_{k}(h)Q_{k}(h)$ such that

    IkOipklimhkOif(hk)=0,\sum_{I_{k}\sim O_{i}}p_{k}\lim_{h_{k}\to O_{i}}f^{\prime}(h_{k})=0, (2.4)

where the sign $+$ is taken if $H(O_{i})$ is the minimum value of $H$ on $I_{k}$, and the sign $-$ is taken otherwise. The operator $\mathcal{L}$ on the Reeb graph is defined by

f(h)=Lkf(h)\mathcal{L}f(h)=L_{k}f(h) (2.5)

for fD()f\in D(\mathcal{L}) and hh in the interior of IkI_{k}, and defined as limhOif(h)\lim_{h\to O_{i}}\mathcal{L}f(h) at the vertex OiO_{i}.

By the Hille-Yosida theorem (see, for example, Theorem 4.2.2 in [7]), one can check that there exists a unique strong Markov process on 𝔾\mathbb{G} with continuous sample paths that has \mathcal{L} as its generator. Now we are ready to formulate the main result of this article.

Theorem 2.2.

Let the process (𝐗tε,𝛏tε)({\bm{X}}_{t}^{\varepsilon},{\bm{\xi}}_{t}^{\varepsilon}) be defined as in (1.1) and the conditions (H1)-(H5) hold. Then h(𝐗t/εε)h({\bm{X}}_{t/\varepsilon}^{\varepsilon}) converges weakly to the strong Markov process on the Reeb graph 𝔾\mathbb{G} that has the generator (,D())(\mathcal{L},D(\mathcal{L})) and the initial distribution h(x0)h(x_{0}).

Remark 2.3.

The last condition in (H2) can be relaxed without much extra effort since the limiting process defined by \mathcal{L} cannot reach infinity in finite time. In addition, as seen from the proofs in Section 5 and Remark 5.7, assumption (H5) can be relaxed so that it holds for at least one point on each separatrix. Moreover, if the number of Lie brackets needed to generate 2+m\mathbb{R}^{2+m} in the parabolic Hörmander condition is assumed to be given, then we can relax the assumptions on smoothness of the coefficients.

To prove the theorem, we need a result on weak convergence of processes, that is Lemma 4.1 in [8] adapted to our case (see also the original statement in [12]):

Lemma 2.4.

Let Ψ\Psi be a dense linear subspace of 𝐂0(𝔾)\bm{\mathrm{C}}_{0}(\mathbb{G}) and 𝒟\mathcal{D}_{\mathcal{L}} be a linear subspace of D()D(\mathcal{L}), and suppose that Ψ\Psi and 𝒟\mathcal{D}_{\mathcal{L}} have the following properties:

  1. (1) There is a $\lambda>0$ such that for each $F\in\Psi$ the equation $\lambda f-\mathcal{L}f=F$ has a solution $f\in\mathcal{D}_{\mathcal{L}}$;

  2. (2) For each $T>0$, each $f\in\mathcal{D}_{\mathcal{L}}$, and each compact $K\subset\mathbb{G}$,

    𝐄(x,y)[f(h(XTε))f(h(x))0Tf(h(Xtε))𝑑t]0,\bm{\mathrm{E}}_{(x,y)}[f(h(X_{T}^{\varepsilon}))-f(h(x))-\int_{0}^{T}\mathcal{L}f(h(X_{t}^{\varepsilon}))dt]\to 0, (2.6)

uniformly in xh1(K)x\in h^{-1}(K) and y𝕋my\in\mathbb{T}^{m}.

Suppose that, for each starting point (x,y)(x,y) of (Xtε,ξtε)(X_{t}^{\varepsilon},\xi_{t}^{\varepsilon}), the family of measures on 𝐂([0,),𝔾)\bm{\mathrm{C}}([0,\infty),\mathbb{G}) induced by the processes h(Xtε)h(X_{t}^{\varepsilon}), ε>0\varepsilon>0, is tight. Then, for each starting point (x,y)(x,y) of (Xtε,ξtε)(X_{t}^{\varepsilon},\xi_{t}^{\varepsilon}), h(Xtε)h(X_{t}^{\varepsilon}) converges weakly to the strong Markov process on the Reeb graph 𝔾\mathbb{G} that has the generator (,D())(\mathcal{L},D(\mathcal{L})) and the initial distribution h(x)h(x).

Here we choose $\Psi$ to be the set of all functions in $\bm{\mathrm{C}}_{0}(\mathbb{G})$ that are twice continuously differentiable in the interior of each edge, and $\mathcal{D}_{\mathcal{L}}$ to be the set of all functions in $D(\mathcal{L})$ that are four times continuously differentiable in the interior of each edge. It is easy to check that condition (1) of Lemma 2.4 holds, and the tightness of the distributions of $h(X_{t}^{\varepsilon})$, $\varepsilon>0$, is verified in Appendix C. Thus the main ingredient of the proof is to verify (2.6) in condition (2) of Lemma 2.4.

3 Preliminaries

In this section, we explain some technical difficulties and our approach to the proof.

3.1 Localization

Considering Theorem 2.2 for ${\bm{X}}_{t}^{\varepsilon}$ in the state space $\mathbb{R}^{2}$ causes technical difficulties due to the presence of multiple separatrices of the Hamiltonian and to the fact that the process ${\bm{X}}_{t}^{\varepsilon}$ is not positive recurrent. However, such difficulties can be circumvented by considering the process ${\bm{X}}_{t}^{\varepsilon}$ locally. Namely, let us cover the plane $\mathbb{R}^{2}$ by finitely many bounded domains, each containing one of the separatrices and bounded by up to three connected components of level sets of $H$, and one unbounded domain not containing any critical points. For example, as shown in Figure 3, we have different parts of the Reeb graph $\mathbb{G}$ that correspond to the domains in $\mathbb{R}^{2}$. Every point of $\mathbb{R}^{2}$ can be assumed to be contained in the interior of either one or two of these domains. Since it takes positive time to travel from the boundary of one domain to the boundary of another, it suffices to prove the result up to the time of exit from one domain. To be more precise, let $\{V_{k}:1\leq k\leq K\}$ be the open cover. Define $\eta_{0}=\inf\{t\geq 0:X_{t}^{\varepsilon}\in\bigcup_{1\leq k\leq K}\partial V_{k}\}$ and, for $k$ such that $X_{\eta_{n-1}}^{\varepsilon}\in V_{k}$, define $\eta_{n}=\inf\{t>\eta_{n-1}:X_{t}^{\varepsilon}\not\in V_{k}\}$, $n\geq 1$. In order to prove (2.6), it suffices to prove instead, uniformly in $x$ in any compact set in $\mathbb{R}^{2}$ and in $y\in\mathbb{T}^{m}$, that

𝐄(x,y)[f(h(XTη1ε))f(h(x))0Tη1f(h(Xtε))𝑑t]0,asε0,\bm{\mathrm{E}}_{(x,y)}[f(h(X_{T\wedge\eta_{1}}^{\varepsilon}))-f(h(x))-\int_{0}^{T\wedge{\eta_{1}}}\mathcal{L}f(h(X_{t}^{\varepsilon}))dt]\to 0,\leavevmode\nobreak\ \leavevmode\nobreak\ {\rm as}\leavevmode\nobreak\ \varepsilon\downarrow 0, (3.1)

since it also implies that $\bm{\mathrm{P}}(\eta_{n}<T)\to 0$ as $n\to\infty$, uniformly in all $\varepsilon$ sufficiently small. In the unbounded domain without critical points, (3.1) can be obtained using the result in a bounded domain together with the tightness of $h(X_{t}^{\varepsilon})$. It remains to consider the bounded domains. Let $V$ be one of the bounded domains. As explained below, the process $X_{t}^{\varepsilon}$ in $V$ can be extended beyond the time when it reaches the boundary by embedding $V$ into a compact manifold $M$ equipped with an area form and a Hamiltonian such that there are no other separatrices.

Figure 3: Domains projected on the graph.

Consider the case where VV is not simply connected - for example, VV is the domain that contains O3O_{3} in Figure 3. There are three connected components of 2V\mathbb{R}^{2}\setminus V as shown in Figure 4(a) (other situations can be treated similarly). Then we can modify the Hamiltonian and the vector field in C1C_{1} and C2C_{2} in such a way that assumptions (H2)-(H4) hold locally, and there is only one extremum point of HH in C1C_{1} and one in C2C_{2} (this modification is not needed if VV is simply connected). The unbounded domain C3C_{3} outside VV can be replaced by a compact surface SS so that the resulting state space of XtεX_{t}^{\varepsilon} is, topologically, a sphere M=VC1C2SM=V\bigcup C_{1}\bigcup C_{2}\bigcup S. Then, the vector field on the surface can be chosen as a smooth extension from VV so that the averaged process is a Hamiltonian system on MM with respect to an area form ω\omega, which is simply dx1dx2dx_{1}\wedge dx_{2} on VV, C1C_{1}, and C2C_{2}. Moreover, there exists a chart (S,Φ)(S,\Phi) such that the corresponding vector field b(x,y)b(x,y) on D:=Φ(S)D:=\Phi(S) satisfies that {b(x,y)b¯(x):y𝕋m}\{b(x,y)-\bar{b}(x):y\in\mathbb{T}^{m}\} spans 2\mathbb{R}^{2} for each xDx\in D and the averaged process is a Hamiltonian system with respect to dx1dx2dx_{1}\wedge dx_{2} on DD. For example, as shown in Figure 4(b), we can modify the vector field on the plane outside VV so that there are two disks with the same center D1D2D_{1}\subset D_{2}, and the averaged process is a Hamiltonian system in D2D_{2}, in particular, rotation between D1\partial D_{1} and D2\partial D_{2}. Then D2D_{2} is smoothly glued to a hemisphere and the resulting manifold is MM, and the vector field can be extended to the surface in such a way that the averaged process is a rotation with certain constant angular velocity on the level sets.

(a) Three connected components of 2V\mathbb{R}^{2}\setminus V.
(b) Hamiltonian system on manifold MM.
Figure 4: Localization.

It is clear that, when restricted to $V\times\mathbb{T}^{m}$, the resulting system defined on $M\times\mathbb{T}^{m}$ has exactly the same behavior as the original process on $\mathbb{R}^{2}\times\mathbb{T}^{m}$. Therefore, it suffices to prove (3.1) for the new process on $M\times\mathbb{T}^{m}$. Let us formally restate the corresponding assumptions and formulate the result on $M\times\mathbb{T}^{m}$. In the remainder of the paper, all the definitions (e.g., the quantities defined in Section 2) and statements on $M$ are understood by locally choosing coordinates so that $\omega=dx_{1}\wedge dx_{2}$. In particular, on $S$, they are understood in the chart $\Phi$, while on the "flat" part that contains $V$, they are understood in the usual way. Then the assumptions on the coefficients on $M\times\mathbb{T}^{m}$ are analogous to those introduced earlier, so we only mention the differences:

  1. (H2′) $H(x)$ is a $C^{\infty}$ function from $M$ to $\mathbb{R}$ that has three extremum points and one saddle point.

  2. (H3′) $b(x,y)$ is a $C^{\infty}$ function from $M\times\mathbb{T}^{m}$ to $TM$ such that $\bar{b}(x)=\nabla^{\perp}H(x)$.

  3. (H4′) $\{b(x,y)-\bar{b}(x):y\in\mathbb{T}^{m}\}$ spans $T_{x}M$ for all $x\in M$.

From this point on, we denote the process on $M\times\mathbb{T}^{m}$ by $({\bm{X}}_{t}^{\varepsilon},{\bm{\xi}}_{t}^{\varepsilon})$, and by $(X_{t}^{\varepsilon},\xi_{t}^{\varepsilon})$ on the time scale $O(\varepsilon^{-1})$ (defined by (1.1) and (1.4) with $\mathbb{R}^{2}$ replaced by $M$), and assume that the conditions (H2′)-(H4′) replacing (H2)-(H4) hold. Then (2.6) follows from the next result (see (3.1)).

Proposition 3.1.

For each f𝒟f\in\mathcal{D}_{\mathcal{L}} and each T>0T>0,

𝐄(x,y)[f(h(Xηε))f(h(x))0ηf(h(Xtε))𝑑t]0,\bm{\mathrm{E}}_{(x,y)}[f(h(X_{\eta}^{\varepsilon}))-f(h(x))-\int_{0}^{\eta}\mathcal{L}f(h(X_{t}^{\varepsilon}))dt]\to 0, (3.2)

as ε0\varepsilon\to 0, uniformly in xMx\in M, y𝕋my\in\mathbb{T}^{m}, and ηT\eta\leq T that is a stopping time w.r.t. tXε\mathcal{F}_{t}^{X^{\varepsilon}_{\cdot}}.

3.2 Auxiliary process

It turns out that similar results hold for a more general process with a slightly perturbed fast component:

d𝑿~tε=\displaystyle d{\tilde{\bm{X}}}_{t}^{\varepsilon}= b(𝑿~tε,𝝃~tε)dt,\displaystyle\ b({\tilde{\bm{X}}}_{t}^{\varepsilon},{\tilde{\bm{\xi}}}^{\varepsilon}_{t})dt, (3.3)
d𝝃~tε=\displaystyle d{\tilde{\bm{\xi}}}^{\varepsilon}_{t}= 1εv(𝝃~tε)dt+1εσ(𝝃~tε)dWt+c(𝑿~tε,𝝃~tε)dt,\displaystyle\ \frac{1}{\varepsilon}v({\tilde{\bm{\xi}}}^{\varepsilon}_{t})dt+\frac{1}{\sqrt{\varepsilon}}\sigma({\tilde{\bm{\xi}}}^{\varepsilon}_{t})dW_{t}+c({\tilde{\bm{X}}}_{t}^{\varepsilon},{\tilde{\bm{\xi}}}^{\varepsilon}_{t})dt,

where c(x,y)c(x,y) is infinitely differentiable. Namely, h(𝑿~t/εε)h({\tilde{\bm{X}}}_{t/\varepsilon}^{\varepsilon}) converges weakly to the Markov process defined by the operator (c,D(c))(\mathcal{L}_{c},D(\mathcal{L}_{c})) on the Reeb graph. Here, the subscript cc indicates that c\mathcal{L}_{c} depends on the choice of c(x,y)c(x,y). If c(x,y)=0c(x,y)=0, then it is clear that (𝑿~tε,𝝃~tε)=(𝑿tε,𝝃tε)({\tilde{\bm{X}}}_{t}^{\varepsilon},{\tilde{\bm{\xi}}}_{t}^{\varepsilon})=({\bm{X}}_{t}^{\varepsilon},{\bm{\xi}}_{t}^{\varepsilon}), and thus =c\mathcal{L}=\mathcal{L}_{c}. However, if c(x,y)0c(x,y)\not=0, then we have an additional drift term in (3.3), and thus we need an additional drift term in the generator of the limiting process. While a precise definition of (c,D(c))(\mathcal{L}_{c},D(\mathcal{L}_{c})) is deferred to later sections, we observe that the operators replacing LkL_{k} depend on c(x,y)c(x,y), and the domain D(c)D(\mathcal{L}_{c}) as well as the linear subspace, denoted by 𝒟c\mathcal{D}_{\mathcal{L}_{c}}, chosen in Lemma 2.4 also vary for different c(x,y)c(x,y). Therefore, in order to formulate general results, we consider 𝒟\mathcal{D}, the set of continuous functions on 𝔾\mathbb{G} that are four-times continuously differentiable inside each edge and satisfy conditions (i) and (iii) in Definition 2.1, as well as a weaker form of condition (ii), namely, the limits limhkOiLkf(hk)\lim_{h_{k}\to O_{i}}L_{k}f(h_{k}) exist but are not necessarily independent of the edge IkI_{k}. Note that 𝒟\mathcal{D} contains 𝒟c\mathcal{D}_{\mathcal{L}_{c}} for all choices of c(x,y)c(x,y). Define c\mathcal{L}_{c} on 𝒟\mathcal{D} by applying the differential operator (LkL_{k} plus an additional drift corresponding to c(x,y)c(x,y)) on each edge separately, with the result not being necessarily continuous at the interior vertices.

As mentioned before, we need to construct a family of auxiliary processes that, on the one hand, have a common invariant measure for all ε>0\varepsilon>0 and, on the other hand, are close to the processes of interest. The auxiliary process on MM can in fact be obtained by choosing a special c(x,y)c(x,y) in (3.3). We denote this particular choice of c(x,y)c(x,y) as c~(x,y)\tilde{c}(x,y). Now we find c~(x,y)\tilde{c}(x,y) such that λ×μ\lambda\times\mu is the invariant measure for the process with every ε\varepsilon, where λ\lambda is the area measure w.r.t. ω\omega and μ\mu is the invariant measure for 𝝃tε{\bm{\xi}}_{t}^{\varepsilon} in 𝕋m\mathbb{T}^{m}. Let L~ε\tilde{L}^{\varepsilon} be the generator of the process (𝑿~tε,𝝃~tε)({\tilde{\bm{X}}}_{t}^{\varepsilon},{\tilde{\bm{\xi}}}_{t}^{\varepsilon}):

L~εf(x,y)=b(x,y)xf(x,y)+c~(x,y)yf(x,y)+1εLf(x,y).\tilde{L}^{\varepsilon}f(x,y)=b(x,y)\cdot\nabla_{x}f(x,y)+\tilde{c}(x,y)\cdot\nabla_{y}f(x,y)+\frac{1}{\varepsilon}Lf(x,y).

Hence, λ×μ\lambda\times\mu is the invariant measure if L~εp(y)=0{\tilde{L}^{\varepsilon*}}p(y)=0, where L~ε{\tilde{L}^{\varepsilon*}} is the adjoint operator of L~ε\tilde{L}^{\varepsilon} and pp is the density of μ\mu, i.e.

0=L~εp(y)=\displaystyle 0={\tilde{L}^{\varepsilon*}}p(y)= divx(b(x,y)p(y))divy(c~(x,y)p(y))+1εLp(y),\displaystyle-\mathrm{div}_{x}(b(x,y)p(y))-\mathrm{div}_{y}(\tilde{c}(x,y)p(y))+\frac{1}{\varepsilon}L^{*}p(y), (3.4)

where LL^{*} is the adjoint operator of LL. Since μ\mu is the invariant measure for 𝝃tε{\bm{\xi}}^{\varepsilon}_{t}, the last term vanishes. Hence (3.4) reduces to

divxb(x,y)p(y)+divy(c~(x,y)p(y))=0.\mathrm{div}_{x}b(x,y)p(y)+\mathrm{div}_{y}(\tilde{c}(x,y)p(y))=0. (3.5)

To see the existence of the solution, we need the following lemma (cf. Lemma 2.1 in [9]).

Lemma 3.2.

Let g~(x,y)\tilde{g}(x,y) be a bounded function on 2×𝕋m\mathbb{R}^{2}\times\mathbb{T}^{m} that is infinitely differentiable, and let L~\tilde{L} be the generator of a non-degenerate diffusion on 𝕋m\mathbb{T}^{m} with the unique invariant measure μ~\tilde{\mu} and suppose that 𝕋mg~(x,y)𝑑μ~(y)=0\int_{\mathbb{T}^{m}}\tilde{g}(x,y)d\tilde{\mu}(y)=0 for each x2x\in\mathbb{R}^{2}. Then there exists a unique solution u~(x,y)\tilde{u}(x,y) to the equation

L~u~(x,y)=g~(x,y),𝕋mu~(x,y)𝑑μ~(y)=0,\tilde{L}\tilde{u}(x,y)=-\tilde{g}(x,y),\leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \int_{\mathbb{T}^{m}}\tilde{u}(x,y)d\tilde{\mu}(y)=0, (3.6)

and u~(x,y)\tilde{u}(x,y) is also bounded and infinitely differentiable. Moreover, if g~(x,y)\tilde{g}(x,y) has uniformly bounded derivatives up to order KK in xx (or y)y), the same holds for u~(x,y)\tilde{u}(x,y).

Remark 3.3.

The same result also holds for functions on M×𝕋mM\times\mathbb{T}^{m}. Thus, the existence of the solution to (3.5) immediately follows from Lemma 3.2 applied to g~(x,y)=divxb(x,y)p(y)\tilde{g}(x,y)=\mathrm{div}_{x}b(x,y)p(y) and L~=Δy\tilde{L}=\Delta_{y}, and by taking the gradient of the solution in (3.6) w.r.t. yy, and dividing it by p(y)p(y).
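To make Remark 3.3 concrete, here is a minimal sketch for $m=1$ with an illustrative zero-mean right-hand side and with the uniform density standing in for $p$ (none of these choices come from the paper): solve $\Delta_{y}\tilde{u}=-\tilde{g}$ spectrally and set $\tilde{c}=\partial_{y}\tilde{u}/p$, so that $\mathrm{div}_{y}(\tilde{c}\,p)=-\tilde{g}$, as required by (3.5).

```python
# Hedged sketch of the construction in Remark 3.3 on T^1 (illustrative g and p).
import numpy as np

n = 256
y = np.arange(n)/n                                  # grid on the torus T^1 = [0, 1)
p = np.ones(n)                                      # density of mu (uniform, for illustration)
g = np.cos(2*np.pi*y)*p                             # stands in for div_x b(x, y) p(y); zero mean

k = np.fft.fftfreq(n, d=1.0/n)                      # integer frequencies
g_hat = np.fft.fft(g)
u_hat = np.zeros_like(g_hat)
u_hat[1:] = g_hat[1:]/(2*np.pi*k[1:])**2            # Laplacian u = -g, zero-mean solution
u = np.fft.ifft(u_hat).real

du = np.fft.ifft(2j*np.pi*k*np.fft.fft(u)).real     # d/dy u by spectral differentiation
c_tilde = du/p

# check (3.5): d/dy(c_tilde * p) + g should vanish up to round-off
residual = np.fft.ifft(2j*np.pi*k*np.fft.fft(c_tilde*p)).real + g
print("max residual:", np.max(np.abs(residual)))
```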

As in (1.4), we define (X~tε,ξ~tε)=(𝑿~t/εε,𝝃~t/εε)({\tilde{X}}_{t}^{\varepsilon},{\tilde{\xi}}_{t}^{\varepsilon})=({\tilde{\bm{X}}}_{t/\varepsilon}^{\varepsilon},{\tilde{\bm{\xi}}}_{t/\varepsilon}^{\varepsilon}) in distribution. Then, a simple corollary can be obtained by using Lemma 3.2 and then applying Ito’s formula to the corresponding solution u~(𝑿~tε,𝝃~tε)\tilde{u}({\tilde{\bm{X}}}_{t}^{\varepsilon},{\tilde{\bm{\xi}}}_{t}^{\varepsilon}) and u~(X~tε,ξ~tε)\tilde{u}({\tilde{X}}_{t}^{\varepsilon},{\tilde{\xi}}_{t}^{\varepsilon}) (cf. Lemma 2.3 in [9]).

Corollary 3.4.

Let $\tilde{g}$ satisfy all the conditions in Lemma 3.2 with $\tilde{L}=L$ and $K=1$. Then, for fixed $T>0$,

𝐄(x,y)|0ηg~(𝑿~sε,𝝃~sε)𝑑s|=O(ε),𝐄(x,y)|0ηg~(X~sε,ξ~sε)𝑑s|=O(ε),\bm{\mathrm{E}}_{(x,y)}\left|\int_{0}^{\eta}\tilde{g}({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})ds\right|=O(\sqrt{\varepsilon}),\leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \bm{\mathrm{E}}_{(x,y)}\left|\int_{0}^{\eta}\tilde{g}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})ds\right|=O(\varepsilon), (3.7)

uniformly in xMx\in M, y𝕋my\in\mathbb{T}^{m}, and η\eta that is a stopping time bounded by TT.

3.3 Diffusion approximation

Since 𝕋m(b(x,y)H(x))𝑑μ(y)=0\int_{\mathbb{T}^{m}}(b(x,y)-\nabla^{\perp}H(x))d\mu(y)=0, by Lemma 3.2, there exists a function uu that is bounded together with its derivatives such that

Lu(x,y)=(b(x,y)H(x)).Lu(x,y)=-(b(x,y)-\nabla^{\perp}H(x)). (3.8)

The equation is understood element-wise. Apply Ito’s formula to u(𝑿~tε,𝝃~tε)u({\tilde{\bm{X}}}_{t}^{\varepsilon},{\tilde{\bm{\xi}}}_{t}^{\varepsilon}):

u({\tilde{\bm{X}}}_{t}^{\varepsilon},{\tilde{\bm{\xi}}}_{t}^{\varepsilon})=u(x_{0},y_{0})+\frac{1}{\varepsilon}\int_{0}^{t}Lu({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})\,ds+\frac{1}{\sqrt{\varepsilon}}\int_{0}^{t}\nabla_{y}u({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})\sigma({\tilde{\bm{\xi}}}_{s}^{\varepsilon})dW_{s} (3.9)
+\int_{0}^{t}[\nabla_{x}u({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})b({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})+\nabla_{y}u({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})c({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})]\,ds.

Combining (3.3), (3.8), and (3.9), we obtain

𝑿~tε\displaystyle{\tilde{\bm{X}}}_{t}^{\varepsilon} =x0+0tH(𝑿~sε)𝑑s+ε0t[xu(𝑿~sε,𝝃~sε)b(𝑿~sε,𝝃~sε)+yu(𝑿~sε,𝝃~sε)c(𝑿~sε,𝝃~sε)]𝑑s\displaystyle=x_{0}+\int_{0}^{t}\nabla^{\perp}H({\tilde{\bm{X}}}_{s}^{\varepsilon})ds+\varepsilon\int_{0}^{t}[\nabla_{x}u({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})b({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})+\nabla_{y}u({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})c({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})]ds (3.10)
+ε0tyu(𝑿~sε,𝝃~sε)σ(𝝃~sε)𝑑Ws+ε(u(x0,y0)u(𝑿~tε,𝝃~tε)).\displaystyle\quad+\sqrt{\varepsilon}\int_{0}^{t}\nabla_{y}u({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})\sigma({\tilde{\bm{\xi}}}_{s}^{\varepsilon})dW_{s}+\varepsilon(u(x_{0},y_{0})-u({\tilde{\bm{X}}}_{t}^{\varepsilon},{\tilde{\bm{\xi}}}_{t}^{\varepsilon})).

Similarly, by applying Ito’s formula to u(X~tε,ξ~tε)u({\tilde{X}}_{t}^{\varepsilon},{\tilde{\xi}}_{t}^{\varepsilon}) and repeating the steps above, we have

X~tε\displaystyle{\tilde{X}}_{t}^{\varepsilon} =x0+1ε0tH(X~sε)𝑑s+0t[xu(X~sε,ξ~sε)b(X~sε,ξ~sε)+yu(X~sε,ξ~sε)c(X~sε,ξ~sε)]𝑑s\displaystyle=x_{0}+\frac{1}{\varepsilon}\int_{0}^{t}\nabla^{\perp}H({\tilde{X}}_{s}^{\varepsilon})ds+\int_{0}^{t}[\nabla_{x}u({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})b({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})+\nabla_{y}u({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})c({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})]ds (3.11)
+0tyu(X~sε,ξ~sε)σ(ξ~sε)𝑑Ws+ε(u(x0,y0)u(X~tε,ξ~tε)).\displaystyle\quad+\int_{0}^{t}\nabla_{y}u({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\sigma({\tilde{\xi}}_{s}^{\varepsilon})dW_{s}+\varepsilon(u(x_{0},y_{0})-u({\tilde{X}}_{t}^{\varepsilon},{\tilde{\xi}}_{t}^{\varepsilon})).

This idea of diffusion approximation is frequently used in the remainder of the paper, and the function u(x,y)u(x,y) always refers to the solution to (3.8).

4 Averaging principle inside one domain

In this section, we consider a general process $({\tilde{X}}_{t}^{\varepsilon},{\tilde{\xi}}_{t}^{\varepsilon})$ defined after Remark 3.3, which is a faster version of the process in (3.3). The process takes values in $M\times\mathbb{T}^{m}$. As a result of the localization, $M$ is separated into three domains, each bounded by the separatrix or a part of it. This section is devoted to the proof of the averaging principle for $({\tilde{X}}_{t}^{\varepsilon},{\tilde{\xi}}_{t}^{\varepsilon})$ on $M\times\mathbb{T}^{m}$ up to the time when ${\tilde{X}}_{t}^{\varepsilon}$ exits from one of the three domains. The domain under consideration will be denoted by $U$. Therefore, without any ambiguity, the projection $h$ simply reduces to the Hamiltonian $H$. Let $U(h_{1},h_{2})$ be the region in $U$ between $\gamma(h_{1})$ and $\gamma(h_{2})$, $O$ be the saddle point, and $O^{\prime}$ be the extremum point, and further define the stopping times $\tau(h)=\inf\{t:|H({\tilde{X}}_{t}^{\varepsilon})-H(O)|=h\}$ and $\eta(h)=\inf\{t:|H({\tilde{X}}_{t}^{\varepsilon})-H(O^{\prime})|=h\}$. Without loss of generality, we assume that $H(O)=0$ and $H(O^{\prime})=1$.

4.1 Averaging principle before τ(εα)η(δ)\tau(\varepsilon^{\alpha})\wedge\eta(\delta)

We aim to prove the averaging principle between γ(εα)\gamma(\varepsilon^{\alpha}) and γ(1δ)\gamma(1-\delta) with constants 0<α<1/40<\alpha<1/4 and 0<δ<10<\delta<1. Notice that, for technical reasons, we assume that 0<α<1/40<\alpha<1/4 in this intermediate result and in the proofs that utilize it in this subsection and the next, while we always assume that 0<α<1/20<\alpha<1/2 elsewhere. Let us further define another coordinate ϕ\phi inside this domain UU. Let ll denote the curve that is tangent to H\nabla H at each point and connects the saddle point OO and the extremum point OO^{\prime}, and let l(h)l(h) be the intersection of ll and γ(h)\gamma(h). Let Q(h)Q(h) denote the time it takes for the averaged process 𝒙t\bm{x}_{t} to make one rotation on γ(h)\gamma(h) and q(x)q(x) denote the time it takes for 𝒙t\bm{x}_{t} starting from l(H(x))l(H(x)) to arrive at xx. Now we define the coordinate ϕ(x)=q(x)/Q(H(x))\phi(x)=q(x)/Q(H(x)) whose range is S1:=(mod 1)S^{1}:=\mathbb{R}\ (\mathrm{mod}\ 1). It is easy to see that 𝒙t\bm{x}_{t} has constant speed 1/Q(H(𝒙t))1/Q(H(\bm{x}_{t})) in ϕ\phi coordinate. Since there is logarithmic delay near the saddle point, the coordinate ϕ\phi has exploding derivatives near the separatrix. However, as shown in Appendix A, the order of its derivatives w.r.t. the Euclidean coordinates is under control. Let us denote H~tε=H(X~tε)\tilde{H}_{t}^{\varepsilon}=H({\tilde{X}}_{t}^{\varepsilon}) and Φ~tε=ϕ(X~tε)\tilde{\Phi}_{t}^{\varepsilon}=\phi({\tilde{X}}_{t}^{\varepsilon}). Along the same lines leading to (3.11), we have the following equations with uh=uHu_{h}=u\cdot\nabla H, uϕ=uϕu_{\phi}=u\cdot\nabla\phi, h0=H(x0)h_{0}=H(x_{0}), and ϕ0=ϕ(x0)\phi_{0}=\phi(x_{0}):

H~tε\displaystyle\tilde{H}_{t}^{\varepsilon} =h0+0tyuh(X~sε,ξ~sε)𝖳σ(ξ~sε)𝑑Ws+ε(uh(x0,y0)uh(X~tε,ξ~tε))\displaystyle=h_{0}+\int_{0}^{t}\nabla_{y}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\xi}}_{s}^{\varepsilon})dW_{s}+\varepsilon(u_{h}(x_{0},y_{0})-u_{h}({\tilde{X}}_{t}^{\varepsilon},{\tilde{\xi}}_{t}^{\varepsilon}))
+0t[xuh(X~sε,ξ~sε)b(X~sε,ξ~sε)+yuh(X~sε,ξ~sε)c(X~sε,ξ~sε)]𝑑s,\displaystyle\quad+\int_{0}^{t}[\nabla_{x}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\cdot b({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})+\nabla_{y}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\cdot c({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})]ds, (4.1)
Φ~tε\displaystyle\tilde{\Phi}_{t}^{\varepsilon} =ϕ0+0tyuϕ(X~sε,ξ~sε)𝖳σ(ξ~sε)𝑑Ws+1ε0t1Q(H~sε)𝑑s\displaystyle=\phi_{0}+\int_{0}^{t}\nabla_{y}u_{\phi}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\xi}}_{s}^{\varepsilon})dW_{s}+\frac{1}{\varepsilon}\int_{0}^{t}\frac{1}{Q(\tilde{H}_{s}^{\varepsilon})}ds
+0t[xuϕ(X~sε,ξ~sε)b(X~sε,ξ~sε)+yuϕ(X~sε,ξ~sε)c(X~sε,ξ~sε)]𝑑s\displaystyle\quad+\int_{0}^{t}[\nabla_{x}u_{\phi}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\cdot b({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})+\nabla_{y}u_{\phi}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\cdot c({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})]ds
+ε(uϕ(x0,y0)uϕ(X~tε,ξ~tε)),\displaystyle\quad+\varepsilon(u_{\phi}(x_{0},y_{0})-u_{\phi}({\tilde{X}}_{t}^{\varepsilon},{\tilde{\xi}}_{t}^{\varepsilon})), (4.2)

for εαh01δ\varepsilon^{\alpha}\leq h_{0}\leq 1-\delta and tτ(εα)η(δ)t\leq\tau(\varepsilon^{\alpha})\wedge\eta(\delta). The term multiplied by 1/ε1/\varepsilon in (4.1) disappears since HH=0\nabla H\cdot\nabla^{\perp}H=0. Define the following coefficients using the original coordinates for all xMx\in M:

A(x)\displaystyle A(x) =𝕋m|yuh(x,y)𝖳σ(y)|2𝑑μ(y),\displaystyle=\int_{\mathbb{T}^{m}}|\nabla_{y}u_{h}(x,y)^{\mathsf{T}}\sigma(y)|^{2}d\mu(y), (4.3)
Bc(x)\displaystyle{B_{c}}(x) =𝕋m[xuh(x,y)b(x,y)+yuh(x,y)c(x,y)]𝑑μ(y);\displaystyle=\int_{\mathbb{T}^{m}}[\nabla_{x}u_{h}(x,y)\cdot b(x,y)+\nabla_{y}u_{h}(x,y)\cdot c(x,y)]d\mu(y);

and (h,ϕ)(h,\phi) coordinates for x=(h,ϕ)x=(h,\phi), where εαh1δ\varepsilon^{\alpha}\leq h\leq 1-\delta and ϕS1\phi\in S^{1}:

A(h,ϕ)\displaystyle A(h,\phi) =A(x),A¯(h)=S1A(h,ϕ)𝑑ϕ,\displaystyle=A(x),\leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \bar{A}(h)=\int_{S^{1}}A(h,\phi)d\phi, (4.4)
Bc(h,ϕ)\displaystyle{B_{c}}(h,\phi) =Bc(x),Bc¯(h)=S1Bc(h,ϕ)𝑑ϕ.\displaystyle={B_{c}}(x),\leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \bar{B_{c}}(h)=\int_{S^{1}}{B_{c}}(h,\phi)d\phi.

Define c{\mathcal{L}_{c}} by cf=12A¯f′′+Bc¯f{\mathcal{L}_{c}}f=\frac{1}{2}\bar{A}f^{\prime\prime}+\bar{B_{c}}f^{\prime} for f𝒟f\in\mathcal{D} in the interior of each edge. In particular, when c(x,y)=0c(x,y)=0, this definition is consistent with that in (2.5). Introduce two processes close to H~tε,Φ~tε\tilde{H}_{t}^{\varepsilon},\tilde{\Phi}_{t}^{\varepsilon}:

H^tε\displaystyle\hat{H}_{t}^{\varepsilon} =h0+0tyuh(X~sε,ξ~sε)𝖳σ(ξ~sε)𝑑Ws\displaystyle=h_{0}+\int_{0}^{t}\nabla_{y}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\xi}}_{s}^{\varepsilon})dW_{s}
+0t[xuh(X~sε,ξ~sε)b(X~sε,ξ~sε)+yuh(X~sε,ξ~sε)c(X~sε,ξ~sε)]𝑑s,\displaystyle\quad+\int_{0}^{t}[\nabla_{x}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\cdot b({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})+\nabla_{y}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\cdot c({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})]ds, (4.5)
Φ^tε\displaystyle\hat{\Phi}_{t}^{\varepsilon} =ϕ0+0tyuϕ(X~sε,ξ~sε)𝖳σ(ξ~sε)𝑑Ws+1ε0t1Q(H~sε)𝑑s\displaystyle=\phi_{0}+\int_{0}^{t}\nabla_{y}u_{\phi}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\xi}}_{s}^{\varepsilon})dW_{s}+\frac{1}{\varepsilon}\int_{0}^{t}\frac{1}{Q(\tilde{H}_{s}^{\varepsilon})}ds
+0t[xuϕ(X~sε,ξ~sε)b(X~sε,ξ~sε)+yuϕ(X~sε,ξ~sε)c(X~sε,ξ~sε)]𝑑s.\displaystyle\quad+\int_{0}^{t}[\nabla_{x}u_{\phi}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\cdot b({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})+\nabla_{y}u_{\phi}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\cdot c({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})]ds. (4.6)

For each f𝒟f\in\mathcal{D}, xU(εα,1δ)x\in U(\varepsilon^{\alpha},1-\delta), y𝕋my\in\mathbb{T}^{m}, and stopping time σTη(δ)τ(εα)\sigma^{\prime}\leq T\wedge\eta(\delta)\wedge\tau(\varepsilon^{\alpha}), by Ito’s formula applied to f(H^σε)f(\hat{H}_{\sigma^{\prime}}^{\varepsilon}), we have

𝐄(x,y)f(H^σε)\displaystyle\bm{\mathrm{E}}_{(x,y)}f(\hat{H}_{\sigma^{\prime}}^{\varepsilon}) =f(H(x))+𝐄(x,y)0σ(12|yuh(X~sε,ξ~sε)𝖳σ(ξ~sε)|2f′′(H^sε)\displaystyle=f(H(x))+\bm{\mathrm{E}}_{(x,y)}\int_{0}^{\sigma^{\prime}}\left(\frac{1}{2}|\nabla_{y}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\xi}}_{s}^{\varepsilon})|^{2}f^{\prime\prime}(\hat{H}_{s}^{\varepsilon})\right. (4.7)
+[xuh(X~sε,ξ~sε)b(X~sε,ξ~sε)+yuh(X~sε,ξ~sε)c(X~sε,ξ~sε)]f(H^sε))ds.\displaystyle\quad\left.+\left[\nabla_{x}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\cdot b({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})+\nabla_{y}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\cdot c({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\right]f^{\prime}(\hat{H}_{s}^{\varepsilon})\right)ds.

Since sup0tσ|H~tεH^tε|=O(ε)\sup_{0\leq t\leq\sigma^{\prime}}|\tilde{H}_{t}^{\varepsilon}-\hat{H}_{t}^{\varepsilon}|=O(\varepsilon),

𝐄(x,y)f(H~σε)\displaystyle\bm{\mathrm{E}}_{(x,y)}f(\tilde{H}_{\sigma^{\prime}}^{\varepsilon}) =f(H(x))+𝐄(x,y)0σ(12|yuh(X~sε,ξ~sε)𝖳σ(ξ~sε)|2f′′(H~sε)\displaystyle=f(H(x))+\bm{\mathrm{E}}_{(x,y)}\int_{0}^{\sigma^{\prime}}\left(\frac{1}{2}|\nabla_{y}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\xi}}_{s}^{\varepsilon})|^{2}f^{\prime\prime}(\tilde{H}_{s}^{\varepsilon})\right. (4.8)
+[xuh(X~sε,ξ~sε)b(X~sε,ξ~sε)+yuh(X~sε,ξ~sε)c(X~sε,ξ~sε)]f(H~sε))ds+O(ε).\displaystyle\quad\left.+\left[\nabla_{x}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\cdot b({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})+\nabla_{y}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\cdot c({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\right]f^{\prime}(\tilde{H}_{s}^{\varepsilon})\right)ds+O(\varepsilon).

Combining this with (4.3) and (4.4), by Lemma 3.2, as in Corollary 3.4, we have

𝐄(x,y)[f(H~σε)f(H(x))0σ(12A(H~sε,Φ~sε)f′′(H~sε)+Bc(H~sε,Φ~sε)f(H~sε))𝑑s]=O(ε).\bm{\mathrm{E}}_{(x,y)}\left[f(\tilde{H}_{\sigma^{\prime}}^{\varepsilon})-f(H(x))-\int_{0}^{\sigma^{\prime}}\left(\frac{1}{2}A(\tilde{H}_{s}^{\varepsilon},\tilde{\Phi}_{s}^{\varepsilon})f^{\prime\prime}(\tilde{H}_{s}^{\varepsilon})+{B_{c}}(\tilde{H}_{s}^{\varepsilon},\tilde{\Phi}_{s}^{\varepsilon})f^{\prime}(\tilde{H}_{s}^{\varepsilon})\right)ds\right]=O(\varepsilon). (4.9)
Lemma 4.1.

Let g(h,ϕ)g(h,\phi) be either A(h,ϕ)f′′(h)A(h,\phi)f^{\prime\prime}(h) or Bc(h,ϕ)f(h){B_{c}}(h,\phi)f^{\prime}(h), and g¯(h)=S1g(h,ϕ)𝑑ϕ\bar{g}(h)=\int_{S^{1}}g(h,\phi)d\phi. Then, for every T>0T>0,

supxU(εα,1δ)y𝕋msupσTη(δ)τ(εα)𝐄(x,y)|0σ[g(H~sε,Φ~sε)g¯(H~sε)]𝑑s|0,as ε0,\sup_{\begin{subarray}{c}x\in U(\varepsilon^{\alpha},1-\delta)\\ y\in\mathbb{T}^{m}\end{subarray}}\sup_{\sigma^{\prime}\leq T\wedge\eta(\delta)\wedge\tau(\varepsilon^{\alpha})}\bm{\mathrm{E}}_{(x,y)}\left|\int_{0}^{\sigma^{\prime}}\left[g(\tilde{H}_{s}^{\varepsilon},\tilde{\Phi}_{s}^{\varepsilon})-\bar{g}(\tilde{H}_{s}^{\varepsilon})\right]ds\right|\to 0,\leavevmode\nobreak\ \leavevmode\nobreak\ \text{as }\varepsilon\downarrow 0, (4.10)

where the first supremum is taken over all stopping times σTη(δ)τ(εα)\sigma^{\prime}\leq T\wedge\eta(\delta)\wedge\tau(\varepsilon^{\alpha}).

Proof.

Fix κ>0\kappa>0. Since, for fixed hh, g(h,ϕ)g¯(h)g(h,\phi)-\bar{g}(h) is a function on S1S^{1}, we can approximate it by a finite sum of its Fourier series with error less than κ2T\frac{\kappa}{2T}:

g(h,ϕ)g¯(h)0<|k|K(ε)gk(h,ϕ):=0<|k|K(ε)Gk(h)exp(2πikϕ),g(h,\phi)-\bar{g}(h)\approx\sum_{0<|k|\leq K(\varepsilon)}g_{k}(h,\phi):=\sum_{0<|k|\leq K(\varepsilon)}G_{k}(h)\exp(2\pi ik\phi), (4.11)

for all εαh1δ\varepsilon^{\alpha}\leq h\leq 1-\delta and ϕS1\phi\in S^{1}, where

Gk(h)=[g(h,ϕ)g¯(h)]exp(2πikϕ)𝑑ϕ.G_{k}(h)=\int[g(h,\phi)-\bar{g}(h)]\exp(-2\pi ik\phi)d\phi. (4.12)

Since, as shown in Appendix A, gϕϕ′′=O(|logh|/h)g^{\prime\prime}_{\phi\phi}=O(|\log h|/h), we see that K(ε)K(\varepsilon) can be chosen as εα|logε|2\varepsilon^{-\alpha}|\log\varepsilon|^{2} for sufficiently small ε\varepsilon. Then it suffices to prove that, for all 0<|k|K(ε)0<|k|\leq K(\varepsilon) and ε\varepsilon sufficiently small,

supxU(εα,1δ)y𝕋msupσTη(δ)τ(εα)𝐄(x,y)|0σgk(H~sε,Φ~sε)𝑑s|=o(εα|logε|2).\sup_{\begin{subarray}{c}x\in U(\varepsilon^{\alpha},1-\delta)\\ y\in\mathbb{T}^{m}\end{subarray}}\sup_{\sigma^{\prime}\leq T\wedge\eta(\delta)\wedge\tau(\varepsilon^{\alpha})}\bm{\mathrm{E}}_{(x,y)}\left|\int_{0}^{\sigma^{\prime}}g_{k}(\tilde{H}_{s}^{\varepsilon},\tilde{\Phi}_{s}^{\varepsilon})ds\right|=o\left(\frac{\varepsilon^{\alpha}}{|\log\varepsilon|^{2}}\right). (4.13)

We define an auxiliary function vv for fixed gkg_{k}, where 0<|k|K(ε)0<|k|\leq K(\varepsilon):

v=gk(h,ϕ)Q(h)2πik,v=\frac{g_{k}(h,\phi)Q(h)}{2\pi ik}, (4.14)

which satisfies that vϕ/Q(h)=gk(h,ϕ)v^{\prime}_{\phi}/Q(h)=g_{k}(h,\phi). We formulate the bounds on ϕ\phi, vv, gg, and their derivatives, uniformly in all εα<h<1δ\varepsilon^{\alpha}<h<1-\delta and 0<|k|K(ε)0<|k|\leq K(\varepsilon) (proved in the Appendix A):

ϕ\displaystyle\phi [0,1),ϕ=O(1/h),2ϕ=O(1/h2),\displaystyle\in[0,1),\leavevmode\nobreak\ \nabla\phi=O(1/h),\leavevmode\nobreak\ \nabla^{2}\phi=O(1/h^{2}), (4.15)
v\displaystyle v =O(|logh|),vϕ=O(|logh|),vϕϕ′′=O(|logh|3/h),\displaystyle=O(|\log h|),\leavevmode\nobreak\ v^{\prime}_{\phi}=O(|\log h|),\leavevmode\nobreak\ v^{\prime\prime}_{\phi\phi}=O(|\log h|^{3}/h),
vh\displaystyle v^{\prime}_{h} =O(|logh|2/h),vhh′′=O(|logh|3/h3),vϕh′′=O(|logh|2/h),\displaystyle=O(|\log h|^{2}/h),\leavevmode\nobreak\ v^{\prime\prime}_{hh}=O(|\log h|^{3}/h^{3}),\leavevmode\nobreak\ v^{\prime\prime}_{\phi h}=O(|\log h|^{2}/h),
gh\displaystyle g^{\prime}_{h} =O(|logh|/h),ghh′′=O(|logh|2/h3).\displaystyle=O(|\log h|/h),\leavevmode\nobreak\ g^{\prime\prime}_{hh}=O(|\log h|^{2}/h^{3}).

By comparing (H~tε,Φ~tε)(\tilde{H}_{t}^{\varepsilon},\tilde{\Phi}_{t}^{\varepsilon}) and (H^tε,Φ^tε)(\hat{H}_{t}^{\varepsilon},\hat{\Phi}_{t}^{\varepsilon}) in (4.1), (4.2), (4.5), and (4.6), and using the bounds in (4.15), we know that for all σTη(δ)τ(εα)\sigma^{\prime}\leq T\wedge\eta(\delta)\wedge\tau(\varepsilon^{\alpha}),

0σ|gk(H~sε,Φ~sε)vϕ(H^sε,Φ^sε)Q(H~sε)|𝑑s=0σ|vϕ(H~sε,Φ~sε)vϕ(H^sε,Φ^sε)Q(H~sε)|𝑑s=O(ε12α|logε|3).\int_{0}^{\sigma^{\prime}}\left|g_{k}(\tilde{H}_{s}^{\varepsilon},\tilde{\Phi}_{s}^{\varepsilon})-\frac{v^{\prime}_{\phi}(\hat{H}_{s}^{\varepsilon},\hat{\Phi}_{s}^{\varepsilon})}{Q(\tilde{H}_{s}^{\varepsilon})}\right|ds=\int_{0}^{\sigma^{\prime}}\left|\frac{v^{\prime}_{\phi}(\tilde{H}_{s}^{\varepsilon},\tilde{\Phi}_{s}^{\varepsilon})-v^{\prime}_{\phi}(\hat{H}_{s}^{\varepsilon},\hat{\Phi}_{s}^{\varepsilon})}{Q(\tilde{H}_{s}^{\varepsilon})}\right|ds=O(\varepsilon^{1-2\alpha}|\log\varepsilon|^{3}). (4.16)

Apply Ito’s formula to v(H^σε,Φ^σε)v(\hat{H}_{\sigma^{\prime}}^{\varepsilon},\hat{\Phi}_{\sigma^{\prime}}^{\varepsilon}) and obtain

1ε0σvϕ(H^sε,Φ^sε)Q(H~sε)𝑑s\displaystyle\frac{1}{\varepsilon}\int_{0}^{\sigma^{\prime}}\frac{v^{\prime}_{\phi}(\hat{H}_{s}^{\varepsilon},\hat{\Phi}_{s}^{\varepsilon})}{Q(\tilde{H}_{s}^{\varepsilon})}ds =v(H^σε,Φ^σε)v(H(x),ϕ(x))0σvh(H^sε,Φ^sε)yuh(X~sε,ξ~sε)𝖳σ(ξ~sε)𝑑Ws\displaystyle=v(\hat{H}_{\sigma^{\prime}}^{\varepsilon},\hat{\Phi}_{\sigma^{\prime}}^{\varepsilon})-v(H(x),\phi(x))-\int_{0}^{\sigma^{\prime}}v^{\prime}_{h}(\hat{H}_{s}^{\varepsilon},\hat{\Phi}_{s}^{\varepsilon})\nabla_{y}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\xi}}_{s}^{\varepsilon})dW_{s}
0σvh(H^sε,Φ^sε)[xuh(X~sε,ξ~sε)b(X~sε,ξ~sε)+yuh(X~sε,ξ~sε)c(X~sε,ξ~sε)]𝑑s\displaystyle\quad-\int_{0}^{\sigma^{\prime}}v^{\prime}_{h}(\hat{H}_{s}^{\varepsilon},\hat{\Phi}_{s}^{\varepsilon})[\nabla_{x}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\cdot b({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})+\nabla_{y}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\cdot c({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})]ds
120σvhh′′(H^sε,Φ^sε)|yuh(X~sε,ξ~sε)𝖳σ(ξ~sε)|2𝑑s\displaystyle\quad-\frac{1}{2}\int_{0}^{\sigma^{\prime}}v^{\prime\prime}_{hh}(\hat{H}_{s}^{\varepsilon},\hat{\Phi}_{s}^{\varepsilon})|\nabla_{y}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\xi}}_{s}^{\varepsilon})|^{2}ds
0σvϕ(H^sε,Φ^sε)yuϕ(X~sε,ξ~sε)𝖳σ(ξ~sε)𝑑Ws\displaystyle\quad-\int_{0}^{\sigma^{\prime}}v^{\prime}_{\phi}(\hat{H}_{s}^{\varepsilon},\hat{\Phi}_{s}^{\varepsilon})\nabla_{y}u_{\phi}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\xi}}_{s}^{\varepsilon})dW_{s}
0σvϕ(H^sε,Φ^sε)[xuϕ(X~sε,ξ~sε)b(X~sε,ξ~sε)+yuϕ(X~sε,ξ~sε)c(X~sε,ξ~sε)]𝑑s\displaystyle\quad-\int_{0}^{\sigma^{\prime}}v^{\prime}_{\phi}(\hat{H}_{s}^{\varepsilon},\hat{\Phi}_{s}^{\varepsilon})[\nabla_{x}u_{\phi}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\cdot b({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})+\nabla_{y}u_{\phi}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\cdot c({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})]ds
120σvϕϕ′′(H^sε,Φ^sε)|yuϕ(X~sε,ξ~sε)𝖳σ(ξ~sε)|2𝑑s\displaystyle\quad-\frac{1}{2}\int_{0}^{\sigma^{\prime}}v^{\prime\prime}_{\phi\phi}(\hat{H}_{s}^{\varepsilon},\hat{\Phi}_{s}^{\varepsilon})|\nabla_{y}u_{\phi}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\xi}}_{s}^{\varepsilon})|^{2}ds
0σvϕh′′(H^sε,Φ^sε)yuh(X~sε,ξ~sε)𝖳σ(ξ~sε)σ(ξ~sε)𝖳yuϕ(X~sε,ξ~sε)𝑑s.\displaystyle\quad-\int_{0}^{\sigma^{\prime}}v_{\phi h}^{\prime\prime}(\hat{H}_{s}^{\varepsilon},\hat{\Phi}_{s}^{\varepsilon})\nabla_{y}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\xi}}_{s}^{\varepsilon})\sigma({\tilde{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\nabla_{y}u_{\phi}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})ds.

By using the estimates in (4.15) and the fact that 0<α<1/40<\alpha<1/4, we know that the expectation of the right-hand side is o(εα1|logε|2)o(\frac{\varepsilon^{\alpha-1}}{|\log\varepsilon|^{2}}). Combining this with (4.16), we get (4.13). Thus, the desired result follows. ∎

Now, applying Lemma 4.1 to (4.9), we get

Lemma 4.2.

For each f𝒟f\in\mathcal{D}, 0<α<1/40<\alpha<1/4, and 0<δ<10<\delta<1, as ε0\varepsilon\downarrow 0,

supxU(εα,1δ)y𝕋msupσTη(δ)τ(εα)|𝐄(x,y)[f(H(X~σε))f(H(x))0σcf(H(X~sε))𝑑s]|0,\sup_{\begin{subarray}{c}x\in U(\varepsilon^{\alpha},1-\delta)\\ y\in\mathbb{T}^{m}\end{subarray}}\sup_{\sigma^{\prime}\leq T\wedge\eta(\delta)\wedge\tau(\varepsilon^{\alpha})}|\bm{\mathrm{E}}_{(x,y)}[f(H({\tilde{X}}_{\sigma^{\prime}}^{\varepsilon}))-f(H(x))-\int_{0}^{\sigma^{\prime}}{\mathcal{L}_{c}}f(H({\tilde{X}}_{s}^{\varepsilon}))ds]|\to 0, (4.17)

where the second supremum is taken over all stopping times σTη(δ)τ(εα)\sigma^{\prime}\leq T\wedge\eta(\delta)\wedge\tau(\varepsilon^{\alpha}).

Remark 4.3.

The diffusion process governed by c\mathcal{L}_{c} can reach all points inside the edge and the interior vertex but cannot reach the exterior vertex. For example, in the case considered here, the process can reach all points in [0,1)[0,1) but cannot reach 11. The reason is that, for each δ>0\delta>0, on [0,1δ][0,1-\delta], B¯c(h)\bar{B}_{c}(h) is bounded and 1/A¯(h)|logh|1/\bar{A}(h)\lesssim|\log h| (see Appendix A for details). However, for κ>0\kappa>0 sufficiently small, on [1κ,1][1-\kappa,1], BcB_{c} is uniformly negative while A(h)1hA(h)\lesssim 1-h due to the non-degeneracy of the maximum point (see Lemma B.1 for details).

Lemma 4.4.

For each δ>0\delta>0 and 0<α<1/40<\alpha<1/4, 𝐄(x,y)(η(δ)τ(εα))\bm{\mathrm{E}}_{(x,y)}(\eta(\delta)\wedge\tau(\varepsilon^{\alpha})) is uniformly bounded for all xU(εα,1δ)x\in U(\varepsilon^{\alpha},1-\delta), y𝕋my\in\mathbb{T}^{m}, and ε\varepsilon sufficiently small.

Proof.

The solution fδf^{\delta} to the following equation exists on [0,1δ][0,1-\delta] due to Remark 4.3:

{cfδ=1fδ(0)=fδ(1δ)=0\begin{cases}{\mathcal{L}_{c}}f^{\delta}=-1\\ f^{\delta}(0)=f^{\delta}(1-\delta)=0\end{cases} (4.18)

Let T~>3fδsup\tilde{T}>3\|f^{\delta}\|_{\mathrm{sup}}. Then Lemma 4.2 implies that, for all xU(εα,1δ)x\in U(\varepsilon^{\alpha},1-\delta), y𝕋my\in\mathbb{T}^{m}, and ε\varepsilon small enough,

𝐄(x,y)(η(δ)τ(εα)T~)<T~/2.\bm{\mathrm{E}}_{(x,y)}(\eta(\delta)\wedge\tau(\varepsilon^{\alpha})\wedge\tilde{T})<\tilde{T}/2. (4.19)

Thus, the Markov inequality gives 𝐏(x,y)(η(δ)τ(εα)T~)<1/2\bm{\mathrm{P}}_{(x,y)}(\eta(\delta)\wedge\tau(\varepsilon^{\alpha})\geq\tilde{T})<1/2 uniformly in the starting point, and iterating this bound over consecutive intervals of length T~\tilde{T} by the strong Markov property yields 𝐄(x,y)(η(δ)τ(εα))2T~\bm{\mathrm{E}}_{(x,y)}(\eta(\delta)\wedge\tau(\varepsilon^{\alpha}))\leq 2\tilde{T}. ∎
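As an aside (not needed for the proofs), once the averaged coefficients are available, the boundary-value problem (4.18) can be solved numerically by a standard finite-difference scheme; here we write c\mathcal{L}_{c} on the edge as \frac{1}{2}\bar{A}(h)\frac{d^{2}}{dh^{2}}+\bar{B}_{c}(h)\frac{d}{dh}, as suggested by (4.35) and Remark 4.3. The sketch below uses toy coefficients standing in for \bar{A} and \bar{B}_{c}; its output plays the role of \|f^{\delta}\|_{\mathrm{sup}} in the choice of \tilde{T} above.

```python
import numpy as np

# Minimal finite-difference sketch (illustration only) for the boundary-value
# problem (4.18):
#   (1/2) * A_bar(h) * f''(h) + B_bar_c(h) * f'(h) = -1,  f(0) = f(1 - delta) = 0.
# A_bar and B_bar_c below are toy stand-ins for the averaged coefficients
# (the actual coefficients involve Q(h) and are only estimated in Appendix A).

def solve_bvp(A_bar, B_bar_c, delta=0.1, n=400):
    h = np.linspace(0.0, 1.0 - delta, n + 1)
    dh = h[1] - h[0]
    M = np.zeros((n - 1, n - 1))
    for i in range(1, n):                                 # interior nodes
        a, b = 0.5 * A_bar(h[i]), B_bar_c(h[i])
        if i > 1:
            M[i - 1, i - 2] = a / dh**2 - b / (2 * dh)    # weight of f(h_{i-1})
        M[i - 1, i - 1] = -2.0 * a / dh**2                # weight of f(h_i)
        if i < n - 1:
            M[i - 1, i] = a / dh**2 + b / (2 * dh)        # weight of f(h_{i+1})
    f = np.zeros(n + 1)
    f[1:n] = np.linalg.solve(M, -np.ones(n - 1))          # Dirichlet data stays 0
    return h, f

# toy coefficients: A_bar degenerates like h|log h| near the interior vertex,
# B_bar_c stays bounded, mimicking the behaviour described in Remark 4.3
h, f = solve_bvp(lambda s: max(s * abs(np.log(s + 1e-12)), 1e-6), lambda s: -0.5)
print("sup-norm of f^delta:", f.max())
```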

Lemma 4.5.

For each f𝒟f\in\mathcal{D}, δ>0\delta>0, and 0<α<1/40<\alpha<1/4, as ε0\varepsilon\downarrow 0,

supxU(εα,1δ)y𝕋msupση(δ)τ(εα)|𝐄(x,y)[f(H(X~σε))f(H(x))0σcf(H(X~sε))𝑑s]|0,\sup_{\begin{subarray}{c}x\in U(\varepsilon^{\alpha},1-\delta)\\ y\in\mathbb{T}^{m}\end{subarray}}\sup_{\sigma^{\prime}\leq\eta(\delta)\wedge\tau(\varepsilon^{\alpha})}|\bm{\mathrm{E}}_{(x,y)}[f(H({\tilde{X}}^{\varepsilon}_{\sigma^{\prime}}))-f(H(x))-\int_{0}^{\sigma^{\prime}}{\mathcal{L}_{c}}f(H({\tilde{X}}_{s}^{\varepsilon}))ds]|\to 0, (4.20)

where the second supremum is taken over all stopping times ση(δ)τ(εα)\sigma^{\prime}\leq\eta(\delta)\wedge\tau(\varepsilon^{\alpha}).

Proof.

The result can be deduced from Lemma 4.2 and Lemma 4.4 by choosing a sufficiently large TT and using the Markov property. ∎

4.2 Averaging principle before σ\sigma

With estimates on the transition times and probabilities between level sets near the critical points, the result from the previous subsection can be extended to the time when the process reaches the separatrix, which is denoted by σ\sigma. The main result of this subsection is

Proposition 4.6.

For each f𝒟f\in\mathcal{D}, as ε0\varepsilon\downarrow 0,

supxU,y𝕋msupσσ|𝐄(x,y)[f(H(X~σε))f(H(x))0σcf(H(X~sε))𝑑s]|0,\sup_{\begin{subarray}{c}x\in U,y\in\mathbb{T}^{m}\end{subarray}}\sup_{\sigma^{\prime}\leq\sigma}|\bm{\mathrm{E}}_{(x,y)}[f(H({\tilde{X}}^{\varepsilon}_{\sigma^{\prime}}))-f(H(x))-\int_{0}^{\sigma^{\prime}}{\mathcal{L}_{c}}f(H({\tilde{X}}_{s}^{\varepsilon}))ds]|\to 0, (4.21)

where the second supremum is taken over all stopping times σσ\sigma^{\prime}\leq\sigma.

We state a simple corollary of Proposition B.3.

Corollary 4.7.

For each 0<α<1/20<\alpha<1/2, uniformly in xU(0,εα)x\in U(0,\varepsilon^{\alpha}) and y𝕋my\in\mathbb{T}^{m},

𝐄(x,y)στ(εα)=O(ε2α|logε|).\bm{\mathrm{E}}_{(x,y)}\sigma\wedge\tau(\varepsilon^{\alpha})=O(\varepsilon^{2\alpha}|\log\varepsilon|). (4.22)
Lemma 4.8.

For each 0<α<1/20<\alpha<1/2, uniformly in xU(0,εα)x\in U(0,\varepsilon^{\alpha}) and y𝕋my\in\mathbb{T}^{m},

|𝐏(x,y)(τ(εα)<σ)H(x)εα|=O(εα|logε|).|\bm{\mathrm{P}}_{(x,y)}(\tau(\varepsilon^{\alpha})<\sigma)-H(x)\varepsilon^{-\alpha}|=O(\varepsilon^{\alpha}|\log\varepsilon|). (4.23)
Proof.

As in (4.1), write the equation for H(X~tε)=H~tεH({\tilde{X}}_{t}^{\varepsilon})=\tilde{H}_{t}^{\varepsilon} stopped at στ(εα){\sigma\wedge\tau(\varepsilon^{\alpha})},

H(X~στ(εα)ε)\displaystyle H({\tilde{X}}_{\sigma\wedge\tau(\varepsilon^{\alpha})}^{\varepsilon}) =H(x)+0στ(εα)xuh(X~sε,ξ~sε)b(X~sε,ξ~sε)+yuh(X~sε,ξ~sε)c(X~sε,ξ~sε)ds\displaystyle=H(x)+\int_{0}^{\sigma\wedge\tau(\varepsilon^{\alpha})}\nabla_{x}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\cdot b({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})+\nabla_{y}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\cdot c({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})ds
+0στ(εα)yuh(X~sε,ξ~sε)𝖳σ(ξ~sε)𝑑Ws+O(ε).\displaystyle\quad+\int_{0}^{\sigma\wedge\tau(\varepsilon^{\alpha})}\nabla_{y}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\xi}}_{s}^{\varepsilon})dW_{s}+O(\varepsilon).

From Corollary 4.7, it follows that

|𝐏(x,y)(τ(εα)<σ)H(x)εα|=εα|𝐄(x,y)H(X~στ(εα)ε)H(x)|=O(εα|logε|).|\bm{\mathrm{P}}_{(x,y)}(\tau(\varepsilon^{\alpha})<\sigma)-H(x)\varepsilon^{-\alpha}|=\varepsilon^{-\alpha}|\bm{\mathrm{E}}_{(x,y)}H({\tilde{X}}_{\sigma\wedge\tau(\varepsilon^{\alpha})}^{\varepsilon})-H(x)|=O(\varepsilon^{\alpha}|\log\varepsilon|).\qed
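The leading term in (4.23) is the classical two-level hitting probability of a martingale. Purely as an illustration of this mechanism (and with H({\tilde{X}}_{t}^{\varepsilon}) replaced by a standard Brownian motion, which captures only the leading-order behaviour), the following sketch checks numerically that the probability of reaching the level \varepsilon^{\alpha} before 0, starting from h_{0}, is close to h_{0}\varepsilon^{-\alpha}.

```python
import numpy as np

# Gambler's-ruin illustration of the estimate behind Lemma 4.8: for a martingale
# started at h0 in (0, level), optional stopping gives
#   P(hit `level` before 0) = h0 / level.
# A driftless Brownian motion stands in for H(X~_t^eps).
rng = np.random.default_rng(0)

def hit_upper_before_zero(h0, level, dt=1e-4, n_paths=20000):
    hits = 0
    for _ in range(n_paths):
        h = h0
        while 0.0 < h < level:
            h += np.sqrt(dt) * rng.standard_normal()
        hits += h >= level
    return hits / n_paths

level = 0.05   # plays the role of eps**alpha
h0 = 0.02      # starting value H(x) < eps**alpha
print("empirical:", hit_upper_before_zero(h0, level), "  h0/level:", h0 / level)
```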

We prove that the process spends finite time (in expectation) inside UU. The idea is to use the fact that the process on the graph spends little time near the vertices and exits the edge with positive probability once it gets close enough to the interior vertex.

Lemma 4.9.

For each 0<α<1/40<\alpha<1/4, 𝐄(x,y)τ(εα)\bm{\mathrm{E}}_{(x,y)}\tau(\varepsilon^{\alpha}) is uniformly bounded for all xUx\in U such that H(x)εαH(x)\geq\varepsilon^{\alpha}, y𝕋my\in\mathbb{T}^{m}, and ε\varepsilon sufficiently small.

Proof.

By Lemma B.2, fix δ>0\delta>0 such that 𝐄(x,y)η(2δ)1\bm{\mathrm{E}}_{(x,y)}\eta(2\delta)\leq 1 for all ε\varepsilon sufficiently small and all xx satisfying H(x)>12δH(x)>1-2\delta; by Lemma 4.5, fix κ>0\kappa>0 such that 𝐏(x,y)(η(δ)<τ(εα))<1κ\bm{\mathrm{P}}_{(x,y)}(\eta(\delta)<\tau(\varepsilon^{\alpha}))<1-\kappa for all xx satisfying H(x)=12δH(x)=1-2\delta, all y𝕋my\in\mathbb{T}^{m}, and all ε\varepsilon sufficiently small; by Lemma 4.4, fix T>4(1+supxU(εα,1δ),y𝕋m𝐄(x,y)(τ(εα)η(δ)))/κT>4(1+\sup_{x\in U(\varepsilon^{\alpha},1-\delta),y\in\mathbb{T}^{m}}\bm{\mathrm{E}}_{(x,y)}(\tau(\varepsilon^{\alpha})\wedge\eta(\delta)))/\kappa. For all xx with H(x)>12δH(x)>1-2\delta and y𝕋my\in\mathbb{T}^{m},

𝐏(x,y)(τ(εα)>2T)\displaystyle\bm{\mathrm{P}}_{(x,y)}(\tau(\varepsilon^{\alpha})>2T) (4.24)
𝐏(x,y)(η(2δ)>T)+sup(x,y)γ(12δ)×𝕋m(𝐏(x,y)(τ(εα)η(δ)>T)+𝐏(x,y)(η(δ)<τ(εα)))\displaystyle\leq\bm{\mathrm{P}}_{(x,y)}(\eta(2\delta)>T)+\sup_{(x^{\prime},y^{\prime})\in\gamma(1-2\delta)\times\mathbb{T}^{m}}\left(\bm{\mathrm{P}}_{(x^{\prime},y^{\prime})}(\tau(\varepsilon^{\alpha})\wedge\eta(\delta)>T)+\bm{\mathrm{P}}_{(x^{\prime},y^{\prime})}(\eta(\delta)<\tau(\varepsilon^{\alpha}))\right)
1κ/2.\displaystyle\leq 1-\kappa/2.

For all xUx\in U with εαH(x)12δ\varepsilon^{\alpha}\leq H(x)\leq 1-2\delta, the estimate above holds without the first term on the second line. Iterating with the Markov property gives 𝐏(x,y)(τ(εα)>2kT)(1κ/2)k\bm{\mathrm{P}}_{(x,y)}(\tau(\varepsilon^{\alpha})>2kT)\leq(1-\kappa/2)^{k}, so 𝐄(x,y)τ(εα)4T/κ\bm{\mathrm{E}}_{(x,y)}\tau(\varepsilon^{\alpha})\leq 4T/\kappa, and the uniform boundedness follows. ∎

We can apply a similar idea near the separatrix. Namely, we choose 0<α<α<1/40<\alpha^{\prime}<\alpha<1/4. By Corollary 4.7, the process spends little time between γ\gamma and γ(εα)\gamma(\varepsilon^{\alpha^{\prime}}), and by Lemma 4.8, the process is very likely to exit through the separatrix rather than come back to γ(εα)\gamma(\varepsilon^{\alpha^{\prime}}) once it reaches γ(εα)\gamma(\varepsilon^{\alpha}). Then, since 𝐄(x,y)τ(εα)\bm{\mathrm{E}}_{(x,y)}\tau(\varepsilon^{\alpha}) is uniformly bounded, one can prove the following result:

Lemma 4.10.

𝐄(x,y)σ\bm{\mathrm{E}}_{(x,y)}\sigma is uniformly bounded for all xUx\in U, y𝕋my\in\mathbb{T}^{m}, and ε\varepsilon sufficiently small.

Proof of Proposition 4.6.

Fix κ>0\kappa>0 and 0<α<1/40<\alpha<1/4. By Lemma 4.9, let TT be large enough such that 𝐏(x,y)(τ(εα)>T)<κ\bm{\mathrm{P}}_{(x,y)}(\tau(\varepsilon^{\alpha})>T)<\kappa for all xUx\in U satisfying H(x)εαH(x)\geq\varepsilon^{\alpha}, y𝕋my\in\mathbb{T}^{m}, and ε\varepsilon sufficiently small. By Lemma B.2, let δ>0\delta>0 be small enough such that, for ε\varepsilon sufficiently small,

supxU:H(x)1δy𝕋msupση(δ)|𝐄(x,y)[f(H(X~σε))f(H(x))0σcf(H(X~sε))𝑑s]|<κ,\sup_{\begin{subarray}{c}x\in U:H(x)\geq 1-\delta\\ y\in\mathbb{T}^{m}\end{subarray}}\sup_{\sigma^{\prime}\leq\eta(\delta)}|\bm{\mathrm{E}}_{(x,y)}[f(H({\tilde{X}}_{\sigma^{\prime}}^{\varepsilon}))-f(H(x))-\int_{0}^{\sigma^{\prime}}\mathcal{L}_{c}f(H({\tilde{X}}_{s}^{\varepsilon}))ds]|<\kappa, (4.25)

where the second supremum is taken over all stopping times ση(δ)\sigma^{\prime}\leq\eta(\delta). By Remark 4.3 and Lemma 4.5, let δ>0\delta^{\prime}>0 be small enough such that 𝐏(x,y)(η(δ)<τ(εα))<κ\bm{\mathrm{P}}_{(x,y)}(\eta(\delta^{\prime})<\tau(\varepsilon^{\alpha}))<\kappa for all xU(εα,1δ)x\in U(\varepsilon^{\alpha},1-\delta), y𝕋my\in\mathbb{T}^{m}, and ε\varepsilon sufficiently small. For any stopping time στ(εα)\sigma^{\prime}\leq\tau(\varepsilon^{\alpha}), xU(εα,1δ)x\in U(\varepsilon^{\alpha},1-\delta), and y𝕋my\in\mathbb{T}^{m},

|𝐄(x,y)[f(H(X~σε))f(H(x))0σcf(H(X~sε))ds]|\displaystyle|\bm{\mathrm{E}}_{(x,y)}[f(H({\tilde{X}}^{\varepsilon}_{\sigma^{\prime}}))-f(H(x))-\int_{0}^{\sigma^{\prime}}{\mathcal{L}_{c}}f(H({\tilde{X}}_{s}^{\varepsilon}))ds]| (4.26)
|𝐄(x,y)[f(H(X~η(δ)σε))f(H(x))0η(δ)σcf(H(X~sε))ds]|\displaystyle\leq|\bm{\mathrm{E}}_{(x,y)}[f(H({\tilde{X}}^{\varepsilon}_{\eta(\delta^{\prime})\wedge\sigma^{\prime}}))-f(H(x))-\int_{0}^{\eta(\delta^{\prime})\wedge\sigma^{\prime}}{\mathcal{L}_{c}}f(H({\tilde{X}}_{s}^{\varepsilon}))ds]|
+𝐏(x,y)(η(δ)<σ)supxU:H(x)=δy𝕋m|𝐄(x,y)[f(H(X~σε))f(H(x))0σcf(H(X~sε))𝑑s]|.\displaystyle\quad+\bm{\mathrm{P}}_{(x,y)}(\eta(\delta^{\prime})<\sigma^{\prime})\sup_{\begin{subarray}{c}x^{\prime}\in U:H(x^{\prime})=\delta^{\prime}\\ y^{\prime}\in\mathbb{T}^{m}\end{subarray}}|\bm{\mathrm{E}}_{(x^{\prime},y^{\prime})}[f(H({\tilde{X}}^{\varepsilon}_{\sigma^{\prime}}))-f(H(x^{\prime}))-\int_{0}^{\sigma^{\prime}}{\mathcal{L}_{c}}f(H({\tilde{X}}_{s}^{\varepsilon}))ds]|.

Note that the first term converges to 0 as ε0\varepsilon\downarrow 0 by Lemma 4.5, the probability in the second term is less than κ\kappa, and the supremum is uniformly bounded for all ε\varepsilon by Lemma 4.10. Thus, the expression on the left-hand side of (4.26) converges to 0 uniformly. Combining this with (4.25), we obtain

supxU:H(x)εαy𝕋msupστ(εα)|𝐄(x,y)[f(H(X~σε))f(H(x))0σcf(H(X~sε))𝑑s]|0.\sup_{\begin{subarray}{c}x\in U:H(x)\geq\varepsilon^{\alpha}\\ y\in\mathbb{T}^{m}\end{subarray}}\sup_{\sigma^{\prime}\leq\tau(\varepsilon^{\alpha})}|\bm{\mathrm{E}}_{(x,y)}[f(H({\tilde{X}}_{\sigma^{\prime}}^{\varepsilon}))-f(H(x))-\int_{0}^{\sigma^{\prime}}\mathcal{L}_{c}f(H({\tilde{X}}_{s}^{\varepsilon}))ds]|\to 0. (4.27)

Finally, let us choose 0<α<α0<\alpha^{\prime}<\alpha. Apply Corollary 4.7 and Lemma 4.8 to obtain that 𝐄(x,y)στ(εα)<εα\bm{\mathrm{E}}_{(x,y)}\sigma\wedge\tau(\varepsilon^{\alpha^{\prime}})<\varepsilon^{\alpha^{\prime}} and 𝐏(x,y)(σ<τ(εα))>1/2\bm{\mathrm{P}}_{(x,y)}(\sigma<\tau(\varepsilon^{\alpha^{\prime}}))>1/2 for all xγ(εα)x\in\gamma(\varepsilon^{\alpha}), y𝕋my\in\mathbb{T}^{m}, and ε\varepsilon sufficiently small. As in (4.26), by stopping the process at τ(εα)σ\tau(\varepsilon^{\alpha})\wedge\sigma^{\prime} and τ(εα)σ\tau(\varepsilon^{\alpha^{\prime}})\wedge\sigma^{\prime} and using the strong Markov property, we can conclude that

supxU:H(x)εαy𝕋msupσσ|𝐄(x,y)[f(H(X~σε))f(H(x))0σcf(H(X~sε))𝑑s]|0.\sup_{\begin{subarray}{c}x\in U:H(x)\geq\varepsilon^{\alpha}\\ y\in\mathbb{T}^{m}\end{subarray}}\sup_{\sigma^{\prime}\leq\sigma}|\bm{\mathrm{E}}_{(x,y)}[f(H({\tilde{X}}_{\sigma^{\prime}}^{\varepsilon}))-f(H(x))-\int_{0}^{\sigma^{\prime}}\mathcal{L}_{c}f(H({\tilde{X}}_{s}^{\varepsilon}))ds]|\to 0. (4.28)

Now (4.21) follows from this by applying Corollary 4.7 again. ∎

4.3 Averaging principle starting from γ(εα)\gamma(\varepsilon^{\alpha})

Fix 0<α<α1<α2<1/20<\alpha<\alpha_{1}<\alpha_{2}<1/2 and r>0r>0 small enough. More delicate results are obtained in this subsection to describe the behavior of the process during one excursion from γ(εα)\gamma(\varepsilon^{\alpha}) to γ\gamma (such excursions, in different domains, happen during the time intervals [τn,σn][\tau_{n},\sigma_{n}] defined in (1.6)). In particular, Lemma 4.15 gives bounds on the contribution to (3.2) from each such excursion. Recall that Q(h)Q(h) is the rotation time of 𝒙t\bm{x}_{t} on γ(h)\gamma(h). Our first lemma concerns the typical deviation during one rotation.

Lemma 4.11.

For each δ>0\delta>0 there is κ>0\kappa>0 such that for all xU(εα1,r)x\in U(\varepsilon^{\alpha_{1}},r), y𝕋my\in\mathbb{T}^{m}, and ε\varepsilon sufficiently small,

𝐏(x,y)(supt[0,Q(H(x))]|H(𝑿~tε)H(𝒙t)|>ε1/2δ)<εκ.\bm{\mathrm{P}}_{(x,y)}\left(\sup_{t\in[0,Q(H(x))]}|H({\tilde{\bm{X}}}_{t}^{\varepsilon})-H(\bm{x}_{t})|>\varepsilon^{1/2-\delta}\right)<\varepsilon^{\kappa}. (4.29)

There exist δ>0\delta^{\prime}>0 and κ>0\kappa>0 such that, for all xU(εα1,r)x\in U(\varepsilon^{\alpha_{1}},r), y𝕋my\in\mathbb{T}^{m}, and ε\varepsilon sufficiently small,

𝐏(x,y)(supt[0,Q(H(x))]|𝑿~tε𝒙t|>εδ)<εκ.\bm{\mathrm{P}}_{(x,y)}\left(\sup_{t\in[0,Q(H(x))]}|{\tilde{\bm{X}}}_{t}^{\varepsilon}-\bm{x}_{t}|>\varepsilon^{\delta^{\prime}}\right)<\varepsilon^{\kappa}. (4.30)
Proof.

It suffices to prove the result for δ<1/2α1\delta<1/2-\alpha_{1}. Fix 0<δ<δ′′<1/2α1δ0<\delta^{\prime}<\delta^{\prime\prime}<1/2-\alpha_{1}-\delta. Recall the definition of qq in Subsection 4.1 and consider the coordinates HH and qq in U(εα2,2r)U(\varepsilon^{\alpha_{2}},2r). As in (4.1) and (4.2), let q0=q(x)q_{0}=q(x), h0=H(x)h_{0}=H(x), uh=uHu_{h}=u\cdot\nabla H, uq=uqu_{q}=u\cdot\nabla q, and qt=q0+tq_{t}=q_{0}+t, and write the equations with τ0=inf{t:|𝑯~tεh0|>ε1/2δor|𝑸~tεqt|>εδ′′}Q(h0)\tau^{0}=\inf\{t:|{\tilde{\bm{H}}}_{t}^{\varepsilon}-h_{0}|>\varepsilon^{1/2-\delta}\leavevmode\nobreak\ \mathrm{or}\leavevmode\nobreak\ |{\tilde{\bm{Q}}}_{t}^{\varepsilon}-q_{t}|>\varepsilon^{\delta^{\prime\prime}}\}\wedge Q(h_{0}):

H(𝑿~τ0ε)\displaystyle H({\tilde{\bm{X}}}_{\tau^{0}}^{\varepsilon}) =H(x)+ε0τ0yuh(𝑿~sε,𝝃~sε)𝖳σ(𝝃~sε)𝑑Ws+ε(uh(x,y)uh(𝑿~τ0ε,𝝃~τ0ε))\displaystyle=H(x)+\sqrt{\varepsilon}\int_{0}^{\tau^{0}}\nabla_{y}u_{h}({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\bm{\xi}}}_{s}^{\varepsilon})dW_{s}+\varepsilon(u_{h}(x,y)-u_{h}({\tilde{\bm{X}}}_{\tau^{0}}^{\varepsilon},{\tilde{\bm{\xi}}}_{\tau^{0}}^{\varepsilon}))
+ε0τ0[xuh(𝑿~sε,𝝃~sε)b(𝑿~sε,𝝃~sε)+yuh(𝑿~sε,𝝃~sε)c(𝑿~sε,𝝃~sε)]𝑑s,\displaystyle\quad+\varepsilon\int_{0}^{\tau^{0}}[\nabla_{x}u_{h}({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})\cdot b({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})+\nabla_{y}u_{h}({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})\cdot c({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})]ds, (4.31)
q(𝑿~τ0ε)\displaystyle q({\tilde{\bm{X}}}_{\tau^{0}}^{\varepsilon}) =qτ0+ε0τ0yuq(𝑿~sε,𝝃~sε)𝖳σ(𝝃~sε)𝑑Ws+ε(uq(x,y)uq(𝑿~τ0ε,𝝃~τ0ε))\displaystyle=q_{\tau^{0}}+\sqrt{\varepsilon}\int_{0}^{\tau^{0}}\nabla_{y}u_{q}({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\bm{\xi}}}_{s}^{\varepsilon})dW_{s}+\varepsilon(u_{q}(x,y)-u_{q}({\tilde{\bm{X}}}_{\tau^{0}}^{\varepsilon},{\tilde{\bm{\xi}}}_{\tau^{0}}^{\varepsilon}))
+ε0τ0[xuq(𝑿~sε,𝝃~sε)b(𝑿~sε,𝝃~sε)+yuq(𝑿~sε,𝝃~sε)c(𝑿~sε,𝝃~sε)]𝑑s.\displaystyle\quad+\varepsilon\int_{0}^{\tau^{0}}[\nabla_{x}u_{q}({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})\cdot b({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})+\nabla_{y}u_{q}({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})\cdot c({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})]ds. (4.32)

In Appendix A, we prove that |q|=O(|H|/H)|\nabla q|=O(|\nabla H|/H). Thus, it is not hard to see, by looking at the inverse of the Jacobian of (H,q)(H,q) w.r.t. xx, that |H(𝑿~tε)H(𝒙t)|ε1/2δ|H({\tilde{\bm{X}}}_{t}^{\varepsilon})-H(\bm{x}_{t})|\leq\varepsilon^{1/2-\delta} and |𝑿~tε𝒙t|εδ|{\tilde{\bm{X}}}_{t}^{\varepsilon}-\bm{x}_{t}|\leq\varepsilon^{\delta^{\prime}} for all tτ0t\leq\tau^{0}. Let SHS_{H} and SQS_{Q} denote the stochastic integrals in (4.31) and (4.32). Since τ0|logε|{\tau^{0}}\lesssim|\log\varepsilon|,

𝐏(x,y)(τ0<Q(H(x)))<𝐏(x,y)(|SH|>ε1/2δ/2)+𝐏(x,y)(|SQ|>εδ′′/2).\bm{\mathrm{P}}_{(x,y)}(\tau^{0}<Q(H(x)))<\bm{\mathrm{P}}_{(x,y)}(|S_{H}|>\varepsilon^{1/2-\delta}/2)+\bm{\mathrm{P}}_{(x,y)}(|S_{Q}|>\varepsilon^{\delta^{\prime\prime}}/2).

The variances of SHS_{H} and SQS_{Q} are small:

𝐕𝐚𝐫(SH)\displaystyle\bm{\mathrm{Var}}(S_{H}) =ε𝐄(0τ0|yuh(X~sε,ξ~sε)𝖳σ(ξ~sε)|2𝑑s)ε𝐄(0τ0|H(X~sε)|2𝑑s)ε|logε|,\displaystyle=\varepsilon\bm{\mathrm{E}}(\int_{0}^{\tau^{0}}|\nabla_{y}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\xi}}_{s}^{\varepsilon})|^{2}ds)\lesssim\varepsilon\bm{\mathrm{E}}(\int_{0}^{\tau^{0}}|\nabla H({\tilde{X}}_{s}^{\varepsilon})|^{2}ds)\lesssim\varepsilon|\log\varepsilon|,
𝐕𝐚𝐫(SQ)\displaystyle\bm{\mathrm{Var}}(S_{Q}) =ε𝐄(0τ0|yuq(X~sε,ξ~sε)𝖳σ(ξ~sε)|2𝑑s)ε𝐄(0τ0|q(X~sε)|2𝑑s)ε12α1|logε|.\displaystyle=\varepsilon\bm{\mathrm{E}}(\int_{0}^{\tau^{0}}|\nabla_{y}u_{q}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\xi}}_{s}^{\varepsilon})|^{2}ds)\lesssim\varepsilon\bm{\mathrm{E}}(\int_{0}^{\tau^{0}}|\nabla q({\tilde{X}}_{s}^{\varepsilon})|^{2}ds)\lesssim\varepsilon^{1-2\alpha_{1}}|\log\varepsilon|.

Hence, by Chebyshev’s inequality, 𝐏(x,y)(|SH|>ε1/2δ/2)4𝐕𝐚𝐫(SH)ε2δ1ε2δ|logε|\bm{\mathrm{P}}_{(x,y)}(|S_{H}|>\varepsilon^{1/2-\delta}/2)\leq 4\bm{\mathrm{Var}}(S_{H})\varepsilon^{2\delta-1}\lesssim\varepsilon^{2\delta}|\log\varepsilon| and 𝐏(x,y)(|SQ|>εδ′′/2)4𝐕𝐚𝐫(SQ)ε2δ′′ε12α12δ′′|logε|\bm{\mathrm{P}}_{(x,y)}(|S_{Q}|>\varepsilon^{\delta^{\prime\prime}}/2)\leq 4\bm{\mathrm{Var}}(S_{Q})\varepsilon^{-2\delta^{\prime\prime}}\lesssim\varepsilon^{1-2\alpha_{1}-2\delta^{\prime\prime}}|\log\varepsilon|; since δ′′<1/2α1δ\delta^{\prime\prime}<1/2-\alpha_{1}-\delta, both exponents exceed δ\delta, and both results follow with any κ<δ\kappa<\delta. ∎

Let F(h)F(h) be the solution to

{cF=1F(0)=F(2r)=0\begin{cases}{\mathcal{L}_{c}}F=-1\\ F(0)=F(2r)=0\end{cases} (4.33)

Let τ1\tau^{1} and τ2\tau^{2} be the first times for X~tε{\tilde{X}}_{t}^{\varepsilon} to exit U(εα1,r)U(\varepsilon^{\alpha_{1}},r) and U(εα2,2r)U(\varepsilon^{\alpha_{2}},2r), respectively. Let xtε=𝒙t/εx_{t}^{\varepsilon}=\bm{x}_{t/\varepsilon}.

Lemma 4.12.

There exists a function g(r)g(r) with limr0g(r)=0\lim_{r\to 0}g(r)=0 such that |F(h)|<g(r)|F^{\prime}(h)|<g(r) for all 0<h<2r0<h<2r. There exists C>0C>0 such that |F′′(h)|<C|logh||F^{\prime\prime}(h)|<C|\log h| and |F′′′(h)|<C/h|F^{\prime\prime\prime}(h)|<C/h.

Proof.

The bounds can be verified with the help of estimates for Q(h)Q(h) in Appendix A. ∎
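As a numerical aside, the logarithmic growth of Q(h)Q(h) near the separatrix, which drives the estimates for FF above and in Appendix A, can be observed directly on a model Hamiltonian. The sketch below is an illustration only: the pendulum Hamiltonian is not the system considered in this paper, and hh measures the distance in energy below the separatrix.

```python
import numpy as np

# Rotation time near a separatrix, illustrated on the pendulum Hamiltonian
# H(p, q) = p**2/2 + 1 - cos(q): the bottom of the well is at H = 0 and the
# separatrix is at H = 2.  For the level set at distance h below the separatrix
# the period Q(h) grows like |log h| as h -> 0 (after normalizing H so that the
# separatrix corresponds to 0), which is the behaviour used in the estimates.

def period(h, n=400000):
    E = 2.0 - h                                   # energy of the level set
    q_max = np.arccos(1.0 - E)                    # turning point: 1 - cos(q_max) = E
    q = np.linspace(0.0, q_max * (1.0 - 1e-9), n)
    p = np.sqrt(2.0 * (E - (1.0 - np.cos(q))))    # momentum on the level set
    f = 1.0 / p
    quarter = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(q))
    return 4.0 * quarter                          # four quarter-periods

for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(f"h = {h:.0e}   Q(h) = {period(h):7.2f}   |log h| = {abs(np.log(h)):5.2f}")
```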

Lemma 4.13.

There exists a function g(r)g(r) with limr0g(r)=0\lim_{r\to 0}g(r)=0 such that for all xγ(εα)x\in\gamma(\varepsilon^{\alpha}), y𝕋my\in\mathbb{T}^{m}, and ε\varepsilon sufficiently small,

𝐄(x,y)τ1εαg(r).\bm{\mathrm{E}}_{(x,y)}\tau^{1}\leq\varepsilon^{\alpha}g(r). (4.34)
Proof.

For (X~tε,ξ~tε)({\tilde{X}}_{t}^{\varepsilon},{\tilde{\xi}}_{t}^{\varepsilon}) starting from (x,y)(x,y), we define τ¯2=εQ(H(x))τ2\bar{\tau}^{2}=\varepsilon Q(H(x))\wedge\tau^{2}. As in (4.9):

𝐄(x,y)[F(H(X~τ¯2ε))F(H(x))0τ¯2(12A(X~sε)F′′(H(X~sε))+B(X~sε)F(H(X~sε)))𝑑s]=O(ε),\bm{\mathrm{E}}_{(x,y)}[F(H({\tilde{X}}_{\bar{\tau}^{2}}^{\varepsilon}))-F(H(x))-\int_{0}^{\bar{\tau}^{2}}(\frac{1}{2}A({\tilde{X}}_{s}^{\varepsilon})F^{\prime\prime}(H({\tilde{X}}_{s}^{\varepsilon}))+B({\tilde{X}}_{s}^{\varepsilon})F^{\prime}(H({\tilde{X}}_{s}^{\varepsilon})))ds]=O(\varepsilon), (4.35)

uniformly in xU(εα2,2r)x\in U(\varepsilon^{\alpha_{2}},2r) and y𝕋my\in\mathbb{T}^{m}. By the definition of A¯(h)\bar{A}(h) and B¯(h)\bar{B}(h), one can see that εQ(H(x))A¯(H(x))=0εQ(H(x))A(xsε)𝑑s\varepsilon Q(H(x))\bar{A}(H(x))=\int_{0}^{\varepsilon Q(H(x))}A(x^{\varepsilon}_{s})ds and εQ(H(x))B¯(H(x))=0εQ(H(x))B(xsε)𝑑s\varepsilon Q(H(x))\bar{B}(H(x))=\int_{0}^{\varepsilon Q(H(x))}B(x^{\varepsilon}_{s})ds. Since FF solves (4.33), it follows that

εQ(H(x))=εQ(H(x))cF(H(x))=0εQ(H(x))12A(xsε)F′′(H(xsε))+B(xsε)F(H(xsε))ds.\varepsilon Q(H(x))=-\varepsilon Q(H(x)){\mathcal{L}_{c}}F(H(x))=-\int_{0}^{\varepsilon Q(H(x))}\frac{1}{2}A(x^{\varepsilon}_{s})F^{\prime\prime}(H(x^{\varepsilon}_{s}))+B(x^{\varepsilon}_{s})F^{\prime}(H(x^{\varepsilon}_{s}))ds. (4.36)

We prove that there exists K>0K>0 such that

K𝐄(x,y)(F(H(X~τ¯2ε))F(H(x)))εQ(H(x))K\bm{\mathrm{E}}_{(x,y)}(F(H({\tilde{X}}_{\bar{\tau}^{2}}^{\varepsilon}))-F(H(x)))\leq-\varepsilon Q(H(x)) (4.37)

uniformly in xU(εα1,r)x\in U(\varepsilon^{\alpha_{1}},r), y𝕋my\in\mathbb{T}^{m}, and all ε\varepsilon sufficiently small. Then it follows that

𝐄(x,y)τ1KF(H(x)),\bm{\mathrm{E}}_{(x,y)}\tau^{1}\leq KF(H(x)), (4.38)

for xU(εα1,r)x\in U(\varepsilon^{\alpha_{1}},r), y𝕋my\in\mathbb{T}^{m}, and all ε\varepsilon sufficiently small. Indeed, we can define τ¯k2\bar{\tau}^{2}_{k}, k0k\geq 0, recursively: τ¯02=0\bar{\tau}^{2}_{0}=0, τ¯k+12=inf{tτ¯k2:X~tεU(εα2,2r)}(εQ(H(X~τ¯k2ε))+τ¯k2)\bar{\tau}^{2}_{k+1}=\inf\{t\geq\bar{\tau}^{2}_{k}:{\tilde{X}}_{t}^{\varepsilon}\not\in U(\varepsilon^{\alpha_{2}},2r)\}\wedge(\varepsilon Q(H({\tilde{X}}^{\varepsilon}_{\bar{\tau}^{2}_{k}}))+\bar{\tau}^{2}_{k}), and denote by 𝒏\bm{n} the first kk such that τ¯k2\bar{\tau}^{2}_{k} exceeds τ1\tau^{1}. Then we have

𝐄(x,y)[F(H(X~τ¯𝒏2))F(H(x))]\displaystyle\bm{\mathrm{E}}_{(x,y)}\left[F(H({\tilde{X}}_{\bar{\tau}^{2}_{\bm{n}}}))-F(H(x))\right] (4.39)
=𝐄(x,y)k=0χτ¯k2<τ1[F(H(X~τ¯k+12))F(H(X~τ¯k2))]\displaystyle=\bm{\mathrm{E}}_{(x,y)}\sum_{k=0}^{\infty}\chi_{\bar{\tau}^{2}_{k}<\tau^{1}}\left[F(H({\tilde{X}}_{\bar{\tau}^{2}_{k+1}}))-F(H({\tilde{X}}_{\bar{\tau}^{2}_{k}}))\right]
𝐄(x,y)k=0χτ¯k2<τ1sup(x,y)U(εα1,r)×𝕋m𝐄(x,y)[F(H(X~τ¯2ε))F(H(x))]\displaystyle\leq\bm{\mathrm{E}}_{(x,y)}\sum_{k=0}^{\infty}\chi_{\bar{\tau}^{2}_{k}<\tau^{1}}\sup_{(x^{\prime},y^{\prime})\in U(\varepsilon^{\alpha_{1}},r)\times\mathbb{T}^{m}}\bm{\mathrm{E}}_{(x^{\prime},y^{\prime})}\left[F(H({\tilde{X}}_{\bar{\tau}^{2}}^{\varepsilon}))-F(H(x^{\prime}))\right]
1K𝐄(x,y)k=0χτ¯k2<τ1(εQ(X~τ¯k2)).\displaystyle\leq\frac{1}{K}\bm{\mathrm{E}}_{(x,y)}\sum_{k=0}^{\infty}\chi_{\bar{\tau}^{2}_{k}<\tau^{1}}(-\varepsilon Q({\tilde{X}}_{\bar{\tau}^{2}_{k}})).

Hence

𝐄(x,y)τ1𝐄(x,y)τ¯𝒏2=𝐄(x,y)k=0χτ¯k2<τ1(τ¯k+12τ¯k2)ε𝐄(x,y)k=0χτ¯k2<τ1Q(X~τ¯k2)KF(H(x)).\bm{\mathrm{E}}_{(x,y)}\tau^{1}\leq\bm{\mathrm{E}}_{(x,y)}\bar{\tau}^{2}_{\bm{n}}=\bm{\mathrm{E}}_{(x,y)}\sum_{k=0}^{\infty}\chi_{\bar{\tau}^{2}_{k}<\tau^{1}}(\bar{\tau}^{2}_{k+1}-\bar{\tau}^{2}_{k})\leq\varepsilon\bm{\mathrm{E}}_{(x,y)}\sum_{k=0}^{\infty}\chi_{\bar{\tau}^{2}_{k}<\tau^{1}}Q({\tilde{X}}_{\bar{\tau}^{2}_{k}})\leq KF(H(x)).

Then (4.34) follows from (4.38) and Lemma 4.12 by taking xγ(εα)x\in\gamma(\varepsilon^{\alpha}).

To prove (4.37), it is enough to see that, for xU(εα1,r)x\in U(\varepsilon^{\alpha_{1}},r), y𝕋my\in\mathbb{T}^{m}, and ε\varepsilon sufficiently small,

εQ(H(x))+𝐄(x,y)F(H(X~τ¯2ε))F(H(x))\displaystyle\varepsilon Q(H(x))+\bm{\mathrm{E}}_{(x,y)}F(H({\tilde{X}}_{\bar{\tau}^{2}}^{\varepsilon}))-F(H(x))
=𝐄(x,y)0εQ(H(x))(12A(xsε)F′′(H(xsε))+B(xsε)F(H(xsε)))𝑑s\displaystyle=-\bm{\mathrm{E}}_{(x,y)}\int_{0}^{\varepsilon Q(H(x))}\left(\frac{1}{2}A(x^{\varepsilon}_{s})F^{\prime\prime}(H(x^{\varepsilon}_{s}))+B(x^{\varepsilon}_{s})F^{\prime}(H(x^{\varepsilon}_{s}))\right)ds
+𝐄(x,y)0τ¯2(12A(X~sε)F′′(H(X~sε))+B(X~sε)F(H(X~sε)))𝑑s+O(ε)\displaystyle\quad+\bm{\mathrm{E}}_{(x,y)}\int_{0}^{\bar{\tau}^{2}}\left(\frac{1}{2}A({\tilde{X}}_{s}^{\varepsilon})F^{\prime\prime}(H({\tilde{X}}_{s}^{\varepsilon}))+B({\tilde{X}}_{s}^{\varepsilon})F^{\prime}(H({\tilde{X}}_{s}^{\varepsilon}))\right)ds+O(\varepsilon)
=𝐄(x,y)0τ¯2(12A(X~sε)F′′(H(X~sε))12A(xsε)F′′(H(xsε)))𝑑s\displaystyle=\bm{\mathrm{E}}_{(x,y)}\int_{0}^{\bar{\tau}^{2}}\left(\frac{1}{2}A({\tilde{X}}_{s}^{\varepsilon})F^{\prime\prime}(H({\tilde{X}}_{s}^{\varepsilon}))-\frac{1}{2}A(x^{\varepsilon}_{s})F^{\prime\prime}(H(x^{\varepsilon}_{s}))\right)ds
+𝐄(x,y)0τ¯2(B(X~sε)F(H(X~sε))B(xsε)F(H(xsε)))𝑑s\displaystyle\quad+\bm{\mathrm{E}}_{(x,y)}\int_{0}^{\bar{\tau}^{2}}\left(B({\tilde{X}}_{s}^{\varepsilon})F^{\prime}(H({\tilde{X}}_{s}^{\varepsilon}))-B(x^{\varepsilon}_{s})F^{\prime}(H(x^{\varepsilon}_{s}))\right)ds
𝐄(x,y)τ¯2εQ(H(x))(12A(xsε)F′′(H(xsε))+B(xsε)F(H(xsε)))𝑑s+O(ε)\displaystyle\quad-\bm{\mathrm{E}}_{(x,y)}\int_{\bar{\tau}^{2}}^{\varepsilon Q(H(x))}\left(\frac{1}{2}A(x^{\varepsilon}_{s})F^{\prime\prime}(H(x^{\varepsilon}_{s}))+B(x^{\varepsilon}_{s})F^{\prime}(H(x^{\varepsilon}_{s}))\right)ds+O(\varepsilon)
=o(εQ(H(x))),\displaystyle=o(\varepsilon Q(H(x))),

where the first equality is due to (4.35) and (4.36) and the last equality is due to Lemma 4.11 and Lemma 4.12. ∎

Similarly to Lemma 4.10, we can look at the transitions between γ(εα)\gamma(\varepsilon^{\alpha}) and γ(εα1)\gamma(\varepsilon^{\alpha_{1}}). By the transition probabilities given in Lemma 4.8 and the transition times given in Corollary 4.7 and Lemma 4.13, one can obtain the following result using the strong Markov property.

Corollary 4.14.

There exists a function g(r)g(r) with limr0g(r)=0\lim_{r\to 0}g(r)=0 such that for all xγ(εα)x\in\gamma(\varepsilon^{\alpha}), y𝕋my\in\mathbb{T}^{m}, and ε\varepsilon sufficiently small,

𝐄(x,y)τ(r)σεαg(r).\bm{\mathrm{E}}_{(x,y)}\tau(r)\wedge\sigma\leq\varepsilon^{\alpha}g(r). (4.40)
Lemma 4.15.

For each f𝒟f\in\mathcal{D}, as ε0\varepsilon\downarrow 0,

sup(x,y)γ(εα)×𝕋msupσσ|𝐄(x,y)[f(H(X~σε))f(H(x))0σcf(H(X~sε))𝑑s]|=o(εα),\sup_{(x,y)\in\gamma(\varepsilon^{\alpha})\times\mathbb{T}^{m}}\sup_{\sigma^{\prime}\leq\sigma}|\bm{\mathrm{E}}_{(x,y)}[f(H({\tilde{X}}^{\varepsilon}_{\sigma^{\prime}}))-f(H(x))-\int_{0}^{\sigma^{\prime}}{\mathcal{L}_{c}}f(H({\tilde{X}}_{s}^{\varepsilon}))ds]|=o(\varepsilon^{\alpha}), (4.41)

where the second supremum is taken over all stopping times σσ\sigma^{\prime}\leq\sigma.

Proof.

Fix κ>0\kappa>0. By Corollary 4.14, we can choose rr small enough so that, for any stopping time σσ\sigma^{\prime}\leq\sigma, |𝐄(x,y)[H(X~τ(r)σε)H(x)]|<κεα|\bm{\mathrm{E}}_{(x,y)}[H({\tilde{X}}_{\tau(r)\wedge\sigma^{\prime}}^{\varepsilon})-H(x)]|<\kappa\varepsilon^{\alpha} and |𝐄(x,y)[f(H(X~τ(r)σε))f(H(x))]|<κεα|\bm{\mathrm{E}}_{(x,y)}[f(H({\tilde{X}}_{\tau(r)\wedge\sigma^{\prime}}^{\varepsilon}))-f(H(x))]|<\kappa\varepsilon^{\alpha}, and

sup(x,y)γ(εα)×𝕋msupσσ|𝐄(x,y)0τ(r)σcf(H(X~sε))𝑑s|<κεα,\displaystyle\sup_{(x,y)\in\gamma(\varepsilon^{\alpha})\times\mathbb{T}^{m}}\sup_{\sigma^{\prime}\leq\sigma}|\bm{\mathrm{E}}_{(x,y)}\int_{0}^{\tau(r)\wedge\sigma^{\prime}}{\mathcal{L}_{c}}f(H({\tilde{X}}_{s}^{\varepsilon}))ds|<\kappa\varepsilon^{\alpha},

for all ε\varepsilon sufficiently small, using arguments similar to those leading to (4.1) and (4.8). It follows that 𝐏(x,y)(H(X~τ(r)σε)=r)H(x)/r+κεα/r2εα/r\bm{\mathrm{P}}_{(x,y)}(H({\tilde{X}}_{\tau(r)\wedge\sigma^{\prime}}^{\varepsilon})=r)\leq H(x)/r+\kappa\varepsilon^{\alpha}/r\leq 2\varepsilon^{\alpha}/r. Therefore, uniformly in all xγ(εα)x\in\gamma(\varepsilon^{\alpha}), y𝕋my\in\mathbb{T}^{m}, and σσ\sigma^{\prime}\leq\sigma,

|𝐄(x,y)[f(H(X~σε))f(H(x))0σcf(H(X~sε))𝑑s]|\displaystyle|\bm{\mathrm{E}}_{(x,y)}[f(H({\tilde{X}}^{\varepsilon}_{\sigma^{\prime}}))-f(H(x))-\int_{0}^{\sigma^{\prime}}{\mathcal{L}_{c}}f(H({\tilde{X}}_{s}^{\varepsilon}))ds]|
|𝐄(x,y)[f(H(X~τ(r)σε))f(H(x))0τ(r)σcf(H(X~sε))𝑑s]|\displaystyle\leq|\bm{\mathrm{E}}_{(x,y)}[f(H({\tilde{X}}^{\varepsilon}_{\tau(r)\wedge\sigma^{\prime}}))-f(H(x))-\int_{0}^{\tau(r)\wedge\sigma^{\prime}}{\mathcal{L}_{c}}f(H({\tilde{X}}_{s}^{\varepsilon}))ds]|
+𝐏(x,y)(H(X~τ(r)σε)=r)supxγ(r)y𝕋m|𝐄(x,y)[f(H(X~σε))f(H(x))0σcf(H(X~sε))𝑑s]|\displaystyle\quad+\bm{\mathrm{P}}_{(x,y)}(H({\tilde{X}}_{\tau(r)\wedge\sigma^{\prime}}^{\varepsilon})=r)\sup_{\begin{subarray}{c}x^{\prime}\in\gamma(r)\\ y^{\prime}\in\mathbb{T}^{m}\end{subarray}}|\bm{\mathrm{E}}_{(x^{\prime},y^{\prime})}[f(H({\tilde{X}}^{\varepsilon}_{\sigma^{\prime}}))-f(H(x^{\prime}))-\int_{0}^{\sigma^{\prime}}{\mathcal{L}_{c}}f(H({\tilde{X}}_{s}^{\varepsilon}))ds]|
3κεα,\displaystyle\leq 3\kappa\varepsilon^{\alpha},

for ε\varepsilon sufficiently small, due to Proposition 4.6 and our choice of rr. The result follows because κ\kappa can be chosen arbitrarily small. ∎

The last result in this subsection provides estimates that will be used later to control the number of excursions from γ(εα)\gamma(\varepsilon^{\alpha}) to γ\gamma in finite time (see Corollary 6.3).

Lemma 4.16.

There is a constant κ>0\kappa>0 such that, for all ε\varepsilon sufficiently small,

sup(x,y)γ(εα)×𝕋m𝐄(x,y)eσ1κεα.\sup_{(x,y)\in\gamma(\varepsilon^{\alpha})\times\mathbb{T}^{m}}\bm{\mathrm{E}}_{(x,y)}e^{-\sigma}\leq 1-\kappa\varepsilon^{\alpha}. (4.42)
Proof.

By Corollary 4.14, as in the proof of Lemma 4.8, we can fix 0<r<1/30<r<1/3 such that for all xγ(εα)x\in\gamma(\varepsilon^{\alpha}), y𝕋my\in\mathbb{T}^{m}, and ε\varepsilon sufficiently small, 𝐏(x,y)(τ(r)<σ)εα/2r\bm{\mathrm{P}}_{(x,y)}(\tau(r)<\sigma)\geq\varepsilon^{\alpha}/2r. Let FF be defined as in (4.33) and set t=F(r)/3t=F(r)/3. Then it follows from Proposition 4.6 that, as ε0\varepsilon\downarrow 0,

sup(x,y)γ(r)×𝕋m𝐄(x,y)[F(H(X~στ(2r)tε))F(H(x))0στ(2r)tcF(H(X~sε))𝑑s]0.\sup_{(x,y)\in\gamma(r)\times\mathbb{T}^{m}}\bm{\mathrm{E}}_{(x,y)}[F(H({\tilde{X}}_{\sigma\wedge\tau(2r)\wedge t}^{\varepsilon}))-F(H(x))-\int_{0}^{\sigma\wedge\tau(2r)\wedge t}{\mathcal{L}_{c}}F(H({\tilde{X}}_{s}^{\varepsilon}))ds]\to 0. (4.43)

Thus, we have that for all xγ(r)x\in\gamma(r), y𝕋my\in\mathbb{T}^{m}, and ε\varepsilon sufficiently small,

𝐄(x,y)F(H(X~στ(2r)tε))>F(r)/2,\bm{\mathrm{E}}_{(x,y)}F(H({\tilde{X}}_{\sigma\wedge\tau(2r)\wedge t}^{\varepsilon}))>F(r)/2, (4.44)

and it follows that,

𝐏(x,y)(σ>t)𝐏(x,y)(στ(2r)>t)>𝐄(x,y)F(H(X~στ(2r)tε))sup[0,2r]F(h)>F(r)2sup[0,2r]F(h)=:c1(r).\bm{\mathrm{P}}_{(x,y)}(\sigma>t)\geq\bm{\mathrm{P}}_{(x,y)}(\sigma\wedge\tau(2r)>t)>\frac{\bm{\mathrm{E}}_{(x,y)}F(H({\tilde{X}}_{\sigma\wedge\tau(2r)\wedge t}^{\varepsilon}))}{\sup_{[0,2r]}F(h)}>\frac{F(r)}{2\sup_{[0,2r]}F(h)}=:c_{1}(r). (4.45)

Then, for all xγ(r)x\in\gamma(r), y𝕋my\in\mathbb{T}^{m}, and ε\varepsilon sufficiently small,

𝐄(x,y)eσ𝐏(x,y)(σt)+𝐏(x,y)(σ>t)et1𝐏(x,y)(σ>t)(1et)1c(r),\bm{\mathrm{E}}_{(x,y)}e^{-\sigma}\leq\bm{\mathrm{P}}_{(x,y)}(\sigma\leq t)+\bm{\mathrm{P}}_{(x,y)}(\sigma>t)e^{-t}\leq 1-\bm{\mathrm{P}}_{(x,y)}(\sigma>t)(1-e^{-t})\leq 1-c(r), (4.46)

with c(r)=(1exp(F(r)/3))c1(r)>0c(r)=(1-\exp(-F(r)/3))c_{1}(r)>0, and therefore,

𝐄(x,y)eσ\displaystyle\bm{\mathrm{E}}_{(x,y)}e^{-\sigma} 𝐏(x,y)(σ<τ(r))+𝐏(x,y)(σ>τ(r))supxγ(r),y𝕋m𝐄(x,y)eσ\displaystyle\leq\bm{\mathrm{P}}_{(x,y)}(\sigma<\tau(r))+\bm{\mathrm{P}}_{(x,y)}(\sigma>\tau(r))\sup_{{x^{\prime}\in\gamma(r),y^{\prime}\in\mathbb{T}^{m}}}\bm{\mathrm{E}}_{(x^{\prime},y^{\prime})}e^{-\sigma}
1𝐏(x,y)(σ>τ(r))(1supxγ(r),y𝕋m𝐄(x,y)eσ)\displaystyle\leq 1-\bm{\mathrm{P}}_{(x,y)}(\sigma>\tau(r))(1-\sup_{{x^{\prime}\in\gamma(r),y^{\prime}\in\mathbb{T}^{m}}}\bm{\mathrm{E}}_{(x^{\prime},y^{\prime})}e^{-\sigma})
112c(r)εαr.\displaystyle\leq 1-\frac{1}{2}c(r)\frac{\varepsilon^{\alpha}}{r}.

The result holds with κ=c(r)/2r\kappa=c(r)/2r. ∎
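Although the details are given only later (cf. the reference to Corollary 6.3 above), let us indicate how (4.42) is used to control the number of excursions: if \sigma^{(1)},\dots,\sigma^{(n)} denote the durations of nn successive excursions from γ(εα)\gamma(\varepsilon^{\alpha}) to γ\gamma, then the strong Markov property and (4.42) give \bm{\mathrm{E}}e^{-(\sigma^{(1)}+\dots+\sigma^{(n)})}\leq(1-\kappa\varepsilon^{\alpha})^{n}, and hence, by the exponential Chebyshev inequality, \bm{\mathrm{P}}(\sigma^{(1)}+\dots+\sigma^{(n)}\leq T)\leq e^{T}(1-\kappa\varepsilon^{\alpha})^{n}, which is small once n\varepsilon^{\alpha} is large.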

5 Exponential convergence on the separatrix

We fix 0<α<1/20<\alpha<1/2. As in (1.6) and Figure 2, we define inductively two sequences of stopping times σn\sigma_{n}, n0n\geq 0, and τn\tau_{n}, n1n\geq 1, but now for the general process (X~tε,ξ~tε)({\tilde{X}}_{t}^{\varepsilon},{\tilde{\xi}}_{t}^{\varepsilon}) on M×𝕋mM\times\mathbb{T}^{m} with additional drift c(x,y)c(x,y). Without loss of generality, we assume that the saddle point OO satisfies H(O)=0H(O)=0. Let Vε={x:|H(x)|<εα}V^{\varepsilon}=\{x:|H(x)|<\varepsilon^{\alpha}\} and U1U_{1}, U2U_{2}, U3U_{3} be the three domains separated by γ\gamma. We aim to prove that the distribution of the Markov chain (X~σnε,ξ~σnε)({\tilde{X}}^{\varepsilon}_{\sigma_{n}},{\tilde{\xi}}^{\varepsilon}_{\sigma_{n}}) converges in total variation exponentially fast, uniformly in ε\varepsilon and in the initial distribution. Namely, we have the following lemma.

Lemma 5.1.

Let νx,yn,ε\nu^{n,\varepsilon}_{x,y} denote the measure on γ×𝕋m\gamma\times\mathbb{T}^{m} induced by (X~σnε,ξ~σnε)({\tilde{X}}_{\sigma_{n}}^{\varepsilon},{\tilde{\xi}}_{\sigma_{n}}^{\varepsilon}) with starting point (x,y)γ×𝕋m(x,y)\in\gamma\times\mathbb{T}^{m}. Then there exist a probability measure νε\nu^{\varepsilon} on γ×𝕋m\gamma\times\mathbb{T}^{m} and constants Ξ>0\Xi>0 and 0<c<10<c<1 such that, for all ε\varepsilon sufficiently small,

sup(x,y)γ×𝕋mTV(νx,yn,ε,νε)<Ξ(1c)n,\sup_{(x,y)\in\gamma\times\mathbb{T}^{m}}{\mathrm{TV}}(\nu^{n,\varepsilon}_{x,y},\nu^{\varepsilon})<\Xi\cdot(1-c)^{n}, (5.1)

where TV is the total variation distance of probability measures.

The rest of this section is devoted to the proof of Lemma 5.1. Let 𝝈n\bm{\sigma}_{n}, n0n\geq 0, and 𝝉n\bm{\tau}_{n}, n1n\geq 1, be the stopping times w.r.t. (𝑿~tε,𝝃~tε)({\tilde{\bm{X}}}_{t}^{\varepsilon},{\tilde{\bm{\xi}}}_{t}^{\varepsilon}) that are analogous to σn\sigma_{n}, τn\tau_{n} w.r.t. (X~tε,ξ~tε)({\tilde{X}}_{t}^{\varepsilon},{\tilde{\xi}}_{t}^{\varepsilon}). The lemma is equivalent to the exponential convergence in total variation of (𝑿~𝝉nε,𝝃~𝝉nε)({\tilde{\bm{X}}}_{\bm{\tau}_{n}}^{\varepsilon},{\tilde{\bm{\xi}}}_{\bm{\tau}_{n}}^{\varepsilon}) on γ×𝕋m\gamma^{\prime}\times\mathbb{T}^{m}, uniformly in ε\varepsilon and in the initial distribution. The proof consists of three steps:

  1. The process starting on γ×𝕋m\gamma^{\prime}\times\mathbb{T}^{m} hits I×𝕋mI\times\mathbb{T}^{m} before 𝝉1\bm{\tau}_{1} with uniformly positive probability, where II is a fixed interval on the separatrix.

  2. Let the process starting on I×𝕋mI\times\mathbb{T}^{m} evolve for a certain period of time. Then, by a local limit theorem, we can estimate from below the probabilities of hitting O(ε)O(\varepsilon)-sized boxes in a certain O(ε)O(\sqrt{\varepsilon})-sized region, uniformly in the starting point on I×𝕋mI\times\mathbb{T}^{m}.

  3. By the parabolic Hörmander condition (H5), we prove a common lower bound for the density of the distribution of the process starting from each of the O(ε)O(\varepsilon)-sized boxes after a short time.

Let us take care of these steps in order.

Step 1. Let 0<β<10<\beta<1, which will be specified later. We prove in the next two results that the process has a uniformly positive probability of following the averaged process 𝒙t\bm{x}_{t} and going through a neighborhood of the saddle point without making a deviation of more than βε\beta\sqrt{\varepsilon} in the value of HH.

Lemma 5.2.

For each fixed t^>0\hat{t}>0, β>0\beta^{\prime}>0,

𝐏(x,y)(sup0tt^|𝑿~tε𝒙t|βε),\bm{\mathrm{P}}_{(x,y)}\left(\sup_{0\leq t\leq\hat{t}}|{\tilde{\bm{X}}}_{t}^{\varepsilon}-\bm{x}_{t}|\leq\beta^{\prime}\sqrt{\varepsilon}\right),

is uniformly positive for all (x,y)M×𝕋m(x,y)\in M\times\mathbb{T}^{m} and ε\varepsilon sufficiently small.

Proof.

Let the eigenvalues of 2H\nabla^{2}H be bounded by KK. Recall formula (3.10). By the boundedness of the coefficients, the event

E:={sup0tt^|0tyu(𝑿~sε,𝝃~sε)σ(𝝃~sε)𝑑Ws|12βeKt^}E:=\left\{\sup_{0\leq t\leq\hat{t}}|\int_{0}^{t}\nabla_{y}u({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})\sigma({\tilde{\bm{\xi}}}_{s}^{\varepsilon})dW_{s}|\leq\frac{1}{2}\beta^{\prime}e^{-K\hat{t}}\right\}

has positive probability, uniformly in the starting points. By (3.10), we have that on the event EE, for tt^t\leq\hat{t} and ε\varepsilon sufficiently small,

|𝑿~tε𝒙t|\displaystyle|{\tilde{\bm{X}}}_{t}^{\varepsilon}-\bm{x}_{t}| |0t(H(𝑿~sε)H(𝒙s))ds|+ε|0tyu(𝑿~sε,𝝃~sε)σ(𝝃~sε)dWs|\displaystyle\leq|\int_{0}^{t}(\nabla^{\perp}H({\tilde{\bm{X}}}_{s}^{\varepsilon})-\nabla^{\perp}H(\bm{x}_{s}))ds|+\sqrt{\varepsilon}|\int_{0}^{t}\nabla_{y}u({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})\sigma({\tilde{\bm{\xi}}}_{s}^{\varepsilon})dW_{s}|
+ε|0t\displaystyle+\varepsilon|\int_{0}^{t} [xu(𝑿~sε,𝝃~sε)b(𝑿~sε,𝝃~sε)+yu(𝑿~sε,𝝃~sε)c(𝑿~sε,𝝃~sε)]ds|+ε|u(x,y)u(𝑿~tε,𝝃~tε)|\displaystyle[\nabla_{x}u({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})b({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})+\nabla_{y}u({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})c({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})]ds|+\varepsilon|u(x,y)-u({\tilde{\bm{X}}}_{t}^{\varepsilon},{\tilde{\bm{\xi}}}_{t}^{\varepsilon})|
K0t|𝑿~sε𝒙s|𝑑s+βeKt^ε.\displaystyle\leq K\int_{0}^{t}|{\tilde{\bm{X}}}_{s}^{\varepsilon}-\bm{x}_{s}|ds+\beta^{\prime}e^{-K\hat{t}}\sqrt{\varepsilon}.

Then Grönwall’s inequality implies that |𝑿~tε𝒙t|βeKt^εeKtβε|{\tilde{\bm{X}}}_{t}^{\varepsilon}-\bm{x}_{t}|\leq\beta^{\prime}e^{-K\hat{t}}\sqrt{\varepsilon}e^{Kt}\leq\beta^{\prime}\sqrt{\varepsilon} for all tt^t\leq\hat{t}. Therefore, EE implies {sup0tt^|𝑿~tε𝒙t|βε}\{\sup_{0\leq t\leq\hat{t}}|{\tilde{\bm{X}}}_{t}^{\varepsilon}-\bm{x}_{t}|\leq\beta^{\prime}\sqrt{\varepsilon}\}, and the uniform positivity follows. ∎

Lemma 5.3.

For any given 0<c<10<c<1, there exist curves Γ1\Gamma_{1} and Γ2\Gamma_{2} in U1U_{1} such that

  (i) Γ1\Gamma_{1} and Γ2\Gamma_{2} have H\nabla H as their tangent vectors. They intersect the separatrix on different sides of the saddle point, and the averaged motion on the separatrix takes a finite time to travel from Γ2\Gamma_{2} to Γ1\Gamma_{1}.

  (ii) Let xΓ1x\in\Gamma_{1} satisfy 2βε|H(x)|2ε2\beta\sqrt{\varepsilon}\leq|H(x)|\leq 2\sqrt{\varepsilon} and τx=inf{t:𝑿~tεΓ2}\tau_{x}=\inf\{t:{\tilde{\bm{X}}}_{t}^{\varepsilon}\in\Gamma_{2}\}. Then for all y𝕋my\in\mathbb{T}^{m}, 𝐏(x,y)(sup0tτx|H(𝑿~tε)H(x)|βε)>c\bm{\mathrm{P}}_{(x,y)}(\sup_{0\leq t\leq\tau_{x}}|H({\tilde{\bm{X}}}^{\varepsilon}_{t})-H(x)|\leq\beta\sqrt{\varepsilon})>c for all ε\varepsilon sufficiently small.

Figure 5: Curves in different coordinates.
Proof.

Suppose H(x)>0H(x)>0 for all xU1x\in U_{1}. By the Morse lemma, there exist neighborhoods UU and VV of the saddle point OO and the origin, respectively, and a diffeomorphism ψ\psi from UU to VV such that H(x)=G(ψ(x))H(x)=G(\psi(x)), where G(z)=z1z2G(z)=z_{1}z_{2}. Then consider a random change of time by dividing the generator by D(x):=det(xψ(x))D(x):=\mathrm{det}(\nabla_{x}\psi(x)):

d𝑿~tε\displaystyle d{\tilde{\bm{X}}}_{t}^{\varepsilon*} =b(𝑿~tε,𝝃~tε)D(𝑿~tε)dt,\displaystyle=\frac{b({\tilde{\bm{X}}}_{t}^{\varepsilon*},{\tilde{\bm{\xi}}}_{t}^{\varepsilon*})}{D({\tilde{\bm{X}}}_{t}^{\varepsilon*})}dt, (5.2)
d𝝃~tε\displaystyle d{\tilde{\bm{\xi}}}_{t}^{\varepsilon*} =1εv(𝝃~tε)D(𝑿~tε)dt+1εσ(𝝃~tε)D(𝑿~tε)dWt+c(𝑿~tε,𝝃~tε)D(𝑿~tε)dt.\displaystyle=\frac{1}{\varepsilon}\frac{v({\tilde{\bm{\xi}}}_{t}^{\varepsilon*})}{D({\tilde{\bm{X}}}_{t}^{\varepsilon*})}dt+\frac{1}{\sqrt{\varepsilon}}\frac{\sigma({\tilde{\bm{\xi}}}_{t}^{\varepsilon*})}{\sqrt{D({\tilde{\bm{X}}}_{t}^{\varepsilon*})}}dW_{t}+\frac{c({\tilde{\bm{X}}}_{t}^{\varepsilon*},{\tilde{\bm{\xi}}}_{t}^{\varepsilon*})}{D({\tilde{\bm{X}}}_{t}^{\varepsilon*})}dt.

Write the equation for Z~tε:=ψ(𝑿~tε){\tilde{Z}}_{t}^{\varepsilon*}:=\psi({\tilde{\bm{X}}}_{t}^{\varepsilon*}):

dZ~tε=1D(ψ1(Z~tε))xψ(ψ1(Z~tε))b(ψ1(Z~tε),𝝃~tε)dt=:b(Z~tε,𝝃~tε)dt.d{\tilde{Z}}_{t}^{\varepsilon*}=\frac{1}{D(\psi^{-1}({\tilde{Z}}_{t}^{\varepsilon*}))}\cdot\nabla_{x}\psi(\psi^{-1}({\tilde{Z}}_{t}^{\varepsilon*}))b(\psi^{-1}({\tilde{Z}}_{t}^{\varepsilon*}),{\tilde{\bm{\xi}}}_{t}^{\varepsilon*})dt=:b^{*}({\tilde{Z}}_{t}^{\varepsilon*},{\tilde{\bm{\xi}}}_{t}^{\varepsilon*})dt.

It is not hard to verify that b(z,y)b^{*}(z,y) satisfies

𝕋mb(z,y)𝑑μ(y)=G(z).\int_{\mathbb{T}^{m}}b^{*}(z,y)d\mu(y)=\nabla^{\perp}G(z).

Hence, by Lemma 3.2, there exists a bounded solution u(z,y)u^{*}(z,y) to

Lu(z,y)=(b(z,y)G(z))D(ψ1(z)).Lu^{*}(z,y)=-(b^{*}(z,y)-\nabla^{\perp}G(z))\cdot D(\psi^{-1}(z)).

Consider a local coordinate G=z1z2G=z_{1}z_{2} and ϕ=12log(z2/z1)\phi^{*}=\frac{1}{2}\log({z_{2}}/{z_{1}}) in VV. The averaged motion has constant speed: 0 in GG and 11 in ϕ\phi^{*}. As in (4.1) and (4.2), we have the equations for G~tε=G(Z~tε)\tilde{G}_{t}^{\varepsilon*}=G({\tilde{Z}}_{t}^{\varepsilon*}), Φ~tε=ϕ(Z~tε)\tilde{\Phi}_{t}^{\varepsilon*}=\phi^{*}({\tilde{Z}}_{t}^{\varepsilon*}), by applying Ito’s formula to ug=uGu^{*}_{g}=u^{*}\cdot\nabla G and uϕ=uϕu^{*}_{\phi}=u^{*}\cdot\nabla\phi^{*}, with z=ψ(x)z=\psi(x), g0=G(z)g_{0}=G(z), and ϕ0=ϕ(z)\phi^{*}_{0}=\phi^{*}(z):

G~tε=\displaystyle\tilde{G}_{t}^{\varepsilon*}=\leavevmode\nobreak\ g0+ε0tyug(Z~sε,𝝃~sε)𝖳σ(𝝃~sε)D(ψ1(Z~sε))𝑑Wsε(ug(Z~tε,𝝃~tε)ug(z,y))\displaystyle g_{0}+\sqrt{\varepsilon}\int_{0}^{t}\nabla_{y}u^{*}_{g}({\tilde{Z}}_{s}^{\varepsilon*},{\tilde{\bm{\xi}}}_{s}^{\varepsilon*})^{\mathsf{T}}\frac{\sigma({\tilde{\bm{\xi}}}_{s}^{\varepsilon*})}{\sqrt{D(\psi^{-1}({\tilde{Z}}_{s}^{\varepsilon*}))}}dW_{s}-\varepsilon(u^{*}_{g}({\tilde{Z}}_{t}^{\varepsilon*},{\tilde{\bm{\xi}}}_{t}^{\varepsilon*})-u^{*}_{g}(z,y))
+ε0t[zug(Z~sε,𝝃~sε)b(Z~sε,𝝃~sε)+yug(Z~sε,𝝃~sε)c(ψ1(Z~sε),𝝃~sε)D(ψ1(Z~sε))]𝑑s,\displaystyle+\varepsilon\int_{0}^{t}\left[\nabla_{z}u^{*}_{g}({\tilde{Z}}_{s}^{\varepsilon*},{\tilde{\bm{\xi}}}_{s}^{\varepsilon*})\cdot b^{*}({\tilde{Z}}_{s}^{\varepsilon*},{\tilde{\bm{\xi}}}_{s}^{\varepsilon*})+\nabla_{y}u^{*}_{g}({\tilde{Z}}_{s}^{\varepsilon*},{\tilde{\bm{\xi}}}_{s}^{\varepsilon*})\cdot\frac{c(\psi^{-1}({\tilde{Z}}_{s}^{\varepsilon*}),{\tilde{\bm{\xi}}}_{s}^{\varepsilon*})}{D(\psi^{-1}({\tilde{Z}}_{s}^{\varepsilon*}))}\right]ds, (5.3)
Φ~tε=\displaystyle\tilde{\Phi}_{t}^{\varepsilon*}=\leavevmode\nobreak\ ϕ0+t+ε0tyuϕ(Z~sε,𝝃~sε)𝖳σ(𝝃~sε)D(ψ1(Z~sε))𝑑Wsε(uϕ(Z~tε,𝝃~tε)uϕ(z,y))\displaystyle\phi^{*}_{0}+t+\sqrt{\varepsilon}\int_{0}^{t}\nabla_{y}u^{*}_{\phi}({\tilde{Z}}_{s}^{\varepsilon*},{\tilde{\bm{\xi}}}_{s}^{\varepsilon*})^{\mathsf{T}}\frac{\sigma({\tilde{\bm{\xi}}}_{s}^{\varepsilon*})}{\sqrt{D(\psi^{-1}({\tilde{Z}}_{s}^{\varepsilon*}))}}dW_{s}-\varepsilon(u^{*}_{\phi}({\tilde{Z}}_{t}^{\varepsilon*},{\tilde{\bm{\xi}}}_{t}^{\varepsilon*})-u^{*}_{\phi}(z,y))
+ε0t[zuϕ(Z~sε,𝝃~sε)b(Z~sε,𝝃~sε)+yuϕ(Z~sε,𝝃~sε)c(ψ1(Z~sε),𝝃~sε)D(ψ1(Z~sε))]𝑑s.\displaystyle+\varepsilon\int_{0}^{t}\left[\nabla_{z}u^{*}_{\phi}({\tilde{Z}}_{s}^{\varepsilon*},{\tilde{\bm{\xi}}}_{s}^{\varepsilon*})\cdot b^{*}({\tilde{Z}}_{s}^{\varepsilon*},{\tilde{\bm{\xi}}}_{s}^{\varepsilon*})+\nabla_{y}u^{*}_{\phi}({\tilde{Z}}_{s}^{\varepsilon*},{\tilde{\bm{\xi}}}_{s}^{\varepsilon*})\cdot\frac{c(\psi^{-1}({\tilde{Z}}_{s}^{\varepsilon*}),{\tilde{\bm{\xi}}}_{s}^{\varepsilon*})}{D(\psi^{-1}({\tilde{Z}}_{s}^{\varepsilon*}))}\right]ds. (5.4)

To get the lower bound for the desired probability, we will choose the curves Γ1\Gamma_{1} and Γ2\Gamma_{2} that are close enough to the saddle point. The time it takes to get from Γ1\Gamma_{1} to Γ2\Gamma_{2} is still of order |logε||\log\varepsilon| since they are chosen independently of ε\varepsilon. In this way, the process starting on Γ1\Gamma_{1} and stopped on Γ2\Gamma_{2} will be shown to have small variance, hence it is unlikely for the process to have deviations larger than what we wish. With C>0C>0 to be specified later, let l1={z:ϕ(z)=14logε+12logβ+C}l_{1}=\{z:\phi^{*}(z)=\frac{1}{4}\log\varepsilon+\frac{1}{2}\log\beta+C\}, l2={z:ϕ(z)=(14logε+12logβ+C)}l_{2}=\{z:\phi^{*}(z)=-(\frac{1}{4}\log\varepsilon+\frac{1}{2}\log\beta+C)\}, and l3={z:ϕ(z)=(14logε+12logβ+C)2}l_{3}=\{z:\phi^{*}(z)=-(\frac{1}{4}\log\varepsilon+\frac{1}{2}\log\beta+C)-2\}. The idea is to look at event that the process stays close to the averaged motion before the latter reaches l2l_{2}, which implies that the process does not make a large deviation in GG, or equivalently, in HH, before reaching l3l_{3}. Let Γ1\Gamma_{1}^{*}, Γ2\Gamma_{2}^{*} be the curves that have tangent vectors as xψψ1(xψψ1)𝖳G\nabla_{x}\psi\circ\psi^{-1}({\nabla_{x}\psi\circ\psi^{-1}})^{\mathsf{T}}\nabla G and go through the points (eC,eCβε)(e^{-C},e^{C}\beta\sqrt{\varepsilon}), (eC+1βε,eC1)(e^{C+1}\beta\sqrt{\varepsilon},e^{-C-1}), respectively. Since ψ\psi is a diffeomorphism, it is easy to see that each zz on Γ1\Gamma_{1}^{*} or Γ2\Gamma_{2}^{*} with G(z)βεG(z)\geq\beta\sqrt{\varepsilon} satisfies that 14logε+12logβ+Cϕ(z)(14logε+12logβ+C)2\frac{1}{4}\log\varepsilon+\frac{1}{2}\log\beta+C\leq\phi^{*}(z)\leq-(\frac{1}{4}\log\varepsilon+\frac{1}{2}\log\beta+C)-2. Let Γ1\Gamma_{1} and Γ2\Gamma_{2} be the pre-images of Γ1\Gamma_{1}^{*} and Γ2\Gamma_{2}^{*} in U1U_{1}, as shown in Figure 5. They have H\nabla H as tangent vectors due to the specific way we construct Γ1\Gamma_{1}^{*} and Γ2\Gamma_{2}^{*}. Consider the process in (5.2) starting at xΓ1x\in\Gamma_{1} satisfying that 2βεH(x)2ε2\beta\sqrt{\varepsilon}\leq H(x)\leq 2\sqrt{\varepsilon} with an arbitrary y𝕋my\in\mathbb{T}^{m}. Let ϕt=ϕ0+t\phi_{t}^{*}=\phi^{*}_{0}+t. Define tx=inf{t:ϕt=ϕ(l2)}t_{x}=\inf\{t:\phi_{t}^{*}=\phi^{*}(l_{2})\} and τx=inf{t:|G~tεg0|=βε}inf{t:|Φ~tεϕt|=1}tx\tau_{x}^{*}=\inf\{t:|\tilde{G}_{t}^{\varepsilon*}-g_{0}|=\beta\sqrt{\varepsilon}\}\wedge\inf\{t:|\tilde{\Phi}_{t}^{\varepsilon*}-\phi_{t}^{*}|=1\}\wedge t_{x}. Then it is clear that 𝐏(x,y)(sup0tτx|H(𝑿~tε)H(x)|βε)𝐏(x,y)(τx=tx)\bm{\mathrm{P}}_{(x,y)}(\sup_{0\leq t\leq\tau^{*}_{x}}|H({\tilde{\bm{X}}}^{\varepsilon}_{t})-H(x)|\leq\beta\sqrt{\varepsilon})\geq\bm{\mathrm{P}}_{(x,y)}(\tau_{x}^{*}=t_{x}). Let SGS_{G} and SϕS_{\phi} denote the stochastic integrals in (5.3) and (5.4), respectively, with tt replaced by τx\tau_{x}^{*}. Since τx|logε|\tau^{*}_{x}\lesssim|\log\varepsilon|, G\nabla G is bounded, and ϕε1/2\nabla\phi^{*}\lesssim\varepsilon^{-1/2} before τx\tau^{*}_{x}, we see that the unwanted deviations happen only if SGS_{G} and SϕS_{\phi} are large. Namely,

𝐏(x,y)(τx<tx)𝐏(x,y)(|SG|βε/2)+𝐏(x,y)(|Sϕ|1/2).\bm{\mathrm{P}}_{(x,y)}(\tau_{x}^{*}<t_{x})\leq\bm{\mathrm{P}}_{(x,y)}(|S_{G}|\geq\beta\sqrt{\varepsilon}/2)+\bm{\mathrm{P}}_{(x,y)}(|S_{\phi}|\geq 1/2).

Both terms on the right-hand side can be controlled by Chebyshev’s inequality. Note that there exists a constant K>0K>0 independent of ε\varepsilon such that

𝐕𝐚𝐫(SG)\displaystyle\bm{\mathrm{Var}}(S_{G}) εK𝐄0τx|G(Z~sε)|2𝑑s\displaystyle\leq\varepsilon K\bm{\mathrm{E}}\int_{0}^{\tau_{x}^{*}}|\nabla G({\tilde{Z}}_{s}^{\varepsilon*})|^{2}ds
=εK𝐄0τxG~sε(e2Φ~sε+e2Φ~sε)𝑑s\displaystyle=\varepsilon K\bm{\mathrm{E}}\int_{0}^{\tau_{x}^{*}}\tilde{G}_{s}^{\varepsilon*}(e^{2\tilde{\Phi}_{s}^{\varepsilon*}}+e^{-2\tilde{\Phi}_{s}^{\varepsilon*}})ds
εK0τx(2+β)εe2(e2ϕs+e2ϕs)𝑑s\displaystyle\leq\varepsilon K\int_{0}^{\tau_{x}^{*}}(2+\beta)\sqrt{\varepsilon}e^{2}(e^{2{\phi_{s}^{*}}}+e^{-2{\phi_{s}^{*}}})ds
3Kε3e202(14logε+12logβ+C)(e2ϕs+e2ϕs)𝑑s\displaystyle\leq 3K\sqrt{\varepsilon^{3}}e^{2}\int_{0}^{-2(\frac{1}{4}\log\varepsilon+\frac{1}{2}\log\beta+C)}(e^{2{\phi_{s}^{*}}}+e^{-2{\phi_{s}^{*}}})ds
= 3Kε3e214logε+12logβ+C(14logε+12logβ+C)(e2φ+e2φ)𝑑φ\displaystyle=\ 3K\sqrt{\varepsilon^{3}}e^{2}\int_{\frac{1}{4}\log\varepsilon+\frac{1}{2}\log\beta+C}^{-(\frac{1}{4}\log\varepsilon+\frac{1}{2}\log\beta+C)}(e^{2\varphi}+e^{-2\varphi})d\varphi
3βKe22Cε,\displaystyle\leq\ \frac{3}{\beta}Ke^{2-2C}\varepsilon,

and, similarly,

𝐕𝐚𝐫(Sϕ)εK𝐄0τx|Φ(Z~sε)|2𝑑sεK𝐄0τx1G~sε(e2Φ~sε+e2Φ~sε)𝑑s1β2Ke22C.\bm{\mathrm{Var}}(S_{\phi})\leq\varepsilon K\bm{\mathrm{E}}\int_{0}^{\tau^{*}_{x}}|\nabla\Phi({\tilde{Z}}_{s}^{\varepsilon*})|^{2}ds\leq\varepsilon K\bm{\mathrm{E}}\int_{0}^{\tau^{*}_{x}}\frac{1}{\tilde{G}^{\varepsilon*}_{s}}(e^{2\tilde{\Phi}_{s}^{\varepsilon*}}+e^{-2\tilde{\Phi}_{s}^{\varepsilon*}})ds\leq\frac{1}{\beta^{2}}Ke^{2-2C}.

Then, CC can be chosen large enough such that both variances are small enough, and hence 𝐏(x,y)(|H(𝑿~τxε)H(x)|βε)>c\bm{\mathrm{P}}_{(x,y)}(|H({\tilde{\bm{X}}}^{\varepsilon}_{\tau^{*}_{x}})-H(x)|\leq\beta\sqrt{\varepsilon})>c. ∎

Figure 6: Four curves on four directions.

We can choose the corresponding curves in the other regions. As a result, we have four curves corresponding to four different directions, all at a positive distance from the saddle point, as shown in Figure 6. Moreover, the corresponding transition probabilities near the saddle point have lower bounds analogous to the one given in Lemma 5.3 (ii). For the rotations happening away from those curves, we will prove that, before the time when the process comes back to the curves, the deviation of HH can be large enough to cross the separatrix with positive probability. Let Γi(h1,h2)\Gamma_{i}(h_{1},h_{2}) be the set {xΓi:h1H(x)h2}\{x\in\Gamma_{i}:h_{1}\leq H(x)\leq h_{2}\}.

Lemma 5.4.

For each fixed t^>0\hat{t}>0,

𝐏(x,y)(inf0tt^H(𝑿~tε)ε,sup0tt^|𝑿~tε𝒙t|ε1+2α4)\bm{\mathrm{P}}_{(x,y)}\left(\inf_{0\leq t\leq\hat{t}}H({\tilde{\bm{X}}}_{t}^{\varepsilon})\leq-\sqrt{\varepsilon},\sup_{0\leq t\leq\hat{t}}|{\tilde{\bm{X}}}_{t}^{\varepsilon}-\bm{x}_{t}|\leq\varepsilon^{\frac{1+2\alpha}{4}}\right)

is positive uniformly in xΓ2(0,2ε)x\in\Gamma_{2}(0,2\sqrt{\varepsilon}), y𝕋my\in\mathbb{T}^{m}, and all ε\varepsilon sufficiently small.

Proof.

By Lemma 5.2 and the Markov property, it is enough to consider small t^\hat{t} such that 𝒙t\bm{x}_{t} does not reach Γ1\Gamma_{1} before t^\hat{t}. Using formula (3.10) again, we see that 𝐏(x,y)(sup0tt^|𝑿~tε𝒙t|>ε1+2α4)0\bm{\mathrm{P}}_{(x,y)}(\sup_{0\leq t\leq\hat{t}}|{\tilde{\bm{X}}}_{t}^{\varepsilon}-\bm{x}_{t}|>\varepsilon^{\frac{1+2\alpha}{4}})\to 0 as ε0\varepsilon\downarrow 0 uniformly in (x,y)(x,y). Use formula (4.1) on a shorter time scale:

H(𝑿~tε)\displaystyle H({\tilde{\bm{X}}}_{t}^{\varepsilon}) =H(x)+ε0tyuh(𝑿~sε,𝝃~sε)𝖳σ(𝝃~sε)𝑑Ws+ε(uh(x,y)uh(𝑿~tε,𝝃~tε))\displaystyle=H(x)+\sqrt{\varepsilon}\int_{0}^{t}\nabla_{y}u_{h}({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\bm{\xi}}}_{s}^{\varepsilon})dW_{s}+\varepsilon(u_{h}(x,y)-u_{h}({\tilde{\bm{X}}}_{t}^{\varepsilon},{\tilde{\bm{\xi}}}_{t}^{\varepsilon}))
+ε0t[xuh(𝑿~sε,𝝃~sε)b(𝑿~sε,𝝃~sε)+yuh(𝑿~sε,𝝃~sε)c(𝑿~sε,𝝃~sε)]𝑑s.\displaystyle\quad+\varepsilon\int_{0}^{t}[\nabla_{x}u_{h}({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})\cdot b({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})+\nabla_{y}u_{h}({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})\cdot c({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})]ds.

So, it suffices to show the uniform positivity of

𝐏(x,y)(inf0tt^0tyuh(𝑿~sε,𝝃~sε)𝖳σ(𝝃~sε)𝑑Ws4,sup0tt^|𝑿~tε𝒙t|ε1+2α4).\bm{\mathrm{P}}_{(x,y)}\left(\inf_{0\leq t\leq\hat{t}}\int_{0}^{t}\nabla_{y}u_{h}({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\bm{\xi}}}_{s}^{\varepsilon})dW_{s}\leq-4,\sup_{0\leq t\leq\hat{t}}|{\tilde{\bm{X}}}_{t}^{\varepsilon}-\bm{x}_{t}|\leq\varepsilon^{\frac{1+2\alpha}{4}}\right).

Note that there exists another Brownian motion W~\tilde{W} such that

0tyuh(𝑿~sε,𝝃~sε)𝖳σ(𝝃~sε)𝑑Ws=W~(0t|yuh(𝑿~sε,𝝃~sε)𝖳σ(𝝃~sε)|2𝑑s).\int_{0}^{t}\nabla_{y}u_{h}({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\bm{\xi}}}_{s}^{\varepsilon})dW_{s}=\tilde{W}\left(\int_{0}^{t}|\nabla_{y}u_{h}({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\bm{\xi}}}_{s}^{\varepsilon})|^{2}ds\right). (5.5)

Recall that in Subsection 4.1 we defined A(x)=𝕋m|yuh(x,y)σ(y)|2𝑑μ(y)A(x)=\int_{\mathbb{T}^{m}}|\nabla_{y}u_{h}(x,y)\sigma(y)|^{2}d\mu(y). Then, by Corollary 3.4,

𝐄(x,y)|0t^|yuh(𝑿~sε,𝝃~sε)𝖳σ(𝝃~sε))|2ds0t^A(𝑿~sε)ds|=O(ε).\bm{\mathrm{E}}_{(x,y)}\left|\int_{0}^{\hat{t}}|\nabla_{y}u_{h}({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\bm{\xi}}}_{s}^{\varepsilon}))|^{2}ds-\int_{0}^{\hat{t}}A({\tilde{\bm{X}}}_{s}^{\varepsilon})ds\right|=O(\sqrt{\varepsilon}). (5.6)

Note that on the event {sup0tt^|𝑿~tε𝒙t|ε1+2α4}\{\sup_{0\leq t\leq\hat{t}}|{\tilde{\bm{X}}}_{t}^{\varepsilon}-\bm{x}_{t}|\leq\varepsilon^{\frac{1+2\alpha}{4}}\}, A(𝑿~tε)A({\tilde{\bm{X}}}_{t}^{\varepsilon}) is uniformly positive for 0tt^0\leq t\leq\hat{t}. Let us denote this lower bound as mm, which is independent of xx, yy, and ε\varepsilon. Then

𝐏(x,y)(0t^A(𝑿~sε)𝑑s>mt^,sup0tt^|𝑿~tε𝒙t|ε1+2α4)1.\bm{\mathrm{P}}_{(x,y)}(\int_{0}^{\hat{t}}A({\tilde{\bm{X}}}_{s}^{\varepsilon})ds>m\hat{t},\sup_{0\leq t\leq\hat{t}}|{\tilde{\bm{X}}}_{t}^{\varepsilon}-\bm{x}_{t}|\leq\varepsilon^{\frac{1+2\alpha}{4}})\to 1. (5.7)

By the L1L^{1} convergence in (5.6), we obtain

𝐏(x,y)(0t^|yuh(𝑿~sε,𝝃~sε)𝖳σ(𝝃~sε))|2ds>mt^/2,sup0tt^|𝑿~tε𝒙t|ε1+2α4)1.\bm{\mathrm{P}}_{(x,y)}\left(\int_{0}^{\hat{t}}|\nabla_{y}u_{h}({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\bm{\xi}}}_{s}^{\varepsilon}))|^{2}ds>m\hat{t}/2,\sup_{0\leq t\leq\hat{t}}|{\tilde{\bm{X}}}_{t}^{\varepsilon}-\bm{x}_{t}|\leq\varepsilon^{\frac{1+2\alpha}{4}}\right)\to 1. (5.8)

Suppose 0<c<𝐏(inf0tmt^/2W~t<4)0<c<\bm{\mathrm{P}}(\inf_{0\leq t\leq m\hat{t}/2}\tilde{W}_{t}<-4). Then, for all ε\varepsilon sufficiently small,

𝐏(x,y)(inf0tt^0tyuh(𝑿~sε,𝝃~sε)𝖳σ(𝝃~sε)𝑑Ws4,sup0tt^|𝑿~tε𝒙t|ε1+2α4)\displaystyle\bm{\mathrm{P}}_{(x,y)}\left(\inf_{0\leq t\leq\hat{t}}\int_{0}^{t}\nabla_{y}u_{h}({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\bm{\xi}}}_{s}^{\varepsilon})dW_{s}\leq-4,\sup_{0\leq t\leq\hat{t}}|{\tilde{\bm{X}}}_{t}^{\varepsilon}-\bm{x}_{t}|\leq\varepsilon^{\frac{1+2\alpha}{4}}\right)
=𝐏(x,y)(inf0tt^W~(0t|yuh(𝑿~sε,𝝃~sε)𝖳σ(𝝃~sε))|2ds)4,sup0tt^|𝑿~tε𝒙t|ε1+2α4)\displaystyle=\bm{\mathrm{P}}_{(x,y)}\left(\inf_{0\leq t\leq\hat{t}}\tilde{W}\left(\int_{0}^{t}|\nabla_{y}u_{h}({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\bm{\xi}}}_{s}^{\varepsilon}))|^{2}ds\right)\leq-4,\sup_{0\leq t\leq\hat{t}}|{\tilde{\bm{X}}}_{t}^{\varepsilon}-\bm{x}_{t}|\leq\varepsilon^{\frac{1+2\alpha}{4}}\right)
𝐏(x,y)(inf0tmt^/2W~t4,0t^|yuh(𝑿~sε,𝝃~sε)𝖳σ(𝝃~sε))|2ds>mt^/2,sup0tt^|𝑿~tε𝒙t|ε1+2α4)\displaystyle\geq\bm{\mathrm{P}}_{(x,y)}\left(\inf_{0\leq t\leq m\hat{t}/2}\tilde{W}_{t}\leq-4,\int_{0}^{\hat{t}}|\nabla_{y}u_{h}({\tilde{\bm{X}}}_{s}^{\varepsilon},{\tilde{\bm{\xi}}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\bm{\xi}}}_{s}^{\varepsilon}))|^{2}ds>m\hat{t}/2,\sup_{0\leq t\leq\hat{t}}|{\tilde{\bm{X}}}_{t}^{\varepsilon}-\bm{x}_{t}|\leq\varepsilon^{\frac{1+2\alpha}{4}}\right)
c/2.\displaystyle\geq c/2.\qed
Remark 5.5.

The result in Lemma 5.4 also holds for xΓ4(0,2ε)x\in\Gamma_{4}(0,2\sqrt{\varepsilon}). Similarly, for each fixed t^>0\hat{t}>0,

𝐏(x,y)(sup0tt^H(𝑿~tε)ε,sup0tt^|𝑿~tε𝒙t|ε1+2α4)\bm{\mathrm{P}}_{(x,y)}\left(\sup_{0\leq t\leq\hat{t}}H({\tilde{\bm{X}}}_{t}^{\varepsilon})\geq\sqrt{\varepsilon},\sup_{0\leq t\leq\hat{t}}|{\tilde{\bm{X}}}_{t}^{\varepsilon}-\bm{x}_{t}|\leq\varepsilon^{\frac{1+2\alpha}{4}}\right)

is positive uniformly in xΓ2(2ε,0)Γ4(2ε,0)x\in\Gamma_{2}(-2\sqrt{\varepsilon},0)\cup\Gamma_{4}(-2\sqrt{\varepsilon},0), y𝕋my\in\mathbb{T}^{m}, and ε\varepsilon sufficiently small.

Now we can choose β=1/10\beta=1/10. By the results in Lemma 5.2, Lemma 5.3, Lemma 5.4, and Remark 5.5, using the strong Markov property, we obtain the following lemma:

Lemma 5.6.

There exists a closed interval II on γ\gamma that does not contain the saddle point and a constant 0<c<10<c<1 satisfying the following property: if the system (3.3) starts at (x,y)γ×𝕋m(x,y)\in\gamma^{\prime}\times\mathbb{T}^{m}, then for all ε\varepsilon sufficiently small

𝐏(x,y)(η~1<𝝉1)c\bm{\mathrm{P}}_{(x,y)}(\tilde{\eta}_{1}<\bm{\tau}_{1})\geq c (5.9)

where η~1=inf{t:𝐗~tεI}\tilde{\eta}_{1}=\inf\{t:{\tilde{\bm{X}}}_{t}^{\varepsilon}\in I\}.

Remark 5.7.

In order to apply Lemma 5.4, we need to choose II so that it contains the intersection of Γ2\Gamma_{2} and γ\gamma in its interior. In fact, it is not difficult to show that Lemma 5.6 holds for any subset of γ\gamma with non-empty interior.

Step 2. Without loss of generality, we assume that, if 𝒙t\bm{x}_{t} starts at one endpoint of II, then it reaches the other endpoint at time 1/21/2, i.e., the other endpoint is 𝒙1/2\bm{x}_{1/2}. In the remainder of this section, 𝒙t\bm{x}_{t} always denotes this deterministic motion, irrespective of where 𝑿~tε{\tilde{\bm{X}}}_{t}^{\varepsilon} starts. We aim to study the distribution of the process (𝑿~tε,𝝃~tε)({\tilde{\bm{X}}}_{t}^{\varepsilon},{\tilde{\bm{\xi}}}_{t}^{\varepsilon}), started on I×𝕋mI\times\mathbb{T}^{m}, at a certain time t>0t>0. The choice of tt will depend on the initial point xx being considered (see Figure 7), and this will be convenient when we use the strong Markov property later to combine all three steps.

Figure 7: Higher order estimate on the distribution of 𝑿~tε{\tilde{\bm{X}}}_{t}^{\varepsilon}.

To be more precise, for xIx\in I, let s(x)s(x) be such that 𝒙s(x)=x\bm{x}_{s(x)}=x (so 0s(x)1/20\leq s(x)\leq 1/2). We introduce a process ζ~tε\tilde{\zeta}_{t}^{\varepsilon}, 0t0\leq t, as the second term in the expansion of 𝑿~tε{\tilde{\bm{X}}}_{t}^{\varepsilon} around the deterministic motion 𝒙(s(x)+t)\bm{x}_{(s(x)+t)}:

dζ~tε\displaystyle d\tilde{\zeta}_{t}^{\varepsilon} =(H)(𝒙s(x)+t)ζ~tεdt+[b(𝒙s(x)+t,𝝃~tε)H(𝒙s(x)+t)]dt,ζ~0ε=0.\displaystyle=\nabla(\nabla^{\perp}H)(\bm{x}_{s(x)+t})\tilde{\zeta}_{t}^{\varepsilon}dt+[b(\bm{x}_{s(x)+t},{\tilde{\bm{\xi}}}_{t}^{\varepsilon})-\nabla^{\perp}H(\bm{x}_{s(x)+t})]dt,\leavevmode\nobreak\ \tilde{\zeta}_{0}^{\varepsilon}=0. (5.10)

(Note that, for finite tt, ζ~tε\tilde{\zeta}_{t}^{\varepsilon} is of order ε\sqrt{\varepsilon}.) Then, by standard perturbation arguments and Grönwall’s inequality, it can be shown that, uniformly in xx such that 0s(x)1/20\leq s(x)\leq 1/2, 0t1s(x)0\leq t\leq 1-s(x), and y𝕋my\in\mathbb{T}^{m},

𝐄(x,y)|𝑿~tε𝒙s(x)+tζ~tε|=O(ε).\bm{\mathrm{E}}_{(x,y)}|{\tilde{\bm{X}}}_{t}^{\varepsilon}-\bm{x}_{s(x)+t}-\tilde{\zeta}_{t}^{\varepsilon}|=O(\varepsilon). (5.11)
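Explicitly, by the variation-of-constants formula (compare (5.14) below),

\tilde{\zeta}_{t}^{\varepsilon}=\int_{0}^{t}U_{s(x)+r,\,s(x)+t}\left[b(\bm{x}_{s(x)+r},{\tilde{\bm{\xi}}}_{r}^{\varepsilon})-\nabla^{\perp}H(\bm{x}_{s(x)+r})\right]dr,

where U_{t,s} is the fundamental matrix introduced after (5.14). Since the integrand has zero mean with respect to \mu for each r (recall that \bar{b}=\nabla^{\perp}H), the averaging principle suggests that this integral is of order \sqrt{\varepsilon} for fixed t, in line with the remark above.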

Therefore, understanding the distribution of ζ~1s(x)ε\tilde{\zeta}_{1-s(x)}^{\varepsilon} helps one understand the distribution of 𝑿~1s(x)ε{\tilde{\bm{X}}}_{1-s(x)}^{\varepsilon}. However, it is not straightforward to study ζ~tε\tilde{\zeta}_{t}^{\varepsilon} since (𝝃~tε,ζ~tε)({\tilde{\bm{\xi}}}_{t}^{\varepsilon},\tilde{\zeta}_{t}^{\varepsilon}) is not a Markov process. We introduce a related process ζtε\zeta_{t}^{\varepsilon} defined using the original Markov process 𝝃tε{\bm{\xi}}_{t}^{\varepsilon}, apply the local limit theorem to (𝝃tε,ζtε)({\bm{\xi}}_{t}^{\varepsilon},\zeta_{t}^{\varepsilon}), and use the Girsanov theorem to get the desired estimate. Namely, let ζtε\zeta_{t}^{\varepsilon}, t0t\geq 0, be defined by:

dζtε\displaystyle d\zeta_{t}^{\varepsilon} =(H)(𝒙s(x)+t)ζtεdt+[b(𝒙s(x)+t,𝝃tε)H(𝒙s(x)+t)]dt,ζ0ε=0.\displaystyle=\nabla(\nabla^{\perp}H)(\bm{x}_{s(x)+t})\zeta_{t}^{\varepsilon}dt+[b(\bm{x}_{s(x)+t},{\bm{\xi}}_{t}^{\varepsilon})-\nabla^{\perp}H(\bm{x}_{s(x)+t})]dt,\leavevmode\nobreak\ \zeta_{0}^{\varepsilon}=0. (5.12)

The following result is a version of the local limit theorem [16] adapted to our case.

Theorem 5.8.

Let g:[0,1]×𝕋m2g:[0,1]\times\mathbb{T}^{m}\to\mathbb{R}^{2} be a CC^{\infty} function such that g(t,)g(t,\cdot) spans 2\mathbb{R}^{2} and 𝕋mg(t,y)𝑑μ(y)=0\int_{\mathbb{T}^{m}}g(t,y)d\mu(y)=0 for all t0t\geq 0, where μ\mu is the invariant measure of 𝛏tε{\bm{\xi}}_{t}^{\varepsilon}. Then a local limit theorem holds for the following random variable as ε0\varepsilon\to 0 uniformly in (x,y)I×𝕋m(x,y)\in I\times\mathbb{T}^{m},

Sε:=1ε01s(x)g(s(x)+t,𝝃tε)𝑑t.S^{\varepsilon}:=\frac{1}{\varepsilon}\int_{0}^{1-s(x)}g(s(x)+t,{\bm{\xi}}_{t}^{\varepsilon})dt.

Namely, there exists an invertible covariance matrix B(s)B(s) continuous in ss such that

limε0|2πεdetB(s(x))𝐏(x,y)(Sεu[0,1)2)exp(εB(s(x))1u,u2)|=0,\lim_{\varepsilon\to 0}\left|\frac{2\pi}{\varepsilon}\sqrt{\mathrm{det}B(s(x))}\cdot\bm{\mathrm{P}}_{(x,y)}(S^{\varepsilon}-u\in[0,1)^{2})-\exp({-\frac{\varepsilon\langle B(s(x))^{-1}u,u\rangle}{2}})\right|=0, (5.13)

uniformly in u2u\in\mathbb{R}^{2}, xIx\in I, and y𝕋my\in\mathbb{T}^{m}.

The second term in (5.13) remains bounded away from zero even when |u||u| is as large as order 1/ε1/\sqrt{\varepsilon}, which is exactly the regime we are dealing with. Following (5.12), we solve explicitly

ζ1s(x)ε=01s(x)Us(x)+t,1(b(𝒙s(x)+t,𝝃tε)H(𝒙s(x)+t))𝑑t,\zeta_{1-s(x)}^{\varepsilon}=\int_{0}^{1-s(x)}U_{s(x)+t,1}(b(\bm{x}_{s(x)+t},{\bm{\xi}}_{t}^{\varepsilon})-\nabla^{\perp}H(\bm{x}_{s(x)+t}))dt, (5.14)

where Ut,sU_{t,s} solves the differential equation

dUt,s=(H)(𝒙s)Ut,sds,dU_{t,s}=\nabla(\nabla^{\perp}H)(\bm{x}_{s})U_{t,s}ds,

and Ut,tU_{t,t} is the identity matrix. Since 𝒙t\bm{x}_{t} is deterministic, the integrand can be treated as a function only of time tt and 𝝃tε{\bm{\xi}}_{t}^{\varepsilon}. Moreover, for each tt, the integrand has zero mean w.r.t. the invariant measure and spans 2\mathbb{R}^{2}, since Ut,1U_{t,1} is deterministic and non-singular and, for each xx, {b(x,y)H(x):y𝕋m}\{b(x,y)-\nabla^{\perp}H(x):y\in\mathbb{T}^{m}\} spans 2\mathbb{R}^{2} by assumption (H4). Then Theorem 5.8 implies that

𝐏(x,y)(1εζ1s(x)ε[j,j+1)×[k,k+1))ε4πdetB(s(x))exp(εB(s(x))1(j,k),(j,k)2)\bm{\mathrm{P}}_{(x,y)}\left(\frac{1}{\varepsilon}\zeta_{1-s(x)}^{\varepsilon}\in[j,j+1)\times[k,k+1)\right)\geq\frac{\varepsilon}{4\pi\sqrt{\mathrm{det}B(s(x))}}\exp\left(-\frac{\varepsilon\langle B(s(x))^{-1}(j,k),(j,k)\rangle}{2}\right) (5.15)

for all ε\varepsilon small enough, 1/εj,k1/ε-1/\sqrt{\varepsilon}\leq j,k\leq 1/\sqrt{\varepsilon}, xIx\in I, and y𝕋my\in\mathbb{T}^{m}. Finally, we compare (𝑿~tε,𝝃~tε,ζ~tε)({\tilde{\bm{X}}}_{t}^{\varepsilon},{\tilde{\bm{\xi}}}_{t}^{\varepsilon},\tilde{\zeta}_{t}^{\varepsilon}) with (𝑿tε,𝝃tε,ζtε)({\bm{X}}_{t}^{\varepsilon},{\bm{\xi}}_{t}^{\varepsilon},\zeta_{t}^{\varepsilon}). Since the added drift c(x,y)c(x,y) in the equation of 𝝃~tε{\tilde{\bm{\xi}}}_{t}^{\varepsilon} is small compared to the diffusion term 1εσ(y)\frac{1}{\sqrt{\varepsilon}}\sigma(y), it is not hard to verify that, using the Girsanov theorem, for all ε\varepsilon small enough, 1/εj,k1/ε-1/\sqrt{\varepsilon}\leq j,k\leq 1/\sqrt{\varepsilon}, xIx\in I, and y𝕋my\in\mathbb{T}^{m},

𝐏(x,y)(1εζ~1s(x)ε[j,j+1)×[k,k+1))12𝐏(x,y)(1εζ1s(x)ε[j,j+1)×[k,k+1)).\bm{\mathrm{P}}_{(x,y)}\left(\frac{1}{\varepsilon}\tilde{\zeta}_{1-s(x)}^{\varepsilon}\in[j,j+1)\times[k,k+1)\right)\geq\frac{1}{2}\bm{\mathrm{P}}_{(x,y)}\left(\frac{1}{\varepsilon}\zeta_{1-s(x)}^{\varepsilon}\in[j,j+1)\times[k,k+1)\right). (5.16)
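In particular, since |(j,k)|^{2}\leq 2/\varepsilon in the range of indices considered, and since B(s) is continuous and invertible for s in the compact interval [0,1/2], the exponent in (5.15) is bounded uniformly:

\varepsilon\langle B(s(x))^{-1}(j,k),(j,k)\rangle\leq\varepsilon\,\|B(s(x))^{-1}\|\,|(j,k)|^{2}\leq 2\sup_{0\leq s\leq 1/2}\|B(s)^{-1}\|=:C<\infty.

Combining this with (5.15) and (5.16), one obtains a constant c^{\prime}>0 (for instance, c^{\prime}=e^{-C/2}/(8\pi\sup_{0\leq s\leq 1/2}\sqrt{\det B(s)}) would do) such that

\bm{\mathrm{P}}_{(x,y)}\left(\frac{1}{\varepsilon}\tilde{\zeta}_{1-s(x)}^{\varepsilon}\in[j,j+1)\times[k,k+1)\right)\geq c^{\prime}\varepsilon

for all \varepsilon sufficiently small, -1/\sqrt{\varepsilon}\leq j,k\leq 1/\sqrt{\varepsilon}, x\in I, and y\in\mathbb{T}^{m}; this is the form in which the estimate is used in the proof of Lemma 5.12 below.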

Step 3. We proved that 𝒙1+ζ~1s(x)ε\bm{x}_{1}+\tilde{\zeta}_{1-s(x)}^{\varepsilon} reaches the O(ε)O(\varepsilon)-sized boxes with probabilities bounded from below. We also know that 𝑿~1s(x)ε{\tilde{\bm{X}}}_{1-s(x)}^{\varepsilon} is O(ε)O(\varepsilon)-close to 𝒙1+ζ~1s(x)ε\bm{x}_{1}+\tilde{\zeta}_{1-s(x)}^{\varepsilon} in L1L^{1}. Let us take one generic pair (j,k)(j,k), let Bj,kε,K=𝒙1+[(jK)ε,(j+1+K)ε)×[(kK)ε,(k+1+K)ε)B^{\varepsilon,K}_{j,k}=\bm{x}_{1}+[(j-K)\varepsilon,(j+1+K)\varepsilon)\times[(k-K)\varepsilon,(k+1+K)\varepsilon), and study the distribution of (𝑿~tε,𝝃~tε)({\tilde{\bm{X}}}_{t}^{\varepsilon},{\tilde{\bm{\xi}}}_{t}^{\varepsilon}) with the initial point in Bj,kε,K×𝕋mB^{\varepsilon,K}_{j,k}\times\mathbb{T}^{m} after time of order O(ε)O(\varepsilon) (see Figure 8).

Figure 8: Common component of the distributions.
Lemma 5.9.

For each κ>0\kappa>0, K>0K>0, and y^𝕋m\hat{y}\in\mathbb{T}^{m}, there exist t2>0t_{2}>0, c>0c>0, and, for each pair (j,k)(j,k), a point x^j,kε\hat{x}_{j,k}^{\varepsilon} such that, for each (x,y)Bj,kε,K×𝕋m(x,y)\in B^{\varepsilon,K}_{j,k}\times\mathbb{T}^{m} and all ε\varepsilon sufficiently small,

𝐏(x,y)(τκ<t2ε)c,\bm{\mathrm{P}}_{(x,y)}(\tau_{\kappa}<t_{2}\varepsilon)\geq c,

where τκ=inf{t:𝐗~tεB(x^j,kε,κε),𝛏~tεB(y^,κ)}\tau_{\kappa}=\inf\{t:{\tilde{\bm{X}}}_{t}^{\varepsilon}\in B(\hat{x}_{j,k}^{\varepsilon},\kappa\varepsilon),\leavevmode\nobreak\ {\tilde{\bm{\xi}}}_{t}^{\varepsilon}\in B(\hat{y},\kappa)\}.

Proof.

Recall the definition of 𝒙1\bm{x}_{1} at the beginning of Step 2 (see Figure 7). By assumption (H4), {b(𝒙1,y):y𝕋m}\{b(\bm{x}_{1},y):y\in\mathbb{T}^{m}\} spans 2\mathbb{R}^{2}. So there exist y1,y2𝕋my_{1},y_{2}\in\mathbb{T}^{m} such that v1:=b(𝒙1,y1)v_{1}:=b(\bm{x}_{1},y_{1}) and v2:=b(𝒙1,y2)v_{2}:=b(\bm{x}_{1},y_{2}) span 2\mathbb{R}^{2}. Let us consider the set Sj,k=xBj,kε,K{x+av1+bv2:a,b0}S_{j,k}=\bigcap_{x\in B_{j,k}^{\varepsilon,K}}\{x+av_{1}+bv_{2}:a,b\geq 0\}. Then it is easy to see that, there exist a constant t2>0t_{2}>0 and, for each pair (j,k)(j,k), a point x^j,kεSj,k\hat{x}_{j,k}^{\varepsilon}\in S_{j,k} such that for all xBj,kε,Kx\in B_{j,k}^{\varepsilon,K}, x^j,kε=x+axεv1+bxεv2\hat{x}_{j,k}^{\varepsilon}=x+a_{x}\varepsilon v_{1}+b_{x}\varepsilon v_{2} and 0<ax,bx<t2/50<a_{x},b_{x}<t_{2}/5. There exists δ>0\delta>0 such that for each xB(𝒙1,2δ)x\in B(\bm{x}_{1},2\delta) and each yy in B(yi,2δ)B(y_{i},2\delta), |b(x,y)vi|<κ/t2|b(x,y)-v_{i}|<\kappa/t_{2}, i=1,2i=1,2. Let MM be the upper bound of vector b(x,y)b(x,y). For all ε\varepsilon sufficiently small, the probability of the following event, denoted by EE, has a lower bound, denoted by cc, that only depends on t2t_{2}, κ\kappa, MM, y1y_{1}, y2y_{2}, y^\hat{y}, δ\delta, and not on the starting point (x,y)B(𝒙1,δ)×𝕋m(x,y)\in B(\bm{x}_{1},\delta)\times\mathbb{T}^{m}, thus not on (j,k)(j,k):

E=\begin{Bmatrix}\tau_{1}<(t_{2}\wedge\kappa/M)\varepsilon/5;\ {\tilde{\bm{\xi}}}^{\varepsilon}_{\tau_{1}+t}\in B(y_{1},2\delta),\ t\in[0,a_{x}\varepsilon];\ \tau_{2}<\tau_{1}+a_{x}\varepsilon+(t_{2}\wedge\kappa/M)\varepsilon/5;\\ {\tilde{\bm{\xi}}}^{\varepsilon}_{\tau_{2}+t}\in B(y_{2},2\delta),\ t\in[0,b_{x}\varepsilon];\ \tau_{3}<\tau_{2}+b_{x}\varepsilon+(t_{2}\wedge\kappa/M)\varepsilon/5\end{Bmatrix},

where τ1=inf{t0:𝝃~tεB(y1,δ)}\tau_{1}=\inf\{t\geq 0:{\tilde{\bm{\xi}}}^{\varepsilon}_{t}\in B(y_{1},\delta)\}, τ2=inf{tτ1+axε:𝝃~tεB(y2,δ)}\tau_{2}=\inf\{t\geq\tau_{1}+a_{x}\varepsilon:{\tilde{\bm{\xi}}}^{\varepsilon}_{t}\in B(y_{2},\delta)\}, and τ3=inf{tτ2+bxε:𝝃~tεB(y^,κ)}\tau_{3}=\inf\{t\geq\tau_{2}+b_{x}\varepsilon:{\tilde{\bm{\xi}}}^{\varepsilon}_{t}\in B(\hat{y},\kappa)\}. If EE is a subset of the event {τκ<t2ε}\{\tau_{\kappa}<t_{2}\varepsilon\}, then the lemma is proved. To show the inclusion, note that on EE,

|𝑿~τ3εx^j,kε|\displaystyle|{\tilde{\bm{X}}}_{\tau_{3}}^{\varepsilon}-\hat{x}_{j,k}^{\varepsilon}| =|𝑿~τ3ε(x+axεv1+bxεv2)|\displaystyle=|{\tilde{\bm{X}}}_{\tau_{3}}^{\varepsilon}-(x+a_{x}\varepsilon v_{1}+b_{x}\varepsilon v_{2})|
|𝑿~τ3ε𝑿~τ2+bxεε|+|𝑿~τ2+bxεε(𝑿~τ2ε+bxεv2)|+|𝑿~τ2ε𝑿~τ1+axεε|\displaystyle\leq|{\tilde{\bm{X}}}_{\tau_{3}}^{\varepsilon}-{\tilde{\bm{X}}}_{\tau_{2}+b_{x}\varepsilon}^{\varepsilon}|+|{\tilde{\bm{X}}}_{\tau_{2}+b_{x}\varepsilon}^{\varepsilon}-({\tilde{\bm{X}}}_{\tau_{2}}^{\varepsilon}+b_{x}\varepsilon v_{2})|+|{\tilde{\bm{X}}}_{\tau_{2}}^{\varepsilon}-{\tilde{\bm{X}}}_{\tau_{1}+a_{x}\varepsilon}^{\varepsilon}|
+|𝑿~τ1+axεε(𝑿~τ1ε+axεv1)|+|𝑿~τ1εx|\displaystyle\quad\quad+|{\tilde{\bm{X}}}_{\tau_{1}+a_{x}\varepsilon}^{\varepsilon}-({\tilde{\bm{X}}}_{\tau_{1}}^{\varepsilon}+a_{x}\varepsilon v_{1})|+|{\tilde{\bm{X}}}_{\tau_{1}}^{\varepsilon}-x|
κε.\displaystyle\leq\kappa\varepsilon.

Besides, by the definition of τ3\tau_{3}, 𝝃~τ3εB(y^,κ){\tilde{\bm{\xi}}}_{\tau_{3}}^{\varepsilon}\in B(\hat{y},\kappa). Thus τκτ3<t2ε\tau_{\kappa}\leq\tau_{3}<t_{2}\varepsilon on EE. ∎

From now on, let y^\hat{y} be the point in assumption (H5) such that the parabolic Hörmander condition holds at (𝒙1,y^)(\bm{x}_{1},\hat{y}) and let ptε((x,y),)p_{t}^{\varepsilon}((x,y),\cdot) be the density of (𝑿~tεε,𝝃~tεε)({\tilde{\bm{X}}}_{t\varepsilon}^{\varepsilon},{\tilde{\bm{\xi}}}_{t\varepsilon}^{\varepsilon}) starting at (x,y)(x,y).

Lemma 5.10.

There exists κ>0\kappa>0 such that for each x^B(𝐱1,κ)\hat{x}\in B(\bm{x}_{1},\kappa) and all ε\varepsilon sufficiently small, there is a domain Cεx^,y^Vε×𝕋mC^{\varepsilon}_{\hat{x},\hat{y}}\subset V^{\varepsilon}\times\mathbb{T}^{m} with λ(Cεx^,y^)>κε2\lambda(C^{\varepsilon}_{\hat{x},\hat{y}})>\kappa\varepsilon^{2} and p1ε((x,y),)>κ/ε2p_{1}^{\varepsilon}((x,y),\cdot)>\kappa/\varepsilon^{2} on Cεx^,y^C^{\varepsilon}_{\hat{x},\hat{y}} for (x,y)B(x^,κε)×B(y^,κ)(x,y)\in B(\hat{x},\kappa\varepsilon)\times B(\hat{y},\kappa).

Proof.

Consider the stochastic processes that depend on the parameters (ε,x,y)(\varepsilon,x,y):

dθtε,x,y\displaystyle d\theta_{t}^{\varepsilon,x,y} =b(x+εθtε,x,y,y+ηtε,x,y)dt,θ0ε,x,y=02,\displaystyle=b(x+\varepsilon\theta_{t}^{\varepsilon,x,y},y+\eta_{t}^{\varepsilon,x,y})dt,\leavevmode\nobreak\ \theta_{0}^{\varepsilon,x,y}=0\in\mathbb{R}^{2}, (5.17)
dηtε,x,y\displaystyle d\eta_{t}^{\varepsilon,x,y} =v(y+ηtε,x,y)dt+εc(x+εθtε,x,y,y+ηtε,x,y)dt+σ(y+ηtε,x,y)dWt,η0ε,x,y=0m.\displaystyle=v(y+\eta_{t}^{\varepsilon,x,y})dt+\varepsilon c(x+\varepsilon\theta_{t}^{\varepsilon,x,y},y+\eta_{t}^{\varepsilon,x,y})dt+\sigma(y+\eta_{t}^{\varepsilon,x,y})dW_{t},\leavevmode\nobreak\ \eta_{0}^{\varepsilon,x,y}={0}\in\mathbb{R}^{m}.
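The processes in (5.17) are rescaled increments of the auxiliary process: setting \tilde{W}_{t}:=\varepsilon^{-1/2}W_{t\varepsilon}, which is again a Brownian motion, a direct computation from the equations defining ({\tilde{\bm{X}}}_{t}^{\varepsilon},{\tilde{\bm{\xi}}}_{t}^{\varepsilon}) shows that, when the auxiliary process starts at (x,y) and (5.17) is driven by \tilde{W},

\theta_{t}^{\varepsilon,x,y}=\frac{1}{\varepsilon}\left({\tilde{\bm{X}}}_{t\varepsilon}^{\varepsilon}-x\right),\qquad\eta_{t}^{\varepsilon,x,y}={\tilde{\bm{\xi}}}_{t\varepsilon}^{\varepsilon}-y.

This identification is what produces the factor \varepsilon^{-2} (the Jacobian of the rescaling in the two slow variables) in the density relation at the end of the proof.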

Since, by assumption (H5), the parabolic Hörmander condition for equation (1.1) holds at (𝒙1,y^)(\bm{x}_{1},\hat{y}), it is not hard to see that, if (x,y)(x,y) is close to (𝒙1,y^)(\bm{x}_{1},\hat{y}) and ε\varepsilon is small, the parabolic Hörmander condition holds for (5.17) at 0 and the distribution of (θtε,x,y,ηtε,x,y)(\theta_{t}^{\varepsilon,x,y},\eta_{t}^{\varepsilon,x,y}) is absolutely continuous w.r.t. the Lebesgue measure ([17]). Moreover, if the density function, denoted by p~1ε,x,y(θ,η)\tilde{p}_{1}^{\varepsilon,x,y}(\theta,\eta), exists, it is continuous in ε,x,y,θ\varepsilon,x,y,\theta, and η\eta. Let θ^\hat{\theta} and η^\hat{\eta} satisfy that p~10,𝒙1,y^(θ^,η^)>0\tilde{p}_{1}^{0,\bm{x}_{1},\hat{y}}(\hat{\theta},\hat{\eta})>0. Then there exists 0<δ<10<\delta<1 such that p~1ε,x,y(θ,η)\tilde{p}_{1}^{\varepsilon,x,y}(\theta,\eta) exists and is greater than δ\delta for all 0<ε<δ0<\varepsilon<\delta, xB(𝒙1,δ)x\in B(\bm{x}_{1},\delta), yB(y^,δ)y\in B(\hat{y},\delta), θB(θ^,δ)\theta\in B(\hat{\theta},\delta), and ηB(η^,δ)\eta\in B(\hat{\eta},\delta). For x^B(𝒙1,δ/2)\hat{x}\in B(\bm{x}_{1},\delta/2), define Cεx^,y^=B(x^+εθ^,εδ/2)×B(y^+η^,δ/2)C^{\varepsilon}_{\hat{x},\hat{y}}=B(\hat{x}+\varepsilon\hat{\theta},\varepsilon\delta/2)\times B(\hat{y}+\hat{\eta},\delta/2). Then, for (x,y)B(x^,εδ/2)×B(y^,δ/2)(x,y)\in B(\hat{x},\varepsilon\delta/2)\times B(\hat{y},\delta/2), and (x,y)Cεx^,y^(x^{\prime},y^{\prime})\in C^{\varepsilon}_{\hat{x},\hat{y}}, and 0<ε<δ0<\varepsilon<\delta, we have that

p^{\varepsilon}_{1}((x,y),(x^{\prime},y^{\prime}))=\frac{1}{\varepsilon^{2}}\tilde{p}_{1}^{\varepsilon,x,y}\left(\frac{x^{\prime}-x}{\varepsilon},y^{\prime}-y\right)>\frac{\delta}{\varepsilon^{2}}.

The result holds with κ=(δ/2)m+2\kappa=(\delta/2)^{m+2}. ∎

Lemma 5.11.

For each K>0K>0, there exist constants c>0c>0 and t1>0t_{1}>0 such that for all 1/εj,k1/ε-1/\sqrt{\varepsilon}\leq j,k\leq 1/\sqrt{\varepsilon}, there exists a measure πεj,k\pi^{\varepsilon}_{j,k} and a stopping time η~3j,k<t1ε\tilde{\eta}_{3}^{j,k}<t_{1}\varepsilon such that for each (x,y)Bε,Kj,k×𝕋m(x,y)\in B^{\varepsilon,K}_{j,k}\times\mathbb{T}^{m}, the distribution of (𝐗~η~3j,kε,𝛏~η~3j,kε)({\tilde{\bm{X}}}_{\tilde{\eta}_{3}^{j,k}}^{\varepsilon},{\tilde{\bm{\xi}}}_{\tilde{\eta}_{3}^{j,k}}^{\varepsilon}) starting at (x,y)(x,y) has πεj,k\pi^{\varepsilon}_{j,k} as a component and πεj,k(Vε×𝕋m)>c\pi^{\varepsilon}_{j,k}(V^{\varepsilon}\times\mathbb{T}^{m})>c for all ε\varepsilon sufficiently small.

Proof.

We fix constant κ>0\kappa>0 such that the statements in Lemma 5.10 hold. Then, for the fixed κ\kappa, by Lemma 5.9, we fix t2>0t_{2}>0, c>0c^{\prime}>0, and the point x^j,kε\hat{x}_{j,k}^{\varepsilon} for each pair (j,k)(j,k) such that for all (x,y)Bε,Kj,k×𝕋m(x,y)\in B^{\varepsilon,K}_{j,k}\times\mathbb{T}^{m} and ε\varepsilon small, 𝐏(x,y)(τκ<t2ε)c\bm{\mathrm{P}}_{(x,y)}(\tau_{\kappa}<t_{2}\varepsilon)\geq c^{\prime}, where τκ=inf{t:𝑿~tεB(x^j,kε,κε),𝝃~tεB(y^,κ)}\tau_{\kappa}=\inf\{t:{\tilde{\bm{X}}}_{t}^{\varepsilon}\in B(\hat{x}_{j,k}^{\varepsilon},\kappa\varepsilon),\leavevmode\nobreak\ {\tilde{\bm{\xi}}}_{t}^{\varepsilon}\in B(\hat{y},\kappa)\}. It follows from Lemma 5.10 that there is a domain Cεj,kVε×𝕋mC^{\varepsilon}_{j,k}\subset V^{\varepsilon}\times\mathbb{T}^{m} with λ(Cεj,k)>κε2\lambda(C^{\varepsilon}_{j,k})>\kappa\varepsilon^{2} and p1ε((x,y),)>κ/ε2p_{1}^{\varepsilon}((x,y),\cdot)>\kappa/\varepsilon^{2} on Cεj,kC^{\varepsilon}_{j,k} for all (x,y)B(x^j,kε,κε)×B(y^,κ)(x,y)\in B(\hat{x}_{j,k}^{\varepsilon},\kappa\varepsilon)\times B(\hat{y},\kappa). Then the result follows if we define c=cκ2c=c^{\prime}\kappa^{2}, πεj,k=cκ/ε2χ{Cεj,k}λ\pi^{\varepsilon}_{j,k}=c^{\prime}\kappa/\varepsilon^{2}\cdot\chi_{\{C^{\varepsilon}_{j,k}\}}\lambda, t1=t2+2t_{1}=t_{2}+2, and η~3j,k=τκt2ε+ε<t1ε\tilde{\eta}_{3}^{j,k}=\tau_{\kappa}\wedge t_{2}\varepsilon+\varepsilon<t_{1}\varepsilon. ∎

Now let us combine Step 2 and Step 3 together to get the following result concerning the total variation distance of (𝑿~𝝉1,𝝃~𝝉1)({\tilde{\bm{X}}}_{\bm{\tau}_{1}},{\tilde{\bm{\xi}}}_{\bm{\tau}_{1}}) with different starting points on I×𝕋mI\times\mathbb{T}^{m}:

Lemma 5.12.

For each (x,y)I×𝕋m(x,y)\in I\times\mathbb{T}^{m}, let μ~x,yε\tilde{\mu}_{x,y}^{\varepsilon} be the measure induced by (𝐗~𝛕1,𝛏~𝛕1)({\tilde{\bm{X}}}_{\bm{\tau}_{1}},{\tilde{\bm{\xi}}}_{\bm{\tau}_{1}}) starting at (x,y)(x,y). Then there exists c>0c>0 such that TV(μ~x,yε,μ~x,yε)<1c\mathrm{TV}(\tilde{\mu}_{x,y}^{\varepsilon},\tilde{\mu}_{x^{\prime},y^{\prime}}^{\varepsilon})<1-c for any (x,y),(x,y)I×𝕋m(x,y),(x^{\prime},y^{\prime})\in I\times\mathbb{T}^{m} and all ε\varepsilon sufficiently small.

Proof.

It suffices to show that there exist c>0c>0 and a stopping time η~𝝉1\tilde{\eta}\leq\bm{\tau}_{1} such that the total variation distance of (𝑿~η~,𝝃~η~)({\tilde{\bm{X}}}_{\tilde{\eta}},{\tilde{\bm{\xi}}}_{\tilde{\eta}}) with different starting points on I×𝕋mI\times\mathbb{T}^{m} is no more than 1c1-c. Recall the definitions of s(x)s(x) and ζ~tε\tilde{\zeta}_{t}^{\varepsilon} in Step 2. For the process (𝑿~tε,𝝃~tε)({\tilde{\bm{X}}}_{t}^{\varepsilon},{\tilde{\bm{\xi}}}_{t}^{\varepsilon}) starting at (x,y)I×𝕋m(x,y)\in I\times\mathbb{T}^{m}, define

Aj,kε={1εζ~1s(x)ε[j,j+1)×[k,k+1)},A_{j,k}^{\varepsilon}=\{\frac{1}{\varepsilon}\tilde{\zeta}_{1-s(x)}^{\varepsilon}\in[j,j+1)\times[k,k+1)\},
EKε={|𝑿~1s(x)ε𝒙1ζ~1s(x)ε|>Kε}{sup0t1s(x)|H(𝑿~tε)|>Kε}.E_{K}^{\varepsilon}=\{|{\tilde{\bm{X}}}_{1-s(x)}^{\varepsilon}-\bm{x}_{1}-\tilde{\zeta}_{1-s(x)}^{\varepsilon}|>K\varepsilon\}\cup\{\sup_{0\leq t\leq 1-s(x)}|H({\tilde{\bm{X}}}_{t}^{\varepsilon})|>K\sqrt{\varepsilon}\}.

Using (5.15) and (5.16), we can find a constant c>0c^{\prime}>0 such that, for all xIx\in I, y𝕋my\in\mathbb{T}^{m}, ε\varepsilon sufficiently small, and 1/εj,k1/ε-1/\sqrt{\varepsilon}\leq j,k\leq 1/\sqrt{\varepsilon}, 𝐏(x,y)(Aj,kε)cε\bm{\mathrm{P}}_{(x,y)}(A_{j,k}^{\varepsilon})\geq c^{\prime}\varepsilon. Moreover, using (3.10) and (5.11), we can choose KK large enough such that, for all xIx\in I, y𝕋my\in\mathbb{T}^{m}, and ε\varepsilon sufficiently small, 𝐏(x,y)(EKε)<c/100\bm{\mathrm{P}}_{(x,y)}(E_{K}^{\varepsilon})<c^{\prime}/100. Let η~2=(1s(x))𝝉1\tilde{\eta}_{2}=(1-s(x))\wedge\bm{\tau}_{1}. Then it is not hard to see that

1/εj,k1/ε𝐏(x,y)(Aj,kε{𝑿~η~2Bε,Kj,k})<c/100.\sum_{-1/\sqrt{\varepsilon}\leq j,k\leq 1/\sqrt{\varepsilon}}\bm{\mathrm{P}}_{(x,y)}(A_{j,k}^{\varepsilon}\cap\{{\tilde{\bm{X}}}_{\tilde{\eta}_{2}}\not\in B^{\varepsilon,K}_{j,k}\})<c^{\prime}/100.

Now let us define, for (x,y)I×𝕋m(x,y)\in I\times\mathbb{T}^{m},

Rx,yε={(j,k):1/εj,k1/ε,𝐏(x,y)(Aj,kε{𝑿~εη~2Bε,Kj,k})<cε/2}.R_{x,y}^{\varepsilon}=\{(j,k):-1/\sqrt{\varepsilon}\leq j,k\leq 1/\sqrt{\varepsilon},\bm{\mathrm{P}}_{(x,y)}(A_{j,k}^{\varepsilon}\cap\{{\tilde{\bm{X}}}^{\varepsilon}_{\tilde{\eta}_{2}}\in B^{\varepsilon,K}_{j,k}\})<c^{\prime}\varepsilon/2\}.

Then we know that |Rx,yε|<150ε|R_{x,y}^{\varepsilon}|<\frac{1}{50\varepsilon} since, for every (j,k)Rx,yε(j,k)\in R_{x,y}^{\varepsilon},

𝐏(x,y)(Aj,kε{𝑿~εη~2Bε,Kj,k})𝐏(x,y)(Aj,kε)𝐏(x,y)(Aj,kε{𝑿~εη~2Bε,Kj,k})cε/2.\bm{\mathrm{P}}_{(x,y)}(A_{j,k}^{\varepsilon}\cap\{{\tilde{\bm{X}}}^{\varepsilon}_{\tilde{\eta}_{2}}\not\in B^{\varepsilon,K}_{j,k}\})\geq\bm{\mathrm{P}}_{(x,y)}(A_{j,k}^{\varepsilon})-\bm{\mathrm{P}}_{(x,y)}(A_{j,k}^{\varepsilon}\cap\{{\tilde{\bm{X}}}^{\varepsilon}_{\tilde{\eta}_{2}}\in B^{\varepsilon,K}_{j,k}\})\geq c^{\prime}\varepsilon/2.

Let the constants c>0c^{\prime\prime}>0, t1>0t_{1}>0, the stopping time η~3j,k<t1ε\tilde{\eta}_{3}^{j,k}<t_{1}\varepsilon, and πεj,k\pi^{\varepsilon}_{j,k} be defined as in Lemma 5.11. Define

πε=12cε1/εj,k1/επεj,k,π^x,y,x,yε=12cε(j,k)Rεx,yRεx,yπεj,k.\pi^{\varepsilon}=\frac{1}{2}c^{\prime}\varepsilon\sum_{-1/\sqrt{\varepsilon}\leq j,k\leq 1/\sqrt{\varepsilon}}\pi^{\varepsilon}_{j,k},\leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \hat{\pi}_{x,y,x^{\prime},y^{\prime}}^{\varepsilon}=\frac{1}{2}c^{\prime}\varepsilon\sum_{(j,k)\in R^{\varepsilon}_{x,y}\cup R^{\varepsilon}_{x^{\prime},y^{\prime}}}\pi^{\varepsilon}_{j,k}.

In order to define the desired stopping time, we first run the process starting on I×𝕋mI\times\mathbb{T}^{m} for time η~2\tilde{\eta}_{2} (with overwhelming probability, it is the time for the deterministic motion with the same starting point to reach 𝒙1\bm{x}_{1}). Then we use the locations of both ζ~η~2ε\tilde{\zeta}_{\tilde{\eta}_{2}}^{\varepsilon} and X~η~2ε{\tilde{X}}_{\tilde{\eta}_{2}}^{\varepsilon} to determine whether to continue running the process and, if so, we choose the stopping time based on Lemma 5.11. Namely, we define

\tilde{\eta}=\tilde{\eta}_{2}+\sum_{-1/\sqrt{\varepsilon}\leq j,k\leq 1/\sqrt{\varepsilon}}\chi(A_{j,k}^{\varepsilon}\cap\{{\tilde{\bm{X}}}^{\varepsilon}_{\tilde{\eta}_{2}}\in B_{j,k}^{\varepsilon,K}\})\cdot\tilde{\eta}_{3}^{j,k}({\tilde{\bm{X}}}^{\varepsilon}_{\tilde{\eta}_{2}},{\tilde{\bm{\xi}}}^{\varepsilon}_{\tilde{\eta}_{2}}), (5.18)

where η~3j,k(x,y)\tilde{\eta}_{3}^{j,k}(x,y) denotes the stopping time with initial condition (x,y)(x,y). Then it follows from previous results that, for any pair (x,y),(x,y)I×𝕋m(x,y),(x^{\prime},y^{\prime})\in I\times\mathbb{T}^{m}, there is a common component πεπ^εx,y,x,y\pi^{\varepsilon}-\hat{\pi}^{\varepsilon}_{x,y,x^{\prime},y^{\prime}} of the distributions of (𝑿~εη~,𝝃~εη~)({\tilde{\bm{X}}}^{\varepsilon}_{\tilde{\eta}},{\tilde{\bm{\xi}}}^{\varepsilon}_{\tilde{\eta}}) starting from (x,y)(x,y) and (x,y)(x^{\prime},y^{\prime}), respectively. Moreover, (πεπ^εx,y,x,y)(Vε×𝕋m)>cc(\pi^{\varepsilon}-\hat{\pi}^{\varepsilon}_{x,y,x^{\prime},y^{\prime}})(V^{\varepsilon}\times\mathbb{T}^{m})>c^{\prime}c^{\prime\prime} since |Rεx,y|<150ε|R^{\varepsilon}_{x,y}|<\frac{1}{50\varepsilon} and |Rεx,y|<150ε|R^{\varepsilon}_{x^{\prime},y^{\prime}}|<\frac{1}{50\varepsilon}. Therefore, the total variation is no more than 1cc1-c^{\prime}c^{\prime\prime}. ∎

Finally, we combine the result we just obtained with Step 1 to prove Lemma 5.1.

Proof of Lemma 5.1.

As we discussed, the result is equivalent to the exponential convergence in total variation of (𝑿~𝝉nε,𝝃~𝝉nε)({\tilde{\bm{X}}}_{\bm{\tau}_{n}}^{\varepsilon},{\tilde{\bm{\xi}}}_{\bm{\tau}_{n}}^{\varepsilon}) on γ×𝕋m\gamma^{\prime}\times\mathbb{T}^{m}, uniformly in ε\varepsilon and in the initial distribution. Let μεx,y\mu^{\varepsilon}_{x,y} denote the measure on γ×𝕋m\gamma^{\prime}\times\mathbb{T}^{m} induced by (𝑿~𝝉1ε,𝝃~𝝉1ε)({\tilde{\bm{X}}}_{\bm{\tau}_{1}}^{\varepsilon},{\tilde{\bm{\xi}}}_{\bm{\tau}_{1}}^{\varepsilon}) with the starting point (x,y)γ×𝕋m(x,y)\in\gamma^{\prime}\times\mathbb{T}^{m}. Then it suffices to prove that there exists c>0c>0 such that, for every pair (x,y)(x,y), (x,y)γ×𝕋m(x^{\prime},y^{\prime})\in\gamma^{\prime}\times\mathbb{T}^{m} and all ε\varepsilon sufficiently small, TV(μεx,y,μεx,y)<1c\mathrm{TV}(\mu^{\varepsilon}_{x,y},\mu^{\varepsilon}_{x^{\prime},y^{\prime}})<1-c, which follows from Lemma 5.6 and Lemma 5.12. ∎

6 Proof of the main result

Since we deal with both the original and the auxiliary processes in this section, certain notation needs clarifying to avoid possible ambiguity: the process (X~tε,ξ~tε)({\tilde{X}}_{t}^{\varepsilon},{\tilde{\xi}}_{t}^{\varepsilon}) represents not a generic process with arbitrary bounded c(x,y)c(x,y) but only the auxiliary process with c~(x,y)\tilde{c}(x,y) satisfying (3.5); σn\sigma_{n}, τn\tau_{n} are defined as in (1.6), but for the process (Xtε,ξtε)(X_{t}^{\varepsilon},\xi_{t}^{\varepsilon}) on M×𝕋mM\times\mathbb{T}^{m}, and σ~n\tilde{\sigma}_{n}, τ~n\tilde{\tau}_{n} represent the corresponding stopping times w.r.t. (X~tε,ξ~tε)({\tilde{X}}_{t}^{\varepsilon},{\tilde{\xi}}_{t}^{\varepsilon}). In (4.4), we defined c\mathcal{L}_{c} on each edge for a generic c(x,y)c(x,y). Here we give a more explicit definition of c~=L~k\mathcal{L}_{\tilde{c}}=\tilde{L}_{k} on the edge IkI_{k}:

L~kf(h)\displaystyle\tilde{L}_{k}f(h) =12Ak(h)f(h)+B~k(h)f(h),\displaystyle=\frac{1}{2}A_{k}(h)f^{\prime\prime}(h)+\tilde{B}_{k}(h)f^{\prime}(h),
Ak(h)\displaystyle A_{k}(h) =2Qk(h)γk(h)1|H(x)|0𝐄μbh(x,ξs)bh(x,ξ0)dsdl,\displaystyle=\frac{2}{Q_{k}(h)}\int_{\gamma_{k}(h)}\frac{1}{|\nabla H(x)|}\int_{0}^{\infty}\bm{\mathrm{E}}_{\mu}b_{h}(x,\xi_{s})b_{h}(x,\xi_{0})dsdl,
B~k(h)\displaystyle\tilde{B}_{k}(h) =1Qk(h)γk(h)1|H(x)|0𝐄μdivx(bh(x,ξs)(b(x,ξ0)H(x)))dsdl.\displaystyle=\frac{1}{Q_{k}(h)}\int_{\gamma_{k}(h)}\frac{1}{|\nabla H(x)|}\int_{0}^{\infty}\bm{\mathrm{E}}_{\mu}\mathrm{div}_{x}(b_{h}(x,\xi_{s})(b(x,\xi_{0})-\nabla^{\perp}H(x)))dsdl.

One can easily check that this is consistent with the definitions of A¯\bar{A} and B¯c\bar{B}_{c} in (4.4), which are the generalizations of the coefficients defined in (2.2) and, moreover,

12[Ak(hk)Qk(hk)f(hk)]=12Ak(hk)Qk(hk)f(hk)+B~k(hk)Qk(hk)f(hk).\frac{1}{2}[A_{k}(h_{k})Q_{k}(h_{k})f^{\prime}(h_{k})]^{\prime}=\frac{1}{2}A_{k}(h_{k})Q_{k}(h_{k})f^{\prime\prime}(h_{k})+\tilde{B}_{k}(h_{k})Q_{k}(h_{k})f^{\prime}(h_{k}). (6.1)
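Expanding the derivative on the left-hand side of (6.1) gives

\frac{1}{2}[A_{k}(h_{k})Q_{k}(h_{k})f^{\prime}(h_{k})]^{\prime}=\frac{1}{2}(A_{k}Q_{k})^{\prime}(h_{k})f^{\prime}(h_{k})+\frac{1}{2}A_{k}(h_{k})Q_{k}(h_{k})f^{\prime\prime}(h_{k}),

so (6.1) amounts to the identity \frac{1}{2}(A_{k}Q_{k})^{\prime}=\tilde{B}_{k}Q_{k} on each edge, which can be checked from the definitions of A_{k} and \tilde{B}_{k} above.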
Lemma 6.1.

For each f𝒟f\in\mathcal{D} and all ε\varepsilon sufficiently small, we have 𝐄νε0σ1c~f(h(X~tε))dt=0\bm{\mathrm{E}}_{\nu^{\varepsilon}}\int_{0}^{\sigma_{1}}\mathcal{L}_{\tilde{c}}f(h({\tilde{X}}_{t}^{\varepsilon}))dt=0.

Proof.

This is the place where the gluing condition (2.4) plays a role. Since the process (X~tε,ξ~tε)({\tilde{X}}_{t}^{\varepsilon},{\tilde{\xi}}_{t}^{\varepsilon}) on M×𝕋mM\times\mathbb{T}^{m} is recurrent, and the measure λ×μ\lambda\times\mu is the invariant measure, by Theorem 2.1 in [15], we have that for any measurable set AMA\subset M,

MχA(x)dλ(x)=λ(A)=(λ×μ)(A×𝕋m)=𝐄νε0σ1χA(X~tε)dt.\int_{M}\chi_{A}(x)d\lambda(x)=\lambda(A)=(\lambda\times\mu)(A\times\mathbb{T}^{m})=\bm{\mathrm{E}}_{\nu^{\varepsilon}}\int_{0}^{\sigma_{1}}\chi_{A}({\tilde{X}}_{t}^{\varepsilon})dt.

Thus,

Mc~f(h(x))dλ(x)=𝐄νε0σ1c~f(h(X~tε))dt.\int_{M}\mathcal{L}_{\tilde{c}}f(h(x))d\lambda(x)=\bm{\mathrm{E}}_{\nu^{\varepsilon}}\int_{0}^{\sigma_{1}}\mathcal{L}_{\tilde{c}}f(h({\tilde{X}}_{t}^{\varepsilon}))dt.

So, it suffices to show that the left-hand side is zero. By (6.1) and (2.4),

Mc~f(h(x))dλ(x)\displaystyle\int_{M}\mathcal{L}_{\tilde{c}}f(h(x))d\lambda(x) =k=13IkL~kf(hk)Qk(hk)dhk\displaystyle=\sum_{k=1}^{3}\int_{I_{k}}\tilde{L}_{k}f(h_{k})Q_{k}(h_{k})dh_{k}
=k=13Ik(12Ak(hk)f(hk)+B~k(hk)f(hk))Qk(hk)dhk\displaystyle=\sum_{k=1}^{3}\int_{I_{k}}(\frac{1}{2}A_{k}(h_{k})f^{\prime\prime}(h_{k})+\tilde{B}_{k}(h_{k})f^{\prime}(h_{k}))Q_{k}(h_{k})dh_{k}
=k=13Ik12[Ak(hk)Qk(hk)f(hk)]dhk\displaystyle=\sum_{k=1}^{3}\int_{I_{k}}\frac{1}{2}[A_{k}(h_{k})Q_{k}(h_{k})f^{\prime}(h_{k})]^{\prime}dh_{k}
=12k=13pklimhkOf(hk)\displaystyle=\frac{1}{2}\sum_{k=1}^{3}p_{k}\lim_{h_{k}\to O}f^{\prime}(h_{k})
=0.\displaystyle=0.\qed

Let us verify the analogue of (3.2) in the case of the auxiliary process (X~tε,ξ~tε)({\tilde{X}}_{t}^{\varepsilon},{\tilde{\xi}}_{t}^{\varepsilon}).

Proposition 6.2.

For each f𝒟f\in\mathcal{D} and T>0T>0,

𝐄(x,y)[f(h(X~ηε))f(h(x))0ηc~f(h(X~tε))dt]0\bm{\mathrm{E}}_{(x,y)}[f(h({\tilde{X}}_{\eta}^{\varepsilon}))-f(h(x))-\int_{0}^{\eta}\mathcal{L}_{\tilde{c}}f(h({\tilde{X}}_{t}^{\varepsilon}))dt]\to 0 (6.2)

as ε0\varepsilon\to 0, uniformly in xMx\in M, y𝕋my\in\mathbb{T}^{m}, and in stopping times η\eta bounded by TT.

Proof.

We divide the time interval [0,η][0,\eta] into visits to the separatrix. Since σn\sigma_{n}\to\infty,

|𝐄(x,y)[f(h(X~ηε))f(h(x))0ηc~f(h(X~tε))dt]|\displaystyle|\bm{\mathrm{E}}_{(x,y)}[f(h({\tilde{X}}_{\eta}^{\varepsilon}))-f(h(x))-\int_{0}^{\eta}\mathcal{L}_{\tilde{c}}f(h({\tilde{X}}_{t}^{\varepsilon}))dt]|
|limn𝐄(x,y)[f(h(X~σ~nε))f(h(x))0σ~nc~f(h(X~tε))dt]|\displaystyle\leq|\lim_{n\to\infty}\bm{\mathrm{E}}_{(x,y)}[f(h({\tilde{X}}_{\tilde{\sigma}_{n}}^{\varepsilon}))-f(h(x))-\int_{0}^{\tilde{\sigma}_{n}}\mathcal{L}_{\tilde{c}}f(h({\tilde{X}}_{t}^{\varepsilon}))dt]|
+|limn𝐄(x,y)𝐄(X~ηε,ξ~ηε)[f(h(X~σ~nε))f(h(X~0ε))0σ~nc~f(h(X~tε))dt]|\displaystyle\quad+|\lim_{n\to\infty}\bm{\mathrm{E}}_{(x,y)}\bm{\mathrm{E}}_{({\tilde{X}}_{\eta}^{\varepsilon},{\tilde{\xi}}_{\eta}^{\varepsilon})}[f(h({\tilde{X}}_{\tilde{\sigma}_{n}}^{\varepsilon}))-f(h({\tilde{X}}_{0}^{\varepsilon}))-\int_{0}^{\tilde{\sigma}_{n}}\mathcal{L}_{\tilde{c}}f(h({\tilde{X}}_{t}^{\varepsilon}))dt]|
2sup(x,y)M×𝕋m|𝐄(x,y)[f(H(X~εσ~))f(H(x))0σc~f(H(X~sε))ds]|\displaystyle\leq 2\sup_{(x,y)\in M\times\mathbb{T}^{m}}|\bm{\mathrm{E}}_{(x,y)}[f(H({\tilde{X}}^{\varepsilon}_{\tilde{\sigma}}))-f(H(x))-\int_{0}^{\sigma}{\mathcal{L}_{\tilde{c}}}f(H({\tilde{X}}_{s}^{\varepsilon}))ds]| (6.3)
+2limnsup(x,y)γ×𝕋m|𝐄(x,y)[f(h(X~σ~nε))f(h(x))0σnc~f(h(X~tε))dt]|.\displaystyle\quad+2\lim_{n\to\infty}\sup_{(x,y)\in\gamma\times\mathbb{T}^{m}}|\bm{\mathrm{E}}_{(x,y)}[f(h({\tilde{X}}_{\tilde{\sigma}_{n}}^{\varepsilon}))-f(h(x))-\int_{0}^{\sigma_{n}}\mathcal{L}_{\tilde{c}}f(h({\tilde{X}}_{t}^{\varepsilon}))dt]|. (6.4)

Note that (6.3) converges to 0 due to Proposition 4.6, and (6.4) also converges to 0 since

limnsup(x,y)γ×𝕋m|𝐄(x,y)[f(h(X~σ~nε))f(h(x))0σ~nc~f(h(X~tε))dt]|\displaystyle\lim_{n\to\infty}\sup_{(x,y)\in\gamma\times\mathbb{T}^{m}}|\bm{\mathrm{E}}_{(x,y)}[f(h({\tilde{X}}_{\tilde{\sigma}_{n}}^{\varepsilon}))-f(h(x))-\int_{0}^{\tilde{\sigma}_{n}}\mathcal{L}_{\tilde{c}}f(h({\tilde{X}}_{t}^{\varepsilon}))dt]|
limnsup(x,y)γ×𝕋mk=0n1|𝐄(x,y)σ~kσ~k+1c~f(h(X~tε))dt|\displaystyle\leq\lim_{n\to\infty}\sup_{(x,y)\in\gamma\times\mathbb{T}^{m}}\sum_{k=0}^{n-1}|\bm{\mathrm{E}}_{(x,y)}\int_{\tilde{\sigma}_{k}}^{\tilde{\sigma}_{k+1}}\mathcal{L}_{\tilde{c}}f(h({\tilde{X}}_{t}^{\varepsilon}))dt|
limnsup(x,y)γ×𝕋mk=0n(2TV(νx,yk,ε,νε)sup(x,y)γ×𝕋m|𝐄(x,y)0σ~1c~f(h(X~tε))dt|)\displaystyle\leq\lim_{n\to\infty}\sup_{(x,y)\in\gamma\times\mathbb{T}^{m}}\sum_{k=0}^{n}\left(2\cdot\mathrm{TV}(\nu_{x,y}^{k,\varepsilon},\nu^{\varepsilon})\cdot\sup_{(x^{\prime},y^{\prime})\in\gamma\times\mathbb{T}^{m}}|\bm{\mathrm{E}}_{(x^{\prime},y^{\prime})}\int_{0}^{\tilde{\sigma}_{1}}\mathcal{L}_{\tilde{c}}f(h({\tilde{X}}_{t}^{\varepsilon}))dt|\right)
=0,\displaystyle=0,

where the second inequality is due to Lemma 6.1 and the last equality follows from Proposition 4.6, Lemma 5.1, and Proposition B.3. Thus, the desired result holds. ∎

To generalize the result to the original process (Xtε,ξtε)(X_{t}^{\varepsilon},\xi_{t}^{\varepsilon}) on M×𝕋mM\times\mathbb{T}^{m}, we need the next two technical results. We start with a simple corollary of Lemma 4.16, which controls the number of excursions or, equivalently, the number of stopping times σn\sigma_{n} and τn\tau_{n} in finite time.

Corollary 6.3.

For a given t>0t>0, the expected number of excursions before tt is O(εα)O(\varepsilon^{-\alpha}):

n=0𝐏(x,y)(τn+1<t)n=0𝐏(x,y)(σn<t)etκεα,\sum_{n=0}^{\infty}\bm{\mathrm{P}}_{(x,y)}(\tau_{n+1}<t)\leq\sum_{n=0}^{\infty}\bm{\mathrm{P}}_{(x,y)}(\sigma_{n}<t)\leq\frac{e^{t}}{\kappa}\varepsilon^{-\alpha}, (6.5)

where κ\kappa is the constant chosen in Lemma 4.16.

Proof.

By Lemma 4.16 and the strong Markov property,

sup(x,y)M×𝕋m𝐄(x,y)eσn(sup(x,y)γ×𝕋m𝐄(x,y)eσ)n(1κεα)n.\sup_{(x,y)\in M\times\mathbb{T}^{m}}\bm{\mathrm{E}}_{(x,y)}e^{-\sigma_{n}}\leq(\sup_{(x,y)\in\gamma^{\prime}\times\mathbb{T}^{m}}\bm{\mathrm{E}}_{(x,y)}e^{-\sigma})^{n}\leq(1-\kappa\varepsilon^{\alpha})^{n}. (6.6)

Thus, by Markov’s inequality, for all n>0n>0,

𝐏(x,y)(τn+1<t)𝐏(x,y)(σn<t)et𝐄(x,y)eσnet(1κεα)n,\bm{\mathrm{P}}_{(x,y)}(\tau_{n+1}<t)\leq\bm{\mathrm{P}}_{(x,y)}(\sigma_{n}<t)\leq e^{t}\bm{\mathrm{E}}_{(x,y)}e^{-\sigma_{n}}\leq e^{t}(1-\kappa\varepsilon^{\alpha})^{n}, (6.7)

and (6.5) follows by summing the geometric series: \sum_{n=0}^{\infty}e^{t}(1-\kappa\varepsilon^{\alpha})^{n}=\frac{e^{t}}{\kappa}\varepsilon^{-\alpha}. ∎

Lemma 6.4.

For each f𝒟f\in\mathcal{D} and δ>0\delta>0 there is 0<ρ<10<\rho<1 such that, for all xγx\in\gamma, y𝕋my\in\mathbb{T}^{m}, and all ε\varepsilon sufficiently small,

supσρ|𝐄(x,y)n=0χ{σn<σ}[f(h(Xτn+1ε))f(h(Xσnε))σnτn+1f(h(Xsε))ds]|\displaystyle\sup_{{\sigma^{\prime}}\leq\rho}|\bm{\mathrm{E}}_{(x,y)}\sum_{n=0}^{\infty}\chi_{\{\sigma_{n}<{\sigma^{\prime}}\}}[f(h(X_{\tau_{n+1}}^{\varepsilon}))-f(h(X_{\sigma_{n}}^{\varepsilon}))-\int_{\sigma_{n}}^{\tau_{n+1}}\mathcal{L}f(h(X_{s}^{\varepsilon}))ds]| (6.8)
δρ+εαδn=0𝐏(x,y)(σn<ρ),\displaystyle\quad\leq\delta\rho+\varepsilon^{\alpha}\delta\sum_{n=0}^{\infty}\bm{\mathrm{P}}_{(x,y)}(\sigma_{n}<\rho),

where σ\sigma^{\prime} is a stopping time w.r.t. Xεt\mathcal{F}^{X_{\cdot}^{\varepsilon}}_{t}.

Proof.

The result holds with or without the integral term, since nearly all of the time is spent on the intervals from τn\tau_{n} to σn\sigma_{n}. To be precise, by the strong Markov property, Corollary 6.3, and Proposition B.3,

sup(x,y)γ×𝕋msupσρ|𝐄(x,y)n=0χ{σn<σ}σnτn+1f(h(Xsε))ds|\displaystyle\sup_{(x,y)\in\gamma\times\mathbb{T}^{m}}\sup_{{\sigma^{\prime}}\leq\rho}|\bm{\mathrm{E}}_{(x,y)}\sum_{n=0}^{\infty}\chi_{\{\sigma_{n}<{\sigma^{\prime}}\}}\int_{\sigma_{n}}^{\tau_{n+1}}\mathcal{L}f(h(X_{s}^{\varepsilon}))ds| (6.9)
sup(x,y)γ×𝕋msupσρn=0|𝐄(x,y)χ{σn<σ}𝐄(Xσnε,ξσnε)τ1|=O(εα|logε|).\displaystyle\lesssim\sup_{(x,y)\in\gamma\times\mathbb{T}^{m}}\sup_{{\sigma^{\prime}}\leq\rho}\sum_{n=0}^{\infty}|\bm{\mathrm{E}}_{(x,y)}\chi_{\{\sigma_{n}<{\sigma^{\prime}}\}}\bm{\mathrm{E}}_{(X_{\sigma_{n}}^{\varepsilon},\xi_{\sigma_{n}}^{\varepsilon})}\tau_{1}|=O(\varepsilon^{\alpha}|\log\varepsilon|).

Thus, it suffices to prove that, for all ε\varepsilon sufficiently small,

\sup_{{\sigma^{\prime}}\leq\rho}|\bm{\mathrm{E}}_{(x,y)}\sum_{n=0}^{\infty}\chi_{\{\sigma_{n}<{\sigma^{\prime}}\}}[f(h(X_{\tau_{n+1}}^{\varepsilon}))-f(h(X_{\sigma_{n}}^{\varepsilon}))]|\leq\delta\rho+\varepsilon^{\alpha}\delta\sum_{n=0}^{\infty}\bm{\mathrm{P}}_{(x,y)}(\sigma_{n}<\rho). (6.10)

Let us prove this for X~tε{\tilde{X}}_{t}^{\varepsilon} first using Proposition 6.2, then apply the Girsanov theorem to get the result for XtεX_{t}^{\varepsilon}. Let σ~\tilde{\sigma}^{\prime} be the analogue of σ\sigma^{\prime} w.r.t. X~εt\mathcal{F}^{{\tilde{X}}_{\cdot}^{\varepsilon}}_{t}. Divide the time interval [0,σ~][0,\tilde{\sigma}^{\prime}] into excursions using stopping times σ~n\tilde{\sigma}_{n} and τ~n\tilde{\tau}_{n}:

𝐄(x,y)[f(h(X~σ~ε))f(h(x))0σ~f(h(X~tε))dt]\displaystyle\bm{\mathrm{E}}_{(x,y)}[f(h({\tilde{X}}_{\tilde{\sigma}^{\prime}}^{\varepsilon}))-f(h(x))-\int_{0}^{\tilde{\sigma}^{\prime}}\mathcal{L}f(h({\tilde{X}}_{t}^{\varepsilon}))dt] (6.11)
=𝐄(x,y)[f(h(X~σ~σ~ε))f(h(x))0σ~σ~f(h(X~tε))dt]\displaystyle\quad=\bm{\mathrm{E}}_{(x,y)}[f(h({\tilde{X}}_{\tilde{\sigma}^{\prime}\wedge\tilde{\sigma}}^{\varepsilon}))-f(h(x))-\int_{0}^{\tilde{\sigma}^{\prime}\wedge\tilde{\sigma}}\mathcal{L}f(h({\tilde{X}}_{t}^{\varepsilon}))dt] (6.12)
+n=0𝐄(x,y)(χ{σ~n<σ~}[f(h(X~τ~n+1σ~ε))f(h(X~σ~nε))σ~nτ~n+1σ~f(h(X~tε))dt])\displaystyle\quad+\sum_{n=0}^{\infty}\bm{\mathrm{E}}_{(x,y)}\left(\chi_{\{\tilde{\sigma}_{n}<\tilde{\sigma}^{\prime}\}}[f(h({\tilde{X}}_{\tilde{\tau}_{n+1}\wedge\tilde{\sigma}^{\prime}}^{\varepsilon}))-f(h({\tilde{X}}_{\tilde{\sigma}_{n}}^{\varepsilon}))-\int_{\tilde{\sigma}_{n}}^{\tilde{\tau}_{n+1}\wedge\tilde{\sigma}^{\prime}}\mathcal{L}f(h({\tilde{X}}_{t}^{\varepsilon}))dt]\right) (6.13)
+n=1𝐄(x,y)(χ{τ~n<σ~}[f(h(X~σ~nσ~ε))f(h(X~τ~nε))τ~nσ~nσ~f(h(X~tε))dt]).\displaystyle\quad+\sum_{n=1}^{\infty}\bm{\mathrm{E}}_{(x,y)}\left(\chi_{\{\tilde{\tau}_{n}<\tilde{\sigma}^{\prime}\}}[f(h({\tilde{X}}_{\tilde{\sigma}_{n}\wedge\tilde{\sigma}^{\prime}}^{\varepsilon}))-f(h({\tilde{X}}_{\tilde{\tau}_{n}}^{\varepsilon}))-\int_{\tilde{\tau}_{n}}^{\tilde{\sigma}_{n}\wedge\tilde{\sigma}^{\prime}}\mathcal{L}f(h({\tilde{X}}_{t}^{\varepsilon}))dt]\right). (6.14)

Thus, (6.13) converges to 0 uniformly for all xγx\in\gamma and σ~ρ\tilde{\sigma}^{\prime}\leq\rho due to the convergence of (6.11), (6.12), and (6.14), by Proposition 6.2, Proposition 4.6, and Lemma 4.15 with Corollary 6.3, respectively. Note that (6.9) also holds for X~tε{\tilde{X}}_{t}^{\varepsilon}, hence we conclude that

sup(x,y)γ×𝕋msupσ~ρn=0𝐄(x,y)(χ{σ~n<σ~}[f(h(X~τ~n+1σ~ε))f(h(X~σ~nε))])0.\sup_{(x,y)\in\gamma\times\mathbb{T}^{m}}\sup_{{\tilde{\sigma}^{\prime}}\leq\rho}\sum_{n=0}^{\infty}\bm{\mathrm{E}}_{(x,y)}\left(\chi_{\{\tilde{\sigma}_{n}<\tilde{\sigma}^{\prime}\}}[f(h({\tilde{X}}_{\tilde{\tau}_{n+1}\wedge\tilde{\sigma}^{\prime}}^{\varepsilon}))-f(h({\tilde{X}}_{\tilde{\sigma}_{n}}^{\varepsilon}))]\right)\to 0. (6.15)

To apply the Girsanov theorem, we choose a sufficiently small time interval and use the fact that the transition probability of (Xtε,ξtε)(X_{t}^{\varepsilon},\xi_{t}^{\varepsilon}) is similar to that of (X~tε,ξ~tε)({\tilde{X}}_{t}^{\varepsilon},{\tilde{\xi}}_{t}^{\varepsilon}) in the sense that they are mutually absolutely continuous with Radon–Nikodym density close to 11. More precisely, for any fixed δ>0\delta^{\prime}>0, by the Girsanov theorem, we can choose a constant ρ1\rho_{1} such that for all 0<ρ<ρ10<\rho<\rho_{1},

μx,yε(|dμ~x,yεdμx,yε1|<δ)1ρ2,\mu_{x,y}^{\varepsilon}\left(\left|\frac{d\tilde{\mu}_{x,y}^{\varepsilon}}{d\mu_{x,y}^{\varepsilon}}-1\right|<\delta^{\prime}\right)\geq 1-\rho^{2}, (6.16)

where μx,yε\mu_{x,y}^{\varepsilon} and μ~x,yε\tilde{\mu}_{x,y}^{\varepsilon} are the measures on 𝐂[0,ρ]\bm{\mathrm{C}}[0,\rho] induced by (Xtε,ξtε)(X_{t}^{\varepsilon},\xi_{t}^{\varepsilon}) and (X~tε,ξ~tε)({\tilde{X}}_{t}^{\varepsilon},{\tilde{\xi}}_{t}^{\varepsilon}). Define

C^{\prime}=\left\{\left|\frac{d\tilde{\mu}_{x,y}^{\varepsilon}}{d\mu_{x,y}^{\varepsilon}}-1\right|<\delta^{\prime}\right\}\subset\bm{\mathrm{C}}[0,\rho],\leavevmode\nobreak\ \Omega^{\prime}=\{(X_{t}^{\varepsilon},\leavevmode\nobreak\ t\in[0,\rho])\in C^{\prime}\}.

Note that the quantity in (6.10) depends primarily on the behavior of the processes on the time interval [0,σ][0,\sigma^{\prime}] and on the event Ω\Omega^{\prime}. Indeed, we can replace the stopping times τn\tau_{n} by τnσ\tau_{n}\wedge\sigma^{\prime} at the cost of O(εα)O(\varepsilon^{\alpha}) errors. To restrict attention from the whole sample space Ω\Omega to Ω\Omega^{\prime}, we need several additional estimates that control the difference.

As in Corollary 6.3, we fix κ>0\kappa>0 and choose a large constant C>0C>0 independent of ρ\rho such that

n=[Clog(C/ρ)εα]𝐏(x,y)(σn<ρ)n=[Clog(C/ρ)εα]eρ(1κεα)nδρεα.\sum_{n=[C\log(C/\rho)\varepsilon^{-\alpha}]}^{\infty}\bm{\mathrm{P}}_{(x,y)}(\sigma_{n}<\rho)\leq\sum_{n=[C\log(C/\rho)\varepsilon^{-\alpha}]}^{\infty}e^{\rho}(1-\kappa\varepsilon^{\alpha})^{n}\leq\delta^{\prime}\rho\varepsilon^{-\alpha}. (6.17)

Now we choose ρ2>0\rho_{2}>0 such that, for all 0<ρ<ρ20<\rho<\rho_{2}, Cρlog(C/ρ)<δC\rho\log(C/\rho)<\delta^{\prime}. Hence, for all σρ\sigma^{\prime}\leq\rho,

n=0𝐏(x,y)({σn<σ}Ω)Cρ2log(C/ρ)εα+δρεα2δρεα.\sum_{n=0}^{\infty}\bm{\mathrm{P}}_{(x,y)}(\{\sigma_{n}<\sigma^{\prime}\}\setminus\Omega^{\prime})\leq C\rho^{2}\log(C/\rho)\varepsilon^{-\alpha}+\delta^{\prime}\rho\varepsilon^{-\alpha}\leq 2\delta^{\prime}\rho\varepsilon^{-\alpha}. (6.18)

Thus, with K:=maxIkO|limhkIk,hkOf(hk)|K:=\max_{I_{k}\sim O}|\lim_{{h_{k}\in I_{k},h_{k}\to O}}f^{\prime}(h_{k})|, we obtain

|𝐄(x,y)n=0χ{σn<σ}Ω[f(h(Xτn+1σε))f(h(Xσnε))]|2(K+1)δρ.|\bm{\mathrm{E}}_{(x,y)}\sum_{n=0}^{\infty}\chi_{\{\sigma_{n}<{\sigma^{\prime}}\}\setminus\Omega^{\prime}}[f(h(X_{\tau_{n+1}\wedge\sigma^{\prime}}^{\varepsilon}))-f(h(X_{\sigma_{n}}^{\varepsilon}))]|\leq 2(K+1)\delta^{\prime}\rho. (6.19)

By following the same steps, we can choose ρ3>0\rho_{3}>0 such that for all 0<ρ<ρ30<\rho<\rho_{3},

|𝐄(x,y)n=0χ{σ~n<σ~}Ω[f(h(X~τ~n+1σ~ε))f(h(X~σ~nε))]|2(K+1)δρ.|\bm{\mathrm{E}}_{(x,y)}\sum_{n=0}^{\infty}\chi_{\{\tilde{\sigma}_{n}<{\tilde{\sigma}^{\prime}}\}\setminus\Omega^{\prime}}[f(h({\tilde{X}}_{\tilde{\tau}_{n+1}\wedge\tilde{\sigma}^{\prime}}^{\varepsilon}))-f(h({\tilde{X}}_{\tilde{\sigma}_{n}}^{\varepsilon}))]|\leq 2(K+1)\delta^{\prime}\rho. (6.20)

It remains to consider

|𝐄(x,y)n=0χ{σn<σ}Ω[f(h(Xτn+1σε))f(h(Xσnε))]|,|\bm{\mathrm{E}}_{(x,y)}\sum_{n=0}^{\infty}\chi_{\{\sigma_{n}<{\sigma^{\prime}}\}\cap\Omega^{\prime}}[f(h(X_{\tau_{n+1}\wedge\sigma^{\prime}}^{\varepsilon}))-f(h(X_{\sigma_{n}}^{\varepsilon}))]|, (6.21)

which, with FF denoting the functional on 𝐂[0,ρ]\bm{\mathrm{C}}[0,\rho] appearing inside the expectation in (6.21), can be written and estimated as

|CFdμx,yε|\displaystyle\left|\int_{C^{\prime}}Fd\mu_{x,y}^{\varepsilon}\right| =|CFdμ~x,yεCF(dμ~x,yεdμx,yε1)dμx,yε|\displaystyle=\left|\int_{C^{\prime}}Fd\tilde{\mu}_{x,y}^{\varepsilon}-\int_{C^{\prime}}F\left(\frac{d\tilde{\mu}_{x,y}^{\varepsilon}}{d\mu_{x,y}^{\varepsilon}}-1\right)d\mu_{x,y}^{\varepsilon}\right| (6.22)
|CFdμ~x,yε|+δC|F|dμx,yε.\displaystyle\leq\left|\int_{C^{\prime}}Fd\tilde{\mu}_{x,y}^{\varepsilon}\right|+\delta^{\prime}\int_{C^{\prime}}|F|d\mu_{x,y}^{\varepsilon}.

The first term is bounded by 2(K+2)δρ2(K+2)\delta^{\prime}\rho due to (6.15) and (6.20). The second term is simply bounded by (K+1)δεαn=0𝐏(x,y)(σn<ρ)(K+1)\delta^{\prime}\varepsilon^{\alpha}\sum_{n=0}^{\infty}\bm{\mathrm{P}}_{(x,y)}(\sigma_{n}<\rho). Thus, we see that the left-hand side in (6.10) is no more than (4K+6)δ(ρ+εαn=0𝐏(x,y)(σn<ρ))(4K+6)\delta^{\prime}(\rho+\varepsilon^{\alpha}\sum_{n=0}^{\infty}\bm{\mathrm{P}}_{(x,y)}(\sigma_{n}<\rho)) with finite KK independent of δ\delta^{\prime} and ρ\rho for all ε\varepsilon sufficiently small. It remains to take δ=δ/(4K+6)\delta^{\prime}=\delta/(4K+6). ∎

Proof of Proposition 3.1.

Fix an arbitrary δ>0\delta>0. We divide the time interval [0,η][0,\eta] into excursions from γ\gamma to γ\gamma^{\prime} and from γ\gamma^{\prime} to γ\gamma using the stopping times σn\sigma_{n} and τn\tau_{n}:

𝐄(x,y)[f(h(Xηε))f(h(x))0ηf(h(Xtε))dt]\displaystyle\bm{\mathrm{E}}_{(x,y)}[f(h(X_{\eta}^{\varepsilon}))-f(h(x))-\int_{0}^{\eta}\mathcal{L}f(h(X_{t}^{\varepsilon}))dt]
=𝐄(x,y)[f(h(Xησε))f(h(x))0ησf(h(Xtε))dt]\displaystyle\quad=\bm{\mathrm{E}}_{(x,y)}[f(h(X_{\eta\wedge\sigma}^{\varepsilon}))-f(h(x))-\int_{0}^{\eta\wedge\sigma}\mathcal{L}f(h(X_{t}^{\varepsilon}))dt] (6.23)
+n=0𝐄(x,y)(χ{σn<η}[f(h(Xτn+1ηε))f(h(Xσnε))σnτn+1ηf(h(Xtε))dt])\displaystyle\quad+\sum_{n=0}^{\infty}\bm{\mathrm{E}}_{(x,y)}\left(\chi_{\{\sigma_{n}<\eta\}}[f(h(X_{\tau_{n+1}\wedge\eta}^{\varepsilon}))-f(h(X_{\sigma_{n}}^{\varepsilon}))-\int_{\sigma_{n}}^{\tau_{n+1}\wedge\eta}\mathcal{L}f(h(X_{t}^{\varepsilon}))dt]\right) (6.24)
+n=1𝐄(x,y)(χ{τn<η}[f(h(Xσnηε))f(h(Xτnε))τnσnηf(h(Xtε))dt]).\displaystyle\quad+\sum_{n=1}^{\infty}\bm{\mathrm{E}}_{(x,y)}\left(\chi_{\{\tau_{n}<\eta\}}[f(h(X_{\sigma_{n}\wedge\eta}^{\varepsilon}))-f(h(X_{\tau_{n}}^{\varepsilon}))-\int_{\tau_{n}}^{\sigma_{n}\wedge\eta}\mathcal{L}f(h(X_{t}^{\varepsilon}))dt]\right). (6.25)

Here (6.23) converges to 0 by Proposition 4.6 and (6.25) converges to 0 by Lemma 4.15 and Corollary 6.3. It remains to consider (6.24) and it suffices to consider instead

n=0𝐄(x,y)(χ{σn<η}[f(h(Xτn+1ε))f(h(Xσnε))σnτn+1f(h(Xtε))dt])\sum_{n=0}^{\infty}\bm{\mathrm{E}}_{(x,y)}\left(\chi_{\{\sigma_{n}<\eta\}}[f(h(X_{\tau_{n+1}}^{\varepsilon}))-f(h(X_{\sigma_{n}}^{\varepsilon}))-\int_{\sigma_{n}}^{\tau_{n+1}}\mathcal{L}f(h(X_{t}^{\varepsilon}))dt]\right) (6.26)

because the difference converges to 0 by Proposition B.3. By Lemma 6.4, we choose 0<ρ<10<\rho<1 such that (6.8) holds for δ\delta and all ε\varepsilon sufficiently small. We introduce the stopping times σ^n\hat{\sigma}_{n} by letting σ^0=σ\hat{\sigma}_{0}=\sigma and σ^n\hat{\sigma}_{n} be the first of the σk\sigma_{k} such that σkσ^n1ρ\sigma_{k}-\hat{\sigma}_{n-1}\geq\rho. It is clear that σ^[T/ρ]Tη\hat{\sigma}_{[T/\rho]}\geq T\geq\eta. Hence, we can replace (6.26) by

\sum_{n=0}^{[T/\rho]-1}\bm{\mathrm{E}}_{(x,y)}(\chi_{\{\hat{\sigma}_{n}<\eta\}}\bm{\mathrm{E}}_{(X_{\hat{\sigma}_{n}}^{\varepsilon},\xi_{\hat{\sigma}_{n}}^{\varepsilon})}\sum_{k=0}^{\infty}\chi_{\{\sigma_{k}<\rho\}}[f(h(X_{\tau_{k+1}}^{\varepsilon}))-f(h(X_{\sigma_{k}}^{\varepsilon}))-\int_{\sigma_{k}}^{\tau_{k+1}}\mathcal{L}f(h(X_{t}^{\varepsilon}))dt]) (6.27)

and, by the strong Markov property, the difference is no more than

\sup_{(x,y)\in\gamma\times\mathbb{T}^{m}}\sup_{{\sigma^{\prime}}\leq\rho}|\bm{\mathrm{E}}_{(x,y)}\sum_{n=0}^{\infty}\chi_{\{\sigma_{n}<{\sigma^{\prime}}\}}[f(h(X_{\tau_{n+1}}^{\varepsilon}))-f(h(X_{\sigma_{n}}^{\varepsilon}))-\int_{\sigma_{n}}^{\tau_{n+1}}\mathcal{L}f(h(X_{t}^{\varepsilon}))dt]|, (6.28)

where σ\sigma^{\prime} is a stopping time w.r.t. Xεt\mathcal{F}^{X_{\cdot}^{\varepsilon}}_{t}. Both of them can be bounded by O(δ)O(\delta) due to Lemma 6.4 and Corollary 6.3. ∎

Appendix A Derivatives of the action-angle-type coordinates

In this section, we carefully estimate the first and second derivatives of q(x)q(x), Q(h)Q(h), ϕ(x)\phi(x), A(h,ϕ)A(h,\phi), and Bc(h,ϕ){B_{c}}(h,\phi) in order to prove (4.15). Our main tool is the Morse lemma. Note that we only need to verify the bounds near the separatrix, since the derivatives are uniformly bounded inside the domain. For x,y2x,y\in\mathbb{R}^{2}, let xyx\to y denote the line segment connecting xx and yy. For x,yγ(h)x,y\in\gamma(h) for certain hh, let x𝛾yx\xrightarrow{\gamma}y denote the piece of γ(h)\gamma(h) connecting xx to yy along the direction of H\nabla^{\perp}H. Recall that q(x)=l(H(x))𝛾x1|H|dlq(x)=\int_{l(H(x))\xrightarrow{\gamma}x}\frac{1}{|\nabla H|}dl. To start with, using the Morse lemma, one can compute that q(x)|logH(x)|q(x)\lesssim|\log H(x)| and Q(h)|logh|Q(h)\lesssim|\log h|. Then we make use of two special deterministic motions in the directions of H\nabla^{\perp}H and H\nabla H to calculate the first derivatives of q(x)q(x) precisely:

d𝒙t\displaystyle d\bm{x}_{t} =H(𝒙t)dt,\displaystyle=\nabla^{\perp}H(\bm{x}_{t})dt, (A.1)
d𝒚t\displaystyle d\bm{y}_{t} =H(𝒚t)dt.\displaystyle=\nabla H(\bm{y}_{t})dt.
Figure 9: Motions tangent and perpendicular to the level curve.

It follows that

q(𝒙t)=q(𝒙0)+t,q(\bm{x}_{t})=q(\bm{x}_{0})+t, (A.2)
q(𝒚t)=q(𝒚0)+DtH|H|2𝒏dl=q(𝒚0)+Dtdiv(H|H|2)dS,\displaystyle q(\bm{y}_{t})=q(\bm{y}_{0})+\int_{\partial D_{t}}\frac{\nabla H}{|\nabla H|^{2}}\cdot\bm{n}dl=q(\bm{y}_{0})+\int_{D_{t}}\mathrm{div}(\frac{\nabla H}{|\nabla H|^{2}})dS, (A.3)

where DtD_{t} is the region bounded by ll, trajectory of 𝒚s\bm{y}_{s}, 0st0\leq s\leq t, γ(H(𝒚0))\gamma(H(\bm{y}_{0})), and γ(H(𝒚t))\gamma(H(\bm{y}_{t})), as shown in Figure 9. Thus, by differentiating (A.2) and (A.3) in tt, we have the following equations:

q(x)H(x)\displaystyle\nabla q(x)\cdot\nabla^{\perp}H(x) =1,\displaystyle=1, (A.4)
q(x)H(x)\displaystyle\nabla q(x)\cdot\nabla H(x) =|H(x)|2l(H(x))𝛾xdiv(H|H|2)1|H|dl.\displaystyle=|\nabla H(x)|^{2}\int_{l(H(x))\xrightarrow{\gamma}x}\mathrm{div}(\frac{\nabla H}{|\nabla H|^{2}})\frac{1}{|\nabla H|}dl.

Therefore, with subscripts denoting the partial derivatives, by solving the linear system,

q1\displaystyle q^{\prime}_{1} =H2H12+H22+H1p,\displaystyle=\frac{-H^{\prime}_{2}}{{H^{\prime}_{1}}^{2}+{H^{\prime}_{2}}^{2}}+H^{\prime}_{1}p, (A.5)
q2\displaystyle q^{\prime}_{2} =H1H12+H22+H2p,\displaystyle=\frac{H^{\prime}_{1}}{{H^{\prime}_{1}}^{2}+{H^{\prime}_{2}}^{2}}+H^{\prime}_{2}p,

where p(x)=l(H(x))𝛾xdiv(H|H|2)1|H|dlp(x)=\int_{l(H(x))\xrightarrow{\gamma}x}\mathrm{div}(\frac{\nabla H}{|\nabla H|^{2}})\frac{1}{|\nabla H|}dl. Using the Morse lemma, one can compute p(x)=O(1/H(x))p(x)=O(1/H(x)), since

|div(H|H|2)|1|H|2.\left|\mathrm{div}(\frac{\nabla H}{|\nabla H|^{2}})\right|\lesssim\frac{1}{|\nabla H|^{2}}. (A.6)
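To illustrate where such bounds come from, consider the model case in which H(x)=pq in a neighborhood of the saddle; as explained at the beginning of the proof of Proposition B.3, the Morse lemma reduces the general case to this model up to bounded factors. On the part of γ(h) lying in this neighborhood with |p|\geq|q| (so that |p|\geq\sqrt{h} and |\nabla H|\asymp|p|), one has dl\asymp dp, and therefore

\int_{\gamma(h)\cap\{|p|\geq|q|\}}\frac{1}{|\nabla H|^{3}}dl\lesssim\int_{\sqrt{h}}^{1}\frac{dp}{p^{3}}+O(1)=O(1/h);

the part with |q|\geq|p| is treated symmetrically, and away from the saddle the integrand is bounded. Together with (A.6), this explains the estimate p(x)=O(1/H(x)).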

Note that the non-degeneracy of the saddle point implies that |H||H|2|H|\lesssim|\nabla H|^{2}, and hence q=O(|H|/H)\nabla q=O(|\nabla H|/H). The next step is to estimate p\nabla p. For all x,yx,y close enough, with DD denoting the region bounded by ll, xyx\to y, γ(H(x))\gamma(H(x)), and γ(H(y))\gamma(H(y)), we have

|p(x)p(y)|\displaystyle|p(x)-p(y)| =|Ddiv(H|H|2)H|H|2𝐧dlxydiv(H|H|2)H|H|2𝐧dl|\displaystyle=\left|\int_{\partial D}\mathrm{div}(\frac{\nabla H}{|\nabla H|^{2}})\frac{\nabla H}{|\nabla H|^{2}}\cdot\mathbf{n}dl-\int_{x\to y}\mathrm{div}(\frac{\nabla H}{|\nabla H|^{2}})\frac{\nabla H}{|\nabla H|^{2}}\cdot\mathbf{n}dl\right|
D|div[div(H|H|2)H|H|2]|dS+xy|div(H|H|2)1|H||dl\displaystyle\leq\int_{D}\left|\mathrm{div}\left[\mathrm{div}(\frac{\nabla H}{|\nabla H|^{2}})\frac{\nabla H}{|\nabla H|^{2}}\right]\right|dS+\int_{x\to y}\left|\mathrm{div}(\frac{\nabla H}{|\nabla H|^{2}})\frac{1}{|\nabla H|}\right|dl
D1|H|4dS+xy1|H|3dl.\displaystyle\lesssim\int_{D}\frac{1}{|\nabla H|^{4}}dS+\int_{x\to y}\frac{1}{|\nabla H|^{3}}dl.

Then one can obtain |p||H|/H2|\nabla p|\lesssim|\nabla H|/H^{2} by using the Morse lemma again and, as a result, |qij||H|2/H2|q^{\prime\prime}_{ij}|\lesssim|\nabla H|^{2}/H^{2}, 1i,j21\leq i,j\leq 2. Similarly, we can estimate the derivatives of Q(h)Q(h). In fact, Q(h)=γ(h)div(H|H|2)1|H|dl=O(1/h)Q^{\prime}(h)=\int_{\gamma(h)}\mathrm{div}(\frac{\nabla H}{|\nabla H|^{2}})\frac{1}{|\nabla H|}dl=O(1/h) because of the estimate we had on p(x)p(x). In addition,

|Q(h)|=|γ(h)div[div(H|H|2)H|H|2]1|H|dl|γ(h)1|H|5dl=O(1/h2).\left|Q^{\prime\prime}(h)\right|=\left|\int_{\gamma(h)}\mathrm{div}\left[\mathrm{div}(\frac{\nabla H}{|\nabla H|^{2}})\frac{\nabla H}{|\nabla H|^{2}}\right]\frac{1}{|\nabla H|}dl\right|\lesssim\int_{\gamma(h)}\frac{1}{|\nabla H|^{5}}dl=O(1/h^{2}). (A.7)

Since ϕ(x)=q(x)/Q(H(x))\phi(x)=q(x)/Q(H(x)), |ϕ||H|/H|\nabla\phi|\lesssim|\nabla H|/H and |ϕij||H|2/H2|\phi^{\prime\prime}_{ij}|\lesssim|\nabla H|^{2}/H^{2}, 1i,j21\leq i,j\leq 2. Finally, we estimate the derivatives w.r.t. hh of a general function F(h,ϕ)=F~(x1,x2)F(h,\phi)=\tilde{F}(x_{1},x_{2}) with F~\tilde{F} having bounded first and second derivatives. By computing the inverse of the Jacobian of (H,ϕ)(H,\phi) and using the first equation in (A.4),

Fh=F~1ϕ2F~2ϕ1H1ϕ2H2ϕ1=(F~1ϕ2F~2ϕ1)Q(H(x1,x2)).F^{\prime}_{h}=\frac{\tilde{F}^{\prime}_{1}\phi^{\prime}_{2}-\tilde{F}^{\prime}_{2}\phi^{\prime}_{1}}{H^{\prime}_{1}\phi^{\prime}_{2}-H^{\prime}_{2}\phi^{\prime}_{1}}=(\tilde{F}^{\prime}_{1}\phi^{\prime}_{2}-\tilde{F}^{\prime}_{2}\phi^{\prime}_{1})Q(H(x_{1},x_{2})). (A.8)

We deduce that Fh=O(|logh|/h)F^{\prime}_{h}=O(|\log h|/h) and, using FhF_{h}^{\prime} instead of FF in (A.8), Fhh=O(|logh|2/h3)F^{\prime\prime}_{hh}=O(|\log h|^{2}/h^{3}). Similarly, one can obtain Fϕ=O(|logh|)F^{\prime}_{\phi}=O(|\log h|) and Fϕϕ=O(|logh|/h)F^{\prime\prime}_{\phi\phi}=O(|\log h|/h).

Appendix B Exit from neighborhoods of critical points

In this section, we obtain estimates for the exit time from the neighborhoods of the critical points, including extremum points and saddle points. Recall the notation in Section 4: OO is a saddle point, OO^{\prime} is an extremum point, UU is a domain bounded by the separatrix, and η(h)=inf{t:|H(X~tε)H(O)|=h}\eta(h)=\inf\{t:|H({\tilde{X}}_{t}^{\varepsilon})-H(O^{\prime})|=h\}. Recall the function uu defined in (3.8) and let us define

𝑨(x)=𝕋myu(x,y)σ(y)σ(y)𝖳yu(x,y)𝖳dμ(y).\bm{A}(x)=\int_{\mathbb{T}^{m}}\nabla_{y}u(x,y)\sigma(y)\sigma(y)^{\mathsf{T}}\nabla_{y}u(x,y)^{\mathsf{T}}d\mu(y). (B.1)

Using assumption (H4), one can see that 𝑨(x)\bm{A}(x) is positive-definite. Indeed, if there exist a point xx and a vector v0v\not=0 such that v𝖳𝑨(x)v=0v^{\mathsf{T}}\bm{A}(x)v=0, then, since σσ𝖳\sigma\sigma^{\mathsf{T}} is positive-definite, v𝖳yu(x,y)=0v^{\mathsf{T}}\nabla_{y}u(x,y)=0 for all y𝕋my\in\mathbb{T}^{m}. Namely, v𝖳u(x,y)v^{\mathsf{T}}u(x,y) is constant in yy, and v^{\mathsf{T}}(b(x,y)-\bar{b}(x))=-L(v^{\mathsf{T}}u(x,y))=0 on 𝕋m\mathbb{T}^{m}, which contradicts (H4).

Lemma B.1.

Recall the definition of Bc(x)B_{c}(x) in (4.3). If OO^{\prime} is a minimum point, then Bc(O)>0B_{c}(O^{\prime})>0; if OO^{\prime} is a maximum point, then Bc(O)<0B_{c}(O^{\prime})<0.

Proof.

We prove the result in the case of a minimum point; the other case can be treated similarly. Since OO^{\prime} is a critical point and Lu(x,y)=(b(x,y)b¯(x))Lu(x,y)=-(b(x,y)-\bar{b}(x)), we have H(O)=0\nabla H(O^{\prime})=0 and

Bc(O)\displaystyle{B_{c}}(O^{\prime}) =𝕋m[xuh(O,y)b(O,y)+yuh(O,y)c(x,y)]dμ(y)\displaystyle=\int_{\mathbb{T}^{m}}[\nabla_{x}u_{h}(O^{\prime},y)b(O^{\prime},y)+\nabla_{y}u_{h}(O^{\prime},y)c(x,y)]d\mu(y)
=𝕋mu(O,y)𝖳2H(O)b(O,y)dμ(y)\displaystyle=\int_{\mathbb{T}^{m}}u(O^{\prime},y)^{\mathsf{T}}\nabla^{2}H(O^{\prime})b(O^{\prime},y)d\mu(y)
=𝕋mu(O,y)𝖳2H(O)Lu(O,y)dμ(y)\displaystyle=-\int_{\mathbb{T}^{m}}u(O^{\prime},y)^{\mathsf{T}}\nabla^{2}H(O^{\prime})Lu(O^{\prime},y)d\mu(y)
=1i,j22xixjH(O)𝕋mui(O,y)Luj(O,y)dμ(y)\displaystyle=-\sum_{1\leq i,j\leq 2}\frac{\partial^{2}}{\partial x_{i}\partial x_{j}}H(O^{\prime})\int_{\mathbb{T}^{m}}u_{i}(O^{\prime},y)Lu_{j}(O^{\prime},y)d\mu(y)
=1i,j22xixjH(O)𝕋m12(uiLuj+ujLui)(O,y)dμ(y)\displaystyle=-\sum_{1\leq i,j\leq 2}\frac{\partial^{2}}{\partial x_{i}\partial x_{j}}H(O^{\prime})\int_{\mathbb{T}^{m}}\frac{1}{2}(u_{i}Lu_{j}+u_{j}Lu_{i})(O^{\prime},y)d\mu(y)
=121i,j22xixjH(O)(𝑨(O)i,j𝕋mL(uiuj)dμ(y))\displaystyle=\frac{1}{2}\sum_{1\leq i,j\leq 2}\frac{\partial^{2}}{\partial x_{i}\partial x_{j}}H(O^{\prime})\left(\bm{A}(O^{\prime})_{i,j}-\int_{\mathbb{T}^{m}}L(u_{i}u_{j})d\mu(y)\right)
=121i,j22xixjH(O)𝑨(O)i,j.\displaystyle=\frac{1}{2}\sum_{1\leq i,j\leq 2}\frac{\partial^{2}}{\partial x_{i}\partial x_{j}}H(O^{\prime})\bm{A}(O^{\prime})_{i,j}.

This is positive since both 2H\nabla^{2}H and 𝑨\bm{A} are positive definite at OO^{\prime}, and the sum above is the trace of the product of two positive-definite matrices, which is positive. ∎

Lemma B.2.

For each κ>0\kappa>0, there exists δ>0\delta>0 such that

𝐄(x,y)η(δ)κ\bm{\mathrm{E}}_{(x,y)}\eta(\delta)\leq\kappa (B.2)

for all xx satisfying |H(x)H(O)|<δ|H(x)-H(O^{\prime})|<\delta, y𝕋my\in\mathbb{T}^{m}, and ε\varepsilon sufficiently small.

Proof.

Without loss of generality, we assume OO^{\prime} to be a minimum point. Similarly to (4.1), we apply Ito’s formula to uh(X~εη(δ)1,ξ~εη(δ)1)u_{h}({\tilde{X}}^{\varepsilon}_{\eta(\delta)\wedge 1},{\tilde{\xi}}^{\varepsilon}_{\eta(\delta)\wedge 1}) and we obtain

H(X~εη(δ)1)\displaystyle H({\tilde{X}}^{\varepsilon}_{\eta(\delta)\wedge 1}) =H(x)+0η(δ)1yuh(X~sε,ξ~sε)𝖳σ(ξ~sε)dWs+ε(uh(x,y)uh(X~η(δ)1ε,ξ~η(δ)1ε))\displaystyle=H(x)+\int_{0}^{\eta(\delta)\wedge 1}\nabla_{y}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\tilde{\xi}}_{s}^{\varepsilon})dW_{s}+\varepsilon(u_{h}(x,y)-u_{h}({\tilde{X}}_{\eta(\delta)\wedge 1}^{\varepsilon},{\tilde{\xi}}_{\eta(\delta)\wedge 1}^{\varepsilon}))
+0η(δ)1[xuh(X~sε,ξ~sε)b(X~sε,ξ~sε)+yuh(X~sε,ξ~sε)c(X~sε,ξ~sε)]ds.\displaystyle\quad+\int_{0}^{\eta(\delta)\wedge 1}[\nabla_{x}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\cdot b({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})+\nabla_{y}u_{h}({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})\cdot c({\tilde{X}}_{s}^{\varepsilon},{\tilde{\xi}}_{s}^{\varepsilon})]ds.

By taking the expectation on both sides and using Corollary 3.4, we obtain

𝐄(x,y)0η(δ)1Bc(X~sε)ds<δ+O(ε).\bm{\mathrm{E}}_{(x,y)}\int_{0}^{\eta(\delta)\wedge 1}B_{c}({\tilde{X}}_{s}^{\varepsilon})ds<\delta+O(\varepsilon). (B.3)

Due to Lemma B.1, Bc(O)>0B_{c}(O^{\prime})>0. Hence, by the continuity of BcB_{c}, we can choose δ\delta small enough that Bc(X~sε)>Bc(O)/2>0B_{c}({\tilde{X}}_{s}^{\varepsilon})>B_{c}(O^{\prime})/2>0 at all times before η(δ)\eta(\delta). Thus, it follows from (B.3) that, for all ε\varepsilon sufficiently small,

𝐄(x,y)(η(δ)1)4δ/Bc(O).\bm{\mathrm{E}}_{(x,y)}(\eta(\delta)\wedge 1)\leq 4\delta/B_{c}(O^{\prime}).

By Markov’s inequality, 𝐏(x,y)(η(δ)1)4δ/Bc(O)\bm{\mathrm{P}}_{(x,y)}(\eta(\delta)\geq 1)\leq 4\delta/B_{c}(O^{\prime}) for all xx satisfying |H(x)H(O)|<δ|H(x)-H(O^{\prime})|<\delta and y𝕋my\in\mathbb{T}^{m}. One can now obtain the desired result using the Markov property and the fact that 𝐄(x,y)η(δ)𝐄(x,y)(η(δ)1)+𝐄(x,y)(η(δ)χ{η(δ)>1})\bm{\mathrm{E}}_{(x,y)}\eta(\delta)\leq\bm{\mathrm{E}}_{(x,y)}(\eta(\delta)\wedge 1)+\bm{\mathrm{E}}_{(x,y)}(\eta(\delta)\chi_{\{\eta(\delta)>1\}}). ∎
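To spell out this last step (a sketch of the standard iteration): write \rho:=4\delta/B_{c}(O^{\prime}). By the Markov property, \bm{\mathrm{P}}_{(x,y)}(\eta(\delta)\geq k)\leq\rho^{k} for every integer k\geq 1, uniformly over the admissible starting points, and hence

\bm{\mathrm{E}}_{(x,y)}\eta(\delta)\leq\bm{\mathrm{E}}_{(x,y)}(\eta(\delta)\wedge 1)+\sum_{k\geq 1}\bm{\mathrm{P}}_{(x,y)}(\eta(\delta)\geq k)\leq\rho+\frac{\rho}{1-\rho}\leq\kappa,

provided \delta is chosen small enough.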

Proposition B.3.

Let 0<α<1/20<\alpha<1/2, Vε={x:|H(x)H(O)|<εα}V^{\varepsilon}=\{x:|H(x)-H(O)|<\varepsilon^{\alpha}\}, and τ=inf{t:X~tεVε}\tau=\inf\{t:{\tilde{X}}_{t}^{\varepsilon}\not\in V^{\varepsilon}\}. Then, uniformly in 0<α<1/20<\alpha<1/2 and (x,y)Vε×𝕋m(x,y)\in V^{\varepsilon}\times\mathbb{T}^{m}, as ε0\varepsilon\downarrow 0,

𝐄(x,y)τ=O(ε2α|logε|).\bm{\mathrm{E}}_{(x,y)}\tau=O(\varepsilon^{2\alpha}|\log\varepsilon|). (B.4)

To prove Proposition B.3, it is more convenient to consider the process (𝑿~tε,𝝃~tε)({\tilde{\bm{X}}}_{t}^{\varepsilon},{\tilde{\bm{\xi}}}_{t}^{\varepsilon}), define the stopping time 𝝉=inf{t:𝑿~tεVε}\bm{\tau}=\inf\{t:{\tilde{\bm{X}}}_{t}^{\varepsilon}\not\in V^{\varepsilon}\}, and then prove that 𝐄(x,y)𝝉=O(ε2α1|logε|)\bm{\mathrm{E}}_{(x,y)}\bm{\tau}=O(\varepsilon^{2\alpha-1}|\log\varepsilon|). We need a careful analysis of the behavior of the process both near the saddle point and away from it. The latter is easier to deal with since there is no degeneracy, while the former again requires the Morse Lemma in order to make concrete computations. For simplicity, we prove the result in the case of (𝑿tε,𝝃tε)({\bm{X}}_{t}^{\varepsilon},{\bm{\xi}}_{t}^{\varepsilon}) without the additional drift c(x,y)c(x,y) since it can be seen in the proof that the additional terms induced by c(x,y)c(x,y) are always relatively small. We prove that there exist two neighborhoods, D1D2D_{1}\subset D_{2}, of OO (as shown in Figure 10(a)), such that, in VεV^{\varepsilon}, it takes the process O(|logε|)O(|\log\varepsilon|) time to escape from D2D_{2}, and O(1)O(1) time to return to D1D_{1}.

Since xVεx\in V^{\varepsilon} is two-dimensional, we denote x=(p,q)x=(p,q). To avoid convoluted formulas, we assume the saddle point is the origin and the Hamiltonian H(x)=pqH(x)=pq in a small neighborhood of OO. This assumption is not restrictive because, as in the proof of Lemma 5.3, we can use the Morse Lemma and perform a random change of time with the multiplier bounded from below and above, which will not change the order of the expected time. For r>0r>0, we denote by DrD_{r} the region in VεV^{\varepsilon} with |p|r|p|\leq r and |q|r|q|\leq r, (Dr)in={|p|=r}Vε(\partial D_{r})_{\textrm{in}}=\{|p|=r\}\cap V^{\varepsilon}, (Dr)out={|q|=r}Vε(\partial D_{r})_{\textrm{out}}=\{|q|=r\}\cap V^{\varepsilon}, and choose r3>0r_{3}>0 small enough that H(x)=pqH(x)=pq in Dr3D_{r_{3}}.

Figure 10: Transitions between two boundaries. (a) Two neighborhoods of OO. (b) Dr1D_{r_{1}} and Dr2D_{r_{2}}.
Lemma B.4.

There exist r1,r2>0r_{1},r_{2}>0 such that, uniformly in (x,y)Dr1×𝕋m(x,y)\in D_{r_{1}}\times\mathbb{T}^{m}, as ε0\varepsilon\downarrow 0,

𝐄(x,y)τ¯=O(|logε|),\bm{\mathrm{E}}_{(x,y)}\bar{\tau}=O(|\log\varepsilon|), (B.5)

where τ¯=inf{t:𝐗tεDr2}\bar{\tau}=\inf\{t:{\bm{X}}_{t}^{\varepsilon}\not\in D_{r_{2}}\}.

Proof.

We denote ηtε=(𝑿tε)2\eta_{t}^{\varepsilon}=({\bm{X}}_{t}^{\varepsilon})_{2} and study the behavior of ηtε\eta_{t}^{\varepsilon} inside Dr3D_{r_{3}}. All the computations below concern 𝑿tε{\bm{X}}_{t}^{\varepsilon} before it leaves Dr3D_{r_{3}}. As in (3.10), we can write the equation for ηtε\eta_{t}^{\varepsilon},

ηtε\displaystyle{\eta}_{t}^{\varepsilon} =q+0tηsεds+ε0tyu2(𝑿sε,𝝃sε)𝖳σ(𝝃sε)dWs\displaystyle=q+\int_{0}^{t}{\eta}_{s}^{\varepsilon}ds+\sqrt{\varepsilon}\int_{0}^{t}\nabla_{y}u_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})dW_{s} (B.6)
+ε0txu2(𝑿sε,𝝃sε)b(𝑿sε,𝝃sε)ds+ε(u2(x,y)u2(𝑿tε,𝝃tε)).\displaystyle\quad+\varepsilon\int_{0}^{t}\nabla_{x}u_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})\cdot b({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})ds+\varepsilon(u_{2}(x,y)-u_{2}({\bm{X}}_{t}^{\varepsilon},{\bm{\xi}}_{t}^{\varepsilon})).

Introduce η^tε\hat{\eta}_{t}^{\varepsilon}, which is close to ηtε\eta_{t}^{\varepsilon}:

η^tε=q+0tηsεds+ε0tyu2(𝑿sε,𝝃sε)𝖳σ(𝝃sε)dWs+ε0txu2(𝑿sε,𝝃sε)b(𝑿sε,𝝃sε)ds.\hat{\eta}_{t}^{\varepsilon}=q+\int_{0}^{t}{\eta}_{s}^{\varepsilon}ds+\sqrt{\varepsilon}\int_{0}^{t}\nabla_{y}u_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})dW_{s}+\varepsilon\int_{0}^{t}\nabla_{x}u_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})\cdot b({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})ds. (B.7)

Let F(q)=0qez20zew2dwdzF(q)=\int_{0}^{q}e^{-z^{2}}\int_{0}^{z}e^{w^{2}}dwdz, which satisfies F(q)12logqF(q)\sim\frac{1}{2}\log q as qq\to\infty, and has bounded derivatives up to the third order. Then we can choose C>0C>0 large enough that |F|,|F|,|F||F^{\prime}|,|F^{\prime\prime}|,|F^{\prime\prime\prime}|, |u(x,y)|,|u(x,y)|,|2u(x,y)||u(x,y)|,|\nabla u(x,y)|,|\nabla^{2}u(x,y)| are bounded by CC. Recall from (B.1) that the vector-valued function yu2(x,y)𝖳σ(y)\nabla_{y}u_{2}(x,y)^{\mathsf{T}}\sigma(y) has non-degenerate average w.r.t. μ\mu, in the sense that 𝑨22(x)>0\bm{A}_{22}(x)>0. Let A0=𝑨22(O)>0A_{0}=\bm{A}_{22}(O)>0 and let 0<r1<r2<r30<r_{1}<r_{2}<r_{3} be such that A0(11/(2C))<𝑨22(x)<A0(1+1/(2C))A_{0}(1-1/(2C))<\bm{A}_{22}(x)<A_{0}(1+1/(2C)) in Dr2D_{r_{2}}, as shown in Figure 10(b). Let r¯2=r3+r22\bar{r}_{2}=\frac{r_{3}+r_{2}}{2}. Define the function f(q)f(q) (which depends on ε\varepsilon) and compute its derivatives:

f(q)=2(F(r¯2A0ε)F(qA0ε)),\displaystyle f(q)=2(F(\frac{\bar{r}_{2}}{\sqrt{A_{0}\varepsilon}})-F(\frac{q}{\sqrt{A_{0}\varepsilon}})),\leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak f(q)=2A0εF(qA0ε),\displaystyle f^{\prime}(q)=-\frac{2}{\sqrt{A_{0}\varepsilon}}F^{\prime}(\frac{q}{\sqrt{A_{0}\varepsilon}}), (B.8)
f(q)=2A0εF(qA0ε),\displaystyle f^{\prime\prime}(q)=-\frac{2}{A_{0}\varepsilon}F^{\prime\prime}(\frac{q}{\sqrt{A_{0}\varepsilon}}),\leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak f(q)=2(A0ε)3F(qA0ε).\displaystyle f^{\prime\prime\prime}(q)=-\frac{2}{(\sqrt{A_{0}\varepsilon})^{3}}F^{\prime\prime\prime}(\frac{q}{\sqrt{A_{0}\varepsilon}}).

Furthermore, ff satisfies the following boundary value problem:

{12A0εf+qf=1f(r¯2)=f(r¯2)=0.\begin{cases}\frac{1}{2}A_{0}\varepsilon f^{\prime\prime}+qf^{\prime}=-1\\ f(-\bar{r}_{2})=f(\bar{r}_{2})=0\end{cases}. (B.9)
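For the reader’s convenience, here is a direct verification, using only the definition of FF given above, that ff defined in (B.8) solves (B.9). Since F^{\prime}(z)=e^{-z^{2}}\int_{0}^{z}e^{w^{2}}dw, we have F^{\prime\prime}(z)=1-2zF^{\prime}(z) (in particular, F^{\prime}(q)\sim 1/(2q) as q\to\infty, which yields the logarithmic asymptotics of FF stated above). Substituting z=q/\sqrt{A_{0}\varepsilon} into (B.8) gives

\frac{1}{2}A_{0}\varepsilon f^{\prime\prime}(q)+qf^{\prime}(q)=-F^{\prime\prime}(z)-2zF^{\prime}(z)=-1,

and f(\pm\bar{r}_{2})=0 because FF is even (its integrand is an odd function of zz).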

By Lemma 3.2, there is a function v2(x,y)v_{2}(x,y) that is bounded together with its derivatives such that

Lv2(x,y)=|yu2(x,y)σ(y)|2𝑨22(x),Lv_{2}(x,y)=\left|\nabla_{y}u_{2}(x,y)\sigma(y)\right|^{2}-\bm{A}_{22}(x), (B.10)

where LL is the generator of the process ξt\xi_{t} (see (2.1)). Since |ηtεη^tε|=O(ε)|\eta_{t}^{\varepsilon}-\hat{\eta}_{t}^{\varepsilon}|=O(\varepsilon) and r¯2>r2\bar{r}_{2}>r_{2}, we can apply Ito’s formula to v2(𝑿tε,𝝃tε)f(ηtε)v_{2}({\bm{X}}_{t}^{\varepsilon},{\bm{\xi}}_{t}^{\varepsilon})f^{\prime\prime}(\eta_{t}^{\varepsilon}) for 0tτ¯0\leq t\leq\bar{\tau}:

v2(𝑿tε,𝝃tε)f(ηtε)=\displaystyle v_{2}({\bm{X}}_{t}^{\varepsilon},{\bm{\xi}}_{t}^{\varepsilon})f^{\prime\prime}(\eta_{t}^{\varepsilon})= v2(x,y)f(q)+0tx(v2(𝑿sε,𝝃sε)f(ηsε))b(𝑿sε,𝝃sε)ds\displaystyle v_{2}(x,y)f^{\prime\prime}(q)+\int_{0}^{t}\nabla_{x}(v_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})f^{\prime\prime}(\eta_{s}^{\varepsilon}))\cdot b({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})ds (B.11)
+1ε0tLv2(𝑿sε,𝝃sε)f(ηsε)ds+1ε0tyv2(𝑿sε,𝝃sε)𝖳f(ηsε)σ(𝝃sε)dWs.\displaystyle+\frac{1}{\varepsilon}\int_{0}^{t}Lv_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})f^{\prime\prime}(\eta_{s}^{\varepsilon})ds+\frac{1}{\sqrt{\varepsilon}}\int_{0}^{t}\nabla_{y}v_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}f^{\prime\prime}(\eta_{s}^{\varepsilon})\sigma({\bm{\xi}}_{s}^{\varepsilon})dW_{s}.

Hence it follows that

0t(|yu2(𝑿sε,𝝃sε)σ(𝝃sε)|2𝑨22(𝑿sε))f(ηsε)ds\displaystyle\int_{0}^{t}(\left|\nabla_{y}u_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})\sigma({\bm{\xi}}_{s}^{\varepsilon})\right|^{2}-\bm{A}_{22}({\bm{X}}_{s}^{\varepsilon}))f^{\prime\prime}(\eta_{s}^{\varepsilon})ds (B.12)
=ε(v2(𝑿tε,𝝃tε)f(ηtε)v2(x,y)f(q))ε0tx(v2(𝑿sε,𝝃sε)f(ηsε))b(𝑿sε,𝝃sε)ds\displaystyle\quad=\varepsilon\left(v_{2}({\bm{X}}_{t}^{\varepsilon},{\bm{\xi}}_{t}^{\varepsilon})f^{\prime\prime}(\eta_{t}^{\varepsilon})-v_{2}(x,y)f^{\prime\prime}(q)\right)-\varepsilon\int_{0}^{t}\nabla_{x}(v_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})f^{\prime\prime}(\eta_{s}^{\varepsilon}))\cdot b({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})ds
ε0tf(ηsε)yv2(𝑿sε,𝝃sε)𝖳σ(𝝃sε)dWs\displaystyle\quad\quad-\sqrt{\varepsilon}\int_{0}^{t}f^{\prime\prime}(\eta_{s}^{\varepsilon})\nabla_{y}v_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})dW_{s}
=O(1)+O(1ε)tε0tf(ηsε)yv2(𝑿sε,𝝃sε)𝖳σ(𝝃sε)dWs.\displaystyle\quad=O(1)+O(\frac{1}{\sqrt{\varepsilon}})\cdot t-\sqrt{\varepsilon}\int_{0}^{t}f^{\prime\prime}(\eta_{s}^{\varepsilon})\nabla_{y}v_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})dW_{s}.

Now apply Ito’s formula to f(η^tε)f(\hat{\eta}_{t}^{\varepsilon}) for 0tτ¯0\leq t\leq\bar{\tau}:

f(η^tε)\displaystyle f(\hat{\eta}_{t}^{\varepsilon}) =f(q)+0tf(η^sε)ηsεds+ε20tf(η^sε)|yu2(𝑿sε,𝝃sε)𝖳σ(𝝃sε)|2ds\displaystyle=f(q)+\int_{0}^{t}f^{\prime}(\hat{\eta}_{s}^{\varepsilon})\eta_{s}^{\varepsilon}ds+\frac{\varepsilon}{2}\int_{0}^{t}f^{\prime\prime}(\hat{\eta}_{s}^{\varepsilon})|\nabla_{y}u_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})|^{2}ds
+ε0tf(η^sε)xu2(𝑿sε,𝝃sε)b(𝑿sε,𝝃sε)ds+ε0tf(η^sε)yu2(𝑿sε,𝝃sε)𝖳σ(𝝃sε)dWs\displaystyle\quad+\varepsilon\int_{0}^{t}f^{\prime}(\hat{\eta}_{s}^{\varepsilon})\nabla_{x}u_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})\cdot b({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})ds+\sqrt{\varepsilon}\int_{0}^{t}f^{\prime}(\hat{\eta}_{s}^{\varepsilon})\nabla_{y}u_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})dW_{s}
=f(q)+0tf(η^sε)η^sεds+O(ε)t+ε20tf(η^sε)𝑨22(𝑿sε)ds\displaystyle=f(q)+\int_{0}^{t}f^{\prime}(\hat{\eta}_{s}^{\varepsilon})\hat{\eta}_{s}^{\varepsilon}ds+O(\sqrt{\varepsilon})\cdot t+\frac{\varepsilon}{2}\int_{0}^{t}f^{\prime\prime}(\hat{\eta}_{s}^{\varepsilon})\bm{A}_{22}({\bm{X}}_{s}^{\varepsilon})ds
+ε20t(|yu2(𝑿sε,𝝃sε)𝖳σ(𝝃sε)|2𝑨22(𝑿sε))f(ηsε)ds+O(ε)t\displaystyle\quad+\frac{\varepsilon}{2}\int_{0}^{t}(|\nabla_{y}u_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})|^{2}-\bm{A}_{22}({\bm{X}}_{s}^{\varepsilon}))f^{\prime\prime}(\eta_{s}^{\varepsilon})ds+O(\sqrt{\varepsilon})\cdot t
+ε0tf(η^sε)yu2(𝑿sε,𝝃sε)𝖳σ(𝝃tε)dWs\displaystyle\quad+\sqrt{\varepsilon}\int_{0}^{t}f^{\prime}(\hat{\eta}_{s}^{\varepsilon})\nabla_{y}u_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{t}^{\varepsilon})dW_{s}
=f(q)+0t[f(η^sε)η^sε+A0ε2f(η^sε)]ds+ε20tf(η^sε)(𝑨22(𝑿sε)A0)ds\displaystyle=f(q)+\int_{0}^{t}[f^{\prime}(\hat{\eta}_{s}^{\varepsilon})\hat{\eta}_{s}^{\varepsilon}+\frac{A_{0}\varepsilon}{2}f^{\prime\prime}(\hat{\eta}_{s}^{\varepsilon})]ds+\frac{\varepsilon}{2}\int_{0}^{t}f^{\prime\prime}(\hat{\eta}_{s}^{\varepsilon})(\bm{A}_{22}({\bm{X}}_{s}^{\varepsilon})-A_{0})ds
+ε2(O(1)+O(1ε)tε0tf(ηsε)yv2(𝑿sε,𝝃sε)𝖳σ(𝝃sε)dWs)+O(ε)t\displaystyle\quad+\frac{\varepsilon}{2}(O(1)+O(\frac{1}{\sqrt{\varepsilon}})\cdot t-\sqrt{\varepsilon}\int_{0}^{t}f^{\prime\prime}(\eta_{s}^{\varepsilon})\nabla_{y}v_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})dW_{s})+O(\sqrt{\varepsilon})\cdot t
+ε0tf(η^sε)yu2(𝑿sε,𝝃sε)𝖳σ(𝝃sε)dWs\displaystyle\quad+\sqrt{\varepsilon}\int_{0}^{t}f^{\prime}(\hat{\eta}_{s}^{\varepsilon})\nabla_{y}u_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})dW_{s}
f(q)t+t2+O(ε)+O(ε)tε320tf(ηsε)yv2(𝑿sε,𝝃sε)𝖳σ(𝝃sε)dWs\displaystyle\leq f(q)-t+\frac{t}{2}+O(\varepsilon)+O(\sqrt{\varepsilon})\cdot t-\frac{\sqrt{\varepsilon^{3}}}{2}\int_{0}^{t}f^{\prime\prime}(\eta_{s}^{\varepsilon})\nabla_{y}v_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})dW_{s}
+ε0tf(η^sε)yu2(𝑿sε,𝝃sε)𝖳σ(𝝃sε)dWs,\displaystyle\quad+\sqrt{\varepsilon}\int_{0}^{t}f^{\prime}(\hat{\eta}_{s}^{\varepsilon})\nabla_{y}u_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})dW_{s},

where the equalities follow from (B.8) and (B.12), and the last inequality holds since ff solves (B.9) and |𝑨22(𝑿sε)A0|<A0/(2C)|\bm{A}_{22}({\bm{X}}_{s}^{\varepsilon})-A_{0}|<A_{0}/(2C). Let τ~=τ¯1/ε\tilde{\tau}=\bar{\tau}\wedge 1/\varepsilon. Then η^τ~ε[r¯2,r¯2]\hat{\eta}_{\tilde{\tau}}^{\varepsilon}\in[-\bar{r}_{2},\bar{r}_{2}] because |η^τ~εητ~ε|=O(ε)|\hat{\eta}_{\tilde{\tau}}^{\varepsilon}-\eta_{\tilde{\tau}}^{\varepsilon}|=O(\varepsilon). The previous calculation reduces to

f(η^τ~ε)\displaystyle f(\hat{\eta}_{\tilde{\tau}}^{\varepsilon}) f(q)τ~/2+O(ε)+O(ε)τ~ε320τ~f(ηsε)yv2(𝑿sε,𝝃sε)𝖳σ(𝝃sε)dWs\displaystyle\leq f(q)-\tilde{\tau}/2+O(\varepsilon)+O(\sqrt{\varepsilon})\cdot\tilde{\tau}-\frac{\sqrt{\varepsilon^{3}}}{2}\int_{0}^{\tilde{\tau}}f^{\prime\prime}(\eta_{s}^{\varepsilon})\nabla_{y}v_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})dW_{s}
+ε0τ~f(η^sε)yu2(𝑿sε,𝝃sε)𝖳σ(𝝃sε)dWs.\displaystyle\quad+\sqrt{\varepsilon}\int_{0}^{\tilde{\tau}}f^{\prime}(\hat{\eta}_{s}^{\varepsilon})\nabla_{y}u_{2}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})dW_{s}.

By taking the expectation, we have for all xDr2x\in D_{r_{2}}, y𝕋my\in\mathbb{T}^{m}, and ε\varepsilon small enough

𝐄(x,y)τ~5supr¯2qr¯2f(q)=O(|logε|).\bm{\mathrm{E}}_{(x,y)}\tilde{\tau}\leq 5\sup_{-\bar{r}_{2}\leq q^{\prime}\leq\bar{r}_{2}}f(q^{\prime})=O(|\log\varepsilon|). (B.13)

Then, by Markov’s inequality and the Markov property, we obtain that 𝐄(x,y)τ¯=O(|logε|)\bm{\mathrm{E}}_{(x,y)}\bar{\tau}=O(|\log\varepsilon|). ∎
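The logarithmic growth of the exit time in Lemma B.4 can also be observed numerically. The following Python sketch (an illustration only, not part of the proof, and not the general system (1.1)) simulates a toy slow-fast system: the Hamiltonian is H(p,q)=pq near the saddle, the fast variable is a hypothetical Brownian motion on the circle run at speed 1/\varepsilon (so its invariant measure is uniform), and the hypothetical slow drift is b(x,\xi)=(-p+\sin 2\pi\xi,\,q+\cos 2\pi\xi), whose average is (-p,q). All numerical parameters are illustrative. For decreasing \varepsilon, the printed mean exit times from D_{r_{2}} should grow roughly linearly in |\log\varepsilon|.

import numpy as np

rng = np.random.default_rng(0)
r1, r2 = 0.1, 0.5     # inner and outer boxes D_{r1}, D_{r2} (illustrative values)
trials = 200          # sample paths per value of epsilon

def mean_exit_time(eps, dt_factor=20, t_max=10.0):
    # Euler scheme whose step resolves the fast time scale eps.
    dt = eps / dt_factor
    n_max = int(t_max / dt)
    p = np.full(trials, r1)             # start on the stable axis, q = 0
    q = np.zeros(trials)
    xi = np.zeros(trials)
    exit_time = np.full(trials, t_max)
    alive = np.ones(trials, dtype=bool)
    for n in range(n_max):
        if not alive.any():
            break
        dW = rng.normal(0.0, np.sqrt(dt), trials)
        xi = (xi + dW / np.sqrt(eps)) % 1.0            # fast motion on the circle
        p = p + (-p + np.sin(2 * np.pi * xi)) * dt     # slow motion, toy drift
        q = q + ( q + np.cos(2 * np.pi * xi)) * dt
        escaped = alive & ((np.abs(p) > r2) | (np.abs(q) > r2))
        exit_time[escaped] = (n + 1) * dt
        alive &= ~escaped
    return exit_time.mean()

for eps in [1e-2, 1e-3, 1e-4]:          # smaller eps is noticeably slower
    print(f"eps = {eps:.0e}: mean exit time ~ {mean_exit_time(eps):.2f}, "
          f"|log eps| = {abs(np.log(eps)):.2f}")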

Lemma B.5.

Let r1,r2,τ¯r_{1},r_{2},\bar{\tau} be defined as in Lemma B.4. Then, uniformly in (x,y)Dr1×𝕋m(x,y)\in D_{r_{1}}\times\mathbb{T}^{m},

𝐏(x,y)(𝑿τ¯ε(Dr2)in)0as ε0.\bm{\mathrm{P}}_{(x,y)}({\bm{X}}_{\bar{\tau}}^{\varepsilon}\in(\partial D_{r_{2}})_{\mathrm{in}})\to 0\leavevmode\nobreak\ \text{as }\varepsilon\downarrow 0. (B.14)
Proof.

Again, we denote x=(p,q)x=(p,q) and, for simplicity, we assume that the saddle point is the origin and that H(x)=pqH(x)=pq in a small neighborhood of OO. We extend the function b(x,y)b(x,y) in the vertical direction in such a way that it is bounded together with its partial derivatives and the first component of b¯(x)\bar{b}(x) is p-p in the region {x:|p|r2}\{x:|p|\leq r_{2}\}. We denote ζtε=(𝑿tε)1\zeta_{t}^{\varepsilon}=({\bm{X}}_{t}^{\varepsilon})_{1} and show that it takes significantly longer than |logε||\log\varepsilon| for ζtε\zeta_{t}^{\varepsilon} to reach ±r2\pm r_{2}, hence it is unlikely for 𝑿tε{\bm{X}}_{t}^{\varepsilon} to exit Dr2D_{r_{2}} through (Dr2)in{(\partial D_{r_{2}})}_{\mathrm{in}}. All the computations below concern 𝑿tε{\bm{X}}_{t}^{\varepsilon} before ζtε\zeta_{t}^{\varepsilon} reaches ±r2\pm r_{2}. As in (3.10), we can write the equation for ζtε\zeta_{t}^{\varepsilon}:

ζtε\displaystyle{\zeta}_{t}^{\varepsilon} =p0tζsεds+ε0tyu1(𝑿sε,𝝃sε)𝖳σ(𝝃sε)dWs\displaystyle=p-\int_{0}^{t}{\zeta}_{s}^{\varepsilon}ds+\sqrt{\varepsilon}\int_{0}^{t}\nabla_{y}u_{1}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})dW_{s} (B.15)
+ε0txu1(𝑿sε,𝝃sε)b(𝑿sε,𝝃sε)ds+ε(u1(x,y)u1(𝑿tε,𝝃tε)).\displaystyle\quad+\varepsilon\int_{0}^{t}\nabla_{x}u_{1}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})\cdot b({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})ds+\varepsilon(u_{1}(x,y)-u_{1}({\bm{X}}_{t}^{\varepsilon},{\bm{\xi}}_{t}^{\varepsilon})).

Introduce ζ^tε\hat{\zeta}_{t}^{\varepsilon}, which is close to ζtε\zeta_{t}^{\varepsilon}:

ζ^tε=p0tζsεds+ε0tyu1(𝑿sε,𝝃sε)𝖳σ(𝝃sε)dWs+ε0txu1(𝑿sε,𝝃sε)b(𝑿sε,𝝃sε)ds.\hat{\zeta}_{t}^{\varepsilon}=p-\int_{0}^{t}{\zeta}_{s}^{\varepsilon}ds+\sqrt{\varepsilon}\int_{0}^{t}\nabla_{y}u_{1}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})dW_{s}+\varepsilon\int_{0}^{t}\nabla_{x}u_{1}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})\cdot b({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})ds. (B.16)

Since b(x,y)b(x,y) and its partial derivatives are bounded in {x:|p|r2}×𝕋m\{x:|p|\leq r_{2}\}\times\mathbb{T}^{m}, we can choose C>0C>0 such that

supx:|p|r2,y𝕋m(|yu1(x,y)𝖳σ(y)|22|u1(x,y)||xu1(x,y)b(x,y)|)<C/2.\sup_{x:|p|\leq r_{2},y\in\mathbb{T}^{m}}\left(|\nabla_{y}u_{1}(x,y)^{\mathsf{T}}\sigma(y)|^{2}\vee 2|u_{1}(x,y)|\vee|\nabla_{x}u_{1}(x,y)\cdot b(x,y)|\right)<C/2. (B.17)

Let us define (within this proof) r¯2=r1+r22\bar{r}_{2}=\frac{r_{1}+r_{2}}{2}, τ^2=inf{t:|ζ^tε|>r¯2}\hat{\tau}_{2}=\inf\{t:|\hat{\zeta}_{t}^{\varepsilon}|>\bar{r}_{2}\}, and the function f(p)=exp(p2/(Cε))f(p)=\exp(p^{2}/(C\varepsilon)). Then ff satisfies

Cε2fpff=0.\frac{C\varepsilon}{2}f^{\prime\prime}-pf^{\prime}-f=0. (B.18)
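Indeed (a direct computation, recorded here for the reader’s convenience), f^{\prime}(p)=\frac{2p}{C\varepsilon}f(p) and f^{\prime\prime}(p)=\left(\frac{2}{C\varepsilon}+\frac{4p^{2}}{C^{2}\varepsilon^{2}}\right)f(p), so that \frac{C\varepsilon}{2}f^{\prime\prime}-pf^{\prime}-f=\left(f+\frac{2p^{2}}{C\varepsilon}f\right)-\frac{2p^{2}}{C\varepsilon}f-f=0.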

Note that |ζtε|r2|\zeta_{t}^{\varepsilon}|\leq r_{2} for 0tτ^20\leq t\leq\hat{\tau}_{2} since |ζtεζ^tε|Cε/2|\zeta_{t}^{\varepsilon}-\hat{\zeta}_{t}^{\varepsilon}|\leq C\varepsilon/2. Apply Ito’s formula to exp(t/2)f(ζ^tε)\exp(-t/2)f(\hat{\zeta}_{t}^{\varepsilon}) for 0tτ^20\leq t\leq\hat{\tau}_{2} and obtain using (B.17):

et/2f(ζ^tε)\displaystyle e^{-t/2}f(\hat{\zeta}_{t}^{\varepsilon}) =f(p)120tes/2f(ζ^sε)ds0tes/2f(ζ^sε)ζsεds\displaystyle=f(p)-\frac{1}{2}\int_{0}^{t}e^{-s/2}f(\hat{\zeta}_{s}^{\varepsilon})ds-\int_{0}^{t}e^{-s/2}f^{\prime}(\hat{\zeta}_{s}^{\varepsilon})\zeta_{s}^{\varepsilon}ds
+ε0tes/2f(ζ^sε)xu1(𝑿sε,𝝃sε)b(𝑿sε,𝝃sε)ds\displaystyle\quad+\varepsilon\int_{0}^{t}e^{-s/2}f^{\prime}(\hat{\zeta}_{s}^{\varepsilon})\nabla_{x}u_{1}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})\cdot b({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})ds
+ε20tes/2f(ζ^sε)|yu1(𝑿sε,𝝃sε)𝖳σ(𝝃sε)|2ds\displaystyle\quad+\frac{\varepsilon}{2}\int_{0}^{t}e^{-s/2}f^{\prime\prime}(\hat{\zeta}_{s}^{\varepsilon})|\nabla_{y}u_{1}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})|^{2}ds
+ε0tes/2f(ζ^sε)yu1(𝑿sε,𝝃sε)𝖳σ(𝝃sε)dWs\displaystyle\quad+\sqrt{\varepsilon}\int_{0}^{t}e^{-s/2}f^{\prime}(\hat{\zeta}_{s}^{\varepsilon})\nabla_{y}u_{1}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})dW_{s}
=f(p)120tes/2f(ζ^sε)ds\displaystyle=f(p)-\frac{1}{2}\int_{0}^{t}e^{-s/2}f(\hat{\zeta}_{s}^{\varepsilon})ds
+0tes/2f(ζ^sε)(12ζ^sε+[(ζ^sεζsε)+εxu1(𝑿sε,𝝃sε)b(𝑿sε,𝝃sε)12ζ^sε])ds\displaystyle\quad+\int_{0}^{t}e^{-s/2}f^{\prime}(\hat{\zeta}_{s}^{\varepsilon})\left(-\frac{1}{2}\hat{\zeta}_{s}^{\varepsilon}+\left[(\hat{\zeta}_{s}^{\varepsilon}-\zeta_{s}^{\varepsilon})+\varepsilon\nabla_{x}u_{1}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})\cdot b({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})-\frac{1}{2}\hat{\zeta}_{s}^{\varepsilon}\right]\right)ds
+ε20tes/2f(ζ^sε)|yu1(𝑿sε,𝝃sε)𝖳σ(𝝃sε)|2ds\displaystyle\quad+\frac{\varepsilon}{2}\int_{0}^{t}e^{-s/2}f^{\prime\prime}(\hat{\zeta}_{s}^{\varepsilon})|\nabla_{y}u_{1}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})|^{2}ds
+ε0tes/2f(ζ^sε)yu1(𝑿sε,𝝃sε)𝖳σ(𝝃sε)dWs\displaystyle\quad+\sqrt{\varepsilon}\int_{0}^{t}e^{-s/2}f^{\prime}(\hat{\zeta}_{s}^{\varepsilon})\nabla_{y}u_{1}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})dW_{s}
f(p)120tes/2f(ζ^sε)ds120tes/2f(ζ^sε)ζ^sεds+Cε40tes/2f(ζ^sε)ds\displaystyle\leq f(p)-\frac{1}{2}\int_{0}^{t}e^{-s/2}f(\hat{\zeta}_{s}^{\varepsilon})ds-\frac{1}{2}\int_{0}^{t}e^{-s/2}f^{\prime}(\hat{\zeta}_{s}^{\varepsilon})\hat{\zeta}_{s}^{\varepsilon}ds+\frac{C\varepsilon}{4}\int_{0}^{t}e^{-s/2}f^{\prime\prime}(\hat{\zeta}_{s}^{\varepsilon})ds
+0tes/2f(ζ^sε)[(ζ^sεζsε)+εxu1(𝑿sε,𝝃sε)b(𝑿sε,𝝃sε)12ζ^sε]ds\displaystyle\quad+\int_{0}^{t}e^{-s/2}f^{\prime}(\hat{\zeta}_{s}^{\varepsilon})\left[(\hat{\zeta}_{s}^{\varepsilon}-\zeta_{s}^{\varepsilon})+\varepsilon\nabla_{x}u_{1}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})\cdot b({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})-\frac{1}{2}\hat{\zeta}_{s}^{\varepsilon}\right]ds
+ε0tes/2f(ζ^sε)yu1(𝑿sε,𝝃sε)𝖳σ(𝝃sε)dWs\displaystyle\quad+\sqrt{\varepsilon}\int_{0}^{t}e^{-s/2}f^{\prime}(\hat{\zeta}_{s}^{\varepsilon})\nabla_{y}u_{1}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})dW_{s}
f(p)+18Cε(1et/2)+ε0tes/2f(ζ^sε)yu1(𝑿sε,𝝃sε)𝖳σ(𝝃sε)dWs.\displaystyle\leq f(p)+18C\varepsilon(1-e^{-t/2})+\sqrt{\varepsilon}\int_{0}^{t}e^{-s/2}f^{\prime}(\hat{\zeta}_{s}^{\varepsilon})\nabla_{y}u_{1}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})dW_{s}.

The last inequality follows from (B.18) and the fact that the integrand on the second line is either negative, when |ζ^sε|2Cε|\hat{\zeta}_{s}^{\varepsilon}|\geq 2C\varepsilon, or small and bounded by 9Cεes/29C\varepsilon e^{-s/2}, when |ζ^sε|2Cε|\hat{\zeta}_{s}^{\varepsilon}|\leq 2C\varepsilon. By replacing tt by the stopping time τ^2\hat{\tau}_{2} and taking expectation, we obtain

𝐄(x,y)eτ^2/22e(r12r¯22)/(Cε).\bm{\mathrm{E}}_{(x,y)}e^{-\hat{\tau}_{2}/2}\leq 2e^{(r_{1}^{2}-{\bar{r}_{2}}^{2})/(C\varepsilon)}. (B.19)

Let τ¯2=inf{t:|ζtε|>r2}\bar{\tau}_{2}=\inf\{t:|\zeta_{t}^{\varepsilon}|>r_{2}\}. Then, since |ζtεζ^tε|Cε/2|\zeta_{t}^{\varepsilon}-\hat{\zeta}_{t}^{\varepsilon}|\leq C\varepsilon/2, it follows that

𝐏(x,y)(τ¯2<|logε|/ε)𝐏(x,y)(τ^2<|logε|/ε)2exp(r12r¯22Cε+|logε|2ε)0,\bm{\mathrm{P}}_{(x,y)}(\bar{\tau}_{2}<|\log\varepsilon|/\sqrt{\varepsilon})\leq\bm{\mathrm{P}}_{(x,y)}(\hat{\tau}_{2}<|\log\varepsilon|/\sqrt{\varepsilon})\leq 2\exp(\frac{r_{1}^{2}-{\bar{r}_{2}}^{2}}{C\varepsilon}+\frac{|\log\varepsilon|}{2\sqrt{\varepsilon}})\to 0, (B.20)

as ε0\varepsilon\downarrow 0. However, by Lemma B.4 and Markov’s inequality, we have

𝐏(x,y)(τ¯>|logε|/ε)0,\bm{\mathrm{P}}_{(x,y)}(\bar{\tau}>|\log\varepsilon|/\sqrt{\varepsilon})\to 0, (B.21)

as ε0\varepsilon\downarrow 0. Hence, with probability tending to one, τ¯|logε|/ετ¯2\bar{\tau}\leq|\log\varepsilon|/\sqrt{\varepsilon}\leq\bar{\tau}_{2}, so that, outside an event of vanishing probability, ζtε\zeta_{t}^{\varepsilon} has not reached ±r2\pm r_{2} by time τ¯\bar{\tau}, i.e., the exit from Dr2D_{r_{2}} occurs through (Dr2)out(\partial D_{r_{2}})_{\mathrm{out}}. The desired result follows. ∎

Lemma B.6.

Let τ¯¯=inf{t:𝐗tεDr1}𝛕\bar{\bar{\tau}}=\inf\{t:{\bm{X}}_{t}^{\varepsilon}\in D_{r_{1}}\}\wedge\bm{\tau}. Then there exists a>0a>0 such that, uniformly in (x,y)(Dr2)out×𝕋m(x,y)\in(\partial{D_{r_{2}}})_{\mathrm{out}}\times\mathbb{T}^{m},

𝐏(x,y)(τ¯¯<𝝉,0τ¯¯|yuh(𝑿sε,𝝃sε)𝖳σ(𝝃sε)|2ds<a)0as ε0.\bm{\mathrm{P}}_{(x,y)}(\bar{\bar{\tau}}<\bm{\tau},\int_{0}^{\bar{\bar{\tau}}}|\nabla_{y}u_{h}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})|^{2}ds<a)\to 0\leavevmode\nobreak\ \text{as }\varepsilon\to 0. (B.22)

Furthermore, 𝐄(x,y)τ¯¯\bm{\mathrm{E}}_{(x,y)}\bar{\bar{\tau}} is bounded uniformly in (x,y)(Dr2)out×𝕋m(x,y)\in(\partial{D_{r_{2}}})_{\mathrm{out}}\times\mathbb{T}^{m}.

Proof.

Let t^>0\hat{t}>0 and tˇ>0\check{t}>0 be, respectively, a lower bound and an upper bound for the time spent by 𝒙t\bm{x}_{t} to get from (Dr2)out(\partial{D_{r_{2}}})_{\mathrm{out}} to Dr1D_{r_{1}}. Then, similarly to (5.8), there exists a>0a>0 such that

𝐏(x,y)(0t^/2|yuh(𝑿sε,𝝃sε)𝖳σ(𝝃sε)|2ds>a,sup0t2t^|𝑿tε𝒙t|ε1+2α4)1.\bm{\mathrm{P}}_{(x,y)}\left(\int_{0}^{\hat{t}/2}|\nabla_{y}u_{h}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})|^{2}ds>a,\sup_{0\leq t\leq 2\hat{t}}|{\bm{X}}_{t}^{\varepsilon}-\bm{x}_{t}|\leq\varepsilon^{\frac{1+2\alpha}{4}}\right)\to 1. (B.23)

Hence

𝐏(x,y)(τ¯¯<𝝉,0τ¯¯|yuh(𝑿sε,𝝃sε)𝖳σ(𝝃sε)|2ds<a)\displaystyle\bm{\mathrm{P}}_{(x,y)}\left(\bar{\bar{\tau}}<\bm{\tau},\int_{0}^{\bar{\bar{\tau}}}|\nabla_{y}u_{h}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})|^{2}ds<a\right)
𝐏(x,y)(t^/2τ¯¯<𝝉,0τ¯¯|yuh(𝑿sε,𝝃sε)𝖳σ(𝝃sε)|2ds<a)+𝐏(x,y)(τ¯¯<𝝉,τ¯¯<t^/2)\displaystyle\leq\bm{\mathrm{P}}_{(x,y)}\left(\hat{t}/2\leq\bar{\bar{\tau}}<\bm{\tau},\int_{0}^{\bar{\bar{\tau}}}|\nabla_{y}u_{h}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})|^{2}ds<a\right)+\bm{\mathrm{P}}_{(x,y)}(\bar{\bar{\tau}}<\bm{\tau},\bar{\bar{\tau}}<\hat{t}/2)
𝐏(x,y)(0t^/2|yuh(𝑿sε,𝝃sε)𝖳σ(𝝃sε)|2ds<a)+𝐏(x,y)(sup0t2t^|𝑿tε𝒙t|>ε1+2α4)\displaystyle\leq\bm{\mathrm{P}}_{(x,y)}\left(\int_{0}^{\hat{t}/2}|\nabla_{y}u_{h}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})|^{2}ds<a\right)+\bm{\mathrm{P}}_{(x,y)}\left(\sup_{0\leq t\leq 2\hat{t}}|{\bm{X}}_{t}^{\varepsilon}-\bm{x}_{t}|>\varepsilon^{\frac{1+2\alpha}{4}}\right)
0.\displaystyle\to 0.

Similarly, it is easy to see that 𝐏(x,y)(τ¯¯>2tˇ)<𝐏(x,y)(sup0t2tˇ|𝑿tε𝒙t|>ε1+2α4)0\bm{\mathrm{P}}_{(x,y)}(\bar{\bar{\tau}}>2\check{t})<\bm{\mathrm{P}}_{(x,y)}\left(\sup_{0\leq t\leq 2\check{t}}|{\bm{X}}_{t}^{\varepsilon}-\bm{x}_{t}|>\varepsilon^{\frac{1+2\alpha}{4}}\right)\to 0, and the desired result follows from the Markov property. ∎

Proof of Proposition B.3.

As in (4.1):

H(𝑿tε)\displaystyle H({\bm{X}}_{t}^{\varepsilon}) =H(x)+ε0txuh(𝑿sε,𝝃sε)b(𝑿sε,𝝃sε)ds\displaystyle=H(x)+\varepsilon\int_{0}^{t}\nabla_{x}u_{h}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})\cdot b({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})ds
+ε0tyuh(𝑿sε,𝝃sε)𝖳σ(𝝃sε)dWs+ε(uh(x,y)uh(𝑿tε,𝝃tε)).\displaystyle\quad+\sqrt{\varepsilon}\int_{0}^{t}\nabla_{y}u_{h}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})dW_{s}+\varepsilon(u_{h}(x,y)-u_{h}({\bm{X}}_{t}^{\varepsilon},{\bm{\xi}}_{t}^{\varepsilon})).

The change in H(𝑿tε)H({\bm{X}}_{t}^{\varepsilon}) is mainly due to the stochastic integral while the other terms are of order O(ε)O(\varepsilon) and O(tε)O(t\cdot\varepsilon) and can be controlled. For each t(ε)>0t(\varepsilon)>0,

{𝝉<t(ε)}\{\bm{\tau}<t(\varepsilon)\}\supset {sup[0,t(ε)]|ε0tyuh(𝑿sε,𝝃sε)𝖳σ(𝝃sε)dWs|>3εα}\left\{\sup_{[0,t(\varepsilon)]}\left|\sqrt{\varepsilon}\int_{0}^{t}\nabla_{y}u_{h}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})^{\mathsf{T}}\sigma({\bm{\xi}}_{s}^{\varepsilon})dW_{s}\right|>3\varepsilon^{\alpha}\right\} (B.24)
{ε0t(ε)|xuh(𝑿sε,𝝃sε)b(𝑿sε,𝝃sε)|ds<εα/2}.\displaystyle\quad\bigcap\left\{\varepsilon\int_{0}^{t(\varepsilon)}|\nabla_{x}u_{h}({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})\cdot b({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})|ds<\varepsilon^{\alpha}/2\right\}.

Note that if we choose t(ε)=o(εα1)t(\varepsilon)=o(\varepsilon^{\alpha-1}), then, since the integrand in the second event is bounded, the second event holds automatically for all ε\varepsilon sufficiently small. Now we recursively define stopping times:

θ10\displaystyle\theta^{1}_{0} =0,\displaystyle=0,
θ2k\displaystyle\theta^{2}_{k} =inf{tθ1k1:𝑿tεDr2}𝝉,\displaystyle=\inf\{t\geq\theta^{1}_{k-1}:{\bm{X}}_{t}^{\varepsilon}\in\partial D_{r_{2}}\}\wedge\bm{\tau},
θ1k\displaystyle\theta^{1}_{k} =inf{tθ2k:𝑿tεDr1}𝝉.\displaystyle=\inf\{t\geq\theta^{2}_{k}:{\bm{X}}_{t}^{\varepsilon}\in\partial D_{r_{1}}\}\wedge\bm{\tau}.

We denote D(x,y)=yuh(x,y)𝖳σ(y)D(x,y)=\nabla_{y}u_{h}(x,y)^{\mathsf{T}}\sigma(y). Note that once the process leaves VεV^{\varepsilon}, the stopping times stay constant afterwards. The main idea of the proof is to show that, after a sufficiently long time t(ε)t(\varepsilon), the stochastic integral accumulates enough variance to force an exit from VεV^{\varepsilon}. Let us bound the probability that the accumulated variance is small:

𝐏(x,y)(𝝉t(ε),0t(ε)|D(𝑿sε,𝝃sε)|2ds<9ε2α1)\displaystyle\bm{\mathrm{P}}_{(x,y)}\left(\bm{\tau}\geq t(\varepsilon),\int_{0}^{t(\varepsilon)}|D({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})|^{2}ds<9\varepsilon^{2\alpha-1}\right) (B.25)
𝐏(x,y)(𝝉t(ε)>θ1n(ε),0t(ε)|D(𝑿sε,𝝃sε)|2ds<9ε2α1)+𝐏(x,y)(θ1n(ε)t(ε)),\displaystyle\quad\leq\bm{\mathrm{P}}_{(x,y)}\left(\bm{\tau}\geq t(\varepsilon)>\theta^{1}_{n(\varepsilon)},\int_{0}^{t(\varepsilon)}|D({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})|^{2}ds<9\varepsilon^{2\alpha-1}\right)+\bm{\mathrm{P}}_{(x,y)}(\theta^{1}_{n(\varepsilon)}\geq t(\varepsilon)),

where the integer n(ε)n(\varepsilon) will be specified later. Let τ¯\bar{\tau}, τ¯¯\bar{\bar{\tau}}, and aa be defined as in Lemma B.4 and Lemma B.6. Then

𝐏(x,y)(𝝉t(ε)>θ1n(ε),0t(ε)|D(𝑿sε,𝝃sε)|2ds<9ε2α1)\displaystyle\bm{\mathrm{P}}_{(x,y)}\left(\bm{\tau}\geq t(\varepsilon)>\theta^{1}_{n(\varepsilon)},\int_{0}^{t(\varepsilon)}|D({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})|^{2}ds<9\varepsilon^{2\alpha-1}\right)
exp(9ε2α1/a)𝐄(x,y)(χ{𝝉>θ1n(ε)}exp(1a0θ1n(ε)|D(𝑿sε,𝝃sε)|2ds))\displaystyle\quad\leq\exp(9\varepsilon^{2\alpha-1}/a)\bm{\mathrm{E}}_{(x,y)}\left(\chi_{\{\bm{\tau}>\theta^{1}_{n(\varepsilon)}\}}\exp\left(-\frac{1}{a}\int_{0}^{\theta^{1}_{n(\varepsilon)}}|D({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})|^{2}ds\right)\right) (B.26)
exp(9ε2α1/a)[sup(x,y)Dr1×𝕋m𝐄(x,y)(χ{𝝉>θ11}exp(1a0θ11|D(𝑿sε,𝝃sε)|2ds))]n(ε)1.\displaystyle\quad\leq\exp(9\varepsilon^{2\alpha-1}/a)\left[\sup_{(x,y)\in\partial D_{r_{1}}\times\mathbb{T}^{m}}\bm{\mathrm{E}}_{(x,y)}\left(\chi_{\{\bm{\tau}>\theta^{1}_{1}\}}\exp\left(-\frac{1}{a}\int_{0}^{\theta^{1}_{1}}|D({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})|^{2}ds\right)\right)\right]^{{n(\varepsilon)}-1}.

Now let us deal with one excursion from Dr2D_{r_{2}} to Dr1D_{r_{1}}. For (x,y)(Dr2)out×𝕋m(x,y)\in(\partial{D_{r_{2}}})_{\mathrm{out}}\times\mathbb{T}^{m},

𝐄(x,y)(χ{τ¯¯<𝝉}exp(1a0τ¯¯|D(𝑿sε,𝝃sε)|2ds))\displaystyle\bm{\mathrm{E}}_{(x,y)}\left(\chi_{\{\bar{\bar{\tau}}<\bm{\tau}\}}\exp\left(-\frac{1}{a}\int_{0}^{\bar{\bar{\tau}}}|D({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})|^{2}ds\right)\right) (B.27)
𝐏(x,y)(τ¯¯<𝝉,0τ¯¯|D(𝑿sε,𝝃sε)|2ds<a)+𝐏(x,y)(τ¯¯<𝝉,0τ¯¯|D(𝑿sε,𝝃sε)|2dsa)/e\displaystyle\quad\leq\bm{\mathrm{P}}_{(x,y)}(\bar{\bar{\tau}}<\bm{\tau},\int_{0}^{\bar{\bar{\tau}}}|D({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})|^{2}ds<a)+\bm{\mathrm{P}}_{(x,y)}(\bar{\bar{\tau}}<\bm{\tau},\int_{0}^{\bar{\bar{\tau}}}|D({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})|^{2}ds\geq a)/e
e0.99,\displaystyle\quad\leq e^{-0.99},

for all ε\varepsilon sufficiently small, by Lemma B.6. For (x,y)Dr1×𝕋m(x,y)\in\partial D_{r_{1}}\times\mathbb{T}^{m}:

𝐄(x,y)(χ{𝝉>θ11}exp(1a0θ11|D(𝑿sε,𝝃sε)|2ds))\displaystyle\bm{\mathrm{E}}_{(x,y)}\left(\chi_{\{\bm{\tau}>\theta^{1}_{1}\}}\exp\left(-\frac{1}{a}\int_{0}^{\theta^{1}_{1}}|D({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})|^{2}ds\right)\right)
𝐄(x,y)(χ{𝝉>θ11,𝑿ετ¯(Dr2)out}exp(1a0θ11|D(𝑿sε,𝝃sε)|2ds))+𝐏(x,y)(𝑿ετ¯(Dr2)in)\displaystyle\leq\bm{\mathrm{E}}_{(x,y)}\left(\chi_{\{\bm{\tau}>\theta^{1}_{1},{\bm{X}}^{\varepsilon}_{\bar{\tau}}\in(\partial{D_{r_{2}}})_{\mathrm{out}}\}}\exp\left(-\frac{1}{a}\int_{0}^{\theta^{1}_{1}}|D({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})|^{2}ds\right)\right)+\bm{\mathrm{P}}_{(x,y)}({\bm{X}}^{\varepsilon}_{\bar{\tau}}\in(\partial{D_{r_{2}}})_{\mathrm{in}})
sup(x,y)(Dr2)out×𝕋m𝐄(x,y)(χ{τ¯¯<𝝉}exp(1a0τ¯¯|D(𝑿sε,𝝃sε)|2ds))+𝐏(x,y)(𝑿ετ¯(Dr2)in)\displaystyle\leq\sup_{(x^{\prime},y^{\prime})\in(\partial{D_{r_{2}}})_{\mathrm{out}}\times\mathbb{T}^{m}}\bm{\mathrm{E}}_{(x^{\prime},y^{\prime})}\left(\chi_{\{\bar{\bar{\tau}}<\bm{\tau}\}}\exp\left(-\frac{1}{a}\int_{0}^{\bar{\bar{\tau}}}|D({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})|^{2}ds\right)\right)+\bm{\mathrm{P}}_{{(x,y)}}({\bm{X}}^{\varepsilon}_{\bar{\tau}}\in(\partial{D_{r_{2}}})_{\mathrm{in}})
e0.98,\displaystyle\leq e^{-0.98},

by Lemma B.5 and (B.27). Now we can come back to (B.26) and have

𝐏(x,y)(𝝉t(ε)>θ1n,0t(ε)|D(𝑿sε,𝝃sε)|2ds<9ε2α1)exp(9ε2α1/a0.98(n(ε)1)).\bm{\mathrm{P}}_{(x,y)}\left(\bm{\tau}\geq t(\varepsilon)>\theta^{1}_{n},\int_{0}^{t(\varepsilon)}|D({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})|^{2}ds<9\varepsilon^{2\alpha-1}\right)\leq\exp(9\varepsilon^{2\alpha-1}/a-0.98({n(\varepsilon)}-1)). (B.28)

The second probability on the right-hand side of (B.25) can be bounded using Lemmas B.4 and B.6, for some K>0K>0:

𝐏(x,y)(θ1nt(ε))\displaystyle\bm{\mathrm{P}}_{(x,y)}(\theta^{1}_{n}\geq t(\varepsilon)) 𝐄(x,y)θ1n/t(ε)\displaystyle\leq{\bm{\mathrm{E}}_{(x,y)}\theta^{1}_{n}}/{t(\varepsilon)} (B.29)
(sup(x,y)Dr1×𝕋m𝐄(x,y)τ¯+sup(x,y)Dr2×𝕋m𝐄(x,y)τ¯¯)n(ε)t(ε)\displaystyle\leq{\left(\sup_{(x^{\prime},y^{\prime})\in\partial D_{r_{1}}\times\mathbb{T}^{m}}\bm{\mathrm{E}}_{(x^{\prime},y^{\prime})}\bar{\tau}+\sup_{{(x^{\prime},y^{\prime})\in\partial D_{r_{2}}\times\mathbb{T}^{m}}}\bm{\mathrm{E}}_{(x^{\prime},y^{\prime})}\bar{\bar{\tau}}\right)}\cdot\frac{n(\varepsilon)}{t(\varepsilon)}
n(ε)K|logε|t(ε).\displaystyle\leq\frac{n(\varepsilon)K|\log\varepsilon|}{t(\varepsilon)}.

Choose n(ε)=[10ε2α1/a+2]n(\varepsilon)=[10\varepsilon^{2\alpha-1}/a+2]. Then the quantity in (B.28) converges to 0. Choose t(ε)=100Kε2α1|logε|/at(\varepsilon)=100K\varepsilon^{2\alpha-1}|\log\varepsilon|/a; then the quantity on the right-hand side of (B.29) converges to 0.10.1. Therefore, the right-hand side of (B.25) does not exceed 0.110.11 for all ε\varepsilon sufficiently small. Moreover, since t(ε)=o(εα1)t(\varepsilon)=o(\varepsilon^{\alpha-1}), it follows from (B.24) that, for all xVεx\in V^{\varepsilon}, y𝕋my\in\mathbb{T}^{m}, and ε\varepsilon sufficiently small,

𝐏(x,y)(𝝉t(ε))\displaystyle\bm{\mathrm{P}}_{(x,y)}(\bm{\tau}\geq t(\varepsilon))
=𝐏(x,y)(𝝉t(ε),sup[0,t(ε)]|ε0tD(𝑿sε,𝝃sε)dWs|3εα)\displaystyle=\bm{\mathrm{P}}_{(x,y)}\left(\bm{\tau}\geq t(\varepsilon),\sup_{[0,t(\varepsilon)]}\left|\sqrt{\varepsilon}\int_{0}^{t}D({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})dW_{s}\right|\leq 3\varepsilon^{\alpha}\right)
𝐏(x,y)(𝝉t(ε),0t(ε)|D(𝑿sε,𝝃sε)|2ds>9ε2α1,sup[0,t(ε)]|ε0tD(𝑿sε,𝝃sε)dWs|3εα)\displaystyle\leq\bm{\mathrm{P}}_{(x,y)}\left(\bm{\tau}\geq t(\varepsilon),\int_{0}^{t(\varepsilon)}|D({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})|^{2}ds>9\varepsilon^{2\alpha-1},\sup_{[0,t(\varepsilon)]}\left|\sqrt{\varepsilon}\int_{0}^{t}D({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})dW_{s}\right|\leq 3\varepsilon^{\alpha}\right)
+𝐏(x,y)(𝝉t(ε),0t(ε)|D(𝑿sε,𝝃sε)|2ds<9ε2α1)\displaystyle\quad+\bm{\mathrm{P}}_{(x,y)}\left(\bm{\tau}\geq t(\varepsilon),\int_{0}^{t(\varepsilon)}|D({\bm{X}}_{s}^{\varepsilon},{\bm{\xi}}_{s}^{\varepsilon})|^{2}ds<9\varepsilon^{2\alpha-1}\right)
0.69+0.11=0.8,\displaystyle\leq 0.69+0.11=0.8,

since the stochastic integral in the last inequality can be represented as a time-changed Brownian motion. Finally, iterating this bound via the Markov property gives \bm{\mathrm{P}}_{(x,y)}(\bm{\tau}\geq kt(\varepsilon))\leq 0.8^{k} for every integer k\geq 0, and therefore

𝐄(x,y)𝝉5t(ε)=O(ε2α1|logε|).\bm{\mathrm{E}}_{(x,y)}\bm{\tau}\leq 5t(\varepsilon)=O(\varepsilon^{2\alpha-1}|\log\varepsilon|).\qed

Appendix C Tightness

In this section, we verify the tightness of the projection to the Reeb graph of the original process (1.4) on 2×𝕋m\mathbb{R}^{2}\times\mathbb{T}^{m}. By the Arzelà–Ascoli theorem (cf. Theorem 7.3 in [1]), it suffices to check that the following two conditions

(i)limR+lim supε0𝐏(x,y)(sup0tT|H(Xtε)|R)=0,\displaystyle(i)\lim_{R\to+\infty}\limsup_{\varepsilon\downarrow 0}\bm{\mathrm{P}}_{(x,y)}\left(\sup_{0\leq t\leq T}|H(X_{t}^{\varepsilon})|\geq R\right)=0, (C.1)
(ii)limδ0lim supε0𝐏(x,y)(sup0s<tT|st|<δr(h(Xtε),h(Xsε))>κ)=0,\displaystyle(ii)\lim_{\delta\downarrow 0}\limsup_{\varepsilon\downarrow 0}\bm{\mathrm{P}}_{(x,y)}\left(\sup_{\begin{subarray}{c}0\leq s<t\leq T\\ |s-t|<\delta\end{subarray}}r(h(X_{t}^{\varepsilon}),h(X_{s}^{\varepsilon}))>\kappa\right)=0, (C.2)

hold for all κ>0\kappa>0 and (x,y)2×𝕋m(x,y)\in\mathbb{R}^{2}\times\mathbb{T}^{m}. As in (4.1), we can also write the equation for H(Xtε)H(X_{t}^{\varepsilon}), where we consider (Xtε,ξtε)(X_{t}^{\varepsilon},\xi_{t}^{\varepsilon}) to be a process on 2×𝕋m\mathbb{R}^{2}\times\mathbb{T}^{m}:

H(Xtε)\displaystyle H(X_{t}^{\varepsilon}) =H(x)+0tyuh(Xsε,ξsε)𝖳σ(ξsε)dWs+ε(uh(x0,y0)uh(Xtε,ξtε))\displaystyle=H(x)+\int_{0}^{t}\nabla_{y}u_{h}(X_{s}^{\varepsilon},\xi_{s}^{\varepsilon})^{\mathsf{T}}\sigma(\xi_{s}^{\varepsilon})dW_{s}+\varepsilon(u_{h}(x_{0},y_{0})-u_{h}(X_{t}^{\varepsilon},\xi_{t}^{\varepsilon})) (C.3)
+0txuh(Xsε,ξsε)b(Xsε,ξsε)ds.\displaystyle\quad+\int_{0}^{t}\nabla_{x}u_{h}(X_{s}^{\varepsilon},\xi_{s}^{\varepsilon})\cdot b(X_{s}^{\varepsilon},\xi_{s}^{\varepsilon})ds.

By assumption (H4) and Lemma 3.2, u(x,y)u(x,y) is bounded together with its first derivatives. Moreover, by assumption (H2), H(x)/|x|+H(x)/|x|\to+\infty as |x|+|x|\to+\infty, hence sup{|x|:H(x)R}/R0\sup\{|x|:H(x)\leq R\}/R\to 0 as R+R\to+\infty. Also, there exists K>0K>0 such that H(x)>KH(x)>-K for all x2x\in\mathbb{R}^{2}. For R>KR>K, let τR=inf{t:|H(Xtε)|=R}T=inf{t:H(Xtε)=R}T\tau_{R}=\inf\{t:|H(X_{t}^{\varepsilon})|=R\}\wedge T=\inf\{t:H(X_{t}^{\varepsilon})=R\}\wedge T (the two stopping times coincide since H>-K>-R). Then, by Markov’s inequality and the boundedness of the second derivatives of HH,

𝐏(x,y)(sup0tT|H(Xtε)|R)\displaystyle\bm{\mathrm{P}}_{(x,y)}(\sup_{0\leq t\leq T}|H(X_{t}^{\varepsilon})|\geq R) =𝐏(x,y)(H(XετR)=R)\displaystyle=\bm{\mathrm{P}}_{(x,y)}(H(X^{\varepsilon}_{\tau_{R}})=R)
𝐄(x,y)(H(XετR)+K)/R\displaystyle\leq\bm{\mathrm{E}}_{(x,y)}(H(X^{\varepsilon}_{\tau_{R}})+K)/R
(H(x)+(ε+T)sup{|H(x)|:H(x)R}+T+K)/R\displaystyle\lesssim({H(x)+(\varepsilon+T)\sup\{|\nabla H(x)|:H(x)\leq R\}+T+K})/{R}
(H(x)+(ε+T)sup{|x|:H(x)R}+T+K)/R0,\displaystyle\lesssim({H(x)+(\varepsilon+T)\sup\{|x|:H(x)\leq R\}+T+K})/{R}\to 0,

as R+R\to+\infty, uniformly in 0<ε<10<\varepsilon<1. Thus, we have (C.1), and it also follows that

limR+lim supε0𝐏(x,y)(sup0tT|H(Xtε)|R)=0,\lim_{R\to+\infty}\limsup_{\varepsilon\downarrow 0}\bm{\mathrm{P}}_{(x,y)}(\sup_{0\leq t\leq T}|\nabla H(X_{t}^{\varepsilon})|\geq R)=0, (C.4)

since H(x)/|x|+H(x)/|x|\to+\infty and HH has bounded second derivatives. To verify (C.2), we see that, for an arbitrary κ>0\kappa>0 small,

𝐏(x,y)(sup0s<tT|st|<δr(h(Xtε),h(Xsε))>κ)\displaystyle\bm{\mathrm{P}}_{(x,y)}\left(\sup_{\begin{subarray}{c}0\leq s<t\leq T\\ |s-t|<\delta\end{subarray}}r(h(X_{t}^{\varepsilon}),h(X_{s}^{\varepsilon}))>\kappa\right) k=0[T/δ]𝐏(x,y)(suptδr(h(Xkδ+tε),h(Xkδε))>κ/4)\displaystyle\leq\sum_{k=0}^{[T/\delta]}\bm{\mathrm{P}}_{(x,y)}\left(\sup_{t\leq\delta}r(h(X_{k\delta+t}^{\varepsilon}),h(X_{k\delta}^{\varepsilon}))>\kappa/4\right)
k=0[T/δ]𝐏(x,y)(suptδ|H(Xkδ+tε)H(Xkδε)|>κ/12).\displaystyle\leq\sum_{k=0}^{[T/\delta]}\bm{\mathrm{P}}_{(x,y)}\left(\sup_{t\leq\delta}|H(X_{k\delta+t}^{\varepsilon})-H(X_{k\delta}^{\varepsilon})|>\kappa/12\right).

Since (C.4) holds, it is sufficient to prove, for each R>0R>0,

limδ0lim supε0k=0[T/δ]𝐏(x,y)(suptδ|H(Xkδ+tε)H(Xkδε)|>κ/12,sup0tT|H(Xtε)|R)=0.\lim_{\delta\downarrow 0}\limsup_{\varepsilon\downarrow 0}\sum_{k=0}^{[T/\delta]}\bm{\mathrm{P}}_{(x,y)}\left(\sup_{t\leq\delta}|H(X_{k\delta+t}^{\varepsilon})-H(X_{k\delta}^{\varepsilon})|>\kappa/12,\sup_{0\leq t\leq T}|\nabla H(X_{t}^{\varepsilon})|\leq R\right)=0. (C.5)

Let KK^{\prime} be an upper bound for |b(x,y)H(x)||b(x,y)-\nabla^{\perp}H(x)|, |u(x,y)||u(x,y)|, |xu(x,y)||\nabla_{x}u(x,y)|, and the eigenvalues of 2H(x)\nabla^{2}H(x) and (yuσσ𝖳yu𝖳)(x,y)(\nabla_{y}u\sigma\sigma^{\mathsf{T}}\nabla_{y}u^{\mathsf{T}})(x,y) over 2×𝕋m\mathbb{R}^{2}\times\mathbb{T}^{m}. On the event {sup0tT|H(Xtε)|R}\{\sup_{0\leq t\leq T}|\nabla H(X_{t}^{\varepsilon})|\leq R\} the following holds. For ε<κ48KR\varepsilon<\frac{\kappa}{48K^{\prime}R}, due to (C.3) and the fact that the stochastic integral can be represented as a time-changed Brownian motion, we have

suptδ|H(Xkδ+tε)H(Xkδε)|\displaystyle\sup_{t\leq\delta}|H(X_{k\delta+t}^{\varepsilon})-H(X_{k\delta}^{\varepsilon})| suptδ|kδkδ+tyuh(Xsε,ξsε)𝖳σ(ξsε)dWs|+2εKR+K2(K+R)δ\displaystyle\leq\sup_{t\leq\delta}|\int_{k\delta}^{k\delta+t}\nabla_{y}u_{h}(X_{s}^{\varepsilon},\xi_{s}^{\varepsilon})^{\mathsf{T}}\sigma(\xi_{s}^{\varepsilon})dW_{s}|+2\varepsilon K^{\prime}R+{K^{\prime}}^{2}(K^{\prime}+R)\delta
sup0tδKR2|W~t|+K2(K+R)δ+κ/24,\displaystyle\leq\sup_{0\leq t\leq\delta K^{\prime}R^{2}}|\tilde{W}_{t}|+{K^{\prime}}^{2}(K^{\prime}+R)\delta+\kappa/24,

where W~t\tilde{W}_{t} is another Brownian motion. Hence, for δ\delta sufficiently small, independently of ε\varepsilon,

k=0[T/δ]𝐏(x,y)(suptδ|H(Xkδ+tε)H(Xkδε)|>κ/12,sup0tT|H(Xtε)|R)\displaystyle\sum_{k=0}^{[T/\delta]}\bm{\mathrm{P}}_{(x,y)}\left(\sup_{t\leq\delta}|H(X_{k\delta+t}^{\varepsilon})-H(X_{k\delta}^{\varepsilon})|>\kappa/12,\sup_{0\leq t\leq T}|\nabla H(X_{t}^{\varepsilon})|\leq R\right)
[Tδ+1]𝐏(x,y)(sup0tδKR2|W~t|>κ/48)\displaystyle\leq\left[\frac{T}{\delta}+1\right]\cdot\bm{\mathrm{P}}_{(x,y)}\left(\sup_{0\leq t\leq\delta K^{\prime}R^{2}}|\tilde{W}_{t}|>\kappa/48\right)
0,\displaystyle\to 0,

as δ0\delta\downarrow 0, since each term is exponentially small as δ0\delta\downarrow 0. This proves (C.5), and thus (C.2).
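As an informal illustration of condition (C.2) (not part of the argument above), the following Python sketch simulates a toy slow-fast system in which the averaged motion is the rotation generated by H(p,q)=(p^{2}+q^{2})/2, the fast variable is a hypothetical Brownian motion on the circle run at speed 1/\varepsilon, and the hypothetical slow drift is b(x,\xi)=(-q+\sin 2\pi\xi,\,p+\cos 2\pi\xi). It records H(X_{t}^{\varepsilon}) on [0,T] and measures its oscillation over windows of length \delta; the oscillation is small and decreases with \delta, uniformly as \varepsilon decreases, which is the behavior that the tightness argument quantifies. All numerical parameters are illustrative.

import numpy as np

rng = np.random.default_rng(1)
T = 5.0                                  # time horizon (illustrative)

def h_oscillation(eps, deltas, dt_factor=20):
    # Simulate one path of the toy slow-fast system and record H(X_t).
    dt = eps / dt_factor
    n = int(T / dt)
    xi, p, q = 0.0, 1.0, 0.0
    H = np.empty(n + 1)
    H[0] = 0.5 * (p * p + q * q)
    dW = rng.normal(0.0, np.sqrt(dt), n)
    for k in range(n):
        xi = (xi + dW[k] / np.sqrt(eps)) % 1.0            # fast motion on the circle
        p, q = (p + (-q + np.sin(2 * np.pi * xi)) * dt,   # slow motion, toy drift
                q + ( p + np.cos(2 * np.pi * xi)) * dt)
        H[k + 1] = 0.5 * (p * p + q * q)
    osc = {}
    for delta in deltas:
        w = max(1, int(round(delta / dt)))
        m = (len(H) // w) * w
        blocks = H[:m].reshape(-1, w)
        # maximal oscillation of H within blocks of length delta; the modulus of
        # continuity over |s - t| < delta is at most twice this quantity
        osc[delta] = float(np.ptp(blocks, axis=1).max())
    return osc

for eps in [1e-2, 1e-3]:
    print(f"eps = {eps:.0e}:", h_oscillation(eps, deltas=[0.5, 0.1, 0.02]))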

Acknowledgements

I am grateful to Mark Freidlin for introducing me to this problem and to my advisor Leonid Koralov for invaluable guidance. I am also grateful to Dmitry Dolgopyat and Yeor Hafouta for insightful discussions.

References

  • [1] Patrick Billingsley “Convergence of probability measures” A Wiley-Interscience Publication, Wiley Series in Probability and Statistics: Probability and Statistics John Wiley & Sons, Inc., New York, 1999, pp. x+277
  • [2] A. N. Borodin “A limit theorem for the solutions of differential equations with a random right-hand side” In Teor. Verojatnost. i Primenen. 22.3, 1977, pp. 498–512
  • [3] A. N. Borodin and M. I. Freidlin “Fast oscillating random perturbations of dynamical systems with conservation laws” In Ann. Inst. H. Poincaré Probab. Statist. 31.3, 1995, pp. 485–525
  • [4] Dmitry Dolgopyat, Mark Freidlin and Leonid Koralov “Deterministic and stochastic perturbations of area preserving flows on a two-dimensional torus” In Ergodic Theory Dynam. Systems 32.3, 2012, pp. 899–918
  • [5] Dmitry Dolgopyat and Leonid Koralov “Averaging of Hamiltonian flows with an ergodic component” In Ann. Probab. 36.6, 2008, pp. 1999–2049
  • [6] Dmitry Dolgopyat and Leonid Koralov “Averaging of incompressible flows on two-dimensional surfaces” In J. Amer. Math. Soc. 26.2, 2013, pp. 427–449
  • [7] Stewart N. Ethier and Thomas G. Kurtz “Markov processes” Characterization and convergence, Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics John Wiley & Sons, Inc., New York, 1986, pp. x+534
  • [8] M. Freidlin and L. Koralov “Averaging in the case of multiple invariant measures for the fast system” In Electron. J. Probab. 26, 2021, Paper No. 138, 17 pp.
  • [9] M. I. Freidlin and A. D. Wentzell “Diffusion approximation for noise-induced evolution of first integrals in multifrequency systems” In J. Stat. Phys. 182.3, 2021, Paper No. 45, 24 pp.
  • [10] Mark Freidlin and Matthias Weber “Random perturbations of dynamical systems and diffusion processes with conservation laws” In Probab. Theory Related Fields 128.3, 2004, pp. 441–466
  • [11] Mark I. Freidlin and Alexander D. Wentzell “Random perturbations of dynamical systems” Translated from the 1979 Russian original by Joseph Szücs 260, Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] Springer, Heidelberg, 2012, pp. xxviii+458
  • [12] Mark I. Freidlin and Alexander D. Wentzell “Random perturbations of Hamiltonian systems” In Mem. Amer. Math. Soc. 109.523, 1994, pp. viii+82
  • [13] Martin Hairer “On Malliavin’s proof of Hörmander’s theorem” In Bull. Sci. Math. 135.6-7, 2011, pp. 650–666
  • [14] R. Z. Has’minskii “A limit theorem for solutions of differential equations with a random right hand part” In Teor. Verojatnost. i Primenen. 11, 1966, pp. 444–462
  • [15] R. Z. Has’minskii “Ergodic properties of recurrent diffusion processes and stabilization of the solution of the Cauchy problem for parabolic equations” In Teor. Verojatnost. i Primenen. 5, 1960, pp. 196–214
  • [16] Leonid Koralov and Shuo Yan “Local limit theorem for time-inhomogeneous functions of Markov processes”, 2024 arXiv:2308.00880 [math.PR]
  • [17] David Nualart “The Malliavin calculus and related topics”, Probability and its Applications (New York) Springer-Verlag, Berlin, 2006, pp. xiv+382