
Nonstandard Representation of the Dirichlet Form

Robert M. Anderson University of California, Berkeley, Department of Economics Haosui Duanmu University of California, Berkeley, Department of Economics  and  Aaron Smith University of Ottawa, Department of Mathematics and Statistics
Abstract.

The Dirichlet form is a generalization of the Laplacian, heavily used in the study of many diffusion-like processes. In this paper we present a nonstandard representation theorem for the Dirichlet form, showing that the usual Dirichlet form can be well-approximated by a hyperfinite sum. One of the main motivations for such a result is to provide a tool for directly translating results about Dirichlet forms on finite or countable state spaces to results on more general state spaces, without having to translate the details of the proofs. As an application, we prove a generalization of a well-known comparison theorem for Markov chains on finite state spaces, and also relate our results to previous generalization attempts.

1. Introduction

The Dirichlet form, introduced in [BD58], can be used in place of the usual Laplacian in situations where the “usual” Laplacian may not be convenient or make sense at all. Among other applications, it is an important tool for defining diffusive processes on various complicated spaces (see e.g. an early paper on fractals [Kus89] and an introductory paper on infinite-dimensional processes [Sch99]), potential theory (see e.g. [AS03] for the application of Dirichlet forms and [Doo12] for the classical theory) and for comparing processes (see versions of such results in [Dav89, LPW09]). See e.g. [MR12, Fuk96] for broad introductions to Dirichlet forms and their uses.

The main result of this paper, Theorem 4.3, develops a nonstandard representation theorem for Dirichlet forms. We give a very informal summary here. Recall that the Dirichlet form of a transition kernel gg with stationary measure π\pi on state space XX can be applied to functions fL2(π)f\in L^{2}(\pi) via the formula:

g(f,f)=12xXyX[f(x)f(y)]2g(x,1,dy)π(dx).\displaystyle\mathscr{E}_{g}(f,f)=\frac{1}{2}\int_{x\in X}\int_{y\in X}[f(x)-f(y)]^{2}g(x,1,\mathrm{d}y)\pi(\mathrm{d}x). (1.1)

Theorem 4.3 says that, under appropriate conditions, this Dirichlet form can be well-approximated by an appropriate hyperfinite sum. With notation to be fixed later in the paper, the main conclusion of the theorem is written

g(f,f)12s,tSX[F(s)F(t)]2Hs({t})Π({s}),\displaystyle\mathscr{E}_{g}(f,f)\approx\frac{1}{2}\sum_{s,t\in S_{X}}[F(s)-F(t)]^{2}H_{s}(\{t\})\Pi(\{s\}), (1.2)

where SXS_{X} is a set meant to approximate XX, FF is a function meant to approximate ff, and HsH_{s} and Π\Pi are measures meant to approximate g(s,1,)g(s,1,\cdot) and π\pi respectively; the symbol “\approx”, defined precisely in Section 2, means that the two quantities are equal up to an infinitesimal.

The immediate motivation for a result such as (1.2) is that the hyperfinite sum over SXS_{X} behaves a great deal like a more familiar finite sum. This allows one to directly translate certain results about processes on discrete spaces to results about processes on more general spaces, without worrying about translating each step in the associated proof. Such direct translation can lead to substantially simpler and shorter proofs, and sometimes allows one to avoid assumptions, e.g. by bypassing differentiability assumptions that would be needed to translate the proof steps but are not needed to translate the theorem statement.
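To make the finite analogy concrete, the following sketch computes the Dirichlet form (1.1) for a small finite chain, where the integrals reduce to sums. The birth-death chain, stationary distribution, and test function below are made-up illustrations, not taken from the paper.

```python
# Finite analogue of the Dirichlet form (1.1): for a transition matrix g
# on a finite state space with stationary distribution pi,
#   E(f, f) = (1/2) * sum_{x,y} (f(x) - f(y))^2 * g[x][y] * pi[x].
# The birth-death chain below is a made-up example for illustration.
g = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]          # stationary: satisfies pi g = pi
f = [0.0, 1.0, 3.0]             # any real-valued function on the states

# Sanity check that pi is stationary: (pi g)(y) = pi(y) for every y.
for y in range(3):
    assert abs(sum(pi[x] * g[x][y] for x in range(3)) - pi[y]) < 1e-12

dirichlet = 0.5 * sum((f[x] - f[y]) ** 2 * g[x][y] * pi[x]
                      for x in range(3) for y in range(3))
print(dirichlet)                # 0.625 for this chain and this f
```

For a reversible chain this agrees with the quadratic form of the discrete Laplacian, which is the sense in which (1.1) generalizes the Laplacian.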

As an illustration of Theorem 4.3 and how it may be applied, we give a simple translation of the well-known comparison theorem for Markov chains, first proved in [DSC93]. Previous generalizations of this result have been obtained by purely standard methods (see [Yue00]). However, our version offers several important improvements: for example, we remove various differentiability conditions (which in fact often fail in practice) and obtain substantially sharper estimates in some basic but important test cases. See Section 6 for a brief application and discussion, and our forthcoming companion paper focused on these applications for more details.

1.1. Nonstandard Analysis and the Transfer Program

We view the main contribution of this paper as the development of a nonstandard version of the classical Dirichlet form. This paper is part of an ongoing effort to provide nonstandard analogues of a variety of important objects in probability theory and statistics: [DRW18], [ADS18], [ADS19], [ADS19a], [DRS17] and [DR18]. The main motivation is the rough observation that many interesting theorems in probability have the following properties:

  • The theorem is initially proved on a discrete (or finite-dimensional) space, and

  • The theorem statement does not seem to rely heavily on the space being discrete or finite-dimensional, but

  • Several steps in the most natural proof do seem to rely heavily on the space being discrete or finite-dimensional.

For some examples, see e.g. [And76], [Kei84], [DRW18], [ADS18], [DRS17] and [DR18]. For results that have this form, it is natural to try to directly translate the theorem statement without needing to go through the details of translating the full proof. This idea of translation is at the heart of nonstandard analysis, where it is formalized in the notion of “transfer.” This basic idea has let us and others make progress on several problems that otherwise appear difficult, including:

  • In [DRW18], we prove the Markov chain ergodic theorem for a large class of continuous time general Markov processes, generalizing the well-known Markov chain ergodic theorem for discrete Markov processes.

  • In [DR18], we show that a decision procedure is extended admissible if and only if it has infinitesimal excess Bayes risk under a nonstandard prior distribution. This result holds in complete generality and generalizes existing complete class theorems.

  • In [ADS18], we show that mixing times and hitting times are asymptotically equivalent for general Markov processes under moderate regularity conditions, generalizing the same result for finite Markov processes in [PS15].

  • In [DRS17], we show that matching priors for specific families of credible sets exist on compact metric spaces, extending a result for finite spaces in [MN16].

As discussed in the introduction, this paper includes another example of the power of this approach: we provide a translation of [LPW09a] that is in many ways more powerful than the extensions in the previously published works [Yue00] and [Yue02].

2. Introduction to Nonstandard Analysis

We briefly introduce the setting and notation of nonstandard analysis. For those who are not familiar with nonstandard analysis, [DRW18] and [DR18] provide introductions tailored to statisticians and probabilists, while [ACH97, Cut+95, WL00] provide thorough general introductions.

We use {{}^{*}} to denote the nonstandard extension map taking elements, sets, functions, relations, etc., to their nonstandard counterparts. In particular, {{}^{*}\mathbb{R}} and {{}^{*}\mathbb{N}} denote the nonstandard extensions of the reals and natural numbers, respectively. An element rr\in{{}^{*}\mathbb{R}} is infinite if |r|>n|r|>n for every nn\in\mathbb{N} and is finite otherwise. An element rr\in{{}^{*}\mathbb{R}} is infinitesimal if r=0r=0 or if |r|1|r|^{-1} is infinite. For r,sr,s\in{{}^{*}\mathbb{R}}, we use the notation rsr\approx s as shorthand for the statement “|rs||r-s| is infinitesimal,” and similarly we use rsr\gtrapprox s as shorthand for the statement “either rsr\geq s or rsr\approx s.”

Given a topological space (X,𝒯)(X,\mathcal{T}), the monad of a point xXx\in X is the set U𝒯:xUU\bigcap_{U\in\mathcal{T}\,:\,x\in U}{{}^{*}U}. An element xXx\in{{}^{*}X} is near-standard if it is in the monad of some yXy\in X. We say yy is the standard part of xx and write y=𝗌𝗍(x)y=\mathsf{st}(x); such yy is unique when XX is Hausdorff. We use NS(X)\mathrm{NS}({{}^{*}X}) to denote the collection of near-standard elements of X{{}^{*}X} and we say NS(X)\mathrm{NS}({{}^{*}X}) is the near-standard part of X{{}^{*}X}. The standard part map 𝗌𝗍\mathsf{st} is a function from NS(X)\mathrm{NS}({{}^{*}X}) to XX, taking near-standard elements to their standard parts. In both cases, the notation elides the underlying space XX and the topology 𝒯\mathcal{T}, because these will always be clear from context. For a metric space (X,d)(X,d), two elements x,yXx,y\in{{}^{*}X} are infinitely close if d(x,y)0{{}^{*}d}(x,y)\approx 0. An element xXx\in{{}^{*}X} is near-standard if and only if it is infinitely close to some yXy\in X. An element xXx\in{{}^{*}X} is finite if there exists yXy\in X such that d(x,y){{}^{*}d}(x,y) is finite, and is infinite otherwise.

Let XX be a topological space endowed with Borel σ\sigma-algebra [X]\mathcal{B}[X]. An internal probability measure μ\mu on (X,[X])({{}^{*}X},{{}^{*}\mathcal{B}[X]}) is an internal function from [X][0,1]{{}^{*}\mathcal{B}[X]}\to{{}^{*}[0,1]} such that

  1. (1)

    μ()=0\mu(\emptyset)=0;

  2. (2)

    μ(X)=1\mu({{}^{*}X})=1; and

  3. (3)

    μ\mu is hyperfinitely additive.

The Loeb space of the internal probability space (X,[X],μ)({{}^{*}X},{{}^{*}\mathcal{B}[X]},\mu) is a countably additive probability space (X,[X]¯,μ¯)({{}^{*}X},\overline{{{}^{*}\mathcal{B}[X]}},\overline{\mu}) such that

[X]¯={AX|(ϵ>0)(Ai,Ao[X])(AiAAoμ(AoAi)<ϵ)}\displaystyle\overline{{{}^{*}\mathcal{B}[X]}}=\{A\subset{{}^{*}X}|(\forall\epsilon>0)(\exists A_{i},A_{o}\in{{}^{*}\mathcal{B}[X]})(A_{i}\subset A\subset A_{o}\wedge\mu(A_{o}\setminus A_{i})<\epsilon)\} (2.1)

and

μ¯(A)=sup{𝗌𝗍(μ(Ai))|AiA,Ai[X]}=inf{𝗌𝗍(μ(Ao))|AoA,Ao[X]}.\displaystyle\overline{\mu}(A)=\operatorname*{\mathrm{sup}\vphantom{\mathrm{infsup}}}\{\mathsf{st}(\mu(A_{i}))|A_{i}\subset A,A_{i}\in{{}^{*}\mathcal{B}[X]}\}=\operatorname*{\mathrm{inf}\vphantom{\mathrm{infsup}}}\{\mathsf{st}(\mu(A_{o}))|A_{o}\supset A,A_{o}\in{{}^{*}\mathcal{B}[X]}\}. (2.2)

Every standard model is closely connected to its nonstandard extension via the transfer principle, which asserts that a first order statement is true in the standard model if and only if it is true in the nonstandard model. Finally, given a cardinal number κ\kappa, a nonstandard model is called κ\kappa-saturated if the following condition holds: for every family \mathcal{F} of internal sets with cardinality less than κ\kappa and the finite intersection property, the total intersection of \mathcal{F} is non-empty. In this paper, we assume our nonstandard model is as saturated as we need (see e.g. [ACH97] for the existence of κ\kappa-saturated nonstandard models for every uncountable cardinal κ\kappa).

3. Hyperfinite Markov Processes

We start this section by giving an overview of the theory of hyperfinite Markov processes developed in [DRW18], [ADS18] and [ADS19a]. Intuitively, hyperfinite Markov processes behave like finite Markov processes but can be used to represent general Markov processes under moderate conditions. For the remainder of this paper, we assume that the state space XX is always endowed with its Borel σ\sigma-algebra [X]\mathcal{B}[X] unless otherwise mentioned.

Definition 3.1.

A general hyperfinite Markov chain on X{{}^{*}X} is characterized by the following four ingredients:

  1. (1)

    A hyperfinite state space SXS\subset{{}^{*}X}.

  2. (2)

    A hyperfinite time line T=T={{}^{*}\mathbb{N}}.

  3. (3)

    A set {vi:iS}\{v_{i}:i\in S\} of non-negative hyperreals such that iSvi=1\sum_{i\in S}v_{i}=1.

  4. (4)

    A set {Gij}i,jS\{G_{ij}\}_{i,j\in S} consisting of non-negative hyperreals with jSGij=1\sum_{j\in S}G_{ij}=1 for each iSi\in S.

The state space SS naturally inherits the metric of X{{}^{*}X}. For every i,jSi,j\in S, GijG_{ij} is the internal probability of going from ii to jj. The following theorem shows the existence of hyperfinite Markov processes.

Theorem 3.2 ([DRW18]).

Let SXS\subset{{}^{*}X} be a non-empty hyperfinite state space, and let {vi}iS\{v_{i}\}_{i\in S} and {Gij}i,jS\{G_{ij}\}_{i,j\in S} be as in Definition 3.1. Then there exists a σ{{}^{*}\sigma}-additive probability triple (Ω,𝒜,P)(\Omega,\mathcal{A},P) with an internal stochastic process {Xt}tT\{X_{t}\}_{t\in T} defined on (Ω,𝒜,P)(\Omega,\mathcal{A},P) such that

P(X0=i0,X1=i1,,Xt=it)=vi0Gi0i1Git1it\displaystyle P(X_{0}=i_{0},X_{1}=i_{1},\dotsc,X_{t}=i_{t})=v_{i_{0}}G_{i_{0}i_{1}}\dotsm G_{i_{t-1}i_{t}} (3.1)

for all tt\in{{}^{*}\mathbb{N}} and i0,,itSi_{0},\dotsc,i_{t}\in S.

The proof of Theorem 3.2 essentially follows from transferring the existence theorem for finite Markov processes. For every iSi\in S and every internal ASA\subset S, define Gi(A)=jAGijG_{i}(A)=\sum_{j\in A}G_{ij}. For nTn\in T, define Gi(n+1)(A)=jSGj(n)(A)GijG_{i}^{(n+1)}(A)=\sum_{j\in S}G_{j}^{(n)}(A)G_{ij}, with Gi(1)=GiG_{i}^{(1)}=G_{i}. It is easy to see that Gi(n)()G_{i}^{(n)}(\cdot) is an internal probability measure on SS for every iSi\in S and nTn\in T. We call {Gi(n)()}\{G_{i}^{(n)}(\cdot)\} the nn-step internal transition kernel of the underlying hyperfinite Markov process. Just like a standard Markov process, a hyperfinite Markov process is characterized by its 11-step internal transition kernel.
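In the finite case, the n-step recursion above is just iterated matrix multiplication; the following sketch (with a made-up two-state kernel) spells this out.

```python
# Finite analogue of the n-step kernel recursion
#   G_i^{(n+1)}(A) = sum_j G_j^{(n)}(A) * G_{ij}:
# in matrix terms, the (n+1)-step kernel is G applied to the n-step kernel.
G = [[0.9, 0.1],
     [0.2, 0.8]]

def n_step(G, n):
    """Return the n-step transition matrix via the recursion above."""
    size = len(G)
    # 0-step kernel: the identity (stay where you are).
    Gn = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        # G^{(n+1)}_{ij} = sum_k G_{ik} * G^{(n)}_{kj}
        Gn = [[sum(G[i][k] * Gn[k][j] for k in range(size))
               for j in range(size)] for i in range(size)]
    return Gn

G3 = n_step(G, 3)
# Each n-step kernel is again a probability kernel: rows sum to 1.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in G3)
```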

As in [DRW18], [ADS18] and [ADS19a], we shall construct hyperfinite Markov processes from standard Markov processes. Let {g(x,1,)}xX\{g(x,1,\cdot)\}_{x\in X} be the transition kernel of a Markov process on XX. We now define the hyperfinite representation of the state space XX.

Definition 3.3.

Let (X,d)(X,d) be a compact metric space. Let δ>0\delta\in{{}^{*}\mathbb{R}_{>0}} be an infinitesimal. A δ\delta-hyperfinite representation of XX is a tuple (S,{B(s)}sS)(S,\{B(s)\}_{s\in S}) such that

  1. (1)

    SS is a hyperfinite subset of X{{}^{*}X}.

  2. (2)

    sB(s)[X]s\in B(s)\in{{}^{*}\mathcal{B}[X]} for every sSs\in S and sSB(s)=X\bigcup_{s\in S}B(s)={{}^{*}X}.

  3. (3)

    For every sSs\in S, the diameter of B(s)B(s) is no greater than δ\delta.

  4. (4)

    B(s1)B(s2)=B(s_{1})\cap B(s_{2})=\emptyset for every s1s2Ss_{1}\neq s_{2}\in S.

The set SS is called the base set of the hyperfinite representation. For every xXx\in{{}^{*}X}, we use sxs_{x} to denote the unique element in SS such that xB(sx)x\in B(s_{x}).

We say (S,{B(s)}sS)(S,\{B(s)\}_{s\in S}) is a hyperfinite representation of XX if (S,{B(s)}sS)(S,\{B(s)\}_{s\in S}) is a δ\delta-hyperfinite representation of XX for some infinitesimal δ\delta. Hyperfinite representations always exist.

Theorem 3.4 ([ADS18]).

Let XX be a compact metric space. Then, for every positive infinitesimal δ\delta, there exists a δ\delta-hyperfinite representation (Sδ,{B(s)}sSδ)(S_{\delta},\{B(s)\}_{s\in S_{\delta}}) of XX.
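To build intuition for Definition 3.3, here is a finite-mesh analogue on X = [0, 1]: in the nonstandard setting the mesh δ is infinitesimal and the number of cells hyperfinite, while below both are merely small and large. The particular cells and base points are our own illustrative choices.

```python
# Finite-mesh analogue of Definition 3.3 on X = [0, 1]: partition X into
# K disjoint cells B(s) of diameter at most delta, each containing its
# base point s.
delta = 0.1
K = 10
S = [j * delta for j in range(K)]        # base points: left cell endpoints

def s_x(x):
    """The unique base point s with x in B(s) = [s, s + delta)."""
    return S[min(int(x / delta), K - 1)]  # the top cell is closed at 1.0

# Every point of X lies within delta of its base point, as in item (3).
assert all(abs(x / 1000.0 - s_x(x / 1000.0)) <= delta + 1e-12
           for x in range(1001))
```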

We fix such a hyperfinite representation of XX and denote it by SS in the remainder of this section. We now define a hyperfinite Markov transition kernel on SS from {g(x,1,)}xX\{g(x,1,\cdot)\}_{x\in X}. For i,jSi,j\in S, define Gij=g(i,1,B(j))G_{ij}={{}^{*}g}(i,1,B(j)). For every internal set ASA\subset S, define Gi(A)=jAGijG_{i}(A)=\sum_{j\in A}G_{ij}. It is straightforward to check that {Gij}i,jS\{G_{ij}\}_{i,j\in S} defines the one-step internal transition kernel of some hyperfinite Markov process. In order to establish a connection between {Gij}i,jS\{G_{ij}\}_{i,j\in S} and {g(x,1,)}xX\{g(x,1,\cdot)\}_{x\in X}, we impose the following nonstandard condition on {g(x,1,)}xX\{{{}^{*}g}(x,1,\cdot)\}_{x\in{{}^{*}X}}.

Condition SSF.

Let (S,{B(s)}sS)(S,\{B(s)\}_{s\in S}) be a hyperfinite representation of XX. The transition kernel {g(x,1,)}xX\{g(x,1,\cdot)\}_{x\in X} satisfies the S-strong Feller property if, for every xXx\in{{}^{*}X} and every internal ASA\subset S, we have

|g(x,1,aAB(a))g(sx,1,aAB(a))|0,\displaystyle|{{}^{*}g}(x,1,\bigcup_{a\in A}B(a))-{{}^{*}g}(s_{x},1,\bigcup_{a\in A}B(a))|\approx 0, (3.2)

where sxs_{x} is the unique point in SS with xB(sx)x\in B(s_{x}).

For two (internal) probability measures P1P_{1} and P2P_{2}, we use P1P2\parallel P_{1}-P_{2}\parallel to denote the *total variation distance between P1P_{1} and P2P_{2}. SSF is a consequence of the following classical condition on {g(x,1,)}xX\{g(x,1,\cdot)\}_{x\in X}.

Condition DSF.

The transition kernel {g(x,1,)}xX\{g(x,1,\cdot)\}_{x\in X} satisfies the strong Feller property if for every xXx\in X and every ϵ>0\epsilon>0 there exists δ>0\delta>0 such that

(yX)(d(x,y)<δg(x,1,)g(y,1,)<ϵ).\displaystyle(\forall y\in X)(d(x,y)<\delta\implies\parallel g(x,1,\cdot)-g(y,1,\cdot)\parallel<\epsilon). (3.3)

Suppose {g(x,1,)}xX\{g(x,1,\cdot)\}_{x\in X} is a reversible transition kernel with stationary distribution π\pi. Define Π\Pi on (S,{B(s)}sS)(S,\{B(s)\}_{s\in S}) by letting Π({a})=π(B(a))\Pi(\{a\})={{}^{*}\pi}(B(a)) for every aSa\in S. The internal transition kernel {Gi()}iS\{G_{i}(\cdot)\}_{i\in S} may not be *reversible with respect to Π\Pi. In the following theorem, we construct an internal transition kernel {Hi()}iS\{H_{i}(\cdot)\}_{i\in S}, which is infinitesimally close to {Gi()}iS\{G_{i}(\cdot)\}_{i\in S} and is *reversible with respect to Π\Pi.

Theorem 3.5 ([ADS19a]).

Let {g(x,1,)}xX\{g(x,1,\cdot)\}_{x\in X} be a transition kernel on XX. Suppose {g(x,1,)}xX\{g(x,1,\cdot)\}_{x\in X} is reversible with stationary measure π\pi and satisfies SSF. For every i,jSi,j\in S, define

Hi({j})=B(i)g(x,1,B(j))π(dx)π(B(i))\displaystyle H_{i}(\{j\})=\frac{\int_{B(i)}{{}^{*}g}(x,1,B(j)){{}^{*}\pi}(\mathrm{d}x)}{{{}^{*}\pi}(B(i))} (3.4)

if π(B(i))0{{}^{*}\pi}(B(i))\neq 0. Define Hij=Gij=g(i,1,B(j))H_{ij}=G_{ij}={{}^{*}g}(i,1,B(j)) if π(B(i))=0{{}^{*}\pi}(B(i))=0.

  1. (1)

    {Hi()}iS\{H_{i}(\cdot)\}_{i\in S} defines a one-step internal transition kernel.

  2. (2)

    Π\Pi is a *stationary distribution for {Hi()}iS\{H_{i}(\cdot)\}_{i\in S}.

  3. (3)

    {Hi()}iS\{H_{i}(\cdot)\}_{i\in S} is *reversible with respect to Π\Pi.

  4. (4)

    maxiSGi(t)()Hi(t)()0\operatorname*{\mathrm{max}\vphantom{\mathrm{infsup}}}_{i\in S}\parallel G^{(t)}_{i}(\cdot)-H^{(t)}_{i}(\cdot)\parallel\approx 0 for every tt\in\mathbb{N}.

We shall use the notations {g(x,1,)}xX\{g(x,1,\cdot)\}_{x\in X}, {Gi()}iS\{G_{i}(\cdot)\}_{i\in S} and {Hi()}iS\{H_{i}(\cdot)\}_{i\in S} as defined in this section for the rest of the paper.
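The averaging in (3.4) has a familiar finite counterpart: lumping a reversible fine-grained chain into cells and averaging the kernel against the stationary measure yields a chain that is reversible for the cell masses, mirroring conclusions (2)-(3) of Theorem 3.5. The sketch below checks this on a made-up Metropolis chain with six states and three cells.

```python
# Finite analogue of the averaging in Theorem 3.5: start from a reversible
# fine-grained kernel p with stationary measure pi, lump the states into
# cells, and define H(I, J) as the pi-weighted average over I of the
# probability of landing in J.  All numbers below are made up.
N = 6
pi = [0.05, 0.10, 0.15, 0.20, 0.25, 0.25]

# Metropolis chain with uniform proposal: reversible w.r.t. pi by design.
p = [[min(1.0, pi[y] / pi[x]) / N if x != y else 0.0 for y in range(N)]
     for x in range(N)]
for x in range(N):
    p[x][x] = 1.0 - sum(p[x])            # leftover mass on the diagonal

cells = [[0, 1], [2, 3], [4, 5]]
Pi = [sum(pi[x] for x in I) for I in cells]
H = [[sum(pi[x] * sum(p[x][y] for y in J) for x in I) / Pi[i]
      for J in cells] for i, I in enumerate(cells)]

# Detailed balance for the lumped chain: Pi(I) H(I,J) = Pi(J) H(J,I),
# because Pi(I) H(I,J) = sum_{x in I, y in J} pi(x) p(x,y) is symmetric.
for i in range(3):
    for j in range(3):
        assert abs(Pi[i] * H[i][j] - Pi[j] * H[j][i]) < 1e-12
```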

4. Hyperfinite Representation of Dirichlet Form

In this section, we study the relationship between Dirichlet forms for general discrete-time Markov processes and Dirichlet forms for hyperfinite Markov processes.

Let {g(x,1,)}xX\{g(x,1,\cdot)\}_{x\in X} denote the transition kernel of a Markov process and let π\pi denote its stationary distribution. The Dirichlet form associated with gg of a function f:Xf:X\to\mathbb{R} is defined to be

g(f,f)=12xXyX[f(x)f(y)]2g(x,1,dy)π(dx).\displaystyle\mathscr{E}_{g}(f,f)=\frac{1}{2}\int_{x\in X}\int_{y\in X}[f(x)-f(y)]^{2}g(x,1,\mathrm{d}y)\pi(\mathrm{d}x). (4.1)

When the state space XX is finite, the integral in the above equation becomes a finite sum.

For the remainder of this section, assume that the state space XX is compact. Let (S,{B(s)}sS)(S,\{B(s)\}_{s\in S}) denote a hyperfinite representation of XX. Let {g(x,1,)}xX\{g(x,1,\cdot)\}_{x\in X} be a transition kernel on XX with stationary distribution π\pi. We define {Gi()}iS\{G_{i}(\cdot)\}_{i\in S}, {Hi()}iS\{H_{i}(\cdot)\}_{i\in S} and Π\Pi as in Section 3. We now use hyperfinite Dirichlet forms to approximate standard Dirichlet forms. We first quote the following well-known result in nonstandard analysis.

Theorem 4.1 ([Cut+95]).

Let (X,[X],μ)(X,\mathcal{B}[X],\mu) be a Radon probability space with Borel σ\sigma-algebra and let f:Xf:X\to\mathbb{R} be measurable. Then f{{}^{*}f} is a lifting of ff with respect to μ¯\overline{{{}^{*}\mu}}, i.e.

f(x)f(𝗌𝗍(x))\displaystyle{{}^{*}f}(x)\approx f(\mathsf{st}(x)) (4.2)

for μ¯\overline{{{}^{*}\mu}}-almost all xXx\in{{}^{*}X}. Consequently, there is a set YXY\subset{{}^{*}X} with μ¯(Y)=1\overline{{{}^{*}\mu}}(Y)=1 such that for all y1,y2Yy_{1},y_{2}\in Y

(y1y2)(f(y1)f(y2)).\displaystyle(y_{1}\approx y_{2})\implies({{}^{*}f}(y_{1})\approx{{}^{*}f}(y_{2})). (4.3)

Before proving the main result of this section, we quote the following useful lemma.

Lemma 4.2 ([DRW18]).

Let P1P_{1} and P2P_{2} be two internal probability measures on a hyperfinite set SS. Then

P1()P2()supf:S[0,1]|iSP1({i})f(i)iSP2({i})f(i)|,\displaystyle\parallel P_{1}(\cdot)-P_{2}(\cdot)\parallel\geq{{}^{*}\operatorname*{\mathrm{sup}\vphantom{\mathrm{infsup}}}}_{f:S\to{{}^{*}[0,1]}}|\sum_{i\in S}P_{1}(\{i\})f(i)-\sum_{i\in S}P_{2}(\{i\})f(i)|, (4.4)

where P1()P2()=supA(S)|P1(A)P2(A)|\parallel P_{1}(\cdot)-P_{2}(\cdot)\parallel={{}^{*}\operatorname*{\mathrm{sup}\vphantom{\mathrm{infsup}}}}_{A\in\mathcal{I}(S)}|P_{1}(A)-P_{2}(A)| and the function ff ranges over all internal functions.
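Lemma 4.2 has a familiar finite counterpart: the total variation distance dominates the difference of expectations of any [0,1]-valued function, with the indicator of the set where P1 exceeds P2 attaining the bound. The two measures in the sketch below are made up.

```python
# Finite analogue of Lemma 4.2: the total variation distance
#   ||P1 - P2|| = sup_A |P1(A) - P2(A)| = (1/2) sum_i |P1(i) - P2(i)|
# dominates |sum_i P1(i) f(i) - sum_i P2(i) f(i)| for every f with
# values in [0, 1].
P1 = [0.1, 0.4, 0.3, 0.2]
P2 = [0.25, 0.25, 0.25, 0.25]

tv = 0.5 * sum(abs(a - b) for a, b in zip(P1, P2))
# The indicator of {i : P1(i) > P2(i)} attains the sup in (4.4).
f_star = [1.0 if a > b else 0.0 for a, b in zip(P1, P2)]
gap = abs(sum(a * f for a, f in zip(P1, f_star))
          - sum(b * f for b, f in zip(P2, f_star)))
assert abs(gap - tv) < 1e-12
```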

Our main result in this section is:

Theorem 4.3.

Suppose XX is a compact metric space. Let {U1,U2,,UK}[X]\{U_{1},U_{2},\dotsc,U_{K}\}\subset{{}^{*}\mathcal{B}[X]} be a hyperfinite partition of X{{}^{*}X} such that the diameter of UjU_{j} is infinitesimal for each jKj\leq K. Then there exists a hyperfinite representation (SX,{B(s):sSX})(S_{X},\{B(s):s\in S_{X}\}) of XX such that

  1. (1)

    SXS_{X} has internal cardinality KK and {B(s):sSX}={Uj:jK}\{B(s):s\in S_{X}\}=\{U_{j}:j\leq K\}.

  2. (2)

    Suppose SSF holds. Then for every continuous function f:Xf:X\to\mathbb{R}, we have

    g(f,f)12s,tSX[F(s)F(t)]2Hs({t})Π({s})\displaystyle\mathscr{E}_{g}(f,f)\approx\frac{1}{2}\sum_{s,t\in S_{X}}[F(s)-F(t)]^{2}H_{s}(\{t\})\Pi(\{s\}) (4.5)

    where F:SXF:S_{X}\to{{}^{*}\mathbb{R}} is defined as F(s)=f(s)F(s)={{}^{*}f}(s) for every sSXs\in S_{X}.

Proof.

Let {Uj:jK}\{U_{j}:j\leq K\} be a partition of X{{}^{*}X} into *Borel sets of infinitesimal diameter and let kk\in\mathbb{N}. Pick nn\in\mathbb{N} and let f1,,fnf_{1},\dotsc,f_{n} be nn continuous functions from XX to \mathbb{R}. Let

a=max{[fi(x)fi(y)]2:x,yX,1in}.\displaystyle a=\operatorname*{\mathrm{max}\vphantom{\mathrm{infsup}}}\{[f_{i}(x)-f_{i}(y)]^{2}:x,y\in X,1\leq i\leq n\}\in\mathbb{R}. (4.6)

For 1in1\leq i\leq n, let ri(x)=yX[fi(x)fi(y)]2g(x,1,dy)r_{i}(x)=\int_{y\in X}[f_{i}(x)-f_{i}(y)]^{2}g(x,1,\mathrm{d}y). As rir_{i} is a measurable function from XX\to\mathbb{R}, by Theorem 4.1, there exists a Loeb measurable set MiM_{i} with π¯(Mi)=1\overline{{{}^{*}\pi}}(M_{i})=1 such that ri(x)ri(𝗌𝗍(x)){{}^{*}r_{i}}(x)\approx r_{i}(\mathsf{st}(x)) for xMix\in M_{i}. Let M=i=1nMiM=\bigcap_{i=1}^{n}M_{i}. Note that π¯(M)=1\overline{{{}^{*}\pi}}(M)=1. Thus, rk(y)rk(𝗌𝗍(y)){{}^{*}r_{k}}(y)\approx r_{k}(\mathsf{st}(y)) for all 1kn1\leq k\leq n and all yMy\in M.

Let YMY\subset M be a *Borel set such that π(Y)>11ka{{}^{*}\pi}(Y)>1-\frac{1}{ka}. For 1jK1\leq j\leq K, if UjY=U_{j}\cap Y=\emptyset, we pick an arbitrary element sjUjs_{j}\in U_{j}. If UjYU_{j}\cap Y\neq\emptyset, we pick an arbitrary element sjUjYs_{j}\in U_{j}\cap Y. By the internal definition principle, we know that S={sj:jK}S=\{s_{j}:j\leq K\} is a hyperfinite subset of X{{}^{*}X}. For every sSs\in S, we use B(s)B(s) to denote the unique element in {Uj:jK}\{U_{j}:j\leq K\} which contains ss. It is easy to see that (S,{B(s):sS})(S,\{B(s):s\in S\}) is a hyperfinite representation of X{{}^{*}X}. Moreover, SS has internal cardinality KK and {B(s):sS}={Uj:jK}\{B(s):s\in S\}=\{U_{j}:j\leq K\}. For every 1in1\leq i\leq n, let Fi:SF_{i}:S\to{{}^{*}\mathbb{R}} be Fi(s)=fi(s)F_{i}(s)={{}^{*}f_{i}}(s). For every sSs\in S and every 1in1\leq i\leq n, let

Ri(s)=tS[Fi(s)Fi(t)]2Hs({t}).\displaystyle R_{i}(s)=\sum_{t\in S}[F_{i}(s)-F_{i}(t)]^{2}H_{s}(\{t\}). (4.7)

Let Z={sS:B(s)Y}Z=\{s\in S:B(s)\cap Y\neq\emptyset\}. It is straightforward to see that Π(Z)=π(sZB(s))π(Y)>11ka\Pi(Z)={{}^{*}\pi}(\bigcup_{s\in Z}B(s))\geq{{}^{*}\pi}(Y)>1-\frac{1}{ka}.

Claim 4.4.

For every zZz\in Z and every 1in1\leq i\leq n, we have |Ri(z)ri(z)|2k|R_{i}(z)-{{}^{*}r}_{i}(z)|\lessapprox\frac{2}{k} and ri(z)ri(𝗌𝗍(z)){{}^{*}r_{i}}(z)\approx r_{i}(\mathsf{st}(z)).

Proof.

Pick some 1in1\leq i\leq n and pick some zZz\in Z. By construction, we have zYz\in Y. By Theorem 4.1, we have ri(z)ri(𝗌𝗍(z)){{}^{*}r_{i}}(z)\approx r_{i}(\mathsf{st}(z)). Let YiYY_{i}\supset Y be a *Borel subset of X{{}^{*}X} such that g(z,1,Yi)>11ka{{}^{*}g}(z,1,Y_{i})>1-\frac{1}{ka}. We have

ri(z)\displaystyle{{}^{*}r_{i}}(z) =xX[(fi(z)fi(x))2]g(z,1,dx)\displaystyle=\int_{x\in{{}^{*}X}}[({{}^{*}f_{i}}(z)-{{}^{*}f_{i}}(x))^{2}]{{}^{*}g}(z,1,\mathrm{d}x) (4.8)
=xYi[(fi(z)fi(x))2]g(z,1,dx)+xXYi[(fi(z)fi(x))2]g(z,1,dx).\displaystyle=\int_{x\in Y_{i}}[({{}^{*}f_{i}}(z)-{{}^{*}f_{i}}(x))^{2}]{{}^{*}g}(z,1,\mathrm{d}x)+\int_{x\in{{}^{*}X}\setminus Y_{i}}[({{}^{*}f_{i}}(z)-{{}^{*}f_{i}}(x))^{2}]{{}^{*}g}(z,1,\mathrm{d}x). (4.9)

Note that xXYi[(fi(z)fi(x))2]g(z,1,dx)<1k\int_{x\in{{}^{*}X}\setminus Y_{i}}[({{}^{*}f_{i}}(z)-{{}^{*}f_{i}}(x))^{2}]{{}^{*}g}(z,1,\mathrm{d}x)<\frac{1}{k}. Let

Zi={sS:B(s)Yi}.\displaystyle Z_{i}=\{s\in S:B(s)\cap Y_{i}\neq\emptyset\}. (4.10)

Note that Gz(Zi)>11kaG_{z}(Z_{i})>1-\frac{1}{ka} and aa in Eq. 4.6 is an upper bound of [Fi(z)Fi(s)]2[F_{i}(z)-F_{i}(s)]^{2}. Then we have

sS[Fi(z)Fi(s)]2Gz({s})\displaystyle\sum_{s\in S}[F_{i}(z)-F_{i}(s)]^{2}G_{z}(\{s\}) (4.11)
=sZi[Fi(z)Fi(s)]2Gz({s})+sSZi[Fi(z)Fi(s)]2Gz({s}).\displaystyle=\sum_{s\in Z_{i}}[F_{i}(z)-F_{i}(s)]^{2}G_{z}(\{s\})+\sum_{s\in S\setminus Z_{i}}[F_{i}(z)-F_{i}(s)]^{2}G_{z}(\{s\}). (4.12)

Note that sSZi[Fi(z)Fi(s)]2Gz({s})<1k\sum_{s\in S\setminus Z_{i}}[F_{i}(z)-F_{i}(s)]^{2}G_{z}(\{s\})<\frac{1}{k}.

By the continuity of fif_{i}, we have

xYi[(fi(z)fi(x))2]g(z,1,dx)\displaystyle\int_{x\in Y_{i}}[({{}^{*}f_{i}}(z)-{{}^{*}f_{i}}(x))^{2}]{{}^{*}g}(z,1,\mathrm{d}x) (4.13)
=sZixB(s)Yi[(fi(z)fi(x))2]g(z,1,dx)\displaystyle=\sum_{s\in Z_{i}}\int_{x\in B(s)\cap Y_{i}}[({{}^{*}f_{i}}(z)-{{}^{*}f_{i}}(x))^{2}]{{}^{*}g}(z,1,\mathrm{d}x) (4.14)
sZixB(s)Yi[(fi(z)fi(s))2]g(z,1,dx)\displaystyle\approx\sum_{s\in Z_{i}}\int_{x\in B(s)\cap Y_{i}}[({{}^{*}f_{i}}(z)-{{}^{*}f_{i}}(s))^{2}]{{}^{*}g}(z,1,\mathrm{d}x) (4.15)
=sZi[(Fi(z)Fi(s))2]g(z,1,B(s)Yi).\displaystyle=\sum_{s\in Z_{i}}[(F_{i}(z)-F_{i}(s))^{2}]{{}^{*}g}(z,1,B(s)\cap Y_{i}). (4.16)

As sZi[(Fi(z)Fi(s))2]Gz({s})sZi[(Fi(z)Fi(s))2]g(z,1,B(s)Yi)<1k\sum_{s\in Z_{i}}[(F_{i}(z)-F_{i}(s))^{2}]G_{z}(\{s\})-\sum_{s\in Z_{i}}[(F_{i}(z)-F_{i}(s))^{2}]{{}^{*}g}(z,1,B(s)\cap Y_{i})<\frac{1}{k}, we have |ri(z)sS[(Fi(z)Fi(s))2]Gz({s})|2k|{{}^{*}r_{i}}(z)-\sum_{s\in S}[(F_{i}(z)-F_{i}(s))^{2}]G_{z}(\{s\})|\lessapprox\frac{2}{k}.

By Theorem 3.5 and Lemma 4.2, we have

sS[Fi(z)Fi(s)]2Gz({s})=asS[Fi(z)Fi(s)]2aGz({s})sS[Fi(z)Fi(s)]2Hz({s})\displaystyle\sum_{s\in S}[F_{i}(z)-F_{i}(s)]^{2}G_{z}(\{s\})=a\sum_{s\in S}\frac{[F_{i}(z)-F_{i}(s)]^{2}}{a}G_{z}(\{s\})\approx\sum_{s\in S}[F_{i}(z)-F_{i}(s)]^{2}H_{z}(\{s\}) (4.17)

Hence, we have |Ri(z)ri(z)|2k|R_{i}(z)-{{}^{*}r}_{i}(z)|\lessapprox\frac{2}{k} for every zZz\in Z and every 1in1\leq i\leq n. ∎

For 1in1\leq i\leq n, let S(Fi,Fi)=12sSRi(s)Π({s})\mathscr{R}_{S}(F_{i},F_{i})=\frac{1}{2}\sum_{s\in S}R_{i}(s)\Pi(\{s\}).

Claim 4.5.

For 1in1\leq i\leq n, we have |g(fi,fi)S(Fi,Fi)|<5k|\mathscr{E}_{g}(f_{i},f_{i})-\mathscr{R}_{S}(F_{i},F_{i})|<\frac{5}{k}.

Proof.

Pick some 1in1\leq i\leq n. By definition, we have

2g(fi,fi)\displaystyle 2\mathscr{E}_{g}(f_{i},f_{i}) =xXri(x)π(dx)\displaystyle=\int_{x\in X}r_{i}(x)\pi(\mathrm{d}x) (4.18)
=xXri(x)π(dx)\displaystyle=\int_{x\in{{}^{*}X}}{{}^{*}r_{i}}(x){{}^{*}\pi}(\mathrm{d}x) (4.19)
=xYri(x)π(dx)+xXYri(x)π(dx).\displaystyle=\int_{x\in Y}{{}^{*}r_{i}}(x){{}^{*}\pi}(\mathrm{d}x)+\int_{x\in{{}^{*}X}\setminus Y}{{}^{*}r_{i}}(x){{}^{*}\pi}(\mathrm{d}x). (4.20)

Note that xXYri(x)π(dx)<1k\int_{x\in{{}^{*}X}\setminus Y}{{}^{*}r_{i}}(x){{}^{*}\pi}(\mathrm{d}x)<\frac{1}{k}.

On the other hand, we have

2S(Fi,Fi)\displaystyle 2\mathscr{R}_{S}(F_{i},F_{i}) =sSRi(s)Π({s})\displaystyle=\sum_{s\in S}R_{i}(s)\Pi(\{s\}) (4.21)
=sZRi(s)Π({s})+sSZRi(s)Π({s}).\displaystyle=\sum_{s\in Z}R_{i}(s)\Pi(\{s\})+\sum_{s\in S\setminus Z}R_{i}(s)\Pi(\{s\}). (4.22)

Note that sSZRi(s)Π({s})<1k\sum_{s\in S\setminus Z}R_{i}(s)\Pi(\{s\})<\frac{1}{k}.

We now compare terms xYri(x)π(dx)\int_{x\in Y}{{}^{*}r_{i}}(x){{}^{*}\pi}(\mathrm{d}x) and sZRi(s)Π({s})\sum_{s\in Z}R_{i}(s)\Pi(\{s\}). Note that sZB(s)Y\bigcup_{s\in Z}B(s)\supset Y and ri{{}^{*}r_{i}} is S-continuous in YY. By Claim 4.4, we have

|xYri(x)π(dx)sZRi(s)π(B(s)Y)|\displaystyle|\int_{x\in Y}{{}^{*}r_{i}}(x){{}^{*}\pi}(\mathrm{d}x)-\sum_{s\in Z}R_{i}(s){{}^{*}\pi}(B(s)\cap Y)| (4.23)
=|sZxB(s)Yri(x)π(dx)sZRi(s)π(B(s)Y)|\displaystyle=|\sum_{s\in Z}\int_{x\in B(s)\cap Y}{{}^{*}r_{i}}(x){{}^{*}\pi}(\mathrm{d}x)-\sum_{s\in Z}R_{i}(s){{}^{*}\pi}(B(s)\cap Y)| (4.24)
|sZri(s)π(B(s)Y)sZRi(s)π(B(s)Y)|2k\displaystyle\approx|\sum_{s\in Z}{{}^{*}r_{i}}(s){{}^{*}\pi}(B(s)\cap Y)-\sum_{s\in Z}R_{i}(s){{}^{*}\pi}(B(s)\cap Y)|\lessapprox\frac{2}{k} (4.25)

Note that

sZRi(s)Π({s})sZRi(s)π(B(s)Y)\displaystyle\sum_{s\in Z}R_{i}(s)\Pi(\{s\})-\sum_{s\in Z}R_{i}(s){{}^{*}\pi}(B(s)\cap Y) (4.27)
=sZRi(s)[π(B(s))π(B(s)Y)]<1k.\displaystyle=\sum_{s\in Z}R_{i}(s)[{{}^{*}\pi}(B(s))-{{}^{*}\pi}(B(s)\cap Y)]<\frac{1}{k}. (4.28)

Thus, we have |g(fi,fi)S(Fi,Fi)|4k<5k|\mathscr{E}_{g}(f_{i},f_{i})-\mathscr{R}_{S}(F_{i},F_{i})|\lessapprox\frac{4}{k}<\frac{5}{k}. ∎

Let 𝒞\mathcal{C} denote the collection of all continuous functions from XX to \mathbb{R} and let ϕfk(S)\phi_{f}^{k}(S) be the conjunction of the following formulas:

  1. (1)

    (S,{Uj:jK})(S,\{U_{j}:j\leq K\}) is a hyperfinite representation of X{{}^{*}X}.

  2. (2)

    |g(f,f)12s,tS[F(s)F(t)]2Gs({t})Π({s})|<5k|{{}^{*}\mathscr{E}_{g}(f,f)}-\frac{1}{2}\sum_{s,t\in S}[F(s)-F(t)]^{2}G_{s}(\{t\})\Pi(\{s\})|<\frac{5}{k} where F(s)=f(s)F(s)={{}^{*}f}(s) for every sSs\in S.

The family ={ϕfk:f𝒞,k}\mathscr{B}=\{\phi_{f}^{k}:f\in\mathcal{C},k\in\mathbb{N}\} is finitely satisfiable. By saturation, there exists SXS_{X} such that every formula in \mathscr{B} is satisfied simultaneously. Such (SX,{Uj:jK})(S_{X},\{U_{j}:j\leq K\}) is the desired hyperfinite representation. ∎

The hyperfinite Dirichlet form associated with HH of an internal function F:SXF:S_{X}\to{{}^{*}\mathbb{R}} is defined to be

H(F,F)=12s,tSX[F(s)F(t)]2Hs({t})Π({s}).\displaystyle\mathscr{F}_{H}(F,F)=\frac{1}{2}\sum_{s,t\in S_{X}}[F(s)-F(t)]^{2}H_{s}(\{t\})\Pi(\{s\}). (4.29)

The results in this section show that, under moderate conditions, we can use the hyperfinite Dirichlet form to approximate the standard Dirichlet form.
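A finite-mesh analogue of this approximation can be checked numerically. In the sketch below we take X = [0, 1] with the toy kernel g(x, 1, dy) = dy (jump to a uniform point, so π is uniform and the chain is trivially reversible) and f(x) = x; the exact Dirichlet form is 1/12, and the partition sum of the form (4.29) approaches it as the mesh shrinks. The kernel and f are our own illustrative choices.

```python
# Finite-mesh check of the approximation in Theorem 4.3 on X = [0, 1]:
# kernel g(x, 1, dy) = dy, pi uniform, f(x) = x.  The exact Dirichlet
# form is (1/2) * int int (x - y)^2 dy dx = 1/12.
def partition_sum(K):
    s = [(j + 0.5) / K for j in range(K)]    # cell base points (midpoints)
    # F = f restricted to S; H_s({t}) = 1/K (uniform); Pi({s}) = 1/K.
    return 0.5 * sum((s[i] - s[j]) ** 2 * (1.0 / K) * (1.0 / K)
                     for i in range(K) for j in range(K))

exact = 1.0 / 12.0
errors = [abs(partition_sum(K) - exact) for K in (4, 16, 64)]
assert errors[0] > errors[1] > errors[2]     # error shrinks with the mesh
```

For this kernel one can compute the partition sum in closed form as (1 - 1/K^2)/12, so the error is exactly 1/(12 K^2).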

5. Nonstandard Characterization of Density Functions

In this section, we study density functions of transition kernels and stationary distributions by studying their hyperfinite counterparts. The material presented in this section is closely related to [Zim05]. We start by introducing the notion of S-integrability from nonstandard analysis.

Definition 5.1.

Let (Ω,𝒜,P)(\Omega,\mathcal{A},P) be an internal probability space and let F:ΩF:\Omega\to{{}^{*}\mathbb{R}} be an internally integrable function such that 𝗌𝗍(F)\mathsf{st}(F) exists P¯\overline{P}-almost surely. Then FF is S-integrable with respect to PP if 𝗌𝗍(F)\mathsf{st}({F}) is P¯\overline{P}-integrable, and FdP𝗌𝗍(F)dP¯{{}^{*}\int F\mathrm{d}P}\approx\int\mathsf{st}({F})\mathrm{d}\overline{P}.

The following theorem provides several ways to verify the S-integrability of an internal function FF.

Theorem 5.2 ([ACH97]).

Suppose (Ω,𝒜,P)(\Omega,\mathcal{A},P) is an internal probability space, and F:ΩF:\Omega\to{{}^{*}\mathbb{R}} is an internally integrable function such that 𝗌𝗍(F)\mathsf{st}({F}) exists P¯\overline{P}-almost surely. Then the following are equivalent:

  1. (1)

    𝗌𝗍(|F|dP)\mathsf{st}({\int|F|\mathrm{d}P}) exists and equals limn𝗌𝗍(|Fn|dP)\operatorname*{\mathrm{lim}\vphantom{\mathrm{infsup}}}_{n\to\infty}\mathsf{st}({\int|F_{n}|\mathrm{d}P}), where for nn\in\mathbb{N}, Fn=min{F,n}F_{n}=\operatorname*{\mathrm{min}\vphantom{\mathrm{infsup}}}\{F,n\} when F0F\geq 0 and Fn=max{F,n}F_{n}=\operatorname*{\mathrm{max}\vphantom{\mathrm{infsup}}}\{F,-n\} when F0F\leq 0.

  2. (2)

    For every infinite K>0K>0, |F|>K|F|dP0\int_{|F|>K}|F|\mathrm{d}P\approx 0.

  3. (3)

    𝗌𝗍(|F|dP)\mathsf{st}({\int|F|\mathrm{d}P}) exists, and for every BB with P(B)0P(B)\approx 0, we have B|F|dP0\int_{B}|F|\mathrm{d}P\approx 0.

  4. (4)

    FF is S-integrable with respect to PP.
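Condition (2) is the nonstandard analogue of uniform integrability: the mass of |F||F| above any infinite threshold is negligible. A standard-world sketch (Monte Carlo, with an illustrative Exp(1) integrand of our own choosing) shows the tail integral |F|>K|F|dP\int_{|F|>K}|F|\mathrm{d}P shrinking as the truncation level KK grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def tail_mass(samples, K):
    """Monte Carlo estimate of the tail integral of |F| above level K."""
    s = np.abs(samples)
    return np.mean(np.where(s > K, s, 0.0))

# F integrable (Exp(1)): the tail integral vanishes as K grows, the
# standard-world shadow of condition (2) of Theorem 5.2.
F = rng.exponential(1.0, size=200_000)
masses = [tail_mass(F, K) for K in (1, 5, 10, 20)]
assert all(m1 >= m2 for m1, m2 in zip(masses, masses[1:]))  # non-increasing
assert masses[-1] < 1e-3
```

For the internal setting, "for every infinite KK" replaces "as KK\to\infty", which is exactly why S-integrability transfers limit arguments into single hyperfinite statements.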

For the rest of this section, let YY be a compact metric space and let {Ai:1iT}\{A_{i}:1\leq i\leq T\} be a hyperfinite partition of Y{{}^{*}Y} such that the diameter for AiA_{i} is infinitesimal for every iTi\leq T.

Definition 5.3.

Let P,QP,Q be two probability measures on (Y,[Y])(Y,\mathcal{B}[Y]). QQ dominates PP if for each ϵ>0\epsilon>0, there exists δ>0\delta>0 such that P(C)<ϵP(C)<\epsilon for all C[Y]C\in\mathcal{B}[Y] with Q(C)<δQ(C)<\delta. Let 𝒫\mathscr{P} be a family of probability measures on (Y,[Y])(Y,\mathcal{B}[Y]). Then 𝒫\mathscr{P} is uniformly dominated by QQ if, for every ϵ>0\epsilon>0, there exists δ>0\delta>0 such that P(C)<ϵP(C)<\epsilon for all P𝒫P\in\mathscr{P} and all C[Y]C\in\mathcal{B}[Y] with Q(C)<δQ(C)<\delta.

Using the nonstandard characterization of domination, QQ dominates PP if and only if Q(A)0{{}^{*}Q}(A)\approx 0 implies P(A)0{{}^{*}P}(A)\approx 0 for A[Y]A\in{{}^{*}\mathcal{B}[Y]}. We now mimic proofs in [Zim05] to give a useful nonstandard characterization of density functions.

Lemma 5.4 ([Zim05]).

Let P,QP,Q be two probability measures on (Y,[Y])(Y,\mathcal{B}[Y]). Suppose PP is dominated by QQ. Then the function ϕ:Y\phi:{{}^{*}Y}\to{{}^{*}\mathbb{R}} defined by

ϕ(x)=i=1TP(Ai)Q(Ai)𝟙Ai(x)\displaystyle\phi(x)=\sum_{i=1}^{T}\frac{{{}^{*}P}(A_{i})}{{{}^{*}Q}(A_{i})}\mathbbm{1}_{A_{i}}(x) (5.1)

is S-integrable with respect to QQ.

Proof.

It is straightforward to verify that Yϕ(x)Q(dx)=1\int_{{{}^{*}Y}}\phi(x){{}^{*}Q}(\mathrm{d}x)=1. As ϕ(x)0\phi(x)\geq 0, we know that ϕ(x)NS()\phi(x)\in\mathrm{NS}({{}^{*}\mathbb{R}}) for Q¯\overline{{{}^{*}Q}}-almost all xYx\in{{}^{*}Y}. Pick a set A[Y]A\in{{}^{*}\mathcal{B}[Y]} such that Q(A)0{{}^{*}Q}(A)\approx 0. By domination, we know that P(A)0{{}^{*}P}(A)\approx 0. Thus, we have

Aϕ(x)Q(dx)=i=1TP(AAi)P(A)0.\displaystyle\int_{A}\phi(x){{}^{*}Q}(\mathrm{d}x)=\sum_{i=1}^{T}{{}^{*}P}(A\cap A_{i})\leq{{}^{*}P}(A)\approx 0. (5.2)

Thus, by Theorem 5.2, ϕ\phi is S-integrable with respect to QQ. ∎

We now show that ϕ\phi defined in Lemma 5.4 is infinitesimally close to the standard Radon-Nikodym derivative.

Theorem 5.5.

Let P,QP,Q be two probability measures on (Y,[Y])(Y,\mathcal{B}[Y]). Suppose PP is dominated by QQ. Let ff denote the density function of PP with respect to QQ. Suppose ff is continuous at y0Yy_{0}\in Y. Let ϕ\phi be the internal function defined in Eq. 5.1. Then ϕ(x)f(y)\phi(x)\approx{{}^{*}f}(y) for x,yYx,y\in{{}^{*}Y} with xyy0x\approx y\approx y_{0}.

Proof.

Pick AiA_{i} such that every point in AiA_{i} is infinitely close to y0y_{0}. Note that ϕ\phi is constant on AiA_{i}. By the transfer principle, we know that P(Ai)=Aif(x)Q(dx){{}^{*}P}(A_{i})=\int_{A_{i}}{{}^{*}f}(x){{}^{*}Q}(\mathrm{d}x) for every 1iT1\leq i\leq T. We also know that P(Ai)=Aiϕ(x)Q(dx){{}^{*}P}(A_{i})=\int_{A_{i}}\phi(x){{}^{*}Q}(\mathrm{d}x). So we have Ai[f(x)ϕ(x)]Q(dx)=0\int_{A_{i}}[{{}^{*}f}(x)-\phi(x)]{{}^{*}Q}(\mathrm{d}x)=0. As ϕ\phi is constant on AiA_{i} and ff is continuous at y0y_{0}, we know that f(x)ϕ(x){{}^{*}f}(x)\approx\phi(x) for all xAix\in A_{i}. By the continuity of ff at y0y_{0} again, we have f(x)ϕ(y){{}^{*}f}(x)\approx\phi(y) for all x,yYx,y\in{{}^{*}Y} with xyy0x\approx y\approx y_{0}. ∎

Theorem 5.5 implies that ϕ(x)f(𝗌𝗍(x))\phi(x)\approx f(\mathsf{st}(x)) provided that the density function ff is continuous at 𝗌𝗍(x)\mathsf{st}(x). In fact, by Lemma 5.4 and Theorem 5.5, it can be shown that 𝗌𝗍(ϕ(x))\mathsf{st}(\phi(x)) (xYx\in{{}^{*}Y}) is a density function with respect to QQ ([Zim05]) provided that 𝗌𝗍(ϕ)\mathsf{st}(\phi) is constant on monads.
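A finite-partition analogue of Eq. 5.1 may make the construction concrete: on [0,1][0,1], the cell ratios P(Ai)/Q(Ai)P(A_{i})/Q(A_{i}) recover the density dP/dQ\mathrm{d}P/\mathrm{d}Q at continuity points as the partition is refined. The measures and density below are illustrative choices of ours:

```python
import numpy as np

# Q = Uniform[0,1]; P has density f(y) = 2y with respect to Q.
# phi(x) = sum_i P(A_i)/Q(A_i) 1_{A_i}(x) over a partition into T cells,
# a standard (finite) version of Eq. 5.1.
T = 1000
edges = np.linspace(0.0, 1.0, T + 1)
P_cells = edges[1:] ** 2 - edges[:-1] ** 2      # P(A_i) = b^2 - a^2
Q_cells = np.diff(edges)                        # Q(A_i) = 1/T

def phi(x):
    i = min(np.searchsorted(edges, x, side="right") - 1, T - 1)
    return P_cells[i] / Q_cells[i]

for y0 in (0.1, 0.5, 0.9):
    assert abs(phi(y0) - 2 * y0) < 2 / T        # phi ~ f at continuity points
```

In the hyperfinite setting the partition has infinitesimal cells, so the approximation error above becomes infinitesimal, which is the content of Theorem 5.5.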

5.1. Density Functions For Markov Processes

Let {g(x,1,)}xX\{g(x,1,\cdot)\}_{x\in X} be a Markov transition kernel with stationary distribution π\pi on compact metric space XX endowed with Borel σ\sigma-algebra [X]\mathcal{B}[X]. Let (SX,{B(s):sSX})(S_{X},\{B(s):s\in S_{X}\}) be a hyperfinite representation of XX. We use G,H,ΠG,H,\Pi to denote the corresponding hyperfinite objects as defined in Section 3.

Assumption 1.

There exists a reference measure λ\lambda on (X,[X])(X,\mathcal{B}[X]) such that π\pi is dominated by λ\lambda and the family {g(x,1,)}xX\{g(x,1,\cdot)\}_{x\in X} is dominated by λ\lambda.

We use f(x,)f(x,\cdot) to denote a density function of g(x,1,)g(x,1,\cdot) with respect to λ\lambda and use k()k(\cdot) to denote the density function of π\pi with respect to λ\lambda.

Theorem 5.6.

Suppose Assumption 1 holds. Suppose kk is continuous at x1Xx_{1}\in X and ff is jointly continuous at (x1,x2)(x_{1},x_{2}) for some x2Xx_{2}\in X. Then, for i,jSXi,j\in S_{X} such that ix1i\approx x_{1} and jx2j\approx x_{2}, we have Π({i})λ(B(i))k(i)\frac{\Pi(\{i\})}{{{}^{*}\lambda}(B(i))}\approx{{}^{*}k}(i) and Gi({j})λ(B(j))f(i,j)Hi({j})λ(B(j))\frac{G_{i}(\{j\})}{{{}^{*}\lambda}(B(j))}\approx{{}^{*}f}(i,j)\approx\frac{H_{i}(\{j\})}{{{}^{*}\lambda}(B(j))}.

Proof.

Clearly {B(s):sSX}\{B(s):s\in S_{X}\} is a hyperfinite partition of X{{}^{*}X} such that each B(s)B(s) has infinitesimal diameter. As k()k(\cdot) is continuous at x1x_{1}, by Theorem 5.5, we have

π(B(i))λ(B(i))=Π({i})λ(B(i))k(i)\displaystyle\frac{{{}^{*}\pi}(B(i))}{{{}^{*}\lambda}(B(i))}=\frac{\Pi(\{i\})}{{{}^{*}\lambda}(B(i))}\approx{{}^{*}k}(i) (5.3)

for all ix1i\approx x_{1}.

Pick s,jSXs,j\in S_{X} such that sx1s\approx x_{1} and jx2j\approx x_{2}. By the transfer principle, we know that

Gs({j})=g(s,1,B(j))=B(j)f(s,x)λ(dx).\displaystyle G_{s}(\{j\})={{}^{*}g}(s,1,B(j))=\int_{B(j)}{{}^{*}f}(s,x){{}^{*}\lambda}(\mathrm{d}x). (5.4)

On the other hand, we also know that

Gs({j})=g(s,1,B(j))=B(j)Gs({j})λ(B(j))λ(dx).\displaystyle G_{s}(\{j\})={{}^{*}g}(s,1,B(j))=\int_{B(j)}\frac{G_{s}(\{j\})}{{{}^{*}\lambda}(B(j))}{{}^{*}\lambda}(\mathrm{d}x). (5.5)

As ff is jointly continuous at (x1,x2)(x_{1},x_{2}), we know that f(s,x)f(s,y)f(𝗌𝗍(s),𝗌𝗍(j)){{}^{*}f}(s,x)\approx{{}^{*}f}(s,y)\approx f(\mathsf{st}(s),\mathsf{st}(j)) for all x,yB(j)x,y\in B(j). This implies that Gs({j})λ(B(j))f(s,j)\frac{G_{s}(\{j\})}{{{}^{*}\lambda}(B(j))}\approx{{}^{*}f}(s,j).

If π(B(j))=0{{}^{*}\pi}(B(j))=0, then

Hs({j})λ(B(j))=Gs({j})λ(B(j))f(s,j).\displaystyle\frac{H_{s}(\{j\})}{{{}^{*}\lambda}(B(j))}=\frac{G_{s}(\{j\})}{{{}^{*}\lambda}(B(j))}\approx{{}^{*}f}(s,j). (5.6)

If π(B(j))0{{}^{*}\pi}(B(j))\neq 0, by the joint continuity of ff at (x1,x2)(x_{1},x_{2}), we have

Hs({j})λ(B(j))\displaystyle\frac{H_{s}(\{j\})}{{{}^{*}\lambda}(B(j))} =B(s)g(x,1,B(j))λ(B(j))π(dx)π(B(s))\displaystyle=\frac{\int_{B(s)}\frac{{{}^{*}g}(x,1,B(j))}{{{}^{*}\lambda}(B(j))}{{}^{*}\pi}(\mathrm{d}x)}{{{}^{*}\pi}(B(s))} (5.7)
B(s)f(x,j)π(dx)π(B(s))f(s,j).\displaystyle\approx\frac{\int_{B(s)}{{}^{*}f}(x,j){{}^{*}\pi}(\mathrm{d}x)}{{{}^{*}\pi}(B(s))}\approx{{}^{*}f}(s,j). (5.8)

Thus, we have the desired result. ∎
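The same cell-ratio idea applies to transition kernels, as in Theorem 5.6. A sketch with a hypothetical jointly continuous transition density on [0,1][0,1] (our own choice, not from the paper), checking that the cell probability divided by the cell's Lebesgue measure approximates the density:

```python
import numpy as np

# Kernel on [0,1] with jointly continuous density
#   f(x, y) = 1 + 0.5*(2x-1)*(2y-1)   (w.r.t. Lebesgue measure),
# an illustrative stand-in. The cell ratio G_x(B_j)/lambda(B_j), with
# G_x(B_j) = integral of f(x, .) over B_j, approximates f(x, y).
f = lambda x, y: 1 + 0.5 * (2 * x - 1) * (2 * y - 1)
T = 500
edges = np.linspace(0.0, 1.0, T + 1)

def G_cell(x, j):
    # closed-form integral of f(x, .) over B_j = [edges[j], edges[j+1]]
    a, b = edges[j], edges[j + 1]
    prim = lambda y: y + 0.5 * (2 * x - 1) * (y ** 2 - y)
    return prim(b) - prim(a)

x, j = 0.3, 400
a, b = edges[j], edges[j + 1]
y = 0.5 * (a + b)                         # midpoint of the cell B_j
assert abs(G_cell(x, j) / (b - a) - f(x, y)) < 1e-9
```

With infinitesimal cells, the discrepancy above becomes infinitesimal, which is exactly the conclusion Gi({j})/λ(B(j))f(i,j)G_{i}(\{j\})/{{}^{*}\lambda}(B(j))\approx{{}^{*}f}(i,j) of Theorem 5.6.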

6. Comparison Theorem for General Markov Processes

Our main motivation for constructing a hyperfinite representation of the Dirichlet form is to allow us to directly translate theorems about transition kernels on finite state spaces to theorems on more general state spaces. As an illustration, we provide a simple translation of a “comparison” theorem for Markov chains on finite state spaces (see the original Theorem 6.1 and our generalization Theorem 6.10). We also give a quick explanation, including an example, as to why direct translation does not work well for this problem. These bounds will be more fully developed in our companion paper focused on comparison.

6.1. Original Comparison Bound

We introduce comparison methods, following the same presentation as in [LPW09a]. For a finite reversible transition kernel {P(x,y)}x,yΩ\{P(x,y)\}_{x,y\in\Omega} with stationary distribution π\pi, let E={(x,y):P(x,y)>0}E=\{(x,y):P(x,y)>0\}. An EE-path from xx to yy is a sequence Γ=(e1,e2,,en)\Gamma=(e_{1},e_{2},\dotsc,e_{n}) of edges in EE such that e1=(x,x1),e2=(x1,x2),,en=(xn1,y)e_{1}=(x,x_{1}),e_{2}=(x_{1},x_{2}),\dotsc,e_{n}=(x_{n-1},y) for some vertices x1,x2,,xn1Ωx_{1},x_{2},\dotsc,x_{n-1}\in\Omega. We use |Γ||\Gamma| to denote the length of a path Γ\Gamma and let Q(x,y)=π(x)P(x,y)Q(x,y)=\pi(x)P(x,y) for x,yΩx,y\in\Omega.

Let PP and P~\tilde{P} be two finite reversible transition kernels with stationary distribution π\pi and π~\tilde{\pi}, respectively. Suppose we can fix an EE-path from xx to yy for every (x,y)E~(x,y)\in\tilde{E} and denote it by γxy\gamma_{xy}. Given such a choice of path, we define the congestion ratio by

B=maxeE(1Q(e)(x,y):eγxyQ~(x,y)|γxy|).\displaystyle B=\operatorname*{\mathrm{max}\vphantom{\mathrm{infsup}}}_{e\in E}\left(\frac{1}{Q(e)}\sum_{(x,y):e\in\gamma_{xy}}\tilde{Q}(x,y)|\gamma_{xy}|\right). (6.1)
Theorem 6.1 ([LPW09a]).

Let PP and P~\tilde{P} be finite reversible transition kernels with stationary distributions π\pi and π~\tilde{\pi} respectively. If BB is the congestion ratio for a choice of EE-paths, as defined in Eq. 6.1, then

P~(f,f)BP(f,f).\displaystyle\mathscr{E}_{\tilde{P}}(f,f)\leq B\mathscr{E}_{P}(f,f). (6.2)

The aim of this section is to extend Theorem 6.1 to a large class of general Markov processes.
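For finite chains, Theorem 6.1 can be checked numerically. The sketch below compares a lazy walk on a 4-cycle with the i.i.d. sampling kernel (the chains and the clockwise choice of paths are our illustrative choices), computes the congestion ratio of Eq. 6.1, and verifies the Dirichlet form inequality on random functions:

```python
import numpy as np
from itertools import product

n = 4
P = np.zeros((n, n))
for x in range(n):
    P[x, x] = 0.5                       # lazy walk on the 4-cycle
    P[x, (x + 1) % n] += 0.25
    P[x, (x - 1) % n] += 0.25
Pt = np.full((n, n), 1.0 / n)           # i.i.d. sampling kernel
pi = np.full(n, 1.0 / n)                # both chains have uniform pi

Q = pi[:, None] * P                     # Q(x,y) = pi(x) P(x,y)
Qt = pi[:, None] * Pt

def path(x, y):
    """One E-path for each pair (x, y): step clockwise around the cycle."""
    edges, cur = [], x
    while cur != y:
        edges.append((cur, (cur + 1) % n))
        cur = (cur + 1) % n
    return edges or [(x, x)]

# congestion ratio B of Eq. 6.1, maximized over the clockwise edges
B = 0.0
for e in [(x, (x + 1) % n) for x in range(n)]:
    load = sum(Qt[x, y] * len(path(x, y))
               for x, y in product(range(n), repeat=2) if e in path(x, y))
    B = max(B, load / Q[e])

def dform(K, f):
    return 0.5 * np.sum((f[:, None] - f[None, :]) ** 2 * K * pi[:, None])

rng = np.random.default_rng(1)
for _ in range(20):
    f = rng.normal(size=n)
    assert dform(Pt, f) <= B * dform(P, f) + 1e-12
```

Here the congestion ratio works out to B=14B=14, a crude but valid constant: the comparison inequality P~(f,f)BP(f,f)\mathscr{E}_{\tilde{P}}(f,f)\leq B\mathscr{E}_{P}(f,f) holds for every ff.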

6.2. Non-Example: Barbell Graph

Before giving our new theorem, we give a quick example that illustrates why directly mimicking the form of the theorem for finite Markov chains will not work well. To understand the basic issue, note that we can view the bound in Theorem 6.1 as coming from three terms: the length |γxy||\gamma_{xy}| of a path, the relative probability Q~(x,y)Q(e)\frac{\tilde{Q}(x,y)}{Q(e)} of transitions along the path, and the congestion |{(x,y):eγxy}||\{(x,y)\,:\,e\in\gamma_{xy}\}|. The length makes sense as written for general state spaces, and the relative probability can easily be “translated” to e.g. the Radon-Nikodym derivative. However, it is not clear how to translate the notion of congestion, which measures how many paths go exactly through an edge. The problem is that it is often simple to very slightly deform paths in continuous state spaces so that the congestion remains extremely small (or even 1), even though a very large number of paths are extremely close to an edge.

We give a concrete example. Denote by KnK_{n} the usual complete graph on nn vertices and let Gn=(Vn,En)G_{n}=(V_{n},E_{n}) be the “barbell” graph obtained by taking two copies Rn,LnR_{n},L_{n} of KnK_{n} and adding a single edge; denote the endpoints of this new edge by rnRnr_{n}\in R_{n} and nLn\ell_{n}\in L_{n}. Denote by {Xt}\{X_{t}\} the usual simple random walk on this graph, with kernel AnA_{n}. We study the following natural collection of paths:

  1. (1)

    For x,yLnx,y\in L_{n} (or x,yRnx,y\in R_{n}), simply take the edge from xx to yy.

  2. (2)

    For xLnx\in L_{n} and yLny\notin L_{n}, take the edge (x,n)(x,\ell_{n}) then the edge (n,rn)(\ell_{n},r_{n}) and then the edge (rn,y)(r_{n},y). At the end, delete any self-edges that may appear.

  3. (3)

    For xRnx\in R_{n} and yRny\notin R_{n}, do the obvious analogue to path (2).

It is straightforward to check that:

  1. (1)

    All paths are of length at most 3,

  2. (2)

    All transition probabilities are either n1n^{-1} or (n1)1(n-1)^{-1} (so always roughly n1)n^{-1}),

  3. (3)

    The stationary measure at any point is either n2((n1)2+n)\frac{n}{2((n-1)^{2}+n)} or n12((n1)2+n)\frac{n-1}{2((n-1)^{2}+n)} (so again roughly n1n^{-1}), and

  4. (4)

    The edge (rn,n)(r_{n},\ell_{n}) appears in roughly n2n^{2} paths, which is (vastly) more than any other edge.

Putting together these estimates and comparing to the kernel that simply takes i.i.d. samples from the stationary measure, Theorem 6.1 tells us that the relaxation time of AnA_{n} is O(n2)O(n^{2}) (and so the mixing time is O(n2log(n))O(n^{2}\log(n))). In fact it is straightforward to check that both the relaxation and mixing times are Θ(n2)\Theta(n^{2}). Sketching the relevant arguments: the matching lower bound on the relaxation time can be obtained from Cheeger’s inequality; upper and lower bounds on the mixing time can be obtained by checking that it takes Θ(n2)\Theta(n^{2}) steps to travel from Ln\{n}L_{n}\backslash\{\ell_{n}\} to Rn\{rn}R_{n}\backslash\{r_{n}\} on average (and, to obtain the upper bound, a straightforward coupling argument).
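The Θ(n2)\Theta(n^{2}) relaxation time of the barbell walk can also be checked numerically by computing the spectral gap for a few values of nn (function names are ours; a sketch, not a proof):

```python
import numpy as np

def barbell_walk(n):
    """Transition matrix of the simple random walk on the barbell graph:
    two copies of K_n joined by a single bridge edge."""
    N = 2 * n
    A = np.zeros((N, N))
    A[:n, :n] = 1.0
    A[n:, n:] = 1.0
    np.fill_diagonal(A, 0.0)            # no self-loops inside each K_n
    A[n - 1, n] = A[n, n - 1] = 1.0     # the bridge edge
    return A / A.sum(axis=1, keepdims=True)

def relaxation_time(P):
    # the walk is reversible, so the spectrum is real up to round-off
    ev = np.sort(np.linalg.eigvals(P).real)[::-1]
    return 1.0 / (1.0 - ev[1])

ns = [8, 16, 32]
ts = [relaxation_time(barbell_walk(n)) for n in ns]
slope = np.polyfit(np.log(ns), np.log(ts), 1)[0]
assert 1.6 < slope < 2.4                # relaxation time scales like n^2
```

The fitted log-log slope is close to 2, consistent with the Θ(n2)\Theta(n^{2}) relaxation time asserted above.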

We now consider a continuous analogue. Define Sn=vVnIvS_{n}=\cup_{v\in V_{n}}I_{v}, where IvI_{v} are disjoint copies of [0,1][0,1]. For vVnv\in V_{n} and yIvSny\in I_{v}\subset S_{n}, write V(y)=vV(y)=v for the vertex associated with yy and |y||y| for the value of yy in IvI_{v}. Then define the transition kernel QnQ_{n} on SnS_{n} by the following algorithm for sampling YQn(x,)Y\sim Q_{n}(x,\cdot):

  1. (1)

    Sample uAn(V(x),)u\sim A_{n}(V(x),\cdot).

  2. (2)

    Return YUnif(Iu)Y\sim\mathrm{Unif}(I_{u}).

Note that this walk can be written as a product walk for the original simple random walk on GnG_{n} and the kernel whose measures are all Unif[0,1]\mathrm{Unif}[0,1]. In particular, this means it has the same mixing and relaxation times as {Xt}\{X_{t}\}.

Attempting to confirm this with a naive version of Theorem 6.1 runs into an immediate problem. Roughly speaking, the problem comes from the fact that on continuous spaces we can “split” the heavily-used vertices n,rn\ell_{n},r_{n} into many pieces, and this allows paths travelling between these vertices to avoid each other. Making this split precise requires some additional notation; the following somewhat-heavy notation is reused in Section 6.4.

Let NL,n:Ln\{n}{1,2,,n1}N_{L,n}\,:\,L_{n}\backslash\{\ell_{n}\}\mapsto\{1,2,\ldots,n-1\} be an arbitrary ordering of the vertices in Ln\{n}L_{n}\backslash\{\ell_{n}\}, and similarly let NR,nN_{R,n} be an ordering of Rn\{rn}R_{n}\backslash\{r_{n}\}. For uLn\{n}u\in L_{n}\backslash\{\ell_{n}\}, let In(u)I_{\ell_{n}}(u) be a copy of [0,1][0,1] and for t[0,1]t\in[0,1] let In(u,t)=tIn(u)I_{\ell_{n}}(u,t)=t\in I_{\ell_{n}}(u). Let IrnI_{r_{n}} be the obvious analogous construction on RnR_{n}. Define

Sn=(vVn\{n,rn}Iv)(vLn\{n}In(v))(vRn\{rn}Irn(v));\displaystyle S_{n}^{\prime}=\left(\sqcup_{v\in V_{n}\backslash\{\ell_{n},r_{n}\}}I_{v}\right)\sqcup\left(\sqcup_{v\in L_{n}\backslash\{\ell_{n}\}}I_{\ell_{n}}(v)\right)\sqcup\left(\sqcup_{v\in R_{n}\backslash\{r_{n}\}}I_{r_{n}}(v)\right); (6.3)

that is, we take SnS_{n} and split the interval InI_{\ell_{n}} into n1n-1 intervals vLn\{n}In(v)\sqcup_{v\in L_{n}\backslash\{\ell_{n}\}}I_{\ell_{n}}(v), and make the analogous split for IrnI_{r_{n}}. Define the map ϕ:SnSn\phi\,:\,S_{n}^{\prime}\mapsto S_{n} by the following equations:

ϕ(y)\displaystyle\phi(y) =y,V(y)Vn\{n,rn}\displaystyle=y,\qquad\qquad\qquad\qquad\qquad\qquad\,\,V(y)\in V_{n}\backslash\{\ell_{n},r_{n}\}
V(ϕ(y))\displaystyle V(\phi(y)) =n,yvLn\{n}In(v)\displaystyle=\ell_{n},\qquad\qquad\qquad\qquad\qquad\qquad y\in\cup_{v\in L_{n}\backslash\{\ell_{n}\}}I_{\ell_{n}}(v)
|ϕ(y)|\displaystyle|\phi(y)| =|y|n1+NL,n(v)1n1,yIn(v),\displaystyle=\frac{|y|}{n-1}+\frac{N_{L,n}(v)-1}{n-1},\qquad\qquad\,\,y\in I_{\ell_{n}}(v),

with the obvious analogues to the last two equations when yvRn\{rn}Irn(v)y\in\cup_{v\in R_{n}\backslash\{r_{n}\}}I_{r_{n}}(v). We note that, ignoring a finite number of points, ϕ\phi is a bijection between SnS_{n} and SnS_{n}^{\prime}; ignoring events of measure 0, Ytϕ1(Yt)Y_{t}^{\prime}\equiv\phi^{-1}(Y_{t}) defines a Markov chain with the same relaxation and mixing times as {Yt}\{Y_{t}\}. For the remainder of this section we will consider this process on SnS_{n}^{\prime}, and write QnQ_{n}^{\prime} for its kernel.

We now consider the following path from xIxx\in I_{x} to yIyy\in I_{y}:

  1. (1)

    If V(x),V(y)LnV(x),V(y)\in L_{n} (or V(x),V(y)RnV(x),V(y)\in R_{n}), simply take the edge from xx^{\prime} to yy^{\prime}.

  2. (2)

    For V(x)LnV(x)\in L_{n} and V(y)LnV(y)\notin L_{n}, take the edge from xIxx^{\prime}\in I_{x} to x′′I(n,|x|)Inx^{\prime\prime}\equiv I(\ell_{n},|x^{\prime}|)\in I_{\ell_{n}}, then the edge from x′′x^{\prime\prime} to y′′I(rn,|y|)y^{\prime\prime}\equiv I(r_{n},|y^{\prime}|), and then the edge from y′′y^{\prime\prime} to yy^{\prime}. At the end, delete any self-edges that may appear.

  3. (3)

    For V(x)RnV(x)\in R_{n} and V(y)RnV(y)\notin R_{n}, do the obvious analogue to path (2).

We again try to analyze these paths. All of the earlier comments apply, except that no edges are used more than O(1)O(1) times! This occurs because we have managed to “clone” n,rn\ell_{n},r_{n} into effectively (n1)(n-1) vertices, resulting in effectively (n1)2(n-1)^{2} edges between them; this exactly matches the Θ(n2)\Theta(n^{2}) congestion of the original discrete path. Thus, applying the formula in Theorem 6.1 naively would give a relaxation time that is O(1)O(1), which is not correct.

As discussed at the start of the section, the problem is that the formula of Theorem 6.1 counts how many paths exactly reuse an edge, while the current continuous example merely compresses edges without exactly reusing them. Earlier work [Yue00, Yue02] adjusted for this compression by associating a Jacobian to their paths. Unfortunately, this approach seems to be quite restrictive. The obvious problem is that this approach requires paths to be sufficiently smooth. A less-obvious problem is that the method often gives very poor bounds. This happens even when (as in our barbell example) the continuous chain of interest has a nearly-identical continuous analogue for which comparison inequalities give nearly-sharp answers; see the original papers [Yue00, Yue02] for examples of this phenomenon.

In Section 6.4, we analyze QnQ_{n} using our new theorem (and the same paths) and obtain the correct answer.

6.3. Hyperfinite Approximation of Congestion Ratio

In this section, we will consider Markov chains on a state space of the form X=i=1IXiX=\sqcup_{i=1}^{I}X_{i}, where XiX_{i} is of the form [0,1]ki[0,1]^{k_{i}} and the notation “\sqcup” is used to emphasize the fact that these are disjoint copies of what may be the same space; we use the word “component” to refer to such a copy, so that e.g. X1X_{1} is one component of XX. We now define a metric dd on XX as follows:

  1. (1)

    For every i{1,2,,I}i\in\{1,2,\ldots,I\} and every x,yXix,y\in X_{i}, define d(x,y)d(x,y) to be the standard Euclidean distance between xx and yy.

  2. (2)

    For xXi1x\in X_{i_{1}} and yXi2y\in X_{i_{2}} where i1i2i_{1}\neq i_{2}, define d(x,y)=1+maxiIkid(x,y)=1+\operatorname*{\mathrm{max}\vphantom{\mathrm{infsup}}}_{i\in I}\sqrt{k_{i}}.

It is easy to verify that dd is a metric on XX with the following property:

infijIinfxXi,yXjd(x,y)>supiIsupx,yXid(x,y).\displaystyle\operatorname*{\mathrm{inf}\vphantom{\mathrm{infsup}}}_{i\neq j\in I}\operatorname*{\mathrm{inf}\vphantom{\mathrm{infsup}}}_{x\in X_{i},y\in X_{j}}d(x,y)>\operatorname*{\mathrm{sup}\vphantom{\mathrm{infsup}}}_{i\in I}\operatorname*{\mathrm{sup}\vphantom{\mathrm{infsup}}}_{x,y\in X_{i}}d(x,y). (6.4)

In other words, every point in one “part” XiX_{i} is closer to every other point in the same part than it is to any point in any other part XjX_{j}. We will always use the completion of the Borel σ\sigma-algebra on XX associated with dd.
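A minimal sketch of this metric (the component dimensions kik_{i} are our illustrative choices) that verifies the separation property of Eq. 6.4:

```python
import math

# Disjoint union X = X_1 ⊔ X_2 with X_i = [0,1]^{k_i}; a point is (i, v).
ks = {1: 2, 2: 3}
D_far = 1 + max(math.sqrt(k) for k in ks.values())  # cross-component value

def d(p, q):
    (i, u), (j, v) = p, q
    if i == j:
        return math.dist(u, v)      # Euclidean distance within a component
    return D_far                    # constant distance across components

# Property (6.4): cross-component distances exceed every within-component
# distance, since diam([0,1]^k) = sqrt(k) < 1 + max_i sqrt(k_i).
within_sup = max(math.sqrt(k) for k in ks.values())
assert D_far > within_sup
assert d((1, (0.0, 0.0)), (2, (0.0, 0.0, 0.0))) > d((1, (0.0, 0.0)), (1, (1.0, 1.0)))
```

The triangle inequality also holds because any path between components must pay the constant DfarD_{\mathrm{far}}, which dominates all within-component distances.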

We start by constructing a specific hyperfinite representation SS of XX. Pick an infinite KK\in{{}^{*}\mathbb{N}} and let δt=1K!\delta t=\frac{1}{K!}. Define S={0,δt,2δt,,1δt}S=\{0,\delta t,2\delta t,\dotsc,1-\delta t\}, Si=SkiS_{i}=S^{k_{i}}, and SX=i=1ISiS_{X}=\sqcup_{i=1}^{I}S_{i}. For every 1iI1\leq i\leq I and sSis\in S_{i}, Bi(s)SiB_{i}(s)\subset S_{i} is defined to be the small rectangle with side length δt\delta t and with ss as its lower left corner. Let λi\lambda_{i} denote the Lebesgue measure on XiX_{i}. Then the hyperfinite representations (Si,{Bi(s)}sSi)(S_{i},\{B_{i}(s)\}_{s\in S_{i}}) are uniform in the sense that λi(Bi(s1))=λi(Bi(s2)){{}^{*}\lambda_{i}}(B_{i}(s_{1}))={{}^{*}\lambda_{i}}(B_{i}(s_{2})) for s1,s2Sis_{1},s_{2}\in S_{i}.

We work with two transition kernels {g(x,1,)}xX\{g(x,1,\cdot)\}_{x\in X} and {g~(x,1,)}xX\{\tilde{g}(x,1,\cdot)\}_{x\in X} on XX with stationary distributions π\pi and π~\tilde{\pi}, respectively. Denote by λ\lambda the natural extension of the Lebesgue measure to XX defined by the following formula: for Lebesgue-measurable A1X1,,AIXIA_{1}\subset X_{1},\ldots,A_{I}\subset X_{I}, set

λ(A1AI)=λ1(A1)++λI(AI),\displaystyle\lambda(A_{1}\cup\ldots\cup A_{I})=\lambda_{1}(A_{1})+\ldots+\lambda_{I}(A_{I}), (6.5)

where λi\lambda_{i} is the usual Lebesgue measure on XiX_{i}. We assume that {g(x,1,)}xX\{g(x,1,\cdot)\}_{x\in X}, {g~(x,1,)}xX\{\tilde{g}(x,1,\cdot)\}_{x\in X}, π\pi, π~\tilde{\pi} are dominated by λ\lambda. Let k(),k~()k(\cdot),\tilde{k}(\cdot) denote the density functions for π,π~\pi,\tilde{\pi}, respectively. Let f(x,),f~(x,)f(x,\cdot),\tilde{f}(x,\cdot) denote the density functions for g(x,1,),g~(x,1,)g(x,1,\cdot),\tilde{g}(x,1,\cdot), respectively. We now extend notions of path and edge from finite Markov processes to general Markov processes.

Let E={(x,y):f(x,y)>0}E=\{(x,y):f(x,y)>0\}. An EE-path from xx to yy is a sequence (e1,e2,,en)(e_{1},e_{2},\dotsc,e_{n}) of edges in EE such that e1=(x,x1),e2=(x1,x2),,en=(xn1,y)e_{1}=(x,x_{1}),e_{2}=(x_{1},x_{2}),\dotsc,e_{n}=(x_{n-1},y) for some vertices x1,x2,,xn1Xx_{1},x_{2},\dotsc,x_{n-1}\in X. Let Q(x,y)=k(x)f(x,y)Q(x,y)=k(x)f(x,y) for x,yXx,y\in X. E~\tilde{E} and Q~\tilde{Q} are defined similarly. For (x,y)X2(x,y)\in X^{2}, we assume that there exists an EE-path connecting xx and yy. We fix such an EE-path and denote it by γxy\gamma_{xy}. Let Θ={γxy:(x,y)X2}\Theta=\{\gamma_{xy}:(x,y)\in X^{2}\} be the collection of all such paths. We are, of course, really interested in ϝ={γxy:(x,y)E~}\digamma=\{\gamma_{xy}:(x,y)\in\tilde{E}\}. Note that we can view γ\gamma as a function from X2X^{2} to Θ\Theta. The nonstandard counterparts of E,Q,E~,Q~,Θ,ϝE,Q,\tilde{E},\tilde{Q},\Theta,\digamma are defined as nonstandard extensions of these objects. By the transfer principle, γ{{}^{*}\gamma} is a function from X2{{}^{*}X^{2}} to Θ{{}^{*}\Theta}. To associate a nonstandard edge (path) with a standard edge (path), we need to define the distance between edges and paths. Abusing notation slightly, we write:

Definition 6.2.

Let e1=(x1,y1)e_{1}=(x_{1},y_{1}) and e2=(x2,y2)e_{2}=(x_{2},y_{2}) be two edges in EE or E~\tilde{E}. The distance between e1,e2e_{1},e_{2} is defined to be W(e1,e2)=max{d(x1,x2),d(y1,y2)}W(e_{1},e_{2})=\operatorname*{\mathrm{max}\vphantom{\mathrm{infsup}}}\{d(x_{1},x_{2}),\,d(y_{1},y_{2})\}.

Let γa1b1\gamma_{a_{1}b_{1}} and γa2b2\gamma_{a_{2}b_{2}} be two paths in ϝ\digamma. The distance between γa1b1\gamma_{a_{1}b_{1}} and γa2b2\gamma_{a_{2}b_{2}} is defined to be W(γa1b1,γa2b2)=max{d(a1,a2),d(b1,b2)}W(\gamma_{a_{1}b_{1}},\gamma_{a_{2}b_{2}})=\operatorname*{\mathrm{max}\vphantom{\mathrm{infsup}}}\{d(a_{1},a_{2}),\,d(b_{1},b_{2})\}; that is, WW is a pseudometric on paths which only pays attention to their endpoints.

The distance between nonstandard edges and paths can be defined similarly. Two nonstandard edges (paths) are infinitely close to each other if and only if both their start and end points are infinitely close to each other. We write e1e2e_{1}\approx e_{2} (respectively γ1γ2\gamma_{1}\approx\gamma_{2}) if W(e1,e2)0{{}^{*}W}(e_{1},e_{2})\approx 0 (respectively W(γ1,γ2)0{{}^{*}W}(\gamma_{1},\gamma_{2})\approx 0) for two nonstandard edges (paths) e1,e2e_{1},e_{2} (γ1,γ2\gamma_{1},\gamma_{2}).

We shall use {Gi()}iSX\{G_{i}(\cdot)\}_{i\in S_{X}}, {G~i()}iSX\{\tilde{G}_{i}(\cdot)\}_{i\in S_{X}}, {Hi()}iSX\{H_{i}(\cdot)\}_{i\in S_{X}}, {H~i()}iSX\{\tilde{H}_{i}(\cdot)\}_{i\in S_{X}}, Π()\Pi(\cdot) and Π~()\tilde{\Pi}(\cdot) to denote various corresponding hyperfinite objects defined in Section 3. We now define the hyperfinite counterparts of E,Q,E~,Q~,ϝE,Q,\tilde{E},\tilde{Q},\digamma. Let ={(i,j):Hi({j})>0}\mathcal{E}=\{(i,j):H_{i}(\{j\})>0\}. Note that \mathcal{E} is a subset of SX×SXS_{X}\times S_{X}. An \mathcal{E}-path from ii to jj is a sequence (e1,e2,,en)(e_{1},e_{2},\dotsc,e_{n}) of edges in \mathcal{E} such that e1=(i,i1),e2=(i1,i2),,en=(in1,j)e_{1}=(i,i_{1}),e_{2}=(i_{1},i_{2}),\dotsc,e_{n}=(i_{n-1},j) for some vertices i1,i2,,in1SXi_{1},i_{2},\dotsc,i_{n-1}\in S_{X}. Let 𝒬(i,j)=Π({i})Hi({j})\mathcal{Q}(i,j)=\Pi(\{i\})H_{i}(\{j\}) for i,jSXi,j\in S_{X}. ~\tilde{\mathcal{E}} and 𝒬~\tilde{\mathcal{Q}} are defined similarly. We defer the definition of the collection of hyperfinite paths (the hyperfinite counterpart of ϝ\digamma) to a later part of the section. We now introduce the following assumption. To be precise, let i\mathbbm{P}_{i} denote the projection from X2X^{2} to the ii-th coordinate for i{1,2}i\in\{1,2\}.

Assumption 2.

There exists an open set 𝒪X2\mathcal{O}\subset X^{2} such that the density functions f(x,y)f(x,y) and k(x)k(x) are bounded away from 0 for all (x,y)𝒪(x,y)\in\mathcal{O} and all x1(𝒪)x\in\mathbbm{P}_{1}(\mathcal{O}).

Theorem 5.6 in Section 5.1 shows that ratios of hyperfinite probabilities can be used to approximate nonstandard extensions of the density functions. We assume that {g(x,1,)}xX,{g~(x,1,)}xX,π\{g(x,1,\cdot)\}_{x\in X},\{\tilde{g}(x,1,\cdot)\}_{x\in X},\pi and π~\tilde{\pi} are continuous on 𝒪\mathcal{O}.

Assumption 3.

Let 𝒪\mathcal{O} be the same open set as in Assumption 2. The density function f(,)f(\cdot,\cdot) is jointly continuous on 𝒪\mathcal{O} and k()k(\cdot) is continuous on 1(𝒪)\mathbbm{P}_{1}(\mathcal{O}). The density function f~(,)\tilde{f}(\cdot,\cdot) is jointly continuous on X2X^{2} and k~()\tilde{k}(\cdot) is continuous on XX.

Assumption 2 and Assumption 3 imply the following result.

Lemma 6.3.

Suppose Assumption 2 and Assumption 3 hold for 𝒪\mathcal{O}. Then, for i1,i2,j1,j2SXi_{1},i_{2},j_{1},j_{2}\in S_{X} such that (i2,j2)𝗌𝗍1(𝒪)(i_{2},j_{2})\in\mathsf{st}^{-1}(\mathcal{O}), we have

Π~({i1})H~i1({j1})Π({i2})Hi2({j2})k~(i1)f~(i1,j1)k(i2)f(i2,j2)Π~({i1})G~i1({j1})Π({i2})Gi2({j2}).\displaystyle\frac{\tilde{\Pi}(\{i_{1}\})\tilde{H}_{i_{1}}(\{j_{1}\})}{\Pi(\{i_{2}\})H_{i_{2}}(\{j_{2}\})}\approx\frac{{{}^{*}\tilde{k}}(i_{1}){{}^{*}\tilde{f}}(i_{1},j_{1})}{{{}^{*}k}(i_{2}){{}^{*}f}(i_{2},j_{2})}\approx\frac{\tilde{\Pi}(\{i_{1}\})\tilde{G}_{i_{1}}(\{j_{1}\})}{\Pi(\{i_{2}\})G_{i_{2}}(\{j_{2}\})}. (6.6)
Proof.

As (i2,j2)𝗌𝗍1(𝒪)(i_{2},j_{2})\in\mathsf{st}^{-1}(\mathcal{O}), we have (i,j)𝒪(i,j)\in{{}^{*}\mathcal{O}} and i1(𝒪)i\in{{}^{*}\mathbbm{P}_{1}}({{}^{*}\mathcal{O}}) for every (i,j)(i,j) with (i,j)(i2,j2)(i,j)\approx(i_{2},j_{2}). Thus, by Assumption 2 and Theorem 5.6, we know that these fractions are well-defined. Moreover, by Theorem 5.6 again, we have

Π~({i1})H~i1({j1})Π({i2})Hi2({j2})\displaystyle\frac{\tilde{\Pi}(\{i_{1}\})\tilde{H}_{i_{1}}(\{j_{1}\})}{\Pi(\{i_{2}\})H_{i_{2}}(\{j_{2}\})} =Π~({i1})H~i1({j1})λ(B(i1))λ(B(j1))Π({i2})Hi2({j2})λ(B(i2))λ(B(j2))\displaystyle=\frac{\frac{\tilde{\Pi}(\{i_{1}\})\tilde{H}_{i_{1}}(\{j_{1}\})}{{{}^{*}\lambda}(B(i_{1})){{}^{*}\lambda}(B(j_{1}))}}{\frac{\Pi(\{i_{2}\})H_{i_{2}}(\{j_{2}\})}{{{}^{*}\lambda}(B(i_{2})){{}^{*}\lambda}(B(j_{2}))}} (6.7)
k~(i1)f~(i1,j1)k(i2)f(i2,j2)\displaystyle\approx\frac{{{}^{*}\tilde{k}}(i_{1}){{}^{*}\tilde{f}}(i_{1},j_{1})}{{{}^{*}k}(i_{2}){{}^{*}f}(i_{2},j_{2})} (6.8)
Π~({i1})G~i1({j1})Π({i2})Gi2({j2}).\displaystyle\approx\frac{\tilde{\Pi}(\{i_{1}\})\tilde{G}_{i_{1}}(\{j_{1}\})}{\Pi(\{i_{2}\})G_{i_{2}}(\{j_{2}\})}. (6.9)

We also have the following result from Assumption 2 and Assumption 3.

Lemma 6.4.

Suppose Assumption 2 holds for 𝒪\mathcal{O}. Then (i,j)(i,j) is an element of E{{}^{*}E} and \mathcal{E} for every i,jSXi,j\in S_{X} such that (i,j)𝒪(i,j)\in{{}^{*}\mathcal{O}}.

Proof.

Suppose (i,j)𝒪(i,j)\in{{}^{*}\mathcal{O}}. By the transfer of Assumption 2, f(i,j)>0{{}^{*}f}(i,j)>0 hence (i,j)E(i,j)\in{{}^{*}E}. By the construction of the hyperfinite representation, there exist A,B[X]A,B\in{{}^{*}\mathcal{B}[X]} such that

  1. (1)

    AB(i)A\subset B(i) and iAi\in A

  2. (2)

    BB(j)B\subset B(j) and jBj\in B

  3. (3)

    A×B𝒪A\times B\subset{{}^{*}\mathcal{O}}

  4. (4)

    λ(A)>0{{}^{*}\lambda}(A)>0 and λ(B)>0{{}^{*}\lambda}(B)>0.

Then, by the construction of HH (see Theorem 3.5), we conclude that Hi({j})>0H_{i}(\{j\})>0 hence (i,j)(i,j)\in\mathcal{E}. ∎

As we mentioned at the beginning of the section, we fix an EE-path for every (x,y)X2(x,y)\in X^{2}. Let 𝔻={(x,x):xX}\mathbb{D}=\{(x,x)\,:\,x\in X\} denote the collection of diagonal points of X2X^{2}. Our choices of paths have been arbitrary so far. We impose the following assumption, which ensures that our chosen paths consist of suitable edges.

Assumption 4.

Let 𝒪\mathcal{O} be the same open set as in Assumption 2 and let ϵ0\epsilon_{0} be a positive real number. Let

E~ϵ0={(x,y)X2𝔻:inf(a,b)E~W((x,y),(a,b))ϵ0}.\displaystyle\tilde{E}_{\epsilon_{0}}=\{(x,y)\in X^{2}\setminus\mathbb{D}:\operatorname*{\mathrm{inf}\vphantom{\mathrm{infsup}}}_{(a,b)\in\tilde{E}}W((x,y),(a,b))\leq\epsilon_{0}\}. (6.10)

There exists a 𝒞𝒪\mathcal{C}\subset\mathcal{O} such that 𝒞\mathcal{C} is closed in the subspace topology of X2𝔻X^{2}\setminus\mathbb{D} and, for every (x,y)E~ϵ0(x,y)\in\tilde{E}_{\epsilon_{0}}, there exists an EE-path γxy\gamma_{xy} consisting of edges from 𝒞\mathcal{C}.

Note that 𝗌𝗍1(𝒞)𝒪\mathsf{st}^{-1}(\mathcal{C})\subset{{}^{*}\mathcal{O}} since 𝒞\mathcal{C} is a closed subset of 𝒪\mathcal{O}. When Assumption 4 holds, for (x,y)E~ϵ0(x,y)\in\tilde{E}_{\epsilon_{0}}, we let the path connecting xx and yy consist of edges from 𝒞\mathcal{C}. In order to generalize Theorem 6.1 to general Markov processes, we need to make sure that no edge is contained in too many paths. We impose the following ϵ\epsilon-separated property.

Assumption 5.

Fix ϵ>0\epsilon>0. For every edge eE𝔻e\in E\setminus\mathbb{D}, let

Se={γxyΘ:eγxy}\displaystyle S_{e}=\{\gamma_{xy}\in\Theta:e\in\gamma_{xy}\} (6.11)

be the collection of all paths that contain the edge ee. Then for any pair (xi,yi),(xj,yj)(x_{i},y_{i}),(x_{j},y_{j}) such that γxiyi,γxjyjSe\gamma_{x_{i}y_{i}},\gamma_{x_{j}y_{j}}\in S_{e}, the following statement holds:

((xi=xj)(d(xi,xj)>ϵ))((yi=yj)(d(yi,yj)>ϵ)).\displaystyle((x_{i}=x_{j})\vee(d(x_{i},x_{j})>\epsilon))\wedge((y_{i}=y_{j})\vee(d(y_{i},y_{j})>\epsilon)). (6.12)

Fix some ϵ>0\epsilon>0. Assumption 5 then implies that, for each standard edge e𝔻e\not\in\mathbb{D}, only finitely many paths from Θ\Theta contain it. It is also true that every nonstandard edge v𝔻v\not\in{{}^{*}\mathbb{D}} is contained in finitely many elements of Θ{{}^{*}\Theta}. We assume that there is a uniform bound on the length of all paths.

Assumption 6.

There exists R>0R>0 such that |γ|R|\gamma|\leq R for every γΘ\gamma\in\Theta.

We now impose the following regularity condition on the length of the paths in ϝ\digamma.

Assumption 7.

There exists M>0M>0 with the following property: for every γϝ\gamma\in\digamma, there exists δ>0\delta>0 such that, for every γϝ\gamma^{\prime}\in\digamma with d(γ,γ)<δd(\gamma,\gamma^{\prime})<\delta, we have ||γ||γ||M\bigl{|}|\gamma|-|\gamma^{\prime}|\bigr{|}\leq M.

Assumption 7 implies the following result:

Lemma 6.5.

Suppose Assumption 7 holds. Let γx1y1{{}^{*}\gamma}_{x_{1}y_{1}} and γx2y2{{}^{*}\gamma}_{x_{2}y_{2}} be two nonstandard paths in ϝ{{}^{*}\digamma}. If x1x2x_{1}\approx x_{2} and y1y2y_{1}\approx y_{2} (that is, W(γx1y1,γx2y2)0{{}^{*}W}({{}^{*}\gamma}_{x_{1}y_{1}},{{}^{*}\gamma}_{x_{2}y_{2}})\approx 0), then ||γx1y1||γx2y2||M\bigl{|}|{{}^{*}\gamma}_{x_{1}y_{1}}|-|{{}^{*}\gamma}_{x_{2}y_{2}}|\bigr{|}\leq M.

We now define the collection of hyperfinite paths which we shall be primarily working with for the rest of the section. For (i,i)~𝔻(i,i)\in\tilde{\mathcal{E}}\cap{{}^{*}\mathbb{D}}, the hyperfinite path Γii\Gamma_{ii} is simply (i,i)(i,i) which has length 0. For (i,j)~𝔻(i,j)\in\tilde{\mathcal{E}}\setminus{{}^{*}\mathbb{D}}, by the transfer principle, there exists a nonstandard path γijΘ{{}^{*}\gamma}_{ij}\in{{}^{*}\Theta} connecting ii and jj. By Assumption 6, we know that |γij|=k|{{}^{*}\gamma}_{ij}|=k for some natural number kRk\leq R. Thus, γij{{}^{*}\gamma}_{ij} is a sequence of edges in E{{}^{*}E} and we can denote this sequence by (i,i1),(i1,i2),,(ik1,j)(i,i_{1}),(i_{1},i_{2}),\dotsc,(i_{k-1},j). Note that, as H~i({j})>0\tilde{H}_{i}(\{j\})>0, by the construction of H~\tilde{H} and the fact that iji\neq j, we know that (i,j)E~ϵ0(i,j)\in{{}^{*}\tilde{E}_{\epsilon_{0}}}. A hyperfinite path Γij\Gamma_{ij} from ii to jj is defined by the sequence

(i,si1),(si1,si2),,(sik1,j)\displaystyle(i,s_{i_{1}}),(s_{i_{1}},s_{i_{2}}),\dotsc,(s_{i_{k-1}},j) (6.13)

where, for every 1nk11\leq n\leq k-1, sins_{i_{n}} is the unique element in SXS_{X} such that inB(sin)i_{n}\in B(s_{i_{n}}).
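The replacement of interior vertices by grid representatives in (6.13) can be caricatured with a small finite sketch. Everything here is an illustrative assumption rather than the paper's construction: a one-dimensional grid with spacing dt and the cell convention B(s) = [s, s + dt).

```python
# Illustrative 1-D grid: representatives s = k*dt, with cell B(s) = [s, s + dt).
dt = 0.1

def snap(x):
    """Return the unique grid representative s with x in B(s)."""
    return dt * int(x // dt)

# A path i, i_1, ..., i_{k-1}, j: keep the endpoints i and j, and replace each
# interior vertex i_n by its representative s_{i_n}, as in (6.13).
path = [0.0, 0.234, 0.517, 0.9]
Gamma = [path[0]] + [snap(x) for x in path[1:-1]] + [path[-1]]
print(Gamma)  # endpoints preserved, interior vertices snapped to the grid
```

Snapping keeps the endpoints fixed and never increases the number of distinct steps, which is consistent with the bound |Γij| ≤ |*γij| noted below.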

We now verify that Γij\Gamma_{ij} is a well-defined hyperfinite path from ii to jj.

Lemma 6.6.

Suppose Assumption 2 holds for 𝒪\mathcal{O} and Assumption 4 holds for 𝒞\mathcal{C}. Let (a,b){(i,si1),(si1,si2),,(sik1,j)}(a,b)\in\{(i,s_{i_{1}}),(s_{i_{1}},s_{i_{2}}),\dotsc,(s_{i_{k-1}},j)\}. Then, H(a,b)>0H(a,b)>0.

Proof.

Let (a,b){(i,si1),(si1,si2),,(sik1,j)}(a,b)\in\{(i,s_{i_{1}}),(s_{i_{1}},s_{i_{2}}),\dotsc,(s_{i_{k-1}},j)\}. By Assumption 4 and the fact that (i,j)E~ϵ0(i,j)\in{{}^{*}\tilde{E}_{\epsilon_{0}}}, we know that (a,b)𝒞𝒪(a,b)\in{{}^{*}\mathcal{C}}\subset{{}^{*}\mathcal{O}}. By Lemma 6.4, we know that Ha({b})>0H_{a}(\{b\})>0. ∎

Thus, (a,b)(a,b) is an element of \mathcal{E} so Γij\Gamma_{ij} is a well-defined hyperfinite path. Let Υ={Γij:(i,j)~}\Upsilon=\{\Gamma_{ij}:(i,j)\in\tilde{\mathcal{E}}\} be a fixed collection of hyperfinite paths. By construction, we have |Γij||γij||\Gamma_{ij}|\leq|{{}^{*}\gamma}_{ij}| for (i,j)SX×SX(i,j)\in S_{X}\times S_{X}.

Suppose Assumption 5 (ϵ\epsilon-separated property) holds for some ϵ>0\epsilon>0. Let v=(s,t)E𝔻v=(s,t)\in{{}^{*}E}\setminus{{}^{*}\mathbb{D}} for s,tSXs,t\in S_{X}. By Assumption 5, vv is contained in finitely many nonstandard paths (elements of Θ{{}^{*}\Theta}). However, vv may be contained in infinitely many hyperfinite paths (elements of Υ\Upsilon). This is because, for a nonstandard path γij{{}^{*}\gamma}_{ij} with i,jSXi,j\in S_{X}, if γij{{}^{*}\gamma}_{ij} contains a nonstandard edge e=(x,y)e=(x,y) where xB(s)x\in B(s) and yB(t)y\in B(t), then the hyperfinite path Γij\Gamma_{ij} contains vv. So we impose the following regularity condition on Θ\Theta.

Assumption 8.

There exist K>0K\in\mathbb{R}_{>0} and a continuous function m:E𝔻[0,K]m:E\setminus\mathbb{D}\to[0,K] with the following property: for every ϵ>0\epsilon>0 and standard edges e,eE𝔻e,e^{\prime}\in E\setminus\mathbb{D}, if W(e,e)<ϵW(e,e^{\prime})<\epsilon and eγΘe\in\gamma\in\Theta, there exists γΘ\gamma^{\prime}\in\Theta such that W(γ,γ)<m(e)ϵW(\gamma,\gamma^{\prime})<m(e)\epsilon and eγe^{\prime}\in\gamma^{\prime}.

By the transfer principle, m{{}^{*}m} is an internal function from E𝔻{{}^{*}E}\setminus{{}^{*}\mathbb{D}} to {{}^{*}\mathbb{R}}. Assumption 8 also implies that m(e)m(e){{}^{*}m}(e)\approx{{}^{*}m}(e^{\prime}) for e,eE𝔻e,e^{\prime}\in{{}^{*}E}\setminus{{}^{*}\mathbb{D}} such that W(e,e)0{{}^{*}W}(e,e^{\prime})\approx 0 and 𝗌𝗍(e)𝔻\mathsf{st}(e)\not\in\mathbb{D}. We have the following result.

Lemma 6.7.

Suppose Assumption 8 holds. Let v=(i,j)E𝔻v=(i,j)\in{{}^{*}E}\setminus{{}^{*}\mathbb{D}} be a hyperfinite edge that is contained in nn nonstandard paths (elements of Θ{{}^{*}\Theta}). Then vv is contained in at most 2Kn2Kn many hyperfinite paths. Suppose that 𝗌𝗍(v)=(𝗌𝗍(i),𝗌𝗍(j))E𝔻\mathsf{st}(v)=(\mathsf{st}(i),\mathsf{st}(j))\in E\setminus\mathbb{D}. Then vv is contained in at most 2(m(𝗌𝗍(v))+1)×n2(\lceil m(\mathsf{st}(v))\rceil+1)\times n many hyperfinite paths.

Proof.

Recall that each element in {B(s):sSX}\{B(s):s\in S_{X}\} is a rectangle with side length δt\delta t. Let A={γk:kn}A=\{{{}^{*}\gamma}_{k}:k\leq n\} denote the collection of nonstandard paths that contain vv and let Γ\Gamma be a hyperfinite path that contains vv. By the construction of hyperfinite paths, there exists a nonstandard path γ{{}^{*}\gamma}^{\prime} that corresponds to Γ\Gamma. Note that γ{{}^{*}\gamma}^{\prime} has the same start and end points as Γ\Gamma and γ{{}^{*}\gamma}^{\prime} contains some nonstandard edge e=(x,y)E𝔻e=(x,y)\in{{}^{*}E}\setminus{{}^{*}\mathbb{D}} where xB(i)x\in B(i) and yB(j)y\in B(j). As W(e,v)<δt{{}^{*}W}(e,v)<\delta t, by Assumption 8, there exists k0nk_{0}\leq n such that W(γ,γk0)<m(v)δt{{}^{*}W}({{}^{*}\gamma}^{\prime},{{}^{*}\gamma}_{k_{0}})<{{}^{*}m}(v)\delta t. Since m(v)K{{}^{*}m}(v)\leq K, by Assumption 8 and the hypothesis of the lemma, vv is contained in at most 2Kn2Kn many hyperfinite paths. If 𝗌𝗍(v)=(𝗌𝗍(i),𝗌𝗍(j))E𝔻\mathsf{st}(v)=(\mathsf{st}(i),\mathsf{st}(j))\in E\setminus\mathbb{D}, then vv is contained in at most 2(m(𝗌𝗍(v))+1)×n2(\lceil m(\mathsf{st}(v))\rceil+1)\times n many hyperfinite paths. ∎

Lemma 6.8.

Suppose Assumption 2 holds for 𝒪\mathcal{O}. Suppose Assumption 5 holds for some ϵ>0\epsilon>0 and Assumption 8 holds. Let 𝒞\mathcal{C} be a subset of 𝒪\mathcal{O} such that 𝗌𝗍1(𝒞)𝒪\mathsf{st}^{-1}(\mathcal{C})\subset{{}^{*}\mathcal{O}}. Let v=(i,j)(SX×SX)𝗌𝗍1(𝒞)v=(i,j)\in(S_{X}\times S_{X})\cap\mathsf{st}^{-1}(\mathcal{C}) be contained in ΓstΥ\Gamma_{st}\in\Upsilon with iji\not\approx j. Then e=(𝗌𝗍(i),𝗌𝗍(j))e=(\mathsf{st}(i),\mathsf{st}(j)) is an element of E𝔻E\setminus\mathbb{D} and it is contained in γ𝗌𝗍(s)𝗌𝗍(t)Θ\gamma_{\mathsf{st}(s)\mathsf{st}(t)}\in\Theta.

Proof.

By Lemma 6.4, we have vE𝔻v\in{{}^{*}E}\setminus{{}^{*}\mathbb{D}}. Note that e𝒞𝒪e\in\mathcal{C}\subset\mathcal{O} and e𝔻e\not\in\mathbb{D}. By Assumption 2, we know that f(𝗌𝗍(i),𝗌𝗍(j))>0f(\mathsf{st}(i),\mathsf{st}(j))>0, so eE𝔻e\in E\setminus\mathbb{D}. Let γst{{}^{*}\gamma}_{st} denote the nonstandard path associated with Γst\Gamma_{st}. Then γst{{}^{*}\gamma}_{st} contains v=(x,y)v^{\prime}=(x,y) where xB(i)x\in B(i) and yB(j)y\in B(j). As W(v,e)0{{}^{*}W}(v^{\prime},{{}^{*}e})\approx 0, by Assumption 8, we have m(v)m(e)=m(e){{}^{*}m}(v^{\prime})\approx{{}^{*}m}({{}^{*}e})=m(e) and there exists a nonstandard path γ{{}^{*}\gamma}^{\prime} such that W(γst,γ)0{{}^{*}W}({{}^{*}\gamma}_{st},{{}^{*}\gamma}^{\prime})\approx 0 and eγ{{}^{*}e}\in{{}^{*}\gamma}^{\prime}. By Assumption 5, ee is contained in only finitely many paths; by the transfer principle, it must be contained in γ𝗌𝗍(s)𝗌𝗍(t)\gamma_{\mathsf{st}(s)\mathsf{st}(t)}. ∎

Before we present our main result, we introduce one more assumption which asserts that edges with small length can only be contained in paths with close starting and ending points. For e=(a,b)Ee=(a,b)\in E, the length of ee, which we denote by e\|e\|, is defined to be e=d(a,b)\|e\|=d(a,b).

Assumption 9.

There exists a function L:L:\mathbb{R}\to\mathbb{R} such that

  • L(x)>0L(x)>0 for all x>0x>0,

  • L(x)0{{}^{*}L}(x)\approx 0 if and only if x0x\approx 0,

  • For every γabΘ\gamma_{ab}\in\Theta and every eγabe\in\gamma_{ab}, we have eL(d(a,b))\|e\|\geq L(d(a,b)).

Note that, for every continuous function LL such that L(0)=0L(0)=0 and L(x)>0L(x)>0 for x>0x>0, we know that L(x)0{{}^{*}L}(x)\approx 0 if and only if x0x\approx 0.

Lemma 6.9.

Suppose Assumption 9 holds. Let v=(s,t)v=(s,t) be an element of \mathcal{E} such that sts\approx t. If ΓijΥ\Gamma_{ij}\in\Upsilon contains vv, then iji\approx j.

Proof.

Let ΓijΥ\Gamma_{ij}\in\Upsilon contain vv. Then the nonstandard path γij{{}^{*}\gamma}_{ij} contains e=(x,y)Ee=(x,y)\in{{}^{*}E} for xB(s)x\in B(s) and yB(t)y\in B(t). By Assumption 9 and the transfer principle, we have eL(|ij|)\|e\|\geq{{}^{*}L}(|i-j|). As e0\|e\|\approx 0, we know that L(|ij|)0{{}^{*}L}(|i-j|)\approx 0. By Assumption 9 again, we can conclude that iji\approx j. ∎

We are now in a position to establish the main result of this section.

Theorem 6.10.

Let {g(x,1,)}xX\{g(x,1,\cdot)\}_{x\in X} and {g~(x,1,)}xX\{\tilde{g}(x,1,\cdot)\}_{x\in X} be two reversible transition kernels satisfying SSF with stationary distributions π\pi and π~\tilde{\pi}, respectively. Let k(),k~()k(\cdot),\tilde{k}(\cdot) denote the density functions for π,π~\pi,\tilde{\pi}, respectively, and let f(x,),f~(x,)f(x,\cdot),\tilde{f}(x,\cdot) denote the density functions for g(x,1,),g~(x,1,)g(x,1,\cdot),\tilde{g}(x,1,\cdot), respectively. Suppose Assumptions 2 and 3 hold for some 𝒪\mathcal{O}, Assumption 4 holds for some 𝒞\mathcal{C} and ϵ0>0\epsilon_{0}>0, Assumption 5 holds for some ϵ>0\epsilon>0, Assumption 6 holds for some R>0R>0, Assumption 7 holds for some M>0M>0, Assumption 8 holds for some continuous function m:Em:E\to\mathbb{R}, and Assumption 9 holds for some L:L:\mathbb{R}\to\mathbb{R}. Let

B=supeE(2(m(e)+1)Q(e)(x,y):eγxyQ~(x,y)(|γxy|+M)).\displaystyle B=\operatorname*{\mathrm{sup}\vphantom{\mathrm{infsup}}}_{e\in E}\left(\frac{2(\lceil m(e)\rceil+1)}{Q(e)}\sum_{(x,y):e\in\gamma_{xy}}\tilde{Q}(x,y)(|\gamma_{xy}|+M)\right). (6.14)

Then, for every continuous function h:Xh:X\to\mathbb{R}, we have

g~(h,h)Bg(h,h).\displaystyle\mathscr{E}_{\tilde{g}}(h,h)\leq B\mathscr{E}_{g}(h,h). (6.15)
Proof.

Let {Hi()}iSX\{H_{i}(\cdot)\}_{i\in S_{X}} and {H~i()}iSX\{\tilde{H}_{i}(\cdot)\}_{i\in S_{X}} be two hyperfinite transition kernels associated with {g(x,1,)}xX\{g(x,1,\cdot)\}_{x\in X} and {g~(x,1,)}xX\{\tilde{g}(x,1,\cdot)\}_{x\in X} as defined in Theorem 3.5. Let Π()\Pi(\cdot) and Π~()\tilde{\Pi}(\cdot) be *stationary distributions for {Hi()}iSX\{H_{i}(\cdot)\}_{i\in S_{X}} and {H~i()}iSX\{\tilde{H}_{i}(\cdot)\}_{i\in S_{X}}, respectively. By Theorem 3.5, we know that {Hi()}iSX\{H_{i}(\cdot)\}_{i\in S_{X}} is *reversible with respect to Π\Pi and {H~i()}iSX\{\tilde{H}_{i}(\cdot)\}_{i\in S_{X}} is *reversible with respect to Π~\tilde{\Pi}. Let

𝔹=maxv(1𝒬(v)(s,t):vΓst𝒬~(s,t)|Γst|)\displaystyle\mathbb{B}=\operatorname*{\mathrm{max}\vphantom{\mathrm{infsup}}}_{v\in\mathcal{E}}\left(\frac{1}{\mathcal{Q}(v)}\sum_{(s,t):v\in\Gamma_{st}}\tilde{\mathcal{Q}}(s,t)|\Gamma_{st}|\right) (6.16)

be the hyperfinite congestion ratio. By the transfer principle, for every internal function F:SXF:S_{X}\to{{}^{*}\mathbb{R}}, we have

H~(F,F)𝔹H(F,F).\displaystyle\mathscr{F}_{\tilde{H}}(F,F)\leq\mathbb{B}\mathscr{F}_{H}(F,F). (6.17)
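Eq. 6.17 is the transfer of the standard finite-state canonical-paths bound (cf. [DSC93]). For completeness, here is a sketch of the finite argument being transferred, writing dF(v)=F(j)F(i)dF(v)=F(j)-F(i) for an edge v=(i,j)v=(i,j), taking 𝔹\mathbb{B} as in Eq. 6.16, and assuming the hyperfinite Dirichlet form has the usual shape H(F,F)=12v𝒬(v)dF(v)2\mathscr{F}_{H}(F,F)=\tfrac{1}{2}\sum_{v}\mathcal{Q}(v)\,dF(v)^{2}:

```latex
% Telescope F along the path \Gamma_{st} and apply Cauchy--Schwarz:
(F(t)-F(s))^2=\Bigl(\sum_{v\in\Gamma_{st}}dF(v)\Bigr)^2
  \le |\Gamma_{st}|\sum_{v\in\Gamma_{st}}dF(v)^2 .
% Sum against \tilde{\mathcal{Q}} and exchange the order of summation:
\mathscr{F}_{\tilde{H}}(F,F)
  =\tfrac{1}{2}\sum_{s,t}\tilde{\mathcal{Q}}(s,t)\,(F(t)-F(s))^2
  \le \tfrac{1}{2}\sum_{v}dF(v)^2
      \sum_{(s,t)\colon v\in\Gamma_{st}}\tilde{\mathcal{Q}}(s,t)\,|\Gamma_{st}|
  \le \mathbb{B}\,\tfrac{1}{2}\sum_{v}\mathcal{Q}(v)\,dF(v)^2
  = \mathbb{B}\,\mathscr{F}_{H}(F,F).
```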

Pick a continuous function h0:Xh_{0}:X\to\mathbb{R}. Define an internal function F0:SXF_{0}:S_{X}\to{{}^{*}\mathbb{R}} by letting F0(s)=h0(s)F_{0}(s)={{}^{*}h_{0}}(s) for every sSXs\in S_{X}. By Theorem 4.3, we have g(h0,h0)H(F0,F0)\mathscr{E}_{g}(h_{0},h_{0})\approx\mathscr{F}_{H}(F_{0},F_{0}) and g~(h0,h0)H~(F0,F0)\mathscr{E}_{\tilde{g}}(h_{0},h_{0})\approx\mathscr{F}_{\tilde{H}}(F_{0},F_{0}). Thus, to finish the proof, it is sufficient to show that 𝔹B\mathbb{B}\lessapprox B.

Pick v=(i,j)v=(i,j)\in\mathcal{E}. Suppose (B(i)×B(j))𝒞=(B(i)\times B(j))\cap{{}^{*}\mathcal{C}}=\emptyset. Let ΓabΥ\Gamma_{ab}\in\Upsilon. As H~a({b})>0{{}^{*}\tilde{H}}_{a}(\{b\})>0, we conclude that (a,b)E~ϵ0(a,b)\in{{}^{*}\tilde{E}_{\epsilon_{0}}}. By Assumption 4 and the transfer principle, we know that the nonstandard path γab{{}^{*}\gamma}_{ab} is formed by edges from 𝒞{{}^{*}\mathcal{C}}. As (B(i)×B(j))𝒞=(B(i)\times B(j))\cap{{}^{*}\mathcal{C}}=\emptyset, we know that vΓabv\not\in\Gamma_{ab}. Thus, we can conclude that

1𝒬(v)(s,t):vΓst𝒬~(s,t)|Γst|=0.\displaystyle\frac{1}{\mathcal{Q}(v)}\sum_{(s,t):v\in\Gamma_{st}}\tilde{\mathcal{Q}}(s,t)|\Gamma_{st}|=0. (6.18)

We now suppose that (B(i)×B(j))𝒞(B(i)\times B(j))\cap{{}^{*}\mathcal{C}}\neq\emptyset but iji\approx j. By Assumption 5, vv is contained in finitely many nonstandard paths. By Lemma 6.7, vv is contained in finitely many hyperfinite paths. By Lemma 6.9, any hyperfinite path containing vv has infinitesimal length. Hence, we have

1𝒬(v)(s,t):vΓst𝒬~(s,t)|Γst|=0\displaystyle\frac{1}{\mathcal{Q}(v)}\sum_{(s,t):v\in\Gamma_{st}}\tilde{\mathcal{Q}}(s,t)|\Gamma_{st}|=0 (6.19)

in this case.

We now suppose that v𝗌𝗍1(𝒞)𝗌𝗍1(𝒪)𝒪v\in\mathsf{st}^{-1}(\mathcal{C})\subset\mathsf{st}^{-1}(\mathcal{O})\subset{{}^{*}\mathcal{O}}. This immediately implies that iji\not\approx j. By Assumption 2, we know that 𝗌𝗍(f(i,j))>0\mathsf{st}({{}^{*}f}(i,j))>0 and 𝗌𝗍(k(i))>0\mathsf{st}({{}^{*}k}(i))>0. Let Γab\Gamma_{ab} be a hyperfinite path that contains vv. Suppose f~(a,b)0{{}^{*}\tilde{f}}(a,b)\approx 0. Then, by Lemma 6.3, we know that 𝒬~(a,b)𝒬(v)0\frac{\tilde{\mathcal{Q}}(a,b)}{\mathcal{Q}(v)}\approx 0. By Assumption 5 and Lemma 6.7, vv is contained in finitely many hyperfinite paths. Thus, we have

1𝒬(v)(s,t):vΓst𝒬~(s,t)|Γst|1𝒬(v)vΓst𝒯v𝒬~(s,t)|Γst|\displaystyle\frac{1}{\mathcal{Q}(v)}\sum_{(s,t):v\in\Gamma_{st}}\tilde{\mathcal{Q}}(s,t)|\Gamma_{st}|\approx\frac{1}{\mathcal{Q}(v)}\sum_{v\in\Gamma_{st}\in\mathcal{T}_{v}}\tilde{\mathcal{Q}}(s,t)|\Gamma_{st}| (6.20)

where 𝒯v={Γst:𝗌𝗍(f~(s,t))>0vΓst}\mathcal{T}_{v}=\{\Gamma_{st}:\mathsf{st}({{}^{*}\tilde{f}}(s,t))>0\wedge v\in\Gamma_{st}\}.

By Lemma 6.8, e=(𝗌𝗍(i),𝗌𝗍(j))e=(\mathsf{st}(i),\mathsf{st}(j)) is an element of E𝔻E\setminus\mathbb{D}. It follows from Assumption 5 that ee is contained in finitely many elements of ϝ\digamma. Let

Te={γϝ:eγ}={γx1y1,γx2y2,,γxnyn}\displaystyle T_{e}=\{\gamma\in\digamma:e\in\gamma\}=\{\gamma_{x_{1}y_{1}},\gamma_{x_{2}y_{2}},\dotsc,\gamma_{x_{n}y_{n}}\} (6.21)

for some nn\in\mathbb{N}. By Lemma 6.8 again, we have 𝒯v=i=1nAi\mathcal{T}_{v}=\bigcup_{i=1}^{n}A_{i} where Ai=𝒯v{ΓΥ:W(Γ,γxiyi)0}A_{i}=\mathcal{T}_{v}\cap\{\Gamma\in\Upsilon:{{}^{*}W}(\Gamma,{{}^{*}\gamma}_{x_{i}y_{i}})\approx 0\}. By Lemma 6.7, for every 1in1\leq i\leq n, we have |Ai|2(m(e)+1)|A_{i}|\leq 2(\lceil m(e)\rceil+1). By Lemma 6.3, Assumption 2 and Assumption 3, we have

1𝒬(v)vΓst𝒯v𝒬~(s,t)|Γst|vΓst𝒯vQ~(𝗌𝗍(s),𝗌𝗍(t))Q(e)|Γst|.\displaystyle\frac{1}{\mathcal{Q}(v)}\sum_{v\in\Gamma_{st}\in\mathcal{T}_{v}}\tilde{\mathcal{Q}}(s,t)|\Gamma_{st}|\approx\sum_{v\in\Gamma_{st}\in\mathcal{T}_{v}}\frac{\tilde{Q}(\mathsf{st}(s),\mathsf{st}(t))}{Q(e)}|\Gamma_{st}|. (6.22)

By Lemma 6.5, we have |Γst||γ𝗌𝗍(s)𝗌𝗍(t)|+M|\Gamma_{st}|\leq|\gamma_{\mathsf{st}(s)\mathsf{st}(t)}|+M. Thus, combining with Eq. 6.20, we have

(s,t):vΓstQ~(𝗌𝗍(s),𝗌𝗍(t))Q(e)|Γst|2(m(e)+1)Q(e)(x,y):eγxyQ~(x,y)(|γxy|+M).\displaystyle\sum_{(s,t):v\in\Gamma_{st}}\frac{\tilde{Q}(\mathsf{st}(s),\mathsf{st}(t))}{Q(e)}|\Gamma_{st}|\lessapprox\frac{2(\lceil m(e)\rceil+1)}{Q(e)}\sum_{(x,y):e\in\gamma_{xy}}\tilde{Q}(x,y)(|\gamma_{xy}|+M). (6.23)

Hence, we can conclude that 𝔹B\mathbb{B}\lessapprox B, proving the desired result. ∎
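Theorem 6.10 is, at bottom, the transfer of the finite-state comparison bound of [DSC93], which can be checked numerically. The following minimal sketch is our own illustrative choice of chains and paths, not the construction in the proof: it compares a lazy nearest-neighbour walk on a path graph with i.i.d. sampling from its stationary measure.

```python
import random

n = 5
pi = [1.0 / n] * n  # uniform stationary measure, shared by both kernels

# Kernel P: lazy nearest-neighbour walk on {0, ..., n-1}; reversible w.r.t. pi.
P = [[0.0] * n for _ in range(n)]
for x in range(n):
    for y in (x - 1, x + 1):
        if 0 <= y < n:
            P[x][y] = 0.25
    P[x][x] = 1.0 - sum(P[x])

# Kernel Ptil: i.i.d. sampling from pi (every row equals pi).
Ptil = [[1.0 / n] * n for _ in range(n)]

def Q(K, x, y):              # edge weight Q(x, y) = pi(x) K(x, y)
    return pi[x] * K[x][y]

def gamma(x, y):             # canonical path from x to y along the line
    step = 1 if y > x else -1
    return [(i, i + step) for i in range(x, y, step)]

edges = [(i, j) for i in range(n) for j in (i - 1, i + 1) if 0 <= j < n]

# Congestion ratio, the finite analogue of (6.16):
# B = max_e (1/Q(e)) * sum_{(x,y): e in gamma_xy} Qtil(x,y) |gamma_xy|
B = max(
    sum(Q(Ptil, x, y) * len(gamma(x, y))
        for x in range(n) for y in range(n)
        if x != y and e in gamma(x, y)) / Q(P, *e)
    for e in edges
)

def dirichlet(K, h):         # E(h, h) = (1/2) sum_{x,y} Q(x,y) (h(x) - h(y))^2
    return 0.5 * sum(Q(K, x, y) * (h[x] - h[y]) ** 2
                     for x in range(n) for y in range(n))

random.seed(0)
h = [random.uniform(-1, 1) for _ in range(n)]
assert dirichlet(Ptil, h) <= B * dirichlet(P, h) + 1e-12  # analogue of (6.15)
```

For this example B works out to 12, attained at the middle edges, and the inequality holds with room to spare for any test function h.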

6.4. Example: Finishing the Barbell Graph

We continue from Section 6.2, using the same notation and paths. We will compare the kernel QnQ_{n}^{\prime} to the kernel that takes i.i.d. samples from the stationary measure of QnQ_{n}^{\prime}. Let dd be as in Section 6.3.

We now quickly check the assumptions of Theorem 6.10:

  • Both transition kernels clearly satisfy DSF (indeed, in the statement of the condition one can take δ=0.1\delta=0.1 for all ϵ0\epsilon\geq 0), and thus they also satisfy the weaker condition SSF.

  • Assumption 2 and Assumption 3 hold with 𝒪=X2\mathcal{O}=X^{2}.

  • Assumption 4 holds with 𝒞\mathcal{C} consisting of all edges of length at most 1, and ϵ0=1\epsilon_{0}=1.

  • Assumption 5 holds with any 0<ϵ<10<\epsilon<1.

  • Assumption 6 holds with R=3R=3.

  • Assumption 7 holds trivially with M=R1=2M=R-1=2.

  • Assumption 8 holds for any 0<ϵ<10<\epsilon<1 with m(e)(n+1)m(e)\equiv(n+1).

  • Assumption 9 holds with L(x)xL(x)\equiv x.

Applying these bounds to the formula BB appearing in Theorem 6.10, we find B=Θ(n2)B=\Theta(n^{2}); this gives the correct dependence on nn for the relaxation time.

References

  • [ACH97] “Nonstandard analysis: theory and applications” Proceedings of the NATO Advanced Study Institute on Nonstandard Analysis and its Applications held in Edinburgh, June 30–July 13, 1996 493, NATO Advanced Science Institutes Series C: Mathematical and Physical Sciences Kluwer Academic Publishers Group, Dordrecht, 1997, pp. xiv+366 DOI: 10.1007/978-94-011-5544-1
  • [ADS18] Robert M. Anderson, Haosui Duanmu and Aaron Smith “Mixing Times and Hitting Times for General Markov Processes” submitted, 2018
  • [ADS19] Robert M. Anderson, Haosui Duanmu and Aaron Smith “Mixing Times and Average Mixing Times for General Markov Processes” Submitted, 2019
  • [ADS19a] Robert M. Anderson, Haosui Duanmu and Aaron Smith “Mixing Times and Hitting Times for Gibbs Samplers and other Non-Feller Processes” submitted, 2019
  • [And76] Robert M. Anderson “A non-standard representation for Brownian motion and Itô integration” In Israel J. Math. 25.1-2, 1976, pp. 15–46 DOI: 10.1007/BF02756559
  • [AS03] Sergio Albeverio and Walter Schachermayer “Lectures on probability theory and statistics: ecole d’eté de probabilités de Saint-Flour XXX-2000” Springer Science & Business Media, 2003
  • [BD58] A. Beurling and J. Deny “Espaces de Dirichlet. I. Le cas élémentaire” In Acta Mathematica 99.1, 1958, pp. 203–224
  • [Cut+95] “Developments in nonstandard mathematics” Papers from the International Colloquium (CIMNS94) held in memory of Abraham Robinson at the University of Aveiro, Aveiro, July 18–22, 1994 336, Pitman Research Notes in Mathematics Series Longman, Harlow, 1995, pp. x+260
  • [Dav89] E. B. Davies “Heat Kernels and Spectral Theory”, Cambridge Tracts in Mathematics Cambridge University Press, 1989 DOI: 10.1017/CBO9780511566158
  • [Doo12] Joseph L Doob “Classical potential theory and its probabilistic counterpart: Advanced problems” Springer Science & Business Media, 2012
  • [DR18] Haosui Duanmu and Daniel M. Roy “On extended admissible procedures and their nonstandard Bayes risk” The Annals of Statistics, under revision, 2018
  • [DRS17] Haosui Duanmu, Daniel M. Roy and Aaron Smith “Existence of matching priors on compact spaces yielding confidence intervals” In preparation, 2017
  • [DRW18] Haosui Duanmu, J.S. Rosenthal and William Weiss “Ergodicity of Markov processes via non-standard analysis” Memoirs of the American Mathematical Society, to appear, 2018
  • [DSC93] Persi Diaconis and Laurent Saloff-Coste “Comparison theorems for reversible Markov chains” In The Annals of Applied Probability JSTOR, 1993, pp. 696–730
  • [Fuk96] M Fukushima “Dirichlet forms and symmetric Markov processes” In Bull. Amer. Math. Soc 33, 1996, pp. 87–92
  • [Kei84] H. Jerome Keisler “An infinitesimal approach to stochastic analysis” In Mem. Amer. Math. Soc. 48.297, 1984, pp. x+184 DOI: 10.1090/memo/0297
  • [Kus89] Shigeo Kusuoka “Dirichlet forms on fractals and products of random matrices” In Publications of the Research Institute for Mathematical Sciences 25.4 Research Institute for Mathematical Sciences, 1989, pp. 659–680
  • [LPW09] David Levin, Yuval Peres and Elizabeth Wilmer “Markov Chains and Mixing Times” Providence, Rhode Island: American Mathematical Society, 2009, pp. xi+371
  • [LPW09a] David A. Levin, Yuval Peres and Elizabeth L. Wilmer “Markov chains and mixing times” With a chapter by James G. Propp and David B. Wilson American Mathematical Society, Providence, RI, 2009, pp. xviii+371
  • [MN16] Ulrich K Müller and Andriy Norets “Coverage Inducing Priors in Nonstandard Inference Problems” In Journal of the American Statistical Association 111.515 Taylor & Francis, 2016, pp. 1233–1241
  • [MR12] Zhi-Ming Ma and Michael Röckner “Introduction to the theory of (non-symmetric) Dirichlet forms” Springer Science & Business Media, 2012
  • [PS15] Yuval Peres and Perla Sousi “Mixing times are hitting times of large sets” In J. Theoret. Probab. 28.2, 2015, pp. 488–519 DOI: 10.1007/s10959-013-0497-9
  • [Sch99] Byron Schmuland “Dirichlet forms: some infinite-dimensional examples” In Canad. J. Statist. 27.4, 1999, pp. 683–700 DOI: 10.2307/3316125
  • [WL00] “Nonstandard analysis for the working mathematician” 510, Mathematics and its Applications Kluwer Academic Publishers, Dordrecht, 2000, pp. xiv+311 DOI: 10.1007/978-94-011-4168-0
  • [Yue00] Wai Kong Yuen “Applications of geometric bounds to the convergence rate of Markov chains on n\mathbb{R}^{n}” In Stochastic processes and their applications 87.1 Elsevier, 2000, pp. 1–23
  • [Yue02] Wai Kong Yuen “Generalization of discrete-time geometric bounds to convergence rate of Markov processes on n\mathbb{R}^{n}” In Stochastic models 18.2 Taylor & Francis, 2002, pp. 301–331
  • [Zim05] G. Beate Zimmer “A unifying Radon-Nikodým theorem through nonstandard hulls” In Illinois J. Math. 49.3, 2005, pp. 873–883 URL: http://projecteuclid.org.myaccess.library.utoronto.ca/euclid.ijm/1258138224