A Stein’s Method Approach to the Linear Noise Approximation for Stationary Distributions of Chemical Reaction Networks
Abstract
Stochastic Chemical Reaction Networks are continuous time Markov chain models that describe the time evolution of the molecular counts of species interacting stochastically via discrete reactions. Such models are ubiquitous in systems and synthetic biology, but often have a large or infinite number of states, and thus are not amenable to computation and analysis. Due to this, approximations that rely on the molecular counts and the volume being large are commonly used, with the most common being the Reaction Rate Equations and the Linear Noise Approximation. For finite time intervals, Kurtz established the validity of the Reaction Rate Equations and Linear Noise Approximation, by proving law of large numbers and central limit theorem results respectively. However, the analogous question for the stationary distribution of the Markov chain model has remained mostly unanswered, except for chemical reaction networks with special structures or bounded molecular counts. In this work, we use Stein’s Method to obtain sufficient conditions for the stationary distribution of an appropriately scaled Stochastic Chemical Reaction Network to converge to the Linear Noise Approximation as the system size goes to infinity. Our results give non-asymptotic bounds on the error in the Reaction Rate Equations and in the Linear Noise Approximation as applied to the stationary distribution. As a special case, we give conditions under which the global exponential stability of an equilibrium point of the Reaction Rate Equations is sufficient to obtain our error bounds, thus permitting one to obtain conclusions about the Markov chain by analyzing the deterministic Reaction Rate Equations.
1 Introduction
1.1 Background and existing results
Systems composed of a set of chemical species interacting via a finite set of reactions are most often modeled as a Stochastic Chemical Reaction Network (SCRN), a continuous time Markov chain (CTMC) model for how the molecular counts of each species change over time. Such models have found widespread use in systems and synthetic biology as a method to capture the counts of relevant biomolecules within a single cell, where due to the small numbers of molecules present, the effect of stochasticity in the system’s evolution can be profound [67, 61, 52, 46, 6, 20, 65]. The time evolution of the probability distribution of molecular counts can be computed via the Kolmogorov forward equations, often called the Chemical Master Equation [27], or by a Monte Carlo approach using the Doob-Gillespie Algorithm [16, 26]. However, due to the very large, and often countably infinite, number of states in the Markov chain, simpler approximations have been proposed, the most prominent of which exploit the fact that for many physical systems the volume in which the reactions occur and the molecular counts of all species are large. These approximations are formalized by considering a family of Markov chains , where is the vector of molecular counts at time and is the volume. The Reaction Rate Equation (RRE) model is an Ordinary Differential Equation (ODE), whose solution is a deterministic approximation for , rigorously justified by Kurtz on finite time intervals in the sense that under mild technical conditions, uniformly on any finite time interval [39]. Due to being the solution to an ODE, the computational and analytical simplicity of the RRE model has led to its widespread use in both modeling in systems biology [3] and design in synthetic biology [14, 24, 19]. In essence, the RRE model approximates the concentrations by a constant at each point in time, neglecting any stochastic fluctuation away from this value.
In order to account for these fluctuations, Van Kampen expanded the Forward Kolmogorov Equations in powers of [35, 62]. The first order approximation he derived in this manner, called the Linear Noise Approximation (LNA), adds a correction term to the RREs, specifically a Gaussian approximation to fluctuations of about . The LNA was rigorously justified by Kurtz in the sense of giving technical conditions for the central limit theorem result to hold uniformly on finite time intervals with an appropriate covariance matrix . The LNA is similar to the RREs in that the approximating distribution can be easily computed from the solution to an ODE, and due to this simplicity it has been used widely to quantify noise in natural biological systems [50, 51, 18, 60], as well as for design [56, 55] and inference [38, 22, 23] in synthetic biology.
However, a key weakness of the theoretical justification for the RREs and the LNA is that the results given in [39] do not say anything about the relationship between the stationary distribution of the Markov chain model and the steady state behavior of the RREs or LNA, even when the stationary distribution exists uniquely for all and the RREs/LNA has a well defined limiting behavior. In a later work, Kurtz gave a sufficient condition for the uniform in time convergence of to and of to [40]. However, that result requires that the state space of be bounded uniformly in , which rules out the analysis of many biological systems of interest where the molecular counts of certain species can grow unbounded, and thus for all , evolves on a countably infinite subset of the integer lattice. This technical difficulty in using the RREs or LNA to approximate the stationary distribution of an SCRN has long been understood [62], and in fact it is often the case that the Markov chain and RREs/LNA models have different long term behavior [43]. In this work we give sufficient conditions under which there exists a constant such that for all sufficiently large ,
(1) |
where is distributed according to the stationary distribution of , is an appropriate equilibrium point of the RRE, with a covariance matrix computed via the LNA, and is the 1-Wasserstein distance between the laws of the two random variables. This result immediately implies both that the Markov chain’s stationary distribution converges to an equilibrium point of the RRE in the sense , where is an appropriate equilibrium of the RRE, and that [25]. We give sufficient conditions in terms of the second moment of for (1) to hold, and additionally give sufficient conditions on only the RRE model for (1) to hold, which include global exponential stability of . Interestingly, the convergence rate given by (1) is identical to the convergence rate shown by Kurtz on finite time intervals [41].
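To make the advertised rate concrete, consider a hypothetical one-species birth-death network (production at rate kV and unit per-molecule degradation; this example and its rates are our illustration, not taken from the results above), whose stationary distribution is Poisson with mean kV. In one dimension the 1-Wasserstein distance equals the integral of the absolute difference of the CDFs, so the distance between the centered, √V-scaled stationary law and the limiting Gaussian can be computed numerically and compared against the O(V^{-1/2}) rate:

```python
import numpy as np
from scipy import stats

def w1_to_gaussian(V, lam=1.0):
    """W1 between (N - V*lam)/sqrt(V), N ~ Poisson(V*lam), and N(0, lam).

    Uses the 1-D identity W1(mu, nu) = integral over t of |F_mu(t) - F_nu(t)|,
    approximated by a Riemann sum on a fine grid."""
    t = np.linspace(-8.0, 8.0, 16001)
    F = stats.poisson.cdf(np.floor(V * lam + t * np.sqrt(V)), mu=V * lam)
    G = stats.norm.cdf(t, loc=0.0, scale=np.sqrt(lam))
    return np.sum(np.abs(F - G)) * (t[1] - t[0])

errs = [w1_to_gaussian(V) for V in (10, 40, 160)]
# The errors should shrink roughly like 1/sqrt(V): each 4x increase
# in V should cut the Wasserstein error roughly in half.
```

The grid truncation at ±8 standard deviations is harmless here since both tails are negligible beyond that range.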
The primary technique we use to prove our results is Stein’s Method [13]. Stein’s Method is a powerful technique for proving central limit theorems introduced by Stein in 1972 [59, 58], and used in a wide variety of situations since then, such as when the limiting distribution is Poisson [7] or Binomial [17], as well as for convergence to a diffusion process [8]. The most closely related work to our study comes from queueing theory, where Braverman et al., using Stein’s Method, and Gurvich, using a closely related technique, studied diffusion approximations of certain queueing systems in the heavy traffic regime [11, 10, 33]. Our application of Stein’s method follows that of [11] at a high level, where the key difference is that our family of CTMCs and limiting process is in dimensions instead of one. In order to handle this, we use results on Stein Factors from [30].
1.2 Related work
Although rigorous work on approximating the stationary distribution of SCRNs in the large volume limit has been limited, a variety of related ideas have been presented in the literature. In particular, when the microscopic rates are all affine, it is known that the LNA matches the first and second moments of for all time [62, 43]. This idea has been extended to a restricted class of SCRNs with nonlinear propensities in [31]. Additionally, Sontag and Singh gave a special class of SCRNs where the moments can be exactly computed even when they may not match the moments of the LNA exactly [57]. However, in these cases it is still unclear if the stationary distribution of converges to the Gaussian given by the LNA in the large volume limit. In fact, the results we present allow one to rigorously go from convergence of to with respect to the first and second moments, to convergence in 1-Wasserstein distance. We can thus strengthen the results of e.g. [31] with limited additional work. We note that methods for establishing convergence of the stationary distributions of a family of Markov Processes to the stationary distribution of a limiting process are given in [21]. However, not only have these methods not been successfully applied to SCRNs, but also our Stein’s Method approach yields non-asymptotic bounds on the approximation error in 1-Wasserstein distance.
In certain special cases, such as detailed or complex balance, the stationary distribution of can be computed in closed form, and thus in principle one could directly analyze the approximation error when using the LNA [66, 4, 48]. However, for most SCRNs of interest in biology, such results are not applicable. For example, a simple two step transcription-translation model for protein expression is not detailed or complex balanced [14], and thus the results in [66, 4] cannot be used to obtain a closed form expression for the stationary distribution, whereas the results of [48] are only exact for systems where has finitely many states.
We note that the LNA is only one diffusion approximation for SCRNs, and in fact the Chemical Langevin Equation (CLE) is often a more accurate approximation to for moderately large volumes [28]. The convergence of to the CLE on finite time intervals was established by Kurtz [41], but using the CLE to approximate the stationary distribution of has remained mostly unstudied. In fact, the CLE often exhibits finite time breakdown and thus does not have a well defined stationary distribution, a problem that has been tackled by allowing complex values in the Stochastic Differential Equation [54], as well as by imposing the physical constraint that concentrations are nonnegative [42].
1.3 Outline of the rest of the paper
The rest of the paper is as follows: In Section 2 we give notation and mathematical background, and then introduce the three relevant models. In Section 3 we state our main results, the proofs of which appear in Section 4. In Section 5 we illustrate our results with several examples. In Section 6 we provide concluding remarks and future research directions.
2 Problem Setting
In this section we first introduce the notation we will use, and then introduce SCRNs and the associated CTMC model along with the Reaction Rate Equations and Linear Noise Approximation models. We also introduce the 1-Wasserstein distance, which we will use to measure the distance between probability distributions.
2.1 Notation
We denote (column) vectors by bold symbols , with the elements indexed as . Let [] denote the nonnegative [positive] integer lattice in dimensions. Let denote the set of rational numbers. Let , , and denote n-dimensional Euclidean space, and the nonnegative and positive orthants of the same respectively. For we denote the standard 2-norm by , and for a matrix we denote by the induced 2-norm , and denote by the condition number of . When all eigenvalues of have strictly negative real parts we say that is Hurwitz. Given a function , we denote by , , and the row vector of partial derivatives, the gradient, and the Hessian of respectively. We denote the third (directional) derivative of by , along the directions , and define its induced norm as . For two functions and from to , we define the convolution of and as , with the Lebesgue measure on . We denote the set of k-continuously differentiable functions on a domain by , where we omit the domain if it is clear from context.
2.2 Stochastic Chemical Reaction Networks
We consider Stochastic Chemical Reaction Networks parameterized by , composed of species , interacting according to reactions, each characterized by a nonzero reaction vector and a microscopic propensity function , which depends on . The reaction vector describes the change in the counts of the species when reaction fires, and the microscopic propensity function describes the rate at which reaction fires. Thus, for each , the Stochastic Chemical Reaction Network describes a CTMC , where represents the molecular counts of the species. The state space can be chosen as any subset of such that 1) for all and 2) for all and all such that , . The infinitesimal transition rates from to are given by . We refer the interested reader to [5] for a more thorough background on SCRNs. Throughout this work we will need to impose the following assumption on our SCRNs:
Assumption 2.1.
The propensity functions have the form
(2) |
Additionally, there exists where for each , for some , or , such that:
-
1.
For all , for all such that there exists satisfying .
-
2.
For all , for all such that for all , .
Remark 2.1.
The functions are the macroscopic propensities of each reaction.
Under Assumption 2.1, we can define . Note that our assumption that ensures that , i.e. our family of Markov chains includes those with arbitrarily large volume. Under Assumption 2.1 we will always choose to define the state space of our CTMC as . In order to apply our Stein’s Method technique, we will need the following assumption:
Assumption 2.2.
For all , is twice continuously differentiable on some open set containing , and has bounded second derivatives on .
One form of which satisfies Assumptions 2.1 and 2.2 is mass action kinetics [62], which means that the propensities are of the form
(3) |
where with is the reaction vector of the chemical reaction
(4) |
The requirement that must be imposed for to satisfy Assumption 2.2. See [27] for a discussion of the physical assumptions necessary to reach such propensities. We note that (3) requires that no reactant appears twice on the left hand side of the reaction. Having a reactant appear twice on the left hand side of a reaction creates a propensity proportional to instead of the physically accurate [27]. Here is the reaction rate constant of reaction and is the th element of . Another form of that satisfies Assumptions 2.1 and 2.2 is
(5) |
where and such that or . Such “Hill-type propensities” can arise through timescale separation in systems of chemical reactions with mass action kinetics [36, 47, 34]. For simplicity, we will impose the following assumption:
Assumption 2.3.
The stoichiometric subspace, , has dimension , and for each , is irreducible.
Though Assumption 2.3 appears to rule out SCRNs that have conservation laws, i.e. a linear combination of species counts that remains unchanged by all reactions, we will show through an example in Section 5 that Assumption 2.3 does not fundamentally prevent the analysis of SCRNs with conservation laws, as long as a change of coordinates exists where the system can be represented in a lower dimensional lattice as an SCRN with stoichiometric subspace of dimension . For the generator of is
(6) |
We will also consider two shifted and scaled versions of defined by
(7) |
and
(8) |
where is a specified point in . For the generator of is given by
(9) |
and the generator of is given by
(10) |
When has a unique stationary distribution , we denote by a random variable having law . We likewise define and .
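The CTMC described above can be simulated directly with the Doob-Gillespie algorithm: in each state, sample an exponential holding time with rate equal to the total propensity, then fire a reaction chosen with probability proportional to its propensity. A minimal sketch, using a hypothetical birth-death network (0 → S at rate kV, S → 0 at per-molecule rate g) chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def gillespie(x0, reaction_vectors, propensities, t_end):
    """One Doob-Gillespie trajectory of an SCRN.

    propensities(x) returns the vector of reaction rates in state x."""
    t, x = 0.0, np.array(x0, dtype=float)
    while True:
        a = propensities(x)
        a0 = a.sum()
        if a0 <= 0.0:
            return x                      # absorbing state reached
        t += rng.exponential(1.0 / a0)    # holding time ~ Exp(total rate)
        if t > t_end:
            return x
        j = rng.choice(len(a), p=a / a0)  # reaction chosen prop. to its rate
        x += reaction_vectors[j]

# Hypothetical birth-death example: 0 -> S at rate k*V, S -> 0 at rate g*x.
V, k, g = 100.0, 1.0, 1.0
vecs = np.array([[1.0], [-1.0]])
prop = lambda x: np.array([k * V, g * x[0]])
x_end = gillespie([0.0], vecs, prop, t_end=50.0)  # stationary mean is k*V/g
```

For this linear example the stationary distribution is Poisson with mean kV/g, so a long trajectory should fluctuate around 100 with standard deviation 10.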
2.3 Reaction Rate Equations
The Reaction Rate Equations (RREs) are an Ordinary Differential Equation model used to approximate . Defining and , the reaction rate equations are
(11) |
where and . An equilibrium point of the RREs is any point satisfying . We note that under Assumption 2.1, will be positively invariant with respect to (11).
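Since the RREs are an ODE, their transients and equilibria can be computed with any standard solver. A minimal sketch for a hypothetical network 0 → A at rate k1 and A + A → 0 at rate k2 (our own illustrative example with mass action macroscopic propensities, so dc/dt = k1 − 2·k2·c² since each firing of the second reaction removes two molecules), which converges to the equilibrium where the drift vanishes:

```python
import numpy as np
from scipy.integrate import solve_ivp

k1, k2 = 1.0, 0.5

def rre(t, c):
    # Macroscopic drift: production minus dimerization consumption.
    return [k1 - 2.0 * k2 * c[0] ** 2]

sol = solve_ivp(rre, (0.0, 20.0), [0.0], rtol=1e-8, atol=1e-10)
c_eq = np.sqrt(k1 / (2.0 * k2))  # equilibrium: k1 - 2*k2*c^2 = 0
```

With these rates c_eq = 1, and the numerical solution approaches it exponentially fast.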
2.4 Linear Noise Approximation
The Linear Noise Approximation (LNA) is a diffusion approximation obtained by expanding the Chemical Master Equation using the ansatz [62]. The terms and are given by
(12a) | ||||||
(12b) |
where evidently (12a) is the RRE (11). Here is a unit covariance Wiener Process. Let be an equilibrium point of (12a). We note that must hold for (12b) to make sense, which is guaranteed by e.g. Assumption 2.1. We denote by the solution to (12b) with and . The generator of is
(13) |
where . We remark that is an Ornstein–Uhlenbeck process, and thus is Gaussian as long as is. The stationary distribution of is a zero mean Gaussian with covariance matrix given as the solution to the Lyapunov Equation
(14) |
which has a unique positive definite solution as long as is Hurwitz. In this case, we denote by a random variable distributed according to .
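Numerically, the stationary covariance of the LNA is obtained by solving the Lyapunov equation with a standard linear algebra routine. A sketch with a hypothetical Hurwitz Jacobian J and diffusion matrix D (both matrices are invented for illustration; in practice they come from linearizing the RRE drift and evaluating the LNA diffusion at the equilibrium):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical two-species linearization: J is the (Hurwitz) Jacobian of the
# RRE drift at the equilibrium, D the LNA diffusion matrix evaluated there.
J = np.array([[-2.0, 0.0], [1.0, -1.0]])
D = np.array([[2.0, 0.0], [0.0, 1.0]])

# solve_continuous_lyapunov(A, Q) solves A X + X A^T = Q, so passing -D
# yields the stationary covariance: J Sigma + Sigma J^T + D = 0.
Sigma = solve_continuous_lyapunov(J, -D)
```

Since J is Hurwitz and D is positive definite, the resulting Sigma is the unique symmetric positive definite solution.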
2.5 1-Wasserstein Distance
In this work we measure the distance between probability distributions using the 1-Wasserstein distance [25]. By a slight abuse of notation, we define the 1-Wasserstein distance between two random variables as the 1-Wasserstein distance between the laws of the random variables, formally:
Definition 2.1.
Given two random variables and , both taking values in , we define the 1-Wasserstein distance between and as
(15) |
where .
We remark that this definition of 1-Wasserstein distance uses the dual formulation [63], and is a slight abuse of notation since the 1-Wasserstein distance is usually defined between two probability distributions. Our definition is equivalent to taking the 1-Wasserstein distance between the laws of and .
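For one-dimensional laws, the 1-Wasserstein distance can also be computed from samples, since it equals the L1 distance between the quantile functions; SciPy exposes this as wasserstein_distance. As a quick sanity check (our own example, not from the text), for two Gaussians with equal variance the distance is exactly the distance between their means:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=200_000)
y = rng.normal(0.5, 1.0, size=200_000)

# For N(m1, s^2) and N(m2, s^2), W1 = |m1 - m2|, so the empirical
# distance computed from samples should be close to 0.5.
d = wasserstein_distance(x, y)
```

The empirical estimate converges to the true distance as the sample size grows, with error on the order of n^{-1/2} here.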
3 Approximating Stationary Distributions of SCRNs
Here we state the main results of this work. To begin, we present the technical conditions that will guarantee that the stationary distribution of converges to . We then give our main result, showing that when the first and second moments of are controlled, we can bound the 1-Wasserstein distance between and . We next show that this result can be used to analyze the RRE approximation as well. Finally, we give a method to check that the second moment of is controlled by relating the global stability properties of the RRE to the required moment bound.
Condition 3.1.
There exists , a polynomial of degree , such that for all and all , . Furthermore, there exists such that for all , has a unique stationary distribution , which satisfies
(16) |
Condition 3.1 does not require any type of uniformity in , and therefore the condition can be checked instead.
Condition 3.2 (Uniform moment bounds).
There exist constants , , and such that for all , has a unique stationary distribution , which satisfies
(17) |
and
(18) |
Remark 3.1.
Remark 3.2.
Recalling that is the “concentration scaled” Markov chain shifted to , Condition 3.2 is equivalent to the existence of , , and such that for all , there exists a unique stationary distribution of , , and it satisfies
(19) |
and
(20) |
We are now ready to state our main result.
Theorem 3.1.
Proof.
See Section 4.1. ∎
Remark 3.3.
Remark 3.4.
Though our Stein’s Method approach does not require the preliminary step of establishing a law of large numbers type result relating to , an equilibrium point of (12a), we can conclude such a result whenever the assumptions of Theorem 3.1 are satisfied.
Corollary 3.1.
Proof.
From the triangle inequality, we have that
(24) |
To bound the right hand side, notice that . Thus,
(25) |
Under the assumptions of the corollary, we can apply Theorem 3.1 to conclude that there exist and such that for all ,
(26) |
and thus from (25), we have that for all ,
(27) |
Observing that , we have from (24) and (27) that for all ,
(28) | ||||
(29) |
which completes the proof. ∎
A special case where Conditions 3.1 and 3.2 are easy to verify is an SCRN with mass action kinetics and only zeroth and first order reactions. This idea is formalized in Corollary 3.2.
Corollary 3.2.
Proof.
The proof relies on the fact that for SCRNs with only zeroth and first order reactions, the LNA gives exactly the first and second moments of . Specifically, the fact that is Hurwitz implies that we can apply Proposition 7 in [32] to conclude that all moments of are finite, and thus Condition 3.1 holds. Additionally, since all of the propensities are affine, a computation of the first and second moment dynamics of reveals that for all time [43], and . Therefore, by Remarks 3.1 and 3.2, Condition 3.2 is satisfied. The result then follows from Theorem 3.1. ∎
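The moment-matching fact used in this proof can be checked by hand for the simplest linear network. For a hypothetical birth-death system 0 → S at rate kV and S → 0 at per-molecule rate g (our illustration), the CTMC stationary law is Poisson with mean kV/g, while the LNA gives equilibrium c* = k/g and a scalar Lyapunov equation 2(−g)σ² + (k + g c*) = 0, i.e. σ² = k/g; on the counts scale the mean and variance agree exactly:

```python
# Hypothetical birth-death network: 0 -> S at rate k*V, S -> 0 at rate g*x.
V, k, g = 50.0, 2.0, 1.0

# CTMC: stationary distribution is Poisson(k*V/g), so mean = variance.
mean_ctmc = var_ctmc = k * V / g

# LNA: equilibrium c* = k/g; the scalar Lyapunov equation
# 2*(-g)*s2 + (k + g*c) = 0 gives the stationary variance s2 = k/g.
c = k / g
s2 = (k + g * c) / (2.0 * g)
mean_lna, var_lna = V * c, V * s2  # moments on the counts scale
```

Both mean and variance equal kV/g, consistent with the exactness of the LNA moments for zeroth and first order mass action networks.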
We now give a result that gives sufficient conditions on the RREs for the conclusion of Theorem 3.1 to hold. This result is substantially easier to use in practice than Theorem 3.1, since it requires only analyzing an ODE with variables, and not the stationary distribution of a CTMC over .
Theorem 3.2.
Proof.
See Section 4.4. ∎
4 Proofs of the Main Results
In this section we provide the proofs of the results presented in Section 3.
4.1 Proof of Theorem 3.1
We require the following Lemma from [29], which gives sufficient conditions for the Basic Adjoint Relationship (BAR) to hold.
Lemma 4.1 (Proposition 3 in [29]).
Consider a CTMC over state space with rate matrix having entries and stationary distribution . If
(34) |
then
(35) |
It will be convenient for us to consider the 1-Wasserstein distance computed by taking a supremum not over all 1-Lipschitz functions, but over the 1-Lipschitz functions that are additionally in with bounded derivatives. To this end, let
(36) |
We have the following Lemma, which is proved in Section 4.3.
Lemma 4.2.
The following holds for any two probability distributions and over :
(37) |
We now present the derivative bounds that we will need. The proof of Lemma 4.3 uses results from [30], and is provided in Section 4.2.
Lemma 4.3 (Derivative Bounds).
Consider the Stein Equation
(38) |
for . Assume that there exists a symmetric matrix and a real number such that
(39) |
Assume also that has a right inverse. Then, there exists a solution to the Stein Equation, which satisfies
(40) |
(41) |
and for all , there exists such that
(42) |
where , , and are given by
(43) | ||||
(44) | ||||
(45) |
where , is the right inverse of , and
(46) |
Proof.
See Section 4.2. ∎
Remark 4.1.
The assumption that there exists a symmetric matrix and a real number such that
(47) |
is equivalent to the assumption that is Hurwitz.
Our goal is to analyze
(48) |
We begin by observing that per Lemma 4.2 we can obtain by taking the supremum over instead of , and thus we can analyze
(49) |
To do this we use Stein’s Method following [11]. The Stein Equation is
(50) |
which in our case is the Poisson equation
(51) |
which is guaranteed to have a solution by Lemma 4.3. Throughout the rest of this proof, we will denote by a solution to (51) with that satisfies the bounds given by Lemma 4.3. We take the expectation of the Stein Equation with respect to to find
(52) |
Before proceeding, we show that for all . We have that , and so by Condition 3.1, there exists such that
(53) |
where the sums are taken over such that . By Lemma 4.3, is Lipschitz, and so . Therefore,
(54) | ||||
(55) |
which is finite due to Condition 3.1. Therefore, from Lemma 4.1, . Thus, for all , (52) implies that
(56) | ||||
(57) | ||||
(58) |
where is a solution to the Stein equation. The existence of is guaranteed by Lemma 4.3. For conciseness, let us define . Let . For a fixed we take the first order Taylor expansion of in using the Lagrange form of the remainder.
(59) | ||||
(60) | ||||
(61) |
where , and can depend on . We have that , and so from Taylor’s Theorem we have
(62) |
where , in which the supremum over is finite due to our assumption that has bounded second derivative. The Taylor expansion above then becomes
(63) |
Collecting terms and using the fact that , we obtain
(64) |
Observe that the first two terms are identical to , since
(65) |
with . Therefore, for all such that ,
(66) |
Letting and invoking Lemma 4.3, using the fact that Assumptions 2.1 and 2.3 ensure that has a right inverse (since and has full column rank), we have
(67) |
This bound holds for any , with chosen as the solution to the Stein equation (51) specified by Lemma 4.3, and thus from (56) we have
(68) |
Using the moment bounds and the linearity of the expectation we have
(69) |
To prove the error bound proportional to , we take for , in which case . Let . For all we have
(70) |
The term decays the slowest, which yields the desired result.
4.2 Proof of Lemma 4.3
Here we prove Lemma 4.3. The proof is an application of Theorem 5 in [30], with the bounds tightened with more detailed analysis in our special case. For convenience we adopt the following notation from [30]. For a function with domain , we use the notation
(71) |
where , and
(72) |
Here, for , denotes the th derivative of along the directions . An Itô Diffusion is the solution to
(73) |
where is an dimensional Wiener process. We denote the transition semigroup of by , and the infinitesimal generator of by . has Wasserstein decay rate r(t) for a nonincreasing integrable function if for all and all ,
(74) |
For clarity, we start by stating the following theorem, slightly adapted from [30], which ensures the existence of constants , and . We will shortly improve these constants in our special case.
Theorem 4.1 (Theorem 5 in [30]).
Let be Lipschitz. Consider an dimensional Itô diffusion given by (73) with Wasserstein decay rate and invariant measure . If and have locally Lipschitz second and third derivatives and a right inverse for each , and with bounded second and third derivatives, then
(75) |
is twice continuously differentiable, satisfies the Stein equation
(76) |
and additionally,
(77) |
and
(78) |
where
(79) | |||
(80) |
with and . Furthermore, for all ,
(81) |
for that depends only on , , , and .
We now show that has a Wasserstein decay rate of , meaning that
(82) |
where denotes , the solution to (12b) with and . Observe that and are Gaussian with the same covariance. Therefore, letting , we have
(83) |
which can be shown by considering the lower bound
(84) |
from Jensen’s inequality, and the upper bound
(85) |
as evidenced by the coupling . Theorem 4.1 then gives us the constant . To prove the claimed expressions for and , we must specialize the arguments of [30] to our diffusion, which has a linear and Hurwitz and a uniform . As in [30], denote by the first directional derivative flow, defined as the solution to
(86) |
Further, define , the second directional derivative flow, as the solution to
(87) |
In our proof the following lemma replaces the derivative flow bounds given in Lemma 16 of [30].
Lemma 4.4.
Consider an Itô diffusion given by (73) with transition semigroup and assume that where satisfies
(88) |
for a matrix and a real number . Assume also that is a constant. Then, for all , , and ,
(89) |
and for all , .
Proof.
Under the assumptions of the Lemma, (86) becomes
(90) |
and thus . Therefore, . To bound this quantity, observe that the condition
(91) |
implies that
(92) |
where is the principal square root of . It then follows from standard arguments from linear systems theory [15, 45] that for all ,
(93) |
We then have that for all and ,
(94) | ||||
(95) | ||||
(96) |
which establishes the desired bound on . To complete the proof, observe that under the assumptions of the lemma, (87) becomes
(97) |
and thus . ∎
With this specialization, we now trace the construction of and through the same argument used in the proof of Theorem 5 in [30] to obtain the claimed expressions. Using Lemma 4.4 we make the specializations described in Lemmas 4.5 and 4.6, which we state and prove before proceeding. In our case, the bound on given by Lemma 15 in [30] can be replaced by the following lemma.
Lemma 4.5.
Consider an Itô diffusion given by (73) with transition semigroup and assume that where satisfies
(98) |
for a matrix and a real number . Assume also that is a constant with right inverse . Then, for all and with bounded first and second derivatives, satisfies
(99) |
where .
Proof.
First, observe that under the assumptions of the lemma, the Itô diffusion has a Wasserstein decay rate of . Thus, we can invoke Lemma 15 directly to obtain that is twice continuously differentiable. The remainder of the proof follows that of Lemma 15 in [30], with the substitution of our form of and , and using Lemma 4.4 instead of Lemma 16 in [30] for bounds on the derivative flows. ∎
In our special case, the bound on from Lemma 20 in [30] can be replaced with the following lemma.
Lemma 4.6.
Consider an Itô diffusion given by (73) with transition semigroup and assume that where satisfies
(100) |
for a matrix and a real number . Assume also that is a constant with right inverse . Then, for all and with bounded second and third derivatives, satisfies
(101) |
where .
Proof.
We now finish our proof of Lemma 4.3. Observe that in this case we have and , where has a right inverse . Let . For solving , we have by the Dominated Convergence Theorem and Jensen's inequality that
(102) |
Splitting the last integral into the intervals and , we have
(103) |
where
(104) |
and
(105) |
Applying the seminorm interpolation result of Lemma 19 in [30], we have
(106) | ||||
(107) |
Under the assumptions of the lemma, we can apply Lemmas 4.5 and 4.6. For we set to obtain a bound on as follows.
(108) | ||||
(109) | ||||
(110) | ||||
(111) | ||||
(112) | ||||
(113) |
where we have used the observations that and . Turning to , for we set in Lemmas 4.5 and 4.6, obtaining
(114) | ||||
(115) | ||||
(116) | ||||
(117) | ||||
(118) | ||||
(119) |
Finally, by the definition of we have
(120) |
which proves the desired bound when combined with the bounds on and derived above.
4.3 Proof of Lemma 4.2
Proof.
We have that
(121) |
We will show that
(122) |
Our proof is based on mollifying . Let be the bump function defined by
(123) |
where . Suppose for contradiction that
(124) |
then, there exists such that , and
(125) |
Let . We will show that , and . Let . Recall that
(126) |
where is the Lebesgue measure. Observe that
(127) | ||||
(128) | ||||
(129) | ||||
(130) | ||||
(131) |
and thus . Next, let with denote , and observe that by the Dominated Convergence Theorem,
(132) |
showing that , and furthermore that is bounded. We have shown that . Now, consider
(133) | ||||
(134) | ||||
(135) | ||||
(136) | ||||
(137) |
where we have used the fact that for all , and the fact that . We have that
(138) | ||||
(139) | ||||
(140) | ||||
(141) |
where the second to last inequality follows from Jensen’s Inequality and the last follows from the fact that . Now, we have from (125) and (138) that
(142) |
and thus
(143) |
which cannot be true since . Thus, by contradiction,
(144) |
which completes the proof. ∎
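The mollification step in this proof can be seen numerically: convolving a 1-Lipschitz function with a normalized bump supported on (−ε, ε) changes it by at most ε in the uniform norm and does not increase the Lipschitz constant. A discrete sketch (f = |x| is our test function, and the quadrature is a plain weighted sum with weights from the standard bump):

```python
import numpy as np

def mollify(f, eps, xs, n=2001):
    """Convolve f with a normalized C-infinity bump supported on (-eps, eps)."""
    u = np.linspace(-eps, eps, n)[1:-1]        # open support avoids 0/0
    w = np.exp(-1.0 / (1.0 - (u / eps) ** 2))  # classical bump profile
    w /= w.sum()                               # discrete normalization
    return np.array([np.sum(f(x - u) * w) for x in xs])

f = np.abs                                 # a 1-Lipschitz test function
xs = np.linspace(-2.0, 2.0, 401)
smooth = mollify(f, eps=0.1, xs=xs)

unif_err = np.max(np.abs(smooth - f(xs)))  # bounded by eps
dx = xs[1] - xs[0]
lip_est = np.max(np.abs(np.diff(smooth)) / dx)  # at most 1
```

Since each mollified value is a convex combination of shifts of f, the smoothed function inherits the Lipschitz constant exactly, and the pointwise error is bounded by the first absolute moment of the bump, which is less than ε.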
4.4 Proof of Theorem 3.2
We first give a lemma that uses a Foster-Lyapunov criterion [49] to conclude that Condition 3.2 holds.
Lemma 4.7.
Proof.
We begin by showing that for all , has a unique stationary distribution and is nonexplosive. If is finite, then since is irreducible by Assumption 2.3, has a unique stationary distribution and is nonexplosive. Otherwise, observe that is radially unbounded due to the assumption that . Additionally, by using the bound , we have that
(146) |
Thus, by Theorem 7.1 in [49], is exponentially ergodic, and thus has a unique stationary distribution and is nonexplosive. Now we show the desired moment bounds. We have that for all ,
(147) |
By Theorem 4.3 in [49], we therefore have that , which implies that
(148) |
By Jensen’s inequality, . Thus, the SCRN satisfies Condition 3.2. ∎
Building on Lemma 4.7, we now prove a lemma that relates the existence of a quadratic-type Lyapunov function for the RRE to Conditions 3.1 and 3.2, by using the Lyapunov function as a Foster-Lyapunov function for .
Lemma 4.8.
Consider an SCRN satisfying Assumptions 2.1, 2.2, and 2.3, and let be an equilibrium point of (12a). Suppose that there exists an open set and a Lyapunov function for the RRE, twice continuously differentiable, such that for all , and
(149) |
and additionally, there exists such that . Then, Condition 3.2 is satisfied.
Proof.
Let , and for let . Consider
(150) |
If we take the Taylor expansion in using the Lagrange form of the remainder we obtain
(151)
where depends on . Thus, for sufficiently large we obtain
(152)
for some . Here we have used the fact that by Assumption 2.2, there exists such that . Hence, by Lemma 4.7, Condition 3.2 is satisfied. ∎
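The Taylor-expansion step above can be summarized in generic notation (the symbols ζ_k, λ_k, f, Ω, and V below are generic stand-ins for the corresponding objects, not new quantities): for a twice continuously differentiable test function V, the generator of the scaled process satisfies

```latex
\mathcal{A}_{\Omega} V(x)
  = \sum_{k} \Omega\,\lambda_{k}(x)\Bigl[V\bigl(x + \tfrac{\zeta_{k}}{\Omega}\bigr) - V(x)\Bigr]
  = \nabla V(x)^{\top} f(x)
    + \frac{1}{2\Omega} \sum_{k} \lambda_{k}(x)\,
      \zeta_{k}^{\top}\,\nabla^{2}V(\xi_{k})\,\zeta_{k},
\qquad
f(x) = \sum_{k} \lambda_{k}(x)\,\zeta_{k},
```

where each ξ_k lies on the segment between x and x + ζ_k/Ω (Lagrange form of the remainder). When the first-order term inherits a negative-definite bound from the RRE Lyapunov function and the Hessian of V is bounded, the remainder term is of order 1/Ω, which is the origin of the drift inequality for sufficiently large Ω.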
We now prove Theorem 3.2. We note that we only need to verify Condition 3.1 for , since we are assuming that the Hessian of is bounded.
Proof.
We begin by checking that Condition 3.1 holds. Let us consider the scaled Markov chain and the function . We have that
(153)–(155)
For , we have
(156)–(157)
Noting that , we have
(158)–(159)
Thus, for sufficiently large , and such that , we have that . Therefore,
(160)
which shows by Theorem 7.1 in [49] that has a unique stationary distribution, and by Theorem 4.3 in [49] that , and thus has a unique stationary distribution and . Therefore, Condition 3.1 is satisfied. We now verify that Condition 3.2 holds. We will need the following lemma, which is a slight variation on the standard converse Lyapunov theorem; see, e.g., [37].
Lemma 4.9.
Let be closed and let be an open set such that . Let be a function defining a vector field for which is positively invariant. Suppose that for all , the solution to
(161)
, exists uniquely on and that there exists and such that for all and all ,
(162)
and furthermore, there exist , such that for all ,
(163)
Then, for each , there exists twice continuously differentiable and such that for all ,
(164)–(165)
Proof.
The proof that there exists satisfying the quadratic bounds proceeds analogously to the proof of Theorem 4.14 in [37], with the only adjustment being that we use as the domain instead of a ball around the origin. That is ensured by the form
(166)
where is the solution to with , and by and, e.g., Theorem 8.43 in [64]. ∎
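For reference, the classical construction behind Lemma 4.9 can be sketched in generic notation (a sketch of the standard argument in the spirit of Theorem 4.14 of [37], with the equilibrium shifted to the origin; the symbols φ, δ, M, λ, L below are generic stand-ins):

```latex
V(x) \;=\; \int_{0}^{\delta} \phi(t;x)^{\top}\phi(t;x)\,\mathrm{d}t ,
\qquad
\dot V(x) \;=\; |\phi(\delta;x)|^{2} - |x|^{2}
          \;\le\; -\bigl(1 - M^{2}e^{-2\lambda\delta}\bigr)\,|x|^{2},
```

where the derivative bound uses the assumed exponential estimate |φ(t;x)| ≤ M e^{-λt}|x| and δ is chosen so that M²e^{-2λδ} < 1. Combining the same estimate with the Lipschitz lower bound |φ(t;x)| ≥ e^{-Lt}|x|, for L a Lipschitz constant of the vector field, yields the quadratic sandwich

```latex
\frac{1-e^{-2L\delta}}{2L}\,|x|^{2}
\;\le\; V(x) \;\le\;
\frac{M^{2}\bigl(1-e^{-2\lambda\delta}\bigr)}{2\lambda}\,|x|^{2}.
```

Smoothness of V then follows from smooth dependence of φ on initial conditions.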
We now use Lemma 4.9 to construct a Lyapunov function . Recall that , , , and are assumed to exist by the assumptions of Theorem 3.2. We apply Lemma 4.9 to with , , , , , and , picking to obtain a function with associated values , , and . Let . In order to create a Lyapunov function on all of , we define as
(167)
We construct a global Lyapunov function by merging near the origin with far from the origin. We must do this in a way that creates a function with bounded second derivative so that we can apply Lemma 4.8. Our approach is similar to a technique used to prove Theorem B.1 in [44]. To this end, let be defined by
(168)
We have that for all , , and hence in this region,
(169)
Observe that due to the quadratic upper bound on , there exists such that for all , , and thus in this region
(170)–(173)
where we have used that in the last line. Hence, there exists such that for all ,
(174)–(175)
Let be such that and . Let , , and . Now consider . Let be the bump function defined by
(176)
where , and let , where we treat for all . We have that
(177)
where is the Lebesgue measure on . For , we have that
(178)
and for we have
(179)
where is the Lipschitz constant of on , and . By the Dominated Convergence Theorem and the fact that almost everywhere,
(180)
For sufficiently small , there exists such that for all and such that is differentiable at ,
(181)
Thus, for , we have
(182)
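The mollification step above follows the standard pattern, which in generic notation (the symbols η, W, ε, n below are standard stand-ins, not the paper's specific objects) reads:

```latex
\eta(x) =
\begin{cases}
  C \exp\!\Bigl(\dfrac{1}{|x|^{2}-1}\Bigr), & |x| < 1,\\[4pt]
  0, & |x| \ge 1,
\end{cases}
\qquad
\eta_{\epsilon}(x) = \epsilon^{-n}\,\eta(x/\epsilon),
\qquad
W_{\epsilon} = W * \eta_{\epsilon},
```

with C chosen so that η integrates to one. For W locally Lipschitz, W_ε is smooth, W_ε converges to W uniformly on compact sets as ε → 0, and the gradient of W_ε is the convolution of η_ε with the gradient of W, which exists almost everywhere by Rademacher's theorem; these are the properties invoked in (177)–(182).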
Now, let for be a smooth partition of unity subordinate to , and let
(183)
We have that
(184)–(185)
where we have used the fact that .
Let us consider . In this region, , , and . Thus,
(186)
Now, let us consider . In this region, , , and , implying that
(187)
By e.g. Lemma B.2 in [44], uniformly on the compact set . Additionally, observe that , and thus for sufficiently small , for all ,
(188)
Now, let us consider . In this region, and , and thus
(189)
By noting that is compact, and again using Lemma B.2 from [44] and the fact that , we have that for sufficiently small ,
(190)
Let us consider . In this region, and . Thus,
(191)
Again using Lemma B.2 from [44], and the fact that , we have that for sufficiently small ε,
(192)
Finally, consider . In this region, , , and . Thus,
(193)
Combining the preceding results for each subset of , we have that for sufficiently small , for all ,
(194)
Observe that
(195)
where exists by the smoothness of and the compactness of . Therefore, for some sufficiently small , we can apply Lemma 4.8 with to conclude that the SCRN satisfies Condition 3.2. ∎
5 Examples
5.1 Antithetic motif
The antithetic feedback motif is a common SCRN encountered in synthetic biology because it can approximately implement an integrator [53, 12]. The antithetic motif is given by the following set of chemical reactions:
(196)
The reaction vectors and propensities are:
(197)–(201)
This SCRN satisfies Assumptions 2.1, 2.2, and 2.3, with . We show that we can apply Theorem 3.2. Observe that as shown in [9], the RREs
(203)–(204)
have a unique, globally exponentially stable equilibrium point in , with . Letting , we have that , and thus there exists such that for all , . Therefore, by Theorem 3.2, there exists such that for all ,
(205)
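As a quick sanity check on the equilibrium structure, the following minimal sketch (not the paper's exact model: the ODEs and the rate names mu, theta, eta, k, gamma follow the standard antithetic integral feedback motif of [12] and are illustrative choices) verifies the closed-form equilibrium of the RREs directly. The annihilation balance pins the output at the set point mu/theta, which is the integral action property of the motif.

```python
# Illustrative sketch (not the paper's exact model or rates): equilibrium of
# the RREs for a standard antithetic feedback motif, cf. [12].  The rate
# names mu, theta, eta, k, gamma are hypothetical.
#   dz1/dt = mu - eta*z1*z2         (constitutive production of z1, annihilation)
#   dz2/dt = theta*x - eta*z1*z2    (x-dependent production of z2, annihilation)
#   dx/dt  = k*z1 - gamma*x         (actuation and degradation/dilution)

def antithetic_equilibrium(mu, theta, eta, k, gamma):
    """Closed-form equilibrium: the balance mu = eta*z1*z2 = theta*x
    forces the output to the set point x = mu/theta."""
    x = mu / theta
    z1 = gamma * x / k
    z2 = mu / (eta * z1)
    return z1, z2, x

def rre_rhs(z1, z2, x, mu, theta, eta, k, gamma):
    """Right-hand side of the three RREs above."""
    return (mu - eta * z1 * z2,
            theta * x - eta * z1 * z2,
            k * z1 - gamma * x)

params = dict(mu=2.0, theta=1.0, eta=50.0, k=1.0, gamma=1.0)
z1, z2, x = antithetic_equilibrium(**params)
res = rre_rhs(z1, z2, x, **params)
print(x, max(abs(r) for r in res) < 1e-12)  # 2.0 True
```

Note that the equilibrium output depends only on mu and theta, not on eta, k, or gamma, reflecting the robustness property that motivates the motif.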
5.2 Transcriptional activation
We consider a simple model of a protein that is transcriptionally regulated by binding to the promoter to form complex [14]:
(206)–(208)
This SCRN has four species; however, we can use the two conservation laws and to obtain the following reduced model:
(209)–(210)
with reactions
(211)–(214)
resulting in the RREs
(216)–(217)
If we set and , with , the SCRN satisfies Assumptions 2.1 and 2.2 with . Additionally, the SCRN satisfies Assumption 2.3. The RREs have a unique, globally exponentially stable equilibrium point in , and letting , we have , and so there exists such that for all , . Therefore, by Theorem 3.2, there exists such that for all ,
(218)
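For a concrete (hypothetical) instance of such a reduced production-degradation-binding model, the sketch below computes the RRE equilibrium in closed form; the ODEs and the rate names k, gamma, a, d, p_tot are illustrative choices, not the paper's (216)–(217). At equilibrium the binding flux cancels from the protein equation, so the free protein level is set by production and degradation alone, while the complex follows a Michaelis-Menten-type binding curve.

```python
# Illustrative sketch (hypothetical rates, not the paper's reduced model):
# equilibrium of a standard production-degradation-binding RRE with
# k (production), gamma (degradation), a/d (binding/unbinding), and
# p_tot (total promoter, fixed by a conservation law):
#   dx/dt = k - gamma*x - a*x*(p_tot - c) + d*c
#   dc/dt = a*x*(p_tot - c) - d*c

def transcriptional_equilibrium(k, gamma, a, d, p_tot):
    """At equilibrium the binding flux vanishes, so production balances
    degradation (x* = k/gamma) and the complex sits on a binding curve."""
    x = k / gamma
    Kd = d / a                    # dissociation constant
    c = p_tot * x / (x + Kd)      # promoter occupancy at equilibrium
    return x, c

def rhs(x, c, k, gamma, a, d, p_tot):
    """Right-hand side of the two RREs above."""
    bind = a * x * (p_tot - c) - d * c
    return k - gamma * x - bind, bind

params = dict(k=4.0, gamma=2.0, a=1.0, d=2.0, p_tot=10.0)
x, c = transcriptional_equilibrium(**params)
print(x, c, rhs(x, c, **params))  # 2.0 5.0 (0.0, 0.0)
```

This mirrors the structure used in the example: a unique equilibrium in the positive orthant whose stability can then be checked to apply Theorem 3.2.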
6 Conclusion
In this work we studied the relationship between the stationary distributions of appropriately scaled Markov chains corresponding to SCRNs and the steady-state behavior of the RRE and LNA models that are commonly used as approximations. Using Stein's Method, we obtained a bound on the 1-Wasserstein distance between and , the fluctuation term of the LNA, that is proportional to , and a bound on the 1-Wasserstein distance between and that is proportional to . Our main result requires one to control the second moment of , and we presented a Foster-Lyapunov based method to do so by analyzing just the RREs. Our condition requires global exponential stability of in the RRE, along with the existence of a linear function with exponential decay far from the origin. While proving the global exponential stability of an equilibrium point of the RRE is in and of itself a challenging task, a large body of literature on the topic is available, since the RRE have been the subject of intense study over the years. In fact, recent advances on the global stability of equilibria of the RRE can potentially be leveraged via our results to analyze the error in the RRE and LNA for large volume [2, 1]. The authors foresee this work applying to model reduction via timescale separation of SCRNs and to system identification algorithms, where the computational simplicity of the LNA makes it an attractive model. In both cases, the fact that we can obtain non-asymptotic error bounds on the LNA is critical to analyzing the error in the reduced models and the identified system, respectively.
7 Acknowledgments
References
- [1] M Ali Al-Radhawi. Graphical characterizations of robust stability in biological interaction networks. Mathematics of Control, Signals, and Systems, pages 1–33, 2023.
- [2] Muhammad Ali Al-Radhawi and David Angeli. New approach to the stability of chemical reaction networks: Piecewise linear in rates Lyapunov functions. IEEE Transactions on Automatic Control, 61(1):76–89, 2015.
- [3] Uri Alon. An introduction to systems biology: design principles of biological circuits. Chapman and Hall/CRC, 2006.
- [4] David F Anderson, Gheorghe Craciun, and Thomas G Kurtz. Product-form stationary distributions for deficiency zero chemical reaction networks. Bulletin of mathematical biology, 72:1947–1970, 2010.
- [5] David F Anderson and Thomas G Kurtz. Stochastic analysis of biochemical systems, volume 674. Springer, 2015.
- [6] Adam Arkin, John Ross, and Harley H McAdams. Stochastic Kinetic Analysis of Developmental Pathway Bifurcation in Phage -Infected Escherichia coli Cells. Genetics, 149(4):1633–1648, 08 1998.
- [7] Andrew D Barbour. Stein's method and Poisson process convergence. Journal of Applied Probability, 25(A):175–184, 1988.
- [8] Andrew D Barbour. Stein’s method for diffusion approximations. Probability theory and related fields, 84(3):297–322, 1990.
- [9] Franco Blanchini and Elisa Franco. Structurally robust biological networks. BMC Systems Biology, 5(1):74, 2011.
- [10] Anton Braverman and JG Dai. Stein's method for steady-state diffusion approximations of M/Ph/n+M systems. Annals of Applied Probability, 27(1):550–581, 2017.
- [11] Anton Braverman, JG Dai, and Jiekun Feng. Stein's method for steady-state diffusion approximations: an introduction through the Erlang-A and Erlang-C models. Stochastic Systems, 6(2):301–366, 2017.
- [12] Corentin Briat, Ankit Gupta, and Mustafa Khammash. Antithetic integral feedback ensures robust perfect adaptation in noisy biomolecular networks. Cell Systems, 2(1):15–26, 2016.
- [13] Sourav Chatterjee. A short survey of Stein's method. In Proceedings of the International Congress of Mathematicians, pages 1–24, 2014.
- [14] D. Del Vecchio and R. M. Murray. Biomolecular Feedback Systems. Princeton University Press, 1st edition, 2014.
- [15] Charles Desoer and Hiromasa Haneda. The measure of a matrix as a tool to analyze computer algorithms for circuit analysis. IEEE Transactions on Circuit Theory, 19(5):480–486, 1972.
- [16] Joseph L Doob. Markoff chains—denumerable case. Transactions of the American Mathematical Society, 58:455–473, 1945.
- [17] Werner Ehm. Binomial approximation to the Poisson binomial distribution. Statistics & Probability Letters, 11(1):7–16, 1991.
- [18] Johan Elf and Måns Ehrenberg. Fast evaluation of fluctuations in biochemical networks with the linear noise approximation. Genome research, 13(11):2475–2484, 2003.
- [19] M. B. Elowitz and S. Leibler. A synthetic oscillatory network of transcriptional regulators. Nature, 403:335–338, 2000.
- [20] Michael B Elowitz, Arnold J Levine, Eric D Siggia, and Peter S Swain. Stochastic gene expression in a single cell. Science, 297(5584):1183–1186, 2002.
- [21] Stewart N Ethier and Thomas G Kurtz. Markov processes: characterization and convergence. John Wiley & Sons, 2009.
- [22] Paul Fearnhead, Vasilieos Giagos, and Chris Sherlock. Inference for reaction networks using the linear noise approximation. Biometrics, 70(2):457–466, 2014.
- [23] Bärbel Finkenstädt, Dan J Woodcock, Michal Komorowski, Claire V Harper, Julian RE Davis, Mike RH White, and David A Rand. Quantifying intrinsic and extrinsic noise in gene transcription using the linear noise approximation: An application to single cell data. The Annals of Applied Statistics, pages 1960–1982, 2013.
- [24] T.S. Gardner, C.R. Cantor, and J.J. Collins. Construction of a genetic toggle switch in Escherichia coli. Nature, 403:339–342, 2000.
- [25] Alison L Gibbs and Francis Edward Su. On choosing and bounding probability metrics. International statistical review, 70(3):419–435, 2002.
- [26] Daniel T Gillespie. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. Journal of computational physics, 22(4):403–434, 1976.
- [27] Daniel T Gillespie. A rigorous derivation of the chemical master equation. Physica A: Statistical Mechanics and its Applications, 188(1-3):404–425, 1992.
- [28] Daniel T. Gillespie. The chemical Langevin equation. The Journal of Chemical Physics, 113(1):297–306, 2000.
- [29] Peter W Glynn and Assaf Zeevi. Bounding stationary expectations of markov processes. In Markov processes and related topics: a Festschrift for Thomas G. Kurtz, pages 195–214. Institute of Mathematical Statistics, 2008.
- [30] Jackson Gorham, Andrew B Duncan, Sebastian J Vollmer, and Lester Mackey. Measuring sample quality with diffusions. The Annals of Applied Probability, 29(5):2884–2928, 2019.
- [31] Ramon Grima. Linear-noise approximation and the chemical master equation agree up to second-order moments for a class of chemical systems. Phys. Rev. E, 92:042124, Oct 2015.
- [32] Ankit Gupta, Corentin Briat, and Mustafa Khammash. A scalable computational framework for establishing long-term behavior of stochastic reaction networks. PLoS computational biology, 10(6):e1003669, 2014.
- [33] Itai Gurvich. Diffusion models and steady-state approximations for exponentially ergodic Markovian queues. Annals of Applied Probability, 24(6):2527–2559, 2014.
- [34] Dylan Hirsch, Theodore W Grunberg, and Domitilla Del Vecchio. Error bound for Hill-function approximations in a class of stochastic transcriptional network models. In Proceedings of the 62nd IEEE Conference on Decision and Control, Singapore, 2023. IEEE.
- [35] NG van Kampen. A power series expansion of the master equation. Canadian Journal of Physics, 39(4):551–567, 1961.
- [36] Hye-Won Kang and Thomas G Kurtz. Separation of time-scales and model reduction for stochastic reaction networks. The Annals of Applied Probability, 23(2):529–583, 2013.
- [37] Hassan K Khalil. Nonlinear Systems. Prentice Hall, Upper Saddle River, NJ, 3rd edition, 2002.
- [38] Michał Komorowski, Bärbel Finkenstädt, Claire V Harper, and David A Rand. Bayesian inference of biochemical kinetic parameters using the linear noise approximation. BMC bioinformatics, 10:1–10, 2009.
- [39] Thomas G Kurtz. The relationship between stochastic and deterministic models for chemical reactions. The Journal of Chemical Physics, 57(7):2976–2978, 1972.
- [40] Thomas G Kurtz. Limit theorems and diffusion approximations for density dependent markov chains. Stochastic Systems: Modeling, Identification and Optimization, I, pages 67–78, 1976.
- [41] Thomas G Kurtz. Strong approximation theorems for density dependent markov chains. Stochastic Processes and their Applications, 6(3):223–240, 1978.
- [42] Saul C Leite and Ruth J Williams. A constrained Langevin approximation for chemical reaction networks. The Annals of Applied Probability, 29(3):1541–1608, 2019.
- [43] Ioannis Lestas, Johan Paulsson, Nicholas E. Ross, and Glenn Vinnicombe. Noise in gene regulatory networks. IEEE Transactions on Automatic Control, 53(Special Issue):189–200, 2008.
- [44] Yuandan Lin, Eduardo D Sontag, and Yuan Wang. A smooth converse Lyapunov theorem for robust stability. SIAM Journal on Control and Optimization, 34(1):124–160, 1996.
- [45] Winfried Lohmiller and Jean-Jacques E Slotine. Nonlinear process control using contraction theory. AIChE journal, 46(3):588–596, 2000.
- [46] Harley H. McAdams and Adam Arkin. Stochastic mechanisms in gene expression. Proceedings of the National Academy of Sciences, 94(3):814–819, 1997.
- [47] Bence Mélykúti, Joao P Hespanha, and Mustafa Khammash. Equilibrium distributions of simple biochemical reaction systems for time-scale separation in stochastic reaction networks. Journal of The Royal Society Interface, 11(97):20140054, 2014.
- [48] X Flora Meng, Ania-Ariadna Baetica, Vipul Singhal, and Richard M Murray. Recursively constructing analytic expressions for equilibrium distributions of stochastic biochemical reaction networks. Journal of The Royal Society Interface, 14(130):20170157, 2017.
- [49] Sean P Meyn and Richard L Tweedie. Stability of Markovian processes II: Continuous-time processes and sampled chains. Advances in Applied Probability, 25(3):487–517, 1993.
- [50] Johan Paulsson. Summing up the noise in gene networks. Nature, 427(6973):415–418, 2004.
- [51] Johan Paulsson. Models of stochastic gene expression. Physics of life reviews, 2(2):157–175, 2005.
- [52] Johan Paulsson and Måns Ehrenberg. Random signal fluctuations can reduce random fluctuations in regulated components of chemical regulatory networks. Physical review letters, 84(23):5447, 2000.
- [53] Yili Qian and Domitilla Del Vecchio. Realizing integral control in living cells: how to overcome leaky integration due to dilution? Journal of The Royal Society Interface, 15(139), 2018.
- [54] David Schnoerr, Guido Sanguinetti, and Ramon Grima. The complex chemical Langevin equation. The Journal of Chemical Physics, 141(2), 2014.
- [55] Abhyudai Singh. Negative feedback through mrna provides the best control of gene-expression noise. IEEE transactions on nanobioscience, 10(3):194–200, 2011.
- [56] Abhyudai Singh and Joao P Hespanha. Optimal feedback strength for noise suppression in autoregulatory gene networks. Biophysical journal, 96(10):4013–4023, 2009.
- [57] Eduardo D Sontag and Abhyudai Singh. Exact moment dynamics for feedforward nonlinear chemical reaction networks. IEEE Life Sciences Letters, 1(2):26–29, 2015.
- [58] Charles Stein. Approximate Computation of Expectations. IMS Lecture Notes–Monograph Series, volume 7. Institute of Mathematical Statistics, Hayward, CA, 1986.
- [59] Charles Stein. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 2: Probability Theory, volume 6, pages 583–603. University of California Press, 1972.
- [60] Yi Tao, Xiudeng Zheng, and Yuehua Sun. Effect of feedback regulation on stochastic gene expression. Journal of theoretical biology, 247(4):827–836, 2007.
- [61] Tianhai Tian and Kevin Burrage. Stochastic models for regulatory networks of the genetic toggle switch. Proceedings of the National Academy of Sciences, 103(22):8372–8377, 2006.
- [62] Nicolaas Godfried Van Kampen. Stochastic processes in physics and chemistry, volume 1. Elsevier, 1992.
- [63] Cédric Villani. Optimal transport: old and new, volume 338. Springer, 2009.
- [64] Walter G. Kelley and Allan C. Peterson. The Theory of Differential Equations: Classical and Qualitative. Springer, 2nd edition, 2010.
- [65] Leor S. Weinberger, John C. Burnett, Jared E. Toettcher, Adam P. Arkin, and David V. Schaffer. Stochastic gene expression in a lentiviral positive-feedback loop: Hiv-1 tat fluctuations drive phenotypic diversity. Cell, 122(2):169–182, 2005.
- [66] Peter Whittle. Systems in stochastic equilibrium. John Wiley & Sons, Inc., 1986.
- [67] Darren J Wilkinson. Stochastic modelling for quantitative description of heterogeneous biological systems. Nature Reviews Genetics, 10(2):122–133, 2009.