
Moment stability of stochastic processes with applications to control systems

Arnab Ganguly Department of Mathematics, Louisiana State University, USA. [email protected]  and  Debasish Chatterjee Indian Institute of Technology Bombay, Systems & Control Engineering, India. [email protected]
Abstract.

We establish new conditions for obtaining uniform bounds on the moments of discrete-time stochastic processes. Our results require a weak negative-drift criterion along with a state-dependent restriction on the sizes of the one-step jumps of the processes. The state-dependent feature of the results makes them suitable for a large class of multiplicative-noise processes. Under the additional assumption of the Markov property, a new result on ergodicity is also proved. There are several applications to iterative systems, control systems, and other dynamical systems with state-dependent multiplicative noise, and we include illustrative examples to demonstrate the applicability of our results.

Key words and phrases:
moment bound, stability, ergodicity, Markov processes, control systems.
2010 Mathematics Subject Classification:
60J05, 60J20, 60F17, 93D05, 93E15
Research of A. Ganguly is supported in part by NSF DMS - 1855788 and Louisiana Board of Regents through the Board of Regents Support Fund (contract number: LEQSF(2016-19)-RD-A-04)

1. Introduction

The paper studies stability properties of a general class of discrete-time stochastic systems. Assessment of stability of dynamical systems is an important research area that has been studied extensively over the years. For example, in control theory a primary objective is to design suitable control policies that ensure appropriate stability properties (e.g., bounded variance) of the underlying controlled system. There are various notions of stability of a system. In mathematics, stability often refers to equilibrium stability, which, for deterministic dynamical systems, is mainly concerned with the qualitative behavior of trajectories of the system that start near the equilibrium point. For the stochastic counterpart, in the Markovian setting it usually involves the study of existence of invariant distributions and associated convergence and ergodic properties. A comprehensive source of results on different ergodicity properties for discrete-time Markov chains using Foster-Lyapunov functions is [15] (also see the references therein). Several extensions of such results have since been explored quite extensively in the literature (for example, see [17, 18]). Another important book in this area is [9], which uses expected occupation measures of the chain to identify conditions for stability.

The primary objective of the paper is to study moment stability, which concerns itself with uniform bounds on the moments of a general stochastic process X_{n} or, more generally, on expectations of the form \mathbb{E}(V(X_{n})) for a given function V. This is a bit different from the usual notions of stability in the Markovian setting mentioned in the previous paragraph, but they are not unrelated. Indeed, if the process \{X_{n}\} has a certain underlying Lyapunov structure, a strong form of Markovian stability holds which in particular implies moment stability. The result, which is based on the Foster-Lyapunov criterion, can be described as follows. Given a Markov chain \{X_{n}\}_{n\in\mathbb{N}} taking values in a Polish space \mathcal{S} with a transition probability kernel \mathcal{P}, suppose there exists a non-negative measurable function u:\mathcal{S}\rightarrow[0,\infty), called a Foster-Lyapunov function, such that the process \{u(X_{n})\}_{n\in\mathbb{N}} satisfies the following negative drift condition: for some constants b\geqslant 0, \theta>0, a set A\subset\mathcal{S}, and a function V:\mathcal{S}\rightarrow[0,\infty),

(1.1) \mathbb{E}\left[u(X_{n+1})-u(X_{n})\,|\,X_{n}=x\right]\equiv\int_{\mathcal{S}}\mathcal{P}(x,dy)u(y)-u(x)\leqslant-\theta V(x)+b\mathds{1}_{\{x\in A\}}.

If the set A is petite (roughly speaking, a petite set has the property that every set B is 'equally accessible' from any point inside it; for the definition and more details, see [16, 15]), the process \{X_{n}\} has a unique invariant distribution \pi, and moreover \pi(V)=\int_{\mathcal{S}}\pi(dx)V(x)<\infty. Furthermore, under aperiodicity, it can be concluded that the chain is Harris ergodic, that is,

\|\mathcal{P}^{n}(x,\cdot)-\pi\|_{V}\rightarrow 0,\quad\text{ as }n\rightarrow\infty,

where \|\cdot\|_{V} is the V-norm (see the definition at the end of the introduction) [15, Chapter 14]. In particular, one has \mathbb{E}[V(X_{n})]\rightarrow\pi(V) as n\rightarrow\infty (which of course implies boundedness of \mathbb{E}[V(X_{n})]). Thus for a Markov process \{X_{n}\}, one way to get a uniform bound on \mathbb{E}[V(X_{n})] is to find a Foster-Lyapunov function u such that (1.1) holds.
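As a quick numerical illustration (our own construction, not an example from the paper), consider the chain X_{n+1}=X_{n}/2+\xi_{n+1} with standard Gaussian noise and u(x)=V(x)=x^{2}. A Monte Carlo estimate of the left-hand side of (1.1) recovers the analytic drift -(3/4)x^{2}+1, so (1.1) holds with \theta=1/2, A=\{|x|\leqslant 2\}, and b=1:

```python
import numpy as np

rng = np.random.default_rng(0)

def drift(x: float, n_samples: int = 200_000) -> float:
    """Monte Carlo estimate of E[u(X_{n+1}) - u(X_n) | X_n = x] for u(x) = x^2."""
    xi = rng.standard_normal(n_samples)
    x_next = x / 2 + xi
    return float(np.mean(x_next**2)) - x**2

# Analytically the drift is -(3/4)x^2 + 1, so (1.1) holds with theta = 1/2,
# A = {|x| <= 2}, b = 1; check the inequality at a few points outside A.
for x in (3.0, 5.0, 10.0):
    assert drift(x) <= -0.5 * x**2 + 0.2
```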

The objective of the first part of the paper is to explore scenarios where a strong negative drift condition like (1.1) does not hold, or at least where such a Lyapunov function is not easy to find for a specific V. We note that the required conditions in our results are formulated in terms of the target function V itself. One pleasing aspect of this feature is that a search for a suitable Lyapunov function u is not required for applying these results.

Our main result, Theorem 2.2, deals with the general regime where the state process \{X_{n}\} is a general stochastic process and not necessarily Markovian. While past studies on stability mostly concern homogeneous Markov processes, the literature on more general processes, including non-homogeneous Markov processes and processes with long-range dependence, is rather limited. The starting point in Theorem 2.2 is a weaker negative-drift-like condition:

(1.2) \mathbb{E}\left(V(X_{n+1})-V(X_{n})\,|\,\mathcal{F}_{n}\right)\leqslant-A,\quad X_{n}\notin\mathcal{D},

which, if X_{n} is a homogeneous Markov chain, is of course equivalent to \mathcal{P}V(x)-V(x)\leqslant-A for x outside \mathcal{D}. As can be seen by comparing (1.2) with (1.1), even in the Markovian setting the results of [15, Chapter 14] will not imply \sup_{n}\mathbb{E}(V(X_{n}))<\infty. In fact, condition (1.2) is not enough to guarantee such an assertion even in a deterministic setting. For example, consider the sequence \{x_{n}\} on \mathbb{N} defined by

x_{n+1}=\begin{cases}x_{n}-1,&\text{ if }x_{n}>1,\\ n+1,&\text{ if }x_{n}=1.\end{cases}

Clearly, \sup_{n\geqslant 1}x_{n}=\infty, even though the negative drift condition is satisfied for \mathcal{D}=\{1\}. But we show in Theorem 2.2 that under a state-dependent restriction on the conditional moments of V(X_{n+1}) given \mathcal{F}_{n} (see Assumption 2.1 for details), the desired uniform moment bound can be achieved. Note that the above sequence \{x_{n}\} fails (2.1-c) of Assumption 2.1 but satisfies the other two conditions.
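The counterexample can be checked in a few lines (an illustrative script, not part of the paper):

```python
# The deterministic counterexample: unit negative drift outside D = {1},
# yet the jumps out of D grow with n, so sup_n x_n = infinity.
def step(x: int, n: int) -> int:
    return x - 1 if x > 1 else n + 1

x, peak = 1, 1
for n in range(1, 10_000):
    x = step(x, n)
    peak = max(peak, x)

print(peak)  # the running maximum keeps growing with the horizon
```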

In the (homogeneous) Markovian framework, Theorem 2.2 leads to a new result (cf. Theorem 2.8) on Harris ergodicity of Markov chains, which will be useful on occasions when a Foster-Lyapunov drift criterion in the form of (1.1) does not hold. Importantly, Theorem 2.8 does not require \mathcal{D} to be petite or prior checking of aperiodicity of the chain.

Theorem 2.2 is partly influenced by a result of Pemantle and Rosenthal [21], which established a uniform bound on \mathbb{E}(V^{r}(X_{n})) under (1.2) and the additional assumption of a constant bound on the conditional p-th moment of the one-step jumps of the process given \mathcal{F}_{n}, that is, on \mathbb{E}\left[|V(X_{n+1})-V(X_{n})|^{p}\,|\,\mathcal{F}_{n}\right]. However, for a large class of stochastic systems the latter requirement of a uniform bound on conditional moments of jump sizes cannot be fulfilled. In particular, our work is motivated by some problems on stability of a class of stochastic systems with multiplicative noise, where such conditions on one-step jumps are almost always state-dependent and can never be bounded by a constant. Our work generalizes the result of [21] in two important directions: it uses a different “metric” to control the one-step jumps, and it allows such jumps to be bounded by a suitable state-dependent function. Specifically, instead of \mathbb{E}\left[|V(X_{n+1})-V(X_{n})|^{p}\,|\,\mathcal{F}_{n}\right], we control the centered conditional p-th moment of V(X_{n+1}), that is, \mathbb{E}\left[\left|V(X_{n+1})-\mathbb{E}(V(X_{n+1})\,|\,\mathcal{F}_{n})\right|^{p}\,\Big|\,\mathcal{F}_{n}\right], in a state-dependent way. The latter quantity can be viewed as a distance between the actual position at time n+1, V(X_{n+1}), and the expected position at that time given the past information, \mathbb{E}(V(X_{n+1})\,|\,\mathcal{F}_{n}), while [21] uses the distance between the actual positions at times n+1 and n. These extensions require a different approach involving different auxiliary estimates. The advantages of this new ‘jump metric’ and the state-dependency feature are discussed in detail after the proof of Theorem 2.2. Together, they significantly increase the applicability of our result to a large class of stochastic systems.

This is demonstrated in Section 3, where a broad class of systems with multiplicative noise is studied and new stability results (see Proposition 3.2 and Corollary 3.4) are obtained. This, in particular, includes stochastic switching systems and Markov processes of the form X_{n+1}=H(X_{n})+G(X_{n})\xi_{n+1}. The last part of this section is devoted to the important problem of stabilization of stochastic linear systems with bounded control inputs. The problem of interest here is to find conditions which guarantee L^{2}-boundedness of a stochastic linear system of the form X_{n+1}=AX_{n}+Bu_{n}+\xi_{n+1} with bounded control input. This has been studied in a previous work of the second author (see [24] and references therein for more background on the problem), where it was shown that when (A,B) is stabilizable, there exists a k-history-dependent control policy which ensures bounded variance of such a system provided the norm of the control is sufficiently large. This requirement of a sufficiently large control norm is an artificial restriction on the design, and it was conjectured in [24] that it is not required, although a proof could not be provided. Here we show that this conjecture is indeed true (cf. Proposition 3.7), and the artificial restriction on the control norm can be lifted, largely owing to the new “metric” in Theorem 2.2. In fact, as Proposition 3.2 and Corollary 3.4 indicate, this stabilization result can easily be extended to cover more general classes of stochastic control systems, including those with multiplicative noise.
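As a rough illustration of the kind of boundedness at stake (a toy simulation with a memoryless saturated feedback of our own choosing, not the k-history policy of [24]), consider the scalar marginally stable system X_{n+1}=X_{n}+u_{n}+\xi_{n+1} with a clipped control and small Gaussian noise; the empirical second moment stays bounded:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy saturated-feedback simulation (memoryless policy, chosen here only for
# illustration): X_{n+1} = X_n + u_n + xi_{n+1}, u_n = -clip(X_n, -1, 1),
# xi ~ N(0, 0.25).  The empirical second moment settles without blow-up.
n_paths, n_steps = 50_000, 500
x = np.zeros(n_paths)
second_moments = []
for _ in range(n_steps):
    u = -np.clip(x, -1.0, 1.0)
    x = x + u + 0.5 * rng.standard_normal(n_paths)
    second_moments.append(float(np.mean(x**2)))

assert max(second_moments[100:]) < 5.0   # no blow-up after burn-in
```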

The article is organized as follows. The mathematical framework and the main results are described in Section 2. Section 3 discusses potential applications of our results to a large class of stochastic systems, including switching systems and multiplicative Markov models, which are especially relevant to control theory.

Notation and terminology: For a probability kernel P on \mathcal{S}\times\mathcal{S} and a function f:\mathcal{S}\rightarrow[0,\infty), the function Pf:\mathcal{S}\rightarrow[0,\infty) is defined by Pf(x)=\int_{\mathcal{S}}f(y)P(x,dy). In a similar spirit, for a measure \mu on \mathcal{S}, \mu(f) is defined by \mu(f)=\int_{\mathcal{S}}f(x)\mu(dx). For a signed measure \mu on \mathcal{S}, the corresponding total variation measure is denoted by |\mu|=\mu^{+}+\mu^{-}, where \mu=\mu^{+}-\mu^{-} is the Jordan decomposition. If \mu=\nu_{1}-\nu_{2}, where \nu_{1} and \nu_{2} are probability measures, the total variation distance \|\nu_{1}-\nu_{2}\|_{TV} is given by

\|\nu_{1}-\nu_{2}\|_{TV}=|\mu|(\mathcal{S})=2\sup_{A\in\mathcal{B}(\mathcal{S})}|\nu_{1}(A)-\nu_{2}(A)|.

More generally, if g:\mathcal{S}\rightarrow[0,\infty) is a measurable function, the g-norm of \mu=\nu_{1}-\nu_{2} is defined by \|\mu\|_{g}=\sup\{|\mu(f)|:f\text{ measurable and }0\leqslant f\leqslant g\}.

Throughout, we will work on an abstract probability space (\Omega,\mathcal{F},\mathbb{P}), and \mathbb{E} will denote the expectation operator under \mathbb{P}. In the context of the process \{X_{n}\}, \mathbb{E}_{x} will denote the conditional expectation given X_{0}=x.

2. Mathematical framework and main results

This section presents two main results: Theorem 2.2, on uniform bounds for functions of a general stochastic process, and Theorem 2.8, on ergodicity properties in the homogeneous Markovian setting. The mathematical framework pertains to a stochastic process \{X_{n}\} taking values in a topological space \mathcal{S} and involves negative drift conditions outside a set \mathcal{D}, together with a state-dependent control on the size of the one-step jumps of \{X_{n}\}.

2.1. Uniform bounds for moments of stochastic processes

Assumption 2.1.

There exist a constant A>0, measurable functions V:\mathcal{S}\rightarrow[0,\infty) and \varphi:\mathcal{S}\rightarrow[0,\infty), and a set \mathcal{D}\subset\mathcal{S} such that

  1. (2.1-a)

    for all n\in\mathbb{N},

    \mathbb{E}_{x_{0}}[V(X_{n+1})-V(X_{n})\mid\mathcal{F}_{n}]\leqslant-A\ \text{ on }\ \{X_{n}\notin\mathcal{D}\};
  2. (2.1-b)

    for all n\in\mathbb{N} and some p>2, the centered conditional p-th moment \Xi_{n} of V(X_{n+1}) given \mathcal{F}_{n} satisfies

    \Xi_{n}\doteq\mathbb{E}_{x_{0}}\Big[|V(X_{n+1})-\mathbb{E}(V(X_{n+1})|\mathcal{F}_{n})|^{p}\,\Big|\,\mathcal{F}_{n}\Big]\leqslant\varphi(X_{n}),

    where \varphi(x)\leqslant\mathscr{C}_{\varphi}(1+V^{s}(x)) for some 0\leqslant s<p/2-1 and some constant \mathscr{C}_{\varphi}>0.

  3. (2.1-c)

    \sup_{x\in\mathcal{D}}V(x)<\infty, and, for all n\in\mathbb{N} and some constant \bar{\mathscr{B}}_{0}({x_{0}}),

    \mathbb{E}_{x_{0}}\left[\left(\mathbb{E}[V(X_{n+1})|\mathcal{F}_{n}]\right)^{p}\mathds{1}_{\{X_{n}\in\mathcal{D}\}}\right]<\bar{\mathscr{B}}_{0}({x_{0}}).
Theorem 2.2.

Suppose that Assumption 2.1 holds for the process \{X_{n}\} with X_{0}={x_{0}}. Then

\mathscr{B}_{r}({x_{0}})\doteq\sup_{n\in\mathbb{N}}\mathbb{E}_{x_{0}}\bigl[V(X_{n})^{r}\bigr]<\infty

for any 0\leqslant r<\varsigma(s,p), where

\varsigma(s,p)=\begin{cases}p\left(1-\frac{s}{p-2}\right)-1,&\text{ for }s\in[0,(p-2)^{2}/2p)\cup[1-2/p,\,p/2-1)\text{ when }2<p<4;\\&\text{ for all }s\in[0,p/2-1)\text{ when }p\geqslant 4;\\ p-2,&\text{ for }(p-2)^{2}/2p\leqslant s<1-2/p\text{ when }2<p<4.\end{cases}
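For quick reference, the exponent \varsigma(s,p) can be transcribed directly into code (a plain transcription of the cases above, added for the reader's convenience):

```python
def varsigma(s: float, p: float) -> float:
    """Transcription of the moment exponent from Theorem 2.2."""
    assert p > 2 and 0 <= s < p / 2 - 1
    # middle band for 2 < p < 4 gives the flat value p - 2
    if 2 < p < 4 and (p - 2) ** 2 / (2 * p) <= s < 1 - 2 / p:
        return p - 2
    # otherwise (including all of p >= 4) the linear-in-s branch applies
    return p * (1 - s / (p - 2)) - 1

assert varsigma(0.0, 4.0) == 3.0      # p >= 4: always the first branch
assert varsigma(0.2, 3.0) == 1.0      # middle band for 2 < p < 4
```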
Remark 2.3.
  • The proof is a combination of Proposition 2.5 and Proposition 2.6. Proposition 2.5 first establishes a weaker version of the above assertion by showing that \sup_{n\in\mathbb{N}}\mathbb{E}_{x_{0}}\bigl[V(X_{n})^{r}\bigr]<\infty for all r<p/2-1. However, the extension of the result from there to all r<\varsigma(s,p) (notice that \varsigma(s,p)\geqslant p/2-1) requires a substantial amount of extra work and is achieved through Proposition 2.6.

  • Note that (2.1-c) is implied by the simpler condition: \mathbb{E}_{x_{0}}[V(X_{n+1})|\mathcal{F}_{n}]\leqslant\bar{\mathscr{B}}_{0} on \{X_{n}\in\mathcal{D}\} for some constant \bar{\mathscr{B}}_{0}.

Proof of Theorem 2.2.

From Proposition 2.5 and the growth assumption on \varphi, it follows that for any 1\leqslant\theta<(p-2)/2s, \sup_{n}\|\Xi_{n}\|_{\theta}\leqslant\sup_{n}\left(\mathbb{E}_{x_{0}}(\varphi^{\theta}(X_{n}))\right)^{1/\theta}<\infty, where \|\cdot\|_{\theta} is the \mathcal{L}^{\theta}(\Omega,\mathbb{P})-norm (cf. Proposition 2.6). The result now follows from Proposition 2.6 by letting \theta\uparrow(p-2)/2s. If s=0, that is, \Xi_{n}\leqslant\mathscr{C} a.s. for some constant \mathscr{C}, we take \theta=\infty in Proposition 2.6. ∎

At this stage it is instructive to compare Theorem 2.2 with [21, Theorem 1] and note precisely some of the improvements the former offers. The first significant extension is that Theorem 2.2 allows the jump-size bound in (2.1-b) to be state-dependent, whereas [21] requires

(†) \mathbb{E}_{x_{0}}\left[|V(X_{n+1})-V(X_{n})|^{p}\,|\,\mathcal{F}_{n}\right]\leqslant B,

for some constant B>0. The resulting benefits are obvious, as this in particular allows the result to be applicable to the large class of multiplicative systems of the form

X_{n+1}=H(X_{n})+G(X_{n})\xi_{n+1},

which [21, Theorem 1] does not cover. The second notable distinction is in the ‘metric’ used in (2.1-b) for controlling jump sizes: while [21] involves \mathbb{E}\left[|V(X_{n+1})-V(X_{n})|^{p}\,|\,\mathcal{F}_{n}\right], our result only requires controlling the centered conditional p-th moment of V(X_{n+1}) given \mathcal{F}_{n}, namely, \mathbb{E}_{x}\left[\big|V(X_{n+1})-\mathbb{E}[V(X_{n+1})|\mathcal{F}_{n}]\big|^{p}\,\Big|\,\mathcal{F}_{n}\right]. Of course, the latter leads to a weaker hypothesis, as

\mathbb{E}_{x_{0}}\left[\big|V(X_{n+1})-\mathbb{E}[V(X_{n+1})|\mathcal{F}_{n}]\big|^{p}\,\Big|\,\mathcal{F}_{n}\right]\leqslant 2^{p}\,\mathbb{E}_{x_{0}}\left[|V(X_{n+1})-V(X_{n})|^{p}\,|\,\mathcal{F}_{n}\right].

It is important to emphasize the advantages of the weaker hypothesis, as condition (†) precludes [21, Theorem 1] from being applicable even to some additive models. To illustrate this with a simple example, consider a [0,\infty)-valued process \{X_{n}\} given by

X_{n+1}=X_{n}/2+\xi_{n+1},\quad X_{0}\geqslant 0,

where the \xi_{n} are [0,\infty)-valued random variables with \mu_{p}=\sup_{n}\mathbb{E}(\xi_{n}^{p})<\infty for some p>2. Since X_{n+1}-X_{n}=-X_{n}/2+\xi_{n+1}, the negative drift condition (cf. (2.1-a)) clearly holds with V(x)=|x|, but for the jump sizes we can only have

\mathbb{E}_{x_{0}}\left[|X_{n+1}-X_{n}|^{p}\,|\,\mathcal{F}_{n}\right]=O(X_{n}^{p}).

This means that [21, Theorem 1] cannot be used to get \sup_{n}\mathbb{E}_{x}(X_{n})<\infty for this simple additive system, a fact which easily follows from an elementary iteration argument (note that \mathbb{E}_{x}(X_{n})\rightarrow 2\mu_{1} as n\rightarrow\infty). On the other hand, our theorem clearly covers such cases, as

\mathbb{E}_{x_{0}}\left[|X_{n+1}-\mathbb{E}\left(X_{n+1}|\mathcal{F}_{n}\right)|^{p}\,\Big|\,\mathcal{F}_{n}\right]\leqslant\bar{\mu}_{p},\quad\bar{\mu}_{p}=\sup_{n}\mathbb{E}|\xi_{n}-\mathbb{E}(\xi_{n})|^{p}.

It should be noted that had Theorem 2.2 simply controlled the jump sizes by imposing the more restrictive condition \mathbb{E}\left[|X_{n+1}-X_{n}|^{p}\,|\,\mathcal{F}_{n}\right]\leqslant\varphi(X_{n}), the state-dependency feature alone would not have been enough to salvage the moment bound for the above additive system (because of the requirement \varphi(x)=O(V^{s}(x)) for s<p/2-1). It is interesting to note that the results of [15] based on Foster-Lyapunov drift conditions also cannot be used directly in this simple example, as \{X_{n}\} is not necessarily Markov (since the \xi_{n} are not assumed to be i.i.d.). To summarize, the weaker jump metric coupled with the state-dependency feature makes Theorem 2.2 a rather powerful tool for understanding stability of a broad class of stochastic systems. Some important results in this direction for switching systems are discussed in the applications section.
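As a sanity check of the discussion above (an illustrative simulation, not part of the paper), one can simulate the additive system X_{n+1}=X_{n}/2+\xi_{n+1} and observe \mathbb{E}(X_{n}) settling near 2\mu_{1}; here the \xi_{n} are taken i.i.d. exponential with mean \mu_{1}=1, an assumption made only for this simulation:

```python
import numpy as np

rng = np.random.default_rng(1)

# X_{n+1} = X_n/2 + xi_{n+1} with xi i.i.d. Exp(1) (so mu_1 = 1, all moments
# finite); the iteration argument gives E_x(X_n) -> 2*mu_1 = 2.
n_paths, n_steps = 100_000, 60
x = np.full(n_paths, 10.0)          # X_0 = 10
for _ in range(n_steps):
    x = x / 2 + rng.exponential(1.0, n_paths)

print(float(np.mean(x)))  # close to 2
```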

The following lemma will be used in various necessary estimates.

Lemma 2.4.

Let \{M_{n}\} be a martingale relative to the filtration \{\mathcal{F}_{n}\}, let

(2.3) \gamma_{n}\stackrel{\text{def}}{=}\mathbb{E}\bigl[\lvert M_{n+1}-M_{n}\rvert^{p}\,\big|\,\mathcal{F}_{n}\bigr],\quad n\geqslant 0,

let \Theta be a non-negative random variable, and let b>0 be a constant. Then for some constants \mathscr{C}_{0} and \mathscr{C}_{00}:

  1. (a)

    \mathbb{E}\left[|M_{n}-M_{k}|^{p}\,|\,\mathcal{F}_{k}\right]\leqslant\mathscr{C}_{0}(n-k)^{\frac{p}{2}-1}\sum_{m=k}^{n-1}\mathbb{E}[\gamma_{m}|\mathcal{F}_{k}];

  2. (b)

    for 0\leqslant r<p, \mathbb{E}\left[(|M_{n}-M_{k}|+\Theta)^{r}\mathds{1}_{\{|M_{n}-M_{k}|+\Theta>b\}}\,|\,\mathcal{F}_{k}\right]\leqslant\mathscr{C}_{00}\left((n-k)^{\frac{p}{2}-1}\sum_{m=k}^{n-1}\mathbb{E}\left[\gamma_{m}|\mathcal{F}_{k}\right]+\mathbb{E}\left[\Theta^{p}\,|\,\mathcal{F}_{k}\right]\right)b^{r-p}.

Proof.

Note that by Burkholder's inequality (see, e.g., [23]), there exists c_{p}>0 such that

\mathbb{E}\bigl[\lvert M_{n}-M_{k}\rvert^{p}\,\big|\,\mathcal{F}_{k}\bigr]\leqslant c_{p}\,\mathbb{E}\left[\biggl(\sum_{m=k}^{n-1}\lvert M_{m+1}-M_{m}\rvert^{2}\biggr)^{p/2}\,\bigg|\,\mathcal{F}_{k}\right].

Now by Hölder’s inequality and by (2.3)

\mathbb{E}\bigl[\lvert M_{n}-M_{k}\rvert^{p}\,\big|\,\mathcal{F}_{k}\bigr]\leqslant c_{p}(n-k)^{\frac{p}{2}-1}\sum_{m=k}^{n-1}\mathbb{E}\left[\lvert M_{m+1}-M_{m}\rvert^{p}\,|\,\mathcal{F}_{k}\right]
(2.4) \leqslant c_{p}(n-k)^{\frac{p}{2}-1}\sum_{m=k}^{n-1}\mathbb{E}\left[\gamma_{m}\,|\,\mathcal{F}_{k}\right].

Now observe that for a random variable Y_{n}, by Hölder's inequality and Markov's inequality (\mathbb{P}(|Y_{n}|>b\,|\,\mathcal{F}_{k})\leqslant\mathbb{E}\left[|Y_{n}|^{p}\,|\,\mathcal{F}_{k}\right]/b^{p}), we have for r<p and n\geqslant k

\mathbb{E}\left[|Y_{n}|^{r}\mathds{1}_{\{|Y_{n}|>b\}}\,|\,\mathcal{F}_{k}\right]\leqslant\mathbb{E}\left[|Y_{n}|^{p}\,|\,\mathcal{F}_{k}\right]^{r/p}\,\mathbb{P}(|Y_{n}|>b\,|\,\mathcal{F}_{k})^{(p-r)/p}\leqslant\mathbb{E}\left[|Y_{n}|^{p}\,|\,\mathcal{F}_{k}\right]/b^{p-r}.

Taking Y_{n}=|M_{n}-M_{k}|+\Theta, we have

\mathbb{E}\left[(|M_{n}-M_{k}|+\Theta)^{r}\mathds{1}_{\{|M_{n}-M_{k}|+\Theta>b\}}\,|\,\mathcal{F}_{k}\right]\leqslant 2^{p-1}\left(\mathbb{E}\left[|M_{n}-M_{k}|^{p}\,|\,\mathcal{F}_{k}\right]+\mathbb{E}\left[\Theta^{p}\,|\,\mathcal{F}_{k}\right]\right)/b^{p-r},

and part (b) follows from (2.4). ∎
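The scaling in part (a) can be checked numerically on the simplest example (a sketch with our own choice of walk, not part of the proof): for a \pm 1 random walk, \gamma_{m}=1 and, for p=4, the exact fourth moment \mathbb{E}M_{n}^{4}=3n^{2}-2n is consistent with the n^{p/2} growth of the bound:

```python
import numpy as np

rng = np.random.default_rng(4)

# For a +/-1 random walk M_n one has gamma_m = 1, and for p = 4 the exact
# fourth moment is E M_n^4 = 3n^2 - 2n, matching the n^{p/2} scaling of (a).
n, p = 50, 4
steps = rng.choice([-1.0, 1.0], size=(200_000, n))
m_n = steps.sum(axis=1)
empirical = float(np.mean(np.abs(m_n) ** p))

assert abs(empirical - (3 * n**2 - 2 * n)) < 500   # Monte Carlo tolerance
```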

We now prove the two propositions which form the backbone of our main result, Theorem 2.2.

Proposition 2.5.

Suppose that Assumption 2.1 holds. Then for any 0\leqslant r<p/2-1,

\mathscr{B}_{r}({x_{0}})\doteq\sup_{n\in\mathbb{N}}\mathbb{E}_{x_{0}}\bigl[V(X_{n})^{r}\bigr]<\infty.
Proof of Proposition 2.5.

Fix an r\in(s,p/2-1); observe that it is enough to prove the result for such an r. Writing \varphi(x)=\varphi(x)\mathds{1}_{\{|V(x)|\leqslant M\}}+(\varphi(x)/V^{r}(x))V^{r}(x)\mathds{1}_{\{|V(x)|>M\}}, we can say, because of the growth assumption on \varphi (cf. (2.1-b)) and since s<r, that for every \varepsilon>0 there exists a constant \mathscr{C}_{1}(\varepsilon) such that \varphi(x)\leqslant\mathscr{C}_{1}(\varepsilon)+\varepsilon V^{r}(x).

The constants appearing in the various estimates below will be denoted by \mathscr{C}_{i}; they will not depend on n but may depend on the parameters of the system and the initial position x_{0}.

Define \mathscr{M}_{0}=0 and

\mathscr{M}_{n}=\sum_{j=0}^{n-1}\bigl(V(X_{j+1})-\mathbb{E}_{x_{0}}[V(X_{j+1})|\mathcal{F}_{j}]\bigr),\quad n\geqslant 1.

Then \mathscr{M}_{n} is a martingale. Fix N\in\mathbb{N}, and define the last time \{X_{k}\} is in \mathcal{D} up to time N:

\eta\equiv\max\{k\leqslant N\mid X_{k}\in\mathcal{D}\},\quad\text{with }\eta=-\infty\text{ if }X_{k}\notin\mathcal{D}\text{ for all }k\leqslant N.

Notice that \{\eta=k\}=\{X_{k}\in\mathcal{D}\}\cap\bigcap_{j=k+1}^{N}\{X_{j}\notin\mathcal{D}\}. On \{\eta=k\}, for k<n\leqslant N,

\mathscr{M}_{n}-\mathscr{M}_{k}= V(X_{n})-V(X_{k})-\sum_{j=k}^{n-1}\left(\mathbb{E}_{x_{0}}[V(X_{j+1})|\mathcal{F}_{j}]-V(X_{j})\right)
\geqslant V(X_{n})-V(X_{k})-\left(\mathbb{E}_{x_{0}}[V(X_{k+1})|\mathcal{F}_{k}]-V(X_{k})\right)+A(n-k-1)
(2.5) = V(X_{n})+A(n-k-1)-\mathbb{E}_{x_{0}}[V(X_{k+1})|\mathcal{F}_{k}].

It follows that on \{\eta=k\},

V(X_{N})^{r}\leqslant\left(|\mathscr{M}_{N}-\mathscr{M}_{k}|+\xi_{k}\right)^{r},\quad\text{and}\quad A(N-k-1)\leqslant|\mathscr{M}_{N}-\mathscr{M}_{k}|+\xi_{k},

where \xi_{k}=\mathbb{E}_{x_{0}}[V(X_{k+1})|\mathcal{F}_{k}]\mathds{1}_{\{X_{k}\in\mathcal{D}\}}.

On \{\eta=-\infty\}, which corresponds to the case that the chain, starting outside \mathcal{D}, never enters \mathcal{D} by time N, we have

V(X_{N})^{r}\leqslant\left(|\mathscr{M}_{N}-\mathscr{M}_{0}|+V({x_{0}})\right)^{r},\quad\text{and}\quad AN\leqslant|\mathscr{M}_{N}-\mathscr{M}_{0}|+V({x_{0}}).

Thus for k\leqslant N-2,

\mathbb{E}_{x_{0}}[V(X_{N})^{r}\mathds{1}_{\{\eta=k\}}]\leqslant \mathbb{E}_{x_{0}}\left[\left(|\mathscr{M}_{N}-\mathscr{M}_{k}|+\xi_{k}\right)^{r}\mathds{1}_{\{\eta=k\}}\right]
\leqslant \mathbb{E}_{x_{0}}\left[\left(|\mathscr{M}_{N}-\mathscr{M}_{k}|+\xi_{k}\right)^{r}\mathds{1}_{\{|\mathscr{M}_{N}-\mathscr{M}_{k}|+\xi_{k}\geqslant A(N-k-1)\}}\right]
\leqslant 2^{p-1}\left(c_{p}(N-k)^{\frac{p}{2}-1}\sum_{m=k}^{N-1}\mathbb{E}_{x_{0}}\left[\varphi(X_{m})\right]+\mathbb{E}_{x_{0}}\left[\xi_{k}^{p}\right]\right)/(A(N-k-1))^{p-r}
\leqslant \mathscr{C}_{2}\left((N-k)^{r-1-p/2}\sum_{m=k}^{N-1}\mathbb{E}_{x_{0}}\left[\varphi(X_{m})\right]+(N-k)^{r-p}\right),

where we used (a) (2.1-c), (b) Lemma 2.4 together with the observation that

\mathbb{E}_{x_{0}}\left[\lvert\mathscr{M}_{n+1}-\mathscr{M}_{n}\rvert^{p}\right]=\mathbb{E}_{x_{0}}\left[\lvert V(X_{n+1})-\mathbb{E}[V(X_{n+1})|\mathcal{F}_{n}]\rvert^{p}\right]\leqslant\mathbb{E}_{x_{0}}[\varphi(X_{n})]

and (c) the fact that \sup_{m\geqslant 2}m/(m-1)=2.

Similarly, on \{\eta=-\infty\},

\mathbb{E}_{x_{0}}[V(X_{N})^{r}\mathds{1}_{\{\eta=-\infty\}}]\leqslant \mathbb{E}_{x_{0}}\left[\left(|\mathscr{M}_{N}|+V({x_{0}})\right)^{r}\mathds{1}_{\{|\mathscr{M}_{N}|+V({x_{0}})\geqslant AN\}}\right]
\leqslant 2^{p-1}\left(c_{p}N^{\frac{p}{2}-1}\sum_{m=0}^{N-1}\mathbb{E}_{x_{0}}\left[\varphi(X_{m})\right]+V({x_{0}})^{p}\right)/N^{p-r}
\leqslant 2^{p-1}\left(c_{p}N^{r-1-\frac{p}{2}}\sum_{m=0}^{N-1}\mathbb{E}_{x_{0}}\left[\varphi(X_{m})\right]+V({x_{0}})^{p}N^{r-p}\right).

Next, note that because of (2.1-b),

\mathbb{E}_{x_{0}}[V^{p}(X_{N})|\mathcal{F}_{N-1}]\mathds{1}_{\{X_{N-1}\in\mathcal{D}\}}\leqslant 2^{p-1}\left(\left(\mathbb{E}_{x_{0}}[V(X_{N})|\mathcal{F}_{N-1}]\right)^{p}\mathds{1}_{\{X_{N-1}\in\mathcal{D}\}}+\sup_{x\in\mathcal{D}}\varphi(x)\right),

which by (2.1-c) of course implies that for any q\leqslant p,

\mathbb{E}_{x_{0}}[V(X_{N})^{q}\mathds{1}_{\{\eta=N-1\}}]\leqslant\mathbb{E}_{x_{0}}[V^{q}(X_{N})\mathds{1}_{\{X_{N-1}\in\mathcal{D}\}}]\leqslant\mathscr{C}_{3}.

Lastly,

\mathbb{E}_{x_{0}}[V(X_{N})^{r}\mathds{1}_{\{\eta=N\}}]\leqslant\mathbb{E}_{x_{0}}[V(X_{N})^{r}\mathds{1}_{\{X_{N}\in\mathcal{D}\}}]\leqslant\sup_{z\in\mathcal{D}}V^{r}(z).

Thus,

\mathbb{E}_{x_{0}}[V(X_{N})^{r}]= \sum_{k=0}^{N}\mathbb{E}_{x_{0}}[V(X_{N})^{r}\mathds{1}_{\{\eta=k\}}]+\mathbb{E}_{x_{0}}[V(X_{N})^{r}\mathds{1}_{\{\eta=-\infty\}}]
\leqslant \sum_{k=0}^{N-2}\mathbb{E}_{x_{0}}[V(X_{N})^{r}\mathds{1}_{\{\eta=k\}}]+\mathscr{C}_{3}+\sup_{z\in\mathcal{D}}V^{r}(z)+\mathbb{E}_{x_{0}}[V(X_{N})^{r}\mathds{1}_{\{\eta=-\infty\}}]
\leqslant \mathscr{C}_{4}(1+V^{p}({x_{0}}))\zeta(p-r)+\mathscr{C}_{4}\sum_{k=0}^{N-2}(N-k)^{r-1-p/2}\sum_{m=k}^{N-1}\mathbb{E}_{x_{0}}\left[\varphi(X_{m})\right]
\leqslant \mathscr{C}_{5}(1+V^{p}({x_{0}}))+\mathscr{C}_{4}\sum_{k=0}^{N-2}(N-k)^{r-1-p/2}\sum_{m=k}^{N-1}\mathbb{E}_{x_{0}}\left[\mathscr{C}_{1}(\varepsilon)+\varepsilon V^{r}(X_{m})\right]
\leqslant \mathscr{C}_{6}(\varepsilon)(1+V^{p}({x_{0}}))+\mathscr{C}_{4}\varepsilon\sum_{m=0}^{N-1}\beta^{N}_{m}\mathbb{E}_{x_{0}}\left[V^{r}(X_{m})\right],

where \zeta denotes the Riemann zeta function, the choice of \varepsilon will be specified shortly, and \beta^{N}_{m}=\sum_{k=0}^{m}(N-k)^{r-1-p/2}. Iterating, we have

(2.6) \mathbb{E}_{x_{0}}\bigl[V(X_{N})^{r}\bigr] \leqslant\mathscr{C}_{6}(\varepsilon)(1+V^{p}({x_{0}}))\biggl(1+\mathscr{C}_{4}\varepsilon\sum_{l_{1}=0}^{N-1}\beta^{N}_{l_{1}}+(\mathscr{C}_{4}\varepsilon)^{2}\sum_{l_{1}=0}^{N-1}\beta^{N}_{l_{1}}\sum_{l_{2}=0}^{l_{1}-1}\beta_{l_{2}}^{l_{1}}+\cdots
+(\mathscr{C}_{4}\varepsilon)^{N-1}\beta^{N}_{N-1}\beta^{N-1}_{N-2}\ldots\beta^{2}_{1}\beta^{1}_{0}\biggr)(1+V^{r}({x_{0}})).

Notice that for any k>0, since r<p/2-1,

\sum_{l=0}^{k-1}\beta^{k}_{l}=\sum_{l=0}^{k-1}\sum_{j=0}^{l}(k-j)^{r-1-p/2}=\sum_{j=0}^{k-1}(k-j)^{r-p/2}\leqslant\zeta(p/2-r).

Choosing \varepsilon so that \mathscr{C}_{4}\varepsilon\zeta(p/2-r)<1, (2.6) yields

\mathbb{E}_{x_{0}}\bigl[V(X_{N})^{r}\bigr]\leqslant\frac{\mathscr{C}_{6}(\varepsilon)(1+V^{p}({x_{0}}))(1+V^{r}({x_{0}}))}{1-\mathscr{C}_{4}\varepsilon\zeta(p/2-r)},

and the assertion follows.
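The partial-sum bound used in the last step can be sanity-checked numerically. The following Python sketch (illustrative only; the values of p, r and k are hypothetical choices satisfying r < p/2 - 1) compares the double sum of the beta's with a truncation of the Riemann zeta series:

```python
def beta_sum_vs_zeta(p=6.0, r=1.5, k=50):
    """Compare sum_{l=0}^{k-1} beta_l^k, where beta_l^k = sum_{j=0}^{l}
    (k - j)^(r - 1 - p/2), against a truncation of zeta(p/2 - r); the
    bound requires r < p/2 - 1 so that the zeta series converges."""
    assert r < p / 2.0 - 1.0
    beta_total = sum((k - j) ** (r - 1.0 - p / 2.0)
                     for l in range(k) for j in range(l + 1))
    zeta_trunc = sum(n ** (-(p / 2.0 - r)) for n in range(1, 200000))
    return beta_total, zeta_trunc
```

Swapping the order of summation collapses the double sum to a sum of i^(r - p/2) over i = 1, ..., k, which is why the zeta bound appears.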

The next proposition helps extend the above result from any r<p/21r<p/2-1 to ς(s,p)\varsigma(s,p) as stipulated in Theorem 2.2. However, it is also a stand-alone result that is applicable to certain models where Theorem 2.2 is not directly applicable. These are cases where one does not directly have a good estimate of the conditional centered moment Ξn\Xi_{n} as required in Theorem 2.6, but has suitable upper bounds for its θ\|\cdot\|_{\theta} norm. As a simple example, let XnX_{n} be a stochastic process taking values in [𝔠0,)[-\mathfrak{c}_{0},\infty), whose temporal evolution is given by

Xn+1=𝔠1+Xn/2+YnX_{n+1}=\mathfrak{c}_{1}+X_{n}/2+Y_{n}

where 𝔠0\mathfrak{c}_{0} and 𝔠1\mathfrak{c}_{1} are (real-valued) constants, and {Yn}\{Y_{n}\} is an n\mathcal{F}_{n}-adapted martingale difference process (that is, 𝔼(Yn+1|n)=0\mathbb{E}(Y_{n+1}|\mathcal{F}_{n})=0) with supn𝔼(|Yn|p)<\sup_{n}\mathbb{E}(|Y_{n}|^{p})<\infty for some p>2p>2. Then Theorem 2.2 is not applicable, but the following proposition can be applied with θ=1\theta=1 to V(x)=x+𝔠0.V(x)=x+\mathfrak{c}_{0}.
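To illustrate, a small Monte Carlo sketch of this recursion (not from the paper; the noise distribution and all parameter values are hypothetical choices with uniformly bounded p-th moments for p = 3) tracks the empirical r-th moment of |X_n| for r below the resulting threshold p/2 - 1:

```python
import random

def empirical_moment(r=0.4, c1=1.0, n_steps=200, n_paths=2000, seed=0):
    """Monte Carlo sketch of the recursion X_{n+1} = c1 + X_n/2 + Y_n,
    with Y_n mean-zero and heavy-tailed so that sup_n E|Y_n|^p < inf for
    p = 3 (here: symmetrized Pareto-type noise with tail index 3.5, an
    illustrative choice).  Returns the largest empirical value of
    E|X_n|^r seen over the run."""
    rng = random.Random(seed)

    def draw_noise():
        mag = rng.random() ** (-1.0 / 3.5) - 1.0    # Pareto(3.5)-type magnitude
        return mag if rng.random() < 0.5 else -mag  # symmetrize => mean zero

    xs = [0.0] * n_paths
    worst = 0.0
    for _ in range(n_steps):
        xs = [c1 + x / 2.0 + draw_noise() for x in xs]
        worst = max(worst, sum(abs(x) ** r for x in xs) / n_paths)
    return worst
```

With r = 0.4 below the threshold p/2 - 1 = 1/2, the running maximum stabilizes instead of growing with the horizon, in line with the proposition below applied with θ = 1.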

Proposition 2.6.

Let Ξn𝔼x0[|V(Xn+1)𝔼(V(Xn+1)|n)|p|n]\Xi_{n}\equiv\mathbb{E}_{x_{0}}\left[|V(X_{n+1})-\mathbb{E}(V(X_{n+1})|\mathcal{F}_{n})|^{p}|\mathcal{F}_{n}\right] denote the centered conditional pp-th moment of V(Xn+1)V(X_{n+1}) given n\mathcal{F}_{n}. Assume that (2.1-a) and (2.1-c) of Assumption 2.1 hold, and that for some p>2p>2, some θ[1,]\theta\in[1,\infty] and some constant 0<¯θ(x0)<0<\bar{\mathscr{B}}_{\theta}({x_{0}})<\infty,

Ξnθ=𝔼x0[Ξnθ]1/θ¯θ(x0), for all n0.\displaystyle\|\Xi_{n}\|_{\theta}=\mathbb{E}_{x_{0}}\left[\Xi_{n}^{\theta}\right]^{1/\theta}\leqslant\bar{\mathscr{B}}_{\theta}({x_{0}}),\quad\text{ for all }n\geqslant 0.

Then r(x0)supn𝔼x0[V(Xn)r]<\displaystyle\mathscr{B}_{r}({x_{0}})\doteq\sup_{n\in\mathbb{N}}\mathbb{E}_{x_{0}}\bigl{[}V(X_{n})^{r}\bigr{]}<\infty for 0r<ς¯(θ,p),0\leqslant r<\bar{\varsigma}(\theta,p), where

ς¯(θ,p)={p(112θ)1,for θ[1,p2](pp2,] when 2<p<4,for any θ1 when p4;p2,for θ(p2,pp2] when 2<p<4.\displaystyle\bar{\varsigma}(\theta,p)=\begin{cases}p\left(1-\frac{1}{2\theta}\right)-1,&\quad\text{for }\theta\in\left[1,\frac{p}{2}\right]\cup\left(\frac{p}{p-2},\infty\right]\text{ when }2<p<4,\\ &\quad\text{for any }\ \theta\geqslant 1\text{ when }p\geqslant 4;\\ p-2,&\quad\text{for }\theta\in\left(\frac{p}{2},\frac{p}{p-2}\right]\text{ when }2<p<4.\end{cases}

Here θ=\theta=\infty corresponds to the case that Ξn=𝔼x0[|V(Xn+1)𝔼(V(Xn+1)|n)|p|n]¯\Xi_{n}=\mathbb{E}_{x_{0}}\left[|V(X_{n+1})-\mathbb{E}(V(X_{n+1})|\mathcal{F}_{n})|^{p}|\mathcal{F}_{n}\right]\leqslant\bar{\mathscr{B}} a.s., for some constant ¯>0\bar{\mathscr{B}}>0.
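For convenience, the case analysis defining the threshold can be encoded directly; the helper below is an illustrative transcription (not part of the original text; `float('inf')` encodes θ = ∞):

```python
def sigma_bar(theta, p):
    """Moment-exponent threshold from Proposition 2.6: uniform bounds on
    E[V(X_n)^r] hold for all 0 <= r < sigma_bar(theta, p)."""
    if p <= 2 or theta < 1:
        raise ValueError("requires p > 2 and theta >= 1")
    general = p * (1.0 - 1.0 / (2.0 * theta)) - 1.0  # theta = inf gives p - 1
    if 2 < p < 4 and p / 2.0 < theta <= p / (p - 2.0):
        return p - 2.0   # the intermediate range of theta when 2 < p < 4
    return general
```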

Proof of Proposition 2.6.

The constants appearing in various estimates below (besides the ones that appeared before) will be denoted by C^i\hat{C}_{i}’s. They will not depend on nn but may depend on the parameters of the system and the initial position x0{x_{0}}.

Define n\mathscr{M}_{n}, η\eta and ξk\xi_{k} as in the proof of Proposition 2.5. Fix NN, 0kN0\leqslant k\leqslant N, and define ςς(N,k)\varsigma\equiv\varsigma(N,k) by

ς=inf{jk:jk+ξkA(Nk1)/2}.\displaystyle\varsigma=\inf\{j\geqslant k:\mathscr{M}_{j}-\mathscr{M}_{k}+\xi_{k}\geqslant A(N-k-1)/2\}.

Clearly, ςN\varsigma\leqslant N. For j>kj>k, notice that on {ς=j}\{\varsigma=j\},

j1k+ξkA(Nk1)/2,\displaystyle\mathscr{M}_{j-1}-\mathscr{M}_{k}+\xi_{k}\leqslant A(N-k-1)/2,

and hence on {η=k}{ς=j}\{\eta=k\}\cap\{\varsigma=j\}

Nj1A(Nk1)/2+V(XN).\displaystyle\mathscr{M}_{N}-\mathscr{M}_{j-1}\geqslant A(N-k-1)/2+V(X_{N}).

It follows that for j>kj>k

𝔼x0[V(XN)𝟙{η=k}𝟙{ς=j}]𝔼x0[|Nj1|r𝟙{|Nj1|>A(Nk1)/2}𝟙{|j1k|+ξk>A(jk2)0}𝟙{ς=j}].\displaystyle\mathbb{E}_{x_{0}}[V(X_{N})\mathds{1}_{\{\eta=k\}}\mathds{1}_{\{\varsigma=j\}}]\leqslant\mathbb{E}_{x_{0}}\left[|\mathscr{M}_{N}-\mathscr{M}_{j-1}|^{r}\mathds{1}_{\{|\mathscr{M}_{N}-\mathscr{M}_{j-1}|>A(N-k-1)/2\}}\mathds{1}_{\{|\mathscr{M}_{j-1}-\mathscr{M}_{k}|+\xi_{k}>A(j-k-2)\vee 0\}}\mathds{1}_{\{\varsigma=j\}}\right].

Notice that 𝒮j𝔼x0[|Nj1|r𝟙|Nj1|>A(Nk1)/2|j]\mathcal{S}_{j}\equiv\mathbb{E}_{x_{0}}\left[|\mathscr{M}_{N}-\mathscr{M}_{j-1}|^{r}\mathds{1}_{|\mathscr{M}_{N}-\mathscr{M}_{j-1}|>A(N-k-1)/2}|\mathcal{F}_{j}\right] can be estimated by Lemma 2.4 as

𝒮j\displaystyle\mathcal{S}_{j}\leqslant (2/A(Nk1))pr𝔼x0[|Nj1|p|j]\displaystyle\ (2/A(N-k-1))^{p-r}\mathbb{E}_{x_{0}}\left[|\mathscr{M}_{N}-\mathscr{M}_{j-1}|^{p}|\mathcal{F}_{j}\right]
\displaystyle\leqslant C^0[𝔼x0[|Nj|p|j]+|jj1|p]/(Nk1)pr\displaystyle\ \hat{C}_{0}\left[\mathbb{E}_{x_{0}}\left[|\mathscr{M}_{N}-\mathscr{M}_{j}|^{p}|\mathcal{F}_{j}\right]+|\mathscr{M}_{j}-\mathscr{M}_{j-1}|^{p}\right]/(N-k-1)^{p-r}
\displaystyle\leqslant C^0[𝒞0(Nj)p21𝔼x0[l=jN1Ξl|j]+|jj1|p]/(Nk1)pr.\displaystyle\ \hat{C}_{0}\left[\mathscr{C}_{0}(N-j)^{\frac{p}{2}-1}\mathbb{E}_{x_{0}}\left[\sum_{l=j}^{N-1}\Xi_{l}|\mathcal{F}_{j}\right]+|\mathscr{M}_{j}-\mathscr{M}_{j-1}|^{p}\right]/(N-k-1)^{p-r}.

Also, for ς=k\varsigma=k by Lemma 2.4,

𝔼x0[V(XN)𝟙{η=k}𝟙{ς=k}]\displaystyle\mathbb{E}_{x_{0}}[V(X_{N})\mathds{1}_{\{\eta=k\}}\mathds{1}_{\{\varsigma=k\}}]\leqslant 𝔼x0[𝟙{ς=k}𝔼x0[(|Nk|+ξk)r𝟙||Nk|+ξk>A(Nk1)|k]]\displaystyle\ \mathbb{E}_{x_{0}}\left[\mathds{1}_{\{\varsigma=k\}}\mathbb{E}_{x_{0}}\left[(|\mathscr{M}_{N}-\mathscr{M}_{k}|+\xi_{k})^{r}\mathds{1}_{||\mathscr{M}_{N}-\mathscr{M}_{k}|+\xi_{k}>A(N-k-1)}|\mathcal{F}_{k}\right]\right]
\displaystyle\leqslant C^1𝔼x0[𝟙{ς=k}((Nk1)rp/21l=kN1𝔼x0[Ξl|k]+(Nk1)rp|ξk|p)].\displaystyle\ \hat{C}_{1}\mathbb{E}_{x_{0}}\left[\mathds{1}_{\{\varsigma=k\}}\left((N-k-1)^{r-p/2-1}\sum_{l=k}^{N-1}\mathbb{E}_{x_{0}}\left[\Xi_{l}|\mathcal{F}_{k}\right]+(N-k-1)^{r-p}|\xi_{k}|^{p}\right)\right].

Hence,

𝔼x0[V(XN)𝟙{η=k}]=\displaystyle{}\mathbb{E}_{x_{0}}[V(X_{N})\mathds{1}_{\{\eta=k\}}]= j=kN𝔼x0[V(XN)𝟙{η=k}𝟙{ς=j}]C^1(Nk1)rp/21𝔼x0[𝟙{ς=k}l=kN1𝔼x0[Ξl|k]]\displaystyle\ \sum_{j=k}^{N}\mathbb{E}_{x_{0}}[V(X_{N})\mathds{1}_{\{\eta=k\}}\mathds{1}_{\{\varsigma=j\}}]\leqslant\ \hat{C}_{1}(N-k-1)^{r-p/2-1}\mathbb{E}_{x_{0}}\left[\mathds{1}_{\{\varsigma=k\}}\sum_{l=k}^{N-1}\mathbb{E}_{x_{0}}\left[\Xi_{l}|\mathcal{F}_{k}\right]\right]
+C^1(Nk1)rp¯(x0)+j=k+1N𝔼x0[𝟙{ς=j}𝟙{|j1k|+ξk>A(jk2)}𝒮j]\displaystyle\hskip 14.22636pt+\hat{C}_{1}(N-k-1)^{r-p}\bar{\mathscr{B}}({x_{0}})+\sum_{j=k+1}^{N}\mathbb{E}_{x_{0}}\left[\mathds{1}_{\{\varsigma=j\}}\mathds{1}_{\{|\mathscr{M}_{j-1}-\mathscr{M}_{k}|+\xi_{k}>A(j-k-2)\}}\mathcal{S}_{j}\right]
\displaystyle{}\leqslant C^2[(Nk1)rp21j=kNl=jN1𝔼x0[Ξl𝟙{ς=j}]+(Nk1)rp\displaystyle\ \hat{C}_{2}\Big{[}(N-k-1)^{r-\frac{p}{2}-1}\sum_{j=k}^{N}\sum_{l=j}^{N-1}\mathbb{E}_{x_{0}}\left[\Xi_{l}\mathds{1}_{\{\varsigma=j\}}\right]+(N-k-1)^{r-p}
(2.7) +(Nk1)rpj=k+1N|jj1|p𝟙{|j1k|+ξk>A(jk2)0}].\displaystyle\hskip 8.5359pt+(N-k-1)^{r-p}\sum_{j=k+1}^{N}|\mathscr{M}_{j}-\mathscr{M}_{j-1}|^{p}\mathds{1}_{\{|\mathscr{M}_{j-1}-\mathscr{M}_{k}|+\xi_{k}>A(j-k-2)\vee 0\}}\Big{]}.

We next estimate the above two terms separately; for that we need the following bound, which is an immediate consequence of Doob’s maximal inequality and the assumption:

(2.8) x0(maxkjl|jk|+ξk>Υ)\displaystyle\mathbb{P}_{x_{0}}\left(\max_{k\leqslant j\leqslant l}|\mathscr{M}_{j}-\mathscr{M}_{k}|+\xi_{k}>\Upsilon\right)\leqslant 𝔼x0[maxkjl|jk|+ξk]p/ΥpC^3((lk)p/2+¯0(x))/Υp\displaystyle\ \mathbb{E}_{x_{0}}\left[\max_{k\leqslant j\leqslant l}|\mathscr{M}_{j}-\mathscr{M}_{k}|+\xi_{k}\right]^{p}/\Upsilon^{p}\leqslant\hat{C}_{3}\left((l-k)^{p/2}+\bar{\mathscr{B}}_{0}(x)\right)/\Upsilon^{p}

Now notice that

j=kN1l=jN1𝔼x0[Ξl𝟙{ς=j}]=\displaystyle{}\sum_{j=k}^{N-1}\sum_{l=j}^{N-1}\mathbb{E}_{x_{0}}\left[\Xi_{l}\mathds{1}_{\{\varsigma=j\}}\right]= l=kN1j=kl𝔼x0[Ξl𝟙{ς=j}]=l=kN1𝔼x0[Ξl𝟙{ςl}]l=kN1Ξlθx0(ςl)1/θ\displaystyle\ \sum_{l=k}^{N-1}\sum_{j=k}^{l}\mathbb{E}_{x_{0}}\left[\Xi_{l}\mathds{1}_{\{\varsigma=j\}}\right]=\sum_{l=k}^{N-1}\mathbb{E}_{x_{0}}\left[\Xi_{l}\mathds{1}_{\{\varsigma\leqslant l\}}\right]\leqslant\ \sum_{l=k}^{N-1}\|\Xi_{l}\|_{\theta}\mathbb{P}_{x_{0}}(\varsigma\leqslant l)^{1/\theta^{*}}
\displaystyle{}\leqslant ¯θ(x)l=kN1x0(maxkjl(|jk|+ξk)>A(Nk1)/2)1/θ\displaystyle\ \bar{\mathscr{B}}_{\theta}(x)\sum_{l=k}^{N-1}\mathbb{P}_{x_{0}}\left(\max_{k\leqslant j\leqslant l}\left(|\mathscr{M}_{j}-\mathscr{M}_{k}|+\xi_{k}\right)>A(N-k-1)/2\right)^{1/\theta^{*}}
\displaystyle{}\leqslant (2A)p/θ¯θ(x)l=kN1(C^3((lk)p/2+¯0(x))(Nk1)p)1/θ\displaystyle\left(\frac{2}{A}\right)^{p/\theta^{*}}\bar{\mathscr{B}}_{\theta}(x)\sum_{l=k}^{N-1}\left(\frac{\hat{C}_{3}\left((l-k)^{p/2}+\bar{\mathscr{B}}_{0}(x)\right)}{(N-k-1)^{p}}\right)^{1/\theta^{*}}
(2.9) \displaystyle\leqslant C^4(Nk1)p2θ+1=C^4(Nk1)p2(11θ)+1,\displaystyle\ \hat{C}_{4}(N-k-1)^{-\frac{p}{2\theta^{*}}+1}=\hat{C}_{4}(N-k-1)^{-\frac{p}{2}\left(1-\frac{1}{\theta}\right)+1},

where 1θ+1θ=1\frac{1}{\theta}+\frac{1}{\theta^{*}}=1.

Next notice that the term

𝒜j=k+1N𝔼x0[|jj1|p𝟙{|j1k|+ξk>A(jk2)}]\mathcal{A}\equiv\sum_{j=k+1}^{N}\mathbb{E}_{x_{0}}\left[|\mathscr{M}_{j}-\mathscr{M}_{j-1}|^{p}\mathds{1}_{\{|\mathscr{M}_{j-1}-\mathscr{M}_{k}|+\xi_{k}>A(j-k-2)\}}\right]

can be estimated as

𝒜\displaystyle{}\mathcal{A}\leqslant Ξk+1θ+Ξk+2θ+j=k+3N𝔼x0[Ξj1𝟙{|j1k|+ξk>A(jk2)}]\displaystyle\ \|\Xi_{k+1}\|_{\theta}+\|\Xi_{k+2}\|_{\theta}+\sum_{j=k+3}^{N}\mathbb{E}_{x_{0}}\left[\Xi_{j-1}\mathds{1}_{\{|\mathscr{M}_{j-1}-\mathscr{M}_{k}|+\xi_{k}>A(j-k-2)\}}\right]
\displaystyle{}\leqslant 2¯θ(x)+j=k+3NΞj1θx0(|j1k|+ξk>A(jk2))1/θ\displaystyle\ 2\bar{\mathscr{B}}_{\theta}(x)+\sum_{j=k+3}^{N}\|\Xi_{j-1}\|_{\theta}\mathbb{P}_{x_{0}}\left(|\mathscr{M}_{j-1}-\mathscr{M}_{k}|+\xi_{k}>A(j-k-2)\right)^{1/\theta^{*}}
\displaystyle{}\leqslant ¯θ(x)[2+Ap/θj=k+3N(C^3((jk1)p/2+¯0(x))/(jk2)p)1/θ],\displaystyle\ \bar{\mathscr{B}}_{\theta}(x)\left[2+A^{-p/\theta^{*}}\sum_{j=k+3}^{N}\left(\hat{C}_{3}\left((j-k-1)^{p/2}+\bar{\mathscr{B}}_{0}(x)\right)/(j-k-2)^{p}\right)^{1/\theta^{*}}\right],
\displaystyle{}\leqslant C^5[1+j=k+3N1/(jk2)p/2θ]\displaystyle\ \hat{C}_{5}\left[1+\sum_{j=k+3}^{N}1/(j-k-2)^{p/2\theta^{*}}\right]
(2.10) \displaystyle\leqslant {C^6, if p/2θ=p(11/θ)/2>1,C^7(Nk1), otherwise,\displaystyle\ \begin{cases}\hat{C}_{6},&\quad\text{ if }p/2\theta^{*}=p(1-1/\theta)/2>1,\\ \hat{C}_{7}(N-k-1),&\quad\text{ otherwise},\end{cases}

where the third inequality is by (2.8). We now consider some cases.

Case 1: θp/2\theta\leqslant p/2: Suppose that r<p(112θ)1r<p\left(1-\frac{1}{2\theta}\right)-1. Notice that in this case, this implies pr1>p2θ1.p-r-1>\frac{p}{2\theta}\geqslant 1. It follows from (2.7), (2.9) and (2.10) (second case) that

𝔼x0[Vr(XN)]=\displaystyle\mathbb{E}_{x_{0}}[V^{r}(X_{N})]= k=0N𝔼x0[Vr(XN)𝟙{η=k}]k=0N2𝔼x0[Vr(XN)𝟙{η=k}]+𝔼x0[Vr(XN)𝟙{XN1𝒟}]+supx𝒟Vr(x)\displaystyle\ \sum_{k=0}^{N}\mathbb{E}_{x_{0}}[V^{r}(X_{N})\mathds{1}_{\{\eta=k\}}]\leqslant\sum_{k=0}^{N-2}\mathbb{E}_{x_{0}}[V^{r}(X_{N})\mathds{1}_{\{\eta=k\}}]+\mathbb{E}_{x_{0}}[V^{r}(X_{N})\mathds{1}_{\{X_{N-1}\in\mathcal{D}\}}]+\sup_{x\in\mathcal{D}}V^{r}(x)
\displaystyle\leqslant C^8[k=0N2(Nk1)rp21p2(11θ)+1+k=0N2(Nk1)rp+k=0N2(Nk1)rp+1]\displaystyle\ \hat{C}_{8}\left[\sum_{k=0}^{N-2}(N-k-1)^{r-\frac{p}{2}-1-\frac{p}{2}\left(1-\frac{1}{\theta}\right)+1}+\sum_{k=0}^{N-2}(N-k-1)^{r-p}+\sum_{k=0}^{N-2}(N-k-1)^{r-p+1}\right]
+𝒞3+supx𝒟Vr(x)\displaystyle\ +\mathscr{C}_{3}+\sup_{x\in\mathcal{D}}V^{r}(x)
=\displaystyle= C^8(ζ(p(112θ)r)+ζ(pr)+ζ(pr1))+𝒞3+supx𝒟Vr(x).\displaystyle\ \hat{C}_{8}\left(\zeta\left(p\left(1-\frac{1}{2\theta}\right)-r\right)+\zeta(p-r)+\zeta(p-r-1)\right)+\mathscr{C}_{3}+\sup_{x\in\mathcal{D}}V^{r}(x).

Case 2: θ>p/2\theta>p/2 and p4p\geqslant 4: Suppose that r<p(112θ)1r<p\left(1-\frac{1}{2\theta}\right)-1. Notice that θ>p/2\theta>p/2 and p4p\geqslant 4 imply that p/2θ=p(11/θ)/2>1p/2\theta^{*}=p(1-1/\theta)/2>1. As in the previous case, it follows from (2.7), (2.9) and (2.10) (first case) that

𝔼x0[Vr(XN)]\displaystyle\mathbb{E}_{x_{0}}[V^{r}(X_{N})]\leqslant k=0N2𝔼x0[Vr(XN)𝟙{η=k}]+𝔼x0[Vr(XN)𝟙{XN1𝒟}]+supx𝒟Vr(x)\displaystyle\ \sum_{k=0}^{N-2}\mathbb{E}_{x_{0}}[V^{r}(X_{N})\mathds{1}_{\{\eta=k\}}]+\mathbb{E}_{x_{0}}[V^{r}(X_{N})\mathds{1}_{\{X_{N-1}\in\mathcal{D}\}}]+\sup_{x\in\mathcal{D}}V^{r}(x)
\displaystyle\leqslant C^9(k=0N2(Nk)rp(112θ)+k=0N1(Nk1)rp)+𝒞3+supx𝒟Vr(x)C^10.\displaystyle\ \hat{C}_{9}\left(\sum_{k=0}^{N-2}(N-k)^{r-p\left(1-\frac{1}{2\theta}\right)}+\sum_{k=0}^{N-1}(N-k-1)^{r-p}\right)+\mathscr{C}_{3}+\sup_{x\in\mathcal{D}}V^{r}(x)\leqslant\ \hat{C}_{10}.

The other cases in the assertion follow similarly once we observe that θ>p/(p2)p/2θ>1\theta>p/(p-2)\Leftrightarrow p/2\theta^{*}>1 and for 2<p<42<p<4, p/2<p/(p2)p/2<p/(p-2).
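The equivalence invoked in this last step is elementary algebra with the Hölder conjugate θ* = θ/(θ-1); a quick numerical confirmation over an illustrative grid:

```python
def case_boundary_equivalent(p, theta):
    """Check theta > p/(p-2)  <=>  p/(2*theta_star) > 1, where theta_star
    is the Holder conjugate of theta (1/theta + 1/theta_star = 1)."""
    theta_star = theta / (theta - 1.0)   # requires theta > 1
    return (theta > p / (p - 2.0)) == (p / (2.0 * theta_star) > 1.0)
```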

2.2. Ergodicity of Markov processes

Theorem 2.2 leads to the following result on Harris ergodicity of Markov processes.

Definition 2.7.

A function V:[0,)V:\mathcal{E}\rightarrow[0,\infty) is inf-compact if the level sets 𝒦m={x:V(x)m}\mathcal{K}_{m}=\{x:V(x)\leqslant m\} are compact for all m0m\geqslant 0.

Note that an inf-compact function VV is lower-semicontinuous.

Theorem 2.8.

Let {Xn}\{X_{n}\} be a Markov process taking values in a locally compact separable space \mathcal{E} with transition kernel 𝒫\mathcal{P}. Suppose for an inf-compact function V:[0,)V:\mathcal{E}\rightarrow[0,\infty), the following conditions hold:

  1. (2.8-a)

    for all nn\in\mathbb{N},

    𝒫V(x)V(x)A,on {x𝒟};\mathcal{P}V(x)-V(x)\leqslant-A,\quad\text{on }\ \{x\notin\mathcal{D}\};
  2. (2.8-b)

    for some p>2p>2

    𝒫|V()𝒫V(x)|p(x)=|V(y)𝒫V(x)|p𝒫(x,dy)φ(x),\mathcal{P}|V(\cdot)-\mathcal{P}V(x)|^{p}(x)=\int|V(y)-\mathcal{P}V(x)|^{p}\mathcal{P}(x,dy)\leqslant\varphi(x),

    where φ:[0,]\varphi:\mathcal{E}\rightarrow[0,\infty] satisfies φ(x)𝒞φ(1+Vs(x))\varphi(x)\leqslant\mathscr{C}_{\varphi}(1+V^{s}(x)) for some s<p/21s<p/2-1 and some constant 𝒞φ>0\mathscr{C}_{\varphi}>0. This is of course the same as requiring 𝔼[|V(Xn+1)𝒫V(Xn)|p|n]φ(Xn).\mathbb{E}\left[\big{|}V(X_{n+1})-\mathcal{P}V(X_{n})\big{|}^{p}\Big{|}\mathcal{F}_{n}\right]\leqslant\varphi(X_{n}).

  3. (2.8-c)

    supx𝒟V(x)<,\sup_{x\in\mathcal{D}}V(x)<\infty, and supx𝒟𝒫V(x)<,\sup_{x\in\mathcal{D}}\mathcal{P}V(x)<\infty,

Also, suppose that

  1. (2.8-d)

    𝒫\mathcal{P} is weak Feller, ψ\psi-irreducible, and admits a density qq with respect to some Radon measure μ\mu, that is, 𝒫(x,dy)=q(x,y)μ(dy)\mathcal{P}(x,dy)=q(x,y)\mu(dy), and that for every compact set 𝒦\mathcal{K}, there exists a constant 𝔠𝒦,0\mathfrak{c}_{\mathcal{K},0} such that

    supy𝒦q(x,y)𝔠𝒦,0(1+Vr(x)).\sup_{y\in\mathcal{K}}q(x,y)\leqslant\mathfrak{c}_{\mathcal{K},0}\left(1+V^{r}(x)\right).

Then

  1. (i)

    Under (2.8-a) - (2.8-c), supn𝔼x0(Vr(Xn))supn𝒫nVr(x0)<\sup_{n}\mathbb{E}_{x_{0}}(V^{r}(X_{n}))\equiv\sup_{n}\mathcal{P}^{n}V^{r}({x_{0}})<\infty for any 0r<ς(s,p)0\leqslant r<\varsigma(s,p), where ς(s,p)\varsigma(s,p) is as in Theorem 2.2.

  2. (ii)

    Under the additional assumption (2.8-d), {Xn}\{X_{n}\} is positive Harris recurrent (PHR) and aperiodic with a unique invariant distribution π\pi, and for any x0x_{0} and r(0,ς(s,p))r\in(0,\varsigma(s,p)),

    (2.11) (Vr+1)d|𝒫n(x0,)π|0as n;\displaystyle\int(V^{r}+1)d|\mathcal{P}^{n}({x_{0}},\cdot)-\pi|\rightarrow 0\quad\text{as }n\rightarrow\infty;

    or equivalently,

    (2.12) 𝒫n(x0,)πVr+1supf:|f|Vr+1|𝒫nf(x0)π(f)|0,as n.\displaystyle\|\mathcal{P}^{n}({x_{0}},\cdot)-\pi\|_{V^{r}+1}\doteq\sup_{f:|f|\leqslant V^{r}+1}|\mathcal{P}^{n}f({x_{0}})-\pi(f)|\rightarrow 0,\quad\text{as }n\rightarrow\infty.
Proof.

(i) follows from Theorem 2.2. Since VV is inf-compact, it follows from (i) that for every x0x_{0}, the family {𝒫n(x0,)}\{\mathcal{P}^{n}(x_{0},\cdot)\} is tight; let π\pi be one of its limit points. Since 𝒫\mathcal{P} is weak Feller, by the Krylov-Bogolyubov theorem [22, Theorem 7.1], π\pi is invariant for 𝒫\mathcal{P}, and uniqueness of π\pi follows from the assumption of ψ\psi-irreducibility [9, Proposition 4.2.2]. Hence, for every x0x_{0}, 𝒫n(x0,)π\mathcal{P}^{n}(x_{0},\cdot)\Rightarrow\pi (along the full sequence) as nn\rightarrow\infty.

For (ii) we start by establishing the following claim.

Claim: Suppose that fVr+1f\leqslant V^{r}+1 for some r(0,ς(s,p))r\in(0,\varsigma(s,p)). Then 𝒫nf(x0)π(f)\mathcal{P}^{n}f(x_{0})\rightarrow\pi(f) as nn\rightarrow\infty for any x0x_{0}\in\mathcal{E}.

Since VV is lower semi-continuous we have by (generalized) Fatou’s lemma,

π(Vr)lim infn𝒫nVr(x0)r(x0)\displaystyle\pi(V^{r})\leqslant\liminf_{n\rightarrow\infty}\mathcal{P}^{n}V^{r}(x_{0})\leqslant\mathscr{B}_{r}(x_{0})

for any r(0,ς(s,p))r\in(0,\varsigma(s,p)). Now let fVr+1f\leqslant V^{r}+1 for some r(0,ς(s,p))r\in(0,\varsigma(s,p)) and fix ε>0\varepsilon>0.

Since {𝒫n(x0,)}\{\mathcal{P}^{n}(x_{0},\cdot)\} is tight, for a given ε~>0\tilde{\varepsilon}>0, there exists a compact set 𝒦\mathcal{K} (which depends on x0x_{0} and which we take of the form 𝒦m={x:V(x)m}\mathcal{K}_{m}=\{x:V(x)\leqslant m\} for sufficiently large mm) such that

supn𝒫n(x0,𝒦c)ε~, and π(𝒦c)ε~.\sup_{n}\mathcal{P}^{n}(x_{0},\mathcal{K}^{c})\leqslant\tilde{\varepsilon},\quad\text{ and }\quad\pi(\mathcal{K}^{c})\leqslant\tilde{\varepsilon}.

Now by Hölder’s inequality

𝒫nf1𝒦c(x0)=\displaystyle{}\mathcal{P}^{n}f1_{\mathcal{K}^{c}}(x_{0})= f(y)1𝒦c(y)𝒫n(x0,dy)(Vr(y)+1)1𝒦c(y)𝒫n(x0,dy)\displaystyle\ \int f(y)1_{\mathcal{K}^{c}}(y)\mathcal{P}^{n}(x_{0},dy)\leqslant\int(V^{r}(y)+1)1_{\mathcal{K}^{c}}(y)\mathcal{P}^{n}(x_{0},dy)
\displaystyle{}\leqslant (Vr(y)𝒫n(x0,dy))r/r(𝟙𝒦c(y)𝒫n(x0,dy))1r/r+𝒫n(x0,𝒦c),\displaystyle\ \left(\int V^{r^{\prime}}(y)\mathcal{P}^{n}(x_{0},dy)\right)^{r/r^{\prime}}\left(\int\mathds{1}_{\mathcal{K}^{c}}(y)\mathcal{P}^{n}(x_{0},dy)\right)^{1-r/r^{\prime}}+\mathcal{P}^{n}(x_{0},\mathcal{K}^{c}),
(2.13) \displaystyle\leqslant rr/r(x0)ε~1r/r+ε~\displaystyle\ \mathscr{B}_{r^{\prime}}^{r/r^{\prime}}(x_{0})\tilde{\varepsilon}^{1-r/r^{\prime}}+\tilde{\varepsilon}

for some r<r<ς(s,p)r<r^{\prime}<\varsigma(s,p). Similarly, π(f𝟙𝒦c)rr/r(x)ε~1r/r+ε~\pi(f\mathds{1}_{\mathcal{K}^{c}})\leqslant\mathscr{B}_{r^{\prime}}^{r/r^{\prime}}(x)\tilde{\varepsilon}^{1-r/r^{\prime}}+\tilde{\varepsilon}.

Since f𝟙𝒦L1(μ)f\mathds{1}_{\mathcal{K}}\in L^{1}(\mu), there exist {hm}Cc(,)\{h_{m}\}\subset C_{c}(\mathcal{E},\mathbb{R}) such that hmf1𝒦h_{m}\rightarrow f1_{\mathcal{K}} in L1(μ)L^{1}(\mu) as mm\rightarrow\infty, and supx|hm(x)|supx𝒦|f(x)|\sup_{x}|h_{m}(x)|\leqslant\sup_{x\in\mathcal{K}}|f(x)| for m1m\geqslant 1. In fact, we can choose {hm}\{h_{m}\} such that supp(hm)𝒦𝒦supp(h_{m})\subset\mathcal{K}^{\prime}\supset\mathcal{K} for some compact set 𝒦\mathcal{K}^{\prime}.

Observe that for xx\in\mathcal{E} and y𝒦y\in\mathcal{K}^{\prime},

qn(x,y)=\displaystyle q^{n}(x,y)= qn1(x,z)q(z,y)𝑑μ(z)qn1(x,z)𝔠𝒦,0(1+Vr(z))𝑑μ(z)\displaystyle\ \int q^{n-1}(x,z)q(z,y)d\mu(z)\leqslant\int q^{n-1}(x,z)\mathfrak{c}_{\mathcal{K^{\prime}},0}\left(1+V^{r}(z)\right)d\mu(z)
\displaystyle\leqslant 𝔠𝒦,0(1+𝔼x(Vr(Xn1)))𝔠𝒦,0(1+r(x))𝒞𝒦(x).\displaystyle\ \mathfrak{c}_{\mathcal{K^{\prime}},0}\left(1+\mathbb{E}_{x}(V^{r}(X_{n-1}))\right)\leqslant\mathfrak{c}_{\mathcal{K^{\prime}},0}\left(1+\mathscr{B}_{r}(x)\right)\equiv\mathscr{C}_{\mathcal{K}^{\prime}}(x).

Hence

supn|𝒫nf𝟙𝒦(x0)𝒫nhm|\displaystyle{}\sup_{n}|\mathcal{P}^{n}f\mathds{1}_{\mathcal{K}}(x_{0})-\mathcal{P}^{n}h_{m}|\leqslant 𝒦|f(y)𝟙𝒦(y)hm(y)|qn(x0,y)𝑑μ(y)\displaystyle\ \int_{\mathcal{K}^{\prime}}|f(y)\mathds{1}_{\mathcal{K}}(y)-h_{m}(y)|q^{n}(x_{0},y)d\mu(y)
(2.14) \displaystyle\leqslant 𝒞𝒦(x0)f1𝒦hm1.\displaystyle\ \mathscr{C}_{\mathcal{K}^{\prime}}(x_{0})\|f1_{\mathcal{K}}-h_{m}\|_{1}.

Next, notice that π\pi is absolutely continuous with respect to μ\mu. Indeed, if μ(A)=0\mu(A)=0, then 𝒫(x,A)=0\mathcal{P}(x,A)=0, and hence π(A)=π(dx)𝒫(x,A)=0\pi(A)=\int\pi(dx)\mathcal{P}(x,A)=0. Let g=dπ/dμg=d\pi/d\mu. For any M>0M>0,

|π(hm)π(f1𝒦)|\displaystyle{}|\pi(h_{m})-\pi(f1_{\mathcal{K}})|\leqslant M|hmf𝟙𝒦|𝟙{gM}𝑑μ+|hmf𝟙𝒦|g𝟙{gM}𝑑μ\displaystyle\ M\int|h_{m}-f\mathds{1}_{\mathcal{K}}|\mathds{1}_{\{g\leqslant M\}}d\mu+\int|h_{m}-f\mathds{1}_{\mathcal{K}}|g\mathds{1}_{\{g\geqslant M\}}d\mu
(2.15) \displaystyle\leqslant Mhmf𝟙𝒦1+2supx𝒦|f(x)|g𝟙{gM}𝑑μ.\displaystyle M\|h_{m}-f\mathds{1}_{\mathcal{K}}\|_{1}+2\sup_{x\in\mathcal{K}}|f(x)|\int g\mathds{1}_{\{g\geqslant M\}}d\mu.

Write

𝒫nf(x0)π(f)=\displaystyle{}\mathcal{P}^{n}f(x_{0})-\pi(f)= (𝒫nf1𝒦(x0)𝒫nhm(x0))+(𝒫nhm(x0)π(hm))+(π(hm)π(f1𝒦(x0)))\displaystyle\ \left(\mathcal{P}^{n}f1_{\mathcal{K}}(x_{0})-\mathcal{P}^{n}h_{m}(x_{0})\right)+\left(\mathcal{P}^{n}h_{m}(x_{0})-\pi(h_{m})\right)+\left(\pi(h_{m})-\pi(f1_{\mathcal{K}}(x_{0}))\right)
(2.16) +𝒫nf1𝒦c(x0)π(f1𝒦c(x0)),\displaystyle\ +\mathcal{P}^{n}f1_{\mathcal{K}^{c}}(x_{0})-\pi(f1_{\mathcal{K}^{c}}(x_{0})),

and choose 𝒦\mathcal{K} such that (2.13) holds for ε~\tilde{\varepsilon} where ε~\tilde{\varepsilon} is chosen such that rr/r(x0)ε~1r/r+ε~ε/10\mathscr{B}_{r^{\prime}}^{r/r^{\prime}}(x_{0})\tilde{\varepsilon}^{1-r/r^{\prime}}+\tilde{\varepsilon}\leqslant\varepsilon/10. Since g𝑑μ=1\int gd\mu=1, choose sufficiently large MM such that g𝟙{gM}𝑑με/(20supx𝒦|f(x)|)\int g\mathds{1}_{\{g\geqslant M\}}d\mu\leqslant\varepsilon/(20\sup_{x\in\mathcal{K}}|f(x)|), then a sufficiently large mm such that

f1𝒦hm1(ε/5𝒞𝒦(x0))(ε/10M).\|f1_{\mathcal{K}}-h_{m}\|_{1}\leqslant(\varepsilon/5\mathscr{C}_{\mathcal{K}^{\prime}}(x_{0}))\wedge(\varepsilon/10M).

Finally, since 𝒫n(x0,)π\mathcal{P}^{n}(x_{0},\cdot)\Rightarrow\pi, and hmCc(,)h_{m}\in C_{c}(\mathcal{E},\mathbb{R}), we have (𝒫nhm(x0)π(hm))0\left(\mathcal{P}^{n}h_{m}(x_{0})-\pi(h_{m})\right)\rightarrow 0 as nn\rightarrow\infty. Hence, we can choose a sufficiently large nn such that |𝒫nhm(x0)π(hm)|ε/5|\mathcal{P}^{n}h_{m}(x_{0})-\pi(h_{m})|\leqslant\varepsilon/5, and thus from (2.13), (2.14), (2.15) and (2.16),

|𝒫nf(x0)π(f)|\displaystyle|\mathcal{P}^{n}f(x_{0})-\pi(f)|\leqslant ε.\displaystyle\varepsilon.

This proves the claim, which in particular says that for any x0x_{0}\in\mathcal{E} and any Borel set AA, 𝒫n(x,A)nπ(A)\mathcal{P}^{n}(x,A)\stackrel{{\scriptstyle n\rightarrow\infty}}{{\rightarrow}}\pi(A). By [9, Theorem 4.3.4] (also see [8]), {Xn}\{X_{n}\} is aperiodic and PHR, and by the same result this implies 𝒫n(x,)πTV0.\|\mathcal{P}^{n}(x,\cdot)-\pi\|_{TV}\rightarrow 0. The equivalence of the setwise convergence of 𝒫n(x,)\mathcal{P}^{n}(x,\cdot) and convergence in total-variation norm is a unique feature of PHR chains. Now note that by Hölder’s inequality for some r(r,ς(s,p))r^{\prime}\in(r,\varsigma(s,p))

(Vr(y)+1)d|𝒫n(x,)π|(y)\displaystyle\int(V^{r}(y)+1)d|\mathcal{P}^{n}(x,\cdot)-\pi|(y)\leqslant (Vr(y)(𝒫n(x,dy)+π(dy)))r/r𝒫n(x,)πTV1r/r\displaystyle\ \left(\int V^{r^{\prime}}(y)(\mathcal{P}^{n}(x,dy)+\pi(dy))\right)^{r/r^{\prime}}\|\mathcal{P}^{n}(x,\cdot)-\pi\|_{TV}^{1-r/r^{\prime}}
+𝒫n(x,)πTV\displaystyle\hskip 11.38092pt+\|\mathcal{P}^{n}(x,\cdot)-\pi\|_{TV}
\displaystyle\leqslant 2r(x)r/r𝒫n(x,)πTV1r/r+𝒫n(x,)πTVn0.\displaystyle\ 2\mathscr{B}_{r^{\prime}}(x)^{r/r^{\prime}}\|\mathcal{P}^{n}(x,\cdot)-\pi\|_{TV}^{1-r/r^{\prime}}+\|\mathcal{P}^{n}(x,\cdot)-\pi\|_{TV}\stackrel{{\scriptstyle n\rightarrow\infty}}{{\rightarrow}}0.

The equivalence of (2.11) and (2.12) follows from Lemma 2.9 below.
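To see Theorem 2.8 at work on a concrete chain, consider the reflected random walk below (an illustrative sketch, not an example from the paper): with V(x) = x and q < 1/2 it satisfies the drift condition (2.8-a) off 𝒟 = {0} with A = 1 - 2q, the jumps are bounded so (2.8-b) holds with a constant φ for every p > 2, and the invariant law is geometric, π(k) = (1 - ρ)ρ^k with ρ = q/(1 - q).

```python
import random

def reflected_walk_stats(q=0.3, n_steps=20000, seed=1):
    """Simulate X_{n+1} = max(X_n + xi_n, 0) with xi_n = +1 w.p. q and -1
    w.p. 1 - q (q < 1/2).  Return the empirical second moment along the
    path together with the exact stationary value rho(1+rho)/(1-rho)^2,
    where rho = q/(1-q)."""
    rng = random.Random(seed)
    x, second_moment = 0, 0.0
    for _ in range(n_steps):
        x = max(x + (1 if rng.random() < q else -1), 0)
        second_moment += x * x
    rho = q / (1.0 - q)
    exact = rho * (1.0 + rho) / (1.0 - rho) ** 2
    return second_moment / n_steps, exact
```

The empirical second moment along a single path approaches the stationary value, consistent with (2.12): with bounded jumps all moments of order r < ς(s, p) converge.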

Lemma 2.9.

Let ν\nu be a signed measure on a complete separable metric space \mathcal{E}. Suppose that g:[0,)g:\mathcal{E}\rightarrow[0,\infty) is a measurable function such that |ν|(g)=gd|ν|<|\nu|(g)=\int gd|\nu|<\infty. Then

12|ν|(g)νg|ν|(g),\frac{1}{2}|\nu|(g)\leqslant\|\nu\|_{g}\leqslant|\nu|(g),

where we recall that νg=supf:|f|g|ν(f)|.\|\nu\|_{g}=\sup_{f:|f|\leqslant g}|\nu(f)|.

Proof.

The last inequality is trivial since for any measurable ff with |f|g|f|\leqslant g, |ν(f)||ν|(|f|)|ν|(g)|\nu(f)|\leqslant|\nu|(|f|)\leqslant|\nu|(g). For the first inequality, let =𝒴𝒩\mathcal{E}=\mathcal{Y}\cup\mathcal{N} be the Hahn decomposition for ν\nu (in particular, 𝒴𝒩=\mathcal{Y}\cap\mathcal{N}=\varnothing), with the corresponding Jordan decomposition ν=ν+ν\nu=\nu^{+}-\nu^{-} (i.e., supp(ν+)𝒴(\nu^{+})\subset\mathcal{Y} and supp(ν)𝒩)(\nu^{-})\subset\mathcal{N}). Choose f=g𝟙𝒴f=g\mathds{1}_{\mathcal{Y}}. Then

νg|ν(g𝟙𝒴)|=|ν+(g𝟙𝒴)ν(g𝟙𝒴)|=ν+(g𝟙𝒴)=ν+(g),\displaystyle\|\nu\|_{g}\geqslant|\nu(g\mathds{1}_{\mathcal{Y}})|=|\nu^{+}(g\mathds{1}_{\mathcal{Y}})-\nu^{-}(g\mathds{1}_{\mathcal{Y}})|=\nu^{+}(g\mathds{1}_{\mathcal{Y}})=\nu^{+}(g),

where the last equality is because supp(ν+)𝒴(\nu^{+})\subset\mathcal{Y}. Similarly, choosing f=g𝟙𝒩f=g\mathds{1}_{\mathcal{N}}, we have νgν(g)\|\nu\|_{g}\geqslant\nu^{-}(g), whence it follows that 2νg|ν|(g).2\|\nu\|_{g}\geqslant|\nu|(g).
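For a signed measure on finitely many atoms, the two inequalities of Lemma 2.9 can be checked by brute force: the supremum defining the weighted norm is attained at an extreme point f with f_i = ±g_i. A small illustrative check:

```python
from itertools import product

nu = [0.4, -0.7, 0.25, -0.1]   # a signed measure on four atoms
g = [1.0, 2.0, 0.5, 3.0]       # the weight function g >= 0

total_variation_g = sum(gi * abs(vi) for gi, vi in zip(g, nu))   # |nu|(g)
weighted_norm = max(abs(sum(s * gi * vi for s, gi, vi in zip(signs, g, nu)))
                    for signs in product([-1.0, 1.0], repeat=len(nu)))

# Lemma 2.9: (1/2)|nu|(g) <= ||nu||_g <= |nu|(g)
assert 0.5 * total_variation_g <= weighted_norm <= total_variation_g + 1e-12
```

Here the upper bound is attained, since f may take either sign at each atom; the lower bound comes from testing with f = g·1_𝒴 and f = g·1_𝒩 separately, as in the proof.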

3. Applications

This section is devoted to understanding the stability of a broad class of multiplicative systems through applications of the previous theorems.

3.1. Discrete time switching systems

Let \mathbb{H} be a Hilbert space and \mathcal{E} a Polish space. Suppose there exists a sequence of measurable maps Pn:××[0,1]P_{n}:\mathbb{H}\times\mathcal{E}\times\mathcal{E}\rightarrow[0,1] such that for each xx\in\mathbb{H}, the function Pn(x,,)P_{n}(x,\cdot,\cdot) is a transition probability kernel. Consider a discrete-time n\mathcal{F}_{n}-adapted process {Zn}{(Xn,Yn)}\{Z_{n}\}\equiv\{(X_{n},Y_{n})\} taking values in ×\mathbb{H}\times\mathcal{E}, whose dynamics is defined by the following rule: given the state (Xn,Yn)=(xn,yn)(X_{n},Y_{n})=(x_{n},y_{n}),

  1. (SS-1)

    first, Yn+1Y_{n+1} is selected randomly according to the (possibly) time-inhomogeneous transition probability distribution Pn(xn,yn,)Pn,xn(yn,)P_{n}(x_{n},y_{n},\cdot)\equiv P_{n,x_{n}}(y_{n},\cdot),

  2. (SS-2)

    next given Yn+1=yn+1Y_{n+1}=y_{n+1},

    Xn+1=Hn(xn,yn+1,ξn+1),X_{n+1}=H_{n}(x_{n},y_{n+1},\xi_{n+1}),

    where {ξk:k=1,}\{\xi_{k}:k=1,\ldots\} is a sequence of independent random variables taking values in a Banach space 𝔹\mathbb{B}, ξn+1\xi_{n+1} is independent of σ{n,Yn+1}\sigma\{\mathcal{F}_{n},Y_{n+1}\} and Hn:××𝔹H_{n}:\mathbb{H}\times\mathcal{E}\times\mathbb{B}\rightarrow\mathbb{H}.

In general, {(Xn,Yn)}\{(X_{n},Y_{n})\} is a (possibly) time-inhomogeneous Markov process, but clearly neither {Xn}\{X_{n}\} nor {Yn}\{Y_{n}\} is Markovian on its own. The stochastic system {(Xn,Yn)}\{(X_{n},Y_{n})\} is known as a discrete-time switching system or a stochastic hybrid system (and sometimes also as an iterated function system with place-dependent probabilities [1]). Stochastic hybrid systems are extensively used to model practical phenomena where system parameters are subject to sudden changes. These systems have found widespread applications in various disciplines including synthesis of fractals, modeling of biological networks [12], target tracking [19], communication networks [10], and control theory [2, 3, 4], to name a few. There is a considerable literature addressing classical weak stability questions concerning the existence and uniqueness of invariant measures of iterated function systems; see, e.g., [20, 13, 25, 5, 11] and the references therein. Comprehensive sources studying various properties of these systems, including results on stability in both continuous and discrete time, can be found in [14, 28] (also see the references therein). In most of these works, YnY_{n} is assumed to be a stand-alone finite or countable state-space Markov chain.

We consider a broad class of coupled switching or hybrid systems whose dynamics is described by (SS-1) and (SS-2) with HnH_{n} of the form

Hn(x,y,z)=Ln(x,y)+Fn(x,y)+Gn(x,y,z),H_{n}(x,y,z)=L_{n}(x,y)+F_{n}(x,y)+G_{n}(x,y,z),

where Ln,Fn:×L_{n},F_{n}:\mathbb{H}\times\mathcal{E}\rightarrow\mathbb{H} and Gn:××𝔹G_{n}:\mathbb{H}\times\mathcal{E}\times\mathbb{B}\rightarrow\mathbb{H}. In other words, {Xn}\{X_{n}\} satisfies

(3.17) Xn+1=Ln(Xn,Yn+1)+Fn(Xn,Yn+1)+Gn(Xn,Yn+1,ξn+1)\displaystyle X_{n+1}=L_{n}(X_{n},Y_{n+1})+F_{n}(X_{n},Y_{n+1})+G_{n}(X_{n},Y_{n+1},\xi_{n+1})

where the ξn\xi_{n} are 𝔹\mathbb{B}-valued random variables. For example, (3.17) includes multiplicative systems of the form

Xn+1=Xn+Fn(Xn,Yn+1)+Gn0(Xn,Yn+1)ξn+1.\displaystyle X_{n+1}=X_{n}+F_{n}(X_{n},Y_{n+1})+G^{0}_{n}(X_{n},Y_{n+1})\xi_{n+1}.
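As a toy instance of (3.17) (purely illustrative; the maps, parameters, and switching rule below are hypothetical choices, with H = ℝ and E = {0, 1}), one can simulate the coupled dynamics and observe uniformly bounded second moments:

```python
import math
import random

def simulate_switching(m0=1.0, gamma=0.5, g0=0.4, n_steps=300, n_paths=400, seed=2):
    """Toy switching system: the mode Y flips with a state-dependent
    probability (SS-1), and the state updates as X_{n+1} = L + F + G
    (SS-2), with L(x, y) = x, F(x, y) = -m0*(1 + 0.5*y)*sign(x)*|x|^gamma,
    and G(x, y, z) = (1 + |x|)^g0 * z for standard normal z.  Returns the
    largest empirical mean-square of X seen over the run."""
    rng = random.Random(seed)
    xs = [5.0] * n_paths
    ys = [0] * n_paths
    worst = 0.0
    for _ in range(n_steps):
        for i in range(n_paths):
            x, y = xs[i], ys[i]
            # (SS-1): switch mode with state-dependent probability
            if rng.random() < 0.5 / (1.0 + abs(x)):
                y = 1 - y
            # (SS-2): X_{n+1} = L + F + G
            drift = -m0 * (1.0 + 0.5 * y) * math.copysign(abs(x) ** gamma, x)
            noise = (1.0 + abs(x)) ** g0 * rng.gauss(0.0, 1.0)
            xs[i], ys[i] = x + drift + noise, y
        worst = max(worst, sum(v * v for v in xs) / n_paths)
    return worst
```

Here the drift term satisfies ⟨F(x, y), x⟩ ⩽ -m0·|x|^(1+γ), mirroring the pointwise form of the drift condition discussed below, while the noise gain grows like (1 + |x|)^g0 with g0 < γ ∧ 1/2.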

We will make the following assumptions on the above system.

Assumption 3.1.
  1. (SS-7)

    For x>B\|x\|>B, and any yy\in\mathcal{E},

    Pn,xFn(x,),Ln(x,)(y)=Fn(x,y),Ln(x,y)Pn,x(y,dy)𝔪0x1+γ,P_{n,x}\langle F_{n}(x,\cdot),L_{n}(x,\cdot)\rangle(y)=\int\langle F_{n}(x,y^{\prime}),L_{n}(x,y^{\prime})\rangle P_{n,x}(y,dy^{\prime})\leqslant-\mathfrak{m}_{0}\|x\|^{1+\gamma},

    for some constant 𝔪0>0\mathfrak{m}_{0}>0 and exponent γ0\gamma\geqslant 0.

  2. (SS-8)

    The following growth conditions hold:

    • Ln(x,y)𝔪L,1(y)x+𝔪L,2(y)\|L_{n}(x,y)\|\leqslant\mathfrak{m}_{L,1}(y)\|x\|+\mathfrak{m}_{L,2}(y)and L¯n(x,y)𝔪L¯(y)(1+x)l1,\displaystyle\|\bar{L}_{n}(x,y)\|\leqslant\mathfrak{m}_{\bar{L}}(y)(1+\|x\|)^{l_{1}}, where
      L¯n(x,y)=Ln(x,y)Pn,xLn(x,)(y).\bar{L}_{n}(x,y)=L_{n}(x,y)-P_{n,x}L_{n}(x,\cdot)(y).

    • Fn(x,y)𝔪F(y)(1+x)f0,\displaystyle\|F_{n}(x,y)\|\leqslant\mathfrak{m}_{F}(y)(1+\|x\|)^{f_{0}}, F¯n(x,y)𝔪F¯(y)(1+x)f1\|\bar{F}_{n}(x,y)\|\leqslant\mathfrak{m}_{\bar{F}}(y)(1+\|x\|)^{f_{1}},
      Gn(x,y,z)𝔪G(y)(1+x)g0Ψ(z),\|G_{n}(x,y,z)\|\leqslant\mathfrak{m}_{G}(y)(1+\|x\|)^{g_{0}}\Psi(z), where Ψ:𝔹[0,)\Psi:\mathbb{B}\rightarrow[0,\infty) and F¯n(x,y)=Fn(x,y)Pn,xFn(x,)(y).\bar{F}_{n}(x,y)=F_{n}(x,y)-P_{n,x}F_{n}(x,\cdot)(y).

    • For any p>0p>0, the constants 𝔪¯F,p,𝔪¯F¯,p,𝔪¯G,p,𝔪¯L,1,p,𝔪¯L,2,p\bar{\mathfrak{m}}_{F,p},\bar{\mathfrak{m}}_{\bar{F},p},\bar{\mathfrak{m}}_{G,p},\bar{\mathfrak{m}}_{L,1,p},\bar{\mathfrak{m}}_{L,2,p} and 𝔪¯L¯,p\bar{\mathfrak{m}}_{\bar{L},p} are finite, and 𝔪¯L,1,21\bar{\mathfrak{m}}_{L,1,2}\leqslant 1, where the above constants are defined as

      (3.18) 𝔪¯χ,psupn,x,z𝔪χp(y)Pn,x(z,dy),χ=F,F¯,G,{L,1},{L,2},L¯.\displaystyle\bar{\mathfrak{m}}_{\chi,p}\doteq\sup_{n,x,z}\int\mathfrak{m}^{p}_{\chi}(y)P_{n,x}(z,dy),\quad\chi=F,\bar{F},G,\{L,1\},\{L,2\},\bar{L}.
  3. (SS-9)

    The exponents satisfy:

    • (a) f0<(1+γ)/2f_{0}<(1+\gamma)/2, or (b) f0=(1+γ)/2f_{0}=(1+\gamma)/2 and 𝔪¯F,22𝔪0\bar{\mathfrak{m}}_{F,2}\leqslant 2\mathfrak{m}_{0};

    • g0<γ1/2\ g_{0}<\gamma\wedge 1/2, and l1f1<1/2l_{1}\vee f_{1}<1/2.

  4. (SS-10)

    The ξn\xi_{n} are independent 𝔹\mathbb{B}-valued random variables with distribution νn\nu_{n}; for each nn, ξn+1\xi_{n+1} is independent of σ{n,Yn+1}\sigma\{\mathcal{F}_{n},Y_{n+1}\}, and for any p>0p>0, mp=supn𝔼(Ψ(ξn)p)<m_{*}^{p}=\sup_{n}\mathbb{E}(\Psi(\xi_{n})^{p})<\infty.

Proposition 3.2.

Under Assumption 3.1, supn𝔼x0Xnm<\sup_{n}\mathbb{E}_{x_{0}}\|X_{n}\|^{m}<\infty for any m>0m>0 and x0x_{0}\in\mathbb{H}. If the functions GnG_{n} are centered with respect to the variable zz in the sense that G^n(x,y)𝔹Gn(x,y,z)νn+1(dz)=0\displaystyle\hat{G}_{n}(x,y)\doteq\ \int_{\mathbb{B}}G_{n}(x,y,z)\nu_{n+1}(dz)=0 for all n1n\geqslant 1, xx\in\mathbb{H} and yy\in\mathcal{E}, then we only need g0<1/2g_{0}<1/2 instead of g0<γ1/2\ g_{0}<\gamma\wedge 1/2 in (SS-9) for the above assertion to be true.

Remark 3.3.

A few comments are in order.

  • Because of the growth assumption on GnG_{n} in (SS-8) and the condition (SS-10), for each n,xn,x and yy, the function zGn(x,y,z)z\rightarrow G_{n}(x,y,z) is Bochner integrable, and hence G^n(x,y)𝔹Gn(x,y,z)νn+1(dz)\hat{G}_{n}(x,y)\doteq\int_{\mathbb{B}}G_{n}(x,y,z)\nu_{n+1}(dz) is well defined (the integral is defined in Bochner sense).

  • One scenario where the functions GnG_{n} are centered (with respect to the variable zz) occurs when considering multiplicative stochastic systems driven by zero-mean random variables. Specifically, in such models the GnG_{n} are of the form Gn(x,y,z)=Gn0(x,y)zG_{n}(x,y,z)=G^{0}_{n}(x,y)z and the ξn\xi_{n} are mean-zero random variables. Also notice that for these models, Ψ(z)=z𝔹.\Psi(z)=\|z\|_{\mathbb{B}}.

  • Suppose that the GnG_{n} are not centered in the variable zz. If γ<1/2\gamma<1/2, (SS-9) requires that the growth exponent g0g_{0} of GnG_{n} satisfy g0<γg_{0}<\gamma. However, this could be extended to the boundary case g0=γg_{0}=\gamma (when γ<1/2\gamma<1/2) provided the averaged growth constants 𝔪¯χ,p\bar{\mathfrak{m}}_{\chi,p} (cf. (3.18)) meet certain conditions. If g0=γg_{0}=\gamma and f0<(1+γ)/2f_{0}<(1+\gamma)/2, then the assertion of Proposition 3.2 is true provided (𝔪¯G,2m2)1/2<𝔪0\left(\bar{\mathfrak{m}}_{G,2}m^{2}_{*}\right)^{1/2}<\mathfrak{m}_{0}. If g0=γg_{0}=\gamma and f0=(1+γ)/2f_{0}=(1+\gamma)/2, then the same assertion holds provided (𝔪¯G,2m2)1/2+𝔪¯F,2/2<𝔪0\left(\bar{\mathfrak{m}}_{G,2}m^{2}_{*}\right)^{1/2}+\bar{\mathfrak{m}}_{F,2}/2<\mathfrak{m}_{0}.

  • Condition (SS-7) is implied by the simpler condition:

    Fn(x,y),Ln(x,y)𝔪0x1+γ,x>B,y.\langle F_{n}(x,y),L_{n}(x,y)\rangle\leqslant-\mathfrak{m}_{0}\|x\|^{1+\gamma},\quad\|x\|>B,\ \forall y.

    Similarly, for many models a stronger (but easier to check) form of the condition (SS-8), where the ‘constants’ 𝔪χ\mathfrak{m}_{\chi} (for χ=F,F¯,G,{L,1},{L,2},L¯\chi=F,\bar{F},G,\{L,1\},\{L,2\},\bar{L}) do not depend on yy, suffices. In that case the corresponding averaged constants (given by (3.18)) are simply 𝔪¯χ,p=𝔪χp\bar{\mathfrak{m}}_{\chi,p}=\mathfrak{m}_{\chi}^{p}, and are therefore trivially finite.

  • One common example of LnL_{n} is Ln(x,y)Ln(x)=xL_{n}(x,y)\equiv L_{n}(x)=x or UnxU_{n}x for some unitary operator UnU_{n}. If Ln(x,y)Ln(x)L_{n}(x,y)\equiv L_{n}(x), then LnL_{n} is centered, that is, L¯n0\bar{L}_{n}\equiv 0, and the condition on the corresponding growth exponent l1l_{1} is trivially satisfied.

  • Clearly, f1f0f_{1}\leqslant f_{0}, where recall that f1f_{1} and f0f_{0} are the growth rates of F¯n(x,y)=Fn(x,y)PxFn(x,)(y)\bar{F}_{n}(x,y)=F_{n}(x,y)-P_{x}F_{n}(x,\cdot)(y) (centered FnF_{n}) and FnF_{n}, respectively. In some models, without any other information or suitable estimates on F¯\bar{F}, f1f_{1} may simply have to be taken equal to f0f_{0}, in which case condition (SS-9) implies that the above result on uniform moment bounds applies to systems with f0<1/2f_{0}<1/2 (and not (1+γ)/2(1+\gamma)/2). However, in other models the optimal growth rate f1f_{1} of F¯n\bar{F}_{n} can indeed be lower than that of FnF_{n}. For example, as we noted before for the function LnL_{n}, if Fn(x,y)Fn(x)F_{n}(x,y)\equiv F_{n}(x), then F¯n(x,y)0\bar{F}_{n}(x,y)\equiv 0 (in particular, f1=0f_{1}=0), and this along with Theorem 2.8 leads to Corollary 3.4 on Harris ergodicity of a large class of multiplicative Markovian systems.

Proof of Proposition 3.2.

Apart from the parameters in Assumption 3.1, other constants appearing in the various estimates below will be denoted by 𝔪i\mathfrak{m}_{i}; they do not depend on nn but may depend on the parameters of the system.

For the proof, we only consider the case (SS-9)-(a), where f0<(1+γ)/2f_{0}<(1+\gamma)/2; the proofs in the case of (SS-9)-(b) and of the second point in Remark 3.3 follow from (3.21) with minor modifications of the arguments. For each nn, define the functions G^n:×\hat{G}_{n}:\mathbb{H}\times\mathcal{E}\rightarrow\mathbb{H} and G~n,G¯n:××𝔹\tilde{G}_{n},\ \bar{G}_{n}:\mathbb{H}\times\mathcal{E}\times\mathbb{B}\rightarrow\mathbb{H} by

G^n(x,y)=𝔹Gn(x,y,z)νn+1(dz),G~n(x,y,z)=Gn(x,y,z)G^n(x,y), and\displaystyle\ \hat{G}_{n}(x,y)=\ \int_{\mathbb{B}}G_{n}(x,y,z)\nu_{n+1}(dz),\quad\tilde{G}_{n}(x,y,z)=G_{n}(x,y,z)-\hat{G}_{n}(x,y),\quad\text{ and}
\bar{G}_{n}(x,y,z)=\ G_{n}(x,y,z)-P_{n,x}\hat{G}_{n}(x,\cdot)(y)=G_{n}(x,y,z)-\mathbb{E}\left(G_{n}(X_{n},Y_{n+1},\xi_{n+1})\,\middle|\,(X_{n},Y_{n})=(x,y)\right)

(recall that νn\nu_{n} is the distribution measure of ξn\xi_{n}), and notice that by (SS-8) and (SS-10) for any p>0p>0,

\mathbb{E}\left[\|\hat{G}_{n}(X_{n},Y_{n+1})\|^{p}\,\middle|\,\mathcal{F}_{n}\right]=\int_{\mathcal{E}}\left\|\int_{\mathbb{B}}G_{n}(X_{n},y,z)\,\nu_{n+1}(dz)\right\|^{p}P_{n,X_{n}}(Y_{n},dy)
\displaystyle{}\leqslant 𝔹𝔪Gp(y)(1+Xn)pg0Ψ(z)pνn+1(dz)Pn,Xn(Yn,dy)\displaystyle\ \int_{\mathcal{E}}\int_{\mathbb{B}}\mathfrak{m}_{G}^{p}(y)(1+\|X_{n}\|)^{pg_{0}}\Psi(z)^{p}\nu_{n+1}(dz)P_{n,X_{n}}(Y_{n},dy)
(3.19) \displaystyle\leqslant 𝔪¯G^,p(1+Xn)pg0,\displaystyle\ \bar{\mathfrak{m}}_{\hat{G},p}(1+\|X_{n}\|)^{pg_{0}},

where 𝔪¯G^,p=𝔪¯G,pmp\bar{\mathfrak{m}}_{\hat{G},p}=\bar{\mathfrak{m}}_{G,p}m_{*}^{p} (recall mp=supk𝔼[Ψ(ξk)p]<m_{*}^{p}=\sup_{k}\mathbb{E}\left[\Psi(\xi_{k})^{p}\right]<\infty). It now easily follows that G¯n\bar{G}_{n} and G~n\tilde{G}_{n} satisfy the following growth conditions:

G~n(x,y,z)𝔪G~(y)(1+x)g0Ψ(z),andG¯n(x,y,z)𝔪G¯(y)(1+x)g0Ψ(z)\displaystyle\|\tilde{G}_{n}(x,y,z)\|\leqslant\mathfrak{m}_{\tilde{G}}(y)(1+\|x\|)^{g_{0}}\Psi(z),\quad\text{and}\quad\|\bar{G}_{n}(x,y,z)\|\leqslant\mathfrak{m}_{\bar{G}}(y)(1+\|x\|)^{g_{0}}\Psi(z)

for some functions 𝔪G¯(y)\mathfrak{m}_{\bar{G}}(y) and 𝔪G~(y)\mathfrak{m}_{\tilde{G}}(y) (depending on yy), where 𝔪¯χ,p<\bar{\mathfrak{m}}_{\chi,p}<\infty for χ=G~,G¯\chi=\tilde{G},\bar{G} (see (3.18) for definition of 𝔪¯χ,p\bar{\mathfrak{m}}_{\chi,p}). Consequently, for any p>0p>0

𝔼[G~n(Xn,Yn+1,ξn+1)p|n]\displaystyle\mathbb{E}\left[\|\tilde{G}_{n}(X_{n},Y_{n+1},\xi_{n+1})\|^{p}\big{|}\mathcal{F}_{n}\right]\leqslant 𝔪¯G~,pmp(1+Xn)pg0,\displaystyle\ \bar{\mathfrak{m}}_{\tilde{G},p}m_{*}^{p}(1+\|X_{n}\|)^{pg_{0}},
𝔼[G¯n(Xn,Yn+1,ξn+1)p|n]\displaystyle\mathbb{E}\left[\|\bar{G}_{n}(X_{n},Y_{n+1},\xi_{n+1})\|^{p}\big{|}\mathcal{F}_{n}\right]\leqslant 𝔪¯G¯,pmp(1+Xn)pg0.\displaystyle\ \bar{\mathfrak{m}}_{\bar{G},p}m_{*}^{p}(1+\|X_{n}\|)^{pg_{0}}.

Also,

(3.20) 𝔼[Ln(Xn,Yn+1)2|n]Xn2+2𝔪¯L,2,21/2Xn+𝔪¯L,2,2=(𝔪¯L,2,21/2+Xn)2𝔼[Fn(Xn,Yn+1)2|n]𝔪¯F,2(1+Xn)2f0.\displaystyle\begin{aligned} \mathbb{E}\left[\|L_{n}(X_{n},Y_{n+1})\|^{2}\big{|}\mathcal{F}_{n}\right]\leqslant&\ \|X_{n}\|^{2}+2\bar{\mathfrak{m}}_{L,2,2}^{1/2}\|X_{n}\|+\bar{\mathfrak{m}}_{L,2,2}=\left(\bar{\mathfrak{m}}_{L,2,2}^{1/2}+\|X_{n}\|\right)^{2}\\ \mathbb{E}\left[\|F_{n}(X_{n},Y_{n+1})\|^{2}\big{|}\mathcal{F}_{n}\right]\leqslant&\ \bar{\mathfrak{m}}_{F,2}(1+\|X_{n}\|)^{2f_{0}}.\end{aligned}

Now writing G(Xn,Yn+1,ξn+1)=G^n(Xn,Yn+1)+G~(Xn,Yn+1,ξn+1)G(X_{n},Y_{n+1},\xi_{n+1})=\hat{G}_{n}(X_{n},Y_{n+1})+\tilde{G}(X_{n},Y_{n+1},\xi_{n+1}), we have

Xn+12=\displaystyle\|X_{n+1}\|^{2}= Ln(Xn,Yn+1)2+Fn(Xn,Yn+1)2+G^n(Xn,Yn+1)2+G~(Xn,Yn+1,ξn+1)2\displaystyle\ \|L_{n}(X_{n},Y_{n+1})\|^{2}+\|F_{n}(X_{n},Y_{n+1})\|^{2}+\|\hat{G}_{n}(X_{n},Y_{n+1})\|^{2}+\|\tilde{G}(X_{n},Y_{n+1},\xi_{n+1})\|^{2}
+2Ln(Xn,Yn+1),Fn(Xn,Yn+1)+2(Ln+Fn+G^n)(Xn,Yn+1),G~(Xn,Yn+1,ξn+1)\displaystyle\ +2\langle L_{n}(X_{n},Y_{n+1}),F_{n}(X_{n},Y_{n+1})\rangle+2\langle(L_{n}+F_{n}+\hat{G}_{n})(X_{n},Y_{n+1}),\tilde{G}(X_{n},Y_{n+1},\xi_{n+1})\rangle
+2(Ln+Fn)(Xn,Yn+1),G^n(Xn,Yn+1).\displaystyle+2\langle(L_{n}+F_{n})(X_{n},Y_{n+1}),\hat{G}_{n}(X_{n},Y_{n+1})\rangle.

Denoting the term (Ln+Fn+G^n)(Xn,Yn+1),G~(Xn,Yn+1,ξn+1)\langle(L_{n}+F_{n}+\hat{G}_{n})(X_{n},Y_{n+1}),\tilde{G}(X_{n},Y_{n+1},\xi_{n+1})\rangle by Jn+1J_{n+1}, we have

𝔼[Jn+1|n]=\displaystyle\mathbb{E}\left[J_{n+1}|\mathcal{F}_{n}\right]= 𝔹(Ln+Fn+G^n)(Xn,y),G~(Xn,y,z)Pn,Xn(Yn,dy)νn+1(dz)\displaystyle\ \int_{\mathbb{B}}\int_{\mathcal{E}}\langle(L_{n}+F_{n}+\hat{G}_{n})(X_{n},y),\tilde{G}(X_{n},y,z)\rangle P_{n,X_{n}}(Y_{n},dy)\nu_{n+1}(dz)
=\displaystyle= (Ln+Fn+G^n)(Xn,y),𝔹G~(Xn,y,z)νn+1(dz)Pn,Xn(Yn,dy)=0.\displaystyle\ \int_{\mathcal{E}}\left\langle(L_{n}+F_{n}+\hat{G}_{n})(X_{n},y),\int_{\mathbb{B}}\tilde{G}(X_{n},y,z)\nu_{n+1}(dz)\right\rangle P_{n,X_{n}}(Y_{n},dy)=0.

Also, by the Cauchy-Schwarz inequality, (3.19) and (3.20),

𝔼[|Fn(Xn,Yn+1),G^n(Xn,Yn+1)||n]\displaystyle\mathbb{E}\left[|\langle F_{n}(X_{n},Y_{n+1}),\hat{G}_{n}(X_{n},Y_{n+1})\rangle|\big{|}\mathcal{F}_{n}\right]\leqslant (𝔼[Fn(Xn,Yn+1)2|n])1/2(𝔼[G^n(Xn,Yn+1)2|n])1/2\displaystyle\ \left(\mathbb{E}\left[\|F_{n}(X_{n},Y_{n+1})\|^{2}\big{|}\mathcal{F}_{n}\right]\right)^{1/2}\left(\mathbb{E}\left[\|\hat{G}_{n}(X_{n},Y_{n+1})\|^{2}\big{|}\mathcal{F}_{n}\right]\right)^{1/2}
\displaystyle\leqslant 𝔪¯F,21/2𝔪¯G^,21/2(1+Xn)f0+g0,\displaystyle\ \bar{\mathfrak{m}}_{F,2}^{1/2}\bar{\mathfrak{m}}_{\hat{G},2}^{1/2}(1+\|X_{n}\|)^{f_{0}+g_{0}},

and similarly,

𝔼[|Ln(Xn,Yn+1),G^n(Xn,Yn+1)||n]\displaystyle\mathbb{E}\left[|\langle L_{n}(X_{n},Y_{n+1}),\hat{G}_{n}(X_{n},Y_{n+1})\rangle|\big{|}\mathcal{F}_{n}\right]\leqslant 𝔪¯G^,21/2(𝔪¯L,2,21/21+Xn)1+g0.\displaystyle\ \bar{\mathfrak{m}}_{\hat{G},2}^{1/2}\left(\bar{\mathfrak{m}}_{L,2,2}^{1/2}\vee 1+\|X_{n}\|\right)^{1+g_{0}}.

Hence, on {Xn>B}\{\|X_{n}\|>B\}

\mathbb{E}\left[\|X_{n+1}\|^{2}|\mathcal{F}_{n}\right]\leqslant\ \|X_{n}\|^{2}+2\bar{\mathfrak{m}}_{L,2,2}^{1/2}\|X_{n}\|+\bar{\mathfrak{m}}_{L,2,2}+\bar{\mathfrak{m}}_{F,2}(1+\|X_{n}\|)^{2f_{0}}+(\bar{\mathfrak{m}}_{\hat{G},2}+\bar{\mathfrak{m}}_{\tilde{G},2}m_{*}^{2})(1+\|X_{n}\|)^{2g_{0}}
(3.21) -2\mathfrak{m}_{0}\|X_{n}\|^{1+\gamma}+2\bar{\mathfrak{m}}_{F,2}^{1/2}\bar{\mathfrak{m}}_{\hat{G},2}^{1/2}(1+\|X_{n}\|)^{f_{0}+g_{0}}+2\bar{\mathfrak{m}}_{\hat{G},2}^{1/2}\left(\bar{\mathfrak{m}}_{L,2,2}^{1/2}\vee 1+\|X_{n}\|\right)^{1+g_{0}}.

Since δ02(f0g0)(f0+g0)(1+g0)<1+γ\delta_{0}\doteq 2(f_{0}\vee g_{0})\vee(f_{0}+g_{0})\vee(1+g_{0})<1+\gamma by (SS-9), it follows from the above inequality that we can choose C>BC>B large enough so that on Xn>C\|X_{n}\|>C,

𝔼[Xn+12Xn2|n]\displaystyle\mathbb{E}\left[\|X_{n+1}\|^{2}-\|X_{n}\|^{2}\left|\right.\mathcal{F}_{n}\right]\leqslant 𝔪1(Xnδ0Xn1+γ)<0.\displaystyle\ \mathfrak{m}_{1}\left(\|X_{n}\|^{\delta_{0}}-\|X_{n}\|^{1+\gamma}\right)<0.

Also notice that choosing C>B1C>B\vee 1 we have for Xn>C\|X_{n}\|>C

\sqrt{\mathbb{E}\left[\|X_{n+1}\|^{2}|\mathcal{F}_{n}\right]}+\|X_{n}\|\leqslant\mathfrak{m}_{2}(1+\|X_{n}\|)^{1\vee\delta_{0}/2}\leqslant 2^{1\vee\delta_{0}/2}\mathfrak{m}_{2}\|X_{n}\|^{1\vee\delta_{0}/2}.

Therefore for Xn>C\|X_{n}\|>C,

\mathbb{E}\left[\|X_{n+1}\|\,|\,\mathcal{F}_{n}\right]-\|X_{n}\|\leqslant\sqrt{\mathbb{E}\left[\|X_{n+1}\|^{2}|\mathcal{F}_{n}\right]}-\|X_{n}\|=\frac{\mathbb{E}\left[\|X_{n+1}\|^{2}|\mathcal{F}_{n}\right]-\|X_{n}\|^{2}}{\sqrt{\mathbb{E}\left[\|X_{n+1}\|^{2}|\mathcal{F}_{n}\right]}+\|X_{n}\|}\leqslant\mathfrak{m}_{3}\left(\|X_{n}\|^{\delta_{0}-1\vee\delta_{0}/2}-\|X_{n}\|^{1+\gamma-1\vee\delta_{0}/2}\right).

Because of assumption (SS-9), notice that

\|x\|^{\delta_{0}-1\vee\delta_{0}/2}-\|x\|^{1+\gamma-1\vee\delta_{0}/2}\stackrel{\|x\|\rightarrow\infty}{\longrightarrow}\begin{cases}-\infty,&\quad\gamma>0,\\ -1,&\quad\gamma=0\ (\text{since then }\delta_{0}<1).\end{cases}

In either case, there exists a constant A>0A>0 and a sufficiently large CC such that

(3.22) 𝔼[Xn+1|n]Xn\displaystyle\mathbb{E}\left[\|X_{n+1}\||\mathcal{F}_{n}\right]-\|X_{n}\|\leqslant A, on Xn>C.\displaystyle-A,\quad\text{ on }\|X_{n}\|>C.

Next, notice that

|Xn+1𝔼[Xn+1|n]|\displaystyle\Big{|}\|X_{n+1}\|-\mathbb{E}\left[\|X_{n+1}\|\big{|}\mathcal{F}_{n}\right]\Big{|}\leqslant |Xn+1𝔼[Xn+1|n]|+|𝔼[Xn+1|n]𝔼[Xn+1|n]|\displaystyle\ \Big{|}\|X_{n+1}\|-\|\mathbb{E}[X_{n+1}\big{|}\mathcal{F}_{n}]\|\Big{|}+\Big{|}\|\mathbb{E}[X_{n+1}|\mathcal{F}_{n}]\|-\mathbb{E}\left[\|X_{n+1}\|\big{|}\mathcal{F}_{n}\right]\Big{|}
\displaystyle\leqslant Xn+1𝔼[Xn+1|n]+|𝔼[Xn+1𝔼[Xn+1|n]|n]|\displaystyle\|X_{n+1}-\mathbb{E}\left[X_{n+1}\big{|}\mathcal{F}_{n}\right]\|+\Big{|}\mathbb{E}\left[\|X_{n+1}\|-\|\mathbb{E}[X_{n+1}\big{|}\mathcal{F}_{n}]\|\big{|}\mathcal{F}_{n}\right]\Big{|}
\displaystyle\leqslant Xn+1𝔼[Xn+1|n]+𝔼[Xn+1𝔼[Xn+1|n]|n].\displaystyle\|X_{n+1}-\mathbb{E}[X_{n+1}|\mathcal{F}_{n}]\|+\mathbb{E}\left[\|X_{n+1}-\mathbb{E}[X_{n+1}|\mathcal{F}_{n}]\|\big{|}\mathcal{F}_{n}\right].

Hence,

(3.23) Ξn=\displaystyle\Xi_{n}= 𝔼[|Xn+1𝔼[Xn+1|n]|p|n] 2p𝔼[Xn+1𝔼[Xn+1|n]p|n]\displaystyle\ \mathbb{E}\left[\left|\|X_{n+1}\|-\mathbb{E}\left[\|X_{n+1}\|\big{|}\mathcal{F}_{n}\right]\right|^{p}\Big{|}\mathcal{F}_{n}\right]\leqslant\ 2^{p}\mathbb{E}\left[\|X_{n+1}-\mathbb{E}[X_{n+1}|\mathcal{F}_{n}]\|^{p}\Big{|}\mathcal{F}_{n}\right]
=\displaystyle= 2p𝔼[L¯(Xn,Yn+1)+F¯(Xn,Yn+1)+G¯(Xn,Yn+1,ξn+1)p|n]\displaystyle\ 2^{p}\mathbb{E}\left[\|\bar{L}(X_{n},Y_{n+1})+\bar{F}(X_{n},Y_{n+1})+\bar{G}(X_{n},Y_{n+1},\xi_{n+1})\|^{p}\Big{|}\mathcal{F}_{n}\right]
\displaystyle\leqslant 𝔪4(1+Xn)p(l1f1g0)ϕp(Xn),\displaystyle\ \mathfrak{m}_{4}(1+\|X_{n}\|)^{p(l_{1}\vee f_{1}\vee g_{0})}\equiv\phi_{p}(X_{n}),

where ϕp(x)𝔪4(1+x)p(l1f1g0)\phi_{p}(x)\doteq\mathfrak{m}_{4}(1+\|x\|)^{p(l_{1}\vee f_{1}\vee g_{0})}. Since l1f1g0<1/2l_{1}\vee f_{1}\vee g_{0}<1/2, for large enough pp, we have p(l1f1g0)<p/21p(l_{1}\vee f_{1}\vee g_{0})<p/2-1. It now follows from Theorem 2.2 (using V(x)=xV(x)=\|x\|) that for any r(0,ς(s=p(l1f1g0),p))r\in(0,\varsigma(s=p(l_{1}\vee f_{1}\vee g_{0}),p)), supn𝔼Xnr<\sup_{n}\mathbb{E}\|X_{n}\|^{r}<\infty. Since p>0p>0 is arbitrarily large, the assertion follows.

If the Gn(x,y,z)G_{n}(x,y,z) are centered, that is, if G^n0\hat{G}_{n}\equiv 0, then of course 𝔪¯G^,p\bar{\mathfrak{m}}_{\hat{G},p} can be taken to be 0 for all p>0p>0, and from (3.21), δ0=2(f0g0)\delta_{0}=2(f_{0}\vee g_{0}). Consequently, we do not need g0<γg_{0}<\gamma to have δ0<1+γ\delta_{0}<1+\gamma. ∎
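To see the moment bound of Proposition 3.2 in action, the following sketch (our own toy instance; all parameter choices are hypothetical and not from the paper) simulates a scalar recursion X_{n+1} = L(X_n) + F(X_n) + G(X_n) ξ_{n+1} with L(x) = x and F(x) = −sign(x) min(|x|, 1), so that ⟨F(x), L(x)⟩ ⩽ −|x| for |x| > 1 (a drift condition of the type (SS-7) with γ = 0 and 𝔪₀ = 1), and G(x) = 0.5 (1 + |x|)^{0.25}, whose growth exponent g₀ = 0.25 < 1/2. The noise is centered, so the weaker condition g₀ < 1/2 of Proposition 3.2 applies, and the empirical second moment stays uniformly bounded in n.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 2000, 500
X = np.zeros(n_paths)                 # X_0 = 0 for every path
second_moments = []
for _ in range(n_steps):
    xi = rng.standard_normal(n_paths)             # centered noise, all moments finite
    F = -np.sign(X) * np.minimum(np.abs(X), 1.0)  # negative drift outside the unit ball
    G = 0.5 * (1.0 + np.abs(X)) ** 0.25           # state-dependent amplitude, g_0 = 0.25
    X = X + F + G * xi                            # L(x) = x here
    second_moments.append(float(np.mean(X ** 2)))

max_second_moment = max(second_moments)           # stays bounded uniformly in n
```

With these (hypothetical) parameters the running second moment settles well below 1, consistent with the uniform bound asserted by the proposition.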

Corollary 3.4.

Consider the class of {n}\{\mathcal{F}_{n}\}-adapted Markov processes taking values in d\mathbb{R}^{d}, whose dynamics is defined by

(3.24) Xn+1=L(Xn)+F(Xn)+G(Xn)ξn+1,\displaystyle X_{n+1}=L(X_{n})+F(X_{n})+G(X_{n})\xi_{n+1},

where F,L:ddF,L:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}, G:d𝕄d×dG:\mathbb{R}^{d}\rightarrow\mathbb{M}^{d\times d^{\prime}} are continuous functions, and ddd\leqslant d^{\prime}. Assume that

  1. (M-1)

    FF, GG and LL satisfy the growth conditions (a) L(x)x\|L(x)\|\leqslant\|x\| for x>B\|x\|>B, (b)  F(x)𝔪F(1+x)γ0,\displaystyle\|F(x)\|\leqslant\mathfrak{m}_{F}(1+\|x\|)^{\gamma_{0}}, and (c) G(x)𝔪G(1+x)g0;\|G(x)\|\leqslant\mathfrak{m}_{G}(1+\|x\|)^{g_{0}};

  2. (M-2)

    for some constants 𝔪0,B>0\mathfrak{m}_{0},B>0 and an exponent γ0\gamma\geqslant 0,

    F(x),L(x)𝔪0x1+γ, for x>B;\langle F(x),L(x)\rangle\leqslant-\mathfrak{m}_{0}\|x\|^{1+\gamma},\quad\text{ for }\|x\|>B;
  3. (M-3)

    the exponents satisfy: (a) γ0<(1+γ)/2\gamma_{0}<(1+\gamma)/2, or  γ0=(1+γ)/2\gamma_{0}=(1+\gamma)/2 and 𝔪F𝔪0/2\mathfrak{m}_{F}\leqslant\mathfrak{m}_{0}/2; (b) g0<1/2g_{0}<1/2;

  4. (M-4)

    the ξn\xi_{n} are i.i.d. d\mathbb{R}^{d^{\prime}}-valued random variables with density ρ\rho with respect to the Lebesgue measure λleb\lambda_{\text{leb}}; ρ(z)>0\rho(z)>0 for all zdz\in\mathbb{R}^{d^{\prime}}, supzdρ(z)<\sup_{z\in\mathbb{R}^{d^{\prime}}}\rho(z)<\infty, and for each p>0p>0, mp=𝔼(ξ1p)<m_{*}^{p}=\mathbb{E}(\|\xi_{1}\|^{p})<\infty;

  5. (M-5)

    for some θ0\theta\geqslant 0 and ε0>0\varepsilon_{0}>0,

    uTG(x)G(x)Tuε0uTu/(1+x)θ,u,xd.u^{T}G(x)G(x)^{T}u\geqslant\varepsilon_{0}u^{T}u/(1+\|x\|)^{\theta},\quad\forall\ u,x\in\mathbb{R}^{d}.

If in addition 𝔼(ξ1)=0\mathbb{E}(\xi_{1})=0, then (a) {Xn}\{X_{n}\} is PHR and aperiodic with a unique invariant distribution π\pi, (b) supn𝔼x0(Xnr)𝔼πXnr<\sup_{n}\mathbb{E}_{x_{0}}(\|X_{n}\|^{r})\vee\mathbb{E}_{\pi}\|X_{n}\|^{r}<\infty, and (c) (2.11), or equivalently (2.12), holds with V(u)=uV(u)=\|u\|, for any x0x_{0} and r>0r>0. If 𝔼(ξ1)0\mathbb{E}(\xi_{1})\neq 0, then the same assertions are true provided g0<γ1/2g_{0}<\gamma\wedge 1/2.

Proof.

Since L,FL,F and GG are continuous, it follows by the dominated convergence theorem that {Xn}\{X_{n}\} is weak Feller. From assumption (M-5), it follows that G(x)GT(x)G(x)G^{T}(x) is positive definite (in particular, nonsingular), and det(G(x)GT(x))ε0d/(1+x)θd\det(G(x)G^{T}(x))\geqslant\varepsilon^{d}_{0}/(1+\|x\|)^{\theta d}. Note that 𝒫(x,)\mathcal{P}(x,\cdot) admits a density q(x,)q(x,\cdot). Specifically,

q(x,y)=\frac{1}{\sqrt{\det(G(x)G^{T}(x))}}\,\rho\left(G(x)^{-}_{R}\left(y-L(x)-F(x)\right)\right)\leqslant\sup_{z}\rho(z)\,(1+\|x\|)^{\theta d/2}/\varepsilon^{d/2}_{0},

where G(x)R=GT(x)(G(x)G(x)T)1G(x)^{-}_{R}=G^{T}(x)\left(G(x)G(x)^{T}\right)^{-1} is the Moore-Penrose pseudoinverse (in particular, right inverse) of G(x)G(x). Moreover, since ρ(z)>0\rho(z)>0 for all zz, for each xx we have q(x,y)>0q(x,y)>0 for λleb\lambda_{\text{leb}}-a.e. yy, and consequently XnX_{n} is λleb\lambda_{\text{leb}}-irreducible. This shows that condition (2.8-d) of Theorem 2.8 holds. The various assertions now follow from Theorem 2.8 and Proposition 3.2. ∎
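The density construction above is easy to sanity-check numerically. In the sketch below (our own construction; the particular G is hypothetical, chosen so that (M-5) holds with θ = 1 and ε₀ = 1), we verify that G(x)G(x)ᵀ is nonsingular and that G(x)ᵣ⁻ = Gᵀ(x)(G(x)G(x)ᵀ)⁻¹ is indeed a right inverse of G(x).

```python
import numpy as np

def right_inverse(G):
    # Moore-Penrose right inverse G_R^- = G^T (G G^T)^{-1} of a full-row-rank matrix
    return G.T @ np.linalg.inv(G @ G.T)

def G(x):
    # hypothetical G : R^2 -> M^{2x3}; here G(x)G(x)^T >= I/(1+||x||),
    # i.e. (M-5) holds with theta = 1 and eps_0 = 1
    base = np.array([[1.0, 0.0,  0.5],
                     [0.0, 1.0, -0.5]])
    return base / np.sqrt(1.0 + np.linalg.norm(x))

x = np.array([3.0, -4.0])
Gx = G(x)
GR = right_inverse(Gx)
residual = np.linalg.norm(Gx @ GR - np.eye(2))  # G(x) G(x)_R^- = I, so residual ~ 0
det_val = np.linalg.det(Gx @ Gx.T)              # strictly positive by (M-5)
```

Note that G need not be square: as in the corollary, d ⩽ d′ and full row rank of G(x) is what makes the right inverse, and hence the density formula, well defined.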

Remark 3.5.

The condition (M-5) is much weaker than the uniform ellipticity condition that is sometimes imposed on GGTGG^{T} for these kinds of models; the latter requires that for some ε0>0\varepsilon_{0}>0, uTG(x)G(x)Tuε0uTuu^{T}G(x)G(x)^{T}u\geqslant\varepsilon_{0}u^{T}u for all u,xd.u,x\in\mathbb{R}^{d}.

The above corollary also holds, with some possible minor modifications, for systems of the form (3.24) taking values in other locally compact spaces with ξn\xi_{n} admitting a density ρ\rho with respect to the Haar measure. In particular, for such systems taking values in a countable state space like d\mathbb{Z}^{d} or d\mathbb{Q}^{d}, notice that the transition probability mass function (density with respect to counting measure) q(x,y)q(x,y) naturally exists and q(x,y)1q(x,y)\leqslant 1, that is, the bound on qq in condition (2.8-d) of Theorem 2.8 is trivially satisfied. Hence condition (M-5) in Corollary 3.4 is not needed in this case. However, depending on the specific model, one might still require GG to have full row rank for establishing irreducibility of the chain.

As an important application, the above corollary can be used to establish ergodicity of numerical schemes of stochastic differential equations (SDEs).

Example 3.6.

Euler-Maruyama scheme for ergodic SDEs: Consider the SDE

X(t)=X(0)+0tF(X(s))𝑑s+0tG(X(s))𝑑W(s),\displaystyle X(t)=X(0)+\int_{0}^{t}F(X(s))ds+\int_{0}^{t}G(X(s))dW(s),

and suppose that XX is ergodic with invariant (equilibrium) distribution π\pi, which is typically unknown. Approximating this equilibrium distribution is an important computational problem in various areas including statistical physics, machine learning, and mathematical finance. Since numerically solving the corresponding (stationary) Kolmogorov PDE for π\pi is computationally expensive even when the dimension is as low as 33, one commonly resorts to discretization schemes like the Euler-Maruyama method:

XΔ(tn+1)=XΔ(tn)+F(XΔ(tn))Δ+Δ1/2G(XΔ(tn))ξn+1.\displaystyle X^{\Delta}(t_{n+1})=X^{\Delta}(t_{n})+F(X^{\Delta}(t_{n}))\Delta+\Delta^{1/2}G(X^{\Delta}(t_{n}))\xi_{n+1}.

Here the ξn\xi_{n} are i.i.d. N(0,I)N(0,I) random variables, and {tn}\{t_{n}\} is a partition of [0,)[0,\infty) with tn+1tn=Δt_{n+1}-t_{n}=\Delta, the step size of the discretization. However, the use of such discretization techniques in approximating π\pi is justified provided one can establish (a) ergodicity of the discretized chain {XΔ(tn)}\{X^{\Delta}(t_{n})\} with a unique invariant distribution πΔ\pi^{\Delta}, and (b) convergence of πΔ\pi^{\Delta} to π\pi as Δ0\Delta\rightarrow 0. This is a hard problem involving an infinite time horizon, and the usual error analyses of Euler-Maruyama schemes, which have of course been well studied in the literature, are not useful here, as they apply over finite time intervals. In comparison, much less is available on theoretical error analyses of these types of infinite-time-horizon approximation problems; some important results in this direction have been obtained by Talay [27, 26, 7]. A recent paper [6] (also see the references therein for more background on the problem) conducts a thorough large deviation error analysis of the problem in an appropriate scaling regime.

This short example does not attempt to address both points (a) and (b) of this problem, as that requires a separate paper-length treatment. Here we are only interested in point (a), the ergodicity of the discretized chain {XΔ(tn)}\{X^{\Delta}(t_{n})\}. It is well known that ergodicity of XX does not guarantee ergodicity of the discretized chain XΔX^{\Delta}; discretization can destroy the underlying Lyapunov structure of an ergodic SDE.

In [27, 26], among several other important results, Talay et al. showed in particular that the chain {XΔ(tn)}\{X^{\Delta}(t_{n})\} is ergodic with unique invariant measure πΔ\pi^{\Delta} and 𝔼(f(XΔ(tn)))πΔ(f)\mathbb{E}(f(X^{\Delta}(t_{n})))\rightarrow\pi^{\Delta}(f) as nn\rightarrow\infty for any fC(d,)f\in C^{\infty}(\mathbb{R}^{d},\mathbb{R}) such that ff and all its derivatives have polynomial growth, under the assumptions (i) F(x),x𝔪0x2\langle F(x),x\rangle\leqslant-\mathfrak{m}_{0}\|x\|^{2} for x>B\|x\|>B, (ii) FF and GG are CC^{\infty} with bounded derivatives of all orders, and (iii) GGTGG^{T} is uniformly elliptic and bounded. An application of Corollary 3.4 shows that this result can be significantly improved, with stronger convergence results under weaker hypotheses (cf. (M-1)-(M-5)). In particular, the uniform ellipticity and boundedness conditions on GGTGG^{T}, which are quite restrictive for many models, can be removed.
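As a minimal numerical illustration of point (a) (our own example, with hypothetical parameter choices), consider the Euler-Maruyama chain for the ergodic Ornstein-Uhlenbeck equation dX = −X dt + √2 dW, whose invariant law is N(0, 1). The discretized chain X^Δ(t_{n+1}) = (1 − Δ) X^Δ(t_n) + √(2Δ) ξ_{n+1} is itself ergodic, and its long-run empirical variance is close to 1 for small Δ.

```python
import numpy as np

rng = np.random.default_rng(1)
delta = 0.01                          # step size Delta (hypothetical choice)
n_burn, n_samp = 10_000, 200_000
x = 0.0
samples = np.empty(n_samp)
for i in range(n_burn + n_samp):
    # Euler-Maruyama step with F(x) = -x and G = sqrt(2)
    x = (1.0 - delta) * x + np.sqrt(2.0 * delta) * rng.standard_normal()
    if i >= n_burn:
        samples[i - n_burn] = x

empirical_var = float(samples.var())  # variance under pi^Delta; tends to 1 as Delta -> 0
```

For this linear chain the invariant variance can be computed in closed form, 2Δ/(1 − (1 − Δ)²) = 1/(1 − Δ/2), so the discretization bias is O(Δ); the simulation simply confirms this.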

3.2. Moment stability of linear stochastic control systems

Consider the system

(3.25) Xn+1=AXn+Bun+ξn+1.\displaystyle X_{n+1}=AX_{n}+Bu_{n}+\xi_{n+1}.

We are interested in the problem of finding conditions under which a linear stochastic system with possibly unbounded additive stochastic noise is globally stabilizable with bounded control inputs {un}\{u_{n}\}. Stabilization of stochastic linear systems with bounded control is a topic of significant interest in control engineering because of its importance in diverse fields; suboptimal control strategies such as receding-horizon control, and rollout algorithms, among others, can be easily constructed incorporating such constraints, and have become popular in applications. Here we simply refer to [24] and references therein for a detailed background on this topic.

Of course, boundedness of some moments of the noise component is necessary for attaining (moment) stability of the system. Specifically, we consider the following problem:

Problem: Suppose 𝕌{zm:zUmax}\mathbb{U}\doteq\left\{z\in\mathbb{R}^{m}:\|z\|\leqslant U_{\max}\right\}. We consider admissible kk-history-dependent control policies of the type π={πn}\pi=\{\pi_{n}\}, where πn:d×k𝕌\pi_{n}:\mathbb{R}^{d\times k}\rightarrow\mathbb{U}, that is, πn(y1,,yk)𝕌\pi_{n}(y_{1},\ldots,y_{k})\in\mathbb{U} for every y1,,ykdy_{1},\ldots,y_{k}\in\mathbb{R}^{d}. Given r1r\geqslant 1 and Umax>0U_{\max}>0, find an admissible policy π={πn}n\pi=\{\pi_{n}\}_{n\in\mathbb{N}} with control authority UmaxU_{\max} such that the system (3.25) with un=πn(Xnk+1,,Xn1,Xn)u_{n}=\pi_{n}(X_{n-k+1},\ldots,X_{n-1},X_{n}) is rr-th moment stable, that is, for every initial condition X0=x0X_{0}=x_{0}, supn𝔼x0Xnr<\sup_{n}\mathbb{E}_{x_{0}}\|X_{n}\|^{r}<\infty.

It is known that mean square boundedness holds for systems with bounded controls when AA is Schur stable, that is, when all eigenvalues of AA are contained in the open unit disk (the proof uses Foster-Lyapunov techniques from [15]). In the more general framework, under the assumption that the pair (A,B)(A,B) is only stabilizable (which in particular allows eigenvalues of AA to lie on the closed unit disk), [24] shows that there exists a kk-history-dependent control policy that ensures moment stability of (3.25), provided the control authority UmaxU_{\max} is chosen sufficiently large. It was conjectured in [24] that the lower bound on UmaxU_{\max} can be lifted with newer techniques, and here we demonstrate that this is indeed the case. The following result is an easy corollary of Proposition 3.2. For simplicity, we assume that AA is orthogonal and (A,B)(A,B) is reachable in kk steps. The steps from there to the more general case are similar to those in [24]. In case BB has full row rank, it will follow that kk can be taken to be 11, that is, the resulting policy is a stationary feedback.
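The k-step reachability assumption is straightforward to check numerically. The sketch below (our own helper function; the rotation example is hypothetical) builds the matrix R_k = [B AB … A^{k−1}B] and verifies rank(R_k) = d.

```python
import numpy as np

def reachability_matrix(A, B, k):
    # R_k = [B  AB  A^2 B ... A^{k-1} B], blocks stacked horizontally
    blocks, M = [], B.copy()
    for _ in range(k):
        blocks.append(M)
        M = A @ M
    return np.hstack(blocks)

# hypothetical example: a planar rotation (orthogonal A) with one input channel
theta = 0.3
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B = np.array([[1.0],
              [0.0]])
Rk = reachability_matrix(A, B, 2)
rank = int(np.linalg.matrix_rank(Rk))  # equals d = 2, so (A, B) is reachable in 2 steps
```

Here B does not have full row rank, so k = 1 does not suffice; the rotation spreads the single input direction over two steps, which is exactly the situation the k-history-dependent policy below is designed for.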

Proposition 3.7.

Consider the system defined by (3.25). Suppose that AA is orthogonal and the pair (A,B)(A,B) is reachable in kk steps (that is, rank(k)=d\operatorname{rank}(\mathcal{R}_{k})=d, where k=[BABA2BAk1B]\mathcal{R}_{k}=[B\ AB\ A^{2}B\ \ldots\ A^{k-1}B]). Then for any Umax>0U_{\max}>0, there exists a kk-history dependent policy π={πn}\pi=\{\pi_{n}\} such that given (Xnk+1,,Xn1,Xn)=(xnk+1,,xn1,xn)(X_{n-k+1},\ldots,X_{n-1},X_{n})=(x_{n-k+1},\ldots,x_{n-1},x_{n}), πn(xnk+1,,xn1,xn)fn mod k(xn/kk)\pi_{n}(x_{n-k+1},\ldots,x_{n-1},x_{n})\doteq f_{n\text{ \rm mod }k}(x_{\left\lfloor{n/k}\right\rfloor k}) for some functions f0,f1,,fk1:dmf_{0},f_{1},\ldots,f_{k-1}:\mathbb{R}^{d}\rightarrow\mathbb{R}^{m} where fi(x)Umax\|f_{i}(x)\|\leqslant U_{\max} for i=0,1,2,,k1i=0,1,2,\ldots,k-1, and for which supn𝔼x0Xnr<\sup_{n}\mathbb{E}_{x_{0}}\|X_{n}\|^{r}<\infty for any x0dx_{0}\in\mathbb{R}^{d}.

Proof.

Define X^n(k)=Xnk\hat{X}^{(k)}_{n}=X_{nk}, and notice that by iterating (3.25) we get

\hat{X}^{(k)}_{n+1}=A^{k}\hat{X}^{(k)}_{n}+\mathcal{R}_{k}\begin{pmatrix}u_{(n+1)k-1}\\ \vdots\\ u_{nk+1}\\ u_{nk}\end{pmatrix}+\sum_{j=1}^{k}A^{k-j}\xi_{nk+j}\equiv A^{k}\hat{X}^{(k)}_{n}+\mathcal{R}_{k}\hat{u}^{(k)}_{n}+\hat{\xi}^{(k)}_{n}.

Notice that 𝔼(ξ^n(k))=0\mathbb{E}(\hat{\xi}^{(k)}_{n})=0 and supn𝔼ξ^n(k)p𝒞^k,p\sup_{n}\mathbb{E}\|\hat{\xi}^{(k)}_{n}\|^{p}\leqslant\hat{\mathscr{C}}_{k,p} for some constant 𝒞^k,p>0\hat{\mathscr{C}}_{k,p}>0. Since k\mathcal{R}_{k} has full row rank, it has a right inverse k\mathcal{R}_{k}^{-}. Define

sat(y)={y,yB(0,U^max)U^maxy/y,otherwise\operatorname{sat}(y)=\begin{cases}y,&\quad y\in B(0,\hat{U}_{\max})\\ \hat{U}_{\max}\ y/\|y\|,&\quad\text{otherwise}\end{cases}

and choose u^n(k)=kAksat(X^n(k))\hat{u}^{(k)}_{n}=-\mathcal{R}_{k}^{-}A^{k}\operatorname{sat}(\hat{X}^{(k)}_{n}), where U^max\hat{U}_{\max} is such that kAkU^maxUmax.\|\mathcal{R}_{k}^{-}A^{k}\|\hat{U}_{\max}\leqslant U_{\max}. This yields the system

X^n+1(k)=AkX^n(k)Aksat(X^n(k))+ξ^n(k).\displaystyle\hat{X}^{(k)}_{n+1}=A^{k}\hat{X}^{(k)}_{n}-A^{k}\operatorname{sat}(\hat{X}^{(k)}_{n})+\hat{\xi}^{(k)}_{n}.

Since Akz,Aksat(z)=U^maxz\langle A^{k}z,-A^{k}\operatorname{sat}(z)\rangle=-\hat{U}_{\max}\|z\| for z>U^max\|z\|>\hat{U}_{\max} (recall that AA is orthogonal and sat(z)=U^maxz/z\operatorname{sat}(z)=\hat{U}_{\max}z/\|z\| there), we have from Proposition 3.2 that there exists a constant 𝔠0(k,r)\mathfrak{c}^{(k,r)}_{0} such that

\sup_{n}\mathbb{E}\|\hat{X}^{(k)}_{n}\|^{r}=\sup_{n}\mathbb{E}\|X_{nk}\|^{r}<\mathfrak{c}^{(k,r)}_{0}.

It is now immediate by a sequential argument that for any =1,,k1\ell=1,\ldots,k-1, 𝔼Xnk+r𝔠(k,r)\mathbb{E}\|X_{nk+\ell}\|^{r}\leqslant\mathfrak{c}^{(k,r)}_{\ell}, where 𝔠(k,r)=3r1(Ar𝔠1(k,r)+kAkrUmaxr+𝔪r)\mathfrak{c}^{(k,r)}_{\ell}=3^{r-1}\left(\|A\|^{r}\mathfrak{c}^{(k,r)}_{\ell-1}+\|\mathcal{R}_{k}^{-}A^{k}\|^{r}U^{r}_{\max}+\mathfrak{m}^{r}_{*}\right).

Notice that the original controls unu_{n} are given by

un=Ek(n mod k)TkAksat(Xn/kk),u_{n}=-E_{k-(n\text{ \rm mod }k)}^{T}\mathcal{R}_{k}^{-}A^{k}\operatorname{sat}(X_{\left\lfloor{n/k}\right\rfloor k}),

where the matrices Ej𝕄m×km,j=1,2,,kE_{j}\in\mathbb{M}_{m\times km},\ j=1,2,\ldots,k, are defined by

Ej=[𝟎m×m𝟎m×m𝑰m×mj-th block𝟎m×m𝟎m×m]\displaystyle E_{j}=\begin{bmatrix}\bm{0}_{m\times m}&\ldots&\bm{0}_{m\times m}&\underbrace{\bm{I}_{m\times m}}_{j\text{-th block}}&\bm{0}_{m\times m}&\ldots&\bm{0}_{m\times m}\end{bmatrix}

In particular, from the state at time nknk, the present and the next k1k-1 controls uj,j=nk,nk+1,,nk+k1u_{j},j=nk,nk+1,\ldots,nk+k-1 can be computed. ∎
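The construction in the proof can be simulated directly. The sketch below (our own instantiation, with hypothetical parameters) takes d = 2, an orthogonal rotation A, and B = I, so that k = 1 and R₁⁻ = I; the closed loop is then X_{n+1} = A(X_n − sat(X_n)) + ξ_{n+1}, and the empirical second moment remains uniformly bounded even with the modest control authority U_max = 1.

```python
import numpy as np

def sat(x, u_max):
    # identity inside B(0, u_max), radial projection onto the ball outside
    nrm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x * (u_max / np.maximum(nrm, u_max))

rng = np.random.default_rng(2)
theta, u_max = 0.5, 1.0
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # orthogonal state matrix
n_paths, n_steps = 1000, 400
X = np.zeros((n_paths, 2))
second_moments = []
for _ in range(n_steps):
    xi = rng.standard_normal((n_paths, 2))
    X = (X - sat(X, u_max)) @ A.T + xi            # X_{n+1} = A(X_n - sat(X_n)) + xi
    second_moments.append(float(np.mean(np.sum(X ** 2, axis=1))))

max_second_moment = max(second_moments)           # bounded despite ||u_n|| <= u_max
```

Without the saturated feedback the state would be a random walk (A is norm-preserving), so its second moment would grow linearly in n; the bounded control is what produces the uniform bound.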


References

  • Barnsley et al. [1988] M. F. Barnsley, S. G. Demko, J. H. Elton, and J. S. Geronimo. Invariant measures for Markov processes arising from iterated function systems with place-dependent probabilities. Annales de l’Institut Henri Poincaré. Probabilités et Statistique, 24(3):367–394, 1988. Erratum in ibid., 24 (1989), no. 4, 589–590.
  • Chatterjee and Pal [2011] D. Chatterjee and S. Pal. An excursion-theoretic approach to stability of discrete-time stochastic hybrid systems. Applied Mathematics & Optimization, 63(2):217–237, 2011. http://dx.doi.org/10.1007/s00245-010-9117-6.
  • Chatterjee et al. [2011] D. Chatterjee, E. Cinquemani, and J. Lygeros. Maximizing the probability of attaining a target prior to extinction. Nonlinear Analysis: Hybrid Systems, 5(2):367 – 381, 2011. Special Issue related to IFAC Conference on Analysis and Design of Hybrid Systems (ADHS’09) - IFAC ADHS’09, http://dx.doi.org/10.1016/j.nahs.2010.12.003.
  • Costa et al. [2005] O. L. V. Costa, M. D. Fragoso, and R. P. Marques. Discrete-Time Markov Jump Linear Systems. Probability and its Applications (New York). Springer-Verlag London, Ltd., London, 2005.
  • Diaconis and Freedman [1999] P. Diaconis and D. Freedman. Iterated random functions. SIAM Review, 41(1):45–76 (electronic), 1999.
  • Ganguly and Sundar [2021] Arnab Ganguly and P. Sundar. Inhomogeneous functionals and approximations of invariant distributions of ergodic diffusions: central limit theorem and moderate deviation asymptotics. Stochastic Process. Appl., 133:74–110, 2021. ISSN 0304-4149.
  • Graham and Talay [2013] Carl Graham and Denis Talay. Stochastic simulation and Monte Carlo methods, volume 68 of Stochastic Modelling and Applied Probability. Springer, Heidelberg, 2013. ISBN 978-3-642-39362-4; 978-3-642-39363-1. Mathematical foundations of stochastic simulation.
  • Hernández-Lerma and Lasserre [2001] Onésimo Hernández-Lerma and Jean B. Lasserre. Further criteria for positive Harris recurrence of Markov chains. Proc. Amer. Math. Soc., 129(5):1521–1524, 2001. ISSN 0002-9939.
  • Hernández-Lerma and Lasserre [2003] Onésimo Hernández-Lerma and Jean Bernard Lasserre. Markov chains and invariant probabilities, volume 211 of Progress in Mathematics. Birkhäuser Verlag, Basel, 2003. ISBN 3-7643-7000-9.
  • Hespanha [2005] J. P. Hespanha. A model for stochastic hybrid systems with application to communication networks. Nonlinear Anal., 62(8):1353–1383, 2005. ISSN 0362-546X.
  • Jarner and Tweedie [2001] S. F. Jarner and R. L. Tweedie. Locally contracting iterated functions and stability of Markov chains. Journal of Applied Probability, 38(2):494–507, 2001.
  • Lasota and Mackey [1994] A. Lasota and M. C. Mackey. Chaos, Fractals, and Noise, volume 97 of Applied Mathematical Sciences. Springer-Verlag, New York, 2 edition, 1994.
  • Lasota and Yorke [1994] A. Lasota and J. A. Yorke. Lower bound technique for Markov operators and iterated function systems. Random & Computational Dynamics, 2(1):41–77, 1994.
  • Mao and Yuan [2006] Xuerong Mao and Chenggui Yuan. Stochastic differential equations with Markovian switching. Imperial College Press, London, 2006. ISBN 1-86094-701-8.
  • Meyn and Tweedie [2009] S. P. Meyn and R. L. Tweedie. Markov Chains and Stochastic Stability. Cambridge University Press, London, 2 edition, 2009.
  • Meyn and Tweedie [1992] Sean P. Meyn and R. L. Tweedie. Stability of Markovian processes. I. Criteria for discrete-time chains. Adv. in Appl. Probab., 24(3):542–574, 1992. ISSN 0001-8678.
  • Meyn and Tweedie [1993a] Sean P. Meyn and R. L. Tweedie. Stability of Markovian processes. II. Continuous-time processes and sampled chains. Adv. in Appl. Probab., 25(3):487–517, 1993a. ISSN 0001-8678.
  • Meyn and Tweedie [1993b] Sean P. Meyn and R. L. Tweedie. Stability of Markovian processes. III. Foster-Lyapunov criteria for continuous-time processes. Adv. in Appl. Probab., 25(3):518–548, 1993b. ISSN 0001-8678.
  • Mariton [1990] Michel Mariton. Jump Linear Systems in Automatic Control. Marcel Dekker, New York, 1990.
  • Peigné [1993] M. Peigné. Iterated function systems and spectral decomposition of the associated Markov operator. In Fascicule de probabilités, volume 1993 of Publ. Inst. Rech. Math. Rennes, page 28. Univ. Rennes I, Rennes, 1993.
  • Pemantle and Rosenthal [1999] R. Pemantle and J. S. Rosenthal. Moment conditions for a sequence with negative drift to be uniformly bounded in L^r. Stochastic Processes and their Applications, 82(1):143–155, 1999.
  • Da Prato [2006] G. Da Prato. An Introduction to Infinite-Dimensional Analysis. Universitext. Springer-Verlag, Berlin, 2006. Revised and extended from the 2001 original by Da Prato.
  • Protter [2005] Philip E. Protter. Stochastic Integration and Differential Equations, volume 21 of Stochastic Modelling and Applied Probability. Springer-Verlag, Berlin, 2005. ISBN 3-540-00313-4. Second edition. Version 2.1, Corrected third printing.
  • Ramponi et al. [2010] F. Ramponi, D. Chatterjee, A. Milias-Argeitis, P. Hokayem, and J. Lygeros. Attaining mean square boundedness of a marginally stable stochastic linear system with a bounded control input. IEEE Transactions on Automatic Control, 55(10):2414–2418, 2010. http://arxiv.org/abs/0907.1436.
  • Szarek [2003] T. Szarek. Invariant measures for nonexpansive Markov operators on Polish spaces. Dissertationes Mathematicae (Rozprawy Matematyczne), 415, 2003. Dissertation, Polish Academy of Sciences, Warsaw, 2003.
  • Talay [1990] Denis Talay. Second-order discretization schemes of stochastic differential systems for the computation of the invariant law. Stochastics and Stochastic Reports, 29(1):13–36, 1990.
  • Talay and Tubaro [1990] Denis Talay and Luciano Tubaro. Expansion of the global error for numerical schemes solving stochastic differential equations. Stochastic Anal. Appl., 8(4):483–509 (1991), 1990. ISSN 0736-2994.
  • Yin and Zhu [2010] G. George Yin and Chao Zhu. Hybrid switching diffusions, volume 63 of Stochastic Modelling and Applied Probability. Springer, New York, 2010. ISBN 978-1-4419-1104-9. Properties and applications.