Explicit Mean-Square Error Bounds
for Monte-Carlo and Linear Stochastic Approximation

Shuhang Chen Department of Mathematics at the University of Florida, Gainesville. Adithya M. Devraj Department of ECE at the University of Florida, Gainesville. Ana Bušić Inria and DI ENS, École Normale Supérieure, CNRS, PSL Research University, Paris, France.
Financial support from ARO grant W911NF1810334 is gratefully acknowledged. Additional support from EPCN 1609131 & CPS 1646229, and French National Research Agency grant ANR-16-CE05-0008. Sean Meyn²²footnotemark: 2

Abstract

This paper concerns error bounds for recursive equations subject to Markovian disturbances. Motivating examples abound within the fields of Markov chain Monte Carlo (MCMC) and Reinforcement Learning (RL), and many of these algorithms can be interpreted as special cases of stochastic approximation (SA). It is argued that it is not possible in general to obtain a Hoeffding bound on the error sequence, even when the underlying Markov chain is reversible and geometrically ergodic, such as the M/M/1 queue. This is motivation for the focus on mean square error bounds for parameter estimates. It is shown that mean square error achieves the optimal rate of $O(1/n)$ , subject to conditions on the step-size sequence. Moreover, the exact constants in the rate are obtained, which is of great value in algorithm design.

Keywords: Stochastic Approximation, Markov chain Monte Carlo, Reinforcement learning

1 Introduction

Many questions in statistics and the area of reinforcement learning are concerned with computation of the root of a function in the form of an expectation: $\bar{f}(\theta)={\sf E}[f(\theta,\Phi))]$ , where $\Phi$ is a vector valued random variable, and $\theta\in\mathbb{R}^{d}$ . The value $\theta^{*}$ satisfying $\bar{f}(\theta^{*})=0$ is most commonly approximated through some version of the stochastic approximation (SA) algorithm of Robbins and Monro [32, 5]. In its basic form, this is the recursive algorithm

\theta_{n+1}=\theta_{n}+\alpha_{n+1}f(\theta_{n},\Phi_{n+1})

(1)

in which $\{\alpha_{n}\}$ is a non-negative gain sequence, and $\{\Phi_{n}\}$ is a sequence of random variables whose distribution converges to that of $\Phi$ as $n\to\infty$ . The sequence is a Markov chain in the applications of interest in this paper.

There is a large body of work on conditions for convergence of this recursion, and also a Central Limit Theorem (CLT): with $\widetilde{\theta}_{n}=\theta_{n}-\theta^{*}$ ,

	$\displaystyle\lim_{n\to\infty}\widetilde{\theta}_{n}$	$\displaystyle=0$	almost surely
	$\displaystyle\lim_{n\to\infty}\sqrt{n}\widetilde{\theta}_{n}$	$\displaystyle=N(0,\Sigma_{\theta})\qquad$	in distribution

The $d\times d$ matrix $\Sigma_{\theta}$ is known as the asymptotic covariance. The CLT requires substantially stronger assumptions on the gain sequence, the function $f$ , and the statistics of the “noise” sequence $\{\Phi_{n}\}$ [2, 25].

Soon after the stochastic approximation algorithm was first introduced in [32, 4], Chung [9] identified the optimal CLT covariance and techniques to obtain the optimum for scalar recursions. This can be cast as a form of stochastic Newton-Raphson (SNR) [14, 15, 12, 11]. Gradient free methods [or stochastic quasi Newton-Raphson (SQNR)] appeared in later work: The first example was proposed by Venter in [39], which was shown to obtain the optimal variance for a one-dimensional SA recursion. The algorithm obtains estimates of the SNR gain $-A^{-1}$ (see (2) below), through a procedure similar to the Kiefer-Wolfowitz algorithm [23]. Ruppert proposed an extension of Venter’s algorithm for vector-valued functions [33].

The averaging technique of Ruppert and Polyak [34, 30, 31] is a two-time-scale algorithm that is also designed to achieve the optimal asymptotic covariance. More recently, a two-time-scale variant of the SNR algorithm known as “Zap-SNR” was proposed in [14, 15, 12, 11], with applications to reinforcement learning. Zap algorithms are stable and convergent under mild assumptions [14, 7].

Under the typical assumptions under which the CLT holds for the recursion (1), the asymptotic covariance has an explicit form in terms of a linearization [24, Chapter 10, Theorem 3.3]. Assume that the solution to $\bar{f}(\theta^{*})=0$ is unique, and denote

A=\partial\bar{f}\,(\theta^{*})\,,\qquad\Delta_{n}=f(\theta^{*},\Phi_{n})

(2)

The error dynamics of the SA recursion are then approximated by the linear SA recursion:

\widetilde{\theta}_{n+1}=\widetilde{\theta}_{n}+\alpha_{n+1}[A\widetilde{\theta}_{n}+\Delta_{n+1}]\,.

(3)

Subject to the assumption that ${\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A$ is Hurwitz (i.e., $\text{Real}(\lambda)<-{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}$ for each eigenvalue of $A$ ), the $d\times d$ matrix $\Sigma_{\theta}$ is the unique positive semi-definite solution to the Lyapunov equation

[{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A]\Sigma+\Sigma[{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A]^{\intercal}+\Sigma_{\Delta}=0

(4)

in which $\Sigma_{\Delta}$ is also an asymptotic covariance: the covariance matrix appearing in the CLT for the sequence $\{\Delta_{n}\}$ (which may be expressed in terms of a solution to a Poisson equation - see [24, Theorem 2.2, Chapter 10]).

The goal of this paper is to demonstrate that the CLT is far less asymptotic than it may appear. For this we focus analysis on the linearization (3), along with first steps towards analysis of the nonlinear recursions. Subject to assumptions on $A$ and the Markov chain, we establish the bound

\text{\rm Cov}\,(\theta_{n})=n^{-1}\Sigma_{\theta}+O(n^{-1-\delta})

(5)

with $\text{\rm Cov}\,(\theta_{n})={\sf E}[\widetilde{\theta}_{n}\widetilde{\theta}_{n}^{\intercal}]$ . Under further assumptions, the bound is refined to obtain $\delta\geq 1$ , and even finer bounds:

\text{\rm Cov}\,(\theta_{n})=n^{-1}\Sigma_{\theta}+n^{-2}\Sigma_{\theta,2}+O(n^{-2-\delta})

(6)

where again $\delta>0$ and formula of $\Sigma_{\theta,2}$ is obtained in the paper based on a second Lyapunov equation and a solution to a second Poisson equation.

It is hoped that these results will be helpful in construction and performance analysis of many algorithms found in machine learning, statistics and reinforcement learning. Identification of the coefficient for the $n^{-2}$ term from (6) may lead to criteria for gain design when one aims to minimize the covariance with a fixed budget on the number of iterations.

The reader may ask, why not search directly for finite- $n$ bounds of the flavor of Hoeffding’s inequality:

{\sf P}\{\|\widetilde{\theta}_{n}\|\geq\varepsilon\}\leq b_{0}\exp(-nI_{0}(\varepsilon))\,

(7)

where $b_{0}>0$ is fixed, and $I_{0}$ is a convex function that is strictly positive and finite in a region $0<\varepsilon^{2}\leq\bar{\varepsilon}^{2}$ . The answer is that such bounds are not always possible even for the simplest SA recursions, even when the Markov chain is geometrically ergodic. This is clarified in the first general example:

1.1 Markov Chain Monte Carlo

As a prototypical example of stochastic approximation, Markov chain Monte Carlo (MCMC) proceeds by constructing an ergodic Markov chain $\Phi$ with invariant measure $\pi$ so as to estimate $\pi(F)=\int F(z)\,\pi(dz)$ for some function $F:{\sf Z}\rightarrow\mathbb{R}^{d}$ . One then simulates $\Phi_{1},\Phi_{2},\dots,\Phi_{n+1}$ to obtain the estimates

\theta_{n+1}=\frac{1}{n+1}\sum_{k=1}^{n+1}F(\Phi_{k})

(8)

This is an instance of the SA recursion (1):

\theta_{n+1}=\theta_{n}+\frac{1}{n+1}(-\theta_{n}+F(\Phi_{n+1}))

(9)

Subtracting $\theta^{*}=\pi(F)$ from both sides of (9) gives, with $\widetilde{\theta}_{n}=\theta_{n}-\pi(F)$ ,

\widetilde{\theta}_{n+1}=\widetilde{\theta}_{n}+\frac{1}{n+1}(-\widetilde{\theta}_{n}+F(\Phi_{n+1})-\pi(F))

which is (3) in a special case: $A=-I$ , $\Delta_{n+1}=F(\Phi_{n+1})-\pi(F)$ and $\alpha_{n}=1/n$ .

A significant part of the literature on MCMC focuses on finding Markov chains whose marginals approach the invariant measure $\pi$ quickly. Error estimates for MCMC have only been studied under rather restrictive settings. For instance, under the assumption of uniform ergodicity of $\Phi$ and uniform boundedness of $F$ (which rarely hold in practice outside of a finite state space), a generalized Hoeffding’s inequality was obtained in [19] to obtain the PAC-style error bound (7). We can not expect Hoeffding’s bound if either of these assumptions is relaxed. Consider the simplest countable state space Markov chain: the M/M/1 queue with uniformization, defined with ${\sf Z}=\{0,1,2,\dots\}$ and

\Phi_{n+1}=\begin{cases}\Phi_{n}+1&\text{prob.\ $\alpha$}\\ \max(\Phi_{n}-1,0)&\text{prob.\ $\mu=1-\alpha$}\end{cases}

This is a reversible, geometrically ergodic Markov chain when $\rho=\alpha/\mu<1$ , with geometric invariant distribution $\pi(z)=(1-\rho)\rho^{z}$ , $z\geq 0$ . It is shown in [28] that the error bound (7) fails for most unbounded functions $F$ . The question is looked at in greater depth in [17, 16], where asymptotic bounds are obtained for the special case $F(z)\equiv z$ . An asymptotic version of (7) is obtained for the lower tail:

\displaystyle\lim_{n\to\infty}\frac{1}{n}\log\Bigl{(}{\sf P}\{\widetilde{\theta}_{n}\leq-\varepsilon\}\Bigr{)}

\displaystyle=-I_{0}(\varepsilon)

(10)

in which the right hand side is strictly negative and finite valued for positive $\varepsilon$ in a neighborhood of zero. An entirely different scaling is required for the upper tail:

\displaystyle\lim_{n\to\infty}\frac{1}{n}\log\Bigl{(}{\sf P}\Bigl{\{}\frac{\widetilde{\theta}_{n}}{n}\geq\varepsilon\Bigr{\}}\Bigr{)}

\displaystyle=-J_{0}(\varepsilon)

(11)

where again the right hand side is strictly negative and finite valued for $\varepsilon>0$ sufficiently small. It follows from (11) that the PAC-style bound (7) is not attainable.

1.2 Reinforcement Learning

The theory of this paper also applies to TD-learning. In this case, the Markov chain $\Phi$ contains as one component a state process for a system to be controlled.

Consider a Markov chain $X$ evolving on a (Polish) state space ${\sf X}$ . Given a cost function $c:{\sf X}\to\mathbb{R}$ , and a discount factor $\beta\in(0,1)$ , the goal in TD-learning is to approximate the solution $h:{\sf X}\to\mathbb{R}$ to the Bellman equation:

h(x)=c(x)+\beta{\sf E}[h(X_{n+1})\mid X_{n}=x]

(12)

This functional equation can be recast:


$\displaystyle{\sf E}[$	$\displaystyle\mathcal{D}(h,\Phi_{n+1})\mid\Phi_{0}\,\dots\,\Phi_{n}]=0$	(13a)
where	$\displaystyle\mathcal{D}(h,\Phi_{n+1})\mathbin{:=}c(X_{n})+\beta h(X_{n+1})-h(X_{n})\,,\quad\Phi_{n+1}\mathbin{:=}(X_{n+1},X_{n})\,,\quad n\geq 0$	(13b)

Equation (13a) may be regarded as motivation for the TD-learning algorithms of [36, 38].

Consider a linearly parameterized family of candidate approximations $\{h^{\theta}(x)=\theta^{\intercal}\psi(x):\theta\in\mathbb{R}^{d}\}$ , where $\psi:{\sf X}\to\mathbb{R}^{d}$ denotes the $d$ basis functions. The goal in TD-learning is to solve the Galerkin relaxation of (13a,13b):

{\sf E}[\mathcal{D}(h^{\theta^{*}},\Phi_{n+1})\zeta_{n}]=0

(14)

where $\{\zeta_{n}\}$ is a $d$ -dimensional stochastic process, adapted to $\Phi$ , and the expectation is with respect to the steady state distribution. In particular, $\zeta_{n}\equiv\psi(X_{n})$ in TD( $0$ ) learning, so that the goal in this case is to find $\theta^{*}\in\mathbb{R}^{d}$ such that:

	$\displaystyle{\sf E}$	$\displaystyle\big{[}\mathcal{D}(h^{\theta^{*}},\Phi_{n+1})\psi(X_{n})\big{]}=0$		(15)
		$\displaystyle\mathcal{D}(h^{\theta^{}},\Phi_{n+1})=c(X_{n})+\beta h^{\theta^{}}(X_{n+1})-h^{\theta^{*}}(X_{n})$		(15)

The TD( $0$ ) algorithm is the SA recursion (1) applied to solve (15):

	$\displaystyle\theta_{n+1}$	$\displaystyle=\theta_{n}+\alpha_{n+1}d_{n+1}\psi(X_{n})$		(16)
	$\displaystyle d_{n+1}$	$\displaystyle=c(X_{n})+\beta h^{\theta_{n}}(X_{n+1})-h^{\theta_{n}}(X_{n})$		(16)

Denoting

	$\displaystyle A_{n+1}$	$\displaystyle\mathbin{:=}\psi(X_{n})\big{(}\beta\psi(X_{n+1})-\psi(X_{n})\big{)}^{\intercal}$
	$\displaystyle b_{n+1}$	$\displaystyle\mathbin{:=}-c(X_{n})\psi(X_{n})$

the algorithm (16) can be rewritten as:

\theta_{n+1}=\theta_{n}+\alpha_{n+1}\big{(}A_{n+1}\theta_{n}-b_{n+1}\big{)}

(17)

Note that $\theta^{*}$ from (14) solves the linear equation ${\sf E}[A_{n+1}]\theta^{*}={\sf E}[b_{n+1}]$ . Subtracting $\theta^{*}$ from both sides of (17) gives, with $\widetilde{\theta}_{n}=\theta_{n}-\theta^{*}$ ,

\widetilde{\theta}_{n+1}=\widetilde{\theta}_{n}+\alpha_{n+1}[A\widetilde{\theta}_{n}+(A_{n+1}-A)\widetilde{\theta}_{n}+A_{n+1}\theta^{*}-b_{n+1}]

(18)

Under mild conditions, we show through coupling that iteration (18) can be closely approximated by the linear SA recursion (3) with matrix $A={\sf E}[A_{n+1}]$ and noise sequence $\Delta_{n+1}=A_{n+1}\theta^{*}-b_{n+1}$ . In particular, the two recursions have the same asymptotic covariance if the matrix ${\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A$ is Hurwitz (see Section 2.3).

Under general assumptions on the Markov chain $X$ , and the basis functions $\psi$ , it is known that matrix $A=\partial{\sf E}[\mathcal{D}(h^{\theta^{*}},\Phi_{n+1})]={\sf E}[A_{n+1}]$ is Hurwitz, and that the sequence of estimates $\{\theta_{n}\}$ converges to $\theta^{*}$ [38]. However, when the discount factor $\beta$ is close to $1$ , it can be shown that $\lambda_{\max}>-{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}$ (where $\lambda_{\max}$ denotes the largest eigenvalue of $A$ ), and is in fact close to $0$ under mild additional assumptions [14, 11, 13]. It follows that the algorithm has infinite asymptotic covariance: full details and finer results can be deduced from Theorems 2.4 and 2.7.

The SNR algorithm is defined as follows:

$\displaystyle\theta_{n+1}$	$\displaystyle=\theta_{n}-\alpha_{n+1}\widehat{A}_{n+1}^{-1}d_{n+1}\psi(X_{n})$	(19)
$\displaystyle d_{n+1}$	$\displaystyle=c(X_{n})+\beta h^{\theta_{n}}(X_{n+1})-h^{\theta_{n}}(X_{n})$
$\displaystyle\widehat{A}_{n+1}$	$\displaystyle=\widehat{A}_{n}+\alpha_{n+1}\big{[}A_{n+1}-\widehat{A}_{n}\big{]}$	(20)

Under the assumption that the sequence of matrices $\{\widehat{A}_{n}:n\geq 0\}$ is invertible for each $n$ , it is shown in [14, 11] that the sequence of estimates obtained using (19,20) are identical to the parameter estimates obtained using the LSTD( $0$ ) algorithm: $\theta_{n+1}=\widehat{A}_{n+1}^{-1}\widehat{b}_{n+1}$ , with

	$\displaystyle\widehat{A}_{n+1}$	$\displaystyle=\widehat{A}_{n}+\alpha_{n+1}\big{[}A_{n+1}-\widehat{A}_{n}\big{]}$
	$\displaystyle\widehat{b}_{n+1}$	$\displaystyle=\widehat{b}_{n}+\alpha_{n+1}\big{[}b_{n+1}-\widehat{b}_{n}\big{]}$

Consequently, the LSTD( $0$ ) algorithm achieves the optimal asymptotic covariance.

Q-learning and many other RL algorithms can also be cast as SA recursions. They are no longer linear, but it is anticipated that bounds can be obtain in future research through linearization [18].

1.3 Literature Survey

Finite time performance bounds for linear stochastic approximation were obtained in many prior papers, subject to the assumption that the noise sequence $\{\Delta_{n}\}$ appearing in (3) is a martingale difference sequence [10, 26]. This assumption is rarely satisfied in the applications of interest to the authors.

Much of the literature on finite time bounds for linear SA recursions with Markovian noise has been recent. For constant step-size algorithms with step-size $\alpha$ , it follows from analysis in [6] that the pair process $(\theta_{n},\Phi_{n})$ is a geometrically ergodic Markov chain, and the covariance of $\theta_{n}$ is $O(\alpha)$ in steady state. Finite time bounds of order $O(\alpha)$ were obtained in [37, 3, 35, 21]. Unfortunately, these bounds are not tight, and hence their value for algorithm design is limited.

Mean-square error bounds have also been obtained for diminishing step-size algorithms, to establish the optimal rate of convergence ${\sf E}[\|\widetilde{\theta}_{n}\|^{2}]\leq b_{\theta}/n$ [35, 3, 8]. The constant $b_{\theta}$ is a function of the mixing time of the underlying Markov chain. These results require strong assumptions (uniform ergodicity of the Markov chain), and do not obtain the optimal constant $b^{*}_{\theta}=\hbox{\rm trace\,}(\Sigma_{\theta})$ . Rather than parameter estimation error, finite time bounds are obtained in [22] for ${\sf E}[\|\bar{f}(\theta_{n})\|^{2}]$ , which may be regarded as a far more relevant performance criterion. Bounds are obtained for Markovian models, subject to the existence of a Lyapunov function similar to what is assumed in the present work. It is again not clear if the resulting bounds are tight, or have value in algorithm design.

1.4 Contributions

The main contribution of this paper is a general framework for analyzing the finite time performance of linear stochastic approximation algorithms with Markovian noise, and vanishing step-size (required to achieve the optimal rate of convergence of Chung-Ruppert-Polyak). The M/M/1 queue example illustrates plainly that Markovian noise introduces challenges not seen in the “white noise” setting, and that the finite- $n$ error bound (7) cannot be obtained without substantial restrictions. Even under the assumptions of [19] (uniform ergodicity, and bounded noise), the resulting bounds are extremely loose and hence may give little insight for algorithm design. Our approach allows us to obtain explicit bounds, without the uniform boundedness assumption of noise that is frequently imposed in the literature [3, 35, 21, 8]. Instead, it is assumed that the Markovian noise is $V$ -uniform ergodic; an assumption that is far weaker than geometric or uniform mixing.

Our starting point is the classical martingale approximation of the noise used in CLT analysis of Markov chains [29, Chapter 17] and used in the analysis of SA recursions since Metivier and Priouret [27]. Under mild assumptions on the Markov chain, each $\Delta_{n}$ can be expressed as the sum of a martinagle difference and a telescoping term. The solution of the linear recursion (3) is decomposed as a sum of the respective responses:

\widetilde{\theta}_{n}=\widetilde{\theta}_{n}^{\mathcal{M}}+\widetilde{\theta}_{n}^{\mathcal{T}}

(21)

The challenge is to obtain explicit bounds on the mean square error for each term.

We say that a deterministic vector-valued sequence $\{e_{n}\}$ converges to zero at rate $1/n^{\varrho_{0}}$ if

\lim_{n\to\infty}n^{\varrho}\|e_{n}\|=\begin{cases}0,&\text{if }\;\varrho<\varrho_{0}\\ \infty,&\text{if }\;\varrho>\varrho_{0}\end{cases}

Bounds for the mean-square error are obtained in Thm. 2.4, subject to conditions on both the matrix $A$ and the noise sequence. In summary, under general assumptions on $\{\Delta_{n}\}$ ,

(i)

The bound (5) holds if ${\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A$ is Hurwitz.
(ii)

If $I+A$ is Hurwitz, then the finer bound (6) holds.

(iii)

If there is an eigenvalue of $A$ satisfying $\text{Real}(\lambda)>-{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}$ , and corresponding left-eigenvalue $v$ that lies outside of the null-space of the asymptotic covariance of the noise sequence, then

\lim_{n\to\infty}n^{2\rho}{\sf E}[|v^{\intercal}\widetilde{\theta}_{n}|^{2}]=\begin{cases}0,&\quad\varrho<\varrho_{0}\\ \infty,&\quad\varrho>\varrho_{0}\end{cases}

(22)

with $\rho_{0}=|\text{Real}(\lambda)|$ . The convergence of ${\sf E}[\|\widetilde{\theta}_{n}\|^{2}]$ to zero is thus no faster than $n^{-2\rho_{0}}$ .

2 Mean Square Convergence

2.1 Notation and Background

Consider the linear SA recursion (3), with the noise sequence $\{\Delta_{n}\}$ defined in (2). We use the following notation to explicitly represent the noise as a function of $\Phi_{n}$ :

f^{*}(\Phi_{n})\mathbin{:=}\Delta_{n}=f(\theta^{*}\,,\Phi_{n})

(23)

A form of geometric ergodicity is assumed throughout. To apply standard theory, it is assumed that the state space ${\sf Z}$ is Polish (the standing assumption in [29]). We fix a measurable function $V\colon{\sf Z}\to[1,\infty)$ , and let $L_{\infty}^{V}$ denote the set of measurable functions $g\colon{\sf Z}\to\mathbb{R}$ satisfying

\|g\|_{V}\mathbin{:=}\sup_{z\in{\sf Z}}\frac{|g(z)|}{V(z)}<\infty

The Markov chain $\Phi$ is assumed to be $V$ -uniformly ergodic: there exists $\rho\in(0,1)$ , and $B_{V}<\infty$ such that for each $g\in L_{\infty}^{V}$ , $z\in{\sf Z}$ ,

\Big{|}{\sf E}[g(\Phi_{n})\mid\Phi_{0}=z]-\pi(g)\Big{|}\leq B_{V}\|g\|_{V}\rho^{n}V(z)\,,\quad n\geq 0

(24)

where $\pi$ is the unique invariant measure, and $\pi(g)=\int g(z)\,\pi(dz)$ is the steady state mean.

The uniform bound (24) is not a strong assumption. For example, it is satisfied for the M/M/1 queue described in Section 1.1 with $V(z)=\exp(\varepsilon_{0}z)$ , for $\varepsilon_{0}>0$ sufficiently small, with $z\in{\sf Z}=\{0,1,\dots\}$ [29, Thm. 16.4.1].

The following are imposed throughout:

Assumptions:

(A1)

The Markov process $\Phi$ is $V$ -uniformly ergodic, with unique invariant measure denoted $\pi$ .
(A2)

The $d\times d$ matrix $A$ is Hurwitz, and the step-size sequence $\alpha_{n}\equiv 1/n$ , $n\geq 1$ .
(A3)

The function $f^{*}:{\sf Z}\to\mathbb{R}^{d}$ satisfies $\|{f_{i}^{*}}^{2}\|_{V}<\infty$ and $\pi(f^{*}_{i})=0$ for each $i$ .

For any $g\in L_{\infty}^{V}$ , denote $\tilde{g}(z)=g(z)-\pi(g)$ , and

{\hat{g}}(z)=\sum_{n=0}^{\infty}{\sf E}[\tilde{g}(\Phi_{n})\mid\Phi_{0}=z]

(25)

It is evident that ${\hat{g}}\in L_{\infty}^{V}$ under (A1). Further conclusions are summarized below. Thm. 2.1 (i) follows immediately from (A1). Part (ii) follows from (i) and [29, Lemma 15.2.9] (the chain is also $\sqrt{V}$ -uniformly ergodic).

Theorem 2.1.

The following conclusions hold for a $V$ -uniformly ergodic Markov chain:

(i)

The function ${\hat{g}}\in L_{\infty}^{V}$ defined in (25) has zero mean, and solves Poisson’s equation:

${\sf E}[{\hat{g}}(\Phi_{k+1})\!\mid\!\Phi_{k}\!=\!z]\!=\!{\hat{g}}(z)-\tilde{g}(z)$ (26)
(ii)

If $g^{2}\in L_{\infty}^{V}$ , then ${\hat{g}}^{2}\in L_{\infty}^{V}$ . ∎

Assumption (A3) implies that the sequence $\{\Delta_{n}\}$ appearing in (3) is zero mean for the stationary version of the Markov chain $\Phi$ . Its asymptotic covariance (appearing in the Central Limit Theorem) is denoted

\Sigma_{\Delta}=\sum_{k=-\infty}^{\infty}{\sf E}_{\pi}[\Delta_{k}\Delta_{0}^{\intercal}]

(27)

where the expectations are in steady state.

A more useful representation of $\Sigma_{\Delta}$ is obtained through a decomposition of the noise sequence based on Poisson’s equation. This now standard technique was introduced in the SA literature in the 1980s [27].

With $f^{*}$ defined in (23), denote by ${\hat{f}}$ a solution to Poisson’s equation:

{\sf E}[{\hat{f}}\,(\Phi_{k+1})\mid\Phi_{k}=z]={\hat{f}}(z)-f^{*}(z)

(28)

This is in fact $d$ separate Poisson equations since $f^{*}\colon{\sf Z}\to\mathbb{R}^{d}$ . It is assumed for convenience that the solutions are normalized so ${\hat{f}}$ has zero steady-state mean. This is justified by the fact that ${\hat{f}}-\pi({\hat{f}})$ also solves (28) under assumption (A3). The fact that ${\hat{f}}_{i}^{2}\in L_{\infty}^{V}$ for $1\leq i\leq d$ follows from Thm. 2.1 (ii).

We then write, for $n\geq 1$ ,

\Delta_{n}=f^{*}(\Phi_{n})=\Delta_{n+1}^{m}+Z_{n}-Z_{n+1}

where $Z_{n}={\hat{f}}(\Phi_{n})$ and $\Delta_{n+1}^{m}=Z_{n+1}-{\sf E}[Z_{n+1}\mid\mathcal{F}_{n}]$ is a martingale difference sequence. Each of the sequences is bounded in $L_{2}$ , and the asymptotic covariance (27) is expressed

\Sigma_{\Delta}={\sf E}_{\pi}[\Delta_{n}^{m}{\Delta_{n}^{m}}^{\intercal}]

(29)

where the expectation is taken in steady-state. The equivalence of (29) and (27) appears in [29, Theorem 17.5.3] for the case in which $\Delta_{n}$ is scalar valued; the generalization to vector valued processes involves only notational changes.

2.2 Decomposition and Scaling of the Parameter Sequence

We now explain the decomposition (21). Each of the two sequences $\{\widetilde{\theta}_{n}^{\mathcal{M}}\}$ and $\{\widetilde{\theta}_{n}^{\mathcal{T}}\}$ evolves as a stochastic approximation sequence, differentiated by the inputs and initial conditions:


$\displaystyle\widetilde{\theta}_{n+1}^{\mathcal{M}}$	$\displaystyle=\widetilde{\theta}_{n}^{\mathcal{M}}+\alpha_{n+1}\bigl{[}A\widetilde{\theta}_{n}^{\mathcal{M}}\!+\Delta_{n+2}^{m}\bigr{]}\,,$	$\displaystyle\widetilde{\theta}_{0}^{\mathcal{M}}=\widetilde{\theta}_{0}$	(30a)
$\displaystyle\widetilde{\theta}_{n+1}^{\mathcal{T}}$	$\displaystyle=\widetilde{\theta}_{n}^{\mathcal{T}}+\alpha_{n+1}\bigl{[}A\widetilde{\theta}_{n}^{\mathcal{T}}\!+\!Z_{n+1}-Z_{n+2}\bigr{]}\,,$	$\displaystyle\widetilde{\theta}_{0}^{\mathcal{T}}=0$	(30b)

The second recursion admits a more tractable realization through the change of variables, $\Xi_{n}=\widetilde{\theta}_{n}^{\mathcal{T}}+\alpha_{n}Z_{n+1}$ , $n\geq 1$ .

Lemma 2.2.

The sequence $\{\Xi_{n}\}$ evolves as the SA recursion

\Xi_{n+1}=\Xi_{n}+\alpha_{n+1}\bigl{[}A\Xi_{n}-\alpha_{n}[I+A]Z_{n+1}\bigr{]}\,,\qquad\Xi_{1}=Z_{1}

(31)

∎

Lemma 2.2 combined with (30) gives

\widetilde{\theta}_{n}=\widetilde{\theta}_{n}^{(1)}+\widetilde{\theta}_{n}^{(2)}+\widetilde{\theta}_{n}^{(3)}

(32)

where $\widetilde{\theta}_{n}^{(1)}\!=\widetilde{\theta}_{n}^{\mathcal{M}}$ , $\widetilde{\theta}_{n}^{(2)}\!=\Xi_{n}$ , and $\widetilde{\theta}_{n}^{(3)}\!=\!-\alpha_{n}Z_{n+1}$ for $n\geq 1$ . Note that $\widetilde{\theta}_{n}^{\mathcal{T}}=\widetilde{\theta}_{n}^{(2)}+\widetilde{\theta}_{n}^{(3)}$ .

It is more convenient to work directly with the recursion for the scaled sequence:

Lemma 2.3.

For any $\varrho\in(0,1/2]$ , the scaled sequence $\widetilde{\theta}^{\varrho}_{n}=n^{\varrho}\widetilde{\theta}_{n}$ admits the recursion,

\widetilde{\theta}^{\varrho}_{n+1}\!=\!\widetilde{\theta}^{\varrho}_{n}+\alpha_{n+1}\bigl{[}\varrho_{n}\widetilde{\theta}^{\varrho}_{n}+A(n,\varrho)\widetilde{\theta}^{\varrho}_{n}+(n+1)^{\varrho}\Delta_{n+1}\bigr{]}

(33)

where $\varrho_{n}=\varrho+\varepsilon(n,\varrho)$ with $\varepsilon(n,\varrho)=O(n^{-1})$ , and $A(n,\varrho)=(1+n^{-1})^{\varrho}A$ . ∎

Denote $\widetilde{\theta}_{n}^{\varrho,(i)}=n^{\varrho}\widetilde{\theta}_{n}^{(i)}$ for each $i$ . Lemma 2.3 combined with (32) gives

\widetilde{\theta}^{\varrho}_{n}=\widetilde{\theta}_{n}^{\varrho,(1)}+\widetilde{\theta}_{n}^{\varrho,(2)}+\widetilde{\theta}_{n}^{\varrho,(3)}

(34)

The first two sequences evolve as SA recursions:


$\displaystyle\widetilde{\theta}_{n+1}^{\varrho,(1)}=\widetilde{\theta}_{n}^{\varrho,(1)}+$	$\displaystyle\alpha_{n+1}\bigl{[}[\varrho_{n}I+A(n,\varrho)]\widetilde{\theta}_{n}^{\varrho,(1)}+(n+1)^{\varrho}\Delta_{n+2}^{m}\bigr{]}\,,$	$\displaystyle\widetilde{\theta}_{0}^{\varrho,(1)}\!=\!\widetilde{\theta}^{\varrho}_{0}$	(35a)
$\displaystyle\widetilde{\theta}_{n+1}^{\varrho,(2)}=\widetilde{\theta}_{n}^{\varrho,(2)}+$	$\displaystyle\alpha_{n+1}\bigl{[}[\varrho_{n}I+A(n,\varrho)]\widetilde{\theta}_{n}^{\varrho,(2)}-(n+1)^{\varrho}\alpha_{n}[I+A]Z_{n+1}\bigr{]}\,,$	$\displaystyle\widetilde{\theta}^{\varrho,(2)}_{1}\!=\!\Xi_{1}$	(35b)

2.3 Mean Square Error Bounds

Fix the initial condition $(\Phi_{0},\widetilde{\theta}_{0})$ , and denote $\text{\rm Cov}\,(\theta_{n})={\sf E}[\widetilde{\theta}_{n}\widetilde{\theta}_{n}^{\intercal}]$ and $\Sigma_{Z}={\sf E}_{\pi}[Z_{n}Z_{n}^{\intercal}]$ . The following summarizes bounds on the convergence rate of ${\sf E}[\|\widetilde{\theta}_{n}\|^{2}]=\hbox{\rm trace\,}(\text{\rm Cov}\,(\theta_{n}))$ .

Theorem 2.4.

Suppose (A1)-(A3) hold. Then, for the linear recursion (3),

(i)

If $\text{Real}(\lambda)<-{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}$ for every eigenvalue $\lambda$ of $A$ , then for some $\delta=\delta(A,\Sigma_{\Delta})>0$ ,

$\text{\rm Cov}\,(\theta_{n})=n^{-1}\Sigma_{\theta}+O(n^{-1-\delta})\,,\qquad n\geq 0\,,$

where $\Sigma_{\theta}\geq 0$ is the solution to the Lyapunov equation (4). Consequently, the rate of convergence of ${\sf E}[\|\widetilde{\theta}_{n}\|^{2}]$ is $1/n$ .
(ii)

Suppose there is an eigenvalue $\lambda$ of $A$ that satisfies $-\varrho_{0}=\text{Real}(\lambda)>-{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}$ . Let $v\neq 0$ denote a corresponding left eigenvector, and suppose that $\Sigma_{\Delta}v\neq 0$ . Then, ${\sf E}[|v^{\intercal}\widetilde{\theta}_{n}|^{2}]$ converges to $0$ at rate $n^{-2\varrho_{0}}$ . ∎

The proof of Thm. 2.4 is contained in Section 2.5. The following negative result is a direct corollary of Thm. 2.4 (ii):

Corollary 2.5.

Suppose (A1)-(A3) hold. Moreover, suppose there is an eigenvalue $\lambda$ of $A$ that satisfies $-\varrho_{0}=\text{Real}(\lambda)>-{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}$ , with corresponding left eigenvector $v$ satisfying $\Sigma_{\Delta}v\neq 0$ . Then, ${\sf E}[\|\widetilde{\theta}_{n}\|^{2}]$ converges to zero at rate no faster than $1/n^{2\varrho_{0}}$ .

One challenge in extension to nonlinear recursions is that the noise sequence depends on the parameter estimates (recall (2)). This is true even for TD learning with linear function approximation (see (18) and surrounding discussion). Extension to these recursions is obtained through coupling.

Consider the error sequence for a random linear recursion

\widetilde{\theta}^{\circ}_{n+1}=\widetilde{\theta}^{\circ}_{n}+\alpha_{n+1}[A_{n+1}\widetilde{\theta}^{\circ}_{n}+A_{n+1}\theta^{*}-b_{n+1}]

(36)

subject to the following assumptions:

(A4)

The sequences $\{A_{n},b_{n}\}$ are functions of the Markov chain:

$A_{n}=\mathcal{A}(\Phi_{n})\,,\quad b_{n}=\mathcal{B}(\Phi_{n})\,,$

which satisfy $\|\mathcal{A}_{i,j}^{2}\|_{V}<\infty$ , $\|\mathcal{B}_{i}^{2}\|_{V}<\infty$ for each $1\leq i,j\leq d$ . The steady state means are denoted $A={\sf E}_{\pi}[A_{n}]$ , $b={\sf E}_{\pi}[b_{n}]$ . Moreover, the matrix $A$ is Hurwitz, and $\theta^{*}=A^{-1}b$ .

Theorem 2.6.

Suppose Assumptions A1-A4 hold. If the matrix ${\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A$ is Hurwitz, the error bound (5) holds for $\{\widetilde{\theta}^{\circ}_{n}\}$ obtained from (36), with $\Delta_{n+1}=A_{n+1}\theta^{*}-b_{n+1}$ .

The proof of the theorem is via coupling with (3). For this we write (36) in the suggestive form

\widetilde{\theta}^{\circ}_{n+1}=\widetilde{\theta}^{\circ}_{n}+\alpha_{n+1}[A\widetilde{\theta}^{\circ}_{n}+\Delta_{n+1}+(A_{n+1}-A)\widetilde{\theta}^{\circ}_{n}]\,,\qquad\Delta_{n+1}=A_{n+1}\theta^{*}-b_{n+1}

(37)

With common initial condition $\Phi_{0}$ , the sequence $\{\widetilde{\theta}^{\circ}_{n}\}$ is compared with the error sequence $\{\widetilde{\theta}^{\bullet}_{n}\}$ for the corresponding linear SA algorithm:

\widetilde{\theta}^{\bullet}_{n+1}=\widetilde{\theta}^{\bullet}_{n}+\alpha_{n+1}[A\widetilde{\theta}^{\bullet}_{n}+\Delta_{n+1}]

The difference sequence $\{\mathcal{E}_{n}\mathbin{:=}\widetilde{\theta}^{\circ}_{n}-\widetilde{\theta}^{\bullet}_{n}\}$ evolves according to (3), but with a vanishing noise sequence:

\mathcal{E}_{n+1}=\mathcal{E}_{n}+\alpha_{n+1}[A\mathcal{E}_{n}+(A_{n+1}-A)\widetilde{\theta}^{\circ}_{n}]

(38)

By decomposing $A_{n+1}-A$ into martingale difference and telescoping sequences based on Poisson’s equation, the technique used to prove Thm. 2.4 can be used to obtain the following bound on the mean-square coupling error.

Let $\lambda=-\varrho_{0}+ui$ denote an eigenvalue of the matrix $A$ with largest real part (i.e., $\varrho_{0}$ is minimal).

Theorem 2.7.

Under Assumptions (A1)-(A4),

(i)

$\displaystyle\limsup_{n\rightarrow\infty}n^{2}{\sf E}[\mathcal{E}_{n}^{\intercal}\mathcal{E}_{n}]<\infty$ if $\varrho_{0}>1$ .
(ii)

$\displaystyle\limsup_{n\rightarrow\infty}n^{2\varrho}{\sf E}[\mathcal{E}_{n}^{\intercal}\mathcal{E}_{n}]<\infty$ for all $\varrho<\varrho_{0}$ , provided $\varrho_{0}\leq 1$ .

∎

Thm. 2.7 provides a remarkable bound when $\rho_{0}>1$ : it immediately implies Thm. 2.6 because the mean-square coupling error ${\sf E}[\|\mathcal{E}_{n}]\|^{2}$ tends to zero at rate no slower than $n^{-2}$ , which is far faster than ${\sf E}[\|\widetilde{\theta}^{\bullet}_{n}\|^{2}]\approx\hbox{\rm trace\,}(\Sigma_{\theta})n^{-1}$ (implied by Thm. 2.4).

An alert reader may observe that Theorems 2.6 and 2.7 leave out a special case: consider $\rho_{0}<{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}$ , so that the rate of convergence of ${\sf E}[\|\widetilde{\theta}^{\bullet}_{n}\|^{2}]$ is the sub-optimal value $n^{-2\rho_{0}}$ . The bound obtained in Thm. 2.7 remains valuable, in the sense that it combined with Thm. 2.4 (ii) implies the rate of convergence of ${\sf E}[\|\widetilde{\theta}^{\circ}_{n}]\|^{2}$ is no slower than $n^{-2\rho_{0}}$ . However, because ${\sf E}[\|\mathcal{E}_{n}]\|^{2}$ and ${\sf E}[\|\widetilde{\theta}^{\bullet}_{n}\|^{2}]$ tend to zero at the same rate, we cannot rule out the possibility that $\widetilde{\theta}^{\circ}_{n}=\mathcal{E}_{n}+\widetilde{\theta}^{\bullet}_{n}$ converges to zero much faster. In particular, it remains to prove that if there is an eigenvalue $\lambda$ of $A$ that satisfies $-\varrho_{0}=\text{Real}(\lambda)>-{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}$ , and an eigenvector $v\neq 0$ satisfying $\Sigma_{\Delta}v\neq 0$ , then, ${\sf E}[|v^{\intercal}\widetilde{\theta}_{n}^{\circ}|^{2}]$ converges to $0$ at rate $n^{-2\varrho_{0}}$ .

2.4 Implications

Thm. 2.4 indicates that the convergence rate of $\text{\rm Cov}\,(\theta_{n})$ is determined jointly by the matrix $A$ , and the martingale difference component of the noise sequence $\{\Delta_{n}\}$ . Convergence of $\{\widetilde{\theta}_{n}\}$ can be slow if the matrix $A$ has eigenvalues close to zero.

The result also explains the slow convergence of some reinforcement learning algorithms. For instance, the matrix $A$ in Watkins’ Q-learning has at least one eigenvalue with real part greater than or equal to $-(1-\beta)$ , where $\beta$ is the discount factor appearing in the Markov decision process [40, 14, 11]. Since $\beta$ is usually close to one, Thm. 2.4 implies that the convergence rate of the algorithm is much slower than $n^{-1}$ . Under the assumption that $A$ is Hurwitz, the $1/n$ convergence rate is guaranteed by the use of a modified step-size sequence $\alpha_{n}=g/n$ , with $g>0$ chosen so that the matrix ${\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+gA$ is Hurwitz. Corollary 2.8 follows directly from Thm. 2.4 (i).

Corollary 2.8.

Let $g$ be a constant such that ${\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+gA$ is Hurwitz, and $\{\widetilde{\theta}_{n}\}$ be recursively obtained as

\widetilde{\theta}_{n+1}=\widetilde{\theta}_{n}+\frac{g}{n+1}[A\widetilde{\theta}_{n}+\Delta_{n+1}]

Then, for some $\delta=\delta(A,g,\Sigma_{\Delta})>0$ ,

\text{\rm Cov}\,(\theta_{n})={\sf E}[\widetilde{\theta}_{n}\widetilde{\theta}_{n}^{\intercal}]=n^{-1}\Sigma_{\theta}^{g}+O(n^{-1-\delta})

where $\Sigma_{\theta}^{g}\geq 0$ solves the Lyapunov equation

[{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+gA]\Sigma+\Sigma[{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+gA]^{\intercal}+g^{2}\Sigma_{\Delta}=0

∎

We can also ensure the $1/n$ convergence rate by using a matrix gain. Provided $A$ is invertible, and if it is known beforehand, $\alpha_{n}=-A^{-1}/n$ is the optimal step-size sequence (in terms of minimizing the asymptotic covariance) [1, 25, 13]. The SQNR algorithm of [33] and the Zap-SNR algorithm [14, 11] provide general approaches to recursively estimate the optimal matrix gain.

The next subsection is dedicated to the proof of Thm. 2.4. The proofs of the technical results are contained in the Appendix A.

2.5 Proof of Thm. 2.4

Denote $\text{\rm Cov}\,(\theta_{n}^{(i)})={\sf E}[\widetilde{\theta}_{n}^{(i)}(\widetilde{\theta}_{n}^{(i)})^{\intercal}]$ and $\Sigma_{n}^{\varrho,(i)}={\sf E}[\widetilde{\theta}^{\varrho,(i)}(\widetilde{\theta}^{\varrho,(i)})^{\intercal}]=n^{2\varrho}\text{\rm Cov}\,(\theta_{n}^{(i)})$ for each $i$ in (34). The proof proceeds by establishing the convergence rate for each $\text{\rm Cov}\,(\theta_{n}^{(i)})$ . The main challenges are the first two: $\text{\rm Cov}\,(\theta_{n}^{(1)})$ and $\text{\rm Cov}\,(\theta_{n}^{(2)})$ , for which explicit bounds are obtained by studying recursions of the scaled sequences. Bounding $\widetilde{\theta}_{n}^{{(3)}}=-\alpha_{n}Z_{n+1}$ is trivial.

2.5.1 The martingale difference term

Proposition 2.9.

Under (A1)-(A3),

(i)

If $\text{Real}(\lambda)<-{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}$ for every eigenvalue $\lambda$ of $A$ , then

$\text{\rm Cov}\,(\theta_{n}^{(1)})=n^{-1}\Sigma_{\theta}+O(n^{-1-\delta})$

where $\delta=\delta({\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A,\Sigma_{\Delta})>0$ , and $\Sigma_{\theta}$ is the solution to the Lyapunov equation (4).
(ii)

Suppose there is an eigenvalue $\lambda$ of $A$ , that satisfies $-\varrho_{0}=\text{Real}(\lambda)>-{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}$ . Let $v\neq 0$ denote the corresponding left eigenvector, and suppose moreover that $\Sigma_{\Delta}v\neq 0$ . Then, ${\sf E}[|v^{\intercal}\widetilde{\theta}_{n}^{(1)}|^{2}]$ converges to $0$ at rate $n^{-2\varrho_{0}}$ .

∎

2.5.2 The telescoping sequence term

Proposition 2.10.

Under (A1)-(A3),

(i)

If $\text{Real}(\lambda)<-{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}$ for every eigenvalue $\lambda$ of $A$ , then, $\text{\rm Cov}\,(\theta_{n}^{(2)})=O(n^{-1-\delta})$ for some $\delta=\delta({\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A,\Sigma_{\Delta})>0$ .
(ii)

Suppose there is an eigenvalue $\lambda$ of $A$ that satisfies $-\varrho_{0}=\text{Real}(\lambda)>-{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}$ . Let $v\neq 0$ denote the corresponding left eigenvector, and suppose moreover that $\Sigma_{\Delta}v\neq 0$ . Then,

$\limsup_{n\rightarrow\infty}n^{2\varrho_{0}}{\sf E}[|v^{\intercal}\widetilde{\theta}_{n}^{(2)}|^{2}]<\infty$

∎

2.5.3 Proof of Thm. 2.4

We obtain the convergence rate of $\text{\rm Cov}\,(\theta_{n})$ based on

\text{\rm Cov}\,(\theta_{n})=\sum_{i=1}^{3}\text{\rm Cov}\,(\theta_{n}^{(i)})+\sum_{i=1}^{3}\sum_{j=1,j\neq i}^{3}{\sf E}[\widetilde{\theta}_{n}^{(i)}(\widetilde{\theta}_{n}^{(j)})^{\intercal}]

For case (i), by Prop. 2.9 (i) and Prop. 2.10 (i), there exists $\delta=\delta({\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A,\Sigma_{\Delta})>0$ such that

	$\displaystyle\text{\rm Cov}\,(\theta_{n}^{(1)})$	$\displaystyle=n^{-1}\Sigma_{\theta}+O(n^{-1-\delta})$
	$\displaystyle\text{\rm Cov}\,(\theta_{n}^{(2)})$	$\displaystyle=O(n^{-1-\delta})$
	$\displaystyle\text{\rm Cov}\,(\theta_{n}^{(3)})$	$\displaystyle=n^{-2}\Sigma_{Z_{n+1}}$

The cross terms between $\widetilde{\theta}_{n}^{(i)}$ and $\widetilde{\theta}_{n}^{(j)}$ for $i\neq j$ are of smaller orders than $O(1/n)$ by the Cauchy-Schwarz inequality. Therefore, for a possibly smaller $\delta>0$ ,

\text{\rm Cov}\,(\theta_{n})=n^{-1}\Sigma_{\theta}+O(n^{-1-\delta})

For case (ii), $\lim_{n\rightarrow 0}n^{2\varrho}{\sf E}[|v^{\intercal}\widetilde{\theta}_{n}|^{2}]=0$ for each $\varrho<\varrho_{0}$ can be obtained from Prop. 2.9 (ii) and Prop. 2.10 (ii) directly by the triangle inequality. For $\varrho>\varrho_{0}$ , the result $\lim_{n\rightarrow 0}n^{2\varrho}{\sf E}[|v^{\intercal}\widetilde{\theta}_{n}|^{2}]=\infty$ is established independently in Lemma A.10. ∎

2.6 Finer Error Bound

2.6.1 Finer Decomposition with Second Poisson Equation

With ${\hat{f}}$ in (28) and that ${\hat{f}}_{i}^{2}\in L_{\infty}^{V}$ for each $1\leq i\leq d$ , denote $\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}$ by the zero-mean solution to the second Poisson equation

{\sf E}[\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}\,(\Phi_{k+1})\mid\Phi_{k}=z]=\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}(z)-{\hat{f}}(z)

(39)

We then write, for $n\geq 1$ ,

Z_{n}=\widehat{\Delta}_{n+1}^{m}+\widehat{Z}_{n}-\widehat{Z}_{n+1}

(40)

where $\widehat{Z}_{n}=\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}(\Phi_{n})$ , and $\widehat{\Delta}_{n+1}^{m}=\widehat{Z}_{n+1}-{\sf E}[\widehat{Z}_{n+1}\mid\mathcal{F}_{n}]$ is a martingale difference sequence.

The type of decomposition in Section 2.2 can be applied to $\widetilde{\theta}_{n}^{(2)}$ in (31) for $n\geq 2$ :

\widetilde{\theta}_{n}^{(2)}=\widetilde{\theta}_{n}^{(2,1)}+\widetilde{\theta}_{n}^{(2,2)}+\widetilde{\theta}_{n}^{(2,3)}

(41)

The first two sequences evolve as SA recursions:


$\displaystyle\widetilde{\theta}_{n+1}^{(2,1)}=\widetilde{\theta}_{n}^{(2,1)}+$	$\displaystyle\alpha_{n+1}\bigl{[}A\widetilde{\theta}_{n}^{(2,1)}-\alpha_{n}[I+A]\widehat{\Delta}_{n+2}^{m}\bigr{]}\,,$	$\displaystyle\widetilde{\theta}_{1}^{(2,1)}=Z_{1}$	(42a)
$\displaystyle\widetilde{\theta}_{n+1}^{(2,2)}=\widetilde{\theta}_{n}^{(2,2)}+$	$\displaystyle\alpha_{n+1}\bigl{[}A\widetilde{\theta}_{n}^{(2,2)}+\alpha_{n-1}\alpha_{n}[2I+A][I+A]\widehat{Z}_{n+1}\bigr{]}\,,$	$\displaystyle\widetilde{\theta}_{2}^{(2,2)}=-{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}[I+A]\widehat{Z}_{2}$	(42b)

and $\widetilde{\theta}_{n}^{(2,3)}=\alpha_{n-1}\alpha_{n}[I+A]\widehat{Z}_{n+1}$ . Therefore, $\widetilde{\theta}_{n}$ for $n\geq 2$ can be decomposed as:

\widetilde{\theta}_{n}=\widetilde{\theta}_{n}^{(1)}+\widetilde{\theta}_{n}^{(2,1)}+\widetilde{\theta}_{n}^{(2,2)}+\widetilde{\theta}_{n}^{(2,3)}+\widetilde{\theta}_{n}^{(3)}

(43)

2.6.2 Finer Mean Square Error Bound

The error bound (6) is obtained from (43):

Theorem 2.11.

Suppose Assumptions (A1)-(A3) hold, and moreover $\text{Real}(\lambda)<-1$ for every eigenvalue $\lambda$ of $A$ . Then, for the linear recursion (3),

\text{\rm Cov}\,(\theta_{n})=n^{-1}\Sigma_{\theta}+n^{-2}\Sigma_{\theta,2}+O(n^{-2-\delta})

where $\delta=\delta(I+A,\Sigma_{\Delta})>0$ , $\Sigma_{\theta,2}=\Sigma_{\sharp}+\Sigma_{Z}-{\sf E}_{\pi}[\Delta_{n}^{m}\widehat{Z}_{n}^{\intercal}]-{\sf E}_{\pi}[\widehat{Z}_{n}(\Delta_{n}^{m})^{\intercal}]$ , and $\Sigma_{\sharp}$ is the unique solution to the Lyapunov equation:

[I+A][\Sigma-\text{\rm Cov}\,_{\pi}(\widehat{\Delta}_{n}^{m},\Delta_{n}^{m})]+[\Sigma-\text{\rm Cov}\,_{\pi}(\Delta_{n}^{m},\widehat{\Delta}_{n}^{m})][I+A]^{\intercal}+A\Sigma_{\theta}A^{\intercal}-\Sigma_{\Delta}=0

(44)

∎

2.6.3 Proof of Thm. 2.11

Denote the correlation between $\widetilde{\theta}_{n}^{(a)}$ and $\widetilde{\theta}_{n}^{(b)}$ as $R_{n}^{(a),(b)}={\sf E}[\widetilde{\theta}_{n}^{(a)}(\widetilde{\theta}_{n}^{(b)})^{\intercal}]$ , where $\widetilde{\theta}_{n}^{(a)},\widetilde{\theta}_{n}^{(b)}$ are different terms in (43). The key results that help establish Thm. 2.11 are summarized in the following proposition. The proof is in Appendix A.4.

Proposition 2.12.

Under Assumptions (A1)-(A3), if $\text{Real}(\lambda)<-1$ for every eigenvalue of $A$ , then there is $\delta>0$ such that

(i)

$\text{\rm Cov}\,(\theta_{n}^{(1)})=n^{-1}\Sigma_{\theta}+n^{-2}\Sigma_{\sharp}^{(1)}+O(n^{-2-\delta})$ , where $\delta=\delta(I+A,\Sigma_{\Delta})>0$ , $\Sigma_{\theta}\geq 0$ is the unique solution to the Lyapunov equation (4), and $\Sigma_{\sharp}^{(1)}\geq 0$ solves the Lyapunov equation,

$[I+A]\Sigma+\Sigma[I+A]^{\intercal}+A\Sigma_{\theta}A^{\intercal}-\Sigma_{\Delta}=0$ (45)

(ii)

$R_{n}^{(2,1),(1)}+R_{n}^{(1),(2,1)}=n^{-2}\Sigma_{\sharp}^{(2)}+O(n^{-2-\delta})$ , where $\Sigma_{\sharp}^{(2)}$ solves the Lyapunov equation:

[I+A]\Sigma+\Sigma[I+A]^{\intercal}-[I+A]\text{\rm Cov}\,_{\pi}(\widehat{\Delta}_{n}^{m},\Delta_{n}^{m})-\text{\rm Cov}\,_{\pi}(\Delta_{n}^{m},\widehat{\Delta}_{n}^{m})[I+A]^{\intercal}=0

(46)

(iii)

$R_{n}^{(1),(3)}=-n^{-2}{\sf E}_{\pi}[\Delta_{n}^{m}\widehat{Z}_{n}^{\intercal}]+O(n^{-3})$ .

∎

Proof of Thm. 2.11.

With the decomposition in (43), we have

	$\displaystyle\text{\rm Cov}\,(\theta_{n})=\text{\rm Cov}\,(\theta_{n}^{(1)})$	$\displaystyle+\sum_{j=1}^{3}\text{\rm Cov}\,(\theta_{n}^{(2,j)})+\text{\rm Cov}\,(\theta_{n}^{(3)})+R_{n}^{(1),(3)}+R_{n}^{(3),(1)}$
		$\displaystyle+\sum_{i\in\{1,3\}}\sum_{j=1}^{3}[R_{n}^{(2,j),(i)}+R_{n}^{(i),(2,j)}]+\sum_{j=1}^{3}\sum_{k=1,k\neq j}^{3}[R_{n}^{(2,j),(2,k)}+R_{n}^{(2,k),(2,j)}]$

$\text{\rm Cov}\,(\theta_{n}^{(2,1)})=O(n^{-3})$ , $\text{\rm Cov}\,(\theta_{n}^{(2,2)})=O(n^{-5})$ and $\text{\rm Cov}\,(\theta_{n}^{(2,3)})=O(n^{-4})$ by Thm. 2.4 (i). By the Cauchy-Schwarz inequality, the correlation terms involving $\widetilde{\theta}_{n}^{(2,2)}$ and $\widetilde{\theta}_{n}^{(2,3)}$ are $O(n^{-2.5})$ , and $R_{n}^{(2,1),(3)}=O(n^{-2.5})$ is also $O(n^{-2.5})$ . Prop. 2.12 (ii) shows that $R_{n}^{(2,1),(3)}=O(n^{-3})$ . Hence the covariance can be approximated as follows:

\text{\rm Cov}\,(\theta_{n})=\text{\rm Cov}\,(\theta_{n}^{(1)})+\text{\rm Cov}\,(\theta_{n}^{(3)})+R_{n}^{(1),(3)}+R_{n}^{(3),(1)}+R_{n}^{(2,1),(1)}+R_{n}^{(1),(2,1)}+O(n^{-2.5})

By Prop. 2.12, there exist $\delta(I+A,\Sigma_{\Delta})>0$ and $\delta(I+A)>0$ such that

	$\displaystyle\text{\rm Cov}\,(\theta_{n}^{(1)})$	$\displaystyle=n^{-1}\Sigma_{\theta}+n^{-2}\Sigma_{\sharp}^{(1)}+O(n^{-2-\delta})$
	$\displaystyle\text{\rm Cov}\,(\theta_{n}^{(3)})$	$\displaystyle=n^{-2}\Sigma_{Z}+O(\rho^{n})$
	$\displaystyle R_{n}^{(1),(3)}$	$\displaystyle=-n^{-2}{\sf E}_{\pi}[\Delta_{n}^{m}\widehat{Z}_{n}^{\intercal}]+O(n^{-3})$
	$\displaystyle R_{n}^{(2,1),(1)}+R_{n}^{(1),(2,1)}$	$\displaystyle=n^{-2}\Sigma_{\sharp}^{(2)}+O(n^{-2-\delta})$

Putting those results together gives

\text{\rm Cov}\,(\theta_{n})=n^{-1}\Sigma_{\theta}+n^{-2}\bigl{(}\Sigma_{\sharp}^{(1)}+\Sigma_{\sharp}^{(2)}+\Sigma_{Z}-{\sf E}_{\pi}[\Delta_{n}^{m}\widehat{Z}_{n}^{\intercal}]-{\sf E}_{\pi}[\widehat{Z}_{n}(\Delta_{n}^{m})^{\intercal}]\bigr{)}+O(n^{-2-\delta})

for some $\delta>0$ , where $\Sigma_{\sharp}\mathbin{:=}\Sigma_{\sharp}^{(1)}+\Sigma_{\sharp}^{(2)}$ solves the Lyapunov equation (44). ∎

3 Conclusions

Performance bounds for recursive algorithms are challenging outside of the special cases surveyed in the introduction. The general framework developed in this paper provides tight finite time performance for linear stochastic recursions under mild conditions on the Markovian noise, and we are confident that the techniques will extend to obtain similar bounds for nonlinear stochastic approximation provided that the linearization (2) is meaningful.

The bound (5) implies that, for some constant $b_{\theta}$ and all $n$ ,

{\sf E}[\|\widetilde{\theta}_{n}\|^{2}]\leq n^{-1}\hbox{\rm trace\,}(\Sigma_{\theta})+n^{-1-\delta}b_{\theta}\,.

It may be argued that we have not obtained a finite- $n$ bound, because a bound on the constant $b_{\theta}$ is lacking. Our response is that the precision of the dominant term is most important. We have tested the bound in numerous experiments in which the empirical mean-square error is obtained from multiple independent trials, and the resulting histogram is compared to what is predicted by the Central Limit Theorem with covariance $\Sigma_{\theta}$ . It is found that the Central Limit Theorem is highly predictive of finite- $n$ performance in most cases [14, 11, 13]. While it is hoped that further research will provide bounds on $b_{\theta}$ , it seems likely that any bound will involve high-order statistics of the Markov chain; evidence of this is the complex coefficient of $n^{-2}$ in (6) for the special case $\delta=1$ .

Current research concerns these topics, as well as algorithm design for reinforcement learning in various settings.

References

[1] A. Benveniste, M. Métivier, and P. Priouret. Adaptive algorithms and stochastic approximations, volume 22 of Applications of Mathematics (New York). Springer-Verlag, Berlin, 1990. Translated from the French by Stephen S. Wilson.
[2] A. Benveniste, M. Métivier, and P. Priouret. Adaptive algorithms and stochastic approximations. Springer, 2012.
[3] J. Bhandari, D. Russo, and R. Singal. A finite time analysis of temporal difference learning with linear function approximation. arXiv preprint arXiv:1806.02450, 2018.
[4] J. R. Blum. Multidimensional stochastic approximation methods. The Annals of Mathematical Statistics, pages 737–744, 1954.
[5] V. S. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint. Hindustan Book Agency and Cambridge University Press (jointly), Delhi, India and Cambridge, UK, 2008.
[6] V. S. Borkar and S. P. Meyn. The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim., 38(2):447–469, 2000. (see also IEEE CDC, 1998).
[7] S. Chen, A. M. Devraj, A. Bušić, and S. Meyn. Zap Q Learning with nonlinear function approximation. Submitted for publication and arXiv e-prints, 2019.
[8] Z. Chen, S. Zhang, T. Doan, S. Maguluri, and J. Clarke. Performance of q-learning with linear function approximation: Stability and finite-time analysis. arXiv preprint arXiv:1905.11425, 2019.
[9] K. L. Chung et al. On a stochastic approximation method. The Annals of Mathematical Statistics, 25(3):463–483, 1954.
[10] G. Dalal, B. Szorenyi, G. Thoppe, and S. Mannor. Finite sample analysis of two-timescale stochastic approximation with applications to reinforcement learning. arXiv preprint arXiv:1703.05376, 2017.
[11] A. M. Devraj. Reinforcement Learning Design with Optimal Learning Rate. PhD thesis, University of Florida, 2019.
[12] A. M. Devraj, A. Bušić, and S. Meyn. Zap Q-Learning – a user’s guide. In Proc. of the Fifth Indian Control Conference, January 9-11 2019.
[13] A. M. Devraj, A. Bušić, and S. Meyn. Fundamental design principles for reinforcement learning algorithms. In Handbook on Reinforcement Learning and Control. Springer, 2020.
[14] A. M. Devraj and S. P. Meyn. Fastest convergence for Q-learning. ArXiv e-prints, July 2017.
[15] A. M. Devraj and S. P. Meyn. Zap Q-learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017.
[16] K. Duffy and S. Meyn. Large deviation asymptotics for busy periods. Stochastic Systems, 4(1):300–319, 2014.
[17] K. R. Duffy and S. P. Meyn. Most likely paths to error when estimating the mean of a reflected random walk. Performance Evaluation, 67(12):1290–1303, 2010.
[18] L. Gerencser. Convergence rate of moments in stochastic approximation with simultaneous perturbation gradient approximation and resetting. IEEE Transactions on Automatic Control, 44(5):894–905, May 1999.
[19] P. W. Glynn and D. Ormoneit. Hoeffding’s inequality for uniformly ergodic Markov chains. Statistics and Probability Letters, 56:143–146, 2002.
[20] G. Golub and C. Van Loan. Matrix computations. 3rd. edn. ed, 1996.
[21] B. Hu and U. A. Syed. Characterizing the exact behaviors of temporal difference learning algorithms using markov jump linear system theory. arXiv preprint arXiv:1906.06781, 2019.
[22] B. Karimi, B. Miasojedow, E. Moulines, and H.-T. Wai. Non-asymptotic analysis of biased stochastic approximation scheme. In Conference on Learning Theory, pages 1944–1974, 2019.
[23] J. Kiefer and J. Wolfowitz. Stochastic estimation of the maximum of a regression function. Ann. Math. Statist., 23(3):462–466, 09 1952.
[24] H. Kushner and G. G. Yin. Stochastic approximation and recursive algorithms and applications, volume 35. Springer Science & Business Media, 2003.
[25] H. J. Kushner and G. G. Yin. Stochastic approximation algorithms and applications, volume 35 of Applications of Mathematics (New York). Springer-Verlag, New York, 1997.
[26] C. Lakshminarayanan and C. Szepesvari. Linear stochastic approximation: How far does constant step-size and iterate averaging go? In International Conference on Artificial Intelligence and Statistics, pages 1347–1355, 2018.
[27] M. Metivier and P. Priouret. Applications of a Kushner and Clark lemma to general classes of stochastic algorithms. IEEE Transactions on Information Theory, 30(2):140–151, March 1984.
[28] S. P. Meyn. Control Techniques for Complex Networks. Cambridge University Press, 2007. Pre-publication edition available online.
[29] S. P. Meyn and R. L. Tweedie. Markov chains and stochastic stability. Cambridge University Press, Cambridge, second edition, 2009. Published in the Cambridge Mathematical Library. 1993 edition online.
[30] B. T. Polyak. A new method of stochastic approximation type. Avtomatika i telemekhanika (in Russian). translated in Automat. Remote Control, 51 (1991), pages 98–107, 1990.
[31] B. T. Polyak and A. B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM J. Control Optim., 30(4):838–855, 1992.
[32] H. Robbins and S. Monro. A stochastic approximation method. Annals of Mathematical Statistics, 22:400–407, 1951.
[33] D. Ruppert. A Newton-Raphson version of the multivariate Robbins-Monro procedure. The Annals of Statistics, 13(1):236–245, 1985.
[34] D. Ruppert. Efficient estimators from a slowly convergent Robbins-Monro processes. Technical Report Tech. Rept. No. 781, Cornell University, School of Operations Research and Industrial Engineering, Ithaca, NY, 1988.
[35] R. Srikant and L. Ying. Finite-time error bounds for linear stochastic approximation and TD learning. CoRR, abs/1902.00923, 2019.
[36] R. S. Sutton. Learning to predict by the methods of temporal differences. Mach. Learn., 3(1):9–44, 1988.
[37] V. B. Tadić. Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes. Machine learning, 63(2):107–133, 2006.
[38] J. N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Trans. Automat. Control, 42(5):674–690, 1997.
[39] J. Venter et al. An extension of the robbins-monro procedure. The Annals of Mathematical Statistics, 38(1):181–190, 1967.
[40] C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, King’s College, Cambridge, Cambridge, UK, 1989.

Appendix A Appendices

A.1 Proofs for decomposition and scaling

Proof of Lemma 2.2.

Recall the summation by parts formula: for scalar sequences $\{a_{k},b_{k}\}$ ,

\sum_{k=0}^{N}a_{k+1}[b_{k+1}-b_{k}]=a_{k+1}b_{k+1}-a_{1}b_{0}-\sum_{k=1}^{N}[a_{k+1}-a_{k}]b_{k}

(47)

This is applied to (30b), beginning with

\widetilde{\theta}_{N+1}^{\mathcal{T}}=\sum_{n=0}^{N}\alpha_{n+1}A\widetilde{\theta}_{n}^{\mathcal{T}}+\sum_{n=0}^{N}\alpha_{n+1}[Z_{n+1}-Z_{n+2}]

Hence with $a_{k}=\alpha_{k}$ and $b_{k}=Z_{k+1}$ , the identity (47) implies

	$\displaystyle\sum_{n=0}^{N}\alpha_{n+1}[Z_{n+1}-Z_{n+2}]$	$\displaystyle=Z_{1}-\alpha_{N+1}Z_{N+2}+\sum_{n=1}^{N}[\alpha_{n+1}-\alpha_{n}]Z_{n+1}$
		$\displaystyle=Z_{1}-\alpha_{N+1}Z_{N+2}-\sum_{n=1}^{N}\alpha_{n+1}\alpha_{n}Z_{n+1}$

By substitution, and using $\widetilde{\theta}_{0}^{\mathcal{T}}=0$ ,

\widetilde{\theta}_{N+1}^{\mathcal{T}}=Z_{1}-\alpha_{N+1}Z_{N+2}+\sum_{n=1}^{N}\alpha_{n+1}\bigl{[}A\widetilde{\theta}_{n}^{\mathcal{T}}-\alpha_{n}Z_{n+1}\bigr{]}

With $\Xi_{n}\mathbin{:=}\widetilde{\theta}_{n}^{\mathcal{T}}+\alpha_{n}Z_{n+1}$ for $n\geq 1$ we finally obtain for $N\geq 1$ ,

\Xi_{N+1}=Z_{1}+\sum_{n=1}^{N}\alpha_{n+1}\bigl{[}A\Xi_{n}-\alpha_{n}[I+A]Z_{n+1}\bigr{]}

which is equivalent to (31). ∎

Proof of Lemma 2.3.

Consider the Taylor series expansion:

	$\displaystyle\frac{(n+1)^{\varrho}}{n^{\varrho}}=(1+n^{-1})^{\varrho}$	$\displaystyle=1+\varrho n^{-1}-{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}\varrho(1-\varrho)n^{-2}+O(n^{-3})$
		$\displaystyle=1+\varrho(n+1)^{-1}+\varrho n^{-1}(n+1)^{-1}-{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}\varrho(1-\varrho)n^{-2}+O(n^{-3})$

where the second equation uses $n^{-1}-(n+1)^{-1}=n^{-1}(n+1)^{-1}$ . With $\alpha_{n}=1/n$ , the following bound follows:

(n+1)^{\varrho}=n^{\varrho}\bigl{[}1+\alpha_{n+1}(\varrho+\varepsilon(n,\varrho))\bigr{]}

where $\varepsilon(n,\varrho)=O(n^{-1})$ , and $\varepsilon(n,\varrho)>0$ for all $n$ .

Multiplying both sides of (3) by $(n+1)^{\varrho}$ , we obtain

\widetilde{\theta}^{\varrho}_{n+1}=\widetilde{\theta}^{\varrho}_{n}+\alpha_{n+1}\bigl{[}\varrho_{n}\widetilde{\theta}^{\varrho}_{n}+A(n,\varrho)\widetilde{\theta}^{\varrho}_{n}+(n+1)^{\varrho}\Delta_{n+1}\bigr{]}

where $\varrho_{n}=\varrho+\varepsilon(n,\varrho)$ and $A(n,\varrho)=(1+n^{-1})^{\varrho}A$ . ∎

Lemma A.1.

Let $\varrho_{0}>0,L\geq 0$ be fixed real numbers. Then the following holds for each $n\geq 1$ and $1\leq n_{0}<n$ :

\prod_{k=n_{0}}^{n}[1-\varrho_{0}\alpha_{k}+L^{2}\alpha_{k}^{2}]\leq K_{\ref{t:prod-n}}\frac{n_{0}^{\varrho_{0}}}{(n+1)^{\varrho_{0}}}

where $K_{\ref{t:prod-n}}=\exp(\varrho_{0}+L^{2}\sum_{k=1}^{\infty}\alpha_{k}^{2})$ .

Proof.

By the inequality $1-x\leq\exp(-x)$ ,

\prod_{k=n_{0}}^{n}[1-\varrho_{0}\alpha_{k}+L^{2}\alpha_{k}^{2}]\leq\exp(-\varrho_{0}\sum_{k=n_{0}}^{n}\alpha_{k})\exp(L^{2}\sum_{k=n_{0}}^{n}\alpha_{k}^{2})\leq\exp(-\varrho_{0})K\exp(-\varrho_{0}\sum_{k=n_{0}}^{n}\alpha_{k})

The remainder of the proof involves establishing the bound

\exp(-\varrho_{0}\sum_{k=n_{0}}^{n}\alpha_{k})\leq\exp(\varrho_{0})\frac{n_{0}^{\varrho_{0}}}{(n+1)^{\varrho_{0}}}

(48)

For $n_{0}=1$ this follows from the bound $\sum_{k=1}^{n}\alpha_{k}\geq\ln(n+1)$ , and for $n_{0}\geq 2$ the bound (48) follows from $\sum_{k=n_{0}}^{n}\alpha_{k}>\ln(n+1)-\ln(n_{0}-1)-1$ . ∎

Lemma A.2.

Under Assumptions A1-A3, let $\lambda=-\varrho_{0}+ui$ denote an eigenvalue of matrix $A$ with largest real part. Then

\lim_{n\rightarrow\infty}n^{2\varrho}{\sf E}[\widetilde{\theta}_{n}^{\intercal}\widetilde{\theta}_{n}]=0\,,\qquad\varrho<\varrho_{0}\text{ and }\varrho\leq{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}

Proof.

Recall the decomposition of $\widetilde{\theta}_{n}$ in (32): $\widetilde{\theta}_{n}=\widetilde{\theta}_{n}^{(1)}+\widetilde{\theta}_{n}^{(2)}+\widetilde{\theta}_{n}^{(3)}$ , with $\widetilde{\theta}_{n}^{(1)},\widetilde{\theta}_{n}^{(2)}$ evolving as


$\displaystyle\widetilde{\theta}_{n+1}^{(1)}$	$\displaystyle=\widetilde{\theta}_{n}^{(1)}+\alpha_{n+1}\bigl{[}A\widetilde{\theta}_{n}^{(1)}+\Delta_{n+2}^{m}\bigr{]}\,,$	$\displaystyle\widetilde{\theta}_{0}^{(1)}=\widetilde{\theta}_{0}$	(49a)
$\displaystyle\widetilde{\theta}_{n+1}^{(2)}$	$\displaystyle=\widetilde{\theta}_{n}^{(2)}+\alpha_{n+1}\bigl{[}A\widetilde{\theta}_{n}^{(2)}-\alpha_{n}[I+A]Z_{n+1}\bigr{]}\,,\qquad$	$\displaystyle\widetilde{\theta}_{1}^{(2)}=Z_{1}$	(49b)

For fixed $\varrho<\varrho_{0}\text{ and }\varrho\leq{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}$ , Let $T>0$ solve the Lyapunov equation $[A+\varrho I]T+T[A+\varrho I]^{\intercal}+I=0$ , which exists since $A+\varrho I$ is Hurwitz. Define the norm of $\widetilde{\theta}_{n}$ by $\|\widetilde{\theta}_{n}\|_{T}\mathbin{:=}\sqrt{{\sf E}[\widetilde{\theta}_{n}^{\intercal}T\widetilde{\theta}_{n}]}$ .

First consider $\widetilde{\theta}_{n}^{(1)}$ . Since the martingale difference $\Delta_{n+2}^{m}$ is uncorrelated with $\widetilde{\theta}_{n}^{(1)}$ , denoting $e_{n}=\|\widetilde{\theta}_{n}^{(1)}\|_{T}^{2},b_{n+2}=\|\Delta_{n+2}^{m}\|_{T}^{2}$ , we obtain the following from (49a):

e_{n+1}=\|[I+\alpha_{n+1}A]\widetilde{\theta}_{n}^{(1)}\|_{T}^{2}+b_{n+2}

(50)

Letting $\lambda_{\circ}>0$ denote the largest eigenvalue of $T$ , we arrive at the following simplification of the first term in (50)

$\displaystyle\\|[I+\alpha_{n+1}A]\widetilde{\theta}_{n}^{(1)}\\|_{T}^{2}$	$\displaystyle={\sf E}\bigl{[}(\widetilde{\theta}_{n}^{(1)})^{\intercal}[T-2\alpha_{n+1}\varrho T-\alpha_{n+1}I+\alpha_{n+1}^{2}ATA^{\intercal}]\widetilde{\theta}_{n}^{(1)}\bigr{]}$	(51)
	$\displaystyle\leq{\sf E}\bigl{[}(\widetilde{\theta}_{n}^{(1)})^{\intercal}[T-2\alpha_{n+1}\varrho T-\frac{1}{\lambda_{\circ}}\alpha_{n+1}T+\alpha_{n+1}^{2}ATA^{\intercal}]\widetilde{\theta}_{n}^{(1)}\bigr{]}$
	$\displaystyle\leq[1-2\alpha_{n+1}\varrho-\alpha_{n+1}/\lambda_{\circ}+\alpha_{n+1}^{2}L^{2}]\\|\widetilde{\theta}_{n}\\|_{T}^{2}$

where $L$ denotes the induced operator norm of $A$ with respect to the norm $\|\,\cdot\,\|_{T}$ . We then obtain the following recursive bound from (50) and (51)

e_{n+1}\leq[1-(2\varrho+1/\lambda_{\circ})\alpha_{n+1}+L^{2}\alpha_{n+1}^{2}]e_{n}+\alpha_{n+1}^{2}K

where $K=\sup_{n\geq 1}b_{n}$ . $K$ is finite since $b_{n}$ converges to ${\sf E}_{\pi}[(\Delta_{n}^{m})^{\intercal}T\Delta_{n}^{m}]$ geometrically fast.

Consequently, for each $n\geq 1$ ,

e_{n+1}\leq e_{0}\prod_{k=1}^{n+1}[1-(2\varrho+1/\lambda_{\circ})\alpha_{k}+L^{2}\alpha_{k}^{2}]+K\sum_{k=1}^{n+1}\alpha_{k}^{2}\prod_{l=k+1}^{n+1}[1-(2\varrho+1/\lambda_{\circ})\alpha_{l}+L^{2}\alpha_{l}^{2}]

By Lemma A.1,

\displaystyle e_{n+1}\leq e_{1}K_{\ref{t:prod-n}}\frac{1}{(n+2)^{2\varrho+1/\lambda_{\circ}}}+\frac{KK_{\ref{t:prod-n}}}{(n+2)^{2\varrho+1/\lambda_{\circ}}}\sum_{k=1}^{n+1}\alpha_{k}^{2-2\varrho-1/\lambda_{\circ}}

Therefore, $e_{n+1}\rightarrow 0$ at rate at least $n^{-2\varrho}$ .

For $\widetilde{\theta}_{n}^{(2)}$ , we use similar arguments. We obtain the following from (49b) by the triangle inequality.

\|\widetilde{\theta}_{n+1}^{(2)}\|_{T}\leq\|[I+\alpha_{n+1}A]\widetilde{\theta}_{n}^{(2)}\|_{T}+\alpha_{n}\alpha_{n+1}\|[I+A]Z_{n+1}\|_{T}

Using the same argument as in (51), along with the inequality $\sqrt{1+x}\leq 1+{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}x$ ,

	$\displaystyle\\|[I+\alpha_{n+1}A]\widetilde{\theta}_{n}^{(2)}\\|_{T}$	$\displaystyle\leq\\|\widetilde{\theta}_{n}^{(2)}\\|_{T}\sqrt{1-2\alpha_{n+1}\varrho-\alpha_{n+1}/\lambda_{\circ}+\alpha_{n+1}^{2}L^{2}}$
		$\displaystyle\leq\\|\widetilde{\theta}_{n}^{(2)}\\|_{T}(1-\alpha_{n+1}\varrho-\alpha_{n+1}/(2\lambda_{\circ})+{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}\alpha_{n+1}^{2}L^{2})$

Denote $K^{\prime}=\sup_{n\geq 1}\|[I+A]Z_{n+1}\|_{T}$ .

\|\widetilde{\theta}_{n+1}^{(2)}\|_{T}\leq[1-(\varrho+1/(2\lambda_{\circ}))\alpha_{n+1}+{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}\alpha_{n+1}^{2}L^{2}]\|\widetilde{\theta}_{n}^{(2)}\|_{T}+\alpha_{n}\alpha_{n+1}K^{\prime}

Then by the same argument for the martingale difference term, we can show that $\|\widetilde{\theta}_{n}^{(2)}\|_{T}\rightarrow 0$ at rate at least $n^{-\varrho}$ .

Given $\|\widetilde{\theta}_{n}^{(3)}\|_{T}=\alpha_{n}\|Z_{n+1}\|_{T}$ converges to zero at rate $1/n$ , the proof is completed by the triangle inequality. ∎

A.2 Proof of Prop. 2.9

(i)

Recall that $\{\Delta^{m}_{n}\}$ is a martingale difference sequence. It is thus an uncorrelated sequence for which $\widetilde{\theta}_{n}^{(1)}$ and $\Delta^{m}_{n+k}$ are uncorrelated for $k\geq 2$ . The following recursion is obtained from these facts and (30a)

\text{\rm Cov}\,(\theta_{n+1}^{(1)})=\text{\rm Cov}\,(\theta_{n}^{(1)})+\alpha_{n+1}\Bigl{[}\text{\rm Cov}\,(\theta_{n}^{(1)})A^{\intercal}+A\text{\rm Cov}\,(\theta_{n}^{(1)})+\alpha_{n+1}[A\text{\rm Cov}\,(\theta_{n}^{(1)})A^{\intercal}+\Sigma_{\Delta_{n+2}}]\Bigr{]}

Multiplying each side by $n+1$ gives

	$\displaystyle(n+1)\text{\rm Cov}\,(\theta_{n+1}^{(1)})=$	$\displaystyle n\text{\rm Cov}\,(\theta_{n}^{(1)})+\text{\rm Cov}\,(\theta_{n}^{(1)})+\text{\rm Cov}\,(\theta_{n}^{(1)})A^{\intercal}+A\text{\rm Cov}\,(\theta_{n}^{(1)})$
		$\displaystyle+\alpha_{n+1}[A\text{\rm Cov}\,(\theta_{n}^{(1)})A^{\intercal}+\Sigma_{\Delta_{n+2}}]$
	$\displaystyle=$	$\displaystyle n\text{\rm Cov}\,(\theta_{n}^{(1)})+\alpha_{n+1}\Bigl{[}(1+\frac{1}{n})[n\text{\rm Cov}\,(\theta_{n}^{(1)})+n\text{\rm Cov}\,(\theta_{n}^{(1)})A^{\intercal}+An\text{\rm Cov}\,(\theta_{n}^{(1)})]$
		$\displaystyle+A\text{\rm Cov}\,(\theta_{n}^{(1)})A^{\intercal}+\Sigma_{\Delta_{n+2}}\Bigr{]}$

The following argument will be used repeatedly through this Appendix: the recursion for $n\text{\rm Cov}\,(\theta_{n}^{(1)})$ is a deterministic SA recursion for $n\text{\rm Cov}\,(\theta_{n}^{(1)})$ , and is regarded as an Euler approximation to the stable linear system

{\mathchoice{\genfrac{}{}{}{1}{d}{dt}}{\genfrac{}{}{}{1}{d}{dt}}{\genfrac{}{}{}{3}{d}{dt}}{\genfrac{}{}{}{3}{d}{dt}}}\mathcal{X}(t)=(1+e^{-t})[\mathcal{X}(t)+A\mathcal{X}(t)+\mathcal{X}(t)A^{\intercal}]+\Sigma_{\Delta}+e^{-t}A\mathcal{X}(t)A^{\intercal}

(52)

Stability follows from the assumption that ${\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A$ is Hurwitz. The standard justification of the Euler approximation is through the choice of timescale: let $t_{n}=\sum_{k=1}^{n}\alpha_{k}$ and let $\mathcal{X}^{n}(t)$ denote the solution to this ODE on $[t_{n},\infty)$ with $\mathcal{X}^{n}(t_{n})=n\text{\rm Cov}\,(\theta_{n}^{(1)})$ , $t\geq t_{n}$ , for any $n\geq 1$ . Using standard ODE arguments [5],

\sup_{k\geq n}\|\mathcal{X}^{n}(t_{k})-k\Sigma_{k}^{(1)}\|=O(1/n)

Exponential convergence of $\mathcal{X}$ to $\Sigma_{\theta}$ implies convergence of $\{n\text{\rm Cov}\,(\theta_{n}^{(1)})\}$ to zero at rate $1/n^{\delta}$ for some $\delta=\delta({\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A,\Sigma_{\Delta})>0$ . ∎

(ii)

Denote $e_{n}^{\varrho_{0}}={\sf E}[|v^{\intercal}\widetilde{\theta}_{n}^{\varrho_{0}}|^{2}]$ and $\lambda=-\varrho_{0}+ui$ . We begin with the proof that

\liminf_{n\to\infty}e^{\varrho_{0}}_{n}>0

(53)

With $v^{\intercal}[I\lambda-A]=0$ , we have $v^{\intercal}[I\varrho_{n}+A(n,\varrho)]=[\varepsilon_{v}(n,\varrho_{0})+ui]v^{\intercal}$ , with $\varepsilon_{v}(n,\varrho_{0})=O(n^{-1})$ . Applying (35a) gives

v^{\intercal}\widetilde{\theta}_{n+1}^{{\varrho_{0}},(1)}=v^{\intercal}\widetilde{\theta}_{n}^{{\varrho_{0}},(1)}+\alpha_{n+1}\bigl{[}[\varepsilon_{v}(n,{\varrho_{0}})+ui]v^{\intercal}\widetilde{\theta}_{n}^{{\varrho_{0}},(1)}+(n+1)^{\varrho_{0}}v^{\intercal}\Delta_{n+2}^{m}\bigr{]}

Let $\overline{v}$ denote the conjugate of $v$ . Consequently, with $\sigma^{2}_{n}(v)=v^{\intercal}\Sigma_{\Delta_{n}}\overline{v}$ ,

e^{\varrho_{0}}_{n+1}=\bigl{[}[1+\varepsilon_{v}(n,{\varrho_{0}})/(n+1)]^{2}+u^{2}/(n+1)^{2}\bigr{]}e^{\varrho_{0}}_{n}+(n+1)^{2{\varrho_{0}}-2}\sigma^{2}_{n+2}(v)

$V$ -uniform ergodicity implies that $\sigma^{2}_{n}(v)\to v^{\intercal}\Sigma_{\Delta}\overline{v}>0$ as $n\to\infty$ at a geometric rate. Fix $n_{0}>0$ so that $\sigma^{2}_{n_{0}}(v)>0$ , and hence also $e^{\varrho_{0}}_{n_{0}+1}>0$ . We also assume that $1+\varepsilon_{v}(n,{\varrho_{0}})/(n+1)>0$ for $n\geq n_{0}$ , which is possible since $\varepsilon_{v}(n,{\varrho_{0}})=O(n^{-1})$ .

For $N>n_{0}$ we obtain the uniform bound

\log(e^{\varrho_{0}}_{N})\geq\log(e^{\varrho_{0}}_{n_{0}+1})+2\sum_{n=n_{0}+2}^{\infty}\log[1-|\varepsilon_{v}(n,{\varrho_{0}})|/(n+1)]>-\infty

which proves that $\liminf_{n\to\infty}e_{n}^{\varrho_{0}}=\liminf_{n\to\infty}v^{\intercal}\Sigma_{n}^{{\varrho_{0}},(1)}\overline{v}>0$ .

The proof of an upper bound for ${\varrho_{0}}<1/2$ : by concavity of the logarithm,

\log(e^{\varrho_{0}}_{n+1})\leq\log\bigl{(}\bigl{[}[1+\varepsilon_{v}(n,{\varrho_{0}})/(n+1)]^{2}+u^{2}/(n+1)^{2}\bigr{]}e^{\varrho_{0}}_{n}\bigr{)}+K(n+1)^{2{\varrho_{0}}-2}

where $K=\sup_{n>n_{0}}\bigl{[}[1+\varepsilon_{v}(n,{\varrho_{0}})/(n+1)]^{2}+u^{2}/(n+1)^{2}\bigr{]}^{-1}[e^{\varrho_{0}}_{n}]^{-1}\sigma^{2}_{n+2}(v)$ . Using concavity of the logarithm once more gives

\log(e^{\varrho_{0}}_{n+1})\leq\log(e^{\varrho_{0}}_{n})+2\varepsilon_{v}(n,{\varrho_{0}})/(n+1)+\frac{\varepsilon_{v}(n,{\varrho_{0}})^{2}}{(n+1)^{2}}+\frac{u^{2}}{(n+1)^{2}}+K(n+1)^{2{\varrho_{0}}-2}

which gives the uniform upper bound

\log(e^{\varrho_{0}}_{N})\leq\log(e^{\varrho_{0}}_{n_{0}+1})+\sum_{n=n_{0}+2}^{\infty}\Bigl{(}2\frac{|\varepsilon_{v}(n,{\varrho_{0}})|}{n+1)}+\frac{\varepsilon_{v}(n,{\varrho_{0}})^{2}}{(n+1)^{2}}+\frac{u^{2}}{(n+1)^{2}}+K(n+1)^{2{\varrho_{0}}-2}\Bigr{)}<\infty

This proves that $\limsup_{n\to\infty}e_{n}^{\varrho_{0}}=\limsup_{n\to\infty}v^{\intercal}\Sigma_{n}^{{\varrho_{0}},(1)}\overline{v}<\infty$ . ∎

A.3 Proof for Prop. 2.10

(i)

Denote $\mathcal{D}_{n}=\varepsilon(n,\varrho)I+A(n,\varrho)-A$ . We can rewrite (35b) as

	$\displaystyle\widetilde{\theta}_{n+1}^{\varrho,(2)}$	$\displaystyle=\widetilde{\theta}_{n}^{\varrho,(2)}+\alpha_{n+1}\bigl{[}[{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A]\widetilde{\theta}_{n}^{\varrho,(2)}+\mathcal{D}_{n}\widetilde{\theta}_{n}^{\varrho,(2)}-\alpha_{n}(n+1)^{\varrho}[I+A]Z_{n+1}\bigr{]}$		(54)
		$\displaystyle=\bigl{[}I+\alpha_{n+1}[{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A]\bigr{]}\widetilde{\theta}_{n}^{\varrho,(2)}+\alpha_{n+1}\mathcal{D}_{n}\widetilde{\theta}_{n}^{\varrho,(2)}-\alpha_{n+1}\alpha_{n}(n+1)^{\varrho}[I+A]Z_{n+1}$		(54)

Let $T>0$ solve the Lyapunov equation

[{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A]^{\intercal}T+T[{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A]+I=0

As in the proof of Lemma A.2, a solution exists because ${\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A$ is Hurwitz. Adopting the familiar notation $\|\widetilde{\theta}_{n}^{\varrho,(2)}\|_{T}\mathbin{:=}\sqrt{{\sf E}[(\widetilde{\theta}_{n}^{\varrho,(2)})^{\intercal}T\widetilde{\theta}_{n}^{\varrho,(2)}]}$ , the triangle inequality applied to (54) gives

\|\widetilde{\theta}_{n+1}^{\varrho,(2)}\|_{T}\leq\|\bigl{[}I+\alpha_{n+1}[{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A]\bigr{]}\widetilde{\theta}_{n}^{\varrho,(2)}\|_{T}+\alpha_{n+1}\|\mathcal{D}_{n}\|_{T}\|\widetilde{\theta}_{n}^{\varrho,(2)}\|_{T}+\alpha_{n+1}\alpha_{n}(n+1)^{\varrho}\|[I+A]Z_{n+1}\|_{T}

(55)

The first term can be simplified by the Lyapunov equation.

	$\displaystyle\\|\bigl{[}I+\alpha_{n+1}[{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A]\bigr{]}\widetilde{\theta}_{n}^{\varrho,(2)}\\|_{T}^{2}=$	$\displaystyle{\sf E}\bigl{[}(\widetilde{\theta}_{n}^{\varrho,(2)})^{\intercal}\bigl{[}T-\alpha_{n+1}I+\alpha_{n+1}^{2}[{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A]^{\intercal}T[{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A]\bigr{]}\widetilde{\theta}_{n}^{\varrho,(2)}\bigr{]}$
	$\displaystyle\leq$	$\displaystyle{\sf E}\bigl{[}(\widetilde{\theta}_{n}^{\varrho,(2)})^{\intercal}\bigl{[}T-\frac{\alpha_{n+1}}{\lambda_{\circ}}T+\alpha_{n+1}^{2}[{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A]^{\intercal}T[{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A]\bigr{]}\widetilde{\theta}_{n}^{\varrho,(2)}\bigr{]}$
	$\displaystyle\leq$	$\displaystyle\\|\widetilde{\theta}_{n}^{\varrho,(2)}\\|_{T}^{2}-\frac{\alpha_{n+1}}{\lambda_{\circ}}\\|\widetilde{\theta}_{n}^{\varrho,(2)}\\|_{T}^{2}+\alpha_{n+1}^{2}L^{2}\\|\widetilde{\theta}_{n}^{\varrho,(2)}\\|_{T}^{2}$

where $L$ is the induced operator norm of ${\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A$ , and $\lambda_{\circ}>0$ denotes its largest eigenvalue.

Consequently, by the inequality $\sqrt{1+x}\leq 1+{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}x$ ,

\displaystyle\|\bigl{[}I+\alpha_{n+1}[{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}I+A]\bigr{]}\widetilde{\theta}_{n}^{\varrho,(2)}\|_{T}\leq\|\widetilde{\theta}_{n}^{\varrho,(2)}\|_{T}\sqrt{1-\frac{\alpha_{n+1}}{\lambda_{\circ}}+\alpha_{n+1}^{2}L^{2}}\leq\|\widetilde{\theta}_{n}^{\varrho,(2)}\|_{T}(1-\frac{\alpha_{n+1}}{2\lambda_{\circ}}+{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}\alpha_{n+1}^{2}L^{2})

Fix $n_{0}>0$ such that for $n\geq n_{0}$ ,

\displaystyle 1-\frac{\alpha_{n+1}}{2\lambda_{\circ}}+{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}\alpha_{n+1}^{2}L^{2}+\alpha_{n+1}\|\mathcal{D}_{n}\|_{T}\leq 1-\frac{\alpha_{n+1}}{4\lambda_{\circ}}

This is possible since $\|\mathcal{D}_{n}\|_{T}=O(n^{-1})$ .

Denote $\delta=\min(\frac{1}{4\lambda_{\circ}},\frac{1}{4})$ and $K=\sup_{n\geq n_{0}}\|[I+A]Z_{n+1}\|_{T}$ , which is finite because $\|Z_{n+1}\|_{T}$ converges. We obtain the following from (55)

	$\displaystyle\\|\widetilde{\theta}_{n+1}^{\varrho,(2)}\\|_{T}\leq$	$\displaystyle\\|\widetilde{\theta}_{n}^{\varrho,(2)}\\|_{T}(1-\delta\alpha_{n+1})+\alpha_{n+1}^{1/2}\alpha_{n}K$		(56)
	$\displaystyle\leq$	$\displaystyle\\|\widetilde{\theta}_{n}^{\varrho,(2)}\\|_{T}(1-\delta\alpha_{n+1})+\alpha_{n}^{3/2}K$		(56)

Apply (56) repeatedly for $n\geq n_{0}$

	$\displaystyle\\|\widetilde{\theta}_{n+1}^{\varrho,(2)}\\|_{T}\leq$	$\displaystyle\\|\widetilde{\theta}_{n_{0}}^{\varrho,(2)}\\|_{T}\prod_{k=n_{0}+1}^{n+1}(1-\delta\alpha_{k})+K\sum_{k=n_{0}}^{n}\alpha_{k}^{3/2}\prod_{l=k+1}^{n}(1-\delta\alpha_{l})$
	$\displaystyle\leq$	$\displaystyle\\|\widetilde{\theta}_{n_{0}}^{\varrho,(2)}\\|_{T}\exp(\delta)\frac{n_{0}^{\delta}}{(n+2)^{\delta}}+\frac{K\exp(\delta)}{(n+1)^{\delta}}\sum_{k=n_{0}}^{n}k^{-\frac{3}{2}+\delta}$

where $\sum_{k=1}^{\infty}k^{-\frac{3}{2}+\delta}<\infty$ for $\delta\leq 1/4$ . Therefore, $\|\widetilde{\theta}_{n}^{\varrho,(2)}\|_{T}\rightarrow 0$ at rate at least $n^{-\delta}$ .

The desired conclusion follows: letting $\lambda_{\bullet}>0$ denote the smallest eigenvalue of $T$ ,

\Sigma_{n}^{\varrho,(2)}\leq{\sf E}[(\widetilde{\theta}_{n}^{\varrho,(2)})^{\intercal}\widetilde{\theta}_{n}^{\varrho,(2)}]I\leq\frac{1}{\lambda_{\bullet}}\|\widetilde{\theta}_{n}^{\varrho,(2)}\|_{T}^{2}I

∎

(ii)

Multiplying both sides of (35b) by $v^{\intercal}$ gives

	$\displaystyle v^{\intercal}\widetilde{\theta}_{n+1}^{\varrho_{0},(2)}$	$\displaystyle=v^{\intercal}\widetilde{\theta}_{n}^{\varrho_{0},(2)}+\alpha_{n+1}\bigl{[}[\varepsilon_{v}(n,\varrho_{0})+ui]v^{\intercal}\widetilde{\theta}_{n}^{\varrho_{0},(2)}-(1-\varrho_{0}+ui)\alpha_{n}(n+1)^{\varrho_{0}}v^{\intercal}Z_{n+1}\bigr{]}$		(57)
		$\displaystyle=\bigl{[}1+\alpha_{n+1}[\varepsilon_{v}(n,\varrho_{0})+ui]\bigr{]}v^{\intercal}\widetilde{\theta}_{n}^{\varrho_{0},(2)}-(1-\varrho_{0}+ui)\alpha_{n}\alpha_{n+1}(n+1)^{\varrho_{0}}v^{\intercal}Z_{n+1}$		(57)

With $\|v^{\intercal}\widetilde{\theta}_{n}^{\varrho_{0},(2)}\|_{2}\mathbin{:=}\sqrt{{\sf E}[|v^{\intercal}\widetilde{\theta}_{n}^{\varrho_{0},(2)}|^{2}]}$ , we obtain the following from (57) by the triangle inequality

\displaystyle\|v^{\intercal}\widetilde{\theta}_{n+1}^{\varrho_{0},(2)}\|_{2}\leq\bigl{|}1+\alpha_{n+1}[\varepsilon_{v}(n,\varrho_{0})+ui]\bigr{|}\|v^{\intercal}\widetilde{\theta}_{n}^{\varrho_{0},(2)}\|_{2}+\bigl{|}1-\varrho_{0}+ui\bigr{|}\alpha_{n}\alpha_{n+1}(n+1)^{\varrho_{0}}\|v^{\intercal}Z_{n+1}\|_{2}

(58)

By the inequality $\sqrt{1+x}\leq 1+{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}x$ , we have

\bigl{|}1+\alpha_{n+1}\varepsilon_{v}(n,\varrho_{0})+\alpha_{n+1}ui\bigr{|}\leq 1+\alpha_{n+1}\varepsilon_{v}(n,\varrho_{0})+{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}\alpha_{n+1}^{2}\varepsilon_{v}(n,\varrho_{0})^{2}+{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}\alpha_{n+1}^{2}u^{2}

Fix $n_{0}>0$ such that for $n\geq n_{0}$ ,

1+\alpha_{n+1}\varepsilon_{v}(n,\varrho_{0})+{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}\alpha_{n+1}^{2}\varepsilon_{v}(n,\varrho_{0})^{2}+{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}\alpha_{n+1}^{2}u^{2}\leq 1+\alpha_{n+1}^{3/2}

which is possible since $\varepsilon_{v}(n,\varrho_{0})=O(n^{-1})$ . With $K=\sup_{n\geq n_{0}}|1-\varrho_{0}+ui|\|v^{\intercal}Z_{n+1}\|_{2}$ , we obtain the following bound from (58):

\|v^{\intercal}\widetilde{\theta}_{n+1}^{\varrho_{0},(2)}\|_{2}\leq(1+\alpha_{n+1}^{3/2})\|v^{\intercal}\widetilde{\theta}_{n}^{\varrho_{0},(2)}\|_{2}+\alpha_{n}^{2-\varrho_{0}}K

(59)

Iterating (59) gives,

	$\displaystyle\\|v^{\intercal}\widetilde{\theta}_{n+1}^{\varrho_{0},(2)}\\|_{2}$	$\displaystyle\leq\\|v^{\intercal}\widetilde{\theta}_{n_{0}}^{\varrho_{0},(2)}\\|_{2}\prod_{k=n_{0}+1}^{n+1}(1+\alpha_{k}^{3/2})+K\sum_{k=n_{0}}^{n}\alpha_{k}^{2-\varrho_{0}}\prod_{l=k+1}^{n}(1+\alpha_{l}^{3/2})$
		$\displaystyle\leq\\|v^{\intercal}\widetilde{\theta}_{n_{0}}^{\varrho_{0},(2)}\\|_{2}\exp(\sum_{k=n_{0}+1}^{n+1}k^{-2/3})+K\sum_{k=n_{0}}^{n}k^{-2+\varrho_{0}}\exp(\sum_{l=k+1}^{n}l^{-3/2})$

$\limsup_{n\rightarrow\infty}\|v^{\intercal}\widetilde{\theta}_{n}^{\varrho_{0},(2)})\|_{2}<\infty$ , since it is assumed that $\varrho_{0}<{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}$ . ∎

A.4 Proof of Prop. 2.12

(i)

Since $\Delta_{n+2}^{m}$ is uncorrelated with $\widetilde{\theta}_{n}^{(1)}$ , the following recursion follows from (30a):

\text{\rm Cov}\,(\theta_{n+1}^{(1)})=\text{\rm Cov}\,(\theta_{n}^{(1)})+\alpha_{n+1}\Bigl{[}\text{\rm Cov}\,(\theta_{n}^{(1)})A^{\intercal}+A\text{\rm Cov}\,(\theta_{n}^{(1)})+\alpha_{n+1}[A\text{\rm Cov}\,(\theta_{n}^{(1)})A^{\intercal}+\Sigma_{\Delta_{n+2}}]\Bigr{]}

Take $\varrho=1/2$ in the definition of $\widetilde{\theta}^{\varrho,(1)}$ and $\Sigma_{n}^{\varrho,(1)}={\sf E}[\widetilde{\theta}^{\varrho,(1)}(\widetilde{\theta}^{\varrho,(1)})^{\intercal}]=n\text{\rm Cov}\,(\theta_{n}^{(1)})$ . Multiplying each side of the equation by $n+1$ gives

\Sigma_{n+1}^{\varrho,(1)}=\Sigma_{n}^{\varrho,(1)}+\alpha_{n+1}\Bigl{[}(1+\frac{1}{n})\bigl{[}\Sigma_{n}^{\varrho,(1)}+\Sigma_{n}^{\varrho,(1)}A^{\intercal}+A\Sigma_{n}^{\varrho,(1)}\bigr{]}+\frac{1}{n}A\Sigma_{n}^{\varrho,(1)}A^{\intercal}+\Sigma_{\Delta_{n+2}}\Bigr{]}

(60)

Recall that $\Sigma_{\theta}$ solves the Laypunov equation $\Sigma+\Sigma A^{\intercal}+A\Sigma+\Sigma_{\Delta}=0$ . Denoting $E_{n}=\Sigma_{n}^{\varrho,(1)}-\Sigma_{\theta}$ , the following identity holds

\Sigma_{n}^{\varrho,(1)}+\Sigma_{n}^{\varrho,(1)}A^{\intercal}+A\Sigma_{n}^{\varrho,(1)}=E_{n}+E_{n}A^{\intercal}+AE_{n}-\Sigma_{\Delta}

Subtracting $\Sigma_{\theta}$ from both sides of (60) gives the recursion

	$\displaystyle E_{n+1}=E_{n}+\alpha_{n+1}\Bigl{[}$	$\displaystyle(1+\frac{1}{n})\bigl{[}E_{n}+E_{n}A^{\intercal}+AE_{n}\bigr{]}+\frac{1}{n}AE_{n}A^{\intercal}$		(61)
		$\displaystyle+\frac{1}{n}A\Sigma_{\theta}A^{\intercal}-\frac{1}{n}\Sigma_{\Delta}-\Sigma_{\Delta}+\Sigma_{\Delta_{n+2}}\Bigr{]}$		(61)

Similar to the decomposition in (30), we have $E_{n}=E_{n}^{(1)}+E_{n}^{(2)}$ , each evolving as


$\displaystyle E_{n+1}^{(1)}$	$\displaystyle=E_{n}^{(1)}+\alpha_{n+1}\Bigl{[}(1+\frac{1}{n})\bigl{[}E_{n}^{(1)}+E_{n}^{(1)}A^{\intercal}+AE_{n}^{(1)}\bigr{]}+\frac{1}{n}AE_{n}^{(1)}A^{\intercal}+\frac{1}{n}\bigl{[}A\Sigma_{\theta}A^{\intercal}-\Sigma_{\Delta}\bigr{]}\Bigr{]}$	(62a)
$\displaystyle E_{n+1}^{(2)}$	$\displaystyle=E_{n}^{(2)}+\alpha_{n+1}\Bigl{[}(1+\frac{1}{n})\bigl{[}E_{n}^{(2)}+E_{n}^{(2)}A^{\intercal}+AE_{n}^{(2)}\bigr{]}+\frac{1}{n}AE_{n}^{(2)}A^{\intercal}+\Sigma_{\Delta_{n+2}}-\Sigma_{\Delta}\Bigr{]}$	(62b)

Since $\Sigma_{\Delta_{n+2}}-\Sigma_{\Delta}$ converges to zero geometrically fast, $\{E_{n}^{(1)}\}$ converges to zero faster than $\{E_{n}^{(2)}\}$ .

Multiplying each side of (62a) by $n+1$ gives

	$\displaystyle(n+1)E_{n+1}^{(1)}$	$\displaystyle=(n+1)E_{n}^{(1)}+(1+\frac{1}{n})\bigl{[}E_{n}^{(1)}+E_{n}^{(1)}A^{\intercal}+AE_{n}^{(1)}\bigr{]}+\frac{1}{n}\bigl{[}AE_{n}^{(1)}A^{\intercal}+A\Sigma_{\theta}A^{\intercal}-\Sigma_{\Delta}\bigr{]}$
		$\displaystyle=nE_{n}^{(1)}+\frac{1}{n}\Bigl{[}(1+\frac{1}{n})\bigl{[}2nE_{n}^{(1)}+nE_{n}^{(1)}A^{\intercal}+AnE_{n}^{(1)}\bigr{]}+A\Sigma_{\theta}A^{\intercal}-\Sigma_{\Delta}+\mathcal{E}_{n}^{\bullet,(1)}\Bigr{]}$

with the error term $\mathcal{E}_{n}^{\bullet,(1)}=AE_{n}^{(1)}A^{\intercal}-E_{n}$ . Note that $A\Sigma_{\theta}A^{\intercal}-\Sigma_{\Delta}=[A+I]\Sigma_{\theta}[A+I]^{\intercal}$ is positive definite.

The recursion for $\{nE_{n}^{(1)}\}$ is treated as in the proof of Prop. 2.9 (i). Consider the matrix ODE,

{\mathchoice{\genfrac{}{}{}{1}{d}{dt}}{\genfrac{}{}{}{1}{d}{dt}}{\genfrac{}{}{}{3}{d}{dt}}{\genfrac{}{}{}{3}{d}{dt}}}\mathcal{X}(t)=(1+e^{-t})[2\mathcal{X}(t)+\mathcal{X}(t)A^{\intercal}+A\mathcal{X}(t)]+A\Sigma_{\theta}A^{\intercal}-\Sigma_{\Delta}+e^{-t}[A\mathcal{X}(t)A^{\intercal}-\mathcal{X}(t)]

(63)

Let $t_{n}=\sum_{k=1}^{n}1/k$ and let $\mathcal{X}^{n}(t)$ denote the solution to this ODE on $[t_{n},\infty)$ with $\mathcal{X}^{n}(t_{n})=nE_{n}^{(1)}$ , $t\geq t_{n}$ , for any $n\geq 1$ . We then obtain as previously,

\sup_{k\geq n}\|\mathcal{X}^{n}(t_{k})-kE_{k}^{(1)}\|=O(1/n)

Recall that $\Sigma_{\sharp}^{(1)}\geq 0$ is the solution to the Lyapunov equation (45). Exponential convergence of $\mathcal{X}$ to $\Sigma_{\sharp}^{(1)}$ implies convergence of $\{nE_{n}^{(1)}\}$ at rate $1/n^{\delta}$ for $\delta=\delta(A+I,\Sigma_{\Delta})>0$ . Therefore, $nE_{n}=\Sigma_{\sharp}^{(1)}+O(n^{-\delta})$ .

Given $\text{\rm Cov}\,(\theta_{n}^{(1)})=n^{-1}\Sigma_{\theta}+n^{-1}E_{n}$ , we have

\text{\rm Cov}\,(\theta_{n}^{(1)})=n^{-1}\Sigma_{\theta}+n^{-2}\Sigma_{\sharp}^{(1)}+O(n^{-2-\delta})

∎

(ii)

We focus on $R_{n}^{(2,1),(1)}$ since $R_{n}^{(1),(2,1)}=[R_{n}^{(2,1),(1)}]^{\intercal}$ . Recall the update forms of $\widetilde{\theta}_{n}^{(1)}$ and $\widetilde{\theta}_{n}^{(2,1)}$ in (30a) and (42a) respectively, where $\widetilde{\theta}_{n}^{(1)}$ is uncorrelated with the martingale difference sequence $\{\widehat{\Delta}_{n+k}^{m}\}$ for $k\geq 2$ and $\widetilde{\theta}_{n}^{(2,1)}$ is uncorrelated with $\{\Delta_{n+k}^{m}\}$ for $k\geq 2$ . With $R_{n}^{(2,1),(1)}={\sf E}[\widetilde{\theta}_{n}^{(2,1)}(\widetilde{\theta}_{n}^{(1)})^{\intercal}]$ , the following is obtained from these facts:

	$\displaystyle R_{n+1}^{(2,1),(1)}=R_{n}^{(2,1),(1)}+\alpha_{n+1}\bigl{[}R_{n}^{(2,1),(1)}A^{\intercal}$	$\displaystyle+AR_{n}^{(2,1),(1)}+\alpha_{n+1}AR_{n}^{(2,1),(1)}A^{\intercal}$
		$\displaystyle-\alpha_{n}\alpha_{n+1}[I+A]\text{\rm Cov}\,(\widehat{\Delta}_{n+2}^{m},\Delta_{n+2}^{m})\bigr{]}$

Denote $C_{n}=nR_{n}^{(2,1),(1)}$ . Multiplying both sides of the previous equation by $n+1$ gives

C_{n+1}=C_{n}+\alpha_{n+1}\bigl{[}(1+n^{-1})[C_{n}+C_{n}A^{\intercal}+AC_{n}]+\alpha_{n}AC_{n}A^{\intercal}-\alpha_{n}[I+A]\text{\rm Cov}\,(\widehat{\Delta}_{n+2}^{m},\Delta_{n+2}^{m})\bigr{]}

Multiplying each side of this equation by $n+1$ once more results in

	$\displaystyle(n+1)C_{n+1}$	$\displaystyle=(n+1)C_{n}+(1+n^{-1})[C_{n}+C_{n}A^{\intercal}+AC_{n}]+\alpha_{n}AC_{n}A^{\intercal}-\alpha_{n}[I+A]\text{\rm Cov}\,(\widehat{\Delta}_{n+2}^{m},\Delta_{n+2}^{m})$
		$\displaystyle=nC_{n}+\alpha_{n}\bigl{[}(1+n^{-1})[2nC_{n}+nC_{n}A^{\intercal}+AnC_{n}]-[I+A]\text{\rm Cov}\,_{\pi}(\widehat{\Delta}_{n+2}^{m},\Delta_{n+2}^{m})+\mathcal{D}_{n+1}^{(2)}\bigr{]}$

where the error term $\mathcal{D}_{n+1}^{(2)}$ consists of two components: $[I+A][\text{\rm Cov}\,_{\pi}(\widehat{\Delta}_{n+2}^{m},\Delta_{n+2}^{m})-\text{\rm Cov}\,(\widehat{\Delta}_{n+2}^{m},\Delta_{n+2}^{m})]$ that converges to zero at a geometric rate and $AC_{n}A^{\intercal}-C_{n}$ .

As previously, this is approximated by the linear system

	$\displaystyle{\mathchoice{\genfrac{}{}{}{1}{d}{dt}}{\genfrac{}{}{}{1}{d}{dt}}{\genfrac{}{}{}{3}{d}{dt}}{\genfrac{}{}{}{3}{d}{dt}}}\mathcal{X}(t)=$	$\displaystyle(1+e^{-t})[2\mathcal{X}(t)+\mathcal{X}(t)A^{\intercal}+A\mathcal{X}(t)]+e^{-t}[A\mathcal{X}(t)A^{\intercal}-\mathcal{X}(t)]$		(64)
		$\displaystyle\qquad-[I+A]\text{\rm Cov}\,_{\pi}(\widehat{\Delta}_{n+2}^{m},\Delta_{n+2}^{m}))$		(64)

With the same argument used in (i), $\{nC_{n}+nC_{n}^{\intercal}\}$ converges to $\Sigma_{\sharp}^{(2)}$ in (46) at rate $1/n^{\delta}$ for $\delta=\delta(A+I)>0$ . Therefore, $nC_{n}+nC_{n}^{\intercal}=\Sigma_{\sharp}^{(2)}+O(n^{-\delta})$ and $R_{n}^{(2,1),(1)}=n^{-2}C_{n}=n^{-2}\Sigma_{\infty,C}+O(n^{-2-\delta})$ . ∎

(iii)

The third claim in Prop. 2.12 is established through a sequence of lemmas. Start with the representation of $\widetilde{\theta}_{n+1}^{(3)}$ based on (40):

\widetilde{\theta}_{n+1}^{(3)}=-\frac{1}{n+1}Z_{n+2}=-\frac{1}{n+1}\widehat{\Delta}_{n+3}^{m}+\frac{1}{n+1}(\widehat{Z}_{n+3}-\widehat{Z}_{n+2})

Since $\widehat{\Delta}_{n+3}^{m}$ is uncorrelated with the sequence $\{\widetilde{\theta}_{k}^{(1)}\}$ for $k\leq n+1$ , we have

{\sf E}[\widetilde{\theta}_{n+1}^{(1)}(\widehat{\Delta}_{n+3}^{m})^{\intercal}]=0

(65)

Hence it suffices to consider the correlation between $\widetilde{\theta}_{n+1}^{(1)}$ and $\widehat{Z}_{n+3}-\widehat{Z}_{n+2}$ . The formula for $\widetilde{\theta}_{n+1}^{(1)}$ for $n\geq 1$ is

\widetilde{\theta}_{n+1}^{(1)}=\prod_{k=1}^{n+1}[I+\alpha_{k}A]\widetilde{\theta}_{0}+\sum_{k=1}^{n+1}\alpha_{k}\prod_{l=k+1}^{n+1}[I+\alpha_{l}A]\Delta_{k+1}^{m}

(66)

$\widetilde{\theta}_{0}{\sf E}[\widehat{Z}_{n+3}^{\intercal}-\widehat{Z}_{n+2}^{\intercal}]$ converges to zero geometrically fast under $V$ -uniform ergodicity of $\Phi$ . Then we consider the expectation of the following:

		$\displaystyle\sum_{k=1}^{n+1}\alpha_{k}\prod_{l=k+1}^{n+1}[I+\alpha_{l}A]\Delta_{k+1}^{m}[\widehat{Z}_{n+3}^{\intercal}-\widehat{Z}_{n+2}^{\intercal}]$		(67)
	$\displaystyle=$	$\displaystyle\sum_{k=1}^{n+1}\alpha_{k}\prod_{l=k+1}^{n+1}[I+\alpha_{l}A]\bigl{[}\Delta_{k+2}^{m}\widehat{Z}_{n+3}^{\intercal}-\Delta_{k+1}^{m}\widehat{Z}_{n+2}^{\intercal}\bigr{]}+\sum_{k=1}^{n+1}\alpha_{k}\prod_{l=k+1}^{n+1}[I+\alpha_{l}A]\bigl{[}\Delta_{k+1}^{m}-\Delta_{k+2}^{m}\bigr{]}\widehat{Z}_{n+3}^{\intercal}$		(67)

The definition of $T$ is now based on the assumption that $I+A$ is Hurwitz: $T>0$ is the unique solution to the Lyapunov equation:

[A+I]T+T[A+I]^{\intercal}+I=0

As previously, we denote $\|W\|_{T}^{2}={\sf E}[W^{\intercal}TW]$ for a random vector $W$ , and denote by $\|M\|_{T}$ the induced operator norm of a matrix $M\in\mathbb{R}^{d\times d}$ . In the following result the vector $W$ is taken to be deterministic.

Lemma A.3.

Suppose the matrix $I+A$ is Hurwitz. Then there exists constant $K$ such that the following holds for any $k\geq 1$ and all $n\geq k$

\bigl{\|}\prod_{l=k+1}^{n+1}[I+\alpha_{l}A]\bigr{\|}_{T}\leq K\frac{k}{n+2}

Proof.

For any vector $W\in\mathbb{R}^{d}$ and $l\geq 1$ , we have

	$\displaystyle\\|[I+\alpha_{l}A]W\\|_{T}^{2}$	$\displaystyle=W^{\intercal}[T-2\alpha_{l}T-\alpha_{l}I+\alpha_{l}^{2}A^{\intercal}TA]W$
		$\displaystyle\leq W^{\intercal}[TT-2\alpha_{l}T+\alpha_{l}^{2}A^{\intercal}TA]W$
		$\displaystyle\leq(1-2\alpha_{l}+\alpha_{l}^{2}L^{2})\\|W\\|_{T}^{2}$

where $L=\|A\|_{T}$ . Hence

\|I+\alpha_{l}A\|_{T}\leq\sqrt{1-2\alpha_{l}+\alpha_{l}^{2}L^{2}}\leq 1-\alpha_{l}+{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}\alpha_{l}^{2}L^{2}

Lemma A.1 completes the proof:

\displaystyle\bigl{\|}\prod_{l=k+1}^{n+1}[I+\alpha_{l}A]\bigr{\|}_{T}\leq\prod_{l=k+1}^{n+1}\|[I+\alpha_{l}A]\|_{T}\leq\prod_{l=k+1}^{n+1}[1-\alpha_{l}+{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}L^{2}\alpha_{l}^{2}]\leq K_{\ref{t:prod-n}}\frac{k}{n+2}

∎

To analyze ${\sf E}[\Delta_{k+2}^{m}\widehat{Z}_{n+3}^{\intercal}]$ , consider the bivariate Markov chain $\Phi_{n}^{*}=(\Phi_{n},\Phi_{n+1})$ , $n\geq 0$ , with state space ${\sf Z}^{*}={\sf Z}\times{\sf Z}$ . An associated weighting function $V^{*}:{\sf Z}\times{\sf Z}\rightarrow[1,\infty)$ is defined as $V^{*}(z,z^{\prime})=V(z)+V(z^{\prime})$ .

Denote function $h^{k+1,n+2}:{\sf Z}^{*}\rightarrow\mathbb{R}^{d\times d}$ as $h^{k+1,n+2}(z^{\prime},z^{\prime\prime})=({\hat{f}}(z^{\prime\prime})-{\sf E}[{\hat{f}}(\Phi_{k+1})\mid\Phi_{k}=z^{\prime}]){\sf E}[\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}(\Phi_{n+2})^{\intercal}\mid\Phi_{k+1}=z^{\prime\prime}]$ and $h_{i,j}^{k+1,n+2}:{\sf Z}^{*}\rightarrow\mathbb{R}$ as the $(i,j)$ -th entry of $h^{k+1,n+2}$ for $1\leq i,j\leq d$ . Note that $h^{k+1,n+2}(\Phi_{k},\Phi_{k+1})={\sf E}[\Delta_{k+1}^{m}\widehat{Z}_{n+2}\mid\mathcal{F}_{k+1}]$

Lemma A.4.

Suppose Assumptions (A1) and (A3) hold. For each $1\leq i,j\leq d$ ,

(i)

$h_{i,j}^{k+1,n+2}\in L_{\infty}^{V^{*}}$ , moreover there exists constant $B$ such that

\|h_{i,j}^{k+1,n+2}\|_{V^{*}}\leq B\|{\hat{f}}_{i}\|_{\scriptscriptstyle\sqrt{V}}\|\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}_{j}\|_{\scriptscriptstyle\sqrt{V}}\rho^{n-k+1}

(ii)

Consequently, there exists constant $B^{\prime}$ such that

\bigl{|}{\sf E}[h_{i,j}^{k+1,n+2}(\Phi_{k},\Phi_{k+1})\mid\Phi_{0}=z]-\pi\bigl{(}h_{i,j}^{k+1,n+2}\bigr{)}\bigr{|}\leq B^{\prime}\|{\hat{f}}_{i}\|_{\scriptscriptstyle\sqrt{V}}\|\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}_{j}\|_{\scriptscriptstyle\sqrt{V}}V(z)\rho^{n+1}

Proof.

By the definition of $V^{*}$ -norm,

	$\displaystyle\\|h_{i,j}^{k+1,n+2}\\|_{V^{*}}=$	$\displaystyle\sup_{z^{\prime},z^{\prime\prime}\in{\sf Z}}\frac{\bigl{\|}\bigl{[}{\hat{f}}_{i}(z^{\prime\prime})+{\sf E}[{\hat{f}}_{i}(\Phi_{k+1})\mid\Phi_{k}=z^{\prime}]\bigr{]}{\sf E}[\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}_{j}(\Phi_{n+2})\mid\Phi_{k+1}=z^{\prime\prime}]\bigr{\|}}{V(z^{\prime})+V(z^{\prime\prime})}$
	$\displaystyle\leq$	$\displaystyle\sup_{z^{\prime\prime}\in{\sf Z}}\frac{\bigl{\|}{\hat{f}}_{i}(z^{\prime\prime}){\sf E}[\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}_{j}(\Phi_{n+2})\mid\Phi_{k+1}=z^{\prime\prime}]\bigr{\|}}{V(z^{\prime\prime})}$
		$\displaystyle+\sup_{z^{\prime},z^{\prime\prime}\in{\sf Z}}\frac{\bigl{\|}{\sf E}[{\hat{f}}_{i}(\Phi_{k+1})\mid\Phi_{k}=z^{\prime}]{\sf E}[\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}_{j}(\Phi_{n+2})\mid\Phi_{k+1}=z^{\prime\prime}]\bigr{\|}}{V(z^{\prime})+V(z^{\prime\prime})}$

Given $\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}_{j}^{2}\in L_{\infty}^{V}$ and the $\sqrt{V}$ -uniform ergodicity of $\Phi$ [29, Lemma 15.2.9], there exists constant $B_{\scriptscriptstyle\sqrt{V}}<\infty$ such that

\bigl{|}{\sf E}[\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}_{j}(\Phi_{n+2})\mid\Phi_{k+1}=z^{\prime\prime}]\bigr{|}\leq B_{\scriptscriptstyle\sqrt{V}}\|\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}_{j}\|_{\scriptscriptstyle\sqrt{V}}\sqrt{V(z^{\prime\prime})}\rho^{n+1-k}

Consequently,

\sup_{z^{\prime\prime}\in{\sf Z}}\frac{|{\hat{f}}_{i}(z^{\prime\prime})\bigl{[}{\sf E}[\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}_{j}(\Phi_{n+2})\mid\Phi_{k+1}=z^{\prime\prime}]|}{V(z^{\prime\prime})}\leq\|{\hat{f}}_{i}\|_{\scriptscriptstyle\sqrt{V}}B_{\scriptscriptstyle\sqrt{V}}\|\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}_{j}\|_{\scriptscriptstyle\sqrt{V}}\rho^{n+1-k}

(68)

By the inequality $V(z^{\prime})+V(z^{\prime\prime})\geq\sqrt{V(z^{\prime})V(z^{\prime\prime})}$ and the $\sqrt{V}$ -uniform ergodicity of $\Phi$ once more, we have

		$\displaystyle\sup_{z^{\prime},z^{\prime\prime}\in{\sf Z}}\frac{\bigl{\|}{\sf E}[{\hat{f}}_{i}(\Phi_{k+1})\mid\Phi_{k}=z^{\prime}]{\sf E}[\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}_{j}(\Phi_{n+2})\mid\Phi_{k+1}=z^{\prime\prime}]\bigr{\|}}{V(z^{\prime})+V(z^{\prime\prime})}$		(69)
	$\displaystyle\leq$	$\displaystyle\sup_{z^{\prime}\in{\sf Z}}\frac{\bigl{\|}{\sf E}[{\hat{f}}_{i}(\Phi_{k+1})\mid\Phi_{k}=z^{\prime}]\bigr{\|}}{\sqrt{V(z^{\prime})}}\sup_{z^{\prime\prime}\in{\sf Z}}\frac{\bigl{\|}{\sf E}[\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}_{j}(\Phi_{n+2})\mid\Phi_{k+1}=z^{\prime\prime}]\bigr{\|}}{\sqrt{V(z^{\prime\prime})}}\leq B_{\scriptscriptstyle\sqrt{V}}^{2}\\|{\hat{f}}_{i}\\|_{\scriptscriptstyle\sqrt{V}}\\|\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}_{j}\\|_{\scriptscriptstyle\sqrt{V}}\rho^{n+2-k}$		(69)

Combining (68) and (69) gives

\|h_{i,j}^{k+1,n+2}\|_{V^{*}}\leq B\|{\hat{f}}_{i}\|_{\scriptscriptstyle\sqrt{V}}\|\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}_{j}\|_{\scriptscriptstyle\sqrt{V}}\rho^{n+1-k}

(70)

with $B=B_{\scriptscriptstyle\sqrt{V}}+B_{\scriptscriptstyle\sqrt{V}}^{2}$ .

For (ii), denote $g_{i,j}^{k,n+2}:{\sf Z}\rightarrow\mathbb{R}$ by the conditional expectation:

g_{i,j}^{k,n+2}(z)={\sf E}[h_{i,j}^{k+1,n+2}(\Phi_{k},\Phi_{k+1})\mid\Phi_{k}=z]

This is bounded by a constant times $V^{*}$ :

	$\displaystyle\|g_{i,j}^{k,n+2}(z)\|=\Bigl{\|}\int h_{i,j}^{k+1,n+2}(z,z^{\prime})P(z,dz^{\prime})\Bigr{\|}$	$\displaystyle\leq\Bigl{\|}\int\frac{h_{i,j}^{k+1,n+2}(z,z^{\prime})}{V^{}(z,z^{\prime})}V^{}(z,z^{\prime})P(z,dz^{\prime})\Bigr{\|}$
		$\displaystyle\leq\\|h_{i,j}^{k+1,n+2}\\|_{V^{*}}[V(z)+PV(z)]$

$V$ -uniform ergodicity of $\Phi$ is equivalent to the following drift condition [29, Theorem 16.0.2]: for some $\beta>0,b<\infty$ , and some “petite set” $C$ ,

PV(z)-V(z)\leq-\beta V(z)+b\mathbb{I}_{C}(z)\,,\qquad z\in{\sf Z}

Consequently,

[V(z)+PV(z)]\leq[2V(z)+b]\leq[2+|b|]V(z)

Therefore,

\|g_{i,j}^{k,n+2}\|_{V}\leq[2+|b|]\|h_{i,j}^{k+1,n+2}\|_{V^{*}}\leq[2+|b|]B\|{\hat{f}}_{i}\|_{\scriptscriptstyle\sqrt{V}}\|\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}_{j}\|_{\scriptscriptstyle\sqrt{V}}\rho^{n+1-k}

(71)

Thus $g_{i,j}^{k,n+2}\in L_{\infty}^{V}$ . By $V$ -uniform ergodicity of $\Phi$ again,

	$\displaystyle\bigl{\|}{\sf E}[g_{i,j}^{k,n+2}(\Phi_{k})\mid\Phi_{0}=z]-\pi\bigl{(}g_{i,j}^{k,n+2}\bigr{)}\bigr{\|}$	$\displaystyle\leq B_{V}\\|g_{i,j}^{k,n+2}\\|_{V}V(z)\rho^{k}$
		$\displaystyle\leq B^{\prime}\\|{\hat{f}}_{i}\\|_{\scriptscriptstyle\sqrt{V}}\\|\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}_{j}\\|_{\scriptscriptstyle\sqrt{V}}V(z)\rho^{n+1}$

with $B^{\prime}=[2+|b|]B_{V}B$ . The proof is then completed by applying the smoothing property of conditional expectation. ∎

Lemma A.5.

Under Assumptions (A1) and (A3), there exists $K<\infty$ such that the following hold


$\displaystyle\bigl{\\|}{\sf E}[\Delta_{k+1}^{m}\widehat{Z}_{n+3}^{\intercal}]\bigr{\\|}_{T}$	$\displaystyle\leq K\rho^{n+1-k}$	(72a)
$\displaystyle\bigl{\\|}{\sf E}[\Delta_{k+1}^{m}\widehat{Z}_{n+2}^{\intercal}]-{\sf E}[\Delta_{k+2}^{m}\widehat{Z}_{n+3}^{\intercal}]\bigr{\\|}_{T}$	$\displaystyle\leq K(1+\rho)\rho^{n+1}$	(72b)

Proof.

By the triangle inequality,

\bigl{\|}{\sf E}[\Delta_{k+1}^{m}\widehat{Z}_{n+2}^{\intercal}]\bigr{\|}_{T}\leq\bigl{\|}{\sf E}[Z_{k+1}\widehat{Z}_{n+2}^{\intercal}]\bigr{\|}_{T}+\bigl{\|}{\sf E}\bigl{[}{\sf E}[Z_{k+1}|\mathcal{F}_{k}]\widehat{Z}_{n+2}^{\intercal}\bigr{]}\bigr{\|}_{T}

where both terms admit the geometric bound in (72a) following directly from the $V$ -geometric mixing of $\Phi$ [29, Theorem 16.1.5].

For (72b), first notice that

{\sf E}[\Delta_{k+1}^{m}\widehat{Z}_{n+2}^{\intercal}]={\sf E}\bigl{[}{\sf E}[\Delta_{k+1}^{m}\widehat{Z}_{n+2}^{\intercal}\mid\mathcal{F}_{k+1}]\bigr{]}={\sf E}[h^{k+1,n+2}(\Phi_{k},\Phi_{k+1})]

With Lemma A.4, we have for each $(i,j)$ -th entry,

\Bigl{|}{\sf E}[h_{i,j}^{k+1,n+2}(\Phi_{k},\Phi_{k+1})\mid\Phi_{0}=z]-\pi\bigl{(}h_{i,j}^{k+1,n+2}\bigr{)}\Bigr{|}\leq B^{\prime}\|{\hat{f}}_{i}\|_{\scriptscriptstyle\sqrt{V}}\|\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}_{j}\|_{\scriptscriptstyle\sqrt{V}}V(z)\rho^{n+1}

With fixed initial condition $\Phi_{0}=z$ , by equivalence of matrix norms, there exists a constant $K$ such that

\Bigl{\|}{\sf E}[h^{k+1,n+2}(\Phi_{k},\Phi_{k+1})]-\pi\bigl{(}h_{i,j}^{k+1,n+2}\bigr{)}\Bigr{\|}_{T}\leq K\rho^{n+1}

(72b) then follows from the triangle inequality:

\bigl{\|}{\sf E}[\Delta_{k+1}^{m}\widehat{Z}_{n+2}^{\intercal}]-{\sf E}[\Delta_{k+2}^{m}\widehat{Z}_{n+3}^{\intercal}]\bigr{\|}_{T}\leq K\rho^{n+1}+K\rho^{n+2}=K(1+\rho)\rho^{n+1}

∎

Lemma A.6.

For fixed $\rho\in(0,1)$ , there exists $K<\infty$ such that for all $n\geq 2$ ,

\sum_{k=1}^{n-1}\frac{1}{k}\rho^{-k}\leq K\frac{\rho^{-n}}{n}

Proof.

Denote $\gamma=-\log\rho>0$ and observe that the function $t^{-1}\exp(\gamma t)$ is increasing over $[1,\infty)$ . The following holds for $n\geq 2$

\sum_{k=1}^{n-1}\frac{1}{k}\rho^{-k}=\sum_{k=1}^{n-1}\frac{1}{k}\exp(\gamma k)\leq\int_{1}^{n}t^{-1}\exp(\gamma t)dt

Now consider the integral: for any $t_{0}\in(1,n)$ ,

	$\displaystyle\int_{1}^{n}t^{-1}\exp(\gamma t)dt$	$\displaystyle\leq\int_{1}^{t_{0}}\exp(\gamma t)dt+\int_{t_{0}}^{n}t_{0}^{-1}\exp(\gamma t)dt$
		$\displaystyle\leq\gamma^{-1}\bigl{[}\exp(\gamma t_{0})-\exp(\gamma)+\frac{\exp(\gamma n)-\exp(\gamma t_{0})}{t_{0}}\bigr{]}$

Take $t_{0}=n-\sqrt{n}$ .

	$\displaystyle\exp(\gamma t_{0})-\exp(\gamma)+\frac{\exp(\gamma n)-\exp(\gamma t_{0})}{t_{0}}$	$\displaystyle=\exp(\gamma(n-\sqrt{n}))-\exp(\gamma)+\frac{\exp(\gamma n)-\exp(\gamma(n-\sqrt{n}))}{n-\sqrt{n}}$
		$\displaystyle\leq K^{\prime}n^{-1}\exp(\gamma n)$

where $K^{\prime}=\sup_{t\geq 2}t\exp(-\gamma\sqrt{t})-t\exp(\gamma-\gamma t)+[1-\exp(-\gamma\sqrt{t})]/[1-1/\sqrt{t}]$ . The proof is completed by setting $K=\gamma^{-1}K^{\prime}$ . ∎

Proof of Prop. 2.12 (iii).

Following (65), we have

R_{n+1}^{(1),(3)}={\sf E}[\widetilde{\theta}_{n+1}^{(1)}(\widetilde{\theta}_{n+1}^{(3)})^{\intercal}]=\frac{1}{n+1}{\sf E}[\widetilde{\theta}_{n+1}^{(1)}[\widehat{Z}_{n+3}-\widehat{Z}_{n+2}]^{\intercal}]

(73)

This is bounded based on (67): Lemma A.3 and (72b) indicate that there exists some constant $K$ such that

\displaystyle\sum_{k=1}^{n+1}\alpha_{k}\bigl{\|}\prod_{l=k+1}^{n+1}[I+\alpha_{l}A]\bigr{\|}_{T}\bigl{\|}{\sf E}\bigl{[}\Delta_{k+2}^{m}\widehat{Z}_{n+3}^{\intercal}-\Delta_{k+1}^{m}\widehat{Z}_{n+2}^{\intercal}\bigr{]}\bigr{\|}_{T}\leq K\rho^{n+1}

(74)

For the second term in (67), it admits a simpler form

	$\displaystyle\sum_{k=1}^{n+1}\alpha_{k}\prod_{l=k+1}^{n+1}[I+\alpha_{l}A]\bigl{[}\Delta_{k+1}^{m}-\Delta_{k+2}^{m}\bigr{]}\widehat{Z}_{n+3}^{\intercal}=$	$\displaystyle\prod_{l=2}^{n+1}[I+\alpha_{l}A]\Delta_{2}^{m}\widehat{Z}_{n+3}^{\intercal}-\frac{1}{n+1}\Delta_{n+3}^{m}\widehat{Z}_{n+3}^{\intercal}$
		$\displaystyle-\sum_{k=2}^{n+1}\alpha_{k-1}\alpha_{k}\prod_{l=k+1}^{n+1}[I+\alpha_{l}A][I+A]\Delta_{k+1}^{m}\widehat{Z}_{n+3}^{\intercal}$

where $\prod_{l=2}^{n+1}[I+\alpha_{l}A]{\sf E}[\Delta_{2}\widehat{Z}_{n+3}^{\intercal}]=O(\rho^{n})$ and ${\sf E}[\Delta_{n+3}^{m}\widehat{Z}_{n+3}^{\intercal}]$ converges to its steady-state mean. For the remaining part, Lemma A.3 and (72a) together imply that

		$\displaystyle\Bigl{\\|}\sum_{k=2}^{n+1}\alpha_{k-1}\alpha_{k}\prod_{l=k+1}^{n+1}[I+\alpha_{l}A][I+A]{\sf E}[\Delta_{k+1}^{m}\widehat{Z}_{n+3}^{\intercal}]\Bigr{\\|}_{T}$
		$\displaystyle\leq\sum_{k=2}^{n+1}\alpha_{k-1}\alpha_{k}\prod_{l=k+1}^{n+1}\\|I+\alpha_{l}A\\|_{T}\\|I+A\\|_{T}\\|{\sf E}[\Delta_{k+1}^{m}\widehat{Z}_{n+3}^{\intercal}]\\|_{T}$
		$\displaystyle\leq\frac{K^{\prime}}{n+2}\sum_{k=2}^{n+1}\alpha_{k-1}\rho^{n+1-k}$

for some constant $K^{\prime}$ . By Lemma A.6, there exists another constant $K^{\prime\prime}$ such that

\frac{K^{\prime}}{n+2}\sum_{k=2}^{n+1}\alpha_{k-1}\rho^{n-k}=\frac{K^{\prime}\rho^{n}}{n+2}\sum_{k=1}^{n}\alpha_{k}\rho^{-k}\leq\frac{K^{\prime}K^{\prime\prime}\rho}{(n+1)(n+2)}

This combined with (74) shows that

{\sf E}[\widetilde{\theta}_{n+1}^{(1)}[\widehat{Z}_{n+3}-\widehat{Z}_{n+2}]^{\intercal}]=-(n+1)^{-1}{\sf E}_{\pi}[\Delta_{n}^{m}\widehat{Z}_{n}^{\intercal}]+O(\rho^{n+1})

Following (73), we obtain the desired result:

{\sf E}[\widetilde{\theta}_{n+1}^{(1)}(\widetilde{\theta}_{n+1}^{(3)})^{\intercal}]=-\frac{1}{(n+1)^{2}}{\sf E}_{\pi}[\Delta_{n}^{m}\widehat{Z}_{n}^{\intercal}]+O((n+1)^{-3})

∎

A.5 Unbounded moments

This section is devoted to the proof that $\lim_{n\rightarrow\infty}{\sf E}[|v^{\intercal}\widetilde{\theta}_{n}^{\varrho}|^{2}]=\infty$ for $\varrho>\varrho_{0}$ (see Thm. 2.4 (ii)). Since it suffices to show the result holds for $\varrho_{0}<\varrho<{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}$ , we assume $\varrho<{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}$ throughout. Recall that $\lambda=-\varrho_{0}+ui$ .

Consider the update of $\widetilde{\theta}_{n}^{\varrho}$ in (33). With $v^{\intercal}[\lambda I-A]=0$ , we have $v^{\intercal}[\varrho_{n}I+A_{n}]=v^{\intercal}[\varrho-\varrho_{0}+\varepsilon_{v}(n,\varrho)+ui]$ . Multiplying each side of (33) by $v^{\intercal}$ gives

	$\displaystyle v^{\intercal}\widetilde{\theta}_{n+1}^{\varrho}$	$\displaystyle=v^{\intercal}\widetilde{\theta}_{n}^{\varrho}+\alpha_{n+1}\bigl{[}[\varrho-\varrho_{0}+\varepsilon_{v}(n,\varrho)+ui]v^{\intercal}\widetilde{\theta}_{n}^{\varrho}+(n+1)^{\varrho}v^{\intercal}\Delta_{n+1}\bigr{]}$
		$\displaystyle=[1+\alpha_{n+1}\tilde{\varrho}_{n+1}+\alpha_{n+1}ui]v^{\intercal}\widetilde{\theta}_{n}^{\varrho}+(n+1)^{\varrho-1}v^{\intercal}\Delta_{n+1}$

with $\tilde{\varrho}_{n+1}=\varrho-\varrho_{0}+\varepsilon_{v}(n,\varrho)$ . Note that $\tilde{\varrho}_{n+1}$ is strictly positive for sufficiently large $n$ .

For a fixed but arbitrary $n_{0}$ and each $n\geq n_{0}$ , we have

$\displaystyle v^{\intercal}\widetilde{\theta}_{n+1}^{\varrho}$	$\displaystyle=v^{\intercal}\widetilde{\theta}_{n_{0}}^{\varrho}\prod_{k=n_{0}+1}^{n+1}[1+\alpha_{k}\tilde{\varrho}_{k}+\alpha_{k}ui]+\sum_{k=n_{0}+1}^{n+1}k^{\varrho-1}v^{\intercal}\Delta_{k}\prod_{l=k+1}^{n+1}[1+\alpha_{l}\tilde{\varrho}_{l}+\alpha_{l}ui]$	(75)
	$\displaystyle=\Bigl{[}\prod_{k=n_{0}+1}^{n+1}[1+\alpha_{k}\tilde{\varrho}_{k}+\alpha_{k}ui]\Bigr{]}\cdot\Bigl{[}v^{\intercal}\widetilde{\theta}_{n_{0}}^{\varrho}+\sum_{k=n_{0}+1}^{n+1}\frac{k^{\varrho-1}}{\prod_{l=n_{0}+1}^{k}[1+\alpha_{l}\tilde{\varrho}_{l}+\alpha_{l}ui]}v^{\intercal}\Delta_{k}\Bigr{]}$
	$\displaystyle=\Bigl{[}\prod_{k=n_{0}+1}^{n+1}[1+\alpha_{k}\tilde{\varrho}_{k}+\alpha_{k}ui]\Bigr{]}\cdot\Bigl{[}v^{\intercal}\widetilde{\theta}_{n_{0}}^{\varrho}+\sum_{k=n_{0}+1}^{n+1}\beta_{k}v^{\intercal}\Delta_{k}\Bigr{]}$

with $\beta_{n}=n^{\varrho-1}/\prod_{l=n_{0}+1}^{n}[1+\alpha_{l}\tilde{\varrho}_{l}+\alpha_{l}ui]$ .

The analysis of $\{v^{\intercal}\widetilde{\theta}_{n}^{\varrho}\}$ is mainly based on the random series appearing in (75), which requires the following three preliminary results:

Lemma A.7.

There exists some $n_{0}$ such that for each $n>n_{0}$ ,

|\beta_{n}-\beta_{n+1}|^{2}\leq 4|\beta_{n+1}|^{2}\alpha_{n}^{2}(1+u^{2})

Proof.

Note that $|\beta_{n}-\beta_{n+1}|^{2}=|\beta_{n+1}|^{2}|\beta_{n}/\beta_{n+1}-1|^{2}$ , so it is sufficient to bound the second factor:

	$\displaystyle\|\beta_{n}/\beta_{n+1}-1\|^{2}$	$\displaystyle=\|(1+n^{-1})^{1-\varrho}[1+\alpha_{n+1}\tilde{\varrho}_{n+1}+\alpha_{n+1}ui]-1\|^{2}$		(76)
		$\displaystyle=\|(1+n^{-1})^{1-\varrho}[1+\alpha_{n+1}\tilde{\varrho}_{n+1}]-1+(1+n^{-1})^{1-\varrho}\alpha_{n+1}ui\|^{2}$		(76)

Consider the real part in (76): since $\varepsilon_{v}(n,\varrho)=O(n^{-1})$ , there exists $n_{0}$ such that $|\varepsilon_{v}(n,\varrho)|\leq\varrho-\varrho_{0}$ and $\tilde{\varrho}_{n+1}=\varrho-\varrho_{0}+\varepsilon_{v}(n,\varrho)>0$ for $n\geq n_{0}$ . Consequently,

	$\displaystyle 0\leq(1+n^{-1})^{1-\varrho}[1+\alpha_{n+1}\tilde{\varrho}_{n+1}]-1$	$\displaystyle<(1+n^{-1})[1+\alpha_{n+1}\tilde{\varrho}_{n+1}]-1$
		$\displaystyle\leq n^{-1}(1+\tilde{\varrho}_{n+1}+\alpha_{n+1}\tilde{\varrho}_{n+1})$

Given $0<\varrho-\varrho_{0}<{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}$ , we can increase $n_{0}$ if necessary, such that $1+\tilde{\varrho}_{n+1}+\alpha_{n+1}\tilde{\varrho}_{n+1}\leq 2$ for $n\geq n_{0}$ . Then we have

(1+n^{-1})^{1-\varrho}[1+\alpha_{n+1}\tilde{\varrho}_{n+1}]-1\leq 2\alpha_{n}

For the imaginary part, observe that

(1+n^{-1})^{1-\varrho}\alpha_{n+1}u=\alpha_{n}\frac{n^{\varrho}}{(n+1)^{\varrho}}u\leq 2u\alpha_{n}

The proof is completed by summing the bounds for the real and imaginary parts. ∎

Lemma A.8.

Suppose Assumptions A1 and A3 hold. With each $n_{0}\geq 1$ , the random series $\sum_{k=n_{0}+1}^{\infty}\beta_{k}v^{\intercal}\Delta_{k}$ converges a.s..

Proof.

Decompose the series into the sum of a martingale difference and telescoping sequence. The martingale difference sequence converges almost surely given $\{\beta_{n}\}\in\ell_{2}$ ; the telescoping series is absolutely convergent by Lemma A.7. ∎

Lemma A.9.

Suppose Assumptions A1 and A3 hold. Denote $Z_{n}^{v}=v^{\intercal}Z_{n}=v^{\intercal}{\hat{f}}(\Phi_{n})$ . There exists a deterministic constant $K>0$ , such that for all $n_{0}$ and each sequence $\gamma\in\ell_{1}\subseteq\ell_{2}$ ,

{\sf E}\bigl{[}\hbox{\sf Var}\,(\sum_{k=n_{0}+2}^{\infty}\gamma_{k-n_{0}-1}Z_{k}^{v}\mid\mathcal{F}_{n_{0}+1})\bigr{]}\leq K\sum_{k=1}^{\infty}|\gamma_{k}|^{2}

(77)

Proof.

First recall that $\displaystyle\hbox{\sf Var}\,\bigl{(}\sum_{k=n_{0}+2}^{\infty}\gamma_{k-n_{0}-1}Z_{k}^{v}\mid\mathcal{F}_{n_{0}+1}\bigr{)}\leq{\sf E}\bigl{[}|\sum_{k=n_{0}+2}^{\infty}\gamma_{k-n_{0}-1}Z_{k}^{v}|^{2}\mid\mathcal{F}_{n_{0}+1}\bigr{]}$ , and hence by the Markov property,

{\sf E}\bigl{[}|\sum_{k=n_{0}+2}^{\infty}\gamma_{k-n_{0}-1}Z_{k}^{v}|^{2}\mid\mathcal{F}_{n_{0}+1}\bigr{]}={\sf E}_{z^{\prime}}\bigl{[}|\sum_{k=1}^{\infty}\gamma_{k}Z_{k}^{v}|^{2}\bigr{]}=\lim_{n\rightarrow\infty}{\sf E}_{z^{\prime}}\bigl{[}|\sum_{k=1}^{n}\gamma_{k}Z_{k}^{v}|^{2}\bigr{]}

where $z^{\prime}=\Phi_{n}$ , and the last equality holds by the assumption $\gamma\in\ell_{1}$ and dominated convergence. For each $n$ , letting $\lceil\gamma\rceil^{n}=(\gamma_{1},\dots,\gamma_{n})$ denote $\gamma$ truncated at index $n$ , we have

{\sf E}_{z^{\prime}}\bigl{[}|\sum_{k=1}^{n}\gamma_{k}Z_{k}^{v}|^{2}\bigr{]}=\sum_{k=1}^{n}|\gamma_{k}|^{2}{\sf E}_{z^{\prime}}\bigl{[}|Z_{k}^{v}|^{2}\bigr{]}+\sum_{i=1}^{n}\sum_{j\neq i}^{n}\gamma_{i}^{\dagger}\gamma_{j}{\sf E}_{z^{\prime}}\bigl{[}(Z_{i}^{v})^{\dagger}Z_{j}^{v}\bigr{]}=(\lceil\gamma\rceil^{n})^{\dagger}[R]_{n}\lceil\gamma\rceil^{n}

(78)

where $[R]_{n}\in\mathbb{C}^{n\times n}$ is the covariance matrix with each entry defined as $R(i,j)={\sf E}_{z^{\prime}}\bigl{[}(Z_{i}^{v})^{\dagger}Z_{j}^{v}\bigr{]},1\leq i,j\leq n$ ; $[R]_{n}$ is Hermitian and positive semi-definite. With $\lambda_{n}\geq 0$ denoting the largest eigenvalue of $[R]_{n}$ , we have

(\lceil\gamma\rceil^{n})^{\dagger}[R]_{n}\lceil\gamma\rceil^{n}\leq\lambda_{n}\sum_{k=1}^{n}|\gamma_{k}|^{2}\leq\lambda_{n}\sum_{k=1}^{\infty}|\gamma_{k}|^{2}

(79)

By the Gershgorin circle theorem [20], the maximal eigenvalue is upper bounded by the maximum row sum of absolute values of entries:

\lambda_{n}\leq\max_{i\in\{1,\dots,n\}}\sum_{j=1}^{n}|R(i,j)|\leq\sup_{i\in\mathbb{Z}_{+}}\sum_{j=1}^{\infty}|R(i,j)|

For any $i$ , observe that

\sum_{j=1}^{\infty}|R(i,j)|={\sf E}_{z^{\prime}}\bigl{[}|Z_{i}^{v}|^{2}\bigr{]}+\sum_{i<j}|R(i,j)|+\sum_{i>j}|R(i,j)|

Since $V$ -uniform ergodicity of the Markov chain $\Phi$ implies $V$ -geometric mixing [29, Theorem 16.1.5] and $|v^{\intercal}{\hat{f}}|^{2}\in L_{\infty}^{V}$ , there exist $B<\infty$ and $r\in(0,1)$ such that for each $i,k\in\mathbb{Z}_{+}$ ,

\Bigl{|}R(i,i+k)-{\sf E}_{z^{\prime}}\bigl{[}(Z_{i}^{v})^{\dagger}\bigr{]}{\sf E}_{z^{\prime}}\bigl{[}Z_{i+k}^{v}\bigr{]}\Bigl{|}\leq Br^{k}[1+r^{i}V(z^{\prime})]

Consequently,

	$\displaystyle\sum_{j=1}^{\infty}\|R(i,j)\|\leq$	$\displaystyle{\sf E}_{z^{\prime}}\bigl{[}\|Z_{i}^{v}\|^{2}\bigr{]}+\Bigl{\|}{\sf E}_{z^{\prime}}\bigl{[}(Z_{i}^{v})^{\dagger}\bigr{]}\Bigr{\|}\sum_{j=1}^{\infty}\Bigl{\|}{\sf E}_{z^{\prime}}[Z_{j}^{v}]\Bigr{\|}$		(80)
		$\displaystyle+\sum_{i<j}Br^{j-i}[1+r^{i}V(z^{\prime})]+\sum_{i>j}Br^{i-j}[1+r^{j}V(z^{\prime})]$		(80)

Given $|v^{\intercal}{\hat{f}}|^{2}\in L_{\infty}^{V}$ , by (24),

{\sf E}_{z^{\prime}}\bigl{[}|Z_{n}^{v}|^{2}\bigr{]}\leq{\sf E}_{\pi}\bigl{[}|Z_{n}^{v}|^{2}\bigr{]}+B_{V}\bigl{\|}|v^{\intercal}{\hat{f}}|^{2}\bigr{\|}_{V}V(z^{\prime})

The Markov chain $\Phi$ is also $\sqrt{V}$ -uniformly ergodic. By (24) for $\sqrt{V}$ and $|v^{\intercal}{\hat{f}}|^{2}\in L_{\infty}^{V}$ once more,

\bigl{|}{\sf E}_{z^{\prime}}[(Z_{i}^{v})^{\dagger}]\bigr{|}\leq B_{\scriptscriptstyle\sqrt{V}}\|v^{\intercal}{\hat{f}}\|_{\scriptscriptstyle\sqrt{V}}\sqrt{V(z^{\prime})}\rho^{j}

Hence

\bigl{|}{\sf E}_{z^{\prime}}[(Z_{i}^{v})^{\dagger}]\bigr{|}\sum_{j=1}^{\infty}\bigl{|}{\sf E}_{z^{\prime}}[Z_{j}^{v}]\bigr{|}\leq B_{\scriptscriptstyle\sqrt{V}}^{2}\|v^{\intercal}{\hat{f}}\|^{2}_{\scriptscriptstyle\sqrt{V}}V(z^{\prime})\rho^{i}\sum_{j=1}^{\infty}\rho^{j}\leq B^{2}_{\scriptscriptstyle\sqrt{V}}\|v^{\intercal}{\hat{f}}\|^{2}_{\scriptscriptstyle\sqrt{V}}\frac{\rho}{1-\rho}V(z^{\prime})

The other two terms on the right hand side of (80) are bounded as follows:

	$\displaystyle\sum_{j>i}Br^{j-i}[1+r^{i}V(z^{\prime})]$	$\displaystyle=\sum_{j>i}B[r^{j-i}+r^{j}V(z^{\prime})]\leq\frac{Br}{1-r}(1+V(z^{\prime}))$
	$\displaystyle\sum_{j<i}Br^{i-j}[1+r^{j}V(z^{\prime})]$	$\displaystyle=\bigl{[}\sum_{j<i}B[r^{i-j}]\bigr{]}+BV(z^{\prime})(i-1)r^{i}\leq\frac{Br}{1-r}+BV(z^{\prime})\sup_{i}ir^{i}$

where $\sup_{i}ir^{i}$ exists since $\lim_{n\rightarrow\infty}nr^{n}=0$ .

Consequently, there exists some deterministic constant $K^{\prime}$ independent of $z^{\prime}$ such that, the largest eigenvalues $\{\lambda_{n}\}$ are uniformly bounded

\sup_{n}\lambda_{n}\leq K^{\prime}V(z^{\prime})

Combining this with (78) and (79) gives

{\sf E}_{z^{\prime}}\bigl{[}|\sum_{k=1}^{\infty}Z_{k}^{v}|^{2}\bigr{]}\leq K^{\prime}V(z^{\prime})\sum_{k=1}^{\infty}|\gamma_{k}|^{2}

Therefore,

{\sf E}\Bigl{[}{\sf E}\bigl{[}|\sum_{k=n_{0}+2}^{\infty}\gamma_{k-n_{0}-1}Z_{k}^{v}|^{2}\mid\mathcal{F}_{n_{0}+1}\bigr{]}\mid\Phi_{0}=z\Bigr{]}\leq K^{\prime}{\sf E}\bigl{[}V(\Phi_{n_{0}+1})\mid\Phi_{0}=z\bigr{]}\sum_{k=1}^{\infty}|\gamma_{k}|^{2}

By $V\in L_{\infty}^{V}$ and (24) again, ${\sf E}[V(\Phi_{n_{0}+1})\mid\Phi_{0}=z]\leq\pi(V)+B_{V}V(z)$ . The desired conclusion then follows by setting $K=K^{\prime}(B_{V}V(z)+\pi(V))$ . ∎

Lemma A.10.

Suppose Assumptions A1-A3 hold and $\Sigma_{\Delta}v\neq 0$ . With $\{\widetilde{\theta}_{n}^{\varrho}\}$ updated via (33),

\liminf_{n\rightarrow\infty}{\sf E}[|v^{\intercal}\widetilde{\theta}_{n}^{\varrho}|^{2}]=\infty\,,\qquad\varrho>\varrho_{0}

Proof.

With fixed $n_{0}$ , equation (75) gives a representation for $v^{\intercal}\widetilde{\theta}_{n+1}^{\varrho}$ for each $n\geq n_{0}$ . It is obvious that $\liminf_{n\rightarrow\infty}\prod_{k=n_{0}+1}^{n}|1+\tilde{\varrho}_{k}\alpha_{k}+\alpha_{k}ui|^{2}=\infty$ . Hence it suffices to show that $\liminf_{n\rightarrow\infty}{\sf E}[|v^{\intercal}\widetilde{\theta}_{n_{0}}^{\varrho}+\sum_{k=n_{0}+1}^{n+1}\beta_{k}v^{\intercal}\Delta_{k}|^{2}]$ is strictly greater than zero.

By Fatou’s lemma,

	$\displaystyle\liminf_{n\rightarrow\infty}{\sf E}\bigl{[}\|v^{\intercal}\widetilde{\theta}_{n_{0}}^{\varrho}+\sum_{k=n_{0}+1}^{n+1}\beta_{k}v^{\intercal}\Delta_{k}\|^{2}\bigr{]}$	$\displaystyle\geq{\sf E}\bigl{[}\liminf_{n\rightarrow\infty}\|v^{\intercal}\widetilde{\theta}_{n_{0}}^{\varrho}+\sum_{k=n_{0}+1}^{n+1}\beta_{k}v^{\intercal}\Delta_{k}\|^{2}\bigr{]}$
		$\displaystyle={\sf E}\bigl{[}\|v^{\intercal}\widetilde{\theta}_{n_{0}}^{\varrho}+\sum_{k=n_{0}+1}^{\infty}\beta_{k}v^{\intercal}\Delta_{k}\|^{2}\bigr{]}$
		$\displaystyle\geq\hbox{\sf Var}\,\bigl{(}v^{\intercal}\widetilde{\theta}_{n_{0}}^{\varrho}+\sum_{k=n_{0}+1}^{\infty}\beta_{k}v^{\intercal}\Delta_{k}\bigr{)}$

where the equality holds by Lemma A.8. By the law of total variance,

	$\displaystyle\hbox{\sf Var}\,\bigl{(}v^{\intercal}\widetilde{\theta}_{n_{0}}^{\varrho}+\sum_{k=n_{0}+1}^{\infty}\beta_{k}v^{\intercal}\Delta_{k}\bigr{)}$	$\displaystyle\geq{\sf E}\bigl{[}\hbox{\sf Var}\,(v^{\intercal}\widetilde{\theta}_{n_{0}}^{\varrho}+\sum_{k=n_{0}+1}^{\infty}\beta_{k}v^{\intercal}\Delta_{k}\mid\mathcal{F}_{n_{0}+1})\bigr{]}$
		$\displaystyle={\sf E}\bigl{[}\hbox{\sf Var}\,(\sum_{k=n_{0}+1}^{\infty}\beta_{k}v^{\intercal}\Delta_{k}\mid\mathcal{F}_{n_{0}+1})\bigr{]}$

Apply once more the decomposition based on Poisson’s equation:

v^{\intercal}\Delta_{n}=\Delta_{n+1}^{vm}+Z_{n}^{v}-Z_{n+1}^{v}\,,\qquad n\geq 1\,,

where $Z_{n}^{v}=v^{\intercal}{\hat{f}}(\Phi_{n})$ and $\Delta_{n+1}^{vm}=Z_{n+1}^{v}-{\sf E}[Z_{n+1}^{v}\mid\mathcal{F}_{n}]$ is a martingale difference. By the variance inequality $\hbox{\sf Var}\,(X+Y\mid\mathcal{F}_{n_{0}+1})\leq 2\hbox{\sf Var}\,(X\mid\mathcal{F}_{n_{0}+1})+2\hbox{\sf Var}\,(Y\mid\mathcal{F}_{n_{0}+1})$ , we have

	$\displaystyle{\sf E}\bigl{[}\hbox{\sf Var}$	$\displaystyle(\sum_{k=n_{0}+1}^{\infty}\beta_{k}v^{\intercal}\Delta_{k}\mid\mathcal{F}_{n_{0}+1})\bigr{]}$		(81)
		$\displaystyle\geq{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}{\sf E}\bigl{[}\hbox{\sf Var}\,(\sum_{k=n_{0}+1}^{\infty}\beta_{k}\Delta_{k+1}^{vm}\mid\mathcal{F}_{n_{0}+1})\bigr{]}-{\sf E}\bigl{[}\hbox{\sf Var}\,(\sum_{k=n_{0}+1}^{\infty}\beta_{k}(Z_{k}^{v}-Z_{k+1}^{v})\mid\mathcal{F}_{n_{0}+1})\bigr{]}$		(81)

By the law of total variance once more,

\hbox{\sf Var}\,\bigl{(}\sum_{k=n_{0}+1}^{\infty}\beta_{k}\Delta_{k+1}^{vm}\bigr{)}={\sf E}\bigl{[}\hbox{\sf Var}\,(\sum_{k=n_{0}+1}^{\infty}\beta_{k}\Delta_{k+1}^{vm}\mid\mathcal{F}_{n_{0}+1})\bigr{]}+\hbox{\sf Var}\,\bigl{(}{\sf E}[\sum_{k=n_{0}+1}^{\infty}\beta_{k}\Delta_{k+1}^{vm}\mid\mathcal{F}_{n_{0}+1}]\bigr{)}

Note that $\lim_{n\to\infty}{\sf E}[\sum_{k=n_{0}+1}^{n}\beta_{k}\Delta_{k+1}^{vm}\mid\mathcal{F}_{n_{0}+1}]$ converges to zero almost surely. With $\{\beta_{n}\}\in\ell_{2}$ and the Jensen’s inequality, we have for all $n$ ,

\bigl{|}{\sf E}[\sum_{k=n_{0}+1}^{n}\beta_{k}\Delta_{k+1}^{vm}\mid\mathcal{F}_{n_{0}+1}]\bigr{|}^{2}\leq\sum_{k=n_{0}+1}^{\infty}|\beta_{k}|^{2}{\sf E}[|\Delta_{k+1}^{vm}|^{2}\mid\mathcal{F}_{n_{0}+1}]<\infty

Then by the dominated convergence theorem, ${\sf E}\bigl{[}\bigl{|}{\sf E}[\sum_{k=n_{0}+1}^{\infty}\beta_{k}\Delta_{k+1}^{vm}\mid\mathcal{F}_{n_{0}+1}]\bigr{|}^{2}\bigr{]}=0$ . Therefore,

\hbox{\sf Var}\,\bigl{(}{\sf E}[\sum_{k=n_{0}+1}^{\infty}\beta_{k}\Delta_{k+1}^{vm}\mid\mathcal{F}_{n_{0}+1}]\bigr{)}\leq{\sf E}\bigl{[}\bigl{|}{\sf E}[\sum_{k=n_{0}+1}^{\infty}\beta_{k}\Delta_{k+1}^{vm}\mid\mathcal{F}_{n_{0}+1}]\bigr{|}^{2}\bigr{]}=0

Hence,

{\sf E}\bigl{[}\hbox{\sf Var}\,(\sum_{k=n_{0}+1}^{\infty}\beta_{k}\Delta_{k+1}^{vm}\mid\mathcal{F}_{n_{0}+1})\bigr{]}=\hbox{\sf Var}\,\bigl{(}\sum_{k=n_{0}+1}^{\infty}\beta_{k}\Delta_{k+1}^{vm}\bigr{)}=\sum_{k=n_{0}+1}^{\infty}|\beta_{k}|^{2}\sigma_{k+1}^{2}

(82)

where $\sigma_{n}^{2}=\hbox{\sf Var}\,(\Delta_{n}^{vm})$ .

For the telescoping term on the right hand side of (81), we have

	$\displaystyle{\sf E}\bigl{[}\hbox{\sf Var}\,(\sum_{k=n_{0}+1}^{\infty}\beta_{k}(Z_{k}^{v}-Z_{k+1}^{v})\mid\mathcal{F}_{n_{0}+1})\bigr{]}$	$\displaystyle={\sf E}\bigl{[}\hbox{\sf Var}\,(\beta_{n_{0}+1}Z_{n_{0}+1}^{v}-\sum_{k=n_{0}+2}^{\infty}(\beta_{k}-\beta_{k+1})Z_{k}^{v}\mid\mathcal{F}_{n_{0}+1})\bigr{]}$		(83)
		$\displaystyle={\sf E}\bigl{[}\hbox{\sf Var}\,(\sum_{k=n_{0}+2}^{\infty}(\beta_{k}-\beta_{k+1})Z_{k}^{v}\mid\mathcal{F}_{n_{0}+1})\bigr{]}$		(83)

Given $\{\beta_{n}-\beta_{n+1}\}\in\ell_{1}$ by Lemma A.7, Lemma A.9 indicates that there exists some constant $K$ independent of $n_{0}$ such that,

{\sf E}\bigl{[}\hbox{\sf Var}\,(\sum_{k=n_{0}+2}^{\infty}(\beta_{k}-\beta_{k+1})\hat{Z}_{k}\mid\mathcal{F}_{n_{0}+1})\bigr{]}\leq K\sum_{k=n_{0}+2}^{\infty}|\beta_{k}-\beta_{k+1}|^{2}

Combining (82) and (83) gives

\displaystyle{\sf E}[\hbox{\sf Var}\,(\sum_{k=n_{0}+1}^{\infty}\beta_{k}v^{\intercal}\Delta_{k}\mid\mathcal{F}_{n_{0}+1})]\geq{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}\sum_{k=n_{0}+1}^{\infty}|\beta_{k}|^{2}\sigma_{k+1}^{2}-K\sum_{k=n_{0}+2}^{\infty}|\beta_{k}-\beta_{k+1}|^{2}

Since $|v^{\intercal}{\hat{f}}|^{2}\in L_{\infty}^{V}$ and $\sigma_{n}^{2}\rightarrow\sigma^{2}=v^{\intercal}\Sigma_{\Delta}\overline{v}>0$ at a geometric rate, we set $n_{0}$ sufficiently large such that Lemma A.7 holds and moreover for all $n\geq n_{0}$ ,

\sigma_{n}^{2}\geq{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}\sigma^{2},\qquad\frac{1}{4}\sigma^{2}-4K\alpha_{n}^{2}(1+u^{2})\geq\frac{1}{8}\sigma^{2}\,,

Then,

{\sf E}\bigl{[}\hbox{\sf Var}\,(\sum_{k=n_{0}+1}^{\infty}\beta_{k}v^{\intercal}\Delta_{k}\mid\mathcal{F}_{n_{0}+1})\bigr{]}\geq\frac{1}{8}\sigma^{2}\sum_{k=n_{0}+1}^{\infty}|\beta_{k}|^{2}

Therefore,

\liminf_{n\rightarrow\infty}{\sf E}\bigl{[}|v^{\intercal}\widetilde{\theta}_{n_{0}}^{\varrho}+\sum_{k=n_{0}+1}^{n}\beta_{k}v^{\intercal}\Delta_{k}|^{2}\bigr{]}\geq\frac{1}{8}\sigma^{2}\sum_{k=n_{0}+1}^{\infty}|\beta_{k}|^{2}>0

The desired conclusion then follows from (75):

\liminf_{n\rightarrow\infty}{\sf E}\bigl{[}|v^{\intercal}\widetilde{\theta}_{n+1}^{\varrho}|^{2}\bigr{]}\geq\liminf_{n\rightarrow\infty}\prod_{k=n_{0}+1}^{n}|1+\tilde{\varrho}_{k}\alpha_{k}+\alpha_{k}ui|^{2}\cdot\liminf_{n\rightarrow\infty}{\sf E}\bigl{[}|v^{\intercal}\widetilde{\theta}_{n_{0}}^{\varrho}+\sum_{k=n_{0}+1}^{n}\beta_{k}v^{\intercal}\Delta_{k}|^{2}\bigr{]}=\infty

∎

A.6 Coupling of Deterministic and Random Linear SA

Let $\widehat{\mathcal{A}}:{\sf Z}\rightarrow\mathbb{R}^{d\times d}$ denote the zero-mean solution to the following Poisson equation:

{\sf E}[\widehat{\mathcal{A}}(\Phi_{n+1})\mid\Phi_{n}=z]=\widehat{\mathcal{A}}(z)-\mathcal{A}(z)+A\,,\qquad z\in{\sf Z}

which is a matrix version of (26). Denote $\Delta^{\mathcal{A}}_{n+1}=\widehat{\mathcal{A}}(\Phi_{n+1})-{\sf E}[\widehat{\mathcal{A}}(\Phi_{n+1})\mid\mathcal{F}_{n}]$ (a martingale difference sequence), and $\mathcal{A}_{n}=\widehat{\mathcal{A}}(\Phi_{n})$ . Then, from (36),

	$\displaystyle(A_{n+1}-A)\widetilde{\theta}^{\circ}_{n}$	$\displaystyle=[\Delta^{\mathcal{A}}_{n+2}+\mathcal{A}_{n+1}-\mathcal{A}_{n+2}]\widetilde{\theta}^{\circ}_{n}$
		$\displaystyle=\Delta^{\mathcal{A}}_{n+2}\widetilde{\theta}^{\circ}_{n}+\mathcal{A}_{n+1}\widetilde{\theta}^{\circ}_{n}-\mathcal{A}_{n+2}\widetilde{\theta}^{\circ}_{n+1}+\mathcal{A}_{n+2}(\widetilde{\theta}^{\circ}_{n+1}-\widetilde{\theta}^{\circ}_{n})$
		$\displaystyle=\Delta^{\mathcal{A}}_{n+2}\widetilde{\theta}^{\circ}_{n}+[\mathcal{A}_{n+1}\widetilde{\theta}^{\circ}_{n}-\mathcal{A}_{n+2}\widetilde{\theta}^{\circ}_{n+1}]+\alpha_{n+1}\mathcal{A}_{n+2}(A_{n+1}\widetilde{\theta}^{\circ}_{n}+\Delta_{n+1})$

The sequence $\{\mathcal{E}_{n}\}$ from (38) can be expressed as the sum

\mathcal{E}_{n}=\mathcal{E}_{n}^{(1)}+\mathcal{E}_{n}^{(2)}+\mathcal{E}_{n}^{(3)}+\mathcal{E}_{n}^{(4)}

where $\mathcal{E}_{n}^{(4)}=-\alpha_{n}\mathcal{A}_{n+1}\widetilde{\theta}^{\circ}_{n}$ , and the first three sequences are solutions to the following linear systems:


$\displaystyle\mathcal{E}_{n+1}^{(1)}$	$\displaystyle=\mathcal{E}_{n}^{(1)}+\alpha_{n+1}[A\mathcal{E}_{n}^{(1)}+\Delta^{\mathcal{A}}_{n+2}\widetilde{\theta}^{\circ}_{n}]\,,$	$\displaystyle\mathcal{E}_{0}^{(1)}=0$	(84a)
$\displaystyle\mathcal{E}_{n+1}^{(2)}$	$\displaystyle=\mathcal{E}_{n}^{(2)}+\alpha_{n+1}[A\mathcal{E}_{n}^{(2)}-\alpha_{n}[I+A]\mathcal{A}_{n+1}\widetilde{\theta}^{\circ}_{n}]\,,$	$\displaystyle\mathcal{E}_{1}^{(2)}=\mathcal{A}_{1}\widetilde{\theta}^{\circ}_{0}$	(84b)
$\displaystyle\mathcal{E}_{n+1}^{(3)}$	$\displaystyle=\mathcal{E}_{n}^{(3)}+\alpha_{n+1}[A\mathcal{E}_{n}^{(3)}+\alpha_{n+1}\mathcal{A}_{n+2}(A_{n+1}\widetilde{\theta}^{\circ}_{n}+\Delta_{n+1})]\,,$	$\displaystyle\mathcal{E}_{0}^{(3)}=0$	(84c)

The second recursion arises through the arguments used in the proof of Lemma 2.2.

Recall that $\lambda=-\varrho_{0}+ui$ is an eigenvalue of the matrix $A$ with largest real part. For fixed $0<\varrho<\varrho_{0}$ , let $T\geq 0$ denote the unique solution to the Lyapunov equation

[\varrho I+A]T+T[\varrho I+A]^{\intercal}+I=0

(85)

As previously, the norm of random vector $E\in\mathbb{R}^{d}$ is defined as: $\|E\|_{T}=\sqrt{{\sf E}[E^{\intercal}TE]}$ .

Lemma A.11.

Under Assumptions (A1)-(A4), there exist constants $L_{\ref{t:clE-td-err-inequalities}}$ and $K_{\ref{t:clE-td-err-inequalities}}$ such that, for all $n\geq 1$ ,

(i)

The following holds for each $1\leq i\leq 3$ ,

\|\mathcal{E}_{n+1}^{(i)}\|_{T}^{2}\leq(1-2\varrho\alpha_{n+1}+L_{\ref{t:clE-td-err-inequalities}}^{2}\alpha_{n+1}^{2})\|\mathcal{E}_{n}^{(i)}\|_{T}^{2}+K_{\ref{t:clE-td-err-inequalities}}\alpha_{n+1}^{2}(\|\mathcal{E}_{n}\|_{T}^{2}+\|\widetilde{\theta}^{\bullet}_{n}\|_{T}^{2}+1)

(ii)

The following holds for $\mathcal{E}_{n}^{(4)}$ ,

\|\mathcal{E}_{n+1}^{(4)}\|_{T}^{2}\leq K_{\ref{t:clE-td-err-inequalities}}\alpha_{n+1}^{2}(\|\mathcal{E}_{n}\|_{T}^{2}+\|\widetilde{\theta}^{\bullet}_{n}\|_{T}^{2}+1)

The inequality below will be useful in proving Lemma A.11.

Lemma A.12.

For any real numbers $a,b$ and all $c>0$ ,

(a+b)^{2}\leq(1+c^{-1})a^{2}+(1+c)b^{2}

Proof.

With $(a+b)^{2}=a^{2}+b^{2}+2ab$ , the result follows directly from the inequality

2ab=2(a/\sqrt{c})(\sqrt{c}b)\leq a^{2}/c+cb^{2}

∎

Proof of Lemma A.11.

First consider $\{\mathcal{E}_{n}^{(1)}\}$ updated via (84a). Since the martingale difference sequence $\Delta^{\mathcal{A}}_{n+2}$ is uncorrelated with $\widetilde{\theta}^{\circ}_{n}$ or $\mathcal{E}_{n}^{(1)}$ , we have

\|\mathcal{E}_{n+1}^{(1)}\|_{T}^{2}=\|[I+\alpha_{n+1}A]\mathcal{E}_{n}^{(1)}\|_{T}^{2}+\alpha_{n+1}^{2}\|\Delta^{\mathcal{A}}_{n+2}\widetilde{\theta}^{\circ}_{n}\|_{T}^{2}

Using the fact that $T\geq 0$ solves the Lyapunov equation (85) gives

\|\mathcal{E}_{n+1}^{(1)}\|_{T}^{2}\leq(1-2\varrho\alpha_{n+1}+L_{1}^{2}\alpha_{n+1}^{2})\|\mathcal{E}_{n}^{(1)}\|_{T}^{2}+\alpha_{n+1}^{2}\|\Delta^{\mathcal{A}}_{n+2}\widetilde{\theta}^{\circ}_{n}\|_{T}^{2}

where $L_{1}=\|A\|_{T}$ (the induced operator norm). With $\widetilde{\theta}^{\circ}_{n}=\mathcal{E}_{n}+\widetilde{\theta}^{\bullet}_{n}$ ,

\|\Delta^{\mathcal{A}}_{n+2}\widetilde{\theta}^{\circ}_{n}\|_{T}^{2}\leq 2\|\Delta^{\mathcal{A}}_{n+2}\|_{T}^{2}(\|\mathcal{E}_{n}\|_{T}^{2}+\|\widetilde{\theta}^{\bullet}_{n}\|_{T}^{2})

Consequently,

\|\mathcal{E}_{n+1}^{(1)}\|_{T}^{2}\leq(1-2\varrho\alpha_{n+1}+L_{1}^{2}\alpha_{n+1}^{2})\|\mathcal{E}_{n}^{(1)}\|_{T}^{2}+K_{1}\alpha_{n+1}^{2}(\|\mathcal{E}_{n}\|_{T}^{2}+\|\widetilde{\theta}^{\bullet}_{n}\|_{T}^{2})

(86)

where $K_{1}=\sup_{n}2\|\Delta^{\mathcal{A}}_{n+2}\|_{T}^{2}$ is finite by the $V$ -uniform ergodicity of $\Phi$ applied to $\widehat{\mathcal{A}}_{i,j}^{2}$ (recall Thm. 2.1).

For $\{\mathcal{E}_{n}^{(2)}\}$ updated by (84b), using Lemma A.12 with $c=n(n+1)$ gives

	$\displaystyle\\|\mathcal{E}_{n+1}^{(2)}\\|_{T}^{2}\leq$	$\displaystyle(1+\alpha_{n}\alpha_{n+1})(1-2\varrho\alpha_{n+1}+L_{1}^{2}\alpha_{n+1}^{2})\\|\mathcal{E}_{n}^{(2)}\\|_{T}^{2}$
		$\displaystyle+2(\alpha_{n}\alpha_{n+1}+\alpha_{n}^{2}\alpha_{n+1}^{2})\\|[I+A]\mathcal{A}_{n+1}\\|_{T}^{2}(\\|\mathcal{E}_{n}\\|_{T}^{2}+\\|\widetilde{\theta}^{\bullet}_{n}\\|_{T}^{2})$

We can find $L_{2}$ and $K_{2}$ such that for all $n\geq 1$ ,

	$\displaystyle\alpha_{n+1}^{2}L_{1}^{2}+\alpha_{n}\alpha_{n+1}(1-2\varrho\alpha_{n+1}+L_{1}^{2}\alpha_{n+1}^{2})$	$\displaystyle\leq L_{2}^{2}\alpha_{n+1}^{2}$
	$\displaystyle 2(\alpha_{n}\alpha_{n+1}+\alpha_{n}^{2}\alpha_{n+1}^{2})\\|[I+A]\mathcal{A}_{n+1}\\|_{T}^{2}$	$\displaystyle\leq K_{2}\alpha_{n+1}^{2}$

We then obtain the desired form for the sequence $\{\mathcal{E}_{n}^{(2)}\}$

\displaystyle\|\mathcal{E}_{n+1}^{(2)}\|_{T}^{2}\leq(1-2\varrho\alpha_{n+1}+L_{2}^{2}\alpha_{n+1}^{2})\|\mathcal{E}_{n}^{(2)}\|_{T}^{2}+K_{2}\alpha_{n+1}^{2}(\|\mathcal{E}_{n}\|_{T}^{2}+\|\widetilde{\theta}^{\bullet}_{n}\|_{T}^{2})

(87)

The same argument applies to $\{\mathcal{E}_{n}^{(3)}\}$ in (84c). Therefore, for some constants $L_{3}$ and $K_{3}$ ,

\|\mathcal{E}_{n+1}^{(3)}\|_{T}^{2}\leq(1-2\varrho\alpha_{n+1}+L_{3}^{2}\alpha_{n+1}^{2})\|\mathcal{E}_{n}^{(3)}\|_{T}^{2}+K_{3}\alpha_{n+1}^{2}(\|\mathcal{E}_{n}\|_{T}^{2}+\|\widetilde{\theta}^{\bullet}_{n}\|_{T}^{2}+1)

(88)

A bound on the final term $\mathcal{E}_{n+1}^{(4)}=-\alpha_{n+1}\mathcal{A}_{n+2}\widetilde{\theta}^{\circ}_{n+1}$ is relatively easy.

	$\displaystyle\\|\mathcal{E}_{n+1}^{(4)}\\|_{T}^{2}$	$\displaystyle=\\|\alpha_{n+1}\mathcal{A}_{n+2}[\widetilde{\theta}^{\circ}_{n}+\alpha_{n+1}(A_{n+1}\widetilde{\theta}^{\circ}_{n}+\Delta_{n+1})]\\|_{T}^{2}$
		$\displaystyle\leq 2\alpha_{n+1}^{2}\\|\mathcal{A}_{n+2}\\|_{T}^{2}(\\|I+\alpha_{n+1}A_{n+1}\\|_{T}^{2}\\|\widetilde{\theta}^{\circ}_{n}\\|_{T}^{2}+\alpha_{n+1}^{2}\\|\Delta_{n+1}\\|_{T}^{2})$

Hence there exists some constant $K_{4}$ such that

\|\mathcal{E}_{n+1}^{(4)}\|_{T}^{2}\leq K_{4}\alpha_{n+1}^{2}(\|\mathcal{E}_{n}\|_{T}^{2}+\|\widetilde{\theta}^{\bullet}_{n}\|_{T}^{2}+1)

∎

The results in Lemma A.11 lead to a rough bound on $\|\widetilde{\theta}^{\circ}_{n}\|_{T}^{2}$ presented in the following. This intermediate result will be used later to establish the refined bound in Thm. 2.7.

Lemma A.13.

Under Assumptions (A1)-(A4),

\limsup_{n\rightarrow\infty}n^{\varrho}\|\widetilde{\theta}^{\circ}_{n}\|_{T}^{2}<\infty\,,\qquad\text{for }\,\varrho<\varrho_{0}\text{ and }\varrho\leq 1

Proof.

Denote $\mathcal{E}^{\text{tot}}_{n}=\sum_{i=1}^{4}\|\mathcal{E}_{n}^{(i)}\|_{T}^{2}$ . By Lemma A.11, we can find $n_{0}\geq 1$ such that $1-2\varrho\alpha_{n+1}+L_{\ref{t:clE-td-err-inequalities}}^{2}\alpha_{n+1}^{2}>0$ for $n\geq n_{0}$ and

	$\displaystyle\mathcal{E}^{\text{tot}}_{n+1}$	$\displaystyle\leq(1-2\varrho\alpha_{n+1}+L_{\ref{t:clE-td-err-inequalities}}^{2}\alpha_{n+1}^{2})\mathcal{E}^{\text{tot}}_{n}+4K_{\ref{t:clE-td-err-inequalities}}\alpha_{n+1}^{2}(\\|\mathcal{E}_{n}\\|_{T}^{2}+\\|\widetilde{\theta}^{\bullet}_{n}\\|_{T}^{2}+1)$
		$\displaystyle\leq(1-2\varrho\alpha_{n+1}+L_{\ref{t:clE-td-err-inequalities}}^{2}\alpha_{n+1}^{2})\mathcal{E}^{\text{tot}}_{n}+4K_{\ref{t:clE-td-err-inequalities}}\alpha_{n+1}^{2}(4\mathcal{E}^{\text{tot}}_{n}+\\|\widetilde{\theta}^{\bullet}_{n}\\|_{T}^{2}+1)$
		$\displaystyle\leq(1-2\varrho\alpha_{n+1}+L_{{\text{tot}}}^{2}\alpha_{n+1}^{2})\mathcal{E}^{\text{tot}}_{n}+K_{{\text{tot}}}\alpha_{n+1}^{2}$

with $L_{{\text{tot}}}^{2}=L_{\ref{t:clE-td-err-inequalities}}^{2}+16K_{\ref{t:clE-td-err-inequalities}}$ and $K_{{\text{tot}}}=\sup_{n}4K_{\ref{t:clE-td-err-inequalities}}(\|\widetilde{\theta}^{\bullet}_{n}\|_{T}^{2}+1)$ , which are finite by Lemma A.2 combined with Lemma A.11. Iterating this inequality gives, for $n\geq n_{0}$ ,

\mathcal{E}^{\text{tot}}_{n+1}\leq\mathcal{E}^{\text{tot}}_{n_{0}}\prod_{k=n_{0}+1}^{n+1}(1-2\varrho\alpha_{k}+L_{{\text{tot}}}^{2}\alpha_{k}^{2})+K_{{\text{tot}}}\sum_{k=n_{0}+1}^{n+1}\alpha_{k}^{2}\prod_{l=k+1}^{n+1}(1-2\varrho\alpha_{l}+L_{{\text{tot}}}^{2}\alpha_{l}^{2})

By Lemma A.1,

\mathcal{E}^{\text{tot}}_{n+1}\leq\mathcal{E}^{\text{tot}}_{n_{0}}\frac{K_{\ref{t:prod-n}}n_{0}^{2\varrho}}{(n+2)^{2\varrho}}+\frac{K_{\ref{t:prod-n}}K_{{\text{tot}}}}{(n+2)^{2\varrho}}\sum_{k=n_{0}+1}^{n+1}k^{2\varrho-2}

The partial sum can be estimated by an integral: with $2\varrho-2\leq 0$ ,

\sum_{k=n_{0}}^{n+1}k^{2\varrho-2}\leq 1+\int_{n_{0}}^{n+1}r^{2\varrho-2}dr=\begin{cases}1+[(n+1)^{2\varrho-1}-n_{0}^{2\varrho-1}]/(2\varrho-1)\,,&\text{if }\varrho\neq{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}\\ 1+\ln(n+1)-\ln(n_{0})\,,&\text{if }\varrho={\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}\end{cases}

(89)

Given $\varrho\leq 1$ ,

n^{\varrho}\mathcal{E}^{\text{tot}}_{n}\leq\mathcal{E}^{\text{tot}}_{n_{0}}\frac{K_{\ref{t:prod-n}}n_{0}^{2\varrho}}{(n+2)^{\varrho}}+\frac{K_{\ref{t:prod-n}}K_{{\text{tot}}}}{(n+2)^{\varrho}}\sum_{k=n_{0}+1}^{n+1}k^{2\varrho-2}<\infty

Consequently, $\limsup_{n\rightarrow\infty}n^{\varrho}\|\mathcal{E}_{n}\|_{T}^{2}<\infty$ by the inequality $n^{\varrho}\|\mathcal{E}_{n}\|_{T}^{2}\leq 4n^{\varrho}\mathcal{E}^{\text{tot}}_{n}$ . Then we have

n^{\varrho}\|\widetilde{\theta}^{\circ}_{n}\|_{T}^{2}\leq 2n^{\varrho}\|\mathcal{E}_{n}\|_{T}^{2}+2n^{\varrho}\|\widetilde{\theta}^{\bullet}_{n}\|_{T}^{2}

where $n^{\varrho}\|\widetilde{\theta}^{\bullet}_{n}\|_{T}^{2}\rightarrow 0$ as $n$ goes to infinity by Lemma A.2. Hence $\limsup_{n\rightarrow\infty}n^{\varrho}\|\widetilde{\theta}^{\circ}_{n}\|_{T}^{2}<\infty$ . ∎

Proof of Thm. 2.7.

First consider $\{\mathcal{E}_{n}^{(2)}\}$ updated via (84b). By the triangle inequality and the inequality $\sqrt{1-x}\leq{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}x$ ,

	$\displaystyle\\|\mathcal{E}_{n+1}^{(2)}\\|_{T}$	$\displaystyle\leq\\|[I+\alpha_{n+1}A]\mathcal{E}_{n}^{(2)}\\|_{T}+\alpha_{n}\alpha_{n+1}\\|[I+A]\mathcal{A}_{n+1}\widetilde{\theta}^{\circ}_{n}\\|_{T}$
		$\displaystyle\leq(1-\varrho\alpha_{n+1}+{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}L^{2}\alpha_{n+1}^{2})\\|\mathcal{E}_{n}^{(2)}\\|_{T}+\alpha_{n+1}^{2+\varrho/2}K$

where $L=\|A\|_{T}$ and $K=\sup_{n}2\|[I+A]\mathcal{A}_{n+1}\|_{T}\|\widetilde{\theta}^{\circ}_{n}\|/(n+1)^{\varrho/2}$ , which is finite thanks to Lemma A.13. Hence, by Lemma A.1 once more,

	$\displaystyle\\|\mathcal{E}_{n+1}^{(2)}\\|_{T}$	$\displaystyle\leq\\|\mathcal{E}_{1}^{(2)}\\|_{T}\prod_{k=2}^{n+1}[1-\varrho\alpha_{k}+{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}L^{2}\alpha_{k}^{2}]+K\sum_{k=2}^{n+1}\alpha_{k}^{2+\varrho/2}\prod_{l=k+1}^{n+1}[1-\varrho\alpha_{k}+{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}L^{2}\alpha_{k}^{2}]$
		$\displaystyle\leq\\|\mathcal{E}_{1}^{(2)}\\|_{T}\frac{K_{\ref{t:prod-n}}}{(n+2)^{\varrho}}+\frac{KK_{\ref{t:prod-n}}}{(n+2)^{\varrho}}\sum_{k=2}^{n+1}k^{\varrho/2-2}$

With $\varrho\leq 1$ , we have $\sum_{k=1}^{\infty}k^{\varrho/2-2}\leq\sum_{k=1}^{\infty}k^{-3/2}<\infty$ . Hence $\limsup_{n\rightarrow\infty}n^{\varrho}\|\mathcal{E}_{n}^{(2)}\|_{T}<\infty$ . Replacing $A_{n+1}\widetilde{\theta}^{\circ}_{n}+\Delta_{n+1}$ with $\widetilde{\theta}^{\circ}_{n+1}-\widetilde{\theta}^{\circ}_{n}$ in (84c), the same argument applies to $\{\mathcal{E}_{n}^{(3)}\}$ and we get $\limsup_{n\rightarrow\infty}n^{\varrho}\|\mathcal{E}_{n}^{(3)}\|_{T}<\infty$ . The fact that $\limsup_{n\rightarrow\infty}n\|\mathcal{E}_{n+1}^{(4)}\|_{T}<\infty$ follows directly from definition $\mathcal{E}_{n}^{(4)}=-\alpha_{n}\mathcal{A}_{n+1}\widetilde{\theta}^{\circ}_{n}$ and Lemma A.13. Then we have, for each $2\leq i\leq 4$ ,

\limsup_{n\rightarrow\infty}n^{\varrho}\|\mathcal{E}_{n}^{(i)}\|_{T}<\infty\,,\qquad\text{for }\varrho<\varrho_{0}\text{ and }\varrho\leq 1

(90)

Now consider the martingale difference part $\{\mathcal{E}_{n}^{(1)}\}$ . The following is directly obtained from (84a):

	$\displaystyle\\|\mathcal{E}_{n+1}^{(1)}\\|_{T}^{2}\leq$	$\displaystyle(1-2\varrho\alpha_{n+1}+L^{2}\alpha_{n+1}^{2})\\|\mathcal{E}_{n}^{(1)}\\|_{T}^{2}+\alpha_{n+1}^{2}\\|\Delta^{\mathcal{A}}_{n+2}\\|_{T}^{2}\\|\widetilde{\theta}^{\circ}_{n}\\|_{T}^{2}$
	$\displaystyle\leq$	$\displaystyle(1-2\varrho\alpha_{n+1}+L^{2}\alpha_{n+1}^{2})\\|\mathcal{E}_{n}^{(1)}\\|_{T}^{2}+\alpha_{n+1}^{2}\\|\Delta^{\mathcal{A}}_{n+2}\\|_{T}^{2}\bigl{[}8\sum_{i=1}^{4}\\|\mathcal{E}_{n}^{(i)}\\|_{T}^{2}+2\\|\widetilde{\theta}^{\bullet}_{n}\\|_{T}^{2}\bigr{]}$

From Lemma A.2 we have $\sup_{n}n^{\delta}\|\widetilde{\theta}^{\bullet}_{n}\|_{T}^{2}<\infty$ for $\delta=\min(1,2\varrho)$ . Combining this with (90) implies that there exists some constant $K_{\mathcal{M}}$ such that for $\delta=\min(1,2\varrho)$ ,

\|\Delta^{\mathcal{A}}_{n+2}\|_{T}^{2}\bigl{[}8\sum_{i=2}^{4}\|\mathcal{E}_{n}^{(i)}\|_{T}^{2}+2\|\widetilde{\theta}^{\bullet}_{n}\|_{T}^{2}\bigr{]}\leq K_{\mathcal{M}}\frac{1}{(n+1)^{\delta}}

Consequently,

\displaystyle\|\mathcal{E}_{n+1}^{(1)}\|_{T}^{2}

\displaystyle\leq(1-2\varrho\alpha_{n+1}+L_{\mathcal{M}}^{2}\alpha_{n+1}^{2})\|\mathcal{E}_{n}^{(1)}\|_{T}^{2}+K_{\mathcal{M}}\alpha_{n+1}^{2+\delta}

where $L_{\mathcal{M}}^{2}=\sup_{n}L^{2}+8\|\Delta^{\mathcal{A}}_{n+2}\|_{T}^{2}$ . With initial condition $\mathcal{E}_{0}=0$ , iterating this inequality gives

\displaystyle\|\mathcal{E}_{n+1}^{(1)}\|_{T}^{2}\leq K_{\mathcal{M}}\sum_{k=1}^{n+1}\alpha_{k}^{2+\delta}\prod_{l=k+1}^{n+1}[1-2\varrho\alpha_{l}+L_{\mathcal{M}}^{2}\alpha_{l}^{2}]\leq\frac{K_{\mathcal{M}}K_{\ref{t:prod-n}}}{(n+2)^{2\varrho}}\sum_{k=1}^{n+1}k^{-(2+\delta-2\varrho)}

With $2+\delta-2\varrho>0$ , the partial sum is bounded by an integral similar as (89):

\frac{1}{(n+2)^{2\varrho}}\sum_{k=1}^{n+1}k^{-(2+\delta-2\varrho)}=\begin{cases}O((n+1)^{-2\varrho}),&\text{if }\varrho\leq{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}\text{ and }\delta=2\varrho\\ O((n+1)^{-2\varrho}),&\text{if }{\mathchoice{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{1}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}{\genfrac{}{}{}{3}{1}{2}}}<\varrho<1\text{ and }\delta=1\\ O((n+1)^{-2}),&\text{if }\varrho>1\text{ and }\delta=1\end{cases}

Therefore,

(i)

If $\varrho_{0}\leq 1$ , then $\limsup_{n\rightarrow\infty}(n+1)^{2\varrho}\|\mathcal{E}_{n+1}^{(1)}\|_{T}^{2}<\infty$ for $\varrho<\varrho_{0}$ .
(ii)

If $\varrho_{0}>1$ , then $\limsup_{n\rightarrow\infty}(n+1)^{2}\|\mathcal{E}_{n+1}^{(1)}\|_{T}^{2}<\infty$ .

Given that the same convergence rates hold for the other components in (90), the conclusion then follows. ∎

	$\displaystyle\\|\widetilde{\theta}_{n+1}^{\varrho,(2)}\\|_{T}\leq$	$\displaystyle\\|\widetilde{\theta}_{n_{0}}^{\varrho,(2)}\\|_{T}\prod_{k=n_{0}+1}^{n+1}(1-\delta\alpha_{k})+K\sum_{k=n_{0}}^{n}\alpha_{k}^{3/2}\prod_{l=k+1}^{n}(1-\delta\alpha_{l})$
	$\displaystyle\leq$	$\displaystyle\\|\widetilde{\theta}_{n_{0}}^{\varrho,(2)}\\|_{T}\exp(\delta)\frac{n_{0}^{\delta}}{(n+2)^{\delta}}+\frac{K\exp(\delta)}{(n+1)^{\delta}}\sum_{k=n_{0}}^{n}k^{-\frac{3}{2}+\delta}$

	$\displaystyle\\|h_{i,j}^{k+1,n+2}\\|_{V^{*}}=$	$\displaystyle\sup_{z^{\prime},z^{\prime\prime}\in{\sf Z}}\frac{\bigl{\|}\bigl{[}{\hat{f}}_{i}(z^{\prime\prime})+{\sf E}[{\hat{f}}_{i}(\Phi_{k+1})\mid\Phi_{k}=z^{\prime}]\bigr{]}{\sf E}[\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}_{j}(\Phi_{n+2})\mid\Phi_{k+1}=z^{\prime\prime}]\bigr{\|}}{V(z^{\prime})+V(z^{\prime\prime})}$
	$\displaystyle\leq$	$\displaystyle\sup_{z^{\prime\prime}\in{\sf Z}}\frac{\bigl{\|}{\hat{f}}_{i}(z^{\prime\prime}){\sf E}[\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}_{j}(\Phi_{n+2})\mid\Phi_{k+1}=z^{\prime\prime}]\bigr{\|}}{V(z^{\prime\prime})}$
		$\displaystyle+\sup_{z^{\prime},z^{\prime\prime}\in{\sf Z}}\frac{\bigl{\|}{\sf E}[{\hat{f}}_{i}(\Phi_{k+1})\mid\Phi_{k}=z^{\prime}]{\sf E}[\hat{\vphantom{\rule{1.0pt}{6.14584pt}}\smash{\hat{f}}}_{j}(\Phi_{n+2})\mid\Phi_{k+1}=z^{\prime\prime}]\bigr{\|}}{V(z^{\prime})+V(z^{\prime\prime})}$

	$\displaystyle\|\beta_{n}/\beta_{n+1}-1\|^{2}$	$\displaystyle=\|(1+n^{-1})^{1-\varrho}[1+\alpha_{n+1}\tilde{\varrho}_{n+1}+\alpha_{n+1}ui]-1\|^{2}$		(76)
		$\displaystyle=\|(1+n^{-1})^{1-\varrho}[1+\alpha_{n+1}\tilde{\varrho}_{n+1}]-1+(1+n^{-1})^{1-\varrho}\alpha_{n+1}ui\|^{2}$		(76)

	$\displaystyle\liminf_{n\rightarrow\infty}{\sf E}\bigl{[}\|v^{\intercal}\widetilde{\theta}_{n_{0}}^{\varrho}+\sum_{k=n_{0}+1}^{n+1}\beta_{k}v^{\intercal}\Delta_{k}\|^{2}\bigr{]}$	$\displaystyle\geq{\sf E}\bigl{[}\liminf_{n\rightarrow\infty}\|v^{\intercal}\widetilde{\theta}_{n_{0}}^{\varrho}+\sum_{k=n_{0}+1}^{n+1}\beta_{k}v^{\intercal}\Delta_{k}\|^{2}\bigr{]}$
		$\displaystyle={\sf E}\bigl{[}\|v^{\intercal}\widetilde{\theta}_{n_{0}}^{\varrho}+\sum_{k=n_{0}+1}^{\infty}\beta_{k}v^{\intercal}\Delta_{k}\|^{2}\bigr{]}$
		$\displaystyle\geq\hbox{\sf Var}\,\bigl{(}v^{\intercal}\widetilde{\theta}_{n_{0}}^{\varrho}+\sum_{k=n_{0}+1}^{\infty}\beta_{k}v^{\intercal}\Delta_{k}\bigr{)}$

	$\displaystyle\\|\mathcal{E}_{n+1}^{(1)}\\|_{T}^{2}\leq$	$\displaystyle(1-2\varrho\alpha_{n+1}+L^{2}\alpha_{n+1}^{2})\\|\mathcal{E}_{n}^{(1)}\\|_{T}^{2}+\alpha_{n+1}^{2}\\|\Delta^{\mathcal{A}}_{n+2}\\|_{T}^{2}\\|\widetilde{\theta}^{\circ}_{n}\\|_{T}^{2}$
	$\displaystyle\leq$	$\displaystyle(1-2\varrho\alpha_{n+1}+L^{2}\alpha_{n+1}^{2})\\|\mathcal{E}_{n}^{(1)}\\|_{T}^{2}+\alpha_{n+1}^{2}\\|\Delta^{\mathcal{A}}_{n+2}\\|_{T}^{2}\bigl{[}8\sum_{i=1}^{4}\\|\mathcal{E}_{n}^{(i)}\\|_{T}^{2}+2\\|\widetilde{\theta}^{\bullet}_{n}\\|_{T}^{2}\bigr{]}$

Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation

Abstract

1 Introduction

1.1 Markov Chain Monte Carlo

1.2 Reinforcement Learning

1.3 Literature Survey

1.4 Contributions

2 Mean Square Convergence

2.1 Notation and Background

Assumptions:

Theorem 2.1.

2.2 Decomposition and Scaling of the Parameter Sequence

Lemma 2.2.

Lemma 2.3.

2.3 Mean Square Error Bounds

Theorem 2.4.

Corollary 2.5.

Theorem 2.6.

Theorem 2.7.

2.4 Implications

Corollary 2.8.

2.5 Proof of Thm. 2.4

2.5.1 The martingale difference term

Proposition 2.9.

2.5.2 The telescoping sequence term

Proposition 2.10.

2.5.3 Proof of Thm. 2.4

2.6 Finer Error Bound

2.6.1 Finer Decomposition with Second Poisson Equation

2.6.2 Finer Mean Square Error Bound

Theorem 2.11.

2.6.3 Proof of Thm. 2.11

Proposition 2.12.

Proof of Thm. 2.11.

3 Conclusions

References

Appendix A Appendices

A.1 Proofs for decomposition and scaling

Proof of Lemma 2.2.

Proof of Lemma 2.3.

Lemma A.1.

Proof.

Lemma A.2.

Proof.

A.2 Proof of Prop. 2.9

(i)

(ii)

A.3 Proof for Prop. 2.10

(i)

(ii)

A.4 Proof of Prop. 2.12

(i)

(ii)

(iii)

Lemma A.3.

Proof.

Lemma A.4.

Proof.

Lemma A.5.

Proof.

Lemma A.6.

Proof.

Proof of Prop. 2.12 (iii).

A.5 Unbounded moments

Lemma A.7.

Proof.

Lemma A.8.

Proof.

Lemma A.9.

Proof.

Lemma A.10.

Proof.

A.6 Coupling of Deterministic and Random Linear SA

Lemma A.11.

Lemma A.12.

Proof.

Proof of Lemma A.11.

Lemma A.13.

Proof.

Proof of Thm. 2.7.

Explicit Mean-Square Error Bounds
for Monte-Carlo and Linear Stochastic Approximation