
Harris recurrent Markov chains and nonlinear monotone cointegrated models

Patrice Bertail ([email protected], ORCID 0000-0002-6011-3432), Cécile Durot ([email protected], ORCID 0000-0002-4502-0174), Carlos Fernández ([email protected], ORCID 0000-0002-8577-865X). MODAL'X, UMR CNRS 9023, Université Paris Nanterre (P. Bertail, C. Durot); LTCI, Telecom Paris, Institut Polytechnique de Paris (C. Fernández).
Abstract

In this paper, we study a nonlinear cointegration-type model of the form $Z_t = f_0(X_t) + W_t$ where $f_0$ is a monotone function and $X_t$ is a Harris recurrent Markov chain. We use a nonparametric least squares estimator to locally estimate $f_0$, and under mild conditions, we show its strong consistency and obtain its rate of convergence. New results (of the Glivenko-Cantelli type) for localized null recurrent Markov chains are also proved.

Keywords [class=MSC]: 62G05, 62M05, 62G30

Keywords: monotone regression, isotonic regression, nonlinear cointegration, nonparametric estimation, null recurrent Markov chain

1 Introduction and motivations

1.1 Linear and nonlinear cointegration models

Linear cointegration, introduced by [16] and developed by [14] and [21, 22], is a concept used in statistics and econometrics to describe a long-term relationship between two or more time series. In general, these time series are non-stationary and integrated of order 1, that is, they behave roughly as random walks. In traditional linear cointegration analysis, variables are assumed to have a linear relationship, which means their long-term equilibrium, as time grows, is characterized by a constant linear combination. This concept has since been extensively studied, particularly in the field of econometrics [14, 30, 31, 21, 22]. Notice that, when there is indeed a significant linear relationship, the link is monotone in each of the variables.

However, in some cases, the relationship between variables may exhibit nonlinear behavior, which cannot be adequately captured by linear cointegration models. The incorporation of nonlinearities allows for a more comprehensive understanding of long-term relationships between variables. [20] have developed an approach for analyzing nonlinear cointegration through threshold cointegration models. These models assume that the linear relationship between variables differs after some changepoints, leading to different long-run equilibrium states (for instance, according to some latent regimes). Threshold cointegration models provide a framework for capturing nonlinearities in the data and estimating the changepoints. Refer to [32] for examples and discussions on the importance of introducing nonlinearities in cointegration applications, and for further references.

Another method for analyzing nonlinear cointegration is through the use of smooth transition cointegration models, introduced by [17]. These models assume an ECM (Error Correction Model) form and allow for smooth transitions between different regimes in the data. Most estimators of nonlinear cointegration may be seen as Nadaraya-Watson estimators of the link function. For instance, Wang and Phillips [36, 37, 35] show that it is possible to estimate and perform asymptotic inference in specific nonparametric cointegration regression models using kernel regression techniques. Furthermore, they established that the self-normalized kernel regression estimators converge to a standard normal limit distribution, even when the explanatory variable is integrated. These findings indicate that the estimators can consistently capture the underlying relationship between variables, even in cases where the explanatory variable exhibits non-stationary behavior. The problem of estimating $f_0$ under Markovian assumptions has also been tackled using local linear M-type estimators in [8, 26] using smoothing techniques.

These results have been partially extended to the framework of general $\beta$-null recurrent Markov chains, and not just integrated $I(1)$ time series, by [23, 7]. Consider the simple framework where we observe two Markov chains, $Z_t$ and $X_t$. [23] are essentially interested in the study of nonlinear cointegrated models such as

$$Z_t = f_0(X_t) + W_t, \qquad (1)$$

where $f_0$ is a nonlinear function, $X_t$ and $W_t$ are independent processes, and $X_t$ is a positive or $\beta$-null recurrent Markov chain. Despite the fact that there is no stationary probability measure for $X_t$, they apply the Nadaraya-Watson method to estimate $f_0$ and establish the asymptotic theory of the proposed estimator. The rate of convergence of the estimators at some point $x$ is essentially linked to the local properties of the $\beta$-null recurrent chain $X_t$, and is typically of the order of the square root of the number of visits of the chain to a neighborhood of the point $x$.

1.2 Monotone cointegration models: motivations

Monotonicity in cointegration is a rather natural assumption in many economic applications, for instance for modeling demand as a function of income or prices (see for instance [11]) or other variables. Suppose, for example, we are interested in analyzing the long-term relationship between ice cream sales and the average monthly temperature. These two non-stationary variables may be modeled by some $\beta$-recurrent Markov chains. We hypothesize that as the average monthly temperature increases, the demand for ice cream also increases; however, the rate of increase may vary according to the season. In that case, the nonlinear relationship between the two variables will be monotone. In microeconomics, the same phenomenon is expected for Engel curves, describing how real expenditure varies with household income (see [11]). Expenditure and income (or their logs in most models) may be considered as non-stationary variables. However, considering a linear cointegration between them may be misleading, since the relationship may change along the life cycle. By Engel's law, the relationship between the two variables should be monotone. Other examples of monotone nonlinear cointegration phenomena may be found in [32].

The purpose of this paper is to propose a simple estimator that is automatically monotone, does not require strong smoothness assumptions (we only require continuity of the link function), and operates under general Markovian assumptions. We establish a nonparametric estimation theory of the nonparametric least squares estimator (LSE) for the function $f_0$ in the model (1) under the constraint that $f_0$ is monotone non-increasing. Here, $\{W_t\}$ is an unobserved process such that $E(W_t | X_t) = 0$ to ensure identifiability of $f_0$. Since a minimal condition for undertaking asymptotic analysis of $f_0(x_0)$ at a given point $x_0$ is that, as the number of observations of $\{X_t\}$ increases, there must be infinitely many observations in the neighborhood of $x_0$, the process $\{X_t\}$ will be assumed to be a Harris recurrent Markov chain (cf. Section 2). We consider the stationary and the $\beta$-null recurrent non-stationary frameworks at the same time. To our knowledge, this is the first time that such an estimator has been proposed in the literature in such a general framework.

1.3 The estimator

Let $C$ be a set whose interior contains our point of interest $x_0$. Having observed $\{(X_t, Z_t)\}_{t=0}^{n}$, we denote by $T_n(C)$ the number of times that $\mathbf{X}$ visited $C$ up to time $n$ and by $\sigma_C(i)$ the time of the $i$-th visit. Then, we consider the nonparametric LSE defined as the minimizer of

$$f \mapsto \sum_{i=1}^{T_n(C)} \left( Z_{\sigma_C(i)} - f\left( X_{\sigma_C(i)} \right) \right)^2 \qquad (2)$$

over the set of non-increasing functions $f$ on $\mathbb{R}$. The nonparametric LSE $\widehat{f}_n$ has a well-known characterization, as follows. Let $m$ be the number of unique values of $X_{\sigma_C(1)}, \dots, X_{\sigma_C(T_n(C))}$, and let $Y_1 < \dots < Y_m$ be the corresponding order statistics. Then, $\widehat{f}_n(Y_k)$ is the left-hand slope at $\sum_{i=1}^{T_n(C)} \mathbb{I}\{X_{\sigma_C(i)} \leqslant Y_k\}$ of the least concave majorant of the set of points

$$\left\{ (0,0),\ \left( \sum_{i=1}^{T_n(C)} \mathbb{I}\{X_{\sigma_C(i)} \leqslant Y_k\},\ \sum_{i=1}^{T_n(C)} Z_{\sigma_C(i)} \mathbb{I}\{X_{\sigma_C(i)} \leqslant Y_k\} \right),\ k=1,\dots,m \right\}, \qquad (3)$$

and it can be computed using simple algorithms as discussed in [3]. Thus, the constrained LSE is uniquely defined at the observation points; however, it is not uniquely defined between these points: any monotone interpolation of these values is a constrained LSE. As is customary, we consider in the sequel the piecewise-constant and left-continuous LSE that is constant on every interval $(Y_{k-1}, Y_k]$, $k=2,\dots,m$, and also on $(-\infty, Y_1]$ and on $[Y_m, \infty)$.
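As an illustration, the following minimal Python sketch (our own; the function names and data handling are hypothetical, and [3] should be consulted for production-grade algorithms) computes $\widehat{f}_n$ at the points $Y_1 < \dots < Y_m$ with a pool-adjacent-violators pass, which returns the same values as the left-hand slopes of the least concave majorant of the cumulative sum diagram (3):

```python
import numpy as np

def antitonic_lse(x, z):
    """LSE over non-increasing functions at the sorted unique x-values,
    via pool-adjacent-violators; equivalent to taking left-hand slopes of
    the least concave majorant of the cumulative sum diagram (3)."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    ys, inv = np.unique(x, return_inverse=True)      # Y_1 < ... < Y_m
    w = np.bincount(inv).astype(float)               # multiplicities at Y_k
    means = np.bincount(inv, weights=z) / w          # mean of Z at each Y_k

    # Pool adjacent violators for a NON-INCREASING fit: merge neighboring
    # blocks whenever the fitted value increases from left to right.
    vals, wts, sizes = [], [], []
    for m_k, w_k in zip(means, w):
        vals.append(m_k); wts.append(w_k); sizes.append(1)
        while len(vals) > 1 and vals[-2] < vals[-1]:
            merged = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / (wts[-2] + wts[-1])
            wts[-2] += wts[-1]; sizes[-2] += sizes[-1]; vals[-2] = merged
            vals.pop(); wts.pop(); sizes.pop()
    return ys, np.repeat(vals, sizes)                # (Y_k, f_hat(Y_k))

def evaluate(ys, fitted, x0):
    """Left-continuous piecewise-constant version: constant on
    (Y_{k-1}, Y_k], so evaluate at the smallest Y_k >= x0."""
    k = np.searchsorted(ys, x0, side="left")
    return fitted[min(k, len(ys) - 1)]
```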

The use of a localized estimator is due to the fact that we need to control the behavior of the chain around $x_0$, and, to do this, we need to estimate the asymptotic "distribution" of $\mathbf{X}$ in a vicinity of $x_0$. For Harris recurrent Markov chains, the long-term behavior of the chain is given by its invariant measure (see Section 2). In the positive recurrent case, the invariant measure is finite and can be estimated by simply considering the empirical cumulative distribution function of the $X_t$. However, in the null recurrent case, the invariant measure is only $\sigma$-finite; hence, we need to localize our analysis to a set big enough that the chain visits it infinitely often, but small enough that the restriction of the invariant measure to it is finite. Moreover, contrary to the bandwidth in kernel-type estimators, $C$ does not depend on $n$, and the rate of convergence of the estimator does not depend on $C$.

1.4 Outline

Since our paper draws quite heavily on the theory of Harris recurrent Markov chains, we have added a short introduction to the subject, as well as the main results that we use throughout the paper, in Section 2. In Section 3, we show that under very general assumptions, our estimator $\widehat{f}_n$ is strongly consistent, while its rate of convergence is presented in Section 4. In Section 5, we present new results concerning the localized empirical process of Harris recurrent Markov chains, which emerged during our investigation and which we believe are interesting in their own right. Section 6 contains the proofs of our main results.

2 Markov chain theory and notation

In this section, we present the notation and main results related to Markov chains that are needed to present our main results. For further details, we refer the reader to the first section of the Supplementary Material [4] and the books [29, 27, 12].

Consider a time-homogeneous irreducible Markov chain, denoted as $\mathbf{X} = X_0, X_1, X_2, \ldots$, defined on a probability space $(E, \mathcal{E}, \mathbb{P})$, where $\mathcal{E}$ is countably generated. The irreducibility measure of the chain is represented by $\psi$. The transition kernel of the chain is denoted by $P(x, A)$ and its initial distribution by $\lambda$. If the initial measure of the chain is specified, we use $\mathbb{P}_\lambda$ (and $\mathbb{E}_\lambda$) to denote the probability (and the expectation) conditioned on the law of the initial state $\mathcal{L}(X_0) = \lambda$.

For any set $C \in \mathcal{E}$, we will denote by $\sigma_C$ and $\tau_C$, respectively, the times of first visit and first return of the chain to the set $C$, i.e. $\sigma_C = \inf\{n \geqslant 0 : X_n \in C\}$ and $\tau_C = \inf\{n \geqslant 1 : X_n \in C\}$. The subsequent visit and return times $\sigma_C(k), \tau_C(k)$, $k \geqslant 1$, are defined inductively.

Given that our methods will only deal with the values of $\mathbf{X}$ in a fixed set $C$, if $A$ is a measurable set, we will write $\mathbb{I}_C\{X_t \in A\}$ instead of $\mathbb{I}\{X_t \in A \cap C\}$, and if $A = E$, then we will simply write $\mathbb{I}_C(X_t)$. We will use $T_n(C)$ to denote the random variable that counts the number of times the chain has visited the set $C$ up to time $n$, that is $T_n(C) = \sum_{t=0}^{n} \mathbb{I}_C(X_t)$. Similarly, we will write $T(C)$ for the total number of visits of the chain $\mathbf{X}$ to $C$. The set $C$ is called recurrent if $\mathbb{E}_x T(C) = +\infty$ for all $x \in C$, and the chain $\mathbf{X}$ is recurrent if every set $A \in \mathcal{E}$ such that $\psi(A) > 0$ is recurrent. A recurrent chain is called Harris recurrent if for all $x \in E$ and all $A \in \mathcal{E}$ with $\psi(A) > 0$ we have $\mathbb{P}(X_n \in A \text{ infinitely often} \,|\, X_0 = x) = 1$.

Denote by $\mathcal{E}^+$ the class of nonnegative measurable functions with positive $\psi$-support. A function $s \in \mathcal{E}^+$ is called small if there exist an integer $m_0 \geqslant 1$ and a measure $\nu \in \mathscr{M}(\mathcal{E})_+$ such that

$$P^{m_0}(x, A) \geqslant s(x)\,\nu(A) \qquad \forall x \in E,\ A \in \mathcal{E}. \qquad (4)$$

When a chain possesses a small function $s$, we say that it satisfies the minorization inequality $M(m_0, s, \nu)$. A set $A \in \mathcal{E}$ is said to be small if the function $\mathbb{I}_A$ is small. Similarly, a measure $\nu$ is small if there exist $m_0$ and $s$ that satisfy (4). By Theorem 2.1 in [29], every irreducible Markov chain possesses a small function, and Proposition 2.6 of the same book shows that every measurable set $A$ with $\psi(A) > 0$ contains a small set. In practice, finding such a set consists in most cases in exhibiting an accessible set for which the probability that the chain returns to it in $m$ steps is uniformly bounded below. Moreover, under quite general conditions a compact set will be small; see [15].

An irreducible chain possesses an accessible atom if there is a set $\boldsymbol{\alpha} \in \mathcal{E}$ such that for all $x, y$ in $\boldsymbol{\alpha}$: $P(x, \cdot) = P(y, \cdot)$ and $\psi(\boldsymbol{\alpha}) > 0$. When an accessible atom exists, the stochastic stability properties of $\mathbf{X}$ amount to properties concerning the speed of return to the atom only. Moreover, it follows from the strong Markov property that the sample paths may be divided into independent blocks of random length corresponding to consecutive visits to $\boldsymbol{\alpha}$. The sequence $\{\tau_{\boldsymbol{\alpha}}(j)\}_{j \geqslant 1}$ defines successive times at which the chain forgets its past, called regeneration times. Similarly, the sequence of i.i.d. blocks $\{\mathcal{B}_j\}_{j \geqslant 1}$ are named regeneration blocks. The random variable $T(n) = T_n(\boldsymbol{\alpha}) - 1$ counts the number of i.i.d. blocks up to time $n$ and is called the number of regenerations up to time $n$.
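As a toy illustration of this block structure (ours, not from the paper; a general Harris chain first requires the splitting construction described next), the following Python sketch cuts a trajectory of a chain with a genuine atom into its regeneration blocks:

```python
import numpy as np

def regeneration_blocks(path, atom):
    """Cut a trajectory at successive visits to an accessible atom; the
    segments between consecutive visits are the i.i.d. regeneration blocks
    B_1, B_2, ..., and their count is T(n) = T_n(atom) - 1."""
    path = np.asarray(path)
    hits = np.flatnonzero(path == atom)              # visit times of the atom
    return [path[hits[j] + 1 : hits[j + 1] + 1] for j in range(len(hits) - 1)]

# Toy example: simple random walk reflected at 0, with atom {0}. The chain
# is null recurrent, so block lengths are heavy-tailed (infinite mean).
rng = np.random.default_rng(1)
x = np.abs(np.cumsum(rng.choice([-1, 1], size=5000)))
blocks = regeneration_blocks(x, atom=0)
print(len(blocks), "regenerations up to time n = 5000")
```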

If $\mathbf{X}$ does not possess an atom but is Harris recurrent (and therefore satisfies a minorization inequality $M(m_0, s, \nu)$), a splitting technique, introduced in [28, 29], allows us to extend in some sense the probabilistic structure of $\mathbf{X}$ in order to artificially construct an atomic chain (named the split chain and denoted by $\check{\mathbf{X}}$) that inherits the communication and stochastic stability properties of $\mathbf{X}$. One of the main results derived from this construction is the fact that every Harris recurrent Markov chain admits a unique (up to a multiplicative constant) invariant measure (see Proposition 10.4.2 in [27]), that is, a measure $\pi$ such that

$$\pi(B) = \int P(x, B)\, d\pi(x) \qquad \forall B \in \mathcal{E}.$$

The invariant measure is finite if and only if $\mathbb{E}_{\check{\boldsymbol{\alpha}}} \tau_{\check{\boldsymbol{\alpha}}} < +\infty$; in this case we say the chain is positive recurrent, otherwise we say the chain is null recurrent. A null recurrent chain is called $\beta$-null recurrent (cf. Definition 3.2 in [24]) if there exist a small nonnegative function $h$, a probability measure $\lambda$, a constant $\beta \in (0,1)$ and a slowly varying function $L_h$ such that

$$\mathbb{E}_\lambda\left( \sum_{t=0}^{n} h(X_t) \right) \sim \frac{1}{\Gamma(1+\beta)}\, n^\beta L_h(n) \quad \text{as } n \to \infty.$$

As argued in [24], it is not too severe a restriction to assume $m_0 = 1$. Therefore, throughout this paper we assume that $\mathbf{X}$ satisfies the minorization inequality $M(1, s, \nu)$, i.e., there exist a measurable function $s$ and a probability measure $\nu$ such that $0 \leqslant s(x) \leqslant 1$, $\int_E s(x)\, d\nu(x) > 0$ and

$$P(x, A) \geqslant s(x)\,\nu(A). \qquad (5)$$
Remark 2.1.

The extensions to the case $m_0 > 1$ of the results presented in this paper can be carried out (although they involve somewhat complicated notations/proofs) using the $m$-skeleton or the resolvent chains, as described in [9, 10] and Chapter 17 of [27]. However, they are not treated in this paper.

The following theorem is a compendium of the main properties of Harris recurrent Markov chains that will be used throughout the paper. Among other things, it shows that the asymptotic behavior of $T(n)$ is similar to that of the function $u(n)$ defined as

$$u(n) = \begin{cases} n, & \text{if } \mathbf{X} \text{ is positive recurrent} \\ n^\beta L(n), & \text{if } \mathbf{X} \text{ is } \beta\text{-null recurrent.} \end{cases} \qquad (6)$$
Theorem 2.1.

Suppose $\mathbf{X}$ is a Harris recurrent, irreducible Markov chain, with initial measure $\lambda$, that satisfies the minorization condition (5). Let $T(n)$ be the number of complete regenerations until time $n$ of the split chain $\check{\mathbf{X}}$, let $C \in \mathcal{E}$ be a small set and let $\pi$ be an invariant measure for $\mathbf{X}$. Then,

  1. $0 < \pi(C) < +\infty$.

  2. $T(n)/T_n(C)$ converges almost surely to a positive constant.

  3. $T(n)/u(n)$ converges almost surely to a positive constant if $\mathbf{X}$ is positive recurrent, and converges in distribution to a Mittag-Leffler random variable with index $\beta$ if $\mathbf{X}$ is $\beta$-null recurrent. (The Mittag-Leffler distribution with index $\beta$ is a non-negative continuous distribution whose moments are given by $\mathbb{E}(M^m_\beta(1)) = \frac{m!}{\Gamma(1+m\beta)}$, $m \geqslant 0$; see (3.39) in [24] for more details.)
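As a quick numerical illustration of part 3 (ours, not from the paper): a Gaussian random walk is a classical $\beta$-null recurrent chain with $\beta = 1/2$, so for a compact set $C$ the count $T_n(C)$ grows like $n^{1/2}$; the normalized ratio below fluctuates rather than settling on a constant, in line with the (random) Mittag-Leffler limit.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 1_000_000
x = np.cumsum(rng.normal(size=n))          # Gaussian random walk, beta = 1/2
T = np.cumsum((x >= -1.0) & (x <= 1.0))    # T_t(C) for C = [-1, 1], t = 1..n

for t in (10**4, 10**5, 10**6):
    print(f"n = {t:>7}: T_n(C) = {T[t - 1]:>6}, T_n(C)/sqrt(n) = {T[t - 1] / t**0.5:.2f}")
```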

3 Consistency

The aim of this section is to show that for an arbitrary $x_0$ in the support of $f_0$, the LSE $\widehat{f}_n(x_0)$ is consistent. We make the following assumptions on the processes $\mathbf{X} = \{X_t\}$ and $\mathbf{W} = \{W_t\}$.

  • (A1)

    $\mathbf{X}$ is a Harris recurrent Markov chain whose kernel $P(x,A)$ satisfies the minorization condition (5).

Let $\mathcal{F}_n = \sigma(\{X_0, \ldots, X_n\})$ be the sigma-algebra generated by the chain $\mathbf{X}$ up to time $n$.

  • (A2)

    For each $n$, the random variables $W_1, \ldots, W_n$ are conditionally independent given $\mathcal{F}_n$, $\mathbb{E}(W_t | \mathcal{F}_n) = 0$ and $\operatorname{Var}(W_t | \mathcal{F}_n) \leqslant \sigma^2$ for some $\sigma > 0$.

It follows from Assumption (A1) that the Markov chain $\mathbf{X}$ admits a unique (up to a multiplicative constant) $\sigma$-finite invariant measure $\pi$. Let $C$ be a set such that $0 < \pi(C) < \infty$ and $x_0 \in C$. We denote by $F_n$ the process defined by

$$F_n(y) = \frac{1}{T_n(C)} \sum_{i=1}^{T_n(C)} \mathbb{I}\{X_{\sigma_C(i)} \leqslant y\} = \frac{1}{T_n(C)} \sum_{t=0}^{n} \mathbb{I}_C\{X_t \leqslant y\} \qquad (7)$$

for all $y \in \mathbb{R}$, which is a localized version of the empirical distribution function of the $X_t$'s. It is proved in Lemma 5.1 that $F_n$ converges almost surely to the distribution function $F$ supported on $C$ and defined by

$$F(y) = \frac{\pi(C \cap (-\infty, y])}{\pi(C)}. \qquad (8)$$
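Computationally, $F_n$ in (7) is nothing but the empirical CDF of the visits of the chain to $C$; a minimal Python sketch (ours, with hypothetical names) reads:

```python
import numpy as np

def localized_ecdf(path, c_lo, c_hi):
    """Localized empirical distribution function F_n of (7), built from
    the visits of the chain to C = [c_lo, c_hi]."""
    path = np.asarray(path, dtype=float)
    visits = np.sort(path[(path >= c_lo) & (path <= c_hi)])
    T_nC = visits.size                               # T_n(C)
    def F_n(y):
        # fraction of visits with value <= y
        return np.searchsorted(visits, y, side="right") / T_nC
    return F_n, T_nC
```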

Our next two assumptions guarantee that there is a compact set $C$ that is a small set and contains $x_0$ as an interior point. Sets like this can be found under very general conditions (cf. [15]).

  • (A3)

    There is $\delta = \delta(x_0)$ such that the set $C = [x_0 - \delta, x_0 + \delta]$ is small.

  • (A4)

    $x_0$ belongs to the interior of the support of $X_t$.

Notice that by part 1 of Theorem 2.1, (A3)  guarantees that π(C)\pi(C) is finite and positive, and hence, FF is properly defined.

In addition to the assumptions on the processes $\{X_t\}$ and $\{W_t\}$, we need smoothness assumptions on $F$ and on $f_0$. In particular, we will assume that $F$ and $f_0$ are continuous and strictly monotone on $C$. This implies that $f_0$ and $F$ are invertible on $C$, so we can find neighborhoods of $f_0(x_0)$ and $F(x_0)$, respectively, over which the inverse functions are uniquely defined. We denote by $f_0^{-1}$ and $F^{-1}$ the inverses of $f_0$ and $F$ over such neighborhoods of $f_0(x_0)$ and $F(x_0)$, respectively. The function $f_0$ is assumed to be monotone on its whole support.

  • (A5)

    $F$ is locally continuous and strictly increasing in the sense that for all $x'$ in $C$ and all $\varepsilon > 0$, there exists $\gamma > 0$ such that $|F^{-1}(u) - x'| > \gamma$ for all $u$ such that $|u - F(x')| \geqslant \varepsilon$.

  • (A6)

    $f_0$ is non-increasing, and $f_0$ is locally strictly decreasing in the sense that for all $x'$ in $C$ and all $\varepsilon > 0$, there exists $\gamma > 0$ such that $|f_0(x') - f_0(y)| > \gamma$ for all $y$ such that $|y - x'| \geqslant \varepsilon$.

  • (A7)

    $f_0$ is continuous on $C$.

Assumptions (A1), (A3) and (A5) ensure that $X_t$ visits any small enough neighborhood of $x_0$ infinitely many times with probability 1, and guarantee that $x_0$ is not at the boundary of the recurrent states. Assumptions (A1) and (A3) and Lemma 3.2 in [24] imply that $T_n(C) \to \infty$ almost surely.

Theorem 3.1.

Suppose that assumptions (A1)-(A7) are satisfied. Then, as $n \to \infty$, one has

$$\widehat{f}_n(x_0) = f_0(x_0) + o_P(1), \qquad (9)$$

and

$$\widehat{f}_n^{-1}(f_0(x_0)) = x_0 + o_P(1). \qquad (10)$$

4 Rates of convergence

To compute rates of convergence, we need stronger assumptions than for consistency. We replace assumption (A1) with the following stronger version.

  • (B1)

    $\{X_t\}$ is a positive or $\beta$-null recurrent, aperiodic and irreducible Markov chain whose kernel $P(x, A)$ satisfies the minorization condition (5).

We replace (A5), (A6) and (A7) with the following slightly more restrictive assumption.

  • (B2)

    The function $f_0$ is non-increasing, the functions $f_0$ and $F_C$ are differentiable on $C$, and the derivatives $F_C'$ and $f_0'$ are bounded, in absolute value, above and away from zero on $C$.

Let $\lambda$ be the initial measure of $\mathbf{X}$. Our next hypothesis imposes some control on the behavior of the chain in the first regenerative block.

  • (B3)

    There exist a constant $K$ and a neighborhood $V$ of 0 such that

    $$\mathbb{E}_\lambda\left( \sum_{t=0}^{\tau_{\check{\boldsymbol{\alpha}}}} \left( \mathbb{I}_C\{X_t \leqslant x_0 + \gamma\} - \mathbb{I}_C\{X_t \leqslant x_0 - \gamma\} \right) \right) \leqslant K\gamma \quad \forall \gamma \in V.$$

Assumption (B3) is satisfied if we assume that the initial measure of the chain is the small measure used for the construction of the split chain (see equation 4.16c in [29]). In the positive recurrent case, taking $\lambda$ equal to the unique invariant probability measure of the chain also satisfies (B3).

Finally, we need to control the number of times the chain visits $C$ in a regeneration block.

  • (B4)

    $\ell_C(\mathcal{B}_1) = \sum_{t \in \mathcal{B}_1} \mathbb{I}_C(X_t)$ has a finite second moment.

Theorem 4.1.

Assume that (A2), (A3), (A4), (B1), (B2), (B3) and (B4) hold. Then, as $n \to \infty$, one has

$$\widehat{f}_n(x_0) = f_0(x_0) + O_P\left(u(n)^{-1/3}\right), \qquad (11)$$

with $u(n)$ as defined in (6).

The rate $u(n)$ comes from Lemmas 5.3 and 6.7, and as can be seen from Theorem 2.1, it is a deterministic approximation of $T(n)$. Note that in the positive recurrent case, $u(n) = n$, hence we obtain the same rate $n^{1/3}$ as in the i.i.d. case [18, Chapter 2]. In the $\beta$-null recurrent case, however, the rate of convergence is $n^{\beta/3} L^{1/3}(n)$, which is slower than the usual rate. This is due to the null recurrence of the chain: it takes longer for the process to return to a neighborhood of the point $x_0$, and it is these points in the neighborhood of $x_0$ that are used in the nonparametric estimation.
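As a rough order-of-magnitude check (our illustration, taking $L \equiv 1$): for a $\beta$-null recurrent chain with $\beta = 1/2$, such as a random walk,

$$u(n)^{-1/3} = n^{-1/6}, \qquad \text{so that } n = 10^6 \text{ gives } u(n)^{-1/3} = 10^{-1}, \text{ versus } n^{-1/3} = 10^{-2} \text{ in the positive recurrent case.}$$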

5 Localized Markov chains

Given the localized nature of our approach, in this section, we present some results that are particularly useful in this scenario. These results are well known for positive recurrent chains but are new in the null recurrent case. The detailed proofs of these results can be found in Section 2 of the Supplementary material [4].

The first result can be viewed as an extension of the Glivenko-Cantelli theorem to the localized scenario.

Lemma 5.1.

Assume that (A1) and (A3) hold. Then, there exists a stationary $\sigma$-finite measure $\pi$, with $F$ defined by (8), such that

$$\sup_{y \in \mathbb{R}} |F_n(y) - F(y)| \to 0 \quad a.s. \qquad (12)$$

as $n \to \infty$. If (A5) is also satisfied, then, for all sufficiently small $\varepsilon > 0$, as $n \to \infty$ we have

$$\sup_{|p - F(x_0)| \leqslant \varepsilon} \left|F_n^{-1}(p) - F^{-1}(p)\right| \to 0 \quad a.s. \qquad (13)$$

Our next result (Lemma 5.2), which is an extension of Lemma 2 in [5] to the localized $\beta$-null recurrent case, deals with the properties of classes of functions defined over the regeneration blocks. Before presenting the result, we need some machinery.

Recall that $E \subseteq \mathbb{R}$ denotes the state space of $\mathbf{X}$. Define $\widehat{E} = \cup_{k=1}^{\infty} E^k$ (i.e. the set of finite sequences of elements of $E$) and let the localized occupation measure $M_C$ be given by

$$M_C(B, dy) = \sum_{x \in B \cap C} \delta_x(y), \qquad \text{for every } B \in \widehat{E}.$$

The function that gives the size of the localized blocks is $\ell_C : \widehat{E} \to \mathbb{N}$,

$$\ell_C(B) = \int M_C(B, \mathrm{d}y), \qquad \text{for every } B \in \widehat{E}.$$

Let $\widehat{\mathcal{E}}$ denote the smallest $\sigma$-algebra formed by the elements of the $\sigma$-algebras $\mathcal{E}^k$, $k \geqslant 1$, where $\mathcal{E}^k$ stands for the classical product $\sigma$-algebra. Let $\widehat{Q}$ denote a probability measure on $(\widehat{E}, \widehat{\mathcal{E}})$. If $B(\omega)$ is a random variable with distribution $\widehat{Q}$, then $M_C(B(\omega), \mathrm{d}y)$ is a random measure, i.e., $M_C(B(\omega), \mathrm{d}y)$ is a (counting) measure on $(E, \mathcal{E})$, almost surely, and for every $A \in \mathcal{E}$, $M_C(B(\omega), A) = \int_A M_C(B(\omega), \mathrm{d}y)$ is a measurable random variable (valued in $\mathbb{N}$). Hence $\ell_C(B(\omega)) \times \int f(y)\, M_C(B(\omega), \mathrm{d}y)$ is a random variable and, provided that $\widehat{Q}(\ell_C^2) < \infty$, the map $Q_C$, defined by

$$Q_C(A) = E_{\widehat{Q}}\left( \ell_C(B) \times \int_A M_C(B, \mathrm{d}y) \right) / E_{\widehat{Q}}(\ell_C^2), \qquad \text{for every } A \in \mathcal{E}, \qquad (14)$$

is a probability measure on $(E, \mathcal{E})$. The notation $E_{Q_C}$ stands for the expectation with respect to the underlying measure $Q_C$. Introduce the following notation: for any function $g : E \to \mathbb{R}$, let $\widehat{g}_C : \widehat{E} \to \mathbb{R}$ be given by

$$\widehat{g}_C(B) = \int g(y)\, M_C(B, dy) = \sum_{x \in B \cap C} g(x) = \sum_{x \in B} g_C(x), \qquad (15)$$

and for any class $\mathcal{G}$ of real-valued functions defined on $E$, denote the localized version of the sums on the blocks by $\widehat{\mathcal{G}}_C = \{\widehat{g}_C : g \in \mathcal{G}\}$.

Notice that, for any function $g$,

$$\mathbb{E}_{Q_C}(g) = \frac{\mathbb{E}_{\widehat{Q}}\left( \ell_C(B) \times \int g(y)\, M_C(B, dy) \right)}{\mathbb{E}_{\widehat{Q}}(\ell_C^2)} = \frac{\mathbb{E}_{\widehat{Q}}\left( \ell_C(B)\, \widehat{g}_C(B) \right)}{\mathbb{E}_{\widehat{Q}}(\ell_C^2)}. \qquad (16)$$
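To make (16) concrete, here is a minimal plug-in sketch (ours; the function name and block representation are hypothetical) that estimates $\mathbb{E}_{Q_C}(g)$ by replacing the two $\widehat{Q}$-expectations with empirical means over a sample of blocks:

```python
import numpy as np

def E_QC(g, blocks, c_lo, c_hi):
    """Empirical version of (16): ratio of block-averaged moments, with
    ell_C(B) the number of points of the block B falling in C."""
    num, den = 0.0, 0.0
    for B in blocks:
        B = np.asarray(B, dtype=float)
        in_C = B[(B >= c_lo) & (B <= c_hi)]
        ell = in_C.size                      # ell_C(B)
        num += ell * g(in_C).sum()           # ell_C(B) * integral of g dM_C
        den += ell ** 2                      # ell_C(B)^2
    return num / den                         # -> E_{Q_C}(g) as #blocks grows

# e.g. E_QC(lambda v: v**2, blocks, -1.0, 1.0) with blocks as in Section 2
```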
Lemma 5.2.

Let $\widehat{Q}$ be a probability measure on $(\widehat{E}, \widehat{\mathcal{E}})$ such that $0 < \|\ell_C\|_{L^2(\widehat{Q})} < \infty$ and let $\mathcal{G}$ be a class of measurable real-valued functions defined on $(E, \mathcal{E})$. Then we have, for every $0 < \varepsilon < \infty$,

$$\mathcal{N}\left( \varepsilon \|\ell_C\|_{L^2(\widehat{Q})},\ \widehat{\mathcal{G}}_C,\ L^2(\widehat{Q}) \right) \leqslant \mathcal{N}\left( \varepsilon,\ \mathcal{G},\ L^2(Q_C) \right),$$

where $Q_C$ is given in (14). Moreover, if $\mathcal{G}$ belongs to the Vapnik-Chervonenkis (VC) class of functions with constant envelope $U$ and characteristic $(\mathbf{C}, v)$, then $\widehat{\mathcal{G}}_C$ is VC with envelope $U\ell_C$ and characteristic $(\mathbf{C}, v)$.

Remark 5.1.

For a probability measure $\mu$ and a class of functions $\mathcal{H}$, the covering number $\mathcal{N}(\varepsilon, \mathcal{H}, L^r(\mu))$ is the minimum number of $L^r(\mu)$ $\varepsilon$-balls needed to cover $\mathcal{H}$. For more details about this concept and the VC class of functions, see [25].

To put Lemma 5.2 into perspective, consider a class of bounded functions $\mathcal{G}$ that is VC with finite envelope. Lemma 5.2 tells us that the class of unbounded functions $\widehat{\mathcal{G}}_C$ is also VC. If, in addition, (B4) holds, then Theorem 2.5 in [25] tells us that $\widehat{\mathcal{G}}_C$ is a Donsker class. Reasoning like this is used in the proof of the following result, which is a stronger version of Lemma 5.1 under assumptions (B1) and (B2) and has some interest of its own.

Lemma 5.3.

Assume that (B1), (B2), (A3), (A4) and (B4) hold. Then, for all sufficiently small $\varepsilon > 0$ we have,

$$T_n(C) \sup_{|y - x_0| \leqslant \varepsilon} \left|F_n(y) - F(y)\right|^2 = O_p(1) \qquad (17)$$

when $n$ goes to $+\infty$. If (B2) is also satisfied, as $n \to \infty$ we have

$$T_n(C) \sup_{|p - F(x_0)| \leqslant \varepsilon} \left|F_n^{-1}(p) - F^{-1}(p)\right|^2 = O_p(1). \qquad (18)$$

6 Proofs

In this section, we provide the proofs of Theorems 3.1 and 4.1. These proofs make use of several intermediate lemmas, whose proofs can be found in Sections 7 and 8.

6.1 Proof of Theorem 3.1

Recall that we consider the piecewise-constant and left-continuous LSE $\widehat{f}_n$ that is constant on every interval $(Y_{k-1}, Y_k]$, $k=2,\dots,m$, and also on $(-\infty, Y_1]$ and on $[Y_m, \infty)$. With $\delta > 0$ fixed, we denote by $T_n(C)$ the number of times the Markov chain $\mathbf{X}$ visits the set $C := [x_0 - \delta, x_0 + \delta]$ until time $n$:

$$T_n(C) = \sum_{t=0}^{n} \mathbb{I}\{X_t \in C\}. \qquad (19)$$

Let $l_k = \sum_{t=0}^{n} \mathbb{I}_C\{X_t \leqslant Y_k\}$ for all $k \in \{1, \dots, m\}$ and $l_0 = 0$.

Our aim is to provide a characterization of $\widehat{f}_n(x_0)$. Recall from (7) that the localized empirical distribution function $F_n$ is defined as

$$F_n(y) = \frac{1}{T_n(C)} \sum_{i=1}^{T_n(C)} \mathbb{I}\{X_{\sigma_C(i)} \leqslant y\} = \frac{1}{T_n(C)} \sum_{t=0}^{n} \mathbb{I}_C\{X_t \leqslant y\}$$

for $y \in \mathbb{R}$. $F_n$ is 0 on $(-\infty, Y_1)$, so, with an arbitrary random variable $Y_0 < Y_1$, we have $F_n(y) = F_n(Y_0) = 0$ for all $y < Y_1$. Let $\mathcal{K}$ be the set

$$\mathcal{K} := \left\{F_n(Y_k),\ k=0,\dots,m\right\} \qquad (20)$$

and let $\Lambda_n$ be the continuous piecewise-linear process on $[F_n(Y_0), F_n(Y_m)]$ with knots at the points in $\mathcal{K}$ and values

$$\Lambda_n(F_n(Y_k)) = \frac{1}{T_n(C)} \sum_{t=0}^{n} Z_t\, \mathbb{I}_C\{X_t \leqslant Y_k\} \qquad (21)$$

at the knots. The characterization of $\widehat{f}_n$ in Lemma 6.2 involves the least concave majorant of $\Lambda_n$. Note that we use $T_n(C)$ as a normalization in the definitions of the processes $F_n$ and $\Lambda_n$ since this choice ensures that $F_n$ and $\Lambda_n$ converge to fixed functions; see Lemma 5.1.

Lemma 6.1.

For all $y \in \left[F_n(Y_0), F_n(Y_m)\right]$,

$$\Lambda_n(y) = L_n(y) + M_n(y),$$

where,

$$L_n(y) = \int_0^y f_0 \circ F_n^{-1}(u)\, du, \qquad (22)$$

and $M_n$ is a piecewise-linear process with knots at $F_n(Y_k)$ for $k \in \{0, \dots, m\}$ such that

$$M_n(F_n(Y_k)) = \frac{1}{T_n(C)} \sum_{t=0}^{n} W_t\, \mathbb{I}_C\{X_t \leqslant Y_k\}.$$

Moreover, $M_n$ can be written as

$$M_n(y) = \begin{cases} 0, & \text{if } y = 0 \\ R_n^j(y) + M_n^j, & \text{otherwise} \end{cases} \qquad (23)$$

where,

$$M_n^j = M_n(F_n(Y_j)) = \frac{1}{T_n(C)} \sum_{t=0}^{n} W_t\, \mathbb{I}_C\{X_t \leqslant Y_j\}, \qquad (24)$$

$$R_n^j(y) = \frac{\sum_{t=0}^{n} W_t\, \mathbb{I}_C\{X_t = Y_{j+1}\}}{l_{j+1} - l_j} \left( y - F_n(Y_j) \right), \qquad (25)$$

and $j$ is such that $Y_{j+1} = F_n^{-1}(y)$.

In the next lemma, we give an alternative characterization of the monotone nonparametric LSE $\widehat{f}_n$ at the observation points $Y_1, \dots, Y_m$.

Lemma 6.2.

Let $C = [x_0 - \delta, x_0 + \delta]$ for some fixed $\delta > 0$. Let $\widehat{\lambda}_n$ be the left-hand slope of the least concave majorant of $\Lambda_n$. Then,

$$\widehat{f}_n(Y_k) = \widehat{\lambda}_n \circ F_n(Y_k), \quad \forall k \in \{1, \dots, m\}, \qquad (26)$$

with probability 1 for $n$ big enough.

We consider below the generalized inverse function of $\widehat{f}_n$ since it has a more tractable characterization than $\widehat{f}_n$ itself. To this end, let us define precisely the generalized inverses of all processes of interest. Since $\widehat{\lambda}_n$ is a non-increasing left-continuous step function on $(F_n(Y_0), F_n(Y_m)]$ that can have jumps only at the points $F_n(Y_k)$, $k \in \{1, \dots, m\}$, we define its generalized inverse $\widehat{U}_n(a)$, for $a \in \mathbb{R}$, as the greatest $y \in (F_n(Y_0), F_n(Y_m)]$ that satisfies $\widehat{\lambda}_n(y) \geqslant a$, with the convention that the supremum of an empty set is $F_n(Y_0)$. Then for every $a \in \mathbb{R}$ and $y \in (F_n(Y_0), F_n(Y_m)]$, one has

$$\widehat{\lambda}_n(y) \geqslant a \text{ if and only if } \widehat{U}_n(a) \geqslant y. \qquad (27)$$

Likewise, since $\widehat{f}_n$ is a left-continuous non-increasing step function on $\mathbb{R}$ that can have jumps only at the observation points $Y_1 < \dots < Y_m$, we define the generalized inverse $\widehat{f}_n^{-1}(a)$, for $a \in \mathbb{R}$, as the greatest $y \in [Y_0, Y_m]$ that satisfies $\widehat{f}_n(y) \geqslant a$, with the convention that the supremum of an empty set is $Y_0$. We then have

$$\widehat{f}_n(y) \geqslant a \text{ if and only if } \widehat{f}_n^{-1}(a) \geqslant y \qquad (28)$$

for all $a \in \mathbb{R}$ and $y \in (Y_0, Y_m]$. On the other hand, since $F_n$ is a right-continuous non-decreasing step function on $\mathbb{R}$ with range $[F_n(Y_0), F_n(Y_m)]$, we define the generalized inverse $F_n^{-1}(a)$, for $a \leqslant F_n(Y_m)$, as the smallest $y \in [Y_0, Y_m]$ which satisfies $F_n(y) \geqslant a$. Note that the infimum is achieved for all $a \leqslant F_n(Y_m)$. We then have

$$F_n(y) \geqslant a \text{ if and only if } F_n^{-1}(a) \leqslant y \qquad (29)$$

for all $a \leqslant F_n(Y_m)$ and $y \in [Y_0, Y_m]$, and thanks to Lemma 6.2 we have

$$\widehat{f}_n^{-1} = F_n^{-1} \circ \widehat{U}_n \qquad (30)$$

on $\mathbb{R}$. Moreover, one can check that

$$\widehat{U}_n(a) = \mathop{\mathrm{argmax}}_{p \in [F_n(Y_0), F_n(Y_m)]} \{\Lambda_n(p) - ap\}, \text{ for all } a \in \mathbb{R}, \qquad (31)$$

where argmax denotes the greatest location of the maximum (which is achieved on the set $\mathcal{K}$ in (20)). Thus, the inverse process $\widehat{U}_n$ is a location process that is more tractable than $\widehat{f}_n$ and $\widehat{\lambda}_n$ themselves. A key idea in the following proofs is to derive properties of $\widehat{U}_n$ from its argmax characterization (31), then to translate these properties to $\widehat{f}_n^{-1}$ thanks to (30), and finally to translate them to $\widehat{f}_n$ thanks to (28).
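Since $\Lambda_n$ is piecewise linear with knots in $\mathcal{K}$, the argmax in (31) reduces to a finite maximization over $\mathcal{K}$; the following Python sketch (ours, with hypothetical names) evaluates $\Lambda_n$ at the knots via (20)-(21) and returns the greatest maximizer:

```python
import numpy as np

def U_hat(a, x_visits, z_visits):
    """Greatest maximizer over the knot set K of p -> Lambda_n(p) - a*p,
    following (31). Inputs are the visit values (X_{sigma_C(i)}, Z_{sigma_C(i)})."""
    order = np.argsort(x_visits)
    x = np.asarray(x_visits, dtype=float)[order]
    z = np.asarray(z_visits, dtype=float)[order]
    T_nC = x.size
    ys = np.unique(x)                              # Y_1 < ... < Y_m
    l = np.searchsorted(x, ys, side="right")       # l_k = #{X <= Y_k}
    knots = np.concatenate(([0.0], l / T_nC))      # F_n(Y_0) = 0, F_n(Y_k)
    lam = np.concatenate(([0.0], np.cumsum(z)[l - 1] / T_nC))   # (21) at knots
    crit = lam - a * knots
    return knots[np.flatnonzero(crit == crit.max()).max()]     # greatest argmax
```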

Going from $\widehat{U}_n$ to $\widehat{f}_n^{-1}$ using (30) requires approximating $F_n^{-1}$ by a fixed function. Hence, in the sequel, we are concerned with the convergence of the process $F_n$ given in (7), where $\delta > 0$ is chosen sufficiently small, and with the convergence of the corresponding inverse function $F_n^{-1}$.

It is stated in Lemma 5.1 that under (A1) and (A3), $F_n$ converges to a fixed distribution function $F$ that depends on $C$, hence on $\delta$. If, moreover, $F$ is strictly increasing on $C$, then we can find a neighborhood of $F(x_0)$ over which the (usual) inverse function $F^{-1}$ is uniquely defined, and $F_n^{-1}$ converges to $F^{-1}$.

In the following lemma, we show that $F(x_0)$ belongs to the domain of $\Lambda_n$ with probability tending to one as $n \to \infty$.

Lemma 6.3.

Assume that (A1), (A3), (A4) and (A5) hold. Then, we can find $\varepsilon > 0$ such that the probability that $Y_1 + \varepsilon \leqslant x_0 \leqslant Y_m - \varepsilon$ tends to one as $n \to \infty$. Moreover, the probability that $F_n(Y_1) \leqslant F(x_0) \leqslant F_n(Y_m)$ tends to one as $n \to \infty$.

We will also need to control the noise $\{W_t\}$. The following lemma shows that the noise is negligible under our assumptions.

Lemma 6.4.

Assume that (A1) and (A2) hold. Let $\mathcal{F}_n = \sigma(\{X_0, \ldots, X_n\})$. Then,

$$\sum_{t=0}^{n} W_t\, \mathbb{I}_C\{X_t = A_n\} = o_P(T_n(C)),$$

and

$$\sup_{u > A_n} \left| \sum_{t=0}^{n} W_t\, \mathbb{I}_C\{X_t \in (A_n, u]\} \right| = o_P(T_n(C)),$$

for any sequence of random variables $A_n$, independent of the process $\{W_t\}$, that is adapted to the filtration $\{\mathcal{F}_n\}$.

With the above lemmas, we can prove the convergence of $\widehat{U}_n$, given by (31), to $F(x_0)$.

Lemma 6.5.

Suppose that assumptions (A1)-(A7) are satisfied. Then, as $n \to \infty$, one has

$$\widehat{U}_n(f_0(x_0)) = F(x_0) + o_P(1). \qquad (32)$$

Now we proceed to the proof of (10). Fix $\varepsilon > 0$ arbitrarily small. It follows from (30) and (29) that

$$\mathbb{P}\left(\widehat{f}_n^{-1}(a) > x_0 + \varepsilon\right) \leqslant \mathbb{P}\left(F_n^{-1} \circ \widehat{U}_n(a) > x_0 + \varepsilon\right) \leqslant \mathbb{P}\left(\widehat{U}_n(a) \geqslant F_n(x_0 + \varepsilon)\right) \leqslant \mathbb{P}\left(\widehat{U}_n(a) \geqslant F(x_0 + \varepsilon) - K_n\right),$$

where

$$K_n = \sup_{|y - x_0| \leqslant \varepsilon} |F_n(y) - F(y)|.$$

With $\nu := F(x_0 + \varepsilon) - F(x_0)$, we obtain

$$\mathbb{P}\left(\widehat{f}_n^{-1}(a) > x_0 + \varepsilon\right) \leqslant \mathbb{P}\left(\widehat{U}_n(a) \geqslant F(x_0) + \nu - K_n\right),$$

and $\nu$ is strictly positive since $F$ is strictly increasing in the neighborhood of $x_0$. Hence, it follows from (12) that for sufficiently small $\varepsilon > 0$ one has

$$\mathbb{P}\left(\widehat{f}_n^{-1}(a) > x_0 + \varepsilon\right) \leqslant \mathbb{P}\left(\widehat{U}_n(a) \geqslant F(x_0) + \nu/2\right) + o(1),$$

so it follows from (32) that the probability that $\widehat{f}_n^{-1}(a) > x_0 + \varepsilon$ tends to zero as $n \to \infty$. Similarly, the probability that $\widehat{f}_n^{-1}(a) < x_0 - \varepsilon$ tends to zero as $n \to \infty$, so we conclude that the probability that $|\widehat{f}_n^{-1}(a) - x_0| > \varepsilon$ tends to zero as $n \to \infty$. This completes the proof of (10).

To prove (9), fix $\varepsilon > 0$ sufficiently small so that $F$ and $f_0$ are continuous and strictly monotone in the neighborhood of $x' := f_0^{-1}(f_0(x_0) + \varepsilon)$. Equation (10) shows that

$$\widehat{f}_n^{-1}(f_0(x_0) + \varepsilon) = f_0^{-1}(f_0(x_0) + \varepsilon) + o_P(1), \qquad (33)$$

as $n \to \infty$. Now, it follows from the switch relation (28) that

$$\mathbb{P}\left(\widehat{f}_n(x_0) > f_0(x_0) + \varepsilon\right) \leqslant \mathbb{P}\left(\widehat{f}_n^{-1}(f_0(x_0) + \varepsilon) \geqslant x_0\right) \leqslant \mathbb{P}\left(\widehat{f}_n^{-1}(f_0(x_0) + \varepsilon) \geqslant f_0^{-1}(f_0(x_0) + \varepsilon) + \nu\right), \qquad (34)$$

where $\nu := x_0 - f_0^{-1}(f_0(x_0) + \varepsilon) > 0$. It follows from (33) that the probability on the right-hand side tends to zero as $n \to \infty$. Hence, the probability on the left-hand side tends to zero as well as $n \to \infty$.

Similarly, the probability that $\widehat{f}_n(x_0) < f_0(x_0) - \varepsilon$ tends to zero as $n \to \infty$, so we conclude that the probability that $|\widehat{f}_n(x_0) - f_0(x_0)| > \varepsilon$ tends to zero as $n \to \infty$. This completes the proof of Theorem 3.1.

6.2 Proof of Theorem 4.1

The proof of Theorem 4.1 uses similar ideas to the ones used in the proof of Theorem 3.1, but under stronger assumptions (and therefore using stronger lemmas).

The first intermediate result is the following stronger version of Lemma 6.4.

Lemma 6.6.

Assume that (A2), (A3), (A4), (B1), (B2) and (B3) hold. Then, there exist $K > 0$ and $\gamma_0 > 0$ that do not depend on $n$, and $N_{\gamma_0} \in \mathbb{N}$, such that for all $\gamma \in [0, \gamma_0]$ and $n \geqslant N_{\gamma_0}$ one has

$$\mathbb{E}_\lambda\left(\sup_{|y-x_0| \leqslant \gamma}\left|\sum_{t=0}^{n} W_t\left(\mathbb{I}_C\{X_t \leqslant y\} - \mathbb{I}_C\{X_t \leqslant x_0\}\right)\right|^2\right) \leqslant K u(n)\gamma, \qquad (35)$$

$$\mathbb{E}_\lambda\left(\sup_{|y-x_0| \leqslant \gamma}\left|\sum_{t=0}^{n} W_t\, \mathbb{I}_C\{X_t = y\}\right|^2\right) \leqslant K u(n)\gamma. \qquad (36)$$

Next, we need to quantify how well we can approximate $T_n(C)$ by $u(n)$.

Lemma 6.7.

Assume that (B1) and (A3) hold. Then we have

  • a)

    As $n \to \infty$ we have

    $$\frac{u(n)}{T_n(C)} = O_P(1).$$
  • b)

    Let $\alpha$ and $\eta$ be positive constants; then there exist positive constants $N_\eta$, $\underline{c}_\eta$ and $\overline{c}_\eta$ such that

    $$\mathbb{P}\left(\left(\frac{T_n(C)}{u(n)}\right)^{\alpha} \in \left[\underline{c}_\eta,\ \overline{c}_\eta\right]\right) \geqslant 1 - \eta, \quad \forall n \geqslant N_\eta.$$

With the above lemmas (including Lemma 5.3 and the ones used in Section 6.1), we can obtain the rate of convergence of $\widehat{U}_n$ given by (31).

Lemma 6.8.

Assume that (A2), (A3), (A4), (B1), (B2), (B3) and (B4) hold. Then, as $n \to \infty$, one has

$$\widehat{U}_n(f_0(x_0)) = F(x_0) + O_P\left(u(n)^{-1/3}\right), \qquad (37)$$

and

$$\widehat{f}_n^{-1}(f_0(x_0)) = x_0 + O_P\left(u(n)^{-1/3}\right). \qquad (38)$$

Inspecting the proof of Lemma 6.8, one can see that the convergences in (37) and (38) hold in a uniform sense in the neighborhood of $x_0$. More precisely, there exists $\gamma > 0$, independent of $n$, such that for all $\eta > 0$ we can find $K_1 > 0$ such that

$$\sup_{|a - f_0(x_0)| \leqslant \gamma} \mathbb{P}\left(\left|\widehat{U}_n(a) - F \circ f_0^{-1}(a)\right| > K_1 u(n)^{-1/3}\right) \leqslant \eta$$

and

$$\sup_{|a - f_0(x_0)| \leqslant \gamma} \mathbb{P}\left(\left|\widehat{f}_n^{-1}(a) - f_0^{-1}(a)\right| > K_1 u(n)^{-1/3}\right) \leqslant \eta.$$

Let $\varepsilon = K_1 u(n)^{-1/3}$, where $K_1 > 0$ does not depend on $n$, and recall (34), where $\nu = x_0 - f_0^{-1}(f_0(x_0) + \varepsilon) > 0$. It follows from assumption (B2) that $f_0^{-1}$ has a derivative that is bounded in sup-norm away from zero in a neighborhood of $f_0(x_0)$. Hence, it follows from a Taylor expansion that there exists $K_2 > 0$ that depends only on $f_0$ such that $\nu \geqslant K_2 \varepsilon$, provided that $n$ is sufficiently large to ensure that $f_0(x_0) + \varepsilon$ belongs to this neighborhood of $f_0(x_0)$. Hence,

$$\mathbb{P}\left(\widehat{f}_n(x_0) > f_0(x_0) + \varepsilon\right) \leqslant \mathbb{P}\left(\widehat{f}_n^{-1}(f_0(x_0) + \varepsilon) \geqslant f_0^{-1}(f_0(x_0) + \varepsilon) + K_2\varepsilon\right) \leqslant \sup_{|a - f_0(x_0)| \leqslant \gamma} \mathbb{P}\left(\left|\widehat{f}_n^{-1}(a) - f_0^{-1}(a)\right| > K_2 K_1 u(n)^{-1/3}\right),$$

provided that $n$ is sufficiently large to ensure that $f_0(x_0) + \varepsilon$ belongs to the above neighborhood of $f_0(x_0)$, and that $\gamma \geqslant K_1 u(n)^{-1/3}$. For fixed $\eta > 0$ one can choose $K_2 > 0$ such that the probability on the right-hand side of the previous display is smaller than or equal to $\eta$ and therefore,

$$\lim_{n \to \infty} \mathbb{P}\left(\widehat{f}_n(x_0) > f_0(x_0) + K_2 u(n)^{-1/3}\right) \leqslant \eta.$$

Similarly, for all fixed $\eta > 0$, one can find $K_3$ that does not depend on $n$ such that

$$\lim_{n \to \infty} \mathbb{P}\left(\widehat{f}_n(x_0) < f_0(x_0) - K_3 u(n)^{-1/3}\right) \leqslant \eta.$$

Hence, for all fixed $\eta > 0$, there exists $K > 0$ independent of $n$ such that

$$\lim_{n \to \infty} \mathbb{P}\left(|\widehat{f}_n(x_0) - f_0(x_0)| > K u(n)^{-1/3}\right) \leqslant \eta.$$

This completes the proof of Theorem 4.1.

7 Technical proofs for Section 6.1

Proof of Lemma 6.1. Combining (21) and (1) yields

$$\Lambda_n(F_n(Y_k)) = \frac{1}{T_n(C)}\sum_{t=0}^{n} f_0(X_t)\, \mathbb{I}_C\{X_t \leqslant Y_k\} + \frac{1}{T_n(C)}\sum_{t=0}^{n} W_t\, \mathbb{I}_C\{X_t \leqslant Y_k\}.$$

The first term on the right-hand side of the previous display can be rewritten as follows:

$$\frac{1}{T_n(C)}\sum_{t=0}^{n} f_0(X_t)\, \mathbb{I}_C\{X_t \leqslant Y_k\} = \frac{1}{T_n(C)}\sum_{j=1}^{m} f_0(Y_j)(l_j - l_{j-1})\, \mathbb{I}\{Y_j \leqslant Y_k\} = \sum_{j=1}^{k}\int_{l_{j-1}/T_n(C)}^{l_j/T_n(C)} f_0 \circ F_n^{-1}(u)\, du,$$

using that $F_n^{-1}(u) = Y_j$ for all $u \in (l_{j-1}/T_n(C), l_j/T_n(C)]$. Hence, for all $k$ in $\{0, \dots, m\}$,

$$\Lambda_n(F_n(Y_k)) = \int_0^{l_k/T_n(C)} f_0 \circ F_n^{-1}(u)\, du + \frac{1}{T_n(C)}\sum_{t=0}^{n} W_t\, \mathbb{I}_C\{X_t \leqslant Y_k\}. \qquad (39)$$

Combining (39) with the piecewise linearity of $\Lambda_n$ yields

$$\Lambda_n(F_n(Y_k)) = L_n(F_n(Y_k)) + M_n(F_n(Y_k)),$$

where $L_n$ and $M_n$ are piecewise-linear processes with knots at $F_n(Y_k)$ for $k$ in $\{0, \dots, m\}$ and such that

$$L_n(F_n(Y_k)) = \int_0^{l_k/T_n(C)} f_0 \circ F_n^{-1}(u)\, du$$

and

$$M_n(F_n(Y_k)) = \frac{1}{T_n(C)}\sum_{t=0}^{n} W_t\, \mathbb{I}_C\{X_t \leqslant Y_k\}.$$

In order to ease the notation, we will write $F_n^i = F_n(Y_i)$, $L_n^i = L_n(F_n(Y_i))$ and $M_n^i = M_n(F_n(Y_i))$. Let $y \in (F_n(Y_0), F_n(Y_m)]$ and take $j$ such that $Y_{j+1} = F_n^{-1}(y)$; then $F_n(Y_j) < y \leqslant F_n(Y_{j+1})$. With this notation,

$$L_n(y) = \frac{L_n^{j+1} - L_n^j}{F_n^{j+1} - F_n^j}\left(y - F_n^j\right) + L_n^j, \qquad M_n(y) = \frac{M_n^{j+1} - M_n^j}{F_n^{j+1} - F_n^j}\left(y - F_n^j\right) + M_n^j.$$

Notice that

$$L_n^{j+1} - L_n^j = \int_{l_j/T_n(C)}^{l_{j+1}/T_n(C)} f_0 \circ F_n^{-1}(u)\, du = \frac{l_{j+1} - l_j}{T_n(C)}\, f_0(Y_{j+1}), \qquad F_n^{j+1} - F_n^j = \frac{l_{j+1} - l_j}{T_n(C)},$$

therefore,

$$L_n(y) = f_0(Y_{j+1})\left(y - F_n^j\right) + L_n^j = \int_{l_j/T_n(C)}^{y} f_0 \circ F_n^{-1}(u)\, du + L_n^j = \int_0^y f_0 \circ F_n^{-1}(u)\, du,$$

which proves (22).

For $M_n$ we have,

$$M_n^{j+1} - M_n^j = \frac{1}{T_n(C)}\sum_{t=0}^{n} W_t\, \mathbb{I}_C\{X_t = Y_{j+1}\},$$

then,

$$M_n(y) = \frac{\sum_{t=0}^{n} W_t\, \mathbb{I}_C\{X_t = Y_{j+1}\}}{l_{j+1} - l_j}\left(y - F_n^j\right) + M_n^j = R_n^j(y) + M_n^j,$$

and this completes the proof.

Proof of Lemma 6.2. By definition, with $l_0 = 0$ and $l_k = \sum_{t=0}^{n} \mathbb{I}_C\{X_t \leqslant Y_k\}$ for all $k \in \{1, \dots, m\}$, we have $F_n(Y_k) = a l_k$ for all $k \in \{0, \dots, m\}$, where $a = 1/T_n(C)$ does not depend on $k$. Moreover,

$$\Lambda_n(F_n(Y_k)) = a\sum_{t=0}^{n} Z_t\, \mathbb{I}_C\{X_t \leqslant Y_k\}.$$

Since $\widehat{f}_n(Y_k)$ is the left-hand slope at $l_k$ of the least concave majorant of the set of points in (3), the equality in (26) follows from Lemma 2.1 in [13].

Proof of Lemma 6.3. The first assertion follows from Assumption (A4) and the second immediately follows from the first one by (12) combined with the strict monotonicity of $F$ on $C$.

Proof of Lemma 6.4. Let $\mathcal{F}_n = \sigma(\{X_0, \ldots, X_n\})$ be the sigma-algebra generated by the chain $\{X_t\}$ up to time $n$. Denote by $\mathbb{P}_{\mathcal{F}_n}$ the probability conditioned on $\mathcal{F}_n$. Take $\varepsilon > 0$.

By Chebyshev’s inequality, we have

$$\mathbb{P}_{\mathcal{F}_n}\left(\left|\frac{\sum_{t=0}^{n} W_t\, \mathbb{I}_C\{X_t = A_n\}}{T_n(C)}\right| > \varepsilon\right) \leqslant \frac{\sigma^2 \sum_{t=0}^{n} \mathbb{I}_C\{X_t = A_n\}}{\varepsilon^2 T_n^2(C)} \leqslant \frac{\sigma^2}{\varepsilon^2 T_n(C)},$$

which implies the first part of the lemma because $T_n(C) \to \infty$ with probability 1.

For the second part, let $\gamma_n(u)$ be the number of times the chain visits $(A_n, u] \cap C$ up to time $n$ and let $A_n(u) = \{t \leqslant n : X_t \in (A_n, u] \cap C\} = \{a_1, \ldots, a_{\gamma_n(u)}\}$ be the times of those visits. Using that $\gamma_n = \sup_{u > A_n} \gamma_n(u) \leqslant T_n(C)$ and Kolmogorov's inequality (Th. 3.1.6, p. 122 in [19]), we obtain

$$\mathbb{P}_{\mathcal{F}_n}\left(\sup_{u > A_n}\left|\frac{\sum_{t=0}^{n} W_t\, \mathbb{I}_C\{X_t \in (A_n, u]\}}{T_n(C)}\right| > \varepsilon\right) = \mathbb{P}_{\mathcal{F}_n}\left(\sup_{u > A_n}\left|\sum_{i=1}^{\gamma_n(u)} \frac{W_{a_i}}{T_n(C)}\right| > \varepsilon\right) \leqslant \mathbb{P}_{\mathcal{F}_n}\left(\sup_{1 \leqslant k \leqslant \gamma_n}\left|\sum_{i=1}^{k} \frac{W_{a_i}}{T_n(C)}\right| > \varepsilon\right) \leqslant \frac{\sigma^2}{\varepsilon^2 T_n(C)},$$

which, by the same argument as before, implies the second part of the Lemma.
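As a sanity check of the step just performed, note that, conditionally on the chain, the summands at the visit times form an i.i.d. centered sequence, so Kolmogorov's maximal inequality applies directly. The following small Monte Carlo sketch (purely illustrative; σ, ε, the Gaussian noise, and the sample size N, which plays the role of Tn(C), are arbitrary choices of ours) compares the empirical frequency of the maximal-deviation event with the bound σ²/(ε²N):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, eps, N, reps = 1.0, 0.25, 400, 5000
hits = 0
for _ in range(reps):
    w = rng.normal(0.0, sigma, N)            # i.i.d. centered noise W_t
    s = np.cumsum(w)                         # partial sums S_1, ..., S_N
    hits += np.max(np.abs(s)) > eps * N      # maximal deviation exceeds eps*N
bound = sigma**2 / (eps**2 * N)              # Kolmogorov bound
print(f"empirical: {hits / reps:.4f} <= bound: {bound:.4f}")
```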

Proof of Lemma 6.5. In the sequel, we set a=f0(x0)a=f_{0}(x_{0}). We begin with the proof of (32).

Fix ε>0\varepsilon>0 arbitrarily, and let ν>0\nu>0 and γ>0\gamma>0 be such that |F1(u)x0|>ν|F^{-1}(u)-x_{0}|>\nu for all uu such that |uF(x0)|ε/2|u-F(x_{0})|\geqslant\varepsilon/2, and |f0(x0)f0(y)|>γ|f_{0}(x_{0})-f_{0}(y)|>\gamma for all yy such that |yx0|ν/2|y-x_{0}|\geqslant\nu/2. Note that the existence of ν\nu and γ\gamma is ensured by assumptions (A5) and (A6).

By Lemma 6.3, we can assume without loss of generality that F(x0)F(x_{0}) belongs to the domain [Fn(Y1),Fn(Ym)][F_{n}(Y_{1}),F_{n}(Y_{m})] of Λn\Lambda_{n}, since this occurs with probability tending to one. Therefore, we can find j(x0)j(x_{0}) such that Yj(x0)=Fn1(F(x0)){Y_{j(x_{0})}}={F_{n}}^{-1}\left({F\left({{x_{0}}}\right)}\right). It follows from the characterization in (31) that the event En1:={U^n(a)>F(x0)+ε}E_{n}^{1}:=\{\widehat{U}_{n}(a)>F(x_{0})+\varepsilon\} is contained in the event that there exists p𝒦p\in\cal K such that p>F(x0)+εp>F(x_{0})+\varepsilon and

Λn(p)apΛn(F(x0))aF(x0),{\Lambda_{n}}\left(p\right)-ap\geqslant{\Lambda_{n}}\left({F\left({{x_{0}}}\right)}\right)-aF\left({{x_{0}}}\right),

where we recall that a=f0(x0)a=f_{0}\left(x_{0}\right).

By Lemma 6.1, En1E_{n}^{1} is contained in the event that there exists p𝒦p\in\cal K such that p>F(x0)+εp>F(x_{0})+\varepsilon and

Ln(p)+Mn(p)apLn(F(x0))+Mn(F(x0))aF(x0)L_{n}(p)+M_{n}(p)-ap\geqslant L_{n}(F(x_{0}))+M_{n}(F(x_{0}))-aF(x_{0}) (40)

Using (22) in (40) we obtain that En1E_{n}^{1} is contained in the event that there exists p𝒦p\in\cal K such that p>F(x0)+εp>F(x_{0})+\varepsilon and

t0/Tn(C)pf0Fn1(u)𝑑u+Snapt0/Tn(C)F(x0)f0Fn1(u)𝑑uaF(x0),\int_{t_{0}/T_{n}(C)}^{p}f_{0}\circ F_{n}^{-1}(u)du+S_{n}-ap\geqslant\int_{t_{0}/T_{n}(C)}^{F(x_{0})}f_{0}\circ F_{n}^{-1}(u)du-aF(x_{0}),

where

Sn=supp>F(x0)+ε,p𝒦{Mn(p)Mn(F(x0))}.\displaystyle{S_{n}}=\mathop{\sup}\limits_{p>F(x_{0})+\varepsilon,\;p\in\mathcal{K}}\left\{{{M_{n}}(p)-{M_{n}}\left({F\left({{x_{0}}}\right)}\right)}\right\}.

Let jj and kk be such that Yj+1=Fn1(F(x0))Y_{j+1}=F_{n}^{-1}\left(F\left(x_{0}\right)\right) and p=Fn(Yk)p=F_{n}(Y_{k}). By equation (23) we have Mn(p)Mn(F(x0))=MnkMnjRnj(F(x0)){M_{n}}\left(p\right)-{M_{n}}\left({F\left({{x_{0}}}\right)}\right)=M_{n}^{k}-M_{n}^{j}-R_{n}^{j}(F(x_{0})), therefore,

Sn\displaystyle{S_{n}} =supp>F(x0)+εp𝒦{MnkMnj}Rnj(F(x0))\displaystyle=\mathop{\sup}_{\begin{subarray}{l}p>F\left({{x_{0}}}\right)+\varepsilon\\ p\in\mathcal{K}\end{subarray}}{\left\{{M_{n}^{k}-M_{n}^{j}}\right\}-R_{n}^{j}\left({F\left({{x_{0}}}\right)}\right)}
supp>F(x0)+εp𝒦|1Tn(C)t=0nWt(𝕀C{XtFn1(p)}𝕀C{XtFn1(F(x0))})|+|Rnj(Fn(Yj+1))|\displaystyle\begin{split}&\leqslant\mathop{\sup}_{\begin{subarray}{l}p>F\left({{x_{0}}}\right)+\varepsilon\\ p\in\mathcal{K}\end{subarray}}{\left|\frac{1}{T_{n}(C)}\sum_{t=0}^{n}W_{t}\left(\mathbb{I}_{C}{\{X_{t}\leqslant F_{n}^{-1}(p)\}}-\mathbb{I}_{C}{\{X_{t}\leqslant F_{n}^{-1}(F(x_{0}))\}}\right)\right|}\\ &\qquad+\left|{R_{n}^{j}\left({F_{n}\left(Y_{j+1}\right)}\right)}\right|\\ \end{split}
supp>F(x0)+εp𝒦|1Tn(C)t=0nWt𝕀C{Xt(Fn1(F(x0));Fn1(p)]}|+|t=0nWt𝕀C{Xt=Fn1(F(x0))}|Tn(C).\displaystyle\begin{split}&\leqslant\mathop{\sup}_{\begin{subarray}{l}p>F\left({{x_{0}}}\right)+\varepsilon\\ p\in\mathcal{K}\end{subarray}}{\left|\frac{1}{T_{n}(C)}\sum_{t=0}^{n}W_{t}\mathbb{I}_{C}{\left\{X_{t}\in\left({F_{n}^{-1}\left({F\left({{x_{0}}}\right)}\right);F_{n}^{-1}(p)}\right]\right\}}\right|}\\ &\qquad+\frac{\left|\sum\limits_{t=0}^{n}{{W_{t}}\mathbb{I}_{C}{\left\{X_{t}=F_{n}^{-1}\left({F\left({{x_{0}}}\right)}\right)\right\}}}\right|}{T_{n}(C)}.\end{split}

Hence,

Tn(C)Snsupp>F(x0)+εp𝒦|t=0nWt𝕀C{Xt(Fn1(F(x0));Fn1(p)]}|+|t=0nWt𝕀C{Xt=Fn1(F(x0))}|.\begin{split}T_{n}(C)S_{n}&\leqslant\mathop{\sup}_{\begin{subarray}{l}p>F\left({{x_{0}}}\right)+\varepsilon\\ p\in\mathcal{K}\end{subarray}}{\left|\sum_{t=0}^{n}W_{t}\mathbb{I}_{C}{\left\{X_{t}\in\left({F_{n}^{-1}\left({F\left({{x_{0}}}\right)}\right);F_{n}^{-1}(p)}\right]\right\}}\right|}\\ &\qquad+\left|\sum\limits_{t=0}^{n}{{W_{t}}\mathbb{I}_{C}{\left\{X_{t}=F_{n}^{-1}\left({F\left({{x_{0}}}\right)}\right)\right\}}}\right|.\end{split}

Therefore, the event En1E_{n}^{1} is contained in the event that there exists p>F(x0)+εp>F(x_{0})+\varepsilon such that

F(x0)pf0Fn1(u)𝑑u+Sna(pF(x0)).\int_{F(x_{0})}^{p}f_{0}\circ F_{n}^{-1}(u)du+S_{n}\geqslant a(p-F(x_{0})).

Now, let En2E_{n}^{2} be the event that

sup|uF(x0)|ε|Fn1(u)F1(u)|η\sup_{|u-F(x_{0})|\leqslant\varepsilon}|F_{n}^{-1}(u)-F^{-1}(u)|\leqslant\eta

where η(0,ν/4)\eta\in(0,\nu/4) is such that |f0(y)f0(x0)|γ/2|f_{0}(y)-f_{0}(x_{0})|\leqslant\gamma/2 for all yy such that |x0y|η|x_{0}-y|\leqslant\eta. Note that the existence of η\eta is ensured by assumption (A7). Then, it follows from the monotonicity of f0f_{0} and FnF_{n} that on En2E_{n}^{2},

F(x0)pf0Fn1(u)𝑑u\displaystyle\int_{F(x_{0})}^{p}f_{0}\circ F_{n}^{-1}(u)du \displaystyle\leqslant F(x0)F(x0)+ε/2f0(F1(u)η)𝑑u+F(x0)+ε/2pf0(Fn1(F(x0)+ε/2))𝑑u.\displaystyle\int_{F(x_{0})}^{F(x_{0})+\varepsilon/2}f_{0}(F^{-1}(u)-\eta)du+\int_{F(x_{0})+\varepsilon/2}^{p}f_{0}(F_{n}^{-1}(F(x_{0})+\varepsilon/2))du.

Hence, it follows from the definitions of η\eta, ν\nu and γ\gamma that on En2E_{n}^{2},

F(x0)pf0Fn1(u)𝑑u\displaystyle\int_{F(x_{0})}^{p}f_{0}\circ F_{n}^{-1}(u)du \displaystyle\leqslant ε2f0(x0)+γε4+(pF(x0)ε/2)f0(F1(F(x0)+ε/2)η)\displaystyle\frac{\varepsilon}{2}f_{0}(x_{0})+\frac{\gamma\varepsilon}{4}+(p-F(x_{0})-\varepsilon/2)f_{0}(F^{-1}(F(x_{0})+\varepsilon/2)-\eta)
\displaystyle\leqslant ε2f0(x0)+γε4+(pF(x0)ε/2)f0(x0+ν/2)\displaystyle\frac{\varepsilon}{2}f_{0}(x_{0})+\frac{\gamma\varepsilon}{4}+(p-F(x_{0})-\varepsilon/2)f_{0}(x_{0}+\nu/2)
\displaystyle\leqslant ε2f0(x0)+γε4+(pF(x0)ε/2)(f0(x0)γ).\displaystyle\frac{\varepsilon}{2}f_{0}(x_{0})+\frac{\gamma\varepsilon}{4}+(p-F(x_{0})-\varepsilon/2)(f_{0}(x_{0})-\gamma).

This implies that on En2E_{n}^{2},

F(x0)pf0Fn1(u)𝑑u\displaystyle\int_{F(x_{0})}^{p}f_{0}\circ F_{n}^{-1}(u)du \displaystyle\leqslant a(pF(x0))(pF(x0)3ε/4)γ\displaystyle a(p-F(x_{0}))-(p-F(x_{0})-3\varepsilon/4)\gamma
\displaystyle\leqslant a(pF(x0))εγ/4\displaystyle a(p-F(x_{0}))-\varepsilon\gamma/4

for all p>F(x0)+εp>F(x_{0})+\varepsilon. Hence, the event En1En2E_{n}^{1}\cap E_{n}^{2} is contained in the event {Snεγ/4}\{S_{n}\geqslant\varepsilon\gamma/4\}. Now, on En2E_{n}^{2}, for all p>F(x0)+εp>F(x_{0})+\varepsilon we have

Fn1(p)\displaystyle F_{n}^{-1}(p) \displaystyle\geqslant Fn1(F(x0)+ε)\displaystyle F_{n}^{-1}(F(x_{0})+\varepsilon)
\displaystyle\geqslant F1(F(x0)+ε)η\displaystyle F^{-1}(F(x_{0})+\varepsilon)-\eta
\displaystyle\geqslant x0+νη\displaystyle x_{0}+\nu-\eta
\displaystyle\geqslant Fn1(F(x0))+ν2η\displaystyle F_{n}^{-1}(F(x_{0}))+\nu-2\eta
\displaystyle\geqslant Fn1(F(x0))+ν/2,\displaystyle F_{n}^{-1}(F(x_{0}))+\nu/2,

since ν>4η\nu>4\eta. Therefore,

Tn(C)Sn\displaystyle T_{n}(C)S_{n}\leqslant supu>Fn1(F(x0))+ν/2|t=0nWt𝕀C{Xt(Fn1(F(x0)),u]}|+\displaystyle\sup_{u>F_{n}^{-1}(F(x_{0}))+\nu/2}\left|\sum_{t=0}^{n}W_{t}\mathbb{I}_{C}{\{X_{t}\in(F_{n}^{-1}(F(x_{0})),u]\}}\right|+
+|t=0nWt𝕀C{Xt=Fn1(F(x0))}|.\displaystyle+\left|\sum\limits_{t=0}^{n}{{W_{t}}\mathbb{I}_{C}{\{X_{t}=F_{n}^{-1}\left({F\left({{x_{0}}}\right)}\right)\}}}\right|.

Hence, it follows from Lemma 6.4 that SnS_{n} converges in probability to zero as nn\to\infty, so that the probability of the event {Snεγ/4}\{S_{n}\geqslant\varepsilon\gamma/4\} tends to zero as nn\to\infty. It follows from Lemma 5.1 that for ε\varepsilon sufficiently small, the probability of the event En2E_{n}^{2} tends to one as nn\to\infty, so we conclude that the probability of En1E_{n}^{1} tends to zero as nn\to\infty. Similarly, the probability of the event {U^n(a)<F(x0)ε}\{\widehat{U}_{n}(a)<F(x_{0})-\varepsilon\} tends to zero as nn\to\infty, so that

limn(|U^n(a)F(x0)|>ε)=0\lim_{n\to\infty}\mathbb{P}(|\widehat{U}_{n}(a)-F(x_{0})|>\varepsilon)=0

for all ε>0\varepsilon>0. This completes the proof of (32).

8 Technical proofs for Section 6.2

Proof of Lemma 6.6. Let n=σ({X0,,Xn})\mathcal{F}_{n}=\sigma\left(\left\{X_{0},\ldots,X_{n}\right\}\right) be the sigma-algebra generated by the chain X up to time nn. Denote by 𝔼n\mathbb{E}_{\mathcal{F}_{n}} the expectation conditional on n\mathcal{F}_{n}. Take 0<γδ0<\gamma\leqslant\delta and define I0=[x0γ,x0]{I_{0}}=\left[{{x_{0}}-\gamma,{x_{0}}}\right], I1=[x0,x0+γ]{I_{1}}=\left[{{x_{0}},{x_{0}}+\gamma}\right] and

S0(γ)=supyI0|t=0nWt(𝕀{Xty}𝕀{Xtx0})|2S1(γ)=supyI1|t=0nWt(𝕀{Xty}𝕀{Xtx0})|2\begin{gathered}{S_{0}\left(\gamma\right)}=\mathop{\sup}\limits_{y\in{I_{0}}}\left|{\sum\limits_{t=0}^{n}{{W_{t}}\left({{\mathbb{I}}\left\{{{X_{t}}\leqslant y}\right\}-{\mathbb{I}}\left\{{{X_{t}}\leqslant{x_{0}}}\right\}}\right)}}\right|^{2}\hfill\\ {S_{1}\left(\gamma\right)}=\mathop{\sup}\limits_{y\in{I_{1}}}\left|{\sum\limits_{t=0}^{n}{{W_{t}}\left({{\mathbb{I}}\left\{{{X_{t}}\leqslant y}\right\}-{\mathbb{I}}\left\{{{X_{t}}\leqslant{x_{0}}}\right\}}\right)}}\right|^{2}\hfill\\ \end{gathered}

then,

S(γ)\displaystyle S\left(\gamma\right) =sup|yx0|γ|t=0nWt(𝕀{Xty}𝕀{Xtx0})|2=max(S0(γ),S1(γ)),\displaystyle=\mathop{\sup}\limits_{\left|{y-{x_{0}}}\right|\leqslant\gamma}\left|{\sum\limits_{t=0}^{n}{{W_{t}}\left({{\mathbb{I}}\left\{{{X_{t}}\leqslant y}\right\}-{\mathbb{I}}\left\{{{X_{t}}\leqslant{x_{0}}}\right\}}\right)}}\right|^{2}=\max{\Bigl{(}S_{0}\left(\gamma\right),S_{1}\left(\gamma\right)\Bigr{)}},
S0(γ)+S1(γ)\displaystyle\leqslant S_{0}\left(\gamma\right)+S_{1}\left(\gamma\right) (41)

Following the notation of section 2, let

αn(0)(γ)=supyI0Tn([y,x0]),αn(1)(γ)=supyI1Tn([x0,y]),\alpha_{n}^{(0)}\left(\gamma\right)=\mathop{\sup}\limits_{y\in{I_{0}}}{T_{n}}\left({\left[{y,{x_{0}}}\right]}\right)\quad,\quad\alpha_{n}^{(1)}\left(\gamma\right)=\mathop{\sup}\limits_{y\in{I_{1}}}{T_{n}}\left({\left[{{x_{0}},y}\right]}\right),

with this notation, S0=supyI0|i=1Tn([y,x0])Wσ[y,x0)(i)|2{S_{0}}=\mathop{\sup}\limits_{y\in{I_{0}}}{\left|{\sum\limits_{i=1}^{{T_{n}}\left({\left[{y,{x_{0}}}\right]}\right)}{{W_{{\sigma_{[y,x_{0})}(i)}}}}}\right|^{2}} and S1=supyI1|i=1Tn([x0,y])Wσ[x0,y](i)|2S_{1}=\mathop{\sup}\limits_{y\in{I_{1}}}\left|{\sum\limits_{{i=1}}^{{T_{n}}\left({\left[{{x_{0}},y}\right]}\right)}{{W_{{{\sigma_{[x_{0},y]}(i)}}}}}}\right|^{2}.

By Doob’s maximal inequality (Th 10.9.4 in [19]), we have, for j=0,1j=0,1,

𝔼nSj(γ)\displaystyle{\mathbb{E}_{{\mathcal{F}_{n}}}}S_{j}\left(\gamma\right) 4𝔼n(i=1αn(j)Wti)2=4σ2αn(j)(γ)\displaystyle\leqslant 4{\mathbb{E}_{{\mathcal{F}_{n}}}}{\left({\sum\limits_{i=1}^{{\alpha^{(j)}_{n}}}{{W_{{t_{i}}}}}}\right)^{2}}=4\sigma^{2}\alpha^{(j)}_{n}\left(\gamma\right)
4σ2Tn([x0γ,x0+γ])\displaystyle\leqslant 4{\sigma^{2}}{T_{n}}\left({\left[{{x_{0}}-\gamma,{x_{0}}+\gamma}\right]}\right)
4σ2t=0n(𝕀{Xtx0+γ}𝕀{Xt<x0γ}).\displaystyle\leqslant 4{\sigma^{2}}\sum\limits_{t=0}^{n}{\left({{\mathbb{I}}\left\{{{X_{t}}\leqslant{x_{0}}+\gamma}\right\}-{\mathbb{I}}\left\{{{X_{t}}<{x_{0}}-\gamma}\right\}}\right)}. (42)

Therefore, by (41) and (42)

𝔼nS(γ)8σ2t=0n(𝕀{Xtx0+γ}𝕀{Xt<x0γ}).{\mathbb{E}_{{\mathcal{F}_{n}}}}S\left(\gamma\right)\leqslant 8{\sigma^{2}}\sum\limits_{t=0}^{n}{\Bigl{(}{{\mathbb{I}}\left\{{{X_{t}}\leqslant{x_{0}}+\gamma}\right\}-{\mathbb{I}}\left\{{{X_{t}}<{x_{0}}-\gamma}\right\}}\Bigr{)}}. (43)

Define,

  • h(y,γ)=𝕀{y[x0γ,x0+γ]}h\left({y,\gamma}\right)=\mathbb{I}{\left\{{y\in\left[{{x_{0}}-\gamma,{x_{0}}+\gamma}\right]}\right\}},

  • h(j,γ)={t=0ταh(Xt,γ),j=0t=τ𝜶(j)+1τ𝜶(j+1)h(Xt,γ),j1h\left({{\mathcal{B}_{j}},\gamma}\right)=\begin{cases}\sum\limits_{t=0}^{{\tau_{\alpha}}}{h\left({{X_{t}},\gamma}\right)}&,\quad j=0\hfill\\ \sum\limits_{t={\tau_{\boldsymbol{\alpha}}}(j)+1}^{{\tau_{\boldsymbol{\alpha}}}\left({j+1}\right)}{h\left({{X_{t}},\gamma}\right)}&,\quad j\geqslant 1\hfill\\ \end{cases}

  • Zn(γ)=t=0nh(Xt,γ){Z_{n}}\left(\gamma\right)=\sum\limits_{t=0}^{n}{h\left({{X_{t}},\gamma}\right)}

  • (j)={τ𝜶,j=0τ𝜶(j+1)τ𝜶(j),j1\ell\left({{\mathcal{B}_{j}}}\right)=\begin{cases}{\tau_{\boldsymbol{\alpha}}}&,\quad j=0\hfill\\ {\tau_{\boldsymbol{\alpha}}}(j+1)-{\tau_{\boldsymbol{\alpha}}}(j)&,\quad j\geqslant 1\hfill\\ \end{cases}

  • T~(n)=min{k:i=0k(i)n}\widetilde{T}\left(n\right)=\min\left\{{k:\sum\limits_{i=0}^{k}{\ell\left({{\mathcal{B}_{i}}}\right)}\geqslant n}\right\}.

  • 𝒢k=σ({(h(j,γ),(j))}j=0k){\mathcal{G}_{k}}=\sigma\left({\left\{{\left({h\left({{\mathcal{B}_{j}},\gamma}\right),\ell\left({{\mathcal{B}_{j}}}\right)}\right)}\right\}_{j=0}^{k}}\right) for k0k\geqslant 0.

By the Strong Markov property, {(h(j,γ),(j))}j=1+\left\{{\left({h\left({{\mathcal{B}_{j}},\gamma}\right),\ell\left({{\mathcal{B}_{j}}}\right)}\right)}\right\}_{j=1}^{+\infty} is an i.i.d. sequence which is independent of (h(0,γ),(0)){\left({h\left({{\mathcal{B}_{0}},\gamma}\right),\ell\left({{\mathcal{B}_{0}}}\right)}\right)} (and, therefore, of the initial measure λ\lambda). For nn fixed, the random variable T~(n)\widetilde{T}\left(n\right) is a stopping time for the sequence {(h(j,γ),(j))}j=0+\left\{{\left({h\left({{\mathcal{B}_{j}},\gamma}\right),\ell\left({{\mathcal{B}_{j}}}\right)}\right)}\right\}_{j=0}^{+\infty}; indeed,

{T~(n)=0}\displaystyle\left\{{\widetilde{T}\left(n\right)=0}\right\} ={(0)n}𝒢0,\displaystyle=\left\{{\ell\left({{\mathcal{B}_{0}}}\right)\geqslant n}\right\}\in{\mathcal{G}_{0}},
{T~(n)=k}\displaystyle\left\{{\widetilde{T}\left(n\right)=k}\right\} =j=0k1{i=0j(i)<n}{i=0k(i)n}𝒢kk1.\displaystyle=\bigcap\limits_{j=0}^{k-1}{\left\{{\sum\limits_{i=0}^{j}{\ell\left({{\mathcal{B}_{i}}}\right)}<n}\right\}}\bigcap\left\{{\sum\limits_{i=0}^{k}{\ell\left({{\mathcal{B}_{i}}}\right)}\geqslant n}\right\}\in{\mathcal{G}_{k}}\quad\forall k\geqslant 1.

For each nn and γ\gamma we have that

Zn(γ)\displaystyle{Z_{n}}\left(\gamma\right) =t=0ταh(Xt,γ)+j=1T(n)h(j,γ)+t=τα(T(n))+1nh(Xt,γ)\displaystyle=\sum\limits_{t=0}^{{\tau_{\alpha}}}{h\left({{X_{t}},\gamma}\right)}+\sum\limits_{j=1}^{T\left(n\right)}{h\left({{\mathcal{B}_{j}},\gamma}\right)}+\sum\limits_{t={\tau_{\alpha}}\left({T\left(n\right)}\right)+1}^{n}{h\left({{X_{t}},\gamma}\right)}
h(0,γ)+j=1T~(n)h(j,γ).\displaystyle\leqslant{h\left({{\mathcal{B}_{0}},\gamma}\right)}+\sum\limits_{j=1}^{\widetilde{T}\left(n\right)}{h\left({{\mathcal{B}_{j}},\gamma}\right)}. (44)

where the last inequality is justified by the fact that T(n)T~(n)T\left(n\right)\leqslant\widetilde{T}\left(n\right) and h(y,γ)h\left(y,\gamma\right) is a nonnegative function. Because (j)1\ell\left({{\mathcal{B}_{j}}}\right)\geqslant 1 for all jj, we have that,

j=1T~(n)h(j,γ)=j=1nh(j,γ)𝕀{T~(n)j},\sum\limits_{j=1}^{\widetilde{T}\left(n\right)}{h\left({{\mathcal{B}_{j}},\gamma}\right)}=\sum\limits_{j=1}^{n}{h\left({{\mathcal{B}_{j}},\gamma}\right)\mathbb{I}\left\{{\widetilde{T}\left(n\right)\geqslant j}\right\}},

then,

𝔼(j=1T~(n)h(j,γ))=j=1n𝔼(h(j,γ)𝕀{T~(n)j}).\mathbb{E}\left({\sum\limits_{j=1}^{\widetilde{T}\left(n\right)}{h\left({{\mathcal{B}_{j}},\gamma}\right)}}\right)=\sum\limits_{j=1}^{n}{\mathbb{E}\left({h\left({{\mathcal{B}_{j}},\gamma}\right)\mathbb{I}\left\{{\widetilde{T}\left(n\right)\geqslant j}\right\}}\right)}. (45)

For each jj we have,

𝔼λ(h(j,γ)𝕀{T~(n)j})=𝔼λ(𝔼(h(j,γ)𝕀{T~(n)j}|𝒢j1))\mathbb{E}_{\lambda}\left({h\left({{\mathcal{B}_{j}},\gamma}\right)\mathbb{I}\left\{{\widetilde{T}\left(n\right)\geqslant j}\right\}}\right)=\mathbb{E}_{\lambda}\left({\mathbb{E}\left({h\left({{\mathcal{B}_{j}},\gamma}\right)\mathbb{I}\left\{{\widetilde{T}\left(n\right)\geqslant j}\right\}\left|{{\mathcal{G}_{j-1}}}\right.}\right)}\right)

Notice that 𝕀{T~(n)j}=1𝕀{T~(n)j1}𝒢j1\mathbb{I}\left\{{\widetilde{T}\left(n\right)\geqslant j}\right\}=1-\mathbb{I}\left\{{\widetilde{T}\left(n\right)\leqslant j-1}\right\}\in{\mathcal{G}_{j-1}} and h(j,γ){h\left({{\mathcal{B}_{j}},\gamma}\right)} is independent of 𝒢j1{\mathcal{G}_{j-1}}, therefore,

𝔼λ(h(j,γ)𝕀{T~(n)j})=𝔼λ(𝕀{T~(n)j})𝔼(h(j,γ)).\displaystyle\mathbb{E}_{\lambda}\left({h\left({{\mathcal{B}_{j}},\gamma}\right)\mathbb{I}\left\{{\widetilde{T}\left(n\right)\geqslant j}\right\}}\right)=\mathbb{E}_{\lambda}\left({\mathbb{I}\left\{{\widetilde{T}\left(n\right)\geqslant j}\right\}}\right)\mathbb{E}\left({h\left({{\mathcal{B}_{j}},\gamma}\right)}\right).

Plugging this into equation (45) we get,

𝔼λ(j=1T~(n)h(j,γ))=j=1n𝔼(h(j,γ))λ(T~(n)j)𝔼(h(1,γ))𝔼λT~(n).\mathbb{E}_{\lambda}\left({\sum\limits_{j=1}^{\widetilde{T}\left(n\right)}{h\left({{\mathcal{B}_{j}},\gamma}\right)}}\right)=\sum\limits_{j=1}^{n}{\mathbb{E}\left({h\left({{\mathcal{B}_{j}},\gamma}\right)}\right)\mathbb{P}_{\lambda}\left({\widetilde{T}\left(n\right)\geqslant j}\right)}\leqslant\mathbb{E}\left({h\left({{\mathcal{B}_{1}},\gamma}\right)}\right)\mathbb{E}_{\lambda}\widetilde{T}\left(n\right).

Then, by taking expectation in (44) we obtain

𝔼λZn(γ)\displaystyle\mathbb{E}_{\lambda}{Z_{n}}\left(\gamma\right) 𝔼λh(0,γ)+𝔼(h(1,γ))𝔼λT~(n)\displaystyle\leqslant\mathbb{E}_{\lambda}h\left({{\mathcal{B}_{0}},\gamma}\right)+\mathbb{E}\left({h\left({{\mathcal{B}_{1}},\gamma}\right)}\right)\mathbb{E}_{\lambda}\widetilde{T}\left(n\right)
𝔼λh(0,γ)+𝔼(h(1,γ))𝔼λ(T(n)+1).\displaystyle\leqslant\mathbb{E}_{\lambda}h\left({{\mathcal{B}_{0}},\gamma}\right)+\mathbb{E}\left({h\left({{\mathcal{B}_{1}},\gamma}\right)}\right)\mathbb{E}_{\lambda}\left(T\left(n\right)+1\right). (46)

By Theorem 1.1 and the fact that FF is Lipschitz we can find K1K_{1} independent of γ\gamma such that,

𝔼(h(1,γ))\displaystyle\mathbb{E}\left({h\left({{\mathcal{B}_{1}},\gamma}\right)}\right) =h(t,γ)𝑑π(t)=Kππ(C)(F(x0+γ)F(x0γ))\displaystyle=\int{h\left({t,\gamma}\right)d\pi\left(t\right)=K_{\pi}\pi\left(C\right)\left({{F}\left({{x_{0}}+\gamma}\right)-{F}\left({{x_{0}}-\gamma}\right)}\right)}
K1γ.\displaystyle\leqslant K_{1}\gamma. (47)

If X is positive recurrent, by Theorem 1.1, T(n)u(n)\frac{T\left({n}\right)}{u\left(n\right)} converges almost surely to a positive constant K2>0K_{2}>0. Moreover, T(n)u(n)1\frac{T\left(n\right)}{u\left(n\right)}\leqslant 1, therefore, by the Dominated Convergence Theorem we obtain that 𝔼λT(n)K2u(n)\mathbb{E}_{\lambda}T\left(n\right)\sim K_{2}u\left(n\right). If X is β\beta-null recurrent, by Lemma 3.3 in [24], 𝔼λT(n)u(n)Γ(1+β)\mathbb{E}_{\lambda}T\left(n\right)\sim\frac{{u\left(n\right)}}{{\Gamma\left({1+\beta}\right)}}, hence, for both positive and β\beta-null recurrent chains, we can find K2K_{2} and NN, both independent of γ\gamma, such that 𝔼λT(n)K2u(n)\mathbb{E}_{\lambda}T\left(n\right)\leqslant{K_{2}}u\left(n\right) for all nNn\geqslant N (enlarging K2K_{2} if necessary). Using this with (46) and (47) we get,

𝔼λZn(γ)u(n)γ𝔼λh(0,γ)u(n)γ+K1K2nN,γ(0,δ].\frac{{\mathbb{E}_{\lambda}{Z_{n}}\left(\gamma\right)}}{{u\left(n\right)\gamma}}\leqslant\frac{{\mathbb{E}_{\lambda}h\left({{\mathcal{B}_{0}},\gamma}\right)}}{{u\left(n\right)\gamma}}+{K_{1}}{K_{2}}\quad\forall n\geqslant N,\;\forall\gamma\in\left({0,\delta}\right]. (48)

Combining (48) with assumption (B3) and the fact that Zn(0)0Z_{n}\left(0\right)\equiv 0 we obtain that there exist positive constants K3K_{3} and γ0\gamma_{0} such that

𝔼λZn(γ)K3u(n)γnN,γ(0,γ0].\mathbb{E}_{\lambda}{Z_{n}}\left(\gamma\right)\leqslant K_{3}u\left(n\right)\gamma\quad\forall n\geqslant N,\;\forall\gamma\in\left({0,\gamma_{0}}\right].

Equation (35) now follows after taking expectation in (43). The proof of (36) follows the same reasoning, but using

Sj(γ)=supyIj|t=0nWt(𝕀C{Xt=y})|2.{S_{j}}\left(\gamma\right)=\mathop{\sup}\limits_{y\in{I_{j}}}{\left|{\sum\limits_{t=0}^{n}{{W_{t}}\left({\mathbb{I}_{C}\left\{{{X_{t}}=y}\right\}}\right)}}\right|^{2}}.

Proof of Lemma 6.7. a) If X is positive recurrent, Theorem 1.1 implies that there exists a positive constant KK such that Tn(C)u(n)\frac{T_{n}\left(C\right)}{u\left(n\right)} converges almost surely to Kπ(C)K\pi(C), which is not zero by (A3).

On the other hand, if X is β\beta-null recurrent, Theorem 1.1 and Slutsky’s Theorem imply that there exists a constant K>0K>0 such that Tn(C)u(n)\frac{{{T_{n}}\left(C\right)}}{{u\left(n\right)}} converges in distribution to KMβ(1)KM_{\beta}(1), where Mβ(1)M_{\beta}(1) denotes a Mittag-Leffler distribution with parameter β\beta. This distribution is continuous and strictly positive with probability 1; hence, by the Continuous Mapping Theorem, u(n)Tn(C)\frac{{u\left(n\right)}}{{{T_{n}}\left(C\right)}} converges in distribution to a multiple of 1Mβ(1)\frac{1}{{{M_{\beta}}(1)}} and is therefore bounded in probability by Theorem 2.4 in [33].

b) Let X be positive recurrent; then we can find NηN_{\eta} such that

(|(Tn(C)u(n))αKαπ(C)α|(Kπ(C)2)α)1η,nNη.\mathbb{P}\left({\left|{{{\left({\frac{{{T_{n}}\left(C\right)}}{{u\left(n\right)}}}\right)}^{\alpha}}-K^{\alpha}\pi{{\left(C\right)}^{\alpha}}}\right|\leqslant{{\left({\frac{{K\pi\left(C\right)}}{2}}\right)}^{\alpha}}}\right)\geqslant 1-\eta,\quad\forall n\geqslant{N_{\eta}}.

Hence,

((Tn(C)u(n))α[Kαπ(C)α2,3Kαπ(C)α2])1η,nNη.\mathbb{P}\left({{{\left({\frac{{{T_{n}}\left(C\right)}}{{u\left(n\right)}}}\right)}^{\alpha}}\in\left[{\frac{{{K^{\alpha}\pi{{\left(C\right)}^{\alpha}}}}}{2},\frac{{3K^{\alpha}\pi{{\left(C\right)}^{\alpha}}}}{2}}\right]}\right)\geqslant 1-\eta,\quad\forall n\geqslant{N_{\eta}}.

Now let X be β\beta-null recurrent and set Z=(KMβ(1))αZ={\left({K{M_{\beta}}\left(1\right)}\right)^{\alpha}}. This random variable is continuous and positive, therefore, we can find positive constants c¯η\underline{c}_{\eta} and c¯η\overline{c}_{\eta} such that

(Z[c¯η,c¯η])1η2.\mathbb{P}\left({Z\in\left[{\underline{c}_{\eta},\overline{c}_{\eta}}\right]}\right)\geqslant 1-\frac{\eta}{2}. (49)

By the Continuous Mapping Theorem, (Tn(C)u(n))α{{{\left({\frac{{{T_{n}}\left(C\right)}}{{u\left(n\right)}}}\right)}^{\alpha}}} converges in distribution to ZZ, therefore, we can find NηN_{\eta}\in\mathbb{N} such that

|((Tn(C)u(n))α[c¯η,c¯η])(Z[c¯η,c¯η])|η2,nNη,\left|{\mathbb{P}\left({{{\left({\frac{{T_{n}(C)}}{{u\left(n\right)}}}\right)}^{\alpha}}\in\left[{{\underline{c}_{\eta}},{{\overline{c}}_{\eta}}}\right]}\right)-\mathbb{P}\left({Z\in\left[{{\underline{c}_{\eta}},{{\overline{c}}_{\eta}}}\right]}\right)}\right|\leqslant\frac{\eta}{2},\quad\forall n\geqslant{N_{\eta}}, (50)

Combining (49) and (50) we obtain that

((Tn(C)u(n))α[c¯η,c¯η])1η,nNη.\mathbb{P}\left({{{\left({\frac{{T_{n}(C)}}{{u\left(n\right)}}}\right)}^{\alpha}}\in\left[{{\underline{c}_{\eta}},{{\overline{c}}_{\eta}}}\right]}\right)\geqslant 1-\eta,\quad\forall n\geqslant{N_{\eta}}. (51)
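To visualize the dichotomy in this proof, one can simulate a β-null recurrent example: the simple random walk on the integers, for which β = 1/2 and u(n) is of order √n, taking C = {0}. The sketch below is our own illustration under these assumptions, not part of the proof: the ratio Tn(C)/√n does not stabilize at a constant but has a nondegenerate (Mittag-Leffler type) limit, which is why only bounds in probability such as (51) are available in the null recurrent case.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100_000, 300
ratios = np.empty(reps)
for r in range(reps):
    walk = np.concatenate([[0], np.cumsum(rng.choice([-1, 1], size=n))])
    ratios[r] = np.sum(walk == 0) / np.sqrt(n)   # T_n(C)/u(n) with C = {0}
# a nondegenerate spread across replications: no almost sure constant limit
print(f"mean {ratios.mean():.3f}, sd {ratios.std():.3f}, "
      f"quartiles {np.percentile(ratios, [25, 50, 75])}")
```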

Proof of Lemma 6.8. Fix ε(0,1)\varepsilon\in(0,1) small enough so that FF^{\prime} and |f0||f_{0}^{\prime}| are bounded from above and away from zero on [F1(F(x0)2ε),F1(F(x0)+2ε)][F^{-1}(F(x_{0})-2\varepsilon),F^{-1}(F(x_{0})+2\varepsilon)], see assumption (B2). Then, the proper inverse functions of FF and f0f_{0} are well defined on [F(x0)2ε,F(x0)+2ε][F(x_{0})-2\varepsilon,F(x_{0})+2\varepsilon] and

[f0F1(F(x0)2ε),f0F1(F(x0)+2ε)][f_{0}\circ F^{-1}(F(x_{0})-2\varepsilon),f_{0}\circ F^{-1}(F(x_{0})+2\varepsilon)]

respectively. We denote the inverses on these intervals by F1F^{-1} and f01f_{0}^{-1}, respectively. Let

Un(a)=argmax|pF(x0)|ε{Λn(p)ap}U_{n}(a)=\mathop{\mbox{\sl\em argmax}}_{|p-F(x_{0})|\leqslant\varepsilon}\{\Lambda_{n}(p)-ap\} (52)

where a=f0(x0)a=f_{0}(x_{0}) and where the supremum is restricted to p[Fn(Y0),Fn(Ym)]p\in[F_{n}(Y_{0}),F_{n}(Y_{m})]. We will show below that

Un(a)=F(x0)+OP(u(n)1/3),U_{n}(a)=F(x_{0})+O_{P}({u\left(n\right)^{-1/3}}), (53)

as nn\to\infty. Combining (31) with Lemma 6.5 ensures that U^n(a)\widehat{U}_{n}(a) coincides with Un(a)U_{n}(a) with a probability that tends to one as nn\to\infty, so (37) follows from (53).

We turn to the proof of (53). Fix η>0\eta>0 arbitrarily and let

γn=K0u(n)1/3\gamma_{n}=K_{0}{u\left(n\right)^{-1/3}} (54)

for some K01K_{0}\geqslant 1 sufficiently large so that

γn1u(n).\gamma_{n}\geqslant\frac{1}{\sqrt{u\left(n\right)}}. (55)

Then, by part ii) of Lemma 6.7, we can find positive constants c¯η\underline{c}_{\eta}, c¯η\bar{c}_{\eta} and NηN_{\eta} such that

(Tn(C)2/3γnu(n)1/3[K0c¯η,K0c¯η])1η/2nNη,\mathbb{P}\left(T_{n}(C)^{2/3}\gamma_{n}u\left(n\right)^{-1/3}\in[K_{0}\underline{c}_{\eta},K_{0}\bar{c}_{\eta}]\right)\geqslant 1-\eta/2\quad\forall n\geqslant N_{\eta}, (56)

Let c¯=K0c¯η\underline{c}=K_{0}\underline{c}_{\eta} and c¯=K0c¯η\bar{c}=K_{0}\overline{c}_{\eta}. It follows from (18) that for sufficiently small ε>0\varepsilon>0, we can find K1>0K_{1}>0 such that

(Tn(C)sup|pF(x0)|2ε|Fn1(p)F1(p)|2K1)1η/2\mathbb{P}\left(T_{n}(C)\sup_{|p-F(x_{0})|\leqslant 2\varepsilon}|F_{n}^{-1}(p)-F^{-1}(p)|^{2}\leqslant K_{1}\right)\geqslant 1-\eta/2

for all nn. Hence for nηn\geqslant\mathbb{N}_{\eta},

(n)1η,\mathbb{P}(\mathcal{E}_{n})\geqslant 1-\eta,

where n\mathcal{E}_{n} denotes the intersection of the events

Tn(C)2/3γnu(n)1/3[c¯,c¯]T_{n}(C)^{2/3}\gamma_{n}u\left(n\right)^{-1/3}\in[\underline{c},\bar{c}] (57)

and

Tn(C)sup|pF(x0)|2ε|Fn1(p)F1(p)|2K1.T_{n}(C)\sup_{|p-F(x_{0})|\leqslant 2\varepsilon}|F_{n}^{-1}(p)-F^{-1}(p)|^{2}\leqslant K_{1}. (58)

Combining equations (57) and (58), we obtain that, on n\mathcal{E}_{n},

sup|pF(x0)|2ε|Fn1(p)F1(p)|2K2u(n)1\mathop{\sup}\limits_{\left|{p-{F}\left({{x_{0}}}\right)}\right|\leqslant 2\varepsilon}{\left|{F_{n}^{-1}\left(p\right)-{F^{-1}}\left(p\right)}\right|^{2}}\leqslant{K_{2}}u{\left(n\right)^{-1}} (59)

where K2=K1(K0c¯)3/2K_{2}={K_{1}}{\left({\frac{{{K_{0}}}}{{\underline{c}}}}\right)^{3/2}} is independent of nn and K0K_{0}.

By Lemma 6.3, we can assume without loss of generality that F(x0)F(x_{0}) belongs to [Fn(Y0),Fn(Ym)][F_{n}(Y_{0}),F_{n}(Y_{m})], since this occurs with probability that tends to one. Hence, by (52), the event {|Un(a)F(x0)|γn}\{|U_{n}(a)-F(x_{0})|\geqslant\gamma_{n}\} is contained in the event that there exists p[Fn(Y0),Fn(Ym)]p\in[F_{n}(Y_{0}),F_{n}(Y_{m})] with |pF(x0)|ε|p-F(x_{0})|\leqslant\varepsilon, |pF(x0)|γn|p-F(x_{0})|\geqslant\gamma_{n} and

Λn(p)apΛn(F(x0))aF(x0).\Lambda_{n}(p)-ap\geqslant\Lambda_{n}(F(x_{0}))-aF(x_{0}). (60)

Obviously, the probability is equal to zero if γn>ε\gamma_{n}>\varepsilon so we assume in the sequel that γnε\gamma_{n}\leqslant\varepsilon. For all p[F(x0)ε,F(x0)+ε]p\in[F(x_{0})-\varepsilon,F(x_{0})+\varepsilon] define

Λ(p)=F(x0)pf0F1(u)𝑑u.\Lambda\left(p\right)=\int_{F\left({x_{0}}\right)}^{p}f_{0}\circ F^{-1}(u)du.

Let c>0c>0 be such that |f0|/F>2c|f_{0}^{\prime}|/F^{\prime}>2c on the interval [F1(F(x0)2ε),F1(F(x0)+2ε)][F^{-1}(F(x_{0})-2\varepsilon),F^{-1}(F(x_{0})+2\varepsilon)]. Since Λ(F(x0))=a\Lambda^{\prime}(F(x_{0}))=a and Λ′′=f0F1/FF1\Lambda^{\prime\prime}=f_{0}^{\prime}\circ F^{-1}/F^{\prime}\circ F^{-1}, it then follows from Taylor’s expansion that

Λ(p)Λ(F(x0))(pF(x0))ac(pF(x0))2\Lambda(p)-\Lambda(F(x_{0}))\leqslant(p-F(x_{0}))a-c(p-F(x_{0}))^{2}

for all p[F(x0)ε,F(x0)+ε]p\in[F(x_{0})-\varepsilon,F(x_{0})+\varepsilon] and therefore, (60) implies that

Δn(p)Δn(F(x0))c(pF(x0))20\Delta_{n}(p)-\Delta_{n}(F(x_{0}))-c(p-F(x_{0}))^{2}\geqslant 0

for all such pp’s, where we set Δn:=ΛnΛ\Delta_{n}:=\Lambda_{n}-\Lambda. Hence, for all nNηn\geqslant N_{\eta},

(|Un(a)F(x0)|γn)\displaystyle\mathbb{P}\left(\left|U_{n}\left(a\right)-F\left(x_{0}\right)\right|\geqslant\gamma_{n}\right)
η+(sup|pF(x0)|[γn,ε]{Δn(p)Δn(F(x0))c(pF(x0))2}0 and n)\displaystyle\qquad\leqslant\eta+\mathbb{P}\left(\sup_{|p-F(x_{0})|\in[\gamma_{n},\varepsilon]}\{\Delta_{n}(p)-\Delta_{n}(F(x_{0}))-c(p-F(x_{0}))^{2}\}\geqslant 0\mbox{ and }{\mathcal{E}_{n}}\right)
η+j(sup|u|[γn2j,γn2j+1]{Δn(F(x0)+u)Δn(F(x0))}c(γn2j)2 and n)\displaystyle\qquad\leqslant\eta+\sum_{j}\mathbb{P}\left(\sup_{|u|\in[\gamma_{n}2^{j},\gamma_{n}2^{j+1}]}\{\Delta_{n}(F(x_{0})+u)-\Delta_{n}(F(x_{0}))\}\geqslant c(\gamma_{n}2^{j})^{2}\mbox{ and }{\mathcal{E}_{n}}\right)
η+j(sup|u|γn2j+1|Δn(F(x0)+u)Δn(F(x0))|c(γn2j)2 and n)\displaystyle\qquad{\leqslant\eta+\sum_{j}\mathbb{P}\left(\sup_{|u|\leqslant\gamma_{n}2^{j+1}}|\Delta_{n}(F(x_{0})+u)-\Delta_{n}(F(x_{0}))|\geqslant c(\gamma_{n}2^{j})^{2}\mbox{ and }\mathcal{E}_{n}\right)} (61)

where the sums are taken over all integers j0j\geqslant 0 such that γn2jε\gamma_{n}2^{j}\leqslant\varepsilon. Recall that we have (39) for all k{0,,m}k\in\{0,\dots,m\}. Since Λn\Lambda_{n} is piecewise-linear with knots at Fn(Y0),,Fn(Ym)F_{n}(Y_{0}),\dots,F_{n}(Y_{m}), by (22) and (23) we get that for every jj in the above sum,

sup|u|γn2j+1|Δn(F(x0)+u)Δn(F(x0))|\displaystyle\sup_{|u|\leqslant\gamma_{n}2^{j+1}}|\Delta_{n}(F(x_{0})+u)-\Delta_{n}(F(x_{0}))|
sup|u|γn2j+1|F(x0)F(x0)+u(f0Fn1(y)f0F1(y))𝑑y|\displaystyle\qquad\leqslant\mathop{\sup}\limits_{|u|\leqslant{\gamma_{n}}{2^{j+1}}}\left|{\int\limits_{F\left(x_{0}\right)}^{{F}\left({{x_{0}}}\right)+u}{\left({{f_{0}}\circ F_{n}^{-1}\left(y\right)-{f_{0}}\circ{F^{-1}}\left(y\right)}\right)dy}}\right|
+sup|u|γn2j+1|Mn(F(x0)+u)Mn(F(x0))|.\displaystyle\qquad\qquad+\mathop{\sup}\limits_{|u|\leqslant{\gamma_{n}}{2^{j+1}}}\left|{{M_{n}}\left({{F}\left({{x_{0}}}\right)+u}\right)-{M_{n}}\left({{F}\left({{x_{0}}}\right)}\right)}\right|. (62)

Moreover, |f0||f_{0}^{\prime}| is bounded above on [F1(F(x0)2ε),F1(F(x0)+2ε)][F^{-1}(F(x_{0})-2\varepsilon),F^{-1}(F(x_{0})+2\varepsilon)], so we obtain that for every jj with γn2jε\gamma_{n}2^{j}\leqslant\varepsilon, the first term on the right-hand side of (62) satisfies

sup|u|γn2j+1|F(x0)F(x0)+u(f0Fn1(p)f0F1(p))𝑑p|\displaystyle\sup_{|u|\leqslant\gamma_{n}2^{j+1}}\left|\int_{F(x_{0})}^{F(x_{0})+u}\left(f_{0}\circ F_{n}^{-1}(p)-f_{0}\circ F^{-1}(p)\right)dp\right|
F(x0)γn2j+1F(x0)+γn2j+1|f0Fn1(p)f0F1(p)|𝑑p\displaystyle\qquad\leqslant\int_{F(x_{0})-\gamma_{n}2^{j+1}}^{F(x_{0})+\gamma_{n}2^{j+1}}\left|f_{0}\circ F_{n}^{-1}(p)-f_{0}\circ F^{-1}(p)\right|dp
K3γn2jsup|pF(x0)|2ε|Fn1(p)F1(p)|,\displaystyle\qquad\leqslant K_{3}\gamma_{n}2^{j}\sup_{|p-F(x_{0})|\leqslant 2\varepsilon}\left|F_{n}^{-1}(p)-F^{-1}(p)\right|,

for some K3>0K_{3}>0 that does not depend on nn. Hence, it follows from the previous display and (59) that

𝔼(sup|u|γn2j+1|F(x0)F(x0)+u(f0Fn1(p)f0F1(p))dp|2𝕀(n))\displaystyle\mathbb{E}\left(\sup_{|u|\leqslant\gamma_{n}2^{j+1}}\left|\int_{F(x_{0})}^{F(x_{0})+u}(f_{0}\circ F_{n}^{-1}(p)-f_{0}\circ F^{-1}(p))dp\right|^{2}\mathbb{I}(\mathcal{E}_{n})\right)
K32γn222j𝔼(sup|pF(x0)|2ε|Fn1(p)F1(p)|2𝕀(n))\displaystyle\qquad\leqslant K_{3}^{2}\gamma_{n}^{2}2^{2j}\mathbb{E}\left(\sup_{|p-F(x_{0})|\leqslant 2\varepsilon}|F_{n}^{-1}(p)-F^{-1}(p)|^{2}\mathbb{I}({\mathcal{E}_{n}})\right)
K32γn222jK2u(n)1.\displaystyle\qquad\leqslant K_{3}^{2}\gamma_{n}^{2}{2^{2j}}{K_{2}}{u\left(n\right)}^{-1}.

Hence, taking K4=K32K2K_{4}=K_{3}^{2}K_{2} we get that for all jj with γn2jε1\gamma_{n}2^{j}\leqslant\varepsilon\leqslant 1,

𝔼(sup|u|γn2j+1|F(x0)F(x0)+u(f0Fn1(p)f0F1(p))dp|2𝕀(n))K4γn2ju(n)1.\mathbb{E}\left(\sup_{|u|\leqslant\gamma_{n}2^{j+1}}\left|\int_{F(x_{0})}^{F(x_{0})+u}(f_{0}\circ F_{n}^{-1}(p)-f_{0}\circ F^{-1}(p))dp\right|^{2}\mathbb{I}(\mathcal{E}_{n})\right)\leqslant{K_{4}}{\gamma_{n}}{2^{j}}u\left(n\right)^{-1}. (63)

By equations (23) and (24) in Lemma 6.1, the second term on the right-hand side of (62) satisfies,

sup|u|γn2j+1|Mn(F(x0)+u)Mn(F(x0))|I1n,j+I2n,j,\mathop{\sup}\limits_{|u|\leqslant{\gamma_{n}}{2^{j+1}}}\left|{{M_{n}}\left({{F}\left({{x_{0}}}\right)+u}\right)-{M_{n}}\left({{F}\left({{x_{0}}}\right)}\right)}\right|\leqslant{I^{n,j}_{1}}+{I^{n,j}_{2}}, (64)

where I1n,jI_{1}^{n,j} and I2n,jI_{2}^{n,j} are given by

I1n,j\displaystyle I^{n,j}_{1} =1Tn(C)sup|u|γn2j+1|t=0nWt(𝕀C{XtFn1(F(x0)+u)}𝕀C{XtFn1(F(x0))})|,\displaystyle=\frac{1}{T_{n}(C)}\sup_{|u|\leqslant\gamma_{n}2^{j+1}}\left|\sum_{t=0}^{n}W_{t}\Bigl{(}\mathbb{I}_{C}{\{X_{t}\leqslant F_{n}^{-1}(F(x_{0})+u)\}}-\mathbb{I}_{C}{\{X_{t}\leqslant F_{n}^{-1}(F(x_{0}))\}}\Bigl{)}\right|,
I2n,j\displaystyle{I^{n,j}_{2}} =2Tn(C)sup|u|γn2j+1|t=0nWt(𝕀C{Xt=Fn1(F(x0)+u)})|.\displaystyle=\frac{2}{{{T_{n}}\left(C\right)}}\sup_{|u|\leqslant\gamma_{n}2^{j+1}}\left|\sum\limits_{t=0}^{n}{{W_{t}}\Bigl{(}{\mathbb{I}_{C}{\left\{{{X_{t}}=F_{n}^{-1}\left({{F}\left({{x_{0}}}\right)+u}\right)}\right\}}}\Bigl{)}}\right|.

For I1n,jI^{n,j}_{1}, it follows from the triangle inequality that

I1n,j2Tn(C)sup|u|γn2j+1|t=0nWt(𝕀C{XtFn1(F(x0)+u)}𝕀C{Xtx0})|.I^{n,j}_{1}\leqslant\frac{2}{T_{n}(C)}\sup_{|u|\leqslant\gamma_{n}2^{j+1}}\left|\sum_{t=0}^{n}W_{t}\left(\mathbb{I}_{C}{\{X_{t}\leqslant F_{n}^{-1}(F(x_{0})+u)\}}-\mathbb{I}_{C}{\{X_{t}\leqslant x_{0}\}}\right)\right|.

Combining (59) and the fact that F1F^{-1} is Lipschitz on [F(x0)2ε,F(x0)+2ε][F(x_{0})-2\varepsilon,F(x_{0})+2\varepsilon] we can find K5=max(K2,2sup((F1)))K_{5}=\max{\left(\sqrt{K_{2}},2\sup\left((F^{-1})^{\prime}\right)\right)} independent of nn such that, on n\mathcal{E}_{n},

sup|pF(x0)|2ε|Fn1(p)F1(p)|K5u(n)\sup_{|p-F(x_{0})|\leqslant 2\varepsilon}|F_{n}^{-1}(p)-F^{-1}(p)|\leqslant\frac{K_{5}}{\sqrt{u\left(n\right)}}

and |F1(F(x0)+u)x0|=|F1(F(x0)+u)F1(F(x0))|K5|u|/2|F^{-1}(F(x_{0})+u)-x_{0}|=|F^{-1}(F(x_{0})+u)-F^{-1}(F(x_{0}))|\leqslant K_{5}|u|/2 for all uu with |u|2ε|u|\leqslant 2\varepsilon. Hence, on n\mathcal{E}_{n}

I1n,j\displaystyle I^{n,j}_{1} 2Tn(C)sup|yx0|K5γn2j+K5/u(n)|t=0nWt(𝕀C{Xty}𝕀C{Xtx0})|,\displaystyle\leqslant\frac{2}{T_{n}(C)}\sup_{|y-x_{0}|\leqslant K_{5}\gamma_{n}2^{j}+K_{5}/\sqrt{u\left(n\right)}}\left|\sum_{t=0}^{n}W_{t}\left(\mathbb{I}_{C}{\{X_{t}\leqslant y\}}-\mathbb{I}_{C}{\{X_{t}\leqslant x_{0}\}}\right)\right|,
I2n,j\displaystyle I^{n,j}_{2} 2Tn(C)sup|yx0|K5γn2j+K5/u(n)|t=0nWt(𝕀C{Xt=y})|.\displaystyle\leqslant\frac{2}{{{T_{n}}\left(C\right)}}\sup_{|y-x_{0}|\leqslant K_{5}\gamma_{n}2^{j}+K_{5}/\sqrt{u\left(n\right)}}\left|\sum\limits_{t=0}^{n}{{W_{t}}\Bigl{(}{\mathbb{I}_{C}{\left\{{{X_{t}}=y}\right\}}}\Bigl{)}}\right|.

It follows from (55) that γn2jγn1/u(n)\gamma_{n}2^{j}\geqslant\gamma_{n}\geqslant 1/\sqrt{u\left(n\right)} for all j0j\geqslant 0; hence, on n\mathcal{E}_{n}

I1n,j\displaystyle I^{n,j}_{1} 2Tn(C)sup|yx0|2K5γn2j|t=0nWt(𝕀C{Xty}𝕀C{Xtx0})|,\displaystyle\leqslant\frac{2}{T_{n}(C)}\sup_{|y-x_{0}|\leqslant 2K_{5}\gamma_{n}2^{j}}\left|\sum_{t=0}^{n}W_{t}\left(\mathbb{I}_{C}{\{X_{t}\leqslant y\}}-\mathbb{I}_{C}{\{X_{t}\leqslant x_{0}\}}\right)\right|,
I2n,j\displaystyle I^{n,j}_{2} 2Tn(C)sup|yx0|2K5γn2j|t=0nWt(𝕀C{Xt=y})|.\displaystyle\leqslant\frac{2}{{{T_{n}}\left(C\right)}}\sup_{|y-x_{0}|\leqslant 2K_{5}\gamma_{n}2^{j}}\left|\sum\limits_{t=0}^{n}{{W_{t}}\Bigl{(}{\mathbb{I}_{C}{\left\{{{X_{t}}=y}\right\}}}\Bigl{)}}\right|.

By Lemma 6.6, we conclude that there exist K6>0K_{6}>0 and NηN_{\eta}^{\prime} such that, for nNηn\geqslant N_{\eta}^{\prime},

𝔼((I1n,j+I2n,j)2𝕀(n))K6γn2ju(n)1\mathbb{E}\Bigl{(}\left(I^{n,j}_{1}+I^{n,j}_{2}\right)^{2}\mathbb{I}(\mathcal{E}_{n})\Bigr{)}\leqslant K_{6}\gamma_{n}2^{j}u\left(n\right)^{-1} (65)

Combining (62), (63), (64) and (65), we conclude that there exists K7>0K_{7}>0, independent of nn and K0K_{0}, such that for all nNηn\geqslant N_{\eta}^{\prime} and all j0j\geqslant 0 such that γn2jε\gamma_{n}2^{j}\leqslant\varepsilon, one has

𝔼(sup|u|γn2j+1|Δn(F(x0)+u)Δn(F(x0))|2𝕀(n))K7γn2ju(n)1.\displaystyle\mathbb{E}\left(\sup_{|u|\leqslant\gamma_{n}2^{j+1}}|\Delta_{n}(F(x_{0})+u)-\Delta_{n}(F(x_{0}))|^{2}{\mathbb{I}(\mathcal{E}_{n})}\right)\leqslant{K_{7}\gamma_{n}2^{j}u\left(n\right)^{-1}}.

Combining this with (61) and the Markov inequality, we conclude that there exist K8>0K_{8}>0 and Nη′′N_{\eta}^{\prime\prime}, depending neither on nn nor on K0K_{0}, such that, for all nNη′′n\geqslant N_{\eta}^{\prime\prime},

(|Un(a)F(x0)|γn)\displaystyle\mathbb{P}\left(|U_{n}(a)-F(x_{0})|\geqslant\gamma_{n}\right) \displaystyle\leqslant η+K8j0γn2ju(n)1(γn2j)4\displaystyle\eta+K_{8}\sum_{j\geqslant 0}\frac{\gamma_{n}2^{j}u\left(n\right)^{-1}}{(\gamma_{n}2^{j})^{4}}
\displaystyle\leqslant η+K8γn3u(n)1j023j.\displaystyle\eta+K_{8}\gamma_{n}^{-3}u\left(n\right)^{-1}\sum_{j\geqslant 0}2^{-3j}.

The sum on the last line is finite, so there exists K>0K>0, independent of nn and K0K_{0}, such that for all nn larger than Nη′′N_{\eta}^{\prime\prime},

(|Un(a)F(x0)|γn)η+Kγn3u(n)1=η+KK03.\mathbb{P}\left(|U_{n}(a)-F(x_{0})|\geqslant\gamma_{n}\right)\leqslant\eta+K\gamma_{n}^{-3}u\left(n\right)^{-1}=\eta+\frac{K}{K_{0}^{3}}. (66)

The above probability can be made smaller than 2η2\eta by choosing K0K_{0} in (54) sufficiently large, independently of nn. This proves (53) and completes the proof of (37).
Now, we turn to the proof of (38). It follows from (30) combined with (32) and Lemma 5.3 that

f^n1(f0(x0))=F1U^n(f0(x0))+Tn(C)1/2OP(1).\widehat{f}_{n}^{-1}(f_{0}(x_{0}))=F^{-1}\circ\widehat{U}_{n}(f_{0}(x_{0}))+T_{n}(C)^{-1/2}O_{P}\left(1\right).

Hence, by Lemma 6.7 we have

f^n1(f0(x0))=F1U^n(f0(x0))+OP(u(n)1/2).\widehat{f}_{n}^{-1}(f_{0}(x_{0}))=F^{-1}\circ\widehat{U}_{n}(f_{0}(x_{0}))+O_{P}\left(u\left(n\right)^{-1/2}\right).

Now, it follows from assumption (B2) that F1F^{-1} has a bounded derivative in a neighborhood of F(x0)F(x_{0}), to which U^n(f0(x0))\widehat{U}_{n}(f_{0}(x_{0})) belongs with probability that tends to one. Hence, it follows from Taylor’s expansion that

f^n1(f0(x0))\displaystyle\widehat{f}_{n}^{-1}(f_{0}(x_{0})) =F1F(x0)+O(|U^n(f0(x0))F(x0)|)+OP(u(n)1/2)\displaystyle=F^{-1}\circ F(x_{0})+O\left(|\widehat{U}_{n}(f_{0}(x_{0}))-F(x_{0})|\right)+O_{P}\left(u\left(n\right)^{-1/2}\right)
=x0+OP(u(n)1/3)+OP(u(n)1/2),\displaystyle=x_{0}+O_{P}({u\left(n\right)^{-1/3}})+O_{P}\left(u\left(n\right)^{-1/2}\right),

where we used (37) for the last equality. This proves (38) and completes the proof of Lemma 6.8.

Supplement of “Harris recurrent Markov chains and nonlinear monotone cointegrated models”. This is the supplementary material associated with the present article.

1 Markov chain theory and notation

This section extends Section 2 of the main paper, presenting a more detailed exposition of the Markov chain theory required in the proofs. For further details, we refer the reader to [29, 27, 12].

Let X=X0,X1,X2,\textbf{X}=X_{0},X_{1},X_{2},\ldots be a time-homogeneous Markov chain defined on a probability space (E,,)\left(E,\mathcal{E},\mathbb{P}\right), where \mathcal{E} is countably generated. Let P(x,A)P\left(x,A\right) denote its transition kernel, i.e., for xEx\in E and AA\in\mathcal{E} we have

P(x,A)=(Xi+1A|Xi=x),i=0,1,P\left({x,A}\right)=\mathbb{P}\left({{X_{i+1}}\in A\left|{{X_{i}}=x}\right.}\right),\quad i=0,1,\ldots

Let Pn(x,A)P^{n}(x,A) denote the nn-step transition probability, i.e.

Pn(x,A)=(Xi+nA|Xi=x)i.{P^{n}}\left({x,A}\right)=\mathbb{P}\left({{X_{i+n}}\in A\left|{{X_{i}}=x}\right.}\right)\quad\forall i.

If λ\lambda is a probability measure on (E,)\left(E,\mathcal{E}\right) such that (X0)=λ\mathcal{L}\left(X_{0}\right)=\lambda, then λ\lambda is called the initial measure of the chain X. A homogeneous Markov chain is uniquely identified by its kernel and initial measure.

When the initial measure of the chain is given, we will write λ\mathbb{P}_{\lambda} (and 𝔼λ\mathbb{E}_{\lambda}) for the probability (and the expectation) conditioned on (X0)=λ\mathcal{L}\left(X_{0}\right)=\lambda. When λ=δx\lambda=\delta_{x} for some xEx\in E we will simply write x\mathbb{P}_{x} and 𝔼x\mathbb{E}_{x}.

A homogeneous Markov chain is irreducible if there exists a σ\sigma-finite measure ϕ\phi on (E,)\left(E,\mathcal{E}\right) such that for all xEx\in E and all AA\in\mathcal{E} with ϕ(A)>0\phi(A)>0 we have Pn(x,A)>0P^{n}(x,A)>0 for some n1n\geqslant 1. In this case, there exists a maximal irreducibility measure ψ\psi (all other irreducibility measures are absolutely continuous with respect to ψ\psi). In the following, all Markov chains are supposed to be irreducible with maximal irreducibility measure ψ\psi.

For any set CC\in\mathcal{E}, we will denote by σC\sigma_{C} and τC\tau_{C}, respectively, the times of first visit and first return of the chain to the set CC, i.e. σC=inf{n0:XnC}\sigma_{C}=\operatorname{inf}\left\{n\geqslant 0:X_{n}\in C\right\} and τC=inf{n1:XnC}\tau_{C}=\operatorname{inf}\left\{n\geqslant 1:X_{n}\in C\right\}. The subsequent visit and return times σC(k),τC(k)\sigma_{C}\left(k\right),\tau_{C}\left(k\right), k1k\geqslant 1, are defined inductively as follows:

τC(1)=τC\displaystyle\tau_{C}\left(1\right)=\tau_{C}\quad ,τC(k)=min{n>τC(k1):XnC},\displaystyle,\quad\tau_{C}\left(k\right)=\operatorname{min}\left\{n>\tau_{C}\left(k-1\right):X_{n}\in C\right\}, (67)
σC(1)=σC\displaystyle\sigma_{C}\left(1\right)=\sigma_{C}\quad ,σC(k)=min{n>σC(k1):XnC}.\displaystyle,\quad\sigma_{C}\left(k\right)=\operatorname{min}\left\{n>\sigma_{C}\left(k-1\right):X_{n}\in C\right\}. (68)

Given that our methods will only deal with the values of X in a fixed set CC, if AA is a measurable set, we will write 𝕀C{XtA}\mathbb{I}_{C}\{X_{t}\in A\} instead of 𝕀{XtAC}\mathbb{I}\{X_{t}\in A\cap C\} and if A=EA=E, then we will simply write 𝕀C(Xt)\mathbb{I}_{C}\left(X_{t}\right).

We will use Tn(C)T_{n}\left(C\right) to denote the random variable that counts the number of times the chain has visited the set CC up to time nn, that is Tn(C)=t=0n𝕀C(Xt)T_{n}\left(C\right)=\sum_{t=0}^{n}{\mathbb{I}_{C}\left(X_{t}\right)}. Similarly, we will write T(C)T\left(C\right) for the total number of visits of the chain X to CC. The set CC is called recurrent if 𝔼xT(C)=+\mathbb{E}_{x}T\left(C\right)=+\infty for all xCx\in C. The chain X is considered recurrent if every set AA\in\mathcal{E}, such that ψ(A)>0\psi\left(A\right)>0, is recurrent.
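As an illustration of these notations (a minimal sketch of our own; the Gaussian random walk and the set C are arbitrary choices, not objects from the paper), the following Python snippet computes Tn(C), σC and τC for a null recurrent chain:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
# Gaussian random walk: a Harris (null) recurrent chain on the real line
x = np.concatenate([[0.0], np.cumsum(rng.normal(size=n))])
in_C = np.abs(x) < 1.0                    # C = (-1, 1), a small set here
T_n = int(in_C.sum())                     # T_n(C): visits to C up to time n
sigma_C = int(np.argmax(in_C))            # first visit time (0 here, since X_0 = 0)
tau_C = int(np.argmax(in_C[1:])) + 1      # first return time (assumes a return occurs)
print(T_n, sigma_C, tau_C)
```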

Although recurrent chains possess many interesting properties, a stronger type of recurrence is required in our analysis. An irreducible Markov chain is Harris recurrent if for all xEx\in E and all AA\in\mathcal{E} with ψ(A)>0\psi(A)>0 we have

(XnAinfinitely often|X0=x)=1.\mathbb{P}\left({{X_{n}}\in A\;\text{infinitely often}\left|{{X_{0}}=x}\right.}\right)=1.

An irreducible chain possesses an accessible atom if there is a set 𝜶\boldsymbol{\alpha}\in\mathcal{E} such that for all x,yx,y in 𝜶\boldsymbol{\alpha}: P(x,.)=P(y,.)P(x,.)=P(y,.) and ψ(𝜶)>0\psi(\boldsymbol{\alpha})>0. Denote by 𝜶\mathbb{P}_{\boldsymbol{\alpha}} and 𝔼𝜶(.)\mathbb{E}_{\boldsymbol{\alpha}}{\left(.\right)} the probability and the expectation conditionally on X0𝜶X_{0}\in\boldsymbol{\alpha}. If X possesses an accessible atom and is Harris recurrent, the probability of returning infinitely often to the atom 𝜶\boldsymbol{\alpha} is equal to one, no matter the starting point, i.e. xE,x(τ𝜶<)=1.\forall x\in E,{\mathbb{P}}_{x}\left(\tau_{\boldsymbol{\alpha}}<\infty\right)=1. Moreover, it follows from the strong Markov property that the sample paths may be divided into independent blocks of random length corresponding to consecutive visits to 𝜶\boldsymbol{\alpha}:

0\displaystyle\mathcal{B}_{0} =(X0,X1,,Xτ𝜶(1))\displaystyle=\left(X_{0},X_{1},\ldots,X_{\tau_{\boldsymbol{\alpha}}\left(1\right)}\right)
1\displaystyle\mathcal{B}_{1} =(Xτ𝜶(1)+1,,Xτ𝜶(2))\displaystyle=\left(X_{\tau_{\boldsymbol{\alpha}}\left(1\right)+1},\ldots,X_{\tau_{\boldsymbol{\alpha}}\left(2\right)}\right)
\displaystyle\ldots
n\displaystyle\mathcal{B}_{n} =(Xτ𝜶(n)+1,,Xτ𝜶(n+1))\displaystyle=\left(X_{\tau_{\boldsymbol{\alpha}}\left(n\right)+1},\ldots,X_{\tau_{\boldsymbol{\alpha}}\left(n+1\right)}\right)
\displaystyle\ldots

taking their values in the torus 𝕋=n=1En\mathbb{T}=\cup_{n=1}^{\infty}E^{n}. Notice that the distribution of 0\mathcal{B}_{0} depends on the initial measure, therefore it does not have the same distribution as j\mathcal{B}_{j} for j1j\geqslant 1. The sequence {τ𝜶(j)}j1\left\{\tau_{\boldsymbol{\alpha}}(j)\right\}_{j\geqslant 1} defines successive times at which the chain forgets its past, called regeneration times. Similarly, the sequence of i.i.d. blocks {j}j1\{\mathcal{B}_{j}\}_{j\geqslant 1} is called the sequence of regeneration blocks. The random variable T(n)=Tn(𝜶)1T\left({n}\right)=T_{n}\left(\boldsymbol{\alpha}\right)-1 counts the number of i.i.d. blocks up to time nn and is called the number of regenerations up to time nn.

Notice that for any function ff defined on EE, we can write t=0nf(Xt)\sum_{t=0}^{n}{f\left(X_{t}\right)} as a sum of independent random variables as follows:

t=0nf(Xt)=f(0)+j=1T(n)f(j)+f((n)),\sum_{t=0}^{n}{f\left(X_{t}\right)}=f\left(\mathcal{B}_{0}\right)+\sum_{j=1}^{T\left(n\right)}{f\left(\mathcal{B}_{j}\right)}+{f\left(\mathcal{B}_{(n)}\right)}, (69)

where f(0)=t=0ταf(Xt)f\left(\mathcal{B}_{0}\right)=\sum_{t=0}^{\tau_{\alpha}}{f\left(X_{t}\right)}, f(j)=t=τ𝜶(j)+1τα(j+1)f(Xt)f\left(\mathcal{B}_{j}\right)=\sum_{t=\tau_{\boldsymbol{\alpha}}\left(j\right)+1}^{\tau_{\alpha}\left(j+1\right)}{f\left(X_{t}\right)} for j=1,,T(n)j=1,\ldots,T\left({n}\right) and f((n)){f\left(\mathcal{B}_{(n)}\right)} denotes the last incomplete block, i.e. f((n))=t=τ𝜶(T(n)+1)+1nf(Xt){f\left(\mathcal{B}_{(n)}\right)}=\sum_{t=\tau_{\boldsymbol{\alpha}}\left(T\left(n\right)+1\right)+1}^{n}{f\left(X_{t}\right)}.
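For a chain with a genuine accessible atom, such as the simple random walk on the integers with 𝜶 = {0}, the decomposition (69) can be checked numerically. The sketch below is illustrative only; it assumes the simulated walk returns to the atom at least twice over the horizon, which happens with overwhelming probability.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000
x = np.concatenate([[0], np.cumsum(rng.choice([-1, 1], size=n))])  # X_0 in the atom {0}
f = np.abs                                   # any function f defined on E

visits = np.flatnonzero(x == 0)              # times at which the chain is at the atom
tau = visits[visits >= 1]                    # return times tau_alpha(1), tau_alpha(2), ...

b0 = f(x[: tau[0] + 1]).sum()                # first (non-i.i.d.) block B_0
complete = [f(x[tau[j] + 1 : tau[j + 1] + 1]).sum()
            for j in range(len(tau) - 1)]    # T(n) complete regeneration blocks
last = f(x[tau[-1] + 1 :]).sum()             # last, incomplete block B_(n)
assert np.isclose(f(x).sum(), b0 + sum(complete) + last)  # decomposition (69)
print(len(complete), "complete regeneration blocks")
```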

When an accessible atom exists, the stochastic stability properties of X amount to properties concerning the speed of return time to the atom only. For instance, the measure π𝜶\pi_{\boldsymbol{\alpha}} given by:

π𝜶(B)=𝔼𝜶(n=1τ𝜶𝕀{XnB}),B\pi_{\boldsymbol{\alpha}}\left(B\right)=\mathbb{E}_{\boldsymbol{\alpha}}\left(\sum_{n=1}^{\tau_{\boldsymbol{\alpha}}}\mathbb{I}{\left\{X_{n}\in B\right\}}\right),\quad\forall B\in\mathcal{E} (70)

is invariant, i.e.

π𝜶(B)=P(x,B)𝑑π𝜶(x).\pi_{\boldsymbol{\alpha}}\left(B\right)=\int{P\left({x,B}\right)d\pi_{\boldsymbol{\alpha}}\left(x\right)}.

Denote by +\mathcal{E}^{+} the class of nonnegative measurable functions with positive ψ\psi support. A function s+s\in\mathcal{E}^{+} is called small if there exists an integer m01m_{0}\geqslant 1 and a measure ν()+\nu\in\mathscr{M}{\left({\mathcal{E}}\right)_{+}} such that

Pm0(x,A)s(x)ν(A)xE,A.{P^{m_{0}}}\left({x,A}\right)\geqslant s\left(x\right)\nu\left(A\right)\quad\forall x\in E,A\in\mathcal{E}. (71)

When a chain possesses a small function ss, we say that it satisfies the minorization inequality M(m0,s,ν)M\left(m_{0},s,\nu\right). As pointed out in [29], there is no loss of generality in assuming that 0s(x)10\leqslant s\left(x\right)\leqslant 1 and Es(x)𝑑ν(x)>0\int_{E}{s(x)d\nu(x)}>0.

A set AA\in\mathcal{E} is said to be small if the function 𝕀A\mathbb{I}_{A} is small. Similarly, a measure ν\nu is small if there exist m0m_{0} and ss that satisfy (71). By Theorem 2.1 in [29], every irreducible Markov chain possesses a small function and Proposition 2.6 of the same book shows that every measurable set AA with ψ(A)>0\psi\left(A\right)>0 contains a small set. In practice, finding such a set consists in most cases in exhibiting an accessible set for which the probability that the chain returns to it in m0m_{0} steps is uniformly bounded below. Moreover, under quite wide conditions a compact set will be small, see [15].

If X does not possess an atom but is Harris recurrent (and therefore satisfies a minorization inequality M(m0,s,ν)M\left(m_{0},s,\nu\right)), a splitting technique, introduced in [28, 29], allows us to extend in some sense the probabilistic structure of X in order to artificially construct an atom. The general idea behind this construction is to expand the sample space so as to define a sequence (Yn)n(Y_{n})_{n\in\mathbb{N}} of Bernoulli r.v.’s and a bivariate chain Xˇ={(Xn,Yn)}n=0+{\check{\textbf{X}}}=\left\{{\left({{X_{n}},{Y_{n}}}\right)}\right\}_{n=0}^{+\infty}, named split chain, such that the set 𝜶ˇ=(E,1)\check{\boldsymbol{\alpha}}=\left(E,1\right) is an atom of this chain. A detailed description of this construction can be found in [29].

The whole point of this construction consists in the fact that Xˇ{\check{\textbf{X}}} inherits all the communication and stochastic stability properties from X (irreducibility, Harris recurrence,…). In particular, the marginal distribution of the first coordinate process of Xˇ{\check{\textbf{X}}} and the distribution of the original X are identical. Hence, the splitting method enables us to extend all the results known for atomic chains to general Harris chains, for example, the existence of an invariant measure which is unique up to a multiplicative constant (see Proposition 10.4.2 in [27]).

The invariant measure is finite if and only if 𝔼𝜶ˇτ𝜶ˇ<+\mathbb{E}_{\check{\boldsymbol{\alpha}}}{\tau_{\check{\boldsymbol{\alpha}}}}<+\infty; in this case we say the chain is positive recurrent, otherwise we say the chain is null recurrent. A null recurrent chain is called β\beta-null recurrent (c.f. Definition 3.2 in [24]) if there exists a small nonnegative function hh, a probability measure λ\lambda, a constant β(0,1)\beta\in\left({0,1}\right) and a slowly varying function LhL_{h} such that

𝔼λ(t=0nh(Xt))1Γ(1+β)nβLh(n)asn.{\mathbb{E}_{\lambda}}\left({\sum\limits_{t=0}^{n}{h\left({{X_{t}}}\right)}}\right)\sim\frac{1}{{\Gamma\left({1+\beta}\right)}}{n^{\beta}}{L_{h}}\left(n\right)\quad\textnormal{as}\;n\to\infty.

As argued in [24], it is not too severe a restriction to assume m0=1m_{0}=1. Therefore, throughout this paper we assume that X satisfies the minorization inequality M(1,s,ν)M(1,s,\nu), i.e., there exist a measurable function ss and a probability measure ν\nu such that 0s(x)10\leqslant s\left(x\right)\leqslant 1, Es(x)𝑑ν(x)>0\int_{E}{s(x)d\nu(x)}>0 and

P(x,A)s(x)ν(A).{P}\left({x,A}\right)\geqslant s\left(x\right)\nu\left(A\right). (72)
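As a toy illustration of the minorization inequality (72) (our own sketch, with an arbitrary three-state kernel, not an example from the paper), one can check a minorization with m0 = 1 numerically by taking ν proportional to the columnwise minima of the transition matrix and s constant:

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])       # a toy transition matrix on three states
col_min = P.min(axis=0)               # min_x P(x, j) for each state j
eps = col_min.sum()                   # total minorization mass
nu = col_min / eps                    # small probability measure nu
s = np.full(P.shape[0], eps)          # constant small function s(x) = eps
assert np.all(P >= np.outer(s, nu) - 1e-12)   # P(x, A) >= s(x) nu(A) holds
```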

A measurable and positive function LL, defined on [a,+)[a,+\infty) for some a0a\geqslant 0, is called slowly varying at ++\infty if it satisfies limx+L(xt)L(x)=1\lim_{x\to+\infty}{\frac{L\left(xt\right)}{L\left(x\right)}}=1 for all t>0t>0; for instance, functions converging to a positive constant and logarithms are slowly varying. See [6] for a detailed compendium of these types of functions.

It was shown in Theorem 3.1 of [24] that if the chain satisfies the minorization condition (72), then it is β\beta-null recurrent if and only if

(τ𝜶ˇ>n)1Γ(1β)nβL(n),\mathbb{P}\left({{\tau_{\check{\boldsymbol{\alpha}}}}>n}\right)\sim\frac{1}{{{\Gamma\left(1-\beta\right)n^{\beta}}L\left(n\right)}}, (73)

where LL is a slowly varying function.

The following theorem is a compendium of the main properties of Harris recurrent Markov chains that will be used throughout the proofs. Among other things, it shows that the asymptotic behavior of T(n)T\left({n}\right) is similar to the function u(n)u\left(n\right) defined as

u(n)={n,if X is positive recurrentnβL(n),if X is β-null recurrent.u\left(n\right)=\begin{cases}n,&\text{if }\textbf{X}\text{ is positive recurrent}\\ n^{\beta}L\left(n\right),&\text{if }\textbf{X}\text{ is }\beta\text{-null recurrent}\end{cases}. (74)


Theorem 1.1.

Suppose X is a Harris recurrent, irreducible Markov chain, with initial measure λ\lambda, that satisfies the minorization condition (72). Let T(n)T\left({n}\right) be the number of complete regenerations until time nn of the split chain Xˇ{\check{\textbf{X}}} , let CC\in\mathcal{E} be a small set and π\pi be an invariant measure for X. Then,

  1.

    0<π(C)<+0<\pi\left(C\right)<+\infty.

  2.

    For any function ff, defined on EE, the decomposition (69) holds. Moreover, there is a constant KπK_{\pi}, that only depends on π\pi, such that if fL1(E,π)f\in L^{1}\left(E,{\pi}\right), then 𝔼λf(1)=KπEf𝑑π\mathbb{E}_{\lambda}{f\left(\mathcal{B}_{1}\right)}=K_{\pi}\int_{E}fd\pi.

  3.

    T(n)Tn(C)\frac{T\left({n}\right)}{T_{n}(C)} converges almost surely to a positive constant.

  4.

    T(n)u(n)\frac{T\left({n}\right)}{u\left(n\right)} converges almost surely to a positive constant if X is positive recurrent and converges in distribution to a Mittag-Leffler random variable with index β\beta if X is β\beta-null recurrent.

Remark 1.1.

The Mittag-Leffler distribution with index β\beta is a non-negative continuous distribution, whose moments are given by

𝔼(Mβm(1))=m!Γ(1+mβ)m0.\mathbb{E}\left(M^{m}_{\beta}\left(1\right)\right)=\frac{m!}{\Gamma\left(1+m\beta\right)}\;\;m\geqslant 0.

See (3.39) in [24] for more details.

Remark 1.2.

Part 1 of Theorem 1.1 is Proposition 5.6.ii of [29], part 2 is equation (3.23) of [24] and part 3 is an application of the Ratio Limit Theorem (Theorem 17.2.1 of [27]). For the positive recurrent case, part 4 also follows from the aforementioned Ratio Limit Theorem, while the claim for the null recurrent case appears as Theorem 3.2 in [24].

2 Technical proofs for Section 5

Proof of Lemma 5.1. Equation (12) follows from Corollary 2 in [2] and part 2 of Theorem 1.1.

Now, we turn to the proof of (13). To do this, we adapt some of the ideas presented in the proof of Lemma 21.2 in [33].

Let VV be a normal random variable independent of the XtX_{t}’s, and Φ\Phi its distribution function. It follows from (12) that conditionally on the XtX_{t}’s, Fn(V)F_{n}(V) converges almost surely to F(V)F(V). Thus, denoting by X\mathbb{P}_{X} the conditional probability given the XtX_{t}’s, it follows from (29) that Φ(Fn1(u))=X(Fn(V)<u)\Phi(F_{n}^{-1}(u))=\mathbb{P}_{X}(F_{n}(V)<u) converges almost surely to X(F(V)<u)=Φ(F1(u))\mathbb{P}_{X}(F(V)<u)=\Phi(F^{-1}(u)) at every uu at which the limit function is continuous. Since FF is strictly increasing in CC, one can find ε>0\varepsilon>0 such that F1F^{-1} is continuous on [F(x0)ε,F(x0)+ε][F(x_{0})-\varepsilon,F(x_{0})+\varepsilon], so the above limit function is continuous at every u[F(x0)ε,F(x0)+ε]u\in[F(x_{0})-\varepsilon,F(x_{0})+\varepsilon]. By continuity of Φ1\Phi^{-1} on (0,1)(0,1), Fn1(u)F_{n}^{-1}(u) converges almost surely to F1(u)F^{-1}(u) for every such uu. By monotonicity, the convergence is uniform, hence

sup|pF(x0)|ε|Fn1(p)F1(p)|=o(1)a.s.\sup_{|p-F(x_{0})|\leqslant\varepsilon}|F_{n}^{-1}(p)-F^{-1}(p)|=o(1)\quad a.s.

as nn\to\infty.

Proof of Lemma 5.2. This proof is an adaptation to the localized case of the proof of Lemma 2 in [5]. Let fCCf^{\prime}_{C}\in\mathcal{F}^{\prime}_{C}, i.e., there exists ff\in\mathcal{F} such that fC(B)=f(y)MC(B,dy)f_{C}^{\prime}(B)=\int f(y)\,M_{C}(B,\mathrm{d}y). By the Cauchy–Schwarz inequality,

(f(y)MC(B,dy))2C(B)(f2MC(B,dy)),{\left({\int f(y){\mkern 1.0mu}{M_{C}}(B,dy)}\right)^{2}}\leqslant{\ell_{C}}\left(B\right)\left({\int{{f^{2}}{M_{C}}(B,dy)}}\right),

then

𝔼Q(fC2)𝔼Q(C(B)(f(y)2MC(B,dy)))=𝔼QC(f2)𝔼Q(C2),\mathbb{E}_{Q^{\prime}}(f^{\prime 2}_{C})\leqslant\mathbb{E}_{Q^{\prime}}\left(\ell_{C}(B)\left(\int f(y)^{2}\,M_{C}(B,dy)\right)\right)=\mathbb{E}_{Q_{C}}(f^{2})\mathbb{E}_{Q^{\prime}}(\ell_{C}^{2}),

where the last equality follows from (16). Applying this to the function

fC(B)fk(B)=(f(y)fk(y))MC(B,dy),\displaystyle f_{C}^{\prime}(B)-f_{k}^{\prime}(B)=\int(f(y)-f_{k}(y))\,M_{C}(B,dy),

where each fkf_{k} is the center of an ε\varepsilon-cover of the space \mathcal{F} and ffkL2(QC)ε\|f-f_{k}\|_{L_{2}(Q_{C})}\leqslant\varepsilon, gives the first assertion of the lemma. To obtain the second assertion, note that UC=UCU_{C}^{\prime}=U\ell_{C} is an envelope for C\mathcal{F}_{C}^{\prime}. In addition, we have that

UCL2(Q)=UCL2(Q).\displaystyle\|U_{C}^{\prime}\|_{L_{2}(Q^{\prime})}=U\|\ell_{C}\|_{L_{2}(Q^{\prime})}.

From this, we derive that, for every 0<ε<10<\varepsilon<1,

𝒩(εUCL2(Q),𝒰C,L2(Q))=𝒩(εUCL2(Q),𝒰,L2(Q)).\displaystyle\mathcal{N}(\varepsilon\|U_{C}^{\prime}\|_{L_{2}(Q^{\prime})},\,\mathcal{U}_{C}^{\prime},\,L_{2}(Q^{\prime}))=\mathcal{N}(\varepsilon U\|\ell_{C}\|_{L_{2}(Q^{\prime})},\,\mathcal{U}^{\prime},\,L_{2}(Q^{\prime})).

Then using the first assertion of the lemma, we obtain for every 0<ε<10<\varepsilon<1,

𝒩(εUCL2(Q),C,L2(Q))𝒩(εU,,L2(QC)),\displaystyle\mathcal{N}(\varepsilon\|U_{C}^{\prime}\|_{L_{2}(Q^{\prime})},\,\mathcal{F}_{C}^{\prime},\,L_{2}(Q^{\prime}))\leqslant\mathcal{N}(\varepsilon U,\,\mathcal{F},\,L_{2}(Q_{C})),

which implies the second assertion of the lemma whenever the class \mathcal{F} is VC with envelope UU.

Proof of Lemma 5.3. Let BE^B\in\widehat{E} and g:E×+g:E\times\mathbb{R}\to\mathbb{R}_{+}. For each yy\in\mathbb{R} we define gy(x)=g(x,y)g_{y}\left(x\right)=g\left(x,y\right), then, using the notation of section 6.2 we will have g^y(B)=xBCg(x,y)\widehat{g}_{y}\left(B\right)=\sum_{x\in B\cap C}g\left(x,y\right). Finally, for any function h:h:\mathbb{R}\to\mathbb{R}, we define

g~yh(B)=(gyh(y)^)(B)=xBC(g(x,y)h(y))=g^y(B)C(B)h(y).\widetilde{g}^{h}_{y}\left(B\right)=(\widehat{g_{y}-h(y)})(B)=\sum_{x\in B\cap C}\left(g\left(x,y\right)-h\left(y\right)\right)=\widehat{g}_{y}\left(B\right)-\ell_{C}\left(B\right)h\left(y\right).

Let g(x,y)=𝕀{xy}g\left(x,y\right)=\mathbb{I}\left\{x\leqslant y\right\}, and h=Fh=F as defined in (8). Then, g^y(B)=xB𝕀C{xy}\widehat{g}_{y}\left(B\right)=\sum_{x\in B}\mathbb{I}_{C}{\left\{x\leqslant y\right\}} and

\widetilde{g}^F_y(B)=\sum_{x\in B\cap C}\left(\mathbb{I}\{x\leqslant y\}-F(y)\right)=\widehat{g}_y(B)-\ell_C(B)\,F(y).
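As a toy illustration (the numerical values are hypothetical): if a block $\mathcal{B}_i$ visits $C$ at the three points $0.2$, $0.5$ and $0.9$, then for $y=0.6$ we get $\widehat{g}_y(\mathcal{B}_i)=2$, $\ell_C(\mathcal{B}_i)=3$ and $\widetilde{g}^F_y(\mathcal{B}_i)=2-3F(0.6)$. In words, each block contributes the number of its visits to $C$ falling below $y$, centered by $F(y)$ times its total number of visits to $C$.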

From now on, we drop the superscript from $\widetilde{g}^F_y$ to ease the notation.

By the definitions of $F_n$ and $F$ ((7) and (8)), we have

\begin{align*}
F_n(y)-F(y)&=\frac{1}{T_n(C)}\sum_{i=1}^{T_n(C)}\left(\mathbb{I}\{X_{\sigma_C(i)}\leqslant y\}-F(y)\right)\\
&=\frac{1}{T_n(C)}\sum_{i=0}^{n}\left(\mathbb{I}\{X_i\leqslant y,\,X_i\in C\}-\mathbb{I}\{X_i\in C\}\,F(y)\right)\\
&=\frac{1}{T_n(C)}\left(\widetilde{g}_y(\mathcal{B}_0)+\sum_{i=1}^{T(n)}\widetilde{g}_y(\mathcal{B}_i)+\widetilde{g}_y(\mathcal{B}_{(n)})\right),
\end{align*}

therefore,

\sqrt{T_n(C)}\bigl(F_n(y)-F(y)\bigr)=\frac{\widetilde{g}_y(\mathcal{B}_0)}{\sqrt{T_n(C)}}+\frac{\sum_{i=1}^{T(n)}\widetilde{g}_y(\mathcal{B}_i)}{\sqrt{T_n(C)}}+\frac{\widetilde{g}_y(\mathcal{B}_{(n)})}{\sqrt{T_n(C)}}.

Notice that $|\widetilde{g}_y(\mathcal{B}_0)|\leqslant 2\ell_C(\mathcal{B}_0)<+\infty$ and $T_n(C)\to+\infty$ almost surely; therefore, the first term in the last equation converges almost surely to 0, uniformly in $y$. For the last term, we have

\frac{\left|\widetilde{g}_y(\mathcal{B}_{(n)})\right|}{\sqrt{T_n(C)}}\leqslant\frac{2\ell_C(\mathcal{B}_{T(n)})}{\sqrt{T_n(C)}}=2\sqrt{\frac{T(n)}{T_n(C)}}\,\frac{\ell_C(\mathcal{B}_{T(n)})}{\sqrt{T(n)}}.

By (B4), the expectation of $\ell^2_C(\mathcal{B}_1)$ is finite, so Lemma 1 in [2] shows that $\ell^2_C(\mathcal{B}_n)/n\to 0$ a.s., which implies that $\ell_C(\mathcal{B}_n)/\sqrt{n}$ also converges to 0 a.s. Since $T(n)\to+\infty$ a.s., Theorem 6.8.1 in [19] yields $\ell_C(\mathcal{B}_{T(n)})/\sqrt{T(n)}\to 0$ almost surely. Combining this with the almost sure convergence of $T(n)/T_n(C)$ to a positive constant (see Theorem 1.1), we obtain that $|\widetilde{g}_y(\mathcal{B}_{(n)})|/\sqrt{T_n(C)}$ converges almost surely to 0, uniformly in $y$. Therefore,

\sqrt{T_n(C)}\bigl(F_n(y)-F(y)\bigr)=\frac{\sum_{i=1}^{T(n)}\widetilde{g}_y(\mathcal{B}_i)}{\sqrt{T(n)}}+o_P(1),\qquad(75)

where we have used the almost sure convergence of $T_n(C)/T(n)$ to a positive constant in order to replace $T_n(C)$ by $T(n)$ in the denominator.
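Before turning to the main bound, it may help to see the object $F_n$ concretely. The following is a minimal simulation sketch, not the paper's procedure: the Gaussian random walk, the choice $C=[-1,1]$, and the identification of the limit $F$ with the uniform distribution function on $C$ (suggested by the invariance of Lebesgue measure for the random walk) are illustrative assumptions only.

```python
import numpy as np

# Illustrative only: a Gaussian random walk is a null recurrent Harris chain,
# and C = [-1, 1] plays the role of the small set.
rng = np.random.default_rng(0)
n = 200_000
X = np.cumsum(rng.standard_normal(n))

low, high = -1.0, 1.0
visits = X[(X >= low) & (X <= high)]   # chain values at its visit times to C
print(f"T_n(C) = {visits.size}")       # number of visits to C up to time n

def F_n(y):
    """Localized empirical CDF over the visits to C, in the spirit of (7)."""
    return np.mean(visits <= y)

# Assumed limit for this toy chain: the uniform CDF (y+1)/2 on C.
for y in np.linspace(low, high, 5):
    print(f"y = {y:+.2f}:  F_n(y) = {F_n(y):.3f},  (y+1)/2 = {(y+1)/2:.3f}")
```

The point of the sketch is that $F_n$ uses only the $T_n(C)$ visits to the small set, which is exactly the normalization appearing in (75).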

Then, (17) will be proved if we show that, for $\varepsilon$ small enough,

\sup_{|y-x_0|\leqslant\varepsilon}\frac{\left|\sum_{i=1}^{T(n)}\widetilde{g}_y(\mathcal{B}_i)\right|}{\sqrt{T(n)}}=O_P(1).\qquad(76)

Fix $\eta>0$ arbitrarily. By Lemma 6.7 and Slutsky's theorem, we can find positive numbers $\underline{a}_\eta,\overline{a}_\eta$ and an integer $N_\eta$ such that $\mathbb{P}(\mathcal{E}_n)\geqslant 1-\frac{\eta}{2}$ for all $n\geqslant N_\eta$, where

\mathcal{E}_n=\left\{\underline{a}_\eta u(n)\leqslant T(n)\leqslant\overline{a}_\eta u(n)\right\}.\qquad(77)
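A word on how such constants arise. A sketch, under the assumption (consistent with the null recurrent framework of [24], but not verified against the precise statement of Lemma 6.7) that $T(n)/u(n)$ converges in distribution to a strictly positive random variable $M_\beta$ of Mittag-Leffler type: choosing continuity points $\underline{a}_\eta<\overline{a}_\eta$ of the limit law with

\mathbb{P}\left(\underline{a}_\eta\leqslant M_\beta\leqslant\overline{a}_\eta\right)\geqslant 1-\frac{\eta}{4},

the convergence in distribution transfers this bound to $T(n)/u(n)$ up to an additional $\frac{\eta}{4}$ for $n$ large, which gives $\mathbb{P}(\mathcal{E}_n)\geqslant 1-\frac{\eta}{2}$.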

Define $W_n(\varepsilon)=\sup_{|y-x_0|\leqslant\varepsilon}\left|\sum_{i=1}^{n}\widetilde{g}_y(\mathcal{B}_i)\right|$ and let $M_\eta$ be a fixed positive number. Then, for all $n\geqslant N_\eta$,

\begin{align*}
\mathbb{P}\left(\frac{1}{\sqrt{T(n)}}W_{T(n)}>M_\eta\right)&\leqslant\mathbb{P}\left(\left\{\frac{1}{\sqrt{T(n)}}W_{T(n)}>M_\eta\right\}\cap\mathcal{E}_n\right)+1-\mathbb{P}(\mathcal{E}_n)\\
&\leqslant\mathbb{P}\left(\left\{\frac{1}{\sqrt{T(n)}}W_{T(n)}>M_\eta\right\}\cap\mathcal{E}_n\right)+\frac{\eta}{2}.\qquad(78)
\end{align*}

On $\mathcal{E}_n$, $\underline{a}_\eta u(n)\leqslant T(n)\leqslant\overline{a}_\eta u(n)$; therefore, for all $n\geqslant N_\eta$,

\begin{align*}
\mathbb{P}\left(\left\{\frac{1}{\sqrt{T(n)}}W_{T(n)}>M_\eta\right\}\cap\mathcal{E}_n\right)&\leqslant\mathbb{P}\left(\left\{\frac{1}{\sqrt{\underline{a}_\eta u(n)}}\max_{1\leqslant k\leqslant\overline{a}_\eta u(n)}W_k>M_\eta\right\}\cap\mathcal{E}_n\right)\\
&\leqslant\mathbb{P}\left(\frac{1}{\sqrt{\underline{a}_\eta u(n)}}\max_{1\leqslant k\leqslant\overline{a}_\eta u(n)}W_k>M_\eta\right).\qquad(79)
\end{align*}

The random variables $\{\widetilde{g}_{(\cdot)}(\mathcal{B}_k)\}_{k=1}^{\overline{a}_\eta u(n)}$ are i.i.d.; therefore, by Montgomery–Smith's inequality (Lemma 4 in [1]), there exists a universal constant $K$ such that, for all $n\geqslant N_\eta$,

\begin{align*}
\mathbb{P}\left(\frac{1}{\sqrt{\underline{a}_\eta u(n)}}\max_{1\leqslant k\leqslant\overline{a}_\eta u(n)}W_k>M_\eta\right)&\leqslant K\,\mathbb{P}\left(\frac{1}{\sqrt{\underline{a}_\eta u(n)}}W_{\overline{a}_\eta u(n)}>\frac{M_\eta}{K}\right)\\
&\leqslant K\,\mathbb{P}\left(\frac{1}{\sqrt{\underline{a}_\eta u(n)}}\sup_{|y-x_0|\leqslant\varepsilon}\left|\sum_{i=1}^{\overline{a}_\eta u(n)}\widetilde{g}_y(\mathcal{B}_i)\right|>\frac{M_\eta}{K}\right).\qquad(80)
\end{align*}
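For reference, in its classical Banach-space form (with the explicit constants usually attributed to Montgomery-Smith's original paper; we state it here only as background, since [1] uses a single generic constant $K$), the maximal inequality reads: if $Y_1,\dots,Y_m$ are i.i.d. random elements with partial sums $S_k=\sum_{i\leqslant k}Y_i$, then

\mathbb{P}\left(\max_{1\leqslant k\leqslant m}\|S_k\|>t\right)\leqslant 9\,\mathbb{P}\left(\|S_m\|>\frac{t}{30}\right)\quad\text{for all }t>0.

It is applied above with $\|\cdot\|$ the supremum over $\{y:|y-x_0|\leqslant\varepsilon\}$, so that $W_k=\|S_k\|$.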

For an arbitrary set $T$, let $\ell^\infty(T)$ be the space of all uniformly bounded real functions on $T$, equipped with the uniform norm. Weak convergence to a tight process in this space is characterized by asymptotic tightness plus convergence of marginals (see Chapter 1.5 in [34]).

The class of functions $\mathcal{G}-F=\{g_y(\cdot)-F(y)\}_{y\in\mathbb{R}}$ is VC with constant envelope $2$; hence, by Lemma 5.2, the class $\widehat{\mathcal{G}-F}$ is also VC and has $2\ell_C$ as envelope. Moreover, $\mathbb{E}\ell^2_C(\mathcal{B}_1)$ is finite (by (B4)); therefore, by Theorem 2.5 in [25], $\widehat{\mathcal{G}-F}$ is Donsker. Then, the process $\frac{1}{\sqrt{\underline{a}_\eta u(n)}}\sum_{i=1}^{\overline{a}_\eta u(n)}\widetilde{g}_y(\mathcal{B}_i)$ converges weakly in $\ell^\infty[\widehat{\mathcal{G}-F}]$ to a tight process $Z$. The map $z\mapsto\|z\|_\infty$ from $\ell^\infty[\widehat{\mathcal{G}-F}]$ to $\mathbb{R}$ is continuous with respect to the supremum norm (cf. p. 278 of [33]); therefore, $\frac{1}{\sqrt{\underline{a}_\eta u(n)}}\sup_{|y-x_0|\leqslant\varepsilon}\left|\sum_{i=1}^{\overline{a}_\eta u(n)}\widetilde{g}_y(\mathcal{B}_i)\right|$ converges in distribution to $\sup_{|y-x_0|\leqslant\varepsilon}|Z(y)|$; hence, we can find $V_\eta$ and $N'_\eta$ such that

\mathbb{P}\left(\frac{1}{\sqrt{\underline{a}_\eta u(n)}}\sup_{|y-x_0|\leqslant\varepsilon}\left|\sum_{i=1}^{\overline{a}_\eta u(n)}\widetilde{g}_y(\mathcal{B}_i)\right|>V_\eta\right)<\frac{\eta}{2K},\quad\forall n>N'_\eta.\qquad(81)

Choosing $M_\eta=KV_\eta$ in (81) and combining (80), (79) and (78) completes the proof of (17).

Now we proceed to prove (18). Let $\eta>0$ be fixed. By (17) and Lemma 6.7, we can find $\varepsilon'$, $M'_\eta$ and $N'_\eta$ such that

\begin{align*}
\mathbb{P}\left(\sqrt{T_n(C)}\sup_{|y-x_0|\leqslant\varepsilon'}\left|F_n(y)-F(y)\right|>M'_\eta\right)&<\frac{\eta}{4}\quad\forall n\geqslant N'_\eta,\qquad(82)\\
\mathbb{P}(\mathcal{D}_n)&\geqslant 1-\frac{\eta}{2}\quad\forall n\geqslant N'_\eta,\qquad(83)
\end{align*}

where $\mathcal{D}_n=\left\{\underline{a}_\eta u(n)\leqslant T_n(C)\leqslant\overline{a}_\eta u(n)\right\}$. Define the sets

\begin{align*}
U_n&=\left\{\sqrt{T_n(C)}\sup_{|p-F(x_0)|\leqslant\varepsilon}\left|F_n^{-1}(p)-F^{-1}(p)\right|>M_\eta\right\},\\
U_n^1&=\left\{\exists p\in[F(x_0)-\varepsilon,F(x_0)+\varepsilon]:F_n^{-1}(p)-F^{-1}(p)>\frac{M_\eta}{\sqrt{T_n(C)}}\right\},\\
U_n^2&=\left\{\exists p\in[F(x_0)-\varepsilon,F(x_0)+\varepsilon]:F^{-1}(p)-F_n^{-1}(p)>\frac{M_\eta}{\sqrt{T_n(C)}}\right\},
\end{align*}

where $\varepsilon$ and $M_\eta$ are constants that will be specified later.
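Observe that $U_n=U_n^1\cup U_n^2$: indeed, $\sqrt{T_n(C)}\sup_{|p-F(x_0)|\leqslant\varepsilon}|F_n^{-1}(p)-F^{-1}(p)|>M_\eta$ precisely when at least one of the two one-sided exceedances defining $U_n^1$ and $U_n^2$ occurs. This identity is used at the end of the proof.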

On $U_n^1\cap\mathcal{D}_n$, $F_n^{-1}(p)>\frac{M_\eta}{\sqrt{T_n(C)}}+F^{-1}(p)\geqslant\frac{M_\eta}{\sqrt{\overline{a}_\eta u(n)}}+F^{-1}(p)$; hence,

\begin{align*}
F_n\left(\frac{M_\eta}{\sqrt{\overline{a}_\eta u(n)}}+F^{-1}(p)\right)&\leqslant F_n\left(F_n^{-1}(p)\right)\leqslant p+\frac{1}{T_n(C)}\\
&\leqslant p+\frac{1}{\underline{a}_\eta u(n)}.\qquad(84)
\end{align*}

Assumption (B2) ensures that $F$ is differentiable on $C$ with a derivative bounded away from zero there; let $K_1>0$ denote a lower bound for this derivative on $C$. Then, the Mean Value Theorem implies that

\begin{align*}
p&=F\left(F^{-1}(p)\right)=F\left(\frac{M_\eta}{\sqrt{\overline{a}_\eta u(n)}}+F^{-1}(p)\right)-\frac{F'(\theta_p)\,M_\eta}{\sqrt{\overline{a}_\eta u(n)}}\\
&\leqslant F\left(\frac{M_\eta}{\sqrt{\overline{a}_\eta u(n)}}+F^{-1}(p)\right)-\frac{K_1M_\eta}{\sqrt{\overline{a}_\eta u(n)}}.
\end{align*}

Plugging this into (84), we get

F\left(\frac{M_\eta}{\sqrt{\overline{a}_\eta u(n)}}+F^{-1}(p)\right)-F_n\left(\frac{M_\eta}{\sqrt{\overline{a}_\eta u(n)}}+F^{-1}(p)\right)\geqslant\frac{K_1M_\eta}{\sqrt{\overline{a}_\eta u(n)}}-\frac{1}{\underline{a}_\eta u(n)}.

Because $u(n)\to+\infty$, we can find $N_1$ such that $\sqrt{\frac{\overline{a}_\eta}{u(n)}}\frac{1}{\underline{a}_\eta K_1}<1$ for all $n\geqslant N_1$. Taking $M_\eta$ bigger than $\frac{M'_\eta}{K_1}\sqrt{\frac{\overline{a}_\eta}{\underline{a}_\eta}}+1$ and using that $T_n(C)\geqslant\underline{a}_\eta u(n)$ on $\mathcal{D}_n$, we obtain, for all $n\geqslant N_1$,

F\left(\frac{M_\eta}{\sqrt{\overline{a}_\eta u(n)}}+F^{-1}(p)\right)-F_n\left(\frac{M_\eta}{\sqrt{\overline{a}_\eta u(n)}}+F^{-1}(p)\right)>\frac{M'_\eta}{\sqrt{T_n(C)}}.\qquad(85)
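For the reader's convenience, here is the arithmetic behind this choice of $M_\eta$. On $\mathcal{D}_n$ one has $\frac{M'_\eta}{\sqrt{T_n(C)}}\leqslant\frac{M'_\eta}{\sqrt{\underline{a}_\eta u(n)}}$, so (85) follows as soon as

\frac{K_1M_\eta}{\sqrt{\overline{a}_\eta u(n)}}-\frac{1}{\underline{a}_\eta u(n)}>\frac{M'_\eta}{\sqrt{\underline{a}_\eta u(n)}},

which, after multiplication by $\sqrt{\overline{a}_\eta u(n)}/K_1$, amounts to

M_\eta>\frac{M'_\eta}{K_1}\sqrt{\frac{\overline{a}_\eta}{\underline{a}_\eta}}+\frac{1}{K_1\underline{a}_\eta}\sqrt{\frac{\overline{a}_\eta}{u(n)}},

and the last term is smaller than $1$ for $n\geqslant N_1$ by the choice of $N_1$.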

Let $N_{2,\eta}$ be such that $\frac{M_\eta}{\sqrt{\overline{a}_\eta u(n)}}<\frac{\varepsilon'}{2}$ for all $n\geqslant N_{2,\eta}$. By the continuity of $F^{-1}$ at $F(x_0)$, there exists $\varepsilon>0$ such that $|F^{-1}(p)-x_0|\leqslant\frac{\varepsilon'}{2}$ for all $p\in[F(x_0)-\varepsilon,F(x_0)+\varepsilon]$; therefore, the triangle inequality implies that $\frac{M_\eta}{\sqrt{\overline{a}_\eta u(n)}}+F^{-1}(p)$ lies in the interval $[x_0-\varepsilon',x_0+\varepsilon']$ for all $n\geqslant N_\eta=\max(N_1,N_{2,\eta})$. This, alongside (85), shows that, for all $n\geqslant N_\eta$,

\begin{align*}
U_n^1\cap\mathcal{D}_n&\subseteq\left\{\exists y\in[x_0-\varepsilon',x_0+\varepsilon']:F(y)-F_n(y)>\frac{M'_\eta}{\sqrt{T_n(C)}}\right\}\\
&\subseteq\left\{\sqrt{T_n(C)}\sup_{|y-x_0|\leqslant\varepsilon'}\left|F_n(y)-F(y)\right|>M'_\eta\right\}.
\end{align*}

By a similar argument, it can be shown that

U_n^2\cap\mathcal{D}_n\subseteq\left\{\exists y\in[x_0-\varepsilon',x_0+\varepsilon']:F_n(y)-F(y)>\frac{M'_\eta}{\sqrt{T_n(C)}}\right\}\quad\forall n\geqslant N_\eta.

Using (82) and $U_n=U_n^1\cup U_n^2$, we obtain that $\mathbb{P}(U_n\cap\mathcal{D}_n)\leqslant\frac{\eta}{2}$ for all $n\geqslant N_\eta$. Equation (18) now follows from (83).

References

  • [1] Adamczak, R. (2008). A tail inequality for suprema of unbounded empirical processes with applications to Markov chains. Electronic Journal of Probability 13, 1000–1034.
  • [2] Athreya, K. B. and Roy, V. (2016). General Glivenko–Cantelli theorems. Stat 5, 306–311.
  • [3] Barlow, R. E., Bartholomew, D. J., Bremner, J. M. and Brunk, H. D. (1972). Statistical Inference under Order Restrictions: The Theory and Application of Isotonic Regression. Wiley, New York.
  • [4] Bertail, P., Durot, C. and Fernández, C. (2023). Supplement to "Harris recurrent Markov chains and nonlinear monotone cointegrated models." doi:10.1214/[provided by typesetter]
  • [5] Bertail, P. and Portier, F. (2019). Rademacher complexity for Markov chains: Applications to kernel smoothing and Metropolis–Hastings. Bernoulli 25, 3912–3938.
  • [6] Bingham, N. H., Goldie, C. M. and Teugels, J. L. (1987). Regular Variation. Encyclopedia of Mathematics and its Applications 27. Cambridge University Press.
  • [7] Cai, B. and Tjøstheim, D. (2015). Nonparametric regression estimation for multivariate null recurrent processes. Econometrics 3, 265–288.
  • [8] Cai, Z. and Ould-Saïd, E. (2003). Local M-estimator for nonparametric time series. Statistics and Probability Letters 65, 433–449.
  • [9] Chen, X. (1999). How often does a Harris recurrent Markov chain recur? The Annals of Probability 27.
  • [10] Chen, X. (2000). On the limit laws of the second order for additive functionals of Harris recurrent Markov chains. Probability Theory and Related Fields 116.
  • [11] Deaton, A. and Muellbauer, J. (1980). Economics and Consumer Behavior. Cambridge University Press.
  • [12] Douc, R., Moulines, E., Priouret, P. and Soulier, P. (2018). Markov Chains. Springer Series in Operations Research and Financial Engineering. Springer.
  • [13] Durot, C. and Tocquet, A.-S. (2003). On the distance between the empirical process and its concave majorant in a monotone regression framework. Annales de l'Institut Henri Poincaré (B) Probability and Statistics 39, 217–240.
  • [14] Engle, R. F. and Granger, C. W. J. (1987). Co-integration and error correction: Representation, estimation, and testing. Econometrica 55, 251–276.
  • [15] Feigin, P. D. and Tweedie, R. L. (1985). Random coefficient autoregressive processes: A Markov chain analysis of stationarity and finiteness of moments. Journal of Time Series Analysis 6.
  • [16] Granger, C. W. J. (1981). Some properties of time series data and their use in econometric model specification. Journal of Econometrics 16, 121–130.
  • [17] Granger, C. and Teräsvirta, T. (1993). Modelling Non-Linear Economic Relationships. Oxford University Press.
  • [18] Groeneboom, P. and Jongbloed, G. (2014). Nonparametric Estimation under Shape Constraints: Estimators, Algorithms and Asymptotics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.
  • [19] Gut, A. (2013). Probability: A Graduate Course, 2nd ed. Springer Texts in Statistics. Springer.
  • [20] Hansen, B. E. and Seo, B. (2002). Testing for two-regime threshold cointegration in vector error-correction models. Journal of Econometrics 110, 293–318.
  • [21] Johansen, S. (1988). Statistical analysis of cointegrating vectors. Journal of Economic Dynamics and Control 12, 231.
  • [22] Johansen, S. (1991). Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59, 1551.
  • [23] Karlsen, H. A., Myklebust, T. and Tjøstheim, D. (2007). Nonparametric estimation in a nonlinear cointegration type model. The Annals of Statistics 35, 252–299.
  • [24] Karlsen, H. A. and Tjøstheim, D. (2001). Nonparametric estimation in null recurrent time series. The Annals of Statistics 29.
  • [25] Kosorok, M. R. (2008). Introduction to Empirical Processes and Semiparametric Inference. Springer Series in Statistics. Springer.
  • [26] Lin, Z., Li, D. and Chen, J. (2009). Local linear M-estimators in null recurrent time series. Statistica Sinica 19, 1683–1703.
  • [27] Meyn, S., Tweedie, R. and Glynn, P. (2009). Markov Chains and Stochastic Stability, 2nd ed. Cambridge Mathematical Library. Cambridge University Press.
  • [28] Nummelin, E. (1978). A splitting technique for Harris recurrent Markov chains. Probability Theory and Related Fields 43.
  • [29] Nummelin, E. (1984). General Irreducible Markov Chains and Non-Negative Operators. Cambridge Tracts in Mathematics 83. Cambridge University Press.
  • [30] Phillips, P. C. B. (1991). Optimal inference in cointegrated systems. Econometrica 59, 283.
  • [31] Phillips, P. C. B. and Solo, V. (1992). Asymptotics for linear processes. The Annals of Statistics 20, 971.
  • [32] Stigler, M. (2020). Threshold cointegration: overview and implementation in R. Handbook of Statistics 42, 229–264.
  • [33] van der Vaart, A. W. (2000). Asymptotic Statistics. Cambridge University Press.
  • [34] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer.
  • [35] Wang, Q. (2015). Limit Theorems for Nonlinear Cointegrating Regression. World Scientific.
  • [36] Wang, Q. and Phillips, P. C. B. (2009). Asymptotic theory for local time density estimation and nonparametric cointegrating regression. Econometric Theory 25, 710–738.
  • [37] Wang, Q. and Phillips, P. C. B. (2009). Structural nonparametric cointegrating regression. Econometrica 77, 1901–1948.