Harris recurrent Markov chains and nonlinear monotone cointegrated models
Abstract
In this paper, we study a nonlinear cointegration-type model of the form , where is a monotone function and is a Harris recurrent Markov chain. We use a nonparametric least squares estimator to estimate locally, and under mild conditions, we show its strong consistency and obtain its rate of convergence. New results (of the Glivenko-Cantelli type) for localized null recurrent Markov chains are also proved.
1 Introduction and motivations
1.1 Linear and nonlinear cointegration models
Linear cointegration, introduced by [16] and developed by [14] and [21, 22], is a concept used in statistics and econometrics to describe a long-term relationship between two or more time series. In general, these time series are non-stationary and integrated of order 1, that is, they behave roughly as random walks. In traditional linear cointegration analysis, variables are assumed to have a linear relationship, which means their long-term equilibrium, as time grows, is characterized by a constant linear combination. This concept has since been extensively studied, particularly in the field of econometrics [14, 30, 31, 21, 22]. Notice that, when there is indeed a significant linear relationship, the link is monotone in each of the variables.
However, in some cases, the relationship between variables may exhibit nonlinear behavior, which cannot be adequately captured by linear cointegration models. The incorporation of nonlinearities allows for a more comprehensive understanding of long-term relationships between variables. [20] developed an approach for analyzing nonlinear cointegration through threshold cointegration models. These models assume that the linear relationship between variables changes at certain changepoints, leading to different long-run equilibrium states (for instance, according to some latent regimes). Threshold cointegration models provide a framework for capturing nonlinearities in the data and estimating the changepoints. Refer to [32] for examples and discussions on the importance of introducing nonlinearities in cointegration applications, and for further references.
Another method for analyzing nonlinear cointegration is through smooth transition cointegration models, introduced by [17]. These models assume an ECM (Error Correction Model) form and allow for smooth transitions between different regimes in the data. Most estimators of nonlinear cointegration may be seen as Nadaraya-Watson estimators of the link function. For instance, Wang and Phillips [36, 37, 35] show that it is possible to estimate and perform asymptotic inference in specific nonparametric cointegration regression models using kernel regression techniques. Furthermore, they establish that the self-normalized kernel regression estimators converge to a standard normal limit, even when the explanatory variable is integrated. These findings indicate that the estimators can consistently capture the underlying relationship between variables, even when the explanatory variable exhibits non-stationary behavior. The problem of estimating under Markovian assumptions has also been tackled with local linear M-type estimators based on smoothing techniques in [8, 26].
These results have been partially extended to the framework of general -recurrent Markov chains, and not just integrated time series, by [23, 7]. Consider the simple framework where we observe two Markov chains, and . [23] are essentially interested in the study of nonlinear cointegrated models such as
(1) |
where is a nonlinear function. and are independent processes, and is a positive or -null recurrent Markov chain. Despite the fact that there is no stationary probability measure for , they apply the Nadaraya-Watson method to estimate and establish the asymptotic theory of the proposed estimator. The rate of convergence of the estimators at some point is essentially linked to the local properties of the -null recurrent chain, and is typically of the order of the square root of the number of visits of the chain to a neighborhood of the point .
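The kernel approach above can be sketched in a few lines. The following Python snippet is purely illustrative: the Gaussian kernel, the bandwidth, the toy random-walk data and the function name are all assumptions, and this is not the estimator studied in this paper.

```python
import numpy as np

def nadaraya_watson(x0, X, Y, h):
    """Kernel regression estimate of the link function at x0.

    X, Y: observed covariate and response series; h: bandwidth.
    Gaussian kernel, chosen only for illustration.
    """
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)  # kernel weights
    return np.sum(w * Y) / np.sum(w)

# Toy model Y_t = m(X_t) + e_t with monotone link m(x) = -x and a
# random-walk (hence null recurrent) covariate.
rng = np.random.default_rng(0)
X = np.cumsum(rng.normal(size=5000))
Y = -X + rng.normal(size=5000)
est = nadaraya_watson(0.0, X, Y, h=0.5)  # should be close to m(0) = 0
```

The precision of such an estimate at a point is driven by the number of visits of the covariate to a neighborhood of that point, as discussed above.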
1.2 Monotone cointegration models: motivations
Monotonicity in cointegration is a rather natural assumption in many economic applications, for instance for modeling demand as a function of income or prices (see for instance [11]) or other variables. Suppose, for example, that we are interested in analyzing the long-term relationship between ice cream sales and the average monthly temperature. These two non-stationary variables may be modeled by -recurrent Markov chains. We hypothesize that as the average monthly temperature increases, the demand for ice cream also increases; however, the rate of increase may vary according to the season. In that case, the nonlinear relationship between the two variables will be monotone. In microeconomics, the same phenomenon is expected for Engel curves, describing how real expenditure varies with household income (see [11]). Expenditures and income (or their logs in most models) may be considered as non-stationary variables. However, considering a linear cointegration between them may be misleading, since the relationship may change along the life cycle. By Engel's law, the relationship between the two variables should be monotone. Other examples of monotone nonlinear cointegration phenomena may be found in [32].
The purpose of this paper is to propose a simple estimator that is automatically monotone, does not require strong smoothness assumptions (we only require continuity of the link function), and operates under general Markovian assumptions. We establish a nonparametric estimation theory for the least squares estimator (LSE) of the function in model (1) under the constraint that is monotone non-increasing. Here, is an unobserved process such that , to ensure identifiability of . Since a minimal condition for undertaking asymptotic analysis of at a given point is that, as the number of observations on increases, there are infinitely many observations in the neighborhood of , the process will be assumed to be a Harris recurrent Markov chain (cf. Section 2). We treat the stationary and the null recurrent non-stationary frameworks at the same time. To our knowledge, this is the first time such an estimator has been proposed in so general a framework.
1.3 The estimator
Let be a set whose interior contains our point of interest . Having observed , we denote by the number of times that X visited up to time and by the time of the -th visit. Then, we consider the nonparametric LSE defined as the minimizer of
(2) |
over the set of non-increasing functions on . The nonparametric LSE has a well-known characterization, as follows. Let be the number of unique values of , and let be the corresponding order statistics. Then, is the left-hand slope at of the least concave majorant of the set of points
(3) |
and it can be computed using simple algorithms, as discussed in [3]. Thus, the constrained LSE is uniquely defined at the observation points; however, it is not uniquely defined between these points: any monotone interpolation of these values is a constrained LSE. As is customary, we consider in the sequel the piecewise-constant and left-continuous LSE that is constant on every interval , and also on and on .
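Concretely, the left-hand slopes of the least concave majorant of the cumulative sum diagram (3) can be computed by the pool-adjacent-violators algorithm (PAVA). The sketch below (Python; the unweighted design and the function name are assumptions, not the paper's code) returns the non-increasing least squares fit at the sorted design points.

```python
import numpy as np

def monotone_lse(x, y):
    """Non-increasing least squares fit at the sorted design points,
    via pool-adjacent-violators: maintain blocks of (mean, weight) and
    merge adjacent blocks while the non-increasing constraint is
    violated.  Equivalent to taking left-hand slopes of the least
    concave majorant of the cumulative sum diagram."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    means, weights = [], []
    for v in ys:
        means.append(float(v))
        weights.append(1)
        # merge while the previous block is smaller (a violation)
        while len(means) > 1 and means[-2] < means[-1]:
            w = weights[-2] + weights[-1]
            m = (means[-2] * weights[-2] + means[-1] * weights[-1]) / w
            means[-2:] = [m]
            weights[-2:] = [w]
    return xs, np.repeat(means, weights)

xs, fit = monotone_lse(np.array([1.0, 2.0, 3.0]), np.array([1.0, 3.0, 2.0]))
# the violating pair (1, 3) is pooled, giving the constant fit 2
```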
The use of a localized estimator is due to the fact that we need to control the behavior of the chain around , and, to do this, we need to estimate the asymptotic "distribution" of X in a vicinity of . For Harris recurrent Markov chains, the long-term behavior of the chain is given by its invariant measure (see Section 2). In the positive recurrent case, the invariant measure is finite, and it can be estimated by simply considering the empirical cumulative distribution function of the . However, in the null recurrent case, the invariant measure is only -finite; hence, we need to localize our analysis in a set big enough that the chain visits it infinitely often, but small enough that the restriction of the invariant measure to it is finite. Moreover, contrary to the bandwidth in kernel-type estimators, does not depend on , and the rate of convergence of the estimator does not depend on .
1.4 Outline
Since our paper draws quite heavily on the theory of Harris recurrent Markov chains, we have added a small introduction to the subject as well as the main results that we use throughout the paper in Section 2. In Section 3, we show that under very general assumptions, our estimator is strongly consistent, while its rate of convergence is presented in Section 4. In Section 5, we present new results concerning the localized empirical process of Harris recurrent Markov chains that have emerged during our investigation and we believe are interesting in their own right. Section 6 contains the proofs of our main results.
2 Markov chain theory and notation
In this section, we present the notation and main results related to Markov chains that are needed to present our main results. For further details, we refer the reader to the first section of the Supplementary Material [4] and the books [29, 27, 12].
Consider a time-homogeneous irreducible Markov Chain, denoted as , defined on a probability space , where is countably generated. The irreducibility measure of the chain is represented by . The transition kernel of the chain is denoted as and its initial distribution is represented by . If the initial measure of the chain is specified, we use (and ) to denote the probability (and the expectation) conditioned on the law of the initial state .
For any set , we will denote by and , respectively, the times of first visit and first return of the chain to the set , i.e. and . The subsequent visit and return times , are defined inductively.
Given that our methods will only deal with the values of X in a fixed set , if is a measurable set, we will write instead of , and if , we will simply write . We will use to denote the random variable that counts the number of times the chain has visited the set up to time , that is, . Similarly, we will write for the total number of visits of the chain X to . The set is called recurrent if for all , and the chain X is recurrent if every set such that is recurrent. A recurrent chain is called Harris recurrent if for all and all with , we have .
Denote by the class of nonnegative measurable functions with positive support. A function is called small if there exists an integer and a measure such that
(4) |
When a chain possesses a small function , we say that it satisfies the minorization inequality . A set is said to be small if the function is small. Similarly, a measure is small if there exist , and that satisfy (4). By Theorem 2.1 in [29], every irreducible Markov chain possesses a small function, and Proposition 2.6 of the same book shows that every measurable set with contains a small set. In practice, finding such a set most often amounts to exhibiting an accessible set for which the probability that the chain returns to it in steps is uniformly bounded below. Moreover, under quite general conditions, a compact set will be small; see [15].
An irreducible chain possesses an accessible atom if there is a set such that for all in : and . When an accessible atom exists, the stochastic stability properties of X amount to properties concerning the speed of return to the atom only. Moreover, it follows from the strong Markov property that the sample paths may be divided into independent blocks of random length corresponding to consecutive visits to . The sequence defines successive times at which the chain forgets its past, called regeneration times. Similarly, the sequence of i.i.d. blocks are named regeneration blocks. The random variable counts the number of i.i.d. blocks up to time , and is called the number of regenerations up to time .
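The division of the path into i.i.d. regeneration blocks is easy to visualize on a toy atomic chain. In the sketch below (Python; the chain on the nonnegative integers with atom {0} is an assumption made for illustration), the trajectory is cut at consecutive visits to the atom.

```python
import numpy as np

def regeneration_blocks(path, atom=0):
    """Split a trajectory into the blocks between consecutive visits
    to the atom; by the strong Markov property these are i.i.d."""
    hits = [i for i, x in enumerate(path) if x == atom]
    return [path[hits[k]:hits[k + 1]] for k in range(len(hits) - 1)]

# Toy positive recurrent chain: from state k, move to k + 1 with
# probability 1/2, otherwise return to the atom {0}.
rng = np.random.default_rng(1)
path, state = [0], 0
for _ in range(10_000):
    state = state + 1 if rng.random() < 0.5 else 0
    path.append(state)

blocks = regeneration_blocks(path)  # number of complete blocks = T(n)
```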
If X does not possess an atom but is Harris recurrent (and therefore satisfies a minorization inequality ), a splitting technique, introduced in [28, 29], allows us to extend in some sense the probabilistic structure of X in order to artificially construct an atomic chain (named the split chain and denoted by ) that inherits the communication and stochastic stability properties from X. One of the main results derived from this construction is the fact that every Harris recurrent Markov chain admits a unique (up to multiplicative constant) invariant measure (see Proposition 10.4.2 in [27]), that is, a measure such that
The invariant measure is finite if and only if ; in this case, we say the chain is positive recurrent; otherwise, we say the chain is null recurrent. A null recurrent chain is called -null recurrent (cf. Definition 3.2 in [24]) if there exists a small nonnegative function , a probability measure , a constant and a slowly varying function such that
As argued in [24], it is not too severe a restriction to assume . Therefore, throughout this paper we assume that X satisfies the minorization inequality , i.e., there exist a measurable function and a probability measure such that , and
(5) |
Remark 2.1.
The extensions of the results presented in this paper to the case where can be carried out (although they involve some complicated notation/proofs) using the -skeleton or the resolvent chain, as described in [9, 10] and Chapter 17 of [27]. However, they are not treated in this paper.
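For a chain on a finite state space, a pair satisfying the one-step version of the minorization inequality (5) can be exhibited directly. The sketch below (Python; the 3-state transition matrix is an assumption chosen for illustration) takes the minorizing measure proportional to the columnwise minima of the kernel.

```python
import numpy as np

# Toy 3-state transition matrix (rows sum to one).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])

nu = P.min(axis=0)            # columnwise minima of the kernel ...
nu = nu / nu.sum()            # ... normalized into a probability measure
s = (P / nu).min(axis=1)      # largest s(x) with P(x, .) >= s(x) * nu(.)
minorized = np.all(P >= np.outer(s, nu) - 1e-12)
```

Here the whole (finite) state space is small, as expected for an irreducible finite chain.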
The following theorem is a compendium of the main properties of Harris recurrent Markov chains that will be used throughout the paper. Among other things, it shows that the asymptotic behavior of is similar to that of the function defined as
(6) |
Theorem 2.1.
Suppose X is a Harris recurrent, irreducible Markov chain, with initial measure , that satisfies the minorization condition (5). Let be the number of complete regenerations until time of the split chain , let be a small set and be an invariant measure for X. Then,
1. .
2. converges almost surely to a positive constant.
3. converges almost surely to a positive constant if X is positive recurrent, and converges in distribution to a Mittag-Leffler random variable with index if X is -null recurrent. (The Mittag-Leffler distribution with index is a non-negative continuous distribution whose moments are given by ; see (3.39) in [24] for more details.)
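The contrast between the two regimes in part 3 can be seen in a simulation. For a Gaussian random walk, which is 1/2-null recurrent, the number of visits to a fixed compact set grows roughly like the square root of the sample size rather than linearly (Python sketch; the set [-1, 1], the sample sizes and the replication count are assumptions made for illustration).

```python
import numpy as np

rng = np.random.default_rng(2)

def visits(n, low=-1.0, high=1.0):
    """Number of visits of a length-n Gaussian random walk to [low, high]."""
    walk = np.cumsum(rng.normal(size=n))
    return np.sum((walk >= low) & (walk <= high))

def mean_visits(n, reps=20):
    return np.mean([visits(n) for _ in range(reps)])

m_small, m_large = mean_visits(10_000), mean_visits(160_000)
# With 16 times more data, the average visit count should grow only by
# a factor of roughly 4, consistent with sqrt(n) growth.
```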
3 Consistency
The aim of this section is to show that, for an arbitrary in the support of , the LSE is consistent. We make the following assumptions on the processes and .
(A1) X is a Harris recurrent Markov chain whose kernel satisfies the minorization condition (5).

Let be the sigma-algebra generated by the chain X up to time .

(A2) For each , the random variables are conditionally independent given , and for some .
It follows from Assumption (A1) that the Markov Chain X admits a unique (up to a multiplicative constant) -finite invariant measure . Let be a set such that and . We denote by the process defined by
(7) |
for all , which is a localized version of the empirical distribution function of the ’s. It is proved in Lemma 5.1 that converges almost surely to the distribution function supported on and defined by
(8) |
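A direct implementation of the localized empirical distribution function in (7) is straightforward when the localizing set is an interval (Python sketch; the interval form of the set and the function name are assumptions made for illustration).

```python
import numpy as np

def localized_ecdf(X, low, high):
    """Empirical CDF built only from the observations falling in
    C = [low, high], normalized by the number of visits to C
    (not by the full sample size), mirroring (7)."""
    visits = np.sort(X[(X >= low) & (X <= high)])
    def F(t):
        return np.searchsorted(visits, t, side="right") / len(visits)
    return F

F = localized_ecdf(np.array([-3.0, -0.5, 0.2, 5.0, 0.9, -0.1]), -1.0, 1.0)
# four of the six observations fall in [-1, 1]; F puts mass 1/4 on each
```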
Our next two assumptions guarantee that there is a compact , which is a small set and contains as an interior point. Sets like this can be found under very general conditions (cf. [15]).
(A3) There is such that the set is small.

(A4) belongs to the interior of the support of .
Notice that by part 1 of Theorem 2.1, (A3) guarantees that is finite and positive, and hence, is properly defined.
In addition to the assumptions on the processes and , we need smoothness assumptions on and on . In particular, we will assume that and are continuous and strictly monotone in . This implies that and are invertible in , so we can find neighborhoods of and respectively, over which the inverse functions are uniquely defined. We denote by and respectively the inverses of and over such a neighborhood of and respectively. The function is assumed to be monotone on its whole support.
(A5) is locally continuous and strictly increasing, in the sense that for all in and all , there exists such that for all such that .

(A6) is non-increasing, and is locally strictly decreasing, in the sense that for all in and all , there exists such that for all such that .

(A7) is continuous in .
4 Rates of convergence
To compute rates of convergence, we need stronger assumptions than for consistency. We replace assumption (A1) with the following stronger version.
(B1) is a positive or -null recurrent, aperiodic and irreducible Markov chain whose kernel satisfies the minorization condition (5).

(B2) The function is non-increasing, the functions and are differentiable in , and the derivatives and are bounded, in absolute value, above and away from zero in .
Let be the initial measure of X. Our next hypothesis imposes some control on the behavior of the chain in the first regenerative block.
(B3) There exists a constant and a neighborhood of 0 such that

Assumption (B3) is satisfied if the initial measure of the chain is the small measure used in the construction of the split chain (see Equation (4.16c) in [29]). In the positive recurrent case, taking equal to the unique invariant probability measure of the chain also satisfies (B3).
Finally, we need to control the number of times the chain visits in a regeneration block.

(B4) has finite second moment.
Theorem 4.1.
The rate comes from Lemmas 5.3 and 6.7, and, as can be seen from Theorem 2.1, it is a deterministic approximation of . Note that in the positive recurrent case, ; hence, we obtain the same rate as in the i.i.d. case [18, Chapter 2]. In the -null recurrent case, however, the rate of convergence is , which is slower than the usual rate. This is due to the null recurrence of the chain: it takes longer for the process to return to a neighborhood of the point , and it is the points in this neighborhood that are used in nonparametric estimation.
5 Localized Markov chains
Given the localized nature of our approach, in this section, we present some results that are particularly useful in this scenario. These results are well known for positive recurrent chains but are new in the null recurrent case. The detailed proofs of these results can be found in Section 2 of the Supplementary material [4].
The first result can be viewed as an extension of the Glivenko-Cantelli theorem to the localized scenario.
Lemma 5.1.
Our next result (Lemma 5.2), which is an extension of Lemma 2 in [5] to the localized -null recurrent case, deals with the properties of classes of functions defined over the regeneration blocks. Before presenting the result, we need some machinery.
Recall that denotes the state space of . Define (i.e. the set of finite subsets of ) and let the localized occupation measure be given by
The function that gives the size of the localized blocks is
Let denote the smallest -algebra formed by the elements of the -algebras , , where stands for the classical product -algebra. Let denote a probability measure on . If is a random variable with distribution , then is a random measure, i.e., is a (counting) measure on , almost surely, and for every , is a measurable random variable (valued in ). Henceforth is a random variable and, provided that , the map , defined by
(14) |
is a probability measure on . The notation stands for the expectation with respect to the underlying measure . Introduce the following notations: for any function , let be given by
(15) |
and for any class of real-valued functions defined on , denote the localized version of the sums on the blocks by .
Notice that, for any function ,
(16) |
Lemma 5.2.
Let be a probability measure on such that and be a class of measurable real-valued functions defined on . Then we have, for every ,
where is given in (14). Moreover, if belongs to the Vapnik–Chervonenkis (VC) class of functions with constant envelope and characteristic , then is VC with envelope and characteristic .
Remark 5.1.
For a probability measure , and a class of functions , the covering number is the minimum number of -balls needed to cover . For more details about this concept and the VC class of functions, see [25].
To put Lemma 5.2 into perspective, consider a class of bounded functions that is VC with finite envelope. Lemma 5.2 tells us that the class of unbounded functions is also VC. If, moreover, (B4) holds, then Theorem 2.5 in [25] tells us that is a Donsker class. Reasoning like this is used in the proof of the following result, which is a stronger version of Lemma 5.1 under assumptions (B1) and (B2), and is of interest in its own right.
6 Proofs
In this section, we provide the proof of Theorems 3.1 and 4.1. These proofs make use of several intermediate lemmas, whose proofs can be found in Sections 7 and 8.
6.1 Proof of Theorem 3.1
Recall that we consider the piecewise constant and left-continuous LSE , that is constant on every interval , and also on and on . With fixed, we denote by the number of times the Markov Chain X visits the set until time :
(19) |
Let for all and .
Our aim is to provide a characterization of . Recall from (7) that the localized empirical distribution function is defined as
for . is 0 on , so, with an arbitrary random variable we have for all . Let be the set
(20) |
and let be the continuous piecewise-linear process on with knots at the points in and values
(21) |
at the knots. The characterization of in Lemma 6.2 involves the least concave majorant of . Note that we use as a normalization in the definitions of the processes and since this choice ensures that and converge to fixed functions, see Lemma 5.1.
Lemma 6.1.
For all ,
where,
(22) |
and is a piecewise-linear process with knots at for such that
Moreover, can be written as
(23) |
where,
(24)
(25) |
and is such that .
In the next lemma, we give an alternative characterization of the monotone nonparametric LSE at the observation points .
Lemma 6.2.
Let for some fixed . Let be the left-hand slope of the least concave majorant of . Then,
(26) |
with probability 1 for large enough.
We consider below the generalized inverse function of since it has a more tractable characterization than itself. To this end, let us define precisely the generalized inverses of all processes of interest. Since is a non-increasing left-continuous step function on that can have jumps only at the points , , we define its generalized inverse , for , as the greatest that satisfies , with the convention that the supremum of an empty set is . Then for every and , one has
(27) |
Likewise, since is a left-continuous non-increasing step function on that can have jumps only at the observation times , we define the generalized inverse , for , as the greatest that satisfies , with the convention that the supremum of an empty set is . We then have
(28) |
for all and . On the other hand, since is a right-continuous non-decreasing step function on with range , we define the generalized inverse , for , as the smallest which satisfies . Note that the infimum is achieved for all . We then have
(29) |
for all and , and thanks to Lemma 6.2 we have
(30) |
on . Moreover, one can check that
(31) |
where argmax denotes the greatest location of maximum (which is achieved on the set in (20)). Thus, the inverse process is a location process that is more tractable than and themselves. A key idea in the following proofs is to derive properties of from its argmax characterization (31), then, to translate these properties to thanks to (30), and finally to translate them to thanks to (28).
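The switch relation that makes these inverses tractable can be checked numerically on a toy non-increasing step function. The discrete sketch below (Python; the grid, the step function and the convention at the left endpoint are assumptions made for illustration) verifies that f(t) >= a holds exactly when t lies to the left of the generalized inverse.

```python
import numpy as np

def ginv(f, a, grid):
    """Discrete generalized inverse of a non-increasing f: the greatest
    grid point t with f(t) >= a (left end of the grid if none)."""
    ok = [t for t in grid if f(t) >= a]
    return max(ok) if ok else grid[0]

def f(t):
    # a non-increasing, left-continuous step function
    return 3.0 if t <= 0 else (1.0 if t <= 2 else 0.0)

grid = np.linspace(-1.0, 3.0, 41)
switch = all((f(t) >= a) == (t <= ginv(f, a, grid))
             for t in grid for a in (0.5, 1.0, 2.0, 3.0))
```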
To go from to using (30), we need to approximate by a fixed function. Hence, in the sequel, we are concerned with the convergence of the process given in (7), where is chosen sufficiently small, and with the convergence of the corresponding inverse function .
It is stated in Lemma 5.1 that under (A1) and (A3), converges to a fixed distribution function that depends on , hence on . If, moreover, is strictly increasing in , then we can find a neighborhood of over which the (usual) inverse function is uniquely defined, and converges to .
In the following lemma, we show that belongs to the domain of with probability tending to one as .
Lemma 6.3.
We will also need to control the noise . The following lemma shows that the noise is negligible under our assumptions.
Lemma 6.4.
With the above lemmas, we can prove convergence of to given by (31).
Now we proceed to the proof of (10). Fix arbitrarily small. It follows from (30) and (29) that
where
With , we obtain
and is strictly positive since is strictly increasing in the neighborhood of . Hence, it follows from (12) that for sufficiently small one has
so it follows from (32) that the probability that tends to zero as . Similarly, the probability that tends to zero as so we conclude that the probability that tends to zero as . This completes the proof of (10).
To prove (9), fix sufficiently small so that and are continuous and strictly increasing in the neighborhood of . Equation (10) shows that
(33) |
as . Now, it follows from the switch relation (27) that
(34) |
where . It follows from (33) that the probability on the right-hand side tends to zero as . Hence, the probability on the left-hand side tends to zero as well as .
Similarly, the probability that tends to zero as so we conclude that the probability that tends to zero as . This completes the proof of Theorem 3.1.
6.2 Proof of Theorem 4.1
The proof of Theorem 4.1 uses ideas similar to those in the proof of Theorem 3.1, but under stronger assumptions (and therefore using stronger lemmas).
The first intermediate result is the following stronger version of Lemma 6.4.
Lemma 6.6.
Then, we need to quantify how well we can approximate by .
Lemma 6.7.
With the above lemmas (including Lemma 5.3 and the ones used in Section 6.1), we can obtain the rate of convergence of given by (31).
Lemma 6.8.
Inspecting the proof of Lemma 6.8, one can see that the convergences in (37) and (38) hold in a uniform sense in the neighborhood of . More precisely, there exists , independent of , such that for all we can find such that
and
Let , where does not depend on , and recall (6.1), where . It follows from assumption (B2) that has a derivative that is bounded in sup-norm away from zero in a neighborhood of . Hence, it follows from a Taylor expansion that there exists , depending only on , such that , provided that is sufficiently large to ensure that belongs to this neighborhood of . Hence,
provided that is sufficiently large to ensure that belongs to the above neighborhood of , and that . For fixed one can choose such that the probability on the right-hand side of the previous display is smaller than or equal to and therefore,
Similarly, for all fixed , one can find that does not depend on such that
Hence, for all fixed , there exists , independent of , such that
This completes the proof of Theorem 4.1.
7 Technical proofs for Section 6.1
The first term on the right-hand side of the previous display can be rewritten as follows:
using that for all . Hence, for all in
(39) |
Combining (39) with the piecewise linearity of yields
where and are piecewise-linear processes with knots at for in and such that
and
To ease the notation, we will write , and . Let , and take such that ; then . With this notation,
Notice that
therefore,
which proves (22).
For we have,
then,
and this completes the proof.
Proof of Lemma 6.2. By definition, with , and for all , we have for all , where and does not depend on . Moreover,
Since is the left-hand slope at of the least concave majorant of the set of points in (3), the equality in (26) follows from Lemma 2.1 in [13].
Proof of Lemma 6.3. The first assertion follows from Assumption (A4) and the second immediately follows from the first one by (12) combined with the strict monotonicity of in .
Proof of Lemma 6.4. Let be the sigma-algebra generated by the chain up to time . Denote by the probability conditioned on . Take .
By Chebyshev’s inequality, we have
which implies the first part of the lemma, because with probability 1.
For the second part, let be the number of times the chain visits up to time and the times of those visits. Using that and Kolmogorov's inequality (Theorem 3.1.6, p. 122 in [19]), we obtain
which, by the same argument as before, implies the second part of the lemma.
Fix arbitrarily, and let and be such that for all such that , and for all such that . Note that the existence of and is ensured by assumptions (A5) and (A6).
By Lemma 6.3, we can assume without loss of generality that belongs to the domain of , since this occurs with probability tending to one. Therefore, we can find such that . It follows from the characterization in (31) that the event is contained in the event that there exists such that and
where we recall that .
By Lemma 6.1, is contained in the event that there exists such that and
(40) |
Using (22) in (40) we obtain that is contained in the event that there exists such that and
where
Let and such that and . By equation (23) we have , therefore,
Hence,
Therefore, the event is contained in the event that there exists such that
Now, let be the event that
where is such that for all such that . Note that the existence of is ensured by assumption (A7). Then, it follows from the monotonicity of and that on ,
Hence, it follows from the definitions of , and that on ,
This implies that on ,
for all . Hence, the event is contained in the event . Now, on , for all we have
since . Therefore,
Hence, it follows from Lemma 6.4 that converges in probability to zero as , so that the probability of the event tends to zero as . It follows from Lemma 5.1 that for sufficiently small, the probability of the event tends to one as , so we conclude that the probability of tends to zero as . Similarly, the probability of the event tends to zero as , so that
for all . This completes the proof of (32).
8 Technical proofs for Section 6.2
Proof of Lemma 6.6. Let be the sigma-algebra generated by the chain X up to time . Denote by the expectation conditioned on . Take and define , and
then,
(41) |
By Doob's maximal inequality (Theorem 10.9.4 in [19]), we have, for ,
(42) |
Define:

- ,
- ,
- ,
- ,
- ,
- for .
By the strong Markov property, is an i.i.d. sequence which is independent of (and, therefore, of the initial measure ). For fixed, the random variable is a stopping time for the sequence ; indeed,
For each and we have that
(44) |
where the last inequality is justified by the fact that and is a nonnegative function. Because for all , we have
then,
(45) |
For each we have,
Notice that and is independent of , therefore,
Plugging this into equation (45) we get,
Then, by taking expectation in (8) we obtain
(46) |
By Theorem 1.1 and the fact that is Lipschitz, we can find , independent of , such that
(47) |
If X is positive recurrent, by Theorem 1.1, converges almost surely to a positive constant . Moreover, ; therefore, by the Dominated Convergence Theorem, we obtain that . If X is -null recurrent, by Lemma 3.3 in [24], ; hence, for both positive and -null recurrent chains, we can find and , both independent of , such that for all . Using this with (8) and (47), we get
(48) |
Combining (48) with assumption (B3) and the fact that we obtain that there exist positive constants and such that
Equation (35) now follows after taking expectation in (43). The proof of (36) follows the same reasoning, but using
Proof of Lemma 6.7. a) If X is positive recurrent, Theorem 1.1 implies that there exists a positive constant such that converges almost surely to , which is not zero by (A3).
On the other hand, if X is -null recurrent, Theorem 1.1 and Slutsky's Theorem imply that there exists a constant such that converges in distribution to , where denotes a Mittag-Leffler distribution with parameter . This distribution is continuous and strictly positive with probability 1; then, by the Continuous Mapping Theorem, converges in distribution to a multiple of , and therefore is bounded in probability by Theorem 2.4 in [33].
b) Let X be positive recurrent, then, we can find such that
hence,
Now let X be -null recurrent. Let . This random variable is continuous and positive; therefore, we can find positive constants and such that
(49) |
By the Continuous Mapping Theorem, converges in distribution to , therefore, we can find such that
(50) |
Proof of Lemma 6.8. Fix small enough so that and are bounded from above and away from zero on ; see assumption (B2). Then, the proper inverse functions of and are well defined on and
respectively. We denote the inverses on those intervals by and , respectively. Let
(52) |
where and where the supremum is restricted to . We will show below that
(53) |
as . Combining (31) with Lemma 6.5 ensures that coincides with with probability tending to one as , so (37) follows from (53).
We turn to the proof of (53). Fix arbitrarily and let
(54) |
for some sufficiently large so that
(55) |
Then, by part ii) of Lemma 6.7, we can find positive constants , and such that
(56) |
Let and . It follows from (18) that for sufficiently small , we can find such that
for all . Hence for ,
where denotes the intersection of the events
(57) |
and
(58) |
Combining equations (57) and (58), we obtain that, on ,
(59) |
where is independent of and .
By Lemma 6.3, we can assume without loss of generality that belongs to , since this occurs with probability that tends to one. Hence, by (52), the event is contained in the event that there exists with , and
(60) |
Obviously, the probability is equal to zero if , so we assume in the sequel that . For all , define
Let be such that on the interval . Since and , it then follows from Taylor’s expansion that
for all and therefore, (60) implies that
for all such ’s, where we set . Hence, for all ,
(61) |
where the sums are taken over all integers such that . Recall that we have (39) for all . Since is piecewise-linear with knots at , by (22) and (23) we get that for every in the above sum,
(62) |
Moreover, is bounded above on , so we obtain that for every with , the first term on the right-hand side of (8) satisfies
for some that does not depend on . Hence, it follows from the previous display and (59) that
Hence, taking we get that for all with .
(63) |
By equations (23) and (24) in Lemma 6.1, the second term on the right-hand side of (8) satisfies,
(64) |
where and are given by
For , it follows from the triangle inequality that
Combining (59) with the fact that is Lipschitz in , we can find independent of such that, on ,
and for all with . Hence, on
It follows from (55) that for all , then, on
By Lemma 6.6, we conclude that there exists and such that, for
(65) |
Combining (8), (63), (64) and (65), we conclude that there exists , independent of and , such that for all and where , one has
Combining this with (8) and the Markov inequality, we conclude that there exist and , depending neither on nor on , such that, for all ,
The sum on the last line is finite, so there exists , independent of and , such that for larger than
(66) |
The above probability can be made smaller than by choosing in (54) sufficiently large, independently of . This proves (53) and completes the proof of (37).
Now, we turn to the proof of (38). It follows from (30) combined with (32) and Lemma 5.3 that
Hence, by Lemma 6.7 we have
Now, it follows from the assumption (B2) that has a bounded derivative in the neighborhood of , to which belongs with probability that tends to one. Hence, it follows from Taylor’s expansion that
where we used (37) for the last equality. This proves (38) and completes the proof of Lemma 6.8.
Supplement of “Harris recurrent Markov chains and nonlinear monotone cointegrated models”. This is the supplementary material associated with the present article.
1 Markov chain theory and notation
This section extends Section 2 of the main paper, presenting a more detailed exposition of the Markov chain theory required in the proofs. For further details, we refer the reader to [29, 27, 12].
Let be a time-homogeneous Markov chain defined on a probability space , where is countably generated. Let denote its transition kernel, i.e., for , we have
Let denote the -step transition probability, i.e.
If is a probability measure in such that , then is called the initial measure of the chain X. A homogeneous Markov chain is uniquely identified by its kernel and initial measure.
When the initial measure of the chain is given, we will write (and ) for the probability (and the expectation) conditioned on . When for some we will simply write and .
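The objects just defined can be made concrete on a finite state space. The following sketch (a 3-state toy chain chosen for illustration only; the paper works on a general state space) checks the Chapman–Kolmogorov relation for the n-step kernel and the law of the chain under an initial measure:

```python
import numpy as np

# Toy transition kernel on 3 states; each row P(x, .) is a probability.
# This finite example is only an illustration of the abstract kernel above.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

# n-step transition probabilities are matrix powers, and they satisfy
# the Chapman-Kolmogorov equations P^{n+m} = P^n P^m.
P4 = np.linalg.matrix_power(P, 4)
assert np.allclose(P4, (P @ P) @ (P @ P))

# If lambda is the initial measure, the law of the chain at time n is
# lambda P^n, again a probability vector.
lam = np.array([1.0, 0.0, 0.0])        # chain started at the first state
law_3 = lam @ np.linalg.matrix_power(P, 3)
assert abs(law_3.sum() - 1.0) < 1e-12
```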
A homogeneous Markov chain is irreducible if there exists a -finite measure on such that, for all and all with , we have for some . In this case, there exists a maximal irreducibility measure (all other irreducibility measures are absolutely continuous with respect to ). In the following, all Markov chains are assumed to be irreducible with maximal irreducibility measure .
For any set , we will denote by and , respectively, the times of first visit and first return of the chain to the set , i.e. and . The subsequent visit and return times , are defined inductively as follows:
(67) | ||||
(68) |
Given that our methods will only deal with the values of X in a fixed set , if is a measurable set, we will write instead of and if , then we will simply write .
We will use to denote the random variable that counts the number of times the chain has visited the set up to time , that is, . Similarly, we will write for the total number of visits of the chain X to . The set is called recurrent if for all . The chain X is called recurrent if every set with is recurrent.
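The occupation-time notation above can be sketched numerically. The chain used here (a random walk on the nonnegative integers reflected at 0) is an assumption made purely for illustration:

```python
import random

# Count the visits of the chain to a set A up to time n, as in the
# occupation-time notation above. The reflected random walk is only an
# illustrative choice of a recurrent chain.
def reflected_walk(n, seed=0):
    rng = random.Random(seed)
    x, path = 0, [0]
    for _ in range(n):
        x = max(0, x + rng.choice([-1, 1]))
        path.append(x)
    return path

def visits(path, A):
    """Number of indices i <= n with X_i in A."""
    return sum(1 for x in path if x in A)

path = reflected_walk(10_000)
print(visits(path, {0}))  # for a recurrent set this count tends to infinity with n
```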
Although recurrent chains possess many interesting properties, a stronger type of recurrence is required in our analysis. An irreducible Markov chain is Harris recurrent if for all and all with we have
An irreducible chain possesses an accessible atom if there is a set such that for all in : and . Denote by and the probability and the expectation conditioned on . If X possesses an accessible atom and is Harris recurrent, the probability of returning infinitely often to the atom is equal to one, no matter the starting point, i.e., . Moreover, it follows from the strong Markov property that the sample paths may be divided into independent blocks of random length corresponding to consecutive visits to :
taking their values in the torus . Notice that the distribution of depends on the initial measure; therefore, it does not have the same distribution as for . The sequence defines successive times at which the chain forgets its past, called regeneration times. Similarly, the sequence of i.i.d. blocks are named regeneration blocks. The random variable counts the number of i.i.d. blocks up to time ; it is called the number of regenerations up to time .
Notice that for any function defined on , we can write as a sum of independent random variables as follows:
(69) |
where, , for and denotes the last incomplete block, i.e. .
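The block decomposition (69) can be sketched as follows. The chain (a random walk on the nonnegative integers reflected at 0, with atom {0}) and the function f are illustrative assumptions; the sketch only verifies the bookkeeping of the decomposition, while the i.i.d. property of the complete blocks comes from the strong Markov property:

```python
import random

# Split a sum of f along the path into: a segment before the first visit to
# the atom, block sums over the regeneration blocks, and a last (possibly
# incomplete) block, mirroring decomposition (69).
def reflected_walk(n, seed=1):
    rng = random.Random(seed)
    x, path = 0, [0]
    for _ in range(n):
        x = max(0, x + rng.choice([-1, 1]))
        path.append(x)
    return path

def block_decomposition(path, f, atom=0):
    hits = [i for i, x in enumerate(path) if x == atom]
    blocks = [path[hits[k]:hits[k + 1]] for k in range(len(hits) - 1)]
    block_sums = [sum(f(x) for x in b) for b in blocks]
    first = sum(f(x) for x in path[:hits[0]])    # before the first visit to the atom
    last = sum(f(x) for x in path[hits[-1]:])    # last, possibly incomplete, block
    return first, block_sums, last

path = reflected_walk(5_000)
f = lambda x: 1.0 / (1.0 + x)
first, block_sums, last = block_decomposition(path, f)

# The three pieces reassemble the full sum of f along the path.
assert abs(first + sum(block_sums) + last - sum(f(x) for x in path)) < 1e-9
```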
When an accessible atom exists, the stochastic stability properties of X amount to properties of the return time to the atom only. For instance, the measure given by:
(70) |
is invariant, i.e.
Denote by the class of nonnegative measurable functions with positive support. A function is called small if there exists an integer and a measure such that
(71) |
When a chain possesses a small function , we say that it satisfies the minorization inequality . As pointed out in [29], there is no loss of generality in assuming that and .
A set is said to be small if the function is small. Similarly, a measure is small if there exist , and that satisfy (71). By Theorem 2.1 in [29], every irreducible Markov chain possesses a small function, and Proposition 2.6 of the same book shows that every measurable set with contains a small set. In practice, finding such a set most often consists in exhibiting an accessible set for which the probability that the chain returns to it in steps is uniformly bounded from below. Moreover, under quite general conditions, a compact set will be small; see [15].
If X does not possess an atom but is Harris recurrent (and therefore satisfies a minorization inequality ), a splitting technique, introduced in [28, 29], allows us to extend in some sense the probabilistic structure of X in order to artificially construct an atom. The general idea behind this construction is to expand the sample space so as to define a sequence of Bernoulli r.v.’s and a bivariate chain , named split chain, such that the set is an atom of this chain. A detailed description of this construction can be found in [29].
The whole point of this construction is that inherits all the communication and stochastic stability properties of X (irreducibility, Harris recurrence, …). In particular, the marginal distribution of the first coordinate process of and the distribution of the original chain X are identical. Hence, the splitting method enables us to extend all the results known for atomic chains to general Harris chains, for example, the existence of an invariant measure which is unique up to a multiplicative constant (see Proposition 10.4.2 in [27]).
The invariant measure is finite if and only if , in this case we say the chain is positive recurrent, otherwise, we say the chain is null recurrent. A null recurrent chain is called -null recurrent (c.f. Definition 3.2 in [24]) if there exists a small nonnegative function , a probability measure , a constant and a slowly varying function such that
As argued in [24], it is not too severe a restriction to assume . Therefore, throughout this paper we assume that X satisfies the minorization inequality , i.e., there exist a measurable function and a probability measure such that , and
(72) |
A measurable and positive function , defined on for some , is called slowly varying at if it satisfies for all . See [6] for a detailed compendium of these types of functions.
It was shown in Theorem 3.1 of [24] that if the chain satisfies the minorization condition (5), then it is -null recurrent if and only if
(73) |
where is a slowly varying function.
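A classical example of -null recurrence is the simple random walk on the integers, which is -null recurrent with index 1/2: the number of visits to 0 up to time n is of order the square root of n, up to a slowly varying factor. The following numerical sketch (the walk and the sample sizes are illustrative choices) displays this order of growth:

```python
import random

# Count the visits to 0 of a simple random walk on Z up to time n.
# The walk is beta-null recurrent with beta = 1/2, so the count is of
# order sqrt(n).
def srw_visits_to_zero(n, seed=2):
    rng = random.Random(seed)
    x, visits = 0, 0
    for _ in range(n):
        x += rng.choice([-1, 1])
        visits += (x == 0)
    return visits

for n in (10_000, 100_000):
    v = srw_visits_to_zero(n)
    # v / sqrt(n) converges in distribution (not almost surely) to a multiple
    # of a Mittag-Leffler variable with index 1/2, so the ratio stays random.
    print(n, v, v / n ** 0.5)
```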
The following theorem is a compendium of the main properties of Harris recurrent Markov chains that will be used throughout the proofs. Among other things, it shows that the asymptotic behavior of is similar to that of the function defined as
(74) |
Theorem 1.1.
Suppose X is a Harris recurrent, irreducible Markov chain, with initial measure , that satisfies the minorization condition (72). Let be the number of complete regenerations until time of the split chain , let be a small set and be an invariant measure for X. Then,
1. .
2. For any function , defined on , the decomposition (69) holds. Moreover, there is a constant , that only depends on , such that if , then .
3. converges almost surely to a positive constant.
4. converges almost surely to a positive constant if X is positive recurrent and converges in distribution to a Mittag-Leffler random variable with index if X is -null recurrent.
Remark 1.1.
The Mittag-Leffler distribution with index is a non-negative continuous distribution, whose moments are given by
See (3.39) in [24] for more details.
Remark 1.2.
Part 1 of Theorem 1.1 is Proposition 5.6.ii of [29], part 2 is equation (3.23) of [24] and part 3 is an application of the Ratio Limit Theorem (Theorem 17.2.1 of [27]). For the positive recurrent case, part 4 also follows by the aforementioned Ratio Limit Theorem while the claim for the null recurrent case appears as Theorem 3.2 in [24].
2 Technical proofs for Section 5
Now, we turn to the proof of (13). To do this, we adapt some of the ideas presented in the proof of Lemma 21.2 in [33].
Let be a normal random variable independent of the ’s, and its distribution function. It follows from (12) that, conditionally on the ’s, converges almost surely to . Thus, denoting by the conditional probability given the ’s, it follows from (29) that converges almost surely to at every at which the limit function is continuous. Since is strictly increasing in , one can find such that is continuous on , so the above limit function is continuous at every . By continuity of on , converges almost surely to for every such . By monotonicity, the convergence is uniform, hence
as .
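The monotonicity step used above (pointwise convergence of monotone functions to a continuous monotone limit implies uniform convergence) can be illustrated with empirical distribution functions; the standard normal target, the sample size and the grid below are assumptions of this sketch:

```python
import bisect
import math
import random

# Monotone functions (here, empirical distribution functions) converging
# pointwise to a continuous monotone limit converge uniformly: pointwise
# closeness on a grid plus monotonicity between grid points controls the
# supremum distance over the whole line.
def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

rng = random.Random(3)
sample = sorted(rng.gauss(0.0, 1.0) for _ in range(200_000))

def ecdf(t):
    # Proportion of sample points <= t, via binary search in the sorted sample.
    return bisect.bisect_right(sample, t) / len(sample)

grid = [i / 100.0 for i in range(-300, 301)]
sup_dist = max(abs(ecdf(t) - Phi(t)) for t in grid)
assert sup_dist < 0.02   # Glivenko-Cantelli / DKW makes this overwhelmingly likely
print(sup_dist)
```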
Proof of Lemma 5.2. This proof is an adaptation to the localized case of the proof of Lemma 2 in [5]. Let , i.e., there exists such that . By the Cauchy–Schwarz inequality,
then
where the last equality follows from (16). Applying this to the function
when each is the center of an -cover of the space and gives the first assertion of the lemma. To obtain the second assertion, note that is an envelope for . In addition, we have that
From this, we derive that, for every ,
Then using the first assertion of the lemma, we obtain for every ,
which implies the second assertion of the lemma whenever the class is VC with envelope .
Proof of Lemma 5.3. Let and . For each , we define ; then, using the notation of Section 6.2, we will have . Finally, for any function , we define
Let , and as defined in (8). Then, and
From now on, we drop the superscript from to ease the notation.
Notice that and almost surely, therefore, the first term in the last equation converges almost surely to 0 uniformly in . For the last term, we have that
by (B4), the expectation of is finite; then, Lemma 1 in [2] shows that a.s., which implies that also converges to 0 a.s. Since a.s., by Theorem 6.8.1 in [19] we have almost surely. Combining this with the almost sure convergence of to a positive constant (see Theorem 1.1), we obtain that converges almost surely to 0 uniformly in . Therefore,
(75) |
where we have used that converges almost surely to a positive constant in order to replace by .
Then, (17) will be proved if we show that, for small enough
(76) |
Fix arbitrarily. By Lemma 6.7 and Slutsky’s theorem, we can find positive numbers and an integer such that for all , where
(77) |
Define and let be a fixed positive number. Then, for all
(78) |
On , , therefore for all
(79) |
The random variables are i.i.d., therefore, by Montgomery-Smith’s inequality (Lemma 4 in [1]), there exists a universal constant such that for all ,
(80) |
For an arbitrary set , let be the space of all uniformly bounded, real functions on , equipped with the uniform norm. Weak convergence to a tight process in this space is characterized by asymptotic tightness plus convergence of marginals (see Chapter 1.5 in [34]).
The class of functions is VC with constant envelope ; hence, by Lemma 5.2, the class of functions is also VC and has as envelope. Since is finite (by (B4)), Theorem 2.5 in [25] implies that is Donsker. Then, the process converges weakly in to a tight process . The map from to is continuous with respect to the supremum norm (cf. p. 278 of [33]); therefore, converges in distribution to . Hence, we can find and such that
(81) |
Choosing in (81) and combining (78), (79) and (80) completes the proof of (17).
Now we proceed to prove (18). Let be fixed. By (17) and Lemma 6.7, we can find , and such that
(82) | ||||
(83) |
where . Define the sets
where and are constants that will be specified later.
On , , hence,
(84) |
Assumption (B2) implies that has a bounded derivative in . Take as the maximum value of this derivative on ; then, the Mean Value Theorem implies that
After plugging this into (2) we get
Because , we can find such that for all . Taking larger than and using that on , we obtain, for all ,
(85) |
Let be such that for . By the continuity of in , there exists such that for all in ; therefore, the triangle inequality implies that lies in the interval for all . This, alongside (85), shows that for all
By a similar argument, it can be shown that
References
- [1] Adamczak, R. (2008). A tail inequality for suprema of unbounded empirical processes with applications to Markov chains. Electronic Journal of Probability 13, 1000–1034.
- [2] Athreya, K. B. and Roy, V. (2016). General Glivenko–Cantelli theorems. Stat 5, 306–311.
- [3] Barlow, R. E., Bartholomew, D. J., Bremner, J. M. and Brunk, H. D. (1972). Statistical Inference under Order Restrictions: The Theory and Application of Isotonic Regression. Wiley, New York.
- [4] Bertail, P., Durot, C. and Fernández, C. (2023). Supplement to “Harris recurrent Markov chains and nonlinear monotone cointegrated models.” doi:10.1214/[provided by typesetter]
- [5] Bertail, P. and Portier, F. (2019). Rademacher complexity for Markov chains: Applications to kernel smoothing and Metropolis–Hastings. Bernoulli 25, 3912–3938.
- [6] Bingham, N. H., Goldie, C. M. and Teugels, J. L. (1987). Regular Variation. Encyclopedia of Mathematics and its Applications 27. Cambridge University Press.
- [7] Cai, B. and Tjøstheim, D. (2015). Nonparametric regression estimation for multivariate null recurrent processes. Econometrics 3, 265–288.
- [8] Cai, Z. and Ould-Saïd, E. (2003). Local M-estimator for nonparametric time series. Statistics and Probability Letters 65, 433–449.
- [9] Chen, X. (1999). How often does a Harris recurrent Markov chain recur? The Annals of Probability 27.
- [10] Chen, X. (2000). On the limit laws of the second order for additive functionals of Harris recurrent Markov chains. Probability Theory and Related Fields 116.
- [11] Deaton, A. and Muellbauer, J. (1980). Economics and Consumer Behavior. Cambridge University Press.
- [12] Douc, R., Moulines, E., Priouret, P. and Soulier, P. (2018). Markov Chains. Springer Series in Operations Research and Financial Engineering. Springer.
- [13] Durot, C. and Tocquet, A.-S. (2003). On the distance between the empirical process and its concave majorant in a monotone regression framework. Annales de l’Institut Henri Poincaré (B) Probability and Statistics 39, 217–240.
- [14] Engle, R. F. and Granger, C. W. J. (1987). Co-integration and error correction: Representation, estimation, and testing. Econometrica 55, 251–276.
- [15] Feigin, P. D. and Tweedie, R. L. (1985). Random coefficient autoregressive processes: A Markov chain analysis of stationarity and finiteness of moments. Journal of Time Series Analysis 6.
- [16] Granger, C. W. J. (1981). Some properties of time series data and their use in econometric model specification. Journal of Econometrics 16, 121–130.
- [17] Granger, C. and Teräsvirta, T. (1993). Modelling Non-Linear Economic Relationships. Oxford University Press.
- [18] Groeneboom, P. and Jongbloed, G. (2014). Nonparametric Estimation under Shape Constraints: Estimators, Algorithms and Asymptotics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.
- [19] Gut, A. (2013). Probability: A Graduate Course, 2nd ed. Springer Texts in Statistics. Springer.
- [20] Hansen, B. E. and Seo, B. (2002). Testing for two-regime threshold cointegration in vector error-correction models. Journal of Econometrics 110, 293–318.
- [21] Johansen, S. (1988). Statistical analysis of cointegrating vectors. Journal of Economic Dynamics and Control 12, 231.
- [22] Johansen, S. (1991). Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59, 1551.
- [23] Karlsen, H. A., Myklebust, T. and Tjøstheim, D. (2007). Nonparametric estimation in a nonlinear cointegration type model. The Annals of Statistics 35, 252–299.
- [24] Karlsen, H. A. and Tjøstheim, D. (2001). Nonparametric estimation in null recurrent time series. The Annals of Statistics 29.
- [25] Kosorok, M. R. (2008). Introduction to Empirical Processes and Semiparametric Inference. Springer Series in Statistics. Springer.
- [26] Lin, Z., Li, D. and Chen, J. (2009). Local linear M-estimators in null recurrent time series. Statistica Sinica 19, 1683–1703.
- [27] Meyn, S., Tweedie, R. and Glynn, P. (2009). Markov Chains and Stochastic Stability, 2nd ed. Cambridge Mathematical Library. Cambridge University Press.
- [28] Nummelin, E. (1978). A splitting technique for Harris recurrent Markov chains. Probability Theory and Related Fields 43.
- [29] Nummelin, E. (1984). General Irreducible Markov Chains and Non-Negative Operators. Cambridge Tracts in Mathematics 83. Cambridge University Press.
- [30] Phillips, P. C. B. (1991). Optimal inference in cointegrated systems. Econometrica 59, 283.
- [31] Phillips, P. C. B. and Solo, V. (1992). Asymptotics for linear processes. The Annals of Statistics 20, 971.
- [32] Stigler, M. (2020). Threshold cointegration: Overview and implementation in R. Handbook of Statistics 42, 229–264.
- [33] van der Vaart, A. W. (2000). Asymptotic Statistics. Cambridge University Press.
- [34] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer.
- [35] Wang, Q. (2015). Limit Theorems for Nonlinear Cointegrating Regression. World Scientific.
- [36] Wang, Q. and Phillips, P. C. B. (2009). Asymptotic theory for local time density estimation and nonparametric cointegrating regression. Econometric Theory 25, 710–738.
- [37] Wang, Q. and Phillips, P. C. B. (2009). Structural nonparametric cointegrating regression. Econometrica 77, 1901–1948.