Empirical process theory for locally stationary processes
Abstract
We provide a framework for empirical process theory of locally stationary processes using the functional dependence measure. Our results extend known results for stationary Markov chains and mixing sequences to another common way of measuring dependence and allow for additional time dependence. Our main result is a functional central limit theorem for locally stationary processes. Moreover, maximal inequalities for expectations of sums are developed. We show the applicability of our theory in some examples; for instance, we provide uniform convergence rates for nonparametric regression with locally stationary noise.
Nathawut Phandoidaen, Stefan Richter
[email protected], [email protected]
Institut für angewandte Mathematik, Im Neuenheimer Feld 205, Universität Heidelberg
To appear in Bernoulli
1 Introduction
Empirical process theory is a powerful tool to prove uniform convergence rates and weak convergence of composite functionals. The theory for independent variables is well-studied (cf. [19], [23], [44] or [43] for an overview), based on the original ideas of [13], [17], [18], [37] and [34], among others. For random variables with dependence structure, various approaches have been discussed. There exists a well-developed empirical process theory and large deviation results for Harris-recurrent Markov chains based on regenerative schemes (cf. [28], [41], [21] and [1], among others) or geometric ergodicity (cf. [27]). To quantify the speed of convergence in maximal inequalities, additional assumptions like -recurrence (cf. [26]) have to be imposed. The theory covers a rich class of Markov chains but, for instance, cannot cover linear processes.
An empirical process theory for stationary processes under high-level assumptions on the moments of means was derived in [12] and further discussion papers. In the paradigm of weak dependence (which measures the size of covariances of Lipschitz functions of the random variables), [16] derived Bernstein-type inequalities. Focusing on the analysis of the empirical distribution function (EDF), many more techniques have been discussed: for instance, [20, Theorem 4] provides uniform convergence of the EDF by using bounds for covariances of Hölder functions of the random variables. Another abstract concept was introduced by [4] via S-mixing (for stationary mixing), which imposes the existence of -dependent approximations of the original observations. They then derive strong approximations and uniform central limit theorems for the EDF.
A different idea to measure dependence of random variables is given by mixing coefficients. Here, several concepts were introduced, the most common (with increasing strength) being -, - and -mixing (for an overview about mixing coefficients, cf. [15]). Large deviation results and uniform central limit theorems for general classes of functions (not only EDF) were derived by using coupling techniques, cf. [38], [30] for -mixing, [3], [50], and successively refined by [14], [39], [11], [9] (the last two developed for EDFs only) and [40] for -mixing, and [10], [6] for - and -mixing. See also [2], [10] and [40] for comprehensive overviews.
In [10] it is argued that -mixing is the weakest mixing assumption that allows for a “complete” empirical process theory which incorporates maximal inequalities and uniform central limit theorems. There exist explicit upper bounds for -mixing coefficients for Markov chains (cf. [25]) and for so-called V-geometric mixing coefficients (cf. [32]). For several stationary time series models like linear processes (cf. [35] for -mixing), ARMA (cf. [33]), nonlinear AR (cf. [26]) and GARCH processes (cf. [22]) there also exist upper bounds on mixing coefficients. A common assumption in these results is that the observed process or, more often, the innovations of the corresponding process, have a continuous distribution. This assumption is crucial to handle the relatively complicated mixing coefficients, which are defined via a supremum over two different sigma-algebras. A relaxation of -mixing coefficients was investigated by [11, Theorem 1] and is specifically designed for the analysis of the EDF. In contrast to the definition via sigma-algebras, these smaller coefficients are defined through conditional expectations of certain classes of functions and are easier to bound from above for a wide range of time series models.
In recent years, another measure of dependence, the so-called functional dependence measure, has become popular (cf. [46]); it uses a Bernoulli shift representation (see (1.1) below) and decompositions into martingales and -dependent sequences. It has been shown in various applications that the functional dependence measure, when combined with the rich theory of martingales, allows for sharp large deviation inequalities (cf. [49] or [51]). In [47] and [31], uniform central limit theorems for the EDF were derived for stationary and piecewise locally stationary processes.
Up to now, no general empirical process theory (allowing for general classes of functions) using the functional dependence measure is available. In this paper we fill this gap and prove maximal inequalities and functional central limit theorems under functional dependence. Furthermore, we draw connections and compare our results to the already existing empirical process concepts for dependent data mentioned above. While the empirical process theory for Markov chains and mixing cited above was developed for stationary processes, we work in the framework of locally stationary processes and therefore automatically provide the first general empirical process theory in this setting ([7] investigated spectral empirical processes for linear processes, [31] proved a functional central limit theorem for a localized empirical distribution function). Locally stationary processes allow for a smooth change of the distribution over time but can be locally approximated by stationary processes. Therefore, they provide more flexible time series models (cf. [8] for an introduction).
The functional dependence measure uses a representation of the given process as a Bernoulli shift process and quantifies dependence with a -norm. More precisely, we assume that , , is a -dimensional process of the form
(1.1) |
where is the sigma-algebra generated by , , a sequence of i.i.d. random variables in (), and some measurable function , , . For a real-valued random variable and some , we define . If is an independent copy of , independent of , we define and . The uniform functional dependence measure is given by
(1.2) |
Intuitively, measures the impact of in . Although representation (1.1) appears to be rather restrictive, it does cover a large variety of processes. In [5] it was argued that the set of all processes of the form should be equal to the set of all stationary and ergodic processes. We additionally allow to vary with and to cover processes which change their stochastic behavior over time. This is exactly the form of the so-called locally stationary processes discussed in [8].
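The following minimal Python sketch illustrates the coupling idea behind the functional dependence measure by Monte Carlo simulation. The tvAR(1)-type model, the coefficient function and all numerical choices are illustrative assumptions and do not appear in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(eps, a=lambda u: 0.3 + 0.4 * u):
    # tvAR(1)-type Bernoulli shift (illustrative model, not from the paper):
    # X_{t,n} = a(t/n) * X_{t-1,n} + eps_t
    n = len(eps)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = a(t / n) * x[t - 1] + eps[t]
    return x

def dependence_measure(j, n=300, q=2, reps=2000):
    # Monte-Carlo estimate of delta_q(j): the L^q distance between X_{n,n} and the
    # coupled version in which the innovation j steps in the past is replaced by an
    # independent copy, all other innovations being kept fixed.
    diffs = np.empty(reps)
    for r in range(reps):
        eps = rng.standard_normal(n)
        eps_star = eps.copy()
        eps_star[n - 1 - j] = rng.standard_normal()
        diffs[r] = simulate(eps)[-1] - simulate(eps_star)[-1]
    return np.mean(np.abs(diffs) ** q) ** (1 / q)

# The estimates decay roughly geometrically in j, as expected for an AR(1)-type recursion.
print([round(dependence_measure(j), 3) for j in range(5)])
```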
Since we are working in the time series context, many applications ask for functions that depend not only on the current observation of the process but on the whole (infinite) past . In the course of this paper, we aim to derive asymptotic properties of the empirical process
(1.3) |
where
Let denote the bracketing entropy, that is, the logarithm of the number of -brackets with respect to some distance that is necessary to cover (this is made precise at the end of this section). We will define a distance which guarantees weak convergence of (1.3) if the corresponding bracketing entropy integral is finite.
The definition of the functional dependence measure for locally stationary processes is similar to its stationary version and is easy to calculate for many time series models. It does not rely on the stationarity assumption but on the representation of the process as a Bernoulli shift. Therefore, many well-known upper bounds for stationary time series given in [48], including recursively defined models and linear models, directly carry over to the locally stationary case (1.2). It seems reasonable to use it as a starting point to generalize empirical process theory for stationary processes to the more general setting of local stationarity. While the other two paradigms mentioned above should also allow such a generalization in principle, there are open questions:
-
•
The theory for Harris-recurrent Markov chains relies on stationarity and intrinsically needs some knowledge about the whole time series due to the assumption of null-recurrence. There exist generalizations to locally stationary Markov chains (cf. for instance [42]), but the corresponding recurrence properties and examples of locally stationary time series have not been worked out yet. Furthermore, it is not directly clear how to deal with processes which incorporate the infinite past of , and linear processes cannot be discussed easily.
-
•
Absolutely regular -mixing is shown in most examples by assuming some continuity of the distribution or of the corresponding innovations. Especially for linear processes, the bounds are quite hard to obtain and seem not to be optimal. Moreover, there exist no “invariance rules” which would directly allow one to transfer the mixing properties of to , which incorporates infinitely many lags of .
- •
Contrary to -mixing, the functional dependence measure can easily deal with by using Hölder-type assumptions on . Furthermore, it can easily be calculated in many situations and is not restricted to continuous distributions of . Also, linear processes are covered.
However, there are also some peculiarities in using the functional dependence measure (1.2). While for Harris-recurrent Markov chains and -mixing the empirical process theory is independent of the function class considered, the situation for the functional dependence measure is more complicated. In order to quantify the dependence of by , we have to impose smoothness conditions on in the direction of its first argument. The distance therefore will not only change with the dependence structure of , but also has to be “compatible” with the function class . The smoothness condition on also poses a challenging issue when considering chaining procedures where rare events are excluded by (non-smooth) indicator functions.
Our main contributions in this paper are the following:
-
•
We derive maximal inequalities for for classes of functions ,
-
•
a chaining device which preserves smoothness during the chaining procedure and
-
•
conditions to ensure asymptotic tightness and functional convergence of , .
The paper is organized as follows. In Section 2, we present our main result, Theorem 2.3, the functional central limit theorem under minimal moment conditions. As a special case, we derive a version for stationary processes. We give a discussion on the distance and compare our result with the empirical process theory for -mixing. Some assumptions are postponed to Section 3, where a new multivariate central limit theorem for locally stationary processes is presented. In Section 4, we provide new maximal inequalities for for both finite and infinite . In Section 5, we apply our theory to prove uniform convergence rates and weak convergence of several estimators; the aim of that section is to highlight the wide range of applicability of our theory and to provide the typical conditions which have to be imposed, together with some discussion. In Section 6, a conclusion is drawn. We illustrate the main steps of the proofs in the Appendix of the article but postpone all detailed proofs to the Supplementary Material.
We now introduce some basic notation. For , let , . For ,
(1.4) |
which naturally appears in large deviation inequalities. For a given finite class , let denote its cardinality. We use the abbreviation
(1.5) |
if no confusion arises. For some distance , let denote the bracketing numbers, that is, the smallest number of -brackets (i.e. measurable functions with for all ) to cover . Let denote the bracketing entropy. For , let
2 A new functional central limit theorem
Roughly speaking, a process , is called locally stationary if for each , there exists a stationary process , such that if is small (cf. [8]). Typical estimators are of the form
where is a kernel function and is a bandwidth. Clearly, such a localization changes the convergence rate. To cover these cases, we suppose that any has a representation
(2.1) |
where is independent of and is independent of . We put
(2.2) |
The function class is considered to consist of Hölder-continuous functions in direction of . For , a sequence of elements of (equipped with the maximum norm ) and an absolutely summable sequence of nonnegative real numbers, we set
and .
Definition 2.1.
is called a -class if is a sequence of nonnegative real numbers, and satisfies for all , , ,
Furthermore, satisfies , .
The basic assumption for our main result is the following compatibility condition on .
Assumption 2.2.
is a -class. There exists , such that
(2.3) |
Let , be such that for all ,
While (2.3) summarizes moment assumptions on which are balanced by , the sequence reflects the intrinsic dependence of and measures the influence of the factor to the convergence rate of .
Based on Assumption 2.2, we define for
(2.4) |
Clearly, satisfies a triangle inequality. Therefore, is a distance between . We are now able to state our main result. The weak convergence takes place in the normed space
(2.5) |
cf. [43] for a detailed discussion of this space.
Theorem 2.3.
The proof of Theorem 2.3 consists of two ingredients, convergence of the finite-dimensional distributions (cf. Theorem 3.4) and asymptotic tightness (cf. Corollary 4.5). The more challenging part is asymptotic tightness; its proof only relies on Assumption 2.2 and consists of a new maximal inequality presented in Theorem 4.1 which may be of independent interest. To ensure convergence of the finite-dimensional distributions, we have to formalize local stationarity (Assumption 3.1) and pose conditions in the time direction on (cf. Assumption 3.2) and (cf. Assumption 3.3), which is done in Section 3. In particular, it is required that is properly normalized.
Let us note that in the case that is stationary, and , Assumptions 3.1, 3.2 and 3.3 are directly fulfilled. That is, in the stationary case, Assumption 2.2 is sufficient for Theorem 2.3. We formulate this finding as a simple corollary. Let
where , , is a stationary process and are functions from
with the property that for all , .
Corollary 2.4.
Suppose that . Let and . Assume that
(2.6) |
Then it holds in that
where is a centered Gaussian process with covariances
2.1 Form of and discussion on
2.1.1 Form of
Suppose that is independent of . Based on decay rates of , simpler forms of can be derived and are given in Table 1. These results are elementary and are proved in Lemmas 7.11, 7.12 in the Supplementary Material.
If , , are independent, we can choose for and thus is proportional to . We therefore exactly recover the case of independent variables with our theory.
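The claim for independent variables follows directly from the definition: writing \(X_{t,n}^{*(t-j)}\) for the coupled version of \(X_{t,n}\) in which the innovation \(\varepsilon_{t-j}\) is replaced by an independent copy (notation assumed here for illustration), independence means that \(X_{t,n}\) is a function of \(\varepsilon_t\) alone, so that

\[
\delta_q(j) \;=\; \sup_{t}\bigl\| X_{t,n} - X_{t,n}^{*(t-j)} \bigr\|_q \;=\; 0 \quad \text{for } j \ge 1,
\qquad
\delta_q(0) \;\le\; 2\,\sup_t \|X_{t,n}\|_q .
\]

Only the term corresponding to lag zero contributes to the dependence measure, which is why the independent case is recovered.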
2.1.2 Discussion on
Assumption 2.2 asks to upper bound
which is a convolution of the uniform Hölder constants of and the dependence measure of . Therefore, the specific form of has an impact on the dependence structure which is then introduced in . This is contrary to other typical chaining approaches for Harris-recurrent Markov chains or -mixing sequences where the dependence structure of simply transfers to functions without further conditions.
Furthermore, contrary to other chaining approaches, we have to ask for the existence of moments of in Assumption 2.2 even though only involves . This is due to the linear nature of the functional dependence measure (1.2). If is Lipschitz continuous with respect to its first argument ( in Assumption 2.2), we have to impose . However, these moment assumptions can be relaxed at the cost of larger as follows. Let us consider the special case that only depends on , that is, . If is bounded and Lipschitz continuous with respect to its first argument with Lipschitz constant , for any ,
Thus, can be chosen proportional to . This means that we can reduce the moment assumption to at the cost of having a larger norm .
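The elementary interpolation behind this trade-off can be spelled out as follows; this is a sketch under the stated assumptions that f is bounded and Lipschitz continuous in its first argument, with \(L_f\) denoting the Lipschitz constant and \(\|f\|_\infty\) the sup-norm (notation used here for illustration). For any \(s \in (0,1]\),

\[
|f(x) - f(y)| \;\le\; \min\bigl( L_f\,|x-y|,\; 2\|f\|_\infty \bigr)
\;\le\; \bigl(2\|f\|_\infty\bigr)^{1-s}\, L_f^{\,s}\, |x-y|^{s},
\]

using \(\min(a,b) \le a^s b^{1-s}\) for \(a,b \ge 0\). Hence f is Hölder continuous with exponent s, and only the s-th power of the increment enters the dependence bound, which is how moment assumptions on the process can be weakened at the cost of a larger norm of the function class.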
2.2 Comparison to empirical process theory with -mixing
In this section, we compare our functional central limit theorem for stationary processes from Corollary 2.4 under functional dependence with similar results obtained under -mixing. Unfortunately, we were not able to find a general setting under which the functional dependence measure can be compared with the -mixing coefficients of , . However, in some special cases, both quantities can be upper bounded.
2.2.1 Upper bounds for dependence coefficients of linear processes
Consider the linear process
with an absolutely summable sequence , , and i.i.d. , , with . Then it is immediate that
From [35] (cf. also [15], Section 2.3.1), we have the following result. If for some , , has a Lipschitz-continuous Lebesgue-density and the process is invertible, then for some constant ,
where and . If for some ,
(2.7) |
Note that even for this specific example, the calculation of the functional dependence measure is much easier and possible under much weaker assumptions. Moreover, the bounds for are typically larger than . The reason is the simple structure of compared to the much more involved formulation of dependence through sigma-algebras in the -mixing coefficients. For recursively defined processes with a finite number of lags, are typically upper bounded by geometrically decaying coefficients (cf. [48], [8]); the same holds true for under additional continuity assumptions (cf. [15], Section 2.4, or [27], [25] among others).
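For completeness, here is the one-line derivation of the functional dependence bound for the linear process; this is a sketch assuming the representation \(X_t = \sum_{k \ge 0} a_k \varepsilon_{t-k}\) (the paper's display is not reproduced above). Replacing \(\varepsilon_0\) by an independent copy \(\varepsilon_0'\) affects only the summand with index \(k = t\), hence for \(t \ge 0\)

\[
\delta_q(t) \;=\; \bigl\| X_t - X_t^{*} \bigr\|_q
\;=\; \bigl\| a_t(\varepsilon_0 - \varepsilon_0') \bigr\|_q
\;=\; |a_t|\,\|\varepsilon_0 - \varepsilon_0'\|_q
\;\le\; 2\,|a_t|\,\|\varepsilon_0\|_q .
\]

In particular, no smoothness of the innovation distribution is needed, in contrast to the mixing bounds discussed above.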
2.2.2 Entropy integral
In [14] (cf. also [10]), it was shown that if , , is stationary and -mixing with coefficients , , then
(2.8) |
implies weak convergence of in . Here, the -norm is defined as follows. If denotes the inverse cadlag of the decreasing function and the inverse cadlag of the tail function , then
Condition (2.8) was later relaxed in [40, Theorem 8.3]. It could be shown that if consists of indicator functions of specific classes of sets (in particular, corresponds to the empirical distribution function), weak convergence can be obtained under less restrictive conditions than (2.8). Since our theory does not directly allow us to analyze indicator functions because has to be a -class, we do not discuss their generalization here in detail.
In the special cases of polynomial and geometric decay, simple upper bounds for are available (cf. [10]). If for some , then is upper bounded by .
Generally speaking, (2.8) requires moments of the process to exist, while our condition in (2.6) only requires moments of but allows for smaller function classes through the additional factors given in the entropy integral (cf. Table 1). In specific examples (cf. (2.7)) it may occur that the entropy integral (2.6) is finite while (2.8) is infinite due to the lack of summability of .
To give a precise comparison, consider the situation of linear processes from (2.7). If , we can choose . Then, the two entropy integrals from Corollary 2.4 (left) and (2.8) read
Here, the entropy integral for mixing only exists if . The difference in the behavior is due to different bounds used for the variance of .
2.3 Integration into other empirical process results for the empirical distribution function of dependent data
While our approach does allow for a general empirical process theory for Hölder-continuous function classes, some more general dependence concepts have already been introduced (only) for the discussion of the empirical distribution function (EDF) based on the one-dimensional class . We mention [6], [9], [4] and [20]. The conditions therein cover the case where is a stationary Bernoulli shift with and dependence is measured with the (stationary) functional dependence measure
and its summed-up version, . [6] introduces so-called -approximation coefficients , , which can be viewed as another formulation of functional dependence. However, their final result [6, Theorem 5] for the convergence of the EDF is stated with summability conditions both on and on absolutely regular mixing coefficients; we therefore do not discuss it in detail here. On the other hand, [9, Theorem 2.1] in combination with [11, Section 6.1] shows convergence of the EDF if for some and ,
This is done by introducing simplified -mixing coefficients which can then be upper bounded by . By using independent approximations of the original process, [4, Theorem 1, Corollary 1] obtain convergence of the EDF if for some and , , or equivalently,
[20] discusses convergence of the EDF under a general growth condition imposed on the moments of where , the set of all Hölder-continuous functions with exponent . Their condition is fulfilled if
for some , .
3 A general central limit theorem for locally stationary processes
In this section, we introduce the remaining assumptions needed in Theorem 2.3, which pose regularity conditions on the process and the function class in the time direction. They are used to derive a multivariate central limit theorem for under minimal moment conditions in Theorem 3.4. Comparable results in different and more specific contexts were shown in [8] or [42].
We first formalize the property of to be locally stationary (cf. [8]). We ask that for each there exists a stationary process , , such that if is small. Recall from Assumption 2.2.
Assumption 3.1.
For each , there exists a process , , where is a measurable function. Furthermore, there exists some , such that for every , ,
For it holds that .
The behavior of the functions of the class in the direction of time is controlled by the following two continuity assumptions which state conditions on and separately.
Assumption 3.2.
There exists some such that for every ,
For , let .
Assumption 3.3.
For all , the function has bounded variation uniformly in , and
(3.1) |
One of the two following cases hold.
-
(i)
Case (global): For all , has bounded variation for all and the following limit exists:
-
(ii)
Case (local): There exists a sequence and such that . It holds that
The following limit exists for all :
Assumption 3.3 looks rather technical. The first part including (3.1) guarantees the right normalization of . The second part ensures the convergence of the asymptotic variances and covariances .
We obtain the following central limit theorem.
Theorem 3.4 generalizes the one-dimensional central limit theorem from [8]. We now comment on the assumptions.
Remark 3.5.
Assumptions 3.1, 3.2 and 3.3 allow for very general structures of . However, in many special cases, a subset of them is automatically fulfilled:
- •
-
•
If does not depend on , Assumption 3.2 is fulfilled.
Regarding Assumption 3.3 we have:
- •
- •
-
•
If , and with some Lipschitz-continuous kernel with support and fixed , then Assumption 3.3(ii) holds with .
4 Maximal inequalities and asymptotic tightness under functional dependence
In this section, we provide the necessary ingredients for the proof of asymptotic tightness of . We derive a new maximal inequality for finite under functional dependence in Theorem 4.1. We then generalize this bound to arbitrary using chaining techniques in Section 4.2.
4.1 Maximal inequalities
We first derive a maximal inequality which is a main ingredient for chaining devices but also is of independent interest. To state the result, let
and define
Set . For , choose such that
(4.1) |
Put . Recall that as in (1.5).
Theorem 4.1.
Suppose that satisfies and Assumption 2.2. Then there exists some universal constant such that the following holds: If and , then
(4.2) |
and
(4.3) |
Clearly, the second bound (4.3) is a corollary of (4.2) which balances the two terms involving . Values of for the two prominent cases in which is polynomially or exponentially decaying can be found in Table 2. The proof of Theorem 4.1 relies on a decomposition of into i.i.d. parts and a residual term with martingale structure. Similar decompositions are also the core of empirical process results for Harris-recurrent Markov chains (cf. [29]) and mixing sequences (cf. [10]).
In the next subsections, we will prove asymptotic tightness for under the condition that , do not depend on . However, uniform convergence rates of for finite (growing with ) can be obtained without these conditions but with additional moment assumptions, which is done in the following Corollary 4.3. To incorporate the additional moment assumptions, we use a slightly stronger assumption than Assumption 2.2.
Assumption 4.2.
is a -class. There exists , , such that
(4.4) |
Let , be such that for all ,
Note that Assumption 2.2 is obtained by taking . For , let
cf. Table 2 for values of in special cases.
Corollary 4.3 (Uniform convergence rates).
The first condition in (4.5) guarantees that is properly normalized. The second and third conditions are needed to prove that the “rare events”, where exceeds some threshold , are of the same order as . For this, we may need more than two moments of , that is, , depending on and the behavior of .
4.2 Asymptotic tightness
In this section, we extend the maximal inequality from Theorem 4.1 to arbitrary (infinite) classes . Since Assumption 2.2 forces to be Hölder-continuous with respect to its first argument , classical chaining approaches which use indicator functions do not apply here. We provide a new chaining technique which preserves continuity in Section 7.2.
For , and define and
(4.6) |
Here, represents the threshold for rare events in the chaining procedure. We have the following result.
Theorem 4.4.
Let satisfy Assumption 2.2 and let be some envelope function of , that is, for each it holds that . Let and assume that . Then there exists some universal constant such that
5 Applications
In this section, we provide some applications of the main results (Corollary 4.3 and Theorem 2.3). We will focus on locally stationary processes and therefore use localization in our functionals, but the results also hold for stationary processes, accordingly.
Let be some bounded kernel function which is Lipschitz continuous with Lipschitz constant , , and support . For some bandwidth , put .
In the first example we consider the nonparametric kernel estimator in the context of nonparametric regression with fixed design and locally stationary noise. We show that under conditions on the bandwidth , which are common in the presence of dependence (cf. [24] or [45]), we obtain the optimal uniform convergence rate . Write for sequences if there exists some constant such that for all .
Example 5.1 (Nonparametric Regression).
Let be some arbitrary process of the form (1.1) with which fulfills for some . Suppose that we observe , given by
where is some function. Estimation of is performed via
Suppose that either
-
•
with some , and , or
-
•
with some and .
From (5.1) and (5.2) below it follows that
First note that due to Lipschitz continuity of with Lipschitz constant , we have
(5.1)
For the grid , which discretizes up to distances , we obtain by Corollary 4.3 that
(5.2) |
where
The conditions of Corollary 4.3 are easily verified: It holds that with and . Thus, Assumption 4.2 is satisfied with , , . Furthermore, , and
which shows that . The conditions on emerge from the last condition in (4.5) and using the bounds for from Table 2.
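To illustrate Example 5.1 numerically, the following sketch simulates a fixed-design regression with tvAR(1)-type noise and evaluates a kernel estimator of the localized-average type on a grid. The regression function, the noise recursion, the Epanechnikov kernel and the bandwidth are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, h = 2000, 0.1
t = np.arange(1, n + 1) / n

m = lambda u: np.sin(2 * np.pi * u)               # assumed regression function
a = lambda u: 0.2 + 0.5 * u                       # assumed tvAR(1) coefficient of the noise

# locally stationary noise: eps_{i,n} = a(i/n) * eps_{i-1,n} + xi_i
xi = rng.standard_normal(n)
eps = np.zeros(n)
for i in range(1, n):
    eps[i] = a(t[i]) * eps[i - 1] + xi[i]

Y = m(t) + eps

K = lambda v: 0.75 * np.maximum(1.0 - v ** 2, 0.0)  # Epanechnikov kernel (assumed choice)

def m_hat(u):
    w = K((t - u) / h)
    return np.sum(w * Y) / np.sum(w)                # localized average at rescaled time u

u_grid = np.linspace(0.1, 0.9, 17)
err = max(abs(m_hat(u) - m(u)) for u in u_grid)
print("maximal error on the grid:", err)
```

The error on the grid combines a bias of order h (by the Lipschitz continuity of the assumed regression function) with the stochastic error discussed above.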
For the following two examples we suppose that the underlying process is locally stationary in the sense of Assumption 3.1. Similar assumptions are posed in [8] and are fulfilled for a large variety of locally stationary processes.
In the same spirit as in Example 5.1, it is possible to derive uniform rates of convergence for M-estimators of parameters in models of locally stationary processes. Furthermore, weak Bahadur representations can be obtained. The following results apply for instance to maximum likelihood estimation of parameters in tvARMA or tvGARCH processes. The main tool is to prove uniform convergence of the corresponding objective functions and their derivatives. Since the rest of the proof is standard, the details are postponed to the Supplementary Material, Section 7.8. Let denote the -th derivative with respect to . To apply empirical process theory, we ask for the objective functions to be -classes in (A1) and Lipschitz with respect to in (A2).
Lemma 5.2 (M-estimation, uniform results).
Let be compact and . For each , let be some measurable function which is twice continuously differentiable. Let , and define for ,
Suppose that there exists such that for ,
-
(A1)
is an -class with for some and Assumption 3.1 for is fulfilled with , .
-
(A2)
for all , ,
-
(A3)
attains its global minimum in with positive definite .
Furthermore, suppose that either
-
•
with some , and , or
-
•
with some and .
Define and (the bias). Then, , and as ,
and
Remark 5.3.
-
•
In the tvAR(1) case , we can use for instance
which for is a -class; a simulation sketch for this tvAR(1) case is given after this remark.
-
•
With more smoothness assumptions on or using a local linear estimation method for , the bias term can be shown to be of smaller order, for instance (cf. [8]).
-
•
The theory derived in this paper can also be used to prove asymptotic properties of M-estimators based on objective functions which are only almost everywhere differentiable in the Lebesgue sense by following the theory of chapter 5 in [43]. This is of utmost interest for that have additional analytic properties, such as convexity. Since these properties are also needed in the proofs, we will not discuss this in detail.
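As announced in Remark 5.3, here is a minimal simulation sketch for the tvAR(1) case, using a localized least-squares objective (a quadratic special case of the M-estimation framework of Lemma 5.2); the coefficient function, kernel and bandwidth are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, h, u0 = 5000, 0.1, 0.5
a = lambda u: 0.6 * np.cos(np.pi * u)                 # assumed time-varying AR coefficient

# simulate X_{t,n} = a(t/n) * X_{t-1,n} + eps_t
x = np.zeros(n)
for t in range(1, n):
    x[t] = a(t / n) * x[t - 1] + rng.standard_normal()

time = np.arange(1, n) / n                            # time points of the pairs (x_{t-1}, x_t)
K = lambda v: 0.75 * np.maximum(1.0 - v ** 2, 0.0)    # Epanechnikov kernel (assumed)
w = K((time - u0) / h)

# localized least squares: minimize sum_t w_t * (x_t - theta * x_{t-1})^2 over theta
theta_hat = np.sum(w * x[1:] * x[:-1]) / np.sum(w * x[:-1] ** 2)
print("estimate:", theta_hat, "  true local coefficient a(u0):", a(u0))
```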
We give an easy application of the functional central limit theorem from Theorem 2.3 by inspecting a locally stationary version of Example 19.25 in [43].
Example 5.4 (Local mean absolute deviation).
For fixed , put and define the mean absolute deviation
Let Assumption 3.1 hold with , . Suppose that and that for some , . We show that if and ,
(5.3) |
where , denotes the distribution function of and
The result is obtained by using the decomposition
where and
By the triangle inequality, satisfies Assumption 2.2 with , , , and . Assumption 3.2 is trivially fulfilled since does not depend on . Since is a one-dimensional Lipschitz class, . By Theorem 2.3, we obtain that there exists some process such that for , ,
(5.4) |
Furthermore, by Assumption 3.1,
(5.5)
By Lemma 19.24 in [43], we conclude from (5.4) and (5.5) that
(5.6) |
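A numerical counterpart of Example 5.4 (all concrete choices are illustrative assumptions; in particular the centering is kept at a fixed value mu, whereas the example may center at an estimated local mean): the local mean absolute deviation is a kernel-weighted average of |X_{i,n} - mu|.

```python
import numpy as np

rng = np.random.default_rng(3)
n, h, u0, mu = 5000, 0.1, 0.5, 0.0
sigma = lambda u: 1.0 + 0.5 * u                  # assumed time-varying noise scale

t = np.arange(1, n + 1) / n
X = sigma(t) * rng.standard_normal(n)            # a simple locally stationary model

K = lambda v: 0.75 * np.maximum(1.0 - v ** 2, 0.0)
w = K((t - u0) / h)
mad_hat = np.sum(w * np.abs(X - mu)) / np.sum(w)

# For centered Gaussian observations the local mean absolute deviation is sigma(u0) * sqrt(2/pi).
print(mad_hat, sigma(u0) * np.sqrt(2.0 / np.pi))
```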
6 Conclusion
In this paper, we have developed a new empirical process theory for locally stationary processes based on the functional dependence measure. We have proven a functional central limit theorem and maximal inequalities. A general empirical process theory for locally stationary processes is a key step to derive asymptotic and nonasymptotic results for M-estimators or for testing based on - or -statistics. We have given an example in nonparametric estimation where our theory is applicable. Since the size of the function class and the stochastic properties of the underlying process can be analyzed separately, we conjecture that our theory also permits an extension of various results from i.i.d. to dependent data, such as empirical risk minimization.
From a technical point of view, the linear and moment-based nature of the functional dependence measure has forced us to modify several approaches from empirical process theory for i.i.d. or mixing variables. A main issue was that the dependence measure only transfers decay rates for continuous functions. We have therefore provided a new chaining technique which preserves continuity of the arguments of the empirical process.
In principle, a similar empirical process theory for locally stationary processes can be established under mixing conditions such as absolute regularity. This would be a generalization of the results found in [38] and [10]. As we have seen in Section 2.2, such a theory would pose additional moment conditions on . In contrast, our framework only requires second moments of , but the entropy integral is enlarged by some factor which increases with stronger dependence. Moreover, in nearly all models the derivation of a bound for mixing coefficients needs continuity of the innovation process, which may not be appropriate in several examples. Therefore, we consider our theory a valuable addition to this existing theory even in the stationary case.
One could also think of an extension of our empirical process theory to functions which are noncontinuous with respect to . This can in principle be done by using a martingale decomposition and assuming continuity of , instead. However, one typically can only expect continuity of this functional if either already was continuous or has a continuous density. In the latter case, the sequence might also be -mixing, and a more detailed discussion about advantages of our formulation is necessary.
Acknowledgements
The authors would like to thank the associate editor and two anonymous referees for their helpful remarks which helped to provide a much more concise version of the paper.
7 Appendix
In the appendix, we present the basic ideas used to prove the maximal inequalities of Section 4. We first consider the finite version in Section 7.1 and then present a chaining approach which preserves continuity in Section 7.2.
7.1 Proof idea: A maximal inequality for finite , Theorem 4.1
We provide an approach to obtain maximal inequalities for sums of random variables , , indexed by , by using a decomposition into independent random variables. An approach with similar intentions is presented in [10] (Section 4.3 therein) for absolutely regular sequences and in [29] for Harris-recurrent Markov chains. For convenience, we abbreviate
and put .
To approximate by independent variables, we use a technique from [49] which was refined in [51]. This decomposition is much more involved than the ones for Harris-recurrent Markov chains or mixing sequences since no direct coupling method is available. Define
and
Let be arbitrary. Put and (), . Then we have
(in the case , the sum in the middle does not appear) and thus
We write
The random variables are independent if . This leads to the decomposition
(7.1)
While the first term in (7.1) can be made small by assumptions on the dependence of and by the use of a large deviation inequality for martingales in Banach spaces from [36], the second and third terms allow the application of Rosenthal-type bounds due to the independence of the summands and , respectively. Since the first term in (7.1) allows for a stronger bound in terms of than is the case for mixing, we can obtain a theory which only needs second moments of . By Assumption 2.2, we can show the following results (cf. Lemma 7.3 in the Supplementary Material and recall (4.1) for the definition of ). For each , , , ,
(7.2)
(7.3)
(7.4)
We now summarize the proof of Theorem 4.1. The detailed proof is found in the Supplementary material. Denote
The remaining terms in (7.1) are discussed similarly or are special cases. We first have
where is a martingale difference sequence with respect to . For and , define . Then
By Theorem 4.1 in [36] there exists an absolute constant such that for ,
By using (7.2) and standard techniques for the functional dependence measure, we conclude that
Summarizing the results, we obtain
(7.5) |
Regarding , we have
Using the fact that martingale difference sequences are uncorrelated, (7.3) and the simple bound , one can show that
A maximal inequality for independent random variables based on Bernstein’s inequality yields that there exists some universal constant such that
By monotonicity of the first term with respect to and
we obtain with some universal constant ,
which together with (7.5) provides the result of Theorem 4.1.
7.2 An elementary chaining approach which preserves continuity
In this section, we provide a chaining approach which preserves continuity of the functions inside the empirical process. Typical chaining approaches work with indicator functions which is not suitable for application of Theorem 4.1. We replace the indicator functions by suitably chosen truncations. For , define and the corresponding “peaky” residual via
In the following, assume that for each there exists a decomposition , where , is a sequence of nested partitions. For each and , choose a fixed element . For , define if .
Assume furthermore that there exists a sequence such that for all , . Finally, let be a decreasing sequence which will serve as a truncation sequence.
For , we use the decomposition
Since
(7.6)
we can write
(7.7) |
where
To bound , we use (i) of the following elementary Lemma 7.1, which is proved in Section 7.6 of the Supplementary Material.
Lemma 7.1.
Because the partitions are nested, we have . By Lemma 7.1 and (7.6), we have
(7.8)
Let . We then have with iterated application of (7.7) and linearity of ,
(7.9)
which in combination with (7.8) can now be used for chaining. The following lemma provides the necessary balancing between the truncated versions of and the rare events excluded. Recall that as in (1.4).
Lemma 7.2 (Compatibility lemma).
Proof of Lemma 7.2.
For , put . By Theorem 4.1 and the definition of ,
which shows (7.10). Since
for all with , it holds that
(7.12) |
If , we have
(7.13) |
In the case , the fact that is decreasing implies that is well-defined. We conclude that
(7.14)
Summarizing the results (7.13) and (7.14), we have
We conclude that
where .
Since , we have . Thus . By definition of , . Thus . By definition of , . We conclude with that
(7.15) |
7.3 Proof idea: A maximal inequality for infinite , Theorem 4.4
The details of the proof are given in Section 7.5 in the Supplementary material. In the following, we abbreviate and . Choose and .
For each , we choose a covering by brackets , such that and . We may assume w.l.o.g. that and that are nested.
In each , fix some , and define and . Put
The chaining procedure is now applied with ( from (4.6)). Choose . We then have
(7.16) |
where .
References
- [1] Radosław Adamczak. A tail inequality for suprema of unbounded empirical processes with applications to Markov chains. Electron. J. Probab., 13:no. 34, 1000–1034, 2008.
- [2] Donald W. K. Andrews and David Pollard. An introduction to functional central limit theorems for dependent stochastic processes. International Statistical Review / Revue Internationale de Statistique, 62(1):119–132, 1994.
- [3] M. A. Arcones and B. Yu. Central limit theorems for empirical and -processes of stationary mixing sequences. J. Theoret. Probab., 7(1):47–71, 1994.
- [4] István Berkes, Siegfried Hörmann, and Johannes Schauer. Asymptotic results for the empirical process of stationary sequences. Stochastic Process. Appl., 119(4):1298–1324, 2009.
- [5] Vivek S. Borkar. White-noise representations in stochastic realization theory. SIAM J. Control Optim., 31(5):1093–1102, 1993.
- [6] Svetlana Borovkova, Robert Burton, and Herold Dehling. Limit theorems for functionals of mixing processes with applications to -statistics and dimension estimation. Trans. Amer. Math. Soc., 353(11):4261–4318, 2001.
- [7] Rainer Dahlhaus and Wolfgang Polonik. Empirical spectral processes for locally stationary time series. Bernoulli, 15(1):1–39, 2009.
- [8] Rainer Dahlhaus, Stefan Richter, and Wei Biao Wu. Towards a general theory for nonlinear locally stationary processes. Bernoulli, 25(2):1013–1044, 2019.
- [9] J. Dedecker. An empirical central limit theorem for intermittent maps. Probab. Theory Related Fields, 148(1-2):177–195, 2010.
- [10] Jérôme Dedecker and Sana Louhichi. Maximal inequalities and empirical central limit theorems. In Empirical process techniques for dependent data, pages 137–159. Birkhäuser Boston, Boston, MA, 2002.
- [11] Jérôme Dedecker and Clémentine Prieur. An empirical central limit theorem for dependent sequences. Stochastic Process. Appl., 117(1):121–142, 2007.
- [12] Herold Dehling, Olivier Durieu, and Dalibor Volny. New techniques for empirical processes of dependent data. Stochastic Process. Appl., 119(10):3699–3718, 2009.
- [13] Monroe D. Donsker. Justification and extension of Doob’s heuristic approach to the Komogorov-Smirnov theorems. Ann. Math. Statistics, 23:277–281, 1952.
- [14] P. Doukhan, P. Massart, and E. Rio. Invariance principles for absolutely regular empirical processes. Ann. Inst. H. Poincaré Probab. Statist., 31(2):393–427, 1995.
- [15] Paul Doukhan. Mixing, volume 85 of Lecture Notes in Statistics. Springer-Verlag, New York, 1994. Properties and examples.
- [16] Paul Doukhan and Michael H Neumann. Probability and moment inequalities for sums of weakly dependent random variables, with applications. Stochastic Processes and their Applications, 117(7):878–903, 2007.
- [17] R. M. Dudley. Weak convergences of probabilities on nonseparable metric spaces and empirical measures on Euclidean spaces. Illinois J. Math., 10:109–126, 1966.
- [18] R. M. Dudley. Central limit theorems for empirical measures. Ann. Probab., 6(6):899–929 (1979), 1978.
- [19] R. M. Dudley. Uniform central limit theorems, volume 142 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, New York, second edition, 2014.
- [20] Olivier Durieu and Marco Tusche. An empirical process central limit theorem for multidimensional dependent data. J. Theoret. Probab., 27(1):249–277, 2014.
- [21] Richard S. Ellis and Aaron D. Wyner. Uniform large deviation property of the empirical process of a Markov chain. Ann. Probab., 17(3):1147–1151, 1989.
- [22] Christian Francq and Jean-Michel Zakoïan. Mixing properties of a general class of GARCH(1,1) models without moment assumptions on the observed process. Econometric Theory, 22(5):815–834, 2006.
- [23] Evarist Giné and Richard Nickl. Mathematical foundations of infinite-dimensional statistical models. Cambridge Series in Statistical and Probabilistic Mathematics, [40]. Cambridge University Press, New York, 2016.
- [24] Bruce E. Hansen. Uniform convergence rates for kernel estimation with dependent data. Econometric Theory, 24(3):726–748, 2008.
- [25] Lothar Heinrich. Bounds for the absolute regularity coefficient of a stationary renewal process. Yokohama Math. J., 40(1):25–33, 1992.
- [26] Hans Arnfinn Karlsen and Dag Tjøstheim. Nonparametric estimation in null recurrent time series. Ann. Statist., 29(2):372–416, 2001.
- [27] Rafał Kulik, Philippe Soulier, and Olivier Wintenberger. The tail empirical process of regularly varying functions of geometrically ergodic Markov chains. Stochastic Process. Appl., 129(11):4209–4238, 2019.
- [28] Shlomo Levental. Uniform limit theorems for Harris recurrent Markov chains. Probab. Theory Related Fields, 80(1):101–118, 1988.
- [29] Degui Li, Dag Tjøstheim, and Jiti Gao. Estimation in nonlinear regression with Harris recurrent Markov chains. Ann. Statist., 44(5):1957–1987, 2016.
- [30] Eckhard Liebscher. Strong convergence of sums of α-mixing random variables with applications to density estimation. Stochastic Processes and their Applications, 65(1):69–80, 1996.
- [31] Ulrike Mayer, Henryk Zähle, and Zhou Zhou. Functional weak limit theorem for a local empirical process of non-stationary time series and its application. Bernoulli, 26(3):1891 – 1911, 2020.
- [32] Sean Meyn and Richard L. Tweedie. Markov chains and stochastic stability. Cambridge University Press, Cambridge, second edition, 2009. With a prologue by Peter W. Glynn.
- [33] Abdelkader Mokkadem. Mixing properties of ARMA processes. Stochastic Process. Appl., 29(2):309–315, 1988.
- [34] Mina Ossiander. A central limit theorem under metric entropy with bracketing. Ann. Probab., 15(3):897–919, 1987.
- [35] Tuan D. Pham and Lanh T. Tran. Some mixing properties of time series models. Stochastic Process. Appl., 19(2):297–303, 1985.
- [36] Iosif Pinelis. Optimum bounds for the distributions of martingales in Banach spaces. Ann. Probab., 22(4):1679–1706, 1994.
- [37] David Pollard. A central limit theorem for empirical processes. J. Austral. Math. Soc. Ser. A, 33(2):235–248, 1982.
- [38] Emmanuel Rio. The functional law of the iterated logarithm for stationary strongly mixing sequences. Ann. Probab., 23(3):1188–1203, 07 1995.
- [39] Emmanuel Rio. Processus empiriques absolument réguliers et entropie universelle. Probab. Theory Related Fields, 111(4):585–608, 1998.
- [40] Emmanuel Rio. Asymptotic theory of weakly dependent random processes, volume 80 of Probability Theory and Stochastic Modelling. Springer, Berlin, 2017. Translated from the 2000 French edition [ MR2117923].
- [41] Jorge D. Samur. A regularity condition and a limit theorem for Harris ergodic Markov chains. Stochastic Process. Appl., 111(2):207–235, 2004.
- [42] Lionel Truquet. A perturbation analysis of Markov chains models with time-varying parameters. Bernoulli, 26(4):2876–2906, 2020.
- [43] A. W. van der Vaart. Asymptotic statistics, volume 3 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 1998.
- [44] Aad W. van der Vaart and Jon A. Wellner. Weak convergence and empirical processes. Springer Series in Statistics. Springer-Verlag, New York, 1996. With applications to statistics.
- [45] Michael Vogt. Nonparametric regression for locally stationary time series. Ann. Statist., 40(5):2601–2633, 2012.
- [46] Wei Biao Wu. Nonlinear system theory: another look at dependence. Proc. Natl. Acad. Sci. USA, 102(40):14150–14154, 2005.
- [47] Wei Biao Wu. Empirical processes of stationary sequences. Statistica Sinica, 18(1):313–333, 2008.
- [48] Wei Biao Wu. Asymptotic theory for stationary processes. Stat. Interface, 4(2):207–226, 2011.
- [49] Wei Biao Wu, Weidong Liu, and Han Xiao. Probability and moment inequalities under dependence. Statist. Sinica, 23(3):1257–1272, 2013.
- [50] Bin Yu. Rates of convergence for empirical processes of stationary mixing sequences. Ann. Probab., 22(1):94–116, 1994.
- [51] Danna Zhang and Wei Biao Wu. Gaussian approximation for high dimensional time series. Ann. Statist., 45(5):1895–1919, 2017.
Supplementary Material
This material contains some details of the proofs in the paper as well as the proofs of the examples.
7.4 Proofs of Section 4.1
Lemma 7.3.
Proof of Lemma 7.3.
We have for each and that
This shows the first assertion. Due to
the second assertion follows similarly. The last assertion follows from
which implies
∎
Proof of Theorem 4.1.
Denote the three terms on the right hand side of (7.1) by . We now discuss the three terms separately. First, we have
For fixed , the sequence
is a -dimensional martingale difference vector with respect to . For a vector and , write . By Theorem 4.1 in [36] there exists an absolute constant such that for ,
(7.18) |
We have
therefore both terms in (7.18) are of the same order and it is enough to bound the second term in (7.18). We have
(7.19)
Note that
(7.20)
where and .
By Jensen’s inequality, Lemma 7.3 and the fact that has the same distribution as ,
(7.21)
Inserting (7.21) into (7.19) delivers
Inserting this bound into (7.18), we obtain
We conclude with that
(7.22)
We now discuss . If are constants and , mean-zero independent variables (depending on ) with , , then there exists some universal constant such that
(7.23) |
(see e.g. [10] equation (4.3) in Section 4.1 therein).
Note that is a martingale difference sequence and . Furthermore, we have
and
thus
We conclude with the elementary inequality that
Put
Then
(7.24)
With and (7.23), we obtain
and a similar assertion for the second term ( odd) in . With (7.24), we conclude that
(7.25)
Note that
(7.26) |
Furthermore, we have by Lemma 7.4 that
(7.27)
where
(7.28) |
and the second to last equality holds since is increasing.
Since is a sum of independent variables with and , we obtain from (7.23) again
(7.30) |
If we insert the bounds (7.22), (7.29) and (7.30) into (7.1), we obtain the result (4.2).
We now show (4.3). If , we have and thus by (4.2):
(7.31)
If , we note that the simple bound
(7.32)
holds. Putting the two bounds (7.31) and (7.32) together, we obtain the result (4.3).
∎
Lemma 7.4.
Let be an increasing sequence in . Then, for any ,
Especially in the case ,
Proof of Lemma 7.4.
It holds that
∎
7.5 Proofs of Section 4.2
Proof of Theorem 4.4.
In the following, we abbreviate and . Choose and .
For each , we choose a covering by brackets , such that and . We may assume w.l.o.g. that .
If do not belong to , we can simply define new brackets by
which fulfill , and
Thus, we can add to without changing the bracketing numbers and the validity of Assumption 4.2.
We now construct inductively a new nested sequence of partitions of from in the following way: For each fixed , put
as the intersections of all previous partitions and the -th partition. Then . By monotonicity of , we have
In each , fix some , and define where . Put and
we set
(7.36) |
Put
( from Lemma 7.2). Choose . We then have
where . Due to Lemma 7.1(iii), still fulfills Assumption 4.2.
Since implies and , it holds that
By (7.8) and (7.9) and the fact that , we have the decomposition
(7.37)
We now discuss the terms , from (7.37). Therefore, put .
Since with , the class still fulfills Assumption 4.2. We conclude by Lemma 7.1(iii) that for arbitrary , the classes
fulfill Assumption 4.2.
Proof of Corollary 4.5.
Define . It is easily seen that (cf. [43], Theorem 19.5), thus
(7.47) |
Let . Define
Then obviously, is an envelope function of .
By Markov’s inequality, Theorem 4.4 and (7.47),
The first term converges to 0 by (4.7) and (4.8) for (uniformly in ).
We now discuss the second term. The continuity conditions from Assumption 4.2 and Assumption 3.2 transfer to by the inequality
We therefore have as in Lemma 7.8(ii) that for all ,
(7.48)
(7.49)
Put . Then by Lemma 7.6(ii) and (7.48),
(7.50)
Put and . By (7.49), . By the assumptions on , and . We conclude with Lemma 7.7(i) that
that is, the first summand in (7.50) tends to . Since , we obtain that (7.50) tends to . ∎
7.6 Proofs of Section 7.2
Proof of Lemma 7.1.
-
(i)
Since implies , we have
Case 1: . Then, since , we have .
Case 2: . Then .
Case 3: . Then, since , we have .
Furthermore, . -
(ii)
The first assertion is obvious. If , we have
which shows the second assertion.
-
(iii)
We will show that for all it holds that
(7.51) from which the assertion follows. For real numbers , we have
thus . This implies and therefore
For the second inequality in (7.51), note that
We therefore have
If , then
A similar result is obtained for . If , , then
A similar result is obtained for , , which proves (7.51).
∎
Lemma 7.5 (Properties of ).
is well-defined and for each , and .
Proof.
and are well-defined since is decreasing (at a rate ) and is increasing (at a rate ) and .
Let . We show that fulfills . By definition of , we obtain which gives the result. Since is decreasing, is decreasing. We conclude that
The second inequality follows from the fact that is increasing and . ∎
7.7 Proofs of Section 3
Proof of Theorem 3.4.
Denote and . Let . We use the decomposition
For fixed , put
Then, since , is a martingale difference sequence and by Lemma 7.8(i),
thus
(7.52) |
Stationary approximation: Put , where
Then we have
For each , it holds that
By Lemma 7.8, we have . Since has bounded variation uniformly in ,
By Lemma 7.8(ii),
We therefore obtain
(7.54) |
Note that
is a martingale difference sequence with respect to , and
We can therefore apply a central limit theorem for martingale difference sequences to .
The Lindeberg condition: Let . Iterated application of Lemma 7.6(i) yields that there are constants only depending on such that
For each , we have
(7.55)
where we have put
By Lemma 7.8(ii), satisfies the assumptions (7.59) of Lemma 7.7. By assumption, . With , we obtain from Lemma 7.7 that (7.55) converges to , which shows that the Lindeberg condition is satisfied.
Convergence of the variance: We have
For each , we define
Then
By Lemma 7.8(i),(ii), we have
Let . Since has bounded variation uniformly in , it follows that has bounded variation uniformly in . From we conclude .
By assumption and the Cauchy-Schwarz inequality,
It holds that , and
Thus, Lemma 7.7(ii) is applicable.
Case : If has bounded variation, we have
and thus
Here, for , we have that can be written as
which shows that the condition stated in the assumption guarantees the bounded variation of .
Case : If , then we obtain similarly
By the martingale central limit theorem and (7.53), (7.54), we obtain that
(7.56) |
Conclusion: For , we have
(7.57) |
due to
uniformly in and
and the Cramér-Wold device, the assertion of the theorem follows.
∎
Lemma 7.6.
Let , .
-
(i)
For , it holds that
-
(ii)
For random variables , it holds that
Proof of Lemma 7.6.
-
(i)
It holds that
-
(ii)
We have
(7.58) Furthermore, with Markov’s inequality,
Inserting this inequality into (7.58), we obtain the assertion.
∎
The following lemma generalizes some results from [8] using similar techniques as therein.
Lemma 7.7.
Let . Let be a stationary sequence with
(7.59) |
Let be some sequence of functions with .
-
(i)
Let . Let be some sequence with . Then
-
(ii)
Let . Suppose that there exists such that for all , implies . Put and suppose that
Suppose that the limits on the following right hand sides exist. If has bounded variation, then
If , then
Proof of Lemma 7.7.
Let be fixed and assume that . For , define . Then forms a decomposition of in the sense that . Since , we conclude that . Thus, since ,
(7.60) |
Let , be an arbitrary sequence. Then it holds that
(7.61)
- (i)
-
(ii)
Since (7.59) also holds for replaced by , we may assume in the following w.l.o.g. that .
By (7.61) applied to , we obtain
(7.64) We furthermore have
(7.65) Fix . Put and, for a real-valued positive , define . By stationarity, the following equality holds in distribution:
(7.66) Put . By partial summation and since has bounded variation uniformly in ,
(7.67) By stationarity, we have
since . By assumption, .
If has bounded variation, we have with some intermediate value ,
If instead , we have with some intermediate value ,
Since has bounded variation uniformly in ,
∎
Lemma 7.8.
Proof of Lemma 7.8.
- (i)
- (ii)
∎
7.8 Proofs of Section 5
Proof of Lemma 5.2.
Put . From (A1) and Assumption 3.1 we obtain that , .
Since is Lipschitz continuous and (A2) holds, we have
Let be a grid approximation of such that for any , there exists some such that . Since , it is possible to choose such that . Furthermore, define as an approximation of .
Put . With (A1) it is easy to see that
(7.75)
Finally, since has bounded variation and , uniformly in it holds that
(7.76) |
From (7.74), (7.75) and (7.76) we obtain
(7.77) |
where
By (A3) and (7.77) for , we obtain with standard arguments that if ,
Since is a minimizer of and is twice continuously differentiable, we have the representation
(7.78) |
where fulfills .
7.9 Form of the -norm and connected quantities
Lemma 7.9 (Summation of polynomial and geometric decay).
Let and . Then it holds that
-
(i)
-
(ii)
For ,
where , are constants only depending on .
Proof of Lemma 7.9.
-
(i)
Upper bound: If , then
If , then .
Lower bound: Using similar decomposition arguments as above, we have
-
(ii)
-
•
Exponential decay: Upper bound: First let . Then we have
where .
Lower Bound: Put . Then
where . We have and . Thus
Now consider the case , that is, . Then, , and . We obtain
that is, the assertion holds with .
-
•
Polynomial decay: Upper bound: Let . Then we have by (i):
where .
Lower Bound: Put . By (i), . Then
Elementary analysis yields that the minimum is achieved for , that is,
the assertion holds with .
-
•
∎
Lemma 7.10 (Values of , ).
-
•
Polynomial decay ). Then there exist constants , only depending on such that
and
-
•
Geometric decay (). Then there exist constants , only depending on such that
and
Proof of Lemma 7.10.
-
(i)
By Lemma 7.9(i), with , . In the following we assume w.l.o.g. that and .
-
•
Upper bound: For any ,
Especially we obtain . The assertion holds with .
-
•
Lower bound: Similarly to above,
On the other hand, , which yields the assertion with .
-
•
Upper bound: Put . Then we have
By definition of , . It was already shown in Lemma 7.5 that holds for all . We obtain the assertion with .
-
•
Lower bound: First consider the case .
Put . Since , and thus
By definition of , .
In the case , we have
thus . We conclude that the assertion holds with .
-
•
-
(ii)
We have , where . In the following we assume w.l.o.g. that .
-
•
Upper bound: Put . Define . Then we have
thus
Especially we obtain
that is, the assertion holds with .
-
•
Lower Bound: Case 1: Assume that . Define . Then , and thus
since
We have therefore shown that for ,
(7.82) Case 2: If , then , that is,
We have shown that for all ,
Since
the assertion follows with .
-
•
Upper bound: Put . Then we have
By definition of , we obtain
For , the function attains its maximum at with maximum value . Thus
that is, the assertion holds with .
-
•
Lower Bound: Put . Then
where the last step is due to for . By definition of , we obtain
For , the function attains its minimum at with minimum value . We therefore obtain
that is, the assertion holds with .
-
•
∎
Lemma 7.11 (Form of ).
-
(i)
Polynomial decay (where ): Then there exist some constants only depending on such that
-
(ii)
Geometric decay (where ): Then there exist some constants only depending on such that
Proof of Lemma 7.11.
The assertions follow from Lemma 7.9(ii) by taking . The maximum in the lower bounds is obtained due to the additional summand in .
∎
The following lemma formulates the entropy integral in terms of the well-known bracketing numbers with respect to the -norm in the case that . For this, we use the upper bounds of given in Lemma 7.11.
Lemma 7.12.
Proof of Lemma 7.12.
-
(i)
By Lemma 7.11, . We abbreviate in the following.
Let and , brackets such that . Then
Therefore, the bracketing numbers fulfill the relation
We conclude that for ,
In the last step, we used the substitution which leads to .
-
(ii)
By Lemma 7.11, with . We abbreviate in the following.
We first collect some properties of . Put . In the case , we have . In the case , we have
This shows that for all ,
(7.83) Furthermore, for ,
(7.84)
∎