Semi-G-normal: a Hybrid between Normal and G-normal (Full Version)
Abstract
The G-expectation framework is a generalization of the classical probabilistic system motivated by Knightian uncertainty, where the G-normal distribution plays a central role. However, from a statistical perspective, G-normal distributions look quite different from the classical normal ones. For instance, their uncertainty is characterized by a set of distributions which covers not only classical normals with different variances, but also additional distributions typically having non-zero skewness. The moments of G-normals are defined through a class of fully nonlinear PDEs called G-heat equations. To understand the G-normal in a probabilistic and stochastic way that is more friendly to statisticians and practitioners, we introduce a substructure called the semi-G-normal, which behaves like a hybrid between normal and G-normal: it has variance uncertainty but zero skewness. We will show that the non-zero skewness arises when we impose the G-version sequential independence on the semi-G-normal. More importantly, we provide a series of representations of random vectors with semi-G-normal marginals under various types of independence. Each of these representations under a typical order of independence is closely related to a class of state-space volatility models with a common graphical structure. In short, the semi-G-normal gives a (conceptual) transition from the classical normal to the G-normal, allowing for a better understanding of the distributional uncertainty of the G-normal and of the sequential independence.
1 Introduction
The G-expectation framework is a new generalization of the classical probabilistic system, which is aimed at dealing with random phenomena in dynamic situations where it is hard to precisely determine a unique probabilistic model. These situations are also closely related to the long-existing concern about model uncertainty in statistical practice. For instance, Chatfield, (1995) gives an overview of this concern. However, how to better connect the idea of this framework with general data practice is still a developing and challenging area that requires researchers and practitioners from different backgrounds to collaborate and reflect on the different degrees of uncertainty possibly brought by the complicated nature of the data as well as by the modeling procedure itself. To give some examples (rather than a complete list), several recent attempts have been made by Pei et al., (2021); Peng et al., (2020); Peng and Zhou, (2020); Xu and Xuan, (2019); Li, (2018) and Jin and Peng, (2016) (which has been published as Jin and Peng, (2021)). A fundamental and unavoidable problem is how to better understand the G-version distributions and independence from a statistical perspective, which also requires long-term efforts of learning, thinking and exploration. This research work can be treated as a detailed systematic report of our exploration of this basic point in the past three years to a broad community. This community includes not only experts in the area of nonlinear expectations (such as the G-expectation) but also researchers and practitioners from other related fields who may not be familiar with the theory of the G-expectation framework (G-framework) but are interested in the interplay between their areas and the G-framework, which requires them to properly understand the meanings and potentials of G-version distributions and independence. One vision of this report is to explore and understand the role of statistical methods incorporating G-version distributions or processes (with their own independence) in general data practice as well as their differences from and connections with the existing classical methods. More importantly, we intend to show how we can broaden the horizon of questions we are able to consider by introducing the notions (such as the distributions and independence) in the G-framework (this goal has been partially indicated in Section 5.5). This report is also intended to initiate an in-depth discussion on this subject with the broad community. Considering the length and scope of this work, we decide to divide our core discussions into two stages. The first stage (this paper) can be treated as a theoretical preparation for the second stage (forming a companion of this paper), which provides a series of statistical data experiments based on the theoretical results in this paper.
The main objective of this paper is to provide a better interpretation and understanding of the G-normal distribution and the G-version independence, designed for researchers and practitioners from various backgrounds who are familiar with classical probability and statistics. We will achieve this goal by introducing a new substructure called the semi-G-normal distribution, which behaves like a hybrid connecting the normal and the G-normal: it is a typical object with distributional uncertainty that preserves many properties of the classical normal but is also closely related to the G-normal distribution.
In any probabilistic framework, if there exists a “normal” distribution (or an equivalent distributional object), it should play a fundamental role in the system. How to understand and deal with the normal distribution is crucial for the development of such a framework. The G-normal distribution, as the analogue of the classical normal, plays a central and fundamental role in the development of the G-expectation framework.
1.1 Introduction to the G-expectation framework
First we give general readers a short introduction to the G-expectation framework. The classical probabilistic system is good at describing randomness under a single probability rule or model (which could be sophisticated in its form). However, in practice, there are phenomena where it is hard to precisely determine a unique probability measure to describe the randomness. In this case, we cannot ignore the uncertainty in the probability rule itself. This kind of uncertainty is often called Knightian uncertainty in economics (Knight, (1921)) or epistemic uncertainty in statistics (Der Kiureghian and Ditlevsen, (2009)). It is also commonly called model uncertainty if it refers to the uncertainty in the probabilistic model. A standard example of Knightian uncertainty comes from the Ellsberg paradox proposed by Ellsberg, (1961), showing the violation of the classical expected utility theory based on a linear expectation. In this case, we essentially need to work with a set $\mathcal{P}$ of probability measures. In order to quantify the extreme cases under $\mathcal{P}$, we need to work on a sublinear expectation defined as:
(1.1)  $\hat{\mathbb{E}}[X] := \sup_{P \in \mathcal{P}} E_P[X].$
This sublinear expectation defined in 1.1 first appeared as the upper prevision in Huber, (2004). We also call 1.1 a representation of $\hat{\mathbb{E}}$. Coherent risk measures proposed by Artzner et al., (1999) can also be represented in this form, and more details can be found in Föllmer and Schied, (2011). The notion of Choquet expectation (Choquet, (1954)) is another special type of sublinear expectation, which is the foundation of a new theory of expected utility by Schmeidler, (1989) to resolve the Ellsberg paradox in static situations. For dynamic situations, the utility theory can be developed by the sublinear version of the $g$-expectation proposed by Chen and Epstein, (2002). In principle, the $g$-expectation can only deal with those dynamic situations where we can find a reference measure to dominate $\mathcal{P}$. Nonetheless, this situation is ideal for technical convenience but also quite restrictive compared with reality: it means all the probabilities in $\mathcal{P}$ agree on the same null events. For instance, in the context of financial modeling, when there is (Knightian) uncertainty or ambiguity in the volatility process, the set $\mathcal{P}$ may not necessarily have a reference measure (Epstein and Ji, (2013)). How should we deal with a possibly non-dominated $\mathcal{P}$ in dynamic situations? It took the community many years to realize that it is necessary to jump out of the classical probability system and start from scratch to construct a new generalization of the probability framework, which was established by Peng, (2004, 2007, 2008) and further developed by the academic community led by him, called the G-expectation framework.
Since its establishment in the 2000s, the G-expectation framework has gradually developed into a new generalization of the classical one, with its own notions of independence and distributions, as well as the associated stochastic calculus. The spirit of considering a set of probability measures to characterize Knightian uncertainty is embedded into this framework from its initial setup. A distribution under the G-expectation can be represented by a family of classical distributions: it provides a convenient way to depict distributional uncertainty requiring an infinite-dimensional family of distributions, which usually may not have an explicit parametric form. More details about this framework can be found in Denis et al., (2011); Peng, (2017); Peng, 2019b .
The G-normal distribution is the analogue of the normal in this framework. As indicated by its notation, it is a typical object with variance uncertainty. In theory, it plays a central role in the context of the central limit theorem (Peng, 2019a ): it is the asymptotic distribution of the normalized sum of a sequence of independent random variables with zero mean and variance uncertainty. It has a Stein-type characterization provided by Hu et al., (2017). Fang et al., (2019) provide an insightful discrete approximation and continuous-time representation of the G-normal distribution. In practice, the G-normal has also shown its potential in the study of risk measures, such as the Value at Risk induced by the G-normal (G-VaR) rigorously constructed in Peng et al., (2020) and further developed in the recent Peng and Yang, (2020), where the G-VaR has mostly outperformed the benchmark methods in terms of the violation rate and predictive performance.
1.2 Potential misunderstandings about the G-normal and independence
Since the notions of distribution and independence in the G-expectation framework are different from the classical setup, there are several potential misunderstandings in the interpretation of the G-normal and the G-version independence. The sources of these misunderstandings can be summarized into the following four aspects (where we have also provided clarification if applicable):
- A1 (The uncertainty set of the G-normal) The G-expectation of the G-normal is defined by the (viscosity) solution of a fully nonlinear PDE (the G-heat equation), which usually does not have an explicit form except in some special cases (Hu, (2012)). In fact, following the spirit of Knightian uncertainty, a better interpretation of a G-version distribution is a family of classical distributions characterizing the distributional uncertainty. Nonetheless, for a general reader, if not careful, the notation of the G-normal could lead to the misconception that it is associated with the family $\{N(0, \sigma^2) : \sigma \in [\underline{\sigma}, \bar{\sigma}]\}$. Although this impression may still hold in special situations as shown in 3.12, it is not rigorous in general. Actually, the uncertainty set of the G-normal is much larger than this, and one piece of evidence is that the G-normal distribution has third-moment uncertainty (all its odd moments have uncertainty), while all the distributions in the family above are symmetric, implying zero third moments. It means that the uncertainty set of the G-normal contains classical elements that have non-zero third moments; a numerical sketch of this phenomenon is given after this list. This seems like a strange property for a “normal” distribution in a probabilistic system (especially when we note that the G-normal reduces to a classical normal if $\underline{\sigma} = \bar{\sigma}$). An explicit form of the uncertainty set of the G-normal is given by Denis et al., (2011).
- A2 (The missing connection between univariate and multivariate G-normal) The joint random vector formed by independent G-normal distributed random variables does not follow a multivariate G-normal (even under any invertible linear transformation of the original vector). More study of the counter-intuitive properties of the G-normal can be found in Bayraktar and Munk, (2015).
- A3 (The asymmetry of independence) The independence in this framework is asymmetric: “$Y$ is independent of $X$” does not necessarily mean “$X$ is independent of $Y$”. This is why this independence is also called sequential independence, which is different from the classical one. One interpretation of this asymmetry in the relation “$Y$ is independent of $X$” is from the temporal order: if $Y$ is realized at a time point after $X$, the roles between $X$ and $Y$ would be asymmetric (in terms of the possible dependence structure). Another interpretation is from the distributional uncertainty: any realization of $X$ has no effect on the uncertainty set of $Y$. Both of the interpretations are valid if one understands the detailed theory of this framework. However, for a general audience, both of these are still vague and may even become quite confusing if one combines them in a naive way (such as “if $Y$ happens after $X$, any realization of $X$ should have no effect on $Y$, then we automatically have one way of independence”). So far we do have a simple example showing that the independence is indeed asymmetric (Example 1.3.15 in Peng, 2019b ), but it is not clear why the independence is asymmetric in this example. To be specific, how does the distributional uncertainty of the joint vector (or the representation of its sublinear expectation) change if we switch the order of the independence? Such a representation (even in a special case) will be beneficial for a general audience to better understand the sequential independence, in the sense that they can explicitly see how the order of independence changes the underlying distributional uncertainty.
- A4 (The lack of caution before the data analysis) Suppose one intends to use the G-normal distribution to describe the distributional uncertainty in a dataset (either an artificial or a realistic one). Without enough caution, the misinterpretations of the independence and distributions in this framework mentioned above may further bring confusion or even mistakes into the data analyzing procedure.
The objectives of this paper all serve this central problem: from a statistical perspective, how to better understand the G-normal distribution and the G-version independence. The answer to this question will also lead to a better interpretation and understanding of the G-normal distribution for a general audience and for practitioners who are familiar with classical probability and statistics. We will work towards this central problem from the following four basic questions, where each question “Q[k]” corresponds to one of the aspects “A[k]” mentioned above,
- Q1 How does the third-moment uncertainty of the G-normal arise? Is it possible to use the linear expectations of classical normals to approach the sublinear expectation of the G-normal (without involving the underlying PDEs)?
- Q2 How should we appropriately connect the univariate objects and multivariate objects in this framework? Since it is hard to start from univariate G-normals to get a multivariate G-normal, is it possible for us to make a retreat at the starting point, that is, to connect univariate classical normals with a multivariate G-normal?
- Q3 How can we understand the asymmetry of the independence in this framework in terms of representations?
- Q4 What kinds of data sequences are related to the volatility uncertainty covered by the G-normal and what kinds are not?
The interpretation of the G-normal and sequential independence will also be important for theoretically investigating the reliability and robustness of risk measures derived from G-version distributions, such as the current G-VaR in the literature.
1.3 Our main tool and results in this paper
Our main tool here is a substructure called the semi-G-normal distribution (Section 3.4), which behaves like a close relative of both classical models (such as a normal mixture model) and G-version objects (such as the G-normal). We will also study the various kinds of independence associated with the semi-G-normal distributions (Section 3.6).
The notion of the semi-G-normal was first proposed in Li and Kulperger, (2018), where it was used to design an iterative approximation to the sublinear expectation of the G-normal and the solution to the G-heat equation. Later on, this substructure was further developed in the master thesis by Li, (2018), where the independence structures were proposed to better perform the pseudo simulation in this context.
This paper gives a more rigorous and systematic construction of these structures and focuses more on their distributional and probabilistic aspects to show the hybrid role of the semi-G-normal between the classical normal and the G-normal. To be specific, we will show that there exists a middle stage of independence sitting between the classical (symmetric) independence and the G-version (asymmetric) independence. It is called semi-sequential independence, which allows the connection between univariate and multivariate objects (3.22), and it is a symmetric relation between two semi-G-normal objects (3.19).
Moreover, we will provide a series of representations, in a form similar to 1.1, associated with the semi-G-normal distributions and also with random vectors with semi-G-normal marginals under various kinds of independence. Interestingly, by changing the order of the independence, we are equivalently modifying the graphical structure in the representation of the sublinear expectation of the joint vector. This idea will be shown in Section 3.7 and further studied in Section 5.3. These representations provide a more straightforward view of the order of independence in this framework, because we can see how the family of distributions is changed by the switching of the order. Under this view, we can provide a statistical interpretation of the asymmetry of sequential independence between two semi-G-normal objects (Section 4.3).
Throughout this paper, we will frequently mention the representations of the distributions in the G-framework. These representation results are crucial here, because the right-hand side of a representation is simply a family of classical models whose envelope is exactly the sublinear expectation of the G-version object. Through an intuitive representation, a person who is familiar with classical probability and statistics is able to understand the uncertainty described by the G-version objects.
The remaining content of this paper is organized as follows. Section 2 gives a basic setup of the G-expectation framework for readers to check the rigorous definitions of each concept. Section 3 presents our main results by putting readers in the context of classical state-space volatility models. This kind of story setup is especially helpful in the discussions of the representations associated with the semi-G-normal in Section 3.7. After we go through these representation results associated with this substructure, readers will find that we have already provided the answers to the four questions during the procedure. These answers will be given and elaborated in Section 4. Finally, Section 5 will summarize the whole paper and also provide more possible extensions as future developments. The proofs of our theoretical results are put into Section 6, unless a proof is beneficial to the current discussion or is short enough to be included in the main content.
2 Basic settings of the G-expectation framework
This section gives a detailed description of the basic setup (the sublinear expectation space) of the G-expectation framework for a general audience by starting from a set of probability measures (more rigorous treatments can be found in Chapter 6 of the book by Peng, 2019b ). Another equivalent way is to start from a space of random variables and a sublinear operator (more details can be found in Chapters 1 and 2 in Peng, 2019b ).
For readers who may not be familiar with this setup, the following reading order is recommended as a candidate one:
1. Take a glance at the initial setup and the meaning of notations in this section (especially the notation for independence, which is in 2.6);
2. Read through our main results (Section 3), which describe the G-version distributions mostly using representations in terms of classical objects;
3. Come back to this section to check the detailed definitions (such as the connection between the G-expectation and the solutions to a class of fully nonlinear partial differential equations).
2.1 Distributions, independence and limiting results
Let $\mathcal{P}$ denote a set of probability measures on a measurable space $(\Omega, \mathcal{F})$. Let $E_P$ denote the linear expectation under $P \in \mathcal{P}$. Consider the following spaces:
- : the space of all $\mathcal{F}$-measurable real-valued functions (or the family of random variables);
- ;
- (for );
- (for );
- .
Note that for any ,
We also have, for any ,
Definition 2.1.
(The upper expectation associated with $\mathcal{P}$) For any $X$ such that $E_P[X]$ exists for each $P \in \mathcal{P}$, we define a functional associated with the family $\mathcal{P}$ as
$\hat{\mathbb{E}}[X] := \sup_{P \in \mathcal{P}} E_P[X],$
with values in the extended real line $[-\infty, +\infty]$. We also follow the convention that, if $E_P[X]$ exists but equals $+\infty$ for some $P \in \mathcal{P}$, the supremum is taken as $+\infty$.
Definition 2.2 (The upper and lower probability).
For any $A \in \mathcal{F}$, let
$\underline{v}(A) := \inf_{P \in \mathcal{P}} P(A), \quad \bar{V}(A) := \sup_{P \in \mathcal{P}} P(A).$
The set functions $\underline{v}$ and $\bar{V}$ are respectively called the lower and upper probabilities associated with $\mathcal{P}$.
Proposition 2.3.
The space satisfies:
(1) for any constant ;
(2) If , then ;
(3) If , then ;
(4) If satisfying for any , then ;
(5) If satisfying for any , then ;
(6) For any , for .
Proof.
It is easy to check the first three properties. The logic of (4) comes from the fact that, for each , if we have , -almost surely, we must have exists. Similar logic can be applied to (5). The property (6) is a direct result of (4). ∎
Remark 2.3.1.
However, this space is not necessarily a linear space. For instance, let $\mathcal{P} = \{P\}$ and let $X$ be a Cauchy distributed random variable under $P$. We have that $X^+$ and $X^-$ belong to the space, but $X = X^+ - X^-$ does not.
By 2.3, for any , is well-defined for any . Then we can write as
We will mainly focus on the space .
Proposition 2.4.
The space is a linear space satisfying:
(1) for any constant ;
(2) If , then for any constant ;
(3) If , then ;
(4) If , then ;
(5) If , then ;
(6) If , then for any bounded Borel measurable function .
Proof.
The properties here can be checked by definition of . For instance, (3) comes from the inequality: for any . ∎
Then we can check that $\hat{\mathbb{E}}$ becomes a sublinear operator on this linear space. In other words, $\hat{\mathbb{E}}$ satisfies: for any $X, Y$ in the space,
1. (Monotonicity) If $X \geq Y$, then $\hat{\mathbb{E}}[X] \geq \hat{\mathbb{E}}[Y]$;
2. (Constant preserving) For any constant $c \in \mathbb{R}$, $\hat{\mathbb{E}}[c] = c$;
3. (Sub-additivity) $\hat{\mathbb{E}}[X + Y] \leq \hat{\mathbb{E}}[X] + \hat{\mathbb{E}}[Y]$;
4. (Positive homogeneity) For any $\lambda \geq 0$, $\hat{\mathbb{E}}[\lambda X] = \lambda \hat{\mathbb{E}}[X]$.
Then we call $\hat{\mathbb{E}}$ a sublinear expectation and the associated triple a sublinear expectation space.
Furthermore, note that is a linear subspace of . We can treat as the null space and define the quotient space . For any with representative , we can define , which is still a sublinear expectation. We can check that induces a Banach norm on . Let denote the completion of under . Since we can check that itself is a Banach space, it is equal to its completion (Proposition 14 in Denis et al., (2011)). Let , then we can check that still forms a sublinear expectation space.
Rigorously speaking, we also require additional conditions on $\mathcal{P}$, such as weak compactness, so that we have regularity on $\hat{\mathbb{E}}$ (Theorem 12 in Denis et al., (2011)). Meanwhile, there exists such a weakly compact family so that the typical G-version distributions (the maximal and G-normal distributions) exist in the space. More details can be found in Section 2.3 and Section 6.2 in Peng, 2019b . The G-expectation is defined after we construct the Brownian motion (the G-Brownian motion) in this context, but throughout this paper we will only touch the G-version distributions and independence, so the expectation so far is still a special kind of sublinear expectation, which we will still call the G-expectation to stress its typical properties allowing the existence of the G-version distributions. Throughout this section, without further notice, we will stay in this space.
For any random object, we will frequently mention a transformation $\varphi(X)$ for a function $\varphi$. Consider the following spaces of functions:
- $C_{b.Lip}$: the linear space of all bounded and Lipschitz functions;
- $C_{l.Lip}$: the linear space of functions satisfying the locally Lipschitz property, which means
$|\varphi(x) - \varphi(y)| \leq C_\varphi (1 + |x|^m + |y|^m)|x - y|$
for any $x, y$, some positive integer $m$ and a constant $C_\varphi$ depending on $\varphi$.
We will simply write $C_{b.Lip}$ or $C_{l.Lip}$ if the dimension of the domain of $\varphi$ is clear in the context by checking the dimension of the random objects.
Note that satisfies: for any , if . However, this property does not necessarily hold for any . Therefore, when we discuss the definition of distributions and independence in this framework, we will use . Later on, we will mention that this space can be extended to any for a special family of distributions and under some additional conditions.
Definition 2.5 (Distributions).
There are several notions related to the G-version distributions:
1. We call $X$ and $Y$ identically distributed, denoted by $X \overset{d}{=} Y$, if for any $\varphi \in C_{b.Lip}$,
$\hat{\mathbb{E}}[\varphi(X)] = \hat{\mathbb{E}}[\varphi(Y)].$
2. A sequence $\{X_n\}_{n=1}^\infty$ converges in distribution to $X$, denoted as $X_n \overset{d}{\to} X$, if for any $\varphi \in C_{b.Lip}$,
$\lim_{n \to \infty} \hat{\mathbb{E}}[\varphi(X_n)] = \hat{\mathbb{E}}[\varphi(X)].$
Definition 2.6 (Independence).
A random variable $Y$ is (sequentially) independent from $X$, denoted by $X \dashrightarrow Y$, if for any $\varphi \in C_{b.Lip}$,
$\hat{\mathbb{E}}[\varphi(X, Y)] = \hat{\mathbb{E}}\big[\hat{\mathbb{E}}[\varphi(x, Y)]_{x = X}\big].$
Remark 2.6.1.
(Intuition of this independence) Since both $X$ and $Y$ are treated as random objects with potential distributional uncertainty, this independence essentially concerns the relation between the distributional uncertainties of $X$ and $Y$. If we put our discussion into a context of sequential data (where the order of the data matters), this kind of independence often arises in scenarios where $X$ is realized before $Y$ and any realization of $X$ has no effect on the distributional uncertainty of $Y$.
Remark 2.6.2.
(Asymmetry of this independence) One important fact regarding this independence is that it is asymmetric: $X \dashrightarrow Y$ ($Y$ is independent from $X$) does not necessarily mean $Y \dashrightarrow X$ ($X$ is independent from $Y$), which will be illustrated by 2.7. This is the reason we also call it a sequential independence and use the notation $X \dashrightarrow Y$ to indicate the sequential order of the independence between two random objects.
Remark 2.6.3.
(Connection with the classical independence) Note that this sequential independence becomes the classical independence (which is symmetric) once $X$ and $Y$ have certain classical distributions. In other words, they can be put under a common classical probability space. In this case, $\hat{\mathbb{E}}$ reduces to a linear expectation $E$. To give readers a better understanding, without loss of generality, suppose $(X, Y)$ has a classical joint continuous distribution with density function $f$ and the marginal densities are $f_X$ and $f_Y$; we have, for any applicable $\varphi$,
$E[\varphi(X, Y)] = E\big[E[\varphi(x, Y)]_{x = X}\big] = \int \Big( \int \varphi(x, y) f_Y(y) \, dy \Big) f_X(x) \, dx.$
Therefore, we have $f(x, y) = f_X(x) f_Y(y)$, which means $X$ and $Y$ are (classically) independent.
Example 2.7 (Example 1.3.15 in Peng, 2019b ).
Consider two identically distributed random variables $X \overset{d}{=} Y$ with $\hat{\mathbb{E}}[X] = -\hat{\mathbb{E}}[-X] = 0$ and $\bar{\sigma}^2 := \hat{\mathbb{E}}[X^2] > \underline{\sigma}^2 := -\hat{\mathbb{E}}[-X^2]$. Also assume $\hat{\mathbb{E}}[|X|] > 0$ such that $\hat{\mathbb{E}}[X^+] > 0$. Then we have
$\hat{\mathbb{E}}[XY^2] = (\bar{\sigma}^2 - \underline{\sigma}^2)\hat{\mathbb{E}}[X^+] > 0$ if $Y$ is independent from $X$, whereas $\hat{\mathbb{E}}[XY^2] = 0$ if $X$ is independent from $Y$.
We will further study the interpretation of the independence (especially its asymmetric property) in Section 4.3 by giving a detailed version of 2.7 with representation theorems.
Next we give the notion of independence extended to a sequence of random variables.
Definition 2.8.
(Independence of Sequence) For a sequence $\{X_i\}_{i=1}^n$ of random variables, they are (sequentially) independent if $X_{i+1}$ is independent from $(X_1, X_2, \dots, X_i)$ for $i = 1, 2, \dots, n-1$. For notational convenience, the sequential independence of $\{X_i\}_{i=1}^n$ is denoted as
(2.1)  $X_1 \dashrightarrow X_2 \dashrightarrow \cdots \dashrightarrow X_n.$
This sequence is further identically and independently distributed if they are sequentially independent and $X_i \overset{d}{=} X_1$ for $i = 2, 3, \dots, n$. This property is called (nonlinearly) i.i.d. in short.
Remark 2.8.1.
Note that the independence 2.1 is stronger than the pairwise relation $X_i \dashrightarrow X_j$ for $i < j$.
Now we introduce two fundamental G-version distributions: the maximal and G-normal distributions. The former can be treated as an analogue of a “constant” in the classical sense. The latter is a generalization of the classical normal. We call $\bar{X}$ an independent copy of $X$ if $\bar{X} \overset{d}{=} X$ and $X \dashrightarrow \bar{X}$.
We first introduce the G-distribution, which is the joint vector of these two fundamental distributions.
Let $\mathbb{S}(d)$ denote the collection of all $d \times d$ symmetric matrices.
Proposition 2.9.
Let $G : \mathbb{R}^d \times \mathbb{S}(d) \to \mathbb{R}$ be a function satisfying: for each $(p, A), (\bar{p}, \bar{A}) \in \mathbb{R}^d \times \mathbb{S}(d)$,
(2.2)  $G(p + \bar{p}, A + \bar{A}) \leq G(p, A) + G(\bar{p}, \bar{A}), \quad G(\lambda p, \lambda A) = \lambda G(p, A) \ (\lambda \geq 0), \quad G(p, A) \leq G(p, \bar{A}) \ (A \leq \bar{A}).$
Then there exists a pair $(X, \eta)$ on some sublinear expectation space such that
(2.3)  $G(p, A) = \hat{\mathbb{E}}\big[\tfrac{1}{2}\langle AX, X \rangle + \langle p, \eta \rangle\big],$
and for any $a, b \geq 0$,
(2.4)  $(aX + b\bar{X},\, a^2\eta + b^2\bar{\eta}) \overset{d}{=} (\sqrt{a^2 + b^2}\,X,\, (a^2 + b^2)\eta),$
where $(\bar{X}, \bar{\eta})$ is an independent copy of $(X, \eta)$.
Remark 2.9.1.
The relation 2.4 is equivalent to
The proof of 2.9 is available in Section 2.3 in Peng, 2019b . Then we have the notion of the G-distribution associated with a function $G$.
Definition 2.10 (G-distribution).
A pair of random vectors $(X, \eta)$ satisfying 2.3 and 2.4 is called $G$-distributed. The sublinear expectation of the random vector above can be characterized by the solution to a parabolic partial differential equation.
Proposition 2.11.
Consider a $G$-distributed random vector $(X, \eta)$ associated with a function $G$. For any $\varphi \in C_{b.Lip}$, let
$u(t, x, y) := \hat{\mathbb{E}}[\varphi(x + \sqrt{t}\,X,\, y + t\eta)].$
Then $u$ is the unique (viscosity) solution to the following parabolic partial differential equation (PDE):
$\partial_t u - G(D_y u, D_x^2 u) = 0,$
with initial condition $u(0, x, y) = \varphi(x, y)$, where $D_y u = (\partial_{y_i} u)_{i=1}^d$ and $D_x^2 u = (\partial_{x_i x_j}^2 u)_{i,j=1}^d$. This PDE is called a $G$-equation.
Remark 2.11.1.
Readers may turn to Crandall et al., (1992) for more details on the notion of viscosity solutions. In this paper, we do not require readers’ knowledge of viscosity solutions. Moreover, the solution can be treated as a classical one when the function $G$ satisfies the strong ellipticity condition.
Next we provide a useful established property of the $G$-distributed random vector $(X, \eta)$. Suppose the following uniform integrability conditions are satisfied (proposed by Zhang, (2016)):
(2.5) |
and
(2.6) |
Then for any $\varphi \in C_{l.Lip}$ (which is a larger space than $C_{b.Lip}$), we still have $\varphi(X, \eta)$ in the completed space (which is a Banach space). (This result is provided in Section 2.5 in Peng, 2019b .) Therefore, in the following context, when we talk about test functions for a $G$-distributed random vector, we can take $\varphi \in C_{l.Lip}$.
If we pay attention to each marginal part in 2.4, we can see that $X$ is similar to a classical normal distribution while $\eta$ behaves like a constant (we do not consider the Cauchy distribution here because we assume the existence of the expectation). It turns out that $X$ follows a $G$-normal distribution and $\eta$ follows a maximal distribution.
Definition 2.12 (Maximal distribution).
A $d$-dimensional random vector $\eta$ follows a maximal distribution if, for any independent copy $\bar{\eta}$, we have
$\eta + \bar{\eta} \overset{d}{=} 2\eta.$
Another equivalent and specific definition is that $\eta$ follows the maximal distribution if there exists a bounded, closed and convex subset $\Gamma \subset \mathbb{R}^d$ such that, for any $\varphi \in C_{l.Lip}$,
$\hat{\mathbb{E}}[\varphi(\eta)] = \max_{v \in \Gamma} \varphi(v).$
Definition 2.13 (-normal distribution).
A $d$-dimensional random vector $X$ follows a $G$-normal distribution if, for any independent copy $\bar{X}$, we have
$X + \bar{X} \overset{d}{=} \sqrt{2}\,X.$
When $d = 1$, we have $X \sim N(0, [\underline{\sigma}^2, \bar{\sigma}^2])$ with variance uncertainty: $\bar{\sigma}^2 = \hat{\mathbb{E}}[X^2]$ and $\underline{\sigma}^2 = -\hat{\mathbb{E}}[-X^2]$.
Proposition 2.14 (-normal distribution characterized by the -heat Equation).
A random vector $X$ follows the $d$-dimensional $G$-normal distribution if and only if $u(t, x) := \hat{\mathbb{E}}[\varphi(x + \sqrt{t}\,X)]$ is the solution to the $G$-heat equation defined on $[0, \infty) \times \mathbb{R}^d$:
(2.7)  $\partial_t u - G(D_x^2 u) = 0, \quad u(0, x) = \varphi(x),$
where $G(A) := \frac{1}{2}\hat{\mathbb{E}}[\langle AX, X \rangle]$, which is a sublinear function characterizing the distribution of $X$. For $d = 1$, we have $G(a) = \frac{1}{2}(\bar{\sigma}^2 a^+ - \underline{\sigma}^2 a^-)$ and when $\underline{\sigma}^2 > 0$, 2.7 is also called the Black-Scholes-Barenblatt equation with volatility uncertainty.
Remark 2.14.1.
For $d = 1$, when $\underline{\sigma} = \bar{\sigma}$, the $G$-normal distribution can be treated as a classical normal because the $G$-heat equation reduces to a classical one.
Remark 2.14.2.
(Covariance uncertainty) We can use the function $G$ to characterize the $G$-normal distribution. In fact, $G$ can be further expressed as
$G(A) = \frac{1}{2} \max_{Q \in \mathcal{C}} \operatorname{tr}(AQ),$
where $\mathcal{C}$ is a collection of non-negative definite symmetric matrices, which can be treated as the uncertainty set of the covariance matrices. In this sense, we can write $X \sim N(0, \mathcal{C})$.
Proposition 2.15.
Consider $X \sim N(0, [\underline{\sigma}^2, \bar{\sigma}^2])$ and a classically distributed random variable $Z \sim N(0, 1)$. For any convex $\varphi \in C_{l.Lip}$, we have $\hat{\mathbb{E}}[\varphi(X)] = E[\varphi(\bar{\sigma} Z)]$; for any concave $\varphi$, $\hat{\mathbb{E}}[\varphi(X)] = E[\varphi(\underline{\sigma} Z)]$.
Theorem 2.16 (Law of Large Numbers).
Consider a sequence $\{X_i\}_{i=1}^\infty$ of (nonlinearly) i.i.d. random vectors satisfying
(2.8)  $\lim_{n \to \infty} \hat{\mathbb{E}}[(|X_1| - n)^+] = 0.$
Then for any continuous function $\varphi$ satisfying the linear growth condition $|\varphi(x)| \leq C(1 + |x|)$, we have
$\lim_{n \to \infty} \hat{\mathbb{E}}\Big[\varphi\Big(\frac{1}{n}\sum_{i=1}^n X_i\Big)\Big] = \max_{v \in \Gamma} \varphi(v),$
where $\Gamma$ is the bounded, closed and convex subset decided by
$\max_{v \in \Gamma} \langle p, v \rangle = \hat{\mathbb{E}}[\langle p, X_1 \rangle], \quad p \in \mathbb{R}^d.$
For $d = 1$, let $\bar{\mu} := \hat{\mathbb{E}}[X_1]$ and $\underline{\mu} := -\hat{\mathbb{E}}[-X_1]$. Then $\Gamma = [\underline{\mu}, \bar{\mu}]$, that is, we have
$\lim_{n \to \infty} \hat{\mathbb{E}}\Big[\varphi\Big(\frac{1}{n}\sum_{i=1}^n X_i\Big)\Big] = \max_{v \in [\underline{\mu}, \bar{\mu}]} \varphi(v).$
Theorem 2.17 (Central Limit Theorem).
Consider a sequence $\{X_i\}_{i=1}^\infty$ of (nonlinearly) i.i.d. random vectors satisfying the mean-certainty condition $\hat{\mathbb{E}}[X_1] = -\hat{\mathbb{E}}[-X_1] = 0$ and
(2.9)  $\lim_{n \to \infty} \hat{\mathbb{E}}[(|X_1|^2 - n)^+] = 0.$
Then for any continuous function $\varphi$ satisfying the linear growth condition $|\varphi(x)| \leq C(1 + |x|)$,
$\lim_{n \to \infty} \hat{\mathbb{E}}\Big[\varphi\Big(\frac{1}{\sqrt{n}}\sum_{i=1}^n X_i\Big)\Big] = \hat{\mathbb{E}}[\varphi(X)],$
where $X$ is a $G$-normally distributed random vector characterized by the sublinear function $G$ defined as
$G(A) := \frac{1}{2}\hat{\mathbb{E}}[\langle A X_1, X_1 \rangle], \quad A \in \mathbb{S}(d).$
For $d = 1$, let $\bar{\sigma}^2 := \hat{\mathbb{E}}[X_1^2]$ and $\underline{\sigma}^2 := -\hat{\mathbb{E}}[-X_1^2]$. Then we have that $\frac{1}{\sqrt{n}}\sum_{i=1}^n X_i$ converges in distribution to $N(0, [\underline{\sigma}^2, \bar{\sigma}^2])$.
Proposition 2.18.
Consider a sequence and satisfying
for any . If the convergence holds for any , then it also holds for .
Remark 2.18.1.
2.18 is a direct result of Lemma 2.4.12 in Peng, 2019b . It is useful when we need to extend the function space of $\varphi$ to discuss the convergence in distribution.
2.2 Basic results on independence of sequence
We prepare several basic results on sequential independence between random vectors in the -framework:
- 2.19 gives a general result showing that the sequential independence between two random vectors implies the independence between their sub-vectors.
- 2.20 shows that the sequential independence of a sequence implies the independence of any sub-sequence.
- 2.23 shows that, under the sequential independence of a sequence, any two non-overlapping subvectors have the sequential independence (as long as the original order is kept).
These results are useful for the discussions in Section 3.6. We provide the proofs for the convenience of general readers and to help them better understand how to deal with the sequential independence.
Proposition 2.19.
For any subsequences and satisfying and , we have the general result that
Proof.
For any applicable test function , define another function on a larger space by
then
Proposition 2.20.
For any subsequence satisfying , we have the result that
Proof.
It is equivalent to prove for any . For any , by the definition of independence of the full sequence , we have
From 2.19, we directly have the sequential independence for the subvectors:
The following 2.21 and 2.22 will be useful in our later discussion, where the dimensions of the three objects could be arbitrary finite numbers.
Lemma 2.21.
If , then .
Proof.
Let Then we can check
where (1) is due to , (2) comes from and (3) comes from . ∎
Lemma 2.22.
If and , we have .
Proof.
Let Then
where (1) comes from , (2) comes from and (3) comes from . ∎
Proposition 2.23.
If , for any increasing subsequence , we have
Proof.
Let Then we have . Our goal is to show for any ,
(2.10) |
Then we can proceed by math induction. Let . The result 2.10 holds when because we directly have
by the definition of . Suppose 2.10 holds for . We need to show the case with :
(2.11) |
Let
Then we have by the definition of We also have by the result for . Then we can follow the same logic of 2.21 to show
which is exactly 2.11. The proof is finished by math induction. ∎
Proposition 2.24.
The following two statements are equivalent:
(1) ,
(2) , and .
Proof.
Since (1) implies (2), so we only need to show the other direction. By the definition of (1), we simply need to check:
This is a direct consequence of 2.22 by letting , and . ∎
Proposition 2.25.
For any , we have as long as either one of the following conditions holds:
1. ;
2. ;
3. is independent from ;
4. is independent from .
Proof.
We only need to show Condition 1 and 3. Under Condition 1, we have
Under condition 3, we have
2.26 is an important result that shows the asymmetry of independence between two random objects prevails in this framework except when their distributions are maximal or classical ones.
Theorem 2.26 (Hu and Li, (2014)).
For two non-constant random variables $X$ and $Y$, if $X$ and $Y$ are mutually independent ($X \dashrightarrow Y$ and $Y \dashrightarrow X$), then they belong to either of the following two cases:
1. The distributions of $X$ and $Y$ are classical (no distributional uncertainty);
2. Both $X$ and $Y$ are maximally distributed.
We can also easily obtain the following result.
Proposition 2.27.
For two non-constant random variables, if they belong to either of the two cases in 2.26, then independence in one order implies independence in the other order.
Proof.
When are classically distributed, the results can be derived from 2.6.3. When they are maximally distributed, this result has been studied in Example 14 in Hu and Li, (2014) and it has been generalized to 3.6 whose proof is in Section 6.1. We sketch the proof here to show the intuition for general readers. Suppose and where and are two bounded, closed and convex sets. If , for any , we can work on the expectation of to show the other direction of independence,
where we have used the fact that if (to apply the representation), which is validated by 6.1 in the proof of 3.6. Hence, we have . ∎
3 Our main results: the semi-G-normal and its representations
This section serves two objectives. On the one hand, we will introduce a new substructure called the semi-G-normal distribution and explain its hybrid property and intermediate role sitting between the classical normal and the G-normal distribution. On the other hand, this section is also designed to give general readers a gentle trip towards the G-normal distribution by starting from our old friend, the classical normal distribution.
Although most of the theoretical results presented in this section are in the sublinear expectation space by default unless indicated otherwise in the context, we will introduce most of the subsections by starting from a discussion of the distributional uncertainty of a random object in a classical state-space volatility model, whose context will be set up in Section 3.1.
Without further notice, these are the notations we are going to consistently use in this paper:
- : the set of all positive integers.
- and .
- Let $I_n$ denote an $n \times n$ identity matrix.
- In the classical probability space, let $E_P$ denote the linear expectation with respect to $P$; we may write it as $E$ if the underlying $P$ is clear from the context.
- Random variables in the sublinear expectation space: , , , .
- Random variables in the classical probability space: , , . Note that we can treat a classical normal as a random object in both the sublinear and classical systems due to 2.14.1.
The reason we use two different sets of random variables in two spaces is mainly for simplicity of our discussion, which will be further explained in 3.2.2.
3.1 Setup of a story in a classical filtered probability space
In the classical probability space, consider $\{\varepsilon_t\}$ as a sequence of classically i.i.d. random variables satisfying $\varepsilon_t \sim N(0, 1)$ for each $t$. Let $\{\sigma_t\}$ be a sequence of bounded random variables which can be treated as states (or volatility regimes) with state space $[\underline{\sigma}, \bar{\sigma}]$. Let $\{Y_t\}$ with $Y_t = \sigma_t \varepsilon_t$ denote the observation sequence. (It seems like a zero-delay setup, but this is not essential in our current scope of discussion.) Consider a representative time point $t$.
For simplicity of discussion, at each time point $t$, we assume that $\sigma_t$ has support in $[\underline{\sigma}, \bar{\sigma}]$ and that $\sigma_t$ and $\varepsilon_t$ are classically independent. Consider the (discrete-time) filtrations generated respectively by the state and observation sequences, each completed by the collection of $P$-null sets. In a classical filtered probability space, we will start the following subsections by putting ourselves, as a group of data analysts, in the context of dealing with uncertainty about the distributions of one state variable $\sigma_t$, one observation variable $Y_t$, and a sequence of observation variables $\{Y_s\}_{s \leq t}$.
3.2 Preparation: properties of maximal distribution
Suppose we have uncertainty about the distribution of the state variable $\sigma_t$ (and this is realistic because $\sigma_t$ is not directly observable in practice) due to lack of prior knowledge. Another possible situation is that different members in our group have different beliefs about the behavior of $\sigma_t$ or different preferences in the choice of the model: the distribution of $\sigma_t$ could be a degenerate, discrete, absolutely continuous or even arbitrary one with support on $[\underline{\sigma}, \bar{\sigma}]$. In order to quantify this kind of model uncertainty for a given transformation $\varphi$ (as a test function), we usually need to involve the maximum expected value of $\varphi(\sigma_t)$:
(3.1)  $\sup_{\nu \in \mathcal{D}} E[\varphi(\nu)],$
where the family $\mathcal{D}$ can be chosen depending on the available prior information. Possible choices of $\mathcal{D}$ include,
- $\mathcal{D}_1$: the space of all classically distributed random variables with support on $[\underline{\sigma}, \bar{\sigma}]$;
- $\mathcal{D}_2$: the subspace of $\mathcal{D}_1$ with discrete distributions;
- $\mathcal{D}_3$: the subspace of $\mathcal{D}_1$ with absolutely continuous distributions;
- $\mathcal{D}_4$: the family of all random variables following a degenerate (or Dirac) distribution with mass point at some $v \in [\underline{\sigma}, \bar{\sigma}]$.
We are going to show that, for all of these choices, 3.1 will be the same as the sublinear expectation of the maximal distribution in the G-framework.
Definition 3.1.
(Univariate Maximal Distribution) In the sublinear expectation space, a random variable $V$ follows the maximal distribution $M[\underline{\sigma}, \bar{\sigma}]$ if, for any $\varphi \in C_{l.Lip}$,
$\hat{\mathbb{E}}[\varphi(V)] = \max_{v \in [\underline{\sigma}, \bar{\sigma}]} \varphi(v).$
Remark 3.1.1.
We can take the maximum because we are working with a continuous $\varphi$ on the compact set $[\underline{\sigma}, \bar{\sigma}]$.
The reason we use the notation $[\underline{\sigma}, \bar{\sigma}]$ (which is like an interval for the standard deviation) is the convenience of our later discussion.
Proposition 3.2 (Representations of Univariate Maximal Distribution).
Consider , then for any , we have and
(3.2) | ||||
(3.3) | ||||
(3.4) | ||||
(3.5) |
Remark 3.2.1.
Note that . The probability laws associated with are equivalent, but and do not have this property.
Remark 3.2.2.
In 3.2, we write the representation in the form of
(3.6)  $\hat{\mathbb{E}}[\varphi(V)] = \sup_{\nu \in \mathcal{D}} E[\varphi(\nu)],$
where $\mathcal{D}$ denotes a family of random variables in the classical probability space. Equivalently, if we treat both sides as functionals of distributions (which requires a more careful preliminary setup that we will not touch at this stage; more details can be found in Chapter 6 of Peng, 2019b ), 3.6 becomes
(3.7)  $\hat{\mathbb{E}}[\varphi(V)] = \sup_{\mu \in \mathcal{M}} \int \varphi \, d\mu,$
where $\mu$ is the distribution of $\nu$ and $\mathcal{M}$ becomes a family of distributions. Throughout this paper, we prefer to use the form 3.6 for simplicity of notation and minimization of the technical setup, but readers can always informally view 3.6 as an equivalent form of 3.7. In this way, we can better see the distributional uncertainty of $V$.
Remark 3.2.3.
Meanwhile, 3.2 provides four ways to represent the distributional uncertainty of the maximal distribution. In practice, practitioners may choose the representation they need depending on the available prior knowledge or their belief about the random phenomenon.
Definition 3.3.
(Multivariate Maximal Distribution) In the sublinear expectation space, a random vector $V$ follows a (multivariate) maximal distribution $M(\Gamma)$ if there exists a compact and convex subset $\Gamma$ satisfying: for any $\varphi \in C_{l.Lip}$,
$\hat{\mathbb{E}}[\varphi(V)] = \max_{v \in \Gamma} \varphi(v).$
One can also easily extend the representation in 3.2 to the multivariate case (3.4) by considering the space of all classically distributed random vectors with support on $\Gamma$, as well as its subspaces of discrete, absolutely continuous and degenerate distributions.
Proposition 3.4 (Representations of multivariate maximal distribution).
For , we have for any ,
(3.8) |
where the family can be chosen from the multivariate analogues of $\mathcal{D}_1$–$\mathcal{D}_4$, and the supremum can be changed to a maximum except for the absolutely continuous family.
Proof.
It can be extended from the proof of 3.2. ∎
Next we provide a property for multivariate maximal distribution under transformations.
Proposition 3.5.
Suppose $V \sim M(\Gamma)$. Then for any locally Lipschitz function $f$, we have
$f(V) \sim M(f(\Gamma)),$
where $f(\Gamma) := \{f(v) : v \in \Gamma\}$.
Remark 3.5.1.
Next we give a connection between univariate and multivariate maximal distribution.
Proposition 3.6.
(The relation between the multivariate and the univariate maximal distribution) Consider a sequence of maximally distributed random variables $V_i \sim M[\underline{\sigma}_i, \bar{\sigma}_i]$, $i = 1, 2, \dots, n$; then the following three statements are equivalent:
(1) $V_1, V_2, \dots, V_n$ are sequentially independent;
(2) the same sequential independence holds for any permutation of $V_1, V_2, \dots, V_n$;
(3) $(V_1, V_2, \dots, V_n) \sim M([\underline{\sigma}_1, \bar{\sigma}_1] \times [\underline{\sigma}_2, \bar{\sigma}_2] \times \cdots \times [\underline{\sigma}_n, \bar{\sigma}_n])$, where the operation $\times$ is the Cartesian product.
Remark 3.6.1.
3.6 shows that the sequential independence between maximal distributions can be arbitrarily switched without changing the joint distribution, which is a maximal distribution supported on an $n$-dimensional rectangle. Reversely speaking, if a random vector follows a maximal distribution concentrated on such a rectangle, this implies the sequential independence among its components.
As a special case of 3.6, for two maximally distributed random variables, independence in one order implies independence in the other order. In fact, 2.26 given by Hu and Li, (2014) shows that for two non-constant, non-classically distributed random objects, this kind of mutual independence only holds for maximal distributions. The asymmetry of sequential independence prevails among the distributions in the G-expectation framework.
3.3 Preparation: setup of a product space (a newly added part)
We start from a set of probability measures $\mathcal{P}_1$ and a single probability measure $P_2$, where $P_2$ does not have to be in $\mathcal{P}_1$. Then we have the associated sublinear expectation spaces. Note that $E_{P_2}$, as a linear operator, can be treated as a degenerate sublinear expectation. We may simply write the linear expectation as $E$ if the probability measure is clear from the context. Since $E_{P_2}$ is a linear expectation, the distributions under it can be treated as classical ones, which we assume include the common classical distributions (such as the classical normal). We also assume $\mathcal{P}_1$ is designed such that the G-distribution exists in its space. Then we can combine these two spaces into a product space. It also forms a sublinear expectation space. More details on this notion of product space can be found in Peng, 2019b (Section 1.3).
For readers’ convenience, here we provide a brief description of this product space.
1. The product sample space is defined as the Cartesian product of the two sample spaces.
2. For a random variable on the product space, the sublinear expectation is defined as a supremum over the product measures $P \otimes P_2$, where $P \otimes P_2$ is the product measure of $P \in \mathcal{P}_1$ and $P_2$. (A corresponding caution about the order of integration is needed here due to the sublinearity of $\hat{\mathbb{E}}$.)
Proposition 3.7.
For a random variable on and on , by letting and , we have the following results:
1. ,
2. for ,
3. For any ,
4. For any ,
5. .
Proof.
Item 1 and 2 are obvious to see. For Item 3, we have
Similarly, we can show Item 4. For Item 5, our goal is to show
We can see that the equation above holds by evaluating the right-hand side step by step with $\omega = (\omega_1, \omega_2)$.
Remark 3.7.1.
In the following context, without further notice, we will not distinguish (or ) with (or ). Moreover, we can see that, by letting so that , from Item 3, we have
where is any product measure where . By making into , we can show that for any ,
or simply It means that the probability law of is always under each product measure .
Let denote a subspace of the product space mentioned above:
For any , by the representation of maximal distribution, we have
3.4 Univariate semi-G-normal distribution
Recall the story setup in Section 3.1. Note that $Y_t = \sigma_t \varepsilon_t$ can be treated as a normal mixture with scaling latent variable $\sigma_t$. For simplicity of discussion, we have assumed $\varepsilon_t \sim N(0, 1)$ and $\sigma_t \in [\underline{\sigma}, \bar{\sigma}]$. Suppose we are further faced with uncertainty about the distribution of $\sigma_t$. Then the maximum expected value under this distributional uncertainty is
(3.9)  $\sup_{\nu \in \mathcal{D}} E[\varphi(\nu \varepsilon_t)],$
where the choice of $\mathcal{D}$ is the same as in Section 3.2. It turns out that, for any of these choices, 3.9 can be expressed as the sublinear expectation of a semi-G-normal distribution (3.8).
To begin with, note that $N(0, [\sigma^2, \sigma^2])$ can be treated as the same as the classical distribution $N(0, \sigma^2)$ due to 2.14.1. Therefore, we can also say $\varepsilon \sim N(0, [1, 1])$ in the sublinear expectation space. In the following context, we will not distinguish between the two. Similarly, a standard multivariate normal can be treated as both a classical distribution and a degenerate version of a multivariate G-normal.
Definition 3.8 (Univariate semi-G-normal distribution).
For any $0 \leq \underline{\sigma} \leq \bar{\sigma}$, we say $W$ follows a semi-G-normal distribution if there exist $\varepsilon \sim N(0, 1)$ and $\sigma \sim M[\underline{\sigma}, \bar{\sigma}]$ with $\sigma \dashrightarrow \varepsilon$, such that
(3.10)  $W = \sigma \varepsilon,$
where the direction of independence cannot be reversed. It is denoted as $W \sim \hat{N}(0, [\underline{\sigma}^2, \bar{\sigma}^2])$.
Remark 3.8.1.
Remark 3.8.2.
(Why we cannot reverse the direction of independence) There are two reasons:
1. The sublinear expectation will essentially change if we do so: the resulting distribution will be different. For instance, if we assume $\varepsilon \dashrightarrow \sigma$ and let $\tilde{W} := \sigma \varepsilon$, we have
$\hat{\mathbb{E}}[\tilde{W}] = E[\bar{\sigma} \varepsilon^+ - \underline{\sigma} \varepsilon^-] = \frac{\bar{\sigma} - \underline{\sigma}}{\sqrt{2\pi}},$
and similarly,
$-\hat{\mathbb{E}}[-\tilde{W}] = -\frac{\bar{\sigma} - \underline{\sigma}}{\sqrt{2\pi}},$
where $\varepsilon^+ := \max(\varepsilon, 0)$ and $\varepsilon^- := \max(-\varepsilon, 0)$. We can see that $W$ and $\tilde{W}$ already exhibit their difference in the first moment: $W$ has certain mean zero but $\tilde{W}$ has mean uncertainty.
2. As we further proceed in this paper, we will see that the properties of $W$ are closely related to the random vector $(\sigma, \varepsilon)$ in its decomposition 3.10 (such as the results in Section 3.6). The following 3.9 guarantees the uniqueness of such a decomposition.
Proposition 3.9 (The uniqueness of decomposition).
Theorem 3.10 (Representations of the univariate semi-G-normal).
Consider two classically distributed random variables $\varepsilon \sim N(0, 1)$ and $\nu$ with support in $[\underline{\sigma}, \bar{\sigma}]$ satisfying the classical independence of $\nu$ and $\varepsilon$. For any $\varphi \in C_{l.Lip}$, we have $\hat{\mathbb{E}}[\varphi(W)] < \infty$ and
(3.11)  $\hat{\mathbb{E}}[\varphi(W)] = \sup_{\nu \in \mathcal{D}_1} E[\varphi(\nu \varepsilon)]$
(3.12)  $= \sup_{\nu \in \mathcal{D}_2} E[\varphi(\nu \varepsilon)]$
(3.13)  $= \sup_{\nu \in \mathcal{D}_3} E[\varphi(\nu \varepsilon)]$
(3.14)  $= \max_{\nu \in \mathcal{D}_4} E[\varphi(\nu \varepsilon)] = \max_{\sigma \in [\underline{\sigma}, \bar{\sigma}]} E[\varphi(\sigma \varepsilon)],$
where $\mathcal{D}_1, \mathcal{D}_2, \mathcal{D}_3, \mathcal{D}_4$ are the same as the ones in 3.2.
The proof of 3.10 is closely related to the representation of maximal distribution. First we need to prepare the following lemma.
Lemma 3.11.
For any fixed , let with . Then we have .
Proof of 3.10.
Under the sequential independence , for any , we have
First we have by 3.11. Then we can use 3.1 to show the finiteness of due to the continuity of : Next we check each representation in 3.10 by applying the associated representation of maximal distribution in 3.2. For instance, we can show 3.13 based on 3.4:
where we use the fact that and 2.6.3. ∎
Remark 3.11.1.
Remark 3.11.2.
Let $F_\nu$ denote the cumulative distribution function of $\nu$, ranging over the chosen family of distributions on $[\underline{\sigma}, \bar{\sigma}]$. Then we can apply the classical Fubini theorem in the evaluation of the right-hand side in 3.10 to get a more explicit form of the representation:
$\hat{\mathbb{E}}[\varphi(W)] = \sup_{F_\nu} \int_{\underline{\sigma}}^{\bar{\sigma}} \Big( \int_{\mathbb{R}} \varphi(vz) \, d\Phi(z) \Big) dF_\nu(v),$
where $\Phi$ is the standard normal distribution function and $F_\nu$ can be chosen from the families in 3.2.
Remark 3.11.3 (Why is it called a “semi” one?).
The essential reason is that the uncertainty set of distributions associated with the semi-G-normal is smaller than that of the G-normal. Let $W \sim \hat{N}(0, [\underline{\sigma}^2, \bar{\sigma}^2])$ and $X^G \sim N(0, [\underline{\sigma}^2, \bar{\sigma}^2])$. In fact, we have the following existing result: for any $\varphi \in C_{l.Lip}$,
(3.15)  $\hat{\mathbb{E}}[\varphi(W)] \leq \hat{\mathbb{E}}[\varphi(X^G)],$
which can be proved by applying the comparison theorem of parabolic partial differential equations (in Crandall et al., (1992)) to the associated G-heat and classical heat equations with initial condition $\varphi$ (the inequality becomes a strict one when $\varphi$ is neither convex nor concave). For readers’ convenience, the result 3.15 is included in Section 2.5 in Peng, 2019b . Meanwhile, we have the representation of $\hat{\mathbb{E}}[\varphi(X^G)]$ from a set of probability measures,
$\hat{\mathbb{E}}[\varphi(X^G)] = \sup_{P \in \mathcal{P}^G} E_P[\varphi(X^G)],$
where $\mathcal{P}^G$ characterizes the distributional uncertainty of $X^G$. Hence, 3.15 tells us that the uncertainty set of the semi-G-normal is contained in that of the G-normal. A more explicit discussion of this distinction will be provided in 3.24.6.
Remark 3.11.4 (The distribution of $\varepsilon$).
In principle, the distribution of $\varepsilon$ can be changed to any other type of classical distribution with a finite moment generating function, and all the related results, like the representations, will still hold. We choose the standard normal because we are working on an intermediate structure between the normal and the G-normal. Another reason comes from the following 3.12.
Proposition 3.12 (A special connection between the semi-G-normal and G-normal distributions).
Let $W \sim \hat{N}(0, [\underline{\sigma}^2, \bar{\sigma}^2])$ and $X^G \sim N(0, [\underline{\sigma}^2, \bar{\sigma}^2])$. For $\varphi \in C_{l.Lip}$, when $\varphi$ is convex or concave, we have
$\hat{\mathbb{E}}[\varphi(W)] = \hat{\mathbb{E}}[\varphi(X^G)].$
3.5 Multivariate semi-G-normal distribution
The definition of the semi-G-normal distribution can be naturally extended to the multi-dimensional situation. Intuitively speaking, the multivariate semi-G-normal distribution can be treated as an analogue of the classical multivariate normal distribution, which can be written as:
(3.16)  $N(0, \Sigma) = \Sigma^{1/2} N(0, I_d),$
where $I_d$ is a $d \times d$ identity matrix and $\Sigma$ is the covariance matrix.
Let $\mathbb{S}_d^+$ denote the family of $d \times d$ real-valued symmetric positive semi-definite matrices. Consider a bounded, closed and convex subset $\mathcal{C} \subset \mathbb{S}_d^+$. For any element $C \in \mathcal{C}$, it has a non-negative symmetric square root denoted as $C^{1/2}$. Let $\mathcal{C}^{1/2}$ denote the set of $C^{1/2}$ with $C \in \mathcal{C}$. Then we can treat $C$ as the covariance matrix of a classical multivariate normal distribution due to 3.16 and $\mathcal{C}$ as a collection of covariance matrices. Note that $\mathcal{C}^{1/2}$ is still a bounded, closed and convex set. Then a matrix-valued maximal distribution can be directly extended from 3.3.
Definition 3.13 (Multivariate semi-G-normal distribution).
Let a bounded, closed and convex subset $\mathcal{C} \subset \mathbb{S}_d^+$ be the uncertainty set of covariance matrices. In a sublinear expectation space, a $d$-dimensional random vector $W$ follows a (multivariate) semi-G-normal distribution, denoted by $W \sim \hat{N}(0, \mathcal{C})$, if there exist a (degenerate) G-normal distributed $d$-dimensional random vector
$\varepsilon \sim N(0, I_d)$
and a $d \times d$-dimensional maximally distributed random matrix
$\Lambda \sim M(\mathcal{C}^{1/2})$
with $\varepsilon$ independent from $\Lambda$, expressed as $\Lambda \dashrightarrow \varepsilon$, such that
$W = \Lambda \varepsilon,$
where the direction of independence here cannot be reversed.
Remark 3.13.1.
The existence of the multivariate semi-G-normal distribution comes from the same logic as 3.8.1 (by using the existence of the G-distribution in a multivariate setup).
Remark 3.13.2.
Similar to the discussions in Section 3.2, we can extend the notion of the semi-G-normal distribution and its representation to the multivariate situation.
Theorem 3.14.
(Representation of the multivariate semi-G-normal distribution) Consider the random vector in 3.13. For any applicable $\varphi$, we have the finiteness of its sublinear expectation and
(3.17) |
where the family can be chosen from the multivariate analogues of $\mathcal{D}_1$–$\mathcal{D}_4$, and the supremum can be changed to a maximum except for the absolutely continuous family.
Proof of 3.14.
Remark 3.14.1.
3.14 means that there are several ways to interpret the distributional uncertainty of the multivariate semi-G-normal:
- it can be described as a collection of classical multivariate normals with constant covariance matrices ranging in $\mathcal{C}$;
- it can be described as a collection of classical multivariate normal mixture distributions with (discretely, absolutely continuously, arbitrarily) distributed random covariance matrices (as latent scaling variables) ranging in $\mathcal{C}$.
By using 3.14, we can conveniently study the covariance uncertainty between the marginals of $W$. First, we can define the upper and lower covariances between the marginals of $W$ (note that each marginal has certain mean zero) as
$\overline{\operatorname{Cov}}(W_i, W_j) := \hat{\mathbb{E}}[W_i W_j]$
and
$\underline{\operatorname{Cov}}(W_i, W_j) := -\hat{\mathbb{E}}[-W_i W_j].$
Then these two quantities turn out to be closely related to $\mathcal{C}$, as illustrated in the following result.
Proposition 3.15 (Upper and lower covariances between semi-G-normal marginals).
For each $(i, j)$, let $C_{ij}$ denote the $(i, j)$-th entry of $C \in \mathcal{C}$. Then we have
$\overline{\operatorname{Cov}}(W_i, W_j) = \max_{C \in \mathcal{C}} C_{ij} \quad \text{and} \quad \underline{\operatorname{Cov}}(W_i, W_j) = \min_{C \in \mathcal{C}} C_{ij}.$
Proof.
For each , let . Then it is obvious that . For each , let
Then by applying 3.14,
Similarly we can show . ∎
3.6 Three types of independence related to the semi-G-normal distribution
Besides the existing G-version independence (also called sequential independence) in 2.8, this substructure of the semi-G-normal distribution also provides the possibility to study finer structures of independence in this framework; interestingly, we will show in Section 3.7 that each type of independence is related to a family of state-space volatility models.
We will introduce three types of independence regarding semi-G-normal distributions. Readers may recall the notation for the independence of a sequence (2.8). Throughout this section, we assume $W_i = \sigma_i \varepsilon_i$ for $i = 1, 2, \dots, n$ is a sequence of semi-G-normally distributed random variables, where accordingly $\varepsilon_i \sim N(0, 1)$ and $\sigma_i \sim M[\underline{\sigma}, \bar{\sigma}]$. Let $\vec{\sigma} := (\sigma_1, \sigma_2, \dots, \sigma_n)$ and $\vec{\varepsilon} := (\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n)$.
The identity of the variance intervals is not essential, and the results in this section can be easily generalized to the case of different intervals $[\underline{\sigma}_i, \bar{\sigma}_i]$.
Definition 3.16.
For a sequence of semi-G-normal distributed random variables $\{W_i\}_{i=1}^n$, we have three types of independence:
1. $\{W_i\}_{i=1}^n$ are semi-sequentially independent if:
(3.18)
2. $\{W_i\}_{i=1}^n$ are sequentially independent if:
(3.19)
3. $\{W_i\}_{i=1}^n$ are fully-sequentially independent if:
(3.20)
Remark 3.16.1 (Compatibility with the definition of the semi-G-normal).
The requirement of independence to form the semi-G-normal distribution is simply the independence within each pair $(\sigma_i, \varepsilon_i)$, which is guaranteed by all three types of independence by 2.19. Furthermore, for two semi-G-normal objects $W_1$ and $W_2$, we can see that sequential independence of the joint pairs implies the corresponding relation for the products, which further indicates the sequential independence of $W_1$ and $W_2$. However, sequential (or fully-sequential) independence does not imply semi-sequential independence, since the latter actually reverses the order of independence between $\varepsilon_1$ and $\sigma_2$ in the former.
Remark 3.16.2.
(Existence of these types of independence) It comes from the same logic used in 3.8.1, due to the existence of sequentially independent G-distributed random vectors.
Theorem 3.17.
The fully-sequential independence of $\{W_i\}_{i=1}^n$ can be equivalently defined as:
(F1) The pairs $(\sigma_i, \varepsilon_i)$, $i = 1, 2, \dots, n$, are sequentially independent;
(F2) The elements within each pair satisfy $\sigma_i \dashrightarrow \varepsilon_i$.
Remark 3.17.1.
We add the condition (F2) only to stress the intrinsic requirement of independence from the definition of the semi-G-normal. The main requirement of fully-sequential independence is (F1). This is also the reason why fully-sequential independence is stronger than sequential independence: the latter only involves the products $W_i = \sigma_i \varepsilon_i$, while the former is about the joint vectors $(\sigma_i, \varepsilon_i)$.
The fully-sequential independence is a stronger version of the sequential independence, and it does not exhibit much difference from the sequential independence in our current scope of discussion (which will be illustrated by 3.24).
Hence, the key new type of independence here is the semi-sequential independence, which is different from the sequential independence and also leads to a different joint distribution of the sequence. We will study the properties and behaviours of the semi-G-normal under semi-sequential independence, under which some of the intuitive properties we have in the classical situation are preserved. First of all, it is actually a symmetric independence among objects with distributional uncertainty (3.19). This symmetry makes it different from the sequential independence, although it is defined through sequential relations. Moreover, the joint vector of semi-sequentially independent semi-G-normals follows a multivariate semi-G-normal. This actually provides a view on how to connect univariate and multivariate objects (under distributional uncertainty), which is a non-trivial task for the G-normal distribution. It further provides a path to start from univariate classical normals to approach a multivariate G-normal (by using the multivariate semi-G-normal as a middle stage). This idea will be further illustrated in Section 4.2, and a numerical sketch contrasting these types of independence is given below.
We call it “semi-sequential” independence because the only “sequential” requirement in the independence is between the $\vec{\sigma}$ part and the $\vec{\varepsilon}$ part, while the sequential order within each vector is inessential in the sense that it can be arbitrarily switched. 3.18 elaborates this point by giving an equivalent definition.
Theorem 3.18.
The semi-sequential independence of $\{W_i\}_{i=1}^n$ can be equivalently defined as:
(S1) The $\vec{\varepsilon}$ part is independent from the $\vec{\sigma}$ part: $\vec{\sigma} \dashrightarrow \vec{\varepsilon}$;
(S2) The elements in the $\vec{\sigma}$ part are sequentially independent: $\sigma_1 \dashrightarrow \sigma_2 \dashrightarrow \cdots \dashrightarrow \sigma_n$;
(S3) The elements in the $\vec{\varepsilon}$ part are classically independent.
Remark 3.18.1.
The order of independence within the $\vec{\sigma}$ part in (S2) is inessential in the sense that it can be arbitrarily switched by 3.6. Meanwhile, the order in the $\vec{\varepsilon}$ part can also be switched due to the classical independence. Hence, this equivalent definition of semi-sequential independence indicates some intrinsic symmetry of this relation, coming from the only two categories of distributions (maximal and classical) that allow mutual independence. This point will be elaborated in the discussion of 3.19 and further formalized in 3.22.
To show the idea of the symmetry of semi-sequential independence, we start from a simple case with $n = 2$ and include a short proof for readers to grasp the intuition. The validation of the other results in this section is given in Section 6.3.
Proposition 3.19 (Symmetry in semi-sequential independence).
The following statements are equivalent:
-
(1)
,
-
(2)
,
-
(3)
.
The proof of 3.19 relies on the following 3.20, which is a direct consequence of 2.24 but we still include a separate proof from scratch to show the idea.
Lemma 3.20.
The following two statements are equivalent:
-
(1)
,
-
(2)
, , .
Proof of 3.20.
We can directly see because the independence of a sequence implies the independence among the non-overlapping subvectors as long as the original order is kept (2.23).
. The relation is equivalent to,
-
1.
,
-
2.
,
-
3.
.
The first two are directly implied by (2). For a fixed scalar vector , let . Then the third one is equivalent to
In fact, since , we have
where comes from the independence and is due to the relation . ∎
Proof of 3.19.
The equivalence of the three statements will be proved by this logic: .
. By 3.20, (1) indicates
(3.21)
In 3.21, the roles in the part are symmetric and so are those in the part (due to 2.26). Then 3.21 is equivalent to
(3.22)
which in turn implies by 3.20.
Let . Then
where and . Under the independence , we have 3.22 by 3.20, which further implies . We also have from 3.6, then . Meanwhile, means they are actually classically independent with the joint distribution , because the distribution of is classical. Therefore, by 3.13,
First, from the definition of , there exist and with independence
(3.23)
such that . In other words, . We directly have that and are classically independent from their joint distribution. Next we study the independence between the parts. Similarly, we can obtain the joint distribution from the distribution of :
By 3.6, we have (and vice versa). Note that 3.23 implies . Hence we have by 3.20. ∎
Proposition 3.21 (Zero sublinear covariance implies semi-sequential independence).
If follows a bivariate semi-G-normal and the components have certain zero covariance:
then we have (and vice versa).
3.21 and 3.19 seem like natural results for a "normal" object in the multivariate case, but this is the first time such connections have been established within the G-expectation framework, because the G-normal distribution does not have such properties in the multivariate case. For instance, for with : on the one hand, given the independence , does not follow a bivariate G-normal, nor does under any invertible transformation . On the other hand, if follows a bivariate G-normal , we do not have or . These kinds of strange properties create barriers to the understanding of the G-normal in multivariate situations, especially regarding the connection between univariate and multivariate objects. More details of this concern can be found in Bayraktar and Munk, (2015). Fortunately, the substructure of the semi-G-normal provides some insights to reveal this connection.
3.19 can be extended to random variables.
Theorem 3.22.
The following three statements are equivalent:
-
(1)
,
-
(2)
for any permutation of ,
-
(3)
.
Remark 3.22.1.
3.22 shows that the semi-G-normal under semi-sequential independence has symmetry and compatibility with the multivariate case. The underlying reason is that it takes advantage of the (only) two families of distributions that allow both properties: the classical normal and the maximal distribution. For the classical normal, we know that a bivariate normal with a diagonal covariance matrix is equivalent to the (symmetric) independence between the components. For the maximal distribution, the results are provided in 3.6.
We end this section by showing the stability of the semi-G-normal distribution under semi-sequential independence, which indicates that more analogous generalizations of results on the classical normal can be discussed here.
Proposition 3.23.
For any satisfying and , we have
Proof.
With and , semi-sequential independence means:
For any , first recall that is in 3.11. On the one hand,
where we use the fact that . On the other hand,
Since
we have for all . ∎
We will further investigate the connection and distinction between these types of independence in Section 3.7 by studying their representations.
3.7 Representations under three types of independence
Let us come back to the story setup in Section 3.1 to introduce our results to a general audience. Suppose we intend to study the dynamic of the whole observation process (which is the observable data sequence)
Depending on the background information or knowledge (or the lack of reliable knowledge about the data pattern and the underlying dynamic), we may still have uncertainty about the distribution or dynamic of . Especially in the early stage of data analysis, it is usually required to specify a model structure and search for the optimal one within a family of models. However, at this stage, how to select or distinguish the family of models is an important and non-trivial task in statistical modeling. Suppose we assume that the underlying process belongs to a family , but some patterns of the data sequence, which could be generally quantified by for a test function , seem to exceed even the extreme cases in . In this case, we may tend to reject the hypothesis that . In this situation, we usually need to work with the maximum expected value under the uncertainty :
(3.24)
However, in principle, might be an infinite-dimensional family of non-parametric (or semi-parametric) dynamics (due to the lack of information on the underlying dynamic). In our current context of discussion, the possible choices of include:
-
•
. As illustrated by Figure 3.1, it includes independent mixture models and a typical class of hidden Markov models without a feedback process. (In Figure 3.1, we omit the edge from to only for graphical simplicity.)
-
•
. As illustrated by Figure 3.2, it includes those state-space models in which the future state variable depends only on the historical observations.
-
•
As illustrated by Figure 3.3, it contains a class of hidden Markov models with a feedback process: the future state variable depends on both the previous states and the observations. In Figures 3.2 and 3.3, the dashed arrows indicate possible feedback effects.
Note that
This includes two aspects:
-
•
due to the fact that and ;
-
•
because, for any , given , can be treated as a constant, thus we must have .
Remark 3.23.1.
The condition in is equivalent to
where is a sequence of -martingale increments.
In traditional statistical modeling, how to deal with the quantity 3.24 is essentially a difficult task when is highly unspecified and only contains some vague conditions on the possible design of edges (such as the additional edges in Figure 3.3 compared with Figure 3.1).
In this section, we will show that 3.24 can be related to the G-expectation of a random vector with semi-G-normal marginals, and each choice of corresponds to a type of independence associated with the semi-G-normal. After transforming 4.1 into a G-expectation, it becomes more convenient to evaluate, and this evaluation procedure also gives us guidance on what the "skeleton" part should be when considering the extreme scenario for different forms of .
Our main result can be summarized as follows.
Theorem 3.24.
(Representations of semi-G-normal random variables under various types of independence) Consider and any ,
-
•
Under semi-sequential independence:
(3.25) we have , and
(3.26) -
•
Under sequential independence:
(3.27) or fully-sequential independence:
(3.28) we have , and
(3.29) (3.30)
Proof of 3.24.
Turn to Section 6.4. ∎
Remark 3.24.1.
We can only say stays the same under sequential or fully sequential independence. It does not mean these two types of independence are equivalent. Their difference might arise when we consider a more general situation , which is out of our current scope of discussion.
Remark 3.24.2.
Here we only consider as a univariate semi-G-normal, which can be routinely extended to the multivariate semi-G-normal (defined in Section 3.5). Then is also required to be changed to a matrix-valued process.
Remark 3.24.3.
The vision here is that we can use the G-expectation of a semi-G-version random vector under various types of independence to obtain the envelope associated with different families of model structures. With or without a kind of dependence (such as with or without the feedback), the family of models is usually infinite-dimensional because, in principle, the form of the feedback dependence could be any kind of nonlinear function. Nonetheless, 3.26, 3.29 and 3.30 tell us that, instead of going through all possible elements on the right-hand side, we can move to the left side of the equation and treat it as a sublinear expectation, which has a convenient way to be evaluated. For instance, under semi-sequential independence, by 4.4, follows a multivariate semi-G-normal, so we only need to run through a finite-dimensional subset (as the "skeleton" part) to get the extreme scenario,
Under sequential independence, we only need to run through an iterative algorithm to evaluate , which will be explained in Section 4.1.
Corollary 3.24.1.
As special cases, under semi-sequential independence, we have
(3.31)
Under sequential independence or fully sequential independence, we have
(3.32)
where can be replaced by .
Proof.
This is a direct result of 3.24. ∎
Remark 3.24.4.
Remark 3.24.5.
To show consistency with the existing results in the literature, if we choose in 3.5 (we can also change the distribution of in to any applicable classical distribution), then we can apply the CLT in the G-expectation framework to the left-hand side to retrieve a result similar to the one in Rokhlin, (2015) (which is obtained by treating it as a discrete-time stochastic control problem):
where . When choosing , 3.24.1 is related to the discussion in Section 4 of Fang et al., (2019). It is also related to the formulation in Dolinsky et al., (2012), although the latter uses a different approach.
Remark 3.24.6 (A more explicit distinction between semi--normal and -normal).
Let us extend our discussion to a continuous-time version of the setup in Section 3.1. By Denis et al., (2011), the distributional uncertainty of can be explicitly written as
where is a classical Brownian motion (induced by ) under and is the collection of all -measurable processes taking values in . Meanwhile, by considering the continuous-time version of 3.24.4, the distributional uncertainty of can be expressed as
where is the collection of all -measurable processes taking values in . Note that because only considers those processes that are independent from . This gives another, more explicit distinction between the semi-G-normal and G-normal distributions compared with 3.11.3.
Corollary 3.24.2.
Under the setup of 3.24, when is convex or concave,
will be the same under either sequential or semi-sequential independence. Furthermore, in these cases, we have
(3.33)
The following result can be treated as an extension of 3.12.
Corollary 3.24.3.
Let denote a sequence of nonlinearly i.i.d. G-normally distributed random variables with . When is convex or concave, we have
where and they can be either sequentially or semi-sequentially independent.
We can also prove that the representations mentioned in this paper hold for , so that we can apply them to consider the upper probability or capacity induced by the sublinear expectation: (from 2.2 and 2.4). Without loss of generality, we only discuss the univariate case, which can be routinely extended to multivariate situations.
Definition 3.25.
(The upper and lower cdf) In a sublinear expectation space, the upper cdf of a random variable is
and the lower cdf is
Theorem 3.26.
(Representations of the upper and lower cdf) Let denote a random variable in a sublinear expectation space and let be a random variable in a classical probability space whose distribution is characterized by a latent variable . Suppose a representation of the sublinear expectation,
(3.34)
holds for any . Then we also have the representations for the upper cdf,
(3.35)
which holds for any continuity point of . In other words, the representation can be extended to functions of the form . Meanwhile, we also have the representation for the lower cdf,
(3.36)
which holds for any continuity point of .
Proof of 3.26.
It is easy to show that is a monotone function, so its set of discontinuity points is at most countable. Let be any continuity point of . For any , take small enough such that,
Take and to be two bounded continuous functions such that
and
Then we have
We can apply this inequality to for any given :
then
Note that we can use the representation 3.34 to get,
Then
Since can be arbitrarily small, we have proved the required result 3.35 for . To validate the representation 3.36 for , we simply need to replace with and change to accordingly. ∎
Remark 3.26.1.
(Notes on the continuity of ) Note that 3.26 does not require the continuity of : if . Since one can easily check that is automatically lower continuous ( if ), the upper continuity ( if ) is what we are really discussing whenever we speak of the continuity of . Here we try to avoid the assumption of upper continuity of , which is a quite strong and restrictive one. Even under the regularity of , we can only say the upper continuity holds for closed (Lemma 7 in Denis et al., (2011)). However, when is a continuity point of , for any sequence converging to as , we do have for sets and ; namely, exhibits some continuity on these kinds of sets.
4 The hybrid roles and applications of semi--normal distributions
In this section, we will show the hybrid roles of semi--normal distributions, connecting the intuition between the classical framework and the -expectation framework, by answering the four questions mentioned in the introduction.
4.1 How to connect the linear expectations of classical normal with -normal
In principle, it is feasible to understand the expectation of the G-normal distribution through the structure of the G-heat equation. Nonetheless, as a generalization of the normal distribution, it would be better if we could understand the G-normal distribution in a more distributional sense. Is it possible to understand the G-normal distribution from our old friend, the classical normal? It is indeed a natural question, but the answer is essentially not straightforward. Even for people who have partially learned the theory of the G-expectation framework, there usually exist several common thinking gaps between the classical normal and the G-normal distribution.
For instance, as mentioned in 3.11.3, for ,
(4.1)
which indicates that the uncertainty set of the G-normal distribution is larger than the class of classical normal distributions with . In particular, Hu, (2012) shows the strict inequality that when , we have (it stays positive for any odd moment). Let . By checking the G-function defined in 2.14, we have
(4.2)
which indicates that the G-normal distribution should have some "symmetry". However, exactly due to this identity in distribution shown in 4.2, and must share the same (sublinear) third moment: which directly implies,
It tells us that the degree of symmetry or skewness of the G-normal distribution is uncertain, which somehow looks like a "contradiction" with 4.2 and seems quite counter-intuitive for a "normal" distribution.
Based on the above-mentioned statements showing how different the G-normal and the classical normal are, our motivation comes from the opposite direction: is it possible for us to connect the linear expectation of the classical normal distribution with the sublinear expectation of the G-normal distribution (or use the former to approach the latter)?
This section will first give an affirmative answer to this question by presenting an iterative algorithm from our previous work (Li and Kulperger, (2018)) based on the semi-G-normal distribution. Then we will extend this iterative algorithm into a general computational procedure to deal with weighted summations in statistical practice.
Theorem 4.1 (The Iterative Approximation of the G-normal Distribution).
For any and integer , consider the series of iteration functions with initial function and iterative relation:
(4.3)
The final iteration function for a given is . As , we have , where .
Remark 4.1.1.
Remark 4.1.2.
From a computational aspect, the normal distribution in 4.3 can be replaced by other classical distributions with finite moment generating functions, because this algorithm is based on the G-version central limit theorem (as indicated in 4.2). The interval can be further simplified to a two-point set or a three-point set for computational convenience. More theoretical details and numerical aspects (as well as PDE sides) of this iterative algorithm can be found in Li and Kulperger, (2018). This iterative algorithm is also related to the idea of the discrete-time formulation in Dolinsky et al., (2012).
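To make the recursion 4.3 concrete, the following is a minimal numerical sketch (our own illustration, not the implementation of Li and Kulperger, (2018)): the names phi, s_lo, s_hi and the grid sizes are our own choices, and the volatility interval is reduced to its two endpoints as suggested above.

```python
import numpy as np

# Sketch of the iterative approximation in Theorem 4.1 (our own code).
# Each iteration applies a classical one-step normal smoothing and then
# maximizes over the volatility; per Remark 4.1.2 the interval
# [s_lo, s_hi] is reduced to its two endpoints.
def g_normal_expectation(phi, s_lo, s_hi, n_iter=50,
                         x_grid=np.linspace(-10, 10, 2001), n_quad=64):
    """Approximate E[phi(X)] for the G-normal X ~ N(0, [s_lo^2, s_hi^2])."""
    z, w = np.polynomial.hermite.hermgauss(n_quad)  # Gauss-Hermite rule
    z, w = np.sqrt(2.0) * z, w / np.sqrt(np.pi)     # rescaled for N(0, 1)
    v = phi(x_grid)                                 # phi_0 on the grid
    for _ in range(n_iter):
        candidates = []
        for s in (s_lo, s_hi):                      # two-point reduction
            shifted = x_grid[:, None] + s * z[None, :] / np.sqrt(n_iter)
            # E[phi_k(x + s Z / sqrt(n))] via quadrature + interpolation
            candidates.append(np.interp(shifted, x_grid, v) @ w)
        v = np.maximum(*candidates)                 # maximization step
    return np.interp(0.0, x_grid, v)                # final value phi_n(0)

# Sanity check: for the convex phi(x) = x^2 the value should be s_hi^2.
print(g_normal_expectation(lambda x: x**2, 1.0, 2.0))  # approx 4.0
```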
Remark 4.1.3.
Consider a sequence of nonlinearly i.i.d. semi-G-normal random variables with . Each iteration function can also be expressed as the sublinear expectation of the semi-G-normal distribution (letting ):
for . Moreover, Li and Kulperger, (2018) further show that the series of iteration functions is an approximation of the whole solution surface of the G-heat equation on a given time grid. To be specific, consider the G-heat equation defined on :
where and . For each , we have
where depends on .
The basic idea of the iterative algorithm comes from the following result. In the following context, without further notice, let denote a sequence of nonlinearly i.i.d. semi-G-normally distributed random variables with .
Proposition 4.2.
(A general connection between the semi-G-normal and the G-normal) For any , we have
where .
Remark 4.2.1.
The iterative algorithm can also be extended to -dimensional cases by extending the dimensions of and accordingly.
Proof.
Then the iterative algorithm (4.1) can be treated as a direct evaluation of . Interestingly, the uncertainty set of each is strictly smaller than that of the G-normal distribution by 4.1, but their normalized sum is able to approach the G-normal. This leads us to another closely related question: how does the uncertainty set of exactly aggregate (towards that of the G-normal) as increases? How does the G-version independence change the uncertainty set associated with the expectation of the joint random vector ?
This question has been answered by the representations shown in 3.24 and 3.24.1.
We can also extend the idea of the iterative algorithm into a procedure that can deal with sublinear expectations under sequential independence in a broader sense. We call it a G-EM (Expectation-Maximization) procedure because it happens to involve an expectation step and a maximization step (but it has no direct relation to the Expectation-Maximization algorithm in statistical modeling).
One of the goals of the G-EM procedure is to deal with the following object for any fixed :
(4.4)
where are sequentially independent (the distribution of could also be generalized to any member of the semi-G-family of distributions, which will be defined in Section 5.1) and is the weight vector. Without loss of generality, we assume the Euclidean norm (or ). These kinds of objects are common in data practice (in the context of financial modeling, statistics or actuarial science). We will give an example of a simple linear regression problem in Section 5.4.
The iterative algorithm is a special case of this, with :
which converges to as .
However, in practice, using an asymptotic result may not be feasible here for the following reasons:
-
1.
Note that could be of arbitrary form (usually depending on the data or the problem itself). Although we do have results like the weighted central limit theorem proved by Zhang and Chen, (2014), we may not always have a general asymptotic result for it.
-
2.
More fundamentally, could be a small number, which still leaves a gap to the asymptotic result. In this case, we need a non-asymptotic approximation involving the convergence rates of the central limit theorem (like the Berry–Esseen bound in the classical case), which have been studied by Fang et al., (2019); Huang and Liang, (2019); Song, (2020); Krylov, (2020).
-
3.
If is small compared with the dimension of the data, it further requires us to have a non-asymptotic view of 4.4.
Next we explain the details of the G-EM procedure to deal with 4.4. Again, in the spirit of iterative approximation, 4.4 can be computed by the following procedure: with , for ,
Finally we have
Then we can store the optimal choice of the control process for our later simulation study (so there is no need to run the iterative algorithm again). Note that the optimal process is obtained in the backward order
To follow the original order, we need to reverse it, and the optimal process is of the form
In this way, we have
and the linear expectation can be approximated by a classical Monte-Carlo simulation.
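A minimal sketch of this backward procedure follows (our own discretization and names; the weights and volatility set are illustrative). It stores the argmax at each step so that the optimal volatility scenario can be replayed forward, in reversed order, for the Monte-Carlo stage.

```python
import numpy as np

# Sketch of the G-EM procedure for E[phi(sum_i a_i W_i)] under sequential
# independence (our own code): the E-step is a classical conditional
# expectation, the M-step a maximization over sigma in {s_lo, s_hi}.
def g_em(phi, a, s_lo, s_hi, x_grid=np.linspace(-15, 15, 3001), n_quad=64):
    z, w = np.polynomial.hermite.hermgauss(n_quad)
    z, w = np.sqrt(2.0) * z, w / np.sqrt(np.pi)
    v, policy = phi(x_grid), []          # value function and stored argmax
    for a_k in reversed(a):              # backward induction over the steps
        vals = np.stack([np.interp(x_grid[:, None] + a_k * s * z[None, :],
                                   x_grid, v) @ w for s in (s_lo, s_hi)])
        best = vals.argmax(axis=0)       # M-step after the E-step, per x
        policy.append(best)
        v = vals[best, np.arange(len(x_grid))]
    policy.reverse()                     # back to the original time order
    return np.interp(0.0, x_grid, v), policy

# Example with illustrative weights (Euclidean norm close to 1).
val, policy = g_em(lambda x: np.maximum(x, 0.0),
                   a=[0.6, 0.6, 0.5], s_lo=1.0, s_hi=2.0)
print(val)
```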
4.2 How to connect univariate and multivariate objects
There are two basic properties of the classical normal distribution which bring convenience to the study of multivariate statistics. First, for any two independent and both following , must form a bivariate normal. (This result still holds even if they are not independent but linearly correlated.) Second, a -valued random vector follows a multivariate normal if and only if the inner product is normal for any . However, these two properties no longer hold for G-normal distributions. Readers can find the following established result in the book Peng, 2019b (Exercise 2.5.1),
Proposition 4.3.
Suppose and with , for , we have
-
1.
is G-normally distributed for any ;
-
2.
does not follow a bivariate G-normal distribution.
4.3 shows that we cannot construct a bivariate G-normal distribution directly from two independent univariate G-normally distributed random variables. It remains infeasible even when considering invertible linear transformations of the random vector, as shown by Bayraktar and Munk, (2015), who study these strange properties of the G-normal in the multidimensional case in more detail.
To further explain the obstacle here, let us first recall that, in Section 4.1, we have shown how to start from the linear expectation of the classical normal to correctly understand (and compute) the sublinear expectation of the G-normal. Suppose our next goal is to help a general audience further understand (or compute) the sublinear expectation of a multivariate G-normal distribution with covariance uncertainty characterized by , such as from with , and . However, as shown in 4.3, it is difficult to achieve this goal along this path because is not G-normally distributed, and neither is for any invertible matrix .
It turns out the connection between univariate and multivariate objects is essentially nontrivial. The contribution of this section is to show that this connection can be revealed by introducing an intermediate substructure: the semi-G-normal imposed with semi-sequential independence. Typically, 4.4 shows that a joint vector of semi-sequentially independent univariate semi-G-normal random variables follows a multivariate semi-G-normal (with a diagonal covariance matrix).
Theorem 4.4.
For a sequence of semi-G-normally distributed random variables , satisfying for , and
we have
where is the uncertainty set of covariance matrices defined as
Proof.
It is a direct result of 3.22 (the non-identical variance interval here is inessential to the proof). ∎
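A small Monte-Carlo sketch of this statement in the bivariate case (our own illustration): the sublinear expectation reduces to a max-mean over the rectangle of constant volatility pairs, which plays the role of the finite-dimensional "skeleton".

```python
import numpy as np

# Sketch (our own): bivariate semi-G-normal sublinear expectation as a
# max-mean over constant volatility pairs (s1, s2) on a grid.
rng = np.random.default_rng(2)
z1, z2 = rng.standard_normal((2, 10**6))

def bivariate_semi_g(phi, s1_range, s2_range, n_grid=21):
    best = -np.inf
    for s1 in np.linspace(*s1_range, n_grid):
        for s2 in np.linspace(*s2_range, n_grid):
            best = max(best, np.mean(phi(s1 * z1, s2 * z2)))
    return best

# The covariance-type moment is certainly zero (up to Monte-Carlo error).
print(bivariate_semi_g(lambda a, b: a * b, (1.0, 2.0), (1.0, 2.0)))
```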
Next, 4.5 shows that we can apply a linear transformation to to get a multivariate semi-G-normal with a non-diagonal covariance matrix.
Proposition 4.5 (Multivariate semi-G-normal under linear transformation).
Let . For any constant matrix with , we have
where
Proof.
First of all, note that with . For any , we have
so , which can be treated as the scaling property for the -dimensional maximal distribution. It follows from that . Therefore,
where
In other words, ∎
Then we can use a sequence of to approach the multivariate G-normal by the nonlinear CLT.
Theorem 4.6.
Consider a sequence of nonlinearly i.i.d. with . Let be a G-normally distributed random vector following . Then we have, for any ,
It means that,
Proof.
This is a multivariate version of 4.2. We only need to validate the conditions. First of all, the sequence definitely has certain zero mean. Then, notice that the distribution of is characterized by the function , where denotes the trace of the matrix. We only need to prove that for any . By the representation of the semi-G-normal distribution, letting , we have
The argument on how to extend the choice of to is similar to the proof of 4.2. ∎
This creates a path from the univariate classical normal to a multivariate G-normal. Figure 4.1 shows the relations among the linear, semi-G- and G-normal distributions. We can start from the univariate objects (semi-G-normal distribution), construct their multivariate version under semi-sequential independence, and then approach the multivariate G-normal distribution, which gives us a feasible way to start from univariate objects and approximately approach the multivariate distribution.
4.3 A statistical interpretation of asymmetry in sequential independence
In this section, we will expand 2.7 (which is used to illustrate the asymmetry of independence in this framework) by studying its representation result, to provide a more specific, statistical interpretation of this asymmetry. More interestingly, we will show that, for two semi-G-normally distributed random objects, each of them has a certain zero third moment (because its distributional uncertainty can be written as a family of classical normals with different variances). This property is preserved for their summation under semi-sequential independence. However, after we impose sequential independence onto them, their summation will exhibit third-moment uncertainty. This phenomenon is closely related to the third-moment uncertainty of the G-normal (as shown in Section 4.1) via the G-version central limit theorem (2.17).
Next we expand 2.7 by considering and as two semi-G-normally distributed random variables.
Example 4.7 (Third moment uncertainty comes from asymmetry of independence).
Suppose , and (which is exactly the classical ) imposed with sequential independence
Let , which turn out to be two identically distributed semi-G-normal random variables . Note that is a special case of in 2.7. We are going to show that, under different types of independence for ’s or different structures of sequential independence for ’s and ’s, we will have different uncertainty for and , whose extreme scenarios can be described by their sublinear expectations.
When or
(4.5)
since , then we have
(4.6)
It shows that under semi-sequential independence, does not have third-moment uncertainty. Meanwhile, since follows a bivariate semi-G-normal, we also have
Since we have shown that semi-sequential independence is symmetric (3.19), it means that things do not change when we consider or
However, if we only switch the order of independence between and in 4.5 to get
which means , implying . Note that and , so we still have
It indicates some "symmetry" in its (G-version) distribution. Although its second moment is uncertain, we still expect it to have some kind of "zero skewness", which indicates at least a "zero third moment". However, it turns out this is not the case:
(4.7)
where we apply 2.25 based on the facts that both and have certain zero third moment as well as the results from 2.7:
How can we understand the asymmetry of independence in 4.7 from the representations of the sublinear expectations? This question is answered by the following 4.8, which can be treated as a special case of 3.24.
Let denote a basic family of bivariate polynomials:
Proposition 4.8.
(The joint distribution of two semi-G-normal random variables under various independence: for a small family of ’s) Consider and any ,
-
•
When , we have
where
-
•
When or , we have
where
Remark 4.8.1.
4.8 provides us with the following intuitions:
-
1.
we can directly see the difference between sequential and semi-sequential independence: under this basic setup, if , we can use the upper envelope of a four-element set to represent (or compute) , while an eight-element set is required when . Meanwhile, note that : it indicates that sequential independence can cover a larger family of models than the semi-sequential one. This statement is confirmed in general by 3.24 (see also the numerical sketch after this list);
-
2.
it reveals a more intuitive insight into why under sequential independence: the set difference contains those elements where actually depends on the previous (or simply on the sign of ). Although this kind of dependence does not create a shift in the mean part of (which is still zero), it has strong effects on the skewness of the distribution of . This phenomenon is related to the so-called leverage effect in the context of financial time series analysis. In the companion of this paper, we will use a dual-volatility regime-switching data example to give a statistical illustration of this phenomenon and show the necessity of discussing .
-
3.
we can get one specific interpretation of the asymmetry in sequential independence from the format of : the roles of and are not symmetric. When , this sequential order means that is realized first, and the volatility part of may or may not depend on the value of , so as to leave the distributional uncertainty in unchanged. In short, when we aggregate the uncertainty set from time to , due to the sequential order of the data, we can only have affected by a function of , but we almost never have influenced by a function of (due to the restriction from the order of time). As opposed to this asymmetry, the semi-sequential independence is symmetric, as indicated by the form of : the roles of and are symmetric, so we must have the same results for and under .
-
4.
More importantly, it also offers guidance on possible simulation studies in this framework. When one intends to generate a data sequence that can go through the scenarios covered by sequentially independent random variables, a more cautious attitude and in-depth thought are required: different blocks of samples with separate may only go through , which can at most be treated as semi-sequential independence rather than sequential independence. In order to reach the latter, one needs to generate those scenarios that allow possible classical dependence between and the previous with . Borrowing the language of a state-space model, if we treat as states and as observations, uncertainty in the dependence between current states and previous observations needs to be considered in order to sufficiently discuss the uncertainty set covered by sequential independence. Otherwise, it is likely to be at most semi-sequential independence. We will further discuss this point in Section 4.4 and in the companion paper of this work.
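The following Monte-Carlo sketch (our own, with the third-moment test function and hypothetical endpoints s_lo, s_hi) makes intuition 1 concrete: the semi-sequential envelope runs over the four constant volatility pairs, while the sequential envelope additionally lets the second volatility depend on the sign of the first summand, yielding eight scenarios; this dependence is exactly what produces the positive third moment in 4.7.

```python
import numpy as np

# Sketch (our own): four-element vs eight-element envelopes for the
# third moment of W1 + W2 with W_i = sigma_i * eps_i.
rng = np.random.default_rng(0)
eps1, eps2 = rng.standard_normal((2, 10**6))
s_lo, s_hi = 1.0, 2.0
phi = lambda x: x**3

# Semi-sequential: constant pairs (s1, s2) -- four scenarios.
semi_seq = max(np.mean(phi(s1 * eps1 + s2 * eps2))
               for s1 in (s_lo, s_hi) for s2 in (s_lo, s_hi))

# Sequential: sigma_2 may depend on the sign of W1 -- eight scenarios.
seq = max(np.mean(phi(s1 * eps1 + np.where(eps1 > 0, s2p, s2m) * eps2))
          for s1 in (s_lo, s_hi)
          for s2p in (s_lo, s_hi) for s2m in (s_lo, s_hi))

print(semi_seq, seq)  # approx 0 versus a strictly positive value
```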
Example 4.9 (The direction of independence comes from finer structure).
In the G-framework, a symmetric (or mutual) independence between two non-constant random variables only arises if they belong to either of the following two categories: classical distributions or maximal distributions (Hu and Li, (2014)). One interesting question is: how about the independence of combinations (such as products) of them? Logically speaking, if the combination does not fall into these two cases, they must have asymmetric independence, but where does this "asymmetry" come from? To be specific, suppose we have and . Meanwhile, assume the independence and . By 2.27, we also have and . However, we do not have that and are mutually independent by 2.26, because and are neither classically nor maximally distributed. To further explain the interesting phenomenon here, we have that and are mutually independent, and so are and . It seems that the roles of " versus " and " versus " should all be "symmetric" and do not yet exhibit any "direction". Nonetheless, when we consider the products and , if they are independent, we must have the direction that either or , but there seems to be no middle stage where and have some degree of independence in which their roles are symmetric. One question we may ask is: does such a middle stage exist?
In this example, we will give an affirmative answer to this question. It turns out that the current conditions of independence are not enough: the relation depends on the structure of independence among the four objects and .
To be compatible with the assumed independence and , suppose we have additional sequential independence among the four objects. There are essentially four cases:
-
1.
If , we have
-
2.
If we have
-
3.
If , we have
-
4.
If , we have
Example 4.10 (The sequential independence is not an “order” with transitivity).
Although sequential independence has its "order", it is not really an order relation with transitivity. In other words, for three random variables , the sequential independence and do not necessarily imply . A trivial example: when both follow a maximal distribution, if we have , then we have (by 3.6), but we never have . A non-trivial example (with three distinct random variables) comes from the fully-sequential independence structure 3.20. For two semi-G-normal objects and , means
By 2.19, we have and . Then we further have that the other direction of independence also holds: and (by 2.27). Then we have the following counter-example:
because we already have but the pair cannot have mutual independence by 2.26. Similarly, we have another example
4.4 What kinds of sequences are not G-normal?
Consider the following examples:
-
1.
generate with . This is essentially a sample from an independent normal mixture (with the scaling parameter following a uniform distribution). Note that this essence is not affected by the distribution of (as long as it follows a fixed distribution). The whole data sequence does not have any distributional uncertainty.
-
2.
first generate , , then generate with . By introducing this blocking design, even though we pretend to treat the switching rule of as unknown here (although it may not be so hard for a data analyst to observe this pattern), if we look at the uncertainty set considered in this generation scheme, , it is actually at most a pseudo-simulation of the semi-G-normal distribution. One typical feature of this sample is that it does not have skewness uncertainty: it has a certain zero skewness.
-
3.
consider an equally spaced grid
For each with , generate , . Then treat
(4.8) as an approximation of . This kind of scheme has been used in some of the literature (such as Deng et al., (2019); Fei and Fei, (2019)). We may cautiously step back and ask ourselves: is this a valid approximation? Not really; it is actually an approximation of :
where the first convergence can be treated as a classical almost sure convergence and the second one is deterministic, due to the design of . This fact does not change even when using overlapping groups, because each group can at most be treated as a sample from a normal mixture. Again, the problem with the above-mentioned scheme is that it could be misleading for a general audience. It actually goes through the uncertainty set of under semi-sequential independence rather than sequential independence. For general , only in the latter case will the normalized sum be closer to the G-normal distribution. However, this issue can be fixed by considering an extra step: if the function in question can be proved to be convex or concave, then in this practical sense, by 3.24.2, the semi-sequential and sequential independence can be treated as the same. For a general fixed , we usually need to consider the G-EM procedure to do the approximation as discussed in Section 4.1, which is also closely related to Section 4 in Fang et al., (2019). Alternatively, we may consider a small family of ’s so that we have a finite-dimensional set of distributions to go through (such as the one in 4.8). In this way, we can get a feasible approximation based on the idea of max-mean estimation by Jin and Peng, (2021), similar to the form 4.8.
The idea of this section will be further illustrated in the companion of this paper by using a series of data experiments.
5 Conclusions and extensions
For researchers or practitioners from various backgrounds who may not be familiar with the notion of nonlinear expectation but are comfortable with classical probability theory and the normal distribution, when they try to understand the G-normal from the classical normal, it will be intuitive and beneficial if there exists an intermediate structure that can be directly transformed from the classical normal and also creates a bridge towards the G-normal. Another thinking gap is from the classical independence (symmetric) to the G-version sequential independence (asymmetric). It will be useful to have an intermediate stage of independence that is under distributional uncertainty but preserves the symmetry, so that it is associated with our common first impression of the relation between two static, separate random objects, both with distributional uncertainty but with no sequential order assumed. Once we talk about two objects with a sequential order or in a dynamic way, it becomes possible to involve the sequential independence.
This paper has rigorously set up and discussed the semi-G-normal distribution with its own types of independence, especially the semi-sequential independence. The hybrid roles of these new substructures, the semi-G-normal with its semi-sequential independence, can be summarized as follows:
-
1.
The semi-G-normal is closely related to the classical normal in that it is simply a classical normal scaled by a G-version constant (with a typical independence). Hence the semi-G-normal only exhibits moment uncertainty of even order, while its odd moments, especially the third moment (related to skewness), are preserved to be zero. Meanwhile, the semi-G-normal is also closely connected with the G-normal: they have the same sublinear expectation under a convex or concave transformation . For general , they are connected by the G-version central limit theorem.
-
2.
The semi-G-normal with semi-sequential independence also preserves the properties of the classical normal in the multivariate situation (3.22).
-
3.
4.9 shows the hybrid and intermediate role of semi-sequential independence between the classical and the sequential independence. The semi-sequential independence is related to the classical one in the sense that it is symmetric ( implies ), and it is also related to the sequential independence under convex or concave , as illustrated in 3.24.2.
We can use a comparison table (Table 1) to summarize the hybrid roles of this substructure, the semi-G-normal with semi-sequential independence, which creates a bridge connecting the classical normal and the G-normal.
| Normal | Semi-G-normal | G-normal |
Expectation | Linear | Sublinear | Sublinear |
1st-moment | Certain (0) | Certain (0) | Certain (0) |
2nd-moment | Certain () | Uncertain () | Uncertain () |
3rd-moment | Certain (0) | Certain (0) | Uncertain |
Independence | | | |
(Setup) | | | |
Stability | | | |
Multivariate | | | |
Furthermore, we hope the substructures proposed in this paper will open up new extensions of the discussion on the differences and connections between the G-expectation framework and the classical one, by providing more details on the operations on distributions and independence, on a ground where researchers in both areas can have a proper overlapping intuition. In this way, we are able to have a finer discussion of the model uncertainty in dynamic situations where the volatility part is ambiguous or cannot be precisely determined by data analysts.
Next we will give several possible directions for extensions of this paper. Interestingly, the discussion in Section 5.5 actually shows the vision that, after introducing the tool of sublinear expectation with representations in different situations, we are able to extend our horizon of statistical questions to include those that could be too complicated to handle under the classical probability system.
5.1 The semi-G-family of distributions
We can extend our current notion of the semi-G-normal distribution to a broader class: the semi-G-family of distributions, or the class of semi-G-version distributions.
For simplicity, we only provide this notion in the one-dimensional case.
Definition 5.1.
A random variable follows a semi-G-version distribution if there exists a maximally distributed (where is a bounded, closed and convex set) and a classically distributed , satisfying
such that
for a Borel measurable function satisfying .
Remark 5.1.1.
The three types of independence in 3.16 can be carried over to members of the semi-G-family of distributions. For instance, with , they are called semi-sequentially independent if
Remark 5.1.2.
Most classical distributions should exist in this framework. To give a quick validation: there exists which follows a standard normal distribution with the classical cumulative distribution function (cdf) defined as
(Such cdf can be defined using the solution of the classical heat equation.) Let
Note that is a bounded and continuous function, so . Then we can check that follows a classical distribution. Next we can use the classical inverse cdf method. For any classical distribution with cdf (no matter whether it is continuous or not), let
(5.1)
denote the generalized inverse of . Let . We only need to add suitable conditions on so that . Then we get a random object following the distribution with cdf .
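A minimal numerical sketch of this inverse-cdf construction (our own illustration; the exponential example anticipates the semi-G-exponential in 5.2, and all names are hypothetical):

```python
import numpy as np
from scipy.stats import norm

# Sketch (our own) of the inverse-cdf construction: U = Phi(eps) is
# Uniform(0,1), X = F_lam^{-1}(U) has cdf F_lam, and letting the
# parameter lam range over [lam_lo, lam_hi] gives a semi-G-version
# object whose sublinear expectation is a max-mean over the parameter.
rng = np.random.default_rng(1)
u = norm.cdf(rng.standard_normal(10**6))   # classical Uniform(0,1)

def semi_g_exp_expectation(phi, lam_lo, lam_hi, n_grid=101):
    lams = np.linspace(lam_lo, lam_hi, n_grid)    # parameter "skeleton"
    # inverse cdf of Exponential(rate lam): F^{-1}(u) = -log(1 - u) / lam
    return max(np.mean(phi(-np.log1p(-u) / lam)) for lam in lams)

print(semi_g_exp_expectation(lambda x: x, 0.5, 2.0))  # upper mean 1/0.5 = 2
```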
Remark 5.1.3.
Note that we assume to be a bounded, closed and convex set for theoretical convenience. In practice, this condition can be weakened. For instance, when we talk about the semi-G-normal random variable where , the interval can be changed to or .
Example 5.2.
Here are several special examples of semi-G-version distributions.
-
1.
Consider , and ; then follows the semi-G-normal distribution, whose distributional uncertainty can be characterized by
-
2.
Consider , and
Then the distributional uncertainty of can be described as
We can also show that can cover a family of normal mixture models.
-
3.
(Semi-G-exponential) Let and . Consider
then we can check that the distributional uncertainty of can be written as
where each has pdf .
-
4.
(Semi-G-Bernoulli) With , let , and
Consider
Then has distributional uncertainty
Example 5.3.
In general, we can take advantage of the idea of the classical inverse cdf method to design the transformation . Then we are able to consider any distributional uncertainty of the form
(5.2)
where is the cdf of a classical distribution with parameter . Let denote the generalized inverse of as shown in 5.1. Consider and , and
After we add more conditions on such that , we have that has distributional uncertainty of the form 5.2, because
where follows the distribution with cdf .
To further study the properties of semi-G-version distributions and the semi-sequential independence, let denote the semi-G-family of distributions:
Note that satisfies:
-
1.
If , for any ,
-
2.
If , ,
-
3.
For , if ,
For any , we have that is equivalent to (by the symmetry of semi-sequential independence as illustrated by 3.19), so we can omit the direction between and and also call the mutual semi-sequential independence between them semi-G-independence.
Definition 5.4.
(Semi-G-independence) For any , with and , and are semi-G-independent if
-
1.
,
-
2.
(which is equivalent to ),
-
3.
are classically independent.
Definition 5.5.
(Semi-G-independence of a sequence) For a sequence with , they are (mutually) semi-G-independent if
-
1.
,
-
2.
are G-version (sequentially) independent (that is, ),
-
3.
are classically independent.
A sequence is called semi-G-version i.i.d. (or semi-G-i.i.d.) if its members are identically distributed and semi-G-independent.
In the following context, consider , with and .
Proposition 5.6.
If and are semi-G-independent, then for the joint vector we have, for any ,
Proof.
This is a direct consequence of the definition of semi-G-independence. ∎
For any and , let
Then
In the following context, for simplicity of discussion, we assume are continuous functions. Then we can take the maximum over the rectangle in 5.6. (This assumption can be relaxed whenever it does not affect the derivation.) Readers may find that, under semi-G-independence, our manipulation of the sublinear expectation of semi-G-version objects becomes quite intuitive and flexible.
Proposition 5.7.
If and are semi-G-independent, we have
Proof.
Remark 5.7.1.
Compared with 2.25, for , we have one more situation for to hold:
-
1.
either or has mean-certainty,
-
2.
or ,
-
3.
and are semi-G-independent.
Proposition 5.8.
If and are semi-G-independent and either one of them has certain mean zero, we have
Proof.
Since and are semi-G-independent, by 5.6,
If either one of them has certain mean zero, say , we have
It means for any . Then we must have , and similarly we have by changing to . ∎
5.2 The semi--version of central limit theorem
After setting up the semi-G-family of distributions and the semi-sequential independence, it turns out we can prove a semi-G-version of the central limit theorem in this context, which further provides a substructure connecting the classical central limit theorem with the G-version central limit theorem. It also shows the central role of the semi-G-normal in the semi-G-version class of distributions.
First we consider a subset of :
where we call a classical standardized if and . Here can be treated as a class of semi-G-distributions with zero mean and variance uncertainty.
Our current version of the semi-G central limit theorem can be formulated as follows.
Theorem 5.9.
For any sequence that is semi-G-i.i.d. with certain zero mean and uncertain variance:
we have
where . To be specific, for any bounded and continuous , we have
(5.3)
Remark 5.9.1.
Note that any must be a bounded and continuous one, so the convergence in distribution (2.5) must hold.
Remark 5.9.2.
From a classical perspective on 5.9, by using the representation of the semi-G-normal under semi-G-independence, we have
(5.4)
where and is a scalar vector. This form is also equivalent to
where could be any hidden process taking values in that is independent from . When the unknown variance is of this form, the uncertainty in the behavior of the normalized summation can be asymptotically characterized by the semi-G-normal.
If is chosen from a larger family that may involve dependence between and the previous , then it will be related to the G-version central limit theorem (under sequential independence rather than semi-G-independence): are sequentially independent,
which gives us
To summarize, the semi-G-normal distribution can be treated as the attractor for normalized summations of semi-G-i.i.d. random variables, and the G-normal is the attractor for normalized summations of G-version i.i.d. random variables.
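As a quick worked check of the representation 5.4 (our own example), take the bounded test function φ(x) = cos x and write σ̲, σ̄ for the endpoints of the volatility interval; since E[cos(aZ)] = e^{-a²/2} for Z ~ N(0,1),

```latex
\mathbb{E}\Bigl[\cos\Bigl(\tfrac{1}{\sqrt{n}}\textstyle\sum_{i=1}^{n}\sigma_i\varepsilon_i\Bigr)\Bigr]
  = \exp\Bigl(-\tfrac{1}{2n}\textstyle\sum_{i=1}^{n}\sigma_i^2\Bigr),
\qquad
\sup_{\sigma\in[\underline{\sigma},\overline{\sigma}]^{n}}
  \exp\Bigl(-\tfrac{1}{2n}\textstyle\sum_{i=1}^{n}\sigma_i^2\Bigr)
  = e^{-\underline{\sigma}^{2}/2}
  = \sup_{\sigma\in[\underline{\sigma},\overline{\sigma}]}
    \mathbb{E}\bigl[\cos(\sigma Z)\bigr],
```

so for this particular φ the scalar-vector envelope already coincides with the semi-G-normal value at every n, while for a general φ the two sides only agree in the limit 5.3.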
In the proof of 5.9, we adapt the idea of the Lindeberg method in a "leave-one-out" manner (Breiman, (1992)) to the sublinear context. One of the reasons we are able to do such an adaptation is the symmetry in semi-G-independence: is semi-G-independent from . (Note that we cannot do such an adaptation under sequential independence due to its asymmetry.) More details of the proof can be found in Section 6.6.
Since we only have a finite second moment assumption so far in 5.9, by adding stronger moment conditions on , the function space of can be taken to be , to include unbounded functions. This statement is based on 2.18.
As a basic example, given a stronger condition , we can check that the convergence 5.3 holds for by direct computation (5.10).
Example 5.10 (Check ).
In the convergence 5.3, since , we only need to show:
In fact,
(Note that, if and , the second term becomes the first one.) For the summand of the second term, without loss of generality, we assume that . Then we have three cases:
-
1.
,
-
2.
,
-
3.
.
In Case 1, since and are semi-G-independent, we have:
(Note that does not hold under sequential independence by 2.7.) Meanwhile, we can obtain , so has certain mean zero.
We can similarly prove the result in Case 2, that is, has certain mean zero. For Case 3, since are semi-G-independent, we have
We further have using the same logic. Therefore,
where we use the condition that and 5.7.
5.3 Fine structures of independence and the associated family of state-space volatility models
In 4.9, we mainly discussed the independence between two semi-G-distributed objects. Here we consider three of them as a starting point to discuss a much finer structure of independence.
Consider . The independence structure among them is essentially related to the G-version independence among and , . For instance,
-
(a)
,
-
(b)
.
Note that (a) is equivalent to and (b) means , which implies .
Then we can see that there are several middle stages between (a) and (b). In order to present these intermediate stages, let us play a simple game: switch two components each time and change the independence structure from (a) to (b). During this game, the following rules are required:
-
R1
we must keep the independence due to the definition of the semi-G-normal,
-
R2
we must keep the order as and , because the independence order of elements within each vector is usually equivalent. Otherwise, if we break this order, we need an unnecessary extra step to retrieve the index order to be consistent with (b).
Here we can get two approaches:
-
1.
Since we do not want to break the order within or , the first step has to be switching some with , with . For the part, we can only move due to R1, and similarly for the part we can only move . Hence, the first step is to exchange and in (a) to get
(5.5) Then we have two equivalent ways to move on.
- 2.
-
3.
Another way is to exchange and to get
(5.7) Then we can exchange and to get (b).
Note that 5.6 implies the following relation:
We can show that the family of models associated with the representation of under 5.6 can be illustrated by Figure 5.2. Similarly, 5.7 implies
The family of models associated with 5.7 can be described by Figure 5.3. The family of models for 5.5 can be shown by Figure 5.1.
The intuition here is: if all comes before , then since has distributional certainty, does not have an effect on in the directed graph. As long as comes before in the order of the G-version independence, we must have an additional edge from to in the directed graph of the family of models, in order to represent the sublinear expectation of the joint vector.
We can see that, by changing the independence structure, the sublinear expectation of a joint vector of semi-G-version distributions can be represented by classes of state-space models with different graphical structures.
One question to be explored is whether there is an independence structure that is associated with the family shown in Figure 5.4. Our conjecture is as follows: at least we need the following conditions,
-
1)
which means ,
-
2)
which means ,
-
3)
which means .
5.4 A robust confidence interval for regression under heteroskedastic noise with unknown variance structure
Let denote a sequence of nonlinearly i.i.d. semi-G-normally distributed random variables with . In Section 4.1, we studied the G-EM procedure, which is aimed at the following expression:
(5.8)
This section will provide a basic example in the context of regression to show why we need to think about 5.8 in statistical practice.
Consider a simple linear regression problem in the context of sequential data (where the order of the data matters):
(5.9)
where is treated as known and with and for each . We can see that the noise part is heteroskedastic (although is not observable). However, if the variance structure of the noise part is complicated due to measurement errors, or the data is collected from different subpopulations with different variances, we need to take some precaution regarding the properties of the least-squares estimator , especially when we lack prior knowledge of the dynamic of . If we worry that may depend on the previous with , then rather than assuming a single probabilistic model for and performing the regression, in an early stage of data analysis we can first assume could belong to any element of defined in Section 3.7. Note that the distributional uncertainty of each can be described by . Then the distributional uncertainty of 5.9 can be translated into a G-version format:
(5.10)
Let
Then the least-squares estimator can be written as
(5.11)
Then we have . Note that .
Then we are able to study the properties of by assigning different forms of in 5.8:
-
1.
With and , we have the centred moments of
-
2.
With , we get the object that is useful to derive a confidence interval in this context:
(5.12)
Interestingly, from 3.24 and 3.26, 5.12 further leads us to a robust confidence interval by solving the following equation:
or
The resulting confidence interval is robust in the sense that its coverage rate will be at least regardless of the unknown variance structure of the noise part in the regression. If we have more information showing that does not depend on the previous with , we can consider a smaller family of sets . Alternatively, it also provides a way to perform a sensitivity analysis on the performance of a regression estimator (such as here) under heteroskedastic noise with an unknown variance structure that could belong to different families of models.
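A minimal sketch of the resulting interval for the slope (our own construction and names): it assumes the worst case of the two-sided tail is attained at the constant volatility σ_i ≡ σ̄, which holds among constant scalar volatility vectors since the normal tail probability increases with the variance.

```python
import numpy as np
from scipy.stats import norm

# Sketch (our own): robust confidence interval for the slope in the
# no-intercept model y_i = beta * x_i + sigma_i * z_i with
# sigma_i in [s_lo, s_hi]; the coverage is at least 1 - alpha for any
# such variance structure, since the worst case uses the largest variance.
def robust_slope_ci(x, y, s_hi, alpha=0.05):
    x, y = np.asarray(x, float), np.asarray(y, float)
    beta_hat = (x @ y) / (x @ x)               # least-squares slope
    half = norm.ppf(1 - alpha / 2) * s_hi / np.sqrt(x @ x)
    return beta_hat - half, beta_hat + half
```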
Then this discussion leads us to another interesting question. In an early stage of data analysis, should we choose or ? This question will be explored in Section 5.5.
5.5 Inference on the general model structure of a state-space volatility model
Recall the setup in Section 5.4. In practice, if we lack knowledge of the underlying dynamic of the dataset, whether we should choose or is a difficult problem in classical statistical methodology (in model specification) because both families involve an infinite-dimensional family of elements. However, it turns out that it can essentially be transformed into a G-version question: it has a feasible solution once we introduce the G-expectation of the semi-G-family of distributions. This becomes a hypothesis test to distinguish between semi-sequential independence and sequential independence. To be specific, we are able to consider a test:
A good interpretation of this test is as follows: the class of hidden Markov models (with volatility as the switching regime) belongs to . If we reject the null hypothesis, it means the underlying process cannot be treated as a switching regime in the hidden Markov setup (or as any other kind of normal mixture model); instead, we need to re-investigate the dataset and consider processes outside the family of normal mixture models (for instance, we may need to introduce other dependencies, like one between the previous observation and the current , such as a feedback design). Throughout this discussion, we make no parametric assumption on the model of , and we are still able to give a rigorous test of this distinction. The idea of this test takes advantage of 3.24 to transform the distinction between two families of classical models into the task of distinguishing two different types of independence ( versus ) for the semi-G-normal vector . There are plenty of test functions (neither convex nor concave) that reveal the difference between and such that
For instance, we can choose
Under this , the expectation under is certainly zero, but the one under is greater than zero. Then we should be able to construct a test statistic based on the form of and obtain a rejection region by studying its tail probability under , which can be transformed back into the sublinear expectation of .
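Since the particular test function is not reproduced above, here is one hypothetical candidate (our guess, motivated by the cross term in 4.7): φ(x, y) = x y² applied to (W₁, W₂), writing Ê for the sublinear expectation and σ̲, σ̄ for the volatility endpoints.

```latex
% Hypothetical candidate (our guess): \varphi(x,y) = x y^2.
% Under H_0 (semi-sequential independence), for constant (\sigma_1,\sigma_2):
\hat{\mathbb{E}}\bigl[W_1 W_2^{2}\bigr]
  = \max_{\sigma_1,\sigma_2\in[\underline{\sigma},\overline{\sigma}]}
    \sigma_1\sigma_2^{2}\,E[\varepsilon_1]\,E[\varepsilon_2^{2}] = 0,
% while under sequential independence \sigma_2 may depend on the sign of
% W_1; taking \sigma_2=\overline{\sigma} on \{W_1>0\} and
% \underline{\sigma} on \{W_1\le 0\} gives
\hat{\mathbb{E}}\bigl[W_1 W_2^{2}\bigr]
  = \overline{\sigma}\,(\overline{\sigma}^{2}-\underline{\sigma}^{2})\,
    E[\varepsilon_1^{+}]
  = \frac{\overline{\sigma}\,(\overline{\sigma}^{2}-\underline{\sigma}^{2})}{\sqrt{2\pi}}
  > 0,
```

so a statistic built on this φ has certain zero expectation under H₀ but a strictly positive sublinear expectation under the alternative.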
How to choose the test function will have a significant effect on the performance of this hypothesis test. Moreover, the current interpretation of is as the length of the whole data sequence, with the unknown volatility dynamic of the full sequence. We can also interpret as the group size after grouping the dataset in either a non-overlapping or overlapping manner; then we can consider for each group to test whether any case falls into the class of , because the sublinear expectation gives control on the extremes of the group statistics, as indicated by Jin and Peng, (2021) and Section 2.2 in Fang et al., (2019).
Acknowledgements and the story behind the semi--normal
We have received many useful suggestions and feedback from the community in the past four years which are beneficial to the formation of this paper (so this paper can be treated as a report to the community).
The authors would first like to express their sincere thanks to Prof. Shige Peng, who visited our department in May 2017 (invited by Prof. Reg Kulperger); our discussion at that time motivated us to study a distributional and probabilistic structure that has a direct connection with the intuition behind the existing max-mean estimation proposed by Jin and Peng, (2021). Later on, during the Fields–China Industrial Problem Solving Workshop in Finance and a short course on the G-expectation framework given by Prof. Peng at the Fields Institute, Toronto, Canada, we had several interesting discussions on the data experiments in this context, which can be treated as the starting point of the companion paper of the current one. In our regular discussion notes from that period, there was a prototype of the current semi-G-normal distribution, and a question on the independence between semi-G-normal objects was raised, which is now included and answered by 4.9.
Although the design of semi--normal is mainly for distributional purpose, this concept was first proposed in Li and Kulperger, (2018), which was applied to design an iterative approximation towards the -normal distribution by starting from the linear expectations of classical normal, as discussed in Section 4.1. During the 2018 Young Researcher Meeting on BSDEs, Nonlinear Expectations and Mathematical Finance in Shanghai Jiao Tong University, we have received beneficial feedback on this iterative method from participants in the conference. Specially we would like to thank to Prof. Yiqing Lin on providing more references that have potential theoretical connections. 4.1.2 is benefited from the comments by Prof. Shuzhen Yang and Prof. Xinpeng Li.
The first author would also like to express his gratitude to Prof. Huaxiong Huang at Fields Institute and his Ph.D. student Nathan Gold for their support and suggestions during a separate long-term and challenging joint project (regularly discussing with Prof. Peng) during summer 2017 on a stochastic method of the -heat equation in high-dimensional case and its theoretical and numerical convergence. In this project, the first author has learned the intuition about the methods in how to appropriately design a switching rule in the stochastic volatility to approximate the solution to a fully nonlinear PDE, which is related to the methods based on BSDEs and the second order BSDEs, and also the intuition behind nonlinear expectations in this context. Although the methods are different, this experience creates another motivation for Li and Kulperger, (2018).
The authors are grateful for the valuable discussions with the community during the conference of Probability, Uncertainty and Quantitative Risk in July 2019. One of the motivations of 4.9 is from the comments by Prof. Mingshang Hu on the independence property of maximal distribution. The writing of Section 4.4 is motivated by the discussions with Prof. Peng during the conference. Section 4.4 and further the data experiments in the companion paper are also benefited from the discussions on the meaning of sequential independence under a set of measures with Prof. Jianfeng Zhang.
We have also benefited from the feedback from participants coming from various backgrounds in the Annual Meetings of SSC (Statistical Society of Canada) in 2018 and 2019 to understand the impression from general audience on the -expectation framework. During the poster session of the Annual Meeting of SSC at McGill University in 2018, we have received several positive comments about designing a substructure connecting the -expectation framework (which is a highly technical one for general audience) with the objects in the classical system. These comments further motivate us to write this paper for general readers. In the Annual Meeting of SSC at Calgary University in 2019, there is a comment from the audience on the property of and the choice of function space ( could be quite small if we choose a large function space for ). It has motivated us to improve the preliminary setup (Section 2) and put more attention on the design of .
During the improvement of this manuscript from the first version (April 2021) to the third version (October 2021), the authors are grateful to Prof. Defei Zhang who gives many beneficial comments (such as the comment on the product space and an improvement of Figure 4.1) and Prof. Xinpeng Li whose suggestion motivates us to develop the research in Section 5.2.
6 Proofs
6.1 Proofs in Section 3.2
Proof of 3.2.
The finiteness of is obvious due to the continuity of and the compactness of . First of all, note that 3.2 is a direct result of 3.1. It is also not hard to see 3.5, since for any , it satisfies , then
which implies
Since , the other direction of the inequality also holds. Similarly, we can show 3.3.
To validate 3.4, we need to show that for any , there exists a random variable such that
(6.1)
Let . Then we have . Since is a continuous function on , there exists such that . In a classical probability space , consider a sequence of random variables where and . In short, with diminishing variance. Then we must have . Next, transform into its truncation on : with . We can easily show that since, for any , By the classical Slutsky's theorem, Therefore, for any ,
For any , there exists such that Let , which belongs to . It is the required object satisfying 6.1, because
Proof of 3.6.
We can prove it by mathematical induction. For , it obviously holds. Suppose the result holds for with , namely,
then we only need to show it holds for . In fact, consider any locally Lipschitz function
satisfying, there exists , ,
Since is independent from ,
Let and
For notational convenience, we sometimes omit the domain of the maximization here in our later discussions if it is clear from the context.
Claim 6.1.
We have and .
Then we are able to apply the representation of maximal distribution (allowed by 6.1) to have
Therefore,
The conclusion can be achieved by induction.
The remaining task is to prove 6.1.
To show , we write
where we adapt to lower dimension in the sense that . Notice
Meanwhile, there exists (actually ), such that
Then we have
where
Next we check . For any ,
where . ∎
Proof of 3.5.
The first statement can be proved by studying the range of . First, we need to show that is also a locally Lipschitz function for any . Suppose satisfies,
(6.2)
We first can write
(6.3)
As preparation for the next step, we will frequently use the basic fact that lower-degree polynomials can be dominated by higher-degree ones in the sense that,
(6.4)
and for any ,
(6.5)
In 6.3, we can directly use 6.2 to dominate . For the parts like , 6.2 implies,
then there exists such that,
Hence, we can get by the following inequality:
Finally, we have from its representation:
The second statement essentially comes from the basic property of the maximum of a continuous function on a rectangle: in this ideal setup, the order of taking marginal maximum does not affect the final value. To show the basic idea, start from a simple case : if , for any , we can work on to show the other direction of independence,
where we have used the fact that if , which can be validated by 6.1. Hence, we have .
In general, for any permutation of , our objective is to prove for any ,
From the first statement, , as a function of , must also follow a maximal distribution, characterized by with
Then we can mimic the derivation for to check the independence,
Since it holds for all possible , it is equivalent to say
6.2 Proofs in Section 3.4 (improved)
In order to show the uniqueness of decomposition (3.9), we first prepare several lemmas.
Lemma 6.1.
For any where and is classical, if where is classical, we must have, for any fixed ,
Proof.
Since for any function ,
by replacing with , we have
It means for any , we have
Therefore, we have ∎
Lemma 6.2.
For a maximally distributed , we have .
Proof.
Let
Then we have and or . (Since each , we have by the completeness of .) Note that
It implies that
then
∎
Lemma 6.3.
Consider where is a compact and convex set and follows a non-degenerate classical distribution with . For any , if , there exists with such that does not depend on , or simply, when .
Proof.
For any with on , let . Then we have
Meanwhile, note that is bounded by and is bounded by in the quasi-sure sense, or
Then, for any , we have
then the intersection of two events has probability ,
Hence, with we must have
(The measurability of comes from the continuity of . Under any , the distribution of is always due to 3.7.1.) Then we have
Therefore,
For any , since ,
Then there must exist with such that for ,
or
For any , let with , then on . We have
Therefore, for ,
(6.6)
If there exist two distinct ,
we must have
This contradicts 6.6. Then we have, for any ,
where is any constant chosen from . ∎
Proof of 3.9.
Since , we have where and is classical satisfying . Suppose there exist and such that . To be specific, without loss of generality, we can assume
such that . Then we have
Note that ; then by 6.3, there exists with such that when . Let ; then we still have . Then we have, for any ,
Similarly, we can also show
Note that
where . By 6.1, letting , the fact that implies, for ,
Then we have, with ,
Meanwhile,
Then the set has to be a singleton . It means that or (in a quasi-sure sense). Then we also have
It means that , and then . The uniqueness has been proved. ∎
Proof of 3.12.
This is a direct consequence of 2.15. Let . On the one hand, for any , as discussed in 3.11.3, we have
On the other hand, when is convex or concave, by 2.15, we have
Hence, we have under convexity (or concavity) of .
For the reader's convenience, we include an explicit proof of why we have such results for the semi--normal distribution. For technical convenience, we assume is twice differentiable. From the representation of the semi--normal distribution (2.15), with , our goal is to show
First of all, by Taylor expansion with , we have,
where is a random variable depending on . Let , then
where is the density of . When is convex, we can use the fact to show the monotonicity of :
It tells us that is increasing with respect to ; then reaches its maximum at . Hence,
When is concave, is convex. Replacing above with and repeating the same procedure, we are able to show that is increasing with respect to , that is, is decreasing and reaches its maximum at . ∎
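As an informal numerical companion to this argument, the following sketch (ours, assuming the representation of the semi--normal as a classical normal with volatility ranging over an interval, denoted [sig_lo, sig_hi] below) checks by Monte Carlo that, for a convex test function, the maximizing volatility sits at the upper endpoint:

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal(1_000_000)   # classical standard normal draws

def upper_expectation(phi, sig_lo, sig_hi, n_grid=50):
    """Maximize the classical mean of phi(sigma * Z) over a grid of
    sigma in [sig_lo, sig_hi]: a Monte Carlo sketch of the
    representation of the semi-G-normal distribution."""
    sigmas = np.linspace(sig_lo, sig_hi, n_grid)
    means = np.array([np.mean(phi(s * z)) for s in sigmas])
    return sigmas[means.argmax()], means.max()

# For a convex phi the maximizer should be the upper endpoint sig_hi.
sig_star, val = upper_expectation(lambda w: w ** 4, sig_lo=0.5, sig_hi=2.0)
print(sig_star)                      # expected: 2.0
```

For a concave phi, the same search would return the lower endpoint, matching the statement just proved.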
6.3 Proofs in Section 3.6
The proofs in this section are mainly based on the results in Section 2.2, which provide fruitful tools for dealing with the independence of sequences in this framework.
Lemma 6.4.
In a sublinear expectation space, for a sequence of i.i.d. random variables (namely, ), we have
where is the identity matrix.
Proof.
Since the distribution of can be treated as the classical , the sequential independence can be treated as the classical independence (2.6.3). Then we can get the required results by applying classical logic. ∎
Remark 6.4.1.
Since the independence of is classical, the order of independence can also be arbitrarily switched so we can easily obtain a result similar to 3.6.
Proposition 6.5.
For a sequence of i.i.d. random variables , the following three statements are equivalent:
(1) ,
(2) for any permutation of ,
(3) .
Proof of 3.17.
Since the fully-sequential independence implies (F1) and (F2) by 2.20 and 2.23, we only need to show the other direction. When , this result is a consequence of 2.24. For , let
Next we proceed by mathematical induction. Suppose the result holds for with . For , we only need to show: given the conditions
1. ,
2. for ,
we have the fully-sequential independence:
(6.7)
Since all the independence relations in 6.7 up to the term can be guaranteed by the presumed result for , we only need to show the additional independence:
1. ,
2. .
The first one comes from (F1) whose definition implies . The second one comes from 2.22 given the following statements:
1. by (F1);
2. by the first one.
Then we have the required result for . The proof is finished by mathematical induction. ∎
Proof of 3.18.
First, the definition 3.18 of semi-sequential independence implies (S1) to (S3) by 2.20 and 2.23. We only need to check the other direction. For , let and similarly define the notation . Our goal can be expanded, by 2.8, as:
1. for any ,
2. for any .
The first one comes from (S2). For the second one, note that we have
1. by (S1),
2. by (S3),
3. by (S2),
then by 2.24, we have proved the second relation. ∎
Proof of 3.22.
The main idea is the equivalent definition of semi-sequential independence given by 3.18, which shows the symmetry between the part and the part. The equivalence of the three statements will be proved in the following logic:
Let denote a permutation function.
. It is a direct translation of 3.18 by considering the equivalence in each part:
. Let . Then
Then we can decompose (3) into three conditions each of which is equivalent to the condition in (1) under the context of 3.18:
6.4 Proofs in Section 3.7
Proof of 3.24.
(A note on the finiteness of sublinear expectations) For any , this means there exist and such that for
Without loss of generality, we can assume , then we have . It implies
To validate under each case, it will be sufficient to confirm the finiteness of this sublinear expectation: for any ,
(6.8)
(Semi-sequential independence case) Under the independence specified by 3.25, from 3.14, we have
where . Therefore,
where is the symmetric square root of , and . At the same time, we can validate the finiteness , because follows a classical multivariate normal distribution; then for any , , which implies 6.8.
Next, since we have , we only need to show for any ,
(6.9)
Note that for , the random vector must follow a joint distribution supported on a subset of ; then we can apply the representation of the multivariate semi--normal distribution (3.14) to get the inequality 6.9.
(Sequential independence case) We proceed by mathematical induction. For , the results 3.29 and 3.30 as well as the finiteness 6.8 hold by applying 3.10. Suppose they also hold for with . Our objective is to prove them for by using the result for . We decompose this goal into three inequalities:
(6.10)
(6.11)
and
(6.12)
After we check the three inequalities above, can be changed to since we will show the sublinear expectation can be reached by some in the proof of 6.10.
First of all, 6.11 is straightforward due to the fact that .
Second, to validate 6.10, it is sufficient to show that the sublinear expectation can be reached by choosing some . In fact, we can directly select it by the iterative procedure (similar to the idea of 4.1):
where is the maximizer depending on the value of .
Claim 6.2.
For any , let
Then we have .
To apply the result for , we first confirm that (due to 6.2). Then we have
where is the maximizer. From this procedure, we can choose
which corresponds to an element in . Then it is easy to confirm that by repeating the procedure above. Meanwhile, the finiteness 6.8 is also guaranteed since, for any , choosing , we have
due to the confirmed fact that and the assumed 6.8 for .
Third, as an equivalent way of viewing 6.12, we need to prove that for any , the corresponding linear expectation is dominated by . Actually, we can write the classical expectation as
(6.13)
Recall the notation we used in the proof of 6.10,
For the conditional expectation part in 6.13, since the information of is given and , from the representation of univariate semi--normal (3.10), it must satisfy:
Hence, by taking expectations on both sides and applying the presumed result for , we have
Therefore, we have shown 6.12. The proof is completed by induction.
(Fully-sequential independence case) Note that fully-sequential independence implies sequential independence, and we have shown an explicit representation of for the latter situation. Hence, the representation here is the same as 3.29 and 3.30.
To prove 6.2, first recall the definition of , which means there exists and such that for
Note that
Then we write
where we adapt the norm to lower dimension in the sense that . By triangle inequality, for any ,
then
Hence, with and ,
Proof of 3.24.2.
Under semi-sequential independence, note that
with . It has the representation (3.5) that,
where follows a standard multivariate normal. When is convex, by simply repeating the idea of 3.12 in multivariate case, we can show
Accordingly, when is concave, we can get similar result with replaced by .
Under sequential independence, we follow the idea of showing
in 3.24. The maximizer can be obtained by implementing the iterative algorithm: with , ,
(6.14)
Then we only need to record the optimizer which is a function of to get the maximizer . First we can show that, for ,
(6.15)
namely, the convexity (or concavity) of can be carried over to . Actually, if is convex (in ), it must be convex with respect to each subvector of arguments. Then by applying 3.12, we have
(6.16)
which also gives us the choice of . Then we can validate the convexity of by definition: with , ,
We can follow the same arguments to show the concave case. Finally, we can start from the convexity (concavity, respectively) of to show the convexity of all , and along the way, we get that each of the optimal is equal to (, respectively). ∎
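The iterative scheme (6.14) is a backward recursion, and it can be sketched numerically. The following code (our own illustration; the grids, the volatility interval [sig_lo, sig_hi] and the function name are all assumptions for the sketch) evaluates the sublinear expectation of a sum of sequentially independent semi--normal increments by brute-force maximization over the volatility at each step:

```python
import numpy as np

def sublinear_expectation(phi, n, sig_lo, sig_hi, n_sig=21, n_z=201):
    """Backward recursion sketch of (6.14): phi_n = phi and
    phi_{i-1}(x) = max_sigma E[phi_i(x + sigma * Z)], evaluated on grids
    for x, sigma and Z; returns phi_0(0)."""
    x_grid = np.linspace(-10, 10, 401)
    z = np.linspace(-5, 5, n_z)
    w = np.exp(-z ** 2 / 2)
    w /= w.sum()                              # discretized N(0,1) weights
    sigmas = np.linspace(sig_lo, sig_hi, n_sig)
    v = phi(x_grid)                           # phi_n on the x grid
    for _ in range(n):
        nxt = np.empty_like(v)
        for j, x in enumerate(x_grid):
            # E[phi_i(x + sigma * Z)] for each sigma, then take the max
            vals = [np.interp(x + s * z, x_grid, v) @ w for s in sigmas]
            nxt[j] = max(vals)
        v = nxt
    return np.interp(0.0, x_grid, v)          # phi_0 evaluated at 0

# Sanity check: for the convex phi(x) = x^2 the recursion should return
# the classical variance at the largest volatility, n * sig_hi**2.
print(sublinear_expectation(lambda x: x ** 2, n=3, sig_lo=0.5, sig_hi=1.0))
```

Recording the maximizing sigma at each step, as a function of the running sum, recovers the optimal volatility process described in the proof.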
6.5 Proofs in Section 4.3
The goal of this section is to prove 4.8, which is a simple representation of for . Throughout this section, without further notice, we consider imposed with the sequential independence . We also have the expression with , and .
Lemma 6.6.
For , if is odd,
that is, it has certain zero mean.
Proof.
Directly work on the sublinear expectation by imposing the sequential independence. Let ; then we have
because we are essentially working on an odd moment of with . Then we have
Similarly, we have . ∎
Lemma 6.7.
For with even positive integer ,
Furthermore, we have the even moments of :
Proof.
This result comes directly from , due to the convexity of , which can be validated by considering its Hessian matrix. Then we can check that
∎
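As a minimal worked instance of this even-moment computation (under the representation of the semi--normal with volatility ranging over an interval, and using the classical fact that the even moments of a standard normal are double factorials), one obtains:

```latex
% A sketch under the stated representation: x -> x^k is convex for even k,
% so the maximum over sigma is attained at the upper endpoint, and
% E[Z^k] = (k-1)!! for even k (classical fact) gives
\[
  \hat{\mathbb{E}}\big[W^{k}\big]
  \;=\; \max_{\sigma\in[\underline{\sigma},\,\overline{\sigma}]}
        \sigma^{k}\,\mathbb{E}\big[Z^{k}\big]
  \;=\; \overline{\sigma}^{\,k}\,(k-1)!!,
  \qquad k \text{ even}.
\]
```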
Lemma 6.8.
For with odd positive integer ,
(6.17)
where satisfies:
(6.18)
Furthermore, for odd , we have the moments of :
where
Proof.
We can directly check the sublinear expectation
Since the terms of the form , or with even (so that is odd) all have zero mean (with no ambiguity), they can be omitted in the computation by 6.6. Hence,
The inner part can be expressed as
Notice that the monotonicity of with respect to depends on the sign of . Hence
Then we have
Here we have
where each has certain mean zero, so . Therefore,
with . Since is a convex function by noting that it stays at on and increases when , we have
Therefore, we obtain the optimal in the form 6.18, which can be double-checked by plugging it back into the right-hand side of 6.17 to show the equality. We can further get the exact value of by continuing the procedure above,
Here we need to use the property of the classical half-normal distribution:
Since and have the same distribution,
then Hence,
Also notice that , then follows a half-normal distribution or follows a -distribution with raw moments:
Then for , with
Therefore,
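For reference, the raw absolute moments of a standard normal (equivalently, the raw moments of the half-normal distribution) used above are a classical fact:

```latex
% Classical fact (not specific to this paper): for Z ~ N(0,1),
\[
  \mathbb{E}\,|Z|^{k}
  \;=\; \frac{2^{k/2}}{\sqrt{\pi}}\,\Gamma\!\Big(\frac{k+1}{2}\Big),
  \qquad k = 0, 1, 2, \ldots,
\]
% e.g. E|Z| = sqrt(2/pi) and E|Z|^3 = 2*sqrt(2/pi).
```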
Proof of 4.8.
The representation under semi-sequential independence can be directly checked based on 3.24 and 3.24.3. In the following, we only consider the case of sequential independence , because will induce the same result by a logic similar to the proof of 3.24. The basic idea is that we need to show can be reached by the linear expectation on the right-hand side for some . Then we have
The reverse direction of inequality comes from the fact that and 3.24. The logic here is similar to the proof of 3.24 in sequential-independence case, that is, we only need to record the optimal choice of when evaluating the sublinear expectation in an iterative way.
For instance, when , the sublinear expectation can be reached by some as illustrated in 6.7 and 6.8. For with ,
(6.19)
Meanwhile, for any , let and denote the linear expectation. Note that ,
with . We also have due to the setup which is the same as Section 3.7. Then
Hence,
(6.20)
Then we divide our discussion into three cases: (a) is odd; (b) is even and is even; (c) is even and is odd. When is odd, the expectation in 6.19 is equal to by 6.6. It can obviously be reached by the linear expectation on the right-hand side by choosing any by 6.20. When is even, we can see that the choice of depends on the sign of , which further depends on the sign of if is odd (otherwise it is always non-negative). To be specific, when both and are even,
which can be reached by choosing , namely, . When is even and is odd, we have
Hence, by 3.12, we have
It can be reached by choosing and accordingly in 6.20. A similar logic can be applied to and also , where the scaling does not affect the form of the optimal choice of . ∎
6.6 Proofs in Section 5.2
To prepare for the proof, we consider the following function spaces:
• : the space of -times continuously differentiable functions on ,
• : the space of bounded and continuous functions on ,
• .
For any , since is bounded, we have
Lemma 6.9.
Assume and . Suppose the convergence
(6.21)
holds for any . Then it also holds for .
Proof of 6.9.
We first consider with compact support . By the uniform approximation provided by Pursell, (1967), for any , there exists with support such that
For , since is continuous with compact support, it must be bounded by . By the mean-value theorem, for and some , we have
Thus is uniformly continuous and bounded, implying . In this way, we have
Hence, . It means that
Since can be arbitrarily small, we have the convergence 6.21 holds.
Next consider any which is bounded by . For any , it can be decomposed into where has compact support and satisfies if and for ,
Then we have
where the first term must converge by our previous argument. Then we only need to work on the second term, which satisfies:
Note that . Then we have . Since can be arbitrarily large, we obtain the convergence 6.21. ∎
Lemma 6.10.
For any , the function , defined as
must be bounded and increasing. It also satisfies .
Proof of 6.10.
The boundedness (and the limit property) can be directly derived from the boundedness (and uniform continuity) of . For the monotonicity, for any , since , we must have . ∎
Proof of 5.9.
We adapt the idea of the Lindeberg method in a "leave-one-out" manner to the sublinear context. One of the reasons we are able to do this adaptation is the symmetry in semi--independence: is semi--independent from .
Note that, with the semi--independence, we have
Then we consider a sequence of classically i.i.d. satisfying and
For each , consider a triangular array,
and
For this , consider another triangular array which is semi--version i.i.d., following the semi--normal and satisfying
Note that here we use the same sequence in . This setup is important for our proof to overcome the difficulty brought by the sublinear property of (it also gives some insight into the role of in the classical central limit theorem compared with the sublinear context). Let
then we must have (by the stability of the semi--normal as shown in 3.23):
Our goal is to show the difference, for any (recall 6.9), as ,
(6.22)
Consider the following summations:
(6.23)
and
(6.24)
with the common convention that an empty sum is defined as zero. Note that and ; then we can transform the difference in 6.22 into the telescoping sums
(6.25)
and
(6.26)
Then we only need to work on the summand . By a Taylor expansion,
for some .
For the first term , its sublinear expectation must exist because the growth of is at most linear due to the boundedness of . Note that is the inner product of
and
with the independence , so we have and are semi--independent. Then we can compute
Similarly, we have . Hence, has certain mean zero. Then we have
For the second term , note that
For , since , by recalling the property of (6.10), we have
where we use the setup and . Similarly, we have
where we use the setup . For , since and are semi--independent (noting that and depend on the same ), we have
where we use the fact that . Similarly we have so has certain mean zero. Therefore, we have
Meanwhile, if we reverse the role of and and let with , we get
Hence, by 6.25 and 6.26, we have
Note that for any , , we have
By 6.10, we have for all , so . Meanwhile, we have (classically) almost surely as ; then by the classical dominated convergence theorem, we have , implying . Similarly, we can show . Finally, we have
or
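As a purely classical illustration of the leave-one-out swapping used in this proof (our own sketch; the test function, sample sizes and distributions are illustrative), one can replace i.i.d. summands by normal ones a block at a time and observe that the expectation of a smooth bounded test function barely moves:

```python
import numpy as np

rng = np.random.default_rng(2)

def swap_expectations(phi, n=100, reps=50_000):
    """Classical analogue of the Lindeberg swap: estimate E[phi(S/sqrt(n))]
    when the first k summands are standard normal and the rest are
    uniform with mean 0 and variance 1, for k = 0, n/2, n."""
    x = rng.uniform(-np.sqrt(3), np.sqrt(3), (reps, n))   # mean 0, var 1
    z = rng.standard_normal((reps, n))
    out = []
    for k in (0, n // 2, n):                 # number of swapped summands
        s = np.concatenate([z[:, :k], x[:, k:]], axis=1).sum(axis=1)
        out.append(np.mean(phi(s / np.sqrt(n))))
    return out

print(swap_expectations(np.cos))             # three nearly equal values
```

In the proof above the same telescoping is carried out under the sublinear expectation, with the semi--normal increments playing the role of the classical normals.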
References
- Artzner et al., (1999) Artzner, P., Delbaen, F., Eber, J.-M., and Heath, D. (1999). Coherent measures of risk. Mathematical Finance, 9(3):203–228.
- Bayraktar and Munk, (2015) Bayraktar, E. and Munk, A. (2015). Comparing the -normal distribution to its classical counterpart. Communications on Stochastic Analysis, 9(1):1–18.
- Breiman, (1992) Breiman, L. (1992). Probability. Society for Industrial and Applied Mathematics, USA.
- Chatfield, (1995) Chatfield, C. (1995). Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society: Series A (Statistics in Society), 158(3):419–444.
- Chen and Epstein, (2002) Chen, Z. and Epstein, L. (2002). Ambiguity, risk, and asset returns in continuous time. Econometrica, 70(4):1403–1443.
- Choquet, (1954) Choquet, G. (1954). Theory of capacities. In Annales de l’Institut Fourier, volume 5, pages 131–295.
- Crandall et al., (1992) Crandall, M. G., Ishii, H., and Lions, P.-L. (1992). User’s guide to viscosity solutions of second order partial differential equations. Bulletin of the American Mathematical Society, 27(1):1–67.
- Deng et al., (2019) Deng, S., Fei, C., Fei, W., and Mao, X. (2019). Stability equivalence between the stochastic differential delay equations driven by -Brownian motion and the Euler–Maruyama method. Applied Mathematics Letters, 96:138–146.
- Denis et al., (2011) Denis, L., Hu, M., and Peng, S. (2011). Function spaces and capacity related to a sublinear expectation: application to -Brownian motion paths. Potential Analysis, 34(2):139–161.
- Der Kiureghian and Ditlevsen, (2009) Der Kiureghian, A. and Ditlevsen, O. (2009). Aleatory or epistemic? Does it matter? Structural Safety, 31(2):105–112.
- Dolinsky et al., (2012) Dolinsky, Y., Nutz, M., and Soner, H. M. (2012). Weak approximation of -expectations. Stochastic Processes and their Applications, 122(2):664–675.
- Ellsberg, (1961) Ellsberg, D. (1961). Risk, ambiguity, and the Savage axioms. The Quarterly Journal of Economics, 75(4):643–669.
- Epstein and Ji, (2013) Epstein, L. G. and Ji, S. (2013). Ambiguous volatility and asset pricing in continuous time. The Review of Financial Studies, 26(7):1740–1786.
- Fang et al., (2019) Fang, X., Peng, S., Shao, Q., and Song, Y. (2019). Limit theorems with rate of convergence under sublinear expectations. Bernoulli, 25(4A):2564–2596.
- Fei and Fei, (2019) Fei, C. and Fei, W. (2019). Consistency of least squares estimation to the parameter for stochastic differential equations under distribution uncertainty. arXiv preprint arXiv:1904.12701.
- Föllmer and Schied, (2011) Föllmer, H. and Schied, A. (2011). Stochastic finance: an introduction in discrete time. Walter de Gruyter.
- Hu, (2012) Hu, M. (2012). Explicit solutions of -heat equation with a class of initial conditions by -Brownian motion. Nonlinear Analysis, 75(18):6588–6595.
- Hu and Li, (2014) Hu, M. and Li, X. (2014). Independence under the -expectation framework. Journal of Theoretical Probability, 27(3):1011–1020.
- Hu et al., (2017) Hu, M., Peng, S., and Song, Y. (2017). Stein type characterization for -normal distributions. Electronic Communications in Probability, 22.
- Huang and Liang, (2019) Huang, S. and Liang, G. (2019). A monotone scheme for -equations with application to the explicit convergence rate of robust central limit theorem. arXiv preprint arXiv:1904.07184.
- Huber, (2004) Huber, P. J. (2004). Robust statistics, volume 523. John Wiley & Sons.
- Jin and Peng, (2016) Jin, H. and Peng, S. (2016). Optimal unbiased estimation for maximal distribution. arXiv preprint arXiv:1611.07994.
- Jin and Peng, (2021) Jin, H. and Peng, S. (2021). Optimal unbiased estimation for maximal distribution. Probability, Uncertainty and Quantitative Risk, 6(3):189–198.
- Knight, (1921) Knight, F. H. (1921). Risk, uncertainty and profit, First edition. Boston, New York, Houghton Mifflin Company.
- Krylov, (2020) Krylov, N. V. (2020). On Shige Peng's central limit theorem. Stochastic Processes and their Applications, 130(3):1426–1434.
- Li, (2018) Li, Y. (2018). Statistical exploration in the -expectation framework: the pseudo simulation and estimation of variance uncertainty. Master’s thesis, The University of Western Ontario, London, ON, Canada.
- Li and Kulperger, (2018) Li, Y. and Kulperger, R. (2018). An iterative approximation of the sublinear expectation of an arbitrary function of -normal distribution and the solution to the corresponding -heat equation. arXiv preprint arXiv:1804.10737.
- Pei et al., (2021) Pei, Z., Wang, X., Xu, Y., and Yue, X. (2021). A worst-case risk measure by -VaR. Acta Mathematicae Applicatae Sinica, English Series, 37(2):421–440.
- Peng, (2004) Peng, S. (2004). Filtration consistent nonlinear expectations and evaluations of contingent claims. Acta Mathematicae Applicatae Sinica, English Series, 20(2):191–214.
- Peng, (2007) Peng, S. (2007). -expectation, -Brownian motion and related stochastic calculus of Itô type. In Stochastic analysis and applications, pages 541–567. Springer.
- Peng, (2008) Peng, S. (2008). Multi-dimensional -Brownian motion and related stochastic calculus under -expectation. Stochastic Processes and their Applications, 118(12):2223–2253.
- Peng, (2017) Peng, S. (2017). Theory, methods and meaning of nonlinear expectation theory. SCIENTIA SINICA Mathematica, 47(10):1223–1254.
- (33) Peng, S. (2019a). Law of large numbers and central limit theorem under nonlinear expectations. Probability, Uncertainty and Quantitative Risk, 4(1):4.
- (34) Peng, S. (2019b). Nonlinear Expectations and Stochastic Calculus under Uncertainty: with Robust CLT and -Brownian Motion, volume 95. Springer-Verlag Berlin Heidelberg.
- Peng and Yang, (2020) Peng, S. and Yang, S. (2020). Autoregressive models of the time series under volatility uncertainty and application to VaR model. arXiv preprint arXiv:2011.09226.
- Peng et al., (2020) Peng, S., Yang, S., and Yao, J. (2020). Improving value-at-risk prediction under model uncertainty. Journal of Financial Econometrics.
- Peng and Zhou, (2020) Peng, S. and Zhou, Q. (2020). A hypothesis-testing perspective on the -normal distribution theory. Statistics & Probability Letters, 156:108623.
- Pursell, (1967) Pursell, L. E. (1967). Uniform approximation of real continuous functions on the real line by infinitely differentiable functions. Mathematics Magazine, 40(5):263–265.
- Rokhlin, (2015) Rokhlin, D. B. (2015). Central limit theorem under uncertain linear transformations. Statistics & Probability Letters, 107:191–198.
- Schmeidler, (1989) Schmeidler, D. (1989). Subjective probability and expected utility without additivity. Econometrica, 57(3):571–587.
- Song, (2020) Song, Y. (2020). Normal approximation by Stein's method under sublinear expectations. Stochastic Processes and their Applications, 130(5):2838–2850.
- Xu and Xuan, (2019) Xu, Q. and Xuan, X. M. (2019). Nonlinear regression without iid assumption. Probability, Uncertainty and Quantitative Risk, 4(1):1–15.
- Zhang and Chen, (2014) Zhang, D. and Chen, Z. (2014). A weighted central limit theorem under sublinear expectations. Communications in Statistics-Theory and Methods, 43(3):566–577.
- Zhang, (2016) Zhang, L. (2016). Rosenthal’s inequalities for independent and negatively dependent random variables under sub-linear expectations with applications. Science China Mathematics, 59(4):751–768.