
Semi-G-normal: a Hybrid between Normal and G-normal (Full Version)

Yifan Li School of Mathematics and Statistics, University of Western Ontario, London, Canada E-mail: [email protected] Reg Kulperger School of Mathematics and Statistics, University of Western Ontario, London, Canada Hao Yu School of Mathematics and Statistics, University of Western Ontario, London, Canada
(Detailed research work for conference and open discussions)
Abstract

The G-expectation framework is a generalization of the classical probabilistic system motivated by Knightian uncertainty, in which the G-normal distribution plays a central role. However, from a statistical perspective, G-normal distributions look quite different from classical normal ones. For instance, their uncertainty is characterized by a set of distributions which covers not only classical normals with different variances, but also distributions that typically have non-zero skewness. The G-moments of G-normals are defined through a class of fully nonlinear PDEs called G-heat equations. To understand the G-normal in a probabilistic and stochastic way that is more friendly to statisticians and practitioners, we introduce a substructure called the semi-G-normal, which behaves like a hybrid between normal and G-normal: it has variance uncertainty but zero skewness. We will show that the non-zero skewness arises when we impose the G-version sequential independence on the semi-G-normal. More importantly, we provide a series of representations of random vectors with semi-G-normal marginals under various types of independence. Each of these representations under a typical order of independence is closely related to a class of state-space volatility models with a common graphical structure. In short, the semi-G-normal gives a (conceptual) transition from the classical normal to the G-normal, giving us a better understanding of the distributional uncertainty of the G-normal and of the sequential independence.

1 Introduction

The G-expectation framework is a new generalization of the classical probabilistic system, aimed at dealing with random phenomena in dynamic situations where it is hard to precisely determine a unique probabilistic model. These situations are closely related to the long-standing concern about model uncertainty in statistical practice; for instance, Chatfield (1995) gives an overview of this concern. However, how to better connect the ideas of this framework with general data practice is still a developing and challenging area, one that requires researchers and practitioners from different backgrounds to collaborate and to reflect on the different degrees of uncertainty brought not only by the complicated nature of the data but also by the modeling procedure itself. To give some examples (rather than a complete list), several recent attempts have been made by Pei et al. (2021); Peng et al. (2020); Peng and Zhou (2020); Xu and Xuan (2019); Li (2018) and Jin and Peng (2016) (which has been published as Jin and Peng (2021)).

A fundamental and unavoidable problem is how to better understand the G-version distributions and independence from a statistical perspective, which also requires long-term efforts of learning, thinking and exploration. This research work can be treated as a detailed, systematic report of our exploration of this basic point over the past three years, addressed to a broad community. This community includes not only experts in the area of nonlinear expectations (such as the G-expectation) but also researchers and practitioners from other related fields who may not be familiar with the theory of the G-expectation framework (G-framework) but are interested in the interplay between their areas and the G-framework, which requires them to properly understand the meanings and potential of G-version distributions and independence. One vision of this report is to explore and understand the role of statistical methods incorporating G-version distributions or processes (with their own notion of independence) in general data practice, as well as their differences from and connections with existing classical methods. More importantly, we intend to show how introducing the notions of the G-framework (such as its distributions and independence) broadens the range of questions we are able to consider (this goal has been partially indicated in Section 5.5). This report is also intended to initiate an in-depth discussion on this subject with the broad community.

Considering the length and scope of this work, we divide our core discussions into two stages. The first stage (this paper) can be treated as a theoretical preparation for the second stage (a companion of this paper), which provides a series of statistical data experiments based on the theoretical results established here.

The main objective of this paper is to provide a better interpretation and understanding of the G-normal distribution and the G-version independence, designed for researchers and practitioners from various backgrounds who are familiar with classical probability and statistics. We will achieve this goal by introducing a new substructure called the semi-G-normal distribution, which behaves like a hybrid connecting the normal and the G-normal: it is a typical object with distributional uncertainty that preserves many properties of the classical normal but is also closely related to the G-normal distribution.

In any probabilistic framework, if there exists a “normal” distribution (or an equivalent distributional object), it should play a fundamental role in the system, and how to understand and deal with it is crucial for the development of that framework. The G-normal distribution, as the analogue of the classical normal, plays such a central and fundamental role in the development of the G-expectation framework.

1.1 Introduction to the G-expectation framework

First we give general readers a short introduction to the G-expectation framework. The classical probabilistic system is good at describing randomness under a single probability rule or model \mathbb{P}_{\theta} (which could be sophisticated in its form). However, in practice, there are phenomena where it is hard to precisely determine a unique \mathbb{P}_{\theta} to describe the randomness. In this case, we cannot ignore the uncertainty in the probability rule itself. This kind of uncertainty is often called Knightian uncertainty in economics (Knight (1921)) or epistemic uncertainty in statistics (Der Kiureghian and Ditlevsen (2009)). It is also commonly called model uncertainty if it refers to the uncertainty in the probabilistic model. A standard example of Knightian uncertainty comes from the Ellsberg paradox proposed by Ellsberg (1961), showing the violation of the classical expected utility theory based on a linear expectation. In this case, we essentially need to work with a set \mathcal{P}=\{\mathbb{P}_{\theta},\theta\in\Theta\} of probability measures. In order to quantify the extreme cases under \mathcal{P}, we need to work with a sublinear expectation \mathcal{E} defined as:

\mathcal{E}[\cdot]\coloneqq\sup_{\mathbb{P}\in\mathcal{P}}\mathbb{E}_{\mathbb{P}}[\cdot]. (1.1)

The sublinear expectation defined in (1.1) first appeared as the upper prevision in Huber (2004). We also call (1.1) a representation of \mathcal{E}. Coherent risk measures proposed by Artzner et al. (1999) can also be represented in this form, and more details can be found in Föllmer and Schied (2011). The notion of the Choquet expectation (Choquet (1954)) is another special type of sublinear expectation, which is the foundation of a new theory of expected utility by Schmeidler (1989) resolving the Ellsberg paradox in the static situation. For the dynamic situation, the utility theory can be developed using the sublinear version of the g-expectation proposed by Chen and Epstein (2002). In principle, the g-expectation can only deal with those dynamic situations where we can find a reference measure \mathbb{Q} dominating \mathcal{P}. This situation is ideal for technical convenience but quite restrictive compared with reality: it means all the probabilities in \mathcal{P} agree on the same null events. For instance, in the context of financial modeling, when there is (Knightian) uncertainty or ambiguity in the volatility process \sigma_{t}, the set \mathcal{P} may not necessarily have a reference measure (Epstein and Ji (2013)). How should we deal with a possibly non-dominated \mathcal{P} in a dynamic situation? It took the community many years to realize that it is necessary to jump out of the classical probability system and start from scratch to construct a new generalization of the probability framework, which was established by Peng (2004, 2007, 2008) and further developed by the academic community led by him: the G-expectation framework.
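To make (1.1) concrete for readers coming from classical statistics, here is a minimal numerical sketch (ours, purely for illustration and not part of the framework itself) in which the family \mathcal{P} is a finite collection of centered normal models; in general \mathcal{P} is infinite-dimensional and possibly non-dominated, so this toy only illustrates the "supremum over candidate models" mechanism.

```python
# Minimal sketch (illustration only): the sublinear expectation in (1.1) as a
# supremum of classical expectations over a *finite* family of candidate models.
# Each candidate model here is N(0, sigma^2) for sigma in a small list; all
# numerical choices are ours and not prescribed by the G-framework.
import numpy as np

rng = np.random.default_rng(0)
eps = rng.standard_normal(200_000)          # reference N(0,1) sample

def upper_expectation(phi, sigmas):
    """sup over sigma of E[phi(sigma * eps)], approximated by Monte Carlo."""
    return max(np.mean(phi(s * eps)) for s in sigmas)

phi = lambda x: np.maximum(x - 1.0, 0.0)    # a call-type payoff
print(upper_expectation(phi, sigmas=[0.5, 0.75, 1.0]))   # attained at sigma = 1.0 here
```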

Since its establishment in the 2000s, the G-expectation framework has gradually developed into a new generalization of the classical one, with its own notions of independence and distributions, as well as the associated stochastic calculus. The spirit of considering \mathcal{P} to characterize Knightian uncertainty is embedded into this framework from its initial setup. A distribution under the G-expectation can be represented by a family of classical distributions: it provides a convenient way to depict distributional uncertainty that requires an infinite-dimensional family of distributions, which usually does not have an explicit parametric form. More details about this framework can be found in Denis et al. (2011); Peng (2017); Peng (2019b).

The G-normal distribution \mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) is the analogue of the classical normal N(0,\sigma^{2}) in this framework. As indicated by its notation, it is a typical object with variance uncertainty. In theory, it plays a central role in the context of the central limit theorem (Peng (2019a)): it is the asymptotic distribution of the normalized sum of a sequence of independent random variables with zero mean and variance uncertainty. It has a Stein-type characterization provided by Hu et al. (2017). Fang et al. (2019) provide an insightful discrete approximation and continuous-time representation of the G-normal distribution. In practice, the G-normal has also shown its potential in the study of risk measures, such as the Value at Risk induced by the G-normal (G-VaR), rigorously constructed in Peng et al. (2020) and further developed in the recent Peng and Yang (2020), where the G-VaR mostly outperforms the benchmark methods in terms of violation rate and predictive performance.

1.2 Potential misunderstandings of the G-normal and independence

Since the notions of distribution and independence in the G-expectation framework are different from the classical ones, there are several potential misunderstandings in the interpretation of the G-normal and its independence. The sources of these misunderstandings can be summarized into the following four aspects (where we have also provided clarification when applicable):

  1. A1

    (The uncertainty set of the G-normal) The G-expectation of the G-normal is defined through the (viscosity) solution of a fully nonlinear PDE (the G-heat equation), which usually does not have an explicit form except in some special cases (Hu (2012)). In fact, following the spirit of Knightian uncertainty, a better interpretation of a G-version distribution is as a family of classical distributions characterizing the distributional uncertainty. Nonetheless, for a general reader, if not careful, the notation of the G-normal \mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) could lead to the misconception that it is associated with the family \{N(0,\sigma^{2}),\sigma\in[\underline{\sigma},\overline{\sigma}]\}. Although this impression still holds in special situations, as shown in 3.12, it is not rigorous in general. Actually, the uncertainty set of \mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) is much larger than this family; one piece of evidence is that the G-normal distribution has third-moment uncertainty (all of its odd moments have uncertainty), whereas every distribution in the family \{N(0,\sigma^{2}),\sigma\in[\underline{\sigma},\overline{\sigma}]\} is symmetric and thus has zero third moment. This means that the uncertainty set of the G-normal contains classical distributions with non-zero third moments. This may seem like a strange property for a “normal” distribution in a probabilistic system (especially when we note that X\overset{\text{d}}{=}-X if X\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])). An explicit form of the uncertainty set of the G-normal is given by Denis et al. (2011).

  2. A2

    (The missing connection between univariate and multivariate G-normal) The joint random vector formed by n independent G-normal distributed random variables does not follow a multivariate G-normal distribution (even after any invertible linear transformation of the original vector). More study of the counter-intuitive properties of the G-normal can be found in Bayraktar and Munk (2015).

  3. A3

    (The asymmetry of independence) The independence in this framework is asymmetric: that X is independent of Y does not necessarily mean that Y is independent of X. This is why this independence is also called sequential independence, which is different from the classical one. One interpretation of this asymmetry in the relation “Y is independent of X” comes from the temporal order: if Y is realized at a time point after X, the roles of X and Y are asymmetric (in terms of the possible dependence structure). Another interpretation comes from the distributional uncertainty: any realization X=x has no effect on the uncertainty set of Y. Both interpretations are valid if one understands the detailed theory of this framework. However, for a general audience, both of them remain vague and may even become quite confusing if combined in a naive way (such as “if Y happens after X, any realization of Y should have no effect on X, so we automatically have one direction of independence”). So far we do have a simple example showing that the independence is indeed asymmetric (Example 1.3.15 in Peng (2019b)), but it is not clear why the independence is asymmetric in this example. To be specific, how does the distributional uncertainty of the joint vector (X,Y) (or the representation of its sublinear expectation) change if we switch the order of the independence? Such a representation (even in a special case) will help a general audience better understand the sequential independence, in the sense that they can explicitly see how the order of independence changes the underlying distributional uncertainty.

  4. A4

    (The lack of caution before data analysis) Suppose one intends to use the G-normal distribution to describe the distributional uncertainty in a dataset (either artificial or real). Without enough caution, the misinterpretations of the independence and distributions in this framework mentioned above may further bring confusion, or even mistakes, into the data-analysis procedure.

The objectives of this paper all serve one central problem: from a statistical perspective, how can we better understand the G-normal distribution and the G-version independence? The answer to this question will also lead to a better interpretation and understanding of the G-normal distribution for the general audience and for practitioners who are familiar with classical probability and statistics. We will work towards this central problem through the following four basic questions, where each question “Q[k]” corresponds to the aspect “A[k]” mentioned above.

  1. Q1

    How does the third-moment uncertainty of the G-normal arise? Is it possible to use the linear expectations of classical normals to approach the sublinear expectation of the G-normal (without involving the underlying PDEs)?

  2. Q2

    How should we appropriately connect the univariate objects and the multivariate objects in this framework? Since it is hard to start from univariate G-normals and obtain a multivariate G-normal, is it possible to make a retreat at the starting point, that is, to connect univariate classical normals with a multivariate G-normal?

  3. Q3

    How can we understand the asymmetry of the independence in this framework in terms of representations?

  4. Q4

    What kinds of data sequences are related to the volatility uncertainty covered by the G-normal, and what kinds are not?

The interpretation of the G-normal and the sequential independence will also be important for theoretically investigating the reliability and robustness of risk measures derived from G-version distributions, such as the G-VaR currently in the literature.

1.3 Our main tool and results in this paper

Our main tool here is a substructure called the semi-G-normal distribution (Section 3.4), which behaves like a close relative of both classical models (such as a normal mixture model) and G-version objects (such as the G-normal). We will also study the various kinds of independence associated with semi-G-normal distributions (Section 3.6).

The notion of the semi-G-normal was first proposed in Li and Kulperger (2018), where it was used to design an iterative approximation to the sublinear expectation of the G-normal and to the solution of the G-heat equation. Later on, this substructure was further developed in the master's thesis Li (2018), where independence structures were proposed to better perform the pseudo simulation in this context.
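To convey the flavor of that iterative approximation without its technical details, the following is a rough numerical sketch (our own simplified rendering, not the exact scheme in the cited works): starting from a test function, repeatedly apply a "Gaussian smoothing over a small time step, then supremum over the volatility interval" step; the value at zero then approximates the sublinear expectation of the G-normal.

```python
# Rough sketch (our simplified rendering, not the exact scheme in the cited work):
# iterate "Gaussian smoothing over a small time step, then sup over sigma in
# [sig_lo, sig_hi]" to approximate E[phi(X)] for X ~ N(0, [sig_lo^2, sig_hi^2]).
import numpy as np

def g_normal_expectation(phi, sig_lo, sig_hi, n_steps=50, n_sigma=11):
    xs = np.linspace(-8.0, 8.0, 801)              # spatial grid
    zs = np.linspace(-4.0, 4.0, 201)              # nodes for the N(0,1) smoothing
    w = np.exp(-zs**2 / 2.0); w /= w.sum()        # normalized Gaussian weights
    sigmas = np.linspace(sig_lo, sig_hi, n_sigma)
    u, dt = phi(xs), 1.0 / n_steps
    for _ in range(n_steps):
        best = np.full_like(u, -np.inf)
        for s in sigmas:                          # sup over the volatility choice
            shifted = xs[:, None] + s * np.sqrt(dt) * zs[None, :]
            best = np.maximum(best, np.interp(shifted, xs, u) @ w)
        u = best
    return float(np.interp(0.0, xs, u))

# Third moment of the G-normal: strictly positive, illustrating A1 in Section 1.2.
print(g_normal_expectation(lambda x: x**3, 0.5, 1.0))
```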

This paper gives a more rigorous and systematic construction of these structures and focuses more on their distributional and probabilistic aspects, in order to show the hybrid role of the semi-G-normal between the classical normal and the G-normal. To be specific, we will show that there exists a middle stage of independence sitting between the classical (symmetric) independence and the G-version (asymmetric) independence. It is called semi-sequential independence; it allows the connection between univariate and multivariate objects (3.22), and it is a symmetric relation between two semi-G-normal objects (3.19).

Moreover, we will provide a series of representations, in a form similar to (1.1), associated with the semi-G-normal distributions and also with random vectors having semi-G-normal marginals under various kinds of independence. Interestingly, by changing the order of the independence, we are equivalently modifying the graphical structure in the representation of the sublinear expectation of the joint vector. This idea will be shown in Section 3.7 and further studied in Section 5.3. These representations provide a more direct view on the order of independence in this framework, because we can see how the family of distributions changes when the order is switched. Under this view, we can provide a statistical interpretation of the asymmetry of sequential independence between two semi-G-normal objects (Section 4.3).

Throughout this paper, we will frequently mention the representations of the distributions in the G-framework. These representation results are crucial here, because the right-hand side of each representation is simply a family of classical models whose envelope is exactly the sublinear expectation of the corresponding G-version object. Through an intuitive representation, a person who is familiar with classical probability and statistics is able to understand the uncertainty described by the G-version objects.

The remaining content of this paper is organized as follows. Section 2 gives the basic setup of the G-expectation framework so that readers can check the rigorous definition of each concept. Section 3 presents our main results by putting readers in the context of a classical state-space volatility model. This kind of story setup is especially helpful in the discussion of the representations associated with the semi-G-normal in Section 3.7. After we go through these representation results, readers will find that we have already provided the answers to the four questions along the way. These answers will be given and elaborated in Section 4. Finally, Section 5 summarizes the whole paper and provides possible extensions as future developments. The proofs of our theoretical results are deferred to Section 6, unless a proof is beneficial to the current discussion or is short enough to be included in the main text.

2 Basic settings of the G-expectation framework

This section gives a detailed description of the basic setup (the sublinear expectation space) of the G-expectation framework for a general audience, starting from a set of probability measures (more rigorous treatments can be found in Chapter 6 of the book by Peng (2019b)). Another equivalent way is to start from a space of random variables and a sublinear operator (more details can be found in Chapters 1 and 2 of Peng (2019b)).

For readers who may not be familiar with this setup, the following reading order is recommended:

  1. 1.

    Take a glance at the initial setup and the meaning of notations in this section (especially the notation for independence which is in 2.6);

  2. 2.

    Read through our main results (Section 3) which describe the GG-version distributions mostly using the representations in terms of classical objects;

  3. 3.

    Come back to this section to check the detailed definitions (such as the connection between GG-expectation with the solutions to a class of fully nonlinear partial differential equations).

2.1 Distributions, independence and limiting results

Let \mathcal{P}=\{\mathbb{P}_{\theta},\theta\in\Theta\} denote a set of probability measures on a measurable space (\Omega,\mathcal{F}). Let \mathbb{E}_{\theta} denote the linear expectation under \mathbb{P}_{\theta}. Consider the following spaces:

  • L^{0}(\Omega): the space of all \mathcal{F}-measurable real-valued functions (that is, random variables X:\Omega\to\mathbb{R});

  • \mathcal{H}^{*}\coloneqq\{X\in L^{0}(\Omega):\mathbb{E}_{\theta}[X]\text{ exists for each }\theta\in\Theta\};

  • \mathcal{H}_{p}\coloneqq\{X\in L^{0}(\Omega):\sup_{\theta\in\Theta}\mathbb{E}_{\theta}[\lvert X\rvert^{p}]<\infty\} (for p>0);

  • \mathcal{N}_{p}\coloneqq\{X\in L^{0}(\Omega):\sup_{\theta\in\Theta}\mathbb{E}_{\theta}[\lvert X\rvert^{p}]=0\} (for p>0);

  • \mathcal{N}\coloneqq\{X\in L^{0}(\Omega):\mathbb{P}_{\theta}(X=0)=1\text{ for each }\theta\in\Theta\}.

Note that for any 1\leq p\leq q<\infty,

\mathcal{H}_{q}\subset\mathcal{H}_{p}\subset\mathcal{H}^{*}\subset L^{0}(\Omega).

We also have, for any p>0,

\mathcal{N}=\mathcal{N}_{p}.
Definition 2.1.

(The upper expectation associated with \mathcal{P}) For any X\in\mathcal{H}^{*}, we define a functional \mathcal{E}:\mathcal{H}^{*}\to[-\infty,\infty] associated with the family \mathcal{P} as

\mathcal{E}[X]=\mathcal{E}^{\mathcal{P}}[X]\coloneqq\sup_{\theta\in\Theta}\mathbb{E}_{\theta}[X],

where [-\infty,\infty] is the extended real line. We also follow the convention that, if \mathbb{E}_{\theta}[X] exists but is \infty for some \theta, the supremum is taken to be \infty.

Definition 2.2 (The upper and lower probability).

For any A\in\mathcal{F}, let

\mathbf{V}(A)\coloneqq\sup_{\mathbb{P}\in\mathcal{P}}\mathbb{P}(A),\text{ and }\mathbf{v}(A)\coloneqq\inf_{\mathbb{P}\in\mathcal{P}}\mathbb{P}(A).

The set functions \mathbf{v} and \mathbf{V} are called, respectively, the lower and upper probabilities associated with \mathcal{P}.

Proposition 2.3.

The space \mathcal{H}^{*} satisfies:

  1. (1)

    c\in\mathcal{H}^{*} for any constant c\in\mathbb{R};

  2. (2)

    If X\in\mathcal{H}^{*}, then \lvert X\rvert\in\mathcal{H}^{*};

  3. (3)

    If A\in\mathcal{F}, then \mathds{1}_{A}\in\mathcal{H}^{*};

  4. (4)

    If X\in L^{0}(\Omega) satisfies \mathbb{P}_{\theta}(X\geq 0)=1 for any \theta\in\Theta, then X\in\mathcal{H}^{*};

  5. (5)

    If X\in L^{0}(\Omega) satisfies \mathbb{P}_{\theta}(X\leq 0)=1 for any \theta\in\Theta, then X\in\mathcal{H}^{*};

  6. (6)

    For any X\in L^{0}(\Omega), \lvert X\rvert^{k}\in\mathcal{H}^{*} for k>0.

Proof.

It is easy to check the first three properties. For (4), note that for each \mathbb{P}\in\mathcal{P}, if X\geq 0 \mathbb{P}-almost surely, then \mathbb{E}_{\mathbb{P}}[X] exists (it may equal +\infty). A similar argument applies to (5). Property (6) is a direct consequence of (4). ∎

Remark 2.3.1.

However, \mathcal{H}^{*} is not necessarily a linear space. For instance, let \mathcal{P}=\{Q\} and let X be a Cauchy distributed random variable under Q. Then X^{+} and X^{-} belong to \mathcal{H}^{*}, but X=X^{+}-X^{-}\notin\mathcal{H}^{*}.

By 2.3, for any X\in L^{0}(\Omega), \mathcal{E}[\lvert X\rvert^{p}] is well-defined for any p>0. Then we can write \mathcal{H}_{p} as

\mathcal{H}_{p}=\{X\in L^{0}(\Omega):\mathcal{E}[\lvert X\rvert^{p}]<\infty\}.

We will mainly focus on the space \mathcal{H}_{1}.

Proposition 2.4.

The space \mathcal{H}_{1} is a linear space satisfying:

  1. (1)

    c\in\mathcal{H}_{1} for any constant c\in\mathbb{R};

  2. (2)

    If X\in\mathcal{H}_{1}, then cX\in\mathcal{H}_{1} for any constant c\in\mathbb{R};

  3. (3)

    If X,Y\in\mathcal{H}_{1}, then X+Y\in\mathcal{H}_{1};

  4. (4)

    If X\in\mathcal{H}_{1}, then \lvert X\rvert\in\mathcal{H}_{1};

  5. (5)

    If A\in\mathcal{F}, then \mathds{1}_{A}\in\mathcal{H}_{1};

  6. (6)

    If X\in\mathcal{H}_{1}, then \varphi(X)\in\mathcal{H}_{1} for any bounded Borel measurable function \varphi.

Proof.

The properties here can be checked from the definition of \mathcal{H}_{1}. For instance, (3) comes from the inequality \mathbb{E}_{\theta}[\lvert X+Y\rvert]\leq\mathbb{E}_{\theta}[\lvert X\rvert]+\mathbb{E}_{\theta}[\lvert Y\rvert] for any \theta\in\Theta. ∎

Then we can check that \mathcal{E} becomes a sublinear operator on the linear space \mathcal{H}_{1}. In other words, \mathcal{E}:\mathcal{H}_{1}\to\mathbb{R} satisfies: for any X,Y\in\mathcal{H}_{1},

  1. 1.

    (Monotonicity) If X\geq Y, then \mathcal{E}[X]\geq\mathcal{E}[Y];

  2. 2.

    (Constant preserving) For any c\in\mathbb{R}, \mathcal{E}[c]=c;

  3. 3.

    (Sub-additivity) \mathcal{E}[X+Y]\leq\mathcal{E}[X]+\mathcal{E}[Y];

  4. 4.

    (Positive homogeneity) For any \lambda\geq 0, \mathcal{E}[\lambda X]=\lambda\mathcal{E}[X].

Then we call \mathcal{E} a sublinear expectation and (\Omega,\mathcal{H}_{1},\mathcal{E}) a sublinear expectation space.

Furthermore, note that \mathcal{N}=\{X\in L^{0}(\Omega):\mathcal{E}[\lvert X\rvert]=0\} is a linear subspace of \mathcal{H}_{1}. We can treat \mathcal{N} as the null space and define the quotient space \mathcal{H}_{1}/\mathcal{N}. For any \{X\}\in\mathcal{H}_{1}/\mathcal{N} with representative X, we can define \mathcal{E}[\{X\}]\coloneqq\mathcal{E}[X], which is still a sublinear expectation. We can check that \mathcal{E} induces a Banach norm \lVert X\rVert_{1}\coloneqq\mathcal{E}[\lvert X\rvert] on \mathcal{H}_{1}/\mathcal{N}. Let \hat{\mathcal{H}}_{1} denote the completion of \mathcal{H}_{1}/\mathcal{N} under \lVert\cdot\rVert_{1}. Since \mathcal{H}_{1}/\mathcal{N} itself is a Banach space, it is equal to its completion \hat{\mathcal{H}}_{1} (Proposition 14 in Denis et al. (2011)). Let \mathcal{H}\coloneqq\hat{\mathcal{H}}_{1}; then we can check that (\Omega,\mathcal{H},\mathcal{E}) still forms a sublinear expectation space.

Rigorously speaking, we also require additional conditions on \mathcal{P}, such as weak compactness, so that \mathcal{E} has the required regularity (Theorem 12 in Denis et al. (2011)). Meanwhile, there exists such a weakly compact family \mathcal{P} for which the typical G-version distributions (the maximal and the G-normal distribution) exist in the space (\Omega,\mathcal{H},\mathcal{E}). More details can be found in Section 2.3 and Section 6.2 of Peng (2019b). The G-expectation is defined after the Brownian motion (the G-Brownian motion) is constructed in this context, but throughout this paper we only touch the G-version distributions and independence, so the expectation \mathcal{E} here is still a special kind of sublinear expectation; we will nevertheless call it the G-expectation to stress the properties that allow the existence of the G-version distributions. Throughout this section, without further notice, we stay in (\Omega,\mathcal{H},\mathcal{E}).

Let \mathcal{H}^{d}\coloneqq\{(X_{1},X_{2},\dotsc,X_{d}):X_{i}\in\mathcal{H},i=1,2,\dotsc,d\}. For any X\in\mathcal{H}^{d}, we will frequently mention a transformation \varphi(X) of X for a function \varphi:\mathbb{R}^{d}\to\mathbb{R}. Consider the following spaces of functions:

  • C_{\mathrm{b.Lip}}(\mathbb{R}^{d}): the linear space of all bounded Lipschitz functions;

  • C_{\mathrm{l.Lip}}(\mathbb{R}^{d}): the linear space of locally Lipschitz functions \varphi satisfying

    \lvert\varphi(x)-\varphi(y)\rvert\leq C_{\varphi}(1+\lvert x\rvert^{k}+\lvert y\rvert^{k})\lvert x-y\rvert,

    for all x,y\in\mathbb{R}^{d}, some positive integer k and some C_{\varphi}>0 depending on \varphi.

We will simply write \varphi\in C_{\mathrm{b.Lip}} or \varphi\in C_{\mathrm{l.Lip}} if the dimension of the domain of \varphi is clear from the context (it can be read off from the dimension of the random objects involved).

Note that \mathcal{H} satisfies: for any \varphi\in C_{\mathrm{b.Lip}}, \varphi(X)\in\mathcal{H} if X\in\mathcal{H}^{d}. However, this property does not necessarily hold for \varphi\in C_{\mathrm{l.Lip}}. Therefore, when we discuss the definitions of distributions and independence in this framework, we use \varphi\in C_{\mathrm{b.Lip}}. Later on, we will mention that this space can be extended to \varphi\in C_{\mathrm{l.Lip}} for a special family of distributions under some additional conditions.

Definition 2.5 (Distributions).

There are several notions related to the G-version distributions:

  1. 1.

    We call X and Y identically distributed, denoted by X\overset{\text{d}}{=}Y, if for any \varphi\in C_{\mathrm{b.Lip}},

    \mathcal{E}[\varphi(X)]=\mathcal{E}[\varphi(Y)].

  2. 2.

    A sequence \{X_{n}\}_{n=1}^{\infty} converges in distribution to X, denoted by X_{n}\overset{\text{d}}{\longrightarrow}X, if for any \varphi\in C_{\mathrm{b.Lip}},

    \lim_{n\to\infty}\mathcal{E}[\varphi(X_{n})]=\mathcal{E}[\varphi(X)].
Definition 2.6 (Independence).

A random variable Y is (sequentially) independent from X, denoted by X\dashrightarrow Y, if for any \varphi\in C_{\mathrm{b.Lip}},

\mathcal{E}[\varphi(X,Y)]=\mathcal{E}[\mathcal{E}[\varphi(x,Y)]_{x=X}].
Remark 2.6.1.

(Intuition of this independence) Since both X and Y are treated as random objects with potential distributional uncertainty, this independence essentially concerns the relation between the distributional uncertainty of X and that of Y. If we put the discussion in a context of sequential data (where the order of the data matters), this kind of independence often arises in scenarios where X is realized before Y and any realization of X has no effect on the distributional uncertainty of Y.

Remark 2.6.2.

(Asymmetry of this independence) One important fact regarding this independence is that it is asymmetric: X\dashrightarrow Y (Y is independent from X) does not necessarily imply Y\dashrightarrow X (X is independent from Y), which will be illustrated by 2.7. This is the reason we also call it sequential independence and use the notation \dashrightarrow to indicate the sequential order of the independence between two random objects.

Remark 2.6.3.

(Connection with the classical independence) Note that this sequential independence becomes the classical independence (which is symmetric) once X and Y have certain classical distributions; in other words, they can be put under a common classical probability space. In this case, \mathcal{E} reduces to a linear expectation \mathbb{E}_{\mathbb{P}}. To give readers a better understanding, suppose without loss of generality that (X,Y) has a classical joint continuous distribution with density function f_{X,Y} and marginal densities f_{X} and f_{Y}. Then, for any applicable \varphi,

\int\varphi(x,y)f_{X,Y}(x,y)\mathop{}\!\mathrm{d}x\mathop{}\!\mathrm{d}y=\mathbb{E}_{\mathbb{P}}[\varphi(X,Y)]=\mathbb{E}_{\mathbb{P}}[\mathbb{E}_{\mathbb{P}}[\varphi(x,Y)]_{x=X}]=\int_{x}\int_{y}\varphi(x,y)f_{X}(x)f_{Y}(y)\mathop{}\!\mathrm{d}y\mathop{}\!\mathrm{d}x.

Therefore, we have f_{X,Y}=f_{X}f_{Y}, which means X and Y are (classically) independent.

Example 2.7 (Example 1.3.15 in Peng, 2019b ).

Consider two identically distributed X,Y\in\mathcal{H} with \mathcal{E}[-X]=\mathcal{E}[X]=0 and \overline{\sigma}^{2}=\mathcal{E}[X^{2}]>-\mathcal{E}[-X^{2}]=\underline{\sigma}^{2}. Also assume \mathcal{E}[\lvert X\rvert]>0 so that \mathcal{E}[X^{+}]=\frac{1}{2}\mathcal{E}[\lvert X\rvert+X]=\frac{1}{2}\mathcal{E}[\lvert X\rvert]>0. Then we have

\mathcal{E}[XY^{2}]=\begin{cases}(\overline{\sigma}^{2}-\underline{\sigma}^{2})\mathcal{E}[X^{+}]&\text{if }X\dashrightarrow Y,\\ 0&\text{if }Y\dashrightarrow X.\end{cases}
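For readers who want to see where the two values come from, here is a sketch of the calculation, using only Definition 2.6, the sublinearity of \mathcal{E} and Proposition 2.25 (note that \mathcal{E}[xY^{2}]=x^{+}\overline{\sigma}^{2}-x^{-}\underline{\sigma}^{2} for a fixed real number x, and \mathcal{E}[Xy^{2}]=y^{2}\mathcal{E}[X]=0 for a fixed real number y):

\text{if }X\dashrightarrow Y:\quad\mathcal{E}[XY^{2}]=\mathcal{E}[\mathcal{E}[xY^{2}]_{x=X}]=\mathcal{E}[X^{+}\overline{\sigma}^{2}-X^{-}\underline{\sigma}^{2}]=\mathcal{E}[\underline{\sigma}^{2}X+(\overline{\sigma}^{2}-\underline{\sigma}^{2})X^{+}]=(\overline{\sigma}^{2}-\underline{\sigma}^{2})\mathcal{E}[X^{+}];

\text{if }Y\dashrightarrow X:\quad\mathcal{E}[XY^{2}]=\mathcal{E}[\mathcal{E}[Xy^{2}]_{y=Y}]=\mathcal{E}[(y^{2}\mathcal{E}[X])_{y=Y}]=0,

where the last equality in the first line uses \mathcal{E}[\underline{\sigma}^{2}X]=-\mathcal{E}[-\underline{\sigma}^{2}X]=0 together with 2.25.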

We will further study the interpretation of the independence (especially its asymmetric property) in Section 4.3 by giving a detailed version of 2.7 with representation theorems.
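Before that, and purely as a numerical preview of the representations developed in Section 3, the asymmetry in 2.7 can also be seen by assuming (for illustration only) that the uncertainty of each variable is captured by a volatility factor ranging over a two-point set multiplying a standard normal, and then evaluating the nested formula of Definition 2.6 with each expectation replaced by a supremum over that finite family.

```python
# Toy numerical check of the asymmetry in Example 2.7 (illustration only): each
# variable is represented as sigma * eps with sigma chosen from {sig_lo, sig_hi},
# a structure formalized as the semi-G-normal in Section 3; the nested formula of
# Definition 2.6 is evaluated with a sup over that two-point volatility family.
import numpy as np

rng = np.random.default_rng(1)
eps = rng.standard_normal(50_000)
sig_lo, sig_hi = 0.5, 1.0
sigmas = (sig_lo, sig_hi)

def nested(phi):
    """E[ E[phi(x, .)]_{x = first variable} ]: the inner sup may react to x."""
    inner = lambda x: max(np.mean(phi(x, s * eps)) for s in sigmas)
    return max(np.mean([inner(x) for x in s * eps[:1000]]) for s in sigmas)

print(nested(lambda x, y: x * y**2))   # X -> Y: clearly positive
print(nested(lambda y, x: x * y**2))   # Y -> X: approximately zero (Monte Carlo noise)
```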

Next we give the notion of independence extended to a sequence of random variables.

Definition 2.8.

(Independence of a Sequence) For a sequence \{X_{i}\}_{i=1}^{n} of random variables, they are (sequentially) independent if

(X_{1},X_{2},\dotsc,X_{i})\dashrightarrow X_{i+1},

for i=1,2,\dotsc,n-1. For notational convenience, the sequential independence of \{X_{i}\}_{i=1}^{n} is denoted by

X_{1}\dashrightarrow X_{2}\dashrightarrow\cdots\dashrightarrow X_{n}. (2.1)

The sequence \{X_{i}\}_{i=1}^{n} is further independent and identically distributed if it is sequentially independent and X_{i+1}\overset{\text{d}}{=}X_{i} for i=1,2,\dotsc,n-1. This property is called (nonlinearly) i.i.d. for short.

Remark 2.8.1.

Note that the independence in 2.1 is stronger than the pairwise relations X_{k}\dashrightarrow X_{k+1} for k=1,2,\dotsc,n-1.

Now we introduce two fundamental G-version distributions: the maximal and the G-normal distributions. The former can be treated as the analogue of a “constant” in the classical sense, and the latter is a generalization of the classical normal. We call \bar{X} an independent copy of X if \bar{X}\overset{\text{d}}{=}X and X\dashrightarrow\bar{X}.

We first introduce the G-distribution, which is the joint vector of these two fundamental distributions.

Let \mathbb{S}(d) denote the collection of all d\times d symmetric matrices.

Proposition 2.9.

Let G:\mathbb{R}^{d}\times\mathbb{S}(d)\to\mathbb{R} be a function satisfying: for each p,\bar{p}\in\mathbb{R}^{d} and A,\bar{A}\in\mathbb{S}(d),

\begin{cases}G(p+\bar{p},A+\bar{A})\leq G(p,A)+G(\bar{p},\bar{A}),\\ G(\lambda p,\lambda A)=\lambda G(p,A)\text{ for any }\lambda\geq 0,\\ G(p,A)\leq G(p,\bar{A})\text{ if }A\leq\bar{A}.\end{cases} (2.2)

Then there exists a pair (X,\eta) on some sublinear expectation space (\Omega,\mathcal{H},\mathcal{E}) such that

G(p,A)=\mathcal{E}[\tfrac{1}{2}\langle AX,X\rangle+\langle p,\eta\rangle], (2.3)

and for any a,b\geq 0,

(aX+b\bar{X},a^{2}\eta+b^{2}\bar{\eta})\overset{\text{d}}{=}(\sqrt{a^{2}+b^{2}}X,(a^{2}+b^{2})\eta), (2.4)

where (\bar{X},\bar{\eta}) is an independent copy of (X,\eta).

Remark 2.9.1.

The relation 2.4 is equivalent to (X+\bar{X},\eta+\bar{\eta})\overset{\text{d}}{=}(\sqrt{2}X,2\eta).

The proof of 2.9 is available in Section 2.3 of Peng (2019b). Then we have the notion of the G-distribution associated with a function G.

Definition 2.10.

(G-distribution) A pair (X,\eta) satisfying 2.4 is called G-distributed, associated with a function G in the sense of 2.3.

The sublinear expectation of the random vector (X,η)(X,\eta) above can be characterized by the solution to a parabolic partial differential equation.

Proposition 2.11.

Consider a G-distributed random vector (X,\eta) associated with a function G. For any \varphi\in C_{\mathrm{b.Lip}}(\mathbb{R}^{d}\times\mathbb{R}^{d}), let

u(t,x,y)\coloneqq\mathcal{E}[\varphi(x+\sqrt{t}X,y+t\eta)],\;(t,x,y)\in[0,\infty)\times\mathbb{R}^{d}\times\mathbb{R}^{d}.

Then u is the unique (viscosity) solution to the following parabolic partial differential equation (PDE):

\partial_{t}u-G(D_{y}u,D_{x}^{2}u)=0,

with initial condition u|_{t=0}=\varphi, where D_{x}^{2}u\coloneqq(\partial_{x_{i}x_{j}}^{2}u)_{i,j=1}^{d} and D_{y}u\coloneqq(\partial_{y_{i}}u)_{i=1}^{d}. This PDE is called a G-equation.

Remark 2.11.1.

Readers may turn to Crandall et al. (1992) for more details on the notion of viscosity solutions. In this paper, we do not require readers to have knowledge of viscosity solutions. Moreover, the solution can be treated as a classical one when the function G satisfies the strong ellipticity condition.

Next we provide a useful established property of the G-distributed random vector (X,\eta). Suppose \lvert\eta\rvert,\lvert X\rvert^{2}\in\mathcal{H} and the following uniform integrability conditions are satisfied (proposed by Zhang (2016)):

\lim_{\lambda\to\infty}\mathcal{E}[(\lvert\eta\rvert-\lambda)^{+}]=0, (2.5)

and

\lim_{\lambda\to\infty}\mathcal{E}[(\lvert X\rvert^{2}-\lambda)^{+}]=0. (2.6)

Then for any \varphi\in C_{\mathrm{l.Lip}} (which is larger than C_{\mathrm{b.Lip}}), we still have \varphi(\eta,X)\in\mathcal{H} (which is a Banach space). (This result is provided in Section 2.5 of Peng (2019b).) Therefore, in the following, when we talk about \varphi(\eta,X) for a G-distributed random vector (\eta,X), we may take \varphi\in C_{\mathrm{l.Lip}}.

If we pay attention to each marginal part in 2.4, we can see that X is similar to a classical normal random variable while \eta behaves like a constant (we do not consider the Cauchy distribution here because we assume the existence of the expectation). It turns out that X follows a G-normal distribution and \eta follows a maximal distribution.

Definition 2.12 (Maximal distribution).

A d-dimensional random vector \eta follows a maximal distribution if, for any independent copy \bar{\eta}, we have

\eta+\bar{\eta}\overset{\text{d}}{=}2\eta.

Another equivalent and more explicit definition is that \eta follows the maximal distribution \mathcal{M}(\Gamma) if there exists a bounded, closed and convex subset \Gamma\subset\mathbb{R}^{d} such that, for any \varphi\in C_{\mathrm{l.Lip}},

\mathcal{E}[\varphi(\eta)]=\max_{y\in\Gamma}\varphi(y).
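For instance, in dimension one with \Gamma=[\underline{\mu},\overline{\mu}], the definition directly gives \mathcal{E}[\eta]=\overline{\mu}, -\mathcal{E}[-\eta]=\underline{\mu} and \mathcal{E}[\eta^{2}]=\max(\underline{\mu}^{2},\overline{\mu}^{2}); that is, \eta behaves like an unknown constant that is only known to lie in [\underline{\mu},\overline{\mu}], which is the sense in which the maximal distribution is the analogue of a classical constant.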
Definition 2.13 (G-normal distribution).

A d-dimensional random vector X follows a G-normal distribution if, for any independent copy \bar{X}, we have

X+\bar{X}\overset{\text{d}}{=}\sqrt{2}X.

When d=1, we write X\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) (0\leq\underline{\sigma}\leq\overline{\sigma}) with variance uncertainty: \underline{\sigma}^{2}\coloneqq-\mathcal{E}[-X^{2}] and \overline{\sigma}^{2}\coloneqq\mathcal{E}[X^{2}].

Proposition 2.14 (G-normal distribution characterized by the G-heat equation).

A random vector X follows the d-dimensional G-normal distribution if and only if v(t,x)\coloneqq\mathcal{E}[\varphi(x+\sqrt{t}X)] is the solution to the G-heat equation defined on (t,x)\in[0,1]\times\mathbb{R}^{d}:

v_{t}-G(D_{x}^{2}v)=0,\,v|_{t=0}=\varphi, (2.7)

where G(\mathbf{A})\coloneqq\frac{1}{2}\mathcal{E}[\langle\mathbf{A}X,X\rangle]:\mathbb{S}(d)\to\mathbb{R} is a sublinear function characterizing the distribution of X. For d=1, we have G(a)=\frac{1}{2}(\overline{\sigma}^{2}a^{+}-\underline{\sigma}^{2}a^{-}), and when \underline{\sigma}^{2}>0, 2.7 is also called the Black-Scholes-Barenblatt equation with volatility uncertainty.
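To make this characterization concrete, here is a minimal numerical sketch (ours; the domain, grid sizes and test functions are arbitrary illustrative choices): an explicit finite-difference scheme for the one-dimensional G-heat equation, which recovers \mathcal{E}[\varphi(X)]\approx v(1,0) for X\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]).

```python
# Minimal sketch (illustration only): explicit finite differences for the 1-d
# G-heat equation v_t = G(v_xx), G(a) = (sig_hi^2 * a^+ - sig_lo^2 * a^-) / 2,
# with v(0, .) = phi, so that E[phi(X)] ~ v(1, 0) for X ~ N(0, [sig_lo^2, sig_hi^2]).
import numpy as np

def g_heat_value(phi, sig_lo, sig_hi, L=8.0, nx=801, T=1.0):
    xs = np.linspace(-L, L, nx)
    dx = xs[1] - xs[0]
    dt = 0.4 * dx**2 / sig_hi**2                 # explicit-scheme stability constraint
    v, t = phi(xs), 0.0
    while t < T:
        h = min(dt, T - t)
        vxx = np.zeros_like(v)
        vxx[1:-1] = (v[2:] - 2 * v[1:-1] + v[:-2]) / dx**2
        G = 0.5 * (sig_hi**2 * np.maximum(vxx, 0.0) - sig_lo**2 * np.maximum(-vxx, 0.0))
        v = v + h * G                            # one forward Euler step of v_t = G(v_xx)
        t += h
    return float(np.interp(0.0, xs, v))

print(g_heat_value(lambda x: x**3, 0.5, 1.0))                        # positive: odd moments carry uncertainty
print(g_heat_value(lambda x: np.maximum(x - 1.0, 0.0), 0.5, 1.0))    # convex phi: classical value at sigma = sig_hi (cf. 2.15 below)
```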

Remark 2.14.1.

For d=1, when \underline{\sigma}=\overline{\sigma}=\sigma, the G-normal distribution X\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) can be treated as a classical normal N(0,\sigma^{2}) because the G-heat equation reduces to the classical heat equation.

Remark 2.14.2.

(Covariance uncertainty) We can use the function G(\mathbf{A})\coloneqq\frac{1}{2}\mathcal{E}[\langle\mathbf{A}X,X\rangle] to characterize the G-normal distribution. In fact, G(\mathbf{A}) can be further expressed as

G(\mathbf{A})=\frac{1}{2}\sup_{\mathbf{\Sigma}\in\mathcal{C}}\operatorname{tr}[\mathbf{A}\mathbf{\Sigma}],

where \mathcal{C}=\{\mathbf{B}\mathbf{B}^{T}:\mathbf{B}\in\mathbb{S}(d)\} is a collection of non-negative definite symmetric matrices which can be treated as the uncertainty set of covariance matrices. In this sense, we can write X\sim\mathcal{N}(\bm{0},\mathcal{C}).

Proposition 2.15.

Consider X\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) and a classically distributed random variable \epsilon\sim N(0,1). For any \varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}), we have

\mathcal{E}[\varphi(X)]=\begin{cases}\mathbb{E}[\varphi(\overline{\sigma}\epsilon)]&\text{if }\varphi\text{ is convex},\\ \mathbb{E}[\varphi(\underline{\sigma}\epsilon)]&\text{if }\varphi\text{ is concave}.\end{cases}
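For instance, \varphi(x)=x^{4} is convex and \varphi(x)=-x^{4} is concave, so 2.15 gives \mathcal{E}[X^{4}]=\mathbb{E}[(\overline{\sigma}\epsilon)^{4}]=3\overline{\sigma}^{4} and -\mathcal{E}[-X^{4}]=3\underline{\sigma}^{4}. In contrast, \varphi(x)=x^{3} is neither convex nor concave, so the formula does not apply; this is consistent with A1 in Section 1.2, since every member of \{N(0,\sigma^{2}),\sigma\in[\underline{\sigma},\overline{\sigma}]\} has zero third moment while the G-normal has third-moment uncertainty.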
Theorem 2.16 (Law of Large Numbers).

Consider a sequence of (nonlinearly) i.i.d. random variables \{Z_{i}\}_{i=1}^{\infty} satisfying

\lim_{\lambda\to\infty}\mathcal{E}[(\lvert Z_{1}\rvert-\lambda)^{+}]=0. (2.8)

Then for any continuous function \varphi satisfying the linear growth condition \lvert\varphi(x)\rvert\leq C(1+\lvert x\rvert), we have

\lim_{n\to\infty}\mathcal{E}[\varphi(\tfrac{1}{n}\sum_{i=1}^{n}Z_{i})]=\max_{v\in\Gamma}\varphi(v),

where \Gamma is the bounded, closed and convex subset determined by

\max_{v\in\Gamma}\langle p,v\rangle=\mathcal{E}[\langle p,Z_{1}\rangle],\;p\in\mathbb{R}^{d}.

For d=1, let \underline{\mu}\coloneqq-\mathcal{E}[-Z_{1}] and \overline{\mu}\coloneqq\mathcal{E}[Z_{1}]. Then \frac{1}{n}\sum_{i=1}^{n}Z_{i}\overset{\text{d}}{\longrightarrow}\mathcal{M}[\underline{\mu},\overline{\mu}], that is,

\lim_{n\to\infty}\mathcal{E}[\varphi(\tfrac{1}{n}\sum_{i=1}^{n}Z_{i})]=\mathcal{E}[\varphi(\mathcal{M}[\underline{\mu},\overline{\mu}])]=\max_{\underline{\mu}\leq v\leq\overline{\mu}}\varphi(v).
Theorem 2.17 (Central Limit Theorem).

Consider a sequence of (nonlinearly) i.i.d. random variables \{X_{i}\}_{i=1}^{\infty} satisfying the mean-certainty condition \mathcal{E}[X_{1}]=-\mathcal{E}[-X_{1}]=\bm{0} and

\lim_{\lambda\to\infty}\mathcal{E}[(\lvert X_{1}\rvert^{2}-\lambda)^{+}]=0. (2.9)

Then for any continuous function \varphi satisfying the linear growth condition \lvert\varphi(x)\rvert\leq C(1+\lvert x\rvert),

\lim_{n\to\infty}\mathcal{E}[\varphi(\tfrac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i})]=\mathcal{E}[\varphi(X)],

where X is a G-normally distributed random vector characterized by the sublinear function G defined as

G(A)\coloneqq\mathcal{E}[\tfrac{1}{2}\langle AX_{1},X_{1}\rangle],\;A\in\mathbb{S}(d).

For d=1, let \underline{\sigma}^{2}\coloneqq-\mathcal{E}[-X_{1}^{2}] and \overline{\sigma}^{2}\coloneqq\mathcal{E}[X_{1}^{2}]. Then \frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i} converges in distribution to X\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]).

Proposition 2.18.

Consider a sequence \{Y_{n}\}_{n=1}^{\infty} and a random variable Y satisfying

\sup_{n}\mathcal{E}[\lvert Y_{n}\rvert^{p}]+\mathcal{E}[\lvert Y\rvert^{p}]<\infty,

for any p\geq 1. If the convergence \lim_{n\to\infty}\mathcal{E}[\varphi(Y_{n})]=\mathcal{E}[\varphi(Y)] holds for any \varphi\in C_{\mathrm{b.Lip}}, then it also holds for any \varphi\in C_{\mathrm{l.Lip}}.

Remark 2.18.1.

2.18 is a direct consequence of Lemma 2.4.12 in Peng (2019b). It is useful when we need to extend the function space for \varphi while discussing convergence in distribution.

2.2 Basic results on independence of sequence

We prepare several basic results on sequential independence between random vectors in the G-framework:

  • 2.19 gives a general result showing that sequential independence between two random vectors implies the independence between their sub-vectors.

  • 2.20 shows that the sequential independence of a sequence implies the independence of any sub-sequence.

  • 2.23 shows that, under the sequential independence of a sequence, any two non-overlapping sub-vectors are sequentially independent (as long as the original order is kept).

These results are useful for the discussions in Section 3.6. We provide the proofs for the convenience of general readers and to help them better understand how to deal with the sequential independence \dashrightarrow.

Proposition 2.19.

For any subsequences \{i_{p}\}_{p=1}^{k} and \{j_{q}\}_{q=1}^{l} satisfying 1\leq i_{1}<i_{2}<\dotsc<i_{k}\leq n and 1\leq j_{1}<j_{2}<\dotsc<j_{l}\leq m, we have the general result that

(X_{1},X_{2},\dotsc,X_{n})\dashrightarrow(Y_{1},Y_{2},\dotsc,Y_{m})\implies(X_{i_{1}},X_{i_{2}},\dotsc,X_{i_{k}})\dashrightarrow(Y_{j_{1}},Y_{j_{2}},\dotsc,Y_{j_{l}}).
Proof.

For any applicable test function φCl.Lip(k+l)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{k+l}), define another function ψCl.Lip(n+m)\psi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{n+m}) on a larger space by

ψ(x1,x2,,xn,y1,y2,,ym)φ(xi1,xi2,,xik,yj1,yj2,,yjl),\psi(x_{1},x_{2},\dotsc,x_{n},y_{1},y_{2},\dotsc,y_{m})\coloneqq\varphi(x_{i_{1}},x_{i_{2}},\dotsc,x_{i_{k}},y_{j_{1}},y_{j_{2}},\dotsc,y_{j_{l}}),

then

[φ(Xi1,Xi2,,Xik,Yj1,Yj2,,Yjl)]\displaystyle\hphantom{=}\mathcal{E}[\varphi(X_{i_{1}},X_{i_{2}},\dotsc,X_{i_{k}},Y_{j_{1}},Y_{j_{2}},\dotsc,Y_{j_{l}})]
=[ψ(X1,X2,,Xn,Y1,Y2,,Ym)]\displaystyle=\mathcal{E}[\psi(X_{1},X_{2},\dotsc,X_{n},Y_{1},Y_{2},\dotsc,Y_{m})]
=[[ψ(x1,x2,,xn,Y1,Y2,,Ym)]xi=Xi,i=1,n]\displaystyle=\mathcal{E}[\mathcal{E}[\psi(x_{1},x_{2},\dotsc,x_{n},Y_{1},Y_{2},\dotsc,Y_{m})]_{x_{i}=X_{i},i=1,\dotsc n}]
=[[φ(xi1,xi2,,xik,Yj1,Yj2,,Yjl)]xip=Xip,p=1,,k].\displaystyle=\mathcal{E}[\mathcal{E}[\varphi(x_{i_{1}},x_{i_{2}},\dotsc,x_{i_{k}},Y_{j_{1}},Y_{j_{2}},\dotsc,Y_{j_{l}})]_{x_{i_{p}}=X_{i_{p}},p=1,\dotsc,k}].\qed
Proposition 2.20.

For any subsequence \{i_{p}\}_{p=1}^{k} satisfying 1\leq i_{1}<i_{2}<\dotsc<i_{k}\leq n, we have

X_{1}\dashrightarrow X_{2}\dashrightarrow\dotsc\dashrightarrow X_{n}\implies X_{i_{1}}\dashrightarrow X_{i_{2}}\dashrightarrow\dotsc\dashrightarrow X_{i_{k}}.
Proof.

It is equivalent to prove (Xi1,Xi2,,Xij1)Xij(X_{i_{1}},X_{i_{2}},\dotsc,X_{i_{j-1}})\dashrightarrow X_{i_{j}} for any j=2,,kj=2,\dotsc,k. For any j=2,,kj=2,\dotsc,k, by the definition of independence of the full sequence {Xi}i=1n\{X_{i}\}_{i=1}^{n}, we have

(X1,X2,,Xij1,,Xij1)Xij.(X_{1},X_{2},\dotsc,X_{i_{j-1}},\dotsc,X_{i_{j}-1})\dashrightarrow X_{i_{j}}.

From 2.19, we directly have the sequential independence for the subvectors:

(Xi1,Xi2,,Xij1)Xij.(X_{i_{1}},X_{i_{2}},\dotsc,X_{i_{j-1}})\dashrightarrow X_{i_{j}}.\qed

The following 2.21 and 2.22 will be useful in our later discussion, where the dimension of the three objects X,Y,ZX,Y,Z could be arbitrary finite number.

Lemma 2.21.

If X\dashrightarrow Y\dashrightarrow Z, then X\dashrightarrow(Y,Z).

Proof.

Let H(x,y)[φ(x,y,Z)].H(x,y)\coloneqq\mathcal{E}[\varphi(x,y,Z)]. Then we can check

[[φ(x,Y,Z)]x=X]\displaystyle\mathcal{E}[\mathcal{E}[\varphi(x,Y,Z)]_{x=X}] =(1)[[[φ(x,y,Z)]y=Y]x=X]\displaystyle\overset{(1)}{=}\mathcal{E}[\mathcal{E}[\mathcal{E}[\varphi(x,y,Z)]_{y=Y}]_{x=X}]
=[[H(x,Y)]x=X]\displaystyle=\mathcal{E}[\mathcal{E}[H(x,Y)]_{x=X}]
=(2)[H(X,Y)]\displaystyle\overset{(2)}{=}\mathcal{E}[H(X,Y)]
=[[φ(x,y,Z)]x=X,y=Y]\displaystyle=\mathcal{E}[\mathcal{E}[\varphi(x,y,Z)]_{x=X,y=Y}]
=(3)[φ(X,Y,Z)],\displaystyle\overset{(3)}{=}\mathcal{E}[\varphi(X,Y,Z)],

where (1) is due to YZY\dashrightarrow Z, (2) comes from XYX\dashrightarrow Y and (3) comes from (X,Y)Z(X,Y)\dashrightarrow Z. ∎

Lemma 2.22.

If X\dashrightarrow(Y,Z) and Y\dashrightarrow Z, then (X,Y)\dashrightarrow Z.

Proof.

Let H(x,y)[φ(x,y,Z)].H(x,y)\coloneqq\mathcal{E}[\varphi(x,y,Z)]. Then

[[φ(x,y,Z)]x=X,y=Y]\displaystyle\mathcal{E}[\mathcal{E}[\varphi(x,y,Z)]_{x=X,y=Y}] =[H(X,Y)]\displaystyle=\mathcal{E}[H(X,Y)]
=(1)[[H(x,Y)]x=X]\displaystyle\overset{(1)}{=}\mathcal{E}[\mathcal{E}[H(x,Y)]_{x=X}]
=[[[φ(x,y,Z)]y=Y]x=X]\displaystyle=\mathcal{E}[\mathcal{E}[\mathcal{E}[\varphi(x,y,Z)]_{y=Y}]_{x=X}]
=(2)[[φ(x,Y,Z)]x=X]\displaystyle\overset{(2)}{=}\mathcal{E}[\mathcal{E}[\varphi(x,Y,Z)]_{x=X}]
=(3)[φ(X,Y,Z)],\displaystyle\overset{(3)}{=}\mathcal{E}[\varphi(X,Y,Z)],

where (1) comes from XYX\dashrightarrow Y, (2) comes from YZY\dashrightarrow Z and (3) comes from X(Y,Z)X\dashrightarrow(Y,Z). ∎

Proposition 2.23.

If X_{1}\dashrightarrow X_{2}\dashrightarrow\cdots\dashrightarrow X_{n}, then for any increasing subsequence \{i_{j}\}_{j=1}^{k}\subset\{1,2,\dotsc,n\} and any l=1,2,\dotsc,k-1, we have

(X_{i_{1}},\dotsc,X_{i_{l}})\dashrightarrow(X_{i_{l+1}},\dotsc,X_{i_{k}}).
Proof.

Let YjXij.Y_{j}\coloneqq X_{i_{j}}. Then we have Y1Y2YkY_{1}\dashrightarrow Y_{2}\dashrightarrow\cdots\dashrightarrow Y_{k}. Our goal is to show for any l=1,2,,k1l=1,2,\dotsc,k-1,

(Y1,,Yl)(Yl+1,,Yk).(Y_{1},\dotsc,Y_{l})\dashrightarrow(Y_{l+1},\dotsc,Y_{k}). (2.10)

Then we can proceed by math induction. Let m=klm=k-l. The result 2.10 holds when m=1m=1 because we directly have

(Y1,,Yk1)Yk,(Y_{1},\dotsc,Y_{k-1})\dashrightarrow Y_{k},

by the definition of Y1Y2YkY_{1}\dashrightarrow Y_{2}\dashrightarrow\cdots\dashrightarrow Y_{k}. Suppose 2.10 holds for m=jm=j. We need to show the case with m=j+1m=j+1:

(Y1,,Ykj1)(Ykj,Ykj+1,Yk).(Y_{1},\dotsc,Y_{k-j-1})\dashrightarrow(Y_{k-j},Y_{k-j+1}\dotsc,Y_{k}). (2.11)

Let

A1\displaystyle A_{1} (Y1,,Ykj1),\displaystyle\coloneqq(Y_{1},\dotsc,Y_{k-j-1}),
A2\displaystyle A_{2} Ykj,\displaystyle\coloneqq Y_{k-j},
A3\displaystyle A_{3} (Ykj+1,,Yk).\displaystyle\coloneqq(Y_{k-j+1},\dotsc,Y_{k}).

Then we have A1A2A_{1}\dashrightarrow A_{2} by the definition of Y1Y2Ykj.Y_{1}\dashrightarrow Y_{2}\dashrightarrow\cdots\dashrightarrow Y_{k-j}. We also have (A1,A2)A3(A_{1},A_{2})\dashrightarrow A_{3} by the result for m=jm=j. Then we can follow the same logic of 2.21 to show

A1(A2,A3),A_{1}\dashrightarrow(A_{2},A_{3}),

which is exactly 2.11. The proof is finished by math induction. ∎

Proposition 2.24.

The following two statements are equivalent:

  1. (1)

    X_{1}\dashrightarrow X_{2}\dashrightarrow X_{3}\dashrightarrow X_{4};

  2. (2)

    (X_{1},X_{2})\dashrightarrow(X_{3},X_{4}), X_{1}\dashrightarrow X_{2} and X_{3}\dashrightarrow X_{4}.

Proof.

Since (1) implies (2), so we only need to show the other direction. By the definition of (1), we simply need to check:

(X1,X2,X3)X4.(X_{1},X_{2},X_{3})\dashrightarrow X_{4}.

This is a direct consequence of 2.22 by letting X(X1,X2)X^{*}\coloneqq(X_{1},X_{2}), YX3Y^{*}\coloneqq X_{3} and ZX4Z^{*}\coloneqq X_{4}. ∎

Proposition 2.25.

For any X,Y\in\mathcal{H}, we have \mathcal{E}[X+Y]=\mathcal{E}[X]+\mathcal{E}[Y] as long as at least one of the following conditions holds:

  1. 1.

    \mathcal{E}[X]=-\mathcal{E}[-X];

  2. 2.

    \mathcal{E}[Y]=-\mathcal{E}[-Y];

  3. 3.

    Y is independent from X: X\dashrightarrow Y;

  4. 4.

    X is independent from Y: Y\dashrightarrow X.

Proof.

We only need to prove the result under Conditions 1 and 3; the other two cases follow by the same arguments. Under Condition 1, we have

\mathcal{E}[X+Y]\leq\mathcal{E}[X]+\mathcal{E}[Y]=\mathcal{E}[Y]-\mathcal{E}[-X]\leq\mathcal{E}[Y-(-X)]=\mathcal{E}[X+Y].

Under Condition 3, we have

\mathcal{E}[X+Y]=\mathcal{E}[\mathcal{E}[x+Y]_{x=X}]=\mathcal{E}[X+\mathcal{E}[Y]]=\mathcal{E}[X]+\mathcal{E}[Y]. ∎

2.26 is an important result that shows the asymmetry of independence between two random objects prevails in this framework except when their distributions are maximal or classical ones.

Theorem 2.26 (Hu and Li, (2014)).

For two non-constant random variables X,Y\in\mathcal{H}, if X and Y are mutually independent (X\dashrightarrow Y and Y\dashrightarrow X), then they belong to either of the following two cases:

  1. 1.

    The distributions of X and Y are classical (no distributional uncertainty);

  2. 2.

    Both X and Y are maximally distributed.

We can also easily obtain the following result.

Proposition 2.27.

For two non-constant random variables X,Y\in\mathcal{H}, if they belong to either of the two cases in 2.26, then X\dashrightarrow Y implies Y\dashrightarrow X.

Proof.

When X,YX,Y are classically distributed, the results can be derived from 2.6.3. When they are maximally distributed, this result has been studied in Example 14 in Hu and Li, (2014) and it has been generalized to 3.6 whose proof is in Section 6.1. We sketch the proof here to show the intuition for general readers. Suppose X(K1)X\sim\mathcal{M}(K_{1}) and Y(K2)Y\sim\mathcal{M}(K_{2}) where K1K_{1} and K2K_{2} are two bounded, closed and convex sets. If XYX\dashrightarrow Y, for any φCl.Lip(2)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{2}), we can work on the expectation of (X,Y)(X,Y) to show the other direction of independence,

[φ(X,Y)]\displaystyle\mathcal{E}[\varphi(X,Y)] =[[φ(x,Y)]x=X]=[(maxyK2φ(x,y))x=X]\displaystyle=\mathcal{E}[\mathcal{E}[\varphi(x,Y)]_{x=X}]=\mathcal{E}[(\max_{y\in K_{2}}\varphi(x,y))_{x=X}]
=maxxK1maxyK2φ(x,y)=max(x,y)K1×K2φ(x,y)\displaystyle=\max_{x\in K_{1}}\max_{y\in K_{2}}\varphi(x,y)=\max_{(x,y)\in K_{1}\times K_{2}}\varphi(x,y)
=maxyK2maxxK1φ(x,y)=[[φ(X,y)]y=Y],\displaystyle=\max_{y\in K_{2}}\max_{x\in K_{1}}\varphi(x,y)=\mathcal{E}[\mathcal{E}[\varphi(X,y)]_{y=Y}],

where we have used the fact that φy(x)maxyK2φ(x,y)Cl.Lip()\varphi^{*}_{y}(x)\coloneqq\max_{y\in K_{2}}\varphi(x,y)\in C_{\mathrm{l.Lip}}(\mathbb{R}) if φCl.Lip(2)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{2}) (to apply the representation), which is validated by 6.1 in the proof of 3.6. Hence, we have YXY\dashrightarrow X. ∎

3 Our main results: semi-G-normal and its representations

This section serves two objectives. On the one hand, we introduce a new substructure called the semi-G-normal distribution and explain its hybrid property and its intermediate role between the classical normal and the G-normal distribution. On the other hand, this section is also designed to give general readers a gentle trip towards the G-normal distribution, starting from our old friend, the classical normal distribution.

Most of the theoretical results presented in this section are stated in the sublinear expectation space (Ω,,)(\Omega,\mathcal{H},\mathcal{E}) by default unless indicated otherwise in the context; nonetheless, we will introduce most of the subsections by starting from a discussion on the distributional uncertainty of a random object in a classical state-space volatility model, whose context will be set up in Section 3.1.

Unless otherwise noted, the following notation will be used consistently throughout this paper:

  • +\mathbb{N}_{+}: the set of all positive integers.

  • 𝒙(n)(x1,x2,,xn)\bm{x}_{(n)}\coloneqq(x_{1},x_{2},\dotsc,x_{n}) and 𝒙(n)𝒚(n)(x1y1,x2y2,,xnyn)\bm{x}_{(n)}*\bm{y}_{(n)}\coloneqq(x_{1}y_{1},x_{2}y_{2},\dotsc,x_{n}y_{n}).

  • Let 𝐈d\mathbf{I}_{d} denote a d×dd\times d identity matrix.

  • In (Ω,,)(\Omega,\mathcal{F},\mathbb{P}), let 𝔼\mathbb{E}_{\mathbb{P}} denote the linear expectation with respect to \mathbb{P} and we may write it as 𝔼\mathbb{E} if the underlying \mathbb{P} is clear from the context.

  • Random variables in (Ω,,)(\Omega,\mathcal{H},\mathcal{E}): V[σ¯,σ¯]V\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]}, ϵ𝒩(0,[1,1])\epsilon\sim\mathcal{N}(0,[1,1]), WVϵW\coloneqq V\epsilon, WG𝒩(0,[σ¯2,σ¯2])W^{G}\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]).

  • Random variables in (Ω,,)(\Omega,\mathcal{F},\mathbb{P}): σ:Ω[σ¯,σ¯]\sigma:\Omega\to{[\underline{\sigma},\overline{\sigma}]}, ϵN(0,1)\epsilon\sim N(0,1), YσϵY\coloneqq\sigma\epsilon. Note that we can treat ϵ\epsilon as a random object in both sublinear and classical system due to 2.14.1.

The reason we use two different sets of random variables in two spaces is mainly for simplicity of our discussion, which will be further explained in 3.2.2.

3.1 Setup of a story in a classical filtered probability space

In (Ω,,)(\Omega,\mathcal{F},\mathbb{P}), consider (ϵt)t+(\epsilon_{t})_{t\in\mathbb{N}_{+}} as a sequence of classically i.i.d. random variables satisfying 𝔼[|ϵ1|k]<\mathbb{E}_{\mathbb{P}}[|\epsilon_{1}|^{k}]<\infty for k+k\in\mathbb{N}_{+}. Let (σt)t+(\sigma_{t})_{t\in\mathbb{N}_{+}} be a sequence of bounded random variables which can be treated as states (or volatility regimes) with state space Sσ[σ¯,σ¯]S_{\sigma}\subset{[\underline{\sigma},\overline{\sigma}]}. Let Yt=σtϵt,t+Y_{t}=\sigma_{t}\epsilon_{t},t\in\mathbb{N}_{+} denote the observation sequence. (It seems like a zero-delay setup, but this is not essential in our current scope of discussion.) Consider a representative Y=σϵY=\sigma\epsilon where (σ,ϵ)(σ1,ϵ1)(\sigma,\epsilon)\coloneqq(\sigma_{1},\epsilon_{1}).

For simplicity of discussion, at each time point tt, we assume that ϵt\epsilon_{t} follows N(0,1)N(0,1) and σt\sigma_{t} and ϵt\epsilon_{t} are classically independent, denoted as σtϵt\sigma_{t}\mathrel{\mathchoice{\hbox to0.0pt{$\displaystyle\perp$\hss}{\displaystyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\textstyle\perp$\hss}{\textstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptstyle\perp$\hss}{\scriptstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptscriptstyle\perp$\hss}{\scriptscriptstyle\mkern 2.0mu\perp}}}\epsilon_{t}. Consider the following (discrete-time) filtrations:

𝒢t\displaystyle\mathcal{G}_{t} σ(σs,st)𝒩,\displaystyle\coloneqq\sigma(\sigma_{s},s\leq t)\vee\mathcal{N},
𝒴t\displaystyle\mathcal{Y}_{t} σ(Ys,st)𝒩,\displaystyle\coloneqq\sigma(Y_{s},s\leq t)\vee\mathcal{N},
t\displaystyle\mathcal{F}_{t} σ((σs,Ys),st)𝒩.\displaystyle\coloneqq\sigma((\sigma_{s},Y_{s}),s\leq t)\vee\mathcal{N}.

where 𝒩\mathcal{N} is the collection of \mathbb{P}-null sets used to complete each of the generated σ\sigma-fields mentioned above. Note that t\mathcal{F}_{t} is the same as σ((ϵs,σs),st)𝒩\sigma((\epsilon_{s},\sigma_{s}),s\leq t)\vee\mathcal{N}. Let 𝔽{t}t+\mathbb{F}\coloneqq\{\mathcal{F}_{t}\}_{t\in\mathbb{N}_{+}}. In a classical filtered probability space (Ω,,𝔽,)(\Omega,\mathcal{F},\mathbb{F},\mathbb{P}), we will start the following subsections by putting ourselves, as a group of data analysts, in a context of dealing with uncertainty on the distributions of one state variable σ\sigma, one observation variable Y=σϵY=\sigma\epsilon and a sequence of observation variables (Y1,Y2,,Yn)(Y_{1},Y_{2},\dotsc,Y_{n}) for n+n\in\mathbb{N}_{+}.

3.2 Preparation: properties of maximal distribution

Suppose we have uncertainty on the distribution of the state variable σ\sigma (which is realistic because σ\sigma is not directly observable in practice) due to a lack of prior knowledge. Another possible situation is that different members in our group have different beliefs on the behavior of σ\sigma or different preferences on the choice of the model: the distribution of σ\sigma could be a degenerate, discrete, absolutely continuous or even arbitrary one with support on [σ¯,σ¯]{[\underline{\sigma},\overline{\sigma}]}. In order to quantify this kind of model uncertainty for a given transformation φ\varphi (as a test function), we usually need to involve the maximum expected value of σ\sigma:

supσ𝒜[σ¯,σ¯]𝔼[φ(σ)],\sup_{\sigma\in\mathcal{A}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma)], (3.1)

where 𝒜[σ¯,σ¯]\mathcal{A}{[\underline{\sigma},\overline{\sigma}]} can be chosen depending on the available prior information. Possible choices of 𝒜[σ¯,σ¯]\mathcal{A}{[\underline{\sigma},\overline{\sigma}]} include,

  • 𝒟[σ¯,σ¯]\mathcal{D}{[\underline{\sigma},\overline{\sigma}]}: the space of all classically distributed random variables with support on [σ¯,σ¯]{[\underline{\sigma},\overline{\sigma}]}.

  • 𝒟disc.[σ¯,σ¯]{σ𝒟[σ¯,σ¯]: discretely distributed}\mathcal{D}_{\textbf{disc.}}{[\underline{\sigma},\overline{\sigma}]}\coloneqq\{\sigma\in\mathcal{D}{[\underline{\sigma},\overline{\sigma}]}:\text{ discretely distributed}\}.

  • 𝒟cont.[σ¯,σ¯]{σ𝒟[σ¯,σ¯]: absolutely continuously distributed}\mathcal{D}_{\textbf{cont.}}{[\underline{\sigma},\overline{\sigma}]}\coloneqq\{\sigma\in\mathcal{D}{[\underline{\sigma},\overline{\sigma}]}:\text{ absolutely continuously distributed}\}.

  • 𝒟deg.[σ¯,σ¯]{σ𝒟[σ¯,σ¯]:(σ=v)=1,v[σ¯,σ¯]}\mathcal{D}_{\textbf{deg.}}{[\underline{\sigma},\overline{\sigma}]}\coloneqq\{\sigma\in\mathcal{D}{[\underline{\sigma},\overline{\sigma}]}:\mathbb{P}(\sigma=v)=1,v\in{[\underline{\sigma},\overline{\sigma}]}\}, which is the family of all random variables following degenerate (or Dirac) distribution with mass point at v[σ¯,σ¯]v\in{[\underline{\sigma},\overline{\sigma}]}.

We are going to show that, for any of these choices, 3.1 coincides with the sublinear expectation of a maximal distribution in the GG-framework.

Definition 3.1.

(Univariate Maximal Distribution) In sublinear expectation space (Ω,,)(\Omega,\mathcal{H},\mathcal{E}), a random variable VV follows maximal distribution [σ¯,σ¯]\mathcal{M}{[\underline{\sigma},\overline{\sigma}]} with σ¯σ¯\underline{\sigma}\leq\overline{\sigma} if, for any φCl.Lip(),\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}),

[φ(V)]=maxv[σ¯,σ¯]φ(v).\mathcal{E}[\varphi(V)]=\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(v).
Remark 3.1.1.

We can take the maximum because we are working with a continuous φ\varphi on the compact set [σ¯,σ¯]{[\underline{\sigma},\overline{\sigma}]}.

The reason that we use the notation [σ¯,σ¯]{[\underline{\sigma},\overline{\sigma}]} (which is like an interval for standard deviation) is for the convenience of our later discussion.

Proposition 3.2 (Representations of Univariate Maximal Distribution).

Consider V[σ¯,σ¯]V\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]}. Then for any φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}), we have [|φ(V)|]<\mathcal{E}[\lvert\varphi(V)\rvert]<\infty and

[φ(V)]\displaystyle\mathcal{E}[\varphi(V)] =maxσ𝒟deg.[σ¯,σ¯]𝔼[φ(σ)]\displaystyle=\max_{\sigma\in\mathcal{D}_{\textbf{deg.}}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma)] (3.2)
=maxσ𝒟disc.[σ¯,σ¯]𝔼[φ(σ)]\displaystyle=\max_{\sigma\in\mathcal{D}_{\textbf{disc.}}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma)] (3.3)
=supσ𝒟cont.[σ¯,σ¯]𝔼[φ(σ)]\displaystyle=\sup_{\sigma\in\mathcal{D}_{\textbf{cont.}}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma)] (3.4)
=maxσ𝒟[σ¯,σ¯]𝔼[φ(σ)].\displaystyle=\max_{\sigma\in\mathcal{D}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma)]. (3.5)
Remark 3.2.1.

Note that 𝒟disc.𝒟cont.𝒟\mathcal{D}_{\textbf{disc.}}\cup\mathcal{D}_{\textbf{cont.}}\subset\mathcal{D}. The probability laws associated with 𝒟cont.\mathcal{D}_{\textbf{cont.}} are equivalent, but 𝒟disc.\mathcal{D}_{\textbf{disc.}} and 𝒟\mathcal{D} do not have this property.

Remark 3.2.2.

In 3.2, we write the representation in the form of

[φ(V)]=maxσ𝒜𝔼[φ(σ)],\mathcal{E}[\varphi(V)]=\max_{\sigma\in\mathcal{A}}\mathbb{E}[\varphi(\sigma)], (3.6)

where 𝒜\mathcal{A} denotes a family of random variables in (Ω,,)(\Omega,\mathcal{F},\mathbb{P}). Equivalently, if we treat VV as a random variable on both sides (which requires a more careful preliminary setup that we will not touch at this stage; more details can be found in Chapter 6 of Peng, 2019b ), 3.6 becomes

[φ(V)]=maxV𝒜1𝔼[φ(V)],\mathcal{E}[\varphi(V)]=\max_{\mathbb{P}_{V}\in\mathbb{P}\circ\mathcal{A}^{-1}}\mathbb{E}_{\mathbb{P}}[\varphi(V)], (3.7)

where V\mathbb{P}_{V} is the distribution of VV and 𝒜1{σ1,σ𝒜}\mathbb{P}\circ\mathcal{A}^{-1}\coloneqq\{\mathbb{P}\circ\sigma^{-1},\sigma\in\mathcal{A}\} becomes a family of distributions. Throughout this paper, we prefer to use the form 3.6 for simplicity of notation and to minimize the technical setup, but readers can always informally view 3.6 as an equivalent form of 3.7. In this way, we can better see the distributional uncertainty of VV.

Remark 3.2.3.

Meanwhile, 3.2 provides four ways to represent the distributional uncertainty of VV. In practice, practitioners may choose the representation they need depending on the available prior knowledge or their belief on the random phenomenon.
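
For illustration only (not part of the formal development), the following minimal Python sketch makes 3.2 concrete: the sublinear expectation of the maximal distribution is simply the maximum of φ over the interval, and the classical expectation under any law of σ supported on the interval can attain, but never exceed, that value. The interval endpoints, the test function and the sample sizes below are arbitrary choices of ours.

```python
import numpy as np

sig_lo, sig_hi = 0.5, 1.5                 # the interval playing the role of [sigma_lower, sigma_upper]
phi = lambda v: np.sin(3 * v) + v         # an arbitrary (locally Lipschitz) test function

# Sublinear expectation of the maximal distribution: the maximum of phi over the interval
grid = np.linspace(sig_lo, sig_hi, 10_001)
E_maximal = phi(grid).max()

# Classical expectations E_P[phi(sigma)] for several laws of sigma supported on the interval
rng = np.random.default_rng(0)
sigma_dirac    = np.full(100_000, grid[phi(grid).argmax()])        # degenerate law at the argmax
sigma_discrete = rng.choice([sig_lo, 1.0, sig_hi], size=100_000)   # a discrete law
sigma_uniform  = rng.uniform(sig_lo, sig_hi, size=100_000)         # an absolutely continuous law

for name, s in [("Dirac", sigma_dirac), ("discrete", sigma_discrete), ("uniform", sigma_uniform)]:
    print(f"{name:9s} E_P[phi(sigma)] = {phi(s).mean():.4f}  (<= {E_maximal:.4f})")
```

The degenerate (Dirac) choice attains the maximal-distribution value, matching the fact that the supremum in 3.2 is already achieved over 𝒟deg.\mathcal{D}_{\textbf{deg.}}.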

Definition 3.3.

(Multivariate Maximal Distribution) In sublinear expectation space (Ω,,)(\Omega,\mathcal{H},\mathcal{E}), a random vector 𝑽:Ωd\bm{V}:\Omega\to\mathbb{R}^{d} follows a (multivariate) maximal distribution (𝒱)\mathcal{M}(\mathcal{V}), if there exists a compact and convex subset 𝒱d\mathcal{V}\subset\mathbb{R}^{d} satisfying: for any φCl.Lip(d)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{d}),

[φ(𝑽)]=max𝝈𝒱φ(𝝈).\mathcal{E}[\varphi(\bm{V})]=\max_{\bm{\sigma}\in\mathcal{V}}\varphi(\bm{\sigma}).

One can also easily extend the representation in 3.2 to a multivariate case (3.4) by considering 𝒟(𝒱)\mathcal{D}(\mathcal{V}) which is the space of all classically distributed random variables with support on 𝒱\mathcal{V} and also considering its subspaces 𝒟deg.(𝒱)\mathcal{D}_{\textbf{deg.}}(\mathcal{V}), 𝒟cont.(𝒱)\mathcal{D}_{\textbf{cont.}}(\mathcal{V}) and 𝒟disc.(𝒱)\mathcal{D}_{\textbf{disc.}}(\mathcal{V}) as well.

Proposition 3.4 (Representations of multivariate maximal distribution).

For 𝐕(𝒱)\bm{V}\sim\mathcal{M}(\mathcal{V}), we have for any φCl.Lip(d)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{d}),

[φ(𝑽)]=sup𝝈𝒜(𝒱)𝔼[φ(𝝈)],\mathcal{E}[\varphi(\bm{V})]=\sup_{\bm{\sigma}\in\mathcal{A}(\mathcal{V})}\mathbb{E}[\varphi(\bm{\sigma})], (3.8)

where 𝒜\mathcal{A} can be chosen from {𝒟,𝒟disc.,𝒟cont.,𝒟deg.}\{\mathcal{D},\mathcal{D}_{\textbf{disc.}},\mathcal{D}_{\textbf{cont.}},\mathcal{D}_{\textbf{deg.}}\} and sup\sup can be changed to max\max except when 𝒜=𝒟cont.\mathcal{A}=\mathcal{D}_{\textbf{cont.}}.

Proof.

It can be extended from the proof of 3.2. ∎

Next we provide a property for multivariate maximal distribution under transformations.

Proposition 3.5.

Suppose 𝐕(𝒱)\bm{V}\sim\mathcal{M}(\mathcal{V}). Then for any locally Lipschitz function ψ:(d,)(k,)\psi:(\mathbb{R}^{d},\lVert\cdot\rVert)\to(\mathbb{R}^{k},\lVert\cdot\rVert), we have

𝑺ψ(𝐕)=ψ(V1,V2,,Vd)(𝒮),\bm{S}\coloneqq\psi(\mathbf{V})=\psi(V_{1},V_{2},\dotsc,V_{d})\sim\mathcal{M}(\mathcal{S}),

where 𝒮ψ(𝒱)={ψ(𝛔):𝛔𝒱}\mathcal{S}\coloneqq\psi(\mathcal{V})=\{\psi(\bm{\sigma}):\bm{\sigma}\in\mathcal{V}\}.

Remark 3.5.1.

3.5 is a generalized version of Proposition 25 and Remark 26 in Jin and Peng, (2021). It shows that a transformation of a maximally distributed random vector still follows a maximal distribution, with support equal to the range of the function.

Next we give a connection between univariate and multivariate maximal distribution.

Proposition 3.6.

(The relation between the multivariate and the univariate maximal distribution) Consider a sequence of maximally distributed random variables Vi[σ¯i,σ¯i]V_{i}\sim\mathcal{M}[\underline{\sigma}_{i},\overline{\sigma}_{i}] with σ¯iσ¯i\underline{\sigma}_{i}\leq\overline{\sigma}_{i}, i=1,2,,di=1,2,\dotsc,d. Then the following three statements are equivalent:

  1. (1)

    {Vi}i=1d\{V_{i}\}_{i=1}^{d} are sequentially independent V1V2VdV_{1}\dashrightarrow V_{2}\dashrightarrow\cdots\dashrightarrow V_{d},

  2. (2)

    Vi1Vi2VidV_{i_{1}}\dashrightarrow V_{i_{2}}\dashrightarrow\cdots\dashrightarrow V_{i_{d}} for any permutation (i1,i2,,id)(i_{1},i_{2},\dotsc,i_{d}) of (1,2,,d)(1,2,\dotsc,d),

  3. (3)

    𝑽(V1,V2,,Vd)(i=1d[σ¯i,σ¯i])\bm{V}\coloneqq(V_{1},V_{2},\dotsc,V_{d})\sim\mathcal{M}(\prod_{i=1}^{d}[\underline{\sigma}_{i},\overline{\sigma}_{i}]), where the operation i=1d\prod_{i=1}^{d} is the Cartesian product.

Remark 3.6.1.

3.6 shows that the sequential independence between maximal distributions can be arbitrarily switched without changing the joint distribution, which is a maximal distribution supported on a dd-dimensional rectangle. Conversely, if a random vector follows a maximal distribution supported on such a rectangle, then its components are sequentially independent.

As a special case of 3.6, for two maximally distributed random variables Vi,i=1,2V_{i},i=1,2, V1V2V_{1}\dashrightarrow V_{2} implies that V2V1V_{2}\dashrightarrow V_{1}. In fact, 2.26 given by Hu and Li, (2014) shows that for two non-constant, non-classically distributed random objects, this kind of mutual independence only holds for maximal distributions. The asymmetry of sequential independence prevails among the distributions in the GG-expectation framework.

3.3 Preparation: setup of a product space (a newly added part)

We start from a set 𝒬\mathcal{Q} of probability measures and a single probability measure PP, where PP does not have to be in 𝒬\mathcal{Q}. Let 1[]supQ𝒬𝔼Q[]\mathcal{E}_{1}[\cdot]\coloneqq\sup_{Q\in\mathcal{Q}}\mathbb{E}_{Q}[\cdot] and 2[]𝔼P[]\mathcal{E}_{2}[\cdot]\coloneqq\mathbb{E}_{P}[\cdot]. Then we have the associated sublinear expectation spaces (Ωi,(i),i)(\Omega_{i},\mathcal{H}_{(i)},\mathcal{E}_{i}) with i=1,2i=1,2. Note that 2\mathcal{E}_{2}, as a linear operator, can be treated as a degenerate sublinear expectation. We may simply write the linear expectation 𝔼P\mathbb{E}_{P} as 𝔼\mathbb{E} if the probability measure is clear from the context. Since 2\mathcal{E}_{2} is a linear expectation, the distributions under (Ω2,(2),2)(\Omega_{2},\mathcal{H}_{(2)},\mathcal{E}_{2}) can be treated as classical ones, which we assume include common classical distributions (such as the classical normal). We also assume 𝒬\mathcal{Q} is designed such that the GG-distribution exists in (Ω1,(1),1)(\Omega_{1},\mathcal{H}_{(1)},\mathcal{E}_{1}). Then we can combine these two spaces into a product space (Ω1×Ω2,(1)(2),12)(\Omega_{1}\times\Omega_{2},\mathcal{H}_{(1)}\otimes\mathcal{H}_{(2)},\mathcal{E}_{1}\otimes\mathcal{E}_{2}). It also forms a sublinear expectation space. More details on this notion of product space can be found in Peng, 2019b (Section 1.3).

For readers’ convenience, here we provide a brief description of this product space.

  1. 1.

    The space is (1)(2)\mathcal{H}_{(1)}\otimes\mathcal{H}_{(2)} defined as

    (1)(2)=\displaystyle\mathcal{H}_{(1)}\otimes\mathcal{H}_{(2)}= {X(ω1,ω2)=f(K(ω1),η(ω2)),(ω1,ω2)Ω1×Ω2,\displaystyle\{X(\omega_{1},\omega_{2})=f(K(\omega_{1}),\eta(\omega_{2})),(\omega_{1},\omega_{2})\in\Omega_{1}\times\Omega_{2},
    K(1)m,η(2)n,fCl.Lip(m+n),m,n+}.\displaystyle K\in\mathcal{H}_{(1)}^{m},\eta\in\mathcal{H}_{(2)}^{n},f\in C_{\mathrm{l.Lip}}(\mathbb{R}^{m+n}),m,n\in\mathbb{N}_{+}\}.
  2. 2.

    For X(ω1,ω2)=f(K(ω1),η(ω2))(1)(2)X(\omega_{1},\omega_{2})=f(K(\omega_{1}),\eta(\omega_{2}))\in\mathcal{H}_{(1)}\otimes\mathcal{H}_{(2)}, we have defined

    [X]=12[X]\displaystyle\mathcal{E}[X]=\mathcal{E}_{1}\otimes\mathcal{E}_{2}[X] 1[2[f(k,η)]k=K]\displaystyle\coloneqq\mathcal{E}_{1}[\mathcal{E}_{2}[f(k,\eta)]_{k=K}]
    =supQ𝒬𝔼Q[𝔼P[f(k,η)]k=K]\displaystyle=\sup_{Q\in\mathcal{Q}}\mathbb{E}_{Q}[\mathbb{E}_{P}[f(k,\eta)]_{k=K}]
    =supQ𝒬f(k,y)Pη(dy)QK(dk)\displaystyle=\sup_{Q\in\mathcal{Q}}\int\int f(k,y)P_{\eta}(\mathop{}\!\mathrm{d}y)Q_{K}(\mathop{}\!\mathrm{d}k)
    =sup𝒫𝔼[X],\displaystyle=\sup_{\mathbb{P}\in\mathcal{P}}\mathbb{E}_{\mathbb{P}}[X],

    where 𝒫{QP,Q𝒬}\mathcal{P}\coloneqq\{Q\otimes P,Q\in\mathcal{Q}\} and QPQ\otimes P is the product measure of QQ and PP. Note that 2112\mathcal{E}_{2}\otimes\mathcal{E}_{1}\neq\mathcal{E}_{1}\otimes\mathcal{E}_{2} due to the sublinearity of 1\mathcal{E}_{1}.

Proposition 3.7.

For a random variable KK on (Ω1,(1),1)(\Omega_{1},\mathcal{H}_{(1)},\mathcal{E}_{1}) and η\eta on (Ω2,(2),2)(\Omega_{2},\mathcal{H}_{(2)},\mathcal{E}_{2}), by letting K¯(ω1,ω2)K(ω1)\bar{K}(\omega_{1},\omega_{2})\coloneqq K(\omega_{1}) and η¯(ω1,ω2)η(ω2)\bar{\eta}(\omega_{1},\omega_{2})\coloneqq\eta(\omega_{2}), we have the following results:

  1. 1.

    K¯,η¯(1)(2)\bar{K},\bar{\eta}\in\mathcal{H}_{(1)}\otimes\mathcal{H}_{(2)},

  2. 2.

    X(ω)=f(K¯(ω),η¯(ω)),X(\omega)=f(\bar{K}(\omega),\bar{\eta}(\omega)), for ωΩ1×Ω2\omega\in\Omega_{1}\times\Omega_{2},

  3. 3.

    For any φCb.Lip\varphi\in C_{\mathrm{b.Lip}}, [φ(K¯)]=1[φ(K)],\mathcal{E}[\varphi(\bar{K})]=\mathcal{E}_{1}[\varphi(K)],

  4. 4.

    For any φCb.Lip\varphi\in C_{\mathrm{b.Lip}}, [φ(η¯)]=𝔼P[φ(η)],\mathcal{E}[\varphi(\bar{\eta})]=\mathbb{E}_{P}[\varphi(\eta)],

  5. 5.

    K¯η¯.\bar{K}\dashrightarrow\bar{\eta}.

Proof.

Items 1 and 2 are straightforward. For Item 3, we have

[φ(K¯)]\displaystyle\mathcal{E}[\varphi(\bar{K})] =1[2[φ(f1(k,η))]k=K]\displaystyle=\mathcal{E}_{1}[\mathcal{E}_{2}[\varphi(f_{1}(k,\eta))]_{k=K}]
=1[2[φ(k)]k=K]=1[φ(K)].\displaystyle=\mathcal{E}_{1}[\mathcal{E}_{2}[\varphi(k)]_{k=K}]=\mathcal{E}_{1}[\varphi(K)].

Similarly, we can show Item 4. For Item 5, our goal is to show

[φ(K¯,η¯)]=[[φ(k,η¯)]k=K¯].\mathcal{E}[\varphi(\bar{K},\bar{\eta})]=\mathcal{E}[\mathcal{E}[\varphi(k,\bar{\eta})]_{k=\bar{K}}].

We can see the equation above holds from the following step: with H(k)𝔼P[φ(k,η¯)]H(k)\coloneqq\mathbb{E}_{P}[\varphi(k,\bar{\eta})],

RHS =[𝔼P[φ(k,η¯)]k=K¯]\displaystyle=\mathcal{E}[\mathbb{E}_{P}[\varphi(k,\bar{\eta})]_{k=\bar{K}}]
=[H(K¯)]=1[H(K)]\displaystyle=\mathcal{E}[H(\bar{K})]=\mathcal{E}_{1}[H(K)]
=1[𝔼P[φ(k,η¯)]k=K]\displaystyle=\mathcal{E}_{1}[\mathbb{E}_{P}[\varphi(k,\bar{\eta})]_{k=K}]
=[φ(K,η)]=[φ(K¯,η¯)].\displaystyle=\mathcal{E}[\varphi(K,\eta)]=\mathcal{E}[\varphi(\bar{K},\bar{\eta})].\qed
Remark 3.7.1.

In the following context, without further notice, we will not distinguish K¯\bar{K} (or η¯\bar{\eta}) from KK (or η\eta). Moreover, we can see that, by letting η(ω1,ω2)η(ω2)\eta(\omega_{1},\omega_{2})\coloneqq\eta(\omega_{2}) so that η(1)(2)\eta\in\mathcal{H}_{(1)}\otimes\mathcal{H}_{(2)}, from Item 4, we have

[φ(η)]=sup𝒫𝔼[φ(η)]=𝔼P[φ(η)],\mathcal{E}[\varphi(\eta)]=\sup_{\mathbb{P}\in\mathcal{P}}\mathbb{E}_{\mathbb{P}}[\varphi(\eta)]=\mathbb{E}_{P}[\varphi(\eta)],

where \mathbb{P} is any product measure QPQ\otimes P with Q𝒬Q\in\mathcal{Q}. By replacing φ\varphi with φ-\varphi, we can show that for any 𝒫\mathbb{P}\in\mathcal{P},

𝔼P[φ(η)]=inf𝒫𝔼[φ(η)]𝔼[φ(η)]sup𝒫𝔼[φ(η)]=𝔼P[φ(η)],\mathbb{E}_{P}[\varphi(\eta)]=\inf_{\mathbb{P}\in\mathcal{P}}\mathbb{E}_{\mathbb{P}}[\varphi(\eta)]\leq\mathbb{E}_{\mathbb{P}}[\varphi(\eta)]\leq\sup_{\mathbb{P}\in\mathcal{P}}\mathbb{E}_{\mathbb{P}}[\varphi(\eta)]=\mathbb{E}_{P}[\varphi(\eta)],

or simply 𝔼[φ(η)]=𝔼P[φ(η)].\mathbb{E}_{\mathbb{P}}[\varphi(\eta)]=\mathbb{E}_{P}[\varphi(\eta)]. It means that the probability law of η\eta is always PηP_{\eta} under each product measure 𝒫\mathbb{P}\in\mathcal{P}.

Let ¯s\bar{\mathcal{H}}_{s} denote a subspace of the product space mentioned above:

¯s{X(1)(2):\displaystyle\bar{\mathcal{H}}_{s}\coloneqq\{X\in\mathcal{H}_{(1)}\otimes\mathcal{H}_{(2)}: X(ω1,ω2)=f(K(ω1),η(ω2)),K(1)m(Θ),\displaystyle X(\omega_{1},\omega_{2})=f(K(\omega_{1}),\eta(\omega_{2})),K\in\mathcal{H}_{(1)}^{m}\sim\mathcal{M}(\Theta),
fCl.Lip(m+n),Θm,m,n+}.\displaystyle f\in C_{\mathrm{l.Lip}}(\mathbb{R}^{m+n}),\Theta\subset\mathbb{R}^{m},m,n\in\mathbb{N}_{+}\}.

For any X¯sX\in\bar{\mathcal{H}}_{s}, by the representation of maximal distribution, we have

[X]=supθΘ𝔼P[f(θ,η)].\mathcal{E}[X]=\sup_{\theta\in\Theta}\mathbb{E}_{P}[f(\theta,\eta)].

3.4 Univariate semi-GG-normal distribution

Recall the story setup in Section 3.1. Note that Y=σϵY=\sigma\epsilon can be treated as a normal mixture with scaling latent variable σ:Ω[σ¯,σ¯]\sigma:\Omega\to{[\underline{\sigma},\overline{\sigma}]}. For simplicity of discussion, we have assumed σϵ\sigma\mathrel{\mathchoice{\hbox to0.0pt{$\displaystyle\perp$\hss}{\displaystyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\textstyle\perp$\hss}{\textstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptstyle\perp$\hss}{\scriptstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptscriptstyle\perp$\hss}{\scriptscriptstyle\mkern 2.0mu\perp}}}\epsilon and ϵN(0,1)\epsilon\sim N(0,1). Suppose we are further faced with the uncertainty on the distribution of Y=σϵY=\sigma\epsilon due to uncertain σ\sigma part. Then the maximum expected value under this distributional uncertainty is

supσ𝒜[σ¯,σ¯]𝔼[φ(σϵ)],\sup_{\sigma\in\mathcal{A}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)], (3.9)

where the choice of 𝒜[σ¯,σ¯]\mathcal{A}{[\underline{\sigma},\overline{\sigma}]} is the same as in Section 3.2. It turns out that, for any of these choices, 3.9 can be expressed as the sublinear expectation of a semi-GG-normal W𝒩^(0,[σ¯2,σ¯2])W\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) (3.8).

To begin with, note that 𝒩(0,[1,1])\mathcal{N}(0,[1,1]) can be treated as the same as the classical distribution N(0,1)N(0,1) due to 2.14.1. Therefore, we can also say ϵ𝒩(0,[1,1])\epsilon\sim\mathcal{N}(0,[1,1]) in the sublinear expectation space. In the following context, we will not distinguish between N(0,1)N(0,1) and 𝒩(0,[1,1])\mathcal{N}(0,[1,1]). Similarly, a standard multivariate normal N(𝟎,𝐈d)N(\bm{0},\mathbf{I}_{d}) can be treated as both a classical distribution and also a degenerate version of a multivariate GG-normal.

Definition 3.8 (Univariate semi-GG-normal distribution).

For any W¯sW\in\bar{\mathcal{H}}_{s}, we say that WW follows a semi-GG-normal distribution 𝒩^(0,[σ¯2,σ¯2])\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) if there exist V¯s[σ¯,σ¯]V\in\bar{\mathcal{H}}_{s}\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]} and ϵ¯sN(0,1)\epsilon\in\bar{\mathcal{H}}_{s}\sim N(0,1) with VϵV\dashrightarrow\epsilon, such that

W=Vϵ,W=V\epsilon, (3.10)

where the direction of independence cannot be reversed. It is denoted as W𝒩^(0,[σ¯2,σ¯2])W\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]).

Remark 3.8.1.

(Existence of Semi-GG-normal distribution) Since there exist V(1)[σ¯,σ¯]V^{\prime}\in\mathcal{H}_{(1)}\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]} and ϵ(2)N(0,1)\epsilon^{\prime}\in\mathcal{H}_{(2)}\sim N(0,1), let V(ω1,ω2)V(ω1)V(\omega_{1},\omega_{2})\coloneqq V^{\prime}(\omega_{1}) and ϵ(ω1,ω2)ϵ(ω2)\epsilon(\omega_{1},\omega_{2})\coloneqq\epsilon^{\prime}(\omega_{2}). Consider f(x,y)=xyf(x,y)=xy, then 3.7 ensures that W=f(V,ϵ)=f(V,ϵ)¯sW=f(V,\epsilon)=f(V^{\prime},\epsilon^{\prime})\in\bar{\mathcal{H}}_{s} satisfies the properties required by 3.8.

Remark 3.8.2.

(Why we cannot reverse the direction of independence) There are two reasons:

  1. 1.

    The sublinear expectation will essentially change if we do so: the resulting distribution will be different. For instance, if we assume ϵV\epsilon\dashrightarrow V and let W~ϵV\tilde{W}\coloneqq\epsilon V, we have

    [W~]\displaystyle\mathcal{E}[\tilde{W}] =[[xV]x=ϵ]=𝔼[ϵ+σ¯ϵσ¯]\displaystyle=\mathcal{E}[\mathcal{E}[xV]_{x=\epsilon}]=\mathbb{E}[\epsilon^{+}\overline{\sigma}-\epsilon^{-}\underline{\sigma}]
    =𝔼[σ¯ϵ+(σ¯σ¯)ϵ]=(σ¯σ¯)𝔼[ϵ]\displaystyle=\mathbb{E}[\overline{\sigma}\epsilon+(\overline{\sigma}-\underline{\sigma})\epsilon^{-}]=(\overline{\sigma}-\underline{\sigma})\mathbb{E}[\epsilon^{-}]
    =12(σ¯σ¯)𝔼[|ϵ|]>0,\displaystyle=\frac{1}{2}(\overline{\sigma}-\underline{\sigma})\mathbb{E}[\lvert\epsilon\rvert]>0,

    and similarly,

    [W~]=12(σ¯σ¯)𝔼[|ϵ|]<0.-\mathcal{E}[-\tilde{W}]=-\frac{1}{2}(\overline{\sigma}-\underline{\sigma})\mathbb{E}[\lvert\epsilon\rvert]<0.

    where ϵ+=max{ϵ,0}\epsilon^{+}=\max\{\epsilon,0\} and ϵ=max{ϵ,0}\epsilon^{-}=\max\{-\epsilon,0\}. We can see that W~\tilde{W} and WW already exhibit their difference in the first moment: WW has certain mean zero but W~\tilde{W} has mean-uncertainty (a short numerical check is given right after this remark).

  2. 2.

    We can never have mutual independence in this case, because VV is maximally distributed and ϵ\epsilon is classically distributed, so the pair does not belong to either of the cases in 2.26 proved by Hu and Li, (2014).
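
As a quick sanity check of the mean-uncertainty computation in the first point above, the following small Monte Carlo sketch (our own illustration; the interval endpoints are arbitrary) estimates the upper and lower means of the reversed-order product and compares them with the closed-form value, namely half of (sig_hi − sig_lo) times the first absolute moment of a standard normal.

```python
import numpy as np

sig_lo, sig_hi = 0.5, 1.5
rng = np.random.default_rng(1)
eps = rng.standard_normal(1_000_000)

# Upper mean of W~ = eps * V when eps -> V:  E_P[eps^+ * sig_hi - eps^- * sig_lo]
upper_mean = np.mean(np.maximum(eps, 0) * sig_hi - np.maximum(-eps, 0) * sig_lo)
# Lower mean -E[-W~] = E_P[eps^+ * sig_lo - eps^- * sig_hi]
lower_mean = np.mean(np.maximum(eps, 0) * sig_lo - np.maximum(-eps, 0) * sig_hi)

closed_form = (sig_hi - sig_lo) / np.sqrt(2 * np.pi)   # = (sig_hi - sig_lo) * E|eps| / 2
print(upper_mean, "~", closed_form)                    # positive upper mean
print(lower_mean, "~", -closed_form)                   # negative lower mean
```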

As we further proceed in this paper, we will see that the properties of WW are closely related to the random vector (V,ϵ)(V,\epsilon) in its decomposition 3.10 (such as the results in Section 3.6). The following 3.9 guarantees the uniqueness of such a decomposition.

Proposition 3.9 (The uniqueness of decomposition).

In 3.8, if there exist two pairs (V1,ϵ1)(V_{1},\epsilon_{1}) and (V2,ϵ2)(V_{2},\epsilon_{2}) satisfying the required properties and

W=V1ϵ1=V2ϵ2,W=V_{1}\epsilon_{1}=V_{2}\epsilon_{2},

we must have V1=V2V_{1}=V_{2} and ϵ1=ϵ2\epsilon_{1}=\epsilon_{2}.

Theorem 3.10 (Representations of univariate semi-GG-normal).

Consider two classically distributed random variables σ:Ω[σ¯,σ¯]\sigma:\Omega\to{[\underline{\sigma},\overline{\sigma}]} and ϵN(0,1)\epsilon\sim N(0,1) satisfying σϵ\sigma\mathrel{\mathchoice{\hbox to0.0pt{$\displaystyle\perp$\hss}{\displaystyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\textstyle\perp$\hss}{\textstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptstyle\perp$\hss}{\scriptstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptscriptstyle\perp$\hss}{\scriptscriptstyle\mkern 2.0mu\perp}}}\epsilon. For any φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}), we have [|φ(W)|]<\mathcal{E}[\lvert\varphi(W)\rvert]<\infty and

[φ(W)]\displaystyle\mathcal{E}[\varphi(W)] =maxσ𝒟deg.[σ¯,σ¯]𝔼[φ(σϵ)]=maxσ[σ¯,σ¯]𝔼[φ(σϵ)]\displaystyle=\max_{\sigma\in\mathcal{D}_{\textbf{deg.}}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)]=\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)] (3.11)
=maxσ𝒟disc.[σ¯,σ¯]𝔼[φ(σϵ)]\displaystyle=\max_{\sigma\in\mathcal{D}_{\textbf{disc.}}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)] (3.12)
=supσ𝒟cont.[σ¯,σ¯]𝔼[φ(σϵ)]\displaystyle=\sup_{\sigma\in\mathcal{D}_{\textbf{cont.}}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)] (3.13)
=maxσ𝒟[σ¯,σ¯]𝔼[φ(σϵ)],\displaystyle=\max_{\sigma\in\mathcal{D}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)], (3.14)

where {𝒟,𝒟disc.,𝒟cont.,𝒟deg.}\{\mathcal{D},\mathcal{D}_{\textbf{disc.}},\mathcal{D}_{\textbf{cont.}},\mathcal{D}_{\textbf{deg.}}\} are the same as the ones in 3.2.

The proof of 3.10 is closely related to the representation of maximal distribution. First we need to prepare the following lemma.

Lemma 3.11.

For any fixed vv\in\mathbb{R}, let φϵ(v)[φ(vϵ)]\varphi_{\epsilon}(v)\coloneqq\mathcal{E}[\varphi(v\epsilon)] with ϵ𝒩(0,[1,1])\epsilon\sim\mathcal{N}(0,[1,1]). Then we have φϵCl.Lip()\varphi_{\epsilon}\in C_{\mathrm{l.Lip}}(\mathbb{R}).

Proof of 3.11.

Note that ϵ=d𝒩(0,[1,1])=dN(0,1)\epsilon\overset{\text{d}}{=}\mathcal{N}(0,[1,1])\overset{\text{d}}{=}N(0,1) as mentioned in 2.14.1. Then φϵ(v)[φ(vϵ)]=𝔼[φ(vϵ)]\varphi_{\epsilon}(v)\coloneqq\mathcal{E}[\varphi(v\epsilon)]=\mathbb{E}[\varphi(v\epsilon)]. Next we can show φϵCl.Lip()\varphi_{\epsilon}\in C_{\mathrm{l.Lip}}(\mathbb{R}) by definition:

|φϵ(x)φϵ(y)|\displaystyle|\varphi_{\epsilon}(x)-\varphi_{\epsilon}(y)| =|𝔼[φ(xϵ)φ(yϵ)]|\displaystyle=|\mathbb{E}_{\mathbb{P}}[\varphi(x\epsilon)-\varphi(y\epsilon)]|
𝔼[Cφ(1+|xϵ|k+|yϵ|k)|ϵ||xy|]\displaystyle\leq\mathbb{E}_{\mathbb{P}}[C_{\varphi}(1+\lvert x\epsilon\rvert^{k}+\lvert y\epsilon\rvert^{k})\lvert\epsilon\rvert\cdot\lvert x-y\rvert]
=Cφ(𝔼[|ϵ|]+𝔼[|ϵ|k+1]|x|k+𝔼[|ϵ|k+1]|y|k)|xy|\displaystyle=C_{\varphi}(\mathbb{E}_{\mathbb{P}}[\lvert\epsilon\rvert]+\mathbb{E}_{\mathbb{P}}[\lvert\epsilon\rvert^{k+1}]\lvert x\rvert^{k}+\mathbb{E}_{\mathbb{P}}[\lvert\epsilon\rvert^{k+1}]\lvert y\rvert^{k})\lvert x-y\rvert
C(1+|x|k+|y|k)|xy|,\displaystyle\leq C(1+\lvert x\rvert^{k}+\lvert y\rvert^{k})\lvert x-y\rvert,

where C=Cφmax{𝔼[|ϵ|],𝔼[|ϵ|k+1]}C=C_{\varphi}\max\{\mathbb{E}_{\mathbb{P}}[\lvert\epsilon\rvert],\mathbb{E}_{\mathbb{P}}[\lvert\epsilon\rvert^{k+1}]\}. ∎

Proof of 3.10.

Under the sequential independence VϵV\dashrightarrow\epsilon, for any φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}), we have

[φ(W)]=[φ(Vϵ)]=[[φ(vϵ)]v=V]=[φϵ(V)].\mathcal{E}[\varphi(W)]=\mathcal{E}[\varphi(V\epsilon)]=\mathcal{E}[\mathcal{E}[\varphi(v\epsilon)]_{v=V}]=\mathcal{E}[\varphi_{\epsilon}(V)].

First we have φϵCl.Lip()\varphi_{\epsilon}\in C_{\mathrm{l.Lip}}(\mathbb{R}) by 3.11. Then we can use 3.1 to show the finiteness of [|φ(W)|]\mathcal{E}[\lvert\varphi(W)\rvert] due to the continuity of φϵ\varphi_{\epsilon}: [|φ(W)|]=maxv[σ¯,σ¯]|φϵ(v)|<.\mathcal{E}[\lvert\varphi(W)\rvert]=\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}\lvert\varphi_{\epsilon}(v)\rvert<\infty. Next we check each representation in 3.10 by applying the associated representation of maximal distribution in 3.2. For instance, we can show 3.13 based on 3.4:

[φ(W)]\displaystyle\mathcal{E}[\varphi(W)] =[φϵ(V)]=supσ𝒟cont.[σ¯,σ¯]𝔼[φϵ(σ)]\displaystyle=\mathcal{E}[\varphi_{\epsilon}(V)]=\sup_{\sigma\in\mathcal{D}_{\textbf{cont.}}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi_{\epsilon}(\sigma)]
=supσ𝒟cont.[σ¯,σ¯]𝔼[𝔼[φ(vϵ)]v=σ]=supσ𝒟cont.[σ¯,σ¯]𝔼[φ(σϵ)],\displaystyle=\sup_{\sigma\in\mathcal{D}_{\textbf{cont.}}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\mathbb{E}[\varphi(v\epsilon)]_{v=\sigma}]=\sup_{\sigma\in\mathcal{D}_{\textbf{cont.}}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)],

where we use the fact that σϵ\sigma\mathrel{\mathchoice{\hbox to0.0pt{$\displaystyle\perp$\hss}{\displaystyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\textstyle\perp$\hss}{\textstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptstyle\perp$\hss}{\scriptstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptscriptstyle\perp$\hss}{\scriptscriptstyle\mkern 2.0mu\perp}}}\epsilon and 2.6.3. ∎

Remark 3.11.1.

3.10 means that there are several ways to interpret the distributional uncertainty of semi-GG-normal:

  • 3.11 shows it can be described as a collection of N(0,σ2)N(0,\sigma^{2}) with σ[σ¯,σ¯]\sigma\in{[\underline{\sigma},\overline{\sigma}]} (which gives a direct way to compute this sublinear expectation);

  • 3.13, 3.12 and 3.14 show it can be described as a collection of classical normal mixture distribution with (discretely, absolutely continuously or arbitrarily) distributed scale parameter ranging in [σ¯,σ¯]{[\underline{\sigma},\overline{\sigma}]}.

Remark 3.11.2.

Let FσF_{\sigma} denote the cumulative distribution function of σ\sigma under \mathbb{P} and F𝒜[σ¯,σ¯]F_{\mathcal{A}{[\underline{\sigma},\overline{\sigma}]}} represent the family of FσF_{\sigma} with σ𝒜[σ¯,σ¯]\sigma\in\mathcal{A}{[\underline{\sigma},\overline{\sigma}]}. Then we can apply the classical Fubini theorem in the evaluation of 𝔼[φ(σϵ)]\mathbb{E}[\varphi(\sigma\epsilon)] in 3.10 to get a more explicit form of representation:

[φ(W)]=supFσF𝒜[σ¯,σ¯]σ¯σ¯𝔼[φ(vϵ)]Fσ(dv),\mathcal{E}[\varphi(W)]=\sup_{F_{\sigma}\in F_{\mathcal{A}{[\underline{\sigma},\overline{\sigma}]}}}\int_{\underline{\sigma}}^{\overline{\sigma}}\mathbb{E}[\varphi(v\epsilon)]F_{\sigma}(\mathop{}\!\mathrm{d}v),

where 𝒜\mathcal{A} can be chosen from {𝒟,𝒟disc.,𝒟cont.,𝒟deg.}\{\mathcal{D},\mathcal{D}_{\textbf{disc.}},\mathcal{D}_{\textbf{cont.}},\mathcal{D}_{\textbf{deg.}}\}.
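
To make the representations in 3.10 concrete, here is a minimal numerical sketch (our own illustration, with arbitrary interval endpoints, test function and quadrature order): the sublinear expectation of the semi-GG-normal is obtained by maximizing the classical normal expectation over constant volatilities in the interval, while a classical normal-mixture law for σ\sigma never exceeds that value.

```python
import numpy as np

sig_lo, sig_hi = 0.5, 1.5
phi = lambda x: np.cos(2 * x) + 0.5 * x ** 2     # an arbitrary test function

# E_P[phi(v * eps)] for eps ~ N(0, 1), via probabilists' Gauss-Hermite quadrature
nodes, weights = np.polynomial.hermite_e.hermegauss(80)
normal_expect = lambda v: np.sum(weights * phi(v * nodes)) / np.sqrt(2 * np.pi)

# Representation 3.11: E[phi(W)] = max over constant volatilities v in [sig_lo, sig_hi]
v_grid = np.linspace(sig_lo, sig_hi, 501)
E_semi = max(normal_expect(v) for v in v_grid)

# Representations 3.12-3.14: a normal mixture with sigma supported on the interval
# (here sigma uniform, as one absolutely continuous choice) stays below that value.
rng = np.random.default_rng(2)
sigma = rng.uniform(sig_lo, sig_hi, size=200_000)
eps = rng.standard_normal(200_000)
print(phi(sigma * eps).mean(), "<=", E_semi)
```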

Remark 3.11.3 (Why is it called a “semi” one?).

The essential reason is that the uncertainty set of distributions associated with the semi-GG-normal is smaller than the one of GG-normal. Let WG𝒩(0,[σ¯2,σ¯2])W^{G}\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) and W𝒩^(0,[σ¯2,σ¯2])W\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]). In fact, we have the following existing result: for any φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R})

[φ(WG)]maxv[σ¯,σ¯]𝔼[φ(vϵ)]=[φ(W)],\mathcal{E}[\varphi(W^{G})]\geq\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(v\epsilon)]=\mathcal{E}[\varphi(W)], (3.15)

which can be proved by applying the comparison theorem of parabolic partial differential equations (in Crandall et al., (1992)) to the associated GG-heat and classical heat equations with initial condition φ\varphi (the inequality becomes strict when φ\varphi is neither convex nor concave). For readers’ convenience, the result 3.15 is included in Section 2.5 in Peng, 2019b . Meanwhile, we have the representation of \mathcal{E} from a set 𝒫\mathcal{P} of probability measures,

[φ(X)]=sup𝒫𝔼[φ(X)]=supX𝒫X𝔼[φ(X)],\mathcal{E}[\varphi(X)]=\sup_{\mathbb{Q}\in\mathcal{P}}\mathbb{E}_{\mathbb{Q}}[\varphi(X)]=\sup_{\mathbb{Q}_{X}\in\mathcal{P}_{X}}\mathbb{E}_{\mathbb{Q}}[\varphi(X)],

where 𝒫X{X1,𝒫}\mathcal{P}_{X}\coloneqq\{\mathbb{Q}\circ X^{-1},\mathbb{Q}\in\mathcal{P}\} characterizes the distributional uncertainty of XX. Hence, 3.15 tells us 𝒫W𝒫WG\mathcal{P}_{W}\subset\mathcal{P}_{W^{G}}. A more explicit discussion of this distinction will be provided in 3.24.6.
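
The inequality 3.15 can also be checked numerically. The sketch below (our own illustration, not taken from the cited references; the domain truncation, grid sizes and test function are arbitrary) solves the GG-heat equation u_t = G(u_xx) with G(a) = (sig_hi^2 * a^+ − sig_lo^2 * a^−)/2 and initial condition φ\varphi by an explicit finite-difference scheme, so that the GG-normal value is approximately u(1, 0), and compares it with the semi-GG-normal value obtained from 3.10. For the chosen φ\varphi, which is neither convex nor concave, the GG-normal value is strictly larger.

```python
import numpy as np

sig_lo, sig_hi = 0.5, 1.5
phi = lambda x: np.cos(2 * x) + 0.5 * x ** 2   # a test function that is neither convex nor concave

# Semi-G-normal value (Theorem 3.10): max over constant volatilities of E_P[phi(v * eps)]
nodes, weights = np.polynomial.hermite_e.hermegauss(80)
normal_expect = lambda v: np.sum(weights * phi(v * nodes)) / np.sqrt(2 * np.pi)
E_semi = max(normal_expect(v) for v in np.linspace(sig_lo, sig_hi, 501))

# G-normal value: explicit finite differences for u_t = G(u_xx), u(0, .) = phi, where
# G(a) = 0.5 * (sig_hi**2 * a^+ - sig_lo**2 * a^-); then E[phi(W^G)] is approximately u(1, 0).
L, nx = 10.0, 1001                    # truncated spatial domain [-L, L]
x = np.linspace(-L, L, nx)
dx = x[1] - x[0]
dt = 0.4 * dx ** 2 / sig_hi ** 2      # well inside the explicit-scheme stability limit
nt = int(np.ceil(1.0 / dt))
dt = 1.0 / nt                         # land exactly on t = 1

u = phi(x)
for _ in range(nt):
    uxx = np.zeros_like(u)
    uxx[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx ** 2
    G = 0.5 * (sig_hi ** 2 * np.maximum(uxx, 0) - sig_lo ** 2 * np.maximum(-uxx, 0))
    u = u + dt * G                    # boundary values stay frozen at phi (crude truncation)

E_G = u[nx // 2]                      # approximate value of u at (t, x) = (1, 0)
print(f"G-normal: {E_G:.4f}   semi-G-normal: {E_semi:.4f}")   # 3.15 predicts the first >= the second
```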

Remark 3.11.4 (The distribution of ϵ\epsilon).

In principle, the distribution of ϵ\epsilon can be changed to any other type of classical distribution with a finite moment generating function, and all the related results (such as the representations) will still hold. We choose the standard normal because we are working on an intermediate structure between the normal and the GG-normal. Another reason comes from the following 3.12.

Proposition 3.12 (A special connection between semi-GG-normal and GG-normal distribution).

Let WG𝒩(0,[σ¯2,σ¯2])W^{G}\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) and W𝒩^(0,[σ¯2,σ¯2])W\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]). For φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}), when φ\varphi is convex or concave, we have

[φ(WG)]=[φ(W)]={𝔼[φ(N(0,σ¯2))]φ is convex𝔼[φ(N(0,σ¯2))]φ is concave..\mathcal{E}[\varphi(W^{G})]=\mathcal{E}[\varphi(W)]=\begin{cases}\mathbb{E}_{\mathbb{P}}[\varphi(N(0,\overline{\sigma}^{2}))]&\varphi\text{ is convex}\\ \mathbb{E}_{\mathbb{P}}[\varphi(N(0,\underline{\sigma}^{2}))]&\varphi\text{ is concave}.\end{cases}.

3.5 Multivariate semi-GG-normal distribution

The definition of semi-GG-normal distribution can be naturally extended to multi-dimensional situation. Intuitively speaking, the multivariate semi-GG-normal distribution can be treated as an analogue of the classical multivariate normal distribution which can be written as:

N(𝟎,𝚺)=𝚺1/2N(𝟎,𝐈d),N(\bm{0},\mathbf{\Sigma})=\mathbf{\Sigma}^{1/2}N(\bm{0},\mathbf{I}_{d}), (3.16)

where 𝐈d\mathbf{I}_{d} is a d×dd\times d identity matrix and 𝚺\mathbf{\Sigma} is the covariance matrix.

Let 𝕊d+\mathbb{S}_{d}^{+} denote the family of real-valued symmetric positive semi-definite d×dd\times d matrices. Consider a bounded, closed and convex subset 𝒞𝕊d+\mathcal{C}\subset\mathbb{S}^{+}_{d}. For any element 𝚺𝒞\mathbf{\Sigma}\in\mathcal{C}, it has a non-negative symmetric square root denoted as 𝚺1/2\mathbf{\Sigma}^{1/2}. Let 𝒱𝒞1/2\mathcal{V}\coloneqq\mathcal{C}^{1/2} which is the set of 𝚺1/2\mathbf{\Sigma}^{1/2} with 𝚺𝒞\mathbf{\Sigma}\in\mathcal{C}. Then we can treat 𝚺\mathbf{\Sigma} as the covariance matrix of a classical multivariate normal distribution due to 3.16 and 𝒞\mathcal{C} as a collection of covariance matrices. Note that 𝒱\mathcal{V} is still a bounded, closed and convex set. Then a matrix-valued maximal distribution (𝒱)\mathcal{M}(\mathcal{V}) can be directly extended from 3.3.

Definition 3.13 (Multivariate Semi-GG-normal distribution).

Let a bounded, closed and convex subset 𝒞𝕊d+\mathcal{C}\subset\mathbb{S}^{+}_{d} be the uncertainty set of covariance matrices and 𝒱𝒞1/2\mathcal{V}\coloneqq\mathcal{C}^{1/2}. In a sublinear expectation space, a dd-dimensional random vector 𝑾\bm{W} follows a (multivariate) semi-GG-normal distribution, denoted by 𝑾𝒩^(𝟎,𝒞)\bm{W}\sim\hat{\mathcal{N}}(\bm{0},\mathcal{C}), if there exists a (degenerate) GG-normal distributed dd-dimensional random vector

ϵN(𝟎,𝐈d):Ωd,\bm{\epsilon}\sim N(\bm{0},\mathbf{I}_{d}):\Omega\rightarrow\mathbb{R}^{d},

and a d×dd\times d-dimensional maximally distributed random matrix

𝐕(𝒱):Ωd×d,\mathbf{V}\sim\mathcal{M}(\mathcal{V}):\Omega\rightarrow\mathbb{R}^{d\times d},

with ϵ\bm{\epsilon} independent from 𝐕\mathbf{V} (expressed as 𝐕ϵ\mathbf{V}\dashrightarrow\bm{\epsilon}), such that

𝑾=𝐕ϵ,\bm{W}=\mathbf{V}\bm{\epsilon},

where the direction of independence here cannot be reversed.

Remark 3.13.1.

The existence of multivariate semi-GG-normal distribution comes from the same logic as 3.8.1 (by using the existence of the GG-distribution in a multivariate setup).

Remark 3.13.2.

Note that 𝐕\mathbf{V} in 3.13 is a random matrix. The relation 𝐕ϵ\mathbf{V}\dashrightarrow\bm{\epsilon} is defined by a multivariate version of 2.6.

Similar to the discussions in Section 3.2, we can extend the notion of the semi-GG-normal distribution and its representation to the multivariate situation.

Theorem 3.14.

(Representation of multivariate semi-GG-normal distribution) Consider the random vector 𝐖\bm{W} in 3.13. For any φCl.Lip(d)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{d}), we have [|φ(𝐖)|]<\mathcal{E}[\lvert\varphi(\bm{W})\rvert]<\infty and

[φ(𝑾)]=sup𝚺1/2𝒜(𝒱)𝔼[φ(𝚺1/2ϵ)],\mathcal{E}[\varphi(\bm{W})]=\sup_{\mathbf{\Sigma}^{1/2}\in\mathcal{A}(\mathcal{V})}\mathbb{E}[\varphi(\mathbf{\Sigma}^{1/2}\bm{\epsilon})], (3.17)

where 𝒜\mathcal{A} can be chosen from {𝒟,𝒟disc.,𝒟cont.,𝒟deg.}\{\mathcal{D},\mathcal{D}_{\textbf{disc.}},\mathcal{D}_{\textbf{cont.}},\mathcal{D}_{\textbf{deg.}}\} and sup\sup can be changed to max\max except when 𝒜=𝒟cont.\mathcal{A}=\mathcal{D}_{\textbf{cont.}}.

Proof of 3.14.

The logic of this proof is exactly the same as that of 3.10, where we apply the representation of the maximal distribution (𝒱)\mathcal{M}(\mathcal{V}), which can easily be checked to have the same form as 3.4. ∎

Remark 3.14.1.

3.14 means that there are several ways to interpret the distributional uncertainty of multivariate semi-GG-normal 𝒩^(𝟎,𝒞)\hat{\mathcal{N}}(\bm{0},\mathcal{C}):

  • it can be described as a collection of N(0,𝚺)N(0,\mathbf{\Sigma}) with constant covariance matrix 𝚺𝒞\mathbf{\Sigma}\in\mathcal{C};

  • it can be described as a collection of classical multivariate normal mixture distributions with (discretely, absolutely continuously, arbitrarily) distributed random covariance matrices (as a latent scaling variable) ranging in 𝒞\mathcal{C}.

By using 3.14, we can conveniently study the covariance uncertainty between the marginals of 𝑾\bm{W}. First, we can define the upper and lower covariances between the marginals of 𝑾=(W1,W2,,Wd)\bm{W}=(W_{1},W_{2},\dotsc,W_{d}) as (note that WiW_{i} has certain mean zero)

γ¯(i,j)[WiWj],\overline{\gamma}(i,j)\coloneqq\mathcal{E}[W_{i}W_{j}],

and

γ¯(i,j)[WiWj].\underline{\gamma}(i,j)\coloneqq-\mathcal{E}[-W_{i}W_{j}].

Then these two quantities turn out to be closely related to 𝒞\mathcal{C}, as illustrated below.

Proposition 3.15 (Upper and lower covariance between semi-GG-normal marginals).

For each 𝚺𝒞\mathbf{\Sigma}\in\mathcal{C}, let Σij\Sigma_{ij} denote the (i,j)(i,j)-th entry of 𝚺\mathbf{\Sigma}, and

[Σ¯ij,Σ¯ij][minΣ𝒞Σij,maxΣ𝒞Σij].[\underline{\Sigma}_{ij},\overline{\Sigma}_{ij}]\coloneqq[\min_{\Sigma\in\mathcal{C}}\Sigma_{ij},\max_{\Sigma\in\mathcal{C}}\Sigma_{ij}].

Then we have

γ¯(i,j)=Σ¯ij and γ¯(i,j)=Σ¯ij.\underline{\gamma}(i,j)=\underline{\Sigma}_{ij}\text{ and }\overline{\gamma}(i,j)=\overline{\Sigma}_{ij}.

In particular, we have

σ¯i2[Wi2]=𝚺¯ii and σ¯i2[Wi2]=𝚺¯ii.\overline{\sigma}_{i}^{2}\coloneqq\mathcal{E}[W_{i}^{2}]=\overline{\mathbf{\Sigma}}_{ii}\text{ and }\underline{\sigma}_{i}^{2}\coloneqq-\mathcal{E}[-W_{i}^{2}]=\underline{\mathbf{\Sigma}}_{ii}.
Proof.

For each (i,j){1,2,,d}2(i,j)\in\{1,2,\dotsc,d\}^{2}, let fij(𝑾)=WiWjf_{ij}(\bm{W})=W_{i}W_{j}. Then it is obvious that fijCl.Lip(d)f_{ij}\in C_{\mathrm{l.Lip}}(\mathbb{R}^{d}). For each Σ\Sigma, let

(Y1,Y2,,Yd)𝚺1/2ϵ.(Y_{1},Y_{2},\dotsc,Y_{d})\coloneqq\mathbf{\Sigma}^{1/2}\bm{\epsilon}.

Then by applying 3.14,

γ¯(i,j)=[WiWj]\displaystyle\overline{\gamma}(i,j)=\mathcal{E}[W_{i}W_{j}] =maxΣ1/2𝒱𝔼[fij(𝚺1/2ϵ)]\displaystyle=\max_{\Sigma^{1/2}\in\mathcal{V}}\mathbb{E}[f_{ij}(\mathbf{\Sigma}^{1/2}\bm{\epsilon})]
=maxΣ1/2𝒱𝔼[YiYj]\displaystyle=\max_{\Sigma^{1/2}\in\mathcal{V}}\mathbb{E}[Y_{i}Y_{j}]
=maxΣ1/2𝒱Σij=Σ¯ij.\displaystyle=\max_{\Sigma^{1/2}\in\mathcal{V}}\Sigma_{ij}=\overline{\Sigma}_{ij}.

Similarly we can show γ¯(i,j)=Σ¯ij\underline{\gamma}(i,j)=\underline{\Sigma}_{ij}. ∎
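
For a concrete (and hypothetical) example of 3.15, the sketch below takes d = 2 and lets 𝒞\mathcal{C} be the covariance matrices with unit variances and correlation ranging over [−0.3, 0.6]; the upper and lower covariances are then simply the largest and smallest off-diagonal entries over 𝒞\mathcal{C}, which is cross-checked by Monte Carlo through the representation in 3.14. The set, the correlation range and the sample size are arbitrary choices of ours.

```python
import numpy as np

# A hypothetical uncertainty set C: 2x2 covariance matrices with unit variances
# and correlation rho ranging over [-0.3, 0.6].
rhos = np.linspace(-0.3, 0.6, 1001)
Sigmas = np.array([[[1.0, r], [r, 1.0]] for r in rhos])

# Proposition 3.15: the upper/lower covariances are the extreme off-diagonal entries over C.
upper_cov = Sigmas[:, 0, 1].max()     # = 0.6
lower_cov = Sigmas[:, 0, 1].min()     # = -0.3

# Monte Carlo cross-check through the representation in 3.14 (classical law N(0, Sigma)):
rng = np.random.default_rng(3)
eps = rng.standard_normal((200_000, 2))
mc = []
for S in Sigmas[::100]:
    root = np.linalg.cholesky(S)      # any square root of Sigma gives the same classical law
    Y = eps @ root.T
    mc.append((Y[:, 0] * Y[:, 1]).mean())
print(upper_cov, max(mc))             # both approximately 0.6
print(lower_cov, min(mc))             # both approximately -0.3
```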

3.6 Three types of independence related to semi-GG-normal distribution

Besides the existing GG-version independence (also called sequential independence) in 2.8, this substructure of the semi-GG-normal distribution also makes it possible to study finer structures of independence in this framework; interestingly, we will show in Section 3.7 that each type of independence is related to a family of state-space volatility models.

We will introduce three types of independence regarding semi-GG-normal distributions. Readers may recall the notation \dashrightarrow for the independence of a sequence (2.8). Throughout this section, we assume Wi=Viϵi=d𝒩^(0,[σ¯2,σ¯2])W_{i}=V_{i}\epsilon_{i}\overset{\text{d}}{=}\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) for i=1,2,,ni=1,2,\dotsc,n, which is a sequence of semi-GG-normally distributed random variables, so that accordingly Vi=d[σ¯,σ¯]V_{i}\overset{\text{d}}{=}\mathcal{M}{[\underline{\sigma},\overline{\sigma}]} and ϵi=d𝒩(0,[1,1])\epsilon_{i}\overset{\text{d}}{=}\mathcal{N}(0,[1,1]). Let

[σ¯2,σ¯2]𝐈n{𝚺=diag(σ12,σ22,,σn2):σi2[σ¯2,σ¯2],i=1,2,,n}.[\underline{\sigma}^{2},\overline{\sigma}^{2}]\mathbf{I}_{n}\coloneqq\{\mathbf{\Sigma}=\operatorname{diag}(\sigma^{2}_{1},\sigma^{2}_{2},\dotsc,\sigma^{2}_{n}):\sigma^{2}_{i}\in[\underline{\sigma}^{2},\overline{\sigma}^{2}],i=1,2,\dotsc,n\}.

The assumption that the variance intervals are identical is not essential, and the results in this section can be easily generalized to the case Wi𝒩^(0,[σ¯i2,σ¯i2]),i=1,2,,nW_{i}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}_{i}^{2},\overline{\sigma}_{i}^{2}]),i=1,2,\dotsc,n.

Definition 3.16.

For a sequence of semi-GG-normal distributed random variables {Wi}i=1n(={Viϵi}i=1n)\{W_{i}\}_{i=1}^{n}(=\{V_{i}\epsilon_{i}\}_{i=1}^{n}), we have three types of independence:

  1. 1.

    {Wi}i=1n\{W_{i}\}_{i=1}^{n} are semi-sequentially independent (denoted as W1SW2SSWnW_{1}\overset{\text{S}}{\dashrightarrow}W_{2}\overset{\text{S}}{\dashrightarrow}\dotsc\overset{\text{S}}{\dashrightarrow}W_{n}) if :

    V1V2Vnϵ1ϵ2ϵn;V_{1}\dashrightarrow V_{2}\dashrightarrow\dotsc\dashrightarrow V_{n}\dashrightarrow\epsilon_{1}\dashrightarrow\epsilon_{2}\dashrightarrow\dotsc\dashrightarrow\epsilon_{n}; (3.18)
  2. 2.

    {Wi}i=1n\{W_{i}\}_{i=1}^{n} are sequentially independent (denoted as W1W2WnW_{1}\dashrightarrow W_{2}\dashrightarrow\dotsc\dashrightarrow W_{n}) if:

    V1ϵ1V2ϵ2Vnϵn;V_{1}\epsilon_{1}\dashrightarrow V_{2}\epsilon_{2}\dashrightarrow\dotsc\dashrightarrow V_{n}\epsilon_{n}; (3.19)
  3. 3.

    {Wi}i=1n\{W_{i}\}_{i=1}^{n} are fully-sequentially independent (denoted as W1FW2FFWnW_{1}\overset{\text{F}}{\dashrightarrow}W_{2}\overset{\text{F}}{\dashrightarrow}\dotsc\overset{\text{F}}{\dashrightarrow}W_{n}) if:

    V1ϵ1V2ϵ2Vnϵn.V_{1}\dashrightarrow\epsilon_{1}\dashrightarrow V_{2}\dashrightarrow\epsilon_{2}\dashrightarrow\dotsc\dashrightarrow V_{n}\dashrightarrow\epsilon_{n}. (3.20)
Remark 3.16.1 (Compatibility with the definition of semi-GG-normal).

The requirement of independence to form the semi-GG-normal distribution is simply ViϵiV_{i}\dashrightarrow\epsilon_{i}, which is guaranteed by all three types of independence by 2.19. Furthermore, for two semi-GG-normal objects W=VϵW=V\epsilon and W¯=V¯ϵ¯\bar{W}=\bar{V}\bar{\epsilon}, we can see that WFW¯W\overset{\text{F}}{\dashrightarrow}\bar{W} implies

(V,ϵ)(V¯,ϵ¯),(V,\epsilon)\dashrightarrow(\bar{V},\bar{\epsilon}),

which further indicates WW¯.W\dashrightarrow\bar{W}. However, WFW¯W\overset{\text{F}}{\dashrightarrow}\bar{W} (or WW¯W\dashrightarrow\bar{W}) does not imply WSW¯W\overset{\text{S}}{\dashrightarrow}\bar{W} since the latter actually reverses the order of independence between ϵ\epsilon and V¯\bar{V} in the former.

Remark 3.16.2.

(Existence of these types of independence) It comes from the same logic used in 3.8.1 due to the existence of nn sequentially independent GG-distributed random vectors.

Theorem 3.17.

The fully-sequential independence of {Wi}i=1n\{W_{i}\}_{i=1}^{n} can be equivalently defined as:

  1. (F1)

    The pairs (Vi,ϵi)(V_{i},\epsilon_{i}) are sequentially independent: (V1,ϵ1)(V2,ϵ2)(Vn,ϵn).(V_{1},\epsilon_{1})\dashrightarrow(V_{2},\epsilon_{2})\dashrightarrow\cdots\dashrightarrow(V_{n},\epsilon_{n}).

  2. (F2)

    The elements within each pair (Vi,ϵi)(V_{i},\epsilon_{i}) satisfy ViϵiV_{i}\dashrightarrow\epsilon_{i} with i=1,2,,ni=1,2,\dotsc,n.

Remark 3.17.1.

We add the condition (F2) only to stress the intrinsic requirement on independence from the definition of semi-GG-normal. The main requirement of fully-sequential independence is (F1). It is also the reason why F\overset{\text{F}}{\dashrightarrow} is stronger than \dashrightarrow because the latter only involves the product VϵV\epsilon but the former is about the joint vector (V,ϵ)(V,\epsilon).

The fully-sequential independence is a stronger version of sequential independence, and it does not exhibit much difference from sequential independence in our current scope of discussion (which will be illustrated by 3.24).

Hence, the key new type of independence here is the semi-sequential independence, which is different from the sequential independence and also leads to a different joint distribution of (W1,W2,,Wn)(W_{1},W_{2},\dotsc,W_{n}). We will study the properties and behaviours of the semi-GG-normal under semi-sequential independence. Under this kind of independence, some of the intuitive properties we have in the classical situation are preserved. First of all, it is actually a symmetric independence among objects with distributional uncertainty (3.19). This symmetry makes it different from the sequential independence, although S\overset{\text{S}}{\dashrightarrow} is defined through \dashrightarrow. Moreover, the joint vector of nn semi-sequentially independent semi-GG-normal random variables follows a multivariate semi-GG-normal. It thus provides a view on how to connect univariate and multivariate objects (under distributional uncertainty), which is a non-trivial task for the GG-normal distribution. It further provides a path starting from the univariate classical normal to approach the multivariate GG-normal (by using the multivariate semi-GG-normal as a middle stage). This idea will be further illustrated in Section 4.2.

We call it “semi-sequential” independence because the only “sequential” requirement in the independence is (V1,V2,,Vn)(ϵ1,ϵ2,,ϵn)(V_{1},V_{2},\dotsc,V_{n})\dashrightarrow(\epsilon_{1},\epsilon_{2},\dotsc,\epsilon_{n}), while the sequential order within each vector is inessential in the sense that it can be arbitrarily switched. 3.18 elaborates this point by giving an equivalent definition.

Theorem 3.18.

The semi-sequential independence of {Wi}i=1n\{W_{i}\}_{i=1}^{n} can be equivalently defined as:

  1. (S1)

    The ϵ\epsilon part is independent from the VV part: (V1,V2,,Vn)(ϵ1,ϵ2,,ϵn)(V_{1},V_{2},\dotsc,V_{n})\dashrightarrow(\epsilon_{1},\epsilon_{2},\dotsc,\epsilon_{n}),

  2. (S2)

    The elements in the VV part are sequentially independent: V1V2Vn{V}_{1}\dashrightarrow{V}_{2}\dashrightarrow\cdots\dashrightarrow{V}_{n},

  3. (S3)

    The elements in the ϵ\epsilon part are classically independent.

Remark 3.18.1.

The order of independence within the VV part in (S2) is inessential in the sense that it can be arbitrarily switched by 3.6. Meanwhile, the order in the ϵ\epsilon part can also be switched due to the classical independence. Hence, this equivalent definition of semi-sequential independence indicates some intrinsic symmetry of this relation, coming from the only two categories of distributions (maximal and classical) that allow mutual independence. This point will be elaborated in the discussion of 3.19 and further formalized in 3.22.

To show the idea of the symmetry of S\overset{\text{S}}{\dashrightarrow}, we start from the simple case with n=2n=2 and include a short proof for readers to grasp the intuition. The proofs of the other results in this section are given in Section 6.3.

Proposition 3.19 (Symmetry in semi-sequential independence).

The following statements are equivalent:

  1. (1)

    W1SW2W_{1}\overset{\text{S}}{\dashrightarrow}W_{2},

  2. (2)

    W2SW1W_{2}\overset{\text{S}}{\dashrightarrow}W_{1},

  3. (3)

    (W1,W2)𝒩^(𝟎,[σ¯2,σ¯2]𝐈2)(W_{1},W_{2})\sim\hat{\mathcal{N}}(\bm{0},[\underline{\sigma}^{2},\overline{\sigma}^{2}]\mathbf{I}_{2}).

The proof of 3.19 relies on the following 3.20, which is a direct consequence of 2.24 but we still include a separate proof from scratch to show the idea.

Lemma 3.20.

The following two statements are equivalent:

  1. (1)

    V1V2ϵ1ϵ2V_{1}\dashrightarrow V_{2}\dashrightarrow\epsilon_{1}\dashrightarrow\epsilon_{2},

  2. (2)

    (V1,V2)(ϵ1,ϵ2)(V_{1},V_{2})\dashrightarrow(\epsilon_{1},\epsilon_{2}), V1V2V_{1}\dashrightarrow V_{2}, ϵ1ϵ2\epsilon_{1}\dashrightarrow\epsilon_{2}.

Proof of 3.20.

We can directly see (1)(2)(1)\implies(2) because the independence of a sequence implies the independence among non-overlapping subvectors as long as the original order is kept (2.23).

(2)(1)(2)\implies(1). The relation (1)(1) is equivalent to,

  1. 1.

    V1V2V_{1}\dashrightarrow V_{2},

  2. 2.

    (V1,V2)ϵ1(V_{1},V_{2})\dashrightarrow\epsilon_{1},

  3. 3.

    (V1,V2,ϵ1)ϵ2(V_{1},V_{2},\epsilon_{1})\dashrightarrow\epsilon_{2}.

The first two are directly implied by (2). For a fixed scalar vector (v1,v2,e1)(v_{1},v_{2},e_{1}), let H(v1,v2,e1)[φ(v1,v2,e1,ϵ2)].H(v_{1},v_{2},e_{1})\coloneqq\mathcal{E}[\varphi(v_{1},v_{2},e_{1},\epsilon_{2})]. Then the third one is equivalent to

[φ(V1,V2,ϵ1,ϵ2)]=[H(V1,V2,ϵ1)].\mathcal{E}[\varphi(V_{1},V_{2},\epsilon_{1},\epsilon_{2})]=\mathcal{E}[H(V_{1},V_{2},\epsilon_{1})].

In fact, since (V1,V2)(ϵ1,ϵ2)(V_{1},V_{2})\dashrightarrow(\epsilon_{1},\epsilon_{2}), we have

[φ(V1,V2,ϵ1,ϵ2)]\displaystyle\mathcal{E}[\varphi(V_{1},V_{2},\epsilon_{1},\epsilon_{2})] =[[φ(v1,v2,ϵ1,ϵ2)]vi=Vi,i=1,2]\displaystyle=\mathcal{E}[\mathcal{E}[\varphi(v_{1},v_{2},\epsilon_{1},\epsilon_{2})]_{v_{i}=V_{i},i=1,2}]
=(a)[[[φ(v1,v2,e1,ϵ2)]e1=ϵ1]vi=Vi,i=1,2]\displaystyle\overset{(a)}{=}\mathcal{E}[\mathcal{E}[\mathcal{E}[\varphi(v_{1},v_{2},e_{1},\epsilon_{2})]_{e_{1}=\epsilon_{1}}]_{v_{i}=V_{i},i=1,2}]
=[[H(v1,v2,ϵ1)]vi=Vi,i=1,2]\displaystyle=\mathcal{E}[\mathcal{E}[H(v_{1},v_{2},\epsilon_{1})]_{v_{i}=V_{i},i=1,2}]
=(b)[H(V1,V2,ϵ1)],\displaystyle\overset{(b)}{=}\mathcal{E}[H(V_{1},V_{2},\epsilon_{1})],

where (a)(a) comes from the independence ϵ1ϵ2\epsilon_{1}\dashrightarrow\epsilon_{2} and (b)(b) is due to the relation (V1,V2)ϵ1(V_{1},V_{2})\dashrightarrow\epsilon_{1}. ∎

Proof of 3.19.

The equivalence of the three statements will be proved by this logic: (1)(2)(3)(1)(1)\implies(2)\implies(3)\implies(1).

(1)(2)(1)\implies(2). By 3.20, (1) indicates

(V1,V2)(ϵ1,ϵ2),V1V2,ϵ1ϵ2.(V_{1},V_{2})\dashrightarrow(\epsilon_{1},\epsilon_{2}),V_{1}\dashrightarrow V_{2},\epsilon_{1}\dashrightarrow\epsilon_{2}. (3.21)

In 3.21, the roles in the VV part are symmetric and so are those in the ϵ\epsilon part (due to 2.26). Then 3.21 is equivalent to

(V2,V1)(ϵ2,ϵ1),V2V1,ϵ2ϵ1,(V_{2},V_{1})\dashrightarrow(\epsilon_{2},\epsilon_{1}),V_{2}\dashrightarrow V_{1},\epsilon_{2}\dashrightarrow\epsilon_{1}, (3.22)

which in turn implies W2SW1W_{2}\overset{\text{S}}{\dashrightarrow}W_{1} by 3.20.

(2)(3)(2)\implies(3) Let 𝑾(W1,W2)\bm{W}\coloneqq(W_{1},W_{2}). Then

𝑾=(V1ϵ1,V2ϵ2)=𝐕ϵ,\bm{W}=(V_{1}\epsilon_{1},V_{2}\epsilon_{2})=\mathbf{V}\bm{\epsilon},

where 𝐕=diag(V1,V2)\mathbf{V}=\operatorname{diag}(V_{1},V_{2}) and ϵ=(ϵ1,ϵ2)\bm{\epsilon}=(\epsilon_{1},\epsilon_{2}). Under the independence (2)(2), we have 3.22 by 3.20, which further implies 𝐕ϵ.\mathbf{V}\dashrightarrow\bm{\epsilon}. We also have (V1,V2)([σ¯,σ¯]2)(V_{1},V_{2})\sim\mathcal{M}({[\underline{\sigma},\overline{\sigma}]}^{2}) from V2V1V_{2}\dashrightarrow V_{1} (3.6), so 𝐕([σ¯,σ¯]𝐈2)\mathbf{V}\sim\mathcal{M}({[\underline{\sigma},\overline{\sigma}]}\mathbf{I}_{2}). Meanwhile, ϵ2ϵ1\epsilon_{2}\dashrightarrow\epsilon_{1} means they are actually classically independent with the joint distribution ϵN(𝟎,𝐈2)\bm{\epsilon}\sim N(\bm{0},\mathbf{I}_{2}) because the distribution of ϵi\epsilon_{i} is classical. Therefore, by 3.13, 𝑾=𝐕ϵ𝒩^(𝟎,[σ¯2,σ¯2]𝐈2).\bm{W}=\mathbf{V}\bm{\epsilon}\sim\hat{\mathcal{N}}(\bm{0},[\underline{\sigma}^{2},\overline{\sigma}^{2}]\mathbf{I}_{2}).

(3)(1).(3)\implies(1). First, from the definition of 𝑾=(W1,W2)𝒩^(𝟎,[σ¯2,σ¯2]𝐈2)\bm{W}=(W_{1},W_{2})\sim\hat{\mathcal{N}}(\bm{0},[\underline{\sigma}^{2},\overline{\sigma}^{2}]\mathbf{I}_{2}), there exist 𝐕=diag(V1,V2)([σ¯,σ¯]𝐈2),\mathbf{V}=\operatorname{diag}(V_{1},V_{2})\sim\mathcal{M}({[\underline{\sigma},\overline{\sigma}]}\mathbf{I}_{2}), and ϵ=(ϵ1,ϵ2)N(0,𝐈2),\bm{\epsilon}=(\epsilon_{1},\epsilon_{2})\sim N(0,\mathbf{I}_{2}), with independence

𝐕ϵ,\mathbf{V}\dashrightarrow\bm{\epsilon}, (3.23)

such that 𝑾=𝐕ϵ.\bm{W}=\mathbf{V}\bm{\epsilon}. In other words, (W1,W2)=(V1ϵ1,V2ϵ2).(W_{1},W_{2})=(V_{1}\epsilon_{1},V_{2}\epsilon_{2}). From their joint distribution, we directly see that ϵ1\epsilon_{1} and ϵ2\epsilon_{2} are classically independent. Next we study the independence within the VV part. Similarly, we can derive the joint distribution (V1,V2)([σ¯,σ¯]2)(V_{1},V_{2})\sim\mathcal{M}({[\underline{\sigma},\overline{\sigma}]}^{2}) from the distribution of 𝐕\mathbf{V}:

[φ(V1,V2)]\displaystyle\mathcal{E}[\varphi(V_{1},V_{2})] =[φ((1,1)𝐕)]\displaystyle=\mathcal{E}[\varphi((1,1)\mathbf{V})]
=maxB[σ¯,σ¯]𝐈2φ((1,1)B)\displaystyle=\max_{B\in{[\underline{\sigma},\overline{\sigma}]}\mathbf{I}_{2}}\varphi((1,1)B)
=max(v1,v2)[σ¯,σ¯]2φ(v1,v2).\displaystyle=\max_{(v_{1},v_{2})\in{[\underline{\sigma},\overline{\sigma}]}^{2}}\varphi(v_{1},v_{2}).

By 3.6, we have V1V2V_{1}\dashrightarrow V_{2} (also vice versa). Note that 3.23 implies (V1,V2)(ϵ1,ϵ2).(V_{1},V_{2})\dashrightarrow(\epsilon_{1},\epsilon_{2}). Hence we have W1SW2W_{1}\overset{\text{S}}{\dashrightarrow}W_{2} by 3.20. ∎

Proposition 3.21 (Zero sublinear covariance implies semi-sequential independence).

If (W1,W2)(W_{1},W_{2}) follows a bivariate semi-GG-normal and they have certain zero covariance:

[W1W2]=[W1W2]=0,\mathcal{E}[W_{1}W_{2}]=-\mathcal{E}[-W_{1}W_{2}]=0,

then we have W1SW2W_{1}\overset{\text{S}}{\dashrightarrow}W_{2} (and vice versa).

Proof.

It is a direct result of 3.19 and 3.15. ∎

3.21 and 3.19 seem like natural results for a “normal” object in the multivariate case, but this is the first time such connections have been established within the GG-expectation framework, because the GG-normal distribution does not have these properties in the multivariate case. For instance, consider 𝑿=(X1,X2)\bm{X}=(X_{1},X_{2}) with Xi𝒩(0,[σ¯2,σ¯2])X_{i}\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]). On the one hand, given the independence X1X2X_{1}\dashrightarrow X_{2}, 𝑿\bm{X} does not follow a bivariate GG-normal, nor does 𝐀𝑿\mathbf{A}\bm{X} under any invertible transformation 𝐀\mathbf{A}. On the other hand, if 𝑿\bm{X} follows a bivariate GG-normal 𝒩(𝟎,[σ¯2,σ¯2]𝐈2)\mathcal{N}(\bm{0},[\underline{\sigma}^{2},\overline{\sigma}^{2}]\mathbf{I}_{2}), we have neither X1X2X_{1}\dashrightarrow X_{2} nor X2X1X_{2}\dashrightarrow X_{1}. These kinds of strange properties bring barriers to the understanding of the GG-normal in multivariate situations, especially regarding the connection between univariate and multivariate objects. More details of this concern can be found in Bayraktar and Munk, (2015). Fortunately, the substructure of the semi-GG-normal provides some insights to reveal this connection.

3.19 can be extended to nn random variables.

Theorem 3.22.

The following three statements are equivalent:

  • (1)

    W1SW2SSWn{W}_{1}\overset{\text{S}}{\dashrightarrow}{W}_{2}\overset{\text{S}}{\dashrightarrow}\cdots\overset{\text{S}}{\dashrightarrow}{W}_{n},

  • (2)

    Wk1SWk2SSWknW_{k_{1}}\overset{\text{S}}{\dashrightarrow}W_{k_{2}}\overset{\text{S}}{\dashrightarrow}\cdots\overset{\text{S}}{\dashrightarrow}W_{k_{n}} for any permutation {kj}j=1n\{k_{j}\}_{j=1}^{n} of {1,2,,n}\{1,2,\dotsc,n\},

  • (3)

    (W1,W2,,Wn)𝒩^(𝟎,[σ¯2,σ¯2]𝐈n)(W_{1},W_{2},\dotsc,W_{n})\sim\hat{\mathcal{N}}(\bm{0},[\underline{\sigma}^{2},\overline{\sigma}^{2}]\mathbf{I}_{n}).

Remark 3.22.1.

3.22 shows that the semi-GG-normal under semi-sequential independence has symmetry and compatibility with the multivariate case. The underlying reason is that it takes advantage of the (only) two families of distributions that allow both properties: the classical normal and the maximal distribution. For the classical normal, we know that a bivariate normal with a diagonal covariance matrix is equivalent to the (symmetric) independence between its components. For the maximal distribution, the results are provided in 3.6.

We end this section by showing the stability of the semi-GG-normal distribution under semi-sequential independence, which indicates that more analogous generalizations of results on the classical normal can be discussed here.

Proposition 3.23.

For any W¯\bar{W} satisfying W¯=dW\bar{W}\overset{\text{d}}{=}W and WSW¯W\overset{\text{S}}{\dashrightarrow}\bar{W}, we have

W+W¯=d2W.W+\bar{W}\overset{\text{d}}{=}\sqrt{2}W.
Proof.

With W=VϵW=V\epsilon and W¯=V¯ϵ¯\bar{W}=\bar{V}\bar{\epsilon}, semi-sequential independence means:

VV¯ϵϵ¯.V\dashrightarrow\bar{V}\dashrightarrow\epsilon\dashrightarrow\bar{\epsilon}.

For any φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}), first recall that φϵ(v)[φ(vϵ)]\varphi_{\epsilon}(v)\coloneqq\mathcal{E}[\varphi(v\epsilon)] is in Cl.Lip()C_{\mathrm{l.Lip}}(\mathbb{R}) (3.11). On the one hand,

\mathcal{E}[\varphi(V\epsilon+\bar{V}\bar{\epsilon})]
=\mathcal{E}\bigl[\mathcal{E}[\varphi(v\epsilon+\bar{v}\bar{\epsilon})]_{v=V,\,\bar{v}=\bar{V}}\bigr]
=\mathcal{E}\Bigl[\varphi_{\epsilon}\Bigl(\sqrt{V^{2}+\bar{V}^{2}}\Bigr)\Bigr]
=\mathcal{E}\Bigl[\mathcal{E}\bigl[\varphi_{\epsilon}\bigl(\sqrt{v^{2}+\bar{V}^{2}}\bigr)\bigr]_{v=V}\Bigr]
=\max_{x\in{[\underline{\sigma},\overline{\sigma}]}}\max_{y\in{[\underline{\sigma},\overline{\sigma}]}}\varphi_{\epsilon}\bigl(\sqrt{x^{2}+y^{2}}\bigr),

where we use the fact that VV¯V\dashrightarrow\bar{V}. On the other hand,

[φ(2W)]=maxx[σ¯,σ¯]𝔼[φ(2xϵ)]=maxx[σ¯,σ¯]φϵ(2x).\mathcal{E}[\varphi(\sqrt{2}W)]=\max_{x\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sqrt{2}x\epsilon)]=\max_{x\in{[\underline{\sigma},\overline{\sigma}]}}\varphi_{\epsilon}(\sqrt{2}x).

Since

{x2+y2;(x,y)[σ¯,σ¯]2}=[2σ¯,2σ¯]={2x;x[σ¯,σ¯]},\{\sqrt{x^{2}+y^{2}};\,(x,y)\in{[\underline{\sigma},\overline{\sigma}]}^{2}\}=[\sqrt{2}\underline{\sigma},\,\sqrt{2}\overline{\sigma}]=\{\sqrt{2}x;\,x\in{[\underline{\sigma},\overline{\sigma}]}\},

we have [φ(W+W¯)]=[φ(2W)]\mathcal{E}[\varphi(W+\bar{W})]=\mathcal{E}[\varphi(\sqrt{2}W)] for all φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}). ∎
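As a quick numerical sanity check of this stability property (our own illustration, not part of the original argument), one can compare both sides of the identity using the semi-sequential representation from the proof: each side is a classical expectation maximized over constant volatilities taken from a coarse grid on $[\underline{\sigma},\overline{\sigma}]$; the test function, grid size and sample size below are illustrative choices.

```python
# Numerical check of E[phi(W + W_bar)] = E[phi(sqrt(2) W)] under semi-sequential
# independence: both sides are represented as maxima of classical expectations over
# constant volatilities (a sketch with an illustrative test function).
import numpy as np

rng = np.random.default_rng(0)
s_low, s_high = 0.5, 1.0                      # hypothetical volatility bounds
eps, eps_bar = rng.standard_normal(500_000), rng.standard_normal(500_000)
phi = np.abs                                  # illustrative test function |x|

grid = np.linspace(s_low, s_high, 21)
lhs = max(np.mean(phi(x * eps + y * eps_bar)) for x in grid for y in grid)
rhs = max(np.mean(phi(np.sqrt(2) * x * eps)) for x in grid)
print(lhs, rhs)                               # agree up to Monte Carlo error
```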

We will further investigate the connection and distinction between these types of independence in Section 3.7 by studying their representations.

3.7 Representations under three types of independence

Let us come back to the story setup in Section 3.1 to introduce our results to general audience. Suppose we intend to study the dynamic of the whole observation process (which is the observable data sequence)

(Y1,Y2,,Yn)=(σ1ϵ1,σ2ϵ2,,σnϵn).(Y_{1},Y_{2},\dots,Y_{n})=(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2},\dotsc,\sigma_{n}\epsilon_{n}).

Depending on the background information or knowledge (or the lack of reliable knowledge on the data pattern and underlying dynamic), we may still have uncertainty on the distribution or dynamic of $\bm{Y}$. Especially in the early stage of data analysis, it is usually required to specify a model structure and search for the optimal one within a family of such structures. However, at this stage, how to select or distinguish the family of models is an important and non-trivial task in statistical modeling. Suppose we assume that the underlying $\bm{\sigma}$ process belongs to a family $\mathcal{A}_{n}{[\underline{\sigma},\overline{\sigma}]}$, but some patterns of the data sequence, which can generally be quantified by $\mathbb{E}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2},\dotsc,\sigma_{n}\epsilon_{n})]$ for a test function $\varphi$, seem to exceed even the extreme cases in $\mathcal{A}_{n}{[\underline{\sigma},\overline{\sigma}]}$; we may then tend to reject the hypothesis that $\bm{\sigma}\in\mathcal{A}_{n}{[\underline{\sigma},\overline{\sigma}]}$. In such a situation, we usually need to work with the maximum expected value under the uncertainty $\bm{\sigma}\in\mathcal{A}_{n}{[\underline{\sigma},\overline{\sigma}]}$:

sup𝝈𝒜n[σ¯,σ¯]𝔼[φ(σ1ϵ1,σ2ϵ2,,σnϵn)].\sup_{\bm{\sigma}\in\mathcal{A}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2},\dotsc,\sigma_{n}\epsilon_{n})]. (3.24)

However, in principle, $\mathcal{A}_{n}{[\underline{\sigma},\overline{\sigma}]}$ might be an infinite-dimensional family of non-parametric (or semi-parametric) dynamics (due to the lack of information on the underlying dynamic). In our current context, the possible choices of $\mathcal{A}_{n}{[\underline{\sigma},\overline{\sigma}]}$ include:

  • $\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}\coloneqq\{\bm{\sigma}:\sigma_{t}\text{ is }\mathcal{G}_{t}\text{-measurable},\ \bm{\sigma}_{(n)}\perp\!\!\!\perp\bm{\epsilon}_{(n)}\}$. As illustrated by Figure 3.1, it includes independent mixture models and a typical class of hidden Markov models without a feedback process. (In Figure 3.1, we omit the edge from $\sigma_{1}$ to $\sigma_{3}$ only for graphical simplicity.)

  • $\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}\coloneqq\{\bm{\sigma}:\sigma_{t}\text{ is }\mathcal{Y}_{t-1}\text{-measurable}\}$. As illustrated by Figure 3.2, it includes those state-space models in which the future state variable depends only on the historical observations.

  • $\mathcal{L}^{*}_{n}{[\underline{\sigma},\overline{\sigma}]}\coloneqq\{\bm{\sigma}:\sigma_{t}\text{ is }\mathcal{F}_{t}\text{-measurable},\ (\sigma_{t}\perp\!\!\!\perp\epsilon_{t})\,|\,\mathcal{F}_{t-1}\}$. As illustrated by Figure 3.3, it contains a class of hidden Markov models with a feedback process: the future state variable depends on both the previous states and the observations. In Figures 3.2 and 3.3, the dashed arrows indicate possible feedback effects.

Note that

𝒮n[σ¯,σ¯]n[σ¯,σ¯]n[σ¯,σ¯].\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}\cup\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}\subset\mathcal{L}^{*}_{n}{[\underline{\sigma},\overline{\sigma}]}.

This includes two aspects:

  • $\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}\subset\mathcal{L}^{*}_{n}{[\underline{\sigma},\overline{\sigma}]}$ due to the fact that $\mathcal{G}_{t}\subset\mathcal{F}_{t}$ and $\bm{\sigma}_{(n)}\perp\!\!\!\perp\bm{\epsilon}_{(n)}$;

  • $\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}\subset\mathcal{L}^{*}_{n}{[\underline{\sigma},\overline{\sigma}]}$ because for any $\bm{\sigma}\in\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}$, given $\mathcal{F}_{t-1}\supset\mathcal{Y}_{t-1}$, $\sigma_{t}$ can be treated as a constant, and thus we must have $\sigma_{t}\perp\!\!\!\perp\epsilon_{t}\,|\,\mathcal{F}_{t-1}$.

Remark 3.23.1.

The condition $\sigma_{t}\perp\!\!\!\perp\epsilon_{t}\,|\,\mathcal{F}_{t-1}$ in $\mathcal{L}^{*}_{n}{[\underline{\sigma},\overline{\sigma}]}$ is equivalent to

$\eta_{t}\perp\!\!\!\perp\epsilon_{t}\,|\,\mathcal{F}_{t-1},$

where ηtσt𝔼[σt|t1]\eta_{t}\coloneqq\sigma_{t}-\mathbb{E}[\sigma_{t}|\mathcal{F}_{t-1}] is a sequence of 𝔽\mathbb{F}-martingale increments.

In traditional statistical modeling, how to deal with the quantity 3.24 is essentially a difficult task when 𝒜n[σ¯,σ¯]\mathcal{A}_{n}{[\underline{\sigma},\overline{\sigma}]} is highly unspecified and only contains some vague conditions on the possible design of edges (such as the additional edges in Figure 3.3 compared with Figure 3.1).

In this section, we will show that 3.24 can be related to the GG-expectation of a random vector with semi-GG-normal marginals, and that each choice of $\mathcal{A}_{n}{[\underline{\sigma},\overline{\sigma}]}$ corresponds to a type of independence associated with the semi-GG-normal. After transforming 3.24 into a GG-expectation, it becomes more convenient to evaluate, and this evaluation procedure also gives us guidance on what should be the “skeleton” part to consider for the extreme scenario when dealing with different forms of $\mathcal{A}_{n}{[\underline{\sigma},\overline{\sigma}]}$.

Figure 3.1: Diagram for $\mathcal{S}_{3}{[\underline{\sigma},\overline{\sigma}]}$
Figure 3.2: Diagram for $\mathcal{L}_{3}{[\underline{\sigma},\overline{\sigma}]}$
Figure 3.3: Diagram for $\mathcal{L}^{*}_{3}{[\underline{\sigma},\overline{\sigma}]}$

Our main result can be summarized as follows.

Theorem 3.24.

(Representations of nn semi-GG-normal random variables under various types of independence) Consider Wi𝒩^(0,[σ¯2,σ¯2]),i=1,2,,nW_{i}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]),i=1,2,\dotsc,n and any φCl.Lip(n)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{n}),

  • Under semi-sequential independence:

    W1SW2SSWn,W_{1}\overset{\text{S}}{\dashrightarrow}W_{2}\overset{\text{S}}{\dashrightarrow}\cdots\overset{\text{S}}{\dashrightarrow}W_{n}, (3.25)

    we have [|φ(𝑾)|]<\mathcal{E}[\lvert\varphi(\bm{W})\rvert]<\infty, and

    [φ(W1,W2,,Wn)]=max𝝈𝒮n[σ¯,σ¯]𝔼[φ(σ1ϵ1,σ2ϵ2,,σnϵn)].\mathcal{E}[\varphi(W_{1},W_{2},\dotsc,W_{n})]=\max_{\bm{\sigma}\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2},\dotsc,\sigma_{n}\epsilon_{n})]. (3.26)
  • Under sequential independence:

    W1W2Wn,W_{1}\dashrightarrow W_{2}\dashrightarrow\cdots\dashrightarrow W_{n}, (3.27)

    or fully-sequential independence:

    W1FW2FFWn,W_{1}\overset{\text{F}}{\dashrightarrow}W_{2}\overset{\text{F}}{\dashrightarrow}\cdots\overset{\text{F}}{\dashrightarrow}W_{n}, (3.28)

    we have [|φ(𝑾)|]<\mathcal{E}[\lvert\varphi(\bm{W})\rvert]<\infty, and

    [φ(W1,W2,,Wn)]\displaystyle\mathcal{E}[\varphi(W_{1},W_{2},\dotsc,W_{n})] =max𝝈n[σ¯,σ¯]𝔼[φ(σ1ϵ1,σ2ϵ2,,σnϵn)]\displaystyle=\max_{\bm{\sigma}\in\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2},\dotsc,\sigma_{n}\epsilon_{n})] (3.29)
    =max𝝈n[σ¯,σ¯]𝔼[φ(σ1ϵ1,σ2ϵ2,,σnϵn)].\displaystyle=\max_{\bm{\sigma}\in\mathcal{L}^{*}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2},\dotsc,\sigma_{n}\epsilon_{n})]. (3.30)
Proof of 3.24.

Turn to Section 6.4. ∎

Remark 3.24.1.

We can only say [φ(𝑾)]=[φ(Viϵi,i=1,2,,n)]\mathcal{E}[\varphi(\bm{W})]=\mathcal{E}[\varphi(V_{i}\epsilon_{i},i=1,2,\dotsc,n)] stays the same under sequential or fully sequential independence. It does not mean these two types of independence are equivalent. Their difference might arise when we consider a more general situation [φ((Vi,ϵi),i=1,2,,n)]\mathcal{E}[\varphi((V_{i},\epsilon_{i}),i=1,2,\dotsc,n)], which is out of our current scope of discussion.

Remark 3.24.2.

Here we only consider $W_{t}$ as a univariate semi-GG-normal, which can be routinely extended to the multivariate semi-GG-normal (defined in Section 3.5); in that case, $\sigma_{t}$ is also required to be changed to a matrix-valued process.

Remark 3.24.3.

The vision here is that we can use the GG-expectation of a semi-GG-normal random vector under various types of independence to obtain the envelope associated with different families of model structures. With or without a kind of dependence (such as with or without the feedback), the family of models is usually infinite dimensional because, in principle, the form of the feedback dependence could be any kind of nonlinear function. Nonetheless, 3.26, 3.29 and 3.30 tell us that, instead of going through all possible elements on the right-hand side, we can move to the left-hand side of the equation and treat it as a sublinear expectation, which can be evaluated in a convenient way. For instance, under semi-sequential independence, by 4.4, $\bm{W}$ follows a multivariate semi-GG-normal, so we only need to run through a finite-dimensional subset (as the “skeleton” part) to get the extreme scenario,

max𝝈𝒮n[σ¯,σ¯]𝔼[φ(σ1ϵ1,σ2ϵ2,,σnϵn)]=max𝝈[σ¯,σ¯]n𝔼[φ(σ1ϵ1,σ2ϵ2,,σnϵn)].\max_{\bm{\sigma}\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2},\dotsc,\sigma_{n}\epsilon_{n})]=\max_{\bm{\sigma}\in{[\underline{\sigma},\overline{\sigma}]}^{n}}\mathbb{E}_{\mathbb{P}}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2},\dotsc,\sigma_{n}\epsilon_{n})].

Under sequential independence, we only need to run through an iterative algorithm to evaluate [φ(W1,W2,,Wn)]\mathcal{E}[\varphi(W_{1},W_{2},\dotsc,W_{n})], which will be explained in Section 4.1.
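Before moving on, here is a minimal numerical sketch of the semi-sequential “skeleton” evaluation displayed above: we approximate $\max_{\bm{\sigma}\in{[\underline{\sigma},\overline{\sigma}]}^{n}}\mathbb{E}_{\mathbb{P}}[\varphi(\sigma_{1}\epsilon_{1},\dotsc,\sigma_{n}\epsilon_{n})]$ by a crude grid on the volatility interval and a shared Monte Carlo sample. The function names, the grid, and the test function are our own illustrative choices, not part of the theorem.

```python
# A sketch of the semi-sequential "skeleton" evaluation: maximize a classical
# expectation over constant volatility vectors sigma in [s_low, s_high]^n
# (approximated here by a finite grid and common random numbers).
import itertools
import numpy as np

def semi_sequential_upper_expectation(phi, s_low, s_high, n=3,
                                      n_sigma=5, n_paths=100_000, seed=0):
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_paths, n))        # classical N(0, I_n) noise
    grid = np.linspace(s_low, s_high, n_sigma)     # crude discretization of [s_low, s_high]
    best = -np.inf
    for sig in itertools.product(grid, repeat=n):  # the "skeleton" part
        best = max(best, phi(eps * np.asarray(sig)).mean())
    return best

# phi acts on the whole path (Y_1, ..., Y_n); here an illustrative choice:
est = semi_sequential_upper_expectation(
    lambda y: np.maximum(y.sum(axis=1), 0.0), s_low=0.5, s_high=1.0)
print(est)
```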

Corollary 3.24.1.

As special cases, under semi-sequential independence, we have

[φ(1nt=1nWt)]=maxσ𝒮n[σ¯,σ¯]𝔼[φ(1nt=1nσtϵt)]\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{t=1}^{n}W_{t})]=\max_{\sigma\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\sigma_{t}\epsilon_{t})] (3.31)

Under sequential independence or fully sequential independence, we have

[φ(1nt=1nWt)]=maxσn[σ¯,σ¯]𝔼[φ(1nt=1nσtϵt)],\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{t=1}^{n}W_{t})]=\max_{\sigma\in\mathcal{L}^{*}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\sigma_{t}\epsilon_{t})], (3.32)

where n[σ¯,σ¯]\mathcal{L}^{*}_{n}{[\underline{\sigma},\overline{\sigma}]} can be replaced by n[σ¯,σ¯]\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}.

Proof.

This is a direct result of 3.24. ∎

Remark 3.24.4.

Under semi-sequential independence, by 3.23, we have

[φ(1nt=1nWt)]=[φ(W1)],\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{t=1}^{n}W_{t})]=\mathcal{E}[\varphi(W_{1})],

then we have

maxσ𝒮n[σ¯,σ¯]𝔼[φ(1nt=1nσtϵt)]=maxσ[σ¯,σ¯]𝔼[φ(σϵ)].\max_{\sigma\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\sigma_{t}\epsilon_{t})]=\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)].
Remark 3.24.5.

To show consistency with the existing results in the literature, if we choose $\mathcal{L}^{*}_{n}{[\underline{\sigma},\overline{\sigma}]}$ in 3.5 (we can also change the distribution of $\epsilon$ in $W$ to any applicable classical distribution), then we can apply the CLT in the GG-expectation framework to the left-hand side to retrieve a result similar to the one in Rokhlin, (2015) (which is obtained by treating it as a discrete-time stochastic control problem):

[φ(WG)]=limn[φ(1ni=1nWi)]=limnsupσn[σ¯,σ¯]𝔼[φ(1ni=1nσiϵi)],\mathcal{E}[\varphi(W^{G})]=\lim_{n\to\infty}\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}W_{i})]=\lim_{n\to\infty}\sup_{\sigma\in\mathcal{L}^{*}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sigma_{i}\epsilon_{i})],

where WG𝒩(0,[σ¯2,σ¯2])W^{G}\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]). When choosing n[σ¯,σ¯]\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}, 3.24.1 is related to the discussion in Section 4 of Fang et al., (2019). It is also related to the formulation in Dolinsky et al., (2012), although the latter uses a different approach.

Remark 3.24.6 (A more explicit distinction between semi-GG-normal and GG-normal).

Let us extend our discussion to a continuous-time version of the setup in Section 3.1. By Denis et al., (2011), the distributional uncertainty of $\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])$ can be explicitly written as

[φ(WG)]=supσ[σ¯,σ¯]𝔼[φ(01σs𝑑BsP)],\mathcal{E}[\varphi(W^{G})]=\sup_{\sigma\in\mathcal{L}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\int_{0}^{1}\sigma_{s}dB^{P}_{s})],

where $B_{t}^{P}$ is a classical Brownian motion (induced by $\epsilon_{t}$) under $(\Omega,\mathcal{F},\mathbb{F},\mathbb{P})$ and $\mathcal{L}{[\underline{\sigma},\overline{\sigma}]}$ is the collection of all $\mathcal{F}_{t}$-measurable processes taking values in ${[\underline{\sigma},\overline{\sigma}]}$. Meanwhile, by considering the continuous-time version of 3.24.4, the distributional uncertainty of $\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])$ can be expressed as

[φ(W)]=supσ𝒮[σ¯,σ¯]𝔼[φ(01σs𝑑BsP)],\mathcal{E}[\varphi(W)]=\sup_{\sigma\in\mathcal{S}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\int_{0}^{1}\sigma_{s}dB^{P}_{s})],

where $\mathcal{S}{[\underline{\sigma},\overline{\sigma}]}$ is the collection of all $\mathcal{G}_{t}$-measurable processes taking values in ${[\underline{\sigma},\overline{\sigma}]}$. Note that $\mathcal{S}{[\underline{\sigma},\overline{\sigma}]}\subset\mathcal{L}{[\underline{\sigma},\overline{\sigma}]}$ because $\mathcal{S}{[\underline{\sigma},\overline{\sigma}]}$ only considers those $\sigma_{t}$ processes that are independent of $B_{t}$. This gives another, more explicit distinction between the semi-GG-normal and the GG-normal distribution compared with 3.11.3.

Corollary 3.24.2.

Under the setup of 3.24, when φCl.Lip(n)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{n}) is convex or concave,

[φ(W1,W2,,Wn)],\mathcal{E}[\varphi(W_{1},W_{2},\dotsc,W_{n})],

will be the same under either sequential or semi-sequential independence. Furthermore, in these cases, we have

[φ(W1,W2,,Wn)]={𝔼[φ(σ¯ϵ1,σ¯ϵ2,,σ¯ϵn)] when φ is concave𝔼[φ(σ¯ϵ1,σ¯ϵ2,,σ¯ϵn)] when φ is convex.\mathcal{E}[\varphi(W_{1},W_{2},\dotsc,W_{n})]=\begin{cases}\mathbb{E}[\varphi(\underline{\sigma}\epsilon_{1},\underline{\sigma}\epsilon_{2},\dotsc,\underline{\sigma}\epsilon_{n})]&\text{ when }\varphi\text{ is concave}\\ \mathbb{E}[\varphi(\overline{\sigma}\epsilon_{1},\overline{\sigma}\epsilon_{2},\dotsc,\overline{\sigma}\epsilon_{n})]&\text{ when }\varphi\text{ is convex}\end{cases}. (3.33)

The following result can be treated as an extension of 3.12.

Corollary 3.24.3.

Let {WiG}i=1n\{W_{i}^{G}\}_{i=1}^{n} denote a sequence of nonlinearly i.i.d. GG-normally distributed random variables with W1G𝒩(0,[σ¯2,σ¯2])W_{1}^{G}\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]). When φCl.Lip(n)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{n}) is convex or concave, we have

[φ(W1G,W2G,,WnG)]=[φ(W1,W2,,Wn)],\mathcal{E}[\varphi(W_{1}^{G},W_{2}^{G},\dotsc,W_{n}^{G})]=\mathcal{E}[\varphi(W_{1},W_{2},\dotsc,W_{n})],

where Wi𝒩^(0,[σ¯2,σ¯2]),i=1,2,,nW_{i}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]),i=1,2,\dotsc,n and they can be either sequentially or semi-sequentially independent.

We can also prove that the representations mentioned in this paper hold for $\varphi(x)=\mathds{1}_{\{x\leq y\}}$, so that we can apply them to consider the upper probability or capacity induced by the sublinear expectation: $\mathbf{V}(A)=\mathcal{E}[\mathds{1}_{A}]$ (from 2.2 and 2.4). Without loss of generality, we only discuss the univariate case, which can be routinely extended to multivariate situations.

Definition 3.25.

(The upper and lower cdf) In sublinear expectation space, the upper cdf of a random variable XX is

F¯X(y)𝐕(Xy)=[𝟙{Xy}],\overline{F}_{X}(y)\coloneqq\mathbf{V}(X\leq y)=\mathcal{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{X\leq y}}\right\}}],

and the lower cdf is

F¯X(y)𝐯(Xy)=[𝟙{Xy}].\underline{F}_{X}(y)\coloneqq\mathbf{v}(X\leq y)=-\mathcal{E}[-\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{X\leq y}}\right\}}].
Theorem 3.26.

(Representations of the upper and lower cdf) Let XX denote a random variable in sublinear expectation space and XαX^{\alpha} is a random variable in the classical probability space whose distribution is characterized by a latent variable α\alpha. Suppose a representation of the sublinear expectation,

[φ(X)]=supα𝒜𝔼[φ(Xα)],\mathcal{E}[\varphi(X)]=\sup_{\alpha\in\mathcal{A}}\mathbb{E}[\varphi(X^{\alpha})], (3.34)

holds for any φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}). Then we also have the representations for the upper cdf,

F¯X(y)=𝐕(Xy)=supα𝒜(Xαy),\overline{F}_{X}(y)=\mathbf{V}(X\leq y)=\sup_{\alpha\in\mathcal{A}}\mathbb{P}(X^{\alpha}\leq y), (3.35)

which holds for any continuity point $y$ of $\overline{F}_{X}$. In other words, the representation can be extended to functions of the form $\varphi(x)\coloneqq\mathds{1}_{\{x\leq y\}}$. Meanwhile, we also have the representation for the lower cdf,

F¯X(y)=𝐯(Xy)=infα𝒜(Xαy),\underline{F}_{X}(y)=\mathbf{v}(X\leq y)=\inf_{\alpha\in\mathcal{A}}\mathbb{P}(X^{\alpha}\leq y), (3.36)

which holds for any continuity point yy of F¯X\underline{F}_{X}.

Proof of 3.26.

It is easy to show that $\overline{F}_{X}(y)$ is a monotone function, so its set of discontinuity points is at most countable. Let $y$ be any continuity point of $\overline{F}_{X}$. For any $\epsilon>0$, take $\delta$ small enough such that

F¯X(y+δ)F¯X(yδ)ϵ.\overline{F}_{X}(y+\delta)-\overline{F}_{X}(y-\delta)\leq\epsilon.

Take $f$ and $g$ to be two bounded continuous functions such that

f(x)={1xyδ[0,1]yδ<xy0x>y,f(x)=\begin{cases}1&x\leq y-\delta\\ \in[0,1]&y-\delta<x\leq y\\ 0&x>y\end{cases},

and

g(x)={1xy[0,1]y<xy+δ0x>y+δ.g(x)=\begin{cases}1&x\leq y\\ \in[0,1]&y<x\leq y+\delta\\ 0&x>y+\delta\end{cases}.

Then we have

𝟙{xyδ}f(x)𝟙{xy}g(x)𝟙{xy+δ}.\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{x\leq y-\delta}}\right\}}\leq f(x)\leq\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{x\leq y}}\right\}}\leq g(x)\leq\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{x\leq y+\delta}}\right\}}.

We can apply this inequality to XαX^{\alpha} for any given α\alpha:

𝔼[f(Xα)](Xαy)𝔼[g(Xα)],\mathbb{E}[f(X^{\alpha})]\leq\mathbb{P}(X^{\alpha}\leq y)\leq\mathbb{E}[g(X^{\alpha})],

then

supα𝒜𝔼[f(Xα)]supα𝒜(Xαy)supα𝒜𝔼[g(Xα)].\sup_{\alpha\in\mathcal{A}}\mathbb{E}[f(X^{\alpha})]\leq\sup_{\alpha\in\mathcal{A}}\mathbb{P}(X^{\alpha}\leq y)\leq\sup_{\alpha\in\mathcal{A}}\mathbb{E}[g(X^{\alpha})].

Note that f,gCl.Lip()f,g\in C_{\mathrm{l.Lip}}(\mathbb{R}) we can use the representation 3.34 to get,

𝐕(Xyδ)[f(X)]supα𝒜(Xαy)[g(X)]𝐕(Xy+δ).\mathbf{V}(X\leq y-\delta)\leq\mathcal{E}[f(X)]\leq\sup_{\alpha\in\mathcal{A}}\mathbb{P}(X^{\alpha}\leq y)\leq\mathcal{E}[g(X)]\leq\mathbf{V}(X\leq y+\delta).

Then

F¯X(y)ϵF¯X(yδ)supα𝒜(Xαy)F¯X(y+δ)F¯X(y)+ϵ.\overline{F}_{X}(y)-\epsilon\leq\overline{F}_{X}(y-\delta)\leq\sup_{\alpha\in\mathcal{A}}\mathbb{P}(X^{\alpha}\leq y)\leq\overline{F}_{X}(y+\delta)\leq\overline{F}_{X}(y)+\epsilon.

Since ϵ>0\epsilon>0 can be arbitrarily small, we have proved the required result 3.35 for F¯X\overline{F}_{X}. To validate the representation 3.36 for F¯X\underline{F}_{X}, we simply need to replace F¯X\overline{F}_{X} with F¯X\underline{F}_{X} and change sup\sup to inf\inf accordingly. ∎

Remark 3.26.1.

(Notes on the continuity of $\mathbf{V}$) Note that 3.26 does not require the continuity of $\mathbf{V}$: $\mathbf{V}(A_{n})\to\mathbf{V}(A)$ if $A_{n}\to A$. Since one can easily check that $\mathbf{V}$ is automatically lower continuous ($\mathbf{V}(A_{n})\uparrow\mathbf{V}(A)$ if $A_{n}\uparrow A$), the upper continuity ($\mathbf{V}(A_{n})\downarrow\mathbf{V}(A)$ if $A_{n}\downarrow A$) is what we are really discussing whenever we speak of the continuity of $\mathbf{V}$. Here we try to avoid the assumption of upper continuity of $\mathbf{V}$, which is quite strong and restrictive. Even under the regularity of $\mathcal{E}$, we can only say that the upper continuity holds for closed $A$ (Lemma 7 in Denis et al., (2011)). However, when $y$ is a continuity point of $\overline{F}_{X}$, for any sequence $y_{n}$ converging to $y$ as $n\to\infty$, we do have $\mathbf{V}(A_{n})\to\mathbf{V}(A)$ for the sets $A_{n}\coloneqq\{X\leq y_{n}\}$ and $A\coloneqq\{X\leq y\}$; namely, $\mathbf{V}$ exhibits some continuity on this kind of sets.
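To make 3.26 concrete, consider the semi-GG-normal $W\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])$, whose sublinear expectation is represented by $\mathcal{E}[\varphi(W)]=\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)]$. Since $\Phi(y/\sigma)$ is monotone in $\sigma$ for each fixed $y$, the upper and lower cdf are attained at the endpoints of the interval. The short sketch below writes this out numerically; it is our own illustration with hypothetical parameter values, not part of the original text.

```python
# Upper and lower cdf of a semi-G-normal W ~ N^(0, [s_low^2, s_high^2]),
# obtained from the representation applied to X^sigma = sigma * eps with sigma in [s_low, s_high];
# per Theorem 3.26, the representation is stated at continuity points of the (monotone) cdfs.
import numpy as np
from scipy.stats import norm

def upper_cdf(y, s_low, s_high):
    # sup over sigma of Phi(y / sigma): attained at s_low for y >= 0, at s_high for y < 0
    return np.where(y >= 0, norm.cdf(y / s_low), norm.cdf(y / s_high))

def lower_cdf(y, s_low, s_high):
    # inf over sigma of Phi(y / sigma): attained at s_high for y >= 0, at s_low for y < 0
    return np.where(y >= 0, norm.cdf(y / s_high), norm.cdf(y / s_low))

y = np.linspace(-3, 3, 7)
print(upper_cdf(y, 0.5, 1.0))
print(lower_cdf(y, 0.5, 1.0))
```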

4 The hybrid roles and applications of semi-GG-normal distributions

In this section, we will show the hybrid roles of semi-GG-normal distributions, connecting the intuition between the classical framework and the GG-expectation framework, by answering the four questions mentioned in the introduction.

4.1 How to connect the linear expectations of classical normal with GG-normal

In principle, it is feasible to understand the expectation of the GG-normal distribution through the structure of the GG-heat equation. Nonetheless, as a generalization of the normal distribution, it would be better if we could understand the GG-normal distribution in a more distributional sense. Is it possible to understand the GG-normal distribution from our old friend, the classical normal? This is a natural question, but it is essentially not straightforward. Even for people who have partially learned the theory of the GG-expectation framework, there usually exist several common thinking gaps between the classical normal and the GG-normal distribution.

For instance, as mentioned in 3.11.3, for φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}),

[φ(𝒩(0,[σ¯2,σ¯2]))]supσ[σ¯,σ¯]𝔼[φ(N(0,σ2))],\mathcal{E}[\varphi(\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]))]\geq\sup_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi\big{(}N(0,\sigma^{2}))], (4.1)

which indicates that the uncertainty set of the GG-normal distribution is larger than the class of classical normal distributions with $\sigma\in{[\underline{\sigma},\overline{\sigma}]}$. In particular, Hu, (2012) shows a strict inequality: when $\varphi(x)=x^{3}$, we have $\mathcal{E}[\big(\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])\big)^{3}]>0$ (and it stays positive for any odd moment). Let $W^{G}\overset{\text{d}}{=}\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])$. By checking the GG-function defined in 2.14, we have

WG=dWG,W^{G}\overset{\text{d}}{=}-W^{G}, (4.2)

which indicates that the GG-normal distribution should have some “symmetry”. However, exactly due to the identity in distribution shown in 4.2, $W^{G}$ and $-W^{G}$ should share the same (sublinear) third moment: $\mathcal{E}[-(W^{G})^{3}]=\mathcal{E}[(-W^{G})^{3}]=\mathcal{E}[(W^{G})^{3}]>0$, which directly implies,

[(𝒩(0,[σ¯2,σ¯2]))3]>0=𝔼[(N(0,σ2))3]>[(𝒩(0,[σ¯2,σ¯2]))3].\mathcal{E}[\big{(}\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])\big{)}^{3}]>0=\mathbb{E}[\big{(}N(0,\sigma^{2})\big{)}^{3}]>-\mathcal{E}[-\big{(}\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])\big{)}^{3}].

It tells us that the degree of symmetry or skewness of GG-normal distribution is uncertain, which somehow looks like a “contradiction” with 4.2 and seems quite counter-intuitive for a “normal” distribution.

Based on the above statements showing how different the GG-normal and the classical normal are, our motivation comes from the opposite direction: is it possible for us to connect the linear expectation $\mathbb{E}[\varphi(N(0,\sigma^{2}))]$ of the classical normal distribution with the sublinear expectation $\mathcal{E}[\varphi(\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]))]$ of the GG-normal distribution (or use the former to approach the latter)?

This section will first give an affirmative answer to this question by providing an iterative algorithm given by our previous work (Li and Kulperger, (2018)) based on the semi-GG-normal distribution. Then we are going to extend this iterative algorithm into a general computational procedure to deal with weighted summations in statistical practice.

Theorem 4.1 (The Iterative Approximation of the GG-normal Distribution).

For any φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}) and integer n1n\geq 1, consider the series of iteration functions {φi,n}i=1n\{\varphi_{i,n}\}_{i=1}^{n} with initial function φ0,n(x)φ(x)\varphi_{0,n}(x)\coloneqq\varphi(x) and iterative relation:

φi+1,n(x)maxσ[σ¯,σ¯]𝔼[φi,n(N(x,σ2/n))],i=0,1,,n1.\varphi_{i+1,n}(x)\coloneqq\max_{\sigma\in[\underline{\sigma},\overline{\sigma}]}\mathbb{E}_{\mathbb{P}}[\varphi_{i,n}(N(x,\sigma^{2}/n))],i=0,1,\dotsc,n-1. (4.3)

The final iteration function for a given nn is φn,n\varphi_{n,n}. As nn\to\infty, we have φn,n(0)[φ(WG)]\varphi_{n,n}(0)\to\mathcal{E}[\varphi(W^{G})], where WG𝒩(0,[σ¯2,σ¯2])W^{G}\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]).

Remark 4.1.1.

As opposed to 4.1, the relation 4.3 shows that, to correctly understand the sublinear expectation of $\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])$, we need to start from the linear expectation of the classical normal and go through an iterative maximization of the function $\varphi$ itself to approach the expectation of $\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])$. For a fixed $n$, we actually have

[φ(𝒩(0,[σ¯2,σ¯2]))]maxσ[σ¯,σ¯]𝔼[φn1,n(N(0,σ2/n))].\mathcal{E}[\varphi(\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]))]\approx\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi_{n-1,n}(N(0,\sigma^{2}/n))].
Remark 4.1.2.

From a computational aspect, the normal distribution in 4.3 can be replaced by other classical distributions with finite moment generating functions because this algorithm is based on the GG-version central limit theorem (as indicated in 4.2). The interval [σ¯,σ¯]{[\underline{\sigma},\overline{\sigma}]} can be further simplified to a two-point set {σ¯,σ¯}\{\underline{\sigma},\overline{\sigma}\} or a three-point set {σ¯,σ¯+σ¯2,σ¯}\{\underline{\sigma},\frac{\underline{\sigma}+\overline{\sigma}}{2},\overline{\sigma}\} for computational convenience. More theoretical details and numerical aspects (as well as PDE sides) of this iterative algorithm can be found in Li and Kulperger, (2018). This iterative algorithm is also related to the idea of the discrete-time formulation in Dolinsky et al., (2012).
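To illustrate how 4.3 can be carried out numerically, the following is a minimal sketch (our own, not part of the original statement): it uses the two-point simplification $\{\underline{\sigma},\overline{\sigma}\}$ mentioned above, a finite spatial grid with linear interpolation for the iteration functions, and Gauss-Hermite quadrature for the classical normal expectation. Function names, grid sizes and the test function are illustrative choices.

```python
# A sketch of the iterative approximation in 4.1:
# phi_{i+1,n}(x) = max_sigma E[phi_{i,n}(N(x, sigma^2 / n))],
# implemented on a spatial grid with the two-point set {s_low, s_high} for sigma.
import numpy as np

def iterative_G_expectation(phi, s_low, s_high, n=50, x_max=8.0, n_grid=801, n_quad=40):
    """Approximate E[phi(W^G)] for W^G ~ N(0, [s_low^2, s_high^2])."""
    # Probabilists' Gauss-Hermite rule: E[f(Z)] ~ sum(w * f(z)) / sqrt(2*pi) for Z ~ N(0,1)
    z, w = np.polynomial.hermite_e.hermegauss(n_quad)
    w = w / np.sqrt(2.0 * np.pi)

    x = np.linspace(-x_max, x_max, n_grid)   # spatial grid for the iteration functions
    phi_curr = phi(x)                        # phi_{0,n} = phi on the grid

    for _ in range(n):
        candidates = []
        for sigma in (s_low, s_high):
            # E[phi_curr(x + sigma * Z / sqrt(n))] via quadrature + linear interpolation
            shifted = x[:, None] + sigma / np.sqrt(n) * z[None, :]
            candidates.append(np.interp(shifted, x, phi_curr) @ w)
        phi_curr = np.maximum(candidates[0], candidates[1])   # pointwise max over sigma

    return np.interp(0.0, x, phi_curr)       # phi_{n,n}(0) approximates E[phi(W^G)]

if __name__ == "__main__":
    est = iterative_G_expectation(lambda x: x ** 3, 0.5, 1.0)
    print("iterative estimate of the third G-moment:", est)   # strictly positive
```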

Remark 4.1.3.

Consider a sequence {Wi}i=1n\{W_{i}\}_{i=1}^{n} of nonlinearly i.i.d. semi-GG-normal random variables with W1𝒩^(0,[σ¯2,σ¯2])W_{1}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]). Each iteration function can also be expressed as the sublinear expectation of the semi-GG-normal distribution (letting W00W_{0}\coloneqq 0):

φi,n(x)=[φ(x+j=0iWnjn)]=[φ(x+j=0iWjn)],\varphi_{i,n}(x)=\mathcal{E}[\varphi(x+\sum_{j=0}^{i}\frac{W_{n-j}}{\sqrt{n}})]=\mathcal{E}[\varphi(x+\sum_{j=0}^{i}\frac{W_{j}}{\sqrt{n}})],

for i=0,1,,ni=0,1,\dotsc,n. Moreover, Li and Kulperger, (2018) further show that the series of iteration functions is an approximation of the whole solution surface of GG-heat equation on a given time grid. To be specific, consider the GG-heat equation defined on [0,)×[0,\infty)\times\mathbb{R}:

ut+G(uxx)=0,u|t=1=φ,u_{t}+G(u_{xx})=0,\,u|_{t=1}=\varphi,

where G(a)12[aX2]=12(σ¯2a+σ¯2a)G(a)\coloneqq\frac{1}{2}\mathcal{E}[aX^{2}]=\frac{1}{2}(\overline{\sigma}^{2}a^{+}-\underline{\sigma}^{2}a^{-}) and φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}). For each p(0,1]p\in(0,1], we have

|u(1p,x)φnp,n(x)|=|[φ(x+pX)][φ(x+i=0npWin)]|=Cφ(1+|x|k)O(1(np)α/2),|u(1-p,x)-\varphi_{\lfloor np\rfloor,n}(x)|=|\mathcal{E}[\varphi(x+\sqrt{p}X)]-\mathcal{E}[\varphi(x+\sum_{i=0}^{\lfloor np\rfloor}\frac{W_{i}}{\sqrt{n}})]|=C_{\varphi}(1+|x|^{k})O(\frac{1}{(np)^{\alpha/2}}),

where α(0,1)\alpha\in(0,1) depending on (σ¯,σ¯)(\underline{\sigma},\overline{\sigma}).

The basic idea of the iterative algorithm comes from the following result. In the following context, without further notice, let $\{W_{i}\}_{i=1}^{\infty}$ denote a sequence of nonlinearly i.i.d. semi-GG-normally distributed random variables with $W_{1}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])$.

Proposition 4.2.

(A general connection between semi-GG-normal and GG-normal) For any φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}), we have

limn[φ(1ni=1nWi)]=[φ(WG)],\lim_{n\to\infty}\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}W_{i})]=\mathcal{E}[\varphi(W^{G})],

where WG𝒩(0,[σ¯2,σ¯2])W^{G}\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]).

Remark 4.2.1.

The iterative algorithm can also be extended to dd-dimensional cases by extending the dimensions of {Wi}i=1\{W_{i}\}_{i=1}^{\infty} and WGW^{G} accordingly.

Proof.

This is a direct result of the GG-version central limit theorem (4.6). We can extend the space of functions $\varphi$ to $C_{\mathrm{l.Lip}}(\mathbb{R})$ because the condition in 2.18 is satisfied. Let $S_{n}\coloneqq\frac{1}{\sqrt{n}}\sum_{i=1}^{n}W_{i}$. In fact, for any $p\geq 1$, since $f(x_{1},x_{2},\dotsc,x_{n})=\lvert\sum_{i=1}^{n}x_{i}\rvert^{p}$ is a convex function, by 3.24.2, with $\epsilon_{i}\sim N(0,1),i=1,2,\dotsc,n$, we have

[|Sn|p]=𝔼[|1ni=1nσ¯ϵi|p]=σ¯p𝔼[|ϵ1|p].\mathcal{E}[\lvert S_{n}\rvert^{p}]=\mathbb{E}[\lvert\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\overline{\sigma}\epsilon_{i}\rvert^{p}]=\overline{\sigma}^{p}\mathbb{E}[\lvert\epsilon_{1}\rvert^{p}].

Meanwhile, [|WG|p]=σ¯p𝔼[|ϵ1|p]\mathcal{E}[\lvert W^{G}\rvert^{p}]=\overline{\sigma}^{p}\mathbb{E}[\lvert\epsilon_{1}\rvert^{p}] due to 2.15. Hence, we have, for any p+p\in\mathbb{N}_{+},

supn[|Sn|p]+[|WG|p]<.\sup_{n}\mathcal{E}[\lvert S_{n}\rvert^{p}]+\mathcal{E}[\lvert W^{G}\rvert^{p}]<\infty.\qed

The iterative algorithm (4.1) can then be treated as a direct evaluation of $\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}W_{i})]$. Interestingly, the uncertainty set of each $W_{i}$ is strictly smaller than that of the GG-normal distribution (by 4.1), but their normalized sum is able to approach the GG-normal. This leads us to another closely related question: how does the uncertainty set of $\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}W_{i})]$ aggregate (towards that of the GG-normal) as $n$ increases? How does the GG-version independence change the uncertainty set associated with the expectation of the joint random vector

[φ(W1,W2,,Wn)]?\mathcal{E}[\varphi(W_{1},W_{2},\dotsc,W_{n})]?

This question has been answered by the representations shown in 3.24 and 3.24.1.

We can also extend the idea of the iterative algorithm into a procedure that deals with sublinear expectations under sequential independence in a broader sense. We call it a GG-EM (Expectation-Maximization) procedure because it happens to involve an expectation step and a maximization step (but it has no direct relation to the Expectation-Maximization algorithm in statistical modeling).

One of the goals of the GG-EM procedure is to deal with the following object for any fixed $\varphi\in C_{\mathrm{l.Lip}}$:

\mathcal{E}[\varphi(\langle\bm{a},\bm{W}\rangle)]=\mathcal{E}\Bigl[\varphi\Bigl(\sum_{i=1}^{n}a_{i}W_{i}\Bigr)\Bigr], (4.4)

where $W_{i}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]),i=1,2,\dotsc,n$ are sequentially independent (the distribution of $W_{i}$ could also be generalized to any member of a semi-GG-family of distributions, which will be defined in Section 5.1) and $\bm{a}\in\mathbb{R}^{n}$ is the weight vector. Without loss of generality, we assume the Euclidean norm $\lVert\bm{a}\rVert=1$ (or $\sum a_{i}^{2}=1$). These kinds of objects are common in data practice (in the context of financial modeling, statistics or actuarial science). We are going to give an example of a simple linear regression problem in Section 5.4.

The iterative algorithm is a special case of this, with ai=1/n,i=1,2,,na_{i}=1/\sqrt{n},i=1,2,\dotsc,n:

[φ(1ni=1nWi)],\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}W_{i})],

which converges to [φ(𝒩(0,[σ¯2,σ¯2]))]\mathcal{E}[\varphi(\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]))] as nn\to\infty.

However, in practice, using an asymptotic result may not be feasible here for the following reasons:

  1. 1.

    Note that $a_{i}$ could be of arbitrary form (usually depending on the data or the problem itself). Although we do have results such as the weighted central limit theorem proved by Zhang and Chen, (2014), we may not always have a general asymptotic result for it.

  2. 2.

    More fundamentally, $n$ could be a small number for which there is still a gap with the asymptotic result. In this case, we need a non-asymptotic approximation involving the convergence rates of the central limit theorem (like the Berry-Esseen bound in the classical case), which have been studied by Fang et al., (2019); Huang and Liang, (2019); Song, (2020); Krylov, (2020).

  3. 3.

    If nn is small compared with the dimension dd of the data, it further requires us to have a non-asymptotic view of 4.4.

Next we explain the details of the GG-EM procedure to deal with 4.4. Again, under the spirit of iterative approximation, 4.4 can be computed by the following procedure: with φ0,nφ\varphi_{0,n}\coloneqq\varphi, for i=0,1,2,,n1i=0,1,2,\dotsc,n-1,

\varphi_{i+1,n}(x)=\mathcal{E}[\varphi_{i,n}(x+a_{n-i}W_{n-i})]=\max_{\sigma_{n-i}\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi_{i,n}(x+a_{n-i}\sigma_{n-i}\epsilon_{n-i})].

Finally we have

[φ(i=1naiWi)]=φn,n(0).\mathcal{E}[\varphi(\sum_{i=1}^{n}a_{i}W_{i})]=\varphi_{n,n}(0).

Then we can store the optimal choice of the $\sigma_{i}$ control process for our later simulation study (so that there is no need to run the iterative algorithm again). Note that the optimal $\sigma^{*}$ process is obtained in the backward order

(\sigma_{n}^{*},\sigma_{n-1}^{*},\dotsc,\sigma_{1}^{*}).

To follow the original order, we need to reverse it; the optimal $\sigma^{*}$ process then takes the form

σ1\displaystyle\sigma_{1}^{*} [σ¯,σ¯]\displaystyle\in{[\underline{\sigma},\overline{\sigma}]}
σk\displaystyle\sigma_{k}^{*} =σk(i=1k1aiWi),k=2,,n.\displaystyle=\sigma_{k}^{*}(\sum_{i=1}^{k-1}a_{i}W_{i}),k=2,\dotsc,n.

In this way, we have

\mathcal{E}[\varphi(\langle\bm{a},\bm{W}\rangle)]=\mathcal{E}\Bigl[\varphi\Bigl(\sum_{i=1}^{n}a_{i}W_{i}\Bigr)\Bigr]=\mathbb{E}\Bigl[\varphi\Bigl(\sum_{i=1}^{n}a_{i}\sigma_{i}^{*}\epsilon_{i}\Bigr)\Bigr],

and the linear expectation can be approximated by a classical Monte-Carlo simulation.
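The following sketch puts the GG-EM procedure above into code under the same numerical devices as the earlier iterative-algorithm sketch (grid, interpolation, quadrature, and the two-point set for $\sigma$), all of which are our own illustrative choices. The backward pass records the maximizing $\sigma$ on the grid; after reversing, the stored policy is used in a classical Monte Carlo run, as described above.

```python
# A sketch of the G-EM procedure: backward iteration over the weights a_i, storing the
# maximizing sigma at each step, followed by a classical Monte Carlo run under the
# recovered sigma* feedback policy.
import numpy as np

def g_em(phi, a, s_low, s_high, x_max=8.0, n_grid=801, n_quad=40):
    n = len(a)
    z, w = np.polynomial.hermite_e.hermegauss(n_quad)
    w = w / np.sqrt(2.0 * np.pi)
    x = np.linspace(-x_max, x_max, n_grid)

    phi_curr = phi(x)                  # phi_{0,n} = phi
    policy = []                        # maximizing sigma on the grid, in backward order
    for i in range(n):                 # step i handles W_{n-i}
        cand = []
        for sigma in (s_low, s_high):
            shifted = x[:, None] + a[n - 1 - i] * sigma * z[None, :]
            cand.append(np.interp(shifted, x, phi_curr) @ w)
        cand = np.vstack(cand)
        policy.append(np.where(cand.argmax(axis=0) == 0, s_low, s_high))
        phi_curr = cand.max(axis=0)    # phi_{i+1,n}

    value = np.interp(0.0, x, phi_curr)        # E[phi(<a, W>)] = phi_{n,n}(0)
    return value, x, policy[::-1]              # reverse the policy to the original order

def monte_carlo_under_policy(phi, a, x_grid, policy, n_paths=200_000, seed=0):
    """Classical Monte Carlo of E[phi(sum a_k sigma_k* eps_k)] under the stored policy."""
    rng = np.random.default_rng(seed)
    s = np.zeros(n_paths)              # running partial sum a_1 W_1 + ... + a_{k-1} W_{k-1}
    for k, a_k in enumerate(a):
        sigma_k = np.interp(s, x_grid, policy[k])   # sigma_k* as a function of the past
        s = s + a_k * sigma_k * rng.standard_normal(n_paths)
    return phi(s).mean()

if __name__ == "__main__":
    a = np.full(10, 1 / np.sqrt(10))               # the equal-weight special case
    val, xg, pol = g_em(lambda x: x ** 3, a, 0.5, 1.0)
    print(val, monte_carlo_under_policy(lambda x: x ** 3, a, xg, pol))
```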

4.2 How to connect univariate and multivariate objects

There are two basic properties of the classical normal distribution which bring convenience to the study of multivariate statistics. First, in $(\Omega,\mathcal{F},\mathbb{P})$, for any two independent $X_{1}$ and $X_{2}$ both following $N(0,1)$, $(X_{1},X_{2})$ must form a bivariate normal. (This result still holds even if they are not independent but linearly correlated.) Second, an $\mathbb{R}^{d}$-valued random vector $\bm{X}$ follows a multivariate normal if and only if the inner product $\langle\bm{a},\bm{X}\rangle$ is normal for any $\bm{a}\in\mathbb{R}^{d}$. However, these two properties no longer hold for GG-normal distributions. Readers can find the following established result in the book Peng, 2019b (Exercise 2.5.1).

Proposition 4.3.

Suppose X1X2X_{1}\dashrightarrow X_{2} and X1=dX2=dN(0,[σ¯2,σ¯2])X_{1}\overset{\text{d}}{=}X_{2}\overset{\text{d}}{=}N(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) with σ¯<σ¯\underline{\sigma}<\overline{\sigma}, for 𝐗(X1,X2)\bm{X}\coloneqq(X_{1},X_{2}), we have

  1. 1.

    𝒂,𝑿\mathopen{}\mathclose{{}\left\langle\bm{a},\bm{X}}\right\rangle is GG-normal distributed for any 𝒂2\bm{a}\in\mathbb{R}^{2};

  2. 2.

    𝑿\bm{X} does not follow a bivariate GG-normal distribution.

4.3 shows that we cannot construct a bivariate GG-normal distribution directly from two independent univariate GG-normal distributed random variables. It remains infeasible even when considering any invertible linear transformation of the random vector $(X_{1},X_{2})$, as shown by Bayraktar and Munk, (2015), who study these strange properties of the GG-normal in the multidimensional case in more detail.

To further explain the obstacle here, let us first recall that, in Section 4.1, we have shown how to start from the linear expectation $\mathbb{E}[\varphi(N(0,\sigma^{2}))]$ of the classical normal to correctly understand (and compute) the sublinear expectation $\mathcal{E}[\varphi(\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]))]$ of the GG-normal. Suppose our next goal is to help a general audience further understand (or compute) the sublinear expectation $\mathcal{E}[\varphi(\mathcal{N}(0,\mathcal{C}))]$ of a multivariate GG-normal distribution with covariance uncertainty characterized by $\mathcal{C}$, such as $\mathcal{C}\coloneqq\{\operatorname{diag}(\sigma_{1}^{2},\sigma_{2}^{2}),\sigma_{i}\in{[\underline{\sigma},\overline{\sigma}]},i=1,2\}$, from $(X_{1},X_{2})$ with $X_{i}\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])$, $i=1,2$ and $X_{1}\dashrightarrow X_{2}$. However, as shown in 4.3, it is difficult to achieve this goal along this path because $\bm{X}=(X_{1},X_{2})$ is not GG-normal distributed, and neither is $\mathbf{A}\bm{X}^{T}$ for any invertible $2\times 2$ matrix $\mathbf{A}$.

It turns out that the connection between univariate and multivariate objects is essentially nontrivial. The contribution of this section is to show that this connection can be revealed by introducing an intermediate substructure, the semi-GG-normal imposed with semi-sequential independence. Specifically, 4.4 shows that a joint vector of semi-sequentially independent univariate semi-GG-normal random variables follows a multivariate semi-GG-normal (with a diagonal covariance matrix).

Theorem 4.4.

For a sequence of semi-GG-normal distributed random variables {Wi}i=1n\{W_{i}\}_{i=1}^{n}, satisfying Wi𝒩^(0,[σ¯i2,σ¯i2])W_{i}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}_{i}^{2},\overline{\sigma}_{i}^{2}]) for i=1,2,,ni=1,2,\dotsc,n, and

W1SW2SSWn,W_{1}\overset{\text{S}}{\dashrightarrow}W_{2}\overset{\text{S}}{\dashrightarrow}\dotsc\overset{\text{S}}{\dashrightarrow}W_{n},

we have

(W1,W2,,Wn)T𝒩^(𝟎,𝒞),(W_{1},W_{2},\dotsc,W_{n})^{T}\sim\hat{\mathcal{N}}(\bm{0},\mathcal{C}),

where 𝒞𝕊d+\mathcal{C}\subset\mathbb{S}_{d}^{+} is the uncertainty set of covariance matrices defined as

\mathcal{C}\coloneqq\Bigl\{\mathbf{\Sigma}=\operatorname{diag}(\sigma^{2}_{1},\dotsc,\sigma^{2}_{n}):\sigma^{2}_{i}\in[\underline{\sigma}_{i}^{2},\overline{\sigma}_{i}^{2}],\ i=1,2,\dotsc,n\Bigr\}.
Proof.

It is a direct result of 3.22 (the non-identical variance interval here is inessential to the proof). ∎

Next, 4.5 shows that we can apply a linear transformation to $\bm{W}$ to get a multivariate semi-GG-normal with a non-diagonal covariance matrix.

Proposition 4.5 (Multivariate semi-GG-normal under linear transformation).

Let 𝐖n×1𝒩^(𝟎,𝒞)\bm{W}_{n\times 1}\sim\hat{\mathcal{N}}(\bm{0},\mathcal{C}). For any constant matrix 𝐀r×n\mathbf{A}\in\mathbb{R}^{r\times n} with rnr\leq n, we have

𝐀𝑾𝒩^(𝟎,𝐀𝒞𝐀T),\mathbf{A}\bm{W}\sim\hat{\mathcal{N}}(\bm{0},\mathbf{A}\mathcal{C}\mathbf{A}^{T}),

where

𝐀𝒞𝐀T{𝐀𝚺𝐀T:𝚺𝒞}r×r.\mathbf{A}\mathcal{C}\mathbf{A}^{T}\coloneqq\mathopen{}\mathclose{{}\left\{\mathbf{A}\mathbf{\Sigma}\mathbf{A}^{T}:\mathbf{\Sigma}\in\mathcal{C}}\right\}\subset\mathbb{R}^{r\times r}.
Proof.

First of all, note that 𝐀r×n𝑾n×1=𝐀r×n𝐕n×nϵn×1\mathbf{A}_{r\times n}\bm{W}_{n\times 1}=\mathbf{A}_{r\times n}\mathbf{V}_{n\times n}\bm{\epsilon}_{n\times 1} with 𝐕(𝒱)\mathbf{V}\sim\mathcal{M}(\mathcal{V}). For any HCl.Lip(r×n)H\in C_{\mathrm{l.Lip}}(\mathbb{R}^{r\times n}), we have

[H(𝐀𝐕)]=max𝚺1/2𝒱𝔼[H(𝐀𝚺1/2)]=max𝐁𝐀𝒱𝔼[H(𝐁)],\mathcal{E}[H(\mathbf{A}\mathbf{V})]=\max_{\mathbf{\Sigma}^{1/2}\in\mathcal{V}}\mathbb{E}_{\mathbb{P}}[H(\mathbf{A}\mathbf{\Sigma}^{1/2})]=\max_{\mathbf{B}\in\mathbf{A}\mathcal{V}}\mathbb{E}_{\mathbb{P}}[H(\mathbf{B})],

so 𝐀𝐕(𝐀𝒱)\mathbf{A}\mathbf{V}\sim\mathcal{M}(\mathbf{A}\mathcal{V}), which can be treated as the scaling property for the n×nn\times n-dimensional maximal distribution. It follows from 𝐕ϵ\mathbf{V}\dashrightarrow\bm{\epsilon} that 𝐀𝐕ϵ\mathbf{A}\mathbf{V}\dashrightarrow\bm{\epsilon}. Therefore,

𝐀𝑾=d(𝐀𝒱)N(𝟎,𝐈n2)𝒩^(𝟎,𝒞),\mathbf{A}\bm{W}\overset{\text{d}}{=}\mathcal{M}(\mathbf{A}\mathcal{V})N(\bm{0},\mathbf{I}_{n}^{2})\sim\hat{\mathcal{N}}(\bm{0},\mathcal{C}^{\prime}),

where

𝒞\displaystyle\mathcal{C}^{{}^{\prime}} {𝐁𝐁T:𝐁𝐀𝒱}\displaystyle\coloneqq\mathopen{}\mathclose{{}\left\{\mathbf{B}\mathbf{B}^{T}:\mathbf{B}\in\mathbf{A}\mathcal{V}}\right\}
={(𝐀𝚺1/2)(𝐀𝚺1/2)T:𝚺1/2𝒱}\displaystyle=\mathopen{}\mathclose{{}\left\{(\mathbf{A}\mathbf{\Sigma}^{1/2})(\mathbf{A}\mathbf{\Sigma}^{1/2})^{T}:\mathbf{\Sigma}^{1/2}\in\mathcal{V}}\right\}
={𝐀𝚺𝐀T:𝚺𝒞}.\displaystyle=\mathopen{}\mathclose{{}\left\{\mathbf{A}\mathbf{\Sigma}\mathbf{A}^{T}:\mathbf{\Sigma}\in\mathcal{C}}\right\}.

In other words, 𝐀𝑾𝒩^(𝟎,𝐀𝒞𝐀T).\mathbf{A}\bm{W}\sim\hat{\mathcal{N}}(\bm{0},\mathbf{A}\mathcal{C}\mathbf{A}^{T}).

Then we can use a sequence of $\bm{W}\sim\hat{\mathcal{N}}(\bm{0},\mathcal{C})$ to approach the multivariate GG-normal $\mathcal{N}(\bm{0},\mathcal{C})$ by the nonlinear CLT.

Theorem 4.6.

Consider a sequence of nonlinearly i.i.d. {𝐖i}i=1\{\bm{W}_{i}\}_{i=1}^{\infty} with W1𝒩^(0,𝒞)W_{1}\sim\hat{\mathcal{N}}(0,\mathcal{C}). Let WGW^{G} be a GG-normal distributed random vector following 𝒩(0,𝒞)\mathcal{N}(0,\mathcal{C}). Then we have, for any φCl.Lip\varphi\in C_{\mathrm{l.Lip}},

limn[φ(1ni=1n𝑾i)]=[φ(𝑾G)].\lim_{n\to\infty}\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bm{W}_{i})]=\mathcal{E}[\varphi(\bm{W}^{G})].

It means that,

1ni=1n𝑾id𝒩(0,𝒞).\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bm{W}_{i}\overset{\text{d}}{\longrightarrow}\mathcal{N}(0,\mathcal{C}).
Proof.

This is a multivariate version of 4.2. We only need to validate the conditions. First of all, the sequence {Wi}i=1\{W_{i}\}_{i=1}^{\infty} definitely has certain zero mean. Then, notice that the distribution of 𝑾G\bm{W}^{G} is characterized by the function G(𝐀)=12sup𝚺𝒞tr[𝐀𝚺]G(\mathbf{A})=\frac{1}{2}\sup_{\mathbf{\Sigma}\in\mathcal{C}}\text{tr}[\mathbf{A}\mathbf{\Sigma}], where tr[]\text{tr}[\cdot] means the trace of the matrix. We only need to prove that G(𝐀)=12[𝐀W1,W1]G(\mathbf{A})=\frac{1}{2}\mathcal{E}[\langle\mathbf{A}W_{1},W_{1}\rangle] for any 𝐀𝕊d\mathbf{A}\in\mathbb{S}_{d}. By the representation of semi-GG-normal distribution, letting ϵN(0,𝚺)\bm{\epsilon}\sim N(0,\mathbf{\Sigma}), we have

[𝐀W1,W1]\displaystyle\mathcal{E}[\langle\mathbf{A}W_{1},W_{1}\rangle] =sup𝚺𝒞𝔼[𝐀ϵ,ϵ]=sup𝚺𝒞𝔼[(𝐀ϵ)Tϵ]\displaystyle=\sup_{\mathbf{\Sigma}\in\mathcal{C}}\mathbb{E}_{\mathbb{P}}[\langle\mathbf{A}\bm{\epsilon},\bm{\epsilon}\rangle]=\sup_{\mathbf{\Sigma}\in\mathcal{C}}\mathbb{E}_{\mathbb{P}}[(\mathbf{A}\bm{\epsilon})^{T}\bm{\epsilon}]
=sup𝚺𝒞𝔼[ϵT𝐀ϵ]=sup𝚺𝒞𝔼[tr[ϵT𝐀ϵ]]\displaystyle=\sup_{\mathbf{\Sigma}\in\mathcal{C}}\mathbb{E}_{\mathbb{P}}[\bm{\epsilon}^{T}\mathbf{A}\bm{\epsilon}]=\sup_{\mathbf{\Sigma}\in\mathcal{C}}\mathbb{E}_{\mathbb{P}}[\text{tr}[\bm{\epsilon}^{T}\mathbf{A}\bm{\epsilon}]]
=sup𝚺𝒞𝔼[tr[𝐀ϵϵT]]=sup𝚺𝒞tr[𝔼[𝐀ϵϵT]]\displaystyle=\sup_{\mathbf{\Sigma}\in\mathcal{C}}\mathbb{E}_{\mathbb{P}}[\text{tr}[\mathbf{A}\bm{\epsilon}\bm{\epsilon}^{T}]]=\sup_{\mathbf{\Sigma}\in\mathcal{C}}\text{tr}[\mathbb{E}_{\mathbb{P}}[\mathbf{A}\bm{\epsilon}\bm{\epsilon}^{T}]]
=sup𝚺𝒞tr[𝐀𝔼[ϵϵT]]=sup𝚺𝒞tr[𝐀𝚺].\displaystyle=\sup_{\mathbf{\Sigma}\in\mathcal{C}}\text{tr}[\mathbf{A}\mathbb{E}_{\mathbb{P}}[\bm{\epsilon}\bm{\epsilon}^{T}]]=\sup_{\mathbf{\Sigma}\in\mathcal{C}}\text{tr}[\mathbf{A}\mathbf{\Sigma}].\qed

The argument on how to extend the choice of $\varphi$ to $C_{\mathrm{l.Lip}}$ is similar to the one in the proof of 4.2.

This creates a path from the univariate classical normal to the multivariate GG-normal. Figure 4.1 shows the relations among the classical, semi-GG- and GG-normal distributions. We can start from the univariate objects (the semi-GG-normal distribution), construct their multivariate version under semi-sequential independence, and then approach the multivariate GG-normal distribution, which gives us a feasible way to start from univariate objects and approximately reach the multivariate distribution.

Figure 4.1: The relations among classical, semi-GG-, and GG-normal in univariate and multivariate cases

4.3 A statistical interpretation of asymmetry in sequential independence

In this section, we will expand 2.7 (which is used to illustrate the asymmetry of independence in this framework) by studying its representation result to provide a more specific, statistical interpretation of this asymmetry. More interestingly, we will show that, for two semi-GG-normally distributed random objects, each of them has certain zero third moment (because its distributional uncertainty can be written as a family of classical normals with different variances). This property is preserved for their sum under semi-sequential independence. However, after we impose sequential independence on them, their sum will exhibit third-moment uncertainty. This phenomenon is closely related to the third-moment uncertainty of the GG-normal (as shown in Section 4.1), obtained by applying the GG-version central limit theorem (2.17).

Next we expand 2.7 by considering XX and YY as two semi-GG-normal distributed random variables.

Example 4.7 (Third moment uncertainty comes from asymmetry of independence).

Suppose V1=dV2[σ¯,σ¯]V_{1}\overset{\text{d}}{=}V_{2}\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]}, and ϵ1=dϵ2𝒩(0,[1,1])\epsilon_{1}\overset{\text{d}}{=}\epsilon_{2}\sim\mathcal{N}(0,[1,1]) (which is exactly the classical N(0,1)N(0,1)) imposed with sequential independence

Viϵi,i=1,2.V_{i}\dashrightarrow\epsilon_{i},i=1,2.

Let Wi=dViϵi,i=1,2W_{i}\overset{\text{d}}{=}V_{i}\epsilon_{i},i=1,2, which turn out to be two identically distributed semi-GG-normal random variables W1=dW2=d𝒩^(0,[σ¯2,σ¯2])W_{1}\overset{\text{d}}{=}W_{2}\overset{\text{d}}{=}\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]). Note that (W1,W2)(W_{1},W_{2}) is a special case of (X,Y)(X,Y) in 2.7. We are going to show that, under different types of independence for WiW_{i}’s or different structures of sequential independence for ViV_{i}’s and ϵi\epsilon_{i}’s, we will have different uncertainty for W1W22W_{1}W_{2}^{2} and (W1+W2)3(W_{1}+W_{2})^{3} whose extreme scenarios can be described by their sublinear expectations.

When W1SW2W_{1}\overset{\text{S}}{\dashrightarrow}W_{2} or

V1V2ϵ1ϵ2,V_{1}\dashrightarrow V_{2}\dashrightarrow\epsilon_{1}\dashrightarrow\epsilon_{2}, (4.5)

since $W_{1}+W_{2}\overset{\text{d}}{=}\sqrt{2}W_{1}$, we have

[(W1+W2)3]=maxv[σ¯,σ¯]𝔼[(2vϵ)3]=0=[(W1+W2)3].\mathcal{E}[(W_{1}+W_{2})^{3}]=\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[(\sqrt{2}v\epsilon)^{3}]=0=-\mathcal{E}[-(W_{1}+W_{2})^{3}]. (4.6)

It shows that under semi-sequential independence, W1+W2W_{1}+W_{2} does not have third-moment uncertainty. Meanwhile, since (W1,W2)(W_{1},W_{2}) follows bivariate semi-GG-normal, we also have

[W1W22]=max(v1,v2)[σ¯,σ¯]2𝔼[(v1v22)ϵ1ϵ22]=0.\mathcal{E}[W_{1}W_{2}^{2}]=\max_{(v_{1},v_{2})\in{[\underline{\sigma},\overline{\sigma}]}^{2}}\mathbb{E}_{\mathbb{P}}[(v_{1}v^{2}_{2})\epsilon_{1}\epsilon_{2}^{2}]=0.

Since we have shown that semi-sequential independence is symmetric (3.19), things do not change when we consider $W_{2}\overset{\text{S}}{\dashrightarrow}W_{1}$ or $V_{2}\dashrightarrow V_{1}\dashrightarrow\epsilon_{2}\dashrightarrow\epsilon_{1}$.

However, if we only switch the order of independence between V2V_{2} and ϵ1\epsilon_{1} in 4.5 to get

V1ϵ1V2ϵ2,V_{1}\dashrightarrow\epsilon_{1}\dashrightarrow V_{2}\dashrightarrow\epsilon_{2},

then we obtain $W_{1}\overset{\text{F}}{\dashrightarrow}W_{2}$, which implies $W_{1}\dashrightarrow W_{2}$. Note that $-W_{i}\overset{\text{d}}{=}W_{i},i=1,2$ and $-W_{1}\overset{\text{F}}{\dashrightarrow}-W_{2}$, so we still have

(W1+W2)=(W1)+(W2)=dW1+W2.-(W_{1}+W_{2})=(-W_{1})+(-W_{2})\overset{\text{d}}{=}W_{1}+W_{2}.

It indicates some “symmetry” in its (GG-version) distribution. Although its second moment is uncertain, we still expect it to have some kind of “zero skewness” which indicates at least “zero third moment”. However, it turns out this is not the case:

[(W1+W2)3]=3[W1W22]=3(σ¯2σ¯2)σ¯2π>0>[(W1+W2)3],\mathcal{E}[(W_{1}+W_{2})^{3}]=3\mathcal{E}[W_{1}W_{2}^{2}]=3(\overline{\sigma}^{2}-\underline{\sigma}^{2})\frac{\overline{\sigma}}{\sqrt{2\pi}}>0>-\mathcal{E}[-(W_{1}+W_{2})^{3}], (4.7)

where we apply 2.25 based on the facts that both W1W_{1} and W2W_{2} have certain zero third moment as well as the results from 2.7:

[W1W22]>0 while [W12W2]=0.\mathcal{E}[W_{1}W_{2}^{2}]>0\text{ while }\mathcal{E}[W_{1}^{2}W_{2}]=0.
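A small Monte Carlo sketch (our own illustration, with hypothetical parameter values) makes 4.6 and 4.7 tangible: under semi-sequential independence every constant pair $(\sigma_{1},\sigma_{2})$ gives a numerically zero third moment for $W_{1}+W_{2}$, whereas the feedback choice $\sigma_{2}=\overline{\sigma}\mathds{1}_{\{\epsilon_{1}>0\}}+\underline{\sigma}\mathds{1}_{\{\epsilon_{1}\leq 0\}}$ (an element of $\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}$ appearing in 4.8 below) reproduces the positive value in 4.7.

```python
# Monte Carlo check of the third-moment (a)symmetry in Example 4.7.
import numpy as np

rng = np.random.default_rng(1)
s_low, s_high, n = 0.5, 1.0, 2_000_000
eps1, eps2 = rng.standard_normal(n), rng.standard_normal(n)

# Semi-sequential case: constant sigma_1, sigma_2 -> third moment of W1 + W2 is (numerically) zero.
for s1 in (s_low, s_high):
    for s2 in (s_low, s_high):
        print(s1, s2, np.mean((s1 * eps1 + s2 * eps2) ** 3))

# Sequential case: sigma_2 reacts to the sign of the first observation.
s2_fb = np.where(eps1 > 0, s_high, s_low)
mc = np.mean((s_high * eps1 + s2_fb * eps2) ** 3)
exact = 3 * (s_high**2 - s_low**2) * s_high / np.sqrt(2 * np.pi)   # the value in 4.7
print("Monte Carlo:", mc, " closed form:", exact)
```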

How can we understand the asymmetry of independence in 4.7 from the representations of the sublinear expectations? This question is answered by the following 4.8, which can be treated as a special case of 3.24.

Let Cs.polyC_{\text{s.poly}} denote a basic family of bivariate polynomials:

Cs.poly{φ:φ(x1,x2)=(ax1+bx2)n, or cx1px2q, with p,q,n,a,b,c}.C_{\text{s.poly}}\coloneqq\{\varphi:\varphi(x_{1},x_{2})=(ax_{1}+bx_{2})^{n},\text{ or }cx_{1}^{p}x_{2}^{q},\text{ with }p,q,n\in\mathbb{N},a,b,c\in\mathbb{R}\}.
Proposition 4.8.

(The joint distribution of two semi-GG-normal random variables under various independence: for a small family of φ\varphi’s) Consider Wi𝒩^(0,[σ¯2,σ¯2]),i=1,2W_{i}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]),i=1,2 and any φCs.poly\varphi\in C_{\text{s.poly}},

  • When W1SW2W_{1}\overset{\text{S}}{\dashrightarrow}W_{2}, we have

    [φ(W1,W2)]=max𝝈𝒮20[σ¯,σ¯]𝔼[φ(σ1ϵ1,σ2ϵ2)],\mathcal{E}[\varphi(W_{1},W_{2})]=\max_{\bm{\sigma}\in\mathcal{S}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2})],

    where

    𝒮20[σ¯,σ¯]\displaystyle\mathcal{S}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]} {𝝈=(σ1,σ2):(σ1,σ2){σ¯,σ¯}2}.\displaystyle\coloneqq\bigl{\{}\bm{\sigma}=(\sigma_{1},\sigma_{2}):(\sigma_{1},\sigma_{2})\in\{\underline{\sigma},\overline{\sigma}\}^{2}\bigr{\}}.
  • When W1W2W_{1}\dashrightarrow W_{2} or W1FW2W_{1}\overset{\text{F}}{\dashrightarrow}W_{2}, we have

    [φ(W1,W2)]=max𝝈20[σ¯,σ¯]𝔼[φ(σ1ϵ1,σ2ϵ2)],\mathcal{E}[\varphi(W_{1},W_{2})]=\max_{\bm{\sigma}\in\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2})],

    where

    \mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}\coloneqq\bigl\{\bm{\sigma}=(\sigma_{1},\sigma_{2}(\sigma_{1}\epsilon_{1})):\sigma_{1}\in\{\underline{\sigma},\overline{\sigma}\},\ \sigma_{2}(x)=\mathds{1}_{\{x>0\}}(\sigma_{22}-\sigma_{21})+\sigma_{21},\ (\sigma_{21},\sigma_{22})\in\{\underline{\sigma},\overline{\sigma}\}^{2}\bigr\}.
Remark 4.8.1.

4.8 provides us with the following intuitions:

  1. 1.

    we can directly see the difference between sequential and semi-sequential independence: under this basic setup, if W1SW2W_{1}\overset{\text{S}}{\dashrightarrow}W_{2}, we can use the upper envelope of a four-element set to represent (or compute) [φ(W1,W2)]\mathcal{E}[\varphi(W_{1},W_{2})], while an eight-element set is required when W1W2W_{1}\dashrightarrow W_{2}. Meanwhile, note that 𝒮20[σ¯,σ¯]20[σ¯,σ¯]\mathcal{S}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}\subset\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}: it indicates that sequential independence can cover a larger family of models compared with the semi-sequential one. This statement is confirmed in general by 3.24;

  2. 2.

    it reveals a more intuitive insight into why [(W1+W2)3]>0\mathcal{E}[(W_{1}+W_{2})^{3}]>0 under sequential independence: the set difference 20[σ¯,σ¯]𝒮20[σ¯,σ¯]\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}\setminus\mathcal{S}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]} contains those elements where σ2\sigma_{2} actually depends on the previous σ1ϵ1\sigma_{1}\epsilon_{1} (or simply the sign of ϵ1\epsilon_{1}). Although this kind of dependence does not create a shift in the mean part of W1+W2W_{1}+W_{2} (which is still zero), it has strong effects on the skewness of the distribution of W1+W2W_{1}+W_{2}. This phenomenon is related to the so-called leverage effect in the context of financial time series analysis. In the companion paper, we will use a dual-volatility regime-switching data example to give a statistical illustration of this phenomenon and show the necessity of discussing 20[σ¯,σ¯]\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}.

  3. 3.

    we can get one specific interpretation of the asymmetry in the sequential independence from the form of 20[σ¯,σ¯]\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]} - the roles of σ1\sigma_{1} and σ2\sigma_{2} are not symmetric: when W1W2W_{1}\dashrightarrow W_{2}, this sequential order means that W1W_{1} is realized first and the volatility part σ2[σ¯,σ¯]\sigma_{2}\in{[\underline{\sigma},\overline{\sigma}]} of W2W_{2} may or may not depend on the value of W1W_{1}, in a way that leaves the distributional uncertainty in W2W_{2} unchanged. In short, when we aggregate the uncertainty set from time 11 to 22, due to the sequential order of the data, σ2\sigma_{2} may be affected by a function of (σ1,ϵ1)(\sigma_{1},\epsilon_{1}), but σ1\sigma_{1} can essentially never be influenced by a function of (σ2,ϵ2)(\sigma_{2},\epsilon_{2}) (due to the restriction from the order of time). As opposed to this asymmetry, the semi-sequential independence is symmetric, as indicated by the form of 𝒮20[σ¯,σ¯]\mathcal{S}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}: the roles of σ1\sigma_{1} and σ2\sigma_{2} are symmetric, so we must have the same results for [W1W22]\mathcal{E}[W_{1}W_{2}^{2}] and [W12W2]\mathcal{E}[W_{1}^{2}W_{2}] under W1SW2W_{1}\overset{\text{S}}{\dashrightarrow}W_{2}.

  4. 4.

    More importantly, it also offers guidance on possible simulation studies in this framework. When one intends to generate a data sequence that goes through the scenarios covered by sequentially independent random variables, a more cautious attitude and in-depth thought are required: different blocks of samples with separate σi[σ¯,σ¯]\sigma_{i}\in{[\underline{\sigma},\overline{\sigma}]} may only go through 𝒮20[σ¯,σ¯]\mathcal{S}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}, which can be at most treated as semi-sequential independence rather than sequential independence. In order to reach the latter, one needs to generate those scenarios that allow possible classical dependence between σi\sigma_{i} and the previous (σk,ϵk)(\sigma_{k},\epsilon_{k}) with k<ik<i. Borrowing the language of a state-space model, if we treat σi\sigma_{i} as states and σiϵi\sigma_{i}\epsilon_{i} as observations, uncertainty in the dependence between current states and previous observations needs to be considered in order to sufficiently cover the uncertainty set of the sequential independence. Otherwise, it is likely to be at most semi-sequential independence. We will further discuss this point in Section 4.4 and in the companion paper of this work; a minimal numerical sketch of this distinction is given right after this list.
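
To make the distinction concrete, the following is a minimal Monte Carlo sketch in Python (our own illustration, not part of the formal development; the bounds \underline{\sigma}=0.5, \overline{\sigma}=1 and the test function \varphi(x_{1},x_{2})=x_{1}x_{2}^{2} are assumed for illustration). It scans the four-element set \mathcal{S}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]} and the eight-element set \mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]} of 4.8; the semi-sequential envelope stays near zero, while the sequential one approaches (\overline{\sigma}^{2}-\underline{\sigma}^{2})\overline{\sigma}/\sqrt{2\pi}, the value of \mathcal{E}[W_{1}W_{2}^{2}] implied by 4.7.

    import numpy as np

    rng = np.random.default_rng(0)
    sigma_lo, sigma_hi = 0.5, 1.0      # assumed illustrative bounds
    n = 10**6
    eps1 = rng.standard_normal(n)
    eps2 = rng.standard_normal(n)

    def phi(x1, x2):                   # the test function x1 * x2^2
        return x1 * x2**2

    # Semi-sequential independence: sigma_2 cannot react to eps_1, so we only
    # scan the four constant pairs in {sigma_lo, sigma_hi}^2 (the set S_2^0).
    semi_seq = max(
        np.mean(phi(s1 * eps1, s2 * eps2))
        for s1 in (sigma_lo, sigma_hi)
        for s2 in (sigma_lo, sigma_hi)
    )

    # Sequential independence: sigma_2 may also react to the sign of sigma_1*eps_1,
    # giving the eight-element set L_2^0 (constant rules appear as special cases).
    seq = max(
        np.mean(phi(s1 * eps1, np.where(eps1 > 0, s22, s21) * eps2))
        for s1 in (sigma_lo, sigma_hi)
        for s21 in (sigma_lo, sigma_hi)
        for s22 in (sigma_lo, sigma_hi)
    )

    print(semi_seq)   # approximately 0
    print(seq)        # approximately (sigma_hi**2 - sigma_lo**2) * sigma_hi / sqrt(2*pi)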

Example 4.9 (The direction of independence comes from finer structure).

In the GG-framework, a symmetric (or mutual) independence between two non-constant random variables only arises if they belong to either of the following two categories: classical distribution or maximal distribution (Hu and Li, (2014)). One interesting question is: what about the independence for combinations (such as products) of them? Logically speaking, if the combination does not fall into these two cases, they must have asymmetric independence, but where does this “asymmetry” come from? To be specific, suppose we have V=dV¯=d[σ¯,σ¯]V\overset{\text{d}}{=}\bar{V}\overset{\text{d}}{=}\mathcal{M}{[\underline{\sigma},\overline{\sigma}]} and ϵ=dϵ¯=d𝒩(0,[1,1])\epsilon\overset{\text{d}}{=}\bar{\epsilon}\overset{\text{d}}{=}\mathcal{N}(0,[1,1]). Meanwhile, assume independence VV¯V\dashrightarrow\bar{V} and ϵϵ¯\epsilon\dashrightarrow\bar{\epsilon}. By 2.27, we also have V¯V\bar{V}\dashrightarrow V and ϵ¯ϵ\bar{\epsilon}\dashrightarrow\epsilon. However, W=VϵW=V\epsilon and W¯=V¯ϵ¯\bar{W}=\bar{V}\bar{\epsilon} cannot be mutually independent by 2.26, because WW and W¯\bar{W} are neither classically nor maximally distributed. To further explain the interesting phenomenon here: VV and V¯\bar{V} are mutually independent, and so are ϵ\epsilon and ϵ¯\bar{\epsilon}. It seems that the roles in “VV versus V¯\bar{V}” and “ϵ\epsilon versus ϵ¯\bar{\epsilon}” are all “symmetric” and no “direction” has appeared yet. Nonetheless, when we consider the products WW and W¯\bar{W}, if they are independent, we must have a direction, either WW¯W\dashrightarrow\bar{W} or W¯W\bar{W}\dashrightarrow W, and there seems to be no middle stage where WW and W¯\bar{W} enjoy some degree of independence in which their roles are symmetric. One question we may ask is: does such a middle stage exist?

In this example, we will give an affirmative answer to this question. It turns out that the current conditions of independence are not enough; the relation depends on the structure of independence among the four objects V,V¯,ϵV,\bar{V},\epsilon and ϵ¯\bar{\epsilon}.

To be compatible with the assumed independence VV¯V\dashrightarrow\bar{V} and ϵϵ¯\epsilon\dashrightarrow\bar{\epsilon}, suppose we have additional sequential independence among the four objects. There are essentially four cases:

  1. 1.

    If VV¯ϵϵ¯V\dashrightarrow\bar{V}\dashrightarrow\epsilon\dashrightarrow\bar{\epsilon}, we have WSW¯,W\overset{\text{S}}{\dashrightarrow}\bar{W},

  2. 2.

    If V¯Vϵ¯ϵ,\bar{V}\dashrightarrow V\dashrightarrow\bar{\epsilon}\dashrightarrow\epsilon, we have W¯SW,\bar{W}\overset{\text{S}}{\dashrightarrow}W,

  3. 3.

    If VϵV¯ϵ¯V\dashrightarrow\epsilon\dashrightarrow\bar{V}\dashrightarrow\bar{\epsilon}, we have WW¯,W\dashrightarrow\bar{W},

  4. 4.

    If V¯ϵ¯Vϵ\bar{V}\dashrightarrow\bar{\epsilon}\dashrightarrow V\dashrightarrow\epsilon, we have W¯W.\bar{W}\dashrightarrow W.

Note that Cases 1 and 2 are equivalent (by 3.19): the semi-sequential independence between WW and W¯\bar{W} (which is imposed through the GG-version sequential independence among the four components) is symmetric. From 2.27, the first two cases are also equivalent to

V¯Vϵϵ¯,\bar{V}\dashrightarrow V\dashrightarrow\epsilon\dashrightarrow\bar{\epsilon},

or

VV¯ϵ¯ϵ.V\dashrightarrow\bar{V}\dashrightarrow\bar{\epsilon}\dashrightarrow\epsilon.

However, Cases 3 and 4 (sequential independence) are not equivalent (by 2.7).

Example 4.10 (The sequential independence is not an “order” with transitivity).

Although sequential independence has its “order”, it is not really an order relation with transitivity. In other words, for three random variables X,Y,ZX,Y,Z\in\mathcal{H}, the sequential independence XYX\dashrightarrow Y and YZY\dashrightarrow Z do not necessarily imply XZX\dashrightarrow Z. A trivial example: consider Zi,i=1,2Z_{i},i=1,2 both following maximal distributions; if we have Z1Z2Z_{1}\dashrightarrow Z_{2}, then we have Z2Z1Z_{2}\dashrightarrow Z_{1} (by 3.6), but we never have Z1Z1Z_{1}\dashrightarrow Z_{1}. A non-trivial example (with three distinct random variables) comes from the fully-sequential independence structure 3.20. For two semi-GG-normal objects W=VϵW=V\epsilon and W¯=V¯ϵ¯\bar{W}=\bar{V}\bar{\epsilon}, WFW¯W\overset{\text{F}}{\dashrightarrow}\bar{W} means

VϵV¯ϵ¯.V\dashrightarrow\epsilon\dashrightarrow\bar{V}\dashrightarrow\bar{\epsilon}.

By 2.19, we have VV¯V\dashrightarrow\bar{V} and ϵϵ¯\epsilon\dashrightarrow\bar{\epsilon}. By 2.27, the other direction of independence also holds: V¯V\bar{V}\dashrightarrow V and ϵ¯ϵ\bar{\epsilon}\dashrightarrow\epsilon. Then we have the following counter-example:

ϵ¯ϵ and ϵV¯ but ϵ¯V¯ does not hold,\bar{\epsilon}\dashrightarrow\epsilon\text{ and }\epsilon\dashrightarrow\bar{V}\text{ but }\bar{\epsilon}\dashrightarrow\bar{V}\text{ does not hold},

because we already have V¯ϵ¯\bar{V}\dashrightarrow\bar{\epsilon} but the pair (V¯,ϵ¯)(\bar{V},\bar{\epsilon}) cannot have mutual independence by 2.26. Similarly, we have another example

V¯V and Vϵ but V¯ϵ does not hold.\bar{V}\dashrightarrow V\text{ and }V\dashrightarrow\epsilon\text{ but }\bar{V}\dashrightarrow\epsilon\text{ does not hold.}

4.4 What kinds of sequences are not GG-normal?

Consider the following examples:

  1. 1.

    generate YiN(0,σi2)Y_{i}\sim N(0,\sigma_{i}^{2}) with σiUnif[σ¯,σ¯]\sigma_{i}\sim\text{Unif}{[\underline{\sigma},\overline{\sigma}]}, i=1,2,,n.i=1,2,\dotsc,n. This is essentially an i.i.d. sample from a normal mixture (with the scale parameter following a uniform distribution). This point is not affected by the distribution of σ\sigma (as long as σ\sigma follows a fixed distribution). The whole data sequence YiY_{i} does not have any distributional uncertainty.

  2. 2.

    first generate σiUnif[σ¯,σ¯]\sigma_{i}\sim\text{Unif}{[\underline{\sigma},\overline{\sigma}]}, i=1,2,,ni=1,2,\dotsc,n, then generate YijN(0,σi2)Y_{ij}\sim N(0,\sigma_{i}^{2}) with j=1,2,,mj=1,2,\dotsc,m. By introducing this blocking design, even though we pretend to treat the switching rule of σi\sigma_{i} as unknown here (although it may not be so hard for a data analyst to observe this pattern), if we look at the uncertainty set considered in this generation scheme, \{N(0,\sigma^{2}):\sigma\in{[\underline{\sigma},\overline{\sigma}]}\}, it is actually at most a pseudo simulation of the semi-GG-normal distribution. One typical feature of this sample is that it does not have skewness-uncertainty: it has a certain zero skewness.

  3. 3.

    consider equal-spaced grid

    σ¯=σ1<σ2<<σm=σ¯.\underline{\sigma}=\sigma_{1}<\sigma_{2}<\dotsc<\sigma_{m}=\overline{\sigma}.

    For each σi\sigma_{i} with i=1,2,,mi=1,2,\dotsc,m, generate YijN(0,σi2)Y_{ij}\sim N(0,\sigma_{i}^{2}), j=1,2,,nj=1,2,\dotsc,n. Then treat

    max1im1nj=1nφ(Yij),\max_{1\leq i\leq m}\frac{1}{n}\sum_{j=1}^{n}\varphi(Y_{ij}), (4.8)

    as an approximation of \mathcal{E}[\varphi(\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]))]. This kind of scheme has been used in some of the literature (such as Deng et al., (2019); Fei and Fei, (2019)). We may cautiously step back and ask ourselves: is this a valid approximation? Not really; it is actually an approximation of \mathcal{E}[\varphi(\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]))]:

    max1im1nj=1nφ(Yij)nmax1im𝔼[φ(σiϵ)]mmaxσ[σ¯,σ¯]𝔼[φ(σϵ)],\max_{1\leq i\leq m}\frac{1}{n}\sum_{j=1}^{n}\varphi(Y_{ij})\overset{n\to\infty}{\to}\max_{1\leq i\leq m}\mathbb{E}[\varphi(\sigma_{i}\epsilon)]\overset{m\to\infty}{\to}\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)],

    where the first convergence can be treated as a classical almost sure convergence and the second one is a deterministic one due to the design of {σi}i=1m\{\sigma_{i}\}_{i=1}^{m}. This fact does not change even if we use some overlapping groups, because each group can be at most treated as a sample from a normal mixture. Again, the problem of the above-mentioned scheme is that it could be misleading for a general audience. It is actually going through the uncertainty set of 𝒩^(0,[σ¯2,σ¯2])\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) under semi-sequential independence rather than under sequential independence. For general φ\varphi, only in the latter case will the normalized sum be close to the GG-normal distribution. However, this issue can be fixed by considering an extra step: if the function φ\varphi considered in the question can be proved to be a convex or concave one, then in this practical sense, by 3.24.2, the semi-sequential independence and sequential independence can be treated as the same. For general fixed φ\varphi, we usually need to consider the GG-EM procedure to do the approximation as discussed in Section 4.1, which is also closely related to Section 4 in Fang et al., (2019). Alternatively, we may consider a small family of φ\varphi’s so that we have a finite-dimensional set of distributions to go through (such as the one in 4.8). In this way, we can get a feasible approximation based on the idea of max-mean estimation by Jin and Peng, (2021) similar to the form 4.8. (A minimal numerical sketch of the scheme 4.8 is given right after this list.)
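
As a small illustration (our own sketch, not taken from the cited works; the bounds \underline{\sigma}=0.5, \overline{\sigma}=1, the grid size m=21 and the test function are assumptions), the scheme 4.8 takes only a few lines of Python, and its output approximates the semi-GG-normal quantity \max_{\sigma\in[\underline{\sigma},\overline{\sigma}]}\mathbb{E}[\varphi(\sigma\epsilon)] rather than the GG-normal sublinear expectation:

    import numpy as np

    rng = np.random.default_rng(1)
    sigma_lo, sigma_hi, m, n = 0.5, 1.0, 21, 10**5
    grid = np.linspace(sigma_lo, sigma_hi, m)   # equal-spaced grid of sigma values

    def phi(x):                                 # a bounded, non-convex test function
        return np.sin(x)**2

    # max over the grid of the group sample means, as in (4.8); each grid point
    # uses its own fresh normal sample of size n
    approx = max(np.mean(phi(s * rng.standard_normal(n))) for s in grid)
    print(approx)   # ~ max over sigma of E[phi(sigma * eps)], the semi-G-normal value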

The idea of this section will be further illustrated in the companion paper by a series of data experiments.

5 Conclusions and extensions

For a researcher or practitioner from various backgrounds who may not be familiar with the notion of nonlinear expectation, but is comfortable with classical probability theory and the normal distribution, when they try to understand the GG-normal starting from the classical normal, it will be intuitive and beneficial if there exists an intermediate structure that can be directly transformed from the classical normal and also creates a bridge towards the GG-normal. Another thinking gap is from the classical independence (symmetric) to the GG-version sequential independence (asymmetric). It will be useful if we have an intermediate stage of independence that is under distributional uncertainty but preserves the symmetry, so that it is associated with our common first impression of the relation between two static, separate random objects, both with distributional uncertainty, but with no sequential order assumed. Once we talk about two objects with a sequential order or in a dynamic way, it becomes possible to involve the sequential independence.

This paper has rigorously set up and discussed the semi-GG-normal distribution with its own types of independence, especially the semi-sequential independence. The hybrid roles of these new substructures, the semi-GG-normal with its semi-sequential independence, can be summarized as follows:

  1. 1.

    The semi-GG-normal 𝒩^(0,[σ¯2,σ¯2])\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) is closely related to the classical normal in that it is simply a classical normal N(0,1)N(0,1) scaled by a GG-version constant V[σ¯,σ¯]V\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]} (with a typical independence). Hence the semi-GG-normal only exhibits moment uncertainty at even orders, while its odd moments, especially the third moment (related to skewness), are preserved to be zero. Meanwhile, the semi-GG-normal is also closely connected with the GG-normal: they have the same sublinear expectation under a convex or concave transformation φ\varphi. For general φ\varphi, they are connected by using the GG-version central limit theorem.

  2. 2.

    The semi-GG-normal 𝒩^(0,[σ¯2,σ¯2])\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) with semi-sequential independence also preserves the properties of the classical normal in the multivariate situation (3.22).

  3. 3.

    4.9 shows the hybrid and intermediate role of semi-sequential independence between the classical and the sequential independence. The semi-sequential independence is related to the classical one in the sense that it is symmetric (W1SW2W_{1}\overset{\text{S}}{\dashrightarrow}W_{2} implies W2SW1W_{2}\overset{\text{S}}{\dashrightarrow}W_{1}) and it is also related to the sequential independence under convex or concave φ\varphi as illustrated in 3.24.2.

We can use a comparison table (Table 1) to summarize the hybrid roles of this substructure: semi-GG-normal with semi-sequential independence, creating a bridge connecting classical normal and GG-normal.

Table 1: Comparison among normal, semi-GG-normal and GG-normal
Normal Semi-GG-normal GG-normal
N(0,σ2)N(0,\sigma^{2}) 𝒩^(0,[σ¯2,σ¯2])\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) 𝒩(0,[σ¯2,σ¯2])\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])
Expectation Linear Sublinear Sublinear
1st-moment Certain (0) Certain (0) Certain (0)
2nd-moment Certain (σ2\sigma^{2}) Uncertain ([σ¯2,σ¯2][\underline{\sigma}^{2},\overline{\sigma}^{2}]) Uncertain ([σ¯2,σ¯2][\underline{\sigma}^{2},\overline{\sigma}^{2}])
3rd-moment Certain (0) Certain (0) Uncertain
Independence Classical (symmetric) Semi-sequential (\overset{\text{S}}{\dashrightarrow}, symmetric) Sequential (\dashrightarrow, asymmetric)
(Setup) X\overset{\text{d}}{=}\bar{X}\overset{\text{d}}{=}N(0,\sigma^{2}), X and \bar{X} classically independent X\overset{\text{d}}{=}\bar{X}\overset{\text{d}}{=}\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]), X\overset{\text{S}}{\dashrightarrow}\bar{X} X\overset{\text{d}}{=}\bar{X}\overset{\text{d}}{=}\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]), X\dashrightarrow\bar{X}
Stability X+X¯=d2XX+\bar{X}\overset{\text{d}}{=}\sqrt{2}X X+X¯=d2XX+\bar{X}\overset{\text{d}}{=}\sqrt{2}X X+X¯=d2XX+\bar{X}\overset{\text{d}}{=}\sqrt{2}X
Multivariate (X,X¯)=dN(𝟎,σ2𝐈2)(X,\bar{X})\overset{\text{d}}{=}N(\bm{0},\sigma^{2}\mathbf{I}_{2}) (X,X¯)=d𝒩^(𝟎,[σ¯2,σ¯2]𝐈2)(X,\bar{X})\overset{\text{d}}{=}\hat{\mathcal{N}}(\bm{0},[\underline{\sigma}^{2},\overline{\sigma}^{2}]\mathbf{I}_{2}) (X,X¯)d𝒩(𝟎,[σ¯2,σ¯2]𝐈2)(X,\bar{X})\overset{\text{d}}{\neq}\mathcal{N}(\bm{0},[\underline{\sigma}^{2},\overline{\sigma}^{2}]\mathbf{I}_{2})

Furthermore, we hope the substructures proposed in this paper will open up new extensions of the discussion on the differences and connections between the GG-expectation framework and the classical one, by providing more details on the operations on distributions and independence on a ground where researchers in both areas can have a proper overlapping intuition. In this way, we are able to have a finer discussion of model uncertainty in the dynamic situation where the volatility part is ambiguous or cannot be precisely determined for data analysts.

Next we will give several possible directions for extending this paper. Interestingly, the discussion in Section 5.5 actually shows the vision that, after introducing the tool of sublinear expectation with representations in different situations, we are able to extend our horizon of statistical questions to include those that could be too complicated to handle under the classical probability system.

5.1 The semi-GG-family of distributions

We can extend our current notion of semi-GG-normal distribution to a broader class of semi-GG-family of distributions or the class of semi-GG-version distributions.

For simplicity, we only provide this notion in one-dimensional case.

Definition 5.1.

A random variable WW follows a semi-GG-version distribution if there exists a maximally distributed 𝒁(Θ)\bm{Z}\sim\mathcal{M}(\Theta) (where Θk\Theta\subset\mathbb{R}^{k} is a bounded, closed and convex set) and a classically distributed ϵ\epsilon, satisfying

Zϵ,Z\dashrightarrow\epsilon,

such that

W=f(Z,ϵ),W=f(Z,\epsilon),

for a Borel measurable function ff satisfying f(Z,ϵ)f(Z,\epsilon)\in\mathcal{H}.

Remark 5.1.1.

The three types of independence in 3.16 can be carried over to members in semi-GG-family of distributions. For instance, with Wi=fi(Zi,ϵi),i=1,2,,nW_{i}=f_{i}(Z_{i},\epsilon_{i}),i=1,2,\dotsc,n, they are called semi-sequentially independent if

Z1Z2Znϵ1ϵ2ϵn.Z_{1}\dashrightarrow Z_{2}\dashrightarrow\cdots\dashrightarrow Z_{n}\dashrightarrow\epsilon_{1}\dashrightarrow\epsilon_{2}\dashrightarrow\cdots\dashrightarrow\epsilon_{n}.
Remark 5.1.2.

Most classical distributions should exist in this framework. To give a quick validation, note that there exists ϵ𝒩(0,[1,1])\epsilon\sim\mathcal{N}(0,[1,1]) which follows the standard normal distribution N(0,1)N(0,1), with the classical cumulative distribution function (cdf) Φ\Phi defined as

Φ(x)[𝟙{ϵx}]=𝔼[𝟙{ϵx}].\Phi(x)\coloneqq\mathcal{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon\leq x}}\right\}}]=\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon\leq x}}\right\}}].

(Such cdf can be defined using the solution of the classical heat equation.) Let

UΦ(ϵ).U\coloneqq\Phi(\epsilon).

Note that Φ\Phi is a bounded and continuous function, so UU\in\mathcal{H}. Then we can check that UU follows the classical Unif[0,1]\text{Unif}[0,1] distribution. Next we can use the classical inverse-cdf method. For any classical distribution with cdf FF (whether continuous or not), let

F1(y)inf{x;F(x)y},F^{-1}(y)\coloneqq\inf\{x;F(x)\geq y\}, (5.1)

denote the generalized inverse of FF. Let XF1(U).X\coloneqq F^{-1}(U). We only need to add suitable conditions on FF so that [|X|]<\mathcal{E}[\lvert X\rvert]<\infty, ensuring XX\in\mathcal{H}. Then we get a random object XX following the distribution with cdf FF.

Remark 5.1.3.

Note that we assume Θ\Theta to be a bounded, closed and convex set for theoretical convenience. In practice, this condition can be weakened. For instance, when we talk about the semi-GG-normal random variable W=VϵW=V\epsilon where V[σ¯,σ¯]V\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]}, the interval [σ¯,σ¯]{[\underline{\sigma},\overline{\sigma}]} can be changed to {σ¯,σ¯}\{\underline{\sigma},\overline{\sigma}\} or {σ¯,(σ¯+σ¯)/2,σ¯}\{\underline{\sigma},(\underline{\sigma}+\overline{\sigma})/2,\overline{\sigma}\}.

Example 5.2.

Here are several special examples of semi-GG-version distributions.

  1. 1.

    Consider Z[σ¯,σ¯]Z\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]}, ϵN(0,1)\epsilon\sim N(0,1) and f(x,y)=xyf(x,y)=xy, then WW follows the semi-GG-normal distribution, whose distributional uncertainty can be characterized by

    {N(0,σ2),σ[σ¯,σ¯]}.\{N(0,\sigma^{2}),\sigma\in{[\underline{\sigma},\overline{\sigma}]}\}.
  2. 2.

    Consider Z=(U,V)([μ¯,μ¯]×[σ¯,σ¯])Z=(U,V)\sim\mathcal{M}([\underline{\mu},\overline{\mu}]\times{[\underline{\sigma},\overline{\sigma}]}), ϵN(0,1)\epsilon\sim N(0,1) and

    W=U+Vϵ.W=U+V\epsilon.

    Then the distributional uncertainty of WW can be described as

    {N(μ,σ2),μ[μ¯,μ¯],σ[σ¯,σ¯]}.\{N(\mu,\sigma^{2}),\mu\in[\underline{\mu},\overline{\mu}],\sigma\in{[\underline{\sigma},\overline{\sigma}]}\}.

    We can also show that [φ(W)]\mathcal{E}[\varphi(W)] can cover a family of normal mixture models.

  3. 3.

    (Semi-GG-exponential) Let Z[λ¯,λ¯]Z\sim\mathcal{M}[\underline{\lambda},\overline{\lambda}] and ϵexp(1)\epsilon\sim\exp(1). Consider

    W=Zϵ,W=Z\epsilon,

    then we can check that the distributional uncertainty of WW can be written as

    {exp(λ),λ[λ¯,λ¯]},\{\exp(\lambda),\lambda\in[\underline{\lambda},\overline{\lambda}]\},

    where each exp(λ)\exp(\lambda) has pdf f(x)=1λex/λ𝟙{x0}f(x)=\frac{1}{\lambda}e^{-x/\lambda}\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{x\geq 0}}\right\}}.

  4. 4.

    (Semi-GG-Bernoulli) With 0p¯p¯10\leq\underline{p}\leq\overline{p}\leq 1, let ϵUnif[0,1]\epsilon\sim\text{Unif}[0,1], and

    Z[p¯,p¯].Z\sim\mathcal{M}[\underline{p},\overline{p}].

    Consider

    W=𝟙{ϵZ<0}.W=\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon-Z<0}}\right\}}.

    Then WW has distributional uncertainty

    {Bern(p),p[p¯,p¯]}.\{\text{Bern}(p),p\in[\underline{p},\overline{p}]\}.
Example 5.3.

In general, we can take advantage of the idea of the classical inverse-cdf method to design the transformation ff. Then we are able to consider any distributional uncertainty in the form of

{F(x;θ),θΘ},\{F(x;\theta),\theta\in\Theta\}, (5.2)

where F(x;θ)F(x;\theta) is the cdf of a classical distribution with parameter θ\theta. Let F1(y;θ)F^{-1}(y;\theta) denote the generalized inverse of F(x;θ)F(x;\theta) as shown in 5.1. Consider 𝒁(Θ)\bm{Z}\sim\mathcal{M}(\Theta) and ϵUnif[0,1]\epsilon\sim\text{Unif}[0,1], and

W=F1(ϵ,𝒁).W=F^{-1}(\epsilon,\bm{Z}).

After we add more conditions on FF such that WW\in\mathcal{H}, WW has distributional uncertainty in the form 5.2, because

[φ(W)]=sup𝒛Θ𝔼[φ(F1(ϵ,z))],\mathcal{E}[\varphi(W)]=\sup_{\bm{z}\in\Theta}\mathbb{E}[\varphi(F^{-1}(\epsilon,z))],

where F1(ϵ,z)F^{-1}(\epsilon,z) follows the distribution with cdf F(x;z)F(x;z).
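
As a small illustration of this construction (our own sketch; the bounds \underline{\lambda}=1, \overline{\lambda}=2 and the grid of 11 parameter values are assumptions), the semi-GG-exponential of Example 5.2 can be generated through the generalized inverse of the exponential cdf, and its upper and lower means can then be approximated by taking a maximum over a grid of the parameter:

    import numpy as np

    rng = np.random.default_rng(2)
    lam_lo, lam_hi, n = 1.0, 2.0, 10**6
    u = rng.uniform(size=n)                         # classical eps ~ Unif[0,1]

    def inv_cdf_exp(u, lam):
        # generalized inverse of the exp(lambda) cdf with pdf (1/lambda) e^{-x/lambda}
        return -lam * np.log1p(-u)

    def upper_expectation(phi, grid):
        # E[phi(W)] approximated by the max over a grid of the parameter
        return max(np.mean(phi(inv_cdf_exp(u, lam))) for lam in grid)

    grid = np.linspace(lam_lo, lam_hi, 11)
    print(upper_expectation(lambda w: w, grid))     # ~ lam_hi  (upper mean)
    print(-upper_expectation(lambda w: -w, grid))   # ~ lam_lo  (lower mean)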

To further study the properties of semi-GG-version distributions and the semi-sequential independence, let ¯s\bar{\mathcal{H}}_{s} denote the semi-GG-family of distributions:

¯s{X:X=f(V,ϵ),V(Θ),ϵ classical,Vϵ}.\bar{\mathcal{H}}_{s}\coloneqq\{X\in\mathcal{H}:X=f(V,\epsilon),V\sim\mathcal{M}(\Theta),\epsilon\text{ classical},V\dashrightarrow\epsilon\}.

Note that ¯s\bar{\mathcal{H}}_{s} satisfies:

  1. 1.

    If X¯sX\in\bar{\mathcal{H}}_{s}, aX¯saX\in\bar{\mathcal{H}}_{s} for any aa\in\mathbb{R},

  2. 2.

    If X¯sX\in\bar{\mathcal{H}}_{s}, |X|¯s\lvert X\rvert\in\bar{\mathcal{H}}_{s},

  3. 3.

    For X,Y¯sX,Y\in\text{$\bar{\mathcal{H}}_{s}$}, if XSYX\overset{\text{S}}{\dashrightarrow}Y, X+Y¯s.X+Y\in\bar{\mathcal{H}}_{s}.

For any X,Y¯sX,Y\in\bar{\mathcal{H}}_{s}, XSYX\overset{\text{S}}{\dashrightarrow}Y is equivalent to YSXY\overset{\text{S}}{\dashrightarrow}X (by the symmetry of semi-sequential independence as illustrated by 3.19), so we can omit the direction between XX and YY and refer to the mutual semi-sequential independence between them as semi-GG-independence.

Definition 5.4.

(Semi-GG-independence) For any X,Y¯sX,Y\in\bar{\mathcal{H}}_{s}, with X=f(Vx,ϵx)X=f(V_{x},\epsilon_{x}) and Y=g(Vy,ϵy)Y=g(V_{y},\epsilon_{y}), XX and YY are semi-GG-independent if

  1. 1.

    (Vx,Vy)(ϵx,ϵy)(V_{x},V_{y})\dashrightarrow(\epsilon_{x},\epsilon_{y}),

  2. 2.

    VxVyV_{x}\dashrightarrow V_{y} (which is equivalent to VyVxV_{y}\dashrightarrow V_{x}),

  3. 3.

    ϵx,ϵy\epsilon_{x},\epsilon_{y} are classically independent.

Definition 5.5.

(Semi-GG-independence of sequence) For a sequence {Xi}i=1n¯s\{X_{i}\}_{i=1}^{n}\subset\bar{\mathcal{H}}_{s} with Xi=fi(Vi,ϵi)X_{i}=f_{i}(V_{i},\epsilon_{i}), they are (mutually) semi-GG-independent if

  1. 1.

    (V1,V2,,Vn)(ϵ1,ϵ2,,ϵn)(V_{1},V_{2},\dotsc,V_{n})\dashrightarrow(\epsilon_{1},\epsilon_{2},\dotsc,\epsilon_{n}),

  2. 2.

    {Vi}i=1n\{V_{i}\}_{i=1}^{n} are GG-version (sequentially) independent (that is, V1V2Vn{V}_{1}\dashrightarrow{V}_{2}\dashrightarrow\cdots\dashrightarrow{V}_{n}),

  3. 3.

    {ϵi}i=1n\{\epsilon_{i}\}_{i=1}^{n} are classically independent.

A sequence {Xi}i=1n\{X_{i}\}_{i=1}^{n} is called semi-GG-version i.i.d. (or semi-GG-i.i.d.) if the XiX_{i} are identically distributed and semi-GG-independent.

In the following context, consider X,Y¯sX,Y\in\bar{\mathcal{H}}_{s}, with X=f(Vx,ϵx)X=f(V_{x},\epsilon_{x}) and Y=g(Vy,ϵy)Y=g(V_{y},\epsilon_{y}).

Proposition 5.6.

If XX and YY are semi-GG-independent, for the joint vector (X,Y)(X,Y), we have for any φCb.Lip\varphi\in C_{\mathrm{b.Lip}},

[φ(X,Y)]=sup(vx,vy)[σ¯x,σ¯x]×[σ¯y,σ¯y]𝔼[φ(f(vx,ϵx),g(vy,ϵy))].\mathcal{E}[\varphi(X,Y)]=\sup_{(v_{x},v_{y})\in[\underline{\sigma}_{x},\overline{\sigma}_{x}]\times[\underline{\sigma}_{y},\overline{\sigma}_{y}]}\mathbb{E}[\varphi(f(v_{x},\epsilon_{x}),g(v_{y},\epsilon_{y}))].
Proof.

This is a direct consequence of the definition of semi-GG-independence. ∎

For any vx[σ¯x,σ¯x]v_{x}\in[\underline{\sigma}_{x},\overline{\sigma}_{x}] and vy[σ¯y,σ¯y]v_{y}\in[\underline{\sigma}_{y},\overline{\sigma}_{y}], let

h1(vx)𝔼[f(vx,ϵx)] and h2(vy)=𝔼[g(vy,ϵy)].h_{1}(v_{x})\coloneqq\mathbb{E}[f(v_{x},\epsilon_{x})]\text{ and }h_{2}(v_{y})=\mathbb{E}[g(v_{y},\epsilon_{y})].

Then

[X]=[h1(Vx)] and [Y]=[h2(Vy)].\mathcal{E}[X]=\mathcal{E}[h_{1}(V_{x})]\text{ and }\mathcal{E}[Y]=\mathcal{E}[h_{2}(V_{y})].

In the following context, for simplicity of discussion, we assume h1,h2h_{1},h_{2} are continuous functions. Then we can take the maximum over the rectangle [σ¯x,σ¯x]×[σ¯y,σ¯y][\underline{\sigma}_{x},\overline{\sigma}_{x}]\times[\underline{\sigma}_{y},\overline{\sigma}_{y}] in 5.6. (This assumption can be relaxed whenever sup\sup does not affect the derivation.) Readers may find that, under the semi-GG-independence, the manipulation of sublinear expectations of semi-GG-version objects becomes quite intuitive and flexible.

Proposition 5.7.

If XX and YY are semi-GG-independent, we have

[X+Y]=[X]+[Y].\mathcal{E}[X+Y]=\mathcal{E}[X]+\mathcal{E}[Y].
Proof.

By 5.6, we have

[X+Y]\displaystyle\mathcal{E}[X+Y] =max(vx,vy)[σ¯x,σ¯x]×[σ¯y,σ¯y]𝔼[f(vx,ϵx)+g(vy,ϵy)]\displaystyle=\max_{(v_{x},v_{y})\in[\underline{\sigma}_{x},\overline{\sigma}_{x}]\times[\underline{\sigma}_{y},\overline{\sigma}_{y}]}\mathbb{E}[f(v_{x},\epsilon_{x})+g(v_{y},\epsilon_{y})]
=max(vx,vy)[σ¯x,σ¯x]×[σ¯y,σ¯y][h1(vx)+h2(vy)]\displaystyle=\max_{(v_{x},v_{y})\in[\underline{\sigma}_{x},\overline{\sigma}_{x}]\times[\underline{\sigma}_{y},\overline{\sigma}_{y}]}[h_{1}(v_{x})+h_{2}(v_{y})]
=maxvxmaxvy[h1(vx)+h2(vy)]\displaystyle=\max_{v_{x}}\max_{v_{y}}[h_{1}(v_{x})+h_{2}(v_{y})]
=maxvx[h1(vx)+maxvyh2(vy)]\displaystyle=\max_{v_{x}}[h_{1}(v_{x})+\max_{v_{y}}h_{2}(v_{y})]
=maxvxh1(vx)+maxvyh2(vy)=[X]+[Y].\displaystyle=\max_{v_{x}}h_{1}(v_{x})+\max_{v_{y}}h_{2}(v_{y})=\mathcal{E}[X]+\mathcal{E}[Y].

Remark 5.7.1.

Compared with 2.25, for X,Y¯sX,Y\in\bar{\mathcal{H}}_{s}, we have one more situation for [X+Y]=[X]+[Y]\mathcal{E}[X+Y]=\mathcal{E}[X]+\mathcal{E}[Y] to hold:

  1. 1.

    either XX or YY has mean-certainty,

  2. 2.

    XYX\dashrightarrow Y or YXY\dashrightarrow X,

  3. 3.

    XX and YY are semi-GG-independent.

Proposition 5.8.

If XX and YY are semi-GG-independent and either one of them has certain mean zero, we have

[XY]=[XY]=0.\mathcal{E}[XY]=-\mathcal{E}[-XY]=0.
Proof.

Since XX and YY are semi-GG-independent, by 5.6,

[XY]\displaystyle\mathcal{E}[XY] =max(vx,vy)[σ¯x,σ¯x]×[σ¯y,σ¯y]𝔼[f(vx,ϵx)g(vy,ϵy)]\displaystyle=\max_{(v_{x},v_{y})\in[\underline{\sigma}_{x},\overline{\sigma}_{x}]\times[\underline{\sigma}_{y},\overline{\sigma}_{y}]}\mathbb{E}[f(v_{x},\epsilon_{x})g(v_{y},\epsilon_{y})]
=max(vx,vy)[σ¯x,σ¯x]×[σ¯y,σ¯y]h1(vx)h2(vy).\displaystyle=\max_{(v_{x},v_{y})\in[\underline{\sigma}_{x},\overline{\sigma}_{x}]\times[\underline{\sigma}_{y},\overline{\sigma}_{y}]}h_{1}(v_{x})h_{2}(v_{y}).

If either one of them has a certain zero mean, say [X]=[X]=0\mathcal{E}[X]=-\mathcal{E}[-X]=0, we have

maxvx[σ¯x,σ¯x]𝔼[f(vx,ϵx)]=minvx[σ¯x,σ¯x]𝔼[f(vx,ϵx)]=0,\max_{v_{x}\in[\underline{\sigma}_{x},\overline{\sigma}_{x}]}\mathbb{E}[f(v_{x},\epsilon_{x})]=\min_{v_{x}\in[\underline{\sigma}_{x},\overline{\sigma}_{x}]}\mathbb{E}[f(v_{x},\epsilon_{x})]=0,

It means h1(vx)=0h_{1}(v_{x})=0 for any vx[σ¯x,σ¯x]v_{x}\in[\underline{\sigma}_{x},\overline{\sigma}_{x}]. Then we must have [XY]=0\mathcal{E}[XY]=0 and similarly [XY]=0-\mathcal{E}[-XY]=0 by changing max\max to min\min. ∎

5.2 The semi-GG-version of central limit theorem

After setting up the semi-GG-family of distributions and semi-sequential independence, it turns out we can prove a semi-GG-version of central limit theorem in this context, which further brings a substructure connecting the classical central limit theorem with the GG-version central limit theorem. It also shows the central role of semi-GG-normal in a semi-GG-version class of distributions.

First we consider a subset of ¯s\bar{\mathcal{H}}_{s}:

s{X¯s:X=Vϵ,V[σ¯,σ¯] with 0σ¯σ¯ and the classical ϵ is standardized},\mathcal{H}_{s}\coloneqq\{X\in\bar{\mathcal{H}}_{s}:X=V\epsilon,V\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]}\text{ with }0\leq\underline{\sigma}\leq\overline{\sigma}\text{ and the classical }\epsilon\text{ is standardized}\},

where we say a classical ϵ\epsilon is standardized if 𝔼[ϵ]=0\mathbb{E}[\epsilon]=0 and 𝔼[ϵ2]=1\mathbb{E}[\epsilon^{2}]=1. Here s\mathcal{H}_{s} can be treated as a class of semi-GG-distributions with zero mean and variance uncertainty.

Our current version of the semi-GG-version of central limit theorem can be formulated as follows.

Theorem 5.9.

For any sequence {Xi}i=1n={Viηi}i=1ns\{X_{i}\}_{i=1}^{n}=\{V_{i}\eta_{i}\}_{i=1}^{n}\subset\mathcal{H}_{s} that are semi-GG-i.i.d. with certain zero mean and uncertain variance:

σ¯2=[X12][X12]=σ¯2 with 0σ¯σ¯,\underline{\sigma}^{2}=-\mathcal{E}[-X_{1}^{2}]\leq\mathcal{E}[X_{1}^{2}]=\overline{\sigma}^{2}\text{ with }0\leq\underline{\sigma}\leq\overline{\sigma},

we have

1ni=1nXidW,\frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i}\overset{\text{d}}{\longrightarrow}W,

where W𝒩^(0,[σ¯2,σ¯2])W\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]). To be specific, for any bounded and continuous φ\varphi, we have

limn[φ(1ni=1nXi)]=[φ(W)].\lim_{n\to\infty}\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i})]=\mathcal{E}[\varphi(W)]. (5.3)
Remark 5.9.1.

Note that any φCb.Lip\varphi\in C_{\mathrm{b.Lip}} must be a bounded and continuous one, so the convergence in distribution (2.5) must hold.

Remark 5.9.2.

From a classical perspective on 5.9, by using the representation of the semi-GG-normal under semi-GG-independence, we have

limnsup𝝈[σ¯,σ¯]nE[φ(1ni=1nσiϵi)]=supv[σ¯,σ¯]E[φ(vϵ)],\lim_{n\to\infty}\sup_{\bm{\sigma}\in[\underline{\sigma},\overline{\sigma}]^{n}}E[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sigma_{i}\epsilon_{i})]=\sup_{v\in[\underline{\sigma},\overline{\sigma}]}E[\varphi(v\epsilon^{*})], (5.4)

where ϵN(0,1)\epsilon^{*}\sim N(0,1) and 𝝈=(σ1,,σn)\bm{\sigma}=(\sigma_{1},\dotsc,\sigma_{n}) is a scalar vector. This form is also equivalent to

limnsup𝝈Sn[σ¯,σ¯]E[φ(1ni=1nσiϵi)]=supv[σ¯,σ¯]E[φ(vϵ)],\lim_{n\to\infty}\sup_{\bm{\sigma}\in S_{n}{[\underline{\sigma},\overline{\sigma}]}}E[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sigma_{i}\epsilon_{i})]=\sup_{v\in[\underline{\sigma},\overline{\sigma}]}E[\varphi(v\epsilon^{*})],

where 𝝈\bm{\sigma} could be any hidden process taking values in [σ¯,σ¯]{[\underline{\sigma},\overline{\sigma}]} that is independent of (ϵ1,ϵ2,,ϵn)(\epsilon_{1},\epsilon_{2},\dotsc,\epsilon_{n}). When the unknown variance form is taken in this way, the uncertainty in the behavior of the normalized summation can be asymptotically characterized by the semi-GG-normal.

If 𝝈\bm{\sigma} is chosen from a larger family that may involve dependence between σi\sigma_{i} and the previous (ϵj,j<i)(\epsilon_{j},j<i), then it will be related to the GG-version central limit theorem (under sequential independence, rather than the semi-GG-version independence): when the XiX_{i} are sequentially independent,

limn[φ(1ni=1nXi)]=[φ(WG)],\lim_{n\to\infty}\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i})]=\mathcal{E}[\varphi(W^{G})],

which gives us

limnsup𝝈n[σ¯,σ¯]E[φ(1ni=1nσiϵi)]=[φ(WG)].\lim_{n\to\infty}\sup_{\bm{\sigma}\in\mathcal{L}_{n}^{*}{[\underline{\sigma},\overline{\sigma}]}}E[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sigma_{i}\epsilon_{i})]=\mathcal{E}[\varphi(W^{G})].

To summarize, the semi-GG-normal distribution can be treated as the attractor for normalized summations of semi-GG-i.i.d. random variables, while the GG-normal is the attractor for normalized summations of GG-version i.i.d. random variables.
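
The contrast between the two attractors can be made tangible by a small Monte Carlo sketch in Python (our own illustration; the bounds \underline{\sigma}=0.5, \overline{\sigma}=1, the sample sizes and the particular feedback rule are assumptions). When \sigma_{i} is generated without looking at the \epsilon sequence, the third moment of the normalized sum stays near zero, as the semi-GG-normal attractor requires; one admissible feedback strategy in which \sigma_{i} reacts to the running sum pushes the third moment strictly above zero, towards what the GG-normal uncertainty set allows.

    import numpy as np

    rng = np.random.default_rng(3)
    sigma_lo, sigma_hi, n, reps = 0.5, 1.0, 200, 20000

    def third_moment(feedback: bool) -> float:
        eps = rng.standard_normal((reps, n))
        s = np.zeros(reps)
        for i in range(n):
            if feedback:
                # one admissible sequential strategy: sigma_i reacts to the running sum
                sig = np.where(s > 0, sigma_hi, sigma_lo)
            else:
                # sigma_i drawn without looking at the eps sequence (semi-G-type scenario)
                sig = rng.uniform(sigma_lo, sigma_hi, size=reps)
            s = s + sig * eps[:, i]
        s = s / np.sqrt(n)
        return float(np.mean(s**3))

    print(third_moment(feedback=False))   # close to 0
    print(third_moment(feedback=True))    # clearly positive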

In the proof of 5.9, we adapt the idea of the Lindeberg method in a “leave-one-out” manner (Breiman, (1992)) to the sublinear context. One of the reasons that we are able to make such an adaptation is the symmetry in semi-GG-independence: XiX_{i} is semi-GG-independent from {Xj,ji}\{X_{j},j\neq i\} (note that we cannot make such an adaptation under sequential independence due to its asymmetry). More details of the proof can be found in Section 6.6.

Since we only assume a finite second moment so far in 5.9, by adding stronger moment conditions on XnX_{n}, the function space of φ\varphi can be taken to be Cl.LipC_{\mathrm{l.Lip}} to include unbounded functions. This statement is based on 2.18.

As a basic example, given a stronger condition [|X1|3]<\mathcal{E}[\lvert X_{1}\rvert^{3}]<\infty, we can check that the convergence 5.3 holds for φ(x)=x3\varphi(x)=x^{3} by direct computation (5.10).

Example 5.10 (Check φ(x)=x3\varphi(x)=x^{3}).

In the convergence 5.3, since [W3]=0\mathcal{E}[W^{3}]=0, we only need to show:

limn[(1ni=1nXi)3]=0.\lim_{n\to\infty}\mathcal{E}[(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i})^{3}]=0.

In fact,

[(1ni=1nXi)3]\displaystyle\mathcal{E}[(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i})^{3}] =n3/2[(i=1nXi)3]\displaystyle=n^{-3/2}\mathcal{E}[(\sum_{i=1}^{n}X_{i})^{3}]
=n3/2[i=1nXi3+ij or jkXiXjXk].\displaystyle=n^{-3/2}\mathcal{E}[\sum_{i=1}^{n}X_{i}^{3}+\sum_{i\neq j\text{ or }j\neq k}X_{i}X_{j}X_{k}].

(Note that if i=ji=j and j=kj=k, the summand reduces to the first term, so such indices are excluded from the second sum.) For the summand XiXjXkX_{i}X_{j}X_{k} of the second term, without loss of generality, we assume that ijki\leq j\leq k. Then we have three cases:

  1. 1.

    i<j=ki<j=k,

  2. 2.

    i=j<ki=j<k,

  3. 3.

    i<j<ki<j<k.

In Case 1, since XiX_{i} and XjX_{j} are semi-GG-independent, we have:

[XiXj2]\displaystyle\mathcal{E}[X_{i}X_{j}^{2}] =max(vi,vj)𝔼[vivj2ηiηj2]\displaystyle=\max_{(v_{i},v_{j})}\mathbb{E}[v_{i}v_{j}^{2}\eta_{i}\eta_{j}^{2}]
=max(vi,vj)vivj2𝔼[ηi]𝔼[ηj2]=0.\displaystyle=\max_{(v_{i},v_{j})}v_{i}v_{j}^{2}\mathbb{E}[\eta_{i}]\mathbb{E}[\eta_{j}^{2}]=0.

(Note that [XiXj2]=0\mathcal{E}[X_{i}X_{j}^{2}]=0 does not hold under sequential independence XiXjX_{i}\dashrightarrow X_{j} by 2.7.) Meanwhile, we can obtain [XiXj2]=0-\mathcal{E}[-X_{i}X_{j}^{2}]=0 so XiXj2X_{i}X_{j}^{2} has certain mean zero.

We can similarly prove the result in Case 2, that is, Xi2XkX_{i}^{2}X_{k} has certain mean zero. For Case 3, since Xi,Xj,XkX_{i},X_{j},X_{k} are semi-GG-independent, we have

[XiXjXk]=max(vi,vj,vk)vivjvk𝔼[ηi]𝔼[ηj]𝔼[ηk]=0.\mathcal{E}[X_{i}X_{j}X_{k}]=\max_{(v_{i},v_{j},v_{k})}v_{i}v_{j}v_{k}\mathbb{E}[\eta_{i}]\mathbb{E}[\eta_{j}]\mathbb{E}[\eta_{k}]=0.

We further have [XiXjXk]=0-\mathcal{E}[-X_{i}X_{j}X_{k}]=0 using the same logic. Therefore,

[(1ni=1nXi)3]=n3/2[i=1nXi3]=n1/2[X13]0,\mathcal{E}[(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i})^{3}]=n^{-3/2}\mathcal{E}[\sum_{i=1}^{n}X_{i}^{3}]=n^{-1/2}\mathcal{E}[X_{1}^{3}]\to 0,

where we use the condition that [|X1|3]<\mathcal{E}[\lvert X_{1}\rvert^{3}]<\infty and 5.7.

5.3 Fine structures of independence and the associated family of state-space volatility models

In 4.9, we have mainly discussed the independence between two semi-GG-distributed objects. Here we consider three of them as a starting point to discuss a much finer structure of independence.

Consider Wi=Viϵi=d𝒩^(0,[σ¯2,σ¯2]),i=1,2,3W_{i}=V_{i}\epsilon_{i}\overset{\text{d}}{=}\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]),i=1,2,3. The independence structure among them is essentially related to the GG-version independence among ViV_{i} and ϵi\epsilon_{i}, i=1,2,3i=1,2,3. For instance,

  • (a)

    V1V2V3ϵ1ϵ2ϵ3V_{1}\dashrightarrow V_{2}\dashrightarrow V_{3}\dashrightarrow\epsilon_{1}\dashrightarrow\epsilon_{2}\dashrightarrow\epsilon_{3},

  • (b)

    V1ϵ1V2ϵ2V3ϵ3V_{1}\dashrightarrow\epsilon_{1}\dashrightarrow V_{2}\dashrightarrow\epsilon_{2}\dashrightarrow V_{3}\dashrightarrow\epsilon_{3}.

Note that a) is equivalent to W1SW2SW3W_{1}\overset{\text{S}}{\dashrightarrow}W_{2}\overset{\text{S}}{\dashrightarrow}W_{3} and b) means W1FW2FW3W_{1}\overset{\text{F}}{\dashrightarrow}W_{2}\overset{\text{F}}{\dashrightarrow}W_{3} which implies W1W2W3W_{1}\dashrightarrow W_{2}\dashrightarrow W_{3}.

Then we can see that there are several middle stages between (a) and (b). In order to present these intermediate stages, let us play a simple game: switch two components each time and change the independence structure from (a) to (b). During this game, the following rules are required:

  • R1

    we must keep the independence ViϵiV_{i}\dashrightarrow\epsilon_{i} due to the definition of semi-GG-normal,

  • R2

    we must keep the order as (V1,V2,V3)(V_{1},V_{2},V_{3}) and (ϵ1,ϵ2,ϵ3(\epsilon_{1},\epsilon_{2},\epsilon_{3}), because the independence order of elements within each vector is usually equivalent. Otherwise, if we break this order, we need an unnecessary extra step to retrieve the index order (1,2,3)(1,2,3) to be consistent with (b).

Here we can proceed in two ways, which share the same first step:

  1. 1.

    Since we do not want to break the order within (V1,V2,V3)(V_{1},V_{2},V_{3}) or (ϵ1,ϵ2,ϵ3(\epsilon_{1},\epsilon_{2},\epsilon_{3}), the first step has to be switching some ViV_{i} with ϵj\epsilon_{j} with i,j=1,2,3i,j=1,2,3. For the ϵ\epsilon part, we can only move ϵ1\epsilon_{1} due to R1, and similarly for VV part we can only move V3V_{3}. Hence, the first step is to exchange V3V_{3} and ϵ1\epsilon_{1} in (a) to get

    V1V2ϵ1V3ϵ2ϵ3.V_{1}\dashrightarrow V_{2}\dashrightarrow\epsilon_{1}\dashrightarrow V_{3}\dashrightarrow\epsilon_{2}\dashrightarrow\epsilon_{3}. (5.5)

    Then we have two equivalent ways to move on.

  2. 2.

    One way is to exchange V2V_{2} and ϵ1\epsilon_{1} in 5.5 to get

    V1ϵ1V2V3ϵ2ϵ3.V_{1}\dashrightarrow\epsilon_{1}\dashrightarrow V_{2}\dashrightarrow V_{3}\dashrightarrow\epsilon_{2}\dashrightarrow\epsilon_{3}. (5.6)

    Then we can exchange V3V_{3} and ϵ2\epsilon_{2} to get (b).

  3. 3.

    Another way is to exchange V3V_{3} and ϵ2\epsilon_{2} to get

    V1V2ϵ1ϵ2V3ϵ3.V_{1}\dashrightarrow V_{2}\dashrightarrow\epsilon_{1}\dashrightarrow\epsilon_{2}\dashrightarrow V_{3}\dashrightarrow\epsilon_{3}. (5.7)

    Then we can exchange V2V_{2} and ϵ1\epsilon_{1} to get (b).

Note that 5.6 implies the following relation:

W1(W2,W3) and W2SW3.W_{1}\dashrightarrow(W_{2},W_{3})\text{ and }W_{2}\overset{\text{S}}{\dashrightarrow}W_{3}.

We can show that the family of models associated with the representation of [φ(W1,W2,W3)]\mathcal{E}[\varphi(W_{1},W_{2},W_{3})] under 5.6 can be illustrated by Figure 5.2. Similarly, 5.7 implies

(W1,W2)W3 and W1SW2.(W_{1},W_{2})\dashrightarrow W_{3}\text{ and }W_{1}\overset{\text{S}}{\dashrightarrow}W_{2}.

The family of models associated with 5.7 can be described by Figure 5.3. The family of models for 5.5 can be shown by Figure 5.1.

The intuition here is: if all the VjV_{j}’s come before ϵi\epsilon_{i}, since ϵi\epsilon_{i} has distributional certainty, σj\sigma_{j} has no effect on ϵi\epsilon_{i} in the directed graph. As long as ϵi\epsilon_{i} comes before VjV_{j} in the order of the GG-version independence, we must add the edge from ϵi\epsilon_{i} to σj\sigma_{j} in the directed graph of the family of models that represents the sublinear expectation of the joint vector.

We can see that, by changing the independence structure, the sublinear expectation of a joint vector of semi-GG-version distributions can be represented by classes of state-space models with different graphical structures.
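
As a small illustration (our own sketch with assumed bounds \underline{\sigma}=0.5, \overline{\sigma}=1), one admissible classical scenario from the family associated with 5.6 (Figure 5.2) can be simulated as follows: \sigma_{1} is a free constant choice, while \sigma_{2} and \sigma_{3} are allowed to react to the first observation Y_{1} but not to Y_{2}.

    import numpy as np

    rng = np.random.default_rng(4)
    sigma_lo, sigma_hi, reps = 0.5, 1.0, 10**5

    eps = rng.standard_normal((reps, 3))
    sig1 = np.full(reps, sigma_hi)                # a free constant choice in [sigma_lo, sigma_hi]
    y1 = sig1 * eps[:, 0]
    sig2 = np.where(y1 > 0, sigma_hi, sigma_lo)   # may react to Y_1
    y2 = sig2 * eps[:, 1]
    sig3 = np.where(y1 > 0, sigma_lo, sigma_hi)   # may react to Y_1, but not to Y_2
    y3 = sig3 * eps[:, 2]

    # Each such scenario is one classical model inside the uncertainty set, so any
    # sample mean below is a lower bound for the corresponding sublinear expectation.
    print(np.mean(y1 * y2**2))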

One question to be explored is whether there is an independence structure that is associated with the family shown in Figure 5.4. Our conjecture is as follows: at least we need the following conditions,

  • 1)

    V1ϵ1V2ϵ2V_{1}\dashrightarrow\epsilon_{1}\dashrightarrow V_{2}\dashrightarrow\epsilon_{2} which means W1W2W_{1}\dashrightarrow W_{2},

  • 2)

    V2ϵ2V3ϵ3V_{2}\dashrightarrow\epsilon_{2}\dashrightarrow V_{3}\dashrightarrow\epsilon_{3} which means W2W3W_{2}\dashrightarrow W_{3},

  • 3)

    V1V3ϵ1ϵ3V_{1}\dashrightarrow V_{3}\dashrightarrow\epsilon_{1}\dashrightarrow\epsilon_{3} which means W1SW3W_{1}\overset{\text{S}}{\dashrightarrow}W_{3}.

[Figures 5.1–5.4 are directed graphs on the hidden volatilities σ1, σ2, σ3 and the observations Y1, Y2, Y3, where each Yi is driven by its own ϵi; only the edge sets differ across the four structures.]
Figure 5.1: Diagram for 5.5
Figure 5.2: Diagram for 5.6
Figure 5.3: Diagram for 5.7
Figure 5.4: Diagram for the common structure of classical first-order hidden Markov models with feedback

5.4 A robust confidence interval for regression under heteroskedastic noise with unknown variance structure

Let {Wi}i=1\{W_{i}\}_{i=1}^{\infty} denote a sequence of nonlinearly i.i.d. semi-GG-normally distributed random variables with W1𝒩^(0,[σ¯2,σ¯2])W_{1}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]). In Section 4.1 we studied the GG-EM procedure, which is aimed at the following expression:

[φ(i=1naiWi)].\mathcal{E}[\varphi(\sum_{i=1}^{n}a_{i}W_{i})]. (5.8)

This section will provide a basic example in the context of regression to show why we need to think about 5.8 in statistical practice.

Consider a simple linear regression problem in the context of sequential data (xi,Yi)(x_{i},Y_{i}) (where the order of the data matters):

Yi=β0+β1xi+ξi,i=1,2,,n,Y_{i}=\beta_{0}+\beta_{1}x_{i}+\xi_{i},i=1,2,\dotsc,n, (5.9)

where xix_{i} is treated as known and ξi=σiϵi\xi_{i}=\sigma_{i}\epsilon_{i} with σi:Ω[σ¯,σ¯]\sigma_{i}:\Omega\to{[\underline{\sigma},\overline{\sigma}]} and ϵiN(0,1)\epsilon_{i}\sim N(0,1) for each i=1,2,,ni=1,2,\dotsc,n. We can see that the noise part ξi\xi_{i} is heteroskedastic (although σi\sigma_{i} is not observable). However, if the variance structure of the noise part ξi\xi_{i} is complicated due to measurement errors, or the data are collected from different subpopulations with different variances, we need to take some precautions regarding the properties of the least-squares estimator β^1\hat{\beta}_{1}, especially when we lack prior knowledge on the dynamics of σi\sigma_{i}. If we worry that σi\sigma_{i} may depend on the previous ϵk\epsilon_{k} with k<ik<i, rather than assuming a single probabilistic model for σi\sigma_{i} and then performing the regression, in an early stage of data analysis we can first assume that 𝝈=(σi)i=1n\bm{\sigma}=(\sigma_{i})_{i=1}^{n} could be any element of n[σ¯,σ¯]\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]} defined in Section 3.7. Note that the distributional uncertainty of each ξi\xi_{i} can be described by Wi𝒩^(0,[σ¯2,σ¯2])W_{i}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]). Then the distributional uncertainty of 5.9 can be translated into a GG-version format:

YGi=β0+β1xi+Wi,i=1,2,,n.Y^{G}_{i}=\beta_{0}+\beta_{1}x_{i}+W_{i},i=1,2,\dotsc,n. (5.10)

Let

aixix¯n(xix¯n)2.a_{i}\coloneqq\frac{x_{i}-\bar{x}_{n}}{\sum(x_{i}-\bar{x}_{n})^{2}}.

Then the least-squares estimator can be written as

β^1=(xix¯n)YGi(xix¯n)2=aiYGi=β0ai+aixiβ1+aiWi=β1+aiWi.\hat{\beta}_{1}=\frac{\sum(x_{i}-\bar{x}_{n})Y^{G}_{i}}{\sum(x_{i}-\bar{x}_{n})^{2}}=\sum a_{i}Y^{G}_{i}=\beta_{0}\sum a_{i}+\sum a_{i}x_{i}\beta_{1}+\sum a_{i}W_{i}=\beta_{1}+\sum a_{i}W_{i}. (5.11)

Then we have β^1β1=aiWi.\hat{\beta}_{1}-\beta_{1}=\sum a_{i}W_{i}. Note that [β^1]=[β^1]=β1\mathcal{E}[\hat{\beta}_{1}]=-\mathcal{E}[-\hat{\beta}_{1}]=\beta_{1}.

Then we are able to study the properties of β^1\hat{\beta}_{1} by assigning different forms of φ\varphi in 5.8:

  1. 1.

    With φ(x)=xk\varphi(x)=x^{k} and k+k\in\mathbb{N}_{+}, we have the centred moments of β^1\hat{\beta}_{1}

    [φ(aiWi)]=[(β^1β1)k].\mathcal{E}[\varphi(\sum a_{i}W_{i})]=\mathcal{E}[(\hat{\beta}_{1}-\beta_{1})^{k}].
  2. 2.

    With φ(x)=𝟙{|x|>c}\varphi(x)=\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\lvert x\rvert>c}}\right\}}, we get the object that is useful to derive a confidence interval in this context:

    [φ(aiWi)]=𝐕(|β^1β1|>c).\mathcal{E}[\varphi(\sum a_{i}W_{i})]=\mathbf{V}(\lvert\hat{\beta}_{1}-\beta_{1}\rvert>c). (5.12)

Interestingly, from 3.24 and 3.26, 5.12 further leads us to a robust confidence interval by solving the following equation:

𝐕(|β^1β1|>cα/2)=sup𝝈n[σ¯,σ¯](|aiσiϵi|>cα/2)=α,\mathbf{V}(\lvert\hat{\beta}_{1}-\beta_{1}\rvert>c_{\alpha/2})=\sup_{\bm{\sigma}\in\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{P}(\lvert\sum a_{i}\sigma_{i}\epsilon_{i}\rvert>c_{\alpha/2})=\alpha,

or

inf𝝈n[σ¯,σ¯](|aiσiϵi|cα/2)=1α.\inf_{\bm{\sigma}\in\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{P}(\lvert\sum a_{i}\sigma_{i}\epsilon_{i}\rvert\leq c_{\alpha/2})=1-\alpha.

The resulting confidence interval is robust in the sense that its coverage rate will be at least 1α1-\alpha regardless of the unknown variance structure of the noise part σiϵi\sigma_{i}\epsilon_{i} in the regression. If we have more information showing that σk\sigma_{k} does not depend on the previous ϵi\epsilon_{i} with i<ki<k, we can consider the smaller family 𝒮n[σ¯,σ¯]\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}. Alternatively, this also provides a way to perform a sensitivity analysis on the performance of a regression estimator (such as β^1\hat{\beta}_{1} here) under heteroskedastic noise with an unknown variance structure that could belong to different families of models.
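
As a rough numerical sketch of this idea (our own illustration; the design points, the bounds \underline{\sigma}=0.5, \overline{\sigma}=1, the level \alpha=0.05 and the small menu of candidate volatility strategies are all assumptions), one can approximate the worst-case tail of |\sum a_{i}\sigma_{i}\epsilon_{i}| by Monte Carlo over a few candidate strategies from \mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]} (constant rules plus a feedback rule reacting to the running weighted sum) and read off a candidate c_{\alpha/2}. Since only a subfamily of \mathcal{L}_{n} is searched, the resulting critical value approximates the exact worst case from below; a finer search (or the GG-EM procedure of Section 4.1) would be needed to approach the full sup.

    import numpy as np

    rng = np.random.default_rng(5)
    sigma_lo, sigma_hi, alpha, reps = 0.5, 1.0, 0.05, 10**5
    x = np.linspace(0.0, 1.0, 50)                     # assumed design points x_i
    a = (x - x.mean()) / np.sum((x - x.mean())**2)    # least-squares weights a_i

    def weighted_sums(rule: str) -> np.ndarray:
        eps = rng.standard_normal((reps, a.size))
        s = np.zeros(reps)
        for i in range(a.size):
            if rule == "hi":
                sig = sigma_hi
            elif rule == "lo":
                sig = sigma_lo
            else:                                     # feedback on the running weighted sum
                sig = np.where(s > 0, sigma_hi, sigma_lo)
            s = s + a[i] * sig * eps[:, i]
        return s

    c = max(np.quantile(np.abs(weighted_sums(r)), 1 - alpha)
            for r in ("hi", "lo", "feedback"))
    print(c)   # candidate c_{alpha/2}; the interval is then beta1_hat +/- c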

Then this discussion leads us to another interesting question. In an early stage of data analysis, should we choose 𝝈𝒮n[σ¯,σ¯]\bm{\sigma}\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]} or 𝝈n[σ¯,σ¯]\bm{\sigma}\in\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}? This question will be explored in Section 5.5.

5.5 Inference on the general model structure of a state-space volatility model

Recall the setup in Section 5.4. In practice, if we lack knowledge of the underlying dynamics of the dataset, whether we should choose 𝝈𝒮n[σ¯,σ¯]\bm{\sigma}\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]} or 𝝈n[σ¯,σ¯]\bm{\sigma}\in\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]} is a difficult problem in classical statistical methodology (in model specification) because both families involve an infinite-dimensional family of elements. However, it turns out that it can be essentially transformed into a GG-version question: it has a feasible solution once we introduce the GG-expectation of the semi-GG-family of distributions. This becomes a hypothesis test to distinguish between semi-sequential independence and sequential independence. To be specific, we are able to consider a test:

H0:𝝈𝒮n[σ¯,σ¯] vs Ha:𝝈n[σ¯,σ¯]𝒮n[σ¯,σ¯].H_{0}:\bm{\sigma}\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}\textbf{ vs }H_{a}:\bm{\sigma}\in\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}\setminus\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}.

A good interpretation of this test is the following: the class of hidden Markov models (with volatility as the switching regime) belongs to 𝒮\mathcal{S}. If we reject the null hypothesis, it means the underlying 𝝈\bm{\sigma} process cannot be treated as a switching regime in the hidden Markov setup (or in any other kind of normal mixture model); we then need to re-investigate the dataset and consider 𝝈\bm{\sigma} processes outside the family of normal mixture models (for instance, we may need to introduce other dependencies, such as one between the previous observations Y<tY_{<t} and the current σt\sigma_{t}, as in a feedback design). Throughout this discussion, we did not make any parametric assumption on the model of 𝝈\bm{\sigma}, and we are still able to give a rigorous test of this distinction. The idea of this test takes advantage of 3.24 to transform the distinction between two families of classical models into the task of distinguishing two different types of independence (S\overset{\text{S}}{\dashrightarrow} versus \dashrightarrow) for the semi-GG-normal vector (W1,W2,,Wn)(W_{1},W_{2},\dotsc,W_{n}). There are plenty of test functions φ\varphi (neither convex nor concave) that reveal the difference between S\overset{\text{S}}{\dashrightarrow} and \dashrightarrow in the sense that

S[φ(W1,W2,,Wn)]<L[φ(W1,W2,,Wn)].\mathcal{E}^{S}[\varphi(W_{1},W_{2},\dotsc,W_{n})]<\mathcal{E}^{L}[\varphi(W_{1},W_{2},\dotsc,W_{n})].

For instance, we can choose

φ(xi,i=1,2,,n)=(1ni=1nxi)3.\varphi(x_{i},i=1,2,\dotsc,n)=(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}x_{i})^{3}.

Under this φ\varphi, the expectation under S\overset{\text{S}}{\dashrightarrow} is a certain zero but the one under \dashrightarrow is greater than zero. Then we should be able to construct a test statistic based on the form of φ\varphi and obtain a rejection region by studying its tail probability under 𝐕\mathbf{V} which can be transformed back into the sublinear expectation of (W1,W2,,Wn)(W_{1},W_{2},\dotsc,W_{n}).
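
A minimal sketch of such a construction (our own illustration, not the formal test of this section; the level \alpha=0.05 and the use of the \overline{\sigma}-normal quantile are assumptions) is the following conservative rule: under H_{0}, conditionally on \bm{\sigma}, the normalized sum is a classical normal with variance at most \overline{\sigma}^{2}, so its upper tail is dominated by the N(0,\overline{\sigma}^{2}) tail, and rejecting when the cubed normalized sum exceeds (\overline{\sigma}z_{1-\alpha})^{3} keeps the size below \alpha for every \bm{\sigma}\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}.

    import numpy as np
    from statistics import NormalDist

    def test_statistic(w: np.ndarray) -> float:
        # phi(x_1,...,x_n) = (n^{-1/2} * sum x_i)^3 evaluated on the observed sequence
        return float((w.sum() / np.sqrt(w.size))**3)

    def reject_H0(w: np.ndarray, sigma_hi: float, alpha: float = 0.05) -> bool:
        # conservative cutoff: under H0 the upper tail is dominated by N(0, sigma_hi^2)
        cutoff = (sigma_hi * NormalDist().inv_cdf(1 - alpha))**3
        return test_statistic(w) > cutoff

    # a quick check under an H0-style scenario (sigma drawn without looking at eps)
    rng = np.random.default_rng(6)
    n, sigma_lo, sigma_hi = 400, 0.5, 1.0
    w0 = rng.uniform(sigma_lo, sigma_hi, n) * rng.standard_normal(n)
    print(reject_H0(w0, sigma_hi))   # rejects with probability at most alpha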

How to choose the test function will have a significant effect on the performance of this hypothesis test. Moreover, the current interpretation of nn is the length of the whole data sequence, and 𝝈\bm{\sigma} is the unknown volatility dynamic of the full sequence. We can also interpret nn as the group size after grouping the dataset in either a non-overlapping or an overlapping manner; we can then consider 𝝈\bm{\sigma} for each group to test whether there is a case falling into the class of HaH_{a}, because the sublinear expectation can give a control on the extremes of the group statistics, as indicated by Jin and Peng, (2021) and Section 2.2 in Fang et al., (2019).

Acknowledgements and the story behind the semi-GG-normal

We have received many useful suggestions and feedback from the community in the past four years which are beneficial to the formation of this paper (so this paper can be treated as a report to the community).

The authors would like to first express their sincere thanks to Prof. Shige Peng who visited our department in May 2017 (invited by Prof. Reg Kulperger) and our discussion at that time motivated us to study a distributional and probabilistic structure that has a direct connection with the intuition behind the existing max-mean estimation proposed by Jin and Peng, (2021). Later on during the Fields-China Industrial Problem Solving Workshop in Finance, and a short course on the GG-expectation framework given by Prof. Peng at Fields Institute, Toronto, Canada, we had several interesting discussions on the data experiments in this context, which can be treated as the starting point of the companion paper of the current one. In our regular discussion notes in that period, there was a prototype of the current semi-GG-normal distribution and also a question on independence between semi-GG-normal was raised which is currently included and answered by 4.9.

Although the design of the semi-GG-normal is mainly for distributional purposes, this concept was first proposed in Li and Kulperger, (2018), where it was applied to design an iterative approximation towards the GG-normal distribution by starting from the linear expectations of classical normals, as discussed in Section 4.1. During the 2018 Young Researcher Meeting on BSDEs, Nonlinear Expectations and Mathematical Finance at Shanghai Jiao Tong University, we received beneficial feedback on this iterative method from participants in the conference. In particular, we would like to thank Prof. Yiqing Lin for providing more references with potential theoretical connections. 4.1.2 has benefited from the comments by Prof. Shuzhen Yang and Prof. Xinpeng Li.

The first author would also like to express his gratitude to Prof. Huaxiong Huang at the Fields Institute and his Ph.D. student Nathan Gold for their support and suggestions during a separate long-term and challenging joint project (with regular discussions with Prof. Peng) during summer 2017 on a stochastic method for the GG-heat equation in the high-dimensional case and its theoretical and numerical convergence. In this project, the first author learned how to appropriately design a switching rule in the stochastic volatility to approximate the solution of a fully nonlinear PDE, which is related to methods based on BSDEs and second-order BSDEs, as well as the intuition behind nonlinear expectations in this context. Although the methods are different, this experience created another motivation for Li and Kulperger, (2018).

The authors are grateful for the valuable discussions with the community during the conference of Probability, Uncertainty and Quantitative Risk in July 2019. One of the motivations of 4.9 comes from the comments by Prof. Mingshang Hu on the independence property of the maximal distribution. The writing of Section 4.4 was motivated by the discussions with Prof. Peng during the conference. Section 4.4, and further the data experiments in the companion paper, have also benefited from the discussions with Prof. Jianfeng Zhang on the meaning of sequential independence under a set of measures.

We have also benefited from the feedback of participants from various backgrounds at the Annual Meetings of the SSC (Statistical Society of Canada) in 2018 and 2019, which helped us understand the impression of a general audience of the GG-expectation framework. During the poster session of the Annual Meeting of the SSC at McGill University in 2018, we received several positive comments about designing a substructure connecting the GG-expectation framework (which is a highly technical one for a general audience) with the objects in the classical system. These comments further motivated us to write this paper for general readers. At the Annual Meeting of the SSC at the University of Calgary in 2019, there was a comment from the audience on the property of \mathcal{H} and the choice of function space (\mathcal{H} could be quite small if we choose a large function space for \varphi). It motivated us to improve the preliminary setup (Section 2) and put more attention on the design of \mathcal{H}.

During the improvement of this manuscript from the first version (April 2021) to the third version (October 2021), the authors are grateful to Prof. Defei Zhang, who gave many beneficial comments (such as the comment on the product space and an improvement of Figure 4.1), and Prof. Xinpeng Li, whose suggestion motivated us to develop the research in Section 5.2.

6 Proofs

6.1 Proofs in Section 3.2

Proof of 3.2.

The finiteness of \mathbb{E}[\lvert\varphi(V)\rvert] is obvious due to the continuity of \varphi and the compactness of {[\underline{\sigma},\overline{\sigma}]}. First of all, note that 3.2 is a direct result of 3.1. It is also not hard to see 3.5, since for any \sigma\in\mathcal{D}{[\underline{\sigma},\overline{\sigma}]}, we have \mathbb{P}_{\sigma}({[\underline{\sigma},\overline{\sigma}]})=1, and then

𝔼[φ(σ)]\displaystyle\mathbb{E}[\varphi(\sigma)] =σ¯σ¯φ(x)σ(dx)σ¯σ¯maxx[σ¯,σ¯]φ(x)σ(dx)\displaystyle=\int_{\underline{\sigma}}^{\overline{\sigma}}\varphi(x)\mathbb{P}_{\sigma}(\mathop{}\!\mathrm{d}x)\leq\int_{\underline{\sigma}}^{\overline{\sigma}}\max_{x\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(x)\mathbb{P}_{\sigma}(\mathop{}\!\mathrm{d}x)
=maxx[σ¯,σ¯]φ(x)σ([σ¯,σ¯])=maxx[σ¯,σ¯]φ(x),\displaystyle=\max_{x\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(x)\mathbb{P}_{\sigma}({[\underline{\sigma},\overline{\sigma}]})=\max_{x\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(x),

which implies

maxσ𝒟[σ¯,σ¯]𝔼[φ(σ)]maxσ[σ¯,σ¯]𝔼[φ(σ)].\max_{\sigma\in\mathcal{D}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma)]\leq\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma)].

Since {[\underline{\sigma},\overline{\sigma}]}\subset\mathcal{D}{[\underline{\sigma},\overline{\sigma}]}, the other direction of the inequality also holds. Similarly, we can show 3.3.

To validate 3.4, we need to show that for any \alpha>0, there exists a random variable \sigma_{\alpha}\in\mathcal{D}_{\textbf{cont.}}{[\underline{\sigma},\overline{\sigma}]} such that

𝔼[φ(σα)]>[φ(V)]α.\mathbb{E}[\varphi(\sigma_{\alpha})]>\mathcal{E}[\varphi(V)]-\alpha. (6.1)

Let v^{*}=\operatorname*{arg\,max}_{v\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(v). Then we have \mathcal{E}[\varphi(V)]=\varphi(v^{*}). Since \varphi is a continuous function on {[\underline{\sigma},\overline{\sigma}]}, there exists v_{0}\in(\underline{\sigma},\overline{\sigma}) such that \varphi(v_{0})>\varphi(v^{*})-\alpha/2. In a classical probability space (\Omega,\mathcal{F},\mathbb{P}), consider a sequence of random variables \xi_{n}\coloneqq v_{0}+e/\sqrt{n} where e\sim N(0,1) and n\in\mathbb{N}_{+}. In short, \xi_{n}\sim N(v_{0},1/n) with diminishing variance. Then we must have \xi_{n}\overset{\text{d}}{\longrightarrow}v_{0}. Then transform \xi_{n} into its truncation on {[\underline{\sigma},\overline{\sigma}]}: \xi_{n}^{*}\coloneqq\xi_{n}I_{n} with I_{n}\coloneqq\mathds{1}_{\{\xi_{n}\in{[\underline{\sigma},\overline{\sigma}]}\}}. We can easily show that I_{n}\overset{\mathbb{P}}{\longrightarrow}1 since, for any a>0, \mathbb{P}(\lvert I_{n}-1\rvert>a)=\mathbb{P}(I_{n}=0)=1-\mathbb{P}(\xi_{n}\in{[\underline{\sigma},\overline{\sigma}]})\to 0. By the classical Slutsky theorem, \xi_{n}^{*}=\xi_{n}I_{n}\overset{\text{d}}{\longrightarrow}v_{0}. Therefore, for any \varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}),

𝔼[φ(ξn)]φ(v0).\mathbb{E}[\varphi(\xi_{n}^{*})]\to\varphi(v_{0}).

For any \alpha>0, there exists n_{\alpha} such that \mathbb{E}[\varphi(\xi_{n_{\alpha}}^{*})]>\varphi(v_{0})-\alpha/2. Let \sigma_{\alpha}\coloneqq\xi_{n_{\alpha}}^{*}, which belongs to \mathcal{D}_{\textbf{cont.}}{[\underline{\sigma},\overline{\sigma}]}. It is the required object satisfying 6.1, because

\mathbb{E}[\varphi(\sigma_{\alpha})]>\varphi(v_{0})-\alpha/2>\varphi(v^{*})-\alpha=\mathcal{E}[\varphi(V)]-\alpha.\qed
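As a numerical illustration of this approximation step (it plays no role in the proof), the following Python sketch, with placeholder values for \underline{\sigma}, \overline{\sigma}, v_{0} and \varphi, checks that \mathbb{E}[\varphi(\xi_{n}^{*})] approaches \varphi(v_{0}) as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)
sig_lo, sig_hi, v0 = 0.5, 2.0, 1.7          # placeholder interval and interior point
phi = lambda x: np.sin(3 * x) + x ** 2      # a locally Lipschitz test function

for n in (10, 100, 1_000, 10_000):
    xi = v0 + rng.standard_normal(200_000) / np.sqrt(n)   # xi_n ~ N(v0, 1/n)
    xi_star = xi * ((xi >= sig_lo) & (xi <= sig_hi))       # truncation xi_n * I_n
    print(n, np.mean(phi(xi_star)))                        # tends to phi(v0)
print("phi(v0) =", phi(v0))
```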
Proof of 3.6.

We can prove it by mathematical induction. For d=1, it obviously holds. Suppose the result holds for d=k with k\in\mathbb{N}_{+}, namely,

𝑽(k)(V1,V2,,Vk)(i=1k[σ¯i,σ¯i]),\bm{V}_{(k)}\coloneqq(V_{1},V_{2},\dotsc,V_{k})\sim\mathcal{M}(\prod_{i=1}^{k}[\underline{\sigma}_{i},\overline{\sigma}_{i}]),

then we only need to show it holds for d=k+1d=k+1. In fact, consider any locally Lipschitz function

φ:(k+1,)(,||),\varphi:(\mathbb{R}^{k+1},\lVert\cdot\rVert)\to(\mathbb{R},\lvert\cdot\rvert),

satisfying: there exist C_{\varphi}>0 and m\in\mathbb{R}_{+} such that

|φ(x)φ(y)|Cφ(1+xm+ym)xy.\lvert\varphi(x)-\varphi(y)\rvert\leq C_{\varphi}(1+\lVert x\rVert^{m}+\lVert y\rVert^{m})\lVert x-y\rVert.

Since Vk+1V_{k+1} is independent from 𝑽(k)\bm{V}_{(k)},

\mathcal{E}[\varphi(V_{1},V_{2},\dotsc,V_{k+1})]=\mathcal{E}[\varphi(\bm{V}_{(k)},V_{k+1})]=\mathcal{E}\bigl{[}\mathcal{E}[\varphi(\bm{\sigma}_{(k)},V_{k+1})]_{\bm{\sigma}_{(k)}=\bm{V}_{(k)}}\bigr{]}.

Let φk(x)φ(𝝈(k),x),\varphi_{k}(x)\coloneqq\varphi(\bm{\sigma}_{(k)},x), and

ψk+1(𝝈(k))maxσk+1[σ¯k+1,σ¯k+1]φ(𝝈(k),σk+1)=maxσk+1[σ¯k+1,σ¯k+1]φk(σk+1).\psi_{k+1}(\bm{\sigma}_{(k)})\coloneqq\max_{\sigma_{k+1}\in[\underline{\sigma}_{k+1},\overline{\sigma}_{k+1}]}\varphi(\bm{\sigma}_{(k)},\sigma_{k+1})=\max_{\sigma_{k+1}\in[\underline{\sigma}_{k+1},\overline{\sigma}_{k+1}]}\varphi_{k}(\sigma_{k+1}).

For notational convenience, we sometimes omit the domain [σ¯k+1,σ¯k+1][\underline{\sigma}_{k+1},\overline{\sigma}_{k+1}] of the maximization here in our later discussions if it is clear from the context.

Claim 6.1.

We have φkCl.Lip()\varphi_{k}\in C_{\mathrm{l.Lip}}(\mathbb{R}) and ψk+1Cl.Lip(k)\psi_{k+1}\in C_{\mathrm{l.Lip}}(\mathbb{R}^{k}).

Then we are able to apply the representation of maximal distribution 𝑽(k)\bm{V}_{(k)} (allowed by 6.1) to have

[φ(V1,V2,,Vk+1)]\displaystyle\mathcal{E}[\varphi(V_{1},V_{2},\dotsc,V_{k+1})] =[φ(𝑽(k),Vk+1)]\displaystyle=\mathcal{E}[\varphi(\bm{V}_{(k)},V_{k+1})]
=[[φ(𝝈(k),Vk+1)φk(σk+1)]𝝈(k)=𝑽(k)]\displaystyle=\mathcal{E}\Bigl{[}\mathcal{E}[\underbrace{\varphi(\bm{\sigma}_{(k)},V_{k+1})}_{\varphi_{k}(\sigma_{k+1})}]_{\bm{\sigma}_{(k)}=\bm{V}_{(k)}}\Bigr{]}
=[[maxσk+1φk(σk+1)]𝝈(k)=𝑽(k)]\displaystyle=\mathcal{E}\Bigl{[}[\max_{\sigma_{k+1}}\varphi_{k}(\sigma_{k+1})]_{\bm{\sigma}_{(k)}=\bm{V}_{(k)}}\Bigr{]}
=[ψk+1(𝑽(k))]\displaystyle=\mathcal{E}[\psi_{k+1}(\bm{V}_{(k)})]
=max𝝈(k)ψk+1(𝝈(k))\displaystyle=\max_{\bm{\sigma}_{(k)}}\psi_{k+1}(\bm{\sigma}_{(k)})
=max(σ1,σ2,,σk)maxσk+1φ(σ1,,σk,σk+1)\displaystyle=\max_{(\sigma_{1},\sigma_{2},\dotsc,\sigma_{k})}\max_{\sigma_{k+1}}\varphi(\sigma_{1},\dotsc,\sigma_{k},\sigma_{k+1})
=max(σ1,σ2,,σk+1)φ(σ1,,σk,σk+1).\displaystyle=\max_{(\sigma_{1},\sigma_{2},\dotsc,\sigma_{k+1})}\varphi(\sigma_{1},\dotsc,\sigma_{k},\sigma_{k+1}).

Therefore,

(V1,V2,,Vk+1)(i=1k+1[σ¯i,σ¯i]).(V_{1},V_{2},\dotsc,V_{k+1})\sim\mathcal{M}(\prod_{i=1}^{k+1}[\underline{\sigma}_{i},\overline{\sigma}_{i}]).

The conclusion can be achieved by induction.

The remaining task is to prove 6.1.

To show φkCl.Lip()\varphi_{k}\in C_{\mathrm{l.Lip}}(\mathbb{R}), we write

|φk(x)φk(y)|\displaystyle\lvert\varphi_{k}(x)-\varphi_{k}(y)\rvert =|φ(𝝈(k),x)φ(𝝈(k),y)|\displaystyle=\lvert\varphi(\bm{\sigma}_{(k)},x)-\varphi(\bm{\sigma}_{(k)},y)\rvert
Cφ(1+(𝝈(k),x)m+(𝝈(k),y)m)xy,\displaystyle\leq C_{\varphi}(1+\lVert(\bm{\sigma}_{(k)},x)\rVert^{m}+\lVert(\bm{\sigma}_{(k)},y)\rVert^{m})\lVert x-y\rVert,

where we adapt \lVert\cdot\rVert to lower dimension in the sense that x(𝟎(k),x)\lVert x\rVert\coloneqq\lVert(\bm{0}_{(k)},x)\rVert. Notice

(𝝈(k),x)=(𝝈(k),0)+(𝟎(k),x)𝝈(k)+x.\lVert(\bm{\sigma}_{(k)},x)\rVert=\lVert(\bm{\sigma}_{(k)},0)+(\bm{0}_{(k)},x)\rVert\leq\lVert\bm{\sigma}_{(k)}\rVert+\lVert x\rVert.

Meanwhile, there exists K0K\geq 0 (actually K=max{1,2m1}K=\max\{1,2^{m-1}\}), such that

(𝝈(k),x)m(𝝈(k)+x)mK(𝝈(k)m+xm).\lVert(\bm{\sigma}_{(k)},x)\rVert^{m}\leq(\lVert\bm{\sigma}_{(k)}\rVert+\lVert x\rVert)^{m}\leq K(\lVert\bm{\sigma}_{(k)}\rVert^{m}+\lVert x\rVert^{m}).

Then we have

|φk(x)φk(y)|C1(1+xm+ym)xy,\lvert\varphi_{k}(x)-\varphi_{k}(y)\rvert\leq C_{1}(1+\lVert x\rVert^{m}+\lVert y\rVert^{m})\lVert x-y\rVert,

where C1=Cφmax{1+2K𝝈(k)m,K}.C_{1}=C_{\varphi}\max\{1+2K\lVert\bm{\sigma}_{(k)}\rVert^{m},K\}.

Next we check ψk+1Cl.Lip(k)\psi_{k+1}\in C_{\mathrm{l.Lip}}(\mathbb{R}^{k}). For any 𝒂(k),𝒃(k)k\bm{a}_{(k)},\bm{b}_{(k)}\in\mathbb{R}^{k},

|ψk+1(𝒂(k))ψk+1(𝒃(k))|\displaystyle|\psi_{k+1}(\bm{a}_{(k)})-\psi_{k+1}(\bm{b}_{(k)})|
=\displaystyle= |maxσk+1[σ¯,σ¯]φ(𝒂(k),σk+1)maxσk+1[σ¯,σ¯]φ(𝒃(k),σk+1)|\displaystyle|\max_{\sigma_{k+1}\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\bm{a}_{(k)},\sigma_{k+1})-\max_{\sigma_{k+1}\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\bm{b}_{(k)},\sigma_{k+1})|
\displaystyle\leq maxσk+1[σ¯,σ¯]|φ(𝒂(k),σk+1)φ(𝒃(k),σk+1)|\displaystyle\max_{\sigma_{k+1}\in{[\underline{\sigma},\overline{\sigma}]}}|\varphi(\bm{a}_{(k)},\sigma_{k+1})-\varphi(\bm{b}_{(k)},\sigma_{k+1})|
\displaystyle\leq maxσk+1[σ¯,σ¯]Cφ(1+(𝒂(k),σk+1)m+(𝒃(k),σk+1)m)𝒂(k)𝒃(k)\displaystyle\max_{\sigma_{k+1}\in{[\underline{\sigma},\overline{\sigma}]}}C_{\varphi}(1+\lVert(\bm{a}_{(k)},\sigma_{k+1})\rVert^{m}+\lVert(\bm{b}_{(k)},\sigma_{k+1})\rVert^{m})\lVert\bm{a}_{(k)}-\bm{b}_{(k)}\rVert
\displaystyle\leq C2(1+𝒂(k)+𝒃(k))𝒂(k)𝒃(k),\displaystyle C_{2}(1+\lVert\bm{a}_{(k)}\rVert+\lVert\bm{b}_{(k)}\rVert)\lVert\bm{a}_{(k)}-\bm{b}_{(k)}\rVert,

where C2=Cφmax{1+2Kσ¯m,K}C_{2}=C_{\varphi}\max\{1+2K\overline{\sigma}^{m},K\}. ∎

Proof of 3.5.

The first statement can be proved by studying the range of ψ(𝑽)\psi(\bm{V}). First, we need to show that φψ(x)φ(ψ(x))\varphi\circ\psi(x)\coloneqq\varphi(\psi(x)) is also a locally Lipschitz function for any φCl.Lip(d)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{d}). Suppose ψ\psi satisfies,

ψ(𝒙)ψ(𝒚)Cψ(1+𝒙p+𝒚p)𝒙𝒚.\lVert\psi(\bm{x})-\psi(\bm{y})\rVert\leq C_{\psi}(1+\lVert\bm{x}\rVert^{p}+\lVert\bm{y}\rVert^{p})\lVert\bm{x}-\bm{y}\rVert. (6.2)

We first can write

|φψ(𝒙)φψ(𝒚)|\displaystyle|\varphi\circ\psi(\bm{x})-\varphi\circ\psi(\bm{y})| =|φ(ψ(𝒙))φ(ψ(𝒚))|\displaystyle=|\varphi(\psi(\bm{x}))-\varphi(\psi(\bm{y}))|
Cφ(1+ψ(𝒙)m+ψ(𝒚)m)ψ(𝒙)ψ(𝒚).\displaystyle\leq C_{\varphi}(1+\lVert\psi(\bm{x})\rVert^{m}+\lVert\psi(\bm{y})\rVert^{m})\lVert\psi(\bm{x})-\psi(\bm{y})\rVert. (6.3)

As preparations for the next step, we will frequently use the basic fact that lower-degree polynomials can be dominated by higher-degree ones in the sense that

𝒙kmax{1,𝒙l}1+𝒙l with kl,\lVert\bm{x}\rVert^{k}\leq\max\{1,\lVert\bm{x}\rVert^{l}\}\leq 1+\lVert\bm{x}\rVert^{l}\text{ with }k\leq l, (6.4)

and for any k,l\in\mathbb{N}_{+},

\lVert\bm{x}\rVert^{k}\lVert\bm{y}\rVert^{l}\leq\frac{1}{2}(\lVert\bm{x}\rVert^{2k}+\lVert\bm{y}\rVert^{2l}). (6.5)

In 6.3, we can directly use 6.2 to dominate ψ(𝒙)ψ(𝒚)\lVert\psi(\bm{x})-\psi(\bm{y})\rVert. For the parts like ψ(𝒙)m\lVert\psi(\bm{x})\rVert^{m}, 6.2 implies,

ψ(𝒙)|ψ(𝒙)ψ(𝟎)|+|ψ(𝟎)|Cψ(1+𝒙p)𝒙+C0Cψ(1+𝒙p+1),\lVert\psi(\bm{x})\rVert\leq|\psi(\bm{x})-\psi(\bm{0})|+|\psi(\bm{0})|\leq C_{\psi}(1+\lVert\bm{x}\rVert^{p})\lVert\bm{x}\rVert+C_{0}\leq C_{\psi}^{\prime}(1+\lVert\bm{x}\rVert^{p+1}),

then there exists Cψ>0C_{\psi}^{\prime\prime}>0 such that,

ψ(𝒙)m[Cψ(1+𝒙p+1)]mCψ(1+𝒙(p+1)m).\lVert\psi(\bm{x})\rVert^{m}\leq[C_{\psi}^{\prime}(1+\lVert\bm{x}\rVert^{p+1})]^{m}\leq C_{\psi}^{\prime\prime}(1+\lVert\bm{x}\rVert^{(p+1)m}).

Hence, we can get φψCl.Lip(d)\varphi\circ\psi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{d}) by the inequality as follows,

|φψ(𝒙)φψ(𝒚)|\displaystyle|\varphi\circ\psi(\bm{x})-\varphi\circ\psi(\bm{y})| K1(1+𝒙(p+1)m+𝒚(p+1)m)(1+𝒙p+𝒚p)𝒙𝒚\displaystyle\leq K_{1}(1+\lVert\bm{x}\rVert^{(p+1)m}+\lVert\bm{y}\rVert^{(p+1)m})(1+\lVert\bm{x}\rVert^{p}+\lVert\bm{y}\rVert^{p})\lVert\bm{x}-\bm{y}\rVert
K2(1+𝒙2(p+1)pm+𝒚2(p+1)pm)𝒙𝒚.\displaystyle\leq K_{2}(1+\lVert\bm{x}\rVert^{2(p+1)pm}+\lVert\bm{y}\rVert^{2(p+1)pm})\lVert\bm{x}-\bm{y}\rVert.

Finally, we have \psi(\bm{V})\sim\mathcal{M}(\mathcal{S}) from its representation:

[φ(𝑺)]=[φ(ψ(𝑽))]\displaystyle\mathcal{E}[\varphi(\bm{S})]=\mathcal{E}[\varphi(\psi(\bm{V}))] =[φψ(𝑽)]\displaystyle=\mathcal{E}[\varphi\circ\psi(\bm{V})]
=maxσi[σ¯i,σ¯i],i=1,2,,dφψ(σ1,σ2,,σd)\displaystyle=\max_{\sigma_{i}\in[\underline{\sigma}_{i},\overline{\sigma}_{i}],i=1,2,\dotsc,d}\varphi\circ\psi(\sigma_{1},\sigma_{2},\dotsc,\sigma_{d})
=max𝒔𝒮φ(𝒔).\displaystyle=\max_{\bm{s}\in\mathcal{S}}\varphi(\bm{s}).

The second statement essentially comes from a basic property of the maximum of a continuous function on a rectangle: in this ideal setup, the order of taking marginal maxima does not affect the final value. To show the basic idea, start from the simple case d=2: if V_{1}\dashrightarrow V_{2}, then for any \varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{2}), we can work on (V_{2},V_{1}) to show the other direction of independence,

\displaystyle\mathcal{E}[\varphi(V_{2},V_{1})] =\mathcal{E}[\mathcal{E}[\varphi(V_{2},\sigma_{1})]_{\sigma_{1}=V_{1}}]
=maxσ1[σ¯1,σ¯1]maxσ2[σ¯2,σ¯2]φ(σ2,σ1)\displaystyle=\max_{\sigma_{1}\in[\underline{\sigma}_{1},\overline{\sigma}_{1}]}\max_{\sigma_{2}\in[\underline{\sigma}_{2},\overline{\sigma}_{2}]}\varphi(\sigma_{2},\sigma_{1})
=max(σ1,σ2)i=12[σ¯i,σ¯i]φ(σ2,σ1)\displaystyle=\max_{(\sigma_{1},\sigma_{2})\in\prod_{i=1}^{2}[\underline{\sigma}_{i},\overline{\sigma}_{i}]}\varphi(\sigma_{2},\sigma_{1})
=maxσ2[σ¯2,σ¯2]maxσ1[σ¯1,σ¯1]φ(σ2,σ1)\displaystyle=\max_{\sigma_{2}\in[\underline{\sigma}_{2},\overline{\sigma}_{2}]}\max_{\sigma_{1}\in[\underline{\sigma}_{1},\overline{\sigma}_{1}]}\varphi(\sigma_{2},\sigma_{1})
=[[φ(σ2,V1)]σ2=V2],\displaystyle=\mathcal{E}[\mathcal{E}[\varphi(\sigma_{2},V_{1})]_{\sigma_{2}=V_{2}}],

where we have used the fact that φx(y)maxx[σ¯,σ¯]φ(x,y)Cl.Lip()\varphi_{x}(y)\coloneqq\max_{x\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(x,y)\in C_{\mathrm{l.Lip}}(\mathbb{R}) if φCl.Lip(2)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{2}), which can be validated by 6.1. Hence, we have V2V1V_{2}\dashrightarrow V_{1}.

In general, for any permutation (i_{1},i_{2},\dotsc,i_{d}) of (1,2,\dotsc,d), our objective is to prove that, for any j=2,\dotsc,d,

(Vi1,Vi2,Vij1)Vij.(V_{i_{1}},V_{i_{2}},\dotsc V_{i_{j-1}})\dashrightarrow V_{i_{j}}.

From the first statement, (Vi1,Vi2,,Vij)(V_{i_{1}},V_{i_{2}},\dotsc,V_{i_{j}}), as a function of (V1,V2,,Vd)(V_{1},V_{2},\dotsc,V_{d}), must also follow a maximal distribution, characterized by (𝒱j)\mathcal{M}(\mathcal{V}_{j}) with

𝒱jk=1j[σ¯ik,σ¯ik].\mathcal{V}_{j}\coloneqq\prod_{k=1}^{j}[\underline{\sigma}_{i_{k}},\overline{\sigma}_{i_{k}}].

Then we can mimic the derivation for d=2d=2 to check the independence,

[φ(Vi1,Vi2,,Vij)]\displaystyle\mathcal{E}[\varphi(V_{i_{1}},V_{i_{2}},\dotsc,V_{i_{j}})] =max(σi1,σi2,,σij)𝒱jφ(σi1,σi2,,σij)\displaystyle=\max_{(\sigma_{i_{1}},\sigma_{i_{2}},\dots,\sigma_{i_{j}})\in\mathcal{V}_{j}}\varphi(\sigma_{i_{1}},\sigma_{i_{2}},\dots,\sigma_{i_{j}})
=max(σi1,σi2,,σij1)maxσijφ(σi1,σi2,,σij)\displaystyle=\max_{(\sigma_{i_{1}},\sigma_{i_{2}},\dotsc,\sigma_{i_{j-1}})}\max_{\sigma_{i_{j}}}\varphi(\sigma_{i_{1}},\sigma_{i_{2}},\dots,\sigma_{i_{j}})
=[[maxσijφ(σi1,σi2,,σij1,Vij)]σik=Vik,k=1,2,j1]\displaystyle=\mathcal{E}[[\max_{\sigma_{i_{j}}}\varphi(\sigma_{i_{1}},\sigma_{i_{2}},\dots,\sigma_{i_{j-1}},V_{i_{j}})]_{\sigma_{i_{k}}=V_{i_{k}},k=1,2\dotsc,j-1}]
=[[φ(σi1,σi2,,σij1,Vij)]σik=Vik,k=1,2,j1].\displaystyle=\mathcal{E}[\mathcal{E}[\varphi(\sigma_{i_{1}},\sigma_{i_{2}},\dotsc,\sigma_{i_{j-1}},V_{i_{j}})]_{\sigma_{i_{k}}=V_{i_{k}},k=1,2\dotsc,j-1}].

Since it holds for all possible jj, it is equivalent to saying

Vi1Vi2Vid.V_{i_{1}}\dashrightarrow V_{i_{2}}\dashrightarrow\cdots\dashrightarrow V_{i_{d}}.\qed

6.2 Proofs in Section 3.4 (improved)

In order to show the uniqueness of decomposition (3.9), we first prepare several lemmas.

Lemma 6.1.

For any g(K,η)¯sg(K,\eta)\in\bar{\mathcal{H}}_{s} where K(Θ)K\sim\mathcal{M}(\Theta) and η\eta is classical, if g(K,η)=dϵg(K,\eta)\overset{\text{d}}{=}\epsilon where ϵ\epsilon is classical, we must have, for any fixed kΘk\in\Theta,

g(k,η)=dϵ.g(k,\eta)\overset{\text{d}}{=}\epsilon.
Proof.

Since for any function ψ\psi,

maxkΘ𝔼[ψ(g(k,η))]=[ψ(g(K,η))]=𝔼[ψ(ϵ)],\max_{k\in\Theta}\mathbb{E}[\psi(g(k,\eta))]=\mathcal{E}[\psi(g(K,\eta))]=\mathbb{E}[\psi(\epsilon)],

by replacing ψ\psi with ψ-\psi, we have

minkΘ𝔼[ψ(g(k,η))]\displaystyle\min_{k\in\Theta}\mathbb{E}[\psi(g(k,\eta))] =[ψ(g(K,η))]\displaystyle=-\mathcal{E}[-\psi(g(K,\eta))]
=𝔼[ψ(ϵ)]=𝔼[ψ(ϵ)].\displaystyle=-\mathbb{E}[-\psi(\epsilon)]=\mathbb{E}[\psi(\epsilon)].

It means for any kΘk\in\Theta, we have

𝔼[ψ(g(k,η))]𝔼[ψ(ϵ)].\mathbb{E}[\psi(g(k,\eta))]\equiv\mathbb{E}[\psi(\epsilon)].

Therefore, we have g(k,η)=dϵ.g(k,\eta)\overset{\text{d}}{=}\epsilon.

Lemma 6.2.

For a maximally distributed V[σ¯,σ¯]V\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]}, we have 𝐯(V[σ¯,σ¯])=1\mathbf{v}(V\in{[\underline{\sigma},\overline{\sigma}]})=1.

Proof.

Let

φn(x)={1x[σ¯,σ¯]n(xσ¯)+1x[σ¯1n,σ¯)n(xσ¯)+1x(σ¯,σ¯+1n]0otherwise.\varphi_{n}(x)=\begin{cases}1&x\in{[\underline{\sigma},\overline{\sigma}]}\\ n(x-\underline{\sigma})+1&x\in[\underline{\sigma}-\frac{1}{n},\underline{\sigma})\\ -n(x-\overline{\sigma})+1&x\in(\overline{\sigma},\overline{\sigma}+\frac{1}{n}]\\ 0&\text{otherwise}\end{cases}.

Then we have \varphi_{n}\in C_{\mathrm{l.Lip}}(\mathbb{R}) and \varphi_{n}(x)\downarrow\mathds{1}_{{[\underline{\sigma},\overline{\sigma}]}}(x) or -\varphi_{n}(x)\uparrow-\mathds{1}_{{[\underline{\sigma},\overline{\sigma}]}}(x). (Since each \varphi_{n}(V)\in\mathcal{H}, we have \mathds{1}_{{[\underline{\sigma},\overline{\sigma}]}}(V)=\lim_{n\to\infty}\varphi_{n}(V)\in\mathcal{H} by the completeness of \mathcal{H}.) Note that

[φn(V)]=maxx[σ¯,σ¯](φn(x))=minx[σ¯,σ¯]φn(x)=1.\mathcal{E}[-\varphi_{n}(V)]=\max_{x\in{[\underline{\sigma},\overline{\sigma}]}}(-\varphi_{n}(x))=-\min_{x\in{[\underline{\sigma},\overline{\sigma}]}}\varphi_{n}(x)=-1.

It implies that

[𝟙[σ¯,σ¯](V)]=limn[φn(V)]=1,\mathcal{E}[-\mathds{1}_{{[\underline{\sigma},\overline{\sigma}]}}(V)]=\lim_{n\to\infty}\mathcal{E}[-\varphi_{n}(V)]=-1,

then

𝐯(V[σ¯,σ¯])=[𝟙[σ¯,σ¯](V)]=1.\mathbf{v}(V\in{[\underline{\sigma},\overline{\sigma}]})=-\mathcal{E}[-\mathds{1}_{{[\underline{\sigma},\overline{\sigma}]}}(V)]=1.

Lemma 6.3.

Consider K(Θ)K\sim\mathcal{M}(\Theta) where Θ\Theta is a compact and convex set and η\eta follows a non-degenerate classical distribution PηP_{\eta} with KηK\dashrightarrow\eta. For any hCl.Liph\in C_{\mathrm{l.Lip}}, if h(K,η)[σ¯,σ¯]h(K,\eta)\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]}, there exists B()B\in\mathcal{B}(\mathbb{R}) with Pη(B)=1P_{\eta}(B)=1 such that h(x,y)h(x,y) does not depend on yy or simply h(x,y)=h(x)h(x,y)=h(x) when yBy\in B.

Proof.

For any φCl.Lip\varphi\in C_{\mathrm{l.Lip}} with φ(x)>0\varphi(x)>0 on x[σ¯,σ¯]x\in{[\underline{\sigma},\overline{\sigma}]}, let σargmaxσ[σ¯,σ¯]φ(σ)\sigma^{*}\coloneqq\operatorname*{arg\,max}_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\sigma). Then we have

φ(σ)=maxσ[σ¯,σ¯]φ(σ)\displaystyle\varphi(\sigma^{*})=\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\sigma) =[φ(h(K,η))]\displaystyle=\mathcal{E}[\varphi(h(K,\eta))]
=maxkΘ𝔼[φ(h(k,η))]\displaystyle=\max_{k\in\Theta}\mathbb{E}[\varphi(h(k,\eta))]
=maxkΘφ(h(k,y))Pη(dy)\displaystyle=\max_{k\in\Theta}\int\varphi(h(k,y))P_{\eta}(\mathop{}\!\mathrm{d}y)
maxkΘφ(h(k,y))Pη(dy).\displaystyle\leq\int\max_{k\in\Theta}\varphi(h(k,y))P_{\eta}(\mathop{}\!\mathrm{d}y).

Meanwhile, note that h(K,\eta) is bounded by {[\underline{\sigma},\overline{\sigma}]} and K is bounded by \Theta in the quasi-sure sense, or

𝐯(h(K,η)[σ¯,σ¯])=1,𝐯(KΘ)=1.\mathbf{v}(h(K,\eta)\in{[\underline{\sigma},\overline{\sigma}]})=1,\;\mathbf{v}(K\in\Theta)=1.

Then, for any 𝒫\mathbb{P}\in\mathcal{P}, we have

\mathbb{P}(\{\omega:h(K(\omega),\eta(\omega))\in{[\underline{\sigma},\overline{\sigma}]}\})=1,\;\mathbb{P}(\{\omega:K(\omega)\in\Theta\})=1,

then the intersection of two events has probability 11,

({ω:h(K,η)[σ¯,σ¯],KΘ})=1.\mathbb{P}(\{\omega:h(K,\eta)\in{[\underline{\sigma},\overline{\sigma}]},K\in\Theta\})=1.

Hence, with A{y:h(k,y)[σ¯,σ¯],kΘ},A\coloneqq\{y:h(k,y)\in{[\underline{\sigma},\overline{\sigma}]},k\in\Theta\}, we must have

Pη(A)=({ω:η(ω)A})({ω:h(K,η)[σ¯,σ¯],KΘ})=1.P_{\eta}(A)=\mathbb{P}(\{\omega:\eta(\omega)\in A\})\geq\mathbb{P}(\{\omega:h(K,\eta)\in{[\underline{\sigma},\overline{\sigma}]},K\in\Theta\})=1.

(The measurability of AA comes from the continuity of hh. Under any 𝒫\mathbb{P}\in\mathcal{P}, the distribution of η\eta is always PηP_{\eta} due to 3.7.1.) Then we have

φ(σ)maxkΘφ(h(k,y))Pη(dy)\displaystyle\varphi(\sigma^{*})\leq\int\max_{k\in\Theta}\varphi(h(k,y))P_{\eta}(\mathop{}\!\mathrm{d}y) =AmaxkΘφ(h(k,y))Pη(dy)\displaystyle=\int_{A}\max_{k\in\Theta}\varphi(h(k,y))P_{\eta}(\mathop{}\!\mathrm{d}y)
Amaxσ[σ¯,σ¯]φ(σ)Pη(dy)\displaystyle\leq\int_{A}\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\sigma)P_{\eta}(\mathop{}\!\mathrm{d}y)
=maxσ[σ¯,σ¯]φ(σ)APη(dy)=φ(σ).\displaystyle=\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\sigma)\int_{A}P_{\eta}(\mathop{}\!\mathrm{d}y)=\varphi(\sigma^{*}).

Therefore,

AmaxkΘφ(h(k,y))Pη(dy)=maxσ[σ¯,σ¯]φ(σ).\int_{A}\max_{k\in\Theta}\varphi(h(k,y))P_{\eta}(\mathop{}\!\mathrm{d}y)=\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\sigma).

For any yAy\in A, since h(k,y)[σ¯,σ¯]h(k,y)\in{[\underline{\sigma},\overline{\sigma}]},

0<maxkΘφ(h(k,y))maxσ[σ¯,σ¯]φ(σ).0<\max_{k\in\Theta}\varphi(h(k,y))\leq\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\sigma).

Then there must exist BAB\subset A with Pη(B)=1P_{\eta}(B)=1 such that for yBy\in B,

maxkΘφ(h(k,y))=maxσ[σ¯,σ¯]φ(σ),\max_{k\in\Theta}\varphi(h(k,y))=\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\sigma),

or

[φ(h(K,y))]=maxσ[σ¯,σ¯]φ(σ).\mathcal{E}[\varphi(h(K,y))]=\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\sigma).

For any fCl.Lipf\in C_{\mathrm{l.Lip}}, let φ=f(x)+C\varphi=f(x)+C with C=minx[σ¯,σ¯]f(x)+1C=-\min_{x\in{[\underline{\sigma},\overline{\sigma}]}}f(x)+1, then φ>0\varphi>0 on x[σ¯,σ¯]x\in{[\underline{\sigma},\overline{\sigma}]}. We have

[f(h(K,y))]=[φ(h(K,y))]C=maxσ[σ¯,σ¯]φ(σ)C=maxσ[σ¯,σ¯]f(σ).\mathcal{E}[f(h(K,y))]=\mathcal{E}[\varphi(h(K,y))]-C=\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\sigma)-C=\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}f(\sigma).

Therefore, for yBy\in B,

h(K,y)[σ¯,σ¯].h(K,y)\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]}. (6.6)

If there exist two distinct y1,y2By_{1},y_{2}\in B,

δh(K,y1)h(K,y2)0\delta\coloneqq h(K,y_{1})-h(K,y_{2})\neq 0

we must have

h(K,y1)=h(K,y2)+δ[σ¯+δ,σ¯+δ].h(K,y_{1})=h(K,y_{2})+\delta\sim\mathcal{M}[\underline{\sigma}+\delta,\overline{\sigma}+\delta].

This is a contradiction against 6.6. Then we have, for any yBy\in B,

h(K,y)\equiv h(K,c)\eqqcolon h(K),

where cc is any constant chosen from BB. ∎

Proof of 3.9.

Since W¯sW\in\bar{\mathcal{H}}_{s}, we have W=f(K,η)W=f(K,\eta) where K(Θ)K\sim\mathcal{M}(\Theta) and η\eta is classical satisfying KηK\dashrightarrow\eta. Suppose there exist Vi¯sV_{i}\in\bar{\mathcal{H}}_{s} and ϵi¯s\epsilon_{i}\in\bar{\mathcal{H}}_{s} such that W=V1ϵ1=V2ϵ2W=V_{1}\epsilon_{1}=V_{2}\epsilon_{2}. To be specific, without loss of generality, we can assume

Vi\displaystyle V_{i} =hi(K,η),\displaystyle=h_{i}(K,\eta),
ϵi\displaystyle\epsilon_{i} =gi(K,η),\displaystyle=g_{i}(K,\eta),

such that f(K,η)=h1(K,η)g1(K,η)=h2(K,η)g2(K,η)f(K,\eta)=h_{1}(K,\eta)g_{1}(K,\eta)=h_{2}(K,\eta)g_{2}(K,\eta). Then we have

g2(K,η)=h1(K,η)h2(K,η)g1(K,η).g_{2}(K,\eta)=\frac{h_{1}(K,\eta)}{h_{2}(K,\eta)}g_{1}(K,\eta).

Note that h_{i}(K,\eta)\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]}; then by 6.3, there exist B_{i} with P_{\eta}(B_{i})=1 such that h_{i}(x,y)=h_{i}(x) when y\in B_{i}. Let B=B_{1}\cap B_{2}; then we still have P_{\eta}(B)=1. Then we have, for any \varphi,

[φ(h1(K,η)h2(K,η)g1(K,η))]\displaystyle\mathcal{E}[\varphi(\frac{h_{1}(K,\eta)}{h_{2}(K,\eta)}g_{1}(K,\eta))] =supkΘ𝔼[φ(h1(k,η)h2(k,η)g1(k,η))]\displaystyle=\sup_{k\in\Theta}\mathbb{E}[\varphi(\frac{h_{1}(k,\eta)}{h_{2}(k,\eta)}g_{1}(k,\eta))]
=supkΘφ(h1(k,y)h2(k,y)g1(k,y))Pη(dy)\displaystyle=\sup_{k\in\Theta}\int\varphi(\frac{h_{1}(k,y)}{h_{2}(k,y)}g_{1}(k,y))P_{\eta}(\mathop{}\!\mathrm{d}y)
=supkΘBφ(h1(k,y)h2(k,y)g1(k,y))Pη(dy)\displaystyle=\sup_{k\in\Theta}\int_{B}\varphi(\frac{h_{1}(k,y)}{h_{2}(k,y)}g_{1}(k,y))P_{\eta}(\mathop{}\!\mathrm{d}y)
=supkΘBφ(h1(k)h2(k)g1(k,y))Pη(dy)\displaystyle=\sup_{k\in\Theta}\int_{B}\varphi(\frac{h_{1}(k)}{h_{2}(k)}g_{1}(k,y))P_{\eta}(\mathop{}\!\mathrm{d}y)
=supkΘφ(h1(k)h2(k)g1(k,y))Pη(dy)\displaystyle=\sup_{k\in\Theta}\int\varphi(\frac{h_{1}(k)}{h_{2}(k)}g_{1}(k,y))P_{\eta}(\mathop{}\!\mathrm{d}y)
=supkΘ𝔼[φ(h1(k)h2(k)g1(k,η))]=[φ(h1(K)h2(K)g1(K,η))].\displaystyle=\sup_{k\in\Theta}\mathbb{E}[\varphi(\frac{h_{1}(k)}{h_{2}(k)}g_{1}(k,\eta))]=\mathcal{E}[\varphi(\frac{h_{1}(K)}{h_{2}(K)}g_{1}(K,\eta))].

Similarly, we can also show

h1(K,η)h2(K,η)=dh1(K)h2(K).\frac{h_{1}(K,\eta)}{h_{2}(K,\eta)}\overset{\text{d}}{=}\frac{h_{1}(K)}{h_{2}(K)}.

Note that

R(K)h1(K)/h2(K)(S),R(K)\coloneqq h_{1}(K)/h_{2}(K)\sim\mathcal{M}(S),

where S={h1(k)/h2(k),kΘ}S=\{h_{1}(k)/h_{2}(k),k\in\Theta\}. By 6.1, letting Z=dN(0,1)Z\overset{\text{d}}{=}N(0,1), the fact that g1(K,η)=dZg_{1}(K,\eta)\overset{\text{d}}{=}Z implies, for kΘk\in\Theta,

g1(k,η)=dZ.g_{1}(k,\eta)\overset{\text{d}}{=}Z.

Then we have, with ψk(x)φ(R(k)x)\psi_{k}(x)\coloneqq\varphi(R(k)x),

[φ(R(K)g1(K,η))]\displaystyle\mathcal{E}[\varphi(R(K)g_{1}(K,\eta))] =supkΘ𝔼[φ(R(k)g1(k,η))]\displaystyle=\sup_{k\in\Theta}\mathbb{E}[\varphi(R(k)g_{1}(k,\eta))]
=supkΘ𝔼[ψk(g1(k,η))]=supkΘ𝔼[ψk(Z)]\displaystyle=\sup_{k\in\Theta}\mathbb{E}[\psi_{k}(g_{1}(k,\eta))]=\sup_{k\in\Theta}\mathbb{E}[\psi_{k}(Z)]
=supkΘ𝔼[φ(R(k)Z)]=supsS𝔼[φ(sZ)].\displaystyle=\sup_{k\in\Theta}\mathbb{E}[\varphi(R(k)Z)]=\sup_{s\in S}\mathbb{E}[\varphi(sZ)].

Meanwhile,

[φ(R(K)g1(K,η))]=[φ(g2(K,η))]=𝔼[φ(Z)].\mathcal{E}[\varphi(R(K)g_{1}(K,\eta))]=\mathcal{E}[\varphi(g_{2}(K,\eta))]=\mathbb{E}[\varphi(Z)].

Then the set SS has to be the singleton \{1\}. It means that R(K)\sim\mathcal{M}(\{1\}), or R(K)=1 (in the quasi-sure sense). Then we also have

\frac{h_{1}(K,\eta)}{h_{2}(K,\eta)}\overset{\text{d}}{=}\frac{h_{1}(K)}{h_{2}(K)}\sim\mathcal{M}(\{1\}).

It means that h_{1}(K,\eta)=h_{2}(K,\eta), and then g_{1}(K,\eta)=g_{2}(K,\eta). The uniqueness has been proved. ∎

Proof of 3.12.

This is a direct consequence of 2.15. Let ϵN(0,1)\epsilon\sim N(0,1). On the one hand, for any φCl.Lip\varphi\in C_{\mathrm{l.Lip}}, as discussed in 3.11.3, we have

[φ(WG)]supσ[σ¯,σ¯]𝔼[φ(σϵ)]=[φ(W)].\mathcal{E}[\varphi(W^{G})]\geq\sup_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)]=\mathcal{E}[\varphi(W)].

On the other hand, when φ\varphi is convex or concave, by 2.15, we have

[φ(W)]maxσ{σ¯,σ¯}𝔼[φ(σϵ)][φ(WG)].\mathcal{E}[\varphi(W)]\geq\max_{\sigma\in\{\underline{\sigma},\overline{\sigma}\}}\mathbb{E}[\varphi(\sigma\epsilon)]\geq\mathcal{E}[\varphi(W^{G})].

Hence, we have [φ(WG)]=[φ(W)]\mathcal{E}[\varphi(W^{G})]=\mathcal{E}[\varphi(W)] under convexity (or concavity) of φ\varphi.

For readers’ convenience, we include an explicit proof of why we have such results for the semi-GG-normal distribution. For technical convenience, we assume \varphi is second-order differentiable. From the representation of the semi-GG-normal distribution (2.15), with G(v)\coloneqq\mathbb{E}_{\mathbb{P}}[\varphi(v\epsilon)] (v\in{[\underline{\sigma},\overline{\sigma}]}), our goal is to show

[φ(W)]=maxv[σ¯,σ¯]G(v)={G(σ¯)φ is convexG(σ¯)φ is concave.\mathcal{E}[\varphi(W)]=\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}G(v)=\begin{cases}G(\overline{\sigma})&\varphi\text{ is convex}\\ G(\underline{\sigma})&\varphi\text{ is concave}\end{cases}.

First of all, by Taylor expansion φ(x)=φ(0)+φ(1)(0)x+φ(2)(ξx)x22\varphi(x)=\varphi(0)+\varphi^{(1)}(0)x+\varphi^{(2)}(\xi_{x})\frac{x^{2}}{2} with ξx(0,x)\xi_{x}\in(0,x), we have,

G(v)=𝔼[φ(0)+φ(1)(0)vϵ+φ(2)(ξvϵ)v22ϵ2]=φ(0)+12𝔼[φ(2)(ξvϵ)(vϵ)2],G(v)=\mathbb{E}_{\mathbb{P}}[\varphi(0)+\varphi^{(1)}(0)v\epsilon+\varphi^{(2)}(\xi_{v\epsilon})\frac{v^{2}}{2}\epsilon^{2}]=\varphi(0)+\frac{1}{2}\mathbb{E}_{\mathbb{P}}[\varphi^{(2)}(\xi_{v\epsilon})(v\epsilon)^{2}],

where \xi_{v\epsilon}\in(0,v\epsilon) is a random variable depending on \epsilon. By the same Taylor expansion, \varphi^{(2)}(\xi_{v\epsilon})(v\epsilon)^{2}=2R(v\epsilon) with R(x)\coloneqq\varphi(x)-\varphi(0)-\varphi^{(1)}(0)x, so

K(v)\coloneqq\mathbb{E}_{\mathbb{P}}[\varphi^{(2)}(\xi_{v\epsilon})(v\epsilon)^{2}]=2\mathbb{E}_{\mathbb{P}}[R(v\epsilon)],

and G(v)=\varphi(0)+\frac{1}{2}K(v). When \varphi is convex, R is convex and nonnegative with R(0)=0 and R^{(1)}(0)=0, so R^{(1)}(x) has the same sign as x. Hence, for each fixed realization of \epsilon, the map v\mapsto R(v\epsilon) is non-decreasing on v\geq 0, since

\frac{\mathop{}\!\mathrm{d}}{\mathop{}\!\mathrm{d}v}R(v\epsilon)=\epsilon R^{(1)}(v\epsilon)\geq 0.

By the monotonicity of the classical expectation, K(v) is non-decreasing with respect to v\in{[\underline{\sigma},\overline{\sigma}]}, so it reaches its maximum at v=\overline{\sigma}. Hence,

[φ(W)]=maxv[σ¯,σ¯]G(v)=maxv[σ¯,σ¯](φ(0)+K(v)2)=G(σ¯).\mathcal{E}[\varphi(W)]=\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}G(v)=\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}(\varphi(0)+\frac{K(v)}{2})=G(\overline{\sigma}).

When \varphi is concave, -\varphi is convex. Replacing \varphi above with -\varphi and repeating the same procedure, we can show that -G(v) is non-decreasing with respect to v; that is, G(v) is non-increasing and reaches its maximum at \underline{\sigma}. ∎
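As a quick numerical sanity check of this monotonicity (separate from the proof itself), the following Python sketch estimates G(v)=\mathbb{E}[\varphi(v\epsilon)] on a grid of v by Monte Carlo, with a placeholder convex \varphi and placeholder bounds, and confirms that the maximum sits at the upper endpoint.

```python
import numpy as np

rng = np.random.default_rng(2)
eps = rng.standard_normal(1_000_000)
phi_convex = lambda x: np.abs(x) ** 3      # placeholder convex test function
sig_lo, sig_hi = 0.5, 2.0                  # placeholder volatility bounds

grid = np.linspace(sig_lo, sig_hi, 16)
G = np.array([phi_convex(v * eps).mean() for v in grid])   # G(v) = E[phi(v*eps)]
print(grid[np.argmax(G)])   # close to sig_hi: the maximum is at the upper bound
```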

6.3 Proofs in Section 3.6

The proofs in this section are mainly based on the results in Section 2.2, which provide useful tools to deal with the independence of sequences in this framework.

Lemma 6.4.

In a sublinear expectation space, for a sequence of i.i.d. random variables \{\epsilon_{i}\}_{i=1}^{n}\sim\mathcal{N}(0,[1,1]) (namely, \epsilon_{1}\dashrightarrow\epsilon_{2}\dashrightarrow\dotsc\dashrightarrow\epsilon_{n}), we have

(\epsilon_{1},\epsilon_{2},\dotsc,\epsilon_{n})^{T}\sim N(\bm{0},\mathbf{I}_{n}),

where 𝐈n\mathbf{I}_{n} is the n×nn\times n identity matrix.

Proof.

Since the distribution of \epsilon_{i} can be treated as the classical N(0,1), the sequential independence can be treated as the classical independence (2.6.3). Then we can get the required results by applying the classical logic. ∎

Remark 6.4.1.

Since the independence of {ϵi}i=1n\{\epsilon_{i}\}_{i=1}^{n} is classical, the order of independence can also be arbitrarily switched so we can easily obtain a result similar to 3.6.

Proposition 6.5.

For a sequence of i.i.d. random variables {ϵi}i=1n𝒩(0,[1,1])\{\epsilon_{i}\}_{i=1}^{n}\sim\mathcal{N}(0,[1,1]), the following three statements are equivalent:

  • (1)

    ϵ1ϵ2ϵn{\epsilon}_{1}\dashrightarrow{\epsilon}_{2}\dashrightarrow\cdots\dashrightarrow{\epsilon}_{n},

  • (2)

    ϵk1ϵk2ϵkn\epsilon_{k_{1}}\dashrightarrow\epsilon_{k_{2}}\dashrightarrow\cdots\dashrightarrow\epsilon_{k_{n}} for any permutation {kj}j=1n\{k_{j}\}_{j=1}^{n} of {1,2,,n}\{1,2,\dotsc,n\},

  • (3)

    (\epsilon_{1},\epsilon_{2},\dotsc,\epsilon_{n})\sim N(\bm{0},\mathbf{I}_{n}).

Proof of 3.17.

Since the fully-sequential independence implies (F1) and (F2) by 2.20 and 2.23, we only need to show the other direction. When n=2n=2, this result is a consequence of 2.24. For iji\leq j, let

(V,ϵ)ij(Vi,ϵi,Vi+1,ϵi+1,,Vj,ϵj).(V,\epsilon)_{i}^{j}\coloneqq(V_{i},\epsilon_{i},V_{i+1},\epsilon_{i+1},\dotsc,V_{j},\epsilon_{j}).

Next we proceed by math induction. Suppose the result holds for n=kn=k with k2k\geq 2. For n=k+1n=k+1, we only need to show: given the conditions

  1. 1.

    (V1,ϵ1)(V2,ϵ2)(Vk+1,ϵk+1)(V_{1},\epsilon_{1})\dashrightarrow(V_{2},\epsilon_{2})\dashrightarrow\cdots\dashrightarrow(V_{k+1},\epsilon_{k+1}),

  2. 2.

    ViϵiV_{i}\dashrightarrow\epsilon_{i} for i=1,2,,k+1i=1,2,\dotsc,k+1,

we have the fully-sequential independence:

V1ϵ1VkϵkVk+1ϵk+1.V_{1}\dashrightarrow\epsilon_{1}\dashrightarrow\cdots\dashrightarrow V_{k}\dashrightarrow\epsilon_{k}\dashrightarrow V_{k+1}\dashrightarrow\epsilon_{k+1}. (6.7)

Since all the independence relations in 6.7 up to the term \epsilon_{k} can be guaranteed by the presumed result with n=k, we only need to show the additional independence:

  1. 1.

    (V,ϵ)1kVk+1(V,\epsilon)_{1}^{k}\dashrightarrow V_{k+1},

  2. 2.

    ((V,ϵ)1k,Vk+1)ϵk+1((V,\epsilon)_{1}^{k},V_{k+1})\dashrightarrow\epsilon_{k+1}.

The first one comes from (F1) whose definition implies (V,ϵ)1k(Vk+1,ϵk+1)(V,\epsilon)_{1}^{k}\dashrightarrow(V_{k+1},\epsilon_{k+1}). The second one comes from 2.22 given the following statements:

  1. 1.

    (V,ϵ)1k(Vk+1,ϵk+1)(V,\epsilon)_{1}^{k}\dashrightarrow(V_{k+1},\epsilon_{k+1}) by (F1);

  2. 2.

    (V,ϵ)1kVk+1(V,\epsilon)_{1}^{k}\dashrightarrow V_{k+1} by the first one.

Then we have the required result for n=k+1n=k+1. The proof is finished by math induction. ∎

Proof of 3.18.

First, the definition 3.18 of semi-sequential independence implies (S1) to (S3) by 2.20 and 2.23. We only need to check the other direction. For i\leq j, let V_{i}^{j}\coloneqq(V_{i},V_{i+1},\dotsc,V_{j}) and similarly define the notation \epsilon_{i}^{j}. By 2.8, our goal can be expanded as:

  1. 1.

    V1lVl+1V_{1}^{l}\dashrightarrow V_{l+1} for any l=1,2,,n1l=1,2,\dotsc,n-1,

  2. 2.

    (V1n1,Vn,ϵ1l)ϵl+1(V_{1}^{n-1},V_{n},\epsilon_{1}^{l})\dashrightarrow\epsilon_{l+1} for any l=1,2,,n1l=1,2,\dotsc,n-1.

The first one comes from (S2). For the second one, note that we have

  1. 1.

    (V1n1,Vn)(ϵ1l,ϵl+1)(V_{1}^{n-1},V_{n})\dashrightarrow(\epsilon_{1}^{l},\epsilon_{l+1}) by (S1),

  2. 2.

    ϵ1lϵl+1\epsilon_{1}^{l}\dashrightarrow\epsilon_{l+1} by (S3),

  3. 3.

    V1n1VnV_{1}^{n-1}\dashrightarrow V_{n} by (S2),

then by 2.24, we have proved the second relation. ∎

Proof of 3.22.

The main idea is the equivalent definition of semi-sequential independence given by 3.18, which shows the symmetry within the VV part and the \epsilon part. The equivalence of the three statements will be proved in the following order:

(3)(1)(2).(3)\iff(1)\iff(2).

Let π:(x1,x2,,xn)(xk1,xk2,,xkn)\pi:(x_{1},x_{2},\dotsc,x_{n})\to(x_{k_{1}},x_{k_{2}},\dotsc,x_{k_{n}}) denote a permutation function.

(1)(2)(1)\iff(2). It is a direct translation of 3.18 by considering the equivalence in each part:

  1. 1.

    The equivalence in (S1) can be seen by treating each vector as a function of the other under the permutation \pi (or \pi^{-1}).

  2. 2.

    The equivalence in (S2) comes from 3.6.

  3. 3.

    The equivalence in (S3) comes from 6.5.

(3)(1)(3)\iff(1). Let 𝑾(W1,W2,,Wn)\bm{W}\coloneqq(W_{1},W_{2},\dotsc,W_{n}). Then

𝑾=(V1ϵ1,V2ϵ2,,Vnϵn)=𝐕ϵ.\bm{W}=(V_{1}\epsilon_{1},V_{2}\epsilon_{2},\dotsc,V_{n}\epsilon_{n})=\mathbf{V}\bm{\epsilon}.

Then we can decompose (3) into three conditions, each of which is equivalent to the corresponding condition of (1) in the context of 3.18:

  1. 1.

    Since \mathbf{V}=\operatorname{diag}(V_{1},\dotsc,V_{n}) is a bijective function of (V_{1},\dotsc,V_{n}), we have \mathbf{V}\dashrightarrow\bm{\epsilon} if and only if (V_{1},\dotsc,V_{n})\dashrightarrow\bm{\epsilon}, which is (S1) in 3.18.

  2. 2.

    Note that 𝐕([σ¯,σ¯]𝐈n)\mathbf{V}\sim\mathcal{M}({[\underline{\sigma},\overline{\sigma}]}\mathbf{I}_{n}) is equivalent to 𝑽([σ¯,σ¯]n)\bm{V}\sim\mathcal{M}({[\underline{\sigma},\overline{\sigma}]}^{n}) which is further equivalent to (S2): {Vi}i=1n\{V_{i}\}_{i=1}^{n} are sequentially independent.

  3. 3.

    By 6.5, \bm{\epsilon}\sim N(\bm{0},\mathbf{I}_{n}) is equivalent to (S3): \{\epsilon_{i}\}_{i=1}^{n} are sequentially independent.∎

6.4 Proofs in Section 3.7

Proof of 3.24.

(A note on the finiteness of sublinear expectations) For any \varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{k+1}), by definition there exist m\in\mathbb{N}_{+} and C_{0}>0 such that for \bm{x},\bm{y}\in\mathbb{R}^{k+1},

|φ(𝒙)φ(𝒚)|C0(1+𝒙m+𝒚m)𝒙𝒚.\lvert\varphi(\bm{x})-\varphi(\bm{y})\rvert\leq C_{0}(1+\lVert\bm{x}\rVert^{m}+\lVert\bm{y}\rVert^{m})\lVert\bm{x}-\bm{y}\rVert.

Without loss of generality, we can assume φ(𝟎)=0\varphi(\bm{0})=0, then we have |φ(x)|C0(1+xm)x\lvert\varphi(x)\rvert\leq C_{0}(1+\lVert x\rVert^{m})\lVert x\rVert. It implies

\mathcal{E}[\lvert\varphi(\bm{W})\rvert]\leq C_{0}(\mathcal{E}[\lVert\bm{W}\rVert]+\mathcal{E}[\lVert\bm{W}\rVert^{m+1}]).

To validate [|φ(𝑾)|]<\mathcal{E}[\lvert\varphi(\bm{W})\rvert]<\infty under each case, it will be sufficient to confirm the finiteness of this sublinear expectation: for any q+q\in\mathbb{N}_{+},

[𝑾q]<.\mathcal{E}[\lVert\bm{W}\rVert^{q}]<\infty. (6.8)

(Semi-sequential independence case) Under the independence specified by 3.25, from 3.14, we have

𝑾𝒩^(𝟎,𝒱),\bm{W}\sim\hat{\mathcal{N}}(\bm{0},\mathcal{V}),

where 𝒱={diag(σ12,σ22,,σn2):σi[σ¯,σ¯]}\mathcal{V}=\{\operatorname{diag}(\sigma_{1}^{2},\sigma_{2}^{2},\dotsc,\sigma_{n}^{2}):\sigma_{i}\in{[\underline{\sigma},\overline{\sigma}]}\}. Therefore,

\mathcal{E}[\varphi(\bm{W})]=\max_{\mathbf{V}\in\mathcal{V}}\mathbb{E}_{\mathbb{P}}[\varphi(\mathbf{V}^{1/2}\bm{\epsilon})]=\max_{\bm{\sigma}\in\mathcal{C}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}*\bm{\epsilon})],

where 𝐕1/2\mathbf{V}^{1/2} is the symmetric square root of 𝐕\mathbf{V} and 𝒞n[σ¯,σ¯]{𝝈:(σ1,σ2,,σn)[σ¯,σ¯]n}.\mathcal{C}_{n}{[\underline{\sigma},\overline{\sigma}]}\coloneqq\{\bm{\sigma}:(\sigma_{1},\sigma_{2},\dotsc,\sigma_{n})\in{[\underline{\sigma},\overline{\sigma}]}^{n}\}. At the same time, we can validate the finiteness [|φ(𝑾)|]<\mathcal{E}[\lvert\varphi(\bm{W})\rvert]<\infty, because 𝐕1/2ϵ\mathbf{V}^{1/2}\bm{\epsilon} follows a classical multivariate normal distribution, then for any q+q\in\mathbb{N}_{+}, 𝔼[𝐕1/2ϵq]<\mathbb{E}_{\mathbb{P}}[\lVert\mathbf{V}^{1/2}\bm{\epsilon}\rVert^{q}]<\infty, which implies 6.8.

Next, since we have 𝒞n[σ¯,σ¯]𝒮n[σ¯,σ¯]\mathcal{C}_{n}{[\underline{\sigma},\overline{\sigma}]}\subset\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}, we only need to show for any 𝝈𝒮n[σ¯,σ¯]\bm{\sigma}\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]},

𝔼[φ(𝝈ϵ)][φ(𝑾)].\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}*\bm{\epsilon})]\leq\mathcal{E}[\varphi(\bm{W})]. (6.9)

Note that 𝝈ϵ\bm{\sigma}\mathrel{\mathchoice{\hbox to0.0pt{$\displaystyle\perp$\hss}{\displaystyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\textstyle\perp$\hss}{\textstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptstyle\perp$\hss}{\scriptstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptscriptstyle\perp$\hss}{\scriptscriptstyle\mkern 2.0mu\perp}}}\bm{\epsilon} for 𝝈𝒮n[σ¯,σ¯]\bm{\sigma}\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]} and the random vector 𝝈\bm{\sigma} must follow a joint distribution supporting on a subset of [σ¯,σ¯]n{[\underline{\sigma},\overline{\sigma}]}^{n}, then we can apply the representation of multivariate semi-GG-normal distribution (3.14) to get the inequality 6.9.
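As an illustration of the inequality 6.9 (not used in the proof), the following Monte Carlo sketch in Python compares one randomized scenario \bm{\sigma}\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]} (drawn independently of \bm{\epsilon}) with the maximum over constant scenarios approximated on a grid; the bounds, dimension and test function are placeholder choices of ours.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
sig_lo, sig_hi, n, reps = 0.5, 2.0, 2, 200_000    # placeholder bounds and sizes
phi = lambda w: np.sin(w.sum(axis=1)) + (w[:, 0] * w[:, 1]) ** 2   # arbitrary test function

eps = rng.standard_normal((reps, n))

# A randomized scenario: sigma drawn over [sig_lo, sig_hi]^n, independent of eps.
sigma_rand = rng.uniform(sig_lo, sig_hi, size=(reps, n))
lhs = phi(sigma_rand * eps).mean()

# Right-hand side of (6.9): max over constant sigma vectors, approximated on a grid.
grid = np.linspace(sig_lo, sig_hi, 11)
rhs = max(phi(np.array(s) * eps).mean() for s in product(grid, repeat=n))

print(lhs, rhs)   # lhs should not exceed rhs, up to Monte Carlo and grid error
```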

(Sequential independence case) We proceed by mathematical induction. For n=1, the results 3.29 and 3.30 as well as the finiteness 6.8 hold by applying 3.10. Suppose they also hold for n=k with k\in\mathbb{N}_{+}. Our objective is to prove them when n=k+1 by using the result with n=k. We decompose this goal into three inequalities:

[φ(𝑾(k+1))]sup𝝈k+1[σ¯,σ¯]𝔼[φ(𝝈ϵ)],\mathcal{E}[\varphi(\bm{W}_{(k+1)})]\leq\sup_{\bm{\sigma}\in\mathcal{L}_{k+1}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}*\bm{\epsilon})], (6.10)
sup𝝈k+1[σ¯,σ¯]𝔼[φ(𝝈ϵ)]sup𝝈k+1[σ¯,σ¯]𝔼[φ(𝝈ϵ)],\sup_{\bm{\sigma}\in\mathcal{L}_{k+1}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}*\bm{\epsilon})]\leq\sup_{\bm{\sigma}\in\mathcal{L}^{*}_{k+1}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}*\bm{\epsilon})], (6.11)

and

sup𝝈k+1[σ¯,σ¯]𝔼[φ(𝝈ϵ)][φ(𝑾(k+1))].\sup_{\bm{\sigma}\in\mathcal{L}^{*}_{k+1}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}*\bm{\epsilon})]\leq\mathcal{E}[\varphi(\bm{W}_{(k+1)})]. (6.12)

After we check the three inequalities above, sup\sup can be changed to max\max since we will show the sublinear expectation can be reached by some 𝝈k+1[σ¯,σ¯]\bm{\sigma}\in\mathcal{L}_{k+1}{[\underline{\sigma},\overline{\sigma}]} in the proof of 6.10.

First of all, 6.11 is straightforward due to the fact that k+1[σ¯,σ¯]k+1[σ¯,σ¯]\mathcal{L}_{k+1}{[\underline{\sigma},\overline{\sigma}]}\subset\mathcal{L}^{*}_{k+1}{[\underline{\sigma},\overline{\sigma}]}.

Second, to validate 6.10, it is sufficient to show that the sublinear expectation \mathcal{E}[\varphi(\bm{W})] can be reached by choosing some \bm{\sigma}\in\mathcal{L}_{k+1}{[\underline{\sigma},\overline{\sigma}]}. In fact, we can directly select it by the following iterative procedure (similar to the idea of 4.1).

[φ(W1,W2,,Wk+1)]\displaystyle\mathcal{E}[\varphi(W_{1},W_{2},\dotsc,W_{k+1})] =[φ(𝑾(k),Wk+1)]\displaystyle=\mathcal{E}[\varphi(\bm{W}_{(k)},W_{k+1})]
=[[φ(𝒘(k),Wk+1)]𝒘(k)=𝑾(k)]\displaystyle=\mathcal{E}[\mathcal{E}[\varphi(\bm{w}_{(k)},W_{k+1})]_{\bm{w}_{(k)}=\bm{W}_{(k)}}]
=[(maxσk+1[σ¯,σ¯]𝔼[φ(𝒘(k),σk+1ϵk+1)])𝒘(k)=𝑾(k)]\displaystyle=\mathcal{E}[(\max_{\sigma_{k+1}\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\bm{w}_{(k)},\sigma_{k+1}\epsilon_{k+1})])_{\bm{w}_{(k)}=\bm{W}_{(k)}}]
\displaystyle=\mathcal{E}[\mathbb{E}_{\mathbb{P}}[\varphi(\bm{w}_{(k)},v_{k+1}(\bm{w}_{(k)})\epsilon_{k+1})]_{\bm{w}_{(k)}=\bm{W}_{(k)}}],

where vk+1()v_{k+1}(\cdot) is the maximizer depending on the value of 𝒘(k)\bm{w}_{(k)}.

Claim 6.2.

For any φCl.Lip(k+1)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{k+1}), let

φk(x)maxσk+1[σ¯,σ¯]𝔼[φ(x,σk+1ϵk+1)].\varphi_{k}(x)\coloneqq\max_{\sigma_{k+1}\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(x,\sigma_{k+1}\epsilon_{k+1})].

Then we have φkCl.Lip(k)\varphi_{k}\in C_{\mathrm{l.Lip}}(\mathbb{R}^{k}).

To apply the result when n=k, we first confirm that \varphi_{k}\in C_{\mathrm{l.Lip}}(\mathbb{R}^{k}) (due to 6.2). Then we have

[φ(𝑾(k),Wk+1)]\displaystyle\mathcal{E}[\varphi(\bm{W}_{(k)},W_{k+1})] =[φk(𝑾(k))],\displaystyle=\mathcal{E}[\varphi_{k}(\bm{W}_{(k)})],
=max𝝈(k)k[σ¯,σ¯]𝔼[φk(𝝈(k)ϵ(k))]=𝔼[φk(𝒗(k)ϵ(k))]\displaystyle=\max_{\bm{\sigma}_{(k)}\in\mathcal{L}_{k}{{[\underline{\sigma},\overline{\sigma}]}}}\mathbb{E}_{\mathbb{P}}[\varphi_{k}(\bm{\sigma}_{(k)}*\bm{\epsilon}_{(k)})]=\mathbb{E}_{\mathbb{P}}[\varphi_{k}(\bm{v}_{(k)}*\bm{\epsilon}_{(k)})]

where 𝒗(k)k[σ¯,σ¯]\bm{v}_{(k)}\in\mathcal{L}_{k}{{[\underline{\sigma},\overline{\sigma}]}} is the maximizer. From this procedure, we can choose

𝒗(k+1)(𝒗(k),vk+1(𝒗(k)ϵ(k))),\bm{v}_{(k+1)}\coloneqq(\bm{v}_{(k)},v_{k+1}(\bm{v}_{(k)}*\bm{\epsilon}_{(k)})),

which corresponds to an element in \mathcal{L}_{k+1}{[\underline{\sigma},\overline{\sigma}]}. Then it is easy to confirm that \mathcal{E}[\varphi(\bm{W}_{(k+1)})]=\mathbb{E}_{\mathbb{P}}[\varphi(\bm{v}_{(k+1)}*\bm{\epsilon}_{(k+1)})] by repeating the procedure above. Meanwhile, the finiteness 6.8 is also guaranteed since, for any q\in\mathbb{N}_{+}, choosing \varphi(\cdot)=\lVert\cdot\rVert^{q}\in C_{\mathrm{l.Lip}}, we have

[𝑾(k+1)q]=[φ(𝑾(k+1))]=[φk(𝑾(k))]<,\mathcal{E}[\lVert\bm{W}_{(k+1)}\rVert^{q}]=\mathcal{E}[\varphi(\bm{W}_{(k+1)})]=\mathcal{E}[\varphi_{k}(\bm{W}_{(k)})]<\infty,

due to the confirmed fact that φkCl.Lip\varphi_{k}\in C_{\mathrm{l.Lip}} and the assumed 6.8 for n=kn=k.

Third, as an equivalent way of viewing 6.12, we need to prove that, for any \bm{\sigma}_{(k+1)}\in\mathcal{L}^{*}_{k+1}{[\underline{\sigma},\overline{\sigma}]}, the corresponding linear expectation is dominated by \mathcal{E}[\varphi(\bm{W}_{(k+1)})]. Actually, we can write the classical expectation as

𝔼[φ(𝝈(k)ϵ(k),σk+1ϵk+1)]=𝔼[𝔼[φ(𝝈(k)ϵ(k),σk+1ϵk+1)|k]].\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}_{(k)}\bm{\epsilon}_{(k)},\sigma_{k+1}\epsilon_{k+1})]=\mathbb{E}_{\mathbb{P}}[\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}_{(k)}\bm{\epsilon}_{(k)},\sigma_{k+1}\epsilon_{k+1})|\mathcal{F}_{k}]]. (6.13)

Recall the notation we used in the proof of 6.10,

φk(𝒘(k))maxσk+1[σ¯,σ¯]𝔼[φ(𝒘(k),σk+1ϵk+1)]=[φ(𝒘(k),Wk+1)].\varphi_{k}(\bm{w}_{(k)})\coloneqq\max_{\sigma_{k+1}\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\bm{w}_{(k)},\sigma_{k+1}\epsilon_{k+1})]=\mathcal{E}[\varphi(\bm{w}_{(k)},W_{k+1})].

For the conditional expectation part in 6.13, since the information of (𝝈(k),ϵ(k))(\bm{\sigma}_{(k)},\bm{\epsilon}_{(k)}) is given and σk+1ϵk+1|k\sigma_{k+1}\mathrel{\mathchoice{\hbox to0.0pt{$\displaystyle\perp$\hss}{\displaystyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\textstyle\perp$\hss}{\textstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptstyle\perp$\hss}{\scriptstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptscriptstyle\perp$\hss}{\scriptscriptstyle\mkern 2.0mu\perp}}}\epsilon_{k+1}|\mathcal{F}_{k}, from the representation of univariate semi-GG-normal (3.10), it must satisfy:

𝔼[φ(𝝈(k)ϵ(k),σk+1ϵk+1)|k]\displaystyle\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}_{(k)}\bm{\epsilon}_{(k)},\sigma_{k+1}\epsilon_{k+1})|\mathcal{F}_{k}] maxσk+1[σ¯,σ¯]𝔼[φ(𝝈(k)ϵ(k),σk+1ϵk+1)|k]=φk(𝝈(k)ϵ(k)).\displaystyle\leq\max_{\sigma_{k+1}\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}_{(k)}\bm{\epsilon}_{(k)},\sigma_{k+1}\epsilon_{k+1})|\mathcal{F}_{k}]=\varphi_{k}(\bm{\sigma}_{(k)}\bm{\epsilon}_{(k)}).

Hence, by taking expectations on both sides and applying the presumed result for n=kn=k, we have

𝔼[φ(𝝈(k)ϵ(k),σk+1ϵk+1)]\displaystyle\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}_{(k)}\bm{\epsilon}_{(k)},\sigma_{k+1}\epsilon_{k+1})] 𝔼[φk(𝝈(k)ϵ(k))]max𝝈(k)k[σ¯,σ¯]𝔼[φk(𝝈(k)ϵ(k))]\displaystyle\leq\mathbb{E}_{\mathbb{P}}[\varphi_{k}(\bm{\sigma}_{(k)}\bm{\epsilon}_{(k)})]\leq\max_{\bm{\sigma}_{(k)}\in\mathcal{L}^{*}_{k}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi_{k}(\bm{\sigma}_{(k)}\bm{\epsilon}_{(k)})]
=[φk(𝑾(k))]=[[φ(𝒘(k),Wk+1)]𝒘(k)=𝑾(k)]\displaystyle=\mathcal{E}[\varphi_{k}(\bm{W}_{(k)})]=\mathcal{E}[\mathcal{E}[\varphi(\bm{w}_{(k)},W_{k+1})]_{\bm{w}_{(k)}=\bm{W}_{(k)}}]
=[φ(𝑾(k),Wk+1)].\displaystyle=\mathcal{E}[\varphi(\bm{W}_{(k)},W_{k+1})].

Therefore, we have shown 6.12. The proof is completed by induction.

(Fully-sequential independence case) Note that fully-sequential independence implies sequential independence, and we have shown an explicit representation of \mathcal{E}[\varphi(\bm{W})] for the latter situation. Hence, the representation here is the same as 3.29 and 3.30.

To prove 6.2, first recall the definition of \varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{k+1}), which means there exist m\in\mathbb{N}_{+} and C_{0}>0 such that for \bm{x},\bm{y}\in\mathbb{R}^{k+1},

|φ(𝒙)φ(𝒚)|C0(1+𝒙m+𝒚m)𝒙𝒚.\lvert\varphi(\bm{x})-\varphi(\bm{y})\rvert\leq C_{0}(1+\lVert\bm{x}\rVert^{m}+\lVert\bm{y}\rVert^{m})\lVert\bm{x}-\bm{y}\rVert.

Note that

φk(𝒙)=maxv[σ¯,σ¯]𝔼[φ(𝒙,vϵ)].\varphi_{k}(\bm{x})=\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\bm{x},v\epsilon)].

Then we write

\displaystyle|\varphi_{k}(\bm{x})-\varphi_{k}(\bm{y})| \displaystyle\leq\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}\bigl{|}\mathbb{E}[\varphi(\bm{x},v\epsilon)-\varphi(\bm{y},v\epsilon)]\bigr{|}
\displaystyle\leq\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\lvert\varphi(\bm{x},v\epsilon)-\varphi(\bm{y},v\epsilon)\rvert]
maxv[σ¯,σ¯]𝔼[C0(1+(𝒙,vϵ)m+(𝒚,vϵ)m)𝒙𝒚],\displaystyle\leq\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[C_{0}(1+\lVert(\bm{x},v\epsilon)\rVert^{m}+\lVert(\bm{y},v\epsilon)\rVert^{m})\lVert\bm{x}-\bm{y}\rVert],

where we adapt the norm to the lower dimension in the sense that \lVert\bm{a}_{(k)}\rVert\coloneqq\lVert(\bm{a}_{(k)},0)\rVert. By the triangle inequality, for any v\in{[\underline{\sigma},\overline{\sigma}]},

(𝒙,vϵ)𝒙+|vϵ|𝒙+σ¯|ϵ|,\lVert(\bm{x},v\epsilon)\rVert\leq\lVert\bm{x}\rVert+\lvert v\epsilon\rvert\leq\lVert\bm{x}\rVert+\overline{\sigma}\lvert\epsilon\rvert,

then

(x,vϵ)m(x+σ¯|ϵ|)mmax{1,2m1}(xm+σ¯m|ϵ|m).\lVert(x,v\epsilon)\rVert^{m}\leq(\lVert x\rVert+\overline{\sigma}\lvert\epsilon\rvert)^{m}\leq\max\{1,2^{m-1}\}(\lVert x\rVert^{m}+\overline{\sigma}^{m}\lvert\epsilon\rvert^{m}).

Hence, with C1=C0max{1,2m1}C_{1}=C_{0}\max\{1,2^{m-1}\} and C2=C1max{1,2σ¯m𝔼[|ϵ|m]}C_{2}=C_{1}\max\{1,2\overline{\sigma}^{m}\mathbb{E}[\lvert\epsilon\rvert^{m}]\},

|φk(𝒙)φk(𝒚)|\displaystyle|\varphi_{k}(\bm{x})-\varphi_{k}(\bm{y})| C0maxv[σ¯,σ¯](1+[(x,vϵ)m]+[(y,vϵ)m])xy\displaystyle\leq C_{0}\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}(1+\mathcal{E}[\lVert(x,v\epsilon)\rVert^{m}]+\mathcal{E}[\lVert(y,v\epsilon)\rVert^{m}])\lVert x-y\rVert
C1(1+xm+ym+2σ¯m𝔼[|ϵ|m])xy\displaystyle\leq C_{1}(1+\lVert x\rVert^{m}+\lVert y\rVert^{m}+2\overline{\sigma}^{m}\mathbb{E}[\lvert\epsilon\rvert^{m}])\lVert x-y\rVert
C2(1+xm+ym)xy.\displaystyle\leq C_{2}(1+\lVert x\rVert^{m}+\lVert y\rVert^{m})\lVert x-y\rVert.\qed
Proof of 3.24.2.

Under semi-sequential independence, note that

(W1,W2,,Wn)𝒩^(𝟎,𝒞),(W_{1},W_{2},\dotsc,W_{n})\sim\hat{\mathcal{N}}(\bm{0},\mathcal{C}),

with \mathcal{C}=\{\operatorname{diag}(\sigma^{2}_{1},\sigma^{2}_{2},\dotsc,\sigma^{2}_{n}):\sigma_{i}\in{[\underline{\sigma},\overline{\sigma}]}\}. It has the representation (3.5):

[φ(𝑾(n))]=maxσi[σ¯,σ¯]𝔼[φ(diag(σ1,σ2,,σn)ϵ(n))],\mathcal{E}[\varphi(\bm{W}_{(n)})]=\max_{\sigma_{i}\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\operatorname{diag}(\sigma_{1},\sigma_{2},\dotsc,\sigma_{n})\bm{\epsilon}_{(n)})],

where \bm{\epsilon}_{(n)} follows a standard multivariate normal distribution. When \varphi is convex, by simply repeating the idea of 3.12 in the multivariate case, we can show

\mathcal{E}[\varphi(\bm{W}_{(n)})]=\mathbb{E}[\varphi(\operatorname{diag}(\overline{\sigma},\overline{\sigma},\dotsc,\overline{\sigma})\bm{\epsilon}_{(n)})].

Accordingly, when \varphi is concave, we can get a similar result with \overline{\sigma} replaced by \underline{\sigma}.

Under sequential independence, based on the idea of showing

[φ(𝑾)]=max𝝈n[σ¯,σ¯]𝔼[φ(𝝈ϵ)],\mathcal{E}[\varphi(\bm{W})]=\max_{\bm{\sigma}\in\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\bm{\sigma}*\bm{\epsilon})],

in 3.24. The maximizer can be obtained by implementing the iterative algorithm: with φ0φ\varphi_{0}\coloneqq\varphi, i=1,2,,ni=1,2,\dotsc,n,

φi(𝒙(ni))=maxσni+1[σ¯,σ¯]𝔼[φi1(𝒙(ni),σni+1ϵni+1)].\varphi_{i}(\bm{x}_{(n-i)})=\max_{\sigma_{n-i+1}\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi_{i-1}(\bm{x}_{(n-i)},\sigma_{n-i+1}\epsilon_{n-i+1})]. (6.14)

Then we only need to record the optimizer σni+1()\sigma_{n-i+1}(\cdot) which is a function of 𝒙(ni)\bm{x}_{(n-i)} to get the maximizer 𝝈n[σ¯,σ¯]\bm{\sigma}^{*}\in\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}. First we can show that, for i=1,2,,ni=1,2,\dotsc,n,

φi1 is convex (concave)φi is convex (concave),\varphi_{i-1}\text{ is convex (concave)}\implies\varphi_{i}\text{ is convex (concave)}, (6.15)

namely, the convexity (or concavity) of φi1\varphi_{i-1} can be carried over to φi\varphi_{i}. Actually, if φi1\varphi_{i-1} is convex (in ni+1)\mathbb{R}^{n-i+1}), it must be convex with respect to each subvector of arguments. Then by applying 3.12, we have

φi(𝒙(ni))=𝔼[φi1(𝒙(ni),σ¯ϵni+1)],\varphi_{i}(\bm{x}_{(n-i)})=\mathbb{E}[\varphi_{i-1}(\bm{x}_{(n-i)},\overline{\sigma}\epsilon_{n-i+1})], (6.16)

which also gives us the choice of σni+1\sigma_{n-i+1}. Then we can validate the convexity of φi\varphi_{i} by definition: with λ[0,1]\lambda\in[0,1], eσ¯ϵni+1e\coloneqq\overline{\sigma}\epsilon_{n-i+1},

φi(λ𝒙(ni)+(1λ)𝒚(ni))\displaystyle\varphi_{i}(\lambda\bm{x}_{(n-i)}+(1-\lambda)\bm{y}_{(n-i)}) =𝔼[φi1(λ𝒙(ni)+(1λ)𝒚(ni),e)]\displaystyle=\mathbb{E}[\varphi_{i-1}(\lambda\bm{x}_{(n-i)}+(1-\lambda)\bm{y}_{(n-i)},e)]
=𝔼[φi1(λ(𝒙(ni),e)+(1λ)(𝒚(ni),e))]\displaystyle=\mathbb{E}[\varphi_{i-1}(\lambda(\bm{x}_{(n-i)},e)+(1-\lambda)(\bm{y}_{(n-i)},e))]
λ𝔼[φi1(𝒙(ni),e)]+(1λ)𝔼[φi1(𝒚(ni),e)]\displaystyle\leq\lambda\mathbb{E}[\varphi_{i-1}(\bm{x}_{(n-i)},e)]+(1-\lambda)\mathbb{E}[\varphi_{i-1}(\bm{y}_{(n-i)},e)]
=λφi(𝒙(ni))+(1λ)φi(𝒚(ni)).\displaystyle=\lambda\varphi_{i}(\bm{x}_{(n-i)})+(1-\lambda)\varphi_{i}(\bm{y}_{(n-i)}).

We can follow the same arguments to show the concave case. Finally, we can start from the convexity (concavity, respectively) of φ0\varphi_{0} to show the convexity (concavity) of all φi\varphi_{i} and, along the way, we get that each optimal σni+1\sigma_{n-i+1} equals σ¯\overline{\sigma} (σ¯\underline{\sigma}, respectively). ∎
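
The backward recursion 6.14 can also be run numerically. The following sketch (Python; the quadrature order, σ-grid and convex test function are illustrative assumptions, with n = 2 for brevity) records the optimizing σ at each step and, in line with the argument above, returns σ̄ throughout when φ is convex:

import numpy as np

x, w = np.polynomial.hermite.hermgauss(60)
nodes, weights = np.sqrt(2.0) * x, w / np.sqrt(np.pi)   # E[f(Z)] = sum(weights * f(nodes))

sig_lo, sig_hi = 0.5, 1.5
sig_grid = np.linspace(sig_lo, sig_hi, 21)
phi = lambda x1, x2: np.abs(x1 + x2) ** 3                # a convex test function

# step i = 1 of 6.14: phi_1(x1) = max over sigma_2 of E[phi(x1, sigma_2 * eps_2)], with its argmax
def phi1(x1):
    vals = np.array([np.sum(weights * phi(x1, s * nodes)) for s in sig_grid])
    return vals.max(), sig_grid[vals.argmax()]

# step i = 2: the sublinear expectation is max over sigma_1 of E[phi_1(sigma_1 * eps_1)]
outer_vals = [np.sum(weights * np.array([phi1(s1 * z)[0] for z in nodes])) for s1 in sig_grid]
print("optimal sigma_1:", sig_grid[int(np.argmax(outer_vals))])       # expect sig_hi
print("optimal sigma_2 at x1 = -2, 0, 2:", [phi1(t)[1] for t in (-2.0, 0.0, 2.0)])
print("sublinear expectation:", max(outer_vals))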

Proof of 3.24.3.

The idea is similar to the proof of 3.24.2, with 6.14 replaced by

φi(𝒙(ni))=[φi1(𝒙(ni),WGni+1)].\varphi_{i}(\bm{x}_{(n-i)})=\mathcal{E}[\varphi_{i-1}(\bm{x}_{(n-i)},W^{G}_{n-i+1})].

To check the statement 6.15, we can use 2.15 together with the convexity of φ\varphi to show 6.16. The remaining part is the same as in the proof of 3.24.2. ∎

6.5 Proofs in Section 4.3

The goal of this section is to prove 4.8, which is a simple representation for [φ(W1,W2)]\mathcal{E}[\varphi(W_{1},W_{2})] for φCs.poly\varphi\in C_{\text{s.poly}}. Throughout this section, without further notice, we consider W1=dW2=d𝒩^(0,[σ¯2,σ¯2])W_{1}\overset{\text{d}}{=}W_{2}\overset{\text{d}}{=}\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) imposed with the sequential independence W1W2W_{1}\dashrightarrow W_{2}. We also have the expression WiViϵiW_{i}\coloneqq V_{i}\epsilon_{i} with Vi[σ¯,σ¯]V_{i}\sim\mathcal{M}[\underline{\sigma},\overline{\sigma}], ϵiN(0,1)\epsilon_{i}\sim N(0,1) and ViϵiV_{i}\dashrightarrow\epsilon_{i}.

Lemma 6.6.

For p,q+p,q\in\mathbb{N}_{+}, if qq is odd,

[W1pW2q]=[W1pW2q]=0,\mathcal{E}[W_{1}^{p}W_{2}^{q}]=-\mathcal{E}[-W_{1}^{p}W_{2}^{q}]=0,

that is, it has certain zero mean.

Proof.

We work directly on the sublinear expectation by imposing the sequential independence. Let K(x)[xW2q];K(x)\coloneqq\mathcal{E}[xW_{2}^{q}]; then we have

K(x)=x+[W2q]+x[W2q]=0,K(x)=x^{+}\mathcal{E}[W_{2}^{q}]+x^{-}\mathcal{E}[-W_{2}^{q}]=0,

because we are essentially working on the odd-moment of σϵ\sigma\epsilon with σ[σ¯,σ¯]\sigma\in{[\underline{\sigma},\overline{\sigma}]}. Then we have

[W1pW2q]\displaystyle\mathcal{E}[W_{1}^{p}W_{2}^{q}] =[[w1pW2q]w1=W1]\displaystyle=\mathcal{E}[\mathcal{E}[w_{1}^{p}W_{2}^{q}]_{w_{1}=W_{1}}]
=[K(W1p)]=0.\displaystyle=\mathcal{E}[K(W_{1}^{p})]=0.

Similarly, we have [W1pW2q]=0-\mathcal{E}[-W_{1}^{p}W_{2}^{q}]=0. ∎
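
As a numerical sanity check of 6.6 (a hedged sketch; the exponents and σ-bounds below are arbitrary), evaluating ℰ[W1^p W2^q] by the same two-step iterated maximization returns a value that is numerically zero whenever q is odd:

import numpy as np

x, w = np.polynomial.hermite.hermgauss(60)
nodes, weights = np.sqrt(2.0) * x, w / np.sqrt(np.pi)   # E[f(Z)] = sum(weights * f(nodes))

sig_lo, sig_hi = 0.5, 1.5
sig_grid = np.linspace(sig_lo, sig_hi, 21)

def subexp_monomial(p, q):
    # E_hat[W_1^p W_2^q] under W_1 --> W_2, by the iterated maximization over sigma_2(w_1), sigma_1
    def phi1(x1):
        return max(np.sum(weights * (x1 ** p) * (s * nodes) ** q) for s in sig_grid)
    return max(np.sum(weights * np.array([phi1(s1 * z) for z in nodes])) for s1 in sig_grid)

print(subexp_monomial(2, 3))   # q odd: numerically zero
print(subexp_monomial(3, 5))   # q odd: numerically zero
print(subexp_monomial(2, 2))   # q even: strictly positive, shown for contrast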

Lemma 6.7.

For φ(x1,x2)=(x1+x2)n\varphi(x_{1},x_{2})=(x_{1}+x_{2})^{n} with even positive integer nn,

[φ(W1,W2)]=𝔼[φ(σ¯ϵ1,σ¯ϵ2)].\mathcal{E}[\varphi(W_{1},W_{2})]=\mathbb{E}[\varphi(\overline{\sigma}\epsilon_{1},\overline{\sigma}\epsilon_{2})].

Furthermore, we have the even moments of (W1+W2)(W_{1}+W_{2}):

[(W1+W2)n]=(n1)!!σ¯n2n/2.\mathcal{E}[(W_{1}+W_{2})^{n}]=(n-1)!!\overline{\sigma}^{n}2^{n/2}.
Proof.

This result follows directly from the convexity of φ\varphi (which can be verified by considering its Hessian matrix), so the maximizing scenario takes the constant value σ¯\overline{\sigma}. Then we can check that

[(W1+W2)n]=𝔼[σ¯n(ϵ1+ϵ2)n]=𝔼[σ¯n(2ϵ1)n]=(n1)!!σ¯n2n/2.\mathcal{E}[(W_{1}+W_{2})^{n}]=\mathbb{E}[\overline{\sigma}^{n}(\epsilon_{1}+\epsilon_{2})^{n}]=\mathbb{E}[\overline{\sigma}^{n}(\sqrt{2}\epsilon_{1})^{n}]=(n-1)!!\overline{\sigma}^{n}2^{n/2}.\qed
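
A quick Monte Carlo check of this even-moment formula (a sketch; the value of σ̄ and the even exponent n are arbitrary, and the double factorial is computed directly):

import numpy as np

rng = np.random.default_rng(0)
sig_hi, n = 1.5, 4                                        # sigma-bar and an even n (illustrative)
eps1, eps2 = rng.standard_normal((2, 10**6))

mc = np.mean((sig_hi * eps1 + sig_hi * eps2) ** n)        # E[(sig_hi*eps_1 + sig_hi*eps_2)^n]
closed = np.prod(np.arange(n - 1, 0, -2)) * sig_hi ** n * 2 ** (n / 2)   # (n-1)!! * sig_hi^n * 2^(n/2)
print(mc, closed)                                         # the two values should roughly agree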

Lemma 6.8.

For φ(x1,x2)=(x1+x2)n\varphi(x_{1},x_{2})=(x_{1}+x_{2})^{n} with odd positive integer nn,

[φ(W1,W2)]=𝔼[φ(σ1ϵ1,σ2ϵ2)],\mathcal{E}[\varphi(W_{1},W_{2})]=\mathbb{E}_{\mathbb{P}}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2})], (6.17)

where (σ1,σ2)(\sigma_{1},\sigma_{2}) satisfies:

σ1\displaystyle\sigma_{1} =σ¯,\displaystyle=\overline{\sigma},
σ2\displaystyle\sigma_{2} =σ2(σ1ϵ1)\displaystyle=\sigma_{2}(\sigma_{1}\epsilon_{1})
=σ¯(σ1ϵ1)+σ¯(σ1ϵ1).\displaystyle=\overline{\sigma}(\sigma_{1}\epsilon_{1})^{+}-\underline{\sigma}(\sigma_{1}\epsilon_{1})^{-}. (6.18)

Furthermore, for odd n3n\geq 3, we have the moments of (W1+W2)(W_{1}+W_{2}):

[(W1+W2)n]=2π(k=0(n3)/2Ckσ¯2k+12k1k!),\mathcal{E}[(W_{1}+W_{2})^{n}]=\sqrt{\frac{2}{\pi}}\bigl{(}\sum_{k=0}^{(n-3)/2}C_{k}\overline{\sigma}^{2k+1}2^{k-1}k!\bigr{)},

where Ck=(n2k+1)(n2k2)!!(σ¯n2k1σ¯n2k1).C_{k}=\binom{n}{2k+1}(n-2k-2)!!(\overline{\sigma}^{n-2k-1}-\underline{\sigma}^{n-2k-1}).

Proof.

We can directly check the sublinear expectation

[(W1+W2)n]\displaystyle\mathcal{E}[(W_{1}+W_{2})^{n}] =[i=0n(ni)W1iW2ni].\displaystyle=\mathcal{E}[\sum_{i=0}^{n}\binom{n}{i}W_{1}^{i}W_{2}^{n-i}].

Since the terms in the form of W1nW_{1}^{n}, W2nW_{2}^{n} or W1iW2niW_{1}^{i}W_{2}^{n-i} with even ii (then nin-i is odd) all have zero mean (with no ambiguity), they can be omitted in the computation by 6.6. Hence,

[(W1+W2)n]\displaystyle\mathcal{E}[(W_{1}+W_{2})^{n}] =[i odd(ni)W1iW2ni]\displaystyle=\mathcal{E}[\sum_{i\text{ odd}}\binom{n}{i}W_{1}^{i}W_{2}^{n-i}]
\displaystyle=\mathcal{E}\Bigl[\underbrace{\mathcal{E}\bigl[\sum_{i\text{ odd}}\binom{n}{i}w_{1}^{i}W_{2}^{n-i}\bigr]}_{\eqqcolon\varphi_{1}(w_{1})}\Big|_{w_{1}=W_{1}}\Bigr].

The inner part can be expressed as

φ1(w1)\displaystyle\varphi_{1}(w_{1}) =[i odd(ni)w1iW2ni]\displaystyle=\mathcal{E}[\sum_{i\text{ odd}}\binom{n}{i}w_{1}^{i}W_{2}^{n-i}]
=maxσ2[σ¯,σ¯]𝔼[i odd(ni)w1i(σ2ϵ2)ni]\displaystyle=\max_{\sigma_{2}\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\sum_{i\text{ odd}}\binom{n}{i}w_{1}^{i}(\sigma_{2}\epsilon_{2})^{n-i}]
=maxσ2[σ¯,σ¯]i odd(ni)w1iσ2ni𝔼[ϵ2ni]\displaystyle=\max_{\sigma_{2}\in{[\underline{\sigma},\overline{\sigma}]}}\sum_{i\text{ odd}}\binom{n}{i}w_{1}^{i}\sigma_{2}^{n-i}\mathbb{E}[\epsilon_{2}^{n-i}]
\displaystyle=\max_{\sigma_{2}\in{[\underline{\sigma},\overline{\sigma}]}}\sum_{k=0}^{(n-3)/2}\binom{n}{2k+1}w_{1}^{2k+1}\sigma_{2}^{n-2k-1}(n-2k-2)!!\eqqcolon\max_{\sigma_{2}\in{[\underline{\sigma},\overline{\sigma}]}}H(\sigma_{2};w_{1}).

Notice that the monotonicity of H(σ2;w1)H(\sigma_{2};w_{1}) with respect to σ2\sigma_{2} depends on the sign of w1w_{1}. Hence

φ1(w1)\displaystyle\varphi_{1}(w_{1}) ={H(σ¯;w1) if w10H(σ¯;w1) if w1<0\displaystyle=\begin{cases}H(\overline{\sigma};w_{1})&\text{ if }w_{1}\geq 0\\ H(\underline{\sigma};w_{1})&\text{ if }w_{1}<0\end{cases}
=𝟙{w10}(H(σ¯;w1)H(σ¯;w1))+H(σ¯;w1)\displaystyle=\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{w_{1}\geq 0}}\right\}}(H(\overline{\sigma};w_{1})-H(\underline{\sigma};w_{1}))+H(\underline{\sigma};w_{1})

Then we have

[(W1+W2)n]\displaystyle\mathcal{E}[(W_{1}+W_{2})^{n}] =[φ1(W1)]\displaystyle=\mathcal{E}[\varphi_{1}(W_{1})]
=[𝟙{W10}(H(σ¯;W1)H(σ¯;W1))+H(σ¯;W1)].\displaystyle=\mathcal{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{W_{1}\geq 0}}\right\}}(H(\overline{\sigma};W_{1})-H(\underline{\sigma};W_{1}))+H(\underline{\sigma};W_{1})].

Here we have

\mathcal{E}[H(\underline{\sigma};W_{1})]=\mathcal{E}[\sum_{k=0}^{(n-3)/2}\binom{n}{2k+1}\underline{\sigma}^{n-2k-1}(n-2k-2)!!W_{1}^{2k+1}],

where each W12k+1W_{1}^{2k+1} has certain mean zero, so [H(σ¯;W1)]=[H(σ¯;W1)]=0\mathcal{E}[H(\underline{\sigma};W_{1})]=-\mathcal{E}[-H(\underline{\sigma};W_{1})]=0. Therefore,

[(W1+W2)n]\displaystyle\mathcal{E}[(W_{1}+W_{2})^{n}] =[𝟙{W10}(H(σ¯;W1)H(σ¯;W1))]\displaystyle=\mathcal{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{W_{1}\geq 0}}\right\}}(H(\overline{\sigma};W_{1})-H(\underline{\sigma};W_{1}))]
=[𝟙{W10}k=0(n3)/2CkW12k+1][K(W1)]\displaystyle=\mathcal{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{W_{1}\geq 0}}\right\}}\sum_{k=0}^{(n-3)/2}C_{k}W_{1}^{2k+1}]\eqqcolon\mathcal{E}[K(W_{1})]

with Ck=(n2k+1)(n2k2)!!(σ¯n2k1σ¯n2k1)0C_{k}=\binom{n}{2k+1}(n-2k-2)!!(\overline{\sigma}^{n-2k-1}-\underline{\sigma}^{n-2k-1})\geq 0. Since K(x)K(x) is a convex function (it vanishes on x<0x<0 and is a convex increasing polynomial with nonnegative coefficients on x0x\geq 0), we have

[(W1+W2)n]=𝔼[K(σ¯ϵ1)].\mathcal{E}[(W_{1}+W_{2})^{n}]=\mathbb{E}[K(\overline{\sigma}\epsilon_{1})].

Therefore, we obtain the optimal (σ1,σ2)(\sigma_{1},\sigma_{2}) in the form 6.18, which can be double-checked by plugging it back into the right-hand side of 6.17 to show the equality. We can further obtain the exact value of [(W1+W2)n]\mathcal{E}[(W_{1}+W_{2})^{n}] by continuing the procedure above:

[𝟙{W10}k=0(n3)/2CkW12k+1]\displaystyle\mathcal{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{W_{1}\geq 0}}\right\}}\sum_{k=0}^{(n-3)/2}C_{k}W_{1}^{2k+1}] =maxσ1[σ¯,σ¯]𝔼[𝟙{σ1ϵ10}k=0(n3)/2Ck(σ1ϵ1)2k+1]\displaystyle=\max_{\sigma_{1}\in{[\underline{\sigma},\overline{\sigma}]}}\text{$\mathbb{E}$}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\sigma_{1}\epsilon_{1}\geq 0}}\right\}}\sum_{k=0}^{(n-3)/2}C_{k}(\sigma_{1}\epsilon_{1})^{2k+1}]
=𝔼[𝟙{ϵ10}k=0(n3)/2Ck(σ¯ϵ1)2k+1]\displaystyle=\text{$\mathbb{E}$}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}\geq 0}}\right\}}\sum_{k=0}^{(n-3)/2}C_{k}(\overline{\sigma}\epsilon_{1})^{2k+1}]
=k=0(n3)/2Ckσ¯2k+1𝔼[𝟙{ϵ10}ϵ12k+1].\displaystyle=\sum_{k=0}^{(n-3)/2}C_{k}\overline{\sigma}^{2k+1}\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}\geq 0}}\right\}}\epsilon_{1}^{2k+1}].

Here we need to use the property of the classical half-normal distribution:

𝔼[|ϵ|2k+1]\displaystyle\mathbb{E}[|\epsilon|^{2k+1}] =𝔼[|ϵ2k+1|]\displaystyle=\mathbb{E}[|\epsilon^{2k+1}|]
=𝔼[𝟙{ϵ10}ϵ12k+1]+𝔼[𝟙{ϵ1<0}(ϵ1)2k+1].\displaystyle=\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}\geq 0}}\right\}}\epsilon_{1}^{2k+1}]+\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}<0}}\right\}}(-\epsilon_{1})^{2k+1}].

Since ϵ\epsilon and ϵ-\epsilon have the same distribution,

𝔼[𝟙{ϵ1<0}(ϵ1)2k+1]\displaystyle\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}<0}}\right\}}(-\epsilon_{1})^{2k+1}] =𝔼[𝟙{ϵ10}(ϵ1)2k+1]\displaystyle=\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}\leq 0}}\right\}}(-\epsilon_{1})^{2k+1}]
=𝔼[𝟙{ϵ10}(ϵ1)2k+1]\displaystyle=\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{-\epsilon_{1}\geq 0}}\right\}}(-\epsilon_{1})^{2k+1}]
=𝔼[𝟙{ϵ10}ϵ12k+1],\displaystyle=\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}\geq 0}}\right\}}\epsilon_{1}^{2k+1}],

then 𝔼[|ϵ|2k+1]=2𝔼[𝟙{ϵ10}ϵ12k+1].\mathbb{E}[|\epsilon|^{2k+1}]=2\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}\geq 0}}\right\}}\epsilon_{1}^{2k+1}]. Hence,

𝔼[𝟙{ϵ10}ϵ12k+1]=12𝔼[|ϵ|2k+1].\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}\geq 0}}\right\}}\epsilon_{1}^{2k+1}]=\frac{1}{2}\mathbb{E}[|\epsilon|^{2k+1}].

Also notice that, since ϵN(0,1)\epsilon\sim N(0,1), |ϵ||\epsilon| follows a half-normal distribution (equivalently, a χ1\chi_{1}-distribution) with raw moments:

𝔼[|ϵ|n]=2n/2Γ((n+1)/2)Γ(1/2),\mathbb{E}[|\epsilon|^{n}]=2^{n/2}\frac{\Gamma((n+1)/2)}{\Gamma(1/2)},

Then for n=2k+1n=2k+1 with kk\in\mathbb{N},

𝔼[|ϵ|2k+1]=2k2πk!.\mathbb{E}[\lvert\epsilon\rvert^{2k+1}]=2^{k}\sqrt{\frac{2}{\pi}}k!.

Therefore,

[(W1+W2)n]\displaystyle\mathcal{E}[(W_{1}+W_{2})^{n}] =𝔼[K(σ¯ϵ1)]\displaystyle=\mathbb{E}[K(\overline{\sigma}\epsilon_{1})]
=k=0(n3)/2Ckσ¯2k+112𝔼[|ϵ|2k+1]\displaystyle=\sum_{k=0}^{(n-3)/2}C_{k}\overline{\sigma}^{2k+1}\frac{1}{2}\mathbb{E}[|\epsilon|^{2k+1}]
=2π(k=0(n3)/2Ckσ¯2k+12k1k!).\displaystyle=\sqrt{\frac{2}{\pi}}\bigl{(}\sum_{k=0}^{(n-3)/2}C_{k}\overline{\sigma}^{2k+1}2^{k-1}k!\bigr{)}.\qed
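
The scenario 6.18 and the closed-form odd moment above can be cross-checked by simulation (a sketch; the σ-bounds and the odd exponent n are arbitrary choices): simulating (σ1 ε1 + σ2 ε2)^n with σ1 = σ̄ and σ2 switching between σ̄ and σ̲ according to the sign of σ1 ε1 should reproduce the displayed formula.

from math import comb, factorial, pi, sqrt
import numpy as np

rng = np.random.default_rng(1)
sig_lo, sig_hi, n = 0.5, 1.5, 5                           # n odd >= 3; bounds are illustrative
eps1, eps2 = rng.standard_normal((2, 10**6))

# the scenario 6.18: sigma_1 = sig_hi and sigma_2 determined by the sign of sigma_1 * eps_1
s2 = np.where(sig_hi * eps1 >= 0, sig_hi, sig_lo)
mc = np.mean((sig_hi * eps1 + s2 * eps2) ** n)

def dfact(m):
    # double factorial m!!, with the convention dfact(m) = 1 for m <= 0
    return int(np.prod(np.arange(m, 0, -2)))

closed = sqrt(2 / pi) * sum(
    comb(n, 2 * k + 1) * dfact(n - 2 * k - 2)
    * (sig_hi ** (n - 2 * k - 1) - sig_lo ** (n - 2 * k - 1))
    * sig_hi ** (2 * k + 1) * 2 ** (k - 1) * factorial(k)
    for k in range((n - 1) // 2)
)
print(mc, closed)                                         # the two values should roughly agree
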
Proof of 4.8.

The representation under semi-sequential independence can be directly checked based on 3.24 and 3.24.3. In what follows, we only consider the case of sequential independence W1W2W_{1}\dashrightarrow W_{2}, because W1FW2W_{1}\overset{\text{F}}{\dashrightarrow}W_{2} induces the same result by a logic similar to the proof of 3.24. The basic idea is to show that [φ(W1,W2)]\mathcal{E}[\varphi(W_{1},W_{2})] can be attained by the linear expectation on the right-hand side for some 𝝈20[σ¯,σ¯]\bm{\sigma}\in\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}. Then we have

[φ(W1,W2)]max𝝈20[σ¯,σ¯]𝔼[φ(σ1ϵ1,σ2ϵ2)].\mathcal{E}[\varphi(W_{1},W_{2})]\leq\max_{\bm{\sigma}\in\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2})].

The reverse direction of the inequality comes from the fact that 20[σ¯,σ¯]2[σ¯,σ¯]\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}\subset\mathcal{L}_{2}{[\underline{\sigma},\overline{\sigma}]} together with 3.24. The logic here is similar to the proof of 3.24 in the sequential-independence case: we only need to record the optimal choice of (σ1,σ2)(\sigma_{1},\sigma_{2}) when evaluating the sublinear expectation in an iterative way.

For instance, when φ(x1,x2)=(x1+x2)n\varphi(x_{1},x_{2})=(x_{1}+x_{2})^{n}, the sublinear expectation can be reached by some 𝝈20[σ¯,σ¯]\bm{\sigma}\in\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]} as illustrated in 6.7 and 6.8. For φ(x1,x2)=x1px2q\varphi(x_{1},x_{2})=x_{1}^{p}x_{2}^{q} with p,q+p,q\in\mathbb{N}_{+},

\mathcal{E}[W_{1}^{p}W_{2}^{q}]=\mathcal{E}\Bigl[\bigl(\max_{\sigma_{2}\in{[\underline{\sigma},\overline{\sigma}]}}w_{1}^{p}\mathbb{E}[(\sigma_{2}\epsilon_{2})^{q}]\bigr)_{w_{1}=W_{1}}\Bigr]. (6.19)

Meanwhile, for any (σ1,σ2)20[σ¯,σ¯](\sigma_{1},\sigma_{2})\in\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}, let Yi=σiϵi,i=1,2Y_{i}=\sigma_{i}\epsilon_{i},i=1,2 and 𝔼=𝔼\mathbb{E}=\mathbb{E}_{\mathbb{P}} denote the linear expectation. Note that σ1{σ¯,σ¯}\sigma_{1}\in\{\underline{\sigma},\overline{\sigma}\},

σ2=σ2(Y1)=(σ22σ21)𝟙{Y1>0}+σ21,\sigma_{2}=\sigma_{2}(Y_{1})=(\sigma_{22}-\sigma_{21})\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{Y_{1}>0}}\right\}}+\sigma_{21},

with (σ21,σ22){σ¯,σ¯}2(\sigma_{21},\sigma_{22})\in\{\underline{\sigma},\overline{\sigma}\}^{2}. We also have Y1ϵ2Y_{1}\mathrel{\mathchoice{\hbox to0.0pt{$\displaystyle\perp$\hss}{\displaystyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\textstyle\perp$\hss}{\textstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptstyle\perp$\hss}{\scriptstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptscriptstyle\perp$\hss}{\scriptscriptstyle\mkern 2.0mu\perp}}}\epsilon_{2} due to the setup which is the same as Section 3.7. Then

𝔼[Y1pY2q]\displaystyle\mathbb{E}[Y_{1}^{p}Y_{2}^{q}] =𝔼[Y1p𝔼[Y2q|Y1]]\displaystyle=\mathbb{E}\bigl{[}Y_{1}^{p}\mathbb{E}[Y_{2}^{q}|Y_{1}]\bigr{]}
=𝔼[Y1p𝔼[(σ2(Y1))qϵ2q|Y1]]\displaystyle=\mathbb{E}\bigl{[}Y_{1}^{p}\mathbb{E}[(\sigma_{2}(Y_{1}))^{q}\epsilon_{2}^{q}|Y_{1}]\bigr{]}
=𝔼[(σ1ϵ1)p(σ2(Y1))q]𝔼[ϵ2q]\displaystyle=\mathbb{E}[(\sigma_{1}\epsilon_{1})^{p}(\sigma_{2}(Y_{1}))^{q}]\mathbb{E}[\epsilon_{2}^{q}]
\displaystyle=\sigma_{1}^{p}\mathbb{E}[((\sigma_{22}-\sigma_{21})\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{Y_{1}>0}}\right\}}+\sigma_{21})^{q}\epsilon_{1}^{p}]\mathbb{E}[\epsilon_{2}^{q}]
\displaystyle=\sigma_{1}^{p}\mathbb{E}[((\sigma_{22}^{q}-\sigma_{21}^{q})\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}>0}}\right\}}+\sigma_{21}^{q})\epsilon_{1}^{p}]\mathbb{E}[\epsilon_{2}^{q}].

Hence,

𝔼[Y1pY2q]=σ1p((σ22qσ21q)𝔼[𝟙{ϵ1>0}ϵ1p]+σ21q𝔼[ϵ1p])𝔼[ϵ2q].\mathbb{E}[Y_{1}^{p}Y_{2}^{q}]=\sigma_{1}^{p}\bigl{(}(\sigma_{22}^{q}-\sigma_{21}^{q})\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}>0}}\right\}}\epsilon_{1}^{p}]+\sigma_{21}^{q}\mathbb{E}[\epsilon_{1}^{p}]\bigr{)}\mathbb{E}[\epsilon_{2}^{q}]. (6.20)

Then we divide our discussion into three cases: a) qq is odd, b) qq is even and pp is even, c) qq is even and pp is odd. When qq is odd, the expectation in 6.19 is equal to 0 by 6.6, and by 6.20 it can obviously be attained by the linear expectation on the right-hand side with any choice of 𝝈20[σ¯,σ¯]\bm{\sigma}\in\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}. When qq is even, the choice of σ2\sigma_{2} depends on the sign of w1pw_{1}^{p}, which further depends on the sign of w1w_{1} if pp is odd (otherwise it is always non-negative). To be specific, when both qq and pp are even,

\displaystyle\mathcal{E}[W_{1}^{p}W_{2}^{q}] =\mathbb{E}[\epsilon_{2}^{q}]\overline{\sigma}^{q}\mathcal{E}[W_{1}^{p}]=\overline{\sigma}^{p}\overline{\sigma}^{q}\mathbb{E}[\epsilon_{1}^{p}]\mathbb{E}[\epsilon_{2}^{q}],

which can be reached by choosing σ1=σ2=σ¯\sigma_{1}=\sigma_{2}=\overline{\sigma}, namely, (σ21,σ22)=(σ¯,σ¯)(\sigma_{21},\sigma_{22})=(\overline{\sigma},\overline{\sigma}). When qq is even and pp is odd, we have

\displaystyle\mathcal{E}[W_{1}^{p}W_{2}^{q}] =\mathbb{E}[\epsilon_{2}^{q}]\mathcal{E}[\overline{\sigma}^{q}(W_{1}^{p})^{+}-\underline{\sigma}^{q}(W_{1}^{p})^{-}]
\displaystyle=\mathbb{E}[\epsilon_{2}^{q}]\mathcal{E}[(\overline{\sigma}^{q}-\underline{\sigma}^{q})(W_{1}^{p})^{+}+\underline{\sigma}^{q}W_{1}^{p}]
\displaystyle=\mathbb{E}[\epsilon_{2}^{q}](\overline{\sigma}^{q}-\underline{\sigma}^{q})\mathcal{E}[(W_{1}^{p})^{+}],

where the last step uses the fact that W₁^p has certain mean zero (p is odd) and σ̄^q − σ̲^q ≥ 0. Since the map x ↦ (x^p)⁺ is convex for odd p, by 3.12, we have

\mathcal{E}[W_{1}^{p}W_{2}^{q}]=\mathbb{E}[\epsilon_{2}^{q}](\overline{\sigma}^{q}-\underline{\sigma}^{q})\mathbb{E}[((\overline{\sigma}\epsilon_{1})^{p})^{+}]=\mathbb{E}[\epsilon_{2}^{q}](\overline{\sigma}^{q}-\underline{\sigma}^{q})\overline{\sigma}^{p}\mathbb{E}[(\epsilon_{1}^{p})^{+}].

It can be reached in 6.20 by choosing σ1=σ¯\sigma_{1}=\overline{\sigma} and (σ21,σ22)=(σ¯,σ¯)(\sigma_{21},\sigma_{22})=(\underline{\sigma},\overline{\sigma}). Similar logic applies to φ(x1,x2)=cx1px2q\varphi(x_{1},x_{2})=cx_{1}^{p}x_{2}^{q} and to φ(x1,x2)=(ax1+bx2)n\varphi(x_{1},x_{2})=(ax_{1}+bx_{2})^{n}, where the scaling does not affect the form of the optimal choice of 𝝈\bm{\sigma}. ∎
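
As a numerical cross-check of this argument (a sketch; the exponents, σ-bounds, grid sizes and quadrature order are arbitrary), one can compare the iterated evaluation of ℰ[W1^p W2^q] with the maximum of the classical expectation over the piecewise-constant scenarios in L2^0[σ̲, σ̄]; the two values should coincide:

import itertools
import numpy as np

x, w = np.polynomial.hermite.hermgauss(80)
nodes, weights = np.sqrt(2.0) * x, w / np.sqrt(np.pi)     # E[f(Z)] = sum(weights * f(nodes))

sig_lo, sig_hi = 0.5, 1.5
sig_grid = np.linspace(sig_lo, sig_hi, 41)
p, q = 3, 2                                                # p odd, q even (illustrative)
Eq = np.sum(weights * nodes ** q)                          # E[eps_2^q]

# left-hand side: iterated maximization over sigma_2(w_1) and then sigma_1 in [sig_lo, sig_hi]
def phi1(x1):
    return max((x1 ** p) * (s ** q) * Eq for s in sig_grid)
lhs = max(np.sum(weights * np.array([phi1(s1 * z) for z in nodes])) for s1 in sig_grid)

# right-hand side: maximum over the piecewise-constant scenarios (sigma_1, sigma_2(Y_1)) in L_2^0
rhs = max(
    np.sum(weights * (s1 * nodes) ** p * np.where(nodes > 0, s22, s21) ** q) * Eq
    for s1, s21, s22 in itertools.product([sig_lo, sig_hi], repeat=3)
)
print(lhs, rhs)                                            # the two values should agree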

6.6 Proofs in Section 5.2

To prepare for the proof, we consider the following function space:

  • Ck()C^{k}(\mathbb{R}): the space of kk-times continuously differentiable functions on \mathbb{R}

  • Cb()C_{b}(\mathbb{R}): the space of bounded and continuous functions on \mathbb{R},

  • C()={φC2():φ is bounded and uniformly continuous}C^{*}(\mathbb{R})=\{\varphi\in C^{2}(\mathbb{R}):\varphi^{\prime\prime}\text{ is bounded and uniformly continuous}\}.

For any φC()\varphi\in C^{*}(\mathbb{R}), since φ\varphi^{\prime\prime} is bounded, we have

Msupx|φ(x)|<.M\coloneqq\sup_{x\in\mathbb{R}}\lvert\varphi^{\prime\prime}(x)\rvert<\infty.

The following 6.9 shows that we only need to check those φC()\varphi\in C^{*}(\mathbb{R}) to prove 5.9.

Lemma 6.9.

Assume [|Zn|]<\mathcal{E}[\lvert Z_{n}\rvert]<\infty and [|Z|]<\mathcal{E}[\lvert Z\rvert]<\infty. Suppose the convergence

[φ(Zn)][φ(Z)],\mathcal{E}[\varphi(Z_{n})]\to\mathcal{E}[\varphi(Z)], (6.21)

holds for any φC()\varphi\in C^{*}(\mathbb{R}). Then it also holds for any φCb()\varphi\in C_{b}(\mathbb{R}).

Proof of 6.9.

We first consider φCb()\varphi\in C_{b}(\mathbb{R}) with a compact support SS. By the uniform approximation provided by Pursell, (1967), for any a>0a>0, there exists φaC3()\varphi_{a}\in C^{3}(\mathbb{R}) with support SS such that

supx|φ(x)φa(x)|<a2.\sup_{x\in\mathbb{R}}\lvert\varphi(x)-\varphi_{a}(x)\rvert<\frac{a}{2}.

For k=1,2,3k=1,2,3, since φa(k)\varphi_{a}^{(k)} is continuous with compact support, it must be bounded, say by MkM_{k}. By the mean-value theorem, for δ>0\delta>0 and some β[0,1]\beta\in[0,1], we have

\lvert\varphi_{a}^{(2)}(x)-\varphi_{a}^{(2)}(x+\delta)\rvert\leq\lvert\varphi_{a}^{(3)}(x+\beta\delta)\rvert\delta\leq M_{3}\delta.

Thus φa(2)\varphi_{a}^{(2)} is uniformly continuous and bounded, implying φaC()\varphi_{a}\in C^{*}(\mathbb{R}). In this way, we have

|[φ(Zn)][φ(Z)]|\displaystyle\lvert\mathcal{E}[\varphi(Z_{n})]-\mathcal{E}[\varphi(Z)]\rvert |[φ(Zn)][φa(Zn)]|+|[φ(Z)][φa(Z)]|\displaystyle\leq\lvert\mathcal{E}[\varphi(Z_{n})]-\mathcal{E}[\varphi_{a}(Z_{n})]\rvert+\lvert\mathcal{E}[\varphi(Z)]-\mathcal{E}[\varphi_{a}(Z)]\rvert
+|[φa(Zn)][φa(Z)]|\displaystyle+\lvert\mathcal{E}[\varphi_{a}(Z_{n})]-\mathcal{E}[\varphi_{a}(Z)]\rvert
a+|[φa(Zn)][φa(Z)]|.\displaystyle\leq a+\lvert\mathcal{E}[\varphi_{a}(Z_{n})]-\mathcal{E}[\varphi_{a}(Z)]\rvert.

Hence, lim supn|[φ(Zn)][φ(Z)]|a\limsup_{n\to\infty}\lvert\mathcal{E}[\varphi(Z_{n})]-\mathcal{E}[\varphi(Z)]\rvert\leq a. It means that

0lim infn|[φ(Zn)][φ(Z)]|lim supn|[φ(Zn)][φ(Z)]|a.0\leq\liminf_{n\to\infty}\lvert\mathcal{E}[\varphi(Z_{n})]-\mathcal{E}[\varphi(Z)]\rvert\leq\limsup_{n\to\infty}\lvert\mathcal{E}[\varphi(Z_{n})]-\mathcal{E}[\varphi(Z)]\rvert\leq a.

Since aa can be arbitrarily small, the convergence 6.21 holds.

Next consider any φCb()\varphi\in C_{b}(\mathbb{R}) which is bounded by BB. For any K>0K>0, it can be decomposed into φ=φ1+φ2\varphi=\varphi_{1}+\varphi_{2} where φ1\varphi_{1} has compact support [K,K][-K,K] and φ2\varphi_{2} satisfies φ2(x)=0\varphi_{2}(x)=0 if |x|K\lvert x\rvert\leq K and for |x|>K\lvert x\rvert>K,

|φ2(x)|BB|x|K.\lvert\varphi_{2}(x)\rvert\leq B\leq\frac{B\lvert x\rvert}{K}.

Then we have

|[φ(Zn)][φ(Z)]||[φ1(Zn)][φ1(Z)]|+|[φ2(Zn)][φ2(Z)]|,\lvert\mathcal{E}[\varphi(Z_{n})]-\mathcal{E}[\varphi(Z)]\rvert\leq\lvert\mathcal{E}[\varphi_{1}(Z_{n})]-\mathcal{E}[\varphi_{1}(Z)]\rvert+\lvert\mathcal{E}[\varphi_{2}(Z_{n})]-\mathcal{E}[\varphi_{2}(Z)]\rvert,

where the first term must converge by our previous argument. Then we only need to work on the second term that satisfies:

|[φ2(Zn)][φ2(Z)]|BK([|Zn|]+[|Z|]).\lvert\mathcal{E}[\varphi_{2}(Z_{n})]-\mathcal{E}[\varphi_{2}(Z)]\rvert\leq\frac{B}{K}(\mathcal{E}[\lvert Z_{n}\rvert]+\mathcal{E}[\lvert Z\rvert]).

Note that L[|Zn|]+[|Z|]<L\coloneqq\mathcal{E}[\lvert Z_{n}\rvert]+\mathcal{E}[\lvert Z\rvert]<\infty. Then we have lim supn|[φ(Zn)][φ(Z)]|BLK\limsup_{n\to\infty}\lvert\mathcal{E}[\varphi(Z_{n})]-\mathcal{E}[\varphi(Z)]\rvert\leq\frac{BL}{K}. Since KK can be arbitrarily large, we obtain the convergence 6.21. ∎

Lemma 6.10.

For any φC()\varphi\in C^{*}(\mathbb{R}), the function δ:++\delta:\mathbb{R}_{+}\to\mathbb{R}_{+}, defined as

δ(a)sup|xy|a|φ(x)φ(y)|,\delta(a)\coloneqq\sup_{\lvert x-y\rvert\leq a}\lvert\varphi^{\prime\prime}(x)-\varphi^{\prime\prime}(y)\rvert,

must be a bounded and increasing one. It also satisfies lima0δ(a)=0\lim_{a\downarrow 0}\delta(a)=0.

Proof of 6.10.

The boundedness (and the limit property) can be directly derived from the boundedness (and uniform continuity) of φ\varphi^{\prime\prime}. For the monotonicity, for any 0<ab0<a\leq b, since {(x,y):|xy|a}{(x,y):|xy|b}\{(x,y):\lvert x-y\rvert\leq a\}\subset\{(x,y):\lvert x-y\rvert\leq b\}, we must have δ(a)δ(b)\delta(a)\leq\delta(b). ∎

Proof of 5.9.

We adapt the idea of the Lindeberg method in a “leave-one-out” manner to the sublinear context. One of the reasons that we are able to do such an adaptation is the symmetry in semi-GG-independence: XiX_{i} is semi-GG-independent from {Xj,ji}\{X_{j},j\neq i\}.

Note that Xi=ViηiX_{i}=V_{i}\eta_{i} with the semi-GG-independence; then we have

(V1,,Vn)(η1,,ηn).(V_{1},\dots,V_{n})\dashrightarrow(\eta_{1},\dotsc,\eta_{n}).

Then we consider a sequence of classically i.i.d. {ϵi}i=1n\{\epsilon_{i}\}_{i=1}^{n} satisfying ϵ1N(0,1)\epsilon_{1}\sim N(0,1) and

(V1,,Vn)(η1,,ηn)(ϵ1,,ϵn).(V_{1},\dots,V_{n})\dashrightarrow(\eta_{1},\dotsc,\eta_{n})\dashrightarrow(\epsilon_{1},\dotsc,\epsilon_{n}).

For each nn, consider the triangular array

ei,n=Xin,e_{i,n}=\frac{X_{i}}{\sqrt{n}},

and

Sne1,n++en,n.S_{n}\coloneqq e_{1,n}+\cdots+e_{n,n}.

For this nn, consider another triangular array {Wi,n}i=1n{(Viϵi)/n}i=1n\{W_{i,n}\}_{i=1}^{n}\coloneqq\{(V_{i}\epsilon_{i})/\sqrt{n}\}_{i=1}^{n} whose entries are semi-GG-version i.i.d., follow the semi-GG-normal distribution, and satisfy

Wi,n=dW1,n=dWn.W_{i,n}\overset{\text{d}}{=}W_{1,n}\overset{\text{d}}{=}\frac{W}{\sqrt{n}}.

Note that here we use the same ViV_{i} sequence in Wi,nW_{i,n}. This setup is important for our proof to overcome the difficulty brought by the sublinear property of \mathcal{E} (it also gives some insight into the role of σ2\sigma^{2} in the classical central limit theorem compared with V2V^{2} in the sublinear context). Let

WnW1,n++Wn,n,W_{n}\coloneqq W_{1,n}+\cdots+W_{n,n},

then we must have Wn𝒩^(0,[σ¯2,σ¯2])W_{n}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) (by the stability of the semi-GG-normal as shown in 3.23).

Our goal is to show the difference, for any φC()\varphi\in C^{*}(\mathbb{R}) (recall 6.9), as nn\to\infty,

|[φ(Sn)][φ(W)]|=|[φ(Sn)][φ(Wn)]|0.\lvert\mathcal{E}[\varphi(S_{n})]-\mathcal{E}[\varphi(W)]\rvert=\lvert\mathcal{E}[\varphi(S_{n})]-\mathcal{E}[\varphi(W_{n})]\rvert\to 0. (6.22)

Consider the following summations:

Mi,n=j=1iej,n+j=i+1nWj,n,M_{i,n}=\sum_{j=1}^{i}e_{j,n}+\sum_{j=i+1}^{n}W_{j,n}, (6.23)

and

Ui,n=j=1i1ej,n+j=i+1nWj,n,U_{i,n}=\sum_{j=1}^{i-1}e_{j,n}+\sum_{j=i+1}^{n}W_{j,n}, (6.24)

with the common convention that an empty sum is defined as zero. Note that M0,n=WnM_{0,n}=W_{n} and Mn,n=SnM_{n,n}=S_{n}, then we can transform the difference in 6.22 to the telescoping sum

[φ(Sn)][φ(Wn)]\displaystyle\mathcal{E}[\varphi(S_{n})]-\mathcal{E}[\varphi(W_{n})] [φ(Sn)φ(Wn)]\displaystyle\leq\mathcal{E}[\varphi(S_{n})-\varphi(W_{n})]
\displaystyle=\mathcal{E}\Bigl[\sum_{i=1}^{n}(\varphi(M_{i,n})-\varphi(M_{i-1,n}))\Bigr]
i=1n[φ(Mi,n)φ(Mi1,n)].\displaystyle\leq\sum_{i=1}^{n}\mathcal{E}[\varphi(M_{i,n})-\varphi(M_{i-1,n})]. (6.25)

and

[φ(Wn)][φ(Sn)]j=1n[φ(Mnj,n)φ(Mnj+1,n)].\mathcal{E}[\varphi(W_{n})]-\mathcal{E}[\varphi(S_{n})]\leq\sum_{j=1}^{n}\mathcal{E}[\varphi(M_{n-j,n})-\varphi(M_{n-j+1,n})]. (6.26)

Then we only need to work on the summand [φ(Mi,n)φ(Mi1,n)]\mathcal{E}[\varphi(M_{i,n})-\varphi(M_{i-1,n})]. By a Taylor expansion,

φ(Mi,n)φ(Mi1,n)\displaystyle\varphi(M_{i,n})-\varphi(M_{i-1,n}) =φ(Ui,n+ei,n)φ(Ui,n+Wi,n)\displaystyle=\varphi(U_{i,n}+e_{i,n})-\varphi(U_{i,n}+W_{i,n})
=(ei,nWi,n)φ(Ui,n)\displaystyle=(e_{i,n}-W_{i,n})\varphi^{\prime}(U_{i,n})
+[12ei,n2φ(Ui,n+αei,n)12Wi,n2φ(Ui,n+βWi,n)],\displaystyle+[\frac{1}{2}e_{i,n}^{2}\varphi^{\prime\prime}(U_{i,n}+\alpha e_{i,n})-\frac{1}{2}W_{i,n}^{2}\varphi^{\prime\prime}(U_{i,n}+\beta W_{i,n})],
(a)+(b)\displaystyle\eqqcolon(a)+(b)

for some α,β[0,1]\alpha,\beta\in[0,1].

For the first term (a)(a), its sublinear expectation must exist because the growth of φ\varphi^{\prime} is at most linear due to the boundedness of φ\varphi^{\prime\prime}. Note that Ui,nU_{i,n} is the inner product of

Vu=(V1,,Vi1,Vi+1,,Vn),V_{u}=(V_{1},\dotsc,V_{i-1},V_{i+1},\dotsc,V_{n}),

and

ξu=(η1,,ηi1,ϵi+1,,ϵn),\xi_{u}=(\eta_{1},\dotsc,\eta_{i-1},\epsilon_{i+1},\dotsc,\epsilon_{n}),

with the independence VuξuV_{u}\dashrightarrow\xi_{u}, so we have ei,nWi,n(=n1/2Vi(ηiϵi))e_{i,n}-W_{i,n}(=n^{-1/2}V_{i}(\eta_{i}-\epsilon_{i})) and Ui,nU_{i,n} are semi-GG-independent. Then we can compute

[(ei,nWi,n)φ(Ui,n)]\displaystyle\mathcal{E}[(e_{i,n}-W_{i,n})\varphi^{\prime}(U_{i,n})] =max(vi,vu)𝔼[n1/2vi(ηiϵi)φ(vuTξu)]\displaystyle=\max_{(v_{i},v_{u})}\mathbb{E}[n^{-1/2}v_{i}(\eta_{i}-\epsilon_{i})\varphi^{\prime}(v_{u}^{T}\xi_{u})]
(classical indep.)\displaystyle(\text{classical indep.}) =max(vi,vu)n1/2vi𝔼[ηiϵi]=0𝔼[φ(vuTξu)]=0.\displaystyle=\max_{(v_{i},v_{u})}n^{-1/2}v_{i}\underbrace{\mathbb{E}[\eta_{i}-\epsilon_{i}]}_{=0}\mathbb{E}[\varphi^{\prime}(v_{u}^{T}\xi_{u})]=0.

Similarly, we have [(ei,nWi,n)φ(Ui,n)]=0-\mathcal{E}[-(e_{i,n}-W_{i,n})\varphi^{\prime}(U_{i,n})]=0. Hence, (a)(a) has certain mean zero. Then we have

[φ(Mi,n)φ(Mi1,n)]=[(b)].\mathcal{E}[\varphi(M_{i,n})-\varphi(M_{i-1,n})]=\mathcal{E}[(b)].

For the second term (b)(b), note that

2×(b)\displaystyle 2\times(b)
=\displaystyle= ei,n2[φ(Ui,n+αei,n)φ(Ui,n)]Wi,n2[φ(Ui,n+βWi,n)φ(Ui,n)]+(ei,n2Wi,n2)φ(Ui,n)\displaystyle e_{i,n}^{2}[\varphi^{\prime\prime}(U_{i,n}+\alpha e_{i,n})-\varphi^{\prime\prime}(U_{i,n})]-W_{i,n}^{2}[\varphi^{\prime\prime}(U_{i,n}+\beta W_{i,n})-\varphi^{\prime\prime}(U_{i,n})]+(e_{i,n}^{2}-W_{i,n}^{2})\varphi^{\prime\prime}(U_{i,n})
\displaystyle\eqqcolon (b)1(b)2+(b)3.\displaystyle(b)_{1}-(b)_{2}+(b)_{3}.

For (b)1(b)_{1}, since |αei,n||ei,n|\lvert\alpha e_{i,n}\rvert\leq\lvert e_{i,n}\rvert, by recalling the property of δ()\delta(\cdot) (6.10), we have

[|(b)1|][ei,n2δ(|ei,n|)]=1n[X12δ(n1/2|X1|)],\mathcal{E}[\lvert(b)_{1}\rvert]\leq\mathcal{E}[e_{i,n}^{2}\delta(\lvert e_{i,n}\rvert)]=\frac{1}{n}\mathcal{E}[X_{1}^{2}\delta(n^{-1/2}\lvert X_{1}\rvert)],

where we use the setup ei,n=Xine_{i,n}=\frac{X_{i}}{\sqrt{n}} and Xi=dX1X_{i}\overset{\text{d}}{=}X_{1}. Similarly, we have

[|(b)2|][Wi,n2δ(|Wi,n|)]=1n[W2δ(n1/2|W|)],\mathcal{E}[\lvert(b)_{2}\rvert]\leq\mathcal{E}[W_{i,n}^{2}\delta(\lvert W_{i,n}\rvert)]=\frac{1}{n}\mathcal{E}[W^{2}\delta(n^{-1/2}\lvert W\rvert)],

where we use the setup Wi,n=dWnW_{i,n}\overset{\text{d}}{=}\frac{W}{\sqrt{n}}. For (b)3(b)_{3}, since (ei,n,Wi,n)(e_{i,n},W_{i,n}) and Ui,nU_{i,n} are semi-GG-independent, (noting that ei,ne_{i,n} and Wi,nW_{i,n} depend on the same ViV_{i},) we have

[(b)3]\displaystyle\mathcal{E}[(b)_{3}] =max(vi,vu)𝔼[n1vi2(ηi2ϵi2)φ(vuTξu)]\displaystyle=\max_{(v_{i},v_{u})}\mathbb{E}[n^{-1}v_{i}^{2}(\eta_{i}^{2}-\epsilon_{i}^{2})\varphi^{\prime\prime}(v_{u}^{T}\xi_{u})]
(classical indep.)\displaystyle(\text{classical indep.}) =max(vi,vu)n1vi2𝔼[ηi2ϵi2]=0𝔼[φ(vuTξu)]=0,\displaystyle=\max_{(v_{i},v_{u})}n^{-1}v_{i}^{2}\underbrace{\mathbb{E}[\eta_{i}^{2}-\epsilon_{i}^{2}]}_{=0}\mathbb{E}[\varphi^{\prime\prime}(v_{u}^{T}\xi_{u})]=0,

where we use the fact that 𝔼[ηi2]=𝔼[ϵi2]=1\mathbb{E}[\eta_{i}^{2}]=\mathbb{E}[\epsilon_{i}^{2}]=1. Similarly we have [(b)3]=0-\mathcal{E}[-(b)_{3}]=0 so (b)3(b)_{3} has certain mean zero. Therefore, we have

[φ(Mi,n)φ(Mi1,n)]\displaystyle\mathcal{E}[\varphi(M_{i,n})-\varphi(M_{i-1,n})] =12[(b)1(b)2]\displaystyle=\frac{1}{2}\mathcal{E}[(b)_{1}-(b)_{2}]
\displaystyle\leq\frac{1}{2}(\mathcal{E}[\lvert(b)_{1}\rvert]+\mathcal{E}[\lvert(b)_{2}\rvert])
=12n([X12δ(n1/2|X1|)]+[W2δ(n1/2|W|)]).\displaystyle=\frac{1}{2n}(\mathcal{E}[X_{1}^{2}\delta(n^{-1/2}\lvert X_{1}\rvert)]+\mathcal{E}[W^{2}\delta(n^{-1/2}\lvert W\rvert)]).

Meanwhile, if we reverse the role of φ(Mi,n)\varphi(M_{i,n}) and φ(Mi1,n)\varphi(M_{i-1,n}) and let i=nj+1i=n-j+1 with j=1,2,,nj=1,2,\dotsc,n, we get

[φ(Mnj,n)φ(Mnj+1,n)]\displaystyle\mathcal{E}[\varphi(M_{n-j,n})-\varphi(M_{n-j+1,n})] =[φ(Mi1,n)φ(Mi,n)]\displaystyle=\mathcal{E}[\varphi(M_{i-1,n})-\varphi(M_{i,n})]
=12[(b)2(b)1]\displaystyle=\frac{1}{2}\mathcal{E}[(b)_{2}-(b)_{1}]
\displaystyle\leq\frac{1}{2}(\mathcal{E}[\lvert(b)_{2}\rvert]+\mathcal{E}[\lvert(b)_{1}\rvert])
=12n([X12δ(n1/2|X1|)]+[W2δ(n1/2|W|)]).\displaystyle=\frac{1}{2n}(\mathcal{E}[X_{1}^{2}\delta(n^{-1/2}\lvert X_{1}\rvert)]+\mathcal{E}[W^{2}\delta(n^{-1/2}\lvert W\rvert)]).

Hence, by 6.25 and 6.26, we have

|[φ(Sn)][φ(W)]|\displaystyle\lvert\mathcal{E}[\varphi(S_{n})]-\mathcal{E}[\varphi(W)]\rvert =|[φ(Sn)][φ(Wn)]|\displaystyle=\lvert\mathcal{E}[\varphi(S_{n})]-\mathcal{E}[\varphi(W_{n})]\rvert
=max{[φ(Sn)][φ(Wn)],[φ(Wn)][φ(Sn)]}\displaystyle=\max\{\mathcal{E}[\varphi(S_{n})]-\mathcal{E}[\varphi(W_{n})],\mathcal{E}[\varphi(W_{n})]-\mathcal{E}[\varphi(S_{n})]\}
\displaystyle\leq\max\Bigl\{\sum_{i=1}^{n}\mathcal{E}[\varphi(M_{i,n})-\varphi(M_{i-1,n})],\sum_{j=1}^{n}\mathcal{E}[\varphi(M_{n-j,n})-\varphi(M_{n-j+1,n})]\Bigr\}
i=1n12n([X12δ(n1/2|X1|)]+[W2δ(n1/2|W|)])\displaystyle\leq\sum_{i=1}^{n}\frac{1}{2n}(\mathcal{E}[X_{1}^{2}\delta(n^{-1/2}\lvert X_{1}\rvert)]+\mathcal{E}[W^{2}\delta(n^{-1/2}\lvert W\rvert)])
=12([X12δ(n1/2|X1|)]+[W2δ(n1/2|W|)]).\displaystyle=\frac{1}{2}(\mathcal{E}[X_{1}^{2}\delta(n^{-1/2}\lvert X_{1}\rvert)]+\mathcal{E}[W^{2}\delta(n^{-1/2}\lvert W\rvert)]).

Note that, for any v1[σ¯,σ¯]v_{1}\in{[\underline{\sigma},\overline{\sigma}]}, we have |v1η1|σ¯|η1|\lvert v_{1}\eta_{1}\rvert\leq\overline{\sigma}\lvert\eta_{1}\rvert. Then

[X12δ(n1/2|X1|)]\displaystyle\mathcal{E}[X_{1}^{2}\delta(n^{-1/2}\lvert X_{1}\rvert)] =maxv1[σ¯,σ¯]𝔼[v12η12δ(n1/2|v1η1|)]\displaystyle=\max_{v_{1}\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[v_{1}^{2}\eta_{1}^{2}\delta(n^{-1/2}\lvert v_{1}\eta_{1}\rvert)]
(monotonicity of δ)\displaystyle(\text{monotonicity of }\delta) maxv1[σ¯,σ¯]v12𝔼[η12δ(n1/2σ¯|η1|)]\displaystyle\leq\max_{v_{1}\in{[\underline{\sigma},\overline{\sigma}]}}v_{1}^{2}\mathbb{E}[\eta_{1}^{2}\delta(n^{-1/2}\overline{\sigma}\lvert\eta_{1}\rvert)]
=σ¯2𝔼[η12δ(n1/2σ¯|η1|)].\displaystyle=\overline{\sigma}^{2}\mathbb{E}[\eta_{1}^{2}\delta(n^{-1/2}\overline{\sigma}\lvert\eta_{1}\rvert)].

By 6.10, we have δ(a)2M\delta(a)\leq 2M for all a+a\in\mathbb{R}^{+}, so η12δ(n1/2σ¯|η1|)2Mη12\eta_{1}^{2}\delta(n^{-1/2}\overline{\sigma}\lvert\eta_{1}\rvert)\leq 2M\eta_{1}^{2}. Meanwhile, η12δ(n1/2σ¯|η1|)0\eta_{1}^{2}\delta(n^{-1/2}\overline{\sigma}\lvert\eta_{1}\rvert)\to 0 (classically) almost surely as nn\to\infty, so by the classical dominated convergence theorem, we have 𝔼[η12δ(n1/2σ¯|η1|)]0\mathbb{E}[\eta_{1}^{2}\delta(n^{-1/2}\overline{\sigma}\lvert\eta_{1}\rvert)]\to 0, implying [X12δ(n1/2|X1|)]0\mathcal{E}[X_{1}^{2}\delta(n^{-1/2}\lvert X_{1}\rvert)]\to 0. Similarly, we can show [W2δ(n1/2|W|)]0\mathcal{E}[W^{2}\delta(n^{-1/2}\lvert W\rvert)]\to 0. Finally, we have

|[φ(Sn)][φ(W)]|0,\lvert\mathcal{E}[\varphi(S_{n})]-\mathcal{E}[\varphi(W)]\rvert\to 0,

or

limn[φ(1ni=1nXi)]=[φ(W)].\lim_{n\to\infty}\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i})]=\mathcal{E}[\varphi(W)].\qed
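
To close this section, the statement of 5.9 can be illustrated by a rough Monte Carlo sketch (not a proof; the adaptive volatility strategies below are ad hoc admissible choices within [σ̲, σ̄], and the convex test function is chosen so that the limiting sublinear expectation reduces to the classical one with σ̄):

import numpy as np

rng = np.random.default_rng(2)
sig_lo, sig_hi = 0.5, 1.5
n, reps = 100, 20000
phi = lambda x: x ** 2                          # convex, so the limit is E[phi(sig_hi * Z)]

def normalized_sum(strategy):
    # S_n / sqrt(n) with X_i = V_i * eta_i, where V_i is chosen adaptively from the past sum
    eta = rng.standard_normal((reps, n))
    S = np.zeros(reps)
    for i in range(n):
        V = strategy(S)                         # adapted choice within [sig_lo, sig_hi]
        S = S + V * eta[:, i]
    return S / np.sqrt(n)

strategies = {
    "constant sig_hi": lambda S: sig_hi,
    "constant sig_lo": lambda S: sig_lo,
    "sign-switching":  lambda S: np.where(S >= 0, sig_hi, sig_lo),
}
for name, strat in strategies.items():
    print(name, np.mean(phi(normalized_sum(strat))))
print("E[phi(sig_hi * Z)]:", sig_hi ** 2)

Under this convex φ, every admissible strategy should give a value below σ̄², while the constant-σ̄ strategy should approach it, which is consistent with the limit above.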

References

  • Artzner et al., (1999) Artzner, P., Delbaen, F., Eber, J.-M., and Heath, D. (1999). Coherent measures of risk. Mathematical finance, 9(3):203–228.
  • Bayraktar and Munk, (2015) Bayraktar, E. and Munk, A. (2015). Comparing the GG-normal distribution to its classical counterpart. Communications on Stochastic Analysis, 9(1):1–18.
  • Breiman, (1992) Breiman, L. (1992). Probability. Society for Industrial and Applied Mathematics, USA.
  • Chatfield, (1995) Chatfield, C. (1995). Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society: Series A (Statistics in Society), 158(3):419–444.
  • Chen and Epstein, (2002) Chen, Z. and Epstein, L. (2002). Ambiguity, risk, and asset returns in continuous time. Econometrica, 70(4):1403–1443.
  • Choquet, (1954) Choquet, G. (1954). Theory of capacities. In Annales de l’Institut Fourier, volume 5, pages 131–295.
  • Crandall et al., (1992) Crandall, M. G., Ishii, H., and Lions, P.-L. (1992). User’s guide to viscosity solutions of second order partial differential equations. Bulletin of the American Mathematical Society, 27(1):1–67.
  • Deng et al., (2019) Deng, S., Fei, C., Fei, W., and Mao, X. (2019). Stability equivalence between the stochastic differential delay equations driven by GG-Brownian motion and the Euler–Maruyama method. Applied Mathematics Letters, 96:138–146.
  • Denis et al., (2011) Denis, L., Hu, M., and Peng, S. (2011). Function spaces and capacity related to a sublinear expectation: application to GG-Brownian motion paths. Potential Analysis, 34(2):139–161.
  • Der Kiureghian and Ditlevsen, (2009) Der Kiureghian, A. and Ditlevsen, O. (2009). Aleatory or epistemic? Does it matter? Structural safety, 31(2):105–112.
  • Dolinsky et al., (2012) Dolinsky, Y., Nutz, M., and Soner, H. M. (2012). Weak approximation of GG-expectations. Stochastic Processes and their Applications, 122(2):664–675.
  • Ellsberg, (1961) Ellsberg, D. (1961). Risk, ambiguity, and the Savage axioms. The Quarterly Journal of Economics, 75(4):643–669.
  • Epstein and Ji, (2013) Epstein, L. G. and Ji, S. (2013). Ambiguous volatility and asset pricing in continuous time. The Review of Financial Studies, 26(7):1740–1786.
  • Fang et al., (2019) Fang, X., Peng, S., Shao, Q., and Song, Y. (2019). Limit theorems with rate of convergence under sublinear expectations. Bernoulli, 25(4A):2564–2596.
  • Fei and Fei, (2019) Fei, C. and Fei, W. (2019). Consistency of least squares estimation to the parameter for stochastic differential equations under distribution uncertainty. arXiv preprint arXiv:1904.12701.
  • Föllmer and Schied, (2011) Föllmer, H. and Schied, A. (2011). Stochastic finance: an introduction in discrete time. Walter de Gruyter.
  • Hu, (2012) Hu, M. (2012). Explicit solutions of GG-heat equation with a class of initial conditions by GG-Brownian motion. Nonlinear Analysis, 75(18):6588–6595.
  • Hu and Li, (2014) Hu, M. and Li, X. (2014). Independence under the GG-expectation framework. Journal of Theoretical Probability, 27(3):1011–1020.
  • Hu et al., (2017) Hu, M., Peng, S., Song, Y., et al. (2017). Stein type characterization for GG-normal distributions. Electronic Communications in Probability, 22.
  • Huang and Liang, (2019) Huang, S. and Liang, G. (2019). A monotone scheme for GG-equations with application to the explicit convergence rate of robust central limit theorem. arXiv preprint arXiv:1904.07184.
  • Huber, (2004) Huber, P. J. (2004). Robust statistics, volume 523. John Wiley & Sons.
  • Jin and Peng, (2016) Jin, H. and Peng, S. (2016). Optimal unbiased estimation for maximal distribution. arXiv preprint arXiv:1611.07994.
  • Jin and Peng, (2021) Jin, H. and Peng, S. (2021). Optimal unbiased estimation for maximal distribution. Probability, Uncertainty and Quantitative Risk, 6(3):189–198.
  • Knight, (1921) Knight, F. H. (1921). Risk, uncertainty and profit, First edition. Boston, New York, Houghton Mifflin Company.
  • Krylov, (2020) Krylov, N. V. (2020). On Shige Peng's central limit theorem. Stochastic Processes and their Applications, 130(3):1426–1434.
  • Li, (2018) Li, Y. (2018). Statistical exploration in the GG-expectation framework: the pseudo simulation and estimation of variance uncertainty. Master’s thesis, The University of Western Ontario, London, ON, Canada.
  • Li and Kulperger, (2018) Li, Y. and Kulperger, R. (2018). An iterative approximation of the sublinear expectation of an arbitrary function of GG-normal distribution and the solution to the corresponding GG-heat equation. arXiv preprint arXiv:1804.10737.
  • Pei et al., (2021) Pei, Z., Wang, X., Xu, Y., and Yue, X. (2021). A worst-case risk measure by GG-VaR. Acta Mathematicae Applicatae Sinica, English Series, 37(2):421–440.
  • Peng, (2004) Peng, S. (2004). Filtration consistent nonlinear expectations and evaluations of contingent claims. Acta Mathematicae Applicatae Sinica, English Series, 20(2):191–214.
  • Peng, (2007) Peng, S. (2007). GG-expectation, GG-Brownian motion and related stochastic calculus of Itô type. In Stochastic analysis and applications, pages 541–567. Springer.
  • Peng, (2008) Peng, S. (2008). Multi-dimensional GG-Brownian motion and related stochastic calculus under GG-expectation. Stochastic Processes and their Applications, 118(12):2223–2253.
  • Peng, (2017) Peng, S. (2017). Theory, methods and meaning of nonlinear expectation theory. SCIENTIA SINICA Mathematica, 47(10):1223–1254.
  • Peng, (2019a) Peng, S. (2019a). Law of large numbers and central limit theorem under nonlinear expectations. Probability, Uncertainty and Quantitative Risk, 4(1):4.
  • Peng, (2019b) Peng, S. (2019b). Nonlinear Expectations and Stochastic Calculus under Uncertainty: with Robust CLT and GG-Brownian Motion, volume 95. Springer-Verlag Berlin Heidelberg.
  • Peng and Yang, (2020) Peng, S. and Yang, S. (2020). Autoregressive models of the time series under volatility uncertainty and application to VaR model. arXiv preprint arXiv:2011.09226.
  • Peng et al., (2020) Peng, S., Yang, S., and Yao, J. (2020). Improving value-at-risk prediction under model uncertainty. Journal of Financial Econometrics.
  • Peng and Zhou, (2020) Peng, S. and Zhou, Q. (2020). A hypothesis-testing perspective on the GG-normal distribution theory. Statistics & Probability Letters, 156:108623.
  • Pursell, (1967) Pursell, L. E. (1967). Uniform approximation of real continuous functions on the real line by infinitely differentiable functions. Mathematics Magazine, 40(5):263–265.
  • Rokhlin, (2015) Rokhlin, D. B. (2015). Central limit theorem under uncertain linear transformations. Statistics & Probability Letters, 107:191–198.
  • Schmeidler, (1989) Schmeidler, D. (1989). Subjective probability and expected utility without additivity. Econometrica, 57(3):571–587.
  • Song, (2020) Song, Y. (2020). Normal approximation by Stein's method under sublinear expectations. Stochastic Processes and their Applications, 130(5):2838–2850.
  • Xu and Xuan, (2019) Xu, Q. and Xuan, X. M. (2019). Nonlinear regression without iid assumption. Probability, Uncertainty and Quantitative Risk, 4(1):1–15.
  • Zhang and Chen, (2014) Zhang, D. and Chen, Z. (2014). A weighted central limit theorem under sublinear expectations. Communications in Statistics-Theory and Methods, 43(3):566–577.
  • Zhang, (2016) Zhang, L. (2016). Rosenthal’s inequalities for independent and negatively dependent random variables under sub-linear expectations with applications. Science China Mathematics, 59(4):751–768.