
Semi-G-normal: a Hybrid between Normal and G-normal (Full Version)

Yifan Li School of Mathematics and Statistics, University of Western Ontario, London, Canada E-mail: [email protected] Reg Kulperger School of Mathematics and Statistics, University of Western Ontario, London, Canada Hao Yu School of Mathematics and Statistics, University of Western Ontario, London, Canada
(Detailed research work for conference and open discussions)
Abstract

The G-expectation framework is a generalization of the classical probabilistic system motivated by Knightian uncertainty, in which the G-normal distribution plays a central role. However, from a statistical perspective, G-normal distributions look quite different from classical normal ones. For instance, their uncertainty is characterized by a set of distributions which covers not only classical normals with different variances, but also distributions that typically have non-zero skewness. The G-moments of G-normals are defined through a class of fully nonlinear PDEs called G-heat equations. To understand the G-normal in a probabilistic and stochastic way that is more friendly to statisticians and practitioners, we introduce a substructure called the semi-G-normal, which behaves like a hybrid between normal and G-normal: it has variance uncertainty but zero skewness. We will show that the non-zero skewness arises when we impose the G-version sequential independence on the semi-G-normal. More importantly, we provide a series of representations of random vectors with semi-G-normal marginals under various types of independence. Each of these representations under a typical order of independence is closely related to a class of state-space volatility models with a common graphical structure. In short, the semi-G-normal gives a (conceptual) transition from the classical normal to the G-normal, giving us a better understanding of the distributional uncertainty of the G-normal and of the sequential independence.

1 Introduction

The G-expectation framework is a new generalization of the classical probabilistic system, aimed at dealing with random phenomena in dynamic situations where it is hard to precisely determine a unique probabilistic model. These situations are closely related to the long-standing concern about model uncertainty in statistical practice; for instance, Chatfield (1995) gives an overview of this concern. However, how to better connect the ideas of this framework with general data practice is still a developing and challenging area, one that requires researchers and practitioners from different backgrounds to collaborate and to reflect on the different degrees of uncertainty brought not only by the complicated nature of the data but also by the modeling procedure itself. To give some examples (rather than a complete list), several recent attempts have been made by Pei et al. (2021); Peng et al. (2020); Peng and Zhou (2020); Xu and Xuan (2019); Li (2018) and Jin and Peng (2016) (which has been published as Jin and Peng (2021)).

A fundamental and unavoidable problem is how to better understand the G-version distributions and independence from a statistical perspective, which also requires long-term efforts of learning, thinking and exploration. This research work can be treated as a detailed, systematic report of our exploration of this basic point over the past three years, addressed to a broad community. This community includes not only experts in the area of nonlinear expectations (such as the G-expectation) but also researchers and practitioners from other related fields who may not be familiar with the theory of the G-expectation framework (G-framework) but are interested in the interplay between their areas and the G-framework, which requires them to properly understand the meanings and potential of G-version distributions and independence. One vision of this report is to explore and understand the role of statistical methods incorporating G-version distributions or processes (with their own notion of independence) in general data practice, as well as their differences from and connections with existing classical methods. More importantly, we intend to show how introducing the notions of the G-framework (such as its distributions and independence) broadens the range of questions we are able to consider (this goal has been partially indicated in Section 5.5). This report is also intended to initiate an in-depth discussion on this subject with the broad community.

Considering the length and scope of this work, we divide our core discussions into two stages. The first stage (this paper) can be treated as a theoretical preparation for the second stage (a companion of this paper), which provides a series of statistical data experiments based on the theoretical results established here.

The main objective of this paper is to provide a better interpretation and understanding of the G-normal distribution and the G-version independence, designed for researchers and practitioners from various backgrounds who are familiar with classical probability and statistics. We will achieve this goal by introducing a new substructure called the semi-G-normal distribution, which behaves like a hybrid connecting the normal and the G-normal: it is a typical object with distributional uncertainty that preserves many properties of the classical normal but is also closely related to the G-normal distribution.

In any probabilistic framework, if there exists a “normal” distribution (or an equivalent distributional object), it should play a fundamental role in the system, and how to understand and deal with it is crucial for the development of that framework. The G-normal distribution, as the analogue of the classical normal, plays such a central and fundamental role in the development of the G-expectation framework.

1.1 Introduction to the G-expectation framework

First we give general readers a short introduction to the G-expectation framework. The classical probabilistic system is good at describing randomness under a single probability rule or model \mathbb{P}_{\theta} (which could be sophisticated in its form). However, in practice, there are phenomena where it is hard to precisely determine a unique \mathbb{P}_{\theta} to describe the randomness. In this case, we cannot ignore the uncertainty in the probability rule itself. This kind of uncertainty is often called Knightian uncertainty in economics (Knight (1921)) or epistemic uncertainty in statistics (Der Kiureghian and Ditlevsen (2009)). It is also commonly called model uncertainty if it refers to the uncertainty in the probabilistic model. A standard example of Knightian uncertainty comes from the Ellsberg paradox proposed by Ellsberg (1961), showing the violation of the classical expected utility theory based on a linear expectation. In this case, we essentially need to work with a set \mathcal{P}=\{\mathbb{P}_{\theta},\theta\in\Theta\} of probability measures. In order to quantify the extreme cases under \mathcal{P}, we need to work with a sublinear expectation \mathcal{E} defined as:

\mathcal{E}[\cdot]\coloneqq\sup_{\mathbb{P}\in\mathcal{P}}\mathbb{E}_{\mathbb{P}}[\cdot]. (1.1)

The sublinear expectation defined in (1.1) first appeared as the upper prevision in Huber (2004). We also call (1.1) a representation of \mathcal{E}. Coherent risk measures proposed by Artzner et al. (1999) can also be represented in this form, and more details can be found in Föllmer and Schied (2011). The notion of the Choquet expectation (Choquet (1954)) is another special type of sublinear expectation, which is the foundation of a new theory of expected utility by Schmeidler (1989) resolving the Ellsberg paradox in the static situation. For the dynamic situation, the utility theory can be developed using the sublinear version of the g-expectation proposed by Chen and Epstein (2002). In principle, the g-expectation can only deal with those dynamic situations where we can find a reference measure \mathbb{Q} dominating \mathcal{P}. This situation is ideal for technical convenience but quite restrictive compared with reality: it means all the probabilities in \mathcal{P} agree on the same null events. For instance, in the context of financial modeling, when there is (Knightian) uncertainty or ambiguity in the volatility process \sigma_{t}, the set \mathcal{P} may not necessarily have a reference measure (Epstein and Ji (2013)). How should we deal with a possibly non-dominated \mathcal{P} in a dynamic situation? It took the community many years to realize that it is necessary to jump out of the classical probability system and start from scratch to construct a new generalization of the probability framework, which was established by Peng (2004, 2007, 2008) and further developed by the academic community led by him: the G-expectation framework.
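To make (1.1) concrete for readers coming from classical statistics, here is a minimal numerical sketch (ours, purely for illustration and not part of the framework itself) in which the family \mathcal{P} is a finite collection of centered normal models; in general \mathcal{P} is infinite-dimensional and possibly non-dominated, so this toy only illustrates the "supremum over candidate models" mechanism.

```python
# Minimal sketch (illustration only): the sublinear expectation in (1.1) as a
# supremum of classical expectations over a *finite* family of candidate models.
# Each candidate model here is N(0, sigma^2) for sigma in a small list; all
# numerical choices are ours and not prescribed by the G-framework.
import numpy as np

rng = np.random.default_rng(0)
eps = rng.standard_normal(200_000)          # reference N(0,1) sample

def upper_expectation(phi, sigmas):
    """sup over sigma of E[phi(sigma * eps)], approximated by Monte Carlo."""
    return max(np.mean(phi(s * eps)) for s in sigmas)

phi = lambda x: np.maximum(x - 1.0, 0.0)    # a call-type payoff
print(upper_expectation(phi, sigmas=[0.5, 0.75, 1.0]))   # attained at sigma = 1.0 here
```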

Since its establishment in the 2000s, the G-expectation framework has gradually developed into a new generalization of the classical one, with its own notions of independence and distributions, as well as the associated stochastic calculus. The spirit of considering \mathcal{P} to characterize Knightian uncertainty is embedded into this framework from its initial setup. A distribution under the G-expectation can be represented by a family of classical distributions: it provides a convenient way to depict distributional uncertainty that requires an infinite-dimensional family of distributions, which usually does not have an explicit parametric form. More details about this framework can be found in Denis et al. (2011); Peng (2017); Peng (2019b).

The G-normal distribution \mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) is the analogue of the classical normal N(0,\sigma^{2}) in this framework. As indicated by its notation, it is a typical object with variance uncertainty. In theory, it plays a central role in the context of the central limit theorem (Peng (2019a)): it is the asymptotic distribution of the normalized sum of a sequence of independent random variables with zero mean and variance uncertainty. It has a Stein-type characterization provided by Hu et al. (2017). Fang et al. (2019) provide an insightful discrete approximation and continuous-time representation of the G-normal distribution. In practice, the G-normal has also shown its potential in the study of risk measures, such as the Value at Risk induced by the G-normal (G-VaR), rigorously constructed in Peng et al. (2020) and further developed in the recent Peng and Yang (2020), where the G-VaR mostly outperforms the benchmark methods in terms of violation rate and predictive performance.

1.2 Potential misunderstandings of the G-normal and independence

Since the notions of distribution and independence in the G-expectation framework are different from the classical ones, there are several potential misunderstandings in the interpretation of the G-normal and its independence. The sources of these misunderstandings can be summarized into the following four aspects (where we have also provided clarification when applicable):

  1. A1

    (The uncertainty set of the G-normal) The G-expectation of the G-normal is defined through the (viscosity) solution of a fully nonlinear PDE (the G-heat equation), which usually does not have an explicit form except in some special cases (Hu (2012)). In fact, following the spirit of Knightian uncertainty, a better interpretation of a G-version distribution is as a family of classical distributions characterizing the distributional uncertainty. Nonetheless, for a general reader, if not careful, the notation of the G-normal \mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) could lead to the misconception that it is associated with the family \{N(0,\sigma^{2}),\sigma\in[\underline{\sigma},\overline{\sigma}]\}. Although this impression still holds in special situations, as shown in 3.12, it is not rigorous in general. Actually, the uncertainty set of \mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) is much larger than this family; one piece of evidence is that the G-normal distribution has third-moment uncertainty (all of its odd moments have uncertainty), whereas every distribution in the family \{N(0,\sigma^{2}),\sigma\in[\underline{\sigma},\overline{\sigma}]\} is symmetric and thus has zero third moment. This means that the uncertainty set of the G-normal contains classical distributions with non-zero third moments. This may seem like a strange property for a “normal” distribution in a probabilistic system (especially when we note that X\overset{\text{d}}{=}-X if X\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])). An explicit form of the uncertainty set of the G-normal is given by Denis et al. (2011).

  2. A2

    (The missing connection between univariate and multivariate G-normal) The joint random vector formed by n independent G-normal distributed random variables does not follow a multivariate G-normal distribution (even after any invertible linear transformation of the original vector). More study of the counter-intuitive properties of the G-normal can be found in Bayraktar and Munk (2015).

  3. A3

    (The asymmetry of independence) The independence in this framework is asymmetric: that X is independent of Y does not necessarily mean that Y is independent of X. This is why this independence is also called sequential independence, which is different from the classical one. One interpretation of this asymmetry in the relation “Y is independent of X” comes from the temporal order: if Y is realized at a time point after X, the roles of X and Y are asymmetric (in terms of the possible dependence structure). Another interpretation comes from the distributional uncertainty: any realization X=x has no effect on the uncertainty set of Y. Both interpretations are valid if one understands the detailed theory of this framework. However, for a general audience, both of them remain vague and may even become quite confusing if combined in a naive way (such as “if Y happens after X, any realization of Y should have no effect on X, so we automatically have one direction of independence”). So far we do have a simple example showing that the independence is indeed asymmetric (Example 1.3.15 in Peng (2019b)), but it is not clear why the independence is asymmetric in this example. To be specific, how does the distributional uncertainty of the joint vector (X,Y) (or the representation of its sublinear expectation) change if we switch the order of the independence? Such a representation (even in a special case) will help a general audience better understand the sequential independence, in the sense that they can explicitly see how the order of independence changes the underlying distributional uncertainty.

  4. A4

    (The lack of caution before data analysis) Suppose one intends to use the G-normal distribution to describe the distributional uncertainty in a dataset (either artificial or real). Without enough caution, the misinterpretations of the independence and distributions in this framework mentioned above may further bring confusion, or even mistakes, into the data-analysis procedure.

The objectives of this paper all serve one central problem: from a statistical perspective, how can we better understand the G-normal distribution and the G-version independence? The answer to this question will also lead to a better interpretation and understanding of the G-normal distribution for the general audience and for practitioners who are familiar with classical probability and statistics. We will work towards this central problem through the following four basic questions, where each question “Q[k]” corresponds to the aspect “A[k]” mentioned above.

  1. Q1

    How does the third-moment uncertainty of the G-normal arise? Is it possible to use the linear expectations of classical normals to approach the sublinear expectation of the G-normal (without involving the underlying PDEs)?

  2. Q2

    How should we appropriately connect the univariate objects and the multivariate objects in this framework? Since it is hard to start from univariate G-normals and obtain a multivariate G-normal, is it possible to make a retreat at the starting point, that is, to connect univariate classical normals with a multivariate G-normal?

  3. Q3

    How can we understand the asymmetry of the independence in this framework in terms of representations?

  4. Q4

    What kinds of data sequences are related to the volatility uncertainty covered by the G-normal, and what kinds are not?

The interpretation of the G-normal and the sequential independence will also be important for theoretically investigating the reliability and robustness of risk measures derived from G-version distributions, such as the G-VaR currently in the literature.

1.3 Our main tool and results in this paper

Our main tool here is a substructure called the semi-G-normal distribution (Section 3.4), which behaves like a close relative of both classical models (such as a normal mixture model) and G-version objects (such as the G-normal). We will also study the various kinds of independence associated with semi-G-normal distributions (Section 3.6).

The notion of the semi-G-normal was first proposed in Li and Kulperger (2018), where it was used to design an iterative approximation to the sublinear expectation of the G-normal and to the solution of the G-heat equation. Later on, this substructure was further developed in the master's thesis Li (2018), where independence structures were proposed to better perform the pseudo simulation in this context.
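To convey the flavor of that iterative approximation without its technical details, the following is a rough numerical sketch (our own simplified rendering, not the exact scheme in the cited works): starting from a test function, repeatedly apply a "Gaussian smoothing over a small time step, then supremum over the volatility interval" step; the value at zero then approximates the sublinear expectation of the G-normal.

```python
# Rough sketch (our simplified rendering, not the exact scheme in the cited work):
# iterate "Gaussian smoothing over a small time step, then sup over sigma in
# [sig_lo, sig_hi]" to approximate E[phi(X)] for X ~ N(0, [sig_lo^2, sig_hi^2]).
import numpy as np

def g_normal_expectation(phi, sig_lo, sig_hi, n_steps=50, n_sigma=11):
    xs = np.linspace(-8.0, 8.0, 801)              # spatial grid
    zs = np.linspace(-4.0, 4.0, 201)              # nodes for the N(0,1) smoothing
    w = np.exp(-zs**2 / 2.0); w /= w.sum()        # normalized Gaussian weights
    sigmas = np.linspace(sig_lo, sig_hi, n_sigma)
    u, dt = phi(xs), 1.0 / n_steps
    for _ in range(n_steps):
        best = np.full_like(u, -np.inf)
        for s in sigmas:                          # sup over the volatility choice
            shifted = xs[:, None] + s * np.sqrt(dt) * zs[None, :]
            best = np.maximum(best, np.interp(shifted, xs, u) @ w)
        u = best
    return float(np.interp(0.0, xs, u))

# Third moment of the G-normal: strictly positive, illustrating A1 in Section 1.2.
print(g_normal_expectation(lambda x: x**3, 0.5, 1.0))
```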

This paper gives a more rigorous and systematic construction of these structures and focuses more on their distributional and probabilistic aspects, in order to show the hybrid role of the semi-G-normal between the classical normal and the G-normal. To be specific, we will show that there exists a middle stage of independence sitting between the classical (symmetric) independence and the G-version (asymmetric) independence. It is called semi-sequential independence; it allows the connection between univariate and multivariate objects (3.22), and it is a symmetric relation between two semi-G-normal objects (3.19).

Moreover, we will provide a series of representations, in a form similar to (1.1), associated with the semi-G-normal distributions and also with random vectors having semi-G-normal marginals under various kinds of independence. Interestingly, by changing the order of the independence, we are equivalently modifying the graphical structure in the representation of the sublinear expectation of the joint vector. This idea will be shown in Section 3.7 and further studied in Section 5.3. These representations provide a more direct view on the order of independence in this framework, because we can see how the family of distributions changes when the order is switched. Under this view, we can provide a statistical interpretation of the asymmetry of sequential independence between two semi-G-normal objects (Section 4.3).

Throughout this paper, we will frequently mention the representations of the distributions in the G-framework. These representation results are crucial here, because the right-hand side of each representation is simply a family of classical models whose envelope is exactly the sublinear expectation of the corresponding G-version object. Through an intuitive representation, a person who is familiar with classical probability and statistics is able to understand the uncertainty described by the G-version objects.

The remaining content of this paper is organized as follows. Section 2 gives the basic setup of the G-expectation framework so that readers can check the rigorous definition of each concept. Section 3 presents our main results by putting readers in the context of a classical state-space volatility model. This kind of story setup is especially helpful in the discussion of the representations associated with the semi-G-normal in Section 3.7. After we go through these representation results, readers will find that we have already provided the answers to the four questions along the way. These answers will be given and elaborated in Section 4. Finally, Section 5 summarizes the whole paper and provides possible extensions as future developments. The proofs of our theoretical results are deferred to Section 6, unless a proof is beneficial to the current discussion or is short enough to be included in the main text.

2 Basic settings of the G-expectation framework

This section gives a detailed description of the basic setup (the sublinear expectation space) of the G-expectation framework for a general audience, starting from a set of probability measures (more rigorous treatments can be found in Chapter 6 of the book by Peng (2019b)). Another equivalent way is to start from a space of random variables and a sublinear operator (more details can be found in Chapters 1 and 2 of Peng (2019b)).

For readers who may not be familiar with this setup, the following reading order is recommended:

  1. 1.

    Take a glance at the initial setup and the meaning of notations in this section (especially the notation for independence which is in 2.6);

  2. 2.

    Read through our main results (Section 3) which describe the GG-version distributions mostly using the representations in terms of classical objects;

  3. 3.

    Come back to this section to check the detailed definitions (such as the connection between GG-expectation with the solutions to a class of fully nonlinear partial differential equations).

2.1 Distributions, independence and limiting results

Let \mathcal{P}=\{\mathbb{P}_{\theta},\theta\in\Theta\} denote a set of probability measures on a measurable space (\Omega,\mathcal{F}). Let \mathbb{E}_{\theta} denote the linear expectation under \mathbb{P}_{\theta}. Consider the following spaces:

  • L^{0}(\Omega): the space of all \mathcal{F}-measurable real-valued functions (that is, random variables X:\Omega\to\mathbb{R});

  • \mathcal{H}^{*}\coloneqq\{X\in L^{0}(\Omega):\mathbb{E}_{\theta}[X]\text{ exists for each }\theta\in\Theta\};

  • \mathcal{H}_{p}\coloneqq\{X\in L^{0}(\Omega):\sup_{\theta\in\Theta}\mathbb{E}_{\theta}[\lvert X\rvert^{p}]<\infty\} (for p>0);

  • \mathcal{N}_{p}\coloneqq\{X\in L^{0}(\Omega):\sup_{\theta\in\Theta}\mathbb{E}_{\theta}[\lvert X\rvert^{p}]=0\} (for p>0);

  • \mathcal{N}\coloneqq\{X\in L^{0}(\Omega):\mathbb{P}_{\theta}(X=0)=1\text{ for each }\theta\in\Theta\}.

Note that for any 1\leq p\leq q<\infty,

\mathcal{H}_{q}\subset\mathcal{H}_{p}\subset\mathcal{H}^{*}\subset L^{0}(\Omega).

We also have, for any p>0,

\mathcal{N}=\mathcal{N}_{p}.
Definition 2.1.

(The upper expectation associated with \mathcal{P}) For any X\in\mathcal{H}^{*}, we define a functional \mathcal{E}:\mathcal{H}^{*}\to[-\infty,\infty] associated with the family \mathcal{P} as

\mathcal{E}[X]=\mathcal{E}^{\mathcal{P}}[X]\coloneqq\sup_{\theta\in\Theta}\mathbb{E}_{\theta}[X],

where [-\infty,\infty] is the extended real line. We also follow the convention that, if \mathbb{E}_{\theta}[X] exists but is \infty for some \theta, the supremum is taken to be \infty.

Definition 2.2 (The upper and lower probability).

For any A\in\mathcal{F}, let

\mathbf{V}(A)\coloneqq\sup_{\mathbb{P}\in\mathcal{P}}\mathbb{P}(A),\text{ and }\mathbf{v}(A)\coloneqq\inf_{\mathbb{P}\in\mathcal{P}}\mathbb{P}(A).

The set functions \mathbf{v} and \mathbf{V} are called, respectively, the lower and upper probabilities associated with \mathcal{P}.

Proposition 2.3.

The space \mathcal{H}^{*} satisfies:

  1. (1)

    c\in\mathcal{H}^{*} for any constant c\in\mathbb{R};

  2. (2)

    If X\in\mathcal{H}^{*}, then \lvert X\rvert\in\mathcal{H}^{*};

  3. (3)

    If A\in\mathcal{F}, then \mathds{1}_{A}\in\mathcal{H}^{*};

  4. (4)

    If X\in L^{0}(\Omega) satisfies \mathbb{P}_{\theta}(X\geq 0)=1 for any \theta\in\Theta, then X\in\mathcal{H}^{*};

  5. (5)

    If X\in L^{0}(\Omega) satisfies \mathbb{P}_{\theta}(X\leq 0)=1 for any \theta\in\Theta, then X\in\mathcal{H}^{*};

  6. (6)

    For any X\in L^{0}(\Omega), \lvert X\rvert^{k}\in\mathcal{H}^{*} for k>0.

Proof.

It is easy to check the first three properties. For (4), note that for each \mathbb{P}\in\mathcal{P}, if X\geq 0 \mathbb{P}-almost surely, then \mathbb{E}_{\mathbb{P}}[X] exists (it may equal +\infty). A similar argument applies to (5). Property (6) is a direct consequence of (4). ∎

Remark 2.3.1.

However, \mathcal{H}^{*} is not necessarily a linear space. For instance, let \mathcal{P}=\{Q\} and let X be a Cauchy distributed random variable under Q. Then X^{+} and X^{-} belong to \mathcal{H}^{*}, but X=X^{+}-X^{-}\notin\mathcal{H}^{*}.

By 2.3, for any X\in L^{0}(\Omega), \mathcal{E}[\lvert X\rvert^{p}] is well-defined for any p>0. Then we can write \mathcal{H}_{p} as

\mathcal{H}_{p}=\{X\in L^{0}(\Omega):\mathcal{E}[\lvert X\rvert^{p}]<\infty\}.

We will mainly focus on the space \mathcal{H}_{1}.

Proposition 2.4.

The space \mathcal{H}_{1} is a linear space satisfying:

  1. (1)

    c\in\mathcal{H}_{1} for any constant c\in\mathbb{R};

  2. (2)

    If X\in\mathcal{H}_{1}, then cX\in\mathcal{H}_{1} for any constant c\in\mathbb{R};

  3. (3)

    If X,Y\in\mathcal{H}_{1}, then X+Y\in\mathcal{H}_{1};

  4. (4)

    If X\in\mathcal{H}_{1}, then \lvert X\rvert\in\mathcal{H}_{1};

  5. (5)

    If A\in\mathcal{F}, then \mathds{1}_{A}\in\mathcal{H}_{1};

  6. (6)

    If X\in\mathcal{H}_{1}, then \varphi(X)\in\mathcal{H}_{1} for any bounded Borel measurable function \varphi.

Proof.

The properties here can be checked from the definition of \mathcal{H}_{1}. For instance, (3) comes from the inequality \mathbb{E}_{\theta}[\lvert X+Y\rvert]\leq\mathbb{E}_{\theta}[\lvert X\rvert]+\mathbb{E}_{\theta}[\lvert Y\rvert] for any \theta\in\Theta. ∎

Then we can check that \mathcal{E} becomes a sublinear operator on the linear space \mathcal{H}_{1}. In other words, \mathcal{E}:\mathcal{H}_{1}\to\mathbb{R} satisfies: for any X,Y\in\mathcal{H}_{1},

  1. 1.

    (Monotonicity) If X\geq Y, then \mathcal{E}[X]\geq\mathcal{E}[Y];

  2. 2.

    (Constant preserving) For any c\in\mathbb{R}, \mathcal{E}[c]=c;

  3. 3.

    (Sub-additivity) \mathcal{E}[X+Y]\leq\mathcal{E}[X]+\mathcal{E}[Y];

  4. 4.

    (Positive homogeneity) For any \lambda\geq 0, \mathcal{E}[\lambda X]=\lambda\mathcal{E}[X].

Then we call \mathcal{E} a sublinear expectation and (\Omega,\mathcal{H}_{1},\mathcal{E}) a sublinear expectation space.

Furthermore, note that \mathcal{N}=\{X\in L^{0}(\Omega):\mathcal{E}[\lvert X\rvert]=0\} is a linear subspace of \mathcal{H}_{1}. We can treat \mathcal{N} as the null space and define the quotient space \mathcal{H}_{1}/\mathcal{N}. For any \{X\}\in\mathcal{H}_{1}/\mathcal{N} with representative X, we can define \mathcal{E}[\{X\}]\coloneqq\mathcal{E}[X], which is still a sublinear expectation. We can check that \mathcal{E} induces a Banach norm \lVert X\rVert_{1}\coloneqq\mathcal{E}[\lvert X\rvert] on \mathcal{H}_{1}/\mathcal{N}. Let \hat{\mathcal{H}}_{1} denote the completion of \mathcal{H}_{1}/\mathcal{N} under \lVert\cdot\rVert_{1}. Since \mathcal{H}_{1}/\mathcal{N} itself is a Banach space, it is equal to its completion \hat{\mathcal{H}}_{1} (Proposition 14 in Denis et al. (2011)). Let \mathcal{H}\coloneqq\hat{\mathcal{H}}_{1}; then we can check that (\Omega,\mathcal{H},\mathcal{E}) still forms a sublinear expectation space.

Rigorously speaking, we also require additional conditions on \mathcal{P}, such as weak compactness, so that \mathcal{E} has the required regularity (Theorem 12 in Denis et al. (2011)). Meanwhile, there exists such a weakly compact family \mathcal{P} for which the typical G-version distributions (the maximal and the G-normal distribution) exist in the space (\Omega,\mathcal{H},\mathcal{E}). More details can be found in Section 2.3 and Section 6.2 of Peng (2019b). The G-expectation is defined after the Brownian motion (the G-Brownian motion) is constructed in this context, but throughout this paper we only touch the G-version distributions and independence, so the expectation \mathcal{E} here is still a special kind of sublinear expectation; we will nevertheless call it the G-expectation to stress the properties that allow the existence of the G-version distributions. Throughout this section, without further notice, we stay in (\Omega,\mathcal{H},\mathcal{E}).

Let \mathcal{H}^{d}\coloneqq\{(X_{1},X_{2},\dotsc,X_{d}):X_{i}\in\mathcal{H},i=1,2,\dotsc,d\}. For any X\in\mathcal{H}^{d}, we will frequently mention a transformation \varphi(X) of X for a function \varphi:\mathbb{R}^{d}\to\mathbb{R}. Consider the following spaces of functions:

  • C_{\mathrm{b.Lip}}(\mathbb{R}^{d}): the linear space of all bounded Lipschitz functions;

  • C_{\mathrm{l.Lip}}(\mathbb{R}^{d}): the linear space of locally Lipschitz functions \varphi satisfying

    \lvert\varphi(x)-\varphi(y)\rvert\leq C_{\varphi}(1+\lvert x\rvert^{k}+\lvert y\rvert^{k})\lvert x-y\rvert,

    for all x,y\in\mathbb{R}^{d}, some positive integer k and some C_{\varphi}>0 depending on \varphi.

We will simply write \varphi\in C_{\mathrm{b.Lip}} or \varphi\in C_{\mathrm{l.Lip}} if the dimension of the domain of \varphi is clear from the context (it can be read off from the dimension of the random objects involved).

Note that \mathcal{H} satisfies: for any \varphi\in C_{\mathrm{b.Lip}}, \varphi(X)\in\mathcal{H} if X\in\mathcal{H}^{d}. However, this property does not necessarily hold for \varphi\in C_{\mathrm{l.Lip}}. Therefore, when we discuss the definitions of distributions and independence in this framework, we use \varphi\in C_{\mathrm{b.Lip}}. Later on, we will mention that this space can be extended to \varphi\in C_{\mathrm{l.Lip}} for a special family of distributions under some additional conditions.

Definition 2.5 (Distributions).

There are several notions related to the G-version distributions:

  1. 1.

    We call X and Y identically distributed, denoted by X\overset{\text{d}}{=}Y, if for any \varphi\in C_{\mathrm{b.Lip}},

    \mathcal{E}[\varphi(X)]=\mathcal{E}[\varphi(Y)].

  2. 2.

    A sequence \{X_{n}\}_{n=1}^{\infty} converges in distribution to X, denoted by X_{n}\overset{\text{d}}{\longrightarrow}X, if for any \varphi\in C_{\mathrm{b.Lip}},

    \lim_{n\to\infty}\mathcal{E}[\varphi(X_{n})]=\mathcal{E}[\varphi(X)].
Definition 2.6 (Independence).

A random variable Y is (sequentially) independent from X, denoted by X\dashrightarrow Y, if for any \varphi\in C_{\mathrm{b.Lip}},

\mathcal{E}[\varphi(X,Y)]=\mathcal{E}[\mathcal{E}[\varphi(x,Y)]_{x=X}].
Remark 2.6.1.

(Intuition of this independence) Since both X and Y are treated as random objects with potential distributional uncertainty, this independence essentially concerns the relation between the distributional uncertainty of X and that of Y. If we put the discussion in a context of sequential data (where the order of the data matters), this kind of independence often arises in scenarios where X is realized before Y and any realization of X has no effect on the distributional uncertainty of Y.

Remark 2.6.2.

(Asymmetry of this independence) One important fact regarding this independence is that it is asymmetric: X\dashrightarrow Y (Y is independent from X) does not necessarily imply Y\dashrightarrow X (X is independent from Y), which will be illustrated by 2.7. This is the reason we also call it sequential independence and use the notation \dashrightarrow to indicate the sequential order of the independence between two random objects.

Remark 2.6.3.

(Connection with the classical independence) Note that this sequential independence becomes the classical independence (which is symmetric) once X and Y have certain classical distributions; in other words, they can be put under a common classical probability space. In this case, \mathcal{E} reduces to a linear expectation \mathbb{E}_{\mathbb{P}}. To give readers a better understanding, suppose without loss of generality that (X,Y) has a classical joint continuous distribution with density function f_{X,Y} and marginal densities f_{X} and f_{Y}. Then, for any applicable \varphi,

\int\varphi(x,y)f_{X,Y}(x,y)\mathop{}\!\mathrm{d}x\mathop{}\!\mathrm{d}y=\mathbb{E}_{\mathbb{P}}[\varphi(X,Y)]=\mathbb{E}_{\mathbb{P}}[\mathbb{E}_{\mathbb{P}}[\varphi(x,Y)]_{x=X}]=\int_{x}\int_{y}\varphi(x,y)f_{X}(x)f_{Y}(y)\mathop{}\!\mathrm{d}y\mathop{}\!\mathrm{d}x.

Therefore, we have f_{X,Y}=f_{X}f_{Y}, which means X and Y are (classically) independent.

Example 2.7 (Example 1.3.15 in Peng, 2019b ).

Consider two identically distributed X,Y\in\mathcal{H} with \mathcal{E}[-X]=\mathcal{E}[X]=0 and \overline{\sigma}^{2}=\mathcal{E}[X^{2}]>-\mathcal{E}[-X^{2}]=\underline{\sigma}^{2}. Also assume \mathcal{E}[\lvert X\rvert]>0 so that \mathcal{E}[X^{+}]=\frac{1}{2}\mathcal{E}[\lvert X\rvert+X]=\frac{1}{2}\mathcal{E}[\lvert X\rvert]>0. Then we have

\mathcal{E}[XY^{2}]=\begin{cases}(\overline{\sigma}^{2}-\underline{\sigma}^{2})\mathcal{E}[X^{+}]&\text{if }X\dashrightarrow Y,\\ 0&\text{if }Y\dashrightarrow X.\end{cases}
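For readers who want to see where the two values come from, here is a sketch of the calculation, using only Definition 2.6, the sublinearity of \mathcal{E} and Proposition 2.25 (note that \mathcal{E}[xY^{2}]=x^{+}\overline{\sigma}^{2}-x^{-}\underline{\sigma}^{2} for a fixed real number x, and \mathcal{E}[Xy^{2}]=y^{2}\mathcal{E}[X]=0 for a fixed real number y):

\text{if }X\dashrightarrow Y:\quad\mathcal{E}[XY^{2}]=\mathcal{E}[\mathcal{E}[xY^{2}]_{x=X}]=\mathcal{E}[X^{+}\overline{\sigma}^{2}-X^{-}\underline{\sigma}^{2}]=\mathcal{E}[\underline{\sigma}^{2}X+(\overline{\sigma}^{2}-\underline{\sigma}^{2})X^{+}]=(\overline{\sigma}^{2}-\underline{\sigma}^{2})\mathcal{E}[X^{+}];

\text{if }Y\dashrightarrow X:\quad\mathcal{E}[XY^{2}]=\mathcal{E}[\mathcal{E}[Xy^{2}]_{y=Y}]=\mathcal{E}[(y^{2}\mathcal{E}[X])_{y=Y}]=0,

where the last equality in the first line uses \mathcal{E}[\underline{\sigma}^{2}X]=-\mathcal{E}[-\underline{\sigma}^{2}X]=0 together with 2.25.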

We will further study the interpretation of the independence (especially its asymmetric property) in Section 4.3 by giving a detailed version of 2.7 with representation theorems.
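Before that, and purely as a numerical preview of the representations developed in Section 3, the asymmetry in 2.7 can also be seen by assuming (for illustration only) that the uncertainty of each variable is captured by a volatility factor ranging over a two-point set multiplying a standard normal, and then evaluating the nested formula of Definition 2.6 with each expectation replaced by a supremum over that finite family.

```python
# Toy numerical check of the asymmetry in Example 2.7 (illustration only): each
# variable is represented as sigma * eps with sigma chosen from {sig_lo, sig_hi},
# a structure formalized as the semi-G-normal in Section 3; the nested formula of
# Definition 2.6 is evaluated with a sup over that two-point volatility family.
import numpy as np

rng = np.random.default_rng(1)
eps = rng.standard_normal(50_000)
sig_lo, sig_hi = 0.5, 1.0
sigmas = (sig_lo, sig_hi)

def nested(phi):
    """E[ E[phi(x, .)]_{x = first variable} ]: the inner sup may react to x."""
    inner = lambda x: max(np.mean(phi(x, s * eps)) for s in sigmas)
    return max(np.mean([inner(x) for x in s * eps[:1000]]) for s in sigmas)

print(nested(lambda x, y: x * y**2))   # X -> Y: clearly positive
print(nested(lambda y, x: x * y**2))   # Y -> X: approximately zero (Monte Carlo noise)
```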

Next we give the notion of independence extended to a sequence of random variables.

Definition 2.8.

(Independence of a Sequence) For a sequence \{X_{i}\}_{i=1}^{n} of random variables, they are (sequentially) independent if

(X_{1},X_{2},\dotsc,X_{i})\dashrightarrow X_{i+1},

for i=1,2,\dotsc,n-1. For notational convenience, the sequential independence of \{X_{i}\}_{i=1}^{n} is denoted by

X_{1}\dashrightarrow X_{2}\dashrightarrow\cdots\dashrightarrow X_{n}. (2.1)

The sequence \{X_{i}\}_{i=1}^{n} is further independent and identically distributed if it is sequentially independent and X_{i+1}\overset{\text{d}}{=}X_{i} for i=1,2,\dotsc,n-1. This property is called (nonlinearly) i.i.d. for short.

Remark 2.8.1.

Note that the independence in 2.1 is stronger than the pairwise relations X_{k}\dashrightarrow X_{k+1} for k=1,2,\dotsc,n-1.

Now we introduce two fundamental G-version distributions: the maximal and the G-normal distributions. The former can be treated as the analogue of a “constant” in the classical sense, and the latter is a generalization of the classical normal. We call \bar{X} an independent copy of X if \bar{X}\overset{\text{d}}{=}X and X\dashrightarrow\bar{X}.

We first introduce the G-distribution, which is the joint vector of these two fundamental distributions.

Let \mathbb{S}(d) denote the collection of all d\times d symmetric matrices.

Proposition 2.9.

Let G:\mathbb{R}^{d}\times\mathbb{S}(d)\to\mathbb{R} be a function satisfying: for each p,\bar{p}\in\mathbb{R}^{d} and A,\bar{A}\in\mathbb{S}(d),

\begin{cases}G(p+\bar{p},A+\bar{A})\leq G(p,A)+G(\bar{p},\bar{A}),\\ G(\lambda p,\lambda A)=\lambda G(p,A)\text{ for any }\lambda\geq 0,\\ G(p,A)\leq G(p,\bar{A})\text{ if }A\leq\bar{A}.\end{cases} (2.2)

Then there exists a pair (X,\eta) on some sublinear expectation space (\Omega,\mathcal{H},\mathcal{E}) such that

G(p,A)=\mathcal{E}[\tfrac{1}{2}\langle AX,X\rangle+\langle p,\eta\rangle], (2.3)

and for any a,b\geq 0,

(aX+b\bar{X},a^{2}\eta+b^{2}\bar{\eta})\overset{\text{d}}{=}(\sqrt{a^{2}+b^{2}}X,(a^{2}+b^{2})\eta), (2.4)

where (\bar{X},\bar{\eta}) is an independent copy of (X,\eta).

Remark 2.9.1.

The relation 2.4 is equivalent to (X+\bar{X},\eta+\bar{\eta})\overset{\text{d}}{=}(\sqrt{2}X,2\eta).

The proof of 2.9 is available in Section 2.3 of Peng (2019b). Then we have the notion of the G-distribution associated with a function G.

Definition 2.10.

(G-distribution) A pair (X,\eta) satisfying 2.4 is called G-distributed, associated with a function G in the sense of 2.3.

The sublinear expectation of the random vector (X,η)(X,\eta) above can be characterized by the solution to a parabolic partial differential equation.

Proposition 2.11.

Consider a G-distributed random vector (X,\eta) associated with a function G. For any \varphi\in C_{\mathrm{b.Lip}}(\mathbb{R}^{d}\times\mathbb{R}^{d}), let

u(t,x,y)\coloneqq\mathcal{E}[\varphi(x+\sqrt{t}X,y+t\eta)],\;(t,x,y)\in[0,\infty)\times\mathbb{R}^{d}\times\mathbb{R}^{d}.

Then u is the unique (viscosity) solution to the following parabolic partial differential equation (PDE):

\partial_{t}u-G(D_{y}u,D_{x}^{2}u)=0,

with initial condition u|_{t=0}=\varphi, where D_{x}^{2}u\coloneqq(\partial_{x_{i}x_{j}}^{2}u)_{i,j=1}^{d} and D_{y}u\coloneqq(\partial_{y_{i}}u)_{i=1}^{d}. This PDE is called a G-equation.

Remark 2.11.1.

Readers may turn to Crandall et al. (1992) for more details on the notion of viscosity solutions. In this paper, we do not require readers to have knowledge of viscosity solutions. Moreover, the solution can be treated as a classical one when the function G satisfies the strong ellipticity condition.

Next we provide a useful established property of the G-distributed random vector (X,\eta). Suppose \lvert\eta\rvert,\lvert X\rvert^{2}\in\mathcal{H} and the following uniform integrability conditions are satisfied (proposed by Zhang (2016)):

\lim_{\lambda\to\infty}\mathcal{E}[(\lvert\eta\rvert-\lambda)^{+}]=0, (2.5)

and

\lim_{\lambda\to\infty}\mathcal{E}[(\lvert X\rvert^{2}-\lambda)^{+}]=0. (2.6)

Then for any \varphi\in C_{\mathrm{l.Lip}} (which is larger than C_{\mathrm{b.Lip}}), we still have \varphi(\eta,X)\in\mathcal{H} (which is a Banach space). (This result is provided in Section 2.5 of Peng (2019b).) Therefore, in the following, when we talk about \varphi(\eta,X) for a G-distributed random vector (\eta,X), we may take \varphi\in C_{\mathrm{l.Lip}}.

If we pay attention to each marginal part in 2.4, we can see that X is similar to a classical normal random variable while \eta behaves like a constant (we do not consider the Cauchy distribution here because we assume the existence of the expectation). It turns out that X follows a G-normal distribution and \eta follows a maximal distribution.

Definition 2.12 (Maximal distribution).

A d-dimensional random vector \eta follows a maximal distribution if, for any independent copy \bar{\eta}, we have

\eta+\bar{\eta}\overset{\text{d}}{=}2\eta.

Another equivalent and more explicit definition is that \eta follows the maximal distribution \mathcal{M}(\Gamma) if there exists a bounded, closed and convex subset \Gamma\subset\mathbb{R}^{d} such that, for any \varphi\in C_{\mathrm{l.Lip}},

\mathcal{E}[\varphi(\eta)]=\max_{y\in\Gamma}\varphi(y).
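For instance, in dimension one with \Gamma=[\underline{\mu},\overline{\mu}], the definition directly gives \mathcal{E}[\eta]=\overline{\mu}, -\mathcal{E}[-\eta]=\underline{\mu} and \mathcal{E}[\eta^{2}]=\max(\underline{\mu}^{2},\overline{\mu}^{2}); that is, \eta behaves like an unknown constant that is only known to lie in [\underline{\mu},\overline{\mu}], which is the sense in which the maximal distribution is the analogue of a classical constant.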
Definition 2.13 (G-normal distribution).

A d-dimensional random vector X follows a G-normal distribution if, for any independent copy \bar{X}, we have

X+\bar{X}\overset{\text{d}}{=}\sqrt{2}X.

When d=1, we write X\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) (0\leq\underline{\sigma}\leq\overline{\sigma}) with variance uncertainty: \underline{\sigma}^{2}\coloneqq-\mathcal{E}[-X^{2}] and \overline{\sigma}^{2}\coloneqq\mathcal{E}[X^{2}].

Proposition 2.14 (G-normal distribution characterized by the G-heat equation).

A random vector X follows the d-dimensional G-normal distribution if and only if v(t,x)\coloneqq\mathcal{E}[\varphi(x+\sqrt{t}X)] is the solution to the G-heat equation defined on (t,x)\in[0,1]\times\mathbb{R}^{d}:

v_{t}-G(D_{x}^{2}v)=0,\,v|_{t=0}=\varphi, (2.7)

where G(\mathbf{A})\coloneqq\frac{1}{2}\mathcal{E}[\langle\mathbf{A}X,X\rangle]:\mathbb{S}(d)\to\mathbb{R} is a sublinear function characterizing the distribution of X. For d=1, we have G(a)=\frac{1}{2}(\overline{\sigma}^{2}a^{+}-\underline{\sigma}^{2}a^{-}), and when \underline{\sigma}^{2}>0, 2.7 is also called the Black-Scholes-Barenblatt equation with volatility uncertainty.
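To make this characterization concrete, here is a minimal numerical sketch (ours; the domain, grid sizes and test functions are arbitrary illustrative choices): an explicit finite-difference scheme for the one-dimensional G-heat equation, which recovers \mathcal{E}[\varphi(X)]\approx v(1,0) for X\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]).

```python
# Minimal sketch (illustration only): explicit finite differences for the 1-d
# G-heat equation v_t = G(v_xx), G(a) = (sig_hi^2 * a^+ - sig_lo^2 * a^-) / 2,
# with v(0, .) = phi, so that E[phi(X)] ~ v(1, 0) for X ~ N(0, [sig_lo^2, sig_hi^2]).
import numpy as np

def g_heat_value(phi, sig_lo, sig_hi, L=8.0, nx=801, T=1.0):
    xs = np.linspace(-L, L, nx)
    dx = xs[1] - xs[0]
    dt = 0.4 * dx**2 / sig_hi**2                 # explicit-scheme stability constraint
    v, t = phi(xs), 0.0
    while t < T:
        h = min(dt, T - t)
        vxx = np.zeros_like(v)
        vxx[1:-1] = (v[2:] - 2 * v[1:-1] + v[:-2]) / dx**2
        G = 0.5 * (sig_hi**2 * np.maximum(vxx, 0.0) - sig_lo**2 * np.maximum(-vxx, 0.0))
        v = v + h * G                            # one forward Euler step of v_t = G(v_xx)
        t += h
    return float(np.interp(0.0, xs, v))

print(g_heat_value(lambda x: x**3, 0.5, 1.0))                        # positive: odd moments carry uncertainty
print(g_heat_value(lambda x: np.maximum(x - 1.0, 0.0), 0.5, 1.0))    # convex phi: classical value at sigma = sig_hi (cf. 2.15 below)
```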

Remark 2.14.1.

For d=1, when \underline{\sigma}=\overline{\sigma}=\sigma, the G-normal distribution X\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) can be treated as a classical normal N(0,\sigma^{2}) because the G-heat equation reduces to the classical heat equation.

Remark 2.14.2.

(Covariance uncertainty) We can use the function G(\mathbf{A})\coloneqq\frac{1}{2}\mathcal{E}[\langle\mathbf{A}X,X\rangle] to characterize the G-normal distribution. In fact, G(\mathbf{A}) can be further expressed as

G(\mathbf{A})=\frac{1}{2}\sup_{\mathbf{\Sigma}\in\mathcal{C}}\operatorname{tr}[\mathbf{A}\mathbf{\Sigma}],

where \mathcal{C}=\{\mathbf{B}\mathbf{B}^{T}:\mathbf{B}\in\mathbb{S}(d)\} is a collection of non-negative definite symmetric matrices which can be treated as the uncertainty set of covariance matrices. In this sense, we can write X\sim\mathcal{N}(\bm{0},\mathcal{C}).

Proposition 2.15.

Consider X\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) and a classically distributed random variable \epsilon\sim N(0,1). For any \varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}), we have

\mathcal{E}[\varphi(X)]=\begin{cases}\mathbb{E}[\varphi(\overline{\sigma}\epsilon)]&\text{if }\varphi\text{ is convex},\\ \mathbb{E}[\varphi(\underline{\sigma}\epsilon)]&\text{if }\varphi\text{ is concave}.\end{cases}
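For instance, \varphi(x)=x^{4} is convex and \varphi(x)=-x^{4} is concave, so 2.15 gives \mathcal{E}[X^{4}]=\mathbb{E}[(\overline{\sigma}\epsilon)^{4}]=3\overline{\sigma}^{4} and -\mathcal{E}[-X^{4}]=3\underline{\sigma}^{4}. In contrast, \varphi(x)=x^{3} is neither convex nor concave, so the formula does not apply; this is consistent with A1 in Section 1.2, since every member of \{N(0,\sigma^{2}),\sigma\in[\underline{\sigma},\overline{\sigma}]\} has zero third moment while the G-normal has third-moment uncertainty.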
Theorem 2.16 (Law of Large Numbers).

Consider a sequence of (nonlinearly) i.i.d. random variables \{Z_{i}\}_{i=1}^{\infty} satisfying

\lim_{\lambda\to\infty}\mathcal{E}[(\lvert Z_{1}\rvert-\lambda)^{+}]=0. (2.8)

Then for any continuous function \varphi satisfying the linear growth condition \lvert\varphi(x)\rvert\leq C(1+\lvert x\rvert), we have

\lim_{n\to\infty}\mathcal{E}[\varphi(\tfrac{1}{n}\sum_{i=1}^{n}Z_{i})]=\max_{v\in\Gamma}\varphi(v),

where \Gamma is the bounded, closed and convex subset determined by

\max_{v\in\Gamma}\langle p,v\rangle=\mathcal{E}[\langle p,Z_{1}\rangle],\;p\in\mathbb{R}^{d}.

For d=1, let \underline{\mu}\coloneqq-\mathcal{E}[-Z_{1}] and \overline{\mu}\coloneqq\mathcal{E}[Z_{1}]. Then \frac{1}{n}\sum_{i=1}^{n}Z_{i}\overset{\text{d}}{\longrightarrow}\mathcal{M}[\underline{\mu},\overline{\mu}], that is,

\lim_{n\to\infty}\mathcal{E}[\varphi(\tfrac{1}{n}\sum_{i=1}^{n}Z_{i})]=\mathcal{E}[\varphi(\mathcal{M}[\underline{\mu},\overline{\mu}])]=\max_{\underline{\mu}\leq v\leq\overline{\mu}}\varphi(v).
Theorem 2.17 (Central Limit Theorem).

Consider a sequence of (nonlinearly) i.i.d. random variables \{X_{i}\}_{i=1}^{\infty} satisfying the mean-certainty condition \mathcal{E}[X_{1}]=-\mathcal{E}[-X_{1}]=\bm{0} and

\lim_{\lambda\to\infty}\mathcal{E}[(\lvert X_{1}\rvert^{2}-\lambda)^{+}]=0. (2.9)

Then for any continuous function \varphi satisfying the linear growth condition \lvert\varphi(x)\rvert\leq C(1+\lvert x\rvert),

\lim_{n\to\infty}\mathcal{E}[\varphi(\tfrac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i})]=\mathcal{E}[\varphi(X)],

where X is a G-normally distributed random vector characterized by the sublinear function G defined as

G(A)\coloneqq\mathcal{E}[\tfrac{1}{2}\langle AX_{1},X_{1}\rangle],\;A\in\mathbb{S}(d).

For d=1, let \underline{\sigma}^{2}\coloneqq-\mathcal{E}[-X_{1}^{2}] and \overline{\sigma}^{2}\coloneqq\mathcal{E}[X_{1}^{2}]. Then \frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i} converges in distribution to X\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]).

Proposition 2.18.

Consider a sequence \{Y_{n}\}_{n=1}^{\infty} and a random variable Y satisfying

\sup_{n}\mathcal{E}[\lvert Y_{n}\rvert^{p}]+\mathcal{E}[\lvert Y\rvert^{p}]<\infty,

for any p\geq 1. If the convergence \lim_{n\to\infty}\mathcal{E}[\varphi(Y_{n})]=\mathcal{E}[\varphi(Y)] holds for any \varphi\in C_{\mathrm{b.Lip}}, then it also holds for any \varphi\in C_{\mathrm{l.Lip}}.

Remark 2.18.1.

2.18 is a direct consequence of Lemma 2.4.12 in Peng (2019b). It is useful when we need to extend the function space for \varphi while discussing convergence in distribution.

2.2 Basic results on independence of sequence

We prepare several basic results on sequential independence between random vectors in the G-framework:

  • 2.19 gives a general result showing that sequential independence between two random vectors implies the independence between their sub-vectors.

  • 2.20 shows that the sequential independence of a sequence implies the independence of any sub-sequence.

  • 2.23 shows that, under the sequential independence of a sequence, any two non-overlapping sub-vectors are sequentially independent (as long as the original order is kept).

These results are useful for the discussions in Section 3.6. We provide the proofs for the convenience of general readers and to help them better understand how to deal with the sequential independence \dashrightarrow.

Proposition 2.19.

For any subsequences \{i_{p}\}_{p=1}^{k} and \{j_{q}\}_{q=1}^{l} satisfying 1\leq i_{1}<i_{2}<\dotsc<i_{k}\leq n and 1\leq j_{1}<j_{2}<\dotsc<j_{l}\leq m, we have the general result that

(X_{1},X_{2},\dotsc,X_{n})\dashrightarrow(Y_{1},Y_{2},\dotsc,Y_{m})\implies(X_{i_{1}},X_{i_{2}},\dotsc,X_{i_{k}})\dashrightarrow(Y_{j_{1}},Y_{j_{2}},\dotsc,Y_{j_{l}}).
Proof.

For any applicable test function φCl.Lip(k+l)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{k+l}), define another function ψCl.Lip(n+m)\psi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{n+m}) on a larger space by

ψ(x1,x2,,xn,y1,y2,,ym)φ(xi1,xi2,,xik,yj1,yj2,,yjl),\psi(x_{1},x_{2},\dotsc,x_{n},y_{1},y_{2},\dotsc,y_{m})\coloneqq\varphi(x_{i_{1}},x_{i_{2}},\dotsc,x_{i_{k}},y_{j_{1}},y_{j_{2}},\dotsc,y_{j_{l}}),

then

[φ(Xi1,Xi2,,Xik,Yj1,Yj2,,Yjl)]\displaystyle\hphantom{=}\mathcal{E}[\varphi(X_{i_{1}},X_{i_{2}},\dotsc,X_{i_{k}},Y_{j_{1}},Y_{j_{2}},\dotsc,Y_{j_{l}})]
=[ψ(X1,X2,,Xn,Y1,Y2,,Ym)]\displaystyle=\mathcal{E}[\psi(X_{1},X_{2},\dotsc,X_{n},Y_{1},Y_{2},\dotsc,Y_{m})]
=[[ψ(x1,x2,,xn,Y1,Y2,,Ym)]xi=Xi,i=1,n]\displaystyle=\mathcal{E}[\mathcal{E}[\psi(x_{1},x_{2},\dotsc,x_{n},Y_{1},Y_{2},\dotsc,Y_{m})]_{x_{i}=X_{i},i=1,\dotsc n}]
=[[φ(xi1,xi2,,xik,Yj1,Yj2,,Yjl)]xip=Xip,p=1,,k].\displaystyle=\mathcal{E}[\mathcal{E}[\varphi(x_{i_{1}},x_{i_{2}},\dotsc,x_{i_{k}},Y_{j_{1}},Y_{j_{2}},\dotsc,Y_{j_{l}})]_{x_{i_{p}}=X_{i_{p}},p=1,\dotsc,k}].\qed
Proposition 2.20.

For any subsequence \{i_{p}\}_{p=1}^{k} satisfying 1\leq i_{1}<i_{2}<\dotsc<i_{k}\leq n, we have

X_{1}\dashrightarrow X_{2}\dashrightarrow\dotsc\dashrightarrow X_{n}\implies X_{i_{1}}\dashrightarrow X_{i_{2}}\dashrightarrow\dotsc\dashrightarrow X_{i_{k}}.
Proof.

It is equivalent to prove (Xi1,Xi2,,Xij1)Xij(X_{i_{1}},X_{i_{2}},\dotsc,X_{i_{j-1}})\dashrightarrow X_{i_{j}} for any j=2,,kj=2,\dotsc,k. For any j=2,,kj=2,\dotsc,k, by the definition of independence of the full sequence {Xi}i=1n\{X_{i}\}_{i=1}^{n}, we have

(X1,X2,,Xij1,,Xij1)Xij.(X_{1},X_{2},\dotsc,X_{i_{j-1}},\dotsc,X_{i_{j}-1})\dashrightarrow X_{i_{j}}.

From 2.19, we directly have the sequential independence for the subvectors:

(Xi1,Xi2,,Xij1)Xij.(X_{i_{1}},X_{i_{2}},\dotsc,X_{i_{j-1}})\dashrightarrow X_{i_{j}}.\qed

The following 2.21 and 2.22 will be useful in our later discussion, where the dimension of the three objects X,Y,ZX,Y,Z could be arbitrary finite number.

Lemma 2.21.

If X\dashrightarrow Y\dashrightarrow Z, then X\dashrightarrow(Y,Z).

Proof.

Let H(x,y)[φ(x,y,Z)].H(x,y)\coloneqq\mathcal{E}[\varphi(x,y,Z)]. Then we can check

[[φ(x,Y,Z)]x=X]\displaystyle\mathcal{E}[\mathcal{E}[\varphi(x,Y,Z)]_{x=X}] =(1)[[[φ(x,y,Z)]y=Y]x=X]\displaystyle\overset{(1)}{=}\mathcal{E}[\mathcal{E}[\mathcal{E}[\varphi(x,y,Z)]_{y=Y}]_{x=X}]
=[[H(x,Y)]x=X]\displaystyle=\mathcal{E}[\mathcal{E}[H(x,Y)]_{x=X}]
=(2)[H(X,Y)]\displaystyle\overset{(2)}{=}\mathcal{E}[H(X,Y)]
=[[φ(x,y,Z)]x=X,y=Y]\displaystyle=\mathcal{E}[\mathcal{E}[\varphi(x,y,Z)]_{x=X,y=Y}]
=(3)[φ(X,Y,Z)],\displaystyle\overset{(3)}{=}\mathcal{E}[\varphi(X,Y,Z)],

where (1) is due to YZY\dashrightarrow Z, (2) comes from XYX\dashrightarrow Y and (3) comes from (X,Y)Z(X,Y)\dashrightarrow Z. ∎

Lemma 2.22.

If X\dashrightarrow(Y,Z) and Y\dashrightarrow Z, then (X,Y)\dashrightarrow Z.

Proof.

Let H(x,y)[φ(x,y,Z)].H(x,y)\coloneqq\mathcal{E}[\varphi(x,y,Z)]. Then

[[φ(x,y,Z)]x=X,y=Y]\displaystyle\mathcal{E}[\mathcal{E}[\varphi(x,y,Z)]_{x=X,y=Y}] =[H(X,Y)]\displaystyle=\mathcal{E}[H(X,Y)]
=(1)[[H(x,Y)]x=X]\displaystyle\overset{(1)}{=}\mathcal{E}[\mathcal{E}[H(x,Y)]_{x=X}]
=[[[φ(x,y,Z)]y=Y]x=X]\displaystyle=\mathcal{E}[\mathcal{E}[\mathcal{E}[\varphi(x,y,Z)]_{y=Y}]_{x=X}]
=(2)[[φ(x,Y,Z)]x=X]\displaystyle\overset{(2)}{=}\mathcal{E}[\mathcal{E}[\varphi(x,Y,Z)]_{x=X}]
=(3)[φ(X,Y,Z)],\displaystyle\overset{(3)}{=}\mathcal{E}[\varphi(X,Y,Z)],

where (1) comes from XYX\dashrightarrow Y, (2) comes from YZY\dashrightarrow Z and (3) comes from X(Y,Z)X\dashrightarrow(Y,Z). ∎

Proposition 2.23.

If X_{1}\dashrightarrow X_{2}\dashrightarrow\cdots\dashrightarrow X_{n}, then for any increasing subsequence \{i_{j}\}_{j=1}^{k}\subset\{1,2,\dotsc,n\} and any l=1,2,\dotsc,k-1, we have

(X_{i_{1}},\dotsc,X_{i_{l}})\dashrightarrow(X_{i_{l+1}},\dotsc,X_{i_{k}}).
Proof.

Let YjXij.Y_{j}\coloneqq X_{i_{j}}. Then we have Y1Y2YkY_{1}\dashrightarrow Y_{2}\dashrightarrow\cdots\dashrightarrow Y_{k}. Our goal is to show for any l=1,2,,k1l=1,2,\dotsc,k-1,

(Y1,,Yl)(Yl+1,,Yk).(Y_{1},\dotsc,Y_{l})\dashrightarrow(Y_{l+1},\dotsc,Y_{k}). (2.10)

Then we can proceed by math induction. Let m=klm=k-l. The result 2.10 holds when m=1m=1 because we directly have

(Y1,,Yk1)Yk,(Y_{1},\dotsc,Y_{k-1})\dashrightarrow Y_{k},

by the definition of Y1Y2YkY_{1}\dashrightarrow Y_{2}\dashrightarrow\cdots\dashrightarrow Y_{k}. Suppose 2.10 holds for m=jm=j. We need to show the case with m=j+1m=j+1:

(Y1,,Ykj1)(Ykj,Ykj+1,Yk).(Y_{1},\dotsc,Y_{k-j-1})\dashrightarrow(Y_{k-j},Y_{k-j+1}\dotsc,Y_{k}). (2.11)

Let

A1\displaystyle A_{1} (Y1,,Ykj1),\displaystyle\coloneqq(Y_{1},\dotsc,Y_{k-j-1}),
A2\displaystyle A_{2} Ykj,\displaystyle\coloneqq Y_{k-j},
A3\displaystyle A_{3} (Ykj+1,,Yk).\displaystyle\coloneqq(Y_{k-j+1},\dotsc,Y_{k}).

Then we have A1A2A_{1}\dashrightarrow A_{2} by the definition of Y1Y2Ykj.Y_{1}\dashrightarrow Y_{2}\dashrightarrow\cdots\dashrightarrow Y_{k-j}. We also have (A1,A2)A3(A_{1},A_{2})\dashrightarrow A_{3} by the result for m=jm=j. Then we can follow the same logic of 2.21 to show

A1(A2,A3),A_{1}\dashrightarrow(A_{2},A_{3}),

which is exactly 2.11. The proof is finished by math induction. ∎

Proposition 2.24.

The following two statements are equivalent:

  1. (1)

    X_{1}\dashrightarrow X_{2}\dashrightarrow X_{3}\dashrightarrow X_{4};

  2. (2)

    (X_{1},X_{2})\dashrightarrow(X_{3},X_{4}), X_{1}\dashrightarrow X_{2} and X_{3}\dashrightarrow X_{4}.

Proof.

Since (1) implies (2), so we only need to show the other direction. By the definition of (1), we simply need to check:

(X1,X2,X3)X4.(X_{1},X_{2},X_{3})\dashrightarrow X_{4}.

This is a direct consequence of 2.22 by letting X(X1,X2)X^{*}\coloneqq(X_{1},X_{2}), YX3Y^{*}\coloneqq X_{3} and ZX4Z^{*}\coloneqq X_{4}. ∎

Proposition 2.25.

For any X,Y\in\mathcal{H}, we have \mathcal{E}[X+Y]=\mathcal{E}[X]+\mathcal{E}[Y] as long as at least one of the following conditions holds:

  1. 1.

    \mathcal{E}[X]=-\mathcal{E}[-X];

  2. 2.

    \mathcal{E}[Y]=-\mathcal{E}[-Y];

  3. 3.

    Y is independent from X: X\dashrightarrow Y;

  4. 4.

    X is independent from Y: Y\dashrightarrow X.

Proof.

We only need to prove the result under Conditions 1 and 3; the other two cases follow by the same arguments. Under Condition 1, we have

\mathcal{E}[X+Y]\leq\mathcal{E}[X]+\mathcal{E}[Y]=\mathcal{E}[Y]-\mathcal{E}[-X]\leq\mathcal{E}[Y-(-X)]=\mathcal{E}[X+Y].

Under Condition 3, we have

\mathcal{E}[X+Y]=\mathcal{E}[\mathcal{E}[x+Y]_{x=X}]=\mathcal{E}[X+\mathcal{E}[Y]]=\mathcal{E}[X]+\mathcal{E}[Y]. ∎

2.26 is an important result that shows the asymmetry of independence between two random objects prevails in this framework except when their distributions are maximal or classical ones.

Theorem 2.26 (Hu and Li, (2014)).

For two non-constant random variables X,Y\in\mathcal{H}, if X and Y are mutually independent (X\dashrightarrow Y and Y\dashrightarrow X), then they belong to either of the following two cases:

  1. 1.

    The distributions of X and Y are classical (no distributional uncertainty);

  2. 2.

    Both X and Y are maximally distributed.

We can also easily obtain the following result.

Proposition 2.27.

For two non-constant random variables X,Y\in\mathcal{H}, if they belong to either of the two cases in 2.26, then X\dashrightarrow Y implies Y\dashrightarrow X.

Proof.

When X,YX,Y are classically distributed, the results can be derived from 2.6.3. When they are maximally distributed, this result has been studied in Example 14 in Hu and Li, (2014) and it has been generalized to 3.6 whose proof is in Section 6.1. We sketch the proof here to show the intuition for general readers. Suppose X(K1)X\sim\mathcal{M}(K_{1}) and Y(K2)Y\sim\mathcal{M}(K_{2}) where K1K_{1} and K2K_{2} are two bounded, closed and convex sets. If XYX\dashrightarrow Y, for any φCl.Lip(2)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{2}), we can work on the expectation of (X,Y)(X,Y) to show the other direction of independence,

[φ(X,Y)]\displaystyle\mathcal{E}[\varphi(X,Y)] =[[φ(x,Y)]x=X]=[(maxyK2φ(x,y))x=X]\displaystyle=\mathcal{E}[\mathcal{E}[\varphi(x,Y)]_{x=X}]=\mathcal{E}[(\max_{y\in K_{2}}\varphi(x,y))_{x=X}]
=maxxK1maxyK2φ(x,y)=max(x,y)K1×K2φ(x,y)\displaystyle=\max_{x\in K_{1}}\max_{y\in K_{2}}\varphi(x,y)=\max_{(x,y)\in K_{1}\times K_{2}}\varphi(x,y)
=maxyK2maxxK1φ(x,y)=[[φ(X,y)]y=Y],\displaystyle=\max_{y\in K_{2}}\max_{x\in K_{1}}\varphi(x,y)=\mathcal{E}[\mathcal{E}[\varphi(X,y)]_{y=Y}],

where we have used the fact that φy(x)maxyK2φ(x,y)Cl.Lip()\varphi^{*}_{y}(x)\coloneqq\max_{y\in K_{2}}\varphi(x,y)\in C_{\mathrm{l.Lip}}(\mathbb{R}) if φCl.Lip(2)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{2}) (to apply the representation), which is validated by 6.1 in the proof of 3.6. Hence, we have YXY\dashrightarrow X. ∎

3 Our main results: semi-G-normal and its representations

This section serves two objectives. On the one hand, we introduce a new substructure called the semi-G-normal distribution and explain its hybrid property and its intermediate role between the classical normal and the G-normal distribution. On the other hand, this section is also designed to give general readers a gentle trip towards the G-normal distribution, starting from our old friend, the classical normal distribution.

Most of the theoretical results presented in this section are stated in the sublinear expectation space (Ω,,)(\Omega,\mathcal{H},\mathcal{E}) by default unless indicated otherwise in the context; nonetheless, we will introduce most of the subsections by starting from a discussion on the distributional uncertainty of a random object in a classical state-space volatility model, whose context will be set up in Section 3.1.

Unless otherwise noted, the following notation will be used consistently throughout this paper:

  • +\mathbb{N}_{+}: the set of all positive integers.

  • 𝒙(n)(x1,x2,,xn)\bm{x}_{(n)}\coloneqq(x_{1},x_{2},\dotsc,x_{n}) and 𝒙(n)𝒚(n)(x1y1,x2y2,,xnyn)\bm{x}_{(n)}*\bm{y}_{(n)}\coloneqq(x_{1}y_{1},x_{2}y_{2},\dotsc,x_{n}y_{n}).

  • Let 𝐈d\mathbf{I}_{d} denote a d×dd\times d identity matrix.

  • In (Ω,,)(\Omega,\mathcal{F},\mathbb{P}), let 𝔼\mathbb{E}_{\mathbb{P}} denote the linear expectation with respect to \mathbb{P} and we may write it as 𝔼\mathbb{E} if the underlying \mathbb{P} is clear from the context.

  • Random variables in (Ω,,)(\Omega,\mathcal{H},\mathcal{E}): V[σ¯,σ¯]V\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]}, ϵ𝒩(0,[1,1])\epsilon\sim\mathcal{N}(0,[1,1]), WVϵW\coloneqq V\epsilon, WG𝒩(0,[σ¯2,σ¯2])W^{G}\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]).

  • Random variables in (Ω,,)(\Omega,\mathcal{F},\mathbb{P}): σ:Ω[σ¯,σ¯]\sigma:\Omega\to{[\underline{\sigma},\overline{\sigma}]}, ϵN(0,1)\epsilon\sim N(0,1), YσϵY\coloneqq\sigma\epsilon. Note that we can treat ϵ\epsilon as a random object in both sublinear and classical system due to 2.14.1.

The reason we use two different sets of random variables in two spaces is mainly for simplicity of our discussion, which will be further explained in 3.2.2.

3.1 Setup of a story in a classical filtered probability space

In (Ω,,)(\Omega,\mathcal{F},\mathbb{P}), consider (ϵt)t+(\epsilon_{t})_{t\in\mathbb{N}_{+}} as a sequence of classically i.i.d. random variables satisfying 𝔼[|ϵ1|k]<\mathbb{E}_{\mathbb{P}}[|\epsilon_{1}|^{k}]<\infty for k+k\in\mathbb{N}_{+}. Let (σt)t+(\sigma_{t})_{t\in\mathbb{N}_{+}} be a sequence of bounded random variables which can be treated as states (or volatility regimes) with state space Sσ[σ¯,σ¯]S_{\sigma}\subset{[\underline{\sigma},\overline{\sigma}]}. Let Yt=σtϵt,t+Y_{t}=\sigma_{t}\epsilon_{t},t\in\mathbb{N}_{+} denote the observation sequence. (It seems like a zero-delay setup, but this is not essential in our current scope of discussion.) Consider a representative Y=σϵY=\sigma\epsilon where (σ,ϵ)(σ1,ϵ1)(\sigma,\epsilon)\coloneqq(\sigma_{1},\epsilon_{1}).

For simplicity of discussion, at each time point tt, we assume that ϵt\epsilon_{t} follows N(0,1)N(0,1) and σt\sigma_{t} and ϵt\epsilon_{t} are classically independent, denoted as σtϵt\sigma_{t}\mathrel{\mathchoice{\hbox to0.0pt{$\displaystyle\perp$\hss}{\displaystyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\textstyle\perp$\hss}{\textstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptstyle\perp$\hss}{\scriptstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptscriptstyle\perp$\hss}{\scriptscriptstyle\mkern 2.0mu\perp}}}\epsilon_{t}. Consider the following (discrete-time) filtrations:

𝒢t\displaystyle\mathcal{G}_{t} σ(σs,st)𝒩,\displaystyle\coloneqq\sigma(\sigma_{s},s\leq t)\vee\mathcal{N},
𝒴t\displaystyle\mathcal{Y}_{t} σ(Ys,st)𝒩,\displaystyle\coloneqq\sigma(Y_{s},s\leq t)\vee\mathcal{N},
t\displaystyle\mathcal{F}_{t} σ((σs,Ys),st)𝒩.\displaystyle\coloneqq\sigma((\sigma_{s},Y_{s}),s\leq t)\vee\mathcal{N}.

where 𝒩\mathcal{N} is the collection of \mathbb{P}-null sets used to complete each of the generated σ\sigma-fields mentioned above. Note that t\mathcal{F}_{t} is the same as σ((ϵs,σs),st)𝒩\sigma((\epsilon_{s},\sigma_{s}),s\leq t)\vee\mathcal{N}. Let 𝔽{t}t+\mathbb{F}\coloneqq\{\mathcal{F}_{t}\}_{t\in\mathbb{N}_{+}}. In a classical filtered probability space (Ω,,𝔽,)(\Omega,\mathcal{F},\mathbb{F},\mathbb{P}), we will start the following subsections by putting ourselves, as a group of data analysts, in a context of dealing with uncertainty on the distributions of one state variable σ\sigma, one observation variable Y=σϵY=\sigma\epsilon and a sequence of observation variables (Y1,Y2,,Yn)(Y_{1},Y_{2},\dotsc,Y_{n}) for n+n\in\mathbb{N}_{+}.

3.2 Preparation: properties of maximal distribution

Suppose we have uncertainty on the distribution of the state variable σ\sigma (which is realistic because σ\sigma is not directly observable in practice) due to a lack of prior knowledge. Another possible situation is that different members in our group have different beliefs on the behavior of σ\sigma or different preferences on the choice of the model: the distribution of σ\sigma could be a degenerate, discrete, absolutely continuous or even arbitrary one with support on [σ¯,σ¯]{[\underline{\sigma},\overline{\sigma}]}. In order to quantify this kind of model uncertainty for a given transformation φ\varphi (as a test function), we usually need to involve the maximum expected value of σ\sigma:

supσ𝒜[σ¯,σ¯]𝔼[φ(σ)],\sup_{\sigma\in\mathcal{A}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma)], (3.1)

where 𝒜[σ¯,σ¯]\mathcal{A}{[\underline{\sigma},\overline{\sigma}]} can be chosen depending on the available prior information. Possible choices of 𝒜[σ¯,σ¯]\mathcal{A}{[\underline{\sigma},\overline{\sigma}]} include,

  • 𝒟[σ¯,σ¯]\mathcal{D}{[\underline{\sigma},\overline{\sigma}]}: the space of all classically distributed random variables with support on [σ¯,σ¯]{[\underline{\sigma},\overline{\sigma}]}.

  • 𝒟disc.[σ¯,σ¯]{σ𝒟[σ¯,σ¯]: discretely distributed}\mathcal{D}_{\textbf{disc.}}{[\underline{\sigma},\overline{\sigma}]}\coloneqq\{\sigma\in\mathcal{D}{[\underline{\sigma},\overline{\sigma}]}:\text{ discretely distributed}\}.

  • 𝒟cont.[σ¯,σ¯]{σ𝒟[σ¯,σ¯]: absolutely continuously distributed}\mathcal{D}_{\textbf{cont.}}{[\underline{\sigma},\overline{\sigma}]}\coloneqq\{\sigma\in\mathcal{D}{[\underline{\sigma},\overline{\sigma}]}:\text{ absolutely continuously distributed}\}.

  • 𝒟deg.[σ¯,σ¯]{σ𝒟[σ¯,σ¯]:(σ=v)=1,v[σ¯,σ¯]}\mathcal{D}_{\textbf{deg.}}{[\underline{\sigma},\overline{\sigma}]}\coloneqq\{\sigma\in\mathcal{D}{[\underline{\sigma},\overline{\sigma}]}:\mathbb{P}(\sigma=v)=1,v\in{[\underline{\sigma},\overline{\sigma}]}\}, which is the family of all random variables following degenerate (or Dirac) distribution with mass point at v[σ¯,σ¯]v\in{[\underline{\sigma},\overline{\sigma}]}.

We are going to show that, for any of these choices, 3.1 coincides with the sublinear expectation of a maximal distribution in the GG-framework.

Definition 3.1.

(Univariate Maximal Distribution) In sublinear expectation space (Ω,,)(\Omega,\mathcal{H},\mathcal{E}), a random variable VV follows maximal distribution [σ¯,σ¯]\mathcal{M}{[\underline{\sigma},\overline{\sigma}]} with σ¯σ¯\underline{\sigma}\leq\overline{\sigma} if, for any φCl.Lip(),\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}),

[φ(V)]=maxv[σ¯,σ¯]φ(v).\mathcal{E}[\varphi(V)]=\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(v).
Remark 3.1.1.

We can take the maximum because we are working with a continuous φ\varphi on the compact set [σ¯,σ¯]{[\underline{\sigma},\overline{\sigma}]}.

The reason that we use the notation [σ¯,σ¯]{[\underline{\sigma},\overline{\sigma}]} (which is like an interval for standard deviation) is for the convenience of our later discussion.

Proposition 3.2 (Representations of Univariate Maximal Distribution).

Consider V[σ¯,σ¯]V\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]}. Then for any φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}), we have [|φ(V)|]<\mathcal{E}[\lvert\varphi(V)\rvert]<\infty and

[φ(V)]\displaystyle\mathcal{E}[\varphi(V)] =maxσ𝒟deg.[σ¯,σ¯]𝔼[φ(σ)]\displaystyle=\max_{\sigma\in\mathcal{D}_{\textbf{deg.}}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma)] (3.2)
=maxσ𝒟disc.[σ¯,σ¯]𝔼[φ(σ)]\displaystyle=\max_{\sigma\in\mathcal{D}_{\textbf{disc.}}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma)] (3.3)
=supσ𝒟cont.[σ¯,σ¯]𝔼[φ(σ)]\displaystyle=\sup_{\sigma\in\mathcal{D}_{\textbf{cont.}}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma)] (3.4)
=maxσ𝒟[σ¯,σ¯]𝔼[φ(σ)].\displaystyle=\max_{\sigma\in\mathcal{D}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma)]. (3.5)
Remark 3.2.1.

Note that 𝒟disc.𝒟cont.𝒟\mathcal{D}_{\textbf{disc.}}\cup\mathcal{D}_{\textbf{cont.}}\subset\mathcal{D}. The probability laws associated with 𝒟cont.\mathcal{D}_{\textbf{cont.}} are equivalent, but 𝒟disc.\mathcal{D}_{\textbf{disc.}} and 𝒟\mathcal{D} do not have this property.

Remark 3.2.2.

In 3.2, we write the representation in the form of

[φ(V)]=maxσ𝒜𝔼[φ(σ)],\mathcal{E}[\varphi(V)]=\max_{\sigma\in\mathcal{A}}\mathbb{E}[\varphi(\sigma)], (3.6)

where 𝒜\mathcal{A} denotes a family of random variables in (Ω,,)(\Omega,\mathcal{F},\mathbb{P}). Equivalently, if we treat VV as a random variable on both sides (which requires a more careful preliminary setup that we will not touch at this stage; more details can be found in Chapter 6 of Peng, 2019b ), 3.6 becomes

[φ(V)]=maxV𝒜1𝔼[φ(V)],\mathcal{E}[\varphi(V)]=\max_{\mathbb{P}_{V}\in\mathbb{P}\circ\mathcal{A}^{-1}}\mathbb{E}_{\mathbb{P}}[\varphi(V)], (3.7)

where V\mathbb{P}_{V} is the distribution of VV and 𝒜1{σ1,σ𝒜}\mathbb{P}\circ\mathcal{A}^{-1}\coloneqq\{\mathbb{P}\circ\sigma^{-1},\sigma\in\mathcal{A}\} becomes a family of distributions. Throughout this paper, we prefer to use the form 3.6 for simplicity of notation and to minimize the technical setup, but readers can always informally view 3.6 as an equivalent form of 3.7. In this way, we can better see the distributional uncertainty of VV.

Remark 3.2.3.

Meanwhile, 3.2 provides four ways to represent the distributional uncertainty of VV. In practice, practitioners may choose the representation they need depending on the available prior knowledge or their belief on the random phenomenon.
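
For illustration only (not part of the formal development), the following minimal Python sketch makes 3.2 concrete: the sublinear expectation of the maximal distribution is simply the maximum of φ over the interval, and the classical expectation under any law of σ supported on the interval can attain, but never exceed, that value. The interval endpoints, the test function and the sample sizes below are arbitrary choices of ours.

```python
import numpy as np

sig_lo, sig_hi = 0.5, 1.5                 # the interval playing the role of [sigma_lower, sigma_upper]
phi = lambda v: np.sin(3 * v) + v         # an arbitrary (locally Lipschitz) test function

# Sublinear expectation of the maximal distribution: the maximum of phi over the interval
grid = np.linspace(sig_lo, sig_hi, 10_001)
E_maximal = phi(grid).max()

# Classical expectations E_P[phi(sigma)] for several laws of sigma supported on the interval
rng = np.random.default_rng(0)
sigma_dirac    = np.full(100_000, grid[phi(grid).argmax()])        # degenerate law at the argmax
sigma_discrete = rng.choice([sig_lo, 1.0, sig_hi], size=100_000)   # a discrete law
sigma_uniform  = rng.uniform(sig_lo, sig_hi, size=100_000)         # an absolutely continuous law

for name, s in [("Dirac", sigma_dirac), ("discrete", sigma_discrete), ("uniform", sigma_uniform)]:
    print(f"{name:9s} E_P[phi(sigma)] = {phi(s).mean():.4f}  (<= {E_maximal:.4f})")
```

The degenerate (Dirac) choice attains the maximal-distribution value, matching the fact that the supremum in 3.2 is already achieved over 𝒟deg.\mathcal{D}_{\textbf{deg.}}.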

Definition 3.3.

(Multivariate Maximal Distribution) In sublinear expectation space (Ω,,)(\Omega,\mathcal{H},\mathcal{E}), a random vector 𝑽:Ωd\bm{V}:\Omega\to\mathbb{R}^{d} follows a (multivariate) maximal distribution (𝒱)\mathcal{M}(\mathcal{V}), if there exists a compact and convex subset 𝒱d\mathcal{V}\subset\mathbb{R}^{d} satisfying: for any φCl.Lip(d)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{d}),

[φ(𝑽)]=max𝝈𝒱φ(𝝈).\mathcal{E}[\varphi(\bm{V})]=\max_{\bm{\sigma}\in\mathcal{V}}\varphi(\bm{\sigma}).

One can also easily extend the representation in 3.2 to a multivariate case (3.4) by considering 𝒟(𝒱)\mathcal{D}(\mathcal{V}) which is the space of all classically distributed random variables with support on 𝒱\mathcal{V} and also considering its subspaces 𝒟deg.(𝒱)\mathcal{D}_{\textbf{deg.}}(\mathcal{V}), 𝒟cont.(𝒱)\mathcal{D}_{\textbf{cont.}}(\mathcal{V}) and 𝒟disc.(𝒱)\mathcal{D}_{\textbf{disc.}}(\mathcal{V}) as well.

Proposition 3.4 (Representations of multivariate maximal distribution).

For 𝐕(𝒱)\bm{V}\sim\mathcal{M}(\mathcal{V}), we have for any φCl.Lip(d)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{d}),

[φ(𝑽)]=sup𝝈𝒜(𝒱)𝔼[φ(𝝈)],\mathcal{E}[\varphi(\bm{V})]=\sup_{\bm{\sigma}\in\mathcal{A}(\mathcal{V})}\mathbb{E}[\varphi(\bm{\sigma})], (3.8)

where 𝒜\mathcal{A} can be chosen from {𝒟,𝒟disc.,𝒟cont.,𝒟deg.}\{\mathcal{D},\mathcal{D}_{\textbf{disc.}},\mathcal{D}_{\textbf{cont.}},\mathcal{D}_{\textbf{deg.}}\} and sup\sup can be changed to max\max except when 𝒜=𝒟cont.\mathcal{A}=\mathcal{D}_{\textbf{cont.}}.

Proof.

It can be extended from the proof of 3.2. ∎

Next we provide a property for multivariate maximal distribution under transformations.

Proposition 3.5.

Suppose 𝐕(𝒱)\bm{V}\sim\mathcal{M}(\mathcal{V}). Then for any locally Lipschitz function ψ:(d,)(k,)\psi:(\mathbb{R}^{d},\lVert\cdot\rVert)\to(\mathbb{R}^{k},\lVert\cdot\rVert), we have

𝑺ψ(𝐕)=ψ(V1,V2,,Vd)(𝒮),\bm{S}\coloneqq\psi(\mathbf{V})=\psi(V_{1},V_{2},\dotsc,V_{d})\sim\mathcal{M}(\mathcal{S}),

where 𝒮ψ(𝒱)={ψ(𝛔):𝛔𝒱}\mathcal{S}\coloneqq\psi(\mathcal{V})=\{\psi(\bm{\sigma}):\bm{\sigma}\in\mathcal{V}\}.

Remark 3.5.1.

3.5 is a generalized version of Proposition 25 and Remark 26 in Jin and Peng, (2021). It shows that a transformation of a maximally distributed random vector still follows a maximal distribution, with support equal to the range of the function.

Next we give a connection between univariate and multivariate maximal distribution.

Proposition 3.6.

(The relation between the multivariate and the univariate maximal distribution) Consider a sequence of maximally distributed random variables Vi[σ¯i,σ¯i]V_{i}\sim\mathcal{M}[\underline{\sigma}_{i},\overline{\sigma}_{i}] with σ¯iσ¯i\underline{\sigma}_{i}\leq\overline{\sigma}_{i}, i=1,2,,di=1,2,\dotsc,d. Then the following three statements are equivalent:

  1. (1)

    {Vi}i=1d\{V_{i}\}_{i=1}^{d} are sequentially independent V1V2VdV_{1}\dashrightarrow V_{2}\dashrightarrow\cdots\dashrightarrow V_{d},

  2. (2)

    Vi1Vi2VidV_{i_{1}}\dashrightarrow V_{i_{2}}\dashrightarrow\cdots\dashrightarrow V_{i_{d}} for any permutation (i1,i2,,id)(i_{1},i_{2},\dotsc,i_{d}) of (1,2,,d)(1,2,\dotsc,d),

  3. (3)

    𝑽(V1,V2,,Vd)(i=1d[σ¯i,σ¯i])\bm{V}\coloneqq(V_{1},V_{2},\dotsc,V_{d})\sim\mathcal{M}(\prod_{i=1}^{d}[\underline{\sigma}_{i},\overline{\sigma}_{i}]), where the operation i=1d\prod_{i=1}^{d} is the Cartesian product.

Remark 3.6.1.

3.6 shows that the sequential independence between maximal distributions can be arbitrarily switched without changing the joint distribution, which is a maximal distribution supported on a dd-dimensional rectangle. Conversely, if a random vector follows a maximal distribution supported on such a rectangle, then its components are sequentially independent.

As a special case of 3.6, for two maximally distributed random variables Vi,i=1,2V_{i},i=1,2, V1V2V_{1}\dashrightarrow V_{2} implies that V2V1V_{2}\dashrightarrow V_{1}. In fact, 2.26 given by Hu and Li, (2014) shows that for two non-constant, non-classically distributed random objects, this kind of mutual independence only holds for maximal distributions. The asymmetry of sequential independence prevails among the distributions in the GG-expectation framework.

3.3 Preparation: setup of a product space (a newly added part)

We start from a set 𝒬\mathcal{Q} of probability measures and a single probability measure PP, where PP does not have to be in 𝒬\mathcal{Q}. Let 1[]supQ𝒬𝔼Q[]\mathcal{E}_{1}[\cdot]\coloneqq\sup_{Q\in\mathcal{Q}}\mathbb{E}_{Q}[\cdot] and 2[]𝔼P[]\mathcal{E}_{2}[\cdot]\coloneqq\mathbb{E}_{P}[\cdot]. Then we have the associated sublinear expectation spaces (Ωi,(i),i)(\Omega_{i},\mathcal{H}_{(i)},\mathcal{E}_{i}) with i=1,2i=1,2. Note that 2\mathcal{E}_{2}, as a linear operator, can be treated as a degenerate sublinear expectation. We may simply write the linear expectation 𝔼P\mathbb{E}_{P} as 𝔼\mathbb{E} if the probability measure is clear from the context. Since 2\mathcal{E}_{2} is a linear expectation, the distributions under (Ω2,(2),2)(\Omega_{2},\mathcal{H}_{(2)},\mathcal{E}_{2}) can be treated as classical ones, which we assume include common classical distributions (such as the classical normal). We also assume 𝒬\mathcal{Q} is designed such that the GG-distribution exists in (Ω1,(1),1)(\Omega_{1},\mathcal{H}_{(1)},\mathcal{E}_{1}). Then we can combine these two spaces into a product space (Ω1×Ω2,(1)(2),12)(\Omega_{1}\times\Omega_{2},\mathcal{H}_{(1)}\otimes\mathcal{H}_{(2)},\mathcal{E}_{1}\otimes\mathcal{E}_{2}). It also forms a sublinear expectation space. More details on this notion of product space can be found in Peng, 2019b (Section 1.3).

For readers’ convenience, here we provide a brief description of this product space.

  1. 1.

    The space is (1)(2)\mathcal{H}_{(1)}\otimes\mathcal{H}_{(2)} defined as

    (1)(2)=\displaystyle\mathcal{H}_{(1)}\otimes\mathcal{H}_{(2)}= {X(ω1,ω2)=f(K(ω1),η(ω2)),(ω1,ω2)Ω1×Ω2,\displaystyle\{X(\omega_{1},\omega_{2})=f(K(\omega_{1}),\eta(\omega_{2})),(\omega_{1},\omega_{2})\in\Omega_{1}\times\Omega_{2},
    K(1)m,η(2)n,fCl.Lip(m+n),m,n+}.\displaystyle K\in\mathcal{H}_{(1)}^{m},\eta\in\mathcal{H}_{(2)}^{n},f\in C_{\mathrm{l.Lip}}(\mathbb{R}^{m+n}),m,n\in\mathbb{N}_{+}\}.
  2. 2.

    For X(ω1,ω2)=f(K(ω1),η(ω2))(1)(2)X(\omega_{1},\omega_{2})=f(K(\omega_{1}),\eta(\omega_{2}))\in\mathcal{H}_{(1)}\otimes\mathcal{H}_{(2)}, we have defined

    [X]=12[X]\displaystyle\mathcal{E}[X]=\mathcal{E}_{1}\otimes\mathcal{E}_{2}[X] 1[2[f(k,η)]k=K]\displaystyle\coloneqq\mathcal{E}_{1}[\mathcal{E}_{2}[f(k,\eta)]_{k=K}]
    =supQ𝒬𝔼Q[𝔼P[f(k,η)]k=K]\displaystyle=\sup_{Q\in\mathcal{Q}}\mathbb{E}_{Q}[\mathbb{E}_{P}[f(k,\eta)]_{k=K}]
    =supQ𝒬f(k,y)Pη(dy)QK(dk)\displaystyle=\sup_{Q\in\mathcal{Q}}\int\int f(k,y)P_{\eta}(\mathop{}\!\mathrm{d}y)Q_{K}(\mathop{}\!\mathrm{d}k)
    =sup𝒫𝔼[X],\displaystyle=\sup_{\mathbb{P}\in\mathcal{P}}\mathbb{E}_{\mathbb{P}}[X],

    where 𝒫{QP,Q𝒬}\mathcal{P}\coloneqq\{Q\otimes P,Q\in\mathcal{Q}\} and QPQ\otimes P is the product measure of QQ and PP. Note that 2112\mathcal{E}_{2}\otimes\mathcal{E}_{1}\neq\mathcal{E}_{1}\otimes\mathcal{E}_{2} due to the sublinearity of 1\mathcal{E}_{1}.

Proposition 3.7.

For a random variable KK on (Ω1,(1),1)(\Omega_{1},\mathcal{H}_{(1)},\mathcal{E}_{1}) and η\eta on (Ω2,(2),2)(\Omega_{2},\mathcal{H}_{(2)},\mathcal{E}_{2}), by letting K¯(ω1,ω2)K(ω1)\bar{K}(\omega_{1},\omega_{2})\coloneqq K(\omega_{1}) and η¯(ω1,ω2)η(ω2)\bar{\eta}(\omega_{1},\omega_{2})\coloneqq\eta(\omega_{2}), we have the following results:

  1. 1.

    K¯,η¯(1)(2)\bar{K},\bar{\eta}\in\mathcal{H}_{(1)}\otimes\mathcal{H}_{(2)},

  2. 2.

    X(ω)=f(K¯(ω),η¯(ω)),X(\omega)=f(\bar{K}(\omega),\bar{\eta}(\omega)), for ωΩ1×Ω2\omega\in\Omega_{1}\times\Omega_{2},

  3. 3.

    For any φCb.Lip\varphi\in C_{\mathrm{b.Lip}}, [φ(K¯)]=1[φ(K)],\mathcal{E}[\varphi(\bar{K})]=\mathcal{E}_{1}[\varphi(K)],

  4. 4.

    For any φCb.Lip\varphi\in C_{\mathrm{b.Lip}}, [φ(η¯)]=𝔼P[φ(η)],\mathcal{E}[\varphi(\bar{\eta})]=\mathbb{E}_{P}[\varphi(\eta)],

  5. 5.

    K¯η¯.\bar{K}\dashrightarrow\bar{\eta}.

Proof.

Items 1 and 2 are straightforward. For Item 3, we have

[φ(K¯)]\displaystyle\mathcal{E}[\varphi(\bar{K})] =1[2[φ(f1(k,η))]k=K]\displaystyle=\mathcal{E}_{1}[\mathcal{E}_{2}[\varphi(f_{1}(k,\eta))]_{k=K}]
=1[2[φ(k)]k=K]=1[φ(K)].\displaystyle=\mathcal{E}_{1}[\mathcal{E}_{2}[\varphi(k)]_{k=K}]=\mathcal{E}_{1}[\varphi(K)].

Similarly, we can show Item 4. For Item 5, our goal is to show

[φ(K¯,η¯)]=[[φ(k,η¯)]k=K¯].\mathcal{E}[\varphi(\bar{K},\bar{\eta})]=\mathcal{E}[\mathcal{E}[\varphi(k,\bar{\eta})]_{k=\bar{K}}].

We can see the equation above holds from the following step: with H(k)𝔼P[φ(k,η¯)]H(k)\coloneqq\mathbb{E}_{P}[\varphi(k,\bar{\eta})],

RHS =[𝔼P[φ(k,η¯)]k=K¯]\displaystyle=\mathcal{E}[\mathbb{E}_{P}[\varphi(k,\bar{\eta})]_{k=\bar{K}}]
=[H(K¯)]=1[H(K)]\displaystyle=\mathcal{E}[H(\bar{K})]=\mathcal{E}_{1}[H(K)]
=1[𝔼P[φ(k,η¯)]k=K]\displaystyle=\mathcal{E}_{1}[\mathbb{E}_{P}[\varphi(k,\bar{\eta})]_{k=K}]
=[φ(K,η)]=[φ(K¯,η¯)].\displaystyle=\mathcal{E}[\varphi(K,\eta)]=\mathcal{E}[\varphi(\bar{K},\bar{\eta})].\qed
Remark 3.7.1.

In the following context, without further notice, we will not distinguish K¯\bar{K} (or η¯\bar{\eta}) from KK (or η\eta). Moreover, we can see that, by letting η(ω1,ω2)η(ω2)\eta(\omega_{1},\omega_{2})\coloneqq\eta(\omega_{2}) so that η(1)(2)\eta\in\mathcal{H}_{(1)}\otimes\mathcal{H}_{(2)}, from Item 4, we have

[φ(η)]=sup𝒫𝔼[φ(η)]=𝔼P[φ(η)],\mathcal{E}[\varphi(\eta)]=\sup_{\mathbb{P}\in\mathcal{P}}\mathbb{E}_{\mathbb{P}}[\varphi(\eta)]=\mathbb{E}_{P}[\varphi(\eta)],

where \mathbb{P} is any product measure QPQ\otimes P with Q𝒬Q\in\mathcal{Q}. By replacing φ\varphi with φ-\varphi, we can show that for any 𝒫\mathbb{P}\in\mathcal{P},

𝔼P[φ(η)]=inf𝒫𝔼[φ(η)]𝔼[φ(η)]sup𝒫𝔼[φ(η)]=𝔼P[φ(η)],\mathbb{E}_{P}[\varphi(\eta)]=\inf_{\mathbb{P}\in\mathcal{P}}\mathbb{E}_{\mathbb{P}}[\varphi(\eta)]\leq\mathbb{E}_{\mathbb{P}}[\varphi(\eta)]\leq\sup_{\mathbb{P}\in\mathcal{P}}\mathbb{E}_{\mathbb{P}}[\varphi(\eta)]=\mathbb{E}_{P}[\varphi(\eta)],

or simply 𝔼[φ(η)]=𝔼P[φ(η)].\mathbb{E}_{\mathbb{P}}[\varphi(\eta)]=\mathbb{E}_{P}[\varphi(\eta)]. It means that the probability law of η\eta is always PηP_{\eta} under each product measure 𝒫\mathbb{P}\in\mathcal{P}.

Let ¯s\bar{\mathcal{H}}_{s} denote a subspace of the product space mentioned above:

¯s{X(1)(2):\displaystyle\bar{\mathcal{H}}_{s}\coloneqq\{X\in\mathcal{H}_{(1)}\otimes\mathcal{H}_{(2)}: X(ω1,ω2)=f(K(ω1),η(ω2)),K(1)m(Θ),\displaystyle X(\omega_{1},\omega_{2})=f(K(\omega_{1}),\eta(\omega_{2})),K\in\mathcal{H}_{(1)}^{m}\sim\mathcal{M}(\Theta),
fCl.Lip(m+n),Θm,m,n+}.\displaystyle f\in C_{\mathrm{l.Lip}}(\mathbb{R}^{m+n}),\Theta\subset\mathbb{R}^{m},m,n\in\mathbb{N}_{+}\}.

For any X¯sX\in\bar{\mathcal{H}}_{s}, by the representation of maximal distribution, we have

[X]=supθΘ𝔼P[f(θ,η)].\mathcal{E}[X]=\sup_{\theta\in\Theta}\mathbb{E}_{P}[f(\theta,\eta)].

3.4 Univariate semi-GG-normal distribution

Recall the story setup in Section 3.1. Note that Y=σϵY=\sigma\epsilon can be treated as a normal mixture with scaling latent variable σ:Ω[σ¯,σ¯]\sigma:\Omega\to{[\underline{\sigma},\overline{\sigma}]}. For simplicity of discussion, we have assumed σϵ\sigma\mathrel{\mathchoice{\hbox to0.0pt{$\displaystyle\perp$\hss}{\displaystyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\textstyle\perp$\hss}{\textstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptstyle\perp$\hss}{\scriptstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptscriptstyle\perp$\hss}{\scriptscriptstyle\mkern 2.0mu\perp}}}\epsilon and ϵN(0,1)\epsilon\sim N(0,1). Suppose we are further faced with the uncertainty on the distribution of Y=σϵY=\sigma\epsilon due to uncertain σ\sigma part. Then the maximum expected value under this distributional uncertainty is

supσ𝒜[σ¯,σ¯]𝔼[φ(σϵ)],\sup_{\sigma\in\mathcal{A}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)], (3.9)

where the choice of 𝒜[σ¯,σ¯]\mathcal{A}{[\underline{\sigma},\overline{\sigma}]} is the same as in Section 3.2. It turns out that, for any of these choices, 3.9 can be expressed as the sublinear expectation of a semi-GG-normal W𝒩^(0,[σ¯2,σ¯2])W\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) (3.8).

To begin with, note that 𝒩(0,[1,1])\mathcal{N}(0,[1,1]) can be treated as the same as the classical distribution N(0,1)N(0,1) due to 2.14.1. Therefore, we can also say ϵ𝒩(0,[1,1])\epsilon\sim\mathcal{N}(0,[1,1]) in the sublinear expectation space. In the following context, we will not distinguish between N(0,1)N(0,1) and 𝒩(0,[1,1])\mathcal{N}(0,[1,1]). Similarly, a standard multivariate normal N(𝟎,𝐈d)N(\bm{0},\mathbf{I}_{d}) can be treated as both a classical distribution and also a degenerate version of a multivariate GG-normal.

Definition 3.8 (Univariate semi-GG-normal distribution).

For any W¯sW\in\bar{\mathcal{H}}_{s}, we say that WW follows a semi-GG-normal distribution 𝒩^(0,[σ¯2,σ¯2])\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) if there exist V¯s[σ¯,σ¯]V\in\bar{\mathcal{H}}_{s}\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]} and ϵ¯sN(0,1)\epsilon\in\bar{\mathcal{H}}_{s}\sim N(0,1) with VϵV\dashrightarrow\epsilon, such that

W=Vϵ,W=V\epsilon, (3.10)

where the direction of independence cannot be reversed. It is denoted as W𝒩^(0,[σ¯2,σ¯2])W\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]).

Remark 3.8.1.

(Existence of Semi-GG-normal distribution) Since there exist V(1)[σ¯,σ¯]V^{\prime}\in\mathcal{H}_{(1)}\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]} and ϵ(2)N(0,1)\epsilon^{\prime}\in\mathcal{H}_{(2)}\sim N(0,1), let V(ω1,ω2)V(ω1)V(\omega_{1},\omega_{2})\coloneqq V^{\prime}(\omega_{1}) and ϵ(ω1,ω2)ϵ(ω2)\epsilon(\omega_{1},\omega_{2})\coloneqq\epsilon^{\prime}(\omega_{2}). Consider f(x,y)=xyf(x,y)=xy, then 3.7 ensures that W=f(V,ϵ)=f(V,ϵ)¯sW=f(V,\epsilon)=f(V^{\prime},\epsilon^{\prime})\in\bar{\mathcal{H}}_{s} satisfies the properties required by 3.8.

Remark 3.8.2.

(Why we cannot reverse the direction of independence) There are two reasons:

  1. 1.

    The sublinear expectation will essentially change if we do so: the resulting distribution will be different. For instance, if we assume ϵV\epsilon\dashrightarrow V and let W~ϵV\tilde{W}\coloneqq\epsilon V, we have

    [W~]\displaystyle\mathcal{E}[\tilde{W}] =[[xV]x=ϵ]=𝔼[ϵ+σ¯ϵσ¯]\displaystyle=\mathcal{E}[\mathcal{E}[xV]_{x=\epsilon}]=\mathbb{E}[\epsilon^{+}\overline{\sigma}-\epsilon^{-}\underline{\sigma}]
    =𝔼[σ¯ϵ+(σ¯σ¯)ϵ]=(σ¯σ¯)𝔼[ϵ]\displaystyle=\mathbb{E}[\overline{\sigma}\epsilon+(\overline{\sigma}-\underline{\sigma})\epsilon^{-}]=(\overline{\sigma}-\underline{\sigma})\mathbb{E}[\epsilon^{-}]
    =12(σ¯σ¯)𝔼[|ϵ|]>0,\displaystyle=\frac{1}{2}(\overline{\sigma}-\underline{\sigma})\mathbb{E}[\lvert\epsilon\rvert]>0,

    and similarly,

    [W~]=12(σ¯σ¯)𝔼[|ϵ|]<0.-\mathcal{E}[-\tilde{W}]=-\frac{1}{2}(\overline{\sigma}-\underline{\sigma})\mathbb{E}[\lvert\epsilon\rvert]<0.

    where ϵ+=max{ϵ,0}\epsilon^{+}=\max\{\epsilon,0\} and ϵ=max{ϵ,0}\epsilon^{-}=\max\{-\epsilon,0\}. We can see that W~\tilde{W} and WW already exhibit their difference in the first moment: WW has certain mean zero but W~\tilde{W} has mean-uncertainty (a short numerical check is given right after this remark).

  2. 2.

    We can never have mutual independence in this case, because VV is maximally distributed and ϵ\epsilon is classically distributed, so the pair does not belong to either of the cases in 2.26 proved by Hu and Li, (2014).
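
As a quick sanity check of the mean-uncertainty computation in the first point above, the following small Monte Carlo sketch (our own illustration; the interval endpoints are arbitrary) estimates the upper and lower means of the reversed-order product and compares them with the closed-form value, namely half of (sig_hi − sig_lo) times the first absolute moment of a standard normal.

```python
import numpy as np

sig_lo, sig_hi = 0.5, 1.5
rng = np.random.default_rng(1)
eps = rng.standard_normal(1_000_000)

# Upper mean of W~ = eps * V when eps -> V:  E_P[eps^+ * sig_hi - eps^- * sig_lo]
upper_mean = np.mean(np.maximum(eps, 0) * sig_hi - np.maximum(-eps, 0) * sig_lo)
# Lower mean -E[-W~] = E_P[eps^+ * sig_lo - eps^- * sig_hi]
lower_mean = np.mean(np.maximum(eps, 0) * sig_lo - np.maximum(-eps, 0) * sig_hi)

closed_form = (sig_hi - sig_lo) / np.sqrt(2 * np.pi)   # = (sig_hi - sig_lo) * E|eps| / 2
print(upper_mean, "~", closed_form)                    # positive upper mean
print(lower_mean, "~", -closed_form)                   # negative lower mean
```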

As we further proceed in this paper, we will see that the properties of WW are closely related to the random vector (V,ϵ)(V,\epsilon) in its decomposition 3.10 (such as the results in Section 3.6). The following 3.9 guarantees the uniqueness of such a decomposition.

Proposition 3.9 (The uniqueness of decomposition).

In 3.8, if there exist two pairs (V1,ϵ1)(V_{1},\epsilon_{1}) and (V2,ϵ2)(V_{2},\epsilon_{2}) satisfying the required properties and

W=V1ϵ1=V2ϵ2,W=V_{1}\epsilon_{1}=V_{2}\epsilon_{2},

we must have V1=V2V_{1}=V_{2} and ϵ1=ϵ2\epsilon_{1}=\epsilon_{2}.

Theorem 3.10 (Representations of univariate semi-GG-normal).

Consider two classically distributed random variables σ:Ω[σ¯,σ¯]\sigma:\Omega\to{[\underline{\sigma},\overline{\sigma}]} and ϵN(0,1)\epsilon\sim N(0,1) satisfying σϵ\sigma\mathrel{\mathchoice{\hbox to0.0pt{$\displaystyle\perp$\hss}{\displaystyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\textstyle\perp$\hss}{\textstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptstyle\perp$\hss}{\scriptstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptscriptstyle\perp$\hss}{\scriptscriptstyle\mkern 2.0mu\perp}}}\epsilon. For any φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}), we have [|φ(W)|]<\mathcal{E}[\lvert\varphi(W)\rvert]<\infty and

[φ(W)]\displaystyle\mathcal{E}[\varphi(W)] =maxσ𝒟deg.[σ¯,σ¯]𝔼[φ(σϵ)]=maxσ[σ¯,σ¯]𝔼[φ(σϵ)]\displaystyle=\max_{\sigma\in\mathcal{D}_{\textbf{deg.}}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)]=\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)] (3.11)
=maxσ𝒟disc.[σ¯,σ¯]𝔼[φ(σϵ)]\displaystyle=\max_{\sigma\in\mathcal{D}_{\textbf{disc.}}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)] (3.12)
=supσ𝒟cont.[σ¯,σ¯]𝔼[φ(σϵ)]\displaystyle=\sup_{\sigma\in\mathcal{D}_{\textbf{cont.}}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)] (3.13)
=maxσ𝒟[σ¯,σ¯]𝔼[φ(σϵ)],\displaystyle=\max_{\sigma\in\mathcal{D}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)], (3.14)

where {𝒟,𝒟disc.,𝒟cont.,𝒟deg.}\{\mathcal{D},\mathcal{D}_{\textbf{disc.}},\mathcal{D}_{\textbf{cont.}},\mathcal{D}_{\textbf{deg.}}\} are the same as the ones in 3.2.

The proof of 3.10 is closely related to the representation of maximal distribution. First we need to prepare the following lemma.

Lemma 3.11.

For any fixed vv\in\mathbb{R}, let φϵ(v)[φ(vϵ)]\varphi_{\epsilon}(v)\coloneqq\mathcal{E}[\varphi(v\epsilon)] with ϵ𝒩(0,[1,1])\epsilon\sim\mathcal{N}(0,[1,1]). Then we have φϵCl.Lip()\varphi_{\epsilon}\in C_{\mathrm{l.Lip}}(\mathbb{R}).

Proof of 3.11.

Note that ϵ=d𝒩(0,[1,1])=dN(0,1)\epsilon\overset{\text{d}}{=}\mathcal{N}(0,[1,1])\overset{\text{d}}{=}N(0,1) as mentioned in 2.14.1. Then φϵ(v)[φ(vϵ)]=𝔼[φ(vϵ)]\varphi_{\epsilon}(v)\coloneqq\mathcal{E}[\varphi(v\epsilon)]=\mathbb{E}[\varphi(v\epsilon)]. Next we can show φϵCl.Lip()\varphi_{\epsilon}\in C_{\mathrm{l.Lip}}(\mathbb{R}) by definition:

|φϵ(x)φϵ(y)|\displaystyle|\varphi_{\epsilon}(x)-\varphi_{\epsilon}(y)| =|𝔼[φ(xϵ)φ(yϵ)]|\displaystyle=|\mathbb{E}_{\mathbb{P}}[\varphi(x\epsilon)-\varphi(y\epsilon)]|
𝔼[Cφ(1+|xϵ|k+|yϵ|k)|ϵ||xy|]\displaystyle\leq\mathbb{E}_{\mathbb{P}}[C_{\varphi}(1+\lvert x\epsilon\rvert^{k}+\lvert y\epsilon\rvert^{k})\lvert\epsilon\rvert\cdot\lvert x-y\rvert]
=Cφ(𝔼[|ϵ|]+𝔼[|ϵ|k+1]|x|k+𝔼[|ϵ|k+1]|y|k)|xy|\displaystyle=C_{\varphi}(\mathbb{E}_{\mathbb{P}}[\lvert\epsilon\rvert]+\mathbb{E}_{\mathbb{P}}[\lvert\epsilon\rvert^{k+1}]\lvert x\rvert^{k}+\mathbb{E}_{\mathbb{P}}[\lvert\epsilon\rvert^{k+1}]\lvert y\rvert^{k})\lvert x-y\rvert
C(1+|x|k+|y|k)|xy|,\displaystyle\leq C(1+\lvert x\rvert^{k}+\lvert y\rvert^{k})\lvert x-y\rvert,

where C=Cφmax{𝔼[|ϵ|],𝔼[|ϵ|k+1]}C=C_{\varphi}\max\{\mathbb{E}_{\mathbb{P}}[\lvert\epsilon\rvert],\mathbb{E}_{\mathbb{P}}[\lvert\epsilon\rvert^{k+1}]\}. ∎

Proof of 3.10.

Under the sequential independence VϵV\dashrightarrow\epsilon, for any φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}), we have

[φ(W)]=[φ(Vϵ)]=[[φ(vϵ)]v=V]=[φϵ(V)].\mathcal{E}[\varphi(W)]=\mathcal{E}[\varphi(V\epsilon)]=\mathcal{E}[\mathcal{E}[\varphi(v\epsilon)]_{v=V}]=\mathcal{E}[\varphi_{\epsilon}(V)].

First we have φϵCl.Lip()\varphi_{\epsilon}\in C_{\mathrm{l.Lip}}(\mathbb{R}) by 3.11. Then we can use 3.1 to show the finiteness of [|φ(W)|]\mathcal{E}[\lvert\varphi(W)\rvert] due to the continuity of φϵ\varphi_{\epsilon}: [|φ(W)|]=maxv[σ¯,σ¯]|φϵ(v)|<.\mathcal{E}[\lvert\varphi(W)\rvert]=\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}\lvert\varphi_{\epsilon}(v)\rvert<\infty. Next we check each representation in 3.10 by applying the associated representation of maximal distribution in 3.2. For instance, we can show 3.13 based on 3.4:

[φ(W)]\displaystyle\mathcal{E}[\varphi(W)] =[φϵ(V)]=supσ𝒟cont.[σ¯,σ¯]𝔼[φϵ(σ)]\displaystyle=\mathcal{E}[\varphi_{\epsilon}(V)]=\sup_{\sigma\in\mathcal{D}_{\textbf{cont.}}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi_{\epsilon}(\sigma)]
=supσ𝒟cont.[σ¯,σ¯]𝔼[𝔼[φ(vϵ)]v=σ]=supσ𝒟cont.[σ¯,σ¯]𝔼[φ(σϵ)],\displaystyle=\sup_{\sigma\in\mathcal{D}_{\textbf{cont.}}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\mathbb{E}[\varphi(v\epsilon)]_{v=\sigma}]=\sup_{\sigma\in\mathcal{D}_{\textbf{cont.}}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)],

where we use the fact that σϵ\sigma\mathrel{\mathchoice{\hbox to0.0pt{$\displaystyle\perp$\hss}{\displaystyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\textstyle\perp$\hss}{\textstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptstyle\perp$\hss}{\scriptstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptscriptstyle\perp$\hss}{\scriptscriptstyle\mkern 2.0mu\perp}}}\epsilon and 2.6.3. ∎

Remark 3.11.1.

3.10 means that there are several ways to interpret the distributional uncertainty of semi-GG-normal:

  • 3.11 shows it can be described as a collection of N(0,σ2)N(0,\sigma^{2}) with σ[σ¯,σ¯]\sigma\in{[\underline{\sigma},\overline{\sigma}]} (which gives a direct way to compute this sublinear expectation);

  • 3.13, 3.12 and 3.14 show it can be described as a collection of classical normal mixture distribution with (discretely, absolutely continuously or arbitrarily) distributed scale parameter ranging in [σ¯,σ¯]{[\underline{\sigma},\overline{\sigma}]}.

Remark 3.11.2.

Let FσF_{\sigma} denote the cumulative distribution function of σ\sigma under \mathbb{P} and F𝒜[σ¯,σ¯]F_{\mathcal{A}{[\underline{\sigma},\overline{\sigma}]}} represent the family of FσF_{\sigma} with σ𝒜[σ¯,σ¯]\sigma\in\mathcal{A}{[\underline{\sigma},\overline{\sigma}]}. Then we can apply the classical Fubini theorem in the evaluation of 𝔼[φ(σϵ)]\mathbb{E}[\varphi(\sigma\epsilon)] in 3.10 to get a more explicit form of representation:

[φ(W)]=supFσF𝒜[σ¯,σ¯]σ¯σ¯𝔼[φ(vϵ)]Fσ(dv),\mathcal{E}[\varphi(W)]=\sup_{F_{\sigma}\in F_{\mathcal{A}{[\underline{\sigma},\overline{\sigma}]}}}\int_{\underline{\sigma}}^{\overline{\sigma}}\mathbb{E}[\varphi(v\epsilon)]F_{\sigma}(\mathop{}\!\mathrm{d}v),

where 𝒜\mathcal{A} can be chosen from {𝒟,𝒟disc.,𝒟cont.,𝒟deg.}\{\mathcal{D},\mathcal{D}_{\textbf{disc.}},\mathcal{D}_{\textbf{cont.}},\mathcal{D}_{\textbf{deg.}}\}.
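
To make the representations in 3.10 concrete, here is a minimal numerical sketch (our own illustration, with arbitrary interval endpoints, test function and quadrature order): the sublinear expectation of the semi-GG-normal is obtained by maximizing the classical normal expectation over constant volatilities in the interval, while a classical normal-mixture law for σ\sigma never exceeds that value.

```python
import numpy as np

sig_lo, sig_hi = 0.5, 1.5
phi = lambda x: np.cos(2 * x) + 0.5 * x ** 2     # an arbitrary test function

# E_P[phi(v * eps)] for eps ~ N(0, 1), via probabilists' Gauss-Hermite quadrature
nodes, weights = np.polynomial.hermite_e.hermegauss(80)
normal_expect = lambda v: np.sum(weights * phi(v * nodes)) / np.sqrt(2 * np.pi)

# Representation 3.11: E[phi(W)] = max over constant volatilities v in [sig_lo, sig_hi]
v_grid = np.linspace(sig_lo, sig_hi, 501)
E_semi = max(normal_expect(v) for v in v_grid)

# Representations 3.12-3.14: a normal mixture with sigma supported on the interval
# (here sigma uniform, as one absolutely continuous choice) stays below that value.
rng = np.random.default_rng(2)
sigma = rng.uniform(sig_lo, sig_hi, size=200_000)
eps = rng.standard_normal(200_000)
print(phi(sigma * eps).mean(), "<=", E_semi)
```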

Remark 3.11.3 (Why is it called a “semi” one?).

The essential reason is that the uncertainty set of distributions associated with the semi-GG-normal is smaller than the one of GG-normal. Let WG𝒩(0,[σ¯2,σ¯2])W^{G}\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) and W𝒩^(0,[σ¯2,σ¯2])W\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]). In fact, we have the following existing result: for any φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R})

[φ(WG)]maxv[σ¯,σ¯]𝔼[φ(vϵ)]=[φ(W)],\mathcal{E}[\varphi(W^{G})]\geq\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(v\epsilon)]=\mathcal{E}[\varphi(W)], (3.15)

which can be proved by applying the comparison theorem of parabolic partial differential equations (in Crandall et al., (1992)) to the associated GG-heat and classical heat equations with initial condition φ\varphi (the inequality becomes strict when φ\varphi is neither convex nor concave). For readers’ convenience, the result 3.15 is included in Section 2.5 in Peng, 2019b . Meanwhile, we have the representation of \mathcal{E} from a set 𝒫\mathcal{P} of probability measures,

[φ(X)]=sup𝒫𝔼[φ(X)]=supX𝒫X𝔼[φ(X)],\mathcal{E}[\varphi(X)]=\sup_{\mathbb{Q}\in\mathcal{P}}\mathbb{E}_{\mathbb{Q}}[\varphi(X)]=\sup_{\mathbb{Q}_{X}\in\mathcal{P}_{X}}\mathbb{E}_{\mathbb{Q}}[\varphi(X)],

where 𝒫X{X1,𝒫}\mathcal{P}_{X}\coloneqq\{\mathbb{Q}\circ X^{-1},\mathbb{Q}\in\mathcal{P}\} characterizes the distributional uncertainty of XX. Hence, 3.15 tells us 𝒫W𝒫WG\mathcal{P}_{W}\subset\mathcal{P}_{W^{G}}. A more explicit discussion of this distinction will be provided in 3.24.6.
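
The inequality 3.15 can also be checked numerically. The sketch below (our own illustration, not taken from the cited references; the domain truncation, grid sizes and test function are arbitrary) solves the GG-heat equation u_t = G(u_xx) with G(a) = (sig_hi^2 * a^+ − sig_lo^2 * a^−)/2 and initial condition φ\varphi by an explicit finite-difference scheme, so that the GG-normal value is approximately u(1, 0), and compares it with the semi-GG-normal value obtained from 3.10. For the chosen φ\varphi, which is neither convex nor concave, the GG-normal value is strictly larger.

```python
import numpy as np

sig_lo, sig_hi = 0.5, 1.5
phi = lambda x: np.cos(2 * x) + 0.5 * x ** 2   # a test function that is neither convex nor concave

# Semi-G-normal value (Theorem 3.10): max over constant volatilities of E_P[phi(v * eps)]
nodes, weights = np.polynomial.hermite_e.hermegauss(80)
normal_expect = lambda v: np.sum(weights * phi(v * nodes)) / np.sqrt(2 * np.pi)
E_semi = max(normal_expect(v) for v in np.linspace(sig_lo, sig_hi, 501))

# G-normal value: explicit finite differences for u_t = G(u_xx), u(0, .) = phi, where
# G(a) = 0.5 * (sig_hi**2 * a^+ - sig_lo**2 * a^-); then E[phi(W^G)] is approximately u(1, 0).
L, nx = 10.0, 1001                    # truncated spatial domain [-L, L]
x = np.linspace(-L, L, nx)
dx = x[1] - x[0]
dt = 0.4 * dx ** 2 / sig_hi ** 2      # well inside the explicit-scheme stability limit
nt = int(np.ceil(1.0 / dt))
dt = 1.0 / nt                         # land exactly on t = 1

u = phi(x)
for _ in range(nt):
    uxx = np.zeros_like(u)
    uxx[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx ** 2
    G = 0.5 * (sig_hi ** 2 * np.maximum(uxx, 0) - sig_lo ** 2 * np.maximum(-uxx, 0))
    u = u + dt * G                    # boundary values stay frozen at phi (crude truncation)

E_G = u[nx // 2]                      # approximate value of u at (t, x) = (1, 0)
print(f"G-normal: {E_G:.4f}   semi-G-normal: {E_semi:.4f}")   # 3.15 predicts the first >= the second
```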

Remark 3.11.4 (The distribution of ϵ\epsilon).

In principle, the distribution of ϵ\epsilon can be changed to any other type of classical distribution with a finite moment generating function, and all the related results (such as the representations) will still hold. We choose the standard normal because we are working on an intermediate structure between the normal and the GG-normal. Another reason comes from the following 3.12.

Proposition 3.12 (A special connection between semi-GG-normal and GG-normal distribution).

Let WG𝒩(0,[σ¯2,σ¯2])W^{G}\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) and W𝒩^(0,[σ¯2,σ¯2])W\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]). For φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}), when φ\varphi is convex or concave, we have

[φ(WG)]=[φ(W)]={𝔼[φ(N(0,σ¯2))]φ is convex𝔼[φ(N(0,σ¯2))]φ is concave..\mathcal{E}[\varphi(W^{G})]=\mathcal{E}[\varphi(W)]=\begin{cases}\mathbb{E}_{\mathbb{P}}[\varphi(N(0,\overline{\sigma}^{2}))]&\varphi\text{ is convex}\\ \mathbb{E}_{\mathbb{P}}[\varphi(N(0,\underline{\sigma}^{2}))]&\varphi\text{ is concave}.\end{cases}.

3.5 Multivariate semi-GG-normal distribution

The definition of semi-GG-normal distribution can be naturally extended to multi-dimensional situation. Intuitively speaking, the multivariate semi-GG-normal distribution can be treated as an analogue of the classical multivariate normal distribution which can be written as:

N(𝟎,𝚺)=𝚺1/2N(𝟎,𝐈d),N(\bm{0},\mathbf{\Sigma})=\mathbf{\Sigma}^{1/2}N(\bm{0},\mathbf{I}_{d}), (3.16)

where 𝐈d\mathbf{I}_{d} is a d×dd\times d identity matrix and 𝚺\mathbf{\Sigma} is the covariance matrix.

Let 𝕊d+\mathbb{S}_{d}^{+} denote the family of real-valued symmetric positive semi-definite d×dd\times d matrices. Consider a bounded, closed and convex subset 𝒞𝕊d+\mathcal{C}\subset\mathbb{S}^{+}_{d}. For any element 𝚺𝒞\mathbf{\Sigma}\in\mathcal{C}, it has a non-negative symmetric square root denoted as 𝚺1/2\mathbf{\Sigma}^{1/2}. Let 𝒱𝒞1/2\mathcal{V}\coloneqq\mathcal{C}^{1/2} which is the set of 𝚺1/2\mathbf{\Sigma}^{1/2} with 𝚺𝒞\mathbf{\Sigma}\in\mathcal{C}. Then we can treat 𝚺\mathbf{\Sigma} as the covariance matrix of a classical multivariate normal distribution due to 3.16 and 𝒞\mathcal{C} as a collection of covariance matrices. Note that 𝒱\mathcal{V} is still a bounded, closed and convex set. Then a matrix-valued maximal distribution (𝒱)\mathcal{M}(\mathcal{V}) can be directly extended from 3.3.

Definition 3.13 (Multivariate Semi-GG-normal distribution).

Let a bounded, closed and convex subset 𝒞𝕊d+\mathcal{C}\subset\mathbb{S}^{+}_{d} be the uncertainty set of covariance matrices and 𝒱𝒞1/2\mathcal{V}\coloneqq\mathcal{C}^{1/2}. In a sublinear expectation space, a dd-dimensional random vector 𝑾\bm{W} follows a (multivariate) semi-GG-normal distribution, denoted by 𝑾𝒩^(𝟎,𝒞)\bm{W}\sim\hat{\mathcal{N}}(\bm{0},\mathcal{C}), if there exists a (degenerate) GG-normal distributed dd-dimensional random vector

ϵN(𝟎,𝐈d):Ωd,\bm{\epsilon}\sim N(\bm{0},\mathbf{I}_{d}):\Omega\rightarrow\mathbb{R}^{d},

and a d×dd\times d-dimensional maximally distributed random matrix

𝐕(𝒱):Ωd×d,\mathbf{V}\sim\mathcal{M}(\mathcal{V}):\Omega\rightarrow\mathbb{R}^{d\times d},

with ϵ\bm{\epsilon} independent from 𝐕\mathbf{V} (expressed as 𝐕ϵ\mathbf{V}\dashrightarrow\bm{\epsilon}), such that

𝑾=𝐕ϵ,\bm{W}=\mathbf{V}\bm{\epsilon},

where the direction of independence here cannot be reversed.

Remark 3.13.1.

The existence of multivariate semi-GG-normal distribution comes from the same logic as 3.8.1 (by using the existence of the GG-distribution in a multivariate setup).

Remark 3.13.2.

Note that 𝐕\mathbf{V} in 3.13 is a random matrix. The relation 𝐕ϵ\mathbf{V}\dashrightarrow\bm{\epsilon} is defined by a multivariate version of 2.6.

Similar to the discussions in Section 3.2, we can extend the notion of the semi-GG-normal distribution and its representation to the multivariate situation.

Theorem 3.14.

(Representation of multivariate semi-GG-normal distribution) Consider the random vector 𝐖\bm{W} in 3.13. For any φCl.Lip(d)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{d}), we have [|φ(𝐖)|]<\mathcal{E}[\lvert\varphi(\bm{W})\rvert]<\infty and

[φ(𝑾)]=sup𝚺1/2𝒜(𝒱)𝔼[φ(𝚺1/2ϵ)],\mathcal{E}[\varphi(\bm{W})]=\sup_{\mathbf{\Sigma}^{1/2}\in\mathcal{A}(\mathcal{V})}\mathbb{E}[\varphi(\mathbf{\Sigma}^{1/2}\bm{\epsilon})], (3.17)

where 𝒜\mathcal{A} can be chosen from {𝒟,𝒟disc.,𝒟cont.,𝒟deg.}\{\mathcal{D},\mathcal{D}_{\textbf{disc.}},\mathcal{D}_{\textbf{cont.}},\mathcal{D}_{\textbf{deg.}}\} and sup\sup can be changed to max\max except when 𝒜=𝒟cont.\mathcal{A}=\mathcal{D}_{\textbf{cont.}}.

Proof of 3.14.

The logic of this proof is exactly the same as that of 3.10, where we apply the representation of the maximal distribution (𝒱)\mathcal{M}(\mathcal{V}), which can easily be checked to have the same form as 3.4. ∎

Remark 3.14.1.

3.14 means that there are several ways to interpret the distributional uncertainty of multivariate semi-GG-normal 𝒩^(𝟎,𝒞)\hat{\mathcal{N}}(\bm{0},\mathcal{C}):

  • it can be described as a collection of N(0,𝚺)N(0,\mathbf{\Sigma}) with constant covariance matrix 𝚺𝒞\mathbf{\Sigma}\in\mathcal{C};

  • it can be described as a collection of classical multivariate normal mixture distributions with (discretely, absolutely continuously, arbitrarily) distributed random covariance matrices (as a latent scaling variable) ranging in 𝒞\mathcal{C}.

By using 3.14, we can conveniently study the covariance uncertainty between the marginals of 𝑾\bm{W}. First, we can define the upper and lower covariances between the marginals of 𝑾=(W1,W2,,Wd)\bm{W}=(W_{1},W_{2},\dotsc,W_{d}) as (note that WiW_{i} has certain mean zero)

γ¯(i,j)[WiWj],\overline{\gamma}(i,j)\coloneqq\mathcal{E}[W_{i}W_{j}],

and

γ¯(i,j)[WiWj].\underline{\gamma}(i,j)\coloneqq-\mathcal{E}[-W_{i}W_{j}].

Then these two quantities turn out to be closely related to 𝒞\mathcal{C}, as illustrated below.

Proposition 3.15 (Upper and lower covariance between semi-GG-normal marginals).

For each 𝚺𝒞\mathbf{\Sigma}\in\mathcal{C}, let Σij\Sigma_{ij} denote the (i,j)(i,j)-th entry of 𝚺\mathbf{\Sigma}, and

[Σ¯ij,Σ¯ij][minΣ𝒞Σij,maxΣ𝒞Σij].[\underline{\Sigma}_{ij},\overline{\Sigma}_{ij}]\coloneqq[\min_{\Sigma\in\mathcal{C}}\Sigma_{ij},\max_{\Sigma\in\mathcal{C}}\Sigma_{ij}].

Then we have

γ¯(i,j)=Σ¯ij and γ¯(i,j)=Σ¯ij.\underline{\gamma}(i,j)=\underline{\Sigma}_{ij}\text{ and }\overline{\gamma}(i,j)=\overline{\Sigma}_{ij}.

In particular, we have

σ¯i2[Wi2]=𝚺¯ii and σ¯i2[Wi2]=𝚺¯ii.\overline{\sigma}_{i}^{2}\coloneqq\mathcal{E}[W_{i}^{2}]=\overline{\mathbf{\Sigma}}_{ii}\text{ and }\underline{\sigma}_{i}^{2}\coloneqq-\mathcal{E}[-W_{i}^{2}]=\underline{\mathbf{\Sigma}}_{ii}.
Proof.

For each (i,j){1,2,,d}2(i,j)\in\{1,2,\dotsc,d\}^{2}, let fij(𝑾)=WiWjf_{ij}(\bm{W})=W_{i}W_{j}. Then it is obvious that fijCl.Lip(d)f_{ij}\in C_{\mathrm{l.Lip}}(\mathbb{R}^{d}). For each Σ\Sigma, let

(Y1,Y2,,Yd)𝚺1/2ϵ.(Y_{1},Y_{2},\dotsc,Y_{d})\coloneqq\mathbf{\Sigma}^{1/2}\bm{\epsilon}.

Then by applying 3.14,

γ¯(i,j)=[WiWj]\displaystyle\overline{\gamma}(i,j)=\mathcal{E}[W_{i}W_{j}] =maxΣ1/2𝒱𝔼[fij(𝚺1/2ϵ)]\displaystyle=\max_{\Sigma^{1/2}\in\mathcal{V}}\mathbb{E}[f_{ij}(\mathbf{\Sigma}^{1/2}\bm{\epsilon})]
=maxΣ1/2𝒱𝔼[YiYj]\displaystyle=\max_{\Sigma^{1/2}\in\mathcal{V}}\mathbb{E}[Y_{i}Y_{j}]
=maxΣ1/2𝒱Σij=Σ¯ij.\displaystyle=\max_{\Sigma^{1/2}\in\mathcal{V}}\Sigma_{ij}=\overline{\Sigma}_{ij}.

Similarly we can show γ¯(i,j)=Σ¯ij\underline{\gamma}(i,j)=\underline{\Sigma}_{ij}. ∎
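
For a concrete (and hypothetical) example of 3.15, the sketch below takes d = 2 and lets 𝒞\mathcal{C} be the covariance matrices with unit variances and correlation ranging over [−0.3, 0.6]; the upper and lower covariances are then simply the largest and smallest off-diagonal entries over 𝒞\mathcal{C}, which is cross-checked by Monte Carlo through the representation in 3.14. The set, the correlation range and the sample size are arbitrary choices of ours.

```python
import numpy as np

# A hypothetical uncertainty set C: 2x2 covariance matrices with unit variances
# and correlation rho ranging over [-0.3, 0.6].
rhos = np.linspace(-0.3, 0.6, 1001)
Sigmas = np.array([[[1.0, r], [r, 1.0]] for r in rhos])

# Proposition 3.15: the upper/lower covariances are the extreme off-diagonal entries over C.
upper_cov = Sigmas[:, 0, 1].max()     # = 0.6
lower_cov = Sigmas[:, 0, 1].min()     # = -0.3

# Monte Carlo cross-check through the representation in 3.14 (classical law N(0, Sigma)):
rng = np.random.default_rng(3)
eps = rng.standard_normal((200_000, 2))
mc = []
for S in Sigmas[::100]:
    root = np.linalg.cholesky(S)      # any square root of Sigma gives the same classical law
    Y = eps @ root.T
    mc.append((Y[:, 0] * Y[:, 1]).mean())
print(upper_cov, max(mc))             # both approximately 0.6
print(lower_cov, min(mc))             # both approximately -0.3
```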

3.6 Three types of independence related to semi-GG-normal distribution

Besides the existing GG-version independence (also called sequential independence) in 2.8, this substructure of the semi-GG-normal distribution also makes it possible to study finer structures of independence in this framework; interestingly, we will show in Section 3.7 that each type of independence is related to a family of state-space volatility models.

We will introduce three types of independence regarding semi-GG-normal distributions. Readers may recall the notation \dashrightarrow for the independence of a sequence (2.8). Throughout this section, we assume Wi=Viϵi=d𝒩^(0,[σ¯2,σ¯2])W_{i}=V_{i}\epsilon_{i}\overset{\text{d}}{=}\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) for i=1,2,,ni=1,2,\dotsc,n, which is a sequence of semi-GG-normally distributed random variables, so that accordingly Vi=d[σ¯,σ¯]V_{i}\overset{\text{d}}{=}\mathcal{M}{[\underline{\sigma},\overline{\sigma}]} and ϵi=d𝒩(0,[1,1])\epsilon_{i}\overset{\text{d}}{=}\mathcal{N}(0,[1,1]). Let

[σ¯2,σ¯2]𝐈n{𝚺=diag(σ12,σ22,,σn2):σi2[σ¯2,σ¯2],i=1,2,,n}.[\underline{\sigma}^{2},\overline{\sigma}^{2}]\mathbf{I}_{n}\coloneqq\{\mathbf{\Sigma}=\operatorname{diag}(\sigma^{2}_{1},\sigma^{2}_{2},\dotsc,\sigma^{2}_{n}):\sigma^{2}_{i}\in[\underline{\sigma}^{2},\overline{\sigma}^{2}],i=1,2,\dotsc,n\}.

The assumption that the variance intervals are identical is not essential, and the results in this section can be easily generalized to the case Wi𝒩^(0,[σ¯i2,σ¯i2]),i=1,2,,nW_{i}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}_{i}^{2},\overline{\sigma}_{i}^{2}]),i=1,2,\dotsc,n.

Definition 3.16.

For a sequence of semi-GG-normal distributed random variables {Wi}i=1n(={Viϵi}i=1n)\{W_{i}\}_{i=1}^{n}(=\{V_{i}\epsilon_{i}\}_{i=1}^{n}), we have three types of independence:

  1. 1.

    {Wi}i=1n\{W_{i}\}_{i=1}^{n} are semi-sequentially independent (denoted as W1SW2SSWnW_{1}\overset{\text{S}}{\dashrightarrow}W_{2}\overset{\text{S}}{\dashrightarrow}\dotsc\overset{\text{S}}{\dashrightarrow}W_{n}) if :

    V1V2Vnϵ1ϵ2ϵn;V_{1}\dashrightarrow V_{2}\dashrightarrow\dotsc\dashrightarrow V_{n}\dashrightarrow\epsilon_{1}\dashrightarrow\epsilon_{2}\dashrightarrow\dotsc\dashrightarrow\epsilon_{n}; (3.18)
  2. 2.

    {Wi}i=1n\{W_{i}\}_{i=1}^{n} are sequentially independent (denoted as W1W2WnW_{1}\dashrightarrow W_{2}\dashrightarrow\dotsc\dashrightarrow W_{n}) if:

    V1ϵ1V2ϵ2Vnϵn;V_{1}\epsilon_{1}\dashrightarrow V_{2}\epsilon_{2}\dashrightarrow\dotsc\dashrightarrow V_{n}\epsilon_{n}; (3.19)
  3. 3.

    {Wi}i=1n\{W_{i}\}_{i=1}^{n} are fully-sequentially independent (denoted as W1FW2FFWnW_{1}\overset{\text{F}}{\dashrightarrow}W_{2}\overset{\text{F}}{\dashrightarrow}\dotsc\overset{\text{F}}{\dashrightarrow}W_{n}) if:

    V1ϵ1V2ϵ2Vnϵn.V_{1}\dashrightarrow\epsilon_{1}\dashrightarrow V_{2}\dashrightarrow\epsilon_{2}\dashrightarrow\dotsc\dashrightarrow V_{n}\dashrightarrow\epsilon_{n}. (3.20)
Remark 3.16.1 (Compatibility with the definition of semi-GG-normal).

The requirement of independence to form the semi-GG-normal distribution is simply ViϵiV_{i}\dashrightarrow\epsilon_{i}, which is guaranteed by all three types of independence by 2.19. Furthermore, for two semi-GG-normal objects W=VϵW=V\epsilon and W¯=V¯ϵ¯\bar{W}=\bar{V}\bar{\epsilon}, we can see that WFW¯W\overset{\text{F}}{\dashrightarrow}\bar{W} implies

(V,ϵ)(V¯,ϵ¯),(V,\epsilon)\dashrightarrow(\bar{V},\bar{\epsilon}),

which further indicates WW¯.W\dashrightarrow\bar{W}. However, WFW¯W\overset{\text{F}}{\dashrightarrow}\bar{W} (or WW¯W\dashrightarrow\bar{W}) does not imply WSW¯W\overset{\text{S}}{\dashrightarrow}\bar{W} since the latter actually reverses the order of independence between ϵ\epsilon and V¯\bar{V} in the former.

Remark 3.16.2.

(Existence of these types of independence) It comes from the same logic used in 3.8.1 due to the existence of nn sequentially independent GG-distributed random vectors.

Theorem 3.17.

The fully-sequential independence of {Wi}i=1n\{W_{i}\}_{i=1}^{n} can be equivalently defined as:

  1. (F1)

    The pairs (Vi,ϵi)(V_{i},\epsilon_{i}) are sequentially independent: (V1,ϵ1)(V2,ϵ2)(Vn,ϵn).(V_{1},\epsilon_{1})\dashrightarrow(V_{2},\epsilon_{2})\dashrightarrow\cdots\dashrightarrow(V_{n},\epsilon_{n}).

  2. (F2)

    The elements within each pair (Vi,ϵi)(V_{i},\epsilon_{i}) satisfy ViϵiV_{i}\dashrightarrow\epsilon_{i} with i=1,2,,ni=1,2,\dotsc,n.

Remark 3.17.1.

We add the condition (F2) only to stress the intrinsic requirement on independence from the definition of semi-GG-normal. The main requirement of fully-sequential independence is (F1). It is also the reason why F\overset{\text{F}}{\dashrightarrow} is stronger than \dashrightarrow because the latter only involves the product VϵV\epsilon but the former is about the joint vector (V,ϵ)(V,\epsilon).

The fully-sequential independence is a stronger version of sequential independence, and it does not exhibit much difference from sequential independence in our current scope of discussion (which will be illustrated by 3.24).

Hence, the key new type of independence here is the semi-sequential independence, which is different from the sequential independence and also leads to a different joint distribution of (W1,W2,,Wn)(W_{1},W_{2},\dotsc,W_{n}). We will study the properties and behaviours of the semi-GG-normal under semi-sequential independence. Under this kind of independence, some of the intuitive properties we have in the classical situation are preserved. First of all, it is actually a symmetric independence among objects with distributional uncertainty (3.19). This symmetry makes it different from the sequential independence, although S\overset{\text{S}}{\dashrightarrow} is defined through \dashrightarrow. Moreover, the joint vector of nn semi-sequentially independent semi-GG-normal random variables follows a multivariate semi-GG-normal. It thus provides a view on how to connect univariate and multivariate objects (under distributional uncertainty), which is a non-trivial task for the GG-normal distribution. It further provides a path starting from the univariate classical normal to approach the multivariate GG-normal (by using the multivariate semi-GG-normal as a middle stage). This idea will be further illustrated in Section 4.2.

We call it “semi-sequential” independence because the only “sequential” requirement in the independence is (V1,V2,,Vn)(ϵ1,ϵ2,,ϵn)(V_{1},V_{2},\dotsc,V_{n})\dashrightarrow(\epsilon_{1},\epsilon_{2},\dotsc,\epsilon_{n}), while the sequential order within each vector is inessential in the sense that it can be arbitrarily switched. 3.18 elaborates this point by giving an equivalent definition.

Theorem 3.18.

The semi-sequential independence of {Wi}i=1n\{W_{i}\}_{i=1}^{n} can be equivalently defined as:

  1. (S1)

    The ϵ\epsilon part is independent from the VV part: (V1,V2,,Vn)(ϵ1,ϵ2,,ϵn)(V_{1},V_{2},\dotsc,V_{n})\dashrightarrow(\epsilon_{1},\epsilon_{2},\dotsc,\epsilon_{n}),

  2. (S2)

    The elements in the VV part are sequentially independent: V1V2Vn{V}_{1}\dashrightarrow{V}_{2}\dashrightarrow\cdots\dashrightarrow{V}_{n},

  3. (S3)

    The elements in the ϵ\epsilon part are classically independent.

Remark 3.18.1.

The order of independence within the VV part in (S2) is inessential in the sense that it can be arbitrarily switched by 3.6. Meanwhile, the order in the ϵ\epsilon part can also be switched due to the classical independence. Hence, this equivalent definition of semi-sequential independence indicates some intrinsic symmetry of this relation, coming from the only two categories of distributions (maximal and classical) that allow mutual independence. This point will be elaborated in the discussion of 3.19 and further formalized in 3.22.

To show the idea of the symmetry of S\overset{\text{S}}{\dashrightarrow}, we start from the simple case with n=2n=2 and include a short proof for readers to grasp the intuition. The proofs of the other results in this section are given in Section 6.3.

Proposition 3.19 (Symmetry in semi-sequential independence).

The following statements are equivalent:

  1. (1)

    W1SW2W_{1}\overset{\text{S}}{\dashrightarrow}W_{2},

  2. (2)

    W2SW1W_{2}\overset{\text{S}}{\dashrightarrow}W_{1},

  3. (3)

    (W1,W2)𝒩^(𝟎,[σ¯2,σ¯2]𝐈2)(W_{1},W_{2})\sim\hat{\mathcal{N}}(\bm{0},[\underline{\sigma}^{2},\overline{\sigma}^{2}]\mathbf{I}_{2}).

The proof of 3.19 relies on the following 3.20, which is a direct consequence of 2.24 but we still include a separate proof from scratch to show the idea.

Lemma 3.20.

The following two statements are equivalent:

  1. (1)

    V1V2ϵ1ϵ2V_{1}\dashrightarrow V_{2}\dashrightarrow\epsilon_{1}\dashrightarrow\epsilon_{2},

  2. (2)

    (V1,V2)(ϵ1,ϵ2)(V_{1},V_{2})\dashrightarrow(\epsilon_{1},\epsilon_{2}), V1V2V_{1}\dashrightarrow V_{2}, ϵ1ϵ2\epsilon_{1}\dashrightarrow\epsilon_{2}.

Proof of 3.20.

We can directly see (1)(2)(1)\implies(2) because the independence of a sequence implies the independence among non-overlapping subvectors as long as the original order is kept (2.23).

(2)(1)(2)\implies(1). The relation (1)(1) is equivalent to,

  1. 1.

    V1V2V_{1}\dashrightarrow V_{2},

  2. 2.

    (V1,V2)ϵ1(V_{1},V_{2})\dashrightarrow\epsilon_{1},

  3. 3.

    (V1,V2,ϵ1)ϵ2(V_{1},V_{2},\epsilon_{1})\dashrightarrow\epsilon_{2}.

The first two are directly implied by (2). For a fixed scalar vector (v1,v2,e1)(v_{1},v_{2},e_{1}), let H(v1,v2,e1)[φ(v1,v2,e1,ϵ2)].H(v_{1},v_{2},e_{1})\coloneqq\mathcal{E}[\varphi(v_{1},v_{2},e_{1},\epsilon_{2})]. Then the third one is equivalent to

[φ(V1,V2,ϵ1,ϵ2)]=[H(V1,V2,ϵ1)].\mathcal{E}[\varphi(V_{1},V_{2},\epsilon_{1},\epsilon_{2})]=\mathcal{E}[H(V_{1},V_{2},\epsilon_{1})].

In fact, since (V1,V2)(ϵ1,ϵ2)(V_{1},V_{2})\dashrightarrow(\epsilon_{1},\epsilon_{2}), we have

[φ(V1,V2,ϵ1,ϵ2)]\displaystyle\mathcal{E}[\varphi(V_{1},V_{2},\epsilon_{1},\epsilon_{2})] =[[φ(v1,v2,ϵ1,ϵ2)]vi=Vi,i=1,2]\displaystyle=\mathcal{E}[\mathcal{E}[\varphi(v_{1},v_{2},\epsilon_{1},\epsilon_{2})]_{v_{i}=V_{i},i=1,2}]
=(a)[[[φ(v1,v2,e1,ϵ2)]e1=ϵ1]vi=Vi,i=1,2]\displaystyle\overset{(a)}{=}\mathcal{E}[\mathcal{E}[\mathcal{E}[\varphi(v_{1},v_{2},e_{1},\epsilon_{2})]_{e_{1}=\epsilon_{1}}]_{v_{i}=V_{i},i=1,2}]
=[[H(v1,v2,ϵ1)]vi=Vi,i=1,2]\displaystyle=\mathcal{E}[\mathcal{E}[H(v_{1},v_{2},\epsilon_{1})]_{v_{i}=V_{i},i=1,2}]
=(b)[H(V1,V2,ϵ1)],\displaystyle\overset{(b)}{=}\mathcal{E}[H(V_{1},V_{2},\epsilon_{1})],

where (a)(a) comes from the independence ϵ1ϵ2\epsilon_{1}\dashrightarrow\epsilon_{2} and (b)(b) is due to the relation (V1,V2)ϵ1(V_{1},V_{2})\dashrightarrow\epsilon_{1}. ∎

Proof of 3.19.

The equivalence of the three statements will be proved by this logic: (1)(2)(3)(1)(1)\implies(2)\implies(3)\implies(1).

(1)(2)(1)\implies(2). By 3.20, (1) indicates

(V1,V2)(ϵ1,ϵ2),V1V2,ϵ1ϵ2.(V_{1},V_{2})\dashrightarrow(\epsilon_{1},\epsilon_{2}),V_{1}\dashrightarrow V_{2},\epsilon_{1}\dashrightarrow\epsilon_{2}. (3.21)

In 3.21, the roles in the VV part are symmetric and so are those in the ϵ\epsilon part (due to 2.26). Then 3.21 is equivalent to

(V2,V1)(ϵ2,ϵ1),V2V1,ϵ2ϵ1,(V_{2},V_{1})\dashrightarrow(\epsilon_{2},\epsilon_{1}),V_{2}\dashrightarrow V_{1},\epsilon_{2}\dashrightarrow\epsilon_{1}, (3.22)

which in turn implies W2SW1W_{2}\overset{\text{S}}{\dashrightarrow}W_{1} by 3.20.

(2)(3)(2)\implies(3) Let 𝑾(W1,W2)\bm{W}\coloneqq(W_{1},W_{2}). Then

𝑾=(V1ϵ1,V2ϵ2)=𝐕ϵ,\bm{W}=(V_{1}\epsilon_{1},V_{2}\epsilon_{2})=\mathbf{V}\bm{\epsilon},

where 𝐕=diag(V1,V2)\mathbf{V}=\operatorname{diag}(V_{1},V_{2}) and ϵ=(ϵ1,ϵ2)\bm{\epsilon}=(\epsilon_{1},\epsilon_{2}). Under the independence (2)(2), we have 3.22 by 3.20, which further implies 𝐕ϵ.\mathbf{V}\dashrightarrow\bm{\epsilon}. We also have (V1,V2)([σ¯,σ¯]2)(V_{1},V_{2})\sim\mathcal{M}({[\underline{\sigma},\overline{\sigma}]}^{2}) from V2V1V_{2}\dashrightarrow V_{1} (3.6), so 𝐕([σ¯,σ¯]𝐈2)\mathbf{V}\sim\mathcal{M}({[\underline{\sigma},\overline{\sigma}]}\mathbf{I}_{2}). Meanwhile, ϵ2ϵ1\epsilon_{2}\dashrightarrow\epsilon_{1} means they are actually classically independent with the joint distribution ϵN(𝟎,𝐈2)\bm{\epsilon}\sim N(\bm{0},\mathbf{I}_{2}) because the distribution of ϵi\epsilon_{i} is classical. Therefore, by 3.13, 𝑾=𝐕ϵ𝒩^(𝟎,[σ¯2,σ¯2]𝐈2).\bm{W}=\mathbf{V}\bm{\epsilon}\sim\hat{\mathcal{N}}(\bm{0},[\underline{\sigma}^{2},\overline{\sigma}^{2}]\mathbf{I}_{2}).

(3)(1).(3)\implies(1). First, from the definition of 𝑾=(W1,W2)𝒩^(𝟎,[σ¯2,σ¯2]𝐈2)\bm{W}=(W_{1},W_{2})\sim\hat{\mathcal{N}}(\bm{0},[\underline{\sigma}^{2},\overline{\sigma}^{2}]\mathbf{I}_{2}), there exist 𝐕=diag(V1,V2)([σ¯,σ¯]𝐈2),\mathbf{V}=\operatorname{diag}(V_{1},V_{2})\sim\mathcal{M}({[\underline{\sigma},\overline{\sigma}]}\mathbf{I}_{2}), and ϵ=(ϵ1,ϵ2)N(0,𝐈2),\bm{\epsilon}=(\epsilon_{1},\epsilon_{2})\sim N(0,\mathbf{I}_{2}), with independence

𝐕ϵ,\mathbf{V}\dashrightarrow\bm{\epsilon}, (3.23)

such that 𝑾=𝐕ϵ.\bm{W}=\mathbf{V}\bm{\epsilon}. In other words, (W1,W2)=(V1ϵ1,V2ϵ2).(W_{1},W_{2})=(V_{1}\epsilon_{1},V_{2}\epsilon_{2}). From their joint distribution, we directly see that ϵ1\epsilon_{1} and ϵ2\epsilon_{2} are classically independent. Next we study the independence within the VV part. Similarly, we can derive the joint distribution (V1,V2)([σ¯,σ¯]2)(V_{1},V_{2})\sim\mathcal{M}({[\underline{\sigma},\overline{\sigma}]}^{2}) from the distribution of 𝐕\mathbf{V}:

[φ(V1,V2)]\displaystyle\mathcal{E}[\varphi(V_{1},V_{2})] =[φ((1,1)𝐕)]\displaystyle=\mathcal{E}[\varphi((1,1)\mathbf{V})]
=maxB[σ¯,σ¯]𝐈2φ((1,1)B)\displaystyle=\max_{B\in{[\underline{\sigma},\overline{\sigma}]}\mathbf{I}_{2}}\varphi((1,1)B)
=max(v1,v2)[σ¯,σ¯]2φ(v1,v2).\displaystyle=\max_{(v_{1},v_{2})\in{[\underline{\sigma},\overline{\sigma}]}^{2}}\varphi(v_{1},v_{2}).

By 3.6, we have V1V2V_{1}\dashrightarrow V_{2} (also vice versa). Note that 3.23 implies (V1,V2)(ϵ1,ϵ2).(V_{1},V_{2})\dashrightarrow(\epsilon_{1},\epsilon_{2}). Hence we have W1SW2W_{1}\overset{\text{S}}{\dashrightarrow}W_{2} by 3.20. ∎

Proposition 3.21 (Zero sublinear covariance implies semi-sequential independence).

If (W1,W2)(W_{1},W_{2}) follows a bivariate semi-GG-normal and they have certain zero covariance:

[W1W2]=[W1W2]=0,\mathcal{E}[W_{1}W_{2}]=-\mathcal{E}[-W_{1}W_{2}]=0,

then we have W1SW2W_{1}\overset{\text{S}}{\dashrightarrow}W_{2} (and vice versa).

Proof.

It is a direct result of 3.19 and 3.15. ∎

3.21 and 3.19 seem like natural results for a “normal” object in the multivariate case, but this is the first time such connections have been established within the GG-expectation framework, because the GG-normal distribution does not have these properties in the multivariate case. For instance, consider 𝑿=(X1,X2)\bm{X}=(X_{1},X_{2}) with Xi𝒩(0,[σ¯2,σ¯2])X_{i}\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]). On the one hand, given the independence X1X2X_{1}\dashrightarrow X_{2}, 𝑿\bm{X} does not follow a bivariate GG-normal, nor does 𝐀𝑿\mathbf{A}\bm{X} under any invertible transformation 𝐀\mathbf{A}. On the other hand, if 𝑿\bm{X} follows a bivariate GG-normal 𝒩(𝟎,[σ¯2,σ¯2]𝐈2)\mathcal{N}(\bm{0},[\underline{\sigma}^{2},\overline{\sigma}^{2}]\mathbf{I}_{2}), we have neither X1X2X_{1}\dashrightarrow X_{2} nor X2X1X_{2}\dashrightarrow X_{1}. These kinds of strange properties bring barriers to the understanding of the GG-normal in multivariate situations, especially regarding the connection between univariate and multivariate objects. More details of this concern can be found in Bayraktar and Munk, (2015). Fortunately, the substructure of the semi-GG-normal provides some insights to reveal this connection.

3.19 can be extended to nn random variables.

Theorem 3.22.

The following three statements are equivalent:

  • (1)

    W1SW2SSWn{W}_{1}\overset{\text{S}}{\dashrightarrow}{W}_{2}\overset{\text{S}}{\dashrightarrow}\cdots\overset{\text{S}}{\dashrightarrow}{W}_{n},

  • (2)

    Wk1SWk2SSWknW_{k_{1}}\overset{\text{S}}{\dashrightarrow}W_{k_{2}}\overset{\text{S}}{\dashrightarrow}\cdots\overset{\text{S}}{\dashrightarrow}W_{k_{n}} for any permutation {kj}j=1n\{k_{j}\}_{j=1}^{n} of {1,2,,n}\{1,2,\dotsc,n\},

  • (3)

    (W1,W2,,Wn)𝒩^(𝟎,[σ¯2,σ¯2]𝐈n)(W_{1},W_{2},\dotsc,W_{n})\sim\hat{\mathcal{N}}(\bm{0},[\underline{\sigma}^{2},\overline{\sigma}^{2}]\mathbf{I}_{n}).

Remark 3.22.1.

3.22 shows that the semi-GG-normal under semi-sequential independence has symmetry and compatibility with the multivariate case. The underlying reason is that it takes advantage of the (only) two families of distributions that allow both properties: the classical normal and the maximal distribution. For the classical normal, we know that a bivariate normal with a diagonal covariance matrix is equivalent to the (symmetric) independence between its components. For the maximal distribution, the results are provided in 3.6.

We end this section by showing the stability of the semi-GG-normal distribution under semi-sequential independence, which indicates that more analogous generalizations of results on the classical normal can be discussed here.

Proposition 3.23.

For any W¯\bar{W} satisfying W¯=dW\bar{W}\overset{\text{d}}{=}W and WSW¯W\overset{\text{S}}{\dashrightarrow}\bar{W}, we have

W+W¯=d2W.W+\bar{W}\overset{\text{d}}{=}\sqrt{2}W.
Proof.

With W=VϵW=V\epsilon and W¯=V¯ϵ¯\bar{W}=\bar{V}\bar{\epsilon}, semi-sequential independence means:

VV¯ϵϵ¯.V\dashrightarrow\bar{V}\dashrightarrow\epsilon\dashrightarrow\bar{\epsilon}.

For any φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}), first recall that φϵ(v)[φ(vϵ)]\varphi_{\epsilon}(v)\coloneqq\mathcal{E}[\varphi(v\epsilon)] is in Cl.Lip()C_{\mathrm{l.Lip}}(\mathbb{R}) (3.11). On the one hand,

\mathcal{E}[\varphi(V\epsilon+\bar{V}\bar{\epsilon})]
=\mathcal{E}\bigl[\mathcal{E}[\varphi(v\epsilon+\bar{v}\bar{\epsilon})]_{v=V,\,\bar{v}=\bar{V}}\bigr]
=\mathcal{E}\Bigl[\varphi_{\epsilon}\Bigl(\sqrt{V^{2}+\bar{V}^{2}}\Bigr)\Bigr]
=\mathcal{E}\Bigl[\mathcal{E}\bigl[\varphi_{\epsilon}\bigl(\sqrt{v^{2}+\bar{V}^{2}}\bigr)\bigr]_{v=V}\Bigr]
=\max_{x\in{[\underline{\sigma},\overline{\sigma}]}}\max_{y\in{[\underline{\sigma},\overline{\sigma}]}}\varphi_{\epsilon}\bigl(\sqrt{x^{2}+y^{2}}\bigr),

where we use the fact that VV¯V\dashrightarrow\bar{V}. On the other hand,

[φ(2W)]=maxx[σ¯,σ¯]𝔼[φ(2xϵ)]=maxx[σ¯,σ¯]φϵ(2x).\mathcal{E}[\varphi(\sqrt{2}W)]=\max_{x\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sqrt{2}x\epsilon)]=\max_{x\in{[\underline{\sigma},\overline{\sigma}]}}\varphi_{\epsilon}(\sqrt{2}x).

Since

{x2+y2;(x,y)[σ¯,σ¯]2}=[2σ¯,2σ¯]={2x;x[σ¯,σ¯]},\{\sqrt{x^{2}+y^{2}};\,(x,y)\in{[\underline{\sigma},\overline{\sigma}]}^{2}\}=[\sqrt{2}\underline{\sigma},\,\sqrt{2}\overline{\sigma}]=\{\sqrt{2}x;\,x\in{[\underline{\sigma},\overline{\sigma}]}\},

we have [φ(W+W¯)]=[φ(2W)]\mathcal{E}[\varphi(W+\bar{W})]=\mathcal{E}[\varphi(\sqrt{2}W)] for all φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}). ∎
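As a quick numerical sanity check of this stability property (our own illustration, not part of the original argument), one can compare both sides of the identity using the semi-sequential representation from the proof: each side is a classical expectation maximized over constant volatilities taken from a coarse grid on $[\underline{\sigma},\overline{\sigma}]$; the test function, grid size and sample size below are illustrative choices.

```python
# Numerical check of E[phi(W + W_bar)] = E[phi(sqrt(2) W)] under semi-sequential
# independence: both sides are represented as maxima of classical expectations over
# constant volatilities (a sketch with an illustrative test function).
import numpy as np

rng = np.random.default_rng(0)
s_low, s_high = 0.5, 1.0                      # hypothetical volatility bounds
eps, eps_bar = rng.standard_normal(500_000), rng.standard_normal(500_000)
phi = np.abs                                  # illustrative test function |x|

grid = np.linspace(s_low, s_high, 21)
lhs = max(np.mean(phi(x * eps + y * eps_bar)) for x in grid for y in grid)
rhs = max(np.mean(phi(np.sqrt(2) * x * eps)) for x in grid)
print(lhs, rhs)                               # agree up to Monte Carlo error
```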

We will further investigate the connection and distinction between these types of independence in Section 3.7 by studying their representations.

3.7 Representations under three types of independence

Let us come back to the story setup in Section 3.1 to introduce our results to general audience. Suppose we intend to study the dynamic of the whole observation process (which is the observable data sequence)

(Y1,Y2,,Yn)=(σ1ϵ1,σ2ϵ2,,σnϵn).(Y_{1},Y_{2},\dots,Y_{n})=(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2},\dotsc,\sigma_{n}\epsilon_{n}).

Depending on the background information or knowledge (or the lack of reliable knowledge on the data pattern and underlying dynamic), we may still have uncertainty on the distribution or dynamic of $\bm{Y}$. Especially in the early stage of data analysis, it is usually required to specify a model structure and search for the optimal one within a family of such structures. However, at this stage, how to select or distinguish the family of models is an important and non-trivial task in statistical modeling. Suppose we assume that the underlying $\bm{\sigma}$ process belongs to a family $\mathcal{A}_{n}{[\underline{\sigma},\overline{\sigma}]}$, but some patterns of the data sequence, which can generally be quantified by $\mathbb{E}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2},\dotsc,\sigma_{n}\epsilon_{n})]$ for a test function $\varphi$, seem to exceed even the extreme cases in $\mathcal{A}_{n}{[\underline{\sigma},\overline{\sigma}]}$; we may then tend to reject the hypothesis that $\bm{\sigma}\in\mathcal{A}_{n}{[\underline{\sigma},\overline{\sigma}]}$. In such a situation, we usually need to work with the maximum expected value under the uncertainty $\bm{\sigma}\in\mathcal{A}_{n}{[\underline{\sigma},\overline{\sigma}]}$:

sup𝝈𝒜n[σ¯,σ¯]𝔼[φ(σ1ϵ1,σ2ϵ2,,σnϵn)].\sup_{\bm{\sigma}\in\mathcal{A}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2},\dotsc,\sigma_{n}\epsilon_{n})]. (3.24)

However, in principle, $\mathcal{A}_{n}{[\underline{\sigma},\overline{\sigma}]}$ might be an infinite-dimensional family of non-parametric (or semi-parametric) dynamics (due to the lack of information on the underlying dynamic). In our current context, the possible choices of $\mathcal{A}_{n}{[\underline{\sigma},\overline{\sigma}]}$ include:

  • $\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}\coloneqq\{\bm{\sigma}:\sigma_{t}\text{ is }\mathcal{G}_{t}\text{-measurable},\ \bm{\sigma}_{(n)}\perp\!\!\!\perp\bm{\epsilon}_{(n)}\}$. As illustrated by Figure 3.1, it includes independent mixture models and a typical class of hidden Markov models without a feedback process. (In Figure 3.1, we omit the edge from $\sigma_{1}$ to $\sigma_{3}$ only for graphical simplicity.)

  • $\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}\coloneqq\{\bm{\sigma}:\sigma_{t}\text{ is }\mathcal{Y}_{t-1}\text{-measurable}\}$. As illustrated by Figure 3.2, it includes those state-space models in which the future state variable depends only on the historical observations.

  • $\mathcal{L}^{*}_{n}{[\underline{\sigma},\overline{\sigma}]}\coloneqq\{\bm{\sigma}:\sigma_{t}\text{ is }\mathcal{F}_{t}\text{-measurable},\ (\sigma_{t}\perp\!\!\!\perp\epsilon_{t})\,|\,\mathcal{F}_{t-1}\}$. As illustrated by Figure 3.3, it contains a class of hidden Markov models with a feedback process: the future state variable depends on both the previous states and the observations. In Figures 3.2 and 3.3, the dashed arrows indicate possible feedback effects.

Note that

𝒮n[σ¯,σ¯]n[σ¯,σ¯]n[σ¯,σ¯].\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}\cup\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}\subset\mathcal{L}^{*}_{n}{[\underline{\sigma},\overline{\sigma}]}.

This includes two aspects:

  • $\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}\subset\mathcal{L}^{*}_{n}{[\underline{\sigma},\overline{\sigma}]}$ due to the fact that $\mathcal{G}_{t}\subset\mathcal{F}_{t}$ and $\bm{\sigma}_{(n)}\perp\!\!\!\perp\bm{\epsilon}_{(n)}$;

  • $\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}\subset\mathcal{L}^{*}_{n}{[\underline{\sigma},\overline{\sigma}]}$ because for any $\bm{\sigma}\in\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}$, given $\mathcal{F}_{t-1}\supset\mathcal{Y}_{t-1}$, $\sigma_{t}$ can be treated as a constant, and thus we must have $\sigma_{t}\perp\!\!\!\perp\epsilon_{t}\,|\,\mathcal{F}_{t-1}$.

Remark 3.23.1.

The condition $\sigma_{t}\perp\!\!\!\perp\epsilon_{t}\,|\,\mathcal{F}_{t-1}$ in $\mathcal{L}^{*}_{n}{[\underline{\sigma},\overline{\sigma}]}$ is equivalent to

$\eta_{t}\perp\!\!\!\perp\epsilon_{t}\,|\,\mathcal{F}_{t-1},$

where ηtσt𝔼[σt|t1]\eta_{t}\coloneqq\sigma_{t}-\mathbb{E}[\sigma_{t}|\mathcal{F}_{t-1}] is a sequence of 𝔽\mathbb{F}-martingale increments.

In traditional statistical modeling, how to deal with the quantity 3.24 is essentially a difficult task when 𝒜n[σ¯,σ¯]\mathcal{A}_{n}{[\underline{\sigma},\overline{\sigma}]} is highly unspecified and only contains some vague conditions on the possible design of edges (such as the additional edges in Figure 3.3 compared with Figure 3.1).

In this section, we will show that 3.24 can be related to the GG-expectation of a random vector with semi-GG-normal marginals, and that each choice of $\mathcal{A}_{n}{[\underline{\sigma},\overline{\sigma}]}$ corresponds to a type of independence associated with the semi-GG-normal. After transforming 3.24 into a GG-expectation, it becomes more convenient to evaluate, and this evaluation procedure also gives us guidance on what should be the “skeleton” part to consider for the extreme scenario when dealing with different forms of $\mathcal{A}_{n}{[\underline{\sigma},\overline{\sigma}]}$.

Figure 3.1: Diagram for $\mathcal{S}_{3}{[\underline{\sigma},\overline{\sigma}]}$
Figure 3.2: Diagram for $\mathcal{L}_{3}{[\underline{\sigma},\overline{\sigma}]}$
Figure 3.3: Diagram for $\mathcal{L}^{*}_{3}{[\underline{\sigma},\overline{\sigma}]}$

Our main result can be summarized as follows.

Theorem 3.24.

(Representations of nn semi-GG-normal random variables under various types of independence) Consider Wi𝒩^(0,[σ¯2,σ¯2]),i=1,2,,nW_{i}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]),i=1,2,\dotsc,n and any φCl.Lip(n)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{n}),

  • Under semi-sequential independence:

    W1SW2SSWn,W_{1}\overset{\text{S}}{\dashrightarrow}W_{2}\overset{\text{S}}{\dashrightarrow}\cdots\overset{\text{S}}{\dashrightarrow}W_{n}, (3.25)

    we have [|φ(𝑾)|]<\mathcal{E}[\lvert\varphi(\bm{W})\rvert]<\infty, and

    [φ(W1,W2,,Wn)]=max𝝈𝒮n[σ¯,σ¯]𝔼[φ(σ1ϵ1,σ2ϵ2,,σnϵn)].\mathcal{E}[\varphi(W_{1},W_{2},\dotsc,W_{n})]=\max_{\bm{\sigma}\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2},\dotsc,\sigma_{n}\epsilon_{n})]. (3.26)
  • Under sequential independence:

    W1W2Wn,W_{1}\dashrightarrow W_{2}\dashrightarrow\cdots\dashrightarrow W_{n}, (3.27)

    or fully-sequential independence:

    W1FW2FFWn,W_{1}\overset{\text{F}}{\dashrightarrow}W_{2}\overset{\text{F}}{\dashrightarrow}\cdots\overset{\text{F}}{\dashrightarrow}W_{n}, (3.28)

    we have [|φ(𝑾)|]<\mathcal{E}[\lvert\varphi(\bm{W})\rvert]<\infty, and

    [φ(W1,W2,,Wn)]\displaystyle\mathcal{E}[\varphi(W_{1},W_{2},\dotsc,W_{n})] =max𝝈n[σ¯,σ¯]𝔼[φ(σ1ϵ1,σ2ϵ2,,σnϵn)]\displaystyle=\max_{\bm{\sigma}\in\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2},\dotsc,\sigma_{n}\epsilon_{n})] (3.29)
    =max𝝈n[σ¯,σ¯]𝔼[φ(σ1ϵ1,σ2ϵ2,,σnϵn)].\displaystyle=\max_{\bm{\sigma}\in\mathcal{L}^{*}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2},\dotsc,\sigma_{n}\epsilon_{n})]. (3.30)
Proof of 3.24.

Turn to Section 6.4. ∎

Remark 3.24.1.

We can only say [φ(𝑾)]=[φ(Viϵi,i=1,2,,n)]\mathcal{E}[\varphi(\bm{W})]=\mathcal{E}[\varphi(V_{i}\epsilon_{i},i=1,2,\dotsc,n)] stays the same under sequential or fully sequential independence. It does not mean these two types of independence are equivalent. Their difference might arise when we consider a more general situation [φ((Vi,ϵi),i=1,2,,n)]\mathcal{E}[\varphi((V_{i},\epsilon_{i}),i=1,2,\dotsc,n)], which is out of our current scope of discussion.

Remark 3.24.2.

Here we only consider $W_{t}$ as a univariate semi-GG-normal, which can be routinely extended to the multivariate semi-GG-normal (defined in Section 3.5); in that case, $\sigma_{t}$ is also required to be changed to a matrix-valued process.

Remark 3.24.3.

The vision here is that we can use the GG-expectation of a semi-GG-normal random vector under various types of independence to obtain the envelope associated with different families of model structures. With or without a kind of dependence (such as with or without the feedback), the family of models is usually infinite dimensional because, in principle, the form of the feedback dependence could be any kind of nonlinear function. Nonetheless, 3.26, 3.29 and 3.30 tell us that, instead of going through all possible elements on the right-hand side, we can move to the left-hand side of the equation and treat it as a sublinear expectation, which can be evaluated in a convenient way. For instance, under semi-sequential independence, by 4.4, $\bm{W}$ follows a multivariate semi-GG-normal, so we only need to run through a finite-dimensional subset (as the “skeleton” part) to get the extreme scenario,

max𝝈𝒮n[σ¯,σ¯]𝔼[φ(σ1ϵ1,σ2ϵ2,,σnϵn)]=max𝝈[σ¯,σ¯]n𝔼[φ(σ1ϵ1,σ2ϵ2,,σnϵn)].\max_{\bm{\sigma}\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2},\dotsc,\sigma_{n}\epsilon_{n})]=\max_{\bm{\sigma}\in{[\underline{\sigma},\overline{\sigma}]}^{n}}\mathbb{E}_{\mathbb{P}}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2},\dotsc,\sigma_{n}\epsilon_{n})].

Under sequential independence, we only need to run through an iterative algorithm to evaluate [φ(W1,W2,,Wn)]\mathcal{E}[\varphi(W_{1},W_{2},\dotsc,W_{n})], which will be explained in Section 4.1.
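Before moving on, here is a minimal numerical sketch of the semi-sequential “skeleton” evaluation displayed above: we approximate $\max_{\bm{\sigma}\in{[\underline{\sigma},\overline{\sigma}]}^{n}}\mathbb{E}_{\mathbb{P}}[\varphi(\sigma_{1}\epsilon_{1},\dotsc,\sigma_{n}\epsilon_{n})]$ by a crude grid on the volatility interval and a shared Monte Carlo sample. The function names, the grid, and the test function are our own illustrative choices, not part of the theorem.

```python
# A sketch of the semi-sequential "skeleton" evaluation: maximize a classical
# expectation over constant volatility vectors sigma in [s_low, s_high]^n
# (approximated here by a finite grid and common random numbers).
import itertools
import numpy as np

def semi_sequential_upper_expectation(phi, s_low, s_high, n=3,
                                      n_sigma=5, n_paths=100_000, seed=0):
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_paths, n))        # classical N(0, I_n) noise
    grid = np.linspace(s_low, s_high, n_sigma)     # crude discretization of [s_low, s_high]
    best = -np.inf
    for sig in itertools.product(grid, repeat=n):  # the "skeleton" part
        best = max(best, phi(eps * np.asarray(sig)).mean())
    return best

# phi acts on the whole path (Y_1, ..., Y_n); here an illustrative choice:
est = semi_sequential_upper_expectation(
    lambda y: np.maximum(y.sum(axis=1), 0.0), s_low=0.5, s_high=1.0)
print(est)
```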

Corollary 3.24.1.

As special cases, under semi-sequential independence, we have

[φ(1nt=1nWt)]=maxσ𝒮n[σ¯,σ¯]𝔼[φ(1nt=1nσtϵt)]\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{t=1}^{n}W_{t})]=\max_{\sigma\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\sigma_{t}\epsilon_{t})] (3.31)

Under sequential independence or fully sequential independence, we have

[φ(1nt=1nWt)]=maxσn[σ¯,σ¯]𝔼[φ(1nt=1nσtϵt)],\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{t=1}^{n}W_{t})]=\max_{\sigma\in\mathcal{L}^{*}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\sigma_{t}\epsilon_{t})], (3.32)

where n[σ¯,σ¯]\mathcal{L}^{*}_{n}{[\underline{\sigma},\overline{\sigma}]} can be replaced by n[σ¯,σ¯]\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}.

Proof.

This is a direct result of 3.24. ∎

Remark 3.24.4.

Under semi-sequential independence, by 3.23, we have

[φ(1nt=1nWt)]=[φ(W1)],\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{t=1}^{n}W_{t})]=\mathcal{E}[\varphi(W_{1})],

then we have

maxσ𝒮n[σ¯,σ¯]𝔼[φ(1nt=1nσtϵt)]=maxσ[σ¯,σ¯]𝔼[φ(σϵ)].\max_{\sigma\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\sigma_{t}\epsilon_{t})]=\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)].
Remark 3.24.5.

To show consistency with the existing results in the literature, if we choose $\mathcal{L}^{*}_{n}{[\underline{\sigma},\overline{\sigma}]}$ in 3.5 (we can also change the distribution of $\epsilon$ in $W$ to any applicable classical distribution), then we can apply the CLT in the GG-expectation framework to the left-hand side to retrieve a result similar to the one in Rokhlin, (2015) (which is obtained by treating it as a discrete-time stochastic control problem):

[φ(WG)]=limn[φ(1ni=1nWi)]=limnsupσn[σ¯,σ¯]𝔼[φ(1ni=1nσiϵi)],\mathcal{E}[\varphi(W^{G})]=\lim_{n\to\infty}\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}W_{i})]=\lim_{n\to\infty}\sup_{\sigma\in\mathcal{L}^{*}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sigma_{i}\epsilon_{i})],

where WG𝒩(0,[σ¯2,σ¯2])W^{G}\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]). When choosing n[σ¯,σ¯]\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}, 3.24.1 is related to the discussion in Section 4 of Fang et al., (2019). It is also related to the formulation in Dolinsky et al., (2012), although the latter uses a different approach.

Remark 3.24.6 (A more explicit distinction between semi-GG-normal and GG-normal).

Let us extend our discussion to a continuous-time version of the setup in Section 3.1. By Denis et al., (2011), the distributional uncertainty of $\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])$ can be explicitly written as

[φ(WG)]=supσ[σ¯,σ¯]𝔼[φ(01σs𝑑BsP)],\mathcal{E}[\varphi(W^{G})]=\sup_{\sigma\in\mathcal{L}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\int_{0}^{1}\sigma_{s}dB^{P}_{s})],

where $B_{t}^{P}$ is a classical Brownian motion (induced by $\epsilon_{t}$) under $(\Omega,\mathcal{F},\mathbb{F},\mathbb{P})$ and $\mathcal{L}{[\underline{\sigma},\overline{\sigma}]}$ is the collection of all $\mathcal{F}_{t}$-measurable processes taking values in ${[\underline{\sigma},\overline{\sigma}]}$. Meanwhile, by considering the continuous-time version of 3.24.4, the distributional uncertainty of $\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])$ can be expressed as

[φ(W)]=supσ𝒮[σ¯,σ¯]𝔼[φ(01σs𝑑BsP)],\mathcal{E}[\varphi(W)]=\sup_{\sigma\in\mathcal{S}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\int_{0}^{1}\sigma_{s}dB^{P}_{s})],

where $\mathcal{S}{[\underline{\sigma},\overline{\sigma}]}$ is the collection of all $\mathcal{G}_{t}$-measurable processes taking values in ${[\underline{\sigma},\overline{\sigma}]}$. Note that $\mathcal{S}{[\underline{\sigma},\overline{\sigma}]}\subset\mathcal{L}{[\underline{\sigma},\overline{\sigma}]}$ because $\mathcal{S}{[\underline{\sigma},\overline{\sigma}]}$ only considers those $\sigma_{t}$ processes that are independent of $B_{t}$. This gives another, more explicit distinction between the semi-GG-normal and the GG-normal distribution compared with 3.11.3.

Corollary 3.24.2.

Under the setup of 3.24, when φCl.Lip(n)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{n}) is convex or concave,

[φ(W1,W2,,Wn)],\mathcal{E}[\varphi(W_{1},W_{2},\dotsc,W_{n})],

will be the same under either sequential or semi-sequential independence. Furthermore, in these cases, we have

[φ(W1,W2,,Wn)]={𝔼[φ(σ¯ϵ1,σ¯ϵ2,,σ¯ϵn)] when φ is concave𝔼[φ(σ¯ϵ1,σ¯ϵ2,,σ¯ϵn)] when φ is convex.\mathcal{E}[\varphi(W_{1},W_{2},\dotsc,W_{n})]=\begin{cases}\mathbb{E}[\varphi(\underline{\sigma}\epsilon_{1},\underline{\sigma}\epsilon_{2},\dotsc,\underline{\sigma}\epsilon_{n})]&\text{ when }\varphi\text{ is concave}\\ \mathbb{E}[\varphi(\overline{\sigma}\epsilon_{1},\overline{\sigma}\epsilon_{2},\dotsc,\overline{\sigma}\epsilon_{n})]&\text{ when }\varphi\text{ is convex}\end{cases}. (3.33)

The following result can be treated as an extension of 3.12.

Corollary 3.24.3.

Let {WiG}i=1n\{W_{i}^{G}\}_{i=1}^{n} denote a sequence of nonlinearly i.i.d. GG-normally distributed random variables with W1G𝒩(0,[σ¯2,σ¯2])W_{1}^{G}\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]). When φCl.Lip(n)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{n}) is convex or concave, we have

[φ(W1G,W2G,,WnG)]=[φ(W1,W2,,Wn)],\mathcal{E}[\varphi(W_{1}^{G},W_{2}^{G},\dotsc,W_{n}^{G})]=\mathcal{E}[\varphi(W_{1},W_{2},\dotsc,W_{n})],

where Wi𝒩^(0,[σ¯2,σ¯2]),i=1,2,,nW_{i}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]),i=1,2,\dotsc,n and they can be either sequentially or semi-sequentially independent.

We can also prove that the representations mentioned in this paper hold for $\varphi(x)=\mathds{1}_{\{x\leq y\}}$, so that we can apply them to consider the upper probability or capacity induced by the sublinear expectation: $\mathbf{V}(A)=\mathcal{E}[\mathds{1}_{A}]$ (from 2.2 and 2.4). Without loss of generality, we only discuss the univariate case, which can be routinely extended to multivariate situations.

Definition 3.25.

(The upper and lower cdf) In sublinear expectation space, the upper cdf of a random variable XX is

F¯X(y)𝐕(Xy)=[𝟙{Xy}],\overline{F}_{X}(y)\coloneqq\mathbf{V}(X\leq y)=\mathcal{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{X\leq y}}\right\}}],

and the lower cdf is

F¯X(y)𝐯(Xy)=[𝟙{Xy}].\underline{F}_{X}(y)\coloneqq\mathbf{v}(X\leq y)=-\mathcal{E}[-\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{X\leq y}}\right\}}].
Theorem 3.26.

(Representations of the upper and lower cdf) Let XX denote a random variable in sublinear expectation space and XαX^{\alpha} is a random variable in the classical probability space whose distribution is characterized by a latent variable α\alpha. Suppose a representation of the sublinear expectation,

[φ(X)]=supα𝒜𝔼[φ(Xα)],\mathcal{E}[\varphi(X)]=\sup_{\alpha\in\mathcal{A}}\mathbb{E}[\varphi(X^{\alpha})], (3.34)

holds for any φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}). Then we also have the representations for the upper cdf,

F¯X(y)=𝐕(Xy)=supα𝒜(Xαy),\overline{F}_{X}(y)=\mathbf{V}(X\leq y)=\sup_{\alpha\in\mathcal{A}}\mathbb{P}(X^{\alpha}\leq y), (3.35)

which holds for any continuity point $y$ of $\overline{F}_{X}$. In other words, the representation can be extended to functions of the form $\varphi(x)\coloneqq\mathds{1}_{\{x\leq y\}}$. Meanwhile, we also have the representation for the lower cdf,

F¯X(y)=𝐯(Xy)=infα𝒜(Xαy),\underline{F}_{X}(y)=\mathbf{v}(X\leq y)=\inf_{\alpha\in\mathcal{A}}\mathbb{P}(X^{\alpha}\leq y), (3.36)

which holds for any continuity point yy of F¯X\underline{F}_{X}.

Proof of 3.26.

It is easy to show that $\overline{F}_{X}(y)$ is a monotone function, so its set of discontinuity points is at most countable. Let $y$ be any continuity point of $\overline{F}_{X}$. For any $\epsilon>0$, take $\delta$ small enough such that

F¯X(y+δ)F¯X(yδ)ϵ.\overline{F}_{X}(y+\delta)-\overline{F}_{X}(y-\delta)\leq\epsilon.

Take $f$ and $g$ to be two bounded continuous functions such that

f(x)={1xyδ[0,1]yδ<xy0x>y,f(x)=\begin{cases}1&x\leq y-\delta\\ \in[0,1]&y-\delta<x\leq y\\ 0&x>y\end{cases},

and

g(x)={1xy[0,1]y<xy+δ0x>y+δ.g(x)=\begin{cases}1&x\leq y\\ \in[0,1]&y<x\leq y+\delta\\ 0&x>y+\delta\end{cases}.

Then we have

𝟙{xyδ}f(x)𝟙{xy}g(x)𝟙{xy+δ}.\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{x\leq y-\delta}}\right\}}\leq f(x)\leq\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{x\leq y}}\right\}}\leq g(x)\leq\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{x\leq y+\delta}}\right\}}.

We can apply this inequality to XαX^{\alpha} for any given α\alpha:

𝔼[f(Xα)](Xαy)𝔼[g(Xα)],\mathbb{E}[f(X^{\alpha})]\leq\mathbb{P}(X^{\alpha}\leq y)\leq\mathbb{E}[g(X^{\alpha})],

then

supα𝒜𝔼[f(Xα)]supα𝒜(Xαy)supα𝒜𝔼[g(Xα)].\sup_{\alpha\in\mathcal{A}}\mathbb{E}[f(X^{\alpha})]\leq\sup_{\alpha\in\mathcal{A}}\mathbb{P}(X^{\alpha}\leq y)\leq\sup_{\alpha\in\mathcal{A}}\mathbb{E}[g(X^{\alpha})].

Note that f,gCl.Lip()f,g\in C_{\mathrm{l.Lip}}(\mathbb{R}) we can use the representation 3.34 to get,

𝐕(Xyδ)[f(X)]supα𝒜(Xαy)[g(X)]𝐕(Xy+δ).\mathbf{V}(X\leq y-\delta)\leq\mathcal{E}[f(X)]\leq\sup_{\alpha\in\mathcal{A}}\mathbb{P}(X^{\alpha}\leq y)\leq\mathcal{E}[g(X)]\leq\mathbf{V}(X\leq y+\delta).

Then

F¯X(y)ϵF¯X(yδ)supα𝒜(Xαy)F¯X(y+δ)F¯X(y)+ϵ.\overline{F}_{X}(y)-\epsilon\leq\overline{F}_{X}(y-\delta)\leq\sup_{\alpha\in\mathcal{A}}\mathbb{P}(X^{\alpha}\leq y)\leq\overline{F}_{X}(y+\delta)\leq\overline{F}_{X}(y)+\epsilon.

Since ϵ>0\epsilon>0 can be arbitrarily small, we have proved the required result 3.35 for F¯X\overline{F}_{X}. To validate the representation 3.36 for F¯X\underline{F}_{X}, we simply need to replace F¯X\overline{F}_{X} with F¯X\underline{F}_{X} and change sup\sup to inf\inf accordingly. ∎

Remark 3.26.1.

(Notes on the continuity of $\mathbf{V}$) Note that 3.26 does not require the continuity of $\mathbf{V}$: $\mathbf{V}(A_{n})\to\mathbf{V}(A)$ if $A_{n}\to A$. Since one can easily check that $\mathbf{V}$ is automatically lower continuous ($\mathbf{V}(A_{n})\uparrow\mathbf{V}(A)$ if $A_{n}\uparrow A$), the upper continuity ($\mathbf{V}(A_{n})\downarrow\mathbf{V}(A)$ if $A_{n}\downarrow A$) is what we are really discussing whenever we speak of the continuity of $\mathbf{V}$. Here we try to avoid the assumption of upper continuity of $\mathbf{V}$, which is quite strong and restrictive. Even under the regularity of $\mathcal{E}$, we can only say that the upper continuity holds for closed $A$ (Lemma 7 in Denis et al., (2011)). However, when $y$ is a continuity point of $\overline{F}_{X}$, for any sequence $y_{n}$ converging to $y$ as $n\to\infty$, we do have $\mathbf{V}(A_{n})\to\mathbf{V}(A)$ for the sets $A_{n}\coloneqq\{X\leq y_{n}\}$ and $A\coloneqq\{X\leq y\}$; namely, $\mathbf{V}$ exhibits some continuity on this kind of sets.
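To make 3.26 concrete, consider the semi-GG-normal $W\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])$, whose sublinear expectation is represented by $\mathcal{E}[\varphi(W)]=\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)]$. Since $\Phi(y/\sigma)$ is monotone in $\sigma$ for each fixed $y$, the upper and lower cdf are attained at the endpoints of the interval. The short sketch below writes this out numerically; it is our own illustration with hypothetical parameter values, not part of the original text.

```python
# Upper and lower cdf of a semi-G-normal W ~ N^(0, [s_low^2, s_high^2]),
# obtained from the representation applied to X^sigma = sigma * eps with sigma in [s_low, s_high];
# per Theorem 3.26, the representation is stated at continuity points of the (monotone) cdfs.
import numpy as np
from scipy.stats import norm

def upper_cdf(y, s_low, s_high):
    # sup over sigma of Phi(y / sigma): attained at s_low for y >= 0, at s_high for y < 0
    return np.where(y >= 0, norm.cdf(y / s_low), norm.cdf(y / s_high))

def lower_cdf(y, s_low, s_high):
    # inf over sigma of Phi(y / sigma): attained at s_high for y >= 0, at s_low for y < 0
    return np.where(y >= 0, norm.cdf(y / s_high), norm.cdf(y / s_low))

y = np.linspace(-3, 3, 7)
print(upper_cdf(y, 0.5, 1.0))
print(lower_cdf(y, 0.5, 1.0))
```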

4 The hybrid roles and applications of semi-GG-normal distributions

In this section, we will show the hybrid roles of semi-GG-normal distributions, connecting the intuition between the classical framework and the GG-expectation framework, by answering the four questions mentioned in the introduction.

4.1 How to connect the linear expectations of classical normal with GG-normal

In principle, it is feasible to understand the expectation of the GG-normal distribution through the structure of the GG-heat equation. Nonetheless, as a generalization of the normal distribution, it would be better if we could understand the GG-normal distribution in a more distributional sense. Is it possible to understand the GG-normal distribution from our old friend, the classical normal? This is a natural question, but it is essentially not straightforward. Even for people who have partially learned the theory of the GG-expectation framework, there usually exist several common thinking gaps between the classical normal and the GG-normal distribution.

For instance, as mentioned in 3.11.3, for φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}),

[φ(𝒩(0,[σ¯2,σ¯2]))]supσ[σ¯,σ¯]𝔼[φ(N(0,σ2))],\mathcal{E}[\varphi(\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]))]\geq\sup_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi\big{(}N(0,\sigma^{2}))], (4.1)

which indicates that the uncertainty set of the GG-normal distribution is larger than the class of classical normal distributions with $\sigma\in{[\underline{\sigma},\overline{\sigma}]}$. In particular, Hu, (2012) shows a strict inequality: when $\varphi(x)=x^{3}$, we have $\mathcal{E}[\big(\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])\big)^{3}]>0$ (and it stays positive for any odd moment). Let $W^{G}\overset{\text{d}}{=}\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])$. By checking the GG-function defined in 2.14, we have

WG=dWG,W^{G}\overset{\text{d}}{=}-W^{G}, (4.2)

which indicates that the GG-normal distribution should have some “symmetry”. However, exactly due to the identity in distribution shown in 4.2, $W^{G}$ and $-W^{G}$ should share the same (sublinear) third moment: $\mathcal{E}[-(W^{G})^{3}]=\mathcal{E}[(-W^{G})^{3}]=\mathcal{E}[(W^{G})^{3}]>0$, which directly implies,

[(𝒩(0,[σ¯2,σ¯2]))3]>0=𝔼[(N(0,σ2))3]>[(𝒩(0,[σ¯2,σ¯2]))3].\mathcal{E}[\big{(}\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])\big{)}^{3}]>0=\mathbb{E}[\big{(}N(0,\sigma^{2})\big{)}^{3}]>-\mathcal{E}[-\big{(}\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])\big{)}^{3}].

It tells us that the degree of symmetry or skewness of GG-normal distribution is uncertain, which somehow looks like a “contradiction” with 4.2 and seems quite counter-intuitive for a “normal” distribution.

Based on the above statements showing how different the GG-normal and the classical normal are, our motivation comes from the opposite direction: is it possible for us to connect the linear expectation $\mathbb{E}[\varphi(N(0,\sigma^{2}))]$ of the classical normal distribution with the sublinear expectation $\mathcal{E}[\varphi(\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]))]$ of the GG-normal distribution (or use the former to approach the latter)?

This section will first give an affirmative answer to this question by providing an iterative algorithm given by our previous work (Li and Kulperger, (2018)) based on the semi-GG-normal distribution. Then we are going to extend this iterative algorithm into a general computational procedure to deal with weighted summations in statistical practice.

Theorem 4.1 (The Iterative Approximation of the GG-normal Distribution).

For any φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}) and integer n1n\geq 1, consider the series of iteration functions {φi,n}i=1n\{\varphi_{i,n}\}_{i=1}^{n} with initial function φ0,n(x)φ(x)\varphi_{0,n}(x)\coloneqq\varphi(x) and iterative relation:

φi+1,n(x)maxσ[σ¯,σ¯]𝔼[φi,n(N(x,σ2/n))],i=0,1,,n1.\varphi_{i+1,n}(x)\coloneqq\max_{\sigma\in[\underline{\sigma},\overline{\sigma}]}\mathbb{E}_{\mathbb{P}}[\varphi_{i,n}(N(x,\sigma^{2}/n))],i=0,1,\dotsc,n-1. (4.3)

The final iteration function for a given nn is φn,n\varphi_{n,n}. As nn\to\infty, we have φn,n(0)[φ(WG)]\varphi_{n,n}(0)\to\mathcal{E}[\varphi(W^{G})], where WG𝒩(0,[σ¯2,σ¯2])W^{G}\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]).

Remark 4.1.1.

As opposed to 4.1, the relation 4.3 shows that, to correctly understand the sublinear expectation of $\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])$, we need to start from the linear expectation of the classical normal and go through an iterative maximization of the function $\varphi$ itself to approach the expectation of $\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])$. For a fixed $n$, we actually have

[φ(𝒩(0,[σ¯2,σ¯2]))]maxσ[σ¯,σ¯]𝔼[φn1,n(N(0,σ2/n))].\mathcal{E}[\varphi(\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]))]\approx\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi_{n-1,n}(N(0,\sigma^{2}/n))].
Remark 4.1.2.

From a computational aspect, the normal distribution in 4.3 can be replaced by other classical distributions with finite moment generating functions because this algorithm is based on the GG-version central limit theorem (as indicated in 4.2). The interval [σ¯,σ¯]{[\underline{\sigma},\overline{\sigma}]} can be further simplified to a two-point set {σ¯,σ¯}\{\underline{\sigma},\overline{\sigma}\} or a three-point set {σ¯,σ¯+σ¯2,σ¯}\{\underline{\sigma},\frac{\underline{\sigma}+\overline{\sigma}}{2},\overline{\sigma}\} for computational convenience. More theoretical details and numerical aspects (as well as PDE sides) of this iterative algorithm can be found in Li and Kulperger, (2018). This iterative algorithm is also related to the idea of the discrete-time formulation in Dolinsky et al., (2012).
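To illustrate how 4.3 can be carried out numerically, the following is a minimal sketch (our own, not part of the original statement): it uses the two-point simplification $\{\underline{\sigma},\overline{\sigma}\}$ mentioned above, a finite spatial grid with linear interpolation for the iteration functions, and Gauss-Hermite quadrature for the classical normal expectation. Function names, grid sizes and the test function are illustrative choices.

```python
# A sketch of the iterative approximation in 4.1:
# phi_{i+1,n}(x) = max_sigma E[phi_{i,n}(N(x, sigma^2 / n))],
# implemented on a spatial grid with the two-point set {s_low, s_high} for sigma.
import numpy as np

def iterative_G_expectation(phi, s_low, s_high, n=50, x_max=8.0, n_grid=801, n_quad=40):
    """Approximate E[phi(W^G)] for W^G ~ N(0, [s_low^2, s_high^2])."""
    # Probabilists' Gauss-Hermite rule: E[f(Z)] ~ sum(w * f(z)) / sqrt(2*pi) for Z ~ N(0,1)
    z, w = np.polynomial.hermite_e.hermegauss(n_quad)
    w = w / np.sqrt(2.0 * np.pi)

    x = np.linspace(-x_max, x_max, n_grid)   # spatial grid for the iteration functions
    phi_curr = phi(x)                        # phi_{0,n} = phi on the grid

    for _ in range(n):
        candidates = []
        for sigma in (s_low, s_high):
            # E[phi_curr(x + sigma * Z / sqrt(n))] via quadrature + linear interpolation
            shifted = x[:, None] + sigma / np.sqrt(n) * z[None, :]
            candidates.append(np.interp(shifted, x, phi_curr) @ w)
        phi_curr = np.maximum(candidates[0], candidates[1])   # pointwise max over sigma

    return np.interp(0.0, x, phi_curr)       # phi_{n,n}(0) approximates E[phi(W^G)]

if __name__ == "__main__":
    est = iterative_G_expectation(lambda x: x ** 3, 0.5, 1.0)
    print("iterative estimate of the third G-moment:", est)   # strictly positive
```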

Remark 4.1.3.

Consider a sequence {Wi}i=1n\{W_{i}\}_{i=1}^{n} of nonlinearly i.i.d. semi-GG-normal random variables with W1𝒩^(0,[σ¯2,σ¯2])W_{1}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]). Each iteration function can also be expressed as the sublinear expectation of the semi-GG-normal distribution (letting W00W_{0}\coloneqq 0):

φi,n(x)=[φ(x+j=0iWnjn)]=[φ(x+j=0iWjn)],\varphi_{i,n}(x)=\mathcal{E}[\varphi(x+\sum_{j=0}^{i}\frac{W_{n-j}}{\sqrt{n}})]=\mathcal{E}[\varphi(x+\sum_{j=0}^{i}\frac{W_{j}}{\sqrt{n}})],

for i=0,1,,ni=0,1,\dotsc,n. Moreover, Li and Kulperger, (2018) further show that the series of iteration functions is an approximation of the whole solution surface of GG-heat equation on a given time grid. To be specific, consider the GG-heat equation defined on [0,)×[0,\infty)\times\mathbb{R}:

ut+G(uxx)=0,u|t=1=φ,u_{t}+G(u_{xx})=0,\,u|_{t=1}=\varphi,

where G(a)12[aX2]=12(σ¯2a+σ¯2a)G(a)\coloneqq\frac{1}{2}\mathcal{E}[aX^{2}]=\frac{1}{2}(\overline{\sigma}^{2}a^{+}-\underline{\sigma}^{2}a^{-}) and φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}). For each p(0,1]p\in(0,1], we have

|u(1p,x)φnp,n(x)|=|[φ(x+pX)][φ(x+i=0npWin)]|=Cφ(1+|x|k)O(1(np)α/2),|u(1-p,x)-\varphi_{\lfloor np\rfloor,n}(x)|=|\mathcal{E}[\varphi(x+\sqrt{p}X)]-\mathcal{E}[\varphi(x+\sum_{i=0}^{\lfloor np\rfloor}\frac{W_{i}}{\sqrt{n}})]|=C_{\varphi}(1+|x|^{k})O(\frac{1}{(np)^{\alpha/2}}),

where α(0,1)\alpha\in(0,1) depending on (σ¯,σ¯)(\underline{\sigma},\overline{\sigma}).

The basic idea of the iterative algorithm comes from the following result. In the following context, without further notice, let $\{W_{i}\}_{i=1}^{\infty}$ denote a sequence of nonlinearly i.i.d. semi-GG-normally distributed random variables with $W_{1}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])$.

Proposition 4.2.

(A general connection between semi-GG-normal and GG-normal) For any φCl.Lip()\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}), we have

limn[φ(1ni=1nWi)]=[φ(WG)],\lim_{n\to\infty}\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}W_{i})]=\mathcal{E}[\varphi(W^{G})],

where WG𝒩(0,[σ¯2,σ¯2])W^{G}\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]).

Remark 4.2.1.

The iterative algorithm can also be extended to dd-dimensional cases by extending the dimensions of {Wi}i=1\{W_{i}\}_{i=1}^{\infty} and WGW^{G} accordingly.

Proof.

This is a direct result of the GG-version central limit theorem (4.6). We can extend the space of functions $\varphi$ to $C_{\mathrm{l.Lip}}(\mathbb{R})$ because the condition in 2.18 is satisfied. Let $S_{n}\coloneqq\frac{1}{\sqrt{n}}\sum_{i=1}^{n}W_{i}$. In fact, for any $p\geq 1$, since $f(x_{1},x_{2},\dotsc,x_{n})=\lvert\sum_{i=1}^{n}x_{i}\rvert^{p}$ is a convex function, by 3.24.2, with $\epsilon_{i}\sim N(0,1),i=1,2,\dotsc,n$, we have

[|Sn|p]=𝔼[|1ni=1nσ¯ϵi|p]=σ¯p𝔼[|ϵ1|p].\mathcal{E}[\lvert S_{n}\rvert^{p}]=\mathbb{E}[\lvert\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\overline{\sigma}\epsilon_{i}\rvert^{p}]=\overline{\sigma}^{p}\mathbb{E}[\lvert\epsilon_{1}\rvert^{p}].

Meanwhile, [|WG|p]=σ¯p𝔼[|ϵ1|p]\mathcal{E}[\lvert W^{G}\rvert^{p}]=\overline{\sigma}^{p}\mathbb{E}[\lvert\epsilon_{1}\rvert^{p}] due to 2.15. Hence, we have, for any p+p\in\mathbb{N}_{+},

supn[|Sn|p]+[|WG|p]<.\sup_{n}\mathcal{E}[\lvert S_{n}\rvert^{p}]+\mathcal{E}[\lvert W^{G}\rvert^{p}]<\infty.\qed

The iterative algorithm (4.1) can then be treated as a direct evaluation of $\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}W_{i})]$. Interestingly, the uncertainty set of each $W_{i}$ is strictly smaller than that of the GG-normal distribution (by 4.1), but their normalized sum is able to approach the GG-normal. This leads us to another closely related question: how does the uncertainty set of $\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}W_{i})]$ aggregate (towards that of the GG-normal) as $n$ increases? How does the GG-version independence change the uncertainty set associated with the expectation of the joint random vector

[φ(W1,W2,,Wn)]?\mathcal{E}[\varphi(W_{1},W_{2},\dotsc,W_{n})]?

This question has been answered by the representations shown in 3.24 and 3.24.1.

We can also extend the idea of the iterative algorithm into a procedure that deals with sublinear expectations under sequential independence in a broader sense. We call it a GG-EM (Expectation-Maximization) procedure because it happens to involve an expectation step and a maximization step (but it has no direct relation to the Expectation-Maximization algorithm in statistical modeling).

One of the goals of the GG-EM procedure is to deal with the following object for any fixed $\varphi\in C_{\mathrm{l.Lip}}$:

\mathcal{E}[\varphi(\langle\bm{a},\bm{W}\rangle)]=\mathcal{E}\Bigl[\varphi\Bigl(\sum_{i=1}^{n}a_{i}W_{i}\Bigr)\Bigr], (4.4)

where $W_{i}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]),i=1,2,\dotsc,n$ are sequentially independent (the distribution of $W_{i}$ could also be generalized to any member of a semi-GG-family of distributions, which will be defined in Section 5.1) and $\bm{a}\in\mathbb{R}^{n}$ is the weight vector. Without loss of generality, we assume the Euclidean norm $\lVert\bm{a}\rVert=1$ (or $\sum a_{i}^{2}=1$). These kinds of objects are common in data practice (in the context of financial modeling, statistics or actuarial science). We are going to give an example of a simple linear regression problem in Section 5.4.

The iterative algorithm is a special case of this, with ai=1/n,i=1,2,,na_{i}=1/\sqrt{n},i=1,2,\dotsc,n:

[φ(1ni=1nWi)],\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}W_{i})],

which converges to [φ(𝒩(0,[σ¯2,σ¯2]))]\mathcal{E}[\varphi(\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]))] as nn\to\infty.

However, in practice, using an asymptotic result may not be feasible here for the following reasons:

  1. 1.

    Note that $a_{i}$ could be of arbitrary form (usually depending on the data or the problem itself). Although we do have results such as the weighted central limit theorem proved by Zhang and Chen, (2014), we may not always have a general asymptotic result for it.

  2. 2.

    More fundamentally, $n$ could be a small number for which there is still a gap with the asymptotic result. In this case, we need a non-asymptotic approximation involving the convergence rates of the central limit theorem (like the Berry-Esseen bound in the classical case), which have been studied by Fang et al., (2019); Huang and Liang, (2019); Song, (2020); Krylov, (2020).

  3. 3.

    If nn is small compared with the dimension dd of the data, it further requires us to have a non-asymptotic view of 4.4.

Next we explain the details of the GG-EM procedure to deal with 4.4. Again, under the spirit of iterative approximation, 4.4 can be computed by the following procedure: with φ0,nφ\varphi_{0,n}\coloneqq\varphi, for i=0,1,2,,n1i=0,1,2,\dotsc,n-1,

\varphi_{i+1,n}(x)=\mathcal{E}[\varphi_{i,n}(x+a_{n-i}W_{n-i})]=\max_{\sigma_{n-i}\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi_{i,n}(x+a_{n-i}\sigma_{n-i}\epsilon_{n-i})].

Finally we have

[φ(i=1naiWi)]=φn,n(0).\mathcal{E}[\varphi(\sum_{i=1}^{n}a_{i}W_{i})]=\varphi_{n,n}(0).

Then we can store the optimal choice of the $\sigma_{i}$ control process for our later simulation study (so that there is no need to run the iterative algorithm again). Note that the optimal $\sigma^{*}$ process is obtained in the backward order

(\sigma_{n}^{*},\sigma_{n-1}^{*},\dotsc,\sigma_{1}^{*}).

To follow the original order, we need to reverse it; the optimal $\sigma^{*}$ process then takes the form

σ1\displaystyle\sigma_{1}^{*} [σ¯,σ¯]\displaystyle\in{[\underline{\sigma},\overline{\sigma}]}
σk\displaystyle\sigma_{k}^{*} =σk(i=1k1aiWi),k=2,,n.\displaystyle=\sigma_{k}^{*}(\sum_{i=1}^{k-1}a_{i}W_{i}),k=2,\dotsc,n.

In this way, we have

\mathcal{E}[\varphi(\langle\bm{a},\bm{W}\rangle)]=\mathcal{E}\Bigl[\varphi\Bigl(\sum_{i=1}^{n}a_{i}W_{i}\Bigr)\Bigr]=\mathbb{E}\Bigl[\varphi\Bigl(\sum_{i=1}^{n}a_{i}\sigma_{i}^{*}\epsilon_{i}\Bigr)\Bigr],

and the linear expectation can be approximated by a classical Monte-Carlo simulation.
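The following sketch puts the GG-EM procedure above into code under the same numerical devices as the earlier iterative-algorithm sketch (grid, interpolation, quadrature, and the two-point set for $\sigma$), all of which are our own illustrative choices. The backward pass records the maximizing $\sigma$ on the grid; after reversing, the stored policy is used in a classical Monte Carlo run, as described above.

```python
# A sketch of the G-EM procedure: backward iteration over the weights a_i, storing the
# maximizing sigma at each step, followed by a classical Monte Carlo run under the
# recovered sigma* feedback policy.
import numpy as np

def g_em(phi, a, s_low, s_high, x_max=8.0, n_grid=801, n_quad=40):
    n = len(a)
    z, w = np.polynomial.hermite_e.hermegauss(n_quad)
    w = w / np.sqrt(2.0 * np.pi)
    x = np.linspace(-x_max, x_max, n_grid)

    phi_curr = phi(x)                  # phi_{0,n} = phi
    policy = []                        # maximizing sigma on the grid, in backward order
    for i in range(n):                 # step i handles W_{n-i}
        cand = []
        for sigma in (s_low, s_high):
            shifted = x[:, None] + a[n - 1 - i] * sigma * z[None, :]
            cand.append(np.interp(shifted, x, phi_curr) @ w)
        cand = np.vstack(cand)
        policy.append(np.where(cand.argmax(axis=0) == 0, s_low, s_high))
        phi_curr = cand.max(axis=0)    # phi_{i+1,n}

    value = np.interp(0.0, x, phi_curr)        # E[phi(<a, W>)] = phi_{n,n}(0)
    return value, x, policy[::-1]              # reverse the policy to the original order

def monte_carlo_under_policy(phi, a, x_grid, policy, n_paths=200_000, seed=0):
    """Classical Monte Carlo of E[phi(sum a_k sigma_k* eps_k)] under the stored policy."""
    rng = np.random.default_rng(seed)
    s = np.zeros(n_paths)              # running partial sum a_1 W_1 + ... + a_{k-1} W_{k-1}
    for k, a_k in enumerate(a):
        sigma_k = np.interp(s, x_grid, policy[k])   # sigma_k* as a function of the past
        s = s + a_k * sigma_k * rng.standard_normal(n_paths)
    return phi(s).mean()

if __name__ == "__main__":
    a = np.full(10, 1 / np.sqrt(10))               # the equal-weight special case
    val, xg, pol = g_em(lambda x: x ** 3, a, 0.5, 1.0)
    print(val, monte_carlo_under_policy(lambda x: x ** 3, a, xg, pol))
```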

4.2 How to connect univariate and multivariate objects

There are two basic properties of the classical normal distribution which bring convenience to the study of multivariate statistics. First, in $(\Omega,\mathcal{F},\mathbb{P})$, for any two independent $X_{1}$ and $X_{2}$ both following $N(0,1)$, $(X_{1},X_{2})$ must form a bivariate normal. (This result still holds even if they are not independent but linearly correlated.) Second, an $\mathbb{R}^{d}$-valued random vector $\bm{X}$ follows a multivariate normal if and only if the inner product $\langle\bm{a},\bm{X}\rangle$ is normal for any $\bm{a}\in\mathbb{R}^{d}$. However, these two properties no longer hold for GG-normal distributions. Readers can find the following established result in the book Peng, 2019b (Exercise 2.5.1).

Proposition 4.3.

Suppose X1X2X_{1}\dashrightarrow X_{2} and X1=dX2=dN(0,[σ¯2,σ¯2])X_{1}\overset{\text{d}}{=}X_{2}\overset{\text{d}}{=}N(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) with σ¯<σ¯\underline{\sigma}<\overline{\sigma}, for 𝐗(X1,X2)\bm{X}\coloneqq(X_{1},X_{2}), we have

  1. 1.

    𝒂,𝑿\mathopen{}\mathclose{{}\left\langle\bm{a},\bm{X}}\right\rangle is GG-normal distributed for any 𝒂2\bm{a}\in\mathbb{R}^{2};

  2. 2.

    𝑿\bm{X} does not follow a bivariate GG-normal distribution.

4.3 shows that we cannot construct a bivariate GG-normal distribution directly from two independent univariate GG-normal distributed random variables. It remains infeasible even when considering any invertible linear transformation of the random vector $(X_{1},X_{2})$, as shown by Bayraktar and Munk, (2015), who study these strange properties of the GG-normal in the multidimensional case in more detail.

To further explain the obstacle here, let us first recall that, in Section 4.1, we have shown how to start from the linear expectation $\mathbb{E}[\varphi(N(0,\sigma^{2}))]$ of the classical normal to correctly understand (and compute) the sublinear expectation $\mathcal{E}[\varphi(\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]))]$ of the GG-normal. Suppose our next goal is to help a general audience further understand (or compute) the sublinear expectation $\mathcal{E}[\varphi(\mathcal{N}(0,\mathcal{C}))]$ of a multivariate GG-normal distribution with covariance uncertainty characterized by $\mathcal{C}$, such as $\mathcal{C}\coloneqq\{\operatorname{diag}(\sigma_{1}^{2},\sigma_{2}^{2}),\sigma_{i}\in{[\underline{\sigma},\overline{\sigma}]},i=1,2\}$, from $(X_{1},X_{2})$ with $X_{i}\sim\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])$, $i=1,2$ and $X_{1}\dashrightarrow X_{2}$. However, as shown in 4.3, it is difficult to achieve this goal along this path because $\bm{X}=(X_{1},X_{2})$ is not GG-normal distributed, and neither is $\mathbf{A}\bm{X}^{T}$ for any invertible $2\times 2$ matrix $\mathbf{A}$.

It turns out that the connection between univariate and multivariate objects is essentially nontrivial. The contribution of this section is to show that this connection can be revealed by introducing an intermediate substructure, the semi-GG-normal imposed with semi-sequential independence. Specifically, 4.4 shows that a joint vector of semi-sequentially independent univariate semi-GG-normal random variables follows a multivariate semi-GG-normal (with a diagonal covariance matrix).

Theorem 4.4.

For a sequence of semi-GG-normal distributed random variables {Wi}i=1n\{W_{i}\}_{i=1}^{n}, satisfying Wi𝒩^(0,[σ¯i2,σ¯i2])W_{i}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}_{i}^{2},\overline{\sigma}_{i}^{2}]) for i=1,2,,ni=1,2,\dotsc,n, and

W1SW2SSWn,W_{1}\overset{\text{S}}{\dashrightarrow}W_{2}\overset{\text{S}}{\dashrightarrow}\dotsc\overset{\text{S}}{\dashrightarrow}W_{n},

we have

(W1,W2,,Wn)T𝒩^(𝟎,𝒞),(W_{1},W_{2},\dotsc,W_{n})^{T}\sim\hat{\mathcal{N}}(\bm{0},\mathcal{C}),

where 𝒞𝕊d+\mathcal{C}\subset\mathbb{S}_{d}^{+} is the uncertainty set of covariance matrices defined as

\mathcal{C}\coloneqq\Bigl\{\mathbf{\Sigma}=\operatorname{diag}(\sigma^{2}_{1},\dotsc,\sigma^{2}_{n}):\sigma^{2}_{i}\in[\underline{\sigma}_{i}^{2},\overline{\sigma}_{i}^{2}],\ i=1,2,\dotsc,n\Bigr\}.
Proof.

It is a direct result of 3.22 (the non-identical variance interval here is inessential to the proof). ∎

Next, 4.5 shows that we can apply a linear transformation to $\bm{W}$ to get a multivariate semi-GG-normal with a non-diagonal covariance matrix.

Proposition 4.5 (Multivariate semi-GG-normal under linear transformation).

Let 𝐖n×1𝒩^(𝟎,𝒞)\bm{W}_{n\times 1}\sim\hat{\mathcal{N}}(\bm{0},\mathcal{C}). For any constant matrix 𝐀r×n\mathbf{A}\in\mathbb{R}^{r\times n} with rnr\leq n, we have

𝐀𝑾𝒩^(𝟎,𝐀𝒞𝐀T),\mathbf{A}\bm{W}\sim\hat{\mathcal{N}}(\bm{0},\mathbf{A}\mathcal{C}\mathbf{A}^{T}),

where

𝐀𝒞𝐀T{𝐀𝚺𝐀T:𝚺𝒞}r×r.\mathbf{A}\mathcal{C}\mathbf{A}^{T}\coloneqq\mathopen{}\mathclose{{}\left\{\mathbf{A}\mathbf{\Sigma}\mathbf{A}^{T}:\mathbf{\Sigma}\in\mathcal{C}}\right\}\subset\mathbb{R}^{r\times r}.
Proof.

First of all, note that 𝐀r×n𝑾n×1=𝐀r×n𝐕n×nϵn×1\mathbf{A}_{r\times n}\bm{W}_{n\times 1}=\mathbf{A}_{r\times n}\mathbf{V}_{n\times n}\bm{\epsilon}_{n\times 1} with 𝐕(𝒱)\mathbf{V}\sim\mathcal{M}(\mathcal{V}). For any HCl.Lip(r×n)H\in C_{\mathrm{l.Lip}}(\mathbb{R}^{r\times n}), we have

[H(𝐀𝐕)]=max𝚺1/2𝒱𝔼[H(𝐀𝚺1/2)]=max𝐁𝐀𝒱𝔼[H(𝐁)],\mathcal{E}[H(\mathbf{A}\mathbf{V})]=\max_{\mathbf{\Sigma}^{1/2}\in\mathcal{V}}\mathbb{E}_{\mathbb{P}}[H(\mathbf{A}\mathbf{\Sigma}^{1/2})]=\max_{\mathbf{B}\in\mathbf{A}\mathcal{V}}\mathbb{E}_{\mathbb{P}}[H(\mathbf{B})],

so 𝐀𝐕(𝐀𝒱)\mathbf{A}\mathbf{V}\sim\mathcal{M}(\mathbf{A}\mathcal{V}), which can be treated as the scaling property for the n×nn\times n-dimensional maximal distribution. It follows from 𝐕ϵ\mathbf{V}\dashrightarrow\bm{\epsilon} that 𝐀𝐕ϵ\mathbf{A}\mathbf{V}\dashrightarrow\bm{\epsilon}. Therefore,

𝐀𝑾=d(𝐀𝒱)N(𝟎,𝐈n2)𝒩^(𝟎,𝒞),\mathbf{A}\bm{W}\overset{\text{d}}{=}\mathcal{M}(\mathbf{A}\mathcal{V})N(\bm{0},\mathbf{I}_{n}^{2})\sim\hat{\mathcal{N}}(\bm{0},\mathcal{C}^{\prime}),

where

𝒞\displaystyle\mathcal{C}^{{}^{\prime}} {𝐁𝐁T:𝐁𝐀𝒱}\displaystyle\coloneqq\mathopen{}\mathclose{{}\left\{\mathbf{B}\mathbf{B}^{T}:\mathbf{B}\in\mathbf{A}\mathcal{V}}\right\}
={(𝐀𝚺1/2)(𝐀𝚺1/2)T:𝚺1/2𝒱}\displaystyle=\mathopen{}\mathclose{{}\left\{(\mathbf{A}\mathbf{\Sigma}^{1/2})(\mathbf{A}\mathbf{\Sigma}^{1/2})^{T}:\mathbf{\Sigma}^{1/2}\in\mathcal{V}}\right\}
={𝐀𝚺𝐀T:𝚺𝒞}.\displaystyle=\mathopen{}\mathclose{{}\left\{\mathbf{A}\mathbf{\Sigma}\mathbf{A}^{T}:\mathbf{\Sigma}\in\mathcal{C}}\right\}.

In other words, 𝐀𝑾𝒩^(𝟎,𝐀𝒞𝐀T).\mathbf{A}\bm{W}\sim\hat{\mathcal{N}}(\bm{0},\mathbf{A}\mathcal{C}\mathbf{A}^{T}).

Then we can use a sequence of $\bm{W}\sim\hat{\mathcal{N}}(\bm{0},\mathcal{C})$ to approach the multivariate GG-normal $\mathcal{N}(\bm{0},\mathcal{C})$ by the nonlinear CLT.

Theorem 4.6.

Consider a sequence of nonlinearly i.i.d. {𝐖i}i=1\{\bm{W}_{i}\}_{i=1}^{\infty} with W1𝒩^(0,𝒞)W_{1}\sim\hat{\mathcal{N}}(0,\mathcal{C}). Let WGW^{G} be a GG-normal distributed random vector following 𝒩(0,𝒞)\mathcal{N}(0,\mathcal{C}). Then we have, for any φCl.Lip\varphi\in C_{\mathrm{l.Lip}},

limn[φ(1ni=1n𝑾i)]=[φ(𝑾G)].\lim_{n\to\infty}\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bm{W}_{i})]=\mathcal{E}[\varphi(\bm{W}^{G})].

It means that,

1ni=1n𝑾id𝒩(0,𝒞).\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bm{W}_{i}\overset{\text{d}}{\longrightarrow}\mathcal{N}(0,\mathcal{C}).
Proof.

This is a multivariate version of 4.2. We only need to validate the conditions. First of all, the sequence {Wi}i=1\{W_{i}\}_{i=1}^{\infty} definitely has certain zero mean. Then, notice that the distribution of 𝑾G\bm{W}^{G} is characterized by the function G(𝐀)=12sup𝚺𝒞tr[𝐀𝚺]G(\mathbf{A})=\frac{1}{2}\sup_{\mathbf{\Sigma}\in\mathcal{C}}\text{tr}[\mathbf{A}\mathbf{\Sigma}], where tr[]\text{tr}[\cdot] means the trace of the matrix. We only need to prove that G(𝐀)=12[𝐀W1,W1]G(\mathbf{A})=\frac{1}{2}\mathcal{E}[\langle\mathbf{A}W_{1},W_{1}\rangle] for any 𝐀𝕊d\mathbf{A}\in\mathbb{S}_{d}. By the representation of semi-GG-normal distribution, letting ϵN(0,𝚺)\bm{\epsilon}\sim N(0,\mathbf{\Sigma}), we have

[𝐀W1,W1]\displaystyle\mathcal{E}[\langle\mathbf{A}W_{1},W_{1}\rangle] =sup𝚺𝒞𝔼[𝐀ϵ,ϵ]=sup𝚺𝒞𝔼[(𝐀ϵ)Tϵ]\displaystyle=\sup_{\mathbf{\Sigma}\in\mathcal{C}}\mathbb{E}_{\mathbb{P}}[\langle\mathbf{A}\bm{\epsilon},\bm{\epsilon}\rangle]=\sup_{\mathbf{\Sigma}\in\mathcal{C}}\mathbb{E}_{\mathbb{P}}[(\mathbf{A}\bm{\epsilon})^{T}\bm{\epsilon}]
=sup𝚺𝒞𝔼[ϵT𝐀ϵ]=sup𝚺𝒞𝔼[tr[ϵT𝐀ϵ]]\displaystyle=\sup_{\mathbf{\Sigma}\in\mathcal{C}}\mathbb{E}_{\mathbb{P}}[\bm{\epsilon}^{T}\mathbf{A}\bm{\epsilon}]=\sup_{\mathbf{\Sigma}\in\mathcal{C}}\mathbb{E}_{\mathbb{P}}[\text{tr}[\bm{\epsilon}^{T}\mathbf{A}\bm{\epsilon}]]
=sup𝚺𝒞𝔼[tr[𝐀ϵϵT]]=sup𝚺𝒞tr[𝔼[𝐀ϵϵT]]\displaystyle=\sup_{\mathbf{\Sigma}\in\mathcal{C}}\mathbb{E}_{\mathbb{P}}[\text{tr}[\mathbf{A}\bm{\epsilon}\bm{\epsilon}^{T}]]=\sup_{\mathbf{\Sigma}\in\mathcal{C}}\text{tr}[\mathbb{E}_{\mathbb{P}}[\mathbf{A}\bm{\epsilon}\bm{\epsilon}^{T}]]
=sup𝚺𝒞tr[𝐀𝔼[ϵϵT]]=sup𝚺𝒞tr[𝐀𝚺].\displaystyle=\sup_{\mathbf{\Sigma}\in\mathcal{C}}\text{tr}[\mathbf{A}\mathbb{E}_{\mathbb{P}}[\bm{\epsilon}\bm{\epsilon}^{T}]]=\sup_{\mathbf{\Sigma}\in\mathcal{C}}\text{tr}[\mathbf{A}\mathbf{\Sigma}].\qed

The argument on how to extend the choice of $\varphi$ to $C_{\mathrm{l.Lip}}$ is similar to the one in the proof of 4.2.

This creates a path from the univariate classical normal to the multivariate GG-normal. Figure 4.1 shows the relations among the classical, semi-GG- and GG-normal distributions. We can start from the univariate objects (the semi-GG-normal distribution), construct their multivariate version under semi-sequential independence, and then approach the multivariate GG-normal distribution, which gives us a feasible way to start from univariate objects and approximately reach the multivariate distribution.

Figure 4.1: The relations among classical, semi-GG-, and GG-normal in univariate and multivariate cases

4.3 A statistical interpretation of asymmetry in sequential independence

In this section, we will expand 2.7 (which is used to illustrate the asymmetry of independence in this framework) by studying its representation result to provide a more specific, statistical interpretation of this asymmetry. More interestingly, we will show that, for two semi-GG-normally distributed random objects, each of them has certain zero third moment (because its distributional uncertainty can be written as a family of classical normals with different variances). This property is preserved for their sum under semi-sequential independence. However, after we impose sequential independence on them, their sum will exhibit third-moment uncertainty. This phenomenon is closely related to the third-moment uncertainty of the GG-normal (as shown in Section 4.1), obtained by applying the GG-version central limit theorem (2.17).

Next we expand 2.7 by considering XX and YY as two semi-GG-normal distributed random variables.

Example 4.7 (Third moment uncertainty comes from asymmetry of independence).

Suppose V1=dV2[σ¯,σ¯]V_{1}\overset{\text{d}}{=}V_{2}\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]}, and ϵ1=dϵ2𝒩(0,[1,1])\epsilon_{1}\overset{\text{d}}{=}\epsilon_{2}\sim\mathcal{N}(0,[1,1]) (which is exactly the classical N(0,1)N(0,1)) imposed with sequential independence

Viϵi,i=1,2.V_{i}\dashrightarrow\epsilon_{i},i=1,2.

Let Wi=dViϵi,i=1,2W_{i}\overset{\text{d}}{=}V_{i}\epsilon_{i},i=1,2, which turn out to be two identically distributed semi-GG-normal random variables W1=dW2=d𝒩^(0,[σ¯2,σ¯2])W_{1}\overset{\text{d}}{=}W_{2}\overset{\text{d}}{=}\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]). Note that (W1,W2)(W_{1},W_{2}) is a special case of (X,Y)(X,Y) in 2.7. We are going to show that, under different types of independence for WiW_{i}’s or different structures of sequential independence for ViV_{i}’s and ϵi\epsilon_{i}’s, we will have different uncertainty for W1W22W_{1}W_{2}^{2} and (W1+W2)3(W_{1}+W_{2})^{3} whose extreme scenarios can be described by their sublinear expectations.

When W1SW2W_{1}\overset{\text{S}}{\dashrightarrow}W_{2} or

V1V2ϵ1ϵ2,V_{1}\dashrightarrow V_{2}\dashrightarrow\epsilon_{1}\dashrightarrow\epsilon_{2}, (4.5)

since $W_{1}+W_{2}\overset{\text{d}}{=}\sqrt{2}W_{1}$, we have

[(W1+W2)3]=maxv[σ¯,σ¯]𝔼[(2vϵ)3]=0=[(W1+W2)3].\mathcal{E}[(W_{1}+W_{2})^{3}]=\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[(\sqrt{2}v\epsilon)^{3}]=0=-\mathcal{E}[-(W_{1}+W_{2})^{3}]. (4.6)

It shows that under semi-sequential independence, W1+W2W_{1}+W_{2} does not have third-moment uncertainty. Meanwhile, since (W1,W2)(W_{1},W_{2}) follows bivariate semi-GG-normal, we also have

[W1W22]=max(v1,v2)[σ¯,σ¯]2𝔼[(v1v22)ϵ1ϵ22]=0.\mathcal{E}[W_{1}W_{2}^{2}]=\max_{(v_{1},v_{2})\in{[\underline{\sigma},\overline{\sigma}]}^{2}}\mathbb{E}_{\mathbb{P}}[(v_{1}v^{2}_{2})\epsilon_{1}\epsilon_{2}^{2}]=0.

Since we have shown that semi-sequential independence is symmetric (3.19), things do not change when we consider $W_{2}\overset{\text{S}}{\dashrightarrow}W_{1}$ or $V_{2}\dashrightarrow V_{1}\dashrightarrow\epsilon_{2}\dashrightarrow\epsilon_{1}$.

However, if we only switch the order of independence between V2V_{2} and ϵ1\epsilon_{1} in 4.5 to get

V1ϵ1V2ϵ2,V_{1}\dashrightarrow\epsilon_{1}\dashrightarrow V_{2}\dashrightarrow\epsilon_{2},

then we obtain $W_{1}\overset{\text{F}}{\dashrightarrow}W_{2}$, which implies $W_{1}\dashrightarrow W_{2}$. Note that $-W_{i}\overset{\text{d}}{=}W_{i},i=1,2$ and $-W_{1}\overset{\text{F}}{\dashrightarrow}-W_{2}$, so we still have

(W1+W2)=(W1)+(W2)=dW1+W2.-(W_{1}+W_{2})=(-W_{1})+(-W_{2})\overset{\text{d}}{=}W_{1}+W_{2}.

It indicates some “symmetry” in its (GG-version) distribution. Although its second moment is uncertain, we still expect it to have some kind of “zero skewness” which indicates at least “zero third moment”. However, it turns out this is not the case:

[(W1+W2)3]=3[W1W22]=3(σ¯2σ¯2)σ¯2π>0>[(W1+W2)3],\mathcal{E}[(W_{1}+W_{2})^{3}]=3\mathcal{E}[W_{1}W_{2}^{2}]=3(\overline{\sigma}^{2}-\underline{\sigma}^{2})\frac{\overline{\sigma}}{\sqrt{2\pi}}>0>-\mathcal{E}[-(W_{1}+W_{2})^{3}], (4.7)

where we apply 2.25 based on the facts that both W1W_{1} and W2W_{2} have certain zero third moment as well as the results from 2.7:

[W1W22]>0 while [W12W2]=0.\mathcal{E}[W_{1}W_{2}^{2}]>0\text{ while }\mathcal{E}[W_{1}^{2}W_{2}]=0.
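A small Monte Carlo sketch (our own illustration, with hypothetical parameter values) makes 4.6 and 4.7 tangible: under semi-sequential independence every constant pair $(\sigma_{1},\sigma_{2})$ gives a numerically zero third moment for $W_{1}+W_{2}$, whereas the feedback choice $\sigma_{2}=\overline{\sigma}\mathds{1}_{\{\epsilon_{1}>0\}}+\underline{\sigma}\mathds{1}_{\{\epsilon_{1}\leq 0\}}$ (an element of $\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}$ appearing in 4.8 below) reproduces the positive value in 4.7.

```python
# Monte Carlo check of the third-moment (a)symmetry in Example 4.7.
import numpy as np

rng = np.random.default_rng(1)
s_low, s_high, n = 0.5, 1.0, 2_000_000
eps1, eps2 = rng.standard_normal(n), rng.standard_normal(n)

# Semi-sequential case: constant sigma_1, sigma_2 -> third moment of W1 + W2 is (numerically) zero.
for s1 in (s_low, s_high):
    for s2 in (s_low, s_high):
        print(s1, s2, np.mean((s1 * eps1 + s2 * eps2) ** 3))

# Sequential case: sigma_2 reacts to the sign of the first observation.
s2_fb = np.where(eps1 > 0, s_high, s_low)
mc = np.mean((s_high * eps1 + s2_fb * eps2) ** 3)
exact = 3 * (s_high**2 - s_low**2) * s_high / np.sqrt(2 * np.pi)   # the value in 4.7
print("Monte Carlo:", mc, " closed form:", exact)
```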

How can we understand the asymmetry of independence in 4.7 from the representations of the sublinear expectations? This question is answered by the following 4.8, which can be treated as a special case of 3.24.

Let Cs.polyC_{\text{s.poly}} denote a basic family of bivariate polynomials:

Cs.poly{φ:φ(x1,x2)=(ax1+bx2)n, or cx1px2q, with p,q,n,a,b,c}.C_{\text{s.poly}}\coloneqq\{\varphi:\varphi(x_{1},x_{2})=(ax_{1}+bx_{2})^{n},\text{ or }cx_{1}^{p}x_{2}^{q},\text{ with }p,q,n\in\mathbb{N},a,b,c\in\mathbb{R}\}.
Proposition 4.8.

(The joint distribution of two semi-GG-normal random variables under various independence: for a small family of φ\varphi’s) Consider Wi𝒩^(0,[σ¯2,σ¯2]),i=1,2W_{i}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]),i=1,2 and any φCs.poly\varphi\in C_{\text{s.poly}},

  • When W1SW2W_{1}\overset{\text{S}}{\dashrightarrow}W_{2}, we have

    [φ(W1,W2)]=max𝝈𝒮20[σ¯,σ¯]𝔼[φ(σ1ϵ1,σ2ϵ2)],\mathcal{E}[\varphi(W_{1},W_{2})]=\max_{\bm{\sigma}\in\mathcal{S}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2})],

    where

    𝒮20[σ¯,σ¯]\displaystyle\mathcal{S}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]} {𝝈=(σ1,σ2):(σ1,σ2){σ¯,σ¯}2}.\displaystyle\coloneqq\bigl{\{}\bm{\sigma}=(\sigma_{1},\sigma_{2}):(\sigma_{1},\sigma_{2})\in\{\underline{\sigma},\overline{\sigma}\}^{2}\bigr{\}}.
  • When W1W2W_{1}\dashrightarrow W_{2} or W1FW2W_{1}\overset{\text{F}}{\dashrightarrow}W_{2}, we have

    [φ(W1,W2)]=max𝝈20[σ¯,σ¯]𝔼[φ(σ1ϵ1,σ2ϵ2)],\mathcal{E}[\varphi(W_{1},W_{2})]=\max_{\bm{\sigma}\in\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2})],

    where

    \mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}\coloneqq\bigl\{\bm{\sigma}=(\sigma_{1},\sigma_{2}(\sigma_{1}\epsilon_{1})):\sigma_{1}\in\{\underline{\sigma},\overline{\sigma}\},\ \sigma_{2}(x)=\mathds{1}_{\{x>0\}}(\sigma_{22}-\sigma_{21})+\sigma_{21},\ (\sigma_{21},\sigma_{22})\in\{\underline{\sigma},\overline{\sigma}\}^{2}\bigr\}.
Remark 4.8.1.

4.8 provides us with the following intuitions:

  1. 1.

    we can directly see the difference between sequential and semi-sequential independence: under this basic setup, if W1SW2W_{1}\overset{\text{S}}{\dashrightarrow}W_{2}, we can use the upper envelope of a four-element set to represent (or compute) [φ(W1,W2)]\mathcal{E}[\varphi(W_{1},W_{2})], while an eight-element set is required when W1W2W_{1}\dashrightarrow W_{2}. Meanwhile, note that 𝒮20[σ¯,σ¯]20[σ¯,σ¯]\mathcal{S}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}\subset\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}: it indicates that sequential independence can cover a larger family of models compared with the semi-sequential one. This statement is confirmed in general by 3.24;

  2. 2.

    it reveals a more intuitive insight into why [(W1+W2)3]>0\mathcal{E}[(W_{1}+W_{2})^{3}]>0 under sequential independence: the set difference 20[σ¯,σ¯]𝒮20[σ¯,σ¯]\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}\setminus\mathcal{S}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]} contains those elements where σ2\sigma_{2} actually depends on the previous σ1ϵ1\sigma_{1}\epsilon_{1} (or simply the sign of ϵ1\epsilon_{1}). Although this kind of dependence does not create a shift in the mean part of W1+W2W_{1}+W_{2} (which is still zero), it has strong effects on the skewness of the distribution of W1+W2W_{1}+W_{2}. This phenomenon is related to the so-called leverage effect in the context of financial time series analysis. In the companion paper, we will use a dual-volatility regime-switching data example to give a statistical illustration of this phenomenon and show the necessity of discussing 20[σ¯,σ¯]\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}.

  3. 3.

    we can get one specific interpretation of the asymmetry in the sequential independence from the form of 20[σ¯,σ¯]\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]} - the roles of σ1\sigma_{1} and σ2\sigma_{2} are not symmetric: when W1W2W_{1}\dashrightarrow W_{2}, this sequential order means that W1W_{1} is realized first and the volatility part σ2[σ¯,σ¯]\sigma_{2}\in{[\underline{\sigma},\overline{\sigma}]} of W2W_{2} may or may not depend on the value of W1W_{1}, in a way that leaves the distributional uncertainty in W2W_{2} unchanged. In short, when we aggregate the uncertainty set from time 11 to 22, due to the sequential order of the data, σ2\sigma_{2} may be affected by a function of (σ1,ϵ1)(\sigma_{1},\epsilon_{1}), but σ1\sigma_{1} can essentially never be influenced by a function of (σ2,ϵ2)(\sigma_{2},\epsilon_{2}) (due to the restriction from the order of time). As opposed to this asymmetry, the semi-sequential independence is symmetric, as indicated by the form of 𝒮20[σ¯,σ¯]\mathcal{S}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}: the roles of σ1\sigma_{1} and σ2\sigma_{2} are symmetric, so we must have the same results for [W1W22]\mathcal{E}[W_{1}W_{2}^{2}] and [W12W2]\mathcal{E}[W_{1}^{2}W_{2}] under W1SW2W_{1}\overset{\text{S}}{\dashrightarrow}W_{2}.

  4. 4.

    More importantly, it also offers guidance on possible simulation studies in this framework. When one intends to generate a data sequence that goes through the scenarios covered by sequentially independent random variables, a more cautious attitude and in-depth thought are required: different blocks of samples with separate σi[σ¯,σ¯]\sigma_{i}\in{[\underline{\sigma},\overline{\sigma}]} may only go through 𝒮20[σ¯,σ¯]\mathcal{S}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}, which can be at most treated as semi-sequential independence rather than sequential independence. In order to reach the latter, one needs to generate those scenarios that allow possible classical dependence between σi\sigma_{i} and the previous (σk,ϵk)(\sigma_{k},\epsilon_{k}) with k<ik<i. Borrowing the language of a state-space model, if we treat σi\sigma_{i} as states and σiϵi\sigma_{i}\epsilon_{i} as observations, uncertainty in the dependence between current states and previous observations needs to be considered in order to sufficiently cover the uncertainty set of the sequential independence. Otherwise, it is likely to be at most semi-sequential independence. We will further discuss this point in Section 4.4 and in the companion paper of this work; a minimal numerical sketch of this distinction is given right after this list.
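
To make the distinction concrete, the following is a minimal Monte Carlo sketch in Python (our own illustration, not part of the formal development; the bounds \underline{\sigma}=0.5, \overline{\sigma}=1 and the test function \varphi(x_{1},x_{2})=x_{1}x_{2}^{2} are assumed for illustration). It scans the four-element set \mathcal{S}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]} and the eight-element set \mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]} of 4.8; the semi-sequential envelope stays near zero, while the sequential one approaches (\overline{\sigma}^{2}-\underline{\sigma}^{2})\overline{\sigma}/\sqrt{2\pi}, the value of \mathcal{E}[W_{1}W_{2}^{2}] implied by 4.7.

    import numpy as np

    rng = np.random.default_rng(0)
    sigma_lo, sigma_hi = 0.5, 1.0      # assumed illustrative bounds
    n = 10**6
    eps1 = rng.standard_normal(n)
    eps2 = rng.standard_normal(n)

    def phi(x1, x2):                   # the test function x1 * x2^2
        return x1 * x2**2

    # Semi-sequential independence: sigma_2 cannot react to eps_1, so we only
    # scan the four constant pairs in {sigma_lo, sigma_hi}^2 (the set S_2^0).
    semi_seq = max(
        np.mean(phi(s1 * eps1, s2 * eps2))
        for s1 in (sigma_lo, sigma_hi)
        for s2 in (sigma_lo, sigma_hi)
    )

    # Sequential independence: sigma_2 may also react to the sign of sigma_1*eps_1,
    # giving the eight-element set L_2^0 (constant rules appear as special cases).
    seq = max(
        np.mean(phi(s1 * eps1, np.where(eps1 > 0, s22, s21) * eps2))
        for s1 in (sigma_lo, sigma_hi)
        for s21 in (sigma_lo, sigma_hi)
        for s22 in (sigma_lo, sigma_hi)
    )

    print(semi_seq)   # approximately 0
    print(seq)        # approximately (sigma_hi**2 - sigma_lo**2) * sigma_hi / sqrt(2*pi)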

Example 4.9 (The direction of independence comes from finer structure).

In the GG-framework, a symmetric (or mutual) independence between two non-constant random variables only arises if they belong to either of the following two categories: classical distribution or maximal distribution (Hu and Li, (2014)). One interesting question is: what about the independence for combinations (such as products) of them? Logically speaking, if the combination does not fall into these two cases, they must have asymmetric independence, but where does this “asymmetry” come from? To be specific, suppose we have V=dV¯=d[σ¯,σ¯]V\overset{\text{d}}{=}\bar{V}\overset{\text{d}}{=}\mathcal{M}{[\underline{\sigma},\overline{\sigma}]} and ϵ=dϵ¯=d𝒩(0,[1,1])\epsilon\overset{\text{d}}{=}\bar{\epsilon}\overset{\text{d}}{=}\mathcal{N}(0,[1,1]). Meanwhile, assume independence VV¯V\dashrightarrow\bar{V} and ϵϵ¯\epsilon\dashrightarrow\bar{\epsilon}. By 2.27, we also have V¯V\bar{V}\dashrightarrow V and ϵ¯ϵ\bar{\epsilon}\dashrightarrow\epsilon. However, W=VϵW=V\epsilon and W¯=V¯ϵ¯\bar{W}=\bar{V}\bar{\epsilon} cannot be mutually independent by 2.26, because WW and W¯\bar{W} are neither classically nor maximally distributed. To further explain the interesting phenomenon here: VV and V¯\bar{V} are mutually independent, and so are ϵ\epsilon and ϵ¯\bar{\epsilon}. It seems that the roles in “VV versus V¯\bar{V}” and “ϵ\epsilon versus ϵ¯\bar{\epsilon}” are all “symmetric” and no “direction” has appeared yet. Nonetheless, when we consider the products WW and W¯\bar{W}, if they are independent, we must have a direction, either WW¯W\dashrightarrow\bar{W} or W¯W\bar{W}\dashrightarrow W, and there seems to be no middle stage where WW and W¯\bar{W} enjoy some degree of independence in which their roles are symmetric. One question we may ask is: does such a middle stage exist?

In this example, we will give an affirmative answer to this question. It turns out that the current conditions of independence are not enough; the relation depends on the structure of independence among the four objects V,V¯,ϵV,\bar{V},\epsilon and ϵ¯\bar{\epsilon}.

To be compatible with the assumed independence VV¯V\dashrightarrow\bar{V} and ϵϵ¯\epsilon\dashrightarrow\bar{\epsilon}, suppose we have additional sequential independence among the four objects. There are essentially four cases:

  1. 1.

    If VV¯ϵϵ¯V\dashrightarrow\bar{V}\dashrightarrow\epsilon\dashrightarrow\bar{\epsilon}, we have WSW¯,W\overset{\text{S}}{\dashrightarrow}\bar{W},

  2. 2.

    If V¯Vϵ¯ϵ,\bar{V}\dashrightarrow V\dashrightarrow\bar{\epsilon}\dashrightarrow\epsilon, we have W¯SW,\bar{W}\overset{\text{S}}{\dashrightarrow}W,

  3. 3.

    If VϵV¯ϵ¯V\dashrightarrow\epsilon\dashrightarrow\bar{V}\dashrightarrow\bar{\epsilon}, we have WW¯,W\dashrightarrow\bar{W},

  4. 4.

    If V¯ϵ¯Vϵ\bar{V}\dashrightarrow\bar{\epsilon}\dashrightarrow V\dashrightarrow\epsilon, we have W¯W.\bar{W}\dashrightarrow W.

Note that Cases 1 and 2 are equivalent (by 3.19): the semi-sequential independence between WW and W¯\bar{W} (which is imposed through the GG-version sequential independence among the four components) is symmetric. From 2.27, the first two cases are also equivalent to

V¯Vϵϵ¯,\bar{V}\dashrightarrow V\dashrightarrow\epsilon\dashrightarrow\bar{\epsilon},

or

VV¯ϵ¯ϵ.V\dashrightarrow\bar{V}\dashrightarrow\bar{\epsilon}\dashrightarrow\epsilon.

However, Cases 3 and 4 (sequential independence) are not equivalent (by 2.7).

Example 4.10 (The sequential independence is not an “order” with transitivity).

Although sequential independence has its “order”, it is not really an order relation with transitivity. In other words, for three random variables X,Y,ZX,Y,Z\in\mathcal{H}, the sequential independence XYX\dashrightarrow Y and YZY\dashrightarrow Z do not necessarily imply XZX\dashrightarrow Z. A trivial example: consider Zi,i=1,2Z_{i},i=1,2 both following maximal distributions; if we have Z1Z2Z_{1}\dashrightarrow Z_{2}, then we have Z2Z1Z_{2}\dashrightarrow Z_{1} (by 3.6), but we never have Z1Z1Z_{1}\dashrightarrow Z_{1}. A non-trivial example (with three distinct random variables) comes from the fully-sequential independence structure 3.20. For two semi-GG-normal objects W=VϵW=V\epsilon and W¯=V¯ϵ¯\bar{W}=\bar{V}\bar{\epsilon}, WFW¯W\overset{\text{F}}{\dashrightarrow}\bar{W} means

VϵV¯ϵ¯.V\dashrightarrow\epsilon\dashrightarrow\bar{V}\dashrightarrow\bar{\epsilon}.

By 2.19, we have VV¯V\dashrightarrow\bar{V} and ϵϵ¯\epsilon\dashrightarrow\bar{\epsilon}. By 2.27, the other direction of independence also holds: V¯V\bar{V}\dashrightarrow V and ϵ¯ϵ\bar{\epsilon}\dashrightarrow\epsilon. Then we have the following counter-example:

ϵ¯ϵ and ϵV¯ but ϵ¯V¯ does not hold,\bar{\epsilon}\dashrightarrow\epsilon\text{ and }\epsilon\dashrightarrow\bar{V}\text{ but }\bar{\epsilon}\dashrightarrow\bar{V}\text{ does not hold},

because we already have V¯ϵ¯\bar{V}\dashrightarrow\bar{\epsilon} but the pair (V¯,ϵ¯)(\bar{V},\bar{\epsilon}) cannot have mutual independence by 2.26. Similarly, we have another example

V¯V and Vϵ but V¯ϵ does not hold.\bar{V}\dashrightarrow V\text{ and }V\dashrightarrow\epsilon\text{ but }\bar{V}\dashrightarrow\epsilon\text{ does not hold.}

4.4 What kinds of sequences are not GG-normal?

Consider the following examples:

  1. 1.

    generate YiN(0,σi2)Y_{i}\sim N(0,\sigma_{i}^{2}) with σiUnif[σ¯,σ¯]\sigma_{i}\sim\text{Unif}{[\underline{\sigma},\overline{\sigma}]}, i=1,2,,n.i=1,2,\dotsc,n. This is essentially an i.i.d. sample from a normal mixture (with the scale parameter following a uniform distribution). This point is not affected by the distribution of σ\sigma (as long as σ\sigma follows a fixed distribution). The whole data sequence YiY_{i} does not have any distributional uncertainty.

  2. 2.

    first generate σiUnif[σ¯,σ¯]\sigma_{i}\sim\text{Unif}{[\underline{\sigma},\overline{\sigma}]}, i=1,2,,ni=1,2,\dotsc,n, then generate YijN(0,σi2)Y_{ij}\sim N(0,\sigma_{i}^{2}) with j=1,2,,mj=1,2,\dotsc,m. By introducing this blocking design, even though we pretend to treat the switching rule of σi\sigma_{i} as unknown here (although it may not be so hard for a data analyst to observe this pattern), if we look at the uncertainty set considered in this generation scheme, \{N(0,\sigma^{2}):\sigma\in{[\underline{\sigma},\overline{\sigma}]}\}, it is actually at most a pseudo simulation of the semi-GG-normal distribution. One typical feature of this sample is that it does not have skewness-uncertainty: it has a certain zero skewness.

  3. 3.

    consider equal-spaced grid

    σ¯=σ1<σ2<<σm=σ¯.\underline{\sigma}=\sigma_{1}<\sigma_{2}<\dotsc<\sigma_{m}=\overline{\sigma}.

    For each σi\sigma_{i} with i=1,2,,mi=1,2,\dotsc,m, generate YijN(0,σi2)Y_{ij}\sim N(0,\sigma_{i}^{2}), j=1,2,,nj=1,2,\dotsc,n. Then treat

    max1im1nj=1nφ(Yij),\max_{1\leq i\leq m}\frac{1}{n}\sum_{j=1}^{n}\varphi(Y_{ij}), (4.8)

    as an approximation of \mathcal{E}[\varphi(\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]))]. This kind of scheme has been used in some of the literature (such as Deng et al., (2019); Fei and Fei, (2019)). We may cautiously step back and ask ourselves: is this a valid approximation? Not really; it is actually an approximation of \mathcal{E}[\varphi(\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]))]:

    max1im1nj=1nφ(Yij)nmax1im𝔼[φ(σiϵ)]mmaxσ[σ¯,σ¯]𝔼[φ(σϵ)],\max_{1\leq i\leq m}\frac{1}{n}\sum_{j=1}^{n}\varphi(Y_{ij})\overset{n\to\infty}{\to}\max_{1\leq i\leq m}\mathbb{E}[\varphi(\sigma_{i}\epsilon)]\overset{m\to\infty}{\to}\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)],

    where the first convergence can be treated as a classical almost sure convergence and the second one is a deterministic one due to the design of {σi}i=1m\{\sigma_{i}\}_{i=1}^{m}. This fact does not change even if we use some overlapping groups, because each group can be at most treated as a sample from a normal mixture. Again, the problem of the above-mentioned scheme is that it could be misleading for a general audience. It is actually going through the uncertainty set of 𝒩^(0,[σ¯2,σ¯2])\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) under semi-sequential independence rather than under sequential independence. For general φ\varphi, only in the latter case will the normalized sum be close to the GG-normal distribution. However, this issue can be fixed by considering an extra step: if the function φ\varphi considered in the question can be proved to be a convex or concave one, then in this practical sense, by 3.24.2, the semi-sequential independence and sequential independence can be treated as the same. For general fixed φ\varphi, we usually need to consider the GG-EM procedure to do the approximation as discussed in Section 4.1, which is also closely related to Section 4 in Fang et al., (2019). Alternatively, we may consider a small family of φ\varphi’s so that we have a finite-dimensional set of distributions to go through (such as the one in 4.8). In this way, we can get a feasible approximation based on the idea of max-mean estimation by Jin and Peng, (2021) similar to the form 4.8. (A minimal numerical sketch of the scheme 4.8 is given right after this list.)
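
As a small illustration (our own sketch, not taken from the cited works; the bounds \underline{\sigma}=0.5, \overline{\sigma}=1, the grid size m=21 and the test function are assumptions), the scheme 4.8 takes only a few lines of Python, and its output approximates the semi-GG-normal quantity \max_{\sigma\in[\underline{\sigma},\overline{\sigma}]}\mathbb{E}[\varphi(\sigma\epsilon)] rather than the GG-normal sublinear expectation:

    import numpy as np

    rng = np.random.default_rng(1)
    sigma_lo, sigma_hi, m, n = 0.5, 1.0, 21, 10**5
    grid = np.linspace(sigma_lo, sigma_hi, m)   # equal-spaced grid of sigma values

    def phi(x):                                 # a bounded, non-convex test function
        return np.sin(x)**2

    # max over the grid of the group sample means, as in (4.8); each grid point
    # uses its own fresh normal sample of size n
    approx = max(np.mean(phi(s * rng.standard_normal(n))) for s in grid)
    print(approx)   # ~ max over sigma of E[phi(sigma * eps)], the semi-G-normal value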

The idea of this section will be further illustrated in the companion paper by a series of data experiments.

5 Conclusions and extensions

For a researcher or practitioner from various backgrounds who may not be familiar with the notion of nonlinear expectation, but is comfortable with classical probability theory and the normal distribution, when they try to understand the GG-normal starting from the classical normal, it will be intuitive and beneficial if there exists an intermediate structure that can be directly transformed from the classical normal and also creates a bridge towards the GG-normal. Another thinking gap is from the classical independence (symmetric) to the GG-version sequential independence (asymmetric). It will be useful if we have an intermediate stage of independence that is under distributional uncertainty but preserves the symmetry, so that it is associated with our common first impression of the relation between two static, separate random objects, both with distributional uncertainty, but with no sequential order assumed. Once we talk about two objects with a sequential order or in a dynamic way, it becomes possible to involve the sequential independence.

This paper has rigorously set up and discussed the semi-GG-normal distribution with its own types of independence, especially the semi-sequential independence. The hybrid roles of these new substructures, the semi-GG-normal with its semi-sequential independence, can be summarized as follows:

  1. 1.

    The semi-GG-normal 𝒩^(0,[σ¯2,σ¯2])\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) is closely related to the classical normal in that it is simply a classical normal N(0,1)N(0,1) scaled by a GG-version constant V[σ¯,σ¯]V\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]} (with a typical independence). Hence the semi-GG-normal only exhibits moment uncertainty at even orders, while its odd moments, especially the third moment (related to skewness), are preserved to be zero. Meanwhile, the semi-GG-normal is also closely connected with the GG-normal: they have the same sublinear expectation under a convex or concave transformation φ\varphi. For general φ\varphi, they are connected by using the GG-version central limit theorem.

  2. 2.

    The semi-GG-normal 𝒩^(0,[σ¯2,σ¯2])\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) with semi-sequential independence also preserves the properties of the classical normal in the multivariate situation (3.22).

  3. 3.

    4.9 shows the hybrid and intermediate role of semi-sequential independence between the classical and the sequential independence. The semi-sequential independence is related to the classical one in the sense that it is symmetric (W1SW2W_{1}\overset{\text{S}}{\dashrightarrow}W_{2} implies W2SW1W_{2}\overset{\text{S}}{\dashrightarrow}W_{1}) and it is also related to the sequential independence under convex or concave φ\varphi as illustrated in 3.24.2.

We can use a comparison table (Table 1) to summarize the hybrid roles of this substructure: semi-GG-normal with semi-sequential independence, creating a bridge connecting classical normal and GG-normal.

Table 1: Comparison among normal, semi-GG-normal and GG-normal
Normal Semi-GG-normal GG-normal
N(0,σ2)N(0,\sigma^{2}) 𝒩^(0,[σ¯2,σ¯2])\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) 𝒩(0,[σ¯2,σ¯2])\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}])
Expectation Linear Sublinear Sublinear
1st-moment Certain (0) Certain (0) Certain (0)
2nd-moment Certain (σ2\sigma^{2}) Uncertain ([σ¯2,σ¯2][\underline{\sigma}^{2},\overline{\sigma}^{2}]) Uncertain ([σ¯2,σ¯2][\underline{\sigma}^{2},\overline{\sigma}^{2}])
3rd-moment Certain (0) Certain (0) Uncertain
Independence Classical (symmetric) Semi-sequential (\overset{\text{S}}{\dashrightarrow}, symmetric) Sequential (\dashrightarrow, asymmetric)
(Setup) X\overset{\text{d}}{=}\bar{X}\overset{\text{d}}{=}N(0,\sigma^{2}), X and \bar{X} classically independent X\overset{\text{d}}{=}\bar{X}\overset{\text{d}}{=}\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]), X\overset{\text{S}}{\dashrightarrow}\bar{X} X\overset{\text{d}}{=}\bar{X}\overset{\text{d}}{=}\mathcal{N}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]), X\dashrightarrow\bar{X}
Stability X+X¯=d2XX+\bar{X}\overset{\text{d}}{=}\sqrt{2}X X+X¯=d2XX+\bar{X}\overset{\text{d}}{=}\sqrt{2}X X+X¯=d2XX+\bar{X}\overset{\text{d}}{=}\sqrt{2}X
Multivariate (X,X¯)=dN(𝟎,σ2𝐈2)(X,\bar{X})\overset{\text{d}}{=}N(\bm{0},\sigma^{2}\mathbf{I}_{2}) (X,X¯)=d𝒩^(𝟎,[σ¯2,σ¯2]𝐈2)(X,\bar{X})\overset{\text{d}}{=}\hat{\mathcal{N}}(\bm{0},[\underline{\sigma}^{2},\overline{\sigma}^{2}]\mathbf{I}_{2}) (X,X¯)d𝒩(𝟎,[σ¯2,σ¯2]𝐈2)(X,\bar{X})\overset{\text{d}}{\neq}\mathcal{N}(\bm{0},[\underline{\sigma}^{2},\overline{\sigma}^{2}]\mathbf{I}_{2})

Furthermore, we hope the substructures proposed in this paper will open up new extensions of the discussion on the differences and connections between the GG-expectation framework and the classical one, by providing more details on the operations on distributions and independence on a ground where researchers in both areas can have a proper overlapping intuition. In this way, we are able to have a finer discussion of model uncertainty in the dynamic situation where the volatility part is ambiguous or cannot be precisely determined for data analysts.

Next we will give several possible directions for extending this paper. Interestingly, the discussion in Section 5.5 actually shows the vision that, after introducing the tool of sublinear expectation with representations in different situations, we are able to extend our horizon of statistical questions to include those that could be too complicated to handle under the classical probability system.

5.1 The semi-GG-family of distributions

We can extend our current notion of semi-GG-normal distribution to a broader class of semi-GG-family of distributions or the class of semi-GG-version distributions.

For simplicity, we only provide this notion in one-dimensional case.

Definition 5.1.

A random variable WW follows a semi-GG-version distribution if there exists a maximally distributed 𝒁(Θ)\bm{Z}\sim\mathcal{M}(\Theta) (where Θk\Theta\subset\mathbb{R}^{k} is a bounded, closed and convex set) and a classically distributed ϵ\epsilon, satisfying

Zϵ,Z\dashrightarrow\epsilon,

such that

W=f(Z,ϵ),W=f(Z,\epsilon),

for a Borel measurable function ff satisfying f(Z,ϵ)f(Z,\epsilon)\in\mathcal{H}.

Remark 5.1.1.

The three types of independence in 3.16 can be carried over to members in semi-GG-family of distributions. For instance, with Wi=fi(Zi,ϵi),i=1,2,,nW_{i}=f_{i}(Z_{i},\epsilon_{i}),i=1,2,\dotsc,n, they are called semi-sequentially independent if

Z1Z2Znϵ1ϵ2ϵn.Z_{1}\dashrightarrow Z_{2}\dashrightarrow\cdots\dashrightarrow Z_{n}\dashrightarrow\epsilon_{1}\dashrightarrow\epsilon_{2}\dashrightarrow\cdots\dashrightarrow\epsilon_{n}.
Remark 5.1.2.

Most classical distributions should exist in this framework. To give a quick validation, note that there exists ϵ𝒩(0,[1,1])\epsilon\sim\mathcal{N}(0,[1,1]) which follows the standard normal distribution N(0,1)N(0,1), with the classical cumulative distribution function (cdf) Φ\Phi defined as

Φ(x)[𝟙{ϵx}]=𝔼[𝟙{ϵx}].\Phi(x)\coloneqq\mathcal{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon\leq x}}\right\}}]=\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon\leq x}}\right\}}].

(Such cdf can be defined using the solution of the classical heat equation.) Let

UΦ(ϵ).U\coloneqq\Phi(\epsilon).

Note that Φ\Phi is a bounded and continuous function, so UU\in\mathcal{H}. Then we can check that UU follows the classical Unif[0,1]\text{Unif}[0,1] distribution. Next we can use the classical inverse-cdf method. For any classical distribution with cdf FF (whether continuous or not), let

F1(y)inf{x;F(x)y},F^{-1}(y)\coloneqq\inf\{x;F(x)\geq y\}, (5.1)

denote the generalized inverse of FF. Let XF1(U).X\coloneqq F^{-1}(U). We only need to add suitable conditions on FF so that [|X|]<\mathcal{E}[\lvert X\rvert]<\infty, ensuring XX\in\mathcal{H}. Then we get a random object XX following the distribution with cdf FF.

Remark 5.1.3.

Note that we assume Θ\Theta to be a bounded, closed and convex set for theoretical convenience. In practice, this condition can be weakened. For instance, when we talk about the semi-GG-normal random variable W=VϵW=V\epsilon where V[σ¯,σ¯]V\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]}, the interval [σ¯,σ¯]{[\underline{\sigma},\overline{\sigma}]} can be changed to {σ¯,σ¯}\{\underline{\sigma},\overline{\sigma}\} or {σ¯,(σ¯+σ¯)/2,σ¯}\{\underline{\sigma},(\underline{\sigma}+\overline{\sigma})/2,\overline{\sigma}\}.

Example 5.2.

Here are several special examples of semi-GG-version distributions.

  1. 1.

    Consider Z[σ¯,σ¯]Z\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]}, ϵN(0,1)\epsilon\sim N(0,1) and f(x,y)=xyf(x,y)=xy, then WW follows the semi-GG-normal distribution, whose distributional uncertainty can be characterized by

    {N(0,σ2),σ[σ¯,σ¯]}.\{N(0,\sigma^{2}),\sigma\in{[\underline{\sigma},\overline{\sigma}]}\}.
  2. 2.

    Consider Z=(U,V)([μ¯,μ¯]×[σ¯,σ¯])Z=(U,V)\sim\mathcal{M}([\underline{\mu},\overline{\mu}]\times{[\underline{\sigma},\overline{\sigma}]}), ϵN(0,1)\epsilon\sim N(0,1) and

    W=U+Vϵ.W=U+V\epsilon.

    Then the distributional uncertainty of WW can be described as

    {N(μ,σ2),μ[μ¯,μ¯],σ[σ¯,σ¯]}.\{N(\mu,\sigma^{2}),\mu\in[\underline{\mu},\overline{\mu}],\sigma\in{[\underline{\sigma},\overline{\sigma}]}\}.

    We can also show that [φ(W)]\mathcal{E}[\varphi(W)] can cover a family of normal mixture models.

  3. 3.

    (Semi-GG-exponential) Let Z[λ¯,λ¯]Z\sim\mathcal{M}[\underline{\lambda},\overline{\lambda}] and ϵexp(1)\epsilon\sim\exp(1). Consider

    W=Zϵ,W=Z\epsilon,

    then we can check that the distributional uncertainty of WW can be written as

    {exp(λ),λ[λ¯,λ¯]},\{\exp(\lambda),\lambda\in[\underline{\lambda},\overline{\lambda}]\},

    where each exp(λ)\exp(\lambda) has pdf f(x)=1λex/λ𝟙{x0}f(x)=\frac{1}{\lambda}e^{-x/\lambda}\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{x\geq 0}}\right\}}.

  4. 4.

    (Semi-GG-Bernoulli) With 0p¯p¯10\leq\underline{p}\leq\overline{p}\leq 1, let ϵUnif[0,1]\epsilon\sim\text{Unif}[0,1], and

    Z[p¯,p¯].Z\sim\mathcal{M}[\underline{p},\overline{p}].

    Consider

    W=𝟙{ϵZ<0}.W=\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon-Z<0}}\right\}}.

    Then WW has distributional uncertainty

    {Bern(p),p[p¯,p¯]}.\{\text{Bern}(p),p\in[\underline{p},\overline{p}]\}.
Example 5.3.

In general, we can take advantage of the idea of the classical inverse-cdf method to design the transformation ff. Then we are able to consider any distributional uncertainty in the form of

{F(x;θ),θΘ},\{F(x;\theta),\theta\in\Theta\}, (5.2)

where F(x;θ)F(x;\theta) is the cdf of a classical distribution with parameter θ\theta. Let F1(y;θ)F^{-1}(y;\theta) denote the generalized inverse of F(x;θ)F(x;\theta) as shown in 5.1. Consider 𝒁(Θ)\bm{Z}\sim\mathcal{M}(\Theta) and ϵUnif[0,1]\epsilon\sim\text{Unif}[0,1], and

W=F1(ϵ,𝒁).W=F^{-1}(\epsilon,\bm{Z}).

After we add more conditions on FF such that WW\in\mathcal{H}, WW has distributional uncertainty in the form 5.2, because

[φ(W)]=sup𝒛Θ𝔼[φ(F1(ϵ,z))],\mathcal{E}[\varphi(W)]=\sup_{\bm{z}\in\Theta}\mathbb{E}[\varphi(F^{-1}(\epsilon,z))],

where F1(ϵ,z)F^{-1}(\epsilon,z) follows the distribution with cdf F(x;z)F(x;z).
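
As a small illustration of this construction (our own sketch; the bounds \underline{\lambda}=1, \overline{\lambda}=2 and the grid of 11 parameter values are assumptions), the semi-GG-exponential of Example 5.2 can be generated through the generalized inverse of the exponential cdf, and its upper and lower means can then be approximated by taking a maximum over a grid of the parameter:

    import numpy as np

    rng = np.random.default_rng(2)
    lam_lo, lam_hi, n = 1.0, 2.0, 10**6
    u = rng.uniform(size=n)                         # classical eps ~ Unif[0,1]

    def inv_cdf_exp(u, lam):
        # generalized inverse of the exp(lambda) cdf with pdf (1/lambda) e^{-x/lambda}
        return -lam * np.log1p(-u)

    def upper_expectation(phi, grid):
        # E[phi(W)] approximated by the max over a grid of the parameter
        return max(np.mean(phi(inv_cdf_exp(u, lam))) for lam in grid)

    grid = np.linspace(lam_lo, lam_hi, 11)
    print(upper_expectation(lambda w: w, grid))     # ~ lam_hi  (upper mean)
    print(-upper_expectation(lambda w: -w, grid))   # ~ lam_lo  (lower mean)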

To further study the properties of semi-GG-version distributions and the semi-sequential independence, let ¯s\bar{\mathcal{H}}_{s} denote the semi-GG-family of distributions:

¯s{X:X=f(V,ϵ),V(Θ),ϵ classical,Vϵ}.\bar{\mathcal{H}}_{s}\coloneqq\{X\in\mathcal{H}:X=f(V,\epsilon),V\sim\mathcal{M}(\Theta),\epsilon\text{ classical},V\dashrightarrow\epsilon\}.

Note that ¯s\bar{\mathcal{H}}_{s} satisfies:

  1. 1.

    If X¯sX\in\bar{\mathcal{H}}_{s}, aX¯saX\in\bar{\mathcal{H}}_{s} for any aa\in\mathbb{R},

  2. 2.

    If X¯sX\in\bar{\mathcal{H}}_{s}, |X|¯s\lvert X\rvert\in\bar{\mathcal{H}}_{s},

  3. 3.

    For X,Y¯sX,Y\in\text{$\bar{\mathcal{H}}_{s}$}, if XSYX\overset{\text{S}}{\dashrightarrow}Y, X+Y¯s.X+Y\in\bar{\mathcal{H}}_{s}.

For any X,Y¯sX,Y\in\bar{\mathcal{H}}_{s}, XSYX\overset{\text{S}}{\dashrightarrow}Y is equivalent to YSXY\overset{\text{S}}{\dashrightarrow}X (by the symmetry of semi-sequential independence as illustrated by 3.19), so we can omit the direction between XX and YY and refer to the mutual semi-sequential independence between them as semi-GG-independence.

Definition 5.4.

(Semi-GG-independence) For any X,Y¯sX,Y\in\bar{\mathcal{H}}_{s}, with X=f(Vx,ϵx)X=f(V_{x},\epsilon_{x}) and Y=g(Vy,ϵy)Y=g(V_{y},\epsilon_{y}), XX and YY are semi-GG-independent if

  1. 1.

    (Vx,Vy)(ϵx,ϵy)(V_{x},V_{y})\dashrightarrow(\epsilon_{x},\epsilon_{y}),

  2. 2.

    VxVyV_{x}\dashrightarrow V_{y} (which is equivalent to VyVxV_{y}\dashrightarrow V_{x}),

  3. 3.

    ϵx,ϵy\epsilon_{x},\epsilon_{y} are classically independent.

Definition 5.5.

(Semi-GG-independence of sequence) For a sequence {Xi}i=1n¯s\{X_{i}\}_{i=1}^{n}\subset\bar{\mathcal{H}}_{s} with Xi=fi(Vi,ϵi)X_{i}=f_{i}(V_{i},\epsilon_{i}), they are (mutually) semi-GG-independent if

  1. 1.

    (V1,V2,,Vn)(ϵ1,ϵ2,,ϵn)(V_{1},V_{2},\dotsc,V_{n})\dashrightarrow(\epsilon_{1},\epsilon_{2},\dotsc,\epsilon_{n}),

  2. 2.

    {Vi}i=1n\{V_{i}\}_{i=1}^{n} are GG-version (sequentially) independent (that is, V1V2Vn{V}_{1}\dashrightarrow{V}_{2}\dashrightarrow\cdots\dashrightarrow{V}_{n}),

  3. 3.

    {ϵi}i=1n\{\epsilon_{i}\}_{i=1}^{n} are classically independent.

A sequence {Xi}i=1n\{X_{i}\}_{i=1}^{n} is called semi-GG-version i.i.d. (or semi-GG-i.i.d.) if the XiX_{i} are identically distributed and semi-GG-independent.

In the following context, consider X,Y¯sX,Y\in\bar{\mathcal{H}}_{s}, with X=f(Vx,ϵx)X=f(V_{x},\epsilon_{x}) and Y=g(Vy,ϵy)Y=g(V_{y},\epsilon_{y}).

Proposition 5.6.

If XX and YY are semi-GG-independent, for the joint vector (X,Y)(X,Y), we have for any φCb.Lip\varphi\in C_{\mathrm{b.Lip}},

[φ(X,Y)]=sup(vx,vy)[σ¯x,σ¯x]×[σ¯y,σ¯y]𝔼[φ(f(vx,ϵx),g(vy,ϵy))].\mathcal{E}[\varphi(X,Y)]=\sup_{(v_{x},v_{y})\in[\underline{\sigma}_{x},\overline{\sigma}_{x}]\times[\underline{\sigma}_{y},\overline{\sigma}_{y}]}\mathbb{E}[\varphi(f(v_{x},\epsilon_{x}),g(v_{y},\epsilon_{y}))].
Proof.

This is a direct consequence of the definition of semi-GG-independence. ∎

For any vx[σ¯x,σ¯x]v_{x}\in[\underline{\sigma}_{x},\overline{\sigma}_{x}] and vy[σ¯y,σ¯y]v_{y}\in[\underline{\sigma}_{y},\overline{\sigma}_{y}], let

h1(vx)𝔼[f(vx,ϵx)] and h2(vy)=𝔼[g(vy,ϵy)].h_{1}(v_{x})\coloneqq\mathbb{E}[f(v_{x},\epsilon_{x})]\text{ and }h_{2}(v_{y})=\mathbb{E}[g(v_{y},\epsilon_{y})].

Then

[X]=[h1(Vx)] and [Y]=[h2(Vy)].\mathcal{E}[X]=\mathcal{E}[h_{1}(V_{x})]\text{ and }\mathcal{E}[Y]=\mathcal{E}[h_{2}(V_{y})].

In the following context, for simplicity of discussion, we assume h1,h2h_{1},h_{2} are continuous functions. Then we can take the maximum over the rectangle [σ¯x,σ¯x]×[σ¯y,σ¯y][\underline{\sigma}_{x},\overline{\sigma}_{x}]\times[\underline{\sigma}_{y},\overline{\sigma}_{y}] in 5.6. (This assumption can be relaxed whenever sup\sup does not affect the derivation.) Readers may find that, under the semi-GG-independence, the manipulation of sublinear expectations of semi-GG-version objects becomes quite intuitive and flexible.

Proposition 5.7.

If XX and YY are semi-GG-independent, we have

[X+Y]=[X]+[Y].\mathcal{E}[X+Y]=\mathcal{E}[X]+\mathcal{E}[Y].
Proof.

By 5.6, we have

[X+Y]\displaystyle\mathcal{E}[X+Y] =max(vx,vy)[σ¯x,σ¯x]×[σ¯y,σ¯y]𝔼[f(vx,ϵx)+g(vy,ϵy)]\displaystyle=\max_{(v_{x},v_{y})\in[\underline{\sigma}_{x},\overline{\sigma}_{x}]\times[\underline{\sigma}_{y},\overline{\sigma}_{y}]}\mathbb{E}[f(v_{x},\epsilon_{x})+g(v_{y},\epsilon_{y})]
=max(vx,vy)[σ¯x,σ¯x]×[σ¯y,σ¯y][h1(vx)+h2(vy)]\displaystyle=\max_{(v_{x},v_{y})\in[\underline{\sigma}_{x},\overline{\sigma}_{x}]\times[\underline{\sigma}_{y},\overline{\sigma}_{y}]}[h_{1}(v_{x})+h_{2}(v_{y})]
=maxvxmaxvy[h1(vx)+h2(vy)]\displaystyle=\max_{v_{x}}\max_{v_{y}}[h_{1}(v_{x})+h_{2}(v_{y})]
=maxvx[h1(vx)+maxvyh2(vy)]\displaystyle=\max_{v_{x}}[h_{1}(v_{x})+\max_{v_{y}}h_{2}(v_{y})]
=maxvxh1(vx)+maxvyh2(vy)=[X]+[Y].\displaystyle=\max_{v_{x}}h_{1}(v_{x})+\max_{v_{y}}h_{2}(v_{y})=\mathcal{E}[X]+\mathcal{E}[Y].

Remark 5.7.1.

Compared with 2.25, for X,Y¯sX,Y\in\bar{\mathcal{H}}_{s}, we have one more situation for [X+Y]=[X]+[Y]\mathcal{E}[X+Y]=\mathcal{E}[X]+\mathcal{E}[Y] to hold:

  1. 1.

    either XX or YY has mean-certainty,

  2. 2.

    XYX\dashrightarrow Y or YXY\dashrightarrow X,

  3. 3.

    XX and YY are semi-GG-independent.

Proposition 5.8.

If XX and YY are semi-GG-independent and either one of them has certain mean zero, we have

[XY]=[XY]=0.\mathcal{E}[XY]=-\mathcal{E}[-XY]=0.
Proof.

Since XX and YY are semi-GG-independent, by 5.6,

[XY]\displaystyle\mathcal{E}[XY] =max(vx,vy)[σ¯x,σ¯x]×[σ¯y,σ¯y]𝔼[f(vx,ϵx)g(vy,ϵy)]\displaystyle=\max_{(v_{x},v_{y})\in[\underline{\sigma}_{x},\overline{\sigma}_{x}]\times[\underline{\sigma}_{y},\overline{\sigma}_{y}]}\mathbb{E}[f(v_{x},\epsilon_{x})g(v_{y},\epsilon_{y})]
=max(vx,vy)[σ¯x,σ¯x]×[σ¯y,σ¯y]h1(vx)h2(vy).\displaystyle=\max_{(v_{x},v_{y})\in[\underline{\sigma}_{x},\overline{\sigma}_{x}]\times[\underline{\sigma}_{y},\overline{\sigma}_{y}]}h_{1}(v_{x})h_{2}(v_{y}).

If either one of them has a certain zero mean, say [X]=[X]=0\mathcal{E}[X]=-\mathcal{E}[-X]=0, we have

maxvx[σ¯x,σ¯x]𝔼[f(vx,ϵx)]=minvx[σ¯x,σ¯x]𝔼[f(vx,ϵx)]=0,\max_{v_{x}\in[\underline{\sigma}_{x},\overline{\sigma}_{x}]}\mathbb{E}[f(v_{x},\epsilon_{x})]=\min_{v_{x}\in[\underline{\sigma}_{x},\overline{\sigma}_{x}]}\mathbb{E}[f(v_{x},\epsilon_{x})]=0,

It means h1(vx)=0h_{1}(v_{x})=0 for any vx[σ¯x,σ¯x]v_{x}\in[\underline{\sigma}_{x},\overline{\sigma}_{x}]. Then we must have [XY]=0\mathcal{E}[XY]=0 and similarly [XY]=0-\mathcal{E}[-XY]=0 by changing max\max to min\min. ∎

5.2 The semi-GG-version of central limit theorem

After setting up the semi-GG-family of distributions and semi-sequential independence, it turns out we can prove a semi-GG-version of central limit theorem in this context, which further brings a substructure connecting the classical central limit theorem with the GG-version central limit theorem. It also shows the central role of semi-GG-normal in a semi-GG-version class of distributions.

First we consider a subset of ¯s\bar{\mathcal{H}}_{s}:

s{X¯s:X=Vϵ,V[σ¯,σ¯] with 0σ¯σ¯ and the classical ϵ is standardized},\mathcal{H}_{s}\coloneqq\{X\in\bar{\mathcal{H}}_{s}:X=V\epsilon,V\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]}\text{ with }0\leq\underline{\sigma}\leq\overline{\sigma}\text{ and the classical }\epsilon\text{ is standardized}\},

where we say a classical ϵ\epsilon is standardized if 𝔼[ϵ]=0\mathbb{E}[\epsilon]=0 and 𝔼[ϵ2]=1\mathbb{E}[\epsilon^{2}]=1. Here s\mathcal{H}_{s} can be treated as a class of semi-GG-distributions with zero mean and variance uncertainty.

Our current version of the semi-GG-version of central limit theorem can be formulated as follows.

Theorem 5.9.

For any sequence {Xi}i=1n={Viηi}i=1ns\{X_{i}\}_{i=1}^{n}=\{V_{i}\eta_{i}\}_{i=1}^{n}\subset\mathcal{H}_{s} that are semi-GG-i.i.d. with certain zero mean and uncertain variance:

σ¯2=[X12][X12]=σ¯2 with 0σ¯σ¯,\underline{\sigma}^{2}=-\mathcal{E}[-X_{1}^{2}]\leq\mathcal{E}[X_{1}^{2}]=\overline{\sigma}^{2}\text{ with }0\leq\underline{\sigma}\leq\overline{\sigma},

we have

1ni=1nXidW,\frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i}\overset{\text{d}}{\longrightarrow}W,

where W𝒩^(0,[σ¯2,σ¯2])W\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]). To be specific, for any bounded and continuous φ\varphi, we have

limn[φ(1ni=1nXi)]=[φ(W)].\lim_{n\to\infty}\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i})]=\mathcal{E}[\varphi(W)]. (5.3)
Remark 5.9.1.

Note that any φCb.Lip\varphi\in C_{\mathrm{b.Lip}} must be a bounded and continuous one, so the convergence in distribution (2.5) must hold.

Remark 5.9.2.

From a classical perspective on 5.9, by using the representation of the semi-GG-normal under semi-GG-independence, we have

limnsup𝝈[σ¯,σ¯]nE[φ(1ni=1nσiϵi)]=supv[σ¯,σ¯]E[φ(vϵ)],\lim_{n\to\infty}\sup_{\bm{\sigma}\in[\underline{\sigma},\overline{\sigma}]^{n}}E[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sigma_{i}\epsilon_{i})]=\sup_{v\in[\underline{\sigma},\overline{\sigma}]}E[\varphi(v\epsilon^{*})], (5.4)

where ϵN(0,1)\epsilon^{*}\sim N(0,1) and 𝝈=(σ1,,σn)\bm{\sigma}=(\sigma_{1},\dotsc,\sigma_{n}) is a scalar vector. This form is also equivalent to

limnsup𝝈Sn[σ¯,σ¯]E[φ(1ni=1nσiϵi)]=supv[σ¯,σ¯]E[φ(vϵ)],\lim_{n\to\infty}\sup_{\bm{\sigma}\in S_{n}{[\underline{\sigma},\overline{\sigma}]}}E[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sigma_{i}\epsilon_{i})]=\sup_{v\in[\underline{\sigma},\overline{\sigma}]}E[\varphi(v\epsilon^{*})],

where 𝝈\bm{\sigma} could be any hidden process taking values in [σ¯,σ¯]{[\underline{\sigma},\overline{\sigma}]} that is independent of (ϵ1,ϵ2,,ϵn)(\epsilon_{1},\epsilon_{2},\dotsc,\epsilon_{n}). When the unknown variance form is taken in this way, the uncertainty in the behavior of the normalized summation can be asymptotically characterized by the semi-GG-normal.

If 𝝈\bm{\sigma} is chosen from a larger family that may involve dependence between σi\sigma_{i} and the previous (ϵj,j<i)(\epsilon_{j},j<i), then it will be related to the GG-version central limit theorem (under sequential independence, rather than the semi-GG-version independence): when the XiX_{i} are sequentially independent,

limn[φ(1ni=1nXi)]=[φ(WG)],\lim_{n\to\infty}\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i})]=\mathcal{E}[\varphi(W^{G})],

which gives us

limnsup𝝈n[σ¯,σ¯]E[φ(1ni=1nσiϵi)]=[φ(WG)].\lim_{n\to\infty}\sup_{\bm{\sigma}\in\mathcal{L}_{n}^{*}{[\underline{\sigma},\overline{\sigma}]}}E[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sigma_{i}\epsilon_{i})]=\mathcal{E}[\varphi(W^{G})].

To summarize, the semi-GG-normal distribution can be treated as the attractor for normalized summations of semi-GG-i.i.d. random variables, while the GG-normal is the attractor for normalized summations of GG-version i.i.d. random variables.
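
The contrast between the two attractors can be made tangible by a small Monte Carlo sketch in Python (our own illustration; the bounds \underline{\sigma}=0.5, \overline{\sigma}=1, the sample sizes and the particular feedback rule are assumptions). When \sigma_{i} is generated without looking at the \epsilon sequence, the third moment of the normalized sum stays near zero, as the semi-GG-normal attractor requires; one admissible feedback strategy in which \sigma_{i} reacts to the running sum pushes the third moment strictly above zero, towards what the GG-normal uncertainty set allows.

    import numpy as np

    rng = np.random.default_rng(3)
    sigma_lo, sigma_hi, n, reps = 0.5, 1.0, 200, 20000

    def third_moment(feedback: bool) -> float:
        eps = rng.standard_normal((reps, n))
        s = np.zeros(reps)
        for i in range(n):
            if feedback:
                # one admissible sequential strategy: sigma_i reacts to the running sum
                sig = np.where(s > 0, sigma_hi, sigma_lo)
            else:
                # sigma_i drawn without looking at the eps sequence (semi-G-type scenario)
                sig = rng.uniform(sigma_lo, sigma_hi, size=reps)
            s = s + sig * eps[:, i]
        s = s / np.sqrt(n)
        return float(np.mean(s**3))

    print(third_moment(feedback=False))   # close to 0
    print(third_moment(feedback=True))    # clearly positive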

In the proof of 5.9, we adapt the idea of the Lindeberg method in a “leave-one-out” manner (Breiman, (1992)) to the sublinear context. One of the reasons that we are able to make such an adaptation is the symmetry in semi-GG-independence: XiX_{i} is semi-GG-independent from {Xj,ji}\{X_{j},j\neq i\} (note that we cannot make such an adaptation under sequential independence due to its asymmetry). More details of the proof can be found in Section 6.6.

Since we only assume a finite second moment so far in 5.9, by adding stronger moment conditions on XnX_{n}, the function space of φ\varphi can be taken to be Cl.LipC_{\mathrm{l.Lip}} to include unbounded functions. This statement is based on 2.18.

As a basic example, given a stronger condition [|X1|3]<\mathcal{E}[\lvert X_{1}\rvert^{3}]<\infty, we can check that the convergence 5.3 holds for φ(x)=x3\varphi(x)=x^{3} by direct computation (5.10).

Example 5.10 (Check φ(x)=x3\varphi(x)=x^{3}).

In the convergence 5.3, since [W3]=0\mathcal{E}[W^{3}]=0, we only need to show:

limn[(1ni=1nXi)3]=0.\lim_{n\to\infty}\mathcal{E}[(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i})^{3}]=0.

In fact,

[(1ni=1nXi)3]\displaystyle\mathcal{E}[(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i})^{3}] =n3/2[(i=1nXi)3]\displaystyle=n^{-3/2}\mathcal{E}[(\sum_{i=1}^{n}X_{i})^{3}]
=n3/2[i=1nXi3+ij or jkXiXjXk].\displaystyle=n^{-3/2}\mathcal{E}[\sum_{i=1}^{n}X_{i}^{3}+\sum_{i\neq j\text{ or }j\neq k}X_{i}X_{j}X_{k}].

(Note that if i=ji=j and j=kj=k, the summand reduces to the first term, so such indices are excluded from the second sum.) For the summand XiXjXkX_{i}X_{j}X_{k} of the second term, without loss of generality, we assume that ijki\leq j\leq k. Then we have three cases:

  1. 1.

    i<j=ki<j=k,

  2. 2.

    i=j<ki=j<k,

  3. 3.

    i<j<ki<j<k.

In Case 1, since XiX_{i} and XjX_{j} are semi-GG-independent, we have:

[XiXj2]\displaystyle\mathcal{E}[X_{i}X_{j}^{2}] =max(vi,vj)𝔼[vivj2ηiηj2]\displaystyle=\max_{(v_{i},v_{j})}\mathbb{E}[v_{i}v_{j}^{2}\eta_{i}\eta_{j}^{2}]
=max(vi,vj)vivj2𝔼[ηi]𝔼[ηj2]=0.\displaystyle=\max_{(v_{i},v_{j})}v_{i}v_{j}^{2}\mathbb{E}[\eta_{i}]\mathbb{E}[\eta_{j}^{2}]=0.

(Note that [XiXj2]=0\mathcal{E}[X_{i}X_{j}^{2}]=0 does not hold under sequential independence XiXjX_{i}\dashrightarrow X_{j} by 2.7.) Meanwhile, we can obtain [XiXj2]=0-\mathcal{E}[-X_{i}X_{j}^{2}]=0 so XiXj2X_{i}X_{j}^{2} has certain mean zero.

We can similarly prove the result in Case 2, that is, Xi2XkX_{i}^{2}X_{k} has certain mean zero. For Case 3, since Xi,Xj,XkX_{i},X_{j},X_{k} are semi-GG-independent, we have

[XiXjXk]=max(vi,vj,vk)vivjvk𝔼[ηi]𝔼[ηj]𝔼[ηk]=0.\mathcal{E}[X_{i}X_{j}X_{k}]=\max_{(v_{i},v_{j},v_{k})}v_{i}v_{j}v_{k}\mathbb{E}[\eta_{i}]\mathbb{E}[\eta_{j}]\mathbb{E}[\eta_{k}]=0.

We further have [XiXjXk]=0-\mathcal{E}[-X_{i}X_{j}X_{k}]=0 using the same logic. Therefore,

[(1ni=1nXi)3]=n3/2[i=1nXi3]=n1/2[X13]0,\mathcal{E}[(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i})^{3}]=n^{-3/2}\mathcal{E}[\sum_{i=1}^{n}X_{i}^{3}]=n^{-1/2}\mathcal{E}[X_{1}^{3}]\to 0,

where we use the condition that [|X1|3]<\mathcal{E}[\lvert X_{1}\rvert^{3}]<\infty and 5.7.

5.3 Fine structures of independence and the associated family of state-space volatility models

In 4.9, we have mainly discussed the independence between two semi-GG-distributed objects. Here we consider three of them as a starting point to discuss a much finer structure of independence.

Consider Wi=Viϵi=d𝒩^(0,[σ¯2,σ¯2]),i=1,2,3W_{i}=V_{i}\epsilon_{i}\overset{\text{d}}{=}\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]),i=1,2,3. The independence structure among them is essentially related to the GG-version independence among ViV_{i} and ϵi\epsilon_{i}, i=1,2,3i=1,2,3. For instance,

  • (a)

    V1V2V3ϵ1ϵ2ϵ3V_{1}\dashrightarrow V_{2}\dashrightarrow V_{3}\dashrightarrow\epsilon_{1}\dashrightarrow\epsilon_{2}\dashrightarrow\epsilon_{3},

  • (b)

    V1ϵ1V2ϵ2V3ϵ3V_{1}\dashrightarrow\epsilon_{1}\dashrightarrow V_{2}\dashrightarrow\epsilon_{2}\dashrightarrow V_{3}\dashrightarrow\epsilon_{3}.

Note that a) is equivalent to W1SW2SW3W_{1}\overset{\text{S}}{\dashrightarrow}W_{2}\overset{\text{S}}{\dashrightarrow}W_{3} and b) means W1FW2FW3W_{1}\overset{\text{F}}{\dashrightarrow}W_{2}\overset{\text{F}}{\dashrightarrow}W_{3} which implies W1W2W3W_{1}\dashrightarrow W_{2}\dashrightarrow W_{3}.

Then we can see that there are several middle stages between (a) and (b). In order to present these intermediate stages, let us play a simple game: switch two components each time and change the independence structure from (a) to (b). During this game, the following rules are required:

  • R1

    we must keep the independence ViϵiV_{i}\dashrightarrow\epsilon_{i} due to the definition of semi-GG-normal,

  • R2

    we must keep the order as (V1,V2,V3)(V_{1},V_{2},V_{3}) and (ϵ1,ϵ2,ϵ3(\epsilon_{1},\epsilon_{2},\epsilon_{3}), because the independence order of elements within each vector is usually equivalent. Otherwise, if we break this order, we need an unnecessary extra step to retrieve the index order (1,2,3)(1,2,3) to be consistent with (b).

Here we can proceed in two ways, which share the same first step:

  1. 1.

    Since we do not want to break the order within (V1,V2,V3)(V_{1},V_{2},V_{3}) or (ϵ1,ϵ2,ϵ3(\epsilon_{1},\epsilon_{2},\epsilon_{3}), the first step has to be switching some ViV_{i} with ϵj\epsilon_{j} with i,j=1,2,3i,j=1,2,3. For the ϵ\epsilon part, we can only move ϵ1\epsilon_{1} due to R1, and similarly for VV part we can only move V3V_{3}. Hence, the first step is to exchange V3V_{3} and ϵ1\epsilon_{1} in (a) to get

    V1V2ϵ1V3ϵ2ϵ3.V_{1}\dashrightarrow V_{2}\dashrightarrow\epsilon_{1}\dashrightarrow V_{3}\dashrightarrow\epsilon_{2}\dashrightarrow\epsilon_{3}. (5.5)

    Then we have two equivalent ways to move on.

  2. 2.

    One way is to exchange V2V_{2} and ϵ1\epsilon_{1} in 5.5 to get

    V1ϵ1V2V3ϵ2ϵ3.V_{1}\dashrightarrow\epsilon_{1}\dashrightarrow V_{2}\dashrightarrow V_{3}\dashrightarrow\epsilon_{2}\dashrightarrow\epsilon_{3}. (5.6)

    Then we can exchange V3V_{3} and ϵ2\epsilon_{2} to get (b).

  3. 3.

    Another way is to exchange V3V_{3} and ϵ2\epsilon_{2} to get

    V1V2ϵ1ϵ2V3ϵ3.V_{1}\dashrightarrow V_{2}\dashrightarrow\epsilon_{1}\dashrightarrow\epsilon_{2}\dashrightarrow V_{3}\dashrightarrow\epsilon_{3}. (5.7)

    Then we can exchange V2V_{2} and ϵ1\epsilon_{1} to get (b).

Note that 5.6 implies the following relation:

W1(W2,W3) and W2SW3.W_{1}\dashrightarrow(W_{2},W_{3})\text{ and }W_{2}\overset{\text{S}}{\dashrightarrow}W_{3}.

We can show that the family of models associated with the representation of [φ(W1,W2,W3)]\mathcal{E}[\varphi(W_{1},W_{2},W_{3})] under 5.6 can be illustrated by Figure 5.2. Similarly, 5.7 implies

(W1,W2)W3 and W1SW2.(W_{1},W_{2})\dashrightarrow W_{3}\text{ and }W_{1}\overset{\text{S}}{\dashrightarrow}W_{2}.

The family of models associated with 5.7 can be described by Figure 5.3. The family of models for 5.5 can be shown by Figure 5.1.

The intuition here is: if all the VjV_{j}’s come before ϵi\epsilon_{i}, since ϵi\epsilon_{i} has distributional certainty, σj\sigma_{j} has no effect on ϵi\epsilon_{i} in the directed graph. As long as ϵi\epsilon_{i} comes before VjV_{j} in the order of the GG-version independence, we must add the edge from ϵi\epsilon_{i} to σj\sigma_{j} in the directed graph of the family of models that represents the sublinear expectation of the joint vector.

We can see that, by changing the independence structure, the sublinear expectation of a joint vector of semi-GG-version distributions can be represented by classes of state-space models with different graphical structures.
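
As a small illustration (our own sketch with assumed bounds \underline{\sigma}=0.5, \overline{\sigma}=1), one admissible classical scenario from the family associated with 5.6 (Figure 5.2) can be simulated as follows: \sigma_{1} is a free constant choice, while \sigma_{2} and \sigma_{3} are allowed to react to the first observation Y_{1} but not to Y_{2}.

    import numpy as np

    rng = np.random.default_rng(4)
    sigma_lo, sigma_hi, reps = 0.5, 1.0, 10**5

    eps = rng.standard_normal((reps, 3))
    sig1 = np.full(reps, sigma_hi)                # a free constant choice in [sigma_lo, sigma_hi]
    y1 = sig1 * eps[:, 0]
    sig2 = np.where(y1 > 0, sigma_hi, sigma_lo)   # may react to Y_1
    y2 = sig2 * eps[:, 1]
    sig3 = np.where(y1 > 0, sigma_lo, sigma_hi)   # may react to Y_1, but not to Y_2
    y3 = sig3 * eps[:, 2]

    # Each such scenario is one classical model inside the uncertainty set, so any
    # sample mean below is a lower bound for the corresponding sublinear expectation.
    print(np.mean(y1 * y2**2))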

One question to be explored is whether there is an independence structure that is associated with the family shown in Figure 5.4. Our conjecture is as follows: at least we need the following conditions,

  • 1)

    V1ϵ1V2ϵ2V_{1}\dashrightarrow\epsilon_{1}\dashrightarrow V_{2}\dashrightarrow\epsilon_{2} which means W1W2W_{1}\dashrightarrow W_{2},

  • 2)

    V2ϵ2V3ϵ3V_{2}\dashrightarrow\epsilon_{2}\dashrightarrow V_{3}\dashrightarrow\epsilon_{3} which means W2W3W_{2}\dashrightarrow W_{3},

  • 3)

    V1V3ϵ1ϵ3V_{1}\dashrightarrow V_{3}\dashrightarrow\epsilon_{1}\dashrightarrow\epsilon_{3} which means W1SW3W_{1}\overset{\text{S}}{\dashrightarrow}W_{3}.

[Figures 5.1–5.4 are directed graphs on the hidden volatilities σ1, σ2, σ3 and the observations Y1, Y2, Y3, where each Yi is driven by its own ϵi; only the edge sets differ across the four structures.]
Figure 5.1: Diagram for 5.5
Figure 5.2: Diagram for 5.6
Figure 5.3: Diagram for 5.7
Figure 5.4: Diagram for the common structure of classical first-order hidden Markov models with feedback

5.4 A robust confidence interval for regression under heteroskedastic noise with unknown variance structure

Let {Wi}i=1\{W_{i}\}_{i=1}^{\infty} denote a sequence of nonlinearly i.i.d. semi-GG-normally distributed random variables with W1𝒩^(0,[σ¯2,σ¯2])W_{1}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]). In Section 4.1 we studied the GG-EM procedure, which is aimed at the following expression:

[φ(i=1naiWi)].\mathcal{E}[\varphi(\sum_{i=1}^{n}a_{i}W_{i})]. (5.8)

This section will provide a basic example in the context of regression to show why we need to think about 5.8 in statistical practice.

Consider a simple linear regression problem in the context of sequential data (xi,Yi)(x_{i},Y_{i}) (where the order of the data matters):

Yi=β0+β1xi+ξi,i=1,2,,n,Y_{i}=\beta_{0}+\beta_{1}x_{i}+\xi_{i},i=1,2,\dotsc,n, (5.9)

where xix_{i} is treated as known and ξi=σiϵi\xi_{i}=\sigma_{i}\epsilon_{i} with σi:Ω[σ¯,σ¯]\sigma_{i}:\Omega\to{[\underline{\sigma},\overline{\sigma}]} and ϵiN(0,1)\epsilon_{i}\sim N(0,1) for each i=1,2,,ni=1,2,\dotsc,n. We can see that the noise part ξi\xi_{i} is heteroskedastic (although σi\sigma_{i} is not observable). However, if the variance structure of the noise part ξi\xi_{i} is complicated due to measurement errors, or the data are collected from different subpopulations with different variances, we need to take some precautions regarding the properties of the least-squares estimator β^1\hat{\beta}_{1}, especially when we lack prior knowledge on the dynamics of σi\sigma_{i}. If we worry that σi\sigma_{i} may depend on the previous ϵk\epsilon_{k} with k<ik<i, rather than assuming a single probabilistic model for σi\sigma_{i} and then performing the regression, in an early stage of data analysis we can first assume that 𝝈=(σi)i=1n\bm{\sigma}=(\sigma_{i})_{i=1}^{n} could be any element of n[σ¯,σ¯]\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]} defined in Section 3.7. Note that the distributional uncertainty of each ξi\xi_{i} can be described by Wi𝒩^(0,[σ¯2,σ¯2])W_{i}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]). Then the distributional uncertainty of 5.9 can be translated into a GG-version format:

YGi=β0+β1xi+Wi,i=1,2,,n.Y^{G}_{i}=\beta_{0}+\beta_{1}x_{i}+W_{i},i=1,2,\dotsc,n. (5.10)

Let

aixix¯n(xix¯n)2.a_{i}\coloneqq\frac{x_{i}-\bar{x}_{n}}{\sum(x_{i}-\bar{x}_{n})^{2}}.

Then the least-squares estimator can be written as

β^1=(xix¯n)YGi(xix¯n)2=aiYGi=β0ai+aixiβ1+aiWi=β1+aiWi.\hat{\beta}_{1}=\frac{\sum(x_{i}-\bar{x}_{n})Y^{G}_{i}}{\sum(x_{i}-\bar{x}_{n})^{2}}=\sum a_{i}Y^{G}_{i}=\beta_{0}\sum a_{i}+\sum a_{i}x_{i}\beta_{1}+\sum a_{i}W_{i}=\beta_{1}+\sum a_{i}W_{i}. (5.11)

Then we have β^1β1=aiWi.\hat{\beta}_{1}-\beta_{1}=\sum a_{i}W_{i}. Note that [β^1]=[β^1]=β1\mathcal{E}[\hat{\beta}_{1}]=-\mathcal{E}[-\hat{\beta}_{1}]=\beta_{1}.

Then we are able to study the properties of β^1\hat{\beta}_{1} by assigning different forms of φ\varphi in 5.8:

  1. 1.

    With φ(x)=xk\varphi(x)=x^{k} and k+k\in\mathbb{N}_{+}, we have the centred moments of β^1\hat{\beta}_{1}

    [φ(aiWi)]=[(β^1β1)k].\mathcal{E}[\varphi(\sum a_{i}W_{i})]=\mathcal{E}[(\hat{\beta}_{1}-\beta_{1})^{k}].
  2. 2.

    With φ(x)=𝟙{|x|>c}\varphi(x)=\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\lvert x\rvert>c}}\right\}}, we get the object that is useful to derive a confidence interval in this context:

    [φ(aiWi)]=𝐕(|β^1β1|>c).\mathcal{E}[\varphi(\sum a_{i}W_{i})]=\mathbf{V}(\lvert\hat{\beta}_{1}-\beta_{1}\rvert>c). (5.12)

Interestingly, from 3.24 and 3.26, 5.12 further leads us to a robust confidence interval by solving the following equation:

𝐕(|β^1β1|>cα/2)=sup𝝈n[σ¯,σ¯](|aiσiϵi|>cα/2)=α,\mathbf{V}(\lvert\hat{\beta}_{1}-\beta_{1}\rvert>c_{\alpha/2})=\sup_{\bm{\sigma}\in\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{P}(\lvert\sum a_{i}\sigma_{i}\epsilon_{i}\rvert>c_{\alpha/2})=\alpha,

or

inf𝝈n[σ¯,σ¯](|aiσiϵi|cα/2)=1α.\inf_{\bm{\sigma}\in\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{P}(\lvert\sum a_{i}\sigma_{i}\epsilon_{i}\rvert\leq c_{\alpha/2})=1-\alpha.

The resulting confidence interval is robust in the sense that its coverage rate will be at least 1α1-\alpha regardless of the unknown variance structure of the noise part σiϵi\sigma_{i}\epsilon_{i} in the regression. If we have more information showing that σk\sigma_{k} does not depend on the previous ϵi\epsilon_{i} with i<ki<k, we can consider the smaller family 𝒮n[σ¯,σ¯]\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}. Alternatively, this also provides a way to perform a sensitivity analysis on the performance of a regression estimator (such as β^1\hat{\beta}_{1} here) under heteroskedastic noise with an unknown variance structure that could belong to different families of models.
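
As a rough numerical sketch of this idea (our own illustration; the design points, the bounds \underline{\sigma}=0.5, \overline{\sigma}=1, the level \alpha=0.05 and the small menu of candidate volatility strategies are all assumptions), one can approximate the worst-case tail of |\sum a_{i}\sigma_{i}\epsilon_{i}| by Monte Carlo over a few candidate strategies from \mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]} (constant rules plus a feedback rule reacting to the running weighted sum) and read off a candidate c_{\alpha/2}. Since only a subfamily of \mathcal{L}_{n} is searched, the resulting critical value approximates the exact worst case from below; a finer search (or the GG-EM procedure of Section 4.1) would be needed to approach the full sup.

    import numpy as np

    rng = np.random.default_rng(5)
    sigma_lo, sigma_hi, alpha, reps = 0.5, 1.0, 0.05, 10**5
    x = np.linspace(0.0, 1.0, 50)                     # assumed design points x_i
    a = (x - x.mean()) / np.sum((x - x.mean())**2)    # least-squares weights a_i

    def weighted_sums(rule: str) -> np.ndarray:
        eps = rng.standard_normal((reps, a.size))
        s = np.zeros(reps)
        for i in range(a.size):
            if rule == "hi":
                sig = sigma_hi
            elif rule == "lo":
                sig = sigma_lo
            else:                                     # feedback on the running weighted sum
                sig = np.where(s > 0, sigma_hi, sigma_lo)
            s = s + a[i] * sig * eps[:, i]
        return s

    c = max(np.quantile(np.abs(weighted_sums(r)), 1 - alpha)
            for r in ("hi", "lo", "feedback"))
    print(c)   # candidate c_{alpha/2}; the interval is then beta1_hat +/- c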

Then this discussion leads us to another interesting question. In an early stage of data analysis, should we choose 𝝈𝒮n[σ¯,σ¯]\bm{\sigma}\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]} or 𝝈n[σ¯,σ¯]\bm{\sigma}\in\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}? This question will be explored in Section 5.5.

5.5 Inference on the general model structure of a state-space volatility model

Recall the setup in Section 5.4. In practice, if we lack knowledge of the underlying dynamics of the dataset, whether we should choose 𝝈𝒮n[σ¯,σ¯]\bm{\sigma}\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]} or 𝝈n[σ¯,σ¯]\bm{\sigma}\in\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]} is a difficult problem in classical statistical methodology (in model specification) because both families involve an infinite-dimensional family of elements. However, it turns out that it can be essentially transformed into a GG-version question: it has a feasible solution once we introduce the GG-expectation of the semi-GG-family of distributions. This becomes a hypothesis test to distinguish between semi-sequential independence and sequential independence. To be specific, we are able to consider a test:

H0:𝝈𝒮n[σ¯,σ¯] vs Ha:𝝈n[σ¯,σ¯]𝒮n[σ¯,σ¯].H_{0}:\bm{\sigma}\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}\textbf{ vs }H_{a}:\bm{\sigma}\in\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}\setminus\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}.

A good interpretation of this test is the following: the class of hidden Markov models (with volatility as the switching regime) belongs to 𝒮\mathcal{S}. If we reject the null hypothesis, it means the underlying 𝝈\bm{\sigma} process cannot be treated as a switching regime in the hidden Markov setup (or in any other kind of normal mixture model); we then need to re-investigate the dataset and consider 𝝈\bm{\sigma} processes outside the family of normal mixture models (for instance, we may need to introduce other dependencies, such as one between the previous observations Y<tY_{<t} and the current σt\sigma_{t}, as in a feedback design). Throughout this discussion, we did not make any parametric assumption on the model of 𝝈\bm{\sigma}, and we are still able to give a rigorous test of this distinction. The idea of this test takes advantage of 3.24 to transform the distinction between two families of classical models into the task of distinguishing two different types of independence (S\overset{\text{S}}{\dashrightarrow} versus \dashrightarrow) for the semi-GG-normal vector (W1,W2,,Wn)(W_{1},W_{2},\dotsc,W_{n}). There are plenty of test functions φ\varphi (neither convex nor concave) that reveal the difference between S\overset{\text{S}}{\dashrightarrow} and \dashrightarrow in the sense that

S[φ(W1,W2,,Wn)]<L[φ(W1,W2,,Wn)].\mathcal{E}^{S}[\varphi(W_{1},W_{2},\dotsc,W_{n})]<\mathcal{E}^{L}[\varphi(W_{1},W_{2},\dotsc,W_{n})].

For instance, we can choose

φ(xi,i=1,2,,n)=(1ni=1nxi)3.\varphi(x_{i},i=1,2,\dotsc,n)=(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}x_{i})^{3}.

Under this φ\varphi, the expectation under S\overset{\text{S}}{\dashrightarrow} is a certain zero but the one under \dashrightarrow is greater than zero. Then we should be able to construct a test statistic based on the form of φ\varphi and obtain a rejection region by studying its tail probability under 𝐕\mathbf{V} which can be transformed back into the sublinear expectation of (W1,W2,,Wn)(W_{1},W_{2},\dotsc,W_{n}).
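
A minimal sketch of such a construction (our own illustration, not the formal test of this section; the level \alpha=0.05 and the use of the \overline{\sigma}-normal quantile are assumptions) is the following conservative rule: under H_{0}, conditionally on \bm{\sigma}, the normalized sum is a classical normal with variance at most \overline{\sigma}^{2}, so its upper tail is dominated by the N(0,\overline{\sigma}^{2}) tail, and rejecting when the cubed normalized sum exceeds (\overline{\sigma}z_{1-\alpha})^{3} keeps the size below \alpha for every \bm{\sigma}\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}.

    import numpy as np
    from statistics import NormalDist

    def test_statistic(w: np.ndarray) -> float:
        # phi(x_1,...,x_n) = (n^{-1/2} * sum x_i)^3 evaluated on the observed sequence
        return float((w.sum() / np.sqrt(w.size))**3)

    def reject_H0(w: np.ndarray, sigma_hi: float, alpha: float = 0.05) -> bool:
        # conservative cutoff: under H0 the upper tail is dominated by N(0, sigma_hi^2)
        cutoff = (sigma_hi * NormalDist().inv_cdf(1 - alpha))**3
        return test_statistic(w) > cutoff

    # a quick check under an H0-style scenario (sigma drawn without looking at eps)
    rng = np.random.default_rng(6)
    n, sigma_lo, sigma_hi = 400, 0.5, 1.0
    w0 = rng.uniform(sigma_lo, sigma_hi, n) * rng.standard_normal(n)
    print(reject_H0(w0, sigma_hi))   # rejects with probability at most alpha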

How to choose the test function will have a significant effect on the performance of this hypothesis test. Moreover, the current interpretation of nn is the length of the whole data sequence, and 𝝈\bm{\sigma} is the unknown volatility dynamic of the full sequence. We can also interpret nn as the group size after grouping the dataset in either a non-overlapping or an overlapping manner; we can then consider 𝝈\bm{\sigma} for each group to test whether there is a case falling into the class of HaH_{a}, because the sublinear expectation can give a control on the extremes of the group statistics, as indicated by Jin and Peng, (2021) and Section 2.2 in Fang et al., (2019).

Acknowledgements and the story behind the semi-GG-normal

We have received many useful suggestions and feedback from the community in the past four years which are beneficial to the formation of this paper (so this paper can be treated as a report to the community).

The authors would like to first express their sincere thanks to Prof. Shige Peng who visited our department in May 2017 (invited by Prof. Reg Kulperger) and our discussion at that time motivated us to study a distributional and probabilistic structure that has a direct connection with the intuition behind the existing max-mean estimation proposed by Jin and Peng, (2021). Later on during the Fields-China Industrial Problem Solving Workshop in Finance, and a short course on the GG-expectation framework given by Prof. Peng at Fields Institute, Toronto, Canada, we had several interesting discussions on the data experiments in this context, which can be treated as the starting point of the companion paper of the current one. In our regular discussion notes in that period, there was a prototype of the current semi-GG-normal distribution and also a question on independence between semi-GG-normal was raised which is currently included and answered by 4.9.

Although the design of the semi-GG-normal is mainly for distributional purposes, this concept was first proposed in Li and Kulperger, (2018), where it was applied to design an iterative approximation towards the GG-normal distribution by starting from the linear expectations of classical normals, as discussed in Section 4.1. During the 2018 Young Researcher Meeting on BSDEs, Nonlinear Expectations and Mathematical Finance at Shanghai Jiao Tong University, we received beneficial feedback on this iterative method from participants in the conference. In particular, we would like to thank Prof. Yiqing Lin for providing more references with potential theoretical connections. 4.1.2 has benefited from the comments by Prof. Shuzhen Yang and Prof. Xinpeng Li.

The first author would also like to express his gratitude to Prof. Huaxiong Huang at the Fields Institute and his Ph.D. student Nathan Gold for their support and suggestions during a separate long-term and challenging joint project (with regular discussions with Prof. Peng) during summer 2017 on a stochastic method for the GG-heat equation in the high-dimensional case and its theoretical and numerical convergence. In this project, the first author learned how to appropriately design a switching rule in the stochastic volatility to approximate the solution of a fully nonlinear PDE, which is related to methods based on BSDEs and second-order BSDEs, as well as the intuition behind nonlinear expectations in this context. Although the methods are different, this experience created another motivation for Li and Kulperger, (2018).

The authors are grateful for the valuable discussions with the community during the conference of Probability, Uncertainty and Quantitative Risk in July 2019. One of the motivations of 4.9 comes from the comments by Prof. Mingshang Hu on the independence property of the maximal distribution. The writing of Section 4.4 was motivated by the discussions with Prof. Peng during the conference. Section 4.4, and further the data experiments in the companion paper, have also benefited from the discussions with Prof. Jianfeng Zhang on the meaning of sequential independence under a set of measures.

We have also benefited from the feedback of participants from various backgrounds at the Annual Meetings of the SSC (Statistical Society of Canada) in 2018 and 2019, which helped us understand the impression of a general audience of the GG-expectation framework. During the poster session of the Annual Meeting of the SSC at McGill University in 2018, we received several positive comments about designing a substructure connecting the GG-expectation framework (which is a highly technical one for a general audience) with the objects in the classical system. These comments further motivated us to write this paper for general readers. At the Annual Meeting of the SSC at the University of Calgary in 2019, there was a comment from the audience on the property of \mathcal{H} and the choice of function space (\mathcal{H} could be quite small if we choose a large function space for \varphi). It motivated us to improve the preliminary setup (Section 2) and put more attention on the design of \mathcal{H}.

During the improvement of this manuscript from the first version (April 2021) to the third version (October 2021), the authors are grateful to Prof. Defei Zhang, who gave many beneficial comments (such as the comment on the product space and an improvement of Figure 4.1), and Prof. Xinpeng Li, whose suggestion motivated us to develop the research in Section 5.2.

6 Proofs

6.1 Proofs in Section 3.2

Proof of 3.2.

The finiteness of \mathbb{E}[\lvert\varphi(V)\rvert] is obvious due to the continuity of \varphi and the compactness of {[\underline{\sigma},\overline{\sigma}]}. First of all, note that 3.2 is a direct result of 3.1. It is also not hard to see 3.5, since for any \sigma\in\mathcal{D}{[\underline{\sigma},\overline{\sigma}]}, we have \mathbb{P}_{\sigma}({[\underline{\sigma},\overline{\sigma}]})=1, and then

𝔼[φ(σ)]\displaystyle\mathbb{E}[\varphi(\sigma)] =σ¯σ¯φ(x)σ(dx)σ¯σ¯maxx[σ¯,σ¯]φ(x)σ(dx)\displaystyle=\int_{\underline{\sigma}}^{\overline{\sigma}}\varphi(x)\mathbb{P}_{\sigma}(\mathop{}\!\mathrm{d}x)\leq\int_{\underline{\sigma}}^{\overline{\sigma}}\max_{x\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(x)\mathbb{P}_{\sigma}(\mathop{}\!\mathrm{d}x)
=maxx[σ¯,σ¯]φ(x)σ([σ¯,σ¯])=maxx[σ¯,σ¯]φ(x),\displaystyle=\max_{x\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(x)\mathbb{P}_{\sigma}({[\underline{\sigma},\overline{\sigma}]})=\max_{x\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(x),

which implies

maxσ𝒟[σ¯,σ¯]𝔼[φ(σ)]maxσ[σ¯,σ¯]𝔼[φ(σ)].\max_{\sigma\in\mathcal{D}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma)]\leq\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma)].

Since {[\underline{\sigma},\overline{\sigma}]}\subset\mathcal{D}{[\underline{\sigma},\overline{\sigma}]}, the other direction of the inequality also holds. Similarly, we can show 3.3.

To validate 3.4, we need to show that for any \alpha>0, there exists a random variable \sigma_{\alpha}\in\mathcal{D}_{\textbf{cont.}}{[\underline{\sigma},\overline{\sigma}]} such that

𝔼[φ(σα)]>[φ(V)]α.\mathbb{E}[\varphi(\sigma_{\alpha})]>\mathcal{E}[\varphi(V)]-\alpha. (6.1)

Let v^{*}=\operatorname*{arg\,max}_{v\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(v). Then we have \mathcal{E}[\varphi(V)]=\varphi(v^{*}). Since \varphi is a continuous function on {[\underline{\sigma},\overline{\sigma}]}, there exists v_{0}\in(\underline{\sigma},\overline{\sigma}) such that \varphi(v_{0})>\varphi(v^{*})-\alpha/2. In a classical probability space (\Omega,\mathcal{F},\mathbb{P}), consider a sequence of random variables \xi_{n}\coloneqq v_{0}+e/\sqrt{n} where e\sim N(0,1) and n\in\mathbb{N}_{+}. In short, \xi_{n}\sim N(v_{0},1/n) with diminishing variance. Then we must have \xi_{n}\overset{\text{d}}{\longrightarrow}v_{0}. Then transform \xi_{n} into its truncation on {[\underline{\sigma},\overline{\sigma}]}: \xi_{n}^{*}\coloneqq\xi_{n}I_{n} with I_{n}\coloneqq\mathds{1}_{\{\xi_{n}\in{[\underline{\sigma},\overline{\sigma}]}\}}. We can easily show that I_{n}\overset{\mathbb{P}}{\longrightarrow}1 since, for any a>0, \mathbb{P}(\lvert I_{n}-1\rvert>a)=\mathbb{P}(I_{n}=0)=1-\mathbb{P}(\xi_{n}\in{[\underline{\sigma},\overline{\sigma}]})\to 0. By the classical Slutsky theorem, \xi_{n}^{*}=\xi_{n}I_{n}\overset{\text{d}}{\longrightarrow}v_{0}. Therefore, for any \varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}),

𝔼[φ(ξn)]φ(v0).\mathbb{E}[\varphi(\xi_{n}^{*})]\to\varphi(v_{0}).

For any \alpha>0, there exists n_{\alpha} such that \mathbb{E}[\varphi(\xi_{n_{\alpha}}^{*})]>\varphi(v_{0})-\alpha/2. Let \sigma_{\alpha}\coloneqq\xi_{n_{\alpha}}^{*}, which belongs to \mathcal{D}_{\textbf{cont.}}{[\underline{\sigma},\overline{\sigma}]}. It is the required object satisfying 6.1, because

\mathbb{E}[\varphi(\sigma_{\alpha})]>\varphi(v_{0})-\alpha/2>\varphi(v^{*})-\alpha=\mathcal{E}[\varphi(V)]-\alpha.\qed
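As a numerical illustration of this approximation step (it plays no role in the proof), the following Python sketch, with placeholder values for \underline{\sigma}, \overline{\sigma}, v_{0} and \varphi, checks that \mathbb{E}[\varphi(\xi_{n}^{*})] approaches \varphi(v_{0}) as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)
sig_lo, sig_hi, v0 = 0.5, 2.0, 1.7          # placeholder interval and interior point
phi = lambda x: np.sin(3 * x) + x ** 2      # a locally Lipschitz test function

for n in (10, 100, 1_000, 10_000):
    xi = v0 + rng.standard_normal(200_000) / np.sqrt(n)   # xi_n ~ N(v0, 1/n)
    xi_star = xi * ((xi >= sig_lo) & (xi <= sig_hi))       # truncation xi_n * I_n
    print(n, np.mean(phi(xi_star)))                        # tends to phi(v0)
print("phi(v0) =", phi(v0))
```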
Proof of 3.6.

We can prove it by mathematical induction. For d=1, it obviously holds. Suppose the result holds for d=k with k\in\mathbb{N}_{+}, namely,

𝑽(k)(V1,V2,,Vk)(i=1k[σ¯i,σ¯i]),\bm{V}_{(k)}\coloneqq(V_{1},V_{2},\dotsc,V_{k})\sim\mathcal{M}(\prod_{i=1}^{k}[\underline{\sigma}_{i},\overline{\sigma}_{i}]),

then we only need to show it holds for d=k+1d=k+1. In fact, consider any locally Lipschitz function

φ:(k+1,)(,||),\varphi:(\mathbb{R}^{k+1},\lVert\cdot\rVert)\to(\mathbb{R},\lvert\cdot\rvert),

satisfying: there exist C_{\varphi}>0 and m\in\mathbb{R}_{+} such that

|φ(x)φ(y)|Cφ(1+xm+ym)xy.\lvert\varphi(x)-\varphi(y)\rvert\leq C_{\varphi}(1+\lVert x\rVert^{m}+\lVert y\rVert^{m})\lVert x-y\rVert.

Since Vk+1V_{k+1} is independent from 𝑽(k)\bm{V}_{(k)},

\mathcal{E}[\varphi(V_{1},V_{2},\dotsc,V_{k+1})]=\mathcal{E}[\varphi(\bm{V}_{(k)},V_{k+1})]=\mathcal{E}\bigl{[}\mathcal{E}[\varphi(\bm{\sigma}_{(k)},V_{k+1})]_{\bm{\sigma}_{(k)}=\bm{V}_{(k)}}\bigr{]}.

Let φk(x)φ(𝝈(k),x),\varphi_{k}(x)\coloneqq\varphi(\bm{\sigma}_{(k)},x), and

ψk+1(𝝈(k))maxσk+1[σ¯k+1,σ¯k+1]φ(𝝈(k),σk+1)=maxσk+1[σ¯k+1,σ¯k+1]φk(σk+1).\psi_{k+1}(\bm{\sigma}_{(k)})\coloneqq\max_{\sigma_{k+1}\in[\underline{\sigma}_{k+1},\overline{\sigma}_{k+1}]}\varphi(\bm{\sigma}_{(k)},\sigma_{k+1})=\max_{\sigma_{k+1}\in[\underline{\sigma}_{k+1},\overline{\sigma}_{k+1}]}\varphi_{k}(\sigma_{k+1}).

For notational convenience, we sometimes omit the domain [σ¯k+1,σ¯k+1][\underline{\sigma}_{k+1},\overline{\sigma}_{k+1}] of the maximization here in our later discussions if it is clear from the context.

Claim 6.1.

We have φkCl.Lip()\varphi_{k}\in C_{\mathrm{l.Lip}}(\mathbb{R}) and ψk+1Cl.Lip(k)\psi_{k+1}\in C_{\mathrm{l.Lip}}(\mathbb{R}^{k}).

Then we are able to apply the representation of maximal distribution 𝑽(k)\bm{V}_{(k)} (allowed by 6.1) to have

[φ(V1,V2,,Vk+1)]\displaystyle\mathcal{E}[\varphi(V_{1},V_{2},\dotsc,V_{k+1})] =[φ(𝑽(k),Vk+1)]\displaystyle=\mathcal{E}[\varphi(\bm{V}_{(k)},V_{k+1})]
=[[φ(𝝈(k),Vk+1)φk(σk+1)]𝝈(k)=𝑽(k)]\displaystyle=\mathcal{E}\Bigl{[}\mathcal{E}[\underbrace{\varphi(\bm{\sigma}_{(k)},V_{k+1})}_{\varphi_{k}(\sigma_{k+1})}]_{\bm{\sigma}_{(k)}=\bm{V}_{(k)}}\Bigr{]}
=[[maxσk+1φk(σk+1)]𝝈(k)=𝑽(k)]\displaystyle=\mathcal{E}\Bigl{[}[\max_{\sigma_{k+1}}\varphi_{k}(\sigma_{k+1})]_{\bm{\sigma}_{(k)}=\bm{V}_{(k)}}\Bigr{]}
=[ψk+1(𝑽(k))]\displaystyle=\mathcal{E}[\psi_{k+1}(\bm{V}_{(k)})]
=max𝝈(k)ψk+1(𝝈(k))\displaystyle=\max_{\bm{\sigma}_{(k)}}\psi_{k+1}(\bm{\sigma}_{(k)})
=max(σ1,σ2,,σk)maxσk+1φ(σ1,,σk,σk+1)\displaystyle=\max_{(\sigma_{1},\sigma_{2},\dotsc,\sigma_{k})}\max_{\sigma_{k+1}}\varphi(\sigma_{1},\dotsc,\sigma_{k},\sigma_{k+1})
=max(σ1,σ2,,σk+1)φ(σ1,,σk,σk+1).\displaystyle=\max_{(\sigma_{1},\sigma_{2},\dotsc,\sigma_{k+1})}\varphi(\sigma_{1},\dotsc,\sigma_{k},\sigma_{k+1}).

Therefore,

(V1,V2,,Vk+1)(i=1k+1[σ¯i,σ¯i]).(V_{1},V_{2},\dotsc,V_{k+1})\sim\mathcal{M}(\prod_{i=1}^{k+1}[\underline{\sigma}_{i},\overline{\sigma}_{i}]).

The conclusion can be achieved by induction.

The remaining task is to prove 6.1.

To show φkCl.Lip()\varphi_{k}\in C_{\mathrm{l.Lip}}(\mathbb{R}), we write

|φk(x)φk(y)|\displaystyle\lvert\varphi_{k}(x)-\varphi_{k}(y)\rvert =|φ(𝝈(k),x)φ(𝝈(k),y)|\displaystyle=\lvert\varphi(\bm{\sigma}_{(k)},x)-\varphi(\bm{\sigma}_{(k)},y)\rvert
Cφ(1+(𝝈(k),x)m+(𝝈(k),y)m)xy,\displaystyle\leq C_{\varphi}(1+\lVert(\bm{\sigma}_{(k)},x)\rVert^{m}+\lVert(\bm{\sigma}_{(k)},y)\rVert^{m})\lVert x-y\rVert,

where we adapt \lVert\cdot\rVert to lower dimension in the sense that x(𝟎(k),x)\lVert x\rVert\coloneqq\lVert(\bm{0}_{(k)},x)\rVert. Notice

(𝝈(k),x)=(𝝈(k),0)+(𝟎(k),x)𝝈(k)+x.\lVert(\bm{\sigma}_{(k)},x)\rVert=\lVert(\bm{\sigma}_{(k)},0)+(\bm{0}_{(k)},x)\rVert\leq\lVert\bm{\sigma}_{(k)}\rVert+\lVert x\rVert.

Meanwhile, there exists K0K\geq 0 (actually K=max{1,2m1}K=\max\{1,2^{m-1}\}), such that

(𝝈(k),x)m(𝝈(k)+x)mK(𝝈(k)m+xm).\lVert(\bm{\sigma}_{(k)},x)\rVert^{m}\leq(\lVert\bm{\sigma}_{(k)}\rVert+\lVert x\rVert)^{m}\leq K(\lVert\bm{\sigma}_{(k)}\rVert^{m}+\lVert x\rVert^{m}).

Then we have

|φk(x)φk(y)|C1(1+xm+ym)xy,\lvert\varphi_{k}(x)-\varphi_{k}(y)\rvert\leq C_{1}(1+\lVert x\rVert^{m}+\lVert y\rVert^{m})\lVert x-y\rVert,

where C1=Cφmax{1+2K𝝈(k)m,K}.C_{1}=C_{\varphi}\max\{1+2K\lVert\bm{\sigma}_{(k)}\rVert^{m},K\}.

Next we check ψk+1Cl.Lip(k)\psi_{k+1}\in C_{\mathrm{l.Lip}}(\mathbb{R}^{k}). For any 𝒂(k),𝒃(k)k\bm{a}_{(k)},\bm{b}_{(k)}\in\mathbb{R}^{k},

|ψk+1(𝒂(k))ψk+1(𝒃(k))|\displaystyle|\psi_{k+1}(\bm{a}_{(k)})-\psi_{k+1}(\bm{b}_{(k)})|
=\displaystyle= |maxσk+1[σ¯,σ¯]φ(𝒂(k),σk+1)maxσk+1[σ¯,σ¯]φ(𝒃(k),σk+1)|\displaystyle|\max_{\sigma_{k+1}\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\bm{a}_{(k)},\sigma_{k+1})-\max_{\sigma_{k+1}\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\bm{b}_{(k)},\sigma_{k+1})|
\displaystyle\leq maxσk+1[σ¯,σ¯]|φ(𝒂(k),σk+1)φ(𝒃(k),σk+1)|\displaystyle\max_{\sigma_{k+1}\in{[\underline{\sigma},\overline{\sigma}]}}|\varphi(\bm{a}_{(k)},\sigma_{k+1})-\varphi(\bm{b}_{(k)},\sigma_{k+1})|
\displaystyle\leq maxσk+1[σ¯,σ¯]Cφ(1+(𝒂(k),σk+1)m+(𝒃(k),σk+1)m)𝒂(k)𝒃(k)\displaystyle\max_{\sigma_{k+1}\in{[\underline{\sigma},\overline{\sigma}]}}C_{\varphi}(1+\lVert(\bm{a}_{(k)},\sigma_{k+1})\rVert^{m}+\lVert(\bm{b}_{(k)},\sigma_{k+1})\rVert^{m})\lVert\bm{a}_{(k)}-\bm{b}_{(k)}\rVert
\displaystyle\leq C2(1+𝒂(k)+𝒃(k))𝒂(k)𝒃(k),\displaystyle C_{2}(1+\lVert\bm{a}_{(k)}\rVert+\lVert\bm{b}_{(k)}\rVert)\lVert\bm{a}_{(k)}-\bm{b}_{(k)}\rVert,

where C2=Cφmax{1+2Kσ¯m,K}C_{2}=C_{\varphi}\max\{1+2K\overline{\sigma}^{m},K\}. ∎

Proof of 3.5.

The first statement can be proved by studying the range of ψ(𝑽)\psi(\bm{V}). First, we need to show that φψ(x)φ(ψ(x))\varphi\circ\psi(x)\coloneqq\varphi(\psi(x)) is also a locally Lipschitz function for any φCl.Lip(d)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{d}). Suppose ψ\psi satisfies,

ψ(𝒙)ψ(𝒚)Cψ(1+𝒙p+𝒚p)𝒙𝒚.\lVert\psi(\bm{x})-\psi(\bm{y})\rVert\leq C_{\psi}(1+\lVert\bm{x}\rVert^{p}+\lVert\bm{y}\rVert^{p})\lVert\bm{x}-\bm{y}\rVert. (6.2)

We first can write

|φψ(𝒙)φψ(𝒚)|\displaystyle|\varphi\circ\psi(\bm{x})-\varphi\circ\psi(\bm{y})| =|φ(ψ(𝒙))φ(ψ(𝒚))|\displaystyle=|\varphi(\psi(\bm{x}))-\varphi(\psi(\bm{y}))|
Cφ(1+ψ(𝒙)m+ψ(𝒚)m)ψ(𝒙)ψ(𝒚).\displaystyle\leq C_{\varphi}(1+\lVert\psi(\bm{x})\rVert^{m}+\lVert\psi(\bm{y})\rVert^{m})\lVert\psi(\bm{x})-\psi(\bm{y})\rVert. (6.3)

As preparations for the next step, we will frequently use the basic fact that lower-degree polynomials can be dominated by higher-degree ones in the sense that

𝒙kmax{1,𝒙l}1+𝒙l with kl,\lVert\bm{x}\rVert^{k}\leq\max\{1,\lVert\bm{x}\rVert^{l}\}\leq 1+\lVert\bm{x}\rVert^{l}\text{ with }k\leq l, (6.4)

and for any k,l\in\mathbb{N}_{+},

\lVert\bm{x}\rVert^{k}\lVert\bm{y}\rVert^{l}\leq\frac{1}{2}(\lVert\bm{x}\rVert^{2k}+\lVert\bm{y}\rVert^{2l}). (6.5)

In 6.3, we can directly use 6.2 to dominate ψ(𝒙)ψ(𝒚)\lVert\psi(\bm{x})-\psi(\bm{y})\rVert. For the parts like ψ(𝒙)m\lVert\psi(\bm{x})\rVert^{m}, 6.2 implies,

ψ(𝒙)|ψ(𝒙)ψ(𝟎)|+|ψ(𝟎)|Cψ(1+𝒙p)𝒙+C0Cψ(1+𝒙p+1),\lVert\psi(\bm{x})\rVert\leq|\psi(\bm{x})-\psi(\bm{0})|+|\psi(\bm{0})|\leq C_{\psi}(1+\lVert\bm{x}\rVert^{p})\lVert\bm{x}\rVert+C_{0}\leq C_{\psi}^{\prime}(1+\lVert\bm{x}\rVert^{p+1}),

then there exists Cψ>0C_{\psi}^{\prime\prime}>0 such that,

ψ(𝒙)m[Cψ(1+𝒙p+1)]mCψ(1+𝒙(p+1)m).\lVert\psi(\bm{x})\rVert^{m}\leq[C_{\psi}^{\prime}(1+\lVert\bm{x}\rVert^{p+1})]^{m}\leq C_{\psi}^{\prime\prime}(1+\lVert\bm{x}\rVert^{(p+1)m}).

Hence, we can get φψCl.Lip(d)\varphi\circ\psi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{d}) by the inequality as follows,

|φψ(𝒙)φψ(𝒚)|\displaystyle|\varphi\circ\psi(\bm{x})-\varphi\circ\psi(\bm{y})| K1(1+𝒙(p+1)m+𝒚(p+1)m)(1+𝒙p+𝒚p)𝒙𝒚\displaystyle\leq K_{1}(1+\lVert\bm{x}\rVert^{(p+1)m}+\lVert\bm{y}\rVert^{(p+1)m})(1+\lVert\bm{x}\rVert^{p}+\lVert\bm{y}\rVert^{p})\lVert\bm{x}-\bm{y}\rVert
K2(1+𝒙2(p+1)pm+𝒚2(p+1)pm)𝒙𝒚.\displaystyle\leq K_{2}(1+\lVert\bm{x}\rVert^{2(p+1)pm}+\lVert\bm{y}\rVert^{2(p+1)pm})\lVert\bm{x}-\bm{y}\rVert.

Finally, we have \psi(\bm{V})\sim\mathcal{M}(\mathcal{S}) from its representation:

[φ(𝑺)]=[φ(ψ(𝑽))]\displaystyle\mathcal{E}[\varphi(\bm{S})]=\mathcal{E}[\varphi(\psi(\bm{V}))] =[φψ(𝑽)]\displaystyle=\mathcal{E}[\varphi\circ\psi(\bm{V})]
=maxσi[σ¯i,σ¯i],i=1,2,,dφψ(σ1,σ2,,σd)\displaystyle=\max_{\sigma_{i}\in[\underline{\sigma}_{i},\overline{\sigma}_{i}],i=1,2,\dotsc,d}\varphi\circ\psi(\sigma_{1},\sigma_{2},\dotsc,\sigma_{d})
=max𝒔𝒮φ(𝒔).\displaystyle=\max_{\bm{s}\in\mathcal{S}}\varphi(\bm{s}).

The second statement essentially comes from a basic property of the maximum of a continuous function on a rectangle: in this ideal setup, the order of taking marginal maxima does not affect the final value. To show the basic idea, start from the simple case d=2: if V_{1}\dashrightarrow V_{2}, then for any \varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{2}), we can work on (V_{2},V_{1}) to show the other direction of independence,

\displaystyle\mathcal{E}[\varphi(V_{2},V_{1})] =\mathcal{E}[\mathcal{E}[\varphi(V_{2},\sigma_{1})]_{\sigma_{1}=V_{1}}]
=maxσ1[σ¯1,σ¯1]maxσ2[σ¯2,σ¯2]φ(σ2,σ1)\displaystyle=\max_{\sigma_{1}\in[\underline{\sigma}_{1},\overline{\sigma}_{1}]}\max_{\sigma_{2}\in[\underline{\sigma}_{2},\overline{\sigma}_{2}]}\varphi(\sigma_{2},\sigma_{1})
=max(σ1,σ2)i=12[σ¯i,σ¯i]φ(σ2,σ1)\displaystyle=\max_{(\sigma_{1},\sigma_{2})\in\prod_{i=1}^{2}[\underline{\sigma}_{i},\overline{\sigma}_{i}]}\varphi(\sigma_{2},\sigma_{1})
=maxσ2[σ¯2,σ¯2]maxσ1[σ¯1,σ¯1]φ(σ2,σ1)\displaystyle=\max_{\sigma_{2}\in[\underline{\sigma}_{2},\overline{\sigma}_{2}]}\max_{\sigma_{1}\in[\underline{\sigma}_{1},\overline{\sigma}_{1}]}\varphi(\sigma_{2},\sigma_{1})
=[[φ(σ2,V1)]σ2=V2],\displaystyle=\mathcal{E}[\mathcal{E}[\varphi(\sigma_{2},V_{1})]_{\sigma_{2}=V_{2}}],

where we have used the fact that φx(y)maxx[σ¯,σ¯]φ(x,y)Cl.Lip()\varphi_{x}(y)\coloneqq\max_{x\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(x,y)\in C_{\mathrm{l.Lip}}(\mathbb{R}) if φCl.Lip(2)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{2}), which can be validated by 6.1. Hence, we have V2V1V_{2}\dashrightarrow V_{1}.

In general, for any permutation (i_{1},i_{2},\dotsc,i_{d}) of (1,2,\dotsc,d), our objective is to prove that, for any j=2,\dotsc,d,

(Vi1,Vi2,Vij1)Vij.(V_{i_{1}},V_{i_{2}},\dotsc V_{i_{j-1}})\dashrightarrow V_{i_{j}}.

From the first statement, (Vi1,Vi2,,Vij)(V_{i_{1}},V_{i_{2}},\dotsc,V_{i_{j}}), as a function of (V1,V2,,Vd)(V_{1},V_{2},\dotsc,V_{d}), must also follow a maximal distribution, characterized by (𝒱j)\mathcal{M}(\mathcal{V}_{j}) with

𝒱jk=1j[σ¯ik,σ¯ik].\mathcal{V}_{j}\coloneqq\prod_{k=1}^{j}[\underline{\sigma}_{i_{k}},\overline{\sigma}_{i_{k}}].

Then we can mimic the derivation for d=2d=2 to check the independence,

[φ(Vi1,Vi2,,Vij)]\displaystyle\mathcal{E}[\varphi(V_{i_{1}},V_{i_{2}},\dotsc,V_{i_{j}})] =max(σi1,σi2,,σij)𝒱jφ(σi1,σi2,,σij)\displaystyle=\max_{(\sigma_{i_{1}},\sigma_{i_{2}},\dots,\sigma_{i_{j}})\in\mathcal{V}_{j}}\varphi(\sigma_{i_{1}},\sigma_{i_{2}},\dots,\sigma_{i_{j}})
=max(σi1,σi2,,σij1)maxσijφ(σi1,σi2,,σij)\displaystyle=\max_{(\sigma_{i_{1}},\sigma_{i_{2}},\dotsc,\sigma_{i_{j-1}})}\max_{\sigma_{i_{j}}}\varphi(\sigma_{i_{1}},\sigma_{i_{2}},\dots,\sigma_{i_{j}})
=[[maxσijφ(σi1,σi2,,σij1,Vij)]σik=Vik,k=1,2,j1]\displaystyle=\mathcal{E}[[\max_{\sigma_{i_{j}}}\varphi(\sigma_{i_{1}},\sigma_{i_{2}},\dots,\sigma_{i_{j-1}},V_{i_{j}})]_{\sigma_{i_{k}}=V_{i_{k}},k=1,2\dotsc,j-1}]
=[[φ(σi1,σi2,,σij1,Vij)]σik=Vik,k=1,2,j1].\displaystyle=\mathcal{E}[\mathcal{E}[\varphi(\sigma_{i_{1}},\sigma_{i_{2}},\dotsc,\sigma_{i_{j-1}},V_{i_{j}})]_{\sigma_{i_{k}}=V_{i_{k}},k=1,2\dotsc,j-1}].

Since it holds for all possible jj, it is equivalent to saying

Vi1Vi2Vid.V_{i_{1}}\dashrightarrow V_{i_{2}}\dashrightarrow\cdots\dashrightarrow V_{i_{d}}.\qed

6.2 Proofs in Section 3.4 (improved)

In order to show the uniqueness of decomposition (3.9), we first prepare several lemmas.

Lemma 6.1.

For any g(K,η)¯sg(K,\eta)\in\bar{\mathcal{H}}_{s} where K(Θ)K\sim\mathcal{M}(\Theta) and η\eta is classical, if g(K,η)=dϵg(K,\eta)\overset{\text{d}}{=}\epsilon where ϵ\epsilon is classical, we must have, for any fixed kΘk\in\Theta,

g(k,η)=dϵ.g(k,\eta)\overset{\text{d}}{=}\epsilon.
Proof.

Since for any function ψ\psi,

maxkΘ𝔼[ψ(g(k,η))]=[ψ(g(K,η))]=𝔼[ψ(ϵ)],\max_{k\in\Theta}\mathbb{E}[\psi(g(k,\eta))]=\mathcal{E}[\psi(g(K,\eta))]=\mathbb{E}[\psi(\epsilon)],

by replacing ψ\psi with ψ-\psi, we have

minkΘ𝔼[ψ(g(k,η))]\displaystyle\min_{k\in\Theta}\mathbb{E}[\psi(g(k,\eta))] =[ψ(g(K,η))]\displaystyle=-\mathcal{E}[-\psi(g(K,\eta))]
=𝔼[ψ(ϵ)]=𝔼[ψ(ϵ)].\displaystyle=-\mathbb{E}[-\psi(\epsilon)]=\mathbb{E}[\psi(\epsilon)].

It means for any kΘk\in\Theta, we have

𝔼[ψ(g(k,η))]𝔼[ψ(ϵ)].\mathbb{E}[\psi(g(k,\eta))]\equiv\mathbb{E}[\psi(\epsilon)].

Therefore, we have g(k,η)=dϵ.g(k,\eta)\overset{\text{d}}{=}\epsilon.

Lemma 6.2.

For a maximally distributed V[σ¯,σ¯]V\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]}, we have 𝐯(V[σ¯,σ¯])=1\mathbf{v}(V\in{[\underline{\sigma},\overline{\sigma}]})=1.

Proof.

Let

φn(x)={1x[σ¯,σ¯]n(xσ¯)+1x[σ¯1n,σ¯)n(xσ¯)+1x(σ¯,σ¯+1n]0otherwise.\varphi_{n}(x)=\begin{cases}1&x\in{[\underline{\sigma},\overline{\sigma}]}\\ n(x-\underline{\sigma})+1&x\in[\underline{\sigma}-\frac{1}{n},\underline{\sigma})\\ -n(x-\overline{\sigma})+1&x\in(\overline{\sigma},\overline{\sigma}+\frac{1}{n}]\\ 0&\text{otherwise}\end{cases}.

Then we have \varphi_{n}\in C_{\mathrm{l.Lip}}(\mathbb{R}) and \varphi_{n}(x)\downarrow\mathds{1}_{{[\underline{\sigma},\overline{\sigma}]}}(x) or -\varphi_{n}(x)\uparrow-\mathds{1}_{{[\underline{\sigma},\overline{\sigma}]}}(x). (Since each \varphi_{n}(V)\in\mathcal{H}, we have \mathds{1}_{{[\underline{\sigma},\overline{\sigma}]}}(V)=\lim_{n\to\infty}\varphi_{n}(V)\in\mathcal{H} by the completeness of \mathcal{H}.) Note that

[φn(V)]=maxx[σ¯,σ¯](φn(x))=minx[σ¯,σ¯]φn(x)=1.\mathcal{E}[-\varphi_{n}(V)]=\max_{x\in{[\underline{\sigma},\overline{\sigma}]}}(-\varphi_{n}(x))=-\min_{x\in{[\underline{\sigma},\overline{\sigma}]}}\varphi_{n}(x)=-1.

It implies that

[𝟙[σ¯,σ¯](V)]=limn[φn(V)]=1,\mathcal{E}[-\mathds{1}_{{[\underline{\sigma},\overline{\sigma}]}}(V)]=\lim_{n\to\infty}\mathcal{E}[-\varphi_{n}(V)]=-1,

then

𝐯(V[σ¯,σ¯])=[𝟙[σ¯,σ¯](V)]=1.\mathbf{v}(V\in{[\underline{\sigma},\overline{\sigma}]})=-\mathcal{E}[-\mathds{1}_{{[\underline{\sigma},\overline{\sigma}]}}(V)]=1.

Lemma 6.3.

Consider K(Θ)K\sim\mathcal{M}(\Theta) where Θ\Theta is a compact and convex set and η\eta follows a non-degenerate classical distribution PηP_{\eta} with KηK\dashrightarrow\eta. For any hCl.Liph\in C_{\mathrm{l.Lip}}, if h(K,η)[σ¯,σ¯]h(K,\eta)\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]}, there exists B()B\in\mathcal{B}(\mathbb{R}) with Pη(B)=1P_{\eta}(B)=1 such that h(x,y)h(x,y) does not depend on yy or simply h(x,y)=h(x)h(x,y)=h(x) when yBy\in B.

Proof.

For any φCl.Lip\varphi\in C_{\mathrm{l.Lip}} with φ(x)>0\varphi(x)>0 on x[σ¯,σ¯]x\in{[\underline{\sigma},\overline{\sigma}]}, let σargmaxσ[σ¯,σ¯]φ(σ)\sigma^{*}\coloneqq\operatorname*{arg\,max}_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\sigma). Then we have

φ(σ)=maxσ[σ¯,σ¯]φ(σ)\displaystyle\varphi(\sigma^{*})=\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\sigma) =[φ(h(K,η))]\displaystyle=\mathcal{E}[\varphi(h(K,\eta))]
=maxkΘ𝔼[φ(h(k,η))]\displaystyle=\max_{k\in\Theta}\mathbb{E}[\varphi(h(k,\eta))]
=maxkΘφ(h(k,y))Pη(dy)\displaystyle=\max_{k\in\Theta}\int\varphi(h(k,y))P_{\eta}(\mathop{}\!\mathrm{d}y)
maxkΘφ(h(k,y))Pη(dy).\displaystyle\leq\int\max_{k\in\Theta}\varphi(h(k,y))P_{\eta}(\mathop{}\!\mathrm{d}y).

Meanwhile, note that h(K,\eta) is bounded by {[\underline{\sigma},\overline{\sigma}]} and K is bounded by \Theta in the quasi-sure sense, or

𝐯(h(K,η)[σ¯,σ¯])=1,𝐯(KΘ)=1.\mathbf{v}(h(K,\eta)\in{[\underline{\sigma},\overline{\sigma}]})=1,\;\mathbf{v}(K\in\Theta)=1.

Then, for any 𝒫\mathbb{P}\in\mathcal{P}, we have

\mathbb{P}(\{\omega:h(K(\omega),\eta(\omega))\in{[\underline{\sigma},\overline{\sigma}]}\})=1,\;\mathbb{P}(\{\omega:K(\omega)\in\Theta\})=1,

then the intersection of two events has probability 11,

({ω:h(K,η)[σ¯,σ¯],KΘ})=1.\mathbb{P}(\{\omega:h(K,\eta)\in{[\underline{\sigma},\overline{\sigma}]},K\in\Theta\})=1.

Hence, with A{y:h(k,y)[σ¯,σ¯],kΘ},A\coloneqq\{y:h(k,y)\in{[\underline{\sigma},\overline{\sigma}]},k\in\Theta\}, we must have

Pη(A)=({ω:η(ω)A})({ω:h(K,η)[σ¯,σ¯],KΘ})=1.P_{\eta}(A)=\mathbb{P}(\{\omega:\eta(\omega)\in A\})\geq\mathbb{P}(\{\omega:h(K,\eta)\in{[\underline{\sigma},\overline{\sigma}]},K\in\Theta\})=1.

(The measurability of AA comes from the continuity of hh. Under any 𝒫\mathbb{P}\in\mathcal{P}, the distribution of η\eta is always PηP_{\eta} due to 3.7.1.) Then we have

φ(σ)maxkΘφ(h(k,y))Pη(dy)\displaystyle\varphi(\sigma^{*})\leq\int\max_{k\in\Theta}\varphi(h(k,y))P_{\eta}(\mathop{}\!\mathrm{d}y) =AmaxkΘφ(h(k,y))Pη(dy)\displaystyle=\int_{A}\max_{k\in\Theta}\varphi(h(k,y))P_{\eta}(\mathop{}\!\mathrm{d}y)
Amaxσ[σ¯,σ¯]φ(σ)Pη(dy)\displaystyle\leq\int_{A}\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\sigma)P_{\eta}(\mathop{}\!\mathrm{d}y)
=maxσ[σ¯,σ¯]φ(σ)APη(dy)=φ(σ).\displaystyle=\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\sigma)\int_{A}P_{\eta}(\mathop{}\!\mathrm{d}y)=\varphi(\sigma^{*}).

Therefore,

AmaxkΘφ(h(k,y))Pη(dy)=maxσ[σ¯,σ¯]φ(σ).\int_{A}\max_{k\in\Theta}\varphi(h(k,y))P_{\eta}(\mathop{}\!\mathrm{d}y)=\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\sigma).

For any yAy\in A, since h(k,y)[σ¯,σ¯]h(k,y)\in{[\underline{\sigma},\overline{\sigma}]},

0<maxkΘφ(h(k,y))maxσ[σ¯,σ¯]φ(σ).0<\max_{k\in\Theta}\varphi(h(k,y))\leq\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\sigma).

Then there must exist BAB\subset A with Pη(B)=1P_{\eta}(B)=1 such that for yBy\in B,

maxkΘφ(h(k,y))=maxσ[σ¯,σ¯]φ(σ),\max_{k\in\Theta}\varphi(h(k,y))=\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\sigma),

or

[φ(h(K,y))]=maxσ[σ¯,σ¯]φ(σ).\mathcal{E}[\varphi(h(K,y))]=\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\sigma).

For any fCl.Lipf\in C_{\mathrm{l.Lip}}, let φ=f(x)+C\varphi=f(x)+C with C=minx[σ¯,σ¯]f(x)+1C=-\min_{x\in{[\underline{\sigma},\overline{\sigma}]}}f(x)+1, then φ>0\varphi>0 on x[σ¯,σ¯]x\in{[\underline{\sigma},\overline{\sigma}]}. We have

[f(h(K,y))]=[φ(h(K,y))]C=maxσ[σ¯,σ¯]φ(σ)C=maxσ[σ¯,σ¯]f(σ).\mathcal{E}[f(h(K,y))]=\mathcal{E}[\varphi(h(K,y))]-C=\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\varphi(\sigma)-C=\max_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}f(\sigma).

Therefore, for yBy\in B,

h(K,y)[σ¯,σ¯].h(K,y)\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]}. (6.6)

If there exist two distinct y1,y2By_{1},y_{2}\in B,

δh(K,y1)h(K,y2)0\delta\coloneqq h(K,y_{1})-h(K,y_{2})\neq 0

we must have

h(K,y1)=h(K,y2)+δ[σ¯+δ,σ¯+δ].h(K,y_{1})=h(K,y_{2})+\delta\sim\mathcal{M}[\underline{\sigma}+\delta,\overline{\sigma}+\delta].

This is a contradiction against 6.6. Then we have, for any yBy\in B,

h(K,y)\equiv h(K,c)\eqqcolon h(K),

where cc is any constant chosen from BB. ∎

Proof of 3.9.

Since W¯sW\in\bar{\mathcal{H}}_{s}, we have W=f(K,η)W=f(K,\eta) where K(Θ)K\sim\mathcal{M}(\Theta) and η\eta is classical satisfying KηK\dashrightarrow\eta. Suppose there exist Vi¯sV_{i}\in\bar{\mathcal{H}}_{s} and ϵi¯s\epsilon_{i}\in\bar{\mathcal{H}}_{s} such that W=V1ϵ1=V2ϵ2W=V_{1}\epsilon_{1}=V_{2}\epsilon_{2}. To be specific, without loss of generality, we can assume

Vi\displaystyle V_{i} =hi(K,η),\displaystyle=h_{i}(K,\eta),
ϵi\displaystyle\epsilon_{i} =gi(K,η),\displaystyle=g_{i}(K,\eta),

such that f(K,η)=h1(K,η)g1(K,η)=h2(K,η)g2(K,η)f(K,\eta)=h_{1}(K,\eta)g_{1}(K,\eta)=h_{2}(K,\eta)g_{2}(K,\eta). Then we have

g2(K,η)=h1(K,η)h2(K,η)g1(K,η).g_{2}(K,\eta)=\frac{h_{1}(K,\eta)}{h_{2}(K,\eta)}g_{1}(K,\eta).

Note that h_{i}(K,\eta)\sim\mathcal{M}{[\underline{\sigma},\overline{\sigma}]}; then by 6.3, there exist B_{i} with P_{\eta}(B_{i})=1 such that h_{i}(x,y)=h_{i}(x) when y\in B_{i}. Let B=B_{1}\cap B_{2}; then we still have P_{\eta}(B)=1. Then we have, for any \varphi,

[φ(h1(K,η)h2(K,η)g1(K,η))]\displaystyle\mathcal{E}[\varphi(\frac{h_{1}(K,\eta)}{h_{2}(K,\eta)}g_{1}(K,\eta))] =supkΘ𝔼[φ(h1(k,η)h2(k,η)g1(k,η))]\displaystyle=\sup_{k\in\Theta}\mathbb{E}[\varphi(\frac{h_{1}(k,\eta)}{h_{2}(k,\eta)}g_{1}(k,\eta))]
=supkΘφ(h1(k,y)h2(k,y)g1(k,y))Pη(dy)\displaystyle=\sup_{k\in\Theta}\int\varphi(\frac{h_{1}(k,y)}{h_{2}(k,y)}g_{1}(k,y))P_{\eta}(\mathop{}\!\mathrm{d}y)
=supkΘBφ(h1(k,y)h2(k,y)g1(k,y))Pη(dy)\displaystyle=\sup_{k\in\Theta}\int_{B}\varphi(\frac{h_{1}(k,y)}{h_{2}(k,y)}g_{1}(k,y))P_{\eta}(\mathop{}\!\mathrm{d}y)
=supkΘBφ(h1(k)h2(k)g1(k,y))Pη(dy)\displaystyle=\sup_{k\in\Theta}\int_{B}\varphi(\frac{h_{1}(k)}{h_{2}(k)}g_{1}(k,y))P_{\eta}(\mathop{}\!\mathrm{d}y)
=supkΘφ(h1(k)h2(k)g1(k,y))Pη(dy)\displaystyle=\sup_{k\in\Theta}\int\varphi(\frac{h_{1}(k)}{h_{2}(k)}g_{1}(k,y))P_{\eta}(\mathop{}\!\mathrm{d}y)
=supkΘ𝔼[φ(h1(k)h2(k)g1(k,η))]=[φ(h1(K)h2(K)g1(K,η))].\displaystyle=\sup_{k\in\Theta}\mathbb{E}[\varphi(\frac{h_{1}(k)}{h_{2}(k)}g_{1}(k,\eta))]=\mathcal{E}[\varphi(\frac{h_{1}(K)}{h_{2}(K)}g_{1}(K,\eta))].

Similarly, we can also show

h1(K,η)h2(K,η)=dh1(K)h2(K).\frac{h_{1}(K,\eta)}{h_{2}(K,\eta)}\overset{\text{d}}{=}\frac{h_{1}(K)}{h_{2}(K)}.

Note that

R(K)h1(K)/h2(K)(S),R(K)\coloneqq h_{1}(K)/h_{2}(K)\sim\mathcal{M}(S),

where S={h1(k)/h2(k),kΘ}S=\{h_{1}(k)/h_{2}(k),k\in\Theta\}. By 6.1, letting Z=dN(0,1)Z\overset{\text{d}}{=}N(0,1), the fact that g1(K,η)=dZg_{1}(K,\eta)\overset{\text{d}}{=}Z implies, for kΘk\in\Theta,

g1(k,η)=dZ.g_{1}(k,\eta)\overset{\text{d}}{=}Z.

Then we have, with ψk(x)φ(R(k)x)\psi_{k}(x)\coloneqq\varphi(R(k)x),

[φ(R(K)g1(K,η))]\displaystyle\mathcal{E}[\varphi(R(K)g_{1}(K,\eta))] =supkΘ𝔼[φ(R(k)g1(k,η))]\displaystyle=\sup_{k\in\Theta}\mathbb{E}[\varphi(R(k)g_{1}(k,\eta))]
=supkΘ𝔼[ψk(g1(k,η))]=supkΘ𝔼[ψk(Z)]\displaystyle=\sup_{k\in\Theta}\mathbb{E}[\psi_{k}(g_{1}(k,\eta))]=\sup_{k\in\Theta}\mathbb{E}[\psi_{k}(Z)]
=supkΘ𝔼[φ(R(k)Z)]=supsS𝔼[φ(sZ)].\displaystyle=\sup_{k\in\Theta}\mathbb{E}[\varphi(R(k)Z)]=\sup_{s\in S}\mathbb{E}[\varphi(sZ)].

Meanwhile,

[φ(R(K)g1(K,η))]=[φ(g2(K,η))]=𝔼[φ(Z)].\mathcal{E}[\varphi(R(K)g_{1}(K,\eta))]=\mathcal{E}[\varphi(g_{2}(K,\eta))]=\mathbb{E}[\varphi(Z)].

Then the set SS has to be the singleton \{1\}. It means that R(K)\sim\mathcal{M}(\{1\}), or R(K)=1 (in the quasi-sure sense). Then we also have

\frac{h_{1}(K,\eta)}{h_{2}(K,\eta)}\overset{\text{d}}{=}\frac{h_{1}(K)}{h_{2}(K)}\sim\mathcal{M}(\{1\}).

It means that h_{1}(K,\eta)=h_{2}(K,\eta), and then g_{1}(K,\eta)=g_{2}(K,\eta). The uniqueness has been proved. ∎

Proof of 3.12.

This is a direct consequence of 2.15. Let ϵN(0,1)\epsilon\sim N(0,1). On the one hand, for any φCl.Lip\varphi\in C_{\mathrm{l.Lip}}, as discussed in 3.11.3, we have

[φ(WG)]supσ[σ¯,σ¯]𝔼[φ(σϵ)]=[φ(W)].\mathcal{E}[\varphi(W^{G})]\geq\sup_{\sigma\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\sigma\epsilon)]=\mathcal{E}[\varphi(W)].

On the other hand, when φ\varphi is convex or concave, by 2.15, we have

[φ(W)]maxσ{σ¯,σ¯}𝔼[φ(σϵ)][φ(WG)].\mathcal{E}[\varphi(W)]\geq\max_{\sigma\in\{\underline{\sigma},\overline{\sigma}\}}\mathbb{E}[\varphi(\sigma\epsilon)]\geq\mathcal{E}[\varphi(W^{G})].

Hence, we have [φ(WG)]=[φ(W)]\mathcal{E}[\varphi(W^{G})]=\mathcal{E}[\varphi(W)] under convexity (or concavity) of φ\varphi.

For readers’ convenience, we include an explicit proof of why we have such results for the semi-GG-normal distribution. For technical convenience, we assume \varphi is second-order differentiable. From the representation of the semi-GG-normal distribution (2.15), with G(v)\coloneqq\mathbb{E}_{\mathbb{P}}[\varphi(v\epsilon)] (v\in{[\underline{\sigma},\overline{\sigma}]}), our goal is to show

[φ(W)]=maxv[σ¯,σ¯]G(v)={G(σ¯)φ is convexG(σ¯)φ is concave.\mathcal{E}[\varphi(W)]=\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}G(v)=\begin{cases}G(\overline{\sigma})&\varphi\text{ is convex}\\ G(\underline{\sigma})&\varphi\text{ is concave}\end{cases}.

First of all, by Taylor expansion φ(x)=φ(0)+φ(1)(0)x+φ(2)(ξx)x22\varphi(x)=\varphi(0)+\varphi^{(1)}(0)x+\varphi^{(2)}(\xi_{x})\frac{x^{2}}{2} with ξx(0,x)\xi_{x}\in(0,x), we have,

G(v)=𝔼[φ(0)+φ(1)(0)vϵ+φ(2)(ξvϵ)v22ϵ2]=φ(0)+12𝔼[φ(2)(ξvϵ)(vϵ)2],G(v)=\mathbb{E}_{\mathbb{P}}[\varphi(0)+\varphi^{(1)}(0)v\epsilon+\varphi^{(2)}(\xi_{v\epsilon})\frac{v^{2}}{2}\epsilon^{2}]=\varphi(0)+\frac{1}{2}\mathbb{E}_{\mathbb{P}}[\varphi^{(2)}(\xi_{v\epsilon})(v\epsilon)^{2}],

where \xi_{v\epsilon}\in(0,v\epsilon) is a random variable depending on \epsilon. By the same Taylor expansion, \varphi^{(2)}(\xi_{v\epsilon})(v\epsilon)^{2}=2R(v\epsilon) with R(x)\coloneqq\varphi(x)-\varphi(0)-\varphi^{(1)}(0)x, so

K(v)\coloneqq\mathbb{E}_{\mathbb{P}}[\varphi^{(2)}(\xi_{v\epsilon})(v\epsilon)^{2}]=2\mathbb{E}_{\mathbb{P}}[R(v\epsilon)],

and G(v)=\varphi(0)+\frac{1}{2}K(v). When \varphi is convex, R is convex and nonnegative with R(0)=0 and R^{(1)}(0)=0, so R^{(1)}(x) has the same sign as x. Hence, for each fixed realization of \epsilon, the map v\mapsto R(v\epsilon) is non-decreasing on v\geq 0, since

\frac{\mathop{}\!\mathrm{d}}{\mathop{}\!\mathrm{d}v}R(v\epsilon)=\epsilon R^{(1)}(v\epsilon)\geq 0.

By the monotonicity of the classical expectation, K(v) is non-decreasing with respect to v\in{[\underline{\sigma},\overline{\sigma}]}, so it reaches its maximum at v=\overline{\sigma}. Hence,

[φ(W)]=maxv[σ¯,σ¯]G(v)=maxv[σ¯,σ¯](φ(0)+K(v)2)=G(σ¯).\mathcal{E}[\varphi(W)]=\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}G(v)=\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}(\varphi(0)+\frac{K(v)}{2})=G(\overline{\sigma}).

When \varphi is concave, -\varphi is convex. Replacing \varphi above with -\varphi and repeating the same procedure, we can show that -G(v) is non-decreasing with respect to v; that is, G(v) is non-increasing and reaches its maximum at \underline{\sigma}. ∎
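As a quick numerical sanity check of this monotonicity (separate from the proof itself), the following Python sketch estimates G(v)=\mathbb{E}[\varphi(v\epsilon)] on a grid of v by Monte Carlo, with a placeholder convex \varphi and placeholder bounds, and confirms that the maximum sits at the upper endpoint.

```python
import numpy as np

rng = np.random.default_rng(2)
eps = rng.standard_normal(1_000_000)
phi_convex = lambda x: np.abs(x) ** 3      # placeholder convex test function
sig_lo, sig_hi = 0.5, 2.0                  # placeholder volatility bounds

grid = np.linspace(sig_lo, sig_hi, 16)
G = np.array([phi_convex(v * eps).mean() for v in grid])   # G(v) = E[phi(v*eps)]
print(grid[np.argmax(G)])   # close to sig_hi: the maximum is at the upper bound
```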

6.3 Proofs in Section 3.6

The proofs in this section are mainly based on the results in Section 2.2, which provide useful tools to deal with the independence of sequences in this framework.

Lemma 6.4.

In a sublinear expectation space, for a sequence of i.i.d. random variables \{\epsilon_{i}\}_{i=1}^{n}\sim\mathcal{N}(0,[1,1]) (namely, \epsilon_{1}\dashrightarrow\epsilon_{2}\dashrightarrow\dotsc\dashrightarrow\epsilon_{n}), we have

(\epsilon_{1},\epsilon_{2},\dotsc,\epsilon_{n})^{T}\sim N(\bm{0},\mathbf{I}_{n}),

where 𝐈n\mathbf{I}_{n} is the n×nn\times n identity matrix.

Proof.

Since the distribution of \epsilon_{i} can be treated as the classical N(0,1), the sequential independence can be treated as the classical independence (2.6.3). Then we can get the required results by applying the classical logic. ∎

Remark 6.4.1.

Since the independence of {ϵi}i=1n\{\epsilon_{i}\}_{i=1}^{n} is classical, the order of independence can also be arbitrarily switched so we can easily obtain a result similar to 3.6.

Proposition 6.5.

For a sequence of i.i.d. random variables {ϵi}i=1n𝒩(0,[1,1])\{\epsilon_{i}\}_{i=1}^{n}\sim\mathcal{N}(0,[1,1]), the following three statements are equivalent:

  • (1)

    ϵ1ϵ2ϵn{\epsilon}_{1}\dashrightarrow{\epsilon}_{2}\dashrightarrow\cdots\dashrightarrow{\epsilon}_{n},

  • (2)

    ϵk1ϵk2ϵkn\epsilon_{k_{1}}\dashrightarrow\epsilon_{k_{2}}\dashrightarrow\cdots\dashrightarrow\epsilon_{k_{n}} for any permutation {kj}j=1n\{k_{j}\}_{j=1}^{n} of {1,2,,n}\{1,2,\dotsc,n\},

  • (3)

    (\epsilon_{1},\epsilon_{2},\dotsc,\epsilon_{n})\sim N(\bm{0},\mathbf{I}_{n}).

Proof of 3.17.

Since the fully-sequential independence implies (F1) and (F2) by 2.20 and 2.23, we only need to show the other direction. When n=2n=2, this result is a consequence of 2.24. For iji\leq j, let

(V,ϵ)ij(Vi,ϵi,Vi+1,ϵi+1,,Vj,ϵj).(V,\epsilon)_{i}^{j}\coloneqq(V_{i},\epsilon_{i},V_{i+1},\epsilon_{i+1},\dotsc,V_{j},\epsilon_{j}).

Next we proceed by math induction. Suppose the result holds for n=kn=k with k2k\geq 2. For n=k+1n=k+1, we only need to show: given the conditions

  1. 1.

    (V1,ϵ1)(V2,ϵ2)(Vk+1,ϵk+1)(V_{1},\epsilon_{1})\dashrightarrow(V_{2},\epsilon_{2})\dashrightarrow\cdots\dashrightarrow(V_{k+1},\epsilon_{k+1}),

  2. 2.

    ViϵiV_{i}\dashrightarrow\epsilon_{i} for i=1,2,,k+1i=1,2,\dotsc,k+1,

we have the fully-sequential independence:

V1ϵ1VkϵkVk+1ϵk+1.V_{1}\dashrightarrow\epsilon_{1}\dashrightarrow\cdots\dashrightarrow V_{k}\dashrightarrow\epsilon_{k}\dashrightarrow V_{k+1}\dashrightarrow\epsilon_{k+1}. (6.7)

Since all the independence relations in 6.7 up to the term \epsilon_{k} can be guaranteed by the presumed result with n=k, we only need to show the additional independence:

  1. 1.

    (V,ϵ)1kVk+1(V,\epsilon)_{1}^{k}\dashrightarrow V_{k+1},

  2. 2.

    ((V,ϵ)1k,Vk+1)ϵk+1((V,\epsilon)_{1}^{k},V_{k+1})\dashrightarrow\epsilon_{k+1}.

The first one comes from (F1) whose definition implies (V,ϵ)1k(Vk+1,ϵk+1)(V,\epsilon)_{1}^{k}\dashrightarrow(V_{k+1},\epsilon_{k+1}). The second one comes from 2.22 given the following statements:

  1. 1.

    (V,ϵ)1k(Vk+1,ϵk+1)(V,\epsilon)_{1}^{k}\dashrightarrow(V_{k+1},\epsilon_{k+1}) by (F1);

  2. 2.

    (V,ϵ)1kVk+1(V,\epsilon)_{1}^{k}\dashrightarrow V_{k+1} by the first one.

Then we have the required result for n=k+1n=k+1. The proof is finished by math induction. ∎

Proof of 3.18.

First, the definition 3.18 of semi-sequential independence implies (S1) to (S3) by 2.20 and 2.23. We only need to check the other direction. For i\leq j, let V_{i}^{j}\coloneqq(V_{i},V_{i+1},\dotsc,V_{j}) and similarly define the notation \epsilon_{i}^{j}. By 2.8, our goal can be expanded as:

  1. 1.

    V1lVl+1V_{1}^{l}\dashrightarrow V_{l+1} for any l=1,2,,n1l=1,2,\dotsc,n-1,

  2. 2.

    (V1n1,Vn,ϵ1l)ϵl+1(V_{1}^{n-1},V_{n},\epsilon_{1}^{l})\dashrightarrow\epsilon_{l+1} for any l=1,2,,n1l=1,2,\dotsc,n-1.

The first one comes from (S2). For the second one, note that we have

  1. 1.

    (V1n1,Vn)(ϵ1l,ϵl+1)(V_{1}^{n-1},V_{n})\dashrightarrow(\epsilon_{1}^{l},\epsilon_{l+1}) by (S1),

  2. 2.

    ϵ1lϵl+1\epsilon_{1}^{l}\dashrightarrow\epsilon_{l+1} by (S3),

  3. 3.

    V1n1VnV_{1}^{n-1}\dashrightarrow V_{n} by (S2),

then by 2.24, we have proved the second relation. ∎

Proof of 3.22.

The main idea is the equivalent definition of semi-sequential independence given by 3.18, which shows the symmetry within the VV part and the \epsilon part. The equivalence of the three statements will be proved in the following order:

(3)(1)(2).(3)\iff(1)\iff(2).

Let π:(x1,x2,,xn)(xk1,xk2,,xkn)\pi:(x_{1},x_{2},\dotsc,x_{n})\to(x_{k_{1}},x_{k_{2}},\dotsc,x_{k_{n}}) denote a permutation function.

(1)(2)(1)\iff(2). It is a direct translation of 3.18 by considering the equivalence in each part:

  1. 1.

    The equivalence in (S1) can be seen by treating each vector as a function of the other under the permutation \pi (or \pi^{-1}).

  2. 2.

    The equivalence in (S2) comes from 3.6.

  3. 3.

    The equivalence in (S3) comes from 6.5.

(3)(1)(3)\iff(1). Let 𝑾(W1,W2,,Wn)\bm{W}\coloneqq(W_{1},W_{2},\dotsc,W_{n}). Then

𝑾=(V1ϵ1,V2ϵ2,,Vnϵn)=𝐕ϵ.\bm{W}=(V_{1}\epsilon_{1},V_{2}\epsilon_{2},\dotsc,V_{n}\epsilon_{n})=\mathbf{V}\bm{\epsilon}.

Then we can decompose (3) into three conditions, each of which is equivalent to the corresponding condition of (1) in the context of 3.18:

  1. 1.

    Since \mathbf{V}=\operatorname{diag}(V_{1},\dotsc,V_{n}) is a bijective function of (V_{1},\dotsc,V_{n}), we have \mathbf{V}\dashrightarrow\bm{\epsilon} if and only if (V_{1},\dotsc,V_{n})\dashrightarrow\bm{\epsilon}, which is (S1) in 3.18.

  2. 2.

    Note that 𝐕([σ¯,σ¯]𝐈n)\mathbf{V}\sim\mathcal{M}({[\underline{\sigma},\overline{\sigma}]}\mathbf{I}_{n}) is equivalent to 𝑽([σ¯,σ¯]n)\bm{V}\sim\mathcal{M}({[\underline{\sigma},\overline{\sigma}]}^{n}) which is further equivalent to (S2): {Vi}i=1n\{V_{i}\}_{i=1}^{n} are sequentially independent.

  3. 3.

    By 6.5, \bm{\epsilon}\sim N(\bm{0},\mathbf{I}_{n}) is equivalent to (S3): \{\epsilon_{i}\}_{i=1}^{n} are sequentially independent.∎

6.4 Proofs in Section 3.7

Proof of 3.24.

(A note on the finiteness of sublinear expectations) For any \varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{k+1}), by definition there exist m\in\mathbb{N}_{+} and C_{0}>0 such that for \bm{x},\bm{y}\in\mathbb{R}^{k+1},

|φ(𝒙)φ(𝒚)|C0(1+𝒙m+𝒚m)𝒙𝒚.\lvert\varphi(\bm{x})-\varphi(\bm{y})\rvert\leq C_{0}(1+\lVert\bm{x}\rVert^{m}+\lVert\bm{y}\rVert^{m})\lVert\bm{x}-\bm{y}\rVert.

Without loss of generality, we can assume φ(𝟎)=0\varphi(\bm{0})=0, then we have |φ(x)|C0(1+xm)x\lvert\varphi(x)\rvert\leq C_{0}(1+\lVert x\rVert^{m})\lVert x\rVert. It implies

\mathcal{E}[\lvert\varphi(\bm{W})\rvert]\leq C_{0}(\mathcal{E}[\lVert\bm{W}\rVert]+\mathcal{E}[\lVert\bm{W}\rVert^{m+1}]).

To validate [|φ(𝑾)|]<\mathcal{E}[\lvert\varphi(\bm{W})\rvert]<\infty under each case, it will be sufficient to confirm the finiteness of this sublinear expectation: for any q+q\in\mathbb{N}_{+},

[𝑾q]<.\mathcal{E}[\lVert\bm{W}\rVert^{q}]<\infty. (6.8)

(Semi-sequential independence case) Under the independence specified by 3.25, from 3.14, we have

𝑾𝒩^(𝟎,𝒱),\bm{W}\sim\hat{\mathcal{N}}(\bm{0},\mathcal{V}),

where 𝒱={diag(σ12,σ22,,σn2):σi[σ¯,σ¯]}\mathcal{V}=\{\operatorname{diag}(\sigma_{1}^{2},\sigma_{2}^{2},\dotsc,\sigma_{n}^{2}):\sigma_{i}\in{[\underline{\sigma},\overline{\sigma}]}\}. Therefore,

\mathcal{E}[\varphi(\bm{W})]=\max_{\mathbf{V}\in\mathcal{V}}\mathbb{E}_{\mathbb{P}}[\varphi(\mathbf{V}^{1/2}\bm{\epsilon})]=\max_{\bm{\sigma}\in\mathcal{C}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}*\bm{\epsilon})],

where 𝐕1/2\mathbf{V}^{1/2} is the symmetric square root of 𝐕\mathbf{V} and 𝒞n[σ¯,σ¯]{𝝈:(σ1,σ2,,σn)[σ¯,σ¯]n}.\mathcal{C}_{n}{[\underline{\sigma},\overline{\sigma}]}\coloneqq\{\bm{\sigma}:(\sigma_{1},\sigma_{2},\dotsc,\sigma_{n})\in{[\underline{\sigma},\overline{\sigma}]}^{n}\}. At the same time, we can validate the finiteness [|φ(𝑾)|]<\mathcal{E}[\lvert\varphi(\bm{W})\rvert]<\infty, because 𝐕1/2ϵ\mathbf{V}^{1/2}\bm{\epsilon} follows a classical multivariate normal distribution, then for any q+q\in\mathbb{N}_{+}, 𝔼[𝐕1/2ϵq]<\mathbb{E}_{\mathbb{P}}[\lVert\mathbf{V}^{1/2}\bm{\epsilon}\rVert^{q}]<\infty, which implies 6.8.

Next, since we have 𝒞n[σ¯,σ¯]𝒮n[σ¯,σ¯]\mathcal{C}_{n}{[\underline{\sigma},\overline{\sigma}]}\subset\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]}, we only need to show for any 𝝈𝒮n[σ¯,σ¯]\bm{\sigma}\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]},

𝔼[φ(𝝈ϵ)][φ(𝑾)].\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}*\bm{\epsilon})]\leq\mathcal{E}[\varphi(\bm{W})]. (6.9)

Note that 𝝈ϵ\bm{\sigma}\mathrel{\mathchoice{\hbox to0.0pt{$\displaystyle\perp$\hss}{\displaystyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\textstyle\perp$\hss}{\textstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptstyle\perp$\hss}{\scriptstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptscriptstyle\perp$\hss}{\scriptscriptstyle\mkern 2.0mu\perp}}}\bm{\epsilon} for 𝝈𝒮n[σ¯,σ¯]\bm{\sigma}\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]} and the random vector 𝝈\bm{\sigma} must follow a joint distribution supporting on a subset of [σ¯,σ¯]n{[\underline{\sigma},\overline{\sigma}]}^{n}, then we can apply the representation of multivariate semi-GG-normal distribution (3.14) to get the inequality 6.9.
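As an illustration of the inequality 6.9 (not used in the proof), the following Monte Carlo sketch in Python compares one randomized scenario \bm{\sigma}\in\mathcal{S}_{n}{[\underline{\sigma},\overline{\sigma}]} (drawn independently of \bm{\epsilon}) with the maximum over constant scenarios approximated on a grid; the bounds, dimension and test function are placeholder choices of ours.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
sig_lo, sig_hi, n, reps = 0.5, 2.0, 2, 200_000    # placeholder bounds and sizes
phi = lambda w: np.sin(w.sum(axis=1)) + (w[:, 0] * w[:, 1]) ** 2   # arbitrary test function

eps = rng.standard_normal((reps, n))

# A randomized scenario: sigma drawn over [sig_lo, sig_hi]^n, independent of eps.
sigma_rand = rng.uniform(sig_lo, sig_hi, size=(reps, n))
lhs = phi(sigma_rand * eps).mean()

# Right-hand side of (6.9): max over constant sigma vectors, approximated on a grid.
grid = np.linspace(sig_lo, sig_hi, 11)
rhs = max(phi(np.array(s) * eps).mean() for s in product(grid, repeat=n))

print(lhs, rhs)   # lhs should not exceed rhs, up to Monte Carlo and grid error
```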

(Sequential independence case) We proceed by mathematical induction. For n=1, the results 3.29 and 3.30 as well as the finiteness 6.8 hold by applying 3.10. Suppose they also hold for n=k with k\in\mathbb{N}_{+}. Our objective is to prove them when n=k+1 by using the result with n=k. We decompose this goal into three inequalities:

[φ(𝑾(k+1))]sup𝝈k+1[σ¯,σ¯]𝔼[φ(𝝈ϵ)],\mathcal{E}[\varphi(\bm{W}_{(k+1)})]\leq\sup_{\bm{\sigma}\in\mathcal{L}_{k+1}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}*\bm{\epsilon})], (6.10)
sup𝝈k+1[σ¯,σ¯]𝔼[φ(𝝈ϵ)]sup𝝈k+1[σ¯,σ¯]𝔼[φ(𝝈ϵ)],\sup_{\bm{\sigma}\in\mathcal{L}_{k+1}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}*\bm{\epsilon})]\leq\sup_{\bm{\sigma}\in\mathcal{L}^{*}_{k+1}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}*\bm{\epsilon})], (6.11)

and

sup𝝈k+1[σ¯,σ¯]𝔼[φ(𝝈ϵ)][φ(𝑾(k+1))].\sup_{\bm{\sigma}\in\mathcal{L}^{*}_{k+1}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}*\bm{\epsilon})]\leq\mathcal{E}[\varphi(\bm{W}_{(k+1)})]. (6.12)

After we check the three inequalities above, sup\sup can be changed to max\max since we will show the sublinear expectation can be reached by some 𝝈k+1[σ¯,σ¯]\bm{\sigma}\in\mathcal{L}_{k+1}{[\underline{\sigma},\overline{\sigma}]} in the proof of 6.10.

First of all, 6.11 is straightforward due to the fact that k+1[σ¯,σ¯]k+1[σ¯,σ¯]\mathcal{L}_{k+1}{[\underline{\sigma},\overline{\sigma}]}\subset\mathcal{L}^{*}_{k+1}{[\underline{\sigma},\overline{\sigma}]}.

Second, to validate 6.10, it is sufficient to show that the sublinear expectation \mathcal{E}[\varphi(\bm{W})] can be reached by choosing some \bm{\sigma}\in\mathcal{L}_{k+1}{[\underline{\sigma},\overline{\sigma}]}. In fact, we can directly select it by the following iterative procedure (similar to the idea of 4.1).

[φ(W1,W2,,Wk+1)]\displaystyle\mathcal{E}[\varphi(W_{1},W_{2},\dotsc,W_{k+1})] =[φ(𝑾(k),Wk+1)]\displaystyle=\mathcal{E}[\varphi(\bm{W}_{(k)},W_{k+1})]
=[[φ(𝒘(k),Wk+1)]𝒘(k)=𝑾(k)]\displaystyle=\mathcal{E}[\mathcal{E}[\varphi(\bm{w}_{(k)},W_{k+1})]_{\bm{w}_{(k)}=\bm{W}_{(k)}}]
=[(maxσk+1[σ¯,σ¯]𝔼[φ(𝒘(k),σk+1ϵk+1)])𝒘(k)=𝑾(k)]\displaystyle=\mathcal{E}[(\max_{\sigma_{k+1}\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\bm{w}_{(k)},\sigma_{k+1}\epsilon_{k+1})])_{\bm{w}_{(k)}=\bm{W}_{(k)}}]
\displaystyle=\mathcal{E}[\mathbb{E}_{\mathbb{P}}[\varphi(\bm{w}_{(k)},v_{k+1}(\bm{w}_{(k)})\epsilon_{k+1})]_{\bm{w}_{(k)}=\bm{W}_{(k)}}],

where vk+1()v_{k+1}(\cdot) is the maximizer depending on the value of 𝒘(k)\bm{w}_{(k)}.

Claim 6.2.

For any φCl.Lip(k+1)\varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{k+1}), let

φk(x)maxσk+1[σ¯,σ¯]𝔼[φ(x,σk+1ϵk+1)].\varphi_{k}(x)\coloneqq\max_{\sigma_{k+1}\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(x,\sigma_{k+1}\epsilon_{k+1})].

Then we have φkCl.Lip(k)\varphi_{k}\in C_{\mathrm{l.Lip}}(\mathbb{R}^{k}).

To apply the result when n=k, we first confirm that \varphi_{k}\in C_{\mathrm{l.Lip}}(\mathbb{R}^{k}) (due to 6.2). Then we have

[φ(𝑾(k),Wk+1)]\displaystyle\mathcal{E}[\varphi(\bm{W}_{(k)},W_{k+1})] =[φk(𝑾(k))],\displaystyle=\mathcal{E}[\varphi_{k}(\bm{W}_{(k)})],
=max𝝈(k)k[σ¯,σ¯]𝔼[φk(𝝈(k)ϵ(k))]=𝔼[φk(𝒗(k)ϵ(k))]\displaystyle=\max_{\bm{\sigma}_{(k)}\in\mathcal{L}_{k}{{[\underline{\sigma},\overline{\sigma}]}}}\mathbb{E}_{\mathbb{P}}[\varphi_{k}(\bm{\sigma}_{(k)}*\bm{\epsilon}_{(k)})]=\mathbb{E}_{\mathbb{P}}[\varphi_{k}(\bm{v}_{(k)}*\bm{\epsilon}_{(k)})]

where 𝒗(k)k[σ¯,σ¯]\bm{v}_{(k)}\in\mathcal{L}_{k}{{[\underline{\sigma},\overline{\sigma}]}} is the maximizer. From this procedure, we can choose

𝒗(k+1)(𝒗(k),vk+1(𝒗(k)ϵ(k))),\bm{v}_{(k+1)}\coloneqq(\bm{v}_{(k)},v_{k+1}(\bm{v}_{(k)}*\bm{\epsilon}_{(k)})),

which corresponds to an element in \mathcal{L}_{k+1}{[\underline{\sigma},\overline{\sigma}]}. Then it is easy to confirm that \mathcal{E}[\varphi(\bm{W}_{(k+1)})]=\mathbb{E}_{\mathbb{P}}[\varphi(\bm{v}_{(k+1)}*\bm{\epsilon}_{(k+1)})] by repeating the procedure above. Meanwhile, the finiteness 6.8 is also guaranteed since, for any q\in\mathbb{N}_{+}, choosing \varphi(\cdot)=\lVert\cdot\rVert^{q}\in C_{\mathrm{l.Lip}}, we have

[𝑾(k+1)q]=[φ(𝑾(k+1))]=[φk(𝑾(k))]<,\mathcal{E}[\lVert\bm{W}_{(k+1)}\rVert^{q}]=\mathcal{E}[\varphi(\bm{W}_{(k+1)})]=\mathcal{E}[\varphi_{k}(\bm{W}_{(k)})]<\infty,

due to the confirmed fact that φkCl.Lip\varphi_{k}\in C_{\mathrm{l.Lip}} and the assumed 6.8 for n=kn=k.

Third, as an equivalent way of viewing 6.12, we need to prove that, for any \bm{\sigma}_{(k+1)}\in\mathcal{L}^{*}_{k+1}{[\underline{\sigma},\overline{\sigma}]}, the corresponding linear expectation is dominated by \mathcal{E}[\varphi(\bm{W}_{(k+1)})]. Actually, we can write the classical expectation as

𝔼[φ(𝝈(k)ϵ(k),σk+1ϵk+1)]=𝔼[𝔼[φ(𝝈(k)ϵ(k),σk+1ϵk+1)|k]].\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}_{(k)}\bm{\epsilon}_{(k)},\sigma_{k+1}\epsilon_{k+1})]=\mathbb{E}_{\mathbb{P}}[\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}_{(k)}\bm{\epsilon}_{(k)},\sigma_{k+1}\epsilon_{k+1})|\mathcal{F}_{k}]]. (6.13)

Recall the notation we used in the proof of 6.10,

φk(𝒘(k))maxσk+1[σ¯,σ¯]𝔼[φ(𝒘(k),σk+1ϵk+1)]=[φ(𝒘(k),Wk+1)].\varphi_{k}(\bm{w}_{(k)})\coloneqq\max_{\sigma_{k+1}\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\bm{w}_{(k)},\sigma_{k+1}\epsilon_{k+1})]=\mathcal{E}[\varphi(\bm{w}_{(k)},W_{k+1})].

For the conditional expectation part in 6.13, since the information of (𝝈(k),ϵ(k))(\bm{\sigma}_{(k)},\bm{\epsilon}_{(k)}) is given and σk+1ϵk+1|k\sigma_{k+1}\mathrel{\mathchoice{\hbox to0.0pt{$\displaystyle\perp$\hss}{\displaystyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\textstyle\perp$\hss}{\textstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptstyle\perp$\hss}{\scriptstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptscriptstyle\perp$\hss}{\scriptscriptstyle\mkern 2.0mu\perp}}}\epsilon_{k+1}|\mathcal{F}_{k}, from the representation of univariate semi-GG-normal (3.10), it must satisfy:

𝔼[φ(𝝈(k)ϵ(k),σk+1ϵk+1)|k]\displaystyle\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}_{(k)}\bm{\epsilon}_{(k)},\sigma_{k+1}\epsilon_{k+1})|\mathcal{F}_{k}] maxσk+1[σ¯,σ¯]𝔼[φ(𝝈(k)ϵ(k),σk+1ϵk+1)|k]=φk(𝝈(k)ϵ(k)).\displaystyle\leq\max_{\sigma_{k+1}\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}_{(k)}\bm{\epsilon}_{(k)},\sigma_{k+1}\epsilon_{k+1})|\mathcal{F}_{k}]=\varphi_{k}(\bm{\sigma}_{(k)}\bm{\epsilon}_{(k)}).

Hence, by taking expectations on both sides and applying the presumed result for n=kn=k, we have

𝔼[φ(𝝈(k)ϵ(k),σk+1ϵk+1)]\displaystyle\mathbb{E}_{\mathbb{P}}[\varphi(\bm{\sigma}_{(k)}\bm{\epsilon}_{(k)},\sigma_{k+1}\epsilon_{k+1})] 𝔼[φk(𝝈(k)ϵ(k))]max𝝈(k)k[σ¯,σ¯]𝔼[φk(𝝈(k)ϵ(k))]\displaystyle\leq\mathbb{E}_{\mathbb{P}}[\varphi_{k}(\bm{\sigma}_{(k)}\bm{\epsilon}_{(k)})]\leq\max_{\bm{\sigma}_{(k)}\in\mathcal{L}^{*}_{k}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi_{k}(\bm{\sigma}_{(k)}\bm{\epsilon}_{(k)})]
=[φk(𝑾(k))]=[[φ(𝒘(k),Wk+1)]𝒘(k)=𝑾(k)]\displaystyle=\mathcal{E}[\varphi_{k}(\bm{W}_{(k)})]=\mathcal{E}[\mathcal{E}[\varphi(\bm{w}_{(k)},W_{k+1})]_{\bm{w}_{(k)}=\bm{W}_{(k)}}]
=[φ(𝑾(k),Wk+1)].\displaystyle=\mathcal{E}[\varphi(\bm{W}_{(k)},W_{k+1})].

Therefore, we have shown 6.12. The proof is completed by induction.

(Fully-sequential independence case) Note that fully-sequential independence implies sequential independence, and we have shown an explicit representation of \mathcal{E}[\varphi(\bm{W})] for the latter situation. Hence, the representation here is the same as 3.29 and 3.30.

To prove 6.2, first recall the definition of \varphi\in C_{\mathrm{l.Lip}}(\mathbb{R}^{k+1}), which means there exist m\in\mathbb{N}_{+} and C_{0}>0 such that for \bm{x},\bm{y}\in\mathbb{R}^{k+1},

|φ(𝒙)φ(𝒚)|C0(1+𝒙m+𝒚m)𝒙𝒚.\lvert\varphi(\bm{x})-\varphi(\bm{y})\rvert\leq C_{0}(1+\lVert\bm{x}\rVert^{m}+\lVert\bm{y}\rVert^{m})\lVert\bm{x}-\bm{y}\rVert.

Note that

φk(𝒙)=maxv[σ¯,σ¯]𝔼[φ(𝒙,vϵ)].\varphi_{k}(\bm{x})=\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\bm{x},v\epsilon)].

Then we write

\displaystyle|\varphi_{k}(\bm{x})-\varphi_{k}(\bm{y})| \displaystyle\leq\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}\bigl{|}\mathbb{E}[\varphi(\bm{x},v\epsilon)-\varphi(\bm{y},v\epsilon)]\bigr{|}
\displaystyle\leq\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\lvert\varphi(\bm{x},v\epsilon)-\varphi(\bm{y},v\epsilon)\rvert]
maxv[σ¯,σ¯]𝔼[C0(1+(𝒙,vϵ)m+(𝒚,vϵ)m)𝒙𝒚],\displaystyle\leq\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[C_{0}(1+\lVert(\bm{x},v\epsilon)\rVert^{m}+\lVert(\bm{y},v\epsilon)\rVert^{m})\lVert\bm{x}-\bm{y}\rVert],

where we adapt the norm to the lower dimension in the sense that \lVert\bm{a}_{(k)}\rVert\coloneqq\lVert(\bm{a}_{(k)},0)\rVert. By the triangle inequality, for any v\in{[\underline{\sigma},\overline{\sigma}]},

(𝒙,vϵ)𝒙+|vϵ|𝒙+σ¯|ϵ|,\lVert(\bm{x},v\epsilon)\rVert\leq\lVert\bm{x}\rVert+\lvert v\epsilon\rvert\leq\lVert\bm{x}\rVert+\overline{\sigma}\lvert\epsilon\rvert,

then

(x,vϵ)m(x+σ¯|ϵ|)mmax{1,2m1}(xm+σ¯m|ϵ|m).\lVert(x,v\epsilon)\rVert^{m}\leq(\lVert x\rVert+\overline{\sigma}\lvert\epsilon\rvert)^{m}\leq\max\{1,2^{m-1}\}(\lVert x\rVert^{m}+\overline{\sigma}^{m}\lvert\epsilon\rvert^{m}).

Hence, with C1=C0max{1,2m1}C_{1}=C_{0}\max\{1,2^{m-1}\} and C2=C1max{1,2σ¯m𝔼[|ϵ|m]}C_{2}=C_{1}\max\{1,2\overline{\sigma}^{m}\mathbb{E}[\lvert\epsilon\rvert^{m}]\},

|φk(𝒙)φk(𝒚)|\displaystyle|\varphi_{k}(\bm{x})-\varphi_{k}(\bm{y})| C0maxv[σ¯,σ¯](1+[(x,vϵ)m]+[(y,vϵ)m])xy\displaystyle\leq C_{0}\max_{v\in{[\underline{\sigma},\overline{\sigma}]}}(1+\mathcal{E}[\lVert(x,v\epsilon)\rVert^{m}]+\mathcal{E}[\lVert(y,v\epsilon)\rVert^{m}])\lVert x-y\rVert
C1(1+xm+ym+2σ¯m𝔼[|ϵ|m])xy\displaystyle\leq C_{1}(1+\lVert x\rVert^{m}+\lVert y\rVert^{m}+2\overline{\sigma}^{m}\mathbb{E}[\lvert\epsilon\rvert^{m}])\lVert x-y\rVert
C2(1+xm+ym)xy.\displaystyle\leq C_{2}(1+\lVert x\rVert^{m}+\lVert y\rVert^{m})\lVert x-y\rVert.\qed
Proof of 3.24.2.

Under semi-sequential independence, note that

(W1,W2,,Wn)𝒩^(𝟎,𝒞),(W_{1},W_{2},\dotsc,W_{n})\sim\hat{\mathcal{N}}(\bm{0},\mathcal{C}),

with \mathcal{C}=\{\operatorname{diag}(\sigma^{2}_{1},\sigma^{2}_{2},\dotsc,\sigma^{2}_{n}):\sigma_{i}\in{[\underline{\sigma},\overline{\sigma}]}\}. It has the representation (3.5):

[φ(𝑾(n))]=maxσi[σ¯,σ¯]𝔼[φ(diag(σ1,σ2,,σn)ϵ(n))],\mathcal{E}[\varphi(\bm{W}_{(n)})]=\max_{\sigma_{i}\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\operatorname{diag}(\sigma_{1},\sigma_{2},\dotsc,\sigma_{n})\bm{\epsilon}_{(n)})],

where \bm{\epsilon}_{(n)} follows a standard multivariate normal distribution. When \varphi is convex, by simply repeating the idea of 3.12 in the multivariate case, we can show

\mathcal{E}[\varphi(\bm{W}_{(n)})]=\mathbb{E}[\varphi(\operatorname{diag}(\overline{\sigma},\overline{\sigma},\dotsc,\overline{\sigma})\bm{\epsilon}_{(n)})].

Accordingly, when \varphi is concave, we can get a similar result with \overline{\sigma} replaced by \underline{\sigma}.

Under sequential independence, based on the idea of showing

[φ(𝑾)]=max𝝈n[σ¯,σ¯]𝔼[φ(𝝈ϵ)],\mathcal{E}[\varphi(\bm{W})]=\max_{\bm{\sigma}\in\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi(\bm{\sigma}*\bm{\epsilon})],

in 3.24. The maximizer can be obtained by implementing the iterative algorithm: with φ0φ\varphi_{0}\coloneqq\varphi, i=1,2,,ni=1,2,\dotsc,n,

φi(𝒙(ni))=maxσni+1[σ¯,σ¯]𝔼[φi1(𝒙(ni),σni+1ϵni+1)].\varphi_{i}(\bm{x}_{(n-i)})=\max_{\sigma_{n-i+1}\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\varphi_{i-1}(\bm{x}_{(n-i)},\sigma_{n-i+1}\epsilon_{n-i+1})]. (6.14)

Then we only need to record the optimizer σni+1()\sigma_{n-i+1}(\cdot) which is a function of 𝒙(ni)\bm{x}_{(n-i)} to get the maximizer 𝝈n[σ¯,σ¯]\bm{\sigma}^{*}\in\mathcal{L}_{n}{[\underline{\sigma},\overline{\sigma}]}. First we can show that, for i=1,2,,ni=1,2,\dotsc,n,

φi1 is convex (concave)φi is convex (concave),\varphi_{i-1}\text{ is convex (concave)}\implies\varphi_{i}\text{ is convex (concave)}, (6.15)

namely, the convexity (or concavity) of φi1\varphi_{i-1} can be carried over to φi\varphi_{i}. Actually, if φi1\varphi_{i-1} is convex (in ni+1)\mathbb{R}^{n-i+1}), it must be convex with respect to each subvector of arguments. Then by applying 3.12, we have

φi(𝒙(ni))=𝔼[φi1(𝒙(ni),σ¯ϵni+1)],\varphi_{i}(\bm{x}_{(n-i)})=\mathbb{E}[\varphi_{i-1}(\bm{x}_{(n-i)},\overline{\sigma}\epsilon_{n-i+1})], (6.16)

which also gives us the choice of σni+1\sigma_{n-i+1}. Then we can validate the convexity of φi\varphi_{i} by definition: with λ[0,1]\lambda\in[0,1], eσ¯ϵni+1e\coloneqq\overline{\sigma}\epsilon_{n-i+1},

φi(λ𝒙(ni)+(1λ)𝒚(ni))\displaystyle\varphi_{i}(\lambda\bm{x}_{(n-i)}+(1-\lambda)\bm{y}_{(n-i)}) =𝔼[φi1(λ𝒙(ni)+(1λ)𝒚(ni),e)]\displaystyle=\mathbb{E}[\varphi_{i-1}(\lambda\bm{x}_{(n-i)}+(1-\lambda)\bm{y}_{(n-i)},e)]
=𝔼[φi1(λ(𝒙(ni),e)+(1λ)(𝒚(ni),e))]\displaystyle=\mathbb{E}[\varphi_{i-1}(\lambda(\bm{x}_{(n-i)},e)+(1-\lambda)(\bm{y}_{(n-i)},e))]
λ𝔼[φi1(𝒙(ni),e)]+(1λ)𝔼[φi1(𝒚(ni),e)]\displaystyle\leq\lambda\mathbb{E}[\varphi_{i-1}(\bm{x}_{(n-i)},e)]+(1-\lambda)\mathbb{E}[\varphi_{i-1}(\bm{y}_{(n-i)},e)]
=λφi(𝒙(ni))+(1λ)φi(𝒚(ni)).\displaystyle=\lambda\varphi_{i}(\bm{x}_{(n-i)})+(1-\lambda)\varphi_{i}(\bm{y}_{(n-i)}).

We can follow the same arguments to show the concave case. Finally, we can start from the convexity (concavity, respectively) of φ0\varphi_{0} to show the convexity (concavity) of all φi\varphi_{i} and, along the way, we get that each optimal σni+1\sigma_{n-i+1} equals σ¯\overline{\sigma} (σ¯\underline{\sigma}, respectively). ∎
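
The backward recursion 6.14 can also be run numerically. The following sketch (Python; the quadrature order, σ-grid and convex test function are illustrative assumptions, with n = 2 for brevity) records the optimizing σ at each step and, in line with the argument above, returns σ̄ throughout when φ is convex:

import numpy as np

x, w = np.polynomial.hermite.hermgauss(60)
nodes, weights = np.sqrt(2.0) * x, w / np.sqrt(np.pi)   # E[f(Z)] = sum(weights * f(nodes))

sig_lo, sig_hi = 0.5, 1.5
sig_grid = np.linspace(sig_lo, sig_hi, 21)
phi = lambda x1, x2: np.abs(x1 + x2) ** 3                # a convex test function

# step i = 1 of 6.14: phi_1(x1) = max over sigma_2 of E[phi(x1, sigma_2 * eps_2)], with its argmax
def phi1(x1):
    vals = np.array([np.sum(weights * phi(x1, s * nodes)) for s in sig_grid])
    return vals.max(), sig_grid[vals.argmax()]

# step i = 2: the sublinear expectation is max over sigma_1 of E[phi_1(sigma_1 * eps_1)]
outer_vals = [np.sum(weights * np.array([phi1(s1 * z)[0] for z in nodes])) for s1 in sig_grid]
print("optimal sigma_1:", sig_grid[int(np.argmax(outer_vals))])       # expect sig_hi
print("optimal sigma_2 at x1 = -2, 0, 2:", [phi1(t)[1] for t in (-2.0, 0.0, 2.0)])
print("sublinear expectation:", max(outer_vals))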

Proof of 3.24.3.

The idea is similar to the proof of 3.24.2, with 6.14 replaced by

φi(𝒙(ni))=[φi1(𝒙(ni),WGni+1)].\varphi_{i}(\bm{x}_{(n-i)})=\mathcal{E}[\varphi_{i-1}(\bm{x}_{(n-i)},W^{G}_{n-i+1})].

To check the statement 6.15, we can use 2.15 together with the convexity of φ\varphi to show 6.16. The remaining part is the same as in the proof of 3.24.2. ∎

6.5 Proofs in Section 4.3

The goal of this section is to prove 4.8, which is a simple representation for [φ(W1,W2)]\mathcal{E}[\varphi(W_{1},W_{2})] for φCs.poly\varphi\in C_{\text{s.poly}}. Throughout this section, without further notice, we consider W1=dW2=d𝒩^(0,[σ¯2,σ¯2])W_{1}\overset{\text{d}}{=}W_{2}\overset{\text{d}}{=}\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) imposed with the sequential independence W1W2W_{1}\dashrightarrow W_{2}. We also have the expression WiViϵiW_{i}\coloneqq V_{i}\epsilon_{i} with Vi[σ¯,σ¯]V_{i}\sim\mathcal{M}[\underline{\sigma},\overline{\sigma}], ϵiN(0,1)\epsilon_{i}\sim N(0,1) and ViϵiV_{i}\dashrightarrow\epsilon_{i}.

Lemma 6.6.

For p,q+p,q\in\mathbb{N}_{+}, if qq is odd,

[W1pW2q]=[W1pW2q]=0,\mathcal{E}[W_{1}^{p}W_{2}^{q}]=-\mathcal{E}[-W_{1}^{p}W_{2}^{q}]=0,

that is, it has certain zero mean.

Proof.

We work directly on the sublinear expectation by imposing the sequential independence. Let K(x)[xW2q];K(x)\coloneqq\mathcal{E}[xW_{2}^{q}]; then we have

K(x)=x+[W2q]+x[W2q]=0,K(x)=x^{+}\mathcal{E}[W_{2}^{q}]+x^{-}\mathcal{E}[-W_{2}^{q}]=0,

because we are essentially working on the odd-moment of σϵ\sigma\epsilon with σ[σ¯,σ¯]\sigma\in{[\underline{\sigma},\overline{\sigma}]}. Then we have

[W1pW2q]\displaystyle\mathcal{E}[W_{1}^{p}W_{2}^{q}] =[[w1pW2q]w1=W1]\displaystyle=\mathcal{E}[\mathcal{E}[w_{1}^{p}W_{2}^{q}]_{w_{1}=W_{1}}]
=[K(W1p)]=0.\displaystyle=\mathcal{E}[K(W_{1}^{p})]=0.

Similarly, we have [W1pW2q]=0-\mathcal{E}[-W_{1}^{p}W_{2}^{q}]=0. ∎
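
As a numerical sanity check of 6.6 (a hedged sketch; the exponents and σ-bounds below are arbitrary), evaluating ℰ[W1^p W2^q] by the same two-step iterated maximization returns a value that is numerically zero whenever q is odd:

import numpy as np

x, w = np.polynomial.hermite.hermgauss(60)
nodes, weights = np.sqrt(2.0) * x, w / np.sqrt(np.pi)   # E[f(Z)] = sum(weights * f(nodes))

sig_lo, sig_hi = 0.5, 1.5
sig_grid = np.linspace(sig_lo, sig_hi, 21)

def subexp_monomial(p, q):
    # E_hat[W_1^p W_2^q] under W_1 --> W_2, by the iterated maximization over sigma_2(w_1), sigma_1
    def phi1(x1):
        return max(np.sum(weights * (x1 ** p) * (s * nodes) ** q) for s in sig_grid)
    return max(np.sum(weights * np.array([phi1(s1 * z) for z in nodes])) for s1 in sig_grid)

print(subexp_monomial(2, 3))   # q odd: numerically zero
print(subexp_monomial(3, 5))   # q odd: numerically zero
print(subexp_monomial(2, 2))   # q even: strictly positive, shown for contrast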

Lemma 6.7.

For φ(x1,x2)=(x1+x2)n\varphi(x_{1},x_{2})=(x_{1}+x_{2})^{n} with even positive integer nn,

[φ(W1,W2)]=𝔼[φ(σ¯ϵ1,σ¯ϵ2)].\mathcal{E}[\varphi(W_{1},W_{2})]=\mathbb{E}[\varphi(\overline{\sigma}\epsilon_{1},\overline{\sigma}\epsilon_{2})].

Furthermore, we have the even moments of (W1+W2)(W_{1}+W_{2}):

[(W1+W2)n]=(n1)!!σ¯n2n/2.\mathcal{E}[(W_{1}+W_{2})^{n}]=(n-1)!!\overline{\sigma}^{n}2^{n/2}.
Proof.

This result follows directly from the convexity of φ\varphi (which can be verified by considering its Hessian matrix), so the maximizing scenario takes the constant value σ¯\overline{\sigma}. Then we can check that

[(W1+W2)n]=𝔼[σ¯n(ϵ1+ϵ2)n]=𝔼[σ¯n(2ϵ1)n]=(n1)!!σ¯n2n/2.\mathcal{E}[(W_{1}+W_{2})^{n}]=\mathbb{E}[\overline{\sigma}^{n}(\epsilon_{1}+\epsilon_{2})^{n}]=\mathbb{E}[\overline{\sigma}^{n}(\sqrt{2}\epsilon_{1})^{n}]=(n-1)!!\overline{\sigma}^{n}2^{n/2}.\qed
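
A quick Monte Carlo check of this even-moment formula (a sketch; the value of σ̄ and the even exponent n are arbitrary, and the double factorial is computed directly):

import numpy as np

rng = np.random.default_rng(0)
sig_hi, n = 1.5, 4                                        # sigma-bar and an even n (illustrative)
eps1, eps2 = rng.standard_normal((2, 10**6))

mc = np.mean((sig_hi * eps1 + sig_hi * eps2) ** n)        # E[(sig_hi*eps_1 + sig_hi*eps_2)^n]
closed = np.prod(np.arange(n - 1, 0, -2)) * sig_hi ** n * 2 ** (n / 2)   # (n-1)!! * sig_hi^n * 2^(n/2)
print(mc, closed)                                         # the two values should roughly agree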

Lemma 6.8.

For φ(x1,x2)=(x1+x2)n\varphi(x_{1},x_{2})=(x_{1}+x_{2})^{n} with odd positive integer nn,

[φ(W1,W2)]=𝔼[φ(σ1ϵ1,σ2ϵ2)],\mathcal{E}[\varphi(W_{1},W_{2})]=\mathbb{E}_{\mathbb{P}}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2})], (6.17)

where (σ1,σ2)(\sigma_{1},\sigma_{2}) satisfies:

σ1\displaystyle\sigma_{1} =σ¯,\displaystyle=\overline{\sigma},
σ2\displaystyle\sigma_{2} =σ2(σ1ϵ1)\displaystyle=\sigma_{2}(\sigma_{1}\epsilon_{1})
=σ¯(σ1ϵ1)+σ¯(σ1ϵ1).\displaystyle=\overline{\sigma}(\sigma_{1}\epsilon_{1})^{+}-\underline{\sigma}(\sigma_{1}\epsilon_{1})^{-}. (6.18)

Furthermore, for odd n3n\geq 3, we have the moments of (W1+W2)(W_{1}+W_{2}):

[(W1+W2)n]=2π(k=0(n3)/2Ckσ¯2k+12k1k!),\mathcal{E}[(W_{1}+W_{2})^{n}]=\sqrt{\frac{2}{\pi}}\bigl{(}\sum_{k=0}^{(n-3)/2}C_{k}\overline{\sigma}^{2k+1}2^{k-1}k!\bigr{)},

where Ck=(n2k+1)(n2k2)!!(σ¯n2k1σ¯n2k1).C_{k}=\binom{n}{2k+1}(n-2k-2)!!(\overline{\sigma}^{n-2k-1}-\underline{\sigma}^{n-2k-1}).

Proof.

We can directly check the sublinear expectation

[(W1+W2)n]\displaystyle\mathcal{E}[(W_{1}+W_{2})^{n}] =[i=0n(ni)W1iW2ni].\displaystyle=\mathcal{E}[\sum_{i=0}^{n}\binom{n}{i}W_{1}^{i}W_{2}^{n-i}].

Since the terms in the form of W1nW_{1}^{n}, W2nW_{2}^{n} or W1iW2niW_{1}^{i}W_{2}^{n-i} with even ii (then nin-i is odd) all have zero mean (with no ambiguity), they can be omitted in the computation by 6.6. Hence,

[(W1+W2)n]\displaystyle\mathcal{E}[(W_{1}+W_{2})^{n}] =[i odd(ni)W1iW2ni]\displaystyle=\mathcal{E}[\sum_{i\text{ odd}}\binom{n}{i}W_{1}^{i}W_{2}^{n-i}]
\displaystyle=\mathcal{E}\Bigl[\underbrace{\mathcal{E}\bigl[\sum_{i\text{ odd}}\binom{n}{i}w_{1}^{i}W_{2}^{n-i}\bigr]}_{\eqqcolon\varphi_{1}(w_{1})}\Big|_{w_{1}=W_{1}}\Bigr].

The inner part can be expressed as

φ1(w1)\displaystyle\varphi_{1}(w_{1}) =[i odd(ni)w1iW2ni]\displaystyle=\mathcal{E}[\sum_{i\text{ odd}}\binom{n}{i}w_{1}^{i}W_{2}^{n-i}]
=maxσ2[σ¯,σ¯]𝔼[i odd(ni)w1i(σ2ϵ2)ni]\displaystyle=\max_{\sigma_{2}\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[\sum_{i\text{ odd}}\binom{n}{i}w_{1}^{i}(\sigma_{2}\epsilon_{2})^{n-i}]
=maxσ2[σ¯,σ¯]i odd(ni)w1iσ2ni𝔼[ϵ2ni]\displaystyle=\max_{\sigma_{2}\in{[\underline{\sigma},\overline{\sigma}]}}\sum_{i\text{ odd}}\binom{n}{i}w_{1}^{i}\sigma_{2}^{n-i}\mathbb{E}[\epsilon_{2}^{n-i}]
\displaystyle=\max_{\sigma_{2}\in{[\underline{\sigma},\overline{\sigma}]}}\sum_{k=0}^{(n-3)/2}\binom{n}{2k+1}w_{1}^{2k+1}\sigma_{2}^{n-2k-1}(n-2k-2)!!\eqqcolon\max_{\sigma_{2}\in{[\underline{\sigma},\overline{\sigma}]}}H(\sigma_{2};w_{1}).

Notice that the monotonicity of H(σ2;w1)H(\sigma_{2};w_{1}) with respect to σ2\sigma_{2} depends on the sign of w1w_{1}. Hence

φ1(w1)\displaystyle\varphi_{1}(w_{1}) ={H(σ¯;w1) if w10H(σ¯;w1) if w1<0\displaystyle=\begin{cases}H(\overline{\sigma};w_{1})&\text{ if }w_{1}\geq 0\\ H(\underline{\sigma};w_{1})&\text{ if }w_{1}<0\end{cases}
=𝟙{w10}(H(σ¯;w1)H(σ¯;w1))+H(σ¯;w1)\displaystyle=\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{w_{1}\geq 0}}\right\}}(H(\overline{\sigma};w_{1})-H(\underline{\sigma};w_{1}))+H(\underline{\sigma};w_{1})

Then we have

[(W1+W2)n]\displaystyle\mathcal{E}[(W_{1}+W_{2})^{n}] =[φ1(W1)]\displaystyle=\mathcal{E}[\varphi_{1}(W_{1})]
=[𝟙{W10}(H(σ¯;W1)H(σ¯;W1))+H(σ¯;W1)].\displaystyle=\mathcal{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{W_{1}\geq 0}}\right\}}(H(\overline{\sigma};W_{1})-H(\underline{\sigma};W_{1}))+H(\underline{\sigma};W_{1})].

Here we have

\mathcal{E}[H(\underline{\sigma};W_{1})]=\mathcal{E}[\sum_{k=0}^{(n-3)/2}\binom{n}{2k+1}\underline{\sigma}^{n-2k-1}(n-2k-2)!!W_{1}^{2k+1}],

where each W12k+1W_{1}^{2k+1} has certain mean zero, so [H(σ¯;W1)]=[H(σ¯;W1)]=0\mathcal{E}[H(\underline{\sigma};W_{1})]=-\mathcal{E}[-H(\underline{\sigma};W_{1})]=0. Therefore,

[(W1+W2)n]\displaystyle\mathcal{E}[(W_{1}+W_{2})^{n}] =[𝟙{W10}(H(σ¯;W1)H(σ¯;W1))]\displaystyle=\mathcal{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{W_{1}\geq 0}}\right\}}(H(\overline{\sigma};W_{1})-H(\underline{\sigma};W_{1}))]
=[𝟙{W10}k=0(n3)/2CkW12k+1][K(W1)]\displaystyle=\mathcal{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{W_{1}\geq 0}}\right\}}\sum_{k=0}^{(n-3)/2}C_{k}W_{1}^{2k+1}]\eqqcolon\mathcal{E}[K(W_{1})]

with Ck=(n2k+1)(n2k2)!!(σ¯n2k1σ¯n2k1)0C_{k}=\binom{n}{2k+1}(n-2k-2)!!(\overline{\sigma}^{n-2k-1}-\underline{\sigma}^{n-2k-1})\geq 0. Since K(x)K(x) is a convex function (it vanishes on x<0x<0 and is a convex increasing polynomial with nonnegative coefficients on x0x\geq 0), we have

[(W1+W2)n]=𝔼[K(σ¯ϵ1)].\mathcal{E}[(W_{1}+W_{2})^{n}]=\mathbb{E}[K(\overline{\sigma}\epsilon_{1})].

Therefore, we obtain the optimal (σ1,σ2)(\sigma_{1},\sigma_{2}) in the form 6.18, which can be double-checked by plugging it back into the right-hand side of 6.17 to show the equality. We can further obtain the exact value of [(W1+W2)n]\mathcal{E}[(W_{1}+W_{2})^{n}] by continuing the procedure above:

[𝟙{W10}k=0(n3)/2CkW12k+1]\displaystyle\mathcal{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{W_{1}\geq 0}}\right\}}\sum_{k=0}^{(n-3)/2}C_{k}W_{1}^{2k+1}] =maxσ1[σ¯,σ¯]𝔼[𝟙{σ1ϵ10}k=0(n3)/2Ck(σ1ϵ1)2k+1]\displaystyle=\max_{\sigma_{1}\in{[\underline{\sigma},\overline{\sigma}]}}\text{$\mathbb{E}$}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\sigma_{1}\epsilon_{1}\geq 0}}\right\}}\sum_{k=0}^{(n-3)/2}C_{k}(\sigma_{1}\epsilon_{1})^{2k+1}]
=𝔼[𝟙{ϵ10}k=0(n3)/2Ck(σ¯ϵ1)2k+1]\displaystyle=\text{$\mathbb{E}$}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}\geq 0}}\right\}}\sum_{k=0}^{(n-3)/2}C_{k}(\overline{\sigma}\epsilon_{1})^{2k+1}]
=k=0(n3)/2Ckσ¯2k+1𝔼[𝟙{ϵ10}ϵ12k+1].\displaystyle=\sum_{k=0}^{(n-3)/2}C_{k}\overline{\sigma}^{2k+1}\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}\geq 0}}\right\}}\epsilon_{1}^{2k+1}].

Here we need to use the property of the classical half-normal distribution:

𝔼[|ϵ|2k+1]\displaystyle\mathbb{E}[|\epsilon|^{2k+1}] =𝔼[|ϵ2k+1|]\displaystyle=\mathbb{E}[|\epsilon^{2k+1}|]
=𝔼[𝟙{ϵ10}ϵ12k+1]+𝔼[𝟙{ϵ1<0}(ϵ1)2k+1].\displaystyle=\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}\geq 0}}\right\}}\epsilon_{1}^{2k+1}]+\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}<0}}\right\}}(-\epsilon_{1})^{2k+1}].

Since ϵ\epsilon and ϵ-\epsilon have the same distribution,

𝔼[𝟙{ϵ1<0}(ϵ1)2k+1]\displaystyle\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}<0}}\right\}}(-\epsilon_{1})^{2k+1}] =𝔼[𝟙{ϵ10}(ϵ1)2k+1]\displaystyle=\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}\leq 0}}\right\}}(-\epsilon_{1})^{2k+1}]
=𝔼[𝟙{ϵ10}(ϵ1)2k+1]\displaystyle=\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{-\epsilon_{1}\geq 0}}\right\}}(-\epsilon_{1})^{2k+1}]
=𝔼[𝟙{ϵ10}ϵ12k+1],\displaystyle=\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}\geq 0}}\right\}}\epsilon_{1}^{2k+1}],

then 𝔼[|ϵ|2k+1]=2𝔼[𝟙{ϵ10}ϵ12k+1].\mathbb{E}[|\epsilon|^{2k+1}]=2\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}\geq 0}}\right\}}\epsilon_{1}^{2k+1}]. Hence,

𝔼[𝟙{ϵ10}ϵ12k+1]=12𝔼[|ϵ|2k+1].\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}\geq 0}}\right\}}\epsilon_{1}^{2k+1}]=\frac{1}{2}\mathbb{E}[|\epsilon|^{2k+1}].

Also notice that, since ϵN(0,1)\epsilon\sim N(0,1), |ϵ||\epsilon| follows a half-normal distribution (equivalently, a χ1\chi_{1}-distribution) with raw moments:

𝔼[|ϵ|n]=2n/2Γ((n+1)/2)Γ(1/2),\mathbb{E}[|\epsilon|^{n}]=2^{n/2}\frac{\Gamma((n+1)/2)}{\Gamma(1/2)},

Then for n=2k+1n=2k+1 with kk\in\mathbb{N},

𝔼[|ϵ|2k+1]=2k2πk!.\mathbb{E}[\lvert\epsilon\rvert^{2k+1}]=2^{k}\sqrt{\frac{2}{\pi}}k!.

Therefore,

[(W1+W2)n]\displaystyle\mathcal{E}[(W_{1}+W_{2})^{n}] =𝔼[K(σ¯ϵ1)]\displaystyle=\mathbb{E}[K(\overline{\sigma}\epsilon_{1})]
=k=0(n3)/2Ckσ¯2k+112𝔼[|ϵ|2k+1]\displaystyle=\sum_{k=0}^{(n-3)/2}C_{k}\overline{\sigma}^{2k+1}\frac{1}{2}\mathbb{E}[|\epsilon|^{2k+1}]
=2π(k=0(n3)/2Ckσ¯2k+12k1k!).\displaystyle=\sqrt{\frac{2}{\pi}}\bigl{(}\sum_{k=0}^{(n-3)/2}C_{k}\overline{\sigma}^{2k+1}2^{k-1}k!\bigr{)}.\qed
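
The scenario 6.18 and the closed-form odd moment above can be cross-checked by simulation (a sketch; the σ-bounds and the odd exponent n are arbitrary choices): simulating (σ1 ε1 + σ2 ε2)^n with σ1 = σ̄ and σ2 switching between σ̄ and σ̲ according to the sign of σ1 ε1 should reproduce the displayed formula.

from math import comb, factorial, pi, sqrt
import numpy as np

rng = np.random.default_rng(1)
sig_lo, sig_hi, n = 0.5, 1.5, 5                           # n odd >= 3; bounds are illustrative
eps1, eps2 = rng.standard_normal((2, 10**6))

# the scenario 6.18: sigma_1 = sig_hi and sigma_2 determined by the sign of sigma_1 * eps_1
s2 = np.where(sig_hi * eps1 >= 0, sig_hi, sig_lo)
mc = np.mean((sig_hi * eps1 + s2 * eps2) ** n)

def dfact(m):
    # double factorial m!!, with the convention dfact(m) = 1 for m <= 0
    return int(np.prod(np.arange(m, 0, -2)))

closed = sqrt(2 / pi) * sum(
    comb(n, 2 * k + 1) * dfact(n - 2 * k - 2)
    * (sig_hi ** (n - 2 * k - 1) - sig_lo ** (n - 2 * k - 1))
    * sig_hi ** (2 * k + 1) * 2 ** (k - 1) * factorial(k)
    for k in range((n - 1) // 2)
)
print(mc, closed)                                         # the two values should roughly agree
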
Proof of 4.8.

The representation under semi-sequential independence can be directly checked based on 3.24 and 3.24.3. In what follows, we only consider the case of sequential independence W1W2W_{1}\dashrightarrow W_{2}, because W1FW2W_{1}\overset{\text{F}}{\dashrightarrow}W_{2} induces the same result by a logic similar to the proof of 3.24. The basic idea is to show that [φ(W1,W2)]\mathcal{E}[\varphi(W_{1},W_{2})] can be attained by the linear expectation on the right-hand side for some 𝝈20[σ¯,σ¯]\bm{\sigma}\in\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}. Then we have

[φ(W1,W2)]max𝝈20[σ¯,σ¯]𝔼[φ(σ1ϵ1,σ2ϵ2)].\mathcal{E}[\varphi(W_{1},W_{2})]\leq\max_{\bm{\sigma}\in\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}_{\mathbb{P}}[\varphi(\sigma_{1}\epsilon_{1},\sigma_{2}\epsilon_{2})].

The reverse direction of the inequality comes from the fact that 20[σ¯,σ¯]2[σ¯,σ¯]\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}\subset\mathcal{L}_{2}{[\underline{\sigma},\overline{\sigma}]} together with 3.24. The logic here is similar to the proof of 3.24 in the sequential-independence case: we only need to record the optimal choice of (σ1,σ2)(\sigma_{1},\sigma_{2}) when evaluating the sublinear expectation in an iterative way.

For instance, when φ(x1,x2)=(x1+x2)n\varphi(x_{1},x_{2})=(x_{1}+x_{2})^{n}, the sublinear expectation can be reached by some 𝝈20[σ¯,σ¯]\bm{\sigma}\in\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]} as illustrated in 6.7 and 6.8. For φ(x1,x2)=x1px2q\varphi(x_{1},x_{2})=x_{1}^{p}x_{2}^{q} with p,q+p,q\in\mathbb{N}_{+},

\mathcal{E}[W_{1}^{p}W_{2}^{q}]=\mathcal{E}\Bigl[\bigl(\max_{\sigma_{2}\in{[\underline{\sigma},\overline{\sigma}]}}w_{1}^{p}\mathbb{E}[(\sigma_{2}\epsilon_{2})^{q}]\bigr)_{w_{1}=W_{1}}\Bigr]. (6.19)

Meanwhile, for any (σ1,σ2)20[σ¯,σ¯](\sigma_{1},\sigma_{2})\in\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}, let Yi=σiϵi,i=1,2Y_{i}=\sigma_{i}\epsilon_{i},i=1,2 and 𝔼=𝔼\mathbb{E}=\mathbb{E}_{\mathbb{P}} denote the linear expectation. Note that σ1{σ¯,σ¯}\sigma_{1}\in\{\underline{\sigma},\overline{\sigma}\},

σ2=σ2(Y1)=(σ22σ21)𝟙{Y1>0}+σ21,\sigma_{2}=\sigma_{2}(Y_{1})=(\sigma_{22}-\sigma_{21})\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{Y_{1}>0}}\right\}}+\sigma_{21},

with (σ21,σ22){σ¯,σ¯}2(\sigma_{21},\sigma_{22})\in\{\underline{\sigma},\overline{\sigma}\}^{2}. We also have Y1ϵ2Y_{1}\mathrel{\mathchoice{\hbox to0.0pt{$\displaystyle\perp$\hss}{\displaystyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\textstyle\perp$\hss}{\textstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptstyle\perp$\hss}{\scriptstyle\mkern 2.0mu\perp}}{\hbox to0.0pt{$\scriptscriptstyle\perp$\hss}{\scriptscriptstyle\mkern 2.0mu\perp}}}\epsilon_{2} due to the setup which is the same as Section 3.7. Then

𝔼[Y1pY2q]\displaystyle\mathbb{E}[Y_{1}^{p}Y_{2}^{q}] =𝔼[Y1p𝔼[Y2q|Y1]]\displaystyle=\mathbb{E}\bigl{[}Y_{1}^{p}\mathbb{E}[Y_{2}^{q}|Y_{1}]\bigr{]}
=𝔼[Y1p𝔼[(σ2(Y1))qϵ2q|Y1]]\displaystyle=\mathbb{E}\bigl{[}Y_{1}^{p}\mathbb{E}[(\sigma_{2}(Y_{1}))^{q}\epsilon_{2}^{q}|Y_{1}]\bigr{]}
=𝔼[(σ1ϵ1)p(σ2(Y1))q]𝔼[ϵ2q]\displaystyle=\mathbb{E}[(\sigma_{1}\epsilon_{1})^{p}(\sigma_{2}(Y_{1}))^{q}]\mathbb{E}[\epsilon_{2}^{q}]
\displaystyle=\sigma_{1}^{p}\mathbb{E}[((\sigma_{22}-\sigma_{21})\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{Y_{1}>0}}\right\}}+\sigma_{21})^{q}\epsilon_{1}^{p}]\mathbb{E}[\epsilon_{2}^{q}]
\displaystyle=\sigma_{1}^{p}\mathbb{E}[((\sigma_{22}^{q}-\sigma_{21}^{q})\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}>0}}\right\}}+\sigma_{21}^{q})\epsilon_{1}^{p}]\mathbb{E}[\epsilon_{2}^{q}].

Hence,

𝔼[Y1pY2q]=σ1p((σ22qσ21q)𝔼[𝟙{ϵ1>0}ϵ1p]+σ21q𝔼[ϵ1p])𝔼[ϵ2q].\mathbb{E}[Y_{1}^{p}Y_{2}^{q}]=\sigma_{1}^{p}\bigl{(}(\sigma_{22}^{q}-\sigma_{21}^{q})\mathbb{E}[\mathds{1}_{\mathopen{}\mathclose{{}\left\{\cramped{\epsilon_{1}>0}}\right\}}\epsilon_{1}^{p}]+\sigma_{21}^{q}\mathbb{E}[\epsilon_{1}^{p}]\bigr{)}\mathbb{E}[\epsilon_{2}^{q}]. (6.20)

Then we divide our discussion into three cases: a) qq is odd, b) qq is even and pp is even, c) qq is even and pp is odd. When qq is odd, the expectation in 6.19 is equal to 0 by 6.6, and by 6.20 it can obviously be attained by the linear expectation on the right-hand side with any choice of 𝝈20[σ¯,σ¯]\bm{\sigma}\in\mathcal{L}_{2}^{0}{[\underline{\sigma},\overline{\sigma}]}. When qq is even, the choice of σ2\sigma_{2} depends on the sign of w1pw_{1}^{p}, which further depends on the sign of w1w_{1} if pp is odd (otherwise it is always non-negative). To be specific, when both qq and pp are even,

\displaystyle\mathcal{E}[W_{1}^{p}W_{2}^{q}] =\mathbb{E}[\epsilon_{2}^{q}]\overline{\sigma}^{q}\mathcal{E}[W_{1}^{p}]=\overline{\sigma}^{p}\overline{\sigma}^{q}\mathbb{E}[\epsilon_{1}^{p}]\mathbb{E}[\epsilon_{2}^{q}],

which can be reached by choosing σ1=σ2=σ¯\sigma_{1}=\sigma_{2}=\overline{\sigma}, namely, (σ21,σ22)=(σ¯,σ¯)(\sigma_{21},\sigma_{22})=(\overline{\sigma},\overline{\sigma}). When qq is even and pp is odd, we have

\displaystyle\mathcal{E}[W_{1}^{p}W_{2}^{q}] =\mathbb{E}[\epsilon_{2}^{q}]\mathcal{E}[\overline{\sigma}^{q}(W_{1}^{p})^{+}-\underline{\sigma}^{q}(W_{1}^{p})^{-}]
\displaystyle=\mathbb{E}[\epsilon_{2}^{q}]\mathcal{E}[(\overline{\sigma}^{q}-\underline{\sigma}^{q})(W_{1}^{p})^{+}+\underline{\sigma}^{q}W_{1}^{p}]
\displaystyle=\mathbb{E}[\epsilon_{2}^{q}](\overline{\sigma}^{q}-\underline{\sigma}^{q})\mathcal{E}[(W_{1}^{p})^{+}],

where the last step uses the fact that W₁^p has certain mean zero (p is odd) and σ̄^q − σ̲^q ≥ 0. Since the map x ↦ (x^p)⁺ is convex for odd p, by 3.12, we have

\mathcal{E}[W_{1}^{p}W_{2}^{q}]=\mathbb{E}[\epsilon_{2}^{q}](\overline{\sigma}^{q}-\underline{\sigma}^{q})\mathbb{E}[((\overline{\sigma}\epsilon_{1})^{p})^{+}]=\mathbb{E}[\epsilon_{2}^{q}](\overline{\sigma}^{q}-\underline{\sigma}^{q})\overline{\sigma}^{p}\mathbb{E}[(\epsilon_{1}^{p})^{+}].

It can be reached in 6.20 by choosing σ1=σ¯\sigma_{1}=\overline{\sigma} and (σ21,σ22)=(σ¯,σ¯)(\sigma_{21},\sigma_{22})=(\underline{\sigma},\overline{\sigma}). Similar logic applies to φ(x1,x2)=cx1px2q\varphi(x_{1},x_{2})=cx_{1}^{p}x_{2}^{q} and to φ(x1,x2)=(ax1+bx2)n\varphi(x_{1},x_{2})=(ax_{1}+bx_{2})^{n}, where the scaling does not affect the form of the optimal choice of 𝝈\bm{\sigma}. ∎
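
As a numerical cross-check of this argument (a sketch; the exponents, σ-bounds, grid sizes and quadrature order are arbitrary), one can compare the iterated evaluation of ℰ[W1^p W2^q] with the maximum of the classical expectation over the piecewise-constant scenarios in L2^0[σ̲, σ̄]; the two values should coincide:

import itertools
import numpy as np

x, w = np.polynomial.hermite.hermgauss(80)
nodes, weights = np.sqrt(2.0) * x, w / np.sqrt(np.pi)     # E[f(Z)] = sum(weights * f(nodes))

sig_lo, sig_hi = 0.5, 1.5
sig_grid = np.linspace(sig_lo, sig_hi, 41)
p, q = 3, 2                                                # p odd, q even (illustrative)
Eq = np.sum(weights * nodes ** q)                          # E[eps_2^q]

# left-hand side: iterated maximization over sigma_2(w_1) and then sigma_1 in [sig_lo, sig_hi]
def phi1(x1):
    return max((x1 ** p) * (s ** q) * Eq for s in sig_grid)
lhs = max(np.sum(weights * np.array([phi1(s1 * z) for z in nodes])) for s1 in sig_grid)

# right-hand side: maximum over the piecewise-constant scenarios (sigma_1, sigma_2(Y_1)) in L_2^0
rhs = max(
    np.sum(weights * (s1 * nodes) ** p * np.where(nodes > 0, s22, s21) ** q) * Eq
    for s1, s21, s22 in itertools.product([sig_lo, sig_hi], repeat=3)
)
print(lhs, rhs)                                            # the two values should agree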

6.6 Proofs in Section 5.2

To prepare for the proof, we consider the following function space:

  • Ck()C^{k}(\mathbb{R}): the space of kk-times continuously differentiable functions on \mathbb{R}

  • Cb()C_{b}(\mathbb{R}): the space of bounded and continuous functions on \mathbb{R},

  • C()={φC2():φ is bounded and uniformly continuous}C^{*}(\mathbb{R})=\{\varphi\in C^{2}(\mathbb{R}):\varphi^{\prime\prime}\text{ is bounded and uniformly continuous}\}.

For any φC()\varphi\in C^{*}(\mathbb{R}), since φ\varphi^{\prime\prime} is bounded, we have

Msupx|φ(x)|<.M\coloneqq\sup_{x\in\mathbb{R}}\lvert\varphi^{\prime\prime}(x)\rvert<\infty.

The following 6.9 shows that we only need to check those φC()\varphi\in C^{*}(\mathbb{R}) to prove 5.9.

Lemma 6.9.

Assume [|Zn|]<\mathcal{E}[\lvert Z_{n}\rvert]<\infty and [|Z|]<\mathcal{E}[\lvert Z\rvert]<\infty. Suppose the convergence

[φ(Zn)][φ(Z)],\mathcal{E}[\varphi(Z_{n})]\to\mathcal{E}[\varphi(Z)], (6.21)

holds for any φC()\varphi\in C^{*}(\mathbb{R}). Then it also holds for any φCb()\varphi\in C_{b}(\mathbb{R}).

Proof of 6.9.

We first consider φCb()\varphi\in C_{b}(\mathbb{R}) with a compact support SS. By the uniform approximation provided by Pursell, (1967), for any a>0a>0, there exists φaC3()\varphi_{a}\in C^{3}(\mathbb{R}) with support SS such that

supx|φ(x)φa(x)|<a2.\sup_{x\in\mathbb{R}}\lvert\varphi(x)-\varphi_{a}(x)\rvert<\frac{a}{2}.

For k=1,2,3k=1,2,3, since φa(k)\varphi_{a}^{(k)} is continuous with compact support, it must be bounded, say by MkM_{k}. By the mean-value theorem, for δ>0\delta>0 and some β[0,1]\beta\in[0,1], we have

\lvert\varphi_{a}^{(2)}(x)-\varphi_{a}^{(2)}(x+\delta)\rvert\leq\lvert\varphi_{a}^{(3)}(x+\beta\delta)\rvert\delta\leq M_{3}\delta.

Thus φa(2)\varphi_{a}^{(2)} is uniformly continuous and bounded, implying φaC()\varphi_{a}\in C^{*}(\mathbb{R}). In this way, we have

|[φ(Zn)][φ(Z)]|\displaystyle\lvert\mathcal{E}[\varphi(Z_{n})]-\mathcal{E}[\varphi(Z)]\rvert |[φ(Zn)][φa(Zn)]|+|[φ(Z)][φa(Z)]|\displaystyle\leq\lvert\mathcal{E}[\varphi(Z_{n})]-\mathcal{E}[\varphi_{a}(Z_{n})]\rvert+\lvert\mathcal{E}[\varphi(Z)]-\mathcal{E}[\varphi_{a}(Z)]\rvert
+|[φa(Zn)][φa(Z)]|\displaystyle+\lvert\mathcal{E}[\varphi_{a}(Z_{n})]-\mathcal{E}[\varphi_{a}(Z)]\rvert
a+|[φa(Zn)][φa(Z)]|.\displaystyle\leq a+\lvert\mathcal{E}[\varphi_{a}(Z_{n})]-\mathcal{E}[\varphi_{a}(Z)]\rvert.

Hence, lim supn|[φ(Zn)][φ(Z)]|a\limsup_{n\to\infty}\lvert\mathcal{E}[\varphi(Z_{n})]-\mathcal{E}[\varphi(Z)]\rvert\leq a. It means that

0lim infn|[φ(Zn)][φ(Z)]|lim supn|[φ(Zn)][φ(Z)]|a.0\leq\liminf_{n\to\infty}\lvert\mathcal{E}[\varphi(Z_{n})]-\mathcal{E}[\varphi(Z)]\rvert\leq\limsup_{n\to\infty}\lvert\mathcal{E}[\varphi(Z_{n})]-\mathcal{E}[\varphi(Z)]\rvert\leq a.

Since aa can be arbitrarily small, the convergence 6.21 holds.

Next consider any φCb()\varphi\in C_{b}(\mathbb{R}) which is bounded by BB. For any K>0K>0, it can be decomposed into φ=φ1+φ2\varphi=\varphi_{1}+\varphi_{2} where φ1\varphi_{1} has compact support [K,K][-K,K] and φ2\varphi_{2} satisfies φ2(x)=0\varphi_{2}(x)=0 if |x|K\lvert x\rvert\leq K and for |x|>K\lvert x\rvert>K,

|φ2(x)|BB|x|K.\lvert\varphi_{2}(x)\rvert\leq B\leq\frac{B\lvert x\rvert}{K}.

Then we have

|[φ(Zn)][φ(Z)]||[φ1(Zn)][φ1(Z)]|+|[φ2(Zn)][φ2(Z)]|,\lvert\mathcal{E}[\varphi(Z_{n})]-\mathcal{E}[\varphi(Z)]\rvert\leq\lvert\mathcal{E}[\varphi_{1}(Z_{n})]-\mathcal{E}[\varphi_{1}(Z)]\rvert+\lvert\mathcal{E}[\varphi_{2}(Z_{n})]-\mathcal{E}[\varphi_{2}(Z)]\rvert,

where the first term must converge by our previous argument. Then we only need to work on the second term that satisfies:

|[φ2(Zn)][φ2(Z)]|BK([|Zn|]+[|Z|]).\lvert\mathcal{E}[\varphi_{2}(Z_{n})]-\mathcal{E}[\varphi_{2}(Z)]\rvert\leq\frac{B}{K}(\mathcal{E}[\lvert Z_{n}\rvert]+\mathcal{E}[\lvert Z\rvert]).

Note that L[|Zn|]+[|Z|]<L\coloneqq\mathcal{E}[\lvert Z_{n}\rvert]+\mathcal{E}[\lvert Z\rvert]<\infty. Then we have lim supn|[φ(Zn)][φ(Z)]|BLK\limsup_{n\to\infty}\lvert\mathcal{E}[\varphi(Z_{n})]-\mathcal{E}[\varphi(Z)]\rvert\leq\frac{BL}{K}. Since KK can be arbitrarily large, we obtain the convergence 6.21. ∎

Lemma 6.10.

For any φC()\varphi\in C^{*}(\mathbb{R}), the function δ:++\delta:\mathbb{R}_{+}\to\mathbb{R}_{+}, defined as

δ(a)sup|xy|a|φ(x)φ(y)|,\delta(a)\coloneqq\sup_{\lvert x-y\rvert\leq a}\lvert\varphi^{\prime\prime}(x)-\varphi^{\prime\prime}(y)\rvert,

must be a bounded and increasing one. It also satisfies lima0δ(a)=0\lim_{a\downarrow 0}\delta(a)=0.

Proof of 6.10.

The boundedness (and the limit property) can be directly derived from the boundedness (and uniform continuity) of φ\varphi^{\prime\prime}. For the monotonicity, for any 0<ab0<a\leq b, since {(x,y):|xy|a}{(x,y):|xy|b}\{(x,y):\lvert x-y\rvert\leq a\}\subset\{(x,y):\lvert x-y\rvert\leq b\}, we must have δ(a)δ(b)\delta(a)\leq\delta(b). ∎

Proof of 5.9.

We adapt the idea of the Lindeberg method in a “leave-one-out” manner to the sublinear context. One of the reasons that we are able to do such an adaptation is the symmetry in semi-GG-independence: XiX_{i} is semi-GG-independent from {Xj,ji}\{X_{j},j\neq i\}.

Note that Xi=ViηiX_{i}=V_{i}\eta_{i} with the semi-GG-independence; then we have

(V1,,Vn)(η1,,ηn).(V_{1},\dots,V_{n})\dashrightarrow(\eta_{1},\dotsc,\eta_{n}).

Then we consider a sequence of classically i.i.d. {ϵi}i=1n\{\epsilon_{i}\}_{i=1}^{n} satisfying ϵ1N(0,1)\epsilon_{1}\sim N(0,1) and

(V1,,Vn)(η1,,ηn)(ϵ1,,ϵn).(V_{1},\dots,V_{n})\dashrightarrow(\eta_{1},\dotsc,\eta_{n})\dashrightarrow(\epsilon_{1},\dotsc,\epsilon_{n}).

For each nn, consider the triangular array

ei,n=Xin,e_{i,n}=\frac{X_{i}}{\sqrt{n}},

and

Sne1,n++en,n.S_{n}\coloneqq e_{1,n}+\cdots+e_{n,n}.

For this nn, consider another triangular array {Wi,n}i=1n{(Viϵi)/n}i=1n\{W_{i,n}\}_{i=1}^{n}\coloneqq\{(V_{i}\epsilon_{i})/\sqrt{n}\}_{i=1}^{n} whose entries are semi-GG-version i.i.d., follow the semi-GG-normal distribution, and satisfy

Wi,n=dW1,n=dWn.W_{i,n}\overset{\text{d}}{=}W_{1,n}\overset{\text{d}}{=}\frac{W}{\sqrt{n}}.

Note that here we use the same ViV_{i} sequence in Wi,nW_{i,n}. This setup is important for our proof to overcome the difficulty brought by the sublinear property of \mathcal{E} (it also gives some insight into the role of σ2\sigma^{2} in the classical central limit theorem compared with V2V^{2} in the sublinear context). Let

WnW1,n++Wn,n,W_{n}\coloneqq W_{1,n}+\cdots+W_{n,n},

then we must have Wn𝒩^(0,[σ¯2,σ¯2])W_{n}\sim\hat{\mathcal{N}}(0,[\underline{\sigma}^{2},\overline{\sigma}^{2}]) (by the stability of the semi-GG-normal as shown in 3.23).

Our goal is to show the difference, for any φC()\varphi\in C^{*}(\mathbb{R}) (recall 6.9), as nn\to\infty,

|[φ(Sn)][φ(W)]|=|[φ(Sn)][φ(Wn)]|0.\lvert\mathcal{E}[\varphi(S_{n})]-\mathcal{E}[\varphi(W)]\rvert=\lvert\mathcal{E}[\varphi(S_{n})]-\mathcal{E}[\varphi(W_{n})]\rvert\to 0. (6.22)

Consider the following summations:

Mi,n=j=1iej,n+j=i+1nWj,n,M_{i,n}=\sum_{j=1}^{i}e_{j,n}+\sum_{j=i+1}^{n}W_{j,n}, (6.23)

and

Ui,n=j=1i1ej,n+j=i+1nWj,n,U_{i,n}=\sum_{j=1}^{i-1}e_{j,n}+\sum_{j=i+1}^{n}W_{j,n}, (6.24)

with the common convention that an empty sum is defined as zero. Note that M0,n=WnM_{0,n}=W_{n} and Mn,n=SnM_{n,n}=S_{n}, then we can transform the difference in 6.22 to the telescoping sum

[φ(Sn)][φ(Wn)]\displaystyle\mathcal{E}[\varphi(S_{n})]-\mathcal{E}[\varphi(W_{n})] [φ(Sn)φ(Wn)]\displaystyle\leq\mathcal{E}[\varphi(S_{n})-\varphi(W_{n})]
\displaystyle=\mathcal{E}\Bigl[\sum_{i=1}^{n}(\varphi(M_{i,n})-\varphi(M_{i-1,n}))\Bigr]
i=1n[φ(Mi,n)φ(Mi1,n)].\displaystyle\leq\sum_{i=1}^{n}\mathcal{E}[\varphi(M_{i,n})-\varphi(M_{i-1,n})]. (6.25)

and

[φ(Wn)][φ(Sn)]j=1n[φ(Mnj,n)φ(Mnj+1,n)].\mathcal{E}[\varphi(W_{n})]-\mathcal{E}[\varphi(S_{n})]\leq\sum_{j=1}^{n}\mathcal{E}[\varphi(M_{n-j,n})-\varphi(M_{n-j+1,n})]. (6.26)

Then we only need to work on the summand [φ(Mi,n)φ(Mi1,n)]\mathcal{E}[\varphi(M_{i,n})-\varphi(M_{i-1,n})]. By a Taylor expansion,

φ(Mi,n)φ(Mi1,n)\displaystyle\varphi(M_{i,n})-\varphi(M_{i-1,n}) =φ(Ui,n+ei,n)φ(Ui,n+Wi,n)\displaystyle=\varphi(U_{i,n}+e_{i,n})-\varphi(U_{i,n}+W_{i,n})
=(ei,nWi,n)φ(Ui,n)\displaystyle=(e_{i,n}-W_{i,n})\varphi^{\prime}(U_{i,n})
+[12ei,n2φ(Ui,n+αei,n)12Wi,n2φ(Ui,n+βWi,n)],\displaystyle+[\frac{1}{2}e_{i,n}^{2}\varphi^{\prime\prime}(U_{i,n}+\alpha e_{i,n})-\frac{1}{2}W_{i,n}^{2}\varphi^{\prime\prime}(U_{i,n}+\beta W_{i,n})],
(a)+(b)\displaystyle\eqqcolon(a)+(b)

for some α,β[0,1]\alpha,\beta\in[0,1].

For the first term (a)(a), its sublinear expectation must exist because the growth of φ\varphi^{\prime} is at most linear due to the boundedness of φ\varphi^{\prime\prime}. Note that Ui,nU_{i,n} is the inner product of

Vu=(V1,,Vi1,Vi+1,,Vn),V_{u}=(V_{1},\dotsc,V_{i-1},V_{i+1},\dotsc,V_{n}),

and

ξu=(η1,,ηi1,ϵi+1,,ϵn),\xi_{u}=(\eta_{1},\dotsc,\eta_{i-1},\epsilon_{i+1},\dotsc,\epsilon_{n}),

with the independence VuξuV_{u}\dashrightarrow\xi_{u}, so we have ei,nWi,n(=n1/2Vi(ηiϵi))e_{i,n}-W_{i,n}(=n^{-1/2}V_{i}(\eta_{i}-\epsilon_{i})) and Ui,nU_{i,n} are semi-GG-independent. Then we can compute

[(ei,nWi,n)φ(Ui,n)]\displaystyle\mathcal{E}[(e_{i,n}-W_{i,n})\varphi^{\prime}(U_{i,n})] =max(vi,vu)𝔼[n1/2vi(ηiϵi)φ(vuTξu)]\displaystyle=\max_{(v_{i},v_{u})}\mathbb{E}[n^{-1/2}v_{i}(\eta_{i}-\epsilon_{i})\varphi^{\prime}(v_{u}^{T}\xi_{u})]
(classical indep.)\displaystyle(\text{classical indep.}) =max(vi,vu)n1/2vi𝔼[ηiϵi]=0𝔼[φ(vuTξu)]=0.\displaystyle=\max_{(v_{i},v_{u})}n^{-1/2}v_{i}\underbrace{\mathbb{E}[\eta_{i}-\epsilon_{i}]}_{=0}\mathbb{E}[\varphi^{\prime}(v_{u}^{T}\xi_{u})]=0.

Similarly, we have [(ei,nWi,n)φ(Ui,n)]=0-\mathcal{E}[-(e_{i,n}-W_{i,n})\varphi^{\prime}(U_{i,n})]=0. Hence, (a)(a) has certain mean zero. Then we have

[φ(Mi,n)φ(Mi1,n)]=[(b)].\mathcal{E}[\varphi(M_{i,n})-\varphi(M_{i-1,n})]=\mathcal{E}[(b)].

For the second term (b)(b), note that

2×(b)\displaystyle 2\times(b)
=\displaystyle= ei,n2[φ(Ui,n+αei,n)φ(Ui,n)]Wi,n2[φ(Ui,n+βWi,n)φ(Ui,n)]+(ei,n2Wi,n2)φ(Ui,n)\displaystyle e_{i,n}^{2}[\varphi^{\prime\prime}(U_{i,n}+\alpha e_{i,n})-\varphi^{\prime\prime}(U_{i,n})]-W_{i,n}^{2}[\varphi^{\prime\prime}(U_{i,n}+\beta W_{i,n})-\varphi^{\prime\prime}(U_{i,n})]+(e_{i,n}^{2}-W_{i,n}^{2})\varphi^{\prime\prime}(U_{i,n})
\displaystyle\eqqcolon (b)1(b)2+(b)3.\displaystyle(b)_{1}-(b)_{2}+(b)_{3}.

For (b)1(b)_{1}, since |αei,n||ei,n|\lvert\alpha e_{i,n}\rvert\leq\lvert e_{i,n}\rvert, by recalling the property of δ()\delta(\cdot) (6.10), we have

[|(b)1|][ei,n2δ(|ei,n|)]=1n[X12δ(n1/2|X1|)],\mathcal{E}[\lvert(b)_{1}\rvert]\leq\mathcal{E}[e_{i,n}^{2}\delta(\lvert e_{i,n}\rvert)]=\frac{1}{n}\mathcal{E}[X_{1}^{2}\delta(n^{-1/2}\lvert X_{1}\rvert)],

where we use the setup ei,n=Xine_{i,n}=\frac{X_{i}}{\sqrt{n}} and Xi=dX1X_{i}\overset{\text{d}}{=}X_{1}. Similarly, we have

[|(b)2|][Wi,n2δ(|Wi,n|)]=1n[W2δ(n1/2|W|)],\mathcal{E}[\lvert(b)_{2}\rvert]\leq\mathcal{E}[W_{i,n}^{2}\delta(\lvert W_{i,n}\rvert)]=\frac{1}{n}\mathcal{E}[W^{2}\delta(n^{-1/2}\lvert W\rvert)],

where we use the setup Wi,n=dWnW_{i,n}\overset{\text{d}}{=}\frac{W}{\sqrt{n}}. For (b)3(b)_{3}, since (ei,n,Wi,n)(e_{i,n},W_{i,n}) and Ui,nU_{i,n} are semi-GG-independent, (noting that ei,ne_{i,n} and Wi,nW_{i,n} depend on the same ViV_{i},) we have

[(b)3]\displaystyle\mathcal{E}[(b)_{3}] =max(vi,vu)𝔼[n1vi2(ηi2ϵi2)φ(vuTξu)]\displaystyle=\max_{(v_{i},v_{u})}\mathbb{E}[n^{-1}v_{i}^{2}(\eta_{i}^{2}-\epsilon_{i}^{2})\varphi^{\prime\prime}(v_{u}^{T}\xi_{u})]
(classical indep.)\displaystyle(\text{classical indep.}) =max(vi,vu)n1vi2𝔼[ηi2ϵi2]=0𝔼[φ(vuTξu)]=0,\displaystyle=\max_{(v_{i},v_{u})}n^{-1}v_{i}^{2}\underbrace{\mathbb{E}[\eta_{i}^{2}-\epsilon_{i}^{2}]}_{=0}\mathbb{E}[\varphi^{\prime\prime}(v_{u}^{T}\xi_{u})]=0,

where we use the fact that 𝔼[ηi2]=𝔼[ϵi2]=1\mathbb{E}[\eta_{i}^{2}]=\mathbb{E}[\epsilon_{i}^{2}]=1. Similarly we have [(b)3]=0-\mathcal{E}[-(b)_{3}]=0 so (b)3(b)_{3} has certain mean zero. Therefore, we have

[φ(Mi,n)φ(Mi1,n)]\displaystyle\mathcal{E}[\varphi(M_{i,n})-\varphi(M_{i-1,n})] =12[(b)1(b)2]\displaystyle=\frac{1}{2}\mathcal{E}[(b)_{1}-(b)_{2}]
\displaystyle\leq\frac{1}{2}(\mathcal{E}[\lvert(b)_{1}\rvert]+\mathcal{E}[\lvert(b)_{2}\rvert])
=12n([X12δ(n1/2|X1|)]+[W2δ(n1/2|W|)]).\displaystyle=\frac{1}{2n}(\mathcal{E}[X_{1}^{2}\delta(n^{-1/2}\lvert X_{1}\rvert)]+\mathcal{E}[W^{2}\delta(n^{-1/2}\lvert W\rvert)]).

Meanwhile, if we reverse the role of φ(Mi,n)\varphi(M_{i,n}) and φ(Mi1,n)\varphi(M_{i-1,n}) and let i=nj+1i=n-j+1 with j=1,2,,nj=1,2,\dotsc,n, we get

[φ(Mnj,n)φ(Mnj+1,n)]\displaystyle\mathcal{E}[\varphi(M_{n-j,n})-\varphi(M_{n-j+1,n})] =[φ(Mi1,n)φ(Mi,n)]\displaystyle=\mathcal{E}[\varphi(M_{i-1,n})-\varphi(M_{i,n})]
=12[(b)2(b)1]\displaystyle=\frac{1}{2}\mathcal{E}[(b)_{2}-(b)_{1}]
\displaystyle\leq\frac{1}{2}(\mathcal{E}[\lvert(b)_{2}\rvert]+\mathcal{E}[\lvert(b)_{1}\rvert])
=12n([X12δ(n1/2|X1|)]+[W2δ(n1/2|W|)]).\displaystyle=\frac{1}{2n}(\mathcal{E}[X_{1}^{2}\delta(n^{-1/2}\lvert X_{1}\rvert)]+\mathcal{E}[W^{2}\delta(n^{-1/2}\lvert W\rvert)]).

Hence, by 6.25 and 6.26, we have

|[φ(Sn)][φ(W)]|\displaystyle\lvert\mathcal{E}[\varphi(S_{n})]-\mathcal{E}[\varphi(W)]\rvert =|[φ(Sn)][φ(Wn)]|\displaystyle=\lvert\mathcal{E}[\varphi(S_{n})]-\mathcal{E}[\varphi(W_{n})]\rvert
=max{[φ(Sn)][φ(Wn)],[φ(Wn)][φ(Sn)]}\displaystyle=\max\{\mathcal{E}[\varphi(S_{n})]-\mathcal{E}[\varphi(W_{n})],\mathcal{E}[\varphi(W_{n})]-\mathcal{E}[\varphi(S_{n})]\}
\displaystyle\leq\max\Bigl\{\sum_{i=1}^{n}\mathcal{E}[\varphi(M_{i,n})-\varphi(M_{i-1,n})],\sum_{j=1}^{n}\mathcal{E}[\varphi(M_{n-j,n})-\varphi(M_{n-j+1,n})]\Bigr\}
i=1n12n([X12δ(n1/2|X1|)]+[W2δ(n1/2|W|)])\displaystyle\leq\sum_{i=1}^{n}\frac{1}{2n}(\mathcal{E}[X_{1}^{2}\delta(n^{-1/2}\lvert X_{1}\rvert)]+\mathcal{E}[W^{2}\delta(n^{-1/2}\lvert W\rvert)])
=12([X12δ(n1/2|X1|)]+[W2δ(n1/2|W|)]).\displaystyle=\frac{1}{2}(\mathcal{E}[X_{1}^{2}\delta(n^{-1/2}\lvert X_{1}\rvert)]+\mathcal{E}[W^{2}\delta(n^{-1/2}\lvert W\rvert)]).

Note that, for any v1[σ¯,σ¯]v_{1}\in{[\underline{\sigma},\overline{\sigma}]}, we have |v1η1|σ¯|η1|\lvert v_{1}\eta_{1}\rvert\leq\overline{\sigma}\lvert\eta_{1}\rvert. Then

[X12δ(n1/2|X1|)]\displaystyle\mathcal{E}[X_{1}^{2}\delta(n^{-1/2}\lvert X_{1}\rvert)] =maxv1[σ¯,σ¯]𝔼[v12η12δ(n1/2|v1η1|)]\displaystyle=\max_{v_{1}\in{[\underline{\sigma},\overline{\sigma}]}}\mathbb{E}[v_{1}^{2}\eta_{1}^{2}\delta(n^{-1/2}\lvert v_{1}\eta_{1}\rvert)]
(monotonicity of δ)\displaystyle(\text{monotonicity of }\delta) maxv1[σ¯,σ¯]v12𝔼[η12δ(n1/2σ¯|η1|)]\displaystyle\leq\max_{v_{1}\in{[\underline{\sigma},\overline{\sigma}]}}v_{1}^{2}\mathbb{E}[\eta_{1}^{2}\delta(n^{-1/2}\overline{\sigma}\lvert\eta_{1}\rvert)]
=σ¯2𝔼[η12δ(n1/2σ¯|η1|)].\displaystyle=\overline{\sigma}^{2}\mathbb{E}[\eta_{1}^{2}\delta(n^{-1/2}\overline{\sigma}\lvert\eta_{1}\rvert)].

By 6.10, we have δ(a)2M\delta(a)\leq 2M for all a+a\in\mathbb{R}^{+}, so η12δ(n1/2σ¯|η1|)2Mη12\eta_{1}^{2}\delta(n^{-1/2}\overline{\sigma}\lvert\eta_{1}\rvert)\leq 2M\eta_{1}^{2}. Meanwhile, η12δ(n1/2σ¯|η1|)0\eta_{1}^{2}\delta(n^{-1/2}\overline{\sigma}\lvert\eta_{1}\rvert)\to 0 (classically) almost surely as nn\to\infty, so by the classical dominated convergence theorem, we have 𝔼[η12δ(n1/2σ¯|η1|)]0\mathbb{E}[\eta_{1}^{2}\delta(n^{-1/2}\overline{\sigma}\lvert\eta_{1}\rvert)]\to 0, implying [X12δ(n1/2|X1|)]0\mathcal{E}[X_{1}^{2}\delta(n^{-1/2}\lvert X_{1}\rvert)]\to 0. Similarly, we can show [W2δ(n1/2|W|)]0\mathcal{E}[W^{2}\delta(n^{-1/2}\lvert W\rvert)]\to 0. Finally, we have

|[φ(Sn)][φ(W)]|0,\lvert\mathcal{E}[\varphi(S_{n})]-\mathcal{E}[\varphi(W)]\rvert\to 0,

or

limn[φ(1ni=1nXi)]=[φ(W)].\lim_{n\to\infty}\mathcal{E}[\varphi(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i})]=\mathcal{E}[\varphi(W)].\qed
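
To close this section, the statement of 5.9 can be illustrated by a rough Monte Carlo sketch (not a proof; the adaptive volatility strategies below are ad hoc admissible choices within [σ̲, σ̄], and the convex test function is chosen so that the limiting sublinear expectation reduces to the classical one with σ̄):

import numpy as np

rng = np.random.default_rng(2)
sig_lo, sig_hi = 0.5, 1.5
n, reps = 100, 20000
phi = lambda x: x ** 2                          # convex, so the limit is E[phi(sig_hi * Z)]

def normalized_sum(strategy):
    # S_n / sqrt(n) with X_i = V_i * eta_i, where V_i is chosen adaptively from the past sum
    eta = rng.standard_normal((reps, n))
    S = np.zeros(reps)
    for i in range(n):
        V = strategy(S)                         # adapted choice within [sig_lo, sig_hi]
        S = S + V * eta[:, i]
    return S / np.sqrt(n)

strategies = {
    "constant sig_hi": lambda S: sig_hi,
    "constant sig_lo": lambda S: sig_lo,
    "sign-switching":  lambda S: np.where(S >= 0, sig_hi, sig_lo),
}
for name, strat in strategies.items():
    print(name, np.mean(phi(normalized_sum(strat))))
print("E[phi(sig_hi * Z)]:", sig_hi ** 2)

Under this convex φ, every admissible strategy should give a value below σ̄², while the constant-σ̄ strategy should approach it, which is consistent with the limit above.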

References

  • Artzner et al., (1999) Artzner, P., Delbaen, F., Eber, J.-M., and Heath, D. (1999). Coherent measures of risk. Mathematical finance, 9(3):203–228.
  • Bayraktar and Munk, (2015) Bayraktar, E. and Munk, A. (2015). Comparing the GG-normal distribution to its classical counterpart. Communications on Stochastic Analysis, 9(1):1–18.
  • Breiman, (1992) Breiman, L. (1992). Probability. Society for Industrial and Applied Mathematics, USA.
  • Chatfield, (1995) Chatfield, C. (1995). Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society: Series A (Statistics in Society), 158(3):419–444.
  • Chen and Epstein, (2002) Chen, Z. and Epstein, L. (2002). Ambiguity, risk, and asset returns in continuous time. Econometrica, 70(4):1403–1443.
  • Choquet, (1954) Choquet, G. (1954). Theory of capacities. In Annales de l’Institut Fourier, volume 5, pages 131–295.
  • Crandall et al., (1992) Crandall, M. G., Ishii, H., and Lions, P.-L. (1992). User’s guide to viscosity solutions of second order partial differential equations. Bulletin of the American Mathematical Society, 27(1):1–67.
  • Deng et al., (2019) Deng, S., Fei, C., Fei, W., and Mao, X. (2019). Stability equivalence between the stochastic differential delay equations driven by GG-Brownian motion and the Euler–Maruyama method. Applied Mathematics Letters, 96:138–146.
  • Denis et al., (2011) Denis, L., Hu, M., and Peng, S. (2011). Function spaces and capacity related to a sublinear expectation: application to GG-Brownian motion paths. Potential Analysis, 34(2):139–161.
  • Der Kiureghian and Ditlevsen, (2009) Der Kiureghian, A. and Ditlevsen, O. (2009). Aleatory or epistemic? Does it matter? Structural safety, 31(2):105–112.
  • Dolinsky et al., (2012) Dolinsky, Y., Nutz, M., and Soner, H. M. (2012). Weak approximation of GG-expectations. Stochastic Processes and their Applications, 122(2):664–675.
  • Ellsberg, (1961) Ellsberg, D. (1961). Risk, ambiguity, and the Savage axioms. The Quarterly Journal of Economics, 75(4):643–669.
  • Epstein and Ji, (2013) Epstein, L. G. and Ji, S. (2013). Ambiguous volatility and asset pricing in continuous time. The Review of Financial Studies, 26(7):1740–1786.
  • Fang et al., (2019) Fang, X., Peng, S., Shao, Q., and Song, Y. (2019). Limit theorems with rate of convergence under sublinear expectations. Bernoulli, 25(4A):2564–2596.
  • Fei and Fei, (2019) Fei, C. and Fei, W. (2019). Consistency of least squares estimation to the parameter for stochastic differential equations under distribution uncertainty. arXiv preprint arXiv:1904.12701.
  • Föllmer and Schied, (2011) Föllmer, H. and Schied, A. (2011). Stochastic finance: an introduction in discrete time. Walter de Gruyter.
  • Hu, (2012) Hu, M. (2012). Explicit solutions of GG-heat equation with a class of initial conditions by GG-Brownian motion. Nonlinear Analysis, 75(18):6588–6595.
  • Hu and Li, (2014) Hu, M. and Li, X. (2014). Independence under the GG-expectation framework. Journal of Theoretical Probability, 27(3):1011–1020.
  • Hu et al., (2017) Hu, M., Peng, S., Song, Y., et al. (2017). Stein type characterization for GG-normal distributions. Electronic Communications in Probability, 22.
  • Huang and Liang, (2019) Huang, S. and Liang, G. (2019). A monotone scheme for GG-equations with application to the explicit convergence rate of robust central limit theorem. arXiv preprint arXiv:1904.07184.
  • Huber, (2004) Huber, P. J. (2004). Robust statistics, volume 523. John Wiley & Sons.
  • Jin and Peng, (2016) Jin, H. and Peng, S. (2016). Optimal unbiased estimation for maximal distribution. arXiv preprint arXiv:1611.07994.
  • Jin and Peng, (2021) Jin, H. and Peng, S. (2021). Optimal unbiased estimation for maximal distribution. Probability, Uncertainty and Quantitative Risk, 6(3):189–198.
  • Knight, (1921) Knight, F. H. (1921). Risk, uncertainty and profit, First edition. Boston, New York, Houghton Mifflin Company.
  • Krylov, (2020) Krylov, N. V. (2020). On Shige Peng's central limit theorem. Stochastic Processes and their Applications, 130(3):1426–1434.
  • Li, (2018) Li, Y. (2018). Statistical exploration in the GG-expectation framework: the pseudo simulation and estimation of variance uncertainty. Master’s thesis, The University of Western Ontario, London, ON, Canada.
  • Li and Kulperger, (2018) Li, Y. and Kulperger, R. (2018). An iterative approximation of the sublinear expectation of an arbitrary function of GG-normal distribution and the solution to the corresponding GG-heat equation. arXiv preprint arXiv:1804.10737.
  • Pei et al., (2021) Pei, Z., Wang, X., Xu, Y., and Yue, X. (2021). A worst-case risk measure by GG-VaR. Acta Mathematicae Applicatae Sinica, English Series, 37(2):421–440.
  • Peng, (2004) Peng, S. (2004). Filtration consistent nonlinear expectations and evaluations of contingent claims. Acta Mathematicae Applicatae Sinica, English Series, 20(2):191–214.
  • Peng, (2007) Peng, S. (2007). GG-expectation, GG-Brownian motion and related stochastic calculus of Itô type. In Stochastic analysis and applications, pages 541–567. Springer.
  • Peng, (2008) Peng, S. (2008). Multi-dimensional GG-Brownian motion and related stochastic calculus under GG-expectation. Stochastic Processes and their Applications, 118(12):2223–2253.
  • Peng, (2017) Peng, S. (2017). Theory, methods and meaning of nonlinear expectation theory. SCIENTIA SINICA Mathematica, 47(10):1223–1254.
  • Peng, (2019a) Peng, S. (2019a). Law of large numbers and central limit theorem under nonlinear expectations. Probability, Uncertainty and Quantitative Risk, 4(1):4.
  • Peng, (2019b) Peng, S. (2019b). Nonlinear Expectations and Stochastic Calculus under Uncertainty: with Robust CLT and GG-Brownian Motion, volume 95. Springer-Verlag Berlin Heidelberg.
  • Peng and Yang, (2020) Peng, S. and Yang, S. (2020). Autoregressive models of the time series under volatility uncertainty and application to VaR model. arXiv preprint arXiv:2011.09226.
  • Peng et al., (2020) Peng, S., Yang, S., and Yao, J. (2020). Improving value-at-risk prediction under model uncertainty. Journal of Financial Econometrics.
  • Peng and Zhou, (2020) Peng, S. and Zhou, Q. (2020). A hypothesis-testing perspective on the GG-normal distribution theory. Statistics & Probability Letters, 156:108623.
  • Pursell, (1967) Pursell, L. E. (1967). Uniform approximation of real continuous functions on the real line by infinitely differentiable functions. Mathematics Magazine, 40(5):263–265.
  • Rokhlin, (2015) Rokhlin, D. B. (2015). Central limit theorem under uncertain linear transformations. Statistics & Probability Letters, 107:191–198.
  • Schmeidler, (1989) Schmeidler, D. (1989). Subjective probability and expected utility without additivity. Econometrica, 57(3):571–587.
  • Song, (2020) Song, Y. (2020). Normal approximation by Stein's method under sublinear expectations. Stochastic Processes and their Applications, 130(5):2838–2850.
  • Xu and Xuan, (2019) Xu, Q. and Xuan, X. M. (2019). Nonlinear regression without iid assumption. Probability, Uncertainty and Quantitative Risk, 4(1):1–15.
  • Zhang and Chen, (2014) Zhang, D. and Chen, Z. (2014). A weighted central limit theorem under sublinear expectations. Communications in Statistics-Theory and Methods, 43(3):566–577.
  • Zhang, (2016) Zhang, L. (2016). Rosenthal’s inequalities for independent and negatively dependent random variables under sub-linear expectations with applications. Science China Mathematics, 59(4):751–768.