
Statistical inference on kurtosis of elliptical distributions

Bowen Zhou, Peirong Xu and Cheng Wang
Contact: Cheng Wang. Email: [email protected]
School of Mathematical Science, Shanghai Jiao Tong University, Shanghai, 20040, China
Abstract

Multivariate elliptically-contoured distributions are widely used for modeling correlated and non-Gaussian data. In this work, we study the kurtosis of the elliptical model, which is an important parameter in many statistical analyses. Based on U-statistics, we develop an estimation method. Theoretically, we show that the proposed estimator is consistent under regularity conditions; in particular, we relax a moment condition and the restriction that the data dimension and the sample size are of the same order. Furthermore, we derive the asymptotic normality of the estimator and evaluate the asymptotic variance through several examples, which allows us to construct confidence intervals. The performance of our method is validated by extensive simulations and real data analysis.

keywords:
Elliptical distribution; kurtosis; high-dimensional data; U-statistics

1 Introduction

The multivariate normal distribution plays a central role in multivariate statistical analysis, but its assumptions are often violated in practical applications. For instance, in finance, while stock returns are roughly symmetric, they exhibit leptokurtosis (Čížek et al., 2011; McNeil et al., 2015). In genomics and bioimaging, empirical evidence suggests that the Gaussian assumption may not hold (Thomas et al., 2010; Posekany et al., 2011). As a natural extension, elliptical distributions have received considerable attention in the past few decades (Fang and Zhang, 1990; Gupta et al., 2013). This class of distributions retains many desirable properties of the multivariate normal distribution, such as symmetry, while also enabling the modeling of non-normal dependence and multivariate extremes, making it valuable in various applications. In particular, many methods in high-dimensional data analysis have been motivated by the study of elliptical distributions (Han and Liu, 2012; Fan et al., 2015, 2018).

A random vector $\bm{X}\in\mathbb{R}^{p}$ follows an elliptical distribution if it can be represented in the form

\[ \bm{X}=\bm{\mu}+\xi\,\bm{\Sigma}^{1/2}\bm{U}, \] (1)

where $\bm{\mu}\in\mathbb{R}^{p}$ is a constant vector, $\bm{\Sigma}\in\mathbb{R}^{p\times p}$ is a positive definite matrix, $\bm{U}$ is a random vector uniformly distributed on the unit sphere $\mathcal{S}^{p-1}$, and $\xi\in[0,\infty)$ is a random variable independent of $\bm{U}$. Depending on the distribution of $\xi$, the class of elliptical distributions encompasses several important families. For instance, it reduces to the multivariate normal distribution when $\xi^{2}$ follows a chi-square distribution, and to the multivariate $t$ distribution when $\xi^{2}$ follows a suitably scaled $F$-distribution. In general, selecting an appropriate distribution for $\xi$ is crucial but challenging in real data analysis, and often relies on the moment information of $\xi$. In particular, the kurtosis parameter (Ke et al., 2018)

\[ \theta \stackrel{\text{def}}{=} \frac{\mathbb{E}\xi^{4}}{p(p+2)} \]

is widely discussed in the literature. For example, Fan et al. (2015) demonstrated that an estimator of $\theta$ is essential for constructing a general quadratic classifier under the assumption that the data are generated from elliptical distributions. In the context of financial assets, the leptokurtosis, which equals $\theta-1$, is often used to capture the tail behavior of stock returns (Ke et al., 2018). In random matrix theory, Hu et al. (2019) established asymptotic properties for the sphericity test of the covariance matrix, which requires an estimator of $\theta$ to construct the test statistic. Wang and Lopes (2023) further proposed a parametric bootstrap method for inference with spectral statistics in high-dimensional elliptical models, which requires a consistent estimator of $\mbox{var}(\xi^{2}/p)=(p+2)\theta/p-1$. Therefore, constructing a theoretically justified estimator of $\theta$ is crucial for facilitating statistical inference in various high-dimensional settings.
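For intuition, the stochastic representation (1) is straightforward to simulate. The sketch below is our own illustration (the function name `sample_elliptical` and the radial-sampler interface are not from the paper); it draws $\bm{U}$ by normalizing a Gaussian vector and uses a Cholesky factor in place of the symmetric square root $\bm{\Sigma}^{1/2}$, which leaves the distribution unchanged by the rotation invariance of $\bm{U}$.

```python
import numpy as np

def sample_elliptical(n, mu, Sigma, xi_sampler, rng):
    """Draw n variates X = mu + xi * Sigma^{1/2} U with U uniform on the unit sphere.

    xi_sampler(n, rng) returns the radial parts, independent of U.
    A Cholesky factor replaces the symmetric square root; since U is
    rotation-invariant, the resulting elliptical law is the same.
    """
    p = len(mu)
    Z = rng.standard_normal((n, p))
    U = Z / np.linalg.norm(Z, axis=1, keepdims=True)   # uniform on S^{p-1}
    xi = xi_sampler(n, rng)                            # radial part
    L = np.linalg.cholesky(Sigma)
    return mu + (xi[:, None] * U) @ L.T
```

For instance, the multivariate normal case corresponds to `xi_sampler = lambda n, rng: np.sqrt(rng.chisquare(p, n))`, since then $\xi^{2}\sim\chi^{2}(p)$.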

For the classical setting where the dimension of $\bm{X}$ is much smaller than the sample size, Seo and Toyama (1996) proposed a moment estimator by noting that $\xi^{2}=(\bm{X}-\bm{\mu})^{\top}\bm{\Sigma}^{-1}(\bm{X}-\bm{\mu})$, where $\bm{\Sigma}$ can be well estimated by the sample covariance matrix. However, this estimator performs poorly in the high-dimensional setting due to the noise accumulation of the sample covariance matrix. To address this problem, Fan et al. (2015) proposed a moment estimator using a shrinkage estimator of the precision matrix $\bm{\Sigma}^{-1}$. The consistency of this estimator requires a sparsity condition on $\bm{\Sigma}^{-1}$ and is restricted to the sub-Gaussian family. Ke et al. (2018) proposed an estimator of $\theta$ based on the coordinates of the data, which avoids estimating $\bm{\Sigma}^{-1}$ but neglects the within-coordinate correlation. This estimator is consistent in ultra-high-dimensional cases, but its asymptotic normality holds only when the data follow a sub-Gaussian distribution. Wang and Lopes (2023) studied the estimation of $\mbox{var}(\xi^{2}/p)=(p+2)\theta/p-1$ directly by constructing a quadratic-form estimating equation. Their estimator is consistent when $p$ is of the same order as the sample size $n$, but its asymptotic normality remains unknown.

We propose to estimate the kurtosis by representing it as a function of U-statistics. Our work contributes to the existing literature in several theoretical aspects: (1) both the consistency and the asymptotic normality of the estimator hold for data collected from general elliptical distributions; (2) the estimator achieves a faster convergence rate than that in Ke et al. (2018) even when pp is larger than nn; and (3) the asymptotic normality holds for both light-tailed and heavy-tailed cases, enabling the construction of valid confidence intervals for different distribution families.

The rest of the paper is organized as follows. We propose a model-free estimator of $\theta$ for high-dimensional settings in Section 2.1, and study its theoretical properties in Section 2.2. In Section 2.3, we discuss the asymptotic normality of the estimator and construct confidence intervals for $\theta$ under several commonly used distribution families. We investigate the finite-sample performance of our method against competing methods in Section 3. In Section 4, we demonstrate the effectiveness of the proposed method through several real data applications. All technical proofs are provided in Appendix A.

2 Main results

2.1 Methodology

Suppose $\bm{X}\in\mathbb{R}^{p}$ follows the elliptical distribution (1), where we assume $\mathbb{E}\xi^{2}=p$ for model identifiability. Then, we have

\[ \mbox{var}(\|\bm{X}-\bm{\mu}\|_{2}^{2})=(\theta-1)\,\mbox{tr}^{2}\bm{\Sigma}+2\theta\,\mbox{tr}\bm{\Sigma}^{2}, \]

which induces a moment equation

\[ \theta=\frac{\mbox{var}(\|\bm{X}-\bm{\mu}\|_{2}^{2})+\mbox{tr}^{2}\bm{\Sigma}}{\mbox{tr}^{2}\bm{\Sigma}+2\,\mbox{tr}\bm{\Sigma}^{2}}. \] (2)

This implies that an empirical estimate of $\theta$ can be obtained via the plug-in technique by estimating $\mbox{var}(\|\bm{X}-\bm{\mu}\|_{2}^{2})$, $\mbox{tr}^{2}\bm{\Sigma}$ and $\mbox{tr}\bm{\Sigma}^{2}$, respectively.

From the perspective of $U$-statistics, we have

\begin{align*}
\mbox{var}(\|\bm{X}-\bm{\mu}\|_{2}^{2}) &= \frac{1}{4}\mathbb{E}\left(\|\bm{X}_{1}-\bm{X}_{2}\|_{2}^{2}-\|\bm{X}_{3}-\bm{X}_{4}\|_{2}^{2}\right)^{2}-2\,\mbox{tr}\bm{\Sigma}^{2},\\
\mbox{tr}^{2}\bm{\Sigma} &= \frac{1}{4}\mathbb{E}\,\|\bm{X}_{1}-\bm{X}_{2}\|_{2}^{2}\,\|\bm{X}_{3}-\bm{X}_{4}\|_{2}^{2},\\
\mbox{tr}\bm{\Sigma}^{2} &= \frac{1}{4}\mathbb{E}\left((\bm{X}_{1}-\bm{X}_{2})^{\top}(\bm{X}_{3}-\bm{X}_{4})\right)^{2},
\end{align*}

where $\bm{X}_{1},\ldots,\bm{X}_{4}$ are i.i.d. copies of $\bm{X}$. Therefore, based on i.i.d. samples $\bm{X}_{1},\ldots,\bm{X}_{n}\in\mathbb{R}^{p}$ from the elliptical distribution (1), empirical estimates of $\mbox{var}(\|\bm{X}-\bm{\mu}\|_{2}^{2})$, $\mbox{tr}^{2}\bm{\Sigma}$ and $\mbox{tr}\bm{\Sigma}^{2}$ are $T_{1}-2T_{3}$, $T_{2}$ and $T_{3}$, respectively, where

\begin{align*}
T_{1}&=\frac{1}{4n(n-1)(n-2)(n-3)}\sum_{i\neq j\neq k\neq l}\left(\|\bm{X}_{i}-\bm{X}_{j}\|_{2}^{2}-\|\bm{X}_{k}-\bm{X}_{l}\|_{2}^{2}\right)^{2},\\
T_{2}&=\frac{1}{4n(n-1)(n-2)(n-3)}\sum_{i\neq j\neq k\neq l}\|\bm{X}_{i}-\bm{X}_{j}\|_{2}^{2}\,\|\bm{X}_{k}-\bm{X}_{l}\|_{2}^{2},\\
T_{3}&=\frac{1}{4n(n-1)(n-2)(n-3)}\sum_{i\neq j\neq k\neq l}\left((\bm{X}_{i}-\bm{X}_{j})^{\top}(\bm{X}_{k}-\bm{X}_{l})\right)^{2}.
\end{align*}

The estimator of θ\theta can be constructed as

\[ \hat{\theta}_{n}=\frac{T_{1}+T_{2}-2T_{3}}{T_{2}+2T_{3}}. \]
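A direct, if naive, implementation of $T_{1}$, $T_{2}$, $T_{3}$ and $\hat{\theta}_{n}$ is sketched below (the paper does not specify an implementation). It enumerates all ordered quadruples of distinct indices and is therefore $O(n^{4})$, suitable only for small $n$; the identity $(\bm{X}_{i}-\bm{X}_{j})^{\top}(\bm{X}_{k}-\bm{X}_{l})=S_{ik}-S_{il}-S_{jk}+S_{jl}$ with $S=XX^{\top}$ is a standard shortcut we use here.

```python
import numpy as np
from itertools import permutations

def kurtosis_estimator(X):
    """U-statistic estimator theta_hat = (T1 + T2 - 2*T3) / (T2 + 2*T3).

    Direct enumeration of all quadruples of pairwise distinct indices;
    a readable O(n^4) sketch for small n, not an efficient implementation.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    S = X @ X.T                                      # Gram matrix S[i][j] = X_i' X_j
    d = np.diag(S)
    D = (d[:, None] + d[None, :] - 2 * S).tolist()   # D[i][j] = ||X_i - X_j||^2
    S = S.tolist()                                   # plain lists for a faster loop
    T1 = T2 = T3 = 0.0
    for i, j, k, l in permutations(range(n), 4):
        # (X_i - X_j)'(X_k - X_l) recovered from the Gram matrix
        cross = S[i][k] - S[i][l] - S[j][k] + S[j][l]
        T1 += (D[i][j] - D[k][l]) ** 2
        T2 += D[i][j] * D[k][l]
        T3 += cross * cross
    m = 4.0 * n * (n - 1) * (n - 2) * (n - 3)
    T1, T2, T3 = T1 / m, T2 / m, T3 / m
    return (T1 + T2 - 2 * T3) / (T2 + 2 * T3)
```

For Gaussian data the true kurtosis parameter is $\theta=1$, so the output should be close to one even for moderate $n$ and $p$.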

2.2 Consistency

In this subsection, we establish the consistency and asymptotic normality of the proposed estimator $\hat{\theta}_{n}$. We first make the following assumptions:

Assumption 2.1.

As $n\to\infty$, $p=p(n)\to\infty$, $\mbox{tr}\bm{\Sigma}^{4}\to\infty$ and $\mbox{tr}\bm{\Sigma}^{4}/\mbox{tr}^{2}\bm{\Sigma}^{2}\to 0$.

Assumption 2.2.

$\mathbb{E}\xi^{8}=O(p^{4})$.

Assumption 2.1 is commonly used in the high-dimensional setting (Chen et al., 2010; Chen and Qin, 2010; Guo and Chen, 2016) and is required to control the higher-order terms when applying the Hoeffding decomposition to U-statistics. If all the eigenvalues of $\bm{\Sigma}$ are bounded, Assumption 2.1 holds naturally. Assumption 2.2 restricts the moments of $\xi^{2}/p$ up to order 4 to be of constant scale. It is weaker than the condition in Wang and Lopes (2023), which involves the $(4+\varepsilon)$-th moment of $\xi^{2}/p$ for some $\varepsilon>0$.

Theorem 2.1.

Under Assumptions 2.1 and 2.2, we have

\[ \hat{\theta}_{n}-\theta \,\overset{p}{\longrightarrow}\, 0. \]

Theorem 2.1 indicates that $\hat{\theta}_{n}$ is consistent when the data are collected from general elliptical distributions, extending the consistency results of Ke et al. (2018) and Wang and Lopes (2023) under weaker conditions.

Theorem 2.2.

Under Assumptions 2.1 and 2.2, we have that:

  • (i)

    If $\mbox{var}(\xi^{2}/p)=\tau/p+o(1/p)$ and $\mbox{var}\left((\xi^{2}/p-1)^{2}\right)=2\tau^{2}/p^{2}+o(1/p^{2})$ for some constant $\tau>0$, then

    \[ \frac{\sqrt{n}}{\sigma}\left(\hat{\theta}_{n}-\theta\right)\,\overset{d}{\longrightarrow}\,N(0,1), \]

    where

    \[ \sigma^{2}=2\left(\frac{\tau-2}{p}+2\,\frac{\mbox{tr}\bm{\Sigma}^{2}}{\mbox{tr}^{2}\bm{\Sigma}}\right)^{2}; \]

  • (ii)

    if $\mbox{var}(\xi^{2}/p)=O(1)$, then

    \[ \frac{\sqrt{n}}{\sigma}\left(\hat{\theta}_{n}-\theta\right)\,\overset{d}{\longrightarrow}\,N(0,1), \]

    where

    \[ \sigma^{2}=\mbox{var}\left[\left(\frac{\xi^{2}}{p}-1\right)^{2}-2\,\mbox{var}\left(\frac{\xi^{2}}{p}\right)\cdot\frac{\xi^{2}}{p}\right]. \]
Remark 1.

Case (i) requires $\mbox{var}(\xi^{2}/p)$ to be of order $1/p$, similar to the requirements in Hu et al. (2019) and Wang and Lopes (2023). Additionally, $\mbox{var}\left((\xi^{2}/p-1)^{2}\right)$ must be of order $1/p^{2}$, which is necessary for establishing the asymptotic normality. These two conditions are satisfied by several widely studied sub-classes of elliptical models. For example, if $\bm{X}\sim N(\bm{\mu},\bm{\Sigma})$, then $\xi^{2}\sim\chi^{2}(p)$ and thus we have

\[ \mbox{var}\left(\frac{\xi^{2}}{p}\right)=\frac{2}{p},\qquad \mbox{var}\left[\left(\frac{\xi^{2}}{p}-1\right)^{2}\right]=\frac{8}{p^{2}}+o\left(\frac{1}{p^{2}}\right), \]

which implies that the conditions hold with τ=2\tau=2. More examples can be found in the following section.
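The chi-square claims in Remark 1 can be checked in exact rational arithmetic. The snippet below is our own verification; it relies only on the standard raw-moment formula $\mathbb{E}(\chi^{2}_{p})^{m}=\prod_{j=0}^{m-1}(p+2j)$, and confirms that $\mbox{var}(\xi^{2}/p)=2/p$ exactly and that $\mbox{var}[(\xi^{2}/p-1)^{2}]=8/p^{2}+48/p^{3}$ exactly, consistent with the stated $8/p^{2}+o(1/p^{2})$.

```python
from fractions import Fraction

def chi2_raw_moment(p, m):
    """E[(chi^2_p)^m] = prod_{j=0}^{m-1} (p + 2j), in exact arithmetic."""
    out = Fraction(1)
    for j in range(m):
        out *= p + 2 * j
    return out

def remark1_variances(p):
    """Return var(V) and var((V - 1)^2) for V = chi^2_p / p, exactly."""
    P = Fraction(p)
    EV = [chi2_raw_moment(P, m) / P**m for m in range(5)]       # E[V^m], m = 0..4
    m2 = EV[2] - 2 * EV[1] + 1                                  # E[(V-1)^2] = var(V)
    m4 = EV[4] - 4 * EV[3] + 6 * EV[2] - 4 * EV[1] + 1          # E[(V-1)^4]
    return m2, m4 - m2**2
```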

2.3 Examples

In this subsection, we apply the theoretical results of Section 2.2 to construct confidence intervals for the kurtosis parameter $\theta$ under some widely used elliptical distributions.

Example 2.1 (Example 2.3 of Hu et al. 2019).

Let $\bm{X}=\bm{\mu}+\xi\bm{\Sigma}^{1/2}\bm{U}$ with $\xi^{2}=\sum_{j=1}^{p}Y_{j}^{2}$ independent of $\bm{U}$, where $\{Y_{j},j=1,\ldots,p\}$ is a sequence of i.i.d. random variables with

\[ \mathbb{E}Y_{j}=0,\quad \mathbb{E}Y_{j}^{2}=1,\quad \mathbb{E}Y_{j}^{4}=3+\Delta,\quad \mathbb{E}Y_{j}^{8}<+\infty. \]

The normal distribution is a special case with $Y_{j}\sim N(0,1)$, $j=1,\ldots,p$. A simple calculation gives

\[ \mbox{var}\left(\frac{\xi^{2}}{p}\right)=\frac{2+\Delta}{p},\qquad \mbox{var}\left[\left(\frac{\xi^{2}}{p}-1\right)^{2}\right]=\frac{2(2+\Delta)^{2}}{p^{2}}+o\left(\frac{1}{p^{2}}\right), \]

which implies that the conditions of Case (i) in Theorem 2.2 hold with $\tau=2+\Delta$. Therefore,

\[ \sqrt{\frac{n}{2}}\left(\frac{\Delta}{p}+2\,\frac{\mbox{tr}\bm{\Sigma}^{2}}{\mbox{tr}^{2}\bm{\Sigma}}\right)^{-1}(\hat{\theta}_{n}-\theta)\,\overset{d}{\longrightarrow}\,N(0,1). \]

We further estimate $\mbox{tr}\bm{\Sigma}^{2}/\mbox{tr}^{2}\bm{\Sigma}$ by its consistent estimator $T_{3}/T_{2}$ based on Theorem 2.1, and estimate $\Delta$ by the plug-in estimator $\hat{\Delta}_{n}=(p+2)(\hat{\theta}_{n}-1)$, leveraging the fact that $\Delta=(p+2)(\theta-1)$. Then, by Slutsky's theorem, we have

\[ \sqrt{\frac{n}{2}}\left(\frac{\hat{\Delta}_{n}}{p}+2\,\frac{T_{3}}{T_{2}}\right)^{-1}(\hat{\theta}_{n}-\theta)\,\overset{d}{\longrightarrow}\,N(0,1). \] (3)

Therefore, for a pre-specified nominal level $\alpha\in(0,1)$, the corresponding $1-\alpha$ confidence interval for $\theta$ can be constructed as

\[ \left[\hat{\theta}_{n}-\sqrt{\frac{2}{n}}\left(\frac{\hat{\Delta}_{n}}{p}+2\,\frac{T_{3}}{T_{2}}\right)Z_{\alpha/2},~\hat{\theta}_{n}+\sqrt{\frac{2}{n}}\left(\frac{\hat{\Delta}_{n}}{p}+2\,\frac{T_{3}}{T_{2}}\right)Z_{\alpha/2}\right], \] (4)

where $Z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution.
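As a sketch (the function name is ours), interval (4) can be assembled from $\hat{\theta}_{n}$, $T_{2}$ and $T_{3}$ using only the standard library:

```python
from statistics import NormalDist

def ci_example_2_1(theta_hat, T2, T3, n, p, alpha=0.05):
    """Interval (4): theta_hat +/- sqrt(2/n) * (Delta_hat/p + 2*T3/T2) * z_{alpha/2}."""
    z = NormalDist().inv_cdf(1 - alpha / 2)        # upper alpha/2 normal quantile
    delta_hat = (p + 2) * (theta_hat - 1.0)        # plug-in Delta_hat = (p+2)(theta_hat - 1)
    half = (2.0 / n) ** 0.5 * (delta_hat / p + 2.0 * T3 / T2) * z
    return theta_hat - half, theta_hat + half
```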

Example 2.2 (Kotz-type distribution of Kotz 1975).

Let $\bm{X}=\bm{\mu}+\xi\bm{\Sigma}^{1/2}\bm{U}$ with $\xi^{2s}\sim\mbox{Gamma}(\vartheta,\beta)$. Then the $p$-dimensional random vector $\bm{X}$ follows the Kotz-type distribution, with density function

\[ f(\bm{x})=\frac{s\beta^{\vartheta}\Gamma(p/2)}{\pi^{p/2}\Gamma(\vartheta)}\left((\bm{x}-\bm{\mu})^{\top}\bm{\Sigma}^{-1}(\bm{x}-\bm{\mu})\right)^{k-1}\exp\left\{-\beta\left((\bm{x}-\bm{\mu})^{\top}\bm{\Sigma}^{-1}(\bm{x}-\bm{\mu})\right)^{s}\right\}, \]

where $\vartheta=(k-1+p/2)/s>0$ and $\beta,s>0$. To ensure identifiability, we normalize $\xi$ so that $\mathbb{E}\xi^{2}=p$. Then, the moments of $\xi^{2}$ are

\[ \mathbb{E}\xi^{2m}=p^{m}\,\frac{\Gamma(\vartheta+m/s)\,\Gamma^{m-1}(\vartheta)}{\Gamma^{m}(\vartheta+1/s)},\quad m=1,2,\ldots. \]

Taking $\vartheta=p$, $\beta=1$ and $s=1/2$ as a special case, we have

\[ \mathbb{E}\xi^{2m}=\frac{\prod_{j=1}^{2m}(p+j-1)}{(p+1)^{m}},\quad m=1,2,\ldots, \]

which gives

\[ \mbox{var}\left(\frac{\xi^{2}}{p}\right)=\frac{4}{p}+o\left(\frac{1}{p}\right),\qquad \mbox{var}\left[\left(\frac{\xi^{2}}{p}-1\right)^{2}\right]=\frac{32}{p^{2}}+o\left(\frac{1}{p^{2}}\right). \]

This implies that the Kotz-type distribution satisfies the conditions of Case (i) in Theorem 2.2 with $\tau=4$. Therefore,

\[ \sqrt{\frac{n}{8}}\left(\frac{1}{p}+\frac{\mbox{tr}\bm{\Sigma}^{2}}{\mbox{tr}^{2}\bm{\Sigma}}\right)^{-1}\left(\hat{\theta}_{n}-\theta\right)\,\overset{d}{\longrightarrow}\,N(0,1).
\]

Consequently, for a pre-specified nominal level $\alpha\in(0,1)$, the corresponding $1-\alpha$ confidence interval for $\theta$ can be constructed as

\[ \left[\hat{\theta}_{n}-\sqrt{\frac{8}{n}}\left(\frac{1}{p}+\frac{T_{3}}{T_{2}}\right)Z_{\alpha/2},~\hat{\theta}_{n}+\sqrt{\frac{8}{n}}\left(\frac{1}{p}+\frac{T_{3}}{T_{2}}\right)Z_{\alpha/2}\right]. \] (5)
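The special-case moment formula for the Kotz-type distribution can likewise be verified in exact rational arithmetic. The check below is our own; it confirms $\mathbb{E}\xi^{2}=p$ exactly and that $p\,\mbox{var}(\xi^{2}/p)\to 4$ and $p^{2}\,\mbox{var}[(\xi^{2}/p-1)^{2}]\to 32$, as claimed.

```python
from fractions import Fraction

def kotz_moment(p, m):
    """E[xi^{2m}] = prod_{j=1}^{2m}(p + j - 1) / (p + 1)^m for vartheta=p, beta=1, s=1/2."""
    num = Fraction(1)
    for j in range(1, 2 * m + 1):
        num *= p + j - 1
    return num / Fraction(p + 1) ** m

def kotz_variances(p):
    """Return var(V) and var((V - 1)^2) for V = xi^2 / p, exactly."""
    P = Fraction(p)
    EV = [kotz_moment(p, m) / P**m for m in range(5)]           # E[V^m], m = 0..4
    var_v = EV[2] - EV[1] ** 2
    m2 = EV[2] - 2 * EV[1] + 1
    m4 = EV[4] - 4 * EV[3] + 6 * EV[2] - 4 * EV[1] + 1
    return var_v, m4 - m2**2
```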
Example 2.3 (Multivariate tt distribution).

Let $\bm{X}=\bm{\mu}+\xi\bm{\Sigma}^{1/2}\bm{U}$ with $\xi^{2}\sim p\,((d-2)/d)\,F(p,d)$. Then the $p$-dimensional random vector $\bm{X}$ follows the multivariate $t$ distribution $t_{d}(\bm{\mu},\bm{\Sigma})$, with density function

\[ f(\bm{x})=\frac{\Gamma[(d+p)/2]}{\Gamma(d/2)\,d^{p/2}\pi^{p/2}|\bm{\Sigma}|^{1/2}}\left[1+\frac{1}{d}(\bm{x}-\bm{\mu})^{\top}\bm{\Sigma}^{-1}(\bm{x}-\bm{\mu})\right]^{-(d+p)/2}. \]

Note that

\[ \mbox{var}\left(\frac{\xi^{2}}{p}\right)=\frac{2}{d-4}+O\left(\frac{1}{p}\right), \]

which implies that the multivariate $t$ distribution with fixed $d>8$ satisfies the conditions of Case (ii) in Theorem 2.2. A simple calculation gives

\[ \sigma^{2}=\frac{8(d-2)^{2}(d+4)}{(d-4)^{3}(d-6)(d-8)}+O\left(\frac{1}{p}\right), \]

which implies that

\[ \sqrt{\frac{n(d-4)^{3}(d-6)(d-8)}{8(d-2)^{2}(d+4)}}\left(\hat{\theta}_{n}-\theta\right)\,\overset{d}{\longrightarrow}\,N(0,1). \]

We estimate the degrees of freedom $d$ by the plug-in estimate

\[ d_{n}=\frac{4\hat{\theta}_{n}-2}{\hat{\theta}_{n}-1}, \]

and hence the $1-\alpha$ confidence interval for $\theta$ is

\[ \left[\hat{\theta}_{n}-\sqrt{\frac{8(d_{n}-2)^{2}(d_{n}+4)}{n(d_{n}-4)^{3}(d_{n}-6)(d_{n}-8)}}\,Z_{\alpha/2},~\hat{\theta}_{n}+\sqrt{\frac{8(d_{n}-2)^{2}(d_{n}+4)}{n(d_{n}-4)^{3}(d_{n}-6)(d_{n}-8)}}\,Z_{\alpha/2}\right]. \] (6)
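A sketch of interval (6) with the plug-in degrees of freedom $d_{n}$ follows (the function name and the explicit guard requiring $d_{n}>8$, which keeps the variance formula positive and finite, are our additions):

```python
from statistics import NormalDist

def t_kurtosis_ci(theta_hat, n, alpha=0.05):
    """Interval (6) for the multivariate t family, with d_n = (4*th - 2)/(th - 1)."""
    d = (4 * theta_hat - 2) / (theta_hat - 1)
    if d <= 8:
        raise ValueError("plug-in degrees of freedom must exceed 8")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = (8 * (d - 2) ** 2 * (d + 4)
            / (n * (d - 4) ** 3 * (d - 6) * (d - 8))) ** 0.5 * z
    return theta_hat - half, theta_hat + half
```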
Example 2.4 (Multivariate Laplace distribution).

The random vector $\bm{X}$ follows a normal scale mixture distribution if $\bm{X}=\bm{\mu}+\xi\bm{\Sigma}^{1/2}\bm{U}$ with $\xi=\sqrt{R_{1}R_{2}}$, where the random variables $R_{1}$ and $R_{2}$ are independent and $R_{2}\sim\chi^{2}_{p}$. In particular, if $R_{1}$ follows an exponential distribution with unit mean (so that $\mathbb{E}\xi^{2}=p$), the model reduces to the multivariate Laplace distribution (Eltoft et al., 2006). This distribution satisfies the conditions of Case (ii) in Theorem 2.2 with

\[ \mbox{var}\left(\frac{\xi^{2}}{p}\right)=1+\frac{4}{p}. \]

Note that

\[ \sigma^{2}=\mbox{var}\left[\left(\frac{\xi^{2}}{p}-1\right)^{2}-2\,\mbox{var}\left(\frac{\xi^{2}}{p}\right)\cdot\frac{\xi^{2}}{p}\right]=4+O\left(\frac{1}{p}\right). \]

Therefore,

\[ \sqrt{n}(\hat{\theta}_{n}-\theta)\,\overset{d}{\longrightarrow}\,N(0,4), \]

which gives the $1-\alpha$ confidence interval for $\theta$ as

\[ \left[\hat{\theta}_{n}-\frac{2}{\sqrt{n}}Z_{\alpha/2},~\hat{\theta}_{n}+\frac{2}{\sqrt{n}}Z_{\alpha/2}\right]. \] (7)
Example 2.5 (General elliptical distribution).

In practice, we typically assume that $\bm{X}=\bm{\mu}+\xi\bm{\Sigma}^{1/2}\bm{U}$ follows an elliptical distribution with the distribution of $\xi$ unknown. To construct a confidence interval for $\theta$, we first need a proper estimate of the $\sigma^{2}$ described in Theorem 2.2.

  • If Case (i) holds, noting that $\tau=\lim_{p\to\infty}[(p+2)\theta-p]$, we can estimate $\tau$ by $\hat{\tau}_{n}=(p+2)\hat{\theta}_{n}-p$, and then estimate $\sigma^{2}$ by

    \[ \hat{\sigma}_{n}^{2}=2\left(\frac{\hat{\tau}_{n}-2}{p}+2\,\frac{T_{3}}{T_{2}}\right)^{2}. \] (8)
  • If Case (ii) holds, note that

    \[ \sigma^{2}=\frac{\mathbb{E}\xi^{8}}{p^{4}}-\left(\frac{\mathbb{E}\xi^{4}}{p^{2}}\right)^{2}-4\,\frac{\mathbb{E}\xi^{6}\,\mathbb{E}\xi^{4}}{p^{5}}+4\left(\frac{\mathbb{E}\xi^{4}}{p^{2}}\right)^{3}, \]

    where

    \begin{align*}
    \mathbb{E}\xi^{4} &= p(p+2)\theta,\\
    \mathbb{E}\xi^{6} &= \frac{p(p+2)(p+4)\,\mathbb{E}\left((\bm{X}-\bm{\mu})^{\top}(\bm{X}-\bm{\mu})\right)^{3}}{\mbox{tr}^{3}\bm{\Sigma}+6\,\mbox{tr}\bm{\Sigma}\,\mbox{tr}\bm{\Sigma}^{2}+8\,\mbox{tr}\bm{\Sigma}^{3}},\\
    \mathbb{E}\xi^{8} &= \frac{p(p+2)(p+4)(p+6)\,\mathbb{E}\left((\bm{X}-\bm{\mu})^{\top}(\bm{X}-\bm{\mu})\right)^{4}}{\mbox{tr}^{4}\bm{\Sigma}+12\,\mbox{tr}^{2}\bm{\Sigma}\,\mbox{tr}\bm{\Sigma}^{2}+12\,\mbox{tr}^{2}\bm{\Sigma}^{2}+32\,\mbox{tr}\bm{\Sigma}\,\mbox{tr}\bm{\Sigma}^{3}+48\,\mbox{tr}\bm{\Sigma}^{4}}.
    \end{align*}

    Let $\bar{\bm{X}}$ be the sample mean and $\widehat{\bm{\Sigma}}$ the sample covariance matrix. Then, we can estimate $\sigma^{2}$ by

    \[ \hat{\sigma}_{n}^{2}=\frac{\hat{\varphi}_{n}}{p^{4}}-\left(\frac{(p+2)\hat{\theta}_{n}}{p}\right)^{2}-4\,\frac{\hat{\varrho}_{n}}{p^{3}}\left(\frac{(p+2)\hat{\theta}_{n}}{p}\right)+4\left(\frac{(p+2)\hat{\theta}_{n}}{p}\right)^{3}, \] (9)

    where

    \begin{align*}
    \hat{\varrho}_{n} &= \frac{p(p+2)(p+4)\sum_{i=1}^{n}\left((\bm{X}_{i}-\bar{\bm{X}})^{\top}(\bm{X}_{i}-\bar{\bm{X}})\right)^{3}/n}{\mbox{tr}^{3}\widehat{\bm{\Sigma}}+6\,\mbox{tr}\widehat{\bm{\Sigma}}\,\mbox{tr}\widehat{\bm{\Sigma}}^{2}+8\,\mbox{tr}\widehat{\bm{\Sigma}}^{3}},\\
    \hat{\varphi}_{n} &= \frac{p(p+2)(p+4)(p+6)\sum_{i=1}^{n}\left((\bm{X}_{i}-\bar{\bm{X}})^{\top}(\bm{X}_{i}-\bar{\bm{X}})\right)^{4}/n}{\mbox{tr}^{4}\widehat{\bm{\Sigma}}+12\,\mbox{tr}^{2}\widehat{\bm{\Sigma}}\,\mbox{tr}\widehat{\bm{\Sigma}}^{2}+12\,\mbox{tr}^{2}\widehat{\bm{\Sigma}}^{2}+32\,\mbox{tr}\widehat{\bm{\Sigma}}\,\mbox{tr}\widehat{\bm{\Sigma}}^{3}+48\,\mbox{tr}\widehat{\bm{\Sigma}}^{4}}.
    \end{align*}

Given the estimator $\hat{\sigma}_{n}^{2}$, the $1-\alpha$ confidence interval for $\theta$ can be constructed as

\[ \left[\hat{\theta}_{n}-\frac{\hat{\sigma}_{n}}{\sqrt{n}}Z_{\alpha/2},~\hat{\theta}_{n}+\frac{\hat{\sigma}_{n}}{\sqrt{n}}Z_{\alpha/2}\right]. \] (10)
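Under Case (ii), estimator (9) can be sketched as follows. This is our own implementation outline: it takes $\hat{\theta}_{n}$ as an input and plugs the sample covariance into the trace terms, so it is only a plug-in approximation for moderate dimensions.

```python
import numpy as np

def sigma2_hat_case_ii(X, theta_hat):
    """Plug-in estimator (9) of sigma^2 under Case (ii) of Theorem 2.2."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (n - 1)                       # sample covariance matrix
    S2 = S @ S
    t1, t2 = np.trace(S), np.trace(S2)            # tr(S), tr(S^2)
    t3, t4 = np.trace(S2 @ S), np.trace(S2 @ S2)  # tr(S^3), tr(S^4)
    q = np.sum(Xc**2, axis=1)                     # squared centered norms
    rho = p * (p + 2) * (p + 4) * np.mean(q**3) / (t1**3 + 6 * t1 * t2 + 8 * t3)
    phi = (p * (p + 2) * (p + 4) * (p + 6) * np.mean(q**4)
           / (t1**4 + 12 * t1**2 * t2 + 12 * t2**2 + 32 * t1 * t3 + 48 * t4))
    a = (p + 2) * theta_hat / p                   # plug-in for E[xi^4] / p^2
    return phi / p**4 - a**2 - 4 * rho / p**3 * a + 4 * a**3
```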

3 Simulation studies

In this section, we assess the finite-sample performance of the proposed estimation method via extensive simulation studies. We generate the data matrix from the elliptical model

\[ \bm{X}_{i}=\bm{\mu}+\xi_{i}\bm{\Sigma}^{1/2}\bm{U}_{i},\quad i=1,\ldots,n, \]

where $\bm{\Sigma}=\left(0.5^{|j-k|}\right)$ is a $p\times p$ Toeplitz matrix. Following the examples in Section 2.3, we generate $\xi_{i}^{2}$ from the following four distributions, all of which satisfy the identifiability condition $\mathbb{E}\xi_{i}^{2}=p$:

  • (1)

    the chi-square distribution with $p$ degrees of freedom, $\chi^{2}_{p}$, such that $\theta=1$;

  • (2)

    $\xi_{i}^{2}=\omega_{i}^{2}/(p+1)$ with $\omega_{i}\sim\mbox{Gamma}(p,1)$;

  • (3)

    the scaled $F$ distribution $\xi_{i}^{2}\sim p\,(7/9)\,F(p,9)$, such that $\theta=1.4$;

  • (4)

    $\xi_{i}^{2}=R_{1i}R_{2i}$ with $R_{1i}\sim\mbox{Exp}(1)$ and $R_{2i}\sim\chi^{2}_{p}$, such that $\theta=2$.
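The four radial laws above can be sampled with NumPy as below (a sketch; the dictionary interface and names are ours). Each generator satisfies $\mathbb{E}\xi_{i}^{2}=p$.

```python
import numpy as np

def radial_samplers(p):
    """xi^2 generators for the four simulation settings; each has E[xi^2] = p."""
    return {
        "normal":  lambda n, rng: rng.chisquare(p, n),                   # theta = 1
        "kotz":    lambda n, rng: rng.gamma(p, 1.0, n) ** 2 / (p + 1),   # theta = (p+3)/(p+1)
        "t":       lambda n, rng: p * (7.0 / 9.0) * rng.f(p, 9, n),      # theta = 1.4
        "laplace": lambda n, rng: rng.exponential(1.0, n) * rng.chisquare(p, n),  # theta = 2
    }
```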

3.1 Estimation

For all the numerical studies in this section, we consider dimensions $p\in\{100,200,400,800,1600\}$ with sample size $n=100$. For comparison, we consider an oracle estimator as a benchmark. Note that

\[ \xi_{i}^{2}=(\bm{X}_{i}-\bm{\mu})^{\top}\bm{\Sigma}^{-1}(\bm{X}_{i}-\bm{\mu}),\quad i=1,\ldots,n. \]

Then, the oracle estimator for $\theta=\mathbb{E}\xi^{4}/(p(p+2))$ can be constructed as

\[ \hat{\theta}^{\mathrm{Oracle}}=\frac{1}{np(p+2)}\sum_{i=1}^{n}\left[(\bm{X}_{i}-\bm{\mu})^{\top}\bm{\Sigma}^{-1}(\bm{X}_{i}-\bm{\mu})\right]^{2} \]

when the true parameters $\bm{\mu}$ and $\bm{\Sigma}$ are known. Moreover, we implement the coordinate-based method of Ke et al. (2018), denoted by $\hat{\theta}^{KBF}$, and the method of Wang and Lopes (2023), which relies on the moment equation (2) together with the sample mean and sample covariance matrix, denoted by $\hat{\theta}^{WL}$. To evaluate the performance of the different methods, we report the mean and the empirical standard error over 1000 replications.
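Given the true $\bm{\mu}$ and $\bm{\Sigma}$, the oracle benchmark above is a one-liner (sketch; the function name is ours):

```python
import numpy as np

def theta_oracle(X, mu, Sigma):
    """Oracle estimator: mean of [(X_i - mu)' Sigma^{-1} (X_i - mu)]^2 / (p(p+2))."""
    p = X.shape[1]
    Xc = X - mu
    q = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(Sigma), Xc)  # xi_i^2
    return np.mean(q**2) / (p * (p + 2))
```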

Table 1 reports the estimation results of the different methods under the simulation settings described in Section 3. Our method performs comparably with the oracle estimator across all scenarios, indicating that it is competitive for kurtosis estimation. Moreover, compared with the methods of Ke et al. (2018) and Wang and Lopes (2023), our method has a lower estimation error in most settings.

Table 1: Estimation results: means and standard errors (in parentheses) for the kurtosis estimates based on 1000 replications.
$p=100$ $p=200$ $p=400$ $p=800$ $p=1600$
Normal distribution ($\theta=1$)
$\hat{\theta}_{n}$ 1.000(0.005) 1.000(0.002) 1.000(0.001) 1.000(0.001) 1.000(0.000)
$\hat{\theta}^{WL}$ 0.999(0.005) 0.999(0.002) 1.000(0.001) 1.000(0.001) 1.000(0.000)
$\hat{\theta}^{KBF}$ 0.979(0.016) 0.980(0.011) 0.980(0.008) 0.980(0.006) 0.980(0.004)
$\hat{\theta}^{Oracle}$ 1.000(0.003) 1.000(0.001) 1.000(0.001) 1.000(0.000) 1.000(0.000)
Kotz-type distribution ($\theta=(p+3)/(p+1)$)
$\hat{\theta}_{n}$ 1.020(0.008) 1.010(0.004) 1.005(0.002) 1.002(0.001) 1.001(0.000)
$\hat{\theta}^{WL}$ 1.018(0.008) 1.008(0.004) 1.004(0.002) 1.002(0.001) 1.001(0.000)
$\hat{\theta}^{KBF}$ 0.997(0.018) 0.989(0.012) 0.984(0.008) 0.982(0.006) 0.981(0.004)
$\hat{\theta}^{Oracle}$ 1.020(0.006) 1.010(0.003) 1.005(0.001) 1.002(0.001) 1.001(0.000)
Multivariate $t$ distribution ($\theta=1.4$)
$\hat{\theta}_{n}$ 1.385(0.207) 1.395(0.293) 1.388(0.208) 1.370(0.187) 1.383(0.184)
$\hat{\theta}^{WL}$ 1.355(0.186) 1.365(0.243) 1.359(0.185) 1.343(0.168) 1.355(0.165)
$\hat{\theta}^{KBF}$ 1.277(0.121) 1.283(0.135) 1.280(0.118) 1.268(0.105) 1.277(0.107)
$\hat{\theta}^{Oracle}$ 1.386(0.208) 1.398(0.309) 1.387(0.205) 1.371(0.186) 1.383(0.185)
Multivariate Laplace distribution ($\theta=2$)
$\hat{\theta}_{n}$ 2.009(0.225) 2.000(0.216) 2.004(0.200) 2.001(0.212) 2.011(0.209)
$\hat{\theta}^{WL}$ 1.919(0.198) 1.911(0.190) 1.915(0.176) 1.912(0.187) 1.921(0.184)
$\hat{\theta}^{KBF}$ 1.783(0.153) 1.776(0.144) 1.781(0.136) 1.776(0.141) 1.784(0.140)
$\hat{\theta}^{Oracle}$ 2.006(0.211) 2.001(0.208) 2.005(0.196) 2.001(0.210) 2.011(0.209)

3.2 Inference

In this subsection, we assess the performance of the proposed procedure for constructing confidence intervals for the kurtosis parameter under the simulation settings described in Section 3. For comparison, we consider the following three types of confidence intervals:

  • CI1: confidence intervals (4)–(7), respectively, with the distribution family known;

  • CI2: confidence interval (10), without knowledge of the specific distribution family;

  • CIKBF: the confidence interval constructed in Ke et al. (2018).

We set the significance level $\alpha=0.05$. To measure the reliability and accuracy of the different methods, we compute the empirical coverage probability (ECP) and the average length (AL) of the confidence intervals.

Table 2 reports the results under different dimensionality levels based on 1000 replications. CIKBF is the most conservative, producing the widest confidence intervals with slightly inflated coverage probabilities. The proposed methods CI1 and CI2 achieve a good balance between reliability (high coverage probability) and accuracy (narrow interval width). Comparing the four distribution families, the proposed methods give more accurate confidence intervals under the normal and Kotz-type distributions, in accordance with Theorem 2.2 in Section 2.2.

Table 2: Empirical coverage probability (ECP) and average length (AL) of the confidence intervals of the three methods with significance level $\alpha=0.05$. The results are averaged over 1000 datasets.

        CI1             CI2             CIKBF
p       ECP     AL      ECP     AL      ECP     AL
Normal distribution
100     0.934   0.018   0.931   0.018   1       1.194
200     0.950   0.009   0.942   0.009   0.999   1.189
400     0.951   0.005   0.949   0.005   0.996   1.182
800     0.953   0.002   0.926   0.002   0.998   1.180
1600    0.941   0.001   0.934   0.001   0.995   1.178
Kotz-type distribution
100     0.946   0.029   0.928   0.029   0.996   1.261
200     0.946   0.015   0.933   0.015   0.996   1.220
400     0.946   0.007   0.925   0.007   0.996   1.199
800     0.952   0.004   0.939   0.004   0.998   1.189
1600    0.947   0.002   0.934   0.002   0.995   1.183
Multivariate $t$ distribution
100     0.900   0.370   0.806   0.374   0.994   2.389
200     0.908   0.386   0.871   0.373   0.986   2.317
400     0.925   0.386   0.904   0.393   0.983   2.348
800     0.924   0.394   0.926   0.403   0.985   2.340
1600    0.933   0.393   0.933   0.396   0.988   2.327
Multivariate Laplace distribution
100     0.935   0.784   0.934   0.796   0.989   4.092
200     0.957   0.796   0.954   0.822   0.993   4.094
400     0.955   0.802   0.961   0.846   0.993   4.064
800     0.969   0.784   0.959   0.850   0.991   4.054
1600    0.949   0.795   0.963   0.864   0.992   4.094
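The two summary measures in Table 2 are straightforward to compute from simulated intervals; the sketch below is a minimal illustration (the array names `lower`, `upper` and the true value `theta0` are placeholders for the simulated endpoints).

```python
import numpy as np

def coverage_summary(lower, upper, theta0):
    """Empirical coverage probability (ECP) and average length (AL) of a
    collection of confidence intervals for a known true value theta0."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    ecp = np.mean((lower <= theta0) & (theta0 <= upper))  # fraction of intervals covering theta0
    al = np.mean(upper - lower)                           # average interval width
    return ecp, al

# toy illustration: three intervals, true value theta0 = 1
ecp, al = coverage_summary([0.90, 1.10, 0.80], [1.20, 1.30, 0.95], 1.0)
```

In the simulations, `theta0` is the kurtosis parameter of the data-generating elliptical distribution and the endpoints come from each replication's interval.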

4 Real data analysis

In this section, we illustrate the proposed method by analysing the following data sets:

  • The iris flower dataset, available in the R package datasets, consists of 150 samples with four measurements from each of three species of the iris flower.

  • The Wisconsin breast cancer dataset, sourced from the UCI Machine Learning Repository, comprises 569 samples featuring 30 distinct characteristics of cell nuclei derived from fine needle aspirates of breast tissue. The samples are labeled as either malignant or benign.

  • The Parkinson’s dataset, available from the UCI Machine Learning Repository, encompasses voice measurements from 188 patients and 64 healthy individuals. With a focus on Parkinson’s Disease diagnosis and analysis, the dataset includes 752 features covering a range of voice aspects such as fundamental frequency, spectral residual, and variability.

  • The prostate cancer data contains 102 samples with 6033 genes, in which 52 are patients with tumor and 50 are normal individuals. The detailed description can be found in Dettling (2004) and the dataset can be downloaded from http://stat.ethz.ch/~dettling/bagboost.html.

For each dataset, we apply the proposed method to estimate the kurtosis parameter and construct the 95% confidence interval using (10) under Cases (i) and (ii). As shown in Table 3, the malignant breast cancer and the Parkinson's datasets exhibit large kurtosis values, indicating that the data are heavy-tailed. Furthermore, the confidence interval for Case (i) may be unreliable for these data, because if Case (i) holds, then $\theta=1+O(1/p)$.

Table 3: Estimation of kurtosis and corresponding 95% confidence intervals for the real datasets.

dataset         class        $(n,p)$     $\hat{\theta}_{n}$   95% CI, Case (i)   95% CI, Case (ii)
iris            setosa       (50,4)      1.059                (0.563,1.555)      (0.676,1.442)
                versicolor   (50,4)      0.874                (0.465,1.284)      (0.633,1.116)
                virginica    (50,4)      1.008                (0.521,1.495)      (0.655,1.361)
breast cancer   malignant    (212,30)    1.839                (1.317,2.362)      (1.783,1.895)
                benign       (357,30)    1.076                (0.779,1.374)      (1.044,1.108)
Parkinson       healthy      (192,752)   5.211                (3.967,6.457)      (4.263,6.161)
                patients     (564,752)   2.271                (1.888,2.652)      (2.200,2.340)
Prostate        healthy      (50,6033)   0.825                (0.643,1.008)      (0.696,0.954)
                patients     (52,6033)   1.095                (0.889,1.301)      (0.909,1.280)
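The point estimate can be reproduced from the three U-statistics $T_1$, $T_2$ and $T_3$ defined in Appendix A. The sketch below is a minimal illustration, assuming the representation $\hat{\theta}_{n}=(T_1+T_2-2T_3)/(T_2+2T_3)$ implied by the decomposition in the proof of Theorem 2.2; the naive $O(n^4)$ enumeration is for exposition only and is far too slow for the sample sizes above.

```python
import numpy as np
from itertools import combinations
from math import comb

def kurtosis_estimate(X):
    """Kurtosis estimate from the U-statistics T1, T2, T3, assuming
    theta_hat = (T1 + T2 - 2*T3) / (T2 + 2*T3).  Naive O(n^4)
    enumeration, intended only for very small n."""
    n = X.shape[0]
    S = X @ X.T                              # Gram matrix of inner products
    d = np.diag(S)
    D = d[:, None] + d[None, :] - 2 * S      # D[i, j] = ||Xi - Xj||^2
    t1 = t2 = t3 = 0.0
    for i, j, k, l in combinations(range(n), 4):
        # the three ways to split {i, j, k, l} into two disjoint pairs
        for (a, b), (c, e) in (((i, j), (k, l)), ((i, k), (j, l)), ((i, l), (j, k))):
            cross = S[a, c] - S[a, e] - S[b, c] + S[b, e]  # (Xa - Xb)'(Xc - Xe)
            t1 += (D[a, b] - D[c, e]) ** 2
            t2 += D[a, b] * D[c, e]
            t3 += cross ** 2
    scale = 12 * comb(n, 4)   # each kernel is the average of the 3 pairings
    T1, T2, T3 = t1 / scale, t2 / scale, t3 / scale
    return (T1 + T2 - 2 * T3) / (T2 + 2 * T3)

# Gaussian data has theta = 1; heavy-tailed families push the estimate above 1
rng = np.random.default_rng(0)
theta_hat = kurtosis_estimate(rng.standard_normal((30, 10)))
```

Since the common normalizing constant cancels in the ratio, only the relative scaling of $T_1$, $T_2$ and $T_3$ matters.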

5 Conclusions

In this paper, we propose a U-statistic estimator of the kurtosis parameter in elliptical distributions. Theoretically, we extend the results of existing works (Ke et al., 2018; Wang and Lopes, 2023). In particular, the consistency of the proposed estimator is established under mild conditions, and we derive its asymptotic normality under different moment conditions, which implies different convergence rates for the proposed estimator. A direct generalization of our method is to consider higher moments of $\xi^{2}$ as in Ke et al. (2018), for which one may use higher-order U-statistics. Another extension is to estimate the kurtosis parameter in a multivariate time series with elliptically distributed noise. We leave these as future work.

References

  • Bao and Ullah (2010) Y. Bao and A. Ullah. Expectation of quadratic forms in normal and nonnormal variables with applications. Journal of Statistical Planning and Inference, 140(5):1193–1205, 2010.
  • Chen and Qin (2010) S. X. Chen and Y.-L. Qin. A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 38(2):808 – 835, 2010.
  • Chen et al. (2010) S. X. Chen, L.-X. Zhang, and P.-S. Zhong. Tests for high-dimensional covariance matrices. Journal of the American Statistical Association, 105(490):810–819, 2010.
  • Čížek et al. (2011) P. Čížek, W. Härdle, and R. Weron. Statistical tools for finance and insurance. Springer, 2011.
  • Dettling (2004) M. Dettling. Bagboosting for tumor classification with gene expression data. Bioinformatics, 20(18):3583–3593, 2004.
  • Eltoft et al. (2006) T. Eltoft, T. Kim, and T.-W. Lee. On the multivariate Laplace distribution. IEEE Signal Processing Letters, 13(5):300–303, 2006.
  • Fan et al. (2015) J. Fan, Z. T. Ke, H. Liu, and L. Xia. QUADRO: A supervised dimension reduction method via Rayleigh quotient optimization. Annals of Statistics, 43(4):1498, 2015.
  • Fan et al. (2018) J. Fan, H. Liu, and W. Wang. Large covariance estimation through elliptical factor models. Annals of Statistics, 46(4):1383, 2018.
  • Fang and Zhang (1990) K.-T. Fang and Y.-T. Zhang. Generalized multivariate analysis. Springer, Berlin, 1990.
  • Guo and Chen (2016) B. Guo and S. X. Chen. Tests for high dimensional generalized linear models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 78(5):1079–1102, 2016.
  • Gupta et al. (2013) A. K. Gupta, T. Varga, T. Bodnar, et al. Elliptically contoured models in statistics and portfolio theory. Springer, 2013.
  • Han and Liu (2012) F. Han and H. Liu. Transelliptical component analysis. Advances in Neural Information Processing Systems, 25, 2012.
  • Hu et al. (2019) J. Hu, W. Li, Z. Liu, and W. Zhou. High-dimensional covariance matrices in elliptical distributions with application to spherical test. Annals of Statistics, 47(1):527–555, 2019.
  • Ke et al. (2018) Z. T. Ke, K. Bose, and J. Fan. Higher moment estimation for elliptically-distributed data: Is it necessary to use a sledgehammer to crack an egg? arXiv:1812.05697, 2018.
  • Kotz (1975) S. Kotz. Multivariate distributions at a cross road. In A Modern Course on Statistical Distributions in Scientific Work, Volume 1: Models and Structures, pages 247–270. Springer, 1975.
  • McNeil et al. (2015) A. J. McNeil, R. Frey, and P. Embrechts. Quantitative risk management: concepts, techniques and tools, revised edition. Princeton University Press, 2015.
  • Posekany et al. (2011) A. Posekany, K. Felsenstein, and P. Sykacek. Biological assessment of robust noise models in microarray data analysis. Bioinformatics, 27(6):807–814, 2011.
  • Seo and Toyama (1996) T. Seo and T. Toyama. On the estimation of kurtosis parameter in elliptical distributions. Journal of the Japan Statistical Society, 26(1):59–68, 1996.
  • Thomas et al. (2010) R. Thomas, L. de la Torre, X. Chang, and S. Mehrotra. Validation and characterization of DNA microarray gene expression data distribution and associated moments. BMC Bioinformatics, 11:1–14, 2010.
  • Wang and Lopes (2023) S. Wang and M. E. Lopes. A bootstrap method for spectral statistics in high-dimensional elliptical models. Electronic Journal of Statistics, 17(2):1848 – 1892, 2023.

Appendix A Proofs and supporting lemmas

We first present several supporting lemmas.

Lemma A.1.

Let ${\bm{U}}\in\mathbb{R}^{p}$ be a random vector uniformly distributed on the unit sphere $\mathcal{S}^{p-1}$. For any $p\times p$ deterministic symmetric matrices $\bm{A}$, $\bm{B}$ and $\bm{C}$, we have

\begin{align}
&\mathbb{E}\,{\bm{U}}^{\top}\bm{A}{\bm{U}}=\frac{1}{p}\mbox{tr}\bm{A}, \tag{11}\\
&\mathbb{E}\left({\bm{U}}^{\top}\bm{A}{\bm{U}}\right)\left({\bm{U}}^{\top}\bm{B}{\bm{U}}\right)=\frac{\mbox{tr}\bm{A}\mbox{tr}\bm{B}+2\mbox{tr}\bm{A}\bm{B}}{p(p+2)}, \tag{12}\\
&\mathbb{E}\left({\bm{U}}^{\top}\bm{A}{\bm{U}}\right)\left({\bm{U}}^{\top}\bm{B}{\bm{U}}\right)\left({\bm{U}}^{\top}\bm{C}{\bm{U}}\right)=\frac{\mbox{tr}\bm{A}\mbox{tr}\bm{B}\mbox{tr}\bm{C}+2\mbox{tr}\bm{A}\mbox{tr}\bm{B}\bm{C}+2\mbox{tr}\bm{B}\mbox{tr}\bm{A}\bm{C}+2\mbox{tr}\bm{C}\mbox{tr}\bm{A}\bm{B}+8\mbox{tr}\bm{A}\bm{B}\bm{C}}{p(p+2)(p+4)}, \tag{13}\\
&\mathbb{E}\left({\bm{U}}^{\top}\bm{A}{\bm{U}}\right)^{4}=\frac{\mbox{tr}^{4}\bm{A}+12\mbox{tr}^{2}\bm{A}\mbox{tr}\bm{A}^{2}+12\mbox{tr}^{2}\bm{A}^{2}+32\mbox{tr}\bm{A}\mbox{tr}\bm{A}^{3}+48\mbox{tr}\bm{A}^{4}}{p(p+2)(p+4)(p+6)}. \tag{14}
\end{align}
Proof of Lemma A.1.

Identities (11) and (12) are direct results of Hu et al. (2019). By Lemma S4.1 of Hu et al. (2019), we have

\begin{align*}
\mathbb{E}\left({\bm{U}}^{\top}\bm{A}{\bm{U}}\right)\left({\bm{U}}^{\top}\bm{B}{\bm{U}}\right)\left({\bm{U}}^{\top}\bm{C}{\bm{U}}\right)&=\frac{\mathbb{E}(\bm{Z}^{\top}\bm{A}\bm{Z})(\bm{Z}^{\top}\bm{B}\bm{Z})(\bm{Z}^{\top}\bm{C}\bm{Z})}{\mathbb{E}\|\bm{Z}\|_{2}^{6}},\\
\mathbb{E}\left({\bm{U}}^{\top}\bm{A}{\bm{U}}\right)^{4}&=\frac{\mathbb{E}(\bm{Z}^{\top}\bm{A}\bm{Z})^{4}}{\mathbb{E}\|\bm{Z}\|_{2}^{8}},
\end{align*}

where $\bm{Z}\sim N(\bm{0},\bm{I})$. Detailed expressions for $\mathbb{E}(\bm{Z}^{\top}\bm{A}\bm{Z})(\bm{Z}^{\top}\bm{B}\bm{Z})(\bm{Z}^{\top}\bm{C}\bm{Z})$ and $\mathbb{E}(\bm{Z}^{\top}\bm{A}\bm{Z})^{4}$ can be found in Theorem 1 of Bao and Ullah (2010), from which (13) and (14) follow. ∎
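Identities (11) and (12) are also easy to check numerically, since a uniform vector on $\mathcal{S}^{p-1}$ can be generated as $\bm{Z}/\|\bm{Z}\|_{2}$ with $\bm{Z}\sim N(\bm{0},\bm{I})$. The Monte Carlo sketch below compares sample averages with the closed forms; the sample size and tolerances are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo sanity check of identities (11) and (12)
rng = np.random.default_rng(1)
p = 5
A = rng.standard_normal((p, p)); A = (A + A.T) / 2   # symmetric test matrices
B = rng.standard_normal((p, p)); B = (B + B.T) / 2

Z = rng.standard_normal((400_000, p))
U = Z / np.linalg.norm(Z, axis=1, keepdims=True)     # rows uniform on S^{p-1}

qA = np.einsum('ni,ij,nj->n', U, A, U)               # U'AU for each draw
qB = np.einsum('ni,ij,nj->n', U, B, U)

m1, t1 = qA.mean(), np.trace(A) / p                                      # (11)
m2 = (qA * qB).mean()
t2 = (np.trace(A) * np.trace(B) + 2 * np.trace(A @ B)) / (p * (p + 2))   # (12)
```

The same device extends to (13) and (14) with third and fourth sample moments, at the cost of a larger Monte Carlo sample.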

Lemma A.2.

For a centered elliptically distributed random vector

\[
\bm{X}_{0}=\xi\mbox{\boldmath$\Sigma$}^{\frac{1}{2}}{\bm{U}},
\]

where $\mathbb{E}\xi^{2}=p$, we have

\begin{align*}
\mathbb{E}\bm{X}_{0}^{\top}\bm{X}_{0}=&\ \mbox{tr}\mbox{\boldmath$\Sigma$};\\
\mbox{var}\left(\bm{X}_{0}^{\top}\bm{X}_{0}\right)=&\ \mathbb{E}\left(\bm{X}_{0}^{\top}\bm{X}_{0}-\mbox{tr}\mbox{\boldmath$\Sigma$}\right)^{2}=2\theta\mbox{tr}\mbox{\boldmath$\Sigma$}^{2}+(\theta-1)\mbox{tr}^{2}\mbox{\boldmath$\Sigma$};\\
\mbox{var}\left((\bm{X}_{0}^{\top}\bm{X}_{0}-\mbox{tr}\mbox{\boldmath$\Sigma$})^{2}\right)=&\ (\eta_{4}-4\eta_{3}-\eta_{2}^{2}+8\eta_{2}-4)\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}+4(3\eta_{4}-\eta_{2}^{2})\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}^{2}\\
&\ +4\left(3\eta_{4}-6\eta_{3}-\eta_{2}^{2}+4\eta_{2}\right)\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}\mbox{tr}\mbox{\boldmath$\Sigma$}^{2}+32(\eta_{4}-\eta_{3})\mbox{tr}\mbox{\boldmath$\Sigma$}\mbox{tr}\mbox{\boldmath$\Sigma$}^{3}\\
&\ +48\eta_{4}\mbox{tr}\mbox{\boldmath$\Sigma$}^{4};\\
\mbox{var}\left((\bm{X}_{0}^{\top}\bm{X}_{0}-\theta\mbox{tr}\mbox{\boldmath$\Sigma$})^{2}\right)=&\ (\eta_{4}-4\eta_{3}\eta_{2}-\eta_{2}^{2}+4\eta_{2}^{3})\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}+4(3\eta_{4}-\eta_{2}^{2})\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}^{2}\\
&\ +4\left(3\eta_{4}-6\eta_{3}\eta_{2}+2\eta_{2}^{3}+\eta_{2}^{2}\right)\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}\mbox{tr}\mbox{\boldmath$\Sigma$}^{2}\\
&\ +32(\eta_{4}-\eta_{3}\eta_{2})\mbox{tr}\mbox{\boldmath$\Sigma$}\mbox{tr}\mbox{\boldmath$\Sigma$}^{3}+48\eta_{4}\mbox{tr}\mbox{\boldmath$\Sigma$}^{4},
\end{align*}

where $\eta_{m}=\mathbb{E}\xi^{2m}/\mathbb{E}Y^{m}$, $m=1,2,\cdots$, and $Y$ follows a chi-squared distribution with $p$ degrees of freedom. In particular, $\eta_{1}=1$ and $\eta_{2}=\theta$.

Proof of Lemma A.2.

Noting that $\xi$ and ${\bm{U}}^{\top}\mbox{\boldmath$\Sigma$}{\bm{U}}$ are independent, we have

\[
\mathbb{E}\left(\bm{X}_{0}^{\top}\bm{X}_{0}\right)^{k}=\mathbb{E}\xi^{2k}\cdot\mathbb{E}\left({\bm{U}}^{\top}\mbox{\boldmath$\Sigma$}{\bm{U}}\right)^{k},\quad k=1,2,3,4.
\]

Recalling that

\[
\eta_{k}=\frac{\mathbb{E}\xi^{2k}}{p(p+2)\cdots(p+2k-2)}
\]

and $\eta_{1}=1$, $\eta_{2}=\theta$, we conclude the results from Lemma A.1. ∎
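As a quick sanity check of the second display of Lemma A.2: in the Gaussian case $\xi^{2}\sim\chi^{2}_{p}$, so $\eta_{2}=\theta=1$ and $\mbox{var}(\bm{X}_{0}^{\top}\bm{X}_{0})=2\mbox{tr}\mbox{\boldmath$\Sigma$}^{2}$. The simulation below, with an arbitrary positive-definite $\mbox{\boldmath$\Sigma$}$, confirms this.

```python
import numpy as np

# In the Gaussian case theta = 1, so var(X0'X0) = 2 tr(Sigma^2)
rng = np.random.default_rng(2)
p = 4
M = rng.standard_normal((p, p))
Sigma = M @ M.T + p * np.eye(p)               # a positive-definite covariance
L = np.linalg.cholesky(Sigma)

X = rng.standard_normal((400_000, p)) @ L.T   # X ~ N(0, Sigma)
q = np.einsum('ni,ni->n', X, X)               # quadratic form X'X per draw

empirical = q.var()
theoretical = 2 * np.trace(Sigma @ Sigma)     # 2*theta*tr(Sigma^2) + (theta-1)*tr^2(Sigma), theta = 1
```

Replacing the Gaussian draws by a heavier-tailed elliptical family (larger $\theta$) inflates the empirical variance through the extra $(\theta-1)\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}$ term.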

Since $T_{1}$, $T_{2}$ and $T_{3}$ are invariant with respect to location shifts, we assume that $\mbox{\boldmath$\mu$}=\bm{0}$ in the rest of the paper.

A.1 Proof of Theorem 2.1

Proof.

Consider the Hoeffding decomposition of $T_{1}$, $T_{2}$ and $T_{3}$. Define the kernel functions

\begin{align*}
k_{1}(\bm{x}_{1},\bm{x}_{2},\bm{x}_{3},\bm{x}_{4})&=\frac{1}{24}\sum_{(i,j,k,l)=\pi(1,2,3,4)}\left(\|\bm{x}_{i}-\bm{x}_{j}\|_{2}^{2}-\|\bm{x}_{k}-\bm{x}_{l}\|_{2}^{2}\right)^{2},\\
k_{2}(\bm{x}_{1},\bm{x}_{2},\bm{x}_{3},\bm{x}_{4})&=\frac{1}{24}\sum_{(i,j,k,l)=\pi(1,2,3,4)}\|\bm{x}_{i}-\bm{x}_{j}\|_{2}^{2}\|\bm{x}_{k}-\bm{x}_{l}\|_{2}^{2},\\
k_{3}(\bm{x}_{1},\bm{x}_{2},\bm{x}_{3},\bm{x}_{4})&=\frac{1}{24}\sum_{(i,j,k,l)=\pi(1,2,3,4)}\left((\bm{x}_{i}-\bm{x}_{j})^{\top}(\bm{x}_{k}-\bm{x}_{l})\right)^{2},
\end{align*}

where the sums run over all permutations $\pi$ of $(1,2,3,4)$. Then

we have

\begin{align*}
T_{1}&=\frac{1}{4C_{n}^{4}}\sum_{i<j<k<l}k_{1}(\bm{X}_{i},\bm{X}_{j},\bm{X}_{k},\bm{X}_{l}),\\
T_{2}&=\frac{1}{4C_{n}^{4}}\sum_{i<j<k<l}k_{2}(\bm{X}_{i},\bm{X}_{j},\bm{X}_{k},\bm{X}_{l}),\\
T_{3}&=\frac{1}{4C_{n}^{4}}\sum_{i<j<k<l}k_{3}(\bm{X}_{i},\bm{X}_{j},\bm{X}_{k},\bm{X}_{l}).
\end{align*}

Note that

\begin{align*}
3k_{1}(\bm{x}_{1},\bm{x}_{2},\bm{x}_{3},\bm{x}_{4})=&\ \left(\|\bm{x}_{1}-\bm{x}_{2}\|_{2}^{2}-\|\bm{x}_{3}-\bm{x}_{4}\|_{2}^{2}\right)^{2}+\left(\|\bm{x}_{1}-\bm{x}_{3}\|_{2}^{2}-\|\bm{x}_{2}-\bm{x}_{4}\|_{2}^{2}\right)^{2}\\
&\ +\left(\|\bm{x}_{1}-\bm{x}_{4}\|_{2}^{2}-\|\bm{x}_{2}-\bm{x}_{3}\|_{2}^{2}\right)^{2},\\
3k_{2}(\bm{x}_{1},\bm{x}_{2},\bm{x}_{3},\bm{x}_{4})=&\ \|\bm{x}_{1}-\bm{x}_{2}\|_{2}^{2}\|\bm{x}_{3}-\bm{x}_{4}\|_{2}^{2}+\|\bm{x}_{1}-\bm{x}_{3}\|_{2}^{2}\|\bm{x}_{2}-\bm{x}_{4}\|_{2}^{2}\\
&\ +\|\bm{x}_{1}-\bm{x}_{4}\|_{2}^{2}\|\bm{x}_{2}-\bm{x}_{3}\|_{2}^{2},\\
3k_{3}(\bm{x}_{1},\bm{x}_{2},\bm{x}_{3},\bm{x}_{4})=&\ \left((\bm{x}_{1}-\bm{x}_{2})^{\top}(\bm{x}_{3}-\bm{x}_{4})\right)^{2}+\left((\bm{x}_{1}-\bm{x}_{3})^{\top}(\bm{x}_{2}-\bm{x}_{4})\right)^{2}\\
&\ +\left((\bm{x}_{1}-\bm{x}_{4})^{\top}(\bm{x}_{2}-\bm{x}_{3})\right)^{2}.
\end{align*}

The conditional expectations of the centered kernel functions are

\begin{align*}
k_{11}(\bm{X}_{1})=&\ \mathbb{E}\left(k_{1}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})|\bm{X}_{1}\right)-\mathbb{E}k_{1}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})\\
=&\ \left((\bm{X}_{1}^{\top}\bm{X}_{1}-\mbox{tr}\mbox{\boldmath$\Sigma$})^{2}-\mathbb{E}(\bm{X}_{1}^{\top}\bm{X}_{1}-\mbox{tr}\mbox{\boldmath$\Sigma$})^{2}\right)+4(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{1}-\mbox{tr}\mbox{\boldmath$\Sigma$}^{2}),\\
k_{21}(\bm{X}_{1})=&\ \mathbb{E}\left(k_{2}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})|\bm{X}_{1}\right)-\mathbb{E}k_{2}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})\\
=&\ 2\mbox{tr}\mbox{\boldmath$\Sigma$}\left(\bm{X}_{1}^{\top}\bm{X}_{1}-\mbox{tr}\mbox{\boldmath$\Sigma$}\right),\\
k_{31}(\bm{X}_{1})=&\ \mathbb{E}\left(k_{3}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})|\bm{X}_{1}\right)-\mathbb{E}k_{3}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})\\
=&\ 2\left(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{1}-\mbox{tr}\mbox{\boldmath$\Sigma$}^{2}\right).
\end{align*}

By Lemma A.1, we have

\begin{align*}
\mbox{var}\left(k_{21}(\bm{X}_{1})\right)&=4(\theta-1)\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}+8\theta\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}\mbox{tr}\mbox{\boldmath$\Sigma$}^{2}\leq 12\theta\mbox{tr}^{4}\mbox{\boldmath$\Sigma$},\\
\mbox{var}\left(k_{31}(\bm{X}_{1})\right)&=4(\theta-1)\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}^{2}+8\theta\mbox{tr}\mbox{\boldmath$\Sigma$}^{4}\leq 12\theta\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}^{2}=o\left(\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}\right).
\end{align*}

For $k_{11}(\bm{X}_{1})$, note that

\[
\mbox{var}\left(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{1}\right)=(\theta-1)\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}^{2}+2\theta\mbox{tr}\mbox{\boldmath$\Sigma$}^{4}\leq(3\theta-1)\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}^{2}=o\left(\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}\right),
\]

and by Lemma A.2,

\begin{align*}
\mbox{var}\left((\bm{X}_{1}^{\top}\bm{X}_{1}-\mbox{tr}\mbox{\boldmath$\Sigma$})^{2}\right)\leq&\ \mathbb{E}(\bm{X}_{1}^{\top}\bm{X}_{1}-\mbox{tr}\mbox{\boldmath$\Sigma$})^{4}=\mathbb{E}\left((\bm{X}_{1}^{\top}\bm{X}_{1}-\theta\mbox{tr}\mbox{\boldmath$\Sigma$})+(\theta-1)\mbox{tr}\mbox{\boldmath$\Sigma$}\right)^{4}\\
\leq&\ 8\mathbb{E}(\bm{X}_{1}^{\top}\bm{X}_{1}-\theta\mbox{tr}\mbox{\boldmath$\Sigma$})^{4}+8(\theta-1)^{4}\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}\\
\leq&\ C\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}.
\end{align*}

Next we bound the variances of the kernels $k_{i}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})$. By Lemma A.1,

\begin{align*}
\mbox{var}\left(k_{1}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})\right)\leq&\ \mbox{var}\left(\left(\|\bm{X}_{1}-\bm{X}_{2}\|_{2}^{2}-\|\bm{X}_{3}-\bm{X}_{4}\|_{2}^{2}\right)^{2}\right)\\
\leq&\ \mathbb{E}\left(\|\bm{X}_{1}-\bm{X}_{2}\|_{2}^{2}-\|\bm{X}_{3}-\bm{X}_{4}\|_{2}^{2}\right)^{4}\\
\leq&\ 4\mathbb{E}\|\bm{X}_{1}-\bm{X}_{2}\|_{2}^{8}\leq C\mathbb{E}(\bm{X}_{1}^{\top}\bm{X}_{1})^{4}=O\left(\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}\right).
\end{align*}

Similarly,

\begin{align*}
\mbox{var}\left(k_{2}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})\right)\leq&\ \mbox{var}\left(\|\bm{X}_{1}-\bm{X}_{2}\|_{2}^{2}\|\bm{X}_{3}-\bm{X}_{4}\|_{2}^{2}\right)\\
\leq&\ \mathbb{E}\|\bm{X}_{1}-\bm{X}_{2}\|_{2}^{4}\|\bm{X}_{3}-\bm{X}_{4}\|_{2}^{4}\\
\leq&\ C\left(\mathbb{E}(\bm{X}_{1}^{\top}\bm{X}_{1})^{2}\right)^{2}=O\left(\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}\right),\\
\mbox{var}\left(k_{3}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})\right)\leq&\ \mbox{var}\left(\left((\bm{X}_{1}-\bm{X}_{2})^{\top}(\bm{X}_{3}-\bm{X}_{4})\right)^{2}\right)\\
\leq&\ \mathbb{E}\left((\bm{X}_{1}-\bm{X}_{2})^{\top}(\bm{X}_{3}-\bm{X}_{4})\right)^{4}\leq C\mathbb{E}(\bm{X}_{1}^{\top}\bm{X}_{2})^{4}\\
=&\ O(\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}^{2})=o(\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}),
\end{align*}

where $C$ denotes a generic absolute constant.

Combining the above bounds, we obtain

\begin{align*}
\mbox{var}\left(\frac{T_{1}}{\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}}\right)&\leq\frac{4^{2}}{n}\mbox{var}\left(\frac{k_{11}(\bm{X}_{1})}{4\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}}\right)+O\left(\frac{1}{n^{2}}\right)=O\left(\frac{1}{n}\right),\\
\mbox{var}\left(\frac{T_{2}}{\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}}\right)&\leq\frac{4^{2}}{n}\mbox{var}\left(\frac{k_{21}(\bm{X}_{1})}{4\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}}\right)+O\left(\frac{1}{n^{2}}\right)=O\left(\frac{1}{n}\right),\\
\mbox{var}\left(\frac{T_{3}}{\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}}\right)&\leq\frac{4^{2}}{n}\mbox{var}\left(\frac{k_{31}(\bm{X}_{1})}{4\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}}\right)+O\left(\frac{\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}^{2}}{n^{2}\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}}\right)=O\left(\frac{\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}^{2}}{n\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}}\right)=o\left(\frac{1}{n}\right).
\end{align*}

Hence we conclude that

\[
\frac{T_{1}}{\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}}-(\theta-1)\,{\buildrel p\over{\longrightarrow}}\,0,\quad\frac{T_{2}}{\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}}-1\,{\buildrel p\over{\longrightarrow}}\,0,\quad\frac{T_{3}}{\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}}\,{\buildrel p\over{\longrightarrow}}\,0,
\]

and the continuous mapping theorem yields $\hat{\theta}_{n}-\theta\,{\buildrel p\over{\longrightarrow}}\,0$. ∎

A.2 Proof of Theorem 2.2

Proof.

Note that

\[
\hat{\theta}_{n}-\theta=\frac{T_{1}+(1-\theta)T_{2}-2(1+\theta)T_{3}}{T_{2}+2T_{3}}.
\]

The kernel function of the U-statistic $T_{1}+(1-\theta)T_{2}$ is

\[
k_{1}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})+(1-\theta)k_{2}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4}),
\]

and, following the proof of Theorem 2.1, the corresponding conditional expectation is

\begin{align*}
k_{11}(\bm{X}_{1})+(1-\theta)k_{21}(\bm{X}_{1})=&\ \left((\bm{X}_{1}^{\top}\bm{X}_{1}-\theta\mbox{tr}\mbox{\boldmath$\Sigma$})^{2}-\mathbb{E}(\bm{X}_{1}^{\top}\bm{X}_{1}-\theta\mbox{tr}\mbox{\boldmath$\Sigma$})^{2}\right)\\
&\ +4(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{1}-\mbox{tr}\mbox{\boldmath$\Sigma$}^{2}).
\end{align*}

By Lemma A.2, we have

\begin{align*}
\mbox{var}\left((\bm{X}_{1}^{\top}\bm{X}_{1}-\theta\mbox{tr}\mbox{\boldmath$\Sigma$})^{2}\right)&=r_{1}\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}+r_{2}\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}\mbox{tr}\mbox{\boldmath$\Sigma$}^{2}+r_{3}\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}^{2}+r_{4}\mbox{tr}\mbox{\boldmath$\Sigma$}\mbox{tr}\mbox{\boldmath$\Sigma$}^{3}+r_{5}\mbox{tr}\mbox{\boldmath$\Sigma$}^{4},\\
\mbox{var}\left(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{1}\right)&=(\theta-1)\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}^{2}+2\theta\mbox{tr}\mbox{\boldmath$\Sigma$}^{4},
\end{align*}

where

\[
\left\{\begin{array}{l}
r_{1}=\eta_{4}-4\eta_{3}\eta_{2}-\eta_{2}^{2}+4\eta_{2}^{3},\\
r_{2}=4\left(3\eta_{4}-6\eta_{3}\eta_{2}+2\eta_{2}^{3}+\eta_{2}^{2}\right),\\
r_{3}=4(3\eta_{4}-\eta_{2}^{2}),\\
r_{4}=32(\eta_{4}-\eta_{3}\eta_{2}),\\
r_{5}=48\eta_{4}.
\end{array}\right.
\]

By the facts that

\begin{align*}
\mathbb{E}\left(\frac{\xi^{4}}{p^{2}}\right)&=\mbox{var}\left(\frac{\xi^{2}}{p}\right)+1,\\
\mathbb{E}\left(\frac{\xi^{6}}{p^{3}}\right)&=\mathbb{E}\left(\frac{\xi^{2}}{p}-1\right)^{3}+3\mbox{var}\left(\frac{\xi^{2}}{p}\right)+1,\\
\mathbb{E}\left(\frac{\xi^{8}}{p^{4}}\right)&=\mathbb{E}\left(\frac{\xi^{2}}{p}-1\right)^{4}+4\mathbb{E}\left(\frac{\xi^{2}}{p}-1\right)^{3}+6\mbox{var}\left(\frac{\xi^{2}}{p}\right)+1,
\end{align*}

we have

\begin{align*}
r_{1}=&\ \frac{p^{5}}{(p+2)^{3}(p+4)(p+6)}\left[\frac{(p+2)^{2}}{p^{2}}\mathbb{E}\left(\frac{\xi^{8}}{p^{4}}\right)-\frac{4(p+2)(p+6)}{p^{2}}\mathbb{E}\left(\frac{\xi^{6}}{p^{3}}\right)\mathbb{E}\left(\frac{\xi^{4}}{p^{2}}\right)\right.\\
&\ \qquad\left.-\frac{(p+2)(p+4)(p+6)}{p^{3}}\mathbb{E}^{2}\left(\frac{\xi^{4}}{p^{2}}\right)+\frac{4(p+4)(p+6)}{p^{2}}\mathbb{E}^{3}\left(\frac{\xi^{4}}{p^{2}}\right)\right]\\
=&\ \frac{p^{5}}{(p+2)^{3}(p+4)(p+6)}\left\{\frac{(p+2)^{2}}{p^{2}}\mathbb{E}\left(\frac{\xi^{2}}{p}-1\right)^{4}-\frac{8(p^{2}-4p+12)}{p^{3}}\mbox{var}\left(\frac{\xi^{2}}{p}\right)\right.\\
&\ \qquad-\left[\frac{(p+2)(p+4)(p+6)}{p^{3}}-\frac{24(p+6)}{p^{2}}\right]\mbox{var}^{2}\left(\frac{\xi^{2}}{p}\right)+\frac{8(p-6)}{p^{3}}\\
&\ \qquad\left.-\frac{4(p+2)(p+6)}{p^{2}}\mathbb{E}\left(\frac{\xi^{2}}{p}-1\right)^{3}\mbox{var}\left(\frac{\xi^{2}}{p}\right)+\frac{4(p+4)(p+6)}{p^{2}}\mbox{var}^{3}\left(\frac{\xi^{2}}{p}\right)\right\}.
\end{align*}
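The moment identities used above are algebraic consequences of $\mathbb{E}(\xi^{2}/p)=1$: for instance, the eighth-moment identity follows by averaging the pointwise expansion $b^{4}=(b-1)^{4}+4(b-1)^{3}+6(b-1)^{2}+4(b-1)+1$. A quick numeric check with an arbitrary positive sample normalized to mean one:

```python
import numpy as np

# numeric check of E B^4 = E(B-1)^4 + 4 E(B-1)^3 + 6 var(B) + 1 when E B = 1,
# with B playing the role of xi^2/p
rng = np.random.default_rng(3)
B = rng.gamma(5.0, size=100_000)
B /= B.mean()                      # enforce sample mean 1, so mean(B - 1) = 0
lhs = np.mean(B ** 4)
rhs = np.mean((B - 1) ** 4) + 4 * np.mean((B - 1) ** 3) + 6 * B.var() + 1
```

Since the identity holds pointwise once the mean is one, the two sides agree up to floating-point rounding rather than Monte Carlo error.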

Under case (i) of Theorem 2.2,

\begin{align*}
r_{1}=&\ \mathbb{E}\left(\frac{\xi^{2}}{p}-1\right)^{4}-\mbox{var}^{2}\left(\frac{\xi^{2}}{p}\right)-\frac{8}{p}\mbox{var}\left(\frac{\xi^{2}}{p}\right)+\frac{8}{p^{2}}+o\left(\frac{1}{p^{2}}\right)\\
=&\ \mbox{var}\left[\left(\frac{\xi^{2}}{p}-1\right)^{2}\right]-\frac{8}{p}\left[\mbox{var}\left(\frac{\xi^{2}}{p}\right)-\frac{1}{p}\right]+o\left(\frac{1}{p^{2}}\right)\\
=&\ \frac{2(\tau-2)^{2}}{p^{2}}+o\left(\frac{1}{p^{2}}\right).
\end{align*}

Under case (ii) of Theorem 2.2,

\begin{align*}
r_{1}=&\ \mbox{var}\left[\left(\frac{\xi^{2}}{p}-1\right)^{2}\right]-4\mbox{var}\left(\frac{\xi^{2}}{p}\right)\mathbb{E}\left(\frac{\xi^{2}}{p}-1\right)^{3}+4\mbox{var}^{3}\left(\frac{\xi^{2}}{p}\right)\\
=&\ \mbox{var}\left[\left(\frac{\xi^{2}}{p}-1\right)^{2}-2\mbox{var}\left(\frac{\xi^{2}}{p}\right)\cdot\frac{\xi^{2}}{p}\right].
\end{align*}

With similar arguments, we have:

  • For case (i),
\[
r_{2}=8(\tau-2)/p+o(1/p),\quad r_{3}=8+o(1),\quad r_{4}=64(\tau-2)/p+o(1/p),\quad r_{5}=48+o(1);
\]
  • For case (ii),
\[
r_{2}=O(1),\quad r_{3}=O(1),\quad r_{4}=O(1),\quad r_{5}=O(1).
\]

Combining all the pieces, we obtain

\[
\mbox{var}\left((\bm{X}_{1}^{\top}\bm{X}_{1}-\theta\mbox{tr}\mbox{\boldmath$\Sigma$})^{2}\right)=\left\{\begin{array}{ll}
2\left(\frac{\tau-2}{p}\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}+2\mbox{tr}\mbox{\boldmath$\Sigma$}^{2}\right)^{2}+o\left(\frac{\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}}{p^{2}}\right),&\mbox{case (i)},\\[4pt]
\left(\mbox{var}\left[\left(\frac{\xi^{2}}{p}-1\right)^{2}-2\mbox{var}\left(\frac{\xi^{2}}{p}\right)\cdot\frac{\xi^{2}}{p}\right]+o(1)\right)\mbox{tr}^{4}\mbox{\boldmath$\Sigma$},&\mbox{case (ii)},
\end{array}\right.
\]

and

\[
\mbox{var}\left(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{1}\right)=o\left[\mbox{var}\left((\bm{X}_{1}^{\top}\bm{X}_{1}-\theta\mbox{tr}\mbox{\boldmath$\Sigma$})^{2}\right)\right].
\]

Thus

\[
\mbox{var}\left(k_{11}(\bm{X}_{1})+(1-\theta)k_{21}(\bm{X}_{1})\right)=\sigma_{n}^{2}\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}(1+o(1)).
\]

Similarly,

\[
\mbox{var}\left(k_{31}(\bm{X}_{1})\right)=4\mbox{var}\left(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{1}\right)=o\left(\sigma_{n}^{2}\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}\right).
\]

Next, write $\bm{Y}_{1}=\bm{X}_{1}-\bm{X}_{2}$ and $\bm{Y}_{2}=\bm{X}_{3}-\bm{X}_{4}$. Noting that

\begin{align*}
\mathbb{E}\bm{Y}_{1}^{\top}\bm{Y}_{1}=&\ 2\mbox{tr}\mbox{\boldmath$\Sigma$},\\
\mathbb{E}(\bm{Y}_{1}^{\top}\bm{Y}_{1})^{2}=&\ 2(1+\theta)(\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}+2\mbox{tr}\mbox{\boldmath$\Sigma$}^{2}),
\end{align*}

we have

var(k1(\bmX1,\bmX2,\bmX3,\bmX4)+(1θ)k2(\bmX1,\bmX2,\bmX3,\bmX4))\displaystyle\mbox{var}\left(k_{1}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})+(1-\theta)k_{2}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})\right)
\displaystyle\leq var((\bmX1\bmX222\bmX3\bmX422)2+(1θ)\bmX1\bmX222\bmX3\bmX422)\displaystyle\mbox{var}\left(\left(\|\bm{X}_{1}-\bm{X}_{2}\|_{2}^{2}-\|\bm{X}_{3}-\bm{X}_{4}\|_{2}^{2}\right)^{2}+(1-\theta)\|\bm{X}_{1}-\bm{X}_{2}\|_{2}^{2}\|\bm{X}_{3}-\bm{X}_{4}\|_{2}^{2}\right)
=\displaystyle= var(\bmY124+\bmY224(1+θ)\bmY122\bmY222)\displaystyle\mbox{var}\left(\|\bm{Y}_{1}\|_{2}^{4}+\|\bm{Y}_{2}\|_{2}^{4}-(1+\theta)\|\bm{Y}_{1}\|_{2}^{2}\|\bm{Y}_{2}\|_{2}^{2}\right)
\begin{align*}
={}&\mbox{var}\left(\left(\bm{Y}_{1}^{\top}\bm{Y}_{1}-(1+\theta)\mbox{tr}\mbox{\boldmath$\Sigma$}\right)^{2}+\left(\bm{Y}_{2}^{\top}\bm{Y}_{2}-(1+\theta)\mbox{tr}\mbox{\boldmath$\Sigma$}\right)^{2}\right.\\
&\qquad\left.-(1+\theta)(\bm{Y}_{1}^{\top}\bm{Y}_{1}-2\mbox{tr}\mbox{\boldmath$\Sigma$})(\bm{Y}_{2}^{\top}\bm{Y}_{2}-2\mbox{tr}\mbox{\boldmath$\Sigma$})\right)\\
={}&2\mbox{var}\left(\left(\bm{Y}_{1}^{\top}\bm{Y}_{1}-(1+\theta)\mbox{tr}\mbox{\boldmath$\Sigma$}\right)^{2}\right)+(1+\theta)^{2}\left(\mathbb{E}(\bm{Y}_{1}^{\top}\bm{Y}_{1}-2\mbox{tr}\mbox{\boldmath$\Sigma$})^{2}\right)^{2},\\
&\mbox{var}\left(\left(\bm{Y}_{1}^{\top}\bm{Y}_{1}-(1+\theta)\mbox{tr}\mbox{\boldmath$\Sigma$}\right)^{2}\right)\\
={}&\mathbb{E}\left(\bm{Y}_{1}^{\top}\bm{Y}_{1}-(1+\theta)\mbox{tr}\mbox{\boldmath$\Sigma$}\right)^{4}-\left(\mathbb{E}\left(\bm{Y}_{1}^{\top}\bm{Y}_{1}-(1+\theta)\mbox{tr}\mbox{\boldmath$\Sigma$}\right)^{2}\right)^{2}\\
={}&\mathbb{E}(\bm{Y}_{1}^{\top}\bm{Y}_{1})^{4}-4(1+\theta)\mbox{tr}\mbox{\boldmath$\Sigma$}\,\mathbb{E}(\bm{Y}_{1}^{\top}\bm{Y}_{1})^{3}+6(1+\theta)^{2}\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}\,\mathbb{E}(\bm{Y}_{1}^{\top}\bm{Y}_{1})^{2}\\
&-4(1+\theta)^{3}\mbox{tr}^{3}\mbox{\boldmath$\Sigma$}\,\mathbb{E}(\bm{Y}_{1}^{\top}\bm{Y}_{1})+(1+\theta)^{4}\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}-(1+\theta)^{2}\left((\theta-1)\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}+4\mbox{tr}\mbox{\boldmath$\Sigma$}^{2}\right)^{2}\\
={}&\mathbb{E}(\bm{Y}_{1}^{\top}\bm{Y}_{1})^{4}-4(1+\theta)\mbox{tr}\mbox{\boldmath$\Sigma$}\,\mathbb{E}(\bm{Y}_{1}^{\top}\bm{Y}_{1})^{3}+4(1+\theta)^{2}(2\theta+1)\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}\\
&+16(1+\theta)^{2}(\theta+2)\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}\,\mbox{tr}\mbox{\boldmath$\Sigma$}^{2}-16(1+\theta)^{2}\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}^{2}.
\end{align*}
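The first equality above uses that $\bm{Y}_{1}$ and $\bm{Y}_{2}$ are independent and that $\bm{Y}_{i}^{\top}\bm{Y}_{i}-2\mbox{tr}\mbox{\boldmath$\Sigma$}$ has mean zero, so all cross-covariances vanish. The underlying scalar identity, $\mbox{var}(A_{1}^{2}+A_{2}^{2}-cV_{1}V_{2})=2\mbox{var}(A_{1}^{2})+c^{2}(\mathbb{E}V_{1}^{2})^{2}$ for i.i.d. $S_{i}$ with $A_{i}=S_{i}-a$ and centered $V_{i}=S_{i}-\mathbb{E}S_{i}$, can be checked exactly by enumeration; the sketch below uses a toy discrete distribution in place of $\bm{Y}^{\top}\bm{Y}$ (all names and constants are hypothetical, not from the paper):

```python
import itertools
import numpy as np

# Toy discrete distribution standing in for S = Y'Y; its mean is 3, so V = S - 3 is centered.
vals = np.array([1.0, 2.0, 3.0, 6.0])
probs = np.array([0.25, 0.25, 0.25, 0.25])

c = 1.7  # plays the role of (1 + theta)
a = 2.3  # plays the role of (1 + theta) * tr(Sigma); any constant works

def E(f):
    """Exact expectation of f(S1, S2) over the product distribution of two i.i.d. copies."""
    return sum(p1 * p2 * f(s1, s2)
               for (s1, p1), (s2, p2) in itertools.product(zip(vals, probs), repeat=2))

mS = vals @ probs  # E(S) = 3
g = lambda s1, s2: (s1 - a) ** 2 + (s2 - a) ** 2 - c * (s1 - mS) * (s2 - mS)

# Left side: var(A1^2 + A2^2 - c V1 V2), computed exactly.
lhs = E(lambda s1, s2: g(s1, s2) ** 2) - E(g) ** 2

# Right side: 2 var(A^2) + c^2 (E V^2)^2 -- the cross terms vanish because E(V) = 0.
varA2 = E(lambda s1, s2: (s1 - a) ** 4) - E(lambda s1, s2: (s1 - a) ** 2) ** 2
EV2 = E(lambda s1, s2: (s1 - mS) ** 2)
rhs = 2.0 * varA2 + c ** 2 * EV2 ** 2

print(abs(lhs - rhs))  # zero up to floating-point error
```

Since the expectations are enumerated exactly, the two sides agree to machine precision for any choice of the constants.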

Expanding $\mathbb{E}(\bm{Y}_{1}^{\top}\bm{Y}_{1})^{3}$ and $\mathbb{E}(\bm{Y}_{1}^{\top}\bm{Y}_{1})^{4}$, we have

\begin{align*}
\mathbb{E}(\bm{Y}_{1}^{\top}\bm{Y}_{1})^{3}={}&\mathbb{E}\left(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{1}+\bm{X}_{2}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{2}-2\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{2}\right)^{3}\\
={}&2\mathbb{E}(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{1})^{3}+6\mathbb{E}(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{1})^{2}\,\mathbb{E}(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{1})+24\mathbb{E}(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{1})(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}^{2}\bm{X}_{1}),\\
\mathbb{E}(\bm{Y}_{1}^{\top}\bm{Y}_{1})^{4}={}&\mathbb{E}\left(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{1}+\bm{X}_{2}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{2}-2\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{2}\right)^{4}\\
={}&2\mathbb{E}(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{1})^{4}+8\mathbb{E}(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{1})^{3}\,\mathbb{E}(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{1})+6\left(\mathbb{E}(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{1})^{2}\right)^{2}\\
&+48\mathbb{E}(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{1})^{2}(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}^{2}\bm{X}_{1})+48\mathbb{E}(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{1})(\bm{X}_{2}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{2})(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{2})^{2}\\
&+16\mathbb{E}(\bm{X}_{1}^{\top}\mbox{\boldmath$\Sigma$}\bm{X}_{2})^{4},
\end{align*}

and applying Lemma A.1, we obtain

\begin{align*}
\mathbb{E}(\bm{Y}_{1}^{\top}\bm{Y}_{1})^{3}={}&2\left(\eta_{3}+3\eta_{2}\right)\mbox{tr}^{3}\mbox{\boldmath$\Sigma$}+12\left(\eta_{3}+3\eta_{2}\right)\mbox{tr}\mbox{\boldmath$\Sigma$}\,\mbox{tr}\mbox{\boldmath$\Sigma$}^{2}+16\left(\eta_{3}+3\eta_{2}\right)\mbox{tr}\mbox{\boldmath$\Sigma$}^{3},\\
\mathbb{E}(\bm{Y}_{1}^{\top}\bm{Y}_{1})^{4}={}&2(\eta_{4}+4\eta_{3}+3\eta_{2}^{2})\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}+24(\eta_{4}+4\eta_{3}+3\eta_{2}^{2})\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}\,\mbox{tr}\mbox{\boldmath$\Sigma$}^{2}\\
&+24\left(\eta_{4}+4\eta_{3}+3\eta_{2}^{2}\right)\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}^{2}+64\left(\eta_{4}+4\eta_{3}+3\eta_{2}^{2}\right)\mbox{tr}\mbox{\boldmath$\Sigma$}\,\mbox{tr}\mbox{\boldmath$\Sigma$}^{3}\\
&+96\left(\eta_{4}+4\eta_{3}+3\eta_{2}^{2}\right)\mbox{tr}\mbox{\boldmath$\Sigma$}^{4},
\end{align*}

hence

\begin{align*}
&\mbox{var}\left(\left(\bm{Y}_{1}^{\top}\bm{Y}_{1}-(1+\theta)\mbox{tr}\mbox{\boldmath$\Sigma$}\right)^{2}\right)\\
={}&\left(2\left(\eta_{4}-4\eta_{3}\eta_{2}-\eta_{2}^{2}+4\eta_{2}^{3}\right)+4(\eta_{2}-1)^{2}\right)\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}\\
&+\left(8\left(3\eta_{4}-6\eta_{3}\eta_{2}+2\eta_{2}^{3}+\eta_{2}^{2}\right)+16\left(3(\eta_{3}-\eta_{2})+2(\eta_{2}-1)^{2}\right)\right)\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}\,\mbox{tr}\mbox{\boldmath$\Sigma$}^{2}\\
&+\left(8\left(3\eta_{4}-\eta_{2}^{2}\right)+16\left(6\eta_{3}+4\eta_{2}^{2}-2\eta_{2}-1\right)\right)\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}^{2}\\
&+\left(64\left(\eta_{4}-\eta_{3}\eta_{2}\right)+192\left(\eta_{3}-\eta_{2}\right)\right)\mbox{tr}\mbox{\boldmath$\Sigma$}\,\mbox{tr}\mbox{\boldmath$\Sigma$}^{3}\\
&+96\left(\eta_{4}+4\eta_{3}+3\eta_{2}^{2}\right)\mbox{tr}\mbox{\boldmath$\Sigma$}^{4}.
\end{align*}

Noting

\begin{align*}
(1+\theta)^{2}&\left(\mathbb{E}(\bm{Y}_{1}^{\top}\bm{Y}_{1}-2\mbox{tr}\mbox{\boldmath$\Sigma$})^{2}\right)^{2}\\
&=4(1+\eta_{2})^{2}\left[(\eta_{2}-1)^{2}\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}+4(\eta_{2}^{2}-1)\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}\,\mbox{tr}\mbox{\boldmath$\Sigma$}^{2}+4(\eta_{2}+1)^{2}\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}^{2}\right],
\end{align*}

we have

\begin{align*}
&\mbox{var}\left(k_{1}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})+(1-\theta)k_{2}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})\right)\\
\leq{}&2\mbox{var}\left((\bm{X}_{1}^{\top}\bm{X}_{1}-\theta\mbox{tr}\mbox{\boldmath$\Sigma$})^{2}\right)\\
&+4\left(\tilde{r}_{1}\mbox{tr}^{4}\mbox{\boldmath$\Sigma$}+\tilde{r}_{2}\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}\,\mbox{tr}\mbox{\boldmath$\Sigma$}^{2}+\tilde{r}_{3}\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}^{2}+\tilde{r}_{4}\mbox{tr}\mbox{\boldmath$\Sigma$}\,\mbox{tr}\mbox{\boldmath$\Sigma$}^{3}+\tilde{r}_{5}\mbox{tr}\mbox{\boldmath$\Sigma$}^{4}\right),
\end{align*}

where

\begin{align*}
\left\{\begin{array}{ll}
\tilde{r}_{1}=&(\eta_{2}-1)^{2}\left((\eta_{2}+1)^{2}+1\right),\\
\tilde{r}_{2}=&4\left(\left(3(\eta_{3}-\eta_{2})+2(\eta_{2}-1)^{2}\right)+(\eta_{2}+1)^{2}(\eta_{2}^{2}-1)\right),\\
\tilde{r}_{3}=&4\left(\left(6\eta_{3}+4\eta_{2}^{2}-2\eta_{2}-1\right)+(\eta_{2}+1)^{4}\right),\\
\tilde{r}_{4}=&48\left(\eta_{3}-\eta_{2}\right),\\
\tilde{r}_{5}=&24\left(4\eta_{3}+3\eta_{2}^{2}\right).
\end{array}\right.
\end{align*}

By arguments similar to those for the $r_{i}$, we have:

  • for case (i),

    \begin{align*}
    &\tilde{r}_{1}=\frac{5(\tau-2)^{2}}{p^{2}}+o\left(\frac{1}{p^{2}}\right),\quad\tilde{r}_{2}=\frac{56(\tau-2)}{p}+o\left(\frac{1}{p}\right),\quad\tilde{r}_{3}=92+o(1),\\
    &\tilde{r}_{4}=\frac{96(\tau-2)}{p}+o\left(\frac{1}{p}\right),\quad\tilde{r}_{5}=168+o(1),
    \end{align*}

    and then

    \begin{align*}
    &\mbox{var}\left(k_{1}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})+(1-\theta)k_{2}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})\right)\\
    \leq{}&9\left(\frac{\tau-2}{p}\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}+6\mbox{tr}\mbox{\boldmath$\Sigma$}^{2}\right)\left(\frac{\tau-2}{p}\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}+2\mbox{tr}\mbox{\boldmath$\Sigma$}^{2}\right)+\frac{\varepsilon}{p^{2}}+o\left(\frac{1}{p^{2}}\right)\\
    ={}&O\left(\mbox{var}\left((\bm{X}_{1}^{\top}\bm{X}_{1}-\theta\mbox{tr}\mbox{\boldmath$\Sigma$})^{2}\right)\right);
    \end{align*}
  • for case (ii),

    \begin{align*}
    &\tilde{r}_{1}=\mbox{var}^{2}\left(\frac{\xi^{2}}{p}\right)\left[\left(\mbox{var}\left(\frac{\xi^{2}}{p}\right)+2\right)^{2}+1\right],\\
    &\tilde{r}_{2}=O(1),\quad\tilde{r}_{3}=O(1),\quad\tilde{r}_{4}=O(1),\quad\tilde{r}_{5}=O(1),
    \end{align*}

    and

    \begin{align*}
    &\mbox{var}\left(k_{1}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})+(1-\theta)k_{2}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})\right)\\
    ={}&O\left(\mbox{var}\left((\bm{X}_{1}^{\top}\bm{X}_{1}-\theta\mbox{tr}\mbox{\boldmath$\Sigma$})^{2}\right)\right).
    \end{align*}

Similarly,

\begin{align*}
&\mbox{var}\left(k_{3}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})\right)\\
\leq{}&\mbox{var}\left(\left((\bm{X}_{1}-\bm{X}_{2})^{\top}(\bm{X}_{3}-\bm{X}_{4})\right)^{2}\right)\\
={}&\mbox{var}\left((\bm{Y}_{1}^{\top}\bm{Y}_{2})^{2}\right)\leq 6(1+\theta)\mathbb{E}\left(\bm{Y}_{2}^{\top}\mbox{\boldmath$\Sigma$}\bm{Y}_{2}\right)^{2}\\
={}&12(1+\theta)^{2}(\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}^{2}+\mbox{tr}\mbox{\boldmath$\Sigma$}^{4})\\
={}&O\left(\mbox{var}\left(k_{1}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})+(1-\theta)k_{2}(\bm{X}_{1},\bm{X}_{2},\bm{X}_{3},\bm{X}_{4})\right)\right).
\end{align*}

Combining all the pieces, we obtain

\begin{align*}
\mbox{var}\left(\frac{T_{1}+(1-\theta)T_{2}}{\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}}\right)={}&\frac{4^{2}}{n}\mbox{var}\left(\frac{k_{11}(\bm{X}_{1})+(1-\theta)k_{21}(\bm{X}_{1})}{4\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}}\right)(1+o(1))\\
={}&\frac{\sigma_{n}^{2}}{n}(1+o(1)),\\
\mbox{var}\left(\frac{T_{3}}{\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}}\right)={}&\frac{1}{n}\mbox{var}\left(\frac{k_{31}(\bm{X}_{1})}{\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}}\right)+O\left(\frac{\sigma_{n}^{2}}{n^{2}}\right)=o\left(\frac{\sigma_{n}^{2}}{n}\right),
\end{align*}

thus we conclude that

\begin{align*}
&\frac{\left(T_{1}+(1-\theta)T_{2}\right)-\mathbb{E}\left(T_{1}+(1-\theta)T_{2}\right)}{\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}}\\
={}&\frac{1}{n}\sum_{i=1}^{n}\frac{k_{11}(\bm{X}_{i})+(1-\theta)k_{21}(\bm{X}_{i})-\mathbb{E}\left(k_{11}(\bm{X}_{i})+(1-\theta)k_{21}(\bm{X}_{i})\right)}{\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}}+o_{p}\left(\frac{\sigma_{n}}{\sqrt{n}}\right),\\
&\frac{T_{3}-\mathbb{E}T_{3}}{\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}}=o_{p}\left(\frac{\sigma_{n}}{\sqrt{n}}\right).
\end{align*}

By the classical CLT and Slutsky’s theorem,

\begin{align*}
\frac{\sqrt{n}}{\sigma_{n}}\left(\frac{T_{1}+(1-\theta)T_{2}-2(1+\theta)T_{3}}{\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}}\right)\,{\buildrel d\over{\longrightarrow}}\,N(0,1).
\end{align*}

Noting

\begin{align*}
\frac{T_{2}+2T_{3}}{\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}}\,{\buildrel p\over{\longrightarrow}}\,1,
\end{align*}

we finally conclude that

\begin{align*}
\frac{\sqrt{n}}{\sigma_{n}}(\theta_{n}-\theta)={}&\frac{\sqrt{n}}{\sigma_{n}}\left(\frac{T_{1}+(1-\theta)T_{2}-2(1+\theta)T_{3}}{T_{2}+2T_{3}}\right)\\
={}&\frac{\sqrt{n}}{\sigma_{n}}\left(\frac{T_{1}+(1-\theta)T_{2}-2(1+\theta)T_{3}}{\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}}\right)\frac{\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}}{T_{2}+2T_{3}}\,{\buildrel d\over{\longrightarrow}}\,N(0,1).
\end{align*}
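The final step combines a CLT for the normalized numerator with the fact that the denominator, once divided by $\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}$, converges to one in probability. As a generic numerical illustration of this CLT-plus-Slutsky argument (a toy statistic, not the estimator $\theta_{n}$ itself; all names and constants are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 2000, 4000

# Each row is one replication of a sample of size n.
z = rng.standard_normal((reps, n))

numerator = np.sqrt(n) * z.mean(axis=1)          # CLT part: converges to N(0, 1)
denominator = 1.0 + (z ** 2 - 1.0).mean(axis=1)  # converges to 1 in probability
stat = numerator / denominator                   # Slutsky: still asymptotically N(0, 1)

# Across replications, the ratio statistic behaves like a standard normal.
print(stat.mean(), stat.std())
```

With the fixed seed, the empirical mean and standard deviation of `stat` across replications are close to $0$ and $1$, mirroring how the random factor $\mbox{tr}^{2}\mbox{\boldmath$\Sigma$}/(T_{2}+2T_{3})$ leaves the limiting distribution unchanged.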