
Regularized Fingerprinting in Detection and Attribution of Climate Change with Weight Matrix Optimizing the Efficiency in Scaling Factor Estimation

Yan Li1, Kun Chen1, Jun Yan1, and Xuebin Zhang2
1Department of Statistics, University of Connecticut, CT, U.S.A.
2Environment and Climate Change Canada, QC, CA
Corresponding author. Email: [email protected]
Abstract

The optimal fingerprinting method for detection and attribution of climate change is based on a multiple regression in which each covariate has measurement error whose covariance matrix is the same as that of the regression error up to a known scale. Inferences about the regression coefficients are critical not only for making statements about detection and attribution but also for quantifying the uncertainty in important outcomes derived from detection and attribution analyses. When there are no errors-in-variables (EIV), the optimal weight matrix in estimating the regression coefficients is the precision matrix of the regression error, which, in practice, is never known and has to be estimated from climate model simulations. We construct a weight matrix by inverting a nonlinear shrinkage estimate of the error covariance matrix that minimizes loss functions directly targeting the uncertainty of the resulting regression coefficient estimator. The resulting estimator of the regression coefficients is asymptotically optimal as the sample size of the climate model simulations and the matrix dimension go to infinity together with a limiting ratio. When EIVs are present, the estimator of the regression coefficients based on the proposed weight matrix is asymptotically more efficient than that based on the inverse of the existing linear shrinkage estimator of the error covariance matrix. The performance of the method is confirmed in finite-sample simulation studies mimicking realistic situations in terms of the length of the confidence intervals and empirical coverage rates for the regression coefficients. An application to detection and attribution analyses of the mean temperature at different spatial scales illustrates the utility of the method.

Key words: Measurement error; Nonlinear shrinkage estimator; Total least squares

1 Introduction

Detection and attribution analyses of climate change are critical components in establishing a causal relationship from the human emission of greenhouse gases to the warming of planet Earth (e.g., Bindoff et al., 2013). In climate science, detection is the process of demonstrating that a climate variable has changed in some defined statistical sense without providing a reason for that change; attribution is the process of evaluating the relative contributions of multiple causal factors to a change or event with an assignment of statistical confidence (e.g., Hegerl and Zwiers, 2011). Causal factors usually refer to external forcings, which may be anthropogenic (e.g., greenhouse gases, aerosols, ozone precursors, land use) and/or natural (e.g., volcanic eruptions, solar cycle modulations). By comparing simulated results of climate models with observed climate variables, a detection and attribution analysis evaluates the consistency of observed changes with the expected response, also known as fingerprint, of the climate system under each external forcing.

Optimal fingerprinting is the most widely used method for detection and attribution analyses (e.g., Hasselmann, 1997; Allen and Stott, 2003; Hegerl et al., 2010). Fingerprinting is a procedure that regresses the observed climate variable of interest on the fingerprints of external forcings, and checks whether the fingerprints are found in and consistent with the observed data. The central target of statistical inferences here is the regression coefficients, also known as scaling factors, which scale the fingerprints of external forcings to best match the observed climate change. Historically, the method was "optimal" in the context of generalized least squares (GLS) when the precision matrix of the regression error is used as the weight, such that the resulting estimators of the scaling factors have the smallest variances. It was later recognized that the fingerprint covariates are not observed but estimated from climate model simulations. This leads to an errors-in-variables (EIV) issue, which has been approached by total least squares (TLS) (Allen and Stott, 2003) with both the response and the covariates "pre-whitened" by the covariance matrix of the error. The covariance matrix represents the internal climate variation. In practice, it is unknown and has to be estimated from climate model simulations (Allen and Tett, 1999; Ribes et al., 2013), which is generally handled preliminarily and independently from the regression inference (e.g., Hannart et al., 2014).

Estimating the error covariance matrix in fingerprinting is challenging because the available runs from climate model simulations are small relative to the dimension of the covariance matrix. The dimension is often reduced by considering averages over 5-year or 10-year blocks in time and/or over aggregated grid-boxes in space. For example, decadal averages over a 110-year period in 25 grid-boxes would lead to a covariance matrix of dimension $275\times 275$. The sample size from climate model simulations that can be used is, however, at most a few hundred. The sample covariance matrix is not invertible when the sample size of climate model simulations is less than its dimension. Earlier methods project data onto the leading empirical orthogonal functions (EOF) of the internal climate variability (Hegerl et al., 1996; Allen and Tett, 1999). More recently, the regularized optimal fingerprinting (ROF) method avoids the projection step by a regularized covariance matrix estimation (Ribes et al., 2009), which is based on the linear shrinkage covariance matrix estimator of Ledoit and Wolf (2004). The ROF method has been shown to provide a more accurate implementation of optimal fingerprinting than the EOF projection method (Ribes et al., 2013).

The uncertainty in estimating the error covariance matrix has important implications in optimal fingerprinting. The optimality of inferences about the scaling factors in optimal fingerprinting was historically based on the assumption that the error covariance matrix is known. The properties of the scaling factor estimator obtained by substituting the error covariance matrix with an estimate have not been thoroughly investigated in the literature. For example, it is not until recently that the confidence intervals for the scaling factors constructed from asymptotic normal approximation (Fuller, 1980; Gleser, 1981) or bootstrap (Pesta, 2013) were reported to be overly narrow when the matrix is only known up to a scale (DelSole et al., 2019) or completely unknown (Li et al., 2019). A natural, fundamental question is: when the error covariance matrix is estimated, are the confidence intervals constructed using the optimal approach under the assumption of a known error covariance matrix still optimal? Since optimality under an unknown error covariance matrix is practically infeasible, we refer in the sequel simply to fingerprinting instead of optimal fingerprinting, and to regularized fingerprinting (RF) instead of ROF.

The contributions of this paper are two-fold. First, we develop a new method to construct the weight matrix in RF by directly minimizing the uncertainty of the resulting scaling factor estimators. The weight matrix is the inverse of a nonlinear shrinkage estimator of the error covariance matrix inspired by Ledoit and Wolf (2017a). We first extend the validity of their nonlinear shrinkage estimator to the context of RF via GLS regression with no EIV. We show that the proposed method is asymptotically optimal as the sample size of climate model simulations and the matrix dimension go to infinity together with a fixed ratio. When there is EIV, as is the case in practice, we show that the proposed weight is more efficient than the existing weight in RF (Ribes et al., 2013) in terms of the asymptotic variance of the scaling factor estimator when RF is conducted with generalized TLS (GTLS). This is why we refer to the current practice by RF instead of ROF. The second contribution is a set of practical recommendations, based on a comparison study under various realistic settings, for assumptions about the structure of the error covariance matrix under which the sample covariance is estimated before any shrinkage in RF. An implementation of both the linear shrinkage and nonlinear shrinkage estimators is publicly available in an R package dacc (Li et al., 2020) for detection and attribution of climate change.

The rest of this article is organized as follows. After a review of RF in Section 2, we develop the proposed weight matrix and the theoretical results supporting the asymptotic performance of the proposed method in Section 3. A large-scale numerical study assessing the performance of the proposed method is reported in Section 4. In Section 5, we demonstrate the proposed method with detection and attribution analyses at global and regional scales. A discussion concludes in Section 6. The technical proofs of the theoretical results are relegated to the Supplementary Materials.

2 Regularized Fingerprinting

Fingerprinting takes the form of a linear regression with EIV

$$Y=\sum_{i=1}^{p}X_{i}\beta_{i}+\epsilon, \qquad (1)$$
$$\tilde{X}_{i}=X_{i}+\nu_{i},\qquad i=1,\ldots,p, \qquad (2)$$

where $Y$ is an $N\times 1$ vector of the observed climate variable of interest, $X_{i}$ is the true but unobserved $N\times 1$ fingerprint vector of the $i$th external forcing with scaling factor $\beta_{i}$, $\epsilon$ is an $N\times 1$ vector of normally distributed regression error with mean zero and covariance matrix $\Sigma$, $\tilde{X}_{i}$ is an estimate of $X_{i}$ based on $n_{i}$ climate model simulations under the $i$th external forcing, and $\nu_{i}$ is a normally distributed measurement error vector with mean zero and covariance matrix $\Sigma/n_{i}$; the $\nu_{i}$'s are mutually independent and independent of $\epsilon$, $i=1,\ldots,p$. The covariance matrices of the $\nu_{i}$'s and $\epsilon$ differ only in scale under the assumption that the climate models reflect the real climate variation. No intercept is present in the regression because the response and the covariates are centered by the same reference level. The primary target of inference is the vector of scaling factors $\beta=(\beta_{1},\ldots,\beta_{p})^{\top}$.

The "optimal" in optimal fingerprinting originated from earlier practices under two assumptions: 1) the error covariance matrix $\Sigma$ is known; and 2) the $X_{i}$'s are observed. The GLS estimator of $\beta$ with weight matrix $W$ is

$$\hat{\beta}(W)=(X^{\top}WX)^{-1}X^{\top}WY,$$

where $X=(X_{1},\ldots,X_{p})$. The covariance matrix of the estimator $\hat{\beta}(W)$ is

$$\operatorname{\mathbb{V}}(\hat{\beta}(W))=(X^{\top}WX)^{-1}X^{\top}W\Sigma WX(X^{\top}WX)^{-1}. \qquad (3)$$

The optimal weight matrix is $W=\Sigma^{-1}$, in which case $\hat{\beta}(\Sigma^{-1})$ is the best linear unbiased estimator of $\beta$ with covariance matrix $(X^{\top}\Sigma^{-1}X)^{-1}$. Since $\Sigma$ is unknown, a feasible version of GLS uses $W=\hat{\Sigma}^{-1}$, where $\hat{\Sigma}$ is an estimator of $\Sigma$ obtained separately from controlled runs of climate model simulations.
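As a concrete illustration, the GLS estimator above can be computed in a few lines. The following sketch (in Python with NumPy, not the dacc implementation) uses a toy design and a noiseless response, in which case any valid weight matrix recovers the true $\beta$ exactly:

```python
import numpy as np

def gls_estimate(X, Y, W):
    """GLS estimate beta_hat(W) = (X' W X)^{-1} X' W Y."""
    XtW = X.T @ W
    return np.linalg.solve(XtW @ X, XtW @ Y)

# Toy sanity check: noiseless response, weight W = Sigma^{-1}
# for an illustrative diagonal Sigma.
rng = np.random.default_rng(0)
N, p = 20, 2
X = rng.standard_normal((N, p))
beta = np.array([1.0, 0.5])
W = np.linalg.inv(np.diag(rng.uniform(0.5, 1.5, N)))
Y = X @ beta
```

Using `np.linalg.solve` rather than forming $(X^{\top}WX)^{-1}$ explicitly is the standard numerically stable choice.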

Later on it was recognized that, instead of the $X_{i}$'s, only their estimates $\tilde{X}_{i}$'s are observed, and that using the $\tilde{X}_{i}$'s in place of the $X_{i}$'s leads to bias in estimating $\beta$ (Allen and Stott, 2003). If $\Sigma$ is given, the same structure (up to a scale $1/n_{i}$) of the covariance matrices of the $\nu_{i}$'s and $\epsilon$ allows precise pre-whitening of both $Y$ and the $\tilde{X}_{i}$'s. Then TLS can be applied to the pre-whitened variables. Inferences about $\beta$ can be based on the asymptotic normal distribution of the TLS estimator of $\beta$ (Gleser, 1981) or nonparametric bootstrap (Pesta, 2013), as recently studied by DelSole et al. (2019). Similar to the GLS setting, a feasible version of the GTLS procedure relies on an estimator of $\Sigma$.

The current practice of fingerprinting consists of two separate steps. First, estimate $\Sigma$ from controlled runs of climate model simulations under the assumption that the climate models capture the internal variability of the real climate system. Second, use this estimated matrix to pre-whiten both the outcome and the covariates in the regression model (1)–(2), and obtain the GTLS estimator of $\beta$ from the pre-whitened data. Nonetheless, estimation of $\Sigma$ in the first step is not an easy task. The dimension of $\Sigma$ is $N\times N$, with $N(N+1)/2$ parameters if no structure is imposed, which is too large for the sample size $n$ of available climate model simulations (usually a few hundred at most). The sample covariance matrix based on the $n$ runs is a start, but it is highly variable; when $N>n$ it is not even invertible. The linear shrinkage method of Ledoit and Wolf (2004) regularizes the sample covariance matrix $\hat{\Sigma}_{n}$ in the form $\lambda\hat{\Sigma}_{n}+\rho I$, where $\lambda$ and $\rho$ are scalar tuning parameters and $I$ is the identity matrix. This class of shrinkage estimators shrinks the sample eigenvalues by reducing their dispersion around the mean, pushing up the smaller ones and pulling down the larger ones. This estimator has been used in the current RF practice (Ribes et al., 2009, 2013).
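A minimal sketch of this linear shrinkage form, with a fixed illustrative intensity `rho_weight` in place of the data-driven optimal tuning that Ledoit and Wolf (2004) derive:

```python
import numpy as np

def linear_shrinkage(Z, rho_weight=0.1):
    """Linear shrinkage lambda * S + rho * I toward a scaled identity.

    rho_weight is an illustrative fixed intensity; Ledoit and Wolf (2004)
    derive a data-driven optimal choice instead.
    """
    n, N = Z.shape                  # n centered control runs of dimension N
    S = Z.T @ Z / n                 # sample covariance matrix
    mu = np.trace(S) / N            # average sample eigenvalue
    lam, rho = 1.0 - rho_weight, rho_weight * mu
    return lam * S + rho * np.eye(N)
```

Even when $n < N$ and the sample covariance is singular, the result is positive definite because every eigenvalue is at least $\rho\mu > 0$.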

Substituting $\Sigma$ with an estimator introduces additional uncertainty. The impact of this uncertainty on the properties of the resulting RF estimator has not been investigated when the whole structure of $\Sigma$ is unknown (Li et al., 2019). The optimality of optimal fingerprinting in its original sense is unlikely to still hold. Now that the properties of the resulting estimator of $\beta$ depend on an estimated weight matrix, can we choose this weight matrix estimator to minimize the variance of the estimator of $\beta$? The recently proposed nonlinear shrinkage estimator (Ledoit and Wolf, 2017a, 2018) has high potential to outperform linear shrinkage estimators.

3 Weight Matrix Construction

We consider constructing the weight matrix by inverting a nonlinear shrinkage estimator of $\Sigma$ (Ledoit and Wolf, 2017b) in the fingerprinting context. New theoretical results are developed to justify the adaptation of this nonlinear shrinkage estimator of $\Sigma$ to minimize the uncertainty of the resulting estimator $\hat{\beta}$ of $\beta$. Assume that there are $n$ replicates from climate model simulations (usually pre-industrial control runs) that are independent of $Y$ and the $\tilde{X}_{i}$'s. Let $Z_{1},Z_{2},\ldots,Z_{n}\in\mathbb{R}^{N}$ be the centered replicates, so that the sample covariance matrix is $\hat{\Sigma}_{n}=n^{-1}\sum_{i=1}^{n}Z_{i}Z^{\top}_{i}$. Our strategy is to revisit the GLS setting with no EIV first and then apply the results to the GTLS setting with EIV, following the order of the historical development.

3.1 GLS

Since the target of inference is $\beta$, we propose to minimize the "total variation" of the covariance matrix $\operatorname{\mathbb{V}}(\hat{\beta})$ of the estimated scaling factors $\hat{\beta}(W)$ in (3) with respect to $W=\hat{\Sigma}^{-1}$. Two loss functions are considered that measure the variation of $\hat{\beta}$, namely, the sum of the variances of $\hat{\beta}$ (the trace of $\operatorname{\mathbb{V}}(\hat{\beta})$) and the generalized variance of $\hat{\beta}$ (the determinant of $\operatorname{\mathbb{V}}(\hat{\beta})$), denoted respectively as $L_{1}(\hat{\Sigma},\Sigma,X)$ and $L_{2}(\hat{\Sigma},\Sigma,X)$. In particular, we have

$$L_{1}(\hat{\Sigma},\Sigma,X)=\operatorname{Tr}\big((X^{\top}\hat{\Sigma}^{-1}X)^{-1}X^{\top}\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1}X(X^{\top}\hat{\Sigma}^{-1}X)^{-1}\big),$$
$$L_{2}(\hat{\Sigma},\Sigma,X)=\left(\frac{\operatorname{Tr}(X^{\top}X)}{pN}\right)^{p}\det\left(\frac{X^{\top}\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1}X}{N}\right)\det{}^{-2}\left(\frac{X^{\top}\hat{\Sigma}^{-1}X}{N}\right),$$

where the first loss function directly targets the trace of $\operatorname{\mathbb{V}}(\hat{\beta}(W))$ and the second loss function is proportional to the determinant of $\operatorname{\mathbb{V}}(\hat{\beta}(W))$ (up to a constant scale $\{\operatorname{Tr}(X^{\top}X)/p\}^{p}$).

The theoretical development is built on minimizing the limiting forms of the loss functions as $n\to\infty$ and $N\to\infty$. The special case of $p=1$ has been approached by Ledoit and Wolf (2017b). We extend their result to multiple linear regressions with $p>1$.

Lemma 1.

The loss functions $L_{1}(\hat{\Sigma},\Sigma,X)$ and $L_{2}(\hat{\Sigma},\Sigma,X)$ remain unchanged after orthogonalization of the design matrix $X$ via the singular value decomposition.

The proof of Lemma 1 is in Appendix A.

Lemma 1 implies that, without loss of generality, we only need to consider orthogonal designs in the regression model (1). In other words, we may assume that the columns of the design matrix $X$ are such that $X_{i}^{\top}X_{j}=0$ for any $i\neq j$.

Consider the minimum variance loss function

$$L_{\mathrm{mv}}(\hat{\Sigma},\Sigma)=\frac{\operatorname{Tr}(\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1})/N}{(\operatorname{Tr}(\hat{\Sigma}^{-1})/N)^{2}} \qquad (4)$$

derived in Engle et al. (2019). We have the following result.

Theorem 1.

As the dimension $N\to\infty$ and the sample size $n\to\infty$ with $N/n\to c$ for a constant $c$, minimizing $\lim_{n,N\to\infty}L_{1}(\hat{\Sigma},\Sigma,X)$ or $\lim_{n,N\to\infty}L_{2}(\hat{\Sigma},\Sigma,X)$ is equivalent to minimizing $\lim_{n,N\to\infty}L_{\mathrm{mv}}(\hat{\Sigma},\Sigma)$.

The proof for Theorem 1 is presented in Appendix B.

Let $\hat{\Sigma}_{n}=\Gamma_{n}D_{n}\Gamma_{n}^{\top}$ be the spectral decomposition of the sample covariance matrix $\hat{\Sigma}_{n}$, where $D_{n}=\operatorname{diag}(\lambda_{1},\ldots,\lambda_{N})$ is the diagonal matrix of the eigenvalues and $\Gamma_{n}$ contains the corresponding eigenvectors. Consider the rotation invariant class of estimators $\hat{\Sigma}=\Gamma_{n}\tilde{D}_{n}\Gamma_{n}^{\top}$, where $\tilde{D}_{n}=\operatorname{diag}\big(\delta(\lambda_{1}),\ldots,\delta(\lambda_{N})\big)$ for a smooth function $\delta(\cdot)$. Then, under some regularity assumptions on the data generation mechanism (Ledoit and Wolf, 2017a, Assumptions 1–4), we can obtain the asymptotically optimal estimator $\hat{\Sigma}$ which minimizes the limiting forms of the two proposed loss functions as $n\to\infty$ and $N\to\infty$.

Let $F_{N}$ be the empirical cumulative distribution function of the sample eigenvalues. Silverstein (1995) showed that the limiting form $F=\lim_{N,n\to\infty}F_{N}$ exists under the same assumptions. The oracle optimal nonlinear shrinkage estimator minimizing the limiting form of the proposed loss functions under general asymptotics depends only on the derivative $f=F^{\prime}$ of $F$, its Hilbert transform $\mathcal{H}_{f}$, and the limiting ratio $c$ of $N/n$ (Ledoit and Wolf, 2017a), with the shrinkage of the eigenvalues given by

$$\delta_{\mathrm{oracle}}(\lambda_{i})=\frac{\lambda_{i}}{[\pi c\lambda_{i}f(\lambda_{i})]^{2}+[1-c-\pi c\lambda_{i}\mathcal{H}_{f}(\lambda_{i})]^{2}}. \qquad (5)$$

A feasible nonlinear shrinkage estimator (a bona fide counterpart of the oracle estimator) can be based on a kernel estimator of $f$, which was proposed and shown by Ledoit and Wolf (2017a) to perform as well as the oracle estimator asymptotically. Let $c_{n}=N/n$, which is an estimator of the limiting concentration ratio $c$. The feasible nonlinear shrinkage $\delta(\lambda_{i})$, $i=1,\ldots,N$, of the sample eigenvalues is defined as follows for the two cases $c_{n}\leq 1$ and $c_{n}>1$.

Case 1

If $c_{n}\leq 1$, that is, the sample covariance matrix is nonsingular, then

$$\delta(\lambda_{i})=\frac{\lambda_{i}}{[\pi\frac{N}{n}\lambda_{i}\tilde{f}(\lambda_{i})]^{2}+[1-\frac{N}{n}-\pi\frac{N}{n}\lambda_{i}\mathcal{H}_{\tilde{f}}(\lambda_{i})]^{2}},$$

where $\tilde{f}(\cdot)$ is a kernel estimator of the limiting sample spectral density $f$, and $\mathcal{H}_{\tilde{f}}$ is the Hilbert transform of $\tilde{f}$. Various authors adopt different conventions to define the Hilbert transform. We follow Ledoit and Wolf (2017a) and apply the same semicircle kernel function and Hilbert transform because of the consistency of the resulting feasible estimator. Specifically, we have

$$\tilde{f}(\lambda_{i})=\frac{1}{N}\sum_{j=1}^{N}\frac{\sqrt{[4\lambda^{2}_{j}h_{n}^{2}-(\lambda_{i}-\lambda_{j})^{2}]^{+}}}{2\pi\lambda^{2}_{j}h^{2}_{n}},$$
$$\mathcal{H}_{\tilde{f}}(\lambda_{i})=\frac{1}{N}\sum_{j=1}^{N}\frac{\operatorname{sgn}(\lambda_{i}-\lambda_{j})\sqrt{[(\lambda_{i}-\lambda_{j})^{2}-4\lambda^{2}_{j}h_{n}^{2}]^{+}}-\lambda_{i}+\lambda_{j}}{2\pi\lambda^{2}_{j}h^{2}_{n}},$$

where $h_{n}=n^{-\gamma}$ is the bandwidth of the semicircle kernel with tuning parameter $\gamma$, and $a^{+}=\max(0,a)$. For details on the Hilbert transform and its mathematical formulation for commonly used kernel functions, see Bateman (1954).
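The two kernel formulas above translate directly into code. The following sketch (the function name is ours) evaluates $\tilde{f}$ and $\mathcal{H}_{\tilde{f}}$ at a point $x$ given the sample eigenvalues and a bandwidth $h$, with $[a]^{+}$ implemented via `np.maximum`:

```python
import numpy as np

def semicircle_kernel(x, lam, h):
    """Evaluate the semicircle kernel estimate f~(x) and its Hilbert
    transform H_f~(x) from the two displayed formulas, given the sample
    eigenvalues lam and bandwidth h (= n^{-gamma})."""
    lam = np.asarray(lam, dtype=float)
    d = x - lam                                  # x - lambda_j for each j
    denom = 2.0 * np.pi * lam**2 * h**2
    f = np.mean(np.sqrt(np.maximum(4.0 * lam**2 * h**2 - d**2, 0.0)) / denom)
    hf = np.mean((np.sign(d) * np.sqrt(np.maximum(d**2 - 4.0 * lam**2 * h**2, 0.0)) - d) / denom)
    return f, hf
```

At the center of a single eigenvalue the density estimate reduces to the peak of a semicircle of half-width $2\lambda h$, and the Hilbert transform vanishes by symmetry.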

Case 2

In fingerprinting applications, the case of $c_{n}>1$ is more relevant because the number $n$ of controlled runs that can be used to estimate the internal climate variation is often limited, much less than the dimension $N$ of the problem. If $c_{n}>1$, there are $N-n$ null eigenvalues. Assume that $\lambda_{1}=\cdots=\lambda_{N-n}=0$. In this case, we only consider the empirical cumulative distribution function $\underline{F}_{N}$ of the $n$ nonzero eigenvalues. From Silverstein (1995), there exists a limiting function $\underline{F}$ such that $\lim_{N,n\to\infty}\underline{F}_{N}=\underline{F}$, and it admits a continuous derivative $\underline{f}$. The oracle estimator in Equation (5) can then be written as

$$\delta_{\mathrm{oracle}}(\lambda_{i})=\frac{\lambda_{i}}{\pi^{2}\lambda_{i}^{2}[\underline{f}(\lambda_{i})^{2}+\mathcal{H}_{\underline{f}}(\lambda_{i})^{2}]}.$$

The kernel approach can be adapted to this case. Let $\tilde{\underline{f}}$ and $\mathcal{H}_{\tilde{\underline{f}}}$ be, respectively, the kernel estimator of $\underline{f}$ and its Hilbert transform $\mathcal{H}_{\underline{f}}$. Then our feasible shrinkage estimator is

$$\delta(\lambda_{i})=\delta(0)=\frac{1}{\pi\frac{N-n}{n}\mathcal{H}_{\tilde{\underline{f}}}(0)},\quad i=1,\ldots,N-n,$$
$$\delta(\lambda_{i})=\frac{\lambda_{i}}{\pi^{2}\lambda^{2}_{i}[\tilde{\underline{f}}(\lambda_{i})^{2}+\mathcal{H}_{\tilde{\underline{f}}}(\lambda_{i})^{2}]},\quad i=N-n+1,\ldots,N,$$

where

$$\mathcal{H}_{\tilde{\underline{f}}}(0)=\frac{1-\sqrt{1-4h^{2}_{n}}}{2\pi nh^{2}_{n}}\sum_{j=N-n+1}^{N}\frac{1}{\lambda_{j}},$$
$$\tilde{\underline{f}}(\lambda_{i})=\frac{1}{n}\sum_{j=N-n+1}^{N}\frac{\sqrt{[4\lambda^{2}_{j}h_{n}^{2}-(\lambda_{i}-\lambda_{j})^{2}]^{+}}}{2\pi\lambda^{2}_{j}h^{2}_{n}},$$
$$\mathcal{H}_{\tilde{\underline{f}}}(\lambda_{i})=\frac{1}{n}\sum_{j=N-n+1}^{N}\frac{\operatorname{sgn}(\lambda_{i}-\lambda_{j})\sqrt{[(\lambda_{i}-\lambda_{j})^{2}-4\lambda^{2}_{j}h_{n}^{2}]^{+}}-\lambda_{i}+\lambda_{j}}{2\pi\lambda^{2}_{j}h^{2}_{n}},$$

and $h_{n}=n^{-\gamma}$ is the bandwidth with tuning parameter $\gamma$.

In both cases, the pool-adjacent-violators algorithm (PAVA) from isotonic regression can be used to ensure that the shrunken eigenvalues are in ascending order. The bandwidth parameter $\gamma$ can be selected via cross-validation on the estimated standard deviation of the scaling factors or other information criteria. The feasible optimal nonlinear shrinkage estimator is the resulting $\hat{\Sigma}_{\mathrm{MV}}=\Gamma_{n}\tilde{D}_{n}\Gamma_{n}^{\top}$.
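The PAVA step can be sketched as follows; this is a textbook unweighted $L_2$ implementation for illustration, not necessarily the one used in dacc:

```python
import numpy as np

def pava_ascending(y):
    """Pool-adjacent-violators: project y onto ascending sequences (L2),
    here used to restore monotonicity of the shrunken eigenvalues."""
    vals, wts = [], []
    for v in np.asarray(y, dtype=float):
        vals.append(v)
        wts.append(1)
        # merge blocks backwards while the ascending order is violated
        while len(vals) > 1 and vals[-2] > vals[-1]:
            w = wts[-2] + wts[-1]
            pooled = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / w
            vals[-2:] = [pooled]
            wts[-2:] = [w]
    return np.repeat(vals, wts)
```

Each violating pair of adjacent blocks is replaced by their weighted average, so the output is the closest nondecreasing sequence in the least-squares sense.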

3.2 GTLS

For the GTLS setting, which is more realistic with EIV, we propose to pre-whiten $Y$ and the $\tilde{X}_{i}$'s by $\hat{\Sigma}_{\mathrm{MV}}$ and then apply the standard TLS procedure (Gleser, 1981) to estimate $\beta$. The resulting estimator of $\beta$ will be shown to be more efficient than that based on pre-whitening with the linear shrinkage estimator $\hat{\Sigma}_{\mathrm{LS}}$ (Ribes et al., 2013).

Consider the GTLS estimator of $\beta$ obtained from pre-whitening with a class of regularized covariance matrix estimators $\hat{\Sigma}$ from independent control runs. In the general framework of GTLS, the measurement error vectors usually have the same covariance matrix as the model error vector for ease of theoretical derivation. This assumption can easily be achieved in the fingerprinting setting (1)–(2) by multiplying each observed fingerprint vector $\tilde{X}_{i}$ by $\sqrt{n_{i}}$. Therefore, without loss of generality, in the following we assume $n_{i}=1$ to simplify the notation.

Let $\tilde{X}=(\tilde{X}_{1},\ldots,\tilde{X}_{p})$. The GTLS estimator based on $\hat{\Sigma}$ is

$$\hat{\beta}(\hat{\Sigma})=\arg\min_{\beta}\frac{\|\hat{\Sigma}^{-\frac{1}{2}}(Y-\tilde{X}\beta)\|^{2}_{2}}{1+\beta^{\top}\beta}, \qquad (6)$$

where $\|a\|_{2}$ is the $\ell_{2}$ norm of a vector $a$. The asymptotic properties of $\hat{\beta}(\hat{\Sigma})$ are established for a class of covariance matrix estimators $\hat{\Sigma}$ including both $\hat{\Sigma}_{\mathrm{MV}}$ and $\hat{\Sigma}_{\mathrm{LS}}$.
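Equation (6) admits the classical closed-form TLS solution via the right singular vector associated with the smallest singular value of the pre-whitened stacked matrix $[\tilde{X}_{w}, Y_{w}]$. A minimal sketch, assuming $\hat{\Sigma}$ is positive definite:

```python
import numpy as np

def gtls_estimate(X_tilde, Y, Sigma_hat):
    """Solve Equation (6): pre-whiten with Sigma_hat^{-1/2}, then take the
    classical TLS solution from the SVD of the stacked matrix [X_w, Y_w]."""
    w, V = np.linalg.eigh(Sigma_hat)        # Sigma_hat assumed positive definite
    whiten = V @ np.diag(w ** -0.5) @ V.T   # symmetric inverse square root
    A = whiten @ np.column_stack([X_tilde, Y])
    _, _, Vt = np.linalg.svd(A)
    v = Vt[-1]                              # singular vector of smallest singular value
    p = X_tilde.shape[1]
    return -v[:p] / v[p]
```

In the noiseless case the stacked matrix is rank deficient, its null vector is proportional to $(\beta^{\top}, -1)^{\top}$, and the true $\beta$ is recovered exactly.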

Assumption 1.

$\lim_{N,n\to\infty}X^{\top}\hat{\Sigma}^{-1}X/N=\Delta_{1}$ exists, where $\Delta_{1}$ is a non-singular matrix.

Assumption 2.

$\lim_{N,n\to\infty}\operatorname{Tr}(\hat{\Sigma}^{-1}\Sigma)/N$ exists and is a positive constant.

Assumption 3.

$\lim_{N,n\to\infty}\operatorname{Tr}\{(\hat{\Sigma}^{-1/2}\Sigma\hat{\Sigma}^{-1/2})^{2}\}/N=K$ exists with $K>0$.

Remark 1.

Assumption 1 originates from Gleser (1981) and is needed for the consistency of $\hat{\beta}(\hat{\Sigma})$. Assumptions 2–3 are from Ledoit and Wolf (2018, 2017a). Assumption 2 states that the average of the variances of the components of the pre-whitened error vectors converges to a positive constant. For the class of rotation invariant estimators defined in Ledoit and Wolf (2017b, a), which includes both $\hat{\Sigma}_{\mathrm{MV}}$ and $\hat{\Sigma}_{\mathrm{LS}}$, Assumptions 2 and 3 are satisfied.

Lemma 2.

Under Assumptions 1–3, $\hat{\beta}(\hat{\Sigma})\overset{\mathcal{P}}{\to}\beta_{0}$ as $N,n\to\infty$ with $N/n\to c$ for some $c>0$.

The proof for Lemma 2 is in Appendix C.

The asymptotic normality of $\hat{\beta}(\hat{\Sigma})$ is established under additional assumptions.

Assumption 4.

$\lim_{N,n\to\infty}X^{\top}\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1}X/N=\Delta_{2}$ exists for a non-singular matrix $\Delta_{2}$.

Assumption 5.

The regression error $\epsilon$ and the measurement errors $\nu_{i}$'s, $i=1,\ldots,p$, are mutually independent normally distributed random vectors.

Remark 2.

Assumption 4 originates from Gleser (1981) for the asymptotic normality of the GTLS estimator. Assumption 5 is commonly used in the context of climate change detection and attribution for mean state climate variables.

Theorem 2.

Under Assumptions 1–5, as $N,n\to\infty$ with $N/n\to c$ for some $c>0$,

$$\sqrt{N}(\hat{\beta}-\beta_{0})\overset{\mathcal{D}}{\to}\mathcal{N}(0,\Xi),\quad\mbox{where }\Xi=\Delta^{-1}_{1}\big\{\Delta_{2}+K(I_{p}+\beta_{0}\beta_{0}^{\top})^{-1}\big\}(1+\beta_{0}^{\top}\beta_{0})\Delta^{-1}_{1}. \qquad (7)$$

The proof of Theorem 2 is in Section C.2 of Appendix C.

The higher efficiency of the resulting estimator of $\beta$ based on the proposed weight matrix, in comparison with that based on the existing weight, is summarized by the following result, with proof in Section C.3 of Appendix C.

Theorem 3.

Let $\Xi(\hat{\Sigma})$ be the asymptotic covariance matrix in Equation (7) for a rotation invariant estimator $\hat{\Sigma}$ under Assumptions 1–5. Then $\operatorname{Tr}\big(\Xi(\hat{\Sigma}_{\mathrm{MV}})\big)\leq\operatorname{Tr}\big(\Xi(\hat{\Sigma}_{\mathrm{LS}})\big)$.

In our implementation, 5-fold cross-validation is used to select the optimal bandwidth parameter $\gamma\in(0.2,0.5)$.

4 Simulation Studies

The finite-sample performance of the proposed method in comparison with the existing practice in RF needs to be assessed to make realistic recommendations for detection and attribution of climate change. We conducted a simulation study similar to a setting in Ribes et al. (2013). The observed climate variable of interest is 11 decadal mean temperatures over 25 grid-boxes, a vector of dimension $N=275$. Two $N\times 1$ fingerprints were considered, corresponding to the anthropogenic (ANT) and natural (NAT) forcings, denoted by $X_{1}$ and $X_{2}$, respectively. They were set to the average of all runs from the CNRM-CM5 model as in Ribes et al. (2013). To vary the strength of the signals, we also considered halving $X_{1}$ and $X_{2}$. That is, there were two levels of signal-to-noise ratio, obtained by multiplying each $X_{i}$, $i\in\{1,2\}$, by a scale $\lambda\in\{1,0.5\}$. The true scaling factors were $\beta_{1}=\beta_{2}=1$. The distribution of the error vector $\epsilon$ was multivariate normal $\mathrm{MVN}(0,\Sigma)$. The distribution of the measurement error vector $\nu_{i}$, $i\in\{1,2\}$, was $\mathrm{MVN}(0,\Sigma/n_{i})$, with $(n_{1},n_{2})=(35,46)$, which are the values in the detection and attribution analysis of annual mean temperature conducted in Section 5.

Two settings of the true $\Sigma$ were considered. In the first setting, $\Sigma$ was an unstructured matrix $\Sigma_{\mathrm{UN}}$, obtained by manipulating the eigenvalues while keeping the eigenvectors of the proposed minimum variance estimate from the same set of climate model simulations as in Ribes et al. (2013). Specifically, we first obtained the eigendecomposition of the minimum variance estimate, and then restricted the eigenvalues to be equal over each of the 25 grid-boxes (i.e., only 25 unique values for the $N=25\times 11$ eigenvalues) by taking averages over the decades at each grid-box. The pattern of the resulting eigenvalues is similar to that of a spatial-temporal covariance matrix with variance stationarity and weak dependence over the time dimension. Finally, the eigenvalues were scaled independently by a uniformly distributed variable on $[0.5,1.5]$, which results in a more unstructured covariance matrix similar to the simulation settings in Hannart (2016). In the second setting, $\Sigma$ was set to $\Sigma_{\mathrm{ST}}$, whose diagonal elements were set to the sample variances from the climate model simulations without imposing temporal stationarity; the corresponding correlation matrix was set to be the Kronecker product of a spatial correlation matrix and a temporal correlation matrix, both autoregressive of order 1 with coefficient 0.1.
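The $\Sigma_{\mathrm{ST}}$ construction can be sketched as follows, with the AR(1) coefficient 0.1 from the text; the function names and the unit standard deviations in the test are ours for illustration (the paper uses sample variances from the climate model simulations):

```python
import numpy as np

def ar1_corr(m, rho):
    """AR(1) correlation matrix with entry (i, j) = rho^{|i - j|}."""
    idx = np.arange(m)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def sigma_st(sd, n_space=25, n_time=11, rho=0.1):
    """Sigma_ST sketch: Kronecker product of spatial and temporal AR(1)
    correlations, combined with marginal standard deviations sd
    (length n_space * n_time)."""
    R = np.kron(ar1_corr(n_space, rho), ar1_corr(n_time, rho))
    D = np.diag(np.asarray(sd, dtype=float))
    return D @ R @ D
```

Because both Kronecker factors are valid correlation matrices, the product is a valid correlation matrix of dimension $275\times 275$, and scaling by $D$ restores the target variances on the diagonal.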

The observed mean temperature vector $Y$ and the estimated fingerprints $(\tilde{X}_{1},\tilde{X}_{2})$ were generated from Models (1)–(2). The control runs used to estimate $\Sigma$ were generated from $\mathrm{MVN}(0,\Sigma)$ with sample size $n\in\{50,100,200,400\}$. For each replicate, the two GTLS estimators of $\beta$ in Theorem 3 were obtained. For each configuration, 1000 replicates were run.
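As a concrete illustration, the data-generating mechanism above can be sketched as follows. This is a minimal sketch: the diagonal `Sigma` and Gaussian fingerprints are placeholders for the CNRM-CM5-based quantities used in the paper, and all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 275                       # 11 decadal means x 25 grid-boxes
n1, n2 = 35, 46               # ensemble sizes behind the two fingerprints
beta = np.array([1.0, 1.0])   # true scaling factors beta_1 = beta_2 = 1
lam = 1.0                     # signal strength, lambda in {1, 0.5}

# Placeholders: the paper derives Sigma and (X_1, X_2) from climate model output
Sigma = np.diag(rng.uniform(0.5, 1.5, size=N))
X = lam * rng.normal(size=(N, 2))

L = np.linalg.cholesky(Sigma)
draw = lambda scale=1.0: np.sqrt(scale) * (L @ rng.normal(size=N))

# Model (1): observations contaminated by internal variability MVN(0, Sigma)
Y = X @ beta + draw()
# Model (2): estimated fingerprints with measurement error MVN(0, Sigma / n_i)
X_tilde = np.column_stack([X[:, 0] + draw(1 / n1), X[:, 1] + draw(1 / n2)])
# Control runs of sample size n, used to estimate Sigma
n = 100
controls = np.stack([draw() for _ in range(n)])
```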

Figure 1: Boxplots of the estimates of the scaling factors in the simulation study based on 1000 replicates.

Figure 1 displays the boxplots of the estimates of the ANT scaling factor $\beta_{1}$ from the two GTLS approaches based on pre-whitening with $\hat{\Sigma}_{\mathrm{LS}}$ (denoted as M1) and $\hat{\Sigma}_{\mathrm{MV}}$ (denoted as M2), respectively. Both estimators appear to recover the true parameter values well on average. The variations of both estimators are lower for larger $n$, higher $\lambda$, and more structured $\Sigma$ (the case of $\Sigma_{\mathrm{ST}}$). These observations are expected: a larger $n$ means more accurate estimation of $\Sigma$; a higher $\lambda$ means a stronger signal; a more structured $\Sigma$ is easier to estimate. In the case of $\Sigma=\Sigma_{\mathrm{UN}}$, the M2 estimates have smaller variations than the M1 estimates, since the eigenvalues were less smooth and, hence, favored the nonlinear shrinkage function. In the case of $\Sigma=\Sigma_{\mathrm{ST}}$, where the covariance matrix is more structured, both methods estimate the true covariance matrix much more accurately, and the differences between the methods are less obvious. More detailed results are summarized in Table D.1 and Table D.2, the latter of which used smaller ensembles in estimating the fingerprints, with $(n_{1},n_{2})=(10,6)$. The standard deviations of the M2 estimates are over 10% smaller than those of the M1 estimates in both cases.

Figure 2: Calibrated 95% confidence intervals for the scaling factors in the simulation study based on 1000 replicates.

Confidence intervals are an important tool for detection and attribution analyses. It would be desirable if the asymptotic variance in Theorem 2 could be used to construct confidence intervals for the scaling factors. Unfortunately, it has been reported that the confidence intervals for the scaling factors based on $\hat{\Sigma}_{\mathrm{LS}}$ have coverage rates lower than, sometimes much lower than, their nominal levels (Li et al., 2019). The under-coverage issue remains for the estimator based on $\hat{\Sigma}_{\mathrm{MV}}$. To obtain confidence intervals with correct coverage rates, Li et al. (2019) proposed a calibration procedure that enlarges the confidence intervals based on the asymptotic normality of the estimators by an appropriate scale, tuned by a parametric bootstrap to achieve the desired coverage rate. We applied this calibration to both the M1 and M2 estimators. Figure 2 shows the empirical coverage rates of the 95% confidence intervals after the calibration. The coverage rate of a naive confidence interval could be as low as 70% (not shown). After the calibration, the coverage rates are much closer to the nominal levels. The agreement is better for larger $n$ and more structured $\Sigma$. The calibrated confidence intervals from M2 are about 10% shorter than those from M1 overall in both settings of the true $\Sigma$, except for the case of $\Sigma=\Sigma_{\mathrm{UN}}$ with sample size $n=50$, where the confidence intervals suffer from under-coverage.
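The calibration idea can be sketched as follows. This is our own minimal rendering of a parametric-bootstrap inflation search, not the exact procedure of Li et al. (2019); `simulate_estimate` is a hypothetical user-supplied function that redraws data at the point estimate and re-estimates the scaling factor.

```python
import numpy as np

def calibrate_inflation(point_est, asym_sd, simulate_estimate,
                        level=0.95, B=1000, grid=np.arange(1.0, 5.0, 0.05)):
    """Find the smallest inflation factor c such that the enlarged interval
    point_est +/- c * z * asym_sd covers the parametric-bootstrap replicates
    of the estimator at the nominal level."""
    z = 1.959964  # standard normal 97.5% quantile
    boot = np.array([simulate_estimate() for _ in range(B)])
    for c in grid:
        coverage = np.mean(np.abs(boot - point_est) <= c * z * asym_sd)
        if coverage >= level:
            return c
    return float(grid[-1])
```

For example, if the bootstrap replicates turn out to be about twice as dispersed as the asymptotic standard deviation suggests, the search returns an inflation factor near 2.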

5 Fingerprinting Mean Temperature Changes

We apply the proposed approach to the detection and attribution analyses of annual mean temperature of 1951–2010 at the global (GL), continental, and subcontinental scales. The continental scale regions are the Northern Hemisphere (NH), the NH midlatitudes between $30^{\circ}$ and $70^{\circ}$ (NHM), Eurasia (EA), and North America (NA), which were studied in Zhang et al. (2006). The subcontinental scale regions are Western North America (WNA), Central North America (CNA), Eastern North America (ENA), Southern Canada (SCA), and Southern Europe (SEU), where the spatio-temporal correlation structure is more likely to hold.

In each regional analysis, we first constructed the observation vector $Y$ from the HadCRUT4 dataset (Morice et al., 2012). The raw data were monthly anomalies of near-surface air temperature on $5^{\circ}\times 5^{\circ}$ grid-boxes. At each grid-box, each annual mean temperature was computed from the monthly temperatures if at least 9 months were available in that year; otherwise, it was considered missing. Then, 5-year averages were computed if no more than 2 annual averages were missing. To reduce the spatial dimension in the analyses, the $5^{\circ}\times 5^{\circ}$ grid-boxes were aggregated into larger grid-boxes. In particular, the grid-box sizes were $40^{\circ}\times 30^{\circ}$ for GL and NH, $40^{\circ}\times 10^{\circ}$ for NHM, $10^{\circ}\times 20^{\circ}$ for EA, and $10^{\circ}\times 5^{\circ}$ for NA. For the subcontinental regions, no aggregation was done except for SCA, in which case $10^{\circ}\times 10^{\circ}$ grid-boxes were used. Details on the longitude, latitude, and spatio-temporal dimensions of each region after processing can be found in Table 1.
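The availability rules above (an annual mean requires at least 9 months; a 5-year mean tolerates at most 2 missing annual values) can be sketched as follows; the function names are ours, not from the paper's code.

```python
import numpy as np

def annual_means(monthly):
    """monthly: (n_years, 12) array with NaN for missing months.
    Keep an annual mean only if at least 9 months are available."""
    available = (~np.isnan(monthly)).sum(axis=1)
    sums = np.nansum(monthly, axis=1)
    means = np.where(available > 0, sums / np.maximum(available, 1), np.nan)
    means[available < 9] = np.nan
    return means

def five_year_means(annual):
    """annual: (n_years,) with NaN for missing years, n_years a multiple of 5.
    Keep a 5-year mean only if no more than 2 annual values are missing."""
    blocks = annual.reshape(-1, 5)
    missing = np.isnan(blocks).sum(axis=1)
    sums = np.nansum(blocks, axis=1)
    means = np.where(missing < 5, sums / np.maximum(5 - missing, 1), np.nan)
    means[missing > 2] = np.nan
    return means
```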

Table 1: Summaries of the names, coordinate ranges, ideal spatio-temporal dimensions ($S$ and $T$), and dimension of observation after removing missing values ($n$) of the 10 regions analyzed in the study.
Acronym Region Longitude ($^{\circ}$E) Latitude ($^{\circ}$N) Grid size ($^{\circ}\times{}^{\circ}$) $S$ $T$ $n$
Global and Continental Regions
GL Global -180 / 180 -90 / 90 $40\times 30$ 54 11 572
NH Northern Hemisphere -180 / 180 0 / 90 $40\times 30$ 27 11 297
NHM NH midlatitude ($30^{\circ}$N to $70^{\circ}$N) -180 / 180 30 / 70 $40\times 10$ 36 11 396
EA Eurasia -10 / 180 30 / 70 $10\times 20$ 38 11 418
NA North America -130 / -50 30 / 60 $10\times 5$ 48 11 512
Subcontinental Regions
WNA Western North America -130 / -105 30 / 60 $5\times 5$ 30 11 329
CNA Central North America -105 / -85 30 / 50 $5\times 5$ 16 11 176
ENA Eastern North America -85 / -50 15 / 30 $5\times 5$ 21 11 231
SCA Southern Canada -110 / -10 50 / 70 $10\times 10$ 20 11 220
SEU Southern Europe -10 / 40 35 / 50 $5\times 5$ 30 11 330

Two external forcings, ANT and NAT, were considered. Their fingerprints $X_{1}$ and $X_{2}$ were not observed, but their estimates $\tilde{X}_{1}$ and $\tilde{X}_{2}$ were averages over $n_{1}=35$ and $n_{2}=46$ runs from CMIP5 climate model simulations. The missing pattern in $Y$ was used to mask the simulated runs. The same procedure used to aggregate the grid-boxes and obtain the 5-year averages in preparing $Y$ was applied to each masked run of each climate model under each forcing. The final estimates $\tilde{X}_{1}$ and $\tilde{X}_{2}$ at each grid-box were averages over all available runs under the ANT and the NAT forcings, respectively, centered by the average of the observed annual temperatures over 1961–1990, the same center used by the HadCRUT4 data to obtain the observed anomalies.

Estimation of $\Sigma$ was based on $n=223$ runs of 60 years constructed from 47 pre-industrial control simulations of various lengths. The long-term linear trend was removed separately from the control simulations at each grid-box. As the internal climate variation is assumed to be stationary over time, each control run was first split into non-overlapping blocks of 60 years, and then each 60-year block was masked by the same missing pattern as the HadCRUT4 data to create up to 12 5-year averages at each grid-box. The temporal stationarity of the variance implies equal variance over time steps at each observing grid-box, which is commonly incorporated in detection and attribution analyses of climate change (e.g., Hannart, 2016). Both the M1 and M2 estimates, based on linear and nonlinear shrinkage, respectively, were obtained for comparison. Pooled estimation of the variance at each grid-box was used in each shrinkage estimation to enforce the stationary, grid-box specific variance.

Figure 3: Estimated signal scaling factors for ANT and NAT required to best match the observed 1950–2010 annual mean temperature for different spatial domains, and the corresponding 95% confidence intervals from different methods. For the weight matrix construction, “M1” denotes the linear shrinkage approach and “M2” denotes the minimum variance approach. For the confidence intervals, the calibration method is used.

Figure 3 summarizes the GTLS estimates of the scaling factors $\hat{\beta}_{1}$ and $\hat{\beta}_{2}$ for the ANT and NAT forcings, respectively. The estimates based on pre-whitening with $\hat{\Sigma}_{\mathrm{LS}}$ and $\hat{\Sigma}_{\mathrm{MV}}$ are again denoted as M1 and M2, respectively. The 95% confidence intervals were obtained with the calibration approach of Li et al. (2019). The point estimates from M1 and M2 are similar in all the analyses. The confidence intervals from the M2 method are generally shorter than those from the M1 method in the analyses at both the continental and subcontinental scales. A more obvious reduction in the confidence interval lengths is observed at the subcontinental scales, e.g., for the ANT scaling factor in EA/NA/SCA and the NAT scaling factor in NA/WNA/SCA. This may be explained by the fact that signals at the subcontinental scale are weaker and that the error covariance matrix has non-smooth eigenvalues that form clustering patterns due to weak temporal dependence, as suggested by the simulation study. Although the detection and attribution conclusions based on the confidence intervals remain the same in most cases, the shortened confidence intervals mean reduced uncertainty in the estimate of the attributable warming (Jones et al., 2013) and other quantities based on detection and attribution analysis, such as future climate projections and transient climate sensitivity (Li et al., 2019).

6 Discussion

Optimal fingerprinting, as the most commonly used method for detection and attribution analyses of climate change, has great impact in climate research. Such analyses are the basis for projecting observationally constrained future climate (e.g., Jones et al., 2016) and estimating important properties of the climate system such as climate sensitivity (e.g., Schurer et al., 2018). The original optimality of optimal fingerprinting, which minimizes the uncertainty in the resulting scaling factor estimator, is no longer valid under realistic settings where $\Sigma$ is not known but estimated. Our method constructs a weight matrix by inverting a nonlinear shrinkage estimator of $\Sigma$ that directly minimizes the variation of the resulting scaling factor estimator. This method is more efficient than the current RF practice (Ribes et al., 2013), as is evident from the simulation study. Therefore, the lost optimality in fingerprinting is restored to a good extent for practical purposes, which helps to reduce the uncertainty in important quantities such as attributable warming and climate sensitivity.

There are open questions that we have not addressed. It is of interest to further investigate how the asymptotic results under $N,n\to\infty$ and $N/n\to c$ can guide RF practice. The temporal and spatial resolution that controls $N$ can be tuned in RF practice, which may lead to different efficiencies in inference and, hence, different results in detection and attribution. Is there an optimal temporal/spatial resolution that gives the most reliable result? Goodness-of-fit checking is an important element in detection and attribution analyses. The classic approach of checking the weighted sum of squared residuals against a chi-squared distribution under the assumption of known $\Sigma$ is not valid when $\Sigma$ has to be estimated. Can a test be designed, possibly based on a parametric bootstrap, to account for the uncertainty in the regularized estimation of $\Sigma$? These questions merit future research.

Acknowledgements

JY’s research was partially supported by NSF grant DMS 1521730. KC’s research was partially supported by NSF grant IIS 1718798.

Supplementary Materials

A Sufficiency to Assume Orthogonal Covariates

Consider the singular value decomposition of the $N\times p$ design matrix $X$:

X=UDV^{\top},

where $U$ is an $N\times N$ orthogonal matrix, $D$ is an $N\times p$ matrix with the $p$ non-negative singular values on its diagonal, and $V$ is a $p\times p$ orthogonal matrix.

Let $X^{*}=UD=XV=(X_{1}^{*},\ldots,X_{p}^{*})$ and $\beta^{*}=V^{\top}\beta=(\beta^{*}_{1},\ldots,\beta^{*}_{p})$. The columns of $X^{*}$ are orthogonal. The linear regression can be expressed as

Y=\sum_{i=1}^{p}X^{*}_{i}\beta^{*}_{i}+\epsilon.

The estimator of $\beta^{*}$ is

\hat{\beta}^{*} =(X^{*\top}\hat{\Sigma}^{-1}X^{*})^{-1}X^{*\top}\hat{\Sigma}^{-1}Y
=(V^{\top}X^{\top}\hat{\Sigma}^{-1}XV)^{-1}V^{\top}X^{\top}\hat{\Sigma}^{-1}Y
=V^{\top}(X^{\top}\hat{\Sigma}^{-1}X)^{-1}X^{\top}\hat{\Sigma}^{-1}Y

and the corresponding covariance matrix is

\operatorname{\mathbb{V}}(\hat{\beta}^{*})=V^{\top}(X^{\top}\hat{\Sigma}^{-1}X)^{-1}X^{\top}\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1}X(X^{\top}\hat{\Sigma}^{-1}X)^{-1}V.

The orthogonality of $V$ ensures that $\operatorname{Tr}(\operatorname{\mathbb{V}}(\hat{\beta}))=\operatorname{Tr}(\operatorname{\mathbb{V}}(\hat{\beta}^{*}))$ and $\det(\operatorname{\mathbb{V}}(\hat{\beta}))=\det(\operatorname{\mathbb{V}}(\hat{\beta}^{*}))$. Therefore, we only need to consider the orthogonal design in the regression model. In other words, we may assume that the design matrix has orthogonal columns, i.e., $X_{i}^{\top}X_{j}=0$ for any $i\neq j$.
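A quick numerical check of this invariance, as a sketch with randomly generated $\Sigma$, $\hat{\Sigma}$, and $X$ (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 30, 2

def random_spd(n):
    A = rng.normal(size=(n, n))
    return A @ A.T + n * np.eye(n)

X = rng.normal(size=(N, p))
Sigma, Sigma_hat = random_spd(N), random_spd(N)
Si = np.linalg.inv(Sigma_hat)

def coef_cov(M):
    # (M' Si M)^{-1} M' Si Sigma Si M (M' Si M)^{-1}
    G = np.linalg.inv(M.T @ Si @ M)
    return G @ M.T @ Si @ Sigma @ Si @ M @ G

# Rotate to the orthogonal design X* = XV from the SVD X = U D V^T
_, _, Vt = np.linalg.svd(X, full_matrices=False)
X_star = X @ Vt.T

V1, V2 = coef_cov(X), coef_cov(X_star)
# trace and determinant of the coefficient covariance are rotation invariant
assert np.isclose(np.trace(V1), np.trace(V2))
assert np.isclose(np.linalg.det(V1), np.linalg.det(V2))
```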

B Justification of Method M2 in the GLS Case

It is critically important to estimate the covariance matrix precisely, as the covariance matrix of $\hat{\beta}$ depends on $\hat{\Sigma}$. In estimating the covariance matrix, an important question is how to quantify the quality of a covariance matrix estimator. We use loss functions to measure this quality. For example, one loss function is the Frobenius norm of the estimation error,

L(\hat{\Sigma},\Sigma)=\|\hat{\Sigma}-\Sigma\|_{F},

and another loss function is Stein's loss,

L(\hat{\Sigma},\Sigma)=\operatorname{Tr}(\hat{\Sigma}\Sigma^{-1})-\log\det(\hat{\Sigma}\Sigma^{-1})-N,

which is the Kullback–Leibler divergence under the normal assumption.

B.1 Minimum Variance Loss Function

Considering the purpose of fingerprinting, we construct a loss function directly related to the variance of the scaling factor estimator. In other words, we minimize the sum of the variances of the estimated scaling factors,

L_{1}(\hat{\Sigma},\Sigma,X)=\operatorname{Tr}((X^{\top}\hat{\Sigma}^{-1}X)^{-1}X^{\top}\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1}X(X^{\top}\hat{\Sigma}^{-1}X)^{-1}).

From random matrix theory (Marčenko and Pastur, 1967), we have the following results under the assumption that all limiting forms exist. Consider an $N\times 1$ random vector $\mathbf{x}=(x_{1},\ldots,x_{N})^{\top}$ whose entries are independent and identically distributed with mean zero, positive variance, and finite 12th moment. Let $A$ be a given $N\times N$ matrix. Then there exists a constant $K>0$, independent of $N$ and $\mathbf{x}$, such that (Ledoit and Péché, 2011; Silverstein and Bai, 1995)

E|\mathbf{x}^{\top}A\mathbf{x}-\operatorname{Tr}(A)|^{6}\leq K\|A\|^{6}N^{3},

or, equivalently,

E|\mathbf{x}^{\top}A\mathbf{x}/N-\operatorname{Tr}(A)/N|^{6}\leq K\|A\|^{6}N^{-3},

where $\|A\|$ is the spectral norm of $A$. In other words, if $\|A\|<C$ for some constant $C$, then as $N\to\infty$,

\mathbf{x}^{\top}A\mathbf{x}/N-\operatorname{Tr}(A)/N\overset{a.s.}{\to}0.

Similarly, for two independent random vectors $\mathbf{x}$ and $\mathbf{y}$,

\frac{\mathbf{x}^{\top}A\mathbf{y}}{N}=\frac{(\mathbf{x}+\mathbf{y})^{\top}A(\mathbf{x}+\mathbf{y})}{2N}-\frac{\mathbf{x}^{\top}A\mathbf{x}}{2N}-\frac{\mathbf{y}^{\top}A\mathbf{y}}{2N}\overset{a.s.}{\to}0.

Now let $X=(\mathbf{x}_{1},\mathbf{x}_{2},\ldots,\mathbf{x}_{p})$ be an $N\times p$ matrix with independent columns. Then, for any $N\times N$ matrix $A$ with $\|A\|<C$, the $p\times p$ matrix

\frac{X^{\top}AX}{N}-\frac{\operatorname{Tr}(A)}{N}I_{p}\overset{a.s.}{\to}0,

where the right-hand side is a $p\times p$ matrix of zeros. In other words, $X^{\top}AX/N$ and $\operatorname{Tr}(A)I_{p}/N$ have the same limit.
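This concentration is easy to verify numerically; a minimal sketch with a diagonal $A$ of bounded entries and standard normal columns:

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 4000, 2
A = np.diag(rng.uniform(0.5, 1.5, size=N))  # spectral norm bounded by 1.5
X = rng.normal(size=(N, p))                 # independent standardized columns

lhs = X.T @ A @ X / N
rhs = np.trace(A) / N * np.eye(p)
# every entry of X'AX/N is close to the corresponding entry of Tr(A)/N * I_p
assert np.max(np.abs(lhs - rhs)) < 0.1
```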

In optimal fingerprinting, let $\hat{\Sigma}_{n}$ be the sample covariance matrix estimated from ensemble runs representing the internal climate variation. Suppose that the eigendecomposition of $\hat{\Sigma}_{n}$ is $\hat{\Sigma}_{n}=\Gamma_{n}D_{n}\Gamma_{n}^{\top}$, where $D_{n}=\operatorname{diag}(\lambda_{1},\ldots,\lambda_{N})$ is the diagonal matrix of the eigenvalues and $\Gamma_{n}$ contains the corresponding eigenvectors. Consider the rotation invariant class of estimators $\hat{\Sigma}=\Gamma_{n}\tilde{D}_{n}\Gamma_{n}^{\top}$, where $\tilde{D}_{n}=\operatorname{diag}(\delta(\lambda_{i}))$, $i=1,2,\ldots,N$, for a smooth function $\delta(\cdot)$. Under Assumptions 1–3 of the main text, both $\|\hat{\Sigma}^{-1}\|$ and $\|\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1}\|$ are bounded. Then the assumptions of the almost sure convergence results above are satisfied.

Consider the loss function

\tilde{L}_{1}(\hat{\Sigma},\Sigma,X) =\frac{\operatorname{Tr}(X^{\top}X)}{p^{2}}L_{1}(\hat{\Sigma},\Sigma,X)
=\frac{\operatorname{Tr}(X^{\top}X)}{p^{2}}\operatorname{Tr}((X^{\top}\hat{\Sigma}^{-1}X)^{-1}X^{\top}\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1}X(X^{\top}\hat{\Sigma}^{-1}X)^{-1})
=\frac{\operatorname{Tr}(X^{\top}X)}{p^{2}N}\operatorname{Tr}\left[\left(\frac{X^{\top}\hat{\Sigma}^{-1}X}{N}\right)^{-1}\frac{X^{\top}\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1}X}{N}\left(\frac{X^{\top}\hat{\Sigma}^{-1}X}{N}\right)^{-1}\right].

Under the orthogonality assumption on $X$, $\operatorname{Tr}(X^{\top}X)/(pN)\overset{a.s.}{\to}1$. Therefore, we can consider a loss function with the same limit,

L(\hat{\Sigma},\Sigma) =\frac{1}{p}\operatorname{Tr}\left[\left(\frac{\operatorname{Tr}(\hat{\Sigma}^{-1})}{N}I_{p}\right)^{-1}\frac{\operatorname{Tr}(\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1})}{N}I_{p}\left(\frac{\operatorname{Tr}(\hat{\Sigma}^{-1})}{N}I_{p}\right)^{-1}\right]
=\frac{\operatorname{Tr}(\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1})/N}{(\operatorname{Tr}(\hat{\Sigma}^{-1})/N)^{2}},

which has the same form as the loss function of Ledoit and Wolf (2017b). We have, as $N\to\infty$,

L(\hat{\Sigma},\Sigma)-\tilde{L}_{1}(\hat{\Sigma},\Sigma,X)\overset{a.s.}{\to}0.

B.2 Minimum General Variance Loss Function

Instead of the trace, we can use the determinant of $\operatorname{\mathbb{V}}(\hat{\beta})$ as the objective loss function. The loss function is then

L_{2}(\hat{\Sigma},\Sigma,X) =\det((X^{\top}\hat{\Sigma}^{-1}X)^{-1}X^{\top}\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1}X(X^{\top}\hat{\Sigma}^{-1}X)^{-1})
=\frac{\det(X^{\top}\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1}X)}{\det(X^{\top}\hat{\Sigma}^{-1}X)^{2}}.

It is asymptotically equivalent to the loss function

\tilde{L}_{2}(\hat{\Sigma},\Sigma,X) =\left(\frac{\operatorname{Tr}(X^{\top}X)}{p}\right)^{p}L_{2}(\hat{\Sigma},\Sigma,X)
=\left(\frac{\operatorname{Tr}(X^{\top}X)}{pN}\right)^{p}\frac{\det\left(\frac{X^{\top}\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1}X}{N}\right)}{\det\left(\frac{X^{\top}\hat{\Sigma}^{-1}X}{N}\right)^{2}}.

Similar to the minimum variance loss function, we can consider the loss function with the same limiting form:

L(\hat{\Sigma},\Sigma) =\det\left[\left(\frac{\operatorname{Tr}(\hat{\Sigma}^{-1})}{N}I_{p}\right)^{-1}\frac{\operatorname{Tr}(\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1})}{N}I_{p}\left(\frac{\operatorname{Tr}(\hat{\Sigma}^{-1})}{N}I_{p}\right)^{-1}\right]
=\left(\frac{\operatorname{Tr}(\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1})/N}{(\operatorname{Tr}(\hat{\Sigma}^{-1})/N)^{2}}\right)^{p}.

In other words, as $N\to\infty$,

L(\hat{\Sigma},\Sigma)-\tilde{L}_{2}(\hat{\Sigma},\Sigma,X)\overset{a.s.}{\to}0.

That is, minimizing the limiting forms of the two loss functions is asymptotically equivalent to minimizing the limiting form of

L_{mv}(\hat{\Sigma},\Sigma)=\frac{\operatorname{Tr}(\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1})/N}{(\operatorname{Tr}(\hat{\Sigma}^{-1})/N)^{2}}.

This concludes the proof of Theorem 1 of the main text.
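As a side note, the unified loss $L_{mv}$ above is straightforward to evaluate numerically. A minimal sketch with a diagonal $\Sigma$, illustrating that using the true covariance (which attains the harmonic mean of the eigenvalues) beats ignoring the eigenvalue structure with the identity (which pays the arithmetic mean):

```python
import numpy as np

def mv_loss(Sigma_hat, Sigma):
    """Tr(S^{-1} Sigma S^{-1}) / N divided by (Tr(S^{-1}) / N)^2."""
    N = Sigma.shape[0]
    Si = np.linalg.inv(Sigma_hat)
    return (np.trace(Si @ Sigma @ Si) / N) / (np.trace(Si) / N) ** 2

N = 100
lam = np.linspace(0.5, 5.0, N)
Sigma = np.diag(lam)

loss_true = mv_loss(Sigma, Sigma)      # = N / Tr(Sigma^{-1}), harmonic mean
loss_id = mv_loss(np.eye(N), Sigma)    # = mean(lam), arithmetic mean
assert loss_true <= loss_id
```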

C Justification of Method M2 in the GTLS Case

C.1 Proofs of Lemma 2 of the Main Text

Proof.

The results are direct extensions of those in Gleser (1981); we only briefly sketch the proof. Consider the observed data matrix $\tilde{A}=\hat{\Sigma}^{-\frac{1}{2}}[\tilde{X},Y]$, obtained by binding the columns of $\tilde{X}$ and $Y$. Under Assumptions 1–3, let $W=\tilde{A}^{\top}\tilde{A}$ and $\delta=\lim_{N\to\infty}\operatorname{Tr}(\hat{\Sigma}^{-1}\Sigma)/N$. We have

\lim_{N\to\infty}N^{-1}W=\delta I_{p+1}+(I_{p},\beta_{0})^{\top}\Delta_{1}(I_{p},\beta_{0}), (C.8)

which is of the same form as Lemma 3.1 in Gleser (1981). Consider the eigendecomposition $W=GDG^{\top}$, where $D=\operatorname{diag}(d_{1},\ldots,d_{p},d_{p+1})$, and the $(p+1)\times(p+1)$ matrix $G$ is partitioned as

G=\begin{pmatrix}G_{11}&G_{12}\\ G_{21}&G_{22}\end{pmatrix}

with $p\times p$ matrix $G_{11}$. Then, following the results of Lemma 3.3 of Gleser (1981), $(G^{\top}_{11},G^{\top}_{21})^{\top}$ converges to the matrix of eigenvectors corresponding to the $p$ largest eigenvalues of the limiting matrix $\delta I_{p+1}+(I_{p},\beta_{0})^{\top}\Delta_{1}(I_{p},\beta_{0})$, denoted by $(H^{\top}_{11},H^{\top}_{21})^{\top}$. Moreover, there is a nonsingular $p\times p$ matrix $\psi$ such that

\begin{pmatrix}H_{11}\\ H_{21}\end{pmatrix}=\begin{pmatrix}I_{p}\\ \beta^{\top}_{0}\end{pmatrix}\psi,\quad H_{21}H_{11}^{-1}=\beta^{\top}_{0}.

The estimator from Equation (6) is given by $\hat{\beta}=\{G_{21}G^{-1}_{11}\}^{\top}=-G_{12}/G_{22}$. With the above results, we have

\hat{\beta}\overset{\mathcal{P}}{\to}\beta_{0},

as $N,n\to\infty$ with $N/n\to c>0$. ∎
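As a sketch, the estimator $\hat{\beta}=-G_{12}/G_{22}$ can be read off from the eigenvector associated with the smallest eigenvalue of $W$; in the noise-free limit it recovers $\beta_{0}$ exactly. The implementation below is our own minimal version, not the paper's code.

```python
import numpy as np

def gtls(Y, X_tilde, Sigma_hat):
    """GTLS estimate of beta after pre-whitening with Sigma_hat:
    form W = A~' A~ with A~ = Sigma_hat^{-1/2} [X~, Y], and return
    beta_hat = -G12 / G22 from the eigenvector of the smallest eigenvalue."""
    w, U = np.linalg.eigh(Sigma_hat)
    whiten = U @ np.diag(w ** -0.5) @ U.T
    A = whiten @ np.column_stack([X_tilde, Y])
    _, vecs = np.linalg.eigh(A.T @ A)   # eigenvalues in ascending order
    g = vecs[:, 0]                      # eigenvector of the smallest eigenvalue
    return -g[:-1] / g[-1]              # ratio is invariant to the sign of g
```

With no measurement error and no regression error, `gtls(X @ b, X, np.eye(N))` returns `b` up to numerical precision, since the last column of the augmented matrix is then an exact linear combination of the others.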

C.2 Proof of Theorem 2 of the Main Text

Proof.

In the context of general asymptotics, i.e., $N,n\to\infty$ with a fixed ratio, consider the eigendecomposition $\hat{\Sigma}^{-1/2}\Sigma\hat{\Sigma}^{-1/2}=U\Lambda U^{\top}$, where $\Lambda=\operatorname{diag}(\lambda_{i})$ is the diagonal matrix of eigenvalues and $U$ is the corresponding matrix of eigenvectors. Let

Y^{*}=U^{\top}\hat{\Sigma}^{-\frac{1}{2}}Y,\quad X^{*}=U^{\top}\hat{\Sigma}^{-\frac{1}{2}}X,\quad\epsilon^{*}=U^{\top}\hat{\Sigma}^{-\frac{1}{2}}\epsilon,\quad\tilde{X}^{*}=U^{\top}\hat{\Sigma}^{-\frac{1}{2}}\tilde{X},\quad V^{*}=U^{\top}\hat{\Sigma}^{-\frac{1}{2}}V. (C.9)

Then, from Equation (6) of the main text, the generalized total least squares estimator $\hat{\beta}$ solves the equation

S(\beta)=\frac{\tilde{X}^{*\top}(y^{*}-\tilde{X}^{*}\beta)}{N}+\beta\frac{\|y^{*}-\tilde{X}^{*}\beta\|^{2}_{F}}{N(1+\beta^{\top}\beta)}=0,

where $S(\beta)=(S_{1}(\beta),\ldots,S_{p}(\beta))^{\top}$ is a $p$-dimensional vector. By Taylor's theorem, there exist $\beta_{j}^{*}$ on the line segment between $\hat{\beta}$ and $\beta_{0}$, $j=1,\ldots,p$, such that

S_{j}(\hat{\beta})=S_{j}(\beta_{0})+[\nabla S_{j}(\beta_{j}^{*})]^{\top}(\hat{\beta}-\beta_{0})=0,\quad j=1,2,\ldots,p,
S(\hat{\beta})=S(\beta_{0})+H(\hat{\beta}-\beta_{0})=0,\quad H=(\nabla S_{1}(\beta_{1}^{*}),\ldots,\nabla S_{p}(\beta_{p}^{*})).

It follows that

H\sqrt{N}(\hat{\beta}-\beta_{0}) =-\sqrt{N}S(\beta_{0})
=-\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\left\{(\epsilon_{i}^{*}-v_{i}^{*\top}\beta_{0})(x^{*}_{i}+v^{*}_{i})+\beta_{0}\frac{(\epsilon_{i}^{*}-v_{i}^{*\top}\beta_{0})^{2}}{(1+\beta_{0}^{\top}\beta_{0})}\right\}
\coloneqq-\frac{1}{\sqrt{N}}\sum_{i=1}^{N}f_{i}(\beta_{0}),

where, for $i=1,\ldots,N$, $\epsilon^{*}_{i}$ is the $i$th element of the vector $\epsilon^{*}$, $v^{*}_{i}$ is the $i$th row vector of $V^{*}$, and $x^{*}_{i}$ is the $i$th row vector of $X^{*}$.

From Assumption 5 in the main text and Equation (C.9), $\epsilon^{*}_{i}\sim N(0,\lambda_{i})$, $v^{*}_{i}\sim\mathrm{MVN}(0,\lambda_{i}I_{p})$, and the $f_{i}(\beta_{0})$ are mutually independent vectors with finite covariance matrices

\operatorname{\mathbb{V}}(f_{i}(\beta_{0})) =\lambda_{i}x^{*}_{i}x^{*\top}_{i}(1+\beta_{0}^{\top}\beta_{0})+\lambda_{i}^{2}(I_{p}+I_{p}\beta_{0}^{\top}\beta_{0}+2\beta_{0}\beta_{0}^{\top})-3\lambda_{i}^{2}\beta_{0}\beta_{0}^{\top}
=[\lambda_{i}x^{*}_{i}x^{*\top}_{i}+\lambda_{i}^{2}(I_{p}+\beta_{0}\beta_{0}^{\top})^{-1}](1+\beta_{0}^{\top}\beta_{0}),

such that

\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\operatorname{\mathbb{V}}(f_{i}(\beta_{0}))=\big\{\Delta_{2}+K(I_{p}+\beta_{0}\beta_{0}^{\top})^{-1}\big\}(1+\beta_{0}^{\top}\beta_{0}).

Then, from Lemma 4.1 of Gleser (1981) and Assumptions 3 and 4 of the main text,

\sqrt{N}S({\beta_{0}})\overset{\mathcal{D}}{\to}\mathrm{MVN}(0,\Xi_{1}),

where $\Xi_{1}=\big\{\Delta_{2}+K(I_{p}+\beta_{0}\beta_{0}^{\top})^{-1}\big\}(1+\beta_{0}^{\top}\beta_{0})$.

It remains to show that, as $N\to\infty$,

H=(\nabla S_{1}(\beta_{1}^{*}),\ldots,\nabla S_{p}(\beta_{p}^{*}))\overset{\mathcal{P}}{\to}-\Delta_{1}. (C.10)

Consider the derivative of the score function $S(\beta)$ at any value of $\beta$:

\nabla S(\beta) =-\frac{\tilde{X}^{*\top}\tilde{X}^{*}}{N}+\frac{\|y^{*}-\tilde{X}^{*}\beta\|^{2}_{F}}{N(1+\beta^{\top}\beta)}I_{p}+\beta\frac{\partial\|y^{*}-\tilde{X}^{*}\beta\|^{2}_{F}/\{N(1+\beta^{\top}\beta)\}}{\partial\beta^{\top}}
=-\frac{\tilde{X}^{*\top}\tilde{X}^{*}}{N}+\frac{\|y^{*}-\tilde{X}^{*}\beta\|^{2}_{F}}{N(1+\beta^{\top}\beta)}I_{p}+\frac{\beta\{S(\beta)\}^{\top}}{1+\beta^{\top}\beta}.

Since $S(\beta)\to 0$ as $N\to\infty$ and $\beta\overset{P}{\to}\beta_{0}$, we have

\lim_{N\to\infty,\beta\to\beta_{0}}\nabla S(\beta) =\lim_{N\to\infty,\beta\to\beta_{0}}\left\{-\frac{\tilde{X}^{*\top}\tilde{X}^{*}}{N}+\frac{\|y^{*}-\tilde{X}^{*}\beta\|^{2}_{F}}{N(1+\beta^{\top}\beta)}I_{p}\right\}
=-\Delta_{1}-\lim_{N\to\infty}\frac{\operatorname{Tr}(\hat{\Sigma}^{-1}\Sigma)}{N}I_{p}+\lim_{N\to\infty}\frac{\operatorname{Tr}(\hat{\Sigma}^{-1}\Sigma)}{N}I_{p}+\lim_{N\to\infty,\beta\to\beta_{0}}\frac{(\beta-\beta_{0})^{\top}\Delta_{1}(\beta-\beta_{0})}{(1+\beta^{\top}\beta)}I_{p}
=-\Delta_{1}.

Thus, from Lemma 2 of the main text and the fact that $\beta_{j}^{*}$ lies on the line segment between $\hat{\beta}$ and $\beta_{0}$ for $j=1,\ldots,p$, Equation (C.10) holds. This completes the proof of Theorem 2. ∎

C.3 Proof of Theorem 3 of the Main Text

Proof.

Here we sketch the proof in the context of optimal fingerprinting; more rigorous derivations are to be established for the more general setting. As mentioned in Section 3.2 of the main text, without loss of generality, we adjust Model (2) to make the covariance matrices of the errors in the covariates and in the response the same. Consider the model

Y =\sum_{i=1}^{p}X^{*}_{i}\beta^{*}_{i}+\epsilon,
\tilde{X}^{*}_{i} =X^{*}_{i}+\nu^{*}_{i},\qquad i=1,\ldots,p,

where $\nu^{*}_{i}$ is a normally distributed measurement error vector with mean zero and covariance matrix $\Sigma/n_{i}$. For ease of illustration, we consider $n_{1}=\cdots=n_{p}=n_{0}$. Usually in the above model, the magnitudes of the fingerprints are comparable to the noise, in the sense that $\operatorname{Tr}\{(X^{*})^{\top}X^{*}\}$ and $\operatorname{Tr}(\Sigma)$ are comparable, i.e., $\operatorname{Tr}\{(X^{*})^{\top}X^{*}\}/\{p\operatorname{Tr}(\Sigma)\}=O(1)$ as $N\to\infty$.

Let $\tilde{X}_{i}=\sqrt{n_{0}}\tilde{X}^{*}_{i}$, $X_{i}=\sqrt{n_{0}}X^{*}_{i}$, $\nu_{i}=\sqrt{n_{0}}\nu^{*}_{i}$, and $\beta_{i}=\beta^{*}_{i}/\sqrt{n_{0}}$. Then the model can be rewritten as

Y =\sum_{i=1}^{p}X_{i}\beta_{i}+\epsilon, (C.11)
\tilde{X}_{i} =X_{i}+\nu_{i},\qquad i=1,\ldots,p, (C.12)

which is exactly in the form of Models (1)–(2) of the main text with $n_{i}=1$. The theoretical results in Section 3.2 of the main text then apply directly to the adjusted model, and the coefficient estimates and the corresponding variance estimates for the original model can be easily obtained from those of the adjusted model. That is, in the context of optimal fingerprinting, the original model is equivalent to a model with relatively large magnitudes of the fingerprints $X_{i}$ and small values of the true coefficients $\beta_{0}$.
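The equivalence of the two parameterizations amounts to the observation that the signal $X\beta$ is unchanged under the rescaling; a one-line numerical check (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(4)
N, p, n0 = 50, 2, 35
X_star = rng.normal(size=(N, p))
beta_star = np.array([1.0, 1.0])

X = np.sqrt(n0) * X_star          # inflated fingerprints
beta = beta_star / np.sqrt(n0)    # deflated coefficients

# the signal, and hence the regression model, is unchanged
assert np.allclose(X @ beta, X_star @ beta_star)
```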

Consider the trace of the asymptotic covariance matrix of the estimated coefficient vector in Model (C.11), given by

\operatorname{Tr}(\Xi) =\operatorname{Tr}(\Delta^{-1}_{1}\big\{\Delta_{2}+K(I_{p}+\beta_{0}\beta_{0}^{\top})^{-1}\big\}(1+\beta_{0}^{\top}\beta_{0})\Delta^{-1}_{1}) (C.13)
=\operatorname{Tr}(\Delta^{-1}_{1}\Delta_{2}\Delta^{-1}_{1})+(1+\beta_{0}^{\top}\beta_{0})\operatorname{Tr}(\Delta^{-1}_{1}K(I_{p}+\beta_{0}\beta_{0}^{\top})^{-1}\Delta^{-1}_{1}) (C.14)

from Theorem 2 of the main text.

The first term Tr(Δ11Δ2Δ11)\operatorname{Tr}(\Delta^{-1}_{1}\Delta_{2}\Delta^{-1}_{1}) on the right hand side of (C.13) is the same as the loss function proposed in Appendix B.1. In the context of general asymptotics, i.e., N,nN,n\to\infty with fixed ratio, the first term is equivalent to

\[
\frac{Np^{2}}{\operatorname{Tr}(X^{\top}X)}\,\frac{\operatorname{Tr}(\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1})/N}{(\operatorname{Tr}(\hat{\Sigma}^{-1})/N)^{2}},
\]

as $X^{\top}AX/N$ is asymptotically equivalent to $\{\operatorname{Tr}(A)I_{p}/N\}\{\operatorname{Tr}(X^{\top}X)/(Np)\}$ for properly defined matrices $X$ and $A$ (see Appendix B.1).
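This asymptotic equivalence can be illustrated numerically. The sketch below is a rough sanity check under the assumption that $X$ has independent columns with i.i.d. entries and $A$ is a positive semidefinite matrix generated independently of $X$; the dimensions, scales, and tolerance are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 2000, 2

# X with iid entries, columns independent; A a PSD matrix independent of X
X = rng.normal(scale=1.5, size=(N, p))
G = rng.normal(size=(N, N))
A = G @ G.T / N

lhs = X.T @ A @ X / N
rhs = (np.trace(A) / N) * (np.trace(X.T @ X) / (N * p)) * np.eye(p)

# Off-diagonal entries of lhs vanish and diagonals concentrate around rhs as N grows
assert np.allclose(lhs, rhs, atol=0.5)
```

Intuitively, for a column $x$ of $X$ independent of $A$, the quadratic form $x^{\top}Ax$ concentrates around $\sigma^{2}\operatorname{Tr}(A)$, while cross terms between distinct columns average to zero, which is exactly what the displayed approximation encodes.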

As for the second term $(1+\beta_{0}^{\top}\beta_{0})\operatorname{Tr}(\Delta_{1}^{-1}K(I_{p}+\beta_{0}\beta_{0}^{\top})^{-1}\Delta_{1}^{-1})$, we show that it is dominated by the first term. Given the magnitudes of the fingerprints $X$ and coefficients $\beta_{0}$ described above, for a suitably large $n_{0}$ (which is fairly realistic in fingerprinting studies), the small values of $\beta_{0}$ can be neglected, so the second term is approximated by $\operatorname{Tr}(\Delta_{1}^{-1}K\Delta_{1}^{-1})$. We further note that $K=\lim_{N,n\to\infty}\operatorname{Tr}\{(\hat{\Sigma}^{-1/2}\Sigma\hat{\Sigma}^{-1/2})^{2}\}/N$ and $\operatorname{Tr}\{(\hat{\Sigma}^{-1/2}\Sigma\hat{\Sigma}^{-1/2})^{2}\}/N\leq\{\operatorname{Tr}(\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1})\operatorname{Tr}(\Sigma)\}/N$. Then under the general asymptotics,

\begin{align*}
\operatorname{Tr}(\Delta_{1}^{-1}K\Delta_{1}^{-1}) &\leq \frac{N^{2}p^{3}}{\operatorname{Tr}(X^{\top}X)^{2}}\,\frac{\{\operatorname{Tr}(\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1})\operatorname{Tr}(\Sigma)\}/N}{(\operatorname{Tr}(\hat{\Sigma}^{-1})/N)^{2}}\\
&= \frac{Np^{2}}{\operatorname{Tr}(X^{\top}X)}\,\frac{\operatorname{Tr}(\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1})/N}{(\operatorname{Tr}(\hat{\Sigma}^{-1})/N)^{2}}\,\frac{\operatorname{Tr}(\Sigma)}{\operatorname{Tr}(X^{\top}X)/p},
\end{align*}

where $\operatorname{Tr}(\Sigma)/\{\operatorname{Tr}(X^{\top}X)/p\}=O(1/n_{0})$ as $N\to\infty$ by the adjustment in Model (C.11). That is, when the number of climate simulations used to obtain the estimated fingerprints $\tilde{X}_{i}$ is fairly large, the second term on the right-hand side of (C.13) is dominated by the first term, i.e., as $N\to\infty$ we have

\[
(1+\beta_{0}^{\top}\beta_{0})\operatorname{Tr}\big(\Delta_{1}^{-1}K(I_{p}+\beta_{0}\beta_{0}^{\top})^{-1}\Delta_{1}^{-1}\big) \leq \delta\operatorname{Tr}(\Delta_{1}^{-1}\Delta_{2}\Delta_{1}^{-1}),
\]

for an arbitrarily small $\delta>0$, provided the magnitude of $X_{i}$ in Model (C.11) is large enough, i.e., the number of climate simulations for computing the fingerprints in the original model is large enough. Thus the proposed covariance matrix estimator $\hat{\Sigma}_{\mathrm{MV}}$, which is the optimal choice with respect to the first term under the general asymptotics, is expected to outperform the linear shrinkage estimator $\hat{\Sigma}_{\mathrm{LS}}$ in the sense that

\[
\operatorname{Tr}\big(\Xi(\hat{\Sigma}_{\mathrm{MV}})\big) \leq \operatorname{Tr}\big(\Xi(\hat{\Sigma}_{\mathrm{LS}})\big),
\]

which is consistent with the fact that as the number of climate simulations to estimate the fingerprints becomes larger, the effects of measurement error diminish.

This completes the proof. ∎
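As a side remark, the trace inequality $\operatorname{Tr}\{(\hat{\Sigma}^{-1/2}\Sigma\hat{\Sigma}^{-1/2})^{2}\}\leq\operatorname{Tr}(\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1})\operatorname{Tr}(\Sigma)$ invoked in the proof follows from $\operatorname{Tr}(CD)\leq\operatorname{Tr}(C)\operatorname{Tr}(D)$ for positive semidefinite $C$ and $D$. A quick numerical sanity check, with arbitrary well-conditioned positive-definite matrices standing in for $\Sigma$ and $\hat{\Sigma}$ (the dimensions and construction are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 30

def random_spd(n):
    """Generate a random symmetric positive-definite matrix."""
    G = rng.normal(size=(n, n))
    return G @ G.T + n * np.eye(n)

Sigma = random_spd(N)          # stand-in for the true covariance
Sigma_hat = random_spd(N)      # stand-in for its estimate
Sigma_hat_inv = np.linalg.inv(Sigma_hat)

# Tr{(Sigma_hat^{-1/2} Sigma Sigma_hat^{-1/2})^2} equals Tr{(Sigma_hat^{-1} Sigma)^2}
# by cyclic invariance of the trace
lhs = np.trace(np.linalg.matrix_power(Sigma_hat_inv @ Sigma, 2))
rhs = np.trace(Sigma_hat_inv @ Sigma @ Sigma_hat_inv) * np.trace(Sigma)
assert lhs <= rhs
```

The cyclic rewrite in the comment is what makes the computation cheap: no matrix square root of $\hat{\Sigma}$ is needed to evaluate the left-hand side.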

D Detailed Results on Simulation Studies

The results of the simulation studies in Section 4 of the main text are detailed in Tables D.1 and D.2 and Figure D.1.

Table D.1: Scaling factor estimates and 95% confidence intervals. Two covariance structures are considered: $\Sigma_{\mathrm{UN}}$, the proposed shrinkage estimate from ensemble simulations, and $\Sigma_{\mathrm{ST}}$, a separable spatio-temporal covariance matrix. $\lambda$ is the scale controlling the signal-to-noise ratio. Error terms are generated from a multivariate normal distribution. The ensemble sizes for the two forcings are $n_{1}=35$ and $n_{2}=46$. Two constructions of the weight matrix are compared: “M1” denotes the linear shrinkage method and “M2” the proposed approach. The existing formula-based method (N) and the calibration method (CB) are used to construct 95% confidence intervals. The bias and standard deviation (SD $\times$ 100) of the scaling factors, the average length of the confidence intervals (CIL), and the corresponding empirical coverage rate (CR) from 1000 replicates are reported.
ANT NAT
N CB N CB
size method Bias SD CIL CR CIL CR Bias SD CIL CR CIL CR
Setting 1: $\Sigma_{\mathrm{UN}}$; SNR $\lambda=0.5$
50 M1 0.02 19.0 0.41 69.8 0.53 82.0 0.01 57.0 1.47 78.5 1.89 89.9
M2 0.02 17.3 0.38 69.2 0.55 84.7 0.01 49.7 1.23 76.3 1.79 92.6
100 M1 0.01 13.6 0.36 76.8 0.47 89.9 -0.02 36.9 1.12 85.8 1.46 94.0
M2 -0.00 9.7 0.29 83.4 0.40 97.0 -0.02 26.8 0.81 86.1 1.13 95.6
200 M1 0.00 9.0 0.30 89.4 0.39 96.2 -0.01 24.4 0.87 92.4 1.08 97.5
M2 -0.00 6.7 0.23 91.3 0.26 95.4 -0.01 18.7 0.64 89.1 0.76 94.8
400 M1 0.00 6.1 0.26 93.7 0.30 96.7 -0.01 18.9 0.72 93.2 0.83 97.0
M2 -0.00 5.6 0.20 93.2 0.22 94.0 -0.00 16.1 0.57 92.1 0.64 94.3
Setting 1: $\Sigma_{\mathrm{UN}}$; SNR $\lambda=1$
50 M1 0.00 9.0 0.20 71.9 0.25 80.7 -0.01 20.4 0.63 87.2 0.73 92.4
M2 0.00 8.5 0.18 72.5 0.26 83.9 -0.00 19.2 0.55 80.9 0.68 89.6
100 M1 0.00 6.6 0.18 79.8 0.23 90.7 -0.01 14.8 0.51 91.0 0.59 96.2
M2 0.00 4.5 0.14 85.6 0.19 96.5 -0.01 12.0 0.38 86.9 0.46 94.6
200 M1 0.00 4.5 0.15 88.6 0.19 95.9 -0.00 10.2 0.41 95.4 0.46 97.5
M2 0.00 3.3 0.12 92.1 0.13 95.6 0.00 8.7 0.31 93.2 0.35 95.9
400 M1 0.00 3.2 0.13 94.0 0.14 96.5 -0.00 8.3 0.34 95.6 0.37 96.7
M2 0.00 2.8 0.10 92.4 0.11 94.0 -0.00 7.8 0.28 91.3 0.30 95.4
Setting 2: $\Sigma_{\mathrm{ST}}$; SNR $\lambda=0.5$
50 M1 -0.00 3.2 0.10 87.7 0.11 91.8 0.00 8.7 0.33 91.8 0.39 96.7
M2 0.00 3.1 0.09 85.6 0.11 91.0 0.00 8.5 0.30 89.9 0.35 95.1
100 M1 -0.00 2.2 0.08 93.2 0.10 96.7 0.00 7.1 0.26 91.3 0.30 95.6
M2 -0.00 2.0 0.07 91.6 0.08 95.6 0.00 7.0 0.23 87.7 0.26 91.6
200 M1 -0.00 1.6 0.07 95.6 0.08 97.3 0.00 5.9 0.22 94.0 0.25 96.5
M2 -0.00 1.6 0.06 94.0 0.06 95.4 0.00 5.6 0.19 90.5 0.21 93.7
400 M1 -0.00 1.5 0.06 94.8 0.06 96.5 0.00 5.1 0.19 91.0 0.21 93.7
M2 -0.00 1.4 0.05 92.4 0.05 93.5 -0.00 4.8 0.17 88.8 0.18 90.7
Setting 2: $\Sigma_{\mathrm{ST}}$; SNR $\lambda=1$
50 M1 -0.00 1.6 0.05 84.2 0.06 90.5 0.00 4.4 0.16 93.5 0.18 94.8
M2 -0.00 1.6 0.04 80.7 0.05 89.6 0.00 4.5 0.15 90.2 0.16 91.6
100 M1 -0.00 1.1 0.04 92.4 0.05 97.3 0.00 3.2 0.13 95.4 0.14 97.8
M2 -0.00 1.0 0.04 89.4 0.04 95.4 0.00 3.2 0.11 92.6 0.12 95.9
200 M1 -0.00 0.8 0.03 94.0 0.04 97.0 0.00 2.8 0.11 94.8 0.12 96.7
M2 0.00 0.7 0.03 93.7 0.03 95.4 0.00 2.7 0.10 92.6 0.10 94.6
400 M1 0.00 0.7 0.03 96.7 0.03 97.5 0.00 2.3 0.09 95.9 0.10 97.8
M2 0.00 0.6 0.03 95.4 0.03 95.4 0.00 2.3 0.08 92.4 0.09 94.6
Table D.2: Scaling factor estimates and 95% confidence intervals. Two covariance structures are considered: $\Sigma_{\mathrm{UN}}$, the proposed shrinkage estimate from ensemble simulations, and $\Sigma_{\mathrm{ST}}$, a separable spatio-temporal covariance matrix. $\lambda$ is the scale controlling the signal-to-noise ratio. Error terms are generated from a multivariate normal distribution. The ensemble sizes for the two forcings are $n_{1}=10$ and $n_{2}=6$. Two constructions of the weight matrix are compared: “M1” denotes the linear shrinkage method and “M2” the proposed approach. The existing formula-based method (N) and the calibration method (CB) are used to construct 95% confidence intervals. The bias and standard deviation (SD $\times$ 100) of the scaling factors, the average length of the confidence intervals (CIL), and the corresponding empirical coverage rate (CR) from 1000 replicates are reported.
ANT NAT
N CB N CB
size method Bias SD CIL CR CIL CR Bias SD CIL CR CIL CR
Setting 1: $\Sigma_{\mathrm{UN}}$; SNR $\lambda=0.5$
50 M1 0.00 24.6 0.53 74.9 0.71 90.7 -0.00 130.9 2.08 66.8 3.07 85.8
M2 0.00 22.8 0.47 71.9 0.72 92.6 0.05 115.2 1.77 64.9 3.22 92.1
100 M1 -0.00 17.4 0.44 82.3 0.59 95.1 0.06 97.1 1.90 70.0 2.90 90.7
M2 -0.00 11.5 0.34 86.4 0.50 98.1 0.02 53.3 1.29 76.6 2.28 94.6
200 M1 0.01 11.4 0.36 89.1 0.47 96.7 0.03 47.0 1.37 83.7 2.02 94.6
M2 0.00 7.4 0.27 92.1 0.32 95.6 0.00 29.9 0.91 88.8 1.24 97.5
400 M1 0.00 7.5 0.30 94.3 0.35 97.3 0.02 30.5 1.09 90.5 1.41 96.2
M2 0.00 6.5 0.23 91.8 0.25 93.7 0.01 21.3 0.79 92.4 0.95 96.2
Setting 1: $\Sigma_{\mathrm{UN}}$; SNR $\lambda=1$
50 M1 -0.00 11.0 0.23 69.2 0.30 81.5 0.05 37.7 0.90 76.8 1.22 87.7
M2 -0.01 10.5 0.21 69.2 0.30 83.4 0.03 32.7 0.73 72.5 1.15 92.4
100 M1 -0.00 7.6 0.20 78.5 0.27 90.7 0.01 24.6 0.69 82.3 0.94 92.9
M2 0.00 5.2 0.16 86.9 0.22 95.9 0.00 17.0 0.47 83.4 0.71 95.1
200 M1 -0.00 4.7 0.17 91.0 0.21 98.4 -0.00 15.0 0.50 88.8 0.66 95.4
M2 -0.00 3.4 0.13 94.6 0.14 96.5 -0.01 11.5 0.36 87.2 0.45 94.0
400 M1 0.00 3.4 0.14 97.8 0.16 99.2 -0.00 11.4 0.42 90.2 0.49 95.4
M2 0.00 2.8 0.11 95.1 0.12 95.9 -0.01 9.3 0.32 89.6 0.37 92.4
Setting 2: $\Sigma_{\mathrm{ST}}$; SNR $\lambda=0.5$
50 M1 0.00 3.6 0.11 85.6 0.13 92.9 0.01 14.2 0.39 81.2 0.60 96.2
M2 0.00 3.5 0.10 83.4 0.12 92.9 0.01 12.6 0.35 81.5 0.55 97.3
100 M1 -0.00 2.5 0.09 91.8 0.11 96.7 -0.00 9.0 0.30 88.0 0.40 96.7
M2 -0.00 2.3 0.08 91.6 0.09 96.2 -0.00 8.1 0.27 89.1 0.34 95.1
200 M1 0.00 1.9 0.07 94.6 0.08 97.5 0.00 6.6 0.25 93.5 0.30 97.3
M2 0.00 1.8 0.06 94.3 0.07 95.9 0.00 5.9 0.22 92.6 0.25 96.5
400 M1 -0.00 1.5 0.06 95.1 0.07 97.0 0.00 6.1 0.22 94.6 0.24 96.5
M2 -0.00 1.4 0.06 92.4 0.06 94.0 0.00 5.6 0.19 92.6 0.21 96.2
Setting 2: $\Sigma_{\mathrm{ST}}$; SNR $\lambda=1$
50 M1 -0.00 1.7 0.05 87.2 0.06 92.4 0.00 5.4 0.18 89.4 0.23 95.1
M2 -0.00 1.6 0.05 86.4 0.06 91.6 0.00 5.3 0.16 87.2 0.21 92.9
100 M1 -0.00 1.3 0.04 89.4 0.05 92.4 -0.00 3.8 0.14 92.9 0.17 96.5
M2 -0.00 1.2 0.04 86.4 0.04 91.3 -0.00 3.8 0.13 91.8 0.14 93.5
200 M1 0.00 1.0 0.04 92.9 0.04 97.0 0.00 2.9 0.12 94.3 0.14 96.7
M2 0.00 0.9 0.03 90.7 0.04 94.6 0.00 2.6 0.11 94.0 0.12 95.6
400 M1 -0.00 0.7 0.03 96.5 0.03 97.5 -0.00 2.6 0.10 94.6 0.12 97.0
M2 -0.00 0.7 0.03 95.1 0.03 96.5 -0.00 2.5 0.09 94.3 0.10 95.6
Figure D.1: Calibrated 95% confidence intervals for the scaling factors in the simulation study based on 1000 replicates. The numbers of runs for the two forcings are $n_{1}=10$ and $n_{2}=6$, respectively.
