On Spurious Causality, CO₂, and Global Temperature

Philippe Goulet Coulombe Maximilian Göbel Department of Economics, [email protected]. We thanks an anonymous referee who suggested the use of information flows in an earlier paper of ours. We are grateful to Boyuan Zhang for many helpful comments and suggestions.

( University of Pennsylvania ISEG - Universidade de Lisboa
First Draft: October 10, 2020
This Draft: December 12, 2024
)

Abstract

Stips, Macias, Coughlan, Garcia-Gorriz, and Liang (2016) (Nature Scientific Reports) use information flows (Liang,, 2008, 2014) to establish causality from various forcings to global temperature. We show that the formulas being used hinges on a simplifying assumption that is nearly always rejected by the data. We propose an adequate measure of information flow based on Vector Autoregressions, and find that most results in StipsEtAl2016 cannot be corroborated. Then, it is discussed which modeling choices (e.g., the choice of CO₂ series and assumptions about simultaneous relationships) may help in extracting credible estimates of causal flows and the transient climate response simply by looking at the joint dynamics of two climatic time series.

Keywords: information flows, vector autoregressions, global warming, climate econometrics

1 Introduction

Causality is fundamental to science. While causal statements can reasonably be made without hardship in controlled environments, things are far less straightforward when only observational data is available. Answering the question of what happens to $Y$ if one intervenes on $X$ is compromised by the simple fact that $X$ and $Y$ were not generated by exogenously modulating $X$ , but by (often) endogenous interactions between the two variables.

Yet, for many scientific interrogations of capital importance – like the relationship between greenhouse gases and global temperature, only observational data is available. How does different forcings cause global mean surface temperature anomalies (GMTA)? In a popular article, Stips, Macias, Coughlan, Garcia-Gorriz, and Liang (2016) (henceforth, SMCGL) set up to answer the question using information flow (IF) from one variable to another for bivariate stochastic dynamical systems – a methodology developed in Liang2008; Liang2014; Liang2015; Liang2016. Authors make grand claims about the technique being able to extract "true" and "rigorous" causality, which clash with the usually somber tone of causal analysis.

We show that the IF methodology will not extract causality for the near-universe of bivariate stochastic dynamic systems estimated on real data – with those of SMCGL included. In a nutshell, this occurs because Liang2008 formulas being used throughout, assume that a certain statistical quantity is 0 when it is not. This can be tested, and it is rejected almost all the time. More precisely, Liang2014 formula assumes that when conditioning on the past state of the system ( $X_{t-1}$ and $Y_{t-1}$ ), the remaining variation in $X_{t}$ and $Y_{t}$ (the innovations driving the system) are uncorrelated. In SMCGL, where the time unit $t$ is one year, this implies that forcings and GMTA are uncorrelated within the same year, conditional on last year’s values of both. That is unlikely to happen, and the data will testify to that. The assumption is presented as "linearity" (Liang2014), but it has little to do with it. Rather, it assumes that the system is identified, meaning that the joint dynamics of the time series data fall on the knife edge case where causality can indeed be claimed from the data without any external assumption (sims1980). The problem is that for discretely sampled time series, the assumption is almost always rejected by lower-frequency data.¹¹1In tawia2019time, IF are used on daily data, which can alleviate the problem if there are no intra-day relationships. This last condition is something that should be verified, not assumed. In short, SMCGL assume there to be no identification problem (in the sense of sims1980), and find no identification problem.

In this note, we detail the consequences of this high-standing omission on IF measures and SMCGL’s results. First, in simulations (using data generating processes proposed by Liang2014) of which we know the true causality structure, it is shown that IF will often conclude that $X_{t}$ is largely causing $Y_{t}$ , when in fact the reverse is true. Second, we reconsider a key part of SMCGL’s study where the authors investigate the causal structure between different forcings and GMTA. Using an appropriate methodology that accounts for the correlated innovations, it is found that in most instances, the data by itself cannot back SMCGL’s claims. That is, unlike what the authors have put forward, it is not possible (within this framework) to claim that many forcings are causing GMTA’s increase as a direct implication of the data. In other words, from the data alone, it is impossible to discriminate between certain forcings mostly causing GMTA or the reverse. External assumptions based on physical knowledge could remedy that. Or different data. We show that in the case of CO₂, results in accord with the scientific consensus can be obtained when using concentration directly rather than its radiative content.

The alternative methodology that we use are structural Vector Autoregressions (SVAR) which are simultaneous dynamic systems of equations. They can characterize a linear dynamic system in discrete time. The methodology was introduced to macroeconomics by sims1980 and is now so in many fields, ranging from neuroscience (chen2011vector) to climate (VARCTIC). To document its reliability for this application, we report implied transient climate response (TCR) estimates of our alternative methodology. We find that those (i) largely depend on the necessary assumptions made about the simultaneous (with one year) impact of CO₂ forcing on GMTA but (ii) are in the range of recent estimates (OttoEtAl2013; MontamatStock2020) if one assumes simultaneous causality running from CO₂ forcing to GMTA.

Section 2 briefly review IFs and explain where the problematic assumption occurs. It also discusses relevant notions of Vector Autoregressions (VARs, sims1980) as a comprehensive framework to think about causality in multivariate time series systems. Section 3 display IFs possibly spurious behavior using simulated data where the true causality is known. Section 4 revisit the question of causality between different forcings and GMTA using appropriate tools. Section 5 concludes.

2 Information Flows, Non-Innocuous Assumptions and VARs

In this section, we review the basics of IFs as applied in empirical work, pin down the problematic assumption, explain why it is harmful through the lenses of a VAR, and propose an alternative measure of IF based on the VAR.

Liang2008 considers the data generating process (DGP)

\displaystyle d\textbf{X}=\textbf{F}\left(\textbf{X},t\right)dt+\textbf{B}\left(\textbf{x},t\right)d\textbf{W}\;,

(1)

where $\textbf{X}=\left(X_{1},X_{2}\right)\in\mathbb{R}^{2}$ are the state variables and $\textbf{F}=\left(F_{1},F_{2}\right)$ . $\textbf{W}=\left(W_{1},W_{2}\right)$ is a standard Wiener process, with $\Delta\textbf{W}=\frac{d\textbf{W}}{dt}$ and $E\left(\Delta W_{i}\right)=0$ and $E\left(\Delta W_{i}\right)^{2}=\Delta t$ . B is a $2\times 2$ matrix and its entries $b_{ij}$ govern how perturbations instantaneously impact the system. At this point, the only non-innocuous assumption is that of a bivariate system. The information flow from $X_{2}$ to $X_{1}$ , $T_{2\rightarrow 1}$ , is then defined as

\displaystyle T_{2\rightarrow 1}=\frac{dH_{1}}{dt}-\frac{dH_{1\bcancel{2}}}{dt}

(2)

where $\frac{dH_{1}}{dt}$ is the evolution of the marginal entropy of $X_{1}$ and $\frac{dH_{1\bcancel{2}}}{dt}$ denotes $\frac{dH_{1}}{dt}$ where the spillovers from $X_{2}$ are excluded. After lengthy derivations and reasonable assumptions (like that $W$ ’s, soon to be called structural shocks, are uncorrelated) it is obtained that

\displaystyle T_{j\rightarrow i}=-E\left[\frac{1}{\rho_{i}}\frac{\partial\left(F_{i}\rho_{i}\right)}{\partial X_{i}}\right]+\frac{1}{2}E\left[\frac{1}{\rho_{i}}\frac{\partial^{2}\left(g_{ii}\rho_{i}\right)}{\partial X^{2}_{i}}\right]\;,

(3)

where $\rho$ is the joint probability density of variables $X_{i}$ and $X_{j}$ , and $\rho_{i}$ denotes the marginal density of series $X_{i}$ . Given that the model in (1) is not readily identified from the data, this is not yet operational. After further derivations and assuming $\mathbf{B}=\left[\begin{smallmatrix}b_{ii}&0\\ 0&b_{jj}\end{smallmatrix}\right]$ , Liang2014 transforms (3) into a workable formula made of empirical moments

\displaystyle T_{j\rightarrow i}=\frac{\sigma_{i,i}\,\sigma_{i,j}\,\sigma_{j,\Delta i}-\left(\sigma_{i,j}\right)^{2}\,\sigma_{i,\Delta i}}{\left(\sigma_{i,i}\right)^{2}\,\sigma_{j,j}-\sigma_{i,i}\,\left(\sigma_{i,j}\right)^{2}}

(4)

where $\sigma_{i,i}$ is the variance of $i$ , $\sigma_{ij,}$ the covariance between $i$ and $j$ , and $\sigma_{i,\Delta i}$ is the covariance between $i$ and the $k^{th}$ difference of $i$ (we follow Liang2015 and set $k=1$ ). However, the validity of this appealing formula rests on the seemingly technical assumption of a diagonal $\mathbf{B}$ . Our point is that this assumption is far from merely technical and very frequently wrong. Motivating the diagonal $\mathbf{B}$ among other assumptions, Liang2014 states

Since the dynamics is unknown, we first need to choose a model. As always, a linear model is the natural choice, at least at the initial stage of development.

The problem is that (i) of course, we must choose an empirical model, but we will try to avoid those that the data blatantly reject, (ii) $\mathbf{B}$ being diagonal has nothing to do with linearity, and (iii) there is no further potential "development" possible without this assumption, which, in effect, assumes the causal problem away. As a result, whenever the diagonal $\mathbf{B}$ is violated by the data, IFs – as currently used in empirical studies – provide spurious causality.

We now use a very popular framework, from macroeconomics, to think more clearly about $\mathbf{B}$ . Liang2008’s flow of simplifications and assumptions makes his once sophisticated (1) collapse to that of a bivariate VAR with 1 lag

\left[\begin{array}[]{l}X_{1,t}\\ X_{2,t}\end{array}\right]=\underbrace{\left[\begin{array}[]{l}c_{1}\\ c_{2}\end{array}\right]}_{\bm{c}}+\underbrace{\left[\begin{array}[]{ll}a_{11}^{1}&a_{12}^{1}\\ a_{21}^{1}&a_{22}^{1}\end{array}\right]}_{\bm{A}}\left[\begin{array}[]{l}X_{1,t-1}\\ X_{2,t-1}\end{array}\right]+\underbrace{\left[\begin{array}[]{ll}b_{11}&b_{12}\\ b_{21}&b_{22}\end{array}\right]}_{\mathbf{B}}\left[\begin{array}[]{l}\varepsilon_{1t}\\ \varepsilon_{2t}\end{array}\right].

(5)

The seemingly innocuous assumption is much less so within a statistical framework relating assumptions directly to observable quantities. Indeed, the uncorrelatedness of $W$ ’s, which here translates to that of $\epsilon$ , combined with $b_{21}=b_{12}=0$ have a very stark implication. Let the number of endogenous variables be $M$ and the number of lags $P$ . Provided imposing $M=2$ and $P=1$ is reasonable (4) is valid if and only if regression residuals $\left[\begin{smallmatrix}u_{1t}\\ u_{2t}\end{smallmatrix}\right]=\left[\begin{smallmatrix}b_{11}&b_{12}\\ b_{21}&b_{22}\end{smallmatrix}\right]\left[\begin{smallmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{smallmatrix}\right]$ are not cross-correlated. This is easily testable: one needs to estimate equations 1 and 2 separately by least squares, collect the residuals and calculate their correlation $\rho_{u}$ . If the latter is different from 0 (and this could be formally tested with a t-test), then Liang2008’s simple formula does not apply. While $\rho_{u}=0$ might be plausible in continuous time (or anything near it), this is a monumental stretch for data sampled at the yearly frequency (like in SMCGL). Fortunately, unlike true structural causality, the data can directly inform us on whether Liang2014’s formula is valid or not for specific pairs of time series.

2.1 Acknowledging the Identification Problem: Vector Autoregressions

This exposition follows closely VARCTIC. In time series analysis, the "identification" problem originates from simultaneity in the data. We can learn whether

X_{t-1}\rightarrow Y_{t}\quad\text{or}\quad Y_{t-1}\rightarrow X_{t}

is more plausible. This is predictive causality in the sense of granger1969. However, the data itself cannot discriminate between

X_{t}\rightarrow Y_{t}\quad\text{and}\quad X_{t}\leftarrow Y_{t}.

In words, a correlation between $X_{t}$ and $Y_{t}$ can be generated by two different causal structures. Liang2008’s solution is to assume such relationships do not exist — yet, they do. Within a VAR, the problem boils down to the need for identifying $C$ in

\displaystyle C\bm{y}_{t}=\Psi_{0}+\sum_{p=1}^{P}{\Psi_{p}}\bm{y}_{t-p}+\bm{\varepsilon}_{t},

(6)

where $\bm{y}_{t}$ is an $M$ by 1 vector – meaning the dynamic system incorporates $M$ variables. $\Psi_{p}$ ’s parameterizes how each of these variables is predicted by its own lags and lags of the $M-1$ remaining variables. $P$ is the number of lags being included. The matrix $C$ characterizes how the $M$ different variables interact contemporaneously — e.g., how total forcing affects GMTA within the same year (a time unit $t$ in SMCGL). Finally, the structural disturbances are mutually uncorrelated with mean zero:

\bm{\varepsilon}_{t}=\left[\varepsilon_{1,t},\enskip...\enskip,\varepsilon_{M,t}\right]\leavevmode\nobreak\ \sim\leavevmode\nobreak\ \,N\left(0,\leavevmode\nobreak\ I_{M}\right).

Equation (6) is the so-called structural form of the VAR, which cannot be estimated because $C$ is not identified by the data. SMCGL uses formula (4) which implicitly assumes a constrained version of (6) with $M=2$ , $P=1$ , and, most importantly, $C$ being a diagonal matrix. The validity of their analysis hinges upon those constraints not being rejected by the data. In section 4, we find that the data disagrees with at least two of them.

Equation (6) is a structural model which can be used to answer causal questions directly. However, the elements of $C$ are not plain regression coefficients, and cannot be estimated as such — they would be biased. It does not mean that they do not exist. The implications of their existence can best be understood by looking at an estimable "reduced-form" VAR

\displaystyle\bm{y}_{t}=\bm{c}+\sum_{p=1}^{P}{\Phi_{p}}\bm{y}_{t-p}+\bm{u}_{t},

(7)

where $\bm{c}=C^{-1}\Psi_{0}$ and $\Phi_{p}=C^{-1}\Psi_{p}$ are both regression coefficients obtained by running least squares separately on each equation. $\bm{u}_{t}$ are now regression residuals

\bm{u}_{t}=\left[u_{1,t},\enskip...\enskip,u_{M,t}\right]\leavevmode\nobreak\ \sim\leavevmode\nobreak\ \,N\left(0,\leavevmode\nobreak\ \Sigma_{u}\right)

which will be cross-correlated if the true $C$ is not diagonal. As mentioned earlier, Liang2014’s simplifying assumptions translate into $\Sigma_{u}=C^{-1^{\prime}}C^{-1}$ being diagonal which is often at odds with the data. All the parameters of (7) can be estimated with traditional methods, but the model is not "structural" and cannot be used for causal inference. Structural VARs, which aim at uncovering "structural" causality (instead of predictive causality à la granger1969) acknowledge $\Sigma_{u}$ being non-diagonal and provide ways to obtain $C$ . As a byproduct, they can procure valid measures of information flow.

The raw material of causal measures are $\bm{\varepsilon}_{t}$ , the structural disturbances entering the systems. However, those, like structural causality, are not directly extractable from the data: we only have $\bm{u}_{t}$ and translating those back to $\bm{\varepsilon}_{t}$ necessitates $C$ . The latter is not directly attainable, but can be retrieved using the mapping $\Sigma_{u}=C^{-1^{\prime}}C^{-1}$ . In words, this means the covariance matrix of regression residuals from (7) can be used as raw material to retrieve the "structural" $C$ . Mechanically, the identification problem emerges because there are many $C$ ’s satisfying $\hat{\Sigma}_{u}=C^{-1^{\prime}}C^{-1}$ — numerous causal structures deliver the same residuals’ cross-correlations.

The strategy we opt for is the traditional Choleski decomposition of $\hat{\Sigma}_{u}$ . This is one of many identification strategies for the VAR (kilian2017svar). But among the catalog of methodologies, the Choleski decomposition is certainly popular (if not the most popular) in applied work and is simple to implement. Mechanically, it provides a lower-triangular matrix $C$ , satisfying $\hat{\Sigma}_{u}=\mathbf{B}^{\prime}\mathbf{B}$ where $\mathbf{B}$ is the same as from equation (5), but with dimensions $M\times M$ . Its purpose is to transform cross-correlated regressions residuals $u_{t}$ (equation (7)) into uncorrelated structural shocks $\bm{\varepsilon}_{t}$ (equation (6)). This is done by reversing the relationship $\bm{u}_{t}=C\bm{\varepsilon}_{t}$ .

Uncorrelatedness is essential to study how GMTA responds to a given forcing, keeping everything else constant. Such a causal claim would be impossible when considering an impulse from correlated residuals $u_{t}$ as those always co-move. A Choleski decompostion of $\Sigma_{u}$ is one way to transform the observed $u_{t}$ into the unobserved fundamental shocks $\bm{\varepsilon}_{t}$ . The assumption underlying such an approach to orthogonalization is a causal ordering of shocks. The ordering restricts how variables interact with each other within the same year, conditional on the previous state of the system. In our bivariate setup, ordering forcing $j$ after GMTA implies that forcing cannot impact GMTA within the same year. Ordering GMTA after a given forcing implies that GMTA cannot impact forcing $j$ within the same year. SMCGL implicitly assumes both restrictions at the same time, which results in a rejected overidentified model. In contrast, when the model is just identified (when only one restriction is imposed), this choice cannot be validated by the data itself as it does not alter the likelihood.

When revisiting SMCGL’s empirical work, we consider both orderings and document how sensitive conclusions are to that necessary choice.²²2Of course, there are identification schemes outside of the family of ”orderings” obtained by Choleski decomposition, but those are beyond the scope of this paper – and unnecessary to make our main point. There are cases where the sign of net IF (a qualitative notion) between $j$ and GMTA depends on the ordering choice, and cases where it does not. For instance, we will find that whatever is assumed about $C$ in a proper VAR system, total forcing appears to be causing GMTA much more than the reverse. In other cases, like $CO_{2}$ -induced radiative forcing, this cannot be simply ruled out by the data.

2.2 An Adequate Measure of IF Based on the VAR

In a complete multivariate system like a VAR, the errors of the $h$ -steps ahead forecast $y_{t+h,m}$ can be related back to structural shocks – that is, the anomalies driving the dynamics of the system. For instance, we can compute the share of the forecast error variance of GMTA 10 years from now that is attributable to $CO_{2}$ anomalies. Intuitively, if $CO_{2}$ is causing GMTA to increase, its exogenous impulses should be an important driver of GMTA’s variance rather than GMTA anomalies themselves – a high information flow/transfer between the two variables. Accordingly, VAR forecast errors are

\bm{u}_{t+h}=\bm{y}_{t+h}-\hat{\bm{y}}_{t+h}=\sum_{h^{\prime}=0}^{h-1}\Theta_{h^{\prime}}\bm{\varepsilon}_{t+h-h^{\prime}}

where $\Theta_{h}$ is a function of $\Phi_{p}$ ’s and $C$ . See kilian2017svar for further details. The forecast error variance decomposition (FEVD) of the whole system at horizon $h$ can be analytically calculated using the entries of matrix $\Theta_{h^{\prime}}$ . Precisely,

\displaystyle\operatorname{MSPE}_{h}=E(\bm{u}_{t+h}\bm{u}{t+_{h}}^{\prime})=\sum_{h^{\prime}=0}^{h-1}\Theta_{h^{\prime}}\Theta_{h^{\prime}}^{\prime}\;.

An information flow "share" of $i$ for $j$ can be characterized by the share of forecast error variance of variable $j$ attributable to structural shocks of $i$ .³³3For a discussion on how to think about ”shocks” in a physical system, see VARCTIC. While those measures can be assessed for any horizon $h$ (which can contain useful information), we focus on the cumulative sum since it is a measure of total flow. For non-stationary VARs (like those of the empirical section) we use a horizon of $h=15$ years. In the case of stationary VAR process, FEVD measures quickly converge to their long-run value as $h$ increases. Moreover, in that case, our FEVD-based IF measures can be more directly motivated from the Wold representation of a VAR process (kilian2017svar). Finally, note that whenever the $b_{12}=b_{21}=0$ assumption is approximately true, the FEVD approach and IFs give qualitatively similar answers.

3 Simulations

This section showcases how IF can lead to a pretense of causal knowledge, with conclusions that are sometimes the exact opposite of the truth. We consider four data-generating processes (DGPs) where the true $P$ is one. The first two correspond to what is proposed in Liang2014. The last two are generic VAR positively autocorrelated processes with differing degrees of persistence. Following the notation above (equation (5)):


	$\displaystyle\bm{A}$	$\displaystyle=\left[\begin{array}[]{ll}0.5&0.5\\ 0&0.6\end{array}\right],\quad\bm{c}=\left[\begin{array}[]{ll}0.1\\ 0.7\end{array}\right]$		(8e)

\displaystyle\bm{A}

\displaystyle=\left[\begin{array}[]{ll}-0.5&0.9\\ -0.2&0.5\end{array}\right],\quad\bm{c}=\left[\begin{array}[]{ll}0\\ 0\end{array}\right]

(8j)

\displaystyle\bm{A}

\displaystyle=\left[\begin{array}[]{ll}0.5&-0.2\\ -0.5&0.25\end{array}\right],\quad\bm{c}=\left[\begin{array}[]{ll}0\\ 0\end{array}\right]

(8o)

\displaystyle\bm{A}

\displaystyle=\left[\begin{array}[]{ll}0.25&-0.1\\ -0.2&0.1\end{array}\right],\quad\bm{c}=\left[\begin{array}[]{ll}0\\ 0\end{array}\right]

(8t)

As highlighted in the previous section, IFs are calculated assuming $b_{ij}=0$ . However, as will be reported and formally tested in section 4, this is frequently not the case for most time series, especially those sampled at low frequencies. Using a controlled simulation environment, we study how IFs behave for values of $\rho_{u}:=corr(u_{t}^{X_{1}},u_{t}^{X_{2}})\in[-1,1]$ .⁴⁴4Variances of $u_{t}^{X_{1}}$ and $u_{t}^{X_{2}}$ are one. Note that IFs are invariant to $\rho_{u}\neq 0$ emerging from $b_{12}=0$ or $b_{21}=0$ , which is obviously problematic given the causal content of $b_{ij}$ .⁵⁵5It is important to note that while we consider cases where either $b_{12}=0$ or $b_{21}=0$ , there is a continuum of possibilities between those. We do so for simplicity of exposition (it makes the problem dichotomous). Moreover, setting either $b_{12}=0$ or $b_{21}=0$ to 0 corresponds to a causal ordering which is by far the most common identification scheme used in practice – and what we will be using in section 4.

As our indicator of true underlying information flow, taking into account $\rho_{u}\neq 0$ , we use the FEVD-based measure described earlier. Note that here, unlike the application to real data in section 4, we know what are the true $b_{ji}$ ’s. When $\rho_{u}\neq 0$ , the correlation must be attributed to either $b_{12}$ or $b_{21}$ , or a combination of both. In a bivariate setup, this amounts to setting $b_{ij}=0$ , and attributing $\rho\neq 0$ to $b_{ji}$ . Hence, it is possible to tell when standard IFs conclude falsehoods, because in simulations $b_{12}$ and $b_{21}$ are known.

In terms of notation, $\Upsilon^{i,j}_{i\rightarrow j}$ means the FEVD share at horizon $h=10$ with $i=1,2$ and $i\neq j$ . The superscripts in $\Upsilon^{i,j}_{i\rightarrow j}$ determine the true ordering, hence $i$ ordered before $j$ . The subscripts indicate that we plot the contribution of variable $i$ to the forecast error variance of variable $j$ at horizon $h=10$ . The simulations have shown that $h=10$ is sufficient for convergence.

Figure 1 plots the absolute normalized information flows (NIF), $\tau_{i\rightarrow j}$ for DGP(1) through DGP(4), with varying $\rho_{u}$ . A first observation is that $\tau_{i\rightarrow j}$ often varies significantly with $\rho_{u}$ , even when the very formula underlying it assumes $\rho_{u}=0$ . The relative strength of $\tau_{1\rightarrow 2}$ and $\tau_{2\rightarrow 1}$ can easily collapse to 0 or be much higher than what IFs report for $\rho_{u}=0$ .

The fact that IFs vary with $\rho_{u}$ could give false hopes that they account for simultaneous relationships. We conduct a simple exercise to show it does not. An interesting empirical question is whether $i$ is causing $j$ more than $j$ is causing $i$ , which is the appealing promise of IFs. This moves beyond testing for Granger Causality and aims at quantifying the flow of information. The shaded green region corresponds to the values of $\rho_{u}$ for which either the true underlying causality is that $X_{2}$ causes $X_{1}$ more than the reverse, irrespective of whether it emerges from either $b_{12}=0$ or $b_{12}=0$ . For those values of $\rho_{u}$ , when the true ordering is unknown, the qualitative conclusion about the sign of the net causality flow does not hinge on knowledge of $b_{ij}$ . Analogously, the blue shade represents values of $\rho_{u}$ for which $X_{1}$ causing more $X_{2}$ is unanimous among $b_{ij}$ configurations. White regions are values of $\rho_{u}$ where conclusions about the sign of net causality flow cannot be determined from the data, i.e., the unknown $b_{ij}$ is necessary to settle. In other words, given the data available to the modeler, $X_{2}$ causing more $X_{1}$ and vice versa are both equally likely, and sorting it out decisively implies making a successful guess on the true value of $b_{ij}$ .

If IFs were correctly calibrated, the green dotted line should only be above the blue in the shaded green regions, and the dotted blue line above the green one in blue regions. Figure 1 clearly demonstrates this not to be the case. For every DGP, there is a substantial range of values (white spaces) for which IFs clearly conclude $\tau_{i\rightarrow j}>\tau_{j\rightarrow i}$ when there is one pair of $(b_{ij},b_{ji})$ out of two for which the reverse is true. To address concerns about an horizon mismatch $h$ between IFs and FEVD, Figure 4 in Appendix B shows results for $h=2$ . As it turns out, things worsen for IFs (the blue and green regions shrink) since the simultaneity problem is less diluted in dynamics at short horizons.

In sum, those simulations (largely inspired by Liang2015 DGPs) show two things. First, the tractable IFs (from formula (4)) are functions of $\rho_{u}$ even though they assume it to be zero. This compromises any statement on the strength of causal links. Second, for any DGP, there is an underlying pair $\left(b_{ij},b_{ji}\right)$ for which IFs’ conclusion about the net causal flow is the opposite of reality. This is not only an artifact of large $|\rho_{u}|$ , as exemplified by DGP(4).

4 GMTA and Radiative Forcing Revisited

In this section, we first revisit SMCGL’s application of IFs to GMTA and forcings. Second, we look at what lies behind opaque FEVD measurements by reporting impulse response functions, and computing the transient climate response implied by simplistic bivariate VARs.

Global warming generated by man-made forcing is the prevalent generalization of the notion climate change. Numerous researchers have dedicated their works to this very relationship between anthropogenic forcing and constituents of global climate (Hansen2006; AndrewsEtAl2010; LiEtAl2013; notz2016observed) and the socio-economic consequences thereof (Nord14; ipcc2019). Despite overwhelming evidence for anthropogenic forcing being the main driver of global climate change (ipccGlobWarm2018), scientists have also observed that especially since the turn of the millennium, global temperature has plateaued despite ever-rising greenhouse gases and contrary to projections from key climate models.⁶⁶6Resolving this puzzle has led to reevaluate the role of oceans in the interplay of radiative forcing and the climatic response (Tollefson14; Marotzke2015).

IFs hinge exactly on these bivariate relationships, which are attractive in their clarity, but are certainly a stark oversimplification of a complex system. Nevertheless, to give empirical content to our critique of the methodology, we study the relationship between GMTA and 7 forcings from SMCGL using both IFs and our FEVD-based remedy. The sample of annual means ranges from 1850 - 2005.⁷⁷7Data on the Pacific Decadal Oscillation (PDO) ranges from 1900 - 2005. We follow SMCGL and take our data from therein referenced data providers. Table 1 summarizes the results.⁸⁸8The sample was restricted to 1850-2005 to match that of SMCGL. Table 5 reports results extending the sample to 2017.

In Table 1 we report estimated correlation between residuals of the bivariate VAR(1) implied by IFs. When forcing $P=1$ , in 5 cases out of 7, the null that $\hat{\rho}_{u}^{i,GMTA}=0$ is rejected at least at the 10% level. When choosing $P$ with Bayesian Information Criterion (BIC), only $\hat{\rho}_{u}^{\text{Solar},GMTA}=0$ cannot be rejected. Hence, as repeatedly mentioned in the text, IFs assume something that can be, and is rejected by the data. Moreover, in the light of simulations carried earlier, the qualitative and quantitative insights from IFs are often spurious under such conditions.

Table 1: Empirical Results for the Bivariate Relationship
Between Various Forcings and GMTA

Correlation Normalized IF (IF $\times$ 100) FEVD Lags ( $P$ ) Correlation of Residuals ( $\rho_{u}$ ) Ordering: $i$ , GMTA Ordering: GMTA, $i$ $i$ $\rightarrow$ GMTA GMTA $\rightarrow$ $i$ ${i\rightarrow\text{GMTA}}$ ${\text{GMTA}\rightarrow i}$ ${i\rightarrow\text{GMTA}}$ ${\text{GMTA}\rightarrow i}$ Total Forcing 0.73 30.6 20.8 4 0.23*** 47.4 13.0 28.0 25.4 (15.3) (11.1) 1 0.29*** 51.4 9.6 27.6 28.7 Anthropogenic 0.86 39.8 -20.0 4 -0.19** 6.5 3.7 3.8 13.4 (35.7) (-0.6) 1 -0.19** 5.0 5.8 2.2 17.1 CO₂ - ERF (W/m ${}^{\text{2}}$ ) SMCGL 0.86 39.6 -15.2 4 - 0.14* 6.5 8.4 5.6 17.4 (35.1) (-0.4) 1 -0.15* 2.8 4.7 1.1 12.8 Aerosol -0.82 35.9 -24.5 4 -0.19** 2.9 0.6 2.1 1.6 (24.3) (-0.4) 1 -0.10 3.5 4.0 1.8 1.2 Solar 0.49 13.5 6.7 8 0.05 8.5 1.6 6.6 2.5 (3.8) (2.3) 1 0.08 16.6 4.2 12.3 6.8 Volcanic 0.09 0.9 -0.5 4 0.18** 10.9 0.8 3.1 3.6 (0.2) (-0.4) 1 0.20** 7.1 1.4 0.6 3.7 PDO 0.17 -1.2 -0.6 4 0.35*** 31.1 0.9 6.3 10.3 (-0.2) (-0.5) 1 0.34*** 9.1 0.5 0.2 10.7 CO₂ (Mt/yr) 0.82 37.1 -4.3 2 -0.10 8.9 2.1 10.7 0.6 (27.0) (-0.0) 1 -0.05 4.2 0.0 4.4 0.4 CO₂ (W/m ${}^{\text{2}}$ ) 0.86 39.5 -14.0 4 0.23*** 5.2 16.8 2.9 4.7 (34.5) (-0.3) 1 0.07 1.6 4.1 0.9 1.8

Notes: $i$ corresponds to the type of radiative forcing, listed in the left most column. The second column ("Correlation") documents the correlation between GMTA and variable $i$ . FEVD values are taken at horizon $h=15$ , which translates into the contribution of variable $i$ in the variance of the forecast error of variable $j$ a decade and a half after the in-sample end date. Numbers in bold underline the highest absolute causal flow among a ( $i$ ,GMTA) pair for a given measure. "*", "**", and "***" means that the null of the residuals cross-correlation of residuals is rejected at the 10%,5% and 1% level respectively.

Naturally, we concentrate on FEVD results which correctly account for $\rho_{u}\neq 0$ . However, setting $\rho_{u}=0$ is only one of the empirical shortcomings of the empirical IF formula — it also sets $P$ , the number of lags of each $X_{t}$ , to be 1. Clearly, it is empirically plausible that $X_{i,t-2}$ or $X_{j,t-4}$ may have an impact on $X_{i,t}$ beyond what is channeled by single year lags ( $X_{i,t-1}$ and $X_{j,t-1}$ ) . In other words, $P=1$ is extremely restrictive on climatic dynamics. When choosing $P$ with BIC, results align better with prior scientific knowledge. Nonetheless, for the sake of completeness, we report both results ( $P=1$ and $P=P^{*}$ , with $P^{*}$ being BIC’s choice).

With $P=P^{*}$ , the fact that total forcing causes GMTA more than the reverse is without appeal. Nevertheless, the quantitative answer is, again, highly dependent on the ordering choice. After 15 years, total forcing anomalies are responsible for explaining between 28.0% and 47.4% of that of GMTA – depending on the preferred ordering. The net causal flow being higher from aerosol and solar to GMTA are also unanimous, but much smaller. Indecisive results are reported for Anthropogenic, Volcanic and PDO. Overall, Table 1 suggests that the data itself does not support the strong qualitative conclusions of SMCGL for their CO₂ measure and Anthropogenic.

When $P$ is forced to one, as in IFs, inconclusive results are reported for total forcing, aerosol, volcanic and PDO. $P=1$ specifications, irrespective of the ordering⁹⁹9There are other identification schemes which cannot be cast as ”orderings”. That is, there are rotations of $\bm{u}_{t}$ (even when $\rho_{u}=0$ ) giving different structural shocks. Hence, while the two orderings span a lot of possibilities (and those traditionally considered first in practice), they do represent the universe of rotations of $\bm{u}_{t}$ into $\bm{\epsilon}_{t}$ ., conclude that GMTA is causing more $CO_{2}$ and Anthropogenic than the reverse. Only solar forcing results are unanimous, and in line with what climatic common wisdom suggests. Note that it also agrees with results from original IFs, which is not surprising given that $\hat{\rho}_{Solar,GMTA}$ is in the close vicinity of 0. However, choosing a proper $P$ nearly doubles the share of forecast errors attributable to solar forcing.

4.1 What’s up with CO₂?

Results for CO₂ are rather surprising. Irrespective of the ordering, GMTA is reported to cause CO₂ more than the reverse, a finding contradicting SMCGL’s results and common wisdom.¹⁰¹⁰10Hansen2006 states that global warming started to accelerate not prior to the 1970s. Only after 1975 did the global temperature increase by approximately 0.2^∘C per decade. However, SMCGL’s CO₂ measure exhibit ill dynamic behavior that cannot possibly be that of a natural quantity. This becomes obvious when plotting their CO₂-ERF measure in first difference: it evolves according to a series of dichotomous jumps. This CO₂-ERF series, in which ERF stands for effective radiative forcing, is a concept presented in MyhreEtAl2013. The series itself is based on EtminanEtAl2016.

Given the strange jumping behavior of the CO₂-ERF series, and to make our calculations comparable to other findings in the literature (BrunsEtAl2020; MontamatStock2020), we derive RF from CO₂ concentration as follows: we use the well established Meinshausen2017 data set on annual global means of CO₂ concentration, measured in parts per million (ppm). We follow MyhreEtAl2013 and transform the increase in CO₂ concentration in year $t$ measured in ppm relative to the concentration in a given base year, CO_2,base, into radiative forcing, $RF^{\text{CO}_{2}}_{t}$ – measured in W/m ${}^{\text{2}}$ – as follows: $RF^{\text{CO}_{2}}_{t}=5.35\times\text{ln}\left(\text{CO}_{2,t}/\text{CO}_{2,base}\right)$ . We use 1850 as base year following BrunsEtAl2020.

As it turns out, considering this less contentious CO₂ series does not resolve the apparently counterintuitive finding that GMTA explains a larger portion of the forecast error variance of CO₂ than vice versa. Such a finding has also been reported using a different methodology in koutsoyiannis2020atmospheric. We explore a last avenue, that of using annual CO₂ emissions rather than $RF^{\text{CO}_{2}}_{t}$ . This last attempt is successful in reconciling the FEVD approach with the traditional wisdom that CO₂ is causing GMTA "more" than the reverse. This finding is independent of the ordering choice.

In sum, based on this particular time series evidence, the causal link between certain forcings and GMTA remains disputable. What is less disputable are the effects of total forcing, CO₂ emissions, and solar forcing which all explain an important share of GMTA anomalies independently of arbitrary ordering preferences. Nonetheless, we see such analyses as rather primitive and potentially misleading. For instance, $X$ causing ”more” $Y$ does not mean that the reverse causality is not quantitatively important, or climatologically relevant. We now turn to a more promising way to extract meaning out of selected bivariate VARs.

4.2 Impulse Response Functions

To open the black box of those rather abstract measurements, we report in Figure 3 impulse response functions (IRF) for the bivariate models of CO₂-GMTA and total forcing-GMTA. Since sims1980, the dominant approach for studying the properties of the VAR around its deterministic path has been IRFs to structural shocks. Their dynamic effect can be analyzed as that of a randomly assigned treatment because those have been transformed to be uncorrelated, which provides the "keeping everything else constant" interpretation.

The IRF of a variable $i$ to a one standard deviation shock of $\varepsilon_{j,t}$ is defined as

\displaystyle IRF(j\rightarrow i,h)=E(y_{i,t+h}|\bm{y_{t}},\varepsilon_{t,j}=\sigma_{\varepsilon_{j}})-E(y_{i,t+h}|\bm{y_{t}},\varepsilon_{t,j}=0).

(9)

Thus, it is the expected difference, $h$ months after "impact", between a bivariate system that responded to an unexpected CO₂ increase, and the same system where no such increase occurred. In a linear VAR, the above takes a closed-form solution in terms of the matrices from (6). Models are now estimated with Bayesian methods, optimizing the hyperparameters of a standard Minnesota prior, and choosing the number of lags as reported in the gray-shaded rows of Table 1. The primary motivation is to obtain valid inference even in the presence of nonstationarity. Point estimates are nearly identical to that of OLS. For details on such choices in the context of climate data and a more thorough (yet introductory) treatment of IRFs, see VARCTIC.

In Figure 2 we show the effect of an unexpected increase in annual emissions on GMTA, while Figure 3 shows the response of GMTA radiative forcing generated by an unexpected rise in cumulative emissions.

Qualitatively, the impact of annual emissions and cumulative CO₂-induced forcing shocks on global temperature is vastly similar. In accord with findings in VARCTIC for the effect of CO₂ on Arctic sea ice extent, the impact of total forcing and CO₂ shocks is highly durable. This time, it is on GMTA rather than sea ice extent. As reflected in Figures 2 and 3, this result is qualitatively independent of the ordering choice. In both cases (and for both forcing variables), the effect of forcing takes about two years to completely settle in. However, it is clear that the reported short-run impact strongly depends on the identification assumptions, which SMCGL completely abstract from.

Whether the slightly negative short-run response of GMTA in the first panel of Figure 2 favors the ordering $\left\{\text{{GMTA}},\,\text{{CO${}_{2}$}}\right\}$ over $\left\{\text{{CO${}_{2}$}},\,\text{{GMTA}}\right\}$ is debatable: ForsterEtAl2020 find the reduction in global CO₂ emissions during the COVID-19 pandemic to have resulted in a short-run rise of global temperature. The key mechanism is a decline in the cooling-effect of aerosols as a result of less SO₂ emissions. The authors project a rise in global temperature over the first 24 months following the pandemic-induced reduction in global nitrogen oxide (NO_x) emissions.¹¹¹¹11Especially NO₂ is found to be well-correlated with CO₂ emissions (ForsterEtAl2020).

Lastly, a robustness check. Our variables are clearly nonstationary. While this does not pose a problem for Bayesian inference, it is nevertheless natural to wonder if results would be significantly altered by including a time trend. Accordingly, the second column reports the same IRFs, but for VAR specifications augmented with trends. Those show that adding such an exogenous explanatory variable does not change the dynamics of a GMTA response to an unexpected shock to CO₂. The addition of a trend to the bivariate model of GMTA and total forcing, allows GMTA to slowly revert to a lower impact – which is nevertheless highly persistent.

4.3 Are VAR Estimates Quantitatively Reasonable?

The transient climate response (TCR) is a frequently used metric to measure the impact of rising atmospheric CO₂ concentration on temperature. It is not only an indication of the trajectory of ongoing climate change, but also serves as a benchmark to evaluate the results of climate model projections (PhillipsLeirvikStorelvmo2020). The TCR is defined as the increase in temperature, between $h_{0}$ and $h_{T}$ , under the assumption that CO₂ increases annually by 1%. $h_{T}$ is defined as that point in time when – due to the steady annual increase of 1% – CO₂ concentration is twice as high as at date $h_{0}$ (Pretis2020; MontamatStock2020). Such a doubling of CO₂ would occur approximately after 70 years (OttoEtAl2013). Following the transformation of ppm to W/m ${}^{\text{2}}$ as suggested by MyhreEtAl2013 and MontamatStock2020, a doubling of CO₂ under an annual increase in concentration of 1% would generate a radiative forcing of $5.35\times\text{ln}\left(2\right)\approx 3.7\text{ W/m}^{2}$ .¹²¹²12An annual increase of 1% in atmospheric CO₂ concentration results in a doubling of CO₂ after approximately 70 years, which is described more formally as: $h\times ln\left(\frac{1.01}{1}\right)=ln\left(2\right)\;,$ for $h\approx 70$ (MontamatStock2020).

Typical estimates for TCR fall within a range of 1^∘C-2.5^∘C with a 66% probability, as summarized in the IPCC 5th Assessment Report (BindoffEtAl2013). More recent estimates are well aligned with this range. BrunsEtAl2020 reports a point estimate of TCR ranging from 1.17^∘C to 1.85^∘C, depending on the type of data and model specification. Pretis2020 embeds a two-component energy balance model into a cointegrated vector autoregressive model. His estimates vary across model specification and range from 1.24^∘C to 1.38^∘C. PhillipsLeirvikStorelvmo2020 report a global transient climate sensitivity of 2.05^∘C. The IV regression in MontamatStock2020 allows for a differentiation of TCR measurements across different horizons, normalized to giving 70-year-horizon estimates. Their point estimates range in the neighborhood of 1.5^∘C within a 95% confidence interval of roughly 0.9^∘C to 2.1^∘C.

Despite being much more simplistic than the models of the aforementioned works on TCR, the bivariate VARs do also allow to estimate the impact of a doubling of CO₂ on temperature. Here we make use of the concept of IRFs, as presented in Equation (9). In particular, we estimate the impact on temperature when $RF^{\text{CO}_{2}}$ increases by one standard deviation of its reduced-form residuals, $\sigma_{\varepsilon_{j}}$ , where $j=RF^{\text{CO}_{2}}$ , from a bivariate VAR of $RF^{\text{CO}_{2}}$ and GMTA. Recalling the definition of TCR, a doubling of CO₂ concentration, which is achieved by an annual increase of 1% in atmospheric CO₂ concentration, generates an additional radiative forcing of approximately $5.35\times\text{ln}\left(2\right)\approx 3.7\text{ W/m}^{2}$ . In our case, the shock $\sigma_{\varepsilon_{j}}$ to CO₂ is a one-time event at horizon $h=0$ , but its effects are distributed over horizons $h=1,2,3,...,H$ . This allows us to measure the cumulative increase, $\Xi_{i}$ , in $i\in\left\{RF^{\text{CO}_{2}},GMTA\right\}$ generated by $\sigma_{\varepsilon_{j}}$ , where $j=RF^{\text{CO}_{2}}$ , at any horizon $h$ :

\Xi_{j,h}=\sum^{h}_{s=0}IRF(j\rightarrow i,s)\enskip,

where $IRF(j\rightarrow i,s)$ is defined as in Equation (9). Adapting the formula of OttoEtAl2013, we estimate TCR_h as the increase in GMTA at horizon $h$ as follows:

\displaystyle TCR_{h}=\Xi_{GMTA,h}\times\frac{5.35\times\text{ln}\left(2\right)}{\Xi_{RF^{\text{CO}_{2}},h}}\enskip,

(10)

where $\Xi_{GMTA,h}$ is the cumulative increase in global temperature at horizon $h$ , resulting from the shock $\sigma_{\varepsilon_{j}}$ , where $j=RF^{\text{CO}_{2}}$ , at horizon $h=0$ . Likewise, $\Xi_{RF^{\text{CO}_{2}},h}$ is the cumulative increase in radiative forcing of CO₂ at horizon $h$ , resulting from the shock $\sigma_{\varepsilon_{j}}$ , where $j=RF^{\text{CO}_{2}}$ , at horizon $h=0$ .

In Table 2 we present median point estimates of TCR_h for $h=20$ and $h=70$ (as in MontamatStock2020) from bivariate VARs of $RF^{\text{CO}_{2}}$ and GMTA – with and without an exogenous time trend. Thus, Table 2 reports the TCR corresponding to the model specifications in Figure 3 (a) and (b). That is, we use Bayesian estimation techniques and deploy a Minnesota prior on our parameter estimates. Our VAR has four lags and the estimation is based on annual observations between 1850 and 2005.

Table 2: Transient Climate Response

Ordering Without Trend With Trend TCR₂₀ TCR₇₀ TCR₂₀ TCR₇₀ CO₂, GMTA 1.99^∘C 2.06^∘C 2.17^∘C 2.58^∘C GMTA, CO₂ 0.57^∘C 1.82^∘C 0.85^∘C 2.39^∘C

The main message of Table 2 is twofold: first, the ordering of the variables heavily influences the final results for $\text{TCR}_{20}$ , demonstrating the importance of respecting the possibility of cross-correlated residuals. Mechanically, the discordance brought up by the ordering choice vanishes at much longer horizons, and, as a result, $\text{TCR}_{70}$ estimates are largely similar. However, at that horizon, it is the choice of whether or not to include a trend that can alter results significantly. Second, even though the TCR_h point estimates of the trend models are rather located at the upper bound of the IPCC range of 1^∘C-2.5^∘C, a simplistic bivariate VAR model including a constant and a time trend as additional exogenous regressors, is capable of providing a reasonable approximation of the rise in global mean temperature, triggered by a doubling of atmospheric CO₂ concentration.

TCR estimates can be helpful in choosing which ordering is most plausible. For instance, only by ordering CO₂ first do we get $\text{TCR}_{20}$ to fall within the IPCC range. IRFs can also help sort things out. Ordering CO₂ second leads to a surprisingly lasting negative effect of CO₂ shocks on GMTA. An increasingly popular approach to VAR identification in macroeconomics is to use sign restrictions, where implausible IRF draws (based on economic theory) are tossed out (Uhlig2005). This dispenses the researcher from formulating a likely contentious causal ordering of variables. In a climate application, one could identify the VAR by rejecting specifications generating implausible $IRF(\text{CO${}_{2}$}\rightarrow\text{GMTA},h)$ or TCRs. Applying this sort of reasoning leads us to favor – to nobody’s surprise – the specification where simultaneous causality runs from CO₂ to GMTA. However, it is important to stress that this choice is obtained from prior knowledge on what is deemed reasonable and what is not, rather than our two time series.

5 Conclusion

This note is a cautionary tale about how seemingly innocuous simplifying assumptions can go wrong – especially when they are formulated without consulting the data. IFs, as proposed by SMCGL, is a concept that hinges on the assumption of zero correlation between the residuals of a bivariate VAR(1) process. In discrete time, especially with observations at lower frequencies, such an assumption is most often not justified. Both stylized simulations and an empirical application in the form of the transient climate response demonstrate that being negligent about cross-correlated residuals can lead to markedly different outcomes. Our results show that already in a bivariate system of CO₂ and GMTA, the resulting TCR depends on how one deals with correlated residuals.

FEVDs provide an alternative to IFs. Albeit being a decisive improvement over IFs, FEVDs are as good as their underlying statistical model. To avoid departing too much from the SMCGL’s framework, we only considered bivariate models. Climate systems obviously comprise numerous additional variables. wilson2010atmospheric considers four in a VAR setup, but there could be many more. Additionally, the dynamics were assumed to be linear and time invariant, an approximation that should be eventually tested. estrada2013statistically trend "breaks" model is one way to do it and they report results pointing in the same direction as ours. However, with machine learning tools becoming increasingly common use, more flexible alternatives could be used to yield further insights. Finally, we considered simplistic identification schemes which implied a causal ordering of variables. There is a plethora of more sophisticated schemes available (kilian2017svar) and those could be used in future work, especially when moving beyond bivariate systems. Another – simpler – avenue is the use of data sampled at higher frequencies (like daily) which, by construction, makes the simultaneity problem much less of a Damocles sword.

References

Andrews et al., (2010) Andrews, T., Forster, P., Boucher, O., Bellouin, N., and Jones, A. (2010). Precipitation, Radiative Forcing and Global Temperature Change. Geophysical Research Letters, 37(14).
Bindoff et al., (2013) Bindoff, N., Stott, P., AchutaRao, P., Allen, M., Gillett, N., Gutzler, D., Hansingo, K., Hegerl, G., Hu, Y., Jain, S., Mokhov, I., Overland, J., Perlwitz, J., Sebbari, R., and Zhang, X. (2013). Chapter 10 - Detection and Attribution of Climate Change: From Global to Regional. In Climate Change 2013: The Physical Science Basis. IPCC Working Group I Contribution to AR5. Cambridge University Press, Cambridge.
Bruns et al., (2020) Bruns, S., Csereklyei, Z., and Stern, D. (2020). A Multicointegration Model of Global Climate Change. Journal of Econometrics, 214(1):175–197. Annals Issue: Econometric Models of Climate Change.
Chen et al., (2011) Chen, G., Glen, D. R., Saad, Z. S., Hamilton, J. P., Thomason, M. E., Gotlib, I. H., and Cox, R. W. (2011). Vector autoregression, structural equation modeling, and their synthesis in neuroimaging data analysis. Computers in biology and medicine, 41(12):1142–1155.
Estrada et al., (2013) Estrada, F., Perron, P., and Martínez-López, B. (2013). Statistically derived contributions of diverse human influences to twentieth-century temperature changes. Nature Geoscience, 6(12):1050–1055.
Etminan et al., (2016) Etminan, M., Myhre, G., Highwood, E., and Shine, K. (2016). Radiative Forcing of Carbon Dioxide, Methane, and Nitrous Oxide: A significant Revision of the Methane Radiative Forcing. Geophysical Research Letters, 43(24):12,614–12,623.
Forster et al., (2020) Forster, P., Forster, H., Evans, M., Gidden, M., Jones, C., Keller, C., Lamboll, R., Quéré, C., Rogelj, J., Rosen, D., Schleussner, C.-F., Richardson, T., Smith, C., and Turnock, S. (2020). Current and Future Global Climate Impacts Resulting from COVID-19. Nature Climate Change, 10(10):913–919.
Goulet Coulombe and Göbel, (2021) Goulet Coulombe, P. and Göbel, M. (2021). Arctic amplification of anthropogenic forcing: A vector autoregressive analysis. Journal of Climate, Forthcoming.
Granger, (1969) Granger, C. W. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, pages 424–438.
Granville Tunnicliffe, (2015) Granville Tunnicliffe, W. (2015). Atmospheric co2 and global temperatures: the strength and nature of their dependence. Working Paper.
Hansen et al., (2006) Hansen, J., Sato, M., Ruedy, R., Lo, K., Lea, D., and Medina-Elizade, M. (2006). Global Temperature Change. Proceedings of the National Academy of Sciences, 103(39):14288–14293.
Kilian and Lütkepohl, (2017) Kilian, L. and Lütkepohl, H. (2017). Structural vector autoregressive analysis. Cambridge University Press.
Koutsoyiannis and Kundzewicz, (2020) Koutsoyiannis, D. and Kundzewicz, Z. W. (2020). Atmospheric temperature and co2: Hen-or-egg causality? Sci, 2(4):83.
Li et al., (2013) Li, C., Notz, D., Tietsche, S., and Marotzke, J. (2013). The Transient versus the Equilibrium Response of Sea Ice to Global Warming. Journal of Climate, 26(15):5624–5636.
Liang, (2008) Liang, X. (2008). Information Flow within Stochastic Dynamical System. Phys. Rev. E, 78(5):031113.
Liang, (2014) Liang, X. (2014). Unraveling the Cause-Effect Relation between Time Series. Phys. Rev. E, 90:052150.
Liang, (2015) Liang, X. (2015). Normalizing the Causality between Time Series. Phys. Rev. E, 92:022126.
Liang, (2016) Liang, X. (2016). Information Flow and Causality as rigorous Notions ab initio. Phys. Rev. E, 94:052201.
Marotzke and Forster, (2015) Marotzke, J. and Forster, P. (2015). Forcing, Feedback and Internal Variability in Global Temperature Trends. Nature, 517(7536):565–570.
Masson-Delmotte et al., (2018) Masson-Delmotte, V., Zhai, P., Pörtner, H.-O., Roberts, D., Skea, J., Shukla, P., Pirani, A., Moufouma-Okia, W., Péan, C., Pidcock, R., Connors, S., Matthews, J.B.R.and Chen, Y., Zhou, X., Gomis, M., Lonnoy, E., Maycock, T., Tignor, M., and Waterfield, T. (2018). IPCC, 2018: Global Warming of 1.5^∘C. An IPCC Special Report on the Impacts of Global Warming of 1.5^∘C above Pre-Industrial Levels and related Global Greenhouse Gas Emission Pathways, in the Context of Strengthening the Global Response to the Threat of Climate Change, Sustainable Development, and Efforts to Eradicate Poverty. In press.
Meinshausen et al., (2017) Meinshausen, M., Vogel, E., Nauels, A., Lorbacher, K., Meinshausen, N., Etheridge, D., Fraser, P., Montzka, S., Rayner, P., Trudinger, C., Krummel, P., Beyerle, U., Canadell, J., Daniel, J., Enting, I. G., Law, R., Lunder, C., O’Doherty, S., Prinn, R., Reimann, S., Rubino, M., Velders, G., Vollmer, M., Wang, R., and Weiss, R. (2017). Historical greenhouse gas concentrations for climate modelling (cmip6). Geoscientific Model Development, 10(5):2057–2116.
Montamat and Stock, (2020) Montamat, G. and Stock, J. (2020). Quasi-experimental estimates of the transient climate response using observational data. Climatic Change, 160(3):361–371.
Myhre et al., (2013) Myhre, G., Shindell, D., Bréon, F.-M., Collins, W., Fuglestvedt, J., Huang, J., Koch, D., Lamarque, J.-F., Lee, D., Mendoza, B., Nakajima, T., Robock, A., Stephens, G., Takemura, T., and Zhang, H. (2013). Anthropogenic and Natural Radiative Forcing. In Stocker, T., Qin, D., Plattner, G.-K., Tignor, M., Allen, S. K., Doschung, J., Nauels, A., Xia, Y., Bex, V., and Midgley, P., editors, Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change, pages 659–740. Cambridge University Press, Cambridge, UK.
Nordhaus, (2014) Nordhaus, W. (2014). Estimates of the social cost of carbon: Concepts and results from the dice-2013r model and alternative approaches. Journal of the Association of Environmental and Resource Economists, 1(1/2):273–312.
Notz and Stroeve, (2016) Notz, D. and Stroeve, J. (2016). Observed arctic sea-ice loss directly follows anthropogenic co2 emission. Science, 354(6313):747–750.
Otto et al., (2013) Otto, A., Otto, F., Boucher, O., Church, J., Hegerl, G., Forster, P., Gillett, N. P., Gregory, J., Johnson, G., Knutti, R., Lewis, N., Lohmann, U., Marotzke, J., Myhre, G., Shindell, D., Stevens, B., and Allen, M. (2013). Energy Budget Constraints on Climate Response. Nature Geoscience, 6(6):415–416.
Phillips et al., (2020) Phillips, P., Leirvik, T., and Storelvmo, T. (2020). Econometric Estimates of Earth’s Transient Climate Sensitivity. Journal of Econometrics, 214(1):6–32. Annals Issue: Econometric Models of Climate Change.
Pretis, (2020) Pretis, F. (2020). Econometric Modelling of Climate Systems: The Equivalence of Energy Balance Models and Cointegrated Vector Autoregressions. Journal of Econometrics, 214(1):256–273. Annals Issue: Econometric Models of Climate Change.
Shukla et al., (2019) Shukla, P., Skea, J., Calvo Buendia, E., Masson-Delmotte, V., Pörtner, H.-O., Roberts, D., Zhai, P., Slade, R., Connors, S., van Diemen, R., Ferrat, M., Haughey, E., Luz, S., Neogi, S., Pathak, M., Petzold, J., Portugal Pereira, J., Vyas, P., Huntley, E., Kissick, K., Belkacemi, M., and Malley, J. (2019). Climate Change and Land: an IPCC Special Report on Climate Change, Desertification, Land Degradation, Sustainable Land Management, Food Security, and Greenhouse Gas Fluxes in Terrestrial Ecosystems. In press.
Sims, (1980) Sims, C. A. (1980). Macroeconomics and reality. Econometrica: journal of the Econometric Society, pages 1–48.
Stips et al., (2016) Stips, A., Macias, D., Coughlan, C., Garcia-Gorriz, E., and Liang, X. (2016). On the Causal Structure between CO2 and Global Temperature. Scientific Reports, 6:21691.
Tawia Hagan et al., (2019) Tawia Hagan, D. F., Wang, G., San Liang, X., and Dolman, H. A. (2019). A time-varying causality formalism based on the liang–kleeman information flow for analyzing directed interactions in nonstationary climate systems. Journal of Climate, 32(21):7521–7537.
Tollefson, (2014) Tollefson, J. (2014). Climate Change: The Case of the Missing Heat. Nature, 505:276–278.
Uhlig, (2005) Uhlig, H. (2005). What are the effects of monetary policy on output? results from an agnostic identification procedure. Journal of Monetary Economics, 52(2):381 – 419.

Appendix A Data Sources

For the empirical part of Section 4, we refer to the variables presented in Table 1 of SMCGL. Due to the sparse description of the data sources, our data set reduces to the annual global mean surface temperature anomalies (GMTA) and seven other representatives of radiative forcing. For five variables we could match the reported correlation coefficients as well as the IF values of SMCGL, Table 1. For total forcing and solar these measures do not coincide. For the analysis in Section 4.2 and 4.3, we use the CO₂ series based on Meinshausen2017. We transform ppm into W/m ${}^{\text{2}}$ as described in Section 4.

Table 3: List of Variables

Abbreviation	Description	Data Source
Total Forcing	annual; 1850-2005	KNMI Climate Explorer
Anthropogenic	annual; 1850-2005	KNMI Climate Explorer
CO₂-ERF (W/m ${}^{\text{2}}$ )	annual; 1850-2005	KNMI Climate Explorer
Aerosol	annual; 1850-2005	KNMI Climate Explorer
Solar	annual; 1850-2005	KNMI Climate Explorer
Volcanic	annual; 1850-2005	KNMI Climate Explorer
PDO	annual averages of monthly observations; 1900-2005	KNMI Climate Explorer
GMTA	annual; global; 1900-2005	HadCRUT4
CO₂ (Million Tonnes / Year)	annual; global production-based emissions; 1850-2005	Our World in Data – not in use
CO₂ (ppm)	annual; global; 1850-2014 – based on Meinshausen2017 annual; global; 2015-2017 – based on NOAA-ESRL	1850-2014: IAC ETH Zürich 2015-2017: NOAA-ESRL

Appendix B Additional Simulation Results

In Figure 4 below, we compare NIF and FEVD_h=2. One might be concerned that results in Figure 1 suffer from a horizon mismatch between NIF and FEVD_h=10. Results suggest that reducing $h$ worsens NIFs’ problems by shrinking the "safe" regions. This is intuitive, the effect of assumptions on simultaneous relationships gets milder as we get further from $h=0$ .

Appendix C Additional Empirical Results

Table 4: Empirical Results for the Bivariate Relationship
Between Various Forcings and GMTA
Sample Period: 1850-2017

Correlation Normalized IF (IF $\times$ 100) FEVD Lags ( $P$ ) Correlation of Residuals ( $\rho_{u}$ ) Ordering: $i$ , GMTA Ordering: GMTA, $i$ $i$ $\rightarrow$ GMTA GMTA $\rightarrow$ $i$ ${i\rightarrow\text{GMTA}}$ ${\text{GMTA}\rightarrow i}$ ${i\rightarrow\text{GMTA}}$ ${\text{GMTA}\rightarrow i}$ Total Forcing 0.82 36.8 29.5 4 0.23*** 48.0 15.6 28.3 29.3 (17.2) (15.3) 1 0.29*** 55.7 14.4 30.0 36.6 Anthropogenic 0.91 43.7 -18.9 4 -0.16** 6.5 5.0 4.9 13.5 (39.9) (-0.6) 1 -0.15** 4.2 5.1 2.4 13.7 CO₂ - ERF (W/m ${}^{\text{2}}$ ) SMCGL 0.91 43.5 -13.0 4 -0.10 5.75 9.2 5.3 15.4 (39.0) (-0.3) 1 -0.11 2.1 3.1 1.2 8.0 Aerosol -0.84 37.9 -45.7 4 -0.17** 3.9 2.9 6.1 0.0 (19.4) (-1.3) 1 -0.00 2.0 33.3 2.0 33.1 Solar 31.4 7.0 2.1 8 0.05 4.6 0.9 3.1 1.4 (1.1) (0.6) 1 0.06 6.7 1.0 4.4 2.0 Volcanic 0.11 1.3 -0.3 4 0.18** 10.1 0.5 2.7 2.4 (0.2) (-0.2) 1 0.20*** 7.1 0.3 0.6 3.7 PDO 0.15 -2.3 -0.4 4 0.4*** 31.0 0.8 3.9 13.7 (-0.3) (-0.3) 1 0.38*** 9.5 0.3 0.7 13.5 CO₂ (Mt/yr) 0.89 42.0 1.0 2 -0.12 7.9 1.9 8.8 0.3 (31.0) (0.00) 1 -0.10 5.4 0.0 5.4 0.7 CO₂ (W/m ${}^{\text{2}}$ ) 0.91 43.4 -13.4 2 0.18** 5.0 16.1 6.1 6.2 (38.3) (-0.3) 1 0.06 1.5 3.4 1.0 1.6

Table 5: Transient Climate Response
Sample Period: 1850-2017

Ordering Without Trend With Trend TCR₂₀ TCR₇₀ TCR₂₀ TCR₇₀ CO₂, GMTA 1.46^∘C 1.94^∘C 1.76^∘C 2.35^∘C GMTA, CO₂ 0.58^∘C 1.79^∘C 0.97^∘C 2.22^∘C

On Spurious Causality, CO2, and Global Temperature