
scpi: Uncertainty Quantification for Synthetic Control Methods

Matias D. Cattaneo Department of Operations Research and Financial Engineering, Princeton University.    Yingjie Feng School of Economics and Management, Tsinghua University.    Filippo Palomba Department of Economics, Princeton University.    Rocio Titiunik Department of Politics, Princeton University.
Abstract

The synthetic control method offers a way to quantify the effect of an intervention using weighted averages of untreated units to approximate the counterfactual outcome that the treated unit(s) would have experienced in the absence of the intervention. This method is useful for program evaluation and causal inference in observational studies. We introduce the software package scpi for prediction and inference using synthetic controls, implemented in Python, R, and Stata. For point estimation or prediction of treatment effects, the package offers an array of (possibly penalized) approaches leveraging the latest optimization methods. For uncertainty quantification, the package offers the prediction interval methods introduced by Cattaneo, Feng and Titiunik (2021) and Cattaneo, Feng, Palomba and Titiunik (2022). The paper includes numerical illustrations and a comparison with other synthetic control software.

Keywords: program evaluation, causal inference, synthetic controls, prediction intervals, non-asymptotic inference.

1 Introduction

The synthetic control method was introduced by Abadie and Gardeazabal (2003), and since then it has become a popular approach for program evaluation and causal inference in observational studies. It offers a way to study the effect of an intervention (e.g., treatments at the level of aggregate units, such as cities, states, or countries) by constructing weighted averages of untreated units to approximate the counterfactual outcome that the treated unit(s) would have experienced in the absence of the intervention. While originally developed for the special case of a single treated unit and a few control units over a short time span, the methodology has been extended in recent years to a variety of other settings with longitudinal data. See Abadie (2021) for a review on synthetic control methods, and Abadie and Cattaneo (2018) for a review on general methods for program evaluation.

Most methodological developments in the synthetic control literature have focused on either expanding the causal framework or developing new implementations for prediction/point estimation. Examples of the former include disaggregated data settings (Abadie and L’Hour, 2021) and staggered treatment adoption (Ben-Michael, Feller and Rothstein, 2022), while examples of the latter include employing different constrained estimation methods (see Table 3 below for references). Conceptually, implementation of the synthetic control method involves two main steps: first, treated units are “matched” to control units using only their pre-intervention data via (often constrained) regression methods and, second, predictions of the counterfactual outcomes of the treated units are obtained by combining the pre-intervention “matching” weights with the post-intervention data of the control units. As a result, the synthetic control approach offers a prediction or point estimator of the (causal) treatment effect for the treated unit(s) after the intervention was deployed.

Compared to prediction or estimation, considerably less effort has been devoted to developing principled uncertainty quantification for synthetic control methods. The most popular approach in practice is to employ design-based permutation methods taking the potential outcome variables as non-random (Abadie, Diamond and Hainmueller, 2010). Other approaches include methods based on large-sample approximations for disaggregated data under correctly specified factor-type models (Li, 2020), time-series permutation-based inference (Chernozhukov, Wüthrich and Zhu, 2021), large-sample approximations for high-dimensional penalization methods (Masini and Medeiros, 2021), and cross-sectional permutation-based inference in semiparametric duration-type settings (Shaikh and Toulis, 2021). A conceptually distinct approach to uncertainty quantification is proposed by Cattaneo, Feng and Titiunik (2021) and Cattaneo, Feng, Palomba and Titiunik (2022), who take the potential outcome variables as random and develop prediction intervals for the imputed (counterfactual) outcome of the treated unit(s) in the post-intervention period employing finite-sample probability concentration methods.

This article introduces the software package scpi for prediction and inference using synthetic control methods, implemented in Python, R, and Stata. For prediction or point estimation of treatment effects, the package offers an array of possibly penalized approaches leveraging the latest conic optimization methods (Domahidi, Chu and Boyd, 2013; Fu, Narasimhan and Boyd, 2020); see also Boyd and Vandenberghe (2004) for an introduction. For uncertainty quantification, the package focuses on the aforementioned prediction interval methods under random potential outcomes. The rest of the article focuses on the R implementation of the software, but we briefly illustrate analogous functionalities for Python in Appendix A, and for Stata in Appendix B.

The R package scpi includes the following six functions:

  • scdata() and scdataMulti(). These functions take as input a DataFrame object and process it to prepare the data matrices used for point estimation/prediction and inference/uncertainty quantification. The function scdata() is specific to the single treated unit case, whereas scdataMulti() can be used with multiple treated units and/or when treatment is adopted in a staggered fashion. Both functions allow the user to specify multiple features of the treated unit(s) to be matched by the synthetic unit(s), as well as feature-specific covariate adjustment, and can handle both independent and identically distributed (i.i.d.) and non-stationary (cointegrated) data.

  • scest(). This function handles “scpi_data” objects produced with scdata() or “scpi_data_multi” objects produced with scdataMulti(), and then implements a class of synthetic control predictions/point estimators for quantification of treatment effects. The implementation allows for multiple features, with and without additional covariate adjustment, and for both stationary and non-stationary data. The allowed prediction procedures include unconstrained weighted least squares as well as constrained weighted least squares with simplex, lasso-type, ridge-type parameter space restrictions and combinations thereof (see Table 2 below).

  • scpi(). This function takes as input an “scpi_data” object produced with scdata() or an “scpi_data_multi” object produced with scdataMulti(), and then computes prediction intervals for a class of synthetic control predictions/point estimators for quantification of treatment effects. It relies on scest() for point estimation/prediction of treatment effects, and thus inherits the same functionalities as that function. In particular, scpi() is designed to be the main function in applications, offering both predictions/point estimators for treatment effects as well as inference/uncertainty quantification (i.e., prediction intervals) for synthetic control methods. The function also allows the user to model in-sample and out-of-sample uncertainty separately, offering a broad range of options for practice.

  • scplot() and scplotMulti(). These functions process objects whose class is either “scest” or “scpi”. These objects contain the results of the point estimation/prediction or uncertainty quantification methods, respectively. The commands build on the ggplot2 package in R (Wickham, 2016) to compare the time series for the outcome of the treated unit(s) with the outcome time series of the synthetic control unit, along with the associated uncertainty. The functions return a ggplot object that can be further modified by the user.

The objects returned by scest() and scpi() support the methods print() and summary(). In typical applications, the user will first prepare the data using the function scdata() or scdataMulti(), and then produce predictions/point estimators for treatment effects with uncertainty quantification using the function scpi(). The function scest() is useful in cases where only predictions/point estimators are of interest. Numerical illustrations are given in Section 5.
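To fix ideas, the snippet below sketches this typical workflow. It is only a minimal sketch: the data frame dat and its column, unit, and period values are placeholders for the user's own panel, and a complete empirical illustration is given in Section 5.

library(scpi)

## 'dat' is a user-supplied panel (placeholder column and unit names)
sc.data <- scdata(dat,
                  id.var      = "unit",       # unit identifier
                  time.var    = "year",       # time variable
                  outcome.var = "outcome",    # outcome to be matched
                  period.pre  = 1991:2010,    # pre-treatment periods
                  period.post = 2011:2015,    # post-treatment periods
                  unit.tr     = "treated1",   # treated unit
                  unit.co     = c("control1", "control2", "control3"))  # donor pool

sc.est <- scest(sc.data)     # point prediction only (default simplex constraint)
summary(sc.est)

sc.pi  <- scpi(sc.data)      # point prediction + prediction intervals
summary(sc.pi)
scplot(sc.pi)                # returns a ggplot object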

There are many R, Python, and Stata packages available for prediction/point estimation and inference using synthetic control methods; Table 1 compares them to the package scpi. As shown in the table, scpi is the first package to offer uncertainty quantification using prediction intervals with random potential outcomes for a wide range of different synthetic control predictors. The package is also one of the first to handle multiple treated units and staggered treatment adoption, offering a wider array of options in terms of predictors and inference methods when compared with the other packages currently available. Furthermore, the package includes misspecification-robust methods, employs the latest available optimization packages for conic programs, and offers automatic parallelization whenever multi-core processors are present, leading to significant improvements in numerical stability and computational speed. Finally, scpi is the only package available in Python, R, and Stata, which gives full portability across multiple statistical software and programming languages, and it is also the only package directly employing native conic optimization via the ECOS solver (see Table 4 for details).

Table 1: Comparison of different packages available on PyPi, CRAN, REPEC, or GitHub.
Package Name | Statistical Platform | Prediction Method | Inference Method | Multiple Treated | Staggered Adoption | Misspecification Robust | Automatic Parallelization | Last Update
ArCo R LA Asym 2017-11-05
pgsc R SC Perm 2018-10-28
MSCMT R SC Perm 2019-11-14
npsynth St SC Perm 2020-06-23
tidysynth R SC Perm 2021-01-27
microsynth R CA Perm 2021-02-26
scinference R SC, LA Perm 2021-05-14
SCUL R LA Perm 2021-05-19
gsynth R FA Asym 2021-08-06
Synth Py SC Perm 2021-10-07
treebased-sc Py TB Perm 2021-11-01
SynthCast R SC Perm 2022-03-08
synthdid R LS, RI Asym 2022-03-15
allsynth St SC Perm 2022-05-07
synth2 St SC Perm 2022-05-28
Synth R, St SC Perm 2022-06-08
SCtools R SC Perm 2022-06-09
augsynth R SC, RI Perm 2022-08-02
scul St LA Perm 2022-08-21
scpi Py, R, St SC, LA, RI, LS, + PI, Asym, Perm 2022-10-07

Note: Py = Python (https://www.python.org/); R = R (https://cran.r-project.org/); St = Stata (https://www.stata.com/); LA = Lasso penalty; CA = calibration; FA = factor-augmented models; LS = unconstrained least squares; RI = Ridge penalty; SC = canonical synthetic control; TB = tree-based methods; + = user-specified options (see Table 3 below for more details); Perm = permutation-based inference; Asym = asymptotic-based inference; PI = prediction intervals (non-asymptotic probability guarantees). The symbol ✓ means that the feature is available. The last column reports the date of last update as of October 7, 2022.

The rest of the article is organized as follows. Section 2 introduces the canonical synthetic control setup, and also briefly discusses extensions to multiple treated units with possibly staggered treatment adoption. Section 3 gives a brief introduction to the theory and methodology underlying the point estimation/prediction for synthetic control methods, discussing implementation details. Section 4 gives a brief introduction to the theory and methodology underlying the uncertainty quantification via prediction intervals for synthetic control methods, and also discusses the corresponding issues of implementation. Section 5 showcases some of the functionalities of the package using a real-world dataset, and Section 6 concludes. The appendices illustrate the Python (Appendix A) and Stata (Appendix B) implementations of scpi. Detailed instructions for installation, script files to replicate the analyses, links to software repositories, and other companion information can be found in the package’s website, https://nppackages.github.io/scpi/.

2 Setup

We first consider the canonical synthetic control framework with a single treated unit. The researcher observes $J+1$ units for $T_{0}+T_{1}$ periods of time. Units are indexed by $i=1,2,\ldots,J,J+1$, and time periods are indexed by $t=1,2,\ldots,T_{0},T_{0}+1,\ldots,T_{0}+T_{1}$. During the first $T_{0}$ periods, all units are untreated. Starting at $T_{0}+1$, unit $1$ receives treatment but the other units remain untreated. Once the treatment is assigned at $T_{0}+1$, there is no change in treatment status: the treated unit continues to be treated and the untreated units remain untreated until the end of the series, $T_{1}$ periods later. The single treated unit in our context could be understood as an “aggregate” of multiple treated units; see Section 2.1 below for more discussion.

Each unit $i$ at period $t$ has two potential outcomes, $Y_{it}(1)$ and $Y_{it}(0)$, respectively denoting the outcome under treatment and the outcome in the absence of treatment. Two implicit assumptions are imposed: no spillovers (the potential outcomes of unit $i$ depend only on $i$'s treatment status) and no anticipation (the potential outcomes at $t$ depend only on the treatment status of the same period). Then, the observed outcome $Y_{it}$ is

Y_{it}=\begin{cases}Y_{it}(0),\quad&\text{if}\quad i\in\{2,\ldots,J+1\}\\ Y_{it}(0),\quad&\text{if}\quad i=1\text{ and }t\in\{1,\ldots,T_{0}\}\\ Y_{it}(1),\quad&\text{if}\quad i=1\text{ and }t\in\{T_{0}+1,\ldots,T_{0}+T_{1}\}.\end{cases}

The causal quantity of interest is the difference between the outcome path taken by the treated unit, and the path it would have taken in the absence of the treatment:

\tau_{t}:=Y_{1t}(1)-Y_{1t}(0),\qquad t>T_{0}.

We view the two potential outcomes $Y_{1t}(1)$ and $Y_{1t}(0)$ as random variables, which implies that $\tau_{t}$ is a random quantity as well, corresponding to the treatment effect on a single treated unit. This contrasts with other analyses that regard the treatment effect as a fixed parameter (see Abadie, 2021, for references).

The potential outcome $Y_{1t}(1)$ of the treated unit is observed after the treatment. To recover the treatment effect $\tau_{t}$, it is necessary to have a “good” prediction of the counterfactual outcome $Y_{1t}(0)$ of the treated unit after the intervention. The idea of the synthetic control method is to find a vector of weights $\mathbf{w}=(w_{2},w_{3},\dots,w_{J+1})^{\prime}$ such that a given loss function is minimized under constraints, using only pre-intervention observations. Given the resulting set of constructed weights $\widehat{\mathbf{w}}$, the treated unit’s counterfactual (potential) outcome is calculated as $\widehat{Y}_{1t}(0)=\sum_{i=2}^{J+1}\widehat{w}_{i}Y_{it}(0)$ for $t>T_{0}$. The weighted average $\widehat{Y}_{1t}(0)$ is often referred to as the synthetic control of the treated unit, as it represents how the untreated units can be combined to provide the best counterfactual for the treated unit in the post-treatment period. In what follows, we briefly describe different approaches for point estimation/prediction leading to $\widehat{Y}_{1t}(0)$, and then summarize the uncertainty quantification methods to complement those predictions.
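As a minimal numerical sketch of this last step (with placeholder weights and donor outcomes; in practice both objects are produced by the package):

set.seed(1)
J  <- 4; T1 <- 3
w.hat    <- c(0.5, 0.3, 0.2, 0.0)                    # constructed weights (placeholder)
Y.donors <- matrix(rnorm(T1 * J, mean = 10), T1, J)  # donor outcomes, t > T0 (placeholder)

Y1.synth   <- as.numeric(Y.donors %*% w.hat)         # synthetic control: sum_i w_i * Y_it
Y1.treated <- rnorm(T1, mean = 12)                   # observed treated outcomes (placeholder)
tau.hat    <- Y1.treated - Y1.synth                  # predicted treatment effects, t > T0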

2.1 Extensions

Building on the canonical synthetic control setup, we can consider other settings involving multiple treated units with possibly staggered treatment adoption. In particular, we briefly discuss three potential extensions of practical interest.

  • Multiple post-treatment periods. When outcomes are observed in multiple periods after the treatment, a researcher might be interested in the average treatment effect on the (single) treated unit across multiple post-treatment periods rather than the effect at a single period:

    \tau:=\frac{1}{T_{1}}\sum_{t=T_{0}+1}^{T_{0}+T_{1}}\Big(Y_{1t}(1)-Y_{1t}(0)\Big)=\frac{1}{T_{1}}\sum_{t=T_{0}+1}^{T_{0}+T_{1}}\tau_{t}.

    The analysis of this quantity can be accommodated by the framework above. For instance, given the predicted counterfactual outcome $\widehat{Y}_{1t}(0)=\sum_{i=2}^{J+1}\widehat{w}_{i}Y_{it}(0)$ for each post-treatment period $t>T_{0}$, the predicted average counterfactual outcome of the treated unit is given by

    \sum_{i=2}^{J+1}\widehat{w}_{i}\Big(\frac{1}{T_{1}}\sum_{t=T_{0}+1}^{T_{0}+T_{1}}Y_{it}(0)\Big).

    This construction is equivalent to regarding the $T_{1}$ post-treatment periods as a “single” period and defining the post-treatment predictors as averages of the corresponding predictors across post-treatment time periods.

  • Multiple treated units. The canonical single treated unit framework above can also be extended to the more general case of multiple treated units. For instance, suppose a researcher observes $N_{0}+N_{1}$ units for $T_{0}+T_{1}$ time periods, and let units be indexed by $i=1,\ldots,N_{1},N_{1}+1,\ldots,N_{0}+N_{1}$. Without loss of generality, units $1$ to $N_{1}$ are assumed to be treated and units $N_{1}+1$ to $N_{0}+N_{1}$ to be untreated. Treated and untreated potential outcomes are, respectively, denoted by $Y_{it}(1)$ and $Y_{it}(0)$ for $i=1,\ldots,N_{0}+N_{1}$. The observed outcome of the $i$th treated unit is given by $Y_{it}:=\mathds{1}(t\leq T_{0})Y_{it}(0)+\mathds{1}(t>T_{0})Y_{it}(1)$.

    In such a setting, a researcher might be interested in the individual treatment effect $\tau_{it}$

    \tau_{it}:=Y_{it}(1)-Y_{it}(0),\quad t>T_{0},\quad i=1,\ldots,N_{1},

    or in the average treatment effect on the treated, $\tau_{\cdot t}$, across treated units

    \tau_{\cdot t}:=\frac{1}{N_{1}}\sum_{j=1}^{N_{1}}\Big(Y_{jt}(1)-Y_{jt}(0)\Big),\qquad t>T_{0}.

    The first causal quantity, $\tau_{it}$, can be predicted in the framework described above by considering one treated unit at a time or, alternatively, by considering all $N_{1}$ treated units jointly.

    To predict the second causal quantity, $\tau_{\cdot t}$, one extra step is necessary. Define an aggregate unit “ave” whose observed outcome is $Y_{t}^{\texttt{ave}}:=\frac{1}{N_{1}}\sum_{j=1}^{N_{1}}Y_{jt}$, for $t=1,\ldots,T_{0}+T_{1}$. Other features of “unit 1” used in the synthetic control construction can be defined similarly as averages of the corresponding features across the multiple treated units. The framework above can now be applied to the “new” average unit with outcome $Y_{t}^{\texttt{ave}}$ (a short numerical sketch of these averaging constructions is given after this list).

  • Staggered treatment adoption. Our framework can also be extended to the scenario where multiple treated units are assigned to treatment at different points in time, a staggered adoption design. In this case, one can understand the adoption time as a multivalued treatment assignment, and a large class of causal quantities can be defined accordingly. For example, let $T_{i}\in\{T_{0}+1,T_{0}+2,\ldots,T,\infty\}$ denote the adoption time of unit $i$, where $T_{i}=\infty$ means unit $i$ is never treated, and $Y_{it}(s)$ represents the potential outcome of unit $i$ at time $t$ that would be observed if unit $i$ had adopted the treatment at time $s$. Suppose that the treatment effect on unit $i$ one period after the treatment, i.e., $Y_{i(T_{i}+1)}(T_{i})-Y_{i(T_{i}+1)}(\infty)$, is of interest. One can take all units that are treated later than $T_{i}+1$ to obtain the synthetic control weights and construct the synthetic control prediction of the counterfactual outcome $Y_{i(T_{i}+1)}(\infty)$ accordingly. The methodology described below can be immediately applied to this problem.
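The short sketch below illustrates the two averaging constructions mentioned above with placeholder objects: the average effect across post-treatment periods, and the aggregate “ave” unit built from multiple treated units.

## Average effect across the T1 post-treatment periods, given per-period predictions
tau.hat <- c(1.2, 0.8, 1.5)          # placeholder values of tau_t, t > T0
tau.avg <- mean(tau.hat)

## Aggregate "ave" unit: average the outcomes of the N1 treated units period by period
Y.treated <- matrix(rnorm(10 * 2, mean = 10), nrow = 10, ncol = 2)  # (T0+T1) x N1 (placeholder)
Y.ave     <- rowMeans(Y.treated)     # outcome of the aggregate unit, t = 1, ..., T0+T1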

The package scpi allows for estimation/prediction of treatment effects and uncertainty quantification via prediction intervals for the more general synthetic control settings discussed above. However, in order to streamline the exposition, the rest of this article focuses on the case of a single treated unit. See Cattaneo, Feng, Palomba and Titiunik (2022) for a formal treatment of more general staggered adoption problems, and its supplemental appendix for further details on how the package scpi can be used in settings with multiple treated units and staggered treatment adoption. Our companion replication files illustrate both the canonical single treated unit framework and the generalizations discussed above.

3 Synthetic Control Prediction

We consider synthetic control weights constructed simultaneously for $M$ features of the treated unit, denoted by $\mathbf{A}_{l}=(a_{1,l},\cdots,a_{T_{0},l})^{\prime}\in\mathbb{R}^{T_{0}}$, with index $l=1,\ldots,M$. For each feature $l$, there exist $J+K$ variables that can be used to predict or “match” the $T_{0}$-dimensional vector $\mathbf{A}_{l}$. These $J+K$ variables are separated into two groups denoted by $\mathbf{B}_{l}=(\mathbf{B}_{1,l},\mathbf{B}_{2,l},\cdots,\mathbf{B}_{J,l})\in\mathbb{R}^{T_{0}\times J}$ and $\mathbf{C}_{l}=(\mathbf{C}_{1,l},\cdots,\mathbf{C}_{K,l})\in\mathbb{R}^{T_{0}\times K}$, respectively. More precisely, for each $j$, $\mathbf{B}_{j,l}=(b_{j1,l},\cdots,b_{jT_{0},l})^{\prime}$ corresponds to the $l$th feature of the $j$th unit observed in the $T_{0}$ pre-treatment periods and, for each $k$, $\mathbf{C}_{k,l}=(c_{k1,l},\cdots,c_{kT_{0},l})^{\prime}$ is another vector of control variables also possibly used to predict $\mathbf{A}_{l}$ over the same pre-intervention time span. For ease of notation, we let $d=J+KM$.

The goal of the synthetic control method is to search for a vector of common weights $\mathbf{w}\in\mathcal{W}\subseteq\mathbb{R}^{J}$ across the $M$ features and a vector of coefficients $\mathbf{r}\in\mathcal{R}\subseteq\mathbb{R}^{KM}$, such that the linear combination of $\mathbf{B}_{l}$ and $\mathbf{C}_{l}$ “matches” $\mathbf{A}_{l}$ as closely as possible during the pre-intervention period, for all $1\leq l\leq M$ and some convex feasibility sets $\mathcal{W}$ and $\mathcal{R}$ that capture the restrictions imposed. Specifically, we consider the following optimization problem:

\widehat{\bm{\beta}}:=(\widehat{\mathbf{w}}^{\prime},\;\widehat{\mathbf{r}}^{\prime})^{\prime}\in\underset{\mathbf{w}\in\mathcal{W},\,\mathbf{r}\in\mathcal{R}}{\operatorname*{arg\,min}}\;(\mathbf{A}-\mathbf{B}\mathbf{w}-\mathbf{C}\mathbf{r})^{\prime}\mathbf{V}(\mathbf{A}-\mathbf{B}\mathbf{w}-\mathbf{C}\mathbf{r}) (3.1)

where

\underbrace{\mathbf{A}}_{T_{0}\cdot M\times 1}=\begin{bmatrix}\mathbf{A}_{1}\\ \vdots\\ \mathbf{A}_{M}\end{bmatrix},\quad\underbrace{\mathbf{B}}_{T_{0}\cdot M\times J}=\begin{bmatrix}\mathbf{B}_{1}\\ \vdots\\ \mathbf{B}_{M}\end{bmatrix},\quad\underbrace{\mathbf{C}}_{T_{0}\cdot M\times K\cdot M}=\begin{bmatrix}\mathbf{C}_{1}&\mathbf{0}&\cdots&\mathbf{0}\\ \mathbf{0}&\mathbf{C}_{2}&\cdots&\mathbf{0}\\ \vdots&\vdots&\ddots&\vdots\\ \mathbf{0}&\mathbf{0}&\cdots&\mathbf{C}_{M}\end{bmatrix}

and $\mathbf{V}$ is a $T_{0}\cdot M\times T_{0}\cdot M$ weighting matrix reflecting the relative importance of different equations and time periods.

From (3.1), we can define the pseudo-true residual $\mathbf{u}$ as

\mathbf{u}=\mathbf{A}-\mathbf{B}\mathbf{w}_{0}-\mathbf{C}\mathbf{r}_{0}, (3.2)

where $\mathbf{w}_{0}$ and $\mathbf{r}_{0}$ denote the mean squared error population analogs of $\widehat{\mathbf{w}}$ and $\widehat{\mathbf{r}}$. As discussed in the next section, the proposed prediction intervals are valid conditional on some information set $\mathscr{H}$. Thus, $\mathbf{w}_{0}$ and $\mathbf{r}_{0}$ above are viewed as the (possibly constrained) best linear prediction coefficients conditional on $\mathscr{H}$. We do not attach any structural meaning to $\mathbf{w}_{0}$ and $\mathbf{r}_{0}$: they are only (conditional) pseudo-true values whose meaning should be understood in context, and are determined by the assumptions imposed on the data generating process. In other words, we allow for misspecification when constructing the synthetic control weights $\widehat{\mathbf{w}}$, as this is the most likely scenario in practice.

Given the constructed weights $\widehat{\mathbf{w}}$ and coefficients $\widehat{\mathbf{r}}$, the counterfactual outcome at the post-treatment period $T$ for the treated unit, $Y_{1T}(0)$, is predicted by

\widehat{Y}_{1T}(0)=\mathbf{x}_{T}^{\prime}\widehat{\mathbf{w}}+\mathbf{g}_{T}^{\prime}\widehat{\mathbf{r}}=\mathbf{p}_{T}^{\prime}\widehat{\bm{\beta}},\qquad\mathbf{p}_{T}:=(\mathbf{x}_{T}^{\prime},\,\mathbf{g}_{T}^{\prime})^{\prime},\qquad T>T_{0}, (3.3)

where $\mathbf{x}_{T}\in\mathbb{R}^{J}$ is a vector of predictors for the control units observed at time $T$ and $\mathbf{g}_{T}\in\mathbb{R}^{KM}$ is another set of user-specified predictors observed at time $T$. The variables included in $\mathbf{x}_{T}$ and $\mathbf{g}_{T}$ need not be the same as those in $\mathbf{B}$ and $\mathbf{C}$, but in practice it is often the case that $\mathbf{x}_{T}=(Y_{2T}(0),\cdots,Y_{(J+1)T}(0))^{\prime}$ and that $\mathbf{g}_{T}$ is excluded when $\mathbf{C}$ is not specified.

The next section discusses implementation details leading to $\widehat{Y}_{1T}(0)$, including the choice of the feasibility sets $\mathcal{W}$ and $\mathcal{R}$, the weighting matrix $\mathbf{V}$, and the additional covariates $\mathbf{C}$.
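Before turning to the implementation, it may help to see a stripped-down version of (3.1) solved directly. The sketch below imposes the canonical simplex constraint on $\mathbf{w}$, omits $\mathbf{C}$, and sets $\mathbf{V}$ to the identity; it uses the CVXR package purely as an illustrative conic-optimization front end (scest() handles the optimization internally and should be preferred in practice).

library(CVXR)

## Placeholder pre-treatment data: A is T0 x 1, B is T0 x J
set.seed(1)
T0 <- 20; J <- 5
B <- matrix(rnorm(T0 * J), T0, J)
A <- drop(B %*% c(0.6, 0.4, 0, 0, 0)) + rnorm(T0, sd = 0.1)

## Simplex-constrained least squares: w >= 0 and ||w||_1 = 1
w     <- Variable(J)
obj   <- Minimize(sum_squares(A - B %*% w))   # V = identity
cons  <- list(w >= 0, sum(w) == 1)
res   <- solve(Problem(obj, cons), solver = "ECOS")
w.hat <- as.numeric(res$getValue(w))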

3.1 Implementation

The function scdata() in scpi prepares the data for point estimation/prediction purposes. This function takes as input an object of class DataFrame and outputs an object of class scpi_data containing the matrices $\mathbf{A},\mathbf{B},\mathbf{C}$ described above, and a matrix of post-treatment predictors $\mathbf{P}=(\mathbf{p}_{T_{0}+1},\cdots,\mathbf{p}_{T_{0}+T_{1}})^{\prime}$. The user must provide a variable containing a unit identifier (id.var), a time variable (time.var), an outcome variable (outcome.var), the features to be matched (features), the treated unit (unit.tr), the control units (unit.co), the pre-treatment periods (period.pre), and the post-treatment periods (period.post). These options completely specify $\mathbf{A},\mathbf{B}$, and $\mathbf{P}$. The user can also control the form of $\mathcal{R}$ in (3.1) or, equivalently, the form of $\mathbf{C}$, through the options cov.adj and constant. The former option allows the user to flexibly specify covariate adjustment feature by feature, while the latter option introduces a column vector of ones of size $T_{0}\cdot M$ in $\mathbf{C}$. If $M=1$, this is a simple constant term, but if $M\geq 2$ it corresponds to an intercept which is common across features.

The use of the options cov.adj and constant is best explained through some examples. If the user specifies only one feature ($M=1$), then cov.adj can be an unnamed list:

cov.adj <- list(c("constant","trend"))

This particular choice includes a constant term and a linear time trend in $\mathbf{C}$. If instead multiple features ($M\geq 2$) are used to find the synthetic control weights $\widehat{\mathbf{w}}$, then cov.adj allows for feature-specific covariate adjustment. For example, in a two-feature setting ($M=2$), the code

cov.adj <- list('f1' = c("constant","trend"),'f2' = c("trend"))

specifies $\mathbf{C}$ as a block diagonal matrix where the first block $\mathbf{C}_{1}$ contains a constant term and a trend, while the second block $\mathbf{C}_{2}$ contains only a trend. If the user wants all features to share the same covariate adjustment, then it is sufficient to input a list with a unique element:

cov.adj <- list(c("constant","trend"))

This specification creates a block diagonal matrix $\mathbf{C}$ with identical blocks. In the same example with $M=2$, if constant <- TRUE and cov.adj <- NULL, then $\mathbf{C}$ would not be block diagonal, but rather a column vector of ones of size $2T_{0}$.

Finally, if $\mathbf{A}$ and $\mathbf{B}$ form a cointegrated system, setting the option cointegrated.data to TRUE in scdata() prepares the matrix $\mathbf{P}$ in such a way that the function scpi() will properly handle in-sample and out-of-sample uncertainty quantification (see Section 4.3).
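Putting these options together, the following is a hedged example of a two-feature specification with feature-specific covariate adjustment and cointegrated data; the data frame, column, and unit names are placeholders, and the names of the cov.adj list are assumed to match the entries of features.

sc.data <- scdata(dat,
                  id.var      = "country",
                  time.var    = "year",
                  outcome.var = "gdp",
                  features    = c("gdp", "trade"),
                  cov.adj     = list('gdp'   = c("constant", "trend"),
                                     'trade' = c("trend")),
                  cointegrated.data = TRUE,
                  period.pre  = 1960:1990,
                  period.post = 1991:2003,
                  unit.tr     = "treated1",
                  unit.co     = c("control1", "control2", "control3"))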

Once all the design matrices $\mathbf{A},\mathbf{B},\mathbf{C}$, and $\mathbf{P}$ have been created, we can proceed with point estimation/prediction of the counterfactual outcome of interest via the function scest().

The form of the feasibility set $\mathcal{W}$ in (3.1) or, equivalently, the constraints imposed on the weights $\mathbf{w}$, can be set using the option w.constr. The package allows for the following family of constraints:

\mathcal{W}\in\Big\{\mathbb{R}^{J},\;\{\mathbf{w}\in\mathbb{W}:||\mathbf{w}||_{p}\leq Q\},\;\{\mathbf{w}\in\mathbb{R}^{J}:||\mathbf{w}||_{1}=Q,\,||\mathbf{w}||_{2}\leq Q_{2}\}\Big\},
\mathbb{W}\in\{\mathbb{R}^{J},\mathbb{R}^{J}_{+}\},\qquad p\in\{1,2\},\qquad Q\in\mathbb{R}_{++},\qquad Q_{2}\in\mathbb{R}_{++},

where the inequality constraint on the norm can be made an equality constraint as well. The user can specify the desired form for $\mathcal{W}$ through a list to be passed to the option w.constr:

W1 <- list(p = "no norm", lb = -Inf)
W2 <- list(p = "L1", dir = "==", Q = 1, lb = 0)
W3 <- list(p = "L2", dir = "<=", Q = 1, lb = -Inf)
W4 <- list(p = "L1-L2", lb = -Inf, Q = 1, Q2 = 1, dir = "==/<=")

The four lines above create $\mathcal{W}_{1}=\mathbb{R}^{J}$, $\mathcal{W}_{2}=\{\mathbf{w}\in\mathbb{R}_{+}^{J}:||\mathbf{w}||_{1}=1\}$, $\mathcal{W}_{3}=\{\mathbf{w}\in\mathbb{R}^{J}:||\mathbf{w}||_{2}\leq 1\}$, and $\mathcal{W}_{4}=\{\mathbf{w}\in\mathbb{R}^{J}:||\mathbf{w}||_{1}=1,||\mathbf{w}||_{2}\leq 1\}$, respectively. In greater detail,

  • p chooses the constrained norm of $\mathbf{w}$ among the options ‘no norm’, ‘L1’, ‘L2’, or ‘L1-L2’

  • dir sets the direction of the constraint on $\|\mathbf{w}\|_{p}$ and it can be either ‘==’, ‘<=’, or ‘==/<=’

  • Q is the size of the constraint and it can be set to any positive real number

  • lb sets a (common) lower bound on $\mathbf{w}$ and it takes as input either 0 or -Inf

Popular constraints can be called explicitly using the option name in the list passed to w.constr. Table 2 gives prototypical examples of such constraints.

Table 2: Constraints on the weights that can be directly called.
Name | w.constr | $\mathcal{W}$
OLS | list(name = ‘ols’) | $\mathbb{R}^{J}$
simplex | list(name = ‘simplex’, Q = Q) | $\{\mathbf{w}\in\mathbb{R}_{+}^{J}:||\mathbf{w}||_{1}=Q\}$
lasso | list(name = ‘lasso’, Q = Q) | $\{\mathbf{w}\in\mathbb{R}^{J}:||\mathbf{w}||_{1}\leq Q\}$
ridge | list(name = ‘ridge’, Q = Q) | $\{\mathbf{w}\in\mathbb{R}^{J}:||\mathbf{w}||_{2}\leq Q\}$
L1-L2 | list(name = ‘L1-L2’, Q = Q, Q2 = Q2) | $\{\mathbf{w}\in\mathbb{R}_{+}^{J}:||\mathbf{w}||_{1}=Q,\;||\mathbf{w}||_{2}\leq Q_{2}\}$

In particular, specifying list(name = ‘simplex’, Q = 1) gives the standard constraint used in the canonical synthetic control method, that is, computing weights in (3.1) such that they are non-negative and sum up to one, and without including an intercept. This is the default in the function scest() (and scpi()). The following snippet showcases how each of these five constraints can be called automatically through the option name and manually through the options p, Q, Q2, lb, and dir. In the snippet, Q and Q2 are set to 1 for ridge and L1-L2 constraints, respectively, for simplicity, but to replicate the results obtained with the option name one should input the proper Q according to the rules of thumb described further below.

## Simplex
w.constr <- list(name = "simplex")
w.constr <- list(p = "L1", lb = 0, Q = 1, dir = "==")
## Least Squares
w.constr <- list(name = "ols")
w.constr <- list(p = "no norm", lb = -Inf, Q = NULL, dir = NULL)
## Lasso
w.constr <- list(name = "lasso")
w.constr <- list(p = "L1", lb = -Inf, Q = 1, dir = "<=")
## Ridge
w.constr <- list(name = "ridge")
w.constr <- list(p = "L2", lb = -Inf, Q = 1, dir = "<=")
## L1-L2
w.constr <- list(name = "L1-L2")
w.constr <- list(p = "L1-L2", lb = 0, Q = 1, Q2 = 1, dir = "==/<=")

Using the option w.constr in scest() (or scpi()) and the options cov.adj and constant in scdata() appropriately, i.e., choosing $\mathcal{W}$ and $\mathcal{R}$ in (3.1), many synthetic control estimators proposed in the literature can be implemented. Table 3 provides a non-exhaustive list of such examples, and a short sketch of one such mapping follows the table.

Table 3: Examples of $\mathcal{W}$ and $\mathcal{R}$ in the synthetic control literature ($M=1$).
Article | $\mathcal{W}$ | $\mathcal{R}$ | w.constr: name | Q | Q2 | constant
Hsiao et al. (2012) | $\mathbb{R}^{J}$ | $\mathbb{R}$ | "ols" | NULL | NULL | TRUE
Abadie et al. (2010) | $\{\mathbf{w}\in\mathbb{R}^{J}_{+}:||\mathbf{w}||_{1}=1\}$ | $\{0\}$ | "simplex" | 1 | NULL | FALSE
Ferman and Pinto (2021) | $\{\mathbf{w}\in\mathbb{R}^{J}_{+}:||\mathbf{w}||_{1}=1\}$ | $\mathbb{R}$ | "simplex" | 1 | NULL | TRUE
Chernozhukov et al. (2021) | $\{\mathbf{w}\in\mathbb{R}^{J}:||\mathbf{w}||_{1}\leq 1\}$ | $\mathbb{R}$ | "lasso" | 1 | NULL | TRUE
Amjad et al. (2018) | $\{\mathbf{w}\in\mathbb{R}^{J}:||\mathbf{w}||_{2}\leq Q\}$ | $\{0\}$ | "ridge" | Q | NULL | FALSE
Arkhangelsky et al. (2021) | $\{\mathbf{w}\in\mathbb{R}_{+}^{J}:||\mathbf{w}||_{1}=1,||\mathbf{w}||_{2}\leq Q_{2}\}$ | $\mathbb{R}$ | "L1-L2" | 1 | Q | TRUE
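For instance, a sketch of how the specification in Ferman and Pinto (2021), simplex weights together with a constant term, maps into the package options; the data-preparation arguments are placeholders, as in the earlier examples.

sc.data <- scdata(dat, id.var = "country", time.var = "year",
                  outcome.var = "gdp", period.pre = 1960:1990,
                  period.post = 1991:2003, unit.tr = "treated1",
                  unit.co = c("control1", "control2", "control3"),
                  constant = TRUE)
sc.est  <- scest(sc.data, w.constr = list(name = "simplex", Q = 1))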

Tuning parameter choices

We provide rule-of-thumb choices of the tuning parameter $Q$ for the Lasso- and Ridge-type constraints.

  • Lasso ($p=1$). Since Lasso is similar in spirit to the traditional “simplex”-type constraint in the synthetic control literature, we propose $Q=1$ as a rule of thumb.

  • Ridge ($p=2$). It is well known that the Ridge prediction problem can be equivalently formulated as an unconstrained penalized optimization problem and as a constrained optimization problem. More precisely, assuming $\mathbf{C}$ is not used and $M=1$ for simplicity, the two Ridge-type problems are

    \widehat{\mathbf{w}}:=\operatorname*{arg\,min}_{\mathbf{w}\in\mathbb{R}^{J}}\;(\mathbf{A}-\mathbf{B}\mathbf{w})^{\prime}\mathbf{V}(\mathbf{A}-\mathbf{B}\mathbf{w})+\lambda||\mathbf{w}||_{2}^{2},

    where $\lambda\geq 0$ is a shrinkage parameter, and

    \widehat{\mathbf{w}}:=\operatorname*{arg\,min}_{\mathbf{w}\in\mathbb{R}^{J},\,||\mathbf{w}||_{2}^{2}\leq Q^{2}}\;(\mathbf{A}-\mathbf{B}\mathbf{w})^{\prime}\mathbf{V}(\mathbf{A}-\mathbf{B}\mathbf{w}),

    where $Q\geq 0$ is the (explicit) size of the constraint on the norm of $\mathbf{w}$. Under the assumption of Gaussian errors, a risk-minimizing choice (Hoerl, Kannard and Baldwin, 1975) of the standard shrinkage tuning parameter is

    \lambda=J\widehat{\sigma}_{\mathtt{OLS}}^{2}/\|\widehat{\mathbf{w}}_{\mathtt{OLS}}\|_{2}^{2},

    where $\widehat{\sigma}_{\mathtt{OLS}}^{2}$ and $\widehat{\mathbf{w}}_{\mathtt{OLS}}$ are estimators of the variance of the pseudo-true residual $\mathbf{u}$ and of the coefficients $\mathbf{w}_{0}$ based on least squares regression, respectively.

    Since the two optimization problems above are equivalent, there exists a one-to-one correspondence between $\lambda$ and $Q$. For example, assuming the columns of $\mathbf{B}$ are orthonormal, the closed-form solution for the Ridge estimator is $\widehat{\mathbf{w}}=(\mathbf{I}+\lambda\mathbf{I})^{-1}\widehat{\mathbf{w}}_{\mathtt{OLS}}$, and it follows that if the constraint on the $\ell^{2}$-norm is binding, then $Q=||\widehat{\mathbf{w}}||_{2}=||\widehat{\mathbf{w}}_{\mathtt{OLS}}||_{2}/(1+\lambda)$.

    However, if $J>T_{0}$, $\widehat{\mathbf{w}}_{\mathtt{OLS}}$ does not exist, and hence we cannot rely on the approach suggested above: the proposed mapping between $\lambda$ and $Q$ is ill-defined and $\lambda$ cannot be estimated. In this case, we first make the design low-dimensional by performing variable selection on $\mathbf{B}$ with Lasso. Once we select the columns of $\mathbf{B}$ whose Lasso coefficient is non-zero, we choose $\lambda$ according to the rule of thumb described above.

    If more than one feature is specified ($M>1$), we compute the size of the constraint $Q_{l}$ for each feature $l=1,\ldots,M$ and then select $Q$ as the tightest constraint to favor shrinkage of $\mathbf{w}$, that is, $Q:=\min_{l=1,\ldots,M}Q_{l}$. A short numerical sketch of the Ridge rule of thumb is given after this list.
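The following numerical sketch illustrates the Ridge rule of thumb for the case $J<T_{0}$, with placeholder data, $\mathbf{V}$ equal to the identity, and no covariates $\mathbf{C}$; the OLS ingredients are computed with base R's lm.fit().

set.seed(1)
T0 <- 30; J <- 5
B <- matrix(rnorm(T0 * J), T0, J)
A <- drop(B %*% c(0.5, 0.3, 0.2, 0, 0)) + rnorm(T0, sd = 0.2)

## OLS ingredients for the Hoerl-Kannard-Baldwin rule of thumb
fit        <- lm.fit(x = B, y = A)
w.ols      <- fit$coefficients
sigma2.ols <- sum(fit$residuals^2) / (T0 - J)

## Shrinkage parameter and implied size Q of the L2 constraint
lambda <- J * sigma2.ols / sum(w.ols^2)
Q      <- sqrt(sum(w.ols^2)) / (1 + lambda)    # ||w.ols||_2 / (1 + lambda)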

Missing Data

In case of missing values, we adopt different strategies depending on which units have missing entries and when these occur.

  • Missing pre-treatment data. In this case we compute $\widehat{\mathbf{w}}$ without the periods for which there is at least one missing entry for either the treated unit or one of the donors.

  • Missing post-treatment donor data. Suppose that the $i$th donor has a missing entry in one of the $M$ features in the post-treatment period $\widetilde{T}$. This implies that the predictor vector $\mathbf{p}_{\widetilde{T}}$ has a missing entry, and thus the synthetic unit and the associated prediction intervals are not available.

  • Missing post-treatment treated data. Data for the treated unit after the treatment are only used to quantify the treatment effect $\tau_{T}$, which then will not be available. However, prediction intervals for the synthetic point prediction of the counterfactual outcome $Y_{1t}(0)$, $t>T_{0}$, can still be computed in the usual way, as they do not rely on the availability of such data points.

4 Uncertainty Quantification

Following Cattaneo, Feng and Titiunik (2021) and Cattaneo, Feng, Palomba and Titiunik (2022), we view the quantity of interest $\tau_{T}$ within the synthetic control framework as a random variable, and hence we refrain from calling it a parameter. Building an analogy with the concept of estimand (or parameter of interest), we refer to $\tau_{T}$ as a predictand. Consequently, we prefer to call $\widehat{\tau}_{T}=Y_{1T}(1)-\widehat{Y}_{1T}(0)$, based on (3.3), a prediction of $\tau_{T}$ rather than an estimator of it, and our goal is to characterize the uncertainty of $\widehat{\tau}_{T}$ by building prediction intervals rather than confidence intervals. In practice, it is appealing to construct prediction intervals that are valid conditional on a set of observables. We let $\mathscr{H}$ be an information set generated by all features of the control units and covariates used in the synthetic control construction, i.e., $\mathbf{B}$, $\mathbf{C}$, $\mathbf{x}_{T}$, and $\mathbf{g}_{T}$.

We first decompose the potential outcome of the treated unit based on $\mathbf{w}_{0}$ and $\mathbf{r}_{0}$ introduced in (3.2):

Y_{1T}(0)\equiv\mathbf{x}_{T}^{\prime}\mathbf{w}_{0}+\mathbf{g}_{T}^{\prime}\mathbf{r}_{0}+e_{T}=\mathbf{p}_{T}^{\prime}\bm{\beta}_{0}+e_{T},\qquad T>T_{0}, (4.1)

where $e_{T}$ is defined by construction. In our analysis, $\mathbf{w}_{0}$ and $\mathbf{r}_{0}$ are assumed to be (possibly) random quantities around which $\widehat{\mathbf{w}}$ and $\widehat{\mathbf{r}}$ are concentrating in probability, respectively. Then, the distance between the predicted treatment effect on the treated unit and its target is

\widehat{\tau}_{T}-\tau_{T}=Y_{1T}(0)-\widehat{Y}_{1T}(0)=e_{T}-\mathbf{p}_{T}^{\prime}(\widehat{\bm{\beta}}-\bm{\beta}_{0}), (4.2)

where $e_{T}$ is the out-of-sample error coming from misspecification along with any additional noise occurring at the post-treatment period $T>T_{0}$, and the term $\mathbf{p}_{T}^{\prime}(\widehat{\bm{\beta}}-\bm{\beta}_{0})$ is the in-sample error coming from the construction of the synthetic control weights. Our goal is to find probability bounds on the two terms separately to give uncertainty quantification: for some pre-specified levels $\alpha_{1},\alpha_{2}\in(0,1)$, with high probability over $\mathscr{H}$,

\mathbb{P}\big[M_{1,\mathtt{L}}\leq\mathbf{p}_{T}^{\prime}(\widehat{\bm{\beta}}-\bm{\beta}_{0})\leq M_{1,\mathtt{U}}\;\big|\;\mathscr{H}\big]\geq 1-\alpha_{1}\quad\text{and}\quad\mathbb{P}\big[M_{2,\mathtt{L}}\leq e_{T}\leq M_{2,\mathtt{U}}\;\big|\;\mathscr{H}\big]\geq 1-\alpha_{2}.

It follows that these probability bounds can be combined to construct a prediction interval for $\tau_{T}$ with conditional coverage of at least $1-\alpha_{1}-\alpha_{2}$: with high probability over $\mathscr{H}$,

\mathbb{P}\big[\widehat{\tau}_{T}+M_{1,\mathtt{L}}-M_{2,\mathtt{U}}\leq\tau_{T}\leq\widehat{\tau}_{T}+M_{1,\mathtt{U}}-M_{2,\mathtt{L}}\;\big|\;\mathscr{H}\big]\geq 1-\alpha_{1}-\alpha_{2}.

4.1 In-Sample Error

Cattaneo, Feng and Titiunik (2021) provide a principled simulation-based method for quantifying the in-sample uncertainty coming from $\mathbf{p}_{T}^{\prime}(\widehat{\bm{\beta}}-\bm{\beta}_{0})$. Let $\mathbf{Z}=(\mathbf{B},\mathbf{C})$ and let $\mathbf{D}$ be a non-negative diagonal (scaling) matrix of size $d$, possibly depending on the pre-treatment sample size $T_{0}$. Since $\widehat{\bm{\beta}}$ solves (3.1), $\widehat{\bm{\delta}}:=\mathbf{D}(\widehat{\bm{\beta}}-\bm{\beta}_{0})$ is the optimizer of the centered criterion function:

\widehat{\bm{\delta}}=\underset{\bm{\delta}\in\Delta}{\operatorname*{arg\,min}}\;\big\{\bm{\delta}^{\prime}\widehat{\mathbf{Q}}\bm{\delta}-2\widehat{\bm{\gamma}}^{\prime}\bm{\delta}\big\},

where $\widehat{\mathbf{Q}}=\mathbf{D}^{-1}\mathbf{Z}^{\prime}\mathbf{V}\mathbf{Z}\mathbf{D}^{-1}$, $\widehat{\bm{\gamma}}^{\prime}=\mathbf{u}^{\prime}\mathbf{V}\mathbf{Z}\mathbf{D}^{-1}$, and $\Delta=\{\bm{h}\in\mathbb{R}^{d}:\bm{h}=\mathbf{D}(\bm{\beta}-\bm{\beta}_{0}),\,\bm{\beta}\in\mathcal{W}\times\mathcal{R}\}$. Recall that the information set conditional on which our prediction intervals are constructed contains $\mathbf{B}$ and $\mathbf{C}$. Thus, $\widehat{\mathbf{Q}}$ can be taken as fixed, and we need to characterize the uncertainty of $\widehat{\bm{\gamma}}$.

We construct a simulation-based criterion function accordingly:

\ell^{\star}(\bm{\delta})=\bm{\delta}^{\prime}\widehat{\mathbf{Q}}\bm{\delta}-2(\mathbf{G}^{\star})^{\prime}\bm{\delta},\qquad\mathbf{G}^{\star}\thicksim\mathsf{N}(\mathbf{0},\widehat{\bm{\Sigma}}), (4.3)

where $\widehat{\bm{\Sigma}}$ is some estimate of $\bm{\Sigma}=\mathbb{V}[\widehat{\bm{\gamma}}|\mathscr{H}]$ and $\mathsf{N}(\mathbf{0},\widehat{\bm{\Sigma}})$ denotes the normal distribution with mean $\mathbf{0}$ and variance-covariance matrix $\widehat{\bm{\Sigma}}$. In practice, the criterion function $\ell^{\star}(\cdot)$ can be simulated by simply drawing normal random vectors $\mathbf{G}^{\star}$.
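A minimal sketch of this simulation step with placeholder inputs (a generic $\widehat{\mathbf{Q}}$ and $\widehat{\bm{\Sigma}}$), drawing the Gaussian vectors with MASS::mvrnorm:

library(MASS)

set.seed(1)
d         <- 3
Q.hat     <- crossprod(matrix(rnorm(d * d), d, d))   # placeholder positive semi-definite matrix
Sigma.hat <- diag(d)                                 # placeholder estimate of Var(gamma.hat | H)

S      <- 1000
G.star <- mvrnorm(S, mu = rep(0, d), Sigma = Sigma.hat)   # S draws of G* ~ N(0, Sigma.hat)

## Simulated criterion ell*(delta) = delta' Q.hat delta - 2 (G*)' delta at a given delta
delta <- c(0.1, -0.2, 0.05)
ell   <- as.numeric(t(delta) %*% Q.hat %*% delta) - 2 * as.numeric(G.star %*% delta)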

Since the original constraint set $\Delta$ is infeasible, we need to construct a constraint set $\Delta^{\star}$ used in simulation that is close to $\Delta$. Specifically, define the distance between a point $\mathbf{a}\in\mathbb{R}^{d}$ and a set $\Lambda\subseteq\mathbb{R}^{d}$ by

\mathrm{dist}(\mathbf{a},\Lambda)=\inf_{\bm{\lambda}\in\Lambda}\|\mathbf{a}-\bm{\lambda}\|,

where $\|\cdot\|$ is a generic $\ell_{p}$ vector norm on $\mathbb{R}^{d}$ with $p\geq 1$ (e.g., the Euclidean norm or the $\ell_{1}$ norm). We require

\mathrm{dist}(\mathbf{a},\Delta^{\star})\ll\|\mathbf{a}\|,\quad\forall\,\mathbf{a}\in\Delta\cap\mathcal{B}(\mathbf{0},\varepsilon), (4.4)

where $\mathcal{B}(\mathbf{0},\varepsilon)$ is an $\varepsilon$-neighborhood around zero for some $\varepsilon>0$. In words, every point in the infeasible constraint set $\Delta$ has to be sufficiently close to the feasible constraint set used in simulation. We discuss below a principled strategy for constructing $\Delta^{\star}$, which allows for both linear and non-linear constraints in the feasibility set. Section 4.3 provides details on how $\Delta^{\star}$ is constructed and implemented in the scpi package.

Given the feasible criterion function $\ell^{\star}(\cdot)$ and constraint set $\Delta^{\star}$, we let

M_{1,\mathtt{L}}:=(\alpha_{1}/2)\text{-quantile of }\inf\Big\{\mathbf{p}_{T}^{\prime}\mathbf{D}^{-1}\bm{\delta}:\bm{\delta}\in\Delta^{\star},\,\ell^{\star}(\bm{\delta})\leq 0\Big\},\qquad\text{and}
M_{1,\mathtt{U}}:=(1-\alpha_{1}/2)\text{-quantile of }\sup\Big\{\mathbf{p}_{T}^{\prime}\mathbf{D}^{-1}\bm{\delta}:\bm{\delta}\in\Delta^{\star},\,\ell^{\star}(\bm{\delta})\leq 0\Big\},

conditional on the data. Under mild regularity conditions, for a large class of synthetic control predictands based on (3.1), with high probability over $\mathscr{H}$,

\mathbb{P}\big[M_{1,\mathtt{L}}\leq\mathbf{p}_{T}^{\prime}(\widehat{\bm{\beta}}-\bm{\beta}_{0})\leq M_{1,\mathtt{U}}\;\big|\;\mathscr{H}\big]\geq 1-\alpha_{1},

up to some small loss in the (conditional) coverage probability. Importantly, this conclusion holds whether the data are stationary or non-stationary and whether the model is correctly specified (i.e., $\mathbb{E}[\mathbf{u}|\mathscr{H}]=0$) or not. If the constraints imposed are non-linear, an additional adjustment to this bound may be needed to ensure the desired coverage.

4.2 Out-of-Sample Error

The unobserved random variable $e_{T}$ in (4.1) is a single error term in period $T$, which we interpret as the error from out-of-sample prediction, conditional on $\mathscr{H}$. Naturally, in order to have a proper bound on $e_{T}$, it is necessary to determine certain features of its conditional distribution $F_{e_{T}}(\mathsf{e})=\mathbb{P}[e_{T}\leq\mathsf{e}|\mathscr{H}]$. In this section, we outline principled but agnostic approaches to quantify the uncertainty introduced by the post-treatment unobserved shock $e_{T}$. Since formalizing the validity of our methods usually requires strong assumptions, we also recommend a generic sensitivity analysis to incorporate out-of-sample uncertainty into the prediction intervals. See Section 4.5 and Section 5, in particular Figure 3 with the corresponding snippet of R code, for further clarifications on how to carry out sensitivity analysis on $e_{T}$.

  • Approach 1: Non-asymptotic bounds. The starting point is a non-asymptotic probability bound on $e_{T}$ via concentration inequalities. For example, suppose that $e_{T}$ is sub-Gaussian conditional on $\mathscr{H}$, i.e., there exists some $\sigma_{\mathscr{H}}>0$ such that $\mathbb{E}[\exp(\lambda(e_{T}-\mathbb{E}[e_{T}|\mathscr{H}]))|\mathscr{H}]\leq\exp(\sigma_{\mathscr{H}}^{2}\lambda^{2}/2)$ almost surely for all $\lambda\in\mathbb{R}$. Then, we can take

    M_{2,\mathtt{L}}:=\mathbb{E}[e_{T}|\mathscr{H}]-\sqrt{2\sigma_{\mathscr{H}}^{2}\log(2/\alpha_{2})}\qquad\text{and}\qquad M_{2,\mathtt{U}}:=\mathbb{E}[e_{T}|\mathscr{H}]+\sqrt{2\sigma_{\mathscr{H}}^{2}\log(2/\alpha_{2})}.

    In practice, the conditional mean $\mathbb{E}[e_{T}|\mathscr{H}]$ and the sub-Gaussian parameter $\sigma_{\mathscr{H}}$ can be parameterized and/or estimated using the pre-treatment residuals (a small numerical sketch of this construction is given at the end of this subsection).

  • Approach 2: Location-scale model. Suppose that $e_{T}=\mathbb{E}[e_{T}|\mathscr{H}]+(\mathbb{V}[e_{T}|\mathscr{H}])^{1/2}\varepsilon_{T}$ with $\varepsilon_{T}$ statistically independent of $\mathscr{H}$. This setting imposes restrictions on the distribution of $e_{T}|\mathscr{H}$, but allows for a much simpler tabulation strategy. Specifically, we can set the lower bound and upper bound on $e_{T}$ as follows:

    M_{2,\mathtt{L}}=\mathbb{E}[e_{T}|\mathscr{H}]+(\mathbb{V}[e_{T}|\mathscr{H}])^{1/2}\mathfrak{c}_{\varepsilon}(\alpha_{2}/2)\quad\text{and}\quad M_{2,\mathtt{U}}=\mathbb{E}[e_{T}|\mathscr{H}]+(\mathbb{V}[e_{T}|\mathscr{H}])^{1/2}\mathfrak{c}_{\varepsilon}(1-\alpha_{2}/2),

    where $\mathfrak{c}_{\varepsilon}(\alpha_{2}/2)$ and $\mathfrak{c}_{\varepsilon}(1-\alpha_{2}/2)$ are the $\alpha_{2}/2$ and $(1-\alpha_{2}/2)$ quantiles of $\varepsilon_{T}$, respectively. In practice, $\mathbb{E}[e_{T}|\mathscr{H}]$ and $\mathbb{V}[e_{T}|\mathscr{H}]$ can be parametrized and estimated using the pre-intervention residuals, or perhaps tabulated using auxiliary information. Once such estimates are available, the appropriate quantiles can be easily obtained using the standardized (estimated) residuals.

  • Approach 3: Quantile regression. Another strategy to bound $e_{T}$ is to determine the $\alpha_{2}/2$ and $(1-\alpha_{2}/2)$ conditional quantiles of $e_{T}|\mathscr{H}$, that is,

    M_{2,\mathtt{L}}:=(\alpha_{2}/2)\text{-quantile of }e_{T}|\mathscr{H}\qquad\text{and}\qquad M_{2,\mathtt{U}}:=(1-\alpha_{2}/2)\text{-quantile of }e_{T}|\mathscr{H}.

    Consequently, we can employ quantile regression methods to estimate those quantities using pre-treatment data.

Using any of the above methods, we have the following probability bound on $e_{T}$:

\mathbb{P}\big[M_{2,\mathtt{L}}\leq e_{T}\leq M_{2,\mathtt{U}}\;\big|\;\mathscr{H}\big]\geq 1-\alpha_{2}.
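As a small numerical illustration of Approach 1, the sketch below forms the sub-Gaussian bounds from pre-treatment residuals, using the residual sample mean and standard deviation as rough stand-ins for $\mathbb{E}[e_{T}|\mathscr{H}]$ and $\sigma_{\mathscr{H}}$; this is a heuristic choice for illustration only, not the package's internal default.

set.seed(1)
u.hat  <- rnorm(30, mean = 0, sd = 0.5)   # placeholder pre-treatment residuals
alpha2 <- 0.05

e.mean  <- mean(u.hat)                    # stand-in for E[e_T | H]
sigma.H <- sd(u.hat)                      # stand-in for the sub-Gaussian parameter
half    <- sqrt(2 * sigma.H^2 * log(2 / alpha2))

M2.L <- e.mean - half
M2.U <- e.mean + half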

4.3 Implementation

We now discuss the implementation details. The function scpi(), through various options, allows the user to specify different approaches to quantify in-sample and out-of-sample uncertainty based on the methods described above. Most importantly, scpi() permits modelling the in-sample error $\mathbf{p}_{T}^{\prime}(\widehat{\bm{\beta}}-\bm{\beta}_{0})$ and the out-of-sample error $e_{T}$ separately. In addition, the user can provide bounds on them manually with the options w.bounds and e.bounds, respectively, which can be useful for sensitivity analysis in empirical applications.

Modelling In-Sample Uncertainty

In-sample uncertainty stems from the prediction of $\mathbf{p}_{T}^{\prime}(\widehat{\bm{\beta}}-\bm{\beta}_{0})$, and its quantification reduces to determining $M_{1,\mathtt{L}}$ and $M_{1,\mathtt{U}}$. We first review the methodological proposals for constructing the constraint set $\Delta^{\star}$ used in simulation discussed in Cattaneo, Feng, Palomba and Titiunik (2022), and then present the main procedure for constructing bounds on the in-sample error.


Constructing $\Delta^{\star}$. Our in-sample uncertainty quantification requires the centered and scaled constraint feasibility set $\Delta$ to be locally identical to (or, at least, well approximated by) the constraint set $\Delta^{\star}$ used in the simulation described in (4.3), in the sense of (4.4). Suppose that

\mathcal{W}\times\mathcal{R}=\Big\{\bm{\beta}\in\mathbb{R}^{d}:\bm{m}_{\mathtt{eq}}(\bm{\beta})=\bm{0},\;\bm{m}_{\mathtt{in}}(\bm{\beta})\leq\bm{0}\Big\},

where $\bm{m}_{\mathtt{eq}}(\cdot)\in\mathbb{R}^{d_{\mathtt{eq}}}$ and $\bm{m}_{\mathtt{in}}(\cdot)\in\mathbb{R}^{d_{\mathtt{in}}}$, and denote the $j$th constraint in $\bm{m}_{\mathtt{in}}(\cdot)$ by $m_{\mathtt{in},j}(\cdot)$. Given tuning parameters $\varrho_{j}>0$, $j=1,\cdots,d_{\mathtt{in}}$, let $\mathcal{B}$ be the set of indices for the inequality constraints such that $m_{\mathtt{in},j}(\widehat{\bm{\beta}})>-\varrho_{j}$. Then, we construct $\Delta^{\star}$ as

\Delta^{\star}=\Big\{\mathbf{D}(\bm{\beta}-\widehat{\bm{\beta}}):\bm{m}_{\mathtt{eq}}(\bm{\beta})=\bm{0},\;m_{\mathtt{in},j}(\bm{\beta})\leq m_{\mathtt{in},j}(\widehat{\bm{\beta}})\text{ for }j\in\mathcal{B},\text{ and }m_{\mathtt{in},l}(\bm{\beta})\leq 0\text{ for }l\notin\mathcal{B}\Big\}.

In practice, we need to choose possibly heterogeneous parameters $\varrho_{j}$, $j=1,\ldots,d_{\mathtt{in}}$, for the different inequality constraints. Our proposed choice of $\varrho_{j}$ is

\varrho_{j}:=\Big\|\frac{\partial}{\partial\bm{\beta}}m_{\mathtt{in},j}(\widehat{\bm{\beta}})\Big\|_{1}\times\varrho,\quad j=1,\ldots,d_{\mathtt{in}},

for some parameter $\varrho$, where $\|\cdot\|_{1}$ denotes the $\ell_{1}$-norm. We estimate $\varrho$ according to the following formula if $M=1$:

\varrho=\mathcal{C}\frac{\log(T_{0})^{c}}{T_{0}^{1/2}},

where $c=1/2$ if the data are i.i.d. or weakly dependent, and $c=1$ if $\mathbf{A}$ and $\mathbf{B}$ form a cointegrated system, while $\mathcal{C}$ is one of the following:

\mathcal{C}_{1}=\frac{\widehat{\sigma}_{u}}{\min_{1\leq j\leq J}\widehat{\sigma}_{b_{j}}},\qquad\mathcal{C}_{2}=\frac{\max_{1\leq j\leq J}\widehat{\sigma}_{b_{j}}\widehat{\sigma}_{u}}{\min_{1\leq j\leq J}\widehat{\sigma}^{2}_{b_{j}}},\qquad\mathcal{C}_{3}=\frac{\max_{1\leq j\leq J}\widehat{\sigma}_{b_{j}u}}{\min_{1\leq j\leq J}\widehat{\sigma}^{2}_{b_{j}}},

with $\mathcal{C}_{1}$ as the default. Here, $\widehat{\sigma}_{b_{j}u}$ is the estimated (unconditional) covariance between the pseudo-true residual $\mathbf{u}$ and the feature of the $j$th control unit $\mathbf{B}_{j,1}$, while $\widehat{\sigma}_{u}$ and $\widehat{\sigma}_{b_{j}}$ are the estimated (unconditional) standard deviations of, respectively, $\mathbf{u}$ and $\mathbf{B}_{j,1}$. In the case of multiple features ($M>1$), the package employs the same construction above after stacking the data.
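A small sketch of this choice of $\varrho$ for the default $\mathcal{C}=\mathcal{C}_{1}$ and i.i.d. data ($c=1/2$), with placeholder residuals and donor features:

set.seed(1)
T0 <- 30; J <- 5
B     <- matrix(rnorm(T0 * J), T0, J)   # donor features (placeholder)
u.hat <- rnorm(T0, sd = 0.3)            # pre-treatment residuals (placeholder)

sigma.u  <- sd(u.hat)                   # estimated sd of the pseudo-true residual
sigma.bj <- apply(B, 2, sd)             # estimated sd of each donor feature

C1    <- sigma.u / min(sigma.bj)
c.exp <- 1/2                            # i.i.d. or weakly dependent data
rho   <- C1 * log(T0)^c.exp / sqrt(T0)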


Degrees-of-Freedom Correction. Our uncertainty quantification strategy requires an estimator of the conditional variance $\mathbb{V}[\mathbf{u}|\mathscr{H}]$, which may rely on the effective degrees of freedom $\mathsf{df}$ of the synthetic control method. In general, there exists no exact correspondence between the degrees of freedom and the number of parameters in a fitting model (Ye, 1998). Therefore, the estimated degrees of freedom $\widehat{\mathsf{df}}$ are defined according to the chosen constraint sets for $\bm{\beta}$ underlying the estimation procedure in (3.1):

  • OLS. $\widehat{\mathsf{df}}=J+KM$.

  • Lasso. Following Zou, Hastie and Tibshirani (2007), an unbiased and consistent estimator of $\mathsf{df}$ is $\widehat{\mathsf{df}}=\sum_{j=1}^{J}\mathds{1}(\widehat{w}_{j}>0)+KM$, where $\widehat{w}_{j}$ is the $j$th element of the constructed weights $\widehat{\mathbf{w}}$.

  • Simplex. Following the discussion for Lasso, $\widehat{\mathsf{df}}=\sum_{j=1}^{J}\mathds{1}(\widehat{w}_{j}>0)-1+KM$.

  • Ridge. Let $s_{1}\geq s_{2}\geq\cdots\geq s_{J}\geq 0$ be the singular values of $\mathbf{B}$ and let $\lambda$ be the complexity parameter of the corresponding Lagrangian Ridge problem, which satisfies $\lambda\widehat{\mathbf{w}}=\mathbf{B}^{\prime}(\mathbf{A}-\mathbf{B}\widehat{\mathbf{w}})$. Then, following Friedman, Hastie and Tibshirani (2001), $\widehat{\mathsf{df}}=\sum_{j=1}^{J}\frac{s_{j}^{2}}{s_{j}^{2}+\lambda}+KM$.
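For instance, a minimal sketch of the Ridge degrees-of-freedom formula for a given $\mathbf{B}$ and $\lambda$ (placeholder values, with $K=0$):

set.seed(1)
B      <- matrix(rnorm(30 * 5), 30, 5)
lambda <- 2                          # complexity parameter of the Lagrangian Ridge problem

s      <- svd(B)$d                   # singular values s_1 >= ... >= s_J
df.hat <- sum(s^2 / (s^2 + lambda))  # add K * M if covariates C are included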


Main procedure. Given the constraint set $\Delta^{\star}$, the main procedure for computing the upper and lower bounds on the in-sample error is as follows:

  1. Step 1.

    Estimation of the conditional moments of $\mathbf{u}$. To estimate $\bm{\Sigma}$ and to simulate the criterion function (4.3) we need an estimate of $\mathbb{V}[\widehat{\bm{\gamma}}|\mathscr{H}]$ which, in turn, depends on the conditional moments of $\mathbf{u}$. To estimate such moments, the user needs to specify three things:

    i) whether the model is misspecified or not, via the option u.missp;

    ii) how to model $\mathbf{u}$, via the options u.order, u.lags, and u.design;

    iii) an estimator of $\mathbb{V}[\mathbf{u}|\mathscr{H}]$, via the option u.sigma.

    Given the constructed weights $\widehat{\mathbf{w}}=(\widehat{w}_{1},\cdots,\widehat{w}_{J})^{\prime}$, define the regularized weights $\widehat{\mathbf{w}}^{\star}=(\widehat{w}_{1}^{\star},\cdots,\widehat{w}_{J}^{\star})^{\prime}$ with $\widehat{w}_{j}^{\star}=\widehat{w}_{j}\mathds{1}(\widehat{w}_{j}>\varrho)$ for the tuning parameter $\varrho$ specified previously. Let $\mathbf{B}^{\star}=\operatorname*{diag}(\mathbf{B}^{\star}_{1},\mathbf{B}^{\star}_{2},\ldots,\mathbf{B}^{\star}_{M})$, where $\mathbf{B}^{\star}_{l}$ denotes the matrix composed only of the columns of $\mathbf{B}_{l}$ with non-zero regularized weight $\widehat{w}_{j}^{\star}$. If the option cointegrated.data in scdata() is set to TRUE, we take the first differences of the columns of $\mathbf{B}_{l}$ rather than the columns themselves. If the user inputs u.missp = FALSE, then it is assumed that $\mathbb{E}[\mathbf{u}|\mathscr{H}]=0$, whereas if u.missp = TRUE (default), then $\mathbb{E}[\mathbf{u}|\mathscr{H}]$ needs to be estimated.

    The unknown conditional expectation 𝔼[𝐮|]\mathbb{E}[\mathbf{u}|\mathscr{H}] is estimated using the fitted values of a flexible linear-in-parameters regression of 𝐮^=𝐀𝐁𝐰^𝐂𝐫^\widehat{\mathbf{u}}=\mathbf{A}-\mathbf{B}\widehat{\mathbf{w}}-\mathbf{C}\widehat{\mathbf{r}} on a design matrix 𝐃𝐮\mathbf{D}_{\mathbf{u}}, which can be provided directly with the option u.design or by specifying the lags of 𝐁\mathbf{B}^{\star} (u.lags) and/or the order of the fully interacted polynomial in 𝐁\mathbf{B}^{\star} (u.order).

    For example, if the user specifies u.lags = 1 and u.order = 1, then the design matrix is 𝐃𝐮=[𝐁𝐁1𝐂]\mathbf{D}_{\mathbf{u}}=[\mathbf{B}^{\star}\;\;\mathbf{B}^{\star}_{-1}\;\;\mathbf{C}], where 𝐁1\mathbf{B}^{\star}_{-1} indicates the first lag of 𝐁\mathbf{B}^{\star}. If, instead, u.order = 0 and u.lags = 0 are specified, then 𝔼^[𝐮|]=𝐮¯𝜾T0\widehat{\mathbb{E}}[\mathbf{u}|\mathscr{H}]=\overline{\mathbf{u}}\otimes\bm{\iota}_{T_{0}}, where 𝐮¯=(u¯1,u¯2,,u¯M)\overline{\mathbf{u}}=(\overline{u}_{1},\overline{u}_{2},\ldots,\overline{u}_{M})^{\prime} with u¯l=T01t=1T0u^t,l\overline{u}_{l}=T_{0}^{-1}\sum_{t=1}^{T_{0}}\widehat{u}_{t,l}, 𝜾ν\bm{\iota}_{\nu} is a ν×1\nu\times 1 vector of ones, and \otimes denotes the Kronecker product.

    The conditional variance of 𝐮\mathbf{u} is estimated as

    𝕍^[𝐮|]=diag(𝚟𝚌1(u^1,1𝔼^[u1,1|])2,,𝚟𝚌T0M(u^T0,M𝔼^[uT0,M|])2)\widehat{\mathbb{V}}[\mathbf{u}|\mathscr{H}]=\operatorname*{diag}\Big{(}\mathtt{vc}_{1}(\widehat{u}_{1,1}-\widehat{\mathbb{E}}[u_{1,1}|\mathscr{H}])^{2},\cdots,\mathtt{vc}_{T_{0}\cdot M}(\widehat{u}_{T_{0},M}-\widehat{\mathbb{E}}[u_{T_{0},M}|\mathscr{H}])^{2}\Big{)}

    where 𝚟𝚌i\mathtt{vc}_{i}, i=1,,T0Mi=1,\cdots,T_{0}\cdot M is a sequence of variance-correction constants, which can be chosen among the well-known family of heteroskedasticity-robust variance-covariance estimators through the option u.sigma. In particular, the package currently allows for five choices:

    𝚟𝚌i(0)=1,𝚟𝚌i(1)=T0MT0M𝖽𝖿,𝚟𝚌i(2)=11𝐋ii,𝚟𝚌i(3)=1(1𝐋ii)2,𝚟𝚌i(4)=1(1𝐋ii)δi\mathtt{vc}_{i}^{(0)}=1,\quad\mathtt{vc}_{i}^{(1)}=\frac{T_{0}\cdot M}{T_{0}\cdot M-\mathsf{df}},\quad\mathtt{vc}_{i}^{(2)}=\frac{1}{1-\mathbf{L}_{ii}},\quad\mathtt{vc}_{i}^{(3)}=\frac{1}{\left(1-\mathbf{L}_{ii}\right)^{2}},\quad\mathtt{vc}_{i}^{(4)}=\frac{1}{\left(1-\mathbf{L}_{ii}\right)^{\delta_{i}}}

with 𝐋ii\mathbf{L}_{ii} being the ii-th diagonal entry of the leverage matrix 𝐋:=𝐙(𝐙𝐕𝐙)1𝐙𝐕\mathbf{L}:=\mathbf{Z}(\mathbf{Z}^{\prime}\mathbf{V}\mathbf{Z})^{-1}\mathbf{Z}^{\prime}\mathbf{V}, δi=min{4,T0M𝐋ii/𝖽𝖿}\delta_{i}=\min\{4,T_{0}\cdot M\cdot\mathbf{L}_{ii}/\mathsf{df}\}, and 𝖽𝖿\mathsf{df} is the degrees-of-freedom correction factor whose estimation was described above (a numerical sketch of these variance corrections and of the quantile step in Step 5 appears right after this procedure).

Step 2.

    Estimation of 𝚺\bm{\Sigma}. The estimator of 𝚺\bm{\Sigma} is 𝚺^=(𝐙𝐕)𝕍^[𝐮|](𝐕𝐙)\widehat{\bm{\Sigma}}=(\mathbf{Z}^{\prime}\mathbf{V})\widehat{\mathbb{V}}[\mathbf{u}|\mathscr{H}](\mathbf{V}\mathbf{Z}).

Step 3.

    Simulation. The criterion function (𝜹)\ell^{\star}(\bm{\delta}) in (4.3) is simulated by drawing i.i.d. random vectors from the Gaussian distribution 𝖭(0,𝚺^)\mathsf{N}(0,\widehat{\bm{\Sigma}}), conditional on the data.

Step 4.

    Optimization. Let (s)(𝜹)\ell_{(s)}^{\star}(\bm{\delta}) denote the criterion function corresponding to the ss-th draw from 𝖭(0,𝚺^)\mathsf{N}(0,\widehat{\bm{\Sigma}}). For each draw ss, we solve the following constrained problems:

    l(s):=inf𝜹Δ,(s)(𝜹)0𝐩T𝐃1𝜹andu(s):=sup𝜹Δ,(s)(𝜹)0𝐩T𝐃1𝜹,\displaystyle l_{(s)}:=\inf_{\bm{\delta}\in\Delta^{\star},\,\ell_{(s)}^{\star}(\bm{\delta})\leq 0}\mathbf{p}_{T}^{\prime}\mathbf{D}^{-1}\bm{\delta}\qquad\text{and}\qquad u_{(s)}:=\sup_{\bm{\delta}\in\Delta^{\star},\,\ell_{(s)}^{\star}(\bm{\delta})\leq 0}\mathbf{p}_{T}^{\prime}\mathbf{D}^{-1}\bm{\delta}, (4.5)

    where Δ\Delta^{\star} is constructed as explained previously.

Step 5.

    Estimation of M1,𝙻M_{1,\mathtt{L}} and M1,𝚄M_{1,\mathtt{U}}. Step 4 is repeated SS times, where SS can be specified with the option sims. Then, M1,𝙻M_{1,\mathtt{L}} is the (α1/2)(\alpha_{1}/2)-quantile of {l(s)}s=1S\{l_{(s)}\}_{s=1}^{S} and M1,𝚄M_{1,\mathtt{U}} is the (1α1/2)(1-\alpha_{1}/2)-quantile of {u(s)}s=1S\{u_{(s)}\}_{s=1}^{S}. The level of α1\alpha_{1} can be chosen with the option u.alpha.
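To make the procedure concrete, the sketch below reproduces the variance-correction constants of Step 1, the estimator of Step 2, and the quantile extraction of Step 5 on hypothetical inputs. All objects (Z, V, u.hat, Eu.hat, df.hat, and the simulated bounds l.s and u.s) are placeholders introduced purely for illustration; in particular, the simulated bounds are random stand-ins for the solutions of (4.5), not output produced by scpi.

set.seed(8894)
T0 <- 31; M <- 1; n <- T0 * M
Z <- cbind(1, matrix(rnorm(n * 3), nrow = n))    # placeholder for the design [B C]
V <- diag(n)                                     # weighting matrix
u.hat <- rnorm(n)                                # placeholder pseudo-residuals
Eu.hat <- rep(0, n)                              # placeholder fitted values of E[u | H]
df.hat <- 4                                      # placeholder estimated degrees of freedom
# Step 1: leverage matrix L = Z (Z'VZ)^(-1) Z'V and the five variance-correction constants
Lmat <- Z %*% solve(t(Z) %*% V %*% Z) %*% t(Z) %*% V
Lii <- diag(Lmat)
vc0 <- rep(1, n)
vc1 <- rep(n / (n - df.hat), n)
vc2 <- 1 / (1 - Lii)
vc3 <- 1 / (1 - Lii)^2
vc4 <- 1 / (1 - Lii)^pmin(4, n * Lii / df.hat)
V.u.hat <- diag(vc1 * (u.hat - Eu.hat)^2)        # HC1-type conditional variance of u
# Step 2: Sigma.hat = (Z'V) V.u.hat (V Z)
Sigma.hat <- t(Z) %*% V %*% V.u.hat %*% V %*% Z
# Step 5: quantiles of the simulated lower/upper bounds (random stand-ins here;
# in practice these are obtained by solving (4.5) for each draw from N(0, Sigma.hat))
S <- 1000; alpha1 <- 0.05
l.s <- rnorm(S, mean = -1); u.s <- rnorm(S, mean = 1)
M1.L <- quantile(l.s, alpha1 / 2)
M1.U <- quantile(u.s, 1 - alpha1 / 2)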


Execution Speed and Parallelization. Steps 3 and 4 of the procedure above are the most computationally intensive, and we optimize them in two ways. First, to solve the optimization problem in (4.5), scpi relies on ECOS, an efficient solver for conic problems (Domahidi, Chu and Boyd, 2013; Fu, Narasimhan and Boyd, 2020). See Cattaneo, Feng, Palomba and Titiunik (2022) for more details on how to cast the different constrained SC methods as conic optimization problems. To give the reader a sense of the speed improvement, Table 4 compares the execution speed of the conic solver we rely on (first column) with two other popular optimizers in R. The first row of the table reports the median computation time of each optimizer, whereas the second row shows the inter-quartile range. On the one hand, using a conic solver in place of a solver for more generic optimization programs (like nloptr) makes our software roughly 4 times faster. On the other hand, our software is tailored to rewrite the SC problem as a conic problem. This yields a more than 200-fold gain in speed when compared to CVXR, which also relies on ECOS but is meant for prototyping generic optimization problems in conic form.

Table 4: Speed comparison across optimizers (units: milliseconds).
         ECOS              CVXR                  nloptr
Median   1.411             308.823               5.734
IQR      [1.387, 1.431]    [301.183, 315.901]    [5.534, 6.148]

Notes: The underlying optimization problem is the minimization problem in (4.5), where 𝒲\mathcal{W} is a simplex-type constraint and JJ, KMKM, and MM are chosen to replicate the size of the empirical application in Section 5. We evaluate the performance of the function scpi through the R package microbenchmark. This simulation was run on an Apple M2 chip with 8 GB of RAM.

Second, scpi can be sped up further through efficient parallelization via the R package parallel, which assigns different simulations to different cores. If 𝙽cores\mathtt{N}_{\text{cores}} cores are used, the final execution time is approximately 𝚃exec/𝙽cores\mathtt{T}_{\text{exec}}/\mathtt{N}_{\text{cores}}, where 𝚃exec\mathtt{T}_{\text{exec}} is the execution time when a single core is used.
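For instance, on a multi-core machine one may proceed as in the minimal sketch below, where df is an object previously created by scdata() (as illustrated in Section 5 below), the number of cores is chosen with parallel::detectCores(), and the remaining scpi() options are assumed to be left at their defaults.

cores <- max(1, parallel::detectCores() - 1)     # leave one core free
res.pi <- scpi(data = df, sims = 1000, cores = cores,
               w.constr = list(name = "simplex"))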

Modelling Out-of-Sample Uncertainty

To quantify the uncertainty coming from eTe_{T}, we need to impose some probabilistic structure that allows us to model the distribution [eT𝚎|]\mathbb{P}[e_{T}\leq\mathtt{e}|\mathscr{H}] and, ultimately, estimate M2,𝙻M_{2,\mathtt{L}} and M2,𝚄M_{2,\mathtt{U}}. We discussed three alternative approaches: (i) non-asymptotic bounds; (ii) a location-scale model; and (iii) quantile regression. The user can choose the preferred way of modeling eT|e_{T}|\mathscr{H} by setting the option e.method to ‘gaussian’, ‘ls’, or ‘qreg’, respectively.

The user can also choose the information used to estimate (conditional) moments or quantiles of eT|e_{T}|\mathscr{H}. Practically, we allow the user to specify a design matrix 𝐃𝐞\mathbf{D}_{\mathbf{e}} that is then used to run the appropriate regressions depending on the approach requested. By default, we set 𝐃𝐞=[𝐁1𝐂1]\mathbf{D}_{\mathbf{e}}=[\mathbf{B}^{\star}_{1}\;\;\mathbf{C}_{1}]. Alternatively, the matrix 𝐃𝐞\mathbf{D}_{\mathbf{e}} can be provided directly through the option e.design or by specifying the lags of 𝐁1\mathbf{B}^{\star}_{1} (e.lags) and/or the order of the fully interacted polynomial in 𝐁1\mathbf{B}^{\star}_{1} (e.order). If the user specifies e.lags = 0 and e.order = 2, then 𝐃𝐞\mathbf{D}_{\mathbf{e}} contains 𝐁1\mathbf{B}^{\star}_{1}, 𝐂1\mathbf{C}_{1}, and all the unique second-order terms generated by the interaction of the columns of 𝐁1\mathbf{B}^{\star}_{1}. If instead e.order = 0 and e.lags = 0 are set, then 𝔼^[eT|]\widehat{\mathbb{E}}[e_{T}|\mathscr{H}] and 𝕍^[eT|]\widehat{\mathbb{V}}[e_{T}|\mathscr{H}] are estimated using the sample average and the sample variance of eTe_{T} using the pre-intervention data. Recall that if the option cointegrated.data is set to TRUE, 𝐁1\mathbf{B}^{\star}_{1} is formed using the first differences of the columns in 𝐁1\mathbf{B}_{1}. Finally, the user can specify α2\alpha_{2} with the option e.alpha.
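As a simple illustration of the case e.order = 0 and e.lags = 0 described above, the sketch below estimates the two moments from placeholder pre-treatment residuals and forms sub-Gaussian-style bounds for a single post-treatment period (the L = 1 analogue of the simultaneous bounds in Section 4.4 below). The vector e.hat and the level alpha2 are hypothetical inputs, not scpi output.

set.seed(8894)
e.hat <- rnorm(31, sd = 0.5)                     # placeholder pre-treatment residuals
e.mean <- mean(e.hat)                            # estimate of E[e_T | H]
e.var <- var(e.hat)                              # estimate of V[e_T | H]
alpha2 <- 0.05
M2.L <- e.mean - sqrt(2 * e.var * log(2 / alpha2))
M2.U <- e.mean + sqrt(2 * e.var * log(2 / alpha2))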

4.4 Simultaneous Prediction Intervals

Up to this point, we have focused on prediction intervals with high coverage for the treatment effect in each individual period. However, it may be desirable to have prediction intervals with high simultaneous coverage over several periods, usually known as simultaneous prediction intervals in the literature. In other words, our final goal is to construct a sequence of intervals {t:T0+1tT0+L}\{\mathcal{I}_{t}:T_{0}+1\leq t\leq T_{0}+L\} for some 1LT11\leq L\leq T_{1} such that, with high probability over \mathscr{H},

[τtt, for all T0+1tT0+L|]1α1α2.\mathbb{P}\big{[}\tau_{t}\in\mathcal{I}_{t},\text{ for all }T_{0}+1\leq t\leq T_{0}+L\,\big{|}\,\mathscr{H}\big{]}\geq 1-\alpha_{1}-\alpha_{2}.

To construct such intervals, we need to generalize the procedures described above to quantify the in-sample error (Section 4.1) and the out-of-sample error (Section 4.2).

With regard to the in-sample uncertainty, we handle two separate cases. On the one hand, if the constraints in Δ\Delta are linear (e.g., simplex or lasso), then

M1,𝙻:=(α1/2)-quantile of inf{𝐩t𝐃1𝜹:𝜹Δ,(𝜹)0,T0+1tT0+L}and\displaystyle M_{1,\mathtt{L}}:=(\alpha_{1}/2)\text{-quantile of }\inf\,\Big{\{}\mathbf{p}_{t}^{\prime}\mathbf{D}^{-1}\bm{\delta}:\;\bm{\delta}\in\Delta^{\star},\;\ell^{\star}(\bm{\delta})\leq 0,\;T_{0}+1\leq t\leq T_{0}+L\Big{\}}\;\;\text{and}
M1,𝚄:=(1α1/2)-quantile of sup{𝐩t𝐃1𝜹:𝜹Δ,(𝜹)0,T0+1tT0+L},\displaystyle M_{1,\mathtt{U}}:=(1-\alpha_{1}/2)\text{-quantile of }\sup\,\Big{\{}\mathbf{p}_{t}^{\prime}\mathbf{D}^{-1}\bm{\delta}:\;\bm{\delta}\in\Delta^{\star},\;\ell^{\star}(\bm{\delta})\leq 0,\;T_{0}+1\leq t\leq T_{0}+L\Big{\}},

which guarantees that with high probability over \mathscr{H}

[M1,𝙻𝐩t(𝜷0𝜷^)M1,𝚄, for all T0+1tT0+L|]1α1.\mathbb{P}\big{[}M_{1,\mathtt{L}}\leq\mathbf{p}_{t}^{\prime}(\bm{\beta}_{0}-\widehat{\bm{\beta}})\leq M_{1,\mathtt{U}},\text{ for all }T_{0}+1\leq t\leq T_{0}+L\,\big{|}\,\mathscr{H}\big{]}\geq 1-\alpha_{1}.

On the other hand, if Δ\Delta includes non-linear constraints (e.g., constraints involving the 2\ell_{2} norm), it is necessary to decrease the lower bound M1,𝙻M_{1,\mathtt{L}} and increase the upper bound M1,𝚄M_{1,\mathtt{U}} by some quantity εΔ,t>0\varepsilon_{\Delta,t}>0 for each T0+1tT0+LT_{0}+1\leq t\leq T_{0}+L. To give an example of what εΔ,t\varepsilon_{\Delta,t} looks like, in the case of ridge-type constraints we have

εΔ,t=𝐩t1×(2𝜷^2)1×ϱ2,\varepsilon_{\Delta,t}=\|\mathbf{p}_{t}\|_{1}\times(2\|\widehat{\bm{\beta}}\|_{2})^{-1}\times\varrho^{2},

and we refer to Cattaneo, Feng, Palomba and Titiunik (2022) for more general cases. With regard to the out-of-sample uncertainty, our proposed strategy is a generalization of “Approach 1” in Section 4.2: find M2,𝙻,tM_{2,\mathtt{L},t} and M2,𝚄,tM_{2,\mathtt{U},t} such that, with high probability over \mathscr{H},

[M2,𝙻,tetM2,𝚄,t, for all T0+1tT0+L|]1α2.\mathbb{P}\big{[}M_{2,\mathtt{L},t}\leq e_{t}\leq M_{2,\mathtt{U},t},\text{ for all }T_{0}+1\leq t\leq T_{0}+L\,\big{|}\,\mathscr{H}\big{]}\geq 1-\alpha_{2}.

Suppose that each ete_{t}, T0+1tT0+LT_{0}+1\leq t\leq T_{0}+L, is sub-Gaussian conditional on \mathscr{H} (not necessarily independent over tt) with sub-Gaussian parameters σ,tσ\sigma_{\mathscr{H},t}\leq\sigma_{\mathscr{H}} for some σ\sigma_{\mathscr{H}}. Then, we can take

M2,𝙻,t:=𝔼[et|]2σ2log(2L/α2)andM2,𝚄,t:=𝔼[et|]+2σ2log(2L/α2).M_{2,\mathtt{L},t}:=\mathbb{E}[e_{t}|\mathscr{H}]-\sqrt{2\sigma_{\mathscr{H}}^{2}\log(2L/\alpha_{2})}\qquad\text{and}\qquad M_{2,\mathtt{U},t}:=\mathbb{E}[e_{t}|\mathscr{H}]+\sqrt{2\sigma_{\mathscr{H}}^{2}\log(2L/\alpha_{2})}.

Compared with the bound used for “Approach 1”, there is an extra logL\sqrt{\log L} term, which makes the simultaneous prediction intervals wider.
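The sketch below evaluates these simultaneous bounds for hypothetical values of the sub-Gaussian parameter and of the conditional means; both are placeholders for quantities that scpi estimates from the pre-treatment data.

sigma.H <- 1.2                                   # placeholder bound on the sub-Gaussian parameter
e.mean <- rep(0, 13)                             # placeholder E[e_t | H], t = T0 + 1, ..., T0 + L
L <- length(e.mean)
alpha2 <- 0.05
half <- sqrt(2 * sigma.H^2 * log(2 * L / alpha2))
M2.L.t <- e.mean - half
M2.U.t <- e.mean + half
sqrt(log(2 * L / alpha2) / log(2 / alpha2))      # widening factor relative to the L = 1 bound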

4.5 Sensitivity Analysis

While the three approaches for out-of-sample uncertainty quantification described in Section 4.2 are simple and intuitive, their validity requires potentially strong assumptions on the underlying data generating process that links the pre-treatment and post-treatment data. Such assumptions are difficult to avoid because the ultimate goal is to learn about the statistical uncertainty introduced by a single unobserved random variable after the treatment/intervention is deployed, that is, eT|e_{T}|\mathscr{H} for some T>T0T>T_{0}. Without additional data availability, or specific modelling assumptions allowing for transferring information from the pre-treatment period to the post-treatment period, it is difficult to formally construct M2,𝙻M_{2,\mathtt{L}} and M2,𝚄M_{2,\mathtt{U}} using data-driven methods.

We suggest approaching the out-of-sample uncertainty quantification as a principled sensitivity analysis, using the approaches above as a starting point. Given the formal and detailed in-sample uncertainty quantification described previously, it is natural to progressively enlarge the final prediction intervals by adding additional out-of-sample uncertainty to ask the question: how large does the additional out-of-sample uncertainty contribution coming from eT|e_{T}|\mathscr{H} need to be in order to render the treatment effect τT\tau_{T} statistically insignificant? Using the approaches above, or similar ones, it is possible to construct natural initial benchmarks. For instance, to implement Approach 1, one can use the pre-treatment outcomes or synthetic control residuals to obtain a “reasonable” benchmark estimate of the sub-Gaussian parameter σ\sigma_{\mathscr{H}} and then progressively enlarge or shrink this parameter to check the robustness of the conclusion. Alternatively, in specific applications, natural levels of uncertainty for the outcomes of interest could be available, and hence used to tabulate the additional out-of-sample uncertainty. We illustrate this approach in Section 5.
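The following minimal sketch illustrates one way to operationalize this sensitivity exercise: a benchmark sub-Gaussian parameter is scaled up and down and, for each scaling, we check whether the total half-width (in-sample plus out-of-sample) still leaves the predicted treatment effect away from zero. All numbers (tau.hat, in.half, sigma.H) are hypothetical, and treating the two sources of uncertainty as symmetric half-widths is a simplification made only for illustration.

tau.hat <- 12        # hypothetical point prediction of the treatment effect
in.half <- 2.5       # hypothetical in-sample half-width
sigma.H <- 1.5       # hypothetical benchmark sub-Gaussian parameter
alpha2 <- 0.05
for (scale in c(0.5, 1, 1.5, 2)) {
  out.half <- sqrt(2 * (scale * sigma.H)^2 * log(2 / alpha2))
  cat(sprintf("scale = %.1f: half-width = %.2f, effect still nonzero: %s\n",
              scale, in.half + out.half, abs(tau.hat) > in.half + out.half))
}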

5 Empirical Illustration

We showcase the features of the package scpi using real data. For comparability purposes, we employ the canonical dataset in the synthetic control literature on the economic consequences of the 1990 German reunification (Abadie, 2021), and focus on quantifying the causal impact of the German reunification on GDP per capita in West Germany. Thus, we compare the post-reunification outcome for West Germany with the outcome of a synthetic control unit constructed using 16 OECD countries from 1960 to 1990. Using the notation introduced above, we have T0=31T_{0}=31 and J=16J=16. The only feature we exploit to construct the synthetic control is yearly GDP per capita, and we add a constant term for covariate adjustment. Thus M=1M=1 and K=1K=1, and =\mathcal{R}=\mathbb{R}. We explore the effect of the reunification from 1991 to 2003, hence T1=13T_{1}=13. Finally, we treat the time series for West Germany and those countries in the donor pool as a cointegrating system. Given this information, the command scdata() prepares all the matrices needed in the synthetic control framework described above (𝐀\mathbf{A}, 𝐁\mathbf{B}, 𝐂\mathbf{C} and 𝐏\mathbf{P}), and returns an object that must be used as input in either scest() to predict Y^1T(0),\widehat{Y}_{1T}(0), T>T0T>T_{0}, or scpi() to conduct inference on τ^T,\widehat{\tau}_{T}, T>T0T>T_{0}.

We first call scdata() to transform any data frame into an object of class “scpi_data”.

# Load data
> data <- scpi_germany
>
> ## Set parameters for data preparation
> id.var <- "country" # ID variable
> time.var <- "year" # Time variable
> period.pre <- (1960:1990) # Pre-treatment period
> period.post <- (1991:2003) # Post-treatment period
> unit.tr <- "West Germany" # Treated unit
> unit.co <- unique(data$country)[-7] # Donor pool
> outcome.var <- "gdp" # Outcome variable
> constant <- TRUE # Include constant term
> cointegrated.data <- TRUE # Cointegrated data
>
# Data preparation
> df <- scdata(df = data, id.var = id.var, time.var = time.var,
+ outcome.var = outcome.var, period.pre = period.pre,
+ period.post = period.post, unit.tr = unit.tr,
+ unit.co = unit.co, constant = constant,
+ cointegrated.data = cointegrated.data)

After having prepared the data, the next step involves choosing the desired constraint set 𝒲\mathcal{W} to construct the vector of weights 𝐰\mathbf{w}. We consider the canonical synthetic control method and thus search for optimal weights in 𝒲={𝐰+J:𝐰1=1}\mathcal{W}=\{\mathbf{w}\in\mathbb{R}^{J}_{+}:||\mathbf{w}||_{1}=1\}. This constraint set is the default in scest() and, consequently, in scpi(), as the latter internally calls the former to construct 𝐰\mathbf{w}. The snippet below illustrates how to call scest() and reports the results displayed in the console by the summary() method.

# Estimate SC with a simplex-type constraint (default)
> res.est <- scest(data = df, w.constr = list(name="simplex"))
> summary(res.est)
Synthetic Control Estimation - Setup
Constraint Type: simplex
Constraint Size (Q): 1
Treated Unit: West Germany
Size of the donor pool: 16
Features: 1
Pre-treatment period: 1960-1990
Pre-treatment periods used in estimation: 31
Covariates used for adjustment: 1
Synthetic Control Estimation - Results
Active donors: 6
Coefficients:
Weights
Australia 0.000
Austria 0.441
Belgium 0.000
Denmark 0.000
France 0.000
Greece 0.000
Italy 0.177
Japan 0.013
Netherlands 0.059
New Zealand 0.000
Norway 0.000
Portugal 0.000
Spain 0.000
Switzerland 0.036
UK 0.000
USA 0.274
Covariates
0.constant 0.158

The next step is uncertainty quantification using scpi(). In this case, we quantify the in-sample and out-of-sample uncertainty in the same way, using 𝐁\mathbf{B} and 𝐂\mathbf{C} as the conditioning set in both cases. To do so, it suffices to set the order of the polynomial in 𝐁\mathbf{B} to 1 (u.order <- 1 and e.order <- 1) and not include lags (u.lags <- 0 and e.lags <- 0). Furthermore, by specifying the option u.missp <- TRUE, we take into account that the conditional mean of 𝐮\mathbf{u} might differ from 0. This option, together with u.sigma <- "HC1", specifies the following estimator of 𝕍[𝐮|]\mathbb{V}[\mathbf{u}|\mathscr{H}]:

𝕍^[𝐮|]=diag(𝚟𝚌1(1)(𝐮^1𝔼^[𝐮1|])2,,𝚟𝚌T0(1)(𝐮^T0𝔼^[𝐮T0|])2).\widehat{\mathbb{V}}[\mathbf{u}|\mathscr{H}]=\operatorname*{diag}\Big{(}\mathtt{vc}_{1}^{(1)}(\widehat{\mathbf{u}}_{1}-\widehat{\mathbb{E}}[\mathbf{u}_{1}|\mathscr{H}])^{2},\cdots,\mathtt{vc}_{T_{0}}^{(1)}(\widehat{\mathbf{u}}_{T_{0}}-\widehat{\mathbb{E}}[\mathbf{u}_{T_{0}}|\mathscr{H}])^{2}\Big{)}.

Finally, by selecting e.method <- "gaussian", we perform out-of-sample uncertainty quantification treating eTe_{T} as sub-Gaussian conditional on 𝐁\mathbf{B} and 𝐂\mathbf{C}. As a last step, we visualize the constructed synthetic control and compare it with the observed time series for the treated unit using the function scplot().

## Quantify uncertainty
> sims <- 1000 # Number of simulations
> u.order <- 1 # Degree of polynomial in B and C when modelling u
> u.lags <- 0 # Lags of B to be used when modelling u
> u.sigma <- "HC1" # Estimator for the variance-covariance of u
> u.missp <- TRUE # If TRUE then the model is treated as misspecified
> e.order <- 1 # Degree of polynomial in B and C when modelling e
> e.lags <- 0 # Lags of B to be used when modelling e
> e.method <- "qreg" # Estimation method for out-of-sample uncertainty
> lgapp <- "linear" # Local geometry approximation
> cores <- 1 # Number of cores to be used by scpi
> set.seed(8894)
> res.pi <- scpi(data = df, sims = sims, e.method = e.method, e.order = e.order,
+ e.lags = e.lags, u.order = u.order, u.lags = u.lags, lgapp = lgapp,
+ u.sigma = u.sigma, u.missp = u.missp, cores = cores,
+ w.constr = list(name = "simplex"))
# Visualize results
> plot <- scplot(result = res.pi, plot.range = (1960:2003),
+ label.xy = list(x.lab = "Year", x.ticks = NULL, e.out = TRUE,
+ y.lab = "GDP per capita (thousand US dollars)"),
+ event.label = list(lab = "Reunification", height = 10))
> plot <- plot$plot_out + ggtitle("")
> ggsave(filename = 'germany_unc_simplex.png', plot = plot)

Figure 1 displays the plot resulting from the scplot call. The vertical bars are 90% prediction intervals, where the non-coverage error rate is halved between the out-of-sample and the in-sample uncertainty quantification, i.e. α1=α2=0.05\alpha_{1}=\alpha_{2}=0.05.

Figure 1: Treated and synthetic unit using a simplex-type 𝒲\mathcal{W} and 90% prediction intervals

Notes: The black line shows the level of the outcome for the treated unit, Y1t(1)Y_{1t}(1), t=1963,,2003t=1963,\ldots,2003, whilst the blue line shows the level of the outcome for the synthetic control, Y^1t(0)\widehat{Y}_{1t}(0), t=1963,,2003t=1963,\ldots,2003. The blue bars report 90% prediction intervals for Y1t(0)Y_{1t}(0). In-sample uncertainty is quantified by means of 1000 simulations of (4.5), whereas out-of-sample uncertainty is quantified through sub-Gaussian bounds.

We also conduct the same exercise using different choices of 𝒲\mathcal{W} (see Table 2). In particular, we construct weights and compute prediction intervals using four other specifications: (i) a lasso-type constraint (Figure 2(a)), (ii) a ridge-type constraint (Figure 2(b)), (iii) no constraint (Figure 2(c)), and (iv) an L1-L2 constraint (Figure 2(d)).

# Comparison of different constraint sets for the weights
> methods <- c("lasso", "ols", "ridge", "L1-L2")
> for (method in methods) {
> if (method %in% c("ridge", "L1-L2")) lgapp <- "generalized"
> set.seed(8894)
> res.pi <- scpi(data = df, sims = sims, e.method = e.method, e.order = e.order,
+ e.lags = e.lags, u.order = u.order, u.lags = u.lags, lgapp = lgapp,
+ u.sigma = u.sigma, u.missp = u.missp, cores = cores,
+ w.constr = list(name = method))
# Visualize results
> plot <- scplot(result = res.pi, plot.range = (1960:2003),
+ label.xy = list(x.lab = "Year", x.ticks = NULL, e.out = TRUE,
+ y.lab = "GDP per capita (thousand US dollars)"),
+ event.label = list(lab = "Reunification", height = 10))
> plot <- plot$plot_out + ggtitle("")
> ggsave(filename = paste0('germany_unc_',method,'.png'), plot = plot)
}
Figure 2: Uncertainty quantification with different types of 𝒲\mathcal{W} using 90% prediction intervals.
(a) lasso
(b) ridge
(c) least squares
(d) L1-L2

Notes: The black lines show the level of the outcome for the treated unit, Y1t(1)Y_{1t}(1), t=1963,,2003t=1963,\ldots,2003, whilst the blue lines show the level of the outcome for the synthetic control, Y^1t(0)\widehat{Y}_{1t}(0), t=1963,,2003t=1963,\ldots,2003. The blue bars report 90% prediction intervals for Y1t(0)Y_{1t}(0). In-sample uncertainty is quantified by means of 1000 simulations of (4.5), whereas out-of-sample uncertainty is quantified through sub-Gaussian bounds. In panel (b), Q=0.906Q=0.906, whereas in panel (d) Q=1,Q2=0.906Q=1,Q_{2}=0.906.

Section 4.5 clarified the need for additional sensitivity analysis when it comes to out-of-sample uncertainty quantification. Figure 3 shows the impact of shrinking and enlarging σ^\widehat{\sigma}_{\mathscr{H}} on the prediction intervals for Y1t(0)Y_{1t}(0), t=1997t=1997, when we assume that ete_{t} is sub-Gaussian conditional on \mathscr{H}. As shown in the figure, the predicted treatment effect τ^𝟷𝟿𝟿𝟽\widehat{\tau}_{\mathtt{1997}} remains different from zero with high probability over \mathscr{H} even after doubling σ^\widehat{\sigma}_{\mathscr{H}}.

Figure 3: Sensitivity analysis on out-of-sample uncertainty with sub-Gaussian bounds.

Notes: The black horizontal line shows the level of the outcome for the treated unit in 1997, Y1t(1)Y_{1t}(1) for t=1997t=1997. The blue bars report 95% prediction intervals for Y1t(0)Y_{1t}(0), t=1997t=1997, that only take into account in-sample uncertainty. The red dashed bars add the out-of-sample uncertainty to obtain 90% prediction intervals.

Finally, the package offers the possibility of matching the treated unit and the synthetic unit on multiple features and of computing simultaneous prediction intervals. If we want to match West Germany and the synthetic unit not only on GDP per capita but also on trade openness (M=2M=2) and include joint prediction intervals, we simply modify the call to scdata() as follows.

## Data preparation
df <- scdata(df = data, id.var = id.var, time.var = time.var,
outcome.var = outcome.var, period.pre = period.pre,
period.post = period.post, unit.tr = unit.tr,
features = c("gdp","trade"), cov.adj = list(c("constant")),
cointegrated.data = cointegrated.data, unit.co = unit.co)

Results are reported in Figure 4, where blue shaded areas depict 90% simultaneous prediction intervals for the periods from 1991 to 2003.

Figure 4: Uncertainty quantification with different types of 𝒲\mathcal{W} using 90% prediction intervals (2 features).
(a) simplex
(b) lasso
(c) ridge
(d) least squares
(e) L1-L2

Notes: The black line shows the level of the outcome for the treated unit, Y1t(1)Y_{1t}(1), t=1963,,2003t=1963,\ldots,2003, whilst the blue line shows the level of the outcome for the synthetic control, Y^1t(0)\widehat{Y}_{1t}(0), t=1963,,2003t=1963,\ldots,2003. The blue bars report 90% prediction intervals for Y1t(0)Y_{1t}(0). In-sample uncertainty is quantified by means of 1000 simulations of (4.5), whereas out-of-sample uncertainty is quantified through sub-Gaussian bounds. Blue shaded areas display 90% simultaneous prediction intervals. In panel (c), Q=0.903Q=0.903, whereas in panel (e) Q=1,Q2=0.903Q=1,Q_{2}=0.903.

6 Conclusion

This article introduced the R software package scpi, which implements point estimation/prediction and inference/uncertainty quantification procedures for synthetic control methods. The package is also available in the Stata and Python statistical platforms, as described in the appendices. Further information can be found at https://nppackages.github.io/scpi/.

7 Acknowledgments

We thank Alberto Abadie and Bartolomeo Stellato for many insightful discussions. Cattaneo and Titiunik gratefully acknowledge financial support from the National Science Foundation (SES-2019432), and Cattaneo gratefully acknowledges financial support from the National Institutes of Health (R01 GM072611-16).

References

  • Abadie (2021) Abadie, A. (2021), “Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects,” Journal of Economic Literature, 59, 391–425.
  • Abadie and Cattaneo (2018) Abadie, A., and Cattaneo, M. D. (2018), “Econometric Methods for Program Evaluation,” Annual Review of Economics, 10, 465–503.
  • Abadie et al. (2010) Abadie, A., Diamond, A., and Hainmueller, J. (2010), “Synthetic control methods for comparative case studies: Estimating the effect of California’s tobacco control program,” Journal of the American Statistical Association, 105, 493–505.
  • Abadie and Gardeazabal (2003) Abadie, A., and Gardeazabal, J. (2003), “The Economic Costs of Conflict: A Case Study of the Basque Country,” American Economic Review, 93, 113–132.
  • Abadie and L’Hour (2021) Abadie, A., and L’Hour, J. (2021), “A Penalized Synthetic Control Estimator for Disaggregated Data,” Journal of the American Statistical Association, 116, 1817–1834.
  • Amjad et al. (2018) Amjad, M., Shah, D., and Shen, D. (2018), “Robust Synthetic Control,” The Journal of Machine Learning Research, 19, 802–852.
  • Arkhangelsky et al. (2021) Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., and Wager, S. (2021), “Synthetic Difference in Differences,” American Economic Review, 111, 4088–4118.
  • Ben-Michael et al. (2022) Ben-Michael, E., Feller, A., and Rothstein, J. (2022), “Synthetic Controls with Staggered Adoption,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 84, 351–381.
  • Boyd and Vandenberghe (2004) Boyd, S., and Vandenberghe, L. (2004), Convex Optimization, Cambridge University Press.
  • Cattaneo et al. (2022) Cattaneo, M. D., Feng, Y., Palomba, F., and Titiunik, R. (2022), “Uncertainty Quantification in Synthetic Controls with Staggered Treatment Adoption,” working paper.
  • Cattaneo et al. (2021) Cattaneo, M. D., Feng, Y., and Titiunik, R. (2021), “Prediction Intervals for Synthetic Control Methods,” Journal of the American Statistical Association, 116, 1865–1880.
  • Chernozhukov et al. (2021) Chernozhukov, V., Wüthrich, K., and Zhu, Y. (2021), “An Exact and Robust Conformal Inference Method for Counterfactual and Synthetic Controls,” Journal of the American Statistical Association, 116, 1849–1864.
  • Domahidi et al. (2013) Domahidi, A., Chu, E., and Boyd, S. (2013), “ECOS: An SOCP solver for embedded systems,” in 2013 European Control Conference (ECC), IEEE, pp.  3071–3076.
  • Ferman and Pinto (2021) Ferman, B., and Pinto, C. (2021), “Synthetic Controls with Imperfect Pretreatment Fit,” Quantitative Economics, 12, 1197–1221.
  • Friedman et al. (2001) Friedman, J., Hastie, T., and Tibshirani, R. (2001), The Elements of Statistical Learning, Springer, New York.
  • Fu et al. (2020) Fu, A., Narasimhan, B., and Boyd, S. (2020), “CVXR: An R Package for Disciplined Convex Optimization,” Journal of Statistical Software, 94, 1–34.
  • Hoerl et al. (1975) Hoerl, A. E., Kannard, R. W., and Baldwin, K. F. (1975), “Ridge Regression: Some Simulations,” Communications in Statistics-Theory and Methods, 4, 105–123.
  • Hsiao et al. (2012) Hsiao, C., Steve Ching, H., and Ki Wan, S. (2012), “A Panel Data Approach for Program Evaluation: Measuring the Benefits of Political and Economic Integration of Hong Kong with Mainland China,” Journal of Applied Econometrics, 27, 705–740.
  • Li (2020) Li, K. T. (2020), “Statistical Inference for Average Treatment Effects Estimated by Synthetic Control Methods,” Journal of the American Statistical Association, 115, 2068–2083.
  • Masini and Medeiros (2021) Masini, R., and Medeiros, M. C. (2021), “Counterfactual Analysis with Artificial Controls: Inference, High Dimensions and Nonstationarity,” Journal of the American Statistical Association, 116, 1773–1788.
  • Shaikh and Toulis (2021) Shaikh, A. M., and Toulis, P. (2021), “Randomization Tests in Observational Studies With Staggered Adoption of Treatment,” Journal of the American Statistical Association, 116, 1835–1848.
  • Wickham (2016) Wickham, H. (2016), ggplot2: Elegant Graphics for Data Analysis, Springer.
  • Ye (1998) Ye, J. (1998), “On measuring and Correcting the Effects of Data Mining and Model Selection,” Journal of the American Statistical Association, 93, 120–131.
  • Zou et al. (2007) Zou, H., Hastie, T., and Tibshirani, R. (2007), “On the “degrees of freedom” of the lasso,” The Annals of Statistics, 35, 2173–2192.

Appendix A Appendix: Python Illustration

This appendix section shows how to conduct the same analysis carried out in Section 5 using the companion Python package. Figure 5 shows the main results. The L1-L2 constraint is currently not implemented in the Python version of the scpi package due to technical difficulties with the optimizer nlopt. Replication files and data are available at https://nppackages.github.io/scpi/.

###################################################################
# Replication file for Cattaneo, Feng, Palomba, and Titiunik (2022)
###################################################################
########################################
# Load SCPI_PKG package
import pandas
import numpy
import random
import os
from warnings import filterwarnings
from plotnine import ggtitle, ggsave
from scpi_pkg.scdata import scdata
from scpi_pkg.scdataMulti import scdataMulti
from scpi_pkg.scest import scest
from scpi_pkg.scpi import scpi
from scpi_pkg.scplot import scplot
from scpi_pkg.scplotMulti import scplotMulti
os.chdir('YOUR_PATH_HERE')
filterwarnings('ignore')
########################################
# One feature (gdp)
########################################
########################################
# Load database
data = pandas.read_csv('scpi_germany.csv')
########################################
# Set options for data preparation
id_var = 'country'
outcome_var = 'gdp'
time_var = 'year'
period_pre = numpy.arange(1960, 1991)
period_post = numpy.arange(1991, 2004)
unit_tr = 'West Germany'
unit_co = list(set(data[id_var].to_list()))
unit_co = [cou for cou in unit_co if cou != 'West Germany']
constant = True
cointegrated_data = True
data_prep = scdata(df=data, id_var=id_var, time_var=time_var,
outcome_var=outcome_var, period_pre=period_pre,
period_post=period_post, unit_tr=unit_tr,
unit_co=unit_co, cointegrated_data=cointegrated_data,
constant=constant)
####################################
# SC - point estimation with simplex
est_si = scest(data_prep, w_constr={'name': 'simplex'})
print(est_si)
####################################
# Set options for inference
w_constr = {'name': 'simplex', 'Q': 1}
u_missp = True
u_sigma = 'HC1'
u_order = 1
u_lags = 0
e_method = 'gaussian'
e_order = 1
e_lags = 0
sims = 1000
cores = 1
for mtd in ['simplex', 'lasso', 'ridge', 'L1-L2', 'ols']:
    if mtd in ['ridge', 'L1-L2']:
        lgapp = 'generalized'
    else:
        lgapp = 'linear'
    random.seed(8894)
    pi_si = scpi(data_prep, sims=sims, w_constr={'name': mtd},
                 u_order=u_order, u_lags=u_lags, e_order=e_order,
                 e_lags=e_lags, e_method=e_method, u_missp=u_missp,
                 lgapp=lgapp, u_sigma=u_sigma, cores=cores)
    plot = scplot(pi_si, x_lab='Year', e_method=e_method,
                  y_lab='GDP per capita (thousand US dollars)')
    plot = plot + ggtitle('')
    pltname = 'py_germany_unc_' + str(mtd) + '.png'
    ggsave(filename=pltname, plot=plot)
########################################
# Multiple features (gdp, trade)
########################################
########################################
# Load database
data = pandas.read_csv('scpi_germany.csv')
########################################
# Set options for data preparation
id_var = 'country'
outcome_var = 'gdp'
time_var = 'year'
period_pre = numpy.arange(1960, 1991)
period_post = numpy.arange(1991, 2004)
unit_tr = 'West Germany'
unit_co = list(set(data[id_var].to_list()))
unit_co = [cou for cou in unit_co if cou != 'West Germany']
constant = False
cointegrated_data = True
cov_adj = [['constant'], ['constant']]
data_prep = scdata(df=data, id_var=id_var, time_var=time_var,
outcome_var=outcome_var, period_pre=period_pre,
period_post=period_post, unit_tr=unit_tr, constant=constant,
unit_co=unit_co, cointegrated_data=cointegrated_data,
features=['gdp', 'trade'], cov_adj=cov_adj)
####################################
# SC - point estimation with simplex
est_si = scest(data_prep, w_constr={'name': 'simplex'})
print(est_si)
####################################
# Set options for inference
w_constr = {'name': 'simplex', 'Q': 1}
u_missp = True
u_sigma = 'HC1'
u_order = 1
u_lags = 0
e_method = 'gaussian'
e_order = 1
e_lags = 0
sims = 1000
cores = 1
for mtd in ['simplex', 'lasso', 'ridge', 'L1-L2', 'ols']:
    if mtd in ['ridge', 'L1-L2']:
        lgapp = 'generalized'
    else:
        lgapp = 'linear'
    random.seed(8894)
    pi_si = scpi(data_prep, sims=sims, w_constr={'name': mtd},
                 u_order=u_order, u_lags=u_lags, e_order=e_order,
                 e_lags=e_lags, e_method=e_method, u_missp=u_missp,
                 lgapp=lgapp, u_sigma=u_sigma, cores=cores)
    plot = scplot(pi_si, x_lab='Year', e_method=e_method,
                  y_lab='GDP per capita (thousand US dollars)')
    plot = plot + ggtitle('')
    pltname = 'py_germany_unc_' + str(mtd) + '.png'
    ggsave(filename=pltname, plot=plot)

Case I: M=1M=1

Figure 5: Uncertainty quantification with different types of 𝒲\mathcal{W} using 90% prediction intervals.
(a) simplex
(b) lasso
(c) ridge
(d) least squares
(e) L1-L2

Notes: The black line shows the level of the outcome for the treated unit, Y1t(1),t=1963,,2003Y_{1t}(1),t=1963,\ldots,2003, whilst the blue line shows the level of the outcome for the synthetic control, Y^1t(0)\widehat{Y}_{1t}(0), t=1963,,2003t=1963,\ldots,2003. The blue bars report 90% prediction intervals for Y1t(0)Y_{1t}(0). In-sample uncertainty is quantified by means of 1000 simulations of (4.5), whereas out-of-sample uncertainty is quantified through sub-Gaussian bounds. In panel (c), Q=0.906Q=0.906, whereas in panel (e) Q=1,Q2=0.906Q=1,Q_{2}=0.906.

Case II: M=2M=2

Figure 6: Uncertainty quantification with different types of 𝒲\mathcal{W} using 90% prediction intervals.
(a) simplex
(b) lasso
(c) ridge
(d) least squares
(e) L1-L2

Notes: The black line shows the level of the outcome for the treated unit, Y1t(1)Y_{1t}(1), t=1963,,2003t=1963,\ldots,2003, whilst the blue line shows the level of the outcome for the synthetic control, Y^1t(0)\widehat{Y}_{1t}(0), t=1963,,2003t=1963,\ldots,2003. The blue bars report 90% prediction intervals for Y1t(0)Y_{1t}(0). In-sample uncertainty is quantified by means of 1000 simulations of (4.5), whereas out-of-sample uncertainty is quantified through sub-Gaussian bounds. Blue shaded areas display 90% simultaneous prediction intervals. In panel (c), Q=0.903Q=0.903, whereas in panel (e) Q=1,Q2=0.903Q=1,Q_{2}=0.903.

Appendix B Appendix: Stata Illustration

This appendix section replicates the analysis conducted in Section 5 for M=1M=1 using the companion Stata package. Main results are shown in Figure 7. The L1-L2 constraint is currently not implemented in the Stata version of the scpi package due to technical difficulties with the optimizer nlopt. Replication files and data are available at https://nppackages.github.io/scpi/.

******************************************************************
* Replication file - Cattaneo, Feng, Palomba, and Titiunik (2022)
******************************************************************
* Load dataset
use "scpi_germany.dta", clear
***********************************************
** One feature (gdp)
***********************************************
* Prepare data
scdata gdp, dfname("python_scdata") id(country) outcome(gdp) time(year) ///
treatment(status) cointegrated constant
* Quantify uncertainty
local lgapp "linear"
foreach method in "simplex" "lasso" "ols" "ridge" "L1-L2" {
if inlist("`method'", "ridge", "L1-L2") {
local lgapp "generalized"
}
set seed 8894
scpi, dfname("python_scdata") name(`method') e_method(gaussian) u_missp ///
lgapp("`lgapp'") sims(1000)
scplot, uncertainty("gaussian") gphoptions(note("") xtitle("Year") ///
ytitle("GDP per capita (thousand US dollars)"))
graph export "stata_germany_unc_`method'.png", replace
}
***********************************************
** Multiple features (gdp, trade)
***********************************************
* Prepare data
scdata gdp trade, dfname("python_scdata") id(country) outcome(gdp) time(year) ///
treatment(status) cointegrated covadj("constant")
* Quantify uncertainty
local lgapp "linear"
foreach method in "simplex" "lasso" "ols" "ridge" "L1-L2" {
if inlist("`method'", "ridge", "L1-L2") {
local lgapp "generalized"
}
set seed 8894
scpi, dfname("python_scdata") name(`method') e_method(gaussian) u_missp ///
lgapp("`lgapp'") sims(1000)
scplot, uncertainty("gaussian") gphoptions(note("") xtitle("Year") ///
ytitle("GDP per capita (thousand US dollars)")) joint
graph export "stata_germany_unc_`method'_multi.png", replace
}

Case I: M=1M=1

Figure 7: Uncertainty quantification with different types of 𝒲\mathcal{W} using 90% prediction intervals.
(a) simplex
(b) lasso
(c) ridge
(d) least squares
(e) L1-L2

Notes: The black line shows the level of the outcome for the treated unit, Y1t(1)Y_{1t}(1), t=1963,,2003t=1963,\ldots,2003, whilst the blue line shows the level of the outcome for the synthetic control, Y^1t(0)\widehat{Y}_{1t}(0), t=1963,,2003t=1963,\ldots,2003. The blue bars report 90% prediction intervals for Y1t(0)Y_{1t}(0). In-sample uncertainty is quantified by means of 1000 simulations of (4.5), whereas out-of-sample uncertainty is quantified through sub-Gaussian bounds. In panel (c), Q=0.906Q=0.906, whereas in panel (e) Q=1,Q2=0.906Q=1,Q_{2}=0.906.

Case II: M=2M=2

Figure 8: Uncertainty quantification with different types of 𝒲\mathcal{W} using 90% prediction intervals.
(a) simplex
(b) lasso
(c) ridge
(d) least squares
(e) L1-L2

Notes: The black line shows the level of the outcome for the treated unit, Y1t(1)Y_{1t}(1), t=1963,,2003t=1963,\ldots,2003, whilst the blue line shows the level of the outcome for the synthetic control, Y^1t(0)\widehat{Y}_{1t}(0), t=1963,,2003t=1963,\ldots,2003. The blue bars report 90% prediction intervals for Y1t(0)Y_{1t}(0). In-sample uncertainty is quantified by means of 1000 simulations of (4.5), whereas out-of-sample uncertainty is quantified through sub-Gaussian bounds. Blue shaded areas display 90% simultaneous prediction intervals. In panel (c), Q=0.903Q=0.903, whereas in panel (e) Q=1,Q2=0.903Q=1,Q_{2}=0.903.