$$\textbf{Synthetic Control Method}$$

$$\text{Taha Bouhoun - Spring 2020}$$

$\textbf{Problem Description:}$

The fundamental problem of causal inference is that we never observe both potential outcomes of the same unit over the same timeline. In other words, to disentangle the causal effect of a treatment on a given unit, we would need to compare the unit's outcome in the presence and in the absence of the treatment (e.g., in a medical trial, a patient cannot be in both the treatment and the control group at the same time, so we cannot observe their outcome in both worlds).

However, we can approximate the counterfactual (the outcome the treated unit would have had if it had not been subjected to the treatment) using the Synthetic Control Method (SCM) developed by Abadie, Diamond, and Hainmueller (2010). The plan is to construct a synthetic version of the treated unit as a convex combination of control-group units, chosen such that the discrepancy between the treated unit and its synthetic counterpart is minimized over the pre-treatment period.
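
In potential-outcomes notation (a standard formulation, stated here only for reference), the quantity we are after for the treated unit at time $t$ is $$\alpha_{1t}=Y_{1t}(1)-Y_{1t}(0)$$ where $Y_{1t}(1)$ is the observed outcome under treatment and $Y_{1t}(0)$ is the unobserved counterfactual that the synthetic control is built to approximate.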

The aim of this final project is to:

  • Replicate the findings of the original synthetic control study: using CVXPY, I formulate SCM as a convex minimization problem and compare the resulting weights with those reported in the original paper.
  • Provide a basis for the Synthetic Control Method in Python: to date, implementations of SCM are only available in R, MATLAB, and Stata, so formulating SCM with the CVXPY package can be a starting point for a Python implementation of the method.

$\textbf{Methodology:}$

The original paper (Abadie, Diamond, and Hainmueller, 2015) aimed to disentangle the causal effect of the 1990 German reunification (the treatment year) on West Germany's gross domestic product. First, we gather data on the OECD countries from 1960 to 2003, then split it into two sections (pre-treatment and post-treatment). We then construct a covariate matrix $X_{1}$ for the treated unit (West Germany) and a covariate matrix $X_{0}$ for the control units (the rest of the OECD sample). The benchmark is to obtain the best possible fit between West Germany's GDP trend and that of its constructed synthetic version during the pre-treatment period, so that the comparison in the post-treatment period is credible. I first address a simplified version of the problem in which all covariates have the same importance (weights), and then propose an approach for specifying covariate-specific weights.

Note: The outcome variable (GDP) is excluded from the covariate matrices

$\textbf{Simplified Formulation:}$

Objective: $$\min_{W}\sum_{m=1}^{k}\left(X_{1,m}-\left(WX_0\right)_{m}\right)^2$$

Subject to: $$\sum_{i=1}^{j}W_{i}=1,\qquad W_{i}\geq 0$$

where:

  • $j$ is the number of control units
  • $k$ is the number of covariates
  • $X_{1}$ is a $(1\times k)$ vector of observed covariates of the treated unit
  • $X_{0}$ is a $(j\times k)$ matrix of observed covariates of the control group
  • $W$ is a $(1\times j)$ vector of unit weights, and $W^{\ast}$ denotes its optimal value

The hope is that the optimal weights $W^{\ast}$, applied to the outcomes of the control units, closely reproduce the outcome of the treated unit $Y_{1}$ over the pre-treatment period $T_{0}$:

$$W^{\ast}\times Y_0\approx Y_1$$

Where:

  • $Y_{1}$ is a $(1\times T_{0})$ vector of the outcome variable of the treated unit in the pre-treatment period
  • $Y_{0}$ is a $(j\times T_{0})$ matrix of the outcome variable of the control units in the pre-treatment period

Once a good fit between the treated unit and its synthetic version is established, we can extend the comparison to the post-treatment period $T_{1}$, where the treatment effect $\hat{\alpha}$ at time $t$ is expressed as: $$\hat{\alpha}_{t}=Y_{1t}-\sum_{i=1}^{j}W^{\ast}_{i}\,Y_{0,it},\qquad t\in T_{1}$$
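
As a minimal sketch (with hypothetical variable names, since this step is only plotted rather than computed explicitly later in the notebook), the post-treatment gap can be obtained directly from the fitted weights:

import numpy as np

def treatment_effect(w_star, y1_post, y0_post):
    # w_star: (1 x j) fitted weights; y1_post: (1 x T1) treated outcomes;
    # y0_post: (j x T1) control outcomes over the post-treatment years
    synthetic = w_star @ y0_post          # synthetic outcome path, shape (1, T1)
    return (y1_post - synthetic).ravel()  # alpha_hat for each post-treatment year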

$\textbf{Incorporating the covariates' importance matrix:}$

As mentioned in the methodology, the simplified version attributes the same importance to every covariate, which can be problematic given their different units and the different ways they contribute to the error we are trying to minimize. To illustrate the idea, imagine taking as covariates:

  • trade, in millions of dollars
  • unemployment rate, as a percentage (0-100%)

The two covariates contribute very differently to the error: an error of 10 in trade (ten million dollars) is not equivalent to a 10-point gap in the unemployment rate. By introducing a diagonal matrix $V$, each covariate is attributed a weight that reflects its importance in predicting the treated unit's trend. The problem is then formulated as follows:

Given a diagonal importance matrix $V$, we find the optimal weights $W^{\ast}(V)$ (here $X_{1}$ is treated as a $(k\times1)$ vector, $X_{0}$ as a $(k\times j)$ matrix, and $W$ as a $(j\times1)$ vector): $$\min_{W}\left(X_1-X_0W\right)^{T}V\left(X_1-X_0W\right)$$

We then assess each candidate $V$ by how well the resulting weights $W^{\ast}(V)$ reproduce the treated unit's pre-treatment outcome (with $Y_{1}$ and $Y_{0}$ transposed accordingly), and keep the best one: $$V^{\ast}=\arg\min_{V}\left[\left(Y_1-Y_0W^{\ast}\left(V\right)\right)^{T}\left(Y_1-Y_0W^{\ast}\left(V\right)\right)\right]$$

The search for $V$ can be seen as an optimization problem in its own right, since the optimal $W$ changes with the choice of the importance matrix $V$. The joint problem is non-convex (it is bilinear in $V$ and $W$), so a single objective function cannot take both $W$ and $V$ as unknown variables without violating the DCP rules:

Objective: $$\min_{V,W}\left(X_1-X_0W\right)^{T}V\left(X_1-X_0W\right)$$

One strategy that preserves the DCP rules is to alternate: fix one variable, optimize over the other, and repeat. However, this attempt was fruitless, as the process did not converge to a reasonable distribution of weights (all of the weight was attributed to a single unit).
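
Below is a minimal sketch of that alternating scheme, given for reference only (it illustrates the attempt described above and was not used for the reported results); it assumes the transposed covariate matrices X1_T ($k\times1$) and X0_T ($k\times j$) constructed later in the notebook. Each sub-problem is convex on its own, but the V-step reduces to a linear program over the diagonal entries, so it tends to push all of the importance onto a single covariate.

import numpy as np
import cvxpy as cvx

def alternate_v_w(X1_T, X0_T, n_iter=10):
    k, j = X0_T.shape
    v = np.ones(k) / k                        # start from uniform importance
    for _ in range(n_iter):
        # W-step: donor weights given the current covariate importance
        w = cvx.Variable((j, 1), nonneg=True)
        obj_w = cvx.sum(np.diag(v) @ cvx.square(X1_T - X0_T @ w))
        cvx.Problem(cvx.Minimize(obj_w), [cvx.sum(w) == 1]).solve()
        # V-step: covariate importance given the current donor weights
        # (a linear program, so the mass collapses onto one covariate)
        sq_resid = np.square(X1_T - X0_T @ w.value).ravel()
        v_var = cvx.Variable(k, nonneg=True)
        cvx.Problem(cvx.Minimize(v_var @ sq_resid), [cvx.sum(v_var) == 1]).solve()
        v = v_var.value
    return v, w.value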

In [1]:
# Import packages
import matplotlib.pyplot as plt
import scipy.optimize
import pandas as pd
import cvxpy as cvx
import numpy as np

$\textbf{Processing data}$

In [2]:
# Import and process data

data = pd.read_csv('german_reunification.csv')

# Pre-treatment (1960-1989) GDP series of the treated unit, shape (1, 30)
tr_unit = data[data.country == 'West Germany']
tr_unit = tr_unit[tr_unit.year < 1990]
y_tr = np.array(tr_unit.gdp).reshape(1, 30)

# Pre-treatment GDP series of the 16 control countries, shape (16, 30)
ctr_units = data[data.country != 'West Germany']
ctr_units = ctr_units[ctr_units.year < 1990]
y_ctr = np.array(ctr_units.gdp).reshape(16, 30)

# Full-sample (1960-2003) GDP series, shapes (1, 44) and (16, 44)
tr_unit_all = data[data.country == 'West Germany']
y_tr_all = np.array(tr_unit_all.gdp).reshape(1, 44)

ctr_units_all = data[data.country != 'West Germany']
y_ctr_all = np.array(ctr_units_all.gdp).reshape(16, 44)
In [3]:
# Create matrices for the problem formulation
# (average each covariate over the 30 pre-treatment years)

X1 = data[data.country == 'West Germany']
X1 = X1[X1.year < 1990]
X1 = X1.drop(['code', 'country', 'gdp', 'year'], axis=1)   # keep the 7 covariates
X1 = X1.set_index(np.arange(len(X1)) // 30).mean(level=0)  # average over 1960-1989
X1 = X1.values                                             # shape (1, 7)

X0 = data[data.country != 'West Germany']
X0 = X0[X0.year < 1990]
X0 = X0.drop(['code', 'country', 'gdp', 'year'], axis=1)
X0 = X0.set_index(np.arange(len(X0)) // 30).mean(level=0)  # one row of means per control country
X0 = X0.values                                             # shape (16, 7)

$\textbf{Simplified Formulation}$

In [4]:
# Construct the problem

w = cvx.Variable((1, 16), nonneg=True)

objective = cvx.Minimize(cvx.sum_squares(X1 - w @ X0))

constraints = [cvx.sum(w) == 1]

prob = cvx.Problem(objective, constraints)

# The optimal objective value is returned by prob.solve()

result = prob.solve(verbose=True)

print('The optimal objective value: ',result,'\n\nWeights: ',w.value)
-----------------------------------------------------------------
           OSQP v0.6.0  -  Operator Splitting QP Solver
              (c) Bartolomeo Stellato,  Goran Banjac
        University of Oxford  -  Stanford University 2019
-----------------------------------------------------------------
problem:  variables n = 23, constraints m = 24
          nnz(P) + nnz(A) = 158
settings: linear system solver = qdldl,
          eps_abs = 1.0e-05, eps_rel = 1.0e-05,
          eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04,
          rho = 1.00e-01 (adaptive),
          sigma = 1.00e-06, alpha = 1.60, max_iter = 10000
          check_termination: on (interval 25),
          scaling: on, scaled_termination: off
          warm start: on, polish: on, time_limit: off

iter   objective    pri res    dua res    rho        time
   1   0.0000e+00   5.58e+01   1.71e+04   1.00e-01   1.31e-04s
 200   2.5574e+01   1.11e-02   8.12e-02   3.01e+00   4.27e-04s
 400   3.9777e+01   1.21e-03   9.31e-03   3.01e+00   7.04e-04s
 600   4.1633e+01   1.33e-04   1.02e-03   3.01e+00   9.77e-04s
 800   4.1839e+01   1.45e-05   1.11e-04   3.01e+00   1.26e-03s
 825   4.1845e+01   1.10e-05   8.42e-05   3.01e+00   1.34e-03s
plsh   4.1864e+01   2.83e-13   2.96e-13   --------   1.45e-03s

status:               solved
solution polish:      successful
number of iterations: 825
optimal objective:    41.8644
run time:             1.45e-03s
optimal rho estimate: 6.28e+00

The optimal objective value:  41.86440556366951 

Weights:  [[3.07188776e-01 0.00000000e+00 6.61204978e-01 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 3.16062464e-02 3.77783640e-17]]
In [5]:
# The weights attributed to each country

result = pd.DataFrame({'Country':data[data.country!='West Germany'].country.unique(), 
                       'Weight': np.round(w.value[0], decimals=3)})
result
Out[5]:
Country Weight
0 USA 0.307
1 UK 0.000
2 Austria 0.661
3 Belgium 0.000
4 Denmark 0.000
5 France 0.000
6 Italy 0.000
7 Netherlands 0.000
8 Norway 0.000
9 Switzerland 0.000
10 Japan 0.000
11 Greece 0.000
12 Portugal 0.000
13 Spain 0.000
14 Australia 0.032
15 New Zealand 0.000
In [6]:
# Compare the error of the optimizer fit with that of the Abadie et al. fit

w_opt = np.array([0.22, 0, 0.42, 0, 0, 0, 0, 0.09, 0, 0.11, 0.16, 0, 0, 0, 0, 0]).reshape(1, 16)

data_compare = pd.DataFrame({'Optimizer':(w @ X0).value[0], 
                             'Optimizer vs Treated': (w @ X0).value[0] - X1[0],
                             'Paper':(w_opt @ X0)[0], 
                             'Paper vs Treated': (w_opt @ X0)[0] - X1[0],
                             'Treated':X1[0]}, index= data.columns[4:])
data_compare
Out[6]:
Optimizer Optimizer vs Treated Paper Paper vs Treated Treated
infrate 4.775253 1.386404 4.793337 1.404488 3.388849
trade 46.588107 0.831420 48.260972 2.504285 45.756687
schooling 50.812894 -4.970439 45.611666 -10.171667 55.783333
invest60 0.257546 -0.079834 0.277653 -0.059727 0.337380
invest70 0.291181 -0.034458 0.316638 -0.009002 0.325640
invest80 25.428058 -1.589940 27.081760 0.063762 27.017998
industry 36.223929 -3.465587 36.472458 -3.217059 39.689516
In [7]:
print('Error using CVXPY: {} \nError in the original paper: {}'\
      .format(sum((w.value @ X0 - X1)[0]), sum((w_opt @ X0 - X1)[0])))
Error using CVXPY: -7.922435428769463 
Error in the original paper: -9.484920562009224
In [8]:
import plotly.graph_objs as go

fig = go.Figure()
fig.add_trace(go.Scatter(x=list(range(1960, 1990)), y=w.value[0] @ y_ctr,
                    mode='lines', name='Optimizer'))
fig.add_trace(go.Scatter(x=list(range(1960, 1990)), y=(w_opt @ y_ctr)[0],
                    mode='lines', name='Original Paper'))
fig.update_layout(title='Synthetic West Germany<br>Optimizer vs. Paper',
                   xaxis_title='Time', yaxis_title='GDP per Capita (USD)')
fig.show()
In [9]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=list(data.year.unique()), y=w.value[0] @ y_ctr_all,
                    mode='lines', name='Optimizer'))
fig.add_trace(go.Scatter(x=list(data.year.unique()), y=y_tr_all[0],
                    mode='lines', name='Treated unit'))
# Vertical line marking the 1990 treatment year (drawn in paper coordinates)
fig.add_shape(dict(type="line", xref="paper", yref="paper", x0=.698, y0=0,
                   x1=.698, y1=.8, line=dict(color="Black", width=1)))

fig.add_trace(go.Scatter(x=[1990], y=[30000], mode="text",
    name="Reunification", text=["German<br>Reunification"]))

fig.update_layout(title='Synthetic West Germany<br>Optimizer vs. Treated unit',
                   xaxis_title='Time', yaxis_title='GDP per Capita (USD)')
fig.show()

$\textbf{Incorporating Covariates' Importance}$

In [10]:
# All covariates

X0_T = (X0.T)
X0_T = np.array(X0_T).reshape(7, 16)
X1_T = (X1.T)
X1_T = np.array(X1_T).reshape(7, 1)

y_ctr_T = y_ctr.reshape(30, 16)
y_tr_T = y_tr.reshape(30, 1)
In [11]:
def total_loss(V_diag, details=False):
    '''
    Finds the W that minimizes the V-weighted covariate loss for a given V;
    returns the pre-treatment outcome loss and the fitted CVXPY weight variable w.
    '''
    V = np.zeros(shape=(7, 7))
    np.fill_diagonal(V, V_diag)
    # Construct the problem
    w = cvx.Variable((16, 1), nonneg=True)
    objective = cvx.Minimize(cvx.sum(V @ cvx.square(X1_T - X0_T @ w)))
    constraints = [cvx.sum(w) == 1]
    problem = cvx.Problem(objective, constraints)
    result = problem.solve(verbose=details)

    return float((y_tr_T - y_ctr_T @ w.value).T @ (y_tr_T - y_ctr_T @ w.value)), w
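
As a quick sanity check (not part of the original run), a uniform importance vector should recover donor weights close to those from the simplified formulation, since a scaled identity $V$ only rescales the objective:

uniform_loss, uniform_w = total_loss(np.ones(7) / 7)
print(round(uniform_loss, 2), uniform_w.value.ravel().round(3))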

$\textbf{Generating Random V}$

In [12]:
# Brute-force attempt to sidestep the DCP rules: sample random importance
# vectors V from a Dirichlet distribution and record the resulting loss
record = []
for i in range(1000):
    if (i+1)%100 == 0: print('{}%'.format((i+1)/10))
    new_V = np.random.dirichlet(np.ones(7), size=1)
    record.append([total_loss(new_V)[0], new_V])
10.0%
20.0%
30.0%
40.0%
50.0%
60.0%
70.0%
80.0%
90.0%
100.0%
In [13]:
results = pd.DataFrame(record, columns=['Error', 'Importance'])\
            .sort_values(by='Error', ascending=True)
results = results.reset_index()
results.head(10)
Out[13]:
index Error Importance
0 537 9.950185e+08 [[0.056073904983131174, 0.06930647483063014, 0...
1 651 9.978247e+08 [[0.04348871786367567, 0.05635377897702295, 0....
2 976 9.991154e+08 [[0.01331308525118401, 0.1516163941135187, 0.3...
3 95 1.000996e+09 [[0.06274756054892223, 0.168271324897066, 0.36...
4 756 1.003024e+09 [[0.07093016028506177, 0.08959926400399244, 0....
5 562 1.003920e+09 [[0.0019424779538130606, 0.23170475343722108, ...
6 774 1.007189e+09 [[0.02550787913780792, 0.2143624928562831, 0.0...
7 710 1.007242e+09 [[0.07069151766034885, 0.0648044295586918, 0.6...
8 777 1.011593e+09 [[0.12740682983490986, 0.04737541869538386, 0....
9 757 1.013557e+09 [[0.18387657976160277, 0.14389403984504387, 0....
In [14]:
W = total_loss(results.iloc[0][1], details=True)[1].value

pd.DataFrame({'Country':data[data.country!='West Germany'].country.unique(), 
              'Weight': np.round(W.T[0], decimals=3)})
-----------------------------------------------------------------
           OSQP v0.6.0  -  Operator Splitting QP Solver
              (c) Bartolomeo Stellato,  Goran Banjac
        University of Oxford  -  Stanford University 2019
-----------------------------------------------------------------
problem:  variables n = 23, constraints m = 24
          nnz(P) + nnz(A) = 158
settings: linear system solver = qdldl,
          eps_abs = 1.0e-05, eps_rel = 1.0e-05,
          eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04,
          rho = 1.00e-01 (adaptive),
          sigma = 1.00e-06, alpha = 1.60, max_iter = 10000
          check_termination: on (interval 25),
          scaling: on, scaled_termination: off
          warm start: on, polish: on, time_limit: off

iter   objective    pri res    dua res    rho        time
   1   0.0000e+00   5.58e+01   2.16e+08   1.00e-01   1.12e-04s
 200   6.6960e+06   8.53e-02   1.01e+04   2.28e+01   4.99e-04s
 400   2.0475e+08   7.02e-02   1.79e+05   2.73e+03   1.44e-03s
 600   9.7576e+08   5.93e-02   1.55e+05   2.73e+03   1.71e-03s
 800   2.1159e+09   5.06e-02   1.33e+05   2.73e+03   1.98e-03s
1000   5.1314e+09   3.84e-02   5.99e+05   1.39e+04   2.26e-03s
1200   7.6440e+09   3.25e-02   5.16e+05   1.39e+04   2.64e-03s
1400   1.0565e+10   2.76e-02   4.38e+05   1.39e+04   2.93e-03s
1600   1.3737e+10   2.36e-02   5.79e+05   1.39e+04   3.20e-03s
1800   1.7388e+10   1.93e-02   9.97e+05   1.39e+04   3.49e-03s
2000   2.0713e+10   1.58e-02   8.15e+05   1.39e+04   4.75e-03s
2200   2.3115e+10   1.32e-02   1.67e+05   1.39e+04   5.05e-03s
2400   2.4594e+10   1.19e-02   1.88e+05   1.39e+04   5.34e-03s
2600   2.5942e+10   1.07e-02   1.72e+05   1.39e+04   5.58e-03s
2800   2.7215e+10   9.67e-03   1.56e+05   1.39e+04   5.85e-03s
3000   2.8411e+10   8.72e-03   1.40e+05   1.39e+04   6.15e-03s
3200   2.9530e+10   7.87e-03   1.27e+05   1.39e+04   6.40e-03s
3400   3.0572e+10   7.11e-03   1.14e+05   1.39e+04   6.68e-03s
3600   3.1538e+10   6.42e-03   1.03e+05   1.39e+04   6.92e-03s
3800   3.2431e+10   5.79e-03   9.32e+04   1.39e+04   7.14e-03s
4000   3.3254e+10   5.23e-03   8.41e+04   1.39e+04   7.37e-03s
4200   3.4011e+10   4.72e-03   7.59e+04   1.39e+04   7.59e-03s
4400   3.4705e+10   4.26e-03   6.85e+04   1.39e+04   7.81e-03s
4600   3.5342e+10   3.84e-03   6.19e+04   1.39e+04   8.04e-03s
4800   3.5924e+10   3.47e-03   5.58e+04   1.39e+04   8.28e-03s
5000   3.6455e+10   3.13e-03   5.04e+04   1.39e+04   8.49e-03s
5200   3.6939e+10   2.83e-03   4.55e+04   1.39e+04   8.70e-03s
5400   3.7381e+10   2.55e-03   4.11e+04   1.39e+04   8.96e-03s
5600   3.7783e+10   2.30e-03   3.71e+04   1.39e+04   9.18e-03s
5800   3.8148e+10   2.08e-03   3.34e+04   1.39e+04   9.39e-03s
6000   3.8480e+10   1.88e-03   3.02e+04   1.39e+04   9.60e-03s
6200   3.8781e+10   1.69e-03   2.72e+04   1.39e+04   9.88e-03s
6400   3.9055e+10   1.53e-03   2.46e+04   1.39e+04   1.01e-02s
6600   3.9303e+10   1.38e-03   2.22e+04   1.39e+04   1.03e-02s
6800   3.9528e+10   1.24e-03   2.00e+04   1.39e+04   1.05e-02s
7000   3.9732e+10   1.12e-03   1.81e+04   1.39e+04   1.08e-02s
7200   3.9916e+10   1.01e-03   1.63e+04   1.39e+04   1.10e-02s
7400   4.0084e+10   9.15e-04   1.47e+04   1.39e+04   1.12e-02s
7600   4.0235e+10   8.26e-04   1.33e+04   1.39e+04   1.14e-02s
7800   4.0372e+10   7.46e-04   1.20e+04   1.39e+04   1.16e-02s
8000   4.0496e+10   6.73e-04   1.08e+04   1.39e+04   1.18e-02s
8200   4.0608e+10   6.07e-04   9.78e+03   1.39e+04   1.20e-02s
8350   4.0684e+10   5.62e-04   9.05e+03   1.39e+04   1.22e-02s

status:               solved
solution polish:      unsuccessful
number of iterations: 8350
optimal objective:    40684448315.4665
run time:             1.23e-02s
optimal rho estimate: 3.51e+04

Out[14]:
Country Weight
0 USA 0.305
1 UK 0.000
2 Austria 0.662
3 Belgium 0.000
4 Denmark 0.000
5 France 0.000
6 Italy 0.000
7 Netherlands 0.000
8 Norway 0.000
9 Switzerland 0.000
10 Japan 0.000
11 Greece 0.000
12 Portugal 0.000
13 Spain 0.000
14 Australia 0.036
15 New Zealand 0.000
In [15]:
pd.DataFrame({'Covariate': data.columns[4:], 
              'Weight': np.round(results['Importance'][0][0], 3)})
Out[15]:
Covariate Weight
0 infrate 0.056
1 trade 0.069
2 schooling 0.208
3 invest60 0.505
4 invest70 0.122
5 invest80 0.025
6 industry 0.014
In [16]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=list(range(1960, 1990)), y=(W.T @ y_ctr)[0],
                    mode='lines', name='Optimizer'))
fig.add_trace(go.Scatter(x=list(range(1960, 1990)), y=y_tr[0],
                    mode='lines', name='Treated Unit'))
fig.update_layout(title='Synthetic West Germany<br>Optimizer vs. Treated Unit',
                   xaxis_title='Time', yaxis_title='GDP per Capita (USD)')
fig.show()
In [17]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=list(data.year.unique()), y=(W.T @ y_ctr_all)[0],
                    mode='lines', name='Optimizer'))
fig.add_trace(go.Scatter(x=list(data.year.unique()), y=y_tr_all[0],
                    mode='lines', name='Treated unit'))
# Vertical line marking the 1990 treatment year (drawn in paper coordinates)
fig.add_shape(dict(type="line", xref="paper", yref="paper", x0=.698, y0=0,
                   x1=.698, y1=.8, line=dict(color="Black", width=1)))

fig.add_trace(go.Scatter(x=[1990], y=[30000], mode="text",
    name="Reunification", text=["German<br>Reunification"]))

fig.update_layout(title='Synthetic West Germany<br>Optimizer vs. Treated unit',
                   xaxis_title='Time', yaxis_title='GDP per Capita (USD)')
fig.show()
In [18]:
# Compare the error of the optimizer fit with that of the Abadie et al. fit
data_compare = pd.DataFrame({'Optimizer':(W.T @ X0)[0], 
                             'Optimizer vs Treated': (W.T @ X0 - X1)[0],
                             'Paper':(w_opt @ X0)[0], 
                             'Paper vs Treated': (w_opt @ X0 - X1)[0],
                             'Treated':X1[0]}, index= data.columns[4:])
data_compare
Out[18]:
Optimizer Optimizer vs Treated Paper Paper vs Treated Treated
infrate 4.799432 1.410583 4.793337 1.404488 3.388849
trade 46.725029 0.968342 48.260972 2.504285 45.756687
schooling 50.960760 -4.822574 45.611666 -10.171667 55.783333
invest60 0.258644 -0.078736 0.277653 -0.059727 0.337380
invest70 0.292285 -0.033355 0.316638 -0.009002 0.325640
invest80 25.531433 -1.486565 27.081760 0.063762 27.017998
industry 36.336410 -3.353107 36.472458 -3.217059 39.689516
In [19]:
print('Error using CVXPY: {} \nError in the original paper: {}'\
      .format(sum((W.T @ X0 - X1)[0]), sum((w_opt @ X0 - X1)[0])))
Error using CVXPY: -7.395411393585723 
Error in the original paper: -9.484920562009224

$\textbf{Differential Evolution}$
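
As an alternative to purely random sampling, scipy's differential_evolution runs a stochastic, population-based global search over the box constraints given by bounds; it needs only function evaluations (each one an inner CVXPY solve here), not gradients, which suits the nested structure of the problem.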

In [20]:
def total_loss_func(V_diag):
    ''' 
    Finds W that minimizes total loss given V,
    Returns total loss
    '''
    V = np.zeros(shape=(7, 7))
    np.fill_diagonal(V, V_diag)
    # Construct the problem
    w = cvx.Variable((16, 1), nonneg=True)
    objective = cvx.Minimize(cvx.sum(V @ cvx.square(X1_T - X0_T @ w)))
    constraints = [cvx.sum(w) == 10]
    problem = cvx.Problem(objective, constraints)
    result = problem.solve(verbose=False)

    return float((y_tr_T - y_ctr_T @ w.value).T @ (y_tr_T - y_ctr_T @ w.value))
In [21]:
from scipy.optimize import differential_evolution
bounds = [(0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1)]

result = differential_evolution(total_loss_func, bounds)

'{:.2e}'.format(result.fun)
Out[21]:
'1.37e+11'
In [22]:
gen_V = result.x / np.sum(result.x)
gen_V
Out[22]:
array([0.18831788, 0.23135748, 0.07005758, 0.17786297, 0.14548441,
       0.12704561, 0.05987408])
In [23]:
pd.DataFrame({'Covariate': data.columns[4:], 'Weight': np.round(gen_V, 3)})
Out[23]:
Covariate Weight
0 infrate 0.188
1 trade 0.231
2 schooling 0.070
3 invest60 0.178
4 invest70 0.145
5 invest80 0.127
6 industry 0.060
In [24]:
'{:.2e}'.format(total_loss(gen_V)[0])
Out[24]:
'1.38e+09'
In [25]:
gen_W = total_loss(gen_V)[1].value
pd.DataFrame({'Country':data[data.country!='West Germany'].country.unique(), 
              'Weight': np.round(gen_W.T[0], decimals=3)})
Out[25]:
Country Weight
0 USA 0.295
1 UK 0.000
2 Austria 0.636
3 Belgium 0.000
4 Denmark 0.000
5 France 0.000
6 Italy 0.000
7 Netherlands 0.000
8 Norway 0.000
9 Switzerland 0.000
10 Japan 0.000
11 Greece 0.000
12 Portugal 0.000
13 Spain 0.000
14 Australia 0.068
15 New Zealand 0.000
In [26]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=list(range(1960, 1990)), y=(gen_W.T @ y_ctr)[0],
                    mode='lines', name='Optimizer'))
fig.add_trace(go.Scatter(x=list(range(1960, 1990)), y=y_tr[0],
                    mode='lines', name='Treated Unit'))
fig.update_layout(title='Synthetic West Germany<br>Optimizer vs. Treated Unit',
                   xaxis_title='Time', yaxis_title='GDP per Capita (USD)')
fig.show()
In [27]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=list(data.year.unique()), y=(gen_W.T @ y_ctr_all)[0],
                    mode='lines', name='Optimizer'))
fig.add_trace(go.Scatter(x=list(data.year.unique()), y=y_tr_all[0],
                    mode='lines', name='Treated unit'))
# Vertical line marking the 1990 treatment year (drawn in paper coordinates)
fig.add_shape(dict(type="line", xref="paper", yref="paper", x0=.698, y0=0,
                   x1=.698, y1=.8, line=dict(color="Black", width=1)))

fig.add_trace(go.Scatter(x=[1990], y=[30000], mode="text",
    name="Reunification", text=["German<br>Reunification"]))

fig.update_layout(title='Synthetic West Germany<br>Optimizer vs. Treated unit',
                   xaxis_title='Time', yaxis_title='GDP per Capita (USD)')
fig.show()
In [28]:
print('Error using CVXPY: {} \nError in the original paper: {}'\
      .format(sum((gen_W.T @ X0 - X1)[0]), sum((w_opt @ X0 - X1)[0])))
Error using CVXPY: -8.363955165003988 
Error in the original paper: -9.484920562009224

$\textbf{References}$

  • Abadie, A., Diamond, A., & Hainmueller, J. (2015). Comparative politics and the synthetic control method. American Journal of Political Science, 59(2), 495-510. doi:10.1111/ajps.12116
  • Abadie, A., Diamond, A., & Hainmueller, J. (2007). Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program (NBER Working Paper No. 12831). Cambridge, MA: National Bureau of Economic Research.