Estimating the Average Causal Effect

The Average Causal Effect (ACE) is often the population-level quantity of interest. The ACE is difficult to estimate because confounding variables bias naive estimates of causal effects. Using causal models and inference, we can estimate the ACE via the do-operator. Let's see how we can do so with simulated data.

Causal model

Let’s define the following causal model.

  • \(C \sim \mathcal{N}(1, 1)\)

  • \(X \sim \mathcal{N}(2 + 3 C, 1)\)

  • \(Y \sim \mathcal{N}(0.5 + 2.5 C + 1.5 X, 1)\)

Where,

  • \(C\) is the confounder (of \(X\) and \(Y\)),

  • \(X\) is the cause, and

  • \(Y\) is the effect.

In estimating the ACE, we want to know the causal effect of \(X\) on \(Y\) controlling for \(C\). Below, we simulate data from this causal model.

[1]:
import numpy as np
import pandas as pd

np.random.seed(37)

Xy = pd.DataFrame() \
    .assign(C=np.random.normal(1, 1, 10_000)) \
    .assign(X=lambda d: np.random.normal(2 + 3 * d['C'], 1)) \
    .assign(Y=lambda d: np.random.normal(0.5 + 2.5 * d['C'] + 1.5 * d['X'], 1))

Xy.shape
[1]:
(10000, 3)
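Before adjusting for the confounder, it is worth seeing the bias it induces. A quick sanity check (a sketch, not part of the original notebook): a naive regression of \(Y\) on \(X\) that ignores \(C\) converges to \(\mathrm{Cov}(X, Y)/\mathrm{Var}(X) = 22.5/10 = 2.25\), well above the structural coefficient of 1.5.

```python
import numpy as np

np.random.seed(37)

# same generative process as above
n = 10_000
C = np.random.normal(1, 1, n)
X = np.random.normal(2 + 3 * C, 1)
Y = np.random.normal(0.5 + 2.5 * C + 1.5 * X, 1)

# naive slope of Y on X, ignoring the confounder C
naive_slope = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)
print(naive_slope)  # ~2.25, biased above the true structural coefficient 1.5
```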

DoubleML ACE estimation

We can use DoubleML to estimate the causal effect of \(X\) on \(Y\) as follows.

[2]:
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from doubleml import DoubleMLData
from doubleml import DoubleMLPLR

dml_data = DoubleMLData(
    Xy,
    y_col='Y',
    d_cols='X',
    x_cols=['C']
)

learner = LinearRegression()
ml_l = clone(learner)
ml_m = clone(learner)

dml_model = DoubleMLPLR(dml_data, ml_l, ml_m)
dml_model.fit()
dml_model.summary
[2]:
       coef   std err           t  P>|t|     2.5 %    97.5 %
X  1.506062  0.009728  154.824303    0.0  1.486996  1.525128

We can also try to estimate the causal effect of \(C\) on \(Y\).

[3]:
dml_data = DoubleMLData(
    Xy,
    y_col='Y',
    d_cols='C',
    x_cols=['X']
)

learner = LinearRegression()
ml_l = clone(learner)
ml_m = clone(learner)

dml_model = DoubleMLPLR(dml_data, ml_l, ml_m)
dml_model.fit()
dml_model.summary
[3]:
       coef   std err          t  P>|t|     2.5 %    97.5 %
C  2.492434  0.030878  80.719026    0.0  2.431915  2.552954

The ACEs (coefficients) estimated by DoubleML closely match the true coefficients used to simulate the data: 1.5 for \(X\) and 2.5 for \(C\).
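Because this model is linear with a single observed confounder, a plain OLS regression of \(Y\) on both \(X\) and \(C\) recovers the same adjusted estimates. A sketch (not part of the original notebook) of this backdoor adjustment:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

np.random.seed(37)

# same generative process as above
n = 10_000
C = np.random.normal(1, 1, n)
X = np.random.normal(2 + 3 * C, 1)
Y = np.random.normal(0.5 + 2.5 * C + 1.5 * X, 1)

# adjust for the confounder by including C alongside X
ols = LinearRegression().fit(np.column_stack([X, C]), Y)
print(ols.coef_)  # ~[1.5, 2.5], the structural coefficients
```

DoubleML earns its keep when the nuisance functions are nonlinear and must be learned flexibly; here, with linear learners, it coincides with this adjusted regression.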

Py-SCM ACE estimation

ACE estimation with py-scm is also possible using the iiquery() method. First, let’s create the causal model.

[4]:
from pyscm.reasoning import create_reasoning_model

d = {
    'nodes': ['C', 'X', 'Y'],
    'edges': [
        ('C', 'X'),
        ('C', 'Y'),
        ('X', 'Y')
    ]
}

p = {
    'v': Xy.columns.tolist(),
    'm': Xy.mean().values,
    'S': Xy.cov().values
}

model = create_reasoning_model(d, p)

Now, let’s estimate the causal effects on \(Y\) when we \(\mathrm{do}(C)\). Remember that the do-operator removes all edges into the intervened variable, which blocks any backdoor paths. Since \(C\) has no parents in our causal structure, there are no backdoor paths from \(C\) to \(Y\). The ACE under \(\mathrm{do}(C)\) matches DoubleML and our true parameters.

[5]:
model.iiquery(['C'], ['X'], 'Y')
[5]:
C    2.510711
X    1.499376
dtype: float64
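Note that the py-scm model above was built from nothing but the mean vector and covariance matrix. Under a linear-Gaussian model, those parameters alone determine the adjusted coefficients. A sketch (not part of the original notebook) using the population covariance implied by our structural equations:

```python
import numpy as np

# population covariance of (C, X, Y) implied by the structural equations
S = np.array([
    [1.0,  3.0,  7.0],    # C: Var(C)=1, Cov(C,X)=3*1, Cov(C,Y)=2.5+1.5*3
    [3.0, 10.0, 22.5],    # X: Var(X)=3^2*1+1, Cov(X,Y)=2.5*3+1.5*10
    [7.0, 22.5, 52.25],   # Y: Var(Y)=2.5^2+1.5^2*10+2*2.5*1.5*3+1
])

# regression coefficients of Y on (C, X) from the covariance alone:
# beta = S_xx^{-1} S_xy
beta = np.linalg.solve(S[:2, :2], S[:2, 2])
print(beta)  # [2.5, 1.5]
```

The recovered coefficients agree with the `iiquery()` output above, up to sampling noise in the estimated covariance.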

Let’s estimate the causal effects on \(Y\) when we \(\mathrm{do}(X)\). In this situation, there is a backdoor path from \(X\) to \(Y\) through \(C\). In py-scm, the do-operator removes the edge from \(C\) to \(X\) before estimating the ACE.

[6]:
model.iiquery(['X'], ['C'], 'Y')
[6]:
X    0.633283
C    0.296898
dtype: float64

Clearly, the causal effects of \(X\) and \(C\) on \(Y\) are different when we \(\mathrm{do}(X)\) versus \(\mathrm{do}(C)\).
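The backdoor path from \(X\) to \(Y\) mentioned above can also be enumerated programmatically. A sketch using networkx (not used in the original notebook): a backdoor path is a path between treatment and outcome in the undirected skeleton that begins with an edge pointing into the treatment.

```python
import networkx as nx

# the causal structure defined earlier: C -> X, C -> Y, X -> Y
g = nx.DiGraph([('C', 'X'), ('C', 'Y'), ('X', 'Y')])

# enumerate all simple paths from X to Y in the undirected skeleton
skeleton = g.to_undirected()
backdoor = [
    p for p in nx.all_simple_paths(skeleton, 'X', 'Y')
    # keep only paths whose first edge points INTO the treatment X
    if g.has_edge(p[1], p[0])
]
print(backdoor)  # [['X', 'C', 'Y']]
```

The direct path \(X \rightarrow Y\) is excluded since it leaves \(X\), while \(X \leftarrow C \rightarrow Y\) is the lone backdoor path.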