# Estimating Average Causal Effect

The Average Causal Effect (ACE) is often the population-level quantity of interest. The ACE is difficult to estimate because confounding variables pollute and bias naive estimates of causal effects. Using causal models and inference, we can estimate the ACE with the `do-operator`. Let's see how we can do so with simulated data.

## Causal model

Let’s define the following causal model.

\(C \sim \mathcal{N}(1, 1)\)

\(X \sim \mathcal{N}(2 + 3 C, 1)\)

\(Y \sim \mathcal{N}(0.5 + 2.5 C + 1.5 X, 1)\)

Where,

\(C\) is the confounder (of \(X\) and \(Y\)),

\(X\) is the cause, and

\(Y\) is the effect.

In estimating the ACE, we want to know the causal effect of \(X\) on \(Y\) controlling for \(C\). Below, we simulate data from this causal model.
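Since the structural equations are linear, we can read off the target quantity before estimating anything. Intervening on \(X\) breaks its dependence on \(C\), so

\(\mathbb{E}[Y \mid \mathrm{do}(X = x)] = 0.5 + 2.5\,\mathbb{E}[C] + 1.5\,x = 3 + 1.5\,x\)

and the ACE of \(X\) on \(Y\) is the slope, 1.5. Similarly, holding \(X\) fixed, a unit change in \(C\) shifts \(Y\) by 2.5.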

```
[1]:
```

```
import numpy as np
import pandas as pd

np.random.seed(37)

Xy = pd.DataFrame() \
    .assign(C=np.random.normal(1, 1, 10_000)) \
    .assign(X=lambda d: np.random.normal(2 + 3 * d['C'], 1)) \
    .assign(Y=lambda d: np.random.normal(0.5 + 2.5 * d['C'] + 1.5 * d['X'], 1))

Xy.shape
```

```
[1]:
```

```
(10000, 3)
```
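To see the confounding bias concretely, here is a short sketch (rebuilding the simulated data with the same seed so it runs standalone) that fits a naive regression of \(Y\) on \(X\) alone, ignoring \(C\). The population slope of that regression is \(\mathrm{Cov}(X, Y)/\mathrm{Var}(X) = (2.5 \cdot 3 + 1.5 \cdot 10)/10 = 2.25\), not the true causal coefficient 1.5:

```python
import numpy as np
import pandas as pd

np.random.seed(37)

# Rebuild Xy so this snippet is self-contained.
Xy = pd.DataFrame() \
    .assign(C=np.random.normal(1, 1, 10_000)) \
    .assign(X=lambda d: np.random.normal(2 + 3 * d['C'], 1)) \
    .assign(Y=lambda d: np.random.normal(0.5 + 2.5 * d['C'] + 1.5 * d['X'], 1))

# Naive slope of Y on X, ignoring the confounder C.
naive_slope = np.polyfit(Xy['X'], Xy['Y'], 1)[0]
print(naive_slope)  # ~2.25, biased upward away from the true 1.5
```

This is exactly the bias that adjusting for \(C\) (as DoubleML does below) removes.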

## DoubleML ACE estimation

We can use DoubleML to estimate the causal effect of \(X\) on \(Y\) as follows.

```
[2]:
```

```
from sklearn.base import clone
from sklearn.linear_model import LinearRegression

from doubleml import DoubleMLData
from doubleml import DoubleMLPLR

dml_data = DoubleMLData(
    Xy,
    y_col='Y',
    d_cols='X',
    x_cols=['C']
)

learner = LinearRegression()
ml_l = clone(learner)
ml_m = clone(learner)

dml_model = DoubleMLPLR(dml_data, ml_l, ml_m)
dml_model.fit()
dml_model.summary
```

```
[2]:
```

| | coef | std err | t | P>\|t\| | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|
| X | 1.506062 | 0.009728 | 154.824303 | 0.0 | 1.486996 | 1.525128 |

We can also try to estimate the causal effect of \(C\) on \(Y\).

```
[3]:
```

```
dml_data = DoubleMLData(
    Xy,
    y_col='Y',
    d_cols='C',
    x_cols=['X']
)

learner = LinearRegression()
ml_l = clone(learner)
ml_m = clone(learner)

dml_model = DoubleMLPLR(dml_data, ml_l, ml_m)
dml_model.fit()
dml_model.summary
```

```
[3]:
```

| | coef | std err | t | P>\|t\| | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|
| C | 2.492434 | 0.030878 | 80.719026 | 0.0 | 2.431915 | 2.552954 |

The ACEs (coefficients) estimated from DoubleML match the coefficients we used in simulating the data.
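The partialling-out idea behind the DoubleML estimate can be sketched by hand with the Frisch-Waugh-Lovell recipe: residualize both \(Y\) and \(X\) on the confounder \(C\), then regress the residuals on each other (again rebuilding the simulated data so the snippet runs standalone):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

np.random.seed(37)

# Rebuild Xy so this snippet is self-contained.
Xy = pd.DataFrame() \
    .assign(C=np.random.normal(1, 1, 10_000)) \
    .assign(X=lambda d: np.random.normal(2 + 3 * d['C'], 1)) \
    .assign(Y=lambda d: np.random.normal(0.5 + 2.5 * d['C'] + 1.5 * d['X'], 1))

C = Xy[['C']]

# Residualize the outcome and the treatment on the confounder.
y_res = Xy['Y'] - LinearRegression().fit(C, Xy['Y']).predict(C)
x_res = Xy['X'] - LinearRegression().fit(C, Xy['X']).predict(C)

# The slope of the residual-on-residual regression is the ACE of X on Y.
ace = np.polyfit(x_res, y_res, 1)[0]
print(ace)  # ~1.5
```

DoubleML generalizes this recipe by allowing flexible ML learners for the two residualization steps and adding cross-fitting; with linear learners on linear data, the two approaches agree.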

## Py-SCM ACE estimation

ACE estimation with py-scm is also possible using the `iiquery()` method. First, let's create the causal model.

```
[4]:
```

```
from pyscm.reasoning import create_reasoning_model

d = {
    'nodes': ['C', 'X', 'Y'],
    'edges': [
        ('C', 'X'),
        ('C', 'Y'),
        ('X', 'Y')
    ]
}

p = {
    'v': Xy.columns.tolist(),
    'm': Xy.mean().values,
    'S': Xy.cov().values
}

model = create_reasoning_model(d, p)
```

Now, let's estimate the causal effects on \(Y\) when we \(\mathrm{do}(C)\). Remember that the do-operator removes all edges coming into \(C\), which blocks any backdoor paths from \(C\) to \(Y\). Since we created the causal structure, we know \(C\) has no incoming edges, so there are no backdoor paths to begin with. The ACE when we \(\mathrm{do}(C)\) matches DoubleML and our true parameters.

```
[5]:
```

```
model.iiquery(['C'], ['X'], 'Y')
```

```
[5]:
```

```
C 2.510711
X 1.499376
dtype: float64
```

Let’s estimate the causal effects on \(Y\) when we \(\mathrm{do}(X)\). In this situation, we know there is a backdoor path from \(X\) to \(Y\) through \(C\). In py-scm, the do-operator removes the link between \(X\) and \(C\) before the ACE estimation.

```
[6]:
```

```
model.iiquery(['X'], ['C'], 'Y')
```

```
[6]:
```

```
X 0.633283
C 0.296898
dtype: float64
```

Clearly, the causal effects of \(X\) and \(C\) on \(Y\) are different when we \(\mathrm{do}(X)\) versus \(\mathrm{do}(C)\).