Learning
py-scm supports both structure learning and parameter learning. This section shows how to learn a causal model from data using the PC-algorithm.
Causal model
The true causal model is defined as follows.
\(C \sim \mathcal{N}(1, 1)\)
\(X \sim \mathcal{N}(1 + 2 C, 1)\)
\(M \sim \mathcal{N}(5 + 1.5 X, 1)\)
\(Y \sim \mathcal{N}(1 + 2 C + 1.5 X + 0.5 M, 1)\)
As you can see,
\(C\) is a confounder of \(X\) and \(Y\), and
\(M\) is a mediator between \(X\) and \(Y\).
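For reference, data of this kind could be generated along the following lines. This is only a sketch in plain NumPy and pandas, not the script that produced the file used here; the function name `simulate`, the seed, and the sample size are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def simulate(n=10_000, seed=37):
    """Draw n samples from the linear Gaussian model defined above."""
    rng = np.random.default_rng(seed)

    # Each variable is sampled after its parents, following the causal order.
    C = rng.normal(1.0, 1.0, n)
    X = rng.normal(1.0 + 2.0 * C, 1.0)
    M = rng.normal(5.0 + 1.5 * X, 1.0)
    Y = rng.normal(1.0 + 2.0 * C + 1.5 * X + 0.5 * M, 1.0)

    return pd.DataFrame({'C': C, 'X': X, 'M': M, 'Y': Y})


df = simulate()
```

Sampling in causal order (parents first) is what makes \(C\) act as a confounder and \(M\) as a mediator in the generated data.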
We have already simulated data from this causal model, and so we will load it.
[1]:
import pandas as pd
X = pd.read_csv('./_data/model.csv')
X.shape
[1]:
(10000, 4)
[2]:
X.head(10)
[2]:
| | C | X | M | Y |
|---|---|---|---|---|
| 0 | 0.945536 | 3.024955 | 9.759877 | 10.532426 |
| 1 | 1.674308 | 3.387163 | 10.893335 | 15.800990 |
| 2 | 1.346647 | 3.589577 | 10.983322 | 13.636408 |
| 3 | -0.300346 | 0.253631 | 4.309610 | 2.124186 |
| 4 | 2.518512 | 4.986335 | 12.109710 | 18.424847 |
| 5 | 1.989824 | 6.312251 | 13.992190 | 23.169803 |
| 6 | 1.277681 | 1.963059 | 7.153576 | 9.272288 |
| 7 | 0.551411 | 1.834353 | 7.416663 | 9.534964 |
| 8 | 1.961966 | 2.707050 | 7.871503 | 13.247716 |
| 9 | 0.172421 | 1.636739 | 8.839026 | 8.404960 |
PC-algorithm
Let’s apply the PC-algorithm to the data and try to recover the causal model.
[3]:
from pyscm.learn import Pc
algorithm = Pc().fit(X)
As the plot below shows, the learned graph (algorithm.g) matches the true structure.
[4]:
import networkx as nx
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(5, 5))
# the learned graph
g = algorithm.g
# hierarchical layout; graphviz_layout requires pygraphviz to be installed
pos = nx.nx_agraph.graphviz_layout(g, prog='dot')
nx.draw(g, pos=pos, with_labels=True, node_color='#e0e0e0', ax=ax)
fig.tight_layout()
The estimated means (algorithm.m) and covariance matrix (algorithm.c) are available on the fitted object.
[5]:
algorithm.m
[5]:
C 1.001723
X 2.994276
M 9.496402
Y 12.231968
dtype: float64
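These estimates agree closely with the means implied by the true model, which follow by taking expectations through each equation in causal order.
\(E[C] = 1\)
\(E[X] = 1 + 2 E[C] = 3\)
\(E[M] = 5 + 1.5 E[X] = 9.5\)
\(E[Y] = 1 + 2 E[C] + 1.5 E[X] + 0.5 E[M] = 12.25\)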
[6]:
algorithm.c
[6]:
| | C | X | M | Y |
|---|---|---|---|---|
| C | 0.990700 | 1.989244 | 2.994101 | 6.461545 |
| X | 1.989244 | 5.004194 | 7.532975 | 15.238727 |
| M | 2.994101 | 7.532975 | 12.324022 | 23.445529 |
| Y | 6.461545 | 15.238727 | 23.445529 | 48.496009 |
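Constraint-based methods such as the PC-algorithm decide which edges to remove by testing conditional independence. In the true model the only pair of variables without a direct edge is \(C\) and \(M\), which are independent given \(X\). The sketch below checks this with a partial correlation computed from the covariance matrix printed above. It is plain NumPy for illustration and makes no claim about how py-scm implements its tests; the helper name `partial_corr` is an assumption.

```python
import numpy as np

names = ['C', 'X', 'M', 'Y']

# Covariance matrix copied from the output above.
S = np.array([
    [0.990700, 1.989244, 2.994101, 6.461545],
    [1.989244, 5.004194, 7.532975, 15.238727],
    [2.994101, 7.532975, 12.324022, 23.445529],
    [6.461545, 15.238727, 23.445529, 48.496009],
])


def partial_corr(S, i, j, given):
    """Correlation between variables i and j after conditioning on `given`."""
    idx = [i, j]
    S_aa = S[np.ix_(idx, idx)]
    if given:
        # Schur complement: covariance of (i, j) conditional on `given`.
        S_ab = S[np.ix_(idx, given)]
        S_bb = S[np.ix_(given, given)]
        S_aa = S_aa - S_ab @ np.linalg.solve(S_bb, S_ab.T)
    return S_aa[0, 1] / np.sqrt(S_aa[0, 0] * S_aa[1, 1])


c, x, m = names.index('C'), names.index('X'), names.index('M')

marginal = partial_corr(S, c, m, [])      # strong: C and M are correlated
conditional = partial_corr(S, c, m, [x])  # close to zero: C is independent of M given X
```

The marginal correlation is large while the partial correlation given \(X\) is close to zero, which is the pattern that justifies leaving out an edge between \(C\) and \(M\).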
Reasoning model
A py-scm reasoning model may then be created as follows.
[7]:
from pyscm.reasoning import create_reasoning_model
model = create_reasoning_model(algorithm.d, algorithm.p)
model
[7]:
ReasoningModel[H=[C,X,M,Y], M=[1.002,2.994,9.496,12.232], C=[[0.991,1.989,2.994,6.462]|[1.989,5.004,7.533,15.239]|[2.994,7.533,12.324,23.446]|[6.462,15.239,23.446,48.496]]]
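The reasoning model supports associational, interventional and counterfactual inference. For example, calling pquery() without evidence returns the marginal means (q[0]) and covariance matrix (q[1]), which match the learned parameters above.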
[8]:
q = model.pquery()
[9]:
q[0]
[9]:
C 1.001723
X 2.994276
M 9.496402
Y 12.231968
dtype: float64
[10]:
q[1]
[10]:
| | C | X | M | Y |
|---|---|---|---|---|
| C | 0.990700 | 1.989244 | 2.994101 | 6.461545 |
| X | 1.989244 | 5.004194 | 7.532975 | 15.238727 |
| M | 2.994101 | 7.532975 | 12.324022 | 23.445529 |
| Y | 6.461545 | 15.238727 | 23.445529 | 48.496009 |
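To see why associational and interventional answers can differ for this model, the covariance matrix above can be used directly. The sketch below uses plain NumPy rather than the py-scm query API, and assumes the linear Gaussian structure of the true model: it compares the unadjusted regression slope of \(Y\) on \(X\) with the slope obtained after adjusting for the confounder \(C\).

```python
import numpy as np

# Covariance matrix copied from the output above (order: C, X, M, Y).
S = np.array([
    [0.990700, 1.989244, 2.994101, 6.461545],
    [1.989244, 5.004194, 7.532975, 15.238727],
    [2.994101, 7.532975, 12.324022, 23.445529],
    [6.461545, 15.238727, 23.445529, 48.496009],
])
c, x, y = 0, 1, 3

# Associational: slope of E[Y | X = x] in x, which absorbs the confounding path through C.
slope_assoc = S[y, x] / S[x, x]

# Adjusting for C: regress Y on (C, X) and read off the coefficient of X.
adj = [c, x]
coef = np.linalg.solve(S[np.ix_(adj, adj)], S[adj, y])
slope_adjusted = coef[1]

# Total effect of X on Y in the true model: direct path plus the path through M.
true_total_effect = 1.5 + 0.5 * 1.5
```

Here the unadjusted slope is roughly 3.05, while the adjusted slope is close to the true total effect of 2.25. \(M\) is deliberately left out of the adjustment set, since conditioning on a mediator would block part of the effect being measured.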