py-scm

Common

Common type definitions and interfaces used throughout the package.

class pyscm.common.IReasoningModel

Bases: ABC

Interface for interventional reasoning models.

abstract iquery(observations: None | Dict[str, float] = None) Tuple[Series, DataFrame]

Conducts an interventional query.

Parameters:

observations – The set of variable/value pairs in do(X).

Returns:

Tuple of means and covariances.

abstract samples(size=1000) DataFrame

Samples data from the marginals.

Parameters:

size – Number of samples.

Returns:

Samples.

class pyscm.common.Parameters(M: Series, C: DataFrame)

Bases: object

Parameters.

C: DataFrame
M: Series
__init__(M: Series, C: DataFrame) None
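
Parameters appears to be a plain container for the model's means and covariance matrix. A minimal usage sketch, using a hypothetical stand-in class with the same fields:

```python
import pandas as pd
from dataclasses import dataclass

# Hypothetical stand-in mirroring pyscm.common.Parameters: a plain
# container for the means (M) and covariance matrix (C).
@dataclass
class Parameters:
    M: pd.Series
    C: pd.DataFrame

M = pd.Series([0.0, 1.0], index=["x", "y"])
C = pd.DataFrame([[1.0, 0.5], [0.5, 2.0]],
                 index=["x", "y"], columns=["x", "y"])
p = Parameters(M=M, C=C)
print(p.M["y"], p.C.loc["x", "y"])  # → 1.0 0.5
```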

Reasoning

The reasoning engine logic is contained in this module.

class pyscm.reasoning.ReasoningModel(d: DiGraph, M: Series, C: DataFrame)

Bases: IReasoningModel

Reasoning model.

__init__(d: DiGraph, M: Series, C: DataFrame)

Constructor.

Parameters:
  • d – Directed acyclic graph.

  • M – Means.

  • C – Covariance matrix.

cquery(node: Any, factual: Dict[Any, float], counterfactual: List[Dict[Any, float]], n_samples=10000) DataFrame

Conducts a counterfactual query.

Parameters:
  • node – The target node (e.g. dependent variable, y).

  • factual – The factual evidence. All variables must be observed!

  • counterfactual – The counterfactual evidence. Must be parents of target node.

  • n_samples – Number of samples.

Returns:

Counterfactual results.
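
A counterfactual query follows the abduction-action-prediction recipe. The following is a sketch of that recipe on a toy linear SCM y = 2x + u (hypothetical numbers, not the library's code):

```python
# Factual world: we observed x = 1, y = 2.5 in the SCM y = 2x + u.
x_f, y_f = 1.0, 2.5
u = y_f - 2.0 * x_f        # abduction: recover the noise term (u = 0.5)
x_cf = 3.0                 # action: set x to its counterfactual value
y_cf = 2.0 * x_cf + u      # prediction: re-evaluate with the same noise
print(y_cf)                # → 6.5
```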

iiquery(D_cols: List[str], X_cols: List[str], y_col: str, n_samples=10000)

Estimates the average causal effect (ACE). Unlike iquery(), the value of the ACE is returned instead of the post-intervention means and covariances.

Parameters:
  • D_cols – The variables that we are estimating the causal effects of (causes).

  • X_cols – The variables involved in estimating the causal effects (confounders).

  • y_col – The variable whose causal effect we are estimating (the outcome).

  • n_samples – The number of samples to generate per estimation.

Returns:

The causal effects.
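
The ACE is the difference in the outcome's expectation under two interventions. A minimal simulation of that quantity (a sketch with a hypothetical linear SCM y = 2d + noise, not the library's code):

```python
import numpy as np

# In y = 2d + noise, the ACE of do(d=1) vs do(d=0) on y is 2.
rng = np.random.default_rng(37)
n = 10_000
y_do1 = 2.0 * 1.0 + rng.normal(size=n)      # samples under do(d=1)
y_do0 = 2.0 * 0.0 + rng.normal(size=n)      # samples under do(d=0)
ace = y_do1.mean() - y_do0.mean()
print(round(ace, 1))
```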

iquery(observations: None | Dict[str, float] = None) Tuple[Series, DataFrame]

Conducts an interventional query.

Parameters:

observations – The set of variable/value pairs in do(X).

Returns:

Tuple of means and covariances.

pquery(observations: None | Dict[str, float] = None) Tuple[Series, DataFrame]

Performs associational/probabilistic inference.

Denote the following.

  • \(z\) as the variable observed

  • \(y\) as the set of other variables

  • \(\mu\) as the vector of means
    • \(\mu_z\) as the partitioned \(\mu\) of length \(|z|\)

    • \(\mu_y\) as the partitioned \(\mu\) of length \(|y|\)

  • \(\Sigma\) as the covariance matrix
    • \(\Sigma_{yz}\) as the partitioned \(\Sigma\) of \(|y|\) rows and \(|z|\) columns

    • \(\Sigma_{zz}\) as the partitioned \(\Sigma\) of \(|z|\) rows and \(|z|\) columns

    • \(\Sigma_{yy}\) as the partitioned \(\Sigma\) of \(|y|\) rows and \(|y|\) columns

If we observe evidence \(z_e\), then the new means \(\mu_y^{*}\) and covariance matrix \(\Sigma_y^{*}\) corresponding to \(y\) are computed as follows.

  • \(\mu_y^{*} = \mu_y + \Sigma_{yz} \Sigma_{zz}^{-1} (z_e - \mu_z)\)

  • \(\Sigma_y^{*} = \Sigma_{yy} - \Sigma_{yz} \Sigma_{zz}^{-1} \Sigma_{yz}^{T}\)

Parameters:

observations – Observations.

Returns:

Tuple of means and covariance matrix.
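
As a numeric sanity check of the conditional-Gaussian update above (a sketch with hypothetical numbers, not library code), with a single observed variable z:

```python
import numpy as np

m = np.array([0.0, 0.0])            # [mu_y, mu_z]
S = np.array([[2.0, 0.8],
              [0.8, 1.0]])          # [[S_yy, S_yz], [S_zy, S_zz]]
z_e = 1.0                           # observed evidence z = z_e

S_yy, S_yz, S_zz = S[0, 0], S[0, 1], S[1, 1]
mu_y_star = m[0] + S_yz / S_zz * (z_e - m[1])   # 0 + 0.8 * 1 = 0.8
S_yy_star = S_yy - S_yz ** 2 / S_zz             # 2 - 0.64 = 1.36
print(mu_y_star, S_yy_star)
```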

samples(size=1000) DataFrame

Samples data from the marginals.

Parameters:

size – Number of samples.

Returns:

Samples.

pyscm.reasoning.create_reasoning_model(d: DiGraph | Dict[str, Any], p: Parameters | Dict[str, Any]) ReasoningModel

Create a reasoning model.

Parameters:
  • d – DAG.

  • p – Parameters.

Returns:

ReasoningModel.

Associational Query

Associational query logic is contained in this module.

pyscm.associational.condition(X: List[int], Y: List[int], y: ndarray, m: ndarray, S: ndarray) Tuple[ndarray, ndarray]

Conditions X on Y; Y is the conditioning set (e.g. P(X | Y=y)).

Parameters:
  • X – Indices.

  • Y – Indices.

  • y – The evidence e.g. Y=y.

  • m – Means.

  • S – Covariances.

Returns:

Tuple of updated means and covariances.
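
A minimal sketch of what condition() computes, assuming it applies the standard conditional multivariate-normal formulas (this is an illustration, not the library's implementation):

```python
import numpy as np

def condition(X, Y, y, m, S):
    """Condition variables at indices X on Y = y."""
    m_x, m_y = m[X], m[Y]
    S_xy = S[np.ix_(X, Y)]
    S_yy = S[np.ix_(Y, Y)]
    S_xx = S[np.ix_(X, X)]
    K = S_xy @ np.linalg.inv(S_yy)      # regression coefficients
    m_new = m_x + K @ (y - m_y)         # updated means
    S_new = S_xx - K @ S_xy.T           # updated covariances
    return m_new, S_new

m = np.array([0.0, 0.0, 0.0])
S = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
m_new, S_new = condition([0, 1], [2], np.array([1.0]), m, S)
print(m_new)  # → [0.2 0.3]
```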

pyscm.associational.make_sym_pos_semidef(S: ndarray) ndarray

Make a covariance matrix symmetric positive semidefinite.

Parameters:

S – Covariances.

Returns:

Symmetric positive semidefinite covariances.
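
One common way to coerce a near-covariance matrix into a symmetric positive-semidefinite one is to symmetrize it and clip negative eigenvalues to zero. This is an assumption about what the function does, not its actual implementation:

```python
import numpy as np

def make_sym_pos_semidef(S):
    S = (S + S.T) / 2.0                     # force symmetry
    vals, vecs = np.linalg.eigh(S)
    vals = np.clip(vals, 0.0, None)         # drop negative eigenvalues
    return vecs @ np.diag(vals) @ vecs.T

S = np.array([[1.0, 2.0],
              [2.0, 1.0]])                  # indefinite: eigenvalues 3, -1
S_psd = make_sym_pos_semidef(S)
print(np.linalg.eigvalsh(S_psd))
```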

Interventional Query

Interventional query logic is contained in this module.

pyscm.interventional.do_estimation(D_cols: List[str], X_cols: List[str], y_col: str, model: IReasoningModel, n_samples=10000)

Estimate the average causal effect.

Parameters:
  • D_cols – The variables that we are manipulating (with the do-operator).

  • X_cols – The variables involved in estimating the causal effects (confounders).

  • y_col – The variable whose causal effect we are estimating (the outcome).

  • model – The reasoning model.

  • n_samples – The number of samples to generate per estimation.

Returns:

The causal effects.

pyscm.interventional.do_op(X: List[Any], X_val: List[float] | ndarray, d: DiGraph, M: Series, C: DataFrame) Tuple[DiGraph, Series, DataFrame]

Performs an interventional (do) query. All parents of each x in X are removed.

Parameters:
  • X – Nodes.

  • X_val – Values corresponding to nodes.

  • d – DAG.

  • M – Means.

  • C – Covariances.

Returns:

Tuple of DAG, means, and covariances.

pyscm.interventional.remove_parents(children: List[Any], d: DiGraph, M: Series, C: DataFrame) Tuple[DiGraph, Series, DataFrame]

Creates a new model by removing all parents of each child node.

Parameters:
  • children – Children.

  • d – DAG.

  • M – Means.

  • C – Covariances.

Returns:

Tuple of new DAG, means and covariances.
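
The graph-surgery part of this operation can be sketched with networkx (an illustration of the do-operator's edge removal, not the library's code):

```python
import networkx as nx

def remove_parent_edges(children, d):
    """Sever every incoming edge of each intervened-on node."""
    d = d.copy()
    for child in children:
        for parent in list(d.predecessors(child)):
            d.remove_edge(parent, child)
    return d

d = nx.DiGraph([("x", "z"), ("y", "z"), ("z", "w")])
d2 = remove_parent_edges(["z"], d)
print(sorted(d2.edges()))  # → [('z', 'w')]
```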

Counterfactual Query

Counterfactual query logic is contained in this module.

pyscm.counterfactual.do_counterfactual(node: Any, factual: Dict[Any, float], counterfactual: List[Dict[Any, float]], d: DiGraph, M: Series, C: DataFrame, n_samples=10000) DataFrame

Estimate the counterfactual.

Parameters:
  • node – The target node (e.g. dependent variable, y).

  • factual – The factual evidence. All variables must be observed!

  • counterfactual – The counterfactual evidence. Must be parents of the target node.

  • d – DAG.

  • M – Means.

  • C – Covariances.

  • n_samples – Number of samples.

Returns:

Counterfactual results.

pyscm.counterfactual.get_scm(node: Any, d: DiGraph, M: Series, C: DataFrame, n_samples=10000) LinearRegression

Get the Structural Causal Model (SCM).

Parameters:
  • node – Target node/variable.

  • d – DAG.

  • M – Means.

  • C – Covariances.

  • n_samples – Number of samples.

Returns:

SCM.
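
The return type suggests the structural equation is recovered by linear regression. A sketch of that idea (an assumption about the approach, not the library's code): regress a node on its parents using data sampled from the joint distribution.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(37)
x = rng.normal(size=10_000)
z = 2.0 * x + rng.normal(size=10_000)       # true SCM: z = 2x + noise

# Fit the local linear structural equation z = f(x).
model = LinearRegression().fit(x.reshape(-1, 1), z)
print(round(model.coef_[0], 1))
```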

Sampling

Sampling logic is contained in this module.

pyscm.sampling.sample(M: Series, C: DataFrame, size=1000) DataFrame

Generates samples from a multivariate normal distribution.

Parameters:
  • M – Means.

  • C – Covariances.

  • size – Number of samples.

Returns:

Samples.
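
A plausible sketch of sample() (assumed behavior, not the library's code): draw from the multivariate normal defined by M and C, keeping the variable names as DataFrame columns.

```python
import numpy as np
import pandas as pd

def sample(M, C, size=1000):
    rng = np.random.default_rng(37)
    data = rng.multivariate_normal(M.values, C.values, size=size)
    return pd.DataFrame(data, columns=M.index)

M = pd.Series([0.0, 5.0], index=["x", "y"])
C = pd.DataFrame([[1.0, 0.0], [0.0, 1.0]],
                 index=["x", "y"], columns=["x", "y"])
df = sample(M, C, size=2000)
print(df.shape)  # → (2000, 2)
```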

Serde

Serialization and deserialization is contained in this module.

pyscm.serde.dict_to_graph(d: Dict[str, Any]) DiGraph

Convert a dictionary to a graph.

Parameters:

d – Dictionary.

Returns:

nx.DiGraph.

pyscm.serde.dict_to_model(data: Dict[str, Any]) ReasoningModel

Convert dictionary to model.

Parameters:

data – Dictionary.

Returns:

ReasoningModel.

pyscm.serde.graph_to_dict(g: Graph | DiGraph) Dict[str, Any]

Convert graph to dictionary.

Parameters:

g – nx.Graph or nx.DiGraph.

Returns:

Dictionary.
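
The exact dictionary schema used by pyscm.serde is not documented here; as a stand-in, networkx's node-link format illustrates the same graph-to-dict round trip:

```python
import networkx as nx

g = nx.DiGraph([("x", "y"), ("y", "z")])
d = nx.node_link_data(g)       # graph -> dict
g2 = nx.node_link_graph(d)     # dict -> graph
print(sorted(g2.edges()) == sorted(g.edges()))  # → True
```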

pyscm.serde.model_to_dict(model: ReasoningModel) Dict[str, Any]

Convert model to dictionary.

Parameters:

model – ReasoningModel.

Returns:

Dictionary.

Learn

Learning algorithms are contained in this module.

class pyscm.learn.Pc(t=0.05, alpha=0.1)

Bases: object

Learns a Bayesian Belief Network using the PC-algorithm.

__init__(t=0.05, alpha=0.1)

Constructor.

Parameters:
  • t – Marginal independence threshold. Values below this threshold are considered independent.

  • alpha – Conditional independence threshold. Values below this threshold are considered conditionally independent.

fit(X: DataFrame)

Learns the DAG.

Parameters:

X – Data.

Returns:

Pc.

class pyscm.learn.Tree(alpha=0.1)

Bases: object

Learns a Bayesian Belief Network with a tree structure.

__init__(alpha=0.1)

Constructor.

Parameters:

alpha – Conditional independence threshold. Values below this threshold are considered conditionally independent.

fit(X: DataFrame)

Learns the DAG.

Parameters:

X – Data.

Returns:

Tree.

pyscm.learn.compute_indep(df: DataFrame, t=0.05, alpha=0.1) DataFrame

Computes pairwise conditional independence tests.

Parameters:
  • df – Data.

  • t – Threshold for marginal independence.

  • alpha – Alpha value for conditional independence.

Returns:

Independence test results with columns: x, y, is_indep, p.

pyscm.learn.get_local_models(Xy: DataFrame) DataFrame

Learns all local models.

Parameters:

Xy – Data.

Returns:

DataFrame with the results of the local models, used as hints for edge orientation.

pyscm.learn.identify_v_structures(d: DiGraph) List[Tuple[str, str, str]]

Identifies all v-structures (colliders) in the DAG.

Parameters:

d – DAG.

Returns:

List of collider configurations.
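
V-structure detection can be sketched as follows (assumed logic, not the library's code): a collider x → z ← y exists when z has two parents x, y that are not adjacent to each other.

```python
import networkx as nx
from itertools import combinations

def identify_v_structures(d):
    colliders = []
    for z in d.nodes():
        for x, y in combinations(sorted(d.predecessors(z)), 2):
            if not d.has_edge(x, y) and not d.has_edge(y, x):
                colliders.append((x, z, y))
    return colliders

d = nx.DiGraph([("x", "z"), ("y", "z"), ("x", "w")])
print(identify_v_structures(d))  # → [('x', 'z', 'y')]
```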

pyscm.learn.learn_local_model(Xy: DataFrame, y_col: str) Dict[str, Any]

Learn a local model from the data with respect to the y variable.

Parameters:
  • Xy – Data.

  • y_col – y variable.

Returns:

Dictionary of results of local model.

pyscm.learn.learn_skeleton(df: DataFrame, t=0.05, alpha=0.1) Graph

Learns the skeleton of the graph.

Parameters:
  • df – Data.

  • t – Threshold for marginal independence.

  • alpha – Alpha value for conditional independence.

Returns:

Undirected graph (skeleton).

pyscm.learn.learn_tree_structure(df: DataFrame) Graph

Learn a tree structure.

Parameters:

df – Data.

Returns:

Tree.
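
One classic way to learn a tree structure is a Chow-Liu-style maximum spanning tree over pairwise dependence scores. This sketch uses absolute correlations as the edge weights (an assumption, not necessarily what learn_tree_structure() does):

```python
import numpy as np
import pandas as pd
import networkx as nx

# Simulate a chain x -> y -> z, so the true skeleton is x - y - z.
rng = np.random.default_rng(37)
x = rng.normal(size=5000)
y = x + 0.1 * rng.normal(size=5000)
z = y + 0.1 * rng.normal(size=5000)
df = pd.DataFrame({"x": x, "y": y, "z": z})

# Build a complete graph weighted by |correlation|, then take the MST.
corr = df.corr().abs()
g = nx.Graph()
for i, a in enumerate(corr.columns):
    for b in corr.columns[i + 1:]:
        g.add_edge(a, b, weight=corr.loc[a, b])
tree = nx.maximum_spanning_tree(g)
print(sorted(tuple(sorted(e)) for e in tree.edges()))
```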

pyscm.learn.learn_v_structures(u: Graph, df: DataFrame) DiGraph

Learns the v-structures (colliders) of the graph.

Parameters:
  • u – Undirected graph (skeleton).

  • df – Data.

Returns:

Directed acyclic graph (DAG).

pyscm.learn.orient_by_inference(u: Graph, d: DiGraph, p: DataFrame) Tuple[Graph, DiGraph]

Orient edges by inference.

Parameters:
  • u – Undirected graph (skeleton).

  • d – DAG.

  • p – Local model edge hints.

Returns:

Tuple of undirected graph and DAG.

pyscm.learn.orient_by_models(u: Graph, d: DiGraph, p: DataFrame) Tuple[Graph, DiGraph]

Orient edges by using hints based on local models.

Parameters:
  • u – Undirected graph (skeleton).

  • d – DAG.

  • p – Local model edge hints.

Returns:

Tuple of undirected graph and DAG.