py-scm

Common

These are the common type definitions and interfaces used throughout the package.

class pyscm.common.IReasoningModel

Bases: ABC

Abstract interface implemented by reasoning models.

abstract iquery(Y: str, X: Dict[str, float], samples=1000, pandas: bool = True) pd.Series | np.ndarray

Conduct an interventional query.

Parameters:
  • Y – Target node.

  • X – Intervention assignments, e.g. {"X": 1.0}.

  • samples – Reserved for backward compatibility.

  • pandas – When True return a Series; otherwise return a NumPy array containing [mean, std].

Returns:

The post-intervention mean and standard deviation of Y.

abstract samples(size=1000) pd.DataFrame

Sample observations from the model marginals.

Parameters:

size – Number of samples.

Returns:

Samples drawn from the model.

class pyscm.common.Parameters(M: pd.Series, C: pd.DataFrame)

Bases: object

Mean vector and covariance matrix for a Gaussian model.

Parameters:
  • M – Mean values indexed by variable name.

  • C – Covariance matrix indexed by variable name.

C: pd.DataFrame
M: pd.Series
__init__(M: pd.Series, C: pd.DataFrame) None
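As a sketch of the pandas objects Parameters expects, the following builds a mean vector and covariance matrix for a hypothetical two-variable model (the variable names and values are made up for illustration):

```python
import pandas as pd

# Hypothetical two-variable Gaussian model over X and Y.
variables = ["X", "Y"]
M = pd.Series([0.0, 0.0], index=variables)   # mean values by variable name
C = pd.DataFrame(                            # covariance by variable name
    [[1.0, 2.0],
     [2.0, 5.0]],
    index=variables,
    columns=variables,
)

# A covariance matrix should be symmetric and share the index of M.
assert C.equals(C.T)
assert list(M.index) == list(C.index) == list(C.columns)
```

These objects can then be passed as `Parameters(M=M, C=C)`.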

Reasoning

The reasoning engine logic is contained in this module.

class pyscm.reasoning.ReasoningModel(d: Any, M: pd.Series | np.ndarray | List[float], C: pd.DataFrame | np.ndarray | List[List[float]], variables: None | list[str] | tuple[str, ...] = None)

Bases: IReasoningModel

Exact reasoning model for linear-Gaussian structural causal queries.

property C: Any

Return the model covariance matrix.

Returns:

Covariance matrix indexed by variable name.

property M: Any

Return the model mean vector.

Returns:

Mean vector indexed by variable name.

__init__(d: Any, M: pd.Series | np.ndarray | List[float], C: pd.DataFrame | np.ndarray | List[List[float]], variables: None | list[str] | tuple[str, ...] = None)

Initialize a reasoning model from a DAG and Gaussian moments.

Parameters:
  • d – Directed acyclic graph.

  • M – Mean vector as a pandas Series, NumPy array, or Python list.

  • C – Covariance matrix as a pandas DataFrame, NumPy array, or nested list.

  • variables – Variable ordering to use when M and C are not pandas objects.

Returns:

None.

cquery(node: Any, factual: Dict[Any, float], counterfactual: List[Dict[Any, float]], n_samples=10000, pandas: bool = True) Any | tuple[tuple[str, ...], ndarray]

Conduct an exact counterfactual query for a linear-Gaussian model.

Parameters:
  • node – Target node.

  • factual – Factual evidence. All variables in the model must be observed.

  • counterfactual – One or more intervention assignments for hypothetical worlds.

  • n_samples – Reserved for backward compatibility.

  • pandas – When True return a DataFrame; otherwise return a tuple of (columns, values) where values is a NumPy array.

Returns:

Counterfactual results for the requested hypothetical worlds.
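The exact procedure behind a counterfactual query is abduction-action-prediction. A hand-worked sketch for a hypothetical one-edge model Y = 2·X + e (the structural equation is illustrative, not taken from the package):

```python
# Factual world: X = 1.0 and Y = 3.0 were observed.
# Abduction: recover the noise consistent with the evidence.
e = 3.0 - 2.0 * 1.0          # e = 1.0

# Action: intervene do(X = 0.0). Prediction: replay the mechanism
# with the abducted noise held fixed.
Y_cf = 2.0 * 0.0 + e

print(Y_cf)  # 1.0
```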

cquery_raw(node: Any, factual: Dict[Any, float], counterfactual: List[Dict[Any, float]], n_samples=10000) tuple[tuple[str, ...], ndarray]

Return raw NumPy counterfactual results as (columns, values).

Parameters:
  • node – Target node.

  • factual – Factual assignment for every variable in the model.

  • counterfactual – Hypothetical intervention assignments.

  • n_samples – Reserved for backward compatibility.

Returns:

Column names and counterfactual values as a NumPy array.

equery(Y: str, X1: Dict[str, float], X2: Dict[str, float], samples=1000, pandas: bool = True) Any | ndarray

Conducts an Average Causal Effect (ACE) query: E[Y | do(X=x1)] - E[Y | do(X=x2)].

Parameters:
  • Y – Target node.

  • X1 – First intervention assignments (X=x1).

  • X2 – Second intervention assignments (X=x2).

  • samples – Reserved for backward compatibility.

  • pandas – When True return a Series; otherwise return a NumPy array containing the difference in [mean, std] moments.

Returns:

The average causal effect E[Y | do(X=x1)] - E[Y | do(X=x2)].
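For a linear-Gaussian model the ACE reduces to a slope times the difference in intervention levels. A sketch with a hypothetical total causal coefficient b = 2.0 on the X → Y path:

```python
# ACE = E[Y | do(X=x1)] - E[Y | do(X=x2)] = b * (x1 - x2) when the
# total causal effect of X on Y is the linear coefficient b.
b = 2.0
x1, x2 = 1.0, 0.0
ace = b * (x1 - x2)

print(ace)  # 2.0
```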

iquery(Y: str, X: Dict[str, float], samples=1000, pandas: bool = True) Any | ndarray

Conduct an exact interventional query for a linear-Gaussian model.

Parameters:
  • Y – Target node.

  • X – Intervention assignments, e.g. {"X": 1.0}.

  • samples – Reserved for backward compatibility.

  • pandas – When True return a Series; otherwise return a NumPy array containing [mean, std].

Returns:

Post-intervention moments for Y.
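A Monte Carlo sanity check of the kind of exact answer iquery computes, for a hypothetical chain Z → X → Y with X = Z + eX and Y = 2·X + eY (all structural values are made up):

```python
import numpy as np

rng = np.random.default_rng(37)

# Under do(X = 1.5), X ignores Z, so E[Y | do(X=1.5)] = 2 * 1.5 = 3.0
# and sd(Y | do(X=1.5)) = sd(eY) = 1.0. Simulate the mutilated model:
n = 200_000
eY = rng.normal(0.0, 1.0, size=n)
Y_do = 2.0 * 1.5 + eY

print(round(Y_do.mean(), 1))  # 3.0
print(round(Y_do.std(), 1))   # 1.0
```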

iquery_raw(Y: str, X: Dict[str, float], samples=1000) ndarray

Return raw NumPy interventional results as [mean, std].

Parameters:
  • Y – Target node.

  • X – Intervention assignments.

  • samples – Reserved for backward compatibility.

Returns:

NumPy array containing [mean, std] for Y.

pquery(observations: None | Dict[str, float] = None, pandas: bool = True) Tuple[Any, Any] | tuple[ndarray, ndarray]

Performs associational/probabilistic inference.

Denote the following.

  • \(z\) as the set of observed variables

  • \(y\) as the set of other variables

  • \(\mu\) as the vector of means
    • \(\mu_z\) as the partition of \(\mu\) of length \(|z|\)

    • \(\mu_y\) as the partition of \(\mu\) of length \(|y|\)

  • \(\Sigma\) as the covariance matrix
    • \(\Sigma_{yz}\) as the partition of \(\Sigma\) with \(|y|\) rows and \(|z|\) columns

    • \(\Sigma_{zz}\) as the partition of \(\Sigma\) with \(|z|\) rows and \(|z|\) columns

    • \(\Sigma_{yy}\) as the partition of \(\Sigma\) with \(|y|\) rows and \(|y|\) columns

If we observe evidence \(z_e\), then the conditional means \(\mu_y^{*}\) and covariance matrix \(\Sigma_y^{*}\) of \(y\) are computed as follows.

  • \(\mu_y^{*} = \mu_y + \Sigma_{yz} \Sigma_{zz}^{-1} (z_e - \mu_z)\)

  • \(\Sigma_y^{*} = \Sigma_{yy} - \Sigma_{yz} \Sigma_{zz}^{-1} \Sigma_{yz}^{T}\)

Parameters:
  • observations – Observations.

  • pandas – When True return pandas objects; otherwise return NumPy arrays.

Returns:

Tuple of means and covariance matrix.
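pquery implements standard Gaussian conditioning. A self-contained NumPy sketch of that update with one observed and one query variable (the numbers are arbitrary):

```python
import numpy as np

# Partitioned moments for (y, z).
mu_y, mu_z = np.array([0.0]), np.array([0.0])
S_yy = np.array([[5.0]])
S_yz = np.array([[2.0]])
S_zz = np.array([[1.0]])

z_e = np.array([1.5])  # evidence z = z_e

# Conditional mean and covariance of y given z = z_e.
S_zz_inv = np.linalg.inv(S_zz)
mu_y_star = mu_y + S_yz @ S_zz_inv @ (z_e - mu_z)
S_y_star = S_yy - S_yz @ S_zz_inv @ S_yz.T

print(mu_y_star)  # [3.]
print(S_y_star)   # [[1.]]
```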

pquery_raw(observations: None | Dict[str, float] = None) tuple[ndarray, ndarray]

Return raw NumPy associational results.

Parameters:

observations – Observed variable assignments.

Returns:

Mean vector and covariance matrix as NumPy arrays.

samples(size=1000) Any

Samples data from the marginals.

Parameters:

size – Number of samples.

Returns:

Samples drawn from the model marginals.

pyscm.reasoning.create_reasoning_model(d: Any, p: Parameters | Dict[str, Any]) ReasoningModel

Create a reasoning model.

Parameters:
  • d – DAG.

  • p – Parameters.

Returns:

ReasoningModel.

Associational Query

Associational query logic is contained in this module.

pyscm.associational.condition(X: List[int], Y: List[int], y: ndarray, m: ndarray, S: ndarray) Tuple[ndarray, ndarray]

Conditions X on Y; Y is the conditioning set (e.g. P(X | Y=y)).

Parameters:
  • X – Indices of the query variables.

  • Y – Indices of the conditioning variables.

  • y – The evidence, i.e. Y=y.

  • m – Means.

  • S – Covariances.

Returns:

Tuple of updated means and covariances.

pyscm.associational.make_sym_pos_semidef(S: ndarray) ndarray

Make a covariance matrix symmetric positive semidefinite.

Parameters:

S – Covariances.

Returns:

Symmetric positive semidefinite covariances.
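One common way to repair a covariance matrix that is not quite positive semidefinite is to symmetrize it and clip negative eigenvalues to zero. This is a sketch of that idea, not necessarily pyscm's exact method:

```python
import numpy as np

def clip_to_psd(S: np.ndarray) -> np.ndarray:
    """Symmetrize S and project its eigenvalues onto [0, inf)."""
    S = (S + S.T) / 2.0
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T

S = np.array([[1.0, 0.9],
              [0.9, 0.5]])  # det < 0, so not positive semidefinite
S_fixed = clip_to_psd(S)

assert np.all(np.linalg.eigvalsh(S_fixed) >= -1e-12)
```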

Interventional Query

Interventional query logic is contained in this module.

pyscm.interventional.do_op(X: List[Any], X_val: List[float] | ndarray, d: DiGraph, M: Series, C: DataFrame) Tuple[DiGraph, Series, DataFrame]

Apply the do-operator for an interventional query. All parents of each x in X are removed.

Parameters:
  • X – Nodes.

  • X_val – Values corresponding to nodes.

  • d – DAG.

  • M – Means.

  • C – Covariances.

Returns:

Tuple of DAG, means, and covariances.
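The graph surgery behind do_op can be sketched with a plain parent map: every intervened node loses its parents, while downstream structure is untouched (the three-node graph here is hypothetical):

```python
# Parent map for Z -> X, {X, Z} -> Y.
parents = {"Z": [], "X": ["Z"], "Y": ["X", "Z"]}

def sever_parents(parents, intervened):
    """Return a mutilated parent map with intervened nodes made exogenous."""
    return {n: ([] if n in intervened else list(ps))
            for n, ps in parents.items()}

mutilated = sever_parents(parents, {"X"})

print(mutilated["X"])  # []
print(mutilated["Y"])  # ['X', 'Z']
```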

pyscm.interventional.get_causal_effect(Y: int, X: List[int], Z: List[int], x: Dict[int, float], m: ndarray, S: ndarray, samples=1000) Series

Estimate an interventional effect exactly for a linear-Gaussian model.

This helper preserves the historical surface of the package: Y, X and Z are positional index references into the joint Gaussian defined by m and S.

Parameters:
  • Y – Target variable index.

  • X – Intervention variable indices, e.g. do(X=x).

  • Z – Adjustment-variable indices.

  • x – Intervention assignments by index.

  • m – Mean vector.

  • S – Covariance matrix.

  • samples – Reserved for backward compatibility.

Returns:

A series containing the post-intervention mean and standard deviation of Y.

pyscm.interventional.remove_parents(children: List[Any], d: DiGraph, M: Series, C: DataFrame) Tuple[DiGraph, Series, DataFrame]

Creates a new model by removing all parents of each child node.

Parameters:
  • children – Child nodes whose incoming edges are removed.

  • d – DAG.

  • M – Means.

  • C – Covariances.

Returns:

Tuple of new DAG, means and covariances.

Counterfactual Query

Counterfactual query logic is contained in this module.

pyscm.counterfactual.do_counterfactual(node: Any, factual: Dict[Any, float], counterfactual: List[Dict[Any, float]], d: DiGraph, M: Series, C: DataFrame, n_samples=10000) DataFrame

Estimate counterfactual target values by exact abduction-action-prediction.

The factual assignment must contain every variable in the model. Each counterfactual intervention dictionary may set one or more variables. The returned frame preserves the historical surface of the package by including the target’s parent assignments used in each hypothetical world together with factual and counterfactual columns for the target node.

Parameters:
  • node – Target node whose counterfactual response is requested.

  • factual – Factual assignment for every variable in the model.

  • counterfactual – Hypothetical intervention assignments to evaluate.

  • d – Directed acyclic graph defining the structural model.

  • M – Mean vector for the observational distribution.

  • C – Covariance matrix for the observational distribution.

  • n_samples – Reserved for backward compatibility.

Returns:

Parent assignments together with factual and counterfactual target values.

Sampling

Sampling logic is contained in this module.

pyscm.sampling.sample(M: pd.Series, C: pd.DataFrame, size=1000) pd.DataFrame

Generate samples from a multivariate normal distribution.

Parameters:
  • M – Mean vector indexed by variable name.

  • C – Covariance matrix indexed by variable name.

  • size – Number of samples.

Returns:

Sampled observations as a DataFrame.
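A sketch of how such sampling can be done with NumPy and wrapped in a DataFrame (the moments are made up; this is not the package's internal code):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(37)

M = pd.Series([0.0, 1.0], index=["X", "Y"])
C = pd.DataFrame([[1.0, 0.5],
                  [0.5, 2.0]], index=M.index, columns=M.index)

# Draw from the multivariate normal and keep the variable names.
draws = rng.multivariate_normal(M.to_numpy(), C.to_numpy(), size=1000)
df = pd.DataFrame(draws, columns=M.index)

assert df.shape == (1000, 2)
```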

Serde

Serialization and deserialization is contained in this module.

pyscm.serde.dict_to_graph(d: Dict[str, Any]) Any

Convert a serialized graph dictionary into a directed graph.

Parameters:

d – Graph payload with nodes and edges entries.

Returns:

Directed graph built from d.
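The docstring above specifies a payload with nodes and edges entries. A minimal sketch of such a dictionary and a JSON round trip (the exact encoding of each edge entry is an assumption):

```python
import json

# Hypothetical payload; the precise shape of each edge entry may differ.
payload = {"nodes": ["X", "Y"], "edges": [["X", "Y"]]}

roundtrip = json.loads(json.dumps(payload))
assert roundtrip == payload
```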

pyscm.serde.dict_to_model(data: Dict[str, Any]) ReasoningModel

Convert a serialized dictionary into a reasoning model.

Parameters:

data – Serialized reasoning-model payload.

Returns:

Reconstructed reasoning model.

pyscm.serde.graph_to_dict(g: Any) Dict[str, Any]

Convert a graph into a JSON-serializable dictionary.

Parameters:

g – Graph to serialize.

Returns:

Dictionary with nodes and edges entries.

pyscm.serde.model_to_dict(model: ReasoningModel) Dict[str, Any]

Convert a reasoning model into a JSON-serializable dictionary.

Parameters:

model – Reasoning model to serialize.

Returns:

Dictionary representation of model.

Generator

Random continuous model generation is contained in this module.

class pyscm.generator.GeneratedGaussianBbn(graph: dict[str, Any], structural: dict[str, StructuralGaussianNode], parameters: dict[str, Any])

Bases: object

Generated Gaussian BBN bundle.

Parameters:
  • graph – Named DAG payload.

  • structural – Structural parameters by node name.

  • parameters – Py-scm style moment payload.

__init__(graph: dict[str, Any], structural: dict[str, StructuralGaussianNode], parameters: dict[str, Any]) None
graph: dict[str, Any]
parameters: dict[str, Any]
structural: dict[str, StructuralGaussianNode]
class pyscm.generator.StructuralGaussianNode(intercept: float, parents: dict[str, float], sd: float)

Bases: object

Structural parameters for one Gaussian node.

Parameters:
  • intercept – Structural intercept.

  • parents – Parent coefficients by parent name.

  • sd – Residual noise standard deviation.

__init__(intercept: float, parents: dict[str, float], sd: float) None
intercept: float
parents: dict[str, float]
sd: float
pyscm.generator.generate_multi_connected_structure(n: int, max_iter: int = 10, seed: int = 37, max_degree: int | None = None) DiGraph

Generate a multi-connected DAG using the shared Darkstar structure algorithm.

Parameters:
  • n – Number of nodes.

  • max_iter – Rewiring iteration count.

  • seed – Deterministic seed.

  • max_degree – Optional undirected degree cap.

Returns:

Integer DAG.

pyscm.generator.generate_multi_gaussian_bbn(n: int, max_iter: int = 10, seed: int = 37, max_degree: int | None = None, intercept_mean: float = 0.0, intercept_sd: float = 0.4, weight_mean: float = 0.0, weight_sd: float = 0.35, noise_sd_min: float = 0.35, noise_sd_max: float = 0.9, max_condition_number: float | None = None, retry_limit: int = 64) GeneratedGaussianBbn

Generate a multi-connected Gaussian BBN.

Parameters:
  • n – Number of nodes.

  • max_iter – Rewiring iteration count.

  • seed – Deterministic seed.

  • max_degree – Optional undirected degree cap.

  • intercept_mean – Mean of the structural intercept draw.

  • intercept_sd – Standard deviation of the structural intercept draw.

  • weight_mean – Mean of the parent-weight draw.

  • weight_sd – Standard deviation of the parent-weight draw.

  • noise_sd_min – Minimum residual standard deviation.

  • noise_sd_max – Maximum residual standard deviation.

  • max_condition_number – Optional covariance condition-number cap.

  • retry_limit – Maximum regeneration attempts when the condition cap rejects a sample.

Returns:

Named graph, structural parameters, and derived moment payload.

pyscm.generator.generate_singly_connected_structure(n: int, max_iter: int = 10, seed: int = 37, max_degree: int | None = None) DiGraph

Generate a singly-connected DAG using the shared Darkstar structure algorithm.

Parameters:
  • n – Number of nodes.

  • max_iter – Rewiring iteration count.

  • seed – Deterministic seed.

  • max_degree – Optional undirected degree cap.

Returns:

Integer DAG.

pyscm.generator.generate_singly_gaussian_bbn(n: int, max_iter: int = 10, seed: int = 37, max_degree: int | None = None, intercept_mean: float = 0.0, intercept_sd: float = 0.4, weight_mean: float = 0.0, weight_sd: float = 0.35, noise_sd_min: float = 0.35, noise_sd_max: float = 0.9, max_condition_number: float | None = None, retry_limit: int = 64) GeneratedGaussianBbn

Generate a singly-connected Gaussian BBN.

Parameters:
  • n – Number of nodes.

  • max_iter – Rewiring iteration count.

  • seed – Deterministic seed.

  • max_degree – Optional undirected degree cap.

  • intercept_mean – Mean of the structural intercept draw.

  • intercept_sd – Standard deviation of the structural intercept draw.

  • weight_mean – Mean of the parent-weight draw.

  • weight_sd – Standard deviation of the parent-weight draw.

  • noise_sd_min – Minimum residual standard deviation.

  • noise_sd_max – Maximum residual standard deviation.

  • max_condition_number – Optional covariance condition-number cap.

  • retry_limit – Maximum regeneration attempts when the condition cap rejects a sample.

Returns:

Named graph, structural parameters, and derived moment payload.

pyscm.generator.get_simple_ordered_tree(n: int) DiGraph

Return the simple ordered tree 0 -> 1 -> ... -> n-1.

Parameters:

n – Number of nodes.

Returns:

Integer DAG.

pyscm.generator.structural_to_parameter_payload(graph: dict[str, Any], structural: dict[str, StructuralGaussianNode]) dict[str, Any]

Convert structural Gaussian parameters into the py-scm v/m/S payload.

Parameters:
  • graph – Named graph payload.

  • structural – Structural parameters by node name.

Returns:

Mean/covariance payload.
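The conversion rests on the standard linear-Gaussian identity: with weight matrix \(B\) (entry \(B_{ij}\) the coefficient of parent \(i\) on child \(j\)), intercepts \(b_0\), and residual variances \(D\), the implied moments are \(m = (I - B^{T})^{-1} b_0\) and \(S = (I - B^{T})^{-1} D (I - B^{T})^{-T}\). A NumPy sketch for a hypothetical two-node chain:

```python
import numpy as np

# Chain X0 -> X1 with weight 2.0, intercepts [0, 1], and residual
# variances [1.0, 0.25]. (All values are made up.)
B = np.array([[0.0, 2.0],
              [0.0, 0.0]])
b0 = np.array([0.0, 1.0])
D = np.diag([1.0, 0.25])

A = np.linalg.inv(np.eye(2) - B.T)
m = A @ b0            # implied mean vector
S = A @ D @ A.T       # implied covariance matrix

print(m)        # [0. 1.]
print(S[1, 1])  # var(X1) = 2*2*1.0 + 0.25 = 4.25
```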

pyscm.generator.to_string_graph(graph: DiGraph) dict[str, Any]

Rename the nodes of an integer graph as X0, X1, …

Parameters:

graph – Integer DAG.

Returns:

Named graph payload.

Learn

Learning algorithms are contained in this module.

class pyscm.learn.Pc(t=0.05, alpha=0.1)

Bases: object

Learns a Bayesian Belief Network using the PC-algorithm.

__init__(t=0.05, alpha=0.1)

Initialize the PC learner.

Parameters:
  • t – Marginal independence threshold. Values below this threshold are treated as independent.

  • alpha – Conditional independence threshold. Values below this threshold are treated as conditionally independent.

Returns:

None.

fit(X: DataFrame)

Learn a directed acyclic graph from the data.

Parameters:

X – Input data frame with one column per variable.

Returns:

Pc.
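The skeleton phase of the PC-algorithm relies on marginal and conditional independence tests. A sketch of the idea behind one such test, partial correlation via residuals (illustrative only; pyscm's internal test statistics may differ):

```python
import numpy as np

rng = np.random.default_rng(37)

# X and Y are both driven by Z, so they are marginally dependent but
# conditionally independent given Z.
n = 5000
Z = rng.normal(size=n)
X = Z + 0.5 * rng.normal(size=n)
Y = Z + 0.5 * rng.normal(size=n)

def residualize(a, b):
    """Remove the best linear fit of b from a."""
    slope = np.cov(a, b)[0, 1] / np.var(b, ddof=1)
    return a - slope * b

r_marginal = np.corrcoef(X, Y)[0, 1]
r_partial = np.corrcoef(residualize(X, Z), residualize(Y, Z))[0, 1]

print(abs(r_marginal) > 0.5)   # True: dependent
print(abs(r_partial) < 0.1)    # True: independent given Z
```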

class pyscm.learn.Tree(alpha=0.1)

Bases: object

Learns a Bayesian Belief Network with a tree structure.

__init__(alpha=0.1)

Initialize the tree-structured learner.

Parameters:

alpha – Conditional independence threshold. Values below this threshold are treated as conditionally independent.

Returns:

None.

fit(X: DataFrame)

Learn a tree-structured directed acyclic graph from the data.

Parameters:

X – Input data frame with one column per variable.

Returns:

Tree.

pyscm.learn.compute_indep(df: DataFrame, t=0.05, alpha=0.1) DataFrame

Computes pairwise marginal and conditional independence tests.

Parameters:
  • df – Data.

  • t – Threshold for marginal independence.

  • alpha – Alpha value for conditional independence.

Returns:

Independence test results with columns: x, y, is_indep, p.

pyscm.learn.get_local_models(Xy: DataFrame) DataFrame

Learns all local models.

Parameters:

Xy – Data.

Returns:

DataFrame with local-model results, used as hints for edge orientation.

pyscm.learn.identify_v_structures(d: DiGraph) List[Tuple[str, str, str]]

Identifies all v-structures (colliders) in the DAG.

Parameters:

d – DAG.

Returns:

List of collider configurations.
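A v-structure is a triple x → z ← y where x and y are not adjacent. A sketch of the detection logic over a plain edge set (the four-node graph is hypothetical):

```python
edges = {("A", "C"), ("B", "C"), ("A", "D")}
nodes = {"A", "B", "C", "D"}

def v_structures(nodes, edges):
    """Find (x, z, y) triples with x -> z <- y and x, y non-adjacent."""
    adjacent = edges | {(b, a) for a, b in edges}
    found = []
    for z in sorted(nodes):
        parents = sorted(a for a, b in edges if b == z)
        for i, x in enumerate(parents):
            for y in parents[i + 1:]:
                if (x, y) not in adjacent:
                    found.append((x, z, y))
    return found

print(v_structures(nodes, edges))  # [('A', 'C', 'B')]
```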

pyscm.learn.learn_local_model(Xy: DataFrame, y_col: str) Dict[str, Any]

Learn a local model from the data with respect to the y variable.

Parameters:
  • Xy – Data.

  • y_col – y variable.

Returns:

Dictionary of results of local model.

pyscm.learn.learn_skeleton(df: DataFrame, t=0.05, alpha=0.1) Graph

Learns the skeleton of the graph.

Parameters:
  • df – Data.

  • t – Threshold for marginal independence.

  • alpha – Alpha value for conditional independence.

Returns:

Undirected graph (skeleton).

pyscm.learn.learn_tree_structure(df: DataFrame) Graph

Learn a tree structure.

Parameters:

df – Data.

Returns:

Tree.

pyscm.learn.learn_v_structures(u: Graph, df: DataFrame) DiGraph

Learns the v-structures (colliders) of the graph.

Parameters:
  • u – Undirected graph (skeleton).

  • df – Data.

Returns:

Directed acyclic graph (DAG).

pyscm.learn.orient_by_inference(u: Graph, d: DiGraph, p: DataFrame | _LocalModelHints) Tuple[Graph, DiGraph]

Orient edges by inference.

Parameters:
  • u – Undirected graph (skeleton).

  • d – DAG.

  • p – Local model edge hints.

Returns:

Tuple of undirected graph and DAG.

pyscm.learn.orient_by_models(u: Graph, d: DiGraph, p: DataFrame | _LocalModelHints) Tuple[Graph, DiGraph]

Orient edges by using hints based on local models.

Parameters:
  • u – Undirected graph (skeleton).

  • d – DAG.

  • p – Local model edge hints.

Returns:

Tuple of undirected graph and DAG.