py-scm
Common
Common type definitions and interfaces used throughout the package.
- class pyscm.common.IReasoningModel
Bases: ABC
Abstract interface implemented by reasoning models.
- abstract iquery(Y: str, X: Dict[str, float], samples=1000, pandas: bool = True) pd.Series | np.ndarray
Conduct an interventional query.
- Parameters:
Y – Target node.
X – Intervention assignments, e.g. {"X": 1.0}.
samples – Reserved for backward compatibility.
pandas – When True, return a Series; otherwise return a NumPy array containing [mean, std].
- Returns:
The post-intervention mean and standard deviation of Y.
- abstract samples(size=1000) pd.DataFrame
Sample observations from the model marginals.
- Parameters:
size – Number of samples.
- Returns:
Samples drawn from the model.
- class pyscm.common.Parameters(M: pd.Series, C: pd.DataFrame)
Bases: object
Mean vector and covariance matrix for a Gaussian model.
- Parameters:
M – Mean values indexed by variable name.
C – Covariance matrix indexed by variable name.
- C: pd.DataFrame
- M: pd.Series
- __init__(M: pd.Series, C: pd.DataFrame) None
Reasoning
The reasoning engine logic is contained in this module.
- class pyscm.reasoning.ReasoningModel(d: Any, M: pd.Series | np.ndarray | List[float], C: pd.DataFrame | np.ndarray | List[List[float]], variables: None | list[str] | tuple[str, ...] = None)
Bases: IReasoningModel
Exact reasoning model for linear-Gaussian structural causal queries.
- property C: Any
Return the model covariance matrix.
- Returns:
Covariance matrix indexed by variable name.
- property M: Any
Return the model mean vector.
- Returns:
Mean vector indexed by variable name.
- __init__(d: Any, M: pd.Series | np.ndarray | List[float], C: pd.DataFrame | np.ndarray | List[List[float]], variables: None | list[str] | tuple[str, ...] = None)
Initialize a reasoning model from a DAG and Gaussian moments.
- Parameters:
d – Directed acyclic graph.
M – Mean vector as a pandas Series, NumPy array, or Python list.
C – Covariance matrix as a pandas DataFrame, NumPy array, or nested list.
variables – Variable ordering to use when M and C are not pandas objects.
- Returns:
None.
- cquery(node: Any, factual: Dict[Any, float], counterfactual: List[Dict[Any, float]], n_samples=10000, pandas: bool = True) Any | tuple[tuple[str, ...], ndarray]
Conduct an exact counterfactual query for a linear-Gaussian model.
- Parameters:
node – Target node.
factual – Factual evidence. All variables in the model must be observed.
counterfactual – One or more intervention assignments for hypothetical worlds.
n_samples – Reserved for backward compatibility.
pandas – When True, return a DataFrame; otherwise return a tuple of (columns, values) where values is a NumPy array.
- Returns:
Counterfactual results for the requested hypothetical worlds.
- cquery_raw(node: Any, factual: Dict[Any, float], counterfactual: List[Dict[Any, float]], n_samples=10000) tuple[tuple[str, ...], ndarray]
Return raw NumPy counterfactual results as (columns, values).
- Parameters:
node – Target node.
factual – Factual assignment for every variable in the model.
counterfactual – Hypothetical intervention assignments.
n_samples – Reserved for backward compatibility.
- Returns:
Column names and counterfactual values as a NumPy array.
- equery(Y: str, X1: Dict[str, float], X2: Dict[str, float], samples=1000, pandas: bool = True) Any | ndarray
Conduct an Average Causal Effect (ACE) query: E[Y | do(X=x1)] - E[Y | do(X=x2)].
- Parameters:
Y – Target node.
X1 – First intervention assignment (X=x1).
X2 – Second intervention assignment (X=x2).
samples – Number of samples.
pandas – When True, return a Series; otherwise return a NumPy array containing the difference in [mean, std] moments.
- Returns:
ACE.
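For a linear model, the ACE above reduces to a difference of two post-intervention means. The following is a minimal sketch in plain Python, not the implementation used by `equery`. The `model` layout, the node names, and the helpers `mean_under_do` and `ace` are hypothetical and exist only for illustration.

```python
# Hypothetical linear structural model, listed in topological order.
# Each node maps to (intercept, {parent: coefficient}).
model = {
    "Z": (1.0, {}),
    "X": (2.0, {"Z": 0.5}),
    "Y": (1.0, {"X": 2.0, "Z": 1.0}),
}


def mean_under_do(model, target, do):
    """Return E[target | do(...)] by propagating means in topological order."""
    mean = {}
    for name, (intercept, parents) in model.items():
        if name in do:
            # An intervened node ignores its parents and takes the fixed value.
            mean[name] = do[name]
        else:
            mean[name] = intercept + sum(w * mean[p] for p, w in parents.items())
    return mean[target]


def ace(model, target, x1, x2):
    """E[target | do(x1)] - E[target | do(x2)]."""
    return mean_under_do(model, target, x1) - mean_under_do(model, target, x2)


effect = ace(model, "Y", {"X": 1.0}, {"X": 0.0})
print(effect)  # 2.0, the direct coefficient of X on Y
```

Because `Z` is not a descendant of `X`, its mean is the same in both worlds and cancels in the difference.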
- iquery(Y: str, X: Dict[str, float], samples=1000, pandas: bool = True) Any | ndarray
Conduct an exact interventional query for a linear-Gaussian model.
- Parameters:
Y – Target node.
X – Intervention assignments, e.g. {"X": 1.0}.
samples – Reserved for backward compatibility.
pandas – When True, return a Series; otherwise return a NumPy array containing [mean, std].
- Returns:
Post-intervention moments for Y.
- iquery_raw(Y: str, X: Dict[str, float], samples=1000) ndarray
Return raw NumPy interventional results as [mean, std].
- Parameters:
Y – Target node.
X – Intervention assignments.
samples – Reserved for backward compatibility.
- Returns:
NumPy array containing [mean, std] for Y.
- pquery(observations: None | Dict[str, float] = None, pandas: bool = True) Tuple[Any, Any] | tuple[ndarray, ndarray]
Performs associational/probabilistic inference.
Denote the following.
- \(z\) as the set of observed variables
- \(y\) as the set of remaining variables
- \(\mu\) as the vector of means
  - \(\mu_z\) as the partition of \(\mu\) of length \(|z|\)
  - \(\mu_y\) as the partition of \(\mu\) of length \(|y|\)
- \(\Sigma\) as the covariance matrix
  - \(\Sigma_{yz}\) as the partition of \(\Sigma\) with \(|y|\) rows and \(|z|\) columns
  - \(\Sigma_{zz}\) as the partition of \(\Sigma\) with \(|z|\) rows and \(|z|\) columns
  - \(\Sigma_{yy}\) as the partition of \(\Sigma\) with \(|y|\) rows and \(|y|\) columns
If we observe evidence \(z_e\), then the new means \(\mu_y^{*}\) and covariance matrix \(\Sigma_y^{*}\) corresponding to \(y\) are computed as follows.
\(\mu_y^{*} = \mu_y + \Sigma_{yz} \Sigma_{zz}^{-1} (z_e - \mu_z)\)
\(\Sigma_y^{*} = \Sigma_{yy} - \Sigma_{yz} \Sigma_{zz}^{-1} \Sigma_{yz}^{T}\)
- Parameters:
observations – Observations.
pandas – When True, return pandas objects; otherwise return NumPy arrays.
- Returns:
Tuple of means and covariance matrix.
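The conditioning step can be sketched in NumPy with the standard Gaussian conditioning formulas: mean `mu_y + S_yz S_zz^{-1} (z_e - mu_z)` and covariance `S_yy - S_yz S_zz^{-1} S_yz^T`. This is an illustrative sketch, not the body of `pquery`. The function name `condition_gaussian` and its positional-index arguments are assumptions.

```python
import numpy as np


def condition_gaussian(mu, sigma, z_idx, z_e):
    """Condition a joint Gaussian on evidence z_e for the variables at z_idx.

    Returns the mean vector and covariance matrix of the remaining variables.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    z_idx = list(z_idx)
    y_idx = [i for i in range(len(mu)) if i not in z_idx]

    s_yy = sigma[np.ix_(y_idx, y_idx)]
    s_yz = sigma[np.ix_(y_idx, z_idx)]
    s_zz = sigma[np.ix_(z_idx, z_idx)]

    # gain = S_yz S_zz^{-1}, computed with a linear solve instead of an
    # explicit inverse (S_zz is symmetric, so the transpose trick is valid).
    gain = np.linalg.solve(s_zz, s_yz.T).T

    mu_star = mu[y_idx] + gain @ (np.asarray(z_e, dtype=float) - mu[z_idx])
    sigma_star = s_yy - gain @ s_yz.T
    return mu_star, sigma_star


mu_star, sigma_star = condition_gaussian(
    mu=[0.0, 0.0],
    sigma=[[1.0, 0.5], [0.5, 1.0]],
    z_idx=[1],
    z_e=[2.0],
)
print(mu_star, sigma_star)  # mean 1.0, variance 0.75
```

Observing the second variable two units above its mean pulls the first variable's mean up by `0.5 * 2.0` and shrinks its variance from 1.0 to 0.75.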
- pquery_raw(observations: None | Dict[str, float] = None) tuple[ndarray, ndarray]
Return raw NumPy associational results.
- Parameters:
observations – Observed variable assignments.
- Returns:
Mean vector and covariance matrix as NumPy arrays.
- samples(size=1000) Any
Samples data from the marginals.
- Parameters:
size – Number of samples.
- Returns:
Samples.
- pyscm.reasoning.create_reasoning_model(d: Any, p: Parameters | Dict[str, Any]) ReasoningModel
Create a reasoning model.
- Parameters:
d – DAG.
p – Parameters.
- Returns:
ReasoningModel.
Associational Query
Associational query logic is contained in this module.
- pyscm.associational.condition(X: List[int], Y: List[int], y: ndarray, m: ndarray, S: ndarray) Tuple[ndarray, ndarray]
Conditions X on Y; Y is the conditioning set (e.g. P(X | Y=y)).
- Parameters:
X – Indices.
Y – Indices.
y – The evidence e.g. Y=y.
m – Means.
S – Covariances.
- Returns:
Tuple of updated means and covariances.
- pyscm.associational.make_sym_pos_semidef(S: ndarray) ndarray
Make a covariance matrix symmetric positive semidefinite.
- Parameters:
S – Covariances.
- Returns:
Symmetric positive semidefinite covariances.
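One common way to repair a covariance matrix is to symmetrize it and clip negative eigenvalues to zero. The sketch below shows that technique in NumPy. Whether `make_sym_pos_semidef` uses this method is not stated here, so treat the approach and the name `sym_psd_by_clipping` as assumptions.

```python
import numpy as np


def sym_psd_by_clipping(S):
    """Symmetrize S, then clip negative eigenvalues to zero."""
    S = np.asarray(S, dtype=float)
    sym = (S + S.T) / 2.0
    # eigh is the eigendecomposition for symmetric matrices.
    eigenvalues, eigenvectors = np.linalg.eigh(sym)
    eigenvalues = np.clip(eigenvalues, 0.0, None)
    rebuilt = (eigenvectors * eigenvalues) @ eigenvectors.T
    # Re-symmetrize to remove floating-point asymmetry from the product.
    return (rebuilt + rebuilt.T) / 2.0


# Asymmetric input whose symmetric part has a negative eigenvalue.
S = np.array([[1.0, 1.2], [1.1, 1.0]])
fixed = sym_psd_by_clipping(S)
print(fixed)
```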
Interventional Query
Interventional query logic is contained in this module.
- pyscm.interventional.do_op(X: List[Any], X_val: List[float] | ndarray, d: DiGraph, M: Series, C: DataFrame) Tuple[DiGraph, Series, DataFrame]
Apply the do-operator for an interventional query. The edges from all parents into each x in X are removed.
- Parameters:
X – Nodes.
X_val – Values corresponding to nodes.
d – DAG.
M – Means.
C – Covariances.
- Returns:
Tuple of DAG, means, and covariances.
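The effect of removing parents is easiest to see on the structural coefficients of a linear-Gaussian model. The sketch below is an assumption-laden illustration: it works from a hypothetical coefficient matrix `B`, intercepts `c`, and noise standard deviations `sd`, whereas `do_op` takes a DAG with means and covariances. The helper names `moments` and `do` are not part of the package.

```python
import numpy as np


def moments(B, c, sd):
    """Mean and covariance of x = B x + c + e, with e ~ N(0, diag(sd^2))."""
    n = len(c)
    A = np.linalg.inv(np.eye(n) - B)
    return A @ c, A @ np.diag(sd ** 2) @ A.T


def do(B, c, sd, assignments):
    """Cut incoming edges of each intervened node and fix its value."""
    B, c, sd = B.copy(), c.copy(), sd.copy()
    for idx, value in assignments.items():
        B[idx, :] = 0.0   # remove all parents of the node
        c[idx] = value    # the node now equals the assigned constant
        sd[idx] = 0.0     # and carries no noise
    return B, c, sd


# Hypothetical model in the order Z, X, Y; B[i, j] is the weight of j on i.
B = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],
              [1.0, 2.0, 0.0]])
c = np.array([1.0, 2.0, 1.0])
sd = np.array([1.0, 1.0, 0.5])

mean, cov = moments(*do(B, c, sd, {1: 3.0}))  # do(X = 3)
y_mean, y_std = mean[2], np.sqrt(cov[2, 2])
print(y_mean, y_std)
```

Under `do(X = 3)` the mean of `Y` is `1 + 2*3 + 1*E[Z] = 8`, and its remaining variance comes only from `Z` and the noise of `Y`.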
- pyscm.interventional.get_causal_effect(Y: int, X: List[int], Z: List[int], x: Dict[int, float], m: ndarray, S: ndarray, samples=1000) Series
Estimate an interventional effect exactly for a linear-Gaussian model.
This helper preserves the historical surface of the package: Y, X and Z are positional index references into the joint Gaussian defined by m and S.
- Parameters:
Y – Target variable index.
X – Intervention variable indices, e.g. do(X=x).
Z – Adjustment-variable indices.
x – Intervention assignments by index.
m – Mean vector.
S – Covariance matrix.
samples – Reserved for backward compatibility.
- Returns:
A series containing the post-intervention mean and standard deviation of Y.
- pyscm.interventional.remove_parents(children: List[Any], d: DiGraph, M: Series, C: DataFrame) Tuple[DiGraph, Series, DataFrame]
Create a new model with all parents of each child node removed.
- Parameters:
children – Children.
d – DAG.
M – Means.
C – Covariances.
- Returns:
Tuple of new DAG, means and covariances.
Counterfactual Query
Counterfactual query logic is contained in this module.
- pyscm.counterfactual.do_counterfactual(node: Any, factual: Dict[Any, float], counterfactual: List[Dict[Any, float]], d: DiGraph, M: Series, C: DataFrame, n_samples=10000) DataFrame
Estimate counterfactual target values by exact abduction-action-prediction.
The factual assignment must contain every variable in the model. Each counterfactual intervention dictionary may set one or more variables. The returned frame preserves the historical surface of the package by including the target's parent assignments used in each hypothetical world together with factual and counterfactual columns for the target node.
- Parameters:
node – Target node whose counterfactual response is requested.
factual – Factual assignment for every variable in the model.
counterfactual – Hypothetical intervention assignments to evaluate.
d – Directed acyclic graph defining the structural model.
M – Mean vector for the observational distribution.
C – Covariance matrix for the observational distribution.
n_samples – Reserved for backward compatibility.
- Returns:
Parent assignments together with factual and counterfactual target values.
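Abduction-action-prediction for a linear model with additive noise can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code behind `do_counterfactual`: the `model` layout and the function name `counterfactual_value` are hypothetical, and the model is described by structural coefficients instead of a mean vector and covariance matrix.

```python
# Hypothetical linear structural model, listed in topological order.
# Each node maps to (intercept, {parent: coefficient}).
model = {
    "Z": (1.0, {}),
    "X": (2.0, {"Z": 0.5}),
    "Y": (1.0, {"X": 2.0, "Z": 1.0}),
}


def counterfactual_value(model, target, factual, intervention):
    # Abduction: with every variable observed, each noise term is the
    # residual of its structural equation.
    noise = {
        name: factual[name] - (b + sum(w * factual[p] for p, w in parents.items()))
        for name, (b, parents) in model.items()
    }
    # Action and prediction: fix the intervened nodes, then recompute the
    # rest in topological order while reusing the abducted noise.
    world = {}
    for name, (b, parents) in model.items():
        if name in intervention:
            world[name] = intervention[name]
        else:
            world[name] = b + sum(w * world[p] for p, w in parents.items()) + noise[name]
    return world[target]


factual = {"Z": 1.0, "X": 2.0, "Y": 6.5}
y_cf = counterfactual_value(model, "Y", factual, {"X": 3.0})
print(y_cf)  # 8.5
```

Raising `X` from 2 to 3 moves `Y` by the coefficient 2.0, from the factual 6.5 to 8.5, because the abducted noise of `Y` (0.5) is carried into the hypothetical world unchanged.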
Sampling
Sampling logic is contained in this module.
- pyscm.sampling.sample(M: pd.Series, C: pd.DataFrame, size=1000) pd.DataFrame
Generate samples from a multivariate normal distribution.
- Parameters:
M – Mean vector indexed by variable name.
C – Covariance matrix indexed by variable name.
size – Number of samples.
- Returns:
Sampled observations as a DataFrame.
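Drawing from a multivariate normal distribution can be sketched with NumPy's random generator. The sketch returns a plain array, not a DataFrame, and the helper name `sample_mvn` and its `seed` argument are assumptions, not part of `pyscm.sampling`.

```python
import numpy as np


def sample_mvn(mean, cov, size=1000, seed=37):
    """Draw `size` rows from N(mean, cov) with a reproducible generator."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(
        np.asarray(mean, dtype=float),
        np.asarray(cov, dtype=float),
        size=size,
    )


draws = sample_mvn([0.0, 1.0], [[1.0, 0.5], [0.5, 2.0]], size=2000)
print(draws.shape)  # (2000, 2)
```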
Serde
Serialization and deserialization logic is contained in this module.
- pyscm.serde.dict_to_graph(d: Dict[str, Any]) Any
Convert a serialized graph dictionary into a directed graph.
- Parameters:
d – Graph payload with nodes and edges entries.
- Returns:
Directed graph built from d.
- pyscm.serde.dict_to_model(data: Dict[str, Any]) ReasoningModel
Convert a serialized dictionary into a reasoning model.
- Parameters:
data – Serialized reasoning-model payload.
- Returns:
Reconstructed reasoning model.
- pyscm.serde.graph_to_dict(g: Any) Dict[str, Any]
Convert a graph into a JSON-serializable dictionary.
- Parameters:
g – Graph to serialize.
- Returns:
Dictionary with nodes and edges entries.
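A graph payload with `nodes` and `edges` entries survives a JSON round trip when both entries are plain lists. The sketch below illustrates that idea only. The exact payload schema is not given here, so the list-of-pairs edge format and the helper names `edges_to_payload` and `payload_to_children` are assumptions.

```python
import json


def edges_to_payload(edges):
    """Build a JSON-serializable payload from (parent, child) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    return {"nodes": nodes, "edges": [list(edge) for edge in edges]}


def payload_to_children(payload):
    """Rebuild a parent -> children adjacency mapping from the payload."""
    children = {node: [] for node in payload["nodes"]}
    for parent, child in payload["edges"]:
        children[parent].append(child)
    return children


payload = edges_to_payload([("Z", "X"), ("X", "Y"), ("Z", "Y")])
restored = json.loads(json.dumps(payload))
children = payload_to_children(restored)
print(children)
```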
- pyscm.serde.model_to_dict(model: ReasoningModel) Dict[str, Any]
Convert a reasoning model into a JSON-serializable dictionary.
- Parameters:
model – Reasoning model to serialize.
- Returns:
Dictionary representation of model.
Generator
Random continuous model generation is contained in this module.
- class pyscm.generator.GeneratedGaussianBbn(graph: dict[str, Any], structural: dict[str, StructuralGaussianNode], parameters: dict[str, Any])
Bases: object
Generated Gaussian BBN bundle.
- Parameters:
graph – Named DAG payload.
structural – Structural parameters by node name.
parameters – Py-scm style moment payload.
- __init__(graph: dict[str, Any], structural: dict[str, StructuralGaussianNode], parameters: dict[str, Any]) None
- graph: dict[str, Any]
- parameters: dict[str, Any]
- structural: dict[str, StructuralGaussianNode]
- class pyscm.generator.StructuralGaussianNode(intercept: float, parents: dict[str, float], sd: float)
Bases: object
Structural parameters for one Gaussian node.
- Parameters:
intercept – Structural intercept.
parents – Parent coefficients by parent name.
sd – Residual noise standard deviation.
- __init__(intercept: float, parents: dict[str, float], sd: float) None
- intercept: float
- parents: dict[str, float]
- sd: float
- pyscm.generator.generate_multi_connected_structure(n: int, max_iter: int = 10, seed: int = 37, max_degree: int | None = None) DiGraph
Generate a multi-connected DAG using the shared Darkstar structure algorithm.
- Parameters:
n – Number of nodes.
max_iter – Rewiring iteration count.
seed – Deterministic seed.
max_degree – Optional undirected degree cap.
- Returns:
Integer DAG.
- pyscm.generator.generate_multi_gaussian_bbn(n: int, max_iter: int = 10, seed: int = 37, max_degree: int | None = None, intercept_mean: float = 0.0, intercept_sd: float = 0.4, weight_mean: float = 0.0, weight_sd: float = 0.35, noise_sd_min: float = 0.35, noise_sd_max: float = 0.9, max_condition_number: float | None = None, retry_limit: int = 64) GeneratedGaussianBbn
Generate a multi-connected Gaussian BBN.
- Parameters:
n – Number of nodes.
max_iter – Rewiring iteration count.
seed – Deterministic seed.
max_degree – Optional undirected degree cap.
intercept_mean – Mean of the structural intercept draw.
intercept_sd – Standard deviation of the structural intercept draw.
weight_mean – Mean of the parent-weight draw.
weight_sd – Standard deviation of the parent-weight draw.
noise_sd_min – Minimum residual standard deviation.
noise_sd_max – Maximum residual standard deviation.
max_condition_number – Optional covariance condition-number cap.
retry_limit – Maximum regeneration attempts when the condition cap rejects a sample.
- Returns:
Named graph, structural parameters, and derived moment payload.
- pyscm.generator.generate_singly_connected_structure(n: int, max_iter: int = 10, seed: int = 37, max_degree: int | None = None) DiGraph
Generate a singly-connected DAG using the shared Darkstar structure algorithm.
- Parameters:
n – Number of nodes.
max_iter – Rewiring iteration count.
seed – Deterministic seed.
max_degree – Optional undirected degree cap.
- Returns:
Integer DAG.
- pyscm.generator.generate_singly_gaussian_bbn(n: int, max_iter: int = 10, seed: int = 37, max_degree: int | None = None, intercept_mean: float = 0.0, intercept_sd: float = 0.4, weight_mean: float = 0.0, weight_sd: float = 0.35, noise_sd_min: float = 0.35, noise_sd_max: float = 0.9, max_condition_number: float | None = None, retry_limit: int = 64) GeneratedGaussianBbn
Generate a singly-connected Gaussian BBN.
- Parameters:
n – Number of nodes.
max_iter – Rewiring iteration count.
seed – Deterministic seed.
max_degree – Optional undirected degree cap.
intercept_mean – Mean of the structural intercept draw.
intercept_sd – Standard deviation of the structural intercept draw.
weight_mean – Mean of the parent-weight draw.
weight_sd – Standard deviation of the parent-weight draw.
noise_sd_min – Minimum residual standard deviation.
noise_sd_max – Maximum residual standard deviation.
max_condition_number – Optional covariance condition-number cap.
retry_limit – Maximum regeneration attempts when the condition cap rejects a sample.
- Returns:
Named graph, structural parameters, and derived moment payload.
- pyscm.generator.get_simple_ordered_tree(n: int) DiGraph
Return the simple ordered tree 0 -> 1 -> ... -> n-1.
- Parameters:
n – Number of nodes.
- Returns:
Integer DAG.
- pyscm.generator.structural_to_parameter_payload(graph: dict[str, Any], structural: dict[str, StructuralGaussianNode]) dict[str, Any]
Convert structural Gaussian parameters into the py-scm v/m/S payload.
- Parameters:
graph – Named graph payload.
structural – Structural parameters by node name.
- Returns:
Mean/covariance payload.
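The conversion from structural parameters (intercept, parent weights, residual standard deviation) to joint moments can be sketched with a single pass in topological order. This is an illustration and not the package's code. Reading `v`, `m`, and `S` as variable names, means, and covariance is an assumption, as are the plain-dict node layout and the function name `structural_to_moments`.

```python
def structural_to_moments(order, structural):
    """Derive means and covariances of a linear-Gaussian model.

    `order` must be a topological ordering of the node names.
    """
    mean, cov = {}, {}
    for i, name in enumerate(order):
        node = structural[name]
        parents = node["parents"]
        mean[name] = node["intercept"] + sum(w * mean[p] for p, w in parents.items())
        # Covariance with every earlier node: the noise of `name` is
        # independent of its non-descendants, so only the parents contribute.
        for other in order[:i]:
            c = sum(w * cov[p, other] for p, w in parents.items())
            cov[name, other] = cov[other, name] = c
        cov[name, name] = node["sd"] ** 2 + sum(w * cov[p, name] for p, w in parents.items())
    return {
        "v": list(order),
        "m": [mean[v] for v in order],
        "S": [[cov[a, b] for b in order] for a in order],
    }


structural = {
    "Z": {"intercept": 1.0, "parents": {}, "sd": 1.0},
    "X": {"intercept": 2.0, "parents": {"Z": 0.5}, "sd": 1.0},
    "Y": {"intercept": 1.0, "parents": {"X": 2.0, "Z": 1.0}, "sd": 0.5},
}
payload = structural_to_moments(["Z", "X", "Y"], structural)
print(payload["m"])  # [1.0, 2.5, 7.0]
```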
- pyscm.generator.to_string_graph(graph: DiGraph) dict[str, Any]
Rename an integer graph as X0, X1, …
- Parameters:
graph – Integer DAG.
- Returns:
Named graph payload.
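The chain `0 -> 1 -> ... -> n-1` and the `X0`, `X1`, … renaming can be sketched without a graph library. The edge-list representation, the `nodes`/`edges` payload shape, and the helper names below are assumptions made for illustration. The package functions operate on `DiGraph` objects.

```python
def simple_ordered_tree(n):
    """Edges of the chain 0 -> 1 -> ... -> n-1."""
    return [(i, i + 1) for i in range(n - 1)]


def to_named_payload(n, edges):
    """Rename integer nodes to X0, X1, ... and collect nodes and edges."""
    def name(i):
        return f"X{i}"

    return {
        "nodes": [name(i) for i in range(n)],
        "edges": [(name(a), name(b)) for a, b in edges],
    }


named = to_named_payload(3, simple_ordered_tree(3))
print(named)
```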
Learn
Learning algorithms are contained in this module.
- class pyscm.learn.Pc(t=0.05, alpha=0.1)
Bases: object
Learns a Bayesian Belief Network using the PC-algorithm.
- __init__(t=0.05, alpha=0.1)
Initialize the PC learner.
- Parameters:
t – Marginal independence threshold. Pairs with values below this threshold are treated as independent.
alpha – Conditional independence threshold. Pairs with values below this threshold are treated as conditionally independent.
- Returns:
None.
- fit(X: DataFrame)
Learn a directed acyclic graph from the data.
- Parameters:
X – Input data frame with one column per variable.
- Returns:
Pc.
- class pyscm.learn.Tree(alpha=0.1)
Bases: object
Learns a Bayesian Belief Network with a tree structure.
- __init__(alpha=0.1)
Initialize the tree-structured learner.
- Parameters:
alpha – Conditional independence threshold. Pairs with values below this threshold are treated as conditionally independent.
- Returns:
None.
- fit(X: DataFrame)
Learn a tree-structured directed acyclic graph from the data.
- Parameters:
X – Input data frame with one column per variable.
- Returns:
Tree.
- pyscm.learn.compute_indep(df: DataFrame, t=0.05, alpha=0.1) DataFrame
Computes pairwise conditional independence tests.
- Parameters:
df – Data.
t – Threshold for marginal independence.
alpha – Alpha value for conditional independence.
- Returns:
Independence test results with columns: x, y, is_indep, p.
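For Gaussian data, a common conditional independence test combines partial correlation with the Fisher z-transform. The sketch below shows that generic technique. The statistic that `compute_indep` applies, and how it uses `t` and `alpha`, are not documented here, so nothing below should be read as its implementation. The helper names are hypothetical.

```python
import math

import numpy as np


def partial_corr(data, i, j, given=()):
    """Partial correlation of columns i and j given the columns in `given`."""
    idx = [i, j, *given]
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.inv(corr)
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for a (partial) correlation r with k conditioning variables."""
    r = min(max(r, -0.999999), 0.999999)
    z = 0.5 * math.log((1.0 + r) / (1.0 - r))
    stat = math.sqrt(n - k - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2.0))


# Simulated chain X -> Z -> Y: X and Y are dependent, but independent given Z.
rng = np.random.default_rng(37)
x = rng.normal(size=2000)
z = x + rng.normal(size=2000)
y = z + rng.normal(size=2000)
data = np.column_stack([x, z, y])

r_marginal = partial_corr(data, 0, 2)
r_given_z = partial_corr(data, 0, 2, given=(1,))
p_marginal = fisher_z_pvalue(r_marginal, n=len(data), k=0)
print(r_marginal, r_given_z, p_marginal)
```

With an empty conditioning set, `partial_corr` reduces to the ordinary Pearson correlation, so one helper covers both the marginal and the conditional case.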
- pyscm.learn.get_local_models(Xy: DataFrame) DataFrame
Learns all local models.
- Parameters:
Xy – Data.
- Returns:
DataFrame with the results of the local models, which provide hints on edge orientation.
- pyscm.learn.identify_v_structures(d: DiGraph) List[Tuple[str, str, str]]
Identifies all v-structures (colliders) in the DAG.
- Parameters:
d – DAG.
- Returns:
List of collider configurations.
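A v-structure is usually defined as a pair of non-adjacent parents sharing a child. The sketch below searches an edge list for that pattern in plain Python. It is an illustration, not the body of `identify_v_structures`: the `(parent, child, parent)` tuple ordering, the non-adjacency requirement, and the name `find_v_structures` are assumptions.

```python
from itertools import combinations


def find_v_structures(edges):
    """Return (p1, child, p2) triples where p1 -> child <- p2 and p1, p2 are not adjacent."""
    parents, adjacent = {}, set()
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
        adjacent.add(frozenset((a, b)))

    found = []
    for child in sorted(parents):
        for p1, p2 in combinations(sorted(parents[child]), 2):
            if frozenset((p1, p2)) not in adjacent:
                found.append((p1, child, p2))
    return found


edges = [("A", "C"), ("B", "C"), ("C", "D"), ("A", "D")]
colliders = find_v_structures(edges)
print(colliders)  # [('A', 'C', 'B')]
```

`D` also has two parents (`A` and `C`), but they are joined by the edge `A -> C`, so that pair is not reported.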
- pyscm.learn.learn_local_model(Xy: DataFrame, y_col: str) Dict[str, Any]
Learn a local model from the data with respect to the y variable.
- Parameters:
Xy – Data.
y_col – y variable.
- Returns:
Dictionary of results of local model.
- pyscm.learn.learn_skeleton(df: DataFrame, t=0.05, alpha=0.1) Graph
Learns the skeleton of the graph.
- Parameters:
df – Data.
t – Threshold for marginal independence.
alpha – Alpha value for conditional independence.
- Returns:
Undirected graph (skeleton).
- pyscm.learn.learn_tree_structure(df: DataFrame) Graph
Learn a tree structure.
- Parameters:
df – Data.
- Returns:
Tree.
- pyscm.learn.learn_v_structures(u: Graph, df: DataFrame) DiGraph
Learns the v-structures (colliders) of the graph.
- Parameters:
u – Undirected graph (skeleton).
df – Data.
- Returns:
Directed acyclic graph (DAG).
- pyscm.learn.orient_by_inference(u: Graph, d: DiGraph, p: DataFrame | _LocalModelHints) Tuple[Graph, DiGraph]
Orient edges by inference.
- Parameters:
u – Undirected graph (skeleton).
d – DAG.
p – Local model edge hints.
- Returns:
Tuple of undirected graph and DAG.
- pyscm.learn.orient_by_models(u: Graph, d: DiGraph, p: DataFrame | _LocalModelHints) Tuple[Graph, DiGraph]
Orient edges by using hints based on local models.
- Parameters:
u – Undirected graph (skeleton).
d – DAG.
p – Local model edge hints.
- Returns:
Tuple of undirected graph and DAG.