Benchmarking Against R bnlearn

py-scm has been benchmarked against the Gaussian inference available in the R package bnlearn. The comparison uses two non-trivial Gaussian networks (a simple backdoor model and a mediated probe model) and evaluates associational, interventional, and counterfactual queries on both.

The py-scm side of the timing harness runs with `pandas=False`, so the runtime comparison reflects the NumPy inference engine rather than pandas materialization overhead. The benchmark records both hot timings, where the same query is repeated on an already-built model, and cold timings, where a fresh model is built before each query.
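The hot/cold distinction can be illustrated with a minimal timing sketch. The `build_model` stand-in below is hypothetical (it is not the py-scm API, whose constructor and `pandas=False` flag are only described in this document); the point is where `perf_counter` brackets the work in each mode.

```python
import time

def build_model():
    # Hypothetical stand-in for constructing a Gaussian model and
    # returning a query callable; the real py-scm builder is assumed here.
    return lambda: sum(i * i for i in range(100))

def time_hot(n_reps=5):
    # Hot timing: build once, then average repeated queries on the
    # already-built model, so caches and setup are amortized away.
    query = build_model()
    start = time.perf_counter()
    for _ in range(n_reps):
        query()
    return (time.perf_counter() - start) / n_reps

def time_cold(n_reps=5):
    # Cold timing: rebuild the model before every query, so model
    # construction and first-query setup count toward each measurement.
    total = 0.0
    for _ in range(n_reps):
        start = time.perf_counter()
        query = build_model()
        query()
        total += time.perf_counter() - start
    return total / n_reps
```

Averaging over `n_reps` repetitions smooths out scheduler jitter; the cold loop restarts the clock before each rebuild so no construction cost leaks out of the measurement.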

The figures below reflect the benchmark suite as rerun on 2026-03-28.

The accuracy comparison is exact on the benchmark suite: the worst absolute difference stays at machine-precision scale (on the order of 1e-15) for every query type.
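What "exact" means operationally can be sketched as follows. The result values below are hypothetical placeholders, not benchmark data; the check is that the worst absolute difference between matched query answers sits within a small multiple of double-precision machine epsilon, i.e. floating-point rounding rather than modeling disagreement.

```python
# Hypothetical matched query results from the two engines; the real
# benchmark compares py-scm and bnlearn outputs for identical queries.
py_scm = [0.4312, 1.0875, -0.2248]
bnlearn = [0.4312 + 7e-15, 1.0875 - 3e-15, -0.2248 + 1e-15]

# Worst absolute difference across all matched queries.
worst = max(abs(a - b) for a, b in zip(py_scm, bnlearn))

# Double-precision machine epsilon is about 2.22e-16; differences within
# a small multiple of it are indistinguishable from rounding noise.
machine_eps = 2.220446049250313e-16
assert worst < 100 * machine_eps
```

The 1e-15-scale figures in the table below pass this kind of tolerance comfortably, which is why each row is reported as "Exact".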

Accuracy Against R bnlearn

| Query type | Worst absolute difference | Result |
|---|---|---|
| Associational | 7.21645e-15 | Exact |
| Interventional | 3.55271e-15 | Exact |
| Counterfactual | 1.33227e-15 | Exact |

The hot timings show the steady-state cost of repeated inference after model construction and cache setup. On this benchmark suite, py-scm remains at least one order of magnitude faster than bnlearn on every query family, exceeds two orders of magnitude on the interventional and counterfactual queries, and stays above 20x faster even on the slowest associational case.

Hot Runtime Comparison

| Query type | py-scm avg (ms, across benchmark cases) | bnlearn avg (ms, across benchmark cases) | py-scm speedup over bnlearn |
|---|---|---|---|
| Associational | 0.000959 to 0.002982 | 0.057 to 0.062 | 20.1x to 59.4x |
| Interventional | 0.000860 to 0.000951 | 0.200 to 0.300 | 210.3x to 341.7x |
| Counterfactual | 0.004831 to 0.005795 | 1.400 to 2.600 | 257.6x to 448.7x |

The cold timings include model construction and first-query setup. This is a stricter measure because it removes most of the benefit of repeated-query caching. Even under that condition, py-scm remains faster than bnlearn on every benchmarked query family and, after the constructor-path fixes, has returned to roughly its pre-regression cold-start numbers.

Cold Runtime Comparison

| Query type | py-scm avg (ms, across benchmark cases) | bnlearn avg (ms, across benchmark cases) | py-scm speedup over bnlearn |
|---|---|---|---|
| Associational | 0.120674 to 0.129090 | 0.717 to 1.081 | 5.74x to 8.55x |
| Interventional | 0.163683 to 0.224653 | 0.750 to 1.050 | 4.45x to 4.70x |
| Counterfactual | 0.184865 to 0.243760 | 2.100 to 3.200 | 11.3x to 13.3x |
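The speedup columns in both runtime tables are simple ratios of per-query averages. A minimal sketch, using illustrative round numbers in the same range as the cold counterfactual rows (not the exact benchmark cases):

```python
def speedup(bnlearn_ms, py_scm_ms):
    # Speedup is bnlearn's average runtime divided by py-scm's average
    # runtime for the same query on the same benchmark case.
    return bnlearn_ms / py_scm_ms

# Illustrative values near the cold counterfactual range above.
print(round(speedup(2.1, 0.185), 1))  # prints 11.4
```

Note that the low and high ends of each reported range come from different benchmark cases, so dividing one column's endpoints by the other's will not reproduce the speedup endpoints exactly.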

These results should be read as benchmark-suite results rather than universal guarantees. They show that the current py-scm implementation still matches R bnlearn on the tested Gaussian queries while delivering materially lower runtimes on the same workload in both steady-state and cold-start conditions.