# Benchmarking Against R bnlearn
py-scm has been benchmarked against the Gaussian inference available in R bnlearn. The comparison uses two non-trivial Gaussian networks (a simple backdoor model and a mediated probe model) and evaluates associational, interventional, and counterfactual queries on both.
The py-scm side of the timing harness uses `pandas=False` so that the runtime comparison reflects the NumPy inference engine rather than pandas materialization overhead. The benchmark records both hot timings, where the same query is repeated on an already-built model, and cold timings, where a fresh model is built before each query.
The figures below reflect the benchmark suite as rerun on 2026-03-28.
The accuracy comparison is exact on the benchmark suite. The worst absolute difference stayed at machine-precision scale for every query type.
| Query type | Worst absolute difference | Result |
|---|---|---|
| Associational | 7.21645e-15 | Exact |
| Interventional | 3.55271e-15 | Exact |
| Counterfactual | 1.33227e-15 | Exact |
The hot timings show the steady-state cost of repeated inference after model construction and cache setup. On this benchmark suite, py-scm remains at least one order of magnitude faster than bnlearn on every query family, exceeds two orders of magnitude on the interventional and counterfactual queries, and stays above 20x faster even on the slowest associational case.
| Query type | py-scm time | bnlearn time | Speedup |
|---|---|---|---|
| Associational | 0.000959 to 0.002982 | 0.057 to 0.062 | 20.1x to 59.4x |
| Interventional | 0.000860 to 0.000951 | 0.200 to 0.300 | 210.3x to 341.7x |
| Counterfactual | 0.004831 to 0.005795 | 1.400 to 2.600 | 257.6x to 448.7x |
The cold timings include model construction and first-query setup. This is a stricter measure because it removes most of the benefit of repeated-query caching. Even under that condition, py-scm remains faster than bnlearn on every benchmarked query family and, after the constructor-path fixes, is back near the earlier pre-regression cold-start numbers.
| Query type | py-scm time | bnlearn time | Speedup |
|---|---|---|---|
| Associational | 0.120674 to 0.129090 | 0.717 to 1.081 | 5.74x to 8.55x |
| Interventional | 0.163683 to 0.224653 | 0.750 to 1.050 | 4.45x to 4.70x |
| Counterfactual | 0.184865 to 0.243760 | 2.100 to 3.200 | 11.3x to 13.3x |
These results should be read as benchmark-suite results rather than universal guarantees. They show that the current py-scm implementation matches R bnlearn on the tested Gaussian queries while delivering materially lower runtimes on the same workload in both steady-state and cold-start conditions.