# Benchmarking Against R bnlearn
py-scm has been benchmarked against the Gaussian inference available in R bnlearn. The comparison uses two non-trivial Gaussian networks (a simple backdoor model and a mediated probe model) and evaluates associational, interventional, and counterfactual queries on both.
The py-scm side of the timing harness uses `pandas=False` so that the runtime comparison reflects the NumPy inference engine rather than pandas materialization overhead. The benchmark records both hot timings, where the same query is repeated on an already-built model, and cold timings, where a fresh model is built before each query.
The figures below reflect the benchmark suite as rerun on 2026-03-28.
The accuracy comparison is exact on the benchmark suite. The worst absolute difference stayed at machine-precision scale for every query type.
| Query type | Worst absolute difference | Result |
|---|---|---|
| Associational | 7.21645e-15 | Exact |
| Interventional | 3.55271e-15 | Exact |
| Counterfactual | 1.33227e-15 | Exact |
The hot timings show the steady-state cost of repeated inference after model construction and cache setup. On this benchmark suite, py-scm remains at least one order of magnitude faster than bnlearn on every query family, exceeds two orders of magnitude on the interventional and counterfactual queries, and stays above 20x faster even on the slowest associational case.
| Query type | py-scm time | bnlearn time | Speedup |
|---|---|---|---|
| Associational | 0.000959 to 0.002982 | 0.057 to 0.062 | 20.1x to 59.4x |
| Interventional | 0.000860 to 0.000951 | 0.200 to 0.300 | 210.3x to 341.7x |
| Counterfactual | 0.004831 to 0.005795 | 1.400 to 2.600 | 257.6x to 448.7x |
The cold timings include model construction and first-query setup. This is a stricter measure because it removes most of the benefit of repeated-query caching. Even under that condition, py-scm remains faster than bnlearn on every benchmarked query family and, after the constructor-path fixes, is back near the earlier pre-regression cold-start numbers.
| Query type | py-scm time | bnlearn time | Speedup |
|---|---|---|---|
| Associational | 0.120674 to 0.129090 | 0.717 to 1.081 | 5.74x to 8.55x |
| Interventional | 0.163683 to 0.224653 | 0.750 to 1.050 | 4.45x to 4.70x |
| Counterfactual | 0.184865 to 0.243760 | 2.100 to 3.200 | 11.3x to 13.3x |
These results should be read as benchmark-suite results rather than universal guarantees. They show that the current py-scm implementation matches R bnlearn on the tested Gaussian queries while delivering materially lower runtimes on the same workload in both steady-state and cold-start conditions.