Performance & correctness hardening: network generation + ODE/SSA/PSA/PLA simulators by akutuva21 · Pull Request #121 · RuleWorld/bngplayground

akutuva21 · 2026-06-09T21:02:26Z

Performance & correctness hardening: network generation + ODE/SSA/PSA/PLA simulators

Summary

This PR optimizes the hot paths in rule-based network generation and the deterministic/stochastic simulators (ODE, SSA, PSA, PLA), and fixes two latent correctness bugs that the optimizations exposed. The engine is now substantially faster on large models while producing bit-identical results, with the one exception being intentional correctness fixes to canonicalization for very large or self-bonded species. NFsim is untouched and out of scope.

The work is split into 8 commits. Two of them (4→5 and 6→7) are correctness-fix pairs where the second commit repairs a bug introduced by the first — if this series is rebased or cherry-picked rather than squashed, the fix commits must travel with their predecessors (see Correctness fixes).

What's in this PR

#	Commit	Area
1	Remove obsolete TODOs / KNOWN LIMITATIONS in `BoundedVerifier.ts` (#570)	code health
2	Network-gen + solver optimizations (14 changes)	perf
3	VF2 matcher: Int32Array cores, neighbor cache, direct field assignment, Fenwick highBit	perf
4	Round-3 matcher: component bitmask cache, precomputed order, MCV skip, cache-first lookup, ridx keys	perf
5	fix: bitmask overflow guard (>31 components) + 3 supporting fixes	correctness
6	Flat bond list, integer WL refinement, influence-matrix OOM guard	perf
7	fix: WL integer overflow, O(M·E) regression, self-bond half-edge	correctness
8	O(V)→O(referenced) copy in functional-rate bytecode evaluator	perf

Files touched (engine only): graph/NetworkGenerator.ts, graph/core/Matcher.ts, graph/core/SpeciesGraph.ts, graph/core/Canonical.ts, simulation/SimulationLoop.ts, simulation/PSASimulator.ts, simulation/PLASimulator.ts, simulation/ExpressionEvaluator.ts, multiscale/MultiscaleSimulation.ts, utils/fenwickTree.ts (new), verification/BoundedVerifier.ts, and tests/nauty-canonicalization.spec.ts (regression tests).

Correctness fixes

These are the parts a reviewer should read closely. Everything else is performance-only and behavior-preserving.

C-1 — WL integer-encoding overflow (P0, silent species mis-dedup) — commit 7 fixes commit 6

Commit 6 sped up Weisfeiler-Leman color refinement in Canonical.ts by replacing per-edge string signatures with a packed integer: (compNameId << 20) | (partnerNameId << 10) | prevColor. The 10-bit color field overflows when a color rank reaches 1024 — i.e. for any species with ≥1024 molecules — spilling into the partner-name field, corrupting the refinement signature, and producing a wrong canonical label. Because canonical labels drive species identity, two non-isomorphic species could receive the same label and be silently merged during network generation — a wrong network with no error raised.

Commit 7 replaces the shift-based packing with a multiplication-based encoding sized to the actual data: (compNameId * numNames + partnerNameId) * colorRange + prevColor, where colorRange = numMols. This runs in full IEEE-754 integer precision (max value ≈ numNames² · numMols, far under 2⁵³ for any model that fits in memory) and is injective on the same (localName, partnerName, partnerColor) triple the original string key encoded — so it produces the identical refinement partition and canonical labeling as the pre-optimization baseline, just faster.

Implication for merge: commit 6 in isolation is unsafe for large species. Keep 6 and 7 together.

C-2 / C-3 — also commit 7 (introduced by commit 6's refactor)

The same commit-6 migration to a flat bond list changed the WL inner-loop access pattern to scan all bonds for every molecule — O(M·E) per refinement iteration instead of O(C+E). Commit 7 precomputes a per-molecule incident-edge list once before the iteration loop, restoring linear cost (this matters precisely on the large species the optimization was meant to help). The incident-list builder also pushes both endpoints of every bond unconditionally, fixing a separate regression where intramolecular (self) bonds contributed only one half-edge to the WL signature instead of two.

X-1 — component bitmask overflow guard — commit 5 fixes commit 4

Commit 4 replaced the string-keyed usedTargets set in component matching with a 32-bit bitmask. Commit 5 adds the guard for molecules with >31 components: those paths fall back to a Set-based matchComponentsLarge / assignComponentsBacktrackLarge (faithful replicas of the bitmask backtracking), avoiding the same class of shift overflow. Same merge implication: 4 needs 5.

Performance changes

Network generation (`NetworkGenerator.ts`, `Matcher.ts`, `SpeciesGraph.ts`, `Canonical.ts`)

Removed an unconditional console.log and a debug trace from the per-species hot loop.
queue.shift() (O(n) per dequeue → O(n²) total) replaced with index-based access in both the main species loop and the per-rule BFS.
Most-constrained-first ordering for the partner-set intersection in matchPartnersRecursively (seed from the rarest molecule type).
VF2 matcher: molecule-level cores are now Int32Array with a Uint8Array frontier bitset (no Map/Set allocation per search node); a lazily-cached SpeciesGraph.neighborList gives zero-alloc neighbor reads; the match cache returns its MatchMap arrays directly instead of deep-copying on every hit; canPossiblyMatch reads cached topological aggregates (molTypeCounts/bondCount/boundCompCount/maxDegree).
Component matching: a per-usedTargets integer bitmask key (replacing a sorted-joined string key), Int32Array assignment scratch, component priority order precomputed once per pattern molecule, MCV heuristic skipped for ≤4 remaining components, and the match cache checked before the structural prefilter (the prefilter is redundant work on a cache hit).
Canonicalization: a lazily-cached flat SpeciesGraph.bondList (Int32Array of [m1,c1,m2,c2] tuples) eliminates repeated adjacency string-key parsing (indexOf/substring/parseInt) at every consumption site, and the WL refinement uses numeric edge keys + numeric sort (see C-1 for the corrected encoding).

Simulation (`SimulationLoop.ts`, `PSASimulator.ts`, `PLASimulator.ts`, `ExpressionEvaluator.ts`, `utils/fenwickTree.ts`)

SSA/PSA reaction selection: new FenwickTree (binary-indexed tree) gives O(log R) weighted selection and O(log R) incremental updates, replacing the O(R) cumulative-sum scan; highBit is precomputed in the constructor.
SSA: mass-action propensities use a precomputed effective rate constant kEff (volume scaling folded in once rather than per event); the incremental-propensity recompute is reused for DIN/influence tracking instead of evaluating twice.
PSA: incremental aTot maintenance (with periodic resync) replaces a full O(R) re-sum per event; a dead if/else branch was collapsed.
PLA: reactant stoichiometry is precomputed (uniqReactantIdx/uniqReactantStoich, no per-call Map allocation) and computeTau uses sparse nzSpecies/nzChange for O(nnz) instead of O(species·reactions).
ODE functional-rate path: field writes use direct assignment instead of Object.defineProperty (across SimulationLoop/PSA/PLA/Multiscale); only species actually referenced by a functional rate are written to the rate context per derivative; ridx keys are precomputed.
Functional-rate bytecode evaluator (commit 8): the evaluator previously copied every model variable (all species + observables + params) out of the string-keyed context into its value-slot array on every evaluation. It now scans the compiled bytecode once for the slots the expression actually reads (collectReferencedSlots) and copies only those — O(referenced) instead of O(V) per call. Output is bit-identical; only unreferenced slots (which the bytecode never reads) are left unpopulated.
OOM guard: MAX_INFLUENCE_REACTIONS = 5000 short-circuits the dense Float64Array(R²) influence-matrix allocation on large networks and disables influence tracking with a warning rather than failing at allocation.

Testing

Added tests/nauty-canonicalization.spec.ts: a 1025-molecule linear homopolymer (regression for the C-1 overflow) and a single molecule with an intramolecular bond (regression for the C-3 self-bond half-edge). Both assert canonical labels match the reference, locking the correctness fixes against future refactors of the canonicalization path.
All performance changes are behavior-preserving and should be covered by existing model/equivalence suites; the optimization commits change how values are computed, not what is computed.

Review focus

The subtle, correctness-critical spots, in priority order:

ExpressionEvaluator.ts collectReferencedSlots — its per-opcode operand-width decoding (PUSH_CONST = 8 bytes, PUSH_SPEC/PUSH_OBS = 4-byte Int32, all others operand-free) must match the interpreter's own switch and the bytecode compiler's emit sites exactly, or pc misaligns. Verified against both; confirm if the opcode set changes.
Canonical.ts WL encoding — the multiplication-based key (C-1) and its injectivity argument (color rank always < numMols = colorRange).
Matcher.ts matchComponentsLarge / assignComponentsBacktrackLarge — the Set-based fallback (X-1) must faithfully mirror the bitmask backtracking; the >31 guard is the entry condition.
SpeciesGraph.bondList consumers — bondList is undirected (one entry per bond), so any consumer needing both directions (e.g. the WL incident list, the canonicalization adjacency builder) must push both endpoints; verify none silently dropped a direction.

Out of scope / follow-ups

Functional-rate "Option 2" (direct typed-array binding): commit 8 implements the low-risk half (O(V)→O(referenced) copy). The fuller version — binding referenced variables directly to the state/observable arrays and removing the string-keyed rate context entirely — is deferred and gated on a profile: it only pays off on large functional-rate models, and the call-site/interface churn isn't justified unless a flame graph shows the rate-context construction still dominating. Design notes exist separately.
NFsim: unchanged.
Further gains would require algorithmic changes (e.g. symmetry-aware match enumeration) or parallelism, not incremental tuning.

…NS in BoundedVerifier.ts (#570) Removed the obsolete `TODO` and three `KNOWN LIMITATIONS` from `BoundedVerifier.ts` as the file has already been refactored to use `BNGLParser`, `GraphCanonicalizer`, and `GraphMatcher`. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>

- N1: Remove unconditional console.log hot loop + debug trace in NetworkGenerator - N4: queue.shift() → index-based access with splice cleanup - N5: Most-constrained-first partner intersection in matchPartnersRecursively - P1: Collapse dead if/else branch in PSASimulator - P2/P3: Incremental aTot + Fenwick tree for O(log R) PSA selection - S1: Precompute kEff for mass-action propensity in SimulationLoop - S2: Fenwick tree for O(log R) SSA reaction selection - S3: Reuse incremental propensities in DIN tracking (avoid double eval) - L1: Precompute uniqReactantIdx/Stoich in PLA (no Map allocation) - L2: Sparse nzSpecies/nzChange for O(nnz) computeTau - N2: Cached molTypeCounts/bondCount/boundCompCount/maxDegree on SpeciesGraph - N3: canPossiblyMatch reads cached aggregates - N6: Return cached MatchMaps directly (no deep copy)

…, and Fenwick highBit N-1: SpeciesGraph.neighborList lazy cache (zero-alloc neighbor reads in VF2) N-2: Int32Array corePattern/coreTarget + Uint8Array frontier bitset (no Map/Set alloc per node) N-3: Remove componentCandidateCache.clear() on addPair/removePair (preserve multivalent cache) N-4: queue.shift() -> queue[head++] BFS in NetworkGenerator S-1: Object.defineProperty -> target[key]=value in SimulationLoop/PSA/PLA/Multiscale S-2: Precompute species referenced by functional rates (O(numSpecies) -> O(referenced) per derivative) S-3: Precompute highBit in FenwickTree constructor

…nment, precomputed order, MCV skip, cache-first lookup, ridx keys, filtered observable assign

…der mutation fix, X-3 redundant dedup removal, X-4 decorate-then-sort in computeNodeOrdering

Z-2: Add lazily-cached flat bond list (Int32Array) to SpeciesGraph. Each bond stored once as [m1,c1,m2,c2] tuple. Eliminates string-key parsing (indexOf/substring/parseInt) at every consumption site. Z-1: Switch WL refinement to integer-encoded edge tuples. Precomputed compNameToId map + packed int encoding ((compNameId << 20) | (partnerCompNameId << 10) | prevColor) replaces per-edge string allocation and string sort with numeric sort. Refactor all adjacency iterations in Canonical.ts (Nauty bond-vertex construction, BFS fallback, computeOrbits, bond-ID collection) to use bondList. Removes bondVertexByKey dedup Map/set. Add MAX_INFLUENCE_REACTIONS=5000 guard before Float64Array(N^2) allocation in SimulationLoop.ts to prevent OOM on large networks.

C-1 [P0]: Replace shift-based packing ((a<<20)|(b<<10)|c) with multiplication-based encoding ((a*numNames+b)*colorRange+c). 32-bit shifts overflow when color ranks >=1024 (spills into partnerNameId field), silently corrupting WL signatures and causing wrong canonical labels for species with >=1025 molecules. C-2 [P1]: Precompute per-molecule incident edge list once before the iteration loop instead of scanning all bonds for every molecule. Restores O(C+E) per iteration from O(M*E). C-3 [P1]: Push both endpoints unconditionally in incident list builder. Self-bonds (intramolecular) previously contributed only one half-edge due to else-if. Adds regression tests: 1025-molecule linear homopolymer (C-1 overflow) and single-molecule intramolecular bond (C-3 self-bond).

-            setSafeNumericField(rateContext, observableName, 0);
-          }
+        for (let i = 0; i < safeRateObservableNames.length; i++) {
+          rateContext[safeRateObservableNames[i]] = 0;


          const speciesName = model.species[k].name;
          if (speciesName !== '__proto__' && speciesName !== 'constructor' && speciesName !== 'prototype') {
-            setSafeNumericField(rateContext, speciesName, 0);
+            rateContext[speciesName] = 0;


-            }
+          for (let i = 0; i < safeRateObservableNames.length; i++) {
+            const name = safeRateObservableNames[i];
+            rateContext[name] = obsValues[name];


+          // Update species values in the mutable context (in-place) — only those referenced by functional rates
+          for (let ri = 0; ri < referencedSpeciesIndices.length; ri++) {
+            const k = referencedSpeciesIndices[ri];
+            rateContext[referencedSpeciesNames[ri]] = odeUsesAmountState ? yIn[k] : (yIn[k] * speciesVolumes[k]);


+    } else {
+      this.computeUncoveredTargetNodes();
+    }
+    const targetCandidateCount = this.frontierSize;


+
+  private matchComponentsLarge(pMolIdx: number, tMolIdx: number): Map<number, number> | null {
+    const patternMol = this.pattern.molecules[pMolIdx];
+    const targetMol = this.target.molecules[tMolIdx];


+    if (this._maxDegree !== undefined) return this._maxDegree;
+    let max = 0;
+    for (let idx = 0; idx < this.molecules.length; idx++) {
+      let deg = 0;


…ule count, breaking bonded multi-molecule matching

akutuva21 and others added 9 commits June 9, 2026 11:57

Merge branch 'RuleWorld:main' into main

268c9e9

perf: round-3 matcher optimizations — bitmask cache, Int32Array assig…

0be2a17

…nment, precomputed order, MCV skip, cache-first lookup, ridx keys, filtered observable assign

fix: X-1 bitmask overflow guard for >31 target components, X-2 MCV or…

f4d8e37

…der mutation fix, X-3 redundant dedup removal, X-4 decorate-then-sort in computeNodeOrdering

perf: O(V)->O(referenced) copy in functional-rate bytecode evaluator

3b752b1

github-advanced-security AI found potential problems Jun 9, 2026

View reviewed changes

fix: order.length bound uses max component count instead of per-molec…

1466d23

…ule count, breaking bonded multi-molecule matching

akutuva21 merged commit e46ecb1 into RuleWorld:main Jun 10, 2026
7 of 9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance & correctness hardening: network generation + ODE/SSA/PSA/PLA simulators#121

Performance & correctness hardening: network generation + ODE/SSA/PSA/PLA simulators#121
akutuva21 merged 10 commits into
RuleWorld:mainfrom
akutuva21:main

akutuva21 commented Jun 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

akutuva21 commented Jun 9, 2026

Performance & correctness hardening: network generation + ODE/SSA/PSA/PLA simulators

Summary

What's in this PR

Correctness fixes

C-1 — WL integer-encoding overflow (P0, silent species mis-dedup) — commit 7 fixes commit 6

C-2 / C-3 — also commit 7 (introduced by commit 6's refactor)

X-1 — component bitmask overflow guard — commit 5 fixes commit 4

Performance changes

Network generation (NetworkGenerator.ts, Matcher.ts, SpeciesGraph.ts, Canonical.ts)

Simulation (SimulationLoop.ts, PSASimulator.ts, PLASimulator.ts, ExpressionEvaluator.ts, utils/fenwickTree.ts)

Testing

Review focus

Out of scope / follow-ups

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Network generation (`NetworkGenerator.ts`, `Matcher.ts`, `SpeciesGraph.ts`, `Canonical.ts`)

Simulation (`SimulationLoop.ts`, `PSASimulator.ts`, `PLASimulator.ts`, `ExpressionEvaluator.ts`, `utils/fenwickTree.ts`)