Skip to content

Performance & correctness hardening: network generation + ODE/SSA/PSA/PLA simulators#121

Merged
akutuva21 merged 10 commits into
RuleWorld:mainfrom
akutuva21:main
Jun 10, 2026
Merged

Performance & correctness hardening: network generation + ODE/SSA/PSA/PLA simulators#121
akutuva21 merged 10 commits into
RuleWorld:mainfrom
akutuva21:main

Conversation

@akutuva21

Copy link
Copy Markdown
Member

Performance & correctness hardening: network generation + ODE/SSA/PSA/PLA simulators

Summary

This PR optimizes the hot paths in rule-based network generation and the deterministic/stochastic simulators (ODE, SSA, PSA, PLA), and fixes two latent correctness bugs that the optimizations exposed. The engine is now substantially faster on large models while producing bit-identical results, with the one exception being intentional correctness fixes to canonicalization for very large or self-bonded species. NFsim is untouched and out of scope.

The work is split into 8 commits. Two of them (4→5 and 6→7) are correctness-fix pairs where the second commit repairs a bug introduced by the first — if this series is rebased or cherry-picked rather than squashed, the fix commits must travel with their predecessors (see Correctness fixes).

What's in this PR

# Commit Area
1 Remove obsolete TODOs / KNOWN LIMITATIONS in BoundedVerifier.ts (#570) code health
2 Network-gen + solver optimizations (14 changes) perf
3 VF2 matcher: Int32Array cores, neighbor cache, direct field assignment, Fenwick highBit perf
4 Round-3 matcher: component bitmask cache, precomputed order, MCV skip, cache-first lookup, ridx keys perf
5 fix: bitmask overflow guard (>31 components) + 3 supporting fixes correctness
6 Flat bond list, integer WL refinement, influence-matrix OOM guard perf
7 fix: WL integer overflow, O(M·E) regression, self-bond half-edge correctness
8 O(V)→O(referenced) copy in functional-rate bytecode evaluator perf

Files touched (engine only): graph/NetworkGenerator.ts, graph/core/Matcher.ts, graph/core/SpeciesGraph.ts, graph/core/Canonical.ts, simulation/SimulationLoop.ts, simulation/PSASimulator.ts, simulation/PLASimulator.ts, simulation/ExpressionEvaluator.ts, multiscale/MultiscaleSimulation.ts, utils/fenwickTree.ts (new), verification/BoundedVerifier.ts, and tests/nauty-canonicalization.spec.ts (regression tests).


Correctness fixes

These are the parts a reviewer should read closely. Everything else is performance-only and behavior-preserving.

C-1 — WL integer-encoding overflow (P0, silent species mis-dedup) — commit 7 fixes commit 6

Commit 6 sped up Weisfeiler-Leman color refinement in Canonical.ts by replacing per-edge string signatures with a packed integer: (compNameId << 20) | (partnerNameId << 10) | prevColor. The 10-bit color field overflows when a color rank reaches 1024 — i.e. for any species with ≥1024 molecules — spilling into the partner-name field, corrupting the refinement signature, and producing a wrong canonical label. Because canonical labels drive species identity, two non-isomorphic species could receive the same label and be silently merged during network generation — a wrong network with no error raised.

Commit 7 replaces the shift-based packing with a multiplication-based encoding sized to the actual data: (compNameId * numNames + partnerNameId) * colorRange + prevColor, where colorRange = numMols. This runs in full IEEE-754 integer precision (max value ≈ numNames² · numMols, far under 2⁵³ for any model that fits in memory) and is injective on the same (localName, partnerName, partnerColor) triple the original string key encoded — so it produces the identical refinement partition and canonical labeling as the pre-optimization baseline, just faster.

Implication for merge: commit 6 in isolation is unsafe for large species. Keep 6 and 7 together.

C-2 / C-3 — also commit 7 (introduced by commit 6's refactor)

The same commit-6 migration to a flat bond list changed the WL inner-loop access pattern to scan all bonds for every molecule — O(M·E) per refinement iteration instead of O(C+E). Commit 7 precomputes a per-molecule incident-edge list once before the iteration loop, restoring linear cost (this matters precisely on the large species the optimization was meant to help). The incident-list builder also pushes both endpoints of every bond unconditionally, fixing a separate regression where intramolecular (self) bonds contributed only one half-edge to the WL signature instead of two.

X-1 — component bitmask overflow guard — commit 5 fixes commit 4

Commit 4 replaced the string-keyed usedTargets set in component matching with a 32-bit bitmask. Commit 5 adds the guard for molecules with >31 components: those paths fall back to a Set-based matchComponentsLarge / assignComponentsBacktrackLarge (faithful replicas of the bitmask backtracking), avoiding the same class of shift overflow. Same merge implication: 4 needs 5.


Performance changes

Network generation (NetworkGenerator.ts, Matcher.ts, SpeciesGraph.ts, Canonical.ts)

  • Removed an unconditional console.log and a debug trace from the per-species hot loop.
  • queue.shift() (O(n) per dequeue → O(n²) total) replaced with index-based access in both the main species loop and the per-rule BFS.
  • Most-constrained-first ordering for the partner-set intersection in matchPartnersRecursively (seed from the rarest molecule type).
  • VF2 matcher: molecule-level cores are now Int32Array with a Uint8Array frontier bitset (no Map/Set allocation per search node); a lazily-cached SpeciesGraph.neighborList gives zero-alloc neighbor reads; the match cache returns its MatchMap arrays directly instead of deep-copying on every hit; canPossiblyMatch reads cached topological aggregates (molTypeCounts/bondCount/boundCompCount/maxDegree).
  • Component matching: a per-usedTargets integer bitmask key (replacing a sorted-joined string key), Int32Array assignment scratch, component priority order precomputed once per pattern molecule, MCV heuristic skipped for ≤4 remaining components, and the match cache checked before the structural prefilter (the prefilter is redundant work on a cache hit).
  • Canonicalization: a lazily-cached flat SpeciesGraph.bondList (Int32Array of [m1,c1,m2,c2] tuples) eliminates repeated adjacency string-key parsing (indexOf/substring/parseInt) at every consumption site, and the WL refinement uses numeric edge keys + numeric sort (see C-1 for the corrected encoding).

Simulation (SimulationLoop.ts, PSASimulator.ts, PLASimulator.ts, ExpressionEvaluator.ts, utils/fenwickTree.ts)

  • SSA/PSA reaction selection: new FenwickTree (binary-indexed tree) gives O(log R) weighted selection and O(log R) incremental updates, replacing the O(R) cumulative-sum scan; highBit is precomputed in the constructor.
  • SSA: mass-action propensities use a precomputed effective rate constant kEff (volume scaling folded in once rather than per event); the incremental-propensity recompute is reused for DIN/influence tracking instead of evaluating twice.
  • PSA: incremental aTot maintenance (with periodic resync) replaces a full O(R) re-sum per event; a dead if/else branch was collapsed.
  • PLA: reactant stoichiometry is precomputed (uniqReactantIdx/uniqReactantStoich, no per-call Map allocation) and computeTau uses sparse nzSpecies/nzChange for O(nnz) instead of O(species·reactions).
  • ODE functional-rate path: field writes use direct assignment instead of Object.defineProperty (across SimulationLoop/PSA/PLA/Multiscale); only species actually referenced by a functional rate are written to the rate context per derivative; ridx keys are precomputed.
  • Functional-rate bytecode evaluator (commit 8): the evaluator previously copied every model variable (all species + observables + params) out of the string-keyed context into its value-slot array on every evaluation. It now scans the compiled bytecode once for the slots the expression actually reads (collectReferencedSlots) and copies only those — O(referenced) instead of O(V) per call. Output is bit-identical; only unreferenced slots (which the bytecode never reads) are left unpopulated.
  • OOM guard: MAX_INFLUENCE_REACTIONS = 5000 short-circuits the dense Float64Array(R²) influence-matrix allocation on large networks and disables influence tracking with a warning rather than failing at allocation.

Testing

  • Added tests/nauty-canonicalization.spec.ts: a 1025-molecule linear homopolymer (regression for the C-1 overflow) and a single molecule with an intramolecular bond (regression for the C-3 self-bond half-edge). Both assert canonical labels match the reference, locking the correctness fixes against future refactors of the canonicalization path.
  • All performance changes are behavior-preserving and should be covered by existing model/equivalence suites; the optimization commits change how values are computed, not what is computed.

Review focus

The subtle, correctness-critical spots, in priority order:

  1. ExpressionEvaluator.ts collectReferencedSlots — its per-opcode operand-width decoding (PUSH_CONST = 8 bytes, PUSH_SPEC/PUSH_OBS = 4-byte Int32, all others operand-free) must match the interpreter's own switch and the bytecode compiler's emit sites exactly, or pc misaligns. Verified against both; confirm if the opcode set changes.
  2. Canonical.ts WL encoding — the multiplication-based key (C-1) and its injectivity argument (color rank always < numMols = colorRange).
  3. Matcher.ts matchComponentsLarge / assignComponentsBacktrackLarge — the Set-based fallback (X-1) must faithfully mirror the bitmask backtracking; the >31 guard is the entry condition.
  4. SpeciesGraph.bondList consumers — bondList is undirected (one entry per bond), so any consumer needing both directions (e.g. the WL incident list, the canonicalization adjacency builder) must push both endpoints; verify none silently dropped a direction.

Out of scope / follow-ups

  • Functional-rate "Option 2" (direct typed-array binding): commit 8 implements the low-risk half (O(V)→O(referenced) copy). The fuller version — binding referenced variables directly to the state/observable arrays and removing the string-keyed rate context entirely — is deferred and gated on a profile: it only pays off on large functional-rate models, and the call-site/interface churn isn't justified unless a flame graph shows the rate-context construction still dominating. Design notes exist separately.
  • NFsim: unchanged.
  • Further gains would require algorithmic changes (e.g. symmetry-aware match enumeration) or parallelism, not incremental tuning.

akutuva21 and others added 9 commits June 9, 2026 11:57
…NS in BoundedVerifier.ts (#570)

Removed the obsolete `TODO` and three `KNOWN LIMITATIONS` from `BoundedVerifier.ts`
as the file has already been refactored to use `BNGLParser`, `GraphCanonicalizer`,
and `GraphMatcher`.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
- N1: Remove unconditional console.log hot loop + debug trace in NetworkGenerator
- N4: queue.shift() → index-based access with splice cleanup
- N5: Most-constrained-first partner intersection in matchPartnersRecursively
- P1: Collapse dead if/else branch in PSASimulator
- P2/P3: Incremental aTot + Fenwick tree for O(log R) PSA selection
- S1: Precompute kEff for mass-action propensity in SimulationLoop
- S2: Fenwick tree for O(log R) SSA reaction selection
- S3: Reuse incremental propensities in DIN tracking (avoid double eval)
- L1: Precompute uniqReactantIdx/Stoich in PLA (no Map allocation)
- L2: Sparse nzSpecies/nzChange for O(nnz) computeTau
- N2: Cached molTypeCounts/bondCount/boundCompCount/maxDegree on SpeciesGraph
- N3: canPossiblyMatch reads cached aggregates
- N6: Return cached MatchMaps directly (no deep copy)
…, and Fenwick highBit

N-1: SpeciesGraph.neighborList lazy cache (zero-alloc neighbor reads in VF2)
N-2: Int32Array corePattern/coreTarget + Uint8Array frontier bitset (no Map/Set alloc per node)
N-3: Remove componentCandidateCache.clear() on addPair/removePair (preserve multivalent cache)
N-4: queue.shift() -> queue[head++] BFS in NetworkGenerator
S-1: Object.defineProperty -> target[key]=value in SimulationLoop/PSA/PLA/Multiscale
S-2: Precompute species referenced by functional rates (O(numSpecies) -> O(referenced) per derivative)
S-3: Precompute highBit in FenwickTree constructor
…nment, precomputed order, MCV skip, cache-first lookup, ridx keys, filtered observable assign
…der mutation fix, X-3 redundant dedup removal, X-4 decorate-then-sort in computeNodeOrdering
Z-2: Add lazily-cached flat bond list (Int32Array) to SpeciesGraph.
Each bond stored once as [m1,c1,m2,c2] tuple. Eliminates string-key
parsing (indexOf/substring/parseInt) at every consumption site.

Z-1: Switch WL refinement to integer-encoded edge tuples. Precomputed
compNameToId map + packed int encoding ((compNameId << 20) |
(partnerCompNameId << 10) | prevColor) replaces per-edge string
allocation and string sort with numeric sort.

Refactor all adjacency iterations in Canonical.ts (Nauty bond-vertex
construction, BFS fallback, computeOrbits, bond-ID collection) to
use bondList. Removes bondVertexByKey dedup Map/set.

Add MAX_INFLUENCE_REACTIONS=5000 guard before Float64Array(N^2)
allocation in SimulationLoop.ts to prevent OOM on large networks.
C-1 [P0]: Replace shift-based packing ((a<<20)|(b<<10)|c) with
multiplication-based encoding ((a*numNames+b)*colorRange+c).
32-bit shifts overflow when color ranks >=1024 (spills into
partnerNameId field), silently corrupting WL signatures and
causing wrong canonical labels for species with >=1025 molecules.

C-2 [P1]: Precompute per-molecule incident edge list once before
the iteration loop instead of scanning all bonds for every molecule.
Restores O(C+E) per iteration from O(M*E).

C-3 [P1]: Push both endpoints unconditionally in incident list
builder. Self-bonds (intramolecular) previously contributed only
one half-edge due to else-if.

Adds regression tests: 1025-molecule linear homopolymer (C-1
overflow) and single-molecule intramolecular bond (C-3 self-bond).
setSafeNumericField(rateContext, observableName, 0);
}
for (let i = 0; i < safeRateObservableNames.length; i++) {
rateContext[safeRateObservableNames[i]] = 0;
const speciesName = model.species[k].name;
if (speciesName !== '__proto__' && speciesName !== 'constructor' && speciesName !== 'prototype') {
setSafeNumericField(rateContext, speciesName, 0);
rateContext[speciesName] = 0;
}
for (let i = 0; i < safeRateObservableNames.length; i++) {
const name = safeRateObservableNames[i];
rateContext[name] = obsValues[name];
// Update species values in the mutable context (in-place) — only those referenced by functional rates
for (let ri = 0; ri < referencedSpeciesIndices.length; ri++) {
const k = referencedSpeciesIndices[ri];
rateContext[referencedSpeciesNames[ri]] = odeUsesAmountState ? yIn[k] : (yIn[k] * speciesVolumes[k]);
} else {
this.computeUncoveredTargetNodes();
}
const targetCandidateCount = this.frontierSize;

private matchComponentsLarge(pMolIdx: number, tMolIdx: number): Map<number, number> | null {
const patternMol = this.pattern.molecules[pMolIdx];
const targetMol = this.target.molecules[tMolIdx];
if (this._maxDegree !== undefined) return this._maxDegree;
let max = 0;
for (let idx = 0; idx < this.molecules.length; idx++) {
let deg = 0;
…ule count, breaking bonded multi-molecule matching
@akutuva21 akutuva21 merged commit e46ecb1 into RuleWorld:main Jun 10, 2026
7 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants