Performance & correctness hardening: network generation + ODE/SSA/PSA/PLA simulators#121
Merged
Conversation
…NS in BoundedVerifier.ts (#570) Removed the obsolete `TODO` and three `KNOWN LIMITATIONS` from `BoundedVerifier.ts` as the file has already been refactored to use `BNGLParser`, `GraphCanonicalizer`, and `GraphMatcher`. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
- N1: Remove unconditional console.log hot loop + debug trace in NetworkGenerator - N4: queue.shift() → index-based access with splice cleanup - N5: Most-constrained-first partner intersection in matchPartnersRecursively - P1: Collapse dead if/else branch in PSASimulator - P2/P3: Incremental aTot + Fenwick tree for O(log R) PSA selection - S1: Precompute kEff for mass-action propensity in SimulationLoop - S2: Fenwick tree for O(log R) SSA reaction selection - S3: Reuse incremental propensities in DIN tracking (avoid double eval) - L1: Precompute uniqReactantIdx/Stoich in PLA (no Map allocation) - L2: Sparse nzSpecies/nzChange for O(nnz) computeTau - N2: Cached molTypeCounts/bondCount/boundCompCount/maxDegree on SpeciesGraph - N3: canPossiblyMatch reads cached aggregates - N6: Return cached MatchMaps directly (no deep copy)
…, and Fenwick highBit N-1: SpeciesGraph.neighborList lazy cache (zero-alloc neighbor reads in VF2) N-2: Int32Array corePattern/coreTarget + Uint8Array frontier bitset (no Map/Set alloc per node) N-3: Remove componentCandidateCache.clear() on addPair/removePair (preserve multivalent cache) N-4: queue.shift() -> queue[head++] BFS in NetworkGenerator S-1: Object.defineProperty -> target[key]=value in SimulationLoop/PSA/PLA/Multiscale S-2: Precompute species referenced by functional rates (O(numSpecies) -> O(referenced) per derivative) S-3: Precompute highBit in FenwickTree constructor
…nment, precomputed order, MCV skip, cache-first lookup, ridx keys, filtered observable assign
…der mutation fix, X-3 redundant dedup removal, X-4 decorate-then-sort in computeNodeOrdering
Z-2: Add lazily-cached flat bond list (Int32Array) to SpeciesGraph. Each bond stored once as [m1,c1,m2,c2] tuple. Eliminates string-key parsing (indexOf/substring/parseInt) at every consumption site. Z-1: Switch WL refinement to integer-encoded edge tuples. Precomputed compNameToId map + packed int encoding ((compNameId << 20) | (partnerCompNameId << 10) | prevColor) replaces per-edge string allocation and string sort with numeric sort. Refactor all adjacency iterations in Canonical.ts (Nauty bond-vertex construction, BFS fallback, computeOrbits, bond-ID collection) to use bondList. Removes bondVertexByKey dedup Map/set. Add MAX_INFLUENCE_REACTIONS=5000 guard before Float64Array(N^2) allocation in SimulationLoop.ts to prevent OOM on large networks.
C-1 [P0]: Replace shift-based packing ((a<<20)|(b<<10)|c) with multiplication-based encoding ((a*numNames+b)*colorRange+c). 32-bit shifts overflow when color ranks >=1024 (spills into partnerNameId field), silently corrupting WL signatures and causing wrong canonical labels for species with >=1025 molecules. C-2 [P1]: Precompute per-molecule incident edge list once before the iteration loop instead of scanning all bonds for every molecule. Restores O(C+E) per iteration from O(M*E). C-3 [P1]: Push both endpoints unconditionally in incident list builder. Self-bonds (intramolecular) previously contributed only one half-edge due to else-if. Adds regression tests: 1025-molecule linear homopolymer (C-1 overflow) and single-molecule intramolecular bond (C-3 self-bond).
| setSafeNumericField(rateContext, observableName, 0); | ||
| } | ||
| for (let i = 0; i < safeRateObservableNames.length; i++) { | ||
| rateContext[safeRateObservableNames[i]] = 0; |
| const speciesName = model.species[k].name; | ||
| if (speciesName !== '__proto__' && speciesName !== 'constructor' && speciesName !== 'prototype') { | ||
| setSafeNumericField(rateContext, speciesName, 0); | ||
| rateContext[speciesName] = 0; |
| } | ||
| for (let i = 0; i < safeRateObservableNames.length; i++) { | ||
| const name = safeRateObservableNames[i]; | ||
| rateContext[name] = obsValues[name]; |
| // Update species values in the mutable context (in-place) — only those referenced by functional rates | ||
| for (let ri = 0; ri < referencedSpeciesIndices.length; ri++) { | ||
| const k = referencedSpeciesIndices[ri]; | ||
| rateContext[referencedSpeciesNames[ri]] = odeUsesAmountState ? yIn[k] : (yIn[k] * speciesVolumes[k]); |
| } else { | ||
| this.computeUncoveredTargetNodes(); | ||
| } | ||
| const targetCandidateCount = this.frontierSize; |
|
|
||
| private matchComponentsLarge(pMolIdx: number, tMolIdx: number): Map<number, number> | null { | ||
| const patternMol = this.pattern.molecules[pMolIdx]; | ||
| const targetMol = this.target.molecules[tMolIdx]; |
| if (this._maxDegree !== undefined) return this._maxDegree; | ||
| let max = 0; | ||
| for (let idx = 0; idx < this.molecules.length; idx++) { | ||
| let deg = 0; |
…ule count, breaking bonded multi-molecule matching
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Performance & correctness hardening: network generation + ODE/SSA/PSA/PLA simulators
Summary
This PR optimizes the hot paths in rule-based network generation and the deterministic/stochastic simulators (ODE, SSA, PSA, PLA), and fixes two latent correctness bugs that the optimizations exposed. The engine is now substantially faster on large models while producing bit-identical results, with the one exception being intentional correctness fixes to canonicalization for very large or self-bonded species. NFsim is untouched and out of scope.
The work is split into 8 commits. Two of them (4→5 and 6→7) are correctness-fix pairs where the second commit repairs a bug introduced by the first — if this series is rebased or cherry-picked rather than squashed, the fix commits must travel with their predecessors (see Correctness fixes).
What's in this PR
BoundedVerifier.ts(#570)Files touched (engine only):
graph/NetworkGenerator.ts,graph/core/Matcher.ts,graph/core/SpeciesGraph.ts,graph/core/Canonical.ts,simulation/SimulationLoop.ts,simulation/PSASimulator.ts,simulation/PLASimulator.ts,simulation/ExpressionEvaluator.ts,multiscale/MultiscaleSimulation.ts,utils/fenwickTree.ts(new),verification/BoundedVerifier.ts, andtests/nauty-canonicalization.spec.ts(regression tests).Correctness fixes
These are the parts a reviewer should read closely. Everything else is performance-only and behavior-preserving.
C-1 — WL integer-encoding overflow (P0, silent species mis-dedup) — commit 7 fixes commit 6
Commit 6 sped up Weisfeiler-Leman color refinement in
Canonical.tsby replacing per-edge string signatures with a packed integer:(compNameId << 20) | (partnerNameId << 10) | prevColor. The 10-bit color field overflows when a color rank reaches 1024 — i.e. for any species with ≥1024 molecules — spilling into the partner-name field, corrupting the refinement signature, and producing a wrong canonical label. Because canonical labels drive species identity, two non-isomorphic species could receive the same label and be silently merged during network generation — a wrong network with no error raised.Commit 7 replaces the shift-based packing with a multiplication-based encoding sized to the actual data:
(compNameId * numNames + partnerNameId) * colorRange + prevColor, wherecolorRange = numMols. This runs in full IEEE-754 integer precision (max value ≈numNames² · numMols, far under 2⁵³ for any model that fits in memory) and is injective on the same(localName, partnerName, partnerColor)triple the original string key encoded — so it produces the identical refinement partition and canonical labeling as the pre-optimization baseline, just faster.Implication for merge: commit 6 in isolation is unsafe for large species. Keep 6 and 7 together.
C-2 / C-3 — also commit 7 (introduced by commit 6's refactor)
The same commit-6 migration to a flat bond list changed the WL inner-loop access pattern to scan all bonds for every molecule — O(M·E) per refinement iteration instead of O(C+E). Commit 7 precomputes a per-molecule incident-edge list once before the iteration loop, restoring linear cost (this matters precisely on the large species the optimization was meant to help). The incident-list builder also pushes both endpoints of every bond unconditionally, fixing a separate regression where intramolecular (self) bonds contributed only one half-edge to the WL signature instead of two.
X-1 — component bitmask overflow guard — commit 5 fixes commit 4
Commit 4 replaced the string-keyed
usedTargetsset in component matching with a 32-bit bitmask. Commit 5 adds the guard for molecules with >31 components: those paths fall back to a Set-basedmatchComponentsLarge/assignComponentsBacktrackLarge(faithful replicas of the bitmask backtracking), avoiding the same class of shift overflow. Same merge implication: 4 needs 5.Performance changes
Network generation (
NetworkGenerator.ts,Matcher.ts,SpeciesGraph.ts,Canonical.ts)console.logand a debug trace from the per-species hot loop.queue.shift()(O(n) per dequeue → O(n²) total) replaced with index-based access in both the main species loop and the per-rule BFS.matchPartnersRecursively(seed from the rarest molecule type).Int32Arraywith aUint8Arrayfrontier bitset (noMap/Setallocation per search node); a lazily-cachedSpeciesGraph.neighborListgives zero-alloc neighbor reads; the match cache returns itsMatchMaparrays directly instead of deep-copying on every hit;canPossiblyMatchreads cached topological aggregates (molTypeCounts/bondCount/boundCompCount/maxDegree).usedTargetsinteger bitmask key (replacing a sorted-joined string key),Int32Arrayassignment scratch, component priority order precomputed once per pattern molecule, MCV heuristic skipped for ≤4 remaining components, and the match cache checked before the structural prefilter (the prefilter is redundant work on a cache hit).SpeciesGraph.bondList(Int32Arrayof[m1,c1,m2,c2]tuples) eliminates repeated adjacency string-key parsing (indexOf/substring/parseInt) at every consumption site, and the WL refinement uses numeric edge keys + numeric sort (see C-1 for the corrected encoding).Simulation (
SimulationLoop.ts,PSASimulator.ts,PLASimulator.ts,ExpressionEvaluator.ts,utils/fenwickTree.ts)FenwickTree(binary-indexed tree) gives O(log R) weighted selection and O(log R) incremental updates, replacing the O(R) cumulative-sum scan;highBitis precomputed in the constructor.kEff(volume scaling folded in once rather than per event); the incremental-propensity recompute is reused for DIN/influence tracking instead of evaluating twice.aTotmaintenance (with periodic resync) replaces a full O(R) re-sum per event; a deadif/elsebranch was collapsed.uniqReactantIdx/uniqReactantStoich, no per-callMapallocation) andcomputeTauuses sparsenzSpecies/nzChangefor O(nnz) instead of O(species·reactions).Object.defineProperty(acrossSimulationLoop/PSA/PLA/Multiscale); only species actually referenced by a functional rate are written to the rate context per derivative;ridxkeys are precomputed.collectReferencedSlots) and copies only those — O(referenced) instead of O(V) per call. Output is bit-identical; only unreferenced slots (which the bytecode never reads) are left unpopulated.MAX_INFLUENCE_REACTIONS = 5000short-circuits the denseFloat64Array(R²)influence-matrix allocation on large networks and disables influence tracking with a warning rather than failing at allocation.Testing
tests/nauty-canonicalization.spec.ts: a 1025-molecule linear homopolymer (regression for the C-1 overflow) and a single molecule with an intramolecular bond (regression for the C-3 self-bond half-edge). Both assert canonical labels match the reference, locking the correctness fixes against future refactors of the canonicalization path.Review focus
The subtle, correctness-critical spots, in priority order:
ExpressionEvaluator.tscollectReferencedSlots— its per-opcode operand-width decoding (PUSH_CONST = 8 bytes, PUSH_SPEC/PUSH_OBS = 4-byte Int32, all others operand-free) must match the interpreter's own switch and the bytecode compiler's emit sites exactly, orpcmisaligns. Verified against both; confirm if the opcode set changes.Canonical.tsWL encoding — the multiplication-based key (C-1) and its injectivity argument (color rank always< numMols = colorRange).Matcher.tsmatchComponentsLarge/assignComponentsBacktrackLarge— the Set-based fallback (X-1) must faithfully mirror the bitmask backtracking; the>31guard is the entry condition.SpeciesGraph.bondListconsumers —bondListis undirected (one entry per bond), so any consumer needing both directions (e.g. the WL incident list, the canonicalization adjacency builder) must push both endpoints; verify none silently dropped a direction.Out of scope / follow-ups