[smoke] V32 determinism real-run validation #86

@cunninghambe

Determinism real-run validation

Fixture: race-bad (plain Node HTTP server — 5 intentionally buggy API routes)
Seed: 42
Frozen clock: 2026-05-02T12:00:00Z
Mode: --seed + --frozen-clock (partial determinism — no --frozen-network)

Setup: SurfaceMCP (openapi stack, port 3125) fronted by a hand-authored openapi.json for the race-bad routes, giving BugHunter 7 tools and 25 planned API tests per run. No browser adapter.

| Metric | Run A | Run B |
| --- | --- | --- |
| Run ID | det-42-p37cv6gyzje3k8o1j8fwan7m | det-42-p3ob683wq9vyib48ysvhj6la |
| Duration | 7863 ms | 13843 ms |
| Bugs found | 0 | 0 |
| bugs.jsonl SHA256 | e3b0c44298fc1c149afbf4c8996fb924... (empty file) | e3b0c44298fc1c149afbf4c8996fb924... (empty file) |

Match: YES
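
The bugs.jsonl comparison above can be reproduced with a short script. This is a minimal sketch, not BugHunter's own tooling: it assumes each run writes its bugs.jsonl into its own output directory, and the names sha256_file and bugs_match are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bugs_match(run_a_dir: Path, run_b_dir: Path) -> bool:
    """Compare bugs.jsonl digests from two run directories.

    Assumption: each run directory contains a file named bugs.jsonl.
    """
    return sha256_file(run_a_dir / "bugs.jsonl") == sha256_file(run_b_dir / "bugs.jsonl")
```

An empty file always hashes to a digest beginning e3b0c44298fc1c14..., which is why both runs report the same value here. A match on two empty files confirms only that neither run emitted a bug.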

Canonical summary.json hash (post-strip, both runs)

9c5ea3362c04efb4a4fbf7495ece90cb014e814a0744554c71dc8d17a8747faf
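
For readers who want to see what a post-strip hash involves, the sketch below shows one way to compute it. It rests on assumptions: the source confirms only that actualRuntimeMs is stripped (§6.5), so including runId in the stripped set is a guess based on the two runs sharing one hash, and the serialization (sorted keys, compact separators, UTF-8) is a common convention rather than BugHunter's documented one. It will therefore not necessarily reproduce the digest above.

```python
import hashlib
import json

# Assumption: fields removed before hashing. Only actualRuntimeMs is confirmed
# by the report (per §6.5); runId is included here as a guess.
VOLATILE_FIELDS = frozenset({"actualRuntimeMs", "runId"})


def strip_volatile(node):
    """Recursively drop volatile keys from nested dicts and lists."""
    if isinstance(node, dict):
        return {k: strip_volatile(v) for k, v in node.items() if k not in VOLATILE_FIELDS}
    if isinstance(node, list):
        return [strip_volatile(v) for v in node]
    return node


def canonical_hash(summary: dict) -> str:
    """Hash a summary after stripping volatile fields.

    Keys are sorted and whitespace removed so that key order and formatting
    do not affect the digest (an assumed canonical form).
    """
    canonical = json.dumps(
        strip_volatile(summary),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two summaries that differ only in the stripped fields yield the same digest, while any other difference changes it.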

Fields that differ before canonical strip

| Field | Run A | Run B | Spec treatment |
| --- | --- | --- | --- |
| actualRuntimeMs | 7863 | 13843 | Stripped per §6.5; wall-clock derived, expected to vary |
| runId | det-42-p37... | det-42-p3o... | By design; each run mints a new seeded cuid2 ID |

All other fields across all 34 summary.json leaf nodes are byte-identical between runs.
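
A leaf-by-leaf comparison like this one can be sketched as follows. The helper names are hypothetical, and the path notation (dots for object keys, brackets for array indices) is an arbitrary choice. Note that it compares parsed JSON values, not raw bytes, so it is a looser check than byte identity.

```python
def flatten_leaves(node, prefix=""):
    """Map each leaf's path (e.g. 'stats.tools[0]') to its value."""
    if isinstance(node, dict):
        items = ((f"{prefix}.{k}" if prefix else str(k), v) for k, v in node.items())
    elif isinstance(node, list):
        items = ((f"{prefix}[{i}]", v) for i, v in enumerate(node))
    else:
        return {prefix: node}
    leaves = {}
    for path, value in items:
        leaves.update(flatten_leaves(value, path))
    return leaves


def differing_leaves(a, b):
    """Return the sorted paths whose values differ or exist in only one input."""
    leaves_a, leaves_b = flatten_leaves(a), flatten_leaves(b)
    missing = object()  # sentinel so a missing path never equals a real value
    return sorted(
        path
        for path in leaves_a.keys() | leaves_b.keys()
        if leaves_a.get(path, missing) != leaves_b.get(path, missing)
    )
```

Run against the two parsed summary.json files, a result of ["actualRuntimeMs", "runId"] would correspond to the table above.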

Non-determinism analysis

Two fields vary between runs, and both are expected. actualRuntimeMs is wall-clock derived and is explicitly excluded from the canonical hash per §6.5. runId differs because the seeded factory advances its counter across invocations: IDs are deterministic within a run, but each new runCommand() call begins a fresh counter sequence. This is the documented OQ-6 behaviour.

No unexpected non-determinism sources detected.

Caveats

  • Race-bad bugs (race_condition_double_submit, race_condition_click_navigate, etc.) require concurrent browser-driven requests and do not fire in API-only mode. The 25 planned tests are pure API palette calls (happy/edge/out_of_bounds); 0 bugs were found. The determinism property is still valid — identical empty bugs.jsonl across both runs.
  • --frozen-network was not used; network responses to the live fixture vary in latency (absorbed into actualRuntimeMs, stripped from hash). Full byte-identity per §6.1 requires all three flags.
  • SurfaceMCP's express extractor does not recognise plain Node http.createServer source, so a hand-authored openapi.json was added at /root/BugHunter/fixtures/race-bad/openapi.json to enable route discovery.

Conclusion

V32 deterministic mode (--seed + --frozen-clock) produces identical canonical output across two consecutive runs against the race-bad fixture: the post-strip summary.json hash matches and the bugs.jsonl SHA-256 is stable. The two fields that differ (actualRuntimeMs, runId) both behave per spec: the first is explicitly stripped from the canonical hash, and the second differs by design.
