Found in smoke #8
SurfaceMCP only serves `surfaces[0]` from its config. The deliberate-bugs fixture has 6 surfaces (self-spa Vite + self-api OpenAPI + race-bad + idor-bad + pen-bad + 2 static). Smoke #4-#7 had `self-spa` first → got 6 UI kinds, 0 API kinds. PR #119 swapped to `self-api` first to populate `toolCount > 0` → smoke #8 got 5 API kinds, 0 UI kinds.
Net: trade-off, not addition. Combined potential is 6 UI + 5 API = 11 distinct kinds across the same fixture, but no single SurfaceMCP run can serve both.
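The arithmetic above can be sketched as a set union. This is illustrative only: the kind names are placeholders (the smoke runs report only counts), and it assumes the UI and API kinds do not overlap, as the "11 distinct kinds" figure implies.

```typescript
// Placeholder kind names: only the counts (6 UI, 5 API) come from the smoke runs.
const uiKinds = new Set<string>(
  Array.from({ length: 6 }, (_, i) => `ui-kind-${i + 1}`),
);
const apiKinds = new Set<string>(
  Array.from({ length: 5 }, (_, i) => `api-kind-${i + 1}`),
);

// Union of the kinds found across several runs.
function unionKinds(...runs: Set<string>[]): Set<string> {
  const all = new Set<string>();
  for (const run of runs) {
    run.forEach((kind) => all.add(kind));
  }
  return all;
}

// Today: each run sees only one surface, so the best single run caps at 6.
const bestSingleRun = Math.max(uiKinds.size, apiKinds.size);

// If one run could see both surfaces, the kinds would add up to 11.
const combined = unionKinds(uiKinds, apiKinds).size;

console.log({ bestSingleRun, combined });
```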
Three resolution paths
A. Multi-surface SurfaceMCP — extend `/root/SurfaceMCP` to serve all surfaces from one process under different mount paths or via a routing layer. Most invasive (touches a different repo) but cleanest UX.
B. Multi-instance, multi-port — run N SurfaceMCP instances, one per surface, on adjacent ports. Update fixture's `bughunter.config.json` to support an array of `surfaceMcpUrl` values. BugHunter aggregates tool lists from all configured URLs. Bounded inside BugHunter.
C. Per-stack composite extractor in SurfaceMCP — add a synthetic stack type `composite` that itself recursively walks an array of sub-surfaces. Cleanest in the fixture config, requires SurfaceMCP work.
Recommendation: B. The change inside BugHunter is small: `surfaceMcpUrl` becomes `string | string[]` and the adapter loops over the configured URLs. It uses what SurfaceMCP already does without extending it. The fixture orchestrator (`bin/up.sh`) already starts one server per fixture port; it would also need to start one SurfaceMCP instance per surface.
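A minimal sketch of what option B could look like on the BugHunter side. Only `surfaceMcpUrl` and its `string | string[]` type come from this issue; `BugHunterConfig`, `Tool`, `ListTools`, the function names, and the first-instance-wins rule on name collisions are assumptions for illustration, not the existing adapter.

```typescript
// Proposed setting type: a single URL (today) or one URL per surface.
type SurfaceMcpUrlSetting = string | string[];

// Hypothetical config shape; only `surfaceMcpUrl` is named in the issue.
interface BugHunterConfig {
  surfaceMcpUrl: SurfaceMcpUrlSetting;
}

// Hypothetical tool record; `sourceUrl` tracks which instance served it.
interface Tool {
  name: string;
  sourceUrl: string;
}

// Accept both forms and drop duplicate URLs so no instance is queried twice.
function normalizeSurfaceMcpUrls(setting: SurfaceMcpUrlSetting): string[] {
  const urls = Array.isArray(setting) ? setting : [setting];
  return Array.from(new Set(urls));
}

// Merge per-instance tool lists. Assumption: on a name collision the tool
// from the earlier URL is kept.
function mergeToolLists(lists: Tool[][]): Tool[] {
  const byName = new Map<string, Tool>();
  for (const list of lists) {
    for (const tool of list) {
      if (!byName.has(tool.name)) {
        byName.set(tool.name, tool);
      }
    }
  }
  return Array.from(byName.values());
}

// The per-URL tool listing is injected, so this sketch does not assume any
// particular SurfaceMCP client API.
type ListTools = (url: string) => Promise<Tool[]>;

async function aggregateTools(
  config: BugHunterConfig,
  listTools: ListTools,
): Promise<Tool[]> {
  const urls = normalizeSurfaceMcpUrls(config.surfaceMcpUrl);
  const lists = await Promise.all(urls.map((url) => listTools(url)));
  return mergeToolLists(lists);
}
```

Existing single-URL configs keep working unchanged because a bare string normalizes to a one-element array. Whether colliding tool names should be dropped, prefixed per surface, or treated as an error is an open design choice; the sketch picks the simplest rule.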
Why this matters
Without this, the V33 self-test will permanently underreport recall because no run can exercise both UI-layer and API-layer detectors against the full fixture. Calibration will reflect the limitation, not the system.
Priority
High. Recall measurement is unreliable until this is resolved, which blocks an honest answer to "what fraction of the 105 wired detectors actually fire on the gold fixture?"