From da403b8d7667112f8df83db56dae5be4c9b381cf Mon Sep 17 00:00:00 2001
From: Ed Heltzel <402910+edheltzel@users.noreply.github.com>
Date: Thu, 11 Jun 2026 05:38:34 -0400
Subject: [PATCH 1/2] feat(benchmarks): add Suite C precision-under-noise
 benchmark

Builds seeded deterministic corpora (100/1k/10k/100k records) in a temp DB
using the real schema and write paths, then measures the real FTS5 search()
path against a ground-truth-labeled query set (exact lookup, paraphrase,
problem lookup, ambiguous-with-collisions). Reports P@5, R@5, MRR@5, and
latency p50/p95 per corpus size, with breakdowns by query category, target
table, and provenance (#42). Near-duplicate noise exercises the gap that
dedup lineage (#45) exists to close; dedup is intentionally not run, so
the baseline records the unmitigated behavior.

Baseline-first per the issue: no pass/fail threshold. The env overrides
RECALL_BENCH_C_SIZES / RECALL_BENCH_C_REPEATS set the corpus ladder and
repeat count, which keeps CI fast.
---
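Reviewer note (not part of the commit message): a worked example of the
three retrieval metrics may help when reading the baseline numbers. The
sketch below is a standalone mirror of the definitions in
suite-c-internals.ts, not an import from it. The row IDs and the ranking
are hypothetical and exist only to make the arithmetic visible.

```typescript
// Standalone mirror of the Suite C metric definitions (assumption: kept in
// sync by hand with suite-c-internals.ts; nothing here touches the real DB).
interface RetrievedRef {
  table: string;
  id: number;
}

const refKey = (table: string, id: number): string => `${table}#${id}`;

// Fraction of the top-k that is relevant. The divisor is k, not the number
// of results actually returned.
function precisionAtK(retrieved: RetrievedRef[], relevant: Set<string>, k: number): number {
  if (k <= 0) return 0;
  const hits = retrieved.slice(0, k).filter((r) => relevant.has(refKey(r.table, r.id))).length;
  return hits / k;
}

// Fraction of all relevant records that appear in the top-k.
function recallAtK(retrieved: RetrievedRef[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 0;
  const top = new Set(retrieved.slice(0, k).map((r) => refKey(r.table, r.id)));
  let hits = 0;
  for (const key of relevant) {
    if (top.has(key)) hits++;
  }
  return hits / relevant.size;
}

// 1/rank of the first relevant result within the top-k; 0 if none.
function reciprocalRank(retrieved: RetrievedRef[], relevant: Set<string>, k: number): number {
  const top = retrieved.slice(0, k);
  for (let i = 0; i < top.length; i++) {
    if (relevant.has(refKey(top[i].table, top[i].id))) return 1 / (i + 1);
  }
  return 0;
}

// Hypothetical ranking for a query with three relevant records.
const relevant = new Set(['decisions#11', 'decisions#12', 'loa#3']);
const retrieved: RetrievedRef[] = [
  { table: 'messages', id: 90 }, // noise
  { table: 'decisions', id: 11 }, // relevant, rank 2
  { table: 'breadcrumbs', id: 7 }, // noise
  { table: 'loa', id: 3 }, // relevant, rank 4
  { table: 'messages', id: 41 }, // noise
];

const p5 = precisionAtK(retrieved, relevant, 5); // 2 hits / 5
const r5 = recallAtK(retrieved, relevant, 5); // 2 of 3 relevant found
const rr = reciprocalRank(retrieved, relevant, 5); // first hit at rank 2

// A query with a single expected record cannot exceed 1/5, even on a
// perfect rank-1 hit, because the divisor stays at k.
const singleTargetCeiling = precisionAtK(
  [{ table: 'decisions', id: 11 }],
  new Set(['decisions#11']),
  5,
);

console.log({ p5, r5, rr, singleTargetCeiling });
```

One consequence for reading the report: most queries in the fixture set
label exactly one expected record, so their P@5 tops out at 0.2. R@5 and
MRR@5 are the more informative columns for those queries.
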
 benchmarks/README.md                         |  17 +-
 benchmarks/runner.ts                         |   6 +-
 benchmarks/suites/suite-c-internals.ts       | 666 +++++++++++++++++++
 benchmarks/suites/suite-c-precision-noise.ts | 230 +++++++
 src/commands/benchmark.ts                    |   2 +-
 tests/benchmarks/suite-b.test.ts             |  15 +-
 tests/benchmarks/suite-c.test.ts             | 329 +++++++++
 7 files changed, 1257 insertions(+), 8 deletions(-)
 create mode 100644 benchmarks/suites/suite-c-internals.ts
 create mode 100644 benchmarks/suites/suite-c-precision-noise.ts
 create mode 100644 tests/benchmarks/suite-c.test.ts
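
Reviewer note: the determinism claim rests on one seeded PRNG driving
both noise generation and the shuffle. A minimal sketch of that property
follows. mulberry32 is copied from suite-c-internals.ts; seededShuffle is
a hypothetical helper that uses a fresh generator per call, whereas the
patch reuses the generator that produced the noise records.

```typescript
// Copied from suite-c-internals.ts in this patch.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a |= 0;
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: seeded Fisher-Yates over a copy of the input.
function seededShuffle<T>(items: readonly T[], seed: number): T[] {
  const out = [...items];
  const rng = mulberry32(seed);
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Illustrative keys only; the real corpus holds FixtureRecord objects.
const keys = ['t_a', 't_b', 'noise_0', 'noise_1', 'noise_2', 'noise_3'];
const first = seededShuffle(keys, 47);
const second = seededShuffle(keys, 47);

// Same seed, same input: identical order on every run.
const sameOrder = JSON.stringify(first) === JSON.stringify(second);
// The shuffle only permutes; no key is dropped or duplicated.
const samePopulation =
  JSON.stringify([...first].sort()) === JSON.stringify([...keys].sort());
// Every draw lies in [0, 1), which keeps the index arithmetic in bounds.
const draw = mulberry32(47)();

console.log({ sameOrder, samePopulation, draw });
```

The generator uses only 32-bit integer operations (Math.imul, shifts,
xor), which is why the sequence should not vary across platforms.
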

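Reviewer note: latency p50/p95 use the nearest-rank percentile, which
picks an observed sample rather than interpolating. The sketch below
mirrors percentile() from suite-c-internals.ts; the latency values are
made up for illustration.

```typescript
// Standalone mirror of percentile() from suite-c-internals.ts.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  // Nearest-rank: the ceil(p% * n)-th smallest sample, clamped to the array.
  const idx = Math.min(sorted.length - 1, Math.max(0, Math.ceil((p / 100) * sorted.length) - 1));
  return sorted[idx];
}

// Hypothetical latencies in milliseconds, with one slow outlier.
const latenciesMs = [12, 3, 7, 5, 40, 6, 4, 9, 8, 5];
// Sorted: [3, 4, 5, 5, 6, 7, 8, 9, 12, 40]

const p50 = percentile(latenciesMs, 50); // rank ceil(5.0) = 5, the 5th smallest
const p95 = percentile(latenciesMs, 95); // rank ceil(9.5) = 10, the largest sample

console.log({ p50, p95 });
```

With few samples, p95 is simply the maximum, so a single slow call sets
it. Assuming the default 14 queries and 5 repeats, each corpus size
yields 70 measured calls, and p95 is the 67th smallest of them.
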
diff --git a/benchmarks/README.md b/benchmarks/README.md
index 4a8256b..a5d75db 100644
--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -1,6 +1,6 @@
 # Recall Benchmarks (Phase 2)
 
-> Status: Suite B (token efficiency) implemented. Suites A / C / D / E are scaffolded in the runner but not yet built. See `.atlas/plans/2026-04-17-mempalace-research-borrow-list.md` for the full Phase 2 design.
+> Status: Suite B (token efficiency) and Suite C (precision under noise) implemented. Suites A / D / E are scaffolded in the runner but not yet built. See `.atlas/plans/2026-04-17-mempalace-research-borrow-list.md` for the full Phase 2 design.
 
 ## Why this exists
 
@@ -51,10 +51,23 @@ Each run writes two files to `benchmarks/results/`:
 |---|---|---|---|
 | A | Cross-session recall | Planned | Retrieval@5 + answer accuracy across N-session synthetic gaps |
 | B | Token efficiency | **Built** | Wake-up bundle char/token cost vs v1 baseline and CLAUDE.md |
-| C | Precision under noise | Planned | Precision@5 and latency at corpus sizes 100 / 1k / 10k / 100k |
+| C | Precision under noise | **Built** | P@5 / R@5 / MRR@5 + latency p50/p95 at corpus sizes 100 / 1k / 10k / 100k |
 | D | Structured-knowledge fidelity | Planned | Supersession correctness, LoA elevation in mixed results |
 | E | Real-world replay | Planned | Help-rate and wrong-direction-rate on anonymized session history |
 
+## Suite C methodology — precision under noise
+
+Suite C answers one question: **when the database is full of junk, does `search()` still surface the right record?**
+
+- **Corpus.** For each size in the ladder (default 100 / 1,000 / 10,000 / 100,000 records), a synthetic corpus is built in a temporary DB using the real schema (`initDb()`) and the real write paths from `src/lib/memory.ts`, so FTS triggers populate exactly as in production. The user's real DB is never touched.
+- **Determinism.** Fixture generation is seeded (mulberry32, default seed 47). The same seed and size produce byte-identical record content, so runs are comparable across machines and over time. Tests assert this.
+- **Ground truth.** A fixed set of target records (constant across sizes) carries labels: table, project, and provenance. The rest of the corpus is noise in three roles: near-duplicates of targets (the precision trap), entity-name collisions, and low-signal filler. Noise spans all five searchable tables, including messages.
+- **Queries.** Four labeled categories: exact project/name lookup, paraphrased decision lookup, learning/problem lookup, and noisy ambiguous queries. Ambiguous queries carry explicit collision labels (name / project / topic) so failures can be attributed to entity ambiguity vs generic ranking noise.
+- **Metrics.** Precision@5, Recall@5, and MRR@5 per corpus size, plus breakdowns by query category, by ground-truth table (`r_at_5_table_*`), and by provenance (`r_at_5_prov_*`). No composite scores, per the methodology rules.
+- **Latency.** One unmeasured warmup pass per corpus size, then 5 measured repeats per query on a warm connection; p50/p95 are computed across all measured calls at that size. The report caveats state the protocol and whether the embedding service was available — Suite C exercises the FTS5 keyword path only.
+- **Baseline-first.** The first run records an honest baseline; there is no pass/fail threshold. Later regression gating can diff runs against the checked-in baseline JSONL in `benchmarks/results/`.
+- **Overrides.** `RECALL_BENCH_C_SIZES` (comma-separated) and `RECALL_BENCH_C_REPEATS` override the corpus ladder and repeat count — used by tests to keep CI fast; leave unset for comparable real runs.
+
 ## Adding a new suite
 
 1. Create `benchmarks/suites/suite-<id>-<name>.ts` exporting `runSuite<id>(): Promise<SuiteResult>`.
diff --git a/benchmarks/runner.ts b/benchmarks/runner.ts
index 366a457..9e81d4b 100644
--- a/benchmarks/runner.ts
+++ b/benchmarks/runner.ts
@@ -11,6 +11,7 @@
 import { mkdirSync, writeFileSync, existsSync } from 'fs';
 import { join, dirname } from 'path';
 import { runSuiteB } from './suites/suite-b-token-efficiency.js';
+import { runSuiteC } from './suites/suite-c-precision-noise.js';
 import type { RunResult, SuiteResult, SuiteId } from './types.js';
 
 const RESULTS_DIR = join(import.meta.dir, 'results');
@@ -38,8 +39,11 @@ async function dispatchSuite(suite: SuiteId, project?: string): Promise<SuiteRes
   switch (suite) {
     case 'B':
       return runSuiteB({ project });
-    case 'A':
     case 'C':
+      // Suite C builds its own synthetic corpora — the project scope does
+      // not apply to it.
+      return runSuiteC();
+    case 'A':
     case 'D':
     case 'E':
       // Stub — these suites are planned but not implemented in this slice.
diff --git a/benchmarks/suites/suite-c-internals.ts b/benchmarks/suites/suite-c-internals.ts
new file mode 100644
index 0000000..1cfc53d
--- /dev/null
+++ b/benchmarks/suites/suite-c-internals.ts
@@ -0,0 +1,666 @@
+// Suite C internals — deterministic fixture corpus, ground-truth query set,
+// and retrieval metric helpers.
+//
+// Unlike Suite B (which measures the user's real DB read-only), Suite C builds
+// a synthetic corpus in a temporary DB and exercises the REAL write and search
+// paths from src/ — initDb() creates the production schema (FTS triggers
+// included), records are inserted through src/lib/memory.ts, and queries run
+// through search(). Measuring a reimplementation of search would measure a
+// strawman; the point is to benchmark the code users actually run.
+//
+// Everything here is seeded and deterministic: the same (seed, size) pair
+// always produces byte-identical record content and the same query set, so
+// runs are comparable across machines and over time.
+
+import {
+  createSession,
+  addDecision,
+  addLearning,
+  addBreadcrumb,
+  addMessagesBatch,
+  createLoaEntry,
+} from '../../src/lib/memory.js';
+import { getDb } from '../../src/db/connection.js';
+import type { Message, Provenance } from '../../src/types/index.js';
+
+/** Retrieval cutoff — all metrics are @5 (P@5, R@5, MRR@5). */
+export const K = 5;
+
+/** Default seed. 47 = the tracking issue number; any fixed value works. */
+export const DEFAULT_SEED = 47;
+
+// ── Seeded PRNG ──────────────────────────────────────────────────────
+// mulberry32 — tiny, well-distributed, deterministic across platforms.
+
+export function mulberry32(seed: number): () => number {
+  let a = seed >>> 0;
+  return () => {
+    a |= 0;
+    a = (a + 0x6d2b79f5) | 0;
+    let t = Math.imul(a ^ (a >>> 15), 1 | a);
+    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
+    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
+  };
+}
+
+// ── Fixture types ────────────────────────────────────────────────────
+
+export type FixtureTable = 'decisions' | 'learnings' | 'breadcrumbs' | 'loa_entries' | 'messages';
+
+export type FixtureRole = 'target' | 'near_duplicate' | 'entity_collision' | 'low_signal';
+
+export interface FixtureRecord {
+  /** Stable key — ground-truth queries reference target keys. */
+  key: string;
+  table: FixtureTable;
+  project: string;
+  provenance: Provenance;
+  /** Primary text (decision / problem / content / LoA title). */
+  text: string;
+  /** Secondary text (reasoning / solution / LoA fabric_extract). */
+  detail?: string;
+  importance: number;
+  /** Decisions/learnings only — search() orders decisions by confidence before rank. */
+  confidence?: 'high' | 'medium' | 'low';
+  role: FixtureRole;
+}
+
+export type QueryCategory = 'exact_lookup' | 'paraphrase' | 'problem_lookup' | 'ambiguous';
+
+export interface FixtureQuery {
+  id: string;
+  text: string;
+  category: QueryCategory;
+  /** Optional project filter passed to search(), mirroring scoped lookups. */
+  project?: string;
+  /** Keys of target records that are relevant to this query (ground truth). */
+  expected: string[];
+  /** Ambiguous queries only — labels the collision so failures can be attributed. */
+  collision?: { kind: 'name' | 'project' | 'topic'; note: string };
+}
+
+export interface FixtureSpec {
+  seed: number;
+  size: number;
+  records: FixtureRecord[];
+  queries: FixtureQuery[];
+}
+
+export interface SeededRecord {
+  key: string;
+  table: FixtureTable;
+  id: number;
+  project: string;
+  provenance: Provenance;
+}
+
+// ── Ground-truth targets ─────────────────────────────────────────────
+// Fixed records present at EVERY corpus size, so every query has answers and
+// scores are comparable across sizes. Entity names are invented (ZephyrQueue,
+// GlacierStore, HeliosParser, Atlas) so real-world content can't collide.
+// Confidence stays 'medium' on decision targets — search() ranks decisions by
+// confidence before FTS rank, and inflating our own targets would flatter the
+// baseline.
+
+const PROJECT_PHOENIX = 'phoenix-api';
+const PROJECT_ATLAS = 'atlas-web';
+const PROJECT_NIMBUS = 'nimbus-cli';
+
+const TARGETS: FixtureRecord[] = [
+  {
+    key: 't_zephyr_backoff',
+    table: 'decisions',
+    project: PROJECT_PHOENIX,
+    provenance: 'user_authored',
+    text: 'Use exponential backoff with jitter for ZephyrQueue retry handling',
+    detail: 'Fixed-interval retries hammered the broker during outages; jitter spreads the reconnect storm.',
+    importance: 7,
+    confidence: 'medium',
+    role: 'target',
+  },
+  {
+    key: 't_glacier_compaction',
+    table: 'decisions',
+    project: PROJECT_NIMBUS,
+    provenance: 'extracted',
+    text: 'Schedule GlacierStore compaction nightly at 03:00 UTC',
+    detail: 'Daytime compaction competed with interactive workloads for disk I/O.',
+    importance: 6,
+    confidence: 'medium',
+    role: 'target',
+  },
+  {
+    key: 't_wal_journaling',
+    table: 'decisions',
+    project: PROJECT_PHOENIX,
+    provenance: 'user_authored',
+    text: 'Adopt SQLite WAL journal mode for the persistence layer',
+    detail: 'Allows concurrent readers while a single writer appends; the rollback journal blocked readers during writes.',
+    importance: 7,
+    confidence: 'medium',
+    role: 'target',
+  },
+  {
+    key: 't_atlas_phoenix',
+    table: 'decisions',
+    project: PROJECT_PHOENIX,
+    provenance: 'extracted',
+    text: 'Deploy Atlas gateway to the edge POPs before regional rollout',
+    detail: 'Atlas at the edge cuts p95 latency for auth handshakes.',
+    importance: 6,
+    confidence: 'medium',
+    role: 'target',
+  },
+  {
+    key: 't_atlas_web',
+    table: 'decisions',
+    project: PROJECT_ATLAS,
+    provenance: 'user_authored',
+    text: 'Rename the Atlas design tokens package to atlas-tokens',
+    detail: 'The old package name collided with the Atlas gateway component.',
+    importance: 5,
+    confidence: 'medium',
+    role: 'target',
+  },
+  {
+    key: 't_sqlite_busy',
+    table: 'learnings',
+    project: PROJECT_PHOENIX,
+    provenance: 'extracted',
+    text: 'SQLITE_BUSY errors under concurrent WAL writers in bun:sqlite',
+    detail: 'Serialize writes through a single connection; WAL allows many readers but only one writer.',
+    importance: 7,
+    confidence: 'medium',
+    role: 'target',
+  },
+  {
+    key: 't_orphan_worktree',
+    table: 'learnings',
+    project: PROJECT_NIMBUS,
+    provenance: 'extracted',
+    text: 'Orphaned git worktree branches accumulate after subagent merges',
+    detail: 'Delete the per-agent branch right after merging; lowercase -d refuses unmerged deletes.',
+    importance: 6,
+    confidence: 'medium',
+    role: 'target',
+  },
+  {
+    key: 't_timeout_ingest',
+    table: 'learnings',
+    project: PROJECT_PHOENIX,
+    provenance: 'derived',
+    text: 'Ingest pipeline timeout when the embedding service is cold',
+    detail: 'Warm the embedding service at startup and fail open to keyword search.',
+    importance: 6,
+    confidence: 'medium',
+    role: 'target',
+  },
+  {
+    key: 't_timeout_ui',
+    table: 'learnings',
+    project: PROJECT_ATLAS,
+    provenance: 'extracted',
+    text: 'Modal dismiss timeout races the navigation transition',
+    detail: 'Await the transition promise before starting the dismiss timer.',
+    importance: 5,
+    confidence: 'medium',
+    role: 'target',
+  },
+  {
+    key: 't_helios_tokenizer',
+    table: 'loa_entries',
+    project: PROJECT_PHOENIX,
+    provenance: 'verbatim',
+    text: 'HeliosParser streaming tokenizer design',
+    detail:
+      'HeliosParser tokenizes input incrementally so multi-megabyte payloads never buffer fully in memory. The lookahead window is bounded at 4KB and backpressure propagates to the source stream.',
+    importance: 8,
+    role: 'target',
+  },
+  {
+    key: 't_loa_retro',
+    table: 'loa_entries',
+    project: PROJECT_ATLAS,
+    provenance: 'extracted',
+    text: 'Q2 atlas-web performance retro',
+    detail:
+      'Bundle splitting halved initial load time. The Atlas tokens rename unblocked the design system release train.',
+    importance: 8,
+    role: 'target',
+  },
+  {
+    key: 't_breadcrumb_release',
+    table: 'breadcrumbs',
+    project: PROJECT_NIMBUS,
+    provenance: 'verbatim',
+    text: 'Release 0.9.3 tagged; GlacierStore migration gate passed on staging',
+    importance: 6,
+    role: 'target',
+  },
+];
+
+// ── Ground-truth query set ───────────────────────────────────────────
+// Four categories per the issue spec. Query texts avoid FTS5 syntax
+// characters (quotes, colons, parens) — they go into MATCH verbatim, exactly
+// as search() receives them from callers.
+
+const QUERIES: FixtureQuery[] = [
+  // Exact project/name lookups — entity name + topic terms, sometimes scoped.
+  {
+    id: 'q_exact_zephyr',
+    text: 'ZephyrQueue retry backoff',
+    category: 'exact_lookup',
+    project: PROJECT_PHOENIX,
+    expected: ['t_zephyr_backoff'],
+  },
+  {
+    id: 'q_exact_glacier',
+    text: 'GlacierStore compaction',
+    category: 'exact_lookup',
+    project: PROJECT_NIMBUS,
+    expected: ['t_glacier_compaction'],
+  },
+  {
+    id: 'q_exact_helios',
+    text: 'HeliosParser tokenizer',
+    category: 'exact_lookup',
+    expected: ['t_helios_tokenizer'],
+  },
+  {
+    id: 'q_exact_release',
+    text: 'GlacierStore migration gate',
+    category: 'exact_lookup',
+    project: PROJECT_NIMBUS,
+    expected: ['t_breadcrumb_release'],
+  },
+  // Paraphrased decision lookups — reworded intent. FTS5 MATCH is implicit
+  // AND with no stemming, so some of these are EXPECTED to miss on keyword
+  // search. That gap is part of the baseline this suite records.
+  {
+    id: 'q_para_wal',
+    text: 'journal mode concurrent readers',
+    category: 'paraphrase',
+    expected: ['t_wal_journaling'],
+  },
+  {
+    id: 'q_para_retry',
+    text: 'spread reconnect attempts after broker outages',
+    category: 'paraphrase',
+    expected: ['t_zephyr_backoff'],
+  },
+  {
+    id: 'q_para_compact',
+    text: 'when to run storage compaction',
+    category: 'paraphrase',
+    expected: ['t_glacier_compaction'],
+  },
+  // Learning/problem lookups — phrased the way an agent reports a failure.
+  {
+    id: 'q_prob_busy',
+    text: 'SQLITE_BUSY concurrent writers',
+    category: 'problem_lookup',
+    expected: ['t_sqlite_busy'],
+  },
+  {
+    id: 'q_prob_worktree',
+    text: 'orphaned worktree branches',
+    category: 'problem_lookup',
+    expected: ['t_orphan_worktree'],
+  },
+  {
+    id: 'q_prob_cold',
+    text: 'embedding service cold timeout',
+    category: 'problem_lookup',
+    expected: ['t_timeout_ingest'],
+  },
+  // Noisy ambiguous queries — labeled collisions so failures can be
+  // attributed to entity/name ambiguity vs generic ranking noise.
+  {
+    id: 'q_amb_atlas_name',
+    text: 'Atlas',
+    category: 'ambiguous',
+    expected: ['t_atlas_phoenix', 't_atlas_web', 't_loa_retro'],
+    collision: {
+      kind: 'name',
+      note: 'Atlas is both a gateway (phoenix-api) and a design-tokens package (atlas-web); the atlas-web PROJECT name also matches the term because project is an indexed FTS column, and collision noise mentions Atlas in unrelated content.',
+    },
+  },
+  {
+    id: 'q_amb_atlas_scoped',
+    text: 'Atlas gateway edge',
+    category: 'ambiguous',
+    project: PROJECT_PHOENIX,
+    expected: ['t_atlas_phoenix'],
+    collision: {
+      kind: 'name',
+      note: 'Same Atlas name collision, disambiguated by a project filter plus topic terms.',
+    },
+  },
+  {
+    id: 'q_amb_timeout',
+    text: 'timeout',
+    category: 'ambiguous',
+    expected: ['t_timeout_ingest', 't_timeout_ui'],
+    collision: {
+      kind: 'topic',
+      note: 'Two unrelated timeout learnings (ingest pipeline vs UI modal) plus collision noise mentioning timeouts.',
+    },
+  },
+  {
+    id: 'q_amb_timeout_scoped',
+    text: 'timeout',
+    category: 'ambiguous',
+    project: PROJECT_ATLAS,
+    expected: ['t_timeout_ui'],
+    collision: {
+      kind: 'project',
+      note: 'Same bare term as q_amb_timeout; the project filter is the only disambiguator.',
+    },
+  },
+];
+
+// ── Noise generation ─────────────────────────────────────────────────
+// Three noise roles:
+//   near_duplicate   — deterministic variants of target text. They compete
+//                      directly with targets in ranking but are NOT labeled
+//                      relevant: surfacing the variant instead of the record
+//                      the query asks about is a precision failure.
+//   entity_collision — target entity names embedded in unrelated content.
+//   low_signal       — generic filler an extraction pipeline accumulates.
+
+const NOISE_PROJECTS = [PROJECT_PHOENIX, PROJECT_ATLAS, PROJECT_NIMBUS, 'quartz-docs', 'ember-infra'];
+
+const ENTITY_NAMES = ['Atlas', 'ZephyrQueue', 'GlacierStore', 'HeliosParser', 'timeout', 'compaction'];
+
+const COLLISION_TEMPLATES = [
+  'Filed a ticket about ENTITY color contrast on the marketing splash page',
+  'Renamed the ENTITY spreadsheet tab in the quarterly planning doc',
+  'Standup note - ENTITY demo moved to Thursday',
+  'Asked design for new ENTITY stickers for the offsite',
+  'The ENTITY conference talk recording is up on the wiki',
+];
+
+const LOW_SIGNAL_SUBJECTS = [
+  'build pipeline', 'cache layer', 'release notes', 'standup notes',
+  'dependency bump', 'flaky test', 'lint warning', 'onboarding doc',
+  'dashboard widget', 'feature flag', 'log rotation', 'pager schedule',
+];
+
+const LOW_SIGNAL_VERBS = ['Investigated', 'Reviewed', 'Skimmed', 'Touched', 'Noted', 'Parked'];
+
+const LOW_SIGNAL_OUTCOMES = [
+  'no conclusion', 'follow-up later', 'seems fine', 'needs owner', 'low priority', 'waiting on infra',
+];
+
+const NEAR_DUP_SUFFIXES = [
+  'revisited after incident review',
+  'as discussed in the planning sync',
+  'pending final sign-off',
+  'copied from the old tracker',
+  'second occurrence this quarter',
+];
+
+function pick<T>(rng: () => number, pool: readonly T[]): T {
+  return pool[Math.floor(rng() * pool.length)];
+}
+
+function pickWeighted<T extends string>(rng: () => number, weights: Record<T, number>): T {
+  const entries = Object.entries(weights) as Array<[T, number]>;
+  let roll = rng();
+  for (const [value, weight] of entries) {
+    roll -= weight;
+    if (roll < 0) return value;
+  }
+  return entries[entries.length - 1][0];
+}
+
+const NOISE_TABLE_WEIGHTS: Record<FixtureTable, number> = {
+  messages: 0.4,
+  breadcrumbs: 0.25,
+  decisions: 0.15,
+  learnings: 0.15,
+  loa_entries: 0.05,
+};
+
+const NOISE_ROLE_WEIGHTS: Record<Exclude<FixtureRole, 'target'>, number> = {
+  low_signal: 0.7,
+  entity_collision: 0.2,
+  near_duplicate: 0.1,
+};
+
+const NOISE_PROVENANCE_WEIGHTS: Record<Provenance, number> = {
+  extracted: 0.6,
+  derived: 0.2,
+  verbatim: 0.1,
+  user_authored: 0.1,
+};
+
+function makeNoiseRecord(rng: () => number, index: number): FixtureRecord {
+  const table = pickWeighted(rng, NOISE_TABLE_WEIGHTS);
+  const role = pickWeighted(rng, NOISE_ROLE_WEIGHTS);
+  const provenance = pickWeighted(rng, NOISE_PROVENANCE_WEIGHTS);
+  const project = pick(rng, NOISE_PROJECTS);
+  const importance = 2 + Math.floor(rng() * 6); // 2..7
+
+  let text: string;
+  let detail: string | undefined;
+
+  if (role === 'near_duplicate') {
+    const target = pick(rng, TARGETS);
+    text = `${target.text} - ${pick(rng, NEAR_DUP_SUFFIXES)}`;
+    detail = target.detail;
+  } else if (role === 'entity_collision') {
+    const entity = pick(rng, ENTITY_NAMES);
+    text = `${pick(rng, COLLISION_TEMPLATES).replace('ENTITY', entity)} (n${index})`;
+  } else {
+    text = `${pick(rng, LOW_SIGNAL_VERBS)} ${pick(rng, LOW_SIGNAL_SUBJECTS)} drift; ${pick(rng, LOW_SIGNAL_OUTCOMES)} (n${index})`;
+  }
+
+  const record: FixtureRecord = {
+    key: `noise_${index}`,
+    table,
+    project,
+    provenance,
+    text,
+    detail,
+    importance,
+    role,
+  };
+
+  if (table === 'decisions' || table === 'learnings') {
+    const roll = rng();
+    record.confidence = roll < 0.15 ? 'high' : roll < 0.85 ? 'medium' : 'low';
+  }
+
+  return record;
+}
+
+/**
+ * Build the full deterministic fixture spec for one corpus size.
+ *
+ * Targets are constant across sizes; noise fills the remainder. Records are
+ * shuffled (seeded Fisher-Yates) so target rows scatter through the ID space
+ * instead of clustering at the start.
+ */
+export function generateFixtureSpec(seed: number, size: number): FixtureSpec {
+  if (size < TARGETS.length + QUERIES.length) {
+    throw new Error(`Suite C corpus size must be at least ${TARGETS.length + QUERIES.length}, got ${size}`);
+  }
+  const rng = mulberry32(seed);
+
+  const records: FixtureRecord[] = [...TARGETS];
+  const noiseCount = size - TARGETS.length;
+  for (let i = 0; i < noiseCount; i++) {
+    records.push(makeNoiseRecord(rng, i));
+  }
+
+  // Seeded Fisher-Yates shuffle.
+  for (let i = records.length - 1; i > 0; i--) {
+    const j = Math.floor(rng() * (i + 1));
+    [records[i], records[j]] = [records[j], records[i]];
+  }
+
+  return { seed, size, records, queries: QUERIES };
+}
+
+// ── Seeding ──────────────────────────────────────────────────────────
+
+const FIXTURE_SESSION_ID = 'suite-c-fixture';
+const SEED_EPOCH_MS = Date.UTC(2026, 0, 1); // fixed epoch — deterministic timestamps
+
+/**
+ * Insert a fixture spec into the CURRENT database (RECALL_DB_PATH must already
+ * point at an initDb()-initialized fixture DB). Uses the real write paths so
+ * FTS triggers populate exactly as they do in production.
+ *
+ * Returns target key → seeded row reference for ground-truth resolution.
+ */
+export function seedFixture(spec: FixtureSpec): Map<string, SeededRecord> {
+  const db = getDb();
+  const targets = new Map<string, SeededRecord>();
+
+  createSession({
+    session_id: FIXTURE_SESSION_ID,
+    started_at: new Date(SEED_EPOCH_MS).toISOString(),
+    project: PROJECT_PHOENIX,
+    source: 'suite-c-benchmark',
+  });
+
+  const messages: Array<Omit<Message, 'id'>> = [];
+  const structured = spec.records.filter((r) => {
+    if (r.table !== 'messages') return true;
+    messages.push({
+      session_id: FIXTURE_SESSION_ID,
+      timestamp: new Date(SEED_EPOCH_MS + messages.length * 60_000).toISOString(),
+      role: messages.length % 2 === 0 ? 'user' : 'assistant',
+      content: r.text,
+      project: r.project,
+      importance: r.importance,
+      provenance: r.provenance,
+    });
+    return false;
+  });
+
+  // Structured tables in one transaction — 100k single inserts without one
+  // would pay a COMMIT per row and take minutes.
+  const insertStructured = db.transaction(() => {
+    for (const r of structured) {
+      let id: number;
+      switch (r.table) {
+        case 'decisions':
+          id = addDecision({
+            decision: r.text,
+            reasoning: r.detail,
+            project: r.project,
+            status: 'active',
+            confidence: r.confidence ?? 'medium',
+            importance: r.importance,
+            provenance: r.provenance,
+          });
+          break;
+        case 'learnings':
+          id = addLearning({
+            problem: r.text,
+            solution: r.detail,
+            project: r.project,
+            confidence: r.confidence ?? 'medium',
+            importance: r.importance,
+            provenance: r.provenance,
+          });
+          break;
+        case 'breadcrumbs':
+          id = addBreadcrumb({
+            content: r.text,
+            project: r.project,
+            importance: r.importance,
+            provenance: r.provenance,
+          });
+          break;
+        case 'loa_entries':
+          id = createLoaEntry({
+            title: r.text,
+            description: r.detail ? r.detail.slice(0, 120) : undefined,
+            fabric_extract: r.detail ?? r.text,
+            project: r.project,
+            importance: r.importance,
+            provenance: r.provenance,
+          });
+          break;
+        default:
+          continue;
+      }
+      if (r.role === 'target') {
+        targets.set(r.key, { key: r.key, table: r.table, id, project: r.project, provenance: r.provenance });
+      }
+    }
+  });
+  insertStructured();
+
+  if (messages.length > 0) {
+    addMessagesBatch(messages); // manages its own transaction
+  }
+
+  return targets;
+}
+
+// ── Metrics ──────────────────────────────────────────────────────────
+// Pure functions over (retrieved, relevant) so they are trivially testable.
+// Relevance keys are `${table}#${id}` — the same identity search() returns.
+
+export interface RetrievedRef {
+  table: string;
+  id: number;
+}
+
+export function refKey(table: string, id: number): string {
+  return `${table}#${id}`;
+}
+
+/**
+ * search() reports loa_entries rows under the logical table name 'loa'
+ * (see SEARCH_TABLES in src/lib/memory.ts). Ground-truth records carry the
+ * physical table name — map it before comparing identities.
+ */
+export function searchTableName(table: FixtureTable): string {
+  return table === 'loa_entries' ? 'loa' : table;
+}
+
+/** Fraction of the top-k that is relevant. Standard P@k: divisor is k. */
+export function precisionAtK(retrieved: RetrievedRef[], relevant: Set<string>, k: number): number {
+  if (k <= 0) return 0;
+  const hits = retrieved.slice(0, k).filter((r) => relevant.has(refKey(r.table, r.id))).length;
+  return hits / k;
+}
+
+/** Fraction of all relevant records that appear in the top-k. */
+export function recallAtK(retrieved: RetrievedRef[], relevant: Set<string>, k: number): number {
+  if (relevant.size === 0) return 0;
+  const top = retrieved.slice(0, k);
+  let hits = 0;
+  for (const key of relevant) {
+    if (top.some((r) => refKey(r.table, r.id) === key)) hits++;
+  }
+  return hits / relevant.size;
+}
+
+/** 1/rank of the first relevant result within the top-k; 0 if none. */
+export function reciprocalRank(retrieved: RetrievedRef[], relevant: Set<string>, k: number): number {
+  const top = retrieved.slice(0, k);
+  for (let i = 0; i < top.length; i++) {
+    if (relevant.has(refKey(top[i].table, top[i].id))) return 1 / (i + 1);
+  }
+  return 0;
+}
+
+/** Nearest-rank percentile (p in 0..100) over an unsorted sample. */
+export function percentile(values: number[], p: number): number {
+  if (values.length === 0) return 0;
+  const sorted = [...values].sort((a, b) => a - b);
+  const idx = Math.min(sorted.length - 1, Math.max(0, Math.ceil((p / 100) * sorted.length) - 1));
+  return sorted[idx];
+}
+
+export function mean(values: number[]): number {
+  if (values.length === 0) return 0;
+  return values.reduce((a, b) => a + b, 0) / values.length;
+}
diff --git a/benchmarks/suites/suite-c-precision-noise.ts b/benchmarks/suites/suite-c-precision-noise.ts
new file mode 100644
index 0000000..0db5742
--- /dev/null
+++ b/benchmarks/suites/suite-c-precision-noise.ts
@@ -0,0 +1,230 @@
+// Suite C — Precision under noise.
+//
+// Measures whether search() retrieves the right high-signal memory when the
+// database contains many irrelevant records. Builds seeded synthetic corpora
+// (100 / 1k / 10k / 100k records by default) in a temporary DB using the real
+// schema and write paths, then runs a ground-truth-labeled query set through
+// the real FTS5 search path and reports P@5, R@5, MRR@5, and latency p50/p95
+// per corpus size, with breakdowns by table, provenance, and query category.
+//
+// What this suite is NOT:
+// - It does NOT measure the wake-up bundle cost (Suite B).
+// - It does NOT measure answer accuracy (Suite A's grader).
+// - It does NOT exercise semantic/hybrid retrieval — FTS5 keyword path only.
+//
+// Baseline-first: there is no pass/fail threshold. Later regression gating can
+// compare runs against the checked-in baseline JSONL.
+
+import { mkdtempSync, rmSync } from 'fs';
+import { tmpdir } from 'os';
+import { join } from 'path';
+import { search } from '../../src/lib/memory.js';
+import { initDb, closeDb } from '../../src/db/connection.js';
+import { checkEmbeddingService } from '../../src/lib/embeddings.js';
+import type { SuiteResult, MetricSample } from '../types.js';
+import {
+  K,
+  DEFAULT_SEED,
+  generateFixtureSpec,
+  seedFixture,
+  precisionAtK,
+  recallAtK,
+  reciprocalRank,
+  percentile,
+  mean,
+  refKey,
+  searchTableName,
+  type FixtureQuery,
+  type SeededRecord,
+} from './suite-c-internals.js';
+
+export interface SuiteCOptions {
+  /** Corpus sizes to run. Default 100/1k/10k/100k; env RECALL_BENCH_C_SIZES overrides. */
+  sizes?: number[];
+  /** PRNG seed for fixture generation. Default 47. */
+  seed?: number;
+  /** Measured repeats per query (after 1 unmeasured warmup pass). Default 5; env RECALL_BENCH_C_REPEATS overrides. */
+  repeats?: number;
+}
+
+const DEFAULT_SIZES = [100, 1_000, 10_000, 100_000];
+const DEFAULT_REPEATS = 5;
+const WARMUP_PASSES = 1;
+
+function parseEnvInts(name: string): number[] | undefined {
+  const raw = process.env[name];
+  if (!raw) return undefined;
+  const values = raw.split(',').map((s) => parseInt(s.trim(), 10)).filter((n) => Number.isFinite(n) && n > 0);
+  return values.length > 0 ? values : undefined;
+}
+
+const round = (value: number, places: number): number => {
+  const f = 10 ** places;
+  return Math.round(value * f) / f;
+};
+
+interface QueryOutcome {
+  query: FixtureQuery;
+  p5: number;
+  r5: number;
+  rr: number;
+  /** Per-expected-record retrieval flags for table/provenance breakdowns. */
+  expectedHits: Array<{ record: SeededRecord; retrieved: boolean }>;
+}
+
+function runQuery(query: FixtureQuery, targets: Map<string, SeededRecord>): QueryOutcome {
+  const results = search(query.text, { project: query.project, limit: K });
+  const retrieved = results.map((r) => ({ table: r.table, id: r.id }));
+
+  const expected = query.expected.map((key) => {
+    const record = targets.get(key);
+    if (!record) throw new Error(`Suite C ground-truth key ${key} did not resolve to a seeded record`);
+    return record;
+  });
+  const relevant = new Set(expected.map((r) => refKey(searchTableName(r.table), r.id)));
+  const topKeys = new Set(retrieved.slice(0, K).map((r) => refKey(r.table, r.id)));
+
+  return {
+    query,
+    p5: precisionAtK(retrieved, relevant, K),
+    r5: recallAtK(retrieved, relevant, K),
+    rr: reciprocalRank(retrieved, relevant, K),
+    expectedHits: expected.map((record) => ({
+      record,
+      retrieved: topKeys.has(refKey(searchTableName(record.table), record.id)),
+    })),
+  };
+}
+
+function pushBreakdown(
+  samples: MetricSample[],
+  outcomes: QueryOutcome[],
+  scope: string,
+): void {
+  // By query category — P@5 and MRR@5 per category, so ambiguous-query
+  // failures are attributable separately from exact-lookup failures.
+  const categories = [...new Set(outcomes.map((o) => o.query.category))];
+  for (const category of categories) {
+    const group = outcomes.filter((o) => o.query.category === category);
+    samples.push({
+      name: `p_at_5_cat_${category}`,
+      value: round(mean(group.map((o) => o.p5)), 4),
+      unit: 'ratio',
+      scope,
+    });
+    samples.push({
+      name: `mrr_cat_${category}`,
+      value: round(mean(group.map((o) => o.rr)), 4),
+      unit: 'ratio',
+      scope,
+    });
+  }
+
+  // By table and provenance — recall of ground-truth records grouped by the
+  // record's own table/provenance: of the labeled-relevant records in this
+  // dimension, what fraction surfaced in a top-5?
+  const hits = outcomes.flatMap((o) => o.expectedHits);
+  const byDimension = (dim: 'table' | 'provenance', prefix: string) => {
+    const values = [...new Set(hits.map((h) => h.record[dim]))];
+    for (const value of values) {
+      const group = hits.filter((h) => h.record[dim] === value);
+      samples.push({
+        name: `${prefix}_${value}`,
+        value: round(group.filter((h) => h.retrieved).length / group.length, 4),
+        unit: 'ratio',
+        scope,
+      });
+    }
+  };
+  byDimension('table', 'r_at_5_table');
+  byDimension('provenance', 'r_at_5_prov');
+}
+
+export async function runSuiteC(options: SuiteCOptions = {}): Promise<SuiteResult> {
+  const t0 = performance.now();
+
+  const sizes = options.sizes ?? parseEnvInts('RECALL_BENCH_C_SIZES') ?? DEFAULT_SIZES;
+  const seed = options.seed ?? DEFAULT_SEED;
+  const repeats = options.repeats ?? parseEnvInts('RECALL_BENCH_C_REPEATS')?.[0] ?? DEFAULT_REPEATS;
+
+  const embedding = await checkEmbeddingService().catch(() => ({ available: false, model: 'unknown', url: 'unknown' }));
+
+  const samples: MetricSample[] = [];
+
+  // Suite C must never touch the user's real DB: every corpus lives in a
+  // temp dir, and the env override + module connection are restored after.
+  const savedRecallPath = process.env.RECALL_DB_PATH;
+  const savedMemPath = process.env.MEM_DB_PATH;
+  const tempRoot = mkdtempSync(join(tmpdir(), 'recall-suite-c-'));
+
+  try {
+    for (const size of sizes) {
+      const spec = generateFixtureSpec(seed, size);
+
+      closeDb();
+      process.env.RECALL_DB_PATH = join(tempRoot, `corpus-${size}.db`);
+      delete process.env.MEM_DB_PATH;
+      initDb();
+      const targets = seedFixture(spec);
+      const scope = `corpus=${size}`;
+
+      // Warmup — unmeasured pass(es) so first-touch page cache and statement
+      // compilation don't pollute the latency distribution.
+      for (let w = 0; w < WARMUP_PASSES; w++) {
+        for (const query of spec.queries) runQuery(query, targets);
+      }
+
+      const latencies: number[] = [];
+      let outcomes: QueryOutcome[] = [];
+      for (let r = 0; r < repeats; r++) {
+        const pass: QueryOutcome[] = [];
+        for (const query of spec.queries) {
+          const tq = performance.now();
+          pass.push(runQuery(query, targets));
+          latencies.push(performance.now() - tq);
+        }
+        // Retrieval is deterministic for a fixed corpus — relevance metrics
+        // come from the first measured pass; later passes only feed latency.
+        if (r === 0) outcomes = pass;
+      }
+      closeDb();
+
+      samples.push({ name: 'p_at_5', value: round(mean(outcomes.map((o) => o.p5)), 4), unit: 'ratio', scope });
+      samples.push({ name: 'r_at_5', value: round(mean(outcomes.map((o) => o.r5)), 4), unit: 'ratio', scope });
+      samples.push({ name: 'mrr', value: round(mean(outcomes.map((o) => o.rr)), 4), unit: 'ratio (MRR@5)', scope });
+      samples.push({ name: 'latency_p50_ms', value: round(percentile(latencies, 50), 3), unit: 'ms', scope });
+      samples.push({ name: 'latency_p95_ms', value: round(percentile(latencies, 95), 3), unit: 'ms', scope });
+      pushBreakdown(samples, outcomes, scope);
+    }
+  } finally {
+    closeDb();
+    if (savedRecallPath !== undefined) process.env.RECALL_DB_PATH = savedRecallPath;
+    else delete process.env.RECALL_DB_PATH;
+    if (savedMemPath !== undefined) process.env.MEM_DB_PATH = savedMemPath;
+    else delete process.env.MEM_DB_PATH;
+    rmSync(tempRoot, { recursive: true, force: true });
+  }
+
+  const durationMs = Math.round(performance.now() - t0);
+
+  return {
+    suite: 'C',
+    name: 'Precision under noise',
+    description:
+      `Measures FTS5 search() precision against seeded synthetic corpora (sizes: ${sizes.join(', ')}; seed: ${seed}). ` +
+      `A ground-truth-labeled query set (exact lookup, paraphrase, problem lookup, ambiguous-with-collisions) runs at each size; ` +
+      `reports P@5, R@5, MRR@5, latency p50/p95, and breakdowns by query category, target table, and provenance.`,
+    ranAt: new Date().toISOString(),
+    durationMs,
+    samples,
+    caveats: [
+      `Synthetic corpus: deterministic seeded fixtures (seed ${seed}). Absolute scores do not transfer to real-world corpora; compare runs only against this same fixture set.`,
+      `Latency protocol: ${WARMUP_PASSES} unmeasured warmup pass per corpus size, then ${repeats} measured repeats per query on a warm connection; p50/p95 are computed across all measured calls at that size. Relevance metrics come from the first measured pass (retrieval is deterministic for a fixed corpus).`,
+      `Embedding service available: ${embedding.available ? `yes (${embedding.model})` : 'no'}. Suite C exercises the FTS5 keyword path (search()) only — semantic/hybrid retrieval is NOT measured in this baseline either way.`,
+      'FTS5 MATCH is implicit AND with no stemming — paraphrase-category queries are expected to score near zero on keyword search. That gap is part of the honest baseline this suite records.',
+      'Dedup was NOT run before measurement: the corpus contains unmarked near-duplicates that legitimately compete in ranking. search() excludes only records already marked in dedup_lineage.',
+      'Ground truth never includes messages-table records — messages are noise-only in this corpus. The project column is part of every FTS index, so unscoped queries can match records via their project name alone.',
+      'No pass/fail threshold — baseline-first. Later regression gating can diff future runs against the checked-in baseline JSONL.',
+    ],
+  };
+}
diff --git a/src/commands/benchmark.ts b/src/commands/benchmark.ts
index e732e76..0aa0168 100644
--- a/src/commands/benchmark.ts
+++ b/src/commands/benchmark.ts
@@ -11,7 +11,7 @@ import { join } from 'path';
 const SUITES_AVAILABLE = [
   { id: 'A', name: 'Cross-session recall', status: 'planned' },
   { id: 'B', name: 'Token efficiency', status: 'built' },
-  { id: 'C', name: 'Precision under noise', status: 'planned' },
+  { id: 'C', name: 'Precision under noise', status: 'built' },
   { id: 'D', name: 'Structured-knowledge fidelity', status: 'planned' },
   { id: 'E', name: 'Real-world replay', status: 'planned' },
 ] as const;
diff --git a/tests/benchmarks/suite-b.test.ts b/tests/benchmarks/suite-b.test.ts
index 27517c4..0e67647 100644
--- a/tests/benchmarks/suite-b.test.ts
+++ b/tests/benchmarks/suite-b.test.ts
@@ -241,10 +241,17 @@ describe('Runner — runBenchmarks', () => {
   });
 
   test('skips planned suites cleanly when running all', async () => {
-    const out = await runBenchmarks({ project: 'bench-test', dryRun: true });
-    // Only Suite B is built — others return null and are skipped
-    expect(out.result.suites.length).toBe(1);
-    expect(out.result.suites[0].suite).toBe('B');
+    // Suite C is built — cap its corpus ladder so this stays a fast test.
+    process.env.RECALL_BENCH_C_SIZES = '100';
+    process.env.RECALL_BENCH_C_REPEATS = '1';
+    try {
+      const out = await runBenchmarks({ project: 'bench-test', dryRun: true });
+      // Suites B and C are built — A / D / E return null and are skipped
+      expect(out.result.suites.map(s => s.suite)).toEqual(['B', 'C']);
+    } finally {
+      delete process.env.RECALL_BENCH_C_SIZES;
+      delete process.env.RECALL_BENCH_C_REPEATS;
+    }
   });
 });
 
diff --git a/tests/benchmarks/suite-c.test.ts b/tests/benchmarks/suite-c.test.ts
new file mode 100644
index 0000000..f30d1f0
--- /dev/null
+++ b/tests/benchmarks/suite-c.test.ts
@@ -0,0 +1,329 @@
+// Tests for Suite C — precision under noise.
+//
+// Coverage per the issue spec: fixture determinism, ground-truth label
+// consistency, metric calculation, and report generation. Mirrors the
+// suite-b.test.ts approach: assert SHAPE and invariants, not absolute scores —
+// the whole point of Suite C is that scores are an honest measurement, so the
+// tests must not encode expectations about how well search ranks.
+
+import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
+import { mkdtempSync, rmSync, existsSync } from 'fs';
+import { join } from 'path';
+import { tmpdir } from 'os';
+import { Database } from 'bun:sqlite';
+import {
+  K,
+  mulberry32,
+  generateFixtureSpec,
+  seedFixture,
+  precisionAtK,
+  recallAtK,
+  reciprocalRank,
+  percentile,
+  mean,
+  refKey,
+  searchTableName,
+} from '../../benchmarks/suites/suite-c-internals';
+import { runSuiteC } from '../../benchmarks/suites/suite-c-precision-noise';
+import { renderMarkdown } from '../../benchmarks/runner';
+import { initDb, closeDb } from '../../src/db/connection';
+
+const QUERY_CATEGORIES = ['exact_lookup', 'paraphrase', 'problem_lookup', 'ambiguous'] as const;
+
+let savedDbPath: string | undefined;
+let savedMemPath: string | undefined;
+let tempDirs: string[] = [];
+
+beforeEach(() => {
+  savedDbPath = process.env.RECALL_DB_PATH;
+  savedMemPath = process.env.MEM_DB_PATH;
+});
+
+afterEach(() => {
+  closeDb();
+  if (savedDbPath !== undefined) process.env.RECALL_DB_PATH = savedDbPath;
+  else delete process.env.RECALL_DB_PATH;
+  if (savedMemPath !== undefined) process.env.MEM_DB_PATH = savedMemPath;
+  else delete process.env.MEM_DB_PATH;
+  for (const dir of tempDirs) {
+    if (existsSync(dir)) rmSync(dir, { recursive: true, force: true });
+  }
+  tempDirs = [];
+});
+
+/** Point RECALL_DB_PATH at a fresh initialized temp DB, ready for seedFixture. */
+function seedIntoTempDb(): { dbPath: string } {
+  const dir = mkdtempSync(join(tmpdir(), 'recall-suite-c-test-'));
+  tempDirs.push(dir);
+  const dbPath = join(dir, 'fixture.db');
+  closeDb();
+  process.env.RECALL_DB_PATH = dbPath;
+  delete process.env.MEM_DB_PATH;
+  initDb();
+  return { dbPath };
+}
+
+/** Stable digest of corpus text content, independent of row IDs. */
+function corpusDigest(dbPath: string): string {
+  const db = new Database(dbPath, { readonly: true });
+  try {
+    const texts: string[] = [];
+    const pull = (sql: string) => {
+      for (const row of db.prepare(sql).all() as Array<{ t: string }>) texts.push(row.t);
+    };
+    pull('SELECT decision || COALESCE(reasoning, \'\') AS t FROM decisions');
+    pull('SELECT problem || COALESCE(solution, \'\') AS t FROM learnings');
+    pull('SELECT content AS t FROM breadcrumbs');
+    pull('SELECT title || fabric_extract AS t FROM loa_entries');
+    pull('SELECT content AS t FROM messages');
+    texts.sort();
+    return String(Bun.hash(texts.join('\u0000')));
+  } finally {
+    db.close();
+  }
+}
+
+describe('Metric helpers', () => {
+  const retrieved = [
+    { table: 'decisions', id: 1 },
+    { table: 'learnings', id: 2 },
+    { table: 'decisions', id: 3 },
+    { table: 'breadcrumbs', id: 4 },
+    { table: 'loa', id: 5 },
+  ];
+
+  test('precisionAtK divides hits in top-k by k', () => {
+    const relevant = new Set(['decisions#1', 'decisions#3']);
+    expect(precisionAtK(retrieved, relevant, 5)).toBe(2 / 5);
+    expect(precisionAtK(retrieved, relevant, 1)).toBe(1);
+    expect(precisionAtK(retrieved, new Set(), 5)).toBe(0);
+    expect(precisionAtK([], relevant, 5)).toBe(0);
+    expect(precisionAtK(retrieved, relevant, 0)).toBe(0);
+  });
+
+  test('recallAtK divides retrieved relevant by total relevant', () => {
+    const relevant = new Set(['decisions#1', 'learnings#2', 'messages#99']);
+    expect(recallAtK(retrieved, relevant, 5)).toBe(2 / 3);
+    expect(recallAtK(retrieved, relevant, 1)).toBe(1 / 3);
+    expect(recallAtK(retrieved, new Set(), 5)).toBe(0);
+  });
+
+  test('reciprocalRank returns 1/rank of first relevant within k', () => {
+    expect(reciprocalRank(retrieved, new Set(['decisions#1']), 5)).toBe(1);
+    expect(reciprocalRank(retrieved, new Set(['decisions#3']), 5)).toBe(1 / 3);
+    expect(reciprocalRank(retrieved, new Set(['loa#5']), 5)).toBe(1 / 5);
+    // Relevant exists but is outside the top-k cutoff
+    expect(reciprocalRank(retrieved, new Set(['loa#5']), 4)).toBe(0);
+    expect(reciprocalRank(retrieved, new Set(['messages#99']), 5)).toBe(0);
+  });
+
+  test('percentile uses nearest-rank on a sorted copy', () => {
+    const values = [5, 1, 4, 2, 3];
+    expect(percentile(values, 50)).toBe(3);
+    expect(percentile(values, 95)).toBe(5);
+    expect(percentile(values, 100)).toBe(5);
+    expect(percentile([7], 95)).toBe(7);
+    expect(percentile([], 50)).toBe(0);
+    // Input must not be mutated
+    expect(values).toEqual([5, 1, 4, 2, 3]);
+  });
+
+  test('mean averages, empty is 0', () => {
+    expect(mean([1, 2, 3])).toBe(2);
+    expect(mean([])).toBe(0);
+  });
+
+  test('refKey and searchTableName align ground truth with search() identities', () => {
+    expect(refKey('decisions', 7)).toBe('decisions#7');
+    expect(searchTableName('loa_entries')).toBe('loa');
+    expect(searchTableName('decisions')).toBe('decisions');
+    expect(searchTableName('messages')).toBe('messages');
+  });
+});
+
+describe('mulberry32 PRNG', () => {
+  test('same seed produces the same sequence', () => {
+    const a = mulberry32(47);
+    const b = mulberry32(47);
+    for (let i = 0; i < 10; i++) expect(a()).toBe(b());
+  });
+
+  test('different seeds diverge and values stay in [0,1)', () => {
+    const a = mulberry32(47);
+    const b = mulberry32(48);
+    const av = Array.from({ length: 5 }, () => a());
+    const bv = Array.from({ length: 5 }, () => b());
+    expect(av).not.toEqual(bv);
+    for (const v of [...av, ...bv]) {
+      expect(v).toBeGreaterThanOrEqual(0);
+      expect(v).toBeLessThan(1);
+    }
+  });
+});
+
+describe('Fixture determinism', () => {
+  test('same seed and size produce an identical spec', () => {
+    const a = generateFixtureSpec(47, 500);
+    const b = generateFixtureSpec(47, 500);
+    expect(JSON.stringify(a)).toBe(JSON.stringify(b));
+  });
+
+  test('different seeds produce different noise', () => {
+    const a = generateFixtureSpec(47, 500);
+    const b = generateFixtureSpec(48, 500);
+    expect(JSON.stringify(a.records)).not.toBe(JSON.stringify(b.records));
+  });
+
+  test('record count matches the requested size', () => {
+    expect(generateFixtureSpec(47, 100).records.length).toBe(100);
+    expect(generateFixtureSpec(47, 1000).records.length).toBe(1000);
+  });
+
+  test('rejects sizes too small to hold targets and queries', () => {
+    expect(() => generateFixtureSpec(47, 10)).toThrow();
+  });
+
+  test('seeding the same spec yields identical corpus text content', () => {
+    const spec = generateFixtureSpec(47, 100);
+    const first = seedIntoTempDb();
+    seedFixture(spec);
+    closeDb();
+    const second = seedIntoTempDb();
+    seedFixture(spec);
+    closeDb();
+    expect(corpusDigest(first.dbPath)).toBe(corpusDigest(second.dbPath));
+  });
+});
+
+describe('Ground-truth label consistency', () => {
+  const spec = generateFixtureSpec(47, 100);
+
+  test('every expected key references a target record in the spec', () => {
+    const targetKeys = new Set(spec.records.filter((r) => r.role === 'target').map((r) => r.key));
+    for (const query of spec.queries) {
+      expect(query.expected.length).toBeGreaterThan(0);
+      for (const key of query.expected) {
+        expect(targetKeys.has(key)).toBe(true);
+      }
+    }
+  });
+
+  test('project-scoped queries only expect records from that project', () => {
+    const byKey = new Map(spec.records.map((r) => [r.key, r]));
+    const scoped = spec.queries.filter((q) => q.project);
+    expect(scoped.length).toBeGreaterThan(0);
+    for (const query of scoped) {
+      for (const key of query.expected) {
+        expect(byKey.get(key)!.project).toBe(query.project!);
+      }
+    }
+  });
+
+  test('all four query categories are present and ambiguous queries carry collision labels', () => {
+    const categories = new Set(spec.queries.map((q) => q.category));
+    for (const category of QUERY_CATEGORIES) expect(categories.has(category)).toBe(true);
+    for (const query of spec.queries) {
+      if (query.category === 'ambiguous') {
+        expect(query.collision).toBeDefined();
+        expect(['name', 'project', 'topic']).toContain(query.collision!.kind);
+      }
+    }
+  });
+
+  test('targets are constant across corpus sizes', () => {
+    const targetsAt = (size: number) =>
+      JSON.stringify(generateFixtureSpec(47, size).records.filter((r) => r.role === 'target').sort((a, b) => a.key.localeCompare(b.key)));
+    expect(targetsAt(100)).toBe(targetsAt(1000));
+  });
+
+  test('seeded targets resolve with matching table, project, and provenance', () => {
+    seedIntoTempDb();
+    const targets = seedFixture(spec);
+    closeDb();
+    const byKey = new Map(spec.records.map((r) => [r.key, r]));
+    for (const query of spec.queries) {
+      for (const key of query.expected) {
+        const seeded = targets.get(key);
+        expect(seeded).toBeDefined();
+        const declared = byKey.get(key)!;
+        expect(seeded!.table).toBe(declared.table);
+        expect(seeded!.project).toBe(declared.project);
+        expect(seeded!.provenance).toBe(declared.provenance);
+        expect(seeded!.id).toBeGreaterThan(0);
+      }
+    }
+  });
+});
+
+describe('Suite C — runSuiteC report generation', () => {
+  test('returns the documented metric set per corpus size', async () => {
+    const result = await runSuiteC({ sizes: [100], repeats: 2 });
+    expect(result.suite).toBe('C');
+    expect(result.name).toBe('Precision under noise');
+    expect(result.caveats.length).toBeGreaterThan(0);
+
+    const at = (name: string) => result.samples.find((s) => s.name === name && s.scope === 'corpus=100');
+    for (const required of ['p_at_5', 'r_at_5', 'mrr', 'latency_p50_ms', 'latency_p95_ms']) {
+      expect(at(required)).toBeDefined();
+    }
+
+    // Ratios stay in [0,1]; latencies are non-negative.
+    for (const name of ['p_at_5', 'r_at_5', 'mrr']) {
+      const sample = at(name)!;
+      expect(sample.value).toBeGreaterThanOrEqual(0);
+      expect(sample.value).toBeLessThanOrEqual(1);
+    }
+    expect(at('latency_p50_ms')!.value).toBeGreaterThanOrEqual(0);
+    expect(at('latency_p95_ms')!.value).toBeGreaterThanOrEqual(at('latency_p50_ms')!.value);
+
+    // Breakdowns: every query category, plus table and provenance dimensions.
+    const names = result.samples.map((s) => s.name);
+    for (const category of QUERY_CATEGORIES) {
+      expect(names).toContain(`p_at_5_cat_${category}`);
+      expect(names).toContain(`mrr_cat_${category}`);
+    }
+    expect(names.some((n) => n.startsWith('r_at_5_table_'))).toBe(true);
+    expect(names.some((n) => n.startsWith('r_at_5_prov_'))).toBe(true);
+  });
+
+  test('one sample group per requested corpus size', async () => {
+    const result = await runSuiteC({ sizes: [100, 150], repeats: 1 });
+    const scopes = new Set(result.samples.map((s) => s.scope));
+    expect(scopes.has('corpus=100')).toBe(true);
+    expect(scopes.has('corpus=150')).toBe(true);
+  });
+
+  test('restores the DB env override and never touches the previous DB path', async () => {
+    process.env.RECALL_DB_PATH = '/tmp/suite-c-sentinel-does-not-exist.db';
+    await runSuiteC({ sizes: [100], repeats: 1 });
+    expect(process.env.RECALL_DB_PATH).toBe('/tmp/suite-c-sentinel-does-not-exist.db');
+    expect(existsSync('/tmp/suite-c-sentinel-does-not-exist.db')).toBe(false);
+  });
+
+  test('latency caveat documents the warmup and repeat protocol', async () => {
+    const result = await runSuiteC({ sizes: [100], repeats: 3 });
+    const latencyCaveat = result.caveats.find((c) => c.includes('warmup'));
+    expect(latencyCaveat).toBeDefined();
+    expect(latencyCaveat).toContain('3 measured repeats');
+    const embeddingCaveat = result.caveats.find((c) => c.includes('Embedding service available'));
+    expect(embeddingCaveat).toBeDefined();
+  });
+
+  test('renderMarkdown renders the Suite C section', async () => {
+    const suite = await runSuiteC({ sizes: [100], repeats: 1 });
+    const md = renderMarkdown({
+      startedAt: '',
+      finishedAt: '',
+      recallVersion: 'test',
+      hostInfo: { platform: 'test', bunVersion: 'test' },
+      suites: [suite],
+    });
+    expect(md).toContain('## Suite C — Precision under noise');
+    expect(md).toContain('| p_at_5 |');
+    expect(md).toContain('### Caveats');
+  });
+
+  test('K cutoff is 5 — the metric names match the measurement', () => {
+    expect(K).toBe(5);
+  });
+});

From 575a60ee7b70c3463880a781ebd996563672ea05 Mon Sep 17 00:00:00 2001
From: Ed Heltzel <402910+edheltzel@users.noreply.github.com>
Date: Thu, 11 Jun 2026 05:38:34 -0400
Subject: [PATCH 2/2] test(benchmarks): record Suite C baseline run (seed 47,
 full ladder)

First honest baseline: exact-lookup MRR degrades 1.0 -> 0 from 100 to 100k
records as unmarked near-duplicates crowd originals out of the top-5;
latency p95 grows 0.5ms -> 36ms. Future regression gating diffs against
this JSONL.
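
The gating itself is not part of this series. As a rough sketch of what diffing a later run against this baseline could look like: the names below (Sample, Delta, diffSamples, ratioRegressions) and the 0.01 tolerance are hypothetical and do not exist in the repo. The baselineRun values are copied from the corpus=100 rows of this baseline; the laterRun values are invented for illustration.

```typescript
// Hypothetical sketch: none of these names exist in the patch.
// Sample mirrors the fields visible in the recorded JSONL samples.
interface Sample {
  name: string;
  value: number;
  unit: string;
  scope?: string;
}

interface Delta {
  key: string;
  unit: string;
  baseline: number;
  current: number;
  delta: number;
}

// Key a sample by scope and metric name, e.g. "corpus=100/p_at_5".
const sampleKey = (s: Sample): string => `${s.scope ?? 'global'}/${s.name}`;

// Pair up samples present in both runs and report current minus baseline.
function diffSamples(baseline: Sample[], current: Sample[]): Delta[] {
  const base = new Map<string, number>();
  for (const s of baseline) base.set(sampleKey(s), s.value);
  const deltas: Delta[] = [];
  for (const s of current) {
    const key = sampleKey(s);
    const before = base.get(key);
    if (before === undefined) continue; // metric is new in this run; nothing to compare
    deltas.push({ key, unit: s.unit, baseline: before, current: s.value, delta: s.value - before });
  }
  return deltas;
}

// A ratio metric counts as regressed when it drops by more than `tolerance`.
// Latency samples (unit "ms") are ignored here because they are host-dependent.
function ratioRegressions(deltas: Delta[], tolerance: number): Delta[] {
  return deltas.filter((d) => d.unit.startsWith('ratio') && d.delta < -tolerance);
}

// Values copied from the corpus=100 rows of the recorded baseline.
const baselineRun: Sample[] = [
  { name: 'p_at_5', value: 0.1857, unit: 'ratio', scope: 'corpus=100' },
  { name: 'mrr', value: 0.7381, unit: 'ratio (MRR@5)', scope: 'corpus=100' },
  { name: 'latency_p95_ms', value: 0.475, unit: 'ms', scope: 'corpus=100' },
];

// Invented "later run" values, for illustration only.
const laterRun: Sample[] = [
  { name: 'p_at_5', value: 0.1857, unit: 'ratio', scope: 'corpus=100' },
  { name: 'mrr', value: 0.5, unit: 'ratio (MRR@5)', scope: 'corpus=100' },
  { name: 'latency_p95_ms', value: 0.3, unit: 'ms', scope: 'corpus=100' },
];

const allDeltas = diffSamples(baselineRun, laterRun);
const regressions = ratioRegressions(allDeltas, 0.01);
for (const r of regressions) {
  console.log(`${r.key}: ${r.baseline} -> ${r.current}`);
}
```

Keying on scope plus name keeps the per-corpus-size rows separate, which matters because the same metric name repeats at every corpus size in the JSONL.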
---
 .../results/2026-06-11T09-36-53-suite-C.jsonl |   1 +
 .../results/2026-06-11T09-36-53-suite-C.md    | 112 ++++++++++++++++++
 2 files changed, 113 insertions(+)
 create mode 100644 benchmarks/results/2026-06-11T09-36-53-suite-C.jsonl
 create mode 100644 benchmarks/results/2026-06-11T09-36-53-suite-C.md

diff --git a/benchmarks/results/2026-06-11T09-36-53-suite-C.jsonl b/benchmarks/results/2026-06-11T09-36-53-suite-C.jsonl
new file mode 100644
index 0000000..68975e4
--- /dev/null
+++ b/benchmarks/results/2026-06-11T09-36-53-suite-C.jsonl
@@ -0,0 +1 @@
+{"startedAt":"2026-06-11T09:36:48.827Z","finishedAt":"2026-06-11T09:36:53.977Z","recallVersion":"1.0.0","hostInfo":{"platform":"darwin-arm64","bunVersion":"1.3.14"},"suites":[{"suite":"C","name":"Precision under noise","description":"Measures FTS5 search() precision against seeded synthetic corpora (sizes: 100, 1000, 10000, 100000; seed: 47). A ground-truth-labeled query set (exact lookup, paraphrase, problem lookup, ambiguous-with-collisions) runs at each size; reports P@5, R@5, MRR@5, latency p50/p95, and breakdowns by query category, target table, and provenance.","ranAt":"2026-06-11T09:36:53.977Z","durationMs":5149,"samples":[{"name":"p_at_5","value":0.1857,"unit":"ratio","scope":"corpus=100"},{"name":"r_at_5","value":0.8095,"unit":"ratio","scope":"corpus=100"},{"name":"mrr","value":0.7381,"unit":"ratio (MRR@5)","scope":"corpus=100"},{"name":"latency_p50_ms","value":0.338,"unit":"ms","scope":"corpus=100"},{"name":"latency_p95_ms","value":0.475,"unit":"ms","scope":"corpus=100"},{"name":"p_at_5_cat_exact_lookup","value":0.2,"unit":"ratio","scope":"corpus=100"},{"name":"mrr_cat_exact_lookup","value":1,"unit":"ratio","scope":"corpus=100"},{"name":"p_at_5_cat_paraphrase","value":0.0667,"unit":"ratio","scope":"corpus=100"},{"name":"mrr_cat_paraphrase","value":0.3333,"unit":"ratio","scope":"corpus=100"},{"name":"p_at_5_cat_problem_lookup","value":0.2,"unit":"ratio","scope":"corpus=100"},{"name":"mrr_cat_problem_lookup","value":0.8333,"unit":"ratio","scope":"corpus=100"},{"name":"p_at_5_cat_ambiguous","value":0.25,"unit":"ratio","scope":"corpus=100"},{"name":"mrr_cat_ambiguous","value":0.7083,"unit":"ratio","scope":"corpus=100"},{"name":"r_at_5_table_decisions","value":0.625,"unit":"ratio","scope":"corpus=100"},{"name":"r_at_5_table_loa_entries","value":0.5,"unit":"ratio","scope":"corpus=100"},{"name":"r_at_5_table_breadcrumbs","value":1,"unit":"ratio","scope":"corpus=100"},{"name":"r_at_5_table_learnings","value":1,"unit":"ratio","scope":"corpus=100"},{"name":"r_at_5_prov_user_authored","value":0.75,"unit":"ratio","scope":"corpus=100"},{"name":"r_at_5_prov_extracted","value":0.6667,"unit":"ratio","scope":"corpus=100"},{"name":"r_at_5_prov_verbatim","value":1,"unit":"ratio","scope":"corpus=100"},{"name":"r_at_5_prov_derived","value":1,"unit":"ratio","scope":"corpus=100"},{"name":"p_at_5","value":0.0714,"unit":"ratio","scope":"corpus=1000"},{"name":"r_at_5","value":0.3571,"unit":"ratio","scope":"corpus=1000"},{"name":"mrr","value":0.2857,"unit":"ratio (MRR@5)","scope":"corpus=1000"},{"name":"latency_p50_ms","value":0.442,"unit":"ms","scope":"corpus=1000"},{"name":"latency_p95_ms","value":0.714,"unit":"ms","scope":"corpus=1000"},{"name":"p_at_5_cat_exact_lookup","value":0.15,"unit":"ratio","scope":"corpus=1000"},{"name":"mrr_cat_exact_lookup","value":0.75,"unit":"ratio","scope":"corpus=1000"},{"name":"p_at_5_cat_paraphrase","value":0.0667,"unit":"ratio","scope":"corpus=1000"},{"name":"mrr_cat_paraphrase","value":0.1667,"unit":"ratio","scope":"corpus=1000"},{"name":"p_at_5_cat_problem_lookup","value":0,"unit":"ratio","scope":"corpus=1000"},{"name":"mrr_cat_problem_lookup","value":0,"unit":"ratio","scope":"corpus=1000"},{"name":"p_at_5_cat_ambiguous","value":0.05,"unit":"ratio","scope":"corpus=1000"},{"name":"mrr_cat_ambiguous","value":0.125,"unit":"ratio","scope":"corpus=1000"},{"name":"r_at_5_table_decisions","value":0.5,"unit":"ratio","scope":"corpus=1000"},{"name":"r_at_5_table_loa_entries","value":0,"unit":"ratio","scope":"corpus=1000"},{"name":"r_at_5_table_breadcrumbs","value":1,"unit":"ratio","scope":"corpus=1000"},{"name":"r_at_5_table_learnings","value":0,"unit":"ratio","scope":"corpus=1000"},{"name":"r_at_5_prov_user_authored","value":0.5,"unit":"ratio","scope":"corpus=1000"},{"name":"r_at_5_prov_extracted","value":0.2222,"unit":"ratio","scope":"corpus=1000"},{"name":"r_at_5_prov_verbatim","value":0.5,"unit":"ratio","scope":"corpus=1000"},{"name":"r_at_5_prov_derived","value":0,"unit":"ratio","scope":"corpus=1000"},{"name":"p_at_5","value":0.0429,"unit":"ratio","scope":"corpus=10000"},{"name":"r_at_5","value":0.2143,"unit":"ratio","scope":"corpus=10000"},{"name":"mrr","value":0.1095,"unit":"ratio (MRR@5)","scope":"corpus=10000"},{"name":"latency_p50_ms","value":1.246,"unit":"ms","scope":"corpus=10000"},{"name":"latency_p95_ms","value":5.236,"unit":"ms","scope":"corpus=10000"},{"name":"p_at_5_cat_exact_lookup","value":0.1,"unit":"ratio","scope":"corpus=10000"},{"name":"mrr_cat_exact_lookup","value":0.1333,"unit":"ratio","scope":"corpus=10000"},{"name":"p_at_5_cat_paraphrase","value":0,"unit":"ratio","scope":"corpus=10000"},{"name":"mrr_cat_paraphrase","value":0,"unit":"ratio","scope":"corpus=10000"},{"name":"p_at_5_cat_problem_lookup","value":0,"unit":"ratio","scope":"corpus=10000"},{"name":"mrr_cat_problem_lookup","value":0,"unit":"ratio","scope":"corpus=10000"},{"name":"p_at_5_cat_ambiguous","value":0.05,"unit":"ratio","scope":"corpus=10000"},{"name":"mrr_cat_ambiguous","value":0.25,"unit":"ratio","scope":"corpus=10000"},{"name":"r_at_5_table_decisions","value":0.25,"unit":"ratio","scope":"corpus=10000"},{"name":"r_at_5_table_loa_entries","value":0,"unit":"ratio","scope":"corpus=10000"},{"name":"r_at_5_table_breadcrumbs","value":1,"unit":"ratio","scope":"corpus=10000"},{"name":"r_at_5_table_learnings","value":0,"unit":"ratio","scope":"corpus=10000"},{"name":"r_at_5_prov_user_authored","value":0,"unit":"ratio","scope":"corpus=10000"},{"name":"r_at_5_prov_extracted","value":0.2222,"unit":"ratio","scope":"corpus=10000"},{"name":"r_at_5_prov_verbatim","value":0.5,"unit":"ratio","scope":"corpus=10000"},{"name":"r_at_5_prov_derived","value":0,"unit":"ratio","scope":"corpus=10000"},{"name":"p_at_5","value":0,"unit":"ratio","scope":"corpus=100000"},{"name":"r_at_5","value":0,"unit":"ratio","scope":"corpus=100000"},{"name":"mrr","value":0,"unit":"ratio (MRR@5)","scope":"corpus=100000"},{"name":"latency_p50_ms","value":2.626,"unit":"ms","scope":"corpus=100000"},{"name":"latency_p95_ms","value":36.182,"unit":"ms","scope":"corpus=100000"},{"name":"p_at_5_cat_exact_lookup","value":0,"unit":"ratio","scope":"corpus=100000"},{"name":"mrr_cat_exact_lookup","value":0,"unit":"ratio","scope":"corpus=100000"},{"name":"p_at_5_cat_paraphrase","value":0,"unit":"ratio","scope":"corpus=100000"},{"name":"mrr_cat_paraphrase","value":0,"unit":"ratio","scope":"corpus=100000"},{"name":"p_at_5_cat_problem_lookup","value":0,"unit":"ratio","scope":"corpus=100000"},{"name":"mrr_cat_problem_lookup","value":0,"unit":"ratio","scope":"corpus=100000"},{"name":"p_at_5_cat_ambiguous","value":0,"unit":"ratio","scope":"corpus=100000"},{"name":"mrr_cat_ambiguous","value":0,"unit":"ratio","scope":"corpus=100000"},{"name":"r_at_5_table_decisions","value":0,"unit":"ratio","scope":"corpus=100000"},{"name":"r_at_5_table_loa_entries","value":0,"unit":"ratio","scope":"corpus=100000"},{"name":"r_at_5_table_breadcrumbs","value":0,"unit":"ratio","scope":"corpus=100000"},{"name":"r_at_5_table_learnings","value":0,"unit":"ratio","scope":"corpus=100000"},{"name":"r_at_5_prov_user_authored","value":0,"unit":"ratio","scope":"corpus=100000"},{"name":"r_at_5_prov_extracted","value":0,"unit":"ratio","scope":"corpus=100000"},{"name":"r_at_5_prov_verbatim","value":0,"unit":"ratio","scope":"corpus=100000"},{"name":"r_at_5_prov_derived","value":0,"unit":"ratio","scope":"corpus=100000"}],"caveats":["Synthetic corpus: deterministic seeded fixtures (seed 47). Absolute scores do not transfer to real-world corpora; compare runs only against this same fixture set.","Latency protocol: 1 unmeasured warmup pass per corpus size, then 5 measured repeats per query on a warm connection; p50/p95 are computed across all measured calls at that size. Relevance metrics come from the first measured pass (retrieval is deterministic for a fixed corpus).","Embedding service available: no. Suite C exercises the FTS5 keyword path (search()) only — semantic/hybrid retrieval is NOT measured in this baseline either way.","FTS5 MATCH is implicit AND with no stemming — paraphrase-category queries are expected to score near zero on keyword search. That gap is part of the honest baseline this suite records.","Dedup was NOT run before measurement: the corpus contains unmarked near-duplicates that legitimately compete in ranking. search() excludes only records already marked in dedup_lineage.","Ground truth never includes messages-table records — messages are noise-only in this corpus. The project column is part of every FTS index, so unscoped queries can match records via their project name alone.","No pass/fail threshold — baseline-first. Later regression gating can diff future runs against the checked-in baseline JSONL."]}]}
diff --git a/benchmarks/results/2026-06-11T09-36-53-suite-C.md b/benchmarks/results/2026-06-11T09-36-53-suite-C.md
new file mode 100644
index 0000000..aedfd7b
--- /dev/null
+++ b/benchmarks/results/2026-06-11T09-36-53-suite-C.md
@@ -0,0 +1,112 @@
+# Recall Benchmark Run
+
+- **Started:** 2026-06-11T09:36:48.827Z
+- **Finished:** 2026-06-11T09:36:53.977Z
+- **Recall version:** 1.0.0
+- **Host:** darwin-arm64 (Bun 1.3.14)
+
+## Suite C — Precision under noise
+
+Measures FTS5 search() precision against seeded synthetic corpora (sizes: 100, 1000, 10000, 100000; seed: 47). A ground-truth-labeled query set (exact lookup, paraphrase, problem lookup, ambiguous-with-collisions) runs at each size; reports P@5, R@5, MRR@5, latency p50/p95, and breakdowns by query category, target table, and provenance.
+
+_Ran in 5149 ms at 2026-06-11T09:36:53.977Z._
+
+| Metric | Value | Unit | Scope | vs Baseline |
+|---|---:|---|---|---|
+| p_at_5 | 0.1857 | ratio | corpus=100 | — |
+| r_at_5 | 0.8095 | ratio | corpus=100 | — |
+| mrr | 0.7381 | ratio (MRR@5) | corpus=100 | — |
+| latency_p50_ms | 0.338 | ms | corpus=100 | — |
+| latency_p95_ms | 0.475 | ms | corpus=100 | — |
+| p_at_5_cat_exact_lookup | 0.2 | ratio | corpus=100 | — |
+| mrr_cat_exact_lookup | 1 | ratio | corpus=100 | — |
+| p_at_5_cat_paraphrase | 0.0667 | ratio | corpus=100 | — |
+| mrr_cat_paraphrase | 0.3333 | ratio | corpus=100 | — |
+| p_at_5_cat_problem_lookup | 0.2 | ratio | corpus=100 | — |
+| mrr_cat_problem_lookup | 0.8333 | ratio | corpus=100 | — |
+| p_at_5_cat_ambiguous | 0.25 | ratio | corpus=100 | — |
+| mrr_cat_ambiguous | 0.7083 | ratio | corpus=100 | — |
+| r_at_5_table_decisions | 0.625 | ratio | corpus=100 | — |
+| r_at_5_table_loa_entries | 0.5 | ratio | corpus=100 | — |
+| r_at_5_table_breadcrumbs | 1 | ratio | corpus=100 | — |
+| r_at_5_table_learnings | 1 | ratio | corpus=100 | — |
+| r_at_5_prov_user_authored | 0.75 | ratio | corpus=100 | — |
+| r_at_5_prov_extracted | 0.6667 | ratio | corpus=100 | — |
+| r_at_5_prov_verbatim | 1 | ratio | corpus=100 | — |
+| r_at_5_prov_derived | 1 | ratio | corpus=100 | — |
+| p_at_5 | 0.0714 | ratio | corpus=1000 | — |
+| r_at_5 | 0.3571 | ratio | corpus=1000 | — |
+| mrr | 0.2857 | ratio (MRR@5) | corpus=1000 | — |
+| latency_p50_ms | 0.442 | ms | corpus=1000 | — |
+| latency_p95_ms | 0.714 | ms | corpus=1000 | — |
+| p_at_5_cat_exact_lookup | 0.15 | ratio | corpus=1000 | — |
+| mrr_cat_exact_lookup | 0.75 | ratio | corpus=1000 | — |
+| p_at_5_cat_paraphrase | 0.0667 | ratio | corpus=1000 | — |
+| mrr_cat_paraphrase | 0.1667 | ratio | corpus=1000 | — |
+| p_at_5_cat_problem_lookup | 0 | ratio | corpus=1000 | — |
+| mrr_cat_problem_lookup | 0 | ratio | corpus=1000 | — |
+| p_at_5_cat_ambiguous | 0.05 | ratio | corpus=1000 | — |
+| mrr_cat_ambiguous | 0.125 | ratio | corpus=1000 | — |
+| r_at_5_table_decisions | 0.5 | ratio | corpus=1000 | — |
+| r_at_5_table_loa_entries | 0 | ratio | corpus=1000 | — |
+| r_at_5_table_breadcrumbs | 1 | ratio | corpus=1000 | — |
+| r_at_5_table_learnings | 0 | ratio | corpus=1000 | — |
+| r_at_5_prov_user_authored | 0.5 | ratio | corpus=1000 | — |
+| r_at_5_prov_extracted | 0.2222 | ratio | corpus=1000 | — |
+| r_at_5_prov_verbatim | 0.5 | ratio | corpus=1000 | — |
+| r_at_5_prov_derived | 0 | ratio | corpus=1000 | — |
+| p_at_5 | 0.0429 | ratio | corpus=10000 | — |
+| r_at_5 | 0.2143 | ratio | corpus=10000 | — |
+| mrr | 0.1095 | ratio (MRR@5) | corpus=10000 | — |
+| latency_p50_ms | 1.246 | ms | corpus=10000 | — |
+| latency_p95_ms | 5.236 | ms | corpus=10000 | — |
+| p_at_5_cat_exact_lookup | 0.1 | ratio | corpus=10000 | — |
+| mrr_cat_exact_lookup | 0.1333 | ratio | corpus=10000 | — |
+| p_at_5_cat_paraphrase | 0 | ratio | corpus=10000 | — |
+| mrr_cat_paraphrase | 0 | ratio | corpus=10000 | — |
+| p_at_5_cat_problem_lookup | 0 | ratio | corpus=10000 | — |
+| mrr_cat_problem_lookup | 0 | ratio | corpus=10000 | — |
+| p_at_5_cat_ambiguous | 0.05 | ratio | corpus=10000 | — |
+| mrr_cat_ambiguous | 0.25 | ratio | corpus=10000 | — |
+| r_at_5_table_decisions | 0.25 | ratio | corpus=10000 | — |
+| r_at_5_table_loa_entries | 0 | ratio | corpus=10000 | — |
+| r_at_5_table_breadcrumbs | 1 | ratio | corpus=10000 | — |
+| r_at_5_table_learnings | 0 | ratio | corpus=10000 | — |
+| r_at_5_prov_user_authored | 0 | ratio | corpus=10000 | — |
+| r_at_5_prov_extracted | 0.2222 | ratio | corpus=10000 | — |
+| r_at_5_prov_verbatim | 0.5 | ratio | corpus=10000 | — |
+| r_at_5_prov_derived | 0 | ratio | corpus=10000 | — |
+| p_at_5 | 0 | ratio | corpus=100000 | — |
+| r_at_5 | 0 | ratio | corpus=100000 | — |
+| mrr | 0 | ratio (MRR@5) | corpus=100000 | — |
+| latency_p50_ms | 2.626 | ms | corpus=100000 | — |
+| latency_p95_ms | 36.182 | ms | corpus=100000 | — |
+| p_at_5_cat_exact_lookup | 0 | ratio | corpus=100000 | — |
+| mrr_cat_exact_lookup | 0 | ratio | corpus=100000 | — |
+| p_at_5_cat_paraphrase | 0 | ratio | corpus=100000 | — |
+| mrr_cat_paraphrase | 0 | ratio | corpus=100000 | — |
+| p_at_5_cat_problem_lookup | 0 | ratio | corpus=100000 | — |
+| mrr_cat_problem_lookup | 0 | ratio | corpus=100000 | — |
+| p_at_5_cat_ambiguous | 0 | ratio | corpus=100000 | — |
+| mrr_cat_ambiguous | 0 | ratio | corpus=100000 | — |
+| r_at_5_table_decisions | 0 | ratio | corpus=100000 | — |
+| r_at_5_table_loa_entries | 0 | ratio | corpus=100000 | — |
+| r_at_5_table_breadcrumbs | 0 | ratio | corpus=100000 | — |
+| r_at_5_table_learnings | 0 | ratio | corpus=100000 | — |
+| r_at_5_prov_user_authored | 0 | ratio | corpus=100000 | — |
+| r_at_5_prov_extracted | 0 | ratio | corpus=100000 | — |
+| r_at_5_prov_verbatim | 0 | ratio | corpus=100000 | — |
+| r_at_5_prov_derived | 0 | ratio | corpus=100000 | — |
+
+### Caveats
+
+- Synthetic corpus: deterministic seeded fixtures (seed 47). Absolute scores do not transfer to real-world corpora; compare runs only against this same fixture set.
+- Latency protocol: 1 unmeasured warmup pass per corpus size, then 5 measured repeats per query on a warm connection; p50/p95 are computed across all measured calls at that size. Relevance metrics come from the first measured pass (retrieval is deterministic for a fixed corpus).
+- Embedding service available: no. Suite C exercises the FTS5 keyword path (search()) only — semantic/hybrid retrieval is NOT measured in this baseline either way.
+- FTS5 MATCH is implicit AND with no stemming — paraphrase-category queries are expected to score near zero on keyword search. That gap is part of the honest baseline this suite records.
+- Dedup was NOT run before measurement: the corpus contains unmarked near-duplicates that legitimately compete in ranking. search() excludes only records already marked in dedup_lineage.
+- Ground truth never includes messages-table records — messages are noise-only in this corpus. The project column is part of every FTS index, so unscoped queries can match records via their project name alone.
+- No pass/fail threshold — baseline-first. Later regression gating can diff future runs against the checked-in baseline JSONL.
+
+---
+_All metrics are unblended. We do not publish composite scores. See the per-suite caveats before drawing conclusions._
\ No newline at end of file