diff --git a/CHANGELOG.md b/CHANGELOG.md
index 4a11335..abf0eda 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -12,6 +12,18 @@ while MCP tool names (`memory_search`, `memory_add`, etc.) remain stable.
 
 ### Added
 
+- **`recall dedup`** — non-destructive dedup with provenance-aware survivor
+  selection (#45): dry-run by default, `--execute` marks duplicates in the new
+  `dedup_lineage` table (schema migration 9→10) without touching the records,
+  `--delete` is the destructive opt-in (take `recall export --backup` first).
+  Detection combines normalized-text matching with semantic matching over
+  stored embeddings (conservative 0.95 default threshold, skip reported when
+  embeddings are unavailable). Survivor priority is `user_authored > verbatim
+  > extracted > derived > unknown`, then richness, importance, recency.
+  Within-table only; cross-table candidates are report-only. Marked
+  duplicates are hidden from every search path unless
+  `recall search --include-duplicates` is passed, and lineage rows are
+  included in `recall export`.
 - **`recall export`** — portable and disaster-recovery exports (#43): JSON,
   Markdown, SQL dump, and SQLite (`VACUUM INTO`) formats with a manifest
   (counts + provenance counts including explicit `unknown`), a stdout/file/
diff --git a/docs/architecture.md b/docs/architecture.md
index a7d859e..d84e960 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -70,6 +70,7 @@ both.
 | telos | Purpose framework entries (optional) | Yes |
 | documents | Imported standalone markdown documents (optional) | Yes |
 | embeddings | Vector embeddings for semantic search (768-dim, nomic-embed-text) | N/A |
+| dedup_lineage | Duplicate lineage audit trail from `recall dedup` (survivor, duplicate, reason, similarity, status) | No |
 
 All FTS5-indexed tables have automatic sync triggers.
 
@@ -88,6 +89,15 @@ rows stay `NULL` (unknown) until classified with
 `recall provenance backfill`, which only acts on deterministic write-path
 evidence and never guesses.
 
+The `dedup_lineage` table was added in schema migration 9→10. `recall dedup`
+marks duplicate records non-destructively by writing lineage rows here
+(survivor table/id, duplicate table/id, reason, similarity, status); marked
+duplicates stay in their source tables but are hidden from search unless
+`--include-duplicates` is passed. Survivor selection follows provenance order
+(`user_authored > verbatim > extracted > derived > unknown`), then richness,
+importance, and recency. Dedup acts within a table only; cross-table
+candidates are report-only.
+
 ## Tiered RecallStart (v0.7.0+)
 
 The `RecallStart` hook injects two tiers at the top of every session:
diff --git a/docs/cli-reference.md b/docs/cli-reference.md
index c3d38ff..671de42 100644
--- a/docs/cli-reference.md
+++ b/docs/cli-reference.md
@@ -17,6 +17,7 @@ recall search "query" -t decisions      # Hard-filter to decisions only
 recall search "query" --bias-type decisions # Prefer decisions, still show other matching tables
 recall search "query" -p myproject      # Filter by project
 recall search "query" --show-provenance # Show provenance for every result
+recall search "query" --include-duplicates # Include records marked by recall dedup
 recall semantic "query"                 # Semantic search (explicit)
 recall hybrid "query"                   # Hybrid search (explicit)
 ```
@@ -46,6 +47,8 @@ FTS5 supports boolean operators and prefix matching:
 
 By default, search output stays quiet about [Record Provenance](#record-provenance) when a record carries a known value, and visibly flags records whose provenance is unknown (legacy rows that predate the provenance column). Pass `--show-provenance` to display the provenance of every result.
 
+Records marked as duplicates by [`recall dedup`](#dedup) are hidden from every search path (keyword, semantic, hybrid) by default — the records and their lineage remain in the database. Pass `--include-duplicates` to show them.
+
 ---
 
 ## Capture
@@ -269,7 +272,7 @@ Formats:
 
 - **json / markdown** — app-level export of the durable memory tables
   (`sessions`, `messages`, `decisions`, `learnings`, `breadcrumbs`,
-  `loa_entries`). Every row of a provenance-bearing table carries an explicit
+  `loa_entries`, `dedup_lineage`). Every row of a provenance-bearing table carries an explicit
   `provenance` field; legacy `NULL` provenance is exported as the literal
   `unknown` — never omitted, never guessed (see Record Provenance above).
   Embeddings are excluded.
@@ -300,6 +303,46 @@ included.
 overwrites an existing file (a `-N` suffix is added on collision), and prints
 the output path.
 
+## Dedup
+
+Detect and mark duplicate memory records without erasing evidence or lineage.
+
+```bash
+recall dedup                            # Dry-run report (default — writes nothing)
+recall dedup --execute                  # Mark duplicates (non-destructive)
+recall dedup --execute --delete         # Destructive opt-in: hard-delete duplicates
+recall dedup -t breadcrumbs             # Scope to one table
+recall dedup -p myproject               # Scope to one project
+recall dedup --threshold 0.98           # Stricter semantic matching (default 0.95)
+recall dedup --no-semantic              # Exact/normalized text pass only
+```
+
+Safety model:
+
+- **Dry-run by default.** Mutations require `--execute`.
+- **Non-destructive by default.** `--execute` writes lineage rows to the
+  `dedup_lineage` table (survivor, duplicate, reason, similarity, status,
+  timestamp); the duplicate records themselves stay intact and are merely
+  hidden from search. `--delete` is the destructive opt-in and requires
+  `--execute` — run `recall export --backup` first.
+- **Within-table only.** Dedup never merges across tables (or across
+  projects). Cross-table duplicate candidates are report-only.
+- **Survivor priority** is `user_authored > verbatim > extracted > derived >
+  unknown` ([Record Provenance](#record-provenance)); ties break by richness
+  (longer normalized text), importance, recency, then lowest id.
+- **Detection** combines exact/normalized text matching with semantic
+  matching over stored embeddings (no embedding service call needed). The
+  semantic pass is skipped — and reported as skipped — when no embeddings
+  exist; records are never merged below the configured `--threshold`
+  (conservative default: 0.95 cosine similarity). Records with fewer than 20
+  significant characters are never candidates.
+- **Lifecycle-aware.** Only `active` decisions participate; superseded and
+  reverted decisions are managed by the decision lifecycle, not dedup.
+
+Marked duplicates are hidden from all search paths by default; see
+`recall search --include-duplicates`. Lineage is included in
+`recall export`, so the audit trail is portable.
+
 ## Admin
 
 ```bash
diff --git a/src/commands/dedup.ts b/src/commands/dedup.ts
new file mode 100644
index 0000000..7c865bf
--- /dev/null
+++ b/src/commands/dedup.ts
@@ -0,0 +1,145 @@
+// recall dedup command (issue #45).
+//
+// Dry-run by default. --execute marks duplicates (non-destructive: records
+// stay intact, hidden from search via dedup_lineage). --delete is the
+// destructive opt-in and requires --execute; take a `recall export --backup`
+// first. Core logic lives in src/lib/dedup.ts.
+
+import { getDb } from '../db/connection.js';
+import {
+  applyDedupPlan,
+  DEDUP_TABLES,
+  DEFAULT_SEMANTIC_THRESHOLD,
+  planDedup,
+  type ApplyResult,
+  type DedupPlan,
+} from '../lib/dedup.js';
+import type { ProvenanceTable } from '../types/index.js';
+
+export interface DedupOptions {
+  execute?: boolean;
+  delete?: boolean;
+  table?: string;
+  project?: string;
+  threshold?: number;
+  semantic?: boolean;
+}
+
+export interface DedupRunResult {
+  plan: DedupPlan;
+  applied: ApplyResult | null;
+}
+
+export function runDedup(options: DedupOptions = {}): DedupRunResult | undefined {
+  const execute = options.execute ?? false;
+  const destructive = options.delete ?? false;
+
+  if (destructive && !execute) {
+    console.error('--delete requires --execute. Dry-run never deletes.');
+    process.exitCode = 1;
+    return undefined;
+  }
+
+  const target = options.table ?? 'all';
+  if (target !== 'all' && !(DEDUP_TABLES as readonly string[]).includes(target)) {
+    console.error(
+      `Invalid --table "${target}". Valid tables: ${DEDUP_TABLES.join(', ')}, all.`
+    );
+    process.exitCode = 1;
+    return undefined;
+  }
+
+  const threshold = options.threshold ?? DEFAULT_SEMANTIC_THRESHOLD;
+  if (!Number.isFinite(threshold) || threshold <= 0 || threshold > 1) {
+    console.error(`Invalid --threshold "${options.threshold}". Expected a number in (0, 1].`);
+    process.exitCode = 1;
+    return undefined;
+  }
+
+  const db = getDb();
+  const plan = planDedup(db, {
+    tables: target === 'all' ? undefined : [target as ProvenanceTable],
+    project: options.project,
+    threshold,
+    semantic: options.semantic,
+  });
+
+  const mode = !execute
+    ? '[DRY RUN — no changes written]'
+    : destructive
+      ? '[EXECUTE + DELETE — destructive: duplicates will be removed]'
+      : '[EXECUTE — marking duplicates, non-destructive]';
+  console.log(`${mode}\n`);
+
+  if (destructive) {
+    console.log("Recommended: run 'recall export --backup' before destructive dedup.\n");
+  }
+
+  const verb = execute ? (destructive ? 'delete' : 'mark') : 'would mark';
+  let totalPlanned = 0;
+  for (const report of plan.tables) {
+    totalPlanned += report.planned.length;
+    const unchanged = report.scanned - report.planned.length;
+    const skipped: string[] = [];
+    if (report.alreadyMarked > 0) skipped.push(`${report.alreadyMarked} already marked`);
+    if (report.tooShort > 0) skipped.push(`${report.tooShort} too short`);
+    const skippedNote = skipped.length > 0 ? ` (${skipped.join(', ')})` : '';
+    console.log(
+      `${report.table}: scanned ${report.scanned}, exact groups ${report.exactGroups}, ` +
+      `semantic pairs ${report.semanticPairs}, ${verb} ${report.planned.length}, ` +
+      `unchanged ${unchanged}${skippedNote}`
+    );
+    for (const entry of report.planned.slice(0, 3)) {
+      const sim = entry.similarity !== null ? ` @ ${entry.similarity.toFixed(3)}` : '';
+      console.log(
+        `  #${entry.duplicate_id} → survivor #${entry.survivor_id} [${entry.reason}${sim}]`
+      );
+    }
+    if (report.planned.length > 3) {
+      console.log(`  ...and ${report.planned.length - 3} more`);
+    }
+  }
+  console.log('');
+
+  if (plan.semanticSkipped) {
+    console.log(`Semantic pass: skipped — ${plan.semanticSkipped}`);
+  } else {
+    console.log(`Semantic pass: threshold ${plan.threshold}`);
+  }
+
+  const crossText = plan.crossTable.textMatches;
+  console.log(
+    `Cross-table (report-only, never acted on): ${crossText.length} text match group(s), ` +
+    `${plan.crossTable.semanticPairs} semantic pair(s)`
+  );
+  for (const match of crossText.slice(0, 5)) {
+    const members = match.members.map(m => `${m.table}#${m.id}`).join(' ↔ ');
+    const projectTag = match.project ? ` [${match.project}]` : '';
+    console.log(`  ${members}${projectTag}`);
+  }
+  if (crossText.length > 5) {
+    console.log(`  ...and ${crossText.length - 5} more`);
+  }
+  console.log('');
+
+  if (!execute) {
+    if (totalPlanned > 0) {
+      console.log('Re-run with --execute to mark duplicates (non-destructive).');
+      console.log("Marked duplicates are hidden from search; use 'recall search --include-duplicates' to see them.");
+    } else {
+      console.log('No duplicates found.');
+    }
+    return { plan, applied: null };
+  }
+
+  const applied = applyDedupPlan(db, plan, { destructive });
+  if (destructive) {
+    const fkNote = applied.fkProtected > 0
+      ? ` (${applied.fkProtected} kept as marked — referenced by LoA lineage)`
+      : '';
+    console.log(`Deleted ${applied.deleted} duplicate(s)${fkNote}.`);
+  } else {
+    console.log(`Marked ${applied.marked} duplicate(s). Records remain intact and recoverable.`);
+  }
+  return { plan, applied };
+}
diff --git a/src/commands/embed.ts b/src/commands/embed.ts
index c9ae4a0..66bd623 100644
--- a/src/commands/embed.ts
+++ b/src/commands/embed.ts
@@ -2,8 +2,17 @@
 
 import { getDb } from '../db/connection.js';
 import { embed, embeddingToBlob, blobToEmbedding, cosineSimilarity, checkEmbeddingService, reciprocalRankFusion, EMBEDDING_MODEL } from '../lib/embeddings.js';
+import { notMarkedDuplicateSql } from '../lib/dedup.js';
 import { search as ftsSearch } from '../lib/memory.js';
 
+// Marked duplicates (recall dedup, issue #45) keep their embeddings but are
+// hidden from the semantic search paths, matching the FTS5 default.
+function embeddingsWhere(table?: string): string {
+  const conditions = [notMarkedDuplicateSql('source_table', 'source_id')];
+  if (table) conditions.push(`source_table = '${table}'`);
+  return `WHERE ${conditions.join(' AND ')}`;
+}
+
 interface EmbedOptions {
   table?: 'loa' | 'decisions' | 'messages' | 'learnings';
   limit?: number;
@@ -164,11 +173,10 @@ export async function runSemanticSearch(query: string, options: { table?: string
   const queryEmbedding = queryResult.embedding;
 
   // Get all embeddings (for now, brute force - will optimize later)
-  const tableFilter = options.table ? `WHERE source_table = '${options.table}'` : '';
   const embeddings = db.prepare(`
     SELECT id, source_table, source_id, embedding
     FROM embeddings
-    ${tableFilter}
+    ${embeddingsWhere(options.table)}
   `).all() as Array<{ id: number; source_table: string; source_id: number; embedding: Buffer }>;
 
   if (embeddings.length === 0) {
@@ -304,11 +312,10 @@ export async function runHybridSearch(query: string, options: { table?: string;
   const queryEmbedding = queryResult.embedding;
 
   // Get embeddings from database
-  const tableFilter = options.table ? `WHERE source_table = '${options.table}'` : '';
   const embeddings = db.prepare(`
     SELECT id, source_table, source_id, embedding
     FROM embeddings
-    ${tableFilter}
+    ${embeddingsWhere(options.table)}
   `).all() as Array<{ id: number; source_table: string; source_id: number; embedding: Buffer }>;
 
   // Calculate similarities
diff --git a/src/commands/search.ts b/src/commands/search.ts
index 2c13579..9c782a7 100644
--- a/src/commands/search.ts
+++ b/src/commands/search.ts
@@ -8,6 +8,7 @@ interface SearchOptions {
   biasType?: string;
   limit?: number;
   showProvenance?: boolean;
+  includeDuplicates?: boolean;
 }
 
 export function runSearch(query: string, options: SearchOptions): void {
@@ -23,7 +24,8 @@ export function runSearch(query: string, options: SearchOptions): void {
     project: options.project,
     table: options.table,
     biasType: options.biasType as SearchTable | undefined,
-    limit: options.limit || 20
+    limit: options.limit || 20,
+    includeDuplicates: options.includeDuplicates
   });
 
   if (results.length === 0) {
diff --git a/src/db/migrations.ts b/src/db/migrations.ts
index 415e1a4..e5c4cbe 100644
--- a/src/db/migrations.ts
+++ b/src/db/migrations.ts
@@ -198,6 +198,12 @@ export const MIGRATIONS: Migration[] = [
       }
     }
   },
+
+  // Migration 9 → 10: Dedup lineage table (issue #45).
+  // No-op — dedup_lineage and its indexes are brand new, handled by the
+  // CREATE TABLE IF NOT EXISTS DDL that runs before migrations (same
+  // precedent as migration 3 → 4 for the extraction tables).
+  (_db) => {},
 ];
 
 // ---------------------------------------------------------------------------
diff --git a/src/db/schema.ts b/src/db/schema.ts
index 7f60912..269e0b9 100644
--- a/src/db/schema.ts
+++ b/src/db/schema.ts
@@ -181,6 +181,24 @@ CREATE TABLE IF NOT EXISTS procedures (
   times_observed INTEGER DEFAULT 2,
   confidence TEXT DEFAULT 'medium' CHECK (confidence IN ('high', 'medium', 'low'))
 );
+
+-- Dedup lineage (issue #45): persistent audit trail of duplicate marking.
+-- Non-destructive by default — a 'marked' row hides the duplicate from search
+-- while the underlying record stays intact. 'deleted' records a destructive
+-- opt-in removal. 'reverted' is reserved vocabulary for a future unmark path
+-- (CHECK constraints cannot be widened later without a table rebuild).
+CREATE TABLE IF NOT EXISTS dedup_lineage (
+  id INTEGER PRIMARY KEY AUTOINCREMENT,
+  created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
+  survivor_table TEXT NOT NULL,
+  survivor_id INTEGER NOT NULL,
+  duplicate_table TEXT NOT NULL,
+  duplicate_id INTEGER NOT NULL,
+  reason TEXT NOT NULL CHECK (reason IN ('exact', 'semantic')),
+  similarity REAL,
+  status TEXT NOT NULL DEFAULT 'marked' CHECK (status IN ('marked', 'deleted', 'reverted')),
+  detail TEXT
+);
 `;
 
 export const CREATE_INDEXES = `
@@ -231,6 +249,14 @@ CREATE INDEX IF NOT EXISTS idx_documents_created ON documents(created_at);
 
 -- Extraction session indexes
 CREATE INDEX IF NOT EXISTS idx_extraction_sessions_ts ON extraction_sessions(timestamp DESC);
+
+-- Dedup lineage indexes: the partial unique index guarantees a record can be
+-- an actively marked duplicate at most once (idempotence); the survivor index
+-- supports lineage audits.
+CREATE UNIQUE INDEX IF NOT EXISTS idx_dedup_lineage_duplicate
+  ON dedup_lineage(duplicate_table, duplicate_id) WHERE status = 'marked';
+CREATE INDEX IF NOT EXISTS idx_dedup_lineage_survivor
+  ON dedup_lineage(survivor_table, survivor_id);
 `;
 
 export const CREATE_FTS = `
diff --git a/src/index.ts b/src/index.ts
index 4040e17..a2e2aaf 100644
--- a/src/index.ts
+++ b/src/index.ts
@@ -30,6 +30,8 @@ import { runOnboard } from './commands/onboard.js';
 import { runMigrate } from './commands/migrate.js';
 import { runPath } from './commands/path.js';
 import { runExport } from './commands/export.js';
+import { runDedup } from './commands/dedup.js';
+import { DEFAULT_SEMANTIC_THRESHOLD } from './lib/dedup.js';
 import { closeDb } from './db/connection.js';
 
 const program = new Command();
@@ -180,13 +182,15 @@ program
   .option('--bias-type <table>', 'Softly boost one table without filtering others (messages, loa, decisions, learnings, breadcrumbs)')
   .option('-l, --limit <n>', 'Max results', '20')
   .option('--show-provenance', 'Show provenance for every result (default: only unknown provenance is flagged)')
+  .option('--include-duplicates', 'Include records marked as duplicates by recall dedup (hidden by default)')
   .action((query, options) => {
     runSearch(query, {
       project: options.project,
       table: options.table,
       biasType: options.biasType,
       limit: parseInt(options.limit, 10),
-      showProvenance: options.showProvenance
+      showProvenance: options.showProvenance,
+      includeDuplicates: options.includeDuplicates
     });
     closeDb();
   });
@@ -658,6 +662,30 @@ program
     closeDb();
   });
 
+// recall dedup — non-destructive duplicate detection (issue #45)
+// Dry-run by default; --execute marks duplicates in dedup_lineage (records
+// stay intact, hidden from search); --delete is the destructive opt-in.
+program
+  .command('dedup')
+  .description('Detect and mark duplicate memory records (dry-run by default; non-destructive)')
+  .option('--execute', 'Apply the plan: mark duplicates (default is dry-run)')
+  .option('--delete', "Destructive opt-in: hard-delete duplicates instead of marking (requires --execute; run 'recall export --backup' first)")
+  .option('-t, --table <table>', 'Target table: messages, decisions, learnings, breadcrumbs, loa_entries, all', 'all')
+  .option('-p, --project <name>', 'Scope to one project')
+  .option('--threshold <n>', `Semantic similarity threshold (0-1)`, String(DEFAULT_SEMANTIC_THRESHOLD))
+  .option('--no-semantic', 'Skip the semantic (embeddings) pass')
+  .action((options) => {
+    runDedup({
+      execute: options.execute,
+      delete: options.delete,
+      table: options.table,
+      project: options.project,
+      threshold: parseFloat(options.threshold),
+      semantic: options.semantic
+    });
+    closeDb();
+  });
+
 // Default command: recall <query> → hybrid search (Phase 3: best of both worlds)
 program
   .arguments('[query]')
@@ -668,7 +696,7 @@ program
   .option('-k, --keyword', 'Use keyword search only (FTS5)')
   .option('-v, --vector', 'Use vector search only (semantic)')
   .action(async (query, options) => {
-    if (query && !['init', 'add', 'search', 'recent', 'show', 'stats', 'import', 'import-conversations', 'loa', 'telos', 'docs', 'dump', 'embed', 'semantic', 'hybrid', 'doctor', 'importance', 'provenance', 'pin', 'unpin', 'decision', 'prune', 'cluster', 'import-legacy', 'benchmark', 'onboard', 'migrate', 'path', 'export'].includes(query)) {
+    if (query && !['init', 'add', 'search', 'recent', 'show', 'stats', 'import', 'import-conversations', 'loa', 'telos', 'docs', 'dump', 'embed', 'semantic', 'hybrid', 'doctor', 'importance', 'provenance', 'pin', 'unpin', 'decision', 'prune', 'cluster', 'import-legacy', 'benchmark', 'onboard', 'migrate', 'path', 'export', 'dedup'].includes(query)) {
       if (options.keyword) {
         // FTS5 only
         runSearch(query, {
diff --git a/src/lib/dedup.ts b/src/lib/dedup.ts
new file mode 100644
index 0000000..fbfce6e
--- /dev/null
+++ b/src/lib/dedup.ts
@@ -0,0 +1,629 @@
+// recall dedup — core logic (issue #45).
+//
+// Non-destructive dedup with provenance-aware survivor selection. The decision
+// logic (normalization, survivor ordering, grouping, semantic pairing) is kept
+// as pure exported functions over plain data so it stays unit-testable
+// (issue #44 phase 2 adds property tests over these functions).
+//
+// Safety model:
+// - Dry-run by default; mutations require an explicit execute step.
+// - Default action marks duplicates in dedup_lineage — records stay intact.
+//   Hard deletion is a separate destructive opt-in.
+// - Dedup acts within a table (and within a project) only. Cross-table
+//   duplicate candidates are report-only.
+// - Survivor priority: user_authored > verbatim > extracted > derived >
+//   unknown (PROVENANCE_VALUES in types/index.ts is the single source of
+//   truth). Ties break by richness (normalized length), importance, recency,
+//   then lowest id for determinism.
+// - Semantic detection compares stored embeddings pairwise — no embedding
+//   service call is needed. Pairs are never chained transitively: every
+//   marked duplicate has a direct similarity >= threshold to its survivor.
+//
+// Bind-count note (see src/lib/chunk.ts): scans use keyset pagination
+// (`WHERE id > ? LIMIT ?`, fixed binds). Destructive deletion builds
+// input-scaled `IN (?,...)` lists and therefore goes through chunked().
+
+import { createHash } from 'crypto';
+import { Database } from 'bun:sqlite';
+import { chunked, SQLITE_SAFE_CHUNK_SIZE } from './chunk.js';
+import { blobToEmbedding, cosineSimilarity } from './embeddings.js';
+import {
+  PROVENANCE_TABLES,
+  PROVENANCE_VALUES,
+  type Provenance,
+  type ProvenanceTable,
+} from '../types/index.js';
+
+/** Tables dedup operates on — the provenance-bearing memory tables. */
+export const DEDUP_TABLES = PROVENANCE_TABLES;
+
+/**
+ * Conservative default for semantic matching. At 0.95 cosine similarity on
+ * nomic-embed-text vectors, two records are near-verbatim restatements;
+ * anything below stays untouched (never auto-merge below the threshold).
+ */
+export const DEFAULT_SEMANTIC_THRESHOLD = 0.95;
+
+/**
+ * Records whose normalized text is shorter than this are never dedup
+ * candidates. Short acknowledgments ("ok", "thanks", "yes") repeat
+ * legitimately across sessions — mass-marking them would be noise.
+ */
+export const MIN_DEDUP_TEXT_LENGTH = 20;
+
+export type DedupReason = 'exact' | 'semantic';
+export type LineageStatus = 'marked' | 'deleted' | 'reverted';
+
+export interface DedupCandidate {
+  table: ProvenanceTable;
+  id: number;
+  project: string | null;
+  provenance: Provenance | null;
+  importance: number;
+  created_at: string;
+  /** sha256 of the normalized dedup text — exact-match grouping key. */
+  key: string;
+  /** Length of the normalized text — the richness tie-breaker. */
+  textLength: number;
+}
+
+export interface LineageEntry {
+  survivor_table: ProvenanceTable;
+  survivor_id: number;
+  duplicate_table: ProvenanceTable;
+  duplicate_id: number;
+  reason: DedupReason;
+  similarity: number | null;
+  /** JSON audit detail (project, survivor/duplicate provenance). */
+  detail: string;
+}
+
+// ---------------------------------------------------------------------------
+// Pure functions — text normalization and survivor selection
+// ---------------------------------------------------------------------------
+
+/**
+ * Canonical text normalization for duplicate detection: lowercase, strip
+ * quote characters, collapse whitespace. Same convention the extraction
+ * hooks use for their log-level dedup.
+ */
+export function normalizeText(s: string): string {
+  return s.toLowerCase().replace(/['"]/g, '').replace(/\s+/g, ' ').trim();
+}
+
+/**
+ * Survivor priority rank — lower wins. Order comes from PROVENANCE_VALUES
+ * (user_authored > verbatim > extracted > derived); unknown (NULL) ranks
+ * last. Note PROVENANCE_VALUES is already declared in survivor order.
+ */
+export function provenanceRank(p: Provenance | null | undefined): number {
+  if (!p) return PROVENANCE_VALUES.length;
+  const idx = PROVENANCE_VALUES.indexOf(p);
+  return idx === -1 ? PROVENANCE_VALUES.length : idx;
+}
+
+/**
+ * Comparator placing the survivor first: provenance rank, then richness
+ * (longer normalized text), importance, recency (newer created_at), and
+ * finally lowest id so ordering is total and deterministic.
+ */
+export function compareForSurvivor(a: DedupCandidate, b: DedupCandidate): number {
+  const rank = provenanceRank(a.provenance) - provenanceRank(b.provenance);
+  if (rank !== 0) return rank;
+  if (a.textLength !== b.textLength) return b.textLength - a.textLength;
+  const importance = (b.importance ?? 5) - (a.importance ?? 5);
+  if (importance !== 0) return importance;
+  // ISO and SQLite CURRENT_TIMESTAMP formats both sort lexicographically.
+  if (a.created_at !== b.created_at) return a.created_at > b.created_at ? -1 : 1;
+  return a.id - b.id;
+}
+
+export function selectSurvivor(group: DedupCandidate[]): {
+  survivor: DedupCandidate;
+  duplicates: DedupCandidate[];
+} {
+  if (group.length === 0) throw new Error('selectSurvivor: empty group');
+  const sorted = [...group].sort(compareForSurvivor);
+  return { survivor: sorted[0], duplicates: sorted.slice(1) };
+}
+
+/**
+ * Grouping key: same table, same project, same normalized text. Fields are
+ * joined with the NUL escape `\x00` — a separator that cannot occur in any
+ * field, so boundaries can never be forged by content.
+ */
+function groupKey(c: DedupCandidate): string {
+  return `${c.table}\x00${c.project ?? ''}\x00${c.key}`;
+}
+
+/**
+ * Exact (normalized-text) duplicate groups within a table and project.
+ * Returns only groups with two or more members.
+ */
+export function findExactGroups(candidates: DedupCandidate[]): DedupCandidate[][] {
+  const byKey = new Map<string, DedupCandidate[]>();
+  for (const c of candidates) {
+    const k = groupKey(c);
+    const group = byKey.get(k);
+    if (group) group.push(c);
+    else byKey.set(k, [c]);
+  }
+  return [...byKey.values()].filter(g => g.length >= 2);
+}
+
+export interface CrossTableMatch {
+  project: string | null;
+  members: Array<{ table: ProvenanceTable; id: number }>;
+}
+
+/**
+ * Report-only: normalized-text matches spanning two or more tables within
+ * the same project. Dedup never acts across tables.
+ */
+export function findCrossTableMatches(candidates: DedupCandidate[]): CrossTableMatch[] {
+  const byKey = new Map<string, DedupCandidate[]>();
+  for (const c of candidates) {
+    const k = `${c.project ?? ''}\x00${c.key}`;
+    const group = byKey.get(k);
+    if (group) group.push(c);
+    else byKey.set(k, [c]);
+  }
+  const matches: CrossTableMatch[] = [];
+  for (const group of byKey.values()) {
+    const tables = new Set(group.map(c => c.table));
+    if (tables.size < 2) continue;
+    matches.push({
+      project: group[0].project,
+      members: group.map(c => ({ table: c.table, id: c.id })),
+    });
+  }
+  return matches;
+}
+
+export interface EmbeddedCandidate {
+  candidate: DedupCandidate;
+  embedding: number[];
+}
+
+export interface SemanticPair {
+  a: DedupCandidate;
+  b: DedupCandidate;
+  similarity: number;
+  sameTable: boolean;
+}
+
+/**
+ * Pairwise cosine similarity over stored embeddings. Returns every pair at
+ * or above the threshold, split into actionable (same table + project) and
+ * cross-table report-only pairs. Pairs within one table but across projects
+ * are ignored — dedup scope is per-project, matching the exact pass.
+ *
+ * O(n^2) over embedded records — acceptable at personal-memory scale and the
+ * same brute-force shape the cluster command and hybrid search already use.
+ */
+export function findSemanticPairs(
+  embedded: EmbeddedCandidate[],
+  threshold: number
+): SemanticPair[] {
+  const pairs: SemanticPair[] = [];
+  for (let i = 0; i < embedded.length; i++) {
+    for (let j = i + 1; j < embedded.length; j++) {
+      const x = embedded[i];
+      const y = embedded[j];
+      const sameTable = x.candidate.table === y.candidate.table;
+      if (sameTable && x.candidate.project !== y.candidate.project) continue;
+      // Exact duplicates are the exact pass's job; identical keys here would
+      // double-report the same records under a second reason.
+      if (x.candidate.key === y.candidate.key) continue;
+      const similarity = cosineSimilarity(x.embedding, y.embedding);
+      if (similarity >= threshold) {
+        pairs.push({ a: x.candidate, b: y.candidate, similarity, sameTable });
+      }
+    }
+  }
+  // Deterministic processing order: strongest pairs first.
+  pairs.sort((p, q) =>
+    q.similarity - p.similarity ||
+    p.a.table.localeCompare(q.a.table) ||
+    p.a.id - q.a.id ||
+    p.b.id - q.b.id
+  );
+  return pairs;
+}
+
+// ---------------------------------------------------------------------------
+// SQL fragments shared with the search paths
+// ---------------------------------------------------------------------------
+
+/**
+ * SQL fragment excluding records marked as duplicates. `tableExpr` and
+ * `idExpr` are SQL expressions (a quoted literal like `'messages'` or a
+ * column reference like `e.source_table`) — never user input.
+ */
+export function notMarkedDuplicateSql(tableExpr: string, idExpr: string): string {
+  return `NOT EXISTS (
+    SELECT 1 FROM dedup_lineage dl
+    WHERE dl.duplicate_table = ${tableExpr}
+      AND dl.duplicate_id = ${idExpr}
+      AND dl.status = 'marked'
+  )`;
+}
+
+// ---------------------------------------------------------------------------
+// Database plumbing — scans, lineage, deletion
+// ---------------------------------------------------------------------------
+
+interface TableScanConfig {
+  textColumns: string[];
+  createdAtColumn: string;
+  extraWhere?: string;
+}
+
+// Which columns constitute "the text" of a record for duplicate detection.
+// Decisions are scanned at status='active' only: supersede/revert already
+// manage lifecycle copies, and a superseded row must never win survivorship
+// over (and thereby hide) an active decision.
+const TABLE_SCAN_CONFIG: Record<ProvenanceTable, TableScanConfig> = {
+  messages: { textColumns: ['content'], createdAtColumn: 'timestamp' },
+  decisions: { textColumns: ['decision'], createdAtColumn: 'created_at', extraWhere: "status = 'active'" },
+  learnings: { textColumns: ['problem', 'solution'], createdAtColumn: 'created_at' },
+  breadcrumbs: { textColumns: ['content'], createdAtColumn: 'created_at' },
+  loa_entries: { textColumns: ['title', 'fabric_extract'], createdAtColumn: 'created_at' },
+};
+
+export interface ScanResult {
+  candidates: DedupCandidate[];
+  scanned: number;
+  tooShort: number;
+}
+
+/**
+ * Read one table's dedup candidates in bounded batches via keyset pagination
+ * (fixed binds per statement regardless of table size). Rows below
+ * MIN_DEDUP_TEXT_LENGTH are counted but excluded.
+ */
+export function scanCandidates(
+  db: Database,
+  table: ProvenanceTable,
+  project?: string,
+  batchSize: number = SQLITE_SAFE_CHUNK_SIZE
+): ScanResult {
+  const config = TABLE_SCAN_CONFIG[table];
+  const where = [
+    'id > ?',
+    ...(config.extraWhere ? [config.extraWhere] : []),
+    ...(project !== undefined ? ['project = ?'] : []),
+  ].join(' AND ');
+  const stmt = db.prepare(`
+    SELECT id, project, provenance, importance,
+           ${config.createdAtColumn} AS created_at,
+           ${config.textColumns.join(', ')}
+    FROM ${table}
+    WHERE ${where}
+    ORDER BY id
+    LIMIT ?
+  `);
+
+  const candidates: DedupCandidate[] = [];
+  let scanned = 0;
+  let tooShort = 0;
+  let lastId = 0;
+  for (;;) {
+    const params = project !== undefined ? [lastId, project, batchSize] : [lastId, batchSize];
+    const batch = stmt.all(...params) as Array<Record<string, unknown>>;
+    if (batch.length === 0) break;
+    for (const row of batch) {
+      scanned++;
+      const text = config.textColumns
+        .map(c => row[c])
+        .filter(v => v !== null && v !== undefined && String(v).length > 0)
+        .join('\n');
+      const normalized = normalizeText(text);
+      if (normalized.length < MIN_DEDUP_TEXT_LENGTH) {
+        tooShort++;
+        continue;
+      }
+      candidates.push({
+        table,
+        id: row.id as number,
+        project: (row.project as string | null) ?? null,
+        provenance: (row.provenance as Provenance | null) ?? null,
+        importance: (row.importance as number | null) ?? 5,
+        created_at: String(row.created_at),
+        key: createHash('sha256').update(normalized).digest('hex'),
+        textLength: normalized.length,
+      });
+    }
+    lastId = batch[batch.length - 1].id as number;
+  }
+  return { candidates, scanned, tooShort };
+}
+
+/** (table, id) pairs currently marked as duplicates — excluded from scans. */
+export function loadMarkedDuplicates(db: Database): Set<string> {
+  const rows = db.prepare(
+    `SELECT duplicate_table, duplicate_id FROM dedup_lineage WHERE status = 'marked'`
+  ).all() as Array<{ duplicate_table: string; duplicate_id: number }>;
+  return new Set(rows.map(r => `${r.duplicate_table}:${r.duplicate_id}`));
+}
+
+/** Stored embeddings for one table, keyed by source row id. */
+export function loadEmbeddings(db: Database, table: ProvenanceTable): Map<number, number[]> {
+  const rows = db.prepare(
+    `SELECT source_id, embedding FROM embeddings WHERE source_table = ?`
+  ).all(table) as Array<{ source_id: number; embedding: Buffer }>;
+  const map = new Map<number, number[]>();
+  for (const row of rows) {
+    map.set(row.source_id, blobToEmbedding(row.embedding));
+  }
+  return map;
+}
+
+/**
+ * Ids in `table` that other rows reference via foreign keys (loa_entries
+ * message ranges, loa parent links). With foreign_keys=ON these cannot be
+ * hard-deleted; destructive mode downgrades them to 'marked' instead of
+ * failing the whole transaction.
+ */
+export function fkProtectedIds(db: Database, table: ProvenanceTable): Set<number> {
+  const ids = new Set<number>();
+  if (table === 'messages') {
+    const rows = db.prepare(
+      `SELECT message_range_start AS s, message_range_end AS e FROM loa_entries
+       WHERE message_range_start IS NOT NULL OR message_range_end IS NOT NULL`
+    ).all() as Array<{ s: number | null; e: number | null }>;
+    for (const row of rows) {
+      if (row.s !== null) ids.add(row.s);
+      if (row.e !== null) ids.add(row.e);
+    }
+  } else if (table === 'loa_entries') {
+    const rows = db.prepare(
+      `SELECT DISTINCT parent_loa_id AS p FROM loa_entries WHERE parent_loa_id IS NOT NULL`
+    ).all() as Array<{ p: number }>;
+    for (const row of rows) ids.add(row.p);
+  }
+  return ids;
+}
+
+// ---------------------------------------------------------------------------
+// Plan and apply
+// ---------------------------------------------------------------------------
+
+export interface DedupTableReport {
+  table: ProvenanceTable;
+  scanned: number;
+  tooShort: number;
+  alreadyMarked: number;
+  exactGroups: number;
+  semanticPairs: number;
+  planned: LineageEntry[];
+}
+
+export interface DedupPlan {
+  tables: DedupTableReport[];
+  threshold: number;
+  /** Reason the semantic pass did not run, or null if it ran. */
+  semanticSkipped: string | null;
+  crossTable: {
+    textMatches: CrossTableMatch[];
+    semanticPairs: number;
+  };
+}
+
+export interface PlanDedupOptions {
+  tables?: ProvenanceTable[];
+  project?: string;
+  threshold?: number;
+  semantic?: boolean;
+}
+
+function lineageDetail(survivor: DedupCandidate, duplicate: DedupCandidate): string {
+  return JSON.stringify({
+    project: duplicate.project,
+    survivor_provenance: survivor.provenance ?? 'unknown',
+    duplicate_provenance: duplicate.provenance ?? 'unknown',
+  });
+}
+
+function toLineage(
+  survivor: DedupCandidate,
+  duplicate: DedupCandidate,
+  reason: DedupReason,
+  similarity: number | null
+): LineageEntry {
+  return {
+    survivor_table: survivor.table,
+    survivor_id: survivor.id,
+    duplicate_table: duplicate.table,
+    duplicate_id: duplicate.id,
+    reason,
+    similarity,
+    detail: lineageDetail(survivor, duplicate),
+  };
+}
+
+/**
+ * Compute the full dedup plan without writing anything. The same plan is
+ * what a dry run reports and what an execute run applies.
+ */
+export function planDedup(db: Database, options: PlanDedupOptions = {}): DedupPlan {
+  const tables = options.tables ?? [...DEDUP_TABLES];
+  const threshold = options.threshold ?? DEFAULT_SEMANTIC_THRESHOLD;
+  const semantic = options.semantic ?? true;
+  const marked = loadMarkedDuplicates(db);
+
+  const reports = new Map<ProvenanceTable, DedupTableReport>();
+  const eligible = new Map<ProvenanceTable, DedupCandidate[]>();
+
+  // Exact pass — within table + project.
+  const plannedDuplicateKeys = new Set<string>();
+  const plannedSurvivorKeys = new Set<string>();
+  for (const table of tables) {
+    const scan = scanCandidates(db, table, options.project);
+    const fresh = scan.candidates.filter(c => !marked.has(`${c.table}:${c.id}`));
+    const report: DedupTableReport = {
+      table,
+      scanned: scan.scanned,
+      tooShort: scan.tooShort,
+      alreadyMarked: scan.candidates.length - fresh.length,
+      exactGroups: 0,
+      semanticPairs: 0,
+      planned: [],
+    };
+    for (const group of findExactGroups(fresh)) {
+      report.exactGroups++;
+      const { survivor, duplicates } = selectSurvivor(group);
+      plannedSurvivorKeys.add(`${survivor.table}:${survivor.id}`);
+      for (const dup of duplicates) {
+        report.planned.push(toLineage(survivor, dup, 'exact', 1.0));
+        plannedDuplicateKeys.add(`${dup.table}:${dup.id}`);
+      }
+    }
+    reports.set(table, report);
+    eligible.set(table, fresh);
+  }
+
+  // Records still standing after the exact pass (survivors + unmatched).
+  const remaining = [...eligible.values()].flat()
+    .filter(c => !plannedDuplicateKeys.has(`${c.table}:${c.id}`));
+
+  // Cross-table normalized-text matches — report-only.
+  const crossTableText = findCrossTableMatches(remaining);
+
+  // Semantic pass — stored embeddings only; no embedding service required.
+  let semanticSkipped: string | null = null;
+  let crossTableSemantic = 0;
+  if (!semantic) {
+    semanticSkipped = 'disabled (--no-semantic)';
+  } else {
+    const embedded: EmbeddedCandidate[] = [];
+    for (const table of tables) {
+      const embeddings = loadEmbeddings(db, table);
+      if (embeddings.size === 0) continue;
+      for (const candidate of remaining) {
+        if (candidate.table !== table) continue;
+        const embedding = embeddings.get(candidate.id);
+        if (embedding) embedded.push({ candidate, embedding });
+      }
+    }
+    if (embedded.length === 0) {
+      semanticSkipped =
+        'no stored embeddings for the scanned records (run `recall embed backfill` to enable semantic detection)';
+    } else {
+      const pairs = findSemanticPairs(embedded, threshold);
+      for (const pair of pairs) {
+        if (!pair.sameTable) {
+          crossTableSemantic++;
+          continue;
+        }
+        const aKey = `${pair.a.table}:${pair.a.id}`;
+        const bKey = `${pair.b.table}:${pair.b.id}`;
+        // Greedy, strongest-first: a record planned as a duplicate can
+        // neither survive nor be re-marked, and a record planned as a
+        // survivor (in either pass) can never be re-marked by a weaker
+        // pair — lineage stays one hop deep, no transitive chaining.
+        if (plannedDuplicateKeys.has(aKey) || plannedDuplicateKeys.has(bKey)) continue;
+        if (plannedSurvivorKeys.has(aKey) || plannedSurvivorKeys.has(bKey)) continue;
+        const { survivor, duplicates } = selectSurvivor([pair.a, pair.b]);
+        const report = reports.get(pair.a.table)!;
+        report.semanticPairs++;
+        report.planned.push(toLineage(survivor, duplicates[0], 'semantic', pair.similarity));
+        plannedDuplicateKeys.add(`${duplicates[0].table}:${duplicates[0].id}`);
+        plannedSurvivorKeys.add(`${survivor.table}:${survivor.id}`);
+      }
+    }
+  }
+
+  return {
+    tables: tables.map(t => reports.get(t)!),
+    threshold,
+    semanticSkipped,
+    crossTable: { textMatches: crossTableText, semanticPairs: crossTableSemantic },
+  };
+}
+
+export interface ApplyResult {
+  marked: number;
+  deleted: number;
+  /** Duplicates kept as 'marked' in destructive mode because of FK references. */
+  fkProtected: number;
+}
+
+/**
+ * Apply a plan: insert lineage rows ('marked' by default). With
+ * `destructive`, duplicate rows are hard-deleted (with their embeddings) and
+ * lineage status is 'deleted' — except FK-referenced rows, which stay marked.
+ * Runs in one transaction; failures roll back everything.
+ */
+export function applyDedupPlan(
+  db: Database,
+  plan: DedupPlan,
+  options: { destructive?: boolean } = {}
+): ApplyResult {
+  const destructive = options.destructive ?? false;
+  const entries = plan.tables.flatMap(t => t.planned);
+  const result: ApplyResult = { marked: 0, deleted: 0, fkProtected: 0 };
+  if (entries.length === 0) return result;
+
+  const insert = db.prepare(`
+    INSERT INTO dedup_lineage
+      (survivor_table, survivor_id, duplicate_table, duplicate_id, reason, similarity, status, detail)
+    VALUES (?, ?, ?, ?, ?, ?, ?, ?)
+  `);
+
+  // FK lookups are per-table constants within this transaction; cache them so
+  // a large plan doesn't re-query loa_entries per duplicate row.
+  const fkCache = new Map<ProvenanceTable, Set<number>>();
+  const fkProtected = (table: ProvenanceTable): Set<number> => {
+    let cached = fkCache.get(table);
+    if (!cached) {
+      cached = fkProtectedIds(db, table);
+      fkCache.set(table, cached);
+    }
+    return cached;
+  };
+
+  const apply = db.transaction(() => {
+    const toDelete = new Map<ProvenanceTable, number[]>();
+    for (const entry of entries) {
+      let status: LineageStatus = 'marked';
+      if (destructive) {
+        const protectedIds = fkProtected(entry.duplicate_table);
+        if (protectedIds.has(entry.duplicate_id)) {
+          result.fkProtected++;
+        } else {
+          status = 'deleted';
+          const list = toDelete.get(entry.duplicate_table);
+          if (list) list.push(entry.duplicate_id);
+          else toDelete.set(entry.duplicate_table, [entry.duplicate_id]);
+        }
+      }
+      insert.run(
+        entry.survivor_table,
+        entry.survivor_id,
+        entry.duplicate_table,
+        entry.duplicate_id,
+        entry.reason,
+        entry.similarity,
+        status,
+        entry.detail
+      );
+      if (status === 'marked') result.marked++;
+      else result.deleted++;
+    }
+
+    // Input-scaled IN lists — chunked() per the src/lib/chunk.ts audit note.
+    for (const [table, ids] of toDelete) {
+      for (const chunk of chunked(ids)) {
+        const placeholders = chunk.map(() => '?').join(', ');
+        db.prepare(`DELETE FROM ${table} WHERE id IN (${placeholders})`).run(...chunk);
+        db.prepare(
+          `DELETE FROM embeddings WHERE source_table = ? AND source_id IN (${placeholders})`
+        ).run(table, ...chunk);
+      }
+    }
+  });
+  apply();
+
+  return result;
+}
diff --git a/src/lib/embeddings.ts b/src/lib/embeddings.ts
index 8dc22f4..a46632b 100644
--- a/src/lib/embeddings.ts
+++ b/src/lib/embeddings.ts
@@ -84,12 +84,17 @@ export function embeddingToBlob(embedding: number[]): Buffer {
 }
 
 /**
- * Convert SQLite BLOB back to embedding array
+ * Convert SQLite BLOB back to embedding array.
+ * bun:sqlite returns BLOB columns as Uint8Array (not Buffer) — wrap without
+ * copying so readFloatLE is available either way.
  */
-export function blobToEmbedding(blob: Buffer): number[] {
+export function blobToEmbedding(blob: Buffer | Uint8Array): number[] {
+  const buf = Buffer.isBuffer(blob)
+    ? blob
+    : Buffer.from(blob.buffer, blob.byteOffset, blob.byteLength);
   const embedding: number[] = [];
-  for (let i = 0; i < blob.length; i += 4) {
-    embedding.push(blob.readFloatLE(i));
+  for (let i = 0; i < buf.length; i += 4) {
+    embedding.push(buf.readFloatLE(i));
   }
   return embedding;
 }
diff --git a/src/lib/export.ts b/src/lib/export.ts
index 2a072bb..a97573d 100644
--- a/src/lib/export.ts
+++ b/src/lib/export.ts
@@ -25,7 +25,11 @@ import { SQLITE_SAFE_CHUNK_SIZE } from './chunk.js';
 import { getMigrationVersion } from '../db/migrations.js';
 import { VERSION } from '../version.js';
 
-/** Durable memory tables included in app-level (JSON/Markdown/SQL) exports. */
+/**
+ * Durable tables included in app-level (JSON/Markdown/SQL) exports: the
+ * memory tables plus dedup_lineage (issue #45), so duplicate lineage stays
+ * portable and auditable alongside the records it describes.
+ */
 export const EXPORT_TABLES = [
   'sessions',
   'messages',
@@ -33,6 +37,7 @@ export const EXPORT_TABLES = [
   'learnings',
   'breadcrumbs',
   'loa_entries',
+  'dedup_lineage',
 ] as const;
 export type ExportTable = typeof EXPORT_TABLES[number];
 
diff --git a/src/lib/memory.ts b/src/lib/memory.ts
index eb89ce3..4ad246c 100644
--- a/src/lib/memory.ts
+++ b/src/lib/memory.ts
@@ -2,6 +2,7 @@
 
 import { getDb, getDbPath } from '../db/connection.js';
 import { existsSync, statSync } from 'fs';
+import { notMarkedDuplicateSql } from './dedup.js';
 import type { Session, Message, Decision, Learning, Breadcrumb, LoaEntry, Stats, SearchResult, Provenance } from '../types/index.js';
 
 // ============ Sessions ============
@@ -271,6 +272,15 @@ export interface MemorySearchOptions {
   table?: string;
   limit?: number;
   biasType?: SearchTable;
+  /** Include records marked as duplicates by `recall dedup` (hidden by default). */
+  includeDuplicates?: boolean;
+}
+
+// Marked duplicates (issue #45) are hidden from search by default — the
+// lineage row in dedup_lineage keeps them recoverable and auditable.
+function duplicateFilter(options: MemorySearchOptions | undefined, physicalTable: string, idExpr: string): string {
+  if (options?.includeDuplicates) return '';
+  return `AND ${notMarkedDuplicateSql(`'${physicalTable}'`, idExpr)}`;
 }
 
 const TYPE_BIAS_RANK_MULTIPLIER = 4;
@@ -313,6 +323,7 @@ export function search(query: string, options?: MemorySearchOptions): SearchResu
           FROM messages_fts f
           JOIN messages m ON m.id = f.rowid
           WHERE messages_fts MATCH ?
+          ${duplicateFilter(options, 'messages', 'm.id')}
           ${options?.project ? 'AND m.project = ?' : ''}
           ORDER BY f.rank
           LIMIT ?
@@ -325,6 +336,7 @@ export function search(query: string, options?: MemorySearchOptions): SearchResu
           JOIN decisions d ON d.id = f.rowid
           WHERE decisions_fts MATCH ?
           AND d.status = 'active'
+          ${duplicateFilter(options, 'decisions', 'd.id')}
           ${options?.project ? 'AND d.project = ?' : ''}
           ORDER BY CASE d.confidence WHEN 'high' THEN 0 WHEN 'medium' THEN 1 WHEN 'low' THEN 2 ELSE 1 END, f.rank
           LIMIT ?
@@ -336,6 +348,7 @@ export function search(query: string, options?: MemorySearchOptions): SearchResu
           FROM learnings_fts f
           JOIN learnings l ON l.id = f.rowid
           WHERE learnings_fts MATCH ?
+          ${duplicateFilter(options, 'learnings', 'l.id')}
           ${options?.project ? 'AND l.project = ?' : ''}
           ORDER BY f.rank
           LIMIT ?
@@ -347,6 +360,7 @@ export function search(query: string, options?: MemorySearchOptions): SearchResu
           FROM breadcrumbs_fts f
           JOIN breadcrumbs b ON b.id = f.rowid
           WHERE breadcrumbs_fts MATCH ?
+          ${duplicateFilter(options, 'breadcrumbs', 'b.id')}
           ${options?.project ? 'AND b.project = ?' : ''}
           ORDER BY f.rank
           LIMIT ?
@@ -358,6 +372,7 @@ export function search(query: string, options?: MemorySearchOptions): SearchResu
           FROM loa_fts f
           JOIN loa_entries l ON l.id = f.rowid
           WHERE loa_fts MATCH ?
+          ${duplicateFilter(options, 'loa_entries', 'l.id')}
           ${options?.project ? 'AND l.project = ?' : ''}
           ORDER BY f.rank
           LIMIT ?
diff --git a/src/mcp-server.ts b/src/mcp-server.ts
index 958d001..242085f 100644
--- a/src/mcp-server.ts
+++ b/src/mcp-server.ts
@@ -66,6 +66,7 @@ import {
 	reciprocalRankFusion,
 	checkEmbeddingService,
 } from "./lib/embeddings.js";
+import { notMarkedDuplicateSql } from "./lib/dedup.js";
 import type { Provenance } from "./types/index.js";
 import { existsSync } from "fs";
 
@@ -118,9 +119,12 @@ async function hybridSearch(
 			const queryResult = await embed(query);
 			const queryEmbedding = queryResult.embedding;
 
+			// Marked duplicates (recall dedup, issue #45) keep their embeddings
+			// but are hidden from the vector path, matching the FTS5 default.
 			const embeddings = db
 				.prepare(`
         SELECT source_table, source_id, embedding FROM embeddings
+        WHERE ${notMarkedDuplicateSql("source_table", "source_id")}
       `)
 				.all() as Array<{
 				source_table: string;
diff --git a/tests/commands/dedup.test.ts b/tests/commands/dedup.test.ts
new file mode 100644
index 0000000..5c309b0
--- /dev/null
+++ b/tests/commands/dedup.test.ts
@@ -0,0 +1,345 @@
+// recall dedup — issue #45 acceptance criteria.
+//
+// Behavior under test:
+// - dry-run by default: reports the plan, writes nothing
+// - --execute marks duplicates in dedup_lineage; records stay intact
+// - exact/normalized duplicate detection within table + project
+// - survivor priority user_authored > verbatim > extracted > derived > unknown
+// - semantic pass uses stored embeddings only; skipped (and reported) when
+//   none exist; threshold respected, below-threshold never merged
+// - idempotence: a second execute finds nothing new
+// - cross-table candidates are report-only
+// - search hides marked duplicates by default; --include-duplicates shows them
+// - lineage rows persist full audit detail
+// - --delete (destructive opt-in) removes rows + embeddings; FK-referenced
+//   duplicates are kept as marked instead of failing the transaction
+
+import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
+import { setupTestDb, teardownTestDb } from '../helpers/setup';
+import { runDedup } from '../../src/commands/dedup';
+import { embeddingToBlob } from '../../src/lib/embeddings';
+import { getDb } from '../../src/db/connection';
+import { search } from '../../src/lib/memory';
+import {
+  createSession,
+  addMessage,
+  addDecision,
+  addBreadcrumb,
+  createLoaEntry,
+  supersedeDecision,
+} from '../../src/lib/memory';
+
+const originalLog = console.log;
+const originalError = console.error;
+const originalExitCode = process.exitCode;
+
+beforeEach(() => {
+  setupTestDb();
+  console.log = () => {};
+  console.error = () => {};
+});
+
+afterEach(() => {
+  console.log = originalLog;
+  console.error = originalError;
+  process.exitCode = originalExitCode;
+  teardownTestDb();
+});
+
+// Long enough to clear MIN_DEDUP_TEXT_LENGTH after normalization.
+const CRUMB = 'Always use bun for every script in this repository, never npm.';
+const OTHER = 'A completely different breadcrumb about the release process.';
+
+function lineageRows(): Array<Record<string, unknown>> {
+  return getDb().prepare('SELECT * FROM dedup_lineage ORDER BY id').all() as Array<Record<string, unknown>>;
+}
+
+function insertEmbedding(table: string, id: number, vector: number[]): void {
+  getDb().prepare(
+    `INSERT OR REPLACE INTO embeddings (source_table, source_id, model, dimensions, embedding)
+     VALUES (?, ?, 'test', ?, ?)`
+  ).run(table, id, vector.length, embeddingToBlob(vector));
+}
+
+describe('dry-run vs execute', () => {
+  test('dry-run reports the plan and writes nothing', () => {
+    addBreadcrumb({ content: CRUMB, importance: 5 });
+    addBreadcrumb({ content: CRUMB, importance: 5 });
+    addBreadcrumb({ content: OTHER, importance: 5 });
+
+    const result = runDedup({})!;
+    const crumbs = result.plan.tables.find(t => t.table === 'breadcrumbs')!;
+    expect(crumbs.exactGroups).toBe(1);
+    expect(crumbs.planned.length).toBe(1);
+    expect(result.applied).toBeNull();
+    expect(lineageRows().length).toBe(0);
+    expect(search('breadcrumb OR bun').length).toBe(3);
+  });
+
+  test('--execute marks duplicates non-destructively and search hides them', () => {
+    const id1 = addBreadcrumb({ content: CRUMB, importance: 5 });
+    const id2 = addBreadcrumb({ content: CRUMB, importance: 5 });
+
+    const result = runDedup({ execute: true })!;
+    expect(result.applied?.marked).toBe(1);
+
+    // Non-destructive: both records still exist.
+    const count = getDb().prepare('SELECT COUNT(*) AS c FROM breadcrumbs').get() as { c: number };
+    expect(count.c).toBe(2);
+
+    // Search hides the marked duplicate by default...
+    const hidden = search('bun');
+    expect(hidden.length).toBe(1);
+    // ...and shows it again with the explicit include option.
+    const shown = search('bun', { includeDuplicates: true });
+    expect(shown.length).toBe(2);
+    expect(shown.map(r => r.id).sort()).toEqual([id1, id2].sort());
+  });
+
+  test('--delete without --execute is rejected', () => {
+    expect(runDedup({ delete: true })).toBeUndefined();
+    expect(process.exitCode).toBe(1);
+  });
+});
+
+describe('exact detection and survivor selection', () => {
+  test('normalized variants dedupe; provenance picks the survivor', () => {
+    const extracted = addBreadcrumb({ content: CRUMB, importance: 9, provenance: 'extracted' });
+    const authored = addBreadcrumb({ content: CRUMB.toUpperCase(), importance: 1, provenance: 'user_authored' });
+    const legacy = addBreadcrumb({ content: `  ${CRUMB}  `, importance: 9 });
+
+    runDedup({ execute: true });
+
+    const rows = lineageRows();
+    expect(rows.length).toBe(2);
+    for (const row of rows) {
+      expect(row.survivor_id).toBe(authored);
+      expect(row.survivor_table).toBe('breadcrumbs');
+      expect(row.reason).toBe('exact');
+      expect(row.status).toBe('marked');
+    }
+    expect(rows.map(r => r.duplicate_id).sort()).toEqual([extracted, legacy].sort());
+  });
+
+  test('identical text in different projects is not deduped', () => {
+    addBreadcrumb({ content: CRUMB, importance: 5, project: 'alpha' });
+    addBreadcrumb({ content: CRUMB, importance: 5, project: 'beta' });
+    const result = runDedup({ execute: true })!;
+    expect(result.applied?.marked).toBe(0);
+  });
+
+  test('short records are never candidates', () => {
+    addBreadcrumb({ content: 'ok', importance: 5 });
+    addBreadcrumb({ content: 'ok', importance: 5 });
+    const result = runDedup({})!;
+    const crumbs = result.plan.tables.find(t => t.table === 'breadcrumbs')!;
+    expect(crumbs.tooShort).toBe(2);
+    expect(crumbs.planned.length).toBe(0);
+  });
+
+  test('only active decisions participate', () => {
+    const text = 'Adopt SQLite WAL mode for every database connection in Recall.';
+    const stale = addDecision({ decision: text, status: 'active' });
+    supersedeDecision(stale);
+    addDecision({ decision: text, status: 'active' });
+
+    const result = runDedup({ execute: true })!;
+    expect(result.applied?.marked).toBe(0);
+  });
+
+  test('--table scopes the run', () => {
+    addBreadcrumb({ content: CRUMB, importance: 5 });
+    addBreadcrumb({ content: CRUMB, importance: 5 });
+    createSession({ session_id: 's1', started_at: '2026-01-01T00:00:00Z' });
+    addMessage({ session_id: 's1', timestamp: '2026-01-01T00:00:01Z', role: 'user', content: CRUMB });
+    addMessage({ session_id: 's1', timestamp: '2026-01-01T00:00:02Z', role: 'user', content: CRUMB });
+
+    const result = runDedup({ execute: true, table: 'messages' })!;
+    expect(result.plan.tables.length).toBe(1);
+    expect(result.applied?.marked).toBe(1);
+    expect(lineageRows().every(r => r.duplicate_table === 'messages')).toBe(true);
+  });
+
+  test('invalid --table and --threshold are rejected', () => {
+    expect(runDedup({ table: 'sessions' })).toBeUndefined();
+    expect(process.exitCode).toBe(1);
+    process.exitCode = 0;
+    expect(runDedup({ threshold: 1.5 })).toBeUndefined();
+    expect(process.exitCode).toBe(1);
+  });
+});
+
+describe('semantic pass', () => {
+  test('skipped with a clear reason when no embeddings exist', () => {
+    addBreadcrumb({ content: CRUMB, importance: 5 });
+    addBreadcrumb({ content: OTHER, importance: 5 });
+    const result = runDedup({})!;
+    expect(result.plan.semanticSkipped).toContain('embed');
+  });
+
+  test('skipped when disabled via --no-semantic', () => {
+    addBreadcrumb({ content: CRUMB, importance: 5 });
+    const result = runDedup({ semantic: false })!;
+    expect(result.plan.semanticSkipped).toContain('--no-semantic');
+  });
+
+  test('marks pairs at the threshold, never below it', () => {
+    const near = Math.sqrt(1 - 0.97 * 0.97);
+    const a = addBreadcrumb({ content: CRUMB, importance: 5 });
+    const b = addBreadcrumb({ content: OTHER, importance: 5 });
+    insertEmbedding('breadcrumbs', a, [1, 0, 0]);
+    insertEmbedding('breadcrumbs', b, [0.97, near, 0]);
+
+    // Above the default 0.95 threshold → marked, with similarity recorded.
+    const marked = runDedup({ execute: true })!;
+    expect(marked.applied?.marked).toBe(1);
+    const row = lineageRows()[0];
+    expect(row.reason).toBe('semantic');
+    expect(row.similarity as number).toBeCloseTo(0.97, 3);
+
+    // Survivor is the richer record (longer normalized text wins the tie).
+    const longer = CRUMB.length >= OTHER.length ? a : b;
+    expect(row.survivor_id).toBe(longer);
+  });
+
+  test('a stricter threshold leaves the same pair untouched', () => {
+    const near = Math.sqrt(1 - 0.97 * 0.97);
+    const a = addBreadcrumb({ content: CRUMB, importance: 5 });
+    const b = addBreadcrumb({ content: OTHER, importance: 5 });
+    insertEmbedding('breadcrumbs', a, [1, 0, 0]);
+    insertEmbedding('breadcrumbs', b, [0.97, near, 0]);
+
+    const result = runDedup({ execute: true, threshold: 0.99 })!;
+    expect(result.applied?.marked).toBe(0);
+    expect(lineageRows().length).toBe(0);
+  });
+
+  test('a planned survivor is never re-marked by a weaker pair (no transitive chaining)', () => {
+    // PR #60 review repro: cos(A,B)=0.99, cos(B,C)=0.96, cos(A,C)≈0.91 with
+    // the default 0.95 threshold. Ascending text length pins the survivor
+    // orientation: B survives the strongest pair, so without the survivor
+    // guard the weaker B/C pair re-marks B — leaving A's only visible
+    // neighbor at 0.91, below the threshold.
+    const a = addBreadcrumb({ content: 'Review queue triage happens before standup.', importance: 5 });
+    const b = addBreadcrumb({ content: 'Review queue triage always happens before the standup.', importance: 5 });
+    const c = addBreadcrumb({ content: 'Review queue triage must always happen before the morning standup.', importance: 5 });
+    insertEmbedding('breadcrumbs', a, [0.99, Math.sqrt(1 - 0.99 * 0.99), 0]);
+    insertEmbedding('breadcrumbs', b, [1, 0, 0]);
+    insertEmbedding('breadcrumbs', c, [0.96, -Math.sqrt(1 - 0.96 * 0.96), 0]);
+
+    const result = runDedup({ execute: true })!;
+
+    // Only the strongest pair is marked; B/C is skipped because B already
+    // survives A.
+    expect(result.applied?.marked).toBe(1);
+    const rows = lineageRows();
+    expect(rows.length).toBe(1);
+    expect(rows[0].duplicate_id).toBe(a);
+    expect(rows[0].survivor_id).toBe(b);
+    expect(rows[0].similarity as number).toBeCloseTo(0.99, 3);
+
+    // One-hop lineage invariant: no survivor is itself marked as a duplicate.
+    const duplicates = new Set(rows.map(r => `${r.duplicate_table}:${r.duplicate_id}`));
+    for (const row of rows) {
+      expect(duplicates.has(`${row.survivor_table}:${row.survivor_id}`)).toBe(false);
+    }
+  });
+});
+
+describe('idempotence', () => {
+  test('a second execute finds nothing new', () => {
+    addBreadcrumb({ content: CRUMB, importance: 5 });
+    addBreadcrumb({ content: CRUMB, importance: 5 });
+    addBreadcrumb({ content: CRUMB, importance: 5 });
+
+    const first = runDedup({ execute: true })!;
+    expect(first.applied?.marked).toBe(2);
+    const after = lineageRows().length;
+
+    const second = runDedup({ execute: true })!;
+    expect(second.applied?.marked).toBe(0);
+    const crumbs = second.plan.tables.find(t => t.table === 'breadcrumbs')!;
+    expect(crumbs.alreadyMarked).toBe(2);
+    expect(lineageRows().length).toBe(after);
+  });
+});
+
+describe('cross-table candidates are report-only', () => {
+  test('reported, never marked', () => {
+    createSession({ session_id: 's1', started_at: '2026-01-01T00:00:00Z' });
+    addMessage({ session_id: 's1', timestamp: '2026-01-01T00:00:01Z', role: 'user', content: CRUMB });
+    addBreadcrumb({ content: CRUMB, importance: 5 });
+
+    const result = runDedup({ execute: true })!;
+    expect(result.plan.crossTable.textMatches.length).toBe(1);
+    expect(result.applied?.marked).toBe(0);
+    expect(lineageRows().length).toBe(0);
+    // Both records remain searchable.
+    expect(search('bun').length).toBe(2);
+  });
+});
+
+describe('lineage persistence', () => {
+  test('rows carry full audit detail', () => {
+    const survivor = addBreadcrumb({ content: CRUMB, importance: 5, provenance: 'user_authored', project: 'demo' });
+    const dup = addBreadcrumb({ content: CRUMB, importance: 5, provenance: 'extracted', project: 'demo' });
+
+    runDedup({ execute: true });
+
+    const row = lineageRows()[0];
+    expect(row.survivor_table).toBe('breadcrumbs');
+    expect(row.survivor_id).toBe(survivor);
+    expect(row.duplicate_table).toBe('breadcrumbs');
+    expect(row.duplicate_id).toBe(dup);
+    expect(row.reason).toBe('exact');
+    expect(row.similarity).toBe(1);
+    expect(row.status).toBe('marked');
+    expect(typeof row.created_at).toBe('string');
+    const detail = JSON.parse(row.detail as string);
+    expect(detail.survivor_provenance).toBe('user_authored');
+    expect(detail.duplicate_provenance).toBe('extracted');
+    expect(detail.project).toBe('demo');
+  });
+});
+
+describe('destructive opt-in (--execute --delete)', () => {
+  test('deletes duplicates and their embeddings; lineage records the deletion', () => {
+    const survivor = addBreadcrumb({ content: CRUMB, importance: 5, provenance: 'user_authored' });
+    const dup = addBreadcrumb({ content: CRUMB, importance: 5 });
+    insertEmbedding('breadcrumbs', dup, [1, 0, 0]);
+
+    const result = runDedup({ execute: true, delete: true })!;
+    expect(result.applied?.deleted).toBe(1);
+
+    const db = getDb();
+    const remaining = db.prepare('SELECT id FROM breadcrumbs').all() as Array<{ id: number }>;
+    expect(remaining.map(r => r.id)).toEqual([survivor]);
+    const embeddings = db.prepare('SELECT COUNT(*) AS c FROM embeddings').get() as { c: number };
+    expect(embeddings.c).toBe(0);
+    expect(lineageRows()[0].status).toBe('deleted');
+  });
+
+  test('FK-referenced duplicates are kept as marked instead of failing', () => {
+    createSession({ session_id: 's1', started_at: '2026-01-01T00:00:00Z' });
+    const survivor = addMessage({ session_id: 's1', timestamp: '2026-01-02T00:00:00Z', role: 'user', content: CRUMB });
+    const referenced = addMessage({ session_id: 's1', timestamp: '2026-01-01T00:00:00Z', role: 'user', content: CRUMB });
+    // The older message loses survivorship (recency tie-break) but is pinned
+    // by an LoA message range — hard-deleting it would violate the FK.
+    createLoaEntry({
+      title: 'entry', fabric_extract: 'body',
+      message_range_start: referenced, message_range_end: referenced,
+    });
+
+    const result = runDedup({ execute: true, delete: true })!;
+    expect(result.applied?.deleted).toBe(0);
+    expect(result.applied?.fkProtected).toBe(1);
+
+    const row = lineageRows()[0];
+    expect(row.duplicate_id).toBe(referenced);
+    expect(row.survivor_id).toBe(survivor);
+    expect(row.status).toBe('marked');
+    // The record still exists, just hidden.
+    const msg = getDb().prepare('SELECT id FROM messages WHERE id = ?').get(referenced);
+    expect(msg).toBeDefined();
+  });
+});
diff --git a/tests/commands/export.test.ts b/tests/commands/export.test.ts
index b7e0274..799c1f7 100644
--- a/tests/commands/export.test.ts
+++ b/tests/commands/export.test.ts
@@ -134,6 +134,7 @@ describe('manifest', () => {
       learnings: 1,
       breadcrumbs: 1,
       loa_entries: 1,
+      dedup_lineage: 0,
     });
     expect(manifest.provenance_counts.messages).toEqual({ unknown: 1, verbatim: 1 });
     expect(manifest.provenance_counts.decisions).toEqual({ unknown: 1, user_authored: 1 });
@@ -199,6 +200,7 @@ describe('SQL dump', () => {
         learnings: 1,
         breadcrumbs: 1,
         loa_entries: 1,
+        dedup_lineage: 0,
       });
 
       // Legacy NULL restores as NULL; known values survive verbatim
diff --git a/tests/db/migrations.test.ts b/tests/db/migrations.test.ts
index 8a52d55..62added 100644
--- a/tests/db/migrations.test.ts
+++ b/tests/db/migrations.test.ts
@@ -188,7 +188,8 @@ describe('MIGRATIONS array', () => {
   test('has expected number of migrations', () => {
     // 7 → 8: importance column on messages/decisions/learnings/loa_entries (Sprint #4)
     // 8 → 9: provenance column on all five memory tables (issue #42)
-    expect(MIGRATIONS.length).toBe(9);
+    // 9 → 10: dedup_lineage table (issue #45)
+    expect(MIGRATIONS.length).toBe(10);
   });
 
   test('all entries are functions', () => {
diff --git a/tests/lib/dedup.test.ts b/tests/lib/dedup.test.ts
new file mode 100644
index 0000000..e6ff9dd
--- /dev/null
+++ b/tests/lib/dedup.test.ts
@@ -0,0 +1,200 @@
+// recall dedup — pure logic (issue #45).
+//
+// Unit tests over the pure exported functions: normalization, survivor
+// selection (provenance > richness > importance > recency > id), exact
+// grouping, cross-table matching, and semantic pairing. Property-based
+// suites over these functions are issue #44 phase 2 — not here.
+
+import { describe, test, expect } from 'bun:test';
+import {
+  compareForSurvivor,
+  DEFAULT_SEMANTIC_THRESHOLD,
+  findCrossTableMatches,
+  findExactGroups,
+  findSemanticPairs,
+  normalizeText,
+  provenanceRank,
+  selectSurvivor,
+  type DedupCandidate,
+} from '../../src/lib/dedup';
+import { PROVENANCE_VALUES } from '../../src/types/index';
+
+function candidate(overrides: Partial<DedupCandidate> = {}): DedupCandidate {
+  return {
+    table: 'breadcrumbs',
+    id: 1,
+    project: null,
+    provenance: null,
+    importance: 5,
+    created_at: '2026-01-01 00:00:00',
+    key: 'k',
+    textLength: 50,
+    ...overrides,
+  };
+}
+
+describe('normalizeText', () => {
+  test('lowercases, strips quotes, collapses whitespace', () => {
+    expect(normalizeText(`  Use "Bun"   for\n'everything'  `)).toBe('use bun for everything');
+  });
+
+  test('identical meaning with different casing/spacing normalizes equal', () => {
+    expect(normalizeText('Ship  THE  feature')).toBe(normalizeText("ship the 'feature'"));
+  });
+});
+
+describe('provenanceRank', () => {
+  test('orders user_authored > verbatim > extracted > derived > unknown', () => {
+    const ranks = [...PROVENANCE_VALUES, null].map(p => provenanceRank(p));
+    expect(ranks).toEqual([...ranks].sort((a, b) => a - b));
+    expect(provenanceRank('user_authored')).toBeLessThan(provenanceRank('verbatim'));
+    expect(provenanceRank('verbatim')).toBeLessThan(provenanceRank('extracted'));
+    expect(provenanceRank('extracted')).toBeLessThan(provenanceRank('derived'));
+    expect(provenanceRank('derived')).toBeLessThan(provenanceRank(null));
+  });
+});
+
+describe('selectSurvivor', () => {
+  test('provenance dominates richness, importance, and recency', () => {
+    const weakButAuthored = candidate({ id: 1, provenance: 'user_authored', textLength: 10, importance: 1, created_at: '2020-01-01 00:00:00' });
+    const richButExtracted = candidate({ id: 2, provenance: 'extracted', textLength: 9999, importance: 10, created_at: '2026-01-01 00:00:00' });
+    const { survivor, duplicates } = selectSurvivor([richButExtracted, weakButAuthored]);
+    expect(survivor.id).toBe(1);
+    expect(duplicates.map(d => d.id)).toEqual([2]);
+  });
+
+  test('richness breaks provenance ties', () => {
+    const short = candidate({ id: 1, textLength: 10 });
+    const long = candidate({ id: 2, textLength: 200 });
+    expect(selectSurvivor([short, long]).survivor.id).toBe(2);
+  });
+
+  test('importance breaks richness ties', () => {
+    const low = candidate({ id: 1, importance: 3 });
+    const high = candidate({ id: 2, importance: 8 });
+    expect(selectSurvivor([low, high]).survivor.id).toBe(2);
+  });
+
+  test('recency breaks importance ties', () => {
+    const older = candidate({ id: 1, created_at: '2025-01-01 00:00:00' });
+    const newer = candidate({ id: 2, created_at: '2026-01-01 00:00:00' });
+    expect(selectSurvivor([older, newer]).survivor.id).toBe(2);
+  });
+
+  test('lowest id is the final deterministic tie-break', () => {
+    const a = candidate({ id: 7 });
+    const b = candidate({ id: 3 });
+    expect(selectSurvivor([a, b]).survivor.id).toBe(3);
+    // Total order: comparator never reports equality for distinct ids.
+    expect(compareForSurvivor(a, b)).toBeGreaterThan(0);
+  });
+
+  test('throws on an empty group', () => {
+    expect(() => selectSurvivor([])).toThrow();
+  });
+});
+
+describe('findExactGroups', () => {
+  test('groups same table + project + key; singletons excluded', () => {
+    const groups = findExactGroups([
+      candidate({ id: 1, key: 'a' }),
+      candidate({ id: 2, key: 'a' }),
+      candidate({ id: 3, key: 'b' }),
+    ]);
+    expect(groups.length).toBe(1);
+    expect(groups[0].map(c => c.id).sort()).toEqual([1, 2]);
+  });
+
+  test('identical text in different projects is not grouped', () => {
+    const groups = findExactGroups([
+      candidate({ id: 1, key: 'a', project: 'alpha' }),
+      candidate({ id: 2, key: 'a', project: 'beta' }),
+    ]);
+    expect(groups.length).toBe(0);
+  });
+});
+
+describe('findCrossTableMatches', () => {
+  test('reports same text across tables, ignores single-table matches', () => {
+    const matches = findCrossTableMatches([
+      candidate({ id: 1, key: 'a', table: 'breadcrumbs' }),
+      candidate({ id: 2, key: 'a', table: 'messages' }),
+      candidate({ id: 3, key: 'b', table: 'breadcrumbs' }),
+      candidate({ id: 4, key: 'b', table: 'breadcrumbs' }),
+    ]);
+    expect(matches.length).toBe(1);
+    expect(matches[0].members.map(m => m.table).sort()).toEqual(['breadcrumbs', 'messages']);
+  });
+});
+
+describe('findSemanticPairs', () => {
+  const unit = (x: number, y: number, z: number) => [x, y, z];
+
+  test('pairs at/above threshold; orthogonal vectors never pair', () => {
+    const near = Math.sqrt(1 - 0.97 * 0.97);
+    const pairs = findSemanticPairs(
+      [
+        { candidate: candidate({ id: 1, key: 'a' }), embedding: unit(1, 0, 0) },
+        { candidate: candidate({ id: 2, key: 'b' }), embedding: unit(0.97, near, 0) },
+        { candidate: candidate({ id: 3, key: 'c' }), embedding: unit(0, 0, 1) },
+      ],
+      DEFAULT_SEMANTIC_THRESHOLD
+    );
+    expect(pairs.length).toBe(1);
+    expect([pairs[0].a.id, pairs[0].b.id].sort()).toEqual([1, 2]);
+    expect(pairs[0].similarity).toBeCloseTo(0.97, 5);
+  });
+
+  test('below-threshold pairs are never produced', () => {
+    const near = Math.sqrt(1 - 0.97 * 0.97);
+    const pairs = findSemanticPairs(
+      [
+        { candidate: candidate({ id: 1, key: 'a' }), embedding: unit(1, 0, 0) },
+        { candidate: candidate({ id: 2, key: 'b' }), embedding: unit(0.97, near, 0) },
+      ],
+      0.99
+    );
+    expect(pairs.length).toBe(0);
+  });
+
+  test('same-table pairs across projects are ignored; cross-table pairs are flagged', () => {
+    const pairs = findSemanticPairs(
+      [
+        { candidate: candidate({ id: 1, key: 'a', project: 'alpha' }), embedding: unit(1, 0, 0) },
+        { candidate: candidate({ id: 2, key: 'b', project: 'beta' }), embedding: unit(1, 0, 0) },
+        { candidate: candidate({ id: 3, key: 'c', table: 'messages', project: 'alpha' }), embedding: unit(1, 0, 0) },
+      ],
+      0.95
+    );
+    // 1↔2: same table, different projects — ignored.
+    // 1↔3 and 2↔3: cross-table — report-only pairs.
+    expect(pairs.every(p => !p.sameTable)).toBe(true);
+    expect(pairs.length).toBe(2);
+  });
+
+  test('identical normalized keys are left to the exact pass', () => {
+    const pairs = findSemanticPairs(
+      [
+        { candidate: candidate({ id: 1, key: 'same' }), embedding: unit(1, 0, 0) },
+        { candidate: candidate({ id: 2, key: 'same' }), embedding: unit(1, 0, 0) },
+      ],
+      0.95
+    );
+    expect(pairs.length).toBe(0);
+  });
+
+  test('results are sorted strongest-first', () => {
+    const mid = Math.sqrt(1 - 0.96 * 0.96);
+    const near = Math.sqrt(1 - 0.99 * 0.99);
+    const pairs = findSemanticPairs(
+      [
+        { candidate: candidate({ id: 1, key: 'a' }), embedding: unit(1, 0, 0) },
+        { candidate: candidate({ id: 2, key: 'b' }), embedding: unit(0.99, near, 0) },
+        { candidate: candidate({ id: 3, key: 'c' }), embedding: unit(0.96, mid, 0) },
+      ],
+      0.95
+    );
+    const sims = pairs.map(p => p.similarity);
+    expect(sims).toEqual([...sims].sort((a, b) => b - a));
+  });
+});