feat: add recall repair for data and index maintenance #62
…t-only integrity checks

Explicit data/index maintenance separate from doctor --fix (issue #46). Dry-run by default; --execute applies. Rebuilds drifted FTS5 indexes from source tables, recreates missing indexes/sync triggers from canonical per-table schema DDL (the un-migrated-DB silently-empty-search gap), re-embeds rows missing embeddings when Ollama is available, and reports orphan/invariant problems without mutating them. Never hard-deletes rows, never touches Record Provenance. Doctor gains a read-only FTS health check that recommends repair; --fix remains symlinks-only.

Closes #46
Review —
… legacy DB crash

Blocker 1: restore process.exitCode with the repo's ?? 0 convention in repair.test.ts. Bun silently ignores assigning undefined, so the invalid --table test's exit code poisoned the suite (CI-red with 0 failures).

Blocker 2: drop the marked-duplicates exclusion clause from the embed gap query when dedup_lineage does not exist. A pre-dedup legacy DB (the command's headline use case) crashed with a raw SQLiteError inside planRepair before any output; it now completes the plan, still reports embed gaps, and prints the 'recall init' recommendation. The legacy-DB test now builds a genuine pre-dedup schema (full DDL, dedup_lineage dropped, user_version 9) instead of faking it with PRAGMA on a current schema.
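The exit-code restoration described in Blocker 1 can be sketched roughly as follows. This is only an illustration of the `?? 0` idea, not the code in repair.test.ts: the `ExitCodeHolder` type, the `withRestoredExitCode` helper, and the `fakeProcess` stand-in are hypothetical names introduced here so the sketch runs without a real `process` object.

```typescript
// Hypothetical helper illustrating the "restore with ?? 0" convention.
// ExitCodeHolder stands in for the global `process` object.
type ExitCodeHolder = { exitCode?: number | string | null };

function withRestoredExitCode<T>(holder: ExitCodeHolder, fn: () => T): T {
  const previous = holder.exitCode;
  try {
    return fn();
  } finally {
    // Always write a concrete value. Per the commit message, assigning
    // `undefined` back may be silently ignored, leaving a stale non-zero code.
    holder.exitCode = previous ?? 0;
  }
}

// Stand-in for `process`; a test would pass the real object instead.
const fakeProcess: ExitCodeHolder = {};

const observed: number = withRestoredExitCode(fakeProcess, (): number => {
  fakeProcess.exitCode = 1; // what a failing CLI path (e.g. an invalid --table) might set
  return 1;
});
```

After the call, `observed` holds the code seen inside the callback while `fakeProcess.exitCode` is back to `0`, so a later test in the same process does not inherit the failure.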
Re-review: fix delta 779be25 → 4e65c6e
Delta verified surgical: only
Deferred items confirmed untouched (per agreed scope): per-row embed-failure exit code,
Both blockers genuinely resolved, CI green, no new issues introduced.
VERDICT: APPROVE
Closes #46
What
Adds `recall repair`: explicit data/index maintenance, deliberately separate from `recall doctor --fix` (which stays symlinks-only and never runs data repair).

Repair scope (per the issue's acceptance criteria)
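The per-table FTS detection listed in this section (indexed-row count from the `_docsize` shadow table, sync-trigger presence, and the FTS5 `integrity-check` command) could be turned into a repair plan along these lines. This is a minimal sketch, not the PR's implementation: `FtsProbe`, `probeSql`, `planFtsRepair`, and the `messages_fts` index name are assumptions made for illustration. Only the SQL forms are standard FTS5.

```typescript
// Hypothetical probe result for one FTS-backed table.
interface FtsProbe {
  indexExists: boolean;     // the FTS5 virtual table is present
  triggersPresent: boolean; // the sync triggers are present
  contentRows: number;      // rows in the content (source) table
  indexedRows: number;      // rows in the <fts>_docsize shadow table
  integrityOk: boolean;     // the integrity-check command raised no error
}

type FtsAction = "recreate-index" | "recreate-triggers" | "rebuild";

// SQL a probe might run. COUNT(*) on an external-content FTS table reads
// through to the content table, so the shadow table is counted instead.
function probeSql(fts: string, content: string) {
  return {
    contentRows: `SELECT COUNT(*) FROM ${content}`,
    indexedRows: `SELECT COUNT(*) FROM ${fts}_docsize`,
    integrityCheck: `INSERT INTO ${fts}(${fts}) VALUES('integrity-check')`,
    rebuild: `INSERT INTO ${fts}(${fts}) VALUES('rebuild')`,
  };
}

// Decide what a dry run would report for one table.
function planFtsRepair(p: FtsProbe): FtsAction[] {
  const actions: FtsAction[] = [];
  if (!p.indexExists) actions.push("recreate-index");
  if (!p.triggersPresent) actions.push("recreate-triggers");
  const drifted = p.indexedRows !== p.contentRows;
  // Anything recreated must be rebuilt; drift or a failed check also rebuilds.
  if (actions.length > 0 || drifted || !p.integrityOk) actions.push("rebuild");
  return actions;
}

const healthy = planFtsRepair({
  indexExists: true, triggersPresent: true,
  contentRows: 10, indexedRows: 10, integrityOk: true,
});
const drifted = planFtsRepair({
  indexExists: true, triggersPresent: true,
  contentRows: 10, indexedRows: 7, integrityOk: true,
});
const sql = probeSql("messages_fts", "messages");
```

Keeping the decision in a pure function means the dry-run report and the `--execute` path can share one plan, with only the execution step touching the database.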
- Rebuilds drifted FTS5 indexes for `messages`, `decisions`, `learnings`, `breadcrumbs`, `loa_entries`, `telos`, `documents`. Detection per table: indexed-row count via the `_docsize` shadow table (a plain `COUNT(*)` on an external-content FTS table reads through to the content table and proves nothing), sync-trigger presence, and the FTS5 `integrity-check` command. Missing indexes/triggers are recreated from canonical per-table schema DDL and rebuilt; this directly covers the un-migrated-DB → silently-empty-search gap flagged in the PR #60 review.
- Re-embeds rows missing embeddings (`loa_entries`, `decisions`, `learnings`, assistant `messages`) when Ollama is available and the row has enough source text. Service unavailable → reported, exit stays 0 unless another requested repair failed. Partial results report embedded/skipped/failed counts; failures are never hidden. Marked duplicates (dedup, #45) are excluded.
- `recall init`).

Safety
- `--execute` prints a `recall export --backup` recommendation.
- `recall doctor` gains a read-only FTS health check that recommends `recall repair`; it carries no repair closure, so the `--fix` loop structurally cannot run data repair.

Implementation notes
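The notes in this section describe a per-table `FTS_SCHEMA` map from which the legacy combined DDL strings are derived. One possible shape is sketched below. It is not the contents of `schema.ts`: the `_SKETCH` names, the `sketchFtsDdl` and `ddlForTable` helpers, the `content` column, the `id` rowid, and the trigger names are all assumptions chosen for illustration.

```typescript
// Hypothetical per-table DDL entry; the real FTS_SCHEMA may be shaped differently.
interface FtsTableDdl {
  createIndex: string;
  createTriggers: string[];
}

// Builds external-content FTS5 DDL plus insert/delete/update sync triggers.
function sketchFtsDdl(table: string, column: string): FtsTableDdl {
  const fts = `${table}_fts`;
  const del = `INSERT INTO ${fts}(${fts}, rowid, ${column}) VALUES ('delete', old.id, old.${column});`;
  const ins = `INSERT INTO ${fts}(rowid, ${column}) VALUES (new.id, new.${column});`;
  return {
    createIndex: `CREATE VIRTUAL TABLE IF NOT EXISTS ${fts} USING fts5(${column}, content='${table}', content_rowid='id');`,
    createTriggers: [
      `CREATE TRIGGER IF NOT EXISTS ${table}_ai AFTER INSERT ON ${table} BEGIN ${ins} END;`,
      `CREATE TRIGGER IF NOT EXISTS ${table}_ad AFTER DELETE ON ${table} BEGIN ${del} END;`,
      `CREATE TRIGGER IF NOT EXISTS ${table}_au AFTER UPDATE ON ${table} BEGIN ${del} ${ins} END;`,
    ],
  };
}

// Single source of truth, keyed by content table (two tables shown for brevity).
const FTS_SCHEMA_SKETCH: Record<string, FtsTableDdl> = {
  decisions: sketchFtsDdl("decisions", "content"),
  learnings: sketchFtsDdl("learnings", "content"),
};

// Legacy-style combined strings, derived from the map rather than hand-written.
const tableNames = Object.keys(FTS_SCHEMA_SKETCH);
const CREATE_FTS_SKETCH = tableNames
  .map((t) => FTS_SCHEMA_SKETCH[t].createIndex)
  .join("\n");

const allTriggers: string[] = [];
for (const t of tableNames) {
  for (const trigger of FTS_SCHEMA_SKETCH[t].createTriggers) allTriggers.push(trigger);
}
const CREATE_FTS_TRIGGERS_SKETCH = allTriggers.join("\n");

// Per-table recreation: the statements a repair would run for one table.
function ddlForTable(table: string): string[] {
  const entry = FTS_SCHEMA_SKETCH[table];
  if (!entry) throw new Error(`no FTS schema for table: ${table}`);
  return [entry.createIndex].concat(entry.createTriggers);
}
```

Deriving the combined strings from the map keeps schema initialization and per-table repair from drifting apart, since both read the same DDL.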
- `schema.ts` FTS DDL restructured into a per-table `FTS_SCHEMA` map; the legacy `CREATE_FTS`/`CREATE_FTS_TRIGGERS` strings are derived from it (single source of truth, enables per-table recreation).
- `embed.ts`'s per-table content composition now delegates to the canonical `embeddingTextFor` in `src/lib/repair.ts` (DRY).
- `src/lib/chunk.ts` convention.
- `runRepair` so tests run deterministically offline.

Testing
- `tests/lib/repair.test.ts`, `tests/commands/repair.test.ts`): dry-run vs execute, drift/missing-index/missing-trigger rebuild, service-unavailable success path, partial embed success, no-hard-delete, provenance no-touch, orphan report-only, `--table` scoping, pending-migration report, doctor separation.
- `bun run lint` clean.
- `--execute`, search returns results again.