fix: reindex stability — sentinel resume, donor safety, worktree subdirs, progress bar by aeneasr · Pull Request #67 · ory/lumen

aeneasr · 2026-03-27T15:25:39Z

Summary

Sentinel file resume: Indexing interrupted mid-run (e.g. Ollama timeout) leaves files with hash="". Previously the next session would see a matching root hash and return early, leaving those files permanently unembedded. Now HasSentinelFiles() is checked before the early-return; any sentinel triggers incremental indexing to complete the run.
Metadata persistence on partial failure: saveMeta() closure saves root_hash + timestamps on both success and mid-batch embedding failures, so the next session can match the hash and skip already-complete files.
Donor safety gate: SeedFromDonor now verifies the donor has a non-empty root_hash before copying. Seeding from an incomplete donor propagated corrupted state; this now returns (false, nil) and the worktree starts fresh.
Worktree subdirectory donor discovery: FindDonorIndexBase previously compared worktree root paths, missing cases where the effective project root is a subdirectory (e.g. monorepo/backoffice). The fix computes the relative suffix inside the worktree and looks for a DB at <sibling_worktree>/<relSuffix>.
Progress bar duplication: Long file paths caused pterm line-wrapping, breaking cursor positioning and leaving duplicate output. Titles are now truncated to (terminal_width - 45) chars before UpdateTitle().

Test plan

go test ./... — all packages pass (verified locally)
TestSeedFromDonor_IncompleteDonor — new test covering donor safety gate
Manual: index a large repo, kill mid-run, restart — confirm remaining files are embedded rather than skipped
Manual: open worktree in a monorepo subdirectory — confirm donor seeding finds the sibling worktree's DB
Manual: index with long file paths — confirm no duplicate progress bar output

🤖 Generated with Claude Code

- Buffered done channel (cap 1) to prevent goroutine leak on timeout
- Goroutine calls touchChecked on success for correct TTL behavior
- Nil progress func in goroutine (request ctx may be gone)
- Log errors from background EnsureFresh at Warn level
- sync.WaitGroup for graceful shutdown in Close()
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

7-task plan with TDD approach: struct changes, WaitGroup, timeout goroutine, formatSearchResults, and tests including a test hook (ensureFreshFunc) to exercise the 15s timeout path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… reindex
EnsureFresh now runs in a goroutine. If it completes within 15s, results are returned normally. If it exceeds the timeout, stale results are returned immediately with a StaleWarning while reindexing continues in the background (up to 10 min). The goroutine acquires an exclusive flock to avoid concurrent writes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…exed
Add ensureFreshFunc test hook to indexerCache (follows existing findDonorFunc/seedFunc pattern) and three new tests:
- TestEnsureIndexed_TimeoutReturnsStaleWarning: injects a slow EnsureFresh that exceeds the 15s timeout, verifies StaleWarning is returned and Reindexed=false.
- TestEnsureIndexed_FastEnsureFreshNoWarning: injects an instant EnsureFresh, verifies no warning and correct stats propagation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…or vec_chunks
Handles slow embedding batches and retries on SQLite contention without timing out. INSERT OR REPLACE prevents duplicate key errors when re-embedding chunks that already exist in the vector table.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Three root causes fixed:
1. SessionStart double-spawn and no freshness gate (hook.go):
   - Remove unconditional spawnBackgroundIndexer from runHookSessionStart; generateSessionContextInternal now owns all spawn decisions
   - After opening the DB for stats, check last_indexed_at: skip spawn when indexed within backgroundIndexStaleness (5 min), spawn when stale or never completed. Prevents every new terminal from triggering a full merkle walk.
2. Goroutine zero-result treated as "fresh" (stdio.go):
   - Add skipped bool to freshResult. When TryAcquire returns nil (TOCTOU race: another process grabbed the lock) or errors, send freshResult{skipped: true}. Main select now returns StaleWarning for skipped results, consistent with the IsHeld fast-path. Previously the zero result looked like "index is fresh", silently skipping touchChecked and causing the next search to immediately re-spawn.
3. Redundant merkle walk after lumen index finishes (stdio.go):
   - In the goroutine, after acquiring the flock, check idx.LastIndexedAt(). If within freshnessTTL, call touchChecked() and return without calling EnsureFresh. Uses the DB timestamp as a shared cross-process freshness signal so the MCP server doesn't duplicate the walk just completed by the background indexer.
Also fix pre-existing errcheck lint in tui/progress.go.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…nge breakdown
- index.go: add newDebugLogger() for background path; log start, skip, cancel, error, and completion with full Stats fields; pass logger to Indexer via SetLogger() so indexWithTree can log the indexing plan
- index/index.go: add FilesAdded/FilesModified/FilesRemoved/Reason/OldRootHash/NewRootHash to Stats; populate them in Index, EnsureFresh, and indexWithTree; add SetLogger/logger field to Indexer
- hook_spawn_unix.go: discard stderr of background indexer (slog writes to debug.log; piping stderr would mix pterm progress into the log)
- search.go: pass nil logger to setupIndexer (interactive command)
- CLAUDE.md: document interactive vs background output strategy
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The force_reindex parameter on semantic_search is removed. Reindexing is exclusively triggered by the SessionStart hook and by the background goroutine inside ensureIndexed. Progress notifications are restored and now flow through the background goroutine path so the Claude Code status indicator animates during indexing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Enrich the "indexing plan" slog entry with:
- old_root_hash: stored merkle root before this run
- new_root_hash: computed merkle root from current filesystem
- main_worktree: main git repo root (only when projectDir is a worktree)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When SQLite reports "database disk image is malformed" or "disk I/O error", the index is permanently broken until manually purged. Every subsequent semantic_search call would fail with the same error because touchChecked is never set and each retry hits the same corrupted file.
This change adds automatic recovery at two layers:
- store.New: if open/schema-setup fails with a corruption error, delete the DB file and its WAL/SHM sidecars and retry once from a clean state. In-memory databases are never deleted.
- Indexer.EnsureFresh / Index: if indexWithTree returns a corruption error mid-operation, log ERROR "corrupted database detected, rebuilding", call rebuildStore() (close → delete files → reopen), then retry with an empty stored hash so the fresh DB receives a full index pass.
Adds IsCorruptionErr(err) to the store package as the single source of truth for what constitutes a SQLite corruption error.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

pterm's cursor positioning assumes the bar title fits on one line. Long paths cause wrapping, which shifts the cursor and leaves duplicated output on each redraw. Truncate to (terminal_width - 45) chars, reserving space for the bar chrome, and append an ellipsis when truncated. This also handles terminal resizes: pterm.GetTerminalWidth() is called live on every Update(), so the budget adjusts automatically.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
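The truncation rule is simple enough to sketch without pterm. The helper below is hypothetical: it takes the terminal width as a parameter, counts runes rather than display columns, and assumes the ellipsis is included in the (terminal_width - 45) budget, none of which the commit message specifies.

```go
package main

import "fmt"

// barChrome is the number of columns reserved for the bar itself,
// taken from the "(terminal_width - 45)" figure in the commit message.
const barChrome = 45

// truncateTitle shortens title so it fits in termWidth-barChrome runes,
// replacing the cut-off tail with a single ellipsis rune. Very narrow
// terminals are clamped to a budget of one rune.
func truncateTitle(title string, termWidth int) string {
	budget := termWidth - barChrome
	if budget < 1 {
		budget = 1
	}
	runes := []rune(title)
	if len(runes) <= budget {
		return title
	}
	return string(runes[:budget-1]) + "…"
}

func main() {
	path := "internal/very/deeply/nested/package/with/a/long/file_name_test.go"
	fmt.Println(truncateTitle(path, 80))
	fmt.Println(truncateTitle("main.go", 80))
}
```

Slicing by rune avoids cutting a multi-byte character in half; wide characters would still need a display-width library for exact column counts.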

…tion
Files registered mid-run have hash="" (sentinel). Previously, if the root hash hadn't changed between sessions the indexer returned early, leaving those files permanently unembedded.
- Add HasSentinelFiles() to store: EXISTS query on files WHERE hash=''
- In Index() and EnsureFresh(), check sentinels before the early-return: if any exist, fall through to incremental indexing regardless of hash
- Replace four separate SetMeta calls at end-of-run with a saveMeta() closure; call it on mid-batch embedding failures too so progress is persisted even when Ollama times out partway through a large repo
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
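The changed early-return decision can be shown with an in-memory stand-in for the store. In this sketch `fileStore`, `hasSentinelFiles`, and `needsIndexing` are hypothetical names; the actual change runs an EXISTS query against the files table rather than scanning a map.

```go
package main

import "fmt"

// fileStore maps file path to content hash. An empty hash marks a
// sentinel: a file registered mid-run whose chunks were never embedded.
type fileStore map[string]string

// hasSentinelFiles plays the role of the store's EXISTS query on
// files WHERE hash=''.
func (s fileStore) hasSentinelFiles() bool {
	for _, hash := range s {
		if hash == "" {
			return true
		}
	}
	return false
}

// needsIndexing decides whether to skip the early return. A matching root
// hash is only trusted when no sentinel files remain from an interrupted run.
func needsIndexing(storedRoot, currentRoot string, s fileStore) bool {
	if storedRoot != currentRoot {
		return true
	}
	return s.hasSentinelFiles()
}

func main() {
	complete := fileStore{"a.go": "h1", "b.go": "h2"}
	interrupted := fileStore{"a.go": "h1", "b.go": ""}

	fmt.Println("complete index:", needsIndexing("root", "root", complete))
	fmt.Println("interrupted index:", needsIndexing("root", "root", interrupted))
}
```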

A donor whose first indexing pass was interrupted has no root_hash in project_meta. Seeding from such a donor propagates partially-indexed state to the new worktree, causing it to believe it is current when it is not. Guard: open the donor read-only, query root_hash before the WAL checkpoint, and bail out (return false, nil) if the value is missing or empty. The new TestSeedFromDonor_IncompleteDonor test covers this path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
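The guard's control flow, separated from the SQLite details, might look like the sketch below. `donorMeta` and `seedFromDonor` are hypothetical simplifications: the real SeedFromDonor opens the donor database read-only and queries project_meta, whereas this version receives the metadata as a map and the copy step as a callback.

```go
package main

import "fmt"

// donorMeta is a hypothetical in-memory view of the donor's project_meta rows.
type donorMeta map[string]string

// seedFromDonor copies the donor index only when the donor finished at
// least one indexing pass, signalled by a non-empty root_hash. An
// incomplete donor is not an error: the caller gets (false, nil) and
// starts from a fresh index.
func seedFromDonor(meta donorMeta, copyIndex func() error) (bool, error) {
	if meta["root_hash"] == "" { // missing key and empty value both land here
		return false, nil
	}
	if err := copyIndex(); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	noop := func() error { return nil }

	seeded, err := seedFromDonor(donorMeta{}, noop)
	fmt.Println("incomplete donor:", seeded, err)

	seeded, err = seedFromDonor(donorMeta{"root_hash": "abc123"}, noop)
	fmt.Println("complete donor:", seeded, err)
}
```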

FindDonorIndexBase compared worktree root paths directly. When the effective project root is a subdirectory (e.g. monorepo/backoffice), the DB path is derived from that subdirectory, not from the worktree root, so no sibling worktrees were ever found. Fix: identify which worktree contains the project, compute the relative suffix (e.g. "backoffice"), then look for a DB at <sibling_worktree>/<relSuffix> in each sibling. This correctly resolves donor indexes regardless of how deep the effective root sits inside the worktree. Symlinks are resolved at every comparison point.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
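The relative-suffix computation can be sketched with `path/filepath` alone. `donorCandidates` is a hypothetical helper that returns the directories to probe rather than DB paths, uses Unix-style paths, and skips the symlink resolution the commit mentions (which could be added with `filepath.EvalSymlinks` on existing paths).

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// donorCandidates finds the worktree that contains projectDir, computes
// projectDir's path relative to it, and returns that same relative path
// inside every other worktree. It returns nil when no worktree contains
// projectDir.
func donorCandidates(projectDir string, worktrees []string) []string {
	sep := string(filepath.Separator)
	var own, rel string
	found := false
	for _, wt := range worktrees {
		r, err := filepath.Rel(wt, projectDir)
		if err != nil || r == ".." || strings.HasPrefix(r, ".."+sep) {
			continue // projectDir is not inside this worktree
		}
		// Prefer the deepest containing worktree in case worktrees are nested.
		if !found || len(wt) > len(own) {
			own, rel, found = wt, r, true
		}
	}
	if !found {
		return nil
	}

	var candidates []string
	for _, wt := range worktrees {
		if wt == own {
			continue
		}
		candidates = append(candidates, filepath.Join(wt, rel))
	}
	return candidates
}

func main() {
	worktrees := []string{"/repos/mono", "/repos/mono-feature", "/repos/mono-fix"}
	for _, c := range donorCandidates("/repos/mono/backoffice", worktrees) {
		fmt.Println(c)
	}
}
```

When the project is the worktree root itself, the relative suffix is "." and each candidate is simply the sibling worktree root, so the earlier root-to-root behaviour is a special case of this lookup.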

aeneasr and others added 22 commits March 23, 2026 18:07

docs: add spec for non-blocking semantic_search with partial results

508a4e8

When reindexing takes longer than 15s, semantic_search returns stale results with a warning instead of blocking the agent indefinitely.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

refactor(cmd): add StaleWarning field and WaitGroup to indexerCache

d6b5483

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(cmd): Close() waits for background reindex goroutines

56c3280

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

style(cmd): use WaitGroup.Go instead of manual Add/Done pattern

a2e6bd2

Go 1.25+ provides wg.Go() which simplifies goroutine tracking.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs(cmd): note worst-case wait time on Close()

7904ce6

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(cmd): render StaleWarning in semantic_search output

7822ecc

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test(cmd): verify ensureIndexed skips reindex when flock is held

09a6cdb

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

style(cmd): fix errcheck lint in new test functions

d03cb18

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

aeneasr enabled auto-merge (rebase) March 27, 2026 15:26

auto-merge was automatically disabled March 27, 2026 15:43
Rebase failed

aeneasr merged commit 07782d6 into main Mar 27, 2026
4 checks passed

github-actions bot mentioned this pull request Mar 27, 2026

chore(main): release 0.0.24 #68

Merged

aeneasr merged 22 commits into main from reindex-fixes
