v0.6.0 (confirm via codebase-memory-mcp --version before filing), Windows 11, stdio MCP registration via project-scope .mcp.json.
Relationship to existing CLOSED issues
Triaged before filing; distinct from all three:
- … .db file per project, the right one, but its mtime is frozen while its .db-wal accumulates writes.
- Server hangs and requires SIGKILL after list_projects on large project (SQLite lock contention) #52 (CLOSED): list_projects hang + stuck db-shm from SIGKILL-mid-transaction. Substrate overlap (WAL+shm stuck states), different manifestation: our server does NOT hang; it responds normally; it just returns stale data for symbols that exist in the WAL but have never been checkpointed into the main .db.
- … .db creation. Our project IS indexed (index_repository returns status: indexed, nodes: N, edges: M with N/M > 0 for the project's pre-existing symbols, and list_projects lists it), but queries for symbols defined in files added POST-initial-index return empty. Adjacent bug class; fix(mcp): query handlers silently return empty results for unindexed projects; resolve_store creates ghost .db files #119's guard does not cover this.
If this repro turns out to be a re-emergence of any of the above under a new code path, please re-close as duplicate with a pointer.
Symptom
index_repository returns status: indexed, nodes: N, edges: M where (N, M) are the pre-add-new-file counts — the indexer reports success, but the returned counts do not reflect the files added since the last clean checkpoint. search_graph, search_code, and trace_path all return empty for symbols defined in those files. Affects both the delete_project → index_repository rebuild AND incremental re-index cycles; i.e., even a full rebuild-from-scratch via the MCP tool surface does NOT resolve it.
Root cause (empirically verified)
Multiple stdio-spawned codebase-memory-mcp.exe processes persist across client session closes — 10 orphan processes observed in our case, accumulated across multiple Claude Code sessions where stdio shutdown did not handshake cleanly. SQLite WAL at ~/.cache/codebase-memory-mcp/<project>.db-wal grows (9.1 MB observed) while the main <project>.db file's mtime stays frozen at the date of the last clean checkpoint. New symbols get written to WAL, but checkpoint-into-main cannot land because orphan-process connections hold read locks.
Verified by: after running taskkill /F /IM codebase-memory-mcp.exe on all 10 processes + rm ~/.cache/codebase-memory-mcp/<project>.db* (all three files: .db, .db-shm, .db-wal) + re-indexing the project, we observed a +42 node / +41 edge delta for that project — multiple prior sessions' accumulated index writes finally landed. delete_project + index_repository cycle alone does NOT resolve this; the orphan-process kill + cache-file delete is required.
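The lock mechanism described above can be reproduced in miniature with stdlib sqlite3 (a sketch of the failure mode, not this project's code): a connection holding an open read transaction pins the WAL, new commits land only in the -wal sidecar, and a TRUNCATE checkpoint reports busy until the reader releases its lock — exactly the orphan-process stuck state.

```python
import os
import sqlite3
import tempfile

# Sketch of the verified mechanism (stdlib sqlite3, not this project's code):
# an open read transaction pins the WAL, so new commits accumulate in the
# -wal file and a TRUNCATE checkpoint cannot fold them into the main .db.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path)
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("PRAGMA busy_timeout=100")  # fail fast instead of blocking 5 s
writer.execute("CREATE TABLE t(x)")
writer.commit()

reader = sqlite3.connect(path)
reader.isolation_level = None                        # autocommit: BEGIN is explicit
reader.execute("BEGIN")
reader.execute("SELECT count(*) FROM t").fetchone()  # read snapshot now pinned

writer.execute("INSERT INTO t VALUES (1)")
writer.commit()                                      # lands in demo.db-wal, not demo.db

busy1, _, _ = writer.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
print(busy1)  # 1: checkpoint blocked by the reader's lock (the orphan-process state)

reader.execute("ROLLBACK")                           # release the read lock (the taskkill equivalent)
busy2, _, _ = writer.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
print(busy2)  # 0: WAL fully checkpointed into the main .db and truncated
```

Killing the orphan processes plays the role of the ROLLBACK here: once no connection holds a read mark, the checkpoint lands and the .db mtime finally advances.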
Minimal repro (Windows)
Index a project via index_repository, confirm symbols appear in search_graph.
Add a new .py file with a distinct function name.
Close the client session without a clean stdio-shutdown handshake (typical — stdio parent-death is SIGHUP-equivalent, not a graceful close).
Open a new client session against the same project.
delete_project → index_repository → search_graph(name_pattern=".*new_fn.*") → total: 0.
ls -la ~/.cache/codebase-memory-mcp/ shows .db-wal growing across retries; .db mtime stuck at first-ever-index date.
tasklist | findstr codebase-memory-mcp shows N > 1 orphan processes.
Workaround
taskkill //F //IM codebase-memory-mcp.exe  # double slashes: MSYS/Git Bash escaping for /F /IM
rm ~/.cache/codebase-memory-mcp/<project>.db*
# — next client tool call auto-respawns a fresh server via stdio
# (no manual restart needed if registered via .mcp.json)
Verified working; the client-side MCP auto-respawn is transparent. Other projects' indexes in the cache are preserved (they have separate .db files per project).
Proposed fixes
On delete_project: issue PRAGMA wal_checkpoint(TRUNCATE) before touching files on disk.
On startup: detect stale .db-wal beyond a threshold (10 MB?) and either force-checkpoint or emit a loud warning.
On stdio-close: trap SIGTERM / stdio EOF and run PRAGMA wal_checkpoint(TRUNCATE) before exit.
Document the orphan-process failure mode in the README's troubleshooting section.
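The startup-detection fix could be sketched like this (assumptions: a Python implementation, a hypothetical helper name checkpoint_if_stale, and the 10 MB threshold floated above; this is not the project's actual API):

```python
import os
import sqlite3

STALE_WAL_BYTES = 10 * 1024 * 1024  # the 10 MB threshold proposed above

def checkpoint_if_stale(db_path: str, threshold: int = STALE_WAL_BYTES) -> bool:
    """Hypothetical startup guard (not this project's actual API): if the
    -wal sidecar has grown past `threshold`, force a TRUNCATE checkpoint,
    warning loudly when another connection's read lock blocks it.
    Returns True if the WAL is fully checkpointed (or was not stale)."""
    wal_path = db_path + "-wal"
    if not os.path.exists(wal_path) or os.path.getsize(wal_path) < threshold:
        return True  # nothing stale to recover
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("PRAGMA busy_timeout=2000")  # bounded wait for lock holders
        busy, log_frames, ckpt_frames = conn.execute(
            "PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
        if busy:
            print(f"WARNING: stale WAL {wal_path} could not be fully "
                  f"checkpointed ({ckpt_frames}/{log_frames} frames); an "
                  "orphan process may still hold a read lock on the DB.")
        return not busy
    finally:
        conn.close()
```

Wired to a SIGTERM / stdio-EOF handler, the same call would also cover the checkpoint-before-exit fix.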
Downstream context
Encountered while building Outside-Diff Impact Slicing (ODSC) review agents that depend on this MCP's trace_path + detect_changes. Our diagnostic trail is at .claude/agents/VALIDATION_LOG.md Entry 2 (§"Cold-start resolution (2026-04-21, same session)") of the downstream project; see https://github.com/davidseraphi/PROJECT_MEMORY_ENGINE_OS/blob/master/.claude/agents/VALIDATION_LOG.md
(public once the current local-ahead commits are pushed — available on request if not yet visible when this issue is triaged).
Happy to run additional diagnostics on our repro on request (e.g., sqlite3 <db> 'PRAGMA wal_checkpoint;' against the stuck-state DB to confirm the checkpoint-cannot-land hypothesis mechanically).