Context
Empirical finding from Basparin/Lawen project, 900-observation DB, session 2026-04-25.
mem_search is currently a wrapper over SQLite + FTS5 (lexical/tokenized search). Natural-language queries with valid intent but disjoint tokens silently return 0 results despite the content existing.
Reproducible evidence
Same DB, same content, ~5 minutes apart:
| Query | Hits | Content existed? |
|---|---|---|
| workflow friction overuse optimization | 0 | Yes: observations on hooks/bridge/CLAUDE.md |
| tools rarely used skills inactive | 0 | Yes: skill registry observation |
| judgment-day ultrareview parallel review | 0 | Yes: referenced in session summaries |
| pulse statusline bridge | 3 | Same content as query #1, different tokens |
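The zero-hit pattern matches how FTS5 treats a multi-term query: terms separated by whitespace are implicitly ANDed, so a row matches only if it contains every term. The sketch below reproduces this on a throwaway in-memory table. The table name `observations`, the helper `count_hits`, and the sample text are invented for illustration and are not the project's schema or data. How `mem_search` actually builds its FTS5 query is not confirmed here. The sketch also assumes the local Python `sqlite3` build ships with FTS5.

```python
import sqlite3

# Throwaway in-memory database; the schema is hypothetical, not the project's.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE observations USING fts5(content)")
conn.execute(
    "INSERT INTO observations(content) VALUES (?)",
    ("pulse statusline bridge adds overhead to every hook invocation",),  # invented text
)


def count_hits(query):
    """Count rows matching every whitespace-separated term of the query."""
    # Quote each term so punctuation such as '-' is not parsed as FTS5 syntax.
    terms = ['"' + term.replace('"', '""') + '"' for term in query.split()]
    row = conn.execute(
        "SELECT count(*) FROM observations WHERE observations MATCH ?",
        (" ".join(terms),),
    ).fetchone()
    return row[0]


lexical_hits = count_hits("pulse statusline bridge")
paraphrase_hits = count_hits("workflow friction overuse optimization")
```

Here the first query shares all of its tokens with the stored row and the second shares none, so only the first can match, whatever the intent behind the second.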
Impact
The value of the saved corpus is gated by the caller's ability to guess the words its past self chose. For agents (Claude orchestrators) writing across many sessions and weeks, this leads to silent recall failures: saved decisions become invisible to queries whose phrasing is semantically equivalent but lexically different.
Proposal
Add an optional embeddings layer with hybrid scoring:
- FTS5 lexical match stays primary (deterministic, fast)
- Cosine similarity over embeddings augments the lexical score, recovering synonymous or paraphrased queries
- Configurable weight (e.g. `--semantic-weight 0.3`) so users can tune lexical-vs-semantic mix
- Embedding model pluggable: local (`sentence-transformers`, ONNX) for offline, or API (Anthropic/OpenAI) for higher quality
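One possible reading of the hybrid scoring proposed above is a linear blend of a lexical score and a cosine similarity. The sketch below is an assumption, not a design decision from this issue: the function names are hypothetical, the lexical score is assumed to be already normalized to [0, 1] (raw FTS5 `bm25()` values are not, since lower means better there), and the vectors are toy values rather than real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_score(lexical, semantic, semantic_weight=0.3):
    """Blend a normalized lexical score with a semantic score.

    semantic_weight plays the role of the proposed --semantic-weight flag:
    0.0 keeps pure lexical ranking, 1.0 ranks by embeddings alone.
    """
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must be between 0 and 1")
    return (1.0 - semantic_weight) * lexical + semantic_weight * semantic


# Toy vectors standing in for a query embedding and a stored observation embedding.
query_vec = [0.2, 0.9, 0.1]
observation_vec = [0.25, 0.8, 0.0]

# A paraphrased query has no lexical match (0.0) but still gets a non-zero hybrid score.
paraphrase_score = hybrid_score(0.0, cosine_similarity(query_vec, observation_vec))
```

With a weight of 0.3, an exact lexical match still outranks a purely semantic one, which keeps FTS5 as the primary signal.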
Acceptance criteria
Same observation recoverable via at least 3 distinct natural-language queries that share intent but no overlapping tokens.
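The criterion could be turned into an automated check along these lines. This is a sketch under assumptions: `search` is a hypothetical callable standing in for `mem_search` that returns the ids of matching observations, and tokenization is plain lowercase whitespace splitting, which may differ from the FTS5 tokenizer in use.

```python
from itertools import combinations


def tokens(query):
    """Lowercased whitespace tokens of a query (a simplification of FTS5 tokenization)."""
    return set(query.lower().split())


def share_no_tokens(queries):
    """True when no two queries in the list have a token in common."""
    return all(tokens(a).isdisjoint(tokens(b)) for a, b in combinations(queries, 2))


def meets_acceptance(search, queries, observation_id, minimum=3):
    """Check that enough token-disjoint queries each recover the same observation.

    `search` is a hypothetical stand-in for mem_search returning observation ids.
    """
    if len(queries) < minimum or not share_no_tokens(queries):
        return False
    return all(observation_id in search(query) for query in queries)
```

The check cannot verify that the queries share intent; that part of the criterion still depends on how the test queries are chosen.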
Backwards compatibility
Existing FTS5 behavior unchanged unless embeddings flag is enabled. Migration: re-embed existing observations on first opt-in run.
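The migration step might look like the idempotent backfill below. Everything about the storage layout is an assumption made for illustration: the `observations(id, content)` table, the `observation_embeddings` side table, and float32 blobs as the vector format. `embed` is a hypothetical callable that maps text to a list of floats, standing in for whichever embedding backend is plugged in.

```python
import sqlite3
import struct


def pack_vector(vector):
    """Serialize a list of floats as little-endian float32 bytes."""
    return struct.pack("<%df" % len(vector), *vector)


def backfill_embeddings(conn, embed):
    """Embed every observation that has no stored vector yet; return how many were added.

    Safe to run repeatedly: rows that already have an embedding are skipped.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observation_embeddings ("
        "observation_id INTEGER PRIMARY KEY, vector BLOB NOT NULL)"
    )
    missing = conn.execute(
        "SELECT o.id, o.content FROM observations AS o "
        "LEFT JOIN observation_embeddings AS e ON e.observation_id = o.id "
        "WHERE e.observation_id IS NULL"
    ).fetchall()
    for observation_id, content in missing:
        conn.execute(
            "INSERT INTO observation_embeddings(observation_id, vector) VALUES (?, ?)",
            (observation_id, pack_vector(embed(content))),
        )
    conn.commit()
    return len(missing)
```

Keeping the vectors in a separate table leaves the existing FTS5 table untouched, which fits the requirement that behavior is unchanged when the flag is off.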
Workaround documented (no upstream change)
For now Basparin's symbiosis repo carries the workaround: mandatory topic_key per evolving topic, multi-keyword retry before concluding absent. Tracking: https://github.com/Basparin/symbiosis/pull/13
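The multi-keyword retry half of the workaround can be sketched as a small wrapper. `search` is again a hypothetical callable standing in for `mem_search`, and the caller supplies the alternative phrasings; the `topic_key` convention is not shown.

```python
def search_with_retries(search, phrasings):
    """Try each phrasing in order and return the first one that yields results.

    Returns (phrasing, results) on success, or (None, []) when every phrasing
    comes back empty, which is the point at which "absent" may be concluded.
    """
    for phrasing in phrasings:
        results = search(phrasing)
        if results:
            return phrasing, results
    return None, []
```

The wrapper only helps when one of the supplied phrasings happens to share tokens with the stored text, so it narrows the gap described above without closing it.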
Related
Repo backend confirmed (README): "Persistent memory system for AI coding agents. Agent-agnostic Go binary with SQLite + FTS5."