Add semantic search layer on top of FTS5 to improve mem_search recall #233

@Basparin

Context

Empirical finding from Basparin/Lawen project, 900-observation DB, session 2026-04-25.

mem_search is currently a wrapper over SQLite + FTS5 (lexical/tokenized search). Natural-language queries with valid intent but disjoint tokens silently return 0 results despite the content existing.

Reproducible evidence

Same DB, same content, ~5 minutes apart:

| Query | Hits | Content existed? |
|---|---|---|
| workflow friction overuse optimization | 0 | Yes: observations on hooks/bridge/CLAUDE.md |
| tools rarely used skills inactive | 0 | Yes: skill registry observation |
| judgment-day ultrareview parallel review | 0 | Yes: referenced in session summaries |
| pulse statusline bridge | 3 | Same content as query #1, different tokens |
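
The failure mode above can be reproduced in isolation. This is a minimal sketch, not the project's actual schema: the `observations` table, its `content` column, and the stored text are all hypothetical, and it assumes the query string is handed to `MATCH` as-is, where FTS5 treats space-separated terms as an implicit AND. It also assumes the local SQLite build has FTS5 enabled.

```python
import sqlite3

# Hypothetical single-column FTS5 table standing in for the real observation store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE observations USING fts5(content)")
conn.execute(
    "INSERT INTO observations(content) VALUES (?)",
    ("pulse statusline bridge adds hook latency on every prompt",),  # made-up text
)


def fts_hits(query: str) -> int:
    """Count rows matching an FTS5 query (terms are implicitly ANDed)."""
    row = conn.execute(
        "SELECT count(*) FROM observations WHERE observations MATCH ?",
        (query,),
    ).fetchone()
    return row[0]


lexical_hits = fts_hits("pulse statusline bridge")
paraphrase_hits = fts_hits("workflow friction overuse optimization")
print(lexical_hits, paraphrase_hits)  # prints: 1 0
```

The second query describes the same observation but shares no token with it, so the lexical index has nothing to match on and returns an empty result without any warning.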

Impact

The value of the saved corpus is gated by the caller's ability to guess past-self's word choices. For agents (Claude orchestrators) writing across many sessions and weeks, this leads to silent recall failures — saved decisions become invisible to queries with semantically equivalent but lexically distinct phrasing.

Proposal

Add an optional embeddings layer with hybrid scoring:

  • FTS5 lexical match stays primary (deterministic, fast)
  • Cosine similarity over embeddings augments the lexical score, recovering synonymic / paraphrased queries
  • Configurable weight (e.g. --semantic-weight 0.3) so users can tune lexical-vs-semantic mix
  • Embedding model pluggable: local (sentence-transformers, ONNX) for offline, or API (Anthropic/OpenAI) for higher quality
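
One possible shape for the hybrid scoring, as a sketch only. The linear blend, the `hybrid_score` and `rank` helpers, the assumption that the lexical score is already normalized to [0, 1], and the toy two-dimensional vectors are all illustrative choices, not part of the existing codebase. Only the 0.3 default mirrors the `--semantic-weight` example above.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_score(lexical, semantic, semantic_weight=0.3):
    """Linear blend (an assumption): lexical stays primary, semantic augments."""
    return (1.0 - semantic_weight) * lexical + semantic_weight * semantic


def rank(query_vec, candidates, semantic_weight=0.3):
    """Return candidate ids ordered by descending hybrid score."""
    scored = []
    for c in candidates:
        semantic = max(0.0, cosine(query_vec, c["embedding"]))  # clamp negatives
        scored.append((hybrid_score(c["lexical"], semantic, semantic_weight), c["id"]))
    return [cid for _, cid in sorted(scored, reverse=True)]


# Toy data: neither candidate matched lexically, so only the embeddings separate them.
candidates = [
    {"id": "obs-a", "lexical": 0.0, "embedding": [1.0, 0.0]},
    {"id": "obs-b", "lexical": 0.0, "embedding": [0.0, 1.0]},
]
ranking = rank([0.9, 0.1], candidates)
print(ranking)  # prints: ['obs-a', 'obs-b']
```

With `semantic_weight=0.0` the blend reduces to the lexical score alone, which is one way to keep the current behavior as the default.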

Acceptance criteria

Same observation recoverable via at least 3 distinct natural-language queries that share intent but no overlapping tokens.
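
A test for this criterion first needs to confirm that the chosen queries really share no tokens. The sketch below reads "no overlapping tokens" as pairwise disjoint, and uses a simple lowercase alphanumeric split that may not match the FTS5 tokenizer exactly. The first two queries come from the table above; the third is a hypothetical paraphrase added for illustration.

```python
import re
from itertools import combinations


def tokens(query: str) -> set:
    """Lowercase alphanumeric tokens; an approximation of the FTS5 tokenizer."""
    return set(re.findall(r"[a-z0-9]+", query.lower()))


def pairwise_disjoint(queries) -> bool:
    """True when no two queries share a token."""
    return all(
        tokens(a).isdisjoint(tokens(b)) for a, b in combinations(queries, 2)
    )


queries = [
    "workflow friction overuse optimization",  # from the issue
    "pulse statusline bridge",                 # from the issue
    "hook latency slows prompts",              # hypothetical third paraphrase
]
print(pairwise_disjoint(queries))  # prints: True
```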

Backwards compatibility

Existing FTS5 behavior unchanged unless embeddings flag is enabled. Migration: re-embed existing observations on first opt-in run.
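
The opt-in migration could be an idempotent backfill that embeds only observations without a stored vector. This is a rough sketch under assumptions: the `observations` and `observation_embeddings` tables, their columns, and the `placeholder_embed` function are hypothetical, and a real run would call whichever embedding model is configured.

```python
import sqlite3
import struct


def placeholder_embed(text: str) -> list:
    """Hypothetical stand-in for a real embedding model."""
    return [float(len(text)), float(text.count(" "))]


def backfill_embeddings(conn, embed) -> int:
    """Embed observations that have no vector yet; return how many were embedded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observation_embeddings ("
        "observation_id INTEGER PRIMARY KEY, vector BLOB NOT NULL)"
    )
    rows = conn.execute(
        "SELECT o.id, o.content FROM observations o "
        "LEFT JOIN observation_embeddings e ON e.observation_id = o.id "
        "WHERE e.observation_id IS NULL"
    ).fetchall()
    for obs_id, content in rows:
        vec = embed(content)
        blob = struct.pack(f"{len(vec)}f", *vec)  # float32 values packed into a BLOB
        conn.execute(
            "INSERT INTO observation_embeddings VALUES (?, ?)", (obs_id, blob)
        )
    conn.commit()
    return len(rows)


# Demo on an in-memory DB with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observations (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO observations(content) VALUES (?)",
    [("pulse statusline bridge",), ("skill registry review",)],
)
first_run = backfill_embeddings(conn, placeholder_embed)
second_run = backfill_embeddings(conn, placeholder_embed)
print(first_run, second_run)  # prints: 2 0
```

Because the second run finds nothing left to embed, the same routine can run on every start once the flag is enabled without redoing work.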

Workaround documented (no upstream change)

For now, Basparin's symbiosis repo carries the workaround: a mandatory topic_key per evolving topic, plus multi-keyword retries before concluding that an observation is absent. Tracking: https://github.com/Basparin/symbiosis/pull/13
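
The multi-keyword retry half of the workaround might look roughly like the sketch below. The `search` callable stands in for whatever wraps mem_search, whose real signature is not shown here; `fake_search`, the stored text, and the keyword sets are all hypothetical.

```python
def search_with_retry(search, keyword_sets):
    """Try each keyword set in order; return (hits, queries_tried)."""
    tried = []
    for keywords in keyword_sets:
        query = " ".join(keywords)
        tried.append(query)
        hits = search(query)
        if hits:
            return hits, tried
    return [], tried  # only now conclude the observation is absent


# Hypothetical stand-in for the real search: every query word must appear.
OBSERVATIONS = ["pulse statusline bridge adds hook latency"]


def fake_search(query):
    words = query.lower().split()
    return [o for o in OBSERVATIONS if all(w in o.lower().split() for w in words)]


hits, tried = search_with_retry(
    fake_search,
    [["workflow", "friction"], ["hooks", "overhead"], ["statusline", "bridge"]],
)
print(len(hits), len(tried))  # prints: 1 3
```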

Related

Repo backend confirmed: Persistent memory system for AI coding agents. Agent-agnostic Go binary with SQLite + FTS5 (README).
