Summary
The central architectural bet of this PoC is "paraphrase-capable recall without embeddings." That claim is only meaningful when compared against the obvious baseline — an LLM plus vector DB stack on the same scenarios.
What to do
- Port Scenarios 1 through 5 and the 20-query stress corpus to a baseline implementation:
  - Embeddings: any open model (e.g., sentence-transformers or local nomic-embed).
  - Vector store: sqlite-vec, qdrant-lite, or similar.
  - Renderer: template-only (same no-fabrication contract), or a small LLM pass; document which.
- Run both systems on the same corpus and queries. Record mode distribution, recall quality, latency, storage footprint.
- Publish the comparison as docs/baseline_comparison.md.
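One possible shape for the shared harness, as a minimal sketch rather than the project's actual API: both systems sit behind a common trait, and the runner records mode distribution, a simple hit rate as a stand-in for recall quality, and cumulative query latency. The names `RecallSystem`, `Answer`, `Case`, `Report`, and `run_cases` are hypothetical, and the `mode` label is a free-form string because this issue does not enumerate the modes. Storage footprint is not covered here and would need a separate measurement (for example, on-disk size after ingest).

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// One answer from a system under test. `mode` is a free-form label
/// (assumption: substitute whatever modes the PoC actually reports).
pub struct Answer {
    pub mode: String,
    pub recalled_ids: Vec<String>,
}

/// Hypothetical interface that both trie-memory and the baseline implement.
pub trait RecallSystem {
    fn name(&self) -> &str;
    fn answer(&mut self, query: &str) -> Answer;
}

/// A query paired with the ID of the fact it is expected to recall.
pub struct Case {
    pub query: String,
    pub expected_id: String,
}

/// Aggregate measurements for one system over one corpus.
pub struct Report {
    pub system: String,
    pub mode_counts: BTreeMap<String, usize>,
    pub hits: usize,
    pub total: usize,
    pub total_latency: Duration,
}

impl Report {
    /// Fraction of cases whose expected ID appeared in the answer.
    pub fn hit_rate(&self) -> f64 {
        if self.total == 0 {
            0.0
        } else {
            self.hits as f64 / self.total as f64
        }
    }
}

/// Runs every case against `system`, timing only the `answer` call.
pub fn run_cases(system: &mut dyn RecallSystem, cases: &[Case]) -> Report {
    let mut mode_counts = BTreeMap::new();
    let mut hits = 0;
    let mut total_latency = Duration::ZERO;

    for case in cases {
        let start = Instant::now();
        let answer = system.answer(&case.query);
        total_latency += start.elapsed();

        *mode_counts.entry(answer.mode).or_insert(0) += 1;
        if answer.recalled_ids.iter().any(|id| id == &case.expected_id) {
            hits += 1;
        }
    }

    Report {
        system: system.name().to_string(),
        mode_counts,
        hits,
        total: cases.len(),
        total_latency,
    }
}
```

Calling `run_cases` once per system with the same `cases` slice keeps the comparison on identical inputs. A hit rate is the crudest possible quality measure; a graded score per query may be more informative for paraphrase cases.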
Acceptance
- Reproducible harness exists and is documented.
- A results table shows where trie-memory wins, where it loses, where it ties.
- README "honest state" section references the baseline numbers instead of speculation.
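The win/lose/tie table could be generated mechanically so the verdicts are not hand-assigned. The sketch below is one way to do that under stated assumptions: `MetricRow`, `Verdict`, `verdict`, and `render_table` are hypothetical names, and the relative tolerance that defines a tie is a parameter the harness documentation would need to justify.

```rust
/// One metric measured on both systems (hypothetical row type).
pub struct MetricRow {
    pub metric: String,
    pub trie_memory: f64,
    pub baseline: f64,
    /// true for metrics like recall, false for latency or storage.
    pub higher_is_better: bool,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Win,
    Loss,
    Tie,
}

/// Verdict from trie-memory's point of view. Values within
/// `rel_tolerance` of the larger magnitude count as a tie.
pub fn verdict(row: &MetricRow, rel_tolerance: f64) -> Verdict {
    let scale = row.trie_memory.abs().max(row.baseline.abs());
    let diff = row.trie_memory - row.baseline;
    if diff.abs() <= rel_tolerance * scale {
        return Verdict::Tie;
    }
    let trie_higher = diff > 0.0;
    if trie_higher == row.higher_is_better {
        Verdict::Win
    } else {
        Verdict::Loss
    }
}

/// Renders the rows as a Markdown table for docs/baseline_comparison.md.
pub fn render_table(rows: &[MetricRow], rel_tolerance: f64) -> String {
    let mut out = String::from(
        "| Metric | trie-memory | baseline | Verdict |\n|---|---|---|---|\n",
    );
    for row in rows {
        let label = match verdict(row, rel_tolerance) {
            Verdict::Win => "win",
            Verdict::Loss => "loss",
            Verdict::Tie => "tie",
        };
        out.push_str(&format!(
            "| {} | {} | {} | {} |\n",
            row.metric, row.trie_memory, row.baseline, label
        ));
    }
    out
}
```

Keeping the tie threshold explicit makes it harder to report a marginal difference as a win in either direction.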
Links
docs/design/honest_agent/progress.md Phase D (listed as deferred — explicitly pulling it up to M2 because it is load-bearing for the central claim).
tests/phase_b_stress.rs
tests/honest_agent_paraphrase.rs