Baseline comparison harness: trie-memory vs LLM + vector DB #5

@tomas-samek

Summary

The central architectural bet of this PoC is "paraphrase-capable recall without embeddings." That claim is only meaningful when measured against the obvious baseline: an LLM plus vector DB stack run on the same scenarios.

What to do

  • Port Scenarios 1 through 5 and the 20-query stress corpus to a baseline implementation:
    • Embeddings: any open model (e.g., sentence-transformers or local nomic-embed).
    • Vector store: sqlite-vec, qdrant-lite, or similar.
    • Renderer: template-only (same no-fabrication contract) or a small LLM pass; document which one is used.
  • Run both systems on the same corpus and queries. For each system, record mode distribution, recall quality, latency, and storage footprint.
  • Publish the comparison as docs/baseline_comparison.md.
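
One way to keep the two runs comparable is to put both systems behind a single interface and drive them with the same case list. The following is a minimal sketch under stated assumptions, not the project's actual API: `RecallSystem`, `Answer`, `Case`, `RunStats`, `run_cases`, and the `FixedStub` stand-in are all hypothetical names, the mode labels are placeholders, and "recall quality" is reduced to exact-match hits purely for illustration. Storage footprint is not covered here.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// Hypothetical answer shape: a mode label plus the recalled text, if any.
pub struct Answer {
    pub mode: String,
    pub text: Option<String>,
}

/// Hypothetical interface that both trie-memory and the baseline would implement.
pub trait RecallSystem {
    fn name(&self) -> &str;
    fn query(&mut self, q: &str) -> Answer;
}

/// One query and the text a correct recall should return.
pub struct Case<'a> {
    pub query: &'a str,
    pub expected: &'a str,
}

#[derive(Debug, Default)]
pub struct RunStats {
    pub total: usize,
    pub hits: usize,
    pub mode_counts: BTreeMap<String, usize>,
    pub elapsed: Duration,
}

/// Runs every case against one system, timing only the query call itself.
pub fn run_cases(system: &mut dyn RecallSystem, cases: &[Case<'_>]) -> RunStats {
    let mut stats = RunStats::default();
    for case in cases {
        let start = Instant::now();
        let answer = system.query(case.query);
        stats.elapsed += start.elapsed();
        stats.total += 1;
        *stats.mode_counts.entry(answer.mode).or_insert(0) += 1;
        // Exact match is an assumption; a real harness may need a softer check.
        if answer.text.as_deref() == Some(case.expected) {
            stats.hits += 1;
        }
    }
    stats
}

/// Toy stand-in used only to exercise the harness; not a real backend.
pub struct FixedStub {
    facts: Vec<(String, String)>,
}

impl FixedStub {
    pub fn new(pairs: &[(&str, &str)]) -> Self {
        let facts = pairs
            .iter()
            .map(|(q, a)| (q.to_string(), a.to_string()))
            .collect();
        Self { facts }
    }
}

impl RecallSystem for FixedStub {
    fn name(&self) -> &str {
        "fixed-stub"
    }

    fn query(&mut self, q: &str) -> Answer {
        match self.facts.iter().find(|(k, _)| k.as_str() == q) {
            Some((_, a)) => Answer { mode: "recalled".into(), text: Some(a.clone()) },
            None => Answer { mode: "unknown".into(), text: None },
        }
    }
}
```

With this shape, the same `&[Case]` slice can be passed to `run_cases` once per system, and the two `RunStats` values feed directly into the comparison document.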

Acceptance

  • Reproducible harness exists and is documented.
  • A results table shows where trie-memory wins, where it loses, and where it ties.
  • README "honest state" section references the baseline numbers instead of speculation.
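
To make the win/lose/tie column of the results table mechanical rather than a judgment call, each metric can be classified with an explicit tolerance. This is only a sketch: the `Verdict` enum, the `verdict` and `table_row` functions, and the relative-tolerance rule are assumptions for illustration, and the tolerance value itself would need to be chosen and documented by whoever runs the harness.

```rust
/// Outcome for trie-memory on one metric, relative to the baseline.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Verdict {
    Win,
    Loss,
    Tie,
}

/// Classifies one metric. `higher_is_better` is true for metrics such as
/// recall quality and false for latency or storage footprint.
/// Values within `rel_tol` of each other (relative to the larger magnitude)
/// count as a tie; this rule is an assumption, not a project convention.
pub fn verdict(trie: f64, baseline: f64, higher_is_better: bool, rel_tol: f64) -> Verdict {
    let scale = trie.abs().max(baseline.abs());
    if (trie - baseline).abs() <= rel_tol * scale {
        return Verdict::Tie;
    }
    let trie_better = if higher_is_better {
        trie > baseline
    } else {
        trie < baseline
    };
    if trie_better {
        Verdict::Win
    } else {
        Verdict::Loss
    }
}

/// Formats one Markdown table row for the comparison document.
pub fn table_row(metric: &str, trie: f64, baseline: f64, v: Verdict) -> String {
    format!("| {} | {:.2} | {:.2} | {:?} |", metric, trie, baseline, v)
}
```

Stating the tolerance alongside the table keeps a "tie" from silently absorbing small but real differences.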

Links

  • docs/design/honest_agent/progress.md, Phase D (listed there as deferred; this issue explicitly pulls it up to M2 because it is load-bearing for the central claim).
  • tests/phase_b_stress.rs
  • tests/honest_agent_paraphrase.rs
