Implement a lightweight AAAK formatter in Ruby (regex-based entity extraction + keyword extraction — no LLM required per MemPalace's design)
Write both tiers in a single ingest transaction
Update query runner to use two-pass retrieval: closet scan → drawer fetch
Summary
lex-knowledge currently ingests document chunks directly into Apollo as single entries. Research into MemPalace's architecture (96.6% R@5 verbatim baseline on LongMemEval) reveals a two-tier design that dramatically improves retrieval accuracy: verbatim drawers (original text, immutable) + AAAK compressed pointer entries (lossy symbolic index). The pointer layer is cheap to scan; the drawer layer is what gets returned.
Current Behavior
Single entry per chunk. Content may be processed/summarized before storage. Retrieval returns whatever was stored.
Proposed Two-Tier Design
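A minimal sketch of the two-tier write, assuming an in-memory stand-in for Apollo. `MemoryStore`, `Entry`, and `ingest_chunk` are hypothetical names for illustration and are not part of lex-knowledge or legion-apollo. The sketch shows a verbatim drawer and a closet pointer written together so that neither tier exists without the other.

```ruby
require 'securerandom'

# Hypothetical in-memory stand-in for Apollo; not the real legion-apollo API.
class MemoryStore
  attr_reader :entries

  def initialize
    @entries = {}
  end

  # Stage writes on a copy and commit only if the block finishes without raising.
  def transaction
    staged = @entries.dup
    yield staged
    @entries = staged
  end
end

# tier is :drawer (verbatim text) or :closet (pointer); points_to is set on closets only.
Entry = Struct.new(:id, :tier, :content, :points_to, keyword_init: true)

def ingest_chunk(store, text, pointer_text)
  # The drawer keeps the original text, frozen so it cannot be mutated later.
  drawer = Entry.new(id: SecureRandom.uuid, tier: :drawer,
                     content: text.dup.freeze, points_to: nil)
  # The closet holds the lossy pointer text and references its drawer.
  closet = Entry.new(id: SecureRandom.uuid, tier: :closet,
                     content: pointer_text, points_to: drawer.id)
  store.transaction do |staged|
    staged[drawer.id] = drawer
    staged[closet.id] = closet
  end
  [drawer, closet]
end
```

A real implementation would replace `MemoryStore#transaction` with whatever transactional write the storage backend offers.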
Retrieval path:
The closet is a ranking signal that boosts drawer candidates. It never gates — if a closet misses, the direct drawer search still finds it.
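The boost-without-gating behavior can be sketched as follows. This is an illustration under stated assumptions: token overlap (Jaccard) stands in for embedding similarity, and `two_pass_search`, the `boost` weight, and the data shapes are hypothetical rather than the lex-knowledge query runner.

```ruby
require 'set'

def tokens(text)
  text.downcase.scan(/[a-z0-9_]+/).to_set
end

# Jaccard overlap, used here only as a stand-in for embedding similarity.
def similarity(a, b)
  ta = tokens(a)
  tb = tokens(b)
  union = (ta | tb).size
  union.zero? ? 0.0 : (ta & tb).size.to_f / union
end

# drawers: { id => verbatim text }
# closets: [{ text: pointer text, points_to: drawer id }]
def two_pass_search(query, drawers, closets, boost: 0.5, limit: 3)
  # Pass 1: scan the cheap pointer layer, keeping the best closet score per drawer.
  closet_scores = Hash.new(0.0)
  closets.each do |closet|
    score = similarity(query, closet[:text])
    id = closet[:points_to]
    closet_scores[id] = [closet_scores[id], score].max
  end

  # Pass 2: score every drawer directly. The closet score is added as a boost,
  # so a drawer with no matching closet is still ranked on its own text.
  ranked = drawers.map do |id, text|
    { id: id, score: similarity(query, text) + boost * closet_scores[id], text: text }
  end
  ranked.sort_by { |r| -r[:score] }.first(limit)
end
```

Because every drawer is scored in the second pass, a missing or poor closet entry lowers a drawer's rank at most; it cannot remove the drawer from the candidate set.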
AAAK Pointer Format
The AAAK format (from MemPalace's dialect.py) is a compact symbolic summary. It is what the model scans at low token cost before deciding which full drawers to load.
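A rough sketch of LLM-free pointer extraction using regexes. The output layout here is an assumption for illustration and is not the actual AAAK syntax defined in dialect.py; `extract_entities`, `extract_keywords`, `build_pointer`, and the stopword list are hypothetical.

```ruby
# Small illustrative stopword list; a real formatter would need a fuller one.
STOPWORDS = %w[the a an and or but of to in on for with is are was were
               be been it this that as at by from we over because].freeze

# Crude entity heuristic: runs of capitalized words.
# Sentence-initial words also match, so expect some false positives.
def extract_entities(text)
  text.scan(/\b[A-Z][a-zA-Z0-9]+(?:\s+[A-Z][a-zA-Z0-9]+)*/).uniq
end

# Most frequent non-stopword terms of three or more characters,
# with ties broken alphabetically so the output is deterministic.
def extract_keywords(text, limit: 5)
  words = text.downcase.scan(/[a-z][a-z0-9_-]{2,}/)
  counts = (words - STOPWORDS).tally
  counts.sort_by { |word, count| [-count, word] }.first(limit).map(&:first)
end

# Build a pointer entry that references its verbatim drawer by id.
def build_pointer(text, drawer_id)
  {
    tier: 'closet',
    points_to: drawer_id,
    entities: extract_entities(text),
    keywords: extract_keywords(text)
  }
end
```

The pointer stays lossy by design: it only needs enough signal to rank drawers, since the verbatim drawer is what gets returned.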
Implementation Plan
Add tier and points_to metadata fields to lex-knowledge ingest (requires legion-apollo raw_content support; see legion-apollo#26, "Add raw_content field for verbatim storage - separate from indexed content")
Why This Matters
The core finding from MemPalace benchmarks: verbatim text + good embeddings beats LLM-extracted summaries on recall. When you summarize at ingest time, you lose the context of why a decision was made, alternatives considered, exact wording. The pointer layer gives you cheap scanning without losing the source.
Dependencies
References
dialect.py: AAAK format spec
palace.py: drawer/closet collection split