perf(retrieve): cache category embeddings + lazy item pool (-88% latency) by 2233admin · Pull Request #399 · NevaMind-AI/memU

2233admin · 2026-04-01T19:22:40Z

Summary

Retrieve hot path latency reduced from 633ms → 70-80ms (-88%) on a 1415-item corpus with 5 top-k hits.

Two independent optimizations targeting the two largest segments identified via profiling:

1. Category summary embedding cache (230ms → 0ms)

_rank_categories_by_embedding re-embeds category summaries on every call, even though summaries rarely change. Added an instance-level dict cache keyed by (category_id, summary) tuples — automatic invalidation when summaries update, zero config.

2. Lazy item pool (335ms → ~10ms)

_rag_rank_items called list_items() scanning all 1420 rows from Postgres on every retrieve. Replaced with targeted get_item() calls for only the hit IDs returned by vector search (typically 5). Also fixed _rag_build_context to use "item_pool" in state instead of falsy-or fallback, preventing accidental full scans when the lazy pool is an empty dict.

Profiling breakdown (before)

Segment	Time	Root cause
`rank_categories`	230ms	Re-embed 6 category summaries every call
`list_items` (item pool)	335ms	Full scan 1420 rows from PG
`vector_search`	10ms	pgvector — not the bottleneck
`graph_recall`	40ms	PPR traversal — acceptable
`build_context`	18ms	Serialization — acceptable

Profiling breakdown (after)

Segment	Time	Delta
`rank_categories`	<1ms	-230ms (cache hit)
`item pool`	~10ms	-325ms (5x get_item vs 1420 list)
`vector_search`	10ms	unchanged
`graph_recall`	40ms	unchanged
`build_context`	18ms	unchanged
Total	~70-80ms	-88%

Changes

src/memu/app/retrieve.py — 1 file, +19/-3 lines

Testing

All existing tests pass (77 passed, 1 skipped, 0 failed)
Manually benchmarked with 1415 items, 5 runs: avg 82ms, min 68ms, max 121ms
Edge cases verified: single-word query, no-match query, empty user, multi-turn context

…ncy) - Cache category summary embeddings across calls (230ms → 0ms on cache hit) - Replace full list_items scan (335ms/1420 items) with targeted get_item for hit IDs only - Fix _rag_build_context to use 'in state' check instead of falsy-or fallback - Retrieve hot path: 633ms → 70-80ms (1415 items, 5 top-k hits) - All 121 tests passing

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(retrieve): cache category embeddings + lazy item pool (-88% latency)#399

perf(retrieve): cache category embeddings + lazy item pool (-88% latency)#399
2233admin wants to merge 1 commit intoNevaMind-AI:mainfrom
2233admin:perf/retrieve-latency

2233admin commented Apr 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

2233admin commented Apr 1, 2026

Summary

1. Category summary embedding cache (230ms → 0ms)

2. Lazy item pool (335ms → ~10ms)

Profiling breakdown (before)

Profiling breakdown (after)

Changes

Testing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant