feat(recall_check): entity / memory_type / tag filtering (CLA-108) #38
`recall_check` now accepts three optional filter fields alongside the
existing topic + similarity threshold:
* `entity` — exact-match (case-sensitive) on the memory's entity.
* `memory_type` — one of episodic / semantic / orientation; validated
against MemoryType::from_str so a bad value errors cleanly instead
of silently returning no results.
* `tags` — array; memory must include at least one of the listed
tags (any-of, not all-of, per the ticket).
Filters compose with AND across fields, OR within the tag list. Empty
filter set behaves identically to pre-CLA-108 recall_check.
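The composition rule above can be sketched as follows. This is a minimal illustration, not the code from this PR: the `MemoryFilter` struct, the field layout of `Memory`, the error type, and the exact signature of `matches_filter` are assumptions made for the example. Only the names `Memory::matches_filter` and `MemoryType::from_str` come from the description.

```rust
use std::str::FromStr;

// Hypothetical shape of the memory type enum; variant names follow the
// three values listed in the description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemoryType {
    Episodic,
    Semantic,
    Orientation,
}

impl FromStr for MemoryType {
    type Err = String;

    // An unknown value is an error, so a typo cannot silently match nothing.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "episodic" => Ok(Self::Episodic),
            "semantic" => Ok(Self::Semantic),
            "orientation" => Ok(Self::Orientation),
            other => Err(format!("unknown memory_type: {other}")),
        }
    }
}

// Hypothetical filter container: `None` / empty means "field not filtered".
#[derive(Debug, Default)]
pub struct MemoryFilter {
    pub entity: Option<String>,
    pub memory_type: Option<MemoryType>,
    pub tags: Vec<String>,
}

// Hypothetical subset of the real Memory struct.
#[derive(Debug)]
pub struct Memory {
    pub entity: Option<String>,
    pub memory_type: MemoryType,
    pub tags: Vec<String>,
}

impl Memory {
    // AND across fields, OR within the tag list; an empty filter matches all.
    pub fn matches_filter(&self, f: &MemoryFilter) -> bool {
        // Exact, case-sensitive comparison on the entity.
        let entity_ok = f
            .entity
            .as_deref()
            .map_or(true, |want| self.entity.as_deref() == Some(want));
        let type_ok = f.memory_type.map_or(true, |want| self.memory_type == want);
        // any-of: at least one requested tag must be present on the memory.
        let tags_ok = f.tags.is_empty() || f.tags.iter().any(|t| self.tags.contains(t));
        entity_ok && type_ok && tags_ok
    }
}
```

Under these assumptions, a memory with `entity = "chopper"` and tags `["walk"]` matches a filter asking for tags `["walk", "morning"]`, but not one asking for entity `"Chopper"`.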
Implementation: filters apply post-Vectorize against D1 rows. The
current Vectorize index doesn't store metadata, and adding it would
require re-upserting every existing vector + a schema change — that
work belongs with CLA-109's hybrid retrieval, when the storage layer
is being touched anyway. For now the no-filter path keeps the existing
lean shape (MMR-rerank then small D1 fetch); the filtered path fetches
all above-threshold candidates from D1 up front so the metadata filter
runs before MMR — otherwise MMR would diversify candidates we're about
to throw away.
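To illustrate why the filter has to run before MMR, here is a self-contained sketch. Everything in it is assumed for the example: the `Candidate` struct, the cosine-based redundancy term, the `lambda` weighting, and the function names are not taken from this PR, and the real rerank may score candidates differently.

```rust
// Hypothetical candidate: an id, its similarity to the query, and its embedding.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub id: u32,
    pub score: f32,
    pub embedding: Vec<f32>,
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

// Greedy MMR: repeatedly pick the candidate that maximises
// lambda * relevance - (1 - lambda) * (max similarity to anything already picked).
pub fn mmr(mut pool: Vec<Candidate>, limit: usize, lambda: f32) -> Vec<Candidate> {
    let mut picked: Vec<Candidate> = Vec::new();
    while picked.len() < limit && !pool.is_empty() {
        let mut best_idx = 0;
        let mut best_score = f32::NEG_INFINITY;
        for (i, c) in pool.iter().enumerate() {
            // Redundancy is floored at 0, so dissimilar picks are not rewarded.
            let redundancy = picked
                .iter()
                .map(|p| cosine(&c.embedding, &p.embedding))
                .fold(0.0_f32, f32::max);
            let s = lambda * c.score - (1.0 - lambda) * redundancy;
            if s > best_score {
                best_score = s;
                best_idx = i;
            }
        }
        picked.push(pool.swap_remove(best_idx));
    }
    picked
}

// Filtered path: drop below-threshold and non-matching candidates first,
// then let MMR diversify only what can actually be returned.
pub fn filter_then_mmr<F>(
    candidates: Vec<Candidate>,
    threshold: f32,
    keep: F,
    limit: usize,
    lambda: f32,
) -> Vec<Candidate>
where
    F: Fn(&Candidate) -> bool,
{
    let pool: Vec<Candidate> = candidates
        .into_iter()
        .filter(|c| c.score >= threshold && keep(c))
        .collect();
    mmr(pool, limit, lambda)
}
```

With three candidates where the top-scoring one fails the filter and is a near-duplicate of a matching one, running MMR first spends a slot on the candidate that is about to be discarded and suppresses its matching near-duplicate, so fewer than `limit` results survive. Filtering first avoids that.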
When filters are active the Vectorize oversample doubles (limit*8 with
floor 20, was limit*4 floor 10) so a heavily-discriminating filter
still leaves enough candidates to fill `limit` and give MMR diversity
room. Ceiling stays at 100 for free-tier Vectorize friendliness.
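The oversample arithmetic described above, as a small sketch. The function name and signature are hypothetical; the multipliers, floors, and the ceiling of 100 are the values stated in this message.

```rust
// Hypothetical helper: how many candidates to request from Vectorize.
pub fn oversample_top_k(limit: usize, filters_active: bool) -> usize {
    // Filtered path: limit*8 with floor 20. Unfiltered path: limit*4 with floor 10.
    let (multiplier, floor) = if filters_active { (8, 20) } else { (4, 10) };
    // Ceiling of 100 applies to both paths.
    limit.saturating_mul(multiplier).clamp(floor, 100)
}
```

For `limit = 5` this requests 40 candidates with filters active and 20 without; for `limit = 50` both paths hit the ceiling of 100.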
Filter logic lives on `Memory::matches_filter` in memory.rs so it
compiles + tests on native (worker_mcp.rs is wasm-gated). Seven unit
tests cover each filter independently, the no-op empty case, and
filter composition.
Header line surfaces active filters back to the model so the result
context is self-describing.
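A sketch of how such a header fragment could be built. The function name and parameters are hypothetical; the output format mirrors the `Filters: entity=chopper, type=semantic, tags=[walk,morning]` example in the PR description, and returning `None` for an empty filter set is an assumption about how the no-filter header stays unchanged.

```rust
// Hypothetical helper: render the active filters for the result header.
pub fn filter_summary(
    entity: Option<&str>,
    memory_type: Option<&str>,
    tags: &[&str],
) -> Option<String> {
    let mut parts: Vec<String> = Vec::new();
    if let Some(e) = entity {
        parts.push(format!("entity={e}"));
    }
    if let Some(t) = memory_type {
        parts.push(format!("type={t}"));
    }
    if !tags.is_empty() {
        parts.push(format!("tags=[{}]", tags.join(",")));
    }
    // No active filters: nothing to append to the header.
    if parts.is_empty() {
        None
    } else {
        Some(format!("Filters: {}", parts.join(", ")))
    }
}
```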
📝 Walkthrough
Adds `Memory::matches_filter` (entity/type/tags) with tests, extends the MCP `recall_check` schema/args to accept metadata filters, and changes the recall flow to pre-filter Vectorize/D1 candidates before MMR reranking when filters are active.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant ToolRecallCheck
participant Vectorize
participant D1
participant MMR
participant MemoryStore
Client->>ToolRecallCheck: recall_check(topic, filters?)
ToolRecallCheck->>Vectorize: vectorize(topic, oversample?)
Vectorize-->>ToolRecallCheck: candidate vectors
ToolRecallCheck->>D1: fetch above-threshold candidates
D1-->>ToolRecallCheck: candidate IDs + metadata
ToolRecallCheck->>ToolRecallCheck: apply metadata filters (entity/type/tags)
ToolRecallCheck->>MMR: rerank(filtered IDs)
MMR-->>ToolRecallCheck: top-ranked IDs
ToolRecallCheck->>MemoryStore: fetch memories for IDs
MemoryStore-->>ToolRecallCheck: memories
ToolRecallCheck-->>Client: results (with header summarizing active filters)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
…ype-tag-filtering
Summary
`recall_check` gains three optional metadata filters:
* `entity`: exact-match (case-sensitive). Answers "what do I know about Chopper" by scoping to memories tagged with `entity=chopper`, ranked by semantic match within that subset.
* `memory_type`: `episodic` / `semantic` / `orientation`. Validated against `MemoryType::from_str` so a bad value errors cleanly instead of silently returning no results.
* `tags`: array; memory must include at least one of the listed tags (any-of, per the ticket).

Filters compose with AND across fields, OR within the tag list. Empty filter set is numerically identical to pre-CLA-108 behaviour.
Implementation notes
* Filter logic lives on `Memory::matches_filter` in `memory.rs` so it compiles + tests on native (`worker_mcp.rs` is wasm-gated and not unit-testable directly).
* Header line surfaces active filters, e.g. `... | Filters: entity=chopper, type=semantic, tags=[walk,morning]`, so the result context is self-describing.
* Establishes the filterable-recall surface CLA-109 (hybrid retrieval) will inherit.
Test plan
* `cargo test`: 141 pass (was 127; +7 new tests for `Memory::matches_filter`, each counted in two test binaries because `memory.rs` is shared)
* `cargo check --target wasm32-unknown-unknown --lib` clean
* `worker-build --release` produces wasm bundle (28KB)

Closes CLA-108.