feat(recall_check): entity / memory_type / tag filtering (CLA-108) by JuzzyDee · Pull Request #38 · JuzzyDee/oneiro

JuzzyDee · 2026-05-22T23:57:35Z

Summary

recall_check gains three optional metadata filters:

entity — exact-match (case-sensitive). Answers "what do I know about Chopper" by scoping to memories tagged with entity=chopper, ranked by semantic match within that subset.
memory_type — episodic / semantic / orientation. Validated against MemoryType::from_str so a bad value errors cleanly instead of silently returning no results.
tags — array; memory must include at least one of the listed tags (any-of, per the ticket).

Filters compose with AND across fields, OR within the tag list. Empty filter set is numerically identical to pre-CLA-108 behaviour.
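The composition rule above (AND across fields, any-of within tags, unset filters are no-ops) can be sketched as follows. This is a minimal illustration, not the code from memory.rs: the `Memory` and `MemoryType` definitions here are hypothetical stand-ins, and the parameter shape of `matches_filter` is an assumption.

```rust
/// Hypothetical stand-in for the project's memory type enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemoryType {
    Episodic,
    Semantic,
    Orientation,
}

/// Hypothetical stand-in for the project's Memory struct (only filterable fields).
#[derive(Debug, Clone)]
pub struct Memory {
    pub entity: Option<String>,
    pub memory_type: MemoryType,
    pub tags: Vec<String>,
}

impl Memory {
    /// Returns true when the memory satisfies every active filter.
    /// `None` (or an empty tag slice) means "this filter is not set".
    pub fn matches_filter(
        &self,
        entity: Option<&str>,
        memory_type: Option<MemoryType>,
        tags: Option<&[String]>,
    ) -> bool {
        // entity: exact, case-sensitive; a memory without an entity never matches.
        if let Some(wanted) = entity {
            if self.entity.as_deref() != Some(wanted) {
                return false;
            }
        }
        // memory_type: exact match on the enum variant.
        if let Some(wanted) = memory_type {
            if self.memory_type != wanted {
                return false;
            }
        }
        // tags: any-of; an empty list is treated as "no tag filter".
        if let Some(wanted) = tags {
            if !wanted.is_empty() && !wanted.iter().any(|t| self.tags.contains(t)) {
                return false;
            }
        }
        // Every active filter passed (AND across fields).
        true
    }
}
```

With no filters set, the function returns true for every memory, which is what keeps the empty filter set equivalent to the unfiltered path.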

Implementation notes

Filters apply post-Vectorize against D1 rows. The current Vectorize index doesn't store metadata, and pushing filters down would require re-upserting every existing vector + a schema migration — that's CLA-109 territory when the storage layer is already being touched.
No-filter path stays lean: MMR-rerank then small D1 fetch (existing shape, no perf delta).
Filtered path fetches all above-threshold candidates from D1 up front so the metadata filter runs before MMR — otherwise MMR would diversify candidates about to be thrown away.
Oversample doubles when filters are active (limit*8, floor 20; was limit*4, floor 10). Ceiling stays at 100 for free-tier Vectorize friendliness. A 90%-discriminating filter on a 10-deep oversample would leave ~1 survivor; doubling keeps the pool deep enough.
Filter logic lives on Memory::matches_filter in memory.rs so it compiles + tests on native (worker_mcp.rs is wasm-gated and not unit-testable directly).
Header line surfaces active filters back to the model: ... | Filters: entity=chopper, type=semantic, tags=[walk,morning] — result context is self-describing.
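The oversample arithmetic in the notes above can be sketched as a small helper. The function name `oversample` and its signature are hypothetical; only the multipliers, floors, and ceiling come from the description.

```rust
/// Hypothetical helper: how many candidates to request from Vectorize.
/// Multipliers, floors, and the ceiling follow the PR description;
/// the name and signature are assumptions for illustration.
pub fn oversample(limit: usize, filters_active: bool) -> usize {
    // Ceiling kept at 100 for free-tier Vectorize friendliness.
    const CEILING: usize = 100;
    // Filtered path doubles both the multiplier and the floor.
    let (multiplier, floor) = if filters_active { (8, 20) } else { (4, 10) };
    limit.saturating_mul(multiplier).clamp(floor, CEILING)
}
```

For a limit of 1, this requests 10 candidates unfiltered and 20 filtered, so a filter that discards 90% of candidates leaves roughly two survivors rather than one.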

Establishes the filterable-recall surface CLA-109 (hybrid retrieval) will inherit.

Test plan

cargo test — 141 pass (was 127; +7 new tests for Memory::matches_filter, each counted in two test binaries because memory.rs is shared)
cargo check --target wasm32-unknown-unknown --lib clean
worker-build --release produces wasm bundle (28KB)
Unit tests cover: empty filter (no-op), entity exact-match, entity rejects when memory.entity is None, memory_type, tags any-of, tags-empty skip, filter composition with AND semantics
Post-merge smoke test against live worker — exercise each filter against the restored memory set (entity=chopper, memory_type=semantic, tags=[rover])

Closes CLA-108.

Summary by CodeRabbit

New Features
- Added optional filters for memory recall: entity, memory type, and tags for more targeted results.
Improvements
- Recall flow now detects active filters and adjusts retrieval and reranking to prioritize filtered candidates; returns a clear message when no memories match.
- Output header shows a summary of active filters for easier inspection.
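One way the filter summary in the output header could be rendered is sketched below. The function `render_filters` is hypothetical; the output format mirrors the example header given in the implementation notes (`Filters: entity=chopper, type=semantic, tags=[walk,morning]`).

```rust
/// Hypothetical helper: builds the "Filters: ..." header fragment.
/// Returns None when no filter is active, so the header stays unchanged
/// on the unfiltered path.
pub fn render_filters(
    entity: Option<&str>,
    memory_type: Option<&str>,
    tags: &[String],
) -> Option<String> {
    let mut parts: Vec<String> = Vec::new();
    if let Some(e) = entity {
        parts.push(format!("entity={e}"));
    }
    if let Some(t) = memory_type {
        parts.push(format!("type={t}"));
    }
    // An empty tag list counts as "no tag filter" and is not rendered.
    if !tags.is_empty() {
        parts.push(format!("tags=[{}]", tags.join(",")));
    }
    if parts.is_empty() {
        None
    } else {
        Some(format!("Filters: {}", parts.join(", ")))
    }
}
```

Returning `Option<String>` lets the caller append the fragment to the header only when at least one filter is set.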

`recall_check` now accepts three optional filter fields alongside the existing topic + similarity threshold:

* `entity`: exact-match (case-sensitive) on the memory's entity.
* `memory_type`: one of episodic / semantic / orientation; validated against MemoryType::from_str so a bad value errors cleanly instead of silently returning no results.
* `tags`: array; memory must include at least one of the listed tags (any-of, not all-of, per the ticket).

Filters compose with AND across fields, OR within the tag list. Empty filter set behaves identically to pre-CLA-108 recall_check.

Implementation: filters apply post-Vectorize against D1 rows. The current Vectorize index doesn't store metadata, and adding it would require re-upserting every existing vector + a schema change; that work belongs with CLA-109's hybrid retrieval, when the storage layer is being touched anyway. For now the no-filter path keeps the existing lean shape (MMR-rerank then small D1 fetch); the filtered path fetches all above-threshold candidates from D1 up front so the metadata filter runs before MMR, since otherwise MMR would diversify candidates we're about to throw away.

When filters are active the Vectorize oversample doubles (limit*8 with floor 20, was limit*4 floor 10) so a heavily-discriminating filter still leaves enough candidates to fill `limit` and give MMR diversity room. Ceiling stays at 100 for free-tier Vectorize friendliness.

Filter logic lives on `Memory::matches_filter` in memory.rs so it compiles + tests on native (worker_mcp.rs is wasm-gated). Seven unit tests cover each filter independently, the no-op empty case, and filter composition.

Header line surfaces active filters back to the model so the result context is self-describing.

coderabbitai · 2026-05-22T23:57:45Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 7d5459bd-cd9a-499a-8d1d-80d8577ef400

📥 Commits

Reviewing files that changed from the base of the PR and between 4405495 and 5562438.

📒 Files selected for processing (1)

src/worker_mcp.rs

📝 Walkthrough

Adds Memory::matches_filter (entity/type/tags) with tests, extends the MCP recall_check schema/args to accept metadata filters, and changes the recall flow to pre-filter Vectorize/D1 candidates before MMR reranking when filters are active.

Changes

Memory Metadata Filtering

Layer / File(s)	Summary
Filter contract and core implementation `src/memory.rs`	Adds `Memory::matches_filter` implementing exact entity and memory_type matching and any-of tag matching; includes unit tests for no-filter pass-through, entity None rejection, type filtering, tag semantics, empty-tag skip, and AND composition.
MCP tool schema and arguments `src/worker_mcp.rs`	Expands `recall_check` tool listing and `inputSchema` with optional `entity`, `memory_type`, and `tags` and adds corresponding optional fields to `RecallCheckArgs`.
Recall tool handler with filtering orchestration `src/worker_mcp.rs`	Updates `tool_recall_check` to validate `memory_type`, detect active filters, increase Vectorize oversampling when filtering, fetch all above-threshold D1 candidates, apply metadata filters before MMR reranking, return an explicit message if no candidates match filters, perform MMR reranking on filtered IDs, and conditionally render active filters in the response header.
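The `memory_type` validation step in the handler can be sketched with the standard `FromStr` trait. This is an assumption-laden illustration: the real `MemoryType::from_str` lives in the project, and the accepted spellings (lowercase only), the error type, and the helper name `parse_memory_type_filter` are all hypothetical.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the project's memory type enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemoryType {
    Episodic,
    Semantic,
    Orientation,
}

impl FromStr for MemoryType {
    type Err = String;

    // Assumption: only the lowercase names are accepted.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "episodic" => Ok(MemoryType::Episodic),
            "semantic" => Ok(MemoryType::Semantic),
            "orientation" => Ok(MemoryType::Orientation),
            other => Err(format!(
                "unknown memory_type '{other}' (expected episodic, semantic, or orientation)"
            )),
        }
    }
}

/// Hypothetical helper: an absent filter stays `None`; a present but invalid
/// value becomes an error instead of a filter that silently matches nothing.
pub fn parse_memory_type_filter(raw: Option<&str>) -> Result<Option<MemoryType>, String> {
    raw.map(MemoryType::from_str).transpose()
}
```

Parsing up front means a typo such as `semantik` surfaces as an error to the caller rather than as an empty result set.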

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant ToolRecallCheck
  participant Vectorize
  participant D1
  participant MMR
  participant MemoryStore
  Client->>ToolRecallCheck: recall_check(topic, filters?)
  ToolRecallCheck->>Vectorize: vectorize(topic, oversample?)
  Vectorize-->>ToolRecallCheck: candidate vectors
  ToolRecallCheck->>D1: fetch above-threshold candidates
  D1-->>ToolRecallCheck: candidate IDs + metadata
  ToolRecallCheck->>ToolRecallCheck: apply metadata filters (entity/type/tags)
  ToolRecallCheck->>MMR: rerank(filtered IDs)
  MMR-->>ToolRecallCheck: top-ranked IDs
  ToolRecallCheck->>MemoryStore: fetch memories for IDs
  MemoryStore-->>ToolRecallCheck: memories
  ToolRecallCheck-->>Client: results (with header summarizing active filters)
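The ordering in the diagram (metadata filter first, MMR second) can be illustrated with a toy greedy MMR. Everything here is a sketch under stated assumptions: the `Candidate` struct, the `mmr` and `recall_filtered` functions, the cosine-based redundancy term, and the lambda value of 0.7 are hypothetical and not taken from worker_mcp.rs.

```rust
/// Hypothetical candidate row: query relevance plus the fields needed here.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub id: u32,
    pub relevance: f32,       // similarity to the query from vector search
    pub embedding: Vec<f32>,
    pub entity: Option<String>,
}

/// Cosine similarity; returns 0.0 if either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Greedy MMR: repeatedly pick the candidate maximising
/// lambda * relevance - (1 - lambda) * (max similarity to already-picked items).
pub fn mmr(candidates: &[Candidate], limit: usize, lambda: f32) -> Vec<u32> {
    let mut remaining: Vec<&Candidate> = candidates.iter().collect();
    let mut picked: Vec<&Candidate> = Vec::new();
    while picked.len() < limit && !remaining.is_empty() {
        let mut best_idx = 0;
        let mut best_score = f32::NEG_INFINITY;
        for (i, c) in remaining.iter().enumerate() {
            // Redundancy penalty; 0.0 when nothing has been picked yet.
            let max_sim = picked
                .iter()
                .map(|p| cosine(&c.embedding, &p.embedding))
                .fold(0.0_f32, f32::max);
            let score = lambda * c.relevance - (1.0 - lambda) * max_sim;
            if score > best_score {
                best_score = score;
                best_idx = i;
            }
        }
        picked.push(remaining.remove(best_idx));
    }
    picked.iter().map(|c| c.id).collect()
}

/// Filter by entity first, then rerank only the survivors.
pub fn recall_filtered(candidates: Vec<Candidate>, entity: &str, limit: usize) -> Vec<u32> {
    let survivors: Vec<Candidate> = candidates
        .into_iter()
        .filter(|c| c.entity.as_deref() == Some(entity))
        .collect();
    mmr(&survivors, limit, 0.7) // lambda = 0.7 is an arbitrary choice for this sketch
}
```

If MMR ran first, a highly relevant candidate that fails the filter could take one of the `limit` slots and be discarded afterwards, leaving fewer results than requested; filtering first lets every slot go to a candidate that can actually be returned.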

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hop through memories, sniffing tags and names,
Entity, type, and tags—I check their claims.
Before ranking dances, I filter the nest,
Then MMR chooses the very best. ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat(recall_check): entity / memory_type / tag filtering (CLA-108)' clearly and concisely summarizes the main changes: adding three optional metadata filters to the recall_check feature.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


…ype-tag-filtering

Merge branch 'dev' into juzzydee/cla-108-recall_check-entity-memory_t…

5562438

…ype-tag-filtering

JuzzyDee merged commit c4d8423 into dev May 23, 2026
6 checks passed

JuzzyDee deleted the juzzydee/cla-108-recall_check-entity-memory_type-tag-filtering branch May 23, 2026 00:35

coderabbitai Bot mentioned this pull request May 23, 2026

feat(recall_check): hybrid retrieval — FTS5 + Vectorize + RRF fusion (CLA-109) #39

Merged

5 tasks

JuzzyDee merged 2 commits into dev from juzzydee/cla-108-recall_check-entity-memory_type-tag-filtering

JuzzyDee commented May 22, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 22, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

JuzzyDee commented May 22, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Implementation notes

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

JuzzyDee commented May 22, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 22, 2026 •

edited

Loading