feat(recall_check): entity / memory_type / tag filtering (CLA-108)#38

Merged
JuzzyDee merged 2 commits into dev from juzzydee/cla-108-recall_check-entity-memory_type-tag-filtering on May 23, 2026

@JuzzyDee (Owner) commented May 22, 2026

Summary

recall_check gains three optional metadata filters:

  • entity: exact-match (case-sensitive). Answers "what do I know about Chopper" by scoping to memories tagged with entity=chopper, ranked by semantic match within that subset.
  • memory_type: episodic / semantic / orientation. Validated against MemoryType::from_str so a bad value errors cleanly instead of silently returning no results.
  • tags: array; memory must include at least one of the listed tags (any-of, per the ticket).

Filters compose with AND across fields, OR within the tag list. Empty filter set is numerically identical to pre-CLA-108 behaviour.
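
A minimal sketch of the AND-across-fields, OR-within-tags semantics described above. The struct layout and the method signature are assumptions for illustration only; the real `Memory::matches_filter` in memory.rs is not shown in this PR description and may differ.

```rust
// Hypothetical shapes: stand-ins for the real types in memory.rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemoryType {
    Episodic,
    Semantic,
    Orientation,
}

pub struct Memory {
    pub entity: Option<String>,
    pub memory_type: MemoryType,
    pub tags: Vec<String>,
}

impl Memory {
    /// Returns true when the memory satisfies every active filter.
    /// `None` (or an empty tag slice) means that filter is inactive.
    pub fn matches_filter(
        &self,
        entity: Option<&str>,
        memory_type: Option<MemoryType>,
        tags: Option<&[String]>,
    ) -> bool {
        // entity: exact, case-sensitive; a memory with no entity never matches.
        if let Some(want) = entity {
            if self.entity.as_deref() != Some(want) {
                return false;
            }
        }
        // memory_type: exact match on the enum variant.
        if let Some(want) = memory_type {
            if self.memory_type != want {
                return false;
            }
        }
        // tags: any-of; an empty list is treated as "no tag filter".
        if let Some(wanted) = tags {
            if !wanted.is_empty() && !wanted.iter().any(|t| self.tags.contains(t)) {
                return false;
            }
        }
        true
    }
}
```

Calling `matches_filter(None, None, None)` accepts every memory, which is the no-op case the empty filter set relies on.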

Implementation notes

  • Filters apply post-Vectorize against D1 rows. The current Vectorize index doesn't store metadata, and pushing filters down would require re-upserting every existing vector + a schema migration — that's CLA-109 territory when the storage layer is already being touched.
  • No-filter path stays lean: MMR-rerank then small D1 fetch (existing shape, no perf delta).
  • Filtered path fetches all above-threshold candidates from D1 up front so the metadata filter runs before MMR — otherwise MMR would diversify candidates about to be thrown away.
  • Oversample doubles when filters are active (limit*8, floor 20; was limit*4, floor 10). Ceiling stays at 100 for free-tier Vectorize friendliness. A 90%-discriminating filter on a 10-deep oversample would leave ~1 survivor; doubling keeps the pool deep enough.
  • Filter logic lives on Memory::matches_filter in memory.rs so it compiles + tests on native (worker_mcp.rs is wasm-gated and not unit-testable directly).
  • Header line surfaces active filters back to the model: ... | Filters: entity=chopper, type=semantic, tags=[walk,morning] — result context is self-describing.
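
The oversample sizing in the notes above can be sketched as a small pure function. The multipliers, floors, and the ceiling of 100 come from the notes; the function name and signature are hypothetical, and applying the floor before the ceiling is an assumption.

```rust
/// Number of candidates to request from the vector index.
/// Hypothetical helper: the handler in worker_mcp.rs may compute this inline.
pub fn oversample(limit: usize, filters_active: bool) -> usize {
    // Filtered recall doubles both the multiplier and the floor.
    let (multiplier, floor) = if filters_active { (8, 20) } else { (4, 10) };
    // Apply the floor first, then cap at the free-tier-friendly ceiling.
    limit.saturating_mul(multiplier).max(floor).min(100)
}
```

With `limit = 5` this requests 20 candidates unfiltered and 40 filtered, so a filter that rejects 90% of candidates leaves about 4 survivors instead of about 2.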

Establishes the filterable-recall surface CLA-109 (hybrid retrieval) will inherit.

Test plan

  • cargo test — 141 pass (was 127; +7 new tests for Memory::matches_filter, each counted in two test binaries because memory.rs is shared)
  • cargo check --target wasm32-unknown-unknown --lib clean
  • worker-build --release produces wasm bundle (28KB)
  • Unit tests cover: empty filter (no-op), entity exact-match, entity rejects when memory.entity is None, memory_type, tags any-of, tags-empty skip, filter composition with AND semantics
  • Post-merge smoke test against live worker — exercise each filter against the restored memory set (entity=chopper, memory_type=semantic, tags=[rover])

Closes CLA-108.

Summary by CodeRabbit

  • New Features

    • Added optional filters for memory recall: entity, memory type, and tags for more targeted results.
  • Improvements

    • Recall flow now detects active filters and adjusts retrieval and reranking to prioritize filtered candidates; returns a clear message when no memories match.
    • Output header shows a summary of active filters for easier inspection.

`recall_check` now accepts three optional filter fields alongside the
existing topic + similarity threshold:

  * `entity` — exact-match (case-sensitive) on the memory's entity.
  * `memory_type` — one of episodic / semantic / orientation; validated
    against MemoryType::from_str so a bad value errors cleanly instead
    of silently returning no results.
  * `tags` — array; memory must include at least one of the listed
    tags (any-of, not all-of, per the ticket).

Filters compose with AND across fields, OR within the tag list. Empty
filter set behaves identically to pre-CLA-108 recall_check.
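
To illustrate why validating `memory_type` up front matters, here is a sketch of a `FromStr` implementation and a helper that turns a bad value into an error instead of an always-empty result. The enum definition, the error type, the message text, and the helper name `parse_memory_type_filter` are all assumptions; only the three accepted values come from this PR.

```rust
use std::str::FromStr;

// Hypothetical stand-in for the real enum in memory.rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemoryType {
    Episodic,
    Semantic,
    Orientation,
}

impl FromStr for MemoryType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "episodic" => Ok(MemoryType::Episodic),
            "semantic" => Ok(MemoryType::Semantic),
            "orientation" => Ok(MemoryType::Orientation),
            other => Err(format!(
                "unknown memory_type '{other}': expected episodic, semantic, or orientation"
            )),
        }
    }
}

/// Validate an optional filter value before any retrieval happens.
/// `None` stays `None`; a present but invalid value becomes an `Err`.
pub fn parse_memory_type_filter(raw: Option<&str>) -> Result<Option<MemoryType>, String> {
    raw.map(MemoryType::from_str).transpose()
}
```

A typo such as `"semantc"` therefore fails fast with a readable message, rather than passing through as a filter that can never match.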

Implementation: filters apply post-Vectorize against D1 rows. The
current Vectorize index doesn't store metadata, and adding it would
require re-upserting every existing vector + a schema change — that
work belongs with CLA-109's hybrid retrieval, when the storage layer
is being touched anyway. For now the no-filter path keeps the existing
lean shape (MMR-rerank then small D1 fetch); the filtered path fetches
all above-threshold candidates from D1 up front so the metadata filter
runs before MMR — otherwise MMR would diversify candidates we're about
to throw away.
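
The ordering on the filtered path (threshold, then metadata filter, then MMR) can be sketched as follows. This is an illustration under stated assumptions, not the code in worker_mcp.rs: the `Candidate` struct, the greedy MMR formulation, the cosine helper, and the `lambda` value of 0.7 are all hypothetical.

```rust
/// Hypothetical candidate: id, similarity to the query, and an embedding
/// used for the diversity term.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub id: u32,
    pub score: f32,
    pub embedding: Vec<f32>,
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Greedy MMR: repeatedly pick the candidate maximising
/// lambda * relevance - (1 - lambda) * (max similarity to items already picked).
pub fn mmr(mut pool: Vec<Candidate>, limit: usize, lambda: f32) -> Vec<Candidate> {
    let mut picked: Vec<Candidate> = Vec::new();
    while picked.len() < limit && !pool.is_empty() {
        let mut best_idx = 0;
        let mut best_val = f32::NEG_INFINITY;
        for (i, c) in pool.iter().enumerate() {
            // Redundancy is 0 when nothing is picked yet; negative cosines clamp to 0.
            let redundancy = picked
                .iter()
                .map(|p| cosine(&c.embedding, &p.embedding))
                .fold(0.0_f32, f32::max);
            let val = lambda * c.score - (1.0 - lambda) * redundancy;
            if val > best_val {
                best_val = val;
                best_idx = i;
            }
        }
        picked.push(pool.swap_remove(best_idx));
    }
    picked
}

/// Filtered path: drop below-threshold and non-matching candidates first,
/// so MMR only spends its `limit` slots on memories that can be returned.
pub fn filtered_recall<F>(
    candidates: Vec<Candidate>,
    threshold: f32,
    limit: usize,
    keep: F,
) -> Vec<Candidate>
where
    F: Fn(&Candidate) -> bool,
{
    let survivors: Vec<Candidate> = candidates
        .into_iter()
        .filter(|c| c.score >= threshold && keep(c))
        .collect();
    mmr(survivors, limit, 0.7)
}
```

Running MMR first and filtering afterwards could return fewer than `limit` results even when enough matching memories exist, because some of the diversified picks would be discarded by the filter.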

When filters are active the Vectorize oversample doubles (limit*8 with
floor 20, was limit*4 floor 10) so a heavily-discriminating filter
still leaves enough candidates to fill `limit` and give MMR diversity
room. Ceiling stays at 100 for free-tier Vectorize friendliness.

Filter logic lives on `Memory::matches_filter` in memory.rs so it
compiles + tests on native (worker_mcp.rs is wasm-gated). Seven unit
tests cover each filter independently, the no-op empty case, and
filter composition.

Header line surfaces active filters back to the model so the result
context is self-describing.
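
A sketch of how the filter summary in the header line might be rendered. The output format follows the example given in the PR description (`Filters: entity=chopper, type=semantic, tags=[walk,morning]`); the function name, its signature, and returning `None` when no filter is active are assumptions.

```rust
/// Render the "Filters: ..." header fragment, or `None` when no filter is active.
/// Hypothetical helper: the real handler may build this string inline.
pub fn filter_summary(
    entity: Option<&str>,
    memory_type: Option<&str>,
    tags: &[String],
) -> Option<String> {
    let mut parts: Vec<String> = Vec::new();
    if let Some(e) = entity {
        parts.push(format!("entity={e}"));
    }
    if let Some(t) = memory_type {
        parts.push(format!("type={t}"));
    }
    // An empty tag list is treated as "no tag filter", so it is not rendered.
    if !tags.is_empty() {
        parts.push(format!("tags=[{}]", tags.join(",")));
    }
    if parts.is_empty() {
        None
    } else {
        Some(format!("Filters: {}", parts.join(", ")))
    }
}
```

Returning `None` for the unfiltered case keeps the header free of an empty `Filters:` fragment.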

coderabbitai Bot commented May 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 7d5459bd-cd9a-499a-8d1d-80d8577ef400

📥 Commits

Reviewing files that changed from the base of the PR and between 4405495 and 5562438.

📒 Files selected for processing (1)
  • src/worker_mcp.rs

📝 Walkthrough

Adds Memory::matches_filter (entity/type/tags) with tests, extends the MCP recall_check schema/args to accept metadata filters, and changes the recall flow to pre-filter Vectorize/D1 candidates before MMR reranking when filters are active.

Changes

Memory Metadata Filtering

| Layer / File(s) | Summary |
| --- | --- |
| Filter contract and core implementation: src/memory.rs | Adds Memory::matches_filter implementing exact entity and memory_type matching and any-of tag matching; includes unit tests for no-filter pass-through, entity None rejection, type filtering, tag semantics, empty-tag skip, and AND composition. |
| MCP tool schema and arguments: src/worker_mcp.rs | Expands recall_check tool listing and inputSchema with optional entity, memory_type, and tags and adds corresponding optional fields to RecallCheckArgs. |
| Recall tool handler with filtering orchestration: src/worker_mcp.rs | Updates tool_recall_check to validate memory_type, detect active filters, increase Vectorize oversampling when filtering, fetch all above-threshold D1 candidates, apply metadata filters before MMR reranking, return an explicit message if no candidates match filters, perform MMR reranking on filtered IDs, and conditionally render active filters in the response header. |

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant ToolRecallCheck
  participant Vectorize
  participant D1
  participant MMR
  participant MemoryStore
  Client->>ToolRecallCheck: recall_check(topic, filters?)
  ToolRecallCheck->>Vectorize: vectorize(topic, oversample?)
  Vectorize-->>ToolRecallCheck: candidate vectors
  ToolRecallCheck->>D1: fetch above-threshold candidates
  D1-->>ToolRecallCheck: candidate IDs + metadata
  ToolRecallCheck->>ToolRecallCheck: apply metadata filters (entity/type/tags)
  ToolRecallCheck->>MMR: rerank(filtered IDs)
  MMR-->>ToolRecallCheck: top-ranked IDs
  ToolRecallCheck->>MemoryStore: fetch memories for IDs
  MemoryStore-->>ToolRecallCheck: memories
  ToolRecallCheck-->>Client: results (with header summarizing active filters)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hop through memories, sniffing tags and names,
Entity, type, and tags—I check their claims.
Before ranking dances, I filter the nest,
Then MMR chooses the very best. ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(recall_check): entity / memory_type / tag filtering (CLA-108)' clearly and concisely summarizes the main changes: adding three optional metadata filters to the recall_check feature. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@JuzzyDee JuzzyDee merged commit c4d8423 into dev May 23, 2026
6 checks passed
@JuzzyDee JuzzyDee deleted the juzzydee/cla-108-recall_check-entity-memory_type-tag-filtering branch May 23, 2026 00:35