Skip to content

DEV-1414: per-bucket ranking invariance in search()#127

Merged
ZmeiGorynych merged 3 commits into
mainfrom
egor/dev-1414-memory-ranking-in-search-depends-on-max_entities
May 14, 2026
Merged

DEV-1414: per-bucket ranking invariance in search()#127
ZmeiGorynych merged 3 commits into
mainfrom
egor/dev-1414-memory-ranking-in-search-depends-on-max_entities

Conversation

@ZmeiGorynych
Copy link
Copy Markdown
Member

@ZmeiGorynych ZmeiGorynych commented May 14, 2026

Summary

  • Memory / example_query / entity rankings in search() are now invariant under the other two caps — for fixed (question, datasource, max_X), the X bucket's id list and order is a pure function of the corpus + question + that one cap. See DEV-1414.
  • Root cause: over_fetch_budget = max(max_memories + max_example_queries, max_entities) * 5 in slayer/search/service.py fed one shared candidate-pool limit to all three channels. Channels 2 (tantivy) and 3 (cosine) produced mixed memory+entity hits, then split them by kind — so changing any cap reshuffled the bottom cliff of each channel and rippled into the top of every output bucket via RRF.
  • Fix: per-kind channel queries with no over-fetch truncation. Each channel ranks the full per-kind corpus; the max_* caps become pure post-fusion slice ops on three independent ranked lists.

Changes

  • slayer/search/index.pysearch_index accepts new kind_filter / exclude_kind keyword-only params (mutex), built via tantivy.Query.boolean_query + term_query. build_in_memory_corpus uses writer(num_threads=1) so doc-id tiebreak on equal BM25 scores is deterministic across rebuilds (the latent non-determinism the new invariance tests surfaced).
  • slayer/search/service.py — drops _OVER_FETCH_MULTIPLIER and over_fetch_budget. Channel 1 stops truncating. Channels 2 and 3 each run twice (once per kind, limit = full per-kind corpus size).
  • tests/test_search_invariance.py (new) — 9 tests covering all three buckets × two cross-cap variations, the exact DEV-1414 A/B/C repro tuples, and channel-3-active variants.
  • tests/test_search_index.py — 7 new tests pinning kind_filter / exclude_kind semantics.
  • docs/concepts/search.md, CLAUDE.md, .claude/skills/slayer-overview.md — describe the per-bucket invariance contract.

Test plan

  • poetry run pytest -m "not integration" — 2628 passed, 0 failed.
  • poetry run ruff check slayer/ tests/ — clean.
  • DEV-1414 repro tuples (A: (10, 10, 5), B: (10, 0, 0), C: (15, 5, 2)) yield identical top-10 memory ids.
  • Channel 3 active path covered with a deterministic embedding stub.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Search can be filtered to return only memories, example queries, or other entities; mutually-exclusive include/exclude filters are enforced.
  • Documentation

    • Clarified multi-channel search behavior, per-bucket ranking invariants, embedding-based partitioning, and that embedding search requires an extra and degrades gracefully when unavailable.
  • Bug Fixes

    • In-memory search rebuilds now yield deterministic results.
  • Tests

    • Added regression and unit tests for kind-filtering and per-bucket ranking invariance (including embedding-path cases).

Review Change Stack

Memory/example_query/entity rankings now depend only on the corpus,
question, datasource, and that bucket's own cap. Previously a shared
``over_fetch_budget = max(max_memories + max_example_queries,
max_entities) * 5`` fed all three channels, so changing any one cap
reshuffled the bottom cliff of channels 2 and 3 (mixed memory+entity
hits) and rippled into the top of every output bucket.

Fix:
- search_index gains kind_filter / exclude_kind keyword-only params
  (mutex), built via tantivy.Query.boolean_query + term_query.
- build_in_memory_corpus uses writer(num_threads=1) so tantivy doc-id
  tiebreak on equal BM25 scores is deterministic across rebuilds.
- service.py drops _OVER_FETCH_MULTIPLIER and over_fetch_budget;
  channels 2 and 3 each run twice (once per kind, with limit = full
  per-kind corpus size), channel 1 stops truncating.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6946ddfd-eed9-4da2-96b7-e8e8e8a10f82

📥 Commits

Reviewing files that changed from the base of the PR and between 8acc562 and ec7223a.

📒 Files selected for processing (3)
  • slayer/search/index.py
  • slayer/search/service.py
  • tests/test_search_invariance.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • slayer/search/index.py
  • tests/test_search_invariance.py
  • slayer/search/service.py

📝 Walkthrough

Walkthrough

Refactors SearchService and search_index to produce independent full per-kind rankings for memories, example_queries, and entities (removes shared over-fetch), adds kind-based filtering to search_index, pins in-memory index determinism, and introduces regression tests asserting per-bucket membership/order invariance (DEV-1414).

Changes

Semantic search per-bucket ranking invariance

Layer / File(s) Summary
Invariance documentation
.claude/skills/slayer-overview.md, CLAUDE.md, docs/concepts/search.md
Docs name the three search channels and add a Per-bucket ranking invariance section describing full per-kind rankings and RRF post-slicing.
Search index: deterministic build and kind filtering
slayer/search/index.py
build_in_memory_corpus() pins tantivy writer to num_threads=1. search_index() gains kind_filter and exclude_kind parameters with mutual-exclusivity validation and boolean-term combination into the text query.
Unit tests for search_index kind filtering
tests/test_search_index.py
Add tests for kind_filter / exclude_kind: memory-only, model-only, exclude-memory, mutual exclusivity, preserved score order, and limit clamping.
Service docstring, orchestration, and helpers
slayer/search/service.py
Update docstring for per-kind semantics; remove over-fetch constant/plumbing; add _count_corpus_kinds() and _memory_id_from_canonical(); make backfill O(N) via id->Memory dict and align orchestration call sites.
Channel 1: full BM25 memory ranking
slayer/search/service.py
Run full BM25 memory corpus ranking (no over-fetch slicing) and build memory id map and memory ranking from the complete list.
Channel 2: per-kind tantivy queries
slayer/search/service.py
Compute per-kind corpus sizes and run two kind-filtered tantivy searches (memory-only and non-memory) with limits equal to kind sizes; build memory and entity rankings separately.
Channel 3: partitioned embedding ranking and live-canonical alignment
slayer/search/service.py
Filter embedding rows to live canonical ids, validate dims, partition rows by entity_kind, compute per-partition normalized matrices and rank fully with k=len(partition); recover memory ids from canonical ids and produce entity ranking.
Regression tests for per-bucket ranking invariance
tests/test_search_invariance.py
New module seeds deterministic corpus and adds tests asserting membership/order invariance for memories, example_queries, and entities while varying non-target caps; includes embedding-path fixtures and DEV-1414 repro test.

Sequence Diagram

sequenceDiagram
  participant Client
  participant SearchService
  participant Tantivy
  participant EmbeddingStore
  participant RRF
  Client->>SearchService: search(question, datasource, caps)
  SearchService->>Tantivy: channel1 BM25 memory query (full)
  SearchService->>Tantivy: channel2 tantivy query(kind_filter='memory', limit=memory_count)
  SearchService->>Tantivy: channel2 tantivy query(exclude_kind='memory', limit=entity_count)
  SearchService->>EmbeddingStore: channel3 fetch embeddings (filter live canonical ids)
  EmbeddingStore-->>SearchService: partitioned embedding rows (memory / non-memory)
  SearchService->>RRF: RRF-fuse per-kind ranked lists
  RRF-->>SearchService: fused per-kind lists (post-slice by caps)
  SearchService-->>Client: ordered results by bucket
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • MotleyAI/slayer#112: Related prior work on the RRF-based semantic search architecture and channel pipeline that this change refactors for per-kind invariance.
  • MotleyAI/slayer#121: Related changes to the retrieval pipeline and embedding handling that overlap with channel partitioning and memory/entity separation.

"🐰
I nudge the corpora to stay in line,
Three channels hum, each their own design,
No shared budget ties,
Rankings keep their ties,
Per-kind truths hop out, precise and fine."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 52.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'DEV-1414: per-bucket ranking invariance in search()' directly and clearly summarizes the main change: fixing ranking invariance across memory, example_query, and entity buckets in the search function.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch egor/dev-1414-memory-ranking-in-search-depends-on-max_entities

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/test_search_invariance.py (1)

153-200: 💤 Low value

Consider simplifying the entity-mapping logic.

SonarCloud flags this function's cognitive complexity (18 > 15). The long if-elif chain (lines 167-184) could be refactored into a lookup dict to reduce complexity:

TOPIC_TO_ENTITIES = {
    "amount_paid": ["warehouse.orders.amount_paid"],
    "paid": ["warehouse.orders.amount_paid"],
    "revenue": ["warehouse.orders.amount_paid"],
    "email": ["warehouse.customers.email"],
    "anonymous": ["warehouse.customers.email"],
    # ... etc
}

def _get_entities_for_topic(topic: str) -> List[str]:
    for keyword, entities in TOPIC_TO_ENTITIES.items():
        if keyword in topic:
            return entities
    return ["warehouse"]

However, since this is test-data seeding with straightforward intent and the complexity is inherent to the diverse test corpus, this refactor is optional.

Based on learnings: static analysis tools like SonarCloud are notorious for false positives in test code, and private helper functions in fully-controlled repos don't require defensive complexity reduction unless there's a genuine maintainability concern.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_search_invariance.py` around lines 153 - 200, The if/elif chain
inside _seed_invariance_corpus increases cognitive complexity; extract the
mapping into a TOPIC_TO_ENTITIES dict and add a small helper
_get_entities_for_topic(topic: str) -> List[str] that iterates TOPIC_TO_ENTITIES
keys and returns the first matching entities or ["warehouse"] as a default, then
replace the long conditional block before storage.save_memory calls with a call
to _get_entities_for_topic; keep storage.save_memory, _LEARNING_TOPICS and other
logic unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_search_invariance.py`:
- Around line 153-200: The if/elif chain inside _seed_invariance_corpus
increases cognitive complexity; extract the mapping into a TOPIC_TO_ENTITIES
dict and add a small helper _get_entities_for_topic(topic: str) -> List[str]
that iterates TOPIC_TO_ENTITIES keys and returns the first matching entities or
["warehouse"] as a default, then replace the long conditional block before
storage.save_memory calls with a call to _get_entities_for_topic; keep
storage.save_memory, _LEARNING_TOPICS and other logic unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 74d8f871-2848-4dbf-80aa-c075b324f6b1

📥 Commits

Reviewing files that changed from the base of the PR and between 48ffe02 and 4fd1def.

📒 Files selected for processing (7)
  • .claude/skills/slayer-overview.md
  • CLAUDE.md
  • docs/concepts/search.md
  • slayer/search/index.py
  • slayer/search/service.py
  • tests/test_search_index.py
  • tests/test_search_invariance.py

…hash)

- HIGH _backfill_memory_by_id was O(N^2) under the new per-kind full
  rankings (linear scan of all_memories per id, 3x). Build a
  ``{m.id: m}`` dict once in ``SearchService.search`` and pass it in;
  per-call backfill is now O(N).
- MEDIUM channel 3 ranked every sidecar row, including stale ones
  (memories deleted from storage, hidden / removed entities still
  embedded). Those rows consumed cosine ranks and degraded live docs'
  RRF scores (still invariant under cap changes, but lossy). Filter
  rows against ``corpus.canonical_to_kind`` before partitioning so the
  embedding candidate set matches the live tantivy corpus.
- LOW invariance-test stub used Python ``hash()`` for deterministic
  vectors, which is randomised per interpreter. Switch to
  ``hashlib.sha256`` so the rankings are reproducible across runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/test_search_invariance.py (1)

154-189: ⚡ Quick win

Refactor topic classification out of _seed_invariance_corpus to clear the complexity gate.

Line 154 is currently flagged by Sonar (18 > 15). Extracting the topic→entities decision tree into a dedicated helper keeps this seeding routine linear and should satisfy the gate.

♻️ Proposed refactor
+def _entities_for_topic(topic: str) -> List[str]:
+    if "amount_paid" in topic or "paid" in topic or "revenue" in topic:
+        return ["warehouse.orders.amount_paid"]
+    if "email" in topic or "anonymous" in topic:
+        return ["warehouse.customers.email"]
+    if "ship" in topic or "warehouse" in topic:
+        return ["warehouse.warehouses"]
+    if "customer" in topic and "tier" in topic:
+        return ["warehouse.customers.customer_tier"]
+    if "customer" in topic:
+        return ["warehouse.customers"]
+    if "status" in topic:
+        return ["warehouse.orders.status"]
+    if "discount" in topic:
+        return ["warehouse.orders.discount_code"]
+    if "checkout" in topic or "fraud" in topic:
+        return ["warehouse.orders"]
+    return ["warehouse"]
+
 async def _seed_invariance_corpus(storage: StorageBackend) -> None:
@@
     for i, topic in enumerate(_LEARNING_TOPICS):
-        entities: List[str]
-        if "amount_paid" in topic or "paid" in topic or "revenue" in topic:
-            entities = ["warehouse.orders.amount_paid"]
-        elif "email" in topic or "anonymous" in topic:
-            entities = ["warehouse.customers.email"]
-        elif "ship" in topic or "warehouse" in topic:
-            entities = ["warehouse.warehouses"]
-        elif "customer" in topic and "tier" in topic:
-            entities = ["warehouse.customers.customer_tier"]
-        elif "customer" in topic:
-            entities = ["warehouse.customers"]
-        elif "status" in topic:
-            entities = ["warehouse.orders.status"]
-        elif "discount" in topic:
-            entities = ["warehouse.orders.discount_code"]
-        elif "checkout" in topic or "fraud" in topic:
-            entities = ["warehouse.orders"]
-        else:
-            entities = ["warehouse"]
+        entities = _entities_for_topic(topic)
         await storage.save_memory(
             learning=f"KB{i:02d}: {topic}.",
             entities=entities,
         )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_search_invariance.py` around lines 154 - 189, The
_seed_invariance_corpus function has a large topic→entities decision tree that
should be extracted into a helper to reduce complexity; create a new function
(e.g., map_topic_to_entities(topic: str) -> List[str] or _topic_entities(topic))
that implements the if/elif cascade and returns the entities list, then replace
the inline decision block in _seed_invariance_corpus with a single call to this
helper before calling storage.save_memory (keep the helper in the same module
and ensure its name is used where entities is assigned).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_search_invariance.py`:
- Around line 154-189: The _seed_invariance_corpus function has a large
topic→entities decision tree that should be extracted into a helper to reduce
complexity; create a new function (e.g., map_topic_to_entities(topic: str) ->
List[str] or _topic_entities(topic)) that implements the if/elif cascade and
returns the entities list, then replace the inline decision block in
_seed_invariance_corpus with a single call to this helper before calling
storage.save_memory (keep the helper in the same module and ensure its name is
used where entities is assigned).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d2c968ab-ccc9-49f3-8775-e3a2e1d8bd00

📥 Commits

Reviewing files that changed from the base of the PR and between 4fd1def and 8acc562.

📒 Files selected for processing (2)
  • slayer/search/service.py
  • tests/test_search_invariance.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • slayer/search/service.py

Three S3776 issues opened by the DEV-1414 changes (cognitive complexity
17, 19, 18 vs the 15 threshold). All reduce to "the function grew a few
branches; extract a helper":

- slayer/search/index.py: factor the kind-filter / exclude-kind boolean
  wrap-up into _apply_kind_filter so search_index reads as parse → wrap
  → search.
- slayer/search/service.py: factor the per-kind cosine ranking into a
  module-level _rank_embedding_kind so _run_channel_3 calls it twice
  instead of inlining two near-identical matrix+top_k blocks.
- tests/test_search_invariance.py: factor the topic→entities decision
  tree out of _seed_invariance_corpus into _entities_for_topic (CodeRabbit
  nitpicks #1 and #2 — same suggestion via both review batches).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sonarqubecloud
Copy link
Copy Markdown

@ZmeiGorynych ZmeiGorynych merged commit 054b666 into main May 14, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant