DEV-1414: per-bucket ranking invariance in search() by ZmeiGorynych · Pull Request #127 · MotleyAI/slayer

ZmeiGorynych · 2026-05-14T13:10:35Z

Summary

Memory / example_query / entity rankings in search() are now invariant under the other two caps — for fixed (question, datasource, max_X), the X bucket's id list and order is a pure function of the corpus + question + that one cap. See DEV-1414.
Root cause: over_fetch_budget = max(max_memories + max_example_queries, max_entities) * 5 in slayer/search/service.py fed one shared candidate-pool limit to all three channels. Channels 2 (tantivy) and 3 (cosine) produced mixed memory+entity hits, then split them by kind — so changing any cap reshuffled the bottom cliff of each channel and rippled into the top of every output bucket via RRF.
Fix: per-kind channel queries with no over-fetch truncation. Each channel ranks the full per-kind corpus; the max_* caps become pure post-fusion slice ops on three independent ranked lists.

Changes

slayer/search/index.py — search_index accepts new kind_filter / exclude_kind keyword-only params (mutex), built via tantivy.Query.boolean_query + term_query. build_in_memory_corpus uses writer(num_threads=1) so doc-id tiebreak on equal BM25 scores is deterministic across rebuilds (the latent non-determinism the new invariance tests surfaced).
slayer/search/service.py — drops _OVER_FETCH_MULTIPLIER and over_fetch_budget. Channel 1 stops truncating. Channels 2 and 3 each run twice (once per kind, limit = full per-kind corpus size).
tests/test_search_invariance.py (new) — 9 tests covering all three buckets × two cross-cap variations, the exact DEV-1414 A/B/C repro tuples, and channel-3-active variants.
tests/test_search_index.py — 7 new tests pinning kind_filter / exclude_kind semantics.
docs/concepts/search.md, CLAUDE.md, .claude/skills/slayer-overview.md — describe the per-bucket invariance contract.

Test plan

poetry run pytest -m "not integration" — 2628 passed, 0 failed.
poetry run ruff check slayer/ tests/ — clean.
DEV-1414 repro tuples (A: (10, 10, 5), B: (10, 0, 0), C: (15, 5, 2)) yield identical top-10 memory ids.
Channel 3 active path covered with a deterministic embedding stub.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Search can be filtered to return only memories, example queries, or other entities; mutually-exclusive include/exclude filters are enforced.
Documentation
- Clarified multi-channel search behavior, per-bucket ranking invariants, embedding-based partitioning, and that embedding search requires an extra and degrades gracefully when unavailable.
Bug Fixes
- In-memory search rebuilds now yield deterministic results.
Tests
- Added regression and unit tests for kind-filtering and per-bucket ranking invariance (including embedding-path cases).

Memory/example_query/entity rankings now depend only on the corpus, question, datasource, and that bucket's own cap. Previously a shared ``over_fetch_budget = max(max_memories + max_example_queries, max_entities) * 5`` fed all three channels, so changing any one cap reshuffled the bottom cliff of channels 2 and 3 (mixed memory+entity hits) and rippled into the top of every output bucket. Fix: - search_index gains kind_filter / exclude_kind keyword-only params (mutex), built via tantivy.Query.boolean_query + term_query. - build_in_memory_corpus uses writer(num_threads=1) so tantivy doc-id tiebreak on equal BM25 scores is deterministic across rebuilds. - service.py drops _OVER_FETCH_MULTIPLIER and over_fetch_budget; channels 2 and 3 each run twice (once per kind, with limit = full per-kind corpus size), channel 1 stops truncating. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-14T13:11:00Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6946ddfd-eed9-4da2-96b7-e8e8e8a10f82

📥 Commits

Reviewing files that changed from the base of the PR and between 8acc562 and ec7223a.

📒 Files selected for processing (3)

slayer/search/index.py
slayer/search/service.py
tests/test_search_invariance.py

🚧 Files skipped from review as they are similar to previous changes (3)

slayer/search/index.py
tests/test_search_invariance.py
slayer/search/service.py

📝 Walkthrough

Walkthrough

Refactors SearchService and search_index to produce independent full per-kind rankings for memories, example_queries, and entities (removes shared over-fetch), adds kind-based filtering to search_index, pins in-memory index determinism, and introduces regression tests asserting per-bucket membership/order invariance (DEV-1414).

Changes

Semantic search per-bucket ranking invariance

Layer / File(s)	Summary
Invariance documentation `.claude/skills/slayer-overview.md`, `CLAUDE.md`, `docs/concepts/search.md`	Docs name the three search channels and add a Per-bucket ranking invariance section describing full per-kind rankings and RRF post-slicing.
Search index: deterministic build and kind filtering `slayer/search/index.py`	`build_in_memory_corpus()` pins tantivy writer to `num_threads=1`. `search_index()` gains `kind_filter` and `exclude_kind` parameters with mutual-exclusivity validation and boolean-term combination into the text query.
Unit tests for search_index kind filtering `tests/test_search_index.py`	Add tests for `kind_filter` / `exclude_kind`: memory-only, model-only, exclude-memory, mutual exclusivity, preserved score order, and limit clamping.
Service docstring, orchestration, and helpers `slayer/search/service.py`	Update docstring for per-kind semantics; remove over-fetch constant/plumbing; add `_count_corpus_kinds()` and `_memory_id_from_canonical()`; make backfill O(N) via id->Memory dict and align orchestration call sites.
Channel 1: full BM25 memory ranking `slayer/search/service.py`	Run full BM25 memory corpus ranking (no over-fetch slicing) and build memory id map and memory ranking from the complete list.
Channel 2: per-kind tantivy queries `slayer/search/service.py`	Compute per-kind corpus sizes and run two kind-filtered tantivy searches (memory-only and non-memory) with limits equal to kind sizes; build memory and entity rankings separately.
Channel 3: partitioned embedding ranking and live-canonical alignment `slayer/search/service.py`	Filter embedding rows to live canonical ids, validate dims, partition rows by `entity_kind`, compute per-partition normalized matrices and rank fully with `k=len(partition)`; recover memory ids from canonical ids and produce entity ranking.
Regression tests for per-bucket ranking invariance `tests/test_search_invariance.py`	New module seeds deterministic corpus and adds tests asserting membership/order invariance for memories, example_queries, and entities while varying non-target caps; includes embedding-path fixtures and DEV-1414 repro test.

Sequence Diagram

sequenceDiagram
  participant Client
  participant SearchService
  participant Tantivy
  participant EmbeddingStore
  participant RRF
  Client->>SearchService: search(question, datasource, caps)
  SearchService->>Tantivy: channel1 BM25 memory query (full)
  SearchService->>Tantivy: channel2 tantivy query(kind_filter='memory', limit=memory_count)
  SearchService->>Tantivy: channel2 tantivy query(exclude_kind='memory', limit=entity_count)
  SearchService->>EmbeddingStore: channel3 fetch embeddings (filter live canonical ids)
  EmbeddingStore-->>SearchService: partitioned embedding rows (memory / non-memory)
  SearchService->>RRF: RRF-fuse per-kind ranked lists
  RRF-->>SearchService: fused per-kind lists (post-slice by caps)
  SearchService-->>Client: ordered results by bucket

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

MotleyAI/slayer#112: Related prior work on the RRF-based semantic search architecture and channel pipeline that this change refactors for per-kind invariance.
MotleyAI/slayer#121: Related changes to the retrieval pipeline and embedding handling that overlap with channel partitioning and memory/entity separation.

"🐰
I nudge the corpora to stay in line,
Three channels hum, each their own design,
No shared budget ties,
Rankings keep their ties,
Per-kind truths hop out, precise and fine."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 52.50% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'DEV-1414: per-bucket ranking invariance in search()' directly and clearly summarizes the main change: fixing ranking invariance across memory, example_query, and entity buckets in the search function.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch egor/dev-1414-memory-ranking-in-search-depends-on-max_entities

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

tests/test_search_invariance.py (1)
153-200: 💤 Low value

Consider simplifying the entity-mapping logic.

SonarCloud flags this function's cognitive complexity (18 > 15). The long if-elif chain (lines 167-184) could be refactored into a lookup dict to reduce complexity:
TOPIC_TO_ENTITIES = {
    "amount_paid": ["warehouse.orders.amount_paid"],
    "paid": ["warehouse.orders.amount_paid"],
    "revenue": ["warehouse.orders.amount_paid"],
    "email": ["warehouse.customers.email"],
    "anonymous": ["warehouse.customers.email"],
    # ... etc
}

def _get_entities_for_topic(topic: str) -> List[str]:
    for keyword, entities in TOPIC_TO_ENTITIES.items():
        if keyword in topic:
            return entities
    return ["warehouse"]
However, since this is test-data seeding with straightforward intent and the complexity is inherent to the diverse test corpus, this refactor is optional.

Based on learnings: static analysis tools like SonarCloud are notorious for false positives in test code, and private helper functions in fully-controlled repos don't require defensive complexity reduction unless there's a genuine maintainability concern.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_search_invariance.py` around lines 153 - 200, The if/elif chain
inside _seed_invariance_corpus increases cognitive complexity; extract the
mapping into a TOPIC_TO_ENTITIES dict and add a small helper
_get_entities_for_topic(topic: str) -> List[str] that iterates TOPIC_TO_ENTITIES
keys and returns the first matching entities or ["warehouse"] as a default, then
replace the long conditional block before storage.save_memory calls with a call
to _get_entities_for_topic; keep storage.save_memory, _LEARNING_TOPICS and other
logic unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_search_invariance.py`:
- Around line 153-200: The if/elif chain inside _seed_invariance_corpus
increases cognitive complexity; extract the mapping into a TOPIC_TO_ENTITIES
dict and add a small helper _get_entities_for_topic(topic: str) -> List[str]
that iterates TOPIC_TO_ENTITIES keys and returns the first matching entities or
["warehouse"] as a default, then replace the long conditional block before
storage.save_memory calls with a call to _get_entities_for_topic; keep
storage.save_memory, _LEARNING_TOPICS and other logic unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 74d8f871-2848-4dbf-80aa-c075b324f6b1

📥 Commits

Reviewing files that changed from the base of the PR and between 48ffe02 and 4fd1def.

📒 Files selected for processing (7)

.claude/skills/slayer-overview.md
CLAUDE.md
docs/concepts/search.md
slayer/search/index.py
slayer/search/service.py
tests/test_search_index.py
tests/test_search_invariance.py

…hash) - HIGH _backfill_memory_by_id was O(N^2) under the new per-kind full rankings (linear scan of all_memories per id, 3x). Build a ``{m.id: m}`` dict once in ``SearchService.search`` and pass it in; per-call backfill is now O(N). - MEDIUM channel 3 ranked every sidecar row, including stale ones (memories deleted from storage, hidden / removed entities still embedded). Those rows consumed cosine ranks and degraded live docs' RRF scores (still invariant under cap changes, but lossy). Filter rows against ``corpus.canonical_to_kind`` before partitioning so the embedding candidate set matches the live tantivy corpus. - LOW invariance-test stub used Python ``hash()`` for deterministic vectors, which is randomised per interpreter. Switch to ``hashlib.sha256`` so the rankings are reproducible across runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (1)

tests/test_search_invariance.py (1)

154-189: ⚡ Quick win

Refactor topic classification out of _seed_invariance_corpus to clear the complexity gate.

Line 154 is currently flagged by Sonar (18 > 15). Extracting the topic→entities decision tree into a dedicated helper keeps this seeding routine linear and should satisfy the gate.

♻️ Proposed refactor

+def _entities_for_topic(topic: str) -> List[str]:
+    if "amount_paid" in topic or "paid" in topic or "revenue" in topic:
+        return ["warehouse.orders.amount_paid"]
+    if "email" in topic or "anonymous" in topic:
+        return ["warehouse.customers.email"]
+    if "ship" in topic or "warehouse" in topic:
+        return ["warehouse.warehouses"]
+    if "customer" in topic and "tier" in topic:
+        return ["warehouse.customers.customer_tier"]
+    if "customer" in topic:
+        return ["warehouse.customers"]
+    if "status" in topic:
+        return ["warehouse.orders.status"]
+    if "discount" in topic:
+        return ["warehouse.orders.discount_code"]
+    if "checkout" in topic or "fraud" in topic:
+        return ["warehouse.orders"]
+    return ["warehouse"]
+
 async def _seed_invariance_corpus(storage: StorageBackend) -> None:
@@
     for i, topic in enumerate(_LEARNING_TOPICS):
-        entities: List[str]
-        if "amount_paid" in topic or "paid" in topic or "revenue" in topic:
-            entities = ["warehouse.orders.amount_paid"]
-        elif "email" in topic or "anonymous" in topic:
-            entities = ["warehouse.customers.email"]
-        elif "ship" in topic or "warehouse" in topic:
-            entities = ["warehouse.warehouses"]
-        elif "customer" in topic and "tier" in topic:
-            entities = ["warehouse.customers.customer_tier"]
-        elif "customer" in topic:
-            entities = ["warehouse.customers"]
-        elif "status" in topic:
-            entities = ["warehouse.orders.status"]
-        elif "discount" in topic:
-            entities = ["warehouse.orders.discount_code"]
-        elif "checkout" in topic or "fraud" in topic:
-            entities = ["warehouse.orders"]
-        else:
-            entities = ["warehouse"]
+        entities = _entities_for_topic(topic)
         await storage.save_memory(
             learning=f"KB{i:02d}: {topic}.",
             entities=entities,
         )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_search_invariance.py` around lines 154 - 189, The
_seed_invariance_corpus function has a large topic→entities decision tree that
should be extracted into a helper to reduce complexity; create a new function
(e.g., map_topic_to_entities(topic: str) -> List[str] or _topic_entities(topic))
that implements the if/elif cascade and returns the entities list, then replace
the inline decision block in _seed_invariance_corpus with a single call to this
helper before calling storage.save_memory (keep the helper in the same module
and ensure its name is used where entities is assigned).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_search_invariance.py`:
- Around line 154-189: The _seed_invariance_corpus function has a large
topic→entities decision tree that should be extracted into a helper to reduce
complexity; create a new function (e.g., map_topic_to_entities(topic: str) ->
List[str] or _topic_entities(topic)) that implements the if/elif cascade and
returns the entities list, then replace the inline decision block in
_seed_invariance_corpus with a single call to this helper before calling
storage.save_memory (keep the helper in the same module and ensure its name is
used where entities is assigned).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d2c968ab-ccc9-49f3-8775-e3a2e1d8bd00

📥 Commits

Reviewing files that changed from the base of the PR and between 4fd1def and 8acc562.

📒 Files selected for processing (2)

slayer/search/service.py
tests/test_search_invariance.py

🚧 Files skipped from review as they are similar to previous changes (1)

slayer/search/service.py

Three S3776 issues opened by the DEV-1414 changes (cognitive complexity 17, 19, 18 vs the 15 threshold). All reduce to "the function grew a few branches; extract a helper": - slayer/search/index.py: factor the kind-filter / exclude-kind boolean wrap-up into _apply_kind_filter so search_index reads as parse → wrap → search. - slayer/search/service.py: factor the per-kind cosine ranking into a module-level _rank_embedding_kind so _run_channel_3 calls it twice instead of inlining two near-identical matrix+top_k blocks. - tests/test_search_invariance.py: factor the topic→entities decision tree out of _seed_invariance_corpus into _entities_for_topic (CodeRabbit nitpicks #1 and #2 — same suggestion via both review batches). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sonarqubecloud · 2026-05-14T13:57:35Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

coderabbitai Bot reviewed May 14, 2026

View reviewed changes

ZmeiGorynych merged commit 054b666 into main May 14, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DEV-1414: per-bucket ranking invariance in search()#127

DEV-1414: per-bucket ranking invariance in search()#127
ZmeiGorynych merged 3 commits into
mainfrom
egor/dev-1414-memory-ranking-in-search-depends-on-max_entities

ZmeiGorynych commented May 14, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 14, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

sonarqubecloud Bot commented May 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ZmeiGorynych commented May 14, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud Bot commented May 14, 2026

Quality Gate passed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ZmeiGorynych commented May 14, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 14, 2026 •

edited

Loading