DEV-1414: per-bucket ranking invariance in search()#127
Conversation
Memory/example_query/entity rankings now depend only on the corpus, question, datasource, and that bucket's own cap. Previously a shared ``over_fetch_budget = max(max_memories + max_example_queries, max_entities) * 5`` fed all three channels, so changing any one cap reshuffled the bottom cliff of channels 2 and 3 (mixed memory+entity hits) and rippled into the top of every output bucket. Fix: - search_index gains kind_filter / exclude_kind keyword-only params (mutex), built via tantivy.Query.boolean_query + term_query. - build_in_memory_corpus uses writer(num_threads=1) so tantivy doc-id tiebreak on equal BM25 scores is deterministic across rebuilds. - service.py drops _OVER_FETCH_MULTIPLIER and over_fetch_budget; channels 2 and 3 each run twice (once per kind, with limit = full per-kind corpus size), channel 1 stops truncating. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughRefactors SearchService and search_index to produce independent full per-kind rankings for memories, example_queries, and entities (removes shared over-fetch), adds kind-based filtering to search_index, pins in-memory index determinism, and introduces regression tests asserting per-bucket membership/order invariance (DEV-1414). ChangesSemantic search per-bucket ranking invariance
Sequence DiagramsequenceDiagram
participant Client
participant SearchService
participant Tantivy
participant EmbeddingStore
participant RRF
Client->>SearchService: search(question, datasource, caps)
SearchService->>Tantivy: channel1 BM25 memory query (full)
SearchService->>Tantivy: channel2 tantivy query(kind_filter='memory', limit=memory_count)
SearchService->>Tantivy: channel2 tantivy query(exclude_kind='memory', limit=entity_count)
SearchService->>EmbeddingStore: channel3 fetch embeddings (filter live canonical ids)
EmbeddingStore-->>SearchService: partitioned embedding rows (memory / non-memory)
SearchService->>RRF: RRF-fuse per-kind ranked lists
RRF-->>SearchService: fused per-kind lists (post-slice by caps)
SearchService-->>Client: ordered results by bucket
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/test_search_invariance.py (1)
153-200: 💤 Low valueConsider simplifying the entity-mapping logic.
SonarCloud flags this function's cognitive complexity (18 > 15). The long if-elif chain (lines 167-184) could be refactored into a lookup dict to reduce complexity:
TOPIC_TO_ENTITIES = { "amount_paid": ["warehouse.orders.amount_paid"], "paid": ["warehouse.orders.amount_paid"], "revenue": ["warehouse.orders.amount_paid"], "email": ["warehouse.customers.email"], "anonymous": ["warehouse.customers.email"], # ... etc } def _get_entities_for_topic(topic: str) -> List[str]: for keyword, entities in TOPIC_TO_ENTITIES.items(): if keyword in topic: return entities return ["warehouse"]However, since this is test-data seeding with straightforward intent and the complexity is inherent to the diverse test corpus, this refactor is optional.
Based on learnings: static analysis tools like SonarCloud are notorious for false positives in test code, and private helper functions in fully-controlled repos don't require defensive complexity reduction unless there's a genuine maintainability concern.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_search_invariance.py` around lines 153 - 200, The if/elif chain inside _seed_invariance_corpus increases cognitive complexity; extract the mapping into a TOPIC_TO_ENTITIES dict and add a small helper _get_entities_for_topic(topic: str) -> List[str] that iterates TOPIC_TO_ENTITIES keys and returns the first matching entities or ["warehouse"] as a default, then replace the long conditional block before storage.save_memory calls with a call to _get_entities_for_topic; keep storage.save_memory, _LEARNING_TOPICS and other logic unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/test_search_invariance.py`:
- Around line 153-200: The if/elif chain inside _seed_invariance_corpus
increases cognitive complexity; extract the mapping into a TOPIC_TO_ENTITIES
dict and add a small helper _get_entities_for_topic(topic: str) -> List[str]
that iterates TOPIC_TO_ENTITIES keys and returns the first matching entities or
["warehouse"] as a default, then replace the long conditional block before
storage.save_memory calls with a call to _get_entities_for_topic; keep
storage.save_memory, _LEARNING_TOPICS and other logic unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 74d8f871-2848-4dbf-80aa-c075b324f6b1
📒 Files selected for processing (7)
.claude/skills/slayer-overview.mdCLAUDE.mddocs/concepts/search.mdslayer/search/index.pyslayer/search/service.pytests/test_search_index.pytests/test_search_invariance.py
…hash)
- HIGH _backfill_memory_by_id was O(N^2) under the new per-kind full
rankings (linear scan of all_memories per id, 3x). Build a
``{m.id: m}`` dict once in ``SearchService.search`` and pass it in;
per-call backfill is now O(N).
- MEDIUM channel 3 ranked every sidecar row, including stale ones
(memories deleted from storage, hidden / removed entities still
embedded). Those rows consumed cosine ranks and degraded live docs'
RRF scores (still invariant under cap changes, but lossy). Filter
rows against ``corpus.canonical_to_kind`` before partitioning so the
embedding candidate set matches the live tantivy corpus.
- LOW invariance-test stub used Python ``hash()`` for deterministic
vectors, which is randomised per interpreter. Switch to
``hashlib.sha256`` so the rankings are reproducible across runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/test_search_invariance.py (1)
154-189: ⚡ Quick winRefactor topic classification out of
_seed_invariance_corpusto clear the complexity gate.Line 154 is currently flagged by Sonar (18 > 15). Extracting the topic→entities decision tree into a dedicated helper keeps this seeding routine linear and should satisfy the gate.
♻️ Proposed refactor
+def _entities_for_topic(topic: str) -> List[str]: + if "amount_paid" in topic or "paid" in topic or "revenue" in topic: + return ["warehouse.orders.amount_paid"] + if "email" in topic or "anonymous" in topic: + return ["warehouse.customers.email"] + if "ship" in topic or "warehouse" in topic: + return ["warehouse.warehouses"] + if "customer" in topic and "tier" in topic: + return ["warehouse.customers.customer_tier"] + if "customer" in topic: + return ["warehouse.customers"] + if "status" in topic: + return ["warehouse.orders.status"] + if "discount" in topic: + return ["warehouse.orders.discount_code"] + if "checkout" in topic or "fraud" in topic: + return ["warehouse.orders"] + return ["warehouse"] + async def _seed_invariance_corpus(storage: StorageBackend) -> None: @@ for i, topic in enumerate(_LEARNING_TOPICS): - entities: List[str] - if "amount_paid" in topic or "paid" in topic or "revenue" in topic: - entities = ["warehouse.orders.amount_paid"] - elif "email" in topic or "anonymous" in topic: - entities = ["warehouse.customers.email"] - elif "ship" in topic or "warehouse" in topic: - entities = ["warehouse.warehouses"] - elif "customer" in topic and "tier" in topic: - entities = ["warehouse.customers.customer_tier"] - elif "customer" in topic: - entities = ["warehouse.customers"] - elif "status" in topic: - entities = ["warehouse.orders.status"] - elif "discount" in topic: - entities = ["warehouse.orders.discount_code"] - elif "checkout" in topic or "fraud" in topic: - entities = ["warehouse.orders"] - else: - entities = ["warehouse"] + entities = _entities_for_topic(topic) await storage.save_memory( learning=f"KB{i:02d}: {topic}.", entities=entities, )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_search_invariance.py` around lines 154 - 189, The _seed_invariance_corpus function has a large topic→entities decision tree that should be extracted into a helper to reduce complexity; create a new function (e.g., map_topic_to_entities(topic: str) -> List[str] or _topic_entities(topic)) that implements the if/elif cascade and returns the entities list, then replace the inline decision block in _seed_invariance_corpus with a single call to this helper before calling storage.save_memory (keep the helper in the same module and ensure its name is used where entities is assigned).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/test_search_invariance.py`:
- Around line 154-189: The _seed_invariance_corpus function has a large
topic→entities decision tree that should be extracted into a helper to reduce
complexity; create a new function (e.g., map_topic_to_entities(topic: str) ->
List[str] or _topic_entities(topic)) that implements the if/elif cascade and
returns the entities list, then replace the inline decision block in
_seed_invariance_corpus with a single call to this helper before calling
storage.save_memory (keep the helper in the same module and ensure its name is
used where entities is assigned).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d2c968ab-ccc9-49f3-8775-e3a2e1d8bd00
📒 Files selected for processing (2)
slayer/search/service.pytests/test_search_invariance.py
🚧 Files skipped from review as they are similar to previous changes (1)
- slayer/search/service.py
Three S3776 issues opened by the DEV-1414 changes (cognitive complexity 17, 19, 18 vs the 15 threshold). All reduce to "the function grew a few branches; extract a helper": - slayer/search/index.py: factor the kind-filter / exclude-kind boolean wrap-up into _apply_kind_filter so search_index reads as parse → wrap → search. - slayer/search/service.py: factor the per-kind cosine ranking into a module-level _rank_embedding_kind so _run_channel_3 calls it twice instead of inlining two near-identical matrix+top_k blocks. - tests/test_search_invariance.py: factor the topic→entities decision tree out of _seed_invariance_corpus into _entities_for_topic (CodeRabbit nitpicks #1 and #2 — same suggestion via both review batches). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|



Summary
search()are now invariant under the other two caps — for fixed(question, datasource, max_X), theXbucket's id list and order is a pure function of the corpus + question + that one cap. See DEV-1414.over_fetch_budget = max(max_memories + max_example_queries, max_entities) * 5inslayer/search/service.pyfed one shared candidate-pool limit to all three channels. Channels 2 (tantivy) and 3 (cosine) produced mixed memory+entity hits, then split them by kind — so changing any cap reshuffled the bottom cliff of each channel and rippled into the top of every output bucket via RRF.max_*caps become pure post-fusion slice ops on three independent ranked lists.Changes
slayer/search/index.py—search_indexaccepts newkind_filter/exclude_kindkeyword-only params (mutex), built viatantivy.Query.boolean_query+term_query.build_in_memory_corpususeswriter(num_threads=1)so doc-id tiebreak on equal BM25 scores is deterministic across rebuilds (the latent non-determinism the new invariance tests surfaced).slayer/search/service.py— drops_OVER_FETCH_MULTIPLIERandover_fetch_budget. Channel 1 stops truncating. Channels 2 and 3 each run twice (once per kind,limit = full per-kind corpus size).tests/test_search_invariance.py(new) — 9 tests covering all three buckets × two cross-cap variations, the exact DEV-1414 A/B/C repro tuples, and channel-3-active variants.tests/test_search_index.py— 7 new tests pinningkind_filter/exclude_kindsemantics.docs/concepts/search.md,CLAUDE.md,.claude/skills/slayer-overview.md— describe the per-bucket invariance contract.Test plan
poetry run pytest -m "not integration"— 2628 passed, 0 failed.poetry run ruff check slayer/ tests/— clean.(10, 10, 5), B:(10, 0, 0), C:(15, 5, 2)) yield identical top-10 memory ids.🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Bug Fixes
Tests