MotleyAI · ZmeiGorynych · May 14, 2026
diff --git a/.claude/skills/slayer-overview.md b/.claude/skills/slayer-overview.md
@@ -32,7 +32,7 @@ Datasources: `create_datasource`, `list_datasources`, `describe_datasource` (inc
 Ingestion: `ingest_datasource_models`
 Schema drift: `validate_models` (read-only diff against live schema; surfaces `SchemaDriftError` cleanups)
 Memory write side: `save_memory`, `forget_memory` (per-entity learnings indexed by canonical entity strings — see [memories.md](../../docs/concepts/memories.md))
-Search: `search` (three-channel: entity-overlap BM25 over memories + tantivy full-text over memories ∪ entities + optional dense embedding similarity, RRF-fused; embeddings require the `embedding_search` extra and degrade gracefully when unavailable; partitions query-bearing memories into `example_queries` — see [search.md](../../docs/concepts/search.md))
+Search: `search` (three-channel: entity-overlap BM25 over memories + tantivy full-text + optional dense embedding similarity, RRF-fused per kind so each output bucket — `memories` / `example_queries` / `entities` — has membership/order invariant under the other buckets' caps; embeddings require the `embedding_search` extra and degrade gracefully when unavailable; partitions query-bearing memories into `example_queries` — see [search.md](../../docs/concepts/search.md))
 
 ## Package Structure
 

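The "RRF-fused per kind" wording in the hunk above can be illustrated with a small sketch. It is not the code in `slayer/search/rrf.py`: the function names, the id tie-break, and the sample rankings are assumptions made for illustration, and the split of memory hits into `memories` / `example_queries` is left out. Only `k=60` and the "caps are post-fusion slices" behaviour come from the text.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

RRF_K = 60  # constant quoted in CLAUDE.md for the hand-rolled RRF


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = RRF_K) -> List[str]:
    """Fuse best-first id lists: score(id) = sum of 1 / (k + rank) per list."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Assumption: ties are broken on the id so the fused order is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def search_buckets(
    per_kind_rankings: Dict[str, List[List[str]]],
    caps: Dict[str, int],
) -> Dict[str, List[str]]:
    """Fuse every kind on its own, then slice with that kind's cap only."""
    return {
        kind: rrf_fuse(channels)[: caps[kind]]
        for kind, channels in per_kind_rankings.items()
    }


# Hypothetical per-kind channel outputs (best hit first).
per_kind = {
    "memories": [["m1", "m2", "m3"], ["m2", "m1"], ["m2", "m3"]],
    "entities": [["e1", "e2"], ["e2", "e3"]],
}

narrow = search_buckets(per_kind, {"memories": 2, "entities": 1})
wide = search_buckets(per_kind, {"memories": 2, "entities": 3})
```

Because each kind is fused from its own rankings and sliced only by its own cap, `narrow["memories"]` and `wide["memories"]` are identical even though the `entities` cap differs.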
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -93,7 +93,7 @@ poetry run ruff check slayer/ tests/
 
 - **Memories + semantic search** (DEV-1357 + DEV-1375): An agent-memory layer indexed by canonical entity strings. Two write-side tools — `save_memory(learning, linked_entities)` and `forget_memory(id)` — record per-entity notes (optionally bundled with an example `SlayerQuery`). Retrieval is unified into a single `search(entities, query, question, max_memories=5, max_example_queries=2, max_entities=5)` tool — there is no separate `recall_memories` surface. `linked_entities` accepts either a list of entity strings (resolved strictly) or an inline `SlayerQuery`/dict (entities auto-extracted; warnings non-fatal; the query is persisted on the memory). The canonical form is exactly one of `<ds>`, `<ds>.<model>`, `<ds>.<model>.<leaf>` (≤ 3 dotted segments after canonicalisation). Aggregation suffixes are stripped (`revenue:sum` → `<ds>.<model>.revenue`); `*:count` collapses to the source model; multi-hop dotted paths keep only the leaf (`orders.customers.regions.name` → `{<orders.ds>.orders, <regions.ds>.regions.name}`). The resolver lives in `slayer/memories/resolver.py`; the unified `Memory` row + storage primitives are concrete on `StorageBackend` (ID format / entity-intersection filter), with backends only implementing the row-shaped CRUD + a one-line `_next_memory_seq` that derives the next id from the existing corpus. `inspect_model` auto-renders a `Learnings` section listing only memories where `query is None`; query-bearing memories surface only via `search` (in the `example_queries` bucket). **Memory ids** (DEV-1405): positive ints that increase monotonically while the corpus grows; YAMLStorage derives the next id from the last row of `memories.yaml`, SQLiteStorage from `SELECT MAX(id) + 1 FROM memories`. Ids of deleted memories may be reused by future saves; `delete_memory` already cascades to the matching embedding row so reuse strands no data.
 
-  `search` runs up to three parallel channels merged by RRF (DEV-1386 adds the third). **Channel 1** is entity-overlap BM25 over memories (`slayer/memories/ranker.py` using `rank_bm25.BM25Plus`, DEV-1365) — a precisely-tagged memory outranks one with a long entity list that overlaps incidentally. **Channel 2** is a fresh in-memory tantivy index built per call over memories ∪ entities (datasources / non-hidden models / non-hidden columns / named measures / aggregations), using tantivy's `en_stem` analyzer (Porter stemmer + default tokenizer, splits on `_` and `.`). **Channel 3** (DEV-1386, optional via the `embedding_search` pip extra) is dense embedding similarity over the same memories ∪ entities corpus, computed numpy-only against rows persisted in a sidecar `embeddings` table keyed by `(canonical_id, embedding_model_name)`. The SQL lives in `slayer/storage/sidecar_embedding_store.py` (DEV-1405) — both `SQLiteStorage` and `YAMLStorage` instantiate a `SidecarEmbeddingStore` and forward all embedding CRUD to it. SQLiteStorage points it at the main `.db` file; YAMLStorage points it at a dedicated `<base_dir>/embeddings.db` sidecar so the YAML store keeps its git-diffable shape while embeddings live in a fast indexed store. **Cascade semantics** (DEV-1405 fix): `delete_embeddings_for_canonical(canonical_id_prefix=X)` matches the canonical id exactly OR as a strict dotted-path descendant (`X + "." + …`) — never as a character prefix, so deleting `memory:4` no longer also nukes `memory:42`. **Hot-path batching** (DEV-1405): `StorageBackend` exposes `save_embeddings(rows)` and `get_embeddings_for_canonical_ids(canonical_ids, embedding_model_name)` with default M-iteration impls, overridden by the bundled backends to issue single batched round-trips through `SidecarEmbeddingStore.save_many` / `get_many`; `EmbeddingService._apply_pending` uses them so one `refresh_model_subtree` issues exactly one batched read + one batched write regardless of subtree size. 
The active embedding model is read from `SLAYER_EMBEDDING_MODEL` (default `openai/text-embedding-3-small`) and dispatched via litellm; provider credentials are read by litellm directly (`OPENAI_API_KEY`, etc.). When the extra is not installed, the model has no rows, or the query embedding call fails, channel 3 contributes nothing and emits a single warning into `SearchResponse.warnings`; tantivy + BM25 continue to work. Refresh runs inline on `slayer ingest` / `edit_model` / `save_memory` and skips the litellm call when the rendered `content_hash` matches the stored row (cheap idempotent re-runs). Per-entity embed failures are non-fatal — search degrades gracefully. Memory rankings from every active channel are fused via Reciprocal Rank Fusion (`k=60`, hand-rolled in `slayer/search/rrf.py`); **entity hits from channels 2 and 3 are now also RRF-fused** (channel 1 contributes only to memory ranking). Memory hits are partitioned by `Memory.query is None` into `memories` (learning-only, small) and `example_queries` (query-bearing, bulky) — independent caps via `max_memories` and `max_example_queries` so bulky examples cannot crowd out small learnings. The response also echoes `resolved_input_entities` for diagnostics. Empty-input fallback returns the newest `max_memories` learning-only + newest `max_example_queries` query-bearing memories with a warning. Each indexed entity carries a `text` field rendered by `slayer/search/render.py` — named children (columns / measures / aggregations / join targets) are mentioned by name + kind only (no descriptions, since each child has its own indexed doc), while non-named children (model filters, model `sql` block, join `pairs`, aggregation `params`) are included in full. `meta` is **excluded** from indexed text (DEV-1377 hardening). Hidden models / hidden columns are skipped. 
**`datasource` filter** (DEV-1409): all four surfaces (`MCP search`, `POST /search`, `slayer search --datasource`, `SlayerClient.search`) accept an optional `datasource: Optional[str] = None`. When set, every channel pre-filters its corpus to that one datasource — entity hits only include docs rooted at it (exact name or strict dotted-path descendant); memories surface when any of their `entities` is rooted at it (memories spanning multiple datasources surface from each); BM25 / IDF / cosine corpus reflect only the filtered subset. Unknown datasource → `ValueError` (HTTP 400 on REST). Helper: `slayer.memories.resolver.canonical_id_rooted_at`.
+  `search` runs up to three parallel channels merged by RRF (DEV-1386 adds the third). **Channel 1** is entity-overlap BM25 over memories (`slayer/memories/ranker.py` using `rank_bm25.BM25Plus`, DEV-1365) — a precisely-tagged memory outranks one with a long entity list that overlaps incidentally. **Channel 2** is a fresh in-memory tantivy index built per call over memories ∪ entities (datasources / non-hidden models / non-hidden columns / named measures / aggregations), using tantivy's `en_stem` analyzer (Porter stemmer + default tokenizer, splits on `_` and `.`). **Channel 3** (DEV-1386, optional via the `embedding_search` pip extra) is dense embedding similarity over the same memories ∪ entities corpus, computed numpy-only against rows persisted in a sidecar `embeddings` table keyed by `(canonical_id, embedding_model_name)`. The SQL lives in `slayer/storage/sidecar_embedding_store.py` (DEV-1405) — both `SQLiteStorage` and `YAMLStorage` instantiate a `SidecarEmbeddingStore` and forward all embedding CRUD to it. SQLiteStorage points it at the main `.db` file; YAMLStorage points it at a dedicated `<base_dir>/embeddings.db` sidecar so the YAML store keeps its git-diffable shape while embeddings live in a fast indexed store. **Cascade semantics** (DEV-1405 fix): `delete_embeddings_for_canonical(canonical_id_prefix=X)` matches the canonical id exactly OR as a strict dotted-path descendant (`X + "." + …`) — never as a character prefix, so deleting `memory:4` no longer also nukes `memory:42`. **Hot-path batching** (DEV-1405): `StorageBackend` exposes `save_embeddings(rows)` and `get_embeddings_for_canonical_ids(canonical_ids, embedding_model_name)` with default M-iteration impls, overridden by the bundled backends to issue single batched round-trips through `SidecarEmbeddingStore.save_many` / `get_many`; `EmbeddingService._apply_pending` uses them so one `refresh_model_subtree` issues exactly one batched read + one batched write regardless of subtree size. 
The active embedding model is read from `SLAYER_EMBEDDING_MODEL` (default `openai/text-embedding-3-small`) and dispatched via litellm; provider credentials are read by litellm directly (`OPENAI_API_KEY`, etc.). When the extra is not installed, the model has no rows, or the query embedding call fails, channel 3 contributes nothing and emits a single warning into `SearchResponse.warnings`; tantivy + BM25 continue to work. Refresh runs inline on `slayer ingest` / `edit_model` / `save_memory` and skips the litellm call when the rendered `content_hash` matches the stored row (cheap idempotent re-runs). Per-entity embed failures are non-fatal — search degrades gracefully. Memory rankings from every active channel are fused via Reciprocal Rank Fusion (`k=60`, hand-rolled in `slayer/search/rrf.py`); **entity hits from channels 2 and 3 are now also RRF-fused** (channel 1 contributes only to memory ranking). Memory hits are partitioned by `Memory.query is None` into `memories` (learning-only, small) and `example_queries` (query-bearing, bulky) — independent caps via `max_memories` and `max_example_queries` so bulky examples cannot crowd out small learnings. The response also echoes `resolved_input_entities` for diagnostics. Empty-input fallback returns the newest `max_memories` learning-only + newest `max_example_queries` query-bearing memories with a warning. Each indexed entity carries a `text` field rendered by `slayer/search/render.py` — named children (columns / measures / aggregations / join targets) are mentioned by name + kind only (no descriptions, since each child has its own indexed doc), while non-named children (model filters, model `sql` block, join `pairs`, aggregation `params`) are included in full. `meta` is **excluded** from indexed text (DEV-1377 hardening). Hidden models / hidden columns are skipped. 
**`datasource` filter** (DEV-1409): all four surfaces (`MCP search`, `POST /search`, `slayer search --datasource`, `SlayerClient.search`) accept an optional `datasource: Optional[str] = None`. When set, every channel pre-filters its corpus to that one datasource — entity hits only include docs rooted at it (exact name or strict dotted-path descendant); memories surface when any of their `entities` is rooted at it (memories spanning multiple datasources surface from each); BM25 / IDF / cosine corpus reflect only the filtered subset. Unknown datasource → `ValueError` (HTTP 400 on REST). Helper: `slayer.memories.resolver.canonical_id_rooted_at`. **Per-bucket ranking invariance** (DEV-1414): channel 2 runs as two kind-filtered tantivy queries (one over memory docs, one over entity docs); channel 3 partitions the embedding corpus by `entity_kind` and ranks each side independently. There is no shared candidate-pool budget across kinds, so for a fixed `(question, datasource, max_X)` the membership and order of each output bucket (`memories` / `example_queries` / `entities`) is a pure function of the corpus + question + that one cap — varying the other two caps cannot move ids in or out of the returned list nor reorder it. The kind-filtered tantivy queries are emitted as boolean queries via `tantivy.Query.boolean_query` + `tantivy.Query.term_query` (`search_index`'s new `kind_filter` / `exclude_kind` params). The in-memory tantivy index is built with `writer(num_threads=1)` so doc-id tiebreak on equal BM25 scores is deterministic across rebuilds.
 
   Sample-value snapshots cached on `Column.sampled` (v6 schema bump, no-op forward migration in `slayer/storage/v6_migration.py`); refreshed on every `slayer ingest` for table-backed models, on `slayer search refresh-samples`, on `edit_model` (column-level edits → that column; `model.filters` / `model.sql` / `source_queries` change → all columns), and lazily on `inspect_model` cache miss (best-effort write-back). sql-mode and query-backed sample-value coverage is deferred to [DEV-1377](https://linear.app/motley-ai/issue/DEV-1377). Surfaces: write side via MCP, REST (`POST /memories`, `DELETE /memories/{id}`), CLI (`slayer memory {save,forget}`), and `SlayerClient`; retrieval via MCP (`search`), REST (`POST /search`), CLI (`slayer search [--entity ...] [--query ...] [--question ...] [--max-example-queries N]`, `slayer search refresh-samples`), and `SlayerClient.search()`. See [docs/concepts/memories.md](docs/concepts/memories.md) and [docs/concepts/search.md](docs/concepts/search.md).
 

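The "exact name or strict dotted-path descendant" rule used by both the cascade delete and the `datasource` filter in the CLAUDE.md text above can be sketched as follows. `is_rooted_at` is a hypothetical stand-in: the diff names `slayer.memories.resolver.canonical_id_rooted_at` but does not show its signature, so nothing here should be read as that helper's actual API.

```python
from typing import List


def is_rooted_at(canonical_id: str, root: str) -> bool:
    """True when canonical_id equals root or is a strict dotted-path descendant.

    A plain character-prefix test would wrongly match "memory:42" against
    "memory:4"; requiring the "." separator rules that out.
    """
    return canonical_id == root or canonical_id.startswith(root + ".")


def filter_to_datasource(canonical_ids: List[str], datasource: str) -> List[str]:
    """Keep only ids rooted at one datasource, preserving input order."""
    return [cid for cid in canonical_ids if is_rooted_at(cid, datasource)]


# Hypothetical canonical ids for illustration.
ids = ["sales", "sales.orders", "sales.orders.revenue", "sales_eu.orders"]
kept = filter_to_datasource(ids, "sales")
```

Here `kept` contains the three ids under `sales` and excludes `sales_eu.orders`, which shares a character prefix but is a different datasource.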
diff --git a/docs/concepts/search.md b/docs/concepts/search.md
@@ -112,6 +112,20 @@ Entity rankings from channels 2 and 3 are RRF-fused the same way.
 Channel 1 contributes to the memory ranking only (it operates on
 memory entity tags, not on entity docs).
 
+### Per-bucket ranking invariance (DEV-1414)
+
+Each channel produces a **full per-kind ranking** — channel 2 runs as
+two kind-filtered tantivy queries (one over memory docs only, one over
+entity docs only), and channel 3 partitions the embedding corpus by
+`entity_kind` and ranks each side independently. There is no shared
+candidate-pool budget across kinds, so for a fixed
+`(question, datasource, max_X)` the membership and order of the
+returned `X` bucket (`memories` / `example_queries` / `entities`) is a
+pure function of the corpus + question + that one cap. Varying the
+other two caps cannot move an id in or out of the returned list nor
+reorder it. The `max_*` caps are pure post-fusion slice operations on
+the three independent ranked lists.
+
 ## Tool surface
 
 ```python

diff --git a/slayer/search/index.py b/slayer/search/index.py
@@ -192,7 +192,15 @@ def build_in_memory_corpus(
     """
     schema = _build_schema()
     index = tantivy.Index(schema=schema)
-    writer = index.writer()
+    # ``num_threads=1`` pins doc-id assignment to insertion order so the
+    # tantivy tiebreak (lower internal doc id wins on equal scores) is
+    # deterministic across rebuilds (DEV-1414). The default
+    # ``num_threads=0`` lets tantivy auto-pick a thread count, and with
+    # multiple writer threads the order in which threads commit their
+    # local segments determines doc-id assignment — which is
+    # non-deterministic for small in-RAM corpora that finish
+    # processing within microseconds.
+    writer = index.writer(num_threads=1)
 
     visible_models = [m for m in models if not m.hidden]
     pairs = _collect_render_pairs(
@@ -234,12 +242,39 @@ def build_in_memory_corpus(
 # ---------------------------------------------------------------------------
 
 
+def _apply_kind_filter(
+    *,
+    query: "tantivy.Query",
+    schema: "tantivy.Schema",
+    kind_filter: Optional[str],
+    exclude_kind: Optional[str],
+) -> "tantivy.Query":
+    """Wrap ``query`` in a boolean query that keeps (``Must``) or drops
+    (``MustNot``) docs whose ``kind`` field exactly equals the supplied
+    value. Returns ``query`` unchanged when neither argument is set.
+    Mutual exclusivity is validated by the caller, not here."""
+    if kind_filter is None and exclude_kind is None:
+        return query
+    target = kind_filter if kind_filter is not None else exclude_kind
+    occur = (
+        tantivy.Occur.Must if kind_filter is not None
+        else tantivy.Occur.MustNot
+    )
+    kind_term = tantivy.Query.term_query(schema, "kind", target)
+    return tantivy.Query.boolean_query([
+        (tantivy.Occur.Must, query),
+        (occur, kind_term),
+    ])
+
+
 def search_index(
     *,
     index: tantivy.Index,
     question: str,
     limit: int = 20,
     fields: Optional[List[str]] = None,
+    kind_filter: Optional[str] = None,
+    exclude_kind: Optional[str] = None,
 ) -> List[IndexHit]:
     """Run a tantivy query against ``index``.
 
@@ -250,10 +285,23 @@ def search_index(
         limit: Max hits to return.
         fields: Which schema fields to query against (default: ``["text"]``).
             Pass ``["canonical"]`` for an exact-match canonical lookup.
+        kind_filter: When set, restrict results to docs whose ``kind``
+            field exactly equals this value (e.g. ``"memory"``,
+            ``"model"``). Combined with the text query via ``Must``.
+        exclude_kind: When set, exclude docs whose ``kind`` field equals
+            this value. Combined with the text query via ``MustNot``.
+        ``kind_filter`` and ``exclude_kind`` are mutually exclusive
+        (DEV-1414): one keeps a single kind, the other drops a single
+        kind. Pass at most one; passing both raises ``ValueError``.
 
     Returns:
         List of :class:`IndexHit` in score-desc order.
     """
+    if kind_filter is not None and exclude_kind is not None:
+        raise ValueError(
+            "kind_filter and exclude_kind are mutually exclusive; pass "
+            "at most one."
+        )
     if not question or not question.strip():
         return []
     if fields is None:
@@ -262,6 +310,12 @@ def search_index(
         query = index.parse_query(question, fields)
     except (ValueError, RuntimeError):
         return []
+    query = _apply_kind_filter(
+        query=query,
+        schema=index.schema,
+        kind_filter=kind_filter,
+        exclude_kind=exclude_kind,
+    )
     searcher = index.searcher()
     raw_hits = searcher.search(query, limit).hits
     out: List[IndexHit] = []