2 changes: 1 addition & 1 deletion .claude/skills/slayer-overview.md
@@ -32,7 +32,7 @@ Datasources: `create_datasource`, `list_datasources`, `describe_datasource` (inc
Ingestion: `ingest_datasource_models`
Schema drift: `validate_models` (read-only diff against live schema; surfaces `SchemaDriftError` cleanups)
Memory write side: `save_memory`, `forget_memory` (per-entity learnings indexed by canonical entity strings — see [memories.md](../../docs/concepts/memories.md))
Search: `search` (three-channel: entity-overlap BM25 over memories + tantivy full-text over memories ∪ entities + optional dense embedding similarity, RRF-fused; embeddings require the `embedding_search` extra and degrade gracefully when unavailable; partitions query-bearing memories into `example_queries` — see [search.md](../../docs/concepts/search.md))
Search: `search` (three-channel: entity-overlap BM25 over memories + tantivy full-text + optional dense embedding similarity, RRF-fused per kind so each output bucket — `memories` / `example_queries` / `entities` — has membership/order invariant under the other buckets' caps; embeddings require the `embedding_search` extra and degrade gracefully when unavailable; partitions query-bearing memories into `example_queries` — see [search.md](../../docs/concepts/search.md))

## Package Structure

2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -93,7 +93,7 @@ poetry run ruff check slayer/ tests/

- **Memories + semantic search** (DEV-1357 + DEV-1375): An agent-memory layer indexed by canonical entity strings. Two write-side tools — `save_memory(learning, linked_entities)` and `forget_memory(id)` — record per-entity notes (optionally bundled with an example `SlayerQuery`). Retrieval is unified into a single `search(entities, query, question, max_memories=5, max_example_queries=2, max_entities=5)` tool — there is no separate `recall_memories` surface. `linked_entities` accepts either a list of entity strings (resolved strictly) or an inline `SlayerQuery`/dict (entities auto-extracted; warnings non-fatal; the query is persisted on the memory). The canonical form is exactly one of `<ds>`, `<ds>.<model>`, `<ds>.<model>.<leaf>` (≤ 3 dotted segments after canonicalisation). Aggregation suffixes are stripped (`revenue:sum` → `<ds>.<model>.revenue`); `*:count` collapses to the source model; multi-hop dotted paths keep only the leaf (`orders.customers.regions.name` → `{<orders.ds>.orders, <regions.ds>.regions.name}`). The resolver lives in `slayer/memories/resolver.py`; the unified `Memory` row + storage primitives are concrete on `StorageBackend` (ID format / entity-intersection filter), with backends only implementing the row-shaped CRUD + a one-line `_next_memory_seq` that derives the next id from the existing corpus. `inspect_model` auto-renders a `Learnings` section listing only memories where `query is None`; query-bearing memories surface only via `search` (in the `example_queries` bucket). **Memory ids** (DEV-1405): positive ints that increase monotonically while the corpus grows; YAMLStorage derives the next id from the last row of `memories.yaml`, SQLiteStorage from `SELECT MAX(id) + 1 FROM memories`. Ids of deleted memories may be reused by future saves; `delete_memory` already cascades to the matching embedding row so reuse strands no data.

`search` runs up to three parallel channels merged by RRF (DEV-1386 adds the third). **Channel 1** is entity-overlap BM25 over memories (`slayer/memories/ranker.py` using `rank_bm25.BM25Plus`, DEV-1365) — a precisely-tagged memory outranks one with a long entity list that overlaps incidentally. **Channel 2** is a fresh in-memory tantivy index built per call over memories ∪ entities (datasources / non-hidden models / non-hidden columns / named measures / aggregations), using tantivy's `en_stem` analyzer (Porter stemmer + default tokenizer, splits on `_` and `.`). **Channel 3** (DEV-1386, optional via the `embedding_search` pip extra) is dense embedding similarity over the same memories ∪ entities corpus, computed numpy-only against rows persisted in a sidecar `embeddings` table keyed by `(canonical_id, embedding_model_name)`. The SQL lives in `slayer/storage/sidecar_embedding_store.py` (DEV-1405) — both `SQLiteStorage` and `YAMLStorage` instantiate a `SidecarEmbeddingStore` and forward all embedding CRUD to it. SQLiteStorage points it at the main `.db` file; YAMLStorage points it at a dedicated `<base_dir>/embeddings.db` sidecar so the YAML store keeps its git-diffable shape while embeddings live in a fast indexed store. **Cascade semantics** (DEV-1405 fix): `delete_embeddings_for_canonical(canonical_id_prefix=X)` matches the canonical id exactly OR as a strict dotted-path descendant (`X + "." + …`) — never as a character prefix, so deleting `memory:4` no longer also nukes `memory:42`. **Hot-path batching** (DEV-1405): `StorageBackend` exposes `save_embeddings(rows)` and `get_embeddings_for_canonical_ids(canonical_ids, embedding_model_name)` with default M-iteration impls, overridden by the bundled backends to issue single batched round-trips through `SidecarEmbeddingStore.save_many` / `get_many`; `EmbeddingService._apply_pending` uses them so one `refresh_model_subtree` issues exactly one batched read + one batched write regardless of subtree size. 
The active embedding model is read from `SLAYER_EMBEDDING_MODEL` (default `openai/text-embedding-3-small`) and dispatched via litellm; provider credentials are read by litellm directly (`OPENAI_API_KEY`, etc.). When the extra is not installed, the model has no rows, or the query embedding call fails, channel 3 contributes nothing and emits a single warning into `SearchResponse.warnings`; tantivy + BM25 continue to work. Refresh runs inline on `slayer ingest` / `edit_model` / `save_memory` and skips the litellm call when the rendered `content_hash` matches the stored row (cheap idempotent re-runs). Per-entity embed failures are non-fatal — search degrades gracefully. Memory rankings from every active channel are fused via Reciprocal Rank Fusion (`k=60`, hand-rolled in `slayer/search/rrf.py`); **entity hits from channels 2 and 3 are now also RRF-fused** (channel 1 contributes only to memory ranking). Memory hits are partitioned by `Memory.query is None` into `memories` (learning-only, small) and `example_queries` (query-bearing, bulky) — independent caps via `max_memories` and `max_example_queries` so bulky examples cannot crowd out small learnings. The response also echoes `resolved_input_entities` for diagnostics. Empty-input fallback returns the newest `max_memories` learning-only + newest `max_example_queries` query-bearing memories with a warning. Each indexed entity carries a `text` field rendered by `slayer/search/render.py` — named children (columns / measures / aggregations / join targets) are mentioned by name + kind only (no descriptions, since each child has its own indexed doc), while non-named children (model filters, model `sql` block, join `pairs`, aggregation `params`) are included in full. `meta` is **excluded** from indexed text (DEV-1377 hardening). Hidden models / hidden columns are skipped. 
**`datasource` filter** (DEV-1409): all four surfaces (`MCP search`, `POST /search`, `slayer search --datasource`, `SlayerClient.search`) accept an optional `datasource: Optional[str] = None`. When set, every channel pre-filters its corpus to that one datasource — entity hits only include docs rooted at it (exact name or strict dotted-path descendant); memories surface when any of their `entities` is rooted at it (memories spanning multiple datasources surface from each); BM25 / IDF / cosine corpus reflect only the filtered subset. Unknown datasource → `ValueError` (HTTP 400 on REST). Helper: `slayer.memories.resolver.canonical_id_rooted_at`. **Per-bucket ranking invariance** (DEV-1414): channel 2 runs as two kind-filtered tantivy queries (one over memory docs, one over entity docs); channel 3 partitions the embedding corpus by `entity_kind` and ranks each side independently. There is no shared candidate-pool budget across kinds, so for a fixed `(question, datasource, max_X)` the membership and order of each output bucket (`memories` / `example_queries` / `entities`) is a pure function of the corpus + question + that one cap — varying the other two caps cannot move ids in or out of the returned list nor reorder it. The kind-filtered tantivy queries are emitted as boolean queries via `tantivy.Query.boolean_query` + `tantivy.Query.term_query` (`search_index`'s new `kind_filter` / `exclude_kind` params). The in-memory tantivy index is built with `writer(num_threads=1)` so doc-id tiebreak on equal BM25 scores is deterministic across rebuilds.
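
The "exact match or strict dotted-path descendant" rule, which the text applies to both the cascade delete and the `datasource` filter, can be illustrated with a minimal sketch. The function `is_rooted_at` and the sample ids below are hypothetical, written only for this note; the real `canonical_id_rooted_at` helper and `delete_embeddings_for_canonical` are not shown here and may differ in signature and behavior.

```python
from typing import List


def is_rooted_at(canonical_id: str, root: str) -> bool:
    """Hypothetical illustration: True when canonical_id equals root or is a
    strict dotted-path descendant of it (root + "." + anything)."""
    return canonical_id == root or canonical_id.startswith(root + ".")


# Made-up sample ids, chosen to show where a plain character-prefix test fails.
ids: List[str] = [
    "memory:4",
    "memory:42",
    "sales",
    "sales.orders",
    "sales.orders.revenue",
    "sales_eu.orders",
]

# "memory:42" shares the character prefix "memory:4" but is not matched.
print([i for i in ids if is_rooted_at(i, "memory:4")])
# → ['memory:4']

# "sales_eu.orders" shares the character prefix "sales" but is not rooted at it.
print([i for i in ids if is_rooted_at(i, "sales")])
# → ['sales', 'sales.orders', 'sales.orders.revenue']
```

Appending the `.` separator before the prefix test is what distinguishes a path descendant from an id that merely begins with the same characters.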

Sample-value snapshots cached on `Column.sampled` (v6 schema bump, no-op forward migration in `slayer/storage/v6_migration.py`); refreshed on every `slayer ingest` for table-backed models, on `slayer search refresh-samples`, on `edit_model` (column-level edits → that column; `model.filters` / `model.sql` / `source_queries` change → all columns), and lazily on `inspect_model` cache miss (best-effort write-back). sql-mode and query-backed sample-value coverage is deferred to [DEV-1377](https://linear.app/motley-ai/issue/DEV-1377). Surfaces: write side via MCP, REST (`POST /memories`, `DELETE /memories/{id}`), CLI (`slayer memory {save,forget}`), and `SlayerClient`; retrieval via MCP (`search`), REST (`POST /search`), CLI (`slayer search [--entity ...] [--query ...] [--question ...] [--max-example-queries N]`, `slayer search refresh-samples`), and `SlayerClient.search()`. See [docs/concepts/memories.md](docs/concepts/memories.md) and [docs/concepts/search.md](docs/concepts/search.md).

14 changes: 14 additions & 0 deletions docs/concepts/search.md
@@ -112,6 +112,20 @@ Entity rankings from channels 2 and 3 are RRF-fused the same way.
Channel 1 contributes to the memory ranking only (it operates on
memory entity tags, not on entity docs).

### Per-bucket ranking invariance (DEV-1414)

Each channel produces a **full per-kind ranking** — channel 2 runs as
two kind-filtered tantivy queries (one over memory docs only, one over
entity docs only), and channel 3 partitions the embedding corpus by
`entity_kind` and ranks each side independently. There is no shared
candidate-pool budget across kinds, so for a fixed
`(question, datasource, max_X)` the membership and order of the
returned `X` bucket (`memories` / `example_queries` / `entities`) is a
pure function of the corpus + question + that one cap. Varying the
other two caps can neither move an id into or out of the returned list
nor reorder it. The `max_*` caps are pure post-fusion slice operations
on the three independent ranked lists.
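
The invariance can be seen in a small sketch, assuming the common RRF formulation `score(id) = Σ 1 / (k + rank)` with 1-based ranks and `k=60`. The names `rrf_fuse` and `toy_search`, the two-bucket simplification (the real response has three buckets), and the sample rankings are all hypothetical; the actual code in `slayer/search/rrf.py` may differ.

```python
from collections import defaultdict
from typing import Dict, List

RRF_K = 60  # constant quoted in the text; 1-based ranks are an assumption


def rrf_fuse(rankings: List[List[str]], k: int = RRF_K) -> List[str]:
    """Fuse ranked id lists with Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Break score ties by id so this sketch is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def toy_search(
    per_kind_rankings: Dict[str, List[List[str]]],
    max_memories: int,
    max_entities: int,
) -> Dict[str, List[str]]:
    """Fuse each kind on its own, then apply each cap as a plain slice."""
    fused = {kind: rrf_fuse(r) for kind, r in per_kind_rankings.items()}
    return {
        "memories": fused["memory"][:max_memories],
        "entities": fused["entity"][:max_entities],
    }


# Made-up per-channel rankings, already split by kind.
rankings = {
    "memory": [["m1", "m2", "m3"], ["m2", "m1"], ["m2", "m3", "m1"]],
    "entity": [["e1", "e2"], ["e2", "e3"]],
}

small = toy_search(rankings, max_memories=2, max_entities=1)
large = toy_search(rankings, max_memories=2, max_entities=3)
print(small["memories"], large["memories"])  # → ['m2', 'm1'] ['m2', 'm1']
print(large["entities"])                     # → ['e2', 'e1', 'e3']
```

Because no candidate budget is shared between kinds, changing `max_entities` never touches the list that `max_memories` slices.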

## Tool surface

```python
56 changes: 55 additions & 1 deletion slayer/search/index.py
@@ -192,7 +192,15 @@ def build_in_memory_corpus(
    """
    schema = _build_schema()
    index = tantivy.Index(schema=schema)
    writer = index.writer()
    # `num_threads=1` pins doc-id assignment to insertion order so the
    # tantivy tiebreak (lower internal doc id wins on equal scores) is
    # deterministic across rebuilds (DEV-1414). The default
    # ``num_threads=0`` lets tantivy auto-pick a thread count, and with
    # multiple writer threads the order in which threads commit their
    # local segments determines doc-id assignment — which is
    # non-deterministic for small in-RAM corpora that finish
    # processing within microseconds.
    writer = index.writer(num_threads=1)

    visible_models = [m for m in models if not m.hidden]
    pairs = _collect_render_pairs(
@@ -234,12 +242,39 @@ def build_in_memory_corpus(
# ---------------------------------------------------------------------------


def _apply_kind_filter(
    *,
    query: "tantivy.Query",
    schema: "tantivy.Schema",
    kind_filter: Optional[str],
    exclude_kind: Optional[str],
) -> "tantivy.Query":
    """Wrap ``query`` in a boolean query that ``Must`` includes (or
    ``MustNot`` excludes) docs whose ``kind`` field exactly equals the
    supplied value. Returns ``query`` unchanged when neither argument
    is set. The caller has already validated mutual exclusivity."""
    if kind_filter is None and exclude_kind is None:
        return query
    target = kind_filter if kind_filter is not None else exclude_kind
    occur = (
        tantivy.Occur.Must if kind_filter is not None
        else tantivy.Occur.MustNot
    )
    kind_term = tantivy.Query.term_query(schema, "kind", target)
    return tantivy.Query.boolean_query([
        (tantivy.Occur.Must, query),
        (occur, kind_term),
    ])


def search_index(
    *,
    index: tantivy.Index,
    question: str,
    limit: int = 20,
    fields: Optional[List[str]] = None,
    kind_filter: Optional[str] = None,
    exclude_kind: Optional[str] = None,
) -> List[IndexHit]:
    """Run a tantivy query against ``index``.

@@ -250,10 +285,23 @@ def search_index(
        limit: Max hits to return.
        fields: Which schema fields to query against (default: ``["text"]``).
            Pass ``["canonical"]`` for an exact-match canonical lookup.
        kind_filter: When set, restrict results to docs whose ``kind``
            field exactly equals this value (e.g. ``"memory"``,
            ``"model"``). Combined with the text query via ``Must``.
        exclude_kind: When set, exclude docs whose ``kind`` field equals
            this value. Combined with the text query via ``MustNot``.
            ``kind_filter`` and ``exclude_kind`` are mutually exclusive
            (DEV-1414): one is for keeping a single kind, the other for
            dropping a single kind. Pass at most one.

    Returns:
        List of :class:`IndexHit` in score-desc order.
    """
    if kind_filter is not None and exclude_kind is not None:
        raise ValueError(
            "kind_filter and exclude_kind are mutually exclusive; pass "
            "at most one."
        )
    if not question or not question.strip():
        return []
    if fields is None:
@@ -262,6 +310,12 @@ def search_index(
        query = index.parse_query(question, fields)
    except (ValueError, RuntimeError):
        return []
    query = _apply_kind_filter(
        query=query,
        schema=index.schema,
        kind_filter=kind_filter,
        exclude_kind=exclude_kind,
    )
    searcher = index.searcher()
    raw_hits = searcher.search(query, limit).hits
    out: List[IndexHit] = []