DEV-1416: refresh memory embeddings on slayer ingest / --ingest-on-startup by ZmeiGorynych · Pull Request #128 · MotleyAI/slayer

ZmeiGorynych · 2026-05-14T15:38:05Z

Summary

_refresh_datasource_embeddings now iterates memories whose canonical entities are rooted at the datasource (via canonical_id_rooted_at) and re-embeds each one through EmbeddingService.refresh_memory. Both slayer ingest <ds> and --ingest-on-startup go through this path, so a stale embeddings.db (created without an API key, or after a manual memories.yaml edit) is repaired by the next ingest with no extra step.
Per-memory embed failures attribute as IngestionError(model_name="memory:<id>", …) in IdempotentIngestResult.errors, so startup-log inspection distinguishes memory failures from datasource-doc / model failures at a glance. Regex extraction lives in ingest_datasource_idempotent (_MEMORY_REF_RE).
list_memories raising is caught and surfaced as a single warning — the datasource pass never propagates the failure.

Linear: DEV-1416. Follow-up issue for the generic flat batcher across entity kinds: DEV-1417.

Test plan

tests/test_idempotent_ingestion.py::TestMemoryEmbeddingRefresh — six unit cases covering: rooted-only filter, hash-skip no-op on second pass, per-memory failure → model_name="memory:<id>", defensive raise tagged, embedding_search extra missing → silent no-op, list_memories raise → warning + continue.
tests/test_startup_ingest.py::TestMemoryEmbeddingsOnStartup — end-to-end through ingest_all_datasources_idempotent on a real YAMLStorage + sqlite live DB, asserts the memory embedding row exists post-pass.
poetry run pytest -m "not integration" — 2651 passed, 0 failed.
poetry run ruff check slayer/ tests/ — clean.

Out of scope

Structured embedding_status on save_memory response.
Standalone slayer embeddings refresh CLI (tracked in DEV-1415).
Generic flat batcher across entity kinds (tracked in DEV-1417).

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Memory embeddings are refreshed during startup and per-datasource ingest; stale embedding stores are auto-repaired.
Bug Fixes
- Per-memory embedding failures and persistence errors are reported and attributed to the correct memory IDs; ingest passes continue on isolated warnings.
Validation
- Name validation tightened: colons and disallowed dotted/namespace patterns are rejected.
Documentation
- Clarified memories, search channels/fusion, embedding lifecycle, refresh timing, and error reporting.
Tests
- Added tests covering memory embedding refresh, startup ingest behavior, error attribution, and name validations.
Chores
- Package version bumped to 0.6.8.

…artup Per-datasource ingest pass now re-embeds every memory whose canonical entities are rooted at the datasource. A stale embeddings.db (created without an API key, or after a manual memories.yaml edit) is repaired by the next ingest without any extra step. Per-memory embed failures attribute as IngestionError(model_name="memory:<id>", ...) so a startup log inspection can distinguish them from datasource-doc / model failures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…embeddings-not-refreshed-by-ingest-on-startup # Conflicts: # CLAUDE.md # pyproject.toml

coderabbitai · 2026-05-14T15:38:59Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e3a7ccd9-755a-4a22-a516-b83918ea7038

📥 Commits

Reviewing files that changed from the base of the PR and between 961453c and 746438e.

📒 Files selected for processing (1)

slayer/core/models.py

🚧 Files skipped from review as they are similar to previous changes (1)

slayer/core/models.py

📝 Walkthrough

Walkthrough

This PR makes idempotent per-datasource ingestion refresh embeddings for all memories canonically rooted at that datasource (including startup ingest), returns per-entity (model_name, error) warning tuples for precise error attribution, expands name-validation rules, adds tests, updates docs, and bumps the package version.

Changes

Memory embedding refresh and startup repair

Layer / File(s)	Summary
Embedding refresh implementation `slayer/engine/ingestion.py`, `slayer/embeddings/service.py`	Orchestrates three embedding refresh passes (models, datasource doc, memories); tags warnings with canonical identifiers and returns (model_name, error) tuples; persists embeddings and includes canonical IDs in save failure warnings.
Name validation refactor and tests `slayer/core/models.py`, `slayer/core/query.py`, `tests/test_models.py`	Introduce centralized substring-rule validators and `_require_non_empty_trimmed`; apply updated rejection rules (including `:`) to datasource/model/query/column names and add unit tests asserting these constraints.
Unit and integration tests for memory embeddings `tests/test_idempotent_ingestion.py`, `tests/test_startup_ingest.py`	Add `TestMemoryEmbeddingRefresh` unit suite covering memory-root filtering, idempotent noop, per-memory error attribution, exception-to-warning conversion, embedding-unavailable noop, persistence failure attribution, and `TestMemoryEmbeddingsOnStartup` end-to-end smoke test verifying memory embeddings are refreshed and persisted during startup ingest.
Documentation and version bump `CLAUDE.md`, `docs/concepts/ingestion.md`, `docs/concepts/search.md`, `pyproject.toml`	Document memory embedding refresh during per-datasource ingest and startup (including automatic repair of stale `embeddings.db`), revise memories/search guidance, and bump project version to 0.6.8.

Sequence Diagram(s)

sequenceDiagram
  participant Ingest as ingest_datasource_idempotent
  participant Refresh as _refresh_datasource_embeddings
  participant Models as _refresh_models_for_datasource
  participant Doc as _refresh_datasource_doc
  participant MemList as storage.list_memories()
  participant MemRefresh as EmbeddingService.refresh_memory()
  participant Persist as storage.save_embeddings()
  participant Errors as IdempotentIngestResult.errors

  Ingest->>Refresh: trigger embeddings refresh for datasource
  Refresh->>Models: refresh each visible model subtree
  Models-->>Refresh: warnings tagged "<ds>.<model>"
  Refresh->>Doc: refresh datasource doc embedding
  Doc-->>Refresh: warnings tagged ""
  Refresh->>MemList: list memories
  MemList-->>Refresh: memory records (filtered by canonical root)
  loop for each memory
    Refresh->>MemRefresh: refresh_memory(memory)
    MemRefresh-->>Persist: save embeddings
    Persist-->>Refresh: persist success or exception (warnings include canonical_ids)
  end
  Refresh->>Errors: return (model_name, error) tuples for ingestion mapping

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

MotleyAI/slayer#92: Overlaps on slayer/core/models.py validation changes and namespacing rules.

Suggested reviewers

AivanF

Poem

🐰 In fields of code where memories sleep,
I nudge the ingest, wake roots from deep.
Each memory tagged, each error named,
Stale embeddings mended, nothing blamed.
Hop—startup heals caches, tidy and neat!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: adding memory embeddings refresh functionality to the ingest process.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch egor/dev-1416-memory-embeddings-not-refreshed-by-ingest-on-startup

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (3)

tests/test_idempotent_ingestion.py (2)
559-563: 💤 Low value

Consider adding SONAR comment for consistency.

This async stub function must remain async def to match the embed_batch signature, even though it doesn't await. Adding # NOSONAR(S7503) would suppress the warning and maintain consistency with other test stubs in this file.
📝 Suggested addition
-        async def should_not_be_called(
+        async def should_not_be_called(  # NOSONAR(S7503) — must be `async def` to match embed_batch signature for monkeypatch
             texts: List[str], *, model: Optional[str] = None,
         ) -> List[Optional[List[float]]]:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_idempotent_ingestion.py` around lines 559 - 563, The async stub
function should_not_be_called is intentionally declared async though it doesn't
await; add the SONAR suppression comment "# NOSONAR(S7503)" on the async def
line to suppress the S7503 warning and keep it consistent with other test stubs
(leave the function body unchanged so it still appends to called and returns
[None for _ in texts]).
393-397: 💤 Low value

Consider adding SONAR comment for consistency.

Similar to the stub in test_startup_ingest.py, this async def function doesn't await anything but must remain async to match the embed_batch signature for monkeypatching. Adding # NOSONAR(S7503) would suppress the static analysis warning and align with the pattern used in test_startup_ingest.py.
📝 Suggested addition
-        async def fake_embed_batch(
+        async def fake_embed_batch(  # NOSONAR(S7503) — must be `async def` to match embed_batch signature for monkeypatch
             texts: List[str], *, model: Optional[str] = None,
         ) -> List[Optional[List[float]]]:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_idempotent_ingestion.py` around lines 393 - 397, The async stub
fake_embed_batch currently has no await but must remain async to match the
embed_batch signature for monkeypatching; add a SONAR suppression comment (e.g.,
# NOSONAR(S7503)) on the fake_embed_batch definition to mirror the pattern used
in test_startup_ingest.py and suppress the static-analysis warning while keeping
the function async for tests.
tests/test_startup_ingest.py (1)
861-864: 💤 Low value

Consider adding NOSONAR comment for consistency.

The static analysis tool flags this async function for not using await, but this is a false positive—the function must be async def to match the signature of embed_batch for monkeypatching. Similar stubs elsewhere in this file (e.g., lines 77, 144, 383, 488, 527, 559, 588, 658, 683, 704, 728) include # NOSONAR(S7503) comments to suppress this warning. Adding the same comment here would maintain consistency.
📝 Suggested addition
-        async def fake_embed_batch(
+        async def fake_embed_batch(  # NOSONAR(S7503) — must be `async def` to match embed_batch signature for monkeypatch
             texts: List[str], *, model: Optional[str] = None,
         ) -> List[Optional[List[float]]]:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_startup_ingest.py` around lines 861 - 864, Add the same NOSONAR
suppression comment to the async stub so the static analyzer stops flagging it:
in the fake_embed_batch async function (definition named fake_embed_batch)
append the comment "# NOSONAR(S7503)" to the function body or immediately after
the def line to match the other stubs in this file and preserve the required
async signature for monkeypatching.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_idempotent_ingestion.py`:
- Around line 559-563: The async stub function should_not_be_called is
intentionally declared async though it doesn't await; add the SONAR suppression
comment "# NOSONAR(S7503)" on the async def line to suppress the S7503 warning
and keep it consistent with other test stubs (leave the function body unchanged
so it still appends to called and returns [None for _ in texts]).
- Around line 393-397: The async stub fake_embed_batch currently has no await
but must remain async to match the embed_batch signature for monkeypatching; add
a SONAR suppression comment (e.g., # NOSONAR(S7503)) on the fake_embed_batch
definition to mirror the pattern used in test_startup_ingest.py and suppress the
static-analysis warning while keeping the function async for tests.

In `@tests/test_startup_ingest.py`:
- Around line 861-864: Add the same NOSONAR suppression comment to the async
stub so the static analyzer stops flagging it: in the fake_embed_batch async
function (definition named fake_embed_batch) append the comment "#
NOSONAR(S7503)" to the function body or immediately after the def line to match
the other stubs in this file and preserve the required async signature for
monkeypatching.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6d885198-0f10-4162-b1cd-b7eb0adf114f

📥 Commits

Reviewing files that changed from the base of the PR and between 16c2315 and c9bdd7c.

📒 Files selected for processing (7)

CLAUDE.md
docs/concepts/ingestion.md
docs/concepts/search.md
pyproject.toml
slayer/engine/ingestion.py
tests/test_idempotent_ingestion.py
tests/test_startup_ingest.py

…s, NOSONAR) * Embedding persist-failure warning now lists the failing rows' canonical ids so per-memory persist failures attribute to memory:<id> via the ingest IngestionError shape. * Forbid ":" in all name validators (Column, ModelMeasure, SlayerModel name + data_source, SlayerQuery, DatasourceConfig). The aggregation separator is reserved across the canonical-id namespace; this enforces the invariant the memory resolver already assumes. * Factor shared name-validation rules into _SubstringRule constants (_NO_DUNDER / _NO_DOT / _NO_COLON / _NO_FWD_SLASH / _NO_BACK_SLASH / _NO_NUL) plus _require_non_empty_trimmed, so the four name validators share one source of truth per character. * Replace the regex-based memory:<id> extraction in ingest_datasource_idempotent with structured (model_name, error_text) tuples returned from three helpers: _refresh_models_for_datasource, _refresh_datasource_doc, _refresh_memories_for_datasource. Drops the fragile string sniffing and brings _refresh_datasource_embeddings' cognitive complexity back under the Sonar gate. * Add NOSONAR(S7503) on three async test stubs that must be `async def` to match the patched signatures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

slayer/core/models.py (1)

97-103: ⚡ Quick win

Use keyword arguments in multi-parameter helper calls.

The new helper calls pass multiple arguments positionally. Please switch them to keyword arguments to match repository rules and keep call sites self-documenting.

Suggested patch

 def _validate_model_name(name: str, context: str) -> str:
@@
-    _NO_DUNDER.check(name, label)
-    _NO_DOT.check(name, label)
-    _NO_COLON.check(name, label)
+    _NO_DUNDER.check(name=name, context=label)
+    _NO_DOT.check(name=name, context=label)
+    _NO_COLON.check(name=name, context=label)

 def _validate_column_name(name: str, context: str) -> str:
@@
-    _NO_DOT.check(name, label)
-    _NO_COLON.check(name, label)
+    _NO_DOT.check(name=name, context=label)
+    _NO_COLON.check(name=name, context=label)

 class SlayerModel(BaseModel):
@@
-        _NO_NUL.check(v, label)
-        _NO_FWD_SLASH.check(v, label)
-        _NO_BACK_SLASH.check(v, label)
-        _NO_DOT.check(v, label)
-        _NO_COLON.check(v, label)
+        _NO_NUL.check(name=v, context=label)
+        _NO_FWD_SLASH.check(name=v, context=label)
+        _NO_BACK_SLASH.check(name=v, context=label)
+        _NO_DOT.check(name=v, context=label)
+        _NO_COLON.check(name=v, context=label)

 class DatasourceConfig(BaseModel):
@@
-        _require_non_empty_trimmed(v, label)
-        _NO_NUL.check(v, label)
-        _NO_FWD_SLASH.check(v, label)
-        _NO_BACK_SLASH.check(v, label)
-        _NO_DOT.check(v, label)
-        _NO_COLON.check(v, label)
+        _require_non_empty_trimmed(v=v, context=label)
+        _NO_NUL.check(name=v, context=label)
+        _NO_FWD_SLASH.check(name=v, context=label)
+        _NO_BACK_SLASH.check(name=v, context=label)
+        _NO_DOT.check(name=v, context=label)
+        _NO_COLON.check(name=v, context=label)

As per coding guidelines **/*.py: Use keyword arguments for functions with more than 1 parameter.

Also applies to: 112-114, 430-435, 706-711

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/core/models.py` around lines 97 - 103, Change positional
multi-argument helper calls to use keyword arguments: in _validate_model_name,
update the three helper invocations (_NO_DUNDER.check, _NO_DOT.check,
_NO_COLON.check) to call check(name=..., label=...) instead of positional args;
also apply the same change to the other _*.check call sites in this module (the
other occurrences referenced in the review) so every check(...) call with more
than one parameter uses explicit keyword argument names.

slayer/core/query.py (1)

330-331: ⚡ Quick win

Use keyword arguments when delegating name validation.

Please call _validate_model_name with keyword arguments for consistency with repo style rules.
Suggested patch
-        return _validate_model_name(v, "Query")
+        return _validate_model_name(name=v, context="Query")
As per coding guidelines **/*.py: Use keyword arguments for functions with more than 1 parameter.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/core/query.py` around lines 330 - 331, Replace the positional call
_validate_model_name(v, "Query") with a keyword-argument call so it follows repo
style; call _validate_model_name using the parameter names from its signature
(for example: _validate_model_name(name=v, model_name="Query") or similar),
ensuring you pass the current variable v as the name and "Query" as the
model/context argument.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@slayer/engine/ingestion.py`:
- Around line 1226-1248: The helper currently builds models_in_ds by iterating
storage._list_all_model_identities and calling storage.get_model, but it still
proceeds to call _refresh_datasource_doc even when enumeration or per-model
get_model errors occurred or when hidden models were appended; this can
overwrite a correct datasource doc with a partial or unfiltered one. Fix by:
stop updating the datasource doc if storage._list_all_model_identities raised or
any get_model call failed (i.e., if the function caught exceptions and appended
to warnings), and additionally filter out hidden/ignored models from
models_in_ds before returning or refreshing (use the same visibility/filtering
logic as service.refresh_model_subtree); apply the same guard/filters in the
analogous block at the later range (lines referenced in the comment) so
_refresh_datasource_doc is only called with a complete, filtered models_in_ds.

---

Nitpick comments:
In `@slayer/core/models.py`:
- Around line 97-103: Change positional multi-argument helper calls to use
keyword arguments: in _validate_model_name, update the three helper invocations
(_NO_DUNDER.check, _NO_DOT.check, _NO_COLON.check) to call check(name=...,
label=...) instead of positional args; also apply the same change to the other
_*.check call sites in this module (the other occurrences referenced in the
review) so every check(...) call with more than one parameter uses explicit
keyword argument names.

In `@slayer/core/query.py`:
- Around line 330-331: Replace the positional call _validate_model_name(v,
"Query") with a keyword-argument call so it follows repo style; call
_validate_model_name using the parameter names from its signature (for example:
_validate_model_name(name=v, model_name="Query") or similar), ensuring you pass
the current variable v as the name and "Query" as the model/context argument.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9ceaf4b8-7e84-4dd2-a449-dbbd6860162c

📥 Commits

Reviewing files that changed from the base of the PR and between c9bdd7c and 961453c.

📒 Files selected for processing (7)

slayer/core/models.py
slayer/core/query.py
slayer/embeddings/service.py
slayer/engine/ingestion.py
tests/test_idempotent_ingestion.py
tests/test_models.py
tests/test_startup_ingest.py

🚧 Files skipped from review as they are similar to previous changes (2)

tests/test_idempotent_ingestion.py
tests/test_startup_ingest.py

Switch the new _NO_*.check() and _require_non_empty_trimmed() call sites in slayer/core/models.py to keyword form, per the global rule about multi-parameter helper calls. Mechanical change; existing validator tests pin behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sonarqubecloud · 2026-05-14T16:15:58Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

ZmeiGorynych and others added 2 commits May 14, 2026 17:28

Merge remote-tracking branch 'origin/main' into egor/dev-1416-memory-…

c9bdd7c

…embeddings-not-refreshed-by-ingest-on-startup # Conflicts: # CLAUDE.md # pyproject.toml

coderabbitai Bot reviewed May 14, 2026

View reviewed changes

Comment thread slayer/engine/ingestion.py

ZmeiGorynych merged commit 1245dde into main May 14, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DEV-1416: refresh memory embeddings on slayer ingest / --ingest-on-startup#128

DEV-1416: refresh memory embeddings on slayer ingest / --ingest-on-startup#128
ZmeiGorynych merged 4 commits into
mainfrom
egor/dev-1416-memory-embeddings-not-refreshed-by-ingest-on-startup

ZmeiGorynych commented May 14, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 14, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

sonarqubecloud Bot commented May 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ZmeiGorynych commented May 14, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Out of scope

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sonarqubecloud Bot commented May 14, 2026

Quality Gate passed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ZmeiGorynych commented May 14, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 14, 2026 •

edited

Loading