Migrate embedder configuration from Django settings to PipelineSettings by JSv4 · Pull Request #896 · Open-Source-Legal/OpenContracts

JSv4 · 2026-02-17T14:25:33Z

Summary

This PR migrates embedder configuration from Django settings (DEFAULT_EMBEDDER, DEFAULT_EMBEDDERS_BY_FILETYPE) to the database-backed PipelineSettings singleton. This enables dynamic embedder configuration without requiring application restarts.

Key Changes

Removed Django settings dependency: Eliminated imports of django.conf.settings for embedder configuration across the codebase
New utility function: Added get_default_embedder_path() to centralize retrieval of the default embedder path from PipelineSettings
Simplified embedder resolution:
- get_default_embedder() now uses get_default_embedder_path() internally
- get_default_embedder_for_filetype() now delegates to get_preferred_embedder() which reads from PipelineSettings
Updated all consumers: Modified all code paths that previously read from Django settings to use the new utility functions:
- embeddings_task.py: Uses get_default_embedder_path() instead of settings.DEFAULT_EMBEDDER
- core_vector_stores.py: Global search now reads from PipelineSettings
- corpuses/models.py: Corpus creation freezes embedder from PipelineSettings
- backfill_default_embeddings.py: Management command uses new utility
- unified_agent_conversation.py: WebSocket consumer uses new utility
- GraphQL queries: Semantic search uses new utility
Updated system checks: check_default_embedder_consistency() now reads from PipelineSettings and provides updated guidance for configuration
Removed dead config: Deleted DEFAULT_EMBEDDERS_BY_FILETYPE from base.py and test.py (no longer referenced by any runtime code)
Updated tests: Modified test mocks to patch get_default_embedder_path() instead of Django settings; migrated all test assertions from settings.DEFAULT_EMBEDDER to get_default_embedder_path() to test the actual production API

Review Feedback Addressed

Fixed async ORM bug: Wrapped get_default_embedder_path() in database_sync_to_async() inside _pick_document_embedder to prevent SynchronousOnlyOperation
Fixed stale log message: Updated error message in pipeline/utils.py to reference PipelineSettings instead of "settings"
Removed dead DEFAULT_EMBEDDERS_BY_FILETYPE: Deleted from settings files and documentation
Migrated test assertions: Replaced 13 occurrences of settings.DEFAULT_EMBEDDER across 5 test files with get_default_embedder_path()
Updated documentation nits: Cleaned up comments and docstrings referencing DEFAULT_EMBEDDER as a settings constant

Implementation Details

The get_default_embedder_path() function imports PipelineSettings locally to avoid circular imports
All existing functionality is preserved; this is purely a configuration source migration
The change maintains backward compatibility in behavior while improving configurability

…singleton Several production code paths still read the default embedder path directly from Django settings (settings.DEFAULT_EMBEDDER) instead of the PipelineSettings database singleton, creating a split-brain bug where runtime changes to the default embedder via GraphQL mutations would not take full effect. Changes: - Add get_default_embedder_path() utility in pipeline/utils.py that returns the path string from the PipelineSettings singleton - Refactor get_default_embedder() to use get_default_embedder_path() - Simplify get_default_embedder_for_filetype() to delegate to get_preferred_embedder() (was reading from removed settings.DEFAULT_EMBEDDERS_BY_FILETYPE) - Update all production consumers: - embeddings_task.py (_apply_dual_embedding_strategy) - core_vector_stores.py (global_search) - corpuses/models.py (Corpus.save embedder freeze) - corpuses/checks.py (startup consistency check) - backfill_default_embeddings.py (management command) - queries.py (GraphQL semantic search resolver) - unified_agent_conversation.py (WebSocket embedder picker) - Update tests to mock get_default_embedder_path instead of settings.DEFAULT_EMBEDDER https://claude.ai/code/session_01XdTX294CLzLHxrnaC5sodd

claude · 2026-02-17T14:27:36Z

PR Review: Migrate embedder configuration from Django settings to PipelineSettings

This is a clean, well-scoped migration that correctly moves embedder configuration from static Django settings to the database-backed PipelineSettings singleton. The overall approach is sound. Here are my findings:

Correctness & Logic

Minor issue: redundant or None in corpuses/models.py

# corpuses/models.py line ~377
default_embedder = get_default_embedder_path() or None

get_default_embedder_path() already returns "" (empty string) when unconfigured, and the subsequent if not self.preferred_embedder and default_embedder: guard already handles falsy values correctly. The or None is a no-op that adds noise without changing behavior. This was likely added to mirror the previous getattr(settings, "DEFAULT_EMBEDDER", None) pattern, but it's now unnecessary.

Behaviour change in get_default_embedder_for_filetype()

The old implementation looked up by mimetype in settings.DEFAULT_EMBEDDERS_BY_FILETYPE, which was a separate dict. The new implementation delegates to get_preferred_embedder(mimetype), which reads PipelineSettings.preferred_embedders. These should be populated equivalently by the migration/sync command, but the docstring and test now assert this delegation without testing the actual PipelineSettings DB path. Existing tests for test_backfill_default_embeddings.py still use @override_settings(DEFAULT_EMBEDDER=...), which will no longer affect behaviour through the new code path — see the test concern below.

Test Coverage Gaps

test_backfill_default_embeddings.py still uses @override_settings(DEFAULT_EMBEDDER=...)

This test file was not updated in this PR and uses the old mechanism:

@override_settings(DEFAULT_EMBEDDER="test.embedder.path")
def test_...(self):
    call_command("backfill_default_embeddings", ...)

But the command now calls get_default_embedder_path(), which reads from PipelineSettings in the database — not from Django settings. The @override_settings decorator will have no effect on the new code path. These tests will either silently pass against whatever is in the test DB's PipelineSettings, or fail unexpectedly if the DB has no embedder configured.

Similarly, test_personal_corpus.py and test_django_annotation_vector_store.py still reference settings.DEFAULT_EMBEDDER heavily and use @override_settings — these tests will be broken or misleading after this change.

Missing test: get_default_embedder_path() itself

The new utility function get_default_embedder_path() has no direct unit test. A simple test verifying it reads from PipelineSettings (and returns "" when unconfigured) would improve confidence.

Performance

DB hit on every call to get_default_embedder_path()

PipelineSettings.get_instance() uses a 5-minute cache, so this is already mitigated. However, the checks.py system check now calls get_default_embedder_path() at startup — this is fine since checks are a one-time operation, but it's worth confirming the cache is warm before Celery workers start dispatching tasks.

global_search now imports two functions instead of one

from opencontractserver.pipeline.utils import (
    get_default_embedder,
    get_default_embedder_path,
)

Minor style note: since get_default_embedder internally calls get_default_embedder_path(), there's a redundant double-lookup in global_search (one for the path, one for the class). This was the same before, so it's not a regression, but it's something to note.

Architecture / Design

Inline imports for circular import avoidance

The inline from opencontractserver.documents.models import PipelineSettings inside get_default_embedder_path() is the right call to avoid circular imports, and the comment correctly documents the reason. No concerns here.

get_default_embedder_for_filetype() is now a trivial alias

Now that this function is a one-liner delegating to get_preferred_embedder(), consider whether the alias is still needed or if call sites should just use get_preferred_embedder() directly to avoid indirection. This is a minor DRY consideration.

Summary

Category	Severity	Item
Bug	Medium	`test_backfill_default_embeddings.py` `@override_settings` no longer controls the new code path
Bug	Medium	`test_personal_corpus.py` and `test_django_annotation_vector_store.py` still rely on `settings.DEFAULT_EMBEDDER`
Style	Low	Redundant `or None` in `corpuses/models.py:377`
Coverage	Low	No direct test for `get_default_embedder_path()`
Design	Low	`get_default_embedder_for_filetype()` is now a trivial pass-through; worth evaluating if the alias adds value

The core migration logic is correct and well-structured. The main concern is that several test files that were not updated in this PR will now silently test the wrong thing after the settings source changes. Those tests should be updated to either mock get_default_embedder_path (like the tests that were updated) or configure PipelineSettings directly in their test setup.

…e-defaults-XDUWa

Replace settings.DEFAULT_EMBEDDER with get_default_embedder_path() to complete the migration to the database-backed PipelineSettings singleton.

@patch

Replace decorative @override_settings(DEFAULT_EMBEDDER=...) with @patch targeting the actual code path through get_default_embedder_path().

@patch

Replace @override_settings(DEFAULT_EMBEDDER=...) with @patch targeting the actual code path through get_default_embedder_path() in corpus_tasks.

Replace settings.DEFAULT_EMBEDDER value lookups with get_default_embedder_path() to match what the vector store search code now reads from PipelineSettings.

get_default_embedder_path() returns "" when unconfigured, and the subsequent 'if not ... and default_embedder:' guard handles falsy values correctly without the conversion to None.

@patch

Since get_default_embedder_path is imported inside function bodies (not at module level), @patch must target the source module opencontractserver.pipeline.utils, not the consuming module.

claude · 2026-02-19T09:14:35Z

PR Review: Migrate embedder configuration to PipelineSettings

Overall this is a clean, well-scoped migration with good test coverage. The approach is sound — removing Django settings as the runtime source of truth in favour of the database-backed singleton is the right direction. A few issues need addressing before merge.

Bug (Critical) — Sync ORM call inside an async method

File: config/websocket/consumers/unified_agent_conversation.py, line 488

async def _pick_document_embedder(self) -> str | None:
    ...
    paths = await database_sync_to_async(get_embedder_paths)()

    if paths:
        return paths[0]
    else:
        from opencontractserver.pipeline.utils import get_default_embedder_path
        return get_default_embedder_path()   # ← called raw inside async def

get_default_embedder_path() calls PipelineSettings.get_instance(), which executes transaction.atomic() and get_or_create() — synchronous ORM operations. Before this PR, the equivalent call was settings.DEFAULT_EMBEDDER, a plain dict lookup that was safely callable from any context. The new implementation will raise SynchronousOnlyOperation at runtime whenever the else branch is taken (i.e., a document has no existing structural-annotation embeddings).

Fix: wrap the call:

return await database_sync_to_async(get_default_embedder_path)()

Bug (Minor) — Stale log message

File: opencontractserver/pipeline/utils.py, line 367

logger.error("No default embedder specified in settings")

This message still says "in settings" after the config has been moved to PipelineSettings. Should be updated to something like "No default embedder configured in PipelineSettings".

Dead Code — `DEFAULT_EMBEDDERS_BY_FILETYPE` setting

Files: config/settings/base.py, config/settings/test.py

get_default_embedder_for_filetype() previously read from settings.DEFAULT_EMBEDDERS_BY_FILETYPE. It now delegates entirely to get_preferred_embedder(), which reads from PipelineSettings.preferred_embedders. That field is seeded from PREFERRED_EMBEDDERS (not DEFAULT_EMBEDDERS_BY_FILETYPE), so the DEFAULT_EMBEDDERS_BY_FILETYPE entries in both settings files are now unreachable dead configuration. Per the project's no-dead-code rule, these should be removed (or, if the intent is to migrate this data to PipelineSettings.preferred_embedders, the migrate_pipeline_settings command should handle it).

Incomplete test migration — `settings.DEFAULT_EMBEDDER` still used as an assertion value

Several test files that were not updated in this PR compare against settings.DEFAULT_EMBEDDER directly:

File	Lines
`test_embedder_management.py`	39, 44, 736
`test_dual_embeddings.py`	90, 104, 128, 169, 305, 343, 440
`test_pydantic_ai_agents.py`	233
`test_corpus_embeddings.py`	73
`test_semantic_search_graphql.py`	69

These pass today because PipelineSettings.get_instance() bootstraps its default_embedder from settings.DEFAULT_EMBEDDER on first access — so the two values happen to be equal. The tests therefore don't actually verify that the production path reads from PipelineSettings; they verify a transitive equality. This creates a fragile coupling: if someone nulls out settings.DEFAULT_EMBEDDER in the test config (which would be consistent with full deprecation), several tests would silently start testing against an empty string.

The tests in test_embedder_management.py lines 36–44 are the sharpest example — test_preferred_embedder_auto_populated_from_default and test_created_with_embedder_set_on_creation should be updated to assert against a known value seeded into PipelineSettings directly, similar to the pattern already used in the updated test_auto_population_uses_current_default.

Documentation nits

config/graphql/queries.py lines 2197 and 2219 still reference DEFAULT_EMBEDDER by name in inline comments. Not blocking, but worth cleaning up for consistency.
opencontractserver/tests/test_backfill_default_embeddings.py docstring at line 4 still says "Tests the command that backfills DEFAULT_EMBEDDER embeddings" — same minor inconsistency.

What's good

The new get_default_embedder_path() helper is a clean single point of abstraction, and get_default_embedder() correctly delegates to it (DRY improvement).
The system check in checks.py correctly guards against the documents_pipelinesettings table not yet existing, preventing spurious errors during initial migrations.
get_default_embedder_path() or None in corpus_tasks.py correctly handles the empty-string-vs-None contract difference between the old and new APIs.
Test patches target the correct import location (where the symbol is used, not where it's defined).

- Wrap get_default_embedder_path() in database_sync_to_async inside async _pick_document_embedder to prevent SynchronousOnlyOperation - Update log message to reference PipelineSettings instead of settings - Remove dead DEFAULT_EMBEDDERS_BY_FILETYPE from settings and docs

…_embedder_path() Replace fragile transitive coupling through Django settings with direct calls to the production API across 5 test files (13 occurrences).

…of DEFAULT_EMBEDDER

claude · 2026-02-19T12:15:35Z

PR Review: Migrate embedder configuration to PipelineSettings

The direction of this change is sound — centralizing embedder configuration in the database-backed PipelineSettings singleton enables runtime reconfiguration, which is a clear improvement over requiring a server restart. The breadth of changes is well-coordinated and the test updates are systematic. A few issues worth addressing before merging:

Bug / Correctness

1. Hardcoded database table name in checks.py

In opencontractserver/corpuses/checks.py:

if "documents_pipelinesettings" not in tables:
    return errors

This string is derived from the app label + model name and will silently break (returning no warnings at all) if either ever changes. The function already uses introspection for the column check — the table guard should too:

from opencontractserver.documents.models import PipelineSettings
if PipelineSettings._meta.db_table not in tables:
    return errors

The import should be placed inside the existing try/except block where other guarded imports live.

2. Semantic drift in get_default_embedder_for_filetype()

The old implementation read from settings.DEFAULT_EMBEDDERS_BY_FILETYPE; the new one delegates to get_preferred_embedder() which reads PipelineSettings.preferred_embedders (initially populated from the different Django setting PREFERRED_EMBEDDERS). If any deployment configured DEFAULT_EMBEDDERS_BY_FILETYPE without also having PREFERRED_EMBEDDERS, callers of get_default_embedder_for_filetype() will now silently get None where they previously got a value.

The updated test in test_pipeline_utils.py mocks get_preferred_embedder so it verifies the delegation but not the actual resolution equivalence. A docstring note or CHANGELOG entry documenting this behavioral change would help operators upgrading.

Clarity / Potential Confusion

3. DEFAULT_EMBEDDER in Django settings still serves a purpose — it is not dead config

config/settings/test.py (and presumably base.py) still defines DEFAULT_EMBEDDER, which is correct — PipelineSettings.get_instance() uses it to bootstrap the singleton on first creation. However, after removing DEFAULT_EMBEDDERS_BY_FILETYPE without a comment, the remaining DEFAULT_EMBEDDER looks orphaned and a future cleanup pass might remove it incorrectly.

A short inline comment would prevent this:

# Bootstrap-only: used by PipelineSettings.get_instance() to seed the singleton
# on first DB creation. NOT read at runtime — configure via admin UI or
# updatePipelineSettings mutation after initial setup.
DEFAULT_EMBEDDER = "..."

4. Documentation still describes DEFAULT_EMBEDDER as a runtime setting

docs/architecture/embeddings_creation_and_retrieval.md lists DEFAULT_EMBEDDER under "Embedder settings are configured in config/settings/base.py" without noting that its role is now bootstrap-only. Worth a one-liner clarification to avoid operator confusion during upgrades.

Minor / Non-blocking

5. get_default_embedder_path() empty-string contract

The function correctly returns "" (not None) when unconfigured. Most callers handle this fine (or None, truthy checks). The queries.py path passes the result to get_default_embedder() which already handles empty strings via its own truthy check — that path is fine. Worth a quick scan to confirm no downstream code does a strict is not None check on the raw result.

6. Performance (already handled — FYI)

PipelineSettings.get_instance() already uses Django's cache framework with a 5-minute TTL and dual invalidation on save. The DB hit concern for high-throughput scenarios is mitigated. Noted only for completeness.

Summary

The hardcoded table name (#1) and the undocumented behavioral change in get_default_embedder_for_filetype() (#2) are the two items worth fixing before merge. Items 3-5 are lower priority. The test refactoring from @override_settings to @patch is the correct approach given that the configuration source is now the DB, and the lazy-import pattern in corpus_tasks.py and models.py ensures those patches work correctly at call time.

QuerySet.delete() bypasses model.delete() and doesn't invalidate the PipelineSettings cache. During setUp(), user creation signals trigger personal corpus creation which calls get_default_embedder_path(), populating the cache with test.py defaults. The two @override_settings tests then get stale cached values instead of creating fresh instances with the overridden settings. Adding _invalidate_cache() after the queryset delete ensures get_instance() hits the DB and picks up the overridden Django settings.

claude · 2026-02-19T13:19:04Z

PR Review: Migrate embedder configuration to PipelineSettings

This is a clean, well-scoped migration. The overall approach is sound and the async fix in the WebSocket consumer is correct. A few issues worth addressing before merge:

Bugs / Issues

1. Incorrect patch target in `test_personal_corpus.py` for corpus_tasks tests

Several tests patching opencontractserver.pipeline.utils.get_default_embedder_path exercise ensure_embeddings_for_corpus (in corpus_tasks.py), where the import in the diff is inside the function body (like other local imports in that task):

# corpus_tasks.py – local import inside the task function
from opencontractserver.pipeline.utils import get_default_embedder_path

Because the import is local, re-lookup at call time means patching the source module attribute works here. This is fine — but it's worth verifying this is intentional, since the more idiomatic target would be opencontractserver.tasks.corpus_tasks.get_default_embedder_path if the import were ever hoisted to module level.

2. `get_default_embedder_for_filetype()` may silently return `None` on fresh installs

Before this PR, the function read from DEFAULT_EMBEDDERS_BY_FILETYPE in Django settings — always populated from base.py. After, it delegates entirely to get_preferred_embedder(), which reads from PipelineSettings.preferred_embedders. On a freshly installed system where the migration has run but preferred_embedders is empty, callers that previously always got a class back may now get None. Whether PipelineSettings.get_instance() seeds preferred_embedders from the old settings values on creation should be verified and documented. If it does, this is fine; if not, it's a regression.

3. Stale process-level cache in long-running Celery workers

PipelineSettings.get_instance() uses a class-level cache. A change made via the admin UI will be cached out in all running Celery workers until they restart. This is an inherent limitation of the pattern, but unlike the old settings-based approach it's easy to miss — an operator changing the default embedder via the admin will see it work for new web requests but embedding tasks will keep using the old value. This trade-off should be called out in the error/warning messages or documentation so operators know they need to restart workers after changing the embedder.

Minor Issues

4. `queries.py`: inline import is inside the `if` block

if document_pk or corpus_pk:
    from opencontractserver.pipeline.utils import get_default_embedder_path
    scoped_embedder_path = get_default_embedder_path()

The import is nested inside a conditional rather than at the top of the function. If the circular import concern applies, the import should at minimum be at function scope. Even better — check whether the circular import can be resolved by adjusting module organization so the import can live at the file top like the rest.

5. `get_default_embedder_path()` return type vs. actual behavior

The type annotation is -> str and the docstring says "or empty string if not configured", but the implementation returns whatever PipelineSettings.get_default_embedder() returns, which may be None. The immediate callsites handle both (if not x, x or None), but a None return violates the declared return type and could surprise future callers. Either guarantee an empty string via return ... or "" or change the annotation to -> Optional[str].

6. Documentation left partially stale

In docs/architecture/embeddings_creation_and_retrieval.md, DEFAULT_EMBEDDERS_BY_FILETYPE was removed but the DEFAULT_EMBEDDER bullet still reads as if it's a Django settings key:

- **`DEFAULT_EMBEDDER`**: Fallback embedder when no preferred embedder is found

This entry should be updated to reflect that the value is now managed via the PipelineSettings singleton rather than config/settings/base.py.

What's Done Well

The async correctness fix in the WebSocket consumer (database_sync_to_async(get_default_embedder_path)()) is exactly right per CLAUDE.md's guidance on async tool/ORM calls.
The _invalidate_cache() addition in test cleanup correctly handles the edge case where queryset.delete() bypasses Model.delete() — good defensive test hygiene.
The checks.py guard that returns early if documents_pipelinesettings table doesn't exist is correctly defensive for migrations-in-progress states.
Test patches are placed at the right module locations in all cases reviewed (backfill_default_embeddings, embeddings_task, pipeline/utils for inline-import callers).
Removing DEFAULT_EMBEDDERS_BY_FILETYPE as dead config is clean — no lingering references found.

Overall this is in good shape. The stale-cache / worker-restart concern (item 3) is the one I'd most want addressed before merge, even if just as a note in the warning message or docs.

codecov · 2026-02-19T14:00:04Z

Codecov Report

❌ Patch coverage is 98.11321% with 2 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
opencontractserver/corpuses/checks.py	75.00%	1 Missing ⚠️
opencontractserver/pipeline/utils.py	83.33%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

JSv4 added 8 commits February 18, 2026 21:15

Merge remote-tracking branch 'origin/main' into claude/verify-pipelin…

efcd7f8

…e-defaults-XDUWa

Migrate corpus_tasks.ensure_embeddings_for_corpus to PipelineSettings

4c31288

Replace settings.DEFAULT_EMBEDDER with get_default_embedder_path() to complete the migration to the database-backed PipelineSettings singleton.

Fix backfill_default_embeddings tests to patch get_default_embedder_path

b660520

Replace decorative @override_settings(DEFAULT_EMBEDDER=...) with @patch targeting the actual code path through get_default_embedder_path().

Fix personal_corpus tests to patch get_default_embedder_path

8ce1bc6

Replace @override_settings(DEFAULT_EMBEDDER=...) with @patch targeting the actual code path through get_default_embedder_path() in corpus_tasks.

Use get_default_embedder_path() in vector store tests for consistency

3814827

Replace settings.DEFAULT_EMBEDDER value lookups with get_default_embedder_path() to match what the vector store search code now reads from PipelineSettings.

Remove redundant 'or None' in Corpus.save() embedder default

6e811b3

get_default_embedder_path() returns "" when unconfigured, and the subsequent 'if not ... and default_embedder:' guard handles falsy values correctly without the conversion to None.

Fix black formatting in test files

be4502a

Fix patch targets for locally-imported get_default_embedder_path

f746af3

Since get_default_embedder_path is imported inside function bodies (not at module level), @patch must target the source module opencontractserver.pipeline.utils, not the consuming module.

JSv4 added 4 commits February 19, 2026 05:45

Migrate test assertions from settings.DEFAULT_EMBEDDER to get_default…

9b4e12f

…_embedder_path() Replace fragile transitive coupling through Django settings with direct calls to the production API across 5 test files (13 occurrences).

Update comments and docstrings to reference PipelineSettings instead …

c84ab22

…of DEFAULT_EMBEDDER

Fix import ordering and remove unused settings import

57d90fa

Update auto-generated documentation screenshots

0dfda1a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Migrate embedder configuration from Django settings to PipelineSettings#896

Migrate embedder configuration from Django settings to PipelineSettings#896
JSv4 wants to merge 15 commits intomainfrom
claude/verify-pipeline-defaults-XDUWa

JSv4 commented Feb 17, 2026 •

edited

Loading

Uh oh!

claude bot commented Feb 17, 2026

Uh oh!

claude bot commented Feb 19, 2026

Uh oh!

claude bot commented Feb 19, 2026

Uh oh!

claude bot commented Feb 19, 2026

Uh oh!

codecov bot commented Feb 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments

Uh oh!

Conversation

JSv4 commented Feb 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key Changes

Review Feedback Addressed

Implementation Details

Uh oh!

claude bot commented Feb 17, 2026

PR Review: Migrate embedder configuration from Django settings to PipelineSettings

Correctness & Logic

Test Coverage Gaps

Performance

Architecture / Design

Summary

Uh oh!

claude bot commented Feb 19, 2026

PR Review: Migrate embedder configuration to PipelineSettings

Bug (Critical) — Sync ORM call inside an async method

Bug (Minor) — Stale log message

Dead Code — DEFAULT_EMBEDDERS_BY_FILETYPE setting

Incomplete test migration — settings.DEFAULT_EMBEDDER still used as an assertion value

Documentation nits

What's good

Uh oh!

claude bot commented Feb 19, 2026

PR Review: Migrate embedder configuration to PipelineSettings

Bug / Correctness

Clarity / Potential Confusion

Minor / Non-blocking

Summary

Uh oh!

claude bot commented Feb 19, 2026

PR Review: Migrate embedder configuration to PipelineSettings

Bugs / Issues

1. Incorrect patch target in test_personal_corpus.py for corpus_tasks tests

2. get_default_embedder_for_filetype() may silently return None on fresh installs

3. Stale process-level cache in long-running Celery workers

Minor Issues

4. queries.py: inline import is inside the if block

5. get_default_embedder_path() return type vs. actual behavior

6. Documentation left partially stale

What's Done Well

Uh oh!

codecov bot commented Feb 19, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments

JSv4 commented Feb 17, 2026 •

edited

Loading

Dead Code — `DEFAULT_EMBEDDERS_BY_FILETYPE` setting

Incomplete test migration — `settings.DEFAULT_EMBEDDER` still used as an assertion value

1. Incorrect patch target in `test_personal_corpus.py` for corpus_tasks tests

2. `get_default_embedder_for_filetype()` may silently return `None` on fresh installs

4. `queries.py`: inline import is inside the `if` block

5. `get_default_embedder_path()` return type vs. actual behavior