feat(embedding): add strategy-based fallback engine with provenance#7

Merged
dubscode merged 4 commits into main from codex/feat/embedding-strategy-parity
Mar 4, 2026

dubscode commented Mar 4, 2026

Summary

Implements embedding parity hardening with an explicit strategy engine, controlled Anthropic fallback policy, and provenance plumbing across indexing/retrieval.

This PR also archives the OpenSpec change and syncs its capability specs into main OpenSpec specs.

What changed

  • Added embedding strategy V2 foundation:
    • typed strategy config schema (version: 1.0)
    • deterministic strategy resolution by strategy id
    • startup validation with structured config errors (unknown/missing/cyclic/default issues)
    • backward-compatible legacy default mapping
  • Added Anthropic native-first fallback behavior:
    • fallback only when failure category is explicitly eligible
    • fallback order enforced from config
    • terminal failure behavior when fallback is disallowed/exhausted/absent
  • Added normalized embedding provenance envelope:
    • strategy id, attempt path, resolved provider/model, fallback state
    • failure category + terminal reason on failure
  • Wired provenance into runtime data flows:
    • persisted provenance JSON in chunk_embeddings
    • exposed embedding provenance metadata in retrieval results
    • optional provenance log emission via env flag
  • Improved DB migration handling:
    • migration runner now executes all src/db/migrations/*.sql in sorted order
    • added 0002_embedding_provenance.sql
  • Added test coverage:
    • strategy config/resolution conformance
    • Anthropic policy behavior
    • provenance completeness for success/failure
  • Added rollout/rollback docs:
    • docs/embedding-strategy-rollout.md
    • README env var updates
  • OpenSpec updates:
    • archived embedding-parity-hardening
    • created synced specs:
      • openspec/specs/embedding-strategy-configuration/spec.md
      • openspec/specs/anthropic-embedding-fallback-and-provenance/spec.md
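The startup validation described above (unknown, missing, cyclic, and default issues) can be pictured with a minimal sketch. The PR text does not show the real schema, so every type and field name here (`EmbeddingStrategy`, `fallbackIds`, `defaultStrategyId`, `validateStrategyConfig`) is an assumption for illustration, and plain TypeScript types stand in for whatever schema library the repository uses.

```typescript
// Hypothetical shapes: field names are assumptions, not the PR's actual schema.
interface EmbeddingStrategy {
  id: string;
  provider: string;
  model: string;
  fallbackIds: string[]; // ordered ids of strategies to fall back to
}

interface StrategyConfig {
  version: "1.0";
  defaultStrategyId: string;
  strategies: EmbeddingStrategy[];
}

// Structured issues instead of a single thrown string, so callers can report all of them.
type ConfigIssue =
  | { kind: "missing-default"; strategyId: string }
  | { kind: "unknown-fallback"; strategyId: string; fallbackId: string }
  | { kind: "cyclic-fallback"; path: string[] };

function validateStrategyConfig(config: StrategyConfig): ConfigIssue[] {
  const issues: ConfigIssue[] = [];
  const byId = new Map(config.strategies.map((s) => [s.id, s] as const));

  // The default strategy id must resolve to a declared strategy.
  if (!byId.has(config.defaultStrategyId)) {
    issues.push({ kind: "missing-default", strategyId: config.defaultStrategyId });
  }

  // Every fallback id must resolve to a declared strategy.
  for (const strategy of config.strategies) {
    for (const fallbackId of strategy.fallbackIds) {
      if (!byId.has(fallbackId)) {
        issues.push({ kind: "unknown-fallback", strategyId: strategy.id, fallbackId });
      }
    }
  }

  // Depth-first walk: reaching an id that is already on the current path is a cycle.
  const done = new Set<string>();
  const visit = (id: string, path: string[]): void => {
    if (path.includes(id)) {
      issues.push({ kind: "cyclic-fallback", path: [...path.slice(path.indexOf(id)), id] });
      return;
    }
    if (done.has(id)) return;
    for (const next of byId.get(id)?.fallbackIds ?? []) visit(next, [...path, id]);
    done.add(id);
  };
  for (const strategy of config.strategies) visit(strategy.id, []);

  return issues;
}
```

Returning a list of issues rather than throwing on the first one is a design choice of this sketch; it lets a startup check print every configuration problem in one pass.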

Feature gate

Embedding strategy engine is gated by:

  • DUBSBOT_EMBEDDING_STRATEGY_V2=1

Optional:

  • DUBSBOT_EMBEDDING_STRATEGY_CONFIG_JSON
  • DUBSBOT_EMBEDDING_PROVENANCE_LOG=1
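
As a rough illustration of how these three variables might be read, the sketch below takes the environment as a plain record so it can be exercised without a real process. The helper name `readEmbeddingFlags`, its return shape, and the rule that only the literal value `1` enables a flag are assumptions; the PR only states the variable names and the `=1` setting.

```typescript
// Sketch only: helper name, return shape, and the strict "1" check are assumptions.
type Env = Record<string, string | undefined>;

interface EmbeddingFlags {
  strategyV2Enabled: boolean;
  provenanceLogEnabled: boolean;
  strategyConfig: unknown; // parsed JSON, still to be schema-validated; undefined when unset
}

function readEmbeddingFlags(env: Env): EmbeddingFlags {
  const raw = env.DUBSBOT_EMBEDDING_STRATEGY_CONFIG_JSON;
  let strategyConfig: unknown = undefined;

  if (raw !== undefined && raw.trim() !== "") {
    try {
      strategyConfig = JSON.parse(raw);
    } catch (error) {
      // Fail loudly at startup rather than silently ignoring a malformed override.
      throw new Error(`DUBSBOT_EMBEDDING_STRATEGY_CONFIG_JSON is not valid JSON: ${String(error)}`);
    }
  }

  return {
    strategyV2Enabled: env.DUBSBOT_EMBEDDING_STRATEGY_V2 === "1",
    provenanceLogEnabled: env.DUBSBOT_EMBEDDING_PROVENANCE_LOG === "1",
    strategyConfig,
  };
}
```

In a Node process this would be called as `readEmbeddingFlags(process.env)`; with the gate unset, the sketch reports `strategyV2Enabled: false`, matching the legacy-by-default behavior the PR describes.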

Validation

  • pnpm checks passed locally:
    • pnpm test
    • pnpm typecheck
    • pnpm lint
    • pnpm build

dubscode added 2 commits March 3, 2026 17:15
…change

- add retrieval proofing benchmark fixtures and profile thresholds
- add proofing runner, deterministic scoring, and report generation
- add retrieval-proof CLI command and smoke CI gate
- add tests and docs for local/CI proofing workflow
- archive retrieval-quality-proofing change and sync main spec
- add typed embedding strategy config, validation, and deterministic resolution
- add anthropic native-first fallback policy with failure-category control
- add embedding provenance envelope and persist provenance on chunk embeddings
- surface provenance in retrieval metadata and optional provenance logging
- add migration runner support for sequential SQL migrations and provenance column
- add conformance/policy/provenance tests and rollout documentation
- archive embedding-parity-hardening change and sync new capability specs

github-actions bot commented Mar 4, 2026

PR Checks Summary

Copilot AI left a comment

Pull request overview

This PR introduces a strategy-based, configurable embedding fallback engine with full provenance tracking for the embedding pipeline. It adds explicit strategy selection (primary provider/model + ordered fallback list), startup validation, a provenance envelope persisted to the database, and a feature-flag-gated rollout path. The system is backward-compatible, defaulting to the legacy embedding behavior unless DUBSBOT_EMBEDDING_STRATEGY_V2=1 is set.
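
To make the primary-plus-ordered-fallback flow and the provenance envelope concrete, here is a minimal sketch. It is not the code in engine.ts: the type names, failure categories, terminal-reason strings, and the `embedWithFallback` function are all hypothetical, and the provider call is kept synchronous so the control flow is easy to follow (a real provider call would be asynchronous).

```typescript
// All names below are illustrative assumptions, not the PR's actual API.
type FailureCategory = "rate_limited" | "timeout" | "unsupported" | "auth";

interface Target { provider: string; model: string }

type AttemptResult =
  | { ok: true; vector: number[] }
  | { ok: false; category: FailureCategory };

interface Strategy {
  id: string;
  primary: Target;
  fallbacks: Target[];                 // tried in exactly this order
  fallbackEligible: FailureCategory[]; // only these categories may trigger fallback
}

interface Provenance {
  strategyId: string;
  attemptPath: string[];   // every provider/model tried, in order
  resolved: Target | null; // the target that produced the vector, if any
  fallbackUsed: boolean;
  failureCategory?: FailureCategory;
  terminalReason?: "fallback_disallowed" | "fallback_exhausted" | "fallback_absent";
}

function embedWithFallback(
  strategy: Strategy,
  embed: (target: Target) => AttemptResult,
): { vector: number[] | null; provenance: Provenance } {
  const targets = [strategy.primary, ...strategy.fallbacks];
  const attemptPath: string[] = [];

  for (let i = 0; i < targets.length; i++) {
    const target = targets[i];
    attemptPath.push(`${target.provider}/${target.model}`);
    const result = embed(target);

    if (result.ok) {
      return {
        vector: result.vector,
        provenance: { strategyId: strategy.id, attemptPath, resolved: target, fallbackUsed: i > 0 },
      };
    }

    const eligible = strategy.fallbackEligible.includes(result.category);
    const isLast = i === targets.length - 1;
    if (!eligible || isLast) {
      // Terminal failure: record why no further attempt is made.
      const terminalReason = !eligible
        ? "fallback_disallowed"
        : strategy.fallbacks.length === 0
          ? "fallback_absent"
          : "fallback_exhausted";
      return {
        vector: null,
        provenance: {
          strategyId: strategy.id,
          attemptPath,
          resolved: null,
          fallbackUsed: i > 0,
          failureCategory: result.category,
          terminalReason,
        },
      };
    }
  }
  throw new Error("unreachable: targets always contains the primary");
}
```

The returned `provenance` object is the kind of value that could be serialized into the provenance JSONB column and surfaced in retrieval metadata, though the actual stored shape is not shown in this PR conversation.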

Changes:

  • New src/context/embedding/ module: strategy.ts (schema/validation), engine.ts (execution with fallback + provenance), and config.ts (loading + legacy default mapping).
  • Refactored src/db/migrate.ts from single-file to glob-based ordered migration runner; added src/db/migrations/0002_embedding_provenance.sql to add provenance JSONB column to chunk_embeddings.
  • Integration into full-index.ts (indexing path), hybrid.ts (retrieval metadata), daemon/main.ts, and cli/runtime.ts; new conformance tests in tests/embedding-strategy.test.ts; rollout documentation.
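
The ordered migration runner mentioned above depends on file names sorting into the intended execution order. The following sketch shows only that ordering step as a pure function; `orderMigrations` is a hypothetical helper, `0001_init.sql` is a made-up file name, and the real migrate.ts may discover and order files differently.

```typescript
// Hypothetical helper: picks the .sql files out of a directory listing and
// returns them in the order they would be executed.
function orderMigrations(fileNames: string[]): string[] {
  return fileNames
    .filter((name) => name.endsWith(".sql"))
    // Default sort compares UTF-16 code units, so zero-padded numeric
    // prefixes (0001_, 0002_, ...) are what keep the order stable.
    .sort();
}

// Example listing; only 0002_embedding_provenance.sql is named in the PR.
const ordered = orderMigrations([
  "0002_embedding_provenance.sql",
  "README.md",
  "0001_init.sql",
]);
```

One consequence of plain string ordering is that unpadded prefixes misorder: `10_x.sql` sorts before `2_y.sql`, which is a reason to keep the four-digit prefix convention seen in `0002_embedding_provenance.sql`.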

Reviewed changes

Copilot reviewed 15 out of 20 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| src/context/embedding/strategy.ts | Core schema (Zod), validation, cycle detection, and typed error classes |
| src/context/embedding/engine.ts | BFS fallback execution loop with provenance tracking |
| src/context/embedding/config.ts | Config loading from env, legacy default strategy construction |
| src/context/indexer/full-index.ts | Strategy V2 path for embedding during full indexing |
| src/context/retrieval/hybrid.ts | Exposes provenance fields from DB in retrieval results |
| src/db/migrations/0002_embedding_provenance.sql | Adds provenance JSONB column to chunk_embeddings |
| src/db/migrate.ts | Refactors migration runner from hardcoded single file to glob-ordered multi-file |
| src/daemon/main.ts | Calls loadEmbeddingStrategyConfig() at startup for early validation |
| src/cli/runtime.ts | Loads and exposes embeddingStrategyConfig from the runtime |
| tests/embedding-strategy.test.ts | Conformance tests for config, fallback policy, and provenance completeness |
| docs/embedding-strategy-rollout.md | Rollout and rollback procedure documentation |
| README.md | Documents new env vars and rollout guide link |
| openspec/specs/*/spec.md (×2) | Spec files for new capabilities (with placeholder Purpose sections) |
| openspec/changes/archive/*/ | Archived change proposal, design, tasks documents |

@dubscode dubscode changed the base branch from codex/feat/retrieval-quality-proofing to main March 4, 2026 03:53
@dubscode dubscode merged commit 63718dd into main Mar 4, 2026
1 check passed
@dubscode dubscode deleted the codex/feat/embedding-strategy-parity branch March 4, 2026 03:57