feat(indexer): implement true incremental indexing pipeline #8

Open
dubscode wants to merge 1 commit into main from feat/incremental-indexing-pipeline

dubscode commented Mar 4, 2026

Summary

This PR implements true incremental indexing and removes the previous broad full-reindex fallback behavior for normal filesystem changes.

What changed

  • Added shared file indexing primitives for reusable upsert/delete behavior:
    • upsertIndexedFileByPath
    • deleteIndexedFileByPath
  • Refactored full indexing to use shared indexing primitives.
  • Reworked incremental indexing to:
    • Process normalized, coalesced path-targeted operations only.
    • Handle explicit deletes (unlink) and treat missing files during upsert as deletes.
    • Return detailed run counters for inserted/updated/deleted files and chunk insert/delete totals.
    • Allow full fallback only for unresolved unscoped git-head transitions, with explicit reason metadata/logging.
  • Updated daemon wiring to pass richer event metadata:
    • FS: path + event type (add/change/unlink)
    • Git watcher: previous/current head SHA
  • Archived OpenSpec change true-incremental-indexing and synced capability spec to:
    • openspec/specs/incremental-indexing/spec.md
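
For reviewers, the normalize-and-coalesce step described above can be sketched roughly as follows. This is an illustration only, not the code in src/context/indexer/incremental.ts: the `PathOperation` shape, the `toOperation` and `normalizeRepoRelative` helpers, and the last-event-wins policy are all assumptions made for the example.

```typescript
import * as path from "node:path";

// Hypothetical shapes; the PR's real IncrementalPathOperation may differ.
type OperationKind = "upsert" | "delete";

interface PathOperation {
  path: string;
  kind: OperationKind;
}

// Map a watcher event type to an operation kind:
// add/change become upserts, unlink becomes a delete.
function toOperation(
  eventType: "add" | "change" | "unlink",
  filePath: string
): PathOperation {
  return { path: filePath, kind: eventType === "unlink" ? "delete" : "upsert" };
}

// Normalize to a repo-relative path with forward slashes.
// Returns null for the repo root itself and for paths outside the repo.
function normalizeRepoRelative(repoRoot: string, candidate: string): string | null {
  const absolute = path.resolve(repoRoot, candidate);
  const relative = path.relative(repoRoot, absolute);
  const escapesRoot =
    relative === ".." || relative.startsWith(".." + path.sep) || path.isAbsolute(relative);
  if (relative === "" || escapesRoot) {
    return null;
  }
  return relative.split(path.sep).join("/");
}

// Keep one operation per normalized path; the most recent event wins (assumed policy).
function coalesce(repoRoot: string, operations: PathOperation[]): PathOperation[] {
  const byPath = new Map<string, PathOperation>();
  for (const operation of operations) {
    const normalized = normalizeRepoRelative(repoRoot, operation.path);
    if (normalized === null) {
      continue;
    }
    byPath.set(normalized, { path: normalized, kind: operation.kind });
  }
  return [...byPath.values()];
}
```

Under this assumed policy, an add followed by an unlink of the same file collapses to a single delete, and events for paths outside the repository root are dropped before any indexing work happens.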

Verification

  • pnpm checks
    • pnpm test
    • pnpm typecheck
    • pnpm lint
    • pnpm build
  • Added targeted tests in tests/incremental-indexing.test.ts covering:
    • single-path update without reindexing unrelated files
    • delete cascade cleanup for files/chunks/chunk_embeddings/bm25_documents
    • fallback gating for unresolved git-head transitions

Notes

  • The fallback warning line in the fallback test is expected and intentional.

🥞 DubStack

Add shared file indexing helpers, replace broad incremental fallback with
path-targeted operations, wire watcher metadata for fs/git-head triggers,
and add coverage for targeted updates, deletes, and fallback gating.

Archive OpenSpec change true-incremental-indexing and sync the new
incremental-indexing spec into main specs.
Copilot AI review requested due to automatic review settings March 4, 2026 05:16

github-actions bot commented Mar 4, 2026

PR Checks Summary

Copilot AI left a comment

Pull request overview

Implements a true incremental indexing pipeline so daemon-triggered indexing updates only the changed paths (with explicit delete handling), while keeping a narrow full-index fallback for unresolved git-head transitions.

Changes:

  • Refactors shared file upsert/delete + chunk/embedding/document generation into createFileIndexHelpers, used by both full and incremental indexing.
  • Reworks runIncrementalIndex to process normalized/coalesced per-path operations and to return detailed per-run counters, with git-head fallback logic.
  • Updates daemon watcher wiring and adds incremental indexing integration tests + OpenSpec documentation.

Reviewed changes

Copilot reviewed 7 out of 11 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/incremental-indexing.test.ts | Adds integration tests for targeted updates, delete cleanup, and git-head fallback behavior. |
| src/daemon/main.ts | Passes richer FS/git trigger metadata into incremental indexing and forwards payload to hooks. |
| src/context/indexer/incremental.ts | Implements per-path incremental pipeline, coalescing, git diff resolution, and fallback result reporting. |
| src/context/indexer/full-index.ts | Refactors full indexing to use shared file indexing helpers. |
| src/context/indexer/file-index.ts | New shared helper module for file upsert/delete, chunking, embedding, and bm25 document generation. |
| openspec/specs/incremental-indexing/spec.md | Adds a spec documenting incremental indexing requirements and scenarios. |
| openspec/changes/archive/2026-03-04-true-incremental-indexing/tasks.md | Marks archived change tasks as completed. |
| openspec/changes/archive/2026-03-04-true-incremental-indexing/specs/incremental-indexing/spec.md | Archives the incremental-indexing spec content under the change record. |
| openspec/changes/archive/2026-03-04-true-incremental-indexing/proposal.md | Archives proposal describing motivation and impact. |
| openspec/changes/archive/2026-03-04-true-incremental-indexing/design.md | Archives design decisions, tradeoffs, and migration plan. |
| openspec/changes/archive/2026-03-04-true-incremental-indexing/.openspec.yaml | Adds OpenSpec metadata for the archived change. |
Comments suppressed due to low confidence (3)

src/context/indexer/file-index.ts:113

  • Building absolutePath via string concatenation can produce mixed separators (and double slashes) depending on platform and incoming relativePath format. Using path.join(input.repoRoot, relativePath) (and/or normalizing) would be more robust across OSes.
    const absolutePath = `${input.repoRoot}/${relativePath}`;
    const content = await readFile(absolutePath, 'utf8').catch(() => null);
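
A minimal sketch of the `path.join` suggestion above. `resolveIndexedFilePath` is a hypothetical helper name used only for illustration; it is not part of this PR.

```typescript
import * as path from "node:path";

// Hypothetical helper: build the absolute path with path.join instead of
// string concatenation. path.join normalizes the result, so duplicate
// separators are collapsed and the platform separator is used.
function resolveIndexedFilePath(repoRoot: string, relativePath: string): string {
  return path.join(repoRoot, relativePath);
}

// For comparison, the concatenation used in the snippet above keeps
// whatever separators the inputs happen to contain.
function concatenatedPath(repoRoot: string, relativePath: string): string {
  return `${repoRoot}/${relativePath}`;
}
```

With a trailing slash on the root, `concatenatedPath("/repo/", "src/a.ts")` keeps the double slash, while `resolveIndexedFilePath` returns the same value as `path.join("/repo", "src", "a.ts")`.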

src/context/indexer/incremental.ts:176

  • Incremental operations are only normalized/coalesced here; they are not filtered against the same ignore set used by runFullIndex (e.g. !coverage/**) and the FS watcher (!node_modules/**, !dist/**, !.git/**). This can cause incremental runs to index generated/ignored paths that full indexing would skip, leading to inconsistent index contents depending on trigger. Consider applying a shared ignore/filter step before upserting/deleting.
function coalesceOperations(
  repoRoot: string,
  operations: IncrementalPathOperation[]
): IncrementalPathOperation[] {
  const byPath = new Map<string, IncrementalPathOperation>();
  for (const operation of operations) {
    const normalizedPath = normalizeRepoRelativePath(repoRoot, operation.path);
    if (!normalizedPath) {
      continue;
    }
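
One way the shared ignore step suggested above might look, as a rough sketch. The directory list is taken from the patterns quoted in the comment (`coverage`, `node_modules`, `dist`, `.git`); the names `IGNORED_DIRECTORIES`, `isIgnoredPath`, and `filterIgnoredOperations` are hypothetical, and the check assumes paths are already repo-relative with forward slashes. It only handles top-level directories and is not a general glob matcher.

```typescript
// Assumed ignore set, mirroring the patterns mentioned in the review comment.
const IGNORED_DIRECTORIES: readonly string[] = ["node_modules", "dist", ".git", "coverage"];

// True when the first segment of a normalized repo-relative path is an ignored directory.
function isIgnoredPath(repoRelativePath: string): boolean {
  const firstSegment = repoRelativePath.split("/")[0] ?? "";
  return IGNORED_DIRECTORIES.includes(firstSegment);
}

// Drop operations that target ignored paths before any upsert or delete runs.
function filterIgnoredOperations<T extends { path: string }>(operations: T[]): T[] {
  return operations.filter((operation) => !isIgnoredPath(operation.path));
}
```

If both `runFullIndex` and the incremental path called a single predicate like this, the index contents would no longer depend on which trigger produced them.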

src/context/indexer/incremental.ts:118

  • The branch that treats a missing file as a delete is important behavior, but it isn't covered by the new tests (current tests cover explicit delete ops, not an upsert for a now-missing path). Adding a test where an upsert operation is passed for a path that has been removed on disk would lock this in.
    const upserted = await fileIndexHelpers.upsertIndexedFileByPath(operation.path);
    if (upserted.status === 'missing') {
      const deleted = await fileIndexHelpers.deleteIndexedFileByPath(operation.path);
      counters.filesDeleted += deleted.fileDeleted ? 1 : 0;
      counters.chunksDeleted += deleted.chunksDeleted;
      continue;
    }
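
A sketch of the missing test case, using in-memory stand-ins rather than the project's real helpers or test runner. Everything here is an assumption made for illustration: `createFakeHelpers`, `applyUpsert`, the `"indexed"` status value, and the result shapes are hypothetical, and only the `'missing'` status and the counter fields come from the snippet above.

```typescript
// Hypothetical result shapes; only status "missing" is confirmed by the PR snippet.
type UpsertResult = { status: "indexed" | "missing" };
type DeleteResult = { fileDeleted: boolean; chunksDeleted: number };
type Counters = { filesDeleted: number; chunksDeleted: number };

interface FileIndexHelpers {
  upsertIndexedFileByPath(filePath: string): Promise<UpsertResult>;
  deleteIndexedFileByPath(filePath: string): Promise<DeleteResult>;
}

// In-memory stand-in: `disk` holds paths that exist on disk,
// `index` maps an indexed path to its chunk count.
function createFakeHelpers(disk: Set<string>, index: Map<string, number>): FileIndexHelpers {
  return {
    async upsertIndexedFileByPath(filePath: string): Promise<UpsertResult> {
      if (!disk.has(filePath)) {
        return { status: "missing" };
      }
      index.set(filePath, 1);
      return { status: "indexed" };
    },
    async deleteIndexedFileByPath(filePath: string): Promise<DeleteResult> {
      const chunksDeleted = index.get(filePath) ?? 0;
      const fileDeleted = index.delete(filePath);
      return { fileDeleted, chunksDeleted };
    },
  };
}

// Mirrors the branch under review: an upsert for a missing file becomes a delete.
async function applyUpsert(
  helpers: FileIndexHelpers,
  filePath: string,
  counters: Counters
): Promise<void> {
  const upserted = await helpers.upsertIndexedFileByPath(filePath);
  if (upserted.status === "missing") {
    const deleted = await helpers.deleteIndexedFileByPath(filePath);
    counters.filesDeleted += deleted.fileDeleted ? 1 : 0;
    counters.chunksDeleted += deleted.chunksDeleted;
  }
}
```

A test built on this would index a path, remove it from `disk`, call `applyUpsert` for that path, and then assert that the counters record one deleted file and that the path is gone from `index`.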

async function upsertIndexedFileByPath(relativePath: string): Promise<UpsertFileResult> {
  const absolutePath = `${input.repoRoot}/${relativePath}`;
  const content = await readFile(absolutePath, 'utf8').catch(() => null);
  if (!content) {

Copilot AI Mar 4, 2026

readFile returns an empty string for empty files, but the current if (!content) check treats that as missing and will trigger delete behavior upstream. Consider checking content === null (or catching only ENOENT) so empty files are indexed correctly.

Suggested change
- if (!content) {
+ if (content === null) {
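
The alternative mentioned above, catching only ENOENT, could look roughly like this. `readFileOrNull` and `demoEmptyVersusMissing` are hypothetical names for illustration; the sketch assumes that errors other than a missing file should propagate to the caller rather than being treated as deletes.

```typescript
import { mkdtemp, readFile, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import * as path from "node:path";

// Returns null only when the file does not exist.
// An empty file yields "", and any other error is rethrown.
async function readFileOrNull(absolutePath: string): Promise<string | null> {
  try {
    return await readFile(absolutePath, "utf8");
  } catch (error) {
    if ((error as NodeJS.ErrnoException).code === "ENOENT") {
      return null;
    }
    throw error;
  }
}

// Small demonstration: an empty file and a missing file are distinguishable.
async function demoEmptyVersusMissing(): Promise<{
  empty: string | null;
  missing: string | null;
}> {
  const directory = await mkdtemp(path.join(tmpdir(), "empty-file-"));
  const emptyFile = path.join(directory, "empty.ts");
  await writeFile(emptyFile, "");
  return {
    empty: await readFileOrNull(emptyFile),
    missing: await readFileOrNull(path.join(directory, "missing.ts")),
  };
}
```

With a helper like this, the caller can test `content === null` for the missing case, so empty files are indexed and permission or I/O errors are not silently turned into deletes.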
