Skip to content

refactor: file-based migrations — eliminate silent version-collision bugs#118

Closed
andreinknv wants to merge 1 commit into
colbymchenry:mainfrom
andreinknv:refactor/migration-files
Closed

refactor: file-based migrations — eliminate silent version-collision bugs#118
andreinknv wants to merge 1 commit into
colbymchenry:mainfrom
andreinknv:refactor/migration-files

Conversation

@andreinknv
Copy link
Copy Markdown
Contributor

@andreinknv andreinknv commented Apr 27, 2026

Reviewers: for a 5-minute review path that surfaces only what to verify, see the review checklist comment below. The original description is preserved here for full technical detail.


refactor: file-based migrations — eliminate version-collision bug class

Today every PR adding a schema migration claims
CURRENT_SCHEMA_VERSION = next AND adds an array entry to
migrations: Migration[] in src/db/migrations.ts. Two PRs both
claiming the same version resolve as: "second PR's v4 silently
no-ops on existing DBs" — a real silent-data-loss bug class
(PR #113's reviewer caught one).

After this refactor:

Adding a new schema migration:

  1. Pick the next free 3-digit prefix (git ls-files 'src/db/migrations/[0-9]*.ts' shows what's taken).
  2. Create src/db/migrations/<NNN>-<short-name>.ts exporting a
    MIGRATION: MigrationModule (description + up).
  3. Add one import line and one entry to
    src/db/migrations/index.ts's REGISTERED_MODULES array.

Two PRs both creating 004-foo.ts collide on the FILESYSTEM —
the maintainer sees it instantly. No more silent skipped
migrations.

What's new

  • src/db/migrations/types.tsMigrationModule { description, up } and Migration extends MigrationModule { version }.
  • src/db/migrations/002-project-metadata.ts — extracted v2
    body verbatim.
  • src/db/migrations/003-lower-name-index.ts — extracted v3
    body verbatim.
  • src/db/migrations/index.ts — central registry. Static-imports
    each migration, parses the version FROM THE FILENAME (no
    hand-typed version field that can drift), enforces strict
    NNN-kebab-name.ts shape, validates uniqueness/sort at module
    load (throws loudly on collision), exposes ALL_MIGRATIONS and
    CURRENT_SCHEMA_VERSION.
  • src/db/migrations.ts — refactored to a thin runner. Same
    exported surface (CURRENT_SCHEMA_VERSION, getCurrentVersion,
    runMigrations, needsMigration, getPendingMigrations,
    getMigrationHistory, Migration type) — every existing import
    keeps working unchanged.
  • __tests__/migrations-registry.test.ts — 8 invariant tests:
    registry non-empty, versions unique + strictly ascending,
    CURRENT_SCHEMA_VERSION matches max, every file matches the
    strict NNN-kebab-name pattern, no orphan files, no phantom
    registrations.

Reviewer pass

Independent reviewer ran once. 3 REQUEST_CHANGES + 1 INFO addressed:

  1. Hand-typed version field in REGISTERED_MODULES could drift
    from filename. Fixed: removed the version field; registry
    now parses version from filename via FILENAME_PATTERN regex
    inside validateRegistered.
  2. Filename-pattern test was lenient (allowed 4-digit or 1-digit
    prefixes). Fixed: new "every migration file matches the
    strict NNN-kebab-name.ts pattern" test catches malformed
    filenames as orphan-detection-bypassing offenders.
  3. getPendingMigrations returned readonly Migration[],
    breaking callers that typed the result as Migration[].
    Fixed: returns a fresh mutable array via .slice().
  4. No throw-on-duplicate test for validateRegistered (module
    evaluation timing). Acknowledged; not added.

Backward compat

Every existing import works unchanged:

  • import { CURRENT_SCHEMA_VERSION } from './migrations'
  • import { runMigrations } from './migrations'
  • import { needsMigration } from './migrations'
  • import { getMigrationHistory } from './migrations'
  • import { getPendingMigrations } from './migrations' — returns
    mutable Migration[] (preserved)
  • Migration type — re-exported

Affected open PRs

Every migration-touching PR (#102 UNIQUE edges, #105 cochange,
#108 perf db, #111 LLM features, my #112 centrality+churn, #113
issue-history, #114 config-refs, #115 sql-refs) currently
claims migration v4 and conflicts with each other on
migrations.ts. After this lands they each become:

  • 1 new file: src/db/migrations/<NNN>-<name>.ts
  • 2 lines in registry.ts (import + array entry)

Conflict shape changes from "next free version + array entry +
CURRENT_SCHEMA_VERSION bump in one file" (4-way conflict) to "1
new file" + 2-line registry edit. If two PRs target the same
NNN, the filesystem collision surfaces immediately — no silent
skipped migrations.

Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com

Today every PR adding a schema migration claims
`CURRENT_SCHEMA_VERSION = next` AND adds an array entry to
`migrations: Migration[]` in src/db/migrations.ts. Two PRs both
claiming the same version resolve as: "second PR's v4 silently
no-ops on existing DBs" — a real silent-data-loss bug class
(PR colbymchenry#113's reviewer caught one).

After this refactor:

  Adding a new schema migration:
  1. Pick the next free 3-digit prefix (`git ls-files
     'src/db/migrations/[0-9]*.ts'` shows what's taken).
  2. Create `src/db/migrations/<NNN>-<short-name>.ts` exporting a
     `MIGRATION: MigrationModule` (description + up).
  3. Add one import line and one entry to
     `src/db/migrations/index.ts`'s REGISTERED_MODULES array.

Two PRs both creating `004-foo.ts` collide on the FILESYSTEM —
the maintainer sees it instantly. No more silent skipped
migrations.

## What's new

- `src/db/migrations/types.ts` — `MigrationModule { description,
  up }` and `Migration extends MigrationModule { version }`.
- `src/db/migrations/002-project-metadata.ts` — extracted v2
  body verbatim.
- `src/db/migrations/003-lower-name-index.ts` — extracted v3
  body verbatim.
- `src/db/migrations/index.ts` — central registry. Static-imports
  each migration, parses the version FROM THE FILENAME (no
  hand-typed version field that can drift), enforces strict
  `NNN-kebab-name.ts` shape, validates uniqueness/sort at module
  load (throws loudly on collision), exposes ALL_MIGRATIONS and
  CURRENT_SCHEMA_VERSION.
- `src/db/migrations.ts` — refactored to a thin runner. Same
  exported surface (CURRENT_SCHEMA_VERSION, getCurrentVersion,
  runMigrations, needsMigration, getPendingMigrations,
  getMigrationHistory, Migration type) — every existing import
  keeps working unchanged.
- `__tests__/migrations-registry.test.ts` — 8 invariant tests:
  registry non-empty, versions unique + strictly ascending,
  CURRENT_SCHEMA_VERSION matches max, every file matches the
  strict NNN-kebab-name pattern, no orphan files, no phantom
  registrations.

## Reviewer pass

Independent reviewer ran once. 3 REQUEST_CHANGES + 1 INFO addressed:

1. Hand-typed `version` field in REGISTERED_MODULES could drift
   from filename. **Fixed**: removed the version field; registry
   now parses version from filename via FILENAME_PATTERN regex
   inside validateRegistered.
2. Filename-pattern test was lenient (allowed 4-digit or 1-digit
   prefixes). **Fixed**: new "every migration file matches the
   strict NNN-kebab-name.ts pattern" test catches malformed
   filenames as orphan-detection-bypassing offenders.
3. `getPendingMigrations` returned `readonly Migration[]`,
   breaking callers that typed the result as `Migration[]`.
   **Fixed**: returns a fresh mutable array via `.slice()`.
4. No throw-on-duplicate test for validateRegistered (module
   evaluation timing). Acknowledged; not added.

## Backward compat

Every existing import works unchanged:
- `import { CURRENT_SCHEMA_VERSION } from './migrations'` ✓
- `import { runMigrations } from './migrations'` ✓
- `import { needsMigration } from './migrations'` ✓
- `import { getMigrationHistory } from './migrations'` ✓
- `import { getPendingMigrations } from './migrations'` — returns
   mutable Migration[] (preserved)
- `Migration` type — re-exported

## Affected open PRs

Every migration-touching PR (colbymchenry#102 UNIQUE edges, colbymchenry#105 cochange,
colbymchenry#108 perf db, colbymchenry#111 LLM features, my colbymchenry#112 centrality+churn, colbymchenry#113
issue-history, colbymchenry#114 config-refs, colbymchenry#115 sql-refs) currently
claims migration v4 and conflicts with each other on
`migrations.ts`. After this lands they each become:
- 1 new file: `src/db/migrations/<NNN>-<name>.ts`
- 2 lines in registry.ts (import + array entry)

Conflict shape changes from "next free version + array entry +
CURRENT_SCHEMA_VERSION bump in one file" (4-way conflict) to "1
new file" + 2-line registry edit. If two PRs target the same
NNN, the filesystem collision surfaces immediately — no silent
skipped migrations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@andreinknv
Copy link
Copy Markdown
Contributor Author

Part of a coordinated 4-PR refactor that unblocks the open PR backlog. See #120 for the full merge-order guide explaining how this PR fits into the picture and how each open language/feature PR rebases onto the new pattern.

andreinknv added a commit to andreinknv/codegraph that referenced this pull request Apr 27, 2026
Mines Fixes/Closes/Resolves #N commits and attributes them to
symbols touched by each commit hunks. Lands as a registered
IndexHook (issue-history).

- Migration 005: symbol_issues table
- src/issue-history/ (pure module): mineIssueHistory + parse-diff
- src/index-hooks/issue-history.ts (registered hook)
- CodeGraph public method: getIssuesForNode
- codegraph_node MCP tool now surfaces issue history line
- enableIssueHistory flag default true wired through config merge
- Removed defensive ensureSymbolIssuesTable guard and its test:
  the v4-collision bug class is impossible under file-based
  migrations (PR colbymchenry#118 refactor); filenames collide on the
  filesystem instead.

Tests: 470/471 pass (1 watcher flake under load, isolation OK).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@andreinknv
Copy link
Copy Markdown
Contributor Author

Maintainer review checklist (5-min path)

This refactor is behavior-preserving and fixes a silent-data-loss bug class in the existing migration code. Below is the shortest path to verify both claims.

1. The bug class this eliminates

The current src/db/migrations.ts has a single Migration[] array AND a hand-typed CURRENT_SCHEMA_VERSION = N constant. When two PRs both claim version: 4, the second one to merge wins the array slot but runMigrations only runs each version number once per DB — the second PR's migration silently no-ops on existing DBs. PR #113's reviewer caught one such collision.

After this refactor, two PRs both creating 004-foo.ts collide on the filesystem — the maintainer sees it instantly when applying the second.

2. What changed structurally

  • New src/db/migrations/types.ts (25 lines) — MigrationModule { description, up } and Migration extends MigrationModule { version }
  • New src/db/migrations/index.ts (106 lines) — central registry. Static-imports each migration, parses the version FROM the filename (regex ^(\d{3})-[a-z0-9]+(?:-[a-z0-9]+)*\.ts$), enforces uniqueness/sort at module load, throws loudly on collision
  • New src/db/migrations/002-project-metadata.ts and 003-lower-name-index.ts — the existing v2/v3 bodies extracted verbatim
  • Shrunk src/db/migrations.ts (-62 net lines): now a thin runner. Every existing exported symbol re-exported with identical signatures

3. What stays exactly the same — verify in seconds

Public surface Where Status
CURRENT_SCHEMA_VERSION src/db/migrations.ts now derived from max(registered versions) instead of hand-typed
runMigrations src/db/migrations.ts unchanged signature
needsMigration src/db/migrations.ts unchanged
getCurrentVersion src/db/migrations.ts unchanged
getPendingMigrations src/db/migrations.ts returns mutable Migration[] (preserved — was a regression in the first draft, fixed in reviewer pass)
getMigrationHistory src/db/migrations.ts unchanged
Migration type src/db/migrations.ts re-exported from migrations/types.ts

Every existing import { … } from './migrations' keeps working unchanged.

4. Invariants now enforced by tests (__tests__/migrations-registry.test.ts, 8 tests)

  • ✅ Versions are unique and strictly ascending — duplicate-version regression impossible
  • CURRENT_SCHEMA_VERSION matches the highest registered version — drift between the constant and the registry impossible (the constant is now derived)
  • ✅ Every migration file matches the strict NNN-kebab-name.ts pattern — typo'd filenames (004foo.ts, 4-foo.ts) caught at module load
  • No orphan files: every src/db/migrations/NNN-*.ts on disk is registered — silently dropped migrations are impossible
  • No phantom registrations: every registered version has a matching file — references to deleted migrations are impossible
  • ✅ Each migration has a non-empty description and a function up() — half-implemented migration entries can't sneak through

The orphan + phantom tests are the load-bearing pair: a future PR adding a migration file but forgetting to register it (or vice versa) fails CI immediately.

5. Reviewer pass — caught and fixed before this comment

  1. Hand-typed version field in REGISTERED_MODULES could drift from filename → removed; version is parsed from filename via regex inside validateRegistered. No hand-typed version anywhere.
  2. Filename-pattern test was lenient (allowed 4-digit or 1-digit prefixes) → strict NNN test added so malformed filenames don't bypass orphan detection.
  3. getPendingMigrations started returning readonly Migration[], breaking callers typed as Migration[]reverted to mutable via .slice().

6. Mergeability

If §1-§5 look right, this is safe to land. The runtime semantics of running migrations are bit-identical to the previous migrations.ts; only the source-of-truth shape changed.

🤖 Comment template generated with Claude Code

andreinknv added a commit to andreinknv/codegraph that referenced this pull request Apr 28, 2026
These two narrow indexes are fully covered by the wider
idx_edges_source_kind and idx_edges_target_kind composite indexes
via SQLite's left-prefix scan. Keeping them costs DB size and bulk-
insert time without giving any query that the kind-prefixed indexes
don't already cover.

Empirical measurements on a 50K-node / 250K-edge synthesized DB
(scripts/spikes/spike-edge-indexes.mjs):

  - DB size: -22.2% (34.7 MB -> 27.0 MB)
  - Bulk insert: 1.37x faster (590ms -> 431ms)
  - Source-only / target-only query latency: no regression
    (EXPLAIN: SEARCH edges USING COVERING INDEX
     idx_edges_source_kind (source=?))

Ported to the post-colbymchenry#118 file-based-migration form: migration body
lives in src/db/migrations/017-drop-redundant-edge-indexes.ts and
is registered in src/db/migrations/index.ts. CURRENT_SCHEMA_VERSION
is auto-derived from the registry — no constant bump.

Schema: removes the two CREATE INDEX statements on the narrow
indexes; the wider kind-prefixed indexes (which actually serve
queries) stay.
@colbymchenry
Copy link
Copy Markdown
Owner

Closing — same reasoning as #117 and #119. The bug class you flag is real, but at codegraph's current scale (3 migrations total in repo history, single maintainer) the cost of accepting a new file-based migration system to prevent a 5-second review check exceeds the benefit. The cited 8 affected PRs are part of the same flood I've been declining on independent grounds.

If migration cadence picks up materially or I ever hit a real silent-skip incident, I'll build this on my own terms. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants