
Migrate duplicate-base, record-read, lookup-search-index onto lifecycle drivers #64

Merged: HynLcc merged 4 commits into main from migrate-batch-duplicatebase-read-search on Jun 20, 2026


HynLcc commented Jun 20, 2026


Migrates three more legacy runner kinds onto lifecycle drivers, taking the
tracker from 26/35 → 29/35 runner kinds (36/55 → 43/55 cases). Each runner
lands in one self-contained commit that includes the diff-artifacts masks it
needs; a fourth commit updates the tracker.

Driver decisions (reuse → extend → new)

| Runner | Decision | Why |
|---|---|---|
| duplicate-base | 2nd member of the existing duplicate-lifecycle.ts; driver byte-unchanged | Its run/seed already mirror the driver, and the three operations (duplicate / duplicate-stream / export-stream) all flow through one opaque runPrimary. Because the shared driver is untouched, duplicate-table is unaffected and was not re-verified. |
| record-read | NEW read-lifecycle.ts (1st member) | No existing driver fits "measure a read over a seeded table". The read family's signature is a non-destructive measured op, so the driver owns the cleanup policy: drop the seed tables only when they are not a reusable cached seed and the execute DB is not the isolated copy. That is its real distinction from duplicate-lifecycle, which always drops the copy it created. |
| lookup-search-index | 2nd member of read-lifecycle.ts | Same seed → ready → measured read → drop-if-not-reusable shape. Its keyword × sample p95 lives in the opaque runPrimary, and the seed is always reusable, so cleanup drops nothing. A real 2nd member proves the new driver generic. |

Every migrated runner reuses its unchanged buildResult and seed/verify
helpers, so the artifact is byte-for-byte equivalent (G1).

G1 artifact equivalence: baseline (legacy) ↔ candidate (migrated), per case × engine

Local methodology: pinned teable-ee with the seed cache on. All measured runs are cache hits (baseline seeds were already warm; candidate seeds were warmed once before measuring), so there is no build-vs-restore asymmetry between the two sides. Baseline A↔B was clean (14/14) before editing; baseline↔candidate was clean (14/14) after.

| Case | v1 | v2 | Primary metric (v1 / v2, max) |
|---|---|---|---|
| duplicate-base/10k-3tables-link-2workflow | ✅ G1 clean | ✅ G1 clean | duplicateBaseRequestMs 1353 / 3745 (max 180000) |
| duplicate-base/10k-3tables-link-2workflow-stream | ✅ G1 clean | ✅ G1 clean | duplicateBaseStreamMs 1235 / 2176 |
| export-base/10k-3tables-link-2workflow-stream | ✅ G1 clean | ✅ G1 clean | exportBaseStreamMs 1105 / 637 |
| record-read/10k-50fields-10x1k-pages | ✅ G1 clean | ✅ G1 clean | getRecords10kPagedScanMs 887 / 1362 (max 30000) |
| record-read/10k-50fields-filter-sort-groupby-overhead | ✅ G1 clean | ✅ G1 clean | getRecordsFilterSortGroupByOverheadMs 2419 / 2446 |
| search/search-index-off-10k-20search-fields | ✅ G1 clean | ✅ G1 clean | lookupSearchIndexP95Ms 65 / 59 (max 1500) |
| search/search-index-on-10k-20search-fields | ✅ G1 clean | ✅ G1 clean | lookupSearchIndexP95Ms 63 / 68 |

All cases report result=pass, error=null. Routing is preserved: duplicate-base and record-read assert routing (routeMatched=true; duplicate-base x-teable-v2=false on v1 and x-teable-v2=true on v2), while export-base and lookup-search-index assert no routing, as before. traceRefCount matches the baseline for every case × engine (28/28, 1/1, 10/10, 20/20, 270/270). Locally savedTraceCount=0 because no Jaeger is running; saved==ref / failed=0 holds in CI.

Negative tests (comparator teeth), per runner

For duplicate-base, record-read and lookup-search-index:

  • (a) semantic perturbation → diff FAILS (details.duplicate.operation / details.operation / details.tableIndexMode)
  • (b) masked-field change → diff PASSES (baseId / queryVariant.overheadRatio / seedCache.seedHash)
  • (c) unmasked semantic sibling of a masked field → still DIFFS (details.duplicate.status / queryVariant.config.filterFieldName / seedCache.seedNamePrefix)

All 9 checks (3 per runner) behave as expected. Mask necessity is also confirmed: with the pre-edit diff script, baseline A↔B fails on exactly the fields the new rules cover (and the record-read pages case surfaces nothing, so it correctly needs no new mask).
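The three comparator checks can be pictured with a minimal sketch. It assumes artifacts are plain JSON and that masks are exact dotted paths; `flatten`, `diffArtifacts`, the artifact shape and the mask set are hypothetical illustrations, not the actual code in scripts/diff-artifacts.mjs.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Flatten nested JSON into a map of dotted path -> leaf value (array indices become segments).
function flatten(value: Json, prefix = "", out = new Map<string, Json>()): Map<string, Json> {
  if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      flatten(child, prefix ? `${prefix}.${key}` : key, out);
    }
  } else {
    out.set(prefix, value);
  }
  return out;
}

// Sorted list of paths whose values differ between two artifacts, skipping masked paths.
function diffArtifacts(a: Json, b: Json, masks: ReadonlySet<string>): string[] {
  const fa = flatten(a);
  const fb = flatten(b);
  const paths = new Set([...fa.keys(), ...fb.keys()]);
  return [...paths].filter((p) => !masks.has(p) && fa.get(p) !== fb.get(p)).sort();
}

// Hypothetical artifact: baseId is a generated id (masked); operation and status are semantic siblings.
const masks = new Set(["details.duplicate.baseId"]);
const artifact = (baseId: string, operation: string, status: string): Json => ({
  details: { duplicate: { baseId, operation, status } },
});
const baseline = artifact("bse_A", "duplicate", "done");

// (a) semantic perturbation: expect a diff on details.duplicate.operation
const semantic = diffArtifacts(baseline, artifact("bse_A", "duplicate-stream", "done"), masks);
// (b) masked-field change only: expect no diff
const maskedOnly = diffArtifacts(baseline, artifact("bse_B", "duplicate", "done"), masks);
// (c) unmasked sibling of the masked field: expect a diff on details.duplicate.status
const sibling = diffArtifacts(baseline, artifact("bse_A", "duplicate", "failed"), masks);
```

Check (c) is what keeps a mask honest: it fails if a rule is written too broadly (for example, masking the whole `details.duplicate` object instead of the one generated key).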

Mask deltas (scripts/diff-artifacts.mjs)

  • duplicate-base: baseId (GENERATED_ID_KEYS), baseName (GENERATED_NAME_KEYS), linkFieldForeignTableId, details.duplicate.exportResult/doneEvent previewUrl/fileName/id/name (run-to-run echoes of the created copy / export); seedBaseName added to the existing cache rule (seedHash-family, G1-only).
  • record-read: details.queryVariant.overheadRatio (timing ratio). No seedHash mask is needed: the seed-cache key nests under details.seed.cache, which is already covered.
  • lookup-search-index: off/on TableId/ViewId (GENERATED_ID_KEYS), details.keywords.*.summary.maxMs (timing), and the bare details.seedCache seedHash family (not nested in cache).

Each mask carries a justifying comment; the volatility ones are proven by the baseline A↔B noise check, the seedHash-family ones by absent-in-A↔B / present-in-G1.

🤖 Generated with Claude Code

HynLcc and others added 4 commits June 20, 2026 15:15
duplicate-base becomes the second member of duplicate-lifecycle.ts (after
duplicate-table), delegating via a thin spec: prepare a populated source base
(its own "prepare" measurement parked on the fixture), assert readiness, run
one measured base operation (duplicate / duplicate-stream / export-stream via
executeBaseOperation), then drop the created copy and the source unless it is a
reusable cached seed. The shared driver is byte-unchanged, so duplicate-table is
unaffected and needs no re-verification.

buildResult plus the seed/verify/cleanup helpers are reused unchanged, so the
artifact is byte-for-byte equivalent — G1 clean over both duplicate-base cases
and export-base/...-stream, each on v1 and v2.
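The cleanup rule for this family (the created copy always goes; the source survives only as a reusable cached seed) can be sketched as a small pure planner. `DuplicateOutcome`, `planDuplicateCleanup` and their fields are hypothetical names; the real driver's types and drop calls are not shown in this PR.

```typescript
// Hypothetical summary of one measured base operation.
interface DuplicateOutcome {
  sourceBaseId: string;
  createdBaseIds: string[]; // bases the measured operation created, e.g. the duplicated copy
  sourceIsReusableSeed: boolean; // true when the source is a cached seed that later runs may restore
}

// Return the base ids to drop: created copies first, then the source unless it is reusable.
function planDuplicateCleanup(outcome: DuplicateOutcome): string[] {
  const toDrop = [...outcome.createdBaseIds];
  if (!outcome.sourceIsReusableSeed) {
    toDrop.push(outcome.sourceBaseId);
  }
  return toDrop;
}

// A cached seed keeps its source; a freshly built one is dropped along with the copy.
const cachedRun = planDuplicateCleanup({ sourceBaseId: "src", createdBaseIds: ["copy"], sourceIsReusableSeed: true });
const freshRun = planDuplicateCleanup({ sourceBaseId: "src", createdBaseIds: ["copy"], sourceIsReusableSeed: false });
```

Keeping the decision in a pure function means it can be checked without a database, separately from the drop calls themselves.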

diff-artifacts.mjs masks duplicate-base's run-to-run-volatile generated values
(each proven volatile by the baseline A vs B noise check): the created copy /
export base id and name (baseId, baseName), the duplicated main table id echoed
as linkFieldForeignTableId, and the export preview URL + hash file name; plus
the hash-derived seedBaseName under details.sourceBase.cache (seedHash-family,
present only in G1 after the migration changes the seed code hash).

Co-Authored-By: Claude <noreply@anthropic.com>

record-read is the first member of a new read-lifecycle.ts driver: seed (or
restore) a host table plus the source table its lookups read through, assert the
full 50-field projection is readable, run the measured paged getRecords scan
(optionally versus a no-query baseline for the overhead variant) and verify it,
then drop the host + source tables unless they are a reusable cached seed.

The read family's signature — and what makes this its own driver rather than a
copy of duplicate-lifecycle — is that the measured read is non-destructive: it
creates nothing to clean up, so the driver OWNS the cleanup policy (drop the
seed tables the fixture declares, only when they are not a reusable cached seed
and the execute DB is not the throwaway isolated copy). The runner just declares
seedTableIds + isReusableSeed and writes no cleanup boilerplate. seedReady is
computed outside the diagnostic try (a readiness failure throws raw, as before),
and the optional baseline + measured scan + verify live entirely in the opaque
runPrimary, so buildResult and all routing/verification evidence are reused
unchanged — G1 byte-equivalent over both record-read cases on v1 and v2.
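A minimal sketch of that shape, assuming a promise-based spec: the runner supplies prepare / assertReady / runPrimary plus the declared seedTableIds and isReusableSeed, and the driver alone decides cleanup. Apart from seedTableIds, isReusableSeed and runPrimary, every name here (`ReadSpec`, `ReadDriverEnv`, `isolatedExecuteDb`, `dropTables`, `runReadLifecycle`) is hypothetical, and the real driver's diagnostic wrapping and buildResult step are not modeled.

```typescript
// What a read runner declares about its seed; the driver derives cleanup from this alone.
interface ReadFixture {
  seedTableIds: string[];
  isReusableSeed: boolean;
}

// Hypothetical runner spec: seed (or restore), check readiness, run the measured read.
interface ReadSpec<F extends ReadFixture, R> {
  prepare(): Promise<F>;
  assertReady(fixture: F): Promise<void>;
  runPrimary(fixture: F): Promise<R>;
}

// Hypothetical execution environment supplied to the driver.
interface ReadDriverEnv {
  isolatedExecuteDb: boolean; // execute DB is a throwaway isolated copy
  dropTables(ids: string[]): Promise<void>;
}

// The measured read creates nothing, so only the declared seed tables are ever candidates.
function seedTablesToDrop(fixture: ReadFixture, env: Pick<ReadDriverEnv, "isolatedExecuteDb">): string[] {
  if (fixture.isReusableSeed || env.isolatedExecuteDb) return [];
  return [...fixture.seedTableIds];
}

async function runReadLifecycle<F extends ReadFixture, R>(spec: ReadSpec<F, R>, env: ReadDriverEnv): Promise<R> {
  const fixture = await spec.prepare();
  // In this sketch readiness is checked before the try, so a readiness failure propagates unchanged.
  await spec.assertReady(fixture);
  try {
    return await spec.runPrimary(fixture);
  } finally {
    // In this sketch cleanup runs whether or not the measured read throws.
    const ids = seedTablesToDrop(fixture, env);
    if (ids.length > 0) await env.dropTables(ids);
  }
}
```

Because the policy lives in the driver, a runner such as lookup-search-index that always reports isReusableSeed=true gets "drop nothing" without writing any cleanup code.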

diff-artifacts.mjs masks details.queryVariant.overheadRatio, the queryMs /
baselineMs timing quotient that varies run-to-run on unchanged code (proven by
the record-read baseline A vs B diff); the *Ms timings and threshold-metric value
are already masked. No seedHash mask is needed: record-read nests its seed-cache
key under details.seed.cache, already covered by the cache rule.

Co-Authored-By: Claude <noreply@anthropic.com>

lookup-search-index becomes the second member of read-lifecycle.ts (after
record-read): it measures global aggregation/search-index reads over a seeded
source + dual host (index-off / index-on) table set. It rides the same driver —
seed (or restore) the read fixture, assert readiness, run the measured read
workload, and (per the driver's non-destructive read cleanup policy) drop
nothing because the seed is always a reusable cached seed, matching the
pre-migration runner which had no cleanup at all.

Two member-specific shapes ride in the spec: prepare carries its per-stage seed
sub-measurements on the fixture and emits no "prepare" phase, and the measured
primary is a keyword × sample loop whose p95 is the threshold metric, expressed
entirely in the opaque runPrimary. buildResult is reused unchanged, so the
artifact is byte-for-byte equivalent — G1 clean over both search cases on v1 and
v2. Having a real second member proves the read driver generic across the family.
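The measured primary can be sketched as a keyword × sample loop feeding a percentile. This assumes the p95 is a nearest-rank percentile over all pooled keyword × sample durations; the PR states neither the percentile definition nor whether durations are pooled, and `measureKeywordP95`, `percentileNearestRank` and the `search` callback are hypothetical.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentileNearestRank(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((x, y) => x - y);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical measured loop: each keyword is searched samplesPerKeyword times; durations are pooled.
async function measureKeywordP95(
  keywords: string[],
  samplesPerKeyword: number,
  search: (keyword: string) => Promise<void>,
): Promise<number> {
  const durations: number[] = [];
  for (const keyword of keywords) {
    for (let i = 0; i < samplesPerKeyword; i++) {
      const start = performance.now();
      await search(keyword);
      durations.push(performance.now() - start);
    }
  }
  return percentileNearestRank(durations, 95);
}
```

With 20 pooled samples the nearest-rank p95 is the 19th smallest value, so a single slow outlier does not move the threshold metric while two would.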

diff-artifacts.mjs masks: the per-keyword summarizeDurations maxMs (a timing
value, scoped to details.keywords.* so the threshold maxMs stays visible; proven
volatile by the baseline A vs B diff); and, present only in G1, the index-off /
index-on host table + view ids and the bare details.seedCache seedHash family
(emitted spread, not nested under a `cache` object, so the existing cache rule
does not reach it).
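Why the per-keyword mask is scoped rather than keyed on `maxMs` alone can be shown with a small path matcher. This sketch assumes masks are dotted path patterns where `*` matches exactly one segment, alongside a key-name set in the spirit of GENERATED_ID_KEYS; `matchesMask`, `isMasked` and the example paths (including where the threshold maxMs lives) are hypothetical, not the rule syntax of scripts/diff-artifacts.mjs.

```typescript
// True when every pattern segment equals the path segment or is "*" (which matches one segment).
function matchesMask(path: string, pattern: string): boolean {
  const pathSegs = path.split(".");
  const patternSegs = pattern.split(".");
  return (
    pathSegs.length === patternSegs.length &&
    patternSegs.every((seg, i) => seg === "*" || seg === pathSegs[i])
  );
}

// Masked if the final key is a generated-id key, or the full path matches a scoped pattern.
function isMasked(path: string, pathMasks: readonly string[], keyMasks: ReadonlySet<string>): boolean {
  const lastKey = path.slice(path.lastIndexOf(".") + 1);
  return keyMasks.has(lastKey) || pathMasks.some((mask) => matchesMask(path, mask));
}

const pathMasks = ["details.keywords.*.summary.maxMs", "details.seedCache.seedHash"];
const keyMasks = new Set(["baseId"]);

const perKeywordMax = isMasked("details.keywords.alpha.summary.maxMs", pathMasks, keyMasks); // per-keyword timing
const thresholdMax = isMasked("threshold.maxMs", pathMasks, keyMasks); // threshold stays visible
const spreadSeedHash = isMasked("details.seedCache.seedHash", pathMasks, keyMasks); // bare seedCache rule
const generatedId = isMasked("details.duplicate.baseId", pathMasks, keyMasks); // key-name rule
```

A bare key-name mask on `maxMs` would also have hidden the threshold value; the scoped pattern masks only the per-keyword summaries.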

Co-Authored-By: Claude <noreply@anthropic.com>

duplicate-base (duplicate-lifecycle 2nd member), record-read (read-lifecycle
1st member) and lookup-search-index (read-lifecycle 2nd member) move to
Migrated; the read-lifecycle.ts driver is new this round.

Co-Authored-By: Claude <noreply@anthropic.com>
HynLcc merged commit b544278 into main on Jun 20, 2026 (6 checks passed).
HynLcc deleted the migrate-batch-duplicatebase-read-search branch on June 20, 2026 07:34.