Migrate duplicate-base, record-read, lookup-search-index onto lifecycle drivers by HynLcc · Pull Request #64 · teableio/teable-perf-lab

HynLcc · 2026-06-20T07:19:45Z

Migrates three more legacy runner kinds onto lifecycle drivers, taking the
tracker from 26/35 → 29/35 runner kinds (36/55 → 43/55 cases). One
self-contained commit per runner (incl. the diff-artifacts masks it needs).

Driver decisions (reuse → extend → new)

Runner	Decision	Why
duplicate-base	2nd member of existing `duplicate-lifecycle.ts`, driver byte-unchanged	Its run/seed already mirror the driver; the three operations (duplicate / duplicate-stream / export-stream) all flow through one opaque `runPrimary`. Because the shared driver is untouched, duplicate-table is unaffected and was not re-verified.
record-read	NEW `read-lifecycle.ts` (1st member)	No existing driver fits "measure a read over a seeded table". The read family's signature is a non-destructive measured op, so the driver owns the cleanup policy: drop the seed tables only when they are not a reusable cached seed and the execute DB is not the isolated copy. That is its real distinction from `duplicate-lifecycle`, which always drops a created copy.
lookup-search-index	2nd member of `read-lifecycle.ts`	Same seed → ready → measured read → drop-if-not-reusable shape. Its keyword×sample p95 lives in the opaque `runPrimary`, and the seed is always reusable, so cleanup drops nothing. A real 2nd member proves the new driver is generic.
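The shared shape behind all three decisions (seed → ready → opaque measured op → family-owned cleanup) can be sketched as below. This is a minimal illustration only: `LifecycleSpec`, `CleanupPolicy` and `runLifecycle` are hypothetical names, not the real exports of `duplicate-lifecycle.ts` or `read-lifecycle.ts`, and placing the readiness check before the `try` is an assumption based on the record-read commit note.

```typescript
// Hypothetical shapes: the real duplicate-lifecycle.ts / read-lifecycle.ts
// signatures are not in this PR, so every name here is illustrative.
interface LifecycleSpec<F, P> {
  /** Seed (or restore from the seed cache) the fixture this run needs. */
  prepare(): Promise<F>;
  /** Throws if the fixture is not usable. */
  assertReady(fixture: F): Promise<void>;
  /** The measured operation; the driver never looks inside it. */
  runPrimary(fixture: F): Promise<P>;
}

/** Cleanup policy supplied by the driver family, not by each runner. */
type CleanupPolicy<F> = (fixture: F) => Promise<void>;

async function runLifecycle<F, P>(
  spec: LifecycleSpec<F, P>,
  cleanup: CleanupPolicy<F>,
): Promise<P> {
  const fixture = await spec.prepare();
  // Readiness is checked before the try block, so its error propagates
  // unchanged; in this sketch cleanup is not attempted for that case.
  await spec.assertReady(fixture);
  try {
    return await spec.runPrimary(fixture);
  } finally {
    // Runs whether the measured operation succeeded or threw.
    await cleanup(fixture);
  }
}
```

Because `runPrimary` is opaque, a member such as duplicate-base or lookup-search-index only supplies its own spec, while the driver family decides what cleanup means.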

Every migrated runner reuses its unchanged buildResult and seed/verify
helpers, so the artifact is byte-for-byte equivalent (G1).

G1 artifact equivalence — baseline (legacy) ↔ candidate (migrated), per case × engine

Local methodology: pinned teable-ee, seed cache on. All measured runs are cache hits (baseline seeds warm; candidate seeds warmed once before measuring), so there is no build-vs-restore asymmetry. Baseline A↔B clean (14/14) before editing; baseline↔candidate clean (14/14) after.

Case	v1	v2	primary metric (v1 / v2, max)
duplicate-base/10k-3tables-link-2workflow	✅ G1 clean	✅ G1 clean	duplicateBaseRequestMs 1353 / 3745 (max 180000)
duplicate-base/10k-3tables-link-2workflow-stream	✅	✅	duplicateBaseStreamMs 1235 / 2176
export-base/10k-3tables-link-2workflow-stream	✅	✅	exportBaseStreamMs 1105 / 637
record-read/10k-50fields-10x1k-pages	✅	✅	getRecords10kPagedScanMs 887 / 1362 (max 30000)
record-read/10k-50fields-filter-sort-groupby-overhead	✅	✅	getRecordsFilterSortGroupByOverheadMs 2419 / 2446
search/search-index-off-10k-20search-fields	✅	✅	lookupSearchIndexP95Ms 65 / 59 (max 1500)
search/search-index-on-10k-20search-fields	✅	✅	lookupSearchIndexP95Ms 63 / 68

All result=pass, error=null. Routing preserved: duplicate-base & record-read assert routing (routeMatched=true; duplicate-base v1 x-teable-v2=false / v2 x-teable-v2=true); export-base & lookup-search-index assert no routing, as before. traceRefCount preserved vs baseline for every case×engine (28/28, 1/1, 10/10, 20/20, 270/270). Locally savedTraceCount=0 because no Jaeger is running; saved==ref / failed=0 holds in CI.
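The two-stage methodology (baseline A↔B must be clean before baseline↔candidate means anything) can be sketched as a per-case×engine verdict. This is a hypothetical illustration: `CaseRuns`, `g1Verdict` and the `id` format are assumptions, and `diff` stands in for the masked comparator in `scripts/diff-artifacts.mjs`.

```typescript
// Hypothetical sketch; `diff` stands in for the masked comparator and
// returns the unmasked paths that differ between two artifacts.
type Diff = (a: unknown, b: unknown) => string[];

interface CaseRuns {
  id: string;         // case × engine, e.g. "search/search-index-on-10k-20search-fields@v2" (format illustrative)
  baselineA: unknown; // legacy runner, run 1
  baselineB: unknown; // legacy runner, run 2 on unchanged code
  candidate: unknown; // migrated runner
}

type Verdict = "clean" | "noisy-baseline" | "not-equivalent";

function g1Verdict(run: CaseRuns, diff: Diff): Verdict {
  // If two legacy runs already differ, the masks are incomplete and a
  // baseline↔candidate diff could not tell noise from a real change.
  if (diff(run.baselineA, run.baselineB).length > 0) return "noisy-baseline";
  return diff(run.baselineA, run.candidate).length > 0 ? "not-equivalent" : "clean";
}
```

Ordering the checks this way is what lets "14/14 clean" mean artifact equivalence rather than luck with volatile fields.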

Negative tests (comparator teeth), per runner

For duplicate-base, record-read and lookup-search-index:

(a) semantic perturbation → diff FAILS (details.duplicate.operation / details.operation / details.tableIndexMode)
(b) masked-field change → diff PASSES (baseId / queryVariant.overheadRatio / seedCache.seedHash)
(c) unmasked semantic sibling of a masked field → still DIFFS (details.duplicate.status / queryVariant.config.filterFieldName / seedCache.seedNamePrefix)

9/9 as expected. Mask necessity also confirmed: with the pre-edit diff script, baseline A↔B fails on exactly the fields the new rules cover (the record-read paged-scan case surfaces nothing, correctly needing no new mask).
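The three negative checks (a)–(c) can be expressed as one predicate over a comparator. A minimal sketch, assuming a comparator that returns differing unmasked paths; `perturb`, `hasTeeth` and `TeethCase` are hypothetical helpers, not part of `scripts/diff-artifacts.mjs`.

```typescript
// Hypothetical harness: `diff` stands in for the comparator in
// scripts/diff-artifacts.mjs and returns the unmasked paths that differ.
type Json = { [k: string]: unknown };
type Diff = (a: Json, b: Json) => string[];

/** Deep-copies `obj` and replaces the value at the dotted `path`. */
function perturb(obj: Json, path: string, value: unknown): Json {
  const copy = JSON.parse(JSON.stringify(obj)) as Json;
  const keys = path.split(".");
  const last = keys.pop() as string;
  let node = copy;
  for (const k of keys) node = node[k] as Json;
  node[last] = value;
  return copy;
}

interface TeethCase {
  semantic: string; // (a) must make the diff FAIL
  masked: string;   // (b) must let the diff PASS
  sibling: string;  // (c) unmasked sibling of a masked field: must still FAIL
}

/** True when the comparator behaves as all three checks expect. */
function hasTeeth(base: Json, c: TeethCase, diff: Diff): boolean {
  const fails = (path: string) => diff(base, perturb(base, path, "__perturbed__")).length > 0;
  return fails(c.semantic) && !fails(c.masked) && fails(c.sibling);
}
```

Check (c) is what catches an over-broad mask: a rule that swallowed `details.duplicate.status` along with the volatile fields would pass (a) and (b) but fail (c).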

Mask deltas (`scripts/diff-artifacts.mjs`)

duplicate-base: baseId (GENERATED_ID_KEYS), baseName (GENERATED_NAME_KEYS), linkFieldForeignTableId, details.duplicate.exportResult/doneEvent previewUrl/fileName/id/name (run-to-run echoes of the created copy / export); seedBaseName added to the existing cache rule (seedHash-family, G1-only).
record-read: details.queryVariant.overheadRatio (timing ratio). No seedHash mask is needed: its seed-cache key nests under details.seed.cache, which the existing cache rule already covers.
lookup-search-index: off/on TableId/ViewId (GENERATED_ID_KEYS), details.keywords.*.summary.maxMs (timing), and the bare details.seedCache seedHash family (not nested in cache).

Each mask carries a justifying comment. The volatility masks are proven by the baseline A↔B noise check; the seedHash-family masks by being absent in A↔B but present in G1.
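The two kinds of rule above (key-name sets such as GENERATED_ID_KEYS, and path-scoped rules such as `details.keywords.*.summary.maxMs`) can be sketched as a flatten-then-compare diff. This is an illustration under assumptions: the mask contents below are stand-ins, `details.threshold.maxMs` is a hypothetical path for the threshold value, and the real script's matching logic is not shown in the PR.

```typescript
// Illustrative stand-ins: the real rules live in scripts/diff-artifacts.mjs
// (e.g. GENERATED_ID_KEYS) and their exact contents are not shown in the PR.
const MASKED_KEYS = new Set(["baseId", "baseName", "linkFieldForeignTableId"]);
const MASKED_PATHS = [
  "details.keywords.*.summary.maxMs", // per-keyword timing only
  "details.queryVariant.overheadRatio",
];

/** Flattens nested objects/arrays into dotted-path -> leaf-value pairs. */
function flatten(value: unknown, prefix = "", out = new Map<string, unknown>()): Map<string, unknown> {
  if (value !== null && typeof value === "object") {
    for (const key of Object.keys(value)) {
      const child = (value as Record<string, unknown>)[key];
      flatten(child, prefix ? `${prefix}.${key}` : key, out);
    }
  } else {
    out.set(prefix, value);
  }
  return out;
}

/** Masked by leaf key name, or by a dotted pattern where "*" matches one segment. */
function isMasked(path: string): boolean {
  const leaf = path.slice(path.lastIndexOf(".") + 1);
  if (MASKED_KEYS.has(leaf)) return true;
  const segments = path.split(".");
  return MASKED_PATHS.some((pattern) => {
    const parts = pattern.split(".");
    return parts.length === segments.length && parts.every((p, i) => p === "*" || p === segments[i]);
  });
}

/** Sorted unmasked paths whose leaf values differ between two artifacts. */
function diffArtifacts(a: unknown, b: unknown): string[] {
  const fa = flatten(a);
  const fb = flatten(b);
  const paths = new Set(Array.from(fa.keys()).concat(Array.from(fb.keys())));
  return Array.from(paths)
    .filter((p) => !isMasked(p) && fa.get(p) !== fb.get(p))
    .sort();
}
```

Scoping the timing rule to `details.keywords.*` rather than masking every `maxMs` key is what keeps a threshold change visible to the comparator.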

🤖 Generated with Claude Code

duplicate-base becomes the second member of duplicate-lifecycle.ts (after duplicate-table), delegating via a thin spec: prepare a populated source base (its own "prepare" measurement parked on the fixture), assert readiness, run one measured base operation (duplicate / duplicate-stream / export-stream via executeBaseOperation), then drop the created copy, and the source too unless it is a reusable cached seed. The shared driver is byte-unchanged, so duplicate-table is unaffected and needs no re-verification.

buildResult plus the seed/verify/cleanup helpers are reused unchanged, so the artifact is byte-for-byte equivalent: G1 clean over both duplicate-base cases and export-base/...-stream, each on v1 and v2.

diff-artifacts.mjs masks duplicate-base's run-to-run-volatile generated values (each proven volatile by the baseline A vs B noise check): the created copy / export base id and name (baseId, baseName), the duplicated main table id echoed as linkFieldForeignTableId, and the export preview URL + hash file name; plus the hash-derived seedBaseName under details.sourceBase.cache (seedHash-family, present only in G1 after the migration changes the seed code hash).

Co-Authored-By: Claude <noreply@anthropic.com>
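The duplicate-family cleanup rule described in this commit ("drop the created copy, and the source too unless it is a reusable cached seed") reduces to a small decision. A hypothetical sketch: `DuplicateFixture` and `duplicateCleanupTargets` are illustrative names, and representing "no base was created" as `null` (for example, if export-stream produces no copy) is an assumption.

```typescript
// Hypothetical fixture shape; field names are illustrative.
interface DuplicateFixture {
  sourceBaseId: string;
  isReusableSeed: boolean; // true when the source base is a reusable cached seed
}

/** Base ids to drop after the measured duplicate / export operation. */
function duplicateCleanupTargets(fx: DuplicateFixture, createdBaseId: string | null): string[] {
  const targets: string[] = [];
  // The created copy is run-specific, so it is always dropped (null = nothing created; assumption).
  if (createdBaseId !== null) targets.push(createdBaseId);
  // The source survives only when it is a reusable cached seed.
  if (!fx.isReusableSeed) targets.push(fx.sourceBaseId);
  return targets;
}
```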

record-read is the first member of a new read-lifecycle.ts driver: seed (or restore) a host table plus the source table its lookups read through, assert the full 50-field projection is readable, run the measured paged getRecords scan (optionally versus a no-query baseline for the overhead variant) and verify it, then drop the host + source tables unless they are a reusable cached seed.

The read family's signature, and what makes this its own driver rather than a copy of duplicate-lifecycle, is that the measured read is non-destructive: it creates nothing to clean up, so the driver OWNS the cleanup policy (drop the seed tables the fixture declares, only when they are not a reusable cached seed and the execute DB is not the throwaway isolated copy). The runner just declares seedTableIds + isReusableSeed and writes no cleanup boilerplate.

seedReady is computed outside the diagnostic try (a readiness failure throws raw, as before), and the optional baseline + measured scan + verify live entirely in the opaque runPrimary, so buildResult and all routing/verification evidence are reused unchanged: G1 byte-equivalent over both record-read cases on v1 and v2.

diff-artifacts.mjs masks details.queryVariant.overheadRatio, the queryMs / baselineMs timing quotient that varies run-to-run on unchanged code (proven by the record-read baseline A vs B diff); the *Ms timings and threshold-metric value are already masked. No seedHash mask is needed: record-read nests its seed-cache key under details.seed.cache, already covered by the cache rule.

Co-Authored-By: Claude <noreply@anthropic.com>
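The read driver's cleanup policy, where the runner only declares `seedTableIds` + `isReusableSeed`, can be sketched as a pure function. This is a minimal illustration: `ReadFixture`, `readCleanupTargets` and the `executeDbIsIsolatedCopy` flag are hypothetical names for what the commit describes.

```typescript
// Hypothetical names; only seedTableIds / isReusableSeed come from the commit text.
interface ReadFixture {
  seedTableIds: string[];  // host + source tables the runner declares
  isReusableSeed: boolean; // true when the tables are a reusable cached seed
}

/** Tables the read driver drops after the non-destructive measured read. */
function readCleanupTargets(fx: ReadFixture, executeDbIsIsolatedCopy: boolean): string[] {
  // A reusable seed must survive for the next run, and a throwaway isolated
  // copy needs no table-level cleanup, so either condition drops nothing.
  if (fx.isReusableSeed || executeDbIsIsolatedCopy) return [];
  return fx.seedTableIds.slice(); // copy, so callers cannot mutate the fixture
}
```

For lookup-search-index, `isReusableSeed` would always be true, so the same policy yields an empty list, matching the pre-migration runner that had no cleanup.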

lookup-search-index becomes the second member of read-lifecycle.ts (after record-read): it measures global aggregation/search-index reads over a seeded source + dual host (index-off / index-on) table set. It rides the same driver: seed (or restore) the read fixture, assert readiness, run the measured read workload, and (per the driver's non-destructive read cleanup policy) drop nothing, because the seed is always a reusable cached seed. This matches the pre-migration runner, which had no cleanup at all.

Two member-specific shapes ride in the spec: prepare carries its per-stage seed sub-measurements on the fixture and emits no "prepare" phase, and the measured primary is a keyword x sample loop whose p95 is the threshold metric, expressed entirely in the opaque runPrimary. buildResult is reused unchanged, so the artifact is byte-for-byte equivalent: G1 clean over both search cases on v1 and v2. Having a real second member proves the read driver is generic across the family.

diff-artifacts.mjs masks: the per-keyword summarizeDurations maxMs (a timing value, scoped to details.keywords.* so the threshold maxMs stays visible; proven volatile by the baseline A vs B diff); and, present only in G1, the index-off / index-on host table + view ids and the bare details.seedCache seedHash family (emitted spread, not nested under a `cache` object, so the existing cache rule does not reach it).

Co-Authored-By: Claude <noreply@anthropic.com>
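A keyword × sample loop whose p95 becomes the threshold metric might look like the sketch below. This is hypothetical throughout: the PR does not show the real loop, sample count or percentile method, so pooling every sample and using a nearest-rank p95 are assumptions, and `measureKeywords` / `timeOnce` are illustrative names. The per-keyword `maxMs` mirrors the masked `details.keywords.*.summary.maxMs`.

```typescript
// Hypothetical sketch: pools every keyword×sample latency and takes a
// nearest-rank p95 over the pool (an assumption, not the runner's method).
function nearestRankPercentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((x, y) => x - y);
  // Multiplying before dividing keeps exact multiples of 100 exact.
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1] as number;
}

interface KeywordSummary {
  keyword: string;
  maxMs: number;
}

async function measureKeywords(
  keywords: string[],
  samplesPerKeyword: number,
  timeOnce: (keyword: string) => Promise<number>, // one search latency in ms
): Promise<{ p95Ms: number; perKeyword: KeywordSummary[] }> {
  const pooled: number[] = [];
  const perKeyword: KeywordSummary[] = [];
  for (const keyword of keywords) {
    const durations: number[] = [];
    // Sequential on purpose, so samples do not contend with each other.
    for (let i = 0; i < samplesPerKeyword; i++) durations.push(await timeOnce(keyword));
    pooled.push(...durations);
    perKeyword.push({ keyword, maxMs: Math.max(...durations) });
  }
  return { p95Ms: nearestRankPercentile(pooled, 95), perKeyword };
}
```

Since the whole loop sits inside the opaque `runPrimary`, the driver never needs to know that this member's primary metric is a percentile.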

duplicate-base (duplicate-lifecycle 2nd member), record-read (read-lifecycle 1st member) and lookup-search-index (read-lifecycle 2nd member) move to Migrated; the read-lifecycle.ts driver is new this round.

Co-Authored-By: Claude <noreply@anthropic.com>

HynLcc and others added 4 commits June 20, 2026 15:15

HynLcc merged commit b544278 into main Jun 20, 2026
6 checks passed

HynLcc deleted the migrate-batch-duplicatebase-read-search branch June 20, 2026 07:34
