Storage Engine v4 — Stack 5: equivalence runner + microtests + bench CLIs #872

Open
c1-squire-dev[bot] wants to merge 1 commit into pquerna/storage-v4-stack4-compaction from pquerna/storage-v4-stack5-equiv-bench

c1-squire-dev[bot] commented on May 24, 2026

Summary

Stack 5 of RFC 0004 — closes the v4 PR stack with the validation infrastructure (RFC §6). Stacks on Stack 4.

Equivalence harness at pkg/dotc1z/engine/equivalence/:

  • Reference interface — minimal store contract that both the Pebble engine and the in-memory reference satisfy. Pluggable: future stacks can register the SQLite engine as a second Reference once the v2↔v3 translation layer is implemented (deferred — see tracker.md).
  • MemoryRef — deterministic in-memory implementation; source-of-truth in property tests.
  • RunWorkload(ctx, ref, w) → Result; Compare(ctx, refA, refB, w) error.
  • Workload + Op types support Put + Delete sequences.
  • 4/4 tests pass: TestPebbleMatchesMemoryRef (200 puts + 25 deletes, 5 entitlements, 50 principals), TestEmptyWorkload, TestSingleGrant, TestPutOverwrite (proves index cleanup matches reference).

Microtests at pkg/dotc1z/engine/pebble/microtests/:

  • tuple_test.go — exercises the production codec.AppendTupleBytes / codec.AppendTupleSeparator for the prefix-free property over hand-picked + 200 random cases (40,000+ pairwise comparisons).
  • codec_perf_test.go — codegen vs cached-reflection codec benchmark. On Linux/arm64: 96.92 ns/op direct vs 188.0 ns/op reflect, so reflection is ~1.94× slower. This justifies the hybrid registry: direct codegen for the canonical record types, with a reflection fallback for ad-hoc types via the codec.Lookup path.
  • doc.go indexes the 5 risk-validation tests. Two live in this package (tuple_test.go, codec_perf_test.go); the remaining 3 (compress, checkpoint, ingest_excise) are already covered by Stack 3 / Stack 4 tests against the production engine, so they don't need a separate microtest copy.

Benchmark CLI at cmd/baton-storage-bench/:

  • G2 random Get by primary key, G3 list by entitlement, G4 list by principal, G5 full-sync scan.
  • Read-only against an existing Pebble dir; reports ops/sec + ns/op.
  • G1 (bulk write) lives in cmd/baton-fixture-gen. G6-G10 land as follow-ups.

Fixture-gen CLI at cmd/baton-fixture-gen/:

  • Generates synthetic GrantRecord rows at scale: -grants flag for total count, -entitlements / -principals for cardinality, -seed for deterministic generation. Output is a Pebble directory ready to bench.
  • Smoke test: 1,000 grants in 51 ms (~19.6k rows/s, single-threaded). Extrapolating linearly, 1M grants would take ~51 s.

End-to-end smoke

baton-fixture-gen → baton-storage-bench: G3 ListByEntitlement returned
732k rows in a 1 s benchmark run (7.3k ops/s on the small fixture, i.e. roughly 100 rows per list call).

Test plan

  • make lint clean (no findings across the whole v4 stack)
  • 4/4 equivalence tests + 1 tuple property test + 4 codec benchmarks
  • End-to-end fixture-gen → storage-bench smoke
  • CI green

🤖 Generated with Claude Code

c1-squire-dev[bot] force-pushed the pquerna/storage-v4-stack4-compaction branch from ea3fd77 to ebbb431 on May 24, 2026 16:51
c1-squire-dev[bot] force-pushed the pquerna/storage-v4-stack5-equiv-bench branch 2 times, most recently from c2ea070 to 7c4f5fa on May 24, 2026 16:59
c1-squire-dev[bot] force-pushed the pquerna/storage-v4-stack4-compaction branch from ebbb431 to 05ba86f on May 24, 2026 16:59
c1-squire-dev[bot] force-pushed the pquerna/storage-v4-stack5-equiv-bench branch 2 times, most recently from b70ddc4 to 4d89e9d on May 24, 2026 17:01
btipling self-assigned this on May 24, 2026
Closes the v4 PR stack with the validation infrastructure (RFC 0004 §6).

Equivalence harness (pkg/dotc1z/engine/equivalence/):
  - Reference interface — minimal store contract that both the Pebble
    engine and the in-memory reference satisfy.
  - MemoryRef — deterministic in-memory implementation; serves as the
    source-of-truth in property tests.
  - RunWorkload(ctx, ref, w) → Result; Compare(ctx, refA, refB, w).
  - Workload + Op types support Put + Delete sequences.
  - 4/4 tests pass: TestPebbleMatchesMemoryRef (200 puts + 25 deletes,
    5 entitlements, 50 principals), TestEmptyWorkload, TestSingleGrant,
    TestPutOverwrite (proves index cleanup matches reference).
  - Future stacks register the SQLite engine as a second Reference
    once the v2 ↔ v3 translation layer is implemented.

Microtests (pkg/dotc1z/engine/pebble/microtests/):
  - tuple_test.go — tests the production codec.AppendTupleBytes /
    codec.AppendTupleSeparator for the prefix-free property over
    hand-picked + 200 random cases (9 + 40,000 pairwise comparisons).
  - codec_perf_test.go — codegen vs cached-reflection codec benchmark.
    On Linux/arm64: 96.92 ns/op direct vs 188.0 ns/op reflect (~1.94×
    delta — justifies the hybrid registry: direct codegen for the
    canonical record types, reflection fallback for ad-hoc types via
    the codec.Lookup path).
  - doc.go indexes the 5 risk-validation tests. The remaining 3
    (compress, checkpoint, ingest_excise) are already covered by
    Stack 3 / Stack 4 tests against the production engine — they
    don't need a separate microtest copy.

Benchmark CLI (cmd/baton-storage-bench/):
  - G2 random Get by primary key, G3 list by entitlement, G4 list by
    principal, G5 full-sync scan. Each loops for -duration and reports
    ops/sec + ns/op. Read-only against an existing Pebble dir.
  - G1 (bulk write) lives in cmd/baton-fixture-gen.
  - G6–G10 land alongside Stack 5 follow-up commits.

Fixture-gen CLI (cmd/baton-fixture-gen/):
  - Generates synthetic GrantRecord rows at scale. -grants flag for
    total count, -entitlements / -principals for cardinality, -seed
    for deterministic generation. Output is a Pebble directory ready
    to bench with baton-storage-bench.
  - Smoke: 1000 grants in 51ms (~19.6k r/s single-threaded). At 1M
    grants this projects to ~50s, fast enough for hyper-scale fixture
    workflows.

End-to-end smoke verified: baton-fixture-gen → baton-storage-bench
roundtrip with G3 ListByEntitlement returning 732k rows in 1s of
benching (7.3k ops/s with the small fixture).

Refs: RFC 0004 §6 (benchmark goals), §3.5 (codec), §3.10 (compaction,
already covered by Stack 4)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
c1-squire-dev[bot] force-pushed the pquerna/storage-v4-stack5-equiv-bench branch from 4d89e9d to 6ed497e on May 24, 2026 21:11
c1-squire-dev[bot] force-pushed the pquerna/storage-v4-stack4-compaction branch from 05ba86f to 66a8025 on May 24, 2026 21:11
c1-squire-dev[bot] commented on May 24, 2026

Rebased onto the updated parent (PR #867) after review fixes from btipling and the pr-review bot. See #867 for the specific fixes, or PR #874 for the combined squash view.

2 participants