Storage Engine v4 - Stack 5: equivalence runner + microtests + bench CLIs #872
Open
c1-squire-dev[bot] wants to merge 1 commit into
Closes the v4 PR stack with the validation infrastructure (RFC 0004 §6).
Equivalence harness (pkg/dotc1z/engine/equivalence/):
- Reference interface — minimal store contract that both the Pebble
engine and the in-memory reference satisfy.
- MemoryRef — deterministic in-memory implementation; serves as the
source-of-truth in property tests.
- RunWorkload(ctx, ref, w) → Result; Compare(ctx, refA, refB, w).
- Workload + Op types support Put + Delete sequences.
- 4/4 tests pass: TestPebbleMatchesMemoryRef (200 puts + 25 deletes,
5 entitlements, 50 principals), TestEmptyWorkload, TestSingleGrant,
TestPutOverwrite (proves index cleanup matches reference).
- Future stacks register the SQLite engine as a second Reference
once the v2 ↔ v3 translation layer is implemented.
Microtests (pkg/dotc1z/engine/pebble/microtests/):
- tuple_test.go — tests the production codec.AppendTupleBytes /
codec.AppendTupleSeparator for the prefix-free property over
hand-picked + 200 random cases (9 + 40,000 pairwise comparisons).
- codec_perf_test.go — codegen vs cached-reflection codec benchmark.
On Linux/arm64: 96.92 ns/op direct vs 188.0 ns/op reflect (~1.94×
delta — justifies the hybrid registry: direct codegen for the
canonical record types, reflection fallback for ad-hoc types via
the codec.Lookup path).
- doc.go indexes the 5 risk-validation tests. The remaining 3
(compress, checkpoint, ingest_excise) are already covered by
Stack 3 / Stack 4 tests against the production engine — they
don't need a separate microtest copy.
Benchmark CLI (cmd/baton-storage-bench/):
- G2 random Get by primary key, G3 list by entitlement, G4 list by
principal, G5 full-sync scan. Each loops for -duration and reports
ops/sec + ns/op. Read-only against an existing Pebble dir.
- G1 (bulk write) lives in cmd/baton-fixture-gen.
- G6–G10 land alongside Stack 5 follow-up commits.
Fixture-gen CLI (cmd/baton-fixture-gen/):
- Generates synthetic GrantRecord rows at scale. -grants flag for
total count, -entitlements / -principals for cardinality, -seed
for deterministic generation. Output is a Pebble directory ready
to bench with baton-storage-bench.
- Smoke: 1000 grants in 51ms (~19.6k r/s single-threaded). At 1M
grants this projects to ~50s, fast enough for hyper-scale fixture
workflows.
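Seeded generation and the throughput projection above can be illustrated with a short sketch. The GrantRecord fields, the ID formats, and the generate / projectSeconds helpers are hypothetical; the real CLI writes a Pebble directory rather than returning a slice.

```go
package main

import (
	"fmt"
	"math/rand"
)

// GrantRecord is a simplified stand-in; the real record type has its own fields.
type GrantRecord struct {
	ID            string
	EntitlementID string
	PrincipalID   string
}

// generate produces `grants` records spread over the given cardinalities.
// The same seed always yields the same sequence. entitlements and
// principals must be > 0, because rand.Intn panics on n <= 0.
func generate(seed int64, grants, entitlements, principals int) []GrantRecord {
	rng := rand.New(rand.NewSource(seed))
	out := make([]GrantRecord, grants)
	for i := range out {
		out[i] = GrantRecord{
			ID:            fmt.Sprintf("grant-%08d", i),
			EntitlementID: fmt.Sprintf("ent-%04d", rng.Intn(entitlements)),
			PrincipalID:   fmt.Sprintf("principal-%06d", rng.Intn(principals)),
		}
	}
	return out
}

// projectSeconds extrapolates a measured sample linearly to targetRows.
func projectSeconds(sampleRows int, sampleMillis float64, targetRows int) float64 {
	return sampleMillis / 1000 * float64(targetRows) / float64(sampleRows)
}

func main() {
	a := generate(42, 1000, 5, 50)
	b := generate(42, 1000, 5, 50)
	same := true
	for i := range a {
		if a[i] != b[i] {
			same = false
		}
	}
	fmt.Println("deterministic:", same) // prints "deterministic: true"
	fmt.Printf("projected 1M grants: %.0fs\n", projectSeconds(1000, 51, 1_000_000))
}
```

The projection is plain linear scaling: 1000 grants in 51 ms is about 19.6k rows/s, so 1M grants comes to roughly 51 s, in line with the ~50s figure. It assumes throughput stays flat at scale, which a 1000-row smoke run cannot confirm.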
End-to-end smoke verified: baton-fixture-gen → baton-storage-bench
roundtrip with G3 ListByEntitlement returning 732k rows in 1s of
benching (7.3k ops/s with the small fixture).
Refs: RFC 0004 §6 (benchmark goals), §3.5 (codec), §3.10 (compaction,
already covered by Stack 4)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Stack 5 of RFC 0004 — closes the v4 PR stack with the validation infrastructure (RFC §6). Stacks on Stack 4.
Equivalence harness at pkg/dotc1z/engine/equivalence/:
- Reference interface: minimal store contract that both the Pebble engine and the in-memory reference satisfy. Pluggable: future stacks can register the SQLite engine as a second Reference once the v2↔v3 translation layer is implemented (deferred; see tracker.md).
- MemoryRef: deterministic in-memory implementation; source-of-truth in property tests.
- RunWorkload(ctx, ref, w) → Result; Compare(ctx, refA, refB, w) error.
- Workload + Op types support Put + Delete sequences.
- 4/4 tests pass: TestPebbleMatchesMemoryRef (200 puts + 25 deletes, 5 entitlements, 50 principals), TestEmptyWorkload, TestSingleGrant, TestPutOverwrite (proves index cleanup matches reference).
Microtests at pkg/dotc1z/engine/pebble/microtests/:
- tuple_test.go: exercises the production codec.AppendTupleBytes / codec.AppendTupleSeparator for the prefix-free property over hand-picked + 200 random cases (40,000+ pairwise comparisons).
- codec_perf_test.go: codegen vs cached-reflection codec benchmark. On Linux/arm64: 96.92 ns/op direct vs 188.0 ns/op reflect (~1.94× delta), which justifies the hybrid registry: direct codegen for the canonical record types, reflection fallback for ad-hoc types via the codec.Lookup path.
- doc.go indexes the 5 risk-validation tests. The remaining 3 (compress, checkpoint, ingest_excise) are already covered by Stack 3 / Stack 4 tests against the production engine; they don't need a separate microtest copy.
Benchmark CLI at cmd/baton-storage-bench/:
- G2 random Get by primary key, G3 list by entitlement, G4 list by principal, G5 full-sync scan. Each loops for -duration and reports ops/sec + ns/op. Read-only against an existing Pebble dir.
- G1 (bulk write) lives in cmd/baton-fixture-gen. G6-G10 land as follow-ups.
Fixture-gen CLI at cmd/baton-fixture-gen/:
- Generates synthetic GrantRecord rows at scale. -grants flag for total count, -entitlements / -principals for cardinality, -seed for deterministic generation. Output is a Pebble directory ready to bench.
End-to-end smoke
- baton-fixture-gen → baton-storage-bench roundtrip verified, with G3 ListByEntitlement returning 732k rows in 1s of benching (7.3k ops/s with the small fixture).
Test plan
- make lint clean (no findings across the whole v4 stack)
🤖 Generated with Claude Code