Storage Engine v4 — Stack 5: equivalence runner + microtests + bench CLIs by c1-squire-dev[bot] · Pull Request #872 · ConductorOne/baton-sdk

c1-squire-dev · 2026-05-24T16:44:53Z

Summary

Stack 5 of RFC 0004 — closes the v4 PR stack with the validation infrastructure (RFC §6). Stacks on Stack 4.

Equivalence harness at pkg/dotc1z/engine/equivalence/:

Reference interface — minimal store contract that both the Pebble engine and the in-memory reference satisfy. Pluggable: future stacks can register the SQLite engine as a second Reference once the v2↔v3 translation layer is implemented (deferred — see tracker.md).
MemoryRef — deterministic in-memory implementation; source-of-truth in property tests.
RunWorkload(ctx, ref, w) → Result; Compare(ctx, refA, refB, w) error.
Workload + Op types support Put + Delete sequences.
4/4 tests pass: TestPebbleMatchesMemoryRef (200 puts + 25 deletes, 5 entitlements, 50 principals), TestEmptyWorkload, TestSingleGrant, TestPutOverwrite (proves index cleanup matches reference).
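
The harness shape described above can be sketched roughly as follows. This is a minimal illustration and not the code in pkg/dotc1z/engine/equivalence/: the method set on Reference (Put, Delete, Snapshot), the Op fields, and the contents of Result are all assumptions made for the example. Only the names Reference, MemoryRef, Workload, Op, RunWorkload, and Compare come from the description. Because the Pebble engine is not available in a standalone snippet, two MemoryRef instances stand in for the two sides of the comparison.

```go
package main

import (
	"context"
	"fmt"
	"reflect"
	"sort"
)

type OpKind int

const (
	OpPut OpKind = iota
	OpDelete
)

// Op is one workload step. All field names are illustrative assumptions.
type Op struct {
	Kind        OpKind
	GrantID     string
	Entitlement string
	Principal   string
}

type Workload struct{ Ops []Op }

// Result is the observable end state: sorted grant IDs per entitlement.
type Result struct{ ByEntitlement map[string][]string }

// Reference is a hypothetical minimal store contract.
type Reference interface {
	Put(ctx context.Context, op Op) error
	Delete(ctx context.Context, grantID string) error
	Snapshot(ctx context.Context) (Result, error)
}

// MemoryRef is a deterministic, map-backed Reference.
type MemoryRef struct{ grants map[string]Op }

func NewMemoryRef() *MemoryRef { return &MemoryRef{grants: map[string]Op{}} }

func (m *MemoryRef) Put(_ context.Context, op Op) error {
	m.grants[op.GrantID] = op // an overwrite replaces the row, so no stale index entry survives
	return nil
}

func (m *MemoryRef) Delete(_ context.Context, grantID string) error {
	delete(m.grants, grantID)
	return nil
}

func (m *MemoryRef) Snapshot(_ context.Context) (Result, error) {
	res := Result{ByEntitlement: map[string][]string{}}
	for id, g := range m.grants {
		res.ByEntitlement[g.Entitlement] = append(res.ByEntitlement[g.Entitlement], id)
	}
	for _, ids := range res.ByEntitlement {
		sort.Strings(ids) // map iteration order is random; sort for a stable result
	}
	return res, nil
}

// RunWorkload applies every op in order and returns the final state.
func RunWorkload(ctx context.Context, ref Reference, w Workload) (Result, error) {
	for _, op := range w.Ops {
		var err error
		if op.Kind == OpPut {
			err = ref.Put(ctx, op)
		} else {
			err = ref.Delete(ctx, op.GrantID)
		}
		if err != nil {
			return Result{}, err
		}
	}
	return ref.Snapshot(ctx)
}

// Compare runs the same workload against both stores and reports any divergence.
func Compare(ctx context.Context, a, b Reference, w Workload) error {
	ra, err := RunWorkload(ctx, a, w)
	if err != nil {
		return err
	}
	rb, err := RunWorkload(ctx, b, w)
	if err != nil {
		return err
	}
	if !reflect.DeepEqual(ra, rb) {
		return fmt.Errorf("results diverge: %v vs %v", ra, rb)
	}
	return nil
}

// demo overwrites g1 (moving it from e1 to e2) and deletes g2.
func demo() (Result, error) {
	ctx := context.Background()
	w := Workload{Ops: []Op{
		{OpPut, "g1", "e1", "alice"},
		{OpPut, "g2", "e1", "bob"},
		{OpPut, "g1", "e2", "alice"},
		{OpDelete, "g2", "", ""},
	}}
	if err := Compare(ctx, NewMemoryRef(), NewMemoryRef(), w); err != nil {
		return Result{}, err
	}
	return RunWorkload(ctx, NewMemoryRef(), w)
}

func main() {
	res, err := demo()
	fmt.Println(res.ByEntitlement, err) // map[e2:[g1]] <nil>
}
```

The overwrite case is the interesting one: after g1 moves to e2, nothing may remain listed under e1. That is the kind of stale secondary-index entry a test such as TestPutOverwrite would be expected to catch in a real engine.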

Microtests at pkg/dotc1z/engine/pebble/microtests/:

tuple_test.go — exercises the production codec.AppendTupleBytes / codec.AppendTupleSeparator for the prefix-free property over hand-picked + 200 random cases (40,000+ pairwise comparisons).
codec_perf_test.go — codegen vs cached-reflection codec benchmark. On Linux/arm64: 96.92 ns/op direct vs 188.0 ns/op reflect (~1.94× delta — justifies the hybrid registry: direct codegen for the canonical record types, reflection fallback for ad-hoc types via the codec.Lookup path).
doc.go indexes the 5 risk-validation tests. The remaining 3 (compress, checkpoint, ingest_excise) are already covered by Stack 3 / Stack 4 tests against the production engine — they don't need a separate microtest copy.
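
For readers unfamiliar with the prefix-free property that tuple_test.go checks, here is a small self-contained sketch. The encoding below (escape 0x00 as 0x00 0xFF, terminate each component with 0x00 0x01) is a common scheme chosen for illustration; it is an assumption and may differ from what codec.AppendTupleBytes and codec.AppendTupleSeparator actually emit. The helper names appendEscaped, appendSeparator, randomCases, and checkPrefixFree are hypothetical. Comparing 200 random cases against each other gives 200 × 200 = 40,000 ordered pairs, which lines up with the comparison count quoted above.

```go
package main

import (
	"bytes"
	"fmt"
	"math/rand"
)

// appendEscaped copies b into dst, rewriting each 0x00 as 0x00 0xFF.
// (Assumed scheme, not necessarily the production codec's.)
func appendEscaped(dst, b []byte) []byte {
	for _, c := range b {
		if c == 0x00 {
			dst = append(dst, 0x00, 0xFF)
		} else {
			dst = append(dst, c)
		}
	}
	return dst
}

// appendSeparator ends a component with 0x00 0x01, a byte pair the
// escaping above never produces inside a component.
func appendSeparator(dst []byte) []byte { return append(dst, 0x00, 0x01) }

// encode produces the full encoding of a single tuple component.
func encode(b []byte) []byte { return appendSeparator(appendEscaped(nil, b)) }

// randomCases builds n short byte strings from an alphabet that
// deliberately includes the escape-sensitive bytes 0x00, 0x01, and 0xFF.
func randomCases(seed int64, n int) [][]byte {
	r := rand.New(rand.NewSource(seed))
	alphabet := []byte{0x00, 0x01, 0xFF, 'a'}
	out := make([][]byte, n)
	for i := range out {
		b := make([]byte, r.Intn(6))
		for j := range b {
			b[j] = alphabet[r.Intn(len(alphabet))]
		}
		out[i] = b
	}
	return out
}

// checkPrefixFree compares every ordered pair and counts the pairs where
// the encoding of one distinct input is a prefix of the other's encoding.
func checkPrefixFree(cases [][]byte) (comparisons, violations int) {
	for _, a := range cases {
		for _, b := range cases {
			comparisons++
			if !bytes.Equal(a, b) && bytes.HasPrefix(encode(b), encode(a)) {
				violations++
			}
		}
	}
	return comparisons, violations
}

func main() {
	comparisons, violations := checkPrefixFree(randomCases(1, 200))
	fmt.Println(comparisons, violations) // 40000 0
}
```

Without a terminator, the raw bytes of "a" are a prefix of "ab", so a range scan over keys starting with one tuple component could pick up rows that belong to another. The terminator is what rules that out.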

Benchmark CLI at cmd/baton-storage-bench/:

G2 random Get by primary key, G3 list by entitlement, G4 list by principal, G5 full-sync scan.
Read-only against an existing Pebble dir; reports ops/sec + ns/op.
G1 (bulk write) lives in cmd/baton-fixture-gen. G6-G10 land as follow-ups.
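
A duration-bounded benchmark loop of the kind described above can be sketched as follows. This is an illustration only: Stats, runFor, and the in-memory map standing in for a Pebble-backed list call are hypothetical, and the real CLI's internals are not shown in this PR description. The two reported figures are reciprocals of each other (ops/sec × ns/op = 10^9), so 7.3k ops/s corresponds to roughly 137 µs per operation.

```go
package main

import (
	"fmt"
	"time"
)

// Stats holds the raw counters from one benchmark run.
type Stats struct {
	Ops     int
	Elapsed time.Duration
}

// OpsPerSec and NsPerOp are two views of the same measurement.
func (s Stats) OpsPerSec() float64 { return float64(s.Ops) / s.Elapsed.Seconds() }
func (s Stats) NsPerOp() float64   { return float64(s.Elapsed.Nanoseconds()) / float64(s.Ops) }

// runFor calls op repeatedly until at least d has elapsed.
// It always runs op at least once.
func runFor(d time.Duration, op func() error) (Stats, error) {
	start := time.Now()
	ops := 0
	for {
		if err := op(); err != nil {
			return Stats{}, err
		}
		ops++
		if time.Since(start) >= d {
			break
		}
	}
	return Stats{Ops: ops, Elapsed: time.Since(start)}, nil
}

func main() {
	// Stand-in for a list-by-entitlement call: 100 rows per lookup.
	index := map[string][]int{"ent-0": make([]int, 100)}
	rows := 0
	s, err := runFor(50*time.Millisecond, func() error {
		rows += len(index["ent-0"])
		return nil
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d ops, %d rows, %.0f ops/sec, %.1f ns/op\n",
		s.Ops, rows, s.OpsPerSec(), s.NsPerOp())
}
```

Tracking rows separately from ops matters for list benchmarks, since one operation can return many rows.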

Fixture-gen CLI at cmd/baton-fixture-gen/:

Generates synthetic GrantRecord rows at scale: -grants flag for total count, -entitlements / -principals for cardinality, -seed for deterministic generation. Output is a Pebble directory ready to bench.
Smoke: 1000 grants in 51ms (~19.6k r/s single-threaded). At 1M grants this projects to ~50s.
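
The seeded generation and the projection arithmetic can be illustrated with a short sketch. The Grant struct, the ID formats, and the generate and projectMillis helpers are assumptions for this example, not the fixture-gen implementation, and the snippet builds rows in memory instead of writing a Pebble directory. The parameters mirror the -seed, -grants, -entitlements, and -principals flags.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Grant is a simplified stand-in for GrantRecord (fields are assumed).
type Grant struct {
	ID          string
	Entitlement string
	Principal   string
}

// generate returns the same rows for the same seed and parameters.
func generate(seed int64, grants, entitlements, principals int) []Grant {
	r := rand.New(rand.NewSource(seed))
	out := make([]Grant, grants)
	for i := range out {
		out[i] = Grant{
			ID:          fmt.Sprintf("grant-%07d", i),
			Entitlement: fmt.Sprintf("ent-%d", r.Intn(entitlements)),
			Principal:   fmt.Sprintf("principal-%d", r.Intn(principals)),
		}
	}
	return out
}

// sameGrants reports whether two runs produced identical rows.
func sameGrants(a, b []Grant) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// projectMillis linearly scales a measured run to a larger row count.
// It assumes throughput stays flat, which a real engine may not deliver.
func projectMillis(measuredRows, measuredMillis, targetRows int64) int64 {
	return measuredMillis * targetRows / measuredRows
}

func main() {
	a := generate(42, 1000, 5, 50)
	b := generate(42, 1000, 5, 50)
	fmt.Println(sameGrants(a, b))                   // true
	fmt.Println(projectMillis(1000, 51, 1_000_000)) // 51000
}
```

Scaling 51 ms per 1000 grants linearly gives 51,000 ms for 1M grants, in line with the ~50s quoted above. Treat it as a rough estimate, since write throughput at that scale has not been measured here.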

End-to-end smoke

baton-fixture-gen → baton-storage-bench: G3 ListByEntitlement returns 732k rows in 1s of benching (7.3k ops/s with the small fixture).

Test plan

make lint clean (no findings across the whole v4 stack)
4/4 equivalence tests + 1 tuple property test + 4 codec benchmarks
End-to-end fixture-gen → storage-bench smoke
CI green

🤖 Generated with Claude Code

Closes the v4 PR stack with the validation infrastructure (RFC 0004 §6).

Equivalence harness (pkg/dotc1z/engine/equivalence/):
- Reference interface: minimal store contract that both the Pebble engine and the in-memory reference satisfy.
- MemoryRef: deterministic in-memory implementation; serves as the source-of-truth in property tests.
- RunWorkload(ctx, ref, w) → Result; Compare(ctx, refA, refB, w).
- Workload + Op types support Put + Delete sequences.
- 4/4 tests pass: TestPebbleMatchesMemoryRef (200 puts + 25 deletes, 5 entitlements, 50 principals), TestEmptyWorkload, TestSingleGrant, TestPutOverwrite (proves index cleanup matches reference).
- Future stacks register the SQLite engine as a second Reference once the v2 ↔ v3 translation layer is implemented.

Microtests (pkg/dotc1z/engine/pebble/microtests/):
- tuple_test.go: tests the production codec.AppendTupleBytes / codec.AppendTupleSeparator for the prefix-free property over hand-picked + 200 random cases (9 + 40,000 pairwise comparisons).
- codec_perf_test.go: codegen vs cached-reflection codec benchmark. On Linux/arm64: 96.92 ns/op direct vs 188.0 ns/op reflect (~1.94× delta; justifies the hybrid registry: direct codegen for the canonical record types, reflection fallback for ad-hoc types via the codec.Lookup path).
- doc.go indexes the 5 risk-validation tests. The remaining 3 (compress, checkpoint, ingest_excise) are already covered by Stack 3 / Stack 4 tests against the production engine, so they don't need a separate microtest copy.

Benchmark CLI (cmd/baton-storage-bench/):
- G2 random Get by primary key, G3 list by entitlement, G4 list by principal, G5 full-sync scan. Each loops for -duration and reports ops/sec + ns/op. Read-only against an existing Pebble dir.
- G1 (bulk write) lives in cmd/baton-fixture-gen.
- G6–G10 land alongside Stack 5 follow-up commits.

Fixture-gen CLI (cmd/baton-fixture-gen/):
- Generates synthetic GrantRecord rows at scale. -grants flag for total count, -entitlements / -principals for cardinality, -seed for deterministic generation. Output is a Pebble directory ready to bench with baton-storage-bench.
- Smoke: 1000 grants in 51ms (~19.6k r/s single-threaded). At 1M grants this projects to ~50s, fast enough for hyper-scale fixture workflows.

End-to-end smoke verified: baton-fixture-gen → baton-storage-bench roundtrip with G3 ListByEntitlement returning 732k rows in 1s of benching (7.3k ops/s with the small fixture).

Refs: RFC v4 §6 (benchmark goals), §3.5 (codec), §3.10 (compaction already covered by Stack 4)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

c1-squire-dev · 2026-05-24T21:13:15Z

Rebased onto the updated parent (PR #867) after review fixes from btipling and the pr-review bot. See #867 for the specific fixes, or PR #874 for the combined squash view.

c1-squire-dev Bot force-pushed the pquerna/storage-v4-stack4-compaction branch from ea3fd77 to ebbb431 on May 24, 2026 16:51

c1-squire-dev Bot force-pushed the pquerna/storage-v4-stack5-equiv-bench branch 2 times, most recently from c2ea070 to 7c4f5fa on May 24, 2026 16:59

c1-squire-dev Bot force-pushed the pquerna/storage-v4-stack4-compaction branch from ebbb431 to 05ba86f on May 24, 2026 16:59

c1-squire-dev Bot force-pushed the pquerna/storage-v4-stack5-equiv-bench branch 2 times, most recently from b70ddc4 to 4d89e9d on May 24, 2026 17:01

btipling self-assigned this May 24, 2026

c1-squire-dev Bot force-pushed the pquerna/storage-v4-stack5-equiv-bench branch from 4d89e9d to 6ed497e on May 24, 2026 21:11

c1-squire-dev Bot force-pushed the pquerna/storage-v4-stack4-compaction branch from 05ba86f to 66a8025 on May 24, 2026 21:11

c1-squire-dev Bot mentioned this pull request May 24, 2026

Storage Engine v4 — combined squash (for macro review) #874

Open

btipling mentioned this pull request May 25, 2026

feat: c1z sanitizer v0.1 — library + CLI #875

Open

8 tasks


c1-squire-dev[bot] wants to merge 1 commit into pquerna/storage-v4-stack4-compaction from pquerna/storage-v4-stack5-equiv-bench
