Storage Engine v4 — Stack 2: c1z3 envelope format #869

Open
c1-squire-dev[bot] wants to merge 1 commit into pquerna/storage-v4-stack1-protos-codegen from pquerna/storage-v4-stack2-envelope
c1-squire-dev Bot commented May 24, 2026

Summary

Stack 2 of RFC 0004. Stacks on Stack 1.

The v3 c1z file format envelope:

C1Z3\x00 | uint32_be(manifest_len) | manifest_proto | zstd(tar(pebble_dir))

New proto at proto/c1/c1z/v3/manifest.proto, message C1ZManifestV3:

  • engine — name of the engine that produced this c1z ("pebble", "sqlite").
  • engine_schema_version — schema major version (currently 1).
  • engine_config — opaque google.protobuf.Any for engine-specific config (e.g. PebbleEngineConfig includes the pinned format major version).
  • payload_encoding: ZstdTar or UncompressedTar.
  • descriptors: FileDescriptorSet containing the closure of types in this manifest (lets a future reader self-describe without our codebase).
  • record_types[]: RecordTypeInfo with the (table_name, full_type_name) of each record family.
  • sync_runs[]: SyncRunSummary so a reader knows what syncs are inside without opening Pebble.

New format/v3 package at pkg/dotc1z/format/v3/:

  • manifest.go: MarshalManifest / UnmarshalManifest + BuildDescriptorClosure(protoregistry.Files) → *descriptorpb.FileDescriptorSet + VerifyDescriptorClosure.
  • envelope.go: WriteEnvelope(w, manifest, payloadDir) + ReadEnvelope(r) (manifest, payloadReader, error) + ExtractZstdTar(payload, dir) with filepath.IsLocal traversal protection. Manifest size capped at 16 MiB.

Tests (envelope_test.go):

  • TestEnvelopeRoundtrip — write synthetic Pebble-shaped dir, read back, extract, verify.
  • TestReadEnvelopeBadMagic — refuses non-C1Z3 inputs.
  • TestReadEnvelopeTruncated — refuses truncated headers.
  • TestVerifyDescriptorClosureCatchesMissing — surfaces incomplete closures.
  • TestBuildDescriptorClosureIncludesStorageV3 — verifies the closure walker picks up storage v3 types.

Test plan

  • make lint clean
  • Envelope roundtrip tests pass
  • CI green

🤖 Generated with Claude Code

Stack 2 of the storage-engine-v4 PR series. Wires the v3 file
envelope on top of Stack 1's protos and codec layer. All under
`//go:build batonsdkv2`.

Proto at `proto/c1/c1z/v3/manifest.proto`:

  * `C1ZManifestV3` — engine identifier, engine_schema_version,
    engine_config (Any, typically PebbleEngineConfig), payload
    encoding, transitively-closed FileDescriptorSet, record-type
    catalog, sync-run summary, tooling metadata.
  * `PayloadEncoding` enum (UNSPECIFIED, RAW, ZSTD, ZSTD_TAR).
  * `RecordTypeInfo` for per-record-type schema versioning.
  * `SyncRunSummary` for cheap tooling enumeration without engine open.
  * `PebbleEngineConfig` — pinned format_major_version + cache size hint.

Format package at `pkg/dotc1z/format/v3/`:

  * `manifest.go` — Marshal/Unmarshal with empty-engine guard;
    BuildDescriptorClosure walks protoregistry.GlobalFiles for
    c1.storage.v3 packages and returns the transitive-closure
    FileDescriptorSet; VerifyDescriptorClosure detects missing
    dependencies at read time (returns ErrManifestIncompleteDescriptors).
  * `envelope.go` — WriteEnvelope emits magic + uint32-BE length
    prefix + manifest + zstd-tar payload. ReadEnvelope parses each
    layer with explicit truncation + manifest-size cap (16 MiB) +
    invalid-magic guards. ExtractZstdTar streams the payload tar into
    a destination directory with directory-traversal protection via
    filepath.IsLocal.

Tests cover: full directory roundtrip (write a synthetic Pebble-shaped
dir, read it back, verify file contents); bad magic; truncated header;
empty-engine refusal on both Marshal and Unmarshal; descriptor closure
completeness verification (missing dep detected; complete closure
accepted; storage.v3 records.proto present in the auto-built closure).

Deferred:
  * Full protodesc.ToFileDescriptorProto conversion in
    BuildDescriptorClosure (Stack 3) — Stack 2 ships the import-graph-
    plus-package-name projection so the closure invariant is testable.
    The richer projection lands when the engine actually consumes the
    descriptors via dynamicpb at open.
  * `cmd/baton-descriptor-closure-test/` standalone CI fixture lands
    in Stack 5 alongside the equivalence harness.

Refs: RFC v4 §3.2 (envelope), Appendix B (full envelope spec),
research/import-11.md TestCheckpointRoundtrip for the source-DB-
stays-writable property the save protocol depends on.
c1-squire-dev Bot force-pushed the pquerna/storage-v4-stack1-protos-codegen branch from 5b79a33 to 205e0e4 on May 24, 2026 21:11
c1-squire-dev Bot force-pushed the pquerna/storage-v4-stack2-envelope branch from bcdae06 to 0665ce2 on May 24, 2026 21:11
c1-squire-dev Bot commented May 24, 2026

Rebased onto the updated parent (PR #867) after review fixes from btipling and the pr-review bot. See #867 for the specific fixes, or PR #874 for the combined squash view.
