catalog: repair Role records left in pre-v81 wire shape (v82→v83) [draft]#36511
Draft
bosconi wants to merge 1 commit into
Draft
catalog: repair Role records left in pre-v81 wire shape (v82→v83) [draft]#36511bosconi wants to merge 1 commit into
bosconi wants to merge 1 commit into
Conversation
The v80→v81 migration (src/catalog/src/durable/upgrade/v80_to_v81.rs) was
guarded by an `is_cloud` heuristic that returned false for any catalog
where, at the moment the migration committed in writable mode, mz_system
was not a Managed cluster with replication_factor > 0, or where
enable_password_auth was "on". Affected catalogs ran the migration with
the heuristic returning false; the migration emitted Vec::new() and left
every existing Role record's on-disk JSON in v80 wire shape — without an
`auto_provision_source` key.
On v81+ the value is still readable (serde fills None for the missing
Option<AutoProvisionSource> field), so the env doesn't crash. But any
in-band write that retracts a Role record — ALTER ROLE, membership
changes, etc. — re-serializes the deserialized value through the v81
type's Serialize, which emits `"auto_provision_source": null`. Those
retraction bytes are not byte-equal to the v80-shape bytes on disk, so
consolidation cannot cancel them; the original `+1` survives as a ghost
record alongside whatever else accumulates.
This change:
* Adds CATALOG_VERSION v83. v83 is structurally identical to v82 except
that RoleAttributes::auto_provision_source has #[serde(default,
skip_serializing_if = "Option::is_none")], making the JSON round-trip
byte-stable in both directions (None ⇔ no-field; Some(X) ⇔ "X").
* Adds run_versioned_upgrade_raw — a byte-precise counterpart to the
existing run_versioned_upgrade. Migrations using this entry point
receive each record's raw StateUpdateKindJson alongside its V1
deserialized form, and return raw (StateUpdateKindJson, Diff) updates
that are appended verbatim. This is what makes byte-precise retraction
of stale wire shapes possible — the standard MigrationAction::Update
path re-serializes `old` through V1::Serialize, which cannot reproduce
the v80 shape from a deserialized v82 value.
* Adds the v82→v83 migration. For every Role record, the migration
inspects the raw JSON to determine whether `auto_provision_source` is
present as a key, then decides the target value:
- field present, Some(X) on disk → preserve X
- field absent on disk (v80 shape) → apply the email-regex heuristic
from the original v80→v81 migration (Frontegg if the role name
matches `.+@.+\..+`, else None) — unconditionally, with no
is_cloud gate
- field present and explicitly null → preserve None
If the canonical v83 bytes differ from the on-disk bytes, the
migration emits a retraction of the exact on-disk bytes and an
insertion of the canonical v83 bytes. Records already in canonical v83
shape are skipped. Non-Role records are untouched.
Tests in v82_to_v83.rs cover the four code paths plus "non-Role records
untouched".
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Draft for discussion. Potential fix for the v80→v81 catalog migration issue.
What's in the diff
CATALOG_VERSION 82 → 83. v83 is structurally identical to v82 except that
RoleAttributes::auto_provision_sourcehas#[serde(default, skip_serializing_if = "Option::is_none")]. This makes the JSON round-trip byte-stable in both directions (None ⇔ no-field; Some(X) ⇔ "X"). objects_v82.rs is unchanged — it remains the frozen "from" version the migration reads.New
run_versioned_upgrade_rawhelper inupgrade.rs. Byte-precise counterpart to the existingrun_versioned_upgrade. Migrations using this entry point receive each record's rawStateUpdateKindJsonalongside its V1 deserialized form, and return raw(StateUpdateKindJson, Diff)updates that are appended verbatim. This is what enables byte-precise retraction of stale wire shapes — the standardMigrationAction::Updatepath re-serializesoldthroughV1::Serialize, which cannot reproduce the v80 shape from a deserialized v82 value. All existing migrations remain on the typed path.v82→v83 migration (
v82_to_v83.rs). For every Role:Some(X)on disk → preserve XFronteggif name matches.+@.+\..+, elseNone), unconditional — nois_cloudgatenull→ preserveNoneIf the canonical v83 bytes differ from on-disk bytes, the migration emits a retraction of the exact on-disk bytes plus an insertion of canonical v83 bytes. Records already in canonical shape are skipped. Non-Role records are untouched.
What's intentionally NOT in this diff
MigrationAction. The new raw-bytes pathway is a parallel entry point.CreateRoleV1struct (which also has anauto_provision_sourcefield). Audit events are append-only — the byte-stability problem doesn't apply.Background: the bug
The v80→v81 migration (
src/catalog/src/durable/upgrade/v80_to_v81.rs:36-60) was guarded by anis_cloudheuristic that returned false for any catalog where, at the moment the migration committed in writable mode:mz_systemwas not aManagedcluster, ORmz_system.replication_factorwas 0, ORenable_password_authwas\"on\"When the heuristic returned false, the migration emitted
Vec::new()and left every existing Role record's on-disk JSON in v80 wire shape (without anauto_provision_sourcekey). On v81+ the value is still readable (serde fillsNonefor the missingOption<AutoProvisionSource>field), so the env doesn't crash. But any in-band write that retracts a Role record —ALTER ROLE, membership changes, etc. — re-serializes the deserialized value through the v81 type'sSerialize, which emits\"auto_provision_source\": null. Those retraction bytes are not byte-equal to the v80-shape bytes on disk, sodifferential_dataflowconsolidation cannot cancel them; the original+1survives as a ghost record alongside whatever else accumulates.In at least one environment this manifested as: most pre-04-02 user roles have
auto_provision_source: None(the field-absent v80 shape, deserialized as None), only roles created post-v81 haveFrontegg.Open questions for review
run_versioned_upgrade_rawis a one-off for this fix. Worth keeping for future shape-drift migrations, or should it be inlined into v82_to_v83 and removed once this migration is no longer needed?Vec::new()returns from migrations that don't declarejson_compatible!for shape-changed types; replaceis_cloudheuristics with an authoritative deployment-topology signal) are intentionally out of scope here. Worth filing as follow-ups?auto_provision_sourcein theCreateRoleV1payload (because the field was added to the audit struct at that point too). Audit events are append-only, so we can't repair them. Acceptable, or should the migration write synthetic audit events for the canonicalized roles?Test plan
Unit tests in
v82_to_v83.rscover:Frontegg, retraction byte-matches on-diskNone, retraction byte-matches on-diskFrontegg→ no rewrite (already canonical)null→ rewritten to no-field (preserve None, do NOT apply email regex)Larger end-to-end test plan (not landed in this PR):
cargo test --package mz-catalog --lib durable::upgrade::tests::generate_missing_encodings -- --ignoredto generateobjects_v83.txtsnapshot.cargo test -p mz-catalog --lib.🤖 Generated with Claude Code