Fix negative multiplicities caused by v80->v81 catalog migration#36519
Fix negative multiplicities caused by v80->v81 catalog migration#36519SangJunBak wants to merge 2 commits into
Conversation
c159418 to
f2b0fd4
Compare
2cf9fb1 to
40be5ee
Compare
There was a problem hiding this comment.
Nightly triggered, I'll take a closer look at the code now: https://buildkite.com/materialize/nightly/builds/16399
Edit: Fixed the miri test failures.
| agents: | ||
| queue: hetzner-x86-64-4cpu-8gb | ||
|
|
||
| - id: catalog-migration-incident-1007 |
There was a problem hiding this comment.
Upgrade tests can't run in Test pipeline, they are way too slow. Should be moved to Nightly. I would also prefer if this was caught in platform-checks additionally, not just with those specific versions, but with any version in the future exhibiting similar bugs.
I'll try add it to platform-checks.
| Catalog versions: | ||
| v26.17.x -> catalog 80 (pre-bug) | ||
| v26.18.0 -> catalog 81, introduces the buggy `v80_to_v81::upgrade` whose | ||
| `is_cloud` heuristic silently no-ops on self-managed envs. | ||
| Role rows stay in their v80 byte form | ||
| (no `auto_provision_source` key) while the catalog version | ||
| advances. Any subsequent ALTER reads the v80 form, parses it, | ||
| and writes a retract+insert pair re-serialized through the | ||
| current proto — which always includes `auto_provision_source: | ||
| null` — leaving a dangling `-1`. | ||
| current -> catalog 83, this PR's `v82_to_v83::upgrade` scans for the | ||
| structural signature of the drift and emits compensating | ||
| updates to retire the dangling `-1` and the stale `+1`. |
There was a problem hiding this comment.
I'm not sure if we still need this test at all then, we could add a scenario to platform checks that does these versions too one we add the scenario that catches it by altering the role and then selecting from the catalog.
| c.testdrive(dedent(""" | ||
| > SELECT EXISTS (SELECT 1 FROM mz_roles WHERE name = 'user1'); | ||
| false | ||
| """)) |
There was a problem hiding this comment.
The entire file smelly pretty LLMy, I think we shouldn't have one mzcompose.py test per bug/incident we run into, but use the existing frameworks when appropriate, so that we find similarly shaped bugs more generally.
|
I pushed platform checks extensions that would have caught this: Test run on this PR now: https://buildkite.com/materialize/nightly/builds/16406 Edit: Removed it for now, I'll do it in a separate PR so that we can merge this test already. |
| agents: | ||
| queue: hetzner-x86-64-4cpu-8gb | ||
|
|
||
| - id: catalog-migration-incident-1007 |
There was a problem hiding this comment.
Upgrade tests can't run in Test pipeline, they are way too slow. Should be moved to Nightly. I would also prefer if this was caught in platform-checks additionally, not just with those specific versions, but with any version in the future exhibiting similar bugs.
I'll try add it to platform-checks.
Rebased off and dependent on #36524
Motivation
Creates a regression test for fixing negative multiplicities caused by caused by v80->v81 catalog migration seen in incident-1007
Description
Verification
How do you know this change is correct? Describe new or existing automated
tests, or manual steps you took.