handle out-of-order columns in partitioned table #4035
```go
// followed by a child RelationMessage for each partition. The parent's
// column list matches the tuple data wire format, so skip the child's
// to avoid overwriting with a potentially reordered column definition.
if originalRelID != msg.RelationID && parentRelKind == 'p' && p.publishViaPartitionRoot {
```
nit: could store `'p'` and the other relkinds as named constants of a rune type somewhere and reference them here, so it's clearer what each relkind means.
Makes sense. There are a few other places where detecting relkind is needed, such as introducing ctid for partitioned tables; I'll consolidate them as a follow-up.
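As a rough sketch of what the nit suggests, assuming a new named type: the type `RelKind`, the constant names, and the `IsPartitioned` helper below are hypothetical; only the single-character values are the ones PostgreSQL documents for `pg_class.relkind`.

```go
package main

import "fmt"

// RelKind mirrors the single-character values of pg_class.relkind.
// The type and constant names are hypothetical; the character values
// are the ones documented for the pg_class catalog.
type RelKind rune

const (
	RelKindOrdinaryTable    RelKind = 'r'
	RelKindIndex            RelKind = 'i'
	RelKindSequence         RelKind = 'S'
	RelKindToastTable       RelKind = 't'
	RelKindView             RelKind = 'v'
	RelKindMaterializedView RelKind = 'm'
	RelKindCompositeType    RelKind = 'c'
	RelKindForeignTable     RelKind = 'f'
	RelKindPartitionedTable RelKind = 'p'
	RelKindPartitionedIndex RelKind = 'I'
)

// IsPartitioned reports whether the relation is a declaratively
// partitioned table. Inheritance parents are ordinary tables ('r'),
// so they return false.
func (k RelKind) IsPartitioned() bool {
	return k == RelKindPartitionedTable
}

func main() {
	parentRelKind := RelKind('p') // e.g. as scanned from pg_class.relkind
	fmt.Println(parentRelKind.IsPartitioned())
	fmt.Println(RelKindOrdinaryTable.IsPartitioned())
}
```

With something like this, the condition in the hunk would read `parentRelKind == RelKindPartitionedTable` (or `parentRelKind.IsPartitioned()`), which documents intent without a comment.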
- Bring back #4035, querying for `pubviaroot` only if the PG version is >= 13; otherwise default to false, since `pubviaroot` is not supported on older versions anyway
- Add e2e coverage to check that a dynamically added partition with out-of-order columns works
- Validate that CDC continues to work on PG 12

Fixes: #3544
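The version gate in the first bullet can be sketched as follows. This is an illustration under stated assumptions, not the connector's code: `publishViaPartitionRoot` and the `queryBoolFunc` callback are hypothetical names, and the version is assumed to be the integer form of `server_version_num` (e.g. 120015 for 12.15). The `pubviaroot` column of `pg_publication` exists only from PostgreSQL 13 on, which is why older servers skip the query entirely.

```go
package main

import "fmt"

// pubViaRootQuery reads the publication's publish_via_partition_root
// flag from the catalog. pg_publication.pubviaroot exists on PG 13+.
const pubViaRootQuery = "SELECT pubviaroot FROM pg_publication WHERE pubname = $1"

// queryBoolFunc is a hypothetical stand-in for whatever database handle
// runs a single-row, single-column boolean query.
type queryBoolFunc func(sql string, args ...any) (bool, error)

// publishViaPartitionRoot returns the publication's pubviaroot setting,
// or false without querying when the server is older than PG 13.
func publishViaPartitionRoot(serverVersionNum int, pubName string, query queryBoolFunc) (bool, error) {
	if serverVersionNum < 130000 {
		// Option not supported before PG 13: default to false.
		return false, nil
	}
	viaRoot, err := query(pubViaRootQuery, pubName)
	if err != nil {
		return false, fmt.Errorf("failed to read pubviaroot for publication %q: %w", pubName, err)
	}
	return viaRoot, nil
}

func main() {
	calls := 0
	fake := func(sql string, args ...any) (bool, error) {
		calls++
		return true, nil
	}
	onPG12, _ := publishViaPartitionRoot(120015, "my_pub", fake)
	onPG16, _ := publishViaPartitionRoot(160002, "my_pub", fake)
	fmt.Println(onPG12, onPG16, calls) // false true 1
}
```

Passing the query as a callback keeps the sketch runnable without a database and makes the "no query on PG 12" behavior easy to check.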
Skip child partition relation messages in the CDC stream.

With `publish_via_partition_root = true`, PostgreSQL emits both a parent and a child RelationMessage before each partition's first change event. Also note that with `publish_via_partition_root = true`, the insert/update/delete messages always use the parent's relation ID. What I didn't realize is that the tuple data also uses the parent's column ordering. This means the parent's RelationMessage carries the correct column ordering, while the child's RelationMessage may have a different one.
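To make the ordering problem concrete: tuple data carries values only, matched to columns by position in whichever RelationMessage it is decoded against. The sketch below uses a hypothetical helper `decodeTuple` and a hypothetical table layout (a partition attached later with its columns declared in a different order); real decoding also involves types, so a swap like this would typically surface as a parse error rather than silently wrong values.

```go
package main

import "fmt"

// decodeTuple pairs each value with the column at the same position,
// which is how positional tuple data maps onto a relation's columns.
func decodeTuple(columns []string, values []string) map[string]string {
	row := make(map[string]string, len(columns))
	for i, col := range columns {
		if i < len(values) {
			row[col] = values[i]
		}
	}
	return row
}

func main() {
	// Hypothetical layout: the parent declares (id, name, created_at),
	// while a partition attached later declares (id, created_at, name).
	parentCols := []string{"id", "name", "created_at"}
	childCols := []string{"id", "created_at", "name"}

	// With publish_via_partition_root = true, values arrive in the
	// parent's column order.
	values := []string{"1", "alice", "2024-01-01"}

	fmt.Println(decodeTuple(parentCols, values)["name"]) // alice
	fmt.Println(decodeTuple(childCols, values)["name"])  // 2024-01-01
}
```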
Previously, `processRelationMessage` would remap the parent's relation ID to the child's relation message and store it in `relationMessageMapping` (`p.relationMessageMapping[currRel.RelationID] = currRel`), overwriting the parent's column ordering. This caused change events to be decoded against the child's column order, and if the child's column order differs from the parent's, this leads to decoding errors.

The fix for the `publish_via_partition_root = true` case turned out to be quite simple: skip the child's relation message rather than overwriting `relationMessageMapping` with it.

Note that inherited tables work a bit differently: only the child table's RelationMessage is sent, not the parent's, so we need to rely on the child's RelationMessage. This does mean that inherited tables whose column order does not match the parent's can also cause decoding errors. However, this is an existing issue and is out of scope for this PR.
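A simplified sketch of the resulting decision, assuming hypothetical types: `RelationMessage`, `processor`, `parentInfo`, and the `parentOf` lookup are illustrative stand-ins (only `relationMessageMapping`, `publishViaPartitionRoot`, and the relkind `'p'` check echo the PR), and the remap branch models the previous behavior described above rather than the exact connector code.

```go
package main

import "fmt"

// RelationMessage is a hypothetical, trimmed-down decoded Relation message.
type RelationMessage struct {
	RelationID uint32
	Columns    []string
}

// parentInfo records a child relation's parent ID and the parent's relkind.
type parentInfo struct {
	relID   uint32
	relKind rune
}

// processor holds only the state this sketch needs.
type processor struct {
	publishViaPartitionRoot bool
	parentOf                map[uint32]parentInfo // child relid -> parent
	relationMessageMapping  map[uint32]*RelationMessage
}

// processRelationMessage stores the column layout that change events
// will be decoded against, keyed by the relation ID they carry.
func (p *processor) processRelationMessage(msg *RelationMessage) {
	parent, isChild := p.parentOf[msg.RelationID]
	if !isChild {
		p.relationMessageMapping[msg.RelationID] = msg
		return
	}
	if parent.relKind == 'p' && p.publishViaPartitionRoot {
		// Change events use the parent's ID and column order, and the
		// parent's RelationMessage was already stored: skip the child.
		return
	}
	// Otherwise (e.g. inheritance children), only the child's message
	// is available, so remap it onto the parent's relation ID.
	p.relationMessageMapping[parent.relID] = msg
}

func main() {
	p := &processor{
		publishViaPartitionRoot: true,
		parentOf: map[uint32]parentInfo{
			200: {relID: 100, relKind: 'p'}, // partition of a partitioned table
			300: {relID: 150, relKind: 'r'}, // inheritance child
		},
		relationMessageMapping: map[uint32]*RelationMessage{},
	}
	// Parent first, then its partition with a reordered column list.
	p.processRelationMessage(&RelationMessage{RelationID: 100, Columns: []string{"id", "name", "created_at"}})
	p.processRelationMessage(&RelationMessage{RelationID: 200, Columns: []string{"id", "created_at", "name"}})
	fmt.Println(p.relationMessageMapping[100].Columns) // [id name created_at]

	// Inheritance: only the child's message arrives, so it is used.
	p.processRelationMessage(&RelationMessage{RelationID: 300, Columns: []string{"id", "note"}})
	fmt.Println(p.relationMessageMapping[150].Columns) // [id note]
}
```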
Fixes: #3544
Testing: the e2e test fails without this change and should succeed with it.
Follow up in #4045 to fix out-of-order columns in partitioned tables when `pubviaroot = false`.