Handle out-of-order partitioned tables with pubviaroot = false#4045
Draft
69ceb98 to 4dbe3ee
jgao54 added a commit that referenced this pull request Mar 18, 2026
Skip child partition relation messages in CDC stream.
With `publish_via_partition_root = true`, PostgreSQL emits _both_ a parent and a child RelationMessage before each partition's first change event. Also note that with `publish_via_partition_root = true`, the insert/update/delete message always uses the parent's relation id. What I didn't realize was that _the tuple data also uses the parent's column ordering._ This means the parent's RelationMessage carries the correct column ordering, while the child's RelationMessage may have a different column ordering.
Previously, processRelationMessage would store the child's relation message under the parent's relation id in `relationMessageMapping` (`p.relationMessageMapping[currRel.RelationID] = currRel`), overwriting the parent's column ordering. This caused change events to be decoded against the child's column order, and if the child's column ordering did not match the parent's, it led to decoding errors.
The fix for the `publish_via_partition_root = true` case turned out to be quite simple: skip the child's relation message rather than overwriting the `relationMessageMapping` entry with it.
Note that inherited tables work a bit differently, because only the child table's RelationMessage is sent, not the parent's, so we need to rely on the child's RelationMessage. This does mean that inherited tables whose column order does not match the parent's can also cause decoding errors. However, this is an existing issue and is out of scope for this PR.
Fixes: #3544
Testing: the e2e test [fails](https://github.com/PeerDB-io/peerdb/actions/runs/22974964490/job/66701154925?pr=4035#step:28:1683) without the change and should succeed with it.
Follow up with #4045 to fix out-of-order columns in partitioned tables when pubviaroot = false.
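The skip described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual PeerDB code: `RelationMessage`, `relationStore`, `isChildPartition`, and the relation ids are hypothetical stand-ins, and the sketch only models the `publish_via_partition_root = true` path.

```go
package main

import "fmt"

// RelationMessage is a simplified, hypothetical stand-in for a pgoutput
// relation message: a relation id plus the column names in wire order.
type RelationMessage struct {
	RelationID uint32
	Columns    []string
}

// relationStore is a hypothetical holder for the relation message mapping.
type relationStore struct {
	relationMessageMapping map[uint32]*RelationMessage
	// isChildPartition marks relation ids known to be child partitions
	// (assumed here to be known up front).
	isChildPartition map[uint32]bool
}

// processRelationMessage models the publish_via_partition_root = true case:
// a child's relation message is skipped, so it can never replace the
// parent's column ordering.
func (s *relationStore) processRelationMessage(rel *RelationMessage) {
	if s.isChildPartition[rel.RelationID] {
		return // tuples use the parent's ordering; ignore the child's
	}
	s.relationMessageMapping[rel.RelationID] = rel
}

func main() {
	s := &relationStore{
		relationMessageMapping: map[uint32]*RelationMessage{},
		isChildPartition:       map[uint32]bool{200: true},
	}
	// PostgreSQL sends both messages before the partition's first change.
	s.processRelationMessage(&RelationMessage{RelationID: 100, Columns: []string{"id", "name"}})
	s.processRelationMessage(&RelationMessage{RelationID: 200, Columns: []string{"name", "id"}})

	// Change events reference the parent (100) and are decoded against it.
	fmt.Println(s.relationMessageMapping[100].Columns)
}
```

With the child's message ignored, the only entry in the mapping is the parent's, which matches the ordering of the tuple data.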
When a customer provides their own publication without publish_via_partition_root = true, only the child's relationMessage is sent, not the parent's, and insert/update/delete events reference the child table's relID instead of the parent's.
Before this change, each child's relationMessage was mapped to the parent's relID. In other words, all child partitions shared the same entry in relationMessageMapping, so if different children had different column orderings, the relationMessage of a later-arriving child would overwrite the earlier one, potentially causing decoding failures.
This fix keys relationMessageMapping by the original relID (i.e. the parent's relID if pubviaroot = true, the child's relID if pubviaroot = false). processInsertMessage/processUpdateMessage/processDeleteMessage then use the original relID to look up the correct relation message, which for pubviaroot = false is the one stored under the child's relID. This doesn't affect pubviaroot = true, because there the original relID of the change event is the parent's relID, which is also how it behaved before this PR.
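To illustrate the keying change for pubviaroot = false, here is a minimal sketch under simplifying assumptions. The `decoder` type, the string-valued tuples, and the relation ids are hypothetical and do not mirror the real PeerDB structs. Any mapping from child to parent for resolving the destination table is omitted.

```go
package main

import "fmt"

// RelationMessage is a simplified, hypothetical stand-in for a pgoutput
// relation message: a relation id plus the column names in wire order.
type RelationMessage struct {
	RelationID uint32
	Columns    []string
}

// decoder is a hypothetical holder for relationMessageMapping, keyed by the
// original relID carried on the wire.
type decoder struct {
	relationMessageMapping map[uint32]*RelationMessage
}

// processRelationMessage stores each message under its own relID, so sibling
// partitions no longer share (and overwrite) a single entry.
func (d *decoder) processRelationMessage(rel *RelationMessage) {
	d.relationMessageMapping[rel.RelationID] = rel
}

// processInsertMessage decodes a tuple using the relation message stored
// under the relID that the change event itself references.
func (d *decoder) processInsertMessage(origRelID uint32, tuple []string) (map[string]string, error) {
	rel, ok := d.relationMessageMapping[origRelID]
	if !ok {
		return nil, fmt.Errorf("no relation message for relID %d", origRelID)
	}
	if len(tuple) != len(rel.Columns) {
		return nil, fmt.Errorf("relID %d: got %d values for %d columns",
			origRelID, len(tuple), len(rel.Columns))
	}
	row := make(map[string]string, len(tuple))
	for i, v := range tuple {
		row[rel.Columns[i]] = v
	}
	return row, nil
}

func main() {
	d := &decoder{relationMessageMapping: map[uint32]*RelationMessage{}}
	// Two child partitions of the same parent with different column order.
	d.processRelationMessage(&RelationMessage{RelationID: 101, Columns: []string{"id", "name"}})
	d.processRelationMessage(&RelationMessage{RelationID: 102, Columns: []string{"name", "id"}})

	row1, _ := d.processInsertMessage(101, []string{"1", "alice"})
	row2, _ := d.processInsertMessage(102, []string{"bob", "2"})
	fmt.Println(row1["id"], row1["name"]) // 1 alice
	fmt.Println(row2["id"], row2["name"]) // 2 bob
}
```

Had both messages been stored under a shared parent key, the second would have replaced the first, and the tuple from relID 101 would have been decoded with the columns swapped.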
Todo: test the inherited table case, which should work as well.