[mypyc] Fix non-deterministic class struct layout under separate=True#21530
Merged
Conversation
detect_undefined_bitmap() was extending cl.bitmap_attrs in place. Under
separate=True each SCC's analyze_always_defined_attrs is invoked once per
group, and detect_undefined_bitmap recurses through cl.base_mro from the
subclass into its base classes. The seen set passed in dedupes within one
call but is fresh per call, so every subclass-group call re-extends the
shared base class's bitmap_attrs with another copy of the contributions.
The base class's emitted ObjectStruct then grows by one bitmap field per
~32 subclasses processed in the same build. The exact final length is a
function of how many SCCs went through compile_scc_to_ir this run:
- clean build: every SCC fresh -> base bitmap_attrs accumulates fully
- incremental build affecting N subclasses: base accumulates a fraction
- second incremental: yet another count
Subclasses not rebuilt this round still see their base's old, larger
struct layout. Any attribute access on the base segfaults with a
mismatched bitmap-field offset.
Pre-existing in mypyc; only manifested once the prior over-conservative
44-file always-rebuild was lifted (1.20.0.post5), because that wasteful
behavior kept rebuild sets self-consistent.
Fix: compute a fresh local list and assign at the end. The function
becomes naturally idempotent across repeated calls — same input, same
output, regardless of how many groups have visited the class. No new
fields, no serialization changes.
Verified against sqlglot[c] (separate=True, ~100 modules):
Edit: add a method to MySQLParser (a class with 7 dialect subclasses)
Before: parser.h struct layout differs between clean and incremental
builds; make unitc segfaults at first parser-using test.
After: parser.h identical between clean and incremental;
make unitc passes (1163 tests, 0 segfaults).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The helper function
detect_undefined_bitmapbuilds the list of attributes that need a per-instance "is set?" bit (cl.bitmap_attrs). It walks from a subclass up into its base and.append()s entries.The walk dedupes within one call via
seen, but the function is called once per SCC; Underseparate=True, every subclass of a shared base lives in its own SCC, so the base is visited multiple times and gets the same entries re-appended on every pass.After N visits,
base.bitmap_attrscontains N duplicate copies of the same names, so attribute offsets shift between builds and not-rebuilt subclasses end up reading the wrong bytes. The added test case shows that on master branchbase.bitmap_attrshas been populated with["i"] * 11The fix: Build a fresh local list and assign once at the end. The function becomes idempotent and the struct layout remains identical after each incremental build.