[mypyc] Fix non-deterministic class struct layout under separate=True #21530

Merged
p-sawicki merged 1 commit into python:master from VaggelisD:bitmap-attrs-determinism
May 21, 2026

@VaggelisD
Contributor

The helper function detect_undefined_bitmap builds the list of attributes that need a per-instance "is set?" bit (cl.bitmap_attrs). It walks from a subclass up into its base and .append()s entries.

The walk dedupes within one call via seen, but the function is called once per SCC. Under separate=True, every subclass of a shared base lives in its own SCC, so the base is visited multiple times and gets the same entries re-appended on every pass.

After N visits, base.bitmap_attrs contains N duplicate copies of the same names, so attribute offsets shift between builds and subclasses that were not rebuilt end up reading the wrong bytes. The added test case shows that on the master branch base.bitmap_attrs ends up populated with ["i"] * 11.

The fix: Build a fresh local list and assign once at the end. The function becomes idempotent and the struct layout remains identical after each incremental build.
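
The accumulation and the fix can be reproduced with a small stand-alone model. This is a sketch under stated assumptions, not mypyc's actual code: `ClassModel`, `overlap_attrs`, `detect_in_place`, `detect_fresh`, and `simulate` are hypothetical names, and the model tracks a single base per class only. Each loop iteration stands in for one SCC being analyzed with a fresh `seen` set; the subclass count of ten is chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing so instances fit in a set
class ClassModel:
    """Hypothetical stand-in for a class IR node (not mypyc's ClassIR)."""
    name: str
    base: ClassModel | None = None
    overlap_attrs: list[str] = field(default_factory=list)  # attrs needing an "is set?" bit
    bitmap_attrs: list[str] = field(default_factory=list)


def detect_in_place(cl: ClassModel, seen: set[ClassModel]) -> None:
    """Buggy variant: extends cl.bitmap_attrs in place on every visit."""
    if cl in seen:
        return
    seen.add(cl)
    if cl.base is not None:
        detect_in_place(cl.base, seen)
        cl.bitmap_attrs.extend(cl.base.bitmap_attrs)
    cl.bitmap_attrs.extend(cl.overlap_attrs)


def detect_fresh(cl: ClassModel, seen: set[ClassModel]) -> None:
    """Fixed variant: builds a local list and assigns it once at the end."""
    if cl in seen:
        return
    seen.add(cl)
    attrs: list[str] = []
    if cl.base is not None:
        detect_fresh(cl.base, seen)
        attrs.extend(cl.base.bitmap_attrs)
    attrs.extend(cl.overlap_attrs)
    cl.bitmap_attrs = attrs


def simulate(detect, n_subclasses: int) -> list[str]:
    """Run one pass per class, each with a fresh `seen`, like one call per SCC."""
    base = ClassModel("Base", overlap_attrs=["i"])
    subs = [ClassModel(f"Sub{k}", base=base) for k in range(n_subclasses)]
    for cl in [base, *subs]:
        detect(cl, set())
    return base.bitmap_attrs


print(len(simulate(detect_in_place, 10)))  # 11 copies of "i" on the base
print(simulate(detect_fresh, 10))          # ['i']
```

In the in-place variant the base gains one extra copy per visiting subclass, while the fresh-list variant leaves the base with the same single entry no matter how many passes reach it.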

detect_undefined_bitmap() was extending cl.bitmap_attrs in place. Under
separate=True each SCC's analyze_always_defined_attrs is invoked once per
group, and detect_undefined_bitmap recurses through cl.base_mro from the
subclass into its base classes. The seen set passed in dedupes within one
call but is fresh per call, so every subclass-group call re-extends the
shared base class's bitmap_attrs with another copy of the contributions.

The base class's emitted ObjectStruct then grows by one bitmap field per
~32 subclasses processed in the same build. The exact final length is a
function of how many SCCs went through compile_scc_to_ir this run:

  - clean build: every SCC fresh -> base bitmap_attrs accumulates fully
  - incremental build affecting N subclasses: base accumulates a fraction
  - second incremental: yet another count

Subclasses not rebuilt this round still see their base's old, larger
struct layout. Any attribute access on the base segfaults with a
mismatched bitmap-field offset.
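
A rough sketch of why duplicated entries move attribute offsets, assuming 32 bits per bitmap field (consistent with the "~32" figure above) and assuming bitmap fields are laid out ahead of the attribute fields. `BITMAP_BITS`, `bitmap_field_count`, `sketch_layout`, and the `bitmapN` field names are illustrative, not taken from the emitted C.

```python
BITMAP_BITS = 32  # assumed number of "is set?" bits packed into one bitmap field


def bitmap_field_count(bitmap_attrs: list[str]) -> int:
    """Number of bitmap fields needed to hold one bit per entry (ceiling division)."""
    return (len(bitmap_attrs) + BITMAP_BITS - 1) // BITMAP_BITS


def sketch_layout(bitmap_attrs: list[str], attrs: list[str]) -> list[str]:
    """Hypothetical field order: bitmap fields first, then the attributes."""
    bitmaps = [f"bitmap{k}" for k in range(bitmap_field_count(bitmap_attrs))]
    return bitmaps + attrs


clean = sketch_layout(["i"], ["i", "name"])          # one real entry
bloated = sketch_layout(["i"] * 33, ["i", "name"])   # 33 duplicated entries

print(clean.index("i"), bloated.index("i"))  # 1 2: the slot of "i" has shifted
```

Code compiled against the first layout and run against the second would read the attribute from the wrong slot, which is the mismatch described above.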

Pre-existing in mypyc; only manifested once the prior over-conservative
44-file always-rebuild was lifted (1.20.0.post5), because that wasteful
behavior kept rebuild sets self-consistent.

Fix: compute a fresh local list and assign at the end. The function
becomes naturally idempotent across repeated calls — same input, same
output, regardless of how many groups have visited the class. No new
fields, no serialization changes.

Verified against sqlglot[c] (separate=True, ~100 modules):

  Edit: add a method to MySQLParser (a class with 7 dialect subclasses)
  Before: parser.h struct layout differs between clean and incremental
          builds; make unitc segfaults at first parser-using test.
  After:  parser.h identical between clean and incremental;
          make unitc passes (1163 tests, 0 segfaults).
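
One way to run the header comparison from the verification above is to hash the generated header from each build. This is a minimal sketch: `file_digest` and `headers_match` are hypothetical helpers, and the example paths in the comment are placeholders, not the layout of any real build directory.

```python
import hashlib
from pathlib import Path


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def headers_match(clean_header: str, incremental_header: str) -> bool:
    """True when the two generated headers are byte-for-byte identical."""
    return file_digest(clean_header) == file_digest(incremental_header)


# Example call with placeholder paths:
#   headers_match("build_clean/parser.h", "build_incremental/parser.h")
```

A byte-level comparison is stricter than needed, since a reordered comment would also count as a difference, but it is a cheap first check for layout drift between builds.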
Collaborator

@p-sawicki p-sawicki left a comment


thanks!

@p-sawicki p-sawicki merged commit cde4779 into python:master May 21, 2026
18 checks passed