Cast connector inputs to the live weight dtype#61

Merged
willxxy merged 1 commit into
ELM-Research:mainfrom
TonyChen06:fix/connector-dtype-cast
Jun 16, 2026

Conversation

@TonyChen06

Second piece of splitting up #12 into simpler PRs (the dtype-robustness fix that the full-determinism experiments needed), extended to all four connectors.

What

LinearProjection, MLPProjection, PatchProjection, and CNNPatchProjection cast incoming signals to self.input_dtype — the dtype captured at construction. If the model's dtype is changed afterwards (.float() for high-precision evaluation/debugging, fp64 determinism experiments), the forward crashes:

RuntimeError: mat1 and mat2 must have the same dtype, but got BFloat16 and Float

Cast to the projection weights' current dtype instead. input_dtype keeps its construction-time role.
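
A minimal sketch of the idea, assuming a connector that wraps a single nn.Linear. The class name LinearProjectionSketch, the attribute name proj, and the constructor arguments are hypothetical stand-ins for illustration, not the repository's actual code:

```python
import torch
import torch.nn as nn


class LinearProjectionSketch(nn.Module):
    """Hypothetical stand-in for a connector; not the repository's class."""

    def __init__(self, in_dim: int, out_dim: int, dtype: torch.dtype = torch.bfloat16):
        super().__init__()
        # Construction-time dtype; kept as-is, but no longer used for the input cast.
        self.input_dtype = dtype
        self.proj = nn.Linear(in_dim, out_dim, dtype=dtype)

    def forward(self, signal: torch.Tensor) -> torch.Tensor:
        # Cast to the dtype the weights have now, not the one recorded at construction.
        return self.proj(signal.to(self.proj.weight.dtype))


connector = LinearProjectionSketch(8, 4).float()  # weights move to float32
out = connector(torch.randn(2, 8, dtype=torch.bfloat16))  # bf16 input still works
```

Reading the dtype from the weight at call time means any later .float() or .to(dtype) on the module is picked up automatically, with no extra state to keep in sync.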

Verification

For all four connectors: default bf16 outputs are bit-identical before/after this change; after .float() the forward previously raised the RuntimeError above and now runs, returning float32.
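
A check along these lines could look roughly like the sketch below. It uses a bare nn.Linear as a stand-in for one connector, and forward_old / forward_new are hypothetical helpers that mimic the cast before and after the change; it is not the script used for this PR:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_dtype = torch.bfloat16  # dtype captured at construction
proj = nn.Linear(8, 4, dtype=input_dtype)
x = torch.randn(2, 8, dtype=torch.bfloat16)


def forward_old(layer: nn.Linear, signal: torch.Tensor) -> torch.Tensor:
    # Previous behaviour: cast to the construction-time dtype.
    return layer(signal.to(input_dtype))


def forward_new(layer: nn.Linear, signal: torch.Tensor) -> torch.Tensor:
    # New behaviour: cast to the weights' current dtype.
    return layer(signal.to(layer.weight.dtype))


# Default bf16 path: both casts are no-ops, so outputs should match exactly.
same_in_bf16 = torch.equal(forward_old(proj, x), forward_new(proj, x))

# After .float(), the old cast leaves a bf16 input against float32 weights.
proj.float()
try:
    forward_old(proj, x)
    old_raised = False
except RuntimeError:
    old_raised = True

new_dtype = forward_new(proj, x).dtype
```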

The connectors cast incoming signals to self.input_dtype, the dtype
captured at construction time. Changing the model's dtype afterwards
(e.g. .float() for fp32 evaluation or debugging) then crashes with a
dtype mismatch, because inputs are still cast to bfloat16 while the
weights are now float32. Cast to the projection weights' current dtype
instead; input_dtype remains as the construction-time module dtype.

Default bf16 path is unchanged (verified identical outputs for all
four connectors); after .float() each connector now runs and returns
float32.
@willxxy willxxy merged commit b1fade9 into ELM-Research:main Jun 16, 2026