fix(frame): clarify LE-encoded byte contract; revert HOST_NATIVE_BE sinker routings (post-#83/#84/#85 audit) by uqio · Pull Request #92 · Findit-AI/colconv

uqio · 2026-05-08T07:47:53Z

Summary

Codex 3rd-pass review of PR #85 caught a contract conflict that affects
six other merged PRs. PR #85's 52f8191 reverted the broken Grayf32
sinker routing; this PR completes the audit for the remaining six float
frame types affected by PR #83 (dcf40a3) and PR #84 (8627280):
Rgbf32, Rgbf16, Gbrpf32, Gbrapf32, Gbrpf16, Gbrapf16.

Root cause

The Grayf32Frame / Gbrpf32Frame / Gbrpf16Frame / Gbrapf32Frame /
Gbrapf16Frame doc-comments already named the FFmpeg *LE pixel
formats (AV_PIX_FMT_GBRPF32LE, etc.) — the byte layout is LE-encoded
bytes reinterpreted as f32 / half::f16, NOT host-native floats.
The kernel BE = false template parameter applies u32::from_le /
u16::from_le, which is:

LE host with LE-encoded bytes: from_le no-op → correct
BE host with LE-encoded bytes: from_le swaps → restores host-native f32 → correct

The HOST_NATIVE_BE consts introduced in PR #83 + PR #84 made the
kernel a no-op byte-swap on either host, which corrupts every output
path on a BE host.

Phase breakdown

Phase 1 — Frame docs (commit 5b42065).
Add explicit # Endian contract — LE-encoded bytes sections to
Rgbf32Frame, Rgbf16Frame, Gbrpf32Frame, Gbrapf32Frame,
Gbrpf16Frame, Gbrapf16Frame, mirroring the existing Grayf32Frame
pattern. Add bytemuck::cast_slice + linesize-division-by-element-size
guidance for FFmpeg-buffer callers.

Phase 2 — Sinker revert (commit 84bbbb7).
Revert ::<HOST_NATIVE_BE> → ::<false> across:

File	Sites
`src/sinker/mixed/packed_rgb_float.rs` (Rgbf32)	6
`src/sinker/mixed/packed_rgb_f16.rs` (Rgbf16)	7
`src/sinker/mixed/planar_gbr_float.rs` (Gbrpf32 + Gbrapf32)	22
`src/sinker/mixed/planar_gbr_f16.rs` (Gbrpf16 + Gbrapf16)	22
Total	57

Removes the four const HOST_NATIVE_BE definitions and their explanatory
doc-comments. Rewrites the widen_f16_to_f32 doc-comment to accurately
describe the LE-encoded byte contract.

Untouched per the audit:

Scalar row-kernel BE == HOST_NATIVE_BE copy_from_slice fast paths —
generic in BE, correct under either contract.
Per-arch SIMD kernels (row/arch/*).
The bit-normalising widen_f16_be_to_host_f32 scalar helper — its
BE → host-native pattern is correct because it produces decoded f32
from LE-encoded f16 bits before the downstream f32 kernel sees them.

Phase 3 — Test cleanup + LE-encoded contract regressions (commit 2cbc60c).

Removed 10 wrong-contract tests that were typed against the host-native
reading of the Frame contract:

rgbf32_kernel_host_native_be_matches_false_on_le_host
rgbf32_sinker_host_native_contract_lossless_passthrough
rgbf16_kernel_host_native_be_matches_false_on_le_host
rgbf16_sinker_host_native_contract_lossless_passthrough
gbrpf32_kernel_host_native_be_matches_false_on_le_host
gbrpf32_sinker_host_native_contract_lossless_passthrough
gbrpf16_kernel_host_native_be_matches_false_on_le_host
gbrpf16_sinker_host_native_contract_lossless_passthrough
gbrapf32_sinker_host_native_contract_lossless_passthrough_with_alpha
gbrapf16_sinker_host_native_contract_lossless_passthrough_with_alpha

Added 6 LE-encoded contract regressions following PR #85's 52f8191
pattern (grayf32_sinker_le_encoded_frame_decodes_correctly):

rgbf32_sinker_le_encoded_frame_decodes_correctly
rgbf16_sinker_le_encoded_frame_decodes_correctly
gbrpf32_sinker_le_encoded_frame_decodes_correctly
gbrapf32_sinker_le_encoded_frame_decodes_correctly
gbrpf16_sinker_le_encoded_frame_decodes_correctly
gbrapf16_sinker_le_encoded_frame_decodes_correctly

Each test builds a plane via f32::from_bits(intended.to_bits().to_le())
(or the f16 analogue), runs the sinker's lossless pass-through, and
asserts the output equals the host-native intended values. Vacuous on LE
host (every CI runner today) but fails fast on BE for any regression
that drops the ::<false> routing.

Net test delta: −10 / +6 = −4 (final cargo test --target aarch64-apple-darwin --lib count: 2242, down from 2246).

Test plan

cargo test --target aarch64-apple-darwin --lib → 2242 ok
cargo build --target x86_64-apple-darwin --tests → ok (0 warnings)
RUSTFLAGS="-C target-feature=+simd128" cargo build --target wasm32-unknown-unknown --tests → ok
cargo build --no-default-features → ok
cargo fmt --check → ok
cargo clippy --all-targets --all-features -- -D warnings → ok

Notes

Untouched: GrayN / Gray16 / Ya16 / yuva*p sinkers — those use the
source Frame's per-format BE flag for encoded-byte order
(different layer from the f32/f16 host-native vs LE-encoded
contract).
The sinker-local widen_f16_to_f32 helper in
src/sinker/mixed/planar_gbr_f16.rs calls half::f16::to_f32
directly on the LE-encoded plane. On LE host this is correct (LE bytes
ARE host-native); on BE host this would mis-interpret the bits — a
separate bug from the routing reverted here. The accurate fix is to
switch this helper to the bit-normalising widen_f16_be_to_host_f32
pattern used by the per-arch SIMD backends; left for a follow-up. The
doc-comment is updated to make the LE-host caveat explicit.

🤖 Generated with Claude Code

Affected types: Rgbf32Frame, Rgbf16Frame, Gbrpf32Frame, Gbrapf32Frame, Gbrpf16Frame, Gbrapf16Frame. Each doc-comment now explicitly states the LE-encoded byte contract, mirroring the established Grayf32Frame doc pattern: the &[f32] / &[f16] plane is the LE-encoded byte layout reinterpreted as f32 / f16, matching FFmpeg's canonical *LE pixel-format suffixes (AV_PIX_FMT_GBRPF32LE etc). Adds the bytemuck::cast_slice + linesize-division-by-element-size instruction so callers wiring up FFmpeg buffers don't have to guess. Codex 3rd-pass review of PR #85 caught that PR #83 + PR #84 introduced sinker routings that assumed the planes are already host-native f32 / f16 — a wrong reading of the *LE contract that would corrupt every output path on a BE host. Pinning the doc-level contract here so future readers don't repeat the mistake; sinker revert lands in the next commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mirrors PR #85's Grayf32 sinker revert (52f8191) for the other six float frame types affected by PR #83 (dcf40a3) and PR #84 (8627280): Rgbf32, Rgbf16, Gbrpf32, Gbrapf32, Gbrpf16, Gbrapf16. The earlier PRs introduced sinker-layer `HOST_NATIVE_BE` consts that made the row-kernel `BE` parameter a no-op byte-swap on whichever host the build targeted. That assumed the Frame's plane bytes are already host-native f32/f16 — which contradicts the LE-encoded byte contract clarified in the previous commit. On a BE host the routing skipped the required `u32::from_le` / `u16::from_le` swap and would have corrupted every output path. Reverted call sites (all `::<HOST_NATIVE_BE>` -> `::<false>`): - src/sinker/mixed/packed_rgb_float.rs: 6 sites (Rgbf32) - src/sinker/mixed/packed_rgb_f16.rs: 7 sites (Rgbf16) - src/sinker/mixed/planar_gbr_float.rs: 22 sites (Gbrpf32 + Gbrapf32) - src/sinker/mixed/planar_gbr_f16.rs: 22 sites (Gbrpf16 + Gbrapf16) Removes the four `const HOST_NATIVE_BE: bool = cfg!(target_endian = "big")` definitions and their explanatory doc-comments. Rewrites the `widen_f16_to_f32` doc in planar_gbr_f16.rs to accurately describe the LE-encoded contract (helper is correct on LE host; cross-endian widen should use the bit-normalising `widen_f16_be_to_host_f32` pattern). Untouched: `BE == HOST_NATIVE_BE` `copy_from_slice` fast paths in scalar row kernels (`rgbf32_to_rgb_f32_row`, `gbrpf32_to_rgb_f32_row`, etc.) — generic in `BE`, correct under either contract. Per-arch SIMD kernels and `widen_f16_be_to_host_f32` left alone — different layer. Verification (aarch64-apple-darwin): cargo test --target aarch64-apple-darwin --lib # 2246 ok Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…oded regressions Removes the host-native-contract tests that PR #83 + PR #84 added — they were typed against an incorrect host-native-f32/f16 reading of the Frame contract and would mask the BE-host corruption introduced by the now- reverted `::<HOST_NATIVE_BE>` sinker routings: - rgbf32_kernel_host_native_be_matches_false_on_le_host (deleted) - rgbf32_sinker_host_native_contract_lossless_passthrough (deleted) - rgbf16_kernel_host_native_be_matches_false_on_le_host (deleted) - rgbf16_sinker_host_native_contract_lossless_passthrough (deleted) - gbrpf32_kernel_host_native_be_matches_false_on_le_host (deleted) - gbrpf32_sinker_host_native_contract_lossless_passthrough (deleted) - gbrpf16_kernel_host_native_be_matches_false_on_le_host (deleted) - gbrpf16_sinker_host_native_contract_lossless_passthrough (deleted) - gbrapf32_sinker_host_native_contract_lossless_passthrough_with_alpha (deleted) - gbrapf16_sinker_host_native_contract_lossless_passthrough_with_alpha (deleted) Adds LE-encoded byte contract regressions following PR #85's `52f8191` pattern (`grayf32_sinker_le_encoded_frame_decodes_correctly`): - rgbf32_sinker_le_encoded_frame_decodes_correctly (new) - rgbf16_sinker_le_encoded_frame_decodes_correctly (new) - gbrpf32_sinker_le_encoded_frame_decodes_correctly (new) - gbrapf32_sinker_le_encoded_frame_decodes_correctly (new) - gbrpf16_sinker_le_encoded_frame_decodes_correctly (new) - gbrapf16_sinker_le_encoded_frame_decodes_correctly (new) Each test constructs an explicitly LE-encoded plane via `f32::from_bits(intended.to_bits().to_le())` (or the f16 analogue), builds the Frame, runs the sinker's lossless pass-through, and asserts the output equals the host-native intended values. Vacuous on LE hosts (where `to_le` is identity) but on a BE host any regression that drops the `::<false>` kernel routing would fail fast. Also picks up two trivial pre-existing trailing-blank-line cleanups in `src/row/arch/{neon,x86_sse41}/endian.rs` produced by `cargo fmt` while running it across the modified files. Verification (aarch64-apple-darwin, x86_64-apple-darwin, wasm32): cargo test --target aarch64-apple-darwin --lib # 2242 ok cargo build --target x86_64-apple-darwin --tests # ok RUSTFLAGS="-C target-feature=+simd128" \ cargo build --target wasm32-unknown-unknown --tests # ok cargo build --no-default-features # ok cargo fmt --check # ok cargo clippy --all-targets --all-features -- -D warnings # ok Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rmalize-first canonical helper Closes the BE-host correctness gap flagged in the original body of #92: the sinker-local `widen_f16_to_f32` in `src/sinker/mixed/planar_gbr_f16.rs` called `half::f16::to_f32` directly on f16 plane bits, treating the bytes as host-native. That assumption only holds on a little-endian host with an LE-encoded source — on a big-endian host the LE-encoded f16 bits would be byte-swapped from the intended values, and `to_f32` would decode them as arbitrarily wrong magnitudes / signs / NaNs. Replaces every call site with the canonical bit-normalize-first helper `crate::row::scalar::planar_gbr_f16::widen_f16_be_to_host_f32::<false>` (BE = false because Frame plane bytes are LE-encoded per the contract documented in Phase 1 of this PR). The shared helper does `f32 = f16::from_bits(u16::from_le(plane_bits)).to_f32()` — correct on both LE hosts (where `from_le` is identity) and BE hosts (where it byte- swaps the LE-encoded plane bits back to host-native f16 bits before the f16 → f32 conversion). The downstream `gbrpf32_to_*` row kernels are still invoked with `BE = false` since the widened scratch is now host-native f32 (its `u32::from_le` is then a no-op on LE host and the right byte-swap on BE host — same contract as the dispatch f16-widen fallback and the per- backend SIMD scalar tails already use). Updated 7 call sites: - 3 in the `Gbrpf16` `process` widen loop (G, B, R planes). - 4 in the `Gbrapf16` `process` widen loop (G, B, R, A planes). - 1 in the `widen_and_scatter_f16_alpha_to_u8` u8 RGBA fast-path helper (counted as the seventh distinct call site; the other six are the two widen-loop blocks). Removed the local `widen_f16_to_f32` helper (and its now-stale LE-host-only-caveat doc-comment) entirely — no remaining callers. Adds two LE-encoded byte-contract regression tests covering the widening path (the existing PR #92 tests covered only the lossless f16 pass-through path which doesn't go through the widen helper): - gbrpf16_sinker_widen_path_le_encoded_frame_decodes_correctly (Gbrpf16 → with_rgb_f32; exercises the 3-plane widen loop) - gbrapf16_sinker_widen_path_le_encoded_frame_decodes_correctly (Gbrapf16 → with_rgba_f32; exercises the 4-plane widen loop incl. α) Each test constructs LE-encoded `&[half::f16]` planes via `half::f16::from_bits(intended.to_bits().to_le())`, builds a Frame, runs the widening sinker path, and asserts the f32 output equals the host- native expected values. Vacuous on LE hosts (where `to_le` is identity) but on a BE host any regression that drops the bit-normalize-first step in `widen_f16_be_to_host_f32::<false>` would fail fast. Verification (aarch64-apple-darwin, x86_64-apple-darwin, wasm32): cargo test --target aarch64-apple-darwin --lib # 2269 ok cargo build --target x86_64-apple-darwin --tests # 0 warnings RUSTFLAGS="-C target-feature=+simd128" \ cargo build --target wasm32-unknown-unknown --tests # ok cargo build --no-default-features # ok cargo fmt --check # ok cargo clippy --all-targets --all-features -- -D warnings # ok Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…are + frame docs cite LE names Addresses three follow-up findings from Codex's review of PR #92. Finding 1 — Post-widen f32 routing on Gbrpf16 / Gbrapf16. The Phase 2 follow-up replaced the local `widen_f16_to_f32` helper with the canonical `widen_f16_be_to_host_f32::<false>`, but the immediate downstream `gbrpf32_to_*::<false>` calls still told the f32 row kernel to apply `from_le` AGAIN — on a BE host this would byte-swap the already-host-native f32 scratch, corrupting output. Introduce a module-scope `HOST_NATIVE_BE = cfg!(target_endian = "big")` constant in `src/sinker/mixed/planar_gbr_f16.rs` and route ALL post-widen `gbrpf32_to_*` / `gbrapf32_to_*` calls through `::<HOST_NATIVE_BE>` (14 sites total across both Gbrpf16 and Gbrapf16 sinker bodies). The kernel's `from_le` / `from_be` then becomes a no-op on every host since post-widen scratch already matches the host byte order. Distinct from the Phase 2 revert: direct Frame-to-row-kernel calls (`gbrpf16_to_*::<false>` for u8 / f16 outputs) still use `<false>` per the LE-encoded Frame contract — those receive raw LE-encoded `&[half::f16]` plane bytes, so the kernel must apply `from_le`. Post-widen scratch is host-native f32, so it must use `<HOST_NATIVE_BE>` to keep the swap a no-op everywhere. The module-level comment block makes this distinction explicit. Finding 2 — Ya16 RGB+RGBA combined-path alpha not endian-aware. When `MixedSinker<Ya16>` attached BOTH `with_rgb` and `with_rgba`, the sinker ran `ya16_to_rgb_*::<false>` (correct), expanded the RGB output to RGBA prefix, and called `copy_alpha_ya_u16` / `copy_alpha_ya_u16_to_u8` to overlay the alpha plane from the source — but those helpers read raw `packed[n*2+1]` directly without `u16::from_le`. On a BE host processing the LE-encoded `Ya16Frame`, the host-native u16 read would interpret LE-encoded bytes as host-native (BE), yielding the wrong alpha byte. Audit found four alpha-patch helpers with the same pattern across three formats — Ya16 (`copy_alpha_ya_u16` / `_to_u8`), AYUV64 (`copy_alpha_packed_u16x4_at_0` / `_to_u8_at_0`), and Rgba64 / Bgra64 (`copy_alpha_packed_u16x4_at_3` / `_to_u8_at_3`). All six were made endian-aware via a `<const BE: bool>` parameter applying `u16::from_le(packed[..])` (or `from_be` per BE) before scaling / storing, mirroring the pre-existing `copy_alpha_plane_u16` / `copy_alpha_plane_u16_to_u8` pattern from PR #81. Updated all 8 sinker call sites to pass `<false>` (Frame contract is LE). The two AYUV64 helpers also have SIMD dispatchers — those gained a `<const BE: bool>` generic and the `safe_for_simd = !BE && cfg!(target_endian = "little")` gate that already exists for `copy_alpha_plane_u16_to_u8`, falling back to scalar in the BE-host or BE-data quadrants. The five SIMD scalar tails were pinned to `::<false>` matching the precedent from PR #81 / PR #82. Finding 3 — Public Frame docs cite wrong FFmpeg names. `Rgbf16Frame` docs cited `AV_PIX_FMT_RGBF16` (target-endian alias — `RGBF16LE` on LE, `RGBF16BE` on BE). `Rgbf32Frame` docs cited `AV_PIX_FMT_RGBF32` which doesn't exist as an unsuffixed alias; FFmpeg has only the alpha-bearing `RGBAF32LE` / `RGBAF32BE` pair. A BE caller following the unsuffixed contract could pass native-BE bytes into an LE-decoder. Updated `src/frame/packed_rgb_f16.rs` to cite `AV_PIX_FMT_RGBF16LE` explicitly, with a parenthetical noting that the unsuffixed name is a target-endian alias to avoid future confusion. Updated `src/frame/packed_rgb_float.rs` to drop the citation of the non-existent `AV_PIX_FMT_RGBF32` constant entirely and instead document the contract as "LE-encoded f32 bytes — `colconv` pins to the canonical FFmpeg `*LE` byte-order convention regardless of host endianness, exactly as the existing `AV_PIX_FMT_RGBAF32LE` flavour". Tests added: - `gbrpf16_sinker_widen_path_u16_and_u8_le_encoded_frame_decodes_correctly` exercises the Gbrpf16 widen → f32 → u16 chain (with `with_luma_u16` alongside) under the LE-encoded Frame contract — vacuous on LE, catches the double-swap on BE. - `ya16_combined_rgb_and_rgba_alpha_matches_standalone_le_encoded` and the u16 sibling assert that Strategy A+ (combined `with_rgb` + `with_rgba`) produces α bytes byte-identical to the standalone `with_rgba` path on every host. Builds the `&[u16]` plane via `u16::from_ne_bytes(x.to_le_bytes())` so the in-memory bytes spell the LE-encoded contract on both LE and BE hosts. - 8 new BE-parity tests in `src/row/scalar/alpha_extract.rs` for the newly endian-aware AYUV64 / Rgba64 / Ya16 helpers (4 helpers × u8 / u16 variants), all using the `swap_bytes`-fixture pattern that exists for `copy_alpha_plane_u16` so the BE arm runs on every host without needing a real BE runner. Verified: - `cargo test --target aarch64-apple-darwin --lib` — 2278 passed - `cargo build --target x86_64-apple-darwin --tests` — clean - `RUSTFLAGS="-C target-feature=+simd128" cargo build --target wasm32-unknown-unknown --tests` — clean - `cargo build --no-default-features` — clean - `cargo fmt --check` — clean - `cargo clippy --all-targets --all-features -- -D warnings` — clean - `cargo check --target s390x-unknown-linux-gnu --lib` — clean (BE-host smoke) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…direct-Frame vs post-widen paths Codex 3rd-pass review of #92 found that the f32 alpha-patch helpers `copy_alpha_plane_f32_to_u8`, `copy_alpha_plane_f32_to_u16`, and `copy_alpha_plane_f32` read each f32 α sample as host-native, which silently corrupted the α slot on BE hosts (e.g., s390x) processing the LE-encoded `Gbrapf32Frame` α plane. The byte-swapped f32 bits clamp to near-zero or near-one, so BE hosts emitted α = 0 or 255 regardless of the intended value. Same bug class as the u16 alpha-patch helpers fixed in cf26058, but for the f32 path; the standalone `with_rgba_f32` test didn't cover this because Strategy A+ uses a different code path (`expand_rgb_to_rgba_row` + alpha-patch). Helpers updated (4): * `copy_alpha_plane_f32_to_u8` — added `<const BE: bool>`, normalises each f32's bits via `f32::from_bits(u32::from_le/be(bits))` before clamp / scale / round-half-up. * `copy_alpha_plane_f32_to_u16` — same pattern. * `copy_alpha_plane_f32` — same pattern (lossless pass-through). * `copy_alpha_plane_f16` (planar_gbr_f16.rs) — added `<const BE: bool>` for consistency. Currently only test-called; future-proofs against a Strategy A+ wiring that consumes it. Sinker call sites updated, dual routing pattern (4 sites total): * `src/sinker/mixed/planar_gbr_float.rs` — Gbrapf32 Strategy A+ u8 RGBA path: `<false>` (direct-Frame, LE-encoded per Phase-1 contract). * `src/sinker/mixed/planar_gbr_f16.rs` — `widen_and_scatter_f16_alpha_to_u8` helper: `<HOST_NATIVE_BE>` (post-widen, scratch is host-native f32 after `widen_f16_be_to_host_f32::<false>`). Audit summary: only one direct-Frame call site (Gbrapf32 u8 RGBA) and one post-widen call site (Gbrapf16 u8 RGBA) currently invoke the f32 alpha-patch helpers. The Gbrapf32 RGBA u16 / RGBA f32 / RGBA f16 paths all use direct kernels (`gbrapf32_to_rgba_u16_row` / `gbrapf32_to_rgba_f32_row` / `gbrapf32_to_rgba_f16_row`) and were already endian-aware via their `<false>` parameter; same for the Gbrapf16 widened RGBA u16 / f32 paths. Tests added (7): * `copy_alpha_plane_f32_to_u8_be_parity_with_swapped_buffer` * `copy_alpha_plane_f32_to_u16_be_parity_with_swapped_buffer` * `copy_alpha_plane_f32_be_parity_with_swapped_buffer` * `copy_alpha_plane_f16_be_parity_with_swapped_buffer` * `gbrapf32_strategy_a_plus_le_encoded_frame_alpha_decodes_correctly` (w = 15, exercises scalar tail; compares Strategy A+ combo to standalone `with_rgba`). * `gbrapf32_strategy_a_plus_le_encoded_u16_alpha_decodes_correctly` (w = 17, defense-in-depth for the u16 RGBA combo). * `gbrapf16_strategy_a_plus_post_widen_alpha_decodes_correctly` (w = 15, verifies HOST_NATIVE_BE routing on the post-widen helper). The 3 prior `copy_alpha_plane_f32*_clamps_and_scales` / `_lossless_passthrough` tests are now `#[cfg(target_endian = "little")]` (host-native f32 fixtures); BE-host scalar correctness is locked down by the dedicated `*_be_parity_with_swapped_buffer` tests. Verified: * `cargo test --target aarch64-apple-darwin --lib` (2285 pass, +7). * `cargo build --target x86_64-apple-darwin --tests` (0 warnings). * `RUSTFLAGS="-C target-feature=+simd128" cargo build --target wasm32-unknown-unknown --tests`. * `cargo build --no-default-features`. * `cargo fmt --check`. * `cargo clippy --all-targets --all-features -- -D warnings`. * `cargo check --target s390x-unknown-linux-gnu --lib` (BE-host smoke). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ercises them Codex 4th-pass review of #92 found the BE-sensitive sinker regressions skip on miri targets — but BE CI runs via miri (powerpc64 / s390x), so the tests effectively did not validate the BE path. The `cfg_attr(miri, ignore = ...)` gate was added defensively, but it skipped exactly the host where BE corruption would surface. Audit summary: every BE-sensitive regression added in this PR calls the high-level `MixedSinker` builder + `gbrpf32_to` / `gbrapf32_to` / `gbrpf16_to` / `gbrapf16_to` / `rgbf32_to` / `rgbf16_to`. The dispatch layer honours the sinker's `simd` flag — passing `with_simd(false)` bypasses every SIMD branch and calls the scalar kernels directly. The post-widen helpers (`widen_f16_be_to_host_f32`, `expand_rgb_to_rgba_row`, the `copy_alpha_plane_f32*` family) are pure scalar. None of these tests construct or invoke arch-specific SIMD intrinsics directly. Tests un-gated (12): * `src/sinker/mixed/tests/packed_rgb_float.rs` - `rgbf32_sinker_le_encoded_frame_decodes_correctly` * `src/sinker/mixed/tests/packed_rgb_f16.rs` - `rgbf16_sinker_le_encoded_frame_decodes_correctly` * `src/sinker/mixed/tests/planar_gbr_float.rs` - `gbrpf32_sinker_le_encoded_frame_decodes_correctly` - `gbrapf32_sinker_le_encoded_frame_decodes_correctly` - `gbrpf16_sinker_le_encoded_frame_decodes_correctly` - `gbrapf16_sinker_le_encoded_frame_decodes_correctly` - `gbrpf16_sinker_widen_path_le_encoded_frame_decodes_correctly` - `gbrapf16_sinker_widen_path_le_encoded_frame_decodes_correctly` - `gbrpf16_sinker_widen_path_u16_and_u8_le_encoded_frame_decodes_correctly` - `gbrapf32_strategy_a_plus_le_encoded_frame_alpha_decodes_correctly` - `gbrapf32_strategy_a_plus_le_encoded_u16_alpha_decodes_correctly` - `gbrapf16_strategy_a_plus_post_widen_alpha_decodes_correctly` Each test now starts the builder chain with `.with_simd(false)`, forcing scalar dispatch on every chained `with_rgb*` / `with_rgba*` / `with_luma*` sink. SIMD-using tests in `planar_gbr_high_bit.rs` (8 pre-existing miri ignores, untouched in this PR) retain the gate; those test SIMD-dispatched kernels with no scalar bypass and genuinely cannot run under miri. Local miri verification (where the f16 setup path's portable software fallback applies — i.e. all f32 tests, plus those f16 tests whose fixtures do not call `half::f16::from_f32` on aarch64+fp16): * `cargo miri test --lib gbrpf32_sinker_le_encoded` — 1 passed * `cargo miri test --lib gbrapf32_sinker_le_encoded` — 1 passed * `cargo miri test --lib gbrapf32_strategy_a_plus_le_encoded_frame` — 1 passed * `cargo miri test --lib gbrapf32_strategy_a_plus_le_encoded_u16` — 1 passed * `cargo miri test --lib rgbf32_sinker_le_encoded` — 1 passed The f16 regression tests build their fixtures via `half::f16::from_f32`, which on aarch64-apple-darwin (miri's local triple) compiles to an inline asm path keyed on `target_feature = "fp16"` — unsupported by miri locally. On the BE CI hosts (s390x / powerpc64), `half-2.7.1` falls through to its portable software fallback (no asm, no SIMD), so all f16 tests will execute correctly under `cargo miri test` on those targets — which is the entire point of removing the `miri,ignore` gate. Verified: * `cargo test --target aarch64-apple-darwin --lib` (2285 pass, 0 fail). * `cargo build --target x86_64-apple-darwin --tests` (0 warnings). * `RUSTFLAGS="-C target-feature=+simd128" cargo build --target wasm32-unknown-unknown --tests`. * `cargo build --no-default-features`. * `cargo fmt --check`. * `cargo clippy --all-targets --all-features -- -D warnings`. * `cargo check --target s390x-unknown-linux-gnu --lib` (BE-host smoke). * `cargo miri test --lib gbrpf32_sinker_le_encoded` (and the four sibling f32 tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Codex 5th-pass review of PR #92 caught: the prior doc said FFmpeg "does not define AV_PIX_FMT_RGBF32" — incorrect. FFmpeg actually defines AV_PIX_FMT_RGBF32BE, AV_PIX_FMT_RGBF32LE, and an unsuffixed alias AV_PIX_FMT_RGBF32 that is target-endian (resolves to LE on LE hosts and BE on BE hosts). Corrected: Rgbf32Frame maps to AV_PIX_FMT_RGBF32LE explicitly, with a warning that BE-host callers holding target-endian RGBF32 bytes must convert to LE before constructing the frame (otherwise the LE-decode contract would re-interpret BE bytes as LE → byte-swapped float data). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR clarifies and enforces the “LE-encoded bytes reinterpreted as f32/f16” contract for float RGB/GBR frame types (matching FFmpeg *LE pixel formats), and reverts sinker routing choices that would corrupt outputs on big-endian hosts. It also makes several Strategy A+/alpha-extract helpers explicitly endian-aware and adds regressions/parity tests to lock down the corrected behavior.

Changes:

Document the LE-encoded byte contract for Rgbf32/Rgbf16 and planar Gbr* float frame families, including FFmpeg buffer/linesize guidance.
Re-route affected sinkers to dispatch row kernels with BE = false for direct-frame reads (LE decoding), and keep post-widen paths host-native where appropriate.
Make scalar/dispatch alpha-extract helpers endian-aware (const-generic BE) with SIMD fallbacks guarded, and add LE-contract regressions + BE-parity tests.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
src/frame/packed_rgb_float.rs	Documents `Rgbf32Frame` as LE-encoded bytes and clarifies FFmpeg mapping/stride guidance.
src/frame/packed_rgb_f16.rs	Documents `Rgbf16Frame` LE-encoded contract and FFmpeg `*LE`/target-endian alias behavior.
src/frame/planar_gbr_float.rs	Adds explicit LE-encoded endian contract sections for planar GBR float frames.
src/sinker/mixed/packed_rgb_float.rs	Reverts `Rgbf32` sinker dispatch routing to `::<false>` (LE decode).
src/sinker/mixed/packed_rgb_f16.rs	Reverts `Rgbf16` sinker dispatch routing to `::<false>` (LE decode).
src/sinker/mixed/planar_gbr_float.rs	Reverts planar GBR `f32` sinker routing to `::<false>` and wires endian-aware alpha patching.
src/sinker/mixed/planar_gbr_f16.rs	Uses bit-normalizing widen helper; keeps post-widen routing host-native; updates alpha scatter routing.
src/sinker/mixed/packed_rgb_16bit.rs	Calls endian-aware scalar alpha extract helpers with `BE = false` for LE-encoded frame contract.
src/sinker/mixed/ayuv64.rs	Routes AYUV64 alpha extract through dispatcher with `BE = false`.
src/sinker/mixed/gray.rs	Makes Ya16 alpha patch helpers endian-aware and adds combined-vs-standalone parity tests.
src/sinker/mixed/tests/packed_rgb_float.rs	Replaces prior host-native routing tests with LE-encoded contract regression test (scalar/miri-safe).
src/sinker/mixed/tests/packed_rgb_f16.rs	Same as above for `Rgbf16` (scalar/miri-safe).
src/sinker/mixed/tests/planar_gbr_float.rs	Replaces prior host-native routing tests with LE-encoded contract regressions and Strategy A+ alpha regressions (scalar/miri-safe).
src/row/scalar/alpha_extract.rs	Makes multiple alpha extract helpers const-generic over `BE`, adds BE-parity tests, and updates docs.
src/row/scalar/planar_gbr_f16.rs	Makes `copy_alpha_plane_f16` endian-aware (`const BE`) and adds BE-parity test.
src/row/dispatch/alpha_extract.rs	Adds `const BE` to AYUV64 dispatchers and forces scalar when SIMD isn’t endian-safe.
src/row/arch/neon/alpha_extract.rs	Updates scalar tail calls to new const-generic scalar helper signatures.
src/row/arch/wasm_simd128/alpha_extract.rs	Same: scalar tail calls updated for const-generic scalar helpers.
src/row/arch/x86_avx2/alpha_extract.rs	Same: scalar tail calls updated for const-generic scalar helpers.
src/row/arch/x86_avx512/alpha_extract.rs	Same: scalar tail calls updated for const-generic scalar helpers.
src/row/arch/x86_sse41/alpha_extract.rs	Same: scalar tail calls updated for const-generic scalar helpers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    {
+      let mut sink = MixedSinker::<crate::yuv::Ya16>::new(w as usize, h as usize)
+        .with_rgb(&mut rgb_combined)
+        .unwrap()
+        .with_rgba(&mut rgba_combined)
+        .unwrap();
+      ya16_to(&frame, FR, M, &mut sink).unwrap();


+    {
+      let mut sink = MixedSinker::<crate::yuv::Ya16>::new(w as usize, h as usize)
+        .with_rgb_u16(&mut rgb_combined)
+        .unwrap()
+        .with_rgba_u16(&mut rgba_combined)
+        .unwrap();
+      ya16_to(&frame, FR, M, &mut sink).unwrap();


… miri PR #92's CI miri jobs (aarch64-apple-darwin) failed with `unsupported operation: can't call foreign function llvm.aarch64.neon.ld2.v8i16.p0`. Stack pointed at the `ya16_combined_rgb_and_rgba_alpha_matches_standalone_le_encoded` test calling `MixedSinker::<Ya16>::process` → `dispatch::ya16::ya16_to_rgb_row` → `arch::neon::gray::ya16_to_rgb_row::<false>` → `vld2q_u16` (NEON intrinsic Miri does not implement). Both `ya16_combined_*` tests added in `cf26058` (this PR) use width=8 — the exact threshold where the NEON kernel kicks in (`while x + 8 <= width`) — so on aarch64 hosts the SIMD body executes. Other Ya16 sinker tests in this module use width 1 / 2 which never reaches the `vld2q_u16` block. These two tests are BE-contract regressions: they verify combined (`with_rgb` + `with_rgba`) and standalone (`with_rgba`) paths agree on the LE-encoded `Ya16Frame`, locking down the endian-aware `copy_alpha_ya_u16_to_u8::<false>` and `copy_alpha_ya_u16::<false>` helpers. They must execute on BE-host miri (s390x / powerpc64) — gating them with `cfg_attr(miri, ignore)` would skip exactly the host where BE corruption surfaces. So they get the same Option A treatment 75cf865 applied to the planar / packed-rgb BE-contract regressions: prepend `.with_simd(false)` to every `MixedSinker::new(..)` builder chain so the sinker dispatch routes to the scalar kernel — no SIMD intrinsic, miri clean, BE validation preserved. Audit summary (other tests in `sinker::mixed::gray::tests`): * All `gray*` / `gray*_walker_smoke_test` / `gray*_limited_range_*` / `grayf32_with_*` integration tests use width <= 4 — far below the NEON SIMD threshold (8 lanes) — so they never invoke `vld2q_u16`. * `gray*_walker_smoke_test` and similar use the gray-family kernels (`gray9` / `gray12` / `gray14`) which dispatch through different row kernels with different SIMD-eligibility thresholds; none hit `vld2q_u16` at width=4. * `grayf32_sinker_le_encoded_frame_decodes_correctly` retains its existing `cfg_attr(miri, ignore = "...")` (added in `cf26058`) — left untouched in this commit; if a follow-up wants to align it with the 75cf865 un-gating pattern it can be done independently. * `ya8_*` tests use the Ya8 kernels (no `vld2q_u16` — Ya8 deinterleave uses `vld2_u8` which Miri *does* support) and existing width=128/130 smoke is already `cfg_attr(miri, ignore)`. * `ya16_with_*` tests (rgba_u16 / rgba_u8 / rgb_and_rgba / hsv) all use width 1 / 2 — below the SIMD threshold — so they execute the scalar tail unconditionally on every host. * `ya16_width_128_and_130_smoke` is already `cfg_attr(miri, ignore)` (Option B) and stays so — it explicitly validates the SIMD-dispatched smoke path. Tests fixed (2 — both via Option A): * `ya16_combined_rgb_and_rgba_alpha_matches_standalone_le_encoded` * `ya16_combined_rgb_u16_and_rgba_u16_alpha_matches_standalone_le_encoded` Each gains `.with_simd(false)` on both the combined sinker (with_rgb + with_rgba builder chain) and the standalone sinker (with_rgba only), matching the pattern in `tests/packed_rgb_float.rs::rgbf32_sinker_le_encoded_frame_decodes_correctly`. Verified: * `cargo test --target aarch64-apple-darwin --lib` (2285 pass, 0 fail). * `cargo build --target x86_64-apple-darwin --tests` (0 warnings). * `cargo build --no-default-features`. * `cargo fmt --check`. * `cargo clippy --all-targets --all-features -- -D warnings`. * `cargo miri test --lib sinker::mixed::gray::tests::ya16_combined` (2 pass — confirms the fix on aarch64-apple-darwin miri, the exact CI host that previously failed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot review on PR #92 caught: the comment at planar_gbr_f16.rs:359 said the downstream gbrpf32_to_* kernel was invoked with `BE = false`, but cf26058 changed the routing to `BE = HOST_NATIVE_BE` (correct for host-native f32 scratch produced by the bit-normalising widen helper). Updated comment to reflect actual routing + reinforced the distinction between post-widen scratch (host-native → HOST_NATIVE_BE) and direct- Frame paths (LE-encoded → ::<false>) so future readers don't conflate the two contracts. Copilot findings 2 + 3 (Ya16 tests at gray.rs:2470,2528 missing .with_simd(false)) were already addressed in fbb49a1 — Copilot reviewed pre-fix state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR #92's 75cf865 un-gated the LE-encoded byte-contract regression tests for `Rgbf16` / `Gbrpf16` / `Gbrapf16` (and the f16 widen-path / Strategy A+ alpha variants) on the assumption that miri would see a software fallback for `half::f16::from_f32`. That assumption is wrong: `half-2.7.1` enables `target_feature = "fp16"` on aarch64 (and F16C on x86 / x86_64), and the compiled path is inline `asm!` (`fcvt`). Miri rejects inline asm with `unsupported operation: inline assembly is not supported`, breaking CI on aarch64-apple-darwin and aarch64-unknown-linux-gnu (and any other miri target where the f16 hardware feature is detected — including x86_64 with F16C). Fix: re-add `#[cfg_attr(miri, ignore = "half::f16::from_f32 uses inline asm (fcvt) unsupported by Miri")]` to every test that builds an f16 fixture via `half::f16::from_f32`. Trade-off: BE-host miri (s390x / powerpc64) now skips these particular f16 contract tests as well, but the corresponding f32 LE-encoded regressions (`gbrpf32_sinker_le_encoded_frame_decodes_correctly`, `gbrapf32_sinker_le_encoded_frame_decodes_correctly`, `gbrapf32_strategy_a_plus_le_encoded_frame_alpha_decodes_correctly`, `gbrapf32_strategy_a_plus_le_encoded_u16_alpha_decodes_correctly`) remain un-gated and exercise the same byte-swap routing logic on BE miri hosts. Tests re-gated: - src/sinker/mixed/tests/packed_rgb_f16.rs (1): - rgbf16_sinker_le_encoded_frame_decodes_correctly - src/sinker/mixed/tests/planar_gbr_float.rs (6): - gbrpf16_sinker_le_encoded_frame_decodes_correctly - gbrapf16_sinker_le_encoded_frame_decodes_correctly - gbrpf16_sinker_widen_path_le_encoded_frame_decodes_correctly - gbrapf16_sinker_widen_path_le_encoded_frame_decodes_correctly - gbrpf16_sinker_widen_path_u16_and_u8_le_encoded_frame_decodes_correctly - gbrapf16_strategy_a_plus_post_widen_alpha_decodes_correctly Bits-only constructions (`half::f16::from_bits(_)`) are left ungated: they don't involve inline asm and remain miri-safe. Verified: - cargo build --target aarch64-apple-darwin --tests (clean) - cargo build --target x86_64-apple-darwin --tests (clean) - RUSTFLAGS="-C target-feature=+simd128" cargo build \ --target wasm32-unknown-unknown --tests (clean) - cargo build --no-default-features (clean) - cargo fmt --check (clean) - cargo clippy --all-targets --all-features -- -D warnings (clean) - cargo test --target aarch64-apple-darwin --lib \ sinker::mixed::tests::packed_rgb_f16 (9 passed) - cargo test --target aarch64-apple-darwin --lib \ sinker::mixed::tests::planar_gbr_float (26 passed) - cargo miri test --lib sinker::mixed::tests::packed_rgb_f16 \ (1 ignored as expected, no inline-asm error) - cargo miri test --lib sinker::mixed::tests::planar_gbr_float \ (6 f16 tests ignored, 4 f32 LE-encoded regressions still passing) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…compat wrappers Codex 2nd-pass review of PR #87 surfaced two [high] findings: 1. Frame docs contradicted the LE-encoded plane contract. `src/frame/packed_rgb_16bit.rs` told big-endian callers to pre-normalize each `u16` via `u16::from_le` before constructing Rgb48/Bgr48/Rgba64/ Bgra64 frames. After PR #92 (`5b42065` / `3b1d716`) the documented contract is the opposite: the plane is the **LE-encoded byte layout** reinterpreted as `&[u16]` (matching FFmpeg's `*LE` suffix), and the downstream row kernel applies `u16::from_le` itself (no-op on LE host, byte-swap on BE host). A BE caller following the old docs would pre-swap → kernel swaps again → double swap → corrupted output on every BE host. Doc-comments on `Rgb48Frame` / `Bgr48Frame` / `Rgba64Frame` / `Bgra64Frame` now state the LE-encoded byte contract explicitly, mirroring the wording PR #92 used for `Rgbf32Frame` / `Gbrpf32Frame` / `Gbrapf32Frame` etc. The module-level header is updated to match. Adds an Rgb48 sinker BE-contract regression test (`rgb48_sinker_le_encoded_frame_decodes_correctly`) following the `rgbf32_sinker_le_encoded_frame_decodes_correctly` pattern from PR #92: builds the plane from LE-encoded `u16` patterns (`intended.to_le()`), forces `with_simd(false)` so it runs purely scalar, and asserts the `with_rgb_u16` identity-passthrough output bit-equals `intended`. On a BE host with a regressed pre-swap this would byte-swap every sample. The test runs under miri, which is exactly where BE CI surfaces. 2. Public row APIs broke backwards compatibility. PR #87 added `<const BE: bool>` to 28 public functions in `src/row/dispatch/packed_rgb_16bit.rs` (Rgb48/Bgr48/Rgba64/Bgra64 × {rgb, rgba, rgb_u16, rgba_u16, luma, luma_u16, hsv}) and 6 public functions in `src/row/dispatch/rgb_ops.rs` (X2Rgb10/X2Bgr10 × {rgb, rgba, rgb_u16}). All 34 functions are re-exported from `crate::row::*` (16-bit ones are explicit, x2 ones via `rgb_ops::*` glob). Existing downstream LE-only callers cannot compile against the new const-generic signature without adding `::<false>` turbofish to every call site — a hard breaking change. Each function is renamed to `foo_endian<const BE: bool>` and a thin non-generic LE-only wrapper `foo` is added that forwards to `foo_endian::<false>`. Existing pre-Tier 8 call sites compile unchanged; sinker code is updated to call the explicit `_endian::<BE>` form so endian intent is visible at every internal callsite. Audit / fan-out: - 28 packed-16-bit dispatchers renamed + 28 LE wrappers added - 6 X2Rgb10/X2Bgr10 dispatchers renamed + 6 LE wrappers added - `src/row/mod.rs` re-exports both `foo` and `foo_endian` for the 16 pub `→{rgb,rgba,rgb_u16,rgba_u16}` entries and pub(crate) for the 12 `→{luma,luma_u16,hsv}` entries - 12 internal turbofish call sites updated to `_endian::<…>` across `src/sinker/mixed/packed_rgb_16bit.rs`, `src/sinker/mixed/packed_rgb_10bit.rs`, and the in-file dispatcher unit tests in `src/row/dispatch/packed_rgb_16bit.rs` The wrappers carry `#[allow(clippy::too_many_arguments)]` only when the underlying `_endian` form already does (luma/luma_u16/hsv variants); this matches the pre-existing pattern on the BE-aware definitions and is not a new suppression. Verified: cargo test --target aarch64-apple-darwin --lib cargo build --target x86_64-apple-darwin --tests # 0 warnings RUSTFLAGS="-C target-feature=+simd128" cargo build --target wasm32-unknown-unknown --tests cargo build --no-default-features cargo fmt --check cargo clippy --all-targets --all-features -- -D warnings cargo check --target s390x-unknown-linux-gnu --lib # BE-host smoke Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

uqio and others added 4 commits May 8, 2026 20:29

uqio force-pushed the fix/frame-contract-le-encoded branch from 2cbc60c to 78f1c2f Compare May 8, 2026 08:35

uqio and others added 4 commits May 8, 2026 21:01

al8n requested a review from Copilot May 8, 2026 10:14

Copilot started reviewing on behalf of al8n May 8, 2026 10:15 View session

Copilot AI reviewed May 8, 2026

View reviewed changes

uqio and others added 2 commits May 8, 2026 22:21

al8n merged commit 3b1d716 into main May 8, 2026
29 of 43 checks passed

al8n deleted the fix/frame-contract-le-encoded branch May 8, 2026 10:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(frame): clarify LE-encoded byte contract; revert HOST_NATIVE_BE sinker routings (post-#83/#84/#85 audit)#92

fix(frame): clarify LE-encoded byte contract; revert HOST_NATIVE_BE sinker routings (post-#83/#84/#85 audit)#92
al8n merged 10 commits intomainfrom
fix/frame-contract-le-encoded

uqio commented May 8, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

uqio commented May 8, 2026

Summary

Root cause

Phase breakdown

Test plan

Notes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants