Skip to content

fix(frame): clarify LE-encoded byte contract; revert HOST_NATIVE_BE sinker routings (post-#83/#84/#85 audit)#92

Merged
al8n merged 10 commits intomainfrom
fix/frame-contract-le-encoded
May 8, 2026
Merged

fix(frame): clarify LE-encoded byte contract; revert HOST_NATIVE_BE sinker routings (post-#83/#84/#85 audit)#92
al8n merged 10 commits intomainfrom
fix/frame-contract-le-encoded

Conversation

@uqio
Copy link
Copy Markdown
Collaborator

@uqio uqio commented May 8, 2026

Summary

Codex 3rd-pass review of PR #85 caught a contract conflict that affects
six other merged PRs. PR #85's 52f8191 reverted the broken Grayf32
sinker routing; this PR completes the audit for the remaining six float
frame types affected by PR #83 (dcf40a3) and PR #84 (8627280):
Rgbf32, Rgbf16, Gbrpf32, Gbrapf32, Gbrpf16, Gbrapf16.

Root cause

The Grayf32Frame / Gbrpf32Frame / Gbrpf16Frame / Gbrapf32Frame /
Gbrapf16Frame doc-comments already named the FFmpeg *LE pixel
formats (AV_PIX_FMT_GBRPF32LE, etc.) — the byte layout is LE-encoded
bytes reinterpreted as f32 / half::f16
, NOT host-native floats.
The kernel BE = false template parameter applies u32::from_le /
u16::from_le, which is:

  • LE host with LE-encoded bytes: from_le no-op → correct
  • BE host with LE-encoded bytes: from_le swaps → restores host-native f32 → correct

The HOST_NATIVE_BE consts introduced in PR #83 + PR #84 made the
kernel a no-op byte-swap on either host, which corrupts every output
path on a BE host.

Phase breakdown

Phase 1 — Frame docs (commit 5b42065).
Add explicit # Endian contract — LE-encoded bytes sections to
Rgbf32Frame, Rgbf16Frame, Gbrpf32Frame, Gbrapf32Frame,
Gbrpf16Frame, Gbrapf16Frame, mirroring the existing Grayf32Frame
pattern. Add bytemuck::cast_slice + linesize-division-by-element-size
guidance for FFmpeg-buffer callers.

Phase 2 — Sinker revert (commit 84bbbb7).
Revert ::<HOST_NATIVE_BE>::<false> across:

File Sites
src/sinker/mixed/packed_rgb_float.rs (Rgbf32) 6
src/sinker/mixed/packed_rgb_f16.rs (Rgbf16) 7
src/sinker/mixed/planar_gbr_float.rs (Gbrpf32 + Gbrapf32) 22
src/sinker/mixed/planar_gbr_f16.rs (Gbrpf16 + Gbrapf16) 22
Total 57

Removes the four const HOST_NATIVE_BE definitions and their explanatory
doc-comments. Rewrites the widen_f16_to_f32 doc-comment to accurately
describe the LE-encoded byte contract.

Untouched per the audit:

  • Scalar row-kernel BE == HOST_NATIVE_BE copy_from_slice fast paths —
    generic in BE, correct under either contract.
  • Per-arch SIMD kernels (row/arch/*).
  • The bit-normalising widen_f16_be_to_host_f32 scalar helper — its
    BE → host-native pattern is correct because it produces decoded f32
    from LE-encoded f16 bits before the downstream f32 kernel sees them.

Phase 3 — Test cleanup + LE-encoded contract regressions (commit 2cbc60c).

Removed 10 wrong-contract tests that were typed against the host-native
reading of the Frame contract:

  • rgbf32_kernel_host_native_be_matches_false_on_le_host
  • rgbf32_sinker_host_native_contract_lossless_passthrough
  • rgbf16_kernel_host_native_be_matches_false_on_le_host
  • rgbf16_sinker_host_native_contract_lossless_passthrough
  • gbrpf32_kernel_host_native_be_matches_false_on_le_host
  • gbrpf32_sinker_host_native_contract_lossless_passthrough
  • gbrpf16_kernel_host_native_be_matches_false_on_le_host
  • gbrpf16_sinker_host_native_contract_lossless_passthrough
  • gbrapf32_sinker_host_native_contract_lossless_passthrough_with_alpha
  • gbrapf16_sinker_host_native_contract_lossless_passthrough_with_alpha

Added 6 LE-encoded contract regressions following PR #85's 52f8191
pattern (grayf32_sinker_le_encoded_frame_decodes_correctly):

  • rgbf32_sinker_le_encoded_frame_decodes_correctly
  • rgbf16_sinker_le_encoded_frame_decodes_correctly
  • gbrpf32_sinker_le_encoded_frame_decodes_correctly
  • gbrapf32_sinker_le_encoded_frame_decodes_correctly
  • gbrpf16_sinker_le_encoded_frame_decodes_correctly
  • gbrapf16_sinker_le_encoded_frame_decodes_correctly

Each test builds a plane via f32::from_bits(intended.to_bits().to_le())
(or the f16 analogue), runs the sinker's lossless pass-through, and
asserts the output equals the host-native intended values. Vacuous on LE
host (every CI runner today) but fails fast on BE for any regression
that drops the ::<false> routing.

Net test delta: −10 / +6 = −4 (final cargo test --target aarch64-apple-darwin --lib count: 2242, down from 2246).

Test plan

  • cargo test --target aarch64-apple-darwin --lib → 2242 ok
  • cargo build --target x86_64-apple-darwin --tests → ok (0 warnings)
  • RUSTFLAGS="-C target-feature=+simd128" cargo build --target wasm32-unknown-unknown --tests → ok
  • cargo build --no-default-features → ok
  • cargo fmt --check → ok
  • cargo clippy --all-targets --all-features -- -D warnings → ok

Notes

  • Untouched: GrayN / Gray16 / Ya16 / yuva*p sinkers — those use the
    source Frame's per-format BE flag for encoded-byte order
    (different layer from the f32/f16 host-native vs LE-encoded
    contract).
  • The sinker-local widen_f16_to_f32 helper in
    src/sinker/mixed/planar_gbr_f16.rs calls half::f16::to_f32
    directly on the LE-encoded plane. On LE host this is correct (LE bytes
    ARE host-native); on BE host this would mis-interpret the bits — a
    separate bug from the routing reverted here. The accurate fix is to
    switch this helper to the bit-normalising widen_f16_be_to_host_f32
    pattern used by the per-arch SIMD backends; left for a follow-up. The
    doc-comment is updated to make the LE-host caveat explicit.

🤖 Generated with Claude Code

uqio and others added 4 commits May 8, 2026 20:29
Affected types: Rgbf32Frame, Rgbf16Frame, Gbrpf32Frame, Gbrapf32Frame,
Gbrpf16Frame, Gbrapf16Frame.

Each doc-comment now explicitly states the LE-encoded byte contract,
mirroring the established Grayf32Frame doc pattern: the &[f32] / &[f16]
plane is the LE-encoded byte layout reinterpreted as f32 / f16, matching
FFmpeg's canonical *LE pixel-format suffixes (AV_PIX_FMT_GBRPF32LE etc).
Adds the bytemuck::cast_slice + linesize-division-by-element-size
instruction so callers wiring up FFmpeg buffers don't have to guess.

Codex 3rd-pass review of PR #85 caught that PR #83 + PR #84 introduced
sinker routings that assumed the planes are already host-native f32 / f16
— a wrong reading of the *LE contract that would corrupt every output
path on a BE host. Pinning the doc-level contract here so future readers
don't repeat the mistake; sinker revert lands in the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors PR #85's Grayf32 sinker revert (52f8191) for the other six float
frame types affected by PR #83 (dcf40a3) and PR #84 (8627280):
Rgbf32, Rgbf16, Gbrpf32, Gbrapf32, Gbrpf16, Gbrapf16.

The earlier PRs introduced sinker-layer `HOST_NATIVE_BE` consts that
made the row-kernel `BE` parameter a no-op byte-swap on whichever host
the build targeted. That assumed the Frame's plane bytes are already
host-native f32/f16 — which contradicts the LE-encoded byte contract
clarified in the previous commit. On a BE host the routing skipped the
required `u32::from_le` / `u16::from_le` swap and would have corrupted
every output path.

Reverted call sites (all `::<HOST_NATIVE_BE>` -> `::<false>`):
  - src/sinker/mixed/packed_rgb_float.rs:    6 sites (Rgbf32)
  - src/sinker/mixed/packed_rgb_f16.rs:      7 sites (Rgbf16)
  - src/sinker/mixed/planar_gbr_float.rs:   22 sites (Gbrpf32 + Gbrapf32)
  - src/sinker/mixed/planar_gbr_f16.rs:     22 sites (Gbrpf16 + Gbrapf16)

Removes the four `const HOST_NATIVE_BE: bool = cfg!(target_endian = "big")`
definitions and their explanatory doc-comments. Rewrites the
`widen_f16_to_f32` doc in planar_gbr_f16.rs to accurately describe the
LE-encoded contract (helper is correct on LE host; cross-endian widen
should use the bit-normalising `widen_f16_be_to_host_f32` pattern).

Untouched: `BE == HOST_NATIVE_BE` `copy_from_slice` fast paths in scalar
row kernels (`rgbf32_to_rgb_f32_row`, `gbrpf32_to_rgb_f32_row`, etc.) —
generic in `BE`, correct under either contract. Per-arch SIMD kernels
and `widen_f16_be_to_host_f32` left alone — different layer.

Verification (aarch64-apple-darwin):

  cargo test --target aarch64-apple-darwin --lib   # 2246 ok

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oded regressions

Removes the host-native-contract tests that PR #83 + PR #84 added — they
were typed against an incorrect host-native-f32/f16 reading of the Frame
contract and would mask the BE-host corruption introduced by the now-
reverted `::<HOST_NATIVE_BE>` sinker routings:

  - rgbf32_kernel_host_native_be_matches_false_on_le_host       (deleted)
  - rgbf32_sinker_host_native_contract_lossless_passthrough     (deleted)
  - rgbf16_kernel_host_native_be_matches_false_on_le_host       (deleted)
  - rgbf16_sinker_host_native_contract_lossless_passthrough     (deleted)
  - gbrpf32_kernel_host_native_be_matches_false_on_le_host      (deleted)
  - gbrpf32_sinker_host_native_contract_lossless_passthrough    (deleted)
  - gbrpf16_kernel_host_native_be_matches_false_on_le_host      (deleted)
  - gbrpf16_sinker_host_native_contract_lossless_passthrough    (deleted)
  - gbrapf32_sinker_host_native_contract_lossless_passthrough_with_alpha  (deleted)
  - gbrapf16_sinker_host_native_contract_lossless_passthrough_with_alpha  (deleted)

Adds LE-encoded byte contract regressions following PR #85's `52f8191`
pattern (`grayf32_sinker_le_encoded_frame_decodes_correctly`):

  - rgbf32_sinker_le_encoded_frame_decodes_correctly    (new)
  - rgbf16_sinker_le_encoded_frame_decodes_correctly    (new)
  - gbrpf32_sinker_le_encoded_frame_decodes_correctly   (new)
  - gbrapf32_sinker_le_encoded_frame_decodes_correctly  (new)
  - gbrpf16_sinker_le_encoded_frame_decodes_correctly   (new)
  - gbrapf16_sinker_le_encoded_frame_decodes_correctly  (new)

Each test constructs an explicitly LE-encoded plane via
`f32::from_bits(intended.to_bits().to_le())` (or the f16 analogue),
builds the Frame, runs the sinker's lossless pass-through, and asserts
the output equals the host-native intended values. Vacuous on LE hosts
(where `to_le` is identity) but on a BE host any regression that drops
the `::<false>` kernel routing would fail fast.

Also picks up two trivial pre-existing trailing-blank-line cleanups in
`src/row/arch/{neon,x86_sse41}/endian.rs` produced by `cargo fmt` while
running it across the modified files.

Verification (aarch64-apple-darwin, x86_64-apple-darwin, wasm32):

  cargo test --target aarch64-apple-darwin --lib                  # 2242 ok
  cargo build --target x86_64-apple-darwin --tests                # ok
  RUSTFLAGS="-C target-feature=+simd128" \
    cargo build --target wasm32-unknown-unknown --tests           # ok
  cargo build --no-default-features                               # ok
  cargo fmt --check                                               # ok
  cargo clippy --all-targets --all-features -- -D warnings        # ok

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rmalize-first canonical helper

Closes the BE-host correctness gap flagged in the original body of #92:
the sinker-local `widen_f16_to_f32` in `src/sinker/mixed/planar_gbr_f16.rs`
called `half::f16::to_f32` directly on f16 plane bits, treating the bytes
as host-native. That assumption only holds on a little-endian host with an
LE-encoded source — on a big-endian host the LE-encoded f16 bits would be
byte-swapped from the intended values, and `to_f32` would decode them as
arbitrarily wrong magnitudes / signs / NaNs.

Replaces every call site with the canonical bit-normalize-first helper
`crate::row::scalar::planar_gbr_f16::widen_f16_be_to_host_f32::<false>`
(BE = false because Frame plane bytes are LE-encoded per the contract
documented in Phase 1 of this PR). The shared helper does
`f32 = f16::from_bits(u16::from_le(plane_bits)).to_f32()` — correct on
both LE hosts (where `from_le` is identity) and BE hosts (where it byte-
swaps the LE-encoded plane bits back to host-native f16 bits before the
f16 → f32 conversion). The downstream `gbrpf32_to_*` row kernels are still
invoked with `BE = false` since the widened scratch is now host-native f32
(its `u32::from_le` is then a no-op on LE host and the right byte-swap on
BE host — same contract as the dispatch f16-widen fallback and the per-
backend SIMD scalar tails already use).

Updated 7 call sites:
  - 3 in the `Gbrpf16` `process` widen loop (G, B, R planes).
  - 4 in the `Gbrapf16` `process` widen loop (G, B, R, A planes).
  - 1 in the `widen_and_scatter_f16_alpha_to_u8` u8 RGBA fast-path helper
    (counted as the seventh distinct call site; the other six are the
    two widen-loop blocks).

Removed the local `widen_f16_to_f32` helper (and its now-stale
LE-host-only-caveat doc-comment) entirely — no remaining callers.

Adds two LE-encoded byte-contract regression tests covering the widening
path (the existing PR #92 tests covered only the lossless f16 pass-through
path which doesn't go through the widen helper):

  - gbrpf16_sinker_widen_path_le_encoded_frame_decodes_correctly
    (Gbrpf16 → with_rgb_f32; exercises the 3-plane widen loop)
  - gbrapf16_sinker_widen_path_le_encoded_frame_decodes_correctly
    (Gbrapf16 → with_rgba_f32; exercises the 4-plane widen loop incl. α)

Each test constructs LE-encoded `&[half::f16]` planes via
`half::f16::from_bits(intended.to_bits().to_le())`, builds a Frame, runs
the widening sinker path, and asserts the f32 output equals the host-
native expected values. Vacuous on LE hosts (where `to_le` is identity)
but on a BE host any regression that drops the bit-normalize-first step
in `widen_f16_be_to_host_f32::<false>` would fail fast.

Verification (aarch64-apple-darwin, x86_64-apple-darwin, wasm32):

  cargo test --target aarch64-apple-darwin --lib                  # 2269 ok
  cargo build --target x86_64-apple-darwin --tests                # 0 warnings
  RUSTFLAGS="-C target-feature=+simd128" \
    cargo build --target wasm32-unknown-unknown --tests           # ok
  cargo build --no-default-features                               # ok
  cargo fmt --check                                               # ok
  cargo clippy --all-targets --all-features -- -D warnings        # ok

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@uqio uqio force-pushed the fix/frame-contract-le-encoded branch from 2cbc60c to 78f1c2f Compare May 8, 2026 08:35
uqio and others added 4 commits May 8, 2026 21:01
…are + frame docs cite LE names

Addresses three follow-up findings from Codex's review of PR #92.

Finding 1 — Post-widen f32 routing on Gbrpf16 / Gbrapf16. The Phase 2
follow-up replaced the local `widen_f16_to_f32` helper with the canonical
`widen_f16_be_to_host_f32::<false>`, but the immediate downstream
`gbrpf32_to_*::<false>` calls still told the f32 row kernel to apply
`from_le` AGAIN — on a BE host this would byte-swap the already-host-native
f32 scratch, corrupting output.

Introduce a module-scope `HOST_NATIVE_BE = cfg!(target_endian = "big")`
constant in `src/sinker/mixed/planar_gbr_f16.rs` and route ALL post-widen
`gbrpf32_to_*` / `gbrapf32_to_*` calls through `::<HOST_NATIVE_BE>` (14
sites total across both Gbrpf16 and Gbrapf16 sinker bodies). The kernel's
`from_le` / `from_be` then becomes a no-op on every host since post-widen
scratch already matches the host byte order.

Distinct from the Phase 2 revert: direct Frame-to-row-kernel calls
(`gbrpf16_to_*::<false>` for u8 / f16 outputs) still use `<false>` per
the LE-encoded Frame contract — those receive raw LE-encoded `&[half::f16]`
plane bytes, so the kernel must apply `from_le`. Post-widen scratch is
host-native f32, so it must use `<HOST_NATIVE_BE>` to keep the swap a
no-op everywhere. The module-level comment block makes this distinction
explicit.

Finding 2 — Ya16 RGB+RGBA combined-path alpha not endian-aware. When
`MixedSinker<Ya16>` attached BOTH `with_rgb` and `with_rgba`, the sinker
ran `ya16_to_rgb_*::<false>` (correct), expanded the RGB output to RGBA
prefix, and called `copy_alpha_ya_u16` / `copy_alpha_ya_u16_to_u8` to
overlay the alpha plane from the source — but those helpers read raw
`packed[n*2+1]` directly without `u16::from_le`. On a BE host processing
the LE-encoded `Ya16Frame`, the host-native u16 read would interpret
LE-encoded bytes as host-native (BE), yielding the wrong alpha byte.

Audit found four alpha-patch helpers with the same pattern across three
formats — Ya16 (`copy_alpha_ya_u16` / `_to_u8`), AYUV64
(`copy_alpha_packed_u16x4_at_0` / `_to_u8_at_0`), and Rgba64 / Bgra64
(`copy_alpha_packed_u16x4_at_3` / `_to_u8_at_3`). All six were made
endian-aware via a `<const BE: bool>` parameter applying
`u16::from_le(packed[..])` (or `from_be` per BE) before scaling /
storing, mirroring the pre-existing `copy_alpha_plane_u16` /
`copy_alpha_plane_u16_to_u8` pattern from PR #81.

Updated all 8 sinker call sites to pass `<false>` (Frame contract is LE).
The two AYUV64 helpers also have SIMD dispatchers — those gained a
`<const BE: bool>` generic and the `safe_for_simd = !BE && cfg!(target_endian = "little")`
gate that already exists for `copy_alpha_plane_u16_to_u8`, falling back
to scalar in the BE-host or BE-data quadrants. The five SIMD scalar tails
were pinned to `::<false>` matching the precedent from PR #81 / PR #82.

Finding 3 — Public Frame docs cite wrong FFmpeg names. `Rgbf16Frame`
docs cited `AV_PIX_FMT_RGBF16` (target-endian alias — `RGBF16LE` on LE,
`RGBF16BE` on BE). `Rgbf32Frame` docs cited `AV_PIX_FMT_RGBF32` which
doesn't exist as an unsuffixed alias; FFmpeg has only the alpha-bearing
`RGBAF32LE` / `RGBAF32BE` pair. A BE caller following the unsuffixed
contract could pass native-BE bytes into an LE-decoder.

Updated `src/frame/packed_rgb_f16.rs` to cite `AV_PIX_FMT_RGBF16LE`
explicitly, with a parenthetical noting that the unsuffixed name is a
target-endian alias to avoid future confusion. Updated
`src/frame/packed_rgb_float.rs` to drop the citation of the non-existent
`AV_PIX_FMT_RGBF32` constant entirely and instead document the contract
as "LE-encoded f32 bytes — `colconv` pins to the canonical FFmpeg `*LE`
byte-order convention regardless of host endianness, exactly as the
existing `AV_PIX_FMT_RGBAF32LE` flavour".

Tests added:
- `gbrpf16_sinker_widen_path_u16_and_u8_le_encoded_frame_decodes_correctly`
  exercises the Gbrpf16 widen → f32 → u16 chain (with `with_luma_u16`
  alongside) under the LE-encoded Frame contract — vacuous on LE,
  catches the double-swap on BE.
- `ya16_combined_rgb_and_rgba_alpha_matches_standalone_le_encoded` and
  the u16 sibling assert that Strategy A+ (combined `with_rgb` +
  `with_rgba`) produces α bytes byte-identical to the standalone
  `with_rgba` path on every host. Builds the `&[u16]` plane via
  `u16::from_ne_bytes(x.to_le_bytes())` so the in-memory bytes spell
  the LE-encoded contract on both LE and BE hosts.
- 8 new BE-parity tests in `src/row/scalar/alpha_extract.rs` for the
  newly endian-aware AYUV64 / Rgba64 / Ya16 helpers (4 helpers × u8 / u16
  variants), all using the `swap_bytes`-fixture pattern that exists for
  `copy_alpha_plane_u16` so the BE arm runs on every host without
  needing a real BE runner.

Verified:
- `cargo test --target aarch64-apple-darwin --lib` — 2278 passed
- `cargo build --target x86_64-apple-darwin --tests` — clean
- `RUSTFLAGS="-C target-feature=+simd128" cargo build --target wasm32-unknown-unknown --tests` — clean
- `cargo build --no-default-features` — clean
- `cargo fmt --check` — clean
- `cargo clippy --all-targets --all-features -- -D warnings` — clean
- `cargo check --target s390x-unknown-linux-gnu --lib` — clean (BE-host smoke)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…direct-Frame vs post-widen paths

Codex 3rd-pass review of #92 found that the f32 alpha-patch helpers
`copy_alpha_plane_f32_to_u8`, `copy_alpha_plane_f32_to_u16`, and
`copy_alpha_plane_f32` read each f32 α sample as host-native, which
silently corrupted the α slot on BE hosts (e.g., s390x) processing the
LE-encoded `Gbrapf32Frame` α plane. The byte-swapped f32 bits clamp to
near-zero or near-one, so BE hosts emitted α = 0 or 255 regardless of
the intended value. Same bug class as the u16 alpha-patch helpers fixed
in cf26058, but for the f32 path; the standalone `with_rgba_f32` test
didn't cover this because Strategy A+ uses a different code path
(`expand_rgb_to_rgba_row` + alpha-patch).

Helpers updated (4):
* `copy_alpha_plane_f32_to_u8`  — added `<const BE: bool>`, normalises
  each f32's bits via `f32::from_bits(u32::from_le/be(bits))` before
  clamp / scale / round-half-up.
* `copy_alpha_plane_f32_to_u16` — same pattern.
* `copy_alpha_plane_f32`        — same pattern (lossless pass-through).
* `copy_alpha_plane_f16` (planar_gbr_f16.rs) — added `<const BE: bool>`
  for consistency. Currently only test-called; future-proofs against a
  Strategy A+ wiring that consumes it.

Sinker call sites updated, dual routing pattern (4 sites total):
* `src/sinker/mixed/planar_gbr_float.rs` — Gbrapf32 Strategy A+ u8 RGBA
  path: `<false>` (direct-Frame, LE-encoded per Phase-1 contract).
* `src/sinker/mixed/planar_gbr_f16.rs`   — `widen_and_scatter_f16_alpha_to_u8`
  helper: `<HOST_NATIVE_BE>` (post-widen, scratch is host-native f32 after
  `widen_f16_be_to_host_f32::<false>`).

Audit summary: only one direct-Frame call site (Gbrapf32 u8 RGBA) and
one post-widen call site (Gbrapf16 u8 RGBA) currently invoke the f32
alpha-patch helpers. The Gbrapf32 RGBA u16 / RGBA f32 / RGBA f16 paths
all use direct kernels (`gbrapf32_to_rgba_u16_row` / `gbrapf32_to_rgba_f32_row`
/ `gbrapf32_to_rgba_f16_row`) and were already endian-aware via their
`<false>` parameter; same for the Gbrapf16 widened RGBA u16 / f32 paths.

Tests added (7):
* `copy_alpha_plane_f32_to_u8_be_parity_with_swapped_buffer`
* `copy_alpha_plane_f32_to_u16_be_parity_with_swapped_buffer`
* `copy_alpha_plane_f32_be_parity_with_swapped_buffer`
* `copy_alpha_plane_f16_be_parity_with_swapped_buffer`
* `gbrapf32_strategy_a_plus_le_encoded_frame_alpha_decodes_correctly`
  (w = 15, exercises scalar tail; compares Strategy A+ combo to standalone
  `with_rgba`).
* `gbrapf32_strategy_a_plus_le_encoded_u16_alpha_decodes_correctly`
  (w = 17, defense-in-depth for the u16 RGBA combo).
* `gbrapf16_strategy_a_plus_post_widen_alpha_decodes_correctly`
  (w = 15, verifies HOST_NATIVE_BE routing on the post-widen helper).

The 3 prior `copy_alpha_plane_f32*_clamps_and_scales` /
`_lossless_passthrough` tests are now `#[cfg(target_endian = "little")]`
(host-native f32 fixtures); BE-host scalar correctness is locked down
by the dedicated `*_be_parity_with_swapped_buffer` tests.

Verified:
* `cargo test --target aarch64-apple-darwin --lib` (2285 pass, +7).
* `cargo build --target x86_64-apple-darwin --tests` (0 warnings).
* `RUSTFLAGS="-C target-feature=+simd128" cargo build --target wasm32-unknown-unknown --tests`.
* `cargo build --no-default-features`.
* `cargo fmt --check`.
* `cargo clippy --all-targets --all-features -- -D warnings`.
* `cargo check --target s390x-unknown-linux-gnu --lib` (BE-host smoke).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ercises them

Codex 4th-pass review of #92 found the BE-sensitive sinker regressions skip
on miri targets — but BE CI runs via miri (powerpc64 / s390x), so the tests
effectively did not validate the BE path. The `cfg_attr(miri, ignore = ...)`
gate was added defensively, but it skipped exactly the host where BE
corruption would surface.

Audit summary: every BE-sensitive regression added in this PR calls the
high-level `MixedSinker` builder + `gbrpf32_to` / `gbrapf32_to` /
`gbrpf16_to` / `gbrapf16_to` / `rgbf32_to` / `rgbf16_to`. The dispatch layer
honours the sinker's `simd` flag — passing `with_simd(false)` bypasses every
SIMD branch and calls the scalar kernels directly. The post-widen helpers
(`widen_f16_be_to_host_f32`, `expand_rgb_to_rgba_row`, the
`copy_alpha_plane_f32*` family) are pure scalar. None of these tests
construct or invoke arch-specific SIMD intrinsics directly.

Tests un-gated (12):

* `src/sinker/mixed/tests/packed_rgb_float.rs`
  - `rgbf32_sinker_le_encoded_frame_decodes_correctly`
* `src/sinker/mixed/tests/packed_rgb_f16.rs`
  - `rgbf16_sinker_le_encoded_frame_decodes_correctly`
* `src/sinker/mixed/tests/planar_gbr_float.rs`
  - `gbrpf32_sinker_le_encoded_frame_decodes_correctly`
  - `gbrapf32_sinker_le_encoded_frame_decodes_correctly`
  - `gbrpf16_sinker_le_encoded_frame_decodes_correctly`
  - `gbrapf16_sinker_le_encoded_frame_decodes_correctly`
  - `gbrpf16_sinker_widen_path_le_encoded_frame_decodes_correctly`
  - `gbrapf16_sinker_widen_path_le_encoded_frame_decodes_correctly`
  - `gbrpf16_sinker_widen_path_u16_and_u8_le_encoded_frame_decodes_correctly`
  - `gbrapf32_strategy_a_plus_le_encoded_frame_alpha_decodes_correctly`
  - `gbrapf32_strategy_a_plus_le_encoded_u16_alpha_decodes_correctly`
  - `gbrapf16_strategy_a_plus_post_widen_alpha_decodes_correctly`

Each test now starts the builder chain with `.with_simd(false)`, forcing
scalar dispatch on every chained `with_rgb*` / `with_rgba*` / `with_luma*`
sink. SIMD-using tests in `planar_gbr_high_bit.rs` (8 pre-existing miri
ignores, untouched in this PR) retain the gate; those test SIMD-dispatched
kernels with no scalar bypass and genuinely cannot run under miri.

Local miri verification (where the f16 setup path's portable software
fallback applies — i.e. all f32 tests, plus those f16 tests whose fixtures
do not call `half::f16::from_f32` on aarch64+fp16):

* `cargo miri test --lib gbrpf32_sinker_le_encoded` — 1 passed
* `cargo miri test --lib gbrapf32_sinker_le_encoded` — 1 passed
* `cargo miri test --lib gbrapf32_strategy_a_plus_le_encoded_frame` — 1 passed
* `cargo miri test --lib gbrapf32_strategy_a_plus_le_encoded_u16` — 1 passed
* `cargo miri test --lib rgbf32_sinker_le_encoded` — 1 passed

The f16 regression tests build their fixtures via `half::f16::from_f32`,
which on aarch64-apple-darwin (miri's local triple) compiles to an inline
asm path keyed on `target_feature = "fp16"` — unsupported by miri locally.
On the BE CI hosts (s390x / powerpc64), `half-2.7.1` falls through to its
portable software fallback (no asm, no SIMD), so all f16 tests will execute
correctly under `cargo miri test` on those targets — which is the entire
point of removing the `miri,ignore` gate.

Verified:

* `cargo test --target aarch64-apple-darwin --lib` (2285 pass, 0 fail).
* `cargo build --target x86_64-apple-darwin --tests` (0 warnings).
* `RUSTFLAGS="-C target-feature=+simd128" cargo build --target wasm32-unknown-unknown --tests`.
* `cargo build --no-default-features`.
* `cargo fmt --check`.
* `cargo clippy --all-targets --all-features -- -D warnings`.
* `cargo check --target s390x-unknown-linux-gnu --lib` (BE-host smoke).
* `cargo miri test --lib gbrpf32_sinker_le_encoded` (and the four sibling f32 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex 5th-pass review of PR #92 caught: the prior doc said FFmpeg "does
not define AV_PIX_FMT_RGBF32" — incorrect. FFmpeg actually defines
AV_PIX_FMT_RGBF32BE, AV_PIX_FMT_RGBF32LE, and an unsuffixed alias
AV_PIX_FMT_RGBF32 that is target-endian (resolves to LE on LE hosts and
BE on BE hosts).

Corrected: Rgbf32Frame maps to AV_PIX_FMT_RGBF32LE explicitly, with a
warning that BE-host callers holding target-endian RGBF32 bytes must
convert to LE before constructing the frame (otherwise the LE-decode
contract would re-interpret BE bytes as LE → byte-swapped float data).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR clarifies and enforces the “LE-encoded bytes reinterpreted as f32/f16” contract for float RGB/GBR frame types (matching FFmpeg *LE pixel formats), and reverts sinker routing choices that would corrupt outputs on big-endian hosts. It also makes several Strategy A+/alpha-extract helpers explicitly endian-aware and adds regressions/parity tests to lock down the corrected behavior.

Changes:

  • Document the LE-encoded byte contract for Rgbf32/Rgbf16 and planar Gbr* float frame families, including FFmpeg buffer/linesize guidance.
  • Re-route affected sinkers to dispatch row kernels with BE = false for direct-frame reads (LE decoding), and keep post-widen paths host-native where appropriate.
  • Make scalar/dispatch alpha-extract helpers endian-aware (const-generic BE) with SIMD fallbacks guarded, and add LE-contract regressions + BE-parity tests.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/frame/packed_rgb_float.rs Documents Rgbf32Frame as LE-encoded bytes and clarifies FFmpeg mapping/stride guidance.
src/frame/packed_rgb_f16.rs Documents Rgbf16Frame LE-encoded contract and FFmpeg *LE/target-endian alias behavior.
src/frame/planar_gbr_float.rs Adds explicit LE-encoded endian contract sections for planar GBR float frames.
src/sinker/mixed/packed_rgb_float.rs Reverts Rgbf32 sinker dispatch routing to ::<false> (LE decode).
src/sinker/mixed/packed_rgb_f16.rs Reverts Rgbf16 sinker dispatch routing to ::<false> (LE decode).
src/sinker/mixed/planar_gbr_float.rs Reverts planar GBR f32 sinker routing to ::<false> and wires endian-aware alpha patching.
src/sinker/mixed/planar_gbr_f16.rs Uses bit-normalizing widen helper; keeps post-widen routing host-native; updates alpha scatter routing.
src/sinker/mixed/packed_rgb_16bit.rs Calls endian-aware scalar alpha extract helpers with BE = false for LE-encoded frame contract.
src/sinker/mixed/ayuv64.rs Routes AYUV64 alpha extract through dispatcher with BE = false.
src/sinker/mixed/gray.rs Makes Ya16 alpha patch helpers endian-aware and adds combined-vs-standalone parity tests.
src/sinker/mixed/tests/packed_rgb_float.rs Replaces prior host-native routing tests with LE-encoded contract regression test (scalar/miri-safe).
src/sinker/mixed/tests/packed_rgb_f16.rs Same as above for Rgbf16 (scalar/miri-safe).
src/sinker/mixed/tests/planar_gbr_float.rs Replaces prior host-native routing tests with LE-encoded contract regressions and Strategy A+ alpha regressions (scalar/miri-safe).
src/row/scalar/alpha_extract.rs Makes multiple alpha extract helpers const-generic over BE, adds BE-parity tests, and updates docs.
src/row/scalar/planar_gbr_f16.rs Makes copy_alpha_plane_f16 endian-aware (const BE) and adds BE-parity test.
src/row/dispatch/alpha_extract.rs Adds const BE to AYUV64 dispatchers and forces scalar when SIMD isn’t endian-safe.
src/row/arch/neon/alpha_extract.rs Updates scalar tail calls to new const-generic scalar helper signatures.
src/row/arch/wasm_simd128/alpha_extract.rs Same: scalar tail calls updated for const-generic scalar helpers.
src/row/arch/x86_avx2/alpha_extract.rs Same: scalar tail calls updated for const-generic scalar helpers.
src/row/arch/x86_avx512/alpha_extract.rs Same: scalar tail calls updated for const-generic scalar helpers.
src/row/arch/x86_sse41/alpha_extract.rs Same: scalar tail calls updated for const-generic scalar helpers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/sinker/mixed/planar_gbr_f16.rs Outdated
Comment thread src/sinker/mixed/gray.rs
Comment on lines +2459 to +2465
{
let mut sink = MixedSinker::<crate::yuv::Ya16>::new(w as usize, h as usize)
.with_rgb(&mut rgb_combined)
.unwrap()
.with_rgba(&mut rgba_combined)
.unwrap();
ya16_to(&frame, FR, M, &mut sink).unwrap();
Comment thread src/sinker/mixed/gray.rs
Comment on lines +2511 to +2517
{
let mut sink = MixedSinker::<crate::yuv::Ya16>::new(w as usize, h as usize)
.with_rgb_u16(&mut rgb_combined)
.unwrap()
.with_rgba_u16(&mut rgba_combined)
.unwrap();
ya16_to(&frame, FR, M, &mut sink).unwrap();
uqio and others added 2 commits May 8, 2026 22:21
… miri

PR #92's CI miri jobs (aarch64-apple-darwin) failed with `unsupported
operation: can't call foreign function llvm.aarch64.neon.ld2.v8i16.p0`.
Stack pointed at the `ya16_combined_rgb_and_rgba_alpha_matches_standalone_le_encoded`
test calling `MixedSinker::<Ya16>::process` → `dispatch::ya16::ya16_to_rgb_row`
→ `arch::neon::gray::ya16_to_rgb_row::<false>` → `vld2q_u16` (NEON
intrinsic Miri does not implement).

Both `ya16_combined_*` tests added in `cf26058` (this PR) use width=8 —
the exact threshold where the NEON kernel kicks in (`while x + 8 <= width`)
— so on aarch64 hosts the SIMD body executes. Other Ya16 sinker tests in
this module use width 1 / 2 which never reaches the `vld2q_u16` block.

These two tests are BE-contract regressions: they verify combined
(`with_rgb` + `with_rgba`) and standalone (`with_rgba`) paths agree on
the LE-encoded `Ya16Frame`, locking down the endian-aware
`copy_alpha_ya_u16_to_u8::<false>` and `copy_alpha_ya_u16::<false>`
helpers. They must execute on BE-host miri (s390x / powerpc64) — gating
them with `cfg_attr(miri, ignore)` would skip exactly the host where BE
corruption surfaces. So they get the same Option A treatment 75cf865
applied to the planar / packed-rgb BE-contract regressions: prepend
`.with_simd(false)` to every `MixedSinker::new(..)` builder chain so the
sinker dispatch routes to the scalar kernel — no SIMD intrinsic, miri
clean, BE validation preserved.

Audit summary (other tests in `sinker::mixed::gray::tests`):

* All `gray*` / `gray*_walker_smoke_test` / `gray*_limited_range_*` /
  `grayf32_with_*` integration tests use width <= 4 — far below the
  NEON SIMD threshold (8 lanes) — so they never invoke `vld2q_u16`.
* `gray*_walker_smoke_test` and similar use the gray-family kernels
  (`gray9` / `gray12` / `gray14`) which dispatch through different row
  kernels with different SIMD-eligibility thresholds; none hit
  `vld2q_u16` at width=4.
* `grayf32_sinker_le_encoded_frame_decodes_correctly` retains its
  existing `cfg_attr(miri, ignore = "...")` (added in `cf26058`) — left
  untouched in this commit; if a follow-up wants to align it with the
  75cf865 un-gating pattern it can be done independently.
* `ya8_*` tests use the Ya8 kernels (no `vld2q_u16` — Ya8 deinterleave
  uses `vld2_u8` which Miri *does* support) and existing width=128/130
  smoke is already `cfg_attr(miri, ignore)`.
* `ya16_with_*` tests (rgba_u16 / rgba_u8 / rgb_and_rgba / hsv) all use
  width 1 / 2 — below the SIMD threshold — so they execute the scalar
  tail unconditionally on every host.
* `ya16_width_128_and_130_smoke` is already `cfg_attr(miri, ignore)`
  (Option B) and stays so — it explicitly validates the SIMD-dispatched
  smoke path.

Tests fixed (2 — both via Option A):

* `ya16_combined_rgb_and_rgba_alpha_matches_standalone_le_encoded`
* `ya16_combined_rgb_u16_and_rgba_u16_alpha_matches_standalone_le_encoded`

Each gains `.with_simd(false)` on both the combined sinker (with_rgb +
with_rgba builder chain) and the standalone sinker (with_rgba only),
matching the pattern in `tests/packed_rgb_float.rs::rgbf32_sinker_le_encoded_frame_decodes_correctly`.

Verified:

* `cargo test --target aarch64-apple-darwin --lib` (2285 pass, 0 fail).
* `cargo build --target x86_64-apple-darwin --tests` (0 warnings).
* `cargo build --no-default-features`.
* `cargo fmt --check`.
* `cargo clippy --all-targets --all-features -- -D warnings`.
* `cargo miri test --lib sinker::mixed::gray::tests::ya16_combined`
  (2 pass — confirms the fix on aarch64-apple-darwin miri, the exact
  CI host that previously failed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot review on PR #92 caught: the comment at planar_gbr_f16.rs:359
said the downstream gbrpf32_to_* kernel was invoked with `BE = false`,
but cf26058 changed the routing to `BE = HOST_NATIVE_BE` (correct for
host-native f32 scratch produced by the bit-normalising widen helper).

Updated comment to reflect actual routing + reinforced the distinction
between post-widen scratch (host-native → HOST_NATIVE_BE) and direct-
Frame paths (LE-encoded → ::<false>) so future readers don't conflate
the two contracts.

Copilot findings 2 + 3 (Ya16 tests at gray.rs:2470,2528 missing
.with_simd(false)) were already addressed in fbb49a1 — Copilot
reviewed pre-fix state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@al8n al8n merged commit 3b1d716 into main May 8, 2026
29 of 43 checks passed
@al8n al8n deleted the fix/frame-contract-le-encoded branch May 8, 2026 10:25
uqio added a commit that referenced this pull request May 8, 2026
PR #92's 75cf865 un-gated the LE-encoded byte-contract regression tests
for `Rgbf16` / `Gbrpf16` / `Gbrapf16` (and the f16 widen-path / Strategy
A+ alpha variants) on the assumption that miri would see a software
fallback for `half::f16::from_f32`. That assumption is wrong:
`half-2.7.1` enables `target_feature = "fp16"` on aarch64 (and F16C on
x86 / x86_64), and the compiled path is inline `asm!` (`fcvt`). Miri
rejects inline asm with `unsupported operation: inline assembly is not
supported`, breaking CI on aarch64-apple-darwin and
aarch64-unknown-linux-gnu (and any other miri target where the f16
hardware feature is detected — including x86_64 with F16C).

Fix: re-add `#[cfg_attr(miri, ignore = "half::f16::from_f32 uses inline
asm (fcvt) unsupported by Miri")]` to every test that builds an f16
fixture via `half::f16::from_f32`. Trade-off: BE-host miri (s390x /
powerpc64) now skips these particular f16 contract tests as well, but
the corresponding f32 LE-encoded regressions
(`gbrpf32_sinker_le_encoded_frame_decodes_correctly`,
`gbrapf32_sinker_le_encoded_frame_decodes_correctly`,
`gbrapf32_strategy_a_plus_le_encoded_frame_alpha_decodes_correctly`,
`gbrapf32_strategy_a_plus_le_encoded_u16_alpha_decodes_correctly`)
remain un-gated and exercise the same byte-swap routing logic on BE
miri hosts.

Tests re-gated:
- src/sinker/mixed/tests/packed_rgb_f16.rs (1):
  - rgbf16_sinker_le_encoded_frame_decodes_correctly
- src/sinker/mixed/tests/planar_gbr_float.rs (6):
  - gbrpf16_sinker_le_encoded_frame_decodes_correctly
  - gbrapf16_sinker_le_encoded_frame_decodes_correctly
  - gbrpf16_sinker_widen_path_le_encoded_frame_decodes_correctly
  - gbrapf16_sinker_widen_path_le_encoded_frame_decodes_correctly
  - gbrpf16_sinker_widen_path_u16_and_u8_le_encoded_frame_decodes_correctly
  - gbrapf16_strategy_a_plus_post_widen_alpha_decodes_correctly

Bits-only constructions (`half::f16::from_bits(_)`) are left ungated:
they don't involve inline asm and remain miri-safe.

Verified:
- cargo build --target aarch64-apple-darwin --tests (clean)
- cargo build --target x86_64-apple-darwin --tests (clean)
- RUSTFLAGS="-C target-feature=+simd128" cargo build \
    --target wasm32-unknown-unknown --tests (clean)
- cargo build --no-default-features (clean)
- cargo fmt --check (clean)
- cargo clippy --all-targets --all-features -- -D warnings (clean)
- cargo test --target aarch64-apple-darwin --lib \
    sinker::mixed::tests::packed_rgb_f16 (9 passed)
- cargo test --target aarch64-apple-darwin --lib \
    sinker::mixed::tests::planar_gbr_float (26 passed)
- cargo miri test --lib sinker::mixed::tests::packed_rgb_f16 \
    (1 ignored as expected, no inline-asm error)
- cargo miri test --lib sinker::mixed::tests::planar_gbr_float \
    (6 f16 tests ignored, 4 f32 LE-encoded regressions still passing)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
uqio added a commit that referenced this pull request May 8, 2026
…compat wrappers

Codex 2nd-pass review of PR #87 surfaced two [high] findings:

1. Frame docs contradicted the LE-encoded plane contract.

   `src/frame/packed_rgb_16bit.rs` told big-endian callers to pre-normalize
   each `u16` via `u16::from_le` before constructing Rgb48/Bgr48/Rgba64/
   Bgra64 frames. After PR #92 (`5b42065` / `3b1d716`) the documented
   contract is the opposite: the plane is the **LE-encoded byte layout**
   reinterpreted as `&[u16]` (matching FFmpeg's `*LE` suffix), and the
   downstream row kernel applies `u16::from_le` itself (no-op on LE host,
   byte-swap on BE host). A BE caller following the old docs would pre-swap
   → kernel swaps again → double swap → corrupted output on every BE host.

   Doc-comments on `Rgb48Frame` / `Bgr48Frame` / `Rgba64Frame` / `Bgra64Frame`
   now state the LE-encoded byte contract explicitly, mirroring the wording
   PR #92 used for `Rgbf32Frame` / `Gbrpf32Frame` / `Gbrapf32Frame` etc.
   The module-level header is updated to match.

   Adds an Rgb48 sinker BE-contract regression test
   (`rgb48_sinker_le_encoded_frame_decodes_correctly`) following the
   `rgbf32_sinker_le_encoded_frame_decodes_correctly` pattern from PR #92:
   builds the plane from LE-encoded `u16` patterns (`intended.to_le()`),
   forces `with_simd(false)` so it runs purely scalar, and asserts the
   `with_rgb_u16` identity-passthrough output bit-equals `intended`. On a BE
   host with a regressed pre-swap this would byte-swap every sample. The
   test runs under miri, which is exactly where BE CI surfaces.

2. Public row APIs broke backwards compatibility.

   PR #87 added `<const BE: bool>` to 28 public functions in
   `src/row/dispatch/packed_rgb_16bit.rs` (Rgb48/Bgr48/Rgba64/Bgra64 ×
   {rgb, rgba, rgb_u16, rgba_u16, luma, luma_u16, hsv}) and 6 public
   functions in `src/row/dispatch/rgb_ops.rs` (X2Rgb10/X2Bgr10 ×
   {rgb, rgba, rgb_u16}). All 34 functions are re-exported from
   `crate::row::*` (16-bit ones are explicit, x2 ones via `rgb_ops::*`
   glob). Existing downstream LE-only callers cannot compile against the
   new const-generic signature without adding `::<false>` turbofish to
   every call site — a hard breaking change.

   Each function is renamed to `foo_endian<const BE: bool>` and a thin
   non-generic LE-only wrapper `foo` is added that forwards to
   `foo_endian::<false>`. Existing pre-Tier 8 call sites compile unchanged;
   sinker code is updated to call the explicit `_endian::<BE>` form so
   endian intent is visible at every internal callsite.

   Audit / fan-out:
   - 28 packed-16-bit dispatchers renamed + 28 LE wrappers added
   - 6 X2Rgb10/X2Bgr10 dispatchers renamed + 6 LE wrappers added
   - `src/row/mod.rs` re-exports both `foo` and `foo_endian` for the 16
     pub `→{rgb,rgba,rgb_u16,rgba_u16}` entries and pub(crate) for the 12
     `→{luma,luma_u16,hsv}` entries
   - 12 internal turbofish call sites updated to `_endian::<…>` across
     `src/sinker/mixed/packed_rgb_16bit.rs`,
     `src/sinker/mixed/packed_rgb_10bit.rs`, and the in-file dispatcher
     unit tests in `src/row/dispatch/packed_rgb_16bit.rs`

The wrappers carry `#[allow(clippy::too_many_arguments)]` only when the
underlying `_endian` form already does (luma/luma_u16/hsv variants);
this matches the pre-existing pattern on the BE-aware definitions and is
not a new suppression.

Verified:
  cargo test --target aarch64-apple-darwin --lib
  cargo build --target x86_64-apple-darwin --tests   # 0 warnings
  RUSTFLAGS="-C target-feature=+simd128" cargo build --target wasm32-unknown-unknown --tests
  cargo build --no-default-features
  cargo fmt --check
  cargo clippy --all-targets --all-features -- -D warnings
  cargo check --target s390x-unknown-linux-gnu --lib   # BE-host smoke

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants