feat(be-tier4): BE support for Y210/Y212/Y216/V210 row kernels#88
Open
feat(be-tier4): BE support for Y210/Y212/Y216/V210 row kernels#88
Conversation
Adds `<const BE: bool>` const-generic gating to all four Tier 4 packed YUV 4:2:2 formats across scalar, NEON, x86 (SSE4.1/AVX2/AVX512), and WASM SIMD128 backends, plus dispatch functions and sinker call sites. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex flagged a high-severity scalar BE bug in tier 10b: the
inline `if BE { x.swap_bytes() } else { x }` pattern is wrong
on big-endian hosts because `swap_bytes()` is unconditional —
it swaps even when the data already matches the host's byte
order. The matching SIMD `load_endian_*::<BE>` helpers from
PR #81 are target-endian aware (cfg-gated reverses; no-op when
source order matches host order), so the buggy scalar paths
diverge on s390x, corrupting both BE-input and LE-input rows
when run through scalar tails or the (always-scalar) luma
kernels.
Audit of tier 4 scalar code confirms tier 4 was implemented
from the start using the helper functions `load_endian_u16::<BE>`
and `load_endian_u32::<BE>` declared in `src/row/scalar/mod.rs`,
which build a fresh `[u8; N]` from the source pointer and decode
via `u16::from_be_bytes` / `u16::from_le_bytes` (and the u32
pair). Those byte-array decoders are target-endian aware: each
is a no-op when the data byte order matches the host CPU and a
hardware byte-swap when they differ — the same semantics as
`u16::from_be` / `u16::from_le` and the SIMD `load_endian_*`
helpers. No `if BE { x.swap_bytes() } else { x }` pattern exists
in tier 4 production scalar code (`src/row/scalar/{v210,y2xx,y216}.rs`),
so no scalar production fix is needed for s390x correctness.
To prevent a future regression that introduces the buggy
pattern (a real risk now that the codex finding is on file
across tier 5/8/10b/10-float/11), this commit upgrades the
doc-comments on `load_endian_u16<BE>` and `load_endian_u32<BE>`
to:
- Spell out the **target-endian aware** contract (no swap on
matching host order, swap on differing order).
- Cite the codex finding and reference the tier 10b fix commit
message for the full motivation.
- Mark the inline-`swap_bytes` pattern as the "naive alternative"
that the helpers exist specifically to avoid.
Test helpers `to_be_u16` (`src/row/scalar/y2xx.rs`,
`src/row/scalar/y216.rs`) and `pack_v210_word_be`
(`src/row/scalar/v210.rs`) are intentionally left unchanged —
they synthesize BE-encoded fixtures from LE inputs on the
LE-host CI, mirroring the tier 5/8/10b convention; a future
phase 3 s390x QEMU run will revisit them.
Verified:
- cargo test --target aarch64-apple-darwin --lib (2171 passed, 0 failed)
- cargo build --target x86_64-apple-darwin --tests (clean, 0 warnings)
- RUSTFLAGS='-C target-feature=+simd128' cargo build --target wasm32-unknown-unknown --tests
- cargo build --no-default-features
- cargo fmt --check
- cargo clippy --all-targets --all-features -- -D warnings
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Phase 2 — Tier 4 BE rollout. Adds
<const BE: bool>to all Y210 / Y212 / Y216 / V210 row kernels across all 6 backends + dispatcher.Implementation:
load_endian_u16<BE>/load_endian_u32<BE>helpers insrc/row/scalar/mod.rs— target-endian aware (u16::from_be_bytes/u32::from_le_bytessemantics, matching the SIMDload_endian_*::<BE>helpers from feat(be-infra): endian-aware SIMD loaders across 5 backends #81). No naiveif BE { swap_bytes() }pattern (which would corrupt rows on s390x).load_endian_u16xN::<BE>/load_endian_u32xN::<BE>from feat(be-infra): endian-aware SIMD loaders across 5 backends #81.if !BE { /* SIMD */ } else { /* scalar fallback */ }for backends where the SIMD deinterleave path is too complex to easily make endian-aware (Y210/Y212/Y216 chroma deinterleave).BE-host scalar fix note
Codex flagged a high-severity scalar BE bug in tier 10b (
fix(be-tier10b): make scalar BE conversion target-endian aware,26e5077): the inlineif BE { x.swap_bytes() } else { x }pattern is unconditional — it swaps bytes on every load whenBE = true, regardless of host endianness, so on s390x both BE-input and LE-input rows are corrupted. The codex-recommended fix isif BE { u16::from_be(x) } else { u16::from_le(x) }, which is target-endian aware (no-op when source byte order matches the host).Tier 4 audit finding: the buggy
if BE { swap_bytes() }pattern does not exist in tier 4 production scalar code. Tier 4 was implemented from the start using the helper functionsload_endian_u16<BE>/load_endian_u32<BE>declared insrc/row/scalar/mod.rs, which decode raw bytes viau16::from_be_bytes/u16::from_le_bytes(and the u32 pair). Those byte-array decoders have the same target-endian-aware semantics asu16::from_be/u16::from_le, so tier 4 scalar paths agree with their SIMD counterparts on both LE and BE hosts.The follow-up commit
fix(be-tier4): make scalar BE conversion target-endian aware(2835895) records this audit finding by upgrading the doc-comments onload_endian_u16<BE>/load_endian_u32<BE>to spell out the target-endian-aware contract and cite the codex motivation, preventing a future regression that introduces the buggy inline pattern.Tests
scalar_v210_be_*,scalar_y210_be_*,y216_be_*— verify(LE-encoded buffer + BE = false) ≡ (BE-encoded buffer + BE = true).<BE>const generic.Test plan
cargo test --target aarch64-apple-darwin --lib(2171 passed)cargo build --target x86_64-apple-darwin --tests(0 warnings)cargo build --target wasm32-unknown-unknown --tests(0 warnings;-C target-feature=+simd128)cargo build --no-default-featurescargo fmt --checkcargo clippy --all-targets --all-features -- -D warnings🤖 Generated with Claude Code