feat(be-tier8): BE support for Rgb48/Bgr48/Rgba64/Bgra64/X2Rgb10/X2Bgr10 row kernels #87

Open
uqio wants to merge 2 commits into main from feat/be-tier8

uqio (Collaborator) commented May 7, 2026

Summary

Phase 2 — Tier 8 BE rollout. Adds <const BE: bool> to all Rgb48 / Bgr48 / Rgba64 / Bgra64 / X2Rgb10 / X2Bgr10 row kernels across all 6 backends + dispatcher.

Implementation:

  • Scalar: per-element swap_bytes() for u16 reads (Rgb48/Bgr48/Rgba64/Bgra64) and u32 reads (X2Rgb10/X2Bgr10)
  • All 5 SIMD backends use load_endian_u16xN::<BE> / load_endian_u32xN::<BE> from the merged BE infra (feat(be-infra): endian-aware SIMD loaders across 5 backends #81)
  • Sinker call sites hardcode <false> (sinker BE plumbing deferred to Phase 4)

Test results: 2159 tests pass. cargo build --all-features, clippy, fmt all clean.

Test plan

  • cargo test --target aarch64-apple-darwin --all-features
  • cargo build --target x86_64-apple-darwin --tests
  • cargo clippy --all-features (no warnings)
  • s390x QEMU (Phase 3)

🤖 Generated with Claude Code

…r10 row kernels

Add <const BE: bool> to all 6 packed-RGB-16bit and 10-bit format row
kernels (dispatchers, scalars, all 5 arch backends, sinkers) so
big-endian pixel sources can decode each format without a separate
code path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@uqio uqio force-pushed the feat/be-tier8 branch from 1ec5b6c to 2ed30cd Compare May 7, 2026 12:45
The Tier 8 scalar BE-load helpers used `if BE { x.swap_bytes() } else { x }`,
which swaps whenever the source is big-endian, regardless of host endianness.
That is correct on little-endian hosts but wrong on big-endian hosts, where a
BE source already matches the host and must not be swapped (and an LE source
must be). The companion SIMD `load_endian_u16x*` / `load_endian_u32x4` helpers
are target-endian aware (`#[cfg(target_endian = ...)]`), so on s390x the scalar
and SIMD paths would disagree: the scalar rows would be corrupted, breaking the
"SIMD matches scalar" parity property the dispatch tests rely on.

Replace the swap-on-BE pattern with the target-endian-aware primitives:

- `if BE { v.swap_bytes() } else { v }` → `if BE { u16::from_be(v) } else { u16::from_le(v) }`
- The fast-path `copy_from_slice` else-branches in `rgb48_to_rgb_u16_row`
  and `rgba64_to_rgba_u16_row` are likewise replaced with a per-element
  `u16::from_le` loop so the LE source path is also correct on BE hosts.

`from_be`/`from_le` are no-ops when the source byte order matches the host
and a `swap_bytes` otherwise, mirroring the SIMD `load_le_*` / `load_be_*`
semantics and keeping the scalar reference correct on every target.
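
A minimal sketch of the difference between the two patterns, for illustration only. The names `scalar_load_u16`, `swap_on_be`, and `rgb48_native_row` are hypothetical and are not the crate's actual helpers; the sketch assumes the kernel receives `u16` samples that were read from memory in host order.

```rust
/// Hypothetical scalar loader (not the crate's real helper).
/// `v` was read from memory in host order; `BE` names the source byte order.
#[inline(always)]
fn scalar_load_u16<const BE: bool>(v: u16) -> u16 {
    // from_be / from_le are no-ops when the source order matches the host
    // and a byte swap otherwise.
    if BE { u16::from_be(v) } else { u16::from_le(v) }
}

/// The replaced pattern, kept only for comparison.
/// It ignores the host: correct on little-endian hosts only.
#[inline(always)]
#[allow(dead_code)]
fn swap_on_be<const BE: bool>(v: u16) -> u16 {
    if BE { v.swap_bytes() } else { v }
}

/// Hypothetical row kernel: convert source-order samples to native order.
/// Processes min(src.len(), dst.len()) elements.
#[allow(dead_code)]
fn rgb48_native_row<const BE: bool>(src: &[u16], dst: &mut [u16]) {
    for (d, &s) in dst.iter_mut().zip(src) {
        *d = scalar_load_u16::<BE>(s);
    }
}
```

On a little-endian host the two loaders agree for both values of `BE`; on a big-endian host they disagree for both, which is the mismatch described above.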

Note: the X2Rgb10/X2Bgr10 (u32) scalar paths in `packed_rgb.rs` already use
`u32::from_be_bytes` / `u32::from_le_bytes` on raw `&[u8]` input, which are
target-endian aware by definition, so no fix is needed there.
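
For illustration, a sketch of why the byte-array constructors need no host check. The function names are hypothetical, and the bit layout `(msb) 2X 10R 10G 10B (lsb)` is an assumption taken from FFmpeg's definition of X2RGB10, not from `packed_rgb.rs`.

```rust
/// Hypothetical helper: assemble one 32-bit word from four raw source bytes.
/// from_be_bytes / from_le_bytes define the result purely in terms of byte
/// position, so the value is the same on every host.
#[inline(always)]
fn load_x2rgb10_word<const BE: bool>(px: [u8; 4]) -> u32 {
    if BE { u32::from_be_bytes(px) } else { u32::from_le_bytes(px) }
}

/// Split a word into 10-bit (r, g, b).
/// Assumed layout: (msb) 2 unused bits, 10 R, 10 G, 10 B (lsb).
#[inline(always)]
fn unpack_x2rgb10(w: u32) -> (u16, u16, u16) {
    let r = ((w >> 20) & 0x3FF) as u16;
    let g = ((w >> 10) & 0x3FF) as u16;
    let b = (w & 0x3FF) as u16;
    (r, g, b)
}

/// Hypothetical row kernel: 4 source bytes per pixel in, 3 u16 samples out.
#[allow(dead_code)]
fn x2rgb10_unpack_row<const BE: bool>(src: &[u8], dst: &mut [u16]) {
    for (px, out) in src.chunks_exact(4).zip(dst.chunks_exact_mut(3)) {
        let w = load_x2rgb10_word::<BE>([px[0], px[1], px[2], px[3]]);
        let (r, g, b) = unpack_x2rgb10(w);
        out[0] = r;
        out[1] = g;
        out[2] = b;
    }
}
```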

Test fixtures (`byte_swap_*` / `to_be_bytes` helpers in `tests/`) are
intentionally left untouched — they synthesise BE-encoded byte buffers
from LE inputs and are correct as-is.
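
A sketch of the kind of fixture this refers to, assuming the helpers build byte buffers with `to_be_bytes` / `to_le_bytes`. The function names below are hypothetical and do not correspond to the actual helpers in `tests/`.

```rust
/// Hypothetical fixture helper: encode native u16 samples as big-endian bytes.
/// to_be_bytes fixes the output byte order regardless of host, which is why
/// fixtures built this way need no host-endianness adjustment.
#[allow(dead_code)]
fn encode_be_u16(samples: &[u16]) -> Vec<u8> {
    samples.iter().flat_map(|s| s.to_be_bytes()).collect()
}

/// Hypothetical fixture helper: the little-endian counterpart.
#[allow(dead_code)]
fn encode_le_u16(samples: &[u16]) -> Vec<u8> {
    samples.iter().flat_map(|s| s.to_le_bytes()).collect()
}

/// Reference decoder used to check a round trip.
#[allow(dead_code)]
fn decode_u16<const BE: bool>(bytes: &[u8]) -> Vec<u16> {
    bytes
        .chunks_exact(2)
        .map(|c| {
            let pair = [c[0], c[1]];
            if BE { u16::from_be_bytes(pair) } else { u16::from_le_bytes(pair) }
        })
        .collect()
}
```

Encoding the same samples both ways and decoding each buffer with the matching `BE` value should return the original samples on any host.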

Verified:
- `cargo test --target aarch64-apple-darwin --lib` (2159 tests pass)
- `cargo build --target x86_64-apple-darwin --tests` (0 warnings)
- `RUSTFLAGS="-C target-feature=+simd128" cargo build --target wasm32-unknown-unknown --tests`
- `cargo build --no-default-features`
- `cargo fmt --check`
- `cargo clippy --all-targets --all-features -- -D warnings`

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>