sycl: scalar SWAR byte-subtract in Q6_K MMVQ dot product by aicss-genai · Pull Request #22156 · ggml-org/llama.cpp

aicss-genai · 2026-04-20T07:09:16Z

Overview

Authors

Small perf cleanup for Q6_K MMVQ on Intel GPUs.

The Q6_K MMVQ dot product previously called dpct::vectorized_binary<sycl::char4>(..., dpct::sub_sat()) to compute a byte-wise saturated subtract in a packed int; on Intel targets this lowers to a scalarized sequence. This PR replaces it with a 4-byte SWAR trick, ((a | 0x80808080u) - b) ^ 0x80808080u. The trick is safe here because every input byte lies in a known range (6-bit quants 0..63, minus 32), so no borrow crosses byte lanes and saturation never triggers. It also replaces the QR6_K-iterated loop with a straight-line two-lane scalar dot (two dp4a calls).

Applied in all three Q6_K MMVQ sites: the standalone vec_dot_q6_K_q8_1_impl_mmvq, the reorder_vec_dot_q_sycl<Q6_K> method and operator, and the top-level vec_dot_q6_K_q8_1.

Results are bitwise equivalent on all inputs produced by quantization, so accuracy is unchanged. Measurable token-generation (tg) improvement on BMG (Battlemage) for Q6_K-heavy models.

Additional information

Split from #22066 per reviewer request for independent review.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: Yes. This work was partially produced with an agentic engineering approach: agents surface issues and explore experiments while engineers identify and reject candidates using domain knowledge. Human feedback involved.

Signed-off-by: Chun Tao <chun.tao@intel.com>

ggml-gh-bot · 2026-04-20T07:13:20Z

Hi @aicss-genai, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

Multiple open PRs from a new contributor: We limit new contributors (those without a previously merged PR) to 1 open PR at a time. You currently have 8 open PRs.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

ctao456 and others added 2 commits April 19, 2026 23:39

sycl: scalar SWAR byte-subtract in Q6_K MMVQ dot product

4f611ae

Signed-off-by: Chun Tao <chun.tao@intel.com>

Merge branch 'ggml-org:master' into aicss-genai/sycl-bmg-upstream-pr-7

8a94a86

aicss-genai requested a review from a team as a code owner April 20, 2026 07:09

github-actions bot added the ggml and SYCL labels Apr 20, 2026

aicss-genai wants to merge 2 commits into ggml-org:master from aicss-genai:aicss-genai/sycl-bmg-upstream-pr-7