Add BITOP XOR coverage for short-tail (50B values × 10 source keys, pipeline 10)#403

Merged
fcostaoliveira merged 1 commit into
redis:mainfrom
fcostaoliveira:add-bitop-xor-10keys-50B-tail-coverage
Jun 10, 2026
Conversation

@fcostaoliveira fcostaoliveira commented Jun 7, 2026

Collaborator

Why

redis/redis#15289 reports a -5.41% RPS / +5.93% avg-latency regression on BITOP XOR (and -3.42% / +3.77% on BITOP OR) between 391e3452 and 8dfb823 (PR #13898 — Implement DIFF, DIFF1, ANDOR and ONE for BITOP), specifically when source values are short enough to leave a non-32B-multiple scalar tail (50B is the headline length: one AVX2 block + 18B tail).

The currently registered BITOP spec — memtier_benchmark-2keys-bitmap-100M-bits-bitop-and — exercises the long-buffer SIMD path (12.5MB src1/src2), which is exactly the path where 8dfb823 is a win. It cannot reproduce the short-tail loss.
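The length arithmetic behind the two specs can be checked with a minimal sketch. It assumes a plain 32-byte stride, as described above; the helper name `split_simd_tail` is hypothetical, and the real loop in Redis may unroll or dispatch differently.

```python
AVX2_BLOCK_BYTES = 32  # one 256-bit AVX2 register, per the description above


def split_simd_tail(length_bytes: int, block_bytes: int = AVX2_BLOCK_BYTES) -> tuple[int, int]:
    """Return (full_blocks, tail_bytes) for a buffer of the given length."""
    return divmod(length_bytes, block_bytes)


short_tail = split_simd_tail(50)                 # value length in the new spec
long_buffer = split_simd_tail(100_000_000 // 8)  # 100M bits = 12.5 MB per source key

print(short_tail)   # (1, 18)
print(long_buffer)  # (390625, 0)
```

Under this assumption the 12.5 MB sources divide evenly into 32-byte blocks, so the existing spec never enters a scalar tail at all, while the 50-byte values spend 18 of their 50 bytes there.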

What

This PR adds a new spec, memtier_benchmark-10keys-bitmap-50B-bitop-xor-pipeline-10, that mirrors the reporter's repro:

| field | value |
| --- | --- |
| preload | 10 keys (key:1..key:10), each holding a 50-byte string (32B AVX2 block + 18B scalar tail) |
| benchmark | BITOP XOR tmp key:1 key:2 ... key:10, pipeline 10, 1 client, 1 thread, test-time 180s |
| build-variants | bookworm gcc 15.2.0 amd64 + arm64 + dockerhub |
| tested-groups | bitmap |
| priority | 67 (regression-coverage spec, not high-priority release gate) |
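As a rough illustration of the benchmark row, the sketch below assembles the client invocation from the values listed above. It is an assumption about how the workload maps onto memtier_benchmark flags, not the argument string from the merged YAML; the preload step is omitted, and `--hide-histogram` is added only to keep output short.

```python
import shlex

NUM_KEYS = 10  # key:1..key:10, as in the preload row

source_keys = [f"key:{i}" for i in range(1, NUM_KEYS + 1)]
bitop_command = "BITOP XOR tmp " + " ".join(source_keys)

# Flag values mirror the benchmark row; the exact flag set is an assumption.
memtier_args = [
    "memtier_benchmark",
    f"--command={bitop_command}",
    "--pipeline=10",
    "-c", "1",
    "-t", "1",
    "--test-time=180",
    "--hide-histogram",
]

# Print a shell-safe command line that can be copied into a terminal.
print(shlex.join(memtier_args))
```

Every request writes to the same destination key `tmp`, so the keyspace stays at the 10 preloaded keys plus one result key for the whole run.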

The XOR variant carries the strongest signal in the issue's table (-5.41% RPS, p ≈ 7e-9); a follow-up OR variant can be added if useful, but XOR alone is enough to bisect/track this regression on the AVX2 fleet.

Notes


Note

Low Risk
Benchmark registry YAML only; no Redis runtime or production code changes.

Overview
Registers a new memtier benchmark spec memtier_benchmark-10keys-bitmap-50B-bitop-xor-pipeline-10 so CI can bisect and track the BITOP XOR short-tail regression from redis/redis#15289 (50-byte values → one AVX2 block plus 18-byte scalar tail), which the existing large-bitmap BITOP AND spec does not hit.

The spec preloads 10 string keys with 50B values, then runs BITOP XOR across all ten with pipeline 10 (180s, 1 client/thread), on oss-standalone with bookworm gcc amd64/arm64 and dockerhub build variants, at priority 67 as regression coverage rather than a release gate.

Reviewed by Cursor Bugbot for commit 6fd4c58.

…B values, pipeline 10)

Mirrors the redis/redis#15289 repro workload (-5.4% RPS XOR / -3.4% OR after
391e3452 -> 8dfb823) so regressions on the AVX2 short-tail path can be
tracked/bisected on the fleet.

Spec layout:
- Preload 10 string keys (key:1..key:10), each 50 bytes (32-byte AVX2 block +
  18-byte scalar tail — the same length class that triggers the regression).
- Bench `BITOP XOR tmp key:1 ... key:10` with pipeline 10, 1 client, 1 thread,
  test-time 180s.
- Build variants: bookworm gcc 15.2.0 amd64 + arm64 + dockerhub.
- Priority 67 (regression coverage, not high-priority release-gate).

This pairs with the existing 2keys-bitmap-100M-bits-bitop-and spec, which
covers the large-buffer SIMD path; the short-tail variant covers the call /
vzeroupper overhead specific to the AVX2 dispatch in bitopCommand.
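
The block-plus-tail split that this workload exercises can be modeled in pure Python. This is only a sketch of the control flow, assuming a 32-byte stride and equal-length sources; `bitop_xor_model` is a hypothetical name, and it says nothing about the actual cost of the dispatch in bitopCommand.

```python
import os
from functools import reduce

BLOCK = 32  # assumed AVX2 stride, per the commit message


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def bitop_xor_model(values: list[bytes]) -> tuple[bytes, int, int]:
    """XOR equal-length values; return (result, block_bytes, tail_bytes)."""
    length = len(values[0])
    if any(len(v) != length for v in values):
        raise ValueError("this model only handles equal-length sources")
    blocked = length - length % BLOCK
    out = bytearray(length)
    # "Vector" phase: whole 32-byte blocks, one block across all sources.
    for off in range(0, blocked, BLOCK):
        out[off:off + BLOCK] = reduce(xor_bytes, (v[off:off + BLOCK] for v in values))
    # Scalar tail: the remaining bytes, one byte at a time.
    for i in range(blocked, length):
        acc = 0
        for v in values:
            acc ^= v[i]
        out[i] = acc
    return bytes(out), blocked, length - blocked


values = [os.urandom(50) for _ in range(10)]  # 10 sources x 50 bytes
result, block_bytes, tail_bytes = bitop_xor_model(values)
print(block_bytes, tail_bytes)  # 32 18
```

With only one block per call, any fixed per-call cost of entering the vector path is amortized over 32 bytes rather than megabytes, which is the regime this spec is meant to cover.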
@fcostaoliveira fcostaoliveira merged commit 266eeab into redis:main Jun 10, 2026
9 checks passed