stream 70-30 benchmark: use --test-time 120 for stable variance#374
stream 70-30 benchmark: use --test-time 120 for stable variance#374paulorsousa wants to merge 3 commits into
Conversation
Replace -n 1000 with --test-time 120 to ensure steady-state measurement. With -n 1000 (200K total requests at ~127K ops/s), the test finishes in ~1-2 seconds, making it extremely sensitive to startup jitter — observed 99.6% std.dev across runs. The 1k-consumers variant of this same test already uses --test-time 120 and achieves 0.1% variance.
🛡️ Jit Security Scan Results✅ No security findings were detected in this PR
Security scan by Jit
|
Two issues prevented these benchmarks from exercising the streamEntryIsReferenced() PEL scan path: 1. Init only read 2000 entries per group (COUNT 2000) — changed to COUNT 100000 to read all pre-loaded entries, fully populating PELs and advancing min_cgroup_last_id. 2. No XREADGROUP during benchmark — min_cgroup_last_id never advanced past init, so streamEntryIsReferenced() short-circuited at the min_cgroup_last_id check before reaching the PEL scan. Added 20% XREADGROUP to the command mix (50% XADD, 20% XREADGROUP, 30% XTRIM) so consumer groups keep consuming and advancing min_cgroup_last_id. Without these fixes, XTRIM entries were always > min_cgroup_last_id and took the early return path — the O(C) PEL scan across 50 groups was never reached.
| @@ -138,7 +139,8 @@ clientconfig: | |||
| tool: memtier_benchmark | |||
| arguments: > | |||
| --data-size 100 | |||
| --command="XADD stream-key * field __data__" --command-key-pattern="P" --command-ratio=70 | |||
| --command="XADD stream-key * field __data__" --command-key-pattern="P" --command-ratio=50 | |||
| --command="XREADGROUP GROUP cg-01 consumer1 COUNT 10 STREAMS stream-key >" --command-key-pattern="P" --command-ratio=20 | |||
There was a problem hiding this comment.
XREADGROUP on one group won't advance min_cgroup_last_id
Medium Severity
The benchmark XREADGROUP only targets cg-01, but the description claims it "advances min_cgroup_last_id." Since min_cgroup_last_id is the minimum across all 50 groups, it stays pinned at the initial position because the other 49 groups never consume new entries. Once the preloaded entries are trimmed away (very quickly for MAXLEN ~ 1000), newly-added entries have IDs above min_cgroup_last_id, causing the exact short-circuit the benchmark is designed to avoid. The benchmark then spends most of its 120 seconds not stressing the intended O(C) PEL-scan path.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 6031c9b. Configure here.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 5996773. Configure here.
| arguments: > | ||
| --data-size 100 | ||
| --command="XADD stream-key * field __data__" --command-key-pattern="P" --command-ratio=50 | ||
| --command="XREADGROUP GROUP cg-01 consumer1 COUNT 10 STREAMS stream-key >" --command-key-pattern="P" --command-ratio=20 |
There was a problem hiding this comment.
XREADGROUP reads only one of fifty consumer groups
Medium Severity
The benchmark's XREADGROUP command only reads from cg-01, but 50 consumer groups exist. The description claims this "advances min_cgroup_last_id," but since min_cgroup_last_id is the minimum of all groups' last_delivered_id, advancing only one group while the other 49 remain fixed doesn't advance the minimum. After the initial preloaded entries are trimmed (first seconds), new entries from XADD have IDs above min_cgroup_last_id and cause streamEntryIsReferenced() to short-circuit — meaning the O(C) PEL-scan code path the benchmark claims to stress is not exercised during the steady-state portion of the 120-second test.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 5996773. Configure here.


Replace
-n 1000with--test-time 120to ensure steady-state measurement.With
-n 1000(200K total requests at ~127K ops/s), the test finishes in ~1-2 seconds, making it extremely sensitive to startup jitter — observed 99.6% std.dev across runs on ARM.The
1k-consumersvariant of this same test already uses--test-time 120and achieves 0.1% variance.Change
-n 1000→--test-time 120(120-second steady-state run)Note
Low Risk
Changes are limited to benchmark YAML specifications (no runtime code changes), with primary impact being longer and more stressful benchmark runs that may affect CI/resource usage.
Overview
Adds two new stream GC-focused memtier benchmark specs (
memtier_benchmark-stream-50-cgroups-xadd-xreadgroup-xtrim-{gc,aggressive-gc}.yml) that fully populate 50 consumer-group PELs and include a 50/20/30 mix ofXADD/XREADGROUP/XTRIMto force GC compaction under active consumption.Replaces the previous
stream-50-cgroupsxadd+xtrim-only suites (which only preloaded 2k PEL entries) and updatesmemtier_benchmark-stream-concurrent-xadd-xreadgroup-70-30.ymlto run for--test-time 120(instead of-n 1000) with refreshed description for steadier measurements.Reviewed by Cursor Bugbot for commit 5996773. Bugbot is set up for automated code reviews on this repo. Configure here.