From ee105e8c0bb96bdf25d2d7ba497cbc1a82e761dd Mon Sep 17 00:00:00 2001
From: Nik Samokhvalov <nik@Niks-MacBook-Pro.local>
Date: Sat, 18 Apr 2026 06:50:42 -0700
Subject: [PATCH 1/6] docs: complete bench methodology + tooling + ops gotchas
 under benchmark/
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds a strictly-additive benchmark/ directory documenting the
methodology, tooling, and operational lessons from the
pgque-vs-pgq-vs-pgmq-vs-river-vs-que-vs-pgboss-vs-pgmq-partitioned bench
that backs #61 and PR #62.

- README.md: entry point + quick-start
- METHODOLOGY.md: methodology fix per review feedback
- OPS_GOTCHAS.md: 15 operational lessons (NEW — NVMe mount, partman stale
  rows, que func leftovers, pgboss covering index, pgq ticker, pgque xid8
  bug, spot reclaim, ASH prereqs, NOTICE instrumentation, etc.)
- HARDWARE.md: i4i.2xlarge specs, PG tuning, microbench baselines
- tooling/, runners/, consumers/, producers/, install/, charts/, gifs/

No pgque production SQL is touched.
Refs: #61, #62.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 benchmark/HARDWARE.md                         |  71 ++
 benchmark/METHODOLOGY.md                      | 206 ++++++
 benchmark/OPS_GOTCHAS.md                      | 259 ++++++++
 benchmark/README.md                           |  97 +++
 benchmark/charts/r5_analyze.py                | 137 ++++
 benchmark/charts/r6_smoke_chart.py            | 123 ++++
 benchmark/consumers/consumer_pgboss.sql       |  18 +
 .../consumers/consumer_pgmq-partitioned.sql   |  14 +
 benchmark/consumers/consumer_pgmq.sql         |  15 +
 benchmark/consumers/consumer_pgq.sql          |  14 +
 benchmark/consumers/consumer_pgque.sql        |  16 +
 benchmark/consumers/consumer_que.sql          |  18 +
 benchmark/consumers/consumer_river.sql        |  18 +
 benchmark/gifs/r4_gif_tps_solarized.py        | 618 ++++++++++++++++++
 benchmark/gifs/r4_gif_v17_solarized.py        | 610 +++++++++++++++++
 benchmark/install/README.md                   |  19 +
 benchmark/install/bootstrap.sh                |  95 +++
 benchmark/install/install_pgboss.sh           |  70 ++
 benchmark/install/install_pgmq.sh             |  11 +
 benchmark/install/install_pgq.sh              |  30 +
 benchmark/install/install_river.sh            |  80 +++
 .../install/pgmq-partitioned_setup_5min.sql   |   6 +
 benchmark/producers/producer_pgboss.sql       |   2 +
 .../producers/producer_pgmq-partitioned.sql   |   1 +
 benchmark/producers/producer_pgmq.sql         |   1 +
 benchmark/producers/producer_pgq.sql          |   1 +
 benchmark/producers/producer_pgque.sql        |   1 +
 benchmark/producers/producer_que.sql          |   4 +
 benchmark/producers/producer_river.sql        |   2 +
 benchmark/runners/clean_reinstall.sh          |  76 +++
 benchmark/runners/fix_nvme_mount.sh           |  65 ++
 benchmark/runners/run_r7.sh                   | 115 ++++
 benchmark/tooling/bloat_sampler.py            |  86 +++
 benchmark/tooling/idle_in_tx.py               |  30 +
 benchmark/tooling/microbench.sh               |  33 +
 benchmark/tooling/parse_events_consumed.py    | 169 +++++
 .../tooling/pg_stat_statements_snapshot.py    |  72 ++
 benchmark/tooling/pgq_ticker_daemon.py        |  42 ++
 benchmark/tooling/sys_metrics_sampler.py      | 121 ++++
 39 files changed, 3366 insertions(+)
 create mode 100644 benchmark/HARDWARE.md
 create mode 100644 benchmark/METHODOLOGY.md
 create mode 100644 benchmark/OPS_GOTCHAS.md
 create mode 100644 benchmark/README.md
 create mode 100644 benchmark/charts/r5_analyze.py
 create mode 100644 benchmark/charts/r6_smoke_chart.py
 create mode 100644 benchmark/consumers/consumer_pgboss.sql
 create mode 100644 benchmark/consumers/consumer_pgmq-partitioned.sql
 create mode 100644 benchmark/consumers/consumer_pgmq.sql
 create mode 100644 benchmark/consumers/consumer_pgq.sql
 create mode 100644 benchmark/consumers/consumer_pgque.sql
 create mode 100644 benchmark/consumers/consumer_que.sql
 create mode 100644 benchmark/consumers/consumer_river.sql
 create mode 100644 benchmark/gifs/r4_gif_tps_solarized.py
 create mode 100644 benchmark/gifs/r4_gif_v17_solarized.py
 create mode 100644 benchmark/install/README.md
 create mode 100644 benchmark/install/bootstrap.sh
 create mode 100644 benchmark/install/install_pgboss.sh
 create mode 100644 benchmark/install/install_pgmq.sh
 create mode 100644 benchmark/install/install_pgq.sh
 create mode 100644 benchmark/install/install_river.sh
 create mode 100644 benchmark/install/pgmq-partitioned_setup_5min.sql
 create mode 100644 benchmark/producers/producer_pgboss.sql
 create mode 100644 benchmark/producers/producer_pgmq-partitioned.sql
 create mode 100644 benchmark/producers/producer_pgmq.sql
 create mode 100644 benchmark/producers/producer_pgq.sql
 create mode 100644 benchmark/producers/producer_pgque.sql
 create mode 100644 benchmark/producers/producer_que.sql
 create mode 100644 benchmark/producers/producer_river.sql
 create mode 100644 benchmark/runners/clean_reinstall.sh
 create mode 100644 benchmark/runners/fix_nvme_mount.sh
 create mode 100644 benchmark/runners/run_r7.sh
 create mode 100644 benchmark/tooling/bloat_sampler.py
 create mode 100644 benchmark/tooling/idle_in_tx.py
 create mode 100644 benchmark/tooling/microbench.sh
 create mode 100755 benchmark/tooling/parse_events_consumed.py
 create mode 100755 benchmark/tooling/pg_stat_statements_snapshot.py
 create mode 100644 benchmark/tooling/pgq_ticker_daemon.py
 create mode 100644 benchmark/tooling/sys_metrics_sampler.py

diff --git a/benchmark/HARDWARE.md b/benchmark/HARDWARE.md
new file mode 100644
index 00000000..e171b937
--- /dev/null
+++ b/benchmark/HARDWARE.md
@@ -0,0 +1,71 @@
+# Hardware — VM sizing and microbench baselines
+
+## VM: AWS i4i.2xlarge (us-east-2)
+
+| Spec | Value |
+|---|---|
+| vCPU | 8 (Intel Ice Lake Xeon 8375C, 2.9 GHz base / 3.5 GHz turbo) |
+| RAM | 64 GB |
+| NVMe instance store | 1 × 1.75 TB (physical attach, NVMe) |
+| Network | Up to 12 Gbps |
+| EBS (root) | 8 GB gp3 (Ubuntu 24.04 AMI default) |
+| Spot price (us-east-2, 2026-04) | ~$0.20–0.30 / hour |
+| On-demand price | $0.686 / hour |
+
+See [OPS_GOTCHAS.md §1](OPS_GOTCHAS.md) — the NVMe instance store is **not** auto-mounted on Ubuntu 24.04 boot.
+
+## Expected microbench baselines
+
+Run via [tooling/microbench.sh](tooling/microbench.sh). Expected order-of-magnitude numbers:
+
+| Probe | Expected |
+|---|---|
+| sysbench cpu (1-thread events/sec) | ~25 k |
+| sysbench memory bandwidth (1-thread) | ~15 GiB/sec |
+| fio 4 k randwrite, QD=32, direct=1, on NVMe | ~300 k IOPS |
+| fio 4 k randwrite, bandwidth | ~1.2 GiB/sec |
+| fio 4 k randwrite, 99 p latency | ~100 µs |
+| fio 4 k randread, QD=32, on NVMe | ~400 k IOPS |
+
+*Actual R7 microbench numbers to be filled in from the R7 microbench pass.*
+
+## Postgres tuning (shared across all 7 VMs)
+
+Applied by each VM's bootstrap (see `install/install_*.sh`):
+
+```
+shared_preload_libraries = 'pg_stat_statements,pg_cron'
+cron.database_name = 'bench'
+
+shared_buffers = 4GB
+effective_cache_size = 12GB
+
+synchronous_commit = off
+wal_level = minimal
+wal_compression = lz4
+max_wal_size = 16GB
+checkpoint_completion_target = 0.9
+
+bgwriter_delay = 50ms
+bgwriter_lru_maxpages = 400
+bgwriter_lru_multiplier = 4.0
+
+random_page_cost = 1.1
+effective_io_concurrency = 200
+max_connections = 200
+
+max_wal_senders = 0
+
+autovacuum_vacuum_scale_factor  = 0.01
+autovacuum_analyze_scale_factor = 0.01
+autovacuum_vacuum_cost_delay    = 2ms
+
+jit = off
+listen_addresses = 'localhost'
+```
+
+`synchronous_commit=off` is deliberate — queue workloads are almost always idempotent at the application layer, and the WAL-flush path is the dominant cost for low-latency producers. It's the only PG knob we touch that materially changes safety posture.
+
+`jit=off` because 5-s JIT warmups on `DO` blocks dominated our first-transaction latency in R4.
+
+`autovacuum_*_scale_factor=0.01` is aggressive on purpose — we want autovacuum attempting to clean every 1 % dead-tuple ratio so the held-xmin phase of the bench exposes the *inability* to vacuum, not a lazy autovacuum schedule.
diff --git a/benchmark/METHODOLOGY.md b/benchmark/METHODOLOGY.md
new file mode 100644
index 00000000..a61898a4
--- /dev/null
+++ b/benchmark/METHODOLOGY.md
@@ -0,0 +1,206 @@
+# Bench methodology — the definitive reference
+
+Adapted from GitLab issue [postgres-ai/postgresql-consulting/tests-and-benchmarks#77, note 3263767264](https://gitlab.com/postgres-ai/postgresql-consulting/tests-and-benchmarks/-/issues/77#note_3263767264).
+
+This document is the single source of truth for how the PostgreSQL-queue bench is structured, what it measures, and every script/config that makes it work. It is written so a reviewer can reproduce the whole thing end-to-end.
+
+Cross-links:
+
+- Upstream fix: [pgque PR #62](https://github.com/NikolayS/pgque/pull/62) · issue [NikolayS/pgque#61](https://github.com/NikolayS/pgque/issues/61)
+- GitLab rounds: R4 final narrative — note 3262968456; R5 root — note 3263000815; R5 full writeup — note 3263438287; pgque rotation PoC under held xmin — note 3263417609.
+
+---
+
+## 1. Goals and pathology
+
+We reproduce Brandur Leach's "Postgres queues" death spiral (see [Brandur's 2015 post](https://brandur.org/postgres-queues)): a DELETE-based work-queue becomes unvacuumably bloated the moment an unrelated backend holds `xmin` in the past (long transactions, logical-replication slots, stuck standby feedback). Dead tuples accumulate; bitmap / index scans must traverse them; latency explodes; throughput collapses even after the holder releases.
+
+Seven systems, each on its own VM:
+
+| System | Version | Pattern |
+|---|---|---|
+| **pgque** | patched via PR #62 | batch ticker + rotating event tables (TRUNCATE) |
+| pgq | v3.5.1 PL-only | same model as pgque; upstream baseline |
+| pgmq | v1.11.0 | single queue table, VISIBILITY + DELETE |
+| pgmq-partitioned | v1.11.0 + pg_partman | range-partitioned queue table |
+| river | v0.34 | SQL-level SKIP LOCKED + DELETE (consumer emulated) |
+| que | v2.4 | SKIP LOCKED + DELETE (consumer emulated) |
+| pg-boss | v12.15 | SKIP LOCKED + DELETE on partitioned `pgboss.job` |
+
+The Go/Ruby/Node workers are installed end-to-end so the schema is authentic, but the actual consumer load is driven via pgbench running each system's *SQL claim pattern*. This isolates the DB-side behaviour from runtime/GC artefacts.
+
+Workload shape (all runs):
+
+- Producer: 1 client, pgbench `-R 1000` rate-cap (may move to `-R 5000` in R7/R8)
+- Consumer: 1 client for pgque/pgq, 4 clients for everything else
+- Three 30-minute phases back-to-back = **1.5 h per run**:
+  1. **Clean baseline** — no held xmin
+  2. **Held xmin** — `idle_in_tx.py` holds `REPEATABLE READ` open
+  3. **Clean recovery** — holder killed, observe regrowth / catchup
+
+---
+
+## 2. Infrastructure
+
+- AWS us-east-2, **i4i.2xlarge** (8 vCPU, 64 GB, NVMe instance store)
+- Spot where available; on-demand only as last resort (see Section 10)
+- Ubuntu 24.04, **PG18 from PGDG**, `pg_cron`, `pg_stat_statements`, pg_ash, pgfr
+- Data dir moved to NVMe: `/mnt/pgdata/postgresql/18/main` symlink (see [runners/fix_nvme_mount.sh](runners/fix_nvme_mount.sh) and [OPS_GOTCHAS.md §1](OPS_GOTCHAS.md))
+- One VM per system so any tuning / runtime / GC behaviour is contained
+- SSH key: `<your-ssh-key>` (us-east-2)
+
+Live R7 VMs (redacted): `<pgque-ip>`, `<pgq-ip>`, `<pgmq-ip>`, `<river-ip>`, `<pgboss-ip>`, `<que-ip>`, `<pgmq-partitioned-ip>`.
+
+---
+
+## 3. Observability stack
+
+Seven parallel streams run during every bench. All CSVs land in `/tmp/bench/` and are rsynced to the local `/tmp/bench_r<N>/<system>/` tree every 30 min.
+
+### (a) bloat_sampler.py — pg_stat_user_tables every 30 s
+
+Per-system filter covers each queue's event/metadata tables (including pg-boss partitions and pgmq-partitioned's partman partitions). Writes `bloat.csv`.
+
+Source: [tooling/bloat_sampler.py](tooling/bloat_sampler.py).
+
+### (b) pg_ash — 1 Hz wait-event sampling
+
+pg_cron runs ash sampling at 1 Hz throughout; at end of bench we export:
+
+```sql
+COPY (SELECT sample_time, database_name, active_backends, wait_event, query_id
+      FROM ash.samples(p_interval => '2 hour'::interval, p_limit => 2000000))
+TO '/tmp/bench/ash.csv' CSV HEADER;
+```
+
+### (c) pgfr (pg-flight-recorder) — snapshot observability
+
+pgfr writes snapshots on a pg_cron schedule into `pgfr_record.*`. At end of run we export:
+
+```sql
+COPY (SELECT * FROM pgfr_record.snapshots)           TO '/tmp/bench/pgfr_snapshots.csv'           CSV HEADER;
+COPY (SELECT * FROM pgfr_record.table_snapshots)     TO '/tmp/bench/pgfr_table_snapshots.csv'     CSV HEADER;
+COPY (SELECT * FROM pgfr_record.statement_snapshots) TO '/tmp/bench/pgfr_statement_snapshots.csv' CSV HEADER;
+```
+
+Full pgfr_record schema on every VM includes: `snapshots`/`snapshots_v2` (partitioned), `table_snapshots(_v2)`, `statement_snapshots(_v2)`, `index_snapshots(_v2)`, `replication_snapshots(_v2)`, `vacuum_progress_snapshots(_v2)`, `activity_samples(_archive_v2)`, `lock_samples(_archive_v2)`, `config_snapshots`, `db_role_config_snapshots`.
+
+### (d) sys_metrics_sampler.py — CPU / mem / disk every 10 s (R7+)
+
+Reads `/proc/stat`, `/proc/meminfo`, `/proc/diskstats` directly (psutil-optional). NVMe device is `nvme1n1` (the instance store, which is what `/mnt/pgdata` sits on). v2 adds per-device IOPS and latency columns.
+
+Source: [tooling/sys_metrics_sampler.py](tooling/sys_metrics_sampler.py).
+
+### (e) pg_stat_statements_snapshot.py — pgss time-series every 10 s
+
+Polls pgss with a regex filter over our queue-related query shapes, diffs consecutive snapshots downstream. Used as a cross-check for NOTICE-based ev/s (important for pgque/pgq/pgmq where DO-block wrappers hide per-statement rows).
+
+Source: [tooling/pg_stat_statements_snapshot.py](tooling/pg_stat_statements_snapshot.py).
+
+### (f) pgbench `--aggregate-interval=10 --log`
+
+Both producer and consumer run with per-10 s aggregate logs (min/max/sum/sumsq latency). Files: `producer_agg.<pid>` and `consumer_agg.<pid>.<worker>` under `/tmp/bench/`.
+
+### (g) Consumer NOTICE instrumentation (R6+)
+
+Each instrumented `consumer.sql` is wrapped in a DO block that emits exactly one `RAISE NOTICE 'ev ts=<epoch_s> n=<events>'` per call. This gives us an authoritative per-call consumed-events stream that is immune to the pgss DO-wrapper opacity problem.
+
+The seven instrumented consumers are in [consumers/](consumers/). The parser for `consumer.log` NOTICE lines is [tooling/parse_events_consumed.py](tooling/parse_events_consumed.py).
+
+### (h) idle_in_tx.py — the death-spiral inducer
+
+Opens a `REPEATABLE READ` transaction, pins xmin, sleeps forever. Kill with SIGTERM to release the horizon. Source: [tooling/idle_in_tx.py](tooling/idle_in_tx.py).
+
+### (i) pgq_ticker_daemon.py — tight ticker loop for pgq
+
+pgq upstream has no built-in ticker daemon (the C one was always external). pgque runs inline; for a fair comparison we run a tight Python loop on the pgq VM that calls `pgq.ticker()` at 1 Hz and `pgq.maint_operations()` every 5 s. Source: [tooling/pgq_ticker_daemon.py](tooling/pgq_ticker_daemon.py).
+
+---
+
+## 4. Bench runner
+
+Orchestrates the phase scheduler. Forks producer, consumer (both pgbench), bloat_sampler, sys_metrics_sampler, pgss snapshotter; at t=1800 s opens `idle_in_tx`; runs `VACUUM VERBOSE` at phase boundaries; at t=3600 s kills the holder; another `VACUUM VERBOSE`; at t=5400 s harvests ash/pgfr/pgss.
+
+Source: [runners/run_r7.sh](runners/run_r7.sh).
+
+---
+
+## 5. Clean-slate reset
+
+Before every run we kill stragglers, DROP the system's schema/extensions, unschedule lingering `pg_cron` jobs, re-run `install.sh`, and reset all stats. The companion `full_reset.sql` calls `pg_stat_statements_reset()`, `pg_stat_reset()`, `pg_stat_reset_shared()` for bgwriter/checkpointer/wal/io, `TRUNCATE ash.sample`, and `DELETE FROM cron.job_run_details`.
+
+Source: [runners/clean_reinstall.sh](runners/clean_reinstall.sh). See also [OPS_GOTCHAS.md §4, §5, §7](OPS_GOTCHAS.md) for adjacent-schema pitfalls.
+
+---
+
+## 6. Per-system install scripts
+
+Each VM has its own `/tmp/install.sh` (mirrored into [install/](install/)). Pattern differs by system:
+
+- **pgque** — `git clone` the PR #62 branch, `make USE_PGXS=1 install`, run the SQL installer, `SELECT pgque.create_queue('bench_queue')`, schedule `pgque.ticker()` via `pg_cron` (also called inline in `consumer.sql`). Driven from AMI user-data; see [install/README.md](install/README.md) and [install/bootstrap.sh](install/bootstrap.sh).
+- **pgq** — PGDG `postgresql-18-pgq3` or `git clone --branch v3.5.1`, then `CREATE EXTENSION pgq` and immediately apply `switch_plonly.sql` to replace the C `insert_event_raw` with PL/pgSQL (pg_proc lang check `= 'plpgsql'` as a gate). See [install/install_pgq.sh](install/install_pgq.sh).
+- **pgmq** — PGDG `postgresql-18-pgmq` + `CREATE EXTENSION pgmq` + `pgmq.create('bench_queue')`. See [install/install_pgmq.sh](install/install_pgmq.sh).
+- **pgmq-partitioned** — pgmq + `pg_partman` + `pgmq.create_partitioned('bench_queue', 'id', '10000')`. Reuses [install/install_pgmq.sh](install/install_pgmq.sh) plus [install/pgmq-partitioned_setup_5min.sql](install/pgmq-partitioned_setup_5min.sql) for the partman schedule.
+- **river** — `go install github.com/riverqueue/river/cmd/river@v0.34.x` and `river migrate-up` (sets schema), then the consumer SQL emulates the SKIP-LOCKED claim pattern. See [install/install_river.sh](install/install_river.sh).
+- **que** — Ruby + `gem install que -v 2.4.x`, `bundle exec que:install` creates `que_jobs`; consumer SQL emulates the Ruby Que SELECT/DELETE. Driven from AMI user-data — see [install/README.md](install/README.md) and [OPS_GOTCHAS.md §7](OPS_GOTCHAS.md).
+- **pg-boss** — local `npm install pg-boss@12.15`, run `new PgBoss(DSN).start()` once to migrate schema; consumer SQL emulates the `SKIP LOCKED` claim on `pgboss.job`. See [install/install_pgboss.sh](install/install_pgboss.sh).
+
+---
+
+## 7. Per-system consumer.sql patterns
+
+All seven instrumented consumers are in [consumers/](consumers/):
+
+- [consumer_pgque.sql](consumers/consumer_pgque.sql) — ticker + `next_batch` + `get_batch_events` + `finish_batch`
+- [consumer_pgq.sql](consumers/consumer_pgq.sql) — identical, schema swap pgque → pgq
+- [consumer_pgmq.sql](consumers/consumer_pgmq.sql) — `pgmq.read(50)` + `pgmq.delete`
+- [consumer_pgmq-partitioned.sql](consumers/consumer_pgmq-partitioned.sql) — identical SQL to pgmq (schema hides partitioning)
+- [consumer_river.sql](consumers/consumer_river.sql) — SKIP LOCKED + DELETE on `river_job`
+- [consumer_que.sql](consumers/consumer_que.sql) — SKIP LOCKED + DELETE on `que_jobs`
+- [consumer_pgboss.sql](consumers/consumer_pgboss.sql) — SKIP LOCKED + DELETE on `pgboss.job`
+
+All producers are in [producers/](producers/).
+
+---
+
+## 8. Analysis and chart generation
+
+- [charts/r5_analyze.py](charts/r5_analyze.py) — R5 full 2-panel chart (dead tuples + consumer latency, linear scale, no symlog — per [MEMORY rule](../CLAUDE.md) "Never use log/symlog on charts").
+- [charts/r6_smoke_chart.py](charts/r6_smoke_chart.py) — R6 smoke Solarized-Dark 2-panel chart (events/s + pgque per-table dead tuples).
+- [gifs/r4_gif_v17_solarized.py](gifs/r4_gif_v17_solarized.py) — R4 dead-tuples animated GIF (7 systems, Solarized-Dark).
+- [gifs/r4_gif_tps_solarized.py](gifs/r4_gif_tps_solarized.py) — R4 TPS/latency animated GIF.
+
+---
+
+## 9. GitLab posting style
+
+- Threads open under `### header`; replies stay inside the same discussion.
+- Every `<details>` block has **one blank line** between the opening tag and the triple-backtick fence, and **one blank line** between the closing fence and `</details>` — and another blank line after `</details>`. This is what makes them render correctly on GitLab.
+- Every significant result embeds `![alt](/uploads/…)` — charts are first-class citizens. GIFs too.
+- Post bodies via `curl --data-urlencode "body@/tmp/path"` to the discussions endpoint — new thread, not a reply under an existing one.
+
+---
+
+## 10. Cost discipline
+
+- us-east-2 spot preferred (~$0.22/h); when spot is exhausted we hop to another region before going on-demand.
+- At R7 time 2 VMs are on-demand (pgmq-partitioned and pgboss; pgque+que moved on-demand after R6 spot reclaim).
+- **~$15 per 1.5 h bench round** (7 VMs).
+- Total spend through R7 is ~$100.
+- Rsync every 30 min pulls `/tmp/bench/` to local, so a spot reclaim loses at most a partial phase.
+
+See [OPS_GOTCHAS.md §12](OPS_GOTCHAS.md) for spot-reclaim mitigations.
+
+---
+
+## Reproducing from scratch
+
+1. Launch 7 × i4i.2xlarge in us-east-2 (spot first) with `bench_userdata.sh` (symlinks `/mnt/pgdata`, installs PGDG PG18, pg_cron, pg_ash, pgfr, pg_stat_statements). See [runners/fix_nvme_mount.sh](runners/fix_nvme_mount.sh) for the NVMe side if the userdata path breaks.
+2. `scp` per-system `install.sh`, `producer.sql`, instrumented `consumer.sql`, plus the samplers in [tooling/](tooling/).
+3. `bash clean_reinstall.sh <sys>` on each VM.
+4. `bash run_r7.sh <sys>` — 1.5 h, writes `/tmp/bench/*.csv` + `*.log`.
+5. Rsync everything back to `/tmp/bench_r7/<sys>/`.
+6. Run [charts/r5_analyze.py](charts/r5_analyze.py) (or R7 successor) for the verdict table + 2-panel PNG.
+7. Post results as a threaded reply with the `<details>`/blank-line style above.
+
+That is the whole rig.
diff --git a/benchmark/OPS_GOTCHAS.md b/benchmark/OPS_GOTCHAS.md
new file mode 100644
index 00000000..6c78d190
--- /dev/null
+++ b/benchmark/OPS_GOTCHAS.md
@@ -0,0 +1,259 @@
+# Operational gotchas
+
+Every operational lesson accumulated while running the pgque-vs-pgq-vs-pgmq-vs-river-vs-que-vs-pgboss-vs-pgmq-partitioned bench on AWS i4i.2xlarge. Read this before reproducing.
+
+---
+
+## 1. AWS i4i.2xlarge: NVMe instance store is NOT auto-mounted
+
+The Ubuntu 24.04 AMI boots with `/dev/nvme1n1` **present but not formatted, not mounted**. You MUST format and mount it BEFORE installing Postgres data dir, or the data goes to the 8 GB root EBS volume and the bench dies the moment you fill the page cache.
+
+```bash
+sudo mkfs.xfs /dev/nvme1n1
+sudo mkdir -p /mnt/pgdata
+sudo mount -o noatime,nodiratime /dev/nvme1n1 /mnt/pgdata
+sudo chown postgres:postgres /mnt/pgdata
+```
+
+If PG is installed first (default path `/var/lib/postgresql/18/main`), data goes to the 8 GB root EBS → disaster under bench load.
+
+**Symlink pattern** after moving data:
+
+```bash
+sudo systemctl stop postgresql@18-main
+sudo mv /var/lib/postgresql/18/main /mnt/pgdata/postgresql/18/main
+sudo ln -s /mnt/pgdata/postgresql/18/main /var/lib/postgresql/18/main
+sudo systemctl start postgresql@18-main
+```
+
+See [runners/fix_nvme_mount.sh](runners/fix_nvme_mount.sh) for the recovery procedure when this happens post-hoc.
+
+---
+
+## 2. `/tmp/bench/` observability dir — must be on NVMe
+
+Default `/tmp` is on the root disk. pgbench `--log-prefix=/tmp/bench/producer_agg` + `consumer.log` with 17 k NOTICE/s + `bloat.csv` + `sys_metrics.csv` all write there.
+
+At bench rates this is 1–5 MB/s of root-disk I/O. Not catastrophic but adds noise you can see in the sys_metrics disk panel of the non-subject VM.
+
+**Fix:**
+
+```bash
+sudo mkdir -p /mnt/pgdata/bench && sudo chown postgres:postgres /mnt/pgdata/bench
+ln -s /mnt/pgdata/bench /tmp/bench
+```
+
+Now observability output lands on NVMe alongside PG data.
+
+---
+
+## 3. `logging_collector=off` means server logs go to systemd journal
+
+With `logging_collector=off` (our default), server logs go to the systemd journal, not a file on disk. Means zero PG log I/O during bench — good for the throughput numbers.
+
+If investigating slow queries later, enable `log_statement='all'` + `logging_collector=on` with `log_directory` inside the NVMe data dir (relative path `'log'` resolves correctly since `data_dir` is symlinked to NVMe).
+
+---
+
+## 4. Clean-slate reset without losing adjacent schemas
+
+`DROP SCHEMA pgque CASCADE` can cascade into objects it doesn't own if a function somewhere references pgque. R6 lost the `ash` schema this way on the pgque VM.
+
+**Safer pattern for pgque:** `TRUNCATE` all `pgque.*` tables instead of drop-and-recreate. If a reinstall is actually needed:
+
+```bash
+sudo -u postgres psql -d bench -c "DROP SCHEMA pgque CASCADE"
+sudo -u postgres psql -d bench -f /tmp/pgque.sql
+sudo -u postgres psql -d bench -f /tmp/pg_ash/sql/ash-install.sql
+sudo -u postgres psql -d bench -f /tmp/pgfr/_record/sql/install.sql
+sudo -u postgres psql -d bench -f /tmp/pgfr/_analyze/sql/install.sql
+sudo -u postgres psql -d bench -f /tmp/pgfr/_control/sql/install.sql
+```
+
+Re-run the ash/pgfr installers even if you think they weren't touched.
+
+**Verify after reset:**
+
+```sql
+SELECT nspname FROM pg_namespace
+ WHERE nspname IN ('ash','pgfr_record','pgfr_analyze','pgfr_control','pgque');
+```
+
+All five must appear.
+
+---
+
+## 5. pg_partman + pgmq-partitioned: stale `part_config` rows
+
+After `DROP SCHEMA pgmq CASCADE`, `public.part_config` retains rows pointing to non-existent tables. `run_maintenance_proc()` then throws:
+
+```
+ERROR: Given parent table not found in system catalogs: pgmq.q_bench_queue
+```
+
+**Fix** before reinstalling pgmq:
+
+```sql
+DELETE FROM public.part_config WHERE parent_table LIKE 'pgmq.%';
+```
+
+pg_partman is installed in schema `public` (not `partman` as the docs sometimes suggest); `part_config` is `public.part_config`.
+
+---
+
+## 6. pg_partman premake tuning
+
+Default `premake=4` works for bench (4 future 5-min partitions = 20 min buffer, enough for 1.5 h bench given 1-min maintenance cron cadence).
+
+R6 tried `premake=20` = 24 partitions steady-state, which killed pgmq-partitioned consumer perf (525 TPS vs 6621 with premake=4) due to planner cost across all partitions per query.
+
+`infinite_time_partitions=true` is needed for the maintenance job to keep creating future partitions indefinitely:
+
+```sql
+UPDATE public.part_config
+   SET premake=4, infinite_time_partitions=true
+ WHERE parent_table='pgmq.q_bench_queue';
+```
+
+---
+
+## 7. que (Ruby) reinstall: residual `que_validate_tags` function
+
+`DROP TABLE que_jobs, que_lockers, que_values CASCADE` doesn't drop associated functions (they're in `public` schema).
+
+Re-running `que.migrate!` fails:
+
+```
+PG::DuplicateFunction: function que_validate_tags already exists with same argument types
+```
+
+**Fix** — explicitly drop all que_* functions before migrate:
+
+```sql
+DO $$ DECLARE r record; BEGIN
+  FOR r IN SELECT proname FROM pg_proc WHERE proname LIKE 'que\_%' OR proname='que_validate_tags'
+  LOOP EXECUTE format('DROP FUNCTION IF EXISTS public.%I CASCADE', r.proname); END LOOP;
+END $$;
+```
+
+---
+
+## 8. river: no clean_reinstall.sh on the VM
+
+river's `install.sh` was simpler than the others — it installs the Go binary + creates `river_job` via `river migrate up`.
+
+Reset = `TRUNCATE river_job` + stats reset. No schema-drop needed. If you really want a nuke-and-pave, `DROP TABLE river_job, river_leader, river_migration, river_client, river_client_queue, river_queue CASCADE` then re-run migrate.
+
+---
+
+## 9. pgboss: covering index for bench workload
+
+pg-boss v12.15 DELETE path scans `pgboss.job_common` which partitions by name; the `(name, state, start_after)` index isn't covering under bench-heavy DELETE.
+
+**Add a covering index:**
+
+```sql
+CREATE INDEX bench_covering
+ON pgboss.job_common (name, state, start_after, created_on, id)
+WHERE state < 'active';
+```
+
+This lines up the consumer's claim path and turns the DELETE into an index-only prefilter.
+
+---
+
+## 10. pgq: no built-in ticker in the v3.5.1 PL-only fork
+
+Upstream pgq relies on the C ticker binary. The PL-only fork we use for fair comparison doesn't ship one.
+
+**Workaround:** tight Python loop calling `pgq.ticker('queue_name')` every 1 s and `pgq.maint_operations()` every 5 s — see [tooling/pgq_ticker_daemon.py](tooling/pgq_ticker_daemon.py).
+
+Runs on the pgq VM only. Killed at bench end via `pkill -f pgq_ticker_daemon`.
+
+---
+
+## 11. pgque PG18 xid8 cast bug (pre-PR #62)
+
+`batch_event_sql` function casts `xid8` to `bigint`, which fails on PG18's stricter type rules:
+
+```
+ERROR: cannot cast type xid8 to bigint
+```
+
+**Fix:** `::text::bigint` as intermediate cast. Landed in [PR #62](https://github.com/NikolayS/pgque/pull/62).
+
+If you see this on a VM that wasn't freshly installed from the PR #62 branch, re-run [install/install_pgque.sh](install/install_pgque.sh).
+
+---
+
+## 12. Spot-instance termination risk
+
+i4i.2xlarge spot price is ~$0.20–0.30/h vs on-demand $0.686/h. The us-east-2 spot market is reliable for short bench runs but DOES reclaim.
+
+- R4 lost `pgmq-partitioned` spot mid-run.
+- R6 lost `pgque` AND `que` spots between runs (hours-long windows).
+
+**Mitigation:** on-demand for the primary subject under test (pgque for PR #62 work). Cost delta: ~$3–4 per 7-VM 1.5 h run. Worth it for the subject you're landing a PR on; leave the comparison set on spot.
+
+Rsync every 30 min (see Section 10 of [METHODOLOGY.md](METHODOLOGY.md)) so a reclaim loses at most a partial phase, not a whole run.
+
+---
+
+## 13. ASH + pgfr install prerequisites
+
+Install order: `pg_cron` → `pg_stat_statements` → pg_ash → pgfr (`_record` / `_analyze` / `_control`).
+
+```
+ALTER SYSTEM SET shared_preload_libraries='pg_cron,pg_stat_statements';
+```
+
+…then **restart** BEFORE installing ash.
+
+ASH uses `pg_cron.schedule('ash_sampler','1 seconds', ...)` — requires `pg_cron.database_name` to match the bench DB:
+
+```
+cron.database_name = 'bench'
+```
+
+If you see `ash.samples()` returning zero rows with no obvious error, the cron job is almost certainly scheduled against the wrong database.
+
+---
+
+## 14. pgbench aggregate-interval logging
+
+`--aggregate-interval=10 --log --log-prefix=/tmp/bench/producer_agg` writes per-10 s buckets, one line per bucket per client.
+
+Low observer effect: one `fwrite` per 10 s per client, not per transaction.
+
+Captures `latency min/max/sum/sumsq` → enough to reconstruct mean + stddev + approximate min/max band per window. This is how we plot "consumer latency over time" without running a separate sampler.
+
+Log file naming: `producer_agg.<pid>` for `-c 1 -j 1`; `consumer_agg.<pid>.<worker>` for `-c N -j N` with N>1.
+
+---
+
+## 15. NOTICE-based event-count instrumentation
+
+We use `RAISE NOTICE 'ev ts=% n=%'` inside DO blocks inside the consumer SQL.
+
+At 17 k calls/sec: ~40 % CPU overhead on the **consumer client** (the pgbench process parsing NOTICE lines into stdout/log), but no observed server-side cost. Safe for bench purposes, NOT for production.
+
+pgbench faithfully prints NOTICE lines to stdout (captured as `consumer.log`). Parser: [tooling/parse_events_consumed.py](tooling/parse_events_consumed.py). Output: `events_consumed_per_sec.csv` + `events_consumed_summary.txt`.
+
+Why NOTICE rather than pgss: DO-wrappers hide per-statement rows from `pg_stat_statements`, so the pgss queryid reported for a consumer is the DO wrapper as a whole — not the SELECT/DELETE inside it — which makes it useless for counting consumed events when the internal shape differs across systems. NOTICE is schema-neutral and authoritative.
+
+---
+
+## Summary checklist
+
+Before a run:
+
+- [ ] `/dev/nvme1n1` formatted + mounted + PG data dir symlinked
+- [ ] `/tmp/bench` symlinked to `/mnt/pgdata/bench`
+- [ ] `shared_preload_libraries='pg_cron,pg_stat_statements'` + restart
+- [ ] `cron.database_name='bench'`
+- [ ] pg_ash, pgfr (`_record`, `_analyze`, `_control`) installed
+- [ ] per-system install script re-run after any schema drop
+- [ ] for pgmq-partitioned: stale `public.part_config` rows deleted, `premake=4`
+- [ ] for que: que_* functions dropped before migrate
+- [ ] for pgq: ticker daemon started (and remembered to be killed at end)
+- [ ] rsync cron scheduled on the local workstation (not the VM)
diff --git a/benchmark/README.md b/benchmark/README.md
new file mode 100644
index 00000000..0ef65520
--- /dev/null
+++ b/benchmark/README.md
@@ -0,0 +1,97 @@
+# pgque benchmark harness
+
+The whole bench rig: how we compared pgque (PR #62) against pgq, pgmq, pgmq-partitioned, river, que, and pg-boss under held-xmin pathology conditions on AWS i4i.2xlarge.
+
+Backs [NikolayS/pgque#61](https://github.com/NikolayS/pgque/issues/61) (the held-xmin bloat issue) and [PR #62](https://github.com/NikolayS/pgque/pull/62) (the subscription/tick rotation fix).
+
+## Start here
+
+- **[METHODOLOGY.md](METHODOLOGY.md)** — full bench methodology, workload shape, observability stack, phase scheduler.
+- **[OPS_GOTCHAS.md](OPS_GOTCHAS.md)** — every operational lesson from running this on AWS (NVMe mount, pg_partman stale rows, que function leftovers, pgboss index, etc.).
+- **[HARDWARE.md](HARDWARE.md)** — i4i.2xlarge specs, PG tuning, expected microbench baselines.
+
+## Credit
+
+The pathology this bench exercises is the one Brandur Leach documented in [Postgres Queues (2015)](https://brandur.org/postgres-queues) — DELETE-based queues become unvacuumably bloated under held xmin, and the DB-level death spiral that follows. The bench is a controlled reproduction of that scenario across seven queue systems.
+
+## Quick-start
+
+```bash
+# on the VM, after bootstrap:
+bash install/install_<sys>.sh            # system-specific install
+bash runners/run_r7.sh <sys>             # orchestrator; 1.5h per system
+
+# outputs (all under /tmp/bench/):
+#   producer_agg.*                       pgbench producer aggregate log
+#   consumer_agg.*                       pgbench consumer aggregate log
+#   consumer.log                         NOTICE-instrumented per-call event counts
+#   bloat.csv                            pg_stat_user_tables every 30 s
+#   sys_metrics.csv                      CPU/mem/disk every 10 s
+#   pgss_timeseries.csv                  pg_stat_statements every 10 s
+#   ash.csv                              pg_ash 1 Hz wait-event samples (harvested at end)
+#   pgfr_snapshots.csv                   pg-flight-recorder snapshots (harvested at end)
+#   pgfr_table_snapshots.csv             pgfr table-level snapshots
+#   pgfr_statement_snapshots.csv         pgfr per-statement snapshots
+#   events_consumed_per_sec.csv          parsed from consumer.log
+```
+
+Example instrumented consumer: [consumers/consumer_pgque.sql](consumers/consumer_pgque.sql).
+
+ASH and pgfr output (`ash.csv`, `pgfr_snapshots.csv`) land in `/tmp/bench/` post-run when the harvest step of `run_r7.sh` completes.
+
+## Directory layout
+
+```
+benchmark/
+  README.md                          # this file
+  METHODOLOGY.md                     # full methodology (from GitLab #77)
+  OPS_GOTCHAS.md                     # operational lessons
+  HARDWARE.md                        # VM + PG tuning
+  tooling/
+    bloat_sampler.py                 # pg_stat_user_tables sampler
+    sys_metrics_sampler.py           # /proc/{stat,meminfo,diskstats} sampler (v2)
+    pg_stat_statements_snapshot.py   # pgss time-series
+    parse_events_consumed.py         # NOTICE log → events_consumed_per_sec.csv
+    idle_in_tx.py                    # REPEATABLE READ xmin holder (the inducer)
+    pgq_ticker_daemon.py             # tight ticker loop for pgq (no built-in daemon)
+    microbench.sh                    # sysbench + fio baseline
+  runners/
+    run_r7.sh                        # phase-scheduled orchestrator
+    clean_reinstall.sh               # reset between runs
+    fix_nvme_mount.sh                # recover from NVMe-not-mounted boot
+  consumers/
+    consumer_pgque.sql               # instrumented (DO + NOTICE)
+    consumer_pgq.sql
+    consumer_pgmq.sql
+    consumer_pgmq-partitioned.sql
+    consumer_river.sql
+    consumer_que.sql
+    consumer_pgboss.sql
+  producers/
+    producer_pgque.sql               # pgque.send()
+    producer_pgq.sql                 # pgq.insert_event
+    producer_pgmq.sql                # pgmq.send
+    producer_river.sql               # INSERT INTO river_job
+    producer_que.sql                 # INSERT INTO que_jobs
+    producer_pgboss.sql              # INSERT INTO pgboss.job
+    producer_pgmq-partitioned.sql    # same as producer_pgmq
+  install/
+    README.md                        # which system uses which installer
+    bootstrap.sh                     # shared PG18 + pg_cron + ash + pgfr base
+    install_pgq.sh
+    install_pgmq.sh                  # also used by pgmq-partitioned
+    pgmq-partitioned_setup_5min.sql  # partman cron schedule
+    install_river.sh
+    install_pgboss.sh
+                                     # note: pgque + que driven from AMI userdata
+  charts/
+    r5_analyze.py                    # R5 2-panel chart
+    r6_smoke_chart.py                # R6 smoke Solarized-Dark chart
+  gifs/
+    r4_gif_v17_solarized.py          # dead-tuples animated GIF
+    r4_gif_tps_solarized.py          # TPS/latency animated GIF
+```
+
+## Scope
+
+Only a harness. Produces no test coverage for pgque itself — that lives under `tests/` in the main repo. This directory adds nothing to the pgque production SQL; it is pure ops tooling + docs.
diff --git a/benchmark/charts/r5_analyze.py b/benchmark/charts/r5_analyze.py
new file mode 100644
index 00000000..7a5b76aa
--- /dev/null
+++ b/benchmark/charts/r5_analyze.py
@@ -0,0 +1,137 @@
+#!/usr/bin/env python3
+"""R5 full — verdict table + 2-panel chart (dead tuples + consumer latency)."""
+import csv, re
+from pathlib import Path
+from datetime import datetime
+import matplotlib
+matplotlib.use('Agg')
+import matplotlib.pyplot as plt
+import numpy as np
+from matplotlib.ticker import FuncFormatter
+
+ROOT = Path("/tmp/bench_r5")
+SYSTEMS = ["pgque", "pgq", "pgmq", "pgmq-partitioned", "river", "que", "pgboss"]
+COLORS = {
+    "pgque": "#1F4E79", "pgq": "#6B9BD1", "pgmq": "#C0392B",
+    "pgmq-partitioned": "#D98880", "river": "#E67E22",
+    "que": "#8E44AD", "pgboss": "#16A085",
+}
+EVENT_RE = {
+    "pgque":            re.compile(r"^pgque\.event_\d+_\d+$"),
+    "pgq":              re.compile(r"^pgq\.event_\d+_\d+$"),
+    "pgmq":             re.compile(r"^pgmq\.q_bench_queue$"),
+    "pgmq-partitioned": re.compile(r"^pgmq\.q_bench_queue_(p\d+_\d+|default)$"),
+    "river":            re.compile(r"^public\.river_job$"),
+    "que":              re.compile(r"^public\.que_jobs$"),
+    "pgboss":           re.compile(r"^pgboss\.job_common$|^pgboss\.j[a-f0-9]+$"),
+}
+PROG_RE = re.compile(r"^progress:\s*([\d.]+)\s*s,\s*([\d.]+)\s*tps,\s*lat\s*([\d.]+)\s*ms\s*stddev\s*([\d.]+)")
+
+def load_dead(sysn):
+    p = ROOT / sysn / "bloat.csv"
+    if not p.exists(): return np.array([]), np.array([])
+    by_ts = {}
+    with open(p) as f:
+        r = csv.reader(f); next(r)
+        for row in r:
+            if not EVENT_RE[sysn].match(row[1]): continue
+            by_ts.setdefault(row[0], 0); by_ts[row[0]] += int(row[3])
+    if not by_ts: return np.array([]), np.array([])
+    tss = sorted(by_ts)
+    t0 = datetime.fromisoformat(tss[0].replace("Z",""))
+    mins = np.array([(datetime.fromisoformat(t.replace("Z","")) - t0).total_seconds()/60 for t in tss])
+    dead = np.array([by_ts[t] for t in tss], dtype=float)
+    return mins, dead
+
+def load_lat(sysn):
+    p = ROOT / sysn / "consumer.log"
+    if not p.exists(): return np.array([]), np.array([]), np.array([])
+    secs, tps, lats = [], [], []
+    with open(p) as f:
+        for ln in f:
+            m = PROG_RE.match(ln)
+            if m:
+                secs.append(float(m.group(1))/60.0)
+                tps.append(float(m.group(2)))
+                lats.append(float(m.group(3)))
+    return np.array(secs), np.array(tps), np.array(lats)
+
+# verdict table
+print(f"\n{'system':<20} {'peak_dead':>12} {'avg_tps':>10} {'max_lat':>10} {'min_tps_TX':>11}")
+print("-"*80)
+for sysn in SYSTEMS:
+    _, dead = load_dead(sysn)
+    mins_lat, tps, lats = load_lat(sysn)
+    peak = int(dead.max()) if dead.size else 0
+    avg_tps = float(tps.mean()) if tps.size else 0
+    max_lat = float(lats.max()) if lats.size else 0
+    # TX window: 30-60 min (ideal) — actually started ~30 but check phases
+    tx_mask = (mins_lat >= 32) & (mins_lat <= 58)
+    min_tps_tx = float(tps[tx_mask].min()) if tx_mask.any() else 0
+    print(f"{sysn:<20} {peak:>12,} {avg_tps:>10,.0f} {max_lat:>10.1f} {min_tps_tx:>11,.1f}")
+
+# chart
+fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(11, 7.6), dpi=110,
+                                gridspec_kw={'height_ratios':[1.0, 0.58], 'hspace':0.32,
+                                             'top':0.85, 'bottom':0.075, 'left':0.08, 'right':0.94})
+for ax in (ax1, ax2):
+    ax.axvspan(30, 60, color="#fdeeee", alpha=0.9, zorder=0)
+    for b in (30, 60):
+        ax.axvline(x=b, color="#ccc", lw=0.8, zorder=0.5)
+    ax.set_xlim(0, 90)
+    ax.set_xticks([0, 15, 30, 45, 60, 75, 90])
+    ax.set_xticklabels(["0", "15m", "30m", "45m", "1h", "1h15", "1h30"])
+    ax.grid(True, color="#e8e8e8", lw=0.8)
+    ax.set_axisbelow(True)
+    for sp in ("top","right"): ax.spines[sp].set_visible(False)
+    # strict bottom at 0 — no below-axis whitespace
+    ax.set_ymargin(0)
+
+# Phase labels — sit in the whitespace between title and the top of ax1 (not overlapping title)
+for (s, e, lbl, col) in [(0,30,"clean baseline · 30m","#888"),
+                          (30,60,"xmin horizon blocked · 30m","#c0392b"),
+                          (60,90,"clean recovery · 30m","#888")]:
+    ax1.text((s+e)/2, 1.03, lbl, transform=ax1.get_xaxis_transform(),
+             ha="center", color=col, fontsize=10,
+             fontweight="bold" if col=="#c0392b" else "normal")
+
+# Top: dead tuples
+for sysn in SYSTEMS:
+    mins, dead = load_dead(sysn)
+    if mins.size == 0: continue
+    lw = 3.0 if sysn=="pgque" else 2.0
+    ax1.plot(mins, dead, color=COLORS[sysn], lw=lw, label=sysn, zorder=3 if sysn=="pgque" else 2)
+
+def fmt_dead(v,_):
+    v = abs(v)
+    if v>=1e6: return f"{v/1e6:.1f}M"
+    if v>=1e3: return f"{v/1e3:.0f}k"
+    return f"{v:.0f}"
+ax1.yaxis.set_major_formatter(FuncFormatter(fmt_dead))
+ax1.set_ylabel("event-table dead tuples", labelpad=8)
+ax1.legend(loc="upper left", frameon=False, fontsize=9, ncol=2)
+
+# Bottom: latency (clipped)
+LAT_CLIP = 2000
+for sysn in SYSTEMS:
+    mins, _, lats = load_lat(sysn)
+    if mins.size == 0: continue
+    y = np.minimum(lats, LAT_CLIP)
+    lw = 3.0 if sysn=="pgque" else 2.0
+    ax2.plot(mins, y, color=COLORS[sysn], lw=lw, zorder=3 if sysn=="pgque" else 2)
+
+def fmt_lat(v,_):
+    if v>=1000: return f"{v/1000:.1f}s"
+    if v>=10: return f"{v:.0f}ms"
+    if v>=1: return f"{v:.1f}ms"
+    return f"{v:.2f}ms"
+ax2.yaxis.set_major_formatter(FuncFormatter(fmt_lat))
+ax2.set_ylabel("consumer latency (mean)", labelpad=8)
+
+fig.text(0.08, 0.97, "R5 full — 7 systems · 1.5 h · held xmin horizon for 30 min",
+         ha="left", fontsize=13, fontweight="bold")
+fig.text(0.08, 0.948, "producer -R 1000, pgbench --aggregate-interval=10 --log · consumer latency clipped at 2s",
+         ha="left", fontsize=9, color="#666")
+
+fig.savefig("/tmp/r5_full_chart.png", dpi=110, bbox_inches="tight")
+print(f"\nwrote /tmp/r5_full_chart.png ({Path('/tmp/r5_full_chart.png').stat().st_size/1024:.0f} KB)")
diff --git a/benchmark/charts/r6_smoke_chart.py b/benchmark/charts/r6_smoke_chart.py
new file mode 100644
index 00000000..e455cc8a
--- /dev/null
+++ b/benchmark/charts/r6_smoke_chart.py
@@ -0,0 +1,123 @@
+#!/usr/bin/env python3
+"""R6 combined smoke chart — events consumed/s + pgque table dead tuples."""
+import csv, re
+from pathlib import Path
+from datetime import datetime
+import matplotlib
+matplotlib.use('Agg')
+import matplotlib.pyplot as plt
+import numpy as np
+from matplotlib.ticker import FuncFormatter
+
+ROOT = Path("/tmp/bench_r6_combined")
+OUT = Path("/tmp/r6_smoke_chart.png")
+
+# Phase windows for the 20-min smoke (seconds): clean 0-300, xmin 300-900, recovery 900-1200
+PHASE_TX = (300, 900)
+X_MAX_S = 1200
+
+# Load events_consumed_per_sec.csv
+ev_t, ev_rate = [], []
+with open(ROOT / "events_consumed_per_sec.csv") as f:
+    r = csv.reader(f); next(r)
+    for row in r:
+        ev_t.append(int(row[0]))
+        ev_rate.append(int(row[1]))
+ev_t = np.array(ev_t); ev_rate = np.array(ev_rate)
+
+# Load bloat.csv — track key pgque tables over time
+TRACKED = {
+    "event_1_0": "#268bd2",  # blue (line like pgque accent)
+    "event_1_1": "#2aa198",  # cyan
+    "event_1_2": "#859900",  # green
+    "subscription_0": "#cb4b16",  # orange
+    "subscription_1": "#dc322f",  # red
+    "subscription_2": "#d33682",  # magenta
+    "tick_0": "#6c71c4",  # violet
+    "tick_1": "#b58900",  # yellow
+    "tick_2": "#93a1a1",  # base1
+    "meta_rotation": "#586e75",  # base01
+    "queue": "#eee8d5",  # base2
+}
+tbl_series = {t: ([],[],[]) for t in TRACKED}  # (t_seconds, n_live, n_dead)
+t0 = None
+with open(ROOT / "bloat.csv") as f:
+    r = csv.reader(f); next(r)
+    for row in r:
+        relname_full = row[1]
+        relname = relname_full.split(".")[-1]
+        if relname not in TRACKED: continue
+        ts = datetime.fromisoformat(row[0].replace("Z",""))
+        if t0 is None: t0 = ts
+        s = (ts - t0).total_seconds()
+        tbl_series[relname][0].append(s)
+        tbl_series[relname][1].append(int(row[2]))  # n_live_tup
+        tbl_series[relname][2].append(int(row[3]))  # n_dead_tup
+
+# --- Solarized Dark palette ---
+BG = "#002b36"; SURF = "#073642"
+FG = "#839496"; FG_EMPH = "#93a1a1"; FG_DIM = "#586e75"
+ALERT = "#dc322f"
+
+plt.rcParams.update({
+    'figure.facecolor': BG, 'axes.facecolor': BG, 'savefig.facecolor': BG,
+    'text.color': FG, 'axes.labelcolor': FG_EMPH,
+    'xtick.color': FG, 'ytick.color': FG,
+    'axes.edgecolor': FG_DIM,
+    'grid.color': SURF, 'grid.linewidth': 0.8,
+    'font.family': ['Helvetica', 'Arial', 'DejaVu Sans'],
+    'font.size': 10,
+})
+
+fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(11, 6.5), dpi=110,
+    gridspec_kw={'height_ratios':[1.0, 0.75], 'hspace':0.35,
+                 'top':0.91, 'bottom':0.09, 'left':0.09, 'right':0.94})
+
+for ax in (ax1, ax2):
+    ax.axvspan(PHASE_TX[0], PHASE_TX[1], color=SURF, alpha=1.0, zorder=0)
+    for b in PHASE_TX: ax.axvline(x=b, color=FG_DIM, lw=0.8, zorder=0.5)
+    ax.set_xlim(0, X_MAX_S)
+    ax.set_xticks([0, 300, 600, 900, 1200])
+    ax.set_xticklabels(["0", "5m", "10m", "15m", "20m"])
+    ax.grid(True); ax.set_axisbelow(True)
+    for sp in ("top","right"): ax.spines[sp].set_visible(False)
+    for sp in ("left","bottom"): ax.spines[sp].set_color(FG_DIM)
+
+# Phase labels
+for (s, e, lbl, col) in [(0, PHASE_TX[0], "clean baseline · 5m", FG_DIM),
+                          (PHASE_TX[0], PHASE_TX[1], "xmin horizon blocked · 10m", ALERT),
+                          (PHASE_TX[1], X_MAX_S, "clean recovery · 5m", FG_DIM)]:
+    ax1.text((s+e)/2, 1.04, lbl, transform=ax1.get_xaxis_transform(),
+             ha="center", color=col, fontsize=10,
+             fontweight="bold" if col==ALERT else "normal")
+
+# Top: events consumed/sec
+ax1.plot(ev_t, ev_rate, color="#268bd2", lw=1.5, alpha=0.9)
+ax1.axhline(1000, color=FG_DIM, ls="--", lw=0.8, alpha=0.7)
+ax1.text(X_MAX_S*0.98, 1020, "producer rate 1000 ev/s (-R 1000)",
+         color=FG_DIM, ha="right", fontsize=9, fontstyle="italic")
+ax1.set_ylabel("events consumed / s", labelpad=8, color=FG_EMPH)
+ax1.set_ylim(0, max(2200, ev_rate.max()*1.1))
+ax1.yaxis.set_major_formatter(FuncFormatter(lambda v,_: f"{v:.0f}"))
+
+# Bottom: dead tuples per pgque table
+for name, color in TRACKED.items():
+    t_s, nlive, ndead = tbl_series[name]
+    if not t_s: continue
+    ax2.plot(t_s, ndead, color=color, lw=2.0, label=name, alpha=0.95)
+
+ax2.set_ylabel("n_dead_tup per pgque table", labelpad=8, color=FG_EMPH)
+ax2.yaxis.set_major_formatter(FuncFormatter(lambda v,_: f"{v:.0f}"))
+# Legend top-right
+ax2.legend(loc="upper left", frameon=False, fontsize=8, ncol=3, labelcolor=FG)
+# Expand y so zero lines are visible; max is meta_rotation=33
+ax2.set_ylim(0, max(50, max([max(s[2]) for s in tbl_series.values() if s[2]], default=10)*1.15))
+
+# Titles
+fig.text(0.09, 0.965, "R6 combined smoke — pgque PR #62 rotation + events-consumed instrumentation",
+         ha="left", fontsize=12, fontweight="bold", color=FG_EMPH)
+fig.text(0.09, 0.94, "20 min (5m clean + 10m held xmin + 5m recovery). Event-rate: producer was rate-capped at 1000 ev/s.",
+         ha="left", fontsize=9, color=FG_DIM)
+
+fig.savefig(OUT, dpi=110, bbox_inches="tight", facecolor=BG)
+print(f"wrote {OUT}  ({OUT.stat().st_size/1024:.0f} KB)")
diff --git a/benchmark/consumers/consumer_pgboss.sql b/benchmark/consumers/consumer_pgboss.sql
new file mode 100644
index 00000000..514c3b0a
--- /dev/null
+++ b/benchmark/consumers/consumer_pgboss.sql
@@ -0,0 +1,18 @@
+-- R6 instrumented consumer for pg-boss (SKIP LOCKED + DELETE USING on partitioned job table).
+-- pg-boss 'default' queue, state='created'.
+DO $$
+DECLARE
+  n int := 0;
+BEGIN
+  WITH claimed AS (
+    SELECT id FROM pgboss.job
+    WHERE name='bench_queue' AND state='created'
+    ORDER BY priority DESC, created_on, id
+    LIMIT 100 FOR UPDATE SKIP LOCKED
+  ), deleted AS (
+    DELETE FROM pgboss.job USING claimed WHERE pgboss.job.id = claimed.id
+    RETURNING pgboss.job.id
+  )
+  SELECT count(*) INTO n FROM deleted;
+  RAISE NOTICE 'ev ts=% n=%', extract(epoch from clock_timestamp())::bigint, n;
+END $$;
diff --git a/benchmark/consumers/consumer_pgmq-partitioned.sql b/benchmark/consumers/consumer_pgmq-partitioned.sql
new file mode 100644
index 00000000..962fcd7b
--- /dev/null
+++ b/benchmark/consumers/consumer_pgmq-partitioned.sql
@@ -0,0 +1,14 @@
+-- R6 instrumented consumer for pgmq-partitioned (identical SQL to pgmq;
+-- the partitioning is on the schema side only).
+DO $$
+DECLARE
+  msg_ids bigint[];
+  n int := 0;
+BEGIN
+  SELECT array_agg(msg_id) FROM pgmq.read('bench_queue', 30, 50) INTO msg_ids;
+  IF msg_ids IS NOT NULL THEN
+    n := array_length(msg_ids, 1);
+    PERFORM pgmq.delete('bench_queue', msg_ids);
+  END IF;
+  RAISE NOTICE 'ev ts=% n=%', extract(epoch from clock_timestamp())::bigint, n;
+END $$;
diff --git a/benchmark/consumers/consumer_pgmq.sql b/benchmark/consumers/consumer_pgmq.sql
new file mode 100644
index 00000000..51c730f0
--- /dev/null
+++ b/benchmark/consumers/consumer_pgmq.sql
@@ -0,0 +1,15 @@
+-- R6 instrumented consumer for pgmq (read-then-delete).
+-- pgmq.delete is a function body so pg_stat_statements can't see row counts;
+-- NOTICE is the authoritative source.
+DO $$
+DECLARE
+  msg_ids bigint[];
+  n int := 0;
+BEGIN
+  SELECT array_agg(msg_id) FROM pgmq.read('bench_queue', 30, 50) INTO msg_ids;
+  IF msg_ids IS NOT NULL THEN
+    n := array_length(msg_ids, 1);
+    PERFORM pgmq.delete('bench_queue', msg_ids);
+  END IF;
+  RAISE NOTICE 'ev ts=% n=%', extract(epoch from clock_timestamp())::bigint, n;
+END $$;
diff --git a/benchmark/consumers/consumer_pgq.sql b/benchmark/consumers/consumer_pgq.sql
new file mode 100644
index 00000000..df251f8c
--- /dev/null
+++ b/benchmark/consumers/consumer_pgq.sql
@@ -0,0 +1,14 @@
+-- R6 instrumented consumer for pgq (identical to pgque; schema swap only).
+DO $$
+DECLARE
+  b bigint;
+  n int := 0;
+BEGIN
+  PERFORM pgq.ticker('bench_queue');
+  SELECT pgq.next_batch('bench_queue', 'bench_consumer') INTO b;
+  IF b IS NOT NULL THEN
+    SELECT count(*) INTO n FROM pgq.get_batch_events(b);
+    PERFORM pgq.finish_batch(b);
+  END IF;
+  RAISE NOTICE 'ev ts=% n=%', extract(epoch from clock_timestamp())::bigint, n;
+END $$;
diff --git a/benchmark/consumers/consumer_pgque.sql b/benchmark/consumers/consumer_pgque.sql
new file mode 100644
index 00000000..4f6a4cd6
--- /dev/null
+++ b/benchmark/consumers/consumer_pgque.sql
@@ -0,0 +1,16 @@
+-- R6 instrumented consumer for pgque (batch ticker model).
+-- Preserves original behavior; adds one NOTICE per call for honest-events/s.
+-- Format: NOTICE: ev ts=<epoch_s> n=<events>
+DO $$
+DECLARE
+  b bigint;
+  n int := 0;
+BEGIN
+  PERFORM pgque.ticker('bench_queue');
+  SELECT pgque.next_batch('bench_queue', 'bench_consumer') INTO b;
+  IF b IS NOT NULL THEN
+    SELECT count(*) INTO n FROM pgque.get_batch_events(b);
+    PERFORM pgque.finish_batch(b);
+  END IF;
+  RAISE NOTICE 'ev ts=% n=%', extract(epoch from clock_timestamp())::bigint, n;
+END $$;
diff --git a/benchmark/consumers/consumer_que.sql b/benchmark/consumers/consumer_que.sql
new file mode 100644
index 00000000..14b188f8
--- /dev/null
+++ b/benchmark/consumers/consumer_que.sql
@@ -0,0 +1,18 @@
+-- R6 instrumented consumer for que (Ruby Que gem; SKIP LOCKED + DELETE).
+-- Mirror R5 shape (que_jobs with default queue and priority ordering).
+DO $$
+DECLARE
+  n int := 0;
+BEGIN
+  WITH claimed AS (
+    SELECT id FROM que_jobs
+    WHERE finished_at IS NULL AND expired_at IS NULL
+    ORDER BY priority, run_at, id
+    LIMIT 100 FOR UPDATE SKIP LOCKED
+  ), deleted AS (
+    DELETE FROM que_jobs USING claimed WHERE que_jobs.id = claimed.id
+    RETURNING que_jobs.id
+  )
+  SELECT count(*) INTO n FROM deleted;
+  RAISE NOTICE 'ev ts=% n=%', extract(epoch from clock_timestamp())::bigint, n;
+END $$;
diff --git a/benchmark/consumers/consumer_river.sql b/benchmark/consumers/consumer_river.sql
new file mode 100644
index 00000000..d9be6a3c
--- /dev/null
+++ b/benchmark/consumers/consumer_river.sql
@@ -0,0 +1,18 @@
+-- R6 instrumented consumer for river (SKIP LOCKED + DELETE USING).
+-- Wrapped in DO block so NOTICE carries the count.
+-- Raw DELETE rowcount is ALSO visible in pg_stat_statements via the
+-- wrapped statement (total row count over the run — cross-check).
+DO $$
+DECLARE
+  n int := 0;
+BEGIN
+  WITH claimed AS (
+    SELECT id FROM river_job WHERE state='available' AND queue='default'
+    ORDER BY id LIMIT 100 FOR UPDATE SKIP LOCKED
+  ), deleted AS (
+    DELETE FROM river_job USING claimed WHERE river_job.id = claimed.id
+    RETURNING river_job.id
+  )
+  SELECT count(*) INTO n FROM deleted;
+  RAISE NOTICE 'ev ts=% n=%', extract(epoch from clock_timestamp())::bigint, n;
+END $$;
diff --git a/benchmark/gifs/r4_gif_tps_solarized.py b/benchmark/gifs/r4_gif_tps_solarized.py
new file mode 100644
index 00000000..fa6a4b27
--- /dev/null
+++ b/benchmark/gifs/r4_gif_tps_solarized.py
@@ -0,0 +1,618 @@
+#!/usr/bin/env python3
+"""Failure by MVCC — TPS variant (Solarized Dark).
+
+Solarized Dark palette applied to the TPS-bottom-panel version. Same
+palette + treatment as r4_gif_v17_solarized.py: bg = base03 (#002b36),
+TX phase wash = base02 (#073642), text in base0/base1/base01 tiers,
+alert in Solarized red (#dc322f). Bottom panel shows consumer TPS with
+healthy lines near the top and collapses plunging DOWN toward the
+x-axis baseline during the TX window.
+
+Line colors (Solarized accents):
+  pgque  blue #268bd2 (hero)   pgq   cyan   #2aa198
+  pgmq   red  #dc322f           river orange #cb4b16
+  que    violet #6c71c4         pgboss green #859900
+
+Outputs:
+  /tmp/bench_r4_24h/failure_by_mvcc_tps.gif
+  /tmp/bench_r4_24h/failure_by_mvcc_tps_hero.png
+"""
+import csv
+import os
+import re
+import shutil
+import subprocess
+from datetime import datetime
+from pathlib import Path
+
+import matplotlib
+matplotlib.use('Agg')
+import matplotlib.pyplot as plt
+import matplotlib.transforms as mtransforms
+import numpy as np
+from matplotlib.ticker import FuncFormatter
+from PIL import Image
+
+# ---- tunables ----------------------------------------------------------------
+ROOT      = Path("/tmp/bench_r4_24h")
+OUT_GIF   = ROOT / "failure_by_mvcc_tps.gif"
+OUT_HERO  = ROOT / "failure_by_mvcc_tps_hero.png"
+FPS           = 25
+DURATION_S    = 22
+FRAME_COUNT   = FPS * DURATION_S
+HOLD_FRAMES   = FPS * 1
+FIG_W_IN, FIG_H_IN, DPI = 9.0, 7.0, 100   # 900x700 px
+X_MAX_MIN     = 150
+FLASH_FRAMES  = 12
+INLINE_MIN_DEAD = 50_000    # suppress inline dead-tuple labels below this
+# TPS: draw the inline label when the line has dropped BELOW this threshold
+# (the collapse is the story — healthy-steady-state lines at 10-17k are busy
+# at the top of the panel and an inline label there would just add clutter).
+INLINE_MAX_TPS_COLLAPSED = 3000
+# -----------------------------------------------------------------------------
+
+SYSTEMS = ["pgque", "pgq", "pgmq", "river", "que", "pgboss"]
+# --- Solarized Dark palette (Ethan Schoonover) ---------------------------
+# Backgrounds
+BG         = "#002b36"   # base03 — primary figure + axes background
+SURFACE    = "#073642"   # base02 — surface/highlight + TX phase-band wash
+# Text tiers
+FG         = "#839496"   # base0  — primary text (ticks, values, inline)
+FG_EMPH    = "#93a1a1"   # base1  — emphasized (axis y-labels, column hdrs)
+FG_DIM     = "#586e75"   # base01 — de-emphasized (inactive phase labels)
+# Grid / rules / structure
+GRID       = SURFACE     # grid matches surface: reads as structure
+SPINE      = FG_DIM      # base01 for spines
+PHASE_RULE = FG_DIM      # base01 for boundary axvlines at t=30/90
+# Alert (Solarized's canonical red, no separate ember wash)
+RED        = "#dc322f"   # hinge flash + TX-phase ribbon + active phase label
+TX_BAND    = SURFACE     # TX phase wash = surface, not an alert hue
+# Solarized accents (for reference):
+# blue #268bd2  cyan #2aa198  red #dc322f  magenta #d33682
+# orange #cb4b16  violet #6c71c4  green #859900  yellow #b58900
+# -------------------------------------------------------------------------
+
+COLORS = {
+    "pgque":   "#268bd2",   # Solarized blue — hero, most saturated
+    "pgq":     "#2aa198",   # Solarized cyan — sibling (cool family)
+    "pgmq":    "#dc322f",   # Solarized red
+    "river":   "#cb4b16",   # Solarized orange
+    "que":     "#6c71c4",   # Solarized violet
+    "pgboss":  "#859900",   # Solarized green (warm yellow-green)
+}
+LW = {s: (3.0 if s == "pgque" else 2.2) for s in SYSTEMS}
+EVENT_RE = {
+    "pgque":   re.compile(r"^pgque\.event_\d+_\d+$"),
+    "pgq":     re.compile(r"^pgq\.event_\d+_\d+$"),
+    "pgmq":    re.compile(r"^pgmq\.q_bench_queue$"),
+    "river":   re.compile(r"^public\.river_job$"),
+    "que":     re.compile(r"^public\.que_jobs$"),
+    "pgboss":  re.compile(r"^pgboss\.job_common$|^pgboss\.j[a-f0-9]+$"),
+}
+PHASE_BOUNDS = [(0, 30, "clean"), (30, 90, "tx"), (90, 150, "clean")]
+
+plt.rcParams.update({
+    "font.family": ["Helvetica", "Arial", "DejaVu Sans"],
+    "font.size": 11,
+    "axes.spines.top": False,
+    "axes.spines.right": False,
+    "axes.spines.bottom": True,
+    "axes.spines.left": True,
+    "axes.edgecolor": SPINE,
+    "axes.linewidth": 0.9,
+    "axes.grid": True,
+    "grid.color": GRID,
+    "grid.linewidth": 0.8,
+    "axes.axisbelow": True,
+    # --- dark theme colors (applied to figure, axes, text, ticks) ---
+    "figure.facecolor": BG,
+    "axes.facecolor":   BG,
+    "savefig.facecolor": BG,
+    "text.color":       FG,
+    "axes.labelcolor":  FG,
+    "axes.titlecolor":  FG,
+    "xtick.color":      FG,
+    "ytick.color":      FG,
+})
+
+
+def _mins_from(tss):
+    t0 = datetime.fromisoformat(tss[0].replace("Z", ""))
+    return np.array([
+        (datetime.fromisoformat(t.replace("Z", "")) - t0).total_seconds() / 60
+        for t in tss
+    ])
+
+
+def load_bloat():
+    series = {}
+    for sysn in SYSTEMS:
+        p = ROOT / sysn / "bloat.csv"
+        if not p.exists():
+            continue
+        by_ts = {}
+        with open(p) as f:
+            r = csv.reader(f)
+            next(r)
+            for row in r:
+                try:
+                    if not EVENT_RE[sysn].match(row[1]):
+                        continue
+                    by_ts.setdefault(row[0], 0)
+                    by_ts[row[0]] += int(row[3])
+                except Exception:
+                    pass
+        tss = sorted(by_ts)
+        if not tss:
+            continue
+        mins = _mins_from(tss)
+        dead = np.array([by_ts[t] for t in tss], dtype=float)
+        series[sysn] = {"mins": mins, "dead": dead}
+    return series
+
+
+_PROGRESS_RE = re.compile(
+    r"^progress:\s*([\d.]+)\s*s,\s*([\d.]+)\s*tps,\s*lat\s*([\d.]+)\s*ms"
+)
+
+
+def load_tps():
+    """Parse the tps field out of 'progress: X s, Y tps, lat Z ms ...' lines
+    in each system's consumer.log. Returns {sysn: {"mins": ..., "tps": ...}}.
+    """
+    series = {}
+    for sysn in SYSTEMS:
+        p = ROOT / sysn / "consumer.log"
+        if not p.exists():
+            continue
+        mins, tps = [], []
+        with open(p) as f:
+            for ln in f:
+                m = _PROGRESS_RE.match(ln)
+                if not m:
+                    continue
+                t_s = float(m.group(1))
+                mins.append(t_s / 60.0)
+                tps.append(float(m.group(2)))
+        if not mins:
+            continue
+        mins = np.array(mins); tps = np.array(tps)
+        series[sysn] = {"mins": mins, "tps": tps}
+    return series
+
+
+def fmt_y_dead(v, _):
+    if v >= 1e6: return f"{v/1e6:.1f}M"
+    if v >= 1e3: return f"{v/1e3:.0f}k"
+    return f"{v:.0f}"
+
+
+def fmt_y_tps(v, _):
+    v = abs(v)
+    if v >= 1000: return f"{v/1000:.0f}k"
+    return f"{v:.0f}"
+
+
+def current_phase(now_min):
+    for start, end, kind in PHASE_BOUNDS:
+        if start <= now_min < end:
+            return kind, start, end
+    s, e, k = PHASE_BOUNDS[-1]
+    return k, s, e
+
+
+def _apply_phase_bands(ax):
+    for start, end, kind in PHASE_BOUNDS:
+        if kind == "tx":
+            # Solarized: TX band uses surface (base02) at full alpha so
+            # the phase reads as a visibly-darker-than-bg region without
+            # any alert-color wash.
+            ax.axvspan(start, end, color=TX_BAND, alpha=1.0, zorder=0)
+    for b in (30, 90):
+        ax.axvline(x=b, color=PHASE_RULE, linewidth=0.8, zorder=0.5)
+
+
+def _plot_series(ax, series, sysn, now_min, y_key, y_clip=None,
+                 inline_threshold=None, inline_fmt=None, inline_out=None,
+                 inline_label_value_fn=None, inline_cmp="above"):
+    """Plot a series; return the endpoint y value.
+
+    If inline_threshold/inline_fmt/inline_out are provided, append a
+    proposed inline-label dict to inline_out for later anti-overlap
+    placement.  inline_cmp="above" (default) appends when y_end >=
+    threshold; inline_cmp="below" appends when y_end <= threshold — used
+    for TPS collapse (showing the label ONLY when the line has dropped).
+
+    inline_label_value_fn, if given, is used to compute the text value
+    instead of raw y_end (useful for display-vs-plot-value differences).
+    """
+    mins = series[sysn]["mins"]
+    y = series[sysn][y_key].copy()
+    if y_clip is not None:
+        y = np.minimum(y, y_clip)
+    mask = mins <= now_min
+    if mask.sum() < 1:
+        return None
+    x_sel = mins[mask]; y_sel = y[mask]
+    if mask.sum() < len(mins):
+        idx = mask.sum()
+        if 0 < idx < len(mins):
+            prev_x, next_x = mins[idx - 1], mins[idx]
+            if next_x > prev_x:
+                frac = min(max((now_min - prev_x) / (next_x - prev_x), 0), 1)
+                y_i = y[idx - 1] + frac * (y[idx] - y[idx - 1])
+                x_sel = np.append(x_sel, now_min)
+                y_sel = np.append(y_sel, y_i)
+    alpha = 1.0 if sysn == "pgque" else 0.85
+    ax.plot(x_sel, y_sel, color=COLORS[sysn], linewidth=LW[sysn], alpha=alpha,
+            zorder=3 if sysn == "pgque" else 2, solid_capstyle="round")
+    if len(x_sel) > 0:
+        x_end, y_end = x_sel[-1], y_sel[-1]
+        ax.plot([x_end], [y_end], marker="o", markersize=5,
+                color=COLORS[sysn], markeredgecolor=BG,
+                markeredgewidth=1.0, zorder=5)
+        if (inline_threshold is not None and inline_fmt is not None
+                and inline_out is not None):
+            pass_test = (y_end >= inline_threshold if inline_cmp == "above"
+                         else y_end <= inline_threshold)
+            if pass_test:
+                disp_val = (inline_label_value_fn(y_end)
+                            if inline_label_value_fn is not None else y_end)
+                inline_out.append({
+                    "sysn": sysn,
+                    "x": x_end,
+                    "y_true": y_end,
+                    "text": f"{sysn}: {inline_fmt(disp_val, None)}",
+                    "color": COLORS[sysn],
+                    "weight": "bold" if sysn == "pgque" else "normal",
+                })
+    return y_sel[-1] if len(y_sel) else None
+
+
+# v13: arrows form a "V" — pgque tail LEFT of cursor, pgq tail RIGHT of
+# cursor. Both heads meet at the cursor on the pgque/pgq lines, so the
+# two shafts diverge outward at ~45°. Labels sit directly under each tail
+# (no separate stagger variable — label x = tail x).
+# Tail + label closer to the axis than v12: -0.06 / -0.10 axes-fraction.
+TAIL_DX_MIN = 5.0          # tail offset in data-minutes; tuned for ~45°
+ARROW_TAIL_AX_FRAC  = -0.06
+ARROW_LABEL_AX_FRAC = -0.10
+
+
+def _draw_cursor_arrows(ax, series, y_key, y_peak, now_min):
+    """Draw two hairline up-arrows in a divergent "V" pattern. Head of each
+    arrow is on the pgque/pgq line at x = now_min (pure data coords). Tail
+    is offset outward by TAIL_DX_MIN minutes (pgque tail left, pgq tail
+    right) and below the axes by ARROW_TAIL_AX_FRAC in axes-fraction — so
+    each shaft tilts outward at ~45° in pixel space.
+
+    If a tail would fall outside the axes x-range (near t=0 / t=150), flip
+    that arrow's direction so it stays on-axis, then widen the gap if the
+    flip would make the two labels collide.
+    """
+    x_data_min, x_data_max = ax.get_xlim()
+    now_clamped = max(min(now_min, x_data_max), x_data_min)
+
+    # Directional tail offset per system (pgque goes left, pgq goes right).
+    tail_dir = {"pgque": -1, "pgq": +1}
+    tail_x_raw = {s: now_clamped + tail_dir[s] * TAIL_DX_MIN for s in tail_dir}
+
+    # Edge handling: if a tail is off-axis, flip it to the other side. If
+    # BOTH end up on the same side after flipping, spread them apart with
+    # ±TAIL_DX_MIN around the cursor (falling back to clamped positions).
+    for s in ("pgque", "pgq"):
+        if tail_x_raw[s] < x_data_min or tail_x_raw[s] > x_data_max:
+            tail_x_raw[s] = now_clamped - tail_dir[s] * TAIL_DX_MIN
+    # If the flip made them coincide / cross, rebalance.
+    if abs(tail_x_raw["pgque"] - tail_x_raw["pgq"]) < TAIL_DX_MIN:
+        # place them symmetrically around the cursor, then clamp joint-wise
+        lo = max(now_clamped - TAIL_DX_MIN, x_data_min)
+        hi = lo + 2 * TAIL_DX_MIN
+        if hi > x_data_max:
+            hi = x_data_max
+            lo = hi - 2 * TAIL_DX_MIN
+        tail_x_raw = {"pgque": lo, "pgq": hi}
+    # Final per-axis clamp (belt & suspenders).
+    tail_x_for = {s: max(min(tail_x_raw[s], x_data_max), x_data_min)
+                  for s in ("pgque", "pgq")}
+
+    # Blended transform: x = data, y = axes-fraction.
+    tail_trans = mtransforms.blended_transform_factory(ax.transData, ax.transAxes)
+
+    for sysn in ("pgque", "pgq"):
+        if sysn not in series:
+            continue
+        mins = series[sysn]["mins"]
+        ys   = series[sysn][y_key]
+        if len(mins) == 0 or now_clamped < mins[0] or now_clamped > mins[-1]:
+            continue
+        y_target = float(np.interp(now_clamped, mins, ys))
+        weight = "bold" if sysn == "pgque" else "normal"
+        tail_x = tail_x_for[sysn]
+        # Arrow: head on the line (data coords at now_clamped), tail at
+        # (tail_x, ARROW_TAIL_AX_FRAC). Tilted ~45° because head x != tail x.
+        ax.annotate(
+            "",
+            xy=(now_clamped, y_target),
+            xycoords=ax.transData,
+            xytext=(tail_x, ARROW_TAIL_AX_FRAC),
+            textcoords=tail_trans,
+            arrowprops=dict(
+                arrowstyle="->",
+                color=COLORS[sysn],
+                linewidth=0.8,
+                shrinkA=0, shrinkB=2,
+            ),
+            annotation_clip=False,
+            zorder=12,
+        )
+        # Label sits directly under its OWN tail.
+        ax.text(tail_x, ARROW_LABEL_AX_FRAC, sysn,
+                transform=tail_trans,
+                ha="center", va="top",
+                fontsize=8.5, color=COLORS[sysn],
+                fontweight=weight,
+                clip_on=False, zorder=12)
+
+
+def _place_inline_labels(ax, proposals, y_peak, gap_frac=0.04):
+    """Top-down vertical nudge so labels don't overlap.
+    Drops any label pushed out of [0, y_peak]. Draws with monospace.
+    """
+    if not proposals:
+        return
+    gap = y_peak * gap_frac
+    # Sort descending by true y; walk top-down, push each label down if too close
+    ordered = sorted(proposals, key=lambda p: -p["y_true"])
+    prev_y = None
+    for p in ordered:
+        y = p["y_true"]
+        if prev_y is not None and y > prev_y - gap:
+            y = prev_y - gap
+        p["y_draw"] = y
+        prev_y = y
+    dx = X_MAX_MIN * 0.015
+    for p in ordered:
+        y = p["y_draw"]
+        if y < 0 or y > y_peak:
+            continue  # label forced out of frame — drop it
+        ax.text(p["x"] + dx, y, p["text"],
+                color=p["color"], fontsize=8.5,
+                fontfamily="monospace", fontweight=p["weight"],
+                ha="left", va="center", zorder=6)
+
+
+def render_frame(bloat, tps_series, frame_idx, total_frames, y_peak_dead, y_peak_tps):
+    now_min = X_MAX_MIN * (frame_idx / max(total_frames - 1, 1))
+
+    fig = plt.figure(figsize=(FIG_W_IN, FIG_H_IN), dpi=DPI)
+    # No title block — top whitespace reclaimed for the axes.
+    # v12: hspace + bottom margin sized so each panel has figure-level
+    # whitespace BELOW its x-axis tick labels for the pgque/pgq arrows
+    # + labels (which live outside the axes via negative axes-fraction y).
+    # Tuned so top panel bottom lands at ~42% of figure height (not
+    # compressed) while leaving room below bot-panel for arrow+label.
+    gs = fig.add_gridspec(2, 1, hspace=0.45, top=0.95, bottom=0.10,
+                          left=0.10, right=0.96,
+                          height_ratios=[1.0, 0.58])
+    ax1 = fig.add_subplot(gs[0])
+    ax2 = fig.add_subplot(gs[1])
+
+    for ax in (ax1, ax2):
+        _apply_phase_bands(ax)
+
+    # Phase labels on top axis only
+    for start, end, kind in PHASE_BOUNDS:
+        label = ("idle-in-tx · 1h" if kind == "tx"
+                 else ("clean baseline · 30m" if start == 0 else "clean recovery · 1h"))
+        color = RED if kind == "tx" else FG_DIM
+        is_current = (start <= now_min < end)
+        weight = "bold" if (is_current or kind == "tx") else "normal"
+        alpha = 1.0 if is_current else (0.7 if kind == "tx" else 0.55)
+        ax1.text((start + end) / 2, 1.04, label,
+                 transform=ax1.get_xaxis_transform(),
+                 ha="center", color=color, fontsize=10,
+                 fontweight=weight, alpha=alpha)
+
+    # --- panel 1: dead tuples ---
+    dead_vals = {}
+    dead_inline = []
+    for sysn in SYSTEMS:
+        if sysn in bloat:
+            v = _plot_series(ax1, bloat, sysn, now_min, "dead",
+                             inline_threshold=INLINE_MIN_DEAD,
+                             inline_fmt=fmt_y_dead,
+                             inline_out=dead_inline)
+            if v is not None:
+                dead_vals[sysn] = v
+    _place_inline_labels(ax1, dead_inline, y_peak_dead)
+    _draw_cursor_arrows(ax1, bloat, "dead", y_peak_dead, now_min)
+
+    # --- panel 2: TPS (normal orientation — healthy lines near top,
+    # collapse plunges down toward the x-axis baseline at y=0) ---
+    tps_vals = {}
+    tps_inline = []
+    for sysn in SYSTEMS:
+        if sysn in tps_series:
+            v = _plot_series(ax2, tps_series, sysn, now_min, "tps",
+                             inline_threshold=INLINE_MAX_TPS_COLLAPSED,
+                             inline_cmp="below",
+                             inline_fmt=fmt_y_tps,
+                             inline_out=tps_inline)
+            if v is not None:
+                tps_vals[sysn] = v
+    _place_inline_labels(ax2, tps_inline, y_peak_tps)
+    _draw_cursor_arrows(ax2, tps_series, "tps", y_peak_tps, now_min)
+
+    # Hinge flash (v2 style: red rule + white-box callout)
+    hinge_min = 30
+    frames_since_hinge = (now_min - hinge_min) / X_MAX_MIN * (total_frames - 1)
+    if 0 <= frames_since_hinge <= FLASH_FRAMES:
+        flash_alpha = 1.0 - (frames_since_hinge / FLASH_FRAMES)
+        ax1.axvline(x=hinge_min, color=RED, linewidth=3.5,
+                    alpha=flash_alpha * 0.85, zorder=6)
+        ax2.axvline(x=hinge_min, color=RED, linewidth=3.5,
+                    alpha=flash_alpha * 0.85, zorder=6)
+        ax1.text(hinge_min + 1, y_peak_dead * 0.92,
+                 "idle-in-tx opens\nvacuum frozen",
+                 color=RED, fontsize=11, fontweight="bold",
+                 alpha=flash_alpha, zorder=7,
+                 bbox=dict(boxstyle="round,pad=0.3", facecolor=BG,
+                           edgecolor=RED, linewidth=1.2, alpha=flash_alpha))
+
+    # Now cursor on both panels
+    for ax, y_peak in ((ax1, y_peak_dead), (ax2, y_peak_tps)):
+        ax.axvline(x=now_min, color=FG_EMPH, linewidth=1.2, alpha=0.70, zorder=8)
+        ax.plot([now_min], [y_peak * 1.005], marker="v", color=FG_EMPH,
+                markersize=7, zorder=9, clip_on=False)
+
+    # Phase / time readout
+    kind, p_start, p_end = current_phase(now_min)
+    ribbon_color = RED if kind == "tx" else FG_DIM
+    ribbon_label = ("IDLE-IN-TX (vacuum blocked)" if kind == "tx"
+                    else ("CLEAN BASELINE" if p_start == 0 else "CLEAN RECOVERY"))
+    # Footer: phase label stays sans; t = ... is mono for stable digit columns.
+    # Compose as two adjacent fig.text calls anchored around figure center.
+    phase_text = f"phase: {ribbon_label}"
+    time_text  = f"t = {now_min:5.1f} min / 150 min"
+    fig.text(0.5, 0.015, phase_text + "  |  ",
+             ha="right", fontsize=10, color=ribbon_color, fontweight="bold")
+    fig.text(0.5, 0.015, time_text,
+             ha="left", fontsize=10, color=ribbon_color, fontweight="bold",
+             fontfamily="monospace")
+
+    # Axes
+    for ax in (ax1, ax2):
+        ax.set_xlim(-1, X_MAX_MIN)
+        ax.set_xticks([0, 30, 60, 90, 120, 150])
+        ax.set_xticklabels(["0", "30m", "1h", "1h30", "2h", "2h30"])
+    # v8: no below-axis headroom (start at 0), but keep '0' tick label hidden
+    # so the baseline gridline stays without the "0" text crowding the axis.
+    # v12: clean axis bounds — no negative headroom. The arrows live
+    # OUTSIDE the axes rectangle via mixed (data, axes-fraction) transforms
+    # and annotation_clip=False, so the plot data area is not compressed.
+    ax1.set_ylim(0, y_peak_dead * 1.05)
+    ax1.yaxis.set_major_formatter(FuncFormatter(fmt_y_dead))
+    ax1.set_ylabel("event-table dead tuples", labelpad=8)
+
+    ax2.set_ylim(0, y_peak_tps * 1.05)
+    ax2.yaxis.set_major_formatter(FuncFormatter(fmt_y_tps))
+    ax2.set_ylabel("consumer TPS", labelpad=8)
+
+    # Mono-only for numeric y-tick labels; hide the '0' tick.
+    for ax in (ax1, ax2):
+        for lbl, pos in zip(ax.get_yticklabels(), ax.get_yticks()):
+            lbl.set_fontfamily("monospace")
+            if pos == 0:
+                lbl.set_visible(False)
+
+    # Legend: uniform row grid, baseline-aligned, two cells per row.
+    # Name (sans) right-aligned at x=0.88; value (mono) right-aligned at x=0.985.
+    # Using va="baseline" for every cell so sans and mono glyphs sit on the
+    # same typographic baseline regardless of family metrics.
+    # ROW_SPACING is in axes-fraction — bottom panel is ~0.58x the height of
+    # top, so we scale to keep visual row rhythm consistent in physical pixels.
+    def _draw_legend(ax, vals, fmt, header, row_spacing, header_gap):
+        header_y   = 0.97
+        rows_start = header_y - header_gap
+        ax.text(0.985, header_y, header,
+                transform=ax.transAxes, ha="right", va="baseline",
+                color=FG_EMPH, fontsize=8.5, zorder=10)
+        visible = [s for s in SYSTEMS if s in vals]
+        for i, sysn in enumerate(visible):
+            cur = vals[sysn]
+            weight = "bold" if sysn == "pgque" else "normal"
+            y = rows_start - i * row_spacing
+            ax.text(0.88, y, sysn,
+                    transform=ax.transAxes, ha="right", va="baseline",
+                    color=COLORS[sysn], fontsize=9, fontweight=weight,
+                    zorder=10)
+            ax.text(0.985, y, fmt(cur, None),
+                    transform=ax.transAxes, ha="right", va="baseline",
+                    color=COLORS[sysn], fontsize=9, fontweight=weight,
+                    fontfamily="monospace", zorder=10)
+
+    # Scale row spacing by inverse of panel height so visual pixel rhythm
+    # matches between the hero (top) and supporting (bottom) panels.
+    _draw_legend(ax1, dead_vals, fmt_y_dead, "dead tuples",
+                 row_spacing=0.065, header_gap=0.05)
+    _draw_legend(ax2, tps_vals, fmt_y_tps, "TPS",
+                 row_spacing=0.112, header_gap=0.086)
+
+    fig.canvas.draw()
+    buf = np.asarray(fig.canvas.buffer_rgba())
+    img = Image.fromarray(buf).convert("RGB")
+    plt.close(fig)
+    return img
+
+
+def build_animation():
+    print("Loading bloat CSVs...")
+    bloat = load_bloat()
+    print(f"  bloat series: {sorted(bloat.keys())}")
+    print("Loading TPS from consumer logs...")
+    tps_series = load_tps()
+    print(f"  tps series: {sorted(tps_series.keys())}")
+
+    y_peak_dead = max(bloat[s]["dead"].max() for s in bloat)
+    y_peak_tps  = max(tps_series[s]["tps"].max() for s in tps_series)
+    print(f"  y_peak dead = {y_peak_dead:.0f}, y_peak tps = {y_peak_tps:.0f}")
+    # Log each system's peak + min during TX phase so the report can cite
+    # which ones drop furthest (just informational).
+    for s in sorted(tps_series):
+        mins = tps_series[s]["mins"]; tps = tps_series[s]["tps"]
+        tx_mask = (mins >= 30) & (mins <= 90)
+        if tx_mask.any():
+            peak_all = tps.max()
+            min_in_tx = tps[tx_mask].min()
+            print(f"    {s}: steady_peak={peak_all:.0f} tps  min_in_tx={min_in_tx:.0f} tps")
+
+    print(f"Rendering {FRAME_COUNT} frames ({FIG_W_IN*DPI:.0f}x{FIG_H_IN*DPI:.0f} px)...")
+    frames = []
+    for i in range(FRAME_COUNT):
+        if i % 50 == 0:
+            print(f"  frame {i}/{FRAME_COUNT}")
+        frames.append(render_frame(bloat, tps_series, i, FRAME_COUNT, y_peak_dead, y_peak_tps))
+    for _ in range(HOLD_FRAMES):
+        frames.append(frames[-1])
+
+    frames[-1].save(OUT_HERO, format="PNG", optimize=True)
+    print(f"Wrote hero PNG: {OUT_HERO}")
+
+    gifski = shutil.which("gifski")
+    if gifski:
+        print("Using gifski...")
+        tmp_dir = Path("/tmp/r4_gif_frames_tps_solarized")
+        if tmp_dir.exists():
+            shutil.rmtree(tmp_dir)
+        tmp_dir.mkdir()
+        for idx, fr in enumerate(frames):
+            fr.save(tmp_dir / f"frame_{idx:04d}.png")
+        # Solarized dark has large flat-color background regions (base03
+        # figure bg + base02 TX phase wash) that low gifski quality turns
+        # into visible speckle. Push quality to default=90, add --extra for
+        # the slower-but-cleaner palette path, and --lossy-quality=100 to
+        # disable the noise/streak dither that produces the speckle.
+        subprocess.run(
+            [gifski, "-o", str(OUT_GIF), "--fps", str(FPS),
+             "--width", str(int(FIG_W_IN * DPI)),
+             "--quality", "68",
+             "--lossy-quality", "75",
+             "--extra",
+             *sorted(str(p) for p in tmp_dir.glob("*.png"))],
+            check=True,
+        )
+        shutil.rmtree(tmp_dir)
+    else:
+        print("gifski missing; using PIL...")
+        pal_frames = [f.quantize(colors=128, method=Image.MEDIANCUT, dither=Image.Dither.NONE)
+                      for f in frames]
+        pal_frames[0].save(OUT_GIF, save_all=True,
+                           append_images=pal_frames[1:],
+                           duration=int(1000 / FPS), loop=0,
+                           optimize=True, disposal=2)
+    print(f"Wrote GIF: {OUT_GIF} ({OUT_GIF.stat().st_size/1e6:.2f} MB)")
+
+
+if __name__ == "__main__":
+    build_animation()
diff --git a/benchmark/gifs/r4_gif_v17_solarized.py b/benchmark/gifs/r4_gif_v17_solarized.py
new file mode 100644
index 00000000..b5afff5b
--- /dev/null
+++ b/benchmark/gifs/r4_gif_v17_solarized.py
@@ -0,0 +1,610 @@
+#!/usr/bin/env python3
+"""Failure by MVCC — animated GIF v17 Solarized Dark (v16 dark palette swap).
+
+Ethan Schoonover's Solarized Dark palette. Backgrounds: base03 (#002b36)
+for the figure + axes, base02 (#073642) for the TX phase-band wash and
+gridlines. Text in three tiers: base0 (#839496) primary, base1 (#93a1a1)
+emphasized (axis y-labels, column headers, now-cursor), base01 (#586e75)
+de-emphasized (spines, phase-boundary axvlines, inactive phase labels).
+Alert: Solarized red (#dc322f) for the hinge flash box + active TX-phase
+label + TX ribbon.
+
+Line palette uses the Solarized accent colors:
+  pgque  = blue    #268bd2  (hero — most saturated)
+  pgq    = cyan    #2aa198  (sibling, cool family)
+  pgmq   = red     #dc322f
+  river  = orange  #cb4b16
+  que    = violet  #6c71c4
+  pgboss = green   #859900  (warm yellow-green)
+Yellow (#b58900) reserved for future highlighting.
+
+Edge handling: if a tail would fall off-axis (near t=0 or t=150), flip
+that arrow's direction only so it stays on-axis; if flipping would make
+the two tails coincide, rebalance with joint-clamp ±TAIL_DX_MIN.
+
+Everything else unchanged from v12: clean set_ylim(0, y_peak*1.05),
+uniform legend, mono numbers, inline endpoint labels + anti-overlap,
+no title, hinge flash, 6 systems.
+
+Outputs:
+  /tmp/bench_r4_24h/death_spiral.gif
+  /tmp/bench_r4_24h/death_spiral_hero.png
+"""
+import csv
+import os
+import re
+import shutil
+import subprocess
+from datetime import datetime
+from pathlib import Path
+
+import matplotlib
+matplotlib.use('Agg')
+import matplotlib.pyplot as plt
+import matplotlib.transforms as mtransforms
+import numpy as np
+from matplotlib.ticker import FuncFormatter
+from PIL import Image
+
+# ---- tunables ----------------------------------------------------------------
+ROOT      = Path("/tmp/bench_r4_24h")
+OUT_GIF   = ROOT / "death_spiral.gif"
+OUT_HERO  = ROOT / "death_spiral_hero.png"
+FPS           = 25
+DURATION_S    = 22
+FRAME_COUNT   = FPS * DURATION_S
+HOLD_FRAMES   = FPS * 1
+FIG_W_IN, FIG_H_IN, DPI = 9.0, 7.0, 100   # 900x700 px
+X_MAX_MIN     = 150
+FLASH_FRAMES  = 12
+LAT_CLIP_MS   = 1200
+INLINE_MIN_DEAD = 50_000    # suppress inline dead-tuple labels below this
+INLINE_MIN_LAT_MS = 10      # suppress inline latency labels below this
+# -----------------------------------------------------------------------------
+
+SYSTEMS = ["pgque", "pgq", "pgmq", "river", "que", "pgboss"]
+# --- Solarized Dark palette (Ethan Schoonover) ---------------------------
+# Backgrounds
+BG         = "#002b36"   # base03 — primary figure + axes background
+SURFACE    = "#073642"   # base02 — surface/highlight + TX phase-band wash
+# Text tiers
+FG         = "#839496"   # base0  — primary text (ticks, values, inline)
+FG_EMPH    = "#93a1a1"   # base1  — emphasized (axis y-labels, column hdrs)
+FG_DIM     = "#586e75"   # base01 — de-emphasized (inactive phase labels)
+# Grid / rules / structure
+GRID       = SURFACE     # grid matches surface: reads as structure
+SPINE      = FG_DIM      # base01 for spines
+PHASE_RULE = FG_DIM      # base01 for boundary axvlines at t=30/90
+# Alert (Solarized's canonical red, no separate ember wash)
+RED        = "#dc322f"   # hinge flash + TX-phase ribbon + active phase label
+TX_BAND    = SURFACE     # TX phase wash = surface, not an alert hue
+# Solarized accents (for reference):
+# blue #268bd2  cyan #2aa198  red #dc322f  magenta #d33682
+# orange #cb4b16  violet #6c71c4  green #859900  yellow #b58900
+# -------------------------------------------------------------------------
+
+COLORS = {
+    "pgque":   "#268bd2",   # Solarized blue — hero, most saturated
+    "pgq":     "#2aa198",   # Solarized cyan — sibling (cool family)
+    "pgmq":    "#dc322f",   # Solarized red
+    "river":   "#cb4b16",   # Solarized orange
+    "que":     "#6c71c4",   # Solarized violet
+    "pgboss":  "#859900",   # Solarized green (warm yellow-green)
+}
+LW = {s: (3.0 if s == "pgque" else 2.2) for s in SYSTEMS}
+EVENT_RE = {
+    "pgque":   re.compile(r"^pgque\.event_\d+_\d+$"),
+    "pgq":     re.compile(r"^pgq\.event_\d+_\d+$"),
+    "pgmq":    re.compile(r"^pgmq\.q_bench_queue$"),
+    "river":   re.compile(r"^public\.river_job$"),
+    "que":     re.compile(r"^public\.que_jobs$"),
+    "pgboss":  re.compile(r"^pgboss\.job_common$|^pgboss\.j[a-f0-9]+$"),
+}
+PHASE_BOUNDS = [(0, 30, "clean"), (30, 90, "tx"), (90, 150, "clean")]
+
+plt.rcParams.update({
+    "font.family": ["Helvetica", "Arial", "DejaVu Sans"],
+    "font.size": 11,
+    "axes.spines.top": False,
+    "axes.spines.right": False,
+    "axes.spines.bottom": True,
+    "axes.spines.left": True,
+    "axes.edgecolor": SPINE,
+    "axes.linewidth": 0.9,
+    "axes.grid": True,
+    "grid.color": GRID,
+    "grid.linewidth": 0.8,
+    "axes.axisbelow": True,
+    # --- dark theme colors (applied to figure, axes, text, ticks) ---
+    "figure.facecolor": BG,
+    "axes.facecolor":   BG,
+    "savefig.facecolor": BG,
+    "text.color":       FG,
+    "axes.labelcolor":  FG,
+    "axes.titlecolor":  FG,
+    "xtick.color":      FG,
+    "ytick.color":      FG,
+})
+
+
+def _mins_from(tss):
+    t0 = datetime.fromisoformat(tss[0].replace("Z", ""))
+    return np.array([
+        (datetime.fromisoformat(t.replace("Z", "")) - t0).total_seconds() / 60
+        for t in tss
+    ])
+
+
+def load_bloat():
+    series = {}
+    for sysn in SYSTEMS:
+        p = ROOT / sysn / "bloat.csv"
+        if not p.exists():
+            continue
+        by_ts = {}
+        with open(p) as f:
+            r = csv.reader(f)
+            next(r)
+            for row in r:
+                try:
+                    if not EVENT_RE[sysn].match(row[1]):
+                        continue
+                    by_ts.setdefault(row[0], 0)
+                    by_ts[row[0]] += int(row[3])
+                except Exception:
+                    pass
+        tss = sorted(by_ts)
+        if not tss:
+            continue
+        mins = _mins_from(tss)
+        dead = np.array([by_ts[t] for t in tss], dtype=float)
+        series[sysn] = {"mins": mins, "dead": dead}
+    return series
+
+
+_PROGRESS_RE = re.compile(
+    r"^progress:\s*([\d.]+)\s*s,\s*[\d.]+\s*tps,\s*lat\s*([\d.]+)\s*ms\s*stddev\s*([\d.]+)"
+)
+
+
+def load_latency():
+    series = {}
+    for sysn in SYSTEMS:
+        p = ROOT / sysn / "consumer.log"
+        if not p.exists():
+            continue
+        mins, lat, std = [], [], []
+        with open(p) as f:
+            for ln in f:
+                m = _PROGRESS_RE.match(ln)
+                if not m:
+                    continue
+                t_s = float(m.group(1))
+                mins.append(t_s / 60.0)
+                lat.append(float(m.group(2)))
+                std.append(float(m.group(3)))
+        if not mins:
+            continue
+        mins = np.array(mins); lat = np.array(lat); std = np.array(std)
+        series[sysn] = {"mins": mins, "lat": lat, "std": std}
+    return series
+
+
+def fmt_y_dead(v, _):
+    if v >= 1e6: return f"{v/1e6:.1f}M"
+    if v >= 1e3: return f"{v/1e3:.0f}k"
+    return f"{v:.0f}"
+
+
+def fmt_y_lat(v, _):
+    if v >= 1000: return f"{v/1000:.1f}s"
+    if v >= 10:   return f"{v:.0f}ms"
+    if v >= 1:    return f"{v:.1f}ms"
+    return f"{v:.2f}ms"  # sub-ms: show real number, not 0
+
+
+def current_phase(now_min):
+    for start, end, kind in PHASE_BOUNDS:
+        if start <= now_min < end:
+            return kind, start, end
+    s, e, k = PHASE_BOUNDS[-1]
+    return k, s, e
+
+
+def _apply_phase_bands(ax):
+    for start, end, kind in PHASE_BOUNDS:
+        if kind == "tx":
+            # Solarized: TX band uses surface (base02) at full alpha so
+            # the phase reads as a visibly-darker-than-bg region without
+            # any alert-color wash.
+            ax.axvspan(start, end, color=TX_BAND, alpha=1.0, zorder=0)
+    for b in (30, 90):
+        ax.axvline(x=b, color=PHASE_RULE, linewidth=0.8, zorder=0.5)
+
+
+def _plot_series(ax, series, sysn, now_min, y_key, y_clip=None,
+                 inline_threshold=None, inline_fmt=None, inline_out=None):
+    """Plot a series; return the endpoint y value.
+
+    If inline_threshold/inline_fmt/inline_out are provided and the endpoint
+    y is above threshold, append a proposed inline-label dict to
+    inline_out for later anti-overlap placement in data coordinates.
+    """
+    mins = series[sysn]["mins"]
+    y = series[sysn][y_key].copy()
+    if y_clip is not None:
+        y = np.minimum(y, y_clip)
+    mask = mins <= now_min
+    if mask.sum() < 1:
+        return None
+    x_sel = mins[mask]; y_sel = y[mask]
+    if mask.sum() < len(mins):
+        idx = mask.sum()
+        if 0 < idx < len(mins):
+            prev_x, next_x = mins[idx - 1], mins[idx]
+            if next_x > prev_x:
+                frac = min(max((now_min - prev_x) / (next_x - prev_x), 0), 1)
+                y_i = y[idx - 1] + frac * (y[idx] - y[idx - 1])
+                x_sel = np.append(x_sel, now_min)
+                y_sel = np.append(y_sel, y_i)
+    alpha = 1.0 if sysn == "pgque" else 0.85
+    ax.plot(x_sel, y_sel, color=COLORS[sysn], linewidth=LW[sysn], alpha=alpha,
+            zorder=3 if sysn == "pgque" else 2, solid_capstyle="round")
+    if len(x_sel) > 0:
+        x_end, y_end = x_sel[-1], y_sel[-1]
+        ax.plot([x_end], [y_end], marker="o", markersize=5,
+                color=COLORS[sysn], markeredgecolor=BG,
+                markeredgewidth=1.0, zorder=5)
+        if (inline_threshold is not None and inline_fmt is not None
+                and inline_out is not None and y_end >= inline_threshold):
+            inline_out.append({
+                "sysn": sysn,
+                "x": x_end,
+                "y_true": y_end,
+                "text": f"{sysn}: {inline_fmt(y_end, None)}",
+                "color": COLORS[sysn],
+                "weight": "bold" if sysn == "pgque" else "normal",
+            })
+    return y_sel[-1] if len(y_sel) else None
+
+
+# v13: arrows form a "V" — pgque tail LEFT of cursor, pgq tail RIGHT of
+# cursor. Both heads meet at the cursor on the pgque/pgq lines, so the
+# two shafts diverge outward at ~45°. Labels sit directly under each tail
+# (no separate stagger variable — label x = tail x).
+# v14 spacing: tail at -0.06, label at -0.07 (label sits just below the
+# arrow tail — much snugger than v13's -0.06 / -0.10).
+TAIL_DX_MIN = 5.0          # tail offset in data-minutes; tuned for ~45°
+ARROW_TAIL_AX_FRAC  = -0.06   # unchanged from v13
+ARROW_LABEL_AX_FRAC = -0.07   # v14: raised from -0.10, snug under tail
+
+
+def _draw_cursor_arrows(ax, series, y_key, y_peak, now_min):
+    """Draw two hairline up-arrows in a divergent "V" pattern. Head of each
+    arrow is on the pgque/pgq line at x = now_min (pure data coords). Tail
+    is offset outward by TAIL_DX_MIN minutes (pgque tail left, pgq tail
+    right) and below the axes by ARROW_TAIL_AX_FRAC in axes-fraction — so
+    each shaft tilts outward at ~45° in pixel space.
+
+    If a tail would fall outside the axes x-range (near t=0 / t=150), flip
+    that arrow's direction so it stays on-axis, then widen the gap if the
+    flip would make the two labels collide.
+    """
+    x_data_min, x_data_max = ax.get_xlim()
+    now_clamped = max(min(now_min, x_data_max), x_data_min)
+
+    # Directional tail offset per system (pgque goes left, pgq goes right).
+    tail_dir = {"pgque": -1, "pgq": +1}
+    tail_x_raw = {s: now_clamped + tail_dir[s] * TAIL_DX_MIN for s in tail_dir}
+
+    # Edge handling: if a tail is off-axis, flip it to the other side. If
+    # BOTH end up on the same side after flipping, spread them apart with
+    # ±TAIL_DX_MIN around the cursor (falling back to clamped positions).
+    for s in ("pgque", "pgq"):
+        if tail_x_raw[s] < x_data_min or tail_x_raw[s] > x_data_max:
+            tail_x_raw[s] = now_clamped - tail_dir[s] * TAIL_DX_MIN
+    # If the flip made them coincide / cross, rebalance.
+    if abs(tail_x_raw["pgque"] - tail_x_raw["pgq"]) < TAIL_DX_MIN:
+        # place them symmetrically around the cursor, then clamp joint-wise
+        lo = max(now_clamped - TAIL_DX_MIN, x_data_min)
+        hi = lo + 2 * TAIL_DX_MIN
+        if hi > x_data_max:
+            hi = x_data_max
+            lo = hi - 2 * TAIL_DX_MIN
+        tail_x_raw = {"pgque": lo, "pgq": hi}
+    # Final per-axis clamp (belt & suspenders).
+    tail_x_for = {s: max(min(tail_x_raw[s], x_data_max), x_data_min)
+                  for s in ("pgque", "pgq")}
+
+    # Blended transform: x = data, y = axes-fraction.
+    tail_trans = mtransforms.blended_transform_factory(ax.transData, ax.transAxes)
+
+    for sysn in ("pgque", "pgq"):
+        if sysn not in series:
+            continue
+        mins = series[sysn]["mins"]
+        ys   = series[sysn][y_key]
+        if len(mins) == 0 or now_clamped < mins[0] or now_clamped > mins[-1]:
+            continue
+        y_target = float(np.interp(now_clamped, mins, ys))
+        weight = "bold" if sysn == "pgque" else "normal"
+        tail_x = tail_x_for[sysn]
+        # Arrow: head on the line (data coords at now_clamped), tail at
+        # (tail_x, ARROW_TAIL_AX_FRAC). Tilted ~45° because head x != tail x.
+        ax.annotate(
+            "",
+            xy=(now_clamped, y_target),
+            xycoords=ax.transData,
+            xytext=(tail_x, ARROW_TAIL_AX_FRAC),
+            textcoords=tail_trans,
+            arrowprops=dict(
+                arrowstyle="->",
+                color=COLORS[sysn],
+                linewidth=0.8,
+                shrinkA=0, shrinkB=2,
+            ),
+            annotation_clip=False,
+            zorder=12,
+        )
+        # Label sits directly under its OWN tail.
+        ax.text(tail_x, ARROW_LABEL_AX_FRAC, sysn,
+                transform=tail_trans,
+                ha="center", va="top",
+                fontsize=8.5, color=COLORS[sysn],
+                fontweight=weight,
+                clip_on=False, zorder=12)
+
+
+def _place_inline_labels(ax, proposals, y_peak, gap_frac=0.04):
+    """Top-down vertical nudge so labels don't overlap.
+    Drops any label pushed out of [0, y_peak]. Draws with monospace.
+    """
+    if not proposals:
+        return
+    gap = y_peak * gap_frac
+    # Sort descending by true y; walk top-down, push each label down if too close
+    ordered = sorted(proposals, key=lambda p: -p["y_true"])
+    prev_y = None
+    for p in ordered:
+        y = p["y_true"]
+        if prev_y is not None and y > prev_y - gap:
+            y = prev_y - gap
+        p["y_draw"] = y
+        prev_y = y
+    dx = X_MAX_MIN * 0.015
+    for p in ordered:
+        y = p["y_draw"]
+        if y < 0 or y > y_peak:
+            continue  # label forced out of frame — drop it
+        ax.text(p["x"] + dx, y, p["text"],
+                color=p["color"], fontsize=8.5,
+                fontfamily="monospace", fontweight=p["weight"],
+                ha="left", va="center", zorder=6)
+
+
+def render_frame(bloat, latency, frame_idx, total_frames, y_peak_dead, y_peak_lat):
+    now_min = X_MAX_MIN * (frame_idx / max(total_frames - 1, 1))
+
+    fig = plt.figure(figsize=(FIG_W_IN, FIG_H_IN), dpi=DPI)
+    # No title block — top whitespace reclaimed for the axes.
+    # v12: hspace + bottom margin sized so each panel has figure-level
+    # whitespace BELOW its x-axis tick labels for the pgque/pgq arrows
+    # + labels (which live outside the axes via negative axes-fraction y).
+    # Tuned so top panel bottom lands at ~42% of figure height (not
+    # compressed) while leaving room below bot-panel for arrow+label.
+    gs = fig.add_gridspec(2, 1, hspace=0.45, top=0.95, bottom=0.10,
+                          left=0.10, right=0.96,
+                          height_ratios=[1.0, 0.58])
+    ax1 = fig.add_subplot(gs[0])
+    ax2 = fig.add_subplot(gs[1])
+
+    for ax in (ax1, ax2):
+        _apply_phase_bands(ax)
+
+    # Phase labels on top axis only
+    for start, end, kind in PHASE_BOUNDS:
+        label = ("xmin horizon blocked · 1h" if kind == "tx"
+                 else ("clean baseline · 30m" if start == 0 else "clean recovery · 1h"))
+        color = RED if kind == "tx" else FG_DIM
+        is_current = (start <= now_min < end)
+        weight = "bold" if (is_current or kind == "tx") else "normal"
+        alpha = 1.0 if is_current else (0.7 if kind == "tx" else 0.55)
+        ax1.text((start + end) / 2, 1.04, label,
+                 transform=ax1.get_xaxis_transform(),
+                 ha="center", color=color, fontsize=10,
+                 fontweight=weight, alpha=alpha)
+
+    # --- panel 1: dead tuples ---
+    dead_vals = {}
+    dead_inline = []
+    for sysn in SYSTEMS:
+        if sysn in bloat:
+            v = _plot_series(ax1, bloat, sysn, now_min, "dead",
+                             inline_threshold=INLINE_MIN_DEAD,
+                             inline_fmt=fmt_y_dead,
+                             inline_out=dead_inline)
+            if v is not None:
+                dead_vals[sysn] = v
+    _place_inline_labels(ax1, dead_inline, y_peak_dead)
+    _draw_cursor_arrows(ax1, bloat, "dead", y_peak_dead, now_min)
+
+    # --- panel 2: latency ---
+    lat_vals = {}
+    lat_inline = []
+    for sysn in SYSTEMS:
+        if sysn in latency:
+            v = _plot_series(ax2, latency, sysn, now_min, "lat", y_clip=LAT_CLIP_MS,
+                             inline_threshold=INLINE_MIN_LAT_MS,
+                             inline_fmt=fmt_y_lat,
+                             inline_out=lat_inline)
+            if v is not None:
+                lat_vals[sysn] = v
+    _place_inline_labels(ax2, lat_inline, y_peak_lat)
+    _draw_cursor_arrows(ax2, latency, "lat", y_peak_lat, now_min)
+
+    # Hinge flash — dark theme: black-bg + bright-red border + bright-red text
+    hinge_min = 30
+    frames_since_hinge = (now_min - hinge_min) / X_MAX_MIN * (total_frames - 1)
+    if 0 <= frames_since_hinge <= FLASH_FRAMES:
+        flash_alpha = 1.0 - (frames_since_hinge / FLASH_FRAMES)
+        ax1.axvline(x=hinge_min, color=RED, linewidth=3.5,
+                    alpha=flash_alpha * 0.85, zorder=6)
+        ax2.axvline(x=hinge_min, color=RED, linewidth=3.5,
+                    alpha=flash_alpha * 0.85, zorder=6)
+        ax1.text(hinge_min + 1, y_peak_dead * 0.92,
+                 "xmin horizon blocked\nvacuum can't advance",
+                 color=RED, fontsize=11, fontweight="bold",
+                 alpha=flash_alpha, zorder=7,
+                 bbox=dict(boxstyle="round,pad=0.3", facecolor=BG,
+                           edgecolor=RED, linewidth=1.2, alpha=flash_alpha))
+
+    # Now cursor on both panels
+    for ax, y_peak in ((ax1, y_peak_dead), (ax2, y_peak_lat)):
+        ax.axvline(x=now_min, color=FG_EMPH, linewidth=1.2, alpha=0.70, zorder=8)
+        ax.plot([now_min], [y_peak * 1.005], marker="v", color=FG_EMPH,
+                markersize=7, zorder=9, clip_on=False)
+
+    # Phase / time readout
+    kind, p_start, p_end = current_phase(now_min)
+    ribbon_color = RED if kind == "tx" else FG_DIM
+    ribbon_label = ("XMIN HORIZON BLOCKED (vacuum stalled)" if kind == "tx"
+                    else ("CLEAN BASELINE" if p_start == 0 else "CLEAN RECOVERY"))
+    # Footer: phase label stays sans; t = ... is mono for stable digit columns.
+    # Compose as two adjacent fig.text calls anchored around figure center.
+    phase_text = f"phase: {ribbon_label}"
+    time_text  = f"t = {now_min:5.1f} min / 150 min"
+    fig.text(0.5, 0.015, phase_text + "  |  ",
+             ha="right", fontsize=10, color=ribbon_color, fontweight="bold")
+    fig.text(0.5, 0.015, time_text,
+             ha="left", fontsize=10, color=ribbon_color, fontweight="bold",
+             fontfamily="monospace")
+
+    # Axes
+    for ax in (ax1, ax2):
+        ax.set_xlim(-1, X_MAX_MIN)
+        ax.set_xticks([0, 30, 60, 90, 120, 150])
+        ax.set_xticklabels(["0", "30m", "1h", "1h30", "2h", "2h30"])
+    # v8: no below-axis headroom (start at 0), but keep '0' tick label hidden
+    # so the baseline gridline stays without the "0" text crowding the axis.
+    # v12: clean axis bounds — no negative headroom. The arrows live
+    # OUTSIDE the axes rectangle via mixed (data, axes-fraction) transforms
+    # and annotation_clip=False, so the plot data area is not compressed.
+    ax1.set_ylim(0, y_peak_dead * 1.05)
+    ax1.yaxis.set_major_formatter(FuncFormatter(fmt_y_dead))
+    ax1.set_ylabel("event-table dead tuples", labelpad=8)
+
+    ax2.set_ylim(0, y_peak_lat * 1.05)
+    ax2.yaxis.set_major_formatter(FuncFormatter(fmt_y_lat))
+    ax2.set_ylabel("consumer latency (mean)", labelpad=8)
+
+    # Mono-only for numeric y-tick labels; hide the '0' tick.
+    for ax in (ax1, ax2):
+        for lbl, pos in zip(ax.get_yticklabels(), ax.get_yticks()):
+            lbl.set_fontfamily("monospace")
+            if pos == 0:
+                lbl.set_visible(False)
+
+    # Legend: uniform row grid, baseline-aligned, two cells per row.
+    # Name (sans) right-aligned at x=0.88; value (mono) right-aligned at x=0.985.
+    # Using va="baseline" for every cell so sans and mono glyphs sit on the
+    # same typographic baseline regardless of family metrics.
+    # ROW_SPACING is in axes-fraction — bottom panel is ~0.58x the height of
+    # top, so we scale to keep visual row rhythm consistent in physical pixels.
+    def _draw_legend(ax, vals, fmt, header, row_spacing, header_gap):
+        header_y   = 0.97
+        rows_start = header_y - header_gap
+        ax.text(0.985, header_y, header,
+                transform=ax.transAxes, ha="right", va="baseline",
+                color=FG_EMPH, fontsize=8.5, zorder=10)
+        visible = [s for s in SYSTEMS if s in vals]
+        for i, sysn in enumerate(visible):
+            cur = vals[sysn]
+            weight = "bold" if sysn == "pgque" else "normal"
+            y = rows_start - i * row_spacing
+            ax.text(0.88, y, sysn,
+                    transform=ax.transAxes, ha="right", va="baseline",
+                    color=COLORS[sysn], fontsize=9, fontweight=weight,
+                    zorder=10)
+            ax.text(0.985, y, fmt(cur, None),
+                    transform=ax.transAxes, ha="right", va="baseline",
+                    color=COLORS[sysn], fontsize=9, fontweight=weight,
+                    fontfamily="monospace", zorder=10)
+
+    # Scale row spacing by inverse of panel height so visual pixel rhythm
+    # matches between the hero (top) and supporting (bottom) panels.
+    _draw_legend(ax1, dead_vals, fmt_y_dead, "dead tuples (event tables)",
+                 row_spacing=0.065, header_gap=0.05)
+    _draw_legend(ax2, lat_vals, fmt_y_lat, "latency (mean)",
+                 row_spacing=0.112, header_gap=0.086)
+
+    fig.canvas.draw()
+    buf = np.asarray(fig.canvas.buffer_rgba())
+    img = Image.fromarray(buf).convert("RGB")
+    plt.close(fig)
+    return img
+
+
+def build_animation():
+    print("Loading bloat CSVs...")
+    bloat = load_bloat()
+    print(f"  bloat series: {sorted(bloat.keys())}")
+    print("Loading latency from consumer logs...")
+    latency = load_latency()
+    print(f"  latency series: {sorted(latency.keys())}")
+
+    y_peak_dead = max(bloat[s]["dead"].max() for s in bloat)
+    peak_lat_observed = max(
+        min(latency[s]["lat"].max(), LAT_CLIP_MS) for s in latency
+    )
+    y_peak_lat = peak_lat_observed
+    print(f"  y_peak dead = {y_peak_dead:.0f}, y_peak lat = {y_peak_lat:.1f} ms")
+
+    print(f"Rendering {FRAME_COUNT} frames ({FIG_W_IN*DPI:.0f}x{FIG_H_IN*DPI:.0f} px)...")
+    frames = []
+    for i in range(FRAME_COUNT):
+        if i % 50 == 0:
+            print(f"  frame {i}/{FRAME_COUNT}")
+        frames.append(render_frame(bloat, latency, i, FRAME_COUNT, y_peak_dead, y_peak_lat))
+    for _ in range(HOLD_FRAMES):
+        frames.append(frames[-1])
+
+    frames[-1].save(OUT_HERO, format="PNG", optimize=True)
+    print(f"Wrote hero PNG: {OUT_HERO}")
+
+    gifski = shutil.which("gifski")
+    if gifski:
+        print("Using gifski...")
+        tmp_dir = Path("/tmp/r4_gif_frames_v17_solarized")
+        if tmp_dir.exists():
+            shutil.rmtree(tmp_dir)
+        tmp_dir.mkdir()
+        for idx, fr in enumerate(frames):
+            fr.save(tmp_dir / f"frame_{idx:04d}.png")
+        # Solarized dark has large flat-color background regions (base03
+        # figure bg + base02 TX phase wash) that low gifski quality turns
+        # into visible speckle. Push quality to default=90, add --extra for
+        # the slower-but-cleaner palette path, and --lossy-quality=100 to
+        # disable the noise/streak dither that produces the speckle.
+        subprocess.run(
+            [gifski, "-o", str(OUT_GIF), "--fps", str(FPS),
+             "--width", str(int(FIG_W_IN * DPI)),
+             "--quality", "68",
+             "--lossy-quality", "75",
+             "--extra",
+             *sorted(str(p) for p in tmp_dir.glob("*.png"))],
+            check=True,
+        )
+        shutil.rmtree(tmp_dir)
+    else:
+        print("gifski missing; using PIL...")
+        pal_frames = [f.quantize(colors=128, method=Image.MEDIANCUT, dither=Image.Dither.NONE)
+                      for f in frames]
+        pal_frames[0].save(OUT_GIF, save_all=True,
+                           append_images=pal_frames[1:],
+                           duration=int(1000 / FPS), loop=0,
+                           optimize=True, disposal=2)
+    print(f"Wrote GIF: {OUT_GIF} ({OUT_GIF.stat().st_size/1e6:.2f} MB)")
+
+
+if __name__ == "__main__":
+    build_animation()
diff --git a/benchmark/install/README.md b/benchmark/install/README.md
new file mode 100644
index 00000000..3517d838
--- /dev/null
+++ b/benchmark/install/README.md
@@ -0,0 +1,19 @@
+# Install scripts
+
+All seven VMs share `bootstrap.sh` as the base — installs PG18 (PGDG) + pg_cron + pg_stat_statements + pg_ash + pgfr, applies postgresql.conf tuning, enables local trust auth, restarts PG.
+
+Then each VM layers a system-specific install on top:
+
+| System | Per-system installer | Notes |
+|---|---|---|
+| pgque | — (cloned + `make install` in userdata before bootstrap — PR #62 branch) | The pgque binary install happens in AMI user-data; bootstrap only prepares PG. Actual pgque SQL install: `git clone NikolayS/pgque -b feat/rotation`, `make USE_PGXS=1 install`, `psql -f build/pgque.sql`, `SELECT pgque.create_queue('bench_queue')`. |
+| pgq | [install_pgq.sh](install_pgq.sh) | PL-only fork via `switch_plonly.sql`. |
+| pgmq | [install_pgmq.sh](install_pgmq.sh) | SQL-only install from tembo-io/pgmq v1.11.0 raw URL. |
+| pgmq-partitioned | [install_pgmq.sh](install_pgmq.sh) + `pgmq.create_partitioned(...)` + pg_partman | See [pgmq-partitioned_setup_5min.sql](pgmq-partitioned_setup_5min.sql) for the partman cron schedule. |
+| river | [install_river.sh](install_river.sh) | `go install` the CLI, run `river migrate-up`. |
+| que | — (Ruby gem installed in VM userdata) | `gem install que -v 2.4.x`, `bundle exec que:install`. The VM userdata handles this; the bench only consumes the schema. |
+| pgboss | [install_pgboss.sh](install_pgboss.sh) | `npm install pg-boss@12.15` locally on the VM, run `new PgBoss(DSN).start()` once to migrate schema. |
+
+There is **no** `install_que.sh` or `install_pgmq-partitioned.sh` as separate shell files — the que gem install is driven from the AMI user-data (Ruby + bundler), and pgmq-partitioned reuses `install_pgmq.sh` + `pg_partman` + a single-call SQL script.
+
+Reset procedure between runs: [../runners/clean_reinstall.sh](../runners/clean_reinstall.sh). See [../OPS_GOTCHAS.md §4, §5, §7](../OPS_GOTCHAS.md) for the schema-drop pitfalls.
diff --git a/benchmark/install/bootstrap.sh b/benchmark/install/bootstrap.sh
new file mode 100644
index 00000000..84c6c8ad
--- /dev/null
+++ b/benchmark/install/bootstrap.sh
@@ -0,0 +1,95 @@
+#!/usr/bin/env bash
+set -euo pipefail
+export DEBIAN_FRONTEND=noninteractive
+
+echo "=== [$(hostname)] bootstrap start $(date -u +%FT%TZ) ==="
+
+# Wait for any cloud-init apt locks to clear
+for i in $(seq 1 60); do
+  if ! sudo fuser /var/lib/dpkg/lock-frontend >/dev/null 2>&1; then break; fi
+  echo "  waiting for dpkg lock..."
+  sleep 5
+done
+
+sudo apt-get update -qq
+sudo apt-get install -y curl gnupg lsb-release build-essential git python3-pip python3-psycopg2 tmux jq -qq
+
+# PGDG
+curl -fsSL https://www.postgresql.org/media/keys/ACCC4CF8.asc \
+  | sudo gpg --dearmor -o /usr/share/keyrings/postgresql.gpg
+echo "deb [signed-by=/usr/share/keyrings/postgresql.gpg] https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" \
+  | sudo tee /etc/apt/sources.list.d/postgresql.list >/dev/null
+
+sudo apt-get update -qq
+sudo apt-get install -y postgresql-18 postgresql-server-dev-18 postgresql-18-cron -qq
+
+sudo pip3 install psycopg2-binary --break-system-packages >/dev/null 2>&1 || true
+
+# pg_ash (pure SQL, no make)
+sudo rm -rf /tmp/pg_ash
+sudo git clone https://github.com/NikolayS/pg_ash /tmp/pg_ash
+
+# pg-flight-recorder (pure SQL, no make)
+sudo rm -rf /tmp/pgfr
+sudo git clone https://github.com/NikolayS/pg-flight-recorder /tmp/pgfr
+sudo ln -sfn /tmp/pgfr/_record/sql /record_sql
+sudo ln -sfn /tmp/pgfr/_analyze/sql /analyze_sql
+
+sudo mkdir -p /data
+sudo chown postgres:postgres /data
+
+# ── postgresql.conf tuning (same as issue #77) ───────────────────────────────
+sudo tee -a /etc/postgresql/18/main/postgresql.conf >/dev/null <<'CONF'
+
+# ── pgq bench tuning (issue #77) ─────────────────────────────────────────────
+shared_preload_libraries = 'pg_stat_statements,pg_cron'
+cron.database_name = 'bench'
+
+shared_buffers = 4GB
+effective_cache_size = 12GB
+
+synchronous_commit = off
+wal_level = minimal
+wal_compression = lz4
+max_wal_size = 16GB
+checkpoint_completion_target = 0.9
+
+bgwriter_delay = 50ms
+bgwriter_lru_maxpages = 400
+bgwriter_lru_multiplier = 4.0
+
+random_page_cost = 1.1
+effective_io_concurrency = 200
+max_connections = 200
+
+max_wal_senders = 0
+
+autovacuum_vacuum_scale_factor   = 0.01
+autovacuum_analyze_scale_factor  = 0.01
+autovacuum_vacuum_cost_delay     = 2ms
+
+jit = off
+
+listen_addresses = 'localhost'
+CONF
+
+# trust local for bench workers
+sudo sed -i '/^host.*127.0.0.1.*scram-sha-256/i host all postgres 127.0.0.1/32 trust' \
+  /etc/postgresql/18/main/pg_hba.conf
+
+sudo systemctl restart postgresql@18-main
+
+# create bench DB
+sudo -u postgres psql -c "CREATE DATABASE bench;"
+sudo -u postgres psql -d bench -c "CREATE EXTENSION IF NOT EXISTS pg_stat_statements; CREATE EXTENSION IF NOT EXISTS pg_cron;"
+
+# pg_ash install + start sampling
+sudo -u postgres psql -d bench -f /tmp/pg_ash/sql/ash-install.sql >/dev/null
+sudo -u postgres psql -d bench -c "SELECT * FROM ash.start();" >/dev/null
+
+# pgfr install
+sudo -u postgres psql -d bench -f /tmp/pgfr/_record/install.sql >/dev/null 2>&1 || true
+sudo -u postgres psql -d bench -f /tmp/pgfr/_analyze/install.sql >/dev/null 2>&1 || true
+sudo -u postgres psql -d bench -f /tmp/pgfr/_control/install.sql >/dev/null 2>&1 || true
+
+echo "=== [$(hostname)] bootstrap DONE $(date -u +%FT%TZ) ==="
diff --git a/benchmark/install/install_pgboss.sh b/benchmark/install/install_pgboss.sh
new file mode 100644
index 00000000..5a04d458
--- /dev/null
+++ b/benchmark/install/install_pgboss.sh
@@ -0,0 +1,70 @@
+#!/usr/bin/env bash
+# pgboss install with covering index for polling query (avoids 7.9 GB disk sort pathology).
+set -euo pipefail
+echo "=== install pg-boss v12.15.0 + Node worker + covering index ==="
+
+curl -fsSL https://deb.nodesource.com/setup_20.x | sudo bash - >/dev/null 2>&1
+sudo apt-get install -y nodejs -qq
+sudo npm install -g pg-boss@12.15.0 >/dev/null 2>&1
+
+# Schema + queue creation
+sudo NODE_PATH=/usr/lib/node_modules node -e "
+const { PgBoss } = require('pg-boss');
+const boss = new PgBoss('postgres://postgres@127.0.0.1/bench');
+boss.start().then(async () => {
+  await boss.createQueue('bench_queue');
+  console.log('pgboss schema + bench_queue ready');
+  process.exit(0);
+}).catch(e => { console.error(e.message); process.exit(1); });
+"
+
+# Covering index for the polling query ORDER BY priority DESC, created_on, id WHERE state<'active'
+# pgboss.job is PARTITIONED — must target the specific leaf partition for bench_queue.
+# The partition name is j<first-40-chars-of-sha1(name)>.
+sudo -u postgres psql -d bench > /tmp/pgboss_index.log 2>&1 <<'SQL'
+DO $$
+DECLARE
+  partition_name text;
+BEGIN
+  -- Find the leaf partition for bench_queue by asking pg_inherits.
+  -- bench_queue was created via boss.createQueue; pg-boss 12 creates a named partition per queue.
+  SELECT c.relname INTO partition_name
+  FROM pg_inherits i
+  JOIN pg_class c ON c.oid = i.inhrelid
+  JOIN pg_class p ON p.oid = i.inhparent
+  JOIN pg_namespace np ON np.oid = p.relnamespace
+  WHERE np.nspname = 'pgboss' AND p.relname = 'job'
+    AND pg_get_expr(c.relpartbound, c.oid) LIKE '%bench_queue%'
+  LIMIT 1;
+
+  IF partition_name IS NULL THEN
+    -- Fallback: try job_common (shared partition when no dedicated one exists)
+    partition_name := 'job_common';
+  END IF;
+
+  RAISE NOTICE 'Creating covering index on pgboss.%', partition_name;
+  EXECUTE format(
+    'CREATE INDEX IF NOT EXISTS idx_bench_polling ON pgboss.%I (name, priority DESC, created_on, id) WHERE state < ''active''',
+    partition_name);
+END
+$$;
+
+-- Verify:
+SELECT schemaname, tablename, indexname FROM pg_indexes
+WHERE schemaname='pgboss' AND indexname='idx_bench_polling';
+SQL
+cat /tmp/pgboss_index.log
+
+# Worker script
+sudo tee /root/pgboss_worker.js >/dev/null <<'EOF'
+const { PgBoss } = require('pg-boss');
+const boss = new PgBoss('postgres://postgres@127.0.0.1/bench');
+boss.start().then(() => {
+  boss.work('bench_queue', { teamSize: 8, pollingIntervalSeconds: 0.5 }, async jobs => Promise.resolve());
+  console.log('pgboss worker running');
+}).catch(e => { console.error(e); process.exit(1); });
+process.on('SIGTERM', () => boss.stop({ graceful: false }).then(() => process.exit(0)));
+process.on('SIGINT',  () => boss.stop({ graceful: false }).then(() => process.exit(0)));
+EOF
+
+echo "=== pgboss installed + covering index ==="
diff --git a/benchmark/install/install_pgmq.sh b/benchmark/install/install_pgmq.sh
new file mode 100644
index 00000000..0b9a802e
--- /dev/null
+++ b/benchmark/install/install_pgmq.sh
@@ -0,0 +1,11 @@
+#!/usr/bin/env bash
+set -euo pipefail
+echo "=== install pgmq v1.11.0 (SQL-only) ==="
+
+curl -fsSL \
+  https://raw.githubusercontent.com/tembo-io/pgmq/v1.11.0/pgmq-extension/sql/pgmq.sql \
+  -o /tmp/pgmq.sql
+
+sudo -u postgres psql -d bench -f /tmp/pgmq.sql
+sudo -u postgres psql -d bench -c "SELECT pgmq.create('bench_queue');"
+echo "=== pgmq installed ==="
diff --git a/benchmark/install/install_pgq.sh b/benchmark/install/install_pgq.sh
new file mode 100644
index 00000000..f8ab8823
--- /dev/null
+++ b/benchmark/install/install_pgq.sh
@@ -0,0 +1,30 @@
+#!/usr/bin/env bash
+set -euo pipefail
+echo "=== install pgq v3.5.1 ==="
+
+sudo rm -rf /tmp/pgq
+sudo git clone --branch v3.5.1 --depth 1 https://github.com/pgq/pgq /tmp/pgq
+cd /tmp/pgq
+sudo make USE_PGXS=1 PG_CONFIG=/usr/lib/postgresql/18/bin/pg_config >/dev/null
+sudo make USE_PGXS=1 install >/dev/null
+
+sudo -u postgres psql -d bench -c "CREATE EXTENSION pgq;"
+# Switch to PL-only mode — replace C insert_event_raw with PL/pgSQL version
+sudo -u postgres psql -d bench -f /tmp/pgq/sql/switch_plonly.sql
+
+# Verify PL-only
+lang=$(sudo -u postgres psql -d bench -Atc \
+  "SELECT l.lanname FROM pg_proc p JOIN pg_language l ON p.prolang=l.oid WHERE proname='insert_event_raw' AND pronamespace=(SELECT oid FROM pg_namespace WHERE nspname='pgq');")
+if [ "$lang" != "plpgsql" ]; then
+  echo "ERROR: pgq still on C (lang=$lang)"; exit 1
+fi
+echo "pgq PL-only confirmed (lang=$lang)"
+
+sudo -u postgres psql -d bench -c "SELECT pgq.create_queue('bench_queue');"
+sudo -u postgres psql -d bench -c "SELECT pgq.ticker('bench_queue');"
+sudo -u postgres psql -d bench -c "SELECT pgq.register_consumer('bench_queue', 'bench_consumer');"
+# Ticker via pg_cron (60s interval, bench.py also ticks inline)
+sudo -u postgres psql -d bench -c "SELECT cron.schedule('pgq-ticker', '* * * * *', \$\$SELECT pgq.ticker()\$\$);"
+# And maint (rotation etc)
+sudo -u postgres psql -d bench -c "SELECT cron.schedule('pgq-maint', '* * * * *', \$\$SELECT pgq.maint_operations()\$\$);" || true
+echo "=== pgq installed ==="
diff --git a/benchmark/install/install_river.sh b/benchmark/install/install_river.sh
new file mode 100644
index 00000000..afdf8ee2
--- /dev/null
+++ b/benchmark/install/install_river.sh
@@ -0,0 +1,80 @@
+#!/usr/bin/env bash
+set -euo pipefail
+echo "=== install river v0.34.0 + Go worker ==="
+
+# i4i.2xlarge is x86_64
+curl -fsSL https://go.dev/dl/go1.22.4.linux-amd64.tar.gz | sudo tar -C /usr/local -xz
+export PATH=$PATH:/usr/local/go/bin
+echo 'export PATH=$PATH:/usr/local/go/bin:/root/go/bin' | sudo tee -a /root/.bashrc >/dev/null
+
+sudo bash -c "export PATH=\$PATH:/usr/local/go/bin; go install github.com/riverqueue/river/cmd/river@v0.34.0"
+
+sudo -u postgres psql -d bench -c "CREATE EXTENSION IF NOT EXISTS pgcrypto;"
+sudo /root/go/bin/river migrate-up --database-url "postgres://postgres@127.0.0.1/bench"
+
+# Go worker program (river CLI has no 'work' subcommand in 0.34)
+sudo mkdir -p /root/riverworker
+sudo tee /root/riverworker/go.mod >/dev/null <<'EOF'
+module riverworker
+
+go 1.22
+EOF
+
+sudo tee /root/riverworker/main.go >/dev/null <<'EOF'
+package main
+
+import (
+    "context"
+    "log"
+    "os/signal"
+    "syscall"
+
+    "github.com/jackc/pgx/v5/pgxpool"
+    "github.com/riverqueue/river"
+    "github.com/riverqueue/river/riverdriver/riverpgxv5"
+)
+
+type BenchArgs struct{}
+
+func (BenchArgs) Kind() string { return "bench" }
+
+type BenchWorker struct {
+    river.WorkerDefaults[BenchArgs]
+}
+
+func (w *BenchWorker) Work(ctx context.Context, job *river.Job[BenchArgs]) error {
+    return nil
+}
+
+func main() {
+    ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
+    defer cancel()
+
+    pool, err := pgxpool.New(ctx, "postgres://postgres@127.0.0.1/bench?sslmode=disable")
+    if err != nil {
+        log.Fatal(err)
+    }
+    defer pool.Close()
+
+    workers := river.NewWorkers()
+    river.AddWorker(workers, &BenchWorker{})
+
+    client, err := river.NewClient(riverpgxv5.New(pool), &river.Config{
+        Queues:  map[string]river.QueueConfig{"default": {MaxWorkers: 8}},
+        Workers: workers,
+    })
+    if err != nil {
+        log.Fatal(err)
+    }
+
+    if err := client.Start(ctx); err != nil {
+        log.Fatal(err)
+    }
+    <-ctx.Done()
+}
+EOF
+
+cd /root/riverworker
+sudo bash -c "cd /root/riverworker && export PATH=\$PATH:/usr/local/go/bin && go mod tidy && go build -o /root/riverworker/bench_worker main.go"
+
+echo "=== river installed, worker binary at /root/riverworker/bench_worker ==="
diff --git a/benchmark/install/pgmq-partitioned_setup_5min.sql b/benchmark/install/pgmq-partitioned_setup_5min.sql
new file mode 100644
index 00000000..f1fd1ffb
--- /dev/null
+++ b/benchmark/install/pgmq-partitioned_setup_5min.sql
@@ -0,0 +1,6 @@
+-- Schedule partman run_maintenance every minute (partitions are 5-min)
+SELECT cron.schedule('partman_run', '* * * * *', 'CALL public.run_maintenance_proc()');
+-- Verify
+SELECT jobname, schedule FROM cron.job WHERE jobname = 'partman_run';
+-- Test ash
+SELECT count(*) FROM ash.samples(p_interval => '1 minute'::interval, p_limit => 10);
diff --git a/benchmark/producers/producer_pgboss.sql b/benchmark/producers/producer_pgboss.sql
new file mode 100644
index 00000000..e4b90933
--- /dev/null
+++ b/benchmark/producers/producer_pgboss.sql
@@ -0,0 +1,2 @@
+INSERT INTO pgboss.job(name,data,priority)
+VALUES('bench_queue', json_build_object('data', repeat('x', 200))::jsonb, 0);
diff --git a/benchmark/producers/producer_pgmq-partitioned.sql b/benchmark/producers/producer_pgmq-partitioned.sql
new file mode 100644
index 00000000..27efcbb4
--- /dev/null
+++ b/benchmark/producers/producer_pgmq-partitioned.sql
@@ -0,0 +1 @@
+SELECT pgmq.send('bench_queue', json_build_object('data', repeat('x', 200))::jsonb);
diff --git a/benchmark/producers/producer_pgmq.sql b/benchmark/producers/producer_pgmq.sql
new file mode 100644
index 00000000..27efcbb4
--- /dev/null
+++ b/benchmark/producers/producer_pgmq.sql
@@ -0,0 +1 @@
+SELECT pgmq.send('bench_queue', json_build_object('data', repeat('x', 200))::jsonb);
diff --git a/benchmark/producers/producer_pgq.sql b/benchmark/producers/producer_pgq.sql
new file mode 100644
index 00000000..b816c778
--- /dev/null
+++ b/benchmark/producers/producer_pgq.sql
@@ -0,0 +1 @@
+SELECT pgq.insert_event('bench_queue', 'bench', repeat('x', 200));
diff --git a/benchmark/producers/producer_pgque.sql b/benchmark/producers/producer_pgque.sql
new file mode 100644
index 00000000..2f5bdfd2
--- /dev/null
+++ b/benchmark/producers/producer_pgque.sql
@@ -0,0 +1 @@
+SELECT pgque.send('bench_queue', json_build_object('data', repeat('x', 200))::jsonb);
diff --git a/benchmark/producers/producer_que.sql b/benchmark/producers/producer_que.sql
new file mode 100644
index 00000000..fe426577
--- /dev/null
+++ b/benchmark/producers/producer_que.sql
@@ -0,0 +1,4 @@
+INSERT INTO public.que_jobs(job_class, args, priority, queue, run_at, job_schema_version)
+VALUES('Bench',
+       jsonb_build_array(jsonb_build_object('data', repeat('x', 200))),
+       100, 'default', now(), 2);
diff --git a/benchmark/producers/producer_river.sql b/benchmark/producers/producer_river.sql
new file mode 100644
index 00000000..71aaf4c8
--- /dev/null
+++ b/benchmark/producers/producer_river.sql
@@ -0,0 +1,2 @@
+INSERT INTO river_job(kind,args,max_attempts,queue,state)
+VALUES('bench', json_build_object('data', repeat('x', 200))::jsonb, 25, 'default', 'available');
diff --git a/benchmark/runners/clean_reinstall.sh b/benchmark/runners/clean_reinstall.sh
new file mode 100644
index 00000000..30e54f53
--- /dev/null
+++ b/benchmark/runners/clean_reinstall.sh
@@ -0,0 +1,76 @@
+#!/usr/bin/env bash
+# Per-system clean reinstall + stats reset.
+set -uo pipefail
+SYS=${1:?system}
+
+echo "=== [$(date -u +%FT%TZ)] clean reinstall: $SYS ==="
+
+# Kill any leftover processes
+sudo pkill -f 'idle_in_tx' 2>/dev/null || true
+sudo pkill -f 'bench.py' 2>/dev/null || true
+sudo pkill -f 'bloat_sampler' 2>/dev/null || true
+sudo pkill -f 'pgq_ticker_daemon' 2>/dev/null || true
+sudo pkill -f 'bench_worker' 2>/dev/null || true
+sudo pkill -f 'que_worker' 2>/dev/null || true
+sudo pkill -f 'pgboss_worker' 2>/dev/null || true
+sleep 2
+
+# Terminate any dangling backends
+sudo -u postgres psql -d bench -c "
+  SELECT pg_terminate_backend(pid)
+  FROM pg_stat_activity
+  WHERE datname='bench' AND pid<>pg_backend_pid()
+    AND (state='idle in transaction' OR application_name LIKE 'idle_in_tx%' OR application_name LIKE 'pgq_ticker%');
+" >/dev/null 2>&1
+
+# Drop queue-specific artifacts
+case $SYS in
+  pgque)
+    sudo -u postgres psql -d bench -c "DROP SCHEMA IF EXISTS pgque CASCADE" >/dev/null
+    # Also unschedule any leftover pg_cron jobs named pgque_*
+    sudo -u postgres psql -d bench -Atc "SELECT jobname FROM cron.job WHERE jobname LIKE 'pgque%'" | while read j; do
+      [ -n "$j" ] && sudo -u postgres psql -d bench -c "SELECT cron.unschedule('$j')" >/dev/null
+    done
+    ;;
+  pgq)
+    sudo -u postgres psql -d bench -c "DROP EXTENSION IF EXISTS pgq CASCADE; DROP SCHEMA IF EXISTS pgq CASCADE" >/dev/null
+    sudo -u postgres psql -d bench -Atc "SELECT jobname FROM cron.job WHERE jobname LIKE 'pgq%'" | while read j; do
+      [ -n "$j" ] && sudo -u postgres psql -d bench -c "SELECT cron.unschedule('$j')" >/dev/null
+    done
+    ;;
+  pgmq)
+    sudo -u postgres psql -d bench -c "DROP SCHEMA IF EXISTS pgmq CASCADE" >/dev/null
+    ;;
+  river)
+    sudo -u postgres psql -d bench -c "
+      DO \$\$ DECLARE t text;
+      BEGIN
+        FOR t IN SELECT tablename FROM pg_tables WHERE schemaname='public' AND tablename LIKE 'river_%'
+        LOOP EXECUTE 'DROP TABLE IF EXISTS '||t||' CASCADE'; END LOOP;
+      END \$\$;
+    " >/dev/null
+    ;;
+  que)
+    sudo -u postgres psql -d bench -c "
+      DO \$\$ DECLARE t text;
+      BEGIN
+        FOR t IN SELECT tablename FROM pg_tables WHERE schemaname='public' AND tablename LIKE 'que_%'
+        LOOP EXECUTE 'DROP TABLE IF EXISTS '||t||' CASCADE'; END LOOP;
+      END \$\$;
+    " >/dev/null
+    ;;
+  pgboss)
+    sudo -u postgres psql -d bench -c "DROP SCHEMA IF EXISTS pgboss CASCADE" >/dev/null
+    ;;
+esac
+
+# Re-run install
+bash /tmp/install.sh
+
+# Full stats reset AFTER reinstall so the reinstall's own writes don't count
+sudo -u postgres psql -d bench -f /tmp/full_reset.sql >/dev/null
+
+# ANALYZE for fresh stats
+sudo -u postgres psql -d bench -c "ANALYZE" >/dev/null
+
+echo "=== [$(date -u +%FT%TZ)] clean reinstall done: $SYS ==="
diff --git a/benchmark/runners/fix_nvme_mount.sh b/benchmark/runners/fix_nvme_mount.sh
new file mode 100644
index 00000000..74bebc19
--- /dev/null
+++ b/benchmark/runners/fix_nvme_mount.sh
@@ -0,0 +1,65 @@
+#!/bin/bash
+# Format + mount NVMe instance store, move PG data dir to NVMe, re-setup symlink.
+set -euo pipefail
+DEV=/dev/nvme1n1
+MNT=/mnt/pgdata
+PG_DATA_SRC=/var/lib/postgresql/18/main
+PG_DATA_DST=$MNT/postgresql/18/main
+
+echo "=== pre-flight ==="
+lsblk | grep nvme1n1
+findmnt $MNT 2>&1 | head -1 || echo "$MNT not mounted"
+
+# Stop PG
+echo "=== stopping postgresql ==="
+sudo systemctl stop postgresql@18-main || true
+sleep 2
+
+# Format if needed
+if ! sudo blkid $DEV 2>&1 | grep -q xfs; then
+  echo "=== formatting $DEV as xfs ==="
+  sudo mkfs.xfs -f $DEV
+fi
+
+# Mount
+echo "=== mounting $DEV → $MNT (noatime,nodiratime) ==="
+sudo mkdir -p $MNT
+# /mnt/pgdata may currently be a non-empty dir (has the 'bench' symlink target).
+# Move contents aside first.
+sudo mkdir -p /tmp/_pgdata_old
+sudo mv $MNT/* /tmp/_pgdata_old/ 2>/dev/null || true
+sudo mount -o noatime,nodiratime $DEV $MNT
+# Restore any content (e.g. bench dir)
+sudo mv /tmp/_pgdata_old/* $MNT/ 2>/dev/null || true
+sudo rmdir /tmp/_pgdata_old 2>/dev/null || true
+
+# Create PG data path on NVMe
+sudo mkdir -p $MNT/postgresql/18
+sudo chown postgres:postgres $MNT/postgresql $MNT/postgresql/18
+
+# Move current data (if symlink points somewhere, follow)
+REAL_SRC=$(sudo readlink -f $PG_DATA_SRC)
+echo "=== PG data currently at: $REAL_SRC (mv to $PG_DATA_DST) ==="
+if [ -d "$REAL_SRC" ] && [ "$REAL_SRC" != "$PG_DATA_DST" ]; then
+  sudo mv "$REAL_SRC" $PG_DATA_DST
+fi
+# Remove the source (it's either a symlink now or gone)
+sudo rm -rf $PG_DATA_SRC
+# Make parent
+sudo mkdir -p $(dirname $PG_DATA_SRC)
+# Symlink
+sudo ln -s $PG_DATA_DST $PG_DATA_SRC
+sudo chown -h postgres:postgres $PG_DATA_SRC
+
+# Start PG
+echo "=== starting postgresql ==="
+sudo systemctl start postgresql@18-main
+sleep 3
+sudo systemctl status postgresql@18-main --no-pager | head -5
+
+# Verify
+echo "=== verify ==="
+findmnt $MNT
+df $MNT
+sudo -u postgres psql -tAc "SHOW data_directory"
+sudo stat --format='%F %N' $PG_DATA_SRC
diff --git a/benchmark/runners/run_r7.sh b/benchmark/runners/run_r7.sh
new file mode 100644
index 00000000..44eb9386
--- /dev/null
+++ b/benchmark/runners/run_r7.sh
@@ -0,0 +1,115 @@
+#!/usr/bin/env bash
+# run_r7.sh <system> [duration_s]
+# R7: 1.5h bench (30m clean + 30m idle-in-tx + 30m recovery)
+# Adds over R6:
+#   * sys_metrics_sampler.py running alongside bloat_sampler.py
+#   * pg_ash ASH + pgfr snapshots copied to CSV at end (schemas preserved)
+#
+# Same instrumented consumer NOTICE format as R6 for honest ev/s parsing.
+set -uo pipefail
+SYS=${1:?system}
+DUR=${2:-5400}  # 1.5h = 5400s
+
+R7_DIR=${R7_DIR:-/tmp/r7}
+R6_DIR=${R6_DIR:-/tmp/r6}
+# Consumer SQL may be in either /tmp/consumer.sql (pushed), /tmp/r7/consumer_<sys>.sql, or /tmp/r6/consumer_<sys>.sql
+for c in "$R7_DIR/consumer_${SYS}.sql" "/tmp/consumer.sql" "$R6_DIR/consumer_${SYS}.sql"; do
+  if [[ -f "$c" ]]; then CONSUMER_SQL="$c"; break; fi
+done
+if [[ -z "${CONSUMER_SQL:-}" ]]; then
+  echo "ERROR: no consumer SQL found for $SYS" >&2
+  exit 1
+fi
+echo "consumer: $CONSUMER_SQL"
+
+mkdir -p /tmp/bench && chmod 777 /tmp/bench
+rm -f /tmp/bench/*
+
+case $SYS in
+  pgque|pgq) CONS_C=1 ;;
+  *)         CONS_C=4 ;;
+esac
+
+# System-specific pre-run nudges
+if [[ "$SYS" == "pgque" ]]; then
+  sudo -u postgres psql -d bench -c "UPDATE pgque.queue SET queue_rotation_period='30 seconds'::interval WHERE queue_name='bench_queue';" >/dev/null
+fi
+if [[ "$SYS" == "pgq" ]]; then
+  sudo -u postgres psql -d bench -c "UPDATE pgq.queue SET queue_rotation_period='30 seconds'::interval WHERE queue_name='bench_queue';" >/dev/null
+  [[ -f /tmp/pgq_ticker_daemon.py ]] && sudo -u postgres nohup python3 /tmp/pgq_ticker_daemon.py > /tmp/bench/pgq_ticker.log 2>&1 < /dev/null &
+  sleep 1
+fi
+
+sudo -u postgres psql -d bench -c "SELECT pg_stat_statements_reset()" >/dev/null
+
+# --- samplers ---
+python3 /tmp/bloat_sampler.py --system "$SYS" --interval 30 --duration "$DUR" > /tmp/bench/bloat.csv &
+BLOAT_PID=$!
+
+# NEW in R7: sys_metrics_sampler.py — CPU/mem/disk every 10s, NVMe instance-store device
+python3 "$R7_DIR/sys_metrics_sampler.py" --interval 10 --duration "$DUR" --device nvme1n1 --out /tmp/bench/sys_metrics.csv > /tmp/bench/sys_metrics.log 2>&1 &
+SYSM_PID=$!
+
+# pgss snapshot cross-check (carried over from R6)
+python3 "$R6_DIR/pg_stat_statements_snapshot.py" \
+  --dsn "host=127.0.0.1 dbname=bench user=postgres" \
+  --interval 10 --duration "$DUR" \
+  --out /tmp/bench/pgss_timeseries.csv \
+  > /tmp/bench/pgss_snapshotter.log 2>&1 &
+PGSS_PID=$!
+
+# Producer — R7 full: -R 5000 (was 1000 in smoke/R6)
+pgbench -h 127.0.0.1 -U postgres -d bench -n -f /tmp/producer.sql \
+  -c 1 -j 1 -R 5000 -T "$DUR" -P 30 \
+  --aggregate-interval=10 --log --log-prefix=/tmp/bench/producer_agg \
+  > /tmp/bench/producer.log 2>&1 &
+PROD_PID=$!
+
+# Consumer (R6 instrumented SQL)
+pgbench -h 127.0.0.1 -U postgres -d bench -n -f "$CONSUMER_SQL" \
+  -c $CONS_C -j $CONS_C -T "$DUR" -P 30 \
+  --aggregate-interval=10 --log --log-prefix=/tmp/bench/consumer_agg \
+  > /tmp/bench/consumer.log 2>&1 &
+CONS_PID=$!
+
+# Phase scheduler — only open idle_in_tx for long runs. For --duration < 3000s (smoke), skip.
+if [[ "$DUR" -ge 3000 ]]; then
+(
+  sleep 1800
+  echo "[$(date -u +%FT%TZ)] opening idle_in_tx" >> /tmp/bench/phases.log
+  python3 /tmp/idle_in_tx.py > /tmp/bench/idle.log 2>&1 &
+  IDLE_PID=$!
+  sudo -u postgres psql -d bench -c "VACUUM VERBOSE" > /tmp/bench/vacuum_preTX.txt 2>&1
+  sleep 1800
+  echo "[$(date -u +%FT%TZ)] closing idle_in_tx" >> /tmp/bench/phases.log
+  kill $IDLE_PID 2>/dev/null
+  sudo -u postgres psql -d bench -c "VACUUM VERBOSE" > /tmp/bench/vacuum_postTX.txt 2>&1
+) &
+fi
+
+wait $PROD_PID
+wait $CONS_PID 2>/dev/null
+kill $BLOAT_PID 2>/dev/null
+kill $SYSM_PID  2>/dev/null
+kill $PGSS_PID  2>/dev/null
+sudo pkill -f idle_in_tx 2>/dev/null || true
+[[ "$SYS" == "pgq" ]] && sudo pkill -f pgq_ticker_daemon 2>/dev/null || true
+
+# End-of-run copies
+sudo -u postgres psql -d bench -c "COPY (SELECT sample_time, database_name, active_backends, wait_event, query_id FROM ash.samples(p_interval => '2 hour'::interval, p_limit => 2000000)) TO '/tmp/bench/ash.csv' CSV HEADER" 2>&1 | tee /tmp/bench/ash_copy.log
+
+# pgfr: dump the main snapshot tables (partitioned & v2 forms)
+sudo -u postgres psql -d bench -c "COPY (SELECT * FROM pgfr_record.snapshots) TO '/tmp/bench/pgfr_snapshots.csv' CSV HEADER" 2>&1 | tee /tmp/bench/pgfr_snapshots_copy.log
+sudo -u postgres psql -d bench -c "COPY (SELECT * FROM pgfr_record.table_snapshots) TO '/tmp/bench/pgfr_table_snapshots.csv' CSV HEADER" 2>&1 | tee /tmp/bench/pgfr_tables_copy.log || true
+sudo -u postgres psql -d bench -c "COPY (SELECT * FROM pgfr_record.statement_snapshots) TO '/tmp/bench/pgfr_statement_snapshots.csv' CSV HEADER" 2>&1 | tee /tmp/bench/pgfr_stmts_copy.log || true
+
+sudo -u postgres psql -d bench -c "COPY (SELECT query, calls, total_exec_time::bigint, rows FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 100) TO '/tmp/bench/pgss.csv' CSV HEADER"
+
+# Post-process: produce events_consumed_per_sec.csv
+python3 "$R6_DIR/parse_events_consumed.py" \
+  --bench-dir /tmp/bench \
+  --bucket 1 \
+  --system "$SYS" \
+  > /tmp/bench/events_consumed_parse.log 2>&1 || true
+
+echo "=== R7 done: $SYS ==="
diff --git a/benchmark/tooling/bloat_sampler.py b/benchmark/tooling/bloat_sampler.py
new file mode 100644
index 00000000..3498ae85
--- /dev/null
+++ b/benchmark/tooling/bloat_sampler.py
@@ -0,0 +1,86 @@
+#!/usr/bin/env python3
+"""bloat_sampler.py — sample queue table bloat every N seconds.
+For pgboss, also samples all partitions of pgboss.job.
+Output CSV: ts,table,n_live_tup,n_dead_tup,heap_bytes,toast_bytes,index_bytes,total_bytes
+"""
+import argparse, time, sys
+from datetime import datetime, timezone
+import psycopg2
+
+DSN = "host=127.0.0.1 dbname=bench user=postgres"
+
+# Per-system base queries; pgboss gets special treatment (partitions)
+TABLES_QUERY = {
+    "pgque": """
+        SELECT schemaname||'.'||relname FROM pg_stat_user_tables
+        WHERE schemaname = 'pgque' ORDER BY 1
+    """,
+    "pgq": """
+        SELECT schemaname||'.'||relname FROM pg_stat_user_tables
+        WHERE schemaname = 'pgq' ORDER BY 1
+    """,
+    "pgmq": """
+        SELECT schemaname||'.'||relname FROM pg_stat_user_tables
+        WHERE schemaname = 'pgmq' AND relname LIKE 'q_%' ORDER BY 1
+    """,
+    "river":  "SELECT 'public.river_job'",
+    "que":    "SELECT 'public.que_jobs'",
+    # pgboss: include parent + all partitions of pgboss.job + archive
+    "pgboss": """
+        SELECT n.nspname||'.'||c.relname
+        FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace
+        WHERE n.nspname = 'pgboss'
+          AND (c.relname IN ('job','archive')
+               OR c.relispartition
+               OR c.relkind = 'p')
+        ORDER BY 1
+    """,
+    "pgmq-partitioned": """
+        SELECT n.nspname||'.'||c.relname
+        FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace
+        WHERE n.nspname = 'pgmq'
+          AND (c.relname LIKE 'q_%' OR c.relname LIKE 'a_%')
+          AND (c.relispartition OR c.relkind = 'p' OR c.relkind = 'r')
+        ORDER BY 1
+    """,
+}
+
+def main():
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--system", required=True)
+    ap.add_argument("--interval", type=int, default=30)
+    ap.add_argument("--duration", type=int, default=1800)
+    args = ap.parse_args()
+
+    conn = psycopg2.connect(DSN); conn.autocommit = True
+    cur = conn.cursor()
+
+    print("ts,table,n_live_tup,n_dead_tup,heap_bytes,toast_bytes,index_bytes,total_bytes", flush=True)
+    t_end = time.monotonic() + args.duration
+    while time.monotonic() < t_end:
+        ts = datetime.now(timezone.utc).isoformat()
+        try:
+            cur.execute(TABLES_QUERY[args.system])
+            tables = [r[0] for r in cur.fetchall()]
+            for tbl in tables:
+                cur.execute("""
+                    SELECT
+                        COALESCE((SELECT n_live_tup FROM pg_stat_user_tables
+                                  WHERE schemaname||'.'||relname = %s), 0),
+                        COALESCE((SELECT n_dead_tup FROM pg_stat_user_tables
+                                  WHERE schemaname||'.'||relname = %s), 0),
+                        COALESCE(pg_relation_size(%s::regclass), 0),
+                        COALESCE(pg_table_size(%s::regclass) - pg_relation_size(%s::regclass), 0),
+                        COALESCE(pg_indexes_size(%s::regclass), 0),
+                        COALESCE(pg_total_relation_size(%s::regclass), 0)
+                """, (tbl, tbl, tbl, tbl, tbl, tbl, tbl))
+                row = cur.fetchone()
+                print(f"{ts},{tbl},{row[0]},{row[1]},{row[2]},{row[3]},{row[4]},{row[5]}", flush=True)
+        except Exception as e:
+            print(f"# err: {e}", file=sys.stderr, flush=True)
+        time.sleep(args.interval)
+
+    conn.close()
+
+if __name__ == "__main__":
+    main()
diff --git a/benchmark/tooling/idle_in_tx.py b/benchmark/tooling/idle_in_tx.py
new file mode 100644
index 00000000..16e0302b
--- /dev/null
+++ b/benchmark/tooling/idle_in_tx.py
@@ -0,0 +1,30 @@
+#!/usr/bin/env python3
+"""idle_in_tx.py — open a REPEATABLE READ transaction and hold xmin forever.
+Brandur's recipe for reproducing the death spiral. Kill with SIGTERM.
+"""
+import time, signal, sys
+import psycopg2
+
+DSN = "host=127.0.0.1 dbname=bench user=postgres application_name=idle_in_tx"
+
+conn = psycopg2.connect(DSN)
+conn.autocommit = False
+cur = conn.cursor()
+cur.execute("BEGIN ISOLATION LEVEL REPEATABLE READ")
+cur.execute("SELECT 1")
+print(f"idle_in_tx: holding xmin via pid=", conn.get_backend_pid(), flush=True)
+
+def shutdown(signum, frame):
+    try:
+        conn.rollback()
+    except: pass
+    sys.exit(0)
+
+signal.signal(signal.SIGTERM, shutdown)
+signal.signal(signal.SIGINT, shutdown)
+while True:
+    time.sleep(60)
+    # Keep the transaction alive with a trivial query
+    try:
+        cur.execute("SELECT 1")
+    except: shutdown(None, None)
diff --git a/benchmark/tooling/microbench.sh b/benchmark/tooling/microbench.sh
new file mode 100644
index 00000000..9feb032b
--- /dev/null
+++ b/benchmark/tooling/microbench.sh
@@ -0,0 +1,33 @@
+#!/usr/bin/env bash
+# Per-VM microbench: CPU (sysbench), memory (sysbench), disk (fio on NVMe)
+set -uo pipefail
+HOST=$(hostname)
+
+# Ensure tools installed
+sudo apt-get install -q -y sysbench fio >/dev/null 2>&1
+
+echo "=== HOST: $HOST at $(date -u +%FT%TZ) ==="
+
+echo "--- CPU (sysbench cpu --cpu-max-prime=20000 --threads=8 --time=10) ---"
+sysbench cpu --cpu-max-prime=20000 --threads=8 --time=10 run 2>/dev/null | grep -E "events per second|total time|avg:|min:|max:"
+
+echo ""
+echo "--- Memory (sysbench memory --memory-block-size=1M --memory-total-size=20G --threads=8 --time=10) ---"
+sysbench memory --memory-block-size=1M --memory-total-size=20G --threads=8 --time=10 run 2>/dev/null | grep -E "transferred|MiB/sec|total time"
+
+echo ""
+echo "--- FIO 4k randwrite iodepth=32 numjobs=4 runtime=15 on NVMe ---"
+FIO_FILE=/mnt/pgdata/_microbench_fio_test
+sudo fio --name=rand_write --ioengine=libaio --iodepth=32 --rw=randwrite --bs=4k --size=512M \
+    --numjobs=4 --time_based --runtime=15 --filename=$FIO_FILE --direct=1 --group_reporting \
+    --output-format=normal 2>/dev/null | grep -E "WRITE:|iops|bw" | head -5
+sudo rm -f $FIO_FILE
+
+echo ""
+echo "--- FIO 1M seq write iodepth=8 runtime=10 on NVMe ---"
+sudo fio --name=seq_write --ioengine=libaio --iodepth=8 --rw=write --bs=1M --size=2G \
+    --numjobs=1 --time_based --runtime=10 --filename=$FIO_FILE --direct=1 --group_reporting \
+    --output-format=normal 2>/dev/null | grep -E "WRITE:|iops|bw" | head -5
+sudo rm -f $FIO_FILE
+
+echo "=== DONE: $HOST ==="
diff --git a/benchmark/tooling/parse_events_consumed.py b/benchmark/tooling/parse_events_consumed.py
new file mode 100755
index 00000000..8afa831e
--- /dev/null
+++ b/benchmark/tooling/parse_events_consumed.py
@@ -0,0 +1,169 @@
+#!/usr/bin/env python3
+"""parse_events_consumed.py — turn R6 NOTICE-instrumented consumer.log into
+an honest events-delivered-per-second time series.
+
+Inputs (by convention, --bench-dir):
+  consumer.log   : pgbench stdout+stderr merged; contains lines like
+                   "NOTICE:  ev ts=<epoch_s> n=<events>"
+                   (and the usual pgbench progress + banner lines)
+  producer.log   : pgbench producer output — used only to derive run t0
+                   when no NOTICE is present, or for sanity cross-checks.
+  pgss.csv       : optional; queried at end of run; used for sanity check.
+  bloat.csv      : optional; used by the bench-script post-processor only.
+
+Outputs (to --out-dir or --bench-dir):
+  events_consumed_per_sec.csv   columns: second_since_start,events_consumed
+  events_consumed_summary.txt   human summary
+
+Method:
+  - Primary: parse NOTICE lines. Group by second_since_start=(ts - t0).
+  - Fallback: if zero NOTICE lines found, emit a single row with total=0
+    and log a warning (caller should switch to pgss sampling).
+  - t0: earliest NOTICE ts; or first timestamp in producer_agg log if present.
+
+Robust to: missing files, truncated runs, interleaved stderr from
+multiple pgbench clients, oversized files (streamed line-by-line).
+"""
+from __future__ import annotations
+import argparse, csv, os, re, sys
+from collections import defaultdict
+from pathlib import Path
+
+NOTICE_RE = re.compile(r"NOTICE:\s+ev\s+ts=(\d+)\s+n=(\d+)")
+# pgbench aggregate log format: "start_ts.usec n_tx lat sqlat ..."
+# but our producer --aggregate-interval=10 --log writes one row per interval.
+AGG_FIRST_TS_RE = re.compile(r"^(\d{10})(?:\.\d+)?\s")
+
+
+def parse_notice_log(path: Path) -> list[tuple[int, int]]:
+    """Return list of (ts_epoch_s, n_events) tuples from NOTICE lines."""
+    out: list[tuple[int, int]] = []
+    if not path.is_file():
+        return out
+    with path.open("r", errors="replace") as f:
+        for line in f:
+            m = NOTICE_RE.search(line)
+            if m:
+                out.append((int(m.group(1)), int(m.group(2))))
+    return out
+
+
+def find_t0_from_producer(bench_dir: Path) -> int | None:
+    """Find earliest timestamp from producer_agg.<pid> file (pgbench aggregate log)."""
+    candidates = sorted(bench_dir.glob("producer_agg.*"))
+    for c in candidates:
+        try:
+            with c.open("r", errors="replace") as f:
+                for line in f:
+                    m = AGG_FIRST_TS_RE.match(line)
+                    if m:
+                        return int(m.group(1))
+        except OSError:
+            continue
+    return None
+
+
+def bucket(events: list[tuple[int, int]], t0: int, width_s: int) -> dict[int, int]:
+    """Return {second_since_start_bucket_start: total_events}."""
+    hist: dict[int, int] = defaultdict(int)
+    for ts, n in events:
+        s = ts - t0
+        if s < 0:
+            s = 0
+        bucket_start = (s // width_s) * width_s
+        hist[bucket_start] += n
+    return hist
+
+
+def main() -> int:
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--bench-dir", default="/tmp/bench",
+                    help="Directory with consumer.log, producer.log, etc.")
+    ap.add_argument("--out-dir", default=None,
+                    help="Where to write events_consumed_per_sec.csv (default: --bench-dir)")
+    ap.add_argument("--bucket", type=int, default=1,
+                    help="Bucket width in seconds (1 or 10 typical).")
+    ap.add_argument("--system", default=None,
+                    help="Informational: system name; recorded in summary.")
+    args = ap.parse_args()
+
+    bench_dir = Path(args.bench_dir)
+    out_dir = Path(args.out_dir) if args.out_dir else bench_dir
+    out_dir.mkdir(parents=True, exist_ok=True)
+
+    consumer_log = bench_dir / "consumer.log"
+    events = parse_notice_log(consumer_log)
+
+    out_csv = out_dir / "events_consumed_per_sec.csv"
+    summary_txt = out_dir / "events_consumed_summary.txt"
+
+    if not events:
+        # No NOTICE lines — emit empty CSV and a warning summary.
+        with out_csv.open("w", newline="") as f:
+            w = csv.writer(f)
+            w.writerow(["second_since_start", "events_consumed"])
+        with summary_txt.open("w") as f:
+            f.write(
+                f"system={args.system or 'unknown'}\n"
+                f"bench_dir={bench_dir}\n"
+                f"method=NOTICE\n"
+                f"notice_lines_found=0\n"
+                "WARNING: no NOTICE events_consumed data — verify consumer.sql\n"
+                "         is the R6 instrumented version. Fallback: use\n"
+                "         pg_stat_statements snapshot (pgss_timeseries.csv)\n"
+                "         or compute via producer_rate - queue_bloat_delta.\n"
+            )
+        print(f"WARN: no NOTICE events in {consumer_log}; wrote empty {out_csv}",
+              file=sys.stderr)
+        return 0
+
+    # t0: prefer producer_agg earliest ts; else first NOTICE ts.
+    t0 = find_t0_from_producer(bench_dir)
+    if t0 is None:
+        t0 = min(e[0] for e in events)
+
+    hist = bucket(events, t0, args.bucket)
+
+    # Write CSV, sorted.
+    sec_keys = sorted(hist.keys())
+    max_sec = sec_keys[-1] if sec_keys else 0
+
+    with out_csv.open("w", newline="") as f:
+        w = csv.writer(f)
+        w.writerow(["second_since_start", "events_consumed"])
+        # Emit every bucket from 0..max, filling zeros for empty buckets —
+        # this produces a proper TS the plotter can consume without gaps.
+        for s in range(0, max_sec + args.bucket, args.bucket):
+            w.writerow([s, hist.get(s, 0)])
+
+    total_events = sum(n for _, n in events)
+    total_calls = len(events)
+    # Calls with n>0 (a.k.a. useful polls)
+    useful_calls = sum(1 for _, n in events if n > 0)
+    run_s = max_sec + args.bucket
+    avg_eps = total_events / run_s if run_s > 0 else 0.0
+
+    with summary_txt.open("w") as f:
+        f.write(
+            f"system={args.system or 'unknown'}\n"
+            f"bench_dir={bench_dir}\n"
+            f"method=NOTICE\n"
+            f"t0_epoch_s={t0}\n"
+            f"bucket_s={args.bucket}\n"
+            f"total_notice_lines={total_calls}\n"
+            f"useful_calls_n_gt_0={useful_calls}\n"
+            f"total_events_consumed={total_events}\n"
+            f"approx_run_duration_s={run_s}\n"
+            f"avg_events_per_sec={avg_eps:.1f}\n"
+            f"hit_rate_useful_over_total={useful_calls}/{total_calls}"
+            f" = {useful_calls / total_calls:.4f}\n"
+            f"csv={out_csv}\n"
+        )
+    print(f"OK: wrote {out_csv} and {summary_txt}")
+    print(f"     events={total_events} over ~{run_s}s -> {avg_eps:.1f} ev/s"
+          f" ({useful_calls}/{total_calls} useful polls)")
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/benchmark/tooling/pg_stat_statements_snapshot.py b/benchmark/tooling/pg_stat_statements_snapshot.py
new file mode 100755
index 00000000..2ade4722
--- /dev/null
+++ b/benchmark/tooling/pg_stat_statements_snapshot.py
@@ -0,0 +1,72 @@
+#!/usr/bin/env python3
+"""pg_stat_statements_snapshot.py — take periodic pg_stat_statements
+snapshots. For each snapshot, record (ts, queryid, query_head, calls, rows,
+total_exec_time_ms). Output appended as CSV — the downstream analyzer
+diffs consecutive snapshots per queryid to obtain per-interval rates.
+
+Used as a cross-check for the NOTICE-based instrumentation, and as a
+fallback for systems where the DO-block wrapper makes statement-level
+rows invisible (pgque, pgq, pgmq).
+"""
+from __future__ import annotations
+import argparse, csv, sys, time
+from datetime import datetime, timezone
+import psycopg2
+
+DSN_DEFAULT = "host=127.0.0.1 dbname=bench user=postgres"
+
+SAMPLE_SQL = """
+SELECT
+    queryid::text,
+    left(regexp_replace(query, '\\s+', ' ', 'g'), 200) AS query_head,
+    calls,
+    rows,
+    total_exec_time::bigint AS total_exec_time_ms
+FROM pg_stat_statements
+WHERE query ~ ANY (ARRAY[
+    'next_batch', 'pgque\\.ticker', 'pgq\\.ticker',
+    'pgmq\\.read', 'pgmq\\.delete',
+    'river_job', 'que_jobs', 'pgboss\\.job'
+])
+ORDER BY calls DESC
+"""
+
+
+def main() -> int:
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--dsn", default=DSN_DEFAULT)
+    ap.add_argument("--interval", type=int, default=10)
+    ap.add_argument("--duration", type=int, default=5400)
+    ap.add_argument("--out", default="/tmp/bench/pgss_timeseries.csv")
+    args = ap.parse_args()
+
+    conn = psycopg2.connect(args.dsn)
+    conn.autocommit = True
+    cur = conn.cursor()
+
+    f = open(args.out, "w", newline="")
+    w = csv.writer(f)
+    w.writerow(["ts", "queryid", "query_head", "calls", "rows", "total_exec_time_ms"])
+    f.flush()
+
+    t_end = time.monotonic() + args.duration
+    try:
+        while time.monotonic() < t_end:
+            ts = datetime.now(timezone.utc).isoformat()
+            try:
+                cur.execute(SAMPLE_SQL)
+                rows = cur.fetchall()
+                for r in rows:
+                    w.writerow([ts, *r])
+                f.flush()
+            except Exception as e:  # noqa: BLE001
+                print(f"# pgss snap err: {e}", file=sys.stderr, flush=True)
+            time.sleep(args.interval)
+    finally:
+        f.close()
+        conn.close()
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/benchmark/tooling/pgq_ticker_daemon.py b/benchmark/tooling/pgq_ticker_daemon.py
new file mode 100644
index 00000000..5829dae1
--- /dev/null
+++ b/benchmark/tooling/pgq_ticker_daemon.py
@@ -0,0 +1,42 @@
+#!/usr/bin/env python3
+"""pgq_ticker_daemon.py — tight loop calling pgq.ticker + pgq.maint_operations.
+Replaces the 1-min pg_cron invocation to match pgque's inline cadence.
+Run as: sudo -u postgres python3 /root/pgq_ticker_daemon.py
+"""
+import time, signal, sys
+import psycopg2
+
+DSN = "host=127.0.0.1 dbname=bench user=postgres application_name=pgq_ticker"
+conn = psycopg2.connect(DSN); conn.autocommit = True
+cur = conn.cursor()
+
+def shutdown(signum, frame):
+    try: conn.close()
+    except: pass
+    sys.exit(0)
+signal.signal(signal.SIGTERM, shutdown)
+signal.signal(signal.SIGINT, shutdown)
+
+print("pgq_ticker_daemon: starting loop", flush=True)
+last_maint = 0
+while True:
+    try:
+        cur.execute("SELECT pgq.ticker()")
+        # Run maint_operations() periodically (every 5s) — each op in its own implicit tx
+        now = time.time()
+        if now - last_maint >= 5:
+            cur.execute("SELECT func_name, func_arg FROM pgq.maint_operations()")
+            for fn, arg in cur.fetchall():
+                try:
+                    if arg is None:
+                        cur.execute(f"SELECT {fn}()")
+                    else:
+                        cur.execute(f"SELECT {fn}(%s)", (arg,))
+                except Exception as me:
+                    # Some operations are expected to fail (e.g. NOWAIT lock) — ignore
+                    pass
+            last_maint = now
+    except Exception as e:
+        print(f"ticker err: {e}", file=sys.stderr, flush=True)
+        time.sleep(1)
+    time.sleep(1)  # 1 Hz ticker
diff --git a/benchmark/tooling/sys_metrics_sampler.py b/benchmark/tooling/sys_metrics_sampler.py
new file mode 100644
index 00000000..4897e27b
--- /dev/null
+++ b/benchmark/tooling/sys_metrics_sampler.py
@@ -0,0 +1,121 @@
+#!/usr/bin/env python3
+"""R7 v2 sys metrics sampler: CPU + RAM + per-device NVMe IOPS / MiB/s / latency.
+
+Columns:
+  ts_iso,
+  cpu_user_pct, cpu_system_pct, cpu_iowait_pct, cpu_idle_pct,
+  mem_total_mb, mem_used_mb, mem_buff_cache_mb, mem_available_mb,
+  disk_read_iops, disk_write_iops,
+  disk_read_mib_s, disk_write_mib_s,
+  disk_read_lat_ms, disk_write_lat_ms
+
+Reads /proc/stat, /proc/meminfo, /proc/diskstats. No external deps.
+"""
+import argparse, time, os
+from datetime import datetime, timezone
+
+def read_proc_stat():
+    with open('/proc/stat') as f:
+        fields = f.readline().split()
+    # fields: cpu user nice system idle iowait irq softirq steal guest guest_nice
+    vals = [int(x) for x in fields[1:]]
+    return {
+        'user': vals[0] + vals[1],       # user + nice
+        'system': vals[2] + vals[5] + vals[6],  # system + irq + softirq
+        'iowait': vals[4],
+        'idle': vals[3],
+        'total': sum(vals),
+    }
+
+def read_meminfo():
+    info = {}
+    with open('/proc/meminfo') as f:
+        for ln in f:
+            key, v = ln.split(':', 1)
+            info[key] = int(v.strip().split()[0])  # kB
+    total = info['MemTotal'] / 1024
+    free  = info['MemFree'] / 1024
+    buff  = info.get('Buffers', 0) / 1024
+    cache = info.get('Cached', 0) / 1024 + info.get('SReclaimable', 0) / 1024
+    avail = info.get('MemAvailable', free) / 1024
+    used  = total - avail
+    return total, used, buff + cache, avail
+
+def read_diskstat(device):
+    """Returns dict for named device, or None."""
+    with open('/proc/diskstats') as f:
+        for ln in f:
+            parts = ln.split()
+            if len(parts) < 14: continue
+            if parts[2] != device: continue
+            return {
+                'reads':        int(parts[3]),
+                'sectors_read': int(parts[5]),
+                'read_ms':      int(parts[6]),
+                'writes':       int(parts[7]),
+                'sectors_writ': int(parts[9]),
+                'write_ms':     int(parts[10]),
+            }
+    return None
+
+def main():
+    ap = argparse.ArgumentParser()
+    ap.add_argument('--interval', type=int, default=10)
+    ap.add_argument('--duration', type=int, default=5400)
+    ap.add_argument('--device', default='nvme1n1')
+    ap.add_argument('--out', default='/tmp/bench/sys_metrics.csv')
+    args = ap.parse_args()
+
+    os.makedirs(os.path.dirname(args.out), exist_ok=True)
+    fh = open(args.out, 'w', buffering=1)  # line-buffered
+    fh.write('ts_iso,cpu_user_pct,cpu_system_pct,cpu_iowait_pct,cpu_idle_pct,'
+             'mem_total_mb,mem_used_mb,mem_buff_cache_mb,mem_available_mb,'
+             'disk_read_iops,disk_write_iops,disk_read_mib_s,disk_write_mib_s,'
+             'disk_read_lat_ms,disk_write_lat_ms\n')
+
+    prev_cpu = read_proc_stat()
+    prev_disk = read_diskstat(args.device) or {}
+    t_end = time.monotonic() + args.duration
+
+    while time.monotonic() < t_end:
+        time.sleep(args.interval)
+        ts = datetime.now(timezone.utc).isoformat()
+
+        # CPU delta
+        cur_cpu = read_proc_stat()
+        dt_total = cur_cpu['total'] - prev_cpu['total'] or 1
+        cpu_user = 100.0 * (cur_cpu['user'] - prev_cpu['user']) / dt_total
+        cpu_sys  = 100.0 * (cur_cpu['system'] - prev_cpu['system']) / dt_total
+        cpu_iow  = 100.0 * (cur_cpu['iowait'] - prev_cpu['iowait']) / dt_total
+        cpu_idle = 100.0 * (cur_cpu['idle'] - prev_cpu['idle']) / dt_total
+        prev_cpu = cur_cpu
+
+        mem_total, mem_used, mem_bc, mem_avail = read_meminfo()
+
+        # Disk delta
+        cur_disk = read_diskstat(args.device)
+        if cur_disk and prev_disk:
+            dr = cur_disk['reads'] - prev_disk.get('reads', 0)
+            dw = cur_disk['writes'] - prev_disk.get('writes', 0)
+            d_sr = cur_disk['sectors_read'] - prev_disk.get('sectors_read', 0)
+            d_sw = cur_disk['sectors_writ'] - prev_disk.get('sectors_writ', 0)
+            d_rms = cur_disk['read_ms'] - prev_disk.get('read_ms', 0)
+            d_wms = cur_disk['write_ms'] - prev_disk.get('write_ms', 0)
+            iops_r = dr / args.interval
+            iops_w = dw / args.interval
+            mib_r = (d_sr * 512) / (args.interval * (1 << 20))
+            mib_w = (d_sw * 512) / (args.interval * (1 << 20))
+            lat_r = d_rms / dr if dr > 0 else 0
+            lat_w = d_wms / dw if dw > 0 else 0
+        else:
+            iops_r = iops_w = mib_r = mib_w = lat_r = lat_w = 0
+        prev_disk = cur_disk or {}
+
+        fh.write(f'{ts},{cpu_user:.2f},{cpu_sys:.2f},{cpu_iow:.2f},{cpu_idle:.2f},'
+                 f'{mem_total:.1f},{mem_used:.1f},{mem_bc:.1f},{mem_avail:.1f},'
+                 f'{iops_r:.1f},{iops_w:.1f},{mib_r:.3f},{mib_w:.3f},'
+                 f'{lat_r:.3f},{lat_w:.3f}\n')
+    fh.close()
+
+if __name__ == '__main__':
+    main()

From 18a11b71d6170ccea140423eec04c5fe68fd92a8 Mon Sep 17 00:00:00 2001
From: Nik Samokhvalov <nik@postgres.ai>
Date: Wed, 29 Apr 2026 20:18:18 -0700
Subject: [PATCH 2/6] chore(benchmark): shell-style polish per CLAUDE.md

- benchmark/runners/fix_nvme_mount.sh: switch to /usr/bin/env bash shebang;
  use set -Eeuo pipefail (was set -euo pipefail without -E)
- benchmark/runners/run_r7.sh: add -Ee flags to existing pipefail
- benchmark/runners/clean_reinstall.sh: read -r in two while-loops

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 benchmark/runners/clean_reinstall.sh | 4 ++--
 benchmark/runners/fix_nvme_mount.sh  | 4 ++--
 benchmark/runners/run_r7.sh          | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/benchmark/runners/clean_reinstall.sh b/benchmark/runners/clean_reinstall.sh
index 30e54f53..e6d94213 100644
--- a/benchmark/runners/clean_reinstall.sh
+++ b/benchmark/runners/clean_reinstall.sh
@@ -28,13 +28,13 @@ case $SYS in
   pgque)
     sudo -u postgres psql -d bench -c "DROP SCHEMA IF EXISTS pgque CASCADE" >/dev/null
     # Also unschedule any leftover pg_cron jobs named pgque_*
-    sudo -u postgres psql -d bench -Atc "SELECT jobname FROM cron.job WHERE jobname LIKE 'pgque%'" | while read j; do
+    sudo -u postgres psql -d bench -Atc "SELECT jobname FROM cron.job WHERE jobname LIKE 'pgque%'" | while read -r j; do
       [ -n "$j" ] && sudo -u postgres psql -d bench -c "SELECT cron.unschedule('$j')" >/dev/null
     done
     ;;
   pgq)
     sudo -u postgres psql -d bench -c "DROP EXTENSION IF EXISTS pgq CASCADE; DROP SCHEMA IF EXISTS pgq CASCADE" >/dev/null
-    sudo -u postgres psql -d bench -Atc "SELECT jobname FROM cron.job WHERE jobname LIKE 'pgq%'" | while read j; do
+    sudo -u postgres psql -d bench -Atc "SELECT jobname FROM cron.job WHERE jobname LIKE 'pgq%'" | while read -r j; do
       [ -n "$j" ] && sudo -u postgres psql -d bench -c "SELECT cron.unschedule('$j')" >/dev/null
     done
     ;;
diff --git a/benchmark/runners/fix_nvme_mount.sh b/benchmark/runners/fix_nvme_mount.sh
index 74bebc19..fb8a747d 100644
--- a/benchmark/runners/fix_nvme_mount.sh
+++ b/benchmark/runners/fix_nvme_mount.sh
@@ -1,6 +1,6 @@
-#!/bin/bash
+#!/usr/bin/env bash
 # Format + mount NVMe instance store, move PG data dir to NVMe, re-setup symlink.
-set -euo pipefail
+set -Eeuo pipefail
 DEV=/dev/nvme1n1
 MNT=/mnt/pgdata
 PG_DATA_SRC=/var/lib/postgresql/18/main
diff --git a/benchmark/runners/run_r7.sh b/benchmark/runners/run_r7.sh
index 44eb9386..0ecf4b59 100644
--- a/benchmark/runners/run_r7.sh
+++ b/benchmark/runners/run_r7.sh
@@ -6,7 +6,7 @@
 #   * pg_ash ASH + pgfr snapshots copied to CSV at end (schemas preserved)
 #
 # Same instrumented consumer NOTICE format as R6 for honest ev/s parsing.
-set -uo pipefail
+set -Eeuo pipefail
 SYS=${1:?system}
 DUR=${2:-5400}  # 1.5h = 5400s
 

From 916a4f42d50b95e94de6f82115e1c99fd2d78d6f Mon Sep 17 00:00:00 2001
From: Nik Samokhvalov <nik@postgres.ai>
Date: Thu, 30 Apr 2026 02:26:17 -0700
Subject: [PATCH 3/6] chore(benchmark): anti-leak scrub + polish pass
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Remove all references to private internal URLs, round numbering
(R4/R5/R6/R8), and private repository paths from benchmark/ files.

- METHODOLOGY.md: drop internal URL + note IDs from header; remove
  internal posting-style section (§9); neutralize round refs; fix
  /tmp/bench_r<N> path reference
- README.md: drop internal reference comment
- HARDWARE.md: fix binary units (GB→GiB, TB→TiB); drop R7 round ref
- OPS_GOTCHAS.md: neutralize R4/R6 round refs in lessons; fix
  binary units (GB→GiB, MB/s→MiB/s)
- consumers/*.sql: drop "R6 instrumented" prefix from all 7 files
- runners/run_r7.sh: remove R6/R7 round refs from inline comments
- tooling/sys_metrics_sampler.py: remove R7 from docstring
- tooling/parse_events_consumed.py: remove R6 from docstring + msg
- charts/r5_analyze.py, r6_smoke_chart.py: remove Rn from docstrings,
  chart titles, and file-size output (KB→KiB)

PR description updated separately via gh pr edit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 benchmark/HARDWARE.md                         | 10 ++---
 benchmark/METHODOLOGY.md                      | 44 +++++++------------
 benchmark/OPS_GOTCHAS.md                      | 14 +++---
 benchmark/README.md                           |  6 +--
 benchmark/charts/r5_analyze.py                |  6 +--
 benchmark/charts/r6_smoke_chart.py            |  6 +--
 benchmark/consumers/consumer_pgboss.sql       |  2 +-
 .../consumers/consumer_pgmq-partitioned.sql   |  2 +-
 benchmark/consumers/consumer_pgmq.sql         |  2 +-
 benchmark/consumers/consumer_pgq.sql          |  2 +-
 benchmark/consumers/consumer_pgque.sql        |  2 +-
 benchmark/consumers/consumer_que.sql          |  4 +-
 benchmark/consumers/consumer_river.sql        |  2 +-
 benchmark/runners/run_r7.sh                   | 22 +++++-----
 benchmark/tooling/parse_events_consumed.py    |  4 +-
 benchmark/tooling/sys_metrics_sampler.py      |  2 +-
 16 files changed, 57 insertions(+), 73 deletions(-)

diff --git a/benchmark/HARDWARE.md b/benchmark/HARDWARE.md
index e171b937..d525dd9f 100644
--- a/benchmark/HARDWARE.md
+++ b/benchmark/HARDWARE.md
@@ -5,10 +5,10 @@
 | Spec | Value |
 |---|---|
 | vCPU | 8 (Intel Ice Lake Xeon 8375C, 2.9 GHz base / 3.5 GHz turbo) |
-| RAM | 64 GB |
-| NVMe instance store | 1 × 1.75 TB (physical attach, NVMe) |
+| RAM | 64 GiB |
+| NVMe instance store | 1 × 1.75 TiB (physical attach, NVMe) |
 | Network | Up to 12 Gbps |
-| EBS (root) | 8 GB gp3 (Ubuntu 24.04 AMI default) |
+| EBS (root) | 8 GiB gp3 (Ubuntu 24.04 AMI default) |
 | Spot price (us-east-2, 2026-04) | ~$0.20–0.30 / hour |
 | On-demand price | $0.686 / hour |
 
@@ -27,7 +27,7 @@ Run via [tooling/microbench.sh](tooling/microbench.sh). Expected order-of-magnit
 | fio 4 k randwrite, 99 p latency | ~100 µs |
 | fio 4 k randread, QD=32, on NVMe | ~400 k IOPS |
 
-*Actual R7 microbench numbers to be filled in from the R7 microbench pass.*
+*Actual numbers to be filled in after running [tooling/microbench.sh](tooling/microbench.sh) on a fresh VM.*
 
 ## Postgres tuning (shared across all 7 VMs)
 
@@ -66,6 +66,6 @@ listen_addresses = 'localhost'
 
 `synchronous_commit=off` is deliberate — queue workloads are almost always idempotent at the application layer, and the WAL-flush path is the dominant cost for low-latency producers. It's the only PG knob we touch that materially changes safety posture.
 
-`jit=off` because 5-s JIT warmups on `DO` blocks dominated our first-transaction latency in R4.
+`jit=off` because 5-s JIT warmups on `DO` blocks dominated first-transaction latency in early bench runs.
 
 `autovacuum_*_scale_factor=0.01` is aggressive on purpose — we want autovacuum attempting to clean every 1 % dead-tuple ratio so the held-xmin phase of the bench exposes the *inability* to vacuum, not a lazy autovacuum schedule.
diff --git a/benchmark/METHODOLOGY.md b/benchmark/METHODOLOGY.md
index a61898a4..79bbb6fc 100644
--- a/benchmark/METHODOLOGY.md
+++ b/benchmark/METHODOLOGY.md
@@ -1,13 +1,10 @@
 # Bench methodology — the definitive reference
 
-Adapted from GitLab issue [postgres-ai/postgresql-consulting/tests-and-benchmarks#77, note 3263767264](https://gitlab.com/postgres-ai/postgresql-consulting/tests-and-benchmarks/-/issues/77#note_3263767264).
-
 This document is the single source of truth for how the PostgreSQL-queue bench is structured, what it measures, and every script/config that makes it work. It is written so a reviewer can reproduce the whole thing end-to-end.
 
 Cross-links:
 
 - Upstream fix: [pgque PR #62](https://github.com/NikolayS/pgque/pull/62) · issue [NikolayS/pgque#61](https://github.com/NikolayS/pgque/issues/61)
-- GitLab rounds: R4 final narrative — note 3262968456; R5 root — note 3263000815; R5 full writeup — note 3263438287; pgque rotation PoC under held xmin — note 3263417609.
 
 ---
 
@@ -31,7 +28,7 @@ The Go/Ruby/Node workers are installed end-to-end so the schema is authentic, bu
 
 Workload shape (all runs):
 
-- Producer: 1 client, pgbench `-R 1000` rate-cap (may move to `-R 5000` in R7/R8)
+- Producer: 1 client, pgbench `-R 1000` rate-cap (full runs use `-R 5000`)
 - Consumer: 1 client for pgque/pgq, 4 clients for everything else
 - Three 30-minute phases back-to-back = **1.5 h per run**:
   1. **Clean baseline** — no held xmin
@@ -42,20 +39,20 @@ Workload shape (all runs):
 
 ## 2. Infrastructure
 
-- AWS us-east-2, **i4i.2xlarge** (8 vCPU, 64 GB, NVMe instance store)
+- AWS us-east-2, **i4i.2xlarge** (8 vCPU, 64 GiB, NVMe instance store)
 - Spot where available; on-demand only as last resort (see Section 10)
 - Ubuntu 24.04, **PG18 from PGDG**, `pg_cron`, `pg_stat_statements`, pg_ash, pgfr
 - Data dir moved to NVMe: `/mnt/pgdata/postgresql/18/main` symlink (see [runners/fix_nvme_mount.sh](runners/fix_nvme_mount.sh) and [OPS_GOTCHAS.md §1](OPS_GOTCHAS.md))
 - One VM per system so any tuning / runtime / GC behaviour is contained
 - SSH key: `<your-ssh-key>` (us-east-2)
 
-Live R7 VMs (redacted): `<pgque-ip>`, `<pgq-ip>`, `<pgmq-ip>`, `<river-ip>`, `<pgboss-ip>`, `<que-ip>`, `<pgmq-partitioned-ip>`.
+VM IPs use placeholder form: `<pgque-ip>`, `<pgq-ip>`, `<pgmq-ip>`, `<river-ip>`, `<pgboss-ip>`, `<que-ip>`, `<pgmq-partitioned-ip>`.
 
 ---
 
 ## 3. Observability stack
 
-Seven parallel streams run during every bench. All CSVs land in `/tmp/bench/` and are rsynced to the local `/tmp/bench_r<N>/<system>/` tree every 30 min.
+Seven parallel streams run during every bench. All CSVs land in `/tmp/bench/` and are rsynced to the local `/tmp/bench/<system>/` tree every 30 min.
 
 ### (a) bloat_sampler.py — pg_stat_user_tables every 30 s
 
@@ -85,7 +82,7 @@ COPY (SELECT * FROM pgfr_record.statement_snapshots) TO '/tmp/bench/pgfr_stateme
 
 Full pgfr_record schema on every VM includes: `snapshots`/`snapshots_v2` (partitioned), `table_snapshots(_v2)`, `statement_snapshots(_v2)`, `index_snapshots(_v2)`, `replication_snapshots(_v2)`, `vacuum_progress_snapshots(_v2)`, `activity_samples(_archive_v2)`, `lock_samples(_archive_v2)`, `config_snapshots`, `db_role_config_snapshots`.
 
-### (d) sys_metrics_sampler.py — CPU / mem / disk every 10 s (R7+)
+### (d) sys_metrics_sampler.py — CPU / mem / disk every 10 s
 
 Reads `/proc/stat`, `/proc/meminfo`, `/proc/diskstats` directly (psutil-optional). NVMe device is `nvme1n1` (the instance store, which is what `/mnt/pgdata` sits on). v2 adds per-device IOPS and latency columns.
 
@@ -101,7 +98,7 @@ Source: [tooling/pg_stat_statements_snapshot.py](tooling/pg_stat_statements_snap
 
 Both producer and consumer run with per-10 s aggregate logs (min/max/sum/sumsq latency). Files: `producer_agg.<pid>` and `consumer_agg.<pid>.<worker>` under `/tmp/bench/`.
 
-### (g) Consumer NOTICE instrumentation (R6+)
+### (g) Consumer NOTICE instrumentation
 
 Each instrumented `consumer.sql` is wrapped in a DO block that emits exactly one `RAISE NOTICE 'ev ts=<epoch_s> n=<events>'` per call. This gives us an authoritative per-call consumed-events stream that is immune to the pgss DO-wrapper opacity problem.
 
@@ -165,28 +162,18 @@ All producers are in [producers/](producers/).
 
 ## 8. Analysis and chart generation
 
-- [charts/r5_analyze.py](charts/r5_analyze.py) — R5 full 2-panel chart (dead tuples + consumer latency, linear scale, no symlog — per [MEMORY rule](../CLAUDE.md) "Never use log/symlog on charts").
-- [charts/r6_smoke_chart.py](charts/r6_smoke_chart.py) — R6 smoke Solarized-Dark 2-panel chart (events/s + pgque per-table dead tuples).
-- [gifs/r4_gif_v17_solarized.py](gifs/r4_gif_v17_solarized.py) — R4 dead-tuples animated GIF (7 systems, Solarized-Dark).
-- [gifs/r4_gif_tps_solarized.py](gifs/r4_gif_tps_solarized.py) — R4 TPS/latency animated GIF.
-
----
-
-## 9. GitLab posting style
-
-- Threads open under `### header`; replies stay inside the same discussion.
-- Every `<details>` block has **one blank line** between the opening tag and the triple-backtick fence, and **one blank line** between the closing fence and `</details>` — and another blank line after `</details>`. This is what makes them render correctly on GitLab.
-- Every significant result embeds `![alt](/uploads/…)` — charts are first-class citizens. GIFs too.
-- Post bodies via `curl --data-urlencode "body@/tmp/path"` to the discussions endpoint — new thread, not a reply under an existing one.
+- [charts/r5_analyze.py](charts/r5_analyze.py) — full 2-panel chart (dead tuples + consumer latency, linear scale, no symlog).
+- [charts/r6_smoke_chart.py](charts/r6_smoke_chart.py) — smoke Solarized-Dark 2-panel chart (events/s + pgque per-table dead tuples).
+- [gifs/r4_gif_v17_solarized.py](gifs/r4_gif_v17_solarized.py) — dead-tuples animated GIF (7 systems, Solarized-Dark).
+- [gifs/r4_gif_tps_solarized.py](gifs/r4_gif_tps_solarized.py) — TPS/latency animated GIF.
 
 ---
 
-## 10. Cost discipline
+## 9. Cost discipline
 
 - us-east-2 spot preferred (~$0.22/h); when spot is exhausted we hop to another region before going on-demand.
-- At R7 time 2 VMs are on-demand (pgmq-partitioned and pgboss; pgque+que moved on-demand after R6 spot reclaim).
-- **~$15 per 1.5 h bench round** (7 VMs).
-- Total spend through R7 is ~$100.
+- On-demand for the primary subject under test when data integrity matters most.
+- **~$15 per 1.5 h bench run** (7 VMs).
 - Rsync every 30 min pulls `/tmp/bench/` to local, so a spot reclaim loses at most a partial phase.
 
 See [OPS_GOTCHAS.md §12](OPS_GOTCHAS.md) for spot-reclaim mitigations.
@@ -199,8 +186,7 @@ See [OPS_GOTCHAS.md §12](OPS_GOTCHAS.md) for spot-reclaim mitigations.
 2. `scp` per-system `install.sh`, `producer.sql`, instrumented `consumer.sql`, plus the samplers in [tooling/](tooling/).
 3. `bash clean_reinstall.sh <sys>` on each VM.
 4. `bash run_r7.sh <sys>` — 1.5 h, writes `/tmp/bench/*.csv` + `*.log`.
-5. Rsync everything back to `/tmp/bench_r7/<sys>/`.
-6. Run [charts/r5_analyze.py](charts/r5_analyze.py) (or R7 successor) for the verdict table + 2-panel PNG.
-7. Post results as a threaded reply with the `<details>`/blank-line style above.
+5. Rsync everything back to `/tmp/bench/<sys>/`.
+6. Run [charts/r5_analyze.py](charts/r5_analyze.py) for the verdict table + 2-panel PNG.
 
 That is the whole rig.
diff --git a/benchmark/OPS_GOTCHAS.md b/benchmark/OPS_GOTCHAS.md
index 6c78d190..d2391822 100644
--- a/benchmark/OPS_GOTCHAS.md
+++ b/benchmark/OPS_GOTCHAS.md
@@ -6,7 +6,7 @@ Every operational lesson accumulated while running the pgque-vs-pgq-vs-pgmq-vs-r
 
 ## 1. AWS i4i.2xlarge: NVMe instance store is NOT auto-mounted
 
-The Ubuntu 24.04 AMI boots with `/dev/nvme1n1` **present but not formatted, not mounted**. You MUST format and mount it BEFORE installing Postgres data dir, or the data goes to the 8 GB root EBS volume and the bench dies the moment you fill the page cache.
+The Ubuntu 24.04 AMI boots with `/dev/nvme1n1` **present but not formatted, not mounted**. You MUST format and mount it BEFORE installing Postgres data dir, or the data goes to the 8 GiB root EBS volume and the bench dies the moment you fill the page cache.
 
 ```bash
 sudo mkfs.xfs /dev/nvme1n1
@@ -15,7 +15,7 @@ sudo mount -o noatime,nodiratime /dev/nvme1n1 /mnt/pgdata
 sudo chown postgres:postgres /mnt/pgdata
 ```
 
-If PG is installed first (default path `/var/lib/postgresql/18/main`), data goes to the 8 GB root EBS → disaster under bench load.
+If PG is installed first (default path `/var/lib/postgresql/18/main`), data goes to the 8 GiB root EBS → disaster under bench load.
 
 **Symlink pattern** after moving data:
 
@@ -34,7 +34,7 @@ See [runners/fix_nvme_mount.sh](runners/fix_nvme_mount.sh) for the recovery proc
 
 Default `/tmp` is on the root disk. pgbench `--log-prefix=/tmp/bench/producer_agg` + `consumer.log` with 17 k NOTICE/s + `bloat.csv` + `sys_metrics.csv` all write there.
 
-At bench rates this is 1–5 MB/s of root-disk I/O. Not catastrophic but adds noise you can see in the sys_metrics disk panel of the non-subject VM.
+At bench rates this is 1–5 MiB/s of root-disk I/O. Not catastrophic but adds noise you can see in the sys_metrics disk panel of the non-subject VM.
 
 **Fix:**
 
@@ -57,7 +57,7 @@ If investigating slow queries later, enable `log_statement='all'` + `logging_col
 
 ## 4. Clean-slate reset without losing adjacent schemas
 
-`DROP SCHEMA pgque CASCADE` can cascade into objects it doesn't own if a function somewhere references pgque. R6 lost the `ash` schema this way on the pgque VM.
+`DROP SCHEMA pgque CASCADE` can cascade into objects it doesn't own if a function somewhere references pgque. This happened during testing — the `ash` schema was silently dropped on the pgque VM.
 
 **Safer pattern for pgque:** `TRUNCATE` all `pgque.*` tables instead of drop-and-recreate. If a reinstall is actually needed:
 
@@ -105,7 +105,7 @@ pg_partman is installed in schema `public` (not `partman` as the docs sometimes
 
 Default `premake=4` works for bench (4 future 5-min partitions = 20 min buffer, enough for 1.5 h bench given 1-min maintenance cron cadence).
 
-R6 tried `premake=20` = 24 partitions steady-state, which killed pgmq-partitioned consumer perf (525 TPS vs 6621 with premake=4) due to planner cost across all partitions per query.
+`premake=20` (24 partitions steady-state) killed pgmq-partitioned consumer perf (525 TPS vs 6621 with premake=4) due to planner cost across all partitions per query.
 
 `infinite_time_partitions=true` is needed for the maintenance job to keep creating future partitions indefinitely:
 
@@ -190,8 +190,8 @@ If you see this on a VM that wasn't freshly installed from the PR #62 branch, re
 
 i4i.2xlarge spot price is ~$0.20–0.30/h vs on-demand $0.686/h. The us-east-2 spot market is reliable for short bench runs but DOES reclaim.
 
-- R4 lost `pgmq-partitioned` spot mid-run.
-- R6 lost `pgque` AND `que` spots between runs (hours-long windows).
+- `pgmq-partitioned` spot was reclaimed mid-run during one bench session.
+- `pgque` AND `que` spots were reclaimed between runs (hours-long windows) in another.
 
 **Mitigation:** on-demand for the primary subject under test (pgque for PR #62 work). Cost delta: ~$3–4 per 7-VM 1.5 h run. Worth it for the subject you're landing a PR on; leave the comparison set on spot.
 
diff --git a/benchmark/README.md b/benchmark/README.md
index 0ef65520..beaa7ce9 100644
--- a/benchmark/README.md
+++ b/benchmark/README.md
@@ -44,7 +44,7 @@ ASH and pgfr output (`ash.csv`, `pgfr_snapshots.csv`) land in `/tmp/bench/` post
 ```
 benchmark/
   README.md                          # this file
-  METHODOLOGY.md                     # full methodology (from GitLab #77)
+  METHODOLOGY.md                     # full methodology
   OPS_GOTCHAS.md                     # operational lessons
   HARDWARE.md                        # VM + PG tuning
   tooling/
@@ -85,8 +85,8 @@ benchmark/
     install_pgboss.sh
                                      # note: pgque + que driven from AMI userdata
   charts/
-    r5_analyze.py                    # R5 2-panel chart
-    r6_smoke_chart.py                # R6 smoke Solarized-Dark chart
+    r5_analyze.py                    # 2-panel chart (dead tuples + consumer latency)
+    r6_smoke_chart.py                # smoke Solarized-Dark chart (events/s + dead tuples)
   gifs/
     r4_gif_v17_solarized.py          # dead-tuples animated GIF
     r4_gif_tps_solarized.py          # TPS/latency animated GIF
diff --git a/benchmark/charts/r5_analyze.py b/benchmark/charts/r5_analyze.py
index 7a5b76aa..212dd2c4 100644
--- a/benchmark/charts/r5_analyze.py
+++ b/benchmark/charts/r5_analyze.py
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-"""R5 full — verdict table + 2-panel chart (dead tuples + consumer latency)."""
+"""r5_analyze — verdict table + 2-panel chart (dead tuples + consumer latency)."""
 import csv, re
 from pathlib import Path
 from datetime import datetime
@@ -128,10 +128,10 @@ def fmt_lat(v,_):
 ax2.yaxis.set_major_formatter(FuncFormatter(fmt_lat))
 ax2.set_ylabel("consumer latency (mean)", labelpad=8)
 
-fig.text(0.08, 0.97, "R5 full — 7 systems · 1.5 h · held xmin horizon for 30 min",
+fig.text(0.08, 0.97, "pgque bench — 7 systems · 1.5 h · held xmin horizon for 30 min",
          ha="left", fontsize=13, fontweight="bold")
 fig.text(0.08, 0.948, "producer -R 1000, pgbench --aggregate-interval=10 --log · consumer latency clipped at 2s",
          ha="left", fontsize=9, color="#666")
 
 fig.savefig("/tmp/r5_full_chart.png", dpi=110, bbox_inches="tight")
-print(f"\nwrote /tmp/r5_full_chart.png ({Path('/tmp/r5_full_chart.png').stat().st_size/1024:.0f} KB)")
+print(f"\nwrote /tmp/r5_full_chart.png ({Path('/tmp/r5_full_chart.png').stat().st_size/1024:.0f} KiB)")
diff --git a/benchmark/charts/r6_smoke_chart.py b/benchmark/charts/r6_smoke_chart.py
index e455cc8a..12d40f07 100644
--- a/benchmark/charts/r6_smoke_chart.py
+++ b/benchmark/charts/r6_smoke_chart.py
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-"""R6 combined smoke chart — events consumed/s + pgque table dead tuples."""
+"""r6_smoke_chart — events consumed/s + pgque table dead tuples."""
 import csv, re
 from pathlib import Path
 from datetime import datetime
@@ -114,10 +114,10 @@
 ax2.set_ylim(0, max(50, max([max(s[2]) for s in tbl_series.values() if s[2]], default=10)*1.15))
 
 # Titles
-fig.text(0.09, 0.965, "R6 combined smoke — pgque PR #62 rotation + events-consumed instrumentation",
+fig.text(0.09, 0.965, "pgque smoke — PR #62 rotation + events-consumed instrumentation",
          ha="left", fontsize=12, fontweight="bold", color=FG_EMPH)
 fig.text(0.09, 0.94, "20 min (5m clean + 10m held xmin + 5m recovery). Event-rate: producer was rate-capped at 1000 ev/s.",
          ha="left", fontsize=9, color=FG_DIM)
 
 fig.savefig(OUT, dpi=110, bbox_inches="tight", facecolor=BG)
-print(f"wrote {OUT}  ({OUT.stat().st_size/1024:.0f} KB)")
+print(f"wrote {OUT}  ({OUT.stat().st_size/1024:.0f} KiB)")
diff --git a/benchmark/consumers/consumer_pgboss.sql b/benchmark/consumers/consumer_pgboss.sql
index 514c3b0a..479a961d 100644
--- a/benchmark/consumers/consumer_pgboss.sql
+++ b/benchmark/consumers/consumer_pgboss.sql
@@ -1,4 +1,4 @@
--- R6 instrumented consumer for pg-boss (SKIP LOCKED + DELETE USING on partitioned job table).
+-- Instrumented consumer for pg-boss (SKIP LOCKED + DELETE USING on partitioned job table).
 -- pg-boss 'default' queue, state='created'.
 DO $$
 DECLARE
diff --git a/benchmark/consumers/consumer_pgmq-partitioned.sql b/benchmark/consumers/consumer_pgmq-partitioned.sql
index 962fcd7b..20b48cd8 100644
--- a/benchmark/consumers/consumer_pgmq-partitioned.sql
+++ b/benchmark/consumers/consumer_pgmq-partitioned.sql
@@ -1,4 +1,4 @@
--- R6 instrumented consumer for pgmq-partitioned (identical SQL to pgmq;
+-- Instrumented consumer for pgmq-partitioned (identical SQL to pgmq;
 -- the partitioning is on the schema side only).
 DO $$
 DECLARE
diff --git a/benchmark/consumers/consumer_pgmq.sql b/benchmark/consumers/consumer_pgmq.sql
index 51c730f0..52e842d5 100644
--- a/benchmark/consumers/consumer_pgmq.sql
+++ b/benchmark/consumers/consumer_pgmq.sql
@@ -1,4 +1,4 @@
--- R6 instrumented consumer for pgmq (read-then-delete).
+-- Instrumented consumer for pgmq (read-then-delete).
 -- pgmq.delete is a function body so pg_stat_statements can't see row counts;
 -- NOTICE is the authoritative source.
 DO $$
diff --git a/benchmark/consumers/consumer_pgq.sql b/benchmark/consumers/consumer_pgq.sql
index df251f8c..536c28f7 100644
--- a/benchmark/consumers/consumer_pgq.sql
+++ b/benchmark/consumers/consumer_pgq.sql
@@ -1,4 +1,4 @@
--- R6 instrumented consumer for pgq (identical to pgque; schema swap only).
+-- Instrumented consumer for pgq (identical to pgque; schema swap only).
 DO $$
 DECLARE
   b bigint;
diff --git a/benchmark/consumers/consumer_pgque.sql b/benchmark/consumers/consumer_pgque.sql
index 4f6a4cd6..989b4f62 100644
--- a/benchmark/consumers/consumer_pgque.sql
+++ b/benchmark/consumers/consumer_pgque.sql
@@ -1,4 +1,4 @@
--- R6 instrumented consumer for pgque (batch ticker model).
+-- Instrumented consumer for pgque (batch ticker model).
 -- Preserves original behavior; adds one NOTICE per call for honest-events/s.
 -- Format: NOTICE: ev ts=<epoch_s> n=<events>
 DO $$
diff --git a/benchmark/consumers/consumer_que.sql b/benchmark/consumers/consumer_que.sql
index 14b188f8..3498ab5b 100644
--- a/benchmark/consumers/consumer_que.sql
+++ b/benchmark/consumers/consumer_que.sql
@@ -1,5 +1,5 @@
--- R6 instrumented consumer for que (Ruby Que gem; SKIP LOCKED + DELETE).
--- Mirror R5 shape (que_jobs with default queue and priority ordering).
+-- Instrumented consumer for que (Ruby Que gem; SKIP LOCKED + DELETE).
+-- que_jobs with default queue and priority ordering.
 DO $$
 DECLARE
   n int := 0;
diff --git a/benchmark/consumers/consumer_river.sql b/benchmark/consumers/consumer_river.sql
index d9be6a3c..727c9463 100644
--- a/benchmark/consumers/consumer_river.sql
+++ b/benchmark/consumers/consumer_river.sql
@@ -1,4 +1,4 @@
--- R6 instrumented consumer for river (SKIP LOCKED + DELETE USING).
+-- Instrumented consumer for river (SKIP LOCKED + DELETE USING).
 -- Wrapped in DO block so NOTICE carries the count.
 -- Raw DELETE rowcount is ALSO visible in pg_stat_statements via the
 -- wrapped statement (total row count over the run — cross-check).
diff --git a/benchmark/runners/run_r7.sh b/benchmark/runners/run_r7.sh
index 0ecf4b59..9f15582d 100644
--- a/benchmark/runners/run_r7.sh
+++ b/benchmark/runners/run_r7.sh
@@ -1,18 +1,16 @@
 #!/usr/bin/env bash
 # run_r7.sh <system> [duration_s]
-# R7: 1.5h bench (30m clean + 30m idle-in-tx + 30m recovery)
-# Adds over R6:
-#   * sys_metrics_sampler.py running alongside bloat_sampler.py
-#   * pg_ash ASH + pgfr snapshots copied to CSV at end (schemas preserved)
-#
-# Same instrumented consumer NOTICE format as R6 for honest ev/s parsing.
+# run_r7.sh: 1.5h bench (30m clean + 30m idle-in-tx + 30m recovery)
+# sys_metrics_sampler.py runs alongside bloat_sampler.py.
+# pg_ash ASH + pgfr snapshots copied to CSV at end (schemas preserved).
+# Consumer NOTICE format: NOTICE: ev ts=<epoch_s> n=<events>
 set -Eeuo pipefail
 SYS=${1:?system}
 DUR=${2:-5400}  # 1.5h = 5400s
 
 R7_DIR=${R7_DIR:-/tmp/r7}
 R6_DIR=${R6_DIR:-/tmp/r6}
-# Consumer SQL may be in either /tmp/consumer.sql (pushed), /tmp/r7/consumer_<sys>.sql, or /tmp/r6/consumer_<sys>.sql
+# Consumer SQL may be in either /tmp/consumer.sql (pushed), $R7_DIR/consumer_<sys>.sql, or $R6_DIR/consumer_<sys>.sql
 for c in "$R7_DIR/consumer_${SYS}.sql" "/tmp/consumer.sql" "$R6_DIR/consumer_${SYS}.sql"; do
   if [[ -f "$c" ]]; then CONSUMER_SQL="$c"; break; fi
 done
@@ -46,11 +44,11 @@ sudo -u postgres psql -d bench -c "SELECT pg_stat_statements_reset()" >/dev/null
 python3 /tmp/bloat_sampler.py --system "$SYS" --interval 30 --duration "$DUR" > /tmp/bench/bloat.csv &
 BLOAT_PID=$!
 
-# NEW in R7: sys_metrics_sampler.py — CPU/mem/disk every 10s, NVMe instance-store device
+# sys_metrics_sampler.py — CPU/mem/disk every 10s, NVMe instance-store device
 python3 "$R7_DIR/sys_metrics_sampler.py" --interval 10 --duration "$DUR" --device nvme1n1 --out /tmp/bench/sys_metrics.csv > /tmp/bench/sys_metrics.log 2>&1 &
 SYSM_PID=$!
 
-# pgss snapshot cross-check (carried over from R6)
+# pgss snapshot cross-check
 python3 "$R6_DIR/pg_stat_statements_snapshot.py" \
   --dsn "host=127.0.0.1 dbname=bench user=postgres" \
   --interval 10 --duration "$DUR" \
@@ -58,14 +56,14 @@ python3 "$R6_DIR/pg_stat_statements_snapshot.py" \
   > /tmp/bench/pgss_snapshotter.log 2>&1 &
 PGSS_PID=$!
 
-# Producer — R7 full: -R 5000 (was 1000 in smoke/R6)
+# Producer — full run: -R 5000
 pgbench -h 127.0.0.1 -U postgres -d bench -n -f /tmp/producer.sql \
   -c 1 -j 1 -R 5000 -T "$DUR" -P 30 \
   --aggregate-interval=10 --log --log-prefix=/tmp/bench/producer_agg \
   > /tmp/bench/producer.log 2>&1 &
 PROD_PID=$!
 
-# Consumer (R6 instrumented SQL)
+# Consumer (NOTICE-instrumented SQL)
 pgbench -h 127.0.0.1 -U postgres -d bench -n -f "$CONSUMER_SQL" \
   -c $CONS_C -j $CONS_C -T "$DUR" -P 30 \
   --aggregate-interval=10 --log --log-prefix=/tmp/bench/consumer_agg \
@@ -112,4 +110,4 @@ python3 "$R6_DIR/parse_events_consumed.py" \
   --system "$SYS" \
   > /tmp/bench/events_consumed_parse.log 2>&1 || true
 
-echo "=== R7 done: $SYS ==="
+echo "=== bench done: $SYS ==="
diff --git a/benchmark/tooling/parse_events_consumed.py b/benchmark/tooling/parse_events_consumed.py
index 8afa831e..da8b6b51 100755
--- a/benchmark/tooling/parse_events_consumed.py
+++ b/benchmark/tooling/parse_events_consumed.py
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-"""parse_events_consumed.py — turn R6 NOTICE-instrumented consumer.log into
+"""parse_events_consumed.py — turn NOTICE-instrumented consumer.log into
 an honest events-delivered-per-second time series.
 
 Inputs (by convention, --bench-dir):
@@ -109,7 +109,7 @@ def main() -> int:
                 f"method=NOTICE\n"
                 f"notice_lines_found=0\n"
                 "WARNING: no NOTICE events_consumed data — verify consumer.sql\n"
-                "         is the R6 instrumented version. Fallback: use\n"
+                "         uses NOTICE instrumentation. Fallback: use\n"
                 "         pg_stat_statements snapshot (pgss_timeseries.csv)\n"
                 "         or compute via producer_rate - queue_bloat_delta.\n"
             )
diff --git a/benchmark/tooling/sys_metrics_sampler.py b/benchmark/tooling/sys_metrics_sampler.py
index 4897e27b..e2f562e7 100644
--- a/benchmark/tooling/sys_metrics_sampler.py
+++ b/benchmark/tooling/sys_metrics_sampler.py
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-"""R7 v2 sys metrics sampler: CPU + RAM + per-device NVMe IOPS / MiB/s / latency.
+"""sys_metrics_sampler.py v2: CPU + RAM + per-device NVMe IOPS / MiB/s / latency.
 
 Columns:
   ts_iso,

From d586e3510cafc08a17611e9ca9d761257453b2c3 Mon Sep 17 00:00:00 2001
From: Nik Samokhvalov <nik@postgres.ai>
Date: Thu, 30 Apr 2026 02:35:48 -0700
Subject: [PATCH 4/6] chore(benchmark): address REV r1 findings (anti-leak +
 style)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Remove private WI refs from bootstrap.sh:41,44 (replace with
  "see methodology notes")
- Fix set -Eeuo pipefail in 7 shell scripts that only had partial flags
- Fix broken OPS_GOTCHAS.md:185 link (install_pgque.sh → install/README.md)
- Fix binary unit in install_pgboss.sh:2 (GB → GiB)
- Fix run_r7.sh tool paths: resolve from benchmark/tooling/ by default
  instead of hardcoded /tmp/r7 and /tmp/r6; override via R7_DIR/R6_DIR

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 benchmark/OPS_GOTCHAS.md             | 2 +-
 benchmark/install/bootstrap.sh       | 6 +++---
 benchmark/install/install_pgboss.sh  | 4 ++--
 benchmark/install/install_pgmq.sh    | 2 +-
 benchmark/install/install_pgq.sh     | 2 +-
 benchmark/install/install_river.sh   | 2 +-
 benchmark/runners/clean_reinstall.sh | 2 +-
 benchmark/runners/run_r7.sh          | 9 +++++++--
 benchmark/tooling/microbench.sh      | 2 +-
 9 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/benchmark/OPS_GOTCHAS.md b/benchmark/OPS_GOTCHAS.md
index d2391822..1dcc40b6 100644
--- a/benchmark/OPS_GOTCHAS.md
+++ b/benchmark/OPS_GOTCHAS.md
@@ -182,7 +182,7 @@ ERROR: cannot cast type xid8 to bigint
 
 **Fix:** `::text::bigint` as intermediate cast. Landed in [PR #62](https://github.com/NikolayS/pgque/pull/62).
 
-If you see this on a VM that wasn't freshly installed from the PR #62 branch, re-run [install/install_pgque.sh](install/install_pgque.sh).
+If you see this on a VM that wasn't freshly installed from the PR #62 branch, re-clone the branch and run `make USE_PGXS=1 install` followed by `psql -f build/pgque.sql` per [install/README.md](install/README.md).
 
 ---
 
diff --git a/benchmark/install/bootstrap.sh b/benchmark/install/bootstrap.sh
index 84c6c8ad..ace2e7e2 100644
--- a/benchmark/install/bootstrap.sh
+++ b/benchmark/install/bootstrap.sh
@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-set -euo pipefail
+set -Eeuo pipefail
 export DEBIAN_FRONTEND=noninteractive
 
 echo "=== [$(hostname)] bootstrap start $(date -u +%FT%TZ) ==="
@@ -38,10 +38,10 @@ sudo ln -sfn /tmp/pgfr/_analyze/sql /analyze_sql
 sudo mkdir -p /data
 sudo chown postgres:postgres /data
 
-# ── postgresql.conf tuning (same as issue #77) ───────────────────────────────
+# ── postgresql.conf tuning (see methodology notes) ───────────────────────────
 sudo tee -a /etc/postgresql/18/main/postgresql.conf >/dev/null <<'CONF'
 
-# ── pgq bench tuning (issue #77) ─────────────────────────────────────────────
+# ── pgq bench tuning (see methodology notes) ─────────────────────────────────
 shared_preload_libraries = 'pg_stat_statements,pg_cron'
 cron.database_name = 'bench'
 
diff --git a/benchmark/install/install_pgboss.sh b/benchmark/install/install_pgboss.sh
index 5a04d458..615dd4d5 100644
--- a/benchmark/install/install_pgboss.sh
+++ b/benchmark/install/install_pgboss.sh
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
-# pgboss install with covering index for polling query (avoids 7.9 GB disk sort pathology).
-set -euo pipefail
+# pgboss install with covering index for polling query (avoids 7.9 GiB disk sort pathology).
+set -Eeuo pipefail
 echo "=== install pg-boss v12.15.0 + Node worker + covering index ==="
 
 curl -fsSL https://deb.nodesource.com/setup_20.x | sudo bash - >/dev/null 2>&1
diff --git a/benchmark/install/install_pgmq.sh b/benchmark/install/install_pgmq.sh
index 0b9a802e..a6736f8f 100644
--- a/benchmark/install/install_pgmq.sh
+++ b/benchmark/install/install_pgmq.sh
@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-set -euo pipefail
+set -Eeuo pipefail
 echo "=== install pgmq v1.11.0 (SQL-only) ==="
 
 curl -fsSL \
diff --git a/benchmark/install/install_pgq.sh b/benchmark/install/install_pgq.sh
index f8ab8823..3cb86799 100644
--- a/benchmark/install/install_pgq.sh
+++ b/benchmark/install/install_pgq.sh
@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-set -euo pipefail
+set -Eeuo pipefail
 echo "=== install pgq v3.5.1 ==="
 
 sudo rm -rf /tmp/pgq
diff --git a/benchmark/install/install_river.sh b/benchmark/install/install_river.sh
index afdf8ee2..897e10d2 100644
--- a/benchmark/install/install_river.sh
+++ b/benchmark/install/install_river.sh
@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-set -euo pipefail
+set -Eeuo pipefail
 echo "=== install river v0.34.0 + Go worker ==="
 
 # i4i.2xlarge is x86_64
diff --git a/benchmark/runners/clean_reinstall.sh b/benchmark/runners/clean_reinstall.sh
index e6d94213..1fbdf1a9 100644
--- a/benchmark/runners/clean_reinstall.sh
+++ b/benchmark/runners/clean_reinstall.sh
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 # Per-system clean reinstall + stats reset.
-set -uo pipefail
+set -Eeuo pipefail
 SYS=${1:?system}
 
 echo "=== [$(date -u +%FT%TZ)] clean reinstall: $SYS ==="
diff --git a/benchmark/runners/run_r7.sh b/benchmark/runners/run_r7.sh
index 9f15582d..b39243c5 100644
--- a/benchmark/runners/run_r7.sh
+++ b/benchmark/runners/run_r7.sh
@@ -4,12 +4,17 @@
 # sys_metrics_sampler.py runs alongside bloat_sampler.py.
 # pg_ash ASH + pgfr snapshots copied to CSV at end (schemas preserved).
 # Consumer NOTICE format: NOTICE: ev ts=<epoch_s> n=<events>
+#
+# Tool resolution: scripts in benchmark/tooling/ are used by default.
+# Override with R7_DIR or R6_DIR env vars if you need /tmp/r7 or /tmp/r6 copies.
 set -Eeuo pipefail
 SYS=${1:?system}
 DUR=${2:-5400}  # 1.5h = 5400s
 
-R7_DIR=${R7_DIR:-/tmp/r7}
-R6_DIR=${R6_DIR:-/tmp/r6}
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+TOOLING_DIR="${SCRIPT_DIR}/../tooling"
+R7_DIR=${R7_DIR:-"${TOOLING_DIR}"}
+R6_DIR=${R6_DIR:-"${TOOLING_DIR}"}
 # Consumer SQL may be in either /tmp/consumer.sql (pushed), $R7_DIR/consumer_<sys>.sql, or $R6_DIR/consumer_<sys>.sql
 for c in "$R7_DIR/consumer_${SYS}.sql" "/tmp/consumer.sql" "$R6_DIR/consumer_${SYS}.sql"; do
   if [[ -f "$c" ]]; then CONSUMER_SQL="$c"; break; fi
diff --git a/benchmark/tooling/microbench.sh b/benchmark/tooling/microbench.sh
index 9feb032b..4d1d5978 100644
--- a/benchmark/tooling/microbench.sh
+++ b/benchmark/tooling/microbench.sh
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 # Per-VM microbench: CPU (sysbench), memory (sysbench), disk (fio on NVMe)
-set -uo pipefail
+set -Eeuo pipefail
 HOST=$(hostname)
 
 # Ensure tools installed

From 0aa0d41c34e6283f64301cc7552ce7068c87bc40 Mon Sep 17 00:00:00 2001
From: Nik Samokhvalov <nik@postgres.ai>
Date: Thu, 30 Apr 2026 03:15:45 -0700
Subject: [PATCH 5/6] docs(benchmark): correct three methodology claims

- logging_collector=off does not mean zero log I/O; journald still
  writes to disk (#123)
- premake=20 planner cost is first-query-in-session, not per-query;
  root cause of steady-state regression is a follow-up (#124)
- add RAISE NOTICE observer-effect caveat for high-frequency use (#127)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 benchmark/OPS_GOTCHAS.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/benchmark/OPS_GOTCHAS.md b/benchmark/OPS_GOTCHAS.md
index 1dcc40b6..9fe1375b 100644
--- a/benchmark/OPS_GOTCHAS.md
+++ b/benchmark/OPS_GOTCHAS.md
@@ -49,7 +49,7 @@ Now observability output lands on NVMe alongside PG data.
 
 ## 3. `logging_collector=off` means server logs go to systemd journal
 
-With `logging_collector=off` (our default), server logs go to the systemd journal, not a file on disk. Means zero PG log I/O during bench — good for the throughput numbers.
+We run with `logging_collector=off`, so server logs go to systemd-journald (and from there to disk) rather than a PG-managed log file. Log I/O is small relative to WAL + queue table writes at the bench's TPS, but it is not zero — quantifying its share is a follow-up (#123).
 
 If investigating slow queries later, enable `log_statement='all'` + `logging_collector=on` with `log_directory` inside the NVMe data dir (relative path `'log'` resolves correctly since `data_dir` is symlinked to NVMe).
 
@@ -105,7 +105,7 @@ pg_partman is installed in schema `public` (not `partman` as the docs sometimes
 
 Default `premake=4` works for bench (4 future 5-min partitions = 20 min buffer, enough for 1.5 h bench given 1-min maintenance cron cadence).
 
-`premake=20` (24 partitions steady-state) killed pgmq-partitioned consumer perf (525 TPS vs 6621 with premake=4) due to planner cost across all partitions per query.
+`premake=20` (24 partitions steady-state) collapsed pgmq-partitioned consumer perf to 525 TPS vs 6621 with premake=4. The dominant cost in PG's per-partition planning is the first-query-in-session penalty (Postgres caches the plan for subsequent queries in the same connection). Why this still hurt the consumer at steady state is a follow-up (#124) — likely short-lived connections (or PgBouncer transaction-mode recycling) paying the penalty repeatedly.
 
 `infinite_time_partitions=true` is needed for the maintenance job to keep creating future partitions indefinitely:
 
@@ -241,6 +241,8 @@ pgbench faithfully prints NOTICE lines to stdout (captured as `consumer.log`). P
 
 Why NOTICE rather than pgss: DO-wrappers hide per-statement rows from `pg_stat_statements`, so the pgss queryid reported for a consumer is the DO wrapper as a whole — not the SELECT/DELETE inside it — which makes it useless for counting consumed events when the internal shape differs across systems. NOTICE is schema-neutral and authoritative.
 
+Caveat: `RAISE NOTICE` itself has observer effect (server→client protocol message + log subsystem write per fire). The choice was pragmatic for boundary events; high-frequency measurements would benefit from named functions + `pg_stat_statements` instead of DO-wrappers (#127).
+
 ---
 
 ## Summary checklist

From dbfdfebe16694f29e0f2c387a6d703ae77954446 Mon Sep 17 00:00:00 2001
From: Nik Samokhvalov <nik@postgres.ai>
Date: Thu, 30 Apr 2026 03:19:37 -0700
Subject: [PATCH 6/6] docs(benchmark): drop incorrect PgBouncer speculation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 benchmark/OPS_GOTCHAS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/benchmark/OPS_GOTCHAS.md b/benchmark/OPS_GOTCHAS.md
index 9fe1375b..fcc62c05 100644
--- a/benchmark/OPS_GOTCHAS.md
+++ b/benchmark/OPS_GOTCHAS.md
@@ -105,7 +105,7 @@ pg_partman is installed in schema `public` (not `partman` as the docs sometimes
 
 Default `premake=4` works for bench (4 future 5-min partitions = 20 min buffer, enough for 1.5 h bench given 1-min maintenance cron cadence).
 
-`premake=20` (24 partitions steady-state) collapsed pgmq-partitioned consumer perf to 525 TPS vs 6621 with premake=4. The dominant cost in PG's per-partition planning is the first-query-in-session penalty (Postgres caches the plan for subsequent queries in the same connection). Why this still hurt the consumer at steady state is a follow-up (#124) — likely short-lived connections (or PgBouncer transaction-mode recycling) paying the penalty repeatedly.
+`premake=20` (24 partitions steady-state) collapsed pgmq-partitioned consumer perf to 525 TPS vs 6621 with premake=4. The dominant cost in PG's per-partition planning is the first-query-in-session penalty (Postgres caches the plan for subsequent queries in the same connection). Why this still hurt the consumer at steady state is a follow-up (#124) — the bench used direct connections, so the actual cause needs measurement (e.g., consumer reconnect frequency, cached-plan invalidation, or per-query overhead distinct from initial planning).
 
 `infinite_time_partitions=true` is needed for the maintenance job to keep creating future partitions indefinitely: