postgres: add primary/secondary cluster sample app with traffic generator #122

Merged
Dasomeone merged 7 commits into main from gaantunes/postgres-cluster-sample-app
Mar 11, 2026

Conversation

@gaantunes
Contributor

Summary

  • Converts the single-node postgres sample app into a primary + secondary streaming replication cluster using Multipass VMs and cloud-init
  • Adds a realistic traffic generator (traffic/load.sh + traffic/schema.sql) with an e-commerce schema (users, products, orders, order_items) and mixed read/write workloads
  • Adds contention workloads (lock holds, slow scans, long-running transactions) to surface realistic query latency and wait events in dashboards
  • Adds a postgres_cluster label to all metrics via Alloy relabeling so both nodes can be grouped/filtered in Grafana

Changes

New files:

  • jinja/templates/cloud-init-primary-template.yaml — cloud-init for the primary: configures WAL replication, pg_stat_statements, creates replication user, seeds schema, starts Alloy
  • jinja/templates/cloud-init-secondary-template.yaml — cloud-init for the secondary: runs pg_basebackup against the primary and starts as a hot standby
  • traffic/schema.sql — e-commerce schema with 500 users, 200 products, 300 seeded orders
  • traffic/load.sh — continuous mixed workload with weighted random selection
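
The weighted random selection itself is not shown in this description. A minimal Bash sketch of the idea follows; the workload names and weights are illustrative assumptions, not the values used in traffic/load.sh.

```shell
#!/usr/bin/env bash
# Sketch of weighted random selection. The names and weights below are
# hypothetical; each entry has the form "name:weight".
WORKLOADS=("read:6" "insert:3" "update:2" "delete:1")

# total_weight: print the sum of all weights in WORKLOADS.
total_weight() {
  local total=0 entry
  for entry in "${WORKLOADS[@]}"; do
    total=$((total + ${entry##*:}))
  done
  echo "$total"
}

# pick_workload ROLL: map a roll in [0, total_weight) to a workload name by
# walking the list and subtracting each weight until the roll fits.
pick_workload() {
  local roll=$1 entry weight
  for entry in "${WORKLOADS[@]}"; do
    weight=${entry##*:}
    if [ "$roll" -lt "$weight" ]; then
      echo "${entry%%:*}"
      return 0
    fi
    roll=$((roll - weight))
  done
  return 1  # roll was outside [0, total_weight)
}

# Draw one workload at random.
pick_workload $((RANDOM % $(total_weight)))
```

Keeping the roll as an argument makes the mapping easy to check by hand: with these weights, rolls 0-5 select read, 6-8 insert, 9-10 update, and 11 delete.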

Updated:

  • Makefile — added run, run-primary, run-secondary, wait-primary, render-primary-config, render-secondary-config, launch-primary, launch-secondary targets
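
The body of the wait-primary target is not shown here. One plausible shape for it is a bounded retry loop around a readiness probe; the helper name `wait_until` and the `PRIMARY_IP` variable below are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# wait_until ATTEMPTS DELAY CMD...: run CMD repeatedly until it succeeds,
# sleeping DELAY seconds between tries. Returns 1 if every attempt fails.
wait_until() {
  local attempts=$1 delay=$2 i=0
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example (not executed here): poll the primary for up to about 60 seconds.
# PRIMARY_IP is a hypothetical variable holding the primary VM's address.
#   wait_until 30 2 pg_isready -h "$PRIMARY_IP" -p 5432
```

Bounding the loop means a primary that never comes up fails the run instead of hanging it, which matters when the same target is driven from CI.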

How to run

# Configure with your Grafana Cloud (or local k3d) endpoints
make defaultconfig   # or edit jinja/variables/cloud-init.yaml directly

# Launch both nodes
make run

# Start traffic generation
make load-gen

Test plan

  • Primary VM launches, postgres initializes with WAL replication settings
  • Secondary VM connects to primary via pg_basebackup and starts as hot standby (pg_is_in_recovery() = true)
  • Alloy on both nodes ships metrics with job="integrations/postgres_exporter", instance=<hostname>, postgres_cluster="postgres-cluster"
  • Logs from /var/log/postgresql/ ship to Loki
  • Traffic generator creates realistic mixed workload with visible lock contention and long-running queries
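
The hot-standby check in the test plan can be scripted. The sketch below assumes the flag is read with `psql -Atc`, which prints a boolean as a bare `t` or `f`; the helper name `node_role` is hypothetical.

```shell
#!/usr/bin/env bash
# node_role FLAG: translate the output of
#   psql -Atc "SELECT pg_is_in_recovery()"
# into a role name. A standby reports "t", a primary reports "f".
node_role() {
  case "$1" in
    t) echo "standby" ;;
    f) echo "primary" ;;
    *) echo "unknown"; return 1 ;;
  esac
}

# Example (not executed here), run on either VM:
#   node_role "$(sudo -u postgres psql -Atc 'SELECT pg_is_in_recovery()')"
```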

🤖 Generated with Claude Code

gaantunes and others added 3 commits March 6, 2026 16:51
…enerator

- Add cloud-init-primary-template.yaml: sets up PostgreSQL as a streaming
  replication primary with shared_preload_libraries, wal_level=replica,
  replication user, and pg_hba.conf entries for the standby
- Add cloud-init-secondary-template.yaml: runs pg_basebackup from the primary
  (using -R to auto-configure standby.signal and primary_conninfo) and starts
  as a hot standby
- Both nodes ship metrics and logs via Alloy using constants.hostname as the
  instance label, so primary and secondary appear as separate instances in
  dashboards
- Add traffic/schema.sql: realistic e-commerce schema (users, products, orders,
  order_items) with seed data
- Add traffic/load.sh: continuous mixed workload (reads, inserts, updates,
  deletes, lock contention, cleanup) with weighted random selection
- Update Makefile: run launches the full cluster (launch-primary, wait-primary,
  launch-secondary); fix defaultconfig to use > instead of >> to avoid
  duplicate entries on repeated runs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…with contention workloads

- Add cluster="postgres-cluster" relabel rule to both primary and secondary
  alloy configs so both nodes can be grouped/filtered by cluster in dashboards
- Add lock_contention workload (holds row locks for ~2s) to surface wait events
- Add slow_scan workload (full table joins without index hints) for query latency
- Add long_read workload (pg_sleep inside transaction) to trigger long_running_transactions collector
- Increase workload concurrency weights and reduce sleep to drive higher throughput
- Fix Jinja2 template issue: wrap ${#WORKLOADS[@]} in {% raw %}...{% endraw %} to
  prevent Jinja2 interpreting {# as a comment tag
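
A sketch of the expansion involved, shown as it would look after rendering. The array contents are illustrative, not the real workload list.

```shell
#!/usr/bin/env bash
# In the Jinja2 template the array-length expansion has to be written as
#   {% raw %}${#WORKLOADS[@]}{% endraw %}
# because Jinja2 treats "{#" as the start of a comment. After rendering,
# the script contains the plain Bash below.
WORKLOADS=(reads inserts updates)   # hypothetical names
count=${#WORKLOADS[@]}              # number of elements in the array
index=$((RANDOM % count))
echo "picked ${WORKLOADS[$index]} out of $count workloads"
```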

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@gaantunes requested a review from a team as a code owner on March 6, 2026 21:20
- Bump seed data to 2000 users, 500 products, 2000 orders for heavier workloads
- Add 10 parallel workers (up from 5) for increased concurrency
- Add workload_idle_in_transaction: holds row locks via shell-level sleep
  to produce real idle-in-transaction sessions in pg_stat_activity
- Add workload_blocked: targets same rows to create genuine lock queue
- Add workload_heavy_analytics and workload_slow_report for long-running queries
- Fix auto_explain removal (caused setup script to abort before init flag)
- Enable slow query logging: log_min_duration_statement=200ms, log_connections,
  log_disconnections, log_lock_waits, log_temp_files, log_checkpoints,
  log_autovacuum_min_duration=250ms, log_statement=ddl, log_line_prefix
- Add load-gen to make run target so traffic starts automatically

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@cla-assistant

cla-assistant bot commented Mar 10, 2026

CLA assistant check
All committers have signed the CLA.

gaantunes and others added 2 commits March 10, 2026 18:23
Make's line continuation replaces '\' + newline with a space, causing
printf to output leading spaces before each key after the first line.
This produced indented YAML keys which failed to parse in CI.

Fix by using separate printf calls per line instead.
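
The effect can be reproduced in plain shell. The key names and values below are placeholders; the first command approximates what printf received once the continuation had been joined.

```shell
#!/usr/bin/env bash
# Joined form: the second key inherits a leading space, so it is written
# as an indented YAML key.
joined=$(printf 'first_key: a\n second_key: b\n')

# Fixed form: one printf call per line, so every key starts at column 0.
separate=$(printf 'first_key: a\n'; printf 'second_key: b\n')

printf '%s\n' "--- joined ---" "$joined" "--- separate ---" "$separate"
```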

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…late

Instead of embedding load.sh and schema.sql inline in the cloud-init YAML,
keep them as standalone files in traffic/ and transfer them to the VM via
multipass transfer after the primary is up (new setup-traffic Makefile target).

- traffic/schema.sql: bump to 2000 users, 500 products, 2000 orders
- traffic/load.sh: port full parallel worker logic with idle-in-transaction,
  blocked, lock contention, heavy analytics, and slow report workloads
- cloud-init-primary-template.yaml: remove embedded script write_files blocks
- Makefile: add setup-traffic target; wire into run after wait-primary

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Member

@Dasomeone left a comment

Couple small things, otherwise LGTM:

  • Delete the existing cloud-init file and remove references to it in the Makefile, given that run-ci actually invokes run, which uses the new setup
  • For the matching integration, could you please rerun generation for the metric_names file and update the one here as well?
    That should let us check that any change in metrics is still covered by the sample app, not that I doubt that it is.

- Delete unused cloud-init-template.yaml (replaced by primary/secondary templates)
- Fix run-ci target: remove duplicate load-gen (already included via run)
- Update linux_metrics to match current enabled_collectors config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Member

@Dasomeone left a comment

LGTM, thanks for updating this (and the metrics list!) :D

@Dasomeone merged commit fb4d3e3 into main on Mar 11, 2026
9 checks passed
@Dasomeone deleted the gaantunes/postgres-cluster-sample-app branch on March 11, 2026 15:55