Skip to content

Benchmark: producer batch insert options and send_batch tradeoffs #160

@NikolayS

Description

@NikolayS

Summary

Benchmarked PgQue producer paths to compare:

  1. client-side loop over send();
  2. legacy send_batch() implementation (PL/pgSQL loop over insert_event());
  3. PR fix: make send_batch set-based #159 set-based send_batch() implementation;
  4. PgQ-style direct set-based insert into current_event_table();
  5. PgQ-style direct COPY into current_event_table().

This issue documents the numbers and the product/API conclusion: send_batch() is worth keeping as the safe high-level batch API, but COPY/direct table insert remains the expert maximum-throughput path.

Environment / method

  • Branch: fix/send-batch-set-based / PR fix: make send_batch set-based #159 implementation.
  • Fresh local Postgres DB with sql/pgque.sql installed.
  • Python psycopg benchmark harness.
  • Each case creates a fresh queue, inserts N events, commits, verifies count, drops queue.
  • Repeats: 2 per case; table below uses median.
  • Payloads:
    • JSON cases: {"x": i} style payloads.
    • Text cases: opaque-i payloads.
  • These are local single-session microbenchmarks, not full server-class published benchmark numbers.

Results

N = 1,000

Path Median seconds Events/s
single send() in client loop 0.3277 3,051
legacy loop send_batch(jsonb[]) 0.0514 19,470
new set-based send_batch(jsonb[]) 0.0134 74,739
legacy loop send_batch(text[]) 0.0364 27,494
new set-based send_batch(text[]) 0.0090 111,447
direct set-based insert into current_event_table() 0.0058 171,286
direct COPY into current_event_table() 0.0087 114,293

N = 10,000

Path Median seconds Events/s
single send() in client loop 3.0403 3,289
legacy loop send_batch(jsonb[]) 0.4029 24,822
new set-based send_batch(jsonb[]) 0.1167 85,683
legacy loop send_batch(text[]) 0.3642 27,457
new set-based send_batch(text[]) 0.0748 133,717
direct set-based insert into current_event_table() 0.0674 148,331
direct COPY into current_event_table() 0.0649 154,147

N = 50,000

Path Median seconds Events/s
legacy loop send_batch(jsonb[]) 2.2683 22,043
new set-based send_batch(jsonb[]) 1.1400 43,858
legacy loop send_batch(text[]) 1.8915 26,435
new set-based send_batch(text[]) 0.6330 78,992
direct set-based insert into current_event_table() 0.2823 177,130
direct COPY into current_event_table() 0.3042 164,364

Interpretation

send_batch() is useful

Even the legacy loop send_batch() beats client-side send() loops because it removes client round-trips.

The set-based implementation in #159 is materially better:

  • JSON batch, N=10k: ~3.5× faster than legacy loop.
  • Text batch, N=10k: ~4.9× faster than legacy loop.
  • Text batch gets close to direct insert/COPY for moderate batch sizes.

send_batch() is not the absolute max-throughput path

Direct PgQ-style insertion into current_event_table() and COPY remain faster, especially at 50k rows.

That is expected: those paths bypass the safe SQL API wrapper and operate directly on queue event tables. They are expert tools.

Product/API conclusion

Keep a two-tier story:

  1. send_batch() — safe high-level API for app developers and drivers.

  2. current_event_table() + direct INSERT/COPY — expert maximum-throughput path.

    • More fragile / lower-level.
    • Requires understanding active event tables, rotation semantics, defaults, and privileges.
    • Should be documented as advanced producer mode, not the default quickstart.

Recommendation

  • Merge fix: make send_batch set-based #159 after CI/review if green.
  • Document send_batch() as set-based but not COPY-level.
  • Add a future docs/benchmark section for expert direct insert/COPY producer mode.
  • Do not remove send_batch(); it fills the ergonomic middle between single send() and raw PgQ table access.

Raw benchmark harness

The benchmark was run with a temporary Python/psycopg harness that creates equivalent legacy-loop functions for comparison and measures all paths in fresh queues.

Metadata

Metadata

Assignees

No one assigned

    Labels

    documentationImprovements or additions to documentation

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions