docs: draft delayed-delivery rotation blueprint // v0.3 scope #237

Draft: NikolayS wants to merge 2 commits into main from claude/check-delayed-messages-IGSpF

@NikolayS
Owner

Summary

Draft blueprint for replacing the current single-heap pgque.delayed_events
design with a two-table TRUNCATE rotation, modeled on PgQ's event_N
pattern. The goal is to make pgque.send_at() viable as a primary write
path without the dead-tuple churn, index bloat, and VACUUM dependence
the current experimental design would incur at scale.

Key properties of the proposed design:

  • No DELETE and no UPDATE on delayed-event rows. Reclamation is
    TRUNCATE-only.
  • No VACUUM dependence on the hot path.
  • Stable catalog (two tables, one state row, one view) — no per-bucket
    CREATE / DROP churn.
  • Delivery latency bounded by the maint tick, not by a bucket-width
    floor.
  • Bounded write amplification (1 + rotations_crossed per row) for
    long-horizon scheduled messages.
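
The properties above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not the blueprint's implementation: the class name `RotatingDelayedStore`, the `watermark` field (standing in for the single state row), and the rule "carry every undelivered row into the other table on each rotation" are all hypothetical, chosen so the model reproduces the stated `1 + rotations_crossed` write bound.

```python
class RotatingDelayedStore:
    """Toy model of a two-table rotation (hypothetical; not pgque code)."""

    def __init__(self):
        self.tables = [[], []]           # the two physical tables
        self.active = 0                  # index of the table receiving inserts
        self.watermark = float("-inf")   # "state row": last delivered timestamp
        self.writes = {}                 # event id -> number of physical writes

    def send_at(self, event_id, deliver_at):
        # Initial insert: the first physical write of the row.
        # The model assumes deliver_at lies after the current watermark.
        self.tables[self.active].append((event_id, deliver_at))
        self.writes[event_id] = self.writes.get(event_id, 0) + 1

    def deliver_due(self, now):
        # Delivery only advances the watermark; rows are neither
        # updated nor deleted, so no dead tuples are modeled here.
        due = [eid for table in self.tables
               for eid, at in table if self.watermark < at <= now]
        self.watermark = now
        return due

    def rotate(self):
        # Carry undelivered rows forward (one extra write each), then
        # clear the old table; clear() stands in for TRUNCATE.
        old, new = self.active, 1 - self.active
        for eid, at in self.tables[old]:
            if at > self.watermark:
                self.tables[new].append((eid, at))
                self.writes[eid] += 1
        self.tables[old].clear()
        self.active = new


store = RotatingDelayedStore()
store.send_at("a", 5)    # due before the first rotation
store.send_at("b", 25)   # crosses two rotations (at t=10 and t=20)
print(store.deliver_due(10)); store.rotate()
print(store.deliver_due(20)); store.rotate()
print(store.deliver_due(30))
print(store.writes)
```

In this model the short-horizon row is written once, while the row that outlives two rotations is written three times, which matches the bound quoted in the list.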

Includes a comparison against the alternative (declarative range
partition + drop), explicit invariants, a TDD-shaped implementation
plan, and acceptance criteria phrased against pg_stat_user_tables and
pg_class so the no-bloat property is testable.

Open questions in the draft

These are flagged in the blueprint and want a call before code lands:

  • Default rotation interval — draft picks 24 h / UTC midnight; a shorter
    default trades catalog stability for write amplification on
    long-horizon rows.
  • pgque.delayed_events exposed as a view over the two physical tables,
    vs. renaming the public surface.
  • Cancellation (pgque.cancel_scheduled) is deferred to a follow-up
    blueprint; locking in a tombstone-set design now would be cheap insurance
    if cancel is likely-yes for 0.3.
  • Whether to migrate live data from existing experimental installs, or
    drop-and-recreate (current draft assumes the latter).
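
The rotation-interval question can be made concrete with a little arithmetic. The sketch below assumes rotation boundaries are aligned to the Unix epoch (which coincides with UTC midnight for a 24 h interval) and that a row is rewritten once per boundary it crosses; the function names are hypothetical and not part of pgque.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def rotations_crossed(inserted_at, deliver_at, interval):
    """Count rotation boundaries in (inserted_at, deliver_at].

    Assumes boundaries sit at EPOCH + k * interval and
    deliver_at >= inserted_at.
    """
    return (deliver_at - EPOCH) // interval - (inserted_at - EPOCH) // interval


def write_amplification(inserted_at, deliver_at, interval):
    # One initial insert plus one rewrite per rotation crossed.
    return 1 + rotations_crossed(inserted_at, deliver_at, interval)


inserted = datetime(2026, 5, 19, 19, 54, tzinfo=timezone.utc)
deliver = inserted + timedelta(days=30)

daily = write_amplification(inserted, deliver, timedelta(hours=24))
hourly = write_amplification(inserted, deliver, timedelta(hours=1))
print(daily, hourly)  # 31 721
```

Under these assumptions a message scheduled 30 days out is written 31 times with a 24 h interval and 721 times with a 1 h interval, which is the trade-off the first open question has to weigh.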

Test plan

Blueprint-only change; no code or test diffs. Once approved, the
implementation slice will:

  • Add a failing pg_stat_user_tables.n_dead_tup == 0 acceptance check to
    tests/acceptance/us4_delayed_delivery.sql.
  • Make it green by replacing the single heap with the rotation scheme.
  • Capture before/after dead-tuple and catalog-row counts in the
    implementation PR description.

  • Review blueprint design and tradeoffs.
  • Decide open questions (rotation interval, view vs. rename, cancellation scope).
  • Convert to implementation PR with failing test + rotation code.

https://claude.ai/code/session_01STJ9TCTr7ykbwG23t9ZxJv


Generated by Claude Code

claude added 2 commits May 19, 2026 19:54
Two-table TRUNCATE rotation for pgque.delayed_events modeled on PgQ's
event_N pattern. Removes DELETE/UPDATE/VACUUM from the delayed-delivery
hot path so send_at() can serve as a primary write path without
dead-tuple churn or catalog growth.

cancel_scheduled() is the single exception to the no-DELETE invariant.
Dead tuples it produces are bounded above by one rotation interval and
TRUNCATEd away on the next rotation that targets the affected table.

The bloat-free property of the design is therefore conditional on a
low cancellation rate. That caveat is required to appear:

- in the SQL comment on cancel_scheduled()
- in docs/reference.md next to the function entry
- in tutorials / examples that introduce cancellation
- in the send_at() function comment

For very-high-cancel workloads, docs must steer users to the
application-level "skip on consume" pattern instead.
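
A minimal sketch of the "skip on consume" pattern follows. It assumes the application keeps its own record of cancelled IDs (the `cancelled_ids` set here) and that each event carries an `id` key; both are illustrative choices, not pgque's API. The scheduled rows are never deleted; a cancelled message is simply ignored when it arrives.

```python
def consume(batch, cancelled_ids, handle):
    """Process a batch, skipping events the application has cancelled.

    batch         -- iterable of event dicts with an "id" key (assumed shape)
    cancelled_ids -- set of IDs cancelled at the application level
    handle        -- callback invoked for every event that is still wanted
    """
    processed = []
    for event in batch:
        if event["id"] in cancelled_ids:
            continue  # cancelled: drop on the consumer side, no DELETE needed
        handle(event)
        processed.append(event["id"])
    return processed


handled = []
batch = [{"id": 1}, {"id": 2}, {"id": 3}]
print(consume(batch, cancelled_ids={2}, handle=handled.append))  # [1, 3]
```

The cost moves from the delayed-event tables to wherever the application stores its cancellations, so that store still needs its own cleanup policy.
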

NikolayS changed the title from "docs: draft delayed-delivery rotation blueprint" to "docs: draft delayed-delivery rotation blueprint // v0.3 scope" on May 24, 2026

NikolayS pushed a commit that referenced this pull request May 30, 2026
Earlier draft concluded the zero-bloat differentiator does not transfer
to a workflow layer, assuming a mutable workflow_status row updated per
step (the DBOS/absurd strategy). That was wrong.

Model workflow state transitions as appended events over the rotating
log (continuation-passing): each step enqueues its successor instead of
mutating a row. Transitions become appends, not UPDATEs, so zero-bloat
carries through. Exactly-once handoff falls out of insert_event +
finish_batch in one transaction; sleep/timers use the rotating send_at
from PR #237; exclusivity is structural via cooperative consumers; the
only mutable state is a current-state projection bounded by concurrency.

Verdict flips from 'do not compete' to 'compete on a substrate
SKIP-LOCKED systems cannot match for high-throughput durable workflows'.
Remaining real risk: awaitEvent/join semantics.
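
The continuation-passing idea in this commit message can be sketched with an in-memory append-only list standing in for the rotating log. Everything here is hypothetical (the `run_to_completion` name, the event shape, the step names), and the sketch does not model transactions, so the exactly-once handoff via insert_event + finish_batch is out of scope; it only shows that every transition is an append and no existing entry is mutated.

```python
def run_to_completion(log, steps):
    """Drive workflows by appending successor events to the log.

    log   -- list of event dicts, used append-only
    steps -- maps a step name to a function payload -> (next_step, payload),
             or None when the workflow is finished
    """
    cursor = 0
    while cursor < len(log):
        event = log[cursor]
        cursor += 1
        successor = steps[event["step"]](event["payload"])
        if successor is not None:
            next_step, next_payload = successor
            # A state transition is a new event, never an update in place.
            log.append({
                "workflow_id": event["workflow_id"],
                "step": next_step,
                "payload": next_payload,
            })
    return log


steps = {
    "reserve": lambda p: ("charge", {**p, "reserved": True}),
    "charge": lambda p: ("ship", {**p, "charged": True}),
    "ship": lambda p: None,
}
log = [{"workflow_id": "wf-1", "step": "reserve", "payload": {"order": 42}}]
run_to_completion(log, steps)
print([e["step"] for e in log])  # ['reserve', 'charge', 'ship']
```
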

NikolayS pushed a commit that referenced this pull request Jun 2, 2026
Map the durable-workflow design to pgque's real primitives and verify
the keystone against sql/pgque.sql: insert_event + finish_batch compose
atomically in the caller transaction (exactly-once handoff), finish_batch
is one subscription UPDATE per batch (amortization), ev_extra1..4 are
settable+indexable (workflow_id lookup). Flags the retry_queue DELETE-bloat
constraint (route sleeps through rotating send_at, PR #237), gives the new
coordination DDL, concrete awaitEvent/emit + join SQL, a bloat audit, and
the pgque gaps to close (promote send_at, ev_extra1 index, durable.sql).