Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
c90689c
Add spec: atomic df.start() to eliminate df/_duroxide divergence
tjgreen42 Jun 18, 2026
00de110
spec: add dual-write inventory; clarify df.cancel vs df.signal
tjgreen42 Jun 18, 2026
7aeff23
spec: surface the four design options up front; state Option 3 + Opti…
tjgreen42 Jun 18, 2026
3f64e60
spec: add options comparison table
tjgreen42 Jun 18, 2026
efb818c
POC: atomic df.start() via in-transaction SPI enqueue (Option 3)
tjgreen42 Jun 18, 2026
6495ee1
spec: record validated Option 3 POC results
tjgreen42 Jun 19, 2026
66dd652
POC round 2: dynamic schema, provider probe + fallback, Option 4 reco…
tjgreen42 Jun 19, 2026
f72c6a0
Option 4: built-in durable reconciler cron loop
tjgreen42 Jun 19, 2026
33e9648
Address code-review findings (security + correctness)
tjgreen42 Jun 19, 2026
bbb4fed
Enqueue df.signal/df.cancel on the caller's transaction
tjgreen42 Jun 19, 2026
bf10f6f
Rewrite atomic-start spec: align with implementation, tighten, de-jargon
tjgreen42 Jun 19, 2026
f032982
Add E2E regression tests for atomic start/cancel/signal and reconcile
tjgreen42 Jun 19, 2026
0053a18
spec: call out direct duroxide-pg coupling and the PG-provider requir…
tjgreen42 Jun 19, 2026
d0eb2dc
Merge main into tjgreen42/atomic-start
tjgreen42 Jun 19, 2026
d46f431
Address self-review findings: upgrade path, fallback probe, start-wra…
tjgreen42 Jun 19, 2026
8472e19
Address rubber-duck feedback on early signals and upgrade docs
tjgreen42 Jun 19, 2026
00108fd
docs: flag wait_for_schedule replay-sequence upgrade caveat
tjgreen42 Jun 19, 2026
0845620
Address correctness review: backfill wrapper grants and broaden sched…
tjgreen42 Jun 19, 2026
458a422
Fix pgspot search_path finding for SECURITY DEFINER functions
tjgreen42 Jun 19, 2026
a38eb9f
docs: shorten atomic consistency spec to high-level design
tjgreen42 Jun 19, 2026
73a8cfe
docs: clarify start wrapper authorization comment
tjgreen42 Jun 19, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
153 changes: 153 additions & 0 deletions docs/spec-atomic-start.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,153 @@
# Keeping the control plane and the runtime consistent

**Status:** Implemented and validated. Atomic `df.start` / `df.cancel` /
`df.signal`, plus a reconciler that repairs leftover drift.
**Author:** pg_durable team
**Date:** June 2026

## Problem

A durable function has state in two places:

- **pg_durable control plane** — `df.nodes` and `df.instances`, written on the
caller's PostgreSQL transaction.
- **duroxide runtime** — `_duroxide` queue/history/instance state, previously
written out-of-band through a separate connection.

Those writes were not atomic. The visible failure was a rolled-back `df.start()`:
`df.*` rows rolled back, but the duroxide enqueue survived. The worker then retried
an orchestration whose graph no longer existed, waited up to 5 seconds for rows that
would never appear, and left an orphan behind. We reproduced this with rolled-back
starts leaving `_duroxide.instances` rows that had no matching `df.instances` row.

`df.cancel()` and `df.signal()` had the same transaction-boundary problem for their
runtime enqueue: a rollback of the caller's transaction did not roll back the runtime
work.

## Decision

Implement **prevent + repair**:

1. **Prevent:** enqueue `df.start()`, `df.cancel()`, and `df.signal()` runtime work
inside the caller's transaction via SPI. The control-plane writes and runtime
enqueue now commit or roll back together.
2. **Repair:** run a lightweight reconciler that removes leftover runtime orphans
and marks stuck control-plane rows failed. This catches legacy/fallback drift and
crash-time drift that transaction-local enqueue cannot prevent.

## Alternatives considered

| Option | Summary | Decision |
|---|---|---|
| Single source of truth in `_duroxide` | Move all state into the runtime schema and rebuild pg_durable reads/security over it. | Too large and migration-heavy for this fix. |
| Run pg_durable writes inside duroxide's transaction | Let duroxide-pg perform the `df.*` writes on its worker-pool connection. | Atomic, but not with the caller's transaction; breaks identity capture and row security expectations. |
| Run duroxide enqueue inside the caller's transaction | Keep both stores and use SPI to enqueue runtime work next to `df.*` writes. | Chosen primary fix. |
| Async reconciler only | Tolerate drift and repair it later. | Useful backstop, but insufficient alone. |

## Design overview

`df.start()` still creates the graph and instance rows in `df.*`. The final enqueue
step now happens through SQL on the same transaction, so a rollback undoes both the
control-plane rows and the runtime queue row. `df.cancel()` and `df.signal()` follow
the same transaction rule.

The runtime queue is not writable by ordinary users, so the enqueue goes through
private `SECURITY DEFINER` wrappers in the `df` schema. These wrappers are granted
through `df.grant_usage()`, build the runtime work items themselves, and perform
their own authorization checks before writing to `_duroxide`.

The start wrapper is intentionally **not** a general-purpose privileged runtime
entrypoint. It only starts the root function-graph orchestration and validates that
the input targets the same instance id. Cancel/signal wrappers authorize against the
instance owner.

### Direct duroxide-pg coupling

This is the abstraction break in the design. Most pg_durable code talks to the
runtime through duroxide's Rust provider/client API. That API uses a separate
connection pool, so it cannot share the caller's backend transaction.

To get caller-transaction atomicity, this PR calls the duroxide-pg SQL surface
directly (`enqueue_orchestrator_work`, `delete_instances_atomic`, and selected
runtime tables). That only works with a PostgreSQL-backed provider. In practice,
pg_durable is itself a PostgreSQL extension whose runtime state lives in the same
database, so a non-PG provider is not a meaningful deployment target; still, the
code probes for the SQL surface and falls back to the old out-of-band path when it
is absent.

### Signals

`df.signal()` fans out to the root instance and any running sub-orchestrations,
because a child branch may be the one waiting on the signal.

Duroxide does not buffer external events until an orchestration is ready to receive
them. Therefore a signal sent before the root runtime row exists is rejected instead
of returning `OK` and being silently skipped. Once the runtime row exists, the
signal enqueue is atomic with the caller's transaction.

### Reconciler

`df.reconcile()` is an admin-only backstop. It:

- deletes orphaned runtime **root** instances whose full subtree has no matching
`df.instances` row; and
- marks stuck `df.instances` rows failed when there is no live runtime instance and
no queued start.

The background worker keeps one reconciler durable loop running per cluster on
`pg_durable.reconciler_cron` (default `*/5 * * * *`; empty disables it), submitted
by the dedicated non-superuser role `df_reconciler`.

## Behavior changes

- `df.start()`, `df.cancel()`, and `df.signal()` now participate in the caller's
transaction. For example, `BEGIN; SELECT df.start(...); ROLLBACK;` no longer
starts the workflow on the atomic path.
- If the duroxide-pg SQL surface or the `df` wrappers are missing, pg_durable logs
and falls back to the previous non-atomic client path. The fallback is not emitted
as a client-visible SQL `WARNING`, so scripts that capture `SELECT df.start(...)`
output remain compatible.
- `df.wait_for_schedule` now records an `utc_now` event before the timer so repeated
schedule waits compute the next cron tick each generation. This fixes a loop
busy-loop bug, but changes the recorded replay event sequence.

## Upgrade and compatibility

The wrappers and `df.reconcile()` are `df`-schema objects, so they are included in
both fresh-install SQL and the `0.2.3 -> 0.2.4` upgrade script. The upgrade script
also updates `df.grant_usage()` / `df.revoke_usage()` and backfills wrapper EXECUTE
privileges to existing roles that already had explicit `USAGE` on schema `df`.

Binary backward compatibility is preserved for old schemas that have not yet run
`ALTER EXTENSION UPDATE`: the new binary uses the atomic path only when both the
duroxide-pg enqueue function and the `df._enqueue_orchestrator_*` wrappers exist;
otherwise it falls back to the old out-of-band client path.

There is no table data migration.

**Upgrade caveat:** drain or restart any in-flight instance already waiting in a
`WAIT_SCHEDULE` node during upgrade. Those histories may replay expecting the old
sequence and fail as nondeterministic after the `utc_now` change.

## Validation

Automated coverage added in this PR:

- `24_atomic_rollback` — rolled-back start/cancel/signal leave no runtime effect;
signaling before runtime materialization is rejected.
- `25_enqueue_wrapper_authz` — wrapper authorization and start-wrapper hardening.
- `26_reconcile_orphan_gc` — orphan root with child sub-orchestrations is collected
as a full subtree; healthy instances remain untouched.
- `scripts/test-upgrade.sh` — schema equivalence, binary compatibility, data
compatibility, and existing-role wrapper-grant backfill.

Manual/targeted validation included signal fan-out, cancel consistency, reconciler
liveness, cron scheduling, formatting, clippy, pgspot, and upgrade tests.

## Remaining risks / follow-up

- The direct duroxide-pg SQL dependency is intentional but should remain small and
well documented. Moving the privileged enqueue surface into duroxide-pg would be a
cleaner long-term boundary.
- Reconciler grace/cadence and role provisioning may need tuning after operational
experience.
13 changes: 11 additions & 2 deletions docs/upgrade-testing.md
Original file line number Diff line number Diff line change
Expand Up @@ -206,14 +206,23 @@ what the upgrade script handles, and any backward compatibility considerations.
### v0.2.3 → v0.2.4

#### Simplify `df.grant_usage()` — drop the explicit function allowlist
- **DDL change (df schema):** `df.grant_usage()` no longer loops over a hard-coded `func_sigs` array issuing `GRANT EXECUTE` per function. Fresh installs (`src/lib.rs`) and the upgrade script (`sql/pg_durable--0.2.3--0.2.4.sql`) both `CREATE OR REPLACE` the function with a body that grants `USAGE ON SCHEMA df` plus the table privileges, and conditionally grants `df.http()` / the admin helpers. The signature `df.grant_usage(text, boolean, boolean)` is unchanged.
- **DDL change (df schema):** `df.revoke_usage()` is made symmetric with the new `grant_usage()`. It no longer loops over every `df.*` function in `pg_proc` issuing `REVOKE EXECUTE` (which, post-simplification, only produced "no privileges could be revoked" warnings since ordinary functions are never granted per-function EXECUTE). The new body revokes only what `grant_usage()` grants: schema `USAGE`, EXECUTE on the sensitive functions (`df.http`, `df.grant_usage`, `df.revoke_usage`), and the table privileges. The signature `df.revoke_usage(text)` is unchanged.
- **DDL change (df schema):** `df.grant_usage()` no longer loops over a hard-coded `func_sigs` array issuing `GRANT EXECUTE` per function. Fresh installs (`src/lib.rs`) and the upgrade script (`sql/pg_durable--0.2.3--0.2.4.sql`) both `CREATE OR REPLACE` the function with a body that grants `USAGE ON SCHEMA df` plus the table privileges, conditionally grants `df.http()` / the admin helpers, and explicitly grants the new private enqueue wrappers (`df._enqueue_orchestrator_start`, `df._enqueue_orchestrator_cancel`, `df._enqueue_orchestrator_signal`) to ordinary df users. The signature `df.grant_usage(text, boolean, boolean)` is unchanged.
- **DDL change (df schema):** `df.revoke_usage()` is made symmetric with the new `grant_usage()`. It no longer loops over every `df.*` function in `pg_proc` issuing `REVOKE EXECUTE` (which, post-simplification, only produced "no privileges could be revoked" warnings since ordinary functions are never granted per-function EXECUTE). The new body revokes only what `grant_usage()` grants: schema `USAGE`, EXECUTE on the sensitive/admin functions, EXECUTE on the private enqueue wrappers, and the table privileges. The signature `df.revoke_usage(text)` is unchanged.
- **Rationale:** The ordinary `df.*` functions retain PostgreSQL's default PUBLIC `EXECUTE`, so schema `USAGE` is the real access gate; the per-function grants/revokes were redundant. The sensitive functions have PUBLIC `EXECUTE` revoked at install time and were never in the allowlist, so their protection is unchanged.
- **Behavioral note:** A newly added `df.*` function is now callable by any role with schema `USAGE` by default. To keep a future function private, `REVOKE EXECUTE ... FROM PUBLIC` at install time and grant it explicitly in `df.grant_usage()`.
- **Legacy cleanup caveat:** A role that was granted under the *old* `grant_usage()` (explicit per-function EXECUTE) and is later revoked under the new `revoke_usage()` may retain inert EXECUTE entries on ordinary functions. These are harmless — revoking schema `USAGE` fully locks the role out — and clear on the next drop/regrant cycle.
- **Scenario A considerations:** Signatures are identical on the fresh-install and upgrade paths (only the bodies differ), so the function-signature equivalence contract passes.
- **Scenario B1/B2 considerations:** No schema/data migration and no new objects. The replaced bodies work against the existing schema and change no privileges already granted.

#### Atomic in-transaction enqueue wrappers + `df.reconcile()`
- **DDL change (df schema):** Adds three private `SECURITY DEFINER` wrappers: `df._enqueue_orchestrator_start(text, text, text)`, `df._enqueue_orchestrator_cancel(text, text)`, and `df._enqueue_orchestrator_signal(text, text, text)`. Fresh installs and the upgrade script create the same functions and `REVOKE EXECUTE ... FROM PUBLIC`; `df.grant_usage()` grants them explicitly to df users because `df.start()` / `df.cancel()` / `df.signal()` call them via SPI as the caller. The upgrade script also backfills these wrapper grants to roles that already had explicit `USAGE` on schema `df` before `ALTER EXTENSION UPDATE`, preserving grant option where present.
- **DDL change (df schema):** Adds admin-only `df.reconcile(integer)` (`SECURITY DEFINER`, `REVOKE EXECUTE ... FROM PUBLIC`) to delete orphaned duroxide instance subtrees and mark stuck `df.instances` rows failed. The background worker starts it through a built-in durable loop as the dedicated `df_reconciler` role.
- **Provider coupling:** These wrappers and `df.reconcile()` call the duroxide-pg SQL surface directly (`enqueue_orchestrator_work`, `delete_instances_atomic`, and `_duroxide` tables). This is intentional: only direct SQL via SPI can share the caller's transaction. Non-PG providers (or older schemas without the wrappers) use the legacy out-of-band fallback.
- **Scenario A considerations:** Fresh-install and upgrade schemas must expose the same new `df` functions and grants. The upgrade script creates the wrappers/reconciler, updates `grant_usage()`/`revoke_usage()`, and backfills existing df users in the same release, so upgraded users can call `df.start()` / `df.cancel()` / `df.signal()` immediately after `ALTER EXTENSION UPDATE`.
- **Scenario B1 considerations:** The new `.so` remains compatible with pre-upgrade 0.2.3 schemas because `df.start()` / `df.cancel()` / `df.signal()` use the in-transaction path only when both the duroxide-pg provider SQL function and the `df._enqueue_orchestrator_*` wrappers exist; otherwise they log and fall back to the old out-of-band client path. This fallback is not client-visible as a SQL `WARNING`, so it does not contaminate scripts that capture `SELECT df.start(...)` output.
- **Scenario B2 considerations:** No data migration. Existing instances and queued work keep their original behavior; the new atomic semantics apply to calls made after the schema has the wrappers.
- **In-flight schedule caveat:** This release also changes `df.wait_for_schedule` to compute from the orchestration's recorded clock. That fixes a loop busy-loop bug, but changes the recorded replay event sequence (`utc_now` is recorded before the timer). Any in-flight instance already waiting in a `WAIT_SCHEDULE` node when the binary changes should be drained or restarted during upgrade; otherwise it may replay expecting the old sequence and fail as nondeterministic.

#### Rename `df.wait_for_completion()` to `df.await_instance()`
- **DDL change (df schema):** Adds `df.await_instance(text, integer)` as the canonical C binding for the helper formerly exposed as `df.wait_for_completion(text, integer)`. The old SQL function remains present and the new `.so` continues exporting `wait_for_completion_wrapper` as a shim, so existing customer scripts keep working.
- **Grant behavior:** No explicit grant migration is required. PostgreSQL grants `EXECUTE` on newly created functions to `PUBLIC` by default, and `df.await_instance` is not a sensitive helper whose default PUBLIC grant is revoked.
Expand Down
17 changes: 17 additions & 0 deletions scripts/test-upgrade.sh
Original file line number Diff line number Diff line change
Expand Up @@ -914,6 +914,7 @@ echo ""
B2_PRE_INSTANCE_ID=""
B2_INFLIGHT_INSTANCE_ID=""
B2_POST_INSTANCE_ID=""
B2_PRE_GRANTED_ROLE="durable_b2_pre_grant"

test_b2_data_survives_upgrade() {
# Step 1: Install previous version and create test data
Expand All @@ -924,6 +925,14 @@ test_b2_data_survives_upgrade() {
assert_sql_equals "SELECT df.clearvars();" "OK" || return 1
assert_sql_equals "SELECT df.setvar('b2_key', 'b2_value');" "OK" || return 1

# Role granted under the previous schema. The upgrade must backfill EXECUTE
# on new private wrappers so this existing df user can keep calling
# df.start()/df.cancel()/df.signal() without re-running df.grant_usage().
run_sql_capture "DROP OWNED BY ${B2_PRE_GRANTED_ROLE};" >/dev/null 2>&1 || true
run_sql_capture "DROP ROLE IF EXISTS ${B2_PRE_GRANTED_ROLE};" >/dev/null 2>&1 || true
run_sql_capture "CREATE ROLE ${B2_PRE_GRANTED_ROLE} LOGIN;" >/dev/null || return 1
run_sql_capture "SELECT df.grant_usage('${B2_PRE_GRANTED_ROLE}');" >/dev/null || return 1

B2_PRE_INSTANCE_ID=$(run_sql_capture "SELECT df.start('INSERT INTO test_upgrade_b2_log (kind, msg) VALUES (''pre'', ''{b2_key}'') RETURNING msg', 'b2-pre-upgrade');") || return 1
B2_INFLIGHT_INSTANCE_ID=$(run_sql_capture "SELECT df.start(df.sleep(2) ~> 'SELECT ''b2-running'' AS value', 'b2-inflight');") || return 1

Expand Down Expand Up @@ -968,6 +977,12 @@ test_b2_new_data_after_upgrade() {
assert_sql_equals "SELECT msg FROM test_upgrade_b2_log WHERE kind = 'post' ORDER BY id DESC LIMIT 1;" "new_value"
}

test_b2_existing_grants_after_upgrade() {
assert_sql_equals "SELECT has_function_privilege('${B2_PRE_GRANTED_ROLE}', 'df._enqueue_orchestrator_start(text, text, text)', 'EXECUTE');" "t" &&
assert_sql_equals "SELECT has_function_privilege('${B2_PRE_GRANTED_ROLE}', 'df._enqueue_orchestrator_cancel(text, text)', 'EXECUTE');" "t" &&
assert_sql_equals "SELECT has_function_privilege('${B2_PRE_GRANTED_ROLE}', 'df._enqueue_orchestrator_signal(text, text, text)', 'EXECUTE');" "t"
}

test_b2_grant_usage_after_upgrade() {
# Regression guard for #110: after ALTER EXTENSION UPDATE, df.debug_connection()
# must be gone from the catalog. Scenario A only compares function name/args/
Expand Down Expand Up @@ -996,13 +1011,15 @@ test_b2_grant_usage_after_upgrade() {

# Clean up the probe role.
run_sql_capture "DROP OWNED BY ${probe_role}; DROP ROLE IF EXISTS ${probe_role};" >/dev/null 2>&1 || true
run_sql_capture "DROP OWNED BY ${B2_PRE_GRANTED_ROLE}; DROP ROLE IF EXISTS ${B2_PRE_GRANTED_ROLE};" >/dev/null 2>&1 || true
}

if [ "$HAS_COMPAT_PREV" = true ]; then
run_test "B2: Pre-upgrade data survives ALTER EXTENSION UPDATE" test_b2_data_survives_upgrade
run_test "B2: Pre-upgrade instance remains queryable" test_b2_pre_upgrade_instance_after_upgrade
run_test "B2: In-flight work completes after upgrade" test_b2_inflight_work_after_upgrade
run_test "B2: New data and execution after upgrade" test_b2_new_data_after_upgrade
run_test "B2: Existing df users retain wrapper privileges after upgrade" test_b2_existing_grants_after_upgrade
run_test "B2: df.grant_usage() works and df.debug_connection() is gone after upgrade" test_b2_grant_usage_after_upgrade
fi

Expand Down
Loading
Loading