Summary
df.loop uses continue_as_new after each iteration, but the continuation input only preserves the instance metadata (instance_id, label, vars). When the loop node is not the root of the function graph, the next orchestration generation reloads the graph and starts again from the instance root, re-executing any prefix nodes before the loop.
This causes side effects before the loop to run once per loop iteration instead of once per function instance.
Repro
Run this in a database where pg_durable is installed and the background worker is running:
DROP TABLE IF EXISTS table_outside_loop;
DROP TABLE IF EXISTS table_inside_loop;
CREATE TABLE table_outside_loop (
    id SERIAL PRIMARY KEY,
    inserted_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp()
);
CREATE TABLE table_inside_loop (
    id SERIAL PRIMARY KEY,
    inserted_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp()
);
SELECT df.start(
    df.seq(
        'INSERT INTO table_outside_loop DEFAULT VALUES RETURNING id, inserted_at',
        df.loop(
            'INSERT INTO table_inside_loop DEFAULT VALUES RETURNING id, inserted_at'
            ~> df.sleep(10)
        )
    ),
    'manual-nonroot-loop-prefix',
    NULL
) AS instance_id;
After two loop iterations, inspect the tables:
SELECT COUNT(*) AS outside_loop_rows FROM table_outside_loop;
SELECT COUNT(*) AS inside_loop_rows FROM table_inside_loop;
TABLE table_outside_loop ORDER BY id;
TABLE table_inside_loop ORDER BY id;
Expected behavior
table_outside_loop should contain exactly one row for the function instance. The outer/prefix SQL step should run once before entering the loop.
table_inside_loop should grow by one row per loop iteration.
Actual behavior
table_outside_loop gets one new row per loop iteration/generation. In a local run, after the second loop generation:
outside_loop_rows = 2
inside_loop_rows = 2
The timestamped rows showed the outside-loop insert running again immediately before the second inside-loop insert.
Likely cause
execute_loop_node calls continue_as_new with only:
FunctionInput {
    instance_id: graph.instance_id.clone(),
    label: exec_ctx.label.clone(),
    vars: exec_ctx.vars.clone(),
}
The new orchestration generation then runs the top-level execute path, calls load_function_graph, and starts from graph.root_node_id again. It does not preserve the loop node id or the current execution context as the continuation start point.
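One possible direction, sketched here as a standalone toy model rather than actual pg_durable code, is to carry the loop node id in the continuation input so the next generation can resume at the loop instead of at the root. The `resume_node_id` field, the `Node` enum, and the `run_generation` / `simulate` helpers are all hypothetical names invented for this illustration; the real graph and executor types are not reproduced.

```rust
// Toy model of the reported behavior and of one possible fix.
// Everything here is hypothetical; it does not use pg_durable types.

#[derive(Clone, Copy)]
enum Node {
    PrefixSql, // stands in for the SQL step that precedes the loop
    Loop,      // stands in for the LOOP node
}

#[derive(Clone)]
#[allow(dead_code)]
struct FunctionInput {
    instance_id: String,
    // Assumed addition: where the next generation should start.
    resume_node_id: Option<usize>,
}

#[derive(Default, Debug, PartialEq)]
struct Counters {
    prefix_runs: u32,
    body_runs: u32,
}

// Runs one orchestration generation over a linear graph. Returns the input
// for the next generation when the loop node requests a continuation.
fn run_generation(
    graph: &[Node],
    input: &FunctionInput,
    preserve_loop_node: bool,
    counters: &mut Counters,
) -> Option<FunctionInput> {
    // Without a resume point, execution starts at the root (index 0).
    let start = input.resume_node_id.unwrap_or(0);
    for (id, node) in graph.iter().enumerate().skip(start) {
        match node {
            Node::PrefixSql => counters.prefix_runs += 1,
            Node::Loop => {
                counters.body_runs += 1;
                let mut next = input.clone();
                next.resume_node_id = if preserve_loop_node { Some(id) } else { None };
                return Some(next); // models continue_as_new
            }
        }
    }
    None
}

fn simulate(generations: u32, preserve_loop_node: bool) -> Counters {
    let graph = [Node::PrefixSql, Node::Loop];
    let mut counters = Counters::default();
    let mut input = FunctionInput {
        instance_id: "demo".to_string(),
        resume_node_id: None,
    };
    for _ in 0..generations {
        match run_generation(&graph, &input, preserve_loop_node, &mut counters) {
            Some(next) => input = next,
            None => break,
        }
    }
    counters
}

fn main() {
    println!("root restart:   {:?}", simulate(2, false));
    println!("resume at loop: {:?}", simulate(2, true));
}
```

With the root restart, the model reproduces the 2/2 row counts from the local run; with the resume point preserved, the prefix runs once. A real fix would also need to decide how much of the execution context travels with the continuation, which this sketch does not address.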
This is especially problematic for finite loops when setup state appears before the loop. For example, a prefix SQL node that resets a counter can cause a loop with df.break after N iterations to never reach N, because the counter is reset on every continue_as_new generation.
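The counter-reset scenario can be illustrated with a small standalone simulation. It is a sketch under stated assumptions, not pg_durable code: `generations_until_break` is a hypothetical helper, the counter stands in for state the prefix SQL node would reset, and `max_generations` is an artificial cap so the non-terminating case returns instead of spinning forever.

```rust
// Hypothetical model: a prefix step resets a counter, the loop body
// increments it, and the loop breaks once the counter reaches `n`.
fn generations_until_break(n: u32, prefix_replayed: bool, max_generations: u32) -> Option<u32> {
    let mut counter = 0u32; // stands in for persisted state, e.g. a table row
    for generation in 1..=max_generations {
        // The prefix always runs in the first generation. With the bug it
        // also runs in every later generation.
        if generation == 1 || prefix_replayed {
            counter = 0;
        }
        counter += 1; // loop body
        if counter >= n {
            return Some(generation); // models df.break
        }
    }
    None // cap reached without the break condition becoming true
}

fn main() {
    println!("prefix runs once: {:?}", generations_until_break(3, false, 100));
    println!("prefix replayed:  {:?}", generations_until_break(3, true, 100));
}
```

When the prefix is replayed, the counter never exceeds 1, so any break threshold above 1 is unreachable in this model.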
Notes
This affects loop surfaces built on the LOOP node type, including:
- df.loop(body)
- df.loop(body, condition)
- @> body
- recurring cron-style patterns that wrap df.wait_for_schedule(...) in df.loop / @>