df.loop continue_as_new restarts from root when loop is not root #227

@pinodeca

Summary

df.loop calls continue_as_new after each iteration, but the continuation input preserves only the instance metadata (instance_id, label, vars). When the loop node is not the root of the function graph, the next orchestration generation reloads the graph and starts again from the instance root, re-executing every node that precedes the loop.

This causes side effects before the loop to run once per loop iteration instead of once per function instance.

Repro

Run this in a database where pg_durable is installed and the background worker is running:

DROP TABLE IF EXISTS table_outside_loop;
DROP TABLE IF EXISTS table_inside_loop;

CREATE TABLE table_outside_loop (
    id SERIAL PRIMARY KEY,
    inserted_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp()
);

CREATE TABLE table_inside_loop (
    id SERIAL PRIMARY KEY,
    inserted_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp()
);

SELECT df.start(
    df.seq(
        'INSERT INTO table_outside_loop DEFAULT VALUES RETURNING id, inserted_at',
        df.loop(
            'INSERT INTO table_inside_loop DEFAULT VALUES RETURNING id, inserted_at'
            ~> df.sleep(10)
        )
    ),
    'manual-nonroot-loop-prefix',
    NULL
) AS instance_id;

After two loop iterations, inspect the tables:

SELECT COUNT(*) AS outside_loop_rows FROM table_outside_loop;
SELECT COUNT(*) AS inside_loop_rows FROM table_inside_loop;

TABLE table_outside_loop ORDER BY id;
TABLE table_inside_loop ORDER BY id;

Expected behavior

table_outside_loop should contain exactly one row for the function instance. The outer/prefix SQL step should run once before entering the loop.

table_inside_loop should grow by one row per loop iteration.

Actual behavior

table_outside_loop gets one new row per loop iteration/generation. In a local run, after the second loop generation:

outside_loop_rows = 2
inside_loop_rows  = 2

The timestamped rows showed the outside-loop insert running again immediately before the second inside-loop insert.

Likely cause

execute_loop_node calls continue_as_new with only:

FunctionInput {
    instance_id: graph.instance_id.clone(),
    label: exec_ctx.label.clone(),
    vars: exec_ctx.vars.clone(),
}

The new orchestration generation then runs the top-level execute path, calls load_function_graph, and starts from graph.root_node_id again. It does not preserve the loop node id or the current execution context as the continuation start point.
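
To make the mechanism concrete, here is a toy model of it in Rust. This is not pg_durable code: the Node enum, the Counts struct, the simplified FunctionInput, and the resume_node_id field are all hypothetical and exist only for illustration. Assuming each generation starts from the root unless told otherwise, the sketch reproduces the 2/2 row counts from the repro, and shows the expected 1/2 once the loop node id is carried across continue_as_new.

```rust
// Toy model only: none of these types are pg_durable's real ones.
#[derive(Clone, Debug)]
struct FunctionInput {
    instance_id: String,
    // Hypothetical field: the node the next generation should start from.
    // None models the reported behavior (start from the graph root).
    resume_node_id: Option<usize>,
}

#[derive(Debug, Default, PartialEq)]
struct Counts {
    outside_loop_rows: u32,
    inside_loop_rows: u32,
}

enum Node {
    Prefix, // stands in for the INSERT into table_outside_loop
    Loop,   // stands in for df.loop(INSERT into table_inside_loop ~> sleep)
}

/// Runs one orchestration generation and returns the input for the next one.
fn run_generation(
    graph: &[Node],
    input: &FunctionInput,
    preserve_loop_node: bool,
    counts: &mut Counts,
) -> FunctionInput {
    let start = input.resume_node_id.unwrap_or(0);
    for (id, node) in graph.iter().enumerate().skip(start) {
        match node {
            Node::Prefix => counts.outside_loop_rows += 1,
            Node::Loop => {
                counts.inside_loop_rows += 1;
                // Models continue_as_new: this generation ends here.
                return FunctionInput {
                    instance_id: input.instance_id.clone(),
                    resume_node_id: if preserve_loop_node { Some(id) } else { None },
                };
            }
        }
    }
    input.clone()
}

/// Runs the two-node graph (prefix, then loop) for a number of generations.
fn simulate(generations: u32, preserve_loop_node: bool) -> Counts {
    let graph = [Node::Prefix, Node::Loop];
    let mut counts = Counts::default();
    let mut input = FunctionInput {
        instance_id: "demo".to_string(),
        resume_node_id: None,
    };
    for _ in 0..generations {
        input = run_generation(&graph, &input, preserve_loop_node, &mut counts);
    }
    counts
}

fn main() {
    // Reported behavior: the prefix runs once per generation.
    println!("restart from root: {:?}", simulate(2, false));
    // Expected behavior: the prefix runs once per instance.
    println!("resume at loop:    {:?}", simulate(2, true));
}
```

Whether the real fix should carry a node id in the continuation input or restore the position some other way is an open design question; the model only shows that some start-point information has to survive the generation boundary.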

This is especially problematic for finite loops when setup state is initialized before the loop. For example, if a prefix SQL node resets a counter, a loop that calls df.break after N iterations may never reach N, because the counter is reset on every continue_as_new generation.
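
The counter scenario can be sketched the same way. This is again a hypothetical simulation, not pg_durable code: the function name, the restart_from_root flag, and the max_generations cap are assumptions made so the example terminates. It assumes one loop iteration per generation and a prefix step that resets the counter to zero.

```rust
/// Returns Some(generation) for the generation in which the loop breaks,
/// or None if it does not break within max_generations.
fn generations_until_break(n: u32, max_generations: u32, restart_from_root: bool) -> Option<u32> {
    let mut counter: u32 = 0; // stands in for state kept in a table
    for generation in 1..=max_generations {
        // Prefix node: reset the counter. It always runs in the first
        // generation, and in later ones only when the generation
        // restarts from the root.
        if generation == 1 || restart_from_root {
            counter = 0;
        }
        // Loop body: one iteration per generation.
        counter += 1;
        // Break condition, standing in for df.break after N iterations.
        if counter >= n {
            return Some(generation);
        }
        // Otherwise the loop would call continue_as_new here.
    }
    None
}

fn main() {
    // Prefix runs once: the loop breaks in generation 3.
    println!("{:?}", generations_until_break(3, 1000, false));
    // Prefix runs every generation: the counter never gets past 1.
    println!("{:?}", generations_until_break(3, 1000, true));
}
```

With N = 3, the run that restarts from the root never breaks within the cap, because the counter is back at 1 after every generation. Only N = 1 terminates in both modes.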

Notes

This affects loop surfaces built on the LOOP node type, including:

  • df.loop(body)
  • df.loop(body, condition)
  • @> body
  • recurring cron-style patterns that wrap df.wait_for_schedule(...) in df.loop / @>
