JOIN/RACE inside a loop stalls on iteration ≥2 (continue_as_new resets sub-orchestration id counter → child id collision) #230

## Summary

A `df.join()` / `df.race()` placed **inside a `df.loop()`** stalls permanently once the loop reaches its **second iteration**. The instance never completes or fails; after iteration 1 it stays in `running` indefinitely.

This is independent of `df.break()`; it reproduces with any JOIN/RACE in a loop body that runs ≥ 2 iterations. It was discovered while implementing #148 / #229. Test 4 in `tests/e2e/sql/22_break_in_join_race.sql` is deliberately scoped to break on **iteration 1** specifically to avoid tripping this bug.

## Mechanism

- The loop node calls `ctx.continue_as_new(...)` once per iteration (`execute_loop_node`, `src/orchestrations/execute_function_graph.rs` ~L644).
- JOIN (`execute_join_node`, ~L894) and RACE (`execute_race_node`, ~L987–988) schedule their branches with `ctx.schedule_sub_orchestration(SUBTREE_NAME, input)` — no explicit child instance id and no per-iteration discriminator (branch inputs carry only `graph`, `node_id`, `results`, `vars`, `label`).
- duroxide derives the child (sub-orchestration) instance id deterministically from the parent instance id plus a per-instance counter seeded from orchestration history. `continue_as_new` truncates/restarts history, so that counter resets to the same starting value on every iteration.
- Result: iteration 2's JOIN/RACE derives the **same** child instance id(s) as iteration 1. Those ids already exist in the provider store as `Completed`, so duroxide does not re-run them or deliver a fresh completion signal — the parent's await on the sub-orchestration future never resolves → permanent stall.
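
The collision described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the id format, `derive_child_id`, `schedule_iteration`, and `collisions` are all hypothetical and stand in for whatever duroxide actually does internally. The only property modeled is the one the bullets describe: the counter restarts at the same value on every iteration, so the derived ids repeat.

```rust
// Toy model only: NOT duroxide's real id scheme, just the shape of the bug.
use std::collections::HashSet;

/// Hypothetical derivation: parent instance id plus a per-execution counter.
fn derive_child_id(parent: &str, counter: u32) -> String {
    format!("{parent}::sub::{counter}")
}

/// Simulates one loop iteration that schedules `branches` JOIN/RACE children.
/// The counter starts at 0 every time, mirroring a history reset by
/// continue_as_new.
fn schedule_iteration(parent: &str, branches: u32) -> Vec<String> {
    (0..branches).map(|c| derive_child_id(parent, c)).collect()
}

/// Records `iteration_ids` in the store and returns the ones that were
/// already present. In this model, a hit is a child that would not run again.
fn collisions(store: &mut HashSet<String>, iteration_ids: &[String]) -> Vec<String> {
    let mut hits = Vec::new();
    for id in iteration_ids {
        // `insert` returns false when the id was already in the set.
        if !store.insert(id.clone()) {
            hits.push(id.clone());
        }
    }
    hits
}
```

In this model, the first call to `schedule_iteration("inst-1", 2)` produces no collisions, while a second identical call collides on every id, which corresponds to the stall seen on iteration 2.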

## Symptom / impact

- Any workflow with a JOIN/RACE inside a loop that iterates ≥ 2 times hangs after the first iteration.
- The instance remains `running` forever (no completion, no failure, no timeout).

## Repro (illustrative)

Conceptually, any loop whose body contains a JOIN and that is allowed to iterate at least twice:

```
df.loop(
  df.seq(
    df.join(df.sql('SELECT 1'), df.sql('SELECT 2')),
    <advance / while-condition that permits a 2nd iteration>
  )
)
```

Concrete in-repo reference: `tests/e2e/sql/22_break_in_join_race.sql` Test 4 (break in IF-in-JOIN-in-loop) passes only because it breaks on iteration 1. Moving the break to iteration 2 — or removing it so the loop iterates again — reproduces the stall.

## Suggested fix direction

Give each loop iteration's sub-orchestrations a unique, replay-stable instance id so they don't collide across `continue_as_new`. Options:

- Thread a monotonic iteration counter (persisted in the loop's `continue_as_new` input / vars) into the child instance id or branch input so duroxide derives distinct ids per iteration; or
- Schedule JOIN/RACE branches with an explicit instance id that includes the iteration ordinal; or
- Confirm with duroxide whether sub-orchestration id derivation can be made stable across `continue_as_new`, and adopt the recommended pattern.

Any fix must remain deterministic / replay-safe (`src/orchestrations/execute_function_graph.rs` is orchestration code).
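
A minimal sketch of the "iteration ordinal in the child id" idea, assuming the ordinal is carried in the loop's `continue_as_new` input. `LoopState`, `child_instance_id`, and the id format are hypothetical names for illustration; how (or whether) duroxide accepts an explicit child instance id is not shown here and would need to be confirmed against its API.

```rust
/// Hypothetical loop state carried through the continue_as_new input, so the
/// ordinal survives the history reset instead of being recomputed from it.
#[derive(Debug, Clone, PartialEq)]
struct LoopState {
    iteration: u64,
}

impl LoopState {
    /// State to hand to the next iteration; the ordinal only ever increases.
    fn next(&self) -> LoopState {
        LoopState { iteration: self.iteration + 1 }
    }
}

/// Hypothetical helper: builds an explicit child instance id that includes
/// the iteration ordinal. It is a pure function of its arguments (no clock,
/// no randomness), so replaying the same iteration yields the same id.
fn child_instance_id(parent: &str, iteration: u64, node_id: &str, branch: usize) -> String {
    format!("{parent}::iter-{iteration}::{node_id}::branch-{branch}")
}
```

With this shape, the same JOIN branch gets `inst-1::iter-1::join-7::branch-0` on iteration 1 and `inst-1::iter-2::join-7::branch-0` on iteration 2 (the parent and node ids are made-up values), so the ids differ across iterations but stay stable under replay of any single iteration.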