fix: prevent duplicate DAGTask names in foreach + split-switch flows#3204
fix: prevent duplicate DAGTask names in foreach + split-switch flows#3204odncode wants to merge 4 commits into
Conversation
When a split-switch's matching_conditional_join resolves to the same node
as a foreach's matching_join, _visit in _dag_templates emitted two DAGTasks
with the same name. Argo rejects this at submit time with:
templates.<flow> sorting failed: duplicated nodeName <step>
Root cause: the foreach branch appended a join DAGTask but did not add the
matching_join name to the seen list, allowing a later traversal via the
split-switch handler to emit a second DAGTask for the same node.
Fix: add seen.append(self.graph[node.matching_join].name) immediately after
the foreach join task is appended.
Includes a regression test that compiles a minimal repro flow and checks
for duplicate task names in the generated Argo JSON.
Closes Netflix#3196
Greptile SummaryThis PR fixes a duplicate
Confidence Score: 4/5The production fix is a precise, well-reasoned one-liner that correctly closes the deduplication gap; the test scaffolding works but the chosen flow topology does not actually trigger the bug path it is meant to guard against. The core change to argo_workflows.py is correct and minimal. However, the accompanying regression test uses a flow graph where the split-switch matching_conditional_join resolves to end (not join_step), so the foreach and split-switch join pointers are different nodes and the duplicate-emission condition never fires. test/unit/test_argo_foreach_dedup.py — the flow topology should be revised so that all branches of the split-switch converge on the foreach join node. Important Files Changed
Reviews (4): Last reviewed commit: "test: use FlowGraph to build real graph ..." | Re-trigger Greptile |
Addresses review feedback: the subprocess approach always skipped in CI due to cloud datastore requirements. The new test builds a minimal graph stand-in and calls _dag_templates() directly, consistent with the existing test patterns in test_argo_workflows_cli.py.
Fixed dunder method dispatch issue (class-level not instance-level). Added enable_heartbeat_daemon attribute needed by _dag_templates. Verified locally: new test passes, all 16 existing tests still pass.
Previous mock graph did not trigger the duplicate-emission condition because _parse_conditional_branches requires real graph attributes. Now uses Metaflow's FlowGraph to parse the repro flow directly, bypassing CLI and cloud datastore requirements. Verified: test FAILS without the fix (detects duplicate 'join-step'), PASSES with it. All 16 existing tests unaffected.
Summary
Fixes the duplicate
DAGTaskname bug reported in #3196.When a split-switch's
matching_conditional_joinresolves to the same node as a foreach'smatching_join,_visitin_dag_templatesemitted twoDAGTasks with the same name. Argo rejects this at submit time with:Root Cause
The foreach branch (around line 1918) appends a
join_foreach_tasktodag_tasksbut does not add thematching_joinname to theseenlist.When a later traversal arrives at the same node via the split-switch handler, the
seencheck on line 1240 does not fire, and a secondDAGTaskis emitted.Fix
One-line change:
added immediately after:
Testing
Added
test/unit/test_argo_foreach_dedup.py, a regression test that:Existing tests in
test/unit/test_argo_workflows_cli.pycontinue to pass (16/16, as confirmed by the original reporter).Closes #3196
Disclosure
I used Claude (Anthropic) to help understand the codebase. The fix and rationale reflect my own understanding, and I’m happy to walk through the logic during review.