Bug: Race condition in parallel step execution — shared mutable execution_context #2204

@mrveiss

Problem

PR #2197 added parallel step execution via asyncio.gather in _execute_step_group(). However, multiple concurrent steps write to the same mutable execution_context dict without synchronization:

# orchestration/workflow_executor.py:157-160
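execution_context["step_results"][step["id"]] = step_result  # distinct key per step, low risk
execution_context["agents_involved"].add(agent_id)           # shared set mutated by every step
execution_context["interactions"].append(interaction)         # shared list; order depends on scheduling

With asyncio.gather, every coroutine runs on the same event-loop thread and control changes hands only at await points, so a single set.add() or list.append() call cannot be interrupted partway through. CPython's GIL is not the relevant safeguard here; in threaded code the atomicity of these calls is an implementation detail, not a language guarantee. The hazard lies in compound updates: any read-modify-write of execution_context that spans an await can see or overwrite another step's changes, and the order of entries in interactions follows scheduling rather than step order.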
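To make the interleaving concrete, here is a minimal, self-contained sketch of how an update that spans an await can be lost. It does not reproduce the code in workflow_executor.py: the record_step function and the "count" key are hypothetical and exist only for illustration.

```python
import asyncio


async def record_step(ctx: dict, step_id: str) -> None:
    # Read shared state, then yield to the event loop before writing back.
    seen = ctx["count"]              # "count" is a hypothetical key
    await asyncio.sleep(0)           # other coroutines run here
    ctx["count"] = seen + 1          # overwrites updates made in the meantime
    # A single synchronous call contains no await, so it is not interleaved.
    ctx["agents_involved"].add(step_id)


async def main() -> dict:
    ctx = {"count": 0, "agents_involved": set()}
    await asyncio.gather(*(record_step(ctx, f"step-{i}") for i in range(3)))
    return ctx


ctx = asyncio.run(main())
print(ctx["count"])                  # 1, not 3: two increments were lost
print(len(ctx["agents_involved"]))   # 3: every add() landed
```

All three coroutines read count as 0 before any of them writes, so the final value is 1. The set still ends up with three entries, which suggests the thing to audit is any read-then-write sequence on execution_context that crosses an await.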

Evidence

# _execute_step_group uses asyncio.gather with shared context:
await asyncio.gather(
    *(self._execute_step_with_agent(step, execution_context, context)
      for step in group)
)

Each _execute_step_with_agent call mutates:

  • execution_context["step_results"] — dict, different keys per step (low risk)
  • execution_context["agents_involved"] — set, concurrent .add() (medium risk)
  • execution_context["interactions"] — list, concurrent .append() (medium risk)

Impact

Medium. A compound update that spans an await could leave agents_involved missing agent IDs, or leave the interactions list with missing or misordered entries. Unlikely to crash, but may produce incomplete execution reports.

Expected Fix

Either guard the shared mutations with an asyncio.Lock, or have each step return its own results and merge them after gather completes:

async def _execute_step_group(self, group, execution_context, context):
    if len(group) == 1:
        await self._execute_step_with_agent(group[0], execution_context, context)
        return
    # Collect results separately, merge after
    results = await asyncio.gather(
        *(self._execute_step_isolated(step, context) for step in group)
    )
    for step, result in zip(group, results):
        execution_context["step_results"][step["id"]] = result
        # ... merge agents_involved and interactions
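The merge step elided above could look roughly like the following runnable sketch. It is only one possible shape, under stated assumptions: the StepOutcome dataclass, the "agent" field on each step, and the module-level function names are hypothetical stand-ins, not the project's actual types or signatures.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class StepOutcome:
    """Hypothetical per-step return value; not the project's actual type."""
    result: object
    agent_ids: set = field(default_factory=set)
    interactions: list = field(default_factory=list)


async def execute_step_isolated(step: dict) -> StepOutcome:
    # Stand-in for real agent work; it touches no shared state.
    await asyncio.sleep(0)
    agent_id = step["agent"]         # assumed field, for illustration only
    return StepOutcome(
        result=f"done:{step['id']}",
        agent_ids={agent_id},
        interactions=[{"step": step["id"], "agent": agent_id}],
    )


async def execute_step_group(group: list, execution_context: dict) -> None:
    # gather returns results in argument order, so the merge is deterministic.
    outcomes = await asyncio.gather(*(execute_step_isolated(s) for s in group))
    for step, outcome in zip(group, outcomes):
        execution_context["step_results"][step["id"]] = outcome.result
        execution_context["agents_involved"].update(outcome.agent_ids)
        execution_context["interactions"].extend(outcome.interactions)


execution_context = {"step_results": {}, "agents_involved": set(), "interactions": []}
group = [{"id": "a", "agent": "planner"}, {"id": "b", "agent": "coder"}]
asyncio.run(execute_step_group(group, execution_context))
print(execution_context["interactions"])
```

Because the shared dict is only written after gather has returned, no lock is needed, and interactions follows the order of the steps in the group rather than the order in which they happened to finish.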

Files

  • autobot-backend/orchestration/workflow_executor.py:217-244: _execute_step_group()
  • autobot-backend/orchestration/workflow_executor.py:136-168: _execute_step_with_agent() (mutates shared state)

Discovered During

Implementation of #2172 (parallel step execution).
