Summary
When a worker raises an exception, `executor.execute()` returns a workflow run with `status: RUNNING` and `output: {}`, with no indication that anything went wrong. The actual error is only visible in background logs. A first-time user sees the workflow silently hang and has no idea what failed.
Steps to reproduce
```python
from conductor.client.automator.task_handler import TaskHandler
from conductor.client.configuration.configuration import Configuration
from conductor.client.orkes_clients import OrkesClients
from conductor.client.workflow.conductor_workflow import ConductorWorkflow
from conductor.client.worker.worker_task import worker_task

@worker_task(task_definition_name='bad_worker', register_task_def=True)
def bad_worker(name: str) -> str:
    raise ValueError("intentional failure")

config = Configuration()
clients = OrkesClients(configuration=config)
executor = clients.get_workflow_executor()

workflow = ConductorWorkflow(name='fail_test', version=1, executor=executor)
t = bad_worker(task_ref_name='t', name=workflow.input('name'))
workflow >> t
workflow.register(overwrite=True)

with TaskHandler(configuration=config, scan_for_annotated_workers=True) as th:
    th.start_processes()
    run = executor.execute(name='fail_test', version=1, workflow_input={'name': 'x'})
    print('Status:', run.status)  # prints: RUNNING
    print('Output:', run.output)  # prints: {}
```
What the user sees
```
Status: RUNNING
Output: {}
```
The actual error traceback (`ValueError: intentional failure`) is logged to stderr from a background worker process, but:
- `run.status` is `RUNNING`, not `FAILED`
- `run.output` is an empty `{}`
- `run.reason_for_incompletion` returns `None` (and emits a deprecation warning)
Why this happens
`executor.execute()` defaults to `wait_for_seconds=10`. When a worker fails, Conductor schedules retries (by default, 3 retries with 60-second delays). The workflow is genuinely still `RUNNING` (waiting for a retry) when `execute()` times out after 10 seconds, and it won't reach `FAILED` for at least 3 × 60 = 180 seconds.
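Given those retry timings, a practical workaround is to poll until the workflow reaches a terminal state instead of relying on the 10-second window. A minimal sketch, assuming nothing about the SDK: the `fetch_status` callable stands in for whatever status lookup you have available (e.g. re-fetching the workflow by `run.workflow_id`), and the demo below stubs it with a canned status sequence.

```python
import time

# Conductor's terminal workflow states (anything else means "still going").
TERMINAL_STATES = {"COMPLETED", "FAILED", "TERMINATED", "TIMED_OUT"}

def wait_for_terminal(fetch_status, timeout_s=300.0, poll_interval_s=5.0):
    """Poll fetch_status() until it returns a terminal state or timeout_s
    elapses. Returns the last status observed either way."""
    deadline = time.monotonic() + timeout_s
    status = fetch_status()
    while status not in TERMINAL_STATES and time.monotonic() < deadline:
        time.sleep(poll_interval_s)
        status = fetch_status()
    return status

# Demo with a stubbed lookup: two RUNNING polls, then the retries exhaust.
statuses = iter(["RUNNING", "RUNNING", "FAILED"])
print(wait_for_terminal(lambda: next(statuses), poll_interval_s=0.01))  # FAILED
```

With the default retry policy above, a `timeout_s` of at least ~200 seconds is needed before a single-task failure actually surfaces as `FAILED`.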
Impact on first-time users
This is a silent failure mode. New users:
- Write a worker that has a bug
- Run their app and see `Status: RUNNING`, `Output: {}`
- Have no idea why the workflow isn't completing
- Must know to look at background INFO/ERROR logs from a separate process
Expected behavior
At minimum, the SDK should provide a clear path to surface the failure reason. Possible improvements:
- If the workflow status is `RUNNING` after the timeout, check for failed tasks and surface the failure reason in the exception or the returned object
- Document `wait_for_seconds` prominently and suggest increasing it for debugging
- Suppress or fix the deprecation warning on `reason_for_incompletion` and ensure it returns useful information
- Add a `run.failed_tasks` or similar accessor
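To make the last suggestion concrete, here is a rough sketch of what a `failed_tasks` accessor could look like. The field names (`reference_task_name`, `status`, `reason_for_incompletion`) mirror Conductor's task model but are assumptions here, and the demo operates on dict-shaped stubs rather than real SDK objects:

```python
def failed_tasks(tasks):
    """Return (task_ref_name, reason) pairs for tasks in a failed state.

    `tasks` is a list of dicts shaped like Conductor task results; the
    field names are illustrative and should be checked against the SDK.
    """
    FAILED_STATES = {"FAILED", "FAILED_WITH_TERMINAL_ERROR", "TIMED_OUT"}
    return [
        (t.get("reference_task_name"), t.get("reason_for_incompletion"))
        for t in tasks
        if t.get("status") in FAILED_STATES
    ]

# Stubbed task list resembling the repro: one failed attempt, one retry pending.
tasks = [
    {"reference_task_name": "t", "status": "FAILED",
     "reason_for_incompletion": "ValueError: intentional failure"},
    {"reference_task_name": "t", "status": "SCHEDULED"},
]
print(failed_tasks(tasks))  # [('t', 'ValueError: intentional failure')]
```

Something like this, exposed on the run object, would let the user above see the `ValueError` immediately instead of digging through worker-process logs.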
Environment
- Python 3.14, conductor-python 1.3.8, Conductor OSS server
- Tested on macOS 15