Skip to content

Preserve FunctionError semantics while surfacing child tracebacks#547

Open
diptanu wants to merge 1 commit intomainfrom
fix/child-function-errors-not-swallowed
Open

Preserve FunctionError semantics while surfacing child tracebacks#547
diptanu wants to merge 1 commit intomainfrom
fix/child-function-errors-not-swallowed

Conversation

@diptanu
Copy link
Contributor

@diptanu diptanu commented Mar 4, 2026

Summary

  • preserve the existing distinction between RequestError and runtime FunctionError
  • stop swallowing child function runtime exceptions as generic "failed"
  • forward child runtime tracebacks through request_error_output with an internal marker and reconstruct them into FunctionError on the parent side

Details

  • when a child function call raises a non-RequestError, FE now uploads traceback text with a private prefix marker
  • FE still returns ALLOCATION_FAILURE_REASON_FUNCTION_ERROR for these runtime failures
  • parent watcher path decodes marked payloads into create_function_error(..., cause=traceback)
  • intentional RequestError paths are unchanged and remain RequestError

Validation

  • python3 -m py_compile src/tensorlake/function_executor/allocation_runner/allocation_runner.py
  • full test suite not run in this environment (missing local deps/tooling)

uploaded_request_error_blob=uploaded_output_blob,
)
except BaseException as e:
# For child function calls (created by SDK), forward traceback via
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't copy-paste the logic here, reuse logic in except RequestError as e: branch.

except BaseException as e:
# For child function calls (created by SDK), forward traceback via
# request_error_output while preserving FUNCTION_ERROR semantics.
if (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if looks redundant and obscure.

request_error_message: str = serialized_request_error.data.decode(
"utf-8"
)
if request_error_message.startswith(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update FE proto comments to change and explain new request_error field its semantic.

Copy link
Contributor

@eabatalov eabatalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear what we're achieving with this change. We'll discuss this outside of this PR.

Besides the other comments, the gaps in this PR:

  • Not implemented for LocalRunner.
  • No tests that validate that backtraces are now present in FunctionError. The existing local runner test needs to get updated for the new UX, see test_original_function_exception_is_pretty_printed_in_local_mode and the same test needs be added for remote mode.

@diptanu
Copy link
Contributor Author

diptanu commented Mar 5, 2026

Leaving this open so I can understand the comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants