Skip to content

Preserve instance_id and execution_id in OrchestrationFailed events#36

Open
pinodeca wants to merge 1 commit into
mainfrom
fix/preserve-instance-execution-id-failed-events
Open

Preserve instance_id and execution_id in OrchestrationFailed events#36
pinodeca wants to merge 1 commit into
mainfrom
fix/preserve-instance-execution-id-failed-events

Conversation

@pinodeca

Copy link
Copy Markdown
Contributor

Summary

HistoryManager::append_failed hardcoded placeholder identifiers ("" for instance_id and 0 for execution_id) that callers never overwrote. As a result, every terminal OrchestrationFailed event serialized an empty instance_id and execution_id = 0 into its payload, breaking payload-only correlation for telemetry/diagnostic consumers.

This affected more call sites than just the cancellation path. All four were fixed:

  1. TurnResult::Failed arm — src/runtime/execution.rs
  2. TurnResult::Cancelled arm — src/runtime/execution.rs
  3. Ack-failure infrastructure error path — src/runtime/dispatchers/orchestration.rs
  4. Poison-message failure path — src/runtime/dispatchers/orchestration.rs

Changes

  • append_failed now takes instance_id: &str, execution_id: u64, matching the sibling Completed/ContinueAsNew terminal paths, and drops the placeholder comments.
  • All four call sites now thread the in-scope instance/execution_id.
  • Added a regression assertion to the cancellation sample (tests/e2e_samples.rs) verifying the terminal OrchestrationFailed event carries the real instance_id and a non-zero execution_id.

Compatibility

Backward compatible — no event schema change, only correcting values that were always meant to be populated. No rolling-upgrade concern. Existing already-persisted failed events are historical and not rewritten.

Testing

  • All 70 cancellation tests pass.
  • Updated cancellation sample passes.
  • cargo clippy --all-targets --all-features is clean.

Fixes #35

append_failed hardcoded placeholder identifiers ( and 0) that callers
never overwrote, so every terminal OrchestrationFailed event (both the
Failed and Cancelled paths, plus the infra-ack-failure and poison paths)
serialized an empty instance_id and execution_id=0 into its payload.
This broke payload-only correlation for telemetry/diagnostic consumers.

Thread the real instance_id/execution_id through append_failed at all
four call sites, matching the sibling Completed/ContinueAsNew paths, and
add a regression assertion to the cancellation sample.

Fixes #35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OrchestrationFailed terminal event persisted with empty instance_id and execution_id=0 in payload

1 participant