Skip to content

fix: correct abort flow — phase transitions, reason propagation, and K8s cascade#7225

Merged
pingsutw merged 11 commits intov2from
kevin-abort-fix
Apr 17, 2026
Merged

fix: correct abort flow — phase transitions, reason propagation, and K8s cascade#7225
pingsutw merged 11 commits intov2from
kevin-abort-fix

Conversation

@AdilFayyaz
Copy link
Copy Markdown

@AdilFayyaz AdilFayyaz commented Apr 16, 2026

Why are the changes needed?

The abort implementation on v2 had several bugs: the reconciler was pushing the wrong action ID, AbortInfo was never populated in action details, aborted actions were missing phase-transition events (so WatchActionDetails never showed ABORTED), and WatchActionDetails had a race condition that could miss the abort notification entirely.

What changes were proposed in this pull request?

  • Action service: add DeleteFunc handler to the K8s informer so CRD deletions (triggered by abort) propagate ABORTED back to the run service; skip label-patching on deleted CRs which would always 404
  • Run service: fix reconciler push to use "a0" (was using run name); populate AbortInfo.Reason in buildActionDetails (was a TODO); add 404 handling for unknown run/action in AbortRun/AbortAction; fix WatchActionDetails subscribe-before-read ordering to close the notification race
  • Repo: insert an action_events row with ABORTED phase on abort so WatchActionDetails returns a phaseTransitions entry; set ended_at/duration_ms on abort; simplify abort to only mark the root/targeted action — K8s OwnerReferences cascade CRD deletion to children, and the informer handles marking them ABORTED
  • Reconciler: handle sql.ErrNoRows in MarkAbortAttempt gracefully; broaden isAlreadyTerminated to catch K8s "not found" errors wrapped as CodeInternal

How was this patch tested?

  • go test ./runs/... passes
  • Call AbortRun on a running workflow; confirm root action is immediately ABORTED in DB with reason populated; confirm child actions transition to ABORTED after reconciler deletes the root CRD and the informer fires delete events
  • Call WatchActionDetails on an aborted action; confirm the response includes a phaseTransitions entry with ABORTED and the AbortInfo.Reason field is set

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

  • main
    • Flyte 2 #6583
      • fix: correct abort flow — phase transitions, reason propagation, and K8s cascade 👈

AdilFayyaz and others added 5 commits April 16, 2026 11:59
Signed-off-by: M. Adil Fayyaz <62440954+AdilFayyaz@users.noreply.github.com>
Signed-off-by: M. Adil Fayyaz <62440954+AdilFayyaz@users.noreply.github.com>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: M. Adil Fayyaz <62440954+AdilFayyaz@users.noreply.github.com>
Signed-off-by: M. Adil Fayyaz <62440954+AdilFayyaz@users.noreply.github.com>
@github-actions github-actions Bot mentioned this pull request Apr 16, 2026
3 tasks
Signed-off-by: M. Adil Fayyaz <62440954+AdilFayyaz@users.noreply.github.com>
Comment thread docker/demo-bundled/manifests/complete.yaml Outdated
Comment thread runs/repository/impl/action.go Outdated
Comment thread runs/repository/impl/action.go Outdated
Signed-off-by: M. Adil Fayyaz <62440954+AdilFayyaz@users.noreply.github.com>
Signed-off-by: M. Adil Fayyaz <62440954+AdilFayyaz@users.noreply.github.com>
Signed-off-by: M. Adil Fayyaz <62440954+AdilFayyaz@users.noreply.github.com>
@AdilFayyaz AdilFayyaz requested a review from pingsutw April 16, 2026 22:08
Signed-off-by: M. Adil Fayyaz <62440954+AdilFayyaz@users.noreply.github.com>
Signed-off-by: M. Adil Fayyaz <62440954+AdilFayyaz@users.noreply.github.com>
@pingsutw pingsutw merged commit fa3c40a into v2 Apr 17, 2026
20 checks passed
@pingsutw pingsutw deleted the kevin-abort-fix branch April 17, 2026 05:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fixed For any bug fixes flyte2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants