Skip to content

fix: retry VCS status on finalize for failed state#2716

Closed
ab-ghosh wants to merge 1 commit into
tektoncd:mainfrom
ab-ghosh:SRVKP-11614-retry-vcs-status-on-finalize-for-failed-state
Closed

fix: retry VCS status on finalize for failed state#2716
ab-ghosh wants to merge 1 commit into
tektoncd:mainfrom
ab-ghosh:SRVKP-11614-retry-vcs-status-on-finalize-for-failed-state

Conversation

@ab-ghosh
Copy link
Copy Markdown
Member

@ab-ghosh ab-ghosh commented May 6, 2026

📝 Description of the Change

When the VCS update fails, PaC sets state to failed and never re-attempts during FinalizeKind when the PipelineRun is being deleted. This leaves the VCS commit status stuck at pending. This PR adds a StateFailed handler in FinalizeKind that retries the VCS status update one last time when the PipelineRun is deleted, following the same pattern as the existing reportPipelineRunAsCancelled

🔗 Linked GitHub Issue

Fixes #

🧪 Testing Strategy

  • Unit tests
  • Integration tests
  • End-to-end tests
  • Manual testing
  • Not Applicable

🤖 AI Assistance

AI assistance can be used for various tasks, such as code generation,
documentation, or testing.

Please indicate whether you have used AI assistance
for this PR and provide details if applicable.

  • I have not used any AI assistance for this PR.
  • I have used AI assistance for this PR.

Important

Slop will be simply rejected, if you are using AI assistance you need to make sure you
understand the code generated and that it meets the project's standards. you
need at least know how to run the code and deploy it (if needed). See
startpaac to make it easy
to deploy and test your code changes.

If the majority of the code in this PR was generated by an AI, please add a Co-authored-by trailer to your commit message.
For example:

Co-authored-by: Claude noreply@anthropic.com

✅ Submitter Checklist

  • 📝 My commit messages are clear, informative, and follow the project's How to write a git commit message guide. The Gitlint linter ensures in CI it's properly validated
  • ✨ I have ensured my commit message prefix (e.g., fix:, feat:) matches the "Type of Change" I selected above.
  • ♽ I have run make test and make lint locally to check for and fix any
    issues. For an efficient workflow, I have considered installing
    pre-commit and running pre-commit install to
    automate these checks.
  • 📖 I have added or updated documentation for any user-facing changes.
  • 🧪 I have added sufficient unit tests for my code changes.
  • 🎁 I have added end-to-end tests where feasible. See README for more details.
  • 🔎 I have addressed any CI test flakiness or provided a clear reason to bypass it.
  • If adding a provider feature, I have filled in the following and updated the provider documentation:
    • GitHub App
    • GitHub Webhook
    • Gitea/Forgejo
    • GitLab
    • Bitbucket Cloud
    • Bitbucket Data Center

When postFinalStatus fails during reconciliation, the PipelineRun
state is set to "failed" but FinalizeKind never retries the VCS
update. This leaves commit statuses stuck at "pending" with no
recovery path. Add a StateFailed handler in FinalizeKind that
retries the status post one last time during PipelineRun cleanup.

Signed-off-by: ab-ghosh <abghosh@redhat.com>
Assisted-by: Claude Opus 4.6 (via Claude Code)
@codecov-commenter
Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 34.37500% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.23%. Comparing base (c615efb) to head (32d4c2f).

Files with missing lines Patch % Lines
pkg/reconciler/finalizer.go 34.37% 14 Missing and 7 partials ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2716      +/-   ##
==========================================
- Coverage   59.25%   59.23%   -0.03%     
==========================================
  Files         208      208              
  Lines       20573    20605      +32     
==========================================
+ Hits        12191    12205      +14     
- Misses       7610     7622      +12     
- Partials      772      778       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a retry mechanism for updating the final status on the git provider when a PipelineRun enters a failed state during finalization. It includes new logic in the reconciler to fetch repository details and re-attempt the status post, along with updated unit tests to cover this scenario. A significant thread-safety concern was raised regarding the assignment of the logger to a shared client object, which could lead to data races in concurrent reconciliation loops.

I am having trouble creating individual review comments. Click here to see my feedback.

pkg/reconciler/finalizer.go (103-105)

high

Assigning to r.run.Clients.Log is not thread-safe because r.run is a shared object across multiple concurrent reconciliation loops. This creates a data race. Since postFinalStatus and other methods should ideally take the logger as an argument or use the one already present in the context, this mutation should be avoided. If the client requires a logger, consider if it can be initialized once or handled in a thread-safe manner.

Copy link
Copy Markdown
Member

@chmouel chmouel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can race with a newer rerun and put the wrong status back on the PR.

We already send the final status once in the normal flow. With this change, if that failed PipelineRun gets deleted later, the finalizer sends it again. The problem is that it uses the same check/status name as the rerun, so an older failed run can overwrite a newer successful one.

In general please tell your agent to be very careful when touching this part of the code, because race like this can be introduced.

@chmouel
Copy link
Copy Markdown
Member

chmouel commented May 6, 2026

If the intent is to solve VCS interminent, i think we may be able to do this, but it would need some quite a bit of thinking to make it right, feel free to join the PaC meeting to talk about it

@ab-ghosh
Copy link
Copy Markdown
Member Author

ab-ghosh commented May 6, 2026

Thank you @chmouel, will join the PaC call to discuss the approach. Converting to draft for now

@ab-ghosh ab-ghosh marked this pull request as draft May 6, 2026 18:09
Copy link
Copy Markdown
Member Author

@ab-ghosh ab-ghosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📋 Review Summary

The PR successfully addresses the issue where VCS status updates are not retried when a PipelineRun fails and is then deleted (finalized). The implementation follows existing patterns in the reconciler.

🔍 General Feedback

  • The logic is sound and correctly identifies the StateFailed condition during finalization.
  • Consider refactoring the repository retrieval logic in FinalizeKind to reduce duplication.
  • Be cautious with side effects on the shared Run object (setting the logger).

detectedProvider, event, err := r.initGitProviderClient(ctx, logger, repo, pr)
if err != nil {
return err
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Setting r.run.Clients.Log modifies shared state within the Reconciler's Run object. While this ensures that methods like GetStatusFromTaskStatusOrFromAsking have a logger, it introduces a side effect that could affect other concurrent reconciliations. Since postFinalStatus and other methods already take a logger parameter, it would be safer to pass the logger down to where it's needed instead of relying on the shared client state.

}
}

if state == kubeinteraction.StateFailed {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] This block for StateFailed repeats the repository retrieval logic used for StateQueued and StateStarted. Consider refactoring to fetch the repository once for all states that require it.

}

if err := r.retryFinalStatusUpdate(ctx, repo, pr); err != nil {
logger.Errorf("failed to retry final status update on finalize: %v", err)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] Inconsistent error formatting. Line 47 uses %w while here %v is used. While logger.Errorf doesn't wrap, consistency across the file is preferred.

},
},
{
name: "failed state retries final status",
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] The test case "failed state retries final status" verifies that the function runs without error but doesn't assert that the status was actually retried or posted. Consider adding assertions for the expected side effects.

@ab-ghosh ab-ghosh closed this May 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants