Report job success immediately after main graph completes, before post steps by deepsm007 · Pull Request #4921 · openshift/ci-tools

deepsm007 · 2026-02-02T20:05:55Z

Report success to users as soon as main graph completes successfully
Post steps (promotion/cleanup) now run as best-effort and don't affect job result
Added metrics tracking: main_graph_duration_seconds, post_steps_duration_seconds, time_saved_seconds

/hold

/cc @droslean @openshift/test-platform

openshift-ci-robot · 2026-02-02T20:05:59Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

coderabbitai · 2026-02-02T20:06:19Z

Walkthrough

Reports main graph success immediately on completion (and records that a success report was sent), runs post-steps concurrently as best-effort cleanup (logging failures without aborting), and records aggregated metrics including main/post-step/total durations and a post-steps-failed flag.

Changes

Cohort / File(s)	Summary
CI operator main logic `cmd/ci-operator/main.go`	Adds `successReported bool` to `options`; adds `shouldReportEarly(jobSpec *api.JobSpec) bool`; emits `CiJobSucceeded` and attempts an immediate success report after main graph completion; runs promotion/post-steps concurrently as best-effort (log/count failures without aborting); prevents duplicate success reports; records metrics for `main_graph_duration_seconds`, `post_steps_duration_seconds`, total `duration_seconds`, `time_saved_seconds`, `success` and conditionally `post_steps_failed`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@cmd/ci-operator/main.go`:
- Around line 1065-1071: Duplicate success reports are caused by the early
reporter.Report(nil) call in the Run path and the final opt.Report() call in
main(); make reporting idempotent by adding a boolean flag on the options struct
(e.g., earlyReported) and set it when you call reporter.Report(nil) after
o.resultsOptions.Reporter(...) (the block that currently logs loadErr), then
update the Report() method (opt.Report / Report()) to check that flag and skip
calling reporter.Report(nil) if earlyReported is true; reference the Run()
caller flow, main() → opt.Report(), the Report() method, reporter.Report(nil),
and o.resultsOptions.Reporter to locate where to set and check the flag.

cmd/ci-operator/main.go

…t steps - Report success to users as soon as main graph completes successfully - Post steps (promotion/cleanup) now run as best-effort and don't affect job result - Add metrics tracking: main_graph_duration_seconds, post_steps_duration_seconds, time_saved_seconds - Metrics are extractable from ci-operator-metrics.json in test_platform_insights events - Prevent duplicate success reporting by tracking early-report state in options

…riodics, exclude rehearsals

deepsm007 · 2026-02-02T22:25:50Z

/unhold

deepsm007 · 2026-02-02T23:27:44Z

/test images

openshift-ci-robot · 2026-02-03T00:02:08Z

Scheduling required tests:
/test e2e

openshift-ci · 2026-02-03T10:25:07Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deepsm007, liangxia

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [deepsm007,liangxia]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2026-02-03T10:50:45Z

/retest-required

Remaining retests: 0 against base HEAD 4188be3 and 2 for PR HEAD efdce43 in total

deepsm007 · 2026-02-03T12:47:24Z

/hold

openshift-ci · 2026-02-03T12:58:42Z

@deepsm007: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/images	`efdce43`	link	unknown	`/test images`
ci/prow/breaking-changes	`efdce43`	link	false	`/test breaking-changes`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

droslean · 2026-02-03T13:22:23Z

cmd/ci-operator/main.go

+		// Main graph completed successfully - report success immediately before post steps
+		mainGraphCompletedAt := time.Now()
+		mainGraphDuration := mainGraphCompletedAt.Sub(start)
+		eventRecorder.Event(runtimeObject, coreapi.EventTypeNormal, "CiJobSucceeded", eventJobDescription(o.jobSpec, o.namespace))


Nothing is consuming this. I had plans to deprecate them all.

petr-muller · 2026-02-03T13:23:18Z

/cc

droslean · 2026-02-03T13:23:52Z

I think the best approach will be to report after each step is completed in general and keep the final report state after post steps as it was.

petr-muller · 2026-02-03T13:30:46Z

cmd/ci-operator/main.go

+				reporter.Report(nil)
+				o.successReported = true


I think this is the part you expect to deliver the early report, but AFAIK this does not do any actual meaningful reporting. It reports results to result-aggregator which AFAIK only powers the ci operator executions fail too much because of a $reason alerts and apparently does something related to pod scaler, too (but its not documented so who knows lol).

The actual result reporting goes strictly through exit codes: ci-operator exits, prow picks up that exit code and propagates the result accordingly (sets Prowjob status, crier reports to GitHub...). So early reporting is likely a much greater architecture change - either we invent APIs on both ci-operator and Prow layers to directly support this (report the final result while the job is still executing), or we'd allow ci-operator and therefore the prowjob to terminate with the final result, and the "remaining" parts of the workload would need to be handed over to some background manager to process and finalize them. Both are IMO non trivial and a kind of effort that needs some design work

openshift-ci bot requested review from a team and droslean February 2, 2026 20:06

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 2, 2026

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 2, 2026

coderabbitai bot reviewed Feb 2, 2026

View reviewed changes

cmd/ci-operator/main.go Show resolved Hide resolved

deepsm007 force-pushed the post-step-job-success branch from 8a88f52 to fd54539 Compare February 2, 2026 20:19

Limit early success reporting to presubmits and release controller pe…

efdce43

…riodics, exclude rehearsals

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 2, 2026

liangxia approved these changes Feb 3, 2026

View reviewed changes

openshift-ci bot assigned liangxia Feb 3, 2026

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 3, 2026

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 3, 2026

droslean reviewed Feb 3, 2026

View reviewed changes

openshift-ci bot requested a review from petr-muller February 3, 2026 13:23

petr-muller reviewed Feb 3, 2026

View reviewed changes

Conversation

deepsm007 commented Feb 2, 2026

Uh oh!

openshift-ci-robot commented Feb 2, 2026

Uh oh!

coderabbitai bot commented Feb 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

deepsm007 commented Feb 2, 2026

Uh oh!

deepsm007 commented Feb 2, 2026

Uh oh!

openshift-ci-robot commented Feb 3, 2026

Uh oh!

openshift-ci bot commented Feb 3, 2026

Uh oh!

openshift-ci-robot commented Feb 3, 2026

Uh oh!

deepsm007 commented Feb 3, 2026

Uh oh!

openshift-ci bot commented Feb 3, 2026

Uh oh!

droslean Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

petr-muller commented Feb 3, 2026

Uh oh!

droslean commented Feb 3, 2026

Uh oh!

petr-muller Feb 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

coderabbitai bot commented Feb 2, 2026 •

edited

Loading

petr-muller Feb 3, 2026 •

edited

Loading