Skip to content

OCPERT-300: Replace triggering Image Consistency Jenkins job with Prow job#943

Open
tomasdavidorg wants to merge 2 commits intoopenshift:mainfrom
tomasdavidorg:OCPERT-300
Open

OCPERT-300: Replace triggering Image Consistency Jenkins job with Prow job#943
tomasdavidorg wants to merge 2 commits intoopenshift:mainfrom
tomasdavidorg:OCPERT-300

Conversation

@tomasdavidorg
Copy link
Contributor

@tomasdavidorg tomasdavidorg commented Mar 11, 2026

https://issues.redhat.com/browse/OCPERT-300

Summary by CodeRabbit

  • New Features

    • Image consistency check now runs on Prow with job‑ID based triggering and status checks.
    • Notifications: shareable Prow job links sent to Slack.
  • Documentation

    • CLI examples and workflow docs updated to use job IDs and reflect Prow migration.
    • Jenkins docs narrowed to stage-testing guidance.
  • Chores

    • Removed legacy Jenkins image‑consistency paths and moved CLI/status flags to job‑id.
    • Build now exposes additional CLI helpers during image build.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 11, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 11, 2026

@tomasdavidorg: This pull request references OCPERT-300 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

https://issues.redhat.com/browse/OCPERT-300

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link

coderabbitai bot commented Mar 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ec4e8081-c05c-4c0b-a3c6-76b423c59095

📥 Commits

Reviewing files that changed from the base of the PR and between 4666271 and e0cd797.

📒 Files selected for processing (1)
  • Dockerfile

Walkthrough

Migrates image-consistency-check from Jenkins to Prow: replaces build-number flags with job-id, removes Jenkins trigger paths, adds Prow/Jobs API integration, updates CLI signatures, notification helpers, server API, docs, and tests to use Prow job IDs and workflows.

Changes

Cohort / File(s) Summary
CI Analysis Documentation
.claude/commands/ci/analyze-failures.md, .claude/commands/ci/analyze-jenkins-failures.md
Removed image-consistency-check from Jenkins scope, added migration notes pointing to Prow and updated example job/URL patterns to stage-testing context.
Release Flow / Drive Docs
.claude/commands/release/drive.md, docs/KONFLUX_RELEASE_FLOW.md
Replaced Jenkins build_number with job_id across async task handling, examples, and status text; generalized wording to cover Prow/Jenkins job IDs.
Agent / CLI Docs
AGENTS.md, CLAUDE.md, oar/README.md
Updated CLI examples and flags from -n/--build-number to -i/--job-id, reorganized version/job configuration guidance to reflect Prow migration.
Skill Workflow
.claude/skills/release-workflow/SKILL.md
Adjusted async status checks to use job identifiers (job_id) instead of build numbers.
MCP Server API
mcp_server/server.py
Renamed oar_image_consistency_check(release, build_number)oar_image_consistency_check(release, job_id), updated docstring and CLI flag semantics to reflect Prow/Gangway usage.
CLI: image-consistency-check
oar/cli/cmd_image_consistency_check.py
Replaced JenkinsHelper with Jobs API; trigger_job() now triggers Prow job and returns job_id; check_job_status(job_id) fetches Prow results; CLI option changed to -i/--job-id; removed Jenkins-specific flows and nightly flag.
Core: Jenkins helper
oar/core/jenkins.py
Removed call_image_consistency_job() and the image-consistency branch from call_build_job(), leaving only stage-pipeline support.
Notifications
oar/core/notification.py
Added share_prow_job_url() and get_slack_message_for_prow_job() to format/send Slack messages for Prow jobs.
State / Status messages
oar/core/statebox.py
Updated pass status text from "Jenkins job completed successfully" to "Prow job completed successfully" for image-consistency-check.
Tests
tests/test_jenkins.py
Removed test_call_image_consistency_job() (Jenkins image-consistency test) reducing coverage of that path.
Docker build
Dockerfile
Added local ./prow install in build step and exposed additional CLI helpers (job, jobctl) in the build test chain.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant CLI as CLI (image-consistency-check)
participant JobsAPI as Jobs / Gangway
participant Prow as Prow job
participant Notification as NotificationManager (Slack)
participant ImageHealth as ImageHealthOperator
CLI->>JobsAPI: trigger image-consistency job (returns job_id, job_url)
JobsAPI->>Prow: create/queue job
JobsAPI-->>CLI: return job_id & job_url
CLI->>Notification: share_prow_job_url(job_name, job_url)
Note right of CLI: later / status check
CLI->>JobsAPI: get_job_results(job_id)
JobsAPI-->>CLI: job_state + results
alt job_state == success
CLI->>ImageHealth: check_image_health(results)
ImageHealth-->>CLI: PASS/FAIL
else job pending
CLI-->>CLI: INPROGRESS
else other
CLI-->>CLI: FAIL
end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately describes the primary change: replacing Jenkins-based image consistency job triggering with Prow-based job triggering, which is the core objective reflected across multiple file changes.
Docstring Coverage ✅ Passed Docstring coverage is 86.67% which is sufficient. The required threshold is 80.00%.
Stable And Deterministic Test Names ✅ Passed This Python repository contains no Go source files or Ginkgo tests, making the Ginkgo-specific check not applicable.
Test Structure And Quality ✅ Passed This PR contains Python test files, not Ginkgo test code. The repository is a Python project with no Go test files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
📝 Coding Plan for PR comments
  • Generate coding plan

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 11, 2026
@openshift-ci
Copy link

openshift-ci bot commented Mar 11, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 11, 2026

@tomasdavidorg: This pull request references OCPERT-300 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

https://issues.redhat.com/browse/OCPERT-300

Summary by CodeRabbit

  • New Features

  • Image consistency check now runs on Prow platform with job ID tracking.

  • Added Prow job notification sharing capability.

  • Documentation

  • Updated CLI and workflow documentation to reflect Prow-based image consistency checks and new job ID parameters.

  • Refined Jenkins documentation to focus on stage-testing jobs.

  • Chores

  • Migrated image consistency check from Jenkins to Prow workflow.

  • Removed Jenkins-specific code paths and legacy parameters.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
oar/core/jenkins.py (1)

133-146: ⚠️ Potential issue | 🟠 Major

Uninitialized variable will cause NameError when unsupported job name is passed.

The code logs a warning for unsupported job names but continues execution. At line 146, parameters_value is used unconditionally without being defined in the else branch, which will raise a NameError at runtime if call_build_job is invoked with any job name other than JENKINS_JOB_STAGE_PIPELINE.

🐛 Proposed fix
             if (job_name == JENKINS_JOB_STAGE_PIPELINE):
                 parameters_value = {
                     "VERSION": self.version,
                     "PULL_SPEC": pull_spec,
                 }
                 # only add metadata ad to job request for errata release flow
                 # if param metadata_ad is not provided, stage 'Verify Metadata' will be skipped.
                 if self.metadata_ad:
                     parameters_value["METADATA_AD"] = self.metadata_ad
             else:
-                logger.info(f"{job_name} is not supported")
+                raise JenkinsHelperException(f"{job_name} is not supported")

             # trigger build and get queue item in resp payload
             res = self.server.build_job(job_name, parameters=parameters_value)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@oar/core/jenkins.py` around lines 133 - 146, The code uses parameters_value
unconditionally after logging an unsupported job, which causes a NameError;
update the call_build_job flow in the method (the block referencing job_name,
JENKINS_JOB_STAGE_PIPELINE, parameters_value and self.server.build_job) so that
for unsupported job names you either return/raise immediately (e.g., raise
ValueError or return None) or initialize a safe parameters_value before calling
self.server.build_job; specifically, move the self.server.build_job invocation
into the JENKINS_JOB_STAGE_PIPELINE branch (or add an explicit early exit in the
else branch) to ensure parameters_value is always defined when passed to
build_job.
docs/KONFLUX_RELEASE_FLOW.md (1)

714-719: ⚠️ Potential issue | 🟡 Minor

Inconsistent terminology: Jenkins mentioned for Prow-based job.

Line 719 mentions "Jenkins job execution: 90-120 minutes" but this section now describes a Prow-based workflow. The terminology should be updated to reflect the Prow migration.

📝 Suggested fix
 **Expected Duration:**
 - Stage-release pipeline wait: Variable (requires ART team intervention if failed)
-- Jenkins job execution: 90-120 minutes after trigger succeeds
+- Prow job execution: 90-120 minutes after trigger succeeds
 - User should check status every 10-15 minutes
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/KONFLUX_RELEASE_FLOW.md` around lines 714 - 719, Update the outdated
"Jenkins" terminology in the release flow text: replace occurrences like
"Jenkins job execution: 90-120 minutes after trigger succeeds" and any nearby
references to "Jenkins job failure" with Prow-appropriate wording (e.g., "Prow
job execution" and "Prow job failure"), and ensure related phrases such as "User
should check status every 10-15 minutes" and "Stage-release pipeline not ready"
remain consistent with the Prow-based workflow; search for the exact strings
"Jenkins job execution", "Jenkins job failure", and "Stage-release pipeline" to
locate and update the lines in KONFLUX_RELEASE_FLOW.md.
🧹 Nitpick comments (2)
.claude/commands/ci/analyze-jenkins-failures.md (2)

147-336: Consider removing or marking the image-consistency-check section as deprecated.

The entire "image-consistency-check Job" analysis section (lines 147-336) remains in this Jenkins-focused document, but image-consistency-check has migrated to Prow. This section may confuse users since:

  1. The file description states it's for "stage-testing" Jenkins jobs
  2. The migration note says to use /ci:analyze-failures for Prow jobs

Either remove this section or add a prominent deprecation notice at its start redirecting users to the Prow analysis path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/ci/analyze-jenkins-failures.md around lines 147 - 336, The
"image-consistency-check Job" section is outdated because that job moved to
Prow; either delete the entire "#### A. image-consistency-check Job" block or
replace its content with a short deprecation notice that immediately directs
users to the Prow workflow (e.g., reference using /ci:analyze-failures) and
explains that stage-testing Jenkins docs no longer apply; edit the header "####
A. image-consistency-check Job" and the following paragraphs to perform this
replacement so readers aren’t misled by Jenkins-specific instructions.

5-17: Duplicate migration notes may cause confusion.

Lines 7 and 16-17 both contain migration notes about image-consistency-check moving to Prow. Consider consolidating to a single note to avoid redundancy.

♻️ Suggested consolidation

Keep only one migration note, preferably at line 7 in the overview section, and remove lines 16-17 since the overview already covers it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/ci/analyze-jenkins-failures.md around lines 5 - 17,
Consolidate the duplicate migration note about "image-consistency-check has been
migrated from Jenkins to Prow" by keeping the single instance in the Overview
section near the top (the note currently after the Jenkins job URL) and removing
the redundant note at the bottom of the file; update surrounding text if
necessary so the overview still mentions that Prow job failures should use
`/ci:analyze-failures` and delete the later duplicate lines that repeat the
migration message.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@docs/KONFLUX_RELEASE_FLOW.md`:
- Around line 714-719: Update the outdated "Jenkins" terminology in the release
flow text: replace occurrences like "Jenkins job execution: 90-120 minutes after
trigger succeeds" and any nearby references to "Jenkins job failure" with
Prow-appropriate wording (e.g., "Prow job execution" and "Prow job failure"),
and ensure related phrases such as "User should check status every 10-15
minutes" and "Stage-release pipeline not ready" remain consistent with the
Prow-based workflow; search for the exact strings "Jenkins job execution",
"Jenkins job failure", and "Stage-release pipeline" to locate and update the
lines in KONFLUX_RELEASE_FLOW.md.

In `@oar/core/jenkins.py`:
- Around line 133-146: The code uses parameters_value unconditionally after
logging an unsupported job, which causes a NameError; update the call_build_job
flow in the method (the block referencing job_name, JENKINS_JOB_STAGE_PIPELINE,
parameters_value and self.server.build_job) so that for unsupported job names
you either return/raise immediately (e.g., raise ValueError or return None) or
initialize a safe parameters_value before calling self.server.build_job;
specifically, move the self.server.build_job invocation into the
JENKINS_JOB_STAGE_PIPELINE branch (or add an explicit early exit in the else
branch) to ensure parameters_value is always defined when passed to build_job.

---

Nitpick comments:
In @.claude/commands/ci/analyze-jenkins-failures.md:
- Around line 147-336: The "image-consistency-check Job" section is outdated
because that job moved to Prow; either delete the entire "#### A.
image-consistency-check Job" block or replace its content with a short
deprecation notice that immediately directs users to the Prow workflow (e.g.,
reference using /ci:analyze-failures) and explains that stage-testing Jenkins
docs no longer apply; edit the header "#### A. image-consistency-check Job" and
the following paragraphs to perform this replacement so readers aren’t misled by
Jenkins-specific instructions.
- Around line 5-17: Consolidate the duplicate migration note about
"image-consistency-check has been migrated from Jenkins to Prow" by keeping the
single instance in the Overview section near the top (the note currently after
the Jenkins job URL) and removing the redundant note at the bottom of the file;
update surrounding text if necessary so the overview still mentions that Prow
job failures should use `/ci:analyze-failures` and delete the later duplicate
lines that repeat the migration message.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 85b57e28-5b9c-4383-bba7-2e52b6790318

📥 Commits

Reviewing files that changed from the base of the PR and between d1e8600 and 4666271.

📒 Files selected for processing (14)
  • .claude/commands/ci/analyze-failures.md
  • .claude/commands/ci/analyze-jenkins-failures.md
  • .claude/commands/release/drive.md
  • .claude/skills/release-workflow/SKILL.md
  • AGENTS.md
  • CLAUDE.md
  • docs/KONFLUX_RELEASE_FLOW.md
  • mcp_server/server.py
  • oar/README.md
  • oar/cli/cmd_image_consistency_check.py
  • oar/core/jenkins.py
  • oar/core/notification.py
  • oar/core/statebox.py
  • tests/test_jenkins.py
💤 Files with no reviewable changes (1)
  • tests/test_jenkins.py

@openshift-ci
Copy link

openshift-ci bot commented Mar 11, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign ming1013 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tomasdavidorg tomasdavidorg marked this pull request as ready for review March 12, 2026 12:53
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 12, 2026
@openshift-ci openshift-ci bot requested review from jhuttana and rioliu-rh March 12, 2026 12:58
@tomasdavidorg tomasdavidorg added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 12, 2026
@tomasdavidorg
Copy link
Contributor Author

Adding labeů DO_NOT_MERGE. We should consider merging this after Atlassian Cloud Migration.

@tomasdavidorg
Copy link
Contributor Author

ci/prow/oar-check fail... loooking

@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 12, 2026

@tomasdavidorg: This pull request references OCPERT-300 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

https://issues.redhat.com/browse/OCPERT-300

Summary by CodeRabbit

  • New Features

  • Image consistency check now runs on Prow with job‑ID based triggering and status checks.

  • Notifications: shareable Prow job links sent to Slack.

  • Documentation

  • CLI examples and workflow docs updated to use job IDs and reflect Prow migration.

  • Jenkins docs narrowed to stage-testing guidance.

  • Chores

  • Removed legacy Jenkins image‑consistency paths and moved CLI/status flags to job‑id.

  • Build now exposes additional CLI helpers during image build.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest keeping call_image_consistency_job() and the JENKINS_JOB_IMAGE_CONSISTENCY_CHECK branch in call_build_job() rather than removing them.

The goal of this PR is to replace Jenkins with Prow in the CLI layer (cmd_image_consistency_check.py), which is done correctly. But removing the underlying Jenkins method goes further than needed and introduces two risks:

  1. Rollback / legacy support: If a rollback is ever needed or an older release still needs the Jenkins path, the capability is gone.
  2. Latent bug in call_build_job(): Removing the JENKINS_JOB_IMAGE_CONSISTENCY_CHECK branch changes the if/elif chain to start directly with if job_name == JENKINS_JOB_STAGE_PIPELINE. Any future call with an unrecognized job name will silently fall through with parameters_value unset, which would cause a runtime error.

Suggested approach: only touch cmd_image_consistency_check.py and leave jenkins.py unchanged. The Jenkins methods become unused by the CLI but remain available as a safety net.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old Jenkins implementation guarded against duplicate runs via is_job_enqueue(). That guard is removed here with no Prow equivalent, so re-invoking the command while a job is already running will trigger a duplicate Prow job.

The fix is straightforward: when the task is already In Progress, parse the job ID from the StateBox result field (which is populated by cli_result_callback from captured logs), then call get_job_results(job_id) to check status instead of triggering a new job.

The log line logger.info(f"Triggered image consistency check Prow job: {job_id}") is already captured and saved to StateBox result by cli_result_callback, so the job ID is recoverable.

Suggested logic in trigger_job():

task_status = self.statebox.get_task_status(TASK_IMAGE_CONSISTENCY_CHECK)
if task_status == TASK_STATUS_PASS:
    logger.info("Image consistency check already passed, skipping")
    return

if task_status == TASK_STATUS_INPROGRESS:
    # Recover existing Prow job ID from StateBox result to avoid duplicate trigger
    task = self.statebox.get_task(TASK_IMAGE_CONSISTENCY_CHECK)
    result_text = (task or {}).get("result", "") or ""
    match = re.search(r"Triggered image consistency check Prow job: (\S+)", result_text)
    if match:
        existing_job_id = match.group(1)
        logger.info(f"Prow job {existing_job_id} already in progress, skipping duplicate trigger")
        return
    # No job ID found in result, fall through and trigger

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two issues with the job state handling:

1. Empty string "" in the in-progress check is dead code

run_image_consistency_check calls get_job_results(job_id, poll=True) internally and only returns once jobURL is not None — meaning the job is confirmed scheduled by Gangway. At that point jobState is guaranteed to be a valid Prow state enum, never an empty string. The "" case should be removed.

2. aborted and error states should re-trigger, not permanently fail

The current else branch treats all non-success, non-in-progress states as TASK_STATUS_FAIL. But aborted (manually canceled) and error (infra issue) are transient — a new job run should be triggered. failure however means the job ran but the image consistency check actually failed, so that should remain TASK_STATUS_FAIL.

Suggested state handling based on Prow state enum (triggered, pending, running, success, failure, aborted, error):

if job_state == "success":
    healthy = self.io.check_image_health()
    task_status = TASK_STATUS_PASS if healthy else TASK_STATUS_FAIL
elif job_state in ("triggered", "pending", "running"):
    task_status = TASK_STATUS_INPROGRESS
elif job_state in ("aborted", "error"):
    # Job was canceled or had infra issues - reset so trigger_job will re-trigger
    logger.warning(f"Prow job {job_id} ended with state '{job_state}', resetting for re-trigger")
    task_status = TASK_STATUS_NOT_STARTED
else:
    # failure - job ran but image consistency check failed
    task_status = TASK_STATUS_FAIL

The duplicate-trigger guard in trigger_job() should also account for this: when recovering a job ID from StateBox and querying its state, only skip re-triggering if state is in {triggered, pending, running, success}. For aborted/error, proceed to trigger a new run.

@openshift-ci
Copy link

openshift-ci bot commented Mar 13, 2026

@tomasdavidorg: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants