OCPERT-300: Replace triggering Image Consistency Jenkins job with Prow job#943
OCPERT-300: Replace triggering Image Consistency Jenkins job with Prow job#943tomasdavidorg wants to merge 2 commits intoopenshift:mainfrom
Conversation
|
@tomasdavidorg: This pull request references OCPERT-300 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughMigrates image-consistency-check from Jenkins to Prow: replaces build-number flags with job-id, removes Jenkins trigger paths, adds Prow/Jobs API integration, updates CLI signatures, notification helpers, server API, docs, and tests to use Prow job IDs and workflows. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan for PR comments
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Skipping CI for Draft Pull Request. |
|
@tomasdavidorg: This pull request references OCPERT-300 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
oar/core/jenkins.py (1)
133-146:⚠️ Potential issue | 🟠 MajorUninitialized variable will cause
NameErrorwhen unsupported job name is passed.The code logs a warning for unsupported job names but continues execution. At line 146,
parameters_valueis used unconditionally without being defined in the else branch, which will raise aNameErrorat runtime ifcall_build_jobis invoked with any job name other thanJENKINS_JOB_STAGE_PIPELINE.🐛 Proposed fix
if (job_name == JENKINS_JOB_STAGE_PIPELINE): parameters_value = { "VERSION": self.version, "PULL_SPEC": pull_spec, } # only add metadata ad to job request for errata release flow # if param metadata_ad is not provided, stage 'Verify Metadata' will be skipped. if self.metadata_ad: parameters_value["METADATA_AD"] = self.metadata_ad else: - logger.info(f"{job_name} is not supported") + raise JenkinsHelperException(f"{job_name} is not supported") # trigger build and get queue item in resp payload res = self.server.build_job(job_name, parameters=parameters_value)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@oar/core/jenkins.py` around lines 133 - 146, The code uses parameters_value unconditionally after logging an unsupported job, which causes a NameError; update the call_build_job flow in the method (the block referencing job_name, JENKINS_JOB_STAGE_PIPELINE, parameters_value and self.server.build_job) so that for unsupported job names you either return/raise immediately (e.g., raise ValueError or return None) or initialize a safe parameters_value before calling self.server.build_job; specifically, move the self.server.build_job invocation into the JENKINS_JOB_STAGE_PIPELINE branch (or add an explicit early exit in the else branch) to ensure parameters_value is always defined when passed to build_job.docs/KONFLUX_RELEASE_FLOW.md (1)
714-719:⚠️ Potential issue | 🟡 MinorInconsistent terminology: Jenkins mentioned for Prow-based job.
Line 719 mentions "Jenkins job execution: 90-120 minutes" but this section now describes a Prow-based workflow. The terminology should be updated to reflect the Prow migration.
📝 Suggested fix
**Expected Duration:** - Stage-release pipeline wait: Variable (requires ART team intervention if failed) -- Jenkins job execution: 90-120 minutes after trigger succeeds +- Prow job execution: 90-120 minutes after trigger succeeds - User should check status every 10-15 minutes🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/KONFLUX_RELEASE_FLOW.md` around lines 714 - 719, Update the outdated "Jenkins" terminology in the release flow text: replace occurrences like "Jenkins job execution: 90-120 minutes after trigger succeeds" and any nearby references to "Jenkins job failure" with Prow-appropriate wording (e.g., "Prow job execution" and "Prow job failure"), and ensure related phrases such as "User should check status every 10-15 minutes" and "Stage-release pipeline not ready" remain consistent with the Prow-based workflow; search for the exact strings "Jenkins job execution", "Jenkins job failure", and "Stage-release pipeline" to locate and update the lines in KONFLUX_RELEASE_FLOW.md.
🧹 Nitpick comments (2)
.claude/commands/ci/analyze-jenkins-failures.md (2)
147-336: Consider removing or marking the image-consistency-check section as deprecated.The entire "image-consistency-check Job" analysis section (lines 147-336) remains in this Jenkins-focused document, but image-consistency-check has migrated to Prow. This section may confuse users since:
- The file description states it's for "stage-testing" Jenkins jobs
- The migration note says to use
/ci:analyze-failuresfor Prow jobsEither remove this section or add a prominent deprecation notice at its start redirecting users to the Prow analysis path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/commands/ci/analyze-jenkins-failures.md around lines 147 - 336, The "image-consistency-check Job" section is outdated because that job moved to Prow; either delete the entire "#### A. image-consistency-check Job" block or replace its content with a short deprecation notice that immediately directs users to the Prow workflow (e.g., reference using /ci:analyze-failures) and explains that stage-testing Jenkins docs no longer apply; edit the header "#### A. image-consistency-check Job" and the following paragraphs to perform this replacement so readers aren’t misled by Jenkins-specific instructions.
5-17: Duplicate migration notes may cause confusion.Lines 7 and 16-17 both contain migration notes about image-consistency-check moving to Prow. Consider consolidating to a single note to avoid redundancy.
♻️ Suggested consolidation
Keep only one migration note, preferably at line 7 in the overview section, and remove lines 16-17 since the overview already covers it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/commands/ci/analyze-jenkins-failures.md around lines 5 - 17, Consolidate the duplicate migration note about "image-consistency-check has been migrated from Jenkins to Prow" by keeping the single instance in the Overview section near the top (the note currently after the Jenkins job URL) and removing the redundant note at the bottom of the file; update surrounding text if necessary so the overview still mentions that Prow job failures should use `/ci:analyze-failures` and delete the later duplicate lines that repeat the migration message.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@docs/KONFLUX_RELEASE_FLOW.md`:
- Around line 714-719: Update the outdated "Jenkins" terminology in the release
flow text: replace occurrences like "Jenkins job execution: 90-120 minutes after
trigger succeeds" and any nearby references to "Jenkins job failure" with
Prow-appropriate wording (e.g., "Prow job execution" and "Prow job failure"),
and ensure related phrases such as "User should check status every 10-15
minutes" and "Stage-release pipeline not ready" remain consistent with the
Prow-based workflow; search for the exact strings "Jenkins job execution",
"Jenkins job failure", and "Stage-release pipeline" to locate and update the
lines in KONFLUX_RELEASE_FLOW.md.
In `@oar/core/jenkins.py`:
- Around line 133-146: The code uses parameters_value unconditionally after
logging an unsupported job, which causes a NameError; update the call_build_job
flow in the method (the block referencing job_name, JENKINS_JOB_STAGE_PIPELINE,
parameters_value and self.server.build_job) so that for unsupported job names
you either return/raise immediately (e.g., raise ValueError or return None) or
initialize a safe parameters_value before calling self.server.build_job;
specifically, move the self.server.build_job invocation into the
JENKINS_JOB_STAGE_PIPELINE branch (or add an explicit early exit in the else
branch) to ensure parameters_value is always defined when passed to build_job.
---
Nitpick comments:
In @.claude/commands/ci/analyze-jenkins-failures.md:
- Around line 147-336: The "image-consistency-check Job" section is outdated
because that job moved to Prow; either delete the entire "#### A.
image-consistency-check Job" block or replace its content with a short
deprecation notice that immediately directs users to the Prow workflow (e.g.,
reference using /ci:analyze-failures) and explains that stage-testing Jenkins
docs no longer apply; edit the header "#### A. image-consistency-check Job" and
the following paragraphs to perform this replacement so readers aren’t misled by
Jenkins-specific instructions.
- Around line 5-17: Consolidate the duplicate migration note about
"image-consistency-check has been migrated from Jenkins to Prow" by keeping the
single instance in the Overview section near the top (the note currently after
the Jenkins job URL) and removing the redundant note at the bottom of the file;
update surrounding text if necessary so the overview still mentions that Prow
job failures should use `/ci:analyze-failures` and delete the later duplicate
lines that repeat the migration message.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 85b57e28-5b9c-4383-bba7-2e52b6790318
📒 Files selected for processing (14)
.claude/commands/ci/analyze-failures.md.claude/commands/ci/analyze-jenkins-failures.md.claude/commands/release/drive.md.claude/skills/release-workflow/SKILL.mdAGENTS.mdCLAUDE.mddocs/KONFLUX_RELEASE_FLOW.mdmcp_server/server.pyoar/README.mdoar/cli/cmd_image_consistency_check.pyoar/core/jenkins.pyoar/core/notification.pyoar/core/statebox.pytests/test_jenkins.py
💤 Files with no reviewable changes (1)
- tests/test_jenkins.py
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Adding labeů |
|
ci/prow/oar-check fail... loooking |
|
@tomasdavidorg: This pull request references OCPERT-300 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Suggest keeping call_image_consistency_job() and the JENKINS_JOB_IMAGE_CONSISTENCY_CHECK branch in call_build_job() rather than removing them.
The goal of this PR is to replace Jenkins with Prow in the CLI layer (cmd_image_consistency_check.py), which is done correctly. But removing the underlying Jenkins method goes further than needed and introduces two risks:
- Rollback / legacy support: If a rollback is ever needed or an older release still needs the Jenkins path, the capability is gone.
- Latent bug in
call_build_job(): Removing theJENKINS_JOB_IMAGE_CONSISTENCY_CHECKbranch changes theif/elifchain to start directly withif job_name == JENKINS_JOB_STAGE_PIPELINE. Any future call with an unrecognized job name will silently fall through withparameters_valueunset, which would cause a runtime error.
Suggested approach: only touch cmd_image_consistency_check.py and leave jenkins.py unchanged. The Jenkins methods become unused by the CLI but remain available as a safety net.
There was a problem hiding this comment.
The old Jenkins implementation guarded against duplicate runs via is_job_enqueue(). That guard is removed here with no Prow equivalent, so re-invoking the command while a job is already running will trigger a duplicate Prow job.
The fix is straightforward: when the task is already In Progress, parse the job ID from the StateBox result field (which is populated by cli_result_callback from captured logs), then call get_job_results(job_id) to check status instead of triggering a new job.
The log line logger.info(f"Triggered image consistency check Prow job: {job_id}") is already captured and saved to StateBox result by cli_result_callback, so the job ID is recoverable.
Suggested logic in trigger_job():
task_status = self.statebox.get_task_status(TASK_IMAGE_CONSISTENCY_CHECK)
if task_status == TASK_STATUS_PASS:
logger.info("Image consistency check already passed, skipping")
return
if task_status == TASK_STATUS_INPROGRESS:
# Recover existing Prow job ID from StateBox result to avoid duplicate trigger
task = self.statebox.get_task(TASK_IMAGE_CONSISTENCY_CHECK)
result_text = (task or {}).get("result", "") or ""
match = re.search(r"Triggered image consistency check Prow job: (\S+)", result_text)
if match:
existing_job_id = match.group(1)
logger.info(f"Prow job {existing_job_id} already in progress, skipping duplicate trigger")
return
# No job ID found in result, fall through and triggerThere was a problem hiding this comment.
Two issues with the job state handling:
1. Empty string "" in the in-progress check is dead code
run_image_consistency_check calls get_job_results(job_id, poll=True) internally and only returns once jobURL is not None — meaning the job is confirmed scheduled by Gangway. At that point jobState is guaranteed to be a valid Prow state enum, never an empty string. The "" case should be removed.
2. aborted and error states should re-trigger, not permanently fail
The current else branch treats all non-success, non-in-progress states as TASK_STATUS_FAIL. But aborted (manually canceled) and error (infra issue) are transient — a new job run should be triggered. failure however means the job ran but the image consistency check actually failed, so that should remain TASK_STATUS_FAIL.
Suggested state handling based on Prow state enum (triggered, pending, running, success, failure, aborted, error):
if job_state == "success":
healthy = self.io.check_image_health()
task_status = TASK_STATUS_PASS if healthy else TASK_STATUS_FAIL
elif job_state in ("triggered", "pending", "running"):
task_status = TASK_STATUS_INPROGRESS
elif job_state in ("aborted", "error"):
# Job was canceled or had infra issues - reset so trigger_job will re-trigger
logger.warning(f"Prow job {job_id} ended with state '{job_state}', resetting for re-trigger")
task_status = TASK_STATUS_NOT_STARTED
else:
# failure - job ran but image consistency check failed
task_status = TASK_STATUS_FAILThe duplicate-trigger guard in trigger_job() should also account for this: when recovering a job ID from StateBox and querying its state, only skip re-triggering if state is in {triggered, pending, running, success}. For aborted/error, proceed to trigger a new run.
|
@tomasdavidorg: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
https://issues.redhat.com/browse/OCPERT-300
Summary by CodeRabbit
New Features
Documentation
Chores