OCPBUGS-91656: fix(test): add retry logic to GetLogs in Karpenter kubelet propagation test #8805
The kubelet serving certificate on freshly provisioned Karpenter nodes may not be ready immediately after the node is marked Ready, causing transient "tls: internal error" or "http2: client connection lost" failures on the proxied log request. Wrap the GetLogs call in g.Eventually with a 2-minute timeout to tolerate this window.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@mgencur: This pull request references Jira Issue OCPBUGS-91656, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉
Review profile: CHILL
Pre-merge checks failed: please resolve all errors before merging. Addressing warnings is optional.
❌ Failed checks (1 error, 2 warnings)
✅ Passed checks (8 passed) |
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #8805 +/- ##
=======================================
Coverage 42.12% 42.12%
=======================================
Files 767 767
Lines 95067 95067
=======================================
Hits 40051 40051
Misses 52202 52202
Partials 2814 2814
Flags with carried forward coverage won't be shown.
|
|
/approve |
|
Scheduling tests matching the |
|
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: enxebre, mehabhalodiya, mgencur |
|
/retest |
Test Results
e2e-aws
e2e-aks
|
|
/retest |
|
Test Failure Analysis Complete
Root Cause
The root cause is a transient Snyk service hang during the SAST (Static Application Security Testing) scan step of the Konflux pipeline. The analysis concluded that the failure is not related to the PR.
|
|
/verified by unit test This PR only includes a fix for the given unit test. |
|
@mgencur: This PR has been marked as verified. |
|
/retest |
|
/jira refresh |
|
@mgencur: This pull request references Jira Issue OCPBUGS-91656, which is valid. 3 validation(s) were run on this bug
|
@mgencur: all tests passed!
If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@mgencur: Jira Issue Verification Checks: Jira Issue OCPBUGS-91656
Jira Issue OCPBUGS-91656 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 |
|
Fix included in release 5.0.0-0.nightly-2026-06-25-194049 |
What this PR does / why we need it:
The kubelet serving certificate on freshly provisioned Karpenter nodes may not be ready immediately after the node is marked Ready, causing transient "tls: internal error" or "http2: client connection lost" failures on the proxied log request. Wrap the GetLogs call in g.Eventually with a 2-minute timeout to tolerate this window.
Which issue(s) this PR fixes:
Fixes https://redhat.atlassian.net/browse/OCPBUGS-91656
Special notes for your reviewer:
Checklist: