product-lifecycle: Add plc_lookup.py CLI, migrate to v2 API#13
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: harche The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/hold will look into this after #6 |
ce82a3e to
b0a3f09
Compare
Testing ResultsEval Framework (Agent-in-the-loop)Ran 12 skill evals via the agentic-skills eval framework using Claude on Vertex AI. Test queries use the exact prompt format CVO proposals send to the agent, with real OLM operator data and readiness JSON. product-lifecycle (7 evals)
update-advisor (5 evals)Each test sends a complete readiness JSON (same format CVO produces) and verifies the agent's upgrade decision:
All 12 agent evals + 46 unit/integration tests passed. End-to-End with CVO Readiness DataTested the full proposal → readiness → skill pipeline using a custom CVO build from openshift/cluster-version-operator#1395 deployed to a live OCP 4.21.5 cluster on GCP (6 nodes). Installed OLM operators for testing:
Flow verified:
Cluster readiness summary (from proposal
Ground truth verificationAll expected values in evals were verified against the live PLC v2 API: 🤖 Generated with Claude Code |
|
/cc @wking |
7e43631 to
eace3b2
Compare
|
/hold improving evals with better data. |
f01f289 to
81bf078
Compare
|
Converting to draft, I need to make a few more changes before it is ready for the review. |
81bf078 to
0544f37
Compare
back to open. |
0544f37 to
ef3bf25
Compare
Cross-PR testingTested end-to-end with:
All three deployed together on OCP 4.21.5 cluster. The product-lifecycle skill ( The skill path fix ( |
|
/override ci/prow/eval |
|
@harche: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
| expected: | ||
| decision: "caution" | ||
| blockers_found: 0 | ||
| degraded_operator_detected: true |
There was a problem hiding this comment.
how specific do we want to make these eval checks? E.g. you currently have a boolean here. But you could raise the bar a bit for the AI engine being evaluated by requiring an array of degraded operator names. Or further by requiring an array of degraded operator names and propogation of the reason and message (OAuthFlaky and oauth pods intermittently failing for the authentication information in the fixture prompt, for example). Or you could raise the bar even higher by requiring AI to return some suggested next-step that goes beyond the information we already knew when generating the input query. Is the goal here smoke-testing, or high-fidelity spec -> status plumbing, or AI-adds-value-beyond-what-the-classical-bots-already-knew, or what?
|
|
||
| def _run_cli(self, *args): | ||
| result = subprocess.run( | ||
| [sys.executable, SCRIPT_PATH, *args], |
There was a problem hiding this comment.
nit: you're already importing plc_lookup to feed tests like self.assertEqual(plc_lookup.normalize_status("Full Support"), "supported"). We could reroll plc_lookup to replace the current main() and args = parser.parse_args() with:
def main(args=None):
# ...
args = parser.parse_args(args=args, output=sys.stdout)
# ...so this test harness can call:
output = io.StringIO()
plc_lookup.main(args=['products', 'logging for Red Hat OpenShift'], output=io)or some such, without needing to bother with SCRIPT_PATH magic or subprocess execution.
Doesn't need to happen in this pull though, or maybe ever. I'm just floating as a potential simplification that might make future maintenance easier, and we can always come back to this in the future if SCRIPT_PATH gives us trouble.
|
|
||
| def test_unknown_status(self): | ||
| self.assertEqual(plc_lookup.normalize_status("Something New"), "unknown") | ||
| self.assertEqual(plc_lookup.normalize_status(""), "unknown") |
There was a problem hiding this comment.
nit: seems like this risks us conflating two unknown statuses? Maybe normalize_status should opaquely pass through any unrecognized status strings?
| def test_case_insensitive(self): | ||
| version = SAMPLE_PRODUCT["versions"][0] | ||
| result = plc_lookup.extract_phase_date(version, "general availability") | ||
| self.assertIsNotNone(result) |
There was a problem hiding this comment.
hmm, I'd have expected the case-insensitive lookup to return the same result as the General availability test a few cases earlier. Are we asserting that phase casing is important for some reason?
|
|
||
| class TestFormatProductVersion(unittest.TestCase): | ||
| def test_without_target(self): | ||
| result = plc_lookup.format_product_version(SAMPLE_PRODUCT, SAMPLE_PRODUCT["versions"][0]) |
There was a problem hiding this comment.
nit: it's a bit strange to me to have two arguments where the later is a subset of the data in the former. It seems like we'd reduce the space for inconsistencies if we had one argument for the whole product structure, and the second argument with just the information needed to look up the version we wanted formatted (and not the entire version structure we wanted formatted). Like format_product_version(product=SAMPLE_PRODUCT, version_index=0) or format_product_version(product=SAMPLE_PRODUCT, version_name=SAMPLE_PRODUCT['versions'][0].name) or some such. No need to block merging over this, just pointing it out in case it makes sense to you, or in case someone else wants to reroll to use that approach in the future.
| items = list(range(10)) | ||
| page, meta = plc_lookup.paginate(items, limit=3, offset=3) | ||
| self.assertEqual(page, [3, 4, 5]) | ||
| self.assertEqual(meta["next_offset"], 6) |
There was a problem hiding this comment.
nit: do we want to assert total and returned here too, like we did for test_limit and earlier? Seems easy to just continue that pattern on? Or if we want to make it even harder to forget to check some aspect of the response, we can pivot all of these to self.assertEqual(meta, $FIXME_HAND_CRAFT_ENTIRE_META_DICT) so the tests fail if meta includes a new key we forgot to add test-coverage for.
| self.assertEqual(ret, 0) | ||
| self.assertEqual(output["total"], 2) | ||
| self.assertEqual(output["returned"], 2) | ||
| self.assertEqual(output["results"][0]["version"], "6.5") |
There was a problem hiding this comment.
probably worth an self.assertEqual(output['results'][0]['product'], 'logging for Red Hat OpenShift') or some such here too, to confirm that the result we're looking at is the expected SAMPLE_PRODUCT entry without having to rely on the 6.5 version. Maybe drop the 6.5 check and use:
for result in output['results']:
self.assertEqual(result['product'], 'logging for Red Hat OpenShift')`since what we're trying to excercise in this test_found case is the name-based lookup?
| self.assertEqual(ret, 0) | ||
| self.assertEqual(output["operators_checked"], 1) | ||
| self.assertEqual(output["lifecycle_unavailable"], []) | ||
| self.assertGreater(len(output["results"]), 0) |
There was a problem hiding this comment.
why assertGreater here instead of being more clear on the specifics of the results we expect? We should be able to assertEqual the entire output string, right?
| self.assertEqual(r.returncode, 0, r.stderr) | ||
| output = json.loads(r.stdout) | ||
| self.assertGreater(output["total"], 0) | ||
| self.assertEqual(output["results"][0]["package"], "cluster-logging") |
There was a problem hiding this comment.
nit: if we're allowing >0 results, shouldn't we be iterating over results to confirm that at least one of the results is cluster-logging? That way this test keeps passing if the lifecycle data picks up some other product that matches logging but which isn't the OLM-installed operator we're looking for and that other product happens to come first.
| self.assertEqual(r.returncode, 0, r.stderr) | ||
| output = json.loads(r.stdout) | ||
| names = {entry["product"] for entry in output["results"]} | ||
| self.assertTrue(any("OpenShift" in n for n in names)) |
There was a problem hiding this comment.
$ curl -s 'https://access.redhat.com/product-life-cycles/api/v2/products?name=Openshift+Container+Platform' | jq -c '.data[] | {name, former_names}'
{"name":"Red Hat OpenShift Container Platform","former_names":["OpenShift Container Platform"," OpenShift Container Platform 3"," OpenShift Container Platform 4"]}so at the moment, only the one hit. Can we assert that at least one entry has name or a former_names entry that exactly matches Red Hat OpenShift Container Platform instead of using the weaker OpenShift search? Something like:
names = []
for result in output['results']:
names.append(result.get('product'))
names.extend(results.get('former_names', [])
self.assertIn('Red Hat OpenShift Container Platform', names)or are we not preserving former_names through to the results structure?
| self.assertEqual(r.returncode, 0, r.stderr) | ||
| output = json.loads(r.stdout) | ||
| compatible = [e for e in output["results"] if e.get("ocp_compatible")] | ||
| self.assertEqual(len(compatible), 0, "No logging version should be compatible with OCP 3.11") |
There was a problem hiding this comment.
nit: slightly easier to debug this failing if we include some hints about the surprisingly-compatible versions, like:
self.assertEqual(compatible, [], 'No logging version should be compatible with OCP 3.11')| def test_subcommand_help(self): | ||
| for cmd in ["products", "olm-check"]: | ||
| r = self._run_cli(cmd, "-h") | ||
| self.assertEqual(r.returncode, 0, f"{cmd} -h failed") |
There was a problem hiding this comment.
nit: if we want to assert something useful about the content of the subcommand help, we could use something like:
self.assertIn('Max results to return (0 = all, default: all)', r.stdout)| @@ -0,0 +1,190 @@ | |||
| #!/usr/bin/env python3 | |||
There was a problem hiding this comment.
I'm not aware of a commitment from OLS that whatever image they use to back their sandbox Pod will always have python3. But 🤷 , that's what end-to-end testing is for, wherever that is happening.
|
|
||
|
|
||
| def normalize_status(raw_type): | ||
| return STATUS_MAP.get(raw_type, "unknown") |
There was a problem hiding this comment.
I'm not clear on the value of a 1:1 mapping. See here on my concerns around the unknown default vs. passing through the raw type. And checking the production API now, there doesn't seems to be much diversity like case-inconsistency to collapse:
$ curl -s 'https://access.redhat.com/product-life-cycles/api/v2/products' | jq -r '.data[].versions[].type' | sort | uniq -c | sort -n
12
43 End of Maintenance
116 Extended Support
173 Maintenance Support
263 Full Support
891 End of lifeCan we use the raw types, instead of inventing our own strings? Or are we concerned that the upstream consistency will break down over time, or that the upstream API will introduce other new strings that we want to collapse into a smaller set of our strings? Or...?
| "ga_date": extract_phase_date(version, "General availability"), | ||
| "full_support_end": extract_phase_date(version, "Full support"), | ||
| "maintenance_end": extract_phase_date(version, "Maintenance support"), | ||
| } |
There was a problem hiding this comment.
don't we need the other phases too, e.g. Extended update support and Extended update support Term 2?:
$ curl -s 'https://access.redhat.com/product-life-cycles/api/v2/products?name=fence+agents+remediation' | jq -c '.data[].versions[0].phases[]'
{"name":"General availability","start_date":"N/A","end_date":"2026-06-16T00:00:00.000Z","start_date_format":"string","end_date_format":"date","additional_text":""}
{"name":"Full support","start_date":"2026-06-16T00:00:00.000Z","end_date":"2026-12-31T00:00:00.000Z","start_date_format":"date","end_date_format":"date","additional_text":""}
{"name":"Maintenance support","start_date":"2027-01-01T00:00:00.000Z","end_date":"2027-12-31T00:00:00.000Z","start_date_format":"date","end_date_format":"date","additional_text":""}
{"name":"Extended update support","start_date":"2028-01-01T00:00:00.000Z","end_date":"2028-06-30T00:00:00.000Z","start_date_format":"date","end_date_format":"date","additional_text":""}
{"name":"Extended update support Term 2","start_date":"2028-07-01T00:00:00.000Z","end_date":"2029-06-30T00:00:00.000Z","start_date_format":"date","end_date_format":"date","additional_text":""}|
|
||
| def paginate(results, limit, offset): | ||
| total = len(results) | ||
| start = min(offset, total) |
There was a problem hiding this comment.
nit: I'm not all that clear on https://access.redhat.com/product-life-cycles/api/v2/products order commitments (do they commit to stable ordering? Do they commit to any particular order like "alphabetical by current product name"? What happens if a product gets renamed? Does the caller here really care about limiting results for pagination? Do we have a situation where we'd recommend the consuming AI agent actually paginate the results? Maybe we can just drop the pagination feature as something with unclear value, return all the results every time, and recommend in the skill Markdown that the agent try to be specific with their queries? Or maybe we can drop --offset, keep --limit, and explain in the skill that if the response shows --limit mattered, that the AI should look at the returned product names and try to use that context to try again with a more specific name for the products it was hoping to retrieve?
| # Cluster Update Advisor | ||
|
|
||
| ## Purpose | ||
| ## 1. Purpose |
| product-lifecycle skills for deeper analysis. | ||
| │ | ||
| ▼ | ||
| Produce structured risk report |
| operators = json.loads(args.operators) | ||
| target = args.ocp | ||
|
|
||
| batch = api_search("OpenShift") |
There was a problem hiding this comment.
hmm, this seems like a risky way to try and find the subset of products that includes package. For example:
$ curl -s 'https://access.redhat.com/product-life-cycles/api/v2/products' | jq -c '.data[] | select(.package) | {name, former_names, package}' | grep -v OpenShift
{"name":"AMQ broker","former_names":["Red Hat AMQ","Red Hat AMQ Broker"],"package":"amq-broker-rhel8"}
{"name":"AMQ Broker Operator (RHEL8)","former_names":[],"package":"amq-broker-rhel8"}
{"name":"AMQ Broker Operator (RHEL9)","former_names":[],"package":"amq-broker-rhel9"}
{"name":"AMQ interconnect","former_names":["Red Hat AMQ Interconnect"],"package":"amq7-interconnect-operator"}
...
$ curl -s 'https://access.redhat.com/product-life-cycles/api/v2/products' | jq -c '.data[] | select(.package) | {name, former_names, package}' | grep -v OpenShift | wc -l
122vs. only 9 with the name=OpenShift product filter:
$ curl -s 'https://access.redhat.com/product-life-cycles/api/v2/products?name=OpenShift' | jq -c '.data[] | select(.package) | {name, former_names, package}'{"name":"builds for Red Hat OpenShift","former_names":["Builds for Red Hat OpenShift"],"package":"openshift-builds-operator"}
{"name":"custom metrics autoscaler","former_names":["Custom Metric Autoscaler operator for Red Hat OpenShift"],"package":"openshift-custom-metrics-autoscaler-operator"}
{"name":"kernel module management","former_names":["Kernel Module Management operator for Red Hat OpenShift"],"package":"kernel-module-management"}
{"name":"logging for Red Hat OpenShift","former_names":["Red Hat OpenShift Logging"],"package":"cluster-logging"}
{"name":"OpenShift APIs for Data Protection","former_names":["OpenShift API for Data Protection"],"package":"redhat-oadp-operator"}
{"name":"Red Hat OpenShift AI Self-Managed","former_names":["Red Hat OpenShift Data Science self-managed"],"package":"rhods-operator"}
{"name":"Red Hat OpenShift Data Foundation","former_names":["OpenShift Container Storage 4","Red Hat OpenShift Data Foundation (formerly OpenShift Container Storage 4)"],"package":"odf-operator"}
{"name":"Red Hat OpenShift intelligent assistant","former_names":["Red Hat OpenShift Lightspeed"],"package":"lightspeed-operator"}
{"name":"Red Hat OpenShift support for Windows Containers","former_names":["Red Hat OpenShift support for Windows Containers","Red Hat OpenShift for Windows Containers"],"package":"windows-machine-config-operator"}
{"name":"web terminal operator","former_names":["Red Hat OpenShift Web Terminal Operator","Web Terminal Operator"],"package":"web-terminal"}| for p in batch: | ||
| pkg = p.get("package") | ||
| if pkg: | ||
| by_package[pkg] = p |
There was a problem hiding this comment.
I hope we aren't often moving packages from one product to another, but I'm not sure it's impossible. For compat check purposes, we probably want to preserve every product that mentions the package we're interested in. You can use collections.defaultdict for this, something like:
by_package = collections.defaultdict(list)
for p in batch:
pkg = p.get("package")
if pkg:
by_package[pkg].append(p)after which by_package.get(package_name) would give you a list of all the products claiming that package, or None.
There was a problem hiding this comment.
Ah, and looks like we do need this already, e.g.:
$ curl -s 'https://access.redhat.com/product-life-cycles/api/v2/products' | jq -r '.data[].package | select(.)' | sort | uniq -c | sort -n | tail
1 trustee-operator
1 vertical-pod-autoscaler
1 volsync-product
1 watcher-operator
1 web-terminal
1 windows-machine-config-operator
2 amq7-interconnect-operator
2 amq-broker-rhel8
2 amq-streams
2 skupper-operator
$ curl -s 'https://access.redhat.com/product-life-cycles/api/v2/products' | jq -c '.data[] | select(.package == "skupper-operator") | {name, former_names}'
{"name":"Red Hat Service Interconnect","former_names":[]}
{"name":"Red Hat Service Interconnect Operator","former_names":[]}Happily for at least Red Hat Service Interconnect*, there doesn't seem to be any difference in phase information:
$ diff -u1 <(curl -s 'https://access.redhat.com/product-life-cycles/api/v2/products?name=Red+Hat+Service+Interconnect' | jq -c '.data[] | .versions[] | .name as $v | .phases[] | {version: $v, phase: .}') <(curl -s 'https://access.redhat.com/product-life-cycles/api/v2/products?name=Red+Hat+Service+Interconnect+Operator' | jq -c '.data[] | .versions[] | .name as $v | .phases[] | {version: $v, phase: .}')I haven't checked the others.
| product = by_package.get(pkg) | ||
|
|
||
| if not product: | ||
| extra = api_search(pkg) |
There was a problem hiding this comment.
This seems like it's trying to backstop for the earlier OpenShift query missing the product(s) that ships this package. But it's not a complete guard, e.g. it fails for the skupper-operator package:
$ curl -s 'https://access.redhat.com/product-life-cycles/api/v2/products?name=skupper-operator'
{"data":[]}
$ curl -s 'https://access.redhat.com/product-life-cycles/api/v2/products?name=OpenShift' | jq -c '[.data[] | select(.package == "skupper-operator")]'
[]
$ curl -s 'https://access.redhat.com/product-life-cycles/api/v2/products' | jq -c '[.data[] | select(.package == "skupper-operator").name]'
["Red Hat Service Interconnect","Red Hat Service Interconnect Operator"]| --- | ||
| name: product-lifecycle | ||
| description: Query Red Hat Product Life Cycle data for support phases, end-of-life dates, and OpenShift version compatibility. Use when evaluating whether installed operators or layered products are supported on a given OCP version, approaching end of life, or need upgrading before a cluster upgrade. Also use when the user asks about product support status, EOL dates, or lifecycle phases for any Red Hat product. | ||
| allowed-tools: Bash(python3:*) |
There was a problem hiding this comment.
Our previous discussion ended up deciding allowed-tools was too under-specified to be worth including. Doesn't look like the spec has evolved since then.
| # Red Hat Product Life Cycle | ||
|
|
||
| Query the Red Hat Product Life Cycle API to check support status, EOL dates, and OpenShift compatibility for Red Hat products and layered operators. | ||
| Query the Red Hat Product Life Cycle API (v2) to check support status, EOL |
There was a problem hiding this comment.
nit: this skill doesn't have to care that the tool queries the v2 API, so we can probably drop the change to this line. One less place that will need bumping if we eventually pivot the tool to a v3 API or whatever.
| #### `products` — Query products by name | ||
|
|
||
| ### Check support status for a specific product version | ||
| Maps directly to `GET /v2/products?name=<name>`. |
There was a problem hiding this comment.
Internal tool detail, right? I expect dropping this line would save the AI agent a few tokens parsing the skill, and not impact its ability to use the tool at all. And I expect dropping this line will save us the effort of having to remember to bump this line if the tool behavior changes in the future.
| } | ||
| ] | ||
| } | ||
| python3 product-lifecycle/scripts/plc_lookup.py -h |
There was a problem hiding this comment.
can't we drop python3 here, set the executable bit on plc_lookup.py, and rely on the #!/usr/bin/env python3 shebang to get the script invoked by its intended interpreter? Or is there something about volume mounting the skills image into the sandbox container that messes with executable bits?
| - Current and target version metadata | ||
| - Channel and update path information | ||
| - **Cluster readiness JSON** — cluster health checks with context relevant to preparing for the update | ||
| - **Cluster readiness JSON** — pre-collected by CVO with results from 9 parallel checks |
There was a problem hiding this comment.
This is restoring the commitment to explicit CVO behavior (calling for 9 exact checks) that we'd dropped after this earlier thread. Is my argument there making less sense now than it did then? 😅
| "total_checks": 9, | ||
| "checks_ok": 9, | ||
| "checks_errored": 0, | ||
| "elapsed_seconds": 0.65 |
There was a problem hiding this comment.
This is restoring the room for inconsistent claims that we'd dropped after this earlier thread. Is my argument there making less sense now than it did then? 😅
| check-specific data with a `summary` section for quick parsing. | ||
|
|
||
| ## Evaluation | ||
| ### What the checks cover |
There was a problem hiding this comment.
This is restoring the section committing to CVO behavior that seems hard to maintain, and which we'd dropped after this earlier thread. Is my argument there making less sense now than it did then? 😅
| Any check with `_status` `error` represents a gap in visibility. | ||
| Note incomplete areas — they reduce confidence. | ||
| - **`prometheus`** — if `etcd_health` shows degraded conditions, query | ||
| `etcd_disk_backend_commit_duration_seconds` for trends |
There was a problem hiding this comment.
We'd dropped this section after this earlier thread, and also have the Investigate with other skills section a bit further down that you're currently dropping. Why pivot back to the old approach?
Summary
plc_lookup.py) wrapping the Red Hat Product Life Cycle API v2--limit,--offset) for broad queriesBuilds on top of #6 (that PR stays open).
CLI usage
Test plan
python3 -m unittest— 19 unit + 17 integration against live API + 10 pagination)plc_lookup.py -hshows clean help output🤖 Generated with Claude Code