feat: surface init container failures, plugin compatibility, and vers… by cooktheryan · Pull Request #13 · redhat-et/claw-operator

cooktheryan · 2026-06-16T17:43:01Z

…ion downgrade warnings

When a Claw deployment fails due to init container crashes (e.g., a plugin requiring a newer OpenClaw version than spec.version), the operator previously reported only "Waiting for deployments to become ready" — forcing users to manually dig through pod logs to find the root cause. This was observed in production when spec.version 2026.6.5 was deployed against a PVC configured for 2026.6.8, causing the @openclaw/anthropic-vertex-provider plugin's init-plugins container to crash-loop with no actionable status on the CR.

This commit adds three warning-only status conditions:

1. Ready condition enrichment: when deployments are not ready, the operator inspects pods for init container failures (non-zero exit or CrashLoopBackOff) and surfaces the actual error message in the Ready condition with reason InitContainerFailure.
2. PluginCompatibility condition: when spec.version is older than a plugin's minimum required version (declared via PluginMinVersion in knownProviders), a warning condition is set. This does not block deployment, because users may have compatible plugins cached on the PVC.
3. VersionDowngrade condition: when spec.version is older than status.lastDeployedVersion, a warning is set about potential PVC data incompatibility. Downgrades remain fully permitted.
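A minimal sketch of the first check follows. It assumes simplified, hypothetical stand-ins for the Kubernetes container-status types (the operator itself works with `k8s.io/api/core/v1` in `checkPodInitFailures`). The sketch treats an init container as failed when it terminated with a non-zero exit code or is waiting in CrashLoopBackOff, and it appends the last termination's details when they are available. The container name `init-config` and the error text are illustrative only.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the corev1.ContainerStatus fields the
// check needs (State.Waiting, State.Terminated, LastTerminationState.Terminated).
type terminatedState struct {
	ExitCode int32
	Message  string
}

type waitingState struct {
	Reason string
}

type containerStatus struct {
	Name           string
	Waiting        *waitingState    // current state: waiting
	Terminated     *terminatedState // current state: terminated
	LastTerminated *terminatedState // previous termination, if any
}

// initFailureMessage describes the first failed init container, or returns ""
// when none has failed. Failure means a non-zero exit code or CrashLoopBackOff.
func initFailureMessage(statuses []containerStatus) string {
	for _, s := range statuses {
		if t := s.Terminated; t != nil && t.ExitCode != 0 {
			return fmt.Sprintf("init container %q exited with code %d: %s", s.Name, t.ExitCode, t.Message)
		}
		if w := s.Waiting; w != nil && w.Reason == "CrashLoopBackOff" {
			msg := fmt.Sprintf("init container %q is in CrashLoopBackOff", s.Name)
			// The last termination usually carries the actual error output.
			if lt := s.LastTerminated; lt != nil && lt.ExitCode != 0 {
				msg += fmt.Sprintf(" (last exit code %d: %s)", lt.ExitCode, lt.Message)
			}
			return msg
		}
	}
	return ""
}

func main() {
	statuses := []containerStatus{
		{Name: "init-config", Terminated: &terminatedState{ExitCode: 0}},
		{
			Name:           "init-plugins",
			Waiting:        &waitingState{Reason: "CrashLoopBackOff"},
			LastTerminated: &terminatedState{ExitCode: 1, Message: "example error output"},
		},
	}
	fmt.Println(initFailureMessage(statuses))
}
```

Returning the first failure keeps the Ready message short. A real implementation might also cap the message length, since container termination messages can be long.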

Summary by CodeRabbit

New Features
- Added new Claw status conditions for plugin compatibility and version downgrade detection.
- Extended status reporting with lastDeployedVersion to help identify potentially unsafe version downgrades.
- Improved non-ready status messaging by surfacing init-container failure details on the Ready condition.
Tests
- Added unit tests covering version comparison, plugin compatibility checks, init failure surfacing, and downgrade behavior.
Chores
- Updated controller permissions to read Pod details needed for init failure detection.

coderabbitai · 2026-06-16T17:43:22Z

Warning

Review limit reached

@cooktheryan, we couldn't start this review because you've reached your PR review rate limit.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: fe4c17b5-32bc-4e1c-8fd0-726bba8b72eb

📥 Commits

Reviewing files that changed from the base of the PR and between 5148225 and f086d4d.

📒 Files selected for processing (9)

api/v1alpha1/claw_types.go
config/crd/bases/claw.sandbox.redhat.com_claws.yaml
config/rbac/role.yaml
internal/controller/claw_plugins.go
internal/controller/claw_plugins_test.go
internal/controller/claw_providers.go
internal/controller/claw_resource_controller.go
internal/controller/claw_status.go
internal/controller/claw_status_test.go

📝 Walkthrough

Walkthrough

Adds three safety and observability enhancements to the Claw operator: warning-only plugin CalVer compatibility checking that sets a PluginCompatibility condition; version downgrade detection using a new LastDeployedVersion status field that sets a VersionDowngrade condition; and init-container failure surfacing that enriches the Ready condition with pod termination details.

Changes

Claw Operator Safety Checks

| Layer | File(s) | Summary |
|---|---|---|
| API types and CRD schema | `api/v1alpha1/claw_types.go`, `config/crd/bases/claw.sandbox.redhat.com_claws.yaml` | Adds `ConditionTypePluginCompatibility` and `ConditionTypeVersionDowngrade` condition type constants, three new `ConditionReason` constants (`Incompatible`, `VersionDowngrade`, `InitContainerFailure`), and the `LastDeployedVersion` field on `ClawStatus`. The CRD YAML gains the corresponding `status.lastDeployedVersion` schema entry. |
| CalVer version comparison utilities | `internal/controller/claw_plugins.go`, `internal/controller/claw_plugins_test.go` | Implements `compareCalver` for segment-by-segment numeric version comparison, returning `-1`, `0`, or `1` plus a validity flag that is false when either input fails to parse. Tests cover equality, ordering, year/segment differences, and malformed/empty inputs. |
| Plugin compatibility check logic | `internal/controller/claw_plugins.go`, `internal/controller/claw_plugins_test.go` | Implements `checkPluginCompatibility` to validate `spec.version` against the minimum version (`PluginMinVersion`) implicitly required by the Vertex plugin, returning a compatibility warning when `spec.version` is too low, or a "cannot check" warning when `spec.version` is non-empty but not valid CalVer. Subtests cover Vertex and non-Vertex credentials, compatible and incompatible versions, and edge cases. |
| Provider version requirements | `internal/controller/claw_providers.go` | Extends the `providerDefaults` struct with a `PluginMinVersion` field and configures the anthropic provider to require OpenClaw 2026.6.8 for Vertex plugin support. |
| Status management: init failures and version tracking | `internal/controller/claw_status.go`, `internal/controller/claw_status_test.go` | Implements `checkPodInitFailures` to inspect the pods of not-yet-ready deployments for non-zero init container exit codes or `CrashLoopBackOff` waiting states, and `setReadyConditionWithDetail` to enrich the Ready condition with those failure details using the `InitContainerFailure` reason. Implements version downgrade detection by comparing `spec.version` against `status.LastDeployedVersion`, setting the `VersionDowngrade` condition on a downgrade and removing it on an upgrade; `LastDeployedVersion` is maintained as a high-water mark once the deployment is Ready. Tests cover init failure surfacing, downgrade/upgrade detection, and condition persistence. |
| Controller integration and RBAC | `internal/controller/claw_resource_controller.go`, `config/rbac/role.yaml` | Wires `checkPluginCompatibility` into `configureDeployments` to set or remove the `PluginCompatibility` status condition based on version warnings. Grants the controller `get` and `list` on the `pods` resource so it can inspect init containers. |
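The `compareCalver` contract described above (revised in a follow-up commit to return `(int, bool)`) can be sketched as follows. This is an illustration, not the operator's code. Padding missing trailing segments with zero is an assumption of this sketch, and `parseCalver` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// compareCalver compares two dot-separated CalVer strings such as "2026.6.8"
// segment by segment, numerically. It returns -1, 0 or 1 (a < b, a == b,
// a > b) and ok=false when either string is empty or has a non-numeric
// segment, so callers can tell "malformed" apart from "equal".
// Missing trailing segments count as 0 (an assumption of this sketch).
func compareCalver(a, b string) (int, bool) {
	pa, okA := parseCalver(a)
	pb, okB := parseCalver(b)
	if !okA || !okB {
		return 0, false
	}
	for i := 0; i < len(pa) || i < len(pb); i++ {
		var x, y int
		if i < len(pa) {
			x = pa[i]
		}
		if i < len(pb) {
			y = pb[i]
		}
		switch {
		case x < y:
			return -1, true
		case x > y:
			return 1, true
		}
	}
	return 0, true
}

// parseCalver splits v on "." and converts each segment to a non-negative int.
func parseCalver(v string) ([]int, bool) {
	if v == "" {
		return nil, false
	}
	parts := strings.Split(v, ".")
	nums := make([]int, len(parts))
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return nil, false
		}
		nums[i] = n
	}
	return nums, true
}

func main() {
	fmt.Println(compareCalver("2026.6.5", "2026.6.8"))  // -1 true
	fmt.Println(compareCalver("2026.6.10", "2026.6.8")) // 1 true
	fmt.Println(compareCalver("latest", "2026.6.8"))    // 0 false
}
```

Comparing segments as integers rather than as strings matters here. Compared lexically, "2026.6.10" would sort before "2026.6.8".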

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

redhat-et/claw-operator#7: Modifies configureDeployments and pluginInstallationDisabled gating that this PR directly extends with the plugin compatibility check.
redhat-et/claw-operator#1: Introduces spec.version and its image-tag override, which is the field this PR reads for both checkPluginCompatibility and VersionDowngrade detection.

Suggested reviewers

sallyom
pavelanni

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title is truncated mid-word ('vers…'), making it unclear what the final feature is and preventing full understanding of the changeset scope. | Complete the title to fully convey all three features: surface init container failures, plugin compatibility, and version downgrade detection. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/controller/claw_plugins.go`:
- Around line 243-275: The compareCalver function silently returns 0 when
parsing version parts fails via strconv.Atoi, treating malformed version strings
as equal and suppressing important compatibility warnings. Modify the
compareCalver function to return an error as a second return value instead of
returning 0 on parse failures. When strconv.Atoi fails for either aParts or
bParts, return the error so callers can properly handle and surface "cannot
compare" diagnostics for invalid version strings. Update all call sites of
compareCalver to check the error return value and propagate these diagnostics
appropriately to prevent suppressing plugin compatibility and downgrade
warnings.

In `@internal/controller/claw_resource_controller.go`:
- Around line 897-914: The plugin compatibility check is incorrectly nested
inside the conditional block that gates plugin auto-configuration. Move the
checkPluginCompatibility call and its associated condition-setting logic outside
the if/else block that checks userManagedConfig(instance) and
pluginInstallationDisabled(instance), so compatibility warnings are evaluated
regardless of plugin installation mode. Keep the configurePluginsInitContainer
call and effectivePlugins logic inside the original conditional block since
those should only execute when plugins are being auto-configured.

In `@internal/controller/claw_status.go`:
- Around line 205-222: Error handling in the initialization failure inspection
is swallowing all errors by returning empty strings indiscriminately. The
r.Get() call for the deployment, the metav1.LabelSelectorAsSelector() call, and
the r.List() call for pods all return empty string on error, making it
impossible to distinguish between "no init failure found" and "inspection
failed". Modify these error cases to return an error value or use a tuple return
type (string, error) so that real inspection failures are properly propagated
rather than masked. This issue occurs at the primary location and also applies
to similar error handling at another location in the file.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: c31162a9-cb88-412c-868b-670ce2e0e10b

📥 Commits

Reviewing files that changed from the base of the PR and between f8c18ad and e3e5f16.

📒 Files selected for processing (9)

api/v1alpha1/claw_types.go
config/crd/bases/claw.sandbox.redhat.com_claws.yaml
config/rbac/role.yaml
internal/controller/claw_plugins.go
internal/controller/claw_plugins_test.go
internal/controller/claw_providers.go
internal/controller/claw_resource_controller.go
internal/controller/claw_status.go
internal/controller/claw_status_test.go

IsaiahStapleton · 2026-06-16T23:14:41Z

I built and redeployed the operator with these changes and tested each warning-only feature it adds. Here is what I found:

Feature 1: Init Container Failure Surfacing — WORKS
Bogus plugin caused init-plugins to crash. Ready condition
correctly showed reason: InitContainerFailure with the
CrashLoopBackOff details instead of generic Provisioning.

Feature 2: Plugin Compatibility Check — WORKS
Setting spec.version: "2026.6.5" with an Anthropic Vertex
credential correctly set PluginCompatibility: Incompatible
warning about requiring >= 2026.6.8.

Feature 3: Version Downgrade Detection — BUG
Set spec.version: "2026.6.8", waited for ready, then
downgraded to "2026.6.1". No VersionDowngrade condition
appeared — lastDeployedVersion was overwritten to the
downgraded version despite the `if ready` guard, so the
comparison never detected a downgrade.

coderabbitai

🧹 Nitpick comments (1)

internal/controller/claw_status.go (1)
368-375: ⚡ Quick win

Consider validating the new version before updating the high-water mark.

When compareCalver returns !cmpTrackOK, it could mean either the old or new version is malformed. The current code updates LastDeployedVersion in this case, which could overwrite a valid old version with a malformed new version.

Scenario: Deploy "2026.6.8" → Deploy "notaversion" → Deploy "2026.6.5". After step 2, LastDeployedVersion becomes malformed, preventing downgrade detection in step 3.

Since this is a warning-only feature and malformed versions are rare, the current fail-open behavior is acceptable. However, validating that Spec.Version is valid CalVer before updating would make downgrade detection more robust:
```go
// Validate new version is parseable before updating high-water mark
if _, newValid := compareCalver(instance.Spec.Version, instance.Spec.Version); newValid {
    cmpTrack, cmpTrackOK := compareCalver(instance.Spec.Version, instance.Status.LastDeployedVersion)
    if instance.Status.LastDeployedVersion == "" || !cmpTrackOK || cmpTrack >= 0 {
        instance.Status.LastDeployedVersion = instance.Spec.Version
    }
}
```
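To make the nitpick's three-step scenario concrete, the sketch below replays "2026.6.8" → "notaversion" → "2026.6.5" against both update rules. It assumes that every step reaches Ready and that each reconcile compares against the previously recorded mark before updating it. `updateFailOpen`, `updateValidated`, and `isDowngrade` are hypothetical helpers, and `compareCalver` is a compact stand-in with the `(int, bool)` contract, not the operator's implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// compareCalver: -1/0/1, with ok=false when either input is malformed.
func compareCalver(a, b string) (int, bool) {
	pa, okA := parseCalver(a)
	pb, okB := parseCalver(b)
	if !okA || !okB {
		return 0, false
	}
	for i := 0; i < len(pa) || i < len(pb); i++ {
		if x, y := segment(pa, i), segment(pb, i); x != y {
			if x < y {
				return -1, true
			}
			return 1, true
		}
	}
	return 0, true
}

func parseCalver(v string) ([]int, bool) {
	if v == "" {
		return nil, false
	}
	var out []int
	for _, p := range strings.Split(v, ".") {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return nil, false
		}
		out = append(out, n)
	}
	return out, true
}

func segment(s []int, i int) int {
	if i < len(s) {
		return s[i]
	}
	return 0
}

// updateFailOpen follows the rule the nitpick describes: replace the mark when
// it is empty, when the comparison fails, or when spec is not older.
func updateFailOpen(last, spec string) string {
	cmp, ok := compareCalver(spec, last)
	if last == "" || !ok || cmp >= 0 {
		return spec
	}
	return last
}

// updateValidated adds the suggested guard: a spec.version that does not parse
// as CalVer never replaces the mark.
func updateValidated(last, spec string) string {
	if _, valid := compareCalver(spec, spec); !valid {
		return last
	}
	return updateFailOpen(last, spec)
}

// isDowngrade reports whether spec is strictly older than the recorded mark.
func isDowngrade(last, spec string) bool {
	cmp, ok := compareCalver(spec, last)
	return ok && cmp < 0
}

func main() {
	steps := []string{"2026.6.8", "notaversion", "2026.6.5"}
	rules := []struct {
		name   string
		update func(last, spec string) string
	}{
		{"fail-open", updateFailOpen},
		{"validated", updateValidated},
	}
	for _, r := range rules {
		last := ""
		for _, spec := range steps {
			down := isDowngrade(last, spec) // compare against the previous mark
			last = r.update(last, spec)     // then update it (deployment is Ready)
			fmt.Printf("%s: spec=%s downgrade=%v mark=%s\n", r.name, spec, down, last)
		}
	}
}
```

With the fail-open rule, the malformed version becomes the mark and the later downgrade goes unnoticed. With the validation guard, the mark stays at 2026.6.8 and the downgrade to 2026.6.5 is flagged.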
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/controller/claw_status.go` around lines 368 - 375, The code updates
LastDeployedVersion without validating that the new version is valid CalVer,
which could corrupt the high-water mark if a malformed version is deployed. Add
a validation check before the existing logic: call compareCalver with
instance.Spec.Version against itself to verify it parses correctly, and only
proceed with the high-water mark update logic if this validation succeeds (the
second return value is true). This ensures that only valid CalVer versions can
overwrite LastDeployedVersion, making downgrade detection more robust.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 6364ca6f-1cad-4df9-a29d-92dc9f20609b

📥 Commits

Reviewing files that changed from the base of the PR and between e3e5f16 and 5e45a1e.

📒 Files selected for processing (5)

internal/controller/claw_plugins.go
internal/controller/claw_plugins_test.go
internal/controller/claw_resource_controller.go
internal/controller/claw_status.go
internal/controller/claw_status_test.go

🚧 Files skipped from review as they are similar to previous changes (2)

internal/controller/claw_plugins_test.go
internal/controller/claw_resource_controller.go

cooktheryan · 2026-06-17T12:53:14Z

@IsaiahStapleton ready for round 2 if you have the cycles


- compareCalver returns (int, bool) so callers can distinguish malformed versions from equal ones instead of silently treating parse failures as "equal"
- Plugin compatibility check moved outside the plugin-installation gate so warnings surface regardless of config management mode
- checkPodInitFailures returns (string, error) so inspection failures are logged rather than silently swallowed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…warning persists

The VersionDowngrade condition was being cleared on the next reconcile because LastDeployedVersion was unconditionally overwritten to the downgraded version when deployments became ready. Now it only updates upward, preserving the previous version for comparison.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

IsaiahStapleton · 2026-06-18T18:47:30Z

I tested again, and everything seems to be working now. PluginCompatibility, InitContainerFailure (as the Ready condition's reason), and VersionDowngrade all show up under status.conditions when their triggering conditions are met.

LGTM

Since you asked about recording some of the things I looked for so you can write automated tests:

For PluginCompatibility, I patched the existing Claw CR with spec.version: "2026.6.1" (which is older than the PluginMinVersion of 2026.6.8 for the Anthropic Vertex plugin). The CR already had a GCP-type Anthropic credential. Immediately after reconcile, the condition appeared: PluginCompatibility=False, Reason=Incompatible. After patching the version up to 2026.6.8, the condition cleared.
For InitContainerFailure, using that same CR at spec.version: "2026.6.1", the init-plugins container failed because the plugin requires API >= 2026.6.8. Once the pod entered CrashLoopBackOff and the operator re-reconciled, the Ready condition changed from Reason=Provisioning to Reason=InitContainerFailure with a message containing the container name and failure details. After upgrading to 2026.6.8 and waiting for the pod to start, Ready went back to True with Reason=Ready.
For VersionDowngrade, I first deployed at spec.version: "2026.6.8" and waited for Ready=True, which caused status.lastDeployedVersion to be recorded as 2026.6.8. Then I patched spec.version down to "2026.6.5". The condition appeared: VersionDowngrade=True, Reason=VersionDowngrade, with a message referencing both versions. lastDeployedVersion stayed at 2026.6.8 (high-water mark, not overwritten by the downgrade).

coderabbitai Bot reviewed Jun 16, 2026


Comment thread internal/controller/claw_plugins.go Outdated

Comment thread internal/controller/claw_resource_controller.go Outdated

Comment thread internal/controller/claw_status.go

coderabbitai Bot reviewed Jun 17, 2026


cooktheryan and others added 3 commits June 18, 2026 12:19

cooktheryan force-pushed the feat/surface-deployment-failures branch from 5148225 to f086d4d June 18, 2026 16:24

cooktheryan requested a review from IsaiahStapleton June 18, 2026 16:32

IsaiahStapleton approved these changes Jun 18, 2026


cooktheryan merged commit f13825d into main Jun 18, 2026
6 checks passed

cooktheryan mentioned this pull request Jun 18, 2026

ci: add VersionDowngrade and PluginCompatibility validation to kind test #21 (Closed, 2 tasks)
