feat: surface init container failures, plugin compatibility, and vers…#13
Conversation
|
Warning Review limit reached
More reviews will be available in 45 minutes and 56 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (9)
📝 WalkthroughWalkthroughAdds three safety and observability enhancements to the Claw operator: plugin CalVer compatibility checking that gates init-container installation and sets a ChangesClaw Operator Safety Checks
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/controller/claw_plugins.go`:
- Around line 243-275: The compareCalver function silently returns 0 when
parsing version parts fails via strconv.Atoi, treating malformed version strings
as equal and suppressing important compatibility warnings. Modify the
compareCalver function to return an error as a second return value instead of
returning 0 on parse failures. When strconv.Atoi fails for either aParts or
bParts, return the error so callers can properly handle and surface "cannot
compare" diagnostics for invalid version strings. Update all call sites of
compareCalver to check the error return value and propagate these diagnostics
appropriately to prevent suppressing plugin compatibility and downgrade
warnings.
In `@internal/controller/claw_resource_controller.go`:
- Around line 897-914: The plugin compatibility check is incorrectly nested
inside the conditional block that gates plugin auto-configuration. Move the
checkPluginCompatibility call and its associated condition-setting logic outside
the if/else block that checks userManagedConfig(instance) and
pluginInstallationDisabled(instance), so compatibility warnings are evaluated
regardless of plugin installation mode. Keep the configurePluginsInitContainer
call and effectivePlugins logic inside the original conditional block since
those should only execute when plugins are being auto-configured.
In `@internal/controller/claw_status.go`:
- Around line 205-222: Error handling in the initialization failure inspection
is swallowing all errors by returning empty strings indiscriminately. The
r.Get() call for the deployment, the metav1.LabelSelectorAsSelector() call, and
the r.List() call for pods all return empty string on error, making it
impossible to distinguish between "no init failure found" and "inspection
failed". Modify these error cases to return an error value or use a tuple return
type (string, error) so that real inspection failures are properly propagated
rather than masked. This issue occurs at the primary location and also applies
to similar error handling at another location in the file.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: c31162a9-cb88-412c-868b-670ce2e0e10b
📒 Files selected for processing (9)
api/v1alpha1/claw_types.goconfig/crd/bases/claw.sandbox.redhat.com_claws.yamlconfig/rbac/role.yamlinternal/controller/claw_plugins.gointernal/controller/claw_plugins_test.gointernal/controller/claw_providers.gointernal/controller/claw_resource_controller.gointernal/controller/claw_status.gointernal/controller/claw_status_test.go
|
I built and redeployed the operator with these changes and tried to test each warning only feature the operator adds and here is what I found: Feature 1: Init Container Failure Surfacing — WORKS Feature 2: Plugin Compatibility Check — WORKS Feature 3: Version Downgrade Detection — BUG |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
internal/controller/claw_status.go (1)
368-375: ⚡ Quick winConsider validating the new version before updating the high-water mark.
When
compareCalverreturns!cmpTrackOK, it could mean either the old or new version is malformed. The current code updatesLastDeployedVersionin this case, which could overwrite a valid old version with a malformed new version.Scenario: Deploy
"2026.6.8"→ Deploy"notaversion"→ Deploy"2026.6.5". After step 2,LastDeployedVersionbecomes malformed, preventing downgrade detection in step 3.Since this is a warning-only feature and malformed versions are rare, the current fail-open behavior is acceptable. However, validating that
Spec.Versionis valid CalVer before updating would make downgrade detection more robust:// Validate new version is parseable before updating high-water mark if _, newValid := compareCalver(instance.Spec.Version, instance.Spec.Version); newValid { cmpTrack, cmpTrackOK := compareCalver(instance.Spec.Version, instance.Status.LastDeployedVersion) if instance.Status.LastDeployedVersion == "" || !cmpTrackOK || cmpTrack >= 0 { instance.Status.LastDeployedVersion = instance.Spec.Version } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/controller/claw_status.go` around lines 368 - 375, The code updates LastDeployedVersion without validating that the new version is valid CalVer, which could corrupt the high-water mark if a malformed version is deployed. Add a validation check before the existing logic: call compareCalver with instance.Spec.Version against itself to verify it parses correctly, and only proceed with the high-water mark update logic if this validation succeeds (the second return value is true). This ensures that only valid CalVer versions can overwrite LastDeployedVersion, making downgrade detection more robust.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@internal/controller/claw_status.go`:
- Around line 368-375: The code updates LastDeployedVersion without validating
that the new version is valid CalVer, which could corrupt the high-water mark if
a malformed version is deployed. Add a validation check before the existing
logic: call compareCalver with instance.Spec.Version against itself to verify it
parses correctly, and only proceed with the high-water mark update logic if this
validation succeeds (the second return value is true). This ensures that only
valid CalVer versions can overwrite LastDeployedVersion, making downgrade
detection more robust.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 6364ca6f-1cad-4df9-a29d-92dc9f20609b
📒 Files selected for processing (5)
internal/controller/claw_plugins.gointernal/controller/claw_plugins_test.gointernal/controller/claw_resource_controller.gointernal/controller/claw_status.gointernal/controller/claw_status_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/controller/claw_plugins_test.go
- internal/controller/claw_resource_controller.go
|
@IsaiahStapleton ready for round 2 if you have the cycles |
…ion downgrade warnings When a Claw deployment fails due to init container crashes (e.g., a plugin requiring a newer OpenClaw version than spec.version), the operator previously reported only "Waiting for deployments to become ready" — forcing users to manually dig through pod logs to find the root cause. This was observed in production when spec.version 2026.6.5 was deployed against a PVC configured for 2026.6.8, causing the @openclaw/anthropic-vertex-provider plugin's init-plugins container to crash-loop with no actionable status on the CR. This commit adds three warning-only status conditions: 1. Ready condition enrichment: when deployments are not ready, the operator inspects pods for init container failures (non-zero exit or CrashLoopBackOff) and surfaces the actual error message in the Ready condition with reason InitContainerFailure. 2. PluginCompatibility condition: when spec.version is older than a plugin's minimum required version (declared via PluginMinVersion in knownProviders), a warning condition is set. This does not block deployment — users may have compatible plugins cached on PVC. 3. VersionDowngrade condition: when spec.version is older than status.lastDeployedVersion, a warning is set about potential PVC data incompatibility. Downgrades remain fully permitted. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- compareCalver returns (int, bool) so callers can distinguish malformed versions from equal ones instead of silently treating parse failures as "equal" - Plugin compatibility check moved outside the plugin-installation gate so warnings surface regardless of config management mode - checkPodInitFailures returns (string, error) so inspection failures are logged rather than silently swallowed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…warning persists The VersionDowngrade condition was being cleared on the next reconcile because LastDeployedVersion was unconditionally overwritten to the downgraded version when deployments became ready. Now it only updates upward, preserving the previous version for comparison. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
5148225 to
f086d4d
Compare
|
I tested again, and everything seems to be working now. I tested PluginCompatability, InitContainerFailure, and VersionDowngrade are all showing up under status.conditions when the conditions are met. LGTM Since you asked about recording some of the things I looked for so you can write automated tests:
|
…ion downgrade warnings
When a Claw deployment fails due to init container crashes (e.g., a plugin requiring a newer OpenClaw version than spec.version), the operator previously reported only "Waiting for deployments to become ready" — forcing users to manually dig through pod logs to find the root cause. This was observed in production when spec.version 2026.6.5 was deployed against a PVC configured for 2026.6.8, causing the @openclaw/anthropic-vertex-provider plugin's init-plugins container to crash-loop with no actionable status on the CR.
This commit adds three warning-only status conditions:
Ready condition enrichment: when deployments are not ready, the operator inspects pods for init container failures (non-zero exit or CrashLoopBackOff) and surfaces the actual error message in the Ready condition with reason InitContainerFailure.
PluginCompatibility condition: when spec.version is older than a plugin's minimum required version (declared via PluginMinVersion in knownProviders), a warning condition is set. This does not block deployment — users may have compatible plugins cached on PVC.
VersionDowngrade condition: when spec.version is older than status.lastDeployedVersion, a warning is set about potential PVC data incompatibility. Downgrades remain fully permitted.
Summary by CodeRabbit
lastDeployedVersionto help identify potentially unsafe version downgrades.