feat: restructure spec.skills with OCI image and ConfigMap skill delivery#20
feat: restructure spec.skills with OCI image and ConfigMap skill delivery#20cooktheryan wants to merge 3 commits into
Conversation
📝 WalkthroughWalkthroughThe PR replaces ChangesMulti-source skill delivery
CI branch trigger updates
Sequence Diagram(s)sequenceDiagram
rect rgba(70, 130, 180, 0.5)
note over Reconcile,injectSkillConfigMapFiles: Validation & Enrichment Phase
end
participant Reconcile
participant validateSkills
participant enrichConfigAndNetworkPolicy
participant injectSkillConfigMapFiles
participant configureSkillImages
participant APIServer
Reconcile->>validateSkills: validate Skills (content, images, configMaps)
validateSkills-->>Reconcile: collision/name error → set Ready=False, return
Reconcile->>enrichConfigAndNetworkPolicy: ctx + instance + gatewayConfigMap
enrichConfigAndNetworkPolicy->>injectSkillConfigMapFiles: ctx, instance, gatewayConfigMap
injectSkillConfigMapFiles->>APIServer: Get each referenced ConfigMap
APIServer-->>injectSkillConfigMapFiles: ConfigMap data
injectSkillConfigMapFiles-->>enrichConfigAndNetworkPolicy: write _skill_ keys or error
enrichConfigAndNetworkPolicy-->>Reconcile: enriched objects
Reconcile->>configureSkillImages: objects + instance.Spec.Skills.Images
configureSkillImages-->>Reconcile: Deployment volumes/mounts patched or error
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
internal/controller/claw_workspace.go (1)
292-313: ⚡ Quick winConsider consolidating duplicate validation logic.
validateSkillNameduplicates the exact same checks found invalidateSkillNames(lines 82-96) andvalidateSkillImages(lines 135-152). If validation rules change, three places need updating.♻️ Suggested refactor
Refactor
validateSkillNamesto delegate tovalidateSkillName:func validateSkillNames(skills map[string]string) error { for name := range skills { - if name == "" { - return fmt.Errorf("skill name must not be empty") - } - if name == "." || name == ".." { - return fmt.Errorf("skill name %q is invalid: must not be %q", name, name) - } - if strings.Contains(name, "/") { - return fmt.Errorf("skill name %q is invalid: must not contain \"/\"", name) - } - if strings.Contains(name, pathSeparatorCode) { - return fmt.Errorf("skill name %q is invalid: must not contain %q (reserved as path encoding delimiter)", name, pathSeparatorCode) - } - if builtinSkillNames[name] { - return fmt.Errorf("skill name %q conflicts with builtin operator skill", name) + if err := validateSkillName(name); err != nil { + return err } } return nil }Similarly,
validateSkillImagescould callvalidateSkillName(si.Name)instead of repeating the same checks inline.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/controller/claw_workspace.go` around lines 292 - 313, The validateSkillName function duplicates validation checks that are already performed in validateSkillNames and validateSkillImages functions. Refactor both validateSkillNames and validateSkillImages to delegate to validateSkillName instead of repeating the validation logic inline. This means removing the redundant checks from validateSkillNames (around lines 82-96) and validateSkillImages (around lines 135-152) and instead calling validateSkillName for each skill name being validated, ensuring a single source of truth for validation rules.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/v1alpha1/claw_types.go`:
- Around line 752-755: The change to the Skills field modifies its type from
map[string]string to *SkillsSpec within the same v1alpha1 API version, which
will silently drop existing map-shaped data during upgrades since unknown fields
no longer match the new schema. To fix this, establish an explicit migration
path by either creating a new API version (such as v1alpha2) with proper
conversion functions that transform the old map format to the new SkillsSpec
structure, or provide a pre-upgrade migration tool that rewrites existing CRs to
the new format before rolling out this change. This ensures existing resources
are preserved and properly migrated rather than losing data.
In `@internal/controller/claw_workspace.go`:
- Around line 170-171: The comment for the validateSkillConfigMaps function is
misleading because it claims the function checks for conflicts with "other skill
sources," but it actually only validates that ConfigMap refs have non-empty
names and no duplicate refs. Update the comment to accurately describe what
validateSkillConfigMaps actually does by removing the reference to conflict
checking with other skill sources, since that validation happens later in the
injectSkillConfigMapFiles function at injection time, not in this validation
function.
In `@test/e2e/e2e_test.go`:
- Around line 829-838: Replace the direct call to wait.PollUntilContextTimeout
with the standard waitFor helper function to ensure consistent async timing
behavior across tests. The waitFor helper should be invoked with parameters: t
(test object), 10 second timeout, 250 millisecond poll interval, an anonymous
condition function that executes the kubectl command to check if the
ProxyConfigured status equals conditionTrue, and the error message "Claw
ProxyConfigured did not become True". This ensures the test follows the
established coding guidelines for async operations.
---
Nitpick comments:
In `@internal/controller/claw_workspace.go`:
- Around line 292-313: The validateSkillName function duplicates validation
checks that are already performed in validateSkillNames and validateSkillImages
functions. Refactor both validateSkillNames and validateSkillImages to delegate
to validateSkillName instead of repeating the validation logic inline. This
means removing the redundant checks from validateSkillNames (around lines 82-96)
and validateSkillImages (around lines 135-152) and instead calling
validateSkillName for each skill name being validated, ensuring a single source
of truth for validation rules.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: aa36579a-4525-475e-8f06-70dc5e971799
📒 Files selected for processing (16)
.github/workflows/lint-bundle.yml.github/workflows/lint.yml.github/workflows/test-e2e.yml.github/workflows/test.ymlapi/v1alpha1/claw_types.goapi/v1alpha1/zz_generated.deepcopy.goconfig/crd/bases/claw.sandbox.redhat.com_claws.yamlinternal/controller/claw_deployment.gointernal/controller/claw_resource_controller.gointernal/controller/claw_skill_images_test.gointernal/controller/claw_workspace.gointernal/controller/claw_workspace_test.gotest/e2e/e2e_suite_test.gotest/e2e/e2e_test.gotest/e2e/testdata/skill-image/Containerfiletest/e2e/testdata/skill-image/SKILL.md
| // Skills configures skill delivery: inline content, OCI images, and ConfigMap refs. | ||
| // +optional | ||
| Skills map[string]string `json:"skills,omitempty"` | ||
| Skills *SkillsSpec `json:"skills,omitempty"` | ||
|
|
There was a problem hiding this comment.
Prevent silent upgrade breakage for existing spec.skills resources.
Changing spec.skills from map[string]string to *SkillsSpec in the same v1alpha1 API version risks dropping legacy map-shaped data after upgrade (unknown fields no longer match the new schema). Please add an explicit migration path (for example, versioned API conversion or a pre-upgrade rewrite of existing CRs) before rollout.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@api/v1alpha1/claw_types.go` around lines 752 - 755, The change to the Skills
field modifies its type from map[string]string to *SkillsSpec within the same
v1alpha1 API version, which will silently drop existing map-shaped data during
upgrades since unknown fields no longer match the new schema. To fix this,
establish an explicit migration path by either creating a new API version (such
as v1alpha2) with proper conversion functions that transform the old map format
to the new SkillsSpec structure, or provide a pre-upgrade migration tool that
rewrites existing CRs to the new format before rolling out this change. This
ensures existing resources are preserved and properly migrated rather than
losing data.
| ctx := context.Background() | ||
| err = wait.PollUntilContextTimeout(ctx, pollInterval, defaultTimeout, true, | ||
| func(ctx context.Context) (bool, error) { | ||
| cmd := exec.Command("kubectl", "get", "claw", "instance", | ||
| "-o", "jsonpath={.status.conditions[?(@.type=='ProxyConfigured')].status}", | ||
| "-n", userNamespace) | ||
| output, err := utils.Run(t, cmd) | ||
| return err == nil && output == conditionTrue, nil | ||
| }) | ||
| require.NoError(t, err, "Claw ProxyConfigured did not become True") |
There was a problem hiding this comment.
Use the repo’s waitFor(...) helper for this async condition wait.
This new subtest uses wait.PollUntilContextTimeout directly instead of the standard test helper/interval contract, which makes async timing behavior inconsistent across tests.
As per coding guidelines, **/*_test.go: Use waitFor(t, timeout, interval, condition, message) helper with 10s timeout and 250ms poll interval for async operations in tests.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/e2e/e2e_test.go` around lines 829 - 838, Replace the direct call to
wait.PollUntilContextTimeout with the standard waitFor helper function to ensure
consistent async timing behavior across tests. The waitFor helper should be
invoked with parameters: t (test object), 10 second timeout, 250 millisecond
poll interval, an anonymous condition function that executes the kubectl command
to check if the ProxyConfigured status equals conditionTrue, and the error
message "Claw ProxyConfigured did not become True". This ensures the test
follows the established coding guidelines for async operations.
Source: Coding guidelines
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/e2e_test.go`:
- Around line 821-823: The test is using a fixed filename path for crFile
instead of a unique temporary file, causing test isolation issues when runs are
concurrent or retried. Replace the hardcoded filepath.Join("/tmp",
"claw-e2e-skills.yaml") assignment with a call to ioutil.TempFile or
os.CreateTemp to generate a unique temporary file, store the returned file path
in crFile, and add a defer statement to clean up the temporary file using
os.Remove after the test completes to prevent resource leaks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 6bf8c92b-d713-42d8-ab5e-ca4ebc6c79b8
📒 Files selected for processing (5)
.github/workflows/publish-images.yml.github/workflows/test-kind.ymlinternal/controller/claw_workspace.gotest/e2e/e2e_test.gotest/e2e/testdata/skill-image/Dockerfile
💤 Files with no reviewable changes (2)
- test/e2e/testdata/skill-image/Dockerfile
- .github/workflows/publish-images.yml
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/controller/claw_workspace.go
| crFile := filepath.Join("/tmp", "claw-e2e-skills.yaml") | ||
| err = os.WriteFile(crFile, []byte(crYAML), os.FileMode(0o644)) | ||
| require.NoError(t, err) |
There was a problem hiding this comment.
Avoid a shared fixed manifest filename in /tmp.
Using /tmp/claw-e2e-skills.yaml makes this subtest non-isolated; concurrent/retried runs can overwrite each other and cause flaky behavior. Use a unique temp file and clean it up.
Suggested patch
- crFile := filepath.Join("/tmp", "claw-e2e-skills.yaml")
- err = os.WriteFile(crFile, []byte(crYAML), os.FileMode(0o644))
- require.NoError(t, err)
+ f, err := os.CreateTemp("", "claw-e2e-skills-*.yaml")
+ require.NoError(t, err)
+ crFile := f.Name()
+ t.Cleanup(func() { _ = os.Remove(crFile) })
+ _, err = f.Write([]byte(crYAML))
+ require.NoError(t, err)
+ require.NoError(t, f.Close())📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| crFile := filepath.Join("/tmp", "claw-e2e-skills.yaml") | |
| err = os.WriteFile(crFile, []byte(crYAML), os.FileMode(0o644)) | |
| require.NoError(t, err) | |
| f, err := os.CreateTemp("", "claw-e2e-skills-*.yaml") | |
| require.NoError(t, err) | |
| crFile := f.Name() | |
| t.Cleanup(func() { _ = os.Remove(crFile) }) | |
| _, err = f.Write([]byte(crYAML)) | |
| require.NoError(t, err) | |
| require.NoError(t, f.Close()) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/e2e/e2e_test.go` around lines 821 - 823, The test is using a fixed
filename path for crFile instead of a unique temporary file, causing test
isolation issues when runs are concurrent or retried. Replace the hardcoded
filepath.Join("/tmp", "claw-e2e-skills.yaml") assignment with a call to
ioutil.TempFile or os.CreateTemp to generate a unique temporary file, store the
returned file path in crFile, and add a defer statement to clean up the
temporary file using os.Remove after the test completes to prevent resource
leaks.
…very Resolves redhat-et/claw-project#15 (OCI skill image delivery via ImageVolumes) Resolves redhat-et/claw-project#18 (ConfigMap-based skill refs) Restructures spec.skills from a flat map[string]string to a nested SkillsSpec with three independent delivery mechanisms: skills.content — inline SKILL.md injected as _skill_<name> keys in the gateway ConfigMap; merge.js writes them to workspace/skills/<name>/SKILL.md on pod start skills.images — OCI images mounted read-only as Kubernetes ImageVolumes at /home/node/.openclaw/workspace/skills/<name> (requires K8s 1.33+; pullPolicy defaults to IfNotPresent when not specified) skills.configMaps — references to user-owned ConfigMaps; keys are read at reconcile time via UserSecretReader (bypasses the label-filtered cache) and injected as _skill_<name> entries alongside content skills Cross-source collision detection is enforced: a skill name appearing in more than one source (content, images, or any referenced ConfigMap key) returns a reconcile error rather than silently overwriting. Unit tests cover all three validation paths, cross-source collision detection, ConfigMap injection including content/image collision detection, missing ConfigMap error, and the complete integration path via envtest. E2e tests (test/e2e/e2e_suite_test.go, e2e_test.go) build a minimal FROM-scratch OCI skill image in TestMain, load it into Kind, and exercise all three delivery mechanisms in a dedicated subtest. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Incorporates PR #18 (ci/unified-workflow): replaces publish-images.yml with test-kind.yml that combines Kind e2e testing with conditional image publishing. Pins all GitHub Actions to commit hashes for supply chain security. Extends the kind test job with skill-specific steps: - Builds a minimal FROM-scratch OCI skill image and loads it into Kind with pullPolicy: IfNotPresent (avoids pull from non-existent registry) - Applies a Claw CR with all three skill types from the start (content, images, configMaps) using a K8s 1.35 node for full ImageVolume support - Validates all three delivery mechanisms by exec-ing into the running gateway pod and checking skill files on disk Also adds main branch to push/PR triggers on the existing workflows (Kubebuilder scaffolding defaulted to master only). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
9ce6494 to
3037509
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/e2e/testdata/skill-image/Dockerfile (1)
1-2: ⚡ Quick winSet a non-root user for this image.
FROM scratchdefaults to root, which weakens the security posture of the new e2e artifact and matches the Trivy finding. A numeric UID is enough here.♻️ Proposed fix
FROM scratch COPY SKILL.md / +USER 65532:65532🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/testdata/skill-image/Dockerfile` around lines 1 - 2, The Dockerfile currently defaults to the root user when using FROM scratch, which is a security vulnerability. Add a USER instruction after the COPY SKILL.md / command to specify a non-root user with a numeric UID (for example, USER 1000 or any appropriate numeric identifier) to improve the security posture of the image and align with the Trivy security findings.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/e2e/testdata/skill-image/Dockerfile`:
- Around line 1-2: The Dockerfile currently defaults to the root user when using
FROM scratch, which is a security vulnerability. Add a USER instruction after
the COPY SKILL.md / command to specify a non-root user with a numeric UID (for
example, USER 1000 or any appropriate numeric identifier) to improve the
security posture of the image and align with the Trivy security findings.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: d3470217-35c0-4f8e-baba-37ae746b00cb
📒 Files selected for processing (17)
.github/workflows/lint-bundle.yml.github/workflows/lint.yml.github/workflows/test-e2e.yml.github/workflows/test-kind.yml.github/workflows/test.ymlapi/v1alpha1/claw_types.goapi/v1alpha1/zz_generated.deepcopy.goconfig/crd/bases/claw.sandbox.redhat.com_claws.yamlinternal/controller/claw_deployment.gointernal/controller/claw_resource_controller.gointernal/controller/claw_skill_images_test.gointernal/controller/claw_workspace.gointernal/controller/claw_workspace_test.gotest/e2e/e2e_suite_test.gotest/e2e/e2e_test.gotest/e2e/testdata/skill-image/Dockerfiletest/e2e/testdata/skill-image/SKILL.md
✅ Files skipped from review due to trivial changes (4)
- test/e2e/testdata/skill-image/SKILL.md
- .github/workflows/test.yml
- .github/workflows/test-e2e.yml
- api/v1alpha1/zz_generated.deepcopy.go
🚧 Files skipped from review as they are similar to previous changes (10)
- .github/workflows/lint-bundle.yml
- .github/workflows/lint.yml
- config/crd/bases/claw.sandbox.redhat.com_claws.yaml
- internal/controller/claw_workspace.go
- test/e2e/e2e_test.go
- internal/controller/claw_deployment.go
- internal/controller/claw_skill_images_test.go
- internal/controller/claw_resource_controller.go
- internal/controller/claw_workspace_test.go
- api/v1alpha1/claw_types.go
|
@pavelanni cleared out the merge conflict |
Summary
publish-images.ymlwithtest-kind.yml)Restructures
spec.skillsfrom a flatmap[string]stringto a nestedSkillsSpecwith three independent delivery mechanisms:skills.content_skill_<name>keys in the gateway ConfigMap;merge.jswrites them toworkspace/skills/<name>/SKILL.mdskills.images/home/node/.openclaw/workspace/skills/<name>via Kubernetes ImageVolumes (requires K8s 1.33+)skills.configMapsUserSecretReader(bypasses label-filtered cache) and injected as_skill_<name>entriesCross-source collision detection is enforced at reconcile time — a skill name appearing in more than one source (content, images, or any referenced ConfigMap) returns a clear error rather than silently overwriting.
Also brings in the
test-kind.ymlunified CI workflow from #18, extended with skill-specific validation steps:FROM scratchOCI skill image and loads it into Kind_skill_greeting(content),_skill_sales-playbook+_skill_onboarding(configMaps), and the ImageVolume shape (reference, mountPath, readOnly) on the deploymentTest plan
TestMainbuilds the skill OCI image and loads into Kind; new subtest exercises all three delivery mechanismstest-kind.ymlworkflow: dedicated steps validate all three skill types after reconcilemake testpasses,make lint0 issues🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Chores
mainandmasterbranches.