feat: add spec.traces and spec.logs for OTel forwarding#15
feat: add spec.traces and spec.logs for OTel forwarding#15cooktheryan wants to merge 11 commits into
Conversation
Add traces and logs as top-level spec fields alongside metrics, enabling OpenTelemetry trace and log forwarding through the existing OTel Collector sidecar to external OTLP endpoints. Auto-installs diagnostics plugins and injects OTel resource attributes for multi-instance discrimination. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Warning Review limit reached
More reviews will be available in 41 minutes and 32 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds OTLP-based traces and logs forwarding to the Claw operator. New ChangesOTel Traces and Logs Observability
Sequence Diagram(s)sequenceDiagram
participant reconciler as Reconciler
participant inject as injectObservabilityResources
participant builder as buildCollectorConfig
participant diagCfg as injectDiagnosticsConfig
participant envVars as injectOTelEnvVars
participant netpol as injectTelemetryEgressRules
reconciler->>inject: objects, instance
inject->>builder: generate OTel Collector YAML
inject->>diagCfg: set diagnostics.otel flags and endpoint
inject->>netpol: append TCP egress rules for endpoints
reconciler->>envVars: inject OTEL_* and trace sampler env vars
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/hold until i can run through code rabbit |
|
@coderabbitai review |
✅ Action performedReview finished.
|
Rename the deprecated `otlphttp` exporter alias to `otlp_http` in the OTel Collector config to suppress warnings on collector 0.152.1+. Add a dedicated GitHub Actions workflow that validates the full OTel diagnostics stack (traces, logs, metrics) on a Kind cluster. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/test-diagnostics.yml:
- Around line 31-36: The Kind installation step downloads from the `/dl/latest/`
endpoint in the curl command, which is not reproducible as the latest version
can change over time. Replace the `/dl/latest/` path with a specific pinned
version number (e.g., `/dl/v0.20.0/`) in the curl command that downloads Kind
from kind.sigs.k8s.io to ensure consistent and reproducible test runs.
- Around line 11-22: Add a top-level permissions block to the workflow that
explicitly defines the minimum required GITHUB_TOKEN scopes. Currently, the
workflow lacks a permissions declaration which results in using default
permissions that are broader than necessary, violating the principle of least
privilege. Place the permissions block at the same level as the jobs key to
restrict token access to only the specific permissions needed for this OTel
Diagnostics workflow (typically read or none for workflows that only run tests
without deploying or modifying repository content).
- Line 24: The test-diagnostics.yml workflow uses semantic version tags (like
`@v4`) for GitHub Actions references instead of commit hashes, which is
inconsistent with the pinning approach used in publish-images.yml. Update all
uses statements in the test-diagnostics.yml workflow to pin actions to their
full commit hashes instead of version tags, following the same pattern already
established in publish-images.yml for supply-chain hardening and consistency
across the repository.
In `@api/v1alpha1/claw_types.go`:
- Around line 699-707: The TracesSpec and LogsSpec fields in the claw_types.go
file need CRD-level validation to prevent silent misconfiguration. Add
kubebuilder validation markers (such as +kubebuilder:validation:XValidationRule)
to enforce that when Traces or Logs specs have enabled set to true, the
corresponding endpoint fields must be explicitly configured. This ensures the
validation happens at admission time rather than silently failing during
collector configuration, preventing users from creating invalid resource
combinations where endpoints would not be wired for export.
In `@internal/controller/claw_diagnostics.go`:
- Around line 276-282: The isExternalEndpoint function suppresses errors from
classifyServiceURL by returning false on failure, which hides configuration
issues and silently skips egress rule injection. Modify isExternalEndpoint to
return both a boolean and an error instead of only a boolean, then propagate any
error returned by classifyServiceURL up through the call chain via
collectTelemetryEndpoints and finally through injectTelemetryEgressRules so that
reconciliation and status updates reflect misconfiguration issues immediately.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 2ab0c029-6158-4696-ad50-d544ea85e162
📒 Files selected for processing (9)
.github/workflows/test-diagnostics.ymlapi/v1alpha1/claw_types.goapi/v1alpha1/zz_generated.deepcopy.goconfig/crd/bases/claw.sandbox.redhat.com_claws.yamlinternal/controller/claw_diagnostics.gointernal/controller/claw_diagnostics_test.gointernal/controller/claw_metrics.gointernal/controller/claw_plugins.gointernal/controller/claw_resource_controller.go
| // Traces configures distributed tracing via OTLP, forwarded through the | ||
| // OTel Collector sidecar to an external OTLP endpoint. | ||
| // +optional | ||
| Traces *TracesSpec `json:"traces,omitempty"` | ||
|
|
||
| // Logs configures log forwarding via OTLP, forwarded through the | ||
| // OTel Collector sidecar to an external OTLP endpoint. | ||
| // +optional | ||
| Logs *LogsSpec `json:"logs,omitempty"` |
There was a problem hiding this comment.
Add CRD-level validation for enabled traces/logs endpoint combinations.
enabled: true currently permits endpoint-less specs, but the collector config path only wires trace/log export when endpoints resolve. That creates a silent misconfiguration path. Reject invalid combinations at admission time.
Suggested guardrails
+// +kubebuilder:validation:XValidation:rule="!has(self.traces) || !self.traces.enabled || self.traces.endpoint != ''",message="spec.traces.endpoint is required when spec.traces.enabled is true"
+// +kubebuilder:validation:XValidation:rule="!has(self.logs) || !self.logs.enabled || self.logs.endpoint != '' || (has(self.traces) && self.traces.enabled && self.traces.endpoint != '')",message="spec.logs.endpoint is required when spec.logs.enabled is true unless spec.traces.endpoint is set"
type ClawSpec struct {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@api/v1alpha1/claw_types.go` around lines 699 - 707, The TracesSpec and
LogsSpec fields in the claw_types.go file need CRD-level validation to prevent
silent misconfiguration. Add kubebuilder validation markers (such as
+kubebuilder:validation:XValidationRule) to enforce that when Traces or Logs
specs have enabled set to true, the corresponding endpoint fields must be
explicitly configured. This ensures the validation happens at admission time
rather than silently failing during collector configuration, preventing users
from creating invalid resource combinations where endpoints would not be wired
for export.
| func isExternalEndpoint(rawURL string, instance *clawv1alpha1.Claw) bool { | ||
| target, err := classifyServiceURL(rawURL, instance.Namespace) | ||
| if err != nil { | ||
| return false | ||
| } | ||
| return target.External | ||
| } |
There was a problem hiding this comment.
Don’t suppress telemetry endpoint classification errors.
Returning false on classification failure hides invalid endpoint configs and silently skips egress rule injection. Bubble this error up so reconcile/status reflects misconfiguration immediately.
Minimal direction
-func isExternalEndpoint(rawURL string, instance *clawv1alpha1.Claw) bool {
+func isExternalEndpoint(rawURL string, instance *clawv1alpha1.Claw) (bool, error) {
target, err := classifyServiceURL(rawURL, instance.Namespace)
if err != nil {
- return false
+ return false, err
}
- return target.External
+ return target.External, nil
}Then propagate the error through collectTelemetryEndpoints → injectTelemetryEgressRules.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/controller/claw_diagnostics.go` around lines 276 - 282, The
isExternalEndpoint function suppresses errors from classifyServiceURL by
returning false on failure, which hides configuration issues and silently skips
egress rule injection. Modify isExternalEndpoint to return both a boolean and an
error instead of only a boolean, then propagate any error returned by
classifyServiceURL up through the call chain via collectTelemetryEndpoints and
finally through injectTelemetryEgressRules so that reconciliation and status
updates reflect misconfiguration issues immediately.
- Fix injectTelemetryEgressRules to handle in-cluster endpoints using classifyServiceURL/buildInClusterEgressRule (was silently blocking) - Rename workflow to "kind test" - Use quay.io/redhat-et/ image names for Kind compatibility - Add IMAGE_PULL_POLICY=IfNotPresent for Kind deploys - Enhance debug logging: dump container logs, configmap, kubectl get all Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ntain permissions' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
- Add permissions: contents: read to limit GITHUB_TOKEN scope - Add persist-credentials: false on checkout - Pin Kind to v0.32.0 for reproducible builds Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Send test OTLP data to the sidecar and verify it flows through: - Traces: sidecar receives span, backend logs it via debug exporter - Logs: sidecar receives log record, backend logs it - Metrics: sidecar receives OTLP metric, Prometheus endpoint serves it Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These tested the OTel Collector generically, not OpenClaw's actual telemetry output. Keep the operator artifact validations (sidecar, config, env vars, NP rules) which are what this test is for. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge publish-images.yml into test-kind.yml so images are only pushed to quay.io after successful Kind testing and only on merge to main. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Align test job action references with publish job for supply-chain hardening consistency. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
spec.tracesandspec.logsas top-level CRD fields alongside the existingspec.metrics, enabling OpenTelemetry trace and log forwarding through the OTel Collector sidecar to external OTLP endpoints@openclaw/diagnostics-oteland@openclaw/diagnostics-prometheusplugins when the corresponding signals are enabledOTEL_SERVICE_NAME,service.instance.id,k8s.namespace.name) via Downward API for multi-instance discriminationparentbased_traceidratiosampler with user-configurable sampling ratioExample CR
Test plan
make generate— deepcopy generated for newTracesSpecandLogsSpectypesmake manifests— CRD schema includestracesandlogsfieldsmake lint— 0 issues (extractedinjectObservabilityResourcesto stay under cyclomatic complexity limit)make test— all tests pass including new diagnostics tests🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
Quality Improvements