feat(observability): pipeline-aware metrics, dashboard, and printer column #260
Merged
Knowledge-work teams operate at the granularity of pipeline stages
and produced artifacts, not just tokens and cost. v0.8.0 needed
observability to match: operators should be able to see, at a
glance, which stage a team is in, how long stages take, what came
out the other end, and whether delivery to external systems worked.
Metrics (new, `kagents_` prefix to match the rebrand)
- kagents_team_pipeline_stage_active{team,namespace,stage} — gauge,
1 while Running, 0 elsewhere. Stack-by-stage in Grafana for the
classic flight-status view across many teams.
- kagents_team_stage_duration_seconds{team,namespace,stage} —
histogram, observed once at the Running → Completed edge.
Exponential buckets starting at 30s, matching the existing
team-duration histogram.
- kagents_team_artifacts_produced_total{team,namespace,teammate} —
counter, increments once per ArtifactStatus appended; idempotent
across reconciles because the dedup guard on Status.Artifacts is
the gate.
- kagents_team_delivery_success_total / _failure_total
{team,namespace,type} — counters, one increment per delivery
attempt from executeDelivery.
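To make the bucket layout of the stage-duration histogram concrete, here is a minimal sketch of an exponential bucket series. Only the 30s starting bound comes from this PR; the growth factor of 2 and the count of 8 are assumptions chosen for illustration, and `exponentialBuckets` is a hypothetical helper rather than the project's code. With the Prometheus Go client, `prometheus.ExponentialBuckets(start, factor, count)` yields the same kind of series.

```go
package main

import "fmt"

// exponentialBuckets returns count upper bounds: start, start*factor,
// start*factor^2, and so on. It is a stdlib-only illustration.
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, 0, count)
	bound := start
	for i := 0; i < count; i++ {
		buckets = append(buckets, bound)
		bound *= factor
	}
	return buckets
}

func main() {
	// Assumed values: factor 2 and 8 buckets. Only the 30s start is from the PR.
	fmt.Println(exponentialBuckets(30, 2, 8))
	// prints: [30 60 120 240 480 960 1920 3840]
}
```

Under these assumed values the last finite bucket is 3840s (64 minutes); longer stages would fall into the implicit +Inf bucket.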
The existing `claude_*` metric family stays untouched so existing
dashboards keep working; this PR adds five `kagents_*` series rather
than renaming. A future major release can sync the prefixes.
Reconciler wiring
- updatePipelineStatus emits the stage gauge on every reconcile
(Running → 1, otherwise 0) and observes duration exactly once at
the Completed transition (gated on prevPhase != Completed).
- recordTeammateArtifacts increments the artifact counter inside
the dedup-checked append path.
- executeDelivery records success vs failure per target alongside
the existing event emission.
Status / kubectl
- New `Stage` printer column (priority=1, so `kubectl get agentteams`
shows it when -o wide / on wide-enough terminals). Sources
status.pipeline.currentStage.
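For readers unfamiliar with how such a column is usually declared, the sketch below shows a kubebuilder `printcolumn` marker on a CRD root type. The `AgentTeam`, `AgentTeamStatus`, and `PipelineStatus` Go types are assumptions inferred from the `agentteams` resource name and the JSON path above; the real types may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PipelineStatus is a hypothetical shape; only the currentStage JSON key
// comes from the PR.
type PipelineStatus struct {
	CurrentStage string `json:"currentStage,omitempty"`
}

// AgentTeamStatus is likewise assumed for illustration.
type AgentTeamStatus struct {
	Pipeline *PipelineStatus `json:"pipeline,omitempty"`
}

// AgentTeam carries the printer column marker. priority=1 marks the column
// as lower priority than the default (priority 0) columns.
//
// +kubebuilder:printcolumn:name="Stage",type=string,JSONPath=".status.pipeline.currentStage",priority=1
type AgentTeam struct {
	Status AgentTeamStatus `json:"status,omitempty"`
}

// exampleJSON renders a sample object so the JSON path can be checked by eye.
func exampleJSON() string {
	team := AgentTeam{Status: AgentTeamStatus{
		Pipeline: &PipelineStatus{CurrentStage: "research"},
	}}
	out, err := json.Marshal(team)
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	fmt.Println(exampleJSON())
	// prints: {"status":{"pipeline":{"currentStage":"research"}}}
}
```

The serialized object shows the nesting that the column's JSON path walks: `status`, then `pipeline`, then `currentStage`.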
Dashboard
- Pipeline section on the team detail view: progress bar over
completed/total, list of stages with their phase chip and
teammates-ready ratio.
- Artifacts section listing name, producing teammate, and source
path; renders only when status.artifacts is non-empty.
- New pipelinePercent template helper, fully covered (nil, zero,
half, full, overflow clamp).
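A helper covering the five listed cases could look roughly like this. It is a sketch, not the PR's implementation: the `PipelineStatus` field names `CompletedStages` and `TotalStages` are hypothetical, and the choice to return 0 for a nil or empty pipeline is an assumption.

```go
package main

import "fmt"

// PipelineStatus is a hypothetical shape; the field names are assumptions.
type PipelineStatus struct {
	CompletedStages int
	TotalStages     int
}

// pipelinePercent returns completed/total as an integer percentage,
// clamped to the range [0, 100].
func pipelinePercent(p *PipelineStatus) int {
	// nil pipeline or no stages: nothing to show, and no division by zero.
	if p == nil || p.TotalStages <= 0 {
		return 0
	}
	pct := p.CompletedStages * 100 / p.TotalStages
	if pct < 0 {
		return 0
	}
	if pct > 100 {
		return 100 // overflow clamp: completed count exceeds total
	}
	return pct
}

func main() {
	fmt.Println(pipelinePercent(nil))                                                // 0
	fmt.Println(pipelinePercent(&PipelineStatus{CompletedStages: 0, TotalStages: 4})) // 0
	fmt.Println(pipelinePercent(&PipelineStatus{CompletedStages: 2, TotalStages: 4})) // 50
	fmt.Println(pipelinePercent(&PipelineStatus{CompletedStages: 4, TotalStages: 4})) // 100
	fmt.Println(pipelinePercent(&PipelineStatus{CompletedStages: 5, TotalStages: 4})) // 100
}
```

Clamping in the helper keeps the progress bar's width valid even if status briefly reports more completed stages than the total.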
Tests
- internal/metrics rose to 100% coverage. Five new tests on the new
helpers + the names guard extended to the kagents_ prefix.
- internal/dashboard rose to 73.9% with the new helper tests.
- internal/controller existing pipeline-status tests still green.
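As an illustration of what a names guard over two prefixes might check, the sketch below flags any metric name that starts with neither allowed prefix. The function name `invalidMetricNames` and the sample names `claude_example_total` and `team_unprefixed_total` are made up for this example; the real guard in internal/metrics may work differently.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedPrefixes lists the two metric families described in this PR.
var allowedPrefixes = []string{"claude_", "kagents_"}

// invalidMetricNames returns every name that has none of the allowed prefixes.
func invalidMetricNames(names []string) []string {
	var bad []string
	for _, name := range names {
		ok := false
		for _, prefix := range allowedPrefixes {
			if strings.HasPrefix(name, prefix) {
				ok = true
				break
			}
		}
		if !ok {
			bad = append(bad, name)
		}
	}
	return bad
}

func main() {
	names := []string{
		"kagents_team_pipeline_stage_active",
		"claude_example_total",  // hypothetical name, for illustration only
		"team_unprefixed_total", // hypothetical name that should be rejected
	}
	fmt.Println(invalidMetricNames(names))
	// prints: [team_unprefixed_total]
}
```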
Docs
- docs/explanation/operations.md metrics table extended with the
five new series and a paragraph explaining the dual-prefix
situation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: amcheste <13696614+amcheste@users.noreply.github.com>
Summary
Knowledge-work teams operate at the granularity of pipeline stages
and produced artifacts, not just tokens and cost. Observability
needed to match — operators should be able to see, at a glance, which
stage a team is in, how long stages take, what came out the other
end, and whether delivery to external systems worked.
This PR adds five new metrics under the `kagents_` prefix, a dashboard
pipeline section, an artifact listing, a kubectl `Stage` printer column,
and the corresponding tests.
New metrics (`kagents_` prefix)
- kagents_team_pipeline_stage_active
- kagents_team_stage_duration_seconds
- kagents_team_artifacts_produced_total
- kagents_team_delivery_success_total
- kagents_team_delivery_failure_total
Dual-prefix decision
The existing `claude_*` family stays untouched so existing dashboards
keep working. This PR adds the new metrics alongside; a future major
release can synchronize the prefixes. Documented in
docs/explanation/operations.md.
Reconciler wiring
- `updatePipelineStatus` emits the stage gauge on every reconcile
(Running → 1, otherwise 0) and observes duration exactly once at
the Completed transition (gated on `prevPhase != Completed`).
- `recordTeammateArtifacts` increments the artifact counter inside
the dedup-checked append path, so it stays idempotent across reconciles.
- `executeDelivery` records success vs failure per target alongside
the existing event emission.
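The idempotence of the artifact counter follows from incrementing only when the append actually happens. A minimal sketch of that pattern, with assumed names throughout (`ArtifactStatus` fields, `teamStatus`, `appendArtifact`, `recordArtifacts`, and the choice of name plus teammate as the dedup key are all hypothetical):

```go
package main

import "fmt"

// ArtifactStatus is a hypothetical shape based on the fields the dashboard lists.
type ArtifactStatus struct {
	Name     string
	Teammate string
	Path     string
}

// teamStatus stands in for the resource status that holds Status.Artifacts.
type teamStatus struct {
	Artifacts []ArtifactStatus
}

// appendArtifact adds a unless an artifact with the same name and teammate
// is already recorded. It reports whether an append happened.
func appendArtifact(s *teamStatus, a ArtifactStatus) bool {
	for _, existing := range s.Artifacts {
		if existing.Name == a.Name && existing.Teammate == a.Teammate {
			return false
		}
	}
	s.Artifacts = append(s.Artifacts, a)
	return true
}

// recordArtifacts bumps the per-teammate counter once per new artifact.
// The produced map is an in-memory stand-in for the real counter.
func recordArtifacts(s *teamStatus, found []ArtifactStatus, produced map[string]int) {
	for _, a := range found {
		if appendArtifact(s, a) {
			produced[a.Teammate]++
		}
	}
}

func main() {
	status := &teamStatus{}
	produced := map[string]int{}
	found := []ArtifactStatus{
		{Name: "report.md", Teammate: "writer", Path: "out/report.md"},
		{Name: "summary.md", Teammate: "writer", Path: "out/summary.md"},
	}
	recordArtifacts(status, found, produced) // first reconcile
	recordArtifacts(status, found, produced) // second reconcile sees the same files
	fmt.Println(len(status.Artifacts), produced["writer"])
	// prints: 2 2
}
```

Because the counter sits behind the same guard as the status append, replaying a reconcile over unchanged artifacts leaves both the status list and the counter untouched.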
kubectl
- New `Stage` printer column (priority=1, so it shows on
`kubectl get agentteams -o wide` / wide terminals). Sources
`status.pipeline.currentStage`.
Dashboard
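- Pipeline section on the team detail view: progress bar over
completed/total stages, list of stages with phase chip + teammates-ready
ratio.
- Artifacts section listing name, producing teammate, and source path;
renders only when `status.artifacts` is non-empty.
- New `pipelinePercent` template helper, fully covered (nil, zero,
half, full, overflow clamp).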
Tests
- `internal/metrics` → 100% coverage. Five new tests plus the
names-guard extended to the new prefix.
- `internal/dashboard` → 73.9% with new helper tests.
- `internal/controller`: existing pipeline-status tests stay green.
Test plan
- `kubectl get agentteams -o wide` shows the Stage column
🤖 Generated with Claude Code