feat(observability): pipeline-aware metrics, dashboard, and printer column #260

Merged
amcheste-ai-agent[bot] merged 1 commit into develop from feat/pipeline-observability
Jun 2, 2026

@amcheste-ai-agent

Summary

Knowledge-work teams operate at the granularity of pipeline stages
and produced artifacts, not just tokens and cost. Observability
needed to match — operators should be able to see, at a glance, which
stage a team is in, how long stages take, what came out the other
end, and whether delivery to external systems worked.

This PR adds five new metrics under the kagents_ prefix, a dashboard
pipeline section, an artifact listing, a kubectl Stage printer column,
and the corresponding tests.

New metrics (kagents_ prefix)

| Metric | Type | Labels |
| --- | --- | --- |
| kagents_team_pipeline_stage_active | gauge | team, namespace, stage |
| kagents_team_stage_duration_seconds | histogram | team, namespace, stage |
| kagents_team_artifacts_produced_total | counter | team, namespace, teammate |
| kagents_team_delivery_success_total | counter | team, namespace, type |
| kagents_team_delivery_failure_total | counter | team, namespace, type |

Dual-prefix decision

The existing claude_* family stays untouched so existing dashboards
keep working. This PR adds the new metrics alongside; a future major
release can synchronize the prefixes. Documented in
docs/explanation/operations.md.

Reconciler wiring

  • updatePipelineStatus emits the stage gauge on every reconcile
    (Running → 1, otherwise 0) and observes duration exactly once at
    the Completed transition (gated on prevPhase != Completed).
  • recordTeammateArtifacts increments the artifact counter inside
    the dedup-checked append path — idempotent across reconciles.
  • executeDelivery records success vs failure per target alongside
    the existing event emission.
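
The gauge and duration rules above can be sketched in a few lines. This is a minimal illustration, not the controller's code: the `Phase` type, its constants, and the helper names `stageGaugeValue` and `shouldObserveDuration` are hypothetical, and the real reconciler would call Prometheus collectors where this sketch prints.

```go
package main

import (
	"fmt"
	"time"
)

// Phase is a hypothetical stand-in for the stage phase type.
type Phase string

const (
	PhasePending   Phase = "Pending"
	PhaseRunning   Phase = "Running"
	PhaseCompleted Phase = "Completed"
)

// stageGaugeValue returns the value the stage gauge would be set to on
// every reconcile: 1 while the stage is Running, 0 otherwise.
func stageGaugeValue(p Phase) float64 {
	if p == PhaseRunning {
		return 1
	}
	return 0
}

// shouldObserveDuration reports whether this reconcile is the one that
// moves the stage into Completed. Later reconciles see prev == Completed
// and skip the observation, so the histogram gets one sample per stage.
func shouldObserveDuration(prev, cur Phase) bool {
	return cur == PhaseCompleted && prev != PhaseCompleted
}

func main() {
	started := time.Date(2026, 6, 2, 14, 0, 0, 0, time.UTC)
	finished := started.Add(90 * time.Second)

	// Simulate four reconciles of the same stage; the last one repeats
	// the Completed phase and must not observe a second duration.
	phases := []Phase{PhasePending, PhaseRunning, PhaseCompleted, PhaseCompleted}
	prev := Phase("")
	for _, cur := range phases {
		fmt.Printf("phase=%s gauge=%v", cur, stageGaugeValue(cur))
		if shouldObserveDuration(prev, cur) {
			fmt.Printf(" observe=%.0fs", finished.Sub(started).Seconds())
		}
		fmt.Println()
		prev = cur
	}
}
```

Setting the gauge on every reconcile is safe because a gauge write is idempotent; the histogram needs the edge check because each observation adds a sample.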

kubectl

New Stage printer column (priority=1, so it appears only in kubectl get agentteams -o wide output, not in the default view). Reads from
status.pipeline.currentStage.
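
A printer column like this is commonly declared with a controller-gen marker on the API type. The sketch below assumes the CRD is generated that way, which the PR text does not state; the struct shapes are trimmed, hypothetical stand-ins for the real AgentTeam types.

```go
package main

import "fmt"

// PipelineStatus is a hypothetical, trimmed view of status.pipeline.
type PipelineStatus struct {
	CurrentStage string `json:"currentStage,omitempty"`
}

// AgentTeamStatus is a hypothetical, trimmed view of the status block.
type AgentTeamStatus struct {
	Pipeline *PipelineStatus `json:"pipeline,omitempty"`
}

// The marker below is what controller-gen reads to emit the column into
// the CRD. priority=1 keeps the column out of the default table and
// shows it only with -o wide.

// +kubebuilder:object:root=true
// +kubebuilder:printcolumn:name="Stage",type=string,JSONPath=".status.pipeline.currentStage",priority=1

// AgentTeam is a hypothetical, trimmed stand-in for the real resource.
type AgentTeam struct {
	Status AgentTeamStatus `json:"status,omitempty"`
}

func main() {
	team := AgentTeam{
		Status: AgentTeamStatus{
			Pipeline: &PipelineStatus{CurrentStage: "review"},
		},
	}
	// This is the value the Stage column would display for this object.
	fmt.Println(team.Status.Pipeline.CurrentStage)
}
```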

Dashboard

  • Pipeline section on the team detail view: progress bar over
    completed/total stages, list of stages with phase chip + teammates-ready
    ratio.
  • Artifacts section listing name, producing teammate, and source path;
    renders only when status.artifacts is non-empty.
  • New pipelinePercent template helper, fully covered (nil, zero,
    half, full, overflow clamp).
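
For illustration, a helper covering the five listed cases might look like the following. The signature, the `PipelineStatus` fields, and the template text are assumptions; only the helper name and the case list come from this PR.

```go
package main

import (
	"html/template"
	"os"
)

// PipelineStatus is a hypothetical shape; the real status type may
// count stages differently.
type PipelineStatus struct {
	CompletedStages int
	TotalStages     int
}

// pipelinePercent returns completed/total as an integer percentage,
// treating a nil or empty pipeline as 0 and clamping to [0, 100].
func pipelinePercent(p *PipelineStatus) int {
	if p == nil || p.TotalStages <= 0 {
		return 0
	}
	pct := p.CompletedStages * 100 / p.TotalStages
	if pct > 100 {
		return 100 // overflow clamp: more completed than total
	}
	if pct < 0 {
		return 0
	}
	return pct
}

func main() {
	// Register the helper before parsing so the template can call it.
	tmpl := template.Must(template.New("progress").
		Funcs(template.FuncMap{"pipelinePercent": pipelinePercent}).
		Parse("{{pipelinePercent .}}% of stages complete\n"))

	status := &PipelineStatus{CompletedStages: 1, TotalStages: 4}
	if err := tmpl.Execute(os.Stdout, status); err != nil {
		panic(err)
	}
}
```

Clamping in the helper rather than in the template keeps the progress bar width valid even if the status briefly reports inconsistent counts.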

Tests

  • internal/metrics → 100% coverage. Five new tests plus the
    names-guard extended to the new prefix.

  • internal/dashboard → 73.9% with new helper tests.
  • internal/controller — existing pipeline-status tests stay green.

Test plan

  • CI green
  • kubectl get agentteams -o wide shows the Stage column

🤖 Generated with Claude Code

…olumn

Knowledge-work teams operate at the granularity of pipeline stages
and produced artifacts, not just tokens and cost. v0.8.0 needed
observability to match: operators should be able to see, at a
glance, which stage a team is in, how long stages take, what came
out the other end, and whether delivery to external systems worked.

Metrics (new, `kagents_` prefix to match the rebrand)
- kagents_team_pipeline_stage_active{team,namespace,stage} — gauge,
  1 while Running, 0 elsewhere. Stack-by-stage in Grafana for the
  classic flight-status view across many teams.
- kagents_team_stage_duration_seconds{team,namespace,stage} —
  histogram, observed once at the Running → Completed edge.
  Exponential buckets starting at 30s, matching the existing
  team-duration histogram.
- kagents_team_artifacts_produced_total{team,namespace,teammate} —
  counter, increments once per ArtifactStatus appended; idempotent
  across reconciles because the dedup guard on Status.Artifacts is
  the gate.
- kagents_team_delivery_success_total / _failure_total
  {team,namespace,type} — counters, one increment per delivery
  attempt from executeDelivery.
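
To make the bucket layout concrete, here is a stdlib-only sketch that mirrors the shape of `prometheus.ExponentialBuckets(start, factor, count)`. Only the 30s start comes from this commit message; the factor of 2 and the count of 8 are assumptions chosen for illustration.

```go
package main

import "fmt"

// exponentialBuckets returns count upper bounds, the first equal to
// start and each following one multiplied by factor.
func exponentialBuckets(start, factor float64, count int) []float64 {
	if count < 1 {
		return nil
	}
	buckets := make([]float64, 0, count)
	for i := 0; i < count; i++ {
		buckets = append(buckets, start)
		start *= factor
	}
	return buckets
}

func main() {
	// Factor 2 and count 8 are assumed values, not taken from the PR.
	for _, b := range exponentialBuckets(30, 2, 8) {
		fmt.Printf("le=%gs\n", b)
	}
}
```

With these assumed parameters the bounds run from 30 seconds to a little over an hour, which suits stages that last minutes rather than milliseconds.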

The existing `claude_*` metric family stays untouched so existing
dashboards keep working; this PR adds five `kagents_*` series rather
than renaming. A future major release can sync the prefixes.

Reconciler wiring
- updatePipelineStatus emits the stage gauge on every reconcile
  (Running → 1, otherwise 0) and observes duration exactly once at
  the Completed transition (gated on prevPhase != Completed).
- recordTeammateArtifacts increments the artifact counter inside
  the dedup-checked append path.
- executeDelivery records success vs failure per target alongside
  the existing event emission.
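
The idempotency claim for the artifact counter follows from incrementing only on the branch that appends. A minimal sketch, assuming the dedup key is artifact name plus teammate (the commit message does not say what the guard compares) and using a plain map in place of the labeled Prometheus counter:

```go
package main

import "fmt"

// ArtifactStatus is a hypothetical, trimmed shape of a recorded artifact.
type ArtifactStatus struct {
	Name     string
	Teammate string
	Path     string
}

// TeamStatus is a hypothetical, trimmed status block.
type TeamStatus struct {
	Artifacts []ArtifactStatus
}

// counter stands in for a counter labeled by teammate.
type counter map[string]int

// recordArtifact appends a only if it is not already in the status and
// increments the counter on that same branch, so repeated reconciles
// over the same artifact leave both the list and the counter unchanged.
func recordArtifact(st *TeamStatus, produced counter, a ArtifactStatus) bool {
	for _, existing := range st.Artifacts {
		if existing.Name == a.Name && existing.Teammate == a.Teammate {
			return false // already recorded: no append, no increment
		}
	}
	st.Artifacts = append(st.Artifacts, a)
	produced[a.Teammate]++
	return true
}

func main() {
	st := &TeamStatus{}
	produced := counter{}
	a := ArtifactStatus{Name: "report.md", Teammate: "writer", Path: "out/report.md"}

	// Three reconciles see the same artifact; only the first one counts.
	for i := 1; i <= 3; i++ {
		appended := recordArtifact(st, produced, a)
		fmt.Printf("reconcile %d: appended=%v counter=%d\n", i, appended, produced["writer"])
	}
}
```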

Status / kubectl
- New `Stage` printer column (priority=1, so `kubectl get agentteams`
  shows it only with -o wide, not in the default view). Reads from
  status.pipeline.currentStage.

Dashboard
- Pipeline section on the team detail view: progress bar over
  completed/total, list of stages with their phase chip and
  teammates-ready ratio.
- Artifacts section listing name, producing teammate, and source
  path; renders only when status.artifacts is non-empty.
- New pipelinePercent template helper, fully covered (nil, zero,
  half, full, overflow clamp).

Tests
- internal/metrics rose to 100% coverage. Five new tests on the new
  helpers + the names guard extended to the kagents_ prefix.
- internal/dashboard rose to 73.9% with the new helper tests.
- internal/controller existing pipeline-status tests still green.

Docs
- docs/explanation/operations.md metrics table extended with the
  five new series and a paragraph explaining the dual-prefix
  situation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: amcheste <13696614+amcheste@users.noreply.github.com>
@amcheste-ai-agent amcheste-ai-agent Bot requested a review from amcheste as a code owner June 2, 2026 14:42
@github-actions github-actions Bot added the docs label Jun 2, 2026
@amcheste-ai-agent amcheste-ai-agent Bot merged commit 0b84179 into develop Jun 2, 2026
7 checks passed