Add skill-level spans to OpenTelemetry traces #1608

@Alexk2309

Describe the feature or problem you'd like to solve

When a Copilot CLI agent invokes a project skill, the resulting tool calls (bash, glob, etc.) are emitted as flat children of the root invoke_agent span. There is no intermediate span representing the skill invocation itself, and no skill.name attribute on the child tool spans. This makes it impossible to attribute tool calls to the skill that triggered them using the OTEL trace data.

Current trace structure:

invoke_agent (root)
  ├── execute_tool glob      ← no skill attribution
  ├── execute_tool bash      ← no skill attribution
  ├── execute_tool bash      ← no skill attribution
  ├── chat model
  └── chat model

The skill name only appears in github.copilot.context.skills on the invoke_agent span, which lists all available skills — not the one that was actually invoked.

Proposed solution

Introduce an execute_skill (or similar) span that:

  1. Wraps the tool calls triggered by the skill invocation, so they become children of the skill span rather than the root agent span.
  2. Carries a skill.name attribute identifying which skill was executed.
  3. Is a child of the invoke_agent span, preserving the existing hierarchy.

Example desired trace:

invoke_agent (root)
  ├── chat model
  ├── execute_skill "my-skill"              ← NEW
  │     ├── execute_tool glob               ← child of skill span
  │     ├── execute_tool bash               ← child of skill span
  │     └── execute_tool bash               ← child of skill span
  ├── chat model
  └── execute_tool bash                     ← not part of a skill
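To make the intended parent/child relationships concrete, here is a minimal sketch of the nesting. It uses a hypothetical in-memory span recorder (`start_span`, `finished_spans`, `run_session` are illustrative names), not the OpenTelemetry SDK or Copilot CLI internals. The point is only that tool spans opened while a skill span is active inherit it as their parent.

```python
import contextvars
import itertools
from contextlib import contextmanager

# Hypothetical in-memory recorder; stands in for a real tracer.
_current_span_id = contextvars.ContextVar("current_span_id", default=None)
_ids = itertools.count(1)
finished_spans = []


@contextmanager
def start_span(name, **attributes):
    """Open a span whose parent is whichever span is currently active."""
    span = {
        "span_id": next(_ids),
        "parent_span_id": _current_span_id.get(),
        "name": name,
        "attributes": attributes,
    }
    token = _current_span_id.set(span["span_id"])
    try:
        yield span
    finally:
        # Restore the previous active span, then record this one as finished.
        _current_span_id.reset(token)
        finished_spans.append(span)


def run_session():
    """Reproduce the shape of the desired trace (chat spans omitted)."""
    with start_span("invoke_agent") as root:
        with start_span("execute_skill", **{"skill.name": "my-skill"}) as skill:
            with start_span("execute_tool glob"):
                pass
            with start_span("execute_tool bash"):
                pass
        # Opened after the skill span closed, so its parent is the root.
        with start_span("execute_tool bash"):
            pass
    return root, skill
```

After `run_session()`, the two tool spans opened inside the skill have `parent_span_id` equal to the skill span's id, while the final `execute_tool bash` points at the root.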

Alternatively (minimal version): if nesting is not feasible, adding a skill.name attribute to each execute_tool span that was triggered within a skill invocation context would also solve the problem.
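The minimal version could look roughly like the sketch below: a context variable tracks the active skill, and whatever builds the attributes for an `execute_tool` span stamps `skill.name` when one is set. The helper names (`skill_context`, `tool_span_attributes`) are hypothetical, and the `gen_ai.tool.name` key is an assumption about what the tool spans carry today; only `skill.name` comes from this proposal.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical: name of the skill currently executing, or None outside a skill.
_active_skill = contextvars.ContextVar("active_skill", default=None)


@contextmanager
def skill_context(skill_name):
    """Mark everything inside the `with` block as triggered by `skill_name`."""
    token = _active_skill.set(skill_name)
    try:
        yield
    finally:
        _active_skill.reset(token)


def tool_span_attributes(tool_name):
    """Build attributes for an execute_tool span, adding skill.name if a skill is active."""
    attributes = {"gen_ai.tool.name": tool_name}  # assumed existing attribute key
    skill_name = _active_skill.get()
    if skill_name is not None:
        attributes["skill.name"] = skill_name
    return attributes
```

Tool calls made outside any `skill_context` get no `skill.name`, so backends can still tell skill-driven calls apart from general agent activity.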

How will it benefit GitHub Copilot CLI and its users?

  • Skill-level latency measurement: Users can measure how long a skill takes end-to-end, rather than manually summing individual tool call durations.
  • Tool call attribution: Clearly distinguish which tool calls belong to a skill invocation vs. general agent reasoning, enabling targeted debugging and optimization.
  • Faster failure diagnosis: When a skill fails, users can immediately identify which child tool call failed without reading through command arguments.
  • Aggregated dashboards: Teams can build per-skill usage and performance dashboards across sessions, which is essential for monitoring custom skill reliability at scale.

Example prompts or workflows

  1. Debugging a failed skill: A custom skill fails intermittently. The user exports OTEL traces and filters for execute_skill spans with error status. They drill into the child execute_tool spans to see exactly which bash command failed — without reading every tool call in the session.

  2. Measuring skill performance over time: A team ships a custom skill and wants to track its p50/p95 latency across sessions. They query their trace backend for execute_skill spans where skill.name = "my-skill" and chart duration over time. Today this is impossible without manually parsing tool call arguments.

  3. Attributing tool usage and time to skills: A user runs a session where the agent invokes three different skills. They want to see how many tool calls each skill made and how much time each consumed. With skill-level spans, this is a simple trace query. Without them, all tool calls are indistinguishable siblings under invoke_agent.

  4. Building an observability dashboard: A team sends OTEL traces to their backend and builds a dashboard showing skill invocation frequency, success rate, and latency. This requires a reliable skill.name attribute or dedicated span — inferring skill boundaries from bash command content is fragile and breaks when scripts change.

  5. Auditing skill usage in CI/automation: In cloud agent jobs, a team wants to verify that the correct skills were invoked and completed successfully. Skill-level spans would make this a simple trace query rather than log parsing.
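As a rough illustration of how simple these queries become, the sketch below aggregates exported spans per skill. The span shape (plain dicts with `span_id`, `parent_span_id`, `name`, `attributes`, `start_ms`, `end_ms`, `status`) is an assumption made for the example and not the format of any particular exporter or trace backend; `summarize_skills` and `failed_tools` are hypothetical helpers.

```python
from collections import defaultdict


def summarize_skills(spans):
    """Aggregate execute_skill spans by skill.name.

    Returns {skill_name: {"invocations", "errors", "tool_calls", "total_ms"}}.
    """
    children = defaultdict(list)
    for span in spans:
        children[span["parent_span_id"]].append(span)

    summary = defaultdict(
        lambda: {"invocations": 0, "errors": 0, "tool_calls": 0, "total_ms": 0}
    )
    for span in spans:
        if span["name"] != "execute_skill":
            continue
        row = summary[span["attributes"]["skill.name"]]
        row["invocations"] += 1
        if span["status"] == "error":
            row["errors"] += 1
        row["total_ms"] += span["end_ms"] - span["start_ms"]
        # Count only direct execute_tool children of this skill span.
        row["tool_calls"] += sum(
            1 for child in children[span["span_id"]]
            if child["name"].startswith("execute_tool")
        )
    return dict(summary)


def failed_tools(spans, skill_name):
    """List names of failed tool spans directly under the given skill's spans."""
    skill_ids = {
        s["span_id"] for s in spans
        if s["name"] == "execute_skill"
        and s["attributes"].get("skill.name") == skill_name
    }
    return [
        s["name"] for s in spans
        if s["parent_span_id"] in skill_ids and s["status"] == "error"
    ]
```

`summarize_skills` covers workflows 2 to 5 (frequency, success rate, duration, tool counts), and `failed_tools` covers workflow 1. Neither is possible against the current flat structure, because there is no `execute_skill` span to group by.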

Additional context

  • Hooks (preToolUse/postToolUse) also don't carry skill context — toolName is "bash" / "glob" etc., with no reference to the parent skill. Adding skillName to hook payloads would be a complementary improvement.
  • The skill tool invocation itself may appear as an execute_tool skill span, but subsequent tool calls triggered by the skill are not linked to it via parentSpanId.
  • Observed on Copilot CLI v1.0.60.
