Add skill-level spans to OpenTelemetry traces

## Describe the feature or problem you'd like to solve

When a Copilot CLI agent invokes a project skill, the resulting tool calls (bash, glob, etc.) are emitted as flat children of the root `invoke_agent` span. There is no intermediate span representing the skill invocation itself, and no `skill.name` attribute on the child tool spans. This makes it impossible to attribute tool calls to the skill that triggered them using the OTEL trace data.

Current trace structure:

```
invoke_agent (root)
  ├── execute_tool glob      ← no skill attribution
  ├── execute_tool bash      ← no skill attribution
  ├── execute_tool bash      ← no skill attribution
  ├── chat model
  └── chat model
```

The skill name appears only in `github.copilot.context.skills` on the `invoke_agent` span, and that attribute lists all **available** skills, not the one that was actually invoked.

## Proposed solution

Introduce an `execute_skill` (or similar) span that:

1. **Wraps the tool calls** triggered by the skill invocation, so they become children of the skill span rather than the root agent span.
2. **Carries a `skill.name` attribute** identifying which skill was executed.
3. **Is a child of the `invoke_agent` span**, preserving the existing hierarchy.

Example desired trace:

```
invoke_agent (root)
  ├── chat model
  ├── execute_skill "my-skill"              ← NEW
  │     ├── execute_tool glob               ← child of skill span
  │     ├── execute_tool bash               ← child of skill span
  │     └── execute_tool bash               ← child of skill span
  ├── chat model
  └── execute_tool bash                     ← not part of a skill
```
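To make the desired parent/child relationships concrete, here is a minimal sketch of the nesting. It uses a toy in-memory tracer written for this issue (`ToyTracer`, `Span`, and `run_example` are hypothetical names, not Copilot CLI internals and not the OpenTelemetry SDK); the point is only to show which `parent_id` each span would end up with.

```python
from __future__ import annotations

import itertools
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    span_id: int
    name: str
    parent_id: int | None
    attributes: dict[str, str] = field(default_factory=dict)


class ToyTracer:
    """Hypothetical in-memory tracer, used only to illustrate nesting."""

    def __init__(self) -> None:
        self.spans: list[Span] = []
        self._stack: list[Span] = []
        self._ids = itertools.count(1)

    @contextmanager
    def start_span(self, name: str, attributes: dict[str, str] | None = None):
        # The innermost open span becomes the parent of the new span.
        parent_id = self._stack[-1].span_id if self._stack else None
        span = Span(next(self._ids), name, parent_id, dict(attributes or {}))
        self.spans.append(span)
        self._stack.append(span)
        try:
            yield span
        finally:
            self._stack.pop()


def run_example() -> list[Span]:
    """Recreate the desired trace from the diagram above."""
    tracer = ToyTracer()
    with tracer.start_span("invoke_agent"):
        with tracer.start_span("chat model"):
            pass
        # Proposed span: wraps every tool call made on behalf of the skill.
        with tracer.start_span("execute_skill", {"skill.name": "my-skill"}):
            for tool in ("glob", "bash", "bash"):
                with tracer.start_span(f"execute_tool {tool}"):
                    pass
        with tracer.start_span("chat model"):
            pass
        # Tool call outside any skill: stays a direct child of the root.
        with tracer.start_span("execute_tool bash"):
            pass
    return tracer.spans
```

With this structure, the three skill-triggered tool spans share the `execute_skill` span as their parent, while the final `execute_tool bash` span remains a direct child of `invoke_agent`.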

Alternatively (minimal version): if nesting is not feasible, adding a `skill.name` attribute to each `execute_tool` span that was triggered within a skill invocation context would also solve the problem.

**How will it benefit GitHub Copilot CLI and its users?**

- **Skill-level latency measurement**: Users can measure how long a skill takes end-to-end, rather than manually summing individual tool call durations.
- **Tool call attribution**: Clearly distinguish which tool calls belong to a skill invocation vs. general agent reasoning, enabling targeted debugging and optimization.
- **Faster failure diagnosis**: When a skill fails, users can immediately identify which child tool call failed without reading through command arguments.
- **Aggregated dashboards**: Teams can build per-skill usage and performance dashboards across sessions, which is essential for monitoring custom skill reliability at scale.

## Example prompts or workflows

1. **Debugging a failed skill**: A custom skill fails intermittently. The user exports OTEL traces and filters for `execute_skill` spans with error status. They drill into the child `execute_tool` spans to see exactly which bash command failed — without reading every tool call in the session.

2. **Measuring skill performance over time**: A team ships a custom skill and wants to track its p50/p95 latency across sessions. They query their trace backend for `execute_skill` spans where `skill.name = "my-skill"` and chart duration over time. Today this is impossible without manually parsing tool call arguments.

3. **Attributing token/tool usage to skills**: A user runs a session where the agent invokes three different skills. They want to see how many tool calls each skill made and how much time each consumed. With skill-level spans, this is a simple trace query. Without them, all tool calls are indistinguishable siblings under `invoke_agent`.

4. **Building an observability dashboard**: A team sends OTEL traces to their backend and builds a dashboard showing skill invocation frequency, success rate, and latency. This requires a reliable `skill.name` attribute or dedicated span — inferring skill boundaries from bash command content is fragile and breaks when scripts change.

5. **Auditing skill usage in CI/automation**: In cloud agent jobs, a team wants to verify that the correct skills were invoked and completed successfully. Skill-level spans would make this a simple trace query rather than log parsing.
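As a rough illustration of how simple these queries could become, the sketch below runs workflows 1 to 3 over a list of exported spans. The span shape (dicts with `span_id`, `parent_id`, `name`, `duration_ms`, `status`, and `attributes`) is an assumption made for this example, not the actual export format of Copilot CLI or any particular trace backend, and the helper names are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def skill_spans(spans, skill_name):
    """All execute_skill spans for one skill."""
    return [
        s for s in spans
        if s["name"] == "execute_skill"
        and s["attributes"].get("skill.name") == skill_name
    ]


def latency_percentiles(spans, skill_name):
    """p50/p95 of skill duration; needs at least two invocations."""
    durations = sorted(s["duration_ms"] for s in skill_spans(spans, skill_name))
    if len(durations) < 2:
        return None
    cuts = quantiles(durations, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94]}


def tool_calls_per_skill(spans):
    """Count execute_tool spans whose parent is an execute_skill span."""
    skill_by_id = {
        s["span_id"]: s["attributes"].get("skill.name", "unknown")
        for s in spans
        if s["name"] == "execute_skill"
    }
    return Counter(
        skill_by_id[s["parent_id"]]
        for s in spans
        if s["name"].startswith("execute_tool") and s["parent_id"] in skill_by_id
    )


def failed_tool_calls(spans, skill_name):
    """Failed tool spans under failed invocations of one skill."""
    failed_ids = {
        s["span_id"] for s in skill_spans(spans, skill_name)
        if s["status"] == "ERROR"
    }
    return [
        s for s in spans
        if s["parent_id"] in failed_ids and s["status"] == "ERROR"
    ]
```

None of these helpers inspect bash command content; they rely only on span names, parent links, and the `skill.name` attribute, which is why a dedicated span or attribute is preferable to inferring skill boundaries from arguments.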

## Additional context

- Hooks (`preToolUse`/`postToolUse`) also don't carry skill context — `toolName` is `"bash"` / `"glob"` etc., with no reference to the parent skill. Adding `skillName` to hook payloads would be a complementary improvement.
- The `skill` tool invocation itself may appear as an `execute_tool skill` span, but subsequent tool calls triggered by the skill are not linked to it via `parentSpanId`.
- Observed on Copilot CLI v1.0.60.


Add skill-level spans to OpenTelemetry traces #1608
