code-testing-agent: mention find-untested-sources for C# discovery#734

Open
Evangelink wants to merge 1 commit into main from dev/amauryleve/mention-find-untested-in-cta
Evangelink (Member) commented Jun 8, 2026

Summary

Adds a conditional pointer to the find-untested-sources skill in two places inside code-testing-agent:

  • SKILL.md Step 3 (Research Phase) — high-level note for C# / .NET multi-file scopes: prefer the helper over manual find/grep/glob walks.
  • code-testing-researcher.agent.md Section 7 (Discover Preexisting Tests) — directive instruction telling the researcher to invoke the helper before manually pairing source ↔ test files, and to use its source_to_tests / untested output to fill the research document.

Both callouts are gated on "when available in the workspace" — installations without find-untested-sources continue working via manual discovery. Adds no behavior change for non-C# repos.
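To make the manual fallback concrete, here is a minimal sketch of name-based source ↔ test pairing. It is not the find-untested-sources helper and does not reproduce its real output format: only the key names source_to_tests and untested come from this PR. The function name, the `Tests`/`Test` suffix convention, and the result shape are all assumptions for illustration.

```python
from pathlib import PurePosixPath

# Assumed naming convention (not taken from the PR): FooTests.cs / FooTest.cs test Foo.cs.
TEST_SUFFIXES = ("Tests", "Test")


def pair_sources_with_tests(paths):
    """Pair C# source files with test files purely by file-name convention.

    Hypothetical helper: returns a dict whose keys mirror the names mentioned
    in the PR, but whose shape is an assumption.
    """
    sources = {}  # stem -> source path
    tests = {}    # stem of the file under test -> list of test paths
    for p in paths:
        path = PurePosixPath(p)
        if path.suffix != ".cs":
            continue  # ignore non-C# files
        stem = path.stem
        for suffix in TEST_SUFFIXES:
            if stem.endswith(suffix) and len(stem) > len(suffix):
                tests.setdefault(stem[: -len(suffix)], []).append(p)
                break
        else:
            sources[stem] = p

    source_to_tests = {
        src: sorted(tests[stem]) for stem, src in sources.items() if stem in tests
    }
    untested = sorted(src for stem, src in sources.items() if stem not in tests)
    return {"source_to_tests": source_to_tests, "untested": untested}


result = pair_sources_with_tests(
    ["src/Parser.cs", "src/Lexer.cs", "tests/ParserTests.cs", "README.md"]
)
print(result)
```

A walk like this is roughly what the pointer asks the researcher to skip when the helper is installed. It ignores namespaces, project references, and same-named files in different folders, which is one reason a purpose-built helper may do better.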

Diff

+4 lines, -0 lines across 2 files. No structural changes, no removed content.

Measurement — honest restatement

I originally claimed "−15.67 % input tokens at neutral pass rate" from a 5×136-instance internal msbench experiment. After re-doing the analysis correctly (per-task mean rather than volume-weighted aggregate, with a non-.NET control bucket), the picture is more modest:

| Bucket | n tasks | Per-task mean Δ input tokens | Per-task median Δ |
| --- | --- | --- | --- |
| .NET tasks | 35 | −8.73 % | −6.98 % |
| non-.NET tasks (control) | 101 | −0.13 % | −1.43 % |

Differential (.NET minus non-.NET): −8.60 pp, Welch's t ≈ −2.02 (right at p ≈ 0.05). Per-task std-dev is ~21 % in both buckets; 5 runs per task is not enough to nail down per-task effects precisely, but the across-task pattern is clean: non-.NET tasks are unchanged (the no-effect result my hypothesis predicts), while .NET tasks shift modestly negative.
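As a rough sanity check on the differential, Welch's t can be recomputed from the summary statistics alone. This sketch assumes the ~21 % figure is the standard deviation of per-task deltas within each bucket and uses exactly 21.0 for both, so it only approximates the reported value; the function name is illustrative.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from per-group mean, standard deviation, and sample size."""
    var_a = sd_a ** 2 / n_a  # squared standard error of group A's mean
    var_b = sd_b ** 2 / n_b  # squared standard error of group B's mean
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df


# Means and task counts from the table; sd = 21.0 is an assumption (the PR says "~21 %").
t_stat, dof = welch_t(-8.73, 21.0, 35, -0.13, 21.0, 101)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

With the rounded inputs this gives a t of about −2.1 on roughly 59 degrees of freedom, in line with the reported t ≈ −2.02; the small gap is expected because the actual per-bucket standard deviations are not given exactly.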

Pass rate was neutral (within noise) on both buckets.

Important caveats

  • The volume-weighted −15 % total was dominated by a handful of very-high-token .NET tasks (e.g. ocelot-core-gen-detailed went 19M → 11.6M); per-task deltas are the right unit of analysis.
  • The helper itself was invoked 0/679 times. The Copilot CLI router did not auto-load it as a sibling skill, so any savings come from the model reading the documentation text in the loaded code-testing-agent SKILL.md, not from the helper executing. The runtime value of the helper is still unmeasured.
  • Several .NET tasks regressed (aspnetcore +44 %, maui +35 %, complog-gen-detailed +13 %), consistent with "the model trusts the heuristic and sometimes skips exploration it shouldn't have skipped."
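The first caveat can be illustrated with a toy calculation. Only the ocelot-core-gen-detailed token counts come from this PR; the three small tasks are hypothetical and chosen to show how one high-volume task can dominate a volume-weighted aggregate while the per-task mean stays modest.

```python
from statistics import mean

# (task, input tokens before, input tokens after), in millions of tokens.
runs = [
    ("ocelot-core-gen-detailed", 19.0, 11.6),  # figures quoted in the PR
    ("small-task-a", 1.0, 1.0),                # hypothetical, unchanged
    ("small-task-b", 1.0, 1.0),                # hypothetical, unchanged
    ("small-task-c", 1.0, 1.0),                # hypothetical, unchanged
]


def volume_weighted_delta(rows):
    """Percent change of total tokens: large tasks carry more weight."""
    before = sum(b for _, b, _ in rows)
    after = sum(a for _, _, a in rows)
    return 100.0 * (after - before) / before


def per_task_mean_delta(rows):
    """Mean of per-task percent changes: every task counts equally."""
    return mean(100.0 * (a - b) / b for _, b, a in rows)


vw = volume_weighted_delta(runs)
pt = per_task_mean_delta(runs)
print(f"volume-weighted: {vw:.1f} %, per-task mean: {pt:.1f} %")
```

In this toy data the volume-weighted figure is more than three times the per-task mean, even though three of the four tasks did not change at all.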

The honest claim is therefore: a modest, marginally significant reduction on .NET tasks, isolated to .NET (control bucket flat), at the cost of higher variance.

Dependency

Depends on #733 (which adds the find-untested-sources skill itself). Safe to merge before #733 lands — the "when available" gating means the pointer is a no-op until the helper is installed.

Test plan

Doc-only change. No code paths affected. Verified diff is the two intended additions only.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds a conditional pointer (gated on 'when available') to the
find-untested-sources skill in two places:

- SKILL.md Step 3 (Research Phase): high-level note for C# / .NET
  multi-file scopes — prefer the helper over manual find/grep/glob walks.

- code-testing-researcher.agent.md Section 7 (Discover Preexisting
  Tests): directive instruction telling the researcher to invoke the
  helper before manually pairing source <-> test files, and to use its
  source_to_tests / untested output to fill the research document.

Both callouts are phrased as 'when available', so installations
without the find-untested-sources skill continue to work via manual
discovery. Adds no behavior change for non-C# repos.

Context: in a 5x136-instance internal experiment on the msbench .NET
test bench, adding equivalent pointers to the routed code-testing-agent
yielded a 15.67% input-token reduction at neutral pass rate — the
model trusted the documented pairing heuristics and skipped its own
discovery walk. The helper itself was not invoked in those runs (the
Copilot CLI router did not auto-load the sibling skill), so the
measured win comes from the doc text causing the model to short-circuit
its manual exploration, not from the helper executing.

Depends on #733 (which adds the find-untested-sources
skill itself).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 8, 2026 12:25
github-actions Bot (Contributor) commented Jun 8, 2026

Skill Coverage Report

| Plugin | Skill | Covered | Coverage |
| --- | --- | --- | --- |
| dotnet-test | code-testing-agent | 4/5 | 80% |

Uncovered: dotnet-test/code-testing-agent
  • [WorkflowStep] Step 2: Invoke the Test Generator (line 83)

Copilot AI left a comment

Pull request overview

Updates code-testing-agent documentation to recommend the find-untested-sources helper (when installed) for C#/.NET multi-file scopes, so the researcher can build source↔test pairing more efficiently without manual repo-wide discovery walks.

Changes:

  • Add a C#/.NET-specific note in SKILL.md Step 3 (Research Phase) pointing researchers to find-untested-sources when available.
  • Add a directive in code-testing-researcher.agent.md Section 7 (Discover Preexisting Tests) to invoke find-untested-sources first and use its untested / source_to_tests output to populate the research doc.

| File | Description |
| --- | --- |
| plugins/dotnet-test/skills/code-testing-agent/SKILL.md | Adds a gated C#/.NET pointer to prefer find-untested-sources for pairing over manual find/grep/glob. |
| plugins/dotnet-test/agents/code-testing-researcher.agent.md | Instructs the researcher to invoke find-untested-sources (when available) before manual source↔test pairing and to use its JSON outputs in research. |

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 0

github-actions Bot added labels Jun 8, 2026: pr-state/ready-for-eval (PR is mergeable and awaiting evaluation), evaluate-now (trigger evaluation.yml for current PR head; transient)