Fix PR-triage eval trigger: dispatch evaluation.yml instead of bot-applied label by JanKrivanek · Pull Request #746 · dotnet/skills

JanKrivanek · 2026-06-10T16:13:58Z

Problem

When the PR-triage worker decided a PR was ready-for-eval, it added the
evaluate-now label and relied on evaluation.yml's
pull_request_target: [labeled] trigger to start evaluation. That trigger never
fires for the bot.

GitHub's recursion guard: events emitted by the default GITHUB_TOKEN do not
start new workflow runs — and labeled is one of them. The worker applies the
label as github-actions[bot] (its GITHUB_TOKEN), so the labeled webhook is
suppressed and no evaluation runs.

Live repro — PR #745

Time (UTC)	Actor	Event	Result
13:33:39Z	`github-actions[bot]`	added `evaluate-now`	no run fired
15:39:24Z	a maintainer	`/evaluate`	evaluation ran

The same worker run (workflow_dispatch, dispatched by the batch via
GITHUB_TOKEN) is itself an A/B proof of the guard:

workflow_dispatch → fired (the worker executed)
its own labeled (the label it added) → did not fire

workflow_dispatch and repository_dispatch are the only token-initiated events
exempt from the recursion guard.

Fix

The worker now dispatches evaluation.yml directly via gh workflow run
(workflow_dispatch) with a pr_number input, instead of adding a label that
can't fire. The dispatch is routed through the existing gate job, so the path
is identical to /evaluate (same permission checks, PR fetch, fork handling,
concurrency group, commit status, and result comment).
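
As a rough illustration of the dispatch described above, the call might look like the sketch below. The helper name dispatch_eval and the DRY_RUN switch are hypothetical and not taken from pr-triage-act.sh; only gh workflow run, evaluation.yml, and the pr_number input come from this PR. The numeric check mirrors the validation the gate job is said to perform, and is included here only as a defensive assumption.

```shell
# Hypothetical helper: dispatch evaluation.yml for one PR.
# DRY_RUN=1 prints the command instead of calling gh (illustration only).
dispatch_eval() {
  pr_number="$1"
  # Accept digits only, so nothing unexpected reaches the workflow input.
  case "$pr_number" in
    ''|*[!0-9]*) echo "invalid pr_number: $pr_number" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "gh workflow run evaluation.yml -f pr_number=$pr_number"
  else
    # Needs a token with actions: write in the calling workflow.
    gh workflow run evaluation.yml -f "pr_number=$pr_number"
  fi
}

DRY_RUN=1 dispatch_eval 746
# prints: gh workflow run evaluation.yml -f pr_number=746
```

Because the dispatch arrives as workflow_dispatch, it is not suppressed by the recursion guard, unlike the labeled event the worker used to rely on.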

evaluation.yml: new workflow_dispatch inputs pr_number + head_sha;
gate.if and the concurrency group gain a third entry point; gate reads and
numerically validates pr_number (via env, no interpolation); discover only
runs a full/plugin eval for a no-pr_number dispatch.
pr-triage-act.sh: do_eval_trigger dispatches evaluation.yml rather
than labeling. Idempotency (eval_run_exists_for_head) gains a second path:
a dispatched run's head_sha is the default branch (not the PR head), so it is
matched by the deterministic run name Evaluate PR #<n> @ <sha7>.
pr-triage.yml: worker granted actions: write (required for
gh workflow run).
The evaluate-now label remains a valid human entry point (a human
adding the label is not subject to the recursion guard).
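
The run-name idempotency path can be sketched roughly as follows. The function names eval_run_name and run_title_exists are hypothetical, the SHA is a placeholder, and the real eval_run_exists_for_head may be structured differently; the only detail taken from this PR is the run name format Evaluate PR #<n> @ <sha7>.

```shell
# Build the deterministic run name from a PR number and a full head SHA.
eval_run_name() {
  sha7=$(printf '%s' "$2" | cut -c1-7)
  printf 'Evaluate PR #%s @ %s\n' "$1" "$sha7"
}

# Succeed if stdin (one run title per line) contains an exact match for $1.
run_title_exists() {
  grep -Fxq -- "$1"
}

# Placeholder titles and SHA, for illustration only.
printf '%s\n' 'Evaluate PR #745 @ abcdef0' 'Evaluate PR #746 @ 0123456' |
  run_title_exists "$(eval_run_name 746 0123456789abcdef0123456789abcdef01234567)" &&
  echo "already dispatched"
# prints: already dispatched
```

In a real check the titles would come from the Actions API rather than printf, for example from gh run list --workflow evaluation.yml --json displayTitle --jq '.[].displayTitle', assuming the run name is what that field reports. Matching on the full line avoids treating a longer SHA prefix or a different PR number as a hit.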

Verification

Recursion-guard behavior is proven empirically from this repo's own history
(see the A/B table above) — workflow_dispatch via GITHUB_TOKEN already
drives the batch → worker chain.
evaluation.yml + pr-triage.yml parse cleanly; pr-triage-act.sh passes
bash -n.

Note: workflow_dispatch always runs the default-branch copy of
evaluation.yml, so the new dispatch path can only be exercised end-to-end
once this is merged to main. The pre-merge evidence above plus syntax
validation is the available verification.

Docs

Updated pr-triage-workflows.md (architecture
diagram + the three evaluation.yml entry points).

The triage worker added the 'evaluate-now' label via GITHUB_TOKEN, but label events emitted by GITHUB_TOKEN do not start workflows (GitHub's recursion guard), so evaluation.yml's pull_request_target:[labeled] entry point never fired for the bot (repro: PR dotnet#745). workflow_dispatch and repository_dispatch are the only token-initiated events exempt from that guard. The worker now dispatches evaluation.yml directly via 'gh workflow run' with a pr_number input, routed through the existing gate job so the path is identical to /evaluate. A dispatched run's head_sha is the default branch (not the PR head), so idempotency now matches the deterministic run name 'Evaluate PR #<n> @ <sha7>'. The 'evaluate-now' label remains a valid human entry point. Worker granted actions:write for 'gh workflow run'.

github-actions · 2026-06-10T16:14:09Z

Note

This PR is from a fork and modifies infrastructure files (eng/ or .github/).

Changes to infrastructure typically need to be submitted from a branch in dotnet/skills (not a fork) so that CI workflows run with the correct permissions and secrets.

Please consider recreating this PR from an upstream branch. If you don't have push access to dotnet/skills, ask a maintainer to push your branch for you.

Copilot

Pull request overview

Updates the PR-triage automation so that when a PR is deemed ready-for-eval, the worker triggers evaluation.yml via workflow_dispatch (instead of relying on a bot-applied label that can’t emit workflow-triggering events under GitHub’s recursion guard).

Changes:

Add workflow_dispatch inputs to evaluation.yml and route pr_number dispatches through the existing gate PR pipeline/concurrency group.
Update the triage worker script to dispatch evaluation.yml via gh workflow run and enhance idempotency detection for dispatch-triggered runs.
Grant the triage worker actions: write so it can dispatch evaluation.yml, and update design docs to reflect the new trigger path.

Show a summary per file

File	Description
`docs/design/pr-triage-workflows.md`	Updates architecture + evaluation entry-point documentation to reflect dispatch-based triggering.
`.github/workflows/pr-triage.yml`	Expands permissions to allow dispatching `evaluation.yml` from the worker.
`.github/workflows/evaluation.yml`	Adds dispatch inputs and updates gate/concurrency logic to support triage dispatch entry point.
`.github/scripts/pr-triage-act.sh`	Switches eval triggering from label application to workflow dispatch; adds dispatch-run idempotency lookup.

Copilot's findings

Files reviewed: 4/4 changed files
Comments generated: 1

github-actions · 2026-06-10T17:28:33Z

👋 @JanKrivanek — this PR has 1 unresolved review thread(s). When you're ready, please address the feedback and push an update; the triage bot will pick up the next state automatically. (Add the no-stale label to silence further pings.)

JanKrivanek · 2026-06-12T13:28:27Z

/evaluate

github-actions · 2026-06-12T13:55:29Z

Skill Validation Results

Skill	Scenario	Quality	Skills Loaded	Overfit	Verdict
technology-selection	ML.NET classification on tabular data	3.0/5 → 4.0/5 🟢	✅ technology-selection; tools: skill, stop_bash / ✅ technology-selection; tools: skill	🟡 0.39	✅
technology-selection	LLM integration with MEAI abstraction	1.0/5 → 1.0/5	⚠️ NOT ACTIVATED	🟡 0.39	❌ [1]
technology-selection	Reject LLM for tabular classification	3.0/5 → 4.0/5 🟢	✅ technology-selection; tools: skill	🟡 0.39	✅
technology-selection	Agentic workflow with guardrails	2.0/5 → 3.0/5 🟢	✅ technology-selection; tools: skill	🟡 0.39	✅
technology-selection	Natural-language scenario decomposition — RAG chatbot	3.0/5 → 4.0/5 🟢	✅ technology-selection; tools: skill / ✅ technology-selection; tools: skill, bash	🟡 0.39	❌ [2]
technology-selection	RAG pipeline with vector search	4.0/5 → 5.0/5 🟢	✅ technology-selection; tools: skill	🟡 0.39	✅
mcp-csharp-debug	Debug an MCP server with MCP Inspector	4.0/5 → 4.0/5	✅ mcp-csharp-debug; tools: report_intent, skill / ✅ mcp-csharp-debug; tools: skill	✅ 0.07	❌ [3]
mcp-csharp-debug	Configure VS Code to use an MCP server	4.0/5 → 4.0/5	✅ mcp-csharp-debug; tools: skill, report_intent, view, glob / ✅ mcp-csharp-debug; tools: skill, report_intent	✅ 0.07	✅
mcp-csharp-debug	Debug a failing MCP server tool	5.0/5 → 4.0/5 🔴	✅ mcp-csharp-debug; tools: report_intent, skill / ✅ mcp-csharp-debug; tools: skill	✅ 0.07	❌
mcp-csharp-publish	Publish an MCP server as a NuGet tool package	3.0/5 → 4.0/5 🟢	✅ mcp-csharp-publish; tools: skill	🟡 0.21	❌ [4]
mcp-csharp-publish	Deploy an HTTP MCP server to Azure Container Apps	3.0/5 → 5.0/5 🟢	✅ mcp-csharp-publish; tools: skill, report_intent, view	🟡 0.21	✅
mcp-csharp-publish	Publish to the MCP Registry	1.0/5 → 3.0/5 🟢	✅ mcp-csharp-publish; tools: skill	🟡 0.21	✅
mcp-csharp-create	Implement MCP tools with proper attributes and DI	4.0/5 → 5.0/5 🟢	✅ mcp-csharp-create; tools: skill, view	✅ 0.12	✅
mcp-csharp-create	Create an HTTP MCP server with tools and resources	4.0/5 → 5.0/5 🟢	✅ mcp-csharp-create; tools: skill	✅ 0.12	✅
mcp-csharp-create	Create an MCP server with tools, prompts, and proper logging	4.0/5 → 5.0/5 🟢	✅ mcp-csharp-create; tools: skill	✅ 0.12	✅
mcp-csharp-test	Write unit and integration tests for an MCP server	3.0/5 → 4.0/5 🟢	✅ mcp-csharp-test; tools: skill, report_intent, view	🟡 0.21	✅
mcp-csharp-test	Test an HTTP MCP server with WebApplicationFactory	3.0/5 → 4.0/5 🟢	✅ mcp-csharp-test; tools: skill, report_intent, view	🟡 0.21	✅
mcp-csharp-test	Create evaluations for an MCP server	2.0/5 → 5.0/5 🟢	✅ mcp-csharp-test; tools: skill, view	🟡 0.21	✅
exp-mock-usage-analysis	Detect unused and unreachable mock setups	3.0/5 → 5.0/5 🟢	✅ exp-mock-usage-analysis; tools: skill	✅ 0.09	✅
exp-mock-usage-analysis	Detect redundant mock configurations duplicated across tests	3.0/5 → 4.0/5 🟢	✅ exp-mock-usage-analysis; tools: skill	✅ 0.09	✅
exp-mock-usage-analysis	Detect mocking of stable framework types	3.0/5 → 5.0/5 🟢	✅ exp-mock-usage-analysis; tools: skill	✅ 0.09	✅
exp-mock-usage-analysis	Analyze mock usage in NSubstitute tests	3.0/5 → 5.0/5 🟢	✅ exp-mock-usage-analysis; tools: skill	✅ 0.09	✅
exp-mock-usage-analysis	Analyze mock usage in FakeItEasy tests	4.0/5 → 5.0/5 🟢	✅ exp-mock-usage-analysis; tools: skill	✅ 0.09	✅
exp-mock-usage-analysis	Detect excessive mock configuration sprawl	3.0/5 → 4.0/5 🟢	✅ exp-mock-usage-analysis; tools: skill	✅ 0.09	✅
exp-test-maintainability	Recommend data-driven patterns with display names for unclear parameters	4.0/5 → 4.0/5	⚠️ NOT ACTIVATED	✅ 0.13	❌ [5]
exp-test-maintainability	Recognize well-maintained tests that need minimal changes	5.0/5 → 5.0/5	⚠️ NOT ACTIVATED	✅ 0.13	❌ [6]
exp-test-maintainability	Detect repeated object construction and setup across test methods	3.0/5 → 4.0/5 🟢	✅ exp-test-maintainability; tools: skill	✅ 0.13	✅
exp-test-maintainability	Recognize tests with minimal boilerplate that need no refactoring	3.0/5 → 5.0/5 🟢	✅ exp-test-maintainability; tools: skill	✅ 0.13	✅
exp-simd-vectorization	Optimize manual min/max with TensorPrimitives	1.0/5 → 4.0/5 🟢	✅ exp-simd-vectorization; tools: skill, create, bash	✅ 0.17	✅
exp-simd-vectorization	Optimize manual product with TensorPrimitives	1.0/5 → 5.0/5 🟢	✅ exp-simd-vectorization; tools: skill, glob, create, bash	✅ 0.17	✅
exp-simd-vectorization	No optimization opportunity — dictionary-based lookup service	5.0/5 → 5.0/5	⚠️ NOT ACTIVATED	✅ 0.17	❌ [7]
exp-simd-vectorization	Optimize int array conditional increment with SIMD	3.0/5 → 3.0/5	✅ exp-simd-vectorization; tools: skill / ⚠️ NOT ACTIVATED	✅ 0.17	✅
exp-simd-vectorization	Optimize byte buffer bit reversal with SIMD	4.0/5 → 4.0/5	✅ exp-simd-vectorization; tools: skill	✅ 0.17	❌ [8]

[1] (Isolated) Quality unchanged but weighted score is -0.8% due to: efficiency metrics
[2] (Isolated) Quality improved but weighted score is -4.2% due to: tokens (54057 → 92250), time (36.5s → 46.3s)
[3] (Plugin) Quality unchanged but weighted score is -8.4% due to: tokens (12732 → 30244), tool calls (0 → 1), time (11.2s → 15.1s)
[4] (Plugin) Quality unchanged but weighted score is -5.1% due to: tokens (38459 → 65947), tool calls (3 → 5)
[5] (Isolated) Quality unchanged but weighted score is -0.1% due to: efficiency metrics
[6] (Plugin) Quality unchanged but weighted score is -0.4% due to: efficiency metrics
[7] (Isolated) Quality unchanged but weighted score is -2.4% due to: judgment
[8] (Isolated) Quality unchanged but weighted score is -8.8% due to: quality, tool calls (9 → 11)

Model: claude-opus-4.6 | Judge: claude-opus-4.6

🔍 Full Results - additional metrics and failure investigation steps

To investigate failures, paste this to your AI coding agent:

For PR 746 in dotnet/skills, download eval artifacts with gh run download 27418730270 --repo dotnet/skills --pattern "skill-validator-results-*" --dir ./eval-results, then fetch https://raw.githubusercontent.com/dotnet/skills/5cbaa520aa377f40dc753209ddb67edddb61c16b/eng/skill-validator/src/docs/InvestigatingResults.md and follow it to analyze the results.json files. Diagnose each failure, suggest fixes to the eval.yaml and skill content, and tell me what to fix first.

▶ Sessions Visualisation -- interactive replay of all evaluation sessions
📊 Session Analytics (preview) -- aggregated metrics across evaluation sessions

JanKrivanek marked this pull request as ready for review June 10, 2026 16:30

JanKrivanek requested review from ViktorHofer and timheuer as code owners June 10, 2026 16:30

Copilot AI review requested due to automatic review settings June 10, 2026 16:30

JanKrivanek requested a review from dbreshears as a code owner June 10, 2026 16:30

JanKrivanek enabled auto-merge (squash) June 10, 2026 16:30

Copilot started reviewing on behalf of JanKrivanek June 10, 2026 16:30

Copilot AI reviewed Jun 10, 2026

Comment thread .github/workflows/evaluation.yml

github-actions Bot added the waiting-on-author (PR state label) label Jun 10, 2026

github-actions Bot added the pr-state/ready-for-eval (PR is mergeable and awaiting evaluation) and evaluate-now (Trigger evaluation.yml for current PR head (transient)) labels and removed the waiting-on-author (PR state label) label Jun 12, 2026

Evangelink approved these changes Jun 12, 2026

JanKrivanek merged commit 0c0f6f0 into dotnet:main Jun 12, 2026
23 checks passed

github-actions Bot added a commit that referenced this pull request Jun 12, 2026

Update PR token usage data (PR #746)

550088c

This was referenced Jun 12, 2026

Fix evaluation.yml run-name: quote expression so '#' isn't a YAML comment #759

Open

Add authoring-github-workflows skill + actionlint CI gate (prevent workflow-YAML breakage) #760

Open

JanKrivanek merged 1 commit into dotnet:main from JanKrivanek:fix/triage-eval-dispatch