Fix dotnet-msbuild Codex plugin install by externalizing mcpServers #740

Merged: AbhitejJohn merged 2 commits into main from abhitejjohn/fix-msbuild-mcp-servers-inline on Jun 10, 2026

AbhitejJohn commented Jun 9, 2026

Summary

Fixes #738 — thanks @grevgeny for filing this with a clear repro and root cause analysis!

The dotnet-msbuild Codex plugin fails to install because the mcpServers configuration is embedded inline in plugin.json as an object. Per the Codex plugin spec, mcpServers should be a string path pointing to an external .mcp.json file.

Changes

  1. Created plugins/dotnet-msbuild/.codex-plugin/.mcp.json containing the MCP server configuration
  2. Updated plugins/dotnet-msbuild/.codex-plugin/plugin.json to reference ./.mcp.json instead of embedding the config inline
  3. Root plugins/dotnet-msbuild/plugin.json retains inline mcpServers for the skill-validator
  4. Updated the skill-validator (FindPluginMcpServers and ExternalDependencyChecker.CheckPlugin) to also handle the Codex string path format for forward compatibility
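
The dual-format handling described in item 4 can be pictured with a short sketch. It is written in Python rather than the validator's C#, and `resolve_mcp_servers` is a hypothetical helper, not the actual `FindPluginMcpServers` code. It also assumes the external file wraps the server map in a top-level `mcpServers` key, which this PR does not confirm; the server command shown is a placeholder.

```python
import json
import tempfile
from pathlib import Path


def resolve_mcp_servers(plugin_dir: Path) -> dict:
    """Return the MCP server map declared by plugin.json in plugin_dir.

    Hypothetical helper: accepts either an inline object or a string path
    to an external file, mirroring the two formats described in the PR.
    """
    manifest = json.loads((plugin_dir / "plugin.json").read_text(encoding="utf-8"))
    value = manifest.get("mcpServers")
    if value is None:
        return {}
    if isinstance(value, dict):
        # Inline format (what the root plugin.json keeps).
        return value
    if isinstance(value, str):
        # Codex format: a relative path to an external file.
        external = json.loads((plugin_dir / value).read_text(encoding="utf-8"))
        # Assumption: the external file nests the servers under "mcpServers";
        # fall back to treating the whole document as the server map.
        return external.get("mcpServers", external)
    raise ValueError(f"unsupported mcpServers type: {type(value).__name__}")


def demo() -> tuple[dict, dict]:
    """Build one inline and one externalized manifest and resolve both."""
    # Placeholder server definition, not the real binlog configuration.
    servers = {"binlog": {"command": "example-binlog-mcp", "args": []}}
    with tempfile.TemporaryDirectory() as tmp:
        inline_dir = Path(tmp) / "inline"
        codex_dir = Path(tmp) / "codex"
        inline_dir.mkdir()
        codex_dir.mkdir()
        (inline_dir / "plugin.json").write_text(json.dumps({"mcpServers": servers}))
        (codex_dir / "plugin.json").write_text(json.dumps({"mcpServers": "./.mcp.json"}))
        (codex_dir / ".mcp.json").write_text(json.dumps({"mcpServers": servers}))
        return resolve_mcp_servers(inline_dir), resolve_mcp_servers(codex_dir)
```

Calling `demo()` resolves both layouts to the same server map, which is the property the validator change is after: the root manifest can stay inline while the Codex manifest points at a file.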

Verification

This matches the fix verified locally by the issue reporter, and aligns with the Codex plugin-json-spec which states:

apps and mcpServers should appear in plugin.json only when .app.json and .mcp.json actually exist.

github-actions Bot commented Jun 9, 2026

Skill Coverage Report

| Plugin | Skill | Covered | Coverage |
|---|---|---|---|
| ⚠️ dotnet-msbuild | check-bin-obj-clash | 9/13 | 69.2% |
| dotnet-msbuild | directory-build-organization | 0/1 | 0% |
| dotnet-msbuild | extension-points | 0/1 | 0% |
| dotnet-msbuild | msbuild-modernization | 6/7 | 85.7% |
| ⚠️ dotnet-msbuild | msbuild-server | 7/9 | 77.8% |
| dotnet-msbuild | property-patterns | 0/1 | 0% |
| dotnet-msbuild | resolve-project-references | 5/6 | 83.3% |

Uncovered: dotnet-msbuild/check-bin-obj-clash
  • [WorkflowStep] Step 2: Get an overview and list projects (line 48)
  • [WorkflowStep] Step 5: Check for double writes (line 62)
  • [WorkflowStep] Step 2: Replay the Binary Log to Text (line 78)
  • [WorkflowStep] Step 3: List All Projects (line 84)
Uncovered: dotnet-msbuild/directory-build-organization
  • [CodePattern] [MSBuild] (line 122)
Uncovered: dotnet-msbuild/extension-points
  • [CodePattern] [MSBuild] (line 184)
Uncovered: dotnet-msbuild/msbuild-modernization
  • [WorkflowStep] Step 6: Remove Unnecessary Boilerplate (line 217)
Uncovered: dotnet-msbuild/msbuild-server
  • [Validation] MSBUILDUSESERVER=1 is set in the shell (line 59)
  • [Validation] Second sequential build is faster than the first (line 60)
Uncovered: dotnet-msbuild/property-patterns
  • [CodePattern] [MSBuild] (line 33)
Uncovered: dotnet-msbuild/resolve-project-references
  • [Validation] ResolveProjectReferences was not set as the optimization target (line 64)

Copilot AI left a comment

Pull request overview

This PR updates the dotnet-msbuild plugin manifests to match the Codex plugin spec by externalizing the MCP server configuration into a dedicated .mcp.json file, fixing marketplace installation failures caused by an invalid inline mcpServers shape.

Changes:

  • Added plugins/dotnet-msbuild/.mcp.json containing the binlog MCP server definition.
  • Updated plugins/dotnet-msbuild/plugin.json to reference mcpServers via ./.mcp.json (string path) instead of an inline object.
  • Updated plugins/dotnet-msbuild/.codex-plugin/plugin.json similarly.

Summary per file:

| File | Description |
|---|---|
| plugins/dotnet-msbuild/plugin.json | Switches mcpServers from inline object to external file reference for Codex compatibility. |
| plugins/dotnet-msbuild/.mcp.json | Introduces the externalized MCP server configuration payload. |
| plugins/dotnet-msbuild/.codex-plugin/plugin.json | Mirrors the external mcpServers reference for the Codex-specific manifest. |

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 1

Comment thread plugins/dotnet-msbuild/plugin.json Outdated
AbhitejJohn force-pushed the abhitejjohn/fix-msbuild-mcp-servers-inline branch from ad6e5da to 084f874 Compare June 9, 2026 17:28
Move the inline mcpServers configuration to a separate .mcp.json file
inside .codex-plugin/, and update .codex-plugin/plugin.json to reference
it via a relative path string. This matches the Codex plugin spec which
requires mcpServers to be a path reference to a .mcp.json file rather
than an embedded object.

The root plugin.json retains the inline mcpServers format for the
skill-validator. Both validator methods (FindPluginMcpServers and
ExternalDependencyChecker.CheckPlugin) are updated to also handle the
Codex string path format for forward compatibility.

Fixes #738

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 9, 2026 17:39
AbhitejJohn force-pushed the abhitejjohn/fix-msbuild-mcp-servers-inline branch from 084f874 to c47aa13 Compare June 9, 2026 17:39

Copilot AI left a comment

Copilot's findings

  • Files reviewed: 4/4 changed files
  • Comments generated: 3

Comment thread eng/skill-validator/src/Evaluate/EvaluateCommand.cs Outdated
Comment thread eng/skill-validator/src/Evaluate/EvaluateCommand.cs Outdated
Comment thread eng/skill-validator/src/Check/ExternalDependencyChecker.cs Outdated
- Reject rooted paths and paths with '..' segments to prevent directory
  traversal when resolving mcpServers string references.
- Log parse errors in ResolveMcpFile to stderr (consistent with
  plugin.json parse error handling) instead of silently swallowing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
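
The traversal guard named in the commit message above can be illustrated with a small sketch. It is Python, not the validator's C#, and `is_safe_relative_reference` is a hypothetical name; the real check in the skill-validator may differ in detail. The sketch evaluates the reference under both POSIX and Windows path rules so that a rooted path in either style is rejected regardless of the host OS.

```python
from pathlib import PurePosixPath, PureWindowsPath


def is_safe_relative_reference(reference: str) -> bool:
    """Return True if reference is a non-rooted path with no '..' segment."""
    if not reference:
        return False
    # Check both flavours so "C:\\x" and "/x" are rejected on any host OS.
    for flavour in (PurePosixPath, PureWindowsPath):
        path = flavour(reference)
        # anchor is the drive plus root; non-empty means the path is rooted
        # or drive-qualified.
        if path.is_absolute() or path.anchor:
            return False
        # pathlib does not collapse '..', so any such segment stays visible.
        if ".." in path.parts:
            return False
    return True
```

With this check, `./.mcp.json` is accepted, while `/etc/passwd`, `../.mcp.json`, and `C:\temp\.mcp.json` are refused before any file is opened.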
AbhitejJohn commented

/evaluate

github-actions Bot added a commit that referenced this pull request Jun 9, 2026
github-actions Bot commented Jun 9, 2026

Skill Validation Results

Skill Scenario Quality Skills Loaded Overfit Verdict
convert-blazor-server-to-webapp Blazor Server app with CascadingAuthenticationState 4.0/5 → 5.0/5 🟢 ✅ convert-blazor-server-to-webapp; tools: report_intent, skill ✅ 0.04
configuring-opentelemetry-dotnet Set up OpenTelemetry tracing and metrics with custom spans in ASP.NET Core 4.0/5 → 5.0/5 🟢 ✅ configuring-opentelemetry-dotnet; tools: skill ✅ 0.19
configuring-opentelemetry-dotnet Configure all three OpenTelemetry signals with correct OTLP export 4.0/5 → 5.0/5 🟢 ✅ configuring-opentelemetry-dotnet; tools: report_intent, skill ✅ 0.19
configuring-opentelemetry-dotnet Propagate trace context across a message queue 5.0/5 → 5.0/5 ✅ configuring-opentelemetry-dotnet; tools: skill ✅ 0.19 [1]
minimal-api-file-upload Implement secure file upload in ASP.NET Core 8 minimal API 4.0/5 → 5.0/5 🟢 ✅ minimal-api-file-upload; tools: report_intent, skill / ✅ minimal-api-file-upload; tools: skill ✅ 0.09
minimal-api-file-upload Upload multiple files with metadata in minimal API 4.0/5 → 5.0/5 🟢 ✅ minimal-api-file-upload; tools: report_intent, skill / ✅ minimal-api-file-upload; tools: skill ✅ 0.09
minimal-api-file-upload Stream very large file uploads without buffering 5.0/5 → 5.0/5 ✅ minimal-api-file-upload; tools: skill ✅ 0.09 [2]
dotnet-webapi Create a CRUD Web API with minimal APIs, OpenAPI, and proper HTTP semantics 4.0/5 → 5.0/5 🟢 ✅ dotnet-webapi; tools: skill / ✅ dotnet-webapi; tools: skill, stop_bash 🟡 0.31
dotnet-webapi Add error handling with ProblemDetails and IExceptionHandler 2.0/5 → 4.0/5 🟢 ✅ dotnet-webapi; tools: skill, report_intent, view, glob / ✅ dotnet-webapi; tools: report_intent, skill, view, glob 🟡 0.31
dotnet-webapi Add a new API endpoint to an existing controller-based project 3.0/5 → 4.0/5 🟢 ✅ dotnet-webapi; tools: skill, view / ✅ dotnet-webapi; tools: skill, view, edit 🟡 0.31
microbenchmarking Investigate runtime upgrade performance impact 3.0/5 → 4.0/5 🟢 ✅ microbenchmarking; tools: skill ✅ 0.13
clr-activation-debugging Diagnose unexpected FOD dialog from native build tool 5.0/5 → 5.0/5 ✅ clr-activation-debugging; tools: skill ✅ 0.08 [3]
clr-activation-debugging Diagnose FOD suppressed but activation still failing 3.0/5 → 5.0/5 🟢 ✅ clr-activation-debugging; tools: skill, glob ✅ 0.08
clr-activation-debugging Explain why same binary behaves differently under different launch methods 5.0/5 → 5.0/5 ✅ clr-activation-debugging; tools: skill / ✅ clr-activation-debugging; tools: skill, glob ✅ 0.08 [4]
clr-activation-debugging Analyze healthy managed EXE activation 5.0/5 → 5.0/5 ✅ clr-activation-debugging; tools: skill ✅ 0.08 [5]
clr-activation-debugging Identify multiple activation sequences in a single log 5.0/5 → 5.0/5 ✅ clr-activation-debugging; tools: skill, glob / ✅ clr-activation-debugging; tools: glob, skill ✅ 0.08 [6]
clr-activation-debugging Explain useLegacyV2RuntimeActivationPolicy in activation log 3.0/5 → 3.0/5 ✅ clr-activation-debugging; tools: report_intent, skill / ✅ clr-activation-debugging; tools: skill, report_intent, view ✅ 0.08
clr-activation-debugging Decline non-CLR-activation issue 1.0/5 → 5.0/5 🟢 ✅ clr-activation-debugging; tools: skill, glob / ℹ️ not activated (expected) ✅ 0.08
android-tombstone-symbolication Symbolicate .NET frames in an Android tombstone 4.0/5 → 3.0/5 🔴 ✅ android-tombstone-symbolication; tools: skill ✅ 0.14 [7]
android-tombstone-symbolication Recognize tombstone with no .NET frames 5.0/5 → 5.0/5 ✅ android-tombstone-symbolication; tools: skill ✅ 0.14 [8]
android-tombstone-symbolication Symbolicate CoreCLR frames in an Android tombstone 3.0/5 → 4.0/5 🟢 ✅ android-tombstone-symbolication; tools: skill / ✅ android-tombstone-symbolication; tools: skill, read_bash ✅ 0.14
android-tombstone-symbolication Recognize NativeAOT tombstone with app binary and libSystem.Native.so 3.0/5 → 4.0/5 🟢 ✅ android-tombstone-symbolication; tools: skill, bash ✅ 0.14
android-tombstone-symbolication Symbolicate multi-thread tombstone 2.0/5 ⏰ → 5.0/5 🟢 ✅ android-tombstone-symbolication; tools: skill ✅ 0.14
android-tombstone-symbolication Handle .NET frames with no BuildId metadata 4.0/5 → 5.0/5 🟢 ✅ android-tombstone-symbolication; tools: skill ✅ 0.14
android-tombstone-symbolication Symbolicate tombstone with multiple .NET libraries and different BuildIds 3.0/5 → 2.0/5 ⏰ 🔴 ✅ android-tombstone-symbolication; tools: skill, read_bash / ✅ android-tombstone-symbolication; tools: skill ✅ 0.14
android-tombstone-symbolication Reject iOS crash log as wrong format 4.0/5 → 5.0/5 🟢 ℹ️ not activated (expected) ✅ 0.14 [9]
apple-crash-symbolication Parse .NET frames and locate dSYMs from an iOS crash log 3.0/5 → 3.0/5 ✅ apple-crash-symbolication; tools: skill, bash ✅ 0.20
apple-crash-symbolication Investigate root cause of a .NET MAUI iOS crash 2.0/5 → 4.0/5 🟢 ✅ apple-crash-symbolication; tools: skill, bash ✅ 0.20
apple-crash-symbolication Reject Android tombstone passed as iOS crash log 4.0/5 → 4.0/5 ℹ️ not activated (expected) ✅ 0.20 [10]
dump-collect Configure automatic crash dumps for CoreCLR app on Linux 4.0/5 → 5.0/5 🟢 ✅ dump-collect; tools: report_intent, skill, view / ✅ dump-collect; tools: skill, report_intent, view 🟡 0.28
dump-collect Set up NativeAOT crash dumps with createdump in Kubernetes 3.0/5 → 5.0/5 🟢 ✅ dump-collect; tools: skill, view 🟡 0.28
dump-collect Recover crash dump from macOS NativeAOT without createdump 4.0/5 → 4.0/5 ⚠️ NOT ACTIVATED 🟡 0.28
dump-collect Configure CoreCLR dump collection in Alpine Docker as non-root 3.0/5 → 5.0/5 🟢 ✅ dump-collect; tools: report_intent, skill, view / ✅ dump-collect; tools: skill, report_intent, view 🟡 0.28
dump-collect Advisory: macOS NativeAOT crash dump recovery steps 4.0/5 → 5.0/5 🟢 ✅ dump-collect; tools: skill 🟡 0.28
dump-collect Advisory: CoreCLR Alpine Docker non-root configuration 4.0/5 → 5.0/5 🟢 ✅ dump-collect; tools: skill, report_intent, view 🟡 0.28
dump-collect Advisory: NativeAOT Kubernetes dump collection setup 3.0/5 → 5.0/5 🟢 ✅ dump-collect; tools: skill, report_intent, view 🟡 0.28
dump-collect Detect runtime and configure crash dumps for unknown .NET app on Linux 4.0/5 → 5.0/5 🟢 ✅ dump-collect; tools: skill, view / ✅ dump-collect; tools: skill 🟡 0.28
dump-collect Decline dump analysis request 2.0/5 → 2.0/5 ℹ️ not activated (expected) 🟡 0.28 [11]
dotnet-trace-collect High CPU in Kubernetes on Linux (.NET 8) 4.0/5 → 4.0/5 ✅ dotnet-trace-collect; tools: skill, report_intent, view ✅ 0.14 [12]
dotnet-trace-collect .NET Framework on Windows without admin privileges 2.0/5 → 5.0/5 🟢 ✅ dotnet-trace-collect; tools: report_intent, skill / ✅ dotnet-trace-collect; tools: skill ✅ 0.14
dotnet-trace-collect .NET 10 on Linux with root access and native call stacks 1.0/5 → 4.0/5 🟢 ✅ dotnet-trace-collect; tools: skill, report_intent, view / ✅ dotnet-trace-collect; tools: report_intent, skill, view ✅ 0.14
dotnet-trace-collect Memory leak on Linux (.NET 8) 3.0/5 → 3.0/5 ✅ dotnet-trace-collect; tools: report_intent, skill, view / ⚠️ NOT ACTIVATED ✅ 0.14 [13]
dotnet-trace-collect Slow requests on Windows with PerfView 5.0/5 → 5.0/5 ✅ dotnet-trace-collect; tools: skill, report_intent, view / ✅ dotnet-trace-collect; tools: report_intent, skill ✅ 0.14 [14]
dotnet-trace-collect Excessive GC on Linux (.NET 8) 4.0/5 → 5.0/5 🟢 ✅ dotnet-trace-collect; tools: report_intent, skill, view / ✅ dotnet-trace-collect; tools: report_intent, skill ✅ 0.14
dotnet-trace-collect Hang or deadlock diagnosis on Linux 3.0/5 → 4.0/5 🟢 ✅ dotnet-trace-collect; tools: skill, report_intent, view ✅ 0.14
dotnet-trace-collect Windows container high CPU with PerfView 2.0/5 → 5.0/5 🟢 ✅ dotnet-trace-collect; tools: report_intent, skill, view / ✅ dotnet-trace-collect; tools: report_intent, skill ✅ 0.14
dotnet-trace-collect Long-running intermittent issue with PerfView triggers 3.0/5 → 4.0/5 🟢 ✅ dotnet-trace-collect; tools: skill, view ✅ 0.14
dotnet-trace-collect Linux pre-.NET 10 needing native call stacks 3.0/5 → 5.0/5 🟢 ✅ dotnet-trace-collect; tools: report_intent, skill, view ✅ 0.14
dotnet-trace-collect Windows modern .NET with admin high CPU 2.0/5 → 5.0/5 🟢 ✅ dotnet-trace-collect; tools: report_intent, skill, view / ⚠️ NOT ACTIVATED ✅ 0.14
dotnet-trace-collect Memory leak on .NET Framework Windows 4.0/5 → 5.0/5 🟢 ✅ dotnet-trace-collect; tools: report_intent, skill, view / ⚠️ NOT ACTIVATED ✅ 0.14 [15]
dotnet-trace-collect Kubernetes with console access prefers console tools 5.0/5 → 5.0/5 ✅ dotnet-trace-collect; tools: skill, view ✅ 0.14 [16]
dotnet-trace-collect Container installation without .NET SDK 4.0/5 → 4.0/5 ✅ dotnet-trace-collect; tools: skill, report_intent, view / ✅ dotnet-trace-collect; tools: report_intent, skill, view ✅ 0.14 [17]
dotnet-trace-collect HTTP 500s from downstream service on Linux (.NET 8) 5.0/5 → 5.0/5 ✅ dotnet-trace-collect; tools: report_intent, skill, view / ✅ dotnet-trace-collect; tools: skill ✅ 0.14 [18]
dotnet-trace-collect Networking timeouts on Windows with admin (.NET 8) 2.0/5 → 4.0/5 🟢 ✅ dotnet-trace-collect; tools: skill, report_intent, view ✅ 0.14
dotnet-trace-collect Assembly loading failure on Linux (.NET 8) 4.0/5 → 5.0/5 🟢 ✅ dotnet-trace-collect; tools: skill, report_intent, view ✅ 0.14
analyzing-dotnet-performance Detects compiled regex startup budget and regex chain allocations 4.0/5 → 5.0/5 🟢 ✅ analyzing-dotnet-performance; tools: skill ✅ 0.16
analyzing-dotnet-performance Detects CurrentCulture comparer and compiled regex budget in inflection rules 5.0/5 → 5.0/5 ✅ analyzing-dotnet-performance; tools: skill ✅ 0.16 [19]
analyzing-dotnet-performance Finds per-call Dictionary allocation not hoisted to static 5.0/5 → 5.0/5 ✅ analyzing-dotnet-performance; tools: skill, bash / ✅ analyzing-dotnet-performance; tools: skill, grep ✅ 0.16 [20]
analyzing-dotnet-performance Catches compound allocations in recursive number converter with ToLower 3.0/5 → 5.0/5 🟢 ✅ analyzing-dotnet-performance; tools: skill, grep / ✅ analyzing-dotnet-performance; tools: skill ✅ 0.16
analyzing-dotnet-performance Finds StringComparison.Ordinal missing and FrozenDictionary opportunities 5.0/5 → 5.0/5 ✅ analyzing-dotnet-performance; tools: skill ✅ 0.16 [21]
analyzing-dotnet-performance Detects Aggregate+Replace chain and struct missing IEquatable 3.0/5 → 5.0/5 🟢 ✅ analyzing-dotnet-performance; tools: skill, bash / ⚠️ NOT ACTIVATED ✅ 0.16
analyzing-dotnet-performance Finds branched Replace chain in format string manipulation 3.0/5 → 4.0/5 🟢 ✅ analyzing-dotnet-performance; tools: skill ✅ 0.16
analyzing-dotnet-performance Catches LINQ on hot-path string processing and All(char.IsUpper) 4.0/5 → 5.0/5 🟢 ✅ analyzing-dotnet-performance; tools: skill / ✅ analyzing-dotnet-performance; tools: skill, grep ✅ 0.16
analyzing-dotnet-performance Detects LINQ pipeline in TimeSpan formatting and collection processing 3.0/5 → 3.0/5 ✅ analyzing-dotnet-performance; tools: skill, grep / ✅ analyzing-dotnet-performance; tools: skill, bash ✅ 0.16 [22]
analyzing-dotnet-performance Flags Span inconsistencies and compound method chains in truncation library 4.0/5 → 4.0/5 ✅ analyzing-dotnet-performance; tools: skill ✅ 0.16
analyzing-dotnet-performance Identifies unsealed leaf classes and locale hierarchy patterns 3.0/5 → 5.0/5 🟢 ✅ analyzing-dotnet-performance; tools: skill, grep ✅ 0.16

[1] (Isolated) Quality unchanged but weighted score is -24.5% due to: judgment, quality, tokens (13416 → 29254), tool calls (0 → 1)
[2] (Plugin) Quality unchanged but weighted score is -1.9% due to: tokens (13513 → 30114), tool calls (0 → 1)
[3] (Plugin) Quality unchanged but weighted score is -5.9% due to: tokens (42914 → 79688), time (25.3s → 36.7s), tool calls (5 → 6)
[4] (Isolated) Quality unchanged but weighted score is -1.4% due to: tokens (40849 → 72406), tool calls (4 → 5)
[5] (Plugin) Quality unchanged but weighted score is -2.3% due to: tokens (38869 → 75344), tool calls (3 → 4), time (15.0s → 19.5s)
[6] (Plugin) Quality unchanged but weighted score is -3.6% due to: tokens (40984 → 82567), tool calls (3 → 6), time (20.0s → 58.5s)
[7] (Isolated) Quality dropped but weighted score is +3.4% due to: errors (1 → 0), tokens (127466 → 65533), time (122.3s → 53.4s), tool calls (10 → 6)
[8] (Plugin) Quality unchanged but weighted score is -6.6% due to: tokens (26376 → 48699), tool calls (2 → 3), time (13.1s → 19.2s)
[9] (Plugin) Quality unchanged but weighted score is -24.0% due to: judgment, tokens (26789 → 67470), quality, tool calls (2 → 4), time (19.4s → 23.4s)
[10] (Plugin) Quality improved but weighted score is -13.0% due to: completion (✓ → ✗), tokens (27339 → 164176), tool calls (2 → 9), time (20.7s → 74.4s)
[11] (Isolated) Quality unchanged but weighted score is -1.3% due to: time (7.3s → 10.7s)
[12] (Isolated) Quality unchanged but weighted score is -17.8% due to: judgment, tokens (13048 → 53231), tool calls (0 → 3), time (14.2s → 22.5s)
[13] (Plugin) Quality unchanged but weighted score is -17.1% due to: completion (✓ → ✗), tokens (12950 → 46752), tool calls (0 → 3), time (14.4s → 20.4s)
[14] (Plugin) Quality unchanged but weighted score is -1.5% due to: tokens (12861 → 34969), tool calls (0 → 2)
[15] (Plugin) Quality unchanged but weighted score is -8.7% due to: tokens (12874 → 29965), tool calls (0 → 2), time (12.5s → 18.7s)
[16] (Plugin) Quality unchanged but weighted score is -9.1% due to: tokens (25314 → 56843), tool calls (1 → 3), time (12.0s → 19.9s)
[17] (Plugin) Quality unchanged but weighted score is -4.8% due to: tokens (12911 → 56778), tool calls (0 → 3), time (12.1s → 18.4s)
[18] (Plugin) Quality unchanged but weighted score is -11.5% due to: tokens (13036 → 35025), quality, tool calls (0 → 1), time (13.3s → 17.6s)
[19] (Plugin) Quality unchanged but weighted score is -10.0% due to: tokens (29401 → 78866), tool calls (2 → 6), time (21.8s → 50.5s)
[20] (Plugin) Quality unchanged but weighted score is -8.7% due to: quality, tokens (76678 → 118508), tool calls (7 → 14)
[21] (Plugin) Quality unchanged but weighted score is -6.8% due to: tokens (29702 → 79788), tool calls (2 → 6), time (19.8s → 49.2s)
[22] (Plugin) Quality unchanged but weighted score is -19.6% due to: quality, tokens (29819 → 77119), tool calls (2 → 4), time (20.7s → 44.7s)

⏰ timeout: run(s) hit the 300s scenario timeout limit; scoring may be affected because model execution was aborted before it could produce its full output (increase via timeout in eval.yaml)

Model: claude-opus-4.6 | Judge: claude-opus-4.6

🔍 Full Results - additional metrics and failure investigation steps

To investigate failures, paste this to your AI coding agent:

For PR 740 in dotnet/skills, download eval artifacts with gh run download 27233380552 --repo dotnet/skills --pattern "skill-validator-results-*" --dir ./eval-results, then fetch https://raw.githubusercontent.com/dotnet/skills/1467f0d57f201ca957d2008a3a416a9d157e53e8/eng/skill-validator/src/docs/InvestigatingResults.md and follow it to analyze the results.json files. Diagnose each failure, suggest fixes to the eval.yaml and skill content, and tell me what to fix first.

▶ Sessions Visualisation -- interactive replay of all evaluation sessions
📊 Session Analytics (preview) -- aggregated metrics across evaluation sessions

AbhitejJohn merged commit c85a2d8 into main Jun 10, 2026
40 checks passed
AbhitejJohn deleted the abhitejjohn/fix-msbuild-mcp-servers-inline branch June 10, 2026 00:07
Development

Successfully merging this pull request may close these issues.

dotnet-msbuild Codex plugin fails to install because mcpServers is embedded inline

3 participants