fix: improve agent-plugin-review skill to pass 9/9 evals (#779) #783
Merged
…779)

Skill improvements:
- Add explicit checks for file path format (leading slash) and repeated inputs in eval YAML
- Add hard gate detection recipe for multi-phase workflows
- Update workflow-checklist example to use concrete deploy-plan artifact

Pi-cli fix:
- Strip competing provider env vars when subprovider is explicitly configured (AZURE_OPENAI_* vars were overriding --provider flag)
- Add subprovider: openrouter to pi-cli target config

All 9 agent-plugin-review eval tests now pass (was 6/9).

Closes #779

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Deploying agentv with
| Latest commit: | 9480f01 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://41830288.agentv.pages.dev |
| Branch Preview URL: | https://fix-779-plugin-review-evals.agentv.pages.dev |
Summary
- …/), repeated inputs across test cases, and missing hard gates in multi-phase workflows
- …`subprovider` is explicitly configured, strip competing provider env vars from the spawned process (AZURE_OPENAI_* was overriding `--provider openrouter`)
- …`subprovider: openrouter` to pi-cli target so it uses the intended provider

Test results
Before: 0/9 pass (pi-cli was silently using Azure OpenAI, returning empty responses)
After: 9/9 pass, mean score 1.000 (verified across 2 consecutive runs)
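The pi-cli fix described in the summary (dropping competing provider env vars before spawning the process) could look roughly like the sketch below. This is an illustration only, not the code from this PR: the function name `buildChildEnv`, the `PROVIDER_ENV_PREFIXES` table, and the `OPENROUTER_` prefix are assumptions; only the `AZURE_OPENAI_*` prefix and the `openrouter` subprovider come from the description.

```typescript
// Hypothetical sketch: names here are illustrative, not taken from the agentv codebase.
type Env = Record<string, string | undefined>;

// Assumed mapping from provider name to the env var prefixes that can select it.
// Only "AZURE_OPENAI_" is mentioned in the PR; the other entry is a placeholder.
const PROVIDER_ENV_PREFIXES: Record<string, string[]> = {
  azure: ["AZURE_OPENAI_"],
  openrouter: ["OPENROUTER_"],
};

function buildChildEnv(parentEnv: Env, subprovider?: string): Env {
  // Without an explicit subprovider, pass the environment through unchanged.
  if (!subprovider) {
    return { ...parentEnv };
  }

  // Collect the prefixes belonging to every provider other than the chosen one.
  const competingPrefixes: string[] = [];
  for (const [name, prefixes] of Object.entries(PROVIDER_ENV_PREFIXES)) {
    if (name !== subprovider) {
      competingPrefixes.push(...prefixes);
    }
  }

  // Copy every variable except those that would select a competing provider.
  const childEnv: Env = {};
  for (const [key, value] of Object.entries(parentEnv)) {
    const isCompeting = competingPrefixes.some((prefix) => key.startsWith(prefix));
    if (!isCompeting) {
      childEnv[key] = value;
    }
  }
  return childEnv;
}
```

The result would be passed as the `env` option when spawning the CLI (for example via Node's `child_process.spawn`), so the parent process environment itself is never mutated. Filtering by prefix rather than by an exact variable list is a design choice made for this sketch; the actual fix may enumerate variables differently.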
Test plan
- …`--target pi-cli`

Closes #779
🤖 Generated with Claude Code