fix: improve agent-plugin-review skill to pass 9/9 evals (#779) #783

Merged
christso merged 2 commits into main from fix/779-plugin-review-evals
Mar 26, 2026

@christso
Collaborator

Summary

  • Skill improvements: Added explicit guidance for detecting relative file paths (missing leading /), repeated inputs across test cases, and missing hard gates in multi-phase workflows
  • Pi-cli env isolation: When a subprovider is explicitly configured, strip competing provider env vars from the spawned process (AZURE_OPENAI_* was overriding --provider openrouter)
  • Target config: Added subprovider: openrouter to pi-cli target so it uses the intended provider
  • Lint fix: Fixed pre-existing biome lint errors in workspace setup script
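The env isolation item above can be illustrated with a minimal sketch. It is not the code from this PR: the function name `isolateProviderEnv` and the prefix table are hypothetical, and only the `AZURE_OPENAI_*` prefix is actually named in this description. The idea is that when a subprovider is explicitly set, variables belonging to any other provider are dropped before the environment is handed to the spawned process.

```typescript
// Hypothetical mapping from provider name to the env-var prefixes it reads.
// Only AZURE_OPENAI_* comes from this PR; the other entries are assumptions.
const PROVIDER_ENV_PREFIXES: Record<string, string[]> = {
  azure: ["AZURE_OPENAI_"],
  openrouter: ["OPENROUTER_"],
};

type Env = Record<string, string | undefined>;

// Returns a copy of `env` without variables that belong to competing providers.
function isolateProviderEnv(env: Env, subprovider?: string): Env {
  // No explicit subprovider: return an unmodified copy.
  if (!subprovider) return { ...env };

  // Collect the prefixes of every provider except the configured one.
  const competing = Object.entries(PROVIDER_ENV_PREFIXES)
    .filter(([name]) => name !== subprovider)
    .flatMap(([, prefixes]) => prefixes);

  const result: Env = {};
  for (const [key, value] of Object.entries(env)) {
    // Skip any variable that starts with a competing provider's prefix.
    if (competing.some((prefix) => key.startsWith(prefix))) continue;
    result[key] = value;
  }
  return result;
}
```

In Node.js the result would typically be passed as the `env` option of `child_process.spawn`. Returning a copy, rather than deleting keys from `process.env`, keeps the parent process's environment unchanged.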

Test results

Before: 0/9 pass (pi-cli was silently using Azure OpenAI, returning empty responses)
After: 9/9 pass, mean score 1.000 (verified across 2 consecutive runs)

Test plan

  • All 9 agent-plugin-review eval tests pass with --target pi-cli
  • Verified stability across 2 consecutive full runs
  • Unit tests pass (1598 tests, 0 failures)
  • Pre-push hooks pass (build, typecheck, lint, test, validate:examples)

Closes #779

🤖 Generated with Claude Code

christso and others added 2 commits March 26, 2026 08:28
…779)

Skill improvements:
- Add explicit checks for file path format (leading slash) and repeated inputs in eval YAML
- Add hard gate detection recipe for multi-phase workflows
- Update workflow-checklist example to use concrete deploy-plan artifact
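The skill itself is prose guidance for an agent, so the following is only a sketch of what the first two checks amount to. The `EvalCase` shape, the field names, and `reviewEvalCases` are hypothetical; the real eval YAML schema may differ.

```typescript
// Hypothetical shape of one parsed eval test case (an assumption, not the real schema).
interface EvalCase {
  id: string;
  input: string;
  files?: string[];
}

interface Finding {
  caseId: string;
  message: string;
}

// Flags file paths without a leading slash and inputs repeated across test cases.
function reviewEvalCases(cases: EvalCase[]): Finding[] {
  const findings: Finding[] = [];
  const firstSeen = new Map<string, string>(); // trimmed input -> id of first case using it

  for (const c of cases) {
    for (const path of c.files ?? []) {
      if (!path.startsWith("/")) {
        findings.push({ caseId: c.id, message: `relative file path "${path}" (missing leading /)` });
      }
    }

    const key = c.input.trim();
    const earlier = firstSeen.get(key);
    if (earlier !== undefined) {
      findings.push({ caseId: c.id, message: `input repeats the one in "${earlier}"` });
    } else {
      firstSeen.set(key, c.id);
    }
  }
  return findings;
}
```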

Pi-cli fix:
- Strip competing provider env vars when subprovider is explicitly configured
  (AZURE_OPENAI_* vars were overriding --provider flag)
- Add subprovider: openrouter to pi-cli target config

All 9 agent-plugin-review eval tests now pass (was 6/9).

Closes #779

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cloudflare-workers-and-pages

Deploying agentv with Cloudflare Pages

Latest commit: 9480f01
Status: ✅  Deploy successful!
Preview URL: https://41830288.agentv.pages.dev
Branch Preview URL: https://fix-779-plugin-review-evals.agentv.pages.dev

@christso christso merged commit c5c7a11 into main Mar 26, 2026
2 checks passed
@christso christso deleted the fix/779-plugin-review-evals branch March 26, 2026 08:36

Development

Successfully merging this pull request may close these issues.

Improve agent-plugin-review skill to pass remaining 3 eval tests

1 participant