Provision Copilot CLI for code-review judge in Claude workflow by gggdttt · Pull Request #701 · microsoft/BC-Bench

gggdttt · 2026-06-26T11:30:45Z

Problem

Running the code-review category through the Claude evaluation workflow fails in the scoring step with:

LLMJudgeError: Copilot CLI not found; cannot run the semantic judge

The code-review semantic judge (judge_verdicts in src/bcbench/evaluate/codereview_judge.py) always runs on Copilot CLI as a fixed, agent-independent judge. But claude-evaluation.yml only installs Claude Code — it never provisions Copilot CLI, the copilot-requests permission, or COPILOT_GITHUB_TOKEN — so judging dies for any Claude code-review run.

This is a pre-existing gap in the Claude workflow, unrelated to the BCQuality work in #696.

Fix

In .github/workflows/claude-evaluation.yml:

Add copilot-requests: write to the evaluation job permissions.
Install @github/copilot@1.0.57, gated on category == 'code-review' (skipped for bug-fix / test-generation to save time).
Pass COPILOT_GITHUB_TOKEN: ${{ github.token }} to the run step and mask it.

This mirrors the judge environment already present in copilot-evaluation.yml.
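
For reviewers who want to see the three changes side by side, here is a minimal sketch of how they might fit together in the workflow. Only the copilot-requests: write permission, the @github/copilot@1.0.57 package, the code-review gate, and the COPILOT_GITHUB_TOKEN variable come from this PR description. Everything else is an assumption made for illustration: the workflow_dispatch input named category, the runner, the contents permission, the checkout step, the step names, and the placeholder for the evaluation command. The real claude-evaluation.yml may be laid out differently.

```yaml
# Illustrative sketch only; not the actual contents of claude-evaluation.yml.
name: claude-evaluation (sketch)

on:
  workflow_dispatch:
    inputs:
      category:                      # assumed input name
        description: Evaluation category
        required: true
        type: choice
        options: [bug-fix, test-generation, code-review]

jobs:
  evaluation:
    runs-on: ubuntu-latest           # assumed runner
    permissions:
      contents: read                 # assumed; not mentioned in the PR
      copilot-requests: write        # from the PR: lets the job call Copilot
    steps:
      - uses: actions/checkout@v4    # assumed

      # From the PR: only the code-review judge needs Copilot CLI, so the
      # install is skipped for bug-fix and test-generation runs.
      - name: Install Copilot CLI for the code-review judge
        if: inputs.category == 'code-review'
        shell: bash
        run: npm install -g @github/copilot@1.0.57

      - name: Run evaluation
        shell: bash
        env:
          COPILOT_GITHUB_TOKEN: ${{ github.token }}
        run: |
          # Mask the token so it cannot appear in the job log.
          echo "::add-mask::$COPILOT_GITHUB_TOKEN"
          # The existing evaluation command goes here (not shown in this PR).
```

Gating the install on the category keeps the other two categories as fast as before, at the cost of one extra condition to keep in sync if the judge is ever used elsewhere.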

Testing

Workflow-only change. To validate: run the Claude evaluation workflow with category: code-review (test-run) and confirm the judge step no longer errors.

Co-authored-by: Sun Haoran <haoransun@microsoft.com>

ci: provision Copilot CLI for code-review judge in Claude workflow

66265ff

gggdttt changed the title from "ci: provision Copilot CLI for code-review judge in Claude workflow" to "Provision Copilot CLI for code-review judge in Claude workflow" Jun 26, 2026

gggdttt marked this pull request as ready for review June 26, 2026 11:47

gggdttt enabled auto-merge (squash) June 26, 2026 11:49

ci: extract eval CLI install into shared composite action; install both CLIs in both workflows

c2ffb75

haoranpb requested changes Jun 26, 2026

Comment thread .github/workflows/claude-evaluation.yml Outdated

haoranpb requested changes Jun 26, 2026

Comment thread .github/actions/install-eval-clis/action.yml Outdated

Comment thread .github/workflows/claude-evaluation.yml Outdated

gggdttt and others added 2 commits June 26, 2026 14:55

Update .github/actions/install-eval-clis/action.yml

1886bbb

Co-authored-by: Sun Haoran <haoransun@microsoft.com>

Update .github/workflows/claude-evaluation.yml

efe69b6

Co-authored-by: Sun Haoran <haoransun@microsoft.com>

haoranpb approved these changes Jun 26, 2026

gggdttt merged commit eb546e4 into main Jun 26, 2026
7 checks passed

gggdttt deleted the private/wenjiefan/claude-codereview-judge-copilot branch June 26, 2026 12:57

gggdttt merged 4 commits into main from private/wenjiefan/claude-codereview-judge-copilot
