Add imaginary `hello-world` sample category by Copilot · Pull Request #694 · microsoft/BC-Bench

Copilot · 2026-06-26T05:50:34Z

Adds a deliberately tiny, imaginary, self-contained `hello-world` evaluation category as a worked example of how to add a new category to BC-Bench. Also refreshes the stale CATEGORIES.md.

What `hello-world` is

The smallest possible category: the agent writes a tiny AL Greeting codeunit that returns a "Hello, World!" message in a given language. It needs no BC container and no symbols. Scoring is judge-based, so it reuses the existing JudgeBased* result/summary/aggregate classes, as nl2al does. It is intended to demonstrate the plumbing, not to produce real scores (dataset entries use placeholder gold patches).

Changes

Registry (types.py): HELLO_WORLD enum + every match arm (dataset path, entry/result/summary/aggregate classes, pipeline, evaluators, core_score, requires_container, runner).
Dataset: HelloWorldEntry model + exports; dataset/hello_world.jsonl with two sample tasks. The entries point at the real public microsoft/BCApps repo and a valid base_commit (borrowed from the code-review category) so the setup clone step succeeds and the category runs end to end; the pipeline still rebuilds its own self-contained scaffold, so the cloned repo is only there to satisfy setup.
PowerShell (scripts/BCBenchUtils.psm1): registered hello-world (→ hello_world.jsonl) in the Get-BCBenchDatasetPath ValidateSet/switch so the setup scripts accept the category.
Pipeline (evaluate/hello_world.py): reset repo → git init → run agent → capture diff as raw judge-based output.
Prompt: hello-world-template in config.yaml.
Mock pipeline (commands/evaluate.py): handles the new category so the CI mock-evaluation job stays green.
Workflows: hello-world added to copilot/claude category choice lists.
Tests: conftest fixtures + exhaustiveness/mock wiring, plus test_hello_world_pipeline.py.
CATEGORIES.md: corrected the "two categories" claim (there are now several), documented the previously-omitted touch points (the Get-BCBenchDatasetPath PowerShell ValidateSet, aggregate_class, evaluators/core_score, requires_container/runner, mock pipeline, test wiring), and added a runnable hello-world example.

Run it

uv run bcbench run copilot helloworld__greeting-english-1 --category hello-world --repo-path /tmp/hello-world-repo

Co-authored-by: haoranpb <27280733+haoranpb@users.noreply.github.com>

haoranpb · 2026-06-26T07:15:09Z

@copilot

Workflow run failed due to

Get-BCBenchDatasetPath: /home/runner/work/BC-Bench/BC-Bench/scripts/Setup-ContainerAndRepository.ps1:23
Line |
23 | … [string]$DatasetPath = (Get-BCBenchDatasetPath -Category $Category),
| ~~~~~~~~~
| Cannot validate argument on parameter 'Category'. The argument
| "hello-world" does not belong to the set
| "bug-fix,test-generation,code-review,nl2al" specified by the ValidateSet
| attribute. Supply an argument that is in the set and then try the
| command again.
Error: Process completed with exit code 1.

Co-authored-by: haoranpb <27280733+haoranpb@users.noreply.github.com>

Copilot · 2026-06-26T07:17:47Z

@copilot

Workflow run failed due to
...

Fixed in 7c7d9ac. Get-BCBenchDatasetPath in scripts/BCBenchUtils.psm1 had its own ValidateSet that I missed — I added hello-world (mapping to hello_world.jsonl) to both the ValidateSet and the switch, and documented this PowerShell touch point in the CATEGORIES.md checklist.
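
Because the PowerShell ValidateSet is a second, independent list of categories, a small consistency check could catch this kind of miss before CI does. The sketch below is only a suggestion and is not part of this PR: the `PSM1_EXCERPT` text is an assumed shape for the function (the real body of scripts/BCBenchUtils.psm1 may differ), and the helper names are hypothetical.

```python
import re

# Assumed excerpt; the real function in scripts/BCBenchUtils.psm1 may look different.
PSM1_EXCERPT = '''
function Get-BCBenchDatasetPath {
    param(
        [ValidateSet("bug-fix", "test-generation", "code-review", "nl2al", "hello-world")]
        [string]$Category
    )
}
'''

# Category values as named in this thread.
PYTHON_CATEGORIES = {"bug-fix", "test-generation", "code-review", "nl2al", "hello-world"}


def validate_set_values(ps_source: str, function_name: str) -> set[str]:
    """Return the quoted values of the first ValidateSet after `function_name`."""
    start = ps_source.find(f"function {function_name}")
    if start == -1:
        raise ValueError(f"function {function_name} not found")
    found = re.search(r"\[ValidateSet\(([^)]*)\)\]", ps_source[start:])
    if found is None:
        raise ValueError(f"no ValidateSet found in {function_name}")
    # Accept both single- and double-quoted PowerShell string literals.
    return set(re.findall(r"""["']([^"']+)["']""", found.group(1)))


def missing_from_powershell(ps_source: str, function_name: str, categories: set[str]) -> set[str]:
    """Categories known to Python but absent from the PowerShell ValidateSet."""
    return set(categories) - validate_set_values(ps_source, function_name)
```

In a real test the excerpt would be replaced by the file's contents, and a non-empty result would fail the test with the missing category names.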

haoranpb · 2026-06-26T07:21:27Z

@copilot now the workflow can't run because the commit sha is empty, can you simply take some from code review category?

The goal is that this can run end to end

Co-authored-by: haoranpb <27280733+haoranpb@users.noreply.github.com>

Copilot · 2026-06-26T07:24:34Z

@copilot now the workflow can't run because the commit sha is empty, can you simply take some from code review category?

The goal is that this can run end to end

Done. Both hello_world.jsonl entries now use microsoft/BCApps with base_commit 70fd0246a0a4dbc72cb183ca719106722c03be4d (borrowed from the code-review category), so the setup clone step succeeds. Since hello-world has requires_container = False, the workflow runs with skip-container, and the pipeline rebuilds a self-contained scaffold — so the clone is just there to satisfy setup and let it run end to end.
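
The empty-SHA failure could also be caught at dataset load time. The following is a minimal sketch assuming each JSONL line carries `instance_id`, `repo`, and `base_commit` keys; `instance_id` and the `HelloWorldEntrySketch` class are hypothetical stand-ins, and the real `HelloWorldEntry` model may have different fields and validation.

```python
import json
import re
from dataclasses import dataclass

FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


@dataclass(frozen=True)
class HelloWorldEntrySketch:  # hypothetical stand-in for HelloWorldEntry
    instance_id: str  # assumed field name
    repo: str
    base_commit: str

    def __post_init__(self) -> None:
        # An empty or abbreviated SHA would break the setup clone step.
        if not FULL_SHA1.fullmatch(self.base_commit):
            raise ValueError(f"{self.instance_id}: base_commit must be a full 40-character SHA")


def load_entries(jsonl_text: str) -> list[HelloWorldEntrySketch]:
    """Parse one entry per non-blank JSONL line, validating each as it loads."""
    entries = []
    for line in jsonl_text.splitlines():
        if line.strip():
            raw = json.loads(line)
            entries.append(
                HelloWorldEntrySketch(
                    instance_id=raw["instance_id"],
                    repo=raw["repo"],
                    base_commit=raw["base_commit"],
                )
            )
    return entries


SAMPLE_LINE = json.dumps(
    {
        "instance_id": "helloworld__greeting-english-1",
        "repo": "microsoft/BCApps",
        "base_commit": "70fd0246a0a4dbc72cb183ca719106722c03be4d",
    }
)
```

With a check like this, an entry with an empty `base_commit` fails when the dataset is read rather than partway through the workflow.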

Copilot AI and others added 2 commits June 25, 2026 19:53

Add imaginary hello-world sample category

a6cf83a

Co-authored-by: haoranpb <27280733+haoranpb@users.noreply.github.com>

Trim HelloWorldPipeline docstring

a5304ff

Co-authored-by: haoranpb <27280733+haoranpb@users.noreply.github.com>

Copilot AI assigned Copilot and haoranpb Jun 26, 2026

Copilot created this pull request from a session on behalf of haoranpb June 26, 2026 05:50

Copilot started work on behalf of haoranpb June 26, 2026 07:15

Register hello-world in Get-BCBenchDatasetPath ValidateSet

55ffcbb

Co-authored-by: haoranpb <27280733+haoranpb@users.noreply.github.com>

Copilot finished work on behalf of haoranpb June 26, 2026 07:18

Copilot AI requested a review from haoranpb June 26, 2026 07:18

Copilot started work on behalf of haoranpb June 26, 2026 07:21

Use real BCApps repo/base_commit for hello-world dataset

f39cf27

Co-authored-by: haoranpb <27280733+haoranpb@users.noreply.github.com>

Copilot finished work on behalf of haoranpb June 26, 2026 07:25

Copilot wants to merge 4 commits into main from copilot/add-new-category-code-review

2 participants
