docs(code-review): add experiment leaderboard table by gggdttt · Pull Request #705 · microsoft/BC-Bench

gggdttt · 2026-06-26T21:37:01Z

Adds the Experiment Leaderboard section to the code-review docs page, decoupled from #696.

Why

#696 (live BCQuality consumption) is paused pending a review of how BCApps consumes BCQuality, but we are still running the experiments and collecting their data. The site page should reflect those runs regardless of when #696 merges.

Changes

docs/code-review.md: rename baseline header F1 -> Micro F1 for consistency; add an Experiment Leaderboard table that renders rows where agg.experiment is set (BCQuality live skills vs inline pre-#8700 knowledge).
src/bcbench/types.py: add bcquality: bool = False to ExperimentConfiguration (and include it in is_empty()). This is required so that the leaderboard aggregation preserves the live-arm marker; otherwise pydantic drops the unknown key and live-arm rows are mislabeled as Other. The field is default-off and backward compatible, so existing results are unaffected.
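The effect of the missing field can be illustrated with a minimal, stdlib-only sketch. It is not the code in src/bcbench/types.py: the dataclasses below only imitate pydantic's default behavior of ignoring unknown keys, and the `inline_knowledge` field, the `from_dict` helper, and the `label` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class OldExperimentConfiguration:
    """Hypothetical pre-PR shape: there is no bcquality field."""

    inline_knowledge: bool = False  # hypothetical field, for illustration only

    @classmethod
    def from_dict(cls, raw: dict) -> "OldExperimentConfiguration":
        # Imitates pydantic's default extra="ignore": keys that are not
        # declared on the model are silently dropped.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in known})

    def is_empty(self) -> bool:
        return not self.inline_knowledge


@dataclass
class NewExperimentConfiguration(OldExperimentConfiguration):
    """Post-PR shape: bcquality is declared, default-off."""

    bcquality: bool = False

    def is_empty(self) -> bool:
        # The new flag takes part in the emptiness check as well.
        return not (self.inline_knowledge or self.bcquality)


def label(cfg) -> str:
    # Hypothetical labeling rule for a leaderboard row.
    return "BCQuality live" if getattr(cfg, "bcquality", False) else "Other"


raw = {"bcquality": True}  # marker written by a live-arm run
old = OldExperimentConfiguration.from_dict(raw)
new = NewExperimentConfiguration.from_dict(raw)
print(label(old), "|", label(new))
```

With the old shape the marker is lost during parsing, so the row falls through to Other and the configuration still looks empty. With the new shape the marker survives, and results that never set the key still parse to `bcquality=False`.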

Safety

Self-contained: with no experiment data, the table shows "No experiment results available yet." There are no pipeline or value-computation changes, so the page degrades gracefully. The oldinline arm already labels correctly on main; this PR additionally enables the BCQuality live-arm label.
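The fallback behavior described above can be sketched as follows. This is an assumption-laden illustration, not the rendering logic in docs/code-review.md: the function name, the dictionary keys `experiment` and `micro_f1`, and the two-column layout are all hypothetical.

```python
def render_experiment_leaderboard(aggregates: list[dict]) -> str:
    """Render a Markdown table from aggregate rows that carry an experiment.

    Rows without an experiment (missing key, None, or empty) are skipped.
    """
    rows = [a for a in aggregates if a.get("experiment")]
    if not rows:
        # Graceful degradation: no experiment data means a plain message
        # is shown instead of an empty table.
        return "No experiment results available yet."

    lines = ["| Experiment | Micro F1 |", "| --- | --- |"]
    for a in rows:
        lines.append(f"| {a['experiment']} | {a['micro_f1']:.3f} |")
    return "\n".join(lines)


# Baseline-only data: nothing has an experiment set, so the message is shown.
print(render_experiment_leaderboard([{"experiment": None, "micro_f1": 0.5}]))
```

Because rows are filtered before anything is rendered, baseline results never appear in this table, and the section stays valid even before any experiment run is published.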

Generated with the help of GitHub Copilot.


wenjiefan added 2 commits June 26, 2026 23:36

docs(code-review): add experiment leaderboard table (decoupled from #696) 9d14797

types: add bcquality flag to ExperimentConfiguration so the docs table can label the live arm 79d8c71

gggdttt wants to merge 2 commits into main from private/wenjiefan/docs-code-review-experiment-table
