docs(code-review): add experiment leaderboard table #705

Open
gggdttt wants to merge 2 commits into main from private/wenjiefan/docs-code-review-experiment-table

@gggdttt gggdttt commented Jun 26, 2026

Collaborator

Adds the Experiment Leaderboard section to the code-review docs page, decoupled from #696.

Why

#696 (live BCQuality consumption) is paused pending a review of how BCApps consumes BCQuality, but the experiments are still being run and still produce data. The site page should reflect those runs without waiting for #696 to merge.

Changes

  • docs/code-review.md: rename baseline header F1 -> Micro F1 for consistency; add an Experiment Leaderboard table that renders rows where agg.experiment is set (BCQuality live skills vs inline pre-#8700 knowledge).
  • src/bcbench/types.py: add bcquality: bool = False to ExperimentConfiguration (and include it in is_empty()). This is required so the leaderboard aggregation preserves the live-arm marker; otherwise pydantic drops the unknown key and live-arm rows are mislabeled as Other. The field is default-off and backward compatible, so existing results are unaffected.
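To illustrate why the new field matters, here is a minimal stdlib-only sketch, not the actual code in src/bcbench/types.py. It uses dataclasses plus a loader that ignores unknown keys to mimic pydantic's default handling of extra fields. The `custom_instructions` field, the `load` and `label` helpers, and the label string "BCQuality live" are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class ConfigBefore:
    """Stand-in for ExperimentConfiguration before this PR (field is hypothetical)."""
    custom_instructions: bool = False


@dataclass
class ConfigAfter:
    """Stand-in for ExperimentConfiguration after this PR."""
    custom_instructions: bool = False  # hypothetical pre-existing field
    bcquality: bool = False            # new field, default-off

    def is_empty(self) -> bool:
        # The new flag takes part in the emptiness check.
        return not (self.custom_instructions or self.bcquality)


def load(cls, raw: dict[str, Any]):
    """Build cls from raw, silently ignoring unknown keys
    (an assumption that mirrors pydantic's default 'ignore' behavior)."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in raw.items() if k in known})


def label(config) -> str:
    """Hypothetical labeling step used by the leaderboard aggregation."""
    if getattr(config, "bcquality", False):
        return "BCQuality live"
    return "Other"


raw = {"bcquality": True}
print(label(load(ConfigBefore, raw)))  # marker dropped -> "Other"
print(label(load(ConfigAfter, raw)))   # marker kept -> "BCQuality live"
```

Because the default is False, a result file that never mentions bcquality loads exactly as before and still reports is_empty() as true.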

Safety

  • Self-contained: with no experiment data, the table shows "No experiment results available yet." There are no pipeline or value-computation changes, so the page degrades gracefully. The oldinline arm already labels correctly on main; this PR additionally enables the BCQuality live-arm label.
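The fallback behavior can be sketched roughly as follows. This is an illustrative Python stand-in, not the template logic in docs/code-review.md: the `Agg` shape, the `micro_f1` key, the column layout, and the sort order are assumptions, and the scores are placeholders rather than real results.

```python
from typing import Optional, TypedDict


class Agg(TypedDict):
    """Hypothetical shape of one aggregated result row."""
    experiment: Optional[str]  # None for baseline rows
    micro_f1: float


EMPTY_MESSAGE = "No experiment results available yet."


def render_experiment_leaderboard(aggs: list[Agg]) -> str:
    """Render only rows whose experiment is set; fall back to a message."""
    rows = [a for a in aggs if a["experiment"]]
    if not rows:
        return EMPTY_MESSAGE
    lines = ["| Experiment | Micro F1 |", "| --- | --- |"]
    # Highest score first (an assumed ordering).
    for a in sorted(rows, key=lambda a: a["micro_f1"], reverse=True):
        lines.append(f"| {a['experiment']} | {a['micro_f1']:.3f} |")
    return "\n".join(lines)


baseline_only: list[Agg] = [{"experiment": None, "micro_f1": 0.5}]
print(render_experiment_leaderboard(baseline_only))

with_experiment: list[Agg] = baseline_only + [
    {"experiment": "BCQuality live", "micro_f1": 0.25},  # placeholder score
]
print(render_experiment_leaderboard(with_experiment))
```

Baseline rows never enter the experiment table, so a data set without experiment runs produces only the fallback message.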

Generated with the help of GitHub Copilot.
