Experiment: composed BCQuality super-skill/sub-skill code review #661
Draft
WaelAbuSeada wants to merge 69 commits into
…C-Bench into category/code-review
…display and refactoring comment parsing logic
…egory/code-review
…egory/code-review
…egory/code-review
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
…egory/code-review
…egory/code-review
…t branch
Snapshots all non-experiment files from experiment/code-review-al-skill onto
category/code-review. Experiment-specific assets (al-code-review skill and
custom instructions under microsoft-BCApps/instructions/*.md) remain only on
the experiment branch.
Highlights:
- Dataset: enriched 28 zero-expected entries (security/privacy/style/upgrade)
with in-domain expected_comments; cleaned up OOD bait across pre-existing
entries; renumbered performance and privacy entries to be contiguous.
- Eval: domain-aware code-review evaluation, codereview_judge for LLM-confirmed
matches, improved review parsing, grouped per-domain summary layout.
- Results: domain-split metrics, leaderboard refresh, severity_mae, macro_f1.
- Tooling: probe_codereview_case/batch harness for local skill testing,
apply_enrichment + unindent_bait_files + fix_enrichment_iteration_{1,2}
scripts used to produce the dataset enrichment, dump_entries, ood_worklist,
run_entry helpers.
- Hooks: Python log_tool_usage hook (Linux-compatible) with process log capture
and unit tests.
- Workflow: copilot-evaluation.yml updates for category routing and metrics.
- Lint: ruff/ty cleanups across tools/, tests/, and shared hooks.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
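The Results highlight above names macro_f1 and severity_mae without defining them. As a rough sketch only, assuming macro_f1 is the unweighted mean of per-domain F1 scores and severity_mae is the mean absolute distance between predicted and expected severities on an ordinal scale, they could be computed as follows. The function names, the input shapes, and the SEVERITY_RANK ordering are hypothetical and not taken from the BC-Bench code.

```python
# Hypothetical ordinal scale for severities; the real evaluator may rank them differently.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts_by_domain: dict[str, tuple[int, int, int]]) -> float:
    """Unweighted mean of per-domain F1, so every domain counts equally."""
    scores = [f1(*counts) for counts in counts_by_domain.values()]
    return sum(scores) / len(scores) if scores else 0.0


def severity_mae(pairs: list[tuple[str, str]]) -> float:
    """Mean absolute rank distance over (predicted, expected) severity pairs."""
    if not pairs:
        return 0.0
    total = sum(abs(SEVERITY_RANK[p] - SEVERITY_RANK[e]) for p, e in pairs)
    return total / len(pairs)
```

Averaging per domain rather than pooling all comments keeps a large domain from masking a weak one, which would fit the domain-split reporting mentioned above.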
The /al-code-review skill prompt, custom instructions, and skills are experiment-specific. Revert config.yaml to the defaults (/review prompt, instructions/skills disabled). Experiment branch keeps its own version.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Collaborator
Marked experiment PRs as draft.
Adds a new code-review experiment that implements BCQuality's composed super-skill / sub-skill review pattern.
What this adds
- src/bcbench/agent/shared/instructions/microsoft-BCApps/skills/al-code-review/ as committed static copies (no runtime fetch):
  - read.md, do.md
  - al-code-review.md + 5 domain leaf sub-skills (performance, security, privacy, upgrade, style)
  - knowledge-index.json
- SKILL.md entry point that orchestrates the composition and maps BCQuality findings to BC-Bench's review.json schema (blocker->critical, major->high, minor->medium, info->low; from-sub-skill->domain).
- config.yaml: code-review-template invokes the skill directly via /al-code-review (matching the experiment/code-review-al-skill pattern); skills.enabled: true.
Notes
- setup_agent_skills copies the committed skill folder into repo/.github/skills/; no BCQuality clone needed.
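The finding-to-review.json mapping described for SKILL.md can be illustrated with a minimal Python sketch. The severity table and the from-sub-skill->domain rename come from the description above. The function name, the assumption that a finding is a flat dict with a "severity" key, and the pass-through of all other fields are hypothetical. In the PR itself the mapping is carried out by the skill prompt, not by Python code.

```python
# Severity mapping as stated in the PR description.
SEVERITY_MAP = {
    "blocker": "critical",
    "major": "high",
    "minor": "medium",
    "info": "low",
}


def to_review_comment(finding: dict) -> dict:
    """Translate one BCQuality-style finding into a review.json-style comment.

    Assumes (hypothetically) that the finding carries "severity" and
    "from-sub-skill" keys; every other field is copied through unchanged.
    """
    severity = str(finding["severity"]).lower()
    if severity not in SEVERITY_MAP:
        raise ValueError(f"unknown BCQuality severity: {finding['severity']!r}")

    comment = {
        key: value
        for key, value in finding.items()
        if key not in ("severity", "from-sub-skill")
    }
    comment["severity"] = SEVERITY_MAP[severity]
    comment["domain"] = finding["from-sub-skill"]
    return comment
```

Rejecting unknown severities instead of defaulting them is a deliberate choice in this sketch: a silent fallback would skew any severity-based metric computed downstream.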