
Experiment: composed BCQuality super-skill/sub-skill code review #661

Draft
WaelAbuSeada wants to merge 69 commits into main from experiment/code-review-composed-skills

@WaelAbuSeada
Member

Adds a new code-review experiment that implements BCQuality's composed super-skill / sub-skill review pattern.

What this adds

  • Vendors the BCQuality composed-review framework into src/bcbench/agent/shared/instructions/microsoft-BCApps/skills/al-code-review/ as committed static copies (no runtime fetch).
    • Meta-skill contracts (read.md, do.md)
    • Super-skill al-code-review.md + 5 domain leaf sub-skills (performance, security, privacy, upgrade, style)
    • 123 knowledge articles + AL samples across the 5 domains, plus a generated knowledge-index.json
    • UI domain dropped to match BC-Bench's 5 domains
  • Adds a hand-authored Copilot SKILL.md entry point that orchestrates the composition and maps BCQuality findings to BC-Bench's review.json schema (severity: blocker->critical, major->high, minor->medium, info->low; domain: taken from from-sub-skill).
  • config.yaml: the code-review-template invokes the skill directly via /al-code-review (matching the experiment/code-review-al-skill pattern), and skills.enabled is set to true.
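In this PR the finding-to-review.json mapping is expressed as instructions in SKILL.md, not as code. The Python below is only an illustrative sketch of the same mapping rules, e.g. for spot-checking agent output. The severity table and the from-sub-skill -> domain rule come from the description above; the finding keys file, line, and message and the output keys file, line, and comment are hypothetical placeholders, not the actual review.json field names.

```python
from typing import Any

# Severity mapping stated in the PR: BCQuality severity -> BC-Bench review.json severity.
SEVERITY_MAP: dict[str, str] = {
    "blocker": "critical",
    "major": "high",
    "minor": "medium",
    "info": "low",
}


def to_review_comment(finding: dict[str, Any]) -> dict[str, Any]:
    """Map one BCQuality finding (hypothetical shape) to a review.json-style comment."""
    severity = str(finding["severity"]).lower()
    if severity not in SEVERITY_MAP:
        raise ValueError(f"unknown BCQuality severity: {finding['severity']!r}")
    return {
        "severity": SEVERITY_MAP[severity],
        # from-sub-skill names the leaf sub-skill (performance, security, privacy,
        # upgrade, style), which becomes the BC-Bench domain.
        "domain": finding["from-sub-skill"],
        # Hypothetical location/message fields, passed through unchanged.
        "file": finding.get("file"),
        "line": finding.get("line"),
        "comment": finding.get("message", ""),
    }
```

Unknown severities raise rather than silently defaulting, so a drifting sub-skill vocabulary surfaces as an error instead of a mis-scored comment.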

Notes

  • CI's existing setup_agent_skills copies the committed skill folder into repo/.github/skills/ — no BCQuality clone needed.
  • Relevant tests pass (agent-skills, experiment-config, copilot-prompt, dataset-integrity).
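The note about setup_agent_skills can be pictured with the rough sketch below. It is not the actual CI implementation. The function name copy_skills and its signature are hypothetical; only the source folder layout (…/microsoft-BCApps/skills/<skill>/) and the destination repo/.github/skills/ come from the PR text. It shows why no BCQuality clone is needed: the vendored files are copied straight from the committed tree.

```python
import shutil
from pathlib import Path


def copy_skills(skills_src: Path, repo_root: Path) -> list[Path]:
    """Hypothetical sketch: copy each committed skill folder into <repo>/.github/skills/."""
    dest_root = repo_root / ".github" / "skills"
    dest_root.mkdir(parents=True, exist_ok=True)
    copied: list[Path] = []
    # Each immediate subdirectory (e.g. al-code-review/) is one skill.
    for skill_dir in sorted(p for p in skills_src.iterdir() if p.is_dir()):
        dest = dest_root / skill_dir.name
        # dirs_exist_ok lets repeated runs overwrite an existing copy (Python 3.8+).
        shutil.copytree(skill_dir, dest, dirs_exist_ok=True)
        copied.append(dest)
    return copied
```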

haoranpb and others added 30 commits April 8, 2026 13:57
…display and refactoring comment parsing logic
haoranpb and others added 27 commits May 29, 2026 11:57
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
…t branch

Snapshots all non-experiment files from experiment/code-review-al-skill onto
category/code-review. Experiment-specific assets (al-code-review skill and
custom instructions under microsoft-BCApps/instructions/*.md) remain only on
the experiment branch.

Highlights:
- Dataset: enriched 28 zero-expected entries (security/privacy/style/upgrade)
  with in-domain expected_comments; cleaned up OOD bait across pre-existing
  entries; renumbered performance and privacy entries to be contiguous.
- Eval: domain-aware code-review evaluation, codereview_judge for LLM-confirmed
  matches, improved review parsing, grouped per-domain summary layout.
- Results: domain-split metrics, leaderboard refresh, severity_mae, macro_f1.
- Tooling: probe_codereview_case/batch harness for local skill testing,
  apply_enrichment + unindent_bait_files + fix_enrichment_iteration_{1,2}
  scripts used to produce the dataset enrichment, dump_entries, ood_worklist,
  run_entry helpers.
- Hooks: Python log_tool_usage hook (Linux-compatible) with process log capture
  and unit tests.
- Workflow: copilot-evaluation.yml updates for category routing and metrics.
- Lint: ruff/ty cleanups across tools/, tests/, and shared hooks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
The /al-code-review skill prompt, custom instructions, and skills are
experiment-specific. Revert config.yaml to the defaults (/review prompt,
instructions/skills disabled). Experiment branch keeps its own version.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Base automatically changed from category/code-review to main June 23, 2026 13:15
haoranpb marked this pull request as draft June 24, 2026 06:04
@haoranpb

Collaborator

Marked experiment PRs as draft
