## Summary
Add confidence thresholds to Reviewer agents so only findings with >80% confidence are reported. Reduces noise in /code-review output and increases signal-to-noise ratio. Inspired by Harness Alpha's confidence-based filtering pattern.
## Motivation
Harness Alpha's code-reviewer and all language-specific reviewers (Python, Go, Kotlin) require >80% confidence before flagging issues. Their philosophy: "Better to miss something than flood with false positives."
### The Problem
Without confidence thresholds, reviewers report everything they notice — including:
- Style preferences that aren't clearly wrong
- Potential issues that depend on context the reviewer can't see
- Edge cases that are handled elsewhere in the codebase
- "Could be improved" suggestions that distract from real issues
This creates review fatigue. When 40% of findings are noise, users start ignoring all findings.
### The Pattern
Each review finding must include a confidence assessment:
```
### Finding: Missing error boundary around API call
- **Severity**: HIGH
- **Confidence**: 90%
- **File**: src/api/client.ts:45
- **Issue**: Uncaught promise rejection could crash the app
- **Fix**: Wrap in try/catch with error reporting

### Finding: Variable name could be more descriptive
- **Severity**: LOW
- **Confidence**: 55% ← FILTERED (below 80% threshold)
```

Only findings with confidence ≥ 80% appear in the review output. Lower-confidence findings are either dropped or collected in a separate "suggestions" section.
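The filtering step above can be sketched in a few lines. This is an illustrative sketch only: the `Finding` interface, field names, and `partitionFindings` helper are hypothetical, not DevFlow's actual schema.

```typescript
// Hypothetical finding shape; field names are illustrative, not DevFlow's schema.
interface Finding {
  title: string;
  severity: "HIGH" | "MEDIUM" | "LOW";
  confidence: number; // 0-100
}

const CONFIDENCE_THRESHOLD = 80;
const MAX_SUGGESTIONS = 3;

// Partition findings: only high-confidence ones are reported;
// the rest become at most three "suggestions", highest confidence first.
function partitionFindings(findings: Finding[]): {
  reported: Finding[];
  suggestions: Finding[];
} {
  const reported = findings.filter((f) => f.confidence >= CONFIDENCE_THRESHOLD);
  const suggestions = findings
    .filter((f) => f.confidence < CONFIDENCE_THRESHOLD)
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, MAX_SUGGESTIONS);
  return { reported, suggestions };
}
```

In practice this logic would live in the reviewer's output instructions rather than in code, but the sketch makes the contract explicit: a finding is either reported or demoted, never both.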
## Why This Matters for DevFlow
Our /code-review spawns 7-11 Reviewer agents in parallel. Each reports all findings regardless of confidence. The Synthesizer deduplicates and merges, but doesn't filter by confidence — noise from individual reviewers propagates to the final report.
Adding confidence thresholds would:
- Reduce review report length by ~30-40% (estimated noise ratio)
- Increase user trust in review output
- Focus attention on real issues
## Technical Approach
### 1. Update Reviewer Agent Prompt

Add a confidence assessment requirement to `shared/agents/reviewer.md`:
```
## Finding Format
For each issue found, assess your confidence (0-100%):
- **90-100%**: Certain — clear bug, security vulnerability, or standards violation
- **80-89%**: High — likely issue based on context and patterns
- **60-79%**: Medium — possible issue, depends on context not visible
- **Below 60%**: Low — subjective preference or uncertain

**Only report findings with confidence ≥ 80%.**
Collect lower-confidence observations in a separate "Suggestions" section (max 3 items).
```

### 2. Structured Output
```
## Critical & High Confidence Findings
[Only ≥80% confidence findings, severity-ordered]

## Suggestions (Lower Confidence)
[Max 3 items, clearly labeled as suggestions, not findings]
```

### 3. Synthesizer Integration
Update Synthesizer to:
- Respect confidence levels during deduplication
- Boost confidence when multiple reviewers flag the same issue
- Maintain the ≥80% threshold in the final report
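The merge step could look roughly like the following. This is a sketch under stated assumptions: the dedup key (`file` + `title`) and the agreement boost rule (+5 points per additional reviewer, capped at 100) are placeholders for illustration, not decided DevFlow policy.

```typescript
// Hypothetical finding shape as emitted by one Reviewer agent.
interface Finding {
  file: string;
  title: string;
  confidence: number; // 0-100
}

const THRESHOLD = 80;
const BOOST_PER_EXTRA_REVIEWER = 5; // assumption, not settled policy

// Merge per-reviewer findings, boosting confidence on agreement,
// then apply the ≥80% threshold to the final report.
function synthesize(perReviewer: Finding[][]): Finding[] {
  const merged = new Map<string, Finding & { reviewers: number }>();
  for (const findings of perReviewer) {
    for (const f of findings) {
      const key = `${f.file}::${f.title}`; // crude dedup key
      const seen = merged.get(key);
      if (seen) {
        seen.reviewers += 1;
        // Keep the highest individual confidence, then boost for agreement.
        seen.confidence = Math.min(
          100,
          Math.max(seen.confidence, f.confidence) + BOOST_PER_EXTRA_REVIEWER,
        );
      } else {
        merged.set(key, { ...f, reviewers: 1 });
      }
    }
  }
  return [...merged.values()].filter((f) => f.confidence >= THRESHOLD);
}
```

Note the ordering matters: boosting happens before thresholding, so an issue two reviewers independently rate at 78% can still clear the bar, while a lone 75% finding is dropped.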
## Effort & Impact

- **Effort**: Small (agent prompt update + synthesizer tweak)
- **Impact**: Medium — cleaner reviews, higher user trust, less review fatigue
- **Risk**: Low — worst case is missing a marginal issue that a user would have ignored anyway
## Cross-Reference

- Enhances /code-review output quality
- Complements Research Report: Harness Alpha Competitive Analysis — 13 Enhancement Opportunities #107, item 9 (De-Sloppify Categories) — both aim to increase signal-to-noise
- Independent of PreToolUse Hook Enforcement System #98 (PreToolUse enforcement) and PostToolUse Quality Hooks: Auto-Format and Typecheck After Edits #110 (PostToolUse quality hooks)