
Reviewer Confidence Thresholds: Reduce Noise in Code Review Output #113

@dean0x

Description


Summary

Add confidence thresholds to Reviewer agents so that only findings with ≥80% confidence are reported. This reduces noise in /code-review output and improves the signal-to-noise ratio. Inspired by Harness Alpha's confidence-based filtering pattern.

Motivation

Harness Alpha's code-reviewer and all language-specific reviewers (Python, Go, Kotlin) require >80% confidence before flagging issues. Their philosophy: "Better to miss something than flood with false positives."

The Problem

Without confidence thresholds, reviewers report everything they notice — including:

  • Style preferences that aren't clearly wrong
  • Potential issues that depend on context the reviewer can't see
  • Edge cases that are handled elsewhere in the codebase
  • "Could be improved" suggestions that distract from real issues

This creates review fatigue. When 40% of findings are noise, users start ignoring all findings.

The Pattern

Each review finding must include a confidence assessment:

### Finding: Missing error boundary around API call
- **Severity**: HIGH
- **Confidence**: 90%
- **File**: src/api/client.ts:45
- **Issue**: Uncaught promise rejection could crash the app
- **Fix**: Wrap in try/catch with error reporting

### Finding: Variable name could be more descriptive
- **Severity**: LOW
- **Confidence**: 55%  ← FILTERED (below 80% threshold)

Only findings with confidence ≥ 80% appear in the review output. Lower-confidence findings are either dropped or collected in a separate "suggestions" section.
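The filtering rule above can be sketched in TypeScript. The `Finding` type and `CONFIDENCE_THRESHOLD` constant are illustrative assumptions, not part of any existing DevFlow API:

```typescript
// Illustrative types -- not an existing DevFlow API.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

interface Finding {
  title: string;
  severity: Severity;
  confidence: number; // 0-100
  file: string;       // e.g. "src/api/client.ts:45"
  issue: string;
  fix: string;
}

const CONFIDENCE_THRESHOLD = 80;

// Keep only findings at or above the threshold; everything else is
// dropped or routed to a separate "suggestions" section.
function filterFindings(findings: Finding[]): Finding[] {
  return findings.filter((f) => f.confidence >= CONFIDENCE_THRESHOLD);
}
```

Under this rule, the 90%-confidence finding above survives and the 55%-confidence one is filtered out.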

Why This Matters for DevFlow

Our /code-review spawns 7-11 Reviewer agents in parallel. Each reports all findings regardless of confidence. The Synthesizer deduplicates and merges, but doesn't filter by confidence — noise from individual reviewers propagates to the final report.

Adding confidence thresholds would:

  1. Reduce review report length by ~30-40% (estimated noise ratio)
  2. Increase user trust in review output
  3. Focus attention on real issues

Technical Approach

1. Update Reviewer Agent Prompt

Add confidence assessment requirement to shared/agents/reviewer.md:

## Finding Format

For each issue found, assess your confidence (0-100%):
- **90-100%**: Certain — clear bug, security vulnerability, or standards violation
- **80-89%**: High — likely issue based on context and patterns
- **60-79%**: Medium — possible issue, depends on context not visible
- **Below 60%**: Low — subjective preference or uncertain

**Only report findings with confidence ≥ 80%.** 

Collect lower-confidence observations in a separate "Suggestions" section (max 3 items).
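The routing described above (report at ≥80%, collect at most three lower-confidence items as suggestions) could look like this sketch; `Observation`, `ReviewOutput`, and `partitionByConfidence` are hypothetical names:

```typescript
// Hypothetical sketch of the routing rule: >=80% is reported as a
// finding, everything below is a suggestion, capped at three items.
interface Observation {
  title: string;
  confidence: number; // 0-100
}

interface ReviewOutput {
  findings: Observation[];    // confidence >= threshold, reported
  suggestions: Observation[]; // below threshold, max 3 items
}

function partitionByConfidence(obs: Observation[], threshold = 80): ReviewOutput {
  const findings = obs.filter((o) => o.confidence >= threshold);
  const suggestions = obs
    .filter((o) => o.confidence < threshold)
    .sort((a, b) => b.confidence - a.confidence) // keep the strongest three
    .slice(0, 3);
  return { findings, suggestions };
}
```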

2. Structured Output

## Critical & High Confidence Findings
[Only ≥80% confidence findings, severity-ordered]

## Suggestions (Lower Confidence)
[Max 3 items, clearly labeled as suggestions not findings]
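A minimal renderer for this two-section format might look like the following; the `ReportItem` shape and bullet layout are assumptions for illustration:

```typescript
// Hypothetical renderer for the two-section report format above.
const SEVERITY_ORDER: Record<string, number> = {
  CRITICAL: 0, HIGH: 1, MEDIUM: 2, LOW: 3,
};

interface ReportItem {
  title: string;
  severity: keyof typeof SEVERITY_ORDER;
  confidence: number;
}

function renderReport(findings: ReportItem[], suggestions: ReportItem[]): string {
  // Findings are severity-ordered, as the format requires.
  const sorted = [...findings].sort(
    (a, b) => SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity]
  );
  const lines = ["## Critical & High Confidence Findings"];
  for (const f of sorted) {
    lines.push(`- [${f.severity}] ${f.title} (${f.confidence}%)`);
  }
  lines.push("", "## Suggestions (Lower Confidence)");
  for (const s of suggestions.slice(0, 3)) {
    lines.push(`- (suggestion) ${s.title} (${s.confidence}%)`);
  }
  return lines.join("\n");
}
```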

3. Synthesizer Integration

Update Synthesizer to:

  • Respect confidence levels during deduplication
  • Boost confidence when multiple reviewers flag the same issue
  • Maintain the ≥80% threshold in the final report
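The merge-and-boost step could be sketched as below. The boost rule (+5 points per corroborating reviewer, capped at 100) is an assumption for illustration; this issue does not specify a formula:

```typescript
// Sketch of the Synthesizer merge step: deduplicate by key, boost
// confidence when multiple reviewers flag the same issue, and keep
// the >=80% threshold in the final report.
interface RawFinding {
  key: string;        // e.g. file + rule id, used for deduplication
  confidence: number; // 0-100
}

function mergeFindings(all: RawFinding[], threshold = 80): RawFinding[] {
  // Group duplicate findings by key.
  const groups = new Map<string, RawFinding[]>();
  for (const f of all) {
    const g = groups.get(f.key) ?? [];
    g.push(f);
    groups.set(f.key, g);
  }
  const merged: RawFinding[] = [];
  for (const [key, group] of groups) {
    const base = Math.max(...group.map((f) => f.confidence));
    const boost = (group.length - 1) * 5; // assumed: +5 per extra reviewer
    merged.push({ key, confidence: Math.min(100, base + boost) });
  }
  return merged.filter((f) => f.confidence >= threshold);
}
```

Note how corroboration can rescue a borderline finding: two reviewers at 78% and 76% merge to 83%, which clears the threshold, while a lone 70% finding is dropped.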

Effort & Impact

  • Effort: Small (agent prompt update + synthesizer tweak)
  • Impact: Medium — cleaner reviews, higher user trust, less review fatigue
  • Risk: Low — worst case is missing a marginal issue that a user would have ignored anyway

Cross-Reference

Metadata


Labels

enhancement (New feature or request), post-v1.0.0 (Deferred to post-v1.0.0 release)
