chore(.claude): add review-ddtrace code review skill#4613
Add a Claude Code command and reference files that encode the implicit review standards dd-trace-go reviewers consistently enforce. Distilled from 3 months of review comments (668 inline comments across 148 PRs).

The skill is structured as a progressive-disclosure system:

- `.claude/commands/review-ddtrace.md` — main command with universal checklist
- `.claude/review-ddtrace/style-and-idioms.md` — Go style patterns
- `.claude/review-ddtrace/concurrency.md` — locking, races, restart safety
- `.claude/review-ddtrace/contrib-patterns.md` — integration conventions
- `.claude/review-ddtrace/performance.md` — hot path awareness

Evaluated across 16 PRs over 5 iterations. On 10 never-before-seen PRs, the skill catches 58% of reviewer-flagged patterns vs. 35% for the baseline (+23pp), with the strongest gains on contrib integration conventions and type-safety/lifecycle patterns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report: ✅ All modified and coverable lines are covered by tests.
✅ Tests: all green. ❄️ No new flaky tests detected. 🎯 Code coverage: see details. 🔗 Commit SHA: eb1fd1f
Benchmarks: execution time 2026-03-30 18:17:08, comparing candidate commit eb1fd1f in the PR branch. Found 1 performance improvement and 0 performance regressions. Performance is the same for 214 metrics; 9 unstable metrics.
The example claimed a function going from cost 60 to 90 "won't inline differently" because "it was already over 80", but 60 is under the budget — it would inline at 60 and stop at 90. Fix the explanation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
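For context on this fix: the Go compiler's inliner works against a cost budget (80 in the default `gc` toolchain), so a function at cost 60 is inlined while the same function grown to cost 90 is not. A minimal runnable sketch; the cost figures in the comments refer to the hypothetical function from the commit message, not to `square` itself, and inlining decisions can be inspected with `go build -gcflags=-m`:

```go
package main

import "fmt"

// square is a tiny leaf function whose inlining cost is far below the
// compiler's default budget of 80, so `go build -gcflags=-m` reports
// "can inline square".
func square(x int) int {
	return x * x
}

func main() {
	// The point of the corrected explanation: a function at cost 60 IS
	// under the budget and inlines; pushing it to cost 90 crosses the
	// threshold, so the change really does alter inlining behavior.
	fmt.Println(square(7)) // prints 49
}
```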
…race-go-specific patterns Remove sections that duplicate Effective Go content from style-and-idioms.md (import grouping, std library preference, code organization, function length). Trim concurrency.md, performance.md, and contrib-patterns.md per reviewer feedback. Add iteration-7 eval workspace: 3-way comparison (baseline / pre-fix / post-fix) across 10 new PRs. Results: skill +15pp vs baseline (70%→85%); post-fix and pre-fix within noise (85% vs 83%), confirming the cleanup has no regression. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@hannahkm I've just applied all of your feedback and ran a fresh 3-way eval across 10 new PRs to measure the impact of your suggestions. Results (effective assertions caught out of 30):
The cleanup is a clean no-op: post-feedback and pre-feedback are within noise, and the suggestions you made to style-and-idioms.md are better than what was there before. The evals back that up!
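The percentages reported in these evals are plain catch-rate arithmetic over the assertion set, with deltas expressed in percentage points (pp) rather than percent. A small sketch; the counts 21 and 25 are illustrative examples, not the actual eval tallies:

```go
package main

import "fmt"

// catchRate returns the share of reviewer-flagged assertions a
// configuration caught, as a percentage.
func catchRate(caught, total int) float64 {
	return 100 * float64(caught) / float64(total)
}

func main() {
	total := 30                       // effective assertions in the eval set
	baseline := catchRate(21, total)  // illustrative: 21/30 = 70%
	withSkill := catchRate(25, total) // illustrative: 25/30 ≈ 83%
	// A pp delta is a simple difference of the two percentages.
	fmt.Printf("baseline %.0f%%, skill %.0f%%, delta %+.0fpp\n",
		baseline, withSkill, withSkill-baseline)
}
```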
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…atterns Single data point in eval dataset — not frequent enough to warrant dedicated guidance. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…edback Strip all blockquoted reviewer comments, generalize hyperspecific PR examples, collapse redundant sections. 538 → 446 lines (-17%). Eval iteration-8 confirms identical aggregate scores (0.767 both configs) across 10 PRs — the cleanup preserves all signal with less noise. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
kakkoyun
left a comment
I wonder how the performance would change if we tell it to respect and consider files like CONTRIBUTING.md and contrib/README.md during reviews.
I want to understand whether it would keep up with the human-maintained documentation.
What does this PR do?
Adds a `/review-ddtrace` Claude Code skill that reviews PRs against patterns and conventions dd-trace-go reviewers consistently enforce. The skill consists of:

- `review-ddtrace.md` — Main skill file with universal checklist (happy path, regression tests, error handling, constants, config patterns, nil safety)
- `style-and-idioms.md` — dd-trace-go-specific style patterns (beyond Effective Go)
- `concurrency.md` — Mutex discipline, atomics, shared state, race-prone patterns
- `performance.md` — Hot-path benchmarking, allocation avoidance, serialization correctness
- `contrib-patterns.md` — Integration API design, async lifecycle, DSM patterns, cross-integration consistency

Eval Results
8 iterations of evals across 30+ unique PRs, comparing skill vs baseline (no skill) and measuring the impact of iterative refinements:
Key findings:
Motivation
Human reviewers on dd-trace-go consistently flag the same patterns — happy path alignment, missing benchmarks, lock discipline, DSM gating, init() avoidance. This skill captures those patterns so Claude Code can flag them during review, reducing round-trips.
Reviewer's Checklist
All changes live under the `.claude/` directory — no Go code changes.