Summary
Add a lightweight suspension advisory score ("Bad Egg") that flags PR authors who resemble suspended/malicious accounts. The score is computed from the same data Good Egg already fetches, so it needs no additional API calls.
Evidence
Experiments on the bot-detection branch (PR #44), tracked in experiments/bot_detection/RESULTS.md:
- Iteration 10 (stage14): A 10-feature balanced logistic regression (LR) achieves AUC 0.65-0.68 for suspension prediction across 6 temporal holdout cutoffs.
- Iteration 11 (stage15): Feature ablation shows that the 3-feature model {merge_rate, median_additions, isolation_score} beats the full 10-feature model at every cutoff. 7 of 10 features are pure noise for suspension detection.
Scoring model
Balanced logistic regression with 3 features:
- merge_rate: fraction of PRs merged. Strongest single predictor.
- median_additions: median lines added per PR (log-transformed). Catches anomalous PR sizes.
- isolation_score: fraction of author's repos where no other multi-repo contributor works. Catches activity in obscure/abandoned repos.
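A minimal sketch of how the three features and the score could be computed, for discussion only. The input shapes (`prs`, `repo_contributors`), the function names, the use of `log1p` as the log transform, and the weights and intercept are all assumptions made for illustration; the trained coefficients are not stated here and would come from the stage15 model.

```python
import math
from statistics import median

# Hypothetical placeholder coefficients, NOT the trained values.
PLACEHOLDER_WEIGHTS = {
    "merge_rate": -1.0,
    "median_additions": 0.5,
    "isolation_score": 1.0,
}
PLACEHOLDER_INTERCEPT = 0.0


def extract_features(prs, repo_contributors, author):
    """Compute the three Bad Egg features for one author.

    prs: list of dicts with keys 'repo', 'merged' (bool), 'additions' (int).
    repo_contributors: dict mapping repo -> set of multi-repo contributor
        logins active in that repo (assumed input shape).
    Returns None when the author has no PRs.
    """
    if not prs:
        return None

    # Fraction of the author's PRs that were merged.
    merge_rate = sum(1 for pr in prs if pr["merged"]) / len(prs)

    # Median lines added per PR; log1p is an assumed choice of log transform.
    median_additions = math.log1p(median(pr["additions"] for pr in prs))

    # Fraction of the author's repos with no other multi-repo contributor.
    repos = {pr["repo"] for pr in prs}
    isolated = sum(
        1 for repo in repos
        if not (repo_contributors.get(repo, set()) - {author})
    )
    isolation_score = isolated / len(repos)

    return {
        "merge_rate": merge_rate,
        "median_additions": median_additions,
        "isolation_score": isolation_score,
    }


def bad_egg_score(features, weights=PLACEHOLDER_WEIGHTS,
                  intercept=PLACEHOLDER_INTERCEPT):
    """Logistic regression score in (0, 1): sigmoid of the weighted sum."""
    z = intercept + sum(weights[name] * features[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))


example_prs = [
    {"repo": "a", "merged": True, "additions": 10},
    {"repo": "b", "merged": False, "additions": 30},
]
example_contributors = {"a": {"alice", "bob"}, "b": {"alice"}}
example_features = extract_features(example_prs, example_contributors, "alice")
example_score = bad_egg_score(example_features)
```

Because every input here is derived from PR and repo data that is already in hand, a scorer shaped like this would not need extra API calls.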
Implementation tasks