Summary
Simplify the GE scoring formula for unknown contributors (the population scored when skip_known_contributors=true). Drop hub_score and log_account_age from the v2 LR; use alltime merge_rate as the sole scoring input.
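A minimal sketch of what the simplified scoring input could look like for the skip_known_contributors=true path. The names `ContributorHistory` and `score_unknown_contributor` are hypothetical, not existing code. The sketch assumes that alltime merge_rate means merged / (merged + closed without merge), that open PRs are excluded, and that an author with no resolved PRs returns None so the caller applies its own fallback.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContributorHistory:
    """Hypothetical per-author PR counts over the author's full history (alltime)."""
    merged: int
    closed_unmerged: int


def merge_rate(history: ContributorHistory) -> Optional[float]:
    """Alltime merge rate, assumed here to be merged / (merged + closed_unmerged).

    Open PRs are excluded from the denominator (an assumption). Returns None when
    the author has no resolved PRs, leaving the fallback to the caller.
    """
    resolved = history.merged + history.closed_unmerged
    if resolved == 0:
        return None
    return history.merged / resolved


def score_unknown_contributor(history: ContributorHistory) -> Optional[float]:
    # Sole scoring input for unknown contributors: no hub_score, no log_account_age.
    return merge_rate(history)


print(score_unknown_contributor(ContributorHistory(merged=3, closed_unmerged=1)))  # 0.75
```

If merge_rate is still passed through a single-feature LR (assuming LR here is logistic regression), a positive coefficient makes the output a monotone transform of merge_rate, so the ranking, and therefore the AUC, is identical to scoring on the raw merge_rate.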
Evidence
Experiments on the bot-detection branch (PR #44), tracked in experiments/bot_detection/RESULTS.md:
hub_score hurts unknown contributors (stage17): merge_rate alone outperforms every model that includes hub_score across all repo size tiers:

| Tier | mr_only | mr+hub | Delta |
|---|---|---|---|
| All medium+ | 0.516 | 0.408 | -0.108 |
| Large (500-1999 PRs) | 0.553 | 0.484 | -0.069 |
| XL (2000+ PRs) | 0.533 | 0.405 | -0.128 |

log_account_age adds nothing (stage19): across 4 stable cutoffs (n=130 to n=1014, 5-fold CV), mr+age never beats mr_only (it ties at T_2022-07 and trails at the other three), and DeLong p > 0.07 at every cutoff. age_only AUC is only 0.505-0.522, barely above chance.

| Cutoff | N | mr_only | mr+age | DeLong p |
|---|---|---|---|---|
| T_2022 | 130 | 0.584 | 0.576 | 0.807 |
| T_2022-07 | 431 | 0.606 | 0.606 | 0.992 |
| T_2023 | 474 | 0.552 | 0.534 | 0.076 |
| T_2024 | 1014 | 0.580 | 0.569 | 0.111 |

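For reference, a minimal sketch of a paired DeLong test of the kind reported in the DeLong p column. It assumes both models' scores (for example, out-of-fold predictions) are available for the same labelled PRs; `delong_paired_test` is a hypothetical helper, not the code used in stage19. It uses the quadratic placement-value form, which is adequate for a few thousand PRs, and requires at least two positives and two negatives.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def _psi(pos_score: float, neg_score: float) -> float:
    """Mann-Whitney kernel: 1 if the positive outranks the negative, 0.5 on a tie."""
    if pos_score > neg_score:
        return 1.0
    if pos_score == neg_score:
        return 0.5
    return 0.0


def _placements(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, List[float], List[float]]:
    """Return (AUC, V10, V01): per-positive and per-negative placement values."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    v10 = [sum(_psi(p, n) for n in neg) / len(neg) for p in pos]
    v01 = [sum(_psi(p, n) for p in pos) / len(pos) for n in neg]
    return sum(v10) / len(v10), v10, v01


def delong_paired_test(
    scores_a: Sequence[float], scores_b: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float, float]:
    """Two-sided DeLong test for two correlated AUCs on the same examples.

    Returns (auc_a, auc_b, p_value).
    """
    auc_a, v10_a, v01_a = _placements(scores_a, labels)
    auc_b, v10_b, v01_b = _placements(scores_b, labels)
    # Var(AUC_a - AUC_b) from paired differences of placement values; this equals
    # S_aa + S_bb - 2 * S_ab in the usual 2x2 covariance formulation.
    d10 = [a - b for a, b in zip(v10_a, v10_b)]
    d01 = [a - b for a, b in zip(v01_a, v01_b)]
    var = statistics.variance(d10) / len(d10) + statistics.variance(d01) / len(d01)
    if var <= 0.0:
        # Zero variance: equal AUCs give p = 1; otherwise z is unbounded, so p = 0.
        return auc_a, auc_b, 1.0 if auc_a == auc_b else 0.0
    z = (auc_a - auc_b) / math.sqrt(var)
    # Two-sided normal tail: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    return auc_a, auc_b, math.erfc(abs(z) / math.sqrt(2.0))
```

Because the test is paired, the covariance between the two models' AUCs on the same PRs is accounted for, which is why closely matched models such as mr_only and mr+age can yield large p-values even at n=1014.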
Recency windows don't help (stage18): there is no significant difference among alltime, 2yr, 1yr, 6mo, and 3mo merge_rate for unknown contributors (no DeLong test was significant in any tier or at any cutoff).
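A minimal sketch of how the windowed variants compared in stage18 might be computed. `windowed_merge_rate` and `WINDOWS` are hypothetical names, and the day counts for 2yr/1yr/6mo/3mo are approximations chosen for illustration. Each PR is assumed to be a (resolved_at, merged) pair, and only PRs resolved strictly before the cutoff are counted, so no information from after the cutoff leaks in.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# Hypothetical window lengths; years and months are approximated in days.
WINDOWS = {
    "alltime": None,
    "2yr": timedelta(days=730),
    "1yr": timedelta(days=365),
    "6mo": timedelta(days=182),
    "3mo": timedelta(days=91),
}


def windowed_merge_rate(
    prs: Iterable[Tuple[datetime, bool]],
    as_of: datetime,
    window: Optional[timedelta],
) -> Optional[float]:
    """Merge rate over PRs resolved before `as_of` (and within `window`, if set)."""
    merged = total = 0
    for resolved_at, was_merged in prs:
        if resolved_at >= as_of:
            continue  # ignore anything at or after the cutoff
        if window is not None and resolved_at < as_of - window:
            continue  # outside the recency window
        total += 1
        merged += int(was_merged)
    return merged / total if total else None


prs = [
    (datetime(2021, 1, 1), True),
    (datetime(2023, 6, 1), False),
    (datetime(2023, 11, 1), True),
]
rates = {name: windowed_merge_rate(prs, datetime(2024, 1, 1), w) for name, w in WINDOWS.items()}
```

Shorter windows see fewer PRs per author, so their merge rates are noisier; that trade-off is one plausible reason they do not beat alltime here.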
Cross-repo merge prediction agrees (stage13): hub_score hurts here too, with ge_v2_proxy (hub_score + merge_rate) at AUC 0.542 vs. 0.576 for merge_rate_only.
The PR #27 validation study corroborates this: account_age was LRT-significant (p = 1.2e-5) against graph_score but did not improve AUC (DeLong p = 0.65 for GE + merge_rate + age vs. GE alone). In other words, it improved likelihood fit without improving how contributors are ranked.
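The LRT and DeLong results measure different things, which is how both can hold at once. A minimal sketch of a one-parameter likelihood-ratio test (for example, adding account_age to an existing model) is below. The helper names are hypothetical, and PR #27 does not say how its LRT was computed; this assumes two nested, already-fitted models that differ by exactly one parameter.

```python
import math
from typing import Sequence


def bernoulli_log_likelihood(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Sum of log p(y) for predicted merge probabilities and 0/1 labels."""
    eps = 1e-12  # clip to avoid log(0) on saturated predictions
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def lrt_pvalue_one_param(loglik_reduced: float, loglik_full: float) -> float:
    """LRT p-value for nested models differing by one parameter (chi-square, 1 df).

    For 1 df, P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    """
    # Clip at 0: a fitted full model cannot have lower likelihood than the reduced one,
    # so a negative statistic can only come from numerical noise.
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    return math.erfc(math.sqrt(stat / 2.0))
```

Because AUC depends only on the ordering of scores, an added feature can raise the likelihood (for example, by sharpening or recalibrating probabilities) while leaving the ranking of contributors, and hence the AUC, essentially unchanged.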
Implementation tasks