Optimize ALE and PD computation: 1.6-3.2x speedup by monte-flora · Pull Request #90 · monte-flora/scikit-explain

monte-flora · 2026-04-02T15:50:20Z

ALE (compute_first_order_ale):

- Replace DataFrame operations with numpy arrays throughout the bootstrap loop
- Batch both bin-edge predictions into a single predict call (2 → 1 calls)
- Replace pandas groupby with numpy bincount for the per-bin mean effects
- Eliminate 2 DataFrame copies per bootstrap iteration
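
For reviewers, the batching and bincount changes above can be sketched roughly as follows. This is a minimal illustration, not the code in `compute_first_order_ale`: the helper name `first_order_ale_sketch`, its arguments, and the assumption that `predict` maps a 2D array to a 1D array are all hypothetical, and centering of the ALE curve is left out.

```python
import numpy as np

def first_order_ale_sketch(predict, X, feature_idx, bin_edges):
    """Hypothetical helper: uncentered first-order ALE for one feature."""
    X = np.asarray(X, dtype=float)
    bin_edges = np.asarray(bin_edges, dtype=float)
    n_samples = X.shape[0]
    n_bins = len(bin_edges) - 1

    # Bin index of each sample, clipped into 0..n_bins-1.
    idx = np.clip(
        np.digitize(X[:, feature_idx], bin_edges, right=True) - 1, 0, n_bins - 1
    )

    # Copies of X with the feature moved to the lower / upper edge of its bin.
    lower = X.copy()
    upper = X.copy()
    lower[:, feature_idx] = bin_edges[idx]
    upper[:, feature_idx] = bin_edges[idx + 1]

    # One predict call on the stacked array instead of two separate calls.
    preds = predict(np.vstack([lower, upper]))
    effects = preds[n_samples:] - preds[:n_samples]

    # Per-bin mean via bincount instead of a pandas groupby.
    sums = np.bincount(idx, weights=effects, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    mean_effects = np.divide(sums, counts, out=np.zeros(n_bins), where=counts > 0)

    # Accumulate the local effects across bins (no centering here).
    return mean_effects, counts, np.cumsum(mean_effects)
```

Empty bins get a mean effect of 0 in this sketch; the real implementation may handle them differently.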

PD (compute_partial_dependence):

- Vectorize the grid-point loop: batch all n_bins grid points into a single predict call instead of calling predict once per grid point (20 → 1 calls per bootstrap)
- Fix bug: predict was called inside the feature loop instead of once after all features had been assigned
- Use numpy arrays instead of DataFrames throughout
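
The grid-point batching for a single feature looks roughly like the sketch below. It is an illustration under assumptions, not the code in `compute_partial_dependence`: the helper name `partial_dependence_sketch` and its signature are hypothetical, and `predict` is assumed to map a 2D array to a 1D array.

```python
import numpy as np

def partial_dependence_sketch(predict, X, feature_idx, grid):
    """Hypothetical helper: 1D partial dependence with one predict call."""
    X = np.asarray(X, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n_samples = X.shape[0]
    n_grid = len(grid)

    # Stack n_grid copies of X: shape (n_grid * n_samples, n_features).
    X_rep = np.tile(X, (n_grid, 1))

    # Block k (rows k*n_samples .. (k+1)*n_samples - 1) gets the feature
    # fixed at grid[k].
    X_rep[:, feature_idx] = np.repeat(grid, n_samples)

    # Single predict call replaces the per-grid-point loop.
    preds = predict(X_rep)

    # Average over samples within each grid-point block.
    return preds.reshape(n_grid, n_samples).mean(axis=1)
```

The trade-off is memory: the stacked array holds n_grid × n_samples rows at once, so very large inputs may need to be processed in chunks.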

Benchmarks (2000 samples, 10 features, 50-tree RF):
PD 1D (3 feat, 10 boot): 1.96s → 0.60s (3.2× faster)
ALE 1D (all, 10 boot): 0.85s → 0.52s (1.6× faster)
PD 1D (3 feat, 1 boot): 0.20s → 0.06s (3.3× faster)
ALE 1D (all, 1 boot): 0.09s → 0.05s (1.7× faster)

Add benchmark_suite.py for reproducible performance measurement.
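
As a rough idea of the timing pattern such a script can use (the contents of benchmark_suite.py are not shown here, so the helper `best_of` and the synthetic data below are assumptions for illustration only):

```python
import time
import numpy as np

def best_of(fn, repeats=5):
    """Hypothetical helper: minimum wall-clock seconds over `repeats` calls of fn()."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)

# Fixed seed so every run times the same data
# (2000 samples, 10 features, matching the benchmark setup above).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
```

Taking the minimum over several repeats reduces noise from other processes; the old and new code paths would each be wrapped in a zero-argument callable and passed to `best_of`.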

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

monte-flora merged commit 2e1822c into master Apr 2, 2026
11 checks passed

monte-flora deleted the improve/performance-optimization branch April 2, 2026 15:56

monte-flora merged 1 commit into master from improve/performance-optimization
