Optimize ALE and PD computation: 1.6-3.2x speedup #90
Merged
ALE (compute_first_order_ale):
- Replace DataFrame operations with numpy arrays throughout bootstrap loop
- Batch both bin-edge predictions into single predict call (2 → 1 calls)
- Replace pandas groupby with numpy bincount for mean effects
- Eliminates 2 DataFrame copies per bootstrap iteration

PD (compute_partial_dependence):
- Vectorize grid-point loop: batch all n_bins points into single predict call instead of per-grid-point predict loop (20 → 1 calls per bootstrap)
- Fix bug: predict was called inside feature loop instead of after all features assigned
- Use numpy arrays instead of DataFrame throughout

Benchmarks (2000 samples, 10 features, 50-tree RF):
  PD 1D (3 feat, 10 boot): 1.96s → 0.60s (3.2× faster)
  ALE 1D (all, 10 boot): 0.85s → 0.52s (1.6× faster)
  PD 1D (3 feat, 1 boot): 0.20s → 0.06s (3.3× faster)
  ALE 1D (all, 1 boot): 0.09s → 0.05s (1.7× faster)

Add benchmark_suite.py for reproducible performance measurement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
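For readers unfamiliar with the ALE changes listed above, here is a minimal sketch of the two ideas: stacking the lower-edge and upper-edge inputs so the model is queried once, and using `np.bincount` in place of a pandas groupby to average the per-sample differences within each bin. This is not the code from this PR. The function name `first_order_ale_sketch`, its signature, and the quantile binning are assumptions made for illustration, and bootstrapping and centering are left out.

```python
import numpy as np


def first_order_ale_sketch(predict, X, feature, n_bins=10):
    """Illustrative first-order ALE for one feature (hypothetical helper).

    predict: callable mapping an (n, d) array to an (n,) array.
    X: (n, d) float array. feature: column index.
    """
    x = X[:, feature]
    # Quantile bin edges; np.unique drops duplicate edges caused by ties.
    edges = np.unique(np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)))
    n_intervals = len(edges) - 1

    # Interval index of each sample, clipped so the maximum lands in the last bin.
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_intervals - 1)

    # Build lower-edge and upper-edge copies, then predict on both at once.
    X_lo = X.copy()
    X_hi = X.copy()
    X_lo[:, feature] = edges[idx]
    X_hi[:, feature] = edges[idx + 1]
    n = X.shape[0]
    preds = predict(np.vstack([X_lo, X_hi]))  # single predict call
    diffs = preds[n:] - preds[:n]

    # Per-bin mean of the differences via bincount instead of groupby.
    counts = np.bincount(idx, minlength=n_intervals)
    sums = np.bincount(idx, weights=diffs, minlength=n_intervals)
    mean_effects = np.divide(
        sums, counts, out=np.zeros(n_intervals), where=counts > 0
    )

    # Accumulate local effects; the curve starts at 0 and is left uncentered.
    ale = np.concatenate([[0.0], np.cumsum(mean_effects)])
    return edges, mean_effects, ale
```

For a model that is linear in the chosen feature, each bin's mean effect should equal the slope times the bin width, which gives a quick sanity check for a sketch like this.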
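The PD change described in this PR (batching all grid points into a single predict call instead of looping per grid point) can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `partial_dependence_sketch` and its arguments are hypothetical names, and the bootstrap loop is omitted.

```python
import numpy as np


def partial_dependence_sketch(predict, X, feature, grid):
    """Illustrative 1D partial dependence with one predict call (hypothetical helper).

    predict: callable mapping an (m, d) array to an (m,) array.
    X: (n, d) float array. feature: column index. grid: 1D array of g values.
    """
    grid = np.asarray(grid, dtype=float)
    n, g = X.shape[0], len(grid)

    # Stack g copies of the dataset: rows [k*n, (k+1)*n) belong to grid point k.
    X_rep = np.tile(X, (g, 1))
    # Overwrite the feature column so copy k holds grid[k] in every row.
    X_rep[:, feature] = np.repeat(grid, n)

    # One predict call over all g * n rows, then average within each copy.
    preds = predict(X_rep)
    return preds.reshape(g, n).mean(axis=1)
```

The trade-off is memory: the stacked array has `g * n` rows, so for very large datasets or grids it may be preferable to process the grid in chunks rather than all at once.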