Improve parallelization: threading backend, PI batching, auto-selection by monte-flora · Pull Request #91 · monte-flora/scikit-explain

monte-flora · 2026-04-02T18:02:41Z

- Switch run_parallel from the loky backend to the threading backend, which eliminates pickling overhead for feature-level parallelism. Sklearn predict releases the GIL, so threads scale well (tested up to 10 cores).
- Add auto-selection: run serially when the task count is < 3, skipping the parallel overhead
- Batch the n_permute predict calls in permutation importance into a single predict call (PI 10v/10p: 78.6s → 64.2s serial, 19.1s at n_jobs=4)
- Optimize PI column reassembly with a numpy in-place column swap instead of pd.concat column-by-column extraction
- Add a stress test to the benchmark suite (10K samples, 30 features, 100 trees)
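
The auto-selection rule above can be illustrated with a minimal sketch. This is not the code from this PR: it uses the standard library's ThreadPoolExecutor as a stand-in for joblib's threading backend, and the names run_tasks and MIN_TASKS_FOR_PARALLEL are hypothetical. Only the threshold of 3 tasks comes from the description.

```python
from concurrent.futures import ThreadPoolExecutor

# Threshold from the PR description ("task count < 3"); the constant name is hypothetical.
MIN_TASKS_FOR_PARALLEL = 3


def run_tasks(func, args_list, n_jobs=1):
    """Apply func to each argument tuple, using threads only when it may pay off."""
    # Auto-selection: with one worker or very few tasks, a plain loop
    # avoids the cost of starting and coordinating a thread pool.
    if n_jobs <= 1 or len(args_list) < MIN_TASKS_FOR_PARALLEL:
        return [func(*args) for args in args_list]

    # Threads share the parent's memory, so arguments and results are
    # passed by reference rather than pickled between processes.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [pool.submit(func, *args) for args in args_list]
        # Collect in submission order so results line up with args_list.
        return [future.result() for future in futures]
```

Threads only help when the work inside func releases the GIL, which is the premise stated for sklearn predict above; pure-Python work would not be expected to scale this way.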

Scaling results (10K samples, 30 features, 100-tree RF):
ALE (30 feat, 1 boot): 1.67s → 0.36s at n_jobs=10 (4.7x)
ALE (30 feat, 10 boot): 16.3s → 2.7s at n_jobs=10 (6.1x)
PD (5 feat, 10 boot): 29.7s → 7.4s at n_jobs=5 (4.0x)
PI (10 vars, 10 perm): 66.4s → 19.1s at n_jobs=4 (3.5x)
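
The PI batching and in-place column swap listed in the description can be sketched roughly as follows. This illustrates the general technique, not the implementation merged here: LinearModel is a hypothetical stand-in for a fitted estimator, permuted_scores is a hypothetical helper, and tiling the input is an assumed way to form the batch.

```python
import numpy as np


class LinearModel:
    """Hypothetical stand-in for any fitted estimator exposing predict()."""

    def __init__(self, coef):
        self.coef = np.asarray(coef, dtype=float)

    def predict(self, X):
        return X @ self.coef


def permuted_scores(model, X, y, col, n_permute, rng):
    """Return the MSE for n_permute shuffles of one column, using one predict call."""
    n_samples = X.shape[0]
    # Stack n_permute copies of X so a single predict call covers every shuffle.
    X_batch = np.tile(X, (n_permute, 1))
    for i in range(n_permute):
        rows = slice(i * n_samples, (i + 1) * n_samples)
        # In-place column swap: overwrite only the permuted column of this copy
        # instead of rebuilding the whole table column by column.
        X_batch[rows, col] = rng.permutation(X[:, col])
    # One predict call, then split the predictions back into one row per shuffle.
    preds = model.predict(X_batch).reshape(n_permute, n_samples)
    return ((preds - y) ** 2).mean(axis=1)
```

The trade-off in this sketch is memory: the batch holds n_permute copies of the data, so very large inputs may need to be processed in smaller chunks.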

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

monte-flora merged commit a353d2f into master Apr 2, 2026
11 checks passed

monte-flora deleted the improve/performance-optimization branch April 2, 2026 18:10
