Apply LOFO support in Optuna hyperparameter search by mrconway · Pull Request #13 · ScottfreeLLC/alphapy-pro

mrconway · 2026-05-10T17:34:33Z

Summary

Fixes a CatBoost crash at predict time when both LOFO and grid_search are enabled (kaggle project hit this).
Root cause: first_fit and make_predictions apply LOFO feature support, but hyper_grid_search (Optuna) was retraining best_estimator on the full feature matrix, so the model and pool disagreed on feature count.
Two-line fix in alphapy/optimize.py to apply apply_feature_selection_support before Optuna fits, mirroring the pattern in model.py:make_predictions.

Bundled in this PR (already on develop):

c8e2c59 — Apply LOFO support before model fitting (the LOFO commit that introduced the gap)
106c857 — Apply LOFO support in Optuna hyperparameter search (this fix)

Test plan

pytest tests/ — 23 passed
cd projects/kaggle && uv run alphapy — runs end-to-end, writes submission.csv (was crashing before fix)
cd projects/pizza && uv run alphapy — runs end-to-end
cd projects/time-series && uv run alphapy — runs end-to-end

hyper_grid_search trained the best estimator on the full feature matrix while first_fit and make_predictions used the LOFO-reduced matrix, causing CatBoost to crash at predict time with "Feature N is present in model but not in pool" whenever LOFO and grid_search were both enabled (kaggle project). Apply apply_feature_selection_support before fitting in Optuna so all three stages train and predict on the same feature subset.

claude · 2026-05-10T17:34:47Z

Claude finished @mrconway's task —— View job

PR Review: Apply LOFO support in Optuna hyperparameter search

Todo List:

Code Quality & Best Practices ✅

Excellent overall implementation. The changes follow established patterns and conventions:

Consistent pattern: The fix in optimize.py:296 mirrors the existing pattern used in model.py:make_predictions(), maintaining consistency across the codebase
Proper function reuse: Uses existing apply_feature_selection_support() helper instead of duplicating logic
Clean imports: Appropriately imports the required function at module level
Good documentation: The docstring clearly explains the Optuna optimization functionality

Bug Fix Analysis ✅

The root cause analysis is spot-on:

Problem: CatBoost expects consistent feature counts between training and prediction, but hyper_grid_search() was training on full feature matrix while make_predictions() applied LOFO support
Solution: Apply apply_feature_selection_support() before Optuna training (optimize.py:296-301) to ensure feature consistency

The fix correctly handles the feature selection pipeline:

LOFO support takes precedence over univariate support
Falls back to univariate if LOFO rejects all features
Gracefully handles missing support via try/except

Test Coverage ✅

Excellent test coverage in test_model_feature_selection.py:

Tests univariate support application
Tests LOFO support precedence over univariate
Tests fallback behavior when LOFO selects no features
Uses a custom ShapeRecordingClassifier to verify actual feature count passed to estimator
Tests are focused, well-named, and verify the correct behavior

Performance Considerations ✅

Minimal overhead: The fix adds only one function call that performs a simple array indexing operation
No redundant computation: Reuses existing support vectors rather than recomputing
Threading handled correctly: Maintains single-threading during Optuna optimization to avoid nested parallelism

Security Concerns ✅

No security issues identified
Uses standard numpy array operations and sklearn patterns
No external data access or privilege escalation

Minor Observations

Try/except scope (optimize.py:297-301): The broad except clause could potentially mask other errors. Consider catching specific exceptions like KeyError or IndexError.
Code duplication: The support application logic appears twice (lines 296 and 298-301). The second application handles RFE support from model.support[algo], but the interaction between LOFO and RFE support could be clearer.

Recommendation: APPROVE ✅

This is a well-executed bug fix that:

Solves a real production issue (CatBoost crashes)
Uses minimal, surgical changes
Follows established patterns
Has comprehensive test coverage
Maintains backward compatibility

The fix addresses the immediate problem without introducing technical debt. The test suite provides confidence that the feature selection logic works correctly across different scenarios.

mrconway added 2 commits May 10, 2026 13:14

Apply LOFO support before model fitting

c8e2c59

mrconway merged commit 5698204 into main May 10, 2026
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Apply LOFO support in Optuna hyperparameter search#13

Apply LOFO support in Optuna hyperparameter search#13
mrconway merged 2 commits into
mainfrom
develop

mrconway commented May 10, 2026

Uh oh!

claude Bot commented May 10, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mrconway commented May 10, 2026

Summary

Test plan

Uh oh!

claude Bot commented May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Apply LOFO support in Optuna hyperparameter search

Todo List:

Code Quality & Best Practices ✅

Bug Fix Analysis ✅

Test Coverage ✅

Performance Considerations ✅

Security Concerns ✅

Minor Observations

Recommendation: APPROVE ✅

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented May 10, 2026 •

edited

Loading