hyper_grid_search trained the best estimator on the full feature matrix while first_fit and make_predictions used the LOFO-reduced matrix, causing CatBoost to crash at predict time with "Feature N is present in model but not in pool" whenever LOFO and grid_search were both enabled (kaggle project). Apply apply_feature_selection_support before fitting in Optuna so all three stages train and predict on the same feature subset.
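The mismatch described above can be reproduced in miniature without CatBoost. The sketch below is illustrative only: `apply_support` and `ToyModel` are hypothetical stand-ins, not the alphapy `apply_feature_selection_support` function or a real estimator. It assumes the LOFO result can be represented as a boolean mask over columns.

```python
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def apply_support(X: Matrix, support: Sequence[bool]) -> Matrix:
    """Hypothetical helper: keep only the columns whose support flag is True."""
    return [[v for v, keep in zip(row, support) if keep] for row in X]


class ToyModel:
    """Stand-in estimator that remembers how many features it was fitted on."""

    def __init__(self) -> None:
        self.n_features: Optional[int] = None
        self.mean_: float = 0.0

    def fit(self, X: Matrix, y: Sequence[float]) -> "ToyModel":
        self.n_features = len(X[0])
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X: Matrix) -> List[float]:
        # Mimic the predict-time check that rejects a pool of the wrong width.
        if len(X[0]) != self.n_features:
            raise ValueError(
                f"model expects {self.n_features} features, pool has {len(X[0])}"
            )
        return [self.mean_ for _ in X]


X_full = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
y = [0.0, 1.0]
support = [True, False, True]  # assumed LOFO result: middle feature dropped
X_reduced = apply_support(X_full, support)

# Buggy path: the search stage fits on the full matrix, prediction uses the reduced one.
buggy = ToyModel().fit(X_full, y)
try:
    buggy.predict(X_reduced)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

# Fixed path: apply the same support mask before fitting in every stage.
fixed = ToyModel().fit(X_reduced, y)
predictions = fixed.predict(X_reduced)
```

Applying the mask in one shared helper, rather than separately in each stage, means every stage sees the same column subset.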
Claude finished @mrconway's task
PR Review: Apply LOFO support in Optuna hyperparameter search
Todo List:
Code Quality & Best Practices ✅
Excellent overall implementation. The changes follow established patterns and conventions:
Bug Fix Analysis ✅
The root cause analysis is spot-on:
The fix correctly handles the feature selection pipeline:
Test Coverage ✅
Excellent test coverage in
Performance Considerations ✅
Security Concerns ✅
Minor Observations
Recommendation: APPROVE ✅
This is a well-executed bug fix that:
The fix addresses the immediate problem without introducing technical debt. The test suite provides confidence that the feature selection logic works correctly across different scenarios.
Summary
- first_fit and make_predictions apply LOFO feature support, but hyper_grid_search (Optuna) was retraining best_estimator on the full feature matrix, so the model and pool disagreed on feature count.
- Updates alphapy/optimize.py to apply apply_feature_selection_support before Optuna fits, mirroring the pattern in model.py:make_predictions.
Bundled in this PR (already on develop):
- c8e2c59: Apply LOFO support before model fitting (the LOFO commit that introduced the gap)
- 106c857: Apply LOFO support in Optuna hyperparameter search (this fix)
Test plan
- pytest tests/: 23 passed
- cd projects/kaggle && uv run alphapy: runs end-to-end, writes submission.csv (was crashing before fix)
- cd projects/pizza && uv run alphapy: runs end-to-end
- cd projects/time-series && uv run alphapy: runs end-to-end