Skip to content

Apply LOFO support in Optuna hyperparameter search#13

Merged
mrconway merged 2 commits into
mainfrom
develop
May 10, 2026
Merged

Apply LOFO support in Optuna hyperparameter search#13
mrconway merged 2 commits into
mainfrom
develop

Conversation

@mrconway
Copy link
Copy Markdown
Collaborator

Summary

  • Fixes a CatBoost crash at predict time when both LOFO and grid_search are enabled (kaggle project hit this).
  • Root cause: first_fit and make_predictions apply LOFO feature support, but hyper_grid_search (Optuna) was retraining best_estimator on the full feature matrix, so the model and pool disagreed on feature count.
  • Two-line fix in alphapy/optimize.py to apply apply_feature_selection_support before Optuna fits, mirroring the pattern in model.py:make_predictions.

Bundled in this PR (already on develop):

  • c8e2c59 — Apply LOFO support before model fitting (the LOFO commit that introduced the gap)
  • 106c857 — Apply LOFO support in Optuna hyperparameter search (this fix)

Test plan

  • pytest tests/ — 23 passed
  • cd projects/kaggle && uv run alphapy — runs end-to-end, writes submission.csv (was crashing before fix)
  • cd projects/pizza && uv run alphapy — runs end-to-end
  • cd projects/time-series && uv run alphapy — runs end-to-end

mrconway added 2 commits May 10, 2026 13:14
hyper_grid_search trained the best estimator on the full feature
matrix while first_fit and make_predictions used the LOFO-reduced
matrix, causing CatBoost to crash at predict time with
"Feature N is present in model but not in pool" whenever LOFO and
grid_search were both enabled (kaggle project).

Apply apply_feature_selection_support before fitting in Optuna so
all three stages train and predict on the same feature subset.
@claude
Copy link
Copy Markdown

claude Bot commented May 10, 2026

Claude finished @mrconway's task —— View job


PR Review: Apply LOFO support in Optuna hyperparameter search

Todo List:

  • Read repository guidelines (CLAUDE.md)
  • Examine the core fix in alphapy/optimize.py
  • Review changes in alphapy/model.py
  • Review other modified files (alphapy_main.py, plots.py)
  • Analyze the new test file for coverage adequacy
  • Assess code quality and best practices
  • Check for potential bugs or issues
  • Evaluate performance considerations
  • Review security concerns
  • Provide comprehensive feedback

Code Quality & Best Practices ✅

Excellent overall implementation. The changes follow established patterns and conventions:

  • Consistent pattern: The fix in optimize.py:296 mirrors the existing pattern used in model.py:make_predictions(), maintaining consistency across the codebase
  • Proper function reuse: Uses existing apply_feature_selection_support() helper instead of duplicating logic
  • Clean imports: Appropriately imports the required function at module level
  • Good documentation: The docstring clearly explains the Optuna optimization functionality

Bug Fix Analysis ✅

The root cause analysis is spot-on:

  • Problem: CatBoost expects consistent feature counts between training and prediction, but hyper_grid_search() was training on full feature matrix while make_predictions() applied LOFO support
  • Solution: Apply apply_feature_selection_support() before Optuna training (optimize.py:296-301) to ensure feature consistency

The fix correctly handles the feature selection pipeline:

  1. LOFO support takes precedence over univariate support
  2. Falls back to univariate if LOFO rejects all features
  3. Gracefully handles missing support via try/except

Test Coverage ✅

Excellent test coverage in test_model_feature_selection.py:

  • Tests univariate support application
  • Tests LOFO support precedence over univariate
  • Tests fallback behavior when LOFO selects no features
  • Uses a custom ShapeRecordingClassifier to verify actual feature count passed to estimator
  • Tests are focused, well-named, and verify the correct behavior

Performance Considerations ✅

  • Minimal overhead: The fix adds only one function call that performs a simple array indexing operation
  • No redundant computation: Reuses existing support vectors rather than recomputing
  • Threading handled correctly: Maintains single-threading during Optuna optimization to avoid nested parallelism

Security Concerns ✅

  • No security issues identified
  • Uses standard numpy array operations and sklearn patterns
  • No external data access or privilege escalation

Minor Observations

  1. Try/except scope (optimize.py:297-301): The broad except clause could potentially mask other errors. Consider catching specific exceptions like KeyError or IndexError.

  2. Code duplication: The support application logic appears twice (lines 296 and 298-301). The second application handles RFE support from model.support[algo], but the interaction between LOFO and RFE support could be clearer.

Recommendation: APPROVE

This is a well-executed bug fix that:

  • Solves a real production issue (CatBoost crashes)
  • Uses minimal, surgical changes
  • Follows established patterns
  • Has comprehensive test coverage
  • Maintains backward compatibility

The fix addresses the immediate problem without introducing technical debt. The test suite provides confidence that the feature selection logic works correctly across different scenarios.


@mrconway mrconway merged commit 5698204 into main May 10, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant