Add monotonic constraints to RFGBoostClassifier#6
Open
orgoca wants to merge 2 commits into
Open
Conversation
Per-feature monotonic constraints (+1 / -1 / 0) enforced exactly via
value-bound propagation during tree growth plus split-rejection, so
monotonicity holds globally (all-else-fixed), not just locally. Exposed as
RFGBoostClassifier(monotone_constraints={column_index: +1|-1}); defaults to
no-op, byte-identical to previous behavior when unset. Binary classification.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Reviewer's GuideAdds per-feature monotonic constraints to RFGBoostClassifier by threading a monotone_constraints vector through tree configuration and enforcing it via split rejection and value-bound propagation in tree building, plus a Python API surface and tests. Flow diagram for monotonic enforcement in tree buildingflowchart TD
A["build_node / build_node_exact"] --> B["find_best_split_hist / find_best_split_exact"]
B --> C{Constraint via constraint_of}
C -->|c == 0| D["accept split candidate"]
C -->|c != 0 and child means violate| E["reject split candidate (continue)"]
D --> F["child_bounds (compute ll, lu, rl, ru)"]
F --> G["build_node / build_node_exact on left with (ll, lu)"]
F --> H["build_node / build_node_exact on right with (rl, ru)"]
G --> I["create_leaf (value clamped to [lower, upper])"]
H --> J["create_leaf (value clamped to [lower, upper])"]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- In the Python wrapper, consider validating
monotone_constraintsearly (e.g., keys within[0, n_features), values in{-1, 0, 1}) so user errors fail fast with a clear message rather than silently being treated as unconstrained or causing subtle issues. - In the histogram/exact split paths you already have
left_w,left_wy,right_w,right_wyavailable; you could refactorchild_boundsto take these aggregates instead of recomputing weighted means over index lists to avoid extra passes over the data at each split. - The new
weighted_meanhelper and the regression branch ofcreate_leafimplement very similar logic for computing a weighted average; consider consolidating this into a single utility to keep the behavior (especially the zero-weight fallback) consistent and reduce duplication.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In the Python wrapper, consider validating `monotone_constraints` early (e.g., keys within `[0, n_features)`, values in `{-1, 0, 1}`) so user errors fail fast with a clear message rather than silently being treated as unconstrained or causing subtle issues.
- In the histogram/exact split paths you already have `left_w`, `left_wy`, `right_w`, `right_wy` available; you could refactor `child_bounds` to take these aggregates instead of recomputing weighted means over index lists to avoid extra passes over the data at each split.
- The new `weighted_mean` helper and the regression branch of `create_leaf` implement very similar logic for computing a weighted average; consider consolidating this into a single utility to keep the behavior (especially the zero-weight fallback) consistent and reduce duplication.
## Individual Comments
### Comment 1
<location path="rfgboost/_woe.py" line_range="125-130" />
<code_context>
# Store hyperparameters as attributes (sklearn BaseEstimator convention)
+ # monotone_constraints: {original_column_index: +1|-1|0}, translated to
+ # encoded-feature order at fit time (binary classification only).
+ self.monotone_constraints = monotone_constraints
self.n_estimators = n_estimators
self.learning_rate = learning_rate
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Validate or normalize monotone constraint values before passing them through to Rust.
`monotone_constraints` is currently stored and passed through as arbitrary ints, even though Rust only uses the sign (`+1`, `-1`, `0`). Consider normalizing here (e.g. any >0 → 1, <0 → -1, else → 0) so the Python API is robust to accidental values like `2` or `-3` and behavior is predictable.
```suggestion
# Store hyperparameters as attributes (sklearn BaseEstimator convention)
# monotone_constraints: {original_column_index: +1|-1|0}, translated to
# encoded-feature order at fit time (binary classification only).
# Normalize constraint values so only {-1, 0, 1} are stored/passed through.
self.monotone_constraints = (
{
col_idx: 1 if direction > 0 else -1 if direction < 0 else 0
for col_idx, direction in monotone_constraints.items()
}
if monotone_constraints is not None
else None
)
self.n_estimators = n_estimators
self.learning_rate = learning_rate
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
- Validate monotone_constraints at fit time (keys must be valid column
indices) and normalize direction values to their sign, so only {-1,0,1}
reach the core and user errors fail fast with a clear message. Kept in
fit rather than __init__ to preserve the sklearn 'store params verbatim'
convention (clone/get_params).
- create_leaf now reuses weighted_mean, removing duplicated weighted-average
logic and keeping the zero-weight fallback consistent.
- Add tests: sign-normalization and invalid-key validation.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Add monotonic constraints to RFGBoostClassifier
Motivation
Credit-risk and other regulated models frequently require monotonic
relationships between features and the prediction (e.g. higher bureau score ⇒
lower probability of default). CatBoost, LightGBM and XGBoost all support this;
RFGBoost currently does not. This PR adds per-feature monotonic constraints to
RFGBoostClassifier.What changed
TreeConfig.monotone_constraints: Vec<i8>— per-feature direction(
+1non-decreasing,-1non-increasing,0unconstrained; empty = all 0).build_node/build_node_exact: when a node splits on a constrained feature, the childrenreceive value bounds around the midpoint of their means, so every leaf in the
left subtree stays ≤ (≥) every leaf in the right subtree. Leaf values are
clamped to the propagated
[lower, upper]interval. Combined with asplit-rejection rule in
find_best_split_hist/find_best_split_exact(candidate splits whose child means violate the direction are skipped), this
guarantees monotonicity globally, not just locally — verified by an
all-else-fixed sweep test.
RFGBoostClassifier(monotone_constraints={column_index: +1|-1}).The dict is keyed by original input-column index and translated to the
encoded-feature order at fit time (so it composes with
cat_features).Backward compatibility
Defaults to empty /
None⇒ no-op. All existingTreeConfigconstructionsites pass an empty constraint set, and the new code paths are inert when no
constraints are set (
constraint_ofreturns 0, bounds stay ±∞, leaf clamp is ano-op, split rejection is skipped). Verified: predictions are identical to the
previous behavior when
monotone_constraintsis unset(
test_no_constraints_is_backward_compatible).Tests (
tests/test_monotonic.py)test_increasing_constraint_holds/test_decreasing_constraint_holds—sweeping a constrained feature with all others fixed never moves the
prediction the wrong way (exact monotonicity on real predictions).
test_unconstrained_model_is_non_monotone— a sanity check that the syntheticdata is genuinely non-monotone without the constraint, so the above tests are
not vacuous.
test_no_constraints_is_backward_compatible— unset == previous behavior.Evidence
On a real credit default (PD) benchmark evaluated out-of-time (rolling-origin
across vintages), enabling monotonic constraints on RFGBoost improved OOT Gini in
5/5 folds and reduced the train→OOT generalization gap by ~40% relative to the
unconstrained model — i.e. on a genuinely monotone target the constraint acts as
a useful regularizer, not just a governance checkbox.
Limitations / scope
constraints through it would be a follow-up).
columns first in
cat_featuresorder, then numeric columns). Worth a carefullook in review.
Notes
I was unable to run the full Python test suite locally because
tests/test_sample_weight.pysegfaults on my platform (Windows + Python 3.14)on pristine
mainas well — i.e. it's a pre-existing, environment-specificcrash unrelated to this change. The new monotonic tests and all
classification/regression/async/CI tests pass; CI should exercise the rest.
Summary by Sourcery
Add per-feature monotonic constraints to RFGBoostClassifier and propagate them through tree building to enforce globally monotone predictions for constrained features.
New Features:
Enhancements:
Tests: