Skip to content

Add monotonic constraints to RFGBoostClassifier#6

Open
orgoca wants to merge 2 commits into
xRiskLab:mainfrom
orgoca:feat/monotonic-constraints
Open

Add monotonic constraints to RFGBoostClassifier#6
orgoca wants to merge 2 commits into
xRiskLab:mainfrom
orgoca:feat/monotonic-constraints

Conversation

@orgoca

@orgoca orgoca commented Jun 3, 2026

Copy link
Copy Markdown

Add monotonic constraints to RFGBoostClassifier

Motivation

Credit-risk and other regulated models frequently require monotonic
relationships between features and the prediction (e.g. higher bureau score ⇒
lower probability of default). CatBoost, LightGBM and XGBoost all support this;
RFGBoost currently does not. This PR adds per-feature monotonic constraints to
RFGBoostClassifier.

What changed

  • TreeConfig.monotone_constraints: Vec<i8> — per-feature direction
    (+1 non-decreasing, -1 non-increasing, 0 unconstrained; empty = all 0).
  • Exact enforcement via value-bound propagation in build_node /
    build_node_exact: when a node splits on a constrained feature, the children
    receive value bounds around the midpoint of their means, so every leaf in the
    left subtree stays ≤ (≥) every leaf in the right subtree. Leaf values are
    clamped to the propagated [lower, upper] interval. Combined with a
    split-rejection rule in find_best_split_hist / find_best_split_exact
    (candidate splits whose child means violate the direction are skipped), this
    guarantees monotonicity globally, not just locally — verified by an
    all-else-fixed sweep test.
  • Python: RFGBoostClassifier(monotone_constraints={column_index: +1|-1}).
    The dict is keyed by original input-column index and translated to the
    encoded-feature order at fit time (so it composes with cat_features).

Backward compatibility

Defaults to empty / Noneno-op. All existing TreeConfig construction
sites pass an empty constraint set, and the new code paths are inert when no
constraints are set (constraint_of returns 0, bounds stay ±∞, leaf clamp is a
no-op, split rejection is skipped). Verified: predictions are identical to the
previous behavior when monotone_constraints is unset
(test_no_constraints_is_backward_compatible).

Tests (tests/test_monotonic.py)

  • test_increasing_constraint_holds / test_decreasing_constraint_holds
    sweeping a constrained feature with all others fixed never moves the
    prediction the wrong way (exact monotonicity on real predictions).
  • test_unconstrained_model_is_non_monotone — a sanity check that the synthetic
    data is genuinely non-monotone without the constraint, so the above tests are
    not vacuous.
  • test_no_constraints_is_backward_compatible — unset == previous behavior.

Evidence

On a real credit default (PD) benchmark evaluated out-of-time (rolling-origin
across vintages), enabling monotonic constraints on RFGBoost improved OOT Gini in
5/5 folds and reduced the train→OOT generalization gap by ~40% relative to the
unconstrained model — i.e. on a genuinely monotone target the constraint acts as
a useful regularizer, not just a governance checkbox.

Limitations / scope

  • Binary classification for now (the multiclass path is unchanged; wiring
    constraints through it would be a follow-up).
  • The Python dict→encoded-order translation assumes the binary WOE layout (WOE
    columns first in cat_features order, then numeric columns). Worth a careful
    look in review.

Notes

I was unable to run the full Python test suite locally because
tests/test_sample_weight.py segfaults on my platform (Windows + Python 3.14)
on pristine main as well — i.e. it's a pre-existing, environment-specific
crash unrelated to this change. The new monotonic tests and all
classification/regression/async/CI tests pass; CI should exercise the rest.

Summary by Sourcery

Add per-feature monotonic constraints to RFGBoostClassifier and propagate them through tree building to enforce globally monotone predictions for constrained features.

New Features:

  • Introduce a monotone_constraints vector in tree configuration and RFGBoostClassifier to specify per-feature monotonic directions.
  • Expose a Python monotone_constraints API on RFGBoostClassifier that accepts constraints by original column index and maps them to the encoded feature order for binary classification.

Enhancements:

  • Enforce monotonicity during tree growth via value-bound propagation and split rejection in both histogram-based and exact split routines while remaining a no-op when constraints are unset.
  • Plumb monotone_constraints through boosting, decision tree, random forest, and unsupervised tree configs with defaults that preserve existing behavior.

Tests:

  • Add monotonicity tests that verify increasing and decreasing constraints hold exactly when sweeping constrained features, check unconstrained models remain non-monotone on synthetic data, and confirm backward compatibility when constraints are not provided.

Per-feature monotonic constraints (+1 / -1 / 0) enforced exactly via
value-bound propagation during tree growth plus split-rejection, so
monotonicity holds globally (all-else-fixed), not just locally. Exposed as
RFGBoostClassifier(monotone_constraints={column_index: +1|-1}); defaults to
no-op, byte-identical to previous behavior when unset. Binary classification.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@sourcery-ai

sourcery-ai Bot commented Jun 3, 2026

Copy link
Copy Markdown

Reviewer's Guide

Adds per-feature monotonic constraints to RFGBoostClassifier by threading a monotone_constraints vector through tree configuration and enforcing it via split rejection and value-bound propagation in tree building, plus a Python API surface and tests.

Flow diagram for monotonic enforcement in tree building

flowchart TD
    A["build_node / build_node_exact"] --> B["find_best_split_hist / find_best_split_exact"]
    B --> C{Constraint via constraint_of}
    C -->|c == 0| D["accept split candidate"]
    C -->|c != 0 and child means violate| E["reject split candidate (continue)"]
    D --> F["child_bounds (compute ll, lu, rl, ru)"]
    F --> G["build_node / build_node_exact on left with (ll, lu)"]
    F --> H["build_node / build_node_exact on right with (rl, ru)"]
    G --> I["create_leaf (value clamped to [lower, upper])"]
    H --> J["create_leaf (value clamped to [lower, upper])"]
Loading

File-Level Changes

Change Details Files
Introduce per-feature monotonic constraints in tree configuration and enforce them during node construction.
  • Extend TreeConfig with a monotone_constraints vector and helper to read per-feature constraint values.
  • Add value-bound propagation and clamping in build_node/build_node_exact and create_leaf so descendant leaf values respect ancestor monotone constraints.
  • Introduce child_bounds logic to compute per-child value intervals based on constrained feature splits and weighted child means.
src/tree.rs
Ensure split selection respects monotonic constraints so chosen tree structure is globally monotone-compatible.
  • Update histogram-based split search to reject candidate splits whose child means violate the specified monotone direction.
  • Apply the same split-rejection logic in the exact split search path to keep both algorithms consistent with monotone enforcement.
src/tree.rs
Plumb monotonic constraints through boosting and tree-based models while keeping existing models effectively unconstrained by default.
  • Add monotone_constraints storage and constructor parameter to RFGBoostClassifier and pass it into TreeConfig during fit.
  • Explicitly set empty monotone_constraints in RFGBoostRegressor, DecisionTree, RandomForestRegressor, RandomForestClassifier, and RandomForestUnsupervised tree configs so their behavior remains unchanged.
  • Update RFGBoost unified constructor to accept an optional constraint vector and pass None for non-classifier paths.
src/boosting.rs
src/decision_tree.rs
src/random_forest.rs
src/unsupervised.rs
Expose monotonic constraints in the Python RFGBoostClassifier API and align them with the encoded feature order used by the Rust backend.
  • Add a monotone_constraints parameter to the Python RFGBoostClassifier initializer and store it as an attribute.
  • Implement _encoded_monotone_list to translate {original_column_index: direction} into a per-encoded-feature list, honoring cat_features/WOE ordering for binary classification.
  • Wire the encoded monotone list into the Rust classifier parameters only for the binary path and keep it None for multiclass.
rfgboost/_woe.py
Validate monotonic constraint behavior and backward compatibility via targeted tests.
  • Add tests verifying that with +1/-1 monotone constraints, predicted probabilities are non-decreasing/non-increasing in the constrained feature under sweeps with other features fixed.
  • Add a backward-compatibility test asserting predictions are identical when constraints are unset versus the previous default behavior.
  • Add a sanity test showing the unconstrained model is genuinely non-monotone on synthetic U-shaped data so the constraint tests are non-vacuous.
tests/test_monotonic.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • In the Python wrapper, consider validating monotone_constraints early (e.g., keys within [0, n_features), values in {-1, 0, 1}) so user errors fail fast with a clear message rather than silently being treated as unconstrained or causing subtle issues.
  • In the histogram/exact split paths you already have left_w, left_wy, right_w, right_wy available; you could refactor child_bounds to take these aggregates instead of recomputing weighted means over index lists to avoid extra passes over the data at each split.
  • The new weighted_mean helper and the regression branch of create_leaf implement very similar logic for computing a weighted average; consider consolidating this into a single utility to keep the behavior (especially the zero-weight fallback) consistent and reduce duplication.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In the Python wrapper, consider validating `monotone_constraints` early (e.g., keys within `[0, n_features)`, values in `{-1, 0, 1}`) so user errors fail fast with a clear message rather than silently being treated as unconstrained or causing subtle issues.
- In the histogram/exact split paths you already have `left_w`, `left_wy`, `right_w`, `right_wy` available; you could refactor `child_bounds` to take these aggregates instead of recomputing weighted means over index lists to avoid extra passes over the data at each split.
- The new `weighted_mean` helper and the regression branch of `create_leaf` implement very similar logic for computing a weighted average; consider consolidating this into a single utility to keep the behavior (especially the zero-weight fallback) consistent and reduce duplication.

## Individual Comments

### Comment 1
<location path="rfgboost/_woe.py" line_range="125-130" />
<code_context>
         # Store hyperparameters as attributes (sklearn BaseEstimator convention)
+        # monotone_constraints: {original_column_index: +1|-1|0}, translated to
+        # encoded-feature order at fit time (binary classification only).
+        self.monotone_constraints = monotone_constraints
         self.n_estimators = n_estimators
         self.learning_rate = learning_rate
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Validate or normalize monotone constraint values before passing them through to Rust.

`monotone_constraints` is currently stored and passed through as arbitrary ints, even though Rust only uses the sign (`+1`, `-1`, `0`). Consider normalizing here (e.g. any >0 → 1, <0 → -1, else → 0) so the Python API is robust to accidental values like `2` or `-3` and behavior is predictable.

```suggestion
        # Store hyperparameters as attributes (sklearn BaseEstimator convention)
        # monotone_constraints: {original_column_index: +1|-1|0}, translated to
        # encoded-feature order at fit time (binary classification only).
        # Normalize constraint values so only {-1, 0, 1} are stored/passed through.
        self.monotone_constraints = (
            {
                col_idx: 1 if direction > 0 else -1 if direction < 0 else 0
                for col_idx, direction in monotone_constraints.items()
            }
            if monotone_constraints is not None
            else None
        )
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread rfgboost/_woe.py
- Validate monotone_constraints at fit time (keys must be valid column
  indices) and normalize direction values to their sign, so only {-1,0,1}
  reach the core and user errors fail fast with a clear message. Kept in
  fit rather than __init__ to preserve the sklearn 'store params verbatim'
  convention (clone/get_params).
- create_leaf now reuses weighted_mean, removing duplicated weighted-average
  logic and keeping the zero-weight fallback consistent.
- Add tests: sign-normalization and invalid-key validation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant