Problems with random forest classifier when using more and deeper leaners #80

@GradOpt

Description

Hi, I'm new to ThunderGBM.

I just ran the random forest example:

```python
from thundergbm import TGBMClassifier
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score

x, y = load_digits(return_X_y=True)
clf = TGBMClassifier(bagging=1, depth=12, n_trees=1, n_parallel_trees=100)
clf.fit(x, y)
y_pred = clf.predict(x)
accuracy = accuracy_score(y, y_pred)
print(accuracy)
```
and ran into several problems:

First, I watched the verbose output and found that when "n_trees=1" is set, the classifier uses only 1 learner, no matter what value I give "n_parallel_trees", which is contrary to the claim in issue #42.

Furthermore, I tried more and deeper learners. When "depth" is more than 20, or "n_trees" is more than 70, the program often crashes: run as a Python script it dies with a Segmentation fault (core dumped), and in a Jupyter notebook the kernel dies. On a large dataset with millions of samples, it crashed even while converting CSR to CSC. Since I'm using a workstation with a 32-core CPU, 128 GB of memory, and an RTX 3090 GPU, I don't believe this is a hardware issue. Is ThunderGBM only capable of training really small forests on small datasets? That would be unacceptable. I'm confused and hope to see the power of ThunderGBM.
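For what it's worth, the same workload runs fine on CPU with scikit-learn. This is only a sanity check, not a ThunderGBM call, and it assumes that `n_parallel_trees` with `bagging=1` corresponds roughly to sklearn's `n_estimators` with bootstrap sampling:

```python
# Sanity check: an equivalent forest in scikit-learn (CPU-only) trains
# on the same digits data without crashing, even at these sizes.
# Assumption: ThunderGBM's n_parallel_trees with bagging=1 maps roughly
# to sklearn's n_estimators with bootstrap=True.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

x, y = load_digits(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, max_depth=12, bootstrap=True)
clf.fit(x, y)
print(accuracy_score(y, clf.predict(x)))  # training-set accuracy
```

So the dataset and forest size themselves are not the problem; the crash seems specific to ThunderGBM.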
