Problems with random forest classifier when using more and deeper leaners #80

@GradOpt

Description

Hi, I'm new to ThunderGBM.

I just ran the random forest example:

```python
from thundergbm import TGBMClassifier
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score

x, y = load_digits(return_X_y=True)
clf = TGBMClassifier(bagging=1, depth=12, n_trees=1, n_parallel_trees=100)
clf.fit(x, y)
y_pred = clf.predict(x)
accuracy = accuracy_score(y, y_pred)
print(accuracy)
```
and ran into several problems:

First, I watched the verbose output and found that when "n_trees=1" is set, the classifier uses only 1 learner, no matter what value I give "n_parallel_trees", which is contrary to the claim in issue #42.

Furthermore, I tried more and deeper learners. When "depth" is more than 20, or "n_trees" is more than 70, the program often crashes: run as a Python script it dies with a Segmentation fault (core dumped), and in a Jupyter notebook the kernel dies. On a large dataset with millions of samples, it crashed even while converting CSR to CSC. Since I'm using a workstation with a 32-core CPU, 128 GB of memory, and an RTX 3090 GPU, I don't believe this is a hardware issue. Is ThunderGBM only capable of training really small forests on small datasets? That would be unacceptable. I'm confused and hope to see the power of ThunderGBM.
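For what it's worth, the same workload runs fine on CPU with scikit-learn. This is only a sanity check, not a ThunderGBM call, and it assumes that `n_parallel_trees` with `bagging=1` corresponds roughly to sklearn's `n_estimators` with bootstrap sampling:

```python
# Sanity check: an equivalent forest in scikit-learn (CPU-only) trains
# on the same digits data without crashing, even at these sizes.
# Assumption: ThunderGBM's n_parallel_trees with bagging=1 maps roughly
# to sklearn's n_estimators with bootstrap=True.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

x, y = load_digits(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, max_depth=12, bootstrap=True)
clf.fit(x, y)
print(accuracy_score(y, clf.predict(x)))  # training-set accuracy
```

So the dataset and forest size themselves are not the problem; the crash seems specific to ThunderGBM.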
