Draft PR (Conversation)

Commits:
- … memory utilization in asv
- added regression forest benchmark
- upstream changes
adam2392 (Collaborator) reviewed on Aug 7, 2024
Testing
The current unit test tells me that the code runs, but to an outsider it is uncertain whether it works.
- Add a unit test comparing honest and dishonest tree depth on the same dataset, e.g. `def test_honest_tree_depth_vs_dishonest_tree`.
- Add a short Jupyter notebook comparing visualizations of an honest and a dishonest tree (https://scikit-learn.org/stable/modules/generated/sklearn.tree.plot_tree.html) on a fixed toy simulated dataset.
- etc. Please document the things you intend to test, with a brief sketch of each.
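A sketch of the first requested test, in plain Python. Everything here is a stand-in: the honest estimator from this branch is not importable, so a toy pure-Python depth computation plays both roles; in the real test, the two depths would come from the fitted dishonest and honest scikit-learn tree estimators via `get_depth()`.

```python
# Sketch of the requested depth-comparison test (all names illustrative).

def tree_depth(xs, ys):
    """Depth of a fully grown tree on 1-D data: split at a median-ish
    threshold until every leaf is label-pure."""
    if len(set(ys)) <= 1 or len(set(xs)) <= 1:
        return 0
    thresholds = sorted(set(xs))
    t = thresholds[len(thresholds) // 2 - 1]
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return 1 + max(tree_depth(*zip(*left)), tree_depth(*zip(*right)))

def test_honest_tree_depth_vs_dishonest_tree():
    xs = list(range(8))
    ys = [0, 0, 0, 0, 1, 1, 1, 1]
    # a "dishonest" tree chooses splits using the full sample ...
    full_depth = tree_depth(xs, ys)
    # ... an honest tree would choose splits on the structure half only
    struct_depth = tree_depth(xs[::2], ys[::2])
    assert full_depth >= 1 and struct_depth >= 1

test_honest_tree_depth_vs_dishonest_tree()
```

In the real version, the final assertion would state the expected relationship between the two depths once the honest implementation is available to compare against.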
Questions/Comments
- Logistics: I think it is possible to keep all sort functions within `_partitioner` and have this diff essentially disappear. See https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/tree/_partitioner.pxd. It's just easier to reason about the code if there's less diff. Similarly, in areas where there is diff unrelated to the functionality of this PR, it would be good to remove it wherever you can.
- It's unclear to me exactly how the `_events.pxd`/`.pyx` files, the `_honesty.pxd`/`.pyx` files, and the event abstractions created in the splitter/tree files differ. That is, they define relevant events, but it is unclear what is necessary versus what is not. Can you elaborate by adding a file docstring at the top of the `.pxd` files to help illustrate the intentions?
- I think overall this is an interesting design exploration, for the reasons we've discussed over the past 6 months. However, I don't see us merging this as-is, because the changes would affect the maintainability of the scikit-learn fork, which is a hard dependency for treeple. With more testing, and a separate naive implementation of honesty, I think we can scope out how to get this functionality into treeple.
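To make the docstring request concrete, here is a plain-Python illustration of the listener pattern such event abstractions typically express: the splitter fires an event at each node split, and an honesty module subscribes so it can mirror the structure tree. `NodeSplitEvent` and `EventBroker` are hypothetical names for illustration, not the actual Cython API in `_events.pxd`/`_honesty.pxd`.

```python
# Hypothetical event-broker sketch; names do not come from the PR branch.

from dataclasses import dataclass

@dataclass
class NodeSplitEvent:
    feature: int
    threshold: float

class EventBroker:
    """Routes fired events to listeners registered per event type."""

    def __init__(self):
        self._listeners = {}

    def subscribe(self, event_type, listener):
        self._listeners.setdefault(event_type, []).append(listener)

    def fire(self, event):
        for listener in self._listeners.get(type(event), []):
            listener(event)

# An honesty module could subscribe to splits emitted by the splitter:
broker = EventBroker()
seen = []
broker.subscribe(NodeSplitEvent, seen.append)
broker.fire(NodeSplitEvent(feature=0, threshold=1.5))
```

A file docstring explaining which events exist, who fires them, and who is expected to listen would answer the "necessary versus not" question directly.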
I think a separate PR implementing honesty naively as a separate splitter (similar to EconML; it does not have to be beautiful) would be good to compare side-by-side. Do you think you can implement that once this PR branch has been tested and documented?
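For reference, the naive EconML-style approach can be sketched in a few lines of plain Python. A depth-1 regression "stump" stands in for a full splitter, and all names are illustrative: the structure half chooses the threshold, and the held-out estimation half fills the leaf values.

```python
# Naive honesty sketch (illustrative): structure half picks the split,
# estimation half supplies the leaf predictions.

def fit_stump(xs, ys):
    """Pick the 1-D threshold minimizing total squared error (structure step)."""
    best_t, best_err = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        err = sum((y - sum(left) / len(left)) ** 2 for y in left) \
            + sum((y - sum(right) / len(right)) ** 2 for y in right)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def honest_leaf_values(t, xs, ys):
    """Fill leaf predictions from the held-out estimation half (honesty step)."""
    mean = lambda v: sum(v) / len(v) if v else 0.0
    return (mean([y for x, y in zip(xs, ys) if x <= t]),
            mean([y for x, y in zip(xs, ys) if x > t]))

xs_struct, ys_struct = [0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 1.0, 1.0]
xs_est, ys_est = [0.5, 1.5, 2.5, 3.5], [0.0, 0.0, 1.0, 1.0]

t = fit_stump(xs_struct, ys_struct)              # threshold chosen at 1.0
left_val, right_val = honest_leaf_values(t, xs_est, ys_est)
# right leaf is 2/3 rather than 1.0: it reflects only the held-out labels
```

The point of the side-by-side comparison is exactly this separation: split selection never sees the labels used for leaf estimation, which is the property the event-driven version must preserve.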
sklearn/tree/tests/test_tree.py (outdated)

        name, criterion, score
    )

    clf = Tree(criterion=criterion, max_features=2, random_state=0)
Collaborator
Is there a reason max_features=2?
Reference Issues/PRs
What does this implement/fix? Explain your changes.
First draft of the honesty module.
Any other comments?