Draft PR (Conversation)

Commits:
- … memory utilization in asv
- added regression forest benchmark
- upstream changes
adam2392 (Collaborator) reviewed on Aug 7, 2024
Testing
The current unit test tells me that the code runs, but to an outsider it is uncertain whether it works.
- Add a unit test comparing honest and dishonest tree depth on the same dataset, e.g. `def test_honest_tree_depth_vs_dishonest_tree`.
- Add a short Jupyter notebook comparing visualizations of an honest and a dishonest tree (https://scikit-learn.org/stable/modules/generated/sklearn.tree.plot_tree.html) on a fixed toy simulated dataset.
- etc. Please document the things you intend to test, with a brief sketch of each.
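A sketch of the first requested test, in plain Python. Everything here is a stand-in: the honest estimator from this branch is not importable, so a toy pure-Python depth computation plays both roles; in the real test, the two depths would come from the fitted dishonest and honest scikit-learn tree estimators via `get_depth()`.

```python
# Sketch of the requested depth-comparison test (all names illustrative).

def tree_depth(xs, ys):
    """Depth of a fully grown tree on 1-D data: split at a median-ish
    threshold until every leaf is label-pure."""
    if len(set(ys)) <= 1 or len(set(xs)) <= 1:
        return 0
    thresholds = sorted(set(xs))
    t = thresholds[len(thresholds) // 2 - 1]
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return 1 + max(tree_depth(*zip(*left)), tree_depth(*zip(*right)))

def test_honest_tree_depth_vs_dishonest_tree():
    xs = list(range(8))
    ys = [0, 0, 0, 0, 1, 1, 1, 1]
    # a "dishonest" tree chooses splits using the full sample ...
    full_depth = tree_depth(xs, ys)
    # ... an honest tree would choose splits on the structure half only
    struct_depth = tree_depth(xs[::2], ys[::2])
    assert full_depth >= 1 and struct_depth >= 1

test_honest_tree_depth_vs_dishonest_tree()
```

In the real version, the final assertion would state the expected relationship between the two depths once the honest implementation is available to compare against.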
Questions/Comments
- Logistics: I think it is possible to keep all sort functions within `_partitioner` and have this diff essentially disappear. See https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/tree/_partitioner.pxd. It's just easier to reason about the code if there's less diff. Similarly, in areas where there is diff unrelated to the functionality of this PR, it would be good to remove it wherever you can.
- It's unclear to me exactly how the `_events.pxd`/`.pyx` files, the `_honesty.pxd`/`.pyx` files, and the event abstractions created in the splitter/tree files differ. That is, they define relevant events, but it is unclear what is necessary versus what is not. Can you elaborate by adding a file docstring at the top of the `.pxd` files to help illustrate the intentions?
- I think overall this is an interesting design exploration, for the reasons we've discussed over the past 6 months. However, I don't see us merging this as-is, because the changes would affect the maintainability of the scikit-learn fork, which is a hard dependency for treeple. With more testing, and a separate naive implementation of honesty, I think we can scope out how to get this functionality into treeple.
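To make the docstring request concrete, here is a plain-Python illustration of the listener pattern such event abstractions typically express: the splitter fires an event at each node split, and an honesty module subscribes so it can mirror the structure tree. `NodeSplitEvent` and `EventBroker` are hypothetical names for illustration, not the actual Cython API in `_events.pxd`/`_honesty.pxd`.

```python
# Hypothetical event-broker sketch; names do not come from the PR branch.

from dataclasses import dataclass

@dataclass
class NodeSplitEvent:
    feature: int
    threshold: float

class EventBroker:
    """Routes fired events to listeners registered per event type."""

    def __init__(self):
        self._listeners = {}

    def subscribe(self, event_type, listener):
        self._listeners.setdefault(event_type, []).append(listener)

    def fire(self, event):
        for listener in self._listeners.get(type(event), []):
            listener(event)

# An honesty module could subscribe to splits emitted by the splitter:
broker = EventBroker()
seen = []
broker.subscribe(NodeSplitEvent, seen.append)
broker.fire(NodeSplitEvent(feature=0, threshold=1.5))
```

A file docstring explaining which events exist, who fires them, and who is expected to listen would answer the "necessary versus not" question directly.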
I think a separate PR implementing honesty naively as a separate splitter (similar to EconML; it does not have to be beautiful) would be good to compare side-by-side. Do you think you can implement that once this PR branch has been tested and documented?
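For reference, the naive EconML-style approach can be sketched in a few lines of plain Python. A depth-1 regression "stump" stands in for a full splitter, and all names are illustrative: the structure half chooses the threshold, and the held-out estimation half fills the leaf values.

```python
# Naive honesty sketch (illustrative): structure half picks the split,
# estimation half supplies the leaf predictions.

def fit_stump(xs, ys):
    """Pick the 1-D threshold minimizing total squared error (structure step)."""
    best_t, best_err = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        err = sum((y - sum(left) / len(left)) ** 2 for y in left) \
            + sum((y - sum(right) / len(right)) ** 2 for y in right)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def honest_leaf_values(t, xs, ys):
    """Fill leaf predictions from the held-out estimation half (honesty step)."""
    mean = lambda v: sum(v) / len(v) if v else 0.0
    return (mean([y for x, y in zip(xs, ys) if x <= t]),
            mean([y for x, y in zip(xs, ys) if x > t]))

xs_struct, ys_struct = [0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 1.0, 1.0]
xs_est, ys_est = [0.5, 1.5, 2.5, 3.5], [0.0, 0.0, 1.0, 1.0]

t = fit_stump(xs_struct, ys_struct)              # threshold chosen at 1.0
left_val, right_val = honest_leaf_values(t, xs_est, ys_est)
# right leaf is 2/3 rather than 1.0: it reflects only the held-out labels
```

The point of the side-by-side comparison is exactly this separation: split selection never sees the labels used for leaf estimation, which is the property the event-driven version must preserve.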
sklearn/tree/tests/test_tree.py (outdated)

        name, criterion, score
    )

    clf = Tree(criterion=criterion, max_features=2, random_state=0)
Collaborator
Is there a reason max_features=2?
Reference Issues/PRs
What does this implement/fix? Explain your changes.
First draft of the honesty module.
Any other comments?