Add interpretability example notebooks #21
Open
jshinm wants to merge 37 commits into neurodata:obliquepr from
adam2392
requested changes
Jun 8, 2022
Collaborator
LGTM once the following changes are made:
- 0. For the simulation notebook: remove the Gaussian circles and focus on sparse parity, since it shows the largest difference. Also remove `max_features=3*n_features`.
- 1. `notebook/iris_benchmark_OF_vs_RF.ipynb`: move the relevant OF content into `examples/tree/plot_iris_dtc.py`.
- 2. For the simulation notebook: add a description of the sparse parity problem based on the reference I linked. Here is a paraphrased summary of what we want to say:
Ref for sparse parity: https://epubs.siam.org/doi/epdf/10.1137/1.9781611974973.56
Sparse parity is a variation of the noisy parity problem, which itself is a multivariate generalization of the noisy XOR problem. This is a binary classification task in high dimensions.
<describe sparse parity as done in the paper in more layman's terms>
<describe the intuition for why OF would be better than RF>
e.g. OF should be more robust to high-dimensional noise. Moreover, because OF can sample more candidate splits than RF (i.e. `max_features` can be greater than `n_features`), we expect performance to improve when we are willing to spend more computation on sampling splits.
...
- 3. For the MNIST notebook: only show `max_features` = `sqrt` and `n_features`.
- Add a section describing the dataset very briefly, then link to https://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html for reference.
- Add a section, similar to the sparse parity one, on the differences between OF and RF. Add a multi-class ROC curve.
- Add reports similar to those shown in the existing digits example.
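For the multi-class ROC curve, one possible starting point is a one-vs-rest curve per digit. This is only a sketch: it uses scikit-learn's `RandomForestClassifier` on the bundled digits data as a stand-in, and the split size, `n_estimators`, and seed are arbitrary assumptions. The OF estimator from this branch would be swapped in where the RF is constructed.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Bundled 8x8 digits data (10 classes), same data as the existing digits example.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

# Stand-in model; replace with the oblique forest to compare (assumption).
clf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)  # shape (n_test, n_classes)

# One-vs-rest: binarize the labels and score each class against the rest.
classes = clf.classes_
y_test_bin = label_binarize(y_test, classes=classes)

roc_curves = {}
roc_auc = {}
for k, label in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_test_bin[:, k], proba[:, k])
    roc_curves[int(label)] = (fpr, tpr)
    roc_auc[int(label)] = auc(fpr, tpr)
```

Each `(fpr, tpr)` pair in `roc_curves` can then be drawn on a single axis, one line per digit, with the per-class AUC in the legend.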
Ideally we can try to have this done by Friday so we can show these to sklearn devs at OH on Monday. If you can't have this done by then (I know you have a lot of stuff going on!), please let me know and I can help out so we can have things ready by Monday.
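To make the sparse parity description in item 2 concrete, here is a minimal generator sketch. It assumes the common formulation (features drawn uniformly from [-1, 1], label equal to the parity of the number of positive values among the first few informative features); the function name `make_sparse_parity` and the defaults of 20 features with 3 informative ones are illustrative and should be checked against the linked reference.

```python
import numpy as np


def make_sparse_parity(n_samples, n_features=20, n_informative=3, seed=0):
    """Hypothetical helper: sample a sparse parity classification problem.

    Only the first `n_informative` columns determine the label; the remaining
    columns are pure noise dimensions.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n_samples, n_features))
    # Label is 1 when an odd number of informative features are positive.
    y = (X[:, :n_informative] > 0).sum(axis=1) % 2
    return X, y


X, y = make_sparse_parity(1000)
```

Because no single axis-aligned split on one informative feature changes the class balance, this is the kind of setting where we would expect oblique splits to help, though that still needs to be shown in the notebook.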
Author
6/13/2022 TODOS:
Additional refs from sklearn dev team
Collaborator
For documentation that will get merged into the PR branch:
For the real datasets, we can use cnae-9 and phishing-websites and wdbc from openml, which seemed to have differing performances for OF and RF:
Ideally we can have some intuition on why RF vs OF is better in one of these...
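A small harness along these lines could keep the real-data comparisons consistent. This is a sketch under assumptions: `benchmark` is a hypothetical helper, and it runs on scikit-learn's bundled Wisconsin Diagnostic Breast Cancer data as an offline stand-in for the OpenML wdbc dataset. Only RF is included; the OF estimator from this branch would be added to the `estimators` dict.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def benchmark(estimators, X, y, cv=5):
    """Hypothetical helper: mean cross-validated accuracy per named estimator."""
    return {
        name: cross_val_score(est, X, y, cv=cv).mean()
        for name, est in estimators.items()
    }


# Offline stand-in for wdbc (569 samples, 30 features).
X, y = load_breast_cancer(return_X_y=True)

estimators = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    # Assumption: add the oblique forest from this PR branch here, e.g. "OF": ...
}

scores = benchmark(estimators, X, y)
```

Running the same function on cnae-9 and phishing-websites would give comparable numbers across all three datasets.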
Reference Issues/PRs
What does this implement/fix? Explain your changes.
Add 3 interpretability example notebooks
Any other comments?